Classical Physics Blandford and Thorne

February 26, 2017 | Author: d9bocek | Category: N/A

APPLICATIONS OF CLASSICAL PHYSICS

Roger D. Blandford and Kip S. Thorne

California Institute of Technology
2008–2009 version 0800.1.K.pdf, September 28, 2008

Preface

Please send comments, suggestions, and errata via email to [email protected], or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125.

This book is an introduction to the fundamentals and 21st-century applications of all the major branches of classical physics except classical mechanics, electromagnetic theory, and elementary thermodynamics (which we assume the reader has already learned elsewhere). Classical physics and this book deal with physical phenomena on macroscopic scales: scales where the particulate natures of matter and radiation are secondary to the behavior of particles in bulk; scales where particles’ statistical as opposed to individual properties are important, and where matter’s inherent graininess can be smoothed over.

In this book, we shall take a journey through spacetime and phase space, through statistical and continuum mechanics (including solids, fluids, and plasmas), and through optics and relativity, both special and general. In our journey, we shall seek to comprehend the fundamental laws of classical physics in their own terms, and in relation to quantum physics. Using carefully chosen examples, we shall show how the classical laws are applied to important, contemporary, 21st-century problems and to everyday phenomena, and we shall uncover some deep connections among the various fundamental laws, and connections among the practical techniques that are used in different subfields of physics.

Many of the most important recent developments in physics, and more generally in science and engineering, involve classical subjects such as optics, fluids, plasmas, random processes, and curved spacetime. Unfortunately, many young physicists today have little understanding of these subjects and their applications. Our goal, in writing this book, is to rectify that.

More specifically:

• We believe that every master’s-level or PhD physicist should be familiar with the basic concepts of all the major branches of classical physics, and should have had some experience in applying them to real-world phenomena; this book is designed to facilitate that.
• A large fraction of physics, astronomy, and engineering graduate students in the United States and around the world use classical physics extensively in their research, and even more of them go on to careers in which classical physics is an essential component; this book is designed to facilitate that research and those careers.

In pursuit of these goals, we seek, in this book, to give the reader a clear understanding of the basic concepts and principles of classical physics. We present these principles in the language of modern physics (not nineteenth-century applied mathematics), and present them for physicists as distinct from mathematicians or engineers, though we hope that mathematicians and engineers will also find our presentation useful. As far as possible, we emphasize theory that involves general principles which extend well beyond the particular subjects we study.

In this book, we also seek to teach the reader how to apply classical physics ideas. We do so by presenting contemporary applications from a variety of fields, such as

• fundamental physics, experimental physics and applied physics;
• astrophysics and cosmology;
• geophysics, oceanography and meteorology;
• biophysics and chemical physics;
• engineering, optical science & technology, radio science & technology, and information science & technology.

Why is the range of applications so wide? Because we believe that physicists should have at their disposal enough understanding of general principles to attack problems that arise in unfamiliar environments. In the modern era, a large fraction of physics students will go on to careers away from the core of fundamental physics. For such students, a broad exposure to non-core applications will be of great value; for those who wind up in the core, such an exposure is of value culturally, and also because ideas from other fields often turn out to have impact back in the core of physics. Our examples will illustrate how basic concepts and problem-solving techniques are freely interchanged between disciplines.
Classical physics is defined as the physics where Planck’s constant can be approximated as zero. To a large extent, it is the body of physics for which the fundamental equations were established prior to the development of quantum mechanics in the 1920s. Does this imply that it should be studied in isolation from quantum mechanics? Our answer is, most emphatically, “No!” The reasons are simple. First, quantum mechanics has primacy over classical physics: classical physics is an approximation, often excellent, sometimes poor, to quantum mechanics. Second, in recent decades many concepts and mathematical techniques developed for quantum mechanics have been imported into classical physics and used to enlarge our classical understanding and enhance our computational capability. An example that we shall discuss occurs in plasma physics, where nonlinearly interacting waves are treated as quanta (“plasmons”), despite the fact that they are solutions of classical field equations. Third, ideas developed initially for “classical” problems are frequently adapted for application to avowedly quantum mechanical subjects; examples (not discussed in this book) are found in supersymmetric string theory and in the liquid drop model of the atomic nucleus. Because of these intimate connections between quantum and classical physics, quantum physics will appear frequently in this book, in a variety of ways.

The amount and variety of material covered in this book may seem overwhelming. If so, please keep in mind the key goals of the book: to teach the fundamental concepts, which are not so extensive that they should overwhelm, and to illustrate those concepts. Our goal is not to provide mastery of the many illustrative applications contained in the book, but rather to convey the spirit of how to apply the basic concepts of classical physics.

This book will also seem much more manageable and less overwhelming when one realizes that the same concepts and problem-solving techniques appear over and over again, in a variety of different subjects and applications. These unifying concepts and techniques are listed in Appendix B, in outline form, along with the specific applications and section numbers in this book where they arise. The reader may also find Appendix A useful. It contains an outline of the entire book based on concepts, an outline complementary to the Table of Contents.

After a preliminary Chapter 1, described below, this book is divided into six parts (see the Table of Contents):

I. Statistical physics, including kinetic theory, statistical mechanics, statistical thermodynamics, and the theory of random processes. These subjects underlie some portions of the rest of the book, especially plasma physics and fluid mechanics. Among the applications we study are the statistical-theory computation of macroscopic properties of matter (equations of state, thermal and electric conductivity, viscosity, ...); phase transitions (boiling and condensation, melting and freezing, ...); the Ising model and renormalization group; chemical and nuclear reactions, e.g. in nuclear reactors; Bose-Einstein condensates; Olbers’ Paradox in cosmology; the Greenhouse effect and its influence on the earth’s climate; noise and signal processing; the relationship between information and entropy; entropy in the expanding universe; and the entropy of black holes.

II. Optics, by which we mean classical waves of all sorts: light waves, radio waves, sound waves, water waves, waves in plasmas, and gravitational waves. The major concepts we develop for dealing with all these waves include geometrical optics, diffraction, interference, and nonlinear wave-wave mixing. Some of the applications we will meet are gravitational lenses, caustics and catastrophes, Berry’s phase, phase-contrast microscopy, Fourier-transform spectroscopy, radio-telescope interferometry, gravitational-wave interferometers, holography, frequency doubling and phase conjugation in nonlinear crystals, squeezed light, and how information is encoded on DVDs and CDs.

III. Elasticity: elastic deformations, both static and dynamic, of solids. Here some of our applications are bifurcations of equilibria and bifurcation-triggered instabilities, stress-polishing of mirrors, mountain folding, buckling, seismology, and seismic tomography.

IV. Fluid Dynamics, with the fluids including, for example, air, water, blood, and interplanetary and interstellar gas. Some of the fluid concepts we study are vorticity, turbulence, boundary layers, subsonic and supersonic flows, convection, sound waves, shock waves, and magnetohydrodynamics. Among our applications are the flow of blood through constricted vessels, the dynamics of a high-speed spinning baseball, convection in stars, helioseismology, supernovae, nuclear explosions, sedimentation and nuclear winter, the excitation of ocean waves by wind, salt fingers in the ocean, tornadoes and waterspouts, the Sargasso Sea and the Gulf Stream in the Atlantic Ocean, nonlinear waves in fluids (solitons and their interactions), stellarators, tokamaks, and controlled thermonuclear fusion.

V. Plasma Physics, with the plasmas including those in earth-bound laboratories and technological devices, the earth’s ionosphere, stellar interiors and coronae, and interplanetary and interstellar space. In addition to magnetohydrodynamics (treated in Part IV), we develop three other physical and mathematical descriptions of plasmas: kinetic theory, two-fluid formalism, and quasi-linear theory, which we express in the quantum language of weakly coupled plasmons and particles. Among our plasma applications are some of the many types of waves (plasmons) that a plasma can support, both linear waves and nonlinear (soliton) waves; the influence of the earth’s ionosphere on radio-wave propagation; the wide range of plasma instabilities that have plagued the development of controlled thermonuclear fusion; and wave-particle (plasmon-electron and plasmon-ion) interactions, including the two-stream instability for fast coronal electrons in the solar wind, isotropization of cosmic rays via scattering by magnetosonic waves, and Landau damping of electrostatic waves.

VI. General Relativity, i.e. the physics of curved spacetime, including the laws by which mass-energy and momentum curve spacetime, and by which that curvature influences the motion of matter and the classical laws of physics (e.g., the laws of fluid mechanics, electromagnetic fields, and optics).
Here our applications include, among others, gravitational experiments on earth and in our solar system; relativistic stars and black holes, both spinning (Kerr) and nonspinning (Schwarzschild); the extraction of spin energy from black holes; interactions of black holes with surrounding and infalling matter; gravitational waves and their generation and detection; and the large-scale structure and evolution of the universe (cosmology), including the big bang, the inflationary era, and the modern era. Throughout, we emphasize the physical content of general relativity and the connection of the theory to experiment and observation.

Each of the six parts is semi-independent of the others. It should be possible to read and teach the parts independently, if one is willing to dip into earlier parts as needed to pick up an occasional concept, tool, or result. We have tried to provide enough cross references to make this possible. The full book is designed for a full-year course at the first-year graduate level, and that is how we have used it, covering one chapter per week. (Many fourth-year undergraduates have taken our course successfully, but not easily.)

In most of this book, we adopt a geometrical view of physics; i.e., we express the laws of physics in geometric, frame-independent language (the language of vectors and tensors), and we use geometric reasoning in our applications. Because our geometric viewpoint is so fundamental and will be unfamiliar to many readers, we begin this book with a preliminary Chapter 1 that lays out that viewpoint carefully and pedagogically, both in 3-dimensional flat space (Newtonian physics) and in 4-dimensional flat spacetime (special relativistic physics).

In Parts I–V we focus largely on nonrelativistic, Newtonian physics, with some major exceptions (for example, we develop kinetic theory relativistically and explore relativistic as well as Newtonian applications of it; and we study relativistic, high-speed fluids, though most of our fluid studies are Newtonian). Part VI is fully relativistic, including, of course, the warping of 4-dimensional spacetime by the mass-energy and momentum that it contains. Because so little of Parts I–V is relativistic, some readers may prefer to skip the relativistic parts of Chapter 1 until moving into Part VI. To facilitate this, each section in Chapter 1 is labeled N for Newtonian and/or R for relativistic.

Exercises are a major component of this book. There are five types of exercises:

1. Practice. Exercises that give practice at mathematical manipulations (e.g., of tensors).
2. Derivation. Exercises that fill in details of arguments or derivations which are skipped over in the text.
3. Example. Exercises that lead the reader step by step through the details of some important extension or application of the material in the text.
4. Problem. Exercises with few if any hints, in which the task of figuring out how to set the calculation up and get started on it often is as difficult as doing the calculation itself.
5. Challenge. An especially difficult exercise whose solution may require that one read other books or articles as a foundation for getting started.

We urge readers to try working many of the exercises, and to read and think about all of the Example exercises. The Examples should be regarded as continuations of the text; they contain many of the most illuminating applications.

A few words on units: In this text we will be dealing with practical matters and will frequently need a quantitative understanding of the magnitudes of various physical quantities. This requires us to adopt a particular unit system. The students we teach are about equally divided between preferring cgs/Gaussian units and MKS/SI units.
Both systems provide a complete and internally consistent set of units for all of physics, and which of them is more convenient or aesthetically appealing is an often-debated issue. We will not enter this debate! One’s choice of units should not matter, and a mature physicist should be able to change from one system to another without thinking. However, when learning new concepts, having to figure out “where the 4π’s go” is a genuine impediment to progress. Our solution to this problem is as follows: We shall use the units that seem most natural for the topic at hand, or those which, we judge, constitute the majority usage for the subculture that the topic represents. We shall not pedantically convert cm to m or vice versa at every juncture; we trust that the reader can easily make whatever translation is necessary. However, where the equations are actually different, as is the case, for example, in electromagnetic theory, we shall often provide, in brackets or footnotes, the equivalent equations in the other unit system and enough information for the reader to proceed in his or her preferred scheme. As an aid, we also give some unit-conversion information in Appendix C, and values of physical constants in Appendix D.

We wrote this book in connection with a full-year course that we and others have taught at Caltech nearly every year since the early 1980s. We conceived that course and this book in response to a general concern at Caltech that our PhD physics students were being trained too narrowly, without exposure to the basic concepts of classical physics beyond electricity & magnetism, classical mechanics, and elementary thermodynamics. Courses based on parts of this book, in its preliminary form, have been taught by various physicists, not only at Caltech but also at a few other institutions in recent years; since moving to Stanford in 2003, Blandford has taught from it there. Many students who took our Caltech course, based on early versions of our book, have told us, with enthusiasm, how valuable it was in their later careers. Some were even enthusiastic during the course.

Many generations of students and many colleagues have helped us hone the book’s presentation and its exercises through comments and criticisms, sometimes caustic, usually helpful; we thank them. Most especially: For helpful advice about presentations and/or exercises in the book, and/or material that went into the book, we thank Professors Steve Koonin, Steven Frautschi, Peter Goldreich, Sterl Phinney, David Politzer, and David Stevenson at Caltech (all of whom taught portions of our Caltech course at one time or another), and XXXXX [ROGER: WHO ELSE SHOULD WE BE LISTING?] Over the years, we have received extremely valuable advice about this book from the teaching assistants in our course: XXXXXXX[WE MUST ASSEMBLE A COMPLETE LIST] We are very indebted to them.

We hope that, in its published form, this book will trigger a significant broadening of the training of physics graduate students elsewhere in the world, as it has done at Caltech.

Roger D. Blandford and Kip S. Thorne
Stanford University and Caltech, September 2008

CONTENTS

[For an alternative overview of this book, see Appendix A, Concept-Based Outline.]

Preface [6 pp]

1. Physics in Euclidean Space and Flat Spacetime: Geometric Viewpoint [56 pp]
1.1 Overview
1.2 Foundational Concepts
1.3 Tensor Algebra Without a Coordinate System
1.4 Particle Kinetics and Lorentz Force Without a Reference Frame
1.5 Component Representation of Tensor Algebra; Slot-Naming Index Notation
1.6 Particle Kinetics in Index Notation and in a Lorentz Frame
1.7 Orthogonal and Lorentz Transformations of Bases, and Spacetime Diagrams
1.8 Time Travel
1.9 Directional Derivatives, Gradients, Levi-Civita Tensor, Cross Product and Curl
1.10 Nature of Electric and Magnetic Fields; Maxwell’s Equations
1.11 Volumes, Integration, and the Integral Conservation Laws: conservation of charge, particles, baryons and rest-mass
1.12 The Stress-energy Tensor and Momentum Conservation: stress-energy tensors for perfect fluid and electromagnetic field

I. STATISTICAL PHYSICS

2. Kinetic Theory [53 pp]
2.1 Overview
2.2 Phase Space and Distribution Function: number density in phase space; distribution function for particles in a plasma; distribution function for photons; mean occupation number
2.3 Thermal Equilibrium Distribution Functions
2.4 Macroscopic Properties of Matter as Integrals Over Momentum Space: Newtonian particle density, flux and stress tensor; relativistic number-flux 4-vector and stress-energy tensor
2.5 Isotropic Distribution Functions and Equations of State: density, pressure, energy density, equation of state for nonrelativistic hydrogen gas, for relativistic degenerate hydrogen gas, and for radiation
2.6 Evolution of the Distribution Function: Liouville’s Theorem, the Collisionless Boltzmann Equation, and the Boltzmann Transport Equation
2.7 Transport Coefficients: diffusive heat conduction inside a star, analyzed in order of magnitude and via the Boltzmann transport equation

3. Statistical Mechanics [58 pp]
3.1 Overview
3.2 Systems, Ensembles, and Distribution Functions
3.3 Liouville’s Theorem and the Evolution of the Distribution Function
3.4 Statistical Equilibrium: canonical ensemble and distribution; Gibbs ensemble; grand canonical ensemble; Bose-Einstein and Fermi-Dirac distributions
3.5 Bose-Einstein Condensate
3.6 The Microcanonical Ensemble
3.7 The Ergodic Hypothesis
3.8 Entropy and the Evolution into Statistical Equilibrium: the second law of thermodynamics; what causes entropy to increase?
3.9 Grand Canonical Ensemble for an Ideal Monatomic Gas
3.10 Entropy Per Particle
3.11 Statistical Mechanics in the Presence of Gravity: Galaxies, Black Holes, the Universe, and Evolution of Structure in the Early Universe
3.12 Entropy and Information: information gained in measurements; information in communication theory; examples of information content; some properties of information; capacity of communication channels; erasing information from computer memories

4. Statistical Thermodynamics [41 pp]
4.1 Overview
4.2 Microcanonical Ensemble and the Energy Representation of Thermodynamics
4.3 Canonical Ensemble and the Free-Energy Representation of Thermodynamics
4.4 The Gibbs Representation of Thermodynamics; Phase Transitions and Chemical Reactions
4.5 Fluctuations of Systems in Statistical Equilibrium
4.6 The Ising Model and Renormalization Group Methods
4.7 Monte Carlo Methods

5. Random Processes [42 pp]
5.1 Overview
5.2 Random Processes and their Probability Distributions
5.3 Correlation Function, Spectral Density, and Ergodicity
5.4 Noise and its Types of Spectra
5.5 Filters, Signal-to-Noise Ratio and Shot Noise
5.6 Fluctuation-Dissipation Theorem
5.7 Fokker-Planck Equation and Brownian Motion

II. OPTICS

6. Geometrical Optics [40 pp]
6.1 Overview
6.2 Waves in a Homogeneous Medium: monochromatic plane waves; dispersion relation; wave packets
6.3 Waves in an Inhomogeneous, Time-Varying Medium: The Eikonal Approximation; relation to wavepackets; breakdown of Eikonal approximation; Fermat’s principle
6.4 Caustics and Catastrophes – Gravitational Lenses: formation of multiple images; formation of caustics
6.5 Paraxial Optics: axisymmetric paraxial systems; converging magnetic lens
6.6 Polarization and the Geometric (Berry) Phase

7. Diffraction [30 pp]
7.1 Overview
7.2 Helmholtz-Kirchhoff Integral: diffraction by an aperture; spreading of the wavefront
7.3 Fraunhofer Diffraction: diffraction grating; Babinet’s principle; Hubble Space Telescope
7.4 Fresnel Diffraction: lunar occultation of a radio source; circular apertures
7.5 Paraxial Fourier Optics: coherent illumination; point spread functions; Abbe theory; phase contrast microscopy; Gaussian beams
7.6 Diffraction at a Caustic

8. Interference [33 pp]
8.1 Overview
8.2 Coherence: Young’s slits; extended source; van Cittert-Zernike theorem; spatial lateral coherence; 2-dimensional coherence; Michelson stellar interferometer; temporal coherence; Michelson interferometer; Fourier transform spectroscopy; degree of coherence
8.3 Radio Telescopes: two-element interferometer; multiple element interferometer; closure phase; angular resolution
8.4 Etalons and Fabry-Perot Interferometers: multiple-beam interferometry; Fabry-Perot interferometer; lasers
8.5 Laser Interferometer Gravitational Wave Detectors
8.6 Intensity Correlation and Photon Statistics

9. Nonlinear Optics [33 pp]
9.1 Overview
9.2 Lasers: Basic Principles; Types of Pumping and Types of Lasers
9.3 Holography
9.4 Phase-Conjugate Optics
9.5 Wave-Wave Mixing in Nonlinear Crystals: nonlinear dielectric susceptibility; wave-wave mixing; resonance conditions and growth equations
9.6 Applications of Wave-Wave Mixing: Frequency doubling; phase conjugation; squeezing

III. ELASTICITY

10. Elastostatics [53 pp]
10.1 Overview
10.2 Displacement and Strain; Expansion, Rotation, and Shear
10.3 Cylindrical and Spherical Coordinates: Connection Coefficients and Components of Strain
10.4 Stress and Elastic Moduli: stress tensor; elastic moduli and elastostatic stress balance; energy of deformation; molecular origin of elastic stress; Young’s modulus and Poisson ratio
10.5 Solving the 3-Dimensional Elastostatic Equation; Thermoelastic noise in Gravitational Wave Detectors
10.6 Reducing the Elastostatic Equations to One Dimension for a Bent Beam: Cantilever bridges
10.7 Reducing the Elastostatic Equations to Two Dimensions: Deformation of Plates – Keck Telescope Mirror
10.8 Bifurcation, Buckling and Mountain Folding

11. Elastodynamics [36 pp]
11.1 Overview
11.2 Conservation Laws
11.3 Basic Equations of Elastodynamics: equation of motion; elastodynamic waves; longitudinal sound waves; transverse shear waves; energy of elastodynamic waves
11.4 Waves in Rods, Strings and Beams: compression waves; torsion waves; waves on strings; flexural waves on a beam; bifurcation of equilibria and buckling
11.5 Body and Surface Waves – Seismology: body waves; edge waves; Green’s function for a homogeneous half space; free oscillations of solid bodies; seismic tomography
11.6 The Relationship of Classical Waves to Quantum Mechanical Excitations

IV. FLUID DYNAMICS

12. Foundations of Fluid Dynamics [41 pp]
12.1 Overview
12.2 The Macroscopic Nature of a Fluid: Density, Pressure, Flow Velocity; Fluids vs. gases
12.3 Hydrostatics: Archimedes’ law; stars and planets; rotating fluids
12.4 Conservation Laws
12.5 Conservation Laws for an Ideal Fluid: mass conservation; momentum conservation; Euler equation; Bernoulli principle; conservation of energy
12.6 Incompressible Flows
12.7 Viscous Flows – Pipe Flow: decomposition of the velocity gradient into expansion, vorticity, and shear; Navier-Stokes equation; energy conservation and entropy production; molecular origin of viscosity; Reynolds’ number; blood flow

13. Vorticity [32 pp]
13.1 Overview
13.2 Vorticity and Circulation: vorticity transport; tornados; Kelvin’s theorem; diffusion of vortex lines; sources of vorticity
13.3 Low Reynolds’ Number Flow: Stokes’ flow; Nuclear Winter; sedimentation rate
13.4 High Reynolds’ Number Flow: Laminar Boundary Layers: vorticity profile; separation
13.5 Kelvin-Helmholtz Instability: temporal and spatial growth; excitation of ocean waves by wind; physical interpretation; the Rayleigh and Richardson stability criteria

14. Turbulence [34 pp]
14.1 Overview
14.2 The Transition to Turbulence – Flow past a Cylinder
14.3 Semi-Quantitative Analysis of Turbulence: weak turbulence; turbulent viscosity; relationship to vorticity; Kolmogorov spectrum
14.4 Turbulent Boundary Layers: profile of a turbulent boundary layer; instability of a laminar boundary layer; the flight of a ball
14.5 The Route to Turbulence – Onset of Chaos: Couette flow; Feigenbaum sequence

15. Waves and Rotating Flows [42 pp]
15.1 Overview
15.2 Gravity Waves on Surface of a Fluid: deep water waves; shallow water waves; capillary waves; helioseismology
15.3 Nonlinear Shallow Water Waves and Solitons: Korteweg-de Vries equation; physical effects in the KdV equation; single soliton solution; two soliton solution; solitons in contemporary physics
15.4 Rotating Fluids: equations of fluid dynamics in a rotating reference frame; geostrophic flows; Taylor-Proudman theorem; Ekman boundary layers; Rossby waves
15.5 Sound Waves: wave energy, sound generation, radiation reaction, runaway solutions and matched asymptotic expansions

16. Compressible and Supersonic Flow [37 pp]
16.1 Overview
16.2 Equations of Compressible Flow
16.3 Stationary, Irrotational Flow: quasi-one-dimensional flow; setting up a stationary transonic flow; rocket engines
16.4 One-Dimensional, Time-Dependent Flow: Riemann invariants; shock tube
16.5 Shock Fronts: shock jump conditions; Rankine-Hugoniot Relations; Internal Shock Structure; Mach cone
16.6 Similarity Solutions – Sedov-Taylor Blast Wave: atomic bomb; supernovae

17. Convection [25 pp]
17.1 Overview
17.2 Heat Conduction
17.3 Boussinesq Approximation
17.4 Rayleigh-Bénard Convection
17.5 Convection in Stars
17.6 Double Diffusion – Salt Fingers

18. Magnetohydrodynamics [41 pp]
18.1 Overview
18.2 Basic Equations of MHD: Maxwell’s equations in MHD approximation; momentum and energy conservation; boundary conditions; magnetic field and vorticity
18.3 Magnetostatic Equilibria: controlled thermonuclear fusion; Z pinch; θ pinch; tokamak
18.4 Hydromagnetic Flows: electromagnetic brake; MHD power generator; flow meter; electromagnetic pump; Hartmann flow
18.5 Stability of Hydromagnetic Equilibria: linear perturbation theory; Z pinch – sausage and kink instabilities; energy principle
18.6 Dynamos and Magnetic Field Line Reconnection: Cowling’s theorem; kinematic dynamos; magnetic reconnection
18.7 Magnetosonic Waves and the Scattering of Cosmic Rays

V. PLASMA PHYSICS

19. The Particle Kinetics of Plasmas [32 pp]
19.1 Overview
19.2 Examples of Plasmas and their Density-Temperature Regimes: ionization boundary; degeneracy boundary; relativistic boundary; pair production boundary; examples of natural and man-made plasmas
19.3 Collective Effects in Plasmas: Debye shielding; collective behavior; plasma oscillations and plasma frequency
19.4 Coulomb Collisions: collision frequency; Coulomb logarithm; thermal equilibration times
19.5 Transport Coefficients: anomalous resistivity and anomalous equilibration
19.6 Magnetic field: Cyclotron frequency and Larmor radius; validity of the fluid approximation; conductivity tensor
19.7 Adiabatic invariants: homogeneous, time-independent magnetic field; homogeneous time-independent electric and magnetic fields; inhomogeneous time-independent magnetic field; a slowly time-varying magnetic field

20. Waves in Cold Plasmas: Two-Fluid Formalism [30 pp]
20.1 Overview
20.2 Dielectric Tensor, Wave Equation, and General Dispersion Relation
20.3 Wave Modes in an Unmagnetized Plasma: two-fluid formalism; dielectric tensor and dispersion relation for a cold plasma; electromagnetic plasma waves; Langmuir waves and ion acoustic waves in a warm plasma; cutoffs and resonances
20.4 Wave Modes in a Cold, Magnetized Plasma: dielectric tensor and dispersion relation
20.5 Propagation of Radio Waves in the Ionosphere
20.6 CMA Diagram for Wave Modes in Cold, Magnetized Plasma
20.7 Two-Stream Instability

21. Kinetic Theory of Warm Plasmas [31 pp]
21.1 Overview
21.2 Basic Concepts of Kinetic Theory and its Relationship to Two-Fluid Theory: distribution function and Vlasov equation; relation to two-fluid theory; Jeans’ theorem
21.3 Electrostatic Waves in an Unmagnetized Plasma and Landau Damping: formal dispersion relation; two-stream instability; the Landau contour; dispersion relation for weakly damped or growing waves; Langmuir waves and their Landau damping; ion acoustic waves and conditions for their Landau damping to be weak
21.4 Stability of Electromagnetic Waves in an Unmagnetized Plasma
21.5 Particle Trapping
21.6 N-Particle Distribution Function

22. Nonlinear Dynamics of Plasmas [34 pp]
22.1 Overview
22.2 Quasi-Linear Theory in Classical Language: classical derivation of the theory; summary of the theory; conservation laws; generalization to three dimensions
22.3 Quasilinear Theory in Quantum Mechanical Language: wave-particle interactions; relationship between classical and quantum formulations; three-wave mixing
22.4 Quasilinear Evolution of Unstable Distribution Function – The Bump on Tail: instability of streaming cosmic rays
22.5 Parametric Instabilities
22.6 Solitons and Collisionless Shock Waves

VI. GENERAL RELATIVITY

23. From Special to General Relativity [37 pp]
23.1 Overview
23.2 Special Relativity Once Again: geometric, frame-independent formulation; inertial frames and components of vectors, tensors and physical laws; light speed, the interval, and spacetime diagrams
23.3 Differential Geometry in General Bases and in Curved Manifolds: nonorthonormal bases; vectors as differential operators; tangent space; commutators; differentiation of vectors and tensors; connection coefficients; integration
23.4 Stress-Energy Tensor Revisited
23.5 Proper Reference Frame of an Accelerated Observer

24. Fundamental Concepts of General Relativity [40 pp]
24.1 Overview
24.2 Local Lorentz Frames, the Principle of Relativity, and Einstein’s Equivalence Principle
24.3 The Spacetime Metric, and Gravity as a Curvature of Spacetime
24.4 Free-fall Motion and Geodesics of Spacetime
24.5 Relative Acceleration, Tidal Gravity, and Spacetime Curvature: Newtonian description of tidal gravity; relativistic description; comparison of descriptions
24.6 Properties of the Riemann curvature tensor
24.7 Curvature Coupling Delicacies in the Equivalence Principle, and some Nongravitational Laws of Physics in Curved Spacetime
24.8 The Einstein Field Equation
24.9 Weak Gravitational Fields: Newtonian limit of general relativity; linearized theory; gravitational field outside a stationary, linearized source; conservation laws for mass, momentum and angular momentum

25. Relativistic Stars and Black Holes [45 pp]
25.1 Overview
25.2 Schwarzschild’s Spacetime Geometry
25.3 Static Stars: Birkhoff’s theorem; stellar interior; local energy and momentum conservation; Einstein field equations; stellar models and their properties
25.4 Gravitational Implosion of a Star to Form a Black Hole
25.5 Spinning Black Holes: the Kerr metric; dragging of inertial frames; light-cone structure and the horizon; evolution of black holes – rotational energy and its extraction
25.6 The Many-Fingered Nature of Time

26. Gravitational Waves and Experimental Tests of General Relativity [49 pp]
26.1 Overview

xviii 26.2 Experimental Tests of General Relativity: equivalence principle, gravitational redshift, and global positioning system; perihelion advance of Mercury; gravitational deflection of light, Fermat’s principle and gravitational lenses; Shapiro time delay; frame dragging and Gravity Probe B; binary pulsar 26.3 Gravitational Waves and their Propagation: the gravitational wave equation; the waves’ two polarizations, + and ×; gravitons and their spin; energy and momentum in gravitational waves; wave propagation in a source’s local asymptotic rest frame; wave propagation via geometric optics; metric perturbation and TT gauge 26.4 The Generation of Gravitational Waves: multipole-moment expansion; quadrupole moment formalism; gravitational waves from a binary star system 26.5 The Detection of Gravitational Waves: interferometer analyzed in TT gauge; interferometer analyzed in proper reference frame of beam splitter; realistic interferometers 26.6 Sources of Gravitational Waves: [not yet written] 27. Cosmology [47 pp] 27.1 Overview 27.2 Homogeneity and Isotropy of the Universe — Robertson-Walker Line Element 27.3 The Stress-energy Tensor and the Einstein Field Equation 27.4 Evolution of the Universe: constituents of the universe — cold matter, radiation, and dark energy; the vacuum stress-energy tensor; evolution of the densities; evolution in time and redshift; physical processes in the expanding universe 27.5 Observational Cosmology: parameters characterizing the universe; local Lorentz frame of homogenous observers near Earth; Hubble expansion rate; primordial nucleosynthesis; density of cold dark matter; radiation temperature and density; anisotropy of the CMB: measurements of the Doppler peaks; age of the universe — constraint on the dark energy; magnitude-redshift relation for type Ia supernovae — confirmation that the universe is accelerating 27.6 The Big-Bang Singularity, Quantum Gravity and the Intial Conditions of the Universe

xix 27.7 Inflationary Cosmology: amplification of primordial gravitational waves by inflation; search for primordial gravitational waves by their influence on the CMB; probing the inflationary expansion rate

APPENDICES Appendix A: Concept-Based Outline of this Book Appendix B: Unifying Concepts Appendix C: Units Appendix D: Values of Physical Constants

Contents
1 Physics in Euclidean Space and Flat Spacetime: Geometric Viewpoint
1.1 [N & R] Overview
1.2 Foundational Concepts
1.2.1 [N] Newtonian Concepts
1.2.2 [R] Special Relativistic Concepts: Inertial frames, inertial coordinates, events, vectors, and spacetime diagrams
1.2.3 [R] Special Relativistic Concepts: Principle of Relativity; the Interval and its Invariance
1.3 [N & R] Tensor Algebra Without a Coordinate System
1.4 Particle Kinetics and Lorentz Force Without a Reference Frame
1.4.1 [N] Newtonian Particle Kinetics
1.4.2 [R] Relativistic Particle Kinetics: World Lines, 4-Velocity, 4-Momentum and its Conservation, 4-Force
1.4.3 [R] Geometric Derivation of the Lorentz Force Law
1.5 Component Representation of Tensor Algebra
1.5.1 [N] Euclidean 3-space
1.5.2 [R] Minkowski Spacetime
1.5.3 [N & R] Slot-Naming Index Notation
1.6 [R] Particle Kinetics in Index Notation and in a Lorentz Frame
1.7 Orthogonal and Lorentz Transformations of Bases, and Spacetime Diagrams
1.7.1 [N] Euclidean 3-space: Orthogonal Transformations
1.7.2 [R] Minkowski Spacetime: Lorentz Transformations
1.7.3 [R] Spacetime Diagrams for Boosts
1.8 [R] Time Travel
1.9 [N & R] Directional Derivatives, Gradients, Levi-Civita Tensor, Cross Product and Curl
1.10 [R] Nature of Electric and Magnetic Fields; Maxwell's Equations
1.11 Volumes, Integration, and Integral Conservation Laws
1.11.1 [N] Newtonian Volumes and Integration
1.11.2 [R] Spacetime Volumes and Integration
1.11.3 [R] Conservation of Charge in Spacetime
1.11.4 [R] Conservation of Particles, Baryons and Rest Mass
1.12 The Stress-Energy Tensor and Conservation of 4-Momentum
1.12.1 [N] Newtonian Stress Tensor and Momentum Conservation
1.12.2 [R] Relativistic Stress-Energy Tensor
1.12.3 [R] 4-Momentum Conservation
1.12.4 [R] Stress-Energy Tensors for Perfect Fluid and Electromagnetic Field

Chapter 1
Physics in Euclidean Space and Flat Spacetime: Geometric Viewpoint
Version 0801.1.K by Kip, 1 October 2008

Box 1.1
Reader's Guide
• Sections and exercises labeled [N] are Newtonian; those labeled [R] are relativistic. The N material can be read without the R material, but the R material requires the N material as a foundation.
• Readers who plan to study only the non-relativistic portions of this book should learn this book's geometric viewpoint on Newtonian physics and some mathematical tools we shall use by reading or browsing the [N] sections. They will also need to know two items of relativity; see Box 1.3.
• The R sections are a self-contained introduction to special relativity, though it will help if the reader has already had an elementary introduction.
• Readers who already know special relativity well should browse this chapter, especially Secs. 1.1–1.4, 1.5.3, and 1.9–1.12, to learn this book's geometric viewpoint and a few concepts (such as the stress-energy tensor) that they might not have met.

1.1 [N & R] Overview

In this book, we shall adopt a different viewpoint on the laws of physics than that found in most elementary texts. In elementary textbooks, the laws are expressed in terms of quantities (locations in space or spacetime, momenta of particles, etc.) that are measured in some coordinate system or reference frame. For example, Newtonian vectorial quantities (momenta, electric fields, etc.) are triplets of numbers [e.g., (1, 9, −4)] representing the vectors' components on the axes of a spatial coordinate system, and relativistic 4-vectors are quadruplets of numbers representing components on the spacetime axes of some reference frame.

By contrast, in this book, we shall express all physical quantities and laws in a geometric form, i.e. a form that is independent of any coordinate system or reference frame. For example, in Newtonian physics, momenta and electric fields will be vectors described as arrows that live in the 3-dimensional, flat Euclidean space of everyday experience. They require no coordinate system at all for their existence or description, though sometimes coordinates will be useful. We shall state physical laws, e.g. the Lorentz force law, as geometric (i.e. coordinate-free) relationships between these geometric (i.e. coordinate-independent) quantities.

By adopting this geometric viewpoint, we shall gain great conceptual power and often also computational power. For example, when we ignore experiment and simply ask what forms the laws of physics can possibly take (what forms are allowed by the requirement that the laws be geometric), we shall find remarkably little freedom. Coordinate independence strongly constrains the laws (see, e.g., Sec. 1.4 below). This power, together with the elegance of the geometric formulation, suggests that in some deep (ill-understood) sense, Nature's physical laws are geometric and have nothing whatsoever to do with coordinates or reference frames.

The mathematical foundation for our geometric viewpoint is differential geometry (also called "tensor analysis" by physicists).
This differential geometry can be thought of as an extension of the vector analysis with which all readers should be familiar.

There are three different frameworks for the classical physical laws that scientists use, and correspondingly three different geometric arenas for the laws; cf. Fig. 1.1. General relativity is the most accurate classical framework; it formulates the laws as geometric relationships in the arena of curved 4-dimensional spacetime. Special relativity is the limit of general relativity in the complete absence of gravity; its arena is flat, 4-dimensional Minkowski spacetime.¹ Newtonian physics is the limit of general relativity when (i) gravity is weak but not necessarily absent, (ii) relative speeds of particles and materials are small compared to the speed of light c, and (iii) all stresses (pressures) are small compared to the total density of mass-energy; its arena is flat, 3-dimensional Euclidean space with time separated off and made universal (by contrast with the frame-dependent time of relativity).

¹ So-called because it was Hermann Minkowski (1908) who identified the special relativistic invariant interval as defining a metric in spacetime, and who elucidated the resulting geometry of flat spacetime.

Fig. 1.1: The three frameworks and arenas for the classical laws of physics, and their relationship to each other. [Diagram: General Relativity (the most accurate framework for classical physics; arena: curved spacetime) reduces to Special Relativity (classical physics in the absence of gravity; arena: flat Minkowski spacetime) when gravity vanishes, and to Newtonian Physics (an approximation to relativistic physics; arena: flat Euclidean 3-space plus universal time) for weak gravity, small speeds, and small stresses. Special Relativity reduces to Newtonian Physics for low speeds and small stresses, with weak gravity added.]

In Parts I–V of this book (statistical physics, optics, elasticity theory, fluid mechanics, plasma physics) we shall confine ourselves to the Newtonian and special relativistic formulations of the laws, and accordingly our arenas will be flat Euclidean space and flat Minkowski spacetime. In Part VI we shall extend many of the laws we have studied into the domain of strong gravity (general relativity), i.e., the arena of curved spacetime. In Parts I and II (statistical physics and optics), in addition to confining ourselves to flat space or flat spacetime, we shall avoid any sophisticated use of curvilinear coordinates; i.e., when using coordinates in nontrivial ways, we shall confine ourselves to Cartesian coordinates
in Euclidean space, and Lorentz coordinates in Minkowski spacetime. This chapter is an introduction to all the differential geometric tools that we shall need in these limited arenas. In Parts III, IV, and V, when studying elasticity theory, fluid mechanics, and plasma physics, we will use curvilinear coordinates in nontrivial ways. As a foundation for them, at the beginning of Part III we will extend our flat-space differential geometric tools to curvilinear coordinate systems (e.g. cylindrical and spherical coordinates). Finally, at the beginning of Part VI, we shall extend our geometric tools to the arena of curved spacetime. In this chapter we shall alternate back and forth, one section after another, between the laws of physics and flat-space differential geometry, using each to illustrate and illuminate the other. We begin in Sec. 1.2 by recalling the foundational concepts of Newtonian physics and of special relativity. Then in Sec. 1.3 we develop our first set of differential geometric tools: the tools of coordinate-free tensor algebra. In Sec. 1.4 we illustrate our tensor-algebra tools by using them to describe—without any coordinate system or reference frame whatsoever—the kinematics of point particles that move through the Euclidean space of Newtonian physics and through relativity’s Minkowski spacetime; the particles are allowed to collide with each other and be accelerated by an electromagnetic field. In Sec. 1.5, we extend the tools of tensor algebra to the domain of Cartesian and Lorentz coordinate systems, and then in Sec. 1.6 we use these extended tensorial tools to restudy the motions, collisions, and electromagnetic accelerations of particles. In Sec. 1.7 we discuss rotations in Euclidean space and Lorentz transformations in Minkowski spacetime, and we develop relativistic spacetime diagrams in some depth and use them to study such relativistic phenomena as length contraction, time dilation, and simultaneity breakdown. In Sec. 
1.8 we illustrate the tools we have developed by asking whether the laws of relativity permit a highly advanced civilization to build time machines for traveling backward in time as well as forward. In Sec. 1.9 we develop additional differential geometric tools: directional derivatives, gradients, and the Levi-Civita tensor, and in Sec. 1.10 we use these tools to discuss Maxwell’s equations and the geometric nature of electric and magnetic fields. In Sec. 1.11 we develop our final set of geometric tools: volume elements and the integration of tensors over spacetime, and in Sec. 1.12 we use these tools to define the stress tensor of Newtonian physics and relativity’s stress-energy tensor, and to formulate very general versions of the conservation of 4-momentum.


1.2 Foundational Concepts

1.2.1 [N] Newtonian Concepts

The arena for the Newtonian laws is a spacetime composed of the familiar 3-dimensional Euclidean space of everyday experience (which we shall call 3-space), and a universal time t. We shall denote points (locations) in 3-space by capital script letters such as P and Q. These points and the 3-space in which they live require no coordinates for their definition.

A scalar is a single number that we associate with a point, P, in 3-space. We are interested in scalars that represent physical quantities, e.g., temperature T. When a scalar is a function of location P in space, e.g. T(P), we call it a scalar field.

A vector in Euclidean 3-space can be thought of as a straight arrow that reaches from one point, P, to another, Q (e.g., the arrow ∆x of Fig. 1.2a). Equivalently, ∆x can be thought of as a direction at P together with a number, the vector's length. Sometimes we shall select one point O in 3-space as an "origin" and identify all other points, say Q and P, by their vectorial separations $\mathbf{x}_Q$ and $\mathbf{x}_P$ from that origin.

The Euclidean distance ∆σ between two points P and Q in 3-space can be measured with a ruler and so, of course, requires no coordinate system for its definition. (If one does have a Cartesian coordinate system, it can be computed by the Pythagorean formula, a precursor to the "invariant interval" of flat spacetime, Sec. 1.2.3.) This distance ∆σ is also the length |∆x| of the vector ∆x that reaches from P to Q, and the square of that length is denoted

$$|\Delta\mathbf{x}|^2 \equiv (\Delta\mathbf{x})^2 \equiv (\Delta\sigma)^2 . \tag{1.1}$$
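As a small numerical illustration of the claim that ∆σ in Eq. (1.1) needs no coordinates, the sketch below (standard-library Python) computes |∆x| by the Pythagorean formula in one Cartesian system and again in a second system whose origin is shifted and whose axes are rotated about the z-axis. The point positions, shift, and angle are arbitrary illustrative values, and the helper names are hypothetical. The components of ∆x differ between the two systems, but the length agrees to rounding error.

```python
import math

def rotate_z(v, angle):
    """Components of vector v on a new set of axes rotated by `angle` (radians) about z."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    # Passive rotation: the vector is unchanged, only its components change.
    return (c * x + s * y, -s * x + c * y, z)

def separation(p, q):
    """Components of the vector reaching from p to q, i.e. q - p."""
    return tuple(qi - pi for pi, qi in zip(p, q))

def length(v):
    """Pythagorean length: square root of the sum of squared Cartesian components."""
    return math.sqrt(sum(vi * vi for vi in v))

# Hypothetical positions of P and Q relative to origin O in Cartesian system 1.
x_P = (1.0, 9.0, -4.0)
x_Q = (3.0, 5.0, 2.0)

# Cartesian system 2: origin moved to `shift`, axes rotated by 0.7 rad about z.
shift = (10.0, -2.0, 0.5)
angle = 0.7
x_P2 = rotate_z(separation(shift, x_P), angle)
x_Q2 = rotate_z(separation(shift, x_Q), angle)

dx1 = separation(x_P, x_Q)    # components of Delta x in system 1
dx2 = separation(x_P2, x_Q2)  # components of the same vector in system 2

print("components, system 1:", dx1)
print("components, system 2:", tuple(round(c, 6) for c in dx2))
print("Delta sigma, system 1:", length(dx1))
print("Delta sigma, system 2:", length(dx2))
```

The origin shift drops out of ∆x entirely, and the rotation changes only the components, which is the point the text makes about ∆σ being a geometric quantity.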

Of particular importance is the case when P and Q are neighboring points and ∆x is a differential (infinitesimal) quantity dx. By traveling along a sequence of such dx's, laying them down tail-to-tip, one after another, we can map out a curve to which these dx's are tangent (Fig. 1.2b). The curve is P(λ), with λ a parameter along the curve; and the infinitesimal vectors that map it out are dx = (dP/dλ)dλ.

Fig. 1.2: (a) A Euclidean 3-space diagram depicting two points P and Q, their vectorial separations $\mathbf{x}_P$ and $\mathbf{x}_Q$ from the (arbitrarily chosen) origin O, and the vector $\Delta\mathbf{x} = \mathbf{x}_Q - \mathbf{x}_P$ connecting them. (b) A curve C generated by laying out a sequence of infinitesimal vectors, tail-to-tip.

The product of a scalar with a vector is still a vector; so if we take the change of location dx of a particular element of a fluid during a (universal) time interval dt, and multiply it by 1/dt, we obtain a new vector, the fluid element's velocity v = dx/dt, at the fluid element's location P. Performing this operation at every point P in the fluid defines the velocity field v(P). Similarly, the sum (or difference) of two vectors is also a vector, and so taking the difference of two velocity measurements at times separated by dt and multiplying by 1/dt generates the acceleration a = dv/dt. Multiplying by the fluid element's (scalar) mass m gives the force F = ma that produced the acceleration; dividing an electrically produced force by the fluid element's charge q gives another vector, the electric field E = F/q, and so on. We can define inner products [Eq. (1.9a) below] of pairs of vectors at a point (e.g., force and displacement) to obtain a new scalar (e.g., work), and cross products [Eq. (1.60a)] of vectors to obtain a new vector (e.g., torque). By examining how a differentiable scalar field changes from point to point, we can define its gradient [Eq. (1.54b)]. In this fashion, which should be familiar to the reader and will be elucidated and generalized below, we can construct all of the standard scalars and vectors of Newtonian physics.

What is important is that these physical quantities require no coordinate system for their definition. They are geometric (coordinate-independent) objects residing in Euclidean 3-space at a particular time. It is a fundamental (though often ignored) principle of physics that the Newtonian physical laws are all expressible as geometric relationships between these types of geometric objects, and these relationships do not depend upon any coordinate system or orientation of axes, nor on any reference frame (i.e., on any purported velocity of the Euclidean space in which the measurements are made).² We shall return to this principle throughout this book. It is the Newtonian analog of Einstein's Principle of Relativity (Sec. 1.2.3 below).
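The chain of constructions described above (displacement, then velocity, acceleration, force, and electric field) can be mimicked numerically. The sketch below is a minimal illustration under stated assumptions: a hypothetical, uniformly accelerated trajectory for a charged fluid element, made-up values of its mass and charge, and central finite differences with a small but finite step dt standing in for the infinitesimal one. Every derived quantity is built only by scaling vectors by scalars or differencing vectors, as in the text.

```python
def trajectory(t):
    """Hypothetical position (relative to an arbitrary origin) of a fluid element at time t."""
    # Uniformly accelerated motion, componentwise: x = x0 + v0 t + a t^2 / 2.
    x0, v0, a = (0.0, 1.0, 2.0), (3.0, 0.0, -1.0), (0.5, -2.0, 4.0)
    return tuple(x0i + v0i * t + 0.5 * ai * t * t for x0i, v0i, ai in zip(x0, v0, a))

def scale(s, v):
    """Product of a scalar with a vector: still a vector."""
    return tuple(s * vi for vi in v)

def diff(u, w):
    """Difference u - w of two vectors: still a vector."""
    return tuple(ui - wi for ui, wi in zip(u, w))

def velocity(t, dt=1e-3):
    """v = dx/dt, approximated by a central difference of two positions."""
    return scale(1.0 / (2 * dt), diff(trajectory(t + dt), trajectory(t - dt)))

def acceleration(t, dt=1e-3):
    """a = dv/dt: difference of two velocity measurements, times 1/(2 dt)."""
    return scale(1.0 / (2 * dt), diff(velocity(t + dt, dt), velocity(t - dt, dt)))

m, q = 2.0, 0.5        # hypothetical mass and charge of the fluid element
t = 1.0
v = velocity(t)
a = acceleration(t)
F = scale(m, a)        # F = m a
E = scale(1.0 / q, F)  # E = F / q, assuming the force is purely electric

print("v =", v)
print("a =", a)
print("F =", F)
print("E =", E)
```

For this quadratic trajectory the central differences are exact apart from floating-point rounding, so v, a, F, and E match the analytic values closely.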

1.2.2 [R] Special Relativistic Concepts: Inertial frames, inertial coordinates, events, vectors, and spacetime diagrams

Because the nature and geometry of Minkowski spacetime are far less obvious intuitively than those of Euclidean 3-space, we shall need a crutch in our development of the Minkowski foundational concepts. That crutch will be inertial reference frames. We shall use them to develop in turn the following frame-independent Minkowski-spacetime concepts: events, 4-vectors, the principle of relativity, geometrized units, the interval and its invariance, and spacetime diagrams.

An inertial reference frame is a (conceptual) three-dimensional latticework of measuring rods and clocks (Fig. 1.3) with the following properties:
(i) The latticework moves freely through spacetime (i.e., no forces act on it), and is attached to gyroscopes so it does not rotate with respect to distant, celestial objects.
(ii) The measuring rods form an orthogonal lattice and the length intervals marked on them are uniform when compared to, e.g., the wavelength of light emitted by some standard type of atom or molecule; therefore the rods form an orthonormal, Cartesian coordinate system with the coordinate x measured along one axis, y along another, and z along the third.
(iii) The clocks are densely packed throughout the latticework so that, ideally, there is a separate clock at every lattice point.
(iv) The clocks tick uniformly when compared, e.g., to the period of the light emitted by some standard type of atom or molecule; i.e., they are ideal clocks.
(v) The clocks are synchronized by the Einstein synchronization process: if a pulse of light, emitted by one of the clocks, bounces off a mirror attached to another and then returns, the time of bounce $t_b$ as measured by the clock that does the bouncing is the average of the times of emission and reception as measured by the emitting and receiving clock: $t_b = \tfrac{1}{2}(t_e + t_r)$.³

Fig. 1.3: An inertial reference frame. From Taylor and Wheeler (1992).

² By changing the velocity of Euclidean space, one adds a constant velocity to all particles, but this leaves the laws, e.g. Newton's F = ma, unchanged.

Our second fundamental relativistic concept is the event. An event is a precise location in space at a precise moment of time; i.e., a precise location (or "point") in 4-dimensional spacetime. We sometimes will denote events by capital script letters such as P and Q, the same notation as for points in Euclidean 3-space; there need be no confusion, since we will avoid dealing with 3-space points and Minkowski-spacetime points simultaneously.

A 4-vector (also often referred to as a vector in spacetime) is a straight arrow $\Delta\vec{x}$ reaching from one event P to another Q; equivalently, $\Delta\vec{x}$ is a direction in spacetime at the event P where it lives, together with a number that tells us its length. We often will deal with 4-vectors and ordinary (3-space) vectors simultaneously, so we shall need different notations for them: bold-face Roman font for 3-vectors, $\Delta\mathbf{x}$, and arrowed italic font for 4-vectors, $\Delta\vec{x}$. Sometimes we shall identify an event P in spacetime by its vectorial separation $\vec{x}_P$ from some arbitrarily chosen event in spacetime, the "origin" O.

An inertial reference frame provides us with a coordinate system for spacetime. The coordinates $(x^0, x^1, x^2, x^3) = (t, x, y, z)$ which it associates with an event P are P's location (x, y, z) in the frame's latticework of measuring rods, and the time t of P as measured by the clock that sits in the lattice at the event's location. (Many apparent paradoxes in special relativity result from failing to remember that the time t of an event is always measured by a clock that resides at the event, and never by clocks that reside elsewhere in spacetime.)
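The Einstein synchronization rule of property (v), $t_b = \tfrac{1}{2}(t_e + t_r)$, can be turned into a procedure for finding how far a distant clock is out of step. The sketch below is a toy model built on assumptions that are not in the text: two clocks at rest in the same inertial frame a fixed distance apart, light traveling at c in both directions in that frame, and a hypothetical constant offset in the bouncing clock's reading. The function names and numbers are illustrative only.

```python
C = 299_792_458.0  # speed of light in m/s

def light_round_trip(t_emit, distance, bounce_clock_offset):
    """
    Simulate one Einstein-synchronization exchange between two clocks at rest
    in the same inertial frame, `distance` metres apart.

    Returns (t_e, t_b, t_r): emission and reception times read on the emitting
    clock, and the bounce time read on the far clock, whose reading exceeds
    the emitting clock's time by `bounce_clock_offset` seconds.
    """
    travel = distance / C                      # one-way light travel time
    t_e = t_emit
    t_b = t_emit + travel + bounce_clock_offset
    t_r = t_emit + 2 * travel
    return t_e, t_b, t_r

def sync_correction(t_e, t_b, t_r):
    """Amount to subtract from the far clock so that t_b = (t_e + t_r) / 2 holds."""
    return t_b - 0.5 * (t_e + t_r)

# Hypothetical setup: clocks 3 km apart, far clock reading 2 microseconds ahead.
t_e, t_b, t_r = light_round_trip(t_emit=0.0, distance=3000.0, bounce_clock_offset=2e-6)
offset = sync_correction(t_e, t_b, t_r)
print(f"estimated offset: {offset:.3e} s")
```

Because the modeled light travel time is the same outbound and back, the midpoint of emission and reception recovers exactly the injected offset; a clock that already satisfies the rule gives a correction of zero.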
It is useful to depict events on spacetime diagrams, in which the time coordinate $t = x^0$ of some inertial frame is plotted upward, and two of the frame's three spatial coordinates, $x = x^1$ and $y = x^2$, are plotted horizontally. Figure 1.4 is an example. Two events P and Q are shown there, along with their vectorial separations $\vec{x}_P$ and $\vec{x}_Q$ from the origin and the vector $\Delta\vec{x} = \vec{x}_Q - \vec{x}_P$ that separates them from each other. The coordinates of P and Q, which are the same as the components of $\vec{x}_P$ and $\vec{x}_Q$ in this coordinate system, are $(t_P, x_P, y_P, z_P)$ and $(t_Q, x_Q, y_Q, z_Q)$; and correspondingly, the components of $\Delta\vec{x}$ are

$$\Delta x^0 = \Delta t = t_Q - t_P , \quad \Delta x^1 = \Delta x = x_Q - x_P , \quad \Delta x^2 = \Delta y = y_Q - y_P , \quad \Delta x^3 = \Delta z = z_Q - z_P . \tag{1.2}$$

Fig. 1.4: A spacetime diagram (axes t, x, y) depicting two events P and Q, their vectorial separations $\vec{x}_P$ and $\vec{x}_Q$ from an (arbitrarily chosen) origin, and the vector $\Delta\vec{x} = \vec{x}_Q - \vec{x}_P$ connecting them. The laws of physics cannot involve the arbitrary origin O; we introduce it only as a conceptual aid.

³ For a deeper discussion of the nature of ideal clocks and ideal measuring rods see, e.g., pp. 23–29 and 395–399 of Misner, Thorne, and Wheeler (1973).

We shall denote these components of $\Delta\vec{x}$ more compactly by $\Delta x^\alpha$, where the α index (and every other lower case Greek index that we shall encounter) takes on the values t = 0, x = 1, y = 2, and z = 3. Similarly, in 3-dimensional Euclidean space, we shall denote the Cartesian components of a vector ∆x separating two points by $\Delta x^j$, where the j (and every other lower case Latin index) takes on the values x = 1, y = 2, and z = 3.

When the physics or geometry of a situation being studied suggests some preferred inertial frame (e.g., the frame in which some piece of experimental apparatus is at rest), then we typically will use as axes for our spacetime diagrams the coordinates of that preferred frame. On the other hand, when our situation provides no preferred inertial frame, or when we wish to emphasize a frame-independent viewpoint, we shall use as axes the coordinates of a completely arbitrary inertial frame, and we shall think of the spacetime diagram as depicting spacetime in a coordinate-independent, frame-independent way.

The coordinate system (t, x, y, z) provided by an inertial frame is sometimes called an inertial coordinate system, sometimes a Minkowski coordinate system (a term we shall not use), and sometimes a Lorentz coordinate system [because it was Lorentz (1904) who first studied the relationship of one such coordinate system to another, the Lorentz transformation]. We shall use the terms "Lorentz coordinate system" and "inertial coordinate system" interchangeably, and we shall also use the term Lorentz frame interchangeably with inertial frame. A physicist or other intelligent being who resides in a Lorentz frame and makes measurements using its latticework of rods and clocks will be called an observer.
Although events are often described by their coordinates in a Lorentz reference frame, and vectors by their components (coordinate differences), it should be obvious that the concepts of an event and a vector need not rely on any coordinate system whatsoever for their definition. For example, the event P of the birth of Isaac Newton, and the event Q of the birth of Albert Einstein are readily identified without coordinates. They can be regarded as points in spacetime, and their separation vector is the straight arrow reaching through spacetime from P to Q. Different observers in different inertial frames will attribute different coordinates to each birth and different components to the births' vectorial separation; but all observers can agree that they are talking about the same events P and Q in spacetime and the same separation vector $\Delta\vec{x}$. In this sense, P, Q, and $\Delta\vec{x}$ are frame-independent, geometric objects (points and arrows) that reside in spacetime.

1.2.3 [R] Special Relativistic Concepts: Principle of Relativity; the Interval and its Invariance

The principle of relativity states that every (special relativistic) law of physics must be expressible as a geometric, frame-independent relationship between geometric, frame-independent objects, i.e. objects such as points in spacetime and vectors and tensors, which represent physical quantities such as events and particle momenta and the electromagnetic field. Since the laws are all geometric (i.e., unrelated to any reference frame or coordinate system), there is no way that they can distinguish one inertial reference frame from any other. This leads to an alternative form of the principle of relativity (one commonly used in elementary textbooks and equivalent to the above): all the (special relativistic) laws of physics are the same in every inertial reference frame, everywhere in spacetime.

A more operational version of this principle is the following: Give identical instructions for a specific physics experiment to two different observers in two different inertial reference frames at the same or different locations in Minkowski (i.e., gravity-free) spacetime. The experiment must be self-contained, i.e., it must not involve observations of the external universe's properties (the "environment"), though it might utilize carefully calibrated tools derived from the external universe. For example, an unacceptable experiment would be a measurement of the anisotropy of the Universe's cosmic microwave radiation and a computation therefrom of the observer's velocity relative to the radiation's mean rest frame; such an experiment studies the universe's environment, not the fundamental laws of physics. An acceptable experiment would be a measurement of the speed of light using the rods and clocks of the observer's own frame, or a measurement of cross sections for elementary particle reactions using cosmic-ray particles whose incoming energies and compositions are measured as initial conditions for the experiment.
The principle of relativity says that in these or any other similarly self-contained experiments, the two observers in their two different inertial frames must obtain identically the same experimental results—to within the accuracy of their experimental techniques. Since the experimental results are governed by the (nongravitational) laws of physics, this is equivalent to the statement that all physical laws are the same in the two inertial frames. Perhaps the most central of special relativistic laws is the one stating that the speed of light c in vacuum is frame-independent, i.e., is a constant, independent of the inertial reference frame in which it is measured. In other words, there is no aether that supports light’s vibrations and in the process influences its speed — a remarkable fact that came as a great experimental surprise to physicists at the end of the nineteenth century. The constancy of the speed of light is built into Maxwell’s equations. In order for the Maxwell equations to be frame independent, the speed of light, which appears in them, must also be frame independent. In this sense, the constancy of the speed of light follows from the Principle of Relativity; it is not an independent postulate. This is illustrated in Box 1.2.

Box 1.2
Measuring the Speed of Light Without Light

[Diagram: left, a test particle of charge q and mass µ at rest a distance r from a charge Q, with electric acceleration $a_e$; right, the test particle moving at speed v parallel to the grounded wire at distance r, with magnetic acceleration $a_m$.]

In some inertial reference frame we perform two experiments using two particles, one with a large charge Q; the other, a test particle, with a much smaller charge q and mass µ. In the first experiment we place the two particles at rest, separated by a distance |∆x| ≡ r, and measure the electrical repulsive acceleration $a_e$ of q (left diagram). In Gaussian cgs units (where the speed of light shows up explicitly instead of via $\epsilon_0\mu_0 = 1/c^2$), the acceleration is $a_e = qQ/(r^2\mu)$.

In the second experiment, we connect Q to ground by a long wire, and we place q at the distance |∆x| = r from the wire and set it moving at speed v parallel to the wire. The charge Q flows down the wire with an e-folding time τ, so the current is $I = -dQ/dt = (Q/\tau)e^{-t/\tau}$. At early times $0 < t \ll \tau$, this current $I = Q/\tau$ produces a solenoidal magnetic field at q with field strength $B = (2/cr)(Q/\tau)$, and this field exerts a magnetic force on q, giving it an acceleration $a_m = q(v/c)B/\mu = 2vqQ/(c^2\tau r\mu)$.

The ratio of the electric acceleration in the first experiment to the magnetic acceleration in the second experiment is $a_e/a_m = c^2\tau/(2rv)$. Therefore, we can measure the speed of light c in our chosen inertial frame by performing this pair of experiments, carefully measuring the separation r, speed v, current Q/τ, and accelerations, and then simply computing $c = \sqrt{(2rv/\tau)(a_e/a_m)}$. The principle of relativity insists that the result of this pair of experiments should be independent of the inertial frame in which they are performed. Therefore, the speed of light c which appears in Maxwell's equations must be frame-independent. In this sense, the constancy of the speed of light follows from the Principle of Relativity as applied to Maxwell's equations.
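Box 1.2's arithmetic can be checked by running it backward: pick a value of c, generate the two accelerations from the Gaussian-unit formulas in the box, and confirm that $c = \sqrt{(2rv/\tau)(a_e/a_m)}$ recovers it. The sketch below does this. All numerical inputs (charges, mass, distance, speed, decay time) are arbitrary illustrative values rather than data, and the function names are hypothetical.

```python
import math

def electric_acceleration(q, Q, r, mu):
    """a_e = qQ / (r^2 mu): Coulomb repulsion of test charge q at distance r (Gaussian units)."""
    return q * Q / (r**2 * mu)

def magnetic_acceleration(q, Q, r, v, tau, mu, c):
    """a_m = q (v/c) B / mu, with B = 2I/(c r) and early-time current I = Q/tau."""
    current = Q / tau                 # valid for 0 < t << tau
    B = 2.0 * current / (c * r)       # field of a long straight wire, Gaussian units
    return q * (v / c) * B / mu

def inferred_light_speed(r, v, tau, a_e, a_m):
    """c = sqrt((2 r v / tau)(a_e / a_m)), from a_e / a_m = c^2 tau / (2 r v)."""
    return math.sqrt((2.0 * r * v / tau) * (a_e / a_m))

c_true = 2.99792458e10             # cm/s; used only to generate the "measurements"
q, Q, mu = 1.0e-3, 5.0e2, 1.0e-2   # statcoulomb, statcoulomb, gram (illustrative)
r, v, tau = 10.0, 3.0e3, 1.0e-2    # cm, cm/s, s (illustrative)

a_e = electric_acceleration(q, Q, r, mu)
a_m = magnetic_acceleration(q, Q, r, v, tau, mu, c_true)
print(f"a_e/a_m = {a_e / a_m:.6e}")
print(f"inferred c = {inferred_light_speed(r, v, tau, a_e, a_m):.9e} cm/s")
```

The test charge q and mass µ cancel in the ratio, which is why the box can treat q as a small test particle without knowing its properties precisely.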

The constancy of the speed of light was verified with very high precision in an era when the units of length (centimeters) and the units of time (seconds) were defined independently. By 1983, the constancy had become so universally accepted that it was used to redefine the meter (which is hard to measure precisely) in terms of the second (which is much easier to measure with modern technology⁴): the meter is now related to the second in such a way that the speed of light is precisely $c = 299{,}792{,}458\ \mathrm{m\,s^{-1}}$; i.e., one meter is the distance traveled by light in 1/299,792,458 seconds. Because of this constancy of the light speed, it is permissible when studying special relativity to set c to unity. Doing so is equivalent to the relationship

$$c = 2.99792458 \times 10^{10}\ \mathrm{cm\,s^{-1}} = 1 \qquad (1.3a)$$

between seconds and centimeters; i.e., equivalent to

$$1\ \mathrm{second} = 2.99792458 \times 10^{10}\ \mathrm{cm} . \qquad (1.3b)$$

⁴ The second is defined as the duration of 9,192,631,770 periods of the radiation produced by a certain hyperfine transition in the ground state of a $^{133}$Cs atom that is at rest in empty space. Today (2008) all fundamental physical units except mass units (e.g. the kilogram) are defined similarly in terms of fundamental constants of nature.
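A minimal sketch of the conversion implied by Eq. (1.3b), in which any time can be expressed as a length and vice versa. The helper names and the sample height and age are illustrative assumptions, not values from the text.

```python
C_CM_PER_S = 2.99792458e10  # Eq. (1.3a): c = 2.99792458e10 cm/s, set equal to 1


def seconds_to_cm(t_seconds):
    """Express a time as a length, using 1 s = 2.99792458e10 cm (Eq. 1.3b)."""
    return t_seconds * C_CM_PER_S


def cm_to_seconds(length_cm):
    """Express a length as a time, using 1 cm = 1/2.99792458e10 s."""
    return length_cm / C_CM_PER_S


height_s = cm_to_seconds(180.0)                   # a hypothetical 1.8 m height, in seconds
age_cm = seconds_to_cm(30 * 365.25 * 24 * 3600)   # a hypothetical 30-year age, in centimeters
print(f"height = {height_s:.3e} s, age = {age_cm:.3e} cm")
```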

We shall refer to units in which $c = 1$ as geometrized units, and we shall adopt them throughout this book when dealing with relativistic physics, since they make equations look much simpler. Occasionally it will be useful to restore the factors of $c$ to an equation, thereby converting it to ordinary (SI or Gaussian-cgs) units. This restoration is achieved easily using dimensional considerations. For example, the equivalence of mass $m$ and energy $E$ is written in geometrized units as $E = m$. In cgs units $E$ has dimensions of ergs $=$ gram cm$^2$ sec$^{-2}$, while $m$ has dimensions of grams, so to make $E = m$ dimensionally correct we must multiply the right side by a power of $c$ that has dimensions cm$^2$/sec$^2$, i.e. by $c^2$; thereby we obtain $E = mc^2$.

We turn, next, to another fundamental concept, the interval $(\Delta s)^2$ between the two events P and Q whose separation vector is $\Delta\vec x$. In a specific but arbitrary inertial reference frame and in geometrized units, $(\Delta s)^2$ is given by

$$(\Delta s)^2 \equiv -(\Delta t)^2 + (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2 = -(\Delta t)^2 + \sum_{i,j}\delta_{ij}\,\Delta x^i\Delta x^j ; \qquad (1.4a)$$

cf. Eq. (1.2). Here $\delta_{ij}$ is the Kronecker delta (unity if $i = j$; zero otherwise), and the spatial indices $i$ and $j$ are summed over 1, 2, 3. If $(\Delta s)^2 > 0$, the events P and Q are said to have a spacelike separation; if $(\Delta s)^2 = 0$, their separation is null or lightlike; and if $(\Delta s)^2 < 0$, their separation is timelike. For timelike separations, $(\Delta s)^2 < 0$ implies that $\Delta s$ is imaginary; to avoid dealing with imaginary numbers, we describe timelike intervals by

$$(\Delta\tau)^2 \equiv -(\Delta s)^2 , \qquad (1.4b)$$

whose square root $\Delta\tau$ is real.

The coordinate separation between P and Q depends on one's reference frame; i.e., if $\Delta x^{\alpha}$ and $\Delta x^{\alpha'}$ are the coordinate separations in two different frames, then $\Delta x^{\alpha'} \neq \Delta x^{\alpha}$. Despite this frame dependence, the principle of relativity forces the interval $(\Delta s)^2$ to be the same in all frames:

$$(\Delta s)^2 = -(\Delta t)^2 + (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2 = -(\Delta t')^2 + (\Delta x')^2 + (\Delta y')^2 + (\Delta z')^2 . \qquad (1.5)$$
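The definitions (1.4a) and (1.4b) and the invariance statement (1.5) can be exercised numerically. This sketch assumes the standard Lorentz-boost matrix along x (a result derived later, not in this section) to produce the primed coordinates; the function names and the sample separation are hypothetical.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # Minkowski metric components, geometrized units (c = 1)


def interval(dx):
    """(Delta s)^2 = -(Delta t)^2 + sum_ij delta_ij Dx^i Dx^j, Eq. (1.4a); dx = (dt, dx, dy, dz)."""
    dx = np.asarray(dx, dtype=float)
    return dx @ ETA @ dx


def classify(dx, tol=1e-12):
    """Label a separation spacelike, null, or timelike by the sign of (Delta s)^2."""
    s2 = interval(dx)
    if s2 > tol:
        return "spacelike"
    if s2 < -tol:
        return "timelike"
    return "null"


def boost_x(beta):
    """Lorentz boost with speed beta along x (standard form, assumed here rather than derived)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * beta
    return L


dx = np.array([5.0, 3.0, 1.0, 0.0])   # (dt, dx, dy, dz): a timelike separation
dx_primed = boost_x(0.6) @ dx          # the same pair of events, seen from a moving frame
dtau = np.sqrt(-interval(dx))          # Eq. (1.4b): a real proper time for a timelike separation
print(classify(dx), interval(dx), interval(dx_primed))  # same interval (-15) in both frames, up to rounding
```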

We shall sketch a proof of Eq. (1.5) for the case of two events P and Q whose separation is timelike:


Fig. 1.5: Geometry for proving the invariance of the interval.

Choose the spatial coordinate systems of the primed and unprimed frames in such a way that (i) their relative motion (with speed β that will not enter into our analysis) is along the x direction and the x′ direction, (ii) event P lies on the x and x′ axes, and (iii) event Q lies in the x-y plane and in the x′-y′ plane, as shown in Fig. 1.5. Then evaluate the interval between P and Q in the unprimed frame by the following construction: Place a mirror parallel to the x-z plane at precisely the height h that permits a photon, emitted from P, to travel along the dashed line of Fig. 1.5 to the mirror, then reflect off the mirror and continue along the dashed path, arriving at event Q. If the mirror were placed lower, the photon would arrive at the spatial location of Q sooner than the time of Q; if placed higher, it would arrive later. Then the distance the photon travels (the length of the two-segment dashed line) is equal to $c\Delta t = \Delta t$, where $\Delta t$ is the time between events P and Q as measured in the unprimed frame. If the mirror had not been present, the photon would have arrived at event R after time $\Delta t$, so $c\Delta t$ is the distance between P and R. From the diagram it is easy to see that the height of R above the x axis is $2h - \Delta y$, so the Pythagorean theorem gives $(\Delta t)^2 = (\Delta x)^2 + (2h - \Delta y)^2$, which implies that

$$(\Delta s)^2 = -(\Delta t)^2 + (\Delta x)^2 + (\Delta y)^2 = -(2h - \Delta y)^2 + (\Delta y)^2 . \qquad (1.6a)$$
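The geometry behind (1.6a) can be checked numerically. Unfolding the reflection, the photon's two-segment path has the same length as the straight line from P to R, so arrival at Q fixes the mirror height through $(\Delta t)^2 = (\Delta x)^2 + (2h - \Delta y)^2$. The sketch below (hypothetical function names; arbitrary sample numbers with $\Delta t > |\Delta x|$) solves for $h$ and confirms that both sides of (1.6a) agree.

```python
import math


def mirror_height(dt, dx, dy):
    """Mirror height h for which a photon leaving P reaches Q after time dt (c = 1).

    The unfolded path is a straight line of length dt from P to R = (dx, 2h - dy),
    so dt**2 = dx**2 + (2h - dy)**2.
    """
    return 0.5 * (math.sqrt(dt**2 - dx**2) + dy)


def interval_direct(dt, dx, dy):
    """Left side of Eq. (1.6a), with the z separation set to zero."""
    return -dt**2 + dx**2 + dy**2


def interval_from_mirror(h, dy):
    """Right side of Eq. (1.6a)."""
    return -(2 * h - dy)**2 + dy**2


dt, dx, dy = 5.0, 3.0, 1.0   # timelike separation of P and Q (sample values)
h = mirror_height(dt, dx, dy)
print(h, interval_direct(dt, dx, dy), interval_from_mirror(h, dy))   # 2.5 -15.0 -15.0
```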

The same construction in the primed frame must give the same formula, but with primes:

$$(\Delta s')^2 = -(\Delta t')^2 + (\Delta x')^2 + (\Delta y')^2 = -(2h' - \Delta y')^2 + (\Delta y')^2 . \qquad (1.6b)$$

The proof that $(\Delta s')^2 = (\Delta s)^2$ then reduces to showing that the principle of relativity requires that distances perpendicular to the direction of relative motion of two frames be the same as measured in the two frames, $h' = h$, $\Delta y' = \Delta y$. We leave it to the reader to develop a careful argument for this [Ex. 1.2]. Because of its frame invariance, the interval $(\Delta s)^2$ can be regarded as a geometric property of the vector $\Delta\vec x$ that reaches from P to Q; we shall call it the squared length $(\Delta\vec x)^2$ of $\Delta\vec x$:

$$(\Delta\vec x)^2 \equiv (\Delta s)^2 . \qquad (1.7)$$

Note that this squared length, despite its name, can be negative (for timelike ∆~x) or zero (for null ∆~x) as well as positive (for spacelike ∆~x).

This invariant interval between two events is as fundamental to Minkowski spacetime as the Euclidean distance between two points is to flat 3-space. Just as the Euclidean distance gives rise to the geometry of 3-space, as embodied, e.g., in Euclid's axioms, so the interval gives rise to the geometry of spacetime, which we shall be exploring. If this spacetime geometry were as intuitively obvious to humans as is Euclidean geometry, we would not need the crutch of inertial reference frames to arrive at it. Nature (presumably) has no need for such a crutch. To Nature (it seems evident), the geometry of Minkowski spacetime, as embodied in the invariant interval, is among the most fundamental aspects of physical law.

Before we leave this central idea, we should emphasize that vacuum electromagnetic radiation is not the only type of wave in nature. In this book, we shall encounter dispersive media, like optical fibers or plasmas, where electromagnetic signals travel slower than c, and we shall analyze sound waves and seismic waves where the governing laws do not involve electromagnetism at all. How do these fit into our special relativistic framework? The answer is simple. Each of these waves requires an underlying medium that is at rest in one particular frame (not necessarily inertial), and the velocity of the wave, specifically the group velocity, is most simply calculated in this frame from the wave's and medium's fundamental laws. We can then use the kinematic rules of Lorentz transformations to compute the velocity in another frame. However, if we had chosen to compute the wave speed in the second frame directly, using the same fundamental laws, we would have gotten the same answer, albeit perhaps with greater effort. All waves are in full compliance with the principle of relativity. What is special about vacuum electromagnetic waves and, by extension, photons, is that no medium (or “ether” as it used to be called) is needed for them to propagate.
Their speed is therefore the same in all frames. This raises an interesting question. What about other waves that do not require an underlying medium? What about electron de Broglie waves? Here the fundamental wave equation, Schrödinger’s or Dirac’s, is mathematically different from Maxwell’s and contains an important parameter, the electron rest mass. This allows the fundamental laws of relativistic quantum mechanics to be written in a form that is the same in all inertial reference frames and that allows an electron, considered as either a wave or a particle, to travel at a different speed when measured in a different frame. What about non-electromagnetic waves whose quanta have vanishing rest mass? For a long while, we thought that neutrinos provided a good example, but we now know from experiment that their rest masses are non-zero. However, there are other particles that have not yet been detected, including photinos (the hypothesized, supersymmetric partners to photons) and gravitons (and their associated gravitational waves, which we shall discuss in Chapter 26), that are believed to exist without a rest mass (or an ether!), just like photons. Must these travel at the same speed as photons? The answer to this question, according to the principle of relativity, is “yes”. The reason is simple. Suppose there were two such waves (or particles) whose governing laws led to different speeds, c and c′ < c, each the same in all reference frames. If we then move with speed c′ in the direction of propagation of the second wave, we would bring it to rest, in conflict with our hypothesis that its speed is frame-independent. Therefore all signals whose governing laws require them to travel with a speed that has no governing parameters (no rest mass and no underlying medium with physical properties) must travel with a unique speed, which we call “c”. The speed of light

is more fundamental to relativity than light itself!

****************************

EXERCISES

Exercise 1.1 Practice: [R] Geometrized Units
Convert the following equations from the geometrized units in which they are written to cgs/Gaussian units:
(a) The “Planck time” $t_P$ expressed in terms of Newton’s gravitation constant $G$ and Planck’s constant $\hbar$, $t_P = \sqrt{G\hbar}$. What is the numerical value of $t_P$ in seconds? in meters?
(b) The Lorentz force law $m\,d\mathbf v/dt = e(\mathbf E + \mathbf v\times\mathbf B)$.
(c) The expression $\mathbf p = \hbar\omega\mathbf n$ for the momentum $\mathbf p$ of a photon in terms of its angular frequency $\omega$ and direction $\mathbf n$ of propagation.
How tall are you, in seconds? How old are you, in centimeters?

Exercise 1.2 Derivation and Example: [R] Invariance of the Interval
Complete the derivation of the invariance of the interval given in the text [Eqs. (1.6)], using the principle of relativity in the form that the laws of physics must be the same in the primed and unprimed frames. Hints, if you need them:
(a) Having carried out the construction shown in Fig. 1.5 in the unprimed frame, use the same mirror and photons for the analogous construction in the primed frame. Argue that, independently of the frame in which the mirror is at rest (unprimed or primed), the fact that the reflected photon has (angle of reflection) = (angle of incidence) in the primed frame implies that this is also true for this same photon in the unprimed frame. Thereby conclude that the construction leads to Eq. (1.6b) as well as to (1.6a).
(b) Then argue that the perpendicular distance of an event from the common x and x′ axis must be the same in the two reference frames, so $h' = h$ and $\Delta y' = \Delta y$; whence Eqs. (1.6b) and (1.6a) imply the invariance of the interval.
[For a leisurely version of this argument, see Secs. 3.6 and 3.7 of Taylor and Wheeler (1992).]

****************************


1.3 [N & R] Tensor Algebra Without a Coordinate System

We now pause in our development of the geometric view of physical laws to introduce, in a coordinate-free way, some fundamental concepts of differential geometry: tensors, the inner product, the metric tensor, the tensor product, and contraction of tensors. In this section we shall allow the space in which the concepts live to be either 4-dimensional Minkowski spacetime or 3-dimensional Euclidean space; we shall denote its dimensionality by N; and we shall use spacetime's arrowed notation $\vec A$ for vectors even though the space might be Euclidean 3-space.

We have already defined a vector $\vec A$ as a straight arrow from one point, say P, in our space to another, say Q. Because our space is flat, there is a unique and obvious way to transport such an arrow from one location to another, keeping its length and direction unchanged.⁵ Accordingly, we shall regard vectors as unchanged by such transport. This enables us to ignore the issue of where in space a vector actually resides; it is completely determined by its direction and its length.

Fig. 1.6: A rank-3 tensor T.

A rank-n tensor T is, by definition, a real-valued, linear function of n vectors. Pictorially we shall regard T as a box (Fig. 1.6) with n slots in its top, into which are inserted n vectors, and one slot in its end, out of which rolls computer paper with a single real number printed on it: the value that the tensor T has when evaluated as a function of the n inserted vectors. Notationally we shall denote the tensor by a bold-face sans-serif character T:

$$\mathsf T(\underbrace{\ \ ,\ \ ,\ \ ,\ \ }_{n\ \text{slots in which to put the vectors}}) . \qquad (1.8a)$$

If T is a rank-3 tensor (has 3 slots) as in Fig. 1.6, then its value on the vectors $\vec A, \vec B, \vec C$ will be denoted $\mathsf T(\vec A, \vec B, \vec C)$. Linearity of this function can be expressed as

$$\mathsf T(e\vec E + f\vec F, \vec B, \vec C) = e\,\mathsf T(\vec E, \vec B, \vec C) + f\,\mathsf T(\vec F, \vec B, \vec C) , \qquad (1.8b)$$

where e and f are real numbers, and similarly for the second and third slots.

We have already defined the squared length $(\vec A)^2 \equiv \vec A^{\,2}$ of a vector $\vec A$ as the squared distance (in 3-space) or interval (in spacetime) between the points at its tail and its tip. The inner product $\vec A\cdot\vec B$ of two vectors is defined in terms of the squared length by

$$\vec A\cdot\vec B \equiv \tfrac14\left[(\vec A + \vec B)^2 - (\vec A - \vec B)^2\right] . \qquad (1.9a)$$

⁵ This is not so in curved spaces, as we shall see in Sec. 24.7.

In Euclidean space this is the standard inner product, familiar from elementary geometry.

Because the inner product $\vec A\cdot\vec B$ is a linear function of each of its vectors, we can regard it as a tensor of rank 2. When so regarded, the inner product is denoted g( , ) and is called the metric tensor. In other words, the metric tensor g is that linear function of two vectors whose value is given by

$$\mathsf g(\vec A, \vec B) \equiv \vec A\cdot\vec B . \qquad (1.9b)$$

Notice that, because $\vec A\cdot\vec B = \vec B\cdot\vec A$, the metric tensor is symmetric in its two slots; i.e., one gets the same real number independently of the order in which one inserts the two vectors into the slots:

$$\mathsf g(\vec A, \vec B) = \mathsf g(\vec B, \vec A) . \qquad (1.9c)$$

With the aid of the inner product, we can regard any vector $\vec A$ as a tensor of rank one: the real number that is produced when an arbitrary vector $\vec C$ is inserted into $\vec A$'s slot is

$$\vec A(\vec C) \equiv \vec A\cdot\vec C . \qquad (1.9d)$$
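The polarization identity (1.9a) means that the squared length alone determines the whole metric. A minimal numerical sketch, assuming Minkowski spacetime with Lorentz-frame components and the sign convention of Eq. (1.4a) (the component machinery itself is only introduced in Sec. 1.5); the random test vectors and function names are illustrative.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # Minkowski metric in an orthonormal (Lorentz) basis


def squared_length(A):
    """Squared length (interval) of a vector with Lorentz-frame components (A^0, A^1, A^2, A^3)."""
    return A @ ETA @ A


def inner_product(A, B):
    """Eq. (1.9a): A.B = [(A+B)^2 - (A-B)^2] / 4, built only from squared lengths."""
    return 0.25 * (squared_length(A + B) - squared_length(A - B))


def g(A, B):
    """The metric tensor viewed as a bilinear function of two vectors, Eq. (1.9b)."""
    return inner_product(A, B)


rng = np.random.default_rng(0)
A, B, C = rng.normal(size=(3, 4))
print(g(A, B), g(B, A))                                          # symmetric, Eq. (1.9c)
print(g(A, 2.0 * B + 3.0 * C), 2.0 * g(A, B) + 3.0 * g(A, C))    # linear in the second slot
```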

Second-rank tensors appear frequently in the laws of physics, often in roles where one sticks a single vector into the second slot and leaves the first slot empty, thereby producing a single-slotted entity, a vector. A familiar example is a rigid body's (Newtonian) moment-of-inertia tensor I( , ). Insert the body's angular velocity vector $\boldsymbol\Omega$ into the second slot, and you get the body's angular momentum vector $\mathbf J(\ ) = \mathsf I(\ , \boldsymbol\Omega)$. Other examples are the stress tensor of a solid, a fluid, a plasma or a field (Sec. 1.12 below) and the electromagnetic field tensor (Secs. 1.4.3 and 1.10 below).

From three (or any number of) vectors $\vec A, \vec B, \vec C$ we can construct a tensor, their tensor product (also called outer product in contradistinction to the inner product $\vec A\cdot\vec B$), defined as follows:

$$\vec A\otimes\vec B\otimes\vec C(\vec E, \vec F, \vec G) \equiv \vec A(\vec E)\,\vec B(\vec F)\,\vec C(\vec G) = (\vec A\cdot\vec E)(\vec B\cdot\vec F)(\vec C\cdot\vec G) . \qquad (1.10a)$$

Here the first expression is the notation for the value of the new tensor, $\vec A\otimes\vec B\otimes\vec C$, evaluated on the three vectors $\vec E, \vec F, \vec G$; the middle expression is the ordinary product of three real numbers, the value of $\vec A$ on $\vec E$, the value of $\vec B$ on $\vec F$, and the value of $\vec C$ on $\vec G$; and the third expression is that same product with the three numbers rewritten as scalar products. Similar definitions can be given (and should be obvious) for the tensor product of any two or more tensors of any rank; for example, if T has rank 2 and S has rank 3, then

$$\mathsf T\otimes\mathsf S(\vec E, \vec F, \vec G, \vec H, \vec J) \equiv \mathsf T(\vec E, \vec F)\,\mathsf S(\vec G, \vec H, \vec J) . \qquad (1.10b)$$
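In the coordinate-free spirit of this section, a tensor can be modeled directly as a multilinear function. The sketch below is only an illustration: it assumes Euclidean 3-space (so $\vec A(\vec C)$ is the ordinary dot product), represents each tensor as a hypothetical (function, rank) pair, and builds the tensor product exactly as in Eqs. (1.10a,b), by evaluating each factor on its own group of slots and multiplying the resulting real numbers.

```python
import numpy as np


def as_tensor(A):
    """View a Euclidean 3-vector as a rank-1 tensor, A(C) = A . C  [Eq. (1.9d)]."""
    return (lambda C: float(np.dot(A, C)), 1)


def tensor_product(*factors):
    """Tensor product of (function, rank) pairs  [Eqs. (1.10a,b)].

    Each factor is evaluated on its own consecutive group of slots,
    and the resulting real numbers are multiplied together.
    """
    total_rank = sum(rank for _, rank in factors)

    def T(*vectors):
        assert len(vectors) == total_rank, "wrong number of vectors for the slots"
        value, start = 1.0, 0
        for func, rank in factors:
            value *= func(*vectors[start:start + rank])
            start += rank
        return value

    return (T, total_rank)


rng = np.random.default_rng(1)
A, B, C, E, F, G = rng.normal(size=(6, 3))
ABC, rank = tensor_product(as_tensor(A), as_tensor(B), as_tensor(C))
lhs = ABC(E, F, G)
rhs = np.dot(A, E) * np.dot(B, F) * np.dot(C, G)   # right side of Eq. (1.10a)
print(rank, lhs, rhs)
```

Because a product tensor is again a (function, rank) pair, it can itself be a factor, which is how a rank-2 T and a rank-3 S combine into the rank-5 tensor of Eq. (1.10b).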

One last geometric (i.e. frame-independent) concept we shall need is contraction. We shall illustrate this concept first by a simple example, then give the general definition. From two vectors $\vec A$ and $\vec B$ we can construct the tensor product $\vec A\otimes\vec B$ (a second-rank tensor), and we can also construct the scalar product $\vec A\cdot\vec B$ (a real number, i.e. a scalar, i.e. a rank-0 tensor). The process of contraction is the construction of $\vec A\cdot\vec B$ from $\vec A\otimes\vec B$:

$$\mathrm{contraction}(\vec A\otimes\vec B) \equiv \vec A\cdot\vec B . \qquad (1.11a)$$

One can show fairly easily using component techniques (Sec. 1.5 below) that any second-rank tensor T can be expressed as a sum of tensor products of vectors, $\mathsf T = \vec A\otimes\vec B + \vec C\otimes\vec D + \ldots$; correspondingly, it is natural to define the contraction of T to be $\mathrm{contraction}(\mathsf T) = \vec A\cdot\vec B + \vec C\cdot\vec D + \ldots$. Note that this contraction process lowers the rank of the tensor by two, from 2 to 0. Similarly, for a tensor of rank n one can construct a tensor of rank n − 2 by contraction, but in this case one must specify which slots are to be contracted. For example, if T is a third-rank tensor, expressible as $\mathsf T = \vec A\otimes\vec B\otimes\vec C + \vec E\otimes\vec F\otimes\vec G + \ldots$, then the contraction of T on its first and third slots is the rank-1 tensor (vector)

$$\text{1\&3contraction}(\vec A\otimes\vec B\otimes\vec C + \vec E\otimes\vec F\otimes\vec G + \ldots) \equiv (\vec A\cdot\vec C)\,\vec B + (\vec E\cdot\vec G)\,\vec F + \ldots . \qquad (1.11b)$$
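Anticipating the component tools of Sec. 1.5, Eq. (1.11b) can be checked numerically. This sketch assumes Euclidean 3-space with orthonormal-basis components (so no metric factors appear), uses random sample vectors, and relies on NumPy's `einsum`, where a repeated label in one operand takes a diagonal and sums over it.

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C, E, F, G = rng.normal(size=(6, 3))   # Euclidean 3-vectors, orthonormal-basis components

# Components of T = A⊗B⊗C + E⊗F⊗G:  T_ijk = A_i B_j C_k + E_i F_j G_k
T = np.einsum("i,j,k->ijk", A, B, C) + np.einsum("i,j,k->ijk", E, F, G)

# Contraction on the first and third slots: set those indices equal and sum -> a vector
contracted = np.einsum("iji->j", T)

expected = np.dot(A, C) * B + np.dot(E, G) * F   # right side of Eq. (1.11b)
print(np.allclose(contracted, expected))           # True
```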

All the concepts developed in this section (vectors, tensors, metric tensor, inner product, tensor product, and contraction of a tensor) can be carried over, with no change whatsoever, into any vector space⁶ that is endowed with a concept of squared length.

1.4 Particle Kinetics and Lorentz Force Without a Reference Frame

In this section we shall illustrate our geometric viewpoint by formulating the laws of motion for particles, first in Newtonian physics and then in special relativity.

1.4.1 [N] Newtonian Particle Kinetics

In Newtonian physics, a classical particle moves through Euclidean 3-space as universal time t passes. At time t it is located at some point $\mathbf x(t)$ (its position). The function $\mathbf x(t)$ represents a curve in 3-space, the particle's trajectory. The particle's velocity $\mathbf v(t)$ is the time derivative of its position, its momentum $\mathbf p(t)$ is the product of its mass m and velocity, and its acceleration $\mathbf a(t)$ is the time derivative of its velocity:

$$\mathbf v(t) = d\mathbf x/dt , \qquad \mathbf p(t) = m\mathbf v(t) , \qquad \mathbf a(t) = d\mathbf v/dt = d^2\mathbf x/dt^2 . \qquad (1.12)$$

Since points in 3-space are geometric objects (defined independently of any coordinate system), so also are the trajectory $\mathbf x(t)$, the velocity, the momentum, and the acceleration. (Physically, of course, the velocity has an ambiguity; it depends on one's standard of rest.) Newton's second law of motion states that the particle's momentum can change only if a force $\mathbf F$ acts on it, and that its change is given by

$$d\mathbf p/dt = m\mathbf a = \mathbf F . \qquad (1.13)$$

If the force is produced by an electric field $\mathbf E$ and magnetic field $\mathbf B$, then this law of motion takes the familiar Lorentz-force form

$$d\mathbf p/dt = q(\mathbf E + \mathbf v\times\mathbf B) \qquad (1.14)$$

(here we have used the vector cross product, which will not be introduced formally until Sec. 1.9 below). Obviously, these laws of motion are geometric relationships between geometric objects.

⁶ Or, more precisely, any vector space over the real numbers. If the vector space's scalars are complex numbers, as in quantum mechanics, then slight changes are needed.

1.4.2 [R] Relativistic Particle Kinetics: World Lines, 4-Velocity, 4-Momentum and its Conservation, 4-Force

In special relativity, a particle moves through 4-dimensional spacetime along a curve (its world line) which we shall denote, in frame-independent notation, by $\vec x(\tau)$. Here τ is time as measured by an ideal clock that the particle carries (the particle's proper time), and $\vec x$ is the location of the particle in spacetime when its clock reads τ (or, equivalently, the vector from the arbitrary origin to that location). The particle typically will experience an acceleration as it moves, e.g., an acceleration produced by an external electromagnetic field. This raises the question of how the acceleration affects the ticking rate of the particle's clock. We define the accelerated clock to be ideal if its ticking rate is totally unaffected by its acceleration, i.e., if it ticks at the same rate as a freely moving (inertial) ideal clock that is momentarily at rest with respect to it. The builders of inertial guidance systems for airplanes and missiles always try to make their clocks as acceleration-independent, i.e., as ideal, as possible.

We shall refer to the inertial frame in which a particle is momentarily at rest as its momentarily comoving inertial frame or momentary rest frame. Now, the particle's clock (which measures τ) is ideal and so are the inertial frame's clocks (which measure coordinate time t). Therefore, a tiny interval $\Delta\tau$ of the particle's proper time is equal to the lapse of coordinate time in the particle's momentary rest frame, $\Delta\tau = \Delta t$. Moreover, since the two events $\vec x(\tau)$ and $\vec x(\tau + \Delta\tau)$ on the clock's world line occur at the same spatial location in its momentary rest frame, $\Delta x^i = 0$ (where i = 1, 2, 3), the invariant interval between those events is $(\Delta s)^2 = -(\Delta t)^2 + \sum_{i,j}\Delta x^i\Delta x^j\delta_{ij} = -(\Delta t)^2 = -(\Delta\tau)^2$. This shows that the particle's proper time τ is equal to the square root of minus the invariant interval, $\tau = \sqrt{-s^2}$, along its world line.
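Because τ is built from the invariant interval, it can be computed in any Lorentz frame by adding up $\sqrt{-(\Delta s)^2}$ over short segments of the world line. A minimal sketch, assuming a particle in hyperbolic motion (constant proper acceleration g, an illustrative choice); the closed form $\tau = \sinh^{-1}(gt)/g$ used for comparison is a standard result quoted here, not derived in the text.

```python
import numpy as np


def proper_time(t, x, y, z):
    """Sum sqrt(-(Delta s)^2) over short segments of a sampled timelike world line (c = 1)."""
    dt, dx, dy, dz = (np.diff(a) for a in (t, x, y, z))
    ds2 = -dt**2 + dx**2 + dy**2 + dz**2
    assert np.all(ds2 < 0), "every segment must be timelike"
    return np.sum(np.sqrt(-ds2))


g = 1.0                                    # proper acceleration (illustrative)
t = np.linspace(0.0, 3.0, 200_001)         # coordinate time in the chosen Lorentz frame
x = np.sqrt(1.0 / g**2 + t**2) - 1.0 / g   # hyperbolic world line, starting at rest at x = 0
zeros = np.zeros_like(t)

tau_numeric = proper_time(t, x, zeros, zeros)
tau_exact = np.arcsinh(g * t[-1]) / g      # closed form for this world line
print(tau_numeric, tau_exact)              # the accelerated clock records less than t = 3
```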

Fig. 1.7: Spacetime diagram showing the world line $\vec x(\tau)$ and 4-velocity $\vec u$ of an accelerated particle. Note that the 4-velocity is tangent to the world line.

Figure 1.7 shows the world line of the accelerated particle in a spacetime diagram where the axes are coordinates of an arbitrary Lorentz frame. This diagram is intended to emphasize

the world line as a frame-independent, geometric object. Also shown in the figure is the particle's 4-velocity $\vec u$, which (by analogy with the velocity in 3-space) is the time derivative of its position:

$$\vec u \equiv d\vec x/d\tau . \qquad (1.15)$$

This derivative is defined by the usual limiting process

$$\frac{d\vec x}{d\tau} \equiv \lim_{\Delta\tau\to 0}\frac{\vec x(\tau + \Delta\tau) - \vec x(\tau)}{\Delta\tau} . \qquad (1.16)$$

The squared length of the particle's 4-velocity is easily seen to be −1:

$$\vec u^{\,2} \equiv \mathsf g(\vec u, \vec u) = \frac{d\vec x}{d\tau}\cdot\frac{d\vec x}{d\tau} = \frac{d\vec x\cdot d\vec x}{(d\tau)^2} = -1 . \qquad (1.17)$$

The last equality follows from the fact that $d\vec x\cdot d\vec x$ is the squared length of $d\vec x$, which equals the invariant interval $(\Delta s)^2$ along it, and $(d\tau)^2$ is minus that invariant interval. The particle's 4-momentum is the product of its 4-velocity and rest mass:

$$\vec p \equiv m\vec u = m\,d\vec x/d\tau \equiv d\vec x/d\zeta . \qquad (1.18)$$
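A quick numerical check of Eqs. (1.17) and (1.18). It assumes, as follows from the momentary-rest-frame argument above, that $d\tau = \sqrt{1 - v^2}\,dt$ in a Lorentz frame where the particle has ordinary velocity $\mathbf v$, so that $\vec u$ has components $\gamma(1, \mathbf v)$ with $\gamma = 1/\sqrt{1 - v^2}$; the velocity and mass values are arbitrary.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # Lorentz-frame metric components (c = 1)


def four_velocity(v):
    """u = dx/dtau for ordinary 3-velocity v with |v| < 1.

    Uses dtau = sqrt(1 - v^2) dt, so u = gamma (1, v) with gamma = 1/sqrt(1 - v^2).
    """
    v = np.asarray(v, dtype=float)
    gamma = 1.0 / np.sqrt(1.0 - v @ v)
    return gamma * np.concatenate(([1.0], v))


m = 2.5                                  # rest mass (arbitrary units)
u = four_velocity([0.3, -0.4, 0.5])
p = m * u                                # Eq. (1.18): p = m u
print(u @ ETA @ u, p @ ETA @ p)          # -1 and -m^2 = -6.25, up to rounding
```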

In Eq. (1.18) the parameter ζ is a renormalized version of proper time,

$$\zeta \equiv \tau/m . \qquad (1.19)$$

This ζ, and any other renormalized version of proper time with position-independent renormalization factor, are called affine parameters for the particle's world line. Expression (1.18), together with the unit length of the 4-velocity, $\vec u^{\,2} = -1$, implies that the squared length of the 4-momentum is

$$\vec p^{\,2} = -m^2 . \qquad (1.20)$$

In quantum theory a particle is described by a relativistic wave function which, in the geometric optics limit (Chapter 6), has a wave vector $\vec k$ that is related to the classical particle's 4-momentum by

$$\vec k = \vec p/\hbar . \qquad (1.21)$$

The above formalism is valid only for particles with nonzero rest mass, $m \neq 0$. The corresponding formalism for a particle with zero rest mass (e.g. a photon or a graviton⁷) can be obtained from the above by taking the limit as $m \to 0$ and $d\tau \to 0$ with the quotient $d\zeta = d\tau/m$ held finite. More specifically, the 4-momentum of a zero-rest-mass particle is well defined (and participates in the conservation law to be discussed below), and it is expressible in terms of the particle's affine parameter ζ by Eq. (1.18):

$$\vec p = \frac{d\vec x}{d\zeta} . \qquad (1.22)$$

⁷ We do not know for sure that photons and gravitons are massless, but the laws of physics as currently understood require them to be massless, and there are tight experimental limits on their rest masses.

Fig. 1.8: Spacetime diagram depicting the law of 4-momentum conservation for a situation where two particles, numbered 1 and 2, enter an interaction region V in spacetime, there interact strongly, and produce two new particles, numbered $\bar 1$ and $\bar 2$. The sum of the final 4-momenta, $\vec p_{\bar 1} + \vec p_{\bar 2}$, must be equal to the sum of the initial 4-momenta, $\vec p_1 + \vec p_2$.

However, the particle's 4-velocity $\vec u = \vec p/m$ is infinite and thus undefined; and proper time $\tau = m\zeta$ ticks vanishingly slowly along its world line and thus is undefined. Because proper time is the square root of minus the invariant interval along the world line, the interval between two neighboring points on the world line vanishes identically; correspondingly, the world line of a zero-rest-mass particle is null. (By contrast, since $d\tau^2 > 0$ and $ds^2 < 0$ along the world line of a particle with finite rest mass, the world line of a finite-rest-mass particle is timelike.)

The 4-momenta of particles are important because of the law of conservation of 4-momentum (which, as we shall see in Sec. 1.6, is equivalent to the conservation laws for energy and ordinary momentum): If a number of “initial” particles, named $A = 1, 2, 3, \ldots$, enter a restricted region of spacetime V and there interact strongly to produce a new set of “final” particles, named $\bar A = \bar 1, \bar 2, \bar 3, \ldots$ (Fig. 1.8), then the total 4-momentum of the final particles must be the same as the total 4-momentum of the initial ones:

$$\sum_{\bar A}\vec p_{\bar A} = \sum_{A}\vec p_A . \qquad (1.23)$$
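As an illustration of (1.23), consider a hypothetical particle of rest mass m, at rest, that decays into two photons emitted back to back, each carrying energy m/2 (arbitrary units; no real particle data are used). The sketch checks that the photons' 4-momenta are null, that their sum equals the initial 4-momentum, and that the equality survives a change of frame; the boost matrix along x is the standard one, assumed here rather than derived.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # Lorentz-frame metric components (c = 1)


def boost_x(beta):
    """Lorentz boost along x with speed beta (standard form, assumed here)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * beta
    return L


m = 1.0                                   # rest mass of the decaying particle (arbitrary units)
n = np.array([0.6, 0.8, 0.0])             # unit vector along which photon 1 leaves
p_initial = [np.array([m, 0.0, 0.0, 0.0])]            # the particle, at rest
p_final = [0.5 * m * np.concatenate(([1.0], n)),      # photon 1: null 4-momentum
           0.5 * m * np.concatenate(([1.0], -n))]     # photon 2, back to back

L = boost_x(0.8)
for frame, transform in (("rest frame", np.eye(4)), ("boosted frame", L)):
    total_in = sum(transform @ p for p in p_initial)
    total_out = sum(transform @ p for p in p_final)
    print(frame, np.allclose(total_in, total_out))    # Eq. (1.23) holds in both frames
```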

Note that this law of 4-momentum conservation is expressed in frame-independent, geometric language, in accord with Einstein's insistence that all the laws of physics should be so expressible. As we shall see in Part VI, momentum conservation is a consequence of the translation symmetry of flat, 4-dimensional spacetime. In general relativity's curved spacetime, where that translation symmetry is lost, we lose momentum conservation except under special circumstances; see Sec. 24.9.4.

If a particle moves freely (no external forces and no collisions with other particles), then its 4-momentum $\vec p$ will be conserved along its world line, $d\vec p/d\zeta = 0$. Since $\vec p$ is tangent to the world line, this means that the direction of the world line never changes; i.e., the free particle moves along a straight line through spacetime. To change the particle's 4-momentum, one must act on it with a 4-force $\vec F$:

$$d\vec p/d\tau = \vec F . \qquad (1.24)$$

If the particle is a fundamental one (e.g., photon, electron, proton), then the 4-force must leave its rest mass unchanged,

$$0 = dm^2/d\tau = -d\vec p^{\,2}/d\tau = -2\vec p\cdot d\vec p/d\tau = -2\vec p\cdot\vec F ; \qquad (1.25)$$

i.e., the 4-force must be orthogonal to the 4-momentum.

1.4.3 [R] Geometric Derivation of the Lorentz Force Law

As an illustration of these physical concepts and mathematical tools, we shall use them to deduce the relativistic version of the Lorentz force law. From the outset, in accord with the principle of relativity, we insist that the law we seek be expressible in geometric, frame-independent language, i.e. in terms of vectors and tensors.

Consider a particle with charge q and rest mass $m \neq 0$, interacting with an electromagnetic field. It experiences an electromagnetic 4-force whose mathematical form we seek. The Newtonian version of the electromagnetic force $\mathbf F = q(\mathbf E + \mathbf v\times\mathbf B)$ is proportional to q and contains one piece (electric) that is independent of velocity $\mathbf v$, and a second piece (magnetic) that is linear in $\mathbf v$. It is reasonable to expect that, in order to produce this Newtonian limit, the relativistic 4-force $\vec F$ will be proportional to q and will be linear in the 4-velocity $\vec u$. Linearity means there must exist some second-rank tensor F( , ), the “electromagnetic field tensor”, such that

$$d\vec p/d\tau = \vec F(\ ) = q\,\mathsf F(\ , \vec u) . \qquad (1.26)$$

Because the 4-force $\vec F$ must be orthogonal to the particle's 4-momentum and thence also to its 4-velocity, $\vec F\cdot\vec u \equiv \vec F(\vec u) = 0$, expression (1.26) must vanish when $\vec u$ is inserted into its empty slot. In other words, for all timelike unit-length vectors $\vec u$,

$$\mathsf F(\vec u, \vec u) = 0 . \qquad (1.27)$$

It is an instructive exercise (Ex. 1.3) to show that this is possible only if F is antisymmetric, so the electromagnetic 4-force is

$$d\vec p/d\tau = q\,\mathsf F(\ , \vec u) , \quad\text{where}\quad \mathsf F(\vec A, \vec B) = -\mathsf F(\vec B, \vec A)\ \text{for all}\ \vec A\ \text{and}\ \vec B . \qquad (1.28)$$
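The step from (1.27) to (1.28) can be illustrated numerically. This sketch anticipates the component notation of Sec. 1.5: it represents F by its components $F_{\alpha\beta} = \mathsf F(\vec e_\alpha, \vec e_\beta)$ in a Lorentz basis, draws a random tensor, and shows that its antisymmetric part yields a 4-force orthogonal to $\vec u$, while its symmetric part generally does not satisfy (1.27). It illustrates sufficiency only (the necessity argument is Ex. 1.3), and all names and numbers are hypothetical.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # Lorentz-frame metric components (c = 1)
rng = np.random.default_rng(3)


def unit_timelike(v):
    """Components of a unit timelike vector u = gamma (1, v), |v| < 1."""
    v = np.asarray(v, dtype=float)
    return np.concatenate(([1.0], v)) / np.sqrt(1.0 - v @ v)


def value(F, A, B):
    """F(A, B) = F_ab A^a B^b, with F_ab = F(e_a, e_b) and A = A^a e_a."""
    return A @ F @ B


# Split a random second-rank tensor into antisymmetric and symmetric parts.
M = rng.normal(size=(4, 4))
F_anti = 0.5 * (M - M.T)       # F(A, B) = -F(B, A)
F_sym = 0.5 * (M + M.T)        # symmetric part, for contrast

u = unit_timelike([0.2, -0.1, 0.3])
q = 1.0
force_cov = q * F_anti @ u     # components F_a = q F_ab u^b of the 4-force q F( , u)
print(force_cov @ u)                             # F.u: zero up to rounding
print(value(F_anti, u, u), value(F_sym, u, u))   # antisymmetric part gives 0; symmetric generally not
```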

Equation (1.28) must be the relativistic form of the Lorentz force law. In Sec. 1.10 below, we shall deduce the relationship of the electromagnetic field tensor F to the more familiar electric and magnetic fields, and the relationship of this relativistic Lorentz force to its Newtonian form (1.14).

This discussion of particle kinematics and the electromagnetic force is elegant, but perhaps unfamiliar. In Secs. 1.6 and 1.10 we shall see that it is equivalent to the more elementary (but more complex) formalism based on components of vectors.

****************************

EXERCISES

Exercise 1.3 Derivation and Example: [R] Antisymmetry of Electromagnetic Field Tensor
Show that Eq. (1.27) can be true for all timelike, unit-length vectors $\vec u$ if and only if F is antisymmetric. [Hints: (i) Show that the most general second-rank tensor F can be written as the sum of a symmetric tensor S and an antisymmetric tensor A, and that the antisymmetric piece contributes nothing to Eq. (1.27). (ii) Let $\vec B$ and $\vec C$ be any two vectors such that $\vec B + \vec C$ and $\vec B - \vec C$ are both timelike; show that $\mathsf S(\vec B, \vec C) = 0$. (iii) Convince yourself (if necessary using the component tools developed in the next section) that this result, together with the 4-dimensionality of spacetime and the large arbitrariness inherent in the choice of $\vec B$ and $\vec C$, implies S vanishes (i.e., it gives zero when any two vectors are inserted into its slots).]

Exercise 1.4 [R] Problem: Relativistic Gravitational Force Law
In Newtonian theory the gravitational potential Φ exerts a force $\mathbf F = d\mathbf p/dt = -m\nabla\Phi$ on a particle with mass m and momentum $\mathbf p$. Before Einstein formulated general relativity, some physicists constructed relativistic theories of gravity in which a Newtonian-like scalar gravitational field Φ exerted a 4-force $\vec F = d\vec p/d\tau$ on any particle with rest mass m, 4-velocity $\vec u$ and 4-momentum $\vec p = m\vec u$. What must that force law have been, in order to (i) obey the principle of relativity, (ii) reduce to Newton's law in the non-relativistic limit, and (iii) preserve the particle's rest mass as time passes?

****************************

1.5 Component Representation of Tensor Algebra

1.5.1 [N] Euclidean 3-space

In the Euclidean 3-space of Newtonian physics, there is a unique set of orthonormal basis vectors $\{\mathbf e_x, \mathbf e_y, \mathbf e_z\} \equiv \{\mathbf e_1, \mathbf e_2, \mathbf e_3\}$ associated with any Cartesian coordinate system $\{x, y, z\} \equiv \{x_1, x_2, x_3\} \equiv \{x^1, x^2, x^3\}$. [In Cartesian coordinates in Euclidean space, we will usually place indices down, but occasionally we will place them up. It doesn't matter. By definition, in Cartesian coordinates a quantity is the same whether its index is down or up.] The basis vector $\mathbf e_j$ points along the $x_j$ coordinate direction, which is orthogonal to all the other coordinate directions, and it has unit length, so

$$\mathbf e_j\cdot\mathbf e_k = \delta_{jk} . \qquad (1.29a)$$

Any vector $\mathbf A$ in 3-space can be expanded in terms of this basis,

$$\mathbf A = A_j\mathbf e_j . \qquad (1.29b)$$

Here and throughout this book, we adopt the Einstein summation convention: repeated indices (in this case j) are to be summed (in this 3-space case over j = 1, 2, 3).

Fig. 1.9: (a) The orthonormal basis vectors $\mathbf e_j$ associated with a Euclidean coordinate system in 3-space; (b) the orthonormal basis vectors $\vec e_\alpha$ associated with an inertial (Lorentz) reference frame in Minkowski spacetime.

By virtue of the orthonormality of the basis, the components $A_j$ of $\mathbf A$ can be computed as the scalar product

$$A_j = \mathbf A\cdot\mathbf e_j . \qquad (1.29c)$$

(The proof of this is straightforward: $\mathbf A\cdot\mathbf e_j = (A_k\mathbf e_k)\cdot\mathbf e_j = A_k(\mathbf e_k\cdot\mathbf e_j) = A_k\delta_{kj} = A_j$.) Any tensor, say the third-rank tensor T( , , ), can be expanded in terms of tensor products of the basis vectors:

$$\mathsf T = T_{ijk}\,\mathbf e_i\otimes\mathbf e_j\otimes\mathbf e_k . \qquad (1.29d)$$

The components $T_{ijk}$ of T can be computed from T and the basis vectors by the generalization of Eq. (1.29c):

$$T_{ijk} = \mathsf T(\mathbf e_i, \mathbf e_j, \mathbf e_k) . \qquad (1.29e)$$

(This equation can be derived using the orthonormality of the basis in the same way as Eq. (1.29c) was derived.) As an important example, the components of the metric are $g_{jk} = \mathsf g(\mathbf e_j, \mathbf e_k) = \mathbf e_j\cdot\mathbf e_k = \delta_{jk}$ [where the first equality is the method (1.29e) of computing tensor components, the second is the definition (1.9b) of the metric, and the third is the orthonormality relation (1.29a)]:

$$g_{jk} = \delta_{jk}\quad\text{in any orthonormal basis in 3-space.} \qquad (1.29f)$$

In Part VI we shall often use bases that are not orthonormal; in such bases, the metric components will not be $\delta_{jk}$. The components of a tensor product, e.g. $\mathbf{T}(\ ,\ ,\ )\otimes\mathbf{S}(\ ,\ )$, are easily deduced by inserting the basis vectors into the slots [Eq. (1.29e)]. They are $\mathbf{T}\otimes\mathbf{S}(\mathbf{e}_i,\mathbf{e}_j,\mathbf{e}_k,\mathbf{e}_l,\mathbf{e}_m) = \mathbf{T}(\mathbf{e}_i,\mathbf{e}_j,\mathbf{e}_k)\,\mathbf{S}(\mathbf{e}_l,\mathbf{e}_m) = T_{ijk}S_{lm}$ [cf. Eq. (1.10a)]. In words, the components of a tensor product are equal to the ordinary arithmetic product of the components of the individual tensors. In component notation, the inner product of two vectors and the value of a tensor when vectors are inserted into its slots are given by

$$\mathbf{A}\cdot\mathbf{B} = A_jB_j , \qquad \mathbf{T}(\mathbf{A},\mathbf{B},\mathbf{C}) = T_{ijk}A_iB_jC_k , \tag{1.29g}$$
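The component rules for tensor products and for Eq. (1.29g) map directly onto `numpy.einsum` subscripts. In those subscripts, a repeated letter is summed exactly as in the Einstein convention. Here is a short sketch with random illustrative components, assuming NumPy:

```python
import numpy as np

# Illustrative components in some orthonormal basis (random numbers).
rng = np.random.default_rng(1)
T = rng.normal(size=(3, 3, 3))             # T_ijk
S = rng.normal(size=(3, 3))                # S_lm
A, B, C, D, E = rng.normal(size=(5, 3))    # five vectors, one per row

# Components of T (x) S: the ordinary products T_ijk S_lm (a rank-5 array).
TS = np.einsum("ijk,lm->ijklm", T, S)

# Eq. (1.29g): inner product, and the value of T with vectors in its slots.
A_dot_B = np.einsum("j,j->", A, B)                 # A_j B_j
T_ABC = np.einsum("ijk,i,j,k->", T, A, B, C)       # T_ijk A_i B_j C_k

# Inserting five vectors into T (x) S gives T(A, B, C) S(D, E), cf. Eq. (1.10a).
TS_ABCDE = np.einsum("ijklm,i,j,k,l,m->", TS, A, B, C, D, E)
S_DE = np.einsum("lm,l,m->", S, D, E)
```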

as one can easily show using previous equations. Finally, consider the contraction of a tensor [say, the fourth-rank tensor $\mathbf{R}(\ ,\ ,\ ,\ )$] on two of its slots [say, the first and third]. Its components are easily computed from the tensor's own components:

$$\text{Components of [1\&3 contraction of } \mathbf{R}] = R_{ijik} . \tag{1.29h}$$

Note that $R_{ijik}$ is summed on the $i$ index, so it has only two free indices, $j$ and $k$. It is thus the component of a second-rank tensor, as it must be if it is to represent the contraction of a fourth-rank tensor.
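The same can be done for the contraction (1.29h). This sketch assumes NumPy and uses random illustrative components. It computes the contraction with `einsum` and with explicit loops. It then recomputes the contraction in a second, randomly rotated orthonormal basis, to check that the result transforms like a second-rank tensor:

```python
import numpy as np

rng = np.random.default_rng(2)
R = rng.normal(size=(3, 3, 3, 3))   # illustrative components R_ijkl

# Eq. (1.29h): contract slots 1 and 3 by summing over the repeated index i.
R13 = np.einsum("ijik->jk", R)

# The same sum written out explicitly.
R13_loop = np.zeros((3, 3))
for j in range(3):
    for k in range(3):
        R13_loop[j, k] = sum(R[i, j, i, k] for i in range(3))

# Change to another orthonormal basis (Q orthogonal, rows = new basis vectors),
# contract there, and compare with the old contraction transformed as a rank-2 tensor.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R_new = np.einsum("pi,qj,rk,sl,ijkl->pqrs", Q, Q, Q, Q, R)
R13_new = np.einsum("pqps->qs", R_new)
```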

1.5.2 [R] Minkowski Spacetime

In Minkowski spacetime, every inertial reference frame has an associated Lorentz coordinate system $\{t, x, y, z\} = \{x^0, x^1, x^2, x^3\}$, generated by the frame's rods and clocks (Fig. 1.3 and associated discussion in Sec. 1.2.2). And associated with these coordinates is a set of orthonormal basis vectors $\{\vec e_t, \vec e_x, \vec e_y, \vec e_z\} = \{\vec e_0, \vec e_1, \vec e_2, \vec e_3\}$; cf. Fig. 1.9. (The reason for putting the indices up on the coordinates but down on the basis vectors will become clear below.) The basis vector $\vec e_\alpha$ points along the $x^\alpha$ coordinate direction, which is orthogonal to all the other coordinate directions. Its squared length is $-1$ for $\alpha = 0$ (vector pointing in a timelike direction) and $+1$ for $\alpha = 1, 2, 3$ (spacelike):

$$\vec e_\alpha\cdot\vec e_\beta = \eta_{\alpha\beta} . \tag{1.30}$$

Here $\eta_{\alpha\beta}$, the orthonormality values (a spacetime analog of the Kronecker delta), are defined by

$$\eta_{00} \equiv -1 , \quad \eta_{11} \equiv \eta_{22} \equiv \eta_{33} \equiv 1 , \quad \eta_{\alpha\beta} \equiv 0 \ \text{if}\ \alpha \neq \beta . \tag{1.31}$$

The fact that $\vec e_\alpha\cdot\vec e_\beta \neq \delta_{\alpha\beta}$ prevents many of the Euclidean-space component-manipulation formulas (1.29c)–(1.29h) from holding true in Minkowski spacetime. There are two approaches to recovering these formulas.

One approach is used in many old textbooks, including the first and second editions of Goldstein's Classical Mechanics and Jackson's Classical Electrodynamics. It sets $x^0 = it$, where $i = \sqrt{-1}$, and correspondingly makes the time basis vector imaginary, so that $\vec e_\alpha\cdot\vec e_\beta = \delta_{\alpha\beta}$. When this approach is adopted, the resulting formalism does not care whether indices are placed up or down; one can place them wherever one's stomach or liver dictates without asking one's brain. However, this $x^0 = it$ approach has severe disadvantages: (i) it hides the true physical geometry of Minkowski spacetime, (ii) it cannot be extended in any reasonable manner to non-orthonormal bases in flat spacetime, and (iii) it cannot be extended in any reasonable manner to the curvilinear coordinates that one must use in general relativity.

For these reasons, most modern texts (including the third editions of Goldstein and Jackson) take an alternative approach, one always used in general relativity. This alternative, which we shall adopt, requires introducing two different types of components for vectors, and analogously for tensors:
- contravariant components, denoted by superscripts, e.g. $T^{\alpha\beta\gamma}$;
- covariant components, denoted by subscripts, e.g. $T_{\alpha\beta\gamma}$.

In Parts I–V of this book we introduce these components only for orthonormal bases; in Part VI we develop a more sophisticated version of them, valid for non-orthonormal bases. A vector or tensor's contravariant components are defined as its expansion coefficients in the chosen basis [analog of Eq. (1.29d) in Euclidean 3-space]:

$$\vec A \equiv A^\alpha\,\vec e_\alpha , \qquad \mathbf{T} \equiv T^{\alpha\beta\gamma}\,\vec e_\alpha\otimes\vec e_\beta\otimes\vec e_\gamma . \tag{1.32a}$$

Here and throughout this book, Greek (spacetime) indices are to be summed whenever they are repeated with one up and the other down. The covariant components are defined as the numbers produced by evaluating the vector or tensor on its basis vectors [analog of Eq. (1.29e) in Euclidean 3-space]:

$$A_\alpha \equiv \vec A(\vec e_\alpha) = \vec A\cdot\vec e_\alpha , \qquad T_{\alpha\beta\gamma} \equiv \mathbf{T}(\vec e_\alpha,\vec e_\beta,\vec e_\gamma) . \tag{1.32b}$$

These definitions have a number of important consequences. We shall derive them one after another, and at the end we shall summarize them succinctly with equation numbers:

(i) The covariant components of the metric tensor are $g_{\alpha\beta} = \mathbf{g}(\vec e_\alpha,\vec e_\beta) = \vec e_\alpha\cdot\vec e_\beta = \eta_{\alpha\beta}$. Here the first equality is the definition (1.32b) of the covariant components, the second is the definition (1.9b) of the metric, and the third is the orthonormality relation (1.30) for the basis vectors.

(ii) The covariant components of any tensor can be computed from the contravariant components by

$$T_{\lambda\mu\nu} = \mathbf{T}(\vec e_\lambda,\vec e_\mu,\vec e_\nu) = T^{\alpha\beta\gamma}\,\vec e_\alpha\otimes\vec e_\beta\otimes\vec e_\gamma(\vec e_\lambda,\vec e_\mu,\vec e_\nu) = T^{\alpha\beta\gamma}(\vec e_\alpha\cdot\vec e_\lambda)(\vec e_\beta\cdot\vec e_\mu)(\vec e_\gamma\cdot\vec e_\nu) = T^{\alpha\beta\gamma}g_{\alpha\lambda}g_{\beta\mu}g_{\gamma\nu} .$$

The first equality is the definition (1.32b) of the covariant components, the second is the expansion (1.32a) of $\mathbf{T}$ on the chosen basis, the third is the definition (1.10a) of the tensor product, and the fourth is one version of our result (i) for the covariant components of the metric.

(iii) Consider this result, $T_{\lambda\mu\nu} = T^{\alpha\beta\gamma}g_{\alpha\lambda}g_{\beta\mu}g_{\gamma\nu}$, together with the numerical values (i) of $g_{\alpha\beta}$. It implies that lowering a spatial index leaves the numerical value of a component unchanged, while lowering a temporal index changes its sign: $T_{ijk} = T^{ijk}$, $T_{0jk} = -T^{0jk}$, $T_{0j0} = +T^{0j0}$, $T_{000} = -T^{000}$. We shall call this the “sign-flip-if-temporal” rule. As a special case, $-1 = g_{00} = g^{00}$, $0 = g_{0j} = -g^{0j}$, $\delta_{jk} = g_{jk} = g^{jk}$. That is, the metric's covariant and contravariant components are numerically identical; they are both equal to the orthonormality values $\eta_{\alpha\beta}$.

(iv) It is easy to see that this sign-flip-if-temporal rule for lowering indices implies the same rule for raising them. In terms of metric components, that raising rule reads $T^{\alpha\beta\gamma} = T_{\lambda\mu\nu}g^{\lambda\alpha}g^{\mu\beta}g^{\nu\gamma}$.

(v) It is convenient to define mixed components of a tensor, i.e. components with some indices up and others down. Their numerical values are obtained by raising or lowering some but not all of the tensor's indices using the metric, e.g. $T^\alpha{}_{\mu\nu} = T^{\alpha\beta\gamma}g_{\beta\mu}g_{\gamma\nu} = T_{\lambda\mu\nu}g^{\lambda\alpha}$. Numerically, these continue to follow the sign-flip-if-temporal rule: $T^0{}_{0k} = -T^{00k}$, $T^0{}_{jk} = T^{0jk}$. In particular, the mixed components of the metric are $g^\alpha{}_\beta = \delta^\alpha_\beta$ (the Kronecker-delta values: plus one if $\alpha = \beta$ and zero otherwise).

Summarizing these results: The numerical values of the components of the metric in Minkowski spacetime are

$$g_{\alpha\beta} = \eta_{\alpha\beta} , \qquad g^\alpha{}_\beta = \delta^\alpha_\beta , \qquad g_\alpha{}^\beta = \delta_\alpha^\beta , \qquad g^{\alpha\beta} = \eta_{\alpha\beta} ; \tag{1.32c}$$

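and indices on all vectors and tensors can be raised and lowered using these components of the metric:

$$A_\alpha = g_{\alpha\beta}A^\beta , \qquad A^\alpha = g^{\alpha\beta}A_\beta , \qquad T^\alpha{}_{\mu\nu} \equiv g_{\mu\beta}\,g_{\nu\gamma}\,T^{\alpha\beta\gamma} , \qquad T^{\alpha\beta\gamma} \equiv g^{\beta\mu}\,g^{\gamma\nu}\,T^\alpha{}_{\mu\nu} , \tag{1.32d}$$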
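The raising and lowering rules (1.32d) can be checked numerically, along with the sign-flip-if-temporal rule they encode. This sketch assumes NumPy and uses arbitrary illustrative components. The array `eta` plays the role of both $g_{\alpha\beta}$ and $g^{\alpha\beta}$, as Eq. (1.32c) allows in an orthonormal Lorentz basis:

```python
import numpy as np

# eta_{ab} = diag(-1, 1, 1, 1), Eq. (1.31); by Eq. (1.32c) the same numbers
# serve as g_{ab} and as g^{ab} in an orthonormal Lorentz basis.
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

rng = np.random.default_rng(3)
A_up = np.array([2.0, 0.5, -1.0, 3.0])   # illustrative A^a
T_up = rng.normal(size=(4, 4, 4))        # illustrative T^{abc}

# Eq. (1.32d): lowering indices with the metric.
A_down = eta @ A_up                                             # A_a = g_ab A^b
T_down = np.einsum("abc,al,bm,cn->lmn", T_up, eta, eta, eta)    # T_lmn
T_mixed = np.einsum("abc,bm,cn->amn", T_up, eta, eta)           # T^a_mn

# Raising all indices again with g^{ab} recovers T^{abc}.
T_back = np.einsum("lmn,la,mb,nc->abc", T_down, eta, eta, eta)

# Sign-flip-if-temporal rule: one sign flip for each index equal to 0.
n_temporal = np.sum(np.indices((4, 4, 4)) == 0, axis=0)
T_down_rule = T_up * (-1.0) ** n_temporal
```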

Numerically, this says that lowering a temporal index changes the component's sign, and lowering a spatial index leaves the component unchanged. The same holds for raising indices: this is the sign-flip-if-temporal rule. This index notation gives rise to formulas for tensor products, inner products, values of tensors on vectors, and tensor contractions that are the obvious analogs of those in Euclidean space:

$$[\text{Contravariant components of } \mathbf{T}(\ ,\ ,\ )\otimes\mathbf{S}(\ ,\ )] = T^{\alpha\beta\gamma}S^{\delta\epsilon} , \qquad \vec A\cdot\vec B = A^\alpha B_\alpha = A_\alpha B^\alpha , \tag{1.32e}$$

$$\mathbf{T}(\vec A,\vec B,\vec C) = T_{\alpha\beta\gamma}A^\alpha B^\beta C^\gamma = T^{\alpha\beta\gamma}A_\alpha B_\beta C_\gamma , \tag{1.32f}$$

$$\text{Covariant components of [1\&3 contraction of } \mathbf{R}] = R^\mu{}_{\alpha\mu\beta} , \qquad \text{Contravariant components of [1\&3 contraction of } \mathbf{R}] = R^{\mu\alpha}{}_\mu{}^\beta . \tag{1.32g}$$
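Here is a numerical check of Eqs. (1.32e)–(1.32g). It makes the same assumptions as above: NumPy, an orthonormal Lorentz basis, and random illustrative components. The helper `lower_all` is a hypothetical name introduced here; it lowers every index with $\eta_{\alpha\beta}$:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])   # g_{ab}, numerically also g^{ab}

def lower_all(X):
    """Lower every index of an array of contravariant components with eta."""
    for axis in range(X.ndim):
        X = np.moveaxis(np.tensordot(eta, X, axes=([1], [axis])), 0, axis)
    return X

rng = np.random.default_rng(4)
A_up, B_up, C_up = rng.normal(size=(3, 4))   # illustrative A^a, B^a, C^a
T_up = rng.normal(size=(4, 4, 4))            # illustrative T^{abc}
R_up = rng.normal(size=(4, 4, 4, 4))         # illustrative R^{abcd}
A_dn, B_dn, C_dn, T_dn = (lower_all(X) for X in (A_up, B_up, C_up, T_up))

# Eq. (1.32e): A . B = A^a B_a = A_a B^a.
AB_1, AB_2 = A_up @ B_dn, A_dn @ B_up

# Eq. (1.32f): T(A, B, C) = T_abc A^a B^b C^c = T^abc A_a B_b C_c.
TABC_1 = np.einsum("abc,a,b,c->", T_dn, A_up, B_up, C_up)
TABC_2 = np.einsum("abc,a,b,c->", T_up, A_dn, B_dn, C_dn)

# Eq. (1.32g): 1&3 contraction of R, in covariant and contravariant form.
R_mixed = np.einsum("mapb,ax,py,bz->mxyz", R_up, eta, eta, eta)  # R^m_{xyz}
R13_dn = np.einsum("mamb->ab", R_mixed)                            # R^m_{a m b}
R13_up = np.einsum("mapb,pm->ab", R_up, eta)                       # R^{m a}_m^b
```

The last check below, `lower_all(R13_up) == R13_dn`, confirms that the two forms in Eq. (1.32g) are the components of one and the same second-rank tensor.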

Notice the very simple pattern in Eqs. (1.32). It universally permeates the rules of index gymnastics, and it permits one to reconstruct those rules without any memorization: Free indices (indices not summed over) must agree in position (up versus down) on the two sides of each equation. In keeping with this pattern, one often regards the two indices in a summed pair (one index up and the other down) as “strangling each other” and thereby being destroyed. One also speaks of “lining up the indices” on the two sides of an equation to get them to agree. In Part VI, when we use non-orthonormal bases, all of these index-notation equations (1.32) will remain valid unchanged, except for the numerical values (1.32c) of the metric components and the sign-flip-if-temporal rule.

1.5.3 [N & R] Slot-Naming Index Notation

[Note: In this and other sections marked “N&R”, the Newtonian reader should mentally lower all indices on tensor components and make them Latin; e.g. should mentally change $T^{\alpha\beta}{}_\alpha = T^{\alpha\beta\gamma}g_{\alpha\gamma}$ in Eq. (1.33) to $T_{aba} = T_{abc}g_{ac}$.]

We now pause in our development of the component version of tensor algebra to introduce a very important new viewpoint. Consider the rank-2 tensor $\mathbf{F}(\ ,\ )$. We can define a new tensor $\mathbf{G}(\ ,\ )$ to be the same as $\mathbf{F}$, but with the slots interchanged; i.e., for any two vectors $\vec A$ and $\vec B$ it is true that $\mathbf{G}(\vec A,\vec B) = \mathbf{F}(\vec B,\vec A)$. We need a simple, compact way to indicate that $\mathbf{F}$ and $\mathbf{G}$ are equal except for an interchange of slots. The best way is to give the slots names, say $\alpha$ and $\beta$. That is, we rewrite $\mathbf{F}(\ ,\ )$ as $\mathbf{F}(\ _\alpha ,\ _\beta)$, or more conveniently as $F_{\alpha\beta}$, and then write the relationship between $\mathbf{G}$ and $\mathbf{F}$ as $G_{\alpha\beta} = F_{\beta\alpha}$.

“NO!” some readers might object. This notation is indistinguishable from our notation for components on a particular basis. “GOOD!” a more astute reader will exclaim. The relation $G_{\alpha\beta} = F_{\beta\alpha}$ in a particular basis is a true statement if and only if “$\mathbf{G} = \mathbf{F}$ with slots interchanged” is true, so why not use the same notation to symbolize both? This, in fact, we shall do. We shall ask our readers to look at any “index equation” such as $G_{\alpha\beta} = F_{\beta\alpha}$ the way they would look at an Escher drawing. Momentarily think of it as a relationship between components of tensors in a specific basis. Then do a quick mind-flip and regard it quite differently: as a relationship between geometric, basis-independent tensors, with the indices playing the roles of names of slots. This mind-flip approach to tensor algebra will pay substantial dividends.

As an example of the power of this slot-naming index notation, consider the contraction of the first and third slots of a third-rank tensor $\mathbf{T}$. In any basis the components of 1&3contraction($\mathbf{T}$) are $T^{\alpha\beta}{}_\alpha$; cf. Eq. (1.32g). Correspondingly, in slot-naming index notation we denote 1&3contraction($\mathbf{T}$) by the simple expression $T^{\alpha\beta}{}_\alpha$. We say that the first and third slots are “strangling each other” by the contraction. This leaves free only the second slot (named $\beta$) and therefore produces a rank-1 tensor (a vector). By virtue of the “index-lowering” role of the metric, we can also write the contraction as

$$T^{\alpha\beta}{}_\alpha = T^{\alpha\beta\gamma}g_{\alpha\gamma} , \tag{1.33}$$

and we can look at this relation from either of two viewpoints. The component viewpoint says that the components of the contraction of $\mathbf{T}$ in any chosen basis are obtained by taking a product of components of $\mathbf{T}$ and of the metric $\mathbf{g}$, and then summing over the appropriate indices. The slot-naming viewpoint says that the contraction of $\mathbf{T}$ can be achieved in two steps. First, take a tensor product of $\mathbf{T}$ with the metric $\mathbf{g}$ to get $\mathbf{T}\otimes\mathbf{g}(\ ,\ ,\ ,\ ,\ )$ (or $T^{\alpha\beta\gamma}g_{\mu\nu}$ in slot-naming index notation). Then strangle on each other the first and fourth slots [named $\alpha$ in Eq. (1.33)], and also the third and fifth slots [named $\gamma$ in Eq. (1.33)].

****************************

EXERCISES

Exercise 1.5 Derivation: [N & R] Component Manipulation Rules
If you are studying only the Newtonian part of this book, derive the component manipulation rules (1.29g) and (1.29h); otherwise, derive the relativistic rules (1.32e)–(1.32g). As you proceed, abandon any piece of the exercise when it becomes trivial for you.

Exercise 1.6 Practice: [N & R] Numerics of Component Manipulations
(a) In Euclidean space, in some Cartesian basis, the third-rank tensor $\mathbf{S}(\ ,\ ,\ )$ and vectors $\mathbf{A}$ and $\mathbf{B}$ have as their only nonzero components $S_{123} = S_{231} = S_{312} = +1$, $A_1 = 3$, $B_1 = 4$, $B_2 = 5$. What are the components of the vector $\mathbf{S}(\mathbf{A},\mathbf{B},\ )$, the vector $\mathbf{S}(\mathbf{A},\ ,\mathbf{B})$ and the tensor $\mathbf{A}\otimes\mathbf{B}$?
(b) In Minkowski spacetime, in some inertial reference frame, the vector $\vec A$ and second-rank tensor $\mathbf{T}$ have the following components: $A^0 = 1$, $A^1 = 2$, $A^2 = A^3 = 0$; and $T^{00} = 3$, $T^{01} = T^{10} = 2$, $T^{11} = -1$, with all other components of $\mathbf{T}$ zero. Evaluate $\mathbf{T}(\vec A,\vec A)$ and the components of $\mathbf{T}(\vec A,\ )$ and $\vec A\otimes\mathbf{T}$.

Exercise 1.7 Practice: [N & R] Meaning of Slot-Naming Index Notation

(a) In Euclidean space, the following expressions and equations are written in slot-naming index notation; convert them to geometric, index-free notation: $A_iB_{jk}$; $A_iB_{ji}$; $S_{ijk} = S_{kji}$; $A_iB_i = A_iB_jg_{ij}$.
(b) In Euclidean space, the following expressions are written in geometric, index-free notation; convert them to slot-naming index notation: $\mathbf{T}(\ ,\ ,\mathbf{A})$; $\mathbf{T}(\ ,\mathbf{S}(\mathbf{B},\ ),\ )$.
(c) In Minkowski spacetime, convert $\mathbf{T}(\ ,\mathbf{S}(\mathbf{R}(\vec C,\ ),\ ),\ )$ into slot-naming index notation.

Exercise 1.8 Practice: [R] Index Gymnastics
(a) Simplify the following expression so that the metric does not appear in it: Aαβγ gβρ Sγλ g ρδ g λ α .
(b) The quantity $g_{\alpha\beta}g^{\alpha\beta}$ is a scalar since it has no free indices. What is its numerical value?
(c) What is wrong with the following expression and equation? Aα βγ Sαγ ; Aα βγ Sβ Tγ = Rαβδ S β .

****************************
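The two viewpoints on Eq. (1.33) can be compared numerically. This sketch assumes NumPy and uses illustrative random components. It shows that $G_{\alpha\beta} = F_{\beta\alpha}$ is simply a transpose. It computes the contraction both directly and by first forming $T^{\alpha\beta\gamma}g_{\mu\nu}$, then strangling slot 1 with slot 4 and slot 3 with slot 5:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])
rng = np.random.default_rng(5)
F = rng.normal(size=(4, 4))          # illustrative F_{ab}
T_up = rng.normal(size=(4, 4, 4))    # illustrative T^{abc}
A, B = rng.normal(size=(2, 4))       # illustrative A^a, B^a

# G_{ab} = F_{ba}: G is F with its two slots interchanged.
G = F.T
G_AB = np.einsum("ab,a,b->", G, A, B)   # G(A, B)
F_BA = np.einsum("ab,a,b->", F, B, A)   # F(B, A)

# Component viewpoint of Eq. (1.33): T^{ab}_a = T^{abc} g_{ac}.
contraction = np.einsum("abc,ac->b", T_up, eta)

# Slot-naming viewpoint: build T (x) g with five slots, then strangle
# slot 1 with slot 4 and slot 3 with slot 5.
T_outer_g = np.einsum("abc,mn->abcmn", T_up, eta)
contraction_via_product = np.einsum("abcac->b", T_outer_g)
```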

1.6 [R] Particle Kinetics in Index Notation and in a Lorentz Frame

As an illustration of the component representation of tensor algebra, let us return to the relativistic, accelerated particle of Fig. 1.7. Starting from the frame-independent equations for the particle's 4-velocity $\vec u$ and 4-momentum $\vec p$ (Sec. 1.4), we shall derive the component description given in elementary textbooks. We introduce a specific inertial reference frame and associated Lorentz coordinates $x^\alpha$ and basis vectors $\{\vec e_\alpha\}$. In this Lorentz frame, the particle's world line $\vec x(\tau)$ is represented by its coordinate location $x^\alpha(\tau)$ as a function of its proper time $\tau$. Consider the separation vector $d\vec x$ between two neighboring events along the particle's world line. Its contravariant components are the events' coordinate separations $dx^\alpha$ [Eq. (1.2); this is why we put the indices up on coordinates]. Correspondingly, the components of the particle's 4-velocity $\vec u = d\vec x/d\tau$ are

$$u^\alpha = \frac{dx^\alpha}{d\tau} \tag{1.34a}$$

(the derivatives of the particle's spacetime coordinates with respect to its proper time). Note that Eq. (1.34a) implies

$$v^j \equiv \frac{dx^j}{dt} = \frac{dx^j/d\tau}{dt/d\tau} = \frac{u^j}{u^0} . \tag{1.34b}$$

Fig. 1.10: Spacetime diagram in a specific Lorentz frame. It shows the frame's 3-space $t = 0$ (stippled region) and the 4-velocity $\vec u$ of a particle as it passes through that 3-space (i.e., at time $t = 0$). It also shows two 3-dimensional vectors that lie in the 3-space: the spatial part of the particle's 4-velocity, $\mathbf{u}$, and the particle's ordinary velocity $\mathbf{v} = \mathbf{u}/\gamma$.

Here $v^j$ are the components of the ordinary velocity as measured in the Lorentz frame. This relation, together with the unit norm of $\vec u$, $\vec u^{\,2} = g_{\alpha\beta}u^\alpha u^\beta = -(u^0)^2 + \delta_{ij}u^iu^j = -1$, implies that the components of the 4-velocity have the forms familiar from elementary textbooks:

$$u^0 = \gamma , \qquad u^j = \gamma v^j , \qquad \text{where}\ \gamma = \frac{1}{(1 - \delta_{ij}v^iv^j)^{1/2}} . \tag{1.34c}$$

It is useful to think of $v^j$ as the components of a 3-dimensional vector $\mathbf{v}$, the ordinary velocity, that lives in the 3-dimensional Euclidean space $t = $ const of the chosen Lorentz frame. As we shall see below, this 3-space is not well defined until a Lorentz frame has been chosen, and correspondingly, $\mathbf{v}$ relies for its existence on a specific choice of frame. However, once the frame has been chosen, $\mathbf{v}$ can be regarded as a coordinate-independent, basis-independent 3-vector lying in the frame's 3-space $t = $ const. Similarly, the spatial part of the 4-velocity $\vec u$ (the part with components $u^j$ in our chosen frame) can be regarded as a 3-vector $\mathbf{u}$ lying in the frame's 3-space. Eqs. (1.34c) then become the component versions of the coordinate-independent, basis-independent 3-space relations

$$\mathbf{u} = \gamma\mathbf{v} , \qquad \gamma = \frac{1}{\sqrt{1 - \mathbf{v}^2}} . \tag{1.34d}$$
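Here is a quick check of Eqs. (1.34b)–(1.34d). The sketch assumes NumPy; `four_velocity` is a hypothetical helper, and the speed 0.6 is illustrative. It builds $u^\alpha$ from an ordinary velocity, confirms the unit norm, and recovers $v^j = u^j/u^0$:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def four_velocity(v):
    """Return u^a = (gamma, gamma v^j) for an ordinary velocity v (c = 1), Eq. (1.34c)."""
    v = np.asarray(v, dtype=float)
    v2 = v @ v
    if v2 >= 1.0:
        raise ValueError("ordinary speed must be less than the speed of light")
    gamma = 1.0 / np.sqrt(1.0 - v2)
    return np.concatenate(([gamma], gamma * v))

u = four_velocity([0.6, 0.0, 0.0])   # illustrative speed 0.6, so gamma = 1.25
norm = u @ eta @ u                   # g_ab u^a u^b, expected to be -1
v_recovered = u[1:] / u[0]           # Eq. (1.34b): v^j = u^j / u^0
```

For speed 0.6 this gives $\gamma = 1.25$ and $u^\alpha = (1.25, 0.75, 0, 0)$. Its squared norm is $-1.5625 + 0.5625 = -1$.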

Figure 1.10 shows stippled the 3-space $t = 0$ of a specific Lorentz frame, and the 4-velocity $\vec u$ and ordinary velocity $\mathbf{v}$ of a particle as it passes through that 3-space. The components of the particle's 4-momentum $\vec p$ in our chosen Lorentz frame have special names and special physical significances. The time component of the 4-momentum is the particle's energy $E$ as measured in that frame:

$$E \equiv p^0 = mu^0 = m\gamma = \frac{m}{\sqrt{1 - \mathbf{v}^2}} = (\text{the particle's energy}) \simeq m + \tfrac{1}{2}m\mathbf{v}^2 \quad \text{for}\ v \equiv |\mathbf{v}| \ll 1 . \tag{1.35a}$$

Note that this energy is the sum of the particle's rest mass-energy $m = mc^2$ and its kinetic energy $m\gamma - m$. For low velocities, the kinetic energy reduces to the familiar nonrelativistic value $\tfrac{1}{2}m\mathbf{v}^2$. Now regard the spatial components of the 4-momentum from the viewpoint of 3-dimensional physics. They are the same as the components of the momentum, a 3-vector residing in the chosen Lorentz frame's 3-space:

$$p^j = mu^j = m\gamma v^j = \frac{mv^j}{\sqrt{1 - \mathbf{v}^2}} = Ev^j = (j\text{-component of particle's momentum}) ; \tag{1.35b}$$

or, in basis-independent, 3-dimensional vector notation,

$$\mathbf{p} = m\mathbf{u} = m\gamma\mathbf{v} = \frac{m\mathbf{v}}{\sqrt{1 - \mathbf{v}^2}} = E\mathbf{v} = (\text{particle's momentum}) . \tag{1.35c}$$
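A corresponding sketch for Eqs. (1.35) uses geometrized units. It assumes NumPy; `energy_momentum` is a hypothetical helper, and the mass and speeds are illustrative. It checks that $E^2 - \mathbf{p}^2 = m^2$, and that the kinetic energy $E - m$ approaches its Newtonian limit:

```python
import numpy as np

def energy_momentum(m, v):
    """E = m gamma and p = E v in geometrized units, Eqs. (1.35a)-(1.35c)."""
    v = np.asarray(v, dtype=float)
    gamma = 1.0 / np.sqrt(1.0 - v @ v)
    E = m * gamma
    return E, E * v

m = 2.0                                    # illustrative rest mass
E, p = energy_momentum(m, [0.6, 0.0, 0.0])
mass_shell = E**2 - p @ p                  # equals m^2, i.e. -p.p = m^2

# Low-velocity limit: the kinetic energy E - m approaches m v^2 / 2.
v_small = 1e-3
E_small, _ = energy_momentum(m, [v_small, 0.0, 0.0])
kinetic = E_small - m
newtonian = 0.5 * m * v_small**2
```

With $m = 2$ and speed 0.6 one gets $E = 2.5$ and $\mathbf{p} = (1.5, 0, 0)$, so $E^2 - \mathbf{p}^2 = 4 = m^2$.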

For a zero-rest-mass particle, as for one with finite rest mass, we identify the time component of the 4-momentum, in a chosen Lorentz frame, as the particle's energy, and the spatial part as its momentum. Moreover, appealing to quantum theory, we can regard a zero-rest-mass particle as a quantum associated with a monochromatic wave. Quantum theory then tells us that the wave's angular frequency $\omega$, as measured in a chosen Lorentz frame, will be related to its energy by

$$E \equiv p^0 = \hbar\omega = (\text{particle's energy}) ; \tag{1.36a}$$

and, since the particle has $\vec p^{\,2} = -(p^0)^2 + \mathbf{p}^2 = -m^2 = 0$ (in accord with the lightlike nature of its world line), its momentum as measured in the chosen Lorentz frame will be

$$\mathbf{p} = E\mathbf{n} = \hbar\omega\mathbf{n} . \tag{1.36b}$$

Here $\mathbf{n}$ is the unit 3-vector that points in the direction of travel of the particle, as measured in the chosen frame. Since the particle moves at the speed of light $v = 1$, $\mathbf{n}$ is also the particle's ordinary velocity. Eqs. (1.36a) and (1.36b) are the temporal and spatial components of the geometric, frame-independent relation $\vec p = \hbar\vec k$ [Eq. (1.21), which is valid for zero-rest-mass particles as well as finite-mass ones].

The introduction of a specific Lorentz frame into spacetime can be said to produce a “3+1” split of every 4-vector into a 3-dimensional vector plus a scalar (a real number). The 3+1 split of a particle's 4-momentum $\vec p$ produces its momentum $\mathbf{p}$ plus its energy $E = p^0$. Correspondingly, the 3+1 split of the law of 4-momentum conservation (1.23) produces a law of conservation of momentum plus a law of conservation of energy:

$$\sum_{\bar A}\mathbf{p}_{\bar A} = \sum_A\mathbf{p}_A , \qquad \sum_{\bar A}E_{\bar A} = \sum_A E_A . \tag{1.37}$$

Here the unbarred quantities are the momenta or energies of the particles entering the interaction region, and the barred quantities are the momenta or energies of those leaving; cf. Fig. 1.8. Because the concept of energy does not even exist until one has chosen a Lorentz frame, and neither does that of momentum, the laws of energy conservation and momentum conservation separately are frame-dependent laws. In this sense they are far less fundamental than their combination, the frame-independent law of 4-momentum conservation.

Box 1.3 [N] Relativistic Particles for Newtonian Readers

Readers who are skipping the relativistic parts of this book will need to know two important pieces of relativity:
(i) geometrized units, as embodied in Eqs. (1.3); and
(ii) the (relativistic) energy and momentum of a moving particle, as described here.

Consider a particle with rest mass $m$, moving with ordinary velocity $\mathbf{v} = d\mathbf{x}/dt$ and speed $v = |\mathbf{v}|$. It has energy $\mathcal{E}$ (including its rest mass), energy $E \equiv \mathcal{E} - m$ (with the rest mass removed), and momentum $\mathbf{p}$, given by

$$\mathcal{E} = \frac{m}{\sqrt{1 - v^2}} \equiv E + m , \qquad \mathbf{p} = \mathcal{E}\mathbf{v} = \frac{m\mathbf{v}}{\sqrt{1 - v^2}} \tag{1}$$

[Eqs. (1.35)]. In the low-velocity, Newtonian limit, the energy $E$ with rest mass removed and the momentum $\mathbf{p}$ take their familiar, Newtonian forms:

$$\text{When}\ v \ll c \equiv 1 , \quad E \to \tfrac{1}{2}mv^2 \quad \text{and} \quad \mathbf{p} \to m\mathbf{v} . \tag{2}$$

A particle with zero rest mass (a photon or a graviton) always moves with the speed of light $v = c = 1$. Like other particles it has momentum $\mathbf{p} = \mathcal{E}\mathbf{v}$, so the magnitude of its momentum is equal to its energy: $|\mathbf{p}| = \mathcal{E}v = \mathcal{E}$. When particles interact (e.g. in chemical reactions, nuclear reactions, and elementary-particle collisions), the sum of the particle energies $\mathcal{E}$ is conserved, as is the sum of the particle momenta $\mathbf{p}$: Eq. (1.37).

By learning to think about the 3+1 split in a geometric, frame-independent way, one can gain much conceptual and computational power. As an example, consider a particle with 4-momentum $\vec p$, as studied by an observer with 4-velocity $\vec U$. In the observer's own Lorentz reference frame, her 4-velocity has components $U^0 = 1$ and $U^j = 0$. Her 4-velocity is therefore $\vec U = U^\alpha\vec e_\alpha = \vec e_0$, i.e., it is identically equal to the time basis vector of her Lorentz frame. This means that the particle energy she measures is $E = p^0 = -p_0 = -\vec p\cdot\vec e_0 = -\vec p\cdot\vec U$. This equation was derived in the observer's Lorentz frame, but it is actually a geometric, frame-independent relation: the inner product of two 4-vectors. Suppose an observer with 4-velocity $\vec U$ measures the energy of a particle with 4-momentum $\vec p$. The result she gets (the time part of the 3+1 split of $\vec p$ as seen by her) is

$$E = -\vec p\cdot\vec U . \tag{1.38}$$
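Eq. (1.38) can be tested against a direct frame change. The sketch below assumes NumPy, and the velocities are illustrative. The boost matrix is the pure boost of Eq. (1.49a), which appears later in this chapter. The sketch evaluates $-\vec p\cdot\vec U$ with lab-frame components and compares it with $p^0$ after boosting into the observer's frame:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def dot(a, b):
    """Minkowski inner product g_ab a^a b^b."""
    return a @ eta @ b

def boost_x(beta):
    """L^{mubar}_alpha for a pure boost along x with speed beta, as in Eq. (1.49a)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -beta * gamma
    return L

# Illustrative particle: rest mass 1, moving along x at 0.8 in the lab frame.
m, v = 1.0, np.array([0.8, 0.0, 0.0])
p = m / np.sqrt(1.0 - v @ v) * np.concatenate(([1.0], v))

# Illustrative observer moving along x at 0.5 in the lab frame.
beta = 0.5
U = np.array([1.0, beta, 0.0, 0.0]) / np.sqrt(1.0 - beta**2)

E_geometric = -dot(p, U)            # Eq. (1.38), evaluated with lab-frame components
E_boosted = (boost_x(beta) @ p)[0]  # p^0 after transforming to the observer's frame
```

By the standard relativistic velocity-addition formula, the particle's speed relative to the observer is $(0.8 - 0.5)/(1 - 0.4) = 0.5$. Both numbers should therefore equal $m\gamma = 2/\sqrt{3} \approx 1.155$.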

We shall use this equation fairly often in later chapters. In Exs. 1.9 and 1.10, the reader can get experience at deriving and interpreting other frame-independent equations for 3+1 splits. Exercise 1.11 exhibits the power of this geometric way of thinking by using it to derive the Doppler shift of a photon.

****************************

EXERCISES

Exercise 1.9 **Practice: [R] Frame-Independent Expressions for Energy, Momentum, and Velocity⁸
An observer with 4-velocity $\vec U$ measures the properties of a particle with 4-momentum $\vec p$. The energy she measures is $E = -\vec p\cdot\vec U$, Eq. (1.38).
(a) Show that the rest mass the observer measures is computable from

$$m^2 = -\vec p^{\,2} . \tag{1.39a}$$

(b) Show that the momentum the observer measures has the magnitude

$$|\mathbf{p}| = \left[(\vec p\cdot\vec U)^2 + \vec p\cdot\vec p\right]^{1/2} . \tag{1.39b}$$

(c) Show that the ordinary velocity the observer measures has the magnitude

$$|\mathbf{v}| = \frac{|\mathbf{p}|}{E} , \tag{1.39c}$$

where $|\mathbf{p}|$ and $E$ are given by the above frame-independent expressions.
(d) Show that the ordinary velocity $\mathbf{v}$, thought of as a 4-vector that happens to lie in the observer's 3-space of constant time, is given by

$$\vec v = \frac{\vec p + (\vec p\cdot\vec U)\vec U}{-\vec p\cdot\vec U} . \tag{1.39d}$$
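The frame-independent expressions (1.39a)–(1.39d) can be spot-checked numerically. This is a check, not a derivation. The sketch assumes NumPy and uses an illustrative particle and observer. The observer moves along $x$, so the pure boost of Eq. (1.49a) takes lab-frame components into her frame:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def dot(a, b):
    """Minkowski inner product g_ab a^a b^b."""
    return a @ eta @ b

def boost_x(beta):
    """L^{mubar}_alpha for a pure boost along x with speed beta (cf. Eq. 1.49a)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -beta * gamma
    return L

# Illustrative particle (rest mass 1.5) and observer (moving along x at 0.6).
m = 1.5
v_lab = np.array([0.3, -0.4, 0.5])
p = m / np.sqrt(1.0 - v_lab @ v_lab) * np.concatenate(([1.0], v_lab))
beta = 0.6
U = np.array([1.0, beta, 0.0, 0.0]) / np.sqrt(1.0 - beta**2)

# Frame-independent expressions, Eqs. (1.38) and (1.39a)-(1.39d).
E = -dot(p, U)
m_formula = np.sqrt(-dot(p, p))
p_mag = np.sqrt(dot(p, U) ** 2 + dot(p, p))
v_mag = p_mag / E
v_4 = (p + dot(p, U) * U) / E

# The same quantities read off directly in the observer's own Lorentz frame.
L = boost_x(beta)
p_obs = L @ p
v_obs = p_obs[1:] / p_obs[0]
v_4_obs = L @ v_4
```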

Exercise 1.10 **Example: [R] 3-Metric as a Projection Tensor
Consider, as in Exercise 1.9, an observer with 4-velocity $\vec U$ who measures the properties of a particle with 4-momentum $\vec p$.
(a) Show that the Euclidean metric of the observer's 3-space, when thought of as a tensor in 4-dimensional spacetime, has the form

$$\mathbf{P} \equiv \mathbf{g} + \vec U\otimes\vec U . \tag{1.40a}$$

Let $\vec A$ be an arbitrary vector in spacetime. Show, further, that $-\vec A\cdot\vec U$ is the component of $\vec A$ along the observer's 4-velocity $\vec U$, and that

$$\mathbf{P}(\ ,\vec A) = \vec A + (\vec A\cdot\vec U)\vec U \tag{1.40b}$$

is the projection of $\vec A$ into the observer's 3-space, i.e., the spatial part of $\vec A$ as seen by the observer. For this reason, $\mathbf{P}$ is called a projection tensor. In quantum mechanics one introduces the concept of a projection operator $\hat P$ as an operator that satisfies the equation $\hat P^2 = \hat P$. Show that the projection tensor $\mathbf{P}$ is a projection operator in the quantum mechanical sense:

$$P_{\alpha\mu}P^\mu{}_\beta = P_{\alpha\beta} . \tag{1.40c}$$

⁸ Exercises marked with double stars are important expansions of the material presented in the text.

(b) Show that Eq. (1.39d) for the particle's ordinary velocity, thought of as a 4-vector, can be rewritten as

$$\vec v = \frac{\mathbf{P}(\ ,\vec p)}{-\vec p\cdot\vec U} . \tag{1.41}$$

Exercise 1.11 **Example: [R] Doppler Shift Derived without Lorentz Transformations

Fig. 1.11: Geometry for Doppler shift (an emitter moving with velocity $\mathbf{v}$ sends a photon in direction $\mathbf{n}$ to a receiver).

(a) An observer at rest in some inertial frame receives a photon that was emitted in a direction $\mathbf{n}$ by an atom moving with ordinary velocity $\mathbf{v}$ (Fig. 1.11). The photon frequency and energy as measured by the emitting atom are $\nu_{\rm em}$ and $E_{\rm em}$; those measured by the receiving observer are $\nu_{\rm rec}$ and $E_{\rm rec}$. Carry out a calculation solely in the receiver's inertial frame (the frame of Fig. 1.11), without the aid of any Lorentz transformation. Use it to derive the standard formula for the photon's Doppler shift,

$$\frac{\nu_{\rm rec}}{\nu_{\rm em}} = \frac{\sqrt{1 - \mathbf{v}^2}}{1 - \mathbf{v}\cdot\mathbf{n}} . \tag{1.42}$$

Hint: Use Eq. (1.38) to evaluate $E_{\rm em}$ using receiver-frame expressions for the emitting atom's 4-velocity $\vec U$ and the photon's 4-momentum $\vec p$.
(b) Suppose that instead of emitting a photon, the emitter ejects a particle with finite rest mass $m$. Using the same method, derive an expression for the ratio of received energy to emitted energy, $E_{\rm rec}/E_{\rm em}$. Express it in terms of the emitter's ordinary velocity $\mathbf{v}$ and the particle's ordinary velocity $\mathbf{V}$, both as measured in the receiver's frame.
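Here is a numerical cross-check of Eq. (1.42): the formula, and the hint's route via Eq. (1.38), should agree for any emission direction. The function names are hypothetical, the velocity is illustrative, and NumPy is assumed:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def doppler_ratio(v, n):
    """nu_rec / nu_em from Eq. (1.42); v and n are receiver-frame 3-vectors, |n| = 1."""
    v, n = np.asarray(v, float), np.asarray(n, float)
    return np.sqrt(1.0 - v @ v) / (1.0 - v @ n)

def doppler_ratio_via_38(v, n, E_rec=1.0):
    """Same ratio from E_em = -p . U (Eq. 1.38), with every vector in the receiver frame."""
    v, n = np.asarray(v, float), np.asarray(n, float)
    U_em = np.concatenate(([1.0], v)) / np.sqrt(1.0 - v @ v)  # emitter 4-velocity
    p = E_rec * np.concatenate(([1.0], n))                     # photon 4-momentum
    E_em = -(p @ eta @ U_em)
    return E_rec / E_em   # E = h nu, so the energy ratio equals the frequency ratio
```

Take $\mathbf{v} = 0.5\,\mathbf{e}_x$ as an example. A photon emitted along $+\mathbf{e}_x$ (toward a receiver ahead of the emitter) arrives blueshifted by a factor $\sqrt{3}$. One emitted along $-\mathbf{e}_x$ is redshifted by $1/\sqrt{3}$. One with $\mathbf{n} \perp \mathbf{v}$ in the receiver's frame is redshifted by $1/\gamma = \sqrt{0.75}$.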

****************************

1.7 Orthogonal and Lorentz Transformations of Bases, and Spacetime Diagrams

1.7.1 [N] Euclidean 3-space: Orthogonal Transformations

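Consider two different Cartesian coordinate systems, $\{x, y, z\} \equiv \{x_1, x_2, x_3\}$ and $\{\bar x, \bar y, \bar z\} \equiv \{x_{\bar 1}, x_{\bar 2}, x_{\bar 3}\}$. Denote by $\{\mathbf{e}_i\}$ and $\{\mathbf{e}_{\bar p}\}$ the corresponding bases. It must be possible to expand the basis vectors of one basis in terms of those of the other. We shall denote the expansion coefficients by the letter $R$ and shall write

$$\mathbf{e}_i = \mathbf{e}_{\bar p}R_{\bar p i} , \qquad \mathbf{e}_{\bar p} = \mathbf{e}_iR_{i\bar p} . \tag{1.43}$$

The quantities $R_{\bar p i}$ and $R_{i\bar p}$ are not the components of a tensor; rather, they are the elements of transformation matrices

$$[R_{\bar p i}] = \begin{bmatrix} R_{\bar 1 1} & R_{\bar 1 2} & R_{\bar 1 3} \\ R_{\bar 2 1} & R_{\bar 2 2} & R_{\bar 2 3} \\ R_{\bar 3 1} & R_{\bar 3 2} & R_{\bar 3 3} \end{bmatrix} , \qquad [R_{i\bar p}] = \begin{bmatrix} R_{1\bar 1} & R_{1\bar 2} & R_{1\bar 3} \\ R_{2\bar 1} & R_{2\bar 2} & R_{2\bar 3} \\ R_{3\bar 1} & R_{3\bar 2} & R_{3\bar 3} \end{bmatrix} . \tag{1.44a}$$

(Here and throughout this book we use square brackets to denote matrices.) These two matrices must be the inverse of each other. One takes us from the barred basis to the unbarred, and the other goes in the reverse direction, from unbarred to barred:

$$R_{\bar p i}R_{i\bar q} = \delta_{\bar p\bar q} , \qquad R_{i\bar p}R_{\bar p j} = \delta_{ij} . \tag{1.44b}$$

The orthonormality requirement for the two bases implies that $\delta_{ij} = \mathbf{e}_i\cdot\mathbf{e}_j = (\mathbf{e}_{\bar p}R_{\bar p i})\cdot(\mathbf{e}_{\bar q}R_{\bar q j}) = R_{\bar p i}R_{\bar q j}(\mathbf{e}_{\bar p}\cdot\mathbf{e}_{\bar q}) = R_{\bar p i}R_{\bar q j}\delta_{\bar p\bar q} = R_{\bar p i}R_{\bar p j}$. This says that the transpose of $[R_{\bar p i}]$ is its inverse, which we have already denoted by $[R_{i\bar p}]$:

$$R_{i\bar p} = R_{\bar p i} . \tag{1.44c}$$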

This property implies that the transformation matrix is orthogonal; i.e., the transformation is a rotation, possibly combined with a reflection [see, e.g., Goldstein (1980)]. Thus (as should be obvious and familiar), the bases associated with any two Euclidean coordinate systems are related by a rotation, possibly combined with a reflection. [Note: Eq. (1.44c) does not say that $[R_{i\bar p}]$ is a symmetric matrix; in fact, it typically is not. Rather, (1.44c) says that $[R_{i\bar p}]$ is the transpose of $[R_{\bar p i}]$.] The fact that a vector $\mathbf{A}$ is a geometric, basis-independent object implies that $\mathbf{A} = A_i\mathbf{e}_i = A_i(\mathbf{e}_{\bar p}R_{\bar p i}) = (R_{\bar p i}A_i)\mathbf{e}_{\bar p} = A_{\bar p}\mathbf{e}_{\bar p}$; i.e.,

$$A_{\bar p} = R_{\bar p i}A_i , \quad \text{and similarly} \quad A_i = R_{i\bar p}A_{\bar p} ; \tag{1.45a}$$

and correspondingly for the components of a tensor,

$$T_{\bar p\bar q\bar r} = R_{\bar p i}R_{\bar q j}R_{\bar r k}T_{ijk} , \qquad T_{ijk} = R_{i\bar p}R_{j\bar q}R_{k\bar r}T_{\bar p\bar q\bar r} . \tag{1.45b}$$
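The orthogonality relations (1.44b)–(1.44c) and the transformation laws (1.45a)–(1.45b) can be checked with a concrete rotation. This sketch assumes NumPy and uses an illustrative rotation about $z$ and random tensor components:

```python
import numpy as np

theta = 0.7   # illustrative rotation angle about the z axis
c, s = np.cos(theta), np.sin(theta)
e = np.eye(3)                                                   # unbarred basis e_i (rows)
e_bar = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])  # barred basis e_pbar (rows)

# Expansion coefficients of Eq. (1.43), found by dotting basis vectors:
# e_i = e_pbar R_{pbar i} gives R_{pbar i} = e_pbar . e_i, and similarly for R_{i pbar}.
R_bar_un = e_bar @ e.T     # [R_{pbar i}], rows pbar, columns i
R_un_bar = e @ e_bar.T     # [R_{i pbar}], rows i, columns pbar

A = np.array([1.0, -2.0, 0.5])      # components A_i
A_bar = R_bar_un @ A                # Eq. (1.45a): A_pbar = R_{pbar i} A_i

rng = np.random.default_rng(6)
T = rng.normal(size=(3, 3, 3))      # illustrative T_ijk
T_bar = np.einsum("pi,qj,rk,ijk->pqr", R_bar_un, R_bar_un, R_bar_un, T)      # Eq. (1.45b)
T_back = np.einsum("ip,jq,kr,pqr->ijk", R_un_bar, R_un_bar, R_un_bar, T_bar)
```

Note that `R_un_bar` is the transpose of `R_bar_un` but is not itself symmetric, in line with the note after Eq. (1.44c).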

It is instructive to compare the transformation law (1.45a) for the components of a vector with the laws (1.43) for the bases. To make these laws look natural, we have placed the transformation matrix on the left in the former and on the right in the latter. In Minkowski spacetime, the placement of indices, up or down, will automatically tell us the order. Suppose we choose the origins of our two coordinate systems to coincide. Consider the vector $\mathbf{x}$ reaching from their common origin to some point P whose coordinates are $x_j$ and $x_{\bar p}$. Its components are equal to those coordinates. As a result, the coordinates themselves obey the same transformation law as any other vector:

$$x_{\bar p} = R_{\bar p i}x_i , \qquad x_i = R_{i\bar p}x_{\bar p} . \tag{1.45c}$$

The product of two rotation matrices, $[R_{i\bar p}R_{\bar p\bar s}]$, is another rotation matrix $[R_{i\bar s}]$, which transforms the Cartesian bases $\mathbf{e}_{\bar s}$ to $\mathbf{e}_i$. Under this product rule, the rotation matrices form a mathematical group: the rotation group, whose “representations” play an important role in quantum theory.

1.7.2 [R] Minkowski Spacetime: Lorentz Transformations

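Consider two different inertial reference frames in Minkowski spacetime. Denote their Lorentz coordinates by $\{x^\alpha\}$ and $\{x^{\bar\mu}\}$, and their bases by $\{\vec e_\alpha\}$ and $\{\vec e_{\bar\mu}\}$. Write the transformation from one basis to the other as

$$\vec e_\alpha = \vec e_{\bar\mu}L^{\bar\mu}{}_\alpha , \qquad \vec e_{\bar\mu} = \vec e_\alpha L^\alpha{}_{\bar\mu} . \tag{1.46}$$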

As in Euclidean 3-space, $L^{\bar\mu}{}_\alpha$ and $L^\alpha{}_{\bar\mu}$ are elements of two different transformation matrices. Since these matrices operate in opposite directions, they must be the inverse of each other:

$$L^{\bar\mu}{}_\alpha L^\alpha{}_{\bar\nu} = \delta^{\bar\mu}{}_{\bar\nu} , \qquad L^\alpha{}_{\bar\mu}L^{\bar\mu}{}_\beta = \delta^\alpha{}_\beta . \tag{1.47a}$$

Notice the up/down placement of indices on the elements of the transformation matrices: the first index is always up, and the second is always down. This is just a convenient convention that helps systematize the index-shuffling rules in an easily remembered way. Our rules about summing on the same index when up and down, and matching unsummed indices on the two sides of an equation, automatically dictate the matrix to use in each of the transformations (1.46). The same is true for all other equations in this section. In Euclidean 3-space the orthonormality of the two bases dictated that the transformations must be orthogonal, i.e. must be rotations, possibly combined with reflections. In Minkowski spacetime, orthonormality implies $g_{\alpha\beta} = \vec e_\alpha\cdot\vec e_\beta = (\vec e_{\bar\mu}L^{\bar\mu}{}_\alpha)\cdot(\vec e_{\bar\nu}L^{\bar\nu}{}_\beta) = L^{\bar\mu}{}_\alpha L^{\bar\nu}{}_\beta\,g_{\bar\mu\bar\nu}$; i.e.,

$$g_{\bar\mu\bar\nu}L^{\bar\mu}{}_\alpha L^{\bar\nu}{}_\beta = g_{\alpha\beta} , \quad \text{and similarly} \quad g_{\alpha\beta}L^\alpha{}_{\bar\mu}L^\beta{}_{\bar\nu} = g_{\bar\mu\bar\nu} . \tag{1.47b}$$

Any matrix whose elements satisfy these equations is a Lorentz transformation. Vectors and tensors are geometric, frame-independent objects. From this fact one can derive the Minkowski-space analogs of the Euclidean transformation laws for components (1.45a), (1.45b):

$$A^{\bar\mu} = L^{\bar\mu}{}_\alpha A^\alpha , \qquad T^{\bar\mu\bar\nu\bar\rho} = L^{\bar\mu}{}_\alpha L^{\bar\nu}{}_\beta L^{\bar\rho}{}_\gamma T^{\alpha\beta\gamma} , \tag{1.48a}$$

and similarly in the opposite direction. Notice that here, as elsewhere, these equations can be constructed by lining up indices in accord with our standard rules. Suppose (as is conventional) we choose the spacetime origins of the two Lorentz coordinate systems to coincide. Then the vector $\vec x$ extending from the origin to some event P, whose coordinates are $x^\alpha$ and $x^{\bar\alpha}$, has components equal to those coordinates. As a result, the transformation law for the coordinates takes the same form as that (1.48a) for components of a vector:

$$x^\alpha = L^\alpha{}_{\bar\mu}x^{\bar\mu} , \qquad x^{\bar\mu} = L^{\bar\mu}{}_\alpha x^\alpha . \tag{1.48b}$$

The product $[L^\alpha{}_{\bar\mu}L^{\bar\mu}{}_{\bar\rho}]$ of two Lorentz transformation matrices is a Lorentz transformation matrix. Under this product rule, the Lorentz transformations form a mathematical group, the Lorentz group, whose “representations” play an important role in quantum field theory. An important specific example of a Lorentz transformation is the following:

$$\|L^\alpha{}_{\bar\mu}\| = \begin{bmatrix} \gamma & \beta\gamma & 0 & 0 \\ \beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} , \qquad \|L^{\bar\mu}{}_\alpha\| = \begin{bmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} , \tag{1.49a}$$

where $\beta$ and $\gamma$ are related by

$$|\beta| < 1 , \qquad \gamma \equiv (1 - \beta^2)^{-1/2} . \tag{1.49b}$$

Fig. 1.12: Spacetime diagrams illustrating the pure boost (1.49c) from one Lorentz reference frame to another.

One can readily verify that these matrices are the inverses of each other and that they satisfy the Lorentz-transformation relation (1.47b). These transformation matrices produce the following change of coordinates [Eq. (1.48b)]:

$$\begin{aligned} t &= \gamma(\bar t + \beta\bar x) , & x &= \gamma(\bar x + \beta\bar t) , & y &= \bar y , & z &= \bar z , \\ \bar t &= \gamma(t - \beta x) , & \bar x &= \gamma(x - \beta t) , & \bar y &= y , & \bar z &= z . \end{aligned} \tag{1.49c}$$
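Here is a sketch checking the boost (1.49a)–(1.49c). It assumes NumPy; `boost_matrices` is a hypothetical helper, and $\beta = 0.6$ and the sample event are illustrative. The checks are:
- the two matrices are mutually inverse;
- they satisfy (1.47b);
- the interval is invariant;
- a point at rest in the unbarred frame moves with velocity $-\beta$ in the barred frame.

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def boost_matrices(beta):
    """Return (||L^alpha_mubar||, ||L^mubar_alpha||) of Eq. (1.49a) for a boost along x."""
    if abs(beta) >= 1.0:
        raise ValueError("Eq. (1.49b) requires |beta| < 1")
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L_up_bar = np.eye(4)                           # L^alpha_mubar
    L_up_bar[0, 0] = L_up_bar[1, 1] = gamma
    L_up_bar[0, 1] = L_up_bar[1, 0] = beta * gamma
    L_bar_up = L_up_bar.copy()                     # L^mubar_alpha: beta -> -beta
    L_bar_up[0, 1] = L_bar_up[1, 0] = -beta * gamma
    return L_up_bar, L_bar_up

beta = 0.6                                         # illustrative boost speed
L_up_bar, L_bar_up = boost_matrices(beta)

# Eq. (1.47b): g_{mubar nubar} L^mubar_alpha L^nubar_beta = g_{alpha beta}.
lorentz_check = L_bar_up.T @ eta @ L_bar_up

# Eqs. (1.48b)/(1.49c): barred coordinates of an illustrative event (t, x, y, z).
x_event = np.array([2.0, 1.0, 0.3, -0.7])
x_bar = L_bar_up @ x_event

# World line of a point at rest at x = 0.5 in the unbarred frame, seen in the barred frame.
ts = np.array([0.0, 1.0, 2.0])
events = np.vstack([ts, np.full(3, 0.5), np.zeros(3), np.zeros(3)])  # one event per column
events_bar = L_bar_up @ events
velocity_bar = np.diff(events_bar[1]) / np.diff(events_bar[0])       # d(xbar)/d(tbar)
```

With $\beta = 0.6$ ($\gamma = 1.25$), the event $(t, x, y, z) = (2, 1, 0.3, -0.7)$ maps to $(\bar t, \bar x, \bar y, \bar z) = (1.75, -0.25, 0.3, -0.7)$.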

These expressions reveal that any point at rest in the unbarred frame (a point with fixed, time-independent $x, y, z$) is seen in the barred frame to move along the world line $\bar x = \text{const} - \beta\bar t$, $\bar y = $ const, $\bar z = $ const. In other words, observers at rest in the barred frame see the unbarred frame move with uniform velocity $\vec v = -\beta\vec e_{\bar x}$. Correspondingly, observers at rest in the unbarred frame see the barred frame move with the opposite uniform velocity $\vec v = +\beta\vec e_x$. This special Lorentz transformation is called a pure boost along the $x$ direction.

1.7.3 [R] Spacetime Diagrams for Boosts

Figure 1.12 illustrates the pure boost (1.49c). Diagram (a) in that figure is a two-dimensional spacetime diagram, with the y- and z-coordinates suppressed, showing the t¯ and x¯ axes of the boosted Lorentz frame F¯ in the t, x Lorentz coordinate system of the unboosted frame F . That the barred axes make angles tan−1 β with the unbarred axes, as shown, can be inferred from the Lorentz transformation equation (1.49c). Note that invariance of the interval guarantees that the event x¯ = a on the x¯-axis lies at the intersection of that axis with the dashed hyperbola x2 − t2 = a2 ; and similarly, the event t¯ = a on the t¯-axis lies at the intersection of that axis with the dashed hyperbola t2 − x2 = a2 . As is shown in diagram (b) of the figure, the barred coordinates t¯, x¯ of an event P can be inferred by projecting from P onto the t¯- and x¯-axes, with the projection going parallel to the x¯- and t¯axes respectively. Diagram (c) shows the 4-velocity ~u of an observer at rest in frame F and

that, $\vec{\bar u}$, of an observer in frame $\bar F$. The events which observer F regards as all simultaneous, with time t = 0, lie in a 3-space that is orthogonal to $\vec u$ and includes the x-axis. This is the Euclidean 3-space of reference frame F and is also sometimes called F’s 3-space of simultaneity. Similarly, the events which observer $\bar F$ regards as all simultaneous, with $\bar t = 0$, live in the 3-space that is orthogonal to $\vec{\bar u}$ and includes the $\bar x$-axis. This is the Euclidean 3-space (3-space of simultaneity) of frame $\bar F$.

Exercise 1.14 uses spacetime diagrams, similar to Fig. 1.12, to deduce a number of important relativistic phenomena, including the contraction of the length of a moving object (“length contraction”), the breakdown of simultaneity as a universally agreed upon concept, and the dilation of the ticking rate of a moving clock (“time dilation”). This exercise is extremely important; every reader who is not already familiar with it should study it.

****************************
EXERCISES

Exercise 1.12 Derivation: [R] The Inverse of a Lorentz Boost
Show that, if the Lorentz coordinates of an inertial frame F are expressed in terms of those of the frame $\bar F$ by Eq. (1.49c), then the inverse transformation from F to $\bar F$ is given by the same equation with the sign of β reversed. Write down the corresponding transformation matrix $L^{\bar\mu}{}_\alpha$ [analog of Eq. (1.49a)].

Exercise 1.13 Problem: [R] Allowed and Forbidden Electron-Photon Reactions
Show, using spacetime diagrams and also using frame-independent calculations, that the law of conservation of 4-momentum forbids a photon to be absorbed by an electron, $e + \gamma \to e$, and also forbids an electron and a positron to annihilate and produce a single photon, $e^+ + e^- \to \gamma$ (in the absence of any other particles to take up some of the 4-momentum); but the annihilation to form two photons, $e^+ + e^- \to 2\gamma$, is permitted.
Exercise 1.14 **Example: [R] Spacetime Diagrams
Use spacetime diagrams to prove the following: (a) Two events that are simultaneous in one inertial frame are not necessarily simultaneous in another. More specifically, if frame $\bar F$ moves with velocity $\vec v = \beta\vec e_x$ as seen in frame F, where β > 0, then of two events that are simultaneous in $\bar F$ the one farther “back” (with the more negative value of $\bar x$) will occur in F before the one farther “forward”. (b) Two events that occur at the same spatial location in one inertial frame do not necessarily occur at the same spatial location in another. (c) If $\mathcal P_1$ and $\mathcal P_2$ are two events with a timelike separation, then there exists an inertial reference frame in which they occur at the same spatial location; and in that frame the time lapse between them is equal to the square root of the negative of their invariant interval, $\Delta t = \Delta\tau \equiv \sqrt{-\Delta s^2}$.

(d) If $\mathcal P_1$ and $\mathcal P_2$ are two events with a spacelike separation, then there exists an inertial reference frame in which they are simultaneous; and in that frame the spatial distance between them is equal to the square root of their invariant interval, $\sqrt{g_{ij}\Delta x^i\Delta x^j} = \Delta s \equiv \sqrt{\Delta s^2}$. (e) If the inertial frame $\bar F$ moves with speed β relative to the frame F, then a clock at rest in $\bar F$ ticks more slowly as viewed from F than as viewed from $\bar F$, more slowly by a factor $\gamma^{-1} = (1-\beta^2)^{1/2}$. This is called relativistic time dilation. (f) If the inertial frame $\bar F$ moves with velocity $\vec v = \beta\vec e_x$ relative to the frame F and the two frames are related by a pure boost, then an object at rest in $\bar F$ as studied in F appears shortened by a factor $\gamma^{-1} = (1-\beta^2)^{1/2}$ along the x direction, but its length along the y and z directions is unchanged. This is called Lorentz contraction.

Exercise 1.15 Example: [R] General Boosts and Rotations
(a) Show that, if $n^j$ is a 3-dimensional unit vector and β and γ are defined as in Eq. (1.49b), then the following is a Lorentz transformation; i.e., it satisfies Eq. (1.47b):

$$L^0{}_{\bar 0} = \gamma\,,\qquad L^0{}_{\bar j} = L^j{}_{\bar 0} = \beta\gamma n^j\,,\qquad L^j{}_{\bar k} = L^k{}_{\bar j} = (\gamma-1)n^jn^k + \delta^{jk}\,. \qquad (1.50)$$

Show, further, that this transformation is a pure boost along the direction n with speed β, and show that the inverse matrix $[L^{\bar\mu}{}_\alpha]$ for this boost is the same as $[L^\alpha{}_{\bar\mu}]$, but with β changed to −β. (b) Show that the following is also a Lorentz transformation:

$$\|L^\alpha{}_{\bar\mu}\| = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & & & \\ 0 & & [R^i{}_{\bar j}] & \\ 0 & & & \end{bmatrix}\,, \qquad (1.51)$$

where $[R^i{}_{\bar j}]$ is a three-dimensional rotation matrix for Euclidean 3-space. Show, further, that this Lorentz transformation rotates the inertial frame’s spatial axes (its latticework of measuring rods), while leaving the frame’s velocity unchanged; i.e., the new frame is at rest with respect to the old. One can show (not surprisingly) that the general Lorentz transformation [i.e., the general solution of Eqs. (1.47b)] can be expressed as a sequence of pure boosts, pure rotations, and pure inversions (in which one or more of the coordinate axes are reflected through the origin, so $x^\alpha = -x^{\bar\alpha}$).
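For readers who want to check parts (a) and (b) numerically before proving them, here is a sketch in Python with NumPy. `general_boost` builds the components (1.50); `rotation_z` is one hypothetical instance of (1.51), a rotation about the z axis. The test $L^{\rm T}\eta L = \eta$ is our reading of Eq. (1.47b). The velocity of the barred frame is read from the first column, $L^j{}_{\bar 0}/L^0{}_{\bar 0}$, which is the motion of the barred spatial origin as seen in the unbarred frame.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # Lorentz metric, c = 1


def general_boost(beta, n):
    """Components L[alpha, mu] of Eq. (1.50): speed beta along the direction n."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)  # Eq. (1.50) assumes a unit vector
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.empty((4, 4))
    L[0, 0] = gamma
    L[0, 1:] = beta * gamma * n
    L[1:, 0] = beta * gamma * n
    L[1:, 1:] = (gamma - 1.0) * np.outer(n, n) + np.eye(3)
    return L


def rotation_z(angle):
    """A transformation of the form (1.51), built from a rotation about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    L = np.eye(4)
    L[1:3, 1:3] = [[c, -s], [s, c]]
    return L


def is_lorentz(L):
    """True if L preserves the metric: L^T eta L = eta."""
    return np.allclose(L.T @ ETA @ L, ETA)


beta, n = 0.8, np.array([1.0, 2.0, 2.0]) / 3.0
Lb, Lr = general_boost(beta, n), rotation_z(0.3)
print(is_lorentz(Lb), is_lorentz(Lr), is_lorentz(Lr @ Lb))  # True True True
print(np.allclose(general_boost(-beta, n) @ Lb, np.eye(4)))   # True: beta -> -beta inverts
print(np.allclose(Lb[1:, 0] / Lb[0, 0], beta * n))            # True: barred frame moves with beta n
print(np.allclose(Lr[:, 0], [1, 0, 0, 0]))                    # True: rotated frame stays at rest
```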

****************************

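[Figure 1.13, image not reproduced: two spacetime diagrams with axes t and x. Panel (a) shows the world lines of Methuselah and Florence; panel (b) shows the world tubes of the two wormhole mouths. Proper-time tick marks (τc = 0 through 11 along the stationary world line, 0 through 7 along the traveling one) run along each world line or tube.]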

Fig. 1.13: (a) Spacetime diagram depicting the twins paradox. Marked along the two world lines are intervals of proper time as measured by the two twins. (b) Spacetime diagram depicting the motions of the two mouths of a wormhole. Marked along the mouths’ world tubes are intervals of proper time τc as measured by the single clock that sits on the common mouths.

1.8

[R] Time Travel

Time dilation is one facet of a more general phenomenon: Time, as measured by ideal clocks, is a “personal thing,” different for different observers who move through spacetime on different world lines. This is well illustrated by the infamous “twins paradox,” in which one twin, Methuselah, remains forever at rest in an inertial frame and the other, Florence, makes a spacecraft journey at high speed and then returns to rest beside Methuselah. The twins’ world lines are depicted in Fig. 1.13(a), a spacetime diagram whose axes are those of Methuselah’s inertial frame. The time measured by an ideal clock that Methuselah carries is the coordinate time t of his inertial frame; and its total time lapse, from Florence’s departure to her return, is $t_{\rm return} - t_{\rm departure} \equiv T_{\rm Methuselah}$. By contrast, the time measured by an ideal clock that Florence carries is her proper time τ, i.e. the square root of minus the invariant interval (1.7), accumulated along her world line; and thus her total time lapse from departure to return is

$$T_{\rm Florence} = \int d\tau = \int\sqrt{dt^2 - \delta_{ij}\,dx^i dx^j} = \int_0^{T_{\rm Methuselah}}\sqrt{1-v^2}\,dt\,. \qquad (1.52)$$

Here $(t, x^i)$ are the time and space coordinates of Methuselah’s inertial frame, and v is Florence’s ordinary speed, $v = \sqrt{\delta_{ij}(dx^i/dt)(dx^j/dt)}$, relative to Methuselah’s frame. Since $\sqrt{1-v^2} < 1$ whenever she is moving, Eq. (1.52) predicts that $T_{\rm Florence}$ is less than $T_{\rm Methuselah}$. In fact (cf. Exercise 1.16), even if Florence’s acceleration is kept no larger than one Earth gravity throughout her trip, and her trip lasts only $T_{\rm Florence}$ = (a few tens of years), $T_{\rm Methuselah}$ can be hundreds or thousands or millions or billions of years.

Does this mean that Methuselah actually “experiences” a far longer time lapse, and actually ages far more than Florence? Yes. The time experienced by humans and the aging

of the human body are governed by chemical processes, which in turn are governed by the natural oscillation rates of molecules, rates that are constant to high accuracy when measured in terms of ideal time (or, equivalently, proper time τ ). Therefore, a human’s experiential time and aging time are the same as the human’s proper time—so long as the human is not subjected to such high accelerations as to damage her body. In effect, then, Florence’s spacecraft has functioned as a time machine to carry her far into Methuselah’s future, with only a modest lapse of her own proper time (ideal time; experiential time; aging time). Is it also possible, at least in principle, for Florence to construct a time machine that carries her into Methuselah’s past—and also her own past? At first sight, the answer would seem to be Yes. Figure 1.13(b) shows one possible method, using a wormhole. [Papers on other methods are cited in Thorne (1993) and Friedman and Higuchi (2006).] Wormholes are hypothetical “handles” in the topology of space. A simple model of a wormhole can be obtained by taking a flat 3-dimensional space, removing from it the interiors of two identical spheres, and identifying the spheres’ surfaces so that if one enters the surface of one of the spheres, one immediately finds oneself exiting through the surface of the other. When this is done, there is a bit of strongly localized spatial curvature at the spheres’ common surface, so to analyze such a wormhole properly, one must use general relativity rather than special relativity. In particular, it is the laws of general relativity, combined with the laws of quantum field theory, that tell one how to construct such a wormhole and what kinds of materials (quantum fields) are required to “hold it open” so things can pass through it.
Unfortunately, despite considerable effort, theoretical physicists have not yet deduced definitively whether those laws permit such wormholes to exist.⁹ On the other hand, assuming such wormholes can exist, the following special relativistic analysis shows how one might be used to construct a machine for backward time travel.¹⁰

The two identified spherical surfaces are called the wormhole’s mouths. Ask Methuselah to keep one mouth with himself, forever at rest in his inertial frame, and ask Florence to take the other mouth with herself on her high-speed journey. The two mouths’ world tubes (analogs of world lines for a 3-dimensional object) then have the forms shown in Fig. 1.13(b). Suppose that a single ideal clock sits on the wormhole’s identified mouths, so that from the external Universe one sees it both on Methuselah’s wormhole mouth and on Florence’s. As seen on Methuselah’s mouth, the clock measures his proper time, which is equal to the coordinate time t [see tick marks along the left world tube in Fig. 1.13(b)]. As seen on Florence’s mouth, the clock measures her proper time, Eq. (1.52) [see tick marks along the right world tube in Fig. 1.13(b)]. The result should be obvious, if surprising: When Florence returns to rest beside Methuselah, the wormhole has become a time machine. If she travels through the wormhole when the clock reads τc = 7, she goes backward in time as seen in Methuselah’s (or anyone else’s) inertial frame; and then, in fact, traveling along the everywhere timelike, dashed world line, she is able to meet her younger self before she entered the wormhole.

This scenario is profoundly disturbing to most physicists because of the dangers of science-fiction-type paradoxes (e.g., the older Florence might kill her younger self, thereby preventing herself from making the trip through the wormhole and killing herself). Fortunately perhaps, it now seems moderately likely (though not certain) that vacuum fluctuations of quantum fields will destroy the wormhole at the moment when its mouths’ motion first makes backward time travel possible; and it may be that this mechanism will always prevent the construction of backward-travel time machines, no matter what tools one uses for their construction.¹¹

⁹ See, e.g., Morris and Thorne (1987), Thorne (1993), Borde, Ford and Roman (2002), and references therein.
¹⁰ Morris, Thorne, and Yurtsever (1988).

****************************
EXERCISES

Exercise 1.16 Example: [R] Twins Paradox
(a) The 4-acceleration of a particle or other object is defined by $\vec a \equiv d\vec u/d\tau$, where $\vec u$ is its 4-velocity and τ is proper time along its world line. Show that, if an observer carries an accelerometer, the magnitude of the acceleration a measured by the accelerometer will always be equal to the magnitude of the observer’s 4-acceleration, $a = |\vec a| \equiv \sqrt{\vec a\cdot\vec a}$.
(b) In the twins paradox of Fig. 1.13(a), suppose that Florence begins at rest beside Methuselah, then accelerates in Methuselah’s x-direction with an acceleration a equal to one Earth gravity, “1g”, for a time $T_{\rm Florence}/4$ as measured by her, then accelerates in the −x-direction at 1g for a time $T_{\rm Florence}/2$, thereby reversing her motion, and then accelerates in the +x-direction at 1g for a time $T_{\rm Florence}/4$, thereby returning to rest beside Methuselah. (This is the type of motion shown in the figure.) Show that the total time lapse as measured by Methuselah is

$$T_{\rm Methuselah} = \frac{4}{g}\sinh\!\left(\frac{g\,T_{\rm Florence}}{4}\right). \qquad (1.53)$$

(c) Show that in the geometrized units used here, Florence’s acceleration (equal to the acceleration of gravity at the surface of the Earth) is g = 1.033/yr. Plot $T_{\rm Methuselah}$ as a function of $T_{\rm Florence}$, and from your plot deduce that, if $T_{\rm Florence}$ is several tens of years, then $T_{\rm Methuselah}$ can be hundreds or thousands or millions or even billions of years.
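A numerical sketch of parts (b) and (c) in Python with NumPy, assuming the value g = 1.033/yr quoted in part (c). It introduces Florence’s rapidity θ (an auxiliary variable not used in the text, defined by v = tanh θ, so that $d\tau = \sqrt{1-v^2}\,dt$ becomes $dt = \cosh\theta\,d\tau$), lets θ change at the rate ±g in the three phases of part (b), integrates Eq. (1.52) in that form, and compares with Eq. (1.53). Function names and the step count are our own choices.

```python
import numpy as np

G_PER_YEAR = 1.033  # one Earth gravity in units with c = 1, per year [part (c)]


def methuselah_time_closed_form(T_florence, g=G_PER_YEAR):
    """Eq. (1.53): T_Methuselah = (4/g) sinh(g T_Florence / 4)."""
    return 4.0 / g * np.sinh(g * T_florence / 4.0)


def methuselah_time_numeric(T_florence, g=G_PER_YEAR, steps=200_000):
    """Integrate dt = cosh(theta) dtau over Florence's trip (trapezoidal rule).

    theta(tau) is her rapidity, v = tanh(theta), so dtau = sqrt(1 - v^2) dt as in
    Eq. (1.52). Constant proper acceleration +g or -g means d(theta)/d(tau) = +g or -g.
    """
    tau = np.linspace(0.0, T_florence, steps + 1)
    q = T_florence / 4.0
    theta = np.where(tau < q, g * tau,                 # +g for T/4
                     np.where(tau < 3 * q,
                              g * (2 * q - tau),       # -g for T/2
                              g * (tau - 4 * q)))      # +g for the last T/4
    dt_dtau = np.cosh(theta)
    return np.sum(0.5 * (dt_dtau[1:] + dt_dtau[:-1]) * np.diff(tau))


for T_F in (1.0, 10.0, 20.0, 40.0):
    print(f"T_Florence = {T_F:5.1f} yr:  T_Methuselah = "
          f"{methuselah_time_closed_form(T_F):.4g} yr (closed form), "
          f"{methuselah_time_numeric(T_F):.4g} yr (numerical)")
```

Plotting `methuselah_time_closed_form` on a logarithmic axis over 0 to 80 years of Florence’s time shows the exponential growth that part (c) asks you to deduce.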
Exercise 1.17 Challenge: [R] Around the World on TWA
In a long-ago era when an airline named Trans World Airlines (TWA) flew around the world, J. C. Hafele and R. E. Keating carried out a real live twins paradox experiment: They synchronized two atomic clocks, and then flew one around the world eastward on TWA, and on a separate trip, around the world westward, while the other clock remained at home at the Naval Research Laboratory near Washington D.C. When the clocks were compared after each trip, they were found to have aged differently. Making reasonable estimates for the airplane routing and speeds, compute the difference in aging, and compare your result with the experimental data (Hafele and Keating, 1972). [Note: The rotation of the Earth is important, as is the general relativistic gravitational redshift associated with the clocks’ altitudes; but the gravitational redshift drops out of the difference in aging, if the time spent at high altitude is the same eastward as westward.]

****************************

¹¹ Kim and Thorne (1991), Hawking (1992), Thorne (1993). But see also contrary indications in more recent research reviewed by Friedman and Higuchi (2006).

1.9

[N & R] Directional Derivatives, Gradients, Levi-Civita Tensor, Cross Product and Curl

[See note at the beginning of Sec. 1.5.3.] Let us return to the formalism of differential geometry. We shall use the vector notation $\vec A$ of Minkowski spacetime, but our discussion will be valid simultaneously for spacetime and for Euclidean 3-space. Consider a tensor field T(P) in spacetime or 3-space and a vector $\vec A$. We define the directional derivative of T along $\vec A$ by the obvious limiting procedure

$$\nabla_{\vec A}\mathbf T \equiv \lim_{\epsilon\to 0}\frac{1}{\epsilon}\left[\mathbf T(\vec x_{\mathcal P} + \epsilon\vec A) - \mathbf T(\vec x_{\mathcal P})\right] \qquad (1.54a)$$

and similarly for the directional derivative of a vector field $\vec B(\mathcal P)$ and a scalar field ψ(P). In this definition we have denoted points, e.g. P, by the vector $\vec x_{\mathcal P}$ that reaches from some arbitrary origin to the point. It should not be hard to convince oneself that the directional derivative of any tensor field T is linear in the vector $\vec A$ along which one differentiates. Correspondingly, if T has rank n (n slots), then there is another tensor field, denoted ∇T, with rank n + 1, such that

$$\nabla_{\vec A}\mathbf T = \nabla\mathbf T(\ ,\ ,\ ,\vec A)\,. \qquad (1.54b)$$

Here on the right side the first n slots (3 in the case shown) are left empty, and $\vec A$ is put into the last slot (the “differentiation slot”). The quantity ∇T is called the gradient of T. In slot-naming index notation, it is conventional to denote this gradient by $T_{\alpha\beta\gamma;\delta}$, where in general the number of indices preceding the semicolon is the rank of T. Using this notation, the directional derivative of T along $\vec A$ reads [cf. Eq. (1.54b)] $T_{\alpha\beta\gamma;\delta}A^\delta$.

It is not hard to show that in any orthonormal (i.e., Cartesian or Lorentz) coordinate system, the components of the gradient are nothing but the partial derivatives of the components of the original tensor,

$$T_{\alpha\beta\gamma;\delta} = \frac{\partial T_{\alpha\beta\gamma}}{\partial x^\delta} \equiv T_{\alpha\beta\gamma,\delta}\,. \qquad (1.54c)$$

(Here and henceforth all indices that follow a subscript comma represent partial derivatives, e.g., $S_{\alpha,\mu\nu} \equiv \partial^2 S_\alpha/\partial x^\mu\partial x^\nu$.) In a non-Cartesian and non-Lorentz basis, the components of the gradient typically are not obtained by simple partial differentiation [Eq. (1.54c) fails]

because of turning and length changes of the basis vectors as we go from one location to another. In Sec. 10.3 we shall learn how to deal with this by using objects called connection coefficients. Until then, however, we shall confine ourselves to Cartesian and Lorentz bases, so subscript semicolons and subscript commas can be used interchangeably.

Because the gradient and the directional derivatives are defined by the same standard limiting process as one uses when defining elementary derivatives, they obey the standard Leibniz rule for differentiating products:

$$\nabla_{\vec A}(\mathbf S\otimes\mathbf T) = (\nabla_{\vec A}\mathbf S)\otimes\mathbf T + \mathbf S\otimes\nabla_{\vec A}\mathbf T\,,\quad\text{i.e.,}\quad (S^{\alpha\beta}T^{\gamma\delta\epsilon})_{;\mu}A^\mu = (S^{\alpha\beta}{}_{;\mu}A^\mu)T^{\gamma\delta\epsilon} + S^{\alpha\beta}(T^{\gamma\delta\epsilon}{}_{;\mu}A^\mu)\,; \qquad (1.55a)$$

and

$$\nabla_{\vec A}(f\mathbf T) = (\nabla_{\vec A}f)\mathbf T + f\,\nabla_{\vec A}\mathbf T\,,\quad\text{i.e.,}\quad (fT^{\alpha\beta\gamma})_{;\mu}A^\mu = (f_{,\mu}A^\mu)T^{\alpha\beta\gamma} + fT^{\alpha\beta\gamma}{}_{;\mu}A^\mu\,. \qquad (1.55b)$$

In an orthonormal basis these relations should be obvious: They follow from the Leibniz rule for partial derivatives.

Because the components $g_{\alpha\beta}$ of the metric tensor are constant in any Lorentz or Cartesian coordinate system, Eq. (1.54c) (which is valid in such coordinates) guarantees that $g_{\alpha\beta;\gamma} = 0$; i.e., the metric has vanishing gradient:

$$\nabla\mathbf g = 0\,,\quad\text{i.e.,}\quad g_{\alpha\beta;\mu} = 0\,. \qquad (1.56)$$
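The definitions (1.54a)–(1.55b) can be checked numerically. The sketch below, in Python with NumPy, uses a hypothetical rank-2 field $T_{ij} = x_ix_j$ on Euclidean 3-space (Cartesian coordinates, so commas and semicolons agree), whose gradient $T_{ij;k} = \delta_{ik}x_j + x_i\delta_{jk}$ is known exactly. It shows the difference quotient of (1.54a) converging to $T_{ij;k}A^k$ as ε shrinks, then checks the Leibniz rule (1.55b) with the sample scalar $f = \mathbf x\cdot\mathbf x$.

```python
import numpy as np


def T(x):
    """Sample rank-2 tensor field T_ij = x_i x_j (Cartesian components)."""
    return np.outer(x, x)


def grad_T(x):
    """Exact gradient T_{ij;k} = delta_ik x_j + x_i delta_jk, indexed [i, j, k]."""
    d = np.eye(3)
    return np.einsum('ik,j->ijk', d, x) + np.einsum('i,jk->ijk', x, d)


def f(x):
    """Sample scalar field f = x . x, whose gradient is 2 x."""
    return x @ x


def directional_derivative(field, x, A, eps):
    """Difference quotient of Eq. (1.54a) at a finite step eps."""
    return (field(x + eps * A) - field(x)) / eps


x = np.array([1.0, -2.0, 0.5])
A = np.array([0.3, 0.1, -1.2])
exact = np.einsum('ijk,k->ij', grad_T(x), A)  # T_{ij;k} A^k, Eq. (1.54b)

for eps in (1e-2, 1e-4, 1e-6):
    err = np.abs(directional_derivative(T, x, A, eps) - exact).max()
    print(f"eps = {eps:.0e}: max error {err:.1e}")  # error falls roughly in proportion to eps

# Leibniz rule (1.55b): grad_A (f T) = (grad_A f) T + f grad_A T
lhs = directional_derivative(lambda y: f(y) * T(y), x, A, 1e-7)
rhs = (2.0 * x @ A) * T(x) + f(x) * exact
print(np.allclose(lhs, rhs, atol=1e-5))  # True
```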

From the gradient of any vector or tensor we can construct several other important derivatives by contracting on indices: (i) Since the gradient $\nabla\vec A$ of a vector field $\vec A$ has two slots, $\nabla\vec A(\ ,\ )$, we can strangle (contract) its slots on each other to obtain a scalar field. That scalar field is the divergence of $\vec A$ and is denoted

$$\nabla\cdot\vec A \equiv (\text{contraction of } \nabla\vec A) = A^\alpha{}_{;\alpha}\,. \qquad (1.57)$$

(ii) Similarly, if T is a tensor field of rank three, then $T^{\alpha\beta\gamma}{}_{;\gamma}$ is its divergence on its third slot, and $T^{\alpha\beta\gamma}{}_{;\beta}$ is its divergence on its second slot. (iii) By taking the double gradient and then contracting on the two gradient slots we obtain, from any tensor field T, a new tensor field with the same rank,

$$\nabla^2\mathbf T \equiv (\nabla\cdot\nabla)\mathbf T\,,\quad\text{or, in index notation,}\quad T_{\alpha\beta\gamma;\mu}{}^{;\mu}\,. \qquad (1.58)$$

In Euclidean space $\nabla^2$ is called the Laplacian; in spacetime it is called the d’Alembertian. The metric tensor is a fundamental property of the space in which it lives; it embodies the inner product and thence the space’s notion of distance or interval and thence the space’s geometry. In addition to the metric, there is one (and only one) other fundamental tensor that embodies a piece of the space’s geometry: the Levi-Civita tensor ε. The Levi-Civita tensor has a number of slots equal to the dimensionality N of the space in which it lives, 4 slots in 4-dimensional spacetime and 3 slots in 3-dimensional Euclidean space; and ε is antisymmetric in each and every pair of its slots. These properties determine ε uniquely up to a multiplicative constant. That constant is fixed by a compatibility relation

between ε and the metric g: If $\{\vec e_\alpha\}$ is an orthonormal basis [orthonormality being defined with the aid of the metric, $\vec e_\alpha\cdot\vec e_\beta = \mathbf g(\vec e_\alpha,\vec e_\beta) = \eta_{\alpha\beta}$ in spacetime and $= \delta_{\alpha\beta}$ in Euclidean space], and if this basis is right-handed (a new property, not determined by the metric), then

$$\epsilon(\vec e_1,\vec e_2,\ldots,\vec e_N) = +1 \ \text{in a space of } N \text{ dimensions;}\qquad \epsilon(\vec e_0,\vec e_1,\vec e_2,\vec e_3) = +1 \ \text{in spacetime.} \qquad (1.59a)$$

The concept of right handedness should be familiar in Euclidean 2-space or 3-space. In spacetime, the basis is right handed if $\{\vec e_1,\vec e_2,\vec e_3\}$ is right handed and $\vec e_0$ points to the future. Equation (1.59a) and the antisymmetry of ε imply that in an orthonormal, right-handed basis, the covariant components of ε are

$$\epsilon_{12\ldots N} = +1\,,\qquad \epsilon_{\alpha\beta\ldots\nu} = \begin{cases} +1 & \text{if } \alpha,\beta,\ldots,\nu \text{ is an even permutation of } 1,2,\ldots,N\,,\\ -1 & \text{if } \alpha,\beta,\ldots,\nu \text{ is an odd permutation of } 1,2,\ldots,N\,,\\ 0 & \text{if } \alpha,\beta,\ldots,\nu \text{ are not all different.} \end{cases} \qquad (1.59b)$$

(In spacetime the indices run from 0 to 3 rather than 1 to N = 4.) One can show that these components in one right-handed orthonormal frame imply these same components in all other right-handed orthonormal frames by virtue of the fact that the orthogonal (3-space) and Lorentz (spacetime) transformation matrices linking such frames have unit determinant; and that in a left-handed orthonormal frame the signs of these components are reversed. In 3-dimensional Euclidean space, the Levi-Civita tensor is used to define the cross product and the curl:

$$\mathbf A\times\mathbf B \equiv \epsilon(\ ,\mathbf A,\mathbf B)\,,\quad\text{i.e., in slot-naming index notation,}\quad \epsilon_{ijk}A_jB_k\,; \qquad (1.60a)$$

$$\nabla\times\mathbf A \equiv (\text{the vector field whose slot-naming index form is } \epsilon_{ijk}A_{k;j})\,. \qquad (1.60b)$$

[Equation (1.60b) is an example of an expression that is complicated if written in index-free notation; it says that ∇ × A is the double contraction of the rank-5 tensor ε ⊗ ∇A on its second and fifth slots, and on its third and fourth slots.]

Although Eqs. (1.60a) and (1.60b) look like complicated ways to deal with concepts that most readers regard as familiar and elementary, they have great power. The power comes from the following property of the Levi-Civita tensor in Euclidean 3-space [readily derivable from its components (1.59b)]:

$$\epsilon_{ijm}\epsilon_{klm} = \delta^{ij}_{kl} \equiv \delta^i_k\delta^j_l - \delta^i_l\delta^j_k\,. \qquad (1.61)$$

Here $\delta^i_k$ is the Kronecker delta. Examine the 4-index delta function $\delta^{ij}_{kl}$ carefully; it says that either the indices above and below each other must be the same (i = k and j = l) with a + sign, or the diagonally related indices must be the same (i = l and j = k) with a − sign. [We have put the indices ij of $\delta^{ij}_{kl}$ up solely to facilitate remembering this rule. Recall (first paragraph of Sec. 1.5) that in Euclidean space and Cartesian coordinates, it does not matter whether indices are up or down.] With the aid of Eq. (1.61) and the index-notation expressions for the cross product and curl, one can quickly and easily derive a wide variety of useful vector identities; see the very important Exercise 1.18.
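A sketch in Python with NumPy that builds the components (1.59b) directly from permutation parity and then confirms, numerically, the identity (1.61), the index forms (1.60a) and (1.60b) of the cross product and curl, and the statement that the components survive a proper rotation but change sign under a reflection. `levi_civita`, `curl`, and the sample field `swirl` are hypothetical names introduced for illustration; the curl uses central differences, which are exact (up to rounding) for the linear field chosen.

```python
import itertools

import numpy as np


def levi_civita(n):
    """Covariant components of eps in a right-handed orthonormal basis, Eq. (1.59b)."""
    eps = np.zeros((n,) * n)
    for perm in itertools.permutations(range(n)):
        inversions = sum(perm[a] > perm[b] for a in range(n) for b in range(a + 1, n))
        eps[perm] = -1.0 if inversions % 2 else 1.0  # parity of the permutation
    return eps


EPS = levi_civita(3)
DELTA = np.eye(3)

# Identity (1.61): eps_ijm eps_klm = delta_ik delta_jl - delta_il delta_jk
lhs = np.einsum('ijm,klm->ijkl', EPS, EPS)
rhs = np.einsum('ik,jl->ijkl', DELTA, DELTA) - np.einsum('il,jk->ijkl', DELTA, DELTA)
print(np.array_equal(lhs, rhs))  # True

# Cross product (1.60a): (A x B)_i = eps_ijk A_j B_k
A, B = np.array([1.0, 2.0, 3.0]), np.array([-1.0, 0.5, 4.0])
print(np.allclose(np.einsum('ijk,j,k->i', EPS, A, B), np.cross(A, B)))  # True


def curl(field, x, h=1e-5):
    """Curl (1.60b), eps_ijk A_{k,j}, with A_{k,j} from central differences."""
    jac = np.array([(field(x + h * e) - field(x - h * e)) / (2 * h) for e in DELTA]).T
    return np.einsum('ijk,kj->i', EPS, jac)  # jac[k, j] = dA_k/dx_j


def swirl(x):
    """Sample field A = (-y, x, 0), whose curl is (0, 0, 2)."""
    return np.array([-x[1], x[0], 0.0])


print(np.allclose(curl(swirl, np.array([0.3, -0.7, 1.1])), [0.0, 0.0, 2.0]))  # True


def transform(M):
    """Components eps'_ijk = M_ia M_jb M_kc eps_abc after a change of orthonormal basis."""
    return np.einsum('ia,jb,kc,abc->ijk', M, M, M, EPS)


c, s = np.cos(0.7), np.sin(0.7)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # rotation, det = +1
P = np.diag([1.0, 1.0, -1.0])                                 # reflection, det = -1
print(np.allclose(transform(R), EPS), np.allclose(transform(P), -EPS))  # True True
```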


****************************
EXERCISES

Exercise 1.18 **Example and Practice: [N] Vectorial Identities for the Cross Product and Curl
Here is an example of how to use index notation to derive a vector identity for the double cross product A × (B × C): In index notation this quantity is $\epsilon_{ijk}A_j(\epsilon_{klm}B_lC_m)$. By permuting the indices on the second ε and then invoking Eq. (1.61), we can write this as $\epsilon_{ijk}\epsilon_{lmk}A_jB_lC_m = \delta^{ij}_{lm}A_jB_lC_m$. By then invoking the meaning (1.61) of the 4-index delta function, we bring this into the form $A_jB_iC_j - A_jB_jC_i$, which is the index-notation form of (A · C)B − (A · B)C. Thus, it must be that A × (B × C) = (A · C)B − (A · B)C. Use similar techniques to evaluate the following quantities:
(a) ∇ × (∇ × A)
(b) (A × B) · (C × D)
(c) (A × B) × (C × D)
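The three steps of the worked example can be mirrored term by term with `numpy.einsum`, which sums over repeated index letters exactly as the index notation does. This is a numerical spot check with random vectors (the seed is arbitrary), not a proof; the same pattern can be used to test answers to parts (b) and (c).

```python
import itertools

import numpy as np

# Levi-Civita components in 3-space from permutation parity, Eq. (1.59b)
EPS = np.zeros((3, 3, 3))
for p in itertools.permutations(range(3)):
    inversions = sum(p[a] > p[b] for a in range(3) for b in range(a + 1, 3))
    EPS[p] = -1.0 if inversions % 2 else 1.0
D = np.eye(3)

rng = np.random.default_rng(seed=1)
A, B, C = rng.normal(size=(3, 3))

# Step 1: eps_ijk A_j (eps_klm B_l C_m)
step1 = np.einsum('ijk,j,klm,l,m->i', EPS, A, EPS, B, C)
# Step 2: delta^{ij}_{lm} A_j B_l C_m, with delta^{ij}_{lm} = delta_il delta_jm - delta_im delta_jl
delta4 = np.einsum('il,jm->ijlm', D, D) - np.einsum('im,jl->ijlm', D, D)
step2 = np.einsum('ijlm,j,l,m->i', delta4, A, B, C)
# Step 3: (A . C) B - (A . B) C
step3 = (A @ C) * B - (A @ B) * C

print(np.allclose(step1, np.cross(A, np.cross(B, C))),
      np.allclose(step1, step2), np.allclose(step2, step3))  # True True True
```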

****************************

1.10

[R] Nature of Electric and Magnetic Fields; Maxwell’s Equations

Fig. 1.14: The electric and magnetic fields measured by an observer with 4-velocity $\vec w$, shown as 4-vectors $\vec E_{\vec w}$ and $\vec B_{\vec w}$ that lie in the observer’s 3-surface of simultaneity (stippled 3-surface orthogonal to $\vec w$). [Image not reproduced; the diagram has axes t, x, y.]

Now that we have introduced the gradient and the Levi-Civita tensor, we are prepared to study the relationship of the relativistic version of electrodynamics to the nonrelativistic (“Newtonian”) version. Consider a particle with charge q, rest mass m and 4-velocity $\vec u$ interacting with an electromagnetic field $\mathbf F(\ ,\ )$. In index notation, the electromagnetic 4-force acting on the particle [Eq. (1.28)] is

$$dp^\alpha/d\tau = qF^{\alpha\beta}u_\beta\,. \qquad (1.62)$$

Let us examine this 4-force in some arbitrary inertial reference frame in which the particle’s ordinary-velocity components are $v^j = v_j$ and its 4-velocity components are $u^0 = \gamma$, $u^j = \gamma v^j$ [Eqs. (1.34c)]. Anticipating the connection with the nonrelativistic viewpoint, we introduce the following notation for the contravariant components of the antisymmetric electromagnetic field tensor:

$$F^{0j} = -F^{j0} = E_j\,,\qquad F^{ij} = \epsilon_{ijk}B_k\,. \qquad (1.63)$$

(Recall that spatial indices, being Euclidean, can be placed up or down freely with no change in sign of the indexed quantity.) Inserting these components of F and $\vec u$ into Eq. (1.62) (note that lowering the time index gives $u_0 = -\gamma$) and using the relationship $dt/d\tau = u^0 = \gamma$ between t and τ derivatives, we obtain for the components of the 4-force $dp^j/d\tau = \gamma\,dp^j/dt = q\gamma(E_j + \epsilon_{jkl}v_kB_l)$ and $dp^0/d\tau = \gamma\,dp^0/dt = q\gamma E_jv_j$. Dividing by γ, converting into 3-space index notation, and denoting the particle’s energy by $\mathcal E = p^0$, we bring these into the familiar Lorentz-force form

$$d\mathbf p/dt = q(\mathbf E + \mathbf v\times\mathbf B)\,,\qquad d\mathcal E/dt = q\,\mathbf v\cdot\mathbf E\,. \qquad (1.64)$$
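The step from Eq. (1.62) to Eq. (1.64) is easy to spot-check numerically. The sketch below (Python with NumPy, metric signature −+++ and c = 1 as in the text) builds $F^{\alpha\beta}$ from Eq. (1.63), lowers the index of $u^\beta$ with the metric, and compares $qF^{\alpha\beta}u_\beta/\gamma$ with the right-hand sides of (1.64). The field values, charge, and velocity are arbitrary sample numbers.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # metric signature (-,+,+,+), c = 1


def field_tensor(E, B):
    """Contravariant F^{alpha beta} of Eq. (1.63): F^{0j} = E_j, F^{ij} = eps_ijk B_k."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return np.array([[0.0, Ex, Ey, Ez],
                     [-Ex, 0.0, Bz, -By],
                     [-Ey, -Bz, 0.0, Bx],
                     [-Ez, By, -Bx, 0.0]])


def four_force(q, E, B, v):
    """dp^alpha/dtau = q F^{alpha beta} u_beta, Eq. (1.62), for ordinary velocity v."""
    v = np.asarray(v, dtype=float)
    gamma = 1.0 / np.sqrt(1.0 - v @ v)
    u_up = gamma * np.concatenate(([1.0], v))  # u^0 = gamma, u^j = gamma v^j
    u_down = ETA @ u_up                         # lower the index with the metric
    return q * field_tensor(E, B) @ u_down


q = 2.0
E = np.array([1.0, -0.5, 0.3])
B = np.array([0.2, 0.4, -1.0])
v = np.array([0.3, 0.1, -0.4])
gamma = 1.0 / np.sqrt(1.0 - v @ v)
f = four_force(q, E, B, v)

# Divide by gamma (= dt/dtau) and compare with Eq. (1.64).
print(np.isclose(f[0] / gamma, q * v @ E))                   # True: dE/dt = q v.E
print(np.allclose(f[1:] / gamma, q * (E + np.cross(v, B))))  # True: dp/dt = q(E + v x B)
```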

Evidently E is the electric field and B the magnetic field as measured in our chosen Lorentz frame. This may be familiar from standard electrodynamics textbooks, e.g. Jackson (1999). Not so familiar, but quite important, is the following geometric interpretation of E and B: The electric and magnetic fields E and B are spatial vectors as measured in the chosen inertial frame. We can also regard them as 4-vectors that lie in the 3-surface of simultaneity t = const of the chosen frame, i.e. that are orthogonal to the 4-velocity (denote it $\vec w$) of the frame’s observers (cf. Figs. 1.12 and 1.14). We shall denote this 4-vector version of E and B by $\vec E_{\vec w}$ and $\vec B_{\vec w}$, where the subscript $\vec w$ identifies the 4-velocity of the observers who measure these fields. These fields are depicted in Fig. 1.14.

In the rest frame of the observer $\vec w$, the components of $\vec E_{\vec w}$ are $E^0_{\vec w} = 0$, $E^j_{\vec w} = E_j$ [the $E_j$ appearing in Eqs. (1.63)], and similarly for $\vec B_{\vec w}$; and the components of $\vec w$ are $w^0 = 1$, $w^j = 0$. Therefore, in this frame Eqs. (1.63) can be rewritten as

$$E^\alpha_{\vec w} = F^{\alpha\beta}w_\beta\,,\qquad B^\beta_{\vec w} = \tfrac{1}{2}\epsilon^{\alpha\beta\gamma\delta}F_{\gamma\delta}\,w_\alpha\,. \qquad (1.65a)$$

(To verify this, insert the above components of F and $\vec w$ into these equations and, after some algebra, recover Eqs. (1.63) along with $E^0_{\vec w} = B^0_{\vec w} = 0$.) Equations (1.65a) say that in one special reference frame, that of the observer $\vec w$, the components of the 4-vectors on the left and on the right are equal. Because both sides transform in the same way under Lorentz transformations, this implies that in every Lorentz frame the components of these 4-vectors will be equal; i.e., it implies that Eqs. (1.65a) are true when one regards them as geometric, frame-independent equations written in slot-naming index notation. These equations enable one to compute the electric and magnetic fields measured by an observer (viewed as 4-vectors in the observer’s 3-surface of simultaneity) from the observer’s 4-velocity and the electromagnetic field tensor, without the aid of any basis or reference frame.

Equations (1.65a) embody explicitly the following important fact: Although the electromagnetic field tensor F is a geometric, frame-independent quantity, the electric and magnetic fields $\vec E_{\vec w}$ and $\vec B_{\vec w}$ individually depend for their existence on a specific choice of observer (with 4-velocity $\vec w$), i.e., a specific choice of inertial reference frame, i.e., a specific choice of the split of spacetime into a 3-space (the 3-surface of simultaneity orthogonal to the observer’s 4-velocity $\vec w$) and corresponding time (the Lorentz time of the observer’s reference frame). Only after making such an observer-dependent “3+1 split” of spacetime into space plus time do the electric field and the magnetic field come into existence as separate entities. Different observers with different 4-velocities $\vec w$ make this spacetime split in different ways, thereby resolving the frame-independent F into different electric and magnetic fields $\vec E_{\vec w}$ and $\vec B_{\vec w}$.

By the same procedure as we used to derive Eqs. (1.65a), one can derive the inverse relationship, the following expression for the electromagnetic field tensor in terms of the (4-vector) electric and magnetic fields measured by some observer:

$$F^{\alpha\beta} = w^\alpha E^\beta_{\vec w} - E^\alpha_{\vec w}w^\beta + \epsilon^{\alpha\beta}{}_{\gamma\delta}\,w^\gamma B^\delta_{\vec w}\,. \qquad (1.65b)$$
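Equations (1.65a) and (1.65b) can be exercised numerically. The sketch below (Python with NumPy) assumes $\epsilon_{0123} = +1$ as in Eq. (1.59a), so that raising all four indices with the metric gives $\epsilon^{0123} = -1$. It computes $\vec E_{\vec w}$ and $\vec B_{\vec w}$ from F for a static observer and for a moving one, checks that both are orthogonal to $\vec w$, and rebuilds F from them via (1.65b). Helper names are hypothetical. As an extra check, $\vec E_{\vec w}\cdot\vec E_{\vec w} - \vec B_{\vec w}\cdot\vec B_{\vec w}$ and $\vec E_{\vec w}\cdot\vec B_{\vec w}$ should be the same for every observer, since they are the standard Lorentz invariants of the field.

```python
import itertools

import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # eta_{ab}; numerically its own inverse


def parity(p):
    """+1.0 for an even permutation of 0..n-1, -1.0 for an odd one."""
    n = len(p)
    inversions = sum(p[a] > p[b] for a in range(n) for b in range(a + 1, n))
    return -1.0 if inversions % 2 else 1.0


EPS_DOWN = np.zeros((4, 4, 4, 4))  # eps_{abcd}, with eps_{0123} = +1 (Eq. 1.59a)
for p in itertools.permutations(range(4)):
    EPS_DOWN[p] = parity(p)
EPS_UP = np.einsum('am,bn,cr,ds,mnrs->abcd', ETA, ETA, ETA, ETA, EPS_DOWN)  # eps^{0123} = -1
EPS_MIXED = np.einsum('am,bn,mncd->abcd', ETA, ETA, EPS_DOWN)               # eps^{ab}_{cd}
EPS3 = EPS_DOWN[0, 1:, 1:, 1:]  # eps_{0ijk} = eps_{ijk}, the 3-space Levi-Civita tensor


def field_tensor(E, B):
    """Contravariant F^{ab} from Eq. (1.63)."""
    F = np.zeros((4, 4))
    F[0, 1:] = E
    F[1:, 0] = -np.asarray(E)
    F[1:, 1:] = np.einsum('ijk,k->ij', EPS3, B)
    return F


def observer_fields(F_up, w_up):
    """E_w and B_w as 4-vectors, Eq. (1.65a)."""
    w_down = ETA @ w_up
    F_down = ETA @ F_up @ ETA
    E_w = F_up @ w_down                                            # E^a = F^{ab} w_b
    B_w = 0.5 * np.einsum('abcd,cd,a->b', EPS_UP, F_down, w_down)  # B^b = (1/2) eps^{abcd} F_cd w_a
    return E_w, B_w


def rebuild_field_tensor(E_w, B_w, w_up):
    """F^{ab} = w^a E^b - E^a w^b + eps^{ab}_{cd} w^c B^d, Eq. (1.65b)."""
    return (np.outer(w_up, E_w) - np.outer(E_w, w_up)
            + np.einsum('abcd,c,d->ab', EPS_MIXED, w_up, B_w))


def four_velocity(v):
    v = np.asarray(v, dtype=float)
    return np.concatenate(([1.0], v)) / np.sqrt(1.0 - v @ v)


def dot(a, b):
    """Minkowski inner product eta_{ab} a^a b^b."""
    return a @ ETA @ b


E, B = np.array([1.0, -0.5, 0.3]), np.array([0.2, 0.4, -1.0])
F = field_tensor(E, B)

# Observer at rest in this frame: recovers (0, E) and (0, B).
E_rest, B_rest = observer_fields(F, four_velocity([0.0, 0.0, 0.0]))
print(np.allclose(E_rest, np.r_[0.0, E]), np.allclose(B_rest, np.r_[0.0, B]))  # True True

# Moving observer: different E_w and B_w, both orthogonal to w, and F is recovered.
w = four_velocity([0.5, -0.2, 0.6])
E_w, B_w = observer_fields(F, w)
print(np.isclose(dot(E_w, w), 0.0), np.isclose(dot(B_w, w), 0.0))  # True True
print(np.allclose(rebuild_field_tensor(E_w, B_w, w), F))             # True
print(np.isclose(dot(E_w, E_w) - dot(B_w, B_w), E @ E - B @ B))       # True (invariant)
```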

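Maxwell’s equations in geometric, frame-independent form are

$$F^{\alpha\beta}{}_{;\beta} = \begin{cases} 4\pi J^\alpha & \text{in Gaussian units}\\ J^\alpha/\epsilon_o = \mu_o J^\alpha & \text{in SI units} \end{cases}\,,\qquad \epsilon^{\alpha\beta\gamma\delta}F_{\gamma\delta;\beta} = 0\,. \qquad (1.66)$$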

(Since we are setting the speed of light to unity, $\epsilon_o = 1/\mu_o$.) Here $\vec J$ is the charge-current 4-vector, which in any inertial frame has components

$$J^0 = \rho_e = (\text{charge density})\,,\qquad J^i = j_i = (\text{current density})\,. \qquad (1.67)$$

Exercise 1.20 describes how to think about this charge density and current density as geometric objects determined by the observer’s 4-velocity or 3+1 split of spacetime into space plus time. Exercise 1.21 shows how the frame-independent Maxwell equations (1.66) reduce to the more familiar ones in terms of E and B. Exercise 1.22 explores potentials for the electromagnetic field in geometric, frame-independent language and the 3+1 split.

****************************
EXERCISES

Exercise 1.19 Derivation and Practice: [R] Reconstruction of F
Derive Eq. (1.65b) by the same method as was used to derive (1.65a).

Exercise 1.20 Problem: [R] 3+1 Split of Charge-Current 4-Vector
Just as the electric and magnetic fields measured by some observer can be regarded as 4-vectors $\vec E_{\vec w}$ and $\vec B_{\vec w}$ that live in the observer’s 3-space of simultaneity, so also the charge density and current density that the observer measures can be regarded as a scalar $\rho_{\vec w}$ and 4-vector $\vec j_{\vec w}$ that live in the 3-space of simultaneity. Derive geometric, frame-independent equations for $\rho_{\vec w}$ and $\vec j_{\vec w}$ in terms of the charge-current 4-vector $\vec J$ and the observer’s 4-velocity $\vec w$, and derive a geometric expression for $\vec J$ in terms of $\rho_{\vec w}$, $\vec j_{\vec w}$, and $\vec w$.

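Exercise 1.21 Problem: [R] Frame-Dependent Version of Maxwell’s Equations
From the geometric version of Maxwell’s equations (1.66), derive the elementary, frame-dependent version

$$\nabla\cdot\mathbf E = \begin{cases} 4\pi\rho_e & \text{in Gaussian units}\\ \rho_e/\epsilon_o & \text{in SI units} \end{cases}\,,\qquad \nabla\times\mathbf B - \frac{\partial\mathbf E}{\partial t} = \begin{cases} 4\pi\mathbf j & \text{in Gaussian units}\\ \mu_o\mathbf j & \text{in SI units} \end{cases}\,,$$

$$\nabla\cdot\mathbf B = 0\,,\qquad \nabla\times\mathbf E + \frac{\partial\mathbf B}{\partial t} = 0\,. \qquad (1.68)$$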

Exercise 1.22 Problem: [R] Potentials for the Electromagnetic Field
(a) Express the electromagnetic field tensor as an antisymmetrized gradient of a 4-vector potential: in slot-naming index notation

$$F_{\alpha\beta} = A_{\beta;\alpha} - A_{\alpha;\beta}\,. \qquad (1.69a)$$

Show that, whatever may be the 4-vector potential $\vec A$, the second of the Maxwell equations (1.66) is automatically satisfied. Show further that the electromagnetic field tensor is unaffected by a gauge change of the form

$$\vec A_{\rm new} = \vec A_{\rm old} + \nabla\psi\,, \qquad (1.69b)$$

where ψ is a scalar field (the generator of the gauge change). Show, finally, that it is possible to find a gauge-change generator that enforces “Lorenz gauge”

$$\nabla\cdot\vec A = 0 \qquad (1.69c)$$

on the new 4-vector potential, and show that in this gauge, the first of the Maxwell equations (1.66) becomes (in Gaussian units)

$$\nabla^2\vec A = -4\pi\vec J\,;\quad\text{i.e.,}\quad A^{\alpha;\mu}{}_{\mu} = -4\pi J^\alpha\,. \qquad (1.69d)$$

(b) Introduce an inertial reference frame, and in that frame split F into the electric and magnetic fields E and B, split $\vec J$ into the charge and current densities $\rho_e$ and j, and split the vector potential into a scalar potential and a 3-vector potential

$$\phi \equiv A^0\,,\qquad \mathbf A = \text{spatial part of } \vec A\,. \qquad (1.69e)$$

Deduce the 3+1 splits of Eqs. (1.69a)–(1.69d) and show that they take the form given in standard textbooks on electrodynamics.
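A finite-difference sketch of part (a) in Python with NumPy, for readers who want to see the claims hold on a concrete example before proving them. The potential `A_old` and gauge generator `psi` are arbitrary smooth functions chosen for illustration (covariant components, coordinates ordered t, x, y, z), and the step `H` is an ad hoc choice. Because central-difference operators along different axes commute, both the gauge term and $\epsilon^{\alpha\beta\gamma\delta}F_{\gamma\delta,\beta}$ cancel up to rounding, mirroring the analytic argument (symmetric second derivatives contracted with antisymmetric objects).

```python
import itertools

import numpy as np

H = 1e-3  # finite-difference step (ad hoc choice)


def partials(fn, x, h=H):
    """Array whose leading index mu holds d fn / d x^mu at x (central differences)."""
    return np.array([(fn(x + h * e) - fn(x - h * e)) / (2 * h) for e in np.eye(4)])


def field_from_potential(A_down, x):
    """F_{ab} = A_{b,a} - A_{a,b}, Eq. (1.69a); A_down returns covariant components."""
    dA = partials(A_down, x)  # dA[a, b] = A_{b,a}
    return dA - dA.T


def A_old(x):
    t, X, Y, Z = x
    return np.array([X * Y, np.sin(t) * Z, X**2 - t, np.cos(Y)])


def psi(x):
    t, X, Y, Z = x
    return t * X * Z + np.exp(Y)


def A_new(x):
    """Gauge-changed potential, Eq. (1.69b): A_new_a = A_old_a + psi_{,a}."""
    return A_old(x) + partials(psi, x)


x0 = np.array([0.3, -1.1, 0.7, 2.0])
F_old = field_from_potential(A_old, x0)
F_new = field_from_potential(A_new, x0)
print(np.allclose(F_old, F_new, atol=1e-6))  # True: F is unchanged by the gauge change

# Second Maxwell equation, eps^{abcd} F_{cd,b} = 0. The lower-index eps differs from the
# upper-index one only by an overall sign, so it suffices for testing for zero.
EPS = np.zeros((4, 4, 4, 4))
for p in itertools.permutations(range(4)):
    inversions = sum(p[i] > p[j] for i in range(4) for j in range(i + 1, 4))
    EPS[p] = -1.0 if inversions % 2 else 1.0
dF = partials(lambda y: field_from_potential(A_old, y), x0)  # dF[b, c, d] = F_{cd,b}
print(np.abs(np.einsum('abcd,bcd->a', EPS, dF)).max() < 1e-6)  # True
```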

****************************


1.11

Volumes, Integration, and Integral Conservation Laws

1.11.1

[N] Newtonian Volumes and Integration

The Levi-Civita tensor is the foundation for computing volumes and performing volume integrals in any number of dimensions. In Cartesian coordinates of 2-dimensional Euclidean space, the area (i.e. 2-dimensional volume) of a parallelogram whose sides are A and B is

$$\text{2-Volume} = \epsilon_{ab}A_aB_b = A_1B_2 - A_2B_1 = \det\begin{bmatrix} A_1 & B_1\\ A_2 & B_2 \end{bmatrix}\,, \qquad (1.70a)$$

a relation that should be familiar from elementary geometry. Equally familiar should be the expression for the 3-dimensional volume of a parallelepiped with legs A, B, and C:

$$\text{3-Volume} = \epsilon_{ijk}A_iB_jC_k = \mathbf A\cdot(\mathbf B\times\mathbf C) = \det\begin{bmatrix} A_1 & B_1 & C_1\\ A_2 & B_2 & C_2\\ A_3 & B_3 & C_3 \end{bmatrix}\,. \qquad (1.70b)$$

Recall that this volume has a sign: it is positive if {A, B, C} is a right-handed set of vectors and negative if left-handed. Equations (1.70a) and (1.70b) are foundations from which one can derive the usual formulae dA = dx dy and dV = dx dy dz for the area and volume of elementary surface and volume elements with Cartesian side lengths dx, dy and dz.

In Euclidean 3-space, we define the vectorial surface area of a 2-dimensional parallelogram with legs A and B to be

$$\boldsymbol\Sigma = \mathbf A\times\mathbf B = \epsilon(\ ,\mathbf A,\mathbf B)\,. \qquad (1.70c)$$

This vectorial surface area has a magnitude equal to the area of the parallelogram and a direction perpendicular to it. Such vectorial surface areas are the foundation for surface integrals in 3-dimensional space, and for the familiar Gauss theorem

$$\int_{\mathcal V_3}(\nabla\cdot\mathbf A)\,dV = \int_{\partial\mathcal V_3}\mathbf A\cdot d\boldsymbol\Sigma \qquad (1.71a)$$

(where $\mathcal V_3$ is a compact 3-dimensional region and $\partial\mathcal V_3$ is its closed two-dimensional boundary) and Stokes theorem

$$\int_{\mathcal V_2}\nabla\times\mathbf A\cdot d\boldsymbol\Sigma = \oint_{\partial\mathcal V_2}\mathbf A\cdot d\mathbf l \qquad (1.71b)$$

(where $\mathcal V_2$ is a compact 2-dimensional region, $\partial\mathcal V_2$ is the 1-dimensional closed curve that bounds it, and the last integral is a line integral around that curve).

Notice that in Euclidean 3-space, the vectorial surface area $\epsilon(\ ,\mathbf A,\mathbf B)$ can be thought of as an object that is waiting for us to insert a third leg C so as to compute a volume $\epsilon(\mathbf C,\mathbf A,\mathbf B)$, the volume of the parallelopiped with legs C, A, and B.

This mathematics is illustrated by the integral and differential conservation laws for electric charge and for particles: The total charge and the total number of particles inside a three-dimensional region of space $\mathcal V_3$ are $\int_{\mathcal V_3}\rho_e\,dV$ and $\int_{\mathcal V_3} n\,dV$, where $\rho_e$ is the charge density and n the number density of particles. The rates at which charge and particles flow out of $\mathcal V_3$ are the integrals of the current density j and the particle flux vector S over its boundary $\partial\mathcal V_3$. Therefore, the laws of charge conservation and particle conservation say
$$\frac{d}{dt}\int_{\mathcal V_3}\rho_e\,dV + \int_{\partial\mathcal V_3}\mathbf j\cdot d\boldsymbol\Sigma = 0, \qquad \frac{d}{dt}\int_{\mathcal V_3}n\,dV + \int_{\partial\mathcal V_3}\mathbf S\cdot d\boldsymbol\Sigma = 0 . \qquad (1.72)$$
Pull the time derivative inside each volume integral (where it becomes a partial derivative), and apply Gauss's theorem (1.71a) to each surface integral; the results are $\int_{\mathcal V_3}(\partial\rho_e/\partial t + \nabla\cdot\mathbf j)\,dV = 0$ and similarly for particles. The only way these equations can be true for all choices of $\mathcal V_3$ is by the integrands vanishing:
$$\partial\rho_e/\partial t + \nabla\cdot\mathbf j = 0, \qquad \partial n/\partial t + \nabla\cdot\mathbf S = 0 . \qquad (1.73)$$

These are the differential conservation laws for charge and for particles. They have a standard, universal form: the time derivative of the density of a quantity plus the divergence of its flux vanishes.
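As a quick numerical sanity check of Eqs. (1.70a)–(1.70c), the sketch below builds the Levi-Civita symbol explicitly and contracts it with arbitrary sample legs. It uses Python with NumPy; that choice, the helper name `levi_civita`, and the sample vectors are our own illustrative assumptions and are not part of the text.

```python
import itertools

import numpy as np


def levi_civita(n):
    """Levi-Civita symbol with n indices, as a dense (n, ..., n) array."""
    eps = np.zeros((n,) * n)
    for perm in itertools.permutations(range(n)):
        # Sign of the permutation = (-1)^(number of inverted pairs).
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        eps[perm] = (-1) ** inversions
    return eps


eps2, eps3 = levi_civita(2), levi_civita(3)

# Arbitrary sample legs (hypothetical values).
A = np.array([2.0, 1.0, 0.5])
B = np.array([0.3, 1.5, -1.0])
C = np.array([-0.4, 0.2, 3.0])

# Eq. (1.70a): signed area spanned by the x-y parts of A and B.
area = np.einsum("ab,a,b->", eps2, A[:2], B[:2])

# Eq. (1.70b): signed volume of the parallelopiped with legs A, B, C.
volume = np.einsum("ijk,i,j,k->", eps3, A, B, C)

# Eq. (1.70c): vectorial area eps( , A, B), i.e. Sigma_i = eps_ijk A_j B_k.
sigma = np.einsum("ijk,j,k->i", eps3, A, B)

# Inserting a third leg C into the empty slot gives the volume eps(C, A, B).
volume_from_sigma = C @ sigma
```

If you swap any two legs in the `volume` contraction, its sign flips. This is the handedness statement made after Eq. (1.70b).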

1.11.2 [R] Spacetime Volumes and Integration

The generalization to 4-dimensional spacetime should be obvious: The 4-dimensional parallelopiped whose legs are the four vectors $\vec A, \vec B, \vec C, \vec D$ has a 4-dimensional volume given by the analog of Eqs. (1.70a) and (1.70b):
$$\text{4-Volume} = \epsilon_{\alpha\beta\gamma\delta}A^\alpha B^\beta C^\gamma D^\delta = \epsilon(\vec A,\vec B,\vec C,\vec D) = \det\begin{pmatrix}A^0 & B^0 & C^0 & D^0\\ A^1 & B^1 & C^1 & D^1\\ A^2 & B^2 & C^2 & D^2\\ A^3 & B^3 & C^3 & D^3\end{pmatrix}. \qquad (1.74)$$

Note that this 4-volume is positive if the set of vectors $\{\vec A,\vec B,\vec C,\vec D\}$ is right-handed and negative if left-handed.

Just as Eqs. (1.70a) and (1.70b) give us a way to perform area and volume integrals in 2- and 3-dimensional Euclidean space, so Equation (1.74) provides us a way to perform volume integrals over 4-dimensional Minkowski spacetime: To integrate a tensor field T over some region $\mathcal V$ of spacetime, we need only divide spacetime up into tiny parallelopipeds, multiply the 4-volume $d\Sigma$ of each parallelopiped by the value of T at its center, and add. It is not hard to see from Eq. (1.74) that in any right-handed Lorentz coordinate system, the 4-volume of a tiny parallelopiped whose edges are $dx^\alpha$ along the four orthogonal coordinate axes is $d\Sigma = dt\,dx\,dy\,dz$ (the analog of $dV = dx\,dy\,dz$), and correspondingly the integral of T over $\mathcal V$ can be expressed as
$$\int_{\mathcal V} T^{\alpha\beta\gamma}\,d\Sigma = \int_{\mathcal V} T^{\alpha\beta\gamma}\,dt\,dx\,dy\,dz . \qquad (1.75)$$
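The same construction extends to Eq. (1.74). The sketch below is in Python with NumPy, which is our choice. It assumes covariant components with $\epsilon_{0123} = +1$ on a right-handed Lorentz basis, which is what the determinant form of Eq. (1.74) requires. It checks three things: the determinant identity, the sign flip when two legs are exchanged, and the coordinate-cell result $d\Sigma = dt\,dx\,dy\,dz$ used in Eq. (1.75). The names `levi_civita`, `four_volume` and `EPS4` are hypothetical.

```python
import itertools

import numpy as np


def levi_civita(n):
    """Levi-Civita symbol with n indices; eps[0, 1, ..., n-1] = +1."""
    eps = np.zeros((n,) * n)
    for perm in itertools.permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        eps[perm] = (-1) ** inversions
    return eps


# Covariant components eps_{alpha beta gamma delta}, with eps_{0123} = +1 (assumed).
EPS4 = levi_civita(4)


def four_volume(A, B, C, D):
    """Eq. (1.74): eps_{abcd} A^a B^b C^c D^d for contravariant components."""
    return np.einsum("abcd,a,b,c,d->", EPS4, A, B, C, D)


rng = np.random.default_rng(seed=0)
A, B, C, D = rng.normal(size=(4, 4))  # four random legs (one per row)

# A tiny cell with edges along the coordinate axes of a Lorentz frame.
dt, dx, dy, dz = 1e-3, 2e-3, 3e-3, 4e-3
cell = four_volume(
    np.array([dt, 0.0, 0.0, 0.0]),
    np.array([0.0, dx, 0.0, 0.0]),
    np.array([0.0, 0.0, dy, 0.0]),
    np.array([0.0, 0.0, 0.0, dz]),
)
```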

By analogy with the vectorial area (1.70c) of a parallelogram in 3-space, any 3-dimensional parallelopiped in spacetime with legs $\vec A, \vec B, \vec C$ has a vectorial 3-volume $\vec\Sigma$ (not to be confused with the scalar 4-volume $\Sigma$) defined by
$$\vec\Sigma(\ ) = \epsilon(\ ,\vec A,\vec B,\vec C)\,;\qquad \Sigma_\mu = \epsilon_{\mu\alpha\beta\gamma}A^\alpha B^\beta C^\gamma . \qquad (1.76)$$

Here we have written the volume vector both in abstract notation and in component notation. This volume vector has one empty slot, ready and waiting for a fourth vector (“leg”) to be inserted, so as to compute the 4-volume $\Sigma$ of a 4-dimensional parallelopiped.

Notice that the volume vector $\vec\Sigma$ is orthogonal to each of its three legs (because of the antisymmetry of $\epsilon$), and thus (unless it is null) it can be written as $\vec\Sigma = V\vec n$, where V is the magnitude of the volume and $\vec n$ is the unit normal to the three legs.

Interchanging any two legs of the parallelopiped reverses the 3-volume's sign. Consequently, the 3-volume is characterized not only by its legs but also by the order of its legs. Equivalently, it can be characterized in two other ways: (i) by the direction of the vector $\vec\Sigma$ (reverse the order of the legs, and the direction of $\vec\Sigma$ will reverse); and (ii) by the sense of the 3-volume, defined as follows. Just as a 2-volume (i.e., a segment of a plane) in 3-dimensional space has two sides, so a 3-volume in 4-dimensional spacetime has two sides; cf. Fig. 1.15. Every vector $\vec D$ for which $\vec\Sigma\cdot\vec D > 0$ points out of one side of the 3-volume $\vec\Sigma$. We shall call that side the “positive side” of $\vec\Sigma$. We shall call the other side, the one out of which point vectors $\vec D$ with $\vec\Sigma\cdot\vec D < 0$, its “negative side”. When something moves through, reaches through, or points through the 3-volume from its negative side to its positive side, we say that this thing is moving or reaching or pointing in the “positive sense”; and similarly for “negative sense”. The examples shown in Fig. 1.15 should make this clearer.

[Figure: two spacetime diagrams with axes t, x, y. Panel (a) shows a spatial 3-volume with legs $\Delta x\,\vec e_x$ and $\Delta y\,\vec e_y$. Panel (b) shows a 3-volume with legs $\Delta t\,\vec e_0$ and $\Delta y\,\vec e_y$. Arrows mark the “positive sense” of each.]

Fig. 1.15: Spacetime diagrams depicting 3-volumes in 4-dimensional spacetime, with one spatial dimension (that along the z-direction) suppressed.

Figure 1.15(a) shows two of the three legs of the volume vector $\vec\Sigma = \epsilon(\ ,\Delta x\,\vec e_x,\Delta y\,\vec e_y,\Delta z\,\vec e_z)$, where x, y, z are the spatial coordinates of a specific Lorentz frame. It is easy to show that this vector can also be written as $\vec\Sigma = -\Delta V\,\vec e_0$. Here $\Delta V = \Delta x\Delta y\Delta z$ is the ordinary volume of the parallelopiped, as measured by an observer in the chosen Lorentz frame. Thus, the direction of the vector $\vec\Sigma$ is toward the past (direction of decreasing Lorentz time t). From this, and the fact that timelike vectors have negative squared length, it is easy to infer that $\vec\Sigma\cdot\vec D > 0$ if and only if the vector $\vec D$ points out of the “future” side of the 3-volume (the side of increasing Lorentz time t). Therefore, the positive side of $\vec\Sigma$ is the future side. This means that the vector $\vec\Sigma$ points in the negative sense of its own 3-volume.

Figure 1.15(b) shows two of the three legs of the volume vector $\vec\Sigma = \epsilon(\ ,\Delta t\,\vec e_t,\Delta y\,\vec e_y,\Delta z\,\vec e_z) = -\Delta t\,\Delta A\,\vec e_x$ (with $\Delta A = \Delta y\Delta z$). In this case, $\vec\Sigma$ points in its own positive sense.
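The two examples of Fig. 1.15 can be reproduced numerically from Eq. (1.76). This is a sketch in Python with NumPy. It assumes $\epsilon_{0123} = +1$ (as in Eq. (1.74)), the Lorentz metric $\mathrm{diag}(-1,1,1,1)$ with c = 1, and hypothetical leg lengths. It raises the index on $\Sigma_\mu$ and then tests which side of each 3-volume is the positive side. The function names are illustrative only.

```python
import itertools

import numpy as np


def levi_civita(n):
    """Levi-Civita symbol with n indices; eps[0, 1, ..., n-1] = +1."""
    eps = np.zeros((n,) * n)
    for perm in itertools.permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        eps[perm] = (-1) ** inversions
    return eps


EPS4 = levi_civita(4)                 # eps_{0123} = +1 (assumed convention)
ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # Lorentz metric; equal to its own inverse


def volume_vector(A, B, C):
    """Eq. (1.76): contravariant components Sigma^mu of eps( , A, B, C)."""
    sigma_lower = np.einsum("mabc,a,b,c->m", EPS4, A, B, C)
    return ETA @ sigma_lower


def dot(U, V):
    """Spacetime inner product of two contravariant vectors."""
    return U @ ETA @ V


e = np.eye(4)                         # rows: e_t, e_x, e_y, e_z
dt, dx, dy, dz = 0.5, 1.0, 2.0, 3.0   # hypothetical leg lengths

# Fig. 1.15(a): an instantaneous spatial parallelopiped.
Sigma_a = volume_vector(dx * e[1], dy * e[2], dz * e[3])

# Fig. 1.15(b): the region swept out in time dt by the (dy e_y, dz e_z) face.
Sigma_b = volume_vector(dt * e[0], dy * e[2], dz * e[3])
```

With these conventions, `Sigma_a` comes out as $-\Delta V\,\vec e_0$ and `Sigma_b` as $-\Delta t\,\Delta A\,\vec e_x$, in agreement with the text.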

This peculiar behavior is completely general. When the normal to a 3-volume is timelike, its volume vector $\vec\Sigma$ points in the negative sense. When the normal is spacelike, $\vec\Sigma$ points in the positive sense. And, it turns out, when the normal is null, $\vec\Sigma$ lies in the 3-volume (parallel to its one null leg) and thus points neither in the positive sense nor the negative.¹²

Note the physical interpretations of the 3-volumes of Fig. 1.15. The one in Fig. 1.15(a) is an instantaneous snapshot of an ordinary, spatial parallelopiped. The one in Fig. 1.15(b) is the 3-dimensional region in spacetime swept out during time $\Delta t$ by the parallelogram with legs $\Delta y\,\vec e_y$, $\Delta z\,\vec e_z$ and with area $\Delta A = \Delta y\Delta z$.

In 3-dimensional Euclidean space, vectorial surface areas can be used to construct 2-dimensional surface integrals. In exactly the same way, in 4-dimensional spacetime, vectorial volume elements can be used to construct integrals over 3-dimensional volumes (also called 3-dimensional surfaces), e.g. $\int_{\mathcal V_3}\vec A\cdot d\vec\Sigma$. More specifically: Let (a, b, c) be (possibly curvilinear) coordinates in the 3-surface $\mathcal V_3$, and denote by $\vec x(a,b,c)$ the spacetime point $\mathcal P$ on $\mathcal V_3$ whose coordinate values are (a, b, c). Then $(\partial\vec x/\partial a)da$, $(\partial\vec x/\partial b)db$, $(\partial\vec x/\partial c)dc$ are the vectorial legs of the elementary parallelopiped whose corners are at (a, b, c), (a+da, b, c), (a, b+db, c), etc. The spacetime components of these vectorial legs are $(\partial x^\alpha/\partial a)da$, $(\partial x^\alpha/\partial b)db$, $(\partial x^\alpha/\partial c)dc$. The 3-volume of this elementary parallelopiped is $d\vec\Sigma = \epsilon\left(\ ,(\partial\vec x/\partial a)da,(\partial\vec x/\partial b)db,(\partial\vec x/\partial c)dc\right)$, which has spacetime components
$$d\Sigma_\mu = \epsilon_{\mu\alpha\beta\gamma}\frac{\partial x^\alpha}{\partial a}\frac{\partial x^\beta}{\partial b}\frac{\partial x^\gamma}{\partial c}\,da\,db\,dc . \qquad (1.77)$$
This is the integration element to be used when evaluating
$$\int_{\mathcal V_3}\vec A\cdot d\vec\Sigma = \int_{\mathcal V_3} A^\mu\,d\Sigma_\mu . \qquad (1.78)$$

Just as there are Gauss and Stokes theorems (1.71a) and (1.71b) for integrals in Euclidean 3-space, so also there are Gauss and Stokes theorems in spacetime. The Gauss theorem has the obvious form
$$\int_{\mathcal V_4}(\vec\nabla\cdot\vec A)\,d\Sigma = \int_{\partial\mathcal V_4}\vec A\cdot d\vec\Sigma , \qquad (1.79)$$

where the first integral is over a 4-dimensional region V4 in spacetime, and the second is over the 3-dimensional boundary of V4 , with the boundary’s positive sense pointing outward, away from V4 (just as in the 3-dimensional case). We shall not write down the 4-dimensional Stokes theorem because it is complicated to formulate with the tools we have developed thus far; easy formulation requires the concept of a differential form, which we shall not introduce in this book.

1.11.3 [R] Conservation of Charge in Spacetime

We can use integration over a 3-dimensional region (3-surface) in 4-dimensional spacetime to construct an elegant, frame-independent formulation of the law of conservation of electric charge:

¹² This peculiar behavior gets replaced by a simpler description if one uses one-forms rather than vectors to describe 3-volumes; see, e.g., Box 5.2 of Misner, Thorne, and Wheeler (1973).

We begin by examining the geometric meaning of the charge-current 4-vector $\vec J$. We defined $\vec J$ in Eq. (1.67) in terms of its components. The spatial component $J^x = J_x = \vec J(\vec e_x)$ is equal to the x component of current density $j_x$. That is, it is the amount Q of charge that flows, in a unit time, across a unit surface area lying in the y-z plane; i.e., the charge that flows across the unit 3-surface $\vec\Sigma = \vec e_x$. In other words, $\vec J(\vec\Sigma) = \vec J(\vec e_x)$ is the total charge Q that flows across $\vec\Sigma = \vec e_x$ in $\vec\Sigma$'s positive sense; and similarly for the other spatial directions. The temporal component $J^0 = -J_0 = \vec J(-\vec e_0)$ is the charge density $\rho_e$; i.e., it is the total charge Q in a unit spatial volume. This charge is carried by particles that are traveling through spacetime from past to future, and pass through the unit 3-surface (3-volume) $\vec\Sigma = -\vec e_0$. Therefore, $\vec J(\vec\Sigma) = \vec J(-\vec e_0)$ is the total charge Q that flows through $\vec\Sigma = -\vec e_0$ in its positive sense. This is precisely the same interpretation as we deduced for the spatial components of $\vec J$. This makes it plausible, and indeed one can show, that for any small 3-surface $\vec\Sigma$, $\vec J(\vec\Sigma) \equiv J^\alpha\Sigma_\alpha$ is the total charge Q that flows across $\vec\Sigma$ in its positive sense.

[Figure: spacetime diagram (axes t, x, y) showing the 4-dimensional region $\mathcal V$ and its closed boundary $\partial\mathcal V$.]

Fig. 1.16: The 4-dimensional region V in spacetime, and its closed 3-boundary ∂V, used in formulating the law of charge conservation. The dashed lines symbolize, heuristically, the flow of charge from past toward future.

This property of the charge-current 4-vector is the foundation for our frame-independent formulation of the law of charge conservation. Let $\mathcal V$ be a compact, 4-dimensional region of spacetime and denote by $\partial\mathcal V$ its boundary, a closed 3-surface in 4-dimensional spacetime (Fig. 1.16). The charged media (fluids, solids, particles, ...) present in spacetime carry electric charge through $\mathcal V$, from the past toward the future. The law of charge conservation says that all the charge that enters $\mathcal V$ through the past part of its boundary $\partial\mathcal V$ must exit through the future part of its boundary. Choose the positive sense of the boundary's infinitesimal 3-volume $d\vec\Sigma$ to point out of $\mathcal V$ (toward the past on the bottom boundary and toward the future on the top). Then this global law of charge conservation can be expressed mathematically as
$$\int_{\partial\mathcal V} J^\alpha\,d\Sigma_\alpha = 0 . \qquad (1.80)$$

When each tiny charge q enters V through its past boundary, it contributes negatively to the integral, since it travels through ∂V in the negative sense (from positive side of ∂V toward negative side); and when that same charge exits V through its future boundary, it contributes positively to the integral. Therefore its net contribution is zero, and similarly for all other charges.

In Exercise 1.24, we show that when this global law of charge conservation (1.80) is subjected to a 3+1 split of spacetime into space plus time, it becomes the nonrelativistic integral law of charge conservation (1.72).

This global conservation law can be converted into a local conservation law with the help of the 4-dimensional Gauss theorem (1.79), $\int_{\partial\mathcal V} J^\alpha\,d\Sigma_\alpha = \int_{\mathcal V} J^\alpha{}_{;\alpha}\,d\Sigma$. Since the left-hand side vanishes, so must the right-hand side. For this 4-volume integral to vanish for every choice of $\mathcal V$, the integrand must vanish everywhere in spacetime:
$$J^\alpha{}_{;\alpha} = 0\,;\qquad\text{i.e., } \vec\nabla\cdot\vec J = 0 . \qquad (1.81)$$

In a specific but arbitrary Lorentz frame (i.e., in a 3+1 split of spacetime into space plus time), this becomes the standard differential law of charge conservation (1.73).
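In a Lorentz frame (c = 1), the local law (1.81) reads $\partial\rho_e/\partial t + \nabla\cdot\mathbf j = 0$, i.e. Eq. (1.73). The sketch below is a minimal finite-difference check of this in Python with NumPy. It assumes a hypothetical Gaussian charge distribution that drifts rigidly along x at speed v, so that $J^0 = \rho_e$ and $J^x = v\rho_e$. The grid and all parameter values are illustrative.

```python
import numpy as np

v = 0.6                                   # hypothetical drift speed (c = 1)
t = np.linspace(0.0, 1.0, 201)
x = np.linspace(-3.0, 3.0, 601)
T, X = np.meshgrid(t, x, indexing="ij")   # axis 0 = t, axis 1 = x

rho_e = np.exp(-((X - v * T) ** 2))       # J^0: rigidly drifting charge density
j_x = v * rho_e                           # J^x: current carried by the drift

# Second-order finite differences, including at the grid edges.
drho_dt = np.gradient(rho_e, t, axis=0, edge_order=2)
dj_dx = np.gradient(j_x, x, axis=1, edge_order=2)

divergence = drho_dt + dj_dx              # J^alpha_{,alpha} sampled on the grid
residual = np.max(np.abs(divergence))     # zero up to discretization error
```

If you drop `j_x`, the density changes with no compensating current. `residual` then becomes roughly 0.5 rather than nearly zero, which would violate (1.81).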

1.11.4 [R] Conservation of Particles, Baryons and Rest Mass

Any conserved scalar quantity obeys conservation laws of the same form as those for electric charge. For example, if the number of particles of some species (e.g. electrons or protons or photons) is conserved, then we can introduce for that species a number-flux 4-vector $\vec S$ (analog of the charge-current 4-vector $\vec J$). In any Lorentz coordinate system, $S^0$ is the number density of particles n and $S^j$ is the particle flux. If $\vec\Sigma$ is a small 3-volume (3-surface) in spacetime, then $\vec S(\vec\Sigma) = S^\alpha\Sigma_\alpha$ is the number of particles that pass through $\vec\Sigma$ from its negative side to its positive side. The frame-invariant global and local conservation laws for these particles take the same form as those for electric charge:
$$\int_{\partial\mathcal V} S^\alpha\,d\Sigma_\alpha = 0, \quad\text{where } \partial\mathcal V \text{ is any closed 3-surface in spacetime,} \qquad (1.82a)$$
$$S^\alpha{}_{;\alpha} = 0\,;\qquad\text{i.e., } \vec\nabla\cdot\vec S = 0 . \qquad (1.82b)$$

When fundamental particles (e.g. protons and antiprotons) are created and destroyed by quantum processes, the total baryon number (number of baryons minus number of antibaryons) is still conserved. At least, this holds to the accuracy of all experiments performed thus far, and we shall assume it in this book. This law of baryon-number conservation takes the forms (1.82a) and (1.82b), with $\vec S$ the number-flux 4-vector for baryons (with antibaryons counted negatively).

It is useful to reexpress this baryon-number conservation law in Newtonian-like language by introducing a universally agreed upon mean rest mass per baryon $\bar m_B$. This $\bar m_B$ is often taken to be 1/56 the mass of an ⁵⁶Fe (iron-56) atomic nucleus, since ⁵⁶Fe is the nucleus with the tightest nuclear binding, i.e. the endpoint of thermonuclear evolution in stars. We multiply the baryon number-flux 4-vector $\vec S$ by this mean rest mass per baryon to obtain a rest-mass-flux 4-vector
$$\vec S_{rm} = \bar m_B\,\vec S , \qquad (1.83)$$
which (since $\bar m_B$ is, by definition, a constant) satisfies the same conservation laws (1.82a) and (1.82b) as baryon number.

For media such as fluids and solids, in which the particles travel only short distances between collisions or strong interactions, it is often useful to resolve the two flux vectors into simpler pieces. We split the particle number-flux 4-vector and the rest-mass-flux 4-vector into a 4-velocity of the medium $\vec u$ (i.e., the 4-velocity of the frame in which there is a vanishing net spatial flux of particles), and the particle number density $n_o$ or rest mass density $\rho_o$ as measured in the medium's rest frame:
$$\vec S = n_o\,\vec u , \qquad \vec S_{rm} = \rho_o\,\vec u . \qquad (1.84)$$
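Eq. (1.84) can be unpacked into components in any frame. The Python/NumPy sketch below assumes c = 1, the metric $\mathrm{diag}(-1,1,1,1)$, the standard 4-velocity $\vec u = \gamma(1,\mathbf v)$, and hypothetical values of $\rho_o$ and v. It checks the components quoted in Exercise 1.25(b), $S^0 = \rho_o\gamma$ and $S^j = \rho_o\gamma v^j$. It also checks the frame-independent norm $\vec S_{rm}\cdot\vec S_{rm} = -\rho_o^2$.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # Lorentz metric, c = 1


def four_velocity(v):
    """Return (u^alpha, gamma) for a fluid with ordinary 3-velocity v, |v| < 1."""
    v = np.asarray(v, dtype=float)
    gamma = 1.0 / np.sqrt(1.0 - v @ v)
    return gamma * np.concatenate(([1.0], v)), gamma


rho_o = 2.5                        # hypothetical rest-frame rest-mass density
v = np.array([0.3, -0.4, 0.5])     # hypothetical fluid velocity, |v|^2 = 0.5
u, gamma = four_velocity(v)
S_rm = rho_o * u                   # Eq. (1.84)

# Rest-mass density seen in this frame: enhanced by gamma (Lorentz contraction).
density_in_frame = S_rm[0]
```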

See Exercise 1.25.

We shall make use of the conservation laws $\vec\nabla\cdot\vec S = 0$ and $\vec\nabla\cdot\vec S_{rm} = 0$ for particles and rest mass later in this book, e.g. when studying relativistic fluids. We shall also find the expressions (1.84) for the number-flux 4-vector and rest-mass-flux 4-vector quite useful. See, e.g., the discussion of relativistic shock waves in Ex. 16.11, and the nonrelativistic limit of a relativistic fluid in Sec. 23.4.

****************************

EXERCISES

Exercise 1.23 Practice and Example: [R] Evaluation of 3-Surface Integral in Spacetime
In Minkowski spacetime the set of all events separated from the origin by a timelike interval $a^2$ is a 3-surface, the hyperboloid $t^2 - x^2 - y^2 - z^2 = a^2$, where {t, x, y, z} are Lorentz coordinates of some inertial reference frame. On this hyperboloid introduce coordinates {χ, θ, φ} such that
$$t = a\cosh\chi,\quad x = a\sinh\chi\sin\theta\cos\phi,\quad y = a\sinh\chi\sin\theta\sin\phi,\quad z = a\sinh\chi\cos\theta . \qquad (1.85)$$
Note that χ is a radial coordinate and (θ, φ) are spherical polar coordinates. Denote by $\mathcal V_3$ the portion of the hyperboloid with χ ≤ b.
(a) Verify that for all values of (χ, θ, φ), the points (1.85) do lie on the hyperboloid.
(b) On a spacetime diagram, draw a picture of $\mathcal V_3$, the {χ, θ, φ} coordinates, and the elementary volume element (vector field) $d\vec\Sigma$ [Eq. (1.77)].
(c) Set $\vec A \equiv \vec e_0$ (the temporal basis vector), and express $\int_{\mathcal V_3}\vec A\cdot d\vec\Sigma$ as an integral over {χ, θ, φ}. Evaluate the integral.
(d) Consider a closed 3-surface with three parts. Its top is the segment $\mathcal V_3$ of the hyperboloid. Its sides are the hypercylinder $\{x^2+y^2+z^2 = a^2\sinh^2 b,\ 0 < t < a\cosh b\}$. Its bottom is the ball $\{x^2+y^2+z^2 \le a^2\sinh^2 b,\ t = 0\}$. Draw a picture of this closed 3-surface on a spacetime diagram. Use Gauss's theorem, applied to this 3-surface, to show that $\int_{\mathcal V_3}\vec A\cdot d\vec\Sigma$ is equal to the 3-volume of its spherical base.

Exercise 1.24 Derivation and Example: [R] Global Law of Charge Conservation in an Inertial Frame
Consider the global law of charge conservation $\int_{\partial\mathcal V} J^\alpha\,d\Sigma_\alpha = 0$ for a special choice of the closed 3-surface $\partial\mathcal V$. The bottom of $\partial\mathcal V$ is the ball $\{t = 0,\ x^2+y^2+z^2 \le a^2\}$, where {t, x, y, z} are the Lorentz coordinates of some inertial frame. The sides are the spherical world tube $\{0 \le t \le T,\ x^2+y^2+z^2 = a^2\}$. The top is the ball $\{t = T,\ x^2+y^2+z^2 \le a^2\}$.
(a) Draw this 3-surface in a spacetime diagram.
(b) Show that for this $\partial\mathcal V$, $\int_{\partial\mathcal V} J^\alpha\,d\Sigma_\alpha = 0$ is the nonrelativistic integral conservation law (1.72) for charge.

Exercise 1.25 Example: [R] Rest-mass-flux 4-vector, Lorentz contraction of rest-mass density, and rest-mass conservation for a fluid
Consider a fluid with 4-velocity $\vec u$, and rest-mass density $\rho_o$ as measured in the fluid's rest frame.
(a) From the physical meanings of $\vec u$, $\rho_o$, and the rest-mass-flux 4-vector $\vec S_{rm}$, deduce Eq. (1.84).
(b) Examine the components of $\vec S_{rm}$ in a reference frame where the fluid moves with ordinary velocity v. Show that $S^0 = \rho_o\gamma$, $S^j = \rho_o\gamma v^j$, where $\gamma = 1/\sqrt{1 - v^2}$. Explain the physical interpretation of these formulas in terms of Lorentz contraction.
(c) Show that the law of conservation of rest mass, $\vec\nabla\cdot\vec S_{rm} = 0$, takes the form
$$\frac{d\rho_o}{d\tau} = -\rho_o\,\vec\nabla\cdot\vec u , \qquad (1.86)$$
where d/dτ is the derivative with respect to proper time moving with the fluid.
(d) Consider a small 3-dimensional volume V of the fluid, whose walls move with the fluid (so if the fluid expands, V goes up). Explain why the law of rest-mass conservation must take the form $d(\rho_o V)/d\tau = 0$. Thereby deduce that
$$\vec\nabla\cdot\vec u = \frac{1}{V}\frac{dV}{d\tau} . \qquad (1.87)$$

****************************
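The sketch below is a numerical sanity check of Exercise 1.23, parts (a), (c) and (d), in Python with NumPy. It assumes $\epsilon_{0123} = +1$ (consistent with Eq. (1.74)). It builds $d\Sigma_\mu$ from Eq. (1.77) using the derivatives of the embedding (1.85), and integrates $\vec e_0\cdot d\vec\Sigma = d\Sigma_0$ with the midpoint rule. The values of a and b and the grid sizes are hypothetical choices.

```python
import itertools

import numpy as np


def levi_civita(n):
    """Levi-Civita symbol with n indices; eps[0, 1, ..., n-1] = +1."""
    eps = np.zeros((n,) * n)
    for perm in itertools.permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        eps[perm] = (-1) ** inversions
    return eps


EPS4 = levi_civita(4)   # eps_{0123} = +1 (assumed convention)
a, b = 1.3, 0.9         # hypothetical hyperboloid scale and cutoff chi <= b

# Midpoint grid on (chi, theta, phi), flattened to N sample points.
n_chi, n_theta, n_phi = 100, 100, 4
h_chi, h_theta, h_phi = b / n_chi, np.pi / n_theta, 2 * np.pi / n_phi
chi = (np.arange(n_chi) + 0.5) * h_chi
theta = (np.arange(n_theta) + 0.5) * h_theta
phi = (np.arange(n_phi) + 0.5) * h_phi
CH, TH, PH = (g.ravel() for g in np.meshgrid(chi, theta, phi, indexing="ij"))

sh, chh = np.sinh(CH), np.cosh(CH)
st, ct, sp, cp = np.sin(TH), np.cos(TH), np.sin(PH), np.cos(PH)
zero = np.zeros_like(CH)

# Embedding (1.85): x^alpha(chi, theta, phi), one row per sample point.
points = a * np.stack([chh, sh * st * cp, sh * st * sp, sh * ct], axis=1)

# Legs dx^alpha/dchi, dx^alpha/dtheta, dx^alpha/dphi.
d_chi = a * np.stack([sh, chh * st * cp, chh * st * sp, chh * ct], axis=1)
d_theta = a * np.stack([zero, sh * ct * cp, sh * ct * sp, -sh * st], axis=1)
d_phi = a * np.stack([zero, -sh * st * sp, sh * st * cp, zero], axis=1)

# Eq. (1.77) per unit d(chi) d(theta) d(phi): dSigma_mu = eps_{mu abc} legs.
d_sigma = np.einsum("mabc,na,nb,nc->nm", EPS4, d_chi, d_theta, d_phi)

# With A = e_0 (A^mu = delta^mu_0), A . dSigma = dSigma_0.
integral = d_sigma[:, 0].sum() * h_chi * h_theta * h_phi
ball_volume = 4 * np.pi / 3 * (a * np.sinh(b)) ** 3
```

The integrand $d\Sigma_0$ is positive everywhere. So, with this ordering of the legs, the positive side of the hyperboloid segment is its future side, just as for the spatial 3-volume of Fig. 1.15(a).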


1.12 The Newtonian Stress Tensor, Relativistic Stress-Energy Tensor, and Conservation of 4-Momentum

1.12.1 [N] Newtonian Stress Tensor and Momentum Conservation

Press your hands together in the y–z plane and feel the force that one hand exerts on the other across a tiny area A (say, one square millimeter of your hands' palms; Fig. 1.17). That force, of course, is a vector F. It has a normal component (along the x direction). It also has a tangential component: if you try to slide your hands past each other, you feel a component of force along their surface, a “shear” force in the y and z directions. Not only is the force F vectorial; so is the 2-surface across which it acts, $\boldsymbol\Sigma = A\,\mathbf e_x$. Here $\mathbf e_x$ is the unit vector orthogonal to the tiny area A. We have chosen the negative side of the surface to be the −x side and the positive side to be +x. With this choice, the force F is the one that the negative hand, on the −x side, exerts on the positive hand.

[Figure: two hands pressed together; axes x, y, z.]

Fig. 1.17: Hands, pressed together, exert a stress on each other.

Now, it should be obvious that the force F is a linear function of our chosen surface Σ. Therefore, there must be a tensor, the stress tensor, that reports the force to us when we insert the surface into its second slot:
$$\mathbf F(\ ) = \mathbf T(\ ,\boldsymbol\Sigma)\,;\qquad\text{i.e., } F_i = T_{ij}\Sigma_j . \qquad (1.88)$$
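In Cartesian components, Eq. (1.88) is an ordinary matrix-vector product. The Python/NumPy sketch below applies it to the hands of Fig. 1.17, using a made-up symmetric stress tensor with hypothetical SI values. For the surface $\boldsymbol\Sigma = A\,\mathbf e_x$ it returns the normal (x) and shear (y, z) parts of the force. Flipping the sign of Σ flips the force, which is the action-reaction statement discussed next in the text.

```python
import numpy as np


def force_across(T, area, n):
    """Eq. (1.88): F_i = T_ij Sigma_j for the surface Sigma = area * n."""
    return T @ (area * np.asarray(n, dtype=float))


# Hypothetical stress between the palms, in Pa (symmetric, with shear parts).
T = np.array([[3.0e4, 1.0e3, -5.0e2],
              [1.0e3, 1.0e2, 0.0],
              [-5.0e2, 0.0, 1.0e2]])
A = 1.0e-6                            # one square millimeter, in m^2
e_x = np.array([1.0, 0.0, 0.0])

F = force_across(T, A, e_x)           # force of the -x hand on the +x hand
normal_part = F[0]                    # T_xx * A
shear_part = F[1:]                    # (T_yx, T_zx) * A
F_reversed = force_across(T, A, -e_x) # swap which hand is "negative"
```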

Newton’s law of action and reaction tells us that the force that the positive hand exerts on the negative hand must be equal and opposite to the force that the negative hand exerts on the positive. This shows up trivially in Eq. (1.88). Changing the sign of Σ reverses which hand is regarded as negative and which positive. Since T is linear in Σ, it also reverses the sign of the force.

The definition (1.88) of the stress tensor gives rise to the following physical meaning of its components:
$$T_{jk} = \left(\begin{array}{c} j\text{-component of force per unit area}\\ \text{across a surface perpendicular to } \mathbf e_k \end{array}\right) = \left(\begin{array}{c} j\text{-component of momentum that crosses a unit}\\ \text{area which is perpendicular to } \mathbf e_k\text{, per unit time,}\\ \text{with the crossing being from } -x^k \text{ to } +x^k \end{array}\right). \qquad (1.89)$$
The stresses inside a table with a heavy weight on it are described by the stress tensor T, as are the stresses in a flowing fluid or plasma, in the electromagnetic field, and in any other

physical medium. Accordingly, we shall use the stress tensor as an important mathematical tool in our study of force balance in kinetic theory (Chap. 2), elasticity theory (Part III), fluid mechanics (Part IV), and plasma physics (Part V).

It is not obvious from its definition, but the stress tensor T is always symmetric in its two slots. To see this, consider a small cube with side L in any medium (or field) (Fig. 1.18). The medium outside the cube exerts forces, and thence also torques, on the cube's faces. The z-component of the torque is produced by the shear forces on the front and back faces and on the left and right faces. As shown in the figure, the shear forces on the front and back faces have magnitudes $T_{xy}L^2$ and point in opposite directions. They therefore exert identical torques on the cube, $N_z = T_{xy}L^2(L/2)$, where L/2 is the distance of each face from the cube's center. Similarly, the shear forces on the left and right faces have magnitudes $T_{yx}L^2$ and point in opposite directions, thereby exerting identical torques on the cube, $N_z = -T_{yx}L^2(L/2)$. Add the torques from all four faces and equate them to the rate of change of angular momentum, $\frac16\rho L^5\,d\Omega_z/dt$. Here ρ is the mass density, $\frac16\rho L^5$ is the cube's moment of inertia about the z axis through its center, and $\Omega_z$ is the z component of its angular velocity. We obtain $(T_{xy} - T_{yx})L^3 = \frac16\rho L^5\,d\Omega_z/dt$. Now, let the cube's edge length become arbitrarily small, L → 0. If $T_{xy} - T_{yx}$ does not vanish, then the cube will be set into rotation with an infinitely large angular acceleration, $d\Omega_z/dt \propto 1/L^2 \to \infty$, an obviously unphysical behavior. Therefore $T_{yx} = T_{xy}$, and similarly for all other components; the stress tensor is always symmetric under interchange of its two slots.

[Figure: a cube of side L with shear forces $T_{xy}L^2$ on its front and back faces and $T_{yx}L^2$ on its left and right faces; axes x, y, z.]

Fig. 1.18: The shear forces exerted on the left, right, front and back faces of a vanishingly small cube. The resulting torque about the z direction will set the cube into rotation with an arbitrarily large angular acceleration unless the stress tensor is symmetric.

Two examples will make the stress tensor more concrete:

Perfect fluid: Inside a perfect fluid there is an isotropic pressure P, so $T_{xx} = T_{yy} = T_{zz} = P$ (the normal forces per unit area across surfaces in the yz plane, the zx plane and the xy plane are all equal to P). The fluid cannot support any shear forces, so the off-diagonal components of T vanish. We can summarize this by $T_{ij} = P\delta_{ij}$ or equivalently, since $\delta_{ij}$ are the components of the Euclidean metric, $T_{ij} = Pg_{ij}$. The frame-independent version of this is $\mathbf T = P\mathbf g$. To check this result, consider a 2-surface $\boldsymbol\Sigma = A\mathbf n$ with area A oriented perpendicular to some arbitrary unit vector n. The vectorial force that the fluid exerts across Σ is, in index notation, $F_j = T_{jk}\Sigma_k = Pg_{jk}An_k = PAn_j$. That is, it is a normal force with magnitude equal to the fluid pressure P times the surface area A, as it should be.

Electromagnetic field: See Ex. 1.26 below.

The stress tensor plays a central role in the Newtonian law of momentum conservation. Recall the physical interpretation of $T_{jk}$ as the j-component of momentum that crosses a unit area perpendicular to $\mathbf e_k$ per unit time [Eq. (1.89)]. Apply this definition to the little cube in Fig. 1.18. The momentum that flows into the cube in unit time across the front face (at y = 0) is $T_{jy}(y{=}0)\,L^2$, and across the back face (at y = L) is $-T_{jy}(y{=}L)\,L^2$; their sum is $-T_{jy,y}L^3$. Add to this the contributions from the side faces and the top and bottom faces. The rate of change of total momentum inside the cube is then $(-T_{jx,x} - T_{jy,y} - T_{jz,z})L^3 = -T_{jk,k}L^3$. Since the cube's volume is $L^3$, this says that
$$\frac{\partial(\text{momentum density})}{\partial t} + \nabla\cdot\mathbf T = 0 . \qquad (1.90)$$

This has the standard form for any local conservation law: the time derivative of the density of some quantity (here momentum), plus the divergence of the flux of that quantity (here momentum flux is the stress tensor), is zero. We shall make extensive use of this Newtonian local law of momentum conservation in Part III (elasticity theory), Part IV (fluid mechanics) and Part V (plasma physics).

****************************

EXERCISES

Exercise 1.26 **Problem: [R] Electromagnetic Stress Tensor
An electric field E exerts (in Gaussian cgs units) a pressure $\mathbf E^2/8\pi$ orthogonal to itself and a tension of this same magnitude along itself. Similarly, a magnetic field B exerts a pressure $\mathbf B^2/8\pi$ orthogonal to itself and a tension of this same magnitude along itself. Verify that the following stress tensor embodies these stresses:
$$\mathbf T = \frac{1}{8\pi}\left[(\mathbf E^2 + \mathbf B^2)\,\mathbf g - 2(\mathbf E\otimes\mathbf E + \mathbf B\otimes\mathbf B)\right] . \qquad (1.91)$$

****************************
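The sketch below is a numerical check of Exercise 1.26 in Python with NumPy, using hypothetical field values in Gaussian units. It adopts the sign convention of the perfect-fluid example, in which a positive normal stress is a pressure. With that convention, Eq. (1.91) should give a normal stress of $-E^2/8\pi$ (a tension) on a face perpendicular to E, and $+E^2/8\pi$ (a pressure) on a face parallel to E.

```python
import numpy as np


def em_stress(E, B):
    """Eq. (1.91), Gaussian units: T = [(E^2 + B^2) g - 2(E E + B B)] / (8 pi)."""
    E = np.asarray(E, dtype=float)
    B = np.asarray(B, dtype=float)
    g = np.eye(3)  # Euclidean metric in Cartesian components
    return ((E @ E + B @ B) * g - 2.0 * (np.outer(E, E) + np.outer(B, B))) / (8.0 * np.pi)


E = np.array([0.0, 0.0, 3.0])   # hypothetical electric field along z
T = em_stress(E, np.zeros(3))

along_E = T[2, 2]    # normal stress on a face perpendicular to E
across_E = T[0, 0]   # normal stress on a face parallel to E

# A generic field configuration: T remains symmetric in its two slots.
T_mixed = em_stress([1.0, -2.0, 0.5], [0.3, 0.0, 1.2])
```

A magnetic field of the same magnitude and direction, with E = 0, yields the identical tensor. This reflects the symmetric roles of E and B in (1.91).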

1.12.2 [R] Relativistic Stress-Energy Tensor

We conclude this chapter by formulating the law of 4-momentum conservation in ways analogous to our laws of conservation of charge, particles, baryons and rest mass. This task is not trivial, since 4-momentum is a vector in spacetime, while charge, particle number, baryon number, and rest mass are scalar quantities. Correspondingly, the density-flux of 4-momentum must have one more slot than the density-fluxes of charge, baryon number and rest mass, $\vec J$, $\vec S$ and $\vec S_{rm}$; it must be a second-rank tensor. We call it the stress-energy tensor and denote it T( , ) (the same notation as we use for the stress tensor in Euclidean space).

Consider a medium or field flowing through 4-dimensional spacetime. As it crosses a tiny 3-surface $\vec\Sigma$, it transports a net electric charge $\vec J(\vec\Sigma)$ from the negative side of $\vec\Sigma$ to the positive side, and net baryon number $\vec S(\vec\Sigma)$ and net rest mass $\vec S_{rm}(\vec\Sigma)$. Similarly, it transports a net 4-momentum $\mathbf T(\ ,\vec\Sigma)$ from the negative side to the positive side:
$$\mathbf T(\ ,\vec\Sigma) \equiv (\text{total 4-momentum } \vec P \text{ that flows through } \vec\Sigma)\,;\qquad\text{i.e., } T^{\alpha\beta}\Sigma_\beta = P^\alpha . \qquad (1.92)$$

From this definition of the stress-energy tensor we can read off the physical meanings of its components on a specific, but arbitrary, Lorentz-coordinate basis. Choose a specific, but arbitrary, Lorentz frame, in which $\vec\Sigma = -\vec e_0$ is a volume vector representing a parallelopiped with unit volume $\Delta V = 1$, at rest in that frame, with its positive sense toward the future. Making use of method (1.32b) for computing the components of a vector or tensor, we see that
$$-T_{\alpha 0} = \mathbf T(\vec e_\alpha, -\vec e_0) = \vec P(\vec e_\alpha) = \left(\begin{array}{c}\alpha\text{-component of 4-momentum that}\\ \text{flows from past to future across a unit}\\ \text{volume } \Delta V = 1 \text{ in the 3-space } t = \text{const}\end{array}\right) = (\alpha\text{-component of density of 4-momentum}) . \qquad (1.93a)$$
Specializing α to be a time or space component and raising indices, we obtain the specialized versions of (1.93a):
$$T^{00} = (\text{energy density as measured in the chosen Lorentz frame}),\qquad T^{j0} = (\text{density of } j\text{-component of momentum in that frame}). \qquad (1.93b)$$

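Similarly, the αx component of the stress-energy tensor (also called the α1 component, since $x = x^1$ and $\vec e_x = \vec e_1$) has the meaning
$$T_{\alpha 1} \equiv T_{\alpha x} \equiv \mathbf T(\vec e_\alpha, \vec e_x) = \left(\begin{array}{c}\alpha\text{-component of 4-momentum that crosses}\\ \text{a unit area } \Delta y\Delta z = 1 \text{ lying in a surface of}\\ \text{constant } x \text{, during unit time } \Delta t \text{, crossing}\\ \text{from the } -x \text{ side toward the } +x \text{ side}\end{array}\right) = \left(\begin{array}{c}\alpha\text{ component of flux of 4-momentum}\\ \text{across a surface lying perpendicular to } \vec e_x\end{array}\right). \qquad (1.93c)$$
The specific forms of this for temporal and spatial α are (after raising indices)
$$T^{0x} = \left(\begin{array}{c}\text{energy flux across a surface perpendicular to } \vec e_x\text{,}\\ \text{from the } -x \text{ side to the } +x \text{ side}\end{array}\right), \qquad (1.93d)$$
$$T^{jx} = \left(\begin{array}{c}\text{flux of } j\text{-component of momentum across a surface}\\ \text{perpendicular to } \vec e_x \text{, from the } -x \text{ side to the } +x \text{ side}\end{array}\right) = \left(\begin{array}{c} jx \text{ component}\\ \text{of stress}\end{array}\right). \qquad (1.93e)$$
The αy and αz components have the obvious, analogous interpretations. These interpretations, restated much more briefly, are:
$$T^{00} = (\text{energy density}),\quad T^{j0} = (\text{momentum density}),\quad T^{0j} = (\text{energy flux}),\quad T^{jk} = (\text{stress}). \qquad (1.93f)$$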

Although it might not be obvious at first sight, the 4-dimensional stress-energy tensor is always symmetric. In index notation (where indices can be thought of as representing the names of slots, or equally well components on an arbitrary basis),
$$T^{\alpha\beta} = T^{\beta\alpha} . \qquad (1.94)$$

This symmetry can be deduced by physical arguments in a specific, but arbitrary, Lorentz frame. Consider, first, the x0 and 0x components, i.e., the x-components of momentum density and energy flux. A little thought, symbolized by the following heuristic equation, reveals that they must be equal:
$$T^{x0} = \left(\begin{array}{c}\text{momentum}\\ \text{density}\end{array}\right) = \frac{(\Delta E)\,dx/dt}{\Delta x\,\Delta y\,\Delta z} = \frac{\Delta E}{\Delta y\,\Delta z\,\Delta t} = \left(\begin{array}{c}\text{energy}\\ \text{flux}\end{array}\right), \qquad (1.95)$$
and similarly for the other space-time and time-space components: $T^{j0} = T^{0j}$. [In Eq. (1.95), in the first expression $\Delta E$ is the total energy (or equivalently mass) in the volume $\Delta x\Delta y\Delta z$, $(\Delta E)\,dx/dt$ is the total momentum, and when divided by the volume we get the momentum density. The third equality is just elementary algebra, and the resulting expression is obviously the energy flux.] The space-space components, being equal to the stress tensor, are also symmetric, $T^{jk} = T^{kj}$, by the argument embodied in Fig. 1.18 above. Since $T^{0j} = T^{j0}$ and $T^{jk} = T^{kj}$, all components in our chosen Lorentz frame are symmetric, $T^{\alpha\beta} = T^{\beta\alpha}$. So if we insert arbitrary vectors into the slots of T and evaluate the resulting number in our chosen Lorentz frame, we find
$$\mathbf T(\vec A,\vec B) = T^{\alpha\beta}A_\alpha B_\beta = T^{\beta\alpha}A_\alpha B_\beta = \mathbf T(\vec B,\vec A)\,; \qquad (1.96)$$

i.e., T is symmetric under interchange of its slots.

Let us return to the physical meanings (1.93) of the components of the stress-energy tensor. With the aid of T's symmetry, we can restate those meanings in the language of a 3+1 split of spacetime into space plus time. When one chooses a specific reference frame, that choice splits the stress-energy tensor up into three parts. Its time-time part is the energy density $T^{00}$. Its time-space part $T^{0j} = T^{j0}$ is the energy flux or, equivalently, the momentum density. Its space-space part $T^{jk}$ is the symmetric stress tensor.

1.12.3 [R] 4-Momentum Conservation

We interpreted $\vec J(\vec\Sigma) \equiv J^\alpha\Sigma_\alpha$ as the net charge that flows through a small 3-surface $\vec\Sigma$ from its negative side to its positive side. That interpretation gave rise to the global conservation law for charge, $\int_{\partial\mathcal V} J^\alpha\,d\Sigma_\alpha = 0$ [Eq. (1.80) and Fig. 1.16]. Similarly, $\mathbf T(\ ,\vec\Sigma)$ [$T^{\alpha\beta}\Sigma_\beta$ in slot-naming index notation] is the net 4-momentum that flows through $\vec\Sigma$ from its negative side to its positive side. This role gives rise to the following equation for conservation of 4-momentum:
$$\int_{\partial\mathcal V} T^{\alpha\beta}\,d\Sigma_\beta = 0 . \qquad (1.97)$$

This equation says that all the 4-momentum that flows into the 4-volume V of Fig. 1.16 through its 3-surface ∂V must also leave V through ∂V; it gets counted negatively when it

61 enters (since it is traveling from the positive side of ∂V to the negative), and it gets counted positively when it leaves, so its net contribution to the integral (1.97) is zero. This global law of 4-momentum conservation can be converted into a local law (analogous ~ to ∇ · J~ = 0 for charge) with the help of the 4-dimensional Gauss’s theorem (1.79). Gauss’s theorem, generalized in the obvious way from a vectorial integrand to a tensorial one, says: Z Z αβ T ;β dΣ = T αβ dΣβ . (1.98) ∂V

V

Since the right-hand side vanishes, so must the left-hand side; and in order for this 4-volume integral to vanish for every choice of V, it is necessary that the integrand vanish everywhere in spacetime:

$$T^{\alpha\beta}{}_{;\beta} = 0 ; \quad \text{i.e.,} \quad \vec\nabla\cdot\mathbf{T} = 0 . \qquad (1.99a)$$

In the second, index-free version of this local conservation law, the ambiguity about which slot the divergence is taken on is unimportant, since T is symmetric in its two slots: $T^{\alpha\beta}{}_{;\beta} = T^{\beta\alpha}{}_{;\beta}$. In a specific but arbitrary Lorentz frame, the local conservation law (1.99a) for 4-momentum has as its temporal and spatial parts

$$\frac{\partial T^{00}}{\partial t} + \frac{\partial T^{0k}}{\partial x^k} = 0 , \qquad (1.99b)$$

i.e., the time derivative of the energy density plus the 3-divergence of the energy flux vanishes; and

$$\frac{\partial T^{j0}}{\partial t} + \frac{\partial T^{jk}}{\partial x^k} = 0 , \qquad (1.99c)$$

i.e., the time derivative of the momentum density plus the 3-divergence of the stress (i.e., of the momentum flux) vanishes. Thus, as one should expect, the geometric, frame-independent law of 4-momentum conservation includes as special cases both the conservation of energy and the conservation of momentum; and their differential conservation laws have the standard form that one expects both in Newtonian physics and in special relativity: the time derivative of a density plus the divergence of its flux vanishes; cf. Eq. (1.90) and the associated discussion.
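The local laws (1.99b) and (1.99c) can be checked numerically on a simple field configuration. The sketch below is a Python illustration under stated assumptions: a vacuum electromagnetic plane wave moving in the +x direction (Gaussian units, c = 1), with its energy density, energy flux and stress taken from the textbook expressions quoted in Ex. 1.30, Eq. (1.106). The amplitude `E0`, wavenumber `k` and finite-difference step `h` are arbitrary illustrative choices. It evaluates "time derivative of density plus divergence of flux" by central differences for both the energy and the x-momentum.

```python
import math

# Illustrative parameters (assumptions, not from the text): amplitude E0,
# wavenumber k, finite-difference step h.  Gaussian units with c = 1.
E0, k, h = 2.0, 3.0, 1e-5


def fields(t, x):
    """E and B of a linearly polarized plane wave traveling in +x."""
    phase = math.cos(k * (x - t))
    E = (0.0, E0 * phase, 0.0)
    B = (0.0, 0.0, E0 * phase)
    return E, B


def T00(t, x):
    """Energy density (E^2 + B^2) / 8 pi."""
    E, B = fields(t, x)
    return (sum(e * e for e in E) + sum(b * b for b in B)) / (8 * math.pi)


def T0x(t, x):
    """x-component of E x B / 4 pi; by symmetry it is also T^{x0}."""
    E, B = fields(t, x)
    return (E[1] * B[2] - E[2] * B[1]) / (4 * math.pi)


def Txx(t, x):
    """Maxwell stress T^{xx} = [(E^2 + B^2) - 2 (E_x^2 + B_x^2)] / 8 pi."""
    E, B = fields(t, x)
    e2 = sum(e * e for e in E)
    b2 = sum(b * b for b in B)
    return ((e2 + b2) - 2 * (E[0] ** 2 + B[0] ** 2)) / (8 * math.pi)


def d_dt(f, t, x):
    return (f(t + h, x) - f(t - h, x)) / (2 * h)


def d_dx(f, t, x):
    return (f(t, x + h) - f(t, x - h)) / (2 * h)


def residuals(t, x):
    energy = d_dt(T00, t, x) + d_dx(T0x, t, x)      # Eq. (1.99b)
    momentum_x = d_dt(T0x, t, x) + d_dx(Txx, t, x)  # Eq. (1.99c) with j = x
    return energy, momentum_x


for t, x in [(0.0, 0.1), (0.3, -0.7), (1.2, 2.5)]:
    r_e, r_p = residuals(t, x)
    print(f"t={t:4.1f} x={x:5.1f}  energy residual={r_e:.2e}  momentum residual={r_p:.2e}")
```

Each derivative separately is of order unity here, while their sums vanish to within rounding error: for this wave $T^{00}$, $T^{0x}$ and $T^{xx}$ are all the same function of $x - t$, so the energy that leaves one region is exactly the flux that enters its neighbor.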

1.12.4 [R] Stress-Energy Tensors for Perfect Fluid and Electromagnetic Field

As an important example that illustrates the stress-energy tensor, consider a perfect fluid. A perfect fluid is a continuous medium whose stress-energy tensor, evaluated in its local rest frame (a Lorentz frame where $T^{j0} = T^{0j} = 0$), has the special form

$$T^{00} = \rho , \qquad T^{jk} = P\,\delta^{jk} . \qquad (1.100a)$$

Here ρ is a shorthand notation for the energy density $T^{00}$ (density of total mass-energy, including rest mass), as measured in the local rest frame; and the stress tensor $T^{jk}$ as measured in that frame has the form of an isotropic pressure P and vanishing shear stress. From this special form of $T^{\alpha\beta}$ in the local rest frame, one can derive the following expression for the stress-energy tensor in terms of the 4-velocity $\vec u$ of the local rest frame (i.e., of the fluid itself), the metric tensor of spacetime g, and the rest-frame energy density ρ and pressure P:

$$T^{\alpha\beta} = (\rho + P)\,u^\alpha u^\beta + P\,g^{\alpha\beta} ; \quad \text{i.e.,} \quad \mathbf{T} = (\rho + P)\,\vec u \otimes \vec u + P\,\mathbf{g} . \qquad (1.100b)$$
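One quick numerical way to see the content of Eq. (1.100b) is to Lorentz-boost the rest-frame components (1.100a) and compare with the covariant formula evaluated on the boosted 4-velocity. The Python/NumPy sketch below assumes illustrative values for ρ, P and the fluid's 3-velocity, units with c = 1, and the standard pure-boost matrix (no rotation); the helper names are hypothetical.

```python
import numpy as np

# Inverse Minkowski metric g^{ab} in Lorentz coordinates (c = 1).
G_INV = np.diag([-1.0, 1.0, 1.0, 1.0])


def boost_matrix(v):
    """Components Lambda^{a'}_b of the pure boost taking fluid-rest-frame
    components to a lab frame in which the fluid moves with 3-velocity v."""
    v = np.asarray(v, dtype=float)
    v2 = float(v @ v)
    gamma = 1.0 / np.sqrt(1.0 - v2)
    L = np.eye(4)
    L[0, 0] = gamma
    L[0, 1:] = gamma * v
    L[1:, 0] = gamma * v
    if v2 > 0.0:
        L[1:, 1:] += (gamma - 1.0) * np.outer(v, v) / v2
    return L


def perfect_fluid_T(rho, P, u):
    """Eq. (1.100b): T^{ab} = (rho + P) u^a u^b + P g^{ab}."""
    return (rho + P) * np.outer(u, u) + P * G_INV


rho, P = 3.0, 0.7                      # illustrative rest-frame values
v = np.array([0.4, -0.2, 0.5])         # illustrative fluid velocity in the lab

T_rest = np.diag([rho, P, P, P])       # Eq. (1.100a)
L = boost_matrix(v)
T_boosted = L @ T_rest @ L.T           # T^{a'b'} = L^{a'}_c L^{b'}_d T^{cd}
u_lab = L @ np.array([1.0, 0.0, 0.0, 0.0])
T_formula = perfect_fluid_T(rho, P, u_lab)
print("boost agrees with (1.100b):", np.allclose(T_boosted, T_formula))

# Low-speed limit: lab momentum density divided by velocity -> rho + P.
eps = 1e-6
L_slow = boost_matrix([eps, 0.0, 0.0])
T_slow = L_slow @ T_rest @ L_slow.T
print("T^{x0}/v =", T_slow[1, 0] / eps, "  rho + P =", rho + P)
```

Because a boost leaves $g^{\alpha\beta}$ unchanged, the agreement is exact up to rounding. The last two lines preview the result, derived in Ex. 1.29, that ρ + P plays the role of inertial mass per unit volume.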

See Ex. 1.28, below. In Part IV of this book, we shall explore in depth the implications of this stress-energy tensor in the Newtonian limit.

Another example of a stress-energy tensor is that for the electromagnetic field, which takes the following form in Gaussian units:

$$T^{\alpha\beta} = \frac{1}{4\pi}\left( F^{\alpha\mu} F^{\beta}{}_{\mu} - \frac{1}{4}\, g^{\alpha\beta} F^{\mu\nu} F_{\mu\nu} \right) ; \qquad (1.101)$$

see Exercise 1.30.

****************************
EXERCISES

Exercise 1.27 Example: [R] Global Conservation of 4-Momentum in an Inertial Frame
Consider the 4-dimensional parallelepiped V whose legs are $\Delta t\,\vec e_t$, $\Delta x\,\vec e_x$, $\Delta y\,\vec e_y$, $\Delta z\,\vec e_z$, where $(t, x, y, z) = (x^0, x^1, x^2, x^3)$ are the coordinates of some inertial frame. The boundary ∂V of this V has eight 3-dimensional "faces". Identify these faces, and write the integral $\int_{\partial V} T^{0\beta}\, d\Sigma_\beta$ as the sum of contributions from each of them. According to the law of energy conservation, this sum must vanish. Explain the physical interpretation of each of the eight contributions to this energy conservation law. (Cf. Ex. 1.24 for an analogous interpretation of charge conservation.)

Exercise 1.28 **Derivation and Example: [R] Stress-Energy Tensor and Energy-Momentum Conservation for a Perfect Fluid
(a) Derive the frame-independent expression (1.100b) for the perfect-fluid stress-energy tensor from its rest-frame components (1.100a).
(b) Explain why the projection of $\vec\nabla\cdot\mathbf{T} = 0$ along the fluid 4-velocity, $\vec u\cdot(\vec\nabla\cdot\mathbf{T}) = 0$, should represent energy conservation as viewed by the fluid itself. Show that this equation reduces to

$$\frac{d\rho}{d\tau} = -(\rho + P)\,\vec\nabla\cdot\vec u . \qquad (1.102a)$$

With the aid of Eq. (1.87), bring this into the form

$$\frac{d(\rho V)}{d\tau} = -P\,\frac{dV}{d\tau} , \qquad (1.102b)$$

where V is the 3-volume of some small fluid element as measured in the fluid's local rest frame. What are the physical interpretations of the left and right sides of this equation, and how is it related to the first law of thermodynamics?

(c) Read the discussion, in Ex. 1.10, of the tensor $\mathbf{P} = \mathbf{g} + \vec u\otimes\vec u$ that projects into the 3-space of the fluid's rest frame. Explain why $P_{\mu\alpha} T^{\alpha\beta}{}_{;\beta} = 0$ should represent the law of force balance (momentum conservation) as seen by the fluid. Show that this equation reduces to

$$(\rho + P)\,\vec a = -\mathbf{P}\cdot\vec\nabla P , \qquad (1.102c)$$

where $\vec a = d\vec u/d\tau$ is the fluid's 4-acceleration. This equation is a relativistic version of Newton's "F = ma". Explain the physical meanings of the left-hand and right-hand sides. Infer that ρ + P must be the fluid's inertial mass per unit volume. See Ex. 1.29 for further justification of this inference.

Exercise 1.29 **Example: [R] Inertial Mass Per Unit Volume
Suppose that some medium has a rest frame (unprimed frame) in which its energy flux and momentum density vanish, $T^{0j} = T^{j0} = 0$. Suppose that the medium moves in the x direction with speed very small compared to light, v ≪ 1, as seen in a (primed) laboratory frame, and ignore factors of order $v^2$. The "ratio" of the medium's momentum density $T^{j'0'}$ as measured in the laboratory frame to its velocity $v_i = v\,\delta_{ix}$ is called its total inertial mass per unit volume, and is denoted $\rho^{\rm inert}_{ji}$:

$$T^{j'0'} = \rho^{\rm inert}_{ji}\, v_i . \qquad (1.103)$$

(a) Show, using a Lorentz transformation from the medium's (unprimed) rest frame to the (primed) laboratory frame, that

$$\rho^{\rm inert}_{ji} = T^{00}\,\delta_{ji} + T_{ji} . \qquad (1.104)$$

(b) Give a physical explanation of the contribution $T_{ji} v_i$ to the momentum density.
(c) Show that for a perfect fluid [Eq. (1.100b)] the inertial mass per unit volume is isotropic and has magnitude ρ + P, where ρ is the mass-energy density and P is the pressure measured in the fluid's rest frame:

$$\rho^{\rm inert}_{ji} = (\rho + P)\,\delta_{ji} . \qquad (1.105)$$

See Ex. 1.28 above for this inertial-mass role of ρ + P in the law of force balance (momentum conservation) for a fluid.

Exercise 1.30 **Example: [R] Stress-Energy Tensor and Energy-Momentum Conservation for the Electromagnetic Field
(a) Compute from Eq. (1.101) the components of the electromagnetic stress-energy tensor in an inertial reference frame in Gaussian units. Your answer should be the expressions given in electrodynamics textbooks:

$$T^{00} = \frac{\mathbf{E}^2 + \mathbf{B}^2}{8\pi} , \qquad T^{0j}\mathbf{e}_j = T^{j0}\mathbf{e}_j = \frac{\mathbf{E}\times\mathbf{B}}{4\pi} , \qquad T^{jk} = \frac{1}{8\pi}\left[ (\mathbf{E}^2 + \mathbf{B}^2)\,\delta_{jk} - 2(E_j E_k + B_j B_k) \right] . \qquad (1.106)$$

See also Ex. 1.26 above for an alternative derivation of the stress tensor $T_{jk}$.

(b) Show that for the electromagnetic field,

$$T^{\alpha\beta}{}_{;\beta} = F^{\alpha\mu} J_\mu , \qquad (1.107a)$$

where $J_\mu$ is the charge-current 4-vector.
(c) The matter that carries the electric charge and current can exchange energy and momentum with the electromagnetic field. Explain why Eq. (1.107a) is the rate per unit volume at which that matter feeds 4-momentum into the electromagnetic field, and conversely, $-F^{\alpha\mu} J_\mu$ is the rate per unit volume at which the electromagnetic field feeds 4-momentum into the matter. Show, further, that (as viewed in any reference frame) the time and space components of this quantity are

$$\frac{dE_{\rm matter}}{dt\,dV} = -F^{0j} J_j = \mathbf{E}\cdot\mathbf{j} , \qquad \frac{d\mathbf{p}_{\rm matter}}{dt\,dV} = \rho_e \mathbf{E} + \mathbf{j}\times\mathbf{B} , \qquad (1.107b)$$

where $\rho_e$ is charge density and $\mathbf{j}$ is current density [Eq. (1.67)]. The first of these equations describes ohmic heating of the matter by the electric field; the second is the Lorentz force per unit volume on the matter.

****************************
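Part (a) of Ex. 1.30 can be spot-checked numerically by building $F^{\alpha\beta}$ from E and B, evaluating Eq. (1.101) with explicit index raising and lowering, and comparing with Eq. (1.106). The Python/NumPy sketch below rests on one loudly stated assumption: the component assignment $F^{0j} = E^j$, $F^{jk} = \epsilon_{jkl}B^l$. Because T is quadratic in F, an overall sign convention for F drops out; only the relative sign of the E and B blocks matters, and it fixes the sign of the Poynting term. The reader should match this assignment to the definition of F in Chap. 1 (1.26). Function names are hypothetical.

```python
import numpy as np

# g_{ab} and g^{ab} have the same components in Lorentz coordinates (c = 1).
G = np.diag([-1.0, 1.0, 1.0, 1.0])

# Three-dimensional Levi-Civita symbol epsilon_{jkl}.
EPS3 = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    EPS3[i, j, k] = 1.0
    EPS3[i, k, j] = -1.0


def field_tensor(E, B):
    """F^{ab} under the ASSUMED assignment F^{0j} = E^j, F^{jk} = eps_{jkl} B^l."""
    F = np.zeros((4, 4))
    F[0, 1:] = E
    F[1:, 0] = -np.asarray(E, dtype=float)
    F[1:, 1:] = np.einsum("jkl,l->jk", EPS3, B)
    return F


def em_stress_energy(F):
    """Eq. (1.101): T^{ab} = [F^{am} F^b_m - (1/4) g^{ab} F^{mn} F_{mn}] / 4 pi."""
    F_mixed = F @ G                  # F^b_m = F^{bn} g_{nm}
    F_low = G @ F @ G                # F_{mn}
    invariant = np.sum(F * F_low)    # F^{mn} F_{mn} = 2 (B^2 - E^2)
    return (F @ F_mixed.T - 0.25 * G * invariant) / (4.0 * np.pi)


def textbook_components(E, B):
    """Eq. (1.106): energy density, Poynting flux and Maxwell stress."""
    E = np.asarray(E, dtype=float)
    B = np.asarray(B, dtype=float)
    T = np.zeros((4, 4))
    T[0, 0] = (E @ E + B @ B) / (8.0 * np.pi)
    T[0, 1:] = T[1:, 0] = np.cross(E, B) / (4.0 * np.pi)
    T[1:, 1:] = ((E @ E + B @ B) * np.eye(3)
                 - 2.0 * (np.outer(E, E) + np.outer(B, B))) / (8.0 * np.pi)
    return T


rng = np.random.default_rng(0)
E, B = rng.normal(size=3), rng.normal(size=3)
F = field_tensor(E, B)
T = em_stress_energy(F)
print("matches (1.106):", np.allclose(T, textbook_components(E, B)))
print("trace g_ab T^ab:", np.trace(G @ T))   # electromagnetic T is trace-free
```

Flipping F to −F leaves T unchanged, which is a convenient way to confirm that only the relative sign of the E and B blocks enters.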

Bibliographic Note
For an inspiring taste of the history of special relativity, see the original papers by Einstein, Lorentz, and Minkowski, translated into English and archived in Einstein et al. (1923). Early relativity textbooks [see the bibliography on p. 567 of Jackson (1999)] emphasized the transformation properties of physical quantities in going from one inertial frame to another, rather than their roles as frame-invariant geometric objects. Minkowski (1908) introduced geometric thinking, but only in recent years, in large measure due to the influence of John Wheeler, has the geometric viewpoint gained ascendancy. It is still not common in texts on Newtonian physics, but it is almost universal in modern relativity texts. In our opinion, the best elementary introduction to special relativity is the first edition of Taylor and Wheeler (1966); the more ponderous second edition (1992) is also good. Both adopt the geometric viewpoint. At an intermediate level, most physics students learn relativity from electrodynamics texts such as Griffiths (1999) and Jackson (1999), or classical mechanics texts such as Goldstein (1980). Avoid the first and second editions of Jackson and of Goldstein, which use imaginary time and obscure the geometry of spacetime! Griffiths and Jackson (like old relativity texts) adopt the "transformation" viewpoint on physical quantities, rather than the geometric viewpoint. Under John Safko's influence, the third edition of Goldstein [Goldstein, Poole and Safko (2002)] has become strongly geometric. For fully geometric treatments of special relativity, analogous to ours, see not only the third edition of Goldstein, but also the special relativity sections in modern general relativity texts. Some we like at the undergraduate level are Schutz (1985) and especially Hartle (2002); and at a more advanced level, Carroll (2004) and the venerable Misner, Thorne and Wheeler (1973), often cited as MTW. In Parts I–V of our book, we minimize, so far as possible, the proliferation of mathematical concepts (avoiding, e.g., differential forms and dual bases). By contrast, other advanced treatments (e.g., MTW, Goldstein 3rd edition, and Carroll) embrace the richer mathematics.

Box 1.4 Important Concepts in Chapter 1
• Foundational Concepts
  – Frameworks for physical laws (general relativity, special relativity and Newtonian physics) and their relationships to each other, Sec. 1.1.
  – Inertial reference frame, Sec. 1.2.2.
  – Invariant interval and how it defines the geometry of spacetime, Sec. 1.2.3.
• Principle of Relativity: Laws of physics are frame-independent geometric relations between geometric objects, Sec. 1.2.3. Important examples:
  – Newton's second law of motion F = ma, Eq. (1.13).
  – Lorentz force law in 3-dimensional Newtonian language (1.14) and in 4-dimensional geometric language (1.28), and their connection, Sec. 1.10.
  – Conservation of 4-momentum in particle interactions, Eq. (1.23).
  – Global and local conservation laws for charge, baryon number, and 4-momentum, Secs. 1.11.3, 1.11.4, 1.12.3.
• Differential geometry
  – Tensor as a linear function of vectors, Sec. 1.3. Examples: electromagnetic field tensor (1.26), stress tensor (1.88), and stress-energy tensor (1.92).
  – Slot-naming index notation, Sec. 1.5.3.
  – Gauss's theorem in Euclidean space (1.71a) and in spacetime (1.79).
  – Computations via geometric techniques, without coordinates or Lorentz transformations (e.g., derive Lorentz force law Ex. 1.4.3, derive Doppler shift Ex. 1.11, derive vector identities Ex. 1.18).
• 3+1 split of spacetime into space plus time induced by choice of inertial frame, Sec. 1.6, and resulting 3+1 split of physical quantities and laws:
  – 4-momentum → energy and momentum, Eqs. (1.35), (1.36), (1.38); Ex. 1.9.
  – Electromagnetic tensor → electric field and magnetic field, Sec. 1.10.
  – Charge-current 4-vector → charge density and current density, Ex. 1.20.
• Spacetime diagrams used to understand Lorentz contraction, time dilation, breakdown of simultaneity (Sec. 1.7.3, Ex. 1.14), and conservation laws (Fig. 1.16).

Bibliography
Borde, Arvind, Ford, L. H., and Roman, Thomas A. 2002. "Constraints on spatial distributions of negative energy," Physical Review D, 65, 084002.
Carroll, S. 2004. An Introduction to Spacetime and Geometry, New York: Addison Wesley.
Einstein, Albert, Lorentz, Hendrik A., Minkowski, Hermann, and Weyl, Hermann 1923. The Principle of Relativity, New York: Dover Publications.
Feynman, Richard P. 1966. The Character of Physical Law, Cambridge, Massachusetts: M.I.T. Press.
Friedman, John L. and Higuchi, A. 2006. "Topological censorship and chronology protection," Annalen der Physik, 15, 109–128.
Goldstein, Herbert 1980. Classical Mechanics, New York: Addison Wesley, second edition.
Goldstein, Herbert, Poole, Charles, and Safko, John 2002. Classical Mechanics, New York: Addison Wesley, third edition.
Griffiths, David J. 1999. Introduction to Electrodynamics, Upper Saddle River NJ: Prentice-Hall, third edition.
Hafele, J. C., and Keating, Richard E. 1972a. "Around-the-World Atomic Clocks: Predicted Relativistic Time Gains," Science, 177, 166–168.
Hafele, J. C., and Keating, Richard E. 1972b. "Around-the-World Atomic Clocks: Observed Relativistic Time Gains," Science, 177, 168–170.
Hartle, J. B. 2002. Gravity: an Introduction to Einstein's General Relativity, New York: Addison Wesley.
Hawking, Stephen W. 1992. "The Chronology Protection Conjecture," Physical Review D, 46, 603–611.
Jackson, John David 1999. Classical Electrodynamics, New York: Wiley, third edition.
Kim, Sung-Won and Thorne, Kip S. 1991. "Do Vacuum Fluctuations Prevent the Creation of Closed Timelike Curves?" Physical Review D, 43, 3929–3947.
Lorentz, Hendrik A. 1904. "Electromagnetic Phenomena in a System Moving with Any Velocity Less than that of Light," Proceedings of the Academy of Sciences of Amsterdam, 6, 809; reprinted in Einstein et al. (1923).
Mathews, Jon and Walker, R. L. 1964. Mathematical Methods of Physics, New York: Benjamin.
Minkowski, Hermann 1908. "Space and Time," Address to the 80th Assembly of German Natural Scientists and Physicians, at Cologne, 21 September 1908; text published posthumously in Annalen der Physik, 47, 927 (1915); English translation in Einstein et al. (1923).
Misner, Charles W., Thorne, Kip S., and Wheeler, John A. 1973. Gravitation, San Francisco: Freeman.
Morris, Michael S. and Thorne, Kip S. 1987. "Wormholes in Spacetime and their Use for Interstellar Travel: a Tool for Teaching General Relativity," American Journal of Physics, 56, 395–412.
Morris, Michael S., Thorne, Kip S., and Yurtsever, Ulvi 1987. "Wormholes, Time Machines, and the Weak Energy Condition," Physical Review Letters, 61, 1446–1449.
Schutz, Bernard F. 1985. A First Course in General Relativity, Cambridge: Cambridge University Press.
Taylor, Edwin F. and Wheeler, John A. 1966. Spacetime Physics: Introduction to Special Relativity, San Francisco: Freeman, first edition.
Taylor, Edwin F. and Wheeler, John A. 1992. Spacetime Physics: Introduction to Special Relativity, San Francisco: Freeman, second edition.
Thorne, Kip S. 1993. "Closed Timelike Curves," in General Relativity and Gravitation 1992, edited by R. J. Gleiser, C. N. Kozameh and O. M. Moreschi, Cambridge, England: Cambridge University Press.

Contents

Part I STATISTICAL PHYSICS

2 Kinetic Theory
  2.1 Overview
  2.2 Phase Space and Distribution Function
    2.2.1 [N] Newtonian Number density in phase space, N
    2.2.2 [N] Distribution function f(x, v, t) for Particles in a Plasma
    2.2.3 [R] Relativistic Number Density in Phase Space, N
    2.2.4 [R & N] Distribution function Iν/ν³ for photons
    2.2.5 [N & R] Mean Occupation Number, η
  2.3 [N & R] Thermal-Equilibrium Distribution Functions
  2.4 Macroscopic Properties of Matter as Integrals over Momentum Space
    2.4.1 [N] Newtonian Particle Density n, Flux S, and Stress Tensor T
    2.4.2 [R] Relativistic Number-Flux 4-Vector S and Stress-Energy Tensor T
  2.5 Isotropic Distribution Functions and Equations of State
    2.5.1 [N] Newtonian Density, Pressure, Energy Density and Equation of State
    2.5.2 [N] Equations of State for a Nonrelativistic Hydrogen Gas
    2.5.3 [R] Relativistic Density, Pressure, Energy Density and Equation of State
    2.5.4 [R] Equation of State for a Relativistic Degenerate Hydrogen Gas
    2.5.5 [R] Equation of State for Radiation
  2.6 [N & R] Evolution of the Distribution Function: Liouville's Theorem, the Collisionless Boltzmann Equation, and the Boltzmann Transport Equation
  2.7 [N] Transport Coefficients
    2.7.1 Problem to be Analyzed: Diffusive Heat Conduction Inside a Star
    2.7.2 Order-of-Magnitude Analysis
    2.7.3 Analysis Via the Boltzmann Transport Equation

Part I STATISTICAL PHYSICS


Statistical Physics
Version 0802.1.K, 8 October 2008

In this first Part of the book we shall study aspects of classical statistical physics that every physicist should know but that are not usually treated in elementary thermodynamics courses. Our study will lay the microphysical (particle-scale) foundations for the continuum physics of Parts II–VI; and it will elucidate the intimate connections between relativistic statistical physics and Newtonian theory, and between quantum statistical physics and classical theory. (The quantum-classical connection is of practical importance: even for fully classical systems, a quantum viewpoint can be computationally powerful; see, e.g., Chap. 22.) As in Chap. 1, our treatment will be so organized that readers who wish to restrict themselves to Newtonian theory can easily do so. Throughout, we presume the reader is familiar with elementary thermodynamics, but not with other aspects of statistical physics.

In Chap. 2 we will study kinetic theory, the simplest of all formalisms for analyzing systems of huge numbers of particles (e.g., molecules of air, or neutrons diffusing through a nuclear reactor, or photons produced in the big-bang origin of the Universe). In kinetic theory the key concept is the "distribution function" or "number density of particles in phase space", N; i.e., the number of particles per unit 3-dimensional volume of ordinary space and per unit 3-dimensional volume of momentum space. Despite first appearances, N turns out to be a geometric, frame-independent entity. This N and the frame-independent laws it obeys provide us with a means for computing, from microphysics, the macroscopic quantities of continuum physics: mass density, thermal energy density, pressure, equations of state, thermal and electrical conductivities, viscosities, diffusion coefficients, and so on.

In Chap. 3 we will develop the foundations of statistical mechanics.
Here our statistical study will be more sophisticated than in kinetic theory: we shall deal with "ensembles" of physical systems. Each ensemble is a (conceptual) collection of a huge number of physical systems that are identical in the sense that they have the same degrees of freedom, but different in that their degrees of freedom may be in different states. For example, the systems in an ensemble might be balloons that are each filled with $10^{23}$ air molecules, so each is describable by $3\times10^{23}$ spatial coordinates (the x, y, z of all the molecules) and $3\times10^{23}$ momentum coordinates (the $p_x$, $p_y$, $p_z$ of all the molecules). The state of one of the balloons is fully described, then, by $6\times10^{23}$ numbers. We introduce a distribution function N which is a function of these $6\times10^{23}$ different coordinates, i.e., it is defined in a phase space with $6\times10^{23}$ dimensions. This distribution function tells us how many systems (balloons) in our ensemble lie in a unit volume of that phase space. Using this distribution function we will study such issues as the statistical meaning of entropy, the relationship between entropy and information, the statistical origin of the second law of thermodynamics, the statistical meaning of "thermal equilibrium", and the evolution of ensembles into thermal equilibrium. Our applications will include derivations of the Fermi-Dirac distribution for fermions in thermal equilibrium and the Bose-Einstein distribution for bosons, a study of Bose-Einstein condensation in a dilute gas, and explorations of the meaning and role of entropy in gases, in black holes, and in the universe as a whole.

In Chap. 4 we will use the tools of statistical mechanics to study statistical thermodynamics, i.e., ensembles of systems that are in or near thermal equilibrium (also called statistical equilibrium). Using statistical mechanics, we shall derive the laws of thermodynamics, and we shall learn how to use thermodynamic and statistical mechanical tools, hand in hand, to study not only equilibria, but also the probabilities for random, spontaneous fluctuations away from equilibrium. Among the applications we shall study are: (i) chemical and particle reactions such as ionization equilibrium in a hot gas, and electron-positron pair formation in a still hotter gas; and (ii) phase transitions, such as the freezing, melting, vaporization and condensation of water. We shall focus special attention on a ferromagnetic phase transition in which the magnetic moments of atoms spontaneously align with each other as iron is cooled, using it to illustrate two elegant and powerful techniques of statistical physics: the renormalization group and Monte Carlo methods.

In Chap. 5 we will develop the theory of random processes (a modern, mathematical aspect of which is the theory of stochastic differential equations). Here we shall study the dynamical evolution of processes that are influenced by a huge number of factors over which we have little control and little knowledge, except their statistical properties.
One example is the Brownian motion of a dust particle being buffeted by air molecules; another is the motion of a pendulum used, e.g., in a gravitational-wave interferometer, when one monitors that motion so accurately that one can see the influences of seismic vibrations and of fluctuating “thermal” (“Nyquist”) forces in the pendulum’s suspension wire. The position of such a dust particle or pendulum cannot be predicted as a function of time, but one can compute the probability that it will evolve in a given manner. The theory of random processes is a theory of the evolution of the position’s probability distribution (and the probability distribution for any other entity driven by random, fluctuating influences). Among the random-process concepts we shall study are spectral densities, correlation functions, the Fokker-Planck equation which governs the evolution of probability distributions, and the fluctuation-dissipation theorem which says that, associated with any kind of friction there must be fluctuating forces whose statistical properties are determined by the strength of the friction and the temperature of the entities that produce the friction. The theory of random processes, as treated in Chap. 5, also includes the theory of signals and noise. At first sight this undeniably important topic, which lies at the heart of experimental and observational science, might seem outside the scope of this book. However, we shall discover that it is intimately connected to statistical physics and that similar principles to those used to describe, say, Brownian motion are appropriate when thinking about, for example, how to detect the electronic signal of a rare particle event against a strong and random background. We shall study techniques for extracting weak signals from noisy data by filtering the data, and the limits that noise places on the accuracies of physics experiments and on the reliability of communications channels.

Chapter 2
Kinetic Theory

Box 2.1 Reader's Guide
This chapter develops nonrelativistic (Newtonian) kinetic theory and also the relativistic theory. As in Chap. 1, sections and exercises labeled [N] are Newtonian, and those labeled [R] are relativistic. The N material can be read without the R material, but the R material requires the N material as a foundation. This chapter relies on Chap. 1's geometric viewpoint on physics, both N and R; and it especially relies on Secs. 1.4 and 1.6 (particle kinetics), Sec. 1.11.4 (the number-flux 4-vector), and Sec. 1.12 (the Newtonian stress tensor and relativistic stress-energy tensor). This chapter (mostly the N part) is a crucial foundation for the remainder of Part I of this book (Statistical Physics), for small portions of Part IV (Fluid Mechanics; especially equations of state of fluids, the origin of viscosity, and the diffusion of heat in fluids), and for half of Part V (Plasma Physics: Chaps. 21 and 22).

2.1 Overview

In this chapter we shall study kinetic theory, the simplest of all branches of statistical physics. Kinetic theory deals with the statistical distribution of a "gas" made from a huge number of "particles" that travel freely, without collisions, for distances (mean free paths) long compared to their sizes. Examples of particles (here galaxies, stars, electrons, protons, molecules, photons, and neutrons) and phenomena that can be studied via kinetic theory are these: (i) How galaxies, formed in the early universe, congregate into clusters as the universe expands. (ii) How spiral structure develops in the distribution of our galaxy's stars. (iii) How, deep inside a white-dwarf star, relativistic degeneracy influences the equation of state of the star's electrons and protons. (iv) How a supernova explosion affects the evolution of the density and temperature of interstellar molecules. (v) How anisotropies in the expansion of the universe affect the temperature distribution of the cosmic microwave photons, the remnants of the big bang. (vi) How changes of a metal's temperature affect its thermal and electrical conductivity (with the heat and current carried by electrons). (vii) Whether neutrons in a nuclear reactor can survive long enough to maintain a nuclear chain reaction and keep the reactor hot.

Most of these applications involve particle speeds small compared to light and so can be studied with Newtonian theory, but some involve speeds near or at the speed of light and require relativity. Accordingly, we shall develop both versions of the theory, Newtonian and relativistic, and shall demonstrate that the Newtonian theory is the low-speed limit of the relativistic theory.

We begin in Sec. 2.2 by introducing the concepts of momentum space, phase space, and the distribution function. In Sec. 2.3 we study the distribution functions that characterize systems of particles in thermal equilibrium. There are three such equilibrium distributions: one for quantum mechanical particles with half-integral spin (fermions), another for quantum particles with integral spin (bosons), and a third for classical particles. As special applications, we derive the Maxwell velocity distribution for low-speed, classical particles (Ex. 2.4) and its high-speed relativistic analog (Ex. 2.5 and Fig. 2.6), and we compute the effects of observers' motions on their measurements of the cosmic microwave radiation created in the big-bang origin of the universe (Ex. 2.6). In Sec. 2.4 we learn how to compute macroscopic, physical-space quantities (particle density and flux, energy density, stress tensor, stress-energy tensor, ...) by integrating over the momentum portion of phase space. In Sec. 2.5 we show that, if the momentum distribution is isotropic in some reference frame, then on macroscopic scales the particles constitute a perfect fluid, and we use our momentum-space integrals to evaluate the equations of state of various kinds of perfect fluids: a nonrelativistic hydrogen gas in both the classical, nondegenerate regime and the regime of electron degeneracy (Sec. 2.5.2), a relativistically degenerate gas (Sec. 2.5.4), and a photon gas (Sec. 2.5.5); and we use our results to discuss the physical nature of matter as a function of density and temperature (Fig. 2.7). In Sec. 2.6 we study the evolution of the distribution function, as described by Liouville's theorem and by the associated collisionless Boltzmann equation when collisions between particles are unimportant, and by the Boltzmann transport equation when collisions are significant; and we use a simple variant of these evolution laws to study the heating of the Earth by the Sun, and the key role played by the greenhouse effect (Ex. 2.14). Finally, in Sec. 2.7 we learn how to use the Boltzmann transport equation to compute the transport coefficients (diffusion coefficient, electrical conductivity, thermal conductivity, and viscosity) which describe the diffusive transport of particles, charge, energy, and momentum through a gas of particles that collide frequently; and we use the Boltzmann transport equation to study a chain reaction in a nuclear reactor (Ex. 2.20).

2.2 Phase Space and Distribution Function

2.2.1 [N] Newtonian Number Density in Phase Space, N

In Newtonian, 3-dimensional space (physical space), consider a particle with rest mass m that moves along a path x(t) as universal time t passes [Fig. 2.1(a)]. The particle's time-varying velocity and momentum are v(t) = dx/dt and p(t) = mv. The path x(t) is a curve in the physical space, and the momentum p(t) is a time-varying, coordinate-independent vector in the physical space.

[Figure 2.1. Panel (a): physical space with axes x, y, z, showing the path x(t) marked at t = 0, 1, ..., 5, with positions x(0), x(2) and momenta p(0), p(2). Panel (b): momentum space with axes px, py, pz, showing the curve traced by the tip of p(t), marked at t = 0, 1, ..., 5.]
Fig. 2.1: (a) Euclidean physical space, in which a particle moves along a curve x(t) that is parametrized by universal time t, and in which the particle's momentum p(t) is a vector tangent to the curve. (b) Momentum space in which the particle's momentum vector p is placed, unchanged, with its tail at the origin. As time passes, the momentum's tip sweeps out the indicated curve p(t).

It is useful to introduce an auxiliary 3-dimensional space, called momentum space, in which we place the tail of p(t) at the origin. As time passes, the tip of p(t) sweeps out a curve in momentum space [Fig. 2.1(b)]. This momentum space is "secondary" in the sense that it relies for its existence on the physical space of Fig. 2.1(a). Any Cartesian coordinate system of physical space, in which the location x(t) of the particle has coordinates (x, y, z), induces in momentum space a corresponding coordinate system $(p_x, p_y, p_z)$. The 3-dimensional physical space and 3-dimensional momentum space together constitute a 6-dimensional phase space, with coordinates $(x, y, z, p_x, p_y, p_z)$. In this chapter we study a collection of a very large number of identical particles (all with the same rest mass m). As tools for this study, consider a tiny 3-dimensional volume $dV_x$ centered on some location x in physical space, and a tiny 3-dimensional volume $dV_p$ centered on location p in momentum space. Together these make up a tiny 6-dimensional volume

$$d^2\mathcal{V} \equiv dV_x\, dV_p . \qquad (2.1)$$

In any Cartesian coordinate system, we can think of $dV_x$ as being a tiny cube located at (x, y, z) and having edge lengths dx, dy, dz; and similarly for $dV_p$. Then, as computed in this coordinate system, these tiny volumes are

$$dV_x = dx\,dy\,dz , \qquad dV_p = dp_x\,dp_y\,dp_z , \qquad d^2\mathcal{V} = dx\,dy\,dz\,dp_x\,dp_y\,dp_z . \qquad (2.2)$$

Denote by dN the number of particles (all with rest mass m) that reside inside $d^2\mathcal{V}$ in phase space (at some moment of time t). Stated more fully: dN is the number of particles that, at time t, are located in the 3-volume $dV_x$ centered on the location x in physical space, and that also have momentum vectors whose tips at time t lie in the 3-volume $dV_p$ centered on location p in momentum space. Denote by

$$\mathcal{N}(\mathbf{x}, \mathbf{p}, t) \equiv \frac{dN}{d^2\mathcal{V}} = \frac{dN}{dV_x\, dV_p} \qquad (2.3)$$

the number density of particles at location (x, p) in phase space at time t. This is also called the distribution function. This distribution function is kinetic theory’s principal tool for describing any collection of a large number of identical particles.
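Definition (2.3) can be made concrete by literally counting particles in a small phase-space cell. The Python/NumPy sketch below uses a hypothetical ensemble chosen only for illustration: particles with positions uniform in a unit cube and each momentum component Gaussian with standard deviation `SIGMA` (a Maxwellian in suitable units). It compares the counted dN divided by the cell's phase-space volume with the known analytic N at the cell center. The particle number, cell sizes and random seed are all assumptions.

```python
import numpy as np

# Hypothetical ensemble for illustration (all values are assumptions):
# N_TOTAL identical particles, positions uniform in the unit cube, each
# momentum component Gaussian with standard deviation SIGMA.
N_TOTAL = 1_000_000
SIGMA = 1.0
rng = np.random.default_rng(seed=1)
positions = rng.uniform(0.0, 1.0, size=(N_TOTAL, 3))
momenta = rng.normal(0.0, SIGMA, size=(N_TOTAL, 3))


def phase_space_density(x_center, p_center, dx, dp):
    """Estimate N = dN / (dV_x dV_p), Eq. (2.3): count the particles inside a
    cube of edge dx about x_center whose momentum tips lie inside a cube of
    edge dp about p_center, then divide by the cell's phase-space volume."""
    in_x = np.all(np.abs(positions - x_center) < dx / 2, axis=1)
    in_p = np.all(np.abs(momenta - p_center) < dp / 2, axis=1)
    dN = np.count_nonzero(in_x & in_p)
    return dN / (dx**3 * dp**3)


def exact_density(p_center):
    """Analytic N for this ensemble: particles per unit volume (N_TOTAL)
    times the Gaussian probability density in momentum space."""
    p2 = float(np.sum(np.asarray(p_center) ** 2))
    return N_TOTAL * np.exp(-p2 / (2 * SIGMA**2)) / (2 * np.pi * SIGMA**2) ** 1.5


x0 = np.array([0.5, 0.5, 0.5])
p0 = np.array([0.0, 0.0, 0.0])
estimate = phase_space_density(x0, p0, dx=0.5, dp=0.5)
print(f"counted N = {estimate:.4g}, analytic N at cell center = {exact_density(p0):.4g}")
```

The cell has to be small compared with the scales on which N varies, yet large enough to hold many particles. With these illustrative choices, the counted value should agree with the analytic one to within a few percent, limited by Poisson noise in dN and by the variation of N across the cell.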

2.2.2 [N] Distribution Function f(x, v, t) for Particles in a Plasma

Throughout Part I of this book (including this chapter), we will adopt the above definition (2.3) for the nonrelativistic distribution function and will regard it as depending on position and momentum. However, in Part V, when dealing with nonrelativistic plasmas (collections of electrons and ions that have speeds small compared to light), we will adopt a different viewpoint, one that is common among plasma physicists: we will regard the distribution function as depending on time t, location x in Euclidean space, and velocity v (instead of momentum p), and we will denote it

$$f(t, \mathbf{x}, \mathbf{v}) \equiv \frac{dN}{dV_x\, dV_v} = \frac{dN}{dx\,dy\,dz\,dv_x\,dv_y\,dv_z} = m^3\,\mathcal{N} . \qquad (2.4)$$
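The factor $m^3$ in Eq. (2.4) is the Jacobian of the nonrelativistic change of variables p = mv, which stretches each of the three momentum-space edge lengths by the factor m:

```latex
\begin{aligned}
dV_p &= dp_x\,dp_y\,dp_z = (m\,dv_x)(m\,dv_y)(m\,dv_z) = m^3\,dV_v ,\\[2pt]
f(t,\mathbf{x},\mathbf{v}) &= \frac{dN}{dV_x\,dV_v}
  = m^3\,\frac{dN}{dV_x\,dV_p} = m^3\,\mathcal{N}(\mathbf{x},\mathbf{p},t) .
\end{aligned}
```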

(This change of viewpoint and notation when transitioning to plasma physics is typical of the textbook you are reading. When presenting any subfield of physics, we shall usually adopt the conventions, notation, and also the system of units that are generally used in that subfield.)

2.2.3

[R] Relativistic Number Density in Phase Space, N

We shall define the special relativistic distribution function in precisely the same way as the nonrelativistic one, $\mathcal{N}(\mathbf{x}, \mathbf{p}, t) \equiv dN/d^2\mathcal{V} = dN/(dV_x\,dV_p)$, except that now $\mathbf{p}$ is the relativistic momentum, $\mathbf{p} = m\mathbf{v}/\sqrt{1 - v^2}$ if the particle has nonzero rest mass $m$. This definition of $\mathcal{N}$ appears, at first sight, to be frame-dependent, since the physical 3-volume $dV_x$ and momentum 3-volume $dV_p$ do not even exist until we have selected a specific reference frame. In other words, this definition appears to violate our insistence that relativistic physical quantities be described by frame-independent geometric objects that live in 4-dimensional spacetime. In fact, the distribution function defined in this way is frame-independent, though it does not look so. In order to elucidate this, we shall develop carefully and somewhat slowly the 4-dimensional spacetime ideas that underlie this relativistic distribution function.

Consider, as shown in Fig. 2.2(a), a classical particle with rest mass $m$, moving through spacetime along a world line $\mathcal{P}(\zeta)$, or equivalently $\vec x(\zeta)$, where $\zeta$ is an affine parameter related to the particle’s 4-momentum by

$$\vec p = d\vec x/d\zeta \qquad (2.5a)$$

[Eq. (1.18)]. If the particle has nonzero rest mass, then its 4-velocity $\vec u$ and proper time $\tau$ are related to its 4-momentum and affine parameter by

$$\vec p = m\,\vec u\,, \qquad \zeta = \tau/m \qquad (2.5b)$$

[Eqs. (1.18) and (1.19)], and we can parametrize the world line by either $\tau$ or $\zeta$. On the other hand, if the particle has zero rest mass, then its world line is null and $\tau$ does not change along it, so we have no choice but to use $\zeta$ as the world line’s parameter.

The particle can be thought of not only as living in four-dimensional spacetime [Fig. 2.2(a)], but also as living in a four-dimensional momentum space [Fig. 2.2(b)]. Momentum space, like spacetime, is a geometric, coordinate-independent concept: each point in momentum space corresponds to a specific 4-momentum $\vec p$. The tail of the vector $\vec p$ sits at the origin of momentum space and its head sits at the point representing $\vec p$. The momentum-space diagram drawn in Fig. 2.2(b) has as its coordinate axes the components $(p^0,\ p^1 = p_1 \equiv p_x,\ p^2 = p_2 \equiv p_y,\ p^3 = p_3 \equiv p_z)$ of the 4-momentum as measured in some arbitrary inertial frame. Because the squared length of the 4-momentum is always $-m^2$,

$$\vec p \cdot \vec p = -(p^0)^2 + (p_x)^2 + (p_y)^2 + (p_z)^2 = -m^2\,, \qquad (2.5c)$$

the particle’s 4-momentum (the tip of the 4-vector $\vec p$) is confined to a hyperboloid in momentum space. This hyperboloid is described mathematically by Eq. (2.5c) in any Lorentz coordinate system and is called the mass hyperboloid. If the particle has zero rest mass, then $\vec p$ is null and the mass hyperboloid is a cone with vertex at the origin of momentum space. As in Chap. 1, we shall often denote the particle’s energy $p^0$ by

$$\mathcal{E} \equiv p^0 \qquad (2.5d)$$

(with the $\mathcal{E}$ in script font to distinguish it from the nonrelativistic energy of a particle, $E = \tfrac{1}{2}mv^2$), and we shall embody its spatial momentum in the 3-vector $\mathbf{p} = p_x\mathbf{e}_x + p_y\mathbf{e}_y + p_z\mathbf{e}_z$, and therefore shall rewrite the mass-hyperboloid relation (2.5c) as

$$\mathcal{E}^2 = m^2 + |\mathbf{p}|^2\,. \qquad (2.5e)$$

Fig. 2.2: (a) The world line $\vec x(\zeta)$ of a particle in spacetime (with one spatial coordinate, $z$, suppressed), parametrized by a parameter $\zeta$ that is related to the particle’s 4-momentum by $\vec p = d\vec x/d\zeta$. (b) The trajectory of the particle in momentum space. The particle’s momentum is confined to the mass hyperboloid, $\vec p^{\,2} = -m^2$.
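Eq. (2.5c) holds in every Lorentz frame. Boosting a 4-momentum therefore changes its energy but keeps its tip on the same mass hyperboloid. Below is a minimal Python sketch of that statement in geometrized units. The boost along $x$ and the sample values of $m$ and $\mathbf{p}$ are arbitrary choices made for illustration.

```python
import math

def boost_x(p4, beta):
    """Components (E, px, py, pz) of a 4-momentum as measured in a frame moving
    with speed beta along +x (geometrized units, c = 1)."""
    E, px, py, pz = p4
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return (gamma * (E - beta * px), gamma * (px - beta * E), py, pz)

def minkowski_square(p4):
    """p . p = -(p^0)^2 + px^2 + py^2 + pz^2, the left-hand side of Eq. (2.5c)."""
    E, px, py, pz = p4
    return -E**2 + px**2 + py**2 + pz**2

m = 2.0                      # illustrative rest mass
p = (0.3, -1.2, 0.7)         # illustrative spatial momentum
E = math.sqrt(m**2 + sum(c**2 for c in p))   # Eq. (2.5e)
p4 = (E, *p)

for beta in (0.0, 0.5, 0.9, 0.999):
    q = boost_x(p4, beta)
    # The energy q[0] differs from frame to frame, but p . p stays at -m^2 = -4.
    print(beta, q[0], minkowski_square(q))
```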

If no forces act on the particle, then its momentum is conserved and its location in momentum space remains fixed. However, forces (e.g., due to an electromagnetic field) can push the particle’s 4-momentum along some curve in momentum space that lies on the mass hyperboloid. If we parametrize that curve by the same parameter $\zeta$ as we use in spacetime, then the particle’s trajectory in momentum space can be written abstractly as $\vec p(\zeta)$. Such a trajectory is shown in Fig. 2.2(b).

Because the mass hyperboloid is three-dimensional, we can characterize the particle’s location on it by just three coordinates rather than four. We shall typically use as those coordinates the spatial components of the particle’s 4-momentum, $(p_x, p_y, p_z)$, i.e. the spatial momentum vector $\mathbf{p}$ as measured in some specific (but usually arbitrary) inertial frame.

Momentum space and spacetime, taken together, constitute the relativistic phase space. We can regard phase space as eight-dimensional (four spacetime dimensions plus four momentum-space dimensions). Alternatively, if we think of the 4-momentum as confined to the three-dimensional mass hyperboloid, then we can regard phase space as seven-dimensional. This seven- or eight-dimensional phase space, by contrast with the nonrelativistic six-dimensional phase space, is frame-independent. No coordinates or reference frame are actually needed to define spacetime and explore its properties, and none are needed to define and explore 4-momentum space or the mass hyperboloid, though inertial (Lorentz) coordinates are often helpful in practical situations.

Turn attention, now, from an individual particle to a collection of a huge number of identical particles, each with the same rest mass $m$, and allow $m$ to be finite or zero; it does not matter which. Examine those particles that pass close to a specific event $\mathcal{P}$ (also denoted $\vec x$) in spacetime, and examine them from the viewpoint of a specific observer, who lives in a specific inertial reference frame. Fig. 2.3(a) is a spacetime diagram drawn in that observer’s frame. As seen in that frame, the event $\mathcal{P}$ occurs at time $t$ and at spatial location $(x, y, z)$.

We ask the observer, at the time $t$ of the chosen event, to define the distribution function $\mathcal{N}$ in exactly the same way as in Newtonian theory, except that $\mathbf{p}$ is the relativistic spatial momentum $\mathbf{p} = m\mathbf{v}/\sqrt{1 - v^2}$ instead of the nonrelativistic $\mathbf{p} = m\mathbf{v}$. Specifically, the observer, in her inertial frame, chooses a tiny 3-volume

$$dV_x = dx\,dy\,dz \qquad (2.6a)$$

centered on location $\mathbf{x}$ [little horizontal rectangle shown in drawing 2.3(a)] and a tiny 3-volume

$$dV_p = dp_x\,dp_y\,dp_z \qquad (2.6b)$$

centered on $\mathbf{p}$ in momentum space [little rectangle in the $p_x$-$p_y$ plane in drawing 2.3(b)]. Ask the observer to focus on that collection $G$ of particles which lie in $dV_x$ and have spatial momenta in $dV_p$ (Fig. 2.3). If there are $dN$ particles in this collection $G$, then the observer will identify

$$\mathcal{N} \equiv \frac{dN}{dV_x\,dV_p} \equiv \frac{dN}{d^2\mathcal{V}} \qquad (2.7)$$

as the number density of particles in phase space.

Fig. 2.3: Definition of the distribution function from the viewpoint of a specific observer in a specific inertial reference frame, whose axes are used in these drawings: At the event $\mathcal{P}$ [denoted by the dot in drawing (a)], the observer selects a 3-volume $dV_x$, and she focuses on the set $G$ of particles that lie in $dV_x$ and have momenta lying in a region of the mass hyperboloid that is centered on $\vec p$ and has 3-momentum volume $dV_p$ [drawing (b)]. If $dN$ is the number of particles in that set $G$, then $\mathcal{N}(\mathcal{P}, \vec p) \equiv dN/(dV_x\,dV_p)$.

Notice in drawing 2.3(b) that the 4-momenta of the particles in $G$ have their tails at the origin of momentum space (as by definition do all 4-momenta), and have their tips in a tiny rectangular box on the mass hyperboloid: a box centered on the 4-momentum $\vec p$ whose spatial part is $\mathbf{p}$ and whose temporal part is $p^0 = \mathcal{E} = \sqrt{m^2 + \mathbf{p}^2}$. The momentum volume element $dV_p$ is the projection of that mass-hyperboloid box onto the horizontal ($p_x$-$p_y$-$p_z$) plane in momentum space. (The mass-hyperboloid box itself can be thought of as a frame-independent vectorial 3-volume $d\vec\Sigma_p$, the momentum-space version of the vectorial 3-volume introduced in Sec. 1.11.2; see below.) The number density $\mathcal{N}$ depends on the location $\mathcal{P}$ in spacetime of the 3-volume $dV_x$ and on the 4-momentum $\vec p$ about which the momentum volume on the mass hyperboloid is centered: $\mathcal{N} = \mathcal{N}(\mathcal{P}, \vec p)$. From the chosen observer’s viewpoint, it can be regarded as a function of time $t$ and spatial location $\mathbf{x}$ (the components of $\mathcal{P}$) and of spatial momentum $\mathbf{p}$.

At first sight, one might expect $\mathcal{N}$ to depend also on the inertial reference frame used in its definition, i.e., on the 4-velocity of the observer. If this were the case, i.e., if $\mathcal{N}$ at fixed $\mathcal{P}$ and $\vec p$ were different when computed by the above prescription using different inertial frames, then we would feel compelled to seek some other object, one that is frame-independent, to serve as our foundation for kinetic theory. This is because the principle of relativity insists that all fundamental physical laws should be expressible in frame-independent language. Fortunately, the distribution function (2.7) is frame-independent by itself, i.e. it is a frame-independent scalar field in phase space; so we need seek no further.

Proof of Frame-Independence of $\mathcal{N} = dN/d^2\mathcal{V}$: To prove the frame-independence of $\mathcal{N}$, we shall consider, first, the frame dependence of the spatial 3-volume $dV_x$, then the frame dependence of the momentum 3-volume $dV_p$, and finally the frame dependence of their product $d^2\mathcal{V} = dV_x\,dV_p$ and thence of the distribution function $\mathcal{N} = dN/d^2\mathcal{V}$.

The thing that identifies the 3-volume $dV_x$ and the momentum 3-volume $dV_p$ is the set of particles $G$. We shall select that set once and for all and hold it fixed, and correspondingly the number of particles $dN$ in the set will be fixed. Moreover, we shall assume that the particles’ rest mass $m$ is nonzero and shall deal with the zero-rest-mass case at the end by taking the limit $m \to 0$. Then there is a preferred frame in which to observe the particles $G$: their own rest frame, which we shall identify by a prime. In their rest frame and at a chosen event $\mathcal{P}$, the particles $G$ occupy the interior of some box with imaginary walls that has some 3-volume $dV_x'$. As seen in some other “laboratory” frame, their box has a Lorentz-contracted volume $dV_x = \sqrt{1 - v^2}\,dV_x'$. Here $v$ is their speed as seen in the laboratory frame. The Lorentz-contraction factor is related to the particles’ energy, as measured in the laboratory frame, by $\sqrt{1 - v^2} = m/\mathcal{E}$, and therefore $\mathcal{E}\,dV_x = m\,dV_x'$. The right-hand side is a frame-independent constant $m$ times a well-defined number that everyone can agree on: the particles’ rest-frame volume $dV_x'$; i.e.

$$\mathcal{E}\,dV_x = (\text{a frame-independent quantity})\,. \qquad (2.8a)$$

Thus, the spatial volume $dV_x$ occupied by the particles is frame-dependent, and their energy $\mathcal{E}$ is frame-dependent, but the product of the two is independent of reference frame.

Turn now to the frame dependence of the particles’ momentum 3-volume $dV_p$. As one sees from Fig. 2.3(b), $dV_p$ is the projection of the frame-independent mass-hyperboloid region $d\vec\Sigma_p$ onto the laboratory’s $xyz$ 3-space. Equivalently, it is the time component $d\Sigma_p^0$ of $d\vec\Sigma_p$. Now, the 4-vector $d\vec\Sigma_p$, like the 4-momentum $\vec p$, is orthogonal to the mass hyperboloid at the common point where they intersect it, and therefore $d\vec\Sigma_p$ is parallel to $\vec p$. This means that, when one goes from one reference frame to another, the time components of these two vectors will grow or shrink in the same manner, i.e., $d\Sigma_p^0 = dV_p$ is proportional to $p^0 = \mathcal{E}$. Consequently their ratio is frame-independent:

$$\frac{dV_p}{\mathcal{E}} = (\text{a frame-independent quantity})\,. \qquad (2.8b)$$

(If this sophisticated argument seems too slippery to you, then you can develop an alternative, more elementary proof using simpler two-dimensional spacetime diagrams: Exercise 2.1.) By taking the product of Eqs. (2.8a) and (2.8b) we see that for our chosen set of particles $G$,

$$dV_x\,dV_p = d^2\mathcal{V} = (\text{a frame-independent quantity})\,; \qquad (2.8c)$$

and since the number of particles in the set, $dN$, is obviously frame-independent, we conclude that

$$\mathcal{N} = \frac{dN}{dV_x\,dV_p} \equiv \frac{dN}{d^2\mathcal{V}} = (\text{a frame-independent quantity})\,. \qquad (2.9)$$
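The two scaling laws (2.8a) and (2.8b) can be spot-checked numerically. The Python sketch below uses made-up values of $m$, $\mathbf{p}$ and the boost speed, and its helper names are hypothetical. It boosts momenta along $x$ and computes the Jacobian $dV_p'/dV_p$ of the induced map on the mass hyperboloid by central finite differences. That Jacobian should equal $\mathcal{E}'/\mathcal{E}$. The sketch also checks that Lorentz contraction makes $\mathcal{E}\,dV_x$ equal to $m\,dV_x'$.

```python
import math

def energy(m, p):
    """E = sqrt(m^2 + |p|^2), Eq. (2.5e), with c = 1."""
    return math.sqrt(m**2 + sum(c**2 for c in p))

def boosted_momentum(m, p, beta):
    """Spatial momentum of a particle on the mass hyperboloid, as measured in a
    frame moving with speed beta along +x."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    px, py, pz = p
    return (gamma * (px - beta * energy(m, p)), py, pz)

def jacobian_det(f, p, h=1e-6):
    """det(d f_i / d p_j) for f: R^3 -> R^3, using central differences."""
    cols = []
    for j in range(3):
        step = [0.0, 0.0, 0.0]
        step[j] = h
        plus = f(tuple(a + b for a, b in zip(p, step)))
        minus = f(tuple(a - b for a, b in zip(p, step)))
        cols.append([(plus[i] - minus[i]) / (2 * h) for i in range(3)])
    J = [[cols[j][i] for j in range(3)] for i in range(3)]   # J[i][j] = d f_i / d p_j
    return (J[0][0] * (J[1][1] * J[2][2] - J[1][2] * J[2][1])
            - J[0][1] * (J[1][0] * J[2][2] - J[1][2] * J[2][0])
            + J[0][2] * (J[1][0] * J[2][1] - J[1][1] * J[2][0]))

m, p, beta = 1.5, (0.4, -0.8, 1.1), 0.7    # illustrative values

# Momentum part, Eq. (2.8b): dV_p'/dV_p equals E'/E, so dV_p/E is the same in both frames.
volume_ratio = jacobian_det(lambda q: boosted_momentum(m, q, beta), p)
energy_ratio = energy(m, boosted_momentum(m, p, beta)) / energy(m, p)
print(volume_ratio, energy_ratio)

# Spatial part, Eq. (2.8a): a box of rest-frame volume V_rest moving at speed v is
# Lorentz-contracted to V_rest*sqrt(1 - v^2), while its particles' energy is m/sqrt(1 - v^2).
V_rest = 2.0
for v in (0.1, 0.6, 0.95):
    V_lab = V_rest * math.sqrt(1 - v**2)
    E_lab = m / math.sqrt(1 - v**2)
    print(v, E_lab * V_lab)   # always m * V_rest = 3.0 (up to rounding)
```

For a boost along $x$, only the $p_x$ row of the Jacobian differs from the identity. Its determinant is therefore $\gamma(1 - \beta p_x/\mathcal{E}) = \mathcal{E}'/\mathcal{E}$, which is what the finite-difference value reproduces.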

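Fig. 2.4: Geometric construction used in defining the “specific intensity” $I_\nu$ (drawing labels: area element $dA$, propagation direction $\mathbf{n}$, time interval $dt$, solid angle $d\Omega$).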

Although we assumed nonzero rest mass, $m \neq 0$, in our derivation, the conclusions that $\mathcal{E}\,dV_x$ and $dV_p/\mathcal{E}$ are frame-independent continue to hold if we take the limit $m \to 0$ and the 4-momenta become null. Correspondingly, all of Eqs. (2.8a)–(2.9) are valid for particles with zero rest mass as well as nonzero. The normalization that one uses for the distribution function is arbitrary; renormalize $\mathcal{N}$ by multiplying it by any constant, and $\mathcal{N}$ will still be a geometric, frame-independent quantity and will still contain the same information as before. In this book, we shall use several renormalized versions of $\mathcal{N}$, depending on the situation. We shall now introduce them:

2.2.4

[R & N] Distribution function Iν /ν 3 for photons.

[Note to those readers who are restricting themselves to the Newtonian portions of this book: Please read Box 1.3, which lists a few items of special relativity that you will need. As described there, you can deal with photons fairly easily by simply remembering that a photon has zero rest mass and has an energy $\mathcal{E} = h\nu$ and momentum $\mathbf{p} = (h\nu/c)\mathbf{n}$, where $\nu$ is its frequency and $\mathbf{n}$ is a unit vector pointing in its spatial direction. By keeping this in mind, adjusting to the idea that we use “geometrized units” in which the speed of light is unity [Eqs. (1.3)], and accepting the occasional notation $p^0 = \mathcal{E}$ for the photon energy, you should be able to understand the photon portions of this and later chapters.]

When dealing with photons or other zero-rest-mass particles, one often re-expresses $\mathcal{N}$ in terms of the specific intensity, $I_\nu$. This quantity is defined as follows (cf. Fig. 2.4): An observer places a CCD (or other measuring device) perpendicular to the spatial direction $\mathbf{n}$ of propagation of the photons, perpendicular as measured in her inertial frame. The region of the CCD that the photons hit has surface area $dA$ as measured by her. Because the photons move at the speed of light $c = 1$, the product of that surface area with the time $dt$ that they take to all go through the CCD is equal to the volume they occupy at a specific moment of time,

$$dV_x = dA\,dt\,. \qquad (2.10a)$$

The photons all have nearly the same frequency $\nu$ as measured by the observer, and their energy $\mathcal{E}$ and momentum $\mathbf{p}$ are related to that frequency and to their propagation direction $\mathbf{n}$ by

$$\mathcal{E} = p^0 = h\nu\,, \qquad \mathbf{p} = h\nu\,\mathbf{n}\,, \qquad (2.10b)$$

where $h$ is Planck’s constant. Their frequencies lie in a range $d\nu$ centered on $\nu$, and they come from a small solid angle $d\Omega$ centered on $-\mathbf{n}$; and the volume they occupy in momentum space is related to these by

$$dV_p = |\mathbf{p}|^2\,d\Omega\,d|\mathbf{p}| = h^3\nu^2\,d\Omega\,d\nu\,. \qquad (2.10c)$$

The photons’ specific intensity, as measured by the observer, is defined to be the total energy,

$$d\mathcal{E} = h\nu\,dN\,, \qquad (2.10d)$$

that crosses the CCD per unit area $dA$, per unit time $dt$, per unit frequency $d\nu$, and per unit solid angle $d\Omega$ (i.e., “per unit everything”):

$$I_\nu \equiv \frac{d\mathcal{E}}{dA\,dt\,d\nu\,d\Omega}\,. \qquad (2.11)$$

(This $I_\nu$ is sometimes denoted $I_{\nu\Omega}$.) From Eqs. (2.9), (2.10) and (2.11) we readily deduce the following relationship between this specific intensity and the distribution function:

$$\mathcal{N} = \frac{c^2\,I_\nu}{h^4\nu^3}\,. \qquad (2.12)$$
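One way to fill in the intermediate step is sketched below. It restores $c$ in the volume elements of Eqs. (2.10a) and (2.10c), which the text writes with $c = 1$, so that $dV_x = c\,dA\,dt$ and $|\mathbf{p}| = h\nu/c$:

```latex
\begin{aligned}
dV_x &= c\,dA\,dt\,, \qquad
dV_p = |\mathbf{p}|^2\,d|\mathbf{p}|\,d\Omega = \frac{h^3\nu^2}{c^3}\,d\nu\,d\Omega\,, \qquad
dN = \frac{d\mathcal{E}}{h\nu} = \frac{I_\nu\,dA\,dt\,d\nu\,d\Omega}{h\nu}\,,\\[4pt]
\mathcal{N} &= \frac{dN}{dV_x\,dV_p}
 = \frac{I_\nu\,dA\,dt\,d\nu\,d\Omega}{h\nu}\cdot
   \frac{c^3}{c\,dA\,dt\;h^3\nu^2\,d\nu\,d\Omega}
 = \frac{c^2\,I_\nu}{h^4\nu^3}\,.
\end{aligned}
```

All of $dA$, $dt$, $d\nu$ and $d\Omega$ cancel. This is why the ratio is a property of the radiation alone and not of the detector.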

Here the factor $c^2$ has been inserted so that $I_\nu$ is expressed in ordinary (cgs or mks) units in accord with astronomers’ conventions. This relation shows that, with an appropriate renormalization, $I_\nu/\nu^3$ is the photons’ distribution function. Astronomers and opticians regard the specific intensity (or equally well $I_\nu/\nu^3$) as a function of the photon propagation direction $\mathbf{n}$, photon frequency $\nu$, location $\mathbf{x}$ in space, and time $t$. By contrast, relativistic physicists regard the distribution function $\mathcal{N}$ as a function of the photon 4-momentum $\vec p$ (on the photons’ mass hyperboloid, which is the light cone) and of location $\mathcal{P}$ in spacetime. Clearly, the information contained in these two sets of variables, the astronomers’ set and the physicists’ set, is the same. If two different astronomers in two different inertial frames at the same event in spacetime examine the same set of photons, they will measure the photons to have different frequencies $\nu$ (because of the Doppler shift between their two frames). They will also measure different specific intensities $I_\nu$, because of Doppler shifts of frequencies, Doppler shifts of energies, dilation of times, Lorentz contraction of areas of CCDs, and aberrations of photon propagation directions and thence distortions of solid angles. However, if each astronomer computes the ratio of the specific intensity that she measures to the cube of the frequency she measures, that ratio, according to Eq. (2.12), will be the same as computed by the other astronomer; i.e., the distribution function $I_\nu/\nu^3$ will be frame-independent.
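The frame-independence of $I_\nu/\nu^3$ can be illustrated numerically under two stated assumptions. First, the emitter's radiation is black-body (the Planck form that appears later as Eq. (2.23)). Second, the two observers' frequencies are related by a single hypothetical Doppler factor $D = \nu_{\rm obs}/\nu_{\rm emit}$; the sketch ignores how $D$ varies with direction. Holding $I_\nu/\nu^3$ fixed turns the spectrum into another Planck spectrum, with temperature $DT$.

```python
import math

h = 6.626e-27    # Planck's constant, erg s
c = 2.998e10     # speed of light, cm/s
k_B = 1.381e-16  # Boltzmann's constant, erg/K

def planck_I_nu(nu, T):
    """Black-body specific intensity in cgs units (the Planck form of Eq. (2.23))."""
    return (2 * h / c**2) * nu**3 / math.expm1(h * nu / (k_B * T))

T = 2.73   # emitter-frame temperature in kelvin (illustrative)
D = 1.05   # hypothetical Doppler factor nu_obs / nu_emit between the two frames

ratios = []
for nu_obs in (5e10, 1.6e11, 4e11):
    nu_emit = nu_obs / D
    # Frame-independence of I_nu / nu^3 fixes what the moving observer measures:
    I_obs = planck_I_nu(nu_emit, T) * (nu_obs / nu_emit) ** 3
    # Compare with a Planck spectrum at the Doppler-shifted temperature D * T.
    ratios.append(I_obs / planck_I_nu(nu_obs, D * T))
    print(nu_obs, ratios[-1])   # 1.0 up to rounding
```

This invariance is why an observer moving through thermal radiation sees, in each direction, a black-body spectrum whose temperature is Doppler shifted.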

2.2.5

[N & R] Mean Occupation Number, η

Although this book is about classical physics, we cannot avoid making occasional contact with quantum theory. The reason is that classical physics is quantum mechanical in origin. Classical physics is an approximation to quantum physics, and not conversely. Classical physics is derivable from quantum physics, and not conversely.

In statistical physics, the classical theory cannot fully shake itself free from its quantum roots; it must rely on them in crucial ways that we shall meet in this chapter and the next. Therefore, rather than try to free it from its roots, we shall expose the roots and profit from them by introducing a quantum mechanically based normalization for the distribution function: the “mean occupation number” $\eta$.

As an aid in defining the mean occupation number, we introduce the concept of the density of states. Consider a relativistic particle of mass $m$, described quantum mechanically. Suppose that the particle is known to be located in a volume $dV_x$ (as observed in a specific inertial reference frame) and to have a spatial momentum in the region $dV_p$ centered on $\mathbf{p}$. Suppose, further, that the particle does not interact with any other particles or fields. How many single-particle quantum mechanical states¹ are available to the free particle?

This question is answered most easily by constructing, in the particle’s inertial frame, a complete set of wave functions for the particle’s spatial degrees of freedom. The wave functions are (i) confined to be eigenfunctions of the momentum operator, and (ii) confined to satisfy the standard periodic boundary conditions on the walls of a box with volume $dV_x$. For simplicity, let the box have edge length $L$ along each of the three spatial axes of the Cartesian spatial coordinates, so $dV_x = L^3$. (This $L$ is arbitrary and will drop out of our analysis shortly.) Then a complete set of wave functions satisfying (i) and (ii) is the set $\{\psi_{j,k,l}\}$ with

$$\psi_{j,k,l}(x, y, z) = \frac{1}{L^{3/2}}\, e^{i(2\pi/L)(jx + ky + lz)}\, e^{-i\omega t} \qquad (2.13a)$$

[cf., e.g., pp. 1440–1442 of Cohen-Tannoudji, Diu and Laloe (1977), especially the Comment at the end of this section]. Periodic boundary conditions require the wave function to take on the same values at the left and right faces of the box ($x = -L/2$ and $x = +L/2$), at the front and back faces, and at the top and bottom faces. This demand dictates that the quantum numbers $j$, $k$, and $l$ be integers. The basis states (2.13a) are eigenfunctions of the momentum operator $(\hbar/i)\nabla$ with momentum eigenvalues

$$p_x = \frac{2\pi\hbar}{L}\,j\,, \qquad p_y = \frac{2\pi\hbar}{L}\,k\,, \qquad p_z = \frac{2\pi\hbar}{L}\,l\,; \qquad (2.13b)$$

and correspondingly the wave function’s frequency $\omega$ has the following values in Newtonian theory [N] and relativity [R]:

$$\text{[N]}\quad \hbar\omega = E = \frac{\mathbf{p}^2}{2m} = \frac{1}{2m}\left(\frac{2\pi\hbar}{L}\right)^2 (j^2 + k^2 + l^2)\,; \qquad (2.13c)$$

$$\text{[R]}\quad \hbar\omega = \mathcal{E} = \sqrt{m^2 + \mathbf{p}^2} \;\to\; m + E \quad \text{in the Newtonian limit.} \qquad (2.13d)$$

Equations (2.13b) tell us that the allowed values of the momentum are confined to “lattice sites” in 3-momentum space with one site in each cube of side $2\pi\hbar/L$. Correspondingly, the total number of states in the region $dV_x\,dV_p$ of phase space is the number of cubes of side $2\pi\hbar/L$ in the region $dV_p$ of momentum space:

$$dN_{\rm states} = \frac{dV_p}{(2\pi\hbar/L)^3} = \frac{L^3\,dV_p}{(2\pi\hbar)^3} = \frac{dV_x\,dV_p}{h^3}\,. \qquad (2.14)$$

¹ A quantum mechanical state for a single particle is called an “orbital” in the chemistry literature and in the classic thermal physics textbook by Kittel and Kroemer (1980); we shall use physicists’ more conventional but cumbersome phrase “single-particle quantum state”.

This is true no matter how relativistic or nonrelativistic the particle may be.

Thus far we have considered only the particle’s spatial degrees of freedom. Particles can also have an internal degree of freedom called “spin”. For a particle with spin $s$, the number of independent spin states is

$$g_s = \begin{cases} 2s + 1 & \text{if } m \neq 0;\ \text{e.g., an electron or proton or atomic nucleus,}\\ 2 & \text{if } m = 0 \ \&\ s > 0;\ \text{e.g., a photon } (s = 1) \text{ or graviton } (s = 2),\\ 1 & \text{if } m = 0 \ \&\ s = 0;\ \text{i.e., a hypothetical massless scalar particle.} \end{cases} \qquad (2.15)$$

We shall call this number of internal spin states the particle’s multiplicity. Taking account both of the particle’s spatial degrees of freedom and its spin degree of freedom, we conclude that the total number of independent quantum states available in the region $dV_x\,dV_p \equiv d^2\mathcal{V}$ of phase space is $dN_{\rm states} = (g_s/h^3)\,d^2\mathcal{V}$. Correspondingly, the number density of states in phase space is

$$\mathcal{N}_{\rm states} \equiv \frac{dN_{\rm states}}{d^2\mathcal{V}} = \frac{g_s}{h^3}\,. \qquad (2.16)$$
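The counting argument behind Eqs. (2.14) and (2.16) can be checked by enumerating the integer lattice of Eq. (2.13b) directly. The Python sketch below assumes a spherical momentum region $|\mathbf{p}| \le p_{\max}$, a choice made only so that the volume is easy to write down. The helper names are hypothetical.

```python
import math

def spin_multiplicity(m, s):
    """g_s of Eq. (2.15): 2s+1 if m != 0; 2 if m = 0 and s > 0; 1 if m = 0 and s = 0."""
    if m != 0:
        return int(round(2 * s + 1))
    return 2 if s > 0 else 1

def count_lattice_states(R):
    """Number of integer triples (j, k, l) with j^2 + k^2 + l^2 <= R^2: the momentum
    eigenstates (2.13b) with |p| <= p_max, where R = p_max * L / (2*pi*hbar)."""
    total = 0
    r = int(math.floor(R))
    for j in range(-r, r + 1):
        for k in range(-r, r + 1):
            rem = R * R - j * j - k * k
            if rem >= 0:
                l_max = int(math.floor(math.sqrt(rem)))
                total += 2 * l_max + 1      # l = -l_max, ..., +l_max
    return total

# Eq. (2.14) predicts dVx dVp / h^3 = L^3 (4 pi p_max^3 / 3) / (2 pi hbar)^3 = (4 pi / 3) R^3
# states per spin state, independent of the arbitrary box size L.
for R in (5.0, 20.0, 60.0):
    exact = count_lattice_states(R)
    estimate = 4 * math.pi * R**3 / 3
    print(R, exact, round(estimate), exact / estimate)   # ratio approaches 1 as R grows

# Eq. (2.16) then multiplies by g_s, e.g. 2 for an electron (s = 1/2) or a photon.
print(spin_multiplicity(m=9.11e-28, s=0.5), spin_multiplicity(m=0.0, s=1))
```

The fractional mismatch comes from lattice sites near the boundary of the momentum region. It shrinks as the region grows, which is why Eq. (2.14) is reliable for macroscopic phase-space cells.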

Note that, although we derived this number density of states using a specific inertial reference frame, it is a frame-independent quantity (in relativity theory). Its numerical value depends only on Planck’s constant and, through $g_s$, on the particle’s rest mass $m$ and spin $s$.

The ratio of the number density of particles to the number density of quantum states is obviously the number of particles in each state (the state’s occupation number), averaged over many neighboring states, but few enough that the averaging region is small by macroscopic standards. In other words, this ratio is the quantum states’ mean occupation number $\eta$:

$$\eta = \frac{\mathcal{N}}{\mathcal{N}_{\rm states}} = \frac{h^3}{g_s}\,\mathcal{N}\,; \qquad \text{i.e.,} \quad \mathcal{N} = \mathcal{N}_{\rm states}\,\eta = \frac{g_s}{h^3}\,\eta\,. \qquad (2.17)$$

The mean occupation number $\eta$ plays an important role in quantum statistical mechanics, and its quantum roots have a profound impact on classical statistical physics. From quantum theory we learn that the allowed values of the occupation number for a quantum state depend on whether the state is that of a fermion (a particle with spin 1/2, 3/2, 5/2, …) or that of a boson (a particle with spin 0, 1, 2, …). For fermions no two particles can occupy the same quantum state, so the occupation number can only take on the eigenvalues 0 and 1. For bosons one can shove any number of particles one wishes into the same quantum state, so the occupation number can take on the eigenvalues 0, 1, 2, 3, …. Correspondingly, the mean occupation numbers must lie in the ranges

$$0 \le \eta \le 1 \ \text{ for fermions}\,, \qquad 0 \le \eta < \infty \ \text{ for bosons}\,. \qquad (2.18)$$

Quantum theory also teaches us that when $\eta \ll 1$, the particles, whether fermions or bosons, behave like classical, discrete, distinguishable particles. When $\eta \gg 1$ (possible only for bosons), the particles behave like a classical wave: if the particles are photons ($s = 1$), like a classical electromagnetic wave; and if they are gravitons ($s = 2$), like a classical gravitational wave. This role of $\eta$ in revealing the particles’ physical behavior will motivate us frequently to use $\eta$ as our distribution function instead of $\mathcal{N}$.

Of course $\eta$, like $\mathcal{N}$, is a function of location in phase space: $\eta(\mathcal{P}, \vec p)$ in relativity with no inertial frame chosen, or $\eta(t, \mathbf{x}, \mathbf{p})$ in both relativity and Newtonian theory when an inertial frame is in use.

****************************
EXERCISES

Fig. 2.5: (a) Spacetime diagram drawn from the viewpoint of the (primed) rest frame of the particles $G$ for the special case where the laboratory frame moves in the $x'$ direction with respect to them. (b) Momentum-space diagram drawn from the viewpoint of the unprimed observer.

Exercise 2.1 Problem: Derivation of Frame-Dependences of $dV_x$ and $dV_p$
Use the two-dimensional spacetime diagrams of Fig. 2.5 to show that $\mathcal{E}\,dV_x$ and $dV_p/\mathcal{E}$ are frame-independent [Eqs. (2.8a) and (2.8b)].

Exercise 2.2 **Example: [R] Distribution Function for Particles with a Range of Rest Masses
Consider a collection of particles (e.g. stars) that are not all identical, but instead have different rest masses.

(a) For a subset $G$ of particles like that of Fig. 2.3 and associated discussion, but with a range of rest masses $dm$ centered on some value $m$, introduce the phase-space volume $d^2\mathcal{V} \equiv dV_x\,dV_p\,dm$ that the particles $G$ occupy. Explain why this occupied volume is frame-invariant.

(b) Show that this invariant occupied volume can be rewritten as $d^2\mathcal{V} = (dV_x\,\mathcal{E}/m)(dV_p\,d\mathcal{E}) = (dV_x\,\mathcal{E}/m)(dp^0\,dp_x\,dp_y\,dp_z)$. Explain the physical meaning of each of the terms in parentheses, and show that each is frame-invariant.

If the number of particles in the set $G$ is $dN$, then we define the frame-invariant distribution function by

$$\mathcal{N} \equiv \frac{dN}{d^2\mathcal{V}} = \frac{dN}{dV_x\,dV_p\,dm}\,. \qquad (2.19)$$

This is a function of location $\mathcal{P}$ in 4-dimensional spacetime and location $\vec p$ in 4-dimensional momentum space (not confined to the mass hyperboloid), i.e. a function of location in 8-dimensional phase space. We will explore the evolution of this distribution function in Box 2.2 below.

Exercise 2.3 Practice & Example: [N & R] Regimes of Particulate and Wave-like Behavior

(a) A gamma-ray burster is an astrophysical object at a cosmological distance from Earth ($\sim 10^{10}$ light years). It is probably a fireball of hot gas exploding outward from the vicinity of a newborn black hole, of colliding neutron stars, or of a colliding neutron star and black hole. The fireball emits gamma rays with individual photon energies, as measured at Earth, of $\mathcal{E} \sim 100$ keV. These photons arrive at Earth in a burst whose total energy per unit area is roughly $10^{-6}$ ergs/cm² and that lasts about one second. Assume the diameter of the emitting surface as seen from Earth is $\sim 1000$ km and there is no absorption along the route to Earth. Make a rough estimate of the mean occupation number of the burst’s photon states. Your answer should be in the region $\eta \ll 1$, so the photons behave like classical, distinguishable particles. Will the occupation number change as the photons propagate from the source to Earth?

(b) A highly nonspherical supernova in the Virgo cluster of galaxies (40 million light years from Earth) emits a burst of gravitational radiation with frequencies spread over the band 500 Hz to 2000 Hz, as measured at Earth. The burst comes out in a time of about 10 milliseconds, so it lasts only a few cycles, and it carries a total energy of roughly $10^{-3} M_\odot c^2$, where $M_\odot = 2 \times 10^{33}$ g is the mass of the sun.
The emitting region is about the size of the newly forming neutron-star core (10 km), which is small compared to the wavelength of the waves; so if one were to try to resolve the source spatially by imaging the waves with a gravitational lens, one would see only a blur of spatial size one wavelength rather than seeing the neutron star. What is the mean occupation number of the burst’s graviton states? Your answer should be in the region η ≫ 1, so the gravitons behave like a classical gravitational wave.

****************************

2.3

[N & R] Thermal-Equilibrium Distribution Functions

In Chap. 3 we will introduce with care, and explore in detail, the concept of “statistical equilibrium”—also called “thermal equilibrium”. For now, we rely on the reader’s prior experience with this concept.

If a collection of many identical particles is in thermal equilibrium in the neighborhood of an event $\mathcal{P}$ then, as we shall see in Chap. 3, there is a special inertial reference frame (the mean rest frame of the particles near $\mathcal{P}$) in which there are equal numbers of particles of any given speed going in all directions. That is, the mean occupation number $\eta$ is a function only of the magnitude $|\mathbf{p}|$ of the particle momentum and does not depend on the momentum’s direction. Equivalently, $\eta$ is a function of the particle’s energy. In the relativistic regime we shall use two different energies, one denoted $\mathcal{E}$ that includes the contribution of the particle’s rest mass and the other denoted $E$ that does not and is defined by

$$E \equiv \mathcal{E} - m = \sqrt{m^2 + p^2} - m \;\to\; \frac{p^2}{2m} \quad \text{in the low-velocity, Newtonian limit.} \qquad (2.20)$$
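Here is a short numerical look at the Newtonian limit in Eq. (2.20), in Python, with units $c = 1$ and an arbitrary illustrative mass. The sketch also relies on a floating-point observation that is not from the text. Evaluated literally, $\sqrt{m^2 + p^2} - m$ cancels catastrophically when $p \ll m$. The sketch therefore uses the algebraically equivalent form $p^2/(\sqrt{m^2 + p^2} + m)$, whose small-$p$ limit $p^2/2m$ is manifest.

```python
import math

def kinetic_energy(m, p):
    """E = sqrt(m^2 + p^2) - m from Eq. (2.20) (c = 1), rewritten algebraically as
    p^2 / (sqrt(m^2 + p^2) + m) so that it stays accurate when p << m."""
    return p * p / (math.sqrt(m * m + p * p) + m)

m = 1.0
for p in (1e-8, 1e-3, 0.1, 1.0, 10.0):
    naive = math.sqrt(m * m + p * p) - m     # loses all digits for tiny p
    print(f"p={p:g}  E={kinetic_energy(m, p):.6e}  naive={naive:.6e}  p^2/2m={p * p / (2 * m):.6e}")
```

For $p \ll m$ the first and last columns agree (the Newtonian limit), while for $p \gtrsim m$ the relativistic value falls below $p^2/2m$.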

In the nonrelativistic, Newtonian regime we shall use only $E = p^2/2m$. Most readers will already know that the details of the thermal equilibrium are fixed by two quantities: the mean density of particles and the mean energy per particle, or equivalently (as we shall see) by the chemical potential $\mu$ and the temperature $T$. By analogy with our treatment of relativistic energy, we shall use two different chemical potentials: one, $\tilde\mu$, that includes rest mass and the other,

$$\mu \equiv \tilde\mu - m\,, \qquad (2.21)$$

that does not. In the Newtonian regime we shall use only $\mu$. As we shall prove by an elegant argument in Chap. 3, in thermal equilibrium the mean occupation number has the following form at all energies, relativistic or nonrelativistic:

$$\eta = \frac{1}{e^{(E - \mu)/k_B T} + 1} \quad \text{for fermions}\,, \qquad (2.22a)$$

$$\eta = \frac{1}{e^{(E - \mu)/k_B T} - 1} \quad \text{for bosons}\,. \qquad (2.22b)$$

Here $k_B = 1.381 \times 10^{-16}\ \text{erg K}^{-1} = 1.381 \times 10^{-23}\ \text{J K}^{-1}$ is Boltzmann’s constant. Equation (2.22a) for fermions is called the Fermi-Dirac distribution; Eq. (2.22b) for bosons is called the Bose-Einstein distribution. In the relativistic regime, we can also write these distribution functions in terms of the energy $\mathcal{E}$ that includes the rest mass as

$$\text{[R]}\quad \eta = \frac{1}{e^{(E - \mu)/k_B T} \pm 1} = \frac{1}{e^{(\mathcal{E} - \tilde\mu)/k_B T} \pm 1}\,. \qquad (2.22c)$$

Notice that the equilibrium mean occupation number (2.22a) for fermions lies in the range 0 to 1 as required, while that (2.22b) for bosons lies in the range 0 to $\infty$. In the regime $\mu \ll -k_B T$, the mean occupation number is small compared to unity for all particle energies $E$ (since $E$ is never negative, i.e. $\mathcal{E}$ is never less than $m$). This is the domain of distinguishable, classical particles, and in it both the Fermi-Dirac and Bose-Einstein distributions become

$$\eta \simeq e^{-(E - \mu)/k_B T} = e^{-(\mathcal{E} - \tilde\mu)/k_B T} \quad \text{when } \mu \equiv \tilde\mu - m \ll -k_B T \quad \text{(classical particles)}. \qquad (2.22d)$$
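The approach of both quantum distributions to the Boltzmann form (2.22d), as $\mu$ becomes very negative compared with $k_B T$, can be seen numerically. Below is a Python sketch in units $k_B T = 1$, on an arbitrary illustrative grid of energies.

```python
import math

def eta_fermi_dirac(E, mu, kT):
    """Eq. (2.22a)."""
    return 1.0 / (math.exp((E - mu) / kT) + 1.0)

def eta_bose_einstein(E, mu, kT):
    """Eq. (2.22b); requires E > mu so that eta is positive."""
    return 1.0 / math.expm1((E - mu) / kT)

def eta_boltzmann(E, mu, kT):
    """Eq. (2.22d), the classical limit."""
    return math.exp(-(E - mu) / kT)

kT = 1.0                          # energies measured in units of k_B T
energies = [0.0, 0.5, 1.0, 3.0]   # E is never negative
worst_deviation = {}
for mu in (-0.5, -3.0, -10.0):
    worst_deviation[mu] = max(
        abs(eta(E, mu, kT) / eta_boltzmann(E, mu, kT) - 1.0)
        for E in energies
        for eta in (eta_fermi_dirac, eta_bose_einstein)
    )
    # The worst case is E = 0; once mu << -kT the deviation there is close to exp(mu / kT).
    print(mu, worst_deviation[mu])
```

For $\mu = -0.5\,k_BT$ the Bose-Einstein occupation at low energy is more than twice the Boltzmann value. By $\mu = -10\,k_BT$ all three distributions agree to a few parts in $10^5$.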

This limiting distribution is called the Boltzmann distribution.² By scrutinizing the distribution functions (2.22), one can deduce two trends. The larger the temperature $T$ at fixed $\mu$, the larger will be the typical energies of the particles. The larger the chemical potential $\mu$ at fixed $T$, the larger will be the total density of particles; see Ex. 2.4 and Eqs. (2.37). For bosons, $\mu$ must always be negative or zero, i.e. $\tilde\mu$ cannot exceed the particle rest mass $m$; otherwise $\eta$ would be negative at low energies, which is physically impossible. For bosons with $\mu$ extremely close to zero, there is a huge number of very low energy particles, leading quantum mechanically to a boson condensate; we shall study boson condensates in Sec. 3.5.

In the special case that the particles of interest can be created and destroyed completely freely, with creation and destruction constrained only by the laws of 4-momentum conservation, the particles quickly achieve a thermal equilibrium in which the relativistic chemical potential vanishes, $\tilde\mu = 0$ (as we shall see in Sec. 4.4). For example, inside a box whose walls are perfectly emitting and absorbing and have temperature $T$, the photons acquire the mean occupation number (2.22b) with zero chemical potential, leading to the standard black-body (Planck) form

$$\eta = \frac{1}{e^{h\nu/k_B T} - 1}\,, \qquad \mathcal{N} = \frac{2}{h^3}\,\frac{1}{e^{h\nu/k_B T} - 1}\,, \qquad I_\nu = \frac{(2h/c^2)\,\nu^3}{e^{h\nu/k_B T} - 1}\,. \qquad (2.23)$$
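The three forms in Eq. (2.23) should be mutually consistent through Eq. (2.17), with $g_s = 2$ for photons, and through Eq. (2.12). Below is a minimal Python check of that chain, assuming cgs constants and an illustrative temperature. The helper names are hypothetical.

```python
import math

h = 6.626e-27    # Planck's constant, erg s
c = 2.998e10     # speed of light, cm/s
k_B = 1.381e-16  # Boltzmann's constant, erg/K
g_s = 2          # photon multiplicity from Eq. (2.15)

def eta_planck(nu, T):
    """First form of Eq. (2.23): Bose-Einstein occupation with zero chemical potential."""
    return 1.0 / math.expm1(h * nu / (k_B * T))

def N_from_eta(eta):
    """Eq. (2.17): N = (g_s / h^3) * eta."""
    return g_s * eta / h**3

def I_nu_from_N(N, nu):
    """Eq. (2.12) solved for the specific intensity: I_nu = h^4 nu^3 N / c^2."""
    return h**4 * nu**3 * N / c**2

T = 5800.0   # illustrative temperature, roughly that of the solar photosphere
ratios = []
for nu in (1e14, 3.4e14, 1e15):
    eta = eta_planck(nu, T)
    I_chain = I_nu_from_N(N_from_eta(eta), nu)
    I_direct = (2 * h / c**2) * nu**3 * eta    # third form of Eq. (2.23)
    ratios.append(I_chain / I_direct)
    print(f"nu={nu:.2e} Hz  eta={eta:.3e}  ratio={ratios[-1]:.15f}")
```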

(Here we have set $\mathcal{E} = h\nu$, where $\nu$ is the photon frequency as measured in the box’s rest frame, and in the third expression we have inserted the factor $c^{-2}$ so that $I_\nu$ will be in ordinary units.) On the other hand, suppose one places a fixed number of photons inside a box whose walls cannot emit or absorb them but can scatter them, exchanging energy with them in the process. Then the photons will acquire the Bose-Einstein distribution (2.22b) with temperature $T$ equal to that of the walls and with nonzero chemical potential $\mu$ fixed by the number of photons present; the more photons there are, the larger will be the chemical potential.

****************************
EXERCISES

Exercise 2.4 **Example: [N] Maxwell Velocity Distribution
Consider a collection of thermalized, classical particles with nonzero rest mass, so they have the Boltzmann distribution (2.22d). Assume that the temperature is low enough, $k_B T \ll mc^2$, that they are nonrelativistic.

(a) Explain why the total number density of particles $n$ in physical space (as measured in the particles’ mean rest frame) is given by the integral $n = \int \mathcal{N}\,dV_p$. Show that $n \propto e^{\mu/k_B T}$, and derive the proportionality constant. [Hint: use spherical coordinates in momentum space so $dV_p = 4\pi p^2\,dp$ with $p \equiv |\mathbf{p}|$.] Your answer should be Eq. (2.37a) below.

Lynden-Bell (1967) identifies a fourth type of thermal distribution which occurs in the theory of violent relaxation of star clusters. It corresponds to individually distinguishable, classical particles (in his case stars with a range of masses) that obey the same kind of exclusion principle as fermions.

[Figure 2.6: (a) $v_o P(v)$ versus $v/v_o$; (b) $P(v)/P_{\max}$ versus $v$ for $v_o$ = 0.1, 0.5, 1, 2.]

Fig. 2.6: (a) Maxwell velocity distribution for thermalized, classical, nonrelativistic particles. (b) Extension of the Maxwell velocity distribution into the relativistic domain. In both plots $v_o = \sqrt{2k_BT/m}$.

(b) Explain why the mean energy per particle is given by $\bar E = n^{-1}\int (p^2/2m)\,\mathcal N\,dV_p$. Show that $\bar E = \frac32 k_BT$.

(c) Show that $P(v)\,dv \equiv$ (probability that a randomly chosen particle will have speed $v \equiv |\mathbf v|$ in the range $dv$) is given by

$$P(v) = \frac{4}{\sqrt\pi}\,\frac{v^2}{v_o^3}\,e^{-v^2/v_o^2}\,, \qquad \text{where } v_o = \sqrt{\frac{2k_BT}{m}}\,. \tag{2.24}$$

This is called the Maxwell velocity distribution; it is graphed in Fig. 2.6(a). Notice that the peak of the distribution is at speed $v_o$.

Exercise 2.5 Problem: [R] Velocity Distribution for Thermalized, Classical, Relativistic Particles
Show that for thermalized, classical, relativistic particles the probability distribution for the speed [relativistic version of the Maxwell distribution (2.24)] is

$$P(v) = \text{constant}\times\frac{v^2}{(1-v^2)^{5/2}}\,\exp\!\left(-\frac{2/v_o^2}{\sqrt{1-v^2}}\right), \qquad \text{where } v_o = \sqrt{\frac{2k_BT}{m}}\,. \tag{2.25}$$

This is plotted in Fig. 2.6(b) for a sequence of temperatures ranging from the nonrelativistic regime $k_BT \ll m$ toward the ultrarelativistic regime $k_BT \gg m$. In the ultrarelativistic regime the particles are (almost) all moving at very close to the speed of light, $v = 1$.

Exercise 2.6 Example: [R] Observations of Cosmic Microwave Radiation from Earth
The universe is filled with cosmic microwave radiation left over from the big bang. At each event in spacetime the microwave radiation has a mean rest frame; and as seen in that mean rest frame, the radiation's distribution function $\eta$ is almost precisely isotropic and thermal with zero chemical potential:

$$\eta = \frac{1}{e^{h\nu/k_BT_o}-1}\,, \qquad \text{with } T_o = 2.73\ \text{K}\,. \tag{2.26}$$

Here $\nu$ is the frequency of a photon as measured in the mean rest frame.

(a) Show that the specific intensity of the radiation as measured in its mean rest frame has the Planck spectrum

$$I_\nu = \frac{(2h/c^2)\,\nu^3}{e^{h\nu/k_BT_o}-1}\,. \tag{2.27}$$

Plot this specific intensity as a function of wavelength, and from your plot determine the wavelength of the intensity peak.

(b) Show that $\eta$ can be rewritten in the frame-independent form

$$\eta = \frac{1}{e^{-\vec p\cdot\vec u_o/k_BT_o}-1}\,, \tag{2.28}$$

where $\vec p$ is the photon 4-momentum and $\vec u_o$ is the 4-velocity of the mean rest frame. [Hint: See Sec. 1.6 and especially Eq. (1.38).]

(c) In actuality the earth moves relative to the mean rest frame of the microwave background with a speed $v$ of about 600 km/sec toward the Hydra-Centaurus region of the sky. An observer on earth points his microwave receiver in a direction that makes an angle $\theta$ with the direction of that motion, as measured in the earth's frame. Show that the specific intensity of the radiation received is precisely Planckian in form [Eq. (2.23)], but with a direction-dependent, Doppler-shifted temperature

$$T = T_o\,\frac{\sqrt{1-v^2}}{1-v\cos\theta}\,. \tag{2.29}$$

Note that this Doppler shift of $T$ is precisely the same as the Doppler shift of the frequency of any specific photon. Note also that the $\theta$ dependence corresponds to an anisotropy of the microwave radiation as seen from earth. Show that because the earth's velocity is small compared to the speed of light, the anisotropy is very nearly dipolar in form. What is the magnitude $\Delta T/T$ of the variations between maximum and minimum microwave temperature on the sky? It was by measuring these variations that astronomers³ discovered the motion of the earth relative to the mean rest frame of the cosmic microwave radiation.

****************************
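As a numerical sanity check on the Maxwell distribution (2.24) and its relativistic extension (2.25), the short Python sketch below integrates $P(v)$ with a trapezoid rule and locates the peaks on a grid. It works in units where $v_o = 1$ for the Maxwell case and $c = 1$ for the relativistic case; the helper names, the cutoff at $10\,v_o$, and the grid spacings are illustrative assumptions, not part of the text.

```python
import math

def maxwell(v, vo):
    """Maxwell speed distribution, Eq. (2.24)."""
    return 4.0 / math.sqrt(math.pi) * v**2 / vo**3 * math.exp(-(v / vo) ** 2)

def juttner_speed(v, vo):
    """Unnormalized relativistic speed distribution, Eq. (2.25), with c = 1 and 0 <= v < 1."""
    return v**2 / (1.0 - v**2) ** 2.5 * math.exp(-(2.0 / vo**2) / math.sqrt(1.0 - v**2))

def trapezoid(f, a, b, n=20000):
    """Composite trapezoid rule for the integral of f over [a, b] with n intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

vo = 1.0  # speeds in units of vo = sqrt(2 kB T / m), so kB T / m = vo**2 / 2
norm = trapezoid(lambda v: maxwell(v, vo), 0.0, 10.0 * vo)
# Mean kinetic energy per unit mass, <v^2>/2; Eq. (2.37c) predicts (3/2) kB T / m = 0.75 vo^2
half_mean_v2 = trapezoid(lambda v: 0.5 * v**2 * maxwell(v, vo), 0.0, 10.0 * vo)

grid = [i * 1e-4 for i in range(1, 30001)]        # speeds 0.0001 ... 3.0 (units of vo)
v_peak = max(grid, key=lambda v: maxwell(v, vo))   # should sit at v = vo

# Cold gas (vo = 0.1, i.e. kB T = 0.005 m): the relativistic peak should nearly coincide with vo
rel_grid = [i * 1e-5 for i in range(1, 30001)]     # speeds 0.00001 ... 0.3 (units of c)
v_peak_rel = max(rel_grid, key=lambda v: juttner_speed(v, 0.1))

print(f"normalization     = {norm:.8f}")
print(f"<v^2>/2 in vo^2   = {half_mean_v2:.8f}")
print(f"Maxwell peak      = {v_peak:.4f} vo")
print(f"relativistic peak = {v_peak_rel:.5f} c  (vo = 0.1)")
```

Raising `vo` in `juttner_speed` toward 1 or 2 pushes the peak toward $v = 1$, which is the ultrarelativistic behavior described in Ex. 2.5.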

2.4

Macroscopic Properties of Matter as Integrals over Momentum Space

2.4.1

[N] Newtonian Particle Density n, Flux S, and Stress Tensor T

If one knows the Newtonian distribution function $\mathcal N = (g_s/h^3)\eta$ as a function of momentum $\mathbf p$ at some location $(\mathbf x, t)$ in space and time, one can use it to compute various macroscopic properties of the particles. Specifically:

³ Corey and Wilkinson (1976), and Smoot, Gorenstein, and Muller (1977).

From the definition $\mathcal N \equiv dN/dV_x\,dV_p$ of the distribution function, it is clear that the number density of particles $n(\mathbf x, t)$ in physical space is given by the integral

$$n = \frac{dN}{dV_x} = \int \frac{dN}{dV_x\,dV_p}\,dV_p = \int \mathcal N\,dV_p\,. \tag{2.30a}$$

Similarly, the number of particles crossing a unit surface in the y-z plane per unit time, i.e. the x component of the flux of particles, is

$$S_x = \frac{dN}{dy\,dz\,dt} = \int \frac{dN}{dx\,dy\,dz\,dV_p}\,\frac{dx}{dt}\,dV_p = \int \mathcal N\,\frac{p_x}{m}\,dV_p\,,$$

where $dx/dt = p_x/m$ is the x component of the particle velocity. This and the analogous equations for $S_y$ and $S_z$ can be combined into the single geometric, coordinate-independent integral

$$\mathbf S = \int \mathcal N\,\mathbf p\,\frac{dV_p}{m}\,. \tag{2.30b}$$

Finally, since the stress tensor $\mathbf T$ is the flux of momentum [Eq. (1.89)], its j-x component (j component of momentum crossing a unit area in the y-z plane per unit time) must be

$$T_{jx} = \int p_j\,\frac{dN}{dy\,dz\,dt\,dV_p}\,dV_p = \int p_j\,\frac{dN}{dx\,dy\,dz\,dV_p}\,\frac{dx}{dt}\,dV_p = \int \mathcal N\,p_j\,\frac{p_x}{m}\,dV_p\,.$$

This and the corresponding equations for $T_{jy}$ and $T_{jz}$ can be collected together into the single geometric, coordinate-independent integral

$$T_{jk} = \int \mathcal N\,p_j\,p_k\,\frac{dV_p}{m}\,, \qquad \text{i.e.,}\quad \mathbf T = \int \mathcal N\,\mathbf p\otimes\mathbf p\,\frac{dV_p}{m}\,. \tag{2.30c}$$

Notice that the number density $n$ is the zeroth moment of the distribution function in momentum space [Eq. (2.30a)]; aside from factors of $1/m$, the particle flux is the first moment [Eq. (2.30b)] and the stress is the second moment [Eq. (2.30c)]. All three moments are geometric, coordinate-independent quantities, and they are the simplest such quantities that one can construct by integrating the distribution function over momentum space.
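To make the moment structure of Eqs. (2.30) concrete, here is a minimal Python sketch that evaluates the zeroth, first, and second momentum-space moments by a plain Riemann sum on a uniform momentum grid. The distribution function is an assumed example, a Maxwellian drifting with mean velocity $\mathbf u$, in units $m = k_BT = 1$; for it one expects $n = n_0$, $\mathbf S = n_0\mathbf u$, and $T_{jk} = n_0(m u_j u_k + k_BT\,\delta_{jk})$, i.e. a bulk-flow part plus an isotropic pressure $P = nk_BT$. The drift velocity, density, and grid size are hypothetical choices.

```python
import math

m, kT = 1.0, 1.0        # units with m = kB*T = 1 (assumed for illustration)
n0 = 2.0                # target number density
u = (0.3, -0.2, 0.5)    # hypothetical drift velocity

def N(p):
    """Drifting Maxwellian, normalized so that its zeroth moment is n0."""
    dp2 = sum((p[i] - m * u[i]) ** 2 for i in range(3))
    return n0 / (2.0 * math.pi * m * kT) ** 1.5 * math.exp(-dp2 / (2.0 * m * kT))

# Uniform grid spanning +-8 thermal widths about the drift momentum on each axis
sigma = math.sqrt(m * kT)
K = 32
h = 16.0 * sigma / K
axis = [[m * u[i] - 8.0 * sigma + j * h for j in range(K + 1)] for i in range(3)]
dVp = h**3

n = 0.0
S = [0.0] * 3
T = [[0.0] * 3 for _ in range(3)]
for px in axis[0]:
    for py in axis[1]:
        for pz in axis[2]:
            p = (px, py, pz)
            w = N(p) * dVp
            n += w                                   # Eq. (2.30a): zeroth moment
            for j in range(3):
                S[j] += w * p[j] / m                 # Eq. (2.30b): first moment over m
                for k in range(3):
                    T[j][k] += w * p[j] * p[k] / m   # Eq. (2.30c): second moment over m

print("n =", round(n, 10))
print("S =", [round(s, 10) for s in S])
print("T =", [[round(t, 10) for t in row] for row in T])
```

Because the integrand is smooth and decays like a Gaussian, the equal-weight sum on a uniform grid is already extremely accurate here; a general distribution function would call for a more careful quadrature.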

2.4.2

[R] Relativistic Number-Flux 4-Vector S and Stress-Energy Tensor T

When we switch from Newtonian theory to special relativity's 4-dimensional spacetime viewpoint, we require that all physical quantities be described by components of geometric, frame-independent objects (scalars, vectors, tensors) in 4-dimensional spacetime. We can construct such objects as momentum-space integrals over the frame-independent, relativistic distribution function $\mathcal N(\mathcal P, \vec p) = (g_s/h^3)\eta$. The frame-independent quantities that can appear in these integrals are (i) $\mathcal N$ itself, (ii) the 4-momentum $\vec p$, and (iii) the frame-independent integration element $dV_p/E$ [Eq. (2.8b)], which takes the form $dp_x\,dp_y\,dp_z/\sqrt{m^2+p^2}$ in any inertial reference frame. By analogy with the Newtonian regime, the most interesting such integrals are the lowest three moments of the distribution function:

$$R \equiv \int \mathcal N\,\frac{dV_p}{E}\,; \tag{2.31a}$$

$$\vec S \equiv \int \mathcal N\,\vec p\,\frac{dV_p}{E}\,, \quad\text{i.e.}\quad S^\mu \equiv \int \mathcal N\,p^\mu\,\frac{dV_p}{E}\,; \tag{2.31b}$$

$$\mathbf T \equiv \int \mathcal N\,\vec p\otimes\vec p\,\frac{dV_p}{E}\,, \quad\text{i.e.}\quad T^{\mu\nu} \equiv \int \mathcal N\,p^\mu p^\nu\,\frac{dV_p}{E}\,. \tag{2.31c}$$

Here and throughout this chapter, relativistic momentum-space integrals, unless otherwise specified, are taken over the entire mass hyperboloid. We can learn the physical meaning of each of the momentum-space integrals (2.31) by introducing a specific but arbitrary inertial reference frame and using it to perform a 3+1 split of spacetime into space plus time [cf. the paragraph containing Eq. (1.37)]. When we do this and rewrite $\mathcal N$ as $dN/dV_x\,dV_p$, the scalar field $R$ of Eq. (2.31a) takes the form

$$R = \int \frac{dN}{dV_x\,dV_p}\,\frac{1}{E}\,dV_p \tag{2.32}$$

(where of course $dV_x = dx\,dy\,dz$ and $dV_p = dp_x\,dp_y\,dp_z$). This is the sum, over all particles in a unit 3-volume, of the inverse energy. Although it is intriguing that this quantity is a frame-independent scalar, it does not appear in any important way in the laws of physics.

By contrast, the 4-vector field $\vec S$ of Eq. (2.31b) plays a very important role in physics. Its time component in our chosen frame is

$$S^0 = \int \frac{dN}{dV_x\,dV_p}\,\frac{p^0}{E}\,dV_p = \int \frac{dN}{dV_x\,dV_p}\,dV_p \tag{2.33a}$$

(since $p^0$ and $E$ are just different notations for the same thing, the relativistic energy $\sqrt{m^2+p^2}$ of a particle). Obviously, this $S^0$ is the number of particles per unit spatial volume as measured in our chosen inertial frame:

$$S^0 = n = (\text{number density of particles})\,. \tag{2.33b}$$

The x component of $\vec S$ is

$$S^x = \int \frac{dN}{dV_x\,dV_p}\,\frac{p^x}{E}\,dV_p = \int \frac{dN}{dx\,dy\,dz\,dV_p}\,\frac{dx}{dt}\,dV_p = \int \frac{dN}{dt\,dy\,dz\,dV_p}\,dV_p\,, \tag{2.33c}$$

which is the number of particles crossing a unit area in the y-z plane per unit time, i.e. the x-component of the particle flux; and similarly for other directions j,

$$S^j = (j\text{-component of the particle flux vector } \mathbf S)\,. \tag{2.33d}$$

[In Eq. (2.33c), the second equality follows from

$$\frac{p^j}{E} = \frac{p^j}{p^0} = \frac{dx^j/d\zeta}{dt/d\zeta} = \frac{dx^j}{dt} = (j\text{-component of velocity})\,, \tag{2.33e}$$

where $\zeta$ is the "affine parameter" such that $\vec p = d\vec x/d\zeta$.] Since $S^0$ is the particle number density and $S^j$ is the particle flux, $\vec S$ must be the number-flux 4-vector introduced and studied in Sec. 1.11.4. Notice that in the Newtonian limit, where $p^0 = E \to m$, the temporal and spatial parts of the formula $\vec S = \int \mathcal N\,\vec p\,(dV_p/E)$ reduce to $S^0 = \int \mathcal N\,dV_p$ and $\mathbf S = \int \mathcal N\,\mathbf p\,(dV_p/m)$, which are the coordinate-independent expressions (2.30a) and (2.30b) for the Newtonian number density of particles and flux of particles.

Turn to the quantity $T^{\mu\nu}$ defined by the integral (2.31c). When we perform a 3+1 split of it in our chosen inertial frame, we find the following for its various parts:

$$T^{\mu 0} = \int p^\mu p^0\,\frac{dN}{dV_x\,dV_p}\,\frac{dV_p}{p^0} = \int p^\mu\,\frac{dN}{dV_x\,dV_p}\,dV_p \tag{2.34a}$$

is the $\mu$-component of 4-momentum per unit volume (i.e., $T^{00}$ is the energy density and $T^{j0}$ is the momentum density). Also,

$$T^{\mu x} = \int p^\mu p^x\,\frac{dN}{dV_x\,dV_p}\,\frac{dV_p}{p^0} = \int p^\mu\,\frac{dN}{dx\,dy\,dz\,dV_p}\,\frac{dx}{dt}\,dV_p = \int p^\mu\,\frac{dN}{dt\,dy\,dz\,dV_p}\,dV_p \tag{2.34b}$$

is the amount of $\mu$-component of 4-momentum that crosses a unit area in the y-z plane per unit time; i.e., it is the x-component of the flux of $\mu$-component of 4-momentum. More specifically, $T^{0x}$ is the x-component of energy flux (which is the same as the momentum density $T^{x0}$) and $T^{jx}$ is the x component of spatial-momentum flux, or, equivalently, the jx component of the stress tensor. These and the analogous expressions and interpretations of $T^{\mu y}$ and $T^{\mu z}$ can be summarized by

$$T^{00} = (\text{energy density})\,, \quad T^{j0} = (\text{momentum density}) = T^{0j} = (\text{energy flux})\,, \quad T^{jk} = (\text{stress tensor})\,. \tag{2.34c}$$

Therefore [cf. Eq. (1.93f)], $\mathbf T$ must be the stress-energy tensor introduced and studied in Sec. 1.12. Notice that in the Newtonian limit, where $E \to m$, the coordinate-independent equation (2.31c) for the spatial part of the stress-energy tensor (the stress) becomes $\int \mathcal N\,\mathbf p\otimes\mathbf p\,dV_p/m$, which is the same as our coordinate-independent equation (2.30c) for the stress.
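A minimal way to see the 3+1 interpretation of Eqs. (2.31b,c) is to replace the distribution function by a sum of delta functions, one per particle in a unit volume; the integrals then collapse to sums, $S^\mu = \sum_a p_a^\mu/E_a$ and $T^{\mu\nu} = \sum_a p_a^\mu p_a^\nu/E_a$. The Python sketch below (units $c = 1$, rest mass $m = 1$; the sample of momenta is hypothetical) checks that $S^0$ counts particles, $T^{00}$ sums their energies, and that a sample with cubic symmetry has vanishing flux and momentum density and an isotropic stress $T^{jk} = P\delta^{jk}$.

```python
import math

def moments(momenta, m=1.0, volume=1.0):
    """Discrete analogue of Eqs. (2.31b,c) with c = 1.

    `momenta` lists the spatial momenta (px, py, pz) of the particles in `volume`;
    returns S^mu and T^{mu nu} per unit volume."""
    S = [0.0] * 4
    T = [[0.0] * 4 for _ in range(4)]
    for px, py, pz in momenta:
        E = math.sqrt(m * m + px * px + py * py + pz * pz)   # p^0 = E on the mass hyperboloid
        p4 = (E, px, py, pz)
        for mu in range(4):
            S[mu] += p4[mu] / E / volume
            for nu in range(4):
                T[mu][nu] += p4[mu] * p4[nu] / E / volume
    return S, T

# Hypothetical sample: momentum magnitudes 0.5 and 2.0 along each of the six axis directions
directions = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
sample = [(p * d[0], p * d[1], p * d[2]) for p in (0.5, 2.0) for d in directions]

S, T = moments(sample)
P = (T[1][1] + T[2][2] + T[3][3]) / 3.0   # pressure as 1/3 the trace of the stress

print("S^0 (number density)  =", S[0])
print("S^j (particle flux)   =", S[1:])
print("T^00 (energy density) =", T[0][0])
print("P                     =", P)
```

Cubic symmetry is enough to make any second-rank tensor built from the sample isotropic, which is why the diagonal stress components come out equal here.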

2.5

Isotropic Distribution Functions and Equations of State

2.5.1

[N] Newtonian Density, Pressure, Energy Density and Equation of State

Let us return, for a while, to Newtonian theory:

If the Newtonian distribution function is isotropic in momentum space, i.e. is a function only of the magnitude $p \equiv |\mathbf p| = \sqrt{p_x^2+p_y^2+p_z^2}$ of the momentum (as is the case, for example, when the particle distribution is thermalized), then the particle flux $\mathbf S$ vanishes (equal numbers of particles travel in all directions), and the stress tensor is isotropic, $\mathbf T = P\mathbf g$, i.e. $T_{jk} = P\delta_{jk}$; i.e. it is that of a perfect fluid. [Here $P$ is the isotropic pressure and $\mathbf g$ is the metric tensor of Euclidean 3-space, with Cartesian components equal to the Kronecker delta; Eq. (1.29f).] In this isotropic case, the pressure can be computed most easily as 1/3 the trace of the stress tensor (2.30c):

$$P = \frac13 T_{jj} = \frac13\int \mathcal N\,(p_x^2+p_y^2+p_z^2)\,\frac{dV_p}{m} = \frac13\int_0^\infty \mathcal N\,p^2\,\frac{4\pi p^2\,dp}{m} = \frac{4\pi}{3m}\int_0^\infty \mathcal N\,p^4\,dp\,. \tag{2.35a}$$

Here in the third step we have written the momentum-volume element in spherical polar coordinates as $dV_p = p^2\sin\theta\,d\theta\,d\phi\,dp$ and have integrated over angles to get $4\pi p^2\,dp$. Similarly, we can reexpress the number density of particles (2.30a) and the corresponding mass density as

$$n = 4\pi\int_0^\infty \mathcal N\,p^2\,dp\,, \qquad \rho \equiv mn = 4\pi m\int_0^\infty \mathcal N\,p^2\,dp\,. \tag{2.35b}$$

Finally, because each particle carries an energy $E = p^2/2m$, the energy density in this isotropic case is 3/2 the pressure:

$$\varepsilon = \int \frac{p^2}{2m}\,\mathcal N\,dV_p = \frac{4\pi}{2m}\int_0^\infty \mathcal N\,p^4\,dp = \frac32 P\,; \tag{2.35c}$$

cf. Eq. (2.35a). If we know the distribution function for an isotropic collection of particles, Eqs. (2.35) give us a straightforward way of computing the collection's number density of particles $n$, mass density $\rho = nm$, perfect-fluid energy density $\varepsilon$, and perfect-fluid pressure $P$ as measured in the particles' mean rest frame. For a thermalized gas, the distribution functions (2.22a), (2.22b), and (2.22d) [with $\mathcal N = (g_s/h^3)\eta$] depend on two parameters, the temperature $T$ and chemical potential $\mu$, so this calculation gives $n$, $\varepsilon$, and $P$ in terms of $\mu$ and $T$. One can then invert $n(\mu, T)$ to give $\mu(n, T)$ and insert the result into the expressions for $\varepsilon$ and $P$ to obtain equations of state for thermalized, nonrelativistic particles:

$$\varepsilon = \varepsilon(\rho, T)\,, \qquad P = P(\rho, T)\,. \tag{2.36}$$

For a gas of nonrelativistic, classical particles, the distribution function is Boltzmann [Eq. (2.22d)], $\mathcal N = (g_s/h^3)\,e^{(\mu-E)/k_BT}$ with $E = p^2/2m$, and this procedure gives, quite easily [Ex. 2.7]:

$$n = \frac{g_s}{\lambda_{TdB}^3}\,e^{\mu/k_BT} = \frac{g_s}{h^3}\,(2\pi m k_BT)^{3/2}\,e^{\mu/k_BT}\,, \tag{2.37a}$$

$$\varepsilon = \frac32\,n k_BT\,, \qquad P = n k_BT\,. \tag{2.37b}$$

Notice that the mean energy per particle is

$$\bar E = \frac32\,k_BT\,. \tag{2.37c}$$

In Eq. (2.37a), $\lambda_{TdB} \equiv h/\sqrt{2\pi m k_BT}$ is the particles' thermal de Broglie wavelength, i.e. the wavelength of Schroedinger wave-function oscillations for a particle with the thermal kinetic energy $E = \pi k_BT$. Note that the classical regime $\eta \ll 1$ (i.e. $\mu/k_BT \ll -1$), in which our computation is being performed, corresponds to a mean number of particles in a thermal de Broglie wavelength small compared to one, $n\lambda_{TdB}^3 \ll 1$, which should not be surprising: by Eq. (2.37a), $n\lambda_{TdB}^3 = g_s\,e^{\mu/k_BT}$.
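The momentum-space integrals (2.35) can be checked against the closed forms (2.37) by direct quadrature. The Python sketch below assumes dimensionless units $m = k_BT = h = 1$, a spin weight $g_s = 2$, and a chemical potential $\mu = -3k_BT$ well inside the classical regime; all of these values are illustrative assumptions.

```python
import math

m, kT, h, gs = 1.0, 1.0, 1.0, 2   # dimensionless units (assumed for illustration)
mu = -3.0                          # mu / kT << -1: classical regime

def N(p):
    """Boltzmann distribution function, Eq. (2.22d), with E = p^2 / 2m."""
    return gs / h**3 * math.exp(mu / kT) * math.exp(-p * p / (2.0 * m * kT))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of intervals."""
    step = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * step)
    return s * step / 3.0

pmax = 20.0 * math.sqrt(m * kT)   # integrands are negligible beyond this momentum
n = 4.0 * math.pi * simpson(lambda p: N(p) * p**2, 0.0, pmax)                  # Eq. (2.35b)
P = 4.0 * math.pi / (3.0 * m) * simpson(lambda p: N(p) * p**4, 0.0, pmax)      # Eq. (2.35a)
eps = 4.0 * math.pi / (2.0 * m) * simpson(lambda p: N(p) * p**4, 0.0, pmax)    # Eq. (2.35c)

lam = h / math.sqrt(2.0 * math.pi * m * kT)   # thermal de Broglie wavelength
n_closed = gs / lam**3 * math.exp(mu / kT)    # Eq. (2.37a)

print(f"n = {n:.10f} (quadrature) vs {n_closed:.10f} (Eq. 2.37a)")
print(f"P / (n kT)   = {P / (n * kT):.10f}")
print(f"eps / (n kT) = {eps / (n * kT):.10f}")
print(f"n lambda^3   = {n * lam**3:.6f}")
```

Changing `mu` rescales $n$, $P$, and $\varepsilon$ by the same factor $e^{\mu/k_BT}$, so the ratios $P/nk_BT = 1$ and $\varepsilon/nk_BT = 3/2$ do not depend on it.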

2.5.2

[N] Equations of State for a Nonrelativistic Hydrogen Gas

As an application, consider ordinary matter. Fig. 2.7 shows its physical nature as a function of density and temperature, near and above "room temperature", 300 K. We shall study solids (lower right) in Part III, fluids (lower middle) in Part IV, and plasmas (middle) in Part V. Our kinetic-theory tools are well suited to any situation where the particles have mean free paths large compared to their sizes; this is generally true in plasmas and sometimes in fluids (e.g. air and other gases, but not water), and even sometimes in solids (e.g. for electrons in a metal). Here we shall focus on a nonrelativistic plasma, i.e. the region of Fig. 2.7 that is bounded by the two dashed lines and the slanted solid line. For concreteness and simplicity, we shall regard the plasma as made solely of hydrogen. (This is a good approximation in most astrophysical situations; the modest amounts of helium and traces of other elements usually do not play a major role in equations of state. By contrast, for a laboratory plasma it can be a poor approximation; for quantitative analyses one must pay attention to the plasma's chemical composition.) Our nonrelativistic hydrogen plasma consists of a mixture of two gases (or "fluids"): free electrons and free protons, in equal numbers. Each fluid has a particle number density $n = \rho/m_p$, where $\rho$ is the total mass density and $m_p$ is the proton mass. (The electrons are so light that they do not contribute significantly to $\rho$.) Correspondingly, the energy density and pressure include equal contributions from the electrons and protons and are given by [cf. Eqs. (2.37b)]

$$\varepsilon = 3\,(k_B/m_p)\,\rho T\,, \qquad P = 2\,(k_B/m_p)\,\rho T\,. \tag{2.38}$$

In "zeroth approximation", the high-temperature boundary of validity for this equation of state is the temperature $T_{\rm rel} = m_ec^2/k_B \approx 6\times10^9$ K at which the electrons become strongly relativistic (top dashed line in Fig. 2.7). In Ex. 4.5 we shall compute the thermal production of electron-positron pairs in the hot plasma and thereby shall discover that the upper boundary is actually somewhat lower than this (Fig. 4.6). The bottom dashed line in Fig. 2.7 is the temperature $T_{\rm rec} \sim$ (ionization energy of hydrogen)/(a few $k_B$) $\sim 10^4$ K at which electrons and protons begin to recombine and form neutral hydrogen. In Ex. 4.6 we shall analyze the conditions for ionization-recombination equilibrium and thereby shall refine this boundary. The solid right boundary is the point at which the electrons cease to behave like classical particles, because their mean occupation number $\eta_e$ ceases to be small compared to unity. As one can see from the Fermi-Dirac distribution (2.22a), for typical

[Figure 2.7: $\log_{10} T$ (K) versus $\log_{10}\rho$ (g/cm³), with regions labeled Plasma, Fluid, Solid, Ionized/Neutral, Nonrelativistic/Relativistic, classical/quantum, and electron degenerate, and example locations Intergalactic Medium, Interstellar Medium, Van Allen Belts, Ionosphere, Tokamak, and Sun.]

Fig. 2.7: Physical nature of matter at various densities and temperatures. The plasma regime is discussed in great detail in Part V, and the equation of state there is Eq. (2.38). The region of electron degeneracy (to the right of the slanted solid line) is analyzed in Sec. 2.5.4, and for the nonrelativistic regime (between slanted solid line and vertical dotted line) in the second half of Sec. 2.5.2. The boundary between the plasma regime and the electron-degenerate regime (slanted solid line) is Eq. (2.39); that between nonrelativistic degeneracy and relativistic degeneracy (vertical dotted line) is Eq. (2.43). The relativistic/nonrelativistic boundary (upper dashed curve) is governed by electron-positron pair production (Ex. 4.5 and Fig. 4.6). The ionized-neutral boundary (lower dashed curve) is governed by the Saha equation (Ex. 4.6).

electrons (which have energies $E \sim k_BT$), the regime of classical behavior ($\eta_e \ll 1$; left side of solid line) is $\mu_e \ll -k_BT$ and the regime of strong quantum behavior ($\eta_e \simeq 1$; electron degeneracy; right side of solid line) is $\mu_e \gg +k_BT$. The slanted solid boundary in Fig. 2.7 is thus the location $\mu_e = 0$, which translates via Eq. (2.37a) to

$$\rho = \rho_{\rm deg} \equiv \frac{2m_p}{\lambda_{TdB}^3} = \frac{2m_p}{h^3}\,(2\pi m_e k_BT)^{3/2} \approx 0.01\left(\frac{T}{10^4\ {\rm K}}\right)^{3/2}\ {\rm g/cm^3}\,. \tag{2.39}$$
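The numerical coefficient in Eq. (2.39) is easy to reproduce. The Python sketch below evaluates $\rho_{\rm deg}(T)$ in cgs units, assuming rounded CODATA values for the constants; it gives about 0.008 g/cm³ at $10^4$ K, consistent with the quoted $0.01$, and scales as $T^{3/2}$.

```python
import math

# CGS constants (CODATA values, rounded)
h = 6.62607015e-27    # erg s
kB = 1.380649e-16     # erg / K
me = 9.1093837e-28    # g, electron mass
mp = 1.67262192e-24   # g, proton mass

def rho_deg(T):
    """Eq. (2.39): mass density (g/cm^3) at which mu_e = 0 in a hydrogen plasma at temperature T (K)."""
    lam = h / math.sqrt(2.0 * math.pi * me * kB * T)   # electron thermal de Broglie wavelength, cm
    return 2.0 * mp / lam**3                            # g_s = 2 electrons per lambda^3, one proton each

for T in (1e4, 1e6, 1e8):
    print(f"T = {T:.0e} K: rho_deg = {rho_deg(T):.3g} g/cm^3")
```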

Although the hydrogen gas is degenerate to the right of this boundary, we can still compute its equation of state using our kinetic-theory equations (2.46b) and (2.46c), so long as we use the quantum mechanically correct distribution function for the electrons: the Fermi-Dirac distribution (2.22a). In this electron-degenerate region, $\mu_e \gg k_BT$, the electron mean occupation number $\eta_e = 1/(e^{(E-\mu_e)/k_BT}+1)$ has the form shown in Fig. 2.8 and thus can be well approximated by $\eta_e = 1$ for $E = p^2/2m_e < \mu_e$ and $\eta_e = 0$ for $E > \mu_e$; or, equivalently, by

$$\eta_e = 1 \ \text{for}\ p < p_F \equiv \sqrt{2m_e\mu_e}\,, \qquad \eta_e = 0 \ \text{for}\ p > p_F\,. \tag{2.40}$$

Here $p_F$ is called the Fermi momentum. (The word "degenerate" refers to the fact that almost all the quantum states are fully occupied or are empty; i.e., $\eta_e$ is everywhere nearly one or zero.) By inserting this degenerate distribution function [or, more precisely, $\mathcal N_e = (2/h^3)\eta_e$]

into Eqs. (2.35) and integrating, we obtain $n_e \propto p_F^3$ and $P_e \propto p_F^5$. By then setting $n_e = n_p = \rho/m_p$, solving for $p_F \propto n_e^{1/3} \propto \rho^{1/3}$, inserting into the expression for $P_e$, and evaluating the constants, we obtain (Ex. 2.8) the following equation of state for the electron pressure:

$$P_e = \frac{1}{20}\left(\frac{3}{\pi}\right)^{2/3}\frac{m_ec^2}{\lambda_c^3}\left(\frac{\rho}{m_p/\lambda_c^3}\right)^{5/3}\,. \tag{2.41}$$

Here $\lambda_c = h/m_ec = 2.426\times10^{-10}$ cm is the electron Compton wavelength. The rapid growth $P_e \propto \rho^{5/3}$ of the electron pressure with increasing density is due to the degenerate electrons' being confined by the Pauli exclusion principle to regions of ever shrinking size, causing their zero-point motions and associated pressure to grow. By contrast, the protons, with their far larger rest masses, remain nondegenerate [until their density becomes $(m_p/m_e)^{3/2} \sim 10^5$ times higher than Eq. (2.39)], and so their pressure is negligible compared to that of the electrons: the total pressure is

$$P = P_e \quad \text{in the regime of nonrelativistic electron degeneracy.} \tag{2.42}$$
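The route from the step distribution (2.40) to Eq. (2.41) can be followed numerically. In the Python sketch below (cgs units, rounded CODATA constants; function names are illustrative), `pe_kinetic` fixes $p_F$ from $n_e = (8\pi/3)p_F^3/h^3$ with $n_e = \rho/m_p$ and evaluates the pressure integral (2.35a) with Simpson's rule, while `pe_closed` evaluates Eq. (2.41) directly; the two should agree.

```python
import math

# CGS constants (CODATA values, rounded)
h = 6.62607015e-27    # erg s
c = 2.99792458e10     # cm / s
me = 9.1093837e-28    # g, electron mass
mp = 1.67262192e-24   # g, proton mass

def pe_kinetic(rho, steps=1000):
    """Electron pressure from Eq. (2.35a) with the step occupation (2.40) and N_e = (2/h^3) eta_e."""
    ne = rho / mp                                          # one electron per proton
    pF = h * (3.0 * ne / (8.0 * math.pi)) ** (1.0 / 3.0)   # inverts n_e = (8 pi / 3) pF^3 / h^3
    # Simpson's rule for the integral of p^4 from 0 to pF, in the variable x = p / pF
    dx = 1.0 / steps
    s = 0.0 + 1.0                                          # endpoint values of x^4 at x = 0 and x = 1
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * (i * dx) ** 4
    integral = pF**5 * s * dx / 3.0
    return 4.0 * math.pi / (3.0 * me) * (2.0 / h**3) * integral

def pe_closed(rho):
    """Closed form, Eq. (2.41)."""
    lc = h / (me * c)                                      # electron Compton wavelength, cm
    return ((1.0 / 20.0) * (3.0 / math.pi) ** (2.0 / 3.0) * me * c**2 / lc**3
            * (rho / (mp / lc**3)) ** (5.0 / 3.0))

for rho in (1e3, 1e4, 1e5):
    print(f"rho = {rho:.0e} g/cm^3: P_e = {pe_kinetic(rho):.4e} (integral) vs {pe_closed(rho):.4e} (Eq. 2.41) dyn/cm^2")
```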

When the density in this degenerate regime is pushed farther upward to

$$\rho_{\rm rel\,deg} = \frac{8\pi m_p}{3\lambda_c^3} \simeq 1.0\times10^6\ {\rm g/cm^3} \tag{2.43}$$

(dotted vertical line in the figure), the electrons' zero-point motions become relativistically fast (the electron chemical potential $\mu_e$ becomes of order $m_ec^2$), so the nonrelativistic, Newtonian analysis fails and the matter enters a domain of "relativistic degeneracy". Both domains, nonrelativistic degeneracy ($\mu_e \ll m_ec^2$) and relativistic degeneracy ($\mu_e \gtrsim m_ec^2$), occur for matter inside a massive white-dwarf star, the type of star that the Sun will become when it dies. We shall study the structures of such stars in the Fluid-Mechanics part of this book (Chap. 12); and in Chap. 25 we shall see how general relativity (spacetime curvature) modifies that structure and helps force sufficiently massive white dwarfs to collapse.

[Figure 2.8: $\eta_e$ versus $E/\mu_e$, dropping from 1 to 0 over a band of width $4k_BT/\mu_e$ around $E/\mu_e = 1$.]

Fig. 2.8: The Fermi-Dirac distribution function for electrons in the nonrelativistic, degenerate regime $k_BT \ll \mu_e \ll m_e$, with temperature such that $k_BT/\mu_e = 0.03$. Note that $\eta_e$ drops from near one to near zero over the range $\mu_e - 2k_BT \lesssim E \lesssim \mu_e + 2k_BT$. See Ex. 2.10(b).
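To put numbers on the thermal tail in Fig. 2.8, the Python sketch below evaluates the Fermi-Dirac occupation number at $E = \mu_e - 2k_BT$, $\mu_e$, and $\mu_e + 2k_BT$ for the figure's value $k_BT/\mu_e = 0.03$, working in units of $\mu_e$ (the function name is an illustrative choice). It shows $\eta_e$ falling from about 0.88 through 0.5 to about 0.12 across that $4k_BT$ band.

```python
import math

def eta_fd(E_over_mu, kT_over_mu):
    """Fermi-Dirac mean occupation number 1 / (exp[(E - mu)/kT] + 1), with energies in units of mu_e."""
    return 1.0 / (math.exp((E_over_mu - 1.0) / kT_over_mu) + 1.0)

x = 0.03  # kB T / mu_e, the value used in Fig. 2.8
for E in (1.0 - 2 * x, 1.0, 1.0 + 2 * x):
    print(f"E/mu_e = {E:.2f}: eta_e = {eta_fd(E, x):.3f}")
```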

The (almost) degenerate Fermi-Dirac distribution function shown in Fig. 2.8 has a thermal tail whose width is $\sim 4k_BT$ (i.e. $4k_BT/\mu_e$ in units of $\mu_e$). As the temperature $T$ is increased, the number of electrons in this tail increases, thereby increasing the electrons' total energy $E_{\rm tot}$. This increase is responsible for the electrons' specific heat (Ex. 2.10), a quantity of importance both for the electrons in a metal (e.g. a copper wire) and for the electrons in a white-dwarf star. The electrons dominate the specific heat when the temperature is sufficiently low; but at higher temperatures it is dominated by the energies of sound waves (see Ex. 2.11, where we use the kinetic theory of phonons to compute the sound waves' specific heat).

2.5.3

[R] Relativistic Density, Pressure, Energy Density and Equation of State

We turn, now, to the relativistic domain of kinetic theory, initially for a single species of particle with rest mass $m$ and then (in the next subsection) for matter composed of electrons and protons. The relativistic mean rest frame of the particles, at some event $\mathcal P$ in spacetime, is that frame in which the particle flux $S^j$ vanishes. We shall denote by $\vec u_{\rm rf}$ the 4-velocity of this mean rest frame. As in Newtonian theory (above), we are especially interested in distribution functions $\mathcal N$ that are isotropic in the mean rest frame, i.e., distribution functions that depend on the magnitude $|\mathbf p| \equiv p$ of the spatial momentum of a particle, but not on its direction, or equivalently that depend solely on the particles' energy $E = -\vec u_{\rm rf}\cdot\vec p$ expressed in frame-independent form [Eq. (1.38)],

$$E = p^0 = \sqrt{m^2+p^2} \quad \text{in mean rest frame}\,. \tag{2.44}$$

Such isotropy is readily produced by particle collisions (discussed later in this chapter). Notice that isotropy in the mean rest frame, i.e., $\mathcal N = \mathcal N(\mathcal P, E)$, does not imply isotropy in any other inertial frame. As seen in some other ("primed") frame, $\vec u_{\rm rf}$ will have a time component $u^{0'}_{\rm rf} = \gamma$ and a space component $\mathbf u'_{\rm rf} = \gamma\mathbf V$ [where $\mathbf V$ is the mean rest frame's velocity relative to the primed frame and $\gamma = (1-\mathbf V^2)^{-1/2}$]; and correspondingly, in the primed frame $\mathcal N$ will be a function of

$$E = -\vec u_{\rm rf}\cdot\vec p = \gamma\left[(m^2+\mathbf p'^2)^{1/2} - \mathbf V\cdot\mathbf p'\right], \tag{2.45}$$

which is anisotropic: it depends on the direction of the spatial momentum $\mathbf p'$ relative to the velocity $\mathbf V$ of the particles' mean rest frame. An example is the cosmic microwave radiation as viewed from earth, Ex. 2.6 above.

As in Newtonian theory, isotropy greatly simplifies the momentum-space integrals (2.31) that we use to compute macroscopic properties of the particles: (i) The integrands of the expressions $S^j = \int \mathcal N p^j\,(dV_p/E)$ and $T^{j0} = T^{0j} = \int \mathcal N p^j p^0\,(dV_p/E)$ for the particle flux, energy flux, and momentum density are all odd in the momentum-space coordinate $p^j$ and therefore give vanishing integrals: $S^j = T^{j0} = T^{0j} = 0$. (ii) The integral $T^{jk} = \int \mathcal N p^j p^k\,dV_p/E$ produces an isotropic stress tensor, $T^{jk} = P g^{jk} = P\delta^{jk}$, whose pressure is most easily computed from its trace, $P = \frac13 T^{jj}$. Using this and the relations $|\mathbf p| \equiv p$ for the magnitude of the momentum, $dV_p = 4\pi p^2\,dp$ for the momentum-space volume element, and $E = p^0 = \sqrt{m^2+p^2}$ for the particle energy, we can easily evaluate Eqs. (2.31) for the particle number density $n = S^0$, the total density of mass-energy $T^{00}$ (which we shall denote $\rho$, the same notation as we use for mass density in Newtonian theory), and the pressure $P$. The results are:

$$n \equiv S^0 = \int \mathcal N\,dV_p = 4\pi\int_0^\infty \mathcal N\,p^2\,dp\,, \tag{2.46a}$$

$$\rho \equiv T^{00} = \int \mathcal N\,E\,dV_p = 4\pi\int_0^\infty \mathcal N\,E\,p^2\,dp\,, \tag{2.46b}$$

$$P = \frac13\int \mathcal N\,p^2\,\frac{dV_p}{E} = \frac{4\pi}{3}\int_0^\infty \mathcal N\,\frac{p^4\,dp}{\sqrt{m^2+p^2}}\,. \tag{2.46c}$$

2.5.4

[R] Equation of State for a Relativistic Degenerate Hydrogen Gas

Return to the hydrogen gas whose nonrelativistic equations of state were computed in Sec. 2.5.2. As we deduced there, at densities $\rho \gtrsim 10^5$ g/cm³ (near and to the right of the dotted line in Fig. 2.7) the electrons are squeezed into such tiny volumes that their zero-point energies are $\gtrsim m_ec^2$, forcing us to treat them relativistically. We can do so with the aid of the following approximation for the relativistic Fermi-Dirac mean occupation number $\eta_e = 1/[e^{(E-\tilde\mu_e)/k_BT}+1]$:

$$\eta_e \simeq 1 \ \text{for}\ E < \tilde\mu_e \equiv E_F\,; \ \text{i.e., for}\ p < p_F = \sqrt{E_F^2 - m_e^2}\,, \tag{2.47}$$

$$\eta_e \simeq 0 \ \text{for}\ E > E_F\,; \ \text{i.e., for}\ p > p_F\,. \tag{2.48}$$

Here $E_F$ is called the relativistic Fermi energy and $p_F$ the relativistic Fermi momentum. By inserting this $\eta_e$ along with $\mathcal N_e = (2/h^3)\eta_e$ into the integrals (2.46) for the electrons' number density $n_e$, total density of mass-energy $\rho_e$, and pressure $P_e$, and performing the integrals (Ex. 2.9), we obtain results that are expressed most simply in terms of a parameter $t$ (not to be confused with time) defined by

$$E_F \equiv \tilde\mu_e \equiv m_e\cosh(t/4)\,, \qquad p_F \equiv \sqrt{E_F^2 - m_e^2} \equiv m_e\sinh(t/4)\,. \tag{2.49a}$$

The results are:

$$n_e = \frac{8\pi}{3\lambda_c^3}\left(\frac{p_F}{m_e}\right)^3 = \frac{8\pi}{3\lambda_c^3}\sinh^3(t/4)\,, \tag{2.49b}$$

$$\rho_e = \frac{8\pi m_e}{\lambda_c^3}\int_0^{p_F/m_e} x^2\sqrt{1+x^2}\,dx = \frac{\pi m_e}{4\lambda_c^3}\left[\sinh(t) - t\right], \tag{2.49c}$$

$$P_e = \frac{8\pi m_e}{3\lambda_c^3}\int_0^{p_F/m_e} \frac{x^4}{\sqrt{1+x^2}}\,dx = \frac{\pi m_e}{12\lambda_c^3}\left[\sinh(t) - 8\sinh(t/2) + 3t\right]. \tag{2.49d}$$

These parametric relationships for $\rho_e$ and $P_e$ as functions of the electron number density $n_e$ are sometimes called the Anderson-Stoner equation of state, because they were first derived by Wilhelm Anderson and Edmund Stoner in 1930 [see the history on pp. 153–154 of Thorne (1994)]. They are valid throughout the full range of electron degeneracy, from nonrelativistic up to ultrarelativistic. In a white-dwarf star, the protons, with their high rest mass, are nondegenerate, the total density of mass-energy is dominated by the proton rest-mass density, and since there is one proton for each electron in the hydrogen gas, that total is

$$\rho \simeq m_p n_e = \frac{8\pi m_p}{3\lambda_c^3}\sinh^3(t/4)\,. \tag{2.50a}$$

By contrast (as in the nonrelativistic regime), the pressure is dominated by the electrons (because of their huge zero-point motions), not the protons; and so the total pressure is

$$P = P_e = \frac{\pi m_e}{12\lambda_c^3}\left[\sinh(t) - 8\sinh(t/2) + 3t\right]. \tag{2.50b}$$

In the low-density limit, where $t \ll 1$ so $p_F \ll m_e$ ($= m_ec$), we can solve the relativistic equation (2.49b) for $t$ as a function of $n_e = \rho/m_p$ and insert the result into the relativistic expression (2.50b); the result is the nonrelativistic equation of state (2.41). The dividing line $\rho = \rho_{\rm rel\,deg} = 8\pi m_p/3\lambda_c^3 \simeq 1.0\times10^6$ g/cm³ between nonrelativistic and relativistic degeneracy is the point where the electron Fermi momentum is equal to the electron rest mass, i.e. $\sinh(t/4) = 1$. The equation of state (2.49d) implies

$$P_e \propto \rho^{5/3} \quad \text{in the nonrelativistic regime, } \rho \ll \rho_{\rm rel\,deg}\,; \qquad P_e \propto \rho^{4/3} \quad \text{in the relativistic regime, } \rho \gg \rho_{\rm rel\,deg}\,. \tag{2.50c}$$

These asymptotic equations of state turn out to play a crucial role in the structure and stability of white-dwarf stars [Chaps. 12 and 25; Chap. 4 of Thorne (1994)].
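The Anderson-Stoner relations lend themselves to a quick numerical check. The Python sketch below works in units $m_e = c = 1$ and $\lambda_c = 1$ (an illustrative choice), integrates the momentum-space expressions in (2.49c,d) with Simpson's rule, and compares them with the closed forms $\rho_e = (\pi m_e/4\lambda_c^3)[\sinh t - t]$ and $P_e = (\pi m_e/12\lambda_c^3)[\sinh t - 8\sinh(t/2) + 3t]$, which are what those integrals evaluate to under the substitution $x = \sinh(t/4)$. It then estimates the logarithmic slope $d\ln P/d\ln\rho$ in both limits of (2.50c) and evaluates the dividing density (2.43) in cgs, using rounded values of $\lambda_c$ and $m_p$.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

# Units m_e = c = lambda_c = 1 (assumed): n_e in 1/lambda_c^3, rho_e and P_e in m_e/lambda_c^3
def n_e(t):
    return 8.0 * math.pi / 3.0 * math.sinh(t / 4.0) ** 3                          # Eq. (2.49b)

def rho_e_closed(t):
    return math.pi / 4.0 * (math.sinh(t) - t)                                     # Eq. (2.49c)

def P_e_closed(t):
    return math.pi / 12.0 * (math.sinh(t) - 8.0 * math.sinh(t / 2.0) + 3.0 * t)   # Eq. (2.49d)

def rho_e_integral(t):
    xF = math.sinh(t / 4.0)   # p_F / m_e
    return 8.0 * math.pi * simpson(lambda x: x * x * math.sqrt(1.0 + x * x), 0.0, xF)

def P_e_integral(t):
    xF = math.sinh(t / 4.0)
    return 8.0 * math.pi / 3.0 * simpson(lambda x: x**4 / math.sqrt(1.0 + x * x), 0.0, xF)

def log_slope(t, dt=1e-4):
    """d ln P / d ln rho by central differences in t, using rho proportional to n_e [Eq. (2.50a)]."""
    return ((math.log(P_e_closed(t + dt)) - math.log(P_e_closed(t - dt)))
            / (math.log(n_e(t + dt)) - math.log(n_e(t - dt))))

for t in (1.0, 4.0, 12.0):
    print(f"t = {t:4.1f}: rho_e {rho_e_integral(t):.6e} vs {rho_e_closed(t):.6e}; "
          f"P_e {P_e_integral(t):.6e} vs {P_e_closed(t):.6e}")
print("slope at t = 0.4:", log_slope(0.4))    # nonrelativistic limit, expect about 5/3
print("slope at t = 40 :", log_slope(40.0))   # ultrarelativistic limit, expect about 4/3

# Dividing density, Eq. (2.43): p_F = m_e c, i.e. sinh(t/4) = 1, in cgs
lambda_c = 2.42631e-10   # cm, electron Compton wavelength h / (m_e c)
m_p = 1.67262e-24        # g
rho_rel_deg = m_p * 8.0 * math.pi / (3.0 * lambda_c**3)
print(f"rho_rel_deg = {rho_rel_deg:.3e} g/cm^3")
```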

2.5.5

[R] Equation of State for Radiation

As was discussed at the end of Sec. 2.3, for a gas of thermalized photons in an environment where photons are readily created and absorbed, the distribution function has the blackbody (Planck) form $\eta = 1/(e^{E/k_BT}-1)$, which we can rewrite as $1/(e^{p/k_BT}-1)$, since the energy $E$ of a photon is the same as the magnitude $p$ of its momentum. In this case, the relativistic integrals (2.46) give (see Ex. 2.12)

$$n = bT^3\,, \qquad \rho = aT^4\,, \qquad P = \frac13\rho\,, \tag{2.51a}$$

where

$$b = 16\pi\zeta(3)\,\frac{k_B^3}{h^3c^3} = 20.28\ {\rm cm^{-3}\,K^{-3}}\,, \tag{2.51b}$$

$$a = \frac{8\pi^5}{15}\,\frac{k_B^4}{h^3c^3} = 7.56\times10^{-15}\ {\rm erg\,cm^{-3}\,K^{-4}} = 7.56\times10^{-16}\ {\rm J\,m^{-3}\,K^{-4}} \tag{2.51c}$$

are radiation constants. Here $\zeta(3) = \sum_{n=1}^\infty n^{-3} = 1.2020569\ldots$ is the Riemann zeta function evaluated at 3.
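The radiation constants (2.51b,c) follow directly from the fundamental constants, and the same numbers feed the early-universe estimate discussed next. The Python sketch below uses cgs constants (the SI-exact $h$, $c$, $k_B$ plus rounded values of $G$ and of one year, which are assumptions of the sketch), sums $\zeta(3)$ directly, and evaluates $\rho = 3/(32\pi G\tau^2)$ from Eq. (2.52a) together with a photon-only temperature from $\rho c^2 = aT^4$. Ignoring the neutrino contribution makes the temperature an order-of-magnitude estimate only.

```python
import math

# CGS constants: h, c, kB are SI-exact values converted to cgs; G and yr are rounded
h = 6.62607015e-27    # erg s
c = 2.99792458e10     # cm / s
kB = 1.380649e-16     # erg / K
G = 6.674e-8          # cm^3 g^-1 s^-2
yr = 3.156e7          # s

zeta3 = sum(1.0 / k**3 for k in range(1, 200001))    # Riemann zeta(3) by direct summation
b = 16.0 * math.pi * zeta3 * kB**3 / (h * c) ** 3     # Eq. (2.51b), cm^-3 K^-3
a = 8.0 * math.pi**5 * kB**4 / (15.0 * (h * c) ** 3)  # Eq. (2.51c), erg cm^-3 K^-4
a_SI = a * 0.1                                       # 1 erg/cm^3 = 0.1 J/m^3

def rho_of_age(tau):
    """Mass-energy density in g/cm^3 from Eq. (2.52a), rho = 3 / (32 pi G tau^2)."""
    return 3.0 / (32.0 * math.pi * G * tau**2)

def T_of_age(tau):
    """Temperature from rho c^2 = a T^4, i.e. photons only (an order-of-magnitude estimate)."""
    return (rho_of_age(tau) * c**2 / a) ** 0.25

print(f"zeta(3) = {zeta3:.7f}, b = {b:.2f} cm^-3 K^-3")
print(f"a = {a:.3e} erg cm^-3 K^-4 = {a_SI:.3e} J m^-3 K^-4")
for tau, label in ((yr, "1 yr"), (60.0, "1 min")):
    print(f"age {label}: rho = {rho_of_age(tau):.2e} g/cm^3, T = {T_of_age(tau):.2e} K")
```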

As we shall see in Sec. 27.4, when the Universe was younger than about 100,000 years, its energy density and pressure were predominantly due to thermalized photons (plus neutrinos, which contributed roughly the same as the photons), so its equation of state was given by Eq. (2.51a) with the coefficient changed by a factor of order unity. Einstein's general relativistic field equations (Part VI) required that

$$\frac{3}{32\pi G\tau^2} = \rho \simeq aT^4 \tag{2.52a}$$

[Eq. (27.45)], where $G$ is Newton's gravitation constant and $\tau$ was the age of the universe as measured in the mean rest frame of the photons. Putting in numbers, we find that

$$\rho = \frac{4.5\times10^{-10}\ {\rm g/cm^3}}{(\tau/1\,{\rm yr})^2}\,, \qquad T \simeq \frac{2.7\times10^6\ {\rm K}}{\sqrt{\tau/1\,{\rm yr}}}\,. \tag{2.52b}$$

Equation (2.52b) implies that when the universe was one minute old, its radiation density and temperature were about $10^2$ g/cm³ and $2\times10^9$ K. These conditions were well suited for burning hydrogen to helium; and, indeed, about 1/4 of all the mass of the universe did get burned to helium at this early epoch. We shall examine this in further detail in Sec. 27.4.

****************************

EXERCISES

Exercise 2.7 Derivation & Practice: [N] Equation of State for Nonrelativistic, Classical Gas
Consider a collection of identical, classical (i.e., with $\eta \ll 1$) particles with a distribution function $\mathcal N$ which is thermalized at a temperature $T$ such that $k_BT \ll mc^2$ (nonrelativistic temperature).

(a) Show that the distribution function, expressed in terms of the particles' momentum or velocity in their mean rest frame, is

$$\mathcal N = \frac{g_s}{h^3}\,e^{\mu/k_BT}\,e^{-p^2/2mk_BT}\,, \qquad \text{where } p = |\mathbf p| = mv\,, \tag{2.53}$$

with $v$ being the speed of a particle.

(b) Show that the number density of particles in the mean rest frame is given by Eq. (2.37a).

(c) Show that this gas satisfies the equations of state (2.37b).

Note: The following integrals, for nonnegative integral values of $q$, will be useful:

$$\int_0^\infty x^{2q}\,e^{-x^2}\,dx = \frac{(2q-1)!!}{2^{q+1}}\,\sqrt\pi\,, \tag{2.54}$$

where $n!! \equiv n(n-2)(n-4)\cdots(2\ \text{or}\ 1)$, with $(-1)!! \equiv 1$; and

$$\int_0^\infty x^{2q+1}\,e^{-x^2}\,dx = \frac12\,q!\,. \tag{2.55}$$
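The two Gaussian-moment formulas in the note can be spot-checked numerically. The Python sketch below compares Simpson-rule quadrature with (2.54) and (2.55) for $q = 0, \ldots, 4$; truncating the integrals at $x = 12$, where the integrands are negligible, is an assumption of the sketch, and `double_factorial` is a helper written here for illustration.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def double_factorial(k):
    """k!! = k (k-2) (k-4) ... (2 or 1), with the convention (-1)!! = 0!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

rows = []
for q in range(5):
    even_num = simpson(lambda x: x ** (2 * q) * math.exp(-x * x), 0.0, 12.0)
    even_formula = double_factorial(2 * q - 1) / 2 ** (q + 1) * math.sqrt(math.pi)   # Eq. (2.54)
    odd_num = simpson(lambda x: x ** (2 * q + 1) * math.exp(-x * x), 0.0, 12.0)
    odd_formula = 0.5 * math.factorial(q)                                            # Eq. (2.55)
    rows.append((q, even_num, even_formula, odd_num, odd_formula))
    print(f"q={q}: {even_num:.10f} vs {even_formula:.10f};  {odd_num:.10f} vs {odd_formula:.10f}")
```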

Exercise 2.8 Derivation and Practice: [N] Equation of State for Nonrelativistic, Electron-Degenerate Hydrogen
Derive Eq. (2.41) for the electron pressure in a nonrelativistic, electron-degenerate hydrogen gas.

Exercise 2.9 Derivation and Practice: [R] Equation of State for Relativistic, Electron-Degenerate Hydrogen
Derive the equation of state (2.49d) for an electron-degenerate hydrogen gas. (Note: It might be easiest to compute the integrals with the help of symbolic manipulation software such as Mathematica or Maple.)

Exercise 2.10 Example: [N] Specific Heat for Nonrelativistic, Degenerate Electrons in White Dwarfs and in Metals
Consider a nonrelativistically degenerate electron gas at finite but small temperature.

(a) Show that the inequalities $k_BT \ll \mu_e \ll m_e$ are equivalent to the words "nonrelativistically degenerate".

(b) Show that the electrons' mean occupation number $\eta_e(E)$ has the form depicted in Fig. 2.8: it is near unity out to (nonrelativistic) energy $E \simeq \mu_e - 2k_BT$, and it then drops to nearly zero over a range of energies $\Delta E \sim 4k_BT$.

(c) If the electrons were nonrelativistic but nondegenerate, their thermal energy density would be $\varepsilon = \frac32 nk_BT$, so the total electron energy (excluding rest mass) in a volume $V$ containing $N = nV$ electrons would be $E_{\rm tot} = \frac32 Nk_BT$, and the electron specific heat, at fixed volume, would be

$$C_V \equiv \left(\frac{\partial E_{\rm tot}}{\partial T}\right)_V = \frac32\,Nk_B \quad \text{(nondegenerate, nonrelativistic)}\,. \tag{2.56}$$

Using the semiquantitative form of $\eta_e$ depicted in Fig. 2.8, show that to within a factor of order unity the specific heat of degenerate electrons is smaller than in the nondegenerate case by a factor $\sim k_BT/\mu_e$:

$$C_V \equiv \left(\frac{\partial E_{\rm tot}}{\partial T}\right)_V \sim \frac{k_BT}{\mu_e}\,Nk_B \quad \text{(degenerate, nonrelativistic)}\,. \tag{2.57}$$

(d) Compute the multiplicative factor in this equation for $C_V$. More specifically, show that, to first order in $k_BT/\mu_e$,

$$C_V = \frac{\pi^2}{2}\,\frac{k_BT}{\mu_e}\,Nk_B\,. \tag{2.58}$$

(e) As an application, consider hydrogen inside a white dwarf star with density $\rho = 10^5$ g cm⁻³ and temperature $T = 10^6$ K.
(These are typical values for a white dwarf interior). What are the numerical values of µe /me and kB T /µe for the electrons? What is the numerical value of the dimensionless factor (π 2 /2)(kB T /µe ) by which degeneracy reduces the electron specific heat?

(f) As a second application, consider the electrons inside a copper wire in a laboratory on earth at room temperature. Each copper atom donates about one electron to a "gas" of freely traveling (conducting) electrons, and keeps the rest of its electrons bound to itself. What are the numerical values of $\mu_e/m_e$ and $k_B T/\mu_e$ for the conducting electron "gas"? Verify that these are in the range corresponding to nonrelativistic degeneracy. What is the value of the factor $(\pi^2/2)(k_B T/\mu_e)$ by which degeneracy reduces the electron specific heat? At room temperature, this electron contribution to the specific heat is far smaller than the contribution from thermal vibrations of the copper atoms (i.e., thermal sound waves, i.e., thermal phonons), but at very low temperatures the electron contribution dominates, as we shall see in the next exercise.

Exercise 2.11 Example: [N] Specific Heat for Phonons in an Isotropic Solid
In Sec. 11.2 we will study classical sound waves propagating through an isotropic, elastic solid. As we shall see, there are two types of sound waves: longitudinal, with frequency-independent speed $c_L$, and transverse, with a somewhat smaller frequency-independent speed $c_T$. For each type of wave, $s = L$ or $T$, the material of the solid undergoes an elastic displacement $\boldsymbol\xi = A\,\mathbf f_s \exp[i(\mathbf k\cdot\mathbf x - \omega t)]$, where $A$ is the wave amplitude, $\mathbf f_s$ is a unit vector (polarization vector) pointing in the direction of the displacement, $\mathbf k$ is the wave vector, and $\omega$ is the wave frequency. The wave speed is $c_s = \omega/|\mathbf k|$. Associated with these waves are quanta called phonons. As for any wave, each phonon has a momentum related to its wave vector by $\mathbf p = \hbar\mathbf k$, and an energy related to its frequency by $E = \hbar\omega$. Combining these relations, we learn that the relationship between a phonon's energy and the magnitude $p = |\mathbf p|$ of its momentum is $E = c_s p$. This is the same relationship as for photons, but with the speed of light replaced by the speed of sound!
For longitudinal waves, $\mathbf f_L$ is in the propagation direction $\mathbf k$, so there is just one polarization, $g_L = 1$; for transverse waves, $\mathbf f_T$ is orthogonal to $\mathbf k$, so there are two orthogonal polarizations (e.g., $\mathbf f_T = \mathbf e_x$ and $\mathbf f_T = \mathbf e_y$ when $\mathbf k$ points in the $\mathbf e_z$ direction); i.e., $g_T = 2$.

(a) Phonons of both types, longitudinal and transverse, are bosons. Why?

(b) Phonons are fairly easily created, absorbed, scattered, and thermalized. A general argument regarding chemical reactions (Sec. 4.4) can be applied to phonon creation and absorption to deduce that, once they reach complete thermal equilibrium with their environment, the phonons will have vanishing chemical potential, $\mu = 0$. What, then, will be their distribution functions $\eta$ and $\mathcal N$?

(c) Ignoring the fact that the sound waves' wavelengths $\lambda = 2\pi/|\mathbf k|$ cannot be smaller than about twice the spacing between the atoms of the solid, show that the total phonon energy (wave energy) in a volume $V$ of the solid is identical to that for black-body photons in a volume $V$, but with the speed of light $c$ replaced by the speed of sound $c_s$, and with the photons' number of spin states, 2, replaced by $g_s$ (2 for transverse waves, 1 for longitudinal): $E_{\rm tot} = a_s T^4 V$, with $a_s = g_s (4\pi^5/15)\,k_B^4/(h^3 c_s^3)$; cf. Eqs. (2.51).

(d) Show that the specific heat of the phonon gas (the sound waves) is $C_V = 4 a_s T^3 V$. This scales as $T^3$, whereas in a metal the specific heat of the degenerate electrons scales as $T$ [previous exercise], so at sufficiently low temperatures the electron specific heat will dominate over that of the phonons.

(e) Show that in the phonon gas, only phonon modes with wavelengths longer than $\sim\lambda_T = c_s h/k_B T$ are excited; i.e., for $\lambda \ll \lambda_T$ the mean occupation number is $\eta \ll 1$; for $\lambda \sim \lambda_T$, $\eta \sim 1$; and for $\lambda \gg \lambda_T$, $\eta \gg 1$. As $T$ is increased, $\lambda_T$ gets reduced. Ultimately it becomes of order the interatomic spacing, and our computation fails because most of the modes that our calculation assumes are thermalized actually don't exist. What is the critical temperature (Debye temperature) at which our computation fails and the $T^3$ law for $C_V$ changes? Show by a roughly one-line argument that above the Debye temperature $C_V$ is independent of temperature.

Exercise 2.12 Derivation & Practice: [N, R] Equation of State for a Photon Gas

(a) Consider a collection of photons with a distribution function $\mathcal N$ which, in the mean rest frame of the photons, is isotropic. Show, using Eqs. (2.46b) and (2.46c), that this photon gas obeys the equation of state $P = \frac13\rho$.

(b) Suppose the photons are thermalized with zero chemical potential, i.e., they are isotropic with a black-body spectrum. Show that $\rho = aT^4$, where $a$ is the radiation constant of Eq. (2.51c). Note: Do not hesitate to use Mathematica or Maple or other computer programs to evaluate integrals!

(c) Show that for this isotropic, black-body photon gas the number density of photons is $n = bT^3$, where $b$ is given by Eq. (2.51b), and that the mean energy of a photon in the gas is
$$\bar E_\gamma = \frac{\pi^4}{30\,\zeta(3)}\, k_B T = 2.7011780\ldots\, k_B T. \tag{2.59}$$
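The arithmetic in parts (e) and (f) of Exercise 2.10, and the constant in Eq. (2.59), can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical; the constants are rounded CGS values; $\mu_e$ is taken to exclude rest mass and is approximated by the zero-temperature Fermi energy $p_F^2/2m_e$ with $p_F = \hbar(3\pi^2 n_e)^{1/3}$; the white-dwarf electron density is taken as $n_e = \rho/m_p$ (pure hydrogen); and the copper density (8.96 g/cm³) and atomic mass (63.55) are standard handbook values not given in the text. The white-dwarf value of $\mu_e/m_e c^2$ comes out near 0.1, so that case is only marginally nonrelativistic.

```python
import math

# Rounded CGS constants (assumed values, not taken from the text)
HBAR = 1.0546e-27          # erg s
K_B = 1.3807e-16           # erg / K
M_E = 9.1094e-28           # g
M_P = 1.6726e-24           # g
C = 2.9979e10              # cm / s
N_A = 6.0221e23            # 1 / mol
ME_C2 = M_E * C**2         # electron rest-mass energy, erg


def fermi_energy(n_e):
    """Zero-temperature, nonrelativistic Fermi energy (rest mass excluded),
    used here as an approximation to mu_e; n_e in cm^-3, result in erg."""
    p_f = HBAR * (3.0 * math.pi**2 * n_e) ** (1.0 / 3.0)
    return p_f**2 / (2.0 * M_E)


def degeneracy_numbers(n_e, temperature):
    """Return (mu_e / m_e c^2, k_B T / mu_e, (pi^2/2) k_B T / mu_e)."""
    mu_e = fermi_energy(n_e)
    ratio = K_B * temperature / mu_e
    return mu_e / ME_C2, ratio, 0.5 * math.pi**2 * ratio


def zeta3(n_terms=1000):
    """Riemann zeta(3): partial sum plus an Euler-Maclaurin estimate of the tail."""
    partial = sum(1.0 / k**3 for k in range(1, n_terms + 1))
    return partial + 1.0 / (2 * n_terms**2) - 1.0 / (2 * n_terms**3)


# Exercise 2.10(e): white-dwarf hydrogen, rho = 1e5 g/cm^3, T = 1e6 K
white_dwarf = degeneracy_numbers(1e5 / M_P, 1e6)
# Exercise 2.10(f): copper, one conduction electron per atom, T = 300 K
copper = degeneracy_numbers(8.96 * N_A / 63.55, 300.0)
# Eq. (2.59): mean photon energy in units of k_B T
mean_photon_energy = math.pi**4 / (30.0 * zeta3())

print("white dwarf:", white_dwarf)
print("copper:     ", copper)
print("E_gamma / k_B T =", mean_photon_energy)
```

Swapping in other densities or temperatures only requires changing the arguments of `degeneracy_numbers`.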

2.6 [N & R] Evolution of the Distribution Function: Liouville’s Theorem, the Collisionless Boltzmann Equation, and the Boltzmann Transport Equation

We now turn to the issue of how the distribution function $\eta(\mathcal P, \vec p)$, or equivalently $\mathcal N = (g_s/h^3)\eta$, evolves from point to point in phase space. We shall explore the evolution under the simple assumption that between their very brief collisions, the particles all move freely, uninfluenced by any forces. It is straightforward to generalize to a situation where the particles interact with electromagnetic or gravitational or other fields as they move, and we shall do so in Box 2.2, Sec. ??, and Chap. 27. However, in the body of this chapter we shall restrict attention to the very common situation of free motion between collisions. Initially we shall even rule out collisions; only at the end of this section will we restore them, by inserting them as an additional term in our collision-free evolution equation for $\eta$.

[Figure 2.9: two panels, (a) and (b), each plotting the occupied region in the $x$-$p_x$ plane, with extents $\Delta x$ and $\Delta p_x$.]

Fig. 2.9: The phase space region ($x$-$p_x$ part) occupied by a collection $G$ of particles with finite rest mass, as seen in the inertial frame of the central, fiducial particle. (a) The initial region. (b) The region after a short time.

The foundation for our collision-free evolution law will be Liouville’s Theorem: Consider a collection $G$ of particles which are initially all near some location in phase space and initially occupy an infinitesimal (frame-independent) phase-space volume $d^2\mathcal V = d\mathcal V_x\, d\mathcal V_p$ there. Pick a particle at the center of the collection $G$ and call it the "fiducial particle". Since all the particles in $G$ have nearly the same initial position and velocity, they subsequently all move along nearly the same world line through spacetime; i.e., they all remain congregated around the fiducial particle. Liouville’s theorem says that the phase-space volume occupied by the particles $G$ is conserved along the world line (or Newtonian trajectory) of the fiducial particle:
$$\frac{d}{d\ell}\left(d\mathcal V_x\, d\mathcal V_p\right) = 0. \tag{2.60}$$

Here $\ell$ is an arbitrary parameter along the particle’s world line (trajectory); for example, in Newtonian theory it could be universal time $t$ or distance $l$ travelled, and in relativity it could be proper time $\tau$ as measured by the fiducial particle (if its rest mass is nonzero) or the affine parameter $\zeta$ that is related to the fiducial particle’s 4-momentum by $\vec p = d\vec x/d\zeta$.

We can prove Liouville’s theorem with the aid of the diagrams in Fig. 2.9. Assume, for simplicity, that the particles have nonzero rest mass. Consider the region in phase space occupied by the particles, as seen in the inertial reference frame (rest frame) of the fiducial particle, and choose for $\ell$ the time $t$ of that inertial frame (or in Newtonian theory the universal time $t$). Choose the particles’ region $d\mathcal V_x\, d\mathcal V_p$ at $t = 0$ to be a rectangular box centered on the fiducial particle, i.e., on the origin $x^j = 0$ of its inertial frame [Fig. 2.9(a)]. Examine the evolution with time $t$ of the 2-dimensional slice $y = p_y = z = p_z = 0$ through the occupied region; the evolution of other slices will be similar. As $t$ passes, the particle at location $(x, p_x)$ moves with velocity $dx/dt = p_x/m$ (the nonrelativistic approximation to the velocity is valid because all the particles are very nearly at rest in the fiducial particle’s inertial frame). Because the particles move freely, each has a conserved $p_x$, and their motion $dx/dt = p_x/m$ (larger speed higher in the diagram) shears the particles’ phase-space region as shown in Fig. 2.9(b): each horizontal strip of the box, at fixed $p_x$, slides rigidly sideways without changing its width. The area of the occupied region, $\Delta x\,\Delta p_x$, is therefore conserved. This same argument shows that the $x$-$p_x$ area is conserved at all values of $y, z, p_y, p_z$; and similarly for the areas in the $y$-$p_y$ planes and the areas in the $z$-$p_z$ planes. As a consequence, the total volume in phase space, $d\mathcal V_x\, d\mathcal V_p = \Delta x\,\Delta p_x\,\Delta y\,\Delta p_y\,\Delta z\,\Delta p_z$, is conserved.
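The shear argument is easy to check numerically. The following minimal Python sketch (hypothetical helper names; arbitrary illustrative values for the mass, box size, and elapsed time) moves the four corners of a rectangular $x$-$p_x$ box with the nonrelativistic free-particle map $x \to x + (p_x/m)t$, $p_x \to p_x$, and compares the shoelace areas before and after. Because the map is linear, the image of the box is exactly the parallelogram spanned by the moved corners, so comparing corner polygons is enough.

```python
def free_stream(points, mass, t):
    """Free motion in the fiducial particle's frame: x -> x + (p_x/m) t, p_x fixed."""
    return [(x + p / mass * t, p) for (x, p) in points]


def polygon_area(points):
    """Shoelace formula for the area of a simple polygon with vertices in order."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Rectangular box centered on the fiducial particle: Delta x = 2, Delta p_x = 0.5
box = [(-1.0, -0.25), (1.0, -0.25), (1.0, 0.25), (-1.0, 0.25)]
sheared = free_stream(box, mass=1.0, t=5.0)

print(polygon_area(box), polygon_area(sheared))  # prints 1.0 1.0
```

The box turns into a tilted parallelogram (panel (b) of Fig. 2.9), but its area $\Delta x\,\Delta p_x$ is unchanged.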

Although this proof of Liouville’s theorem relied on the assumption that the particles have nonzero rest mass, the theorem is also true for particles with zero rest mass, as one can deduce by taking the relativistic limit as the rest mass goes to zero and the particles’ 4-momenta become null. Since, in the absence of collisions or other nongravitational interactions, the number $dN$ of particles in the collection $G$ is conserved, Liouville’s theorem immediately implies also the conservation of the number density in phase space, $\mathcal N = dN/(d\mathcal V_x\, d\mathcal V_p)$:
$$\frac{d\mathcal N}{d\ell} = 0 \quad \text{along the trajectory of a fiducial particle.} \tag{2.61}$$

This conservation law is called the collisionless Boltzmann equation; in the context of plasma physics it is sometimes called the Vlasov equation. Note that it says not only that the distribution function $\mathcal N$ is frame-independent (in relativity theory), but also that $\mathcal N$ is constant along the phase-space trajectory of any freely moving particle. The collisionless Boltzmann equation is actually far more general than is suggested by the above derivation; see Box 2.2, which is best read after finishing this section.

The collisionless Boltzmann equation is most nicely expressed in the frame-independent form (2.61). For some purposes, however, it is helpful to express the equation in a form that relies on a specific but arbitrary choice of inertial reference frame. Then $\mathcal N$ can be regarded as a function of the seven phase-space coordinates, $\mathcal N = \mathcal N(t, x^j, p_k)$, and the collisionless Boltzmann equation (2.61) takes the coordinate-dependent form
$$\frac{d\mathcal N}{d\ell} = \frac{dt}{d\ell}\frac{\partial\mathcal N}{\partial t} + \frac{dx^j}{d\ell}\frac{\partial\mathcal N}{\partial x^j} + \frac{dp_j}{d\ell}\frac{\partial\mathcal N}{\partial p_j} = \frac{dt}{d\ell}\left(\frac{\partial\mathcal N}{\partial t} + v^j\frac{\partial\mathcal N}{\partial x^j}\right) = 0. \tag{2.62}$$
Here we have used the equation of straight-line motion $dp_j/dt = 0$ for the particles and have set $dx^j/dt$ equal to the particle velocity $v^j = v_j$.

Since our derivation of the collisionless Boltzmann equation relied on the assumption that no particles are created or destroyed as time passes, the collisionless Boltzmann equation should in turn guarantee conservation of the number of particles, $\vec\nabla\cdot\vec S = 0$ relativistically, or $\partial n/\partial t + \nabla\cdot\mathbf S = 0$ in Newtonian theory or relativity (Sec. 1.11.4). Indeed, this is so; see Ex. 2.13. Similarly (relativistically), since the collisionless Boltzmann equation is based on the law of 4-momentum conservation for all the individual particles, it is reasonable to expect that it will guarantee the conservation of their total 4-momentum, i.e., that $\vec\nabla\cdot\mathbf T = 0$ [Eq. (1.99a)]. Indeed, this conservation law does follow from the collisionless Boltzmann equation; see Ex. 2.13.
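Equation (2.62) says that, for Newtonian free motion, $\mathcal N$ is simply carried along straight lines in phase space: $\mathcal N(t, x, p_x) = \mathcal N(0, x - vt, p_x)$ with $v = p_x/m$. The sketch below is restricted to one spatial dimension and uses a hypothetical Gaussian-times-Maxwellian initial distribution (an illustrative assumption, with hypothetical function names). It evaluates this solution and checks by central differences that $\partial\mathcal N/\partial t + v\,\partial\mathcal N/\partial x$ vanishes to discretization accuracy.

```python
import math


def initial_distribution(x, p):
    """Hypothetical initial N(0, x, p): Gaussian in x times a Maxwellian in p."""
    return math.exp(-x**2) * math.exp(-p**2 / 2.0)


def free_streaming_solution(t, x, p, mass=1.0):
    """Force-free solution of the collisionless Boltzmann equation:
    N(t, x, p) = N(0, x - (p/m) t, p)."""
    return initial_distribution(x - p / mass * t, p)


def boltzmann_residual(t, x, p, mass=1.0, h=1e-5):
    """Central-difference estimate of dN/dt + v dN/dx, which should vanish."""
    f = free_streaming_solution
    dn_dt = (f(t + h, x, p, mass) - f(t - h, x, p, mass)) / (2.0 * h)
    dn_dx = (f(t, x + h, p, mass) - f(t, x - h, p, mass)) / (2.0 * h)
    return dn_dt + (p / mass) * dn_dx


for (t, x, p) in [(0.0, 0.3, 1.0), (2.0, -1.0, 0.5), (5.0, 4.0, 0.8)]:
    print(t, x, p, boltzmann_residual(t, x, p))  # tiny numbers, limited by rounding
```

Each particle's $p_x$ labels a separate "stream" that translates rigidly at its own speed, which is the same shearing seen in Fig. 2.9.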
Thus far we have assumed that the particles move freely through phase space with no collisions. If collisions occur, they will produce some nonconservation of $\mathcal N$ along the trajectory of a freely moving, noncolliding fiducial particle, and correspondingly, the collisionless Boltzmann equation will get modified to read
$$\frac{d\mathcal N}{d\ell} = \left(\frac{d\mathcal N}{d\ell}\right)_{\rm collisions}, \tag{2.63}$$

where the right-hand side represents the effects of collisions. This equation, with collision terms present, is called the Boltzmann transport equation. The actual form of the collision terms depends, of course, on the details of the collisions. We shall meet some specific examples in the next section [Eqs. (2.73), (2.81a), (2.82), and Ex. 2.20] and in our study of plasmas (Chaps. 21 and 22).

Whenever one applies the collisionless Boltzmann equation or Boltzmann transport equation to a given situation, it is helpful to simplify one’s thinking in two ways: (i) Adjust the normalization of the distribution function so it is naturally tuned to the situation. For example, when dealing with photons, $I_\nu/\nu^3$ is typically best, and if (as is usually the case) the photons do not change their frequencies as they move and only a single reference frame is of any importance, then $I_\nu$ alone may do; see Ex. 2.14. (ii) Adjust the differentiation parameter $\ell$ so it is also naturally tuned to the situation.

****************************
EXERCISES

Exercise 2.13 [N & R] Derivation and Problem: Collisionless Boltzmann Implies Conservation of Particles and of 4-Momentum
Consider a collection of freely moving, noncolliding particles, which satisfy the collisionless Boltzmann equation $d\mathcal N/d\ell = 0$.

(a) Show that this equation guarantees that the particle conservation law $\partial n/\partial t + \nabla\cdot\mathbf S = 0$ is satisfied, where $n$ and $\mathbf S$ are expressed in terms of the distribution function $\mathcal N$ by the Newtonian momentum-space integrals (2.30).

(b) Show that the relativistic Boltzmann equation guarantees the relativistic conservation laws $\vec\nabla\cdot\vec S = 0$ and $\vec\nabla\cdot\mathbf T = 0$, where the number-flux vector $\vec S$ and the stress-energy tensor $\mathbf T$ are expressed in terms of $\mathcal N$ by the momentum-space integrals (2.31).

Exercise 2.14 Example: [N] Solar Heating of the Earth: The Greenhouse Effect
In this example we shall study the heating of the Earth by the Sun. Along the way, we shall derive some important relations for black-body radiation.
Since we will study photon propagation from the Sun to the Earth with Doppler shifts playing a negligible role, there is a preferred inertial reference frame: that of the Sun and Earth with their relative motion neglected. We shall carry out our analysis in that frame. Since we are dealing with thermalized photons, the natural choice for the distribution function is $I_\nu/\nu^3$; and since we use just one unique reference frame, each photon has a fixed frequency $\nu$, so we can forget about the $\nu^3$ and use $I_\nu$.

(a) Assume, as is very nearly true, that each spot on the Sun emits black-body radiation in all outward directions with a temperature $T_\odot = 5800$ K. Show, by integrating over the black-body $I_\nu$, that the total energy flux (i.e., power per unit surface area) $F$ emitted by the Sun is
$$F \equiv \frac{dE}{dt\,dA} = \sigma T_\odot^4, \qquad \text{where} \quad \sigma = \frac{ac}{4} = \frac{2\pi^5 k_B^4}{15\, h^3 c^2} = 5.67\times10^{-5}\ \frac{\rm erg}{\rm cm^2\,s\,K^4}. \tag{2.64}$$

Box 2.2 Sophisticated Derivation of Relativistic Collisionless Boltzmann Equation

Denote by $\vec X \equiv \{\mathcal P, \vec p\}$ a point in 8-dimensional phase space. In an inertial frame the coordinates of $\vec X$ are $\{x^0, x^1, x^2, x^3, p_0, p_1, p_2, p_3\}$. [We use up indices ("contravariant" indices) on $x$ and down indices ("covariant" indices) on $p$ because this is the form required in Hamilton’s equations below; i.e., it is $p_\alpha$, not $p^\alpha$, that is canonically conjugate to $x^\alpha$.]

Regard $\mathcal N$ as a function of location $\vec X$ in 8-dimensional phase space. The fact that our particles all have the same rest mass, so $\mathcal N$ is nonzero only on the mass hyperboloid, means that as a function of $\vec X$, $\mathcal N$ entails a delta function. For the following derivation that delta function is irrelevant; the derivation is valid also for distributions of non-identical particles, as treated in Ex. 2.2.

A particle in our distribution at location $\vec X$ moves through phase space along a world line with tangent vector $d\vec X/d\zeta$, where $\zeta$ is its affine parameter. The product $\mathcal N\, d\vec X/d\zeta$ represents the number-flux 8-vector of particles through spacetime, as one can see by an argument analogous to Eq. (2.33c). We presume that, as the particles move through phase space, none are created or destroyed. The law of particle conservation in phase space, by analogy with $\vec\nabla\cdot\vec S = 0$ in spacetime, takes the form $\vec\nabla\cdot(\mathcal N\, d\vec X/d\zeta) = 0$. In terms of coordinates in an inertial frame, this conservation law says
$$\frac{\partial}{\partial x^\alpha}\left(\mathcal N\frac{dx^\alpha}{d\zeta}\right) + \frac{\partial}{\partial p_\alpha}\left(\mathcal N\frac{dp_\alpha}{d\zeta}\right) = 0. \tag{1}$$
The motions of individual particles in phase space are governed by Hamilton’s equations
$$\frac{dx^\alpha}{d\zeta} = \frac{\partial H}{\partial p_\alpha}, \qquad \frac{dp_\alpha}{d\zeta} = -\frac{\partial H}{\partial x^\alpha}. \tag{2}$$

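For the freely moving particles of this chapter, the relativistic Hamiltonian is [cf. Sec. 8.4 of Goldstein, Poole and Safko (2002) and p. 489 of Misner, Thorne and Wheeler (1973)]
$$H = \frac12\left(p_\alpha p_\beta\, g^{\alpha\beta} + m^2\right), \tag{3}$$
which vanishes on the mass hyperboloid $p_\alpha p_\beta\, g^{\alpha\beta} = -m^2$.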

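Our derivation of the collisionless Boltzmann equation does not depend on this specific form of the Hamiltonian; it is valid for any Hamiltonian and thus, e.g., for particles interacting with an electromagnetic field or even a relativistic gravitational field (spacetime curvature; Part VI). By inserting Hamilton’s equations (2) into the 8-dimensional law of particle conservation (1), we obtain
$$\frac{\partial}{\partial x^\alpha}\left(\mathcal N\frac{\partial H}{\partial p_\alpha}\right) - \frac{\partial}{\partial p_\alpha}\left(\mathcal N\frac{\partial H}{\partial x^\alpha}\right) = 0. \tag{4}$$
Using the rule for differentiating products, and noting that the terms involving two derivatives of $H$ cancel, we bring this into the form
$$0 = \frac{\partial\mathcal N}{\partial x^\alpha}\frac{\partial H}{\partial p_\alpha} - \frac{\partial\mathcal N}{\partial p_\alpha}\frac{\partial H}{\partial x^\alpha} = \frac{\partial\mathcal N}{\partial x^\alpha}\frac{dx^\alpha}{d\zeta} + \frac{\partial\mathcal N}{\partial p_\alpha}\frac{dp_\alpha}{d\zeta} = \frac{d\mathcal N}{d\zeta}, \tag{5}$$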

which is the collisionless Boltzmann equation. [To get the second expression we have used Hamilton’s equations, and the third follows directly from the formulas of differential calculus.] Thus, the collisionless Boltzmann equation is a consequence of just two assumptions, conservation of particles and Hamilton’s equations for the motion of each particle, which implies it has very great generality. We shall extend and explore this generality in the next chapter.

(b) Since the distribution function $I_\nu$ is conserved along each photon’s trajectory, observers on Earth, looking at the Sun, see identically the same black-body specific intensity $I_\nu$ as they would if they were on the Sun’s surface. (No wonder our eyes hurt if we look directly at the Sun!) By integrating over $I_\nu$ at the Earth [and not by the simpler method of using Eq. (2.64) for the flux leaving the Sun], show that the energy flux arriving at Earth is $F = \sigma T_\odot^4 (R_\odot/r)^2$, where $R_\odot = 696{,}000$ km is the Sun’s radius and $r = 1.496\times10^8$ km is the distance from the Sun to Earth.

(c) Our goal is to compute the temperature $T_\oplus$ of the Earth’s surface. As a first attempt, assume that all the Sun’s flux arriving at Earth is absorbed by the Earth’s surface, heating it to the temperature $T_\oplus$, and then is reradiated into space as black-body radiation at temperature $T_\oplus$. Show that this leads to a surface temperature of
$$T_\oplus = T_\odot\left(\frac{R_\odot}{2r}\right)^{1/2} = 280\ {\rm K} = 7\ {}^\circ{\rm C}. \tag{2.65}$$

This is a bit cooler than the correct mean surface temperature (287 K = 14 °C).

(d) Actually, the Earth has an "albedo" of $A = 0.30$, which means that 30 per cent of the sunlight that falls onto it gets reflected back into space with an essentially unchanged spectrum, rather than being absorbed. Show that with only a fraction $1 - A = 0.70$ of the solar radiation being absorbed, the above estimate of the Earth’s temperature becomes
$$T_\oplus = T_\odot\,(1-A)^{1/4}\left(\frac{R_\odot}{2r}\right)^{1/2} = 255\ {\rm K} = -18\ {}^\circ{\rm C}. \tag{2.66}$$
This is even farther from the correct answer.

(e) The missing piece of physics, which raises the temperature from −18 °C to something much nearer the correct 14 °C, is the Greenhouse Effect: The absorbed solar radiation has most of its energy at wavelengths $\sim 0.5\,\mu$m (in the visual band), which pass rather easily through the Earth’s atmosphere. By contrast, the black-body radiation that the Earth’s surface wants to radiate back into space, with its temperature $\sim 300$ K, is concentrated in the infrared range from $\sim 8\,\mu$m to $\sim 30\,\mu$m. Water molecules and carbon dioxide in the Earth’s atmosphere absorb about half of the energy that the Earth tries to reradiate at these wavelengths,⁴ causing the reradiated energy to be about half that of a black body at the Earth’s surface temperature. Show that with this "Greenhouse" correction, $T_\oplus$ becomes about 293 K = +20 °C. Of course, the worry is that human activity is increasing the amount of carbon dioxide in the atmosphere by enough to raise the Earth’s temperature significantly further and disrupt our comfortable lives.

Exercise 2.15 Challenge: [N] Olbers’ Paradox and Solar Furnace
Consider a universe in which spacetime is flat and is populated throughout by stars that cluster into galaxies like our own and our neighbors, with interstellar and intergalactic distances similar to those in our neighborhood. Assume that the galaxies are not moving apart, i.e., there is no universal expansion. Using the collisionless Boltzmann equation for photons, show that the Earth’s temperature in this universe would be about the same as the surface temperatures of the universe’s hotter stars, $\sim 10{,}000$ K, so we would all be fried. What features of our universe protect us from this fate? Motivated by this model universe, describe a design for a furnace that relies on sunlight for its heat and achieves a temperature nearly equal to that of the Sun’s surface, 5770 K.
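The numbers in Eqs. (2.64)-(2.66) of Exercise 2.14 follow from a few lines of arithmetic. This minimal Python sketch (hypothetical function names; rounded CGS values of $h$, $k_B$, and $c$ assumed) evaluates $\sigma = 2\pi^5k_B^4/(15h^3c^2)$ and the equilibrium temperature obtained by balancing the absorbed flux $(1-A)\,\sigma T_\odot^4(R_\odot/r)^2$ on the Earth's cross section $\pi R_\oplus^2$ against black-body emission $\sigma T_\oplus^4$ from its whole surface $4\pi R_\oplus^2$; the Earth's radius cancels. It deliberately stops short of the Greenhouse correction of part (e).

```python
import math

# Rounded CGS constants (assumed)
H = 6.6261e-27      # Planck constant, erg s
K_B = 1.3807e-16    # Boltzmann constant, erg / K
C = 2.9979e10       # speed of light, cm / s


def stefan_boltzmann_constant():
    """sigma = 2 pi^5 k_B^4 / (15 h^3 c^2), in erg cm^-2 s^-1 K^-4."""
    return 2.0 * math.pi**5 * K_B**4 / (15.0 * H**3 * C**2)


def earth_temperature(t_sun=5800.0, r_sun_km=696_000.0,
                      sun_earth_km=1.496e8, albedo=0.0):
    """Equilibrium surface temperature from the energy balance
    (1 - A) sigma T_sun^4 (R_sun/r)^2 * pi R^2 = sigma T^4 * 4 pi R^2."""
    return t_sun * (1.0 - albedo) ** 0.25 * math.sqrt(r_sun_km / (2.0 * sun_earth_km))


sigma = stefan_boltzmann_constant()
t_no_albedo = earth_temperature()                 # Eq. (2.65)
t_with_albedo = earth_temperature(albedo=0.30)    # Eq. (2.66)
print(f"sigma = {sigma:.3e}, T = {t_no_albedo:.0f} K, T(A=0.3) = {t_with_albedo:.0f} K")
```

Because only ratios of lengths enter, the Sun's radius and the Sun-Earth distance can be given in any common unit (kilometres here).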

2.7 [N] Transport Coefficients

In this section we turn to a practical application of kinetic theory: the computation of transport coefficients. Our primary objective is to illustrate the use of kinetic theory, but the transport coefficients themselves are also of interest: they will play important roles in Parts IV and V of this book (Fluid Mechanics and Plasma Physics).

What are transport coefficients? An example is the electrical conductivity $\kappa_e$. When an electric field $\mathbf E$ is imposed on a sample of matter, Ohm’s law tells us that the matter responds by developing a current density
$$\mathbf j = \kappa_e\,\mathbf E. \tag{2.67a}$$
The electrical conductivity is high if electrons can move through the material with ease; it is low if electrons have difficulty moving. The impediment to electron motion is scattering off other particles: off ions, other electrons, phonons (sound waves), plasmons (plasma waves), and so on. Ohm’s law is valid when (as almost always) the electrons scatter many times, so they diffuse (random-walk their way) through the material. In order to compute the electrical conductivity, one must analyze, statistically, the effects of the many scatterings on the electrons’ motions. The foundation for an accurate analysis is the Boltzmann transport equation.

Another example of a transport coefficient is the thermal conductivity $\kappa$, which appears in the law of heat conduction
$$\mathbf q = -\kappa\nabla T. \tag{2.67b}$$

⁴ See, e.g., the section and figures on “Absorption of Atmospheric Gases” in Allen (2000).

Here $\mathbf q$ is the diffusive energy flux from regions of high temperature $T$ to low. The impediment to heat flow is scattering of the conducting particles; and, correspondingly, the foundation for accurately computing $\kappa$ is the Boltzmann transport equation. Other examples of transport coefficients are (i) the coefficient of shear viscosity $\eta_{\rm shear}$, which determines the stress $T_{ij}$ (diffusive flux of momentum) that arises in a shearing fluid,
$$T_{ij} = -2\eta_{\rm shear}\,\sigma_{ij}, \tag{2.67c}$$
where $\sigma_{ij}$ is the fluid’s “rate of shear”, Ex. 2.18; and (ii) the diffusion coefficient $D$, which determines the diffusive flux of particles $\mathbf S$ from regions of high particle density $n$ to low,
$$\mathbf S = -D\nabla n. \tag{2.67d}$$

There is a diffusion equation associated with each of these transport coefficients. For example, the differential law of particle conservation $\partial n/\partial t + \nabla\cdot\mathbf S = 0$ [Eq. (1.73)], when applied to material in which the particles scatter many times so $\mathbf S = -D\nabla n$, and in which $D$ is spatially uniform, gives the following diffusion equation for the particle number density:
$$\frac{\partial n}{\partial t} = D\nabla^2 n. \tag{2.68}$$
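The square-root growth of the spreading distance can be seen by integrating Eq. (2.68) numerically. The sketch below is a minimal one-dimensional explicit (forward-time, centered-space) finite-difference scheme; the grid, time step, and function names are illustrative assumptions, not part of the text. Each time step convolves $n$ with the kernel $[r,\,1-2r,\,r]$, $r = D\,\Delta t/\Delta x^2$, which adds exactly $2D\,\Delta t$ to the variance, so a narrow initial spike spreads with variance $\langle x^2\rangle = 2Dt$ in one dimension, the same as for the exact Gaussian solution $n \propto \exp(-x^2/4Dt)$. The rms distance therefore grows as $\sqrt t$.

```python
def diffuse_1d(n, d_coeff, dx, dt, steps):
    """Explicit FTCS integration of dn/dt = D d^2 n/dx^2; end cells held fixed."""
    r = d_coeff * dt / dx**2
    if r > 0.5:
        raise ValueError("FTCS is unstable for D*dt/dx^2 > 1/2")
    n = list(n)
    for _ in range(steps):
        n = [n[0]] + [n[i] + r * (n[i + 1] - 2.0 * n[i] + n[i - 1])
                      for i in range(1, len(n) - 1)] + [n[-1]]
    return n


def variance(n, dx):
    """Second central moment of the profile n about its mean position."""
    xs = [i * dx for i in range(len(n))]
    total = sum(n)
    mean = sum(x * v for x, v in zip(xs, n)) / total
    return sum((x - mean) ** 2 * v for x, v in zip(xs, n)) / total


D, DX, DT, STEPS = 1.0, 1.0, 0.25, 400      # illustrative values, r = 0.25
spike = [0.0] * 401
spike[200] = 1.0                             # all particles start at the center
profile = diffuse_1d(spike, D, DX, DT, STEPS)
t = DT * STEPS
print(variance(profile, DX), 2 * D * t)      # the two numbers agree
```

The stability limit $r \le 1/2$ is why explicit schemes for diffusion need time steps scaling as $\Delta x^2$; implicit schemes avoid that restriction at the cost of solving a linear system each step.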

In Ex. 2.16, by exploring solutions to this equation, we shall see that the rms distance the particles travel is proportional to the square root of their travel time, $\bar l = \sqrt{4Dt}$, a behavior characteristic of diffusive random walks. Similarly, the law of energy conservation, when applied to diffusive heat flow $\mathbf q = -\kappa\nabla T$, leads to a diffusion equation for the thermal energy density $\epsilon$ and thence for temperature (Ex. 2.17); Maxwell’s equations in a magnetized fluid, when combined with Ohm’s law $\mathbf j = \kappa_e\mathbf E$, lead to a diffusion equation (18.6) for magnetic field lines; and the law of angular momentum conservation, when applied to a shearing fluid with $T_{ij} = -2\eta_{\rm shear}\sigma_{ij}$, leads to a diffusion equation (13.6) for vorticity.

These diffusion equations, and all other physical laws involving transport coefficients, are approximations to the real world, valid if and only if (i) many particles are involved in the transport of the quantity of interest (charge, heat, momentum, particles) and (ii) on average each particle undergoes many scatterings in moving over the length scale of the macroscopic inhomogeneities that drive the transport. This second requirement can be expressed quantitatively in terms of the mean free path $\lambda$ between scatterings (i.e., the mean distance a particle travels between scatterings, as measured in the mean rest frame of the matter) and the macroscopic inhomogeneity scale $L$ for the quantity that drives the transport (for example, in heat transport that scale is $L \sim T/|\nabla T|$, i.e., it is the scale on which the temperature changes by an amount of order itself). In terms of these quantities, the second criterion of validity is $\lambda \ll L$. These two criteria (many particles and $\lambda \ll L$) together are called diffusion criteria, since they guarantee that the quantity being transported (charge, heat, momentum, particles) will diffuse through the matter.
If either of the two diffusion criteria fails, then the standard transport law (Ohm’s law, the law of heat conduction, the Navier-Stokes equation, or the diffusion equation) breaks down and the corresponding transport coefficient becomes irrelevant and meaningless.

The accuracy with which one can compute a transport coefficient using the Boltzmann transport equation depends on the accuracy of one’s description of the scattering. If one uses a high-accuracy collision term $(d\mathcal N/d\ell)_{\rm collisions}$ in the Boltzmann equation, one can derive a highly accurate transport coefficient. If one uses a very crude approximation for the collision term, one’s resulting transport coefficient might be accurate only to within an order of magnitude, in which case it was probably not worth the effort to use the Boltzmann equation; a simple order-of-magnitude argument would have done just as well. In this section we shall compute the coefficient of thermal conductivity $\kappa$ first by an order-of-magnitude argument, and then by the Boltzmann equation with a highly accurate collision term. In Exs. 2.18 and 2.19 readers will have the opportunity to compute the coefficient of viscosity and the diffusion coefficient using moderately accurate collision terms, and in Ex. 2.20 we will meet diffusion in momentum space, by contrast with diffusion in physical space.

2.7.1 Problem to be Analyzed: Diffusive Heat Conduction Inside a Star

The specific problem we shall treat here in the text is heat transport through hot gas deep inside a young, massive star. We shall confine attention to that portion of the star in which the temperature is $10^7\ {\rm K} \lesssim T \lesssim 10^9\ {\rm K}$, the mass density is $\rho \lesssim 10\ {\rm g/cm^3}\,(T/10^7\ {\rm K})^2$, and heat is carried primarily by diffusing photons rather than by diffusing electrons or ions or by convection. (We shall study convection in Chap. 17.) In this regime the primary impediment to the photons’ flow is collisions with electrons. The lower limit on temperature, $10^7$ K, guarantees that the gas is almost fully ionized, so there is a plethora of electrons to do the scattering. The upper limit on density, $\rho \sim 10\ {\rm g/cm^3}\,(T/10^7\ {\rm K})^2$, guarantees that (i) the inelastic scattering, absorption, and emission of photons by electrons accelerating in the Coulomb fields of ions (“bremsstrahlung” processes) are unimportant as impediments to heat flow compared to scattering off free electrons; and (ii) the scattering electrons are nondegenerate, i.e., they have mean occupation numbers $\eta$ small compared to unity and thus behave like classical, free, charged particles. The upper limit on temperature, $T \sim 10^9$ K, guarantees that (i) the electrons which do the scattering are moving thermally at much less than the speed of light (the mean thermal energy $1.5k_B T$ of an electron is much less than its rest mass-energy $m_e c^2$); and (ii) the scattering is nearly elastic, with negligible energy exchange between photon and electron, and is describable with good accuracy by the Thomson scattering cross section: In the rest frame of the electron, which to good accuracy will be the same as the mean rest frame of the gas since the electron’s speed relative to the mean rest frame is $\ll c$, the differential cross section $d\sigma$ for a photon to scatter from its initial propagation direction $\mathbf n'$ into a unit solid angle $d\Omega$ centered on a new propagation direction $\mathbf n$ is
$$\frac{d\sigma(\mathbf n'\to\mathbf n)}{d\Omega} = \frac{3}{16\pi}\,\sigma_T\left[1 + (\mathbf n\cdot\mathbf n')^2\right]. \tag{2.69a}$$

Here $\sigma_T$ is the total Thomson cross section [the integral of the differential cross section (2.69a) over solid angle]:
$$\sigma_T = \int\frac{d\sigma(\mathbf n'\to\mathbf n)}{d\Omega}\,d\Omega = \frac{8\pi}{3}\,r_o^2 = 0.665\times10^{-24}\ {\rm cm}^2, \tag{2.69b}$$
where $r_o = e^2/m_e c^2$ is the classical electron radius. For a derivation and discussion of the Thomson cross sections (2.69) see, e.g., Sec. 14.8 of Jackson (1999).

[Figure 2.10: two gas layers, at $z = 0$ and $z = \lambda$, exchanging energy fluxes $\alpha c a T_0^4$ (rightward) and $\alpha c a T_\lambda^4$ (leftward).]

Fig. 2.10: Heat exchange between two layers of gas separated by a distance of one photon mean free path in the direction of the gas’s temperature gradient.
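As a quick consistency check on Eqs. (2.69), the sketch below (hypothetical names; the value $r_o = 2.8179\times10^{-13}$ cm is the standard classical electron radius, assumed rather than taken from the text) evaluates $(8\pi/3)r_o^2$ and integrates the differential cross section (2.69a) over all scattering directions with a midpoint rule in $\mu = \mathbf n\cdot\mathbf n'$, using $d\Omega = 2\pi\,d\mu$ after the trivial azimuthal integration. It confirms that the prefactor $3/(16\pi)$ makes the angular integral return $\sigma_T$.

```python
import math

R_O = 2.8179e-13                          # classical electron radius, cm (assumed)
SIGMA_T = 8.0 * math.pi / 3.0 * R_O**2    # Eq. (2.69b)


def dsigma_domega(mu):
    """Thomson differential cross section, Eq. (2.69a), with mu = n . n'."""
    return 3.0 / (16.0 * math.pi) * SIGMA_T * (1.0 + mu**2)


def total_cross_section(n_bins=2000):
    """Midpoint-rule integral of dsigma/dOmega over the sphere, dOmega = 2 pi dmu."""
    d_mu = 2.0 / n_bins
    return sum(2.0 * math.pi * dsigma_domega(-1.0 + (i + 0.5) * d_mu) * d_mu
               for i in range(n_bins))


print(f"sigma_T = {SIGMA_T:.4e} cm^2, integral = {total_cross_section():.4e} cm^2")
```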

2.7.2 Order-of-Magnitude Analysis

Before embarking on any complicated calculation, it is always helpful to do a rough, order-of-magnitude analysis, thereby identifying the key physics and the approximate answer. The first step of a rough analysis of our heat transport problem is to identify the magnitudes of the relevant lengthscales. The inhomogeneity scale $L$ for the temperature, which drives the heat flow, is the size of the hot stellar core, a moderate fraction of the Sun’s radius: $L \sim 10^5$ km. The mean free path of a photon can be estimated by noting that, since each electron presents a cross section $\sigma_T$ to the photon and there are $n_e$ electrons per unit volume, the probability of a photon being scattered when it travels a distance $l$ through the gas is of order $n_e\sigma_T l$; therefore, to build up to unit probability for scattering, the photon must travel a distance
$$\lambda \sim \frac{1}{n_e\sigma_T} \sim \frac{m_p}{\rho\,\sigma_T} \sim 3\ {\rm cm}\left(\frac{1\ {\rm g/cm^3}}{\rho}\right) \sim 3\ {\rm cm}. \tag{2.70}$$
Here $m_p$ is the proton rest mass, $\rho \sim 1$ g/cm³ is the mass density in the core of a young, massive star, and we have used the fact that stellar gas is mostly hydrogen to infer that there is approximately one nucleon per electron in the gas and hence that $n_e \simeq \rho/m_p$. Note that $L \sim 10^5$ km is about $3\times10^9$ times larger than $\lambda \sim 3$ cm, and the number of electrons and photons inside a cube of side $L$ is enormous, so the diffusion description of heat transport is quite accurate.

In the diffusion description the heat flux $\mathbf q$ as measured in the gas’s rest frame is related to the temperature gradient $\nabla T$ by the law of diffusive heat conduction $\mathbf q = -\kappa\nabla T$. To estimate the thermal conductivity $\kappa$, orient the coordinates so the temperature gradient is in the $z$ direction, and consider the rate of heat exchange between a gas layer located near $z = 0$ and a layer one photon mean free path away, at $z = \lambda$ (Fig. 2.10). The heat exchange is carried by photons that are emitted from one layer, propagate nearly unimpeded to the

other, and then scatter. Although the individual scatterings are nearly elastic (and we thus are ignoring changes of photon frequency in the Boltzmann equation), tiny changes of photon energy add up over many scatterings to keep the photons nearly in local thermal equilibrium with the gas. Thus, we shall approximate the photons and gas in the layer at $z = 0$ to have a common temperature $T_0$ and those in the layer at $z = \lambda$ to have a common temperature $T_\lambda = T_0 + \lambda\,dT/dz$. Then the photons propagating from the layer at $z = 0$ to that at $z = \lambda$ carry an energy flux
$$q_{0\to\lambda} = \alpha c a\,(T_0)^4, \tag{2.71a}$$
where $a$ is the radiation constant of Eq. (2.51c), $a(T_0)^4$ is the photon energy density at $z = 0$, and $\alpha$ is a dimensionless constant of order 1/4 that accounts for what fraction of the photons at $z = 0$ are moving rightward rather than leftward, and at what mean angle to the $z$ direction. (Throughout this section, by contrast with earlier sections of this chapter, we shall use non-geometrized units, with the speed of light $c$ present explicitly.) Similarly, the flux of energy from the layer at $z = \lambda$ to the layer at $z = 0$ is
$$q_{\lambda\to0} = -\alpha c a\,(T_\lambda)^4; \tag{2.71b}$$

and the net rightward flux, the sum of (2.71a) and (2.71b), is

$$q = \alpha c a\left[(T_0)^4 - (T_\lambda)^4\right] = -4\alpha c a T^3 \lambda\,\frac{dT}{dz}. \qquad (2.71c)$$

Noting that $4\alpha$ is approximately one, inserting expression (2.70) for the photon mean free path, and comparing with the law of diffusive heat flow $\mathbf q = -\kappa\nabla T$, we conclude that the thermal conductivity is

$$\kappa \sim a T^3 c \lambda = \frac{a c T^3}{\sigma_T n_e}. \qquad (2.72)$$
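As a quick numerical sketch of Eqs. (2.70) and (2.72), the snippet below evaluates $\lambda$ for $\rho \sim 1\,\mathrm{g/cm^3}$ and compares it with $L \sim 10^5$ km. The CGS constant values are standard rounded values supplied here (they are not quoted in this section), the pure-hydrogen assumption $n_e \simeq \rho/m_p$ follows the text, and the temperature used for the $\kappa$ estimate is a hypothetical illustrative value, not one taken from the text.

```python
# CGS constants (standard values, rounded); assumed here, not quoted in this section
M_P = 1.6726e-24       # proton rest mass [g]
SIGMA_T = 6.6524e-25   # Thomson cross section [cm^2]
A_RAD = 7.5657e-15     # radiation constant a [erg cm^-3 K^-4]
C = 2.9979e10          # speed of light [cm/s]

def mean_free_path(rho):
    """Eq. (2.70): lambda ~ 1/(n_e sigma_T), with n_e ~ rho/m_p (pure hydrogen assumed)."""
    n_e = rho / M_P
    return 1.0 / (n_e * SIGMA_T)

def kappa_estimate(T, rho):
    """Eq. (2.72): kappa ~ a c T^3 / (sigma_T n_e), equivalently a T^3 c lambda."""
    n_e = rho / M_P
    return A_RAD * C * T**3 / (SIGMA_T * n_e)

rho_core = 1.0          # g/cm^3, the core density quoted in the text
L_core = 1e5 * 1e5      # L ~ 10^5 km, converted to cm
lam = mean_free_path(rho_core)
print(f"lambda ~ {lam:.2f} cm, L/lambda ~ {L_core / lam:.1e}")

T_example = 1e7         # K; hypothetical illustrative temperature
print(f"kappa ~ {kappa_estimate(T_example, rho_core):.1e} erg s^-1 cm^-1 K^-1")
```

The unrounded value $\lambda \approx 2.5$ cm gives $L/\lambda \approx 4\times10^9$; rounding $\lambda$ up to 3 cm, as the text does, gives roughly $3\times10^9$.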

2.7.3

Analysis Via the Boltzmann Transport Equation

With these physical insights and rough answer in hand, we turn to a Boltzmann transport analysis of the heat transfer. Our first step is to formulate the Boltzmann transport equation for the photons (including the effects of Thomson scattering off the electrons) in the rest frame of the gas. To simplify the analysis, we use as the parameter ℓ in the transport equation the distance $l$ that a fiducial photon travels, and we regard the distribution function $N$ as a function of location $\mathbf x$ in space, the photon propagation direction (unit vector) $\mathbf n$, and the photon frequency $\nu$: $N(\mathbf x, \mathbf n, \nu)$. Because the photon frequency changes neither during free propagation nor in Thomson scattering, it can be treated as a constant when solving the Boltzmann equation. Along the trajectory of a fiducial photon, $N(\mathbf x, \mathbf n, \nu)$ will change as a result of two things: (i) the scattering of photons out of the $\mathbf n$ direction and into other directions, and (ii) the scattering of photons from other directions $\mathbf n'$ into the $\mathbf n$ direction. These effects produce the following two "collision" terms in the Boltzmann transport equation (2.63):

$$\frac{dN(\mathbf x,\mathbf n,\nu)}{dl} = -\sigma_T n_e N(\mathbf x,\mathbf n,\nu) + n_e\int \frac{d\sigma(\mathbf n'\to\mathbf n)}{d\Omega}\,N(\mathbf x,\mathbf n',\nu)\,d\Omega'. \qquad (2.73)$$
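For an isotropic $N$, the gain and loss terms in (2.73) cancel only because integrating the differential cross section over all incoming directions returns the total cross section $\sigma_T$. Equation (2.69a) is not reproduced in this section; assuming it is the standard unpolarized Thomson form $d\sigma/d\Omega = (3\sigma_T/16\pi)[1 + (\mathbf n\cdot\mathbf n')^2]$, a simple midpoint-rule quadrature over the sphere (a sketch, not the book's method) confirms this for an arbitrarily chosen outgoing direction:

```python
import math

SIGMA_T = 6.6524e-25  # Thomson cross section [cm^2] (standard value, assumed)

def thomson_dsigma_domega(mu):
    """Assumed unpolarized Thomson form (3 sigma_T / 16 pi)(1 + mu^2), with mu = n . n'."""
    return 3.0 * SIGMA_T / (16.0 * math.pi) * (1.0 + mu * mu)

def integrate_over_incoming(n, f, n_theta=200, n_phi=64):
    """Midpoint-rule estimate of the integral of f(n . n') over all incoming directions n'."""
    d_theta, d_phi = math.pi / n_theta, 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * d_theta
        for k in range(n_phi):
            ph = (k + 0.5) * d_phi
            n_prime = (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))
            mu = sum(a * b for a, b in zip(n, n_prime))
            total += f(mu) * math.sin(th) * d_theta * d_phi   # solid-angle weight
    return total

n_out = (1 / 3, 2 / 3, 2 / 3)   # an arbitrary outgoing direction (unit vector)
total = integrate_over_incoming(n_out, thomson_dsigma_domega)
print(total / SIGMA_T)          # close to 1
```

Since the ratio is 1, an isotropic distribution pulled out of the integral makes the in-scattering term exactly balance the out-scattering term $-\sigma_T n_e N$.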

Because the mean free path $\lambda = 1/(\sigma_T n_e)$ is so short compared to the length scale $L$ of the temperature gradient, the heat flow will show up as a tiny correction to an otherwise isotropic, perfectly thermal distribution function. Thus, we can write the photon distribution function as the sum of an unperturbed, perfectly isotropic and thermalized piece $N_0$ and a tiny, anisotropic perturbation $N_1$:

$$N = N_0 + N_1, \qquad\text{where}\quad N_0 = \frac{2}{h^3}\,\frac{1}{e^{h\nu/k_B T} - 1}. \qquad (2.74a)$$

Here the perfectly thermal piece $N_0$ has the standard black-body form (2.23); it depends on the photon 4-momentum only through the frequency $\nu$ as measured in the mean rest frame of the gas, and it depends on location in spacetime only through the temperature $T$, which we assume is time independent in the star but is a function of spatial location, $T = T(\mathbf x)$. If the photon mean free path were vanishingly small, there would be no way for photons at different locations $\mathbf x$ to discover that the temperature is inhomogeneous; and, correspondingly, $N_1$ would be vanishingly small. The finiteness of the mean free path permits $N_1$ to be finite, and so it is reasonable to expect (and turns out to be true) that the magnitude of $N_1$ is

$$N_1 \sim \frac{\lambda}{L}\,N_0. \qquad (2.74b)$$

Thus, $N_0$ is the leading-order term, and $N_1$ is the first-order correction in an expansion of the distribution function $N$ in powers of $\lambda/L$. This is called a two-lengthscale expansion; see Box 2.3.

Box 2.3 Two-Lengthscale Expansions

Equation (2.74b) is indicative of the mathematical technique that underlies Boltzmann-transport computations: a perturbative expansion in the dimensionless ratio of two lengthscales, the tiny mean free path $\lambda$ of the transporter particles and the far larger macroscopic scale $L$ of the inhomogeneities that drive the transport. Expansions in lengthscale ratios $\lambda/L$ are called two-lengthscale expansions, and are widely used in physics and engineering. Most readers will previously have met such an expansion in quantum mechanics: the WKB approximation, where $\lambda$ is the lengthscale on which the wave function changes and $L$ is the scale of changes in the potential $V(x)$ that drives the wave function. Kinetic theory itself is the result of a two-lengthscale expansion: it follows from the more sophisticated statistical-mechanics formalism of Chap. 3, in the limit where the particle sizes are small compared to their mean free paths.
In this book we shall use two-lengthscale expansions frequently, e.g., in the geometric optics approximation to wave propagation (Chap. 6), in the study of boundary layers in fluid mechanics (Secs. 13.4 and 14.4), in the quasi-linear formalism for plasma physics (Chap. 22), and in the definition of a gravitational wave (Chap. 26).

Inserting $N = N_0 + N_1$ into our Boltzmann transport equation (2.73) and using $d/dl = \mathbf n\cdot\nabla$ for the derivative with respect to distance along the fiducial photon trajectory, we obtain

$$n_j\frac{\partial N_0}{\partial x_j} + n_j\frac{\partial N_1}{\partial x_j} = \left[-\sigma_T n_e N_0 + n_e\int\frac{d\sigma(\mathbf n'\to\mathbf n)}{d\Omega}\,N_0\,d\Omega'\right] + \left[-\sigma_T n_e N_1(\mathbf n,\nu) + n_e\int\frac{d\sigma(\mathbf n'\to\mathbf n)}{d\Omega}\,N_1(\mathbf n',\nu)\,d\Omega'\right]. \qquad (2.74c)$$

Because $N_0$ is isotropic, i.e., is independent of photon direction $\mathbf n'$, it can be pulled out of the integral over $\mathbf n'$ in the first square bracket on the right side; and when this is done, the first and second terms in that square bracket cancel each other, because the differential cross section integrated over all incoming directions is the total cross section $\sigma_T$. Thus, the unperturbed part of the distribution, $N_0$, completely drops out of the right side of (2.74c). On the left side, the term involving the perturbation $N_1$ is tiny compared to that involving the unperturbed distribution $N_0$ (smaller by a factor of order $\lambda/L$), so we shall drop it; and because the spatial dependence of $N_0$ is entirely due to the temperature gradient, we can bring the first term and the whole transport equation into the form

$$n_j\,\frac{\partial T}{\partial x_j}\,\frac{\partial N_0}{\partial T} = -\sigma_T n_e N_1(\mathbf n,\nu) + n_e\int\frac{d\sigma(\mathbf n'\to\mathbf n)}{d\Omega}\,N_1(\mathbf n',\nu)\,d\Omega'. \qquad (2.74d)$$

The left side of this equation is the amount by which the temperature gradient causes $N_0$ to fail to satisfy the Boltzmann equation, and the right side is the manner in which the perturbation $N_1$ steps into the breach and enables the Boltzmann equation to be satisfied. Because the left side is linear in the photon propagation direction $n_j$ (i.e., it has a $\cos\theta$ dependence in coordinates where $\nabla T$ is in the $z$ direction; i.e., it has a "dipolar", $l = 1$ angular dependence), $N_1$ must also be linear in $n_j$, i.e., dipolar, in order to fulfill Eq. (2.74d). Thus, we shall write $N_1$ in the dipolar form

$$N_1 = K_j(\mathbf x,\nu)\,n_j, \qquad (2.74e)$$

and we shall solve the transport equation (2.74d) for the function $K_j$.

[Important side remark: This is a special case of a general situation. When solving the Boltzmann equation in diffusion situations, one is performing a power-series expansion in $\lambda/L$; see Box 2.3. The lowest-order term in the expansion, $N_0$, is isotropic, i.e., it is monopolar in its dependence on the direction of motion of the diffusing particles. The first-order correction, $N_1$, is down in magnitude by $\lambda/L$ from $N_0$ and is dipolar (or sometimes quadrupolar; see Ex. 2.18) in its dependence on the particles' direction of motion. The second-order correction, $N_2$, is down in magnitude by $(\lambda/L)^2$ from $N_0$, and its multipolar order is one higher than that of $N_1$ (quadrupolar here; octupolar in Ex. 2.18). And so it continues on up to higher and higher orders. For full details in nonrelativistic situations see, e.g., Grad (1957); for full relativistic details see, e.g., Thorne (1981).]

When we insert the dipolar expression (2.74e) into the angular integral on the right side of the transport equation (2.74d) and notice that the differential scattering cross section (2.69a) is unchanged under $\mathbf n'\to-\mathbf n'$, but $K_j n'_j$ changes sign, we find that the integral vanishes. As a result, the transport equation (2.74d) takes the simplified form

$$n_j\,\frac{\partial T}{\partial x_j}\,\frac{\partial N_0}{\partial T} = -\sigma_T n_e K_j n_j, \qquad (2.74f)$$

from which we can read off the function $K_j$ and thence $N_1 = K_j n_j$:

$$N_1 = -\frac{\partial N_0/\partial T}{\sigma_T n_e}\,\frac{\partial T}{\partial x_j}\,n_j. \qquad (2.74g)$$
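The parity argument that kills the in-scattering integral for a dipolar $N_1 = K_j n_j$ can also be checked numerically. The sketch below again assumes the unpolarized Thomson form $(3\sigma_T/16\pi)[1+(\mathbf n\cdot\mathbf n')^2]$ for Eq. (2.69a), and uses an arbitrary (hypothetical) outgoing direction and dipole vector $K_j$. Because the midpoint grid maps onto itself under $\mathbf n'\to-\mathbf n'$ (an even number of $\phi$ points), the quadrature reproduces the cancellation to round-off, while the out-scattering term $-\sigma_T K_j n_j$ survives:

```python
import math

SIGMA_T = 6.6524e-25  # Thomson cross section [cm^2] (standard value, assumed)

def thomson_dsigma_domega(mu):
    """Assumed unpolarized Thomson form (3 sigma_T / 16 pi)(1 + mu^2), with mu = n . n'."""
    return 3.0 * SIGMA_T / (16.0 * math.pi) * (1.0 + mu * mu)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dipole_in_scattering(n, K, n_theta=120, n_phi=240):
    """Midpoint-rule estimate of the integral of dsigma/dOmega(n' -> n) (K . n') over n'."""
    d_theta, d_phi = math.pi / n_theta, 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * d_theta
        for k in range(n_phi):
            ph = (k + 0.5) * d_phi
            n_prime = (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))
            weight = math.sin(th) * d_theta * d_phi
            total += thomson_dsigma_domega(dot(n, n_prime)) * dot(K, n_prime) * weight
    return total

n_out = (1 / 3, 2 / 3, 2 / 3)    # arbitrary outgoing direction (unit vector)
K = (0.3, -1.2, 0.5)             # hypothetical dipole coefficients K_j
in_term = dipole_in_scattering(n_out, K)
out_term = -SIGMA_T * dot(K, n_out)   # the -sigma_T N_1(n) term, per unit n_e
print(in_term / SIGMA_T)    # ~0: the in-scattering integral of a dipole vanishes
print(out_term / SIGMA_T)   # nonzero: only this term survives, as in Eq. (2.74f)
```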

Notice that, as claimed above, the perturbation has a magnitude

$$\frac{N_1}{N_0} \sim \frac{1}{\sigma_T n_e}\,\frac{1}{T}\,|\nabla T| \sim \frac{\lambda}{L}. \qquad (2.74h)$$

We can now evaluate the energy flux $q_i$ carried by the diffusing photons. Relativity physicists will recognize that flux as the $T^{0i}$ part of the stress-energy tensor and will therefore evaluate it as

$$q_i = T^{0i} = c^2\int N p^0 p^i\,\frac{dV_p}{p^0} = c^2\int N p^i\,dV_p \qquad (2.75)$$

[cf. Eq. (2.31c) with the factors of $c$ restored]. Newtonian physicists can deduce this formula by noticing that photons with momentum $\mathbf p$ in $dV_p$ carry energy $E = |\mathbf p|c$ and move with velocity $\mathbf v = c\mathbf p/|\mathbf p|$, so their energy flux is $N E\mathbf v\,dV_p = c^2 N\mathbf p\,dV_p$; integrating this over momentum space gives Eq. (2.75). Inserting $N = N_0 + N_1$ into this equation, noting that the integral over the isotropic $N_0$ vanishes, and inserting Eq. (2.74g) for $N_1$, we obtain

$$q_i = c^2\int N_1 p_i\,dV_p = -\frac{c}{\sigma_T n_e}\,\frac{\partial T}{\partial x_j}\,\frac{\partial}{\partial T}\int N_0\,c\,n_j\,p_i\,dV_p. \qquad (2.76a)$$

(Here we have used the fact that the only way in which the integral can depend on location $\mathbf x$ is through the photon temperature $T$.) The relativity physicist will identify the integral as Eq. (2.31c) for the photons' stress tensor $T_{ij}$ (since $n_j = p_j/p^0 = p_j/E$ in geometrized units). The Newtonian physicist, with a little thought, will recognize the integral in Eq. (2.76a) as the $j$-component of the flux of the $i$-component of momentum, which is precisely the stress tensor. Since this stress tensor is being computed with the isotropic, thermalized part of $N$, it is isotropic, $T_{ji} = P\delta_{ji}$, and its pressure has the standard black-body-radiation form $P = \frac{1}{3}aT^4$ [Eqs. (2.51a)]. Replacing the integral in Eq. (2.76a) by this black-body stress tensor, we obtain our final answer for the photons' energy flux:

$$q_i = -\frac{c}{\sigma_T n_e}\,\frac{\partial T}{\partial x_j}\,\frac{d}{dT}\left(\frac{1}{3}aT^4\delta_{ji}\right) = -\frac{c}{\sigma_T n_e}\,\frac{4}{3}aT^3\,\frac{\partial T}{\partial x_i}. \qquad (2.76b)$$

Thus, from the Boltzmann transport equation we have simultaneously derived the law of diffusive heat conduction $\mathbf q = -\kappa\nabla T$ and evaluated the coefficient of heat conductivity

$$\kappa = \frac{4}{3}\,\frac{acT^3}{\sigma_T n_e}. \qquad (2.77)$$
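To connect Eq. (2.77) with the 7 M⊙ application discussed below, the sketch evaluates $\kappa$ and the resulting diffusive luminosity for the quoted parameters ($\rho \simeq 5\,\mathrm{g/cm^3}$, $T \simeq 1.6\times10^7$ K, $r \simeq 6\times10^5$ km). The CGS constants and the solar luminosity $L_\odot \simeq 3.83\times10^{33}$ erg/s are standard values supplied here, not taken from the text; as in the text, $n_e \simeq \rho/m_p$ and $|\nabla T| \sim T/r$.

```python
import math

# CGS constants (standard values, rounded); assumed, not quoted in this section
M_P = 1.6726e-24      # proton rest mass [g]
SIGMA_T = 6.6524e-25  # Thomson cross section [cm^2]
A_RAD = 7.5657e-15    # radiation constant a [erg cm^-3 K^-4]
C = 2.9979e10         # speed of light [cm/s]
L_SUN = 3.828e33      # nominal solar luminosity [erg/s]

def kappa_radiative(T, n_e):
    """Eq. (2.77): kappa = (4/3) a c T^3 / (sigma_T n_e), in erg s^-1 cm^-1 K^-1."""
    return 4.0 * A_RAD * C * T**3 / (3.0 * SIGMA_T * n_e)

# Parameters quoted in the text for a 7 M_sun main-sequence star (after Clayton 1968)
rho, T, r = 5.0, 1.6e7, 6e10     # g/cm^3, K, cm
n_e = rho / M_P                  # ~3e24 cm^-3, one electron per nucleon assumed
kappa = kappa_radiative(T, n_e)
grad_T = T / r                   # crude |grad T| ~ T/L with L ~ r
q = kappa * grad_T               # diffusive heat flux [erg s^-1 cm^-2]
lum = 4.0 * math.pi * r**2 * q   # luminosity through the sphere of radius r [erg/s]

print(f"kappa ~ {kappa:.1e} erg s^-1 cm^-1 K^-1")
print(f"q ~ {q:.1e} erg s^-1 cm^-2, L ~ {lum:.1e} erg/s ~ {lum / L_SUN:.0f} L_sun")
```

With these inputs $\kappa$ comes out near $6\times10^{17}$ erg s⁻¹ cm⁻¹ K⁻¹, consistent with the rounded $\simeq 7\times10^{17}$ quoted below, and the luminosity is roughly 2000 $L_\odot$. The ratio of this $\kappa$ to the estimate (2.72) is exactly 4/3.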

Notice that this heat conductivity is 4/3 times our crude, order-of-magnitude estimate (2.72). The above calculation, while somewhat complicated in its details, is conceptually fairly simple. The reader is encouraged to go back through the calculation and identify the main conceptual steps: expansion of the distribution function in powers of $\lambda/L$; insertion of the zero-order plus first-order parts into the Boltzmann equation; multipolar decomposition of the zero- and first-order parts, with the zero-order part being monopolar and the first-order part dipolar; neglect of terms in the Boltzmann equation that are smaller than the leading ones by factors $\lambda/L$; solution for the coefficient of the multipolar decomposition of the first-order part; reconstruction of the first-order part from that coefficient; and insertion into a momentum-space integral to get the flux of the quantity being transported. Precisely these same steps are used to evaluate all other transport coefficients that are governed by classical physics. For examples of other such calculations see, e.g., Shkarofsky, Johnston, and Bachynski (1966).

As an application of the thermal conductivity (2.77), consider a young (main-sequence) 7-solar-mass ($7M_\odot$) star as modeled, e.g., on page 480 of Clayton (1968). Just outside the star's convective core, at radius $r \simeq 0.8R_\odot \simeq 6\times10^5$ km (where $R_\odot$ is the Sun's radius), the density and temperature are $\rho \simeq 5\,\mathrm{g/cm^3}$ and $T \simeq 1.6\times10^7$ K, so the number density of electrons is $n_e \simeq \rho/m_p \simeq 3\times10^{24}\,\mathrm{cm^{-3}}$. For these parameters, Eq. (2.77) gives a thermal conductivity $\kappa \simeq 7\times10^{17}\,\mathrm{erg\,s^{-1}\,cm^{-1}\,K^{-1}}$. The lengthscale $L$ on which the temperature is changing is approximately the same as the radius, so the temperature gradient is $|\nabla T| \sim T/r \sim 3\times10^{-4}$ K/cm. The law of diffusive heat transfer then predicts a heat flux $q = \kappa|\nabla T| \sim 2\times10^{14}\,\mathrm{erg\,s^{-1}\,cm^{-2}}$, and thus a total luminosity $L = 4\pi r^2 q \sim 8\times10^{36}$ erg/s $\simeq 2000L_\odot$ (2000 solar luminosities). What a difference the mass makes! The heavier a star, the hotter its core, the faster it burns, and the higher its luminosity. Increasing the mass by a factor 7 drives the luminosity up by a factor of about 2000.

****************************
EXERCISES

Exercise 2.16 Example: Solution of Diffusion Equation in an Infinite, Homogeneous Medium

(a) Show that the following is a solution to the diffusion equation (2.68) for particles in a homogeneous, infinite medium:

$$n = \frac{N}{(4\pi Dt)^{3/2}}\,e^{-r^2/4Dt} \qquad (2.78)$$

(where $r \equiv \sqrt{x^2 + y^2 + z^2}$ is radius), and that it satisfies $\int n\,dV_x = N$, so $N$ is the total number of particles. Note that this is a Gaussian distribution with width $\sigma = \sqrt{4Dt}$. Plot this solution for several values of $\sigma$. In the limit as $t\to 0$, the particles are all localized at the origin. As time passes, they random-walk (diffuse) away from the origin, traveling a mean distance $\sigma = \sqrt{4Dt}$ after time $t$. We will meet this square-root-of-time evolution in other random-walk situations elsewhere in this book.

(b) Suppose that the particles have an arbitrary initial distribution $n_o(\mathbf x)$ at time $t = 0$. Show that their distribution at a later time $t$ is given by the following "Green's function" integral:

$$n(\mathbf x, t) = \int \frac{n_o(\mathbf x')}{(4\pi Dt)^{3/2}}\,e^{-|\mathbf x-\mathbf x'|^2/4Dt}\,dV_{x'}. \qquad (2.79)$$
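A quick numerical sanity check of part (a), offered as a sketch rather than a solution: central finite differences confirm that Eq. (2.78) satisfies $\partial n/\partial t = D\nabla^2 n$ at a sample point (this is the form assumed here for the diffusion equation (2.68), which is not reproduced in this section), and a radial quadrature confirms $\int n\,dV_x = N$. The sample values of $r$, $t$, $D$, and $N$ are arbitrary.

```python
import math

def n_gaussian(r, t, N=1.0, D=1.0):
    """Eq. (2.78): n = N (4 pi D t)^(-3/2) exp(-r^2 / 4 D t)."""
    return N * (4.0 * math.pi * D * t) ** -1.5 * math.exp(-r * r / (4.0 * D * t))

def diffusion_residual(r, t, D=1.0, h=1e-4):
    """Finite-difference dn/dt - D * laplacian(n) for the radial solution; returns (residual, dn/dt)."""
    def n(rr, tt):
        return n_gaussian(rr, tt, D=D)
    dn_dt = (n(r, t + h) - n(r, t - h)) / (2.0 * h)
    dn_dr = (n(r + h, t) - n(r - h, t)) / (2.0 * h)
    d2n_dr2 = (n(r + h, t) - 2.0 * n(r, t) + n(r - h, t)) / h**2
    laplacian = d2n_dr2 + 2.0 * dn_dr / r     # Laplacian of a spherically symmetric function
    return dn_dt - D * laplacian, dn_dt

def total_number(t, D=1.0, widths=10.0, steps=20000):
    """Midpoint-rule estimate of the integral of 4 pi r^2 n dr out to `widths` times sqrt(4 D t)."""
    r_max = widths * math.sqrt(4.0 * D * t)
    dr = r_max / steps
    return sum(4.0 * math.pi * ((i + 0.5) * dr) ** 2 * n_gaussian((i + 0.5) * dr, t, D=D) * dr
               for i in range(steps))

residual, dn_dt = diffusion_residual(r=1.3, t=0.7)
print(abs(residual / dn_dt))   # tiny: only finite-difference error remains
print(total_number(t=0.7))     # close to N = 1
```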

Exercise 2.17 Problem: Diffusion Equation for Temperature

Use the law of energy conservation to show that, when heat diffuses through a homogeneous medium whose volume is being kept fixed, the evolution of the temperature perturbation $\delta T \equiv T - (\text{average temperature})$ is governed by the diffusion equation (2.68) with diffusion constant $D = \kappa/C_V$. Here $C_V$ is the specific heat per unit volume, at fixed volume.

Exercise 2.18 Example: Viscosity of a Monatomic Gas

Consider a nonrelativistic fluid that, in the neighborhood of the origin, has fluid velocity $v_i = \sigma_{ij}x_j$, with $\sigma_{ij}$ symmetric and trace free. As we shall see in Sec. 12.6.1, this represents a purely shearing flow, with no rotation or volume changes of fluid elements; $\sigma_{ij}$ is called the fluid's rate of shear. Just as a gradient of temperature produces a diffusive flow of heat, so the gradient of velocity embodied in $\sigma_{ij}$ produces a diffusive flow of momentum, i.e., a stress. In this exercise we shall use kinetic theory to show that, for a monatomic gas with isotropic scattering of atoms off each other, this stress is

$$T_{ij} = -2\eta_{\rm shear}\,\sigma_{ij}, \qquad (2.80a)$$

with the coefficient of shear viscosity

$$\eta_{\rm shear} \simeq \frac{1}{3}\rho\lambda v_{\rm th}, \qquad (2.80b)$$

where $\rho$ is the gas density, $\lambda$ is the atoms' mean free path between collisions, and $v_{\rm th} = \sqrt{3k_BT/m}$ is the atoms' rms speed. Our analysis will follow the same route as the analysis of heat conduction in Secs. 2.7.2 and 2.7.3.

(a) Derive Eq. (2.80b) for the shear viscosity by an order-of-magnitude analysis like that in Sec. 2.7.2.

(b) Regard the atoms' distribution function $N$ as being a function of the magnitude $p$ and direction $\mathbf n$ of an atom's momentum, and of location $\mathbf x$ in space. Show that, if the scattering is isotropic with cross section $\sigma_s$ and the number density of atoms is $n$, then the Boltzmann transport equation can be written as

$$\frac{dN}{dl} = \mathbf n\cdot\nabla N = -\frac{1}{\lambda}N + \frac{1}{4\pi\lambda}\int N(p,\mathbf n',\mathbf x)\,d\Omega', \qquad (2.81a)$$

where $\lambda = 1/(n\sigma_s)$ is the atomic mean free path (mean distance traveled between scatterings) and $l$ is distance traveled by a fiducial atom. Then explain why, in the limit of vanishingly small mean free path, the distribution function has the following form:

$$N_0 = \frac{n}{(2\pi m k_B T)^{3/2}}\exp\left[-\frac{(\mathbf p - m\boldsymbol\sigma\cdot\mathbf x)^2}{2mk_BT}\right]. \qquad (2.81b)$$

(c) Solve the Boltzmann transport equation (2.81a) to obtain the leading-order correction $N_1$ to the distribution function at $\mathbf x = 0$. [Answer: $N_1 = -(\lambda p/k_BT)\,\sigma_{ab}n_a n_b N_0$.]

(d) Compute the stress via a momentum-space integral. Your answer should be Eq. (2.80a) with $\eta_{\rm shear}$ given by Eq. (2.80b) to within a few tens of per cent accuracy. [Hint: Along the way you will need the following angular integral:

$$\int n_a n_b n_i n_j\,d\Omega = \frac{4\pi}{15}\left(\delta_{ab}\delta_{ij} + \delta_{ai}\delta_{bj} + \delta_{aj}\delta_{bi}\right). \qquad (2.81c)$$

Derive this by arguing that the integral must have the above delta-function structure, and by then computing the multiplicative constant by performing the integral for $a = b = i = j = z$.]

Exercise 2.19 Example: Diffusion Coefficient Computed in the "Collision-Time" Approximation

Consider a collection of identical "test particles" with rest mass $m \neq 0$ that diffuse through a collection of thermalized "scattering centers". (The test particles might be molecules of one species, and the scattering centers might be molecules of a much more numerous species.) The scattering centers have a temperature $T$ such that $k_BT \ll mc^2$, so if the test particles acquire this same temperature they will have thermal speeds small compared to the speed of light, as measured in the mean rest frame of the scattering centers. We shall study the effects of scattering on the test particles using the following "collision-time" approximation for the collision terms in the Boltzmann equation:

$$\left(\frac{dN}{dt}\right)_{\rm collision} = \frac{1}{\hat\tau}\,(N_0 - N), \qquad\text{where}\quad N_0 \equiv \frac{e^{-p^2/2mk_BT}}{(2\pi m k_B T)^{3/2}}\,n. \qquad (2.82)$$

All frame-dependent quantities appearing here are evaluated in the mean rest frame of the scattering centers; in particular, $t$ is Lorentz time, $T$ is the temperature of the scattering centers, $p = |\mathbf p|$ is the magnitude of the test particles' spatial momentum, $n = \int N\,dV_p$ is the number density of test particles, and $\hat\tau$ is a constant to be discussed below.

(a) Show that this collision term preserves test particles in the sense that

$$\left(\frac{dn}{dt}\right)_{\rm collision} \equiv \int\left(\frac{dN}{dt}\right)_{\rm collision} dp_x\,dp_y\,dp_z = 0. \qquad (2.83)$$

(b) Explain why this collision term corresponds to the following physical picture: Each test particle has a probability per unit time $1/\hat\tau$ of scattering; and when it scatters, its direction of motion is randomized and its energy is thermalized at the scattering centers' temperature.

(c) Suppose that the temperature $T$ is homogeneous (spatially constant), but the test particles are distributed inhomogeneously, $n = n(\mathbf x) \neq$ const. Let $L$ be the length scale on which their number density $n$ varies. What condition must $L$, $\hat\tau$, $T$, and $m$ satisfy in order that the diffusion approximation be reasonably accurate? Assume that this condition is satisfied.

(d) Compute, in order of magnitude, the particle flux $\mathbf S = -D\nabla n$ produced by the gradient of the number density $n$, and thereby evaluate the diffusion coefficient $D$.

(e) Show that the Boltzmann transport equation takes the form

$$\frac{\partial N}{\partial t} + \frac{p_j}{m}\frac{\partial N}{\partial x_j} = \frac{1}{\hat\tau}\,(N_0 - N). \qquad (2.84a)$$

(f) Show that to first order in a small diffusion-approximation parameter, the solution of this equation is $N = N_0 + N_1$, where $N_0$ is as defined in Eq. (2.82) above, and

$$N_1 = -\frac{p_j\hat\tau}{m}\,\frac{\partial n}{\partial x_j}\,\frac{e^{-p^2/2mk_BT}}{(2\pi m k_B T)^{3/2}}. \qquad (2.84b)$$

Note that $N_0$ is monopolar (independent of the direction of $\mathbf p$), while $N_1$ is dipolar (linear in $\mathbf p$).

(g) Show that the perturbation $N_1$ gives rise to a particle flux given by Eq. (2.67d), with the diffusion coefficient

$$D = \frac{k_BT}{m}\,\hat\tau. \qquad (2.85)$$

Exercise 2.20 Example: Neutron Diffusion in a Nuclear Reactor

Here are some salient, oversimplified facts about nuclear reactors (see, e.g., Stephenson 1954, especially Chap. 4): A reactor's core is made of a mixture of natural uranium (0.72 percent $^{235}$U and 99.28 percent $^{238}$U) and a solid or liquid material such as carbon (graphite) or water, made of low-atomic-number atoms and called the moderator. For concreteness we shall assume that the moderator is graphite. Slow (thermalized) neutrons, with kinetic energies of ∼ 0.1 eV, get captured by the $^{235}$U nuclei and trigger them to fission, releasing ∼ 170 MeV of kinetic energy per fission (which ultimately goes into heat and then electric power), and also releasing an average of about 2 fast neutrons (kinetic energies ∼ 1 MeV). The fast neutrons must be slowed to thermal speeds so as to capture onto $^{235}$U atoms and induce further fissions. The slowing is achieved by scattering off the moderator atoms, a scattering in which the crucial effect, energy loss, occurs in momentum space. The momentum-space scattering is elastic and isotropic in the center-of-mass frame, with total cross section $\sigma_s \simeq 4.8\times10^{-24}\,\mathrm{cm^2} \equiv 4.8$ barns. Using the fact that in the moderator's rest frame the incoming neutron has a much higher kinetic energy than the moderator carbon atoms, and using energy and momentum conservation and the isotropy of the scattering, one can show that in the moderator's rest frame, the logarithm of the neutron's energy is reduced in each scattering by an average amount $\xi$ that is independent of energy and is given by

$$\xi \equiv -\Delta\ln E = 1 + \frac{(A-1)^2}{2A}\,\ln\left(\frac{A-1}{A+1}\right) \simeq 0.158. \qquad (2.86)$$

Here $A \simeq 12$ is the ratio of the mass of the scattering atom to the mass of the scattered neutron.

There is a dangerous hurdle that the diffusing neutrons must overcome during their slowdown: as the neutrons pass through a critical energy region of about 7 to 6 eV, the $^{238}$U atoms can absorb them. The absorption cross section has a huge resonance there, with width about 1 eV and resonance integral $\int\sigma_a\,d\ln E \simeq 240$ barns. For simplicity, we shall approximate the cross section in this absorption resonance by $\sigma_a \simeq 1600$ barns at 6 eV $< E <$ 7 eV, and zero outside this range. To achieve a viable fission chain reaction and keep the reactor hot, it is necessary that about half of the neutrons (one per original $^{235}$U fission) slow down through this resonant energy without getting absorbed. Those that make it through will thermalize and trigger new $^{235}$U fissions (about one per original fission), maintaining the chain reaction.

We shall idealize the uranium and moderator atoms as homogeneously mixed on lengthscales small compared to the neutron mean free path, $\lambda_s = 1/(\sigma_s n_s) \simeq 2$ cm, where $n_s$ is the number density of moderator (carbon) atoms. Then the neutrons' distribution function $N$, as they slow down, will be isotropic in direction and independent of position; and in our steady-state situation, it will be independent of time. It therefore will depend only on the magnitude $p$ of the neutron momentum, or equivalently on the neutron kinetic energy $E = p^2/2m$: $N = N(E)$. Use the Boltzmann transport equation or other considerations to develop the theory of the slowing down of the neutrons in momentum space, and of their struggle to pass through the $^{238}$U resonance region without getting absorbed. More specifically:

(a) Use as the distribution function not $N(E)$ but rather $n_E(E) \equiv dN/dV_x\,dE$ = (number of neutrons per unit volume and per unit kinetic energy), and denote by $q(E)$ the number of neutrons per unit volume that slow down through energy $E$ per unit time. Show that outside the resonant absorption region these two quantities are related by

$$q = \sigma_s n_s\,\xi E\,n_E\,v, \qquad\text{where}\quad v = \sqrt{2E/m} \qquad (2.87)$$

is the neutron speed, so $q$ contains the same information as the distribution function $n_E$. Explain why the steady-state operation of the nuclear reactor requires $q$ to be independent of energy in this non-absorption region, and infer that $n_E \propto E^{-3/2}$.

(b) Show further that inside the resonant absorption region, 6 eV $< E <$ 7 eV, the relationship between $q$ and $E$ is modified:

$$q = (\sigma_s n_s + \sigma_a n_a)\,\xi E\,n_E\,v. \qquad (2.88)$$

Here $n_s$ is the number density of scattering (carbon) atoms and $n_a$ is the number density of absorbing ($^{238}$U) atoms. [Hint: require that the rate at which neutrons scatter into a tiny interval of energy $\delta E \ll \xi E$ is equal to the rate at which they leave that tiny interval.] Then show that the absorption causes $q$ to vary with energy according to the following differential equation:

$$\frac{d\ln q}{d\ln E} = \frac{\sigma_a n_a}{(\sigma_s n_s + \sigma_a n_a)\,\xi}. \qquad (2.89)$$

(c) By solving this differential equation in our idealization of constant $\sigma_a$ over the range 6 to 7 eV, show that the condition to maintain the chain reaction is

$$\frac{n_s}{n_a} \simeq \frac{\sigma_a}{\sigma_s}\left(\frac{\ln(7/6)}{\xi\ln 2} - 1\right) \simeq 0.41\,\frac{\sigma_a}{\sigma_s} \simeq 140. \qquad (2.90)$$

Thus, to maintain the reaction in the presence of the huge $^{238}$U absorption resonance for neutrons, it is necessary that more than 99 per cent of the reactor volume be taken up by moderator atoms and less than 1 per cent by uranium atoms. We reiterate that this is a rather idealized version of what happens inside a nuclear reactor, but it provides insight into some of the important processes and the magnitudes of various relevant quantities. For a graphic example of an additional complexity, see the description of "Xenon poisoning" of the chain reaction in the first production-scale nuclear reactor (built during World War II to make plutonium for the first American atomic bombs), in John Archibald Wheeler's autobiography (Wheeler 1998).
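The numbers in Eqs. (2.86) and (2.90) can be reproduced with a few lines of arithmetic. The sketch below evaluates $\xi$ for $A = 12$ and the required moderator-to-uranium ratio, then integrates Eq. (2.89) analytically across the idealized 6 to 7 eV resonance (its right side is constant there) to confirm that this ratio lets half of the neutrons through. Cross sections are in barns, as in the text; only their ratio matters.

```python
import math

SIGMA_S = 4.8      # carbon scattering cross section [barns]
SIGMA_A = 1600.0   # idealized 238U resonance cross section for 6 eV < E < 7 eV [barns]

def xi_log_energy_loss(A):
    """Eq. (2.86): mean decrease of ln E per elastic, CM-isotropic scattering off mass-A nuclei."""
    return 1.0 + (A - 1.0) ** 2 / (2.0 * A) * math.log((A - 1.0) / (A + 1.0))

xi = xi_log_energy_loss(12.0)

# Eq. (2.90): moderator-to-absorber number ratio for which half the neutrons survive
factor = math.log(7.0 / 6.0) / (xi * math.log(2.0)) - 1.0
ns_over_na = SIGMA_A / SIGMA_S * factor

def surviving_fraction(ns_over_na):
    """q(6 eV)/q(7 eV) from integrating Eq. (2.89) with constant sigma_a across the resonance."""
    slope = SIGMA_A / ((SIGMA_S * ns_over_na + SIGMA_A) * xi)   # d ln q / d ln E
    return math.exp(-slope * math.log(7.0 / 6.0))

print(f"xi = {xi:.3f}, factor = {factor:.2f}, n_s/n_a = {ns_over_na:.0f}")
print(surviving_fraction(ns_over_na))   # 0.5, up to round-off
```

Raising $n_s/n_a$ above this value makes the surviving fraction exceed one half, which is why the moderator must dominate the core by number.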

****************************
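The fourth-moment angular integral (2.81c) quoted in the hint to Exercise 2.18 can be checked numerically. The sketch below uses plain midpoint quadrature on a θ–φ grid (chosen for simplicity rather than efficiency) and compares all 81 components with the right side of Eq. (2.81c), including the $a = b = i = j = z$ case, for which the integral is $4\pi/5$:

```python
import math
from itertools import product

def delta(a, b):
    return 1.0 if a == b else 0.0

def fourth_moment(n_theta=300, n_phi=32):
    """Midpoint-rule estimate of M[a][b][i][j] = integral of n_a n_b n_i n_j over the unit sphere."""
    M = [[[[0.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(3)]
    d_theta, d_phi = math.pi / n_theta, 2.0 * math.pi / n_phi
    for it in range(n_theta):
        th = (it + 0.5) * d_theta
        w = math.sin(th) * d_theta * d_phi      # solid-angle weight of this grid cell
        for ip in range(n_phi):
            ph = (ip + 0.5) * d_phi
            n = (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))
            for a, b, i, j in product(range(3), repeat=4):
                M[a][b][i][j] += w * n[a] * n[b] * n[i] * n[j]
    return M

def predicted(a, b, i, j):
    """Right-hand side of Eq. (2.81c)."""
    return 4.0 * math.pi / 15.0 * (delta(a, b) * delta(i, j)
                                   + delta(a, i) * delta(b, j)
                                   + delta(a, j) * delta(b, i))

M = fourth_moment()
max_err = max(abs(M[a][b][i][j] - predicted(a, b, i, j))
              for a, b, i, j in product(range(3), repeat=4))
print(f"zzzz component: {M[2][2][2][2]:.5f} vs 4*pi/5 = {4 * math.pi / 5:.5f}")
print(f"max deviation over all 81 components: {max_err:.1e}")
```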

Bibliographic Note

Newtonian kinetic theory is treated in many textbooks on statistical physics. At an elementary level, Chap. 14 of Kittel and Kroemer (1980) is rather good. At a more advanced level, see, e.g., Secs. 7.9–7.13 and Chaps. 12, 13, and 14 of Reif (1965). For a very advanced treatment with extensive applications to electrons and ions in plasmas, and to electrons, phonons, and quasi-particles in liquids and solids, see Lifshitz and Pitaevskii (1981). Relativistic kinetic theory is rarely touched on in statistical-physics textbooks but is well known to researchers in general relativity and astrophysics. The treatment here is easily lifted into general relativity theory; see, e.g., Sec. 22.6 of Misner, Thorne, and Wheeler (1973).

Bibliography

Allen, C. W. 2000. Astrophysical Quantities, fourth edition. London: The Athlone Press.
Clayton, Donald D. 1968. Principles of Stellar Evolution and Nucleosynthesis. New York: McGraw-Hill.
Cohen-Tannoudji, Claude, Diu, Bernard, & Laloë, Franck. 1977. Quantum Mechanics, Volume 2. New York: Wiley.
Corey, B. E., & Wilkinson, David T. 1976. Bulletin of the American Astronomical Society, 8, 351.
Goldstein, Herbert, Poole, Charles, & Safko, John. 2002. Classical Mechanics, third edition. New York: Addison-Wesley.
Grad, Harold. 1957. "Principles of the Kinetic Theory of Gases," in Handbuch der Physik. Berlin: Springer Verlag, 205–294.
Jackson, John David. 1999. Classical Electrodynamics, third edition. New York: Wiley.
Kittel, C., & Kroemer, H. 1980. Thermal Physics. San Francisco: Freeman.
Lifshitz, E. M., & Pitaevskii, L. P. 1981. Physical Kinetics, Volume 10 of Landau & Lifshitz Course of Theoretical Physics.
Lynden-Bell, D. 1967. Monthly Notices of the Royal Astronomical Society, 136, 101.
Misner, Charles W., Thorne, Kip S., & Wheeler, John A. 1973. Gravitation. San Francisco: Freeman.
Reif, F. 1965. Fundamentals of Statistical and Thermal Physics. New York: McGraw-Hill.
Shkarofsky, I. P., Johnston, T. W., & Bachynski, M. P. 1966. The Particle Kinetics of Plasmas. New York: Addison-Wesley.
Smoot, George F., Gorenstein, Mark V., & Muller, Richard A. 1977. "Determination of anisotropy in the cosmic black body radiation," Physical Review Letters, 39, 898–901.
Stephenson, Richard. 1954. Introduction to Nuclear Engineering. New York: McGraw-Hill.
Thorne, Kip S. 1981. "Relativistic radiative transfer: Moment formalisms," Monthly Notices of the Royal Astronomical Society, 194, 439–473.
Thorne, Kip S. 1994. Black Holes and Time Warps: Einstein's Outrageous Legacy. New York: W. W. Norton.
Wheeler, John Archibald. 1998. Geons, Black Holes and Quantum Foam: A Life in Physics. New York: W. W. Norton, pp. 54–59.

Box 2.4 Important Concepts in Chapter 2

• Foundational Concepts
  – Momentum space and phase space, Secs. 2.2.1 and 2.2.3.
  – Distribution function, three important variants: $N = dN/dV_x\,dV_p$ (Secs. 2.2.1 and 2.2.3), mean occupation number η (Sec. 2.2.5), and for photons the specific intensity $I_\nu$ (Sec. 2.2.4).
  – Frame invariance of $N$, η, and $I_\nu/\nu^3$ in relativity theory, Sec. 2.2.3.
  – Liouville's theorem and collisionless Boltzmann equation (constancy of distribution function along a particle trajectory in phase space): Sec. 2.6.
  – Boltzmann transport equation with collisions: Eq. (2.63).
  – Thermal equilibrium: Fermi-Dirac, Bose-Einstein, and Boltzmann distribution functions: Sec. 2.3.
  – Electron degeneracy: Sec. 2.5.2.
  – Macroscopic properties of matter expressed as momentum-space integrals (density, pressure, stress tensor, stress-energy tensor): Sec. 2.4.
  – Two-lengthscale expansion: Box 2.3.
• Equations of State Computed Via Kinetic Theory
  – Computed using momentum-space integrals: Eq. (2.36) and preceding discussion.
  – Important cases: nonrelativistic, classical gas, Sec. 2.5.1; degenerate Fermi gas, Eq. (2.41) and Sec. 2.5.4; radiation, Sec. 2.5.5.
  – Density-temperature plane for hydrogen gas: Sec. 2.5.2 and Fig. 2.7.
• Transport Coefficients
  – Defined: beginning of Sec. 2.7.
  – Electrical conductivity, thermal conductivity, shear viscosity, and diffusion coefficient [Eqs. (2.67)].
  – Order-of-magnitude computations: Sec. 2.7.2.
  – Computations using the Boltzmann transport equation: Sec. 2.7.3.

Contents

3 Statistical Mechanics
  3.1 Overview
  3.2 Systems, Ensembles, and Distribution Function
    3.2.1 Systems
    3.2.2 Ensembles
    3.2.3 Distribution Function
  3.3 Liouville's Theorem and the Evolution of the Distribution Function
  3.4 Statistical Equilibrium
    3.4.1 Canonical Ensemble and Distribution
    3.4.2 General Equilibrium Ensemble and Distribution; Gibbs Ensemble; Grand Canonical Ensemble
    3.4.3 Bose-Einstein and Fermi-Dirac Distributions
  3.5 The Microcanonical Ensemble
  3.6 The Ergodic Hypothesis
  3.7 Entropy and Evolution Into Statistical Equilibrium
    3.7.1 Entropy and the Second Law of Thermodynamics
    3.7.2 What Causes the Entropy to Increase?
  3.8 Grand Canonical Ensemble for an Ideal Monatomic Gas
  3.9 Entropy Per Particle
  3.10 T2 Bose-Einstein Condensate
  3.11 T2 Statistical Mechanics in the Presence of Gravity
    3.11.1 T2 Galaxies
    3.11.2 T2 Black Holes
    3.11.3 The Universe
    3.11.4 T2 Structure Formation in the Expanding Universe: Violent Relaxation and Phase Mixing
  3.12 T2 Entropy and Information
    3.12.1 T2 Information Gained When Measuring the State of a System in a Microcanonical Ensemble
    3.12.2 T2 Information in Communication Theory
    3.12.3 T2 Examples of Information Content
    3.12.4 T2 Some Properties of Information
    3.12.5 T2 Capacity of Communication Channels; Erasing Information from Computer Memories

Chapter 3
Statistical Mechanics

Version 0803.1.K by Kip, October 16, 2008. Please send comments, suggestions, and errata via email to [email protected] and [email protected], or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 3.1 Reader's Guide
• Relativity enters into portions of this chapter solely via the relativistic energies and momenta of high-speed particles (Box 1.4). We presume that all readers are familiar with at least this much relativity and, accordingly, we do not provide a Newtonian track through this chapter. We will make occasional additional side remarks for the benefit of relativistic readers, but a Newtonian reader's failure to understand them will not compromise mastering all of this chapter's material.
• This chapter relies in crucial ways on Secs. 2.2 and 2.3 of Chap. 2.
• Chapter 4 is an extension of this chapter. To understand it and portions of Chap. 5, one must master the fundamental concepts of statistical mechanics (Secs. 3.2–3.4, 3.5–3.7, and 3.9), and also the application to an ideal, monatomic gas (Sec. 3.8).
• Other chapters do not depend strongly on this one.

3.1

Overview

While kinetic theory (Chap. 2) gives a powerful description of some statistical features of matter, other features are outside its realm and must be treated using the more sophisticated tools of statistical mechanics. Examples are:

(i) Correlations: Kinetic theory’s distribution function $\mathcal{N}$ tells us, on average, how many particles will occupy a given phase-space volume, but it says nothing about whether the particles like to clump or avoid each other. It is therefore inadequate to describe the distribution of galaxies, which aggregate under their mutual gravitational attraction, or that of electrons (Chap. 21), which are mutually repulsive and thus are spatially anti-correlated.

(ii) Fluctuations: In experiments to measure a very weak mechanical force (e.g. tests of the equivalence principle and searches for gravitational waves), one typically monitors the motion of a pendulum’s test mass, on which the force acts. Molecules of gas hitting the test mass also make it move. Kinetic theory predicts how many molecules will hit in one millisecond, on average, and how strong the resulting pressure acting in all directions is; but kinetic theory’s distribution function $\mathcal{N}$ cannot tell us the probability that in one millisecond more molecules will hit one side of the test mass than the other, mimicking the force to be measured. The probability distribution for fluctuations is an essential tool for analyzing the noise in this and any other physical experiment, and it falls in the domain of statistical mechanics, not kinetic theory.

(iii) Strongly interacting particles: As should be familiar, the thermal motions of an ionic crystal are best described not in terms of individual atoms (as in the “Einstein theory”), but instead by decomposing the atoms’ motion into normal modes (phonons; “Debye theory”). The thermal excitation of phonons is governed by statistical mechanics.

(iv) Microscopic origin of thermodynamic laws: The laws of classical thermodynamics can be (and often are) derived from a few elementary, macroscopic postulates without any reference to the microscopic, atomic nature of matter. Kinetic theory provides a microscopic foundation for some of thermodynamics’ abstract macroscopic ideas (e.g. the first law of thermodynamics) and permits the computation of equations of state.
However, a full appreciation of entropy and the second law of thermodynamics, and of behavior at phase transitions, requires the machinery of statistical mechanics.

In this chapter we shall develop the conceptual foundations for classical statistical mechanics and its interface with quantum physics, and we shall also delve deeply enough into the quantum world to be able to treat a few simple quantum problems. More specifically: In Sec. 3.2 we shall introduce the concepts of systems, ensembles of systems, and the distribution function for an ensemble. In Sec. 3.3 we shall use Hamiltonian dynamics to study the evolution of an ensemble’s distribution function and derive the statistical mechanical version of Liouville’s theorem. In Sec. 3.4, we shall develop the concept of statistical equilibrium and shall derive the general forms of distribution functions for ensembles of systems that have reached statistical equilibrium. In Sec. 3.5 we will study an illustrative example of statistical-equilibrium ensembles: the phenomenon of Bose-Einstein condensation. In Sec. 3.6 we shall explore a peculiar but important example of an equilibrium ensemble, one called (for historical reasons) microcanonical, and we shall learn its relationship to ergodicity. In Sec. 3.7 we shall introduce the concept of the entropy of an ensemble of systems and shall show that an ensemble of identical systems that are isolated from the external universe maximizes its entropy by evolving (via phase mixing and coarse-graining) into statistical equilibrium. Having laid all these foundations, we shall develop illustrative applications of them in Secs. 3.5, 3.8, 3.9 and a number of exercises. Our examples will include, in addition to Bose-Einstein condensation, a simple monatomic gas in both the nonrelativistic and ultrarelativistic domains, an ionized-hydrogen plasma, the mean occupation numbers of boson and fermion states, and stars, galaxies, black holes, and the universe as a whole.

For galaxies, black holes and the universe we will have to confront the role of gravity in statistical mechanics (Sec. 3.9). Finally, in Sec. 3.10 we will give a brief introduction to the concept of information and its connection to entropy.

3.2 Systems, Ensembles, and Distribution Function

3.2.1 Systems

Systems play in statistical mechanics the same role as is played by particles in kinetic theory. A system is any physical entity. (Obviously, this is an exceedingly general concept!) Examples are a galaxy, the sun, a sapphire crystal, the fundamental mode of vibration of that crystal, an aluminum atom in that crystal, an electron from that aluminum atom, a quantum state in which that electron could reside, and so on.

Statistical mechanics focuses special attention on systems that couple only weakly to the rest of the universe. Stated more precisely, we are interested in systems whose relevant “internal” evolution timescales, $\tau_{\rm int}$, are short compared with the “external” timescales, $\tau_{\rm ext}$, on which they exchange energy, entropy, particles, etc. with their surroundings. Such systems are said to be semiclosed, and in the idealized limit where one completely ignores their external interactions, they are said to be closed. The statistical-mechanics formalism for dealing with them relies on the assumption $\tau_{\rm int}/\tau_{\rm ext} \ll 1$; in this sense, it is a variant of a two-lengthscale expansion (Box 2.3).

Some examples will elucidate these concepts: For a galaxy of, say, $10^{11}$ stars, $\tau_{\rm int}$ is the time it takes a star to cross the galaxy, $\tau_{\rm int} \sim 10^{8}$ yr. The external timescale is the time since the galaxy’s last collision with a neighboring galaxy or the time since it was born by separating from the material that formed neighboring galaxies; both these times are $\tau_{\rm ext} \sim 10^{10}$ yr, so $\tau_{\rm int}/\tau_{\rm ext} \sim 1/100$ and the galaxy is semiclosed. For a small volume of gas inside the sun, say 1 m on a side, $\tau_{\rm int}$ is the timescale for the constituent electrons, ions and photons to interact through collisions, typically $\tau_{\rm int} \sim 10^{-11}$ s; this is much smaller than the time for external heat or particles to diffuse from the cube’s surface to its center, $\tau_{\rm ext} \sim 10^{-5}$ s, so the cube is semiclosed.
An individual atom in a crystal is so strongly coupled to its neighboring atoms by electrostatic forces that $\tau_{\rm int} \sim \tau_{\rm ext}$, which means the atom is not semiclosed. By contrast, for a vibrational mode of the crystal, $\tau_{\rm int}$ is the mode’s vibration period and $\tau_{\rm ext}$ is the time to exchange energy with other modes and thereby damp the chosen mode’s vibrations; quite generally, the damping time is far longer than the period, so the mode is semiclosed. (For a highly polished, cold sapphire crystal, $\tau_{\rm ext}$ can be $\sim 10^{9}\,\tau_{\rm int}$.) Therefore, it is the crystal’s vibrational normal modes and not its atoms that are amenable to the statistical mechanical tools we shall develop.

When a semiclosed classical system is idealized as closed, so its interactions with the external universe are ignored, then its evolution can be described using Hamiltonian dynamics.¹ The system’s classical state is described by generalized coordinates $q \equiv \{q_j\}$ and generalized momenta $p \equiv \{p_j\}$, where the index $j$ runs from 1 to $W$ = (the number of degrees of freedom). The evolution of $q, p$ is governed by Hamilton’s equations

$$\frac{dq_j}{dt} = \frac{\partial H}{\partial p_j}, \qquad \frac{dp_j}{dt} = -\frac{\partial H}{\partial q_j}, \tag{3.1}$$

where $H(q,p)$ is the Hamiltonian and each equation is really $W$ separate equations. Note that, because the system is idealized as closed, there is no explicit time dependence in the Hamiltonian. Of course, not all physical systems (e.g. not those with strong internal dissipation) are describable by Hamiltonian dynamics, though in principle this restriction can usually be circumvented by increasing the number of degrees of freedom to include the cause of the dissipation.

¹ See, for example, Goldstein et al. (2002) or Thornton and Marion (2004).

Let us return to our examples. For an individual star inside a galaxy, there are three degrees of freedom ($W = 3$), which we might choose to be the motion along three mutually orthogonal Cartesian directions, so $q_1 = x$, $q_2 = y$, $q_3 = z$. Because the star’s speed is small compared to that of light, its Hamiltonian has the standard form for a nonrelativistic particle,

$$H(q,p) = \frac{1}{2m}\left(p_1^2 + p_2^2 + p_3^2\right) + m\Phi(q_1, q_2, q_3). \tag{3.2a}$$

Here $m$ is the stellar mass and $\Phi(q_1, q_2, q_3)$ is the galaxy’s Newtonian gravitational potential (whose sign we take to be negative). Now make a canonical transformation² (in this case the same as a coordinate transformation) to new coordinates $Q_1 = r$, $Q_2 = \theta$, $Q_3 = \phi$, where $(r, \theta, \phi)$ are the star’s spherical polar coordinates with $r$ measured from the center of the galaxy. The corresponding, canonically conjugate momenta turn out to be $P_1 = p_r$, $P_2 = r p_\theta$, $P_3 = r\sin\theta\, p_\phi$, where $p_r$, $p_\theta$ and $p_\phi$ are the components of the star’s momentum along unit vectors that point in the $r$, $\theta$, and $\phi$ directions. In terms of these new coordinates, the Hamiltonian (3.2a) takes the form

$$H = \frac{1}{2m}\left(P_1^2 + \frac{P_2^2}{r^2} + \frac{P_3^2}{r^2\sin^2\theta}\right) + m\Phi(r, \theta, \phi), \tag{3.2b}$$

and Hamilton’s equations with this Hamiltonian continue to govern the star’s motion.

Now consider not just one star, but $K \sim 10^{11}$ of them in a galaxy. There are now $W = 3K$ degrees of freedom, and the Hamiltonian is simply the sum of the Hamiltonians for the individual stars, so long as we ignore interactions between stars.

If our system is the fundamental mode of a sapphire crystal, then the number of degrees of freedom is only $W = 1$, and we can take the single generalized coordinate $q$ to be the displacement of one end of the crystal from equilibrium; there will be an “effective mass” $M$ for the mode (approximately equal to the actual mass of the crystal) such that the mode’s generalized momentum is $p = M\,dq/dt$. The Hamiltonian will be the standard one for a harmonic oscillator:

$$H(p,q) = \frac{p^2}{2M} + \frac{1}{2}M\omega^2 q^2, \tag{3.3a}$$

where $\omega$ is the mode’s angular frequency of oscillation.

² See Ex. 3.1; also Goldstein et al. (2002) or Thornton and Marion (2004).
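A quick numerical sanity check of the canonical transformation from Eq. (3.2a) to Eq. (3.2b) may help here. At any phase-space point the two Hamiltonians should have the same value, even though their functional forms differ. The sketch below is illustrative only. The Plummer-like potential $\Phi = -1/\sqrt{r^2+1}$ (in units with $GM = 1$), the stellar mass, and the sample point are arbitrary assumptions, not taken from the text.

```python
import math

def phi_cartesian(x, y, z):
    """Hypothetical Plummer-like potential (G*M = 1, core radius 1); always negative."""
    return -1.0 / math.sqrt(x * x + y * y + z * z + 1.0)

def phi_spherical(r, theta, phi):
    """The same hypothetical potential written in spherical polar coordinates."""
    return -1.0 / math.sqrt(r * r + 1.0)

def H_cartesian(x, y, z, px, py, pz, m, Phi):
    """Eq. (3.2a) with q = (x, y, z)."""
    return (px**2 + py**2 + pz**2) / (2 * m) + m * Phi(x, y, z)

def to_spherical(x, y, z, px, py, pz):
    """(x, y, z, px, py, pz) -> (r, theta, phi, P1, P2, P3) with
    P1 = p_r, P2 = r p_theta, P3 = r sin(theta) p_phi."""
    r = math.sqrt(x * x + y * y + z * z)
    theta, phi = math.acos(z / r), math.atan2(y, x)
    st, ct, sp, cp = math.sin(theta), math.cos(theta), math.sin(phi), math.cos(phi)
    # Physical momentum components along the unit vectors e_r, e_theta, e_phi.
    p_r = px * st * cp + py * st * sp + pz * ct
    p_theta = px * ct * cp + py * ct * sp - pz * st
    p_phi = -px * sp + py * cp
    return r, theta, phi, p_r, r * p_theta, r * st * p_phi

def H_spherical(r, theta, phi, P1, P2, P3, m, Phi):
    """Eq. (3.2b)."""
    kinetic = (P1**2 + P2**2 / r**2 + P3**2 / (r**2 * math.sin(theta)**2)) / (2 * m)
    return kinetic + m * Phi(r, theta, phi)

z0 = (1.0, -2.0, 0.5, 0.3, 0.7, -1.1)   # arbitrary sample point (x, y, z, px, py, pz)
print(H_cartesian(*z0, 2.0, phi_cartesian),
      H_spherical(*to_spherical(*z0), 2.0, phi_spherical))
```

Both printed values should agree to rounding error. Evaluating at other points is an easy extension, but keep away from $\theta = 0$ or $\pi$, where $P_3^2/\sin^2\theta$ is singular.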

If we want to describe a whole crystal with $N \sim 10^{27}$ atoms, then we obtain $H$ by summing over $W = 3N$ oscillator Hamiltonians for the crystal’s $W$ normal modes and adding an interaction potential $H_{\rm int}$ that accounts for the very weak interactions between modes:

$$H = \sum_{j=1}^{W}\left(\frac{p_j^2}{2M_j} + \frac{1}{2}M_j\omega_j^2 q_j^2\right) + H_{\rm int}(q_1, \ldots, q_W, p_1, \ldots, p_W). \tag{3.3b}$$

Here $M_j$ is the effective mass of mode $j$ and $\omega_j$ is the mode’s angular frequency. This description of the crystal is preferable to one in which we use, as our generalized coordinates and momenta, the coordinate locations and momentum components of each of the $10^{27}$ atoms. Why? Because the normal modes are so weakly coupled to each other that they are semiclosed subsystems of the crystal, whereas the atoms are so strongly coupled that they are not, individually, semiclosed. As we shall see, there is great power in decomposing a complicated system into semiclosed subsystems.
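The sketch below applies Hamilton’s equations (3.1) to Eq. (3.3b). It integrates a few uncoupled modes, setting $H_{\rm int} = 0$ as an assumption, and checks that each mode’s energy stays essentially constant, as expected for modes that do not exchange energy. All of the inputs are hypothetical: the masses, frequencies, initial conditions, and the choice of a kick-drift-kick leapfrog integrator. Leapfrog is a standard symplectic scheme, not something the text prescribes.

```python
def leapfrog_modes(q, p, M, omega, dt, steps):
    """Kick-drift-kick integration of Hamilton's equations (3.1) for Eq. (3.3b)
    with H_int = 0, i.e. W independent harmonic-oscillator modes."""
    q, p = list(q), list(p)
    for _ in range(steps):
        for j in range(len(q)):
            p[j] -= 0.5 * dt * M[j] * omega[j] ** 2 * q[j]   # half kick: dp_j/dt = -dH/dq_j
            q[j] += dt * p[j] / M[j]                          # drift:     dq_j/dt = +dH/dp_j
            p[j] -= 0.5 * dt * M[j] * omega[j] ** 2 * q[j]   # half kick
    return q, p

def mode_energies(q, p, M, omega):
    """Energy p_j^2/(2 M_j) + M_j omega_j^2 q_j^2 / 2 of each mode separately."""
    return [p[j] ** 2 / (2 * M[j]) + 0.5 * M[j] * omega[j] ** 2 * q[j] ** 2
            for j in range(len(q))]

# Hypothetical three-mode "crystal": effective masses and angular frequencies.
M = [1.0, 0.8, 1.3]
omega = [1.0, 2.0, 3.0]
q0, p0 = [0.1, -0.2, 0.05], [0.0, 0.3, -0.1]

E0 = mode_energies(q0, p0, M, omega)
q1, p1 = leapfrog_modes(q0, p0, M, omega, dt=1e-3, steps=10_000)
E1 = mode_energies(q1, p1, M, omega)
print(E0)
print(E1)
```

With the step size used here ($\omega_j\,dt \le 0.003$), the leapfrog energy error stays bounded at the level of $(\omega_j\,dt)^2$ rather than drifting. This is one reason symplectic schemes are a natural choice for Hamiltonian systems.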

3.2.2 Ensembles

In kinetic theory we study, statistically, a collection of a huge number of particles. Similarly, in statistical mechanics, we study, statistically, a collection or ensemble of a huge number of systems. This ensemble is actually only a conceptual device, a foundation for statistical arguments that take the form of thought experiments. As we shall see, there are many different ways that one can imagine forming an ensemble, and this freedom can be used to solve many different types of problems.

In some applications, we require that all the systems in the ensemble be closed, and be identical in the sense that they all have identically the same number of degrees of freedom $W$, are governed by Hamiltonians with identically the same functional forms $H(q,p)$, and have identically the same volume $V$ and total internal energy $E$. However, the values of the generalized coordinates and momenta at a specific time $t$, $\{q(t), p(t)\}$, need not be the same; i.e., the systems need not be in the same state at time $t$. If such a conceptual ensemble of identical closed systems (first studied by Boltzmann) evolves until it reaches “statistical equilibrium” (Sec. 3.4), it then is called microcanonical; see Table 3.1.

Ensemble            Quantities Exchanged with Surroundings
Microcanonical      nothing
Canonical           energy $E$
Gibbs               energy $E$ and volume $V$
Grand canonical     energy $E$ and number of particles $N_I$ of various species $I$

Table 3.1: Statistical-equilibrium ensembles used in this chapter.

Sometimes we will deal with an ensemble of systems that can exchange energy (heat) with their identical surroundings, so the internal energy of each system can fluctuate. If the surroundings (sometimes called heat baths) have far greater heat capacity than the individual systems, and if statistical equilibrium has been reached, then we call this sort of ensemble (introduced by Gibbs) canonical. At the next level of freedom, the systems can also expand, i.e. they can exchange volume as well as energy with their identical surroundings. This was also studied by Gibbs, and in equilibrium it is known as the Gibbs ensemble. A fourth ensemble in common use is Pauli’s grand canonical ensemble, in which each system can exchange energy and particles (but not volume) with its surroundings; see Table 3.1. We will study these equilibrium ensembles in Sec. 3.4 below.

3.2.3 Distribution Function

In kinetic theory (Chap. 2), we described the statistical properties of a collection of identical particles by a distribution function, and we found it useful to tie that distribution function’s normalization to quantum theory: $\eta(t;\mathbf{x},\mathbf{p})$ = (mean number of particles that occupy a quantum state at location $\{\mathbf{x},\mathbf{p}\}$ in 6-dimensional phase space at time $t$). In statistical mechanics, we will use the obvious generalization of this: $\eta$ = (mean number of systems that occupy a quantum state at location $\{q,p\}$ in an ensemble’s $2W$-dimensional phase space, at time $t$), except that we need two modifications.

First: This $\eta$ is proportional to the number of systems $N_{\rm sys}$ in our ensemble. (If we double $N_{\rm sys}$, then $\eta$ will double.) Because our ensemble is only a conceptual device, we don’t really care how many systems it contains, so we divide $\eta$ by $N_{\rm sys}$ to get a renormalized, $N_{\rm sys}$-independent distribution function, $\rho = \eta/N_{\rm sys}$, whose physical interpretation is

$$\rho(t;q,p) = \left(\begin{array}{l}\text{probability that a system, drawn randomly from our ensemble,}\\ \text{will be in a quantum state at location } (q,p) \text{ in phase space, at time } t\end{array}\right). \tag{3.4}$$

Second: If the systems of our ensemble can exchange particles with the external universe (as is the case, for example, in the grand canonical ensemble of Table 3.1), then their number $W$ of degrees of freedom can change, so $\rho$ depends on $W$ as well as on location in the $2W$-dimensional phase space: $\rho(t;W,q,p)$.

In the sector of the system’s phase space with $W$ degrees of freedom, denote the number density of quantum states by

$$\mathcal{N}_{\rm states}(W,q,p) = \frac{dN_{\rm states}}{d^Wq\,d^Wp} \equiv \frac{dN_{\rm states}}{d\Gamma_W}. \tag{3.5}$$

Here

$$d^Wq \equiv dq_1\,dq_2\cdots dq_W, \qquad d^Wp \equiv dp_1\,dp_2\cdots dp_W, \qquad d\Gamma_W \equiv d^Wq\,d^Wp. \tag{3.6}$$

Then the sum of the occupation probability $\rho$ over all quantum states, which must (by the meaning of probability) be unity, takes the following form:

$$\sum_n \rho_n = \sum_W \int \rho\,\mathcal{N}_{\rm states}\,d\Gamma_W = 1. \tag{3.7}$$

Here, on the left side $n$ is a formal index that labels the various quantum states $|n\rangle$ available to the ensemble’s systems; and on the right side the sum is over all possible values of the

system’s dimensionality $W$, and the integral is over all of the $2W$-dimensional phase space, with $d\Gamma_W$ a shorthand notation for the phase-space integration element $d^Wq\,d^Wp$.

Equations (3.4)–(3.7) require some discussion: Just as the event $\{t,\mathbf{x}\}$ and 4-momentum $\{E,\mathbf{p}\}$ in relativistic kinetic theory are geometric, frame-independent objects, similarly location in phase space in statistical mechanics is a geometric, coordinate-independent concept (though our notation does not emphasize it). The quantities $\{q,p\} \equiv \{q_1, q_2, \ldots, q_W, p_1, p_2, \ldots, p_W\}$ are the coordinates of that phase-space location. When one makes a canonical transformation from one set of generalized coordinates and momenta to another, the $q$’s and $p$’s change but the geometric location in phase space does not. Moreover, just as the individual spatial and momentum volumes $dV_x$ and $dV_p$ occupied by a set of relativistic particles in kinetic theory are frame-dependent but their product $dV_x\,dV_p$ is frame-independent [cf. Eqs. (2.8a)–(2.8c)], so also in statistical mechanics the volumes $d^Wq$ and $d^Wp$ occupied by some chosen set of systems are dependent on the choice of canonical coordinates and they change under a canonical transformation, but the product $d^Wq\,d^Wp \equiv d\Gamma_W$ (the systems’ total volume in phase space) is independent of the choice of canonical coordinates and is unchanged by a canonical transformation. Correspondingly, the number density of states in phase space $\mathcal{N}_{\rm states} = dN_{\rm states}/d\Gamma_W$ and the statistical mechanical distribution function $\rho(t;W,q,p)$, like their kinetic-theory counterparts, are geometric, coordinate-independent quantities; i.e., they are unchanged by a canonical transformation. See Ex. 3.1.

Classical thermodynamics was one of the crowning achievements of nineteenth-century science. However, thermodynamics was inevitably incomplete and had to remain so until the development of quantum theory. A major difficulty, one that we have already confronted in Chap. 2, was how to count the number of states available to a system. As we saw in Chap. 2, the number density of quantum mechanical states in the 6-dimensional, single-particle phase space of kinetic theory is (ignoring particle spin) $\mathcal{N}_{\rm states} = 1/h^3$, where $h$ is Planck’s constant. Generalizing to the $2W$-dimensional phase space of statistical mechanics, the number density of states turns out to be $1/h^W$ (one factor of $1/h$ for each of the canonical pairs $(q_1,p_1), (q_2,p_2), \ldots, (q_W,p_W)$). Formally, this follows from the canonical quantization procedure of elementary quantum mechanics.

There was a second problem in nineteenth-century classical thermodynamics, that of distinguishability: If we swap two similar atoms in phase space, do we have a new state or not? If we mix two containers of the same gas at the same temperature and pressure, does the entropy increase? This problem was recognized classically, but was not resolved in a completely satisfactory classical manner. When the laws of quantum mechanics were developed, it became clear that all identical particles are indistinguishable (Ex. 3.9 below), so having particle 1 at location A in phase space and an identical particle 2 at location B must be counted as the same state as particle 1 at B and particle 2 at A. Correspondingly, if we attribute half the quantum state to the classical phase-space location {1 at A, 2 at B} and the other half to {1 at B, 2 at A}, then the classical number density of states per unit volume of phase space must be reduced by a factor 2, and more generally by some multiplicity factor $\mathcal{M}$. In general, therefore, we can write the actual number density of

states in phase space as

$$\mathcal{N}_{\rm states} = \frac{dN_{\rm states}}{d\Gamma_W} = \frac{1}{\mathcal{M}h^W}, \tag{3.8a}$$

and correspondingly we can rewrite the normalization condition (3.7) for our probabilistic distribution function as

$$\sum_n \rho_n \equiv \sum_W \int \rho\,\mathcal{N}_{\rm states}\,d\Gamma_W = \sum_W \int \rho\,\frac{d\Gamma_W}{\mathcal{M}h^W} = 1. \tag{3.8b}$$

This equation can be regarded, in the classical domain, as defining the meaning of the sum over states $n$. We shall make extensive use of such sums over states.

For $N$ identical and indistinguishable particles with zero spin, it is not hard to see that $\mathcal{M} = N!$. If we include the effects of quantum mechanical spin, then each particle has $g_s$ [Eq. (2.15)] spin states, so there are $g_s^N$ times more states present in phase space than we thought, and the multiplicity $\mathcal{M}$ (the number of different phase-space locations to be attributed to each state) is reduced to

$$\mathcal{M} = \frac{N!}{g_s^N} \quad \text{for a system of } N \text{ identical particles with spin } s. \tag{3.8c}$$

This is the quantity that appears in the denominator of the sum over states, Eq. (3.8b).

Occasionally, for conceptual purposes, it is useful to introduce a renormalized distribution function $\mathcal{N}_{\rm sys}$, analogous to kinetic theory’s number density of particles in phase space:

$$\mathcal{N}_{\rm sys} = N_{\rm sys}\,\mathcal{N}_{\rm states}\,\rho = \frac{d(\text{number of systems})}{d(\text{volume in } 2W\text{-dimensional phase space})}. \tag{3.9}$$

However, this version of the distribution function will rarely, if ever, be useful computationally.

Each system in an ensemble is endowed with a total energy that is equal to its Hamiltonian, $E = H(q,p)$ [or $\mathcal{E} = H(q,p)$ nonrelativistically, where $\mathcal{E}$ is the energy excluding rest mass]. Because different systems reside at different locations $(q,p)$ in phase space, they typically will have different energies. A quantity of much interest is the ensemble-averaged energy, which is the average value of $E$ over all systems in the ensemble:

$$\langle E\rangle = \sum_n \rho_n E_n = \sum_W \int \rho\,E\,\mathcal{N}_{\rm states}\,d\Gamma_W = \sum_W \int \rho\,E\,\frac{d\Gamma_W}{\mathcal{M}h^W}. \tag{3.10a}$$

For any other function $A(q,p)$ defined on the phase space of a system, for example the linear momentum or the angular momentum, one can compute an ensemble average by the obvious analog of Eq. (3.10a):

$$\langle A\rangle = \sum_n \rho_n A_n. \tag{3.10b}$$
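The counting rules in Eqs. (3.8a)–(3.10a) can be checked on small examples. The sketch below works under illustrative assumptions: units with $\hbar = 1$, and a single oscillator with $M = \omega = 1$. It does three things:

(i) It compares the phase-space area of the ellipse $H \le E$, divided by $h$, with the exact count of oscillator levels $E_n = \hbar\omega(n + \tfrac12)$ from elementary quantum mechanics.
(ii) It checks the $N!$ of Eq. (3.8c) for spinless particles by counting labeled versus unlabeled placements in distinct phase-space cells.
(iii) It estimates $\langle E\rangle$ by Monte Carlo for a hypothetical ensemble whose $\rho$ is uniform inside $H \le E_{\max}$.

All helper names are hypothetical.

```python
import math
import random
from itertools import combinations, permutations

HBAR = 1.0                       # illustrative units (assumption): hbar = 1, so h = 2*pi
H_PLANCK = 2 * math.pi * HBAR

def oscillator_energy(q, p, M=1.0, omega=1.0):
    """Harmonic-oscillator Hamiltonian, Eq. (3.3a)."""
    return p * p / (2 * M) + 0.5 * M * omega**2 * q * q

def semiclassical_count(E, M=1.0, omega=1.0):
    """Number of states with H <= E from Eq. (3.8a) with W = 1 and multiplicity 1:
    the area pi*q_max*p_max of the ellipse H <= E, divided by h."""
    q_max = math.sqrt(2 * E / (M * omega**2))
    p_max = math.sqrt(2 * M * E)
    return math.pi * q_max * p_max / H_PLANCK

def exact_count(E, omega=1.0):
    """Number of levels E_n = hbar*omega*(n + 1/2) with E_n <= E."""
    if E < 0.5 * HBAR * omega:
        return 0
    return math.floor(E / (HBAR * omega) - 0.5) + 1

def multiplicity(cells, n):
    """Labeled placements of n spinless particles in distinct cells divided by the
    number of placements of indistinguishable particles (multiple occupancy of a
    cell is excluded here for simplicity); the result should be n!."""
    labeled = sum(1 for _ in permutations(range(cells), n))
    unlabeled = sum(1 for _ in combinations(range(cells), n))
    return labeled // unlabeled

def ensemble_average_energy(E_max, M=1.0, omega=1.0, samples=200_000, seed=1):
    """Monte Carlo estimate of <E>, Eq. (3.10a), for a hypothetical ensemble whose rho
    is uniform inside H <= E_max and zero outside; rho's normalization cancels."""
    rng = random.Random(seed)
    q_max = math.sqrt(2 * E_max / (M * omega**2))
    p_max = math.sqrt(2 * M * E_max)
    total, accepted = 0.0, 0
    for _ in range(samples):
        q = rng.uniform(-q_max, q_max)
        p = rng.uniform(-p_max, p_max)
        E = oscillator_energy(q, p, M, omega)
        if E <= E_max:           # rejection sampling: accepted points are uniform in the ellipse
            total += E
            accepted += 1
    return total / accepted

print(semiclassical_count(1000.3), exact_count(1000.3))
print(multiplicity(6, 3), math.factorial(3))
print(ensemble_average_energy(3.0))
```

For $E = 1000.3$ the semiclassical count is 1000.3, while the exact count is 1000. The Monte Carlo estimate should approach $E_{\max}/2$. This follows because, in the scaled variables $q/q_{\max}$ and $p/p_{\max}$, the ensemble is uniform on a unit disk, where the mean squared radius is 1/2.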

Our probabilistic distribution function ρn = ρ(t; W, q, p) has deeper connections to quantum theory than the above discussion reveals: In the quantum domain, even if we start with a system whose wave function ψ is in a pure state (ordinary, everyday type of quantum

state), the system may evolve into a mixed state as a result of (i) interaction with the rest of the universe and (ii) our choice not to keep track of correlations between the universe and the system; see Box 3.2 and Sec. 3.7.2 below. The system’s initial, pure state can be described in geometric, basis-independent quantum language by a state vector (“ket”) $|\psi\rangle$; but its final, mixed state requires a different kind of quantum description: a density operator $\hat\rho$. In the classical limit, the quantum mechanical density operator $\hat\rho$ becomes our classical probabilistic distribution function $\rho(t;W,q,p)$; see Box 3.2 for some details.

****************************
EXERCISES

Exercise 3.1 Derivation and Example: Canonical Transformation¹
Let $(q_j, p_k)$ be one set of generalized coordinates and momenta for a given system and let $(Q_j, P_k)$ be another set. Then (except in degenerate cases, for which a different generating function must be used) there is a generating function $F(q_j, P_k)$, which depends on the “old” coordinates $q_j$ and “new” momenta $P_k$, such that

$$p_j = \frac{\partial F}{\partial q_j}, \qquad Q_j = \frac{\partial F}{\partial P_j}. \tag{3.11}$$

(a) As an example, what are the new coordinates and momenta in terms of the old that result from

$$F = \sum_{i=1}^{W} f_i(q_j)\,P_i, \tag{3.12}$$

where $f_i$ are arbitrary functions of the old coordinates?

(b) The canonical transformation generated by Eq. (3.11) for arbitrary $F(q_j, P_k)$ leaves unchanged the value, but not the functional form, of the Hamiltonian at each point in phase space; i.e., $H$ is a geometric, coordinate-independent function (scalar field) of location in phase space. Show, for the special case of a system with one degree of freedom (one $q$, one $p$, one $Q$, and one $P$), that if Hamilton’s equations (3.1) are satisfied in the old variables $(q,p)$, then they will be satisfied in the new variables $(Q,P)$.

(c) Show, for a system with one degree of freedom, that although $dq \neq dQ$ and $dp \neq dP$, the volume in phase space is unaffected by the canonical transformation: $dp\,dq = dP\,dQ$.

(d) Hence show that for any closed path in phase space, $\oint p\,dq = \oint P\,dQ$.

(e) As a higher-dimensional example ($W = 3$), consider a star moving in a galaxy [Eqs. (3.2a), (3.2b) and associated discussion]. Show that $d^3q = dx\,dy\,dz \neq d^3Q = dr\,d\theta\,d\phi$, and $d^3p \neq d^3P$, but $d^3q\,d^3p = d^3Q\,d^3P$.

****************************
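Parts (c) and (e) of Exercise 3.1 can be explored numerically. To keep the example small, the sketch below assumes the planar ($W = 2$) analog of the transformation in Eq. (3.2b), with $Q = (r, \phi)$ and $P = (p_r, r\,p_\phi)$. It estimates Jacobian determinants by central finite differences. The full phase-space Jacobian should be 1. The position-only Jacobian $\partial(r,\phi)/\partial(x,y)$ should equal $1/r$, so $d^2q \neq d^2Q$ even though $d^2q\,d^2p = d^2Q\,d^2P$. This check does not replace the analytic proof the exercise asks for.

```python
import math

def polar_canonical(x, y, px, py):
    """(q, p) = (x, y, px, py) -> (Q, P) = (r, phi, p_r, r * p_phi), where p_r and
    p_phi are the momentum components along the unit vectors e_r and e_phi."""
    r, phi = math.hypot(x, y), math.atan2(y, x)
    c, s = x / r, y / r
    p_r = px * c + py * s
    p_phi = -px * s + py * c
    return [r, phi, p_r, r * p_phi]

def polar_positions(x, y):
    """Position part only: (x, y) -> (r, phi)."""
    return [math.hypot(x, y), math.atan2(y, x)]

def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, det = len(a), 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda row: abs(a[row][i]))
        if a[pivot][i] == 0.0:
            return 0.0
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            det = -det
        det *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return det

def jacobian_determinant(f, z, h=1e-6):
    """det(dF_i/dz_j), estimated by central finite differences."""
    n = len(z)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        zp, zm = list(z), list(z)
        zp[j] += h
        zm[j] -= h
        fp, fm = f(*zp), f(*zm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return determinant(J)

point = [1.2, 0.8, 0.3, -0.5]   # arbitrary (x, y, px, py), away from the branch cut of atan2
print(jacobian_determinant(polar_canonical, point))      # full phase space: close to 1
print(jacobian_determinant(polar_positions, point[:2]))  # positions only: close to 1/r
```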

Box 3.2
T2 Density Operator and Quantum Statistical Mechanics

For readers who know some quantum statistical mechanics, we here describe briefly the connection of our probabilistic distribution function $\rho$ to the full quantum statistical theory as laid out, e.g., in Feynman (1972).

Consider a single quantum mechanical system that is in a pure state $|\psi\rangle$. One can formulate the theory of such a pure state equally well in terms of $|\psi\rangle$ or the density operator $\hat\varrho \equiv |\psi\rangle\langle\psi|$. For example, the expectation value of some observable, described by a Hermitian operator $\hat A$, can be expressed equally well as $\langle A\rangle = \langle\psi|\hat A|\psi\rangle$ or as $\langle A\rangle = {\rm Trace}(\hat\varrho\hat A)$. (In any basis $|\phi_i\rangle$, “Trace” is just the trace of the matrix product $\sum_j \varrho_{ij}A_{jk}$, where $A_{jk} \equiv \langle\phi_j|\hat A|\phi_k\rangle$, and $\varrho_{ij} \equiv \langle\phi_i|\hat\varrho|\phi_j\rangle$ is called the density matrix in that basis.)

If our chosen system interacts with the external universe and we have no knowledge of the correlations that the interaction creates between the system and the universe, then the interaction drives the system into a mixed state, which is describable by a density operator $\hat\varrho$ but not by a ket vector $|\psi\rangle$. This $\hat\varrho$ can be regarded as a classical-type average of $|\psi\rangle\langle\psi|$ over an ensemble of systems, each of which has interacted with the external universe and then has been driven into a pure state $|\psi\rangle$ by a measurement of the universe. Equivalently, $\hat\varrho$ can be constructed from the pure state of universe plus system by “tracing over the universe’s degrees of freedom”.

If the systems in the ensemble behave nearly classically, then it turns out that in the basis $|\phi_n\rangle$, whose states are labeled by the classical variables $n = \{W, q, p\}$, the density matrix $\varrho_{nm} \equiv \langle\phi_n|\hat\varrho|\phi_m\rangle$ is very nearly diagonal. The classical probability $\rho_n$ of classical statistical mechanics (and of this book when dealing with classical or quantum systems) is then equal to the diagonal value of this density matrix: $\rho_n = \varrho_{nn}$.

It can be demonstrated that the equation of motion for the density operator $\hat\varrho$, when the systems in the quantum mechanical ensemble are all evolving freely (no significant interactions with the external universe), is

$$\frac{\partial\hat\varrho}{\partial t} + \frac{1}{i\hbar}[\hat\varrho, \hat H] = 0. \tag{1}$$

This is the quantum statistical analog of Liouville’s equation (3.14), and the quantum mechanical commutator $[\hat\varrho, \hat H]$ (divided by $i\hbar$) appearing here is the quantum mechanical analog of the Poisson bracket $[\rho, H]_{q,p}$, which appears in Liouville’s equation. If the quantum systems are in eigenstates of their Hamiltonians, then $\hat\varrho$ commutes with $\hat H$, so the density matrix is constant in time and there will be no transitions. This is the quantum analog of the classical $\rho$ being constant in time and thus a constant of the motion; see Sec. 3.4 below.
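Here is a small numerical illustration of Box 3.2, assuming a two-level system. The state, the Pauli-matrix observables, and the probabilities 0.3 and 0.7 are arbitrary choices. For a pure state, ${\rm Trace}(\hat\varrho\hat A)$ reproduces $\langle\psi|\hat A|\psi\rangle$. For a mixed state built as a classical-type average of $|\psi\rangle\langle\psi|$ over an ensemble, the diagonal density-matrix elements are the probabilities, $\rho_n = \varrho_{nn}$. Plain nested lists are used instead of a linear-algebra library to keep the sketch self-contained.

```python
import cmath
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(a):
    """Sum of the diagonal elements."""
    return sum(a[i][i] for i in range(len(a)))

def density_matrix(psi):
    """Matrix of |psi><psi| in the chosen basis: rho_ij = psi_i * conj(psi_j)."""
    n = len(psi)
    return [[psi[i] * psi[j].conjugate() for j in range(n)] for i in range(n)]

def expectation(psi, A):
    """<psi|A|psi>, computed directly from the ket."""
    n = len(psi)
    A_psi = [sum(A[i][j] * psi[j] for j in range(n)) for i in range(n)]
    return sum(psi[i].conjugate() * A_psi[i] for i in range(n))

SIGMA_X = [[0.0, 1.0], [1.0, 0.0]]   # Pauli matrices (illustrative observables)
SIGMA_Z = [[1.0, 0.0], [0.0, -1.0]]

# Pure state (|0> + e^{i pi/3} |1>) / sqrt(2).
psi = [1 / math.sqrt(2), cmath.exp(1j * math.pi / 3) / math.sqrt(2)]
rho_pure = density_matrix(psi)
print(expectation(psi, SIGMA_X), trace(matmul(rho_pure, SIGMA_X)))

# Mixed state: classical-type average of |0><0| and |1><1| with probabilities 0.3 and 0.7.
weights = [(0.3, [1.0, 0.0]), (0.7, [0.0, 1.0])]
rho_mixed = [[sum(w * density_matrix(ket)[i][j] for w, ket in weights) for j in range(2)]
             for i in range(2)]
print([rho_mixed[n][n] for n in range(2)], trace(matmul(rho_mixed, SIGMA_Z)))
```

For the pure state, both ways of computing $\langle\sigma_x\rangle$ give $\cos(\pi/3) = 0.5$. For the mixed state, the diagonal is $(0.3, 0.7)$ and $\langle\sigma_z\rangle = 0.3 - 0.7 = -0.4$.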

3.3 Liouville’s Theorem and the Evolution of the Distribution Function

In kinetic theory the distribution function N was not only a frame-independent entity; it was also a constant along the trajectory of any freely moving particle, so long as collisions

between particles were negligible. Similarly, in statistical mechanics the probability $\rho$ is not only coordinate-independent (unaffected by canonical transformations); $\rho$ is also a constant along the phase-space trajectory of any freely evolving system, so long as the systems in the ensemble are not interacting significantly with the external universe, i.e. so long as they can be idealized as closed. This is the statistical mechanical version of Liouville’s theorem, and its proof is a simple exercise in Hamiltonian mechanics, analogous to the “sophisticated” proof of the collisionless Boltzmann equation in Box 2.2.

Since the ensemble’s systems are closed, no system changes its dimensionality $W$ during its evolution. This permits us to fix $W$ in the proof. Since no systems are created or destroyed during their evolution, the number density of systems in phase space, $\mathcal{N}_{\rm sys} = N_{\rm sys}\,\mathcal{N}_{\rm states}\,\rho$ [Eq. (3.9)], must obey the same kind of conservation law as we encountered in Eq. (1.73) for electric charge and particle number in Newtonian physics. For particle number, the conservation law is $\partial n/\partial t + \nabla\cdot(n\mathbf{v}) = 0$, where $n$ is the number density of particles in physical space, $\mathbf{v}$ is their velocity in physical space, and $n\mathbf{v}$ is their flux. Our ensemble’s systems have velocity $dq_j/dt = \partial H/\partial p_j$ in physical space, and “velocity” $dp_j/dt = -\partial H/\partial q_j$ in momentum space, so the conservation law (valid for $\rho$ as well as $\mathcal{N}_{\rm sys}$, since they are proportional to each other) is

$$\frac{\partial\rho}{\partial t} + \frac{\partial}{\partial q_j}\left(\rho\frac{dq_j}{dt}\right) + \frac{\partial}{\partial p_j}\left(\rho\frac{dp_j}{dt}\right) = 0. \tag{3.13}$$

Here the implicit sums over $j$ are from 1 to $W$. Using Hamilton’s equations, we can rewrite this as

$$0 = \frac{\partial\rho}{\partial t} + \frac{\partial}{\partial q_j}\left(\rho\frac{\partial H}{\partial p_j}\right) - \frac{\partial}{\partial p_j}\left(\rho\frac{\partial H}{\partial q_j}\right) = \frac{\partial\rho}{\partial t} + \frac{\partial\rho}{\partial q_j}\frac{\partial H}{\partial p_j} - \frac{\partial\rho}{\partial p_j}\frac{\partial H}{\partial q_j} = \frac{\partial\rho}{\partial t} + [\rho, H]_{q,p},$$

where $[\rho, H]_{q,p}$ is the Poisson bracket.¹ (In the second step, the two terms containing the mixed second derivative $\partial^2 H/\partial q_j\,\partial p_j$ cancel.) By using Hamilton’s equations once again in the third expression, we discover that this is the time derivative of $\rho$ moving with a fiducial system through the $2W$-dimensional phase space:

$$\frac{d\rho}{dt} \equiv \frac{\partial\rho}{\partial t} + \frac{dq_j}{dt}\frac{\partial\rho}{\partial q_j} + \frac{dp_j}{dt}\frac{\partial\rho}{\partial p_j} = \frac{\partial\rho}{\partial t} + [\rho, H]_{q,p} = 0. \tag{3.14}$$

Therefore, the probability $\rho$ is constant along the system’s phase-space trajectory, as was to be proved. We shall call Eq. (3.14), which embodies this Liouville theorem, the statistical mechanical Liouville equation or collisionless Boltzmann equation.

As a simple, qualitative example, consider a system consisting of hot gas expanding adiabatically, so that its large random kinetic energy is converted into ordered radial motion. If we examine a set $G$ of such systems very close to each other in phase space, then it is apparent that, as the expansion proceeds, the size of $G$’s physical-space volume $d^Wq$ increases and the size of its momentum-space volume $d^Wp$ diminishes, so that the product $d^Wq\,d^Wp$ remains constant (Fig. 3.1), and correspondingly $\rho \propto \mathcal{N}_{\rm sys} = dN_{\rm sys}/d^Wq\,d^Wp$ is constant.

What happens if the systems being studied interact weakly with their surroundings? We must then include an interaction term on the right-hand side of Eq. (3.14), thereby converting

it into the statistical mechanical version of the Boltzmann transport equation:

$$\left(\frac{d\rho}{dt}\right)_{\text{moving with a fiducial system}} = \left(\frac{d\rho}{dt}\right)_{\text{interactions}}. \tag{3.15}$$

Here the time derivative on the left is taken moving through phase space with a fiducial system that does not interact with the external universe.

[Fig. 3.1, panels (a) and (b): axes $q_i$ and $p_i$, with a momentum extent $\Delta p_i$ marked.]

Fig. 3.1: Liouville’s theorem: (a) The region in the $q_i$–$p_i$ part of phase space (with $i$ fixed) occupied by a set $G$ of identical, closed systems at time $t = 0$. (b) The region occupied by the same set of systems a short time later, $t > 0$. The Hamiltonian-generated evolution of the individual systems has moved them in such a manner as to skew the region they occupy, but the volume $\int dp_i\,dq_i$ is unchanged.
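Liouville’s theorem, as drawn in Fig. 3.1, can be illustrated numerically. The idea is to follow the boundary of a small phase-space region under Hamiltonian evolution and check that its area is unchanged while its shape is skewed. The sketch makes three assumptions, none of which come from the text:

- a pendulum Hamiltonian $H = p^2/2 - \cos q$ (hypothetical, with $W = 1$, and nonlinear so that the region shears);
- a kick-drift-kick leapfrog integrator;
- the shoelace formula, applied to the polygon traced by 800 boundary points, for the enclosed area.

```python
import math

def leapfrog_pendulum(q, p, dt, steps):
    """Kick-drift-kick steps for H = p^2/2 - cos q (dq/dt = p, dp/dt = -sin q).
    Each step is a symplectic map, so it preserves phase-space area."""
    for _ in range(steps):
        p -= 0.5 * dt * math.sin(q)
        q += dt * p
        p -= 0.5 * dt * math.sin(q)
    return q, p

def square_boundary(q0, p0, side, n_per_side):
    """Points tracing the boundary of a square counterclockwise."""
    h = side / 2
    corners = [(q0 - h, p0 - h), (q0 + h, p0 - h), (q0 + h, p0 + h), (q0 - h, p0 + h)]
    points = []
    for k in range(4):
        (qa, pa), (qb, pb) = corners[k], corners[(k + 1) % 4]
        for i in range(n_per_side):
            s = i / n_per_side
            points.append((qa + s * (qb - qa), pa + s * (pb - pa)))
    return points

def shoelace_area(points):
    """Area enclosed by a closed polygon whose vertices are listed in order."""
    n = len(points)
    twice_area = sum(points[i][0] * points[(i + 1) % n][1]
                     - points[(i + 1) % n][0] * points[i][1] for i in range(n))
    return abs(twice_area) / 2

initial = square_boundary(q0=1.0, p0=0.0, side=0.2, n_per_side=200)
final = [leapfrog_pendulum(q, p, dt=0.01, steps=500) for q, p in initial]
print(shoelace_area(initial), shoelace_area(final),
      shoelace_area(final) / shoelace_area(initial))
```

The ratio printed last should be 1, up to the small error from approximating the deformed boundary by straight segments. The region itself will have moved and been sheared.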

3.4 Statistical Equilibrium

3.4.1 Canonical Ensemble and Distribution

Consider an ensemble of identical systems, all of which have the same huge number of degrees of freedom (dimensionality $W \gg 1$). Put all the systems initially in identically the same state, and then let them exchange heat (but not particles or volume or anything else) with an external thermal bath that has a huge heat capacity and is in thermodynamic equilibrium at some temperature $T$. (For example, the systems might be impermeable cubes of gas 1 kilometer on a side near the center of the sun, and the thermal bath might be all the surrounding gas near the sun’s center; or the systems might be identical sapphire crystals inside a huge cryostat, and the thermal bath might be the cryostat’s huge store of liquid helium.) After a sufficiently long time, $t \gg \tau_{\rm ext}$, the ensemble will settle down into equilibrium with the bath, i.e. it will become the canonical ensemble mentioned in Table 3.1 above. In this final, canonical equilibrium state, the probability $\rho(t, q, p)$ will be independent of time $t$, and it no longer will be affected by interactions with the external environment; i.e., the interaction terms in the evolution equation (3.15) will have ceased to have any net effect: on average, for each interaction event that feeds energy into a system there will be an interaction event that takes away an equal amount of energy. The distribution function, therefore, will satisfy the interaction-free, collisionless Boltzmann equation (3.14) with the

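time derivative $\partial\rho/\partial t$ removed:

$$[\rho, H]_{q,p} \equiv \frac{\partial\rho}{\partial q_j}\frac{\partial H}{\partial p_j} - \frac{\partial\rho}{\partial p_j}\frac{\partial H}{\partial q_j} = 0. \tag{3.16}$$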
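Equation (3.16) can be checked numerically for a single degree of freedom. Any $\rho$ that depends on $(q,p)$ only through $H$ has a vanishing Poisson bracket with $H$; a $\rho$ that depends on $q$ directly does not. The sketch assumes a hypothetical pendulum Hamiltonian $H = p^2/2 - \cos q$ and two made-up trial distributions. The derivatives are central finite differences, so “zero” means zero only to truncation and rounding error.

```python
import math

def H(q, p):
    """Hypothetical Hamiltonian with W = 1 (a pendulum): p^2/2 - cos q."""
    return 0.5 * p * p - math.cos(q)

def poisson_bracket(f, g, q, p, h=1e-5):
    """[f, g]_{q,p} = (df/dq)(dg/dp) - (df/dp)(dg/dq), via central differences."""
    dfdq = (f(q + h, p) - f(q - h, p)) / (2 * h)
    dfdp = (f(q, p + h) - f(q, p - h)) / (2 * h)
    dgdq = (g(q + h, p) - g(q - h, p)) / (2 * h)
    dgdp = (g(q, p + h) - g(q, p - h)) / (2 * h)
    return dfdq * dgdp - dfdp * dgdq

def rho_function_of_H(q, p):
    """Depends on (q, p) only through the constant of motion H."""
    return 1.0 / (1.0 + H(q, p) ** 2)

def rho_function_of_q(q, p):
    """Depends on q directly, so it is not a function of H alone."""
    return math.exp(-q * q)

for rho in (rho_function_of_H, rho_function_of_q):
    print(rho.__name__, poisson_bracket(rho, H, 0.7, 0.4))
```

For the second trial distribution the bracket is $(\partial\rho/\partial q)(\partial H/\partial p) = -2qp\,e^{-q^2}$, which is nonzero, so that $\rho$ would not stay in statistical equilibrium.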

We shall use the phrase statistical equilibrium to refer to any ensemble whose distribution function has attained such a state and thus satisfies Eq. (3.16). Equation (3.16) is a well-known equation in Hamiltonian mechanics; it says that $\rho$ is a function solely of constants of the individual systems’ Hamiltonian-induced motions;¹ i.e., $\rho$ can depend on location $(q,p)$ in phase space only through those constants of the motion. Sometimes this goes by the name Jeans’ Theorem. Among the constants of motion in typical situations (for typical Hamiltonians) are the system’s energy $E$, its linear momentum $\mathbf{P}$, its angular momentum $\mathbf{J}$, its number $N_I$ of conserved particles of various types $I$ (e.g. electrons, protons, ...), and its volume $V$. Note that these constants of motion are all additive: if we double the size of a system, they each double. We shall call such additive constants of the Hamiltonian-induced motion extensive variables (a term borrowed from thermodynamics), and we shall denote them by an enumerated list $K_1, K_2, \ldots$.

Now, the systems that we are studying have exchanged energy $E$ with their environment (the thermal bath) and thereby have acquired some range of $E$’s; therefore, $\rho$ can depend on $E$. However, the systems have not exchanged anything else with their environment, and they all thus have retained their original (identical) values of the other extensive variables $K_A$; therefore, $\rho$ must be a delta function in the others. We shall write

$$\rho = \rho(E), \tag{3.17a}$$

and shall not write down the delta functions explicitly. As an aid in discovering the form of the function ρ(E), let us decompose each system in the ensemble into a huge number of subsystems. For example, each system might be a cube 1 km on a side inside the sun and its subsystems might be the 10⁹ 1-m cubes into which one can divide it, or the systems might be identical sapphire crystals each containing 10²⁷ atoms, and the subsystems might be the crystals’ 3 × 10²⁷ normal modes of vibration. We shall label the subsystems of each system by an integer a in such a way that subsystem a in one system has the same Hamiltonian as subsystem a in any other system. (For the sapphire crystals, a = 1 could be the fundamental mode of vibration, a = 2 the first harmonic, a = 3 the second harmonic, etc.) The subsystems with fixed a make up a subensemble because of their relationship to the original ensemble. Because the full ensemble is in statistical equilibrium, the subensembles will also be in statistical equilibrium; and therefore their probabilities must be functions of those extensive variables E, K_A that they can exchange with each other:

$$\rho_a = \rho_a(E_a, K_{1a}, K_{2a}, \ldots) . \qquad (3.17b)$$

(Although each system can exchange only energy E with its heat bath, the subsystems may be able to exchange other quantities with each other; for example, if subsystem a is a 1-meter cube inside the sun with permeable walls, then it can exchange energy E_a and particles of all species I, so K_{Ia} = N_{Ia}.)

Since there is such a huge number of subsystems in each system, it is reasonable to expect that in statistical equilibrium there will be no significant correlations at any given time between the actual state of subsystem a and the state of any other subsystem. In other words, the probability ρ_a(W_a, q_a, p_a) that subsystem a is in a quantum state with W_a degrees of freedom and with its generalized coordinates and momenta near the values (q_a, p_a) is independent of the state of any other subsystem. This lack of correlations, which has the mathematical statement

$$\rho(E) = \prod_a \rho_a , \qquad (3.17c)$$

is called statistical independence. (Statistical independence is actually a consequence of a “2-lengthscale approximation” [Box 2.3]. The size of each subsystem is far smaller than that of the full system, and precise statistical independence arises in the limit as the ratio of these sizes goes to zero.) Statistical independence places a severe constraint on the functional forms of ρ and ρ_a, as the following argument shows. By taking the logarithm of Eq. (3.17c), we obtain

$$\ln \rho(E) = \sum_a \ln \rho_a(E_a, K_{1a}, \ldots) . \qquad (3.17d)$$

We also know, since energy is a linearly additive quantity, that

$$E = \sum_a E_a . \qquad (3.17e)$$

Now, we have not stipulated the way in which the systems are decomposed into subsystems. For our solar example, the subsystems might have been 2-m cubes or 7-m cubes rather than 1-m cubes. Exploiting this freedom, one can deduce that Eqs. (3.17d) and (3.17e) can be satisfied simultaneously if and only if ln ρ and ln ρ_a depend linearly on the energies E and E_a, with the same proportionality constant −β:

$$\ln \rho_a = -\beta E_a + (\text{some function of } K_{1a}, K_{2a}, \ldots) , \qquad (3.18a)$$

$$\ln \rho = -\beta E + \text{constant} . \qquad (3.18b)$$
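A sketch of why linearity is forced, assuming that ρ and the ρ_a are differentiable functions of energy and that the subsystem energies E_a can be varied independently of one another: differentiate Eq. (3.17d) with respect to a single subsystem energy E_b, holding all the other E_a and all the K's fixed, and use Eq. (3.17e), which gives ∂E/∂E_b = 1:

$$\frac{d \ln \rho}{dE}\bigg|_{E = \sum_a E_a} = \frac{\partial \ln \rho_b}{\partial E_b}(E_b, K_{1b}, \ldots) \qquad \text{for every } b .$$

The right-hand side does not depend on the energies E_c of the other subsystems (c ≠ b), whereas the left-hand side depends on them through E. Both sides must therefore be one and the same constant for every b, which we call −β; integrating with respect to E and E_b then reproduces Eqs. (3.18b) and (3.18a).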

Since the parameter β is the same for all the systems and their subsystems, it must be some characteristic of the thermal bath with which the systems and subsystems equilibrated. By exponentiating Eq. (3.18b) and noting that it has the same functional form as the Boltzmann distribution (2.22d) of kinetic theory, we infer that β must be 1/k_BT, where T is the temperature of the thermal bath. To summarize, an ensemble of identical systems with many degrees of freedom W ≫ 1, which have reached statistical equilibrium by exchanging energy but nothing else with a huge thermal bath, has the following canonical distribution function:

$$\rho_{\rm canonical} = C \exp(-E/k_B T) . \qquad (3.19)$$

Here E(q, p) is the energy of a system at location {q, p} in phase space, k_B is Boltzmann’s constant, T is the temperature of the heat bath, and C is whatever normalization constant is required to guarantee that Σ_n ρ_n = 1.
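For a system with a discrete, and here purely hypothetical, set of energy levels E_n, Eq. (3.19) reads ρ_n = C e^{−E_n/k_BT}, and the normalization fixes C = 1/Z with Z = Σ_n e^{−E_n/k_BT}. The short Python sketch below computes the normalized probabilities for illustrative level energies, taking k_BT as a single input parameter; the function name `canonical_probabilities` is ours, not the text's.

```python
import math


def canonical_probabilities(energies, kT):
    """Return rho_n = C exp(-E_n / kT) for a finite list of level energies.

    The energies and kT must be in the same units. The lowest energy is
    subtracted before exponentiating to avoid overflow; this only changes
    the normalization constant C, which is then fixed so sum(rho_n) = 1.
    """
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)  # partition sum (measured from the lowest level)
    return [w / z for w in weights]


# Hypothetical three-level system with energies 0, 1, 2 in units of kT.
rho = canonical_probabilities([0.0, 1.0, 2.0], kT=1.0)
print(rho, sum(rho))
print(rho[1] / rho[0], math.exp(-1.0))  # Boltzmann ratio exp(-(E_1 - E_0)/kT)
```

Shifting all energies by the ground-state value changes only C, so ratios such as ρ_1/ρ_0 = e^{−(E_1−E_0)/k_BT} are unaffected, while overflow is avoided when E_n/k_BT is large.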

Actually, we have proved more than this. Not only must the ensemble of huge systems (W ≫ 1) have the energy dependence ρ ∝ exp(−E/k_BT), so must each subensemble of smaller systems, ρ_a ∝ exp(−E_a/k_BT), even if (for example) the subensemble’s identical subsystems have only one degree of freedom, W_a = 1. Thus, if the subsystems exchanged only heat with their parent systems, then they must have the same canonical distribution (3.19) as the parents. This shows that the canonical distribution is the equilibrium state independently of the number of degrees of freedom W.

3.4.2

General Equilibrium Ensemble and Distribution; Gibbs Ensemble; Grand Canonical Ensemble

We can easily generalize this canonical distribution to an ensemble of systems that exchange other additive conserved quantities (extensive variables) K_1, K_2, ..., in addition to energy E, with a huge, thermalized bath. By an obvious generalization of the above argument, the resulting statistical equilibrium distribution function must have the form

$$\rho = C \exp\Big(-\beta E - \sum_A \beta_A K_A\Big) . \qquad (3.20)$$

When the extensive variables K_A that are exchanged with the bath, and that thus appear explicitly in the distribution function ρ, are energy E, and/or momentum P, and/or angular momentum J, and/or the number N_I of the species I of conserved particles, and/or volume V, it is conventional to rename the multiplicative factors β and β_A so that ρ takes on the following form:

$$\rho = C \exp\left[\frac{-E + \mathbf{U}\cdot\mathbf{P} + \boldsymbol{\Omega}\cdot\mathbf{J} + \sum_I \tilde\mu_I N_I - P V}{k_B T}\right] . \qquad (3.21)$$

Here k_B is Boltzmann’s constant; T, U, Ω, μ̃_I, and P are constants (called intensive variables) that are the same for all systems and subsystems, i.e., that characterize the full ensemble and all its subensembles and that therefore must have been acquired from the bath; and any extensive variable that is not exchanged with the bath must be omitted from the exponential and be replaced by an implicit delta function. As we have seen, T is the temperature that the ensemble and subensembles acquired from the bath; i.e., it is the bath temperature. From the Lorentz transformation law for energy and momentum [E′ = γ(E − U · P)] we see that, if we were to transform to a reference frame that moves with velocity U with respect to our original frame, then the exp(U · P/k_BT) term in ρ would disappear, and the distribution function would be isotropic in P. This tells us that U is the velocity of the bath with respect to our chosen reference frame. By a similar argument, Ω is the bath’s angular velocity with respect to an inertial frame. By comparison with Eq. (2.22d) we see that μ̃_I is the chemical potential of the conserved species I. Finally, experience with elementary thermodynamics suggests (and it turns out to be true) that P is the bath’s pressure.³ Note that, by contrast with the corresponding extensive variables E, P, J, N_I, and V, the intensive variables T, U, Ω, μ̃_I, and P do not double when the size of a system is doubled, i.e. they are not additive; rather, they are properties of the ensemble as a whole and thus are independent of the systems’ sizes.

³ One can also identify these physical interpretations of T, μ̃_I and P by analyzing idealized measuring devices; cf. Sec. 4.2 of the next chapter.

By removing the rest masses of all the particles from each system’s energy and similarly removing the particle rest mass from each chemical potential,

$$\mathcal{E} \equiv E - \sum_I N_I m_I , \qquad \mu_I \equiv \tilde\mu_I - m_I \qquad (3.22)$$

[Eqs. (2.20) and (2.21)], we bring the distribution function into a form that is identical to Eq. (3.21) but with E → ℰ and μ̃_I → μ_I (the rest-mass terms cancel, since E − Σ_I μ̃_I N_I = ℰ − Σ_I μ_I N_I):

$$\rho = C \exp\left[\frac{-\mathcal{E} + \mathbf{U}\cdot\mathbf{P} + \boldsymbol{\Omega}\cdot\mathbf{J} + \sum_I \mu_I N_I - P V}{k_B T}\right] . \qquad (3.23)$$

This is the form used in Newtonian theory, but it is also valid relativistically. Henceforth (except in Sec. 3.8, when discussing black-hole atmospheres), we shall restrict our baths always to be at rest in our chosen reference frame and to be non-rotating with respect to inertial frames, so that U = Ω = 0. The distribution function ρ can then either be a delta function in the system momentum P and angular momentum J (if momentum and angular momentum are not exchanged with the bath), or it can involve no explicit dependence whatsoever on P and J (if momentum and angular momentum are exchanged with the bath; cf. Eq. (3.21) with U = Ω = 0). In either case, if energy is the only other quantity exchanged with the bath, then the distribution function is said to be canonical and has the form (3.19):

$$\rho_{\rm canonical} = C \exp\left(\frac{-E}{k_B T}\right) = C' \exp\left(\frac{-\mathcal{E}}{k_B T}\right) , \qquad (3.24a)$$

where (obviously) the constants C and C′ are related by C′ = C exp[−Σ_I N_I m_I/k_BT]. If, in addition to energy, volume can also be exchanged with the bath (e.g., if the systems are floppy bags of gas whose volumes can change and through which heat can flow), then the equilibrium is the Gibbs ensemble, which has the distribution function

$$\rho_{\rm Gibbs} = C \exp\left[\frac{-(E + P V)}{k_B T}\right] = C' \exp\left[\frac{-(\mathcal{E} + P V)}{k_B T}\right] \qquad (3.24b)$$

(and an implicit delta function in N_I and possibly in J and P). The combination E + PV is known as the enthalpy H. If the exchanged quantities are energy and particles but not volume (e.g., if the systems are 1-m cubes inside the sun with totally imaginary walls through which particles and heat can flow), then the equilibrium is the grand canonical ensemble, with

$$\rho_{\rm grand\ canonical} = C \exp\left[\frac{-E + \sum_I \tilde\mu_I N_I}{k_B T}\right] = C \exp\left[\frac{-\mathcal{E} + \sum_I \mu_I N_I}{k_B T}\right] \qquad (3.24c)$$

(and an implicit delta function in V and perhaps in J and P). See the summary in Table 3.1. We mention, as a preview of an issue to be addressed in Chap. 4, that an individual system, picked randomly from the ensemble and then viewed as a bath for its own tiny subsystems, will not have identically the same temperature T, and/or chemical potential μ̃_I, and/or pressure P as the huge bath with which the ensemble has equilibrated; rather, the individual system’s T, μ̃_I, and/or P can fluctuate a tiny bit about the huge bath’s values (about the values that appear in the above probabilities), just as its E, N_I, and/or V fluctuate. We shall study these fluctuations in Chap. 4.

3.4.3

Bose-Einstein and Fermi-Dirac Distributions

The concepts and results developed in this chapter have enormous generality. They are valid (when handled with sufficient care) for quantum systems as well as classical, and they are valid for semiclosed or closed systems of any type whatsoever. The systems need not resemble the examples we have met in the text. They can be radically different, but so long as they are closed or semiclosed, our concepts and results will apply. As an important example, let each system be a single-particle quantum state of some field, rather than a collection of particles or normal modes of a crystal. These quantum states can exchange particles (quanta) with each other. As we shall see, in this case the above considerations imply that, in statistical equilibrium at temperature T, the mean number of particles in a state, whose individual particle energies are E, is given by the Fermi-Dirac formula (for fermions) η = 1/(e^{(E−μ̃)/k_BT} + 1) and the Bose-Einstein formula (for bosons) η = 1/(e^{(E−μ̃)/k_BT} − 1), which we used in our kinetic-theory studies in the last chapter [Eqs. (2.22a), (2.22b)]. Our derivation of these mean occupation numbers will illustrate the closeness of classical statistical mechanics and quantum statistical mechanics: the proof is fundamentally quantum mechanical because the regime η ∼ 1 is quantum mechanical (it violates the classical condition η ≪ 1); nevertheless, the proof makes use of precisely the same concepts and techniques as we have developed for our classical studies. As a conceptual aid in the derivation, consider an ensemble of complex systems in statistical equilibrium. Each system can be regarded as made up of a large number of fermions (electrons, protons, neutrons, neutrinos, . . .) and/or bosons (photons, gravitons, alpha particles, phonons, . . .). We shall analyze each system by identifying a complete set of single-particle quantum states (which we shall call modes) into which the particles can be inserted.
(For photons, these “modes” are the normal modes of the classical electromagnetic field; for phonons in a crystal, they are the normal modes of the crystal’s vibrations; for nonrelativistic electrons or protons or alpha particles, they are energy eigenstates of the nonrelativistic Schroedinger equation; for relativistic electrons, they are energy eigenstates of the Dirac equation.) A complete enumeration of modes is the starting point for the second quantization formulation of quantum field theory, and also the starting point for our far simpler analysis. Choose one specific mode S [e.g., a nonrelativistic electron plane-wave mode in a box of side L with spin up and momentum p = (5, 3, 17)h/L]. There is one such mode S in each of the systems in our ensemble, and these modes (all identical in their properties)

form a subensemble of our original ensemble. Our derivation focuses on this subensemble of identical modes S. Because each of these modes can exchange energy and particles with all the other modes in its system, the subensemble is grand canonically distributed. The (many-particle) quantum states allowed to the mode S are states in which S contains a finite number of particles (quanta), n. Denote by E_S the energy of one particle residing in the mode S. Then the mode’s total energy when it is in the state |n⟩ (when it contains n quanta) is E_n = nE_S. [Note: for a freely traveling, relativistic electron mode, E_S = √(m² + p²), where p is the mode’s momentum, p_x = jh/L (for j some integer) etc.; for a phonon mode with angular eigenfrequency of vibration ω, E_S = ħω; etc.] Since the distribution of the ensemble’s modes among the allowed quantum states is grand canonical, the probability ρ_n of being in state |n⟩ is [Eq. (3.24c)]

$$\rho_n = \text{const} \times \exp\left(\frac{-E_n + \tilde\mu n}{k_B T}\right) = \text{const} \times \exp\left[\frac{n(\tilde\mu - E_S)}{k_B T}\right] , \qquad (3.25)$$

where μ̃ and T are the chemical potential and temperature of the bath of other modes, with which the mode S interacts. Suppose that S is a fermion mode (i.e., its particles have half-integral spin). Then the Pauli exclusion principle dictates that S cannot contain more than one particle; i.e., n can take on only the values 0 and 1. In this case, the normalization constant in the distribution function (3.25) is determined by ρ_0 + ρ_1 = 1, which implies that

$$\rho_0 = \frac{1}{1 + \exp[(\tilde\mu - E_S)/k_B T]} , \qquad \rho_1 = \frac{\exp[(\tilde\mu - E_S)/k_B T]}{1 + \exp[(\tilde\mu - E_S)/k_B T]} . \qquad (3.26a)$$

This is the explicit form of the grand canonical distribution for a fermion mode. For many purposes (including all those in Chap. 2), this full probability distribution is more than one needs. Quite sufficient instead is the mode’s mean occupation number

$$\eta_S \equiv \langle n \rangle = \sum_{n=0}^{1} n\rho_n = \frac{1}{\exp[(E_S - \tilde\mu)/k_B T] + 1} = \frac{1}{\exp[(\mathcal{E}_S - \mu)/k_B T] + 1} . \qquad (3.26b)$$

Here ℰ_S = E_S − m and μ = μ̃ − m are the energy of a particle in the mode with rest mass removed, and the chemical potential with rest mass removed: the quantities used in the nonrelativistic (Newtonian) regime. Equation (3.26b) is the Fermi-Dirac mean occupation number asserted in Chap. 2 [Eq. (2.22a)], and studied there for the special case of a gas of freely moving, noninteracting fermions. Because our derivation is completely general, we conclude that this mean occupation number and the underlying grand canonical distribution (3.26a) are valid for any mode of a fermion field: for example, the modes for an electron trapped in an external potential well or a magnetic bottle, and the (single-particle) quantum states of an electron in a hydrogen atom. Suppose that S is a boson mode (i.e., its particles have integral spin), so it can contain any nonnegative number of quanta; i.e., n can assume the values 0, 1, 2, 3, . . . . Then the

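normalization condition Σ_{n=0}^∞ ρ_n = 1 fixes the constant in the grand canonical distribution (3.25), resulting in (for E_S > μ̃, so that the geometric series converges)

$$\rho_n = \left[1 - \exp\left(\frac{\tilde\mu - E_S}{k_B T}\right)\right] \exp\left[\frac{n(\tilde\mu - E_S)}{k_B T}\right] . \qquad (3.27a)$$

From this grand canonical distribution we can deduce the mean number of bosons in mode S:

$$\eta_S \equiv \langle n \rangle = \sum_{n=0}^{\infty} n\rho_n = \frac{1}{\exp[(E_S - \tilde\mu)/k_B T] - 1} = \frac{1}{\exp[(\mathcal{E}_S - \mu)/k_B T] - 1} \qquad (3.27b)$$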

in accord with Eq. (2.22b). As for fermions, this Bose-Einstein mean occupation number and underlying grand canonical distribution (3.27a) are valid generally, and not solely for the freely moving bosons of Chap. 2. When the mean occupation number is small, η_S ≪ 1, both the bosonic and the fermionic distribution functions are well approximated by the classical Boltzmann mean occupation number

$$\eta_S = \exp[-(E_S - \tilde\mu)/k_B T] . \qquad (3.28)$$

In Sec. 3.10 below we shall explore an important modern application of the Bose-Einstein mean occupation number (3.27): Bose-Einstein condensation of bosonic atoms in a magnetic trap.
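Both derivations reduce to averaging n over ρ_n ∝ e^{nx}, where x ≡ (μ̃ − E_S)/k_BT. The minimal Python check below assumes an illustrative value x = −0.7 and truncates the boson series at a large n_max (an approximation, harmless once e^{n_max x} is negligible). It compares the direct sums with the closed forms (3.26b) and (3.27b), and shows both approaching the Boltzmann limit (3.28) when η ≪ 1. The helper names are ours.

```python
import math


def mean_occupation_from_sum(x, n_max):
    """<n> = sum_n n rho_n with rho_n proportional to exp(n x), n = 0..n_max.

    x = (mu_tilde - E_S) / kT, as in Eq. (3.25).
    """
    weights = [math.exp(n * x) for n in range(n_max + 1)]
    return sum(n * w for n, w in enumerate(weights)) / sum(weights)


def fermi_dirac(x):
    return 1.0 / (math.exp(-x) + 1.0)   # 1 / (exp[(E_S - mu_tilde)/kT] + 1)


def bose_einstein(x):
    return 1.0 / (math.exp(-x) - 1.0)   # valid only for x < 0, i.e. E_S > mu_tilde


x = -0.7                                  # illustrative value of (mu_tilde - E_S)/kT
eta_fermi = mean_occupation_from_sum(x, n_max=1)     # Pauli: n = 0 or 1 only
eta_bose = mean_occupation_from_sum(x, n_max=2000)   # truncated geometric series
print(eta_fermi, fermi_dirac(x))
print(eta_bose, bose_einstein(x))

# Classical limit, eta << 1: both approach the Boltzmann value exp(x), Eq. (3.28).
x_cl = -10.0
print(fermi_dirac(x_cl), bose_einstein(x_cl), math.exp(x_cl))
```

The fermion and boson cases differ only in the allowed range of n, which is exactly the point of the derivation in the text: the same grand canonical weight, summed over different occupation sets, yields the +1 and −1 in the denominators.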

3.5

The Microcanonical Ensemble

Turn attention, now, from ensembles of systems that interact with an external, thermal bath (as discussed in Sec. 3.4.1), to an ensemble of identical, precisely closed systems, i.e. systems that have no interactions whatsoever with the external universe. By “identical” we mean that every system in the ensemble has (i) precisely the same set of degrees of freedom, and thus (ii) precisely the same number of degrees of freedom W, (iii) precisely the same Hamiltonian, and (iv) precisely the same values for all the additive constants of motion (E, K_1, K_2, . . .) except perhaps total momentum P and total angular momentum J.⁴ Suppose that these systems begin with values of (q, p) that are spread out in some (arbitrary) manner over a hypersurface in phase space that has H(q, p) equal to the common value of energy E. Of course, we cannot choose systems whose energy is precisely equal to E. For most E this would be a set of measure zero. Instead we let the systems occupy a tiny range of energy between E and E + δE and then discover (in Ex. 3.8) that our results are highly insensitive to δE as long as it is extremely small compared with E.

⁴ Exercise 3.8 below is an example of a microcanonical ensemble where P and J are not precisely fixed, though we do not discuss this in the exercise. The gas atoms in that example are contained inside an impermeable box whose walls cannot exchange energy or atoms with the gas, but obviously can and do exchange momentum and angular momentum when atoms collide with the walls. Because the walls are at rest in our chosen reference frame, the distribution function has U = Ω = 0 and so is independent of P and J [Eq. (3.23) above] rather than having precisely defined values of them.

It seems reasonable to expect that this ensemble, after evolving for a time much longer than its longest internal dynamical time scale, t ≫ τ_int, will achieve statistical equilibrium, i.e. will evolve into a state with ∂ρ/∂t = 0. (In the next section we will justify this expectation.) The distribution function ρ will then satisfy the collisionless Boltzmann equation (3.14) with vanishing time derivative, and therefore will be a function only of the Hamiltonian’s additive constants of the motion E, K_A. However, we already know that ρ is a delta function in K_A and a delta function with a tiny but finite spread in E; and the fact that it cannot depend on any other phase-space quantities then implies ρ is a constant over the hypersurface in phase space that has the prescribed values of K_A and E, and is zero everywhere else in phase space. This equilibrium ensemble is called microcanonical. There is a subtle aspect of this microcanonical ensemble that deserves discussion. Suppose we split each system in the ensemble up into a huge number of subsystems that can exchange energy (but for concreteness nothing else) with each other. We thereby obtain a huge number of subensembles, in the manner of Sec. 3.4. The original systems can be regarded as a thermal bath for the subsystems, and correspondingly the subensembles will have canonical distribution functions, ρ_a = C e^{−E_a/k_BT}. One might also expect the subensembles to be statistically independent, so that ρ = ∏_a ρ_a. However, such independence is not possible, since together with additivity of energy, E = Σ_a E_a, it would imply that ρ = C e^{−E/k_BT}, i.e. that the full ensemble is canonically distributed rather than microcanonical. What is wrong here?
The answer is that there in fact is a tiny correlation between the subensembles: If, at some moment of time, subsystem a = 1 happens to have an unusually large energy, then the other subsystems must correspondingly have a little less energy than usual; and this very slightly invalidates the statistical-independence relation ρ = ∏_a ρ_a, thereby enabling the full ensemble to be microcanonical even though all its subensembles are canonical. In the language of two-lengthscale expansions, where one expands in the dimensionless ratio (size of subsystems)/(size of full system) [Box 2.3], this correlation is a higher-order correction to statistical independence. We are now in a position to understand more deeply the nature of the thermalized bath that we have invoked to drive ensembles into statistical equilibrium. That bath can be any huge system which contains the systems we are studying as subsystems; and the bath’s thermal equilibrium can be either a microcanonical statistical equilibrium, or a statistical equilibrium involving exponentials of its extensive variables. Exercise 3.8 gives a concrete illustration of the microcanonical ensemble, but we delay presenting it until we have developed some additional concepts that it also illustrates.
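The tiny correlation can be made concrete with a toy model that does not appear in the text: a hypothetical Einstein solid of M identical oscillators sharing exactly q energy quanta, every microstate equally likely (a microcanonical ensemble). Counting microstates exactly with Python's `math.comb`, the sketch below shows that one oscillator's distribution of quanta is close to the canonical (geometric, i.e. Boltzmann) form with the same mean, while any two oscillators have a small negative covariance. Because the occupations are exchangeable and their sum is fixed, that covariance is exactly −Var(n_1)/(M − 1), a correction that vanishes as the size ratio 1/M goes to zero.

```python
from math import comb


def einstein_solid_stats(M, q):
    """Toy microcanonical ensemble (hypothetical, not from the text):
    M oscillators share exactly q quanta; every microstate is equally likely."""
    omega = comb(q + M - 1, M - 1)                 # total number of microstates
    # Oscillator 1 holds n quanta; the other M - 1 share the remaining q - n.
    p1 = [comb(q - n + M - 2, M - 2) / omega for n in range(q + 1)]
    # Joint probability of (n1, n2) depends only on k = q - n1 - n2.
    p2 = [comb(k + M - 3, M - 3) / omega for k in range(q + 1)]
    mean = sum(n * p for n, p in enumerate(p1))
    var = sum(n * n * p for n, p in enumerate(p1)) - mean ** 2
    e12 = sum(n1 * n2 * p2[q - n1 - n2]
              for n1 in range(q + 1) for n2 in range(q + 1 - n1))
    return p1, mean, var, e12 - mean ** 2


M, q = 100, 200
p1, mean, var, cov = einstein_solid_stats(M, q)
# Canonical (Boltzmann) prediction: geometric distribution with the same mean.
eta = q / M
p_canon = [(1 / (1 + eta)) * (eta / (1 + eta)) ** n for n in range(q + 1)]
print(max(abs(a - b) for a, b in zip(p1, p_canon)))   # small; shrinks as M grows
print(cov, -var / (M - 1))                            # tiny negative correlation
```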

3.6

The Ergodic Hypothesis

The ensembles we have been studying are almost always just conceptual ones that do not exist in the real universe. We have introduced them and paid so much attention to them not for their own sake, but because, in the case of statistical-equilibrium ensembles, they can be powerful tools for studying the properties of a single, individual system that really does exist in the universe or in our laboratory.

This power comes about because a sequence of “snapshots” of the single system, taken at times separated by sufficiently large intervals ∆t, has a probability distribution ρ (for the snapshots’ instantaneous locations {q, p} in phase space) that is the same as the distribution function ρ of some conceptual, statistical-equilibrium ensemble. If the single system is closed, so its evolution is driven solely by its own Hamiltonian, then the time between snapshots should be ∆t ≫ τ_int and its snapshots will be (very nearly) microcanonically distributed. If the single system exchanges energy, and only energy, with a thermal bath, then the time between snapshots should be ∆t ≫ τ_ext and its snapshots will be canonically distributed; and similarly for the other types of bath interactions. This property of snapshots is equivalent to the statement that for the individual system, the long-term time average of any function of the system’s location in phase space is equal to the statistical-equilibrium ensemble average:

$$\bar A \equiv \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{+T/2} A(q(t), p(t))\, dt = \langle A \rangle \equiv \sum_n A_n \rho_n . \qquad (3.29)$$

This property comes about because of ergodicity: the individual system, as it evolves, visits each accessible quantum state n for a fraction of the time that is equal to the equilibrium ensemble’s probability ρ_n. Or, stated more carefully, the system comes sufficiently close to each state n for a sufficient length of time that, for practical purposes, we can approximate it as spending a fraction ρ_n of its time at n. At first sight, ergodicity may seem obvious. However, it is not a universal property of all systems: one can easily devise idealized examples of non-ergodic behavior (e.g., an elastic billiard ball bouncing around a square billiard table). On the other hand, generic systems, whose properties and parameters are not carefully fine tuned, do typically behave ergodically, but to prove so is one of the most difficult problems in statistical mechanics. We shall assume throughout this book’s discussion of statistical physics that all the systems we study are indeed ergodic; this is called the ergodic hypothesis. Correspondingly, sometimes (for ease of notation) we shall denote the ensemble average with a bar. One must be cautious in practical applications of the ergodic hypothesis: It can sometimes require much longer than one might naively expect for a system to wander sufficiently close to accessible states that Ā = ⟨A⟩ for observables A of interest. [GIVE REFERENCES TO THE LITERATURE ON THE ERGODIC HYPOTHESIS? E.G. ter Haar (1955).]
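A minimal numerical illustration of Eq. (3.29), under stated assumptions. First, a single harmonic oscillator (m = 1, illustrative ω and E), whose time average of q² is compared with the microcanonical average over its energy shell, which is uniformly populated in the oscillator's angle variable. Second, a deliberately non-generic counterexample in the spirit of the square billiard table: two uncoupled oscillators, whose separately conserved energies keep the trajectory from exploring the full energy shell. The time average uses the window [0, T] rather than [−T/2, T/2], which makes no difference as T → ∞; all numerical values are hypothetical.

```python
import math


def time_average(f, t_max=1000.0, n=200_000):
    """Midpoint-rule estimate of (1/T) * integral_0^T f(t) dt with T = t_max.

    Uses the window [0, T] instead of [-T/2, T/2]; the two agree as T -> inf.
    """
    dt = t_max / n
    return sum(f((k + 0.5) * dt) for k in range(n)) / n


# (1) Ergodic case: one oscillator, H = (p**2 + w**2 q**2) / 2 (mass m = 1).
w, E = 1.3, 2.0                          # illustrative frequency and energy
A = math.sqrt(2 * E) / w                 # amplitude of the orbit on the shell


def q(t):
    return A * math.cos(w * t + 0.4)     # exact trajectory, arbitrary phase


bar_q2 = time_average(lambda t: q(t) ** 2)
# Microcanonical average: the thin shell E < H < E + dE is uniformly
# populated in the angle variable, so average A**2 cos**2 over a uniform grid.
N = 1000
ens_q2 = sum((A * math.cos(2 * math.pi * k / N)) ** 2 for k in range(N)) / N
print(bar_q2, ens_q2, E / w ** 2)        # all three agree closely

# (2) Non-ergodic case: two *uncoupled* oscillators (w1 = 1) sharing total
# energy E, with only 10% of E initially in oscillator 1. Each oscillator's
# energy is separately conserved, so E1 never changes.
E1 = 0.1 * E


def q1(t):
    return math.sqrt(2 * E1) * math.cos(t)


bar_q1_sq = time_average(lambda t: q1(t) ** 2)   # close to E1 / w1**2 = 0.2
# On the energy shell every quadratic term of
# H = (p1**2 + q1**2 + p2**2 + w2**2 q2**2) / 2 averages to E/4 (equipartition),
# so the microcanonical value of q1**2 is E/2 = 1.0.
micro_q1_sq = E / 2
print(bar_q1_sq, micro_q1_sq)
```

In the second case the time average is fixed by the initial split of energy, not by the ensemble, which is precisely the failure of (3.29) for a fine-tuned (non-generic) system.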

3.7

Entropy and Evolution Into Statistical Equilibrium

3.7.1

Entropy and the Second Law of Thermodynamics

For any ensemble of systems, whether it is in statistical equilibrium or not, and also whether it is quantum mechanical or not, the ensemble’s entropy S is defined, in words, by the following awful sentence: S is the mean value (ensemble average) of the logarithm of the probability that a random system in the ensemble occupies a given quantum state, summed over states and multiplied by −k_B. More specifically, denoting the probability that a system is in state n by ρ_n, the ensemble’s entropy S is the following sum over quantum states (or

the equivalent integral over phase space):

$$S \equiv -k_B \sum_n \rho_n \ln \rho_n . \qquad (3.30)$$

Entropy is a measure of our lack of information about the state of any system chosen at random from an ensemble; see Sec. 3.12 below. In this sense, the entropy can be regarded as a property of a random individual system in the ensemble, as well as of the ensemble itself. If all the systems are in the same quantum state, e.g. in the state n = 17, then ρ_n = δ_{n,17}, so we know precisely the state of any system pulled at random from the ensemble, and Eq. (3.30) dictates that the entropy vanish. Vanishing entropy thus corresponds to a perfect knowledge of the system’s quantum state; it corresponds to the quantum state being pure. By contrast, consider a system in microcanonical statistical equilibrium. In this case, all states are equally likely (ρ is constant), so if there are N_states states altogether, then ρ_n = 1/N_states and the entropy (3.30) takes the form⁵

$$S = k_B \ln N_{\rm states} . \qquad (3.31)$$
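Eq. (3.30) and its two limiting cases translate directly into code. The sketch below works in units with k_B = 1 (an assumption made only for illustration), uses the convention 0 ln 0 = 0 so that states with ρ_n = 0 contribute nothing, and picks an arbitrary N_states = 17.

```python
import math


def entropy(probs, k_b=1.0):
    """S = -k_B * sum_n rho_n ln rho_n, Eq. (3.30); states with rho_n = 0 add 0."""
    return k_b * sum(-p * math.log(p) for p in probs if p > 0)


n_states = 17                                               # illustrative count
pure = [1.0 if n == 0 else 0.0 for n in range(n_states)]    # rho_n = delta_{n,0}
micro = [1.0 / n_states] * n_states                         # all states equally likely
print(entropy(pure))                            # vanishes: the state is known exactly
print(entropy(micro), math.log(n_states))       # S = k_B ln N_states, Eq. (3.31)
```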

The entropy, so defined, has some important properties. One is that whenever the ensemble can be broken up into statistically independent subensembles of subsystems (as is generally the case for big systems in statistical equilibrium), so ρ = ∏_a ρ_a, then the entropy is additive, S = Σ_a S_a; see Ex. 3.3. This permits us to regard the entropy, like the systems’ additive constants of the motion, as an extensive variable. A second very important property is the fact that, as an ensemble of systems evolves, its entropy cannot decrease and it generally tends to increase. This is the statistical mechanical version of the second law of thermodynamics. As an example of this second law, consider two different gases (e.g., nitrogen and oxygen) in a container, separated by a thin membrane. One set of gas molecules is constrained to lie on one side of the membrane; the other set lies on the opposite side. The total number of available states N_states is less than if the membrane is ruptured and the two gases are allowed to mix. The mixed state is accessible from the partitioned state and not vice-versa. When the membrane is removed, the entropy begins to increase in accord with the second law of thermodynamics. Since any ensemble of identical, closed systems will ultimately, after a time t ≫ τ_int, evolve into microcanonical statistical equilibrium, it must be that the microcanonical distribution function ρ = constant has a larger entropy than any other distribution function that the ensemble could acquire. That this, indeed, is so, can be demonstrated formally as follows: Consider the class of all distribution functions ρ that: (i) vanish unless the constants of motion have the prescribed values E (in the tiny range δE) and K_A, (ii) can be non-zero anywhere in the region of phase space, S_o, where the prescribed values E, K_A are taken on, and (iii) are correctly normalized so that

$$\sum_n \rho_n \equiv \int_{S_o} \rho\, N_{\rm states}\, d\Gamma = 1 \qquad (3.32a)$$

[Eq. (3.8b)].

⁵ This formula, with slightly different notation, can be found on Boltzmann’s tomb.

We ask which ρ in this class gives the largest entropy S = −k_B Σ_n ρ_n ln ρ_n. The requirement that the entropy be extremal (stationary) under variations δρ of ρ that preserve the normalization (3.32a) is embodied in the variational principle⁶

$$\delta \int_{S_o} (-k_B \rho \ln \rho - \Lambda \rho)\, N_{\rm states}\, d\Gamma = 0 . \qquad (3.32b)$$

Here Λ is a Lagrange multiplier that enforces the normalization (3.32a). Performing the variation, we find that

$$\int_{S_o} (-k_B \ln \rho - k_B - \Lambda)\, \delta\rho\, N_{\rm states}\, d\Gamma = 0 , \qquad (3.32c)$$

which is satisfied if and only if ρ is a constant, ρ = e^{−1−Λ/k_B}, independent of location in the allowed region S_o of phase space; i.e., if and only if ρ is that of the microcanonical ensemble. This calculation actually only shows that the microcanonical ensemble has stationary entropy. To show it is a maximum, one must perform the second variation; i.e., one must compute the second-order contribution of δρ to δS = δ∫(−k_B ρ ln ρ) N_states dΓ. That second-order contribution is easily seen to be

$$\delta^2 S = -k_B \int_{S_o} \frac{(\delta\rho)^2}{\rho}\, N_{\rm states}\, d\Gamma < 0 . \qquad (3.32d)$$

Thus, the microcanonical distribution does maximize the entropy, as claimed.
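The variational result can be spot-checked on a discrete stand-in for S_o: a hypothetical set of n equally allowed states, in units with k_B = 1. The sketch below (i) confirms that randomly generated normalized distributions never exceed the uniform value ln n, (ii) compares the exact entropy change for a small, normalization-preserving perturbation of the uniform distribution with the second-order estimate −(1/2) Σ_n (δρ_n)²/ρ_n (the factor 1/2 depends on how δ²S is normalized; the negative sign is what matters for (3.32d)), and (iii) checks the additivity S = S_a + S_b for a product distribution, the first property listed above. The seed and sizes are arbitrary choices.

```python
import math
import random


def entropy(probs):
    """S/k_B = -sum_n rho_n ln rho_n, with 0 ln 0 = 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)


def random_distribution(n, rng):
    """A random normalized distribution over n states (for comparison only)."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)               # fixed seed for reproducibility
n = 50
s_uniform = math.log(n)              # microcanonical value, Eq. (3.31)

# (i) No other normalized distribution over the same states does better.
trials = [entropy(random_distribution(n, rng)) for _ in range(1000)]
print(max(trials) < s_uniform)

# (ii) Perturb the uniform distribution by +eps/-eps alternately; the
# perturbations sum to zero, so normalization is preserved.
eps = 1e-4
perturbed = [1 / n + (eps if k % 2 == 0 else -eps) for k in range(n)]
delta_s = entropy(perturbed) - s_uniform
predicted = -0.5 * sum(eps ** 2 / (1 / n) for _ in range(n))
print(delta_s, predicted)            # both negative and nearly equal

# (iii) Additivity for statistically independent subsystems.
rho_a = random_distribution(4, rng)
rho_b = random_distribution(6, rng)
rho_ab = [pa * pb for pa in rho_a for pb in rho_b]
print(entropy(rho_ab), entropy(rho_a) + entropy(rho_b))
```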

3.7.2

What Causes the Entropy to Increase?

There is an apparent paradox at the heart of statistical mechanics and, at various stages in the development of the subject, it has led to confusion and even despair.⁷ It still creates controversy.⁸ Its simplest and most direct expression is to ask: how can the time-reversible, microscopic laws, encoded in a time-independent Hamiltonian, lead to the remorseless increase of entropy? In search of insight, consider, first, a classical, microcanonical ensemble of precisely closed systems (no interaction at all with the external universe). Assume, for simplicity, that at time t = 0 all the systems are concentrated in a small but finite region of phase space with volume ∆Γ, as shown in Fig. 3.2(a), with ρ = 1/(N_states ∆Γ) in the occupied region, and ρ = 0 everywhere else. As time passes each system evolves under the action of the systems’ common Hamiltonian. As is depicted in Fig. 3.2(b), this evolution distorts the occupied region of phase space; but Liouville’s theorem dictates that the occupied region’s volume, ∆Γ, remain unchanged and, correspondingly, that the ensemble’s entropy

$$S = -k_B \int (\rho \ln \rho)\, N_{\rm states}\, d\Gamma = k_B \ln(N_{\rm states}\, \Delta\Gamma) \qquad (3.33)$$

remain unchanged.

⁶ See, e.g., Mathews & Walker (1964), Chap. 12.
⁷ Boltzmann committed suicide.
⁸ See, e.g., Hawking (1989) and Penrose (1989).

[Fig. 3.2, panels (a), (b), (c): phase-space plots with axes q and p; panel (a) marks the initially occupied region ∆Γ.]

Fig. 3.2: Evolution of a classical ensemble into statistical equilibrium by means of phase mixing followed by coarse-graining of one’s viewpoint.

How can this be so? The ensemble is supposed to evolve into statistical equilibrium, with its distribution function uniformly spread out over that entire portion of phase space allowed by the Hamiltonian’s constants of motion (a portion of phase space far, far larger than ∆Γ), and in the process the entropy is supposed to increase. Fig. 3.2(b,c) resolves the paradox. As time passes, the occupied region becomes more and more distorted. It retains its phase-space volume, but gets strung out into a winding, contorted surface [Fig. 3.2(b)] which (by virtue of the ergodic hypothesis) ultimately passes arbitrarily close to any given point in the region allowed by the constants of motion. This ergodic wandering is called phase mixing. Ultimately the physicist gets tired of keeping track of all these contortions of the occupied region and chooses instead to take a coarse-grained viewpoint that averages over scales larger than the distance between adjacent portions of the occupied surface and thereby regards the ensemble as having become spread over the entire allowed region [Fig. 3.2(c)]. More typically, the physicist will perform a coarse-grained smearing out on some given, constant scale at all times; and once the transverse scale of the ensemble’s lengthening and narrowing phase-space region drops below the smearing scale, its smeared volume and its entropy start to increase. Thus, for an ensemble of closed systems it is the physicist’s choice to perform coarse-grain averaging that causes entropy to increase and causes the ensemble to evolve into statistical equilibrium. The situation is a bit more subtle for an ensemble of systems interacting with a thermal bath. The evolution into statistical equilibrium is driven by the interactions. Thus, it might appear at first sight that the physicist is not, this time, to blame for the entropy increase and the achievement of statistical equilibrium. A deeper examination, however, reveals the physicist’s ultimate culpability.
If the physicist were willing to keep track of all those dynamical degrees of freedom of the bath which are influenced by and influence the systems in the ensemble, then the physicist could incorporate these degrees of freedom into the description of the systems and define a phase space volume that obeys Liouville’s theorem and thus does not increase, and an entropy that correspondingly remains constant. However, physicists instead generally choose to ignore the microscopic details of the bath, and that choice forces them to attribute a growing entropy to the ensemble and regard the ensemble as approaching statistical equilibrium.
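The phase-mixing argument can be illustrated with a toy ensemble. The model is our own choice, not the text's: free rotors with angle θ and conserved action J, so that θ̇ = J. Each system's motion preserves phase-space area exactly. Even so, a coarse-grained entropy computed on a fixed grid of cells rises as the initial patch is sheared into thin filaments. The grid size, the initial rectangle and the sample times are arbitrary illustrative choices. The entropy is in units of k and is defined only up to an additive constant set by the cell size.

```python
import math

TWO_PI = 2.0 * math.pi

def evolve(points, t):
    """Free rotors: d(theta)/dt = J, dJ/dt = 0, so theta(t) = theta0 + J*t (mod 2*pi).

    The map (theta0, J) -> (theta0 + J*t, J) is a shear with unit Jacobian, so the
    fine-grained phase-space area of the ensemble is exactly conserved (Liouville).
    """
    return [((theta + J * t) % TWO_PI, J) for theta, J in points]

def coarse_grained_entropy(points, n_theta=20, n_J=20, J_min=1.0, J_max=1.2):
    """-sum_c p_c ln p_c over a fixed n_theta x n_J grid of cells (units of k)."""
    counts = {}
    for theta, J in points:
        i = min(int(theta / (TWO_PI / n_theta)), n_theta - 1)
        j = min(int((J - J_min) / ((J_max - J_min) / n_J)), n_J - 1)
        counts[(i, j)] = counts.get((i, j), 0) + 1
    total = len(points)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Initial ensemble: a small rectangle theta0 in [0, 0.2*pi), J in [1.0, 1.2),
# represented by a regular grid of 100 x 1000 sample systems.
initial = [(0.2 * math.pi * (a + 0.5) / 100, 1.0 + 0.2 * (b + 0.5) / 1000)
           for a in range(100) for b in range(1000)]

entropies = {t: coarse_grained_entropy(evolve(initial, t)) for t in (0.0, 200.0, 2000.0)}
for t, s in entropies.items():
    print(f"t = {t:6.0f}   coarse-grained S/k = {s:.3f}")
```

At t = 0 the patch fills 40 of the 400 cells, so S/k = ln 40. At later times the sheared filaments spread across nearly all cells allowed by the conserved J range, and S/k approaches ln 400, even though no phase-space area has been created.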

When one reexamines these issues in quantum mechanical language, one discovers that the entropy increase is caused by the physicists’ discarding the quantum mechanical correlations (the off-diagonal terms in the density matrix of Box 3.2) that get built up through the systems’ interaction with the rest of the universe. This discarding of correlations is accomplished through a trace over the external universe’s basis states (Box 3.2), and if the state of system plus universe was originally pure, this tracing (discarding of correlations) makes it mixed. From this viewpoint, then, it is the physicist’s choice to discard correlations with the external universe that causes the entropy increase and the evolution into statistical equilibrium. Heuristically, we can say that the entropy does not increase until the physicist actually (or figuratively) chooses to let it increase by ignoring the rest of the universe. For a simple, pedagogical example, see Box 3.3 and Ex. 3.5. This then raises a most intriguing question. What if we regard the universe as the ultimate microcanonical system? In this case, we might expect that the entropy of the universe will remain identically zero for all time, unless physicists (or other intelligent beings) perform some sort of coarse graining or discard some sort of correlations. However, such coarse graining or discarding are made deeply subtle by the fact that the physicists (or intelligent beings) are themselves part of the system being studied. Further discussion of these questions introduces fascinating, though ill-understood, quantum mechanical and cosmological considerations to which we shall briefly return in Sec. 27.6. 
**************************** EXERCISES Exercise 3.2 Practice: Estimating Entropy Make rough estimates of the entropy of the following systems, assuming they are in statistical equilibrium: (a) An electron in a hydrogen atom at room temperature (b) A glass of wine (c) The Pacific Ocean (d) An ice cube (e) The universe (This is mostly contained in the 3 K microwave background radiation.) Exercise 3.3 Derivation: Additivity of Entropy for Statistically Independent Systems Consider an ensemble of classical systems with each system made up of a large number of statistically independent subsystems, so $\rho = \prod_a \rho_a$. Show that the entropy of the full ensemble is equal to the sum of the entropies of the subensembles $a$: $S = \sum_a S_a$. Exercise 3.4 **Example: Entropy of a Thermalized Mode of a Field Consider a mode S of a fermionic or bosonic field, as discussed in Sec. 3.4.3 above. Suppose that an ensemble of identical such modes is in statistical equilibrium with a heat and particle bath and thus is grand-canonically distributed.

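Box 3.3 Entropy Increase Due to Discarding Quantum Correlations

As an idealized, pedagogical example of entropy increase due to physicists’ discarding quantum correlations, consider an electron that interacts with a photon. The electron’s initial quantum state is $|\psi_e\rangle = \alpha|{\uparrow}\rangle + \beta|{\downarrow}\rangle$, where $|{\uparrow}\rangle$ is the state with spin up, $|{\downarrow}\rangle$ is that with spin down, and $\alpha$ and $\beta$ are complex probability amplitudes with $|\alpha|^2 + |\beta|^2 = 1$. The interaction is so arranged that, if the electron spin is up then the photon is put into a positive helicity state $|+\rangle$, and if down, the photon is put into a negative helicity state $|-\rangle$. Therefore, after the interaction the combined system of electron plus photon is in the state $|\Psi\rangle = \alpha|{\uparrow}\rangle\otimes|+\rangle + \beta|{\downarrow}\rangle\otimes|-\rangle$. The photon flies off into the universe leaving the electron isolated. Suppose that we measure some electron observable $\hat A_e$. The expectation values for the measurement result before and after the interaction with the photon are

$$\text{Before:}\quad \langle\psi_e|\hat A_e|\psi_e\rangle = |\alpha|^2\langle{\uparrow}|\hat A_e|{\uparrow}\rangle + |\beta|^2\langle{\downarrow}|\hat A_e|{\downarrow}\rangle + \alpha^*\beta\,\langle{\uparrow}|\hat A_e|{\downarrow}\rangle + \beta^*\alpha\,\langle{\downarrow}|\hat A_e|{\uparrow}\rangle . \tag{1}$$

$$\begin{aligned}\text{After:}\quad \langle\Psi|\hat A_e|\Psi\rangle &= |\alpha|^2\langle{\uparrow}|\hat A_e|{\uparrow}\rangle\underbrace{\langle+|+\rangle}_{1} + |\beta|^2\langle{\downarrow}|\hat A_e|{\downarrow}\rangle\underbrace{\langle-|-\rangle}_{1} + \alpha^*\beta\,\langle{\uparrow}|\hat A_e|{\downarrow}\rangle\underbrace{\langle+|-\rangle}_{0} + \beta^*\alpha\,\langle{\downarrow}|\hat A_e|{\uparrow}\rangle\underbrace{\langle-|+\rangle}_{0}\\ &= |\alpha|^2\langle{\uparrow}|\hat A_e|{\uparrow}\rangle + |\beta|^2\langle{\downarrow}|\hat A_e|{\downarrow}\rangle . \end{aligned}\tag{2}$$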

Comparing Eqs. (1) and (2), we see that the correlations with the photon have removed the $\alpha^*\beta$ and $\beta^*\alpha$ quantum interference terms from the expectation value. The two pieces $\alpha|{\uparrow}\rangle$ and $\beta|{\downarrow}\rangle$ of the electron’s original quantum state $|\psi_e\rangle$ are said to have decohered. Since the outcomes of all measurements can be expressed in terms of expectation values, this quantum decoherence is “complete” in the sense that no quantum interference whatsoever between the $\alpha|{\uparrow}\rangle$ and $\beta|{\downarrow}\rangle$ pieces of the electron state $|\psi_e\rangle$ will ever be seen again in any measurement on the electron, unless the photon returns, interacts with the electron, and thereby removes its correlations with the electron state. If physicists are confident the photon will never return and the correlations will never be removed, then they are free to change their mathematical description of the electron state. Instead of describing the post-interaction state as $|\Psi\rangle = \alpha|{\uparrow}\rangle\otimes|+\rangle + \beta|{\downarrow}\rangle\otimes|-\rangle$, the physicists can discard the correlations with the photon and regard the electron as having classical probabilities $p_\uparrow = |\alpha|^2$ for spin up and $p_\downarrow = |\beta|^2$ for spin down, i.e., as being in a mixed state. This new, mixed-state viewpoint leads to the same expectation value (2) for all physical measurements as the old, correlated, pure-state viewpoint $|\Psi\rangle$. The important point for us is that, when discarding the quantum correlations with the photon (with the external universe), the physicist changes the entropy from zero (the value for any pure state, including $|\Psi\rangle$) to $S = -k(p_\uparrow\ln p_\uparrow + p_\downarrow\ln p_\downarrow) = -k(|\alpha|^2\ln|\alpha|^2 + |\beta|^2\ln|\beta|^2)$, which is positive whenever $\alpha$ and $\beta$ are both nonzero. The physicist’s change of viewpoint has increased the entropy. In Ex. 3.5 this pedagogical example is reexpressed in terms of the density operator.
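The box's bookkeeping can be checked numerically in the density-operator language of Box 3.2 and Ex. 3.5. The sketch below (our illustration, not the text's) does three things:

* It builds the electron-plus-photon state $|\Psi\rangle$.
* It traces out the photon to obtain the electron's reduced density matrix.
* It evaluates $-\mathrm{Tr}(\rho\ln\rho)$ in units of k.

The basis ordering, the helper names and the sample amplitudes are arbitrary choices made for the sketch.

```python
import math

def pure_density_matrix(alpha, beta):
    """|psi_e><psi_e| for |psi_e> = alpha|up> + beta|down> (2x2 nested lists)."""
    amps = (complex(alpha), complex(beta))
    return [[amps[i] * amps[j].conjugate() for j in range(2)] for i in range(2)]

def reduced_electron_density_matrix(alpha, beta):
    """Trace the photon out of |Psi> = alpha|up>|+> + beta|down>|->.

    Amplitudes are stored with index 2*e + p, where e = 0 (up), 1 (down)
    labels the electron and p = 0 (+), 1 (-) labels the photon helicity.
    """
    psi = [complex(alpha), 0j, 0j, complex(beta)]   # |up,+>, |up,->, |down,+>, |down,->
    return [[sum(psi[2 * e1 + p] * psi[2 * e2 + p].conjugate() for p in range(2))
             for e2 in range(2)] for e1 in range(2)]

def entropy_in_units_of_k(rho):
    """-Tr(rho ln rho) for a 2x2 Hermitian density matrix, via its two eigenvalues."""
    a, d = rho[0][0].real, rho[1][1].real
    off = abs(rho[0][1])
    disc = math.sqrt((a - d) ** 2 + 4.0 * off ** 2)
    eigenvalues = ((a + d + disc) / 2.0, (a + d - disc) / 2.0)
    return -sum(lam * math.log(lam) for lam in eigenvalues if lam > 1e-15)

# Illustrative amplitudes (arbitrary): |alpha|^2 = 0.3, |beta|^2 = 0.7, with a relative phase.
alpha = math.sqrt(0.3)
beta = math.sqrt(0.7) * complex(math.cos(0.4), math.sin(0.4))

S_before = entropy_in_units_of_k(pure_density_matrix(alpha, beta))
rho_after = reduced_electron_density_matrix(alpha, beta)
S_after = entropy_in_units_of_k(rho_after)
S_formula = -(abs(alpha) ** 2 * math.log(abs(alpha) ** 2)
              + abs(beta) ** 2 * math.log(abs(beta) ** 2))

print("off-diagonal element after tracing out the photon:", rho_after[0][1])
print(f"S/k before = {S_before:.6f}, after = {S_after:.6f}, Box 3.3 formula = {S_formula:.6f}")
```

Tracing out the photon kills the off-diagonal (interference) element. The entropy then jumps from zero to the value quoted in the box.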

(a) Show that if S is fermionic, then the ensemble’s entropy is

$$S_S = -k\left[\eta\ln\eta + (1-\eta)\ln(1-\eta)\right] \simeq -k\eta(\ln\eta - 1) \quad \text{in the classical regime } \eta \ll 1 , \tag{3.34a}$$

where η is the mode’s fermionic mean occupation number (3.26b). (b) Show that if the mode is bosonic, then the entropy is

$$S_S = k\left[(\eta+1)\ln(\eta+1) - \eta\ln\eta\right] \simeq -k\eta(\ln\eta - 1) \quad \text{in the classical regime } \eta \ll 1 , \tag{3.34b}$$

where η is the bosonic mean occupation number (3.27b). Note that in the classical regime, $\eta \simeq e^{-(E-\tilde\mu)/k_BT} \ll 1$, the entropy is insensitive to whether the mode is bosonic or fermionic. (c) Explain why the entropy per particle in units of Boltzmann’s constant (which we denote by σ) is $\sigma = S_S/\eta k$. Plot σ as a function of η for fermions and for bosons. Show analytically that for degenerate fermions (η ≃ 1) and for the bosons’ classical-wave regime (η ≫ 1) the entropy per particle is small compared to unity. Exercise 3.5 Problem: Quantum Decoherence and Entropy Increase in terms of the Density Operator Reexpress Box 3.3’s pedagogical example of quantum decoherence and entropy increase in the language of the quantum mechanical density operator $\hat\rho$ (Box 3.2). Use this example to explain the meaning of the various statements made in the next-to-last paragraph of Sec. 3.7.2. ****************************
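As a numerical companion to Ex. 3.4(c), not a substitute for the analytic parts, the sketch below tabulates $\sigma = S_S/\eta k$ from Eqs. (3.34a,b) instead of plotting it (plotting libraries are omitted). It compares both formulas with their common classical-regime limit, 1 − ln η. The sample values of η and the helper names are illustrative choices.

```python
import math

def xlogx(x):
    """x ln x, continued to 0 at x = 0."""
    return 0.0 if x == 0.0 else x * math.log(x)

def mode_entropy(eta, statistics):
    """S_S / k for one thermalized mode with mean occupation eta, Eqs. (3.34a,b)."""
    if statistics == "fermion":
        if not 0.0 < eta <= 1.0:
            raise ValueError("fermionic occupation must lie in (0, 1]")
        return -(xlogx(eta) + xlogx(1.0 - eta))
    if statistics == "boson":
        if eta <= 0.0:
            raise ValueError("bosonic occupation must be positive")
        return xlogx(eta + 1.0) - xlogx(eta)
    raise ValueError("statistics must be 'fermion' or 'boson'")

def entropy_per_particle(eta, statistics):
    """sigma = S_S / (eta k): the mode holds eta particles on average."""
    return mode_entropy(eta, statistics) / eta

def sigma_classical(eta):
    """Classical-regime limit of both formulas: -eta (ln eta - 1) / eta = 1 - ln eta."""
    return 1.0 - math.log(eta)

print(f"{'eta':>8} {'sigma_F':>10} {'sigma_B':>10} {'1 - ln eta':>11}")
for eta in (1e-4, 1e-2, 0.1, 0.5, 0.999):
    print(f"{eta:8.3g} {entropy_per_particle(eta, 'fermion'):10.4f} "
          f"{entropy_per_particle(eta, 'boson'):10.4f} {sigma_classical(eta):11.4f}")
for eta in (10.0, 1e3):   # bosons only: eta > 1 is forbidden for fermions
    print(f"{eta:8.3g} {'-':>10} {entropy_per_particle(eta, 'boson'):10.4f}")
```

The table shows σ ≫ 1 for η ≪ 1, where the two statistics agree. It falls well below unity for degenerate fermions (η → 1) and for bosons with η ≫ 1.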

3.8 Grand Canonical Ensemble for an Ideal Monatomic Gas

We now turn to an example that illustrates the formalism we have developed: an ideal, relativistic, classical monatomic gas. In this section and Ex. 3.6 we shall use the grand canonical ensemble to compute the gas’s statistical-equilibrium properties, and in doing so we shall elucidate the connections between statistical mechanics and thermodynamics. In Ex. 3.8 we shall compute this gas’s entropy in the microcanonical ensemble. In Chap. 4 we shall see that the statistical-mechanics tools developed in this section have applications that go far beyond the ideal gas, and have straightforward analogs in other statistical-equilibrium ensembles.

Fig. 3.3: An ensemble of gas cells, each with volume V, inside a heat and particle bath.

We consider the ensemble of systems illustrated in Fig. 3.3. Each system in the ensemble is a cell of fixed volume V, with imaginary walls, inside a huge thermal bath of a monatomic gas (e.g. helium or atomic hydrogen or neutrons or photons or ...). Since the cells’ walls are imaginary, the cells can and do exchange energy and atoms with the bath. We presume that the gas particles do not interact with each other (no potential energies in the Hamiltonian), so the gas is ideal. The bath is characterized by a chemical potential μ̃ for these particles and by a temperature T, and each particle has rest mass m. In most textbooks on statistical mechanics one does not include the particles’ rest mass in the chemical potential. However, we wish to allow the particles to be relativistic (e.g. they might be photons, which move at the speed of light). We will therefore be careful to include the rest mass both in the chemical potential μ̃ and in the energy $E = (m^2 + |\mathbf p|^2)^{1/2} = (m^2c^4 + |\mathbf p|^2c^2)^{1/2}$ of each particle (the first form uses units with c = 1). We assume that the chemical potential is sufficiently small (sufficiently few particles) that the mean occupation number η of the particles’ quantum states is small compared to unity, so they behave classically, which means that

$$\mu \equiv \tilde\mu - mc^2 \ll -k_BT \tag{3.35}$$

[Eq. (2.22d)]. However, we do not require for now that $k_BT$ be $\ll mc^2$, i.e., that the particles be nonrelativistic. We presume that our ensemble of cells has reached statistical equilibrium with the bath, so its probabilistic distribution function has the grand canonical form (3.24c):

$$\rho_n = \frac{1}{Z}\exp\left(\frac{-E_n + \tilde\mu N_n}{k_BT}\right) = \exp\left(\frac{\Omega - E_n + \tilde\mu N_n}{k_BT}\right) . \tag{3.36}$$

Here $E_n$ is the energy of a system that is in the many-particle quantum state $|n\rangle$, $N_n$ is the number of particles in that quantum state, and $1/Z \equiv e^{\Omega/k_BT}$ is the normalization constant that guarantees $\sum_n \rho_n = 1$; i.e.,

$$Z \equiv \exp\left(\frac{-\Omega}{k_BT}\right) \equiv \sum_n \exp\left(\frac{-E_n + \tilde\mu N_n}{k_BT}\right) . \tag{3.37}$$

This normalization constant, whether embodied in Z or in Ω, is a function of the bath’s temperature T and chemical potential μ̃, and also of the cells’ common volume V (which influences the set of available states $|n\rangle$). When regarded as a function of T, μ̃, and V, the quantity Z(V, μ̃, T) is called the gas’s grand partition function, and Ω(T, μ̃, V) is called its grand potential. The following general argument shows that, once one has computed the explicit functional form for the grand potential

$$\Omega(V, \tilde\mu, T) , \tag{3.38}$$

or equally well for the grand partition function Z(V, μ̃, T), one can then derive from it all the thermodynamic properties of the thermally equilibrated gas. The argument is so general that it applies to every grand canonical ensemble of systems, not just to our chosen, monatomic gas; but for simplicity, we shall restrict ourselves to systems made of a single type of particle (which need not for now be monatomic or free of potential-energy interactions). We introduce, as key quantities in the argument, the mean energy and mean number of particles in the ensemble’s systems (cells of Fig. 3.3):

$$\bar E \equiv \sum_n \rho_n E_n , \quad\text{and}\quad \bar N \equiv \sum_n \rho_n N_n . \tag{3.39}$$

(We denote these with bars, Ē and N̄, rather than brackets, ⟨E⟩ and ⟨N⟩, for ease of notation.) We now ask how the grand potential will change if the temperature T and chemical potential μ̃ of the bath and therefore of the ensemble are slowly altered with the common volume V of the cells held fixed. The answer for the change dΩ produced by changes dT and dμ̃ can be derived from the normalization equation (3.37), which we rewrite as

$$1 = \sum_n \rho_n = \sum_n \exp\left(\frac{\Omega - E_n + \tilde\mu N_n}{k_BT}\right) . \tag{3.40a}$$

Since the normalization condition must continue to hold as T and μ̃ change, the sum in Eq. (3.40a) must be left unchanged, which means that

$$0 = \sum_n \rho_n\,\frac{d\Omega + N_n\,d\tilde\mu - (\Omega - E_n + \tilde\mu N_n)\,T^{-1}dT}{k_BT} . \tag{3.40b}$$

Using $\sum_n \rho_n = 1$ and expressions (3.39) for the mean energy and the mean number of particles, and rearranging terms, we obtain

$$d\Omega = -\bar N\,d\tilde\mu + (\Omega - \bar E + \tilde\mu\bar N)\,T^{-1}dT . \tag{3.40c}$$

This change can be reexpressed in a more useful form by introducing the ensemble’s entropy. Inserting expression (3.36) for $\rho_n$ into the log term in the definition of entropy $S = -k\sum_n \rho_n\ln\rho_n$, we obtain

$$S = -k\sum_n \rho_n\ln\rho_n = -k\sum_n \rho_n\,\frac{\Omega - E_n + \tilde\mu N_n}{k_BT} = -\frac{\Omega - \bar E + \tilde\mu\bar N}{T} ; \tag{3.40d}$$

or, equivalently,

$$\Omega = \bar E - TS - \tilde\mu\bar N . \tag{3.41}$$

By inserting expression (3.41) into Eq. (3.40c), we obtain

$$d\Omega = -\bar N\,d\tilde\mu - S\,dT . \tag{3.42}$$

Equation (3.42) has several important consequences. The first consequence is the fact that it is actually the First Law of Thermodynamics in disguise. To see this, insert expression (3.41) for Ω into (3.42), thereby bringing it into the form

$$d\bar E = \tilde\mu\,d\bar N + T\,dS , \tag{3.43}$$

which is the familiar form of the first law of thermodynamics, but with the “−P dV” work, associated with a change in a cell’s volume, omitted because the cells have fixed volume V. If we (momentarily) pass from our original grand canonical ensemble, all of whose cells have the same V, μ̃, and T, to another grand canonical ensemble whose cells have the same μ̃ and T as before, but have slightly larger volumes, V + dV, then according to Eq. (3.41) with μ̃ and T fixed, Ω will change by $d\Omega = d\bar E - T\,dS - \tilde\mu\,d\bar N$ (where dS and dN̄ are the changes of entropy and mean number of particles induced by the volume change); and by the elementary first law of thermodynamics,

$$d\bar E = -P\,dV + \tilde\mu\,d\bar N + T\,dS , \tag{3.44}$$

this change of Ω at fixed μ̃ and T is simply −P dV. Combining with Eq. (3.42), this gives for the change of Ω when all of μ̃, T, and V change:

$$d\Omega = -P\,dV - \bar N\,d\tilde\mu - S\,dT . \tag{3.45}$$
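Equations (3.41) and (3.42) hold for any grand canonical ensemble. They can therefore be spot-checked on a system small enough that its states $|n\rangle$ can be listed exhaustively. The sketch below uses a hypothetical toy system of our own choosing: three single-particle levels, each holding at most one particle. The level energies, T and μ̃ are arbitrary, k_B = 1, and the fixed level set plays the role of the fixed volume V. The sketch computes Ē, N̄ and S directly from $\rho_n$. It then compares Ω with Ē − TS − μ̃N̄, and N̄ and S with central-difference derivatives of Ω.

```python
import itertools
import math

# Hypothetical toy system (not from the text): three single-particle levels
# that each hold 0 or 1 particle; energies in units where k_B = 1.
LEVELS = (0.0, 0.7, 1.5)

def states():
    """Enumerate the many-particle states |n>, yielding (E_n, N_n)."""
    for occupation in itertools.product((0, 1), repeat=len(LEVELS)):
        energy = sum(e for e, occ in zip(LEVELS, occupation) if occ)
        yield energy, sum(occupation)

def grand_potential(T, mu):
    """Omega = -T ln Z, with Z = sum_n exp[(-E_n + mu N_n)/T]  (Eq. 3.37, k_B = 1)."""
    Z = sum(math.exp((-E + mu * N) / T) for E, N in states())
    return -T * math.log(Z)

def ensemble_averages(T, mu):
    """Return (E_bar, N_bar, S) computed directly from rho_n (Eqs. 3.36, 3.39)."""
    omega = grand_potential(T, mu)
    E_bar = N_bar = S = 0.0
    for E, N in states():
        log_rho = (omega - E + mu * N) / T
        rho = math.exp(log_rho)
        E_bar += rho * E
        N_bar += rho * N
        S -= rho * log_rho
    return E_bar, N_bar, S

T, mu = 0.8, 0.4
omega = grand_potential(T, mu)
E_bar, N_bar, S = ensemble_averages(T, mu)

h = 1e-5  # finite-difference step
N_from_omega = -(grand_potential(T, mu + h) - grand_potential(T, mu - h)) / (2 * h)
S_from_omega = -(grand_potential(T + h, mu) - grand_potential(T - h, mu)) / (2 * h)

print(f"Omega = {omega:.6f},  E - TS - mu N = {E_bar - T * S - mu * N_bar:.6f}")
print(f"N_bar = {N_bar:.6f},  -dOmega/dmu = {N_from_omega:.6f}")
print(f"S     = {S:.6f},  -dOmega/dT  = {S_from_omega:.6f}")
```

Nothing in the check depends on the toy model being a gas. It is the generality of the argument, not any property of the ideal monatomic gas, that makes the two columns agree.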

A second consequence of Eq. (3.42), in the generalized form (3.45), is the fact that it tells us how to compute the mean number of particles, the entropy, and the pressure in terms of μ̃, T, and V:

$$\bar N = -\left(\frac{\partial\Omega}{\partial\tilde\mu}\right)_{V,T} , \qquad S = -\left(\frac{\partial\Omega}{\partial T}\right)_{V,\tilde\mu} , \qquad P = -\left(\frac{\partial\Omega}{\partial V}\right)_{\tilde\mu,T} . \tag{3.46}$$

A third consequence is the fact that Eqs. (3.46) and (3.41) determine the cell’s full set of standard thermodynamic parameters, {Ē, T, N̄, μ̃, V, P, S}, in terms of the ensemble’s three independent quantities T, μ̃, V. Thus, as soon as the functional form of Ω(T, μ̃, V) is known, from it we can generate all the thermodynamic properties of our equilibrated cells of gas. As we shall see in Chap. 4 and shall illustrate in Ex. 3.8, the resulting thermodynamic properties are independent of the equilibrium ensemble used to compute them: grand canonical, canonical, Gibbs, or microcanonical. This is completely general: When confronted by a new type of system, one can compute its thermodynamic properties using whatever equilibrium ensemble is most convenient; the results will be ensemble-independent.

Returning to the specific case of a monatomic, relativistic gas, we can compute Ω(T, μ̃, V) by carrying out explicitly the sum over states of Eq. (3.37). We first fix (temporarily) the number of particles N in an individual system (cell) and its corresponding number of degrees of freedom W = 3N. We then perform the integral over phase space. Then we sum over all N from 0 to ∞. The details are spelled out in Ex. 3.6 below, a very important exercise for readers who have never done computations of this sort. In the nonrelativistic limit, $k_BT \ll mc^2$, this calculation, with the aid of the “classical particles” condition (3.35), yields

$$\Omega(T, \mu, V) = -k_BT\,V\,\frac{(2\pi m k_BT)^{3/2}}{h^3}\,e^{\mu/k_BT} , \tag{3.47a}$$

where μ = μ̃ − mc². In the ultrarelativistic limit, $k_BT \gg mc^2$ (e.g. for photons or neutrinos or an ultrarelativistic gas of electrons), the calculation gives (in units with c = 1)

$$\Omega(T, \tilde\mu, V) = -\frac{8\pi V (k_BT)^4}{h^3}\,e^{\tilde\mu/k_BT} . \tag{3.47b}$$
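The sum over states can be carried out numerically as a check on Eqs. (3.47a,b). The sketch below works in units h = c = k_B = 1. The values of m, T, V and μ̃ are arbitrary choices placed in the classical regime (3.35). The sketch evaluates the one-particle phase-space integral of Eq. (3.50b) with Simpson's rule and sums the series over N term by term. It then compares Ω = −k_BT ln Z with the two limiting forms. In the nonrelativistic run the agreement is only at the ~0.2% level. This is expected, since k_BT/mc² = 10⁻³ is small but not zero, and the exact integral carries a relative correction of that order. The normalized terms of the same series are the probabilities p_N of Ex. 3.7, so their mean and variance are printed as well.

```python
import math

# Units (an assumption of this sketch): h = c = k_B = 1. All sample values are arbitrary.

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3.0

def one_particle_integral(T, m, V):
    """(V/h^3) * integral of 4 pi p^2 exp(-[(p^2 + m^2)^(1/2) - m]/T) dp.

    This is V/h^3 times the bracket of Eq. (3.50b), multiplied by exp(m/T);
    pulling the rest mass out avoids underflow when T << m. The upper limit is
    where the kinetic energy reaches 60 T, beyond which the integrand is negligible.
    """
    p_max = math.sqrt((m + 60.0 * T) ** 2 - m * m)
    integrand = lambda p: 4.0 * math.pi * p * p * math.exp(-(math.sqrt(p * p + m * m) - m) / T)
    return V * simpson(integrand, 0.0, p_max)

def series_terms(x, n_max=200):
    """Terms x^N / N!, N = 0..n_max: the sum over N in Eq. (3.50b) once the
    N-particle integral has factorized into (one-particle integral)^N / N!."""
    terms, term = [], 1.0
    for N in range(n_max + 1):
        terms.append(term)
        term *= x / (N + 1)
    return terms

# Nonrelativistic case: k_B T = 1e-3 m c^2 and mu = mu_tilde - m c^2 = -5 k_B T.
m, T, V, mu = 1.0, 1e-3, 1e6, -5e-3
x = math.exp(mu / T) * one_particle_integral(T, m, V)   # = exp(mu_tilde/T) * z_1
terms = series_terms(x)
Z = sum(terms)
omega_series = -T * math.log(Z)
omega_347a = -T * V * (2 * math.pi * m * T) ** 1.5 * math.exp(mu / T)

# The normalized terms are the probabilities p_N of Ex. 3.7.
p = [t / Z for t in terms]
mean_N = sum(N * pN for N, pN in enumerate(p))
var_N = sum((N - mean_N) ** 2 * pN for N, pN in enumerate(p))

# Ultrarelativistic case: k_B T = 1e3 m c^2 and mu_tilde = -5 k_B T.
m_ur, T_ur, V_ur, mu_tilde_ur = 1.0, 1e3, 1e-8, -5e3
z1_ur = one_particle_integral(T_ur, m_ur, V_ur) * math.exp(-m_ur / T_ur)
omega_ur_series = -T_ur * math.log(sum(series_terms(math.exp(mu_tilde_ur / T_ur) * z1_ur)))
omega_347b = -8 * math.pi * V_ur * T_ur ** 4 * math.exp(mu_tilde_ur / T_ur)

print(f"nonrelativistic:   Omega(series) = {omega_series:.6g},  Eq. (3.47a) = {omega_347a:.6g}")
print(f"ultrarelativistic: Omega(series) = {omega_ur_series:.6g},  Eq. (3.47b) = {omega_347b:.6g}")
print(f"p_N: mean = {mean_N:.6f}, variance = {var_N:.6f}, x = {x:.6f}")
```

Because the series is $\sum_N x^N/N! = e^x$, the sketch also makes visible why Ω is simply −k_BT times the single-particle sum. It also shows why p_N is Poisson with equal mean and variance.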

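In the nonrelativistic limit, differentiation of the grand potential (3.47a) gives [with the aid of Eqs. (3.46) and (3.41)] the following thermodynamic relations:

$$\begin{aligned} \bar N &= \frac{(2\pi m k_BT)^{3/2}}{h^3}\,e^{\mu/k_BT}\,V , & S &= \left(\frac52 - \frac{\mu}{k_BT}\right)k\bar N , \\ P &= k_BT\,\frac{(2\pi m k_BT)^{3/2}}{h^3}\,e^{\mu/k_BT} = \frac{\bar N}{V}\,k_BT , & \bar E &= \frac32\,k_BT\,\bar N . \end{aligned} \tag{3.47c}$$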

The corresponding ultrarelativistic expressions are derived in Ex. 3.6(e). Note that the first of the nonrelativistic expressions (3.47c) gives the same number density of particles, N̄/V, as we derived from kinetic theory in the last chapter [Eq. (2.37a)], but with $g_s = 1$ since we have ignored spin degrees of freedom in the above analysis. The second expression says that each particle in a nonrelativistic, thermalized, ideal, classical gas carries an entropy of $5/2 - \mu/k_BT$ in units of Boltzmann’s constant. The third and fourth expressions are the standard, nonrelativistic ideal-gas equations of state, $P = (\bar N/V)k_BT$ and $\epsilon \equiv \bar E/V = \frac32 n k_BT$ [Eq. (2.37b)]. Here $n \equiv \bar N/V$, and Ē counts only the kinetic energy, not the rest-mass energy $\bar N mc^2$. The statistical mechanical tools developed in this chapter not only can reproduce all the thermodynamic results that kinetic theory gives us; they can do much more. For example, as we shall see in the following two chapters (and as previewed in Ex. 3.7), these tools can be used to study statistical fluctuations of the physical quantities whose mean values we have computed.
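The differentiations that lead from Eq. (3.47a) to Eqs. (3.47c) can be mimicked with finite differences as a sanity check. The sketch below assumes units h = c = k_B = 1 and arbitrary sample values of m, T, V with μ/k_BT = −8. Two steps are involved:

* It applies Eq. (3.46) to obtain N̄, S and P.
* It uses Eq. (3.41) rearranged, Ē = Ω + TS + μ̃N̄, to obtain the energy.

That combination returns the total mean energy including rest mass, N̄(mc² + 3k_BT/2). The sketch therefore subtracts N̄mc² before comparing with the $\frac32 k_BT\bar N$ of Eq. (3.47c).

```python
import math

# Units (an assumption of this sketch): h = c = k_B = 1; sample values are arbitrary.

def omega_nonrel(T, mu_tilde, V, m):
    """Grand potential (3.47a) written in terms of mu_tilde, with mu = mu_tilde - m c^2."""
    return -T * V * (2 * math.pi * m * T) ** 1.5 * math.exp((mu_tilde - m) / T)

def central_difference(f, x0, step):
    """Second-order accurate numerical derivative f'(x0)."""
    return (f(x0 + step) - f(x0 - step)) / (2.0 * step)

def thermodynamics(T, mu_tilde, V, m, rel_step=1e-5):
    """N, S, P from Eq. (3.46) by finite differences; E from Eq. (3.41) rearranged."""
    N = -central_difference(lambda u: omega_nonrel(T, u, V, m), mu_tilde, rel_step * T)
    S = -central_difference(lambda t: omega_nonrel(t, mu_tilde, V, m), T, rel_step * T)
    P = -central_difference(lambda v: omega_nonrel(T, mu_tilde, v, m), V, rel_step * V)
    E = omega_nonrel(T, mu_tilde, V, m) + T * S + mu_tilde * N   # total, incl. rest mass
    return N, S, P, E

m, T, V = 1.0, 1e-3, 1e6
mu = -8.0 * T                  # mu / k_B T = -8: classical regime, Eq. (3.35)
mu_tilde = m + mu
N, S, P, E = thermodynamics(T, mu_tilde, V, m)

N_347c = V * (2 * math.pi * m * T) ** 1.5 * math.exp(mu / T)
print(f"N / N(3.47c)            = {N / N_347c:.8f}")
print(f"P V / (N k_B T)         = {P * V / (N * T):.8f}")
print(f"S / [(5/2 - mu/T) N]    = {S / ((2.5 - mu / T) * N):.8f}")
print(f"(E - N m c^2)/(1.5 N T) = {(E - N * m) / (1.5 * N * T):.8f}")
```

All four ratios come out equal to 1 to within the finite-difference error.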

3.9 Entropy Per Particle

The entropy per particle in units of Boltzmann’s constant,

$$\sigma \equiv S/\bar N k_B , \tag{3.48}$$

is a very useful concept in both quantitative analyses and order-of-magnitude analyses; see, e.g., Ex. 3.10 and the discussion of entropy in the expanding universe in Sec. 3.11.3. One reason is the second law of thermodynamics. Another is that in the real universe σ generally lies somewhere between 0 and 100 and thus is a natural quantity in terms of which to think and remember. For example, for ionized hydrogen gas in the nonrelativistic, classical domain, Eqs. (3.47c) (with μ/k_BT from the first equation inserted into the second) say that the protons’ entropy per proton exceeds the electrons’ by about 10: $\sigma_p - \sigma_e = \frac32\ln(m_p/m_e) = 11.27 \simeq 10$. Thus for an ionized hydrogen gas most of the entropy is in the protons. The proton entropy per proton grows logarithmically with decreasing density ρ,

$$\sigma_p = \frac52 - \frac{\mu_p}{k_BT} = \frac52 + \ln\left[\frac{2m_p}{\rho}\left(\frac{2\pi m_p k_BT}{h^2}\right)^{3/2}\right] \tag{3.49}$$

[Eqs. (3.47c) and (3.48)]. This entropy per proton is plotted as a function of density and temperature in Fig. 3.4. It ranges from σ ≪ 1 in the regime of extreme proton degeneracy (lower right of Fig. 3.4; see Ex. 3.4), to σ ∼ 1 near the onset of proton degeneracy (the boundary of the classical approximation), to about σ ∼ 100 at the lowest density that occurs in the universe, ρ ∼ 10⁻²⁹ g/cm³. This range is an example of the fact that the logarithms of almost all dimensionless numbers that occur in nature lie between ∼ −100 and +100.

[Figure 3.4 plot: contours of σ_p labeled 0 to 90, plotted against log₁₀ ρ (g/cm³, ticks from −32 to 8) and T (K, axis ticks labeled 4 to 10); regions labeled “protons degenerate” and “protons nondegenerate”.]

Fig. 3.4: The proton entropy per proton σ_p for an ionized hydrogen gas. The electron entropy per electron, σ_e, is smaller by ≃ 10; the electrons are degenerate when σ_e ≃ σ_p − 10 ≲ 1. The protons are degenerate when σ_p ≲ 1.

**************************** EXERCISES Exercise 3.6 **Derivation and Example: Grand Canonical Ensemble for Monatomic Gas Consider the cells of ideal, classical, monatomic gas with volume V that reside in the heat and particle bath of Fig. 3.3. Assume initially that the bath’s temperature T has an arbitrary magnitude relative to the rest-mass energy mc² of the particles, but require $k_BT \ll -\mu$ so all the particles behave classically. Ignore the particles’ spin degrees of freedom, if any. (a) The number of particles in a given system can be anything from N = 0 to N = ∞. Restrict attention, for the moment, to a situation in which the cell contains a precise number of particles, N. Explain why the multiplicity is M = N! even though the

density is so low that the particles’ wave functions do not overlap, and they are behaving classically; cf. Ex. 3.9 below. (b) Still holding fixed the number of particles in the cell, show that the number of degrees of freedom W, the number density of states in phase space $\mathcal N_{\rm states}$, and the energy $E_N$ in the cell are

$$W = 3N , \qquad \mathcal N_{\rm states} = \frac{1}{N!\,h^{3N}} , \qquad E_N = \sum_{A=1}^{N}\left(p_A^2 + m^2\right)^{1/2} , \tag{3.50a}$$

where $p_A$ is the momentum of classical particle number A. (c) Using Eq. (3.8b) to translate from the formal sum over states n to a sum over W = 3N and an integral over phase space, show that the sum over states (3.37) for the grand partition function becomes

$$Z = e^{-\Omega/k_BT} = \sum_{N=0}^{\infty} e^{\tilde\mu N/k_BT}\,\frac{V^N}{N!\,h^{3N}}\left[\int_0^\infty \exp\left(-\frac{(p^2+m^2)^{1/2}}{k_BT}\right)4\pi p^2\,dp\right]^N . \tag{3.50b}$$

(d) Show that in the nonrelativistic limit this gives Eq. (3.47a), and in the extreme relativistic limit it gives Eq. (3.47b). (e) For the extreme relativistic limit use your result (3.47b) for the grand potential Ω(V, T, μ̃) to derive the mean number of particles N̄, the pressure P, the entropy S, and the mean energy Ē as functions of V, μ̃, and T. Note that for a photon gas, because of the spin degree of freedom, the correct values of N̄, Ē and S will be twice as large as you obtain in this calculation. Show that Ē/V = 3P (a relation valid for any ultrarelativistic gas); and that Ē/N̄ = 3k_BT (which is higher than the 2.70… k_BT for black-body radiation, as derived in Ex. 2.5, because in the classical regime of η ≪ 1 photons don’t cluster in the same states at low frequency; that clustering lowers the mean photon energy for black-body radiation). Exercise 3.7 **Example: Probability Distribution for the Number of Particles in a Cell Suppose that we make a large number of measurements of the number of atoms in one of the systems of the ensemble of Fig. 3.3 (and Ex. 3.6), and that from those measurements we compute a probability $p_N$ for that cell to contain N particles. (a) How widely spaced in time must the measurements be to guarantee that the measured probability distribution is the same as that which one computes from the ensemble of cells at a specific moment of time? (b) Assume that the measurements are widely enough separated for this criterion [part (a)] to be satisfied. Use the grand canonical distribution to show that the probability $p_N$ is given by the Poisson distribution

$$p_N = e^{-\bar N}\,\frac{\bar N^N}{N!} . \tag{3.51a}$$

(c) Show that the mean number of particles in a single system, as predicted by this distribution, is ⟨N⟩ = N̄, and the root mean square deviation from this is

$$\Delta N \equiv \left\langle (N - \bar N)^2 \right\rangle^{1/2} = \bar N^{1/2} . \tag{3.51b}$$

Exercise 3.8 Example: Entropy of a Classical, Nonrelativistic, Monatomic Gas in the Microcanonical Ensemble Consider a microcanonical ensemble of closed cubical cells with volume V, each containing precisely N particles of a monatomic, nonrelativistic, ideal, classical gas, and each containing a nonrelativistic total energy E, i.e. the total energy with the rest-mass energy Nmc² subtracted. For the moment (by contrast with the text’s discussion of the microcanonical ensemble), assume that E is precisely fixed instead of being spread over some tiny but finite range. (a) Explain why the region $S_o$ of phase space accessible to each system is

$$|x_{jA}| < L/2 , \qquad \sum_{A=1}^{N}\frac{|\mathbf p_A|^2}{2m} = E , \tag{3.52a}$$

where A labels the particles and $L \equiv V^{1/3}$ is the side of the cell. (b) In order to compute the entropy of the microcanonical ensemble, we compute the volume in phase space ΔΓ that it occupies, then multiply by the number density of states in phase space (which is independent of location in phase space), and then take the logarithm. Explain why

$$\Delta\Gamma \equiv \int_{S_o}\prod_{A=1}^{N} dx_A\,dy_A\,dz_A\,dp_{xA}\,dp_{yA}\,dp_{zA} \tag{3.52b}$$

vanishes. This illustrates the “set of measure zero” statement in the text (second paragraph of Sec. 3.5), which we used to assert that we must allow the systems’ energies to be spread over some tiny but finite range. (c) Now permit the energies of our ensemble’s cells to lie in the tiny but finite range $E_o - \delta E_o < E < E_o$. Show that

$$\Delta\Gamma = V^N\left[V_\nu(a) - V_\nu(a - \delta a)\right] , \tag{3.52c}$$

where $V_\nu(a)$ is the volume of a sphere of radius a in a Euclidean space with ν ≫ 1 dimensions, and where

$$a \equiv \sqrt{2mE_o} , \qquad \frac{\delta a}{a} \equiv \frac12\,\frac{\delta E_o}{E_o} , \qquad \nu \equiv 3N . \tag{3.52d}$$

It can be shown (and you might want to try) that

$$V_\nu(a) = \frac{\pi^{\nu/2}}{(\nu/2)!}\,a^\nu \quad \text{for } \nu \gg 1 . \tag{3.52e}$$

(d) Show that, so long as $1 \gg \delta E_o/E_o \gg 1/N$ (where N in practice is an exceedingly huge number),

$$V_\nu(a) - V_\nu(a - \delta a) \simeq V_\nu(a)\left[1 - e^{-\nu\,\delta a/a}\right] \simeq V_\nu(a) , \tag{3.52f}$$

which is independent of $\delta E_o$ and thus will produce a value for ΔΓ and thence $\mathcal N_{\rm states}$ and S independent of $\delta E_o$, as desired. From this, with the aid of Stirling’s approximation, $n! \simeq (2\pi n)^{1/2}(n/e)^n$ for large n, and taking account of the multiplicity M = N!, show that the entropy of the microcanonically distributed cells is given by

$$S(V, E, N) = Nk_B\ln\left[\frac{V}{N}\left(\frac{E}{N}\right)^{3/2}\left(\frac{4\pi m}{3h^2}\right)^{3/2}e^{5/2}\right] . \tag{3.53}$$

This is known as the Sackur-Tetrode equation. (e) Returning to the grand canonical ensemble of the text and Ex. 3.6, show that its entropy (3.47c), when expressed in terms of V/N and E/N, takes precisely the same form as this Sackur-Tetrode equation. This illustrates the fact that the thermodynamic properties of a thermally equilibrated system are independent of the nature of its statistical equilibrium, i.e. independent of the type of bath (if any) that has brought it to equilibrium. Exercise 3.9 **Example: Entropy of Mixing, Indistinguishability of Atoms, and the Gibbs Paradox (a) Consider two identical chambers, each with volume V, separated by an impermeable membrane. Into one chamber put energy E and N atoms of helium, and into the other, energy E and N atoms of xenon, with E/N and N/V small enough that the gases are nonrelativistic and nondegenerate. The membrane is ruptured, and the gases mix. Show that this mixing drives the entropy up by an amount $\Delta S = 2Nk_B\ln 2$. [Hint: use the Sackur-Tetrode equation (3.53).] (b) Suppose that energy E and N atoms of helium are put into both chambers (no xenon). Show that, when the membrane is ruptured and the gases mix, there is no increase of entropy. Explain why this is reasonable, and explain the relationship to entropy being an extensive variable. (c) Suppose that the N helium atoms were distinguishable instead of indistinguishable. Show this would mean that, in the microcanonical ensemble, they have N! times as many states available to themselves, and their entropy would be larger by $k_B\ln N! \simeq k_B(N\ln N - N)$; and as a result, the Sackur-Tetrode formula (3.53) would be

$$\text{Distinguishable particles:}\quad S(V, E, N) = Nk_B\ln\left[V\left(\frac{E}{N}\right)^{3/2}\left(\frac{4\pi m}{3h^2}\right)^{3/2}e^{3/2}\right] . \tag{3.54}$$

Before the advent of quantum theory, physicists thought that atoms are distinguishable, and up to an additive multiple of N which they could not compute, they deduced this entropy.

(d) Show that if, as pre-quantum physicists believed, atoms were distinguishable, then when the membrane between two identical helium-filled chambers is ruptured there would be an entropy increase identical to that when the membrane between helium and xenon is ruptured: $\Delta S = 2Nk_B\ln 2$ [cf. parts (a) and (b)]. This result, which made pre-quantum physicists rather uncomfortable, was called the Gibbs paradox. Exercise 3.10 Problem: Primordial Element Formation When the expanding universe was t ∼ 10⁻⁴ second old, it contained equal numbers of protons and neutrons, plus far more photons, electrons, and positrons, all in statistical equilibrium at a temperature T ∼ 10¹² K. As the universe continued to expand, its temperature fell as $T \propto 1/\sqrt t$. Nuclear reactions in this expanding plasma were continually trying to make alpha particles (helium nuclei): 2n + 2p → α + 28 MeV (roughly 7 MeV per nucleon), with this energy able to go into thermalized photons. Using an order-of-magnitude computation based on entropy per baryon, show that the second law of thermodynamics prevented this helium formation from occurring until the temperature had fallen to some critical value $T_{\rm crit}$, and thereafter the second law encouraged helium formation to occur. Compute $T_{\rm crit}$ and the corresponding time t at which helium could start to form. The fraction of the universe’s protons and neutrons that actually became α’s (about 25%) was controlled by the actual rates for the relevant nuclear reactions and by the complete shutoff of these reactions via beta decay of the neutrons at t ∼ (neutron half life) = 11 minutes. For further detail, see Chap. 27. ****************************
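The entropy bookkeeping of Exs. 3.8 and 3.9 can be checked by evaluating the closed-form results numerically. It is not derived here. The sketch below assumes units h = k_B = 1, arbitrary sample values of N, V and T, and illustrative masses for helium and xenon. It does two things:

* It confirms that the grand canonical entropy (3.47c) equals the Sackur-Tetrode form (3.53) at E = (3/2)Nk_BT.
* It computes the three membrane-rupture entropy changes.

```python
import math

# Units (an assumption of this sketch): h = k_B = 1; masses and sizes are arbitrary.

def sackur_tetrode(V, E, N, m):
    """Eq. (3.53): S = N ln[(V/N) (E/N)^{3/2} (4 pi m / 3 h^2)^{3/2} e^{5/2}]."""
    return N * (math.log(V / N) + 1.5 * math.log(4 * math.pi * m * E / (3 * N)) + 2.5)

def distinguishable_entropy(V, E, N, m):
    """Eq. (3.54): the same with V/N -> V and e^{5/2} -> e^{3/2}."""
    return N * (math.log(V) + 1.5 * math.log(4 * math.pi * m * E / (3 * N)) + 1.5)

def grand_canonical_entropy(V, T, N, m):
    """Eq. (3.47c): S = (5/2 - mu/T) N, with mu/T solved from the N equation of (3.47c)."""
    mu_over_T = math.log(N / (V * (2 * math.pi * m * T) ** 1.5))
    return (2.5 - mu_over_T) * N

N, V, T = 1000.0, 1e6, 1.0
E = 1.5 * N * T                 # nonrelativistic energy at temperature T
m_he, m_xe = 4.0, 131.0         # illustrative masses

# Ex. 3.8(e): microcanonical and grand canonical entropies agree.
S_micro = sackur_tetrode(V, E, N, m_he)
S_grand = grand_canonical_entropy(V, T, N, m_he)

# Ex. 3.9(a): helium and xenon each expand from V into 2V at fixed E and N.
dS_different = (sackur_tetrode(2 * V, E, N, m_he) + sackur_tetrode(2 * V, E, N, m_xe)
                - sackur_tetrode(V, E, N, m_he) - sackur_tetrode(V, E, N, m_xe))
# Ex. 3.9(b): helium on both sides; after rupture, one gas with 2V, 2E, 2N.
dS_same = sackur_tetrode(2 * V, 2 * E, 2 * N, m_he) - 2 * sackur_tetrode(V, E, N, m_he)
# Ex. 3.9(d): the same rupture if atoms were distinguishable (Gibbs paradox).
dS_gibbs = (distinguishable_entropy(2 * V, 2 * E, 2 * N, m_he)
            - 2 * distinguishable_entropy(V, E, N, m_he))

unit = 2 * N * math.log(2)
print(f"S_micro = {S_micro:.6f}, S_grand = {S_grand:.6f}")
print(f"Delta S / (2 N ln 2): He+Xe = {dS_different / unit:.6f}, "
      f"He+He = {dS_same / unit:.6f}, distinguishable He+He = {dS_gibbs / unit:.6f}")
```

The ratios come out as 1, 0 and 1. The vanishing middle value follows directly from the factor V/N in Eq. (3.53), which makes the entropy extensive. Replacing V/N by V in Eq. (3.54) is exactly what produces the spurious Gibbs-paradox increase.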

3.10 T2 Bose-Einstein Condensate

In this section we shall explore an important modern application of the Bose-Einstein mean occupation number for bosons in statistical equilibrium. Our objectives are: (i) to present an important modern application of the tools developed in this chapter, and (ii) to give a nice example of the connections between quantum statistical mechanics (which we shall use in the first 3/4 of this section) and classical statistical mechanics (which we shall use in the last 1/4). For bosons in statistical equilibrium, the mean occupation number $\eta = 1/[e^{(E-\mu)/k_BT} - 1]$ diverges as E → 0 if the chemical potential μ vanishes. This divergence is intimately connected to Bose-Einstein condensation, a phenomenon that provides a nice and important example of the concepts we are developing. Consider a dilute atomic gas in the form of a large number N of bosonic atoms spatially confined by a magnetic trap. When the gas is cooled below some critical temperature $T_c$, μ is negative but gets very close to zero [Eq. (3.56d) below], causing η to become huge near zero energy. This huge η is manifest, physically, by a large number $N_0$ of atoms collecting into the trap’s mode of lowest (vanishing) energy, the Schroedinger equation’s ground state [Eq. (3.58a) and Fig. 3.5(a) below]. This condensation was predicted by Einstein (1925), but an experimental demonstration was not technologically feasible until 1995, when two research groups independently exhibited it: one at JILA (University of Colorado) led by Eric Cornell and Carl Wieman; the other at MIT led by Wolfgang Ketterle. For these experiments, Cornell, Ketterle and Wieman were awarded the 2001 Nobel Prize. Bose-Einstein condensates have great promise as tools for precision measurement technology and for nanotechnology. As a concrete example of Bose-Einstein condensation, we shall analyze an idealized version of one of the early experiments by the JILA group (Ensher et al. 1996): a gas of 40,000 ⁸⁷Rb atoms placed in a magnetic trap that we approximate as a spherically symmetric, harmonic-oscillator potential⁹

$$V(r) = \frac12 m\omega_o^2 r^2 = \frac12 m\omega_o^2\left(x^2 + y^2 + z^2\right) . \tag{3.55a}$$

Here x, y, z are Cartesian coordinates and r is radius. The harmonic-oscillator frequency $\omega_o$ and associated temperature $\hbar\omega_o/k$, the number N of Rubidium atoms trapped in the potential, and the atoms’ rest mass m are

$$\omega_o/2\pi = 181\ {\rm Hz} , \qquad \hbar\omega_o/k = 8.7\ {\rm nK} , \qquad N = 40{,}000 , \qquad m = 1.444\times10^{-25}\ {\rm kg} . \tag{3.55b}$$
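The trap scales in Eq. (3.55b) follow from $\omega_o$ and m by simple arithmetic. The sketch below recomputes $\hbar\omega_o/k_B$ and the oscillator length $\sqrt{\hbar/m\omega_o}$, which reappears as the ground-state width $\sigma_o$ in Eq. (3.56b). The constants are CODATA SI values; the inputs are the text's quoted $\omega_o/2\pi$ and m.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s (CODATA 2018)
K_B = 1.380649e-23       # Boltzmann constant, J/K (exact SI value)

omega_o = 2.0 * math.pi * 181.0   # rad/s, from omega_o / 2 pi = 181 Hz in Eq. (3.55b)
m = 1.444e-25                     # kg, rest mass of a 87Rb atom as quoted in Eq. (3.55b)

temperature_scale = HBAR * omega_o / K_B       # hbar omega_o / k_B, in kelvin
sigma_o = math.sqrt(HBAR / (m * omega_o))      # oscillator length, in metres

print(f"hbar omega_o / k_B = {temperature_scale * 1e9:.2f} nK")
print(f"sigma_o            = {sigma_o * 1e6:.3f} micrometres")
```

Both numbers agree with the quoted 8.7 nK and 0.800 µm to within about 0.2%.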

Each ⁸⁷Rb atom is made from an even number of fermions [Z = 37 electrons, Z = 37 protons, and (A − Z) = 50 neutrons], and the many-particle wave function Ψ for the system of N = 40,000 atoms is antisymmetric (changes sign) under interchange of each pair of electrons, each pair of protons, and each pair of neutrons. Therefore, when any pair of atoms is interchanged (entailing interchange of an even number of fermion pairs), there is an even number of sign flips in Ψ. This means that Ψ is symmetric (no sign change) under interchange of atoms; i.e., the atoms behave like bosons and must obey Bose-Einstein statistics. Repulsive forces between the atoms have a moderate influence on the experiment, but only a tiny influence on the quantities that we shall compute (see, e.g., Dalfovo et al. 1999). We shall ignore those forces and treat the atoms as noninteracting. To make contact with our derivation, above, of the Bose-Einstein distribution, we must identify the modes S available to the atoms. Those modes are the energy eigenstates of the Schroedinger equation for a ⁸⁷Rb atom in the harmonic-oscillator potential V(r). Solution of the Schroedinger equation [e.g., Complement B_VII (pp. 814ff) of Cohen-Tannoudji et al. 1977] reveals that the energy eigenstates can be labeled by the number of quanta of energy $\{n_x, n_y, n_z\}$ associated with an atom’s motion along the x, y, and z directions. The energy of the mode $\{n_x, n_y, n_z\}$ is $E_{n_x,n_y,n_z} = \hbar\omega_o[(n_x + 1/2) + (n_y + 1/2) + (n_z + 1/2)]$. We shall simplify subsequent formulas by subtracting $\frac32\hbar\omega_o$ from all energies and all chemical potentials. This is merely a change in what energy we regard as zero, a change under which our statistical formalism is invariant. Correspondingly, we shall attribute to the mode $\{n_x, n_y, n_z\}$ the energy $E_{n_x,n_y,n_z} = \hbar\omega_o(n_x + n_y + n_z)$. Our calculations will be simplified by lumping together all modes that have the same energy, so we switch from $\{n_x, n_y, n_z\}$ to $q \equiv n_x + n_y + n_z$ = (the mode’s total number of quanta) as our fundamental quantum number, and we write the mode’s energy as

$$E_q = q\hbar\omega_o . \tag{3.56a}$$

⁹ In the actual experiment the potential was harmonic but prolate spheroidal rather than spherical, i.e. in Cartesian coordinates $V(x, y, z) = \frac12 m[\omega_\varpi^2(x^2 + y^2) + \omega_z^2 z^2]$ with $\omega_z$ somewhat smaller than $\omega_\varpi$. For pedagogical simplicity we treat the potential as spherical, with $\omega_o$ set to the geometric mean of the actual frequencies along the three Cartesian axes, $\omega_o = (\omega_\varpi^2\omega_z)^{1/3}$. This choice of $\omega_o$ gives good agreement between our model’s predictions and the prolate-spheroidal predictions, for the quantities that we compute.

It is straightforward to verify that the number of independent modes with $q$ quanta (the number of independent ways to choose $\{n_x, n_y, n_z\}$ such that their sum is $q$) is $\frac{1}{2}(q+1)(q+2)$. Of special interest is the ground-state mode of the potential, $\{n_x, n_y, n_z\} = \{0, 0, 0\}$. This mode has $q = 0$ and it is unique: $(q+1)(q+2)/2 = 1$. Its energy is $E_0 = 0$, and its Schroedinger wave function is $\psi_o = (\pi\sigma_o^2)^{-3/4}\exp(-r^2/2\sigma_o^2)$, so for any atom that happens to be in this ground-state mode, the probability distribution for its location is the Gaussian

$$|\psi_o(\mathbf{r})|^2 = \left(\frac{1}{\pi\sigma_o^2}\right)^{3/2}\exp\left(-\frac{r^2}{\sigma_o^2}\right), \quad \text{where} \quad \sigma_o = \sqrt{\frac{\hbar}{m\omega_o}} = 0.800\ \mu\mathrm{m}. \tag{3.56b}$$

The entire collection of $N$ atoms in the magnetic trap is a system; it interacts with its environment, which is at temperature $T$, exchanging energy but nothing else, so a conceptual ensemble consisting of this $N$-atom system and a huge number of identical systems has the canonical distribution, $\rho = C\exp(-E_{\rm tot}/k_BT)$, where $E_{\rm tot}$ is the total energy of all the atoms. The modes labeled by $\{n_x, n_y, n_z\}$ (or by $q$ and two degeneracy parameters that we have not specified) are subsystems of this system, and because the modes can exchange atoms with each other as well as energy, a conceptual ensemble consisting of any chosen mode and its clones is grand canonically distributed [Eq. (3.27a)], and its mean occupation number is given by the Bose-Einstein formula (3.27b) with $E_S = q\hbar\omega_o$:

$$\eta_q = \frac{1}{\exp[(q\hbar\omega_o - \mu)/k_BT] - 1}. \tag{3.56c}$$
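As a quick sanity check on the mode counting and on Eq. (3.56c), the following Python sketch enumerates the triples $\{n_x, n_y, n_z\}$ directly and evaluates the Bose-Einstein occupation number. It is an illustration only: the function names are hypothetical, temperatures enter through the assumed dimensionless ratio $x = k_BT/\hbar\omega_o$, and the chemical potential through $\mu/k_BT$.

```python
import math
from itertools import product

def degeneracy_by_enumeration(q):
    """Count mode labels (nx, ny, nz) of non-negative integers with nx + ny + nz = q."""
    # For each pair (nx, ny) with nx + ny <= q, nz = q - nx - ny is fixed, so count those pairs.
    return sum(1 for nx, ny in product(range(q + 1), repeat=2) if nx + ny <= q)

def degeneracy_formula(q):
    """The closed form (q + 1)(q + 2)/2 quoted in the text (always an integer)."""
    return (q + 1) * (q + 2) // 2

def occupation(q, x, mu_over_kT):
    """Mean occupation number eta_q of Eq. (3.56c).

    x = k_B T / (hbar omega_o); mu_over_kT = mu / (k_B T) must be negative so that
    the ground-state mode q = 0 has a finite occupation.
    """
    return 1.0 / math.expm1(q / x - mu_over_kT)

for q in range(6):
    print(q, degeneracy_by_enumeration(q), degeneracy_formula(q))
```

With $\mu/k_BT = -1/N_0$ as in Eq. (3.56d), `occupation(0, x, -1/N0)` returns $1/(e^{1/N_0} - 1) \simeq N_0 - 1/2$, which is the large-$N_0$ approximation used in the text. The brute-force enumeration grows as $q^2$ and is meant only as a check.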

The temperature $T$ is inherited from the environment (heat bath) in which the atoms live. The chemical potential $\mu$ is common to all the modes, and takes on whatever value is required to guarantee that the total number of atoms in the trap is $N$, or, equivalently, that the sum of the mean occupation numbers (3.56c) over all the modes is $N$.

For the temperatures of interest to us: (i) The number $N_0 \equiv \eta_0$ of atoms in the ground-state mode $q = 0$ will be large, $N_0 \gg 1$, which permits us to expand the exponential in Eq. (3.56c) with $q = 0$ to obtain $N_0 = 1/[e^{-\mu/k_BT} - 1] \simeq -k_BT/\mu$, i.e.,

$$\mu/k_BT = -1/N_0 . \tag{3.56d}$$

And (ii) the atoms in excited modes will be spread out roughly uniformly over many values of $q$, so we can approximate the total number of excited-mode atoms by an integral:

$$N - N_0 = \sum_{q=1}^{\infty}\frac{(q+1)(q+2)}{2}\,\eta_q \simeq \int_0^\infty \frac{\frac{1}{2}(q^2 + 3q + 2)\,dq}{\exp[q\hbar\omega_o/k_BT + 1/N_0] - 1}. \tag{3.57a}$$

The integral is dominated by large $q$'s, so it is a rather good approximation to keep only the $q^2$ term in the numerator and to neglect the $1/N_0$ in the exponential:

$$N - N_0 \simeq \int_0^\infty \frac{q^2/2}{\exp(q\hbar\omega_o/k_BT) - 1}\,dq = \zeta(3)\left(\frac{k_BT}{\hbar\omega_o}\right)^3. \tag{3.57b}$$

[Figure 3.5 plot: panels (a) and (b), $N_0$ versus temperature]

Fig. 3.5: The number $N_0$ of atoms in the Bose-Einstein condensate at the center of a magnetic trap as a function of temperature $T$; panel (a), low resolution; panel (b), high resolution. The dashed curve in each panel is the prediction (3.58a) of the simple theory presented in the text, using the parameters shown in Eq. (3.55b). The dotted curve in panel (b) is the prediction derived in Ex. 3.11(c). The solid curves are our most accurate prediction (3.63) [Ex. 3.11(d)], including details of the condensation turn-on. The large dots are experimental data from Ensher et al. (1996). The left panel is adapted from Dalfovo et al. (1999).

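Here $\zeta(3) \simeq 1.2020569\ldots$ is the Riemann zeta function evaluated at argument 3 [the zeta function also appeared in our study of the equation of state of thermalized radiation, Eq. (2.51b)]. It is useful to rewrite Eq. (3.57b) as

$$N_0 = N\left[1 - \left(\frac{T}{T_c^0}\right)^3\right], \tag{3.58a}$$

where

$$T_c^0 = \frac{\hbar\omega_o}{k_B}\left(\frac{N}{\zeta(3)}\right)^{1/3} = 280\ \mathrm{nK} \gg \frac{\hbar\omega_o}{k_B} = 8.7\ \mathrm{nK} \tag{3.58b}$$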

is our leading-order approximation to the critical temperature $T_c$. Obviously, we cannot have a negative number of atoms in the ground-state mode, so Eq. (3.58a) must fail for $T > T_c^0$. Presumably $N_0$ becomes so small there that our approximation (3.56d) fails.

Figure 3.5(a), from a review article by Dalfovo et al. (1999), compares our simple prediction (3.58a) for $N_0(T)$ (dashed curve) with the experimental measurements by the JILA group (Ensher et al. 1996). Both theory and experiment show that, when one lowers the temperature $T$ through a critical temperature $T_c$, the atoms suddenly begin to accumulate in large numbers in the ground-state mode. At $T \simeq 0.8T_c$, half the atoms have condensed into the ground state ("Bose-Einstein condensation"); at $T \simeq 0.2T_c$ almost all are in the ground state. The simple formula (3.58a) is remarkably good at $0 < T < T_c$; and, evidently, $T_c^0$ [Eq. (3.58b)] is a rather good leading-order approximation to the critical temperature $T_c$ at which the Bose-Einstein condensate begins to form.

Exercise 3.11 and Fig. 3.5(b) use more accurate approximations to Eq. (3.57a) to explore the onset of condensation as $T$ is gradually lowered through the critical temperature. The onset is actually continuous when viewed on a sufficiently fine temperature scale; but on scales of $0.01T_c$ or greater, it appears truly discontinuous. The onset of Bose-Einstein condensation is an example of a phase transition: a sudden, (nearly) discontinuous change in the properties of a thermalized system. Among the sudden changes accompanying this phase transition is a (nearly) discontinuous change of the atoms' specific heat (Ex. 3.12). We shall study some other phase transitions in Chap. 4.

Notice that the critical temperature $T_c^0$ is larger, by a factor $\simeq N^{1/3} = 34$, than the temperature $\hbar\omega_o/k_B = 8.7$ nK associated with the harmonic-oscillator potential. Correspondingly, at the critical temperature there are significant numbers of atoms in modes with $q$ as large as $\sim 34$, which means that the vast majority of the atoms are actually behaving rather classically at $T \simeq T_c^0$, despite our use of quantum mechanical concepts to analyze them!

It is illuminating to compute the spatial distribution of these atoms at the critical temperature using classical techniques. (This distribution could equally well be deduced using the above quantum techniques.) In the nearly classical, outer region of the trapping potential, the atoms' number density in phase space $\mathcal{N} = (2/h^3)\eta$ must have the classical, Boltzmann-distribution form (3.28): $dN/dV_x dV_p \propto \exp(-E/k_BT) = \exp\{-[V(r) + p^2/2m]/k_BT_c\}$, where $V(r)$ is the harmonic-oscillator potential (3.55a). Integrating over momentum space, $dV_p = 4\pi p^2 dp$, we obtain for the number density of atoms $n = dN/dV_x$

$$n(r) \propto \exp\left[\frac{-V(r)}{k_BT_c}\right] = \exp\left(\frac{-r^2}{a_o^2}\right), \quad \text{where} \tag{3.59a}$$

$$a_o = \sqrt{\frac{2k_BT_c}{m\omega_o^2}} = \frac{\sqrt{2}}{[\zeta(3)]^{1/6}}N^{1/6}\sigma_o = 1.371N^{1/6}\sigma_o = 8.02\sigma_o = 6.4\ \mu\mathrm{m}. \tag{3.59b}$$

Thus, at the critical temperature, the atoms have an approximately Gaussian spatial distribution with radius $a_o$ eight times larger than the trap's $0.80\ \mu$m ground-state Gaussian distribution.
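The numbers quoted in Eqs. (3.56b), (3.58b), and (3.59b), and the claim that $a_o/N^{1/3}$ is comparable to $\lambda_{dB}$, can be reproduced with a few lines of Python. This is a sketch under stated assumptions: CODATA values for $\hbar$ and $k_B$, the value $\hbar\omega_o/k_B = 8.7$ nK from Eq. (3.58b) (the trap frequency $\omega_o$ is inferred from it rather than taken from the experiment), and hypothetical helper names. The last part compares the exact mode sum in Eq. (3.57a) with its leading-order integral (3.57b); the ratio should exceed 1 because the $3q + 2$ terms were dropped, and should approach 1 as $k_BT/\hbar\omega_o$ grows.

```python
import math

HBAR = 1.054571817e-34   # J s
KB = 1.380649e-23        # J/K
ZETA3 = 1.2020569031595942

m = 1.444e-25            # kg, mass of an 87Rb atom, Eq. (3.55b)
N = 40_000               # number of atoms in the trap
T_osc = 8.7e-9           # hbar*omega_o/k_B in kelvin, as quoted in Eq. (3.58b)

omega_o = KB * T_osc / HBAR                       # inferred trap frequency, rad/s
sigma_o = math.sqrt(HBAR / (m * omega_o))         # ground-state width, Eq. (3.56b)
Tc0 = T_osc * (N / ZETA3) ** (1 / 3)              # leading-order T_c, Eq. (3.58b)
a_o = math.sqrt(2 * KB * Tc0 / (m * omega_o**2))  # thermal-cloud radius, Eq. (3.59b)

spacing = a_o / N ** (1 / 3)                      # mean inter-atom spacing at T_c^0
lam_dB = HBAR / math.sqrt(2 * m * KB * Tc0)       # de Broglie wavelength at T_c^0
ratio_p = math.sqrt(2 * m * KB * Tc0) / (HBAR / sigma_o)   # p_cloud / p_condensate

def excited_atoms_exact(x, inv_N0=0.0, q_factor=60):
    """Discrete sum in Eq. (3.57a); x = k_B T/(hbar omega_o), inv_N0 = 1/N_0."""
    total = 0.0
    for q in range(1, int(q_factor * x) + 1):   # later terms are suppressed by e^{-60}
        total += 0.5 * (q + 1) * (q + 2) / math.expm1(q / x + inv_N0)
    return total

def excited_atoms_leading(x):
    """Leading-order integral, Eq. (3.57b)."""
    return ZETA3 * x**3

x_c = Tc0 / T_osc
print(f"sigma_o = {sigma_o*1e6:.3f} um, Tc0 = {Tc0*1e9:.0f} nK")
print(f"a_o = {a_o*1e6:.2f} um = {a_o/sigma_o:.2f} sigma_o")
print(f"spacing = {spacing*1e6:.3f} um, lambda_dB = {lam_dB*1e6:.3f} um")
print(f"p_cloud/p_condensate = {ratio_p:.2f}")
print("exact/leading at T_c^0:", excited_atoms_exact(x_c) / excited_atoms_leading(x_c))
```

The spacing and the de Broglie wavelength come out of the same order, which is all the order-of-magnitude argument in the text requires.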
This size of the distribution gives insight into the origin of the Bose-Einstein condensation: The mean inter-atom spacing at the critical temperature $T_c^0$ is $a_o/N^{1/3}$. It is easy to verify that this is approximately equal to the typical atom's de Broglie wavelength $\lambda_{dB} = \hbar/\sqrt{2mk_BT} = \hbar/(\text{typical momentum})$, which is the size of the region that we can think of each atom as occupying. The atomic separation is smaller than this in the core of the atomic cloud, so the atoms there are beginning to overlap and feel each other's presence, and thereby want to accumulate into the same quantum state (i.e., want to begin condensing). By contrast, the mean separation is larger than $\lambda_{dB}$ in the outer portion of the cloud, so the atoms there continue to behave classically.

At temperatures below $T_c$, the $N_0$ condensed, ground-state-mode atoms have a spatial Gaussian distribution with radius $\sigma_o$, 8 times smaller than that, $a_o$, of the (classical) excited-state atoms, so the condensation is manifest, visually, by the growth of a sharply peaked core of atoms at the center of the larger, classical, thermalized cloud. In momentum space the condensed atoms and classical cloud also have Gaussian distributions, with rms momenta $p_{\rm cloud} \simeq 8p_{\rm condensate}$. In early experiments, the existence of the condensate was observed by suddenly shutting off the trap and letting the condensate and cloud expand ballistically to sizes $p_{\rm condensate}t/m$ and $p_{\rm cloud}t/m$, and then observing them visually. The condensate was revealed as a sharp Gaussian peak, sticking out of the $\sim 8$ times larger, classical cloud; Fig. 3.6.

Fig. 3.6: Velocity distribution of rubidium atoms in a Bose-condensate experiment by Anderson et al. (1995), as observed by the ballistic expansion method described in the text. In the left frame $T$ is slightly higher than $T_c$ and there is only the classical cloud. In the center frame $T$ is a bit below $T_c$ and the condensate sticks up sharply above the cloud. The right frame, at still lower $T$, shows almost pure condensate. Figure from Cornell (1996).

****************************

EXERCISES

Exercise 3.11 **Example: Onset of Bose-Einstein Condensation

By using more accurate approximations to Eq. (3.57a), explore the onset of the condensation near $T = T_c^0$. More specifically:

(a) Approximate the numerator in (3.57a) by $q^2 + 3q$ and keep the $1/N_0$ term in the exponential. Thereby obtain $N - N_0 =$

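$$\left(\frac{k_BT}{\hbar\omega_o}\right)^3\mathrm{Li}_3(e^{-1/N_0}) + \frac{3}{2}\left(\frac{k_BT}{\hbar\omega_o}\right)^2\mathrm{Li}_2(e^{-1/N_0}). \tag{3.60}$$

Here

$$\mathrm{Li}_n(u) = \sum_{p=1}^{\infty}\frac{u^p}{p^n} \tag{3.61a}$$

is a special function called the polylogarithm (Lewin 1981), which is known to Mathematica and other symbolic manipulation software, and has the properties

$$\mathrm{Li}_n(1) = \zeta(n), \qquad \frac{d\,\mathrm{Li}_n(u)}{du} = \frac{\mathrm{Li}_{n-1}(u)}{u}, \tag{3.61b}$$

where $\zeta(n)$ is the Riemann zeta function.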

(b) Show that by setting $e^{-1/N_0} = 1$ and ignoring the second polylogarithm in Eq. (3.60), we obtain the leading-order description of the condensation discussed in the text: Eqs. (3.57b) and (3.58).

(c) By continuing to set $e^{-1/N_0} = 1$ but keeping the second polylogarithm, obtain an improved equation for $N_0(T)$. Your answer should continue to show a discontinuous turn-on of the condensation, but at a more accurate, slightly lower critical temperature

$$T_c^1 = T_c^0\left[1 - \frac{\zeta(2)}{2\zeta(3)^{2/3}}\frac{1}{N^{1/3}}\right] = 0.979\,T_c^0 . \tag{3.62}$$

This equation illustrates the fact that the approximations we are making are a large-$N$ expansion, i.e., an expansion in powers of $1/N^{1/3}$.

(d) By keeping all details of Eq. (3.60) but rewriting it in terms of $T_c^0$, show that

$$N_0 = N\left[1 - \left(\frac{T}{T_c^0}\right)^3\frac{\mathrm{Li}_3(e^{-1/N_0})}{\zeta(3)} - \frac{3}{2\zeta(3)^{2/3}}\left(\frac{T}{T_c^0}\right)^2\frac{1}{N^{1/3}}\mathrm{Li}_2(e^{-1/N_0})\right]. \tag{3.63}$$

Solve this numerically to obtain $N_0(T/T_c^0)$ for $N = 40{,}000$ and plot your result graphically. It should take the form of the solid curves in Fig. 3.5: a continuous turn-on of the condensation over the narrow temperature range $0.98T_c^0 \lesssim T \lesssim 0.99T_c^0$, i.e., a range $\Delta T \sim T_c^0/N^{1/3}$. In the limit of an arbitrarily large number of atoms, the turn-on is instantaneous, as described by Eq. (3.58a).

Exercise 3.12 **Problem: Discontinuous change of Specific Heat

Analyze the behavior of the atoms' total energy near the onset of condensation, in the limit of arbitrarily large $N$, i.e., keeping only the leading order in our $1/N^{1/3}$ expansion and approximating the condensation as turning on discontinuously at $T = T_c^0$. More specifically:

(a) Show that the total energy of the atoms in the magnetic trap is

$$E_{\rm total} = \frac{3\zeta(4)}{\zeta(3)}Nk_BT_c^0\left(\frac{T}{T_c^0}\right)^4 \quad \text{when } T < T_c^0,$$
$$E_{\rm total} = \frac{3\,\mathrm{Li}_4(e^{\mu/k_BT})}{\zeta(3)}Nk_BT_c^0\left(\frac{T}{T_c^0}\right)^4 \quad \text{when } T > T_c^0, \tag{3.64a}$$

where (at $T > T_c$) $e^{\mu/k_BT}$ is a function of $N$ and $T$ determined by $N = (k_BT/\hbar\omega_o)^3\,\mathrm{Li}_3(e^{\mu/k_BT})$, and $\mu = 0$ at $T = T_c^0$. Because $\mathrm{Li}_n(1) = \zeta(n)$, this energy is continuous across the critical temperature $T_c$.

(b) Show that the specific heat $C = (\partial E_{\rm total}/\partial T)_N$ is discontinuous across the critical temperature $T_c^0$:

$$C = \frac{12\zeta(4)}{\zeta(3)}Nk_B = 10.80\,Nk_B \quad \text{as } T \to T_c \text{ from below},$$
$$C = \left[\frac{12\zeta(4)}{\zeta(3)} - \frac{9\zeta(3)}{\zeta(2)}\right]Nk_B = 4.228\,Nk_B \quad \text{as } T \to T_c \text{ from above}. \tag{3.64b}$$

Note: for a gas contained within the walls of a box, there are two specific heats: $C_V = (\partial E/\partial T)_{N,V}$ when the box volume is held fixed, and $C_P = (\partial E/\partial T)_{N,P}$ when the pressure exerted by the box walls is held fixed. For our trapped atoms, there are no physical walls; the quantity held fixed in place of $V$ or $P$ is the trapping potential $V(r)$.

Exercise 3.13 Problem: Momentum Distributions in Condensate Experiments

Show that in the Bose-Einstein condensate discussed in the text, the momentum distribution for the ground-state-mode atoms is Gaussian with rms momentum $p_{\rm condensate} = \hbar/\sigma_o = \sqrt{\hbar m\omega_o}$ and that for the classical cloud it is Gaussian with rms momentum $p_{\rm cloud} = \sqrt{2mk_BT_c} \simeq \sqrt{2N^{1/3}\hbar m\omega_o} \simeq 8p_{\rm condensate}$.

Exercise 3.14 Problem: Bose-Einstein Condensation in a Cubical Box

Analyze Bose-Einstein condensation in a cubical box with edge lengths $L$, i.e., for a potential $V(x,y,z)$ that is zero inside the box and infinite outside it. Show, in particular, using the analog of the text's simplest approximation, that the critical temperature at which condensation begins is

$$T_c = \frac{1}{2\pi mk_B}\left[\frac{2\pi\hbar}{L}\left(\frac{N}{\zeta(3/2)}\right)^{1/3}\right]^2, \tag{3.65a}$$

and the number of atoms in the ground-state condensate, when $T < T_c$, is

$$N_0 = N\left[1 - \left(\frac{T}{T_c}\right)^{3/2}\right]. \tag{3.65b}$$

****************************
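For readers who want to check their answers to Exercises 3.11 and 3.12, here is a hedged Python sketch; it is one possible approach, not the method the exercises intend. Assumptions: the polylogarithms are needed only at $u = e^{-w}$ with $w = 1/N_0 > 0$; for $w \ge 1$ we sum the defining series (3.61a) directly, and for $w < 1$ we use the standard small-$w$ expansion of $\mathrm{Li}_n(e^{-w})$ about $u = 1$ (quoted without derivation and truncated so that its error is of order $10^{-8}$ at $w = 1$). Eq. (3.63) is solved by bisection in $\ln N_0$, which works because the residual increases monotonically with $N_0$. All function names are hypothetical.

```python
import math

ZETA2 = math.pi**2 / 6
ZETA3 = 1.2020569031595942
ZETA4 = math.pi**4 / 90

def li_series(n, u, terms=200):
    """Defining series (3.61a); used here only for u <= 1/e, where it converges fast."""
    return sum(u**p / p**n for p in range(1, terms + 1))

def li2_exp(w):
    """Li_2(e^{-w}) for w > 0."""
    if w >= 1.0:
        return li_series(2, math.exp(-w))
    # small-w expansion about u = 1, truncated after the w^7 term
    return (ZETA2 + w * math.log(w) - w - w**2 / 4
            + w**3 / 72 - w**5 / 14400 + w**7 / 1270080)

def li3_exp(w):
    """Li_3(e^{-w}) for w > 0."""
    if w >= 1.0:
        return li_series(3, math.exp(-w))
    # small-w expansion about u = 1, truncated after the w^8 term
    return (ZETA3 - ZETA2 * w + 0.5 * w**2 * (1.5 - math.log(w))
            + w**3 / 12 - w**4 / 288 + w**6 / 86400 - w**8 / 10160640)

def residual(N0, t, N):
    """N0 minus the right-hand side of Eq. (3.63); t = T / T_c^0."""
    w = 1.0 / N0
    rhs = N * (1.0 - t**3 * li3_exp(w) / ZETA3
               - 1.5 / ZETA3 ** (2 / 3) * t**2 / N ** (1 / 3) * li2_exp(w))
    return N0 - rhs

def condensate_number(t, N=40_000):
    """Solve Eq. (3.63) for N0 by bisection in ln N0 (N0 spans many decades)."""
    lo, hi = 1e-9, float(N)
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if residual(mid, t, N) > 0:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)

def tc1_over_tc0(N):
    """First-order critical temperature of Eq. (3.62), in units of T_c^0."""
    return 1.0 - ZETA2 / (2 * ZETA3 ** (2 / 3)) / N ** (1 / 3)

# Specific-heat limits of Eq. (3.64b), in units of N k_B
c_below = 12 * ZETA4 / ZETA3
c_above = c_below - 9 * ZETA3 / ZETA2

print(f"T_c^1/T_c^0 = {tc1_over_tc0(40_000):.4f}")
for t in (0.90, 0.95, 0.98, 0.99, 1.00):
    print(f"T/T_c^0 = {t:.2f}: N0 = {condensate_number(t):.1f}")
print(f"C/(N k_B): {c_below:.3f} below, {c_above:.3f} above")
```

Evaluating `condensate_number` on a fine grid of $t$ values and plotting the result is one way to generate curves for comparison with Fig. 3.5(b).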

3.11 T2 Statistical Mechanics in the Presence of Gravity

Systems with significant gravity behave quite differently, statistical mechanically, than systems without gravity. This has led to much controversy as to whether statistical mechanics can really be applied to gravitating systems. Despite that controversy, statistical mechanics has been applied in the presence of gravity in a variety of ways, with great success, and with important, fundamental conclusions. In this section we sketch some of those applications: to galaxies, black holes, the universe as a whole, and the formation of structure in the universe. Our discussion is intended to give just the flavor of these subjects and not full details, so we shall state a number of things without derivation. This is necessary in part because many of the phenomena we shall describe rely for their justification on general relativity (Part VI) and/or quantum field theory in curved spacetime (not treated in this book).


3.11.1 T2 Galaxies

We shall idealize a galaxy as a spherically symmetric cluster of $N \sim 10^{11}$ stars, each with mass $m$ comparable to that of the sun, $m \sim 1M_\odot$.$^{10}$ The stars are bound under their collective self-gravity. (In fact we know that there is also dark matter present, but we will ignore its effects. In addition, most galaxies, including our own, are not spherical, a fact we shall also ignore.) Now the characteristic size of a galaxy is $R \sim 10$ kiloparsecs (kpc).$^{11}$ Therefore the magnitude of its Newtonian gravitational potential in units of $c^2$ is $|\Phi/c^2| \sim GNm/Rc^2 \sim 10^{-6} \ll 1$ (where $G$ is Newton's gravitational constant) and the characteristic speed of the stars in the galaxy is $v \sim (GNm/R)^{1/2} \sim 200$ km s$^{-1} \ll c$, which means the gravity and stars can be analyzed in the nonrelativistic, Newtonian framework (cf. Fig. 1.1 and Part VI). The time it takes stars to cross the galaxy is $\tau_{\rm int} \sim 2R/v \sim 10^8$ yr.$^{12}$ This time is short compared with the age of a galaxy, $\sim 10^{10}$ yr. Galaxies have distant encounters with their neighbors on timescales that can be smaller than their ages but still much longer than $\tau_{\rm int}$; in this sense, they can be thought of as semiclosed systems, weakly coupled to their environments. However, we shall idealize our chosen galaxy as fully closed (no interaction with its environment). Direct collisions between stars are exceedingly rare, and strong two-star gravitational encounters, which happen when the impact parameter is smaller than $\sim Gm/v^2 \sim R/N$, are also negligibly rare. We can therefore regard each of our galaxy's stars as moving under a gravitational potential determined by the smoothed-out mass of all the other stars, and can use Hamiltonian dynamics to describe their motions.
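The order-of-magnitude numbers in the preceding paragraph follow directly from the footnoted unit conversions. A minimal Python sketch, assuming the standard values of $G$ and $c$ and the round numbers $N = 10^{11}$, $m = 1M_\odot$, $R = 10$ kpc from the text; it also shows that $Gm/v^2$ equals $R/N$ identically once $v^2 = GNm/R$.

```python
import math

G = 6.674e-11            # m^3 kg^-1 s^-2
C = 2.998e8              # m/s
M_SUN = 2e30             # kg (footnote value)
KPC = 3.1e19             # m (3.1e16 km)
YEAR = math.pi * 1e7     # s

N = 1e11                 # number of stars
m = 1 * M_SUN            # mass of each star
R = 10 * KPC             # characteristic galaxy radius

GM_over_R = G * N * m / R
phi_over_c2 = GM_over_R / C**2          # |Phi/c^2|
v = math.sqrt(GM_over_R)                # characteristic stellar speed
tau_int = 2 * R / v                     # crossing time
b_strong = G * m / v**2                 # impact parameter for a strong two-star encounter

print(f"|Phi/c^2| ~ {phi_over_c2:.1e}")
print(f"v ~ {v/1e3:.0f} km/s, tau_int ~ {tau_int/YEAR:.1e} yr")
print(f"Gm/v^2 = {b_strong:.2e} m, R/N = {R/N:.2e} m")
```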
We imagine that we have an ensemble of such galaxies, all with the same number of stars $N$, the same mass $M = Nm$, and the same energy $E$ (in a tiny range $\delta E$), and we begin our study of that ensemble by making an order-of-magnitude estimate of the probability $\rho$ of finding a chosen galaxy from the ensemble in some chosen (single-particle) quantum state. We compute that probability from the corresponding probabilities for its subsystems, individual stars: The phase-space volume available to each star in the galaxy is $\sim R^3(mv)^3$, the density of states in each star's phase space is $1/h^3$, the number of available states is the product of these, $\sim (Rmv/h)^3$, and the probability of the star occupying the chosen state, or any other state, is the reciprocal of this, $\sim (h/Rmv)^3$. The probability of the galaxy occupying a state in its phase space is the product of the probabilities for each of its $N$ stars [Eq. (3.17c)]:

$$\rho \sim \left(\frac{h}{Rmv}\right)^{3N} \sim 10^{-2.7\times10^{13}}. \tag{3.66}$$

This very small number suggests that it is somewhat silly of us to use quantum mechanics to normalize the distribution function (i.e., silly to use the probabilistic distribution function $\rho$), when dealing with a system as classical as a whole galaxy. Silly, perhaps; but dangerous, no. The key point is that, so far as classical statistical mechanics is concerned, the only important feature of $\rho$ is that it is proportional to the classical distribution function $N_{\rm sys}$; its absolute normalization is usually not important, classically. It was this fact that permitted so much progress to be made in statistical mechanics prior to the advent of quantum mechanics.

Footnote 10: $1M_\odot \equiv 2\times10^{30}$ kg.
Footnote 11: 1 kpc $\equiv 3.1\times10^{16}$ km.
Footnote 12: 1 yr $\sim \pi\times10^7$ s.

Are real galaxies in statistical equilibrium? To gain insight into this question, we shall estimate the entropy of a galaxy in our ensemble and shall then ask whether that entropy has any chance of being the maximum value allowed to the galaxy's stars (as it must be if the galaxy is in statistical equilibrium).

Obviously, the stars (by contrast with electrons) are distinguishable, so we can assume multiplicity $\mathcal{M} = 1$ when estimating the galaxy's entropy. Ignoring the (negligible) correlations between stars, the entropy computed by integrating $\rho\ln\rho$ over the galaxy's full $6N$-dimensional phase space is just $N$ times the entropy associated with a single star, which is approximately $S \sim Nk_B\ln(\Delta\Gamma/h^3)$ [Eqs. (3.33) and (3.8a)], where $\Delta\Gamma$ is the phase-space volume over which the star wanders in its ergodic, Hamiltonian-induced motion, i.e., the phase-space volume available to the star. We shall express this entropy in terms of the galaxy's total mass $M = Nm$ and its total nonrelativistic energy $\mathcal{E} \equiv E - Mc^2 \sim -GM^2/2R$. Since the characteristic stellar speed is $v \sim (GM/R)^{1/2}$, the volume of phase space over which the star wanders is $\Delta\Gamma \sim (mv)^3R^3 \sim (GMm^2R)^{3/2} \sim (-G^2M^3m^2/2\mathcal{E})^{3/2}$, and the entropy therefore is

$$S_{\rm Galaxy} \sim (M/m)k_B\ln(\Delta\Gamma/h^3) \sim (3M/2m)k_B\ln(-G^2M^3m^2/2\mathcal{E}h^2). \tag{3.67}$$
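Because $\rho$ in Eq. (3.66) underflows any floating-point number, the estimate is best made in logarithms. A minimal Python sketch, assuming the text's round values $R = 10$ kpc, $m = 1M_\odot$, $v = 200$ km/s and $N = 10^{11}$; it also gives the entropy per star, $S/(Nk_B) \sim \ln(\Delta\Gamma/h^3) = 3\ln(Rmv/h)$, which enters Eq. (3.67).

```python
import math

H = 6.626e-34            # Planck's constant, J s
M_SUN = 2e30             # kg
KPC = 3.1e19             # m

N = 1e11                 # number of stars
m = M_SUN                # stellar mass
R = 10 * KPC             # galaxy radius
v = 2e5                  # m/s, the text's ~200 km/s

states_per_dim = R * m * v / H                   # Rmv/h; the single-star state count is its cube
log10_rho = -3 * N * math.log10(states_per_dim)  # log10 of Eq. (3.66)
entropy_per_star = 3 * math.log(states_per_dim)  # S/(N k_B) ~ ln(Delta Gamma / h^3)

print(f"log10 rho ~ {log10_rho:.2e}")
print(f"S/(N k_B) ~ {entropy_per_star:.0f}")
```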

Is this the maximum possible entropy available to the galaxy, given the constraints that its mass be $M$ and its nonrelativistic energy be $\mathcal{E}$? No. Its entropy can be made larger by removing a single star from the galaxy to radius $r \gg R$, where the star's energy is negligible. The entropy of the remaining stars will decrease slightly since the mass $M$ diminishes by $m$ at constant $\mathcal{E}$. However, the entropy associated with the removed star, $\sim (3/2)k_B\ln(GMm^2r/h^2)$, can be made arbitrarily large by making its radius $r$ arbitrarily large. Therefore, by this thought experiment we discover that galaxies cannot be in a state of maximum entropy at fixed $\mathcal{E}$ and $M$, and they therefore cannot be in a true statistical equilibrium state.$^{13}$ (One might wonder whether there is entropy associated with the galaxy's gravitational field, and whether that entropy invalidates our analysis. The answer is no. The gravitational field has no randomness beyond that of the stars themselves, and thus no entropy; its structure is uniquely determined, via Newton's field equation, by the stars' spatial distribution.)

In a real galaxy or other star cluster, rare near-encounters between stars in the cluster core cause individual stars to be ejected from the core into distant orbits or to be ejected from the cluster altogether. These ejections increase the cluster's entropy in just the manner of our thought experiment. The core of the galaxy shrinks, a diffuse halo grows, and the total number of stars in the galaxy gradually decreases. This evolution to ever larger entropy is demanded by the laws of statistical mechanics, but by contrast with systems without gravity, it does not bring the cluster to statistical equilibrium. The long-range influence of gravity prevents a true equilibrium from being reached. Ultimately, the cluster's core may collapse to form a black hole.

Footnote 13: A true equilibrium can be achieved if the galaxy is enclosed in an idealized spherical box whose walls prevent stars from escaping, or if the galaxy lives in an infinite thermalized bath of stars so that, on average, whenever one star is ejected into a distant orbit in the bath, another gets injected into the galaxy; see, e.g., Ogorodnikov (1965) and Lynden-Bell (1967). However, in the real universe galaxies are not surrounded by walls or by thermalized star baths.


3.11.2 T2 Black Holes

Quantum field theory predicts that, near the horizon of a black hole, the vacuum fluctuations of quantized fields behave thermally, as seen by stationary (non-infalling) observers. More specifically, such observers see the horizon surrounded by an atmosphere that is in statistical equilibrium (a "thermalized" atmosphere) and that rotates with the same angular velocity $\Omega_H$ as the hole's horizon.$^{14}$ The atmosphere contains all types of particles that can exist in nature. Very few of the particles manage to escape from the hole's gravitational pull; most emerge from the horizon, fly up to some maximum height, then fall back down to the horizon. Only if they start out moving almost vertically upward (i.e., with near-zero angular momentum) do they have any hope to escape. The few that do escape make up a tiny trickle of "Hawking radiation" that will ultimately cause the black hole to evaporate, unless it grows more rapidly due to infall of material from the external universe.

In discussing the distribution function for the hole's thermalized, rotating atmosphere, one must take account of the fact (Part VI) that the locally measured energy of a particle decreases as it climbs out of the hole's gravitational field; one does so by attributing to the particle the energy that it would ultimately have if it were to escape from the hole's gravitational grip. This is called the particle's "redshifted" energy and is denoted by $E_\infty$. This $E_\infty$ is conserved along the particle's world line, as is the projection $\mathbf{j}\cdot\hat{\boldsymbol{\Omega}}_H$ of the particle's orbital angular momentum $\mathbf{j}$ along the hole's spin axis (unit direction $\hat{\boldsymbol{\Omega}}_H$).

The hole's horizon behaves like the wall of a black-body cavity: Into each upgoing mode $a$ of any and every quantum field that can exist in nature, it deposits particles that are thermalized with (redshifted) temperature $T_H$, vanishing chemical potential, and angular velocity $\Omega_H$.
As a result, the mode's distribution function (probability of finding $N_a$ particles in it with net redshifted energy $E_{a\,\infty} = N_a \times$ (redshifted energy of one quantum in the mode) and net axial component of angular momentum $\mathbf{j}_a\cdot\hat{\boldsymbol{\Omega}}_H = N_a \times$ (angular momentum of one quantum in the mode)) is [Eq. (3.21)]

$$\rho_a = C\exp\left[\frac{-E_{a\,\infty} + \boldsymbol{\Omega}_H\cdot\mathbf{j}_a}{k_BT_H}\right]. \tag{3.68}$$

The distribution function for the entire thermalized atmosphere (made of all modes that emerge from the horizon) is, of course, $\rho = \prod_a\rho_a$. ("Ingoing" modes, which originate at infinity, i.e., far from the black hole, are not thermalized; they contain whatever the Universe chooses to send toward the hole.) Because $E_{a\,\infty}$ is the redshifted energy in mode $a$, $T_H$ is similarly a redshifted temperature: it is the temperature that the Hawking radiation exhibits when it has escaped from the hole's gravitational grip. Near the horizon the locally measured atmospheric temperature is gravitationally blueshifted to much higher values than $T_H$.

The temperature $T_H$ and angular velocity $\Omega_H$, like all properties of a black hole, are determined completely by the hole's spin angular momentum $J_H$ and its mass $M_H$; and, to within factors of order unity, they have magnitudes

$$T_H \sim \frac{\hbar}{8\pi k_BGM_H/c^3} \simeq \frac{6\times10^{-8}\ \mathrm{K}}{M_H/M_\odot}, \qquad \Omega_H \sim \frac{J_H}{M_H(2GM_H/c^2)^2}. \tag{3.69}$$

Footnote 14: This remarkable conclusion, due to Stephen Hawking, William Unruh, and Paul Davies, is discussed pedagogically in a book by Thorne, Price and Macdonald (1986) and more rigorously in a book by Wald (1994). For detailed but fairly brief analyses along the lines of this section, see Zurek and Thorne (1986) and Frolov and Page (1994). For a review of the literature on black-hole thermodynamics and of conundrums swept under the rug in our simple-minded discussion, see Wald (2001).
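To see just how small $T_H$ is, the following Python sketch evaluates the temperature for a few masses. It assumes a nonrotating (Schwarzschild) hole, for which $k_BT_H = \hbar c^3/(8\pi GM_H)$ holds exactly, and standard values of the constants; the $10^{12}$ kg mass is a hypothetical example of a tiny primordial hole, not a value from the text.

```python
import math

HBAR = 1.054571817e-34   # J s
KB = 1.380649e-23        # J/K
G = 6.674e-11            # m^3 kg^-1 s^-2
C = 2.998e8              # m/s
M_SUN = 1.989e30         # kg

def hawking_temperature(M):
    """Redshifted temperature T_H of a nonrotating hole of mass M (kg), in kelvin."""
    return HBAR * C**3 / (8 * math.pi * G * M * KB)

for M in (M_SUN, 10 * M_SUN, 1e12):   # 1e12 kg: hypothetical primordial hole
    print(f"M = {M:.2e} kg: T_H = {hawking_temperature(M):.2e} K")
```

Since $T_H \propto 1/M_H$, only holes far less massive than the sun have temperatures that are physically interesting.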

Notice how small the hole's temperature is, if its mass is greater than or of order $M_\odot$. For such holes the thermal atmosphere is of no practical interest, though it has deep implications for fundamental physics. Only for tiny black holes (that might, e.g., have been formed in the big bang) is $T_H$ high enough to be physically interesting.

Suppose that the black hole evolves much more rapidly by accreting matter than by emitting Hawking radiation. Then the evolution of its entropy can be deduced from the first law of thermodynamics for its atmosphere. By techniques analogous to those in Sec. 3.8 [Eqs. (3.36)–(3.45)] above, one can argue that the atmosphere's equilibrium distribution (3.68) implies the following form for the first law (where we set $c = 1$):

$$dM_H = T_H\,dS_H + \boldsymbol{\Omega}_H\cdot d\mathbf{J}_H . \tag{3.70}$$

Here $dM_H$ is the change of the hole's mass due to the accretion (with each infalling particle contributing its $E_\infty$ to $dM_H$), $d\mathbf{J}_H$ is the change of the hole's spin angular momentum due to the accretion (with each infalling particle contributing its $\mathbf{j}$), and $dS_H$ is the increase of the black hole's entropy.

Because this first law can be deduced (as described above) via the techniques of statistical mechanics, it can be argued (e.g., Zurek and Thorne 1986) that the hole's entropy increase has the standard statistical mechanical origin and interpretation: If $N_{\rm states}$ is the total number of quantum states that the infalling material could have been in (subject only to the requirement that the total infalling mass-energy be $dM_H$ and total infalling angular momentum be $d\mathbf{J}_H$), then $dS_H = k_B\ln N_{\rm states}$ [cf. Eq. (3.31)]. In other words, the hole's entropy increases by $k_B$ times the logarithm of the number of quantum mechanically different ways that we could have produced its changes of mass and angular momentum, $dM_H$ and $d\mathbf{J}_H$. Correspondingly, we can regard the hole's total entropy as $k_B$ times the logarithm of the number of ways in which it could have been made. That number of ways is enormous, and correspondingly the hole's entropy is enormous: The above analysis, when carried out in full detail, reveals that the entropy is

$$S_H = k_B\frac{A_H}{4l_P^2} \sim 1\times10^{77}k_B\left(\frac{M_H}{M_\odot}\right)^2, \tag{3.71}$$

where $A_H \sim 4\pi(2GM_H/c^2)^2$ is the surface area of the hole's horizon and $l_P = \sqrt{G\hbar/c^3} = 1.616\times10^{-33}$ cm is the Planck length, a result first speculated by Bekenstein and proved by Hawking.

What is it about a black hole that leads to this peculiar thermal behavior and enormous entropy? Why is a hole so different from a star or galaxy? The answer lies in the black-hole horizon and the fact that things which fall inward through the horizon cannot get back out. In quantum field theory it is the horizon that produces the thermal behavior.
In statistical mechanics it is the horizon that produces the loss of information about how the black hole was made and the corresponding entropy increase. In this sense, the horizon for a black hole plays a role analogous to coarse graining in conventional classical statistical mechanics.$^{15}$

The above statistical mechanical description of a hole's atmosphere and thermal behavior is based on the laws of quantum field theory in curved spacetime, laws in which the atmosphere's fields (electromagnetic, neutrino, etc.) are quantized but the hole itself is not governed by the laws of quantum mechanics. As of 2008, a much deeper understanding is arising from string theory, the most promising approach to quantum gravity and to quantization of black holes. Indeed, the thermal properties of black holes, and most especially their entropy, are a powerful testing ground for candidate theories of quantum gravity.
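The magnitude quoted in Eq. (3.71) can be checked numerically. The Python sketch below assumes a nonrotating hole, so that $A_H = 4\pi(2GM_H/c^2)^2$ exactly, and standard values of the constants; the function name is hypothetical.

```python
import math

HBAR = 1.054571817e-34   # J s
G = 6.674e-11            # m^3 kg^-1 s^-2
C = 2.998e8              # m/s
M_SUN = 1.989e30         # kg

l_P = math.sqrt(G * HBAR / C**3)          # Planck length, m

def horizon_entropy_over_kB(M):
    """S_H / k_B = A_H / (4 l_P^2) for a nonrotating hole of mass M (kg)."""
    r_H = 2 * G * M / C**2                # horizon radius
    area = 4 * math.pi * r_H**2
    return area / (4 * l_P**2)

print(f"l_P = {l_P*100:.4e} cm")
print(f"S_H/k_B for one solar mass: {horizon_entropy_over_kB(M_SUN):.2e}")
```

Because $A_H \propto M_H^2$, doubling the mass quadruples the entropy, in line with the $(M_H/M_\odot)^2$ scaling of Eq. (3.71).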

3.11.3 The Universe$^{16}$

Observations and theory agree that the universe began in a highly thermalized state, with all types of particles (except gravitons) in mutual statistical equilibrium. As the universe expanded, the energies of the particles and the equilibrium temperature all got cosmologically redshifted by the same amount, thereby maintaining the equilibrium; see Part VI. In the absence of gravity, this expanding equilibrium state would have had maximum entropy, and the second law of thermodynamics would have forbidden the development of galaxies, stars, and people.

Fortunately, gravity exists, and through its long-range force it produced condensations with an attendant increase of entropy. The enormous power of gravity to do this is epitomized by the following: After an early epoch of "inflation" (Part VI), the universe's expansion settled into a more leisurely radiation-dominated form in which, when its age was $t$, a region with radius $R \sim t$ was able to communicate with itself. Here and below we set $c = 1$. As we shall show in Part VI, at time $t$ the temperature of this region was given by $k_BT \sim E_P\sqrt{t_P/t}$, where $E_P = \sqrt{\hbar c^5/G} \sim 10^{19}$ GeV and $t_P = \sqrt{\hbar G/c^5} \sim 10^{-43}$ s are the Planck energy and Planck time.$^{17}$ As we shall also show in Part VI, the number of particles inside the communicable region was $N \sim (t/t_P)^{3/2}$ ($\sim 10^{91}$ today). Since each particle, in order of magnitude, carried an entropy $k_B$ (cf. Sec. 3.9), the total entropy in the communicable region at time $t$ was $S \sim k_B(t/t_P)^{3/2}$.

This actual entropy should be compared with the maximum entropy that could be produced, in the communicable region, with the aid of gravity. That maximum entropy would arise if we were to collapse the entire mass $M \sim Nk_BT \sim E_P(t/t_P)$ of the communicable region into a black hole, thereby giving up all information about the content of the communicable region except its mass and (approximately vanishing) angular momentum: $S_{\max} \sim k_B(GM)^2/l_P^2 \sim k_B(t/t_P)^2$.

Notice that near the Planck time (when space and time as we know them presumably were coming into existence), the actual entropy in the communicable region and the maximum achievable entropy were of the same order, $S \sim S_{\max} \sim k_B$, so it was difficult to produce gravitational condensations. However, as the universe expanded, the actual entropy $S \sim k_B(t/t_P)^{3/2}$ (in the radiation-dominated era) lagged further and further behind the maximum achievable entropy $S_{\max} \sim k_B(t/t_P)^2$, so gravitational condensations became more and more favorable, statistically. Ultimately, those condensations did occur under the inexorable driving force of gravity, producing (with aid from the other fundamental forces) galaxies, stars, and people.

Footnote 15: It is not completely clear in 2008 whether the information about what fell into the hole gets completely lost, in principle as well as in practice, or whether, as with coarse-graining, the loss of information is the physicist's fault. There is strong evidence that the information is somehow retained in the black-hole atmosphere and we have just coarse-grained it away by our faulty understanding of the relevant black-hole quantum mechanics.

Footnote 16: This subsection is based on Frautschi (1982), which contains many fascinating ideas and facts about entropy in the expanding universe.

Footnote 17: This formula predicts a temperature today for the cosmic microwave radiation $T_o \sim 100$ K that is too high by a factor $\sim 30$ because the details of the expansion changed, altering this formula a bit, when the universe was about a million years old and nonrelativistic matter became the dominant source of gravity rather than radiation; see Part VI.
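The scalings $S \sim k_B(t/t_P)^{3/2}$ and $S_{\max} \sim k_B(t/t_P)^2$ are easy to evaluate. This Python sketch assumes standard values of $\hbar$, $G$, $c$ and a present age of about 13.8 billion years (a value not given in the text), and, like the text's "$\sim 10^{91}$ today", simply applies the radiation-era formulas to the present as an order-of-magnitude illustration.

```python
import math

HBAR = 1.054571817e-34   # J s
G = 6.674e-11            # m^3 kg^-1 s^-2
C = 2.998e8              # m/s
GEV = 1.602176634e-10    # joules per GeV

t_P = math.sqrt(HBAR * G / C**5)          # Planck time, s
E_P = math.sqrt(HBAR * C**5 / G)          # Planck energy, J
t_now = 13.8e9 * 3.156e7                  # assumed present age of the universe, s

ratio = t_now / t_P
N = ratio**1.5            # particles in the communicable region, also S/k_B in order of magnitude
S_max = ratio**2          # maximum (black-hole) entropy S_max/k_B

print(f"t_P = {t_P:.1e} s, E_P = {E_P/GEV:.1e} GeV")
print(f"log10 N ~ {math.log10(N):.1f}, log10 S_max/k_B ~ {math.log10(S_max):.1f}")
print(f"S/S_max ~ {N/S_max:.1e}")
```

The ratio $S/S_{\max} \sim (t_P/t)^{1/2}$ equals unity at the Planck time and becomes tiny later, which is the quantitative content of the argument above.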

3.11.4

T2 Structure Formation in the Expanding Universe: Violent Relaxation and Phase Mixing

The formation of stars and galaxies (“structure”) by gravitational condensation provides a nice illustration of the phase mixing and coarse graining that underlie the second law of thermodynamics (Sec. 3.7.2 above). It is believed that a galaxy’s stars formed out of the universe’s almost uniform, expanding gas, when slight overdensities (presumably irregular in shape) stopped expanding and began to contract under their mutual gravitational attraction. This gas had little internal motion, so the stars were formed with very small relative velocities. Correspondingly, although the physical volume $V_x$ occupied by the galaxy’s N stars was initially somewhat larger than its volume today, its stars’ kinetic-theory momentum-space volume $V_p$ was far smaller than today. Translated into the language of an ensemble of such galaxies, the initial coordinate-space volume $\int d^{3N}x \sim V_x^N$ occupied by the ensemble’s galaxies was moderately larger than today, while its momentum-space volume $\int d^{3N}p \sim V_p^N$ was far smaller. The phase-space volume must therefore have increased considerably during the galaxy formation, with the increase due to a big increase in the relative momenta of neighboring stars. For this to occur, the stars must have changed their relative energies during the collapse, and this required a time-dependent Hamiltonian. In other words, the gravitational potential Φ felt by the stars must have varied rapidly, so that the individual stellar energies would vary according to

$$\frac{dE}{dt} = \frac{\partial H}{\partial t} = m\,\frac{\partial \Phi}{\partial t} . \qquad (3.72)$$

The largest changes of energy occurred when the galaxy was collapsing dynamically, so the potential changed significantly on the time it took stars to cross the galaxy, $\tau_{\rm int} \sim R/v$. Numerical simulations show that this energy transfer was highly efficient. This process is known as violent relaxation.
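Eq. (3.72) can be checked on a toy problem. The sketch below is an illustration under stated assumptions, not a galaxy model: a single star of mass m moves in a hypothetical one-dimensional "breathing" potential Φ(x, t) = ½ ω(t)² x² with ω(t) = 1 + 0.5 sin t, in arbitrary units. It integrates the orbit with a classical fourth-order Runge-Kutta step while also accumulating ∫ m ∂Φ/∂t dt along the orbit, and shows that the star's energy change equals that integral; for a static potential both would vanish.

```python
import math


def omega(t):
    # Hypothetical time-dependent frequency: the potential well "breathes" in and out.
    return 1.0 + 0.5 * math.sin(t)


def domega_dt(t):
    return 0.5 * math.cos(t)


def potential(x, t):
    # Phi(x, t) = (1/2) omega(t)^2 x^2, potential per unit mass
    return 0.5 * omega(t) ** 2 * x ** 2


def rhs(t, state, m):
    x, v, _ = state
    accel = -omega(t) ** 2 * x                   # -dPhi/dx
    dphi_dt = omega(t) * domega_dt(t) * x ** 2   # partial Phi / partial t at fixed x
    return (v, accel, m * dphi_dt)               # last slot: m dPhi/dt, right side of Eq. (3.72)


def rk4_step(t, state, dt, m):
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = rhs(t, state, m)
    k2 = rhs(t + dt / 2, shifted(k1, dt / 2), m)
    k3 = rhs(t + dt / 2, shifted(k2, dt / 2), m)
    k4 = rhs(t + dt, shifted(k3, dt), m)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def energy(x, v, t, m):
    return 0.5 * m * v ** 2 + m * potential(x, t)


m, t, dt = 1.0, 0.0, 1e-3
state = (1.0, 0.0, 0.0)          # x, v, accumulated integral of m dPhi/dt
E_start = energy(state[0], state[1], t, m)
for _ in range(20_000):          # integrate to t = 20
    state = rk4_step(t, state, dt, m)
    t += dt
E_end = energy(state[0], state[1], t, m)
print(f"Delta E = {E_end - E_start:.8f}, integral of m dPhi/dt = {state[2]:.8f}")
```

The two printed numbers agree to the integrator's accuracy, which is the content of Eq. (3.72): only the explicit time dependence of the potential can change a star's energy.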
Although violent relaxation could create the observed stellar distribution functions, it was not by itself a means of diluting the phase-space density, since Liouville’s theorem still applied. The mechanism that changed the phase-space density was phase mixing and coarse-graining (Sec. 3.7.2 above). During the initial collapse, when the stars were newly formed, they could be thought of as following highly perturbed radial orbits. The orbits of nearby stars were somewhat similar, though not identical. This means that small elements of phase space became highly contorted as the stars moved along their phase-space paths.

Let us make a simple model of this process by assuming the individual stars initially populated a fraction $f \ll 1$ of the final occupied phase-space volume $V_{\rm final}$. After one dynamical timescale $\tau_{\rm int} \sim R/v$, this small volume $f V_{\rm final}$ is (presumably) deformed into a convoluted surface that folds back upon itself once or twice, while still occupying the same volume $f V_{\rm final}$. After n dynamical times, there are $\sim 2^n$ such folds (cf. Fig. 3.2 (b) above). After $n \sim -\log_2 f$ dynamical timescales, the spacing between folds becomes comparable with the characteristic thickness of this convoluted surface, and it is no longer practical to distinguish the original distribution function. Coarse-graining has been accomplished for all practical purposes; only a pathological physicist would resist it and insist on trying to continue keeping track of which contorted phase-space regions have the original high density and which do not. For a galaxy we might expect that $f \sim 10^{-3}$, and so this natural coarse-graining can occur in a time $\sim -\log_2(10^{-3})\,\tau_{\rm int} \sim 10\,\tau_{\rm int} \sim 10^9$ yr, which is 10 times shorter than the present age of galaxies. It need therefore not be a surprise that the galaxy we know best, our own Milky Way, exhibits no obvious vestigial trace of its initial high-density (low phase-space-volume) distribution function.
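A minimal sketch of phase mixing and coarse-graining, assuming a hypothetical "shear" model rather than real galactic orbits: each star keeps its action J and advances its orbital angle θ at a rate Ω(J) = J. The map (J, θ) → (J, θ + Jt) preserves phase-space area, as Liouville's theorem requires, yet a compact initial patch is wound into ever thinner filaments. Counting the coarse cells the patch touches (a 20 × 20 grid, an arbitrary choice) shows the coarse-grained volume growing while the fine-grained volume stays fixed. The last line evaluates the text's estimate n ∼ −log₂ f for f = 10⁻³.

```python
import math


def coarse_grained_fraction(t, n_pts=200, bins=20):
    """Fraction of coarse cells in the (J, theta) square [1, 2) x [0, 2 pi) touched,
    at time t, by an initially compact patch of n_pts x n_pts sample points.

    Hypothetical shear model: J is conserved and theta advances at rate Omega(J) = J,
    so the fine-grained patch area is conserved (Liouville)."""
    occupied = set()
    for i in range(n_pts):
        J = 1.0 + 0.1 * (i + 0.5) / n_pts                    # patch covers 10% of the J range
        for k in range(n_pts):
            theta0 = 2 * math.pi * 0.1 * (k + 0.5) / n_pts   # ... and 10% of the angle range
            theta = (theta0 + J * t) % (2 * math.pi)
            occupied.add((int((J - 1.0) * bins), int(theta / (2 * math.pi) * bins)))
    return len(occupied) / bins**2


fine_grained = 0.1 * 0.1   # conserved fraction of the square occupied by the patch
for t in (0.0, 20.0, 100.0, 500.0):
    print(f"t = {t:5.0f}: fine-grained {fine_grained:.2f}, "
          f"coarse-grained {coarse_grained_fraction(t):.3f}")

# Number of dynamical times for the folds to reach the coarse-graining scale, n ~ -log2 f
print(f"n ~ {-math.log2(1e-3):.1f} dynamical times for f = 1e-3")
```

The coarse-grained fraction rises from 0.01 to 0.1, the full strip of actions the patch started in. In this toy model J is conserved, so the mixing happens only in angle; violent relaxation in a real galaxy also changes the stars' energies.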

3.12

T2 Entropy and Information

3.12.1

T2 Information Gained When Measuring the State of a System in a Microcanonical Ensemble

In Sec. 3.7 above, we said that “Entropy is a measure of our lack of information about the state of any system chosen at random from an ensemble”. In this section we shall make this heuristic statement precise by introducing a precise definition of information.

Consider a microcanonical ensemble of identical systems. Each system can reside in any one of a finite number, $N_{\rm states}$, of quantum states, which we label by integers $n = 1, 2, 3, \ldots, N_{\rm states}$. Because the ensemble is microcanonical, all $N_{\rm states}$ states are equally probable; they have probabilities $\rho_n = 1/N_{\rm states}$. Therefore, the entropy of any system chosen at random from this ensemble is $S = -k_B \sum_n \rho_n \ln \rho_n = k_B \ln N_{\rm states}$ [Eqs. (3.30) and (3.31)].

Suppose, now, that we measure the state of our chosen system, and find it to be (for example) state number 238 out of the $N_{\rm states}$ equally probable states. How much information have we gained? For this thought experiment, and more generally (see below), the amount of information gained, expressed in bits, is defined to be the minimum number of binary digits required to distinguish the measured state from all the other $N_{\rm states}$ states that the system could have been in. To evaluate this information gain, we label each state n by the number n − 1 written in binary code (state n = 1 is labeled by {000}, state n = 2 is labeled by {001}, 3 is {010}, 4 is {011}, 5 is {100}, 6 is {101}, 7 is {110}, 8 is {111}, etc.). If $N_{\rm states} = 4$, then the number of binary digits needed is 2 (the leading 0 in the enumeration above can be dropped), so in measuring the system’s state we gain 2 bits of information. If $N_{\rm states} = 8$, the number of binary digits needed is 3, so our measurement gives us 3 bits of information. In general, we need $\log_2 N_{\rm states}$ binary digits to distinguish the states from each other, so the amount of information gained in measuring the system’s state is the logarithm to the base 2 of the number of states the system could have been in:

$$I = \log_2 N_{\rm states} = (1/\ln 2)\ln N_{\rm states} = 1.4427\,\ln N_{\rm states} . \qquad (3.73a)$$

Notice that this information gain is proportional to the entropy $S = k_B \ln N_{\rm states}$ of the system before the measurement was made:

$$I = S/(k_B \ln 2) . \qquad (3.73b)$$

The measurement reduces the system’s entropy from $S = k_B \ln N_{\rm states}$ to zero (and increases the entropy of the rest of the universe by at least this amount), and it gives us $I = S/(k_B \ln 2)$ bits of information about the system. We shall discover below that this entropy/information relationship is true of measurements made on a system drawn from any ensemble, not just a microcanonical ensemble. But first we must develop a more complete understanding of information.
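The counting argument behind Eqs. (3.73) can be checked directly. This sketch (the function names are ours, purely illustrative) computes the number of binary digits needed to write the largest label n − 1 and compares it with $S/(k_B \ln 2)$ for $S = k_B \ln N_{\rm states}$. For $N_{\rm states}$ a power of 2 the two agree exactly; otherwise $\log_2 N_{\rm states}$ is not an integer and is best read as an average number of bits per measurement over many measurements.

```python
import math


def bits_to_label(n_states):
    """Binary digits needed to write the labels n - 1 = 0, 1, ..., n_states - 1."""
    return (n_states - 1).bit_length()


def information_from_entropy(n_states):
    """I = S / (k_B ln 2) with S = k_B ln N_states, Eqs. (3.73a,b); k_B cancels."""
    s_over_kb = math.log(n_states)
    return s_over_kb / math.log(2)


for n in (4, 8, 1024):
    print(n, bits_to_label(n), information_from_entropy(n))
```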

3.12.2

T2 Information in Communication Theory

The definition of “the amount of information I gained in a measurement” was formulated by Claude Shannon (1948) in the context of his laying the foundations of communication theory. Communication theory deals (among other things) with the problem of how, most efficiently, to encode a message as a binary string (a string of 0’s and 1’s) in order to transmit it across a communication channel that transports binary signals. Shannon defined the information in a message as the number of bits required, in the most compressed such encoding, to distinguish this message from all other messages that might be transmitted.

Shannon focused on messages that, before encoding, consist of a sequence of symbols. For an English-language message, each symbol might be a single character (a letter A, B, C, ..., Z or a space; N = 27 distinct symbols in all), and a specific message might be the following sequence of “length” L = 19 characters: “I DO NOT UNDERSTAND”. Alternatively, each symbol might be an English word (approximately N = 12,000 words in a person’s vocabulary) and a specific message might be the following sequence of length L = 4 words: {I}{DO}{NOT}{UNDERSTAND}.

Suppose, for simplicity, that in the possible messages, all N distinct symbols appear with equal frequency (this, of course, is not the case for English-language messages), and suppose that the length of some specific message (its number of symbols) is L. Then, the number of bits needed to encode this message and distinguish it from all other possible messages of length L is

$$I = \log_2 N^L = L \log_2 N . \qquad (3.74a)$$

In other words, the average number of bits per symbol (the average amount of information per symbol) is

$$\bar I = \log_2 N . \qquad (3.74b)$$

If there are only 2 possible symbols, we have one bit per symbol in our message. If there are 4 possible (equally likely) symbols, we have two bits per symbol, etc.
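For equally probable symbols, Eq. (3.74a) is a one-line calculation. The sketch below assumes the 27-symbol alphabet of the text (26 capital letters plus a space) and a 12,000-word vocabulary for the word-level count; the function name is illustrative.

```python
import math
import string

ALPHABET = string.ascii_uppercase + " "   # N = 27 symbols: A-Z plus the space


def equal_frequency_bits(message, n_symbols):
    """I = L log2 N, Eq. (3.74a), for a message of L symbols drawn from N equally likely ones."""
    return len(message) * math.log2(n_symbols)


msg = "I DO NOT UNDERSTAND"
assert set(msg) <= set(ALPHABET)
print(len(msg), equal_frequency_bits(msg, len(ALPHABET)))   # character level
words = msg.split()
print(len(words), equal_frequency_bits(words, 12_000))      # word level
```

The character-level count comes to about 90 bits and the word-level count to about 54 bits. Both overestimate the true content because, as explained next, real English symbols are far from equally probable.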

It is usually the case that not all symbols occur with the same frequency in the allowed messages. For example, in English messages the letter “A” occurs with a frequency $p_A \simeq 0.07$, while the letter “Z” occurs with the much smaller frequency $p_Z \simeq 0.001$. All English messages of character length $L \gg N = 27$ constructed by a typical English speaker will have these frequencies of occurrence for “A” and “Z”. Any purported message with frequencies for “A” and “Z” differing substantially from 0.07 and 0.001 is not a real English message, and thus need not be included in the binary encoding of messages. As a result, it turns out that the most efficient binary encoding of English messages (the most compressed encoding) will use an average number of bits per character somewhat less than $\log_2 N = \log_2 27 = 4.755$. In other words, the average information per character in English-language messages is somewhat less than $\log_2 27$.

To deduce the average information per character when the characters do not all occur with the same frequency, we shall begin with a simple example, one where the number of distinct characters to be used in the message is just N = 2 (the characters “B” and “E”), and their frequencies of occurrence in very long allowed messages are $p_B = 3/5$, $p_E = 2/5$. For example, in the case of messages with length L = 100, the message

    EBBEEBBBBEBBBBBBBBBEBEBBBEEEEBBBEB
    BEEEEEBEEBEEEEEEBBEBBBBBEBBBBBEBBE
    BBBEBBBEEBBBEBBBBBBBEBBBEBBEEBEB                        (3.75a)

contains 63 B’s and 37 E’s, and thus (to within statistical variations) has the correct frequencies $p_B \simeq 0.6$, $p_E \simeq 0.4$ to be an allowed message. By contrast, the message

    BBBBBBBBBBBBBBEBBBBBBBBBBBBBBBBBBB
    BBBBBBBBBBBEBBBBBBBBBBBBBBBBBBBBBB
    BBBBBBBBBBBBBBBEBBBBBBBBBBBBBBBB                        (3.75b)

contains 97 B’s and 3 E’s and thus is not an allowed message. To deduce the number of allowed messages and thence the number of bits required to encode them distinguishably, we map this problem of 60% probable B’s and 40% probable E’s onto the problem of messages with 5 equally probable symbols, as follows. Let the set of distinct symbols be the letters “a”, “b”, “c”, “y”, “z”, all occurring in allowed messages equally frequently, $p_a = p_b = p_c = p_y = p_z = 1/5$. An example of an allowed message, then, is

    zcczzcaabzccbabcccczaybacyzyzcbbyc
    ayyyyyayzcyzzzzzcczacbabybbbcczabz
    bbbybaazybccybaccabazacbzbayycyc                        (3.75c)

We map each such message from our new message set into one from the previous message set by identifying “a”, “b”, and “c” as from the Beginning of the alphabet (and thus converting them into “B”), and identifying “y” and “z” as from the End of the alphabet (and thus converting them into “E”). Our message (3.75c) from the new message set then maps into the message (3.75a) from the old set. This mapping enables us to deduce the number of bits required to encode the old messages, with their unequal frequencies $p_B = 3/5$ and $p_E = 2/5$, from the number required for the new messages, with their equal frequencies $p_a = p_b = \ldots = p_z = 1/5$. The number of bits needed to encode the new messages, with length $L \gg N_{\rm new} = 5$, is $I = L \log_2 5$. Now, the characters “a”, “b”, and “c” from the beginning of the alphabet occur $\frac{3}{5}L$ times in each new message (in the limit that L is arbitrarily large). When converting the new message to an old one, we no longer need to distinguish between “a”, “b” and “c”, so we no longer need the $\frac{3}{5}L\log_2 3$ bits that were being used to make those distinctions. Similarly, the number of bits we no longer need when we drop the distinction between our two end-of-alphabet characters “y” and “z” is $\frac{2}{5}L\log_2 2$. As a result, the number of bits still needed to distinguish between old messages (messages with “B” occurring 3/5 of the time and “E” occurring 2/5 of the time) is

$$I = L\log_2 5 - \tfrac{3}{5}L\log_2 3 - \tfrac{2}{5}L\log_2 2 = L\left[-\tfrac{3}{5}\log_2\tfrac{3}{5} - \tfrac{2}{5}\log_2\tfrac{2}{5}\right] = L\,(-p_B\log_2 p_B - p_E\log_2 p_E) . \qquad (3.75d)$$

A straightforward generalization of this argument (Ex. 3.15) shows that, whenever one constructs messages with very large length $L \gg N$ from a pool of N symbols that occur with frequencies $p_1, p_2, \ldots, p_N$, the minimum number of bits required to distinguish all the allowed messages from each other (i.e., the amount of information in each message) is

$$I = L\sum_{n=1}^{N} -p_n\log_2 p_n ; \qquad (3.76)$$

so the average information per symbol in the message is

$$\bar I = \sum_{n=1}^{N} -p_n\log_2 p_n = (1/\ln 2)\sum_{n=1}^{N} -p_n\ln p_n . \qquad (3.77)$$

3.12.3

T2 Examples of Information Content

Notice the similarity of the general information formula (3.76) to the general formula (3.30) for the entropy of an arbitrary ensemble. This similarity has a deep consequence. Consider an arbitrary ensemble of systems in statistical mechanics. As usual, label the quantum states available to each system by the integer $n = 1, 2, \ldots, N_{\rm states}$, and denote by $p_n$ the probability that any chosen system in the ensemble will turn out to be in state n. Now, select one system out of the ensemble and measure its quantum state $n_1$; select a second system and measure its state, $n_2$; and continue this process until some large number $L \gg N_{\rm states}$ of systems have been measured. The sequence of measurement results $\{n_1, n_2, \ldots, n_L\}$ can be regarded as a message. The minimum number of bits needed to distinguish this message from all other possible such messages is given by the general information formula (3.76). This is the total information in the L system measurements. Correspondingly, the amount of information we get from measuring the state of one system (the average information per measurement) is given by Eq. (3.77). This acquired information is related to the entropy of the system before measurement (3.30) by the same standard formula (3.73b) as we obtained earlier for the special case of the microcanonical ensemble:

$$\bar I = S/(k_B \ln 2) . \qquad (3.78)$$
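The B/E bookkeeping and Eqs. (3.77) and (3.78) can be checked numerically. This sketch reuses message (3.75c) from the text, maps a, b, c to B and y, z to E, and counts the result. It then compares the bits retained per symbol in Eq. (3.75d) with Eq. (3.77), and finally checks Eq. (3.78) for an arbitrary, purely illustrative set of state probabilities, in units with $k_B = 1$.

```python
import math
from collections import Counter


def info_per_symbol(probs):
    """Eq. (3.77): average information per symbol, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Message (3.75c), built from five equally probable letters
msg_c = ("zcczzcaabzccbabcccczaybacyzyzcbbyc"
         "ayyyyyayzcyzzzzzcczacbabybbbcczabz"
         "bbbybaazybccybaccabazacbzbayycyc")

# Beginning-of-alphabet letters become "B", end-of-alphabet letters become "E"
msg_a = msg_c.translate(str.maketrans("abcyz", "BBBEE"))
counts = Counter(msg_a)
print(len(msg_a), counts["B"], counts["E"])

# Bits kept per symbol in Eq. (3.75d) versus the general formula (3.77)
bits_kept = math.log2(5) - 0.6 * math.log2(3) - 0.4 * math.log2(2)
print(bits_kept, info_per_symbol([0.6, 0.4]))

# Eq. (3.78): bits per measurement equal S/(k_B ln 2); here k_B = 1
probs = [0.5, 0.25, 0.125, 0.125]                  # illustrative state probabilities
s_over_kb = -sum(p * math.log(p) for p in probs)   # Eq. (3.30)
print(info_per_symbol(probs), s_over_kb / math.log(2))
```

Both numbers on the middle line are about 0.971 bits per symbol, a little below the 1 bit per symbol that equally probable B's and E's would carry.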

For another example of information content, we return to English-language messages (Shannon 1948). Evaluating the information content of a long English message is a very difficult task, since it requires figuring out how to compress the message most compactly. We shall make a sequence of estimates. A crude initial estimate of the information per character is obtained by idealizing all the characters as occurring equally frequently: $\bar I = \log_2 27 \simeq 4.76$ bits per character. This is an overestimate because the 27 characters actually occur with very different frequencies. We could get a better estimate by evaluating $\bar I = \sum_{n=1}^{27} -p_n\log_2 p_n$, taking account of the characters’ varying frequencies $p_n$ (the result is about $\bar I = 4.1$); but we can do even better by converting from characters as our symbols to words. The average number of characters in an English word is about 4.5 letters plus one space, or 5.5 characters per word; we use this number to convert information per word into information per character. The number of words in a typical English speaker’s vocabulary is roughly 12,000. If we idealize these 12,000 words as occurring with the same frequencies, then the information per word is $\log_2 12{,}000 \simeq 13.6$, so the information per character is $\bar I = (1/5.5)\log_2 12{,}000 \simeq 2.46$. This is much smaller than our initial estimate. A still better estimate is obtained by using Zipf’s (1935) approximation $p_n = 0.1/n$ to the frequencies of occurrence of the words in English-language messages. [The most frequently occurring word is “THE”, and its frequency is about 0.1 (one in 10 words is “THE” in a long message). The next most frequent words are “OF”, “AND”, and “TO”; their frequencies are about 0.1/2, 0.1/3, and 0.1/4; and so forth.] To ensure that $\sum_{n=1}^{N} p_n = 1$ for Zipf’s approximation, we require that the number of words be N = 12,367. We then obtain, as our improved estimate of the information per word, $\sum_{n=1}^{12,367} (-0.1/n)\log_2(0.1/n) = 9.72$, corresponding to an information per character $\bar I \simeq 9.72/5.5 = 1.77$. This is substantially smaller than our initial, crudest estimate of 4.76.
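The Zipf estimate is easy to reproduce. The sketch below uses only the text's inputs ($p_n = 0.1/n$ and 5.5 characters per word): it finds the smallest vocabulary size for which the $p_n$ sum to at least 1, then evaluates Eq. (3.77) over that vocabulary.

```python
import math


def zipf_vocabulary_size(a=0.1):
    """Smallest N for which sum_{n=1}^{N} a/n reaches 1, normalizing p_n = a/n."""
    total, n = 0.0, 0
    while total < 1.0:
        n += 1
        total += a / n
    return n


N = zipf_vocabulary_size()
bits_per_word = sum(-(0.1 / n) * math.log2(0.1 / n) for n in range(1, N + 1))
CHARS_PER_WORD = 5.5   # 4.5 letters plus one space, as in the text
print(N, round(bits_per_word, 2), round(bits_per_word / CHARS_PER_WORD, 2))

# The cruder estimates, for comparison
print(round(math.log2(27), 2), round(math.log2(12_000) / CHARS_PER_WORD, 2))
```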

3.12.4

T2 Some Properties of Information

Because of the similarity of the general formulas for information and entropy (both proportional to $\sum_n -p_n \ln p_n$), information has very similar properties to entropy. In particular (Ex. 3.16):

1. Information is additive (just as entropy is additive). The information in two successive, independent messages is the sum of the information in each message.

2. If the frequencies of occurrence of the symbols in a message are $p_n = 0$ for all symbols except one, which has $p_n = 1$, then the message contains zero information. This is analogous to the vanishing entropy when all states have zero probability except for one, which has unit probability.

3. For a message L symbols long, whose symbols are drawn from a pool of N distinct symbols, the information content is maximized if the probabilities of the symbols are all equal, $p_n = 1/N$; and the maximal value of the information is $I = L\log_2 N$. This is analogous to the microcanonical ensemble having maximal entropy.
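A numerical spot check of the three properties (not a proof; Ex. 3.16 asks for that), using Eq. (3.77) for the information per symbol. The probability lists are arbitrary examples, and additivity is illustrated by treating a pair of independent symbols as a single joint symbol.

```python
import math
from itertools import product


def info_per_symbol(probs):
    """Eq. (3.77): average information per symbol in bits (terms with p = 0 contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


p = [0.5, 0.25, 0.25]
q = [0.9, 0.1]
joint = [a * b for a, b in product(p, q)]   # independent symbols drawn from p and q

print(info_per_symbol(joint), info_per_symbol(p) + info_per_symbol(q))    # 1. additive
print(info_per_symbol([0.0, 1.0, 0.0]))                                    # 2. no information
print(info_per_symbol([0.25] * 4), info_per_symbol([0.4, 0.3, 0.2, 0.1]))  # 3. uniform is largest
```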

3.12.5

T2 Capacity of Communication Channels; Erasing Information from Computer Memories

A noiseless communication channel has a maximum rate (number of bits per second) at which it can transport information. This rate is called the channel capacity and is denoted C. When one subscribes to a cable internet connection in the United States, one typically pays a monthly fee that depends on the connection’s channel capacity; for example, in Pasadena, California in autumn 2003 the fee was $29.99 per month for a connection with capacity C = 384 kilobytes per second = 3.072 megabits per second, and $39.99 for C = 16.384 megabits per second. It should be obvious from the way we have defined the information I in a message that the maximum rate at which we can transmit optimally encoded messages, each with information content I, is C/I messages per second. This is called Shannon’s theorem.

When a communication channel is noisy (as all channels actually are), for high-confidence transmission of messages one must put some specially designed redundancy into one’s encoding. With cleverness, one can thereby identify and correct errors in a received message caused by the noise (“error-correcting code”); see, e.g., Shannon (1948), Raisbeck (1963), and Pierce (1980).¹⁸ The redundancy needed for such error identification and correction reduces the channel’s capacity. As an example, consider a symmetric binary channel: one that carries messages made of 0’s and 1’s, with equal frequency p_0 = p_1 = 0.5, and whose noise randomly converts a 0 into a 1 with some small error probability p_e, and randomly converts a 1 into 0 with that same probability p_e. Then one can show (e.g., Raisbeck 1963, Pierce 1980) that the channel capacity is reduced, by the need to find and correct for these errors, by a factor 1 − Ī(p_e):

$$C = C_{\rm noiseless}\,[1 - \bar I(p_e)] , \quad \text{where} \quad \bar I(p_e) = -p_e\log_2 p_e - (1 - p_e)\log_2(1 - p_e) . \qquad (3.79)$$
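Eq. (3.79) is straightforward to evaluate. In this sketch the noiseless capacity is taken, for illustration only, to be the 3.072 Mbit/s connection quoted above, and the error probabilities are arbitrary. At p_e = 0.5 the received bits are uncorrelated with the transmitted ones, and the capacity drops to zero.

```python
import math


def binary_entropy(p):
    """I(p) = -p log2 p - (1 - p) log2(1 - p), in bits per symbol."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(c_noiseless, p_error):
    """Eq. (3.79): capacity of a symmetric binary channel with error probability p_error."""
    return c_noiseless * (1 - binary_entropy(p_error))


C0 = 3.072e6   # bits per second; the 384-kilobyte-per-second connection quoted in the text
for pe in (1e-4, 1e-3, 1e-2, 0.1, 0.5):
    print(f"p_e = {pe:g}: C = {bsc_capacity(C0, pe) / 1e6:.3f} Mbit/s")
```

Even a 1% error rate costs about 8% of the capacity, because $\bar I(0.01) \approx 0.081$.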

Note that the fractional reduction of capacity is equal to the information per symbol in messages made from symbols with frequencies equal to the probabilities $p_e$ of making an error and $1 - p_e$ of not making an error, a remarkable and nontrivial conclusion! This is one of many important results in communication theory.

Information is also a key concept in the theory of computation. As an important example of the relationship of information to entropy, we cite Landauer’s (1961, 1991) theorem: In a computer, when one erases L bits of information from memory, one necessarily increases the entropy of the memory and its environment by at least $\Delta S = L k_B \ln 2$, and correspondingly one increases the thermal energy (heat) of the memory and environment by $\Delta Q = T\Delta S = L k_B T \ln 2$ (Ex. 3.19).

18 A common form of error-correcting code is based on “parity checks”.
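Landauer’s bound is easy to put into numbers. The sketch below assumes room temperature, T = 300 K, and a one-gigabyte (8 × 10⁹ bit) erasure; both choices are ours, purely for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact in the SI)


def landauer_bound(n_bits, temperature):
    """Minimum entropy increase and heat released when erasing n_bits at the given temperature."""
    delta_s = n_bits * K_B * math.log(2)
    return delta_s, temperature * delta_s


dS, dQ = landauer_bound(1, 300.0)
print(f"one bit at 300 K: dS = {dS:.3e} J/K, dQ = {dQ:.3e} J")
dS_gb, dQ_gb = landauer_bound(8e9, 300.0)   # one gigabyte
print(f"one gigabyte at 300 K: dQ = {dQ_gb:.3e} J")
```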

****************************
EXERCISES

Exercise 3.15 Derivation: Information Per Symbol When Symbols Are Not Equally Probable
Derive Eq. (3.76) for the number of bits in a long message constructed from N distinct symbols whose frequencies of occurrence are $p_n$, and hence Eq. (3.77) for the average number of bits per symbol. [Hint: generalize the text’s derivation of Eq. (3.75d).]

Exercise 3.16 Derivation: Properties of Information
Prove the properties of information enumerated in Sec. 3.12.4.

Exercise 3.17 Problem: Information Per Symbol for Messages Built from Two Symbols
Consider messages of length $L \gg 2$ constructed from just two symbols, N = 2, which occur with frequencies p and (1 − p). Plot the average information per symbol $\bar I(p)$ in such messages, as a function of p. Explain why your plot has a maximum $\bar I = 1$ when p = 1/2, and has $\bar I = 0$ when p = 0 and when p = 1. (Relate these properties to the general properties of information.)

Exercise 3.18 Problem: Information in a Sequence of Dice Throws
Two dice are thrown randomly, and the sum of the dots showing on the upper faces is computed. This sum (an integer n in the range 2 ≤ n ≤ 12) constitutes a symbol, and the sequence of results of $L \gg 12$ throws is a message. Show that the amount of information per symbol in this message is $\bar I \simeq 3.2744$.

Exercise 3.19 Derivation: Landauer’s Theorem
Derive, or at least give a plausibility argument for, Landauer’s theorem (stated at the end of Sec. 3.12.5).

****************************

Bibliographic Note

Statistical mechanics has inspired a variety of readable and innovative texts. The classic treatment is Tolman (1938). Among more modern approaches that deal in much greater depth with the topics covered by this chapter are Lifshitz & Pitaevskii (1980), Pathria (1972), Reichl (1980), Riedi (1988) and Reif (1965). A highly individual and advanced treatment, emphasizing quantum statistical mechanics, is Feynman (1972). A particularly readable account in which statistical mechanics is used heavily to describe the properties of solids, liquids and gases is Goodstein (1985). Readable, elementary introductions to information theory are Pierce (1980) and Raisbeck (1963); an advanced, recent text is McEliece (2002).

Box 3.4
Important Concepts in Chapter 3

• Foundational Concepts
  – A system, its phase space, its Hamiltonian and canonical transformations, Sec. 3.2.1 and Ex. 3.1.
  – Ensemble of systems, Sec. 3.2.2, its distribution function ρ or ρ_n, and ensemble averages ⟨A⟩, Sec. 3.2.3.
    ∗ Concepts underlying the notation ρ_n: multiplicity M and number density of states in phase space N_states, Sec. 3.2.3.
  – Evolution of distribution functions: Liouville’s theorem, Sec. 3.3.
  – Statistical equilibrium, Sec. 3.4.
  – Ergodic hypothesis Ā = ⟨A⟩, Sec. 3.6.
  – Entropy, Sec. 3.7.1; its connection to information, Sec. 3.12 and Eq. (3.78); entropy per particle as a useful tool, Sec. 3.9.
  – Second law of thermodynamics: law of entropy increase, Secs. 3.7.1 and 3.7.2.
  – First law of thermodynamics, Eq. (3.44).
• Statistical-Equilibrium Ensembles
  – Summary table, Sec. 3.2.2.
  – General form of distribution function, Sec. 3.4.2 and Eqs. (3.21) and (3.23).
  – Canonical, Sec. 3.4.1 and Eq. (3.19); Grand Canonical, Secs. 3.4.2 and 3.8 and Eq. (3.24c); Gibbs, Sec. 3.4.2 and Eq. (3.24b); Microcanonical, Sec. 3.5.
  – Grand potential and grand partition function, and their use to deduce thermodynamic properties of a system, Sec. 3.8 and Ex. 3.6.
  – For a single-particle quantum state (mode):
    ∗ For fermions: Fermi-Dirac distribution, Eqs. (3.26).
    ∗ For bosons: Bose-Einstein distribution, Eqs. (3.27), and Bose-Einstein condensate, Sec. 3.10.
    ∗ For classical particles: Boltzmann distribution, Eq. (3.28).

Bibliography

Anderson, M.H., Ensher, J.R., Matthews, M.R., Wieman, C.E. and Cornell, E.A. 1995. Science, 269, 198.
Cornell, E. 1996. J. Res. Natl. Inst. Stand. Technol. 101, 419.
Cohen-Tannoudji, C., Diu, B., and Laloë, F. 1977. Quantum Mechanics, New York: Wiley.
Dalfovo, F., Giorgini, S., Pitaevskii, L.P., and Stringari, S. 1999. Reviews of Modern Physics, 71, 463–512.
Einstein, A. 1925. Sitzungsber. K. Preuss. Akad. Wiss., Phys. Math. Kl., 3, 18.
Ensher, J.R., Jin, D.S., Matthews, M.R., Wieman, C.E., and Cornell, E.A. 1996. Physical Review Letters, 77, 4984.
Feynman, R. P. 1972. Statistical Mechanics, Reading: Benjamin-Cummings.
Frautschi, Steven, 1982. “Entropy in an expanding universe,” Science 217, 593–599.
Frolov, V.P. and Page, D.N. 1993. Phys. Rev. Lett. 71, 2902.
Goldstein, Herbert, 1980. Classical Mechanics (Second Edition), New York: Addison Wesley.
Goldstein, Herbert, Poole, Charles, and Safko, John, 2002. Classical Mechanics (Third Edition), New York: Addison Wesley.
Goodstein, D. L., 1985. States of Matter, New York: Dover Publications.
Hawking, Stephen W. 1976. “Particle creation by black holes,” Communications in Mathematical Physics, 43, 199–220.
Hawking, S. W. 1989. Entropy decrease in contracting universe.
Kittel, Charles 1958. Elementary Statistical Physics, New York: Wiley.
Landauer, R. 1961. “Irreversibility and Heat Generation in the Computing Process,” IBM J. Research and Development, 3, 183.
Landauer, R. 1991. “Information is Physical,” Physics Today, May 1991, p. 23.
Lewin, L. 1981. Polylogarithms and Associated Functions, North Holland.
Lifshitz, E. M. & Pitaevskii, L. P. 1980. Statistical Physics, Part 1 (Third Edition), Oxford: Pergamon.
Lynden-Bell, D. 1967. Monthly Notices of the Royal Astronomical Society, 136, 101.
Mathews, Jon & Walker, Robert L. 1964. Mathematical Methods of Physics, New York: Benjamin.
Ogorodnikov, K. F. 1965. Dynamics of Stellar Systems, Oxford: Pergamon.
Pathria, R. K. 1972. Statistical Mechanics, Oxford: Pergamon.
Penrose, R. 1989. The Emperor’s New Mind, Oxford: Oxford University Press.
Reichl, L. E. 1980. A Modern Course in Statistical Physics, London: Arnold.
Reif, F. 1965. Fundamentals of Statistical and Thermal Physics, New York: McGraw-Hill.
Riedi, P.C. 1988. Thermal Physics, Oxford: Oxford Science Publications.
Shannon, C.E. 1948. “A mathematical theory of communication,” The Bell System Technical Journal, 27, 379; available at http:XXXXXX.
ter Haar, D. 1955. “Foundations of Statistical Mechanics,” Reviews of Modern Physics, 27, 289.
Thornton, Stephen T. and Marion, Jerry B. 2004. Classical Dynamics of Particles and Systems, Belmont, California: Brooks/Cole.
Thorne, Kip S., Price, Richard H., & Macdonald, Douglas M., 1986. Black Holes: The Membrane Paradigm, New Haven: Yale University Press.
Tolman, R. C. 1938. The Principles of Statistical Mechanics, Oxford: Oxford University Press.
Wald, R.M. 1994. Quantum Field Theory in Curved Spacetime and Black Hole Thermodynamics, Chicago: University of Chicago Press.
Wald, R.M. 2001. Living Rev. Relativity 4, 6. URL http://www.livingreviews.org/lrr2001-6.
Zipf, G. K. 1935. Psycho-Biology of Languages, Houghton Mifflin.
Zurek, W. and Thorne, K.S. 1986. Phys. Rev. Lett. 24, 2171.

Contents

4 Statistical Thermodynamics
4.1 Overview
4.2 Microcanonical Ensemble and the Energy Representation of Thermodynamics
4.2.1 Extensive Variables and Fundamental Potential
4.2.2 Intensive Variables Identified Using Measuring Devices; First Law of Thermodynamics
4.2.3 Euler’s Equation and Form of Fundamental Potential
4.2.4 Maxwell Relations
4.2.5 Mechanism of Entropy Increase When Energy is Injected
4.2.6 Representations of Thermodynamics
4.3 Canonical Ensemble and the Physical-Free-Energy Representation of Thermodynamics
4.3.1 Experimental Meaning of Physical Free Energy
4.4 The Gibbs Representation of Thermodynamics; Phase Transitions and Chemical Reactions
4.4.1 Minimum Principles for Gibbs and Other Fundamental Thermodynamic Potentials
4.4.2 Phase Transitions
4.4.3 Chemical Reactions
4.5 Fluctuations of Systems in Statistical Equilibrium
4.6 T2 The Ising Model and Renormalization Group Methods
4.7 T2 Monte Carlo Methods

Chapter 4
Statistical Thermodynamics
Version 0804.1.K, 22 October 2008

Box 4.1
Reader’s Guide
• Relativity enters into portions of this chapter solely via the relativistic energies and momenta of high-speed particles (Box 1.4).
• This chapter relies in crucial ways on Secs. 2.2 and 2.3 of Chap. 2 and on Secs. 3.2–3.9 of Chap. 3.
• Portions of Chap. 5 rely on Sec. 4.5 of this chapter. Portions of Part IV (Fluid Mechanics) rely on elementary thermodynamic concepts and equations of state treated in this chapter and Chap. 4, but most readers will already have met these in a course on elementary thermodynamics.
• Other chapters do not depend strongly on this one.

4.1

Overview

In Chap. 3, we introduced the concept of statistical equilibrium and studied, briefly, some of the properties of equilibrated systems. In this chapter we shall develop the theory of statistical equilibrium in a more thorough way. The title of this chapter, “Statistical Thermodynamics,” emphasizes two aspects of the theory of statistical equilibrium. The term “thermodynamics” is an ancient one that predates statistical mechanics. It refers to a study of the macroscopic properties of systems that are in or near equilibrium, such as their energy and entropy. Despite paying no attention to the microphysics, classical thermodynamics is a very powerful theory for deriving general relationships between macroscopic properties. Microphysics influences the macroscopic world in a statistical manner and so, in the late nineteenth century, Willard Gibbs and others developed statistical mechanics and showed that it provides a powerful conceptual underpinning for classical thermodynamics. The resulting synthesis, statistical thermodynamics, adds greater power to thermodynamics by augmenting it with the statistical tools of ensembles and distribution functions.

In our study of statistical thermodynamics we shall restrict attention to an ensemble of large systems that are in statistical equilibrium. By “large” is meant a system that can be broken into a large number $N_{ss}$ of subsystems that are all macroscopically identical to the full system except for having $1/N_{ss}$ as many particles, $1/N_{ss}$ as much volume, $1/N_{ss}$ as much energy, $1/N_{ss}$ as much entropy, and so on. (Note that this constrains the energy of interaction between the subsystems to be negligible.) Examples are one kilogram of plasma in the center of the sun and a one-kilogram sapphire crystal.

The equilibrium thermodynamic properties of any type of large system (e.g. a monatomic gas) can be derived using any one of the statistical equilibrium ensembles of the last chapter (microcanonical, canonical, grand canonical, Gibbs). For example, each of these ensembles will predict the same equation of state $P = (N/V)k_B T$ for an ideal monatomic gas, even though in one ensemble each system’s number of particles N is precisely fixed, while in another ensemble N can fluctuate, so that strictly speaking one should write the equation of state as $P = (\bar N/V)k_B T$ with $\bar N$ the ensemble average of N. (Here and throughout this chapter, for compactness we use bars rather than brackets to denote ensemble averages, i.e. $\bar N$ rather than $\langle N\rangle$.) The equations of state are the same to very high accuracy because the fractional fluctuations of N are so extremely small, $\Delta N/N \sim 1/\sqrt{\bar N}$; cf. Ex. 3.7.

Although the thermodynamic properties are independent of the equilibrium ensemble, specific properties are often derived most quickly, and the most insight usually accrues, from that ensemble which most closely matches the physical situation being studied. In Sec. 3.8 we used the grand canonical ensemble, and in Secs. 4.2, 4.3, and 4.4 we shall use the microcanonical, canonical and Gibbs ensembles to derive many useful results from both classical and statistical thermodynamics: equations of state, Maxwell relations, Euler’s equation, sum-over-states methods for computing fundamental potentials, applications of fundamental potentials, and so on. Table 4.1 summarizes those statistical-equilibrium results and some generalizations of them. Readers may wish to delay studying this table until they have read further into the chapter.

As we saw in Chap. 3, when systems are out of statistical equilibrium, their evolution toward equilibrium is driven by the law of entropy increase, the second law of thermodynamics. In Sec. 4.4 we formulate the fundamental potential (Gibbs potential) for an out-of-equilibrium ensemble that interacts with a heat and volume bath, we discover a simple relationship between that fundamental potential and the entropy of system plus bath, and from that relationship we learn that the second law, in this case, is equivalent to a law of decrease of the Gibbs potential. As an application, we learn how chemical potentials drive chemical reactions and also drive phase transitions. In Sec. 4.5 we study spontaneous fluctuations of a system away from equilibrium, when it is coupled to a heat and particle bath, and discover how the fundamental potential (in this case Gibbs potential) can be used

3 Representation & Ensemble Energy & Microcanonical (Secs. 3.5 and 4.2) Enthalpy (Exs. 4.3 and 4.8) Physical-Free-Energy & Canonical (Secs. 3.4.1 and 4.3) Gibbs (Secs. 3.4.2 and 4.4) Grand Canonical (Secs. 3.4.2 and 3.8)

First Law dE = T dS + µ ˜dN − P dV

Bath none

dH = T dS + µ ˜dN + V dP dF = −SdT + µ ˜dN − P dV

V &E dE = −P dV E

Distribution Function ρ const = e−S/kB E const in δE const = e−S/kB Hconst (F −E)/kB T e

dG = −SdT + µ ˜dN + V dP

E &V

e(G−E−P V )/kB T

dΩ = −SdT − Nd˜ µ − P dV

E &N

e(Ω−E+˜µN )/kB T

Table 4.1: Representations and Ensembles for Statistical Equilibrium; cf. Table 4.2.

to compute the probabilities of such fluctuations. These out-of-equilibrium aspects of statistical mechanics (evolution toward equilibrium and fluctuations away from equilibrium) are summarized in Table 4.2, not just for heat and volume baths, but for a wide variety of other baths. Again, readers may wish to delay studying the table until they have read further into the chapter. Although the conceptual basis of statistical thermodynamics should be quite clear, deriving quantitative results for real systems can be formidably difficult. In a macroscopic sample, there is a huge number of possible microscopic arrangements (quantum states) and these all have to be taken into consideration via statistical sums if we want to understand the macroscopic properties of the most frequently occuring configurations. Direct summation over states is hopelessly impractical for real systems. However, in recent years a number of powerful approximation techniques have been devised for performing the statistical sums. In Secs. 4.6 and 4.7 we give the reader the flavor of two of these techniques: the renormalization group and Monte Carlo methods. We illustrate and compare these techniques by using them to study a phase transition in a simple model for Ferromagnetism called the Ising model.

4.2

Microcanonical Ensemble and the Energy Representation of Thermodynamics

4.2.1

Extensive Variables and Fundamental Potential

Consider a microcanonical ensemble of large, closed systems that have attained statistical equilibrium. We can describe the ensemble macroscopically using a set of thermodynamic variables. As we saw in Chap. 3, these variables can be divided into two classes: extensive variables, which double if one doubles the system’s size (volume, mass, ...), and intensive variables, whose magnitudes are independent of the system’s size. Examples of extensive variables are a system’s total energy $\mathcal{E}$, entropy $S$, volume $V$, magnetization $M$, and number of conserved particles of various species $N_I$. Examples of intensive variables are temperature $T$, pressure $P$, the magnetic field strength $H$ imposed on the system from the outside, and the chemical potentials $\tilde\mu_I$ for various species of particles.

The particle species $I$ must include only those species whose particles are conserved on the timescales of interest. For example, if photons can be emitted and absorbed, then one must not specify $N_\gamma$, the number of photons; rather, $N_\gamma$ will come to an equilibrium value that is governed by the values of the other extensive variables. Also, one must omit from the set $\{I\}$ any conserved particle species whose numbers are automatically determined by the numbers of other, included species. For example, gas inside the sun is always charge neutral to very high precision, and therefore the number of electrons $N_e$ in a sample of gas is always determined by the number of protons $N_p$ and the number of helium nuclei (alpha particles) $N_\alpha$: $N_e = N_p + 2N_\alpha$. Therefore, if one includes $N_p$ and $N_\alpha$ in one’s complete set of extensive variables, one must omit $N_e$.

As in Chap. 3, we shall formulate the theory relativistically correctly, but shall formulate it solely in the mean rest frames of the systems and baths being studied. Correspondingly, in our formulation we shall generally include the particle rest masses $m_I$ in the total energy $\mathcal{E}$ and in the chemical potentials $\tilde\mu_I$. For very nonrelativistic systems, however, we shall usually replace $\mathcal{E}$ by the nonrelativistic energy $E \equiv \mathcal{E} - \sum_I N_I m_I c^2$, and $\tilde\mu_I$ by the nonrelativistic chemical potential $\mu_I \equiv \tilde\mu_I - m_I c^2$ (though, as we shall see in Sec. 4.4 when studying chemical reactions, the identification of the appropriate rest mass $m_I$ to subtract is occasionally a delicate issue).

Let us now specialize to a microcanonical ensemble of one-species systems, which all have identically the same values of the energy $\mathcal{E}$,¹ number of particles $N$, and volume $V$. Suppose that the microscopic nature of the ensemble’s systems is known.
Then, at least in principle and often in practice, one can identify from that microscopic nature the quantum states that are available to the system (given its constrained values of $\mathcal{E}$, $N$, and $V$), one can count those quantum states, and from their total number $\mathcal{N}_{\rm states}$ one can compute the ensemble’s total entropy $S = k_B \ln \mathcal{N}_{\rm states}$ [cf. Eq. (3.31)]. The resulting entropy can be regarded as a function of the complete set of extensive variables,
$$S = S(\mathcal{E}, N, V) , \tag{4.1}$$
and this equation can then be inverted to give the total energy in terms of the entropy and the other extensive variables,
$$\mathcal{E} = \mathcal{E}(S, N, V) . \tag{4.2}$$
We call the energy $\mathcal{E}$, viewed as a function of $S$, $N$, and $V$, the fundamental thermodynamic potential for the microcanonical ensemble. From it, one can deduce all other thermodynamic properties of the system.

¹ In practice, as was illustrated in Ex. 3.8, one must allow $\mathcal{E}$ to fall in some tiny but finite range $\delta\mathcal{E}$ rather than constraining it precisely, and one must then check to be sure that the results of one’s analysis are independent of $\delta\mathcal{E}$.
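To make $S = k_B \ln \mathcal{N}_{\rm states}$ and the notion of a “large” system concrete, here is a minimal sketch built on a hypothetical toy model that is not in the text: $N$ independent two-level atoms at fixed total energy, i.e. with a fixed number $q$ of excited atoms, so that $\mathcal{N}_{\rm states}$ is the binomial coefficient $\binom{N}{q}$. Volume plays no role in this toy, and the helper names are our own. Doubling $N$ and $q$ together should double $S$, up to corrections that grow only logarithmically with $N$; that is what extensivity means in practice.

```python
import math

# Illustrative sketch: the two-level-atom toy model and all names are assumptions.


def entropy_two_level(n_atoms: int, n_excited: int) -> float:
    """S/k_B = ln(number of states) for n_atoms two-level atoms with exactly
    n_excited of them excited (i.e. fixed total energy).

    ln C(n, q) is evaluated with log-gamma so that large n cannot overflow.
    """
    return (math.lgamma(n_atoms + 1)
            - math.lgamma(n_excited + 1)
            - math.lgamma(n_atoms - n_excited + 1))


def doubling_ratio(n_atoms: int) -> float:
    """S(2E, 2N) / S(E, N), with a quarter of the atoms excited."""
    q = n_atoms // 4
    return entropy_two_level(2 * n_atoms, 2 * q) / entropy_two_level(n_atoms, q)


for n in (100, 10_000, 1_000_000):
    print(n, doubling_ratio(n))   # approaches 2 as the system gets larger
```

The log-gamma form is a design choice: it keeps the count usable for system sizes where the integer binomial coefficient would be astronomically large.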

4.2.2

Intensive Variables Identified Using Measuring Devices; First Law of Thermodynamics

In Sec. 3.4.1, we used kinetic theory considerations to identify the thermodynamic temperature $T$ of the canonical ensemble [Eq. (3.19)]. It is instructive to discuss how this temperature arises in the microcanonical ensemble. Our discussion makes use of an idealized thermometer consisting of an idealized atom that has only two states, $|0\rangle$ and $|1\rangle$, with energies $E_0$ and $E_1 = E_0 + \Delta E$. The atom, initially in its ground state, is brought into thermal contact with one of the large systems of our microcanonical ensemble and then monitored over time as it is stochastically excited and de-excited. The ergodic hypothesis (Sec. 3.5) guarantees that the atom traces out a history of excitation and de-excitation that is governed statistically by the canonical ensemble for a collection of such atoms exchanging energy (heat) with our large system (the heat bath). More specifically, if $T$ is the (unknown) temperature of our system, then the fraction of the time the atom spends in its excited state, divided by the fraction spent in its ground state, is equal to the canonical distribution’s probability ratio
$$\frac{\rho_1}{\rho_0} = \frac{e^{-E_1/k_B T}}{e^{-E_0/k_B T}} = e^{-\Delta E/k_B T} . \tag{4.3a}$$

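This ratio can also be computed from the properties of the full system augmented by the two-state atom. This augmented system is microcanonical with a total energy $\mathcal{E} + E_0$, since the atom was in the ground state when first attached to the full system. Of all the quantum states available to this augmented system, the ones in which the atom is in the ground state constitute a total number $\mathcal{N}_0 = e^{S(\mathcal{E},N,V)/k_B}$; and those with the atom in the excited state constitute a total number $\mathcal{N}_1 = e^{S(\mathcal{E}-\Delta E,N,V)/k_B}$. Here we have used the fact that the number of states available to the augmented system is equal to that of the original, huge system (since the atom, in each of the two cases, is forced to be in a unique state); and we have expressed that number of states of the original system, for each of the two cases, in terms of the original system’s entropy function, Eq. (4.1). The ratio of the numbers of states $\mathcal{N}_1/\mathcal{N}_0$ is (by the ergodic hypothesis) the ratio of the time that the augmented system spends with the atom excited to the time spent with the atom in its ground state; i.e., it is equal to $\rho_1/\rho_0$:
$$\frac{\rho_1}{\rho_0} = \frac{\mathcal{N}_1}{\mathcal{N}_0} = \frac{e^{S(\mathcal{E}-\Delta E,N,V)/k_B}}{e^{S(\mathcal{E},N,V)/k_B}} = \exp\left[-\frac{\Delta E}{k_B}\left(\frac{\partial S}{\partial\mathcal{E}}\right)_{N,V}\right] . \tag{4.3b}$$
By equating Eqs. (4.3a) and (4.3b), we obtain an expression for the original system’s temperature $T$ in terms of the partial derivative $(\partial\mathcal{E}/\partial S)_{N,V}$ of its fundamental potential $\mathcal{E}(S,N,V)$:
$$T = \frac{1}{(\partial S/\partial\mathcal{E})_{N,V}} = \left(\frac{\partial\mathcal{E}}{\partial S}\right)_{N,V} , \tag{4.3c}$$
where we have used Eq. (1) of Box 4.2.

Box 4.2: Two Useful Relations between Partial Derivatives

Expand a differential increment in the energy $\mathcal{E}(S,N,V)$ in terms of differentials of its arguments $S$, $N$, $V$:
$$d\mathcal{E}(S,N,V) = \left(\frac{\partial\mathcal{E}}{\partial S}\right)_{N,V} dS + \left(\frac{\partial\mathcal{E}}{\partial N}\right)_{V,S} dN + \left(\frac{\partial\mathcal{E}}{\partial V}\right)_{S,N} dV .$$
Next expand the entropy $S(\mathcal{E},N,V)$ similarly and substitute the resulting expression for $dS$ into the above equation to obtain
$$d\mathcal{E} = \left(\frac{\partial\mathcal{E}}{\partial S}\right)_{N,V}\left(\frac{\partial S}{\partial\mathcal{E}}\right)_{N,V} d\mathcal{E} + \left[\left(\frac{\partial\mathcal{E}}{\partial S}\right)_{N,V}\left(\frac{\partial S}{\partial N}\right)_{\mathcal{E},V} + \left(\frac{\partial\mathcal{E}}{\partial N}\right)_{S,V}\right] dN + \left[\left(\frac{\partial\mathcal{E}}{\partial S}\right)_{N,V}\left(\frac{\partial S}{\partial V}\right)_{N,\mathcal{E}} + \left(\frac{\partial\mathcal{E}}{\partial V}\right)_{S,N}\right] dV .$$
Noting that this relation must be satisfied for all values of $d\mathcal{E}$, $dN$, and $dV$, we conclude that
$$\left(\frac{\partial\mathcal{E}}{\partial S}\right)_{N,V} = \frac{1}{(\partial S/\partial\mathcal{E})_{N,V}} , \tag{1}$$
$$\left(\frac{\partial\mathcal{E}}{\partial N}\right)_{S,V} = -\left(\frac{\partial\mathcal{E}}{\partial S}\right)_{N,V}\left(\frac{\partial S}{\partial N}\right)_{\mathcal{E},V} , \tag{2}$$
etc.; and similarly for other pairs and triples of partial derivatives. These equations, and their generalizations to other variables, are useful in manipulations of thermodynamic equations.

(End of Box 4.2)

A similar thought experiment, using a highly idealized measuring device that can exchange one particle ($\Delta N = 1$) with the system but cannot exchange any energy with it, gives for the ratio of the fraction of the time spent with the extra particle in the measuring device (“state 1”) to the fraction spent with it in the system (“state 0”):
$$\frac{\rho_1}{\rho_0} = e^{\tilde\mu\,\Delta N/k_B T} = \frac{e^{S(\mathcal{E},N-\Delta N,V)/k_B}}{e^{S(\mathcal{E},N,V)/k_B}} = \exp\left[-\frac{\Delta N}{k_B}\left(\frac{\partial S}{\partial N}\right)_{\mathcal{E},V}\right] . \tag{4.4a}$$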

Here the first expression is computed from the viewpoint of the measuring device’s grand canonical ensemble, and the second from the viewpoint of the combined system’s microcanonical ensemble. Equating these two expressions, we obtain
$$\tilde\mu = -T\left(\frac{\partial S}{\partial N}\right)_{\mathcal{E},V} = \left(\frac{\partial\mathcal{E}}{\partial N}\right)_{S,V} . \tag{4.4b}$$
In the last step we have used Eq. (4.3c) and Eq. (2) of Box 4.2. The reader should be able to construct a similar thought experiment involving an idealized pressure transducer (Ex. 4.1), which yields the following expression for the system’s pressure:
$$P = -\left(\frac{\partial\mathcal{E}}{\partial V}\right)_{S,N} . \tag{4.5}$$
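The thermometer argument behind Eqs. (4.3a) to (4.3c) can be checked numerically. The sketch below is a minimal illustration under assumptions that are not in the text: the “large system” is a hypothetical toy of $N$ two-level atoms sharing $q$ quanta, the thermometer’s gap $\Delta E$ equals the toy atoms’ gap, and units are chosen so that $k_B = \Delta E = 1$ (helper names are our own). It compares the state-counting ratio $\mathcal{N}_1/\mathcal{N}_0$ with $e^{-\Delta E/k_B T}$, where $T$ comes from a finite-difference estimate of $(\partial S/\partial\mathcal{E})_{N,V}$ as in Eq. (4.3c).

```python
import math

# Illustrative sketch: toy model, units (k_B = dE = 1) and names are assumptions.


def ln_states(n_atoms: int, n_quanta: int) -> float:
    """S/k_B of the toy 'large system': n_atoms two-level atoms
    (level spacing 1) sharing n_quanta excitations."""
    return (math.lgamma(n_atoms + 1) - math.lgamma(n_quanta + 1)
            - math.lgamma(n_atoms - n_quanta + 1))


def thermometer_check(n_atoms: int, n_quanta: int) -> tuple[float, float]:
    """Return (rho1/rho0 from counting states, exp(-dE/kT) with T from Eq. 4.3c).

    Requires 1 <= n_quanta < n_atoms.
    """
    # Thermometer atom excited -> the large system is left one quantum short.
    counted = math.exp(ln_states(n_atoms, n_quanta - 1) - ln_states(n_atoms, n_quanta))
    # 1/T = dS/dE, estimated by a centred difference of one quantum on each side.
    dS_dE = 0.5 * (ln_states(n_atoms, n_quanta + 1) - ln_states(n_atoms, n_quanta - 1))
    temperature = 1.0 / dS_dE
    return counted, math.exp(-1.0 / temperature)


for n in (100, 10_000, 1_000_000):
    counted, boltzmann = thermometer_check(n, n // 4)
    print(n, counted, boltzmann)   # agreement improves as the system grows
```

The residual mismatch at small $N$ reflects the finite size of the “bath”; the text’s argument assumes a huge system.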

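Having identified the three intensive variables $T$, $\tilde\mu$, and $P$ as partial derivatives [Eqs. (4.3c), (4.4b), (4.5)], we now see that the fundamental potential’s differential relation
$$d\mathcal{E}(S,N,V) = \left(\frac{\partial\mathcal{E}}{\partial S}\right)_{N,V} dS + \left(\frac{\partial\mathcal{E}}{\partial N}\right)_{V,S} dN + \left(\frac{\partial\mathcal{E}}{\partial V}\right)_{S,N} dV \tag{4.6}$$
is nothing more nor less than the ordinary first law of thermodynamics
$$d\mathcal{E} = T\,dS + \tilde\mu\,dN - P\,dV . \tag{4.7}$$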

Notice the “pairing” of intensive and extensive variables in this first law: temperature $T$ is paired with entropy $S$; chemical potential $\tilde\mu$ is paired with number of particles $N$; and pressure $P$ is paired with volume $V$. We can think of each intensive variable as a “generalized force” acting upon its corresponding extensive variable to change the energy of the system. We can add additional pairs of intensive and extensive variables if appropriate, calling them $X_A$, $Y_A$ (for example the externally imposed magnetic field $H$ and the magnetization $M$). We can also generalize to a multi-component system, i.e. one that has several types of conserved particles with numbers $N_I$ and associated chemical potentials $\tilde\mu_I$. We can also convert to nonrelativistic language by subtracting off the rest-mass contributions (switching from $\mathcal{E}$ to $E \equiv \mathcal{E} - \sum_I N_I m_I c^2$ and from $\tilde\mu_I$ to $\mu_I = \tilde\mu_I - m_I c^2$). The result is the nonrelativistic, extended first law
$$dE = T\,dS + \sum_I \mu_I\,dN_I - P\,dV + \sum_A X_A\,dY_A \tag{4.8}$$
(e.g., Sec. 18 of Kittel 1958).

4.2.3

Euler’s Equation and Form of Fundamental Potential

We can integrate the differential form of the first law to obtain a remarkable, though essentially trivial, relation known as Euler’s equation. More specifically, we decompose our system into a large number of subsystems in equilibrium with each other. As they are in equilibrium, they will all have the same values of the intensive variables $T$, $\tilde\mu$, $P$; and therefore, if we add up all their energies $d\mathcal{E}$ to obtain $\mathcal{E}$, their entropies $dS$ to obtain $S$, etc., we obtain from the first law (4.7)²
$$\mathcal{E} = TS + \tilde\mu N - PV . \tag{4.9a}$$
Since the energy $\mathcal{E}$ is itself extensive, Euler’s equation (4.9a) must be expressible in the functional form
$$\mathcal{E} = N f(V/N, S/N) \tag{4.9b}$$
for some function $f$. For example, for a monatomic ideal gas, the nonrelativistic version of Eq. (4.9b) is
$$E(V,S,N) = N\,\frac{3h^2}{4\pi m}\left(\frac{V}{N}\right)^{-2/3}\exp\left(\frac{2S}{3k_B N} - \frac{5}{3}\right) ; \tag{4.9c}$$
cf. Eq. (3.53). Here $m$ is the mass of an atom and $h$ is Planck’s constant.

² There are a few (very few!) systems for which some of the thermodynamic laws, including Euler’s equation, take on forms different from those presented in this chapter. A black hole is an example (cf. Sec. 3.11.2). A black hole cannot be divided up into subsystems, so the above derivation of Euler’s equation fails. Instead of increasing linearly with the mass $M_H$ of the hole, the hole’s extensive variables $S_H$ (entropy) and $J_H$ (spin angular momentum) increase quadratically with $M_H$; and instead of being independent of the hole’s mass, the intensive variables $T_H$ (temperature) and $\Omega_H$ (angular velocity) scale as $1/M_H$. See, e.g., Tranah & Landsberg (1980), and see Sec. sec3:BHs for some other aspects of black-hole thermodynamics.
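Equation (4.9c) can be checked against Eqs. (4.9a) and (4.9b) numerically. The minimal sketch below assumes arbitrary units with $h = m = k_B = 1$ (our choice, not the text’s) and estimates $T$, $\mu$, $P$ by centred finite differences of $E(V,S,N)$, using the nonrelativistic analogs of Eqs. (4.3c), (4.4b), and (4.5). It then checks that scaling $V$, $S$, $N$ by a common factor scales $E$ by that factor, and that the nonrelativistic Euler equation $E = TS + \mu N - PV$ holds at one arbitrarily chosen state. Function names are our own.

```python
import math

# Illustrative sketch: units and helper names are assumptions, not from the text.
H_PLANCK = 1.0   # Planck's constant h
MASS = 1.0       # atomic mass m
K_B = 1.0        # Boltzmann's constant


def energy(V: float, S: float, N: float) -> float:
    """Nonrelativistic energy E(V, S, N) of a monatomic ideal gas, Eq. (4.9c)."""
    prefactor = 3.0 * H_PLANCK**2 / (4.0 * math.pi * MASS)
    return (N * prefactor * (V / N) ** (-2.0 / 3.0)
            * math.exp(2.0 * S / (3.0 * K_B * N) - 5.0 / 3.0))


def partial(f, args, i, rel_step=1e-6):
    """Centred finite-difference estimate of df/d(args[i]), others held fixed."""
    step = rel_step * args[i]
    up, down = list(args), list(args)
    up[i] += step
    down[i] -= step
    return (f(*up) - f(*down)) / (2.0 * step)


V, S, N = 50.0, 20.0, 2.0
T = partial(energy, (V, S, N), 1)    # T  =  (dE/dS)_{V,N}
mu = partial(energy, (V, S, N), 2)   # mu =  (dE/dN)_{V,S}
P = -partial(energy, (V, S, N), 0)   # P  = -(dE/dV)_{S,N}

print(energy(2 * V, 2 * S, 2 * N) / energy(V, S, N))   # ~2: extensivity, Eq. (4.9b)
print(energy(V, S, N), T * S + mu * N - P * V)          # Euler's equation: the two agree
```

A numerical check at one state is only a sanity test; the analytic derivatives are given in Eq. (4.11).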

4.2.4

Maxwell Relations

There is no need to memorize a lot of thermodynamic relations; nearly all relations can be deduced almost trivially from the fundamental potential plus the first law. For example, if one remembers only that the nonrelativistic fundamental potential expresses $E$ as a function of the other extensive variables $S$, $N$, $V$, then by writing out the differential relation (the nonrelativistic analog of Eq. (4.6)) and comparing it with the first law $dE = T\,dS + \mu\,dN - P\,dV$, one can immediately read off the intensive variables in terms of partial derivatives of the fundamental potential:
$$T = \left(\frac{\partial E}{\partial S}\right)_{V,N}, \qquad \mu = \left(\frac{\partial E}{\partial N}\right)_{V,S}, \qquad P = -\left(\frac{\partial E}{\partial V}\right)_{S,N} . \tag{4.10a}$$
One can then go on to notice that the resulting $P(V,S,N)$, $T(V,S,N)$, and $\mu(V,S,N)$ are not all independent. The equality of mixed partial derivatives (e.g., $\partial^2 E/\partial V\partial S = \partial^2 E/\partial S\partial V$) together with Eqs. (4.10a) implies that they must satisfy the following Maxwell relations:
$$\left(\frac{\partial T}{\partial N}\right)_{S,V} = \left(\frac{\partial\mu}{\partial S}\right)_{N,V}, \qquad -\left(\frac{\partial P}{\partial S}\right)_{V,N} = \left(\frac{\partial T}{\partial V}\right)_{S,N}, \qquad \left(\frac{\partial\mu}{\partial V}\right)_{N,S} = -\left(\frac{\partial P}{\partial N}\right)_{V,S} . \tag{4.10b}$$
Additional relations can be generated using the types of identities proved in Box 4.2, or they can be generated more easily by applying the above procedure to the fundamental potentials associated with other ensembles; see, e.g., Secs. 3.8, 4.3 and 4.4. All equations of state, i.e. all relations between intensive and extensive variables, must satisfy the Maxwell relations. For our simple example of a nonrelativistic, monatomic gas we can substitute our fundamental potential $E$ [Eq. (4.9c)] into Eqs. (4.10a) to obtain
$$P(V,S,N) = \frac{h^2}{2\pi m}\left(\frac{N}{V}\right)^{5/3}\exp\left(\frac{2S}{3k_B N} - \frac{5}{3}\right),$$
$$T(V,S,N) = \frac{h^2}{2\pi m k_B}\left(\frac{N}{V}\right)^{2/3}\exp\left(\frac{2S}{3k_B N} - \frac{5}{3}\right),$$
$$\mu(V,S,N) = \frac{h^2}{4\pi m}\left(\frac{N}{V}\right)^{2/3}\left(5 - \frac{2S}{k_B N}\right)\exp\left(\frac{2S}{3k_B N} - \frac{5}{3}\right). \tag{4.11}$$
These do satisfy the Maxwell relations (4.10b); cf. Ex. 4.2.
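The claims of Ex. 4.2 about Eqs. (4.11) can be spot-checked numerically before being proved. The minimal sketch below, again in assumed units $h = m = k_B = 1$ and with helper names of our own, codes the three equations of state (4.11), estimates the partial derivatives in the Maxwell relations (4.10b) by centred finite differences at one arbitrarily chosen state, and checks the perfect-gas law $PV = Nk_BT$. A numerical check at a single point is, of course, not a proof.

```python
import math

# Illustrative sketch: units (h = m = k_B = 1) and names are assumptions.
H_PLANCK, MASS, K_B = 1.0, 1.0, 1.0


def _common_exp(S, N):
    """Factor exp(2S/(3 k_B N) - 5/3) shared by all three of Eqs. (4.11)."""
    return math.exp(2.0 * S / (3.0 * K_B * N) - 5.0 / 3.0)


def pressure(V, S, N):
    return H_PLANCK**2 / (2 * math.pi * MASS) * (N / V) ** (5 / 3) * _common_exp(S, N)


def temperature(V, S, N):
    return H_PLANCK**2 / (2 * math.pi * MASS * K_B) * (N / V) ** (2 / 3) * _common_exp(S, N)


def chem_potential(V, S, N):
    return (H_PLANCK**2 / (4 * math.pi * MASS) * (N / V) ** (2 / 3)
            * (5 - 2 * S / (K_B * N)) * _common_exp(S, N))


def partial(f, args, i, rel_step=1e-6):
    """Centred finite difference of f with respect to args[i] (0=V, 1=S, 2=N)."""
    step = rel_step * args[i]
    up, down = list(args), list(args)
    up[i] += step
    down[i] -= step
    return (f(*up) - f(*down)) / (2 * step)


state = (50.0, 20.0, 2.0)   # (V, S, N), chosen arbitrarily
maxwell_pairs = [
    # (dT/dN)_{S,V} = (dmu/dS)_{N,V}
    (partial(temperature, state, 2), partial(chem_potential, state, 1)),
    # -(dP/dS)_{V,N} = (dT/dV)_{S,N}
    (-partial(pressure, state, 1), partial(temperature, state, 0)),
    # (dmu/dV)_{N,S} = -(dP/dN)_{V,S}
    (partial(chem_potential, state, 0), -partial(pressure, state, 2)),
]
for lhs, rhs in maxwell_pairs:
    print(lhs, rhs)

V, S, N = state
print(pressure(V, S, N) * V, N * K_B * temperature(V, S, N))   # Eq. (4.14)
```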

4.2.5

Mechanism of Entropy Increase When Energy is Injected

Turn, now, from formalism to a simple thought experiment that gives insight into entropy. Consider a single, large, closed system (not an ensemble), which has evolved for a time far longer than $\tau_{\rm int}$ and thereby has reached statistical equilibrium. Let $T(V,S,N)$ be the temperature that characterizes this system’s grand-canonically-distributed subsystems. Now add a small amount $\Delta Q$ of thermal energy (heat) to the system, without changing its volume $V$ or its number of conserved particles $N$. The added heat, being on an equal footing with any other kind of energy in the law of energy conservation, must appear in the first law as an energy change $\Delta\mathcal{E} = \Delta Q$; and correspondingly, according to the first law (4.7), the added heat must increase the system’s entropy by an amount
$$\Delta S = \frac{\Delta Q}{T} . \tag{4.12}$$

This can be generalized: the energy need not be inserted into the system in the form of heat. Rather, one can add the energy mechanically, e.g., by stirring the system if it is a liquid; or one can add it optically by shining a laser beam into it and letting a few of the system’s atoms absorb the laser light. In either case the system, immediately after energy insertion, will be far from statistical equilibrium; i.e., its macroscopic properties, such as the number of atoms with energies far higher than the mean (i.e., its macrostate), will be highly improbable according to the microcanonical distribution.³ However, if one waits long enough ($\Delta t \gg \tau_{\rm int}$) after the energy addition, the system will thermalize; i.e., it will evolve into a macrostate that is rather probable according to the microcanonical distribution, and thereafter it will wander ergodically through system quantum states that correspond, more or less, to this macrostate. This final, thermalized macrostate and the initial macrostate, before energy insertion, both have the same volume $V$ and the same number of conserved particles $N$; but they differ in energy by the amount $\Delta\mathcal{E}$ that was inserted. Correspondingly, they also differ in entropy by
$$\Delta S = \frac{\Delta\mathcal{E}}{T} . \tag{4.13}$$
Where did this entropy come from? Suppose that the energy was injected by a laser. Then initially the energy went into those specific atoms that absorbed the photons. Subsequently, however, those atoms randomly exchanged and shared the energy with other atoms. This exchange and sharing is a variant of the phase mixing of Sec. 3.7, and it is responsible for the thermal equilibration and the entropy increase.
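Equation (4.13) is a first-order statement, valid for small $\Delta\mathcal{E}$; for a finite injection one integrates $dS = d\mathcal{E}/T$ while $T$ rises. The minimal sketch below illustrates this for the nonrelativistic monatomic ideal gas, solving Eq. (4.9c) for $S(E,V,N)$ in assumed units $h = m = k_B = 1$ and comparing the exact entropy change at fixed $V$, $N$ with $\Delta E/T$, where $T = 2E/3Nk_B$ follows from differentiating Eq. (4.9c). The ratio should approach 1 as $\Delta E \to 0$.

```python
import math

# Illustrative sketch: units (h = m = k_B = 1) and names are assumptions.
K_B = 1.0
PREFACTOR = 3.0 / (4.0 * math.pi)   # 3h^2/(4 pi m) with h = m = 1


def entropy(E: float, V: float, N: float) -> float:
    """S(E, V, N) obtained by solving Eq. (4.9c) for S."""
    return 1.5 * N * K_B * (math.log(E / (N * PREFACTOR))
                            + (2.0 / 3.0) * math.log(V / N) + 5.0 / 3.0)


E, V, N = 100.0, 50.0, 2.0
T = 2.0 * E / (3.0 * N * K_B)        # T = (dE/dS)_{V,N} for Eq. (4.9c)

ratios = []
for dE in (10.0, 1.0, 0.01):
    dS = entropy(E + dE, V, N) - entropy(E, V, N)   # exact change at fixed V, N
    ratios.append(dS / (dE / T))                    # tends to 1 as dE -> 0
    print(dE, dS, dE / T)
```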

4.2.6

Representations of Thermodynamics

The treatment of thermodynamics given in this section is called the energy representation because it is based on the fundamental potential $\mathcal{E}(S,V,N)$, in which the energy is expressed as a function of the complete set of extensive variables $\{S, V, N\}$. This energy representation is intimately related to the microcanonical ensemble, as the discussion near the beginning of this section shows. In Sec. 3.8 of the last chapter we met a grand-potential representation for thermodynamics, which was based on the grand potential $\Omega(T,\tilde\mu,V)$ and was intimately related to the grand canonical ensemble for systems of volume $V$ in equilibrium with a heat and particle bath that has temperature $T$ and chemical potential $\tilde\mu$. In the next two sections we shall meet the two representations of thermodynamics that are intimately related to the canonical and Gibbs ensembles, and shall discover their special power at handling certain special issues. And in Ex. 4.3 we shall meet a representation and ensemble based on enthalpy. These five representations and their ensembles are summarized in Table 4.1 above.

³ We use the word “macrostate” to distinguish clearly from the quantum states available to the system as a whole, which in equilibrium are all equally likely. The probability for a macrostate is proportional to the number of system quantum states that correspond to it.

****************************
EXERCISES

Exercise 4.1 Problem: Pressure-Measuring Device
For the microcanonical ensemble considered in Sec. 4.2, derive Eq. (4.5) for the pressure using a thought experiment involving a pressure-measuring device.

Exercise 4.2 Derivation: Energy Representation for a Nonrelativistic Monatomic Gas
(a) Use the fundamental potential $E(V,S,N)$ for the nonrelativistic, monatomic gas to derive Eq. (4.11) for the pressure, temperature, and chemical potential.
(b) Show that these equations of state satisfy the Maxwell relations (4.10b).
(c) Combine these equations of state to obtain the perfect-gas equation of state
$$P = \frac{N k_B T}{V} . \tag{4.14}$$
Note that this is the same equation as we obtained using the grand canonical ensemble, Eq. (3.47c), except that the average pressure and particle number are replaced by their exact microcanonical equivalents. As we discussed in Sec. 4.1, this is no coincidence.

****************************

4.3

Canonical Ensemble and the Physical-Free-Energy Representation of Thermodynamics

In this section we focus attention on an ensemble of systems that can exchange energy but nothing else with a heat bath at temperature $T$. The systems thus have variable total energy $\mathcal{E}$, but they all have the same, fixed values of the two remaining extensive variables $N$ and $V$. (Once again, generalization to additional particle species and additional means of performing work on the system is straightforward.) We presume that the ensemble has reached statistical equilibrium, so it is canonical with distribution function (probability of occupying any quantum state of energy $\mathcal{E}$) given by Eq. (3.19):
$$\rho = \frac{1}{z}\,e^{-\mathcal{E}/k_B T} \equiv e^{(F-\mathcal{E})/k_B T} . \tag{4.15}$$

Here, as in the grand canonical ensemble [Eq. (3.36)], we have introduced special notations for the normalization constant: $1/z = e^{F/k_B T}$, where $z$ (the partition function) and $F$ (the physical free energy or Helmholtz free energy) are functions of the systems’ fixed $N$ and $V$ and the bath’s temperature $T$. Once the microscopic configurations (quantum states $|n\rangle$) of fixed $N$ and $V$ but variable $\mathcal{E}$ have been identified, the functions $z(N,V,T)$ and $F(N,V,T)$ can be computed from the normalization relation $\sum_n \rho_n = 1$:
$$e^{-F/k_B T} \equiv z(T,N,V) = \sum_n e^{-\mathcal{E}_n/k_B T} = \int e^{-\mathcal{E}(q,p)/k_B T}\,\frac{d^W q\,d^W p}{\mathcal{M} h^W} . \tag{4.16}$$

Having evaluated $z(T,N,V)$ or, equivalently, $F(T,N,V)$, one can then proceed as follows to determine other thermodynamic properties of the ensemble’s systems. The entropy $S$ can be computed from the standard expression $S = -k_B \sum_n \rho_n \ln\rho_n = -k_B\,\overline{\ln\rho}$, together with Eq. (4.15) for $\rho$:
$$S = \frac{\bar{\mathcal{E}} - F}{T} . \tag{4.17a}$$

It is helpful to rewrite Eq. (4.17a) as an equation for the physical free energy $F$:
$$F = \bar{\mathcal{E}} - TS . \tag{4.17b}$$
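The chain from Eq. (4.16) to Eqs. (4.17a,b) can be traced on a small, hypothetical spectrum. The minimal sketch below, with energies in units where $k_B = 1$ and a function name of our own, sums $z$ over a handful of made-up state energies (one level is doubly degenerate, so it is listed twice), forms $F = -k_BT\ln z$ and $\rho_n = e^{(F-\mathcal{E}_n)/k_BT}$, and confirms that $(\bar{\mathcal{E}} - F)/T$ agrees with $-k_B\sum_n\rho_n\ln\rho_n$. Direct summation like this is exactly what becomes impractical for macroscopic systems.

```python
import math

# Illustrative sketch: the spectrum and function name are assumptions (k_B = 1).


def canonical_summary(levels, kT):
    """Sum over states for a finite list of state energies.

    Returns (z, F, mean energy, S/k_B via Eq. 4.17a, S/k_B via -sum rho ln rho).
    Energies should lie within a few hundred kT of each other to avoid underflow.
    """
    weights = [math.exp(-E / kT) for E in levels]
    z = sum(weights)                                 # partition function, Eq. (4.16)
    F = -kT * math.log(z)                            # e^{-F/kT} = z
    rho = [math.exp((F - E) / kT) for E in levels]   # Eq. (4.15)
    E_mean = sum(r * E for r, E in zip(rho, levels))
    S_from_F = (E_mean - F) / kT                     # Eq. (4.17a), in units of k_B
    S_gibbs = -sum(r * math.log(r) for r in rho)
    return z, F, E_mean, S_from_F, S_gibbs


levels = [0.0, 1.0, 1.0, 2.5]   # hypothetical spectrum; the level at 1.0 is doubly degenerate
z, F, E_mean, S1, S2 = canonical_summary(levels, kT=0.7)
print(z, F, E_mean)
print(S1, S2)                   # the two entropy expressions agree
```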

Suppose, now, that the canonical ensemble’s parameters $T$, $N$, $V$ are changed slightly. By how much will the physical free energy change? Equation (4.17b) tells us that
$$dF = d\bar{\mathcal{E}} - T\,dS - S\,dT . \tag{4.17c}$$

Because macroscopic thermodynamics is independent of the statistical ensemble being studied, we can evaluate $d\bar{\mathcal{E}}$ using the first law of thermodynamics (4.7) with the microcanonical exact energy $\mathcal{E}$ replaced by the canonical mean energy $\bar{\mathcal{E}}$. The result is
$$dF = -S\,dT + \tilde\mu\,dN - P\,dV . \tag{4.18}$$

Equation (4.18) contains the same information as the first law of thermodynamics and can be thought of as the first law rewritten in a new mathematical representation, the physical-free-energy representation. In the original energy representation we regarded $\mathcal{E}(S,N,V)$ as the fundamental potential, and the first law described how $\mathcal{E}$ changes when its independent variables $S$, $N$, $V$ are changed. In the physical-free-energy representation we regard $F(T,N,V)$ as the fundamental potential, and the first law (4.18) describes how $F$ changes when its independent variables $T$, $N$, $V$ change. Because the microcanonical ensemble deals with systems of fixed $\mathcal{E}$, $N$, $V$, it is the foundation that underlies the energy representation. Because the canonical ensemble deals with systems of fixed $T$, $N$, $V$, it is the foundation that underlies the physical-free-energy representation.

Fig. 4.1: Origin of the name physical free energy for $F(V,T,N)$. (Figure: a gas-filled chamber, closed on one side by a piston, in thermal contact with a heat bath.)

Equation (4.17b), which leads mathematically from the energy representation to the physical-free-energy representation, is called a Legendre transformation and is a common tool (e.g. in classical mechanics⁴) for switching from one set of independent variables to another. The independent variables in the physical-free-energy representation are the variables whose differentials appear on the right-hand side of the first law (4.18), namely $T$, $N$, $V$. The “generalized forces” paired with these independent variables, $-S$, $\tilde\mu$, $-P$ respectively, are the coefficients of the first-law changes. The corresponding Euler equation can be inferred from the first law by thinking of building up the full system, piece by piece, with the extensive variables growing in unison and the intensive variables held fixed; the result is
$$F = -PV + \tilde\mu N . \tag{4.19}$$

Note that the temperature is present in this relation only implicitly, through the dependence of $F$, $P$, and $\tilde\mu$ on the representation’s independent variables $T$, $N$, $V$. We can use the physical-free-energy form of the first law to read off equations of state for the generalized forces,
$$-P = \left(\frac{\partial F}{\partial V}\right)_{T,N}, \qquad -S = \left(\frac{\partial F}{\partial T}\right)_{V,N}, \qquad \tilde\mu = \left(\frac{\partial F}{\partial N}\right)_{V,T} . \tag{4.20}$$
Finally, Maxwell relations can be derived from the equality of mixed partial derivatives, as in the energy representation; for example, $(\partial P/\partial T)_{V,N} = (\partial S/\partial V)_{T,N}$.
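The equations of state (4.20) and the Euler relation (4.19) can be checked for the nonrelativistic monatomic ideal gas. This minimal sketch does not derive $F$. Instead, it assumes the standard textbook result $F(T,V,N) = -Nk_BT[\ln(V/N\lambda^3) + 1]$, where $\lambda = h/\sqrt{2\pi mk_BT}$ is the thermal de Broglie wavelength and Stirling’s approximation has been used for $N!$. The sketch works in units $h = m = k_B = 1$, and since it is nonrelativistic, $\mu$ replaces $\tilde\mu$. It differentiates $F$ by centred finite differences, then compares the results with $P = Nk_BT/V$, with the Sackur-Tetrode entropy, and with $F = -PV + \mu N$. Helper names are our own.

```python
import math

# Illustrative sketch: the assumed F(T, V, N), units and names are not from the text.
H_PLANCK, MASS, K_B = 1.0, 1.0, 1.0


def thermal_wavelength(T):
    return H_PLANCK / math.sqrt(2 * math.pi * MASS * K_B * T)


def free_energy(T, V, N):
    """Physical (Helmholtz) free energy of a classical monatomic ideal gas."""
    return -N * K_B * T * (math.log(V / (N * thermal_wavelength(T) ** 3)) + 1)


def partial(f, args, i, rel_step=1e-6):
    """Centred finite difference of f with respect to args[i] (0=T, 1=V, 2=N)."""
    step = rel_step * args[i]
    up, down = list(args), list(args)
    up[i] += step
    down[i] -= step
    return (f(*up) - f(*down)) / (2 * step)


T, V, N = 3.0, 50.0, 2.0
P = -partial(free_energy, (T, V, N), 1)    # -P = (dF/dV)_{T,N}
S = -partial(free_energy, (T, V, N), 0)    # -S = (dF/dT)_{V,N}
mu = partial(free_energy, (T, V, N), 2)    # mu = (dF/dN)_{V,T}

print(P, N * K_B * T / V)                       # perfect-gas law
print(free_energy(T, V, N), -P * V + mu * N)    # Euler relation, Eq. (4.19)
```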

4.3.1

Experimental Meaning of Physical Free Energy

The name physical free energy for $F$ can be understood using the idealized experiment shown in Fig. 4.1. Gas is placed in a chamber, one wall of which is a piston; and the chamber comes into thermal equilibrium with a heat bath, with which it can exchange heat but not particles. The volume of the chamber has some initial value $V_i$; and correspondingly, the gas has some initial physical free energy $F(V_i,T,N)$. The gas is then allowed to push the piston to the right sufficiently slowly for the gas to remain always in thermal equilibrium with the heat bath, at the bath’s temperature $T$. When the chamber has reached its final volume $V_f$, the total work done on the piston by the gas, i.e., the total energy extracted by the piston from this “engine”, is
$$E_{\rm extracted} = \int_{V_i}^{V_f} P\,dV . \tag{4.21a}$$
Using the first law $dF = -S\,dT + \tilde\mu\,dN - P\,dV$ and remembering that $T$ and $N$ are kept constant, Eq. (4.21a) becomes
$$E_{\rm extracted} = F(V_i,T,N) - F(V_f,T,N) \equiv -\Delta F . \tag{4.21b}$$

⁴ For example, Goldstein (1980).

Thus, $F$ is the energy that is “free to be extracted” in an isothermal, physical expansion of the gas.⁵ If the expansion had been done in a chamber that was perfectly thermally insulated, so no heat could flow in or out of it, Eq. (4.12) tells us that there would have been no entropy change. Correspondingly, with $S$ and $N$ held fixed but $V$ changing during the expansion, the natural way to analyze the expansion would have been in the energy representation; and that representation’s first law $d\mathcal{E} = -P\,dV + T\,dS + \tilde\mu\,dN$ would have told us that the total energy extracted, $\int P\,dV$, was equal to the decrease, $-\Delta\mathcal{E}$, of the gas’s total energy. Such a process, which occurs without any heat flow or entropy increase, is called adiabatic. Thus, the energy $\mathcal{E}$ (or in the nonrelativistic regime $E$) measures the amount of energy that can be extracted from an adiabatic engine, by contrast with $F$, which measures the energy extracted from an isothermal engine.

****************************
EXERCISES

Exercise 4.3 **Example: The Enthalpy Representation of Thermodynamics
(a) Enthalpy $H$ is a macroscopic thermodynamic variable defined by
$$H \equiv \mathcal{E} + PV . \tag{4.22}$$

Show that this definition can be regarded as a Legendre transformation that converts from the energy representation of thermodynamics, with $\mathcal{E}(V,S,N)$ as the fundamental potential, to an enthalpy representation with $H(P,S,N)$ as the fundamental potential. More specifically, show that the first law, re-expressed in terms of $H$, takes the form
$$dH = V\,dP + T\,dS + \tilde\mu\,dN ; \tag{4.23}$$
and then explain why this first law dictates that $H(P,S,N)$ be taken as the fundamental potential.

⁵ More generally, the phrase “free energy” means the energy that can be extracted in a process that occurs in contact with some sort of environment. The nature of the free energy depends on the nature of the contact. We will meet “chemical free energy” in the next section.

(b) There is an equilibrium statistical mechanics ensemble associated with the enthalpy representation. Show that each system of this ensemble (fluctuationally) exchanges volume and energy with a surrounding bath but does not exchange heat or particles, so the exchanged energy is solely that associated with the exchanged volume, $d\mathcal{E} = -P\,dV$, and the enthalpy $H$ does not fluctuate.
(c) Show that this ensemble’s distribution function is $\rho = e^{-S/k_B} =$ constant for those states in phase space that have a specified number of particles $N$ and a specified enthalpy $H$. Why do we not need to allow for a small range $\delta H$ of $H$, by analogy with the small range $\delta\mathcal{E}$ for the microcanonical ensemble (Sec. 3.5 and Ex. 3.8)?
(d) What equations of state can be read off from the enthalpy first law? What are the Maxwell relations between these equations of state?
(e) What is the Euler equation for $H$ in terms of a sum of products of extensive and intensive variables?
(f) Show that the system’s enthalpy is equal to its total inertial mass (multiplied by the speed of light squared); cf. Exs. 1.28 and 1.29.
(g) As another interpretation of the enthalpy, think of the system as enclosed in an impermeable box of volume $V$. You are asked to inject into the box a “sample” of additional material of the same sort as is already there. (It may be helpful to think of the material as a gas.) The sample is to be put into the same thermodynamic state, i.e. macrostate, as that of the box’s material; i.e., it is to be given the same values of temperature $T$, pressure $P$, and chemical potential $\tilde\mu$. Thus, the sample’s material is indistinguishable in its thermodynamic properties from the material already in the box, except that its extensive variables (denoted by $\Delta$’s) are far smaller: $\Delta V/V = \Delta\mathcal{E}/\mathcal{E} = \Delta S/S \ll 1$.
Perform the injection by opening up a hole in one of the box’s walls, pushing aside the box’s material enough to make a little cavity of volume ∆V equal to that of the sample, inserting the sample into the cavity, and then closing the hole in the wall. The box now has the same volume V as before, but its energy has changed. Show that the energy change, i.e., the energy required to create the sample and perform the injection, is equal to the enthalpy ∆H of the sample. Thus, enthalpy has the physical interpretation of “energy of injection at fixed volume V ”.

****************************
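The isothermal-engine statement of Sec. 4.3.1 says that the work the gas does on the piston, $\int_{V_i}^{V_f} P\,dV$, equals the drop $F(V_i,T,N) - F(V_f,T,N)$ in physical free energy. This can be checked numerically for a classical ideal gas. The minimal sketch below assumes the standard ideal-gas results $F = -Nk_BT[\ln(V/N\lambda^3) + 1]$, with $\lambda = h/\sqrt{2\pi mk_BT}$, and $P = Nk_BT/V$, in units $h = m = k_B = 1$. It integrates $P\,dV$ with the composite Simpson rule and compares the result with the free-energy drop and with the closed form $Nk_BT\ln(V_f/V_i)$. Helper names are our own.

```python
import math

# Illustrative sketch: assumed ideal-gas formulas, units and names (not from the text).
H_PLANCK, MASS, K_B = 1.0, 1.0, 1.0


def free_energy(T, V, N):
    """Classical monatomic ideal gas: F = -N k T [ln(V / (N lambda^3)) + 1]."""
    lam = H_PLANCK / math.sqrt(2 * math.pi * MASS * K_B * T)
    return -N * K_B * T * (math.log(V / (N * lam**3)) + 1)


def pressure(T, V, N):
    return N * K_B * T / V


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f from a to b (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


T, N, V_i, V_f = 3.0, 2.0, 10.0, 40.0
work_on_piston = simpson(lambda V: pressure(T, V, N), V_i, V_f)
drop_in_F = free_energy(T, V_i, N) - free_energy(T, V_f, N)
print(work_on_piston, drop_in_F, N * K_B * T * math.log(V_f / V_i))
```

The three numbers agree because, at fixed $T$ and $N$, $dF = -P\,dV$; the temperature-dependent $\lambda$ cancels from the difference.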

4.4

The Gibbs Representation of Thermodynamics; Phase Transitions and Chemical Reactions

Turn attention, next, to the most important of the various representations of thermodynamics: the one appropriate to systems in which the temperature $T$ and pressure $P$ are both being controlled by an external environment (bath) and thus are treated as independent variables in the fundamental potential. This is the situation in most laboratory experiments. This representation can be obtained from the energy representation using a Legendre transformation
$$G \equiv \mathcal{E} - TS + PV . \tag{4.24}$$
We call the quantity $G$ the Gibbs potential (it is also known as the chemical free energy or the Gibbs free energy), and we call the representation of thermodynamics based on it the Gibbs representation. Differentiating Eq. (4.24) and combining with the energy representation’s first law (4.7), we obtain the first law in the Gibbs representation:
$$dG = V\,dP - S\,dT + \tilde\mu\,dN . \tag{4.25}$$

From this first law we read off the independent variables of the Gibbs representation, namely $\{P, T, N\}$, and the equations of state:

$$V = \left(\frac{\partial G}{\partial P}\right)_{T,N}, \qquad S = -\left(\frac{\partial G}{\partial T}\right)_{P,N}, \qquad \tilde\mu = \left(\frac{\partial G}{\partial N}\right)_{P,T}; \qquad (4.26)$$

and from the equality of mixed partial derivatives, we read off Maxwell relations. By imagining building up a large system from many tiny subsystems (all with the same, fixed $P$ and $T$) and applying the first law (4.25) to this buildup, we obtain the Euler relation

$$G = \tilde\mu N. \qquad (4.27)$$
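As a numerical check of Eqs. (4.24) to (4.27), the sketch below assumes a dilute, nonrelativistic, spinless ideal gas with rest masses excluded, for which the dilute-gas relation (4.36), solved for the chemical potential, gives $\mu = k_BT\ln(n\lambda^3)$ with $n = P/k_BT$ and $\lambda = h/\sqrt{2\pi m k_BT}$. It builds $G(P, T, N) = \mu N$ from the Euler relation and then recovers the equations of state (4.26) by central finite differences. The helium-like mass, the state point, and the function names are illustrative choices, not taken from the text.

```python
import math

# SI constants (exact values fixed by the 2019 SI redefinition)
H_PLANCK = 6.62607015e-34   # J s
K_B = 1.380649e-23          # J/K

def gibbs_ideal_gas(P, T, N, m):
    """G(P, T, N) for a dilute, nonrelativistic, spinless ideal gas, rest mass excluded.

    Assumes mu = k_B T ln(n lambda^3), with n = P/(k_B T) and
    lambda = h / sqrt(2 pi m k_B T); then G = mu N by the Euler relation (4.27).
    """
    lam = H_PLANCK / math.sqrt(2.0 * math.pi * m * K_B * T)
    n = P / (K_B * T)
    mu = K_B * T * math.log(n * lam**3)
    return mu * N

def central_diff(f, x, h):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

m = 6.6464731e-27             # kg, roughly one helium-4 atom (illustrative)
P, T, N = 1.0e5, 300.0, 1.0e22

G = gibbs_ideal_gas(P, T, N, m)
# Equations of state (4.26) as partial derivatives of G
V = central_diff(lambda p: gibbs_ideal_gas(p, T, N, m), P, 1e-4 * P)
S = -central_diff(lambda t: gibbs_ideal_gas(P, t, N, m), T, 1e-4 * T)
mu = central_diff(lambda k: gibbs_ideal_gas(P, T, k, m), N, 1e-4 * N)  # rest mass excluded

E = 1.5 * N * K_B * T         # energy of a monatomic ideal gas, rest mass excluded
print(V, N * K_B * T / P)     # V should reproduce the ideal-gas law N k_B T / P
print(G, E - T * S + P * V)   # Legendre transform (4.24): G = E - TS + PV
```

Because $G$ is exactly linear in $N$ here, the derivative with respect to $N$ reproduces $\mu = G/N$ to rounding error, which is the content of the Euler relation.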

The energy representation (Sec. 4.2) is intimately associated with the microcanonical ensemble, the physical-free-energy representation (Sec. 4.3) is intimately associated with the canonical ensemble, and the grand-potential representation (Sec. 3.8) is intimately associated with the grand-canonical ensemble. What, by analogy, is the ensemble associated with the Gibbs representation? In general, those independent variables of a representation that are intensive are properties of the bath with which the ensemble’s systems interact, the extensive variables paired with those independent intensive variables are the quantities exchanged with the bath, and the independent extensive variables are the quantities held fixed in the ensemble’s systems. For the Gibbs representation the independent intensive variables are pressure $P$ and temperature $T$, and therefore the bath must be characterized by fixed $P$ and $T$. Paired with $P$ and $T$ are the extensive variables $V$ and $E$, and therefore the systems exchange volume and energy with the bath. The only independent extensive variable is $N$, and thus the systems in the ensemble must have fixed $N$. A little thought then leads to the following conclusion: The bath can be regarded (for pedagogical purposes) as an enormous, closed system, of fixed volume, fixed energy and fixed number of particles. Each system in our ensemble can be thought of as immersed in this bath (or in a separate but identical bath). The system’s particles are prevented from escaping into the bath and the bath’s particles are prevented from entering the system. We can achieve this by mentally identifying a huge set of particles that we regard as the system and placing an imaginary interface between them and the bath. Alternatively, we can imagine a membrane separating the system from the bath: a membrane impermeable to particles, but through which heat can pass, and with negligible surface tension so the system and the bath can buffet each other freely. The system’s volume will then fluctuate stochastically as its boundary membrane is buffeted by fluctuating forces from particles hitting it; and the system’s energy will fluctuate stochastically both as $P\,dV$ work is done on it by the volume-fluctuating bath, and as heat stochastically flows back and forth through the membrane between bath and system. Correspondingly, we can think of the bath as a “heat and volume bath”; it freely exchanges heat and volume with the system, and exchanges nothing else.

In Sec. 3.4.2 we showed that an ensemble of systems in equilibrium with such a heat and volume bath (a Gibbs ensemble) is described by the distribution function $\rho = \text{const}\times e^{-E/k_BT}\,e^{-PV/k_BT}$. By analogy with Eq. (4.15) for the canonical ensemble’s distribution function and (3.36) for the grand canonical ensemble’s distribution function, it is reasonable to expect the normalization constant to be the exponential of the Gibbs potential $G(P, T, N)$; i.e.,

$$\rho = e^{G/k_BT}\,e^{-(E+PV)/k_BT}, \qquad (4.28)$$

where $T$ and $P$ are the temperature and pressure of the bath, and $E$ and $V$ are the energy and volume of a specific system in the Gibbs ensemble. We can verify this by computing the entropy $S = -k_B\,\overline{\ln\rho}$ using expression (4.28) for $\rho$, and getting $S = -(G - \bar E - P\bar V)/T$, which agrees with the definition (4.24) of $G$ when we identify the ensemble-averaged energy $\bar E$ with the “precise” energy $E$ and the ensemble-averaged volume $\bar V$ with the “precise” volume $V$. While this Gibbs ensemble is conceptually interesting, it is not terribly useful for computing the fundamental potential $G(T, P, N)$, because evaluating the normalization sum

$$e^{-G/k_BT} = \sum_n e^{-(E_n + PV_n)/k_BT} \qquad (4.29)$$

is much more difficult than computing the physical free energy $F(T, V, N)$ from the sum over states (4.16), or than computing the grand potential $\Omega(T, V, \tilde\mu)$ from the sum over states (3.37). Correspondingly, in most statistical mechanics textbooks there is little or no discussion of the Gibbs ensemble.
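For a system with only a handful of states the sum (4.29) is trivial, and the consistency check described above can be carried out directly. The sketch below assumes a hypothetical system with four states of made-up energies $E_n$ and volumes $V_n$, in units with $k_B = 1$; it builds $\rho_n$ from Eqs. (4.28) and (4.29) and compares $S = -k_B\sum_n \rho_n\ln\rho_n$ with $-(G - \bar E - P\bar V)/T$.

```python
import math

K_B = 1.0  # units with k_B = 1 (assumption for this toy model)

def gibbs_ensemble(states, T, P):
    """Return (rho, G) for a Gibbs ensemble over the given (E_n, V_n) states.

    G comes from the normalization sum (4.29); rho_n from Eq. (4.28).
    """
    weights = [math.exp(-(E + P * V) / (K_B * T)) for E, V in states]
    G = -K_B * T * math.log(sum(weights))
    rho = [math.exp(G / (K_B * T)) * w for w in weights]
    return rho, G

# Hypothetical states (E_n, V_n) of one system with fixed N
states = [(0.0, 1.0), (1.0, 1.5), (2.0, 0.8), (0.5, 2.0)]
T, P = 1.3, 0.7

rho, G = gibbs_ensemble(states, T, P)
E_bar = sum(r * E for r, (E, _) in zip(rho, states))
V_bar = sum(r * V for r, (_, V) in zip(rho, states))
S = -K_B * sum(r * math.log(r) for r in rho)

print(S, -(G - E_bar - P * V_bar) / T)  # the two entropy expressions agree
```

The agreement is exact up to rounding, because $\ln\rho_n = [G - (E_n + PV_n)]/k_BT$ for every state.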

4.4.1

Minimum Principles for Gibbs and Other Fundamental Thermodynamic Potentials

Despite its lack of usefulness in computing $G$, the Gibbs ensemble plays an important conceptual role in a minimum principle for the Gibbs potential, which we shall now derive. Consider an ensemble of systems, each of which is immersed in an identical heat and volume bath, and assume that the ensemble begins with some arbitrary distribution function, one that is not in equilibrium with the baths. As time passes each system will interact with its bath and will evolve in response to that interaction; and correspondingly the ensemble’s distribution function $\rho$ will evolve. At any moment of time the ensemble’s systems will have some mean (ensemble-averaged) energy $\bar E \equiv \sum_n \rho_n E_n$ and volume $\bar V \equiv \sum_n \rho_n V_n$, and the ensemble will have some entropy $S = -k_B\sum_n \rho_n\ln\rho_n$. From these quantities (which are well defined even though the ensemble may be very far from statistical equilibrium), we can compute a Gibbs potential $G$ for the ensemble. This $G$ is defined by the analog of the equilibrium definition (4.24),

$$G \equiv \bar E + P\bar V - TS, \qquad (4.30)$$

where $P$ and $T$ are the pressure and temperature of the identical baths with which the ensemble’s systems are interacting.⁶ Now, as the evolution proceeds, the total entropy of the baths’ ensemble plus the systems’ ensemble will continually increase, until equilibrium is reached. Suppose that during a short stretch of evolution the systems’ mean energy changes by $\Delta\bar E$, their mean volume changes by $\Delta\bar V$, and the entropy of the ensemble changes by $\Delta S$. Then, by conservation of energy and volume, the baths’ mean energy and volume must change by

$$\Delta\bar E_{\rm bath} = -\Delta\bar E, \qquad \Delta\bar V_{\rm bath} = -\Delta\bar V. \qquad (4.31a)$$

Because the baths (by contrast with the systems) are in statistical equilibrium, we can apply to them the first law of thermodynamics for equilibrated systems,

$$\Delta\bar E_{\rm bath} = -P\,\Delta\bar V_{\rm bath} + T\,\Delta S_{\rm bath} + \tilde\mu\,\Delta N_{\rm bath}. \qquad (4.31b)$$

Since $N_{\rm bath}$ is not changing (the systems cannot exchange particles with their baths) and since the changes of bath energy and volume are given by Eqs. (4.31a), Eq. (4.31b) tells us that the baths’ entropy changes by

$$\Delta S_{\rm bath} = \frac{-\Delta\bar E - P\,\Delta\bar V}{T}. \qquad (4.31c)$$

Correspondingly, the sum of the baths’ entropy and the systems’ entropy changes by the following amount, which cannot be negative:

$$\Delta S_{\rm bath} + \Delta S = \frac{-\Delta\bar E - P\,\Delta\bar V + T\,\Delta S}{T} \ge 0. \qquad (4.31d)$$

Because the baths’ pressure $P$ and temperature $T$ are not changing (the systems are so tiny compared to the baths that the energy and volume they exchange with the baths cannot have any significant effect on the baths’ intensive variables), the evolutionary change in the ensemble’s Gibbs potential (4.30) is $\Delta G = \Delta\bar E + P\,\Delta\bar V - T\,\Delta S$, which is minus the numerator of expression (4.31d). Therefore

$$\Delta S_{\rm bath} + \Delta S = -\frac{\Delta G}{T} \ge 0. \qquad (4.32)$$

Thus, the second law of thermodynamics for an ensemble of arbitrary systems in contact with identical heat and volume baths is equivalent to the law that the systems’ Gibbs potential can never increase. As the evolution proceeds and the entropy of baths plus systems continually increases, the Gibbs potential $G$ will be driven smaller and smaller, until ultimately, when statistical equilibrium with the baths is reached, $G$ will stop at its final, minimum value.

⁶ Notice that, because the number $N$ of particles in the system is fixed, as is the bath temperature $T$, the evolving Gibbs potential is proportional to

$$g \equiv \frac{G}{Nk_BT} = \frac{\bar E}{Nk_BT} + \frac{P\bar V}{Nk_BT} - \frac{S}{Nk_B}.$$

This quantity is dimensionless and generally of order unity. Note that the last term is the dimensionless entropy per particle [Eq. (3.48) and associated discussion].

The ergodic hypothesis guarantees that this minimum principle applies not only to an ensemble of systems, but also to a single, individual system when that system is averaged over times long compared to its internal timescales $\tau_{\rm int}$ (but times that may be very short compared to the timescale for interaction with the heat and volume bath): The system’s time-averaged energy $\bar E$ and volume $\bar V$, and its entropy $S$ (as computed, e.g., by examining the temporal wandering of its state on timescales $\sim\tau_{\rm int}$), combine with the bath’s temperature $T$ and pressure $P$ to give a Gibbs function $G = \bar E + P\bar V - TS$. This $G$ evolves on times long compared to the averaging time used to define it; and that evolution must be one of continually decreasing $G$. Ultimately, when the system reaches equilibrium with the bath, $G$ achieves its minimum value.

At this point we might ask about the other thermodynamic potentials. Not surprisingly, associated with each of them there is an extremum principle analogous to “minimum $G$”:

(i) For the energy potential $E(V, S, N)$, one focuses on closed systems and switches to $S(V, E, N)$; and the extremum principle is then the standard second law of thermodynamics: An ensemble of closed systems of fixed $E$, $V$, $N$ must evolve always toward increasing entropy $S$; and when it ultimately reaches equilibrium, the ensemble will be microcanonical and will have maximum entropy.
(ii) For the physical free energy $F(V, T, N)$ one can derive, in a manner perfectly analogous to the Gibbs derivation, the following minimum principle: For an ensemble of systems interacting with a heat bath, the physical free energy $F$ will always decrease, ultimately reaching a minimum when the ensemble reaches its final, equilibrium, canonical distribution.

(iii) The grand potential $\Omega(V, T, \tilde\mu)$ (Sec. 3.8) satisfies the analogous minimum principle: For an ensemble of systems interacting with a heat and particle bath, the grand potential $\Omega$ will always decrease, ultimately reaching a minimum when the ensemble reaches its final, equilibrium, grand-canonical distribution.

(iv) For the enthalpy $H(P, S, N)$ (Ex. 4.3) the analogous extremum principle is one of maximum entropy: For an ensemble of systems interacting with a volume bath as in Ex. 4.3, the enthalpy $H$ does not fluctuate, and the entropy $S$ will always increase, ultimately reaching a maximum when the ensemble reaches its final equilibrium distribution (cf. Table 4.2).
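The Gibbs minimum principle can be checked on a toy model. For any distribution $\rho$ over states with energies $E_n$ and volumes $V_n$, the quantity (4.30) satisfies $G[\rho] - G_{\rm eq} = k_BT\sum_n \rho_n\ln(\rho_n/\rho_{{\rm eq},n})$, where $\rho_{\rm eq}$ is the Gibbs distribution (4.28); the right side is a relative entropy, which is never negative and vanishes only for $\rho = \rho_{\rm eq}$. The sketch below (units with $k_B = 1$, hypothetical states) relaxes an arbitrary starting distribution toward $\rho_{\rm eq}$ along a straight-line path. That path is an illustrative stand-in, not a model of the real dynamics, but $G$ decreases monotonically along it and ends at its minimum, $G_{\rm eq} = -k_BT\ln\sum_n e^{-(E_n + PV_n)/k_BT}$.

```python
import math

K_B = 1.0  # units with k_B = 1 (toy-model assumption)

def gibbs_of(rho, states, T, P):
    """G = E_bar + P V_bar - T S of Eq. (4.30), valid for any rho, in or out of equilibrium."""
    E_bar = sum(r * E for r, (E, _) in zip(rho, states))
    V_bar = sum(r * V for r, (_, V) in zip(rho, states))
    S = -K_B * sum(r * math.log(r) for r in rho if r > 0.0)
    return E_bar + P * V_bar - T * S

def equilibrium_rho(states, T, P):
    """Gibbs distribution (4.28) and its G, normalized as in Eq. (4.29)."""
    w = [math.exp(-(E + P * V) / (K_B * T)) for E, V in states]
    Z = sum(w)
    return [x / Z for x in w], -K_B * T * math.log(Z)

states = [(0.0, 1.0), (1.0, 1.5), (2.0, 0.8), (0.5, 2.0)]  # hypothetical (E_n, V_n)
T, P = 1.3, 0.7
rho_eq, G_eq = equilibrium_rho(states, T, P)
rho_0 = [0.7, 0.1, 0.1, 0.1]  # arbitrary out-of-equilibrium starting distribution

# Straight-line relaxation rho(s) = (1 - s) rho_0 + s rho_eq for s = 0, 0.1, ..., 1
G_path = []
for i in range(11):
    s = i / 10
    rho = [(1 - s) * a + s * b for a, b in zip(rho_0, rho_eq)]
    G_path.append(gibbs_of(rho, states, T, P))

print(G_path[0], G_path[-1], G_eq)  # G falls from its initial value down to G_eq
```

Monotonicity along this particular path follows from the convexity of the relative entropy; any evolution that raises the total entropy of systems plus baths would, by Eq. (4.32), also drive $G$ down.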

4.4.2

Phase Transitions

The minimum principle for the Gibbs potential $G$ is a powerful tool in understanding phase transitions. “Phase” in the phrase “phase transitions” refers to a specific pattern into which the atoms or molecules of a substance organize themselves. For the substance H$_2$O there are three familiar phases: water vapor, liquid water, and solid ice. Over one range of pressure $P$ and temperature $T$, the H$_2$O molecules prefer to organize themselves into the vapor phase; over another, the liquid phase; and over another, the solid ice phase. It is the Gibbs potential that governs their preference.

To understand this role of the Gibbs potential, consider a cup of water in a refrigerator (and because the water molecules are highly nonrelativistic, adopt the nonrelativistic viewpoint with the molecules’ rest masses removed from their energy $E$ and chemical potential $\mu_{\rm H_2O}$ and also from their Gibbs potential). The refrigerator’s air forms a heat and volume bath for the water in the cup (the system). There is no membrane between the air and the water, but none is needed. Gravity, together with the density difference between water and air, serves to keep the water molecules in the cup and the air above the water’s surface. Allow the water to reach thermal and pressure equilibrium with the refrigerator’s air; then turn down the refrigerator’s temperature slightly and wait for the water to reach equilibrium again; and then repeat the process. Pretend that you are clever enough to compute from first principles the Gibbs potential $G$ for the H$_2$O at each step of the cooling, using two alternative assumptions: that the H$_2$O molecules organize themselves into the liquid water phase; and that they organize themselves into the solid ice phase. Your calculations will produce curves for $G$ as a function of temperature $T$ at fixed (atmospheric) pressure that are shown in Fig. 4.2. At temperatures $T > 273$ K the liquid phase has the lower Gibbs potential $G$, and at $T < 273$ K the solid phase has the lower $G$. Correspondingly, when the cup’s temperature sinks slightly below 273 K, the H$_2$O molecules have a statistical preference for reorganizing themselves into the solid phase. The water freezes, forming ice.

Fig. 4.2: [Plot of $G$ versus $T$; solid and liquid curves cross at the critical value $G_c$ at temperature $T_c$.] The Gibbs potential $G(T, P, N)$ for H$_2$O as a function of temperature $T$ with fixed $P$ and $N$, near the freezing point 273 K. The solid curves correspond to the actual path traversed by the H$_2$O if the phase transition is allowed to go. The dotted curves correspond to superheated solid ice and supercooled liquid water that are unstable against the phase transition because their Gibbs functions are higher than those of the other phase. Note that $G$ tends to decrease with increasing temperature. This is caused by the $-TS$ term in $G = E + PV - TS$.

It is a familiar fact that ice floats on water, i.e. ice is less dense than water, even when they are both precisely at the phase-transition temperature of 273 K. Correspondingly, when our sample of water freezes, its volume increases discontinuously by some amount $\Delta V$; i.e., when viewed as a function of the Gibbs potential $G$, the volume $V$ of the statistically preferred phase is discontinuous at the phase-transition point; see Fig. 4.3(a). It is also a familiar fact that when water freezes, it releases heat into its surroundings. This is why the freezing requires a moderately long time: the solidifying water can remain at or below its freezing point and continue to solidify only if the surroundings carry away the released heat, and the surroundings typically cannot carry it away quickly. The heat $\Delta Q$ released during the freezing (the latent heat) and the volume change $\Delta V$ are related to each other in a simple way; see Ex. 4.4, which focuses on the latent heat per unit mass $\Delta q$ and the density change $\Delta\rho$ instead of on $\Delta Q$ and $\Delta V$. Phase transitions with finite volume jumps $\Delta V \ne 0$ and finite latent heat $\Delta Q \ne 0$ are called first-order.

Fig. 4.3: [Two panels, (a) “First Order” and (b) “Second Order”, each plotting $G$ against $V$ for the solid and liquid phases; panel (a) marks the volume jump $\Delta V$ at $G_c$.] The changes of volume (plotted rightward) with increasing Gibbs function (plotted upward) at fixed $P$ and $N$ for a first-order phase transition [diagram (a)] and a second-order phase transition [diagram (b)]. $G_c$ is the critical value of the Gibbs potential at which the transition occurs.

Less familiar, but also important, are second-order phase transitions. In such transitions the volumes $V$ of the two phases are the same at the transition point, but their rates of change with decreasing $G$ are different (and this is so whether one holds $P$ fixed as $G$ decreases or holds $T$ fixed or holds some combination of $P$ and $T$ fixed); see Fig. 4.3(b).

Fig. 4.4: [Two panels showing the barium, oxygen, and titanium ions of the unit cell.] (a) The unit cell for a BaTiO$_3$ crystal at relatively high temperatures. (b) The displacements of the titanium and oxygen ions relative to the corners of the unit cell, that occur in this crystal with falling temperature when it undergoes its second-order phase transition. The magnitudes of the displacements are proportional to the amount $T_c - T$ by which the temperature $T$ drops below the critical temperature $T_c$, for small $T_c - T$.

Crystals provide examples of both first-order and second-order phase transitions. A crystal can be characterized as a 3-dimensional repetition of a “unit cell”, in which ions are distributed in some fixed way. For example, Fig. 4.4(a) shows the unit cell for a BaTiO$_3$ crystal at relatively high temperatures. This unit cell has a cubic symmetry. The full crystal can be regarded as made up of such cells stacked side by side and one upon another. A first-order phase transition occurs when, with decreasing temperature, the Gibbs potential $G$ of some other ionic arrangement, with a distinctly different unit cell, drops below the $G$ of the original arrangement.
Then the crystal can spontaneously rearrange itself, converting from the old unit cell to the new one with some accompanying release of heat and some discontinuous change in volume. BaTiO$_3$ does not behave in this way. Rather, as the temperature falls a bit below a critical value, all the titanium and oxygen ions get displaced a bit in their unit cells parallel to one of the original crystal axes; see Fig. 4.4(b). If the temperature is only a tiny bit below critical, they are displaced by only a tiny amount. When the temperature falls further, their displacements increase. If the temperature is raised back up above critical, the ions return to the standard, rigidly fixed positions shown in Fig. 4.4(a). The result is a discontinuity, at the critical temperature, in the rate of change of volume $dV/dG$ [Fig. 4.3(b)], but no discontinuous jump of volume and no latent heat.

This BaTiO$_3$ example illustrates a frequent feature of phase transitions: When the transition occurs, i.e., when the titanium and oxygen atoms start to move, the cubic symmetry gets broken. The crystal switches, discontinuously, to a “lower” type of symmetry, a “tetragonal” one. Such symmetry breaking is a common occurrence in phase transitions. Bose-Einstein condensation of a bosonic atomic gas in a magnetic trap is another example of a second-order phase transition; see Sec. 3.10. As we saw in Ex. 3.12, the specific heat of the atoms changes discontinuously (in the limit of an arbitrarily large number of atoms) at the critical temperature.
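A minimal sketch of how the minimum-$G$ principle selects a phase near a first-order transition, assuming per-unit-mass Gibbs potentials that are linear in $T$ and $P$ near the melting point, with $dg = v\,dP - s\,dT$. The numbers (latent heat of melting about $3.34\times10^5$ J/kg, $\rho_{\rm ice} \approx 917$ kg/m$^3$, $\rho_{\rm water} \approx 1000$ kg/m$^3$) are approximate values for H$_2$O supplied here for illustration, and the function names are hypothetical. Setting $g_{\rm ice} = g_{\rm water}$ also gives the slope of the coexistence line, which is the content of Ex. 4.4(b).

```python
# Approximate properties of H2O at its normal melting point (illustrative values)
T_MELT = 273.15      # K
P_ATM = 1.01325e5    # Pa
Q_MELT = 3.34e5      # latent heat of melting per unit mass, J/kg
RHO_ICE = 917.0      # kg/m^3
RHO_WATER = 1000.0   # kg/m^3

# Entropy and volume per unit mass; only differences between phases matter,
# so ice at the melting point is taken as the entropy zero point.
ENTROPY = {"ice": 0.0, "water": Q_MELT / T_MELT}   # s_water - s_ice = q_melt / T
VOLUME = {"ice": 1.0 / RHO_ICE, "water": 1.0 / RHO_WATER}

def gibbs_per_mass(phase, T, P):
    """Gibbs potential per unit mass relative to its common value at (T_MELT, P_ATM),
    linearized with dg = v dP - s dT (the per-mass form of Eq. 4.25 at fixed N)."""
    return -ENTROPY[phase] * (T - T_MELT) + VOLUME[phase] * (P - P_ATM)

def preferred_phase(T, P):
    """Minimum-G principle: the phase with the lower Gibbs potential is the stable one."""
    return min(("ice", "water"), key=lambda ph: gibbs_per_mass(ph, T, P))

def melting_curve_slope():
    """dP/dT along g_ice = g_water: (s_water - s_ice) / (v_water - v_ice), in Pa/K."""
    return (ENTROPY["water"] - ENTROPY["ice"]) / (VOLUME["water"] - VOLUME["ice"])

latent_heat = T_MELT * (ENTROPY["water"] - ENTROPY["ice"])  # J/kg absorbed on melting
print(preferred_phase(272.0, P_ATM), preferred_phase(274.0, P_ATM))
print(melting_curve_slope())  # negative, because ice is less dense than water
```

With these numbers the slope comes out near $-1.35\times10^7$ Pa/K, roughly $-133$ atmospheres per kelvin, so squeezing ice hard enough lowers its melting point: `preferred_phase(272.0, P_ATM + 1.0e8)` returns `"water"`.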

4.4.3

Chemical Reactions

A second important application of the Gibbs potential is to the study of chemical reactions. Under the term “chemical reactions” we include any change in the constituent particles of the material being studied, including the joining of atoms to make molecules, the liberation of electrons from atoms in an ionization process, the joining of two atomic nuclei to make a third kind of nucleus, the decay of a free neutron to produce an electron, a proton, and an antineutrino, and so on. In other words, the “chemical” of chemical reactions encompasses the reactions studied by nuclear physicists and elementary particle physicists as well as those studied by chemists. The Gibbs representation is the appropriate one for discussing chemical reactions, because such reactions generally occur in an environment (“bath”) of fixed temperature and pressure.

As a specific example, consider in the earth’s atmosphere the breakup of two molecules of water vapor to form two hydrogen molecules and one oxygen molecule, $2\,{\rm H_2O} \to 2\,{\rm H_2} + {\rm O_2}$. The inverse reaction $2\,{\rm H_2} + {\rm O_2} \to 2\,{\rm H_2O}$ also occurs in the atmosphere, and it is conventional to write down the two reactions simultaneously in the form

$$2\,{\rm H_2O} \leftrightarrow 2\,{\rm H_2} + {\rm O_2}. \qquad (4.33)$$

A chosen (but arbitrary) portion of the atmosphere, with idealized walls to keep all its molecules in, can be regarded as a “system”. (The walls are unimportant in practice, but are pedagogically useful.) The kinetic motions of this system’s molecules reach and maintain statistical equilibrium, at fixed temperature $T$ and pressure $P$, far more rapidly than chemical reactions can occur. Accordingly, if we view this system on timescales short compared to that $\tau_{\rm react}$ for the reactions (4.33) but long compared to the kinetic relaxation time, then we can regard the system as in statistical equilibrium with fixed numbers of water molecules $N_{\rm H_2O}$, hydrogen molecules $N_{\rm H_2}$, and oxygen molecules $N_{\rm O_2}$, and with a Gibbs potential whose value is given by the Euler relation (4.27) generalized to a system with several conserved species:

$$G = \sum_I \tilde\mu_I N_I, \quad\text{i.e.,}\quad G = \tilde\mu_{\rm H_2O}N_{\rm H_2O} + \tilde\mu_{\rm H_2}N_{\rm H_2} + \tilde\mu_{\rm O_2}N_{\rm O_2}. \qquad (4.34)$$

(Here, even though the Earth’s atmosphere is highly nonrelativistic, we include rest masses in the chemical potentials and in the Gibbs potential; the reason will become evident at the end of this section.) When one views the sample over a longer timescale, $\Delta t \sim \tau_{\rm react}$, one discovers that these molecules are not inviolate; they can change into one another via the reactions (4.33), thereby changing the value of the Gibbs potential (4.34). The changes of $G$ are more readily computed from the Gibbs representation of the first law, $dG = V\,dP - S\,dT + \sum_I \tilde\mu_I\,dN_I$, than from the Euler relation (4.34). Taking account of the constancy of $P$ and $T$ and the fact that the reactions entail transforming two water molecules into two hydrogen molecules and one oxygen molecule (or conversely), so

$$dN_{\rm H_2} = -dN_{\rm H_2O}, \qquad dN_{\rm O_2} = -\tfrac{1}{2}\,dN_{\rm H_2O}, \qquad (4.35a)$$

we obtain

$$dG = \left(2\tilde\mu_{\rm H_2O} - 2\tilde\mu_{\rm H_2} - \tilde\mu_{\rm O_2}\right)\tfrac{1}{2}\,dN_{\rm H_2O}. \qquad (4.35b)$$

The reactions (4.33) proceed in both directions, but statistically there is a preference for one direction over the other. The preferred direction, of course, is the one that reduces the Gibbs potential. Thus, if $2\tilde\mu_{\rm H_2O}$ is larger than $2\tilde\mu_{\rm H_2} + \tilde\mu_{\rm O_2}$, then water molecules preferentially break up to form hydrogen plus oxygen; but if $2\tilde\mu_{\rm H_2O}$ is less than $2\tilde\mu_{\rm H_2} + \tilde\mu_{\rm O_2}$, then oxygen and hydrogen preferentially combine to form water. As the reactions proceed, the changing $N$’s produce changes in the chemical potentials $\tilde\mu_I$. [Recall from Eq. (3.47c) the intimate connection

$$N_I = \frac{(2\pi m_I k_B T)^{3/2}}{h^3}\,e^{\mu_I/k_BT}\,V \qquad (4.36)$$

between $\mu_I = \tilde\mu_I - m_I c^2$ and $N_I$ for a gas in the nonrelativistic regime.] These changes in the $N_I$’s and $\tilde\mu_I$’s lead ultimately to a macrostate (thermodynamic state) of minimum Gibbs potential $G$: a state in which the reactions (4.33) can no longer reduce $G$. In this final, equilibrium macrostate the $dG$ of expression (4.35b) must be zero; and correspondingly, the combination of chemical potentials appearing in it must vanish:

$$2\tilde\mu_{\rm H_2O} = 2\tilde\mu_{\rm H_2} + \tilde\mu_{\rm O_2}. \qquad (4.37)$$

The above analysis shows that the “driving force” for the chemical reactions is the combination of chemical potentials in the $dG$ of Eq. (4.35b). Notice that this combination has coefficients in front of the $\tilde\mu_I$’s that are identical to the coefficients in the reactions (4.33) themselves; and the equilibrium relation (4.37) also has the same coefficients as the reactions (4.33). It is easy to convince oneself that this is true in general. Consider any chemical reaction, and write it in the form

$$\sum_j \nu^L_j A^L_j \leftrightarrow \sum_j \nu^R_j A^R_j. \qquad (4.38)$$

Here the superscripts $L$ and $R$ denote the “left” and “right” sides of the reaction, the $A_j$’s are the names of the species of particle or atomic nucleus or atom or molecule involved in the reaction, and the $\nu_j$’s are the number of such particles (or nuclei or atoms or molecules) involved. Suppose that this reaction is occurring in an environment of fixed temperature and pressure. Then to determine the direction in which the reaction preferentially goes, examine the chemical-potential sums for the two sides of the reaction,

$$\sum_j \nu^L_j \tilde\mu^L_j, \qquad \sum_j \nu^R_j \tilde\mu^R_j. \qquad (4.39)$$
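The comparison of the two sums (4.39) is easy to mechanize. The sketch below assumes each side of a reaction is given as a mapping from a species name to its coefficient $\nu_j$, and that the chemical potentials $\tilde\mu_j$ (rest masses included) are already known; the numerical values are hypothetical and the function names are illustrative. Running the reaction once from left to right changes $G$ by the right-hand sum minus the left-hand sum, which for reaction (4.33) is Eq. (4.35b) with $dN_{\rm H_2O} = -2$.

```python
def potential_sum(side, mu_tilde):
    """One of the sums in Eq. (4.39): sum_j nu_j * mu_tilde_j over one side of the reaction."""
    return sum(nu * mu_tilde[species] for species, nu in side.items())

def delta_G_forward(left, right, mu_tilde):
    """Change of G (fixed T and P) when the reaction (4.38) runs once from left to right."""
    return potential_sum(right, mu_tilde) - potential_sum(left, mu_tilde)

def preferred_direction(left, right, mu_tilde, rtol=1e-12):
    """The reaction runs in whichever direction lowers G; equal sums mean equilibrium."""
    dG = delta_G_forward(left, right, mu_tilde)
    scale = max(abs(potential_sum(left, mu_tilde)), abs(potential_sum(right, mu_tilde)), 1.0)
    if abs(dG) <= rtol * scale:
        return "equilibrium"
    return "left to right" if dG < 0 else "right to left"

# Reaction (4.33): 2 H2O <-> 2 H2 + O2, with hypothetical chemical potentials (arbitrary units)
left = {"H2O": 2}
right = {"H2": 2, "O2": 1}
mu_tilde = {"H2O": 3.0, "H2": 1.0, "O2": 3.5}
print(preferred_direction(left, right, mu_tilde))  # "left to right": 2*3.0 > 2*1.0 + 3.5
```

Raising $\tilde\mu_{\rm O_2}$ to 4.0 in this example makes the two sums equal, which is the equilibrium condition (4.37).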

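Fig. 4.5: Phase diagram for H$_2$O. [Plot in the $T$-$P$ plane with regions labeled Ice, Water, and Vapor, separated by curves labeled “melt” and “evaporate”; one feature is labeled “Second Order”.]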

The reaction will proceed from the side with the larger chemical-potential sum to the side with the smaller; and ultimately the reaction will bring the two sides into equality. That final equality is the state of statistical equilibrium. Exercises 4.5 and 4.6 illustrate this analysis of chemical equilibrium.

When dealing with chemical reactions between highly nonrelativistic molecules and atoms (e.g. water formation and destruction in the Earth’s atmosphere), one might wish to omit rest masses from the chemical potentials. If one does so, and if one wishes to preserve the criterion that the reaction goes in the direction of decreasing $dG = (2\mu_{\rm H_2O} - 2\mu_{\rm H_2} - \mu_{\rm O_2})\tfrac{1}{2}\,dN_{\rm H_2O}$ [Eq. (4.35b) with tildes removed], then one must choose as the “rest masses” to be subtracted values that do not include chemical binding energies; i.e., one must define the rest masses in such a way that $2m_{\rm H_2O} = 2m_{\rm H_2} + m_{\rm O_2}$. One can avoid this delicacy by simply using the relativistic chemical potentials. The derivation of the Saha equation (Ex. 4.6) is an example.

****************************

EXERCISES

Exercise 4.4 Example: Latent Heat and the Clausius-Clapeyron Equation

(a) Consider H$_2$O in contact with a heat and volume bath of temperature $T$ and pressure $P$. For certain values of $T$ and $P$ the H$_2$O will be water; for others, ice; for others, water vapor; and for certain values it may be a two- or three-phase mixture of water, ice, and/or vapor. Show, using the Gibbs potential, that if two phases $a$ and $b$ are present and in statistical equilibrium with each other, then their chemical potentials must be equal: $\mu_a = \mu_b$. Explain why, for any phase $a$, $\mu_a$ is a unique function of $T$ and $P$. Explain why the condition $\mu_a = \mu_b$ for two phases to be present implies that the two-phase regions of the $T$-$P$ plane are lines and the three-phase regions are points; see Fig. 4.5. The three-phase region is called the “triple point”.
The volume $V$ of the two- or three-phase system will vary depending on how much of each phase is present, since the density of each phase (at fixed $T$ and $P$) is different.

(b) Show that the slope of the ice-water interface curve in Fig. 4.5 (the “melting curve”) is given by the “Clausius-Clapeyron equation”

$$\left(\frac{dP}{dT}\right)_{\rm melt} = \frac{\Delta q_{\rm melt}}{T}\,\frac{\rho_{\rm ice}\,\rho_{\rm water}}{\rho_{\rm ice} - \rho_{\rm water}}, \qquad (4.40a)$$

where $\rho$ is density (mass per unit volume) and $\Delta q_{\rm melt}$ is the latent heat per unit mass for melting (or freezing), i.e., the amount of heat required to melt a unit mass of ice, or the amount released when a unit mass of water freezes. Notice that, because ice is less dense than water, the slope of the melting curve is negative. [Hint: compute $dP/dT$ by differentiating $\mu_a = \mu_b$, and then use the thermodynamic properties of $G_a = \mu_a N_a$ and $G_b = \mu_b N_b$.]

(c) Suppose that a small amount of water is put into a closed container of much larger volume than the water. Initially there is vacuum above the water’s surface, but as time passes some of the H$_2$O evaporates to give vapor-water equilibrium. The vapor pressure will vary with temperature in accord with the Clausius-Clapeyron equation

$$\frac{dP_{\rm vapor}}{dT} = \frac{\Delta q_{\rm evaporate}}{T}\,\frac{\rho_{\rm water}\,\rho_{\rm vapor}}{\rho_{\rm water} - \rho_{\rm vapor}}. \qquad (4.40b)$$

Now, suppose that a foreign gas (not water vapor) is slowly injected into the container. Assume that this gas does not dissolve in the liquid water. Show that, as the pressure $P_{\rm gas}$ of the foreign gas gradually increases, it does not squeeze water vapor into the water, but rather it induces more water to vaporize:

$$\left(\frac{dP_{\rm vapor}}{dP_{\rm total}}\right)_{T\ {\rm fixed}} = \frac{\rho_{\rm vapor}}{\rho_{\rm water}} > 0, \qquad (4.40c)$$

where $P_{\rm total} = P_{\rm vapor} + P_{\rm gas}$.

Exercise 4.5 Example: Electron-Positron Equilibrium at “Low” Temperatures

Consider hydrogen gas in statistical equilibrium at a temperature $T \ll m_e c^2/k_B \simeq 6\times10^9$ K. Electrons at the high-energy end of the Boltzmann energy distribution can produce electron-positron pairs by scattering off protons:

$$e^- + p \to e^- + p + e^- + e^+. \qquad (4.41)$$

[There are many other ways of producing pairs, but in analyzing statistical equilibrium we get all the information we need (a relation among the chemical potentials) by considering just one way.]

(a) In statistical equilibrium the above reaction and its inverse must proceed at the same rate, on average. What does this imply about the relative magnitudes of the electron and positron chemical potentials $\tilde\mu_-$ and $\tilde\mu_+$ (where the rest mass-energies are included in the $\tilde\mu$’s)?

(b) Although these reactions require an $e^-$ that is relativistic in energy, almost all the electrons and positrons will have kinetic energies of magnitude $E - mc^2 \sim k_BT \ll mc^2$, and thus will have $E \simeq mc^2 + p^2/2m$. What are the densities in phase space $\mathcal{N}_\pm = dN_\pm/d^3x\,d^3p$ for positrons and electrons in terms of $p$, $\tilde\mu_\pm$, and $T$? Explain why for a hydrogen gas we must have $\tilde\mu_- > 0$ and $\tilde\mu_+ < 0$.

Fig. 4.6: [Plot of $T_p$ (K, logarithmic axis from $10^8$ to $10^{10}$) versus $\rho$ (g/cm$^3$, logarithmic axis from $10^{-30}$ to $10^5$).] The temperature $T_p$ at which electron-positron pairs form in a dilute hydrogen plasma, plotted as a function of density $\rho$. This is the correct upper limit (upper dashed curve in Fig. 2.7) on the region where the plasma can be considered fully nonrelativistic. Above this curve, although $T$ may be $\ll m_e c^2/k_B \simeq 6\times10^9$ K, a proliferation of electron-positron pairs radically changes the properties of the plasma.

(c) Assume that the gas is very dilute so that $\eta \ll 1$ for both electrons and positrons. Then integrate over momenta to obtain the following formula for the number densities in physical space of electrons and positrons:

$$n_\pm = \frac{2}{h^3}\,(2\pi m k_B T)^{3/2}\exp\!\left(\frac{\tilde\mu_\pm - mc^2}{k_BT}\right). \qquad (4.42)$$

In cgs units, what does the dilute-gas assumption $\eta \ll 1$ correspond to in terms of $n_\pm$? What region of hydrogen mass density $\rho$ and temperature $T$ is the dilute-gas region?

(d) Let $n$ be the number density of protons. Then by charge neutrality $n = n_- - n_+$ will also be the number density of “ionization electrons” (i.e., of electrons that have been ionized off of hydrogen). Show that the ratio of positrons (and hence of pairs) to ionization electrons is given by

$$\frac{n_+}{n} = \frac{1}{2y\left[y + (1 + y^2)^{1/2}\right]}, \quad\text{where}\quad y \equiv \tfrac{1}{4}\,n\lambda^3 e^{mc^2/k_BT}, \qquad (4.43a)$$

and

$$\lambda \equiv \frac{h}{\sqrt{2\pi m k_B T}} \qquad (4.43b)$$

is the thermal de Broglie wavelength of the electrons. Fig. 4.6 shows the temperature $T_p$ at which, according to this formula, $n_+ = n$ (and $y = 0.354$), as a function of mass density $\rho \simeq m_{\rm proton} n$. This $T_p$ can be thought of as the “temperature at which pairs form” in a dilute plasma. Somewhat below $T_p$ there are hardly any pairs; somewhat above, the pairs are profuse.

(e) Note that at low densities pairs form at temperatures $T \sim 10^8$ K $\simeq 0.02\,m_e c^2/k_B$. Explain in terms of “available phase space” why the formation temperature is so low.

Exercise 4.6 Example: Saha Equation for Ionization Equilibrium

Consider an optically thick hydrogen gas in statistical equilibrium at temperature $T$. (By “optically thick” is meant that photons can travel only a distance small compared to the size of the system before being absorbed, so they are confined by the hydrogen and kept in statistical equilibrium with it.) Among the reactions that are in statistical equilibrium are ${\rm H} + \gamma \leftrightarrow e + p$ [ionization and recombination of hydrogen H, with the H in its ground state] and $e + p \leftrightarrow e + p + \gamma$ [emission and absorption of photons by “bremsstrahlung”, i.e., by the Coulomb-force-induced acceleration of electrons as they fly past protons]. Let $\tilde\mu_\gamma$, $\tilde\mu_{\rm H}$, $\tilde\mu_e$, and $\tilde\mu_p$ be the chemical potentials including rest mass-energies; let $m_{\rm H}$, $m_e$, $m_p$ be the rest masses; denote by $I \equiv 13.6$ electron volts the ionization energy of hydrogen, so that $m_{\rm H}c^2 = m_e c^2 + m_p c^2 - I$; denote $\mu_j \equiv \tilde\mu_j - m_j c^2$; and assume that $T \ll m_e c^2/k_B \simeq 6\times10^9$ K, and that the density is low enough that the electrons, protons, and hydrogen atoms can be regarded as nondegenerate (i.e., as distinguishable, classical particles).

(a) What relationships hold between the chemical potentials $\tilde\mu_\gamma$, $\tilde\mu_{\rm H}$, $\tilde\mu_e$, and $\tilde\mu_p$?

(b) What are the number densities $n_{\rm H}$, $n_e$, and $n_p$ expressed in terms of $T$ and $\tilde\mu_{\rm H}$, $\tilde\mu_e$, $\tilde\mu_p$, taking account of the fact that the electron and proton both have spin $\tfrac{1}{2}$, and including in H all possible electron and nuclear spin states?

(c) Derive the Saha equation for ionization equilibrium

$$\frac{n_e n_p}{n_{\rm H}} = \frac{(2\pi m_e k_B T)^{3/2}}{h^3}\,e^{-I/k_BT}. \qquad (4.44)$$

This equation is widely used in astrophysics and elsewhere.

****************************
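A minimal numerical sketch of the formulas quoted in Exercises 4.5 and 4.6, in SI units. The following assumptions are ours, not the exercises’: for the Saha equation (4.44) we take pure hydrogen with $n_e = n_p$, so that the ionization fraction $x = n_p/(n_{\rm H} + n_p)$ obeys $x^2/(1-x) = (n_e n_p/n_{\rm H})/(n_{\rm H} + n_p)$; for the pair ratio (4.43a) we work with $\ln y$ to avoid floating-point overflow of $e^{mc^2/k_BT}$, and we locate $T_p$ (where $n_+ = n$, i.e. $y = 1/\sqrt8 \approx 0.354$) by bisection. The function names and example densities are hypothetical choices.

```python
import math

# SI constants
H = 6.62607015e-34            # Planck constant, J s
K_B = 1.380649e-23            # Boltzmann constant, J/K
M_E = 9.1093837015e-31        # electron mass, kg
C = 2.99792458e8              # speed of light, m/s
I_H = 13.6 * 1.602176634e-19  # hydrogen ionization energy, J (13.6 eV as in Ex. 4.6)

def saha_rhs(T):
    """Right-hand side of the Saha equation (4.44), n_e n_p / n_H, in m^-3."""
    return (2.0 * math.pi * M_E * K_B * T) ** 1.5 / H**3 * math.exp(-I_H / (K_B * T))

def ionization_fraction(T, n_total):
    """x = n_p / (n_H + n_p) for pure hydrogen, assuming n_e = n_p.

    Eq. (4.44) becomes x^2 / (1 - x) = s with s = saha_rhs(T) / n_total;
    the positive root is written in a cancellation-free form.
    """
    s = saha_rhs(T) / n_total
    return 2.0 * s / (s + math.sqrt(s * s + 4.0 * s))

def pair_ratio(y):
    """n_+ / n as a function of y, Eq. (4.43a)."""
    return 1.0 / (2.0 * y * (y + math.sqrt(1.0 + y * y)))

def log_pair_y(n, T):
    """ln y, with y = (1/4) n lambda^3 exp(m_e c^2 / k_B T) from Eqs. (4.43a)-(4.43b); n in m^-3."""
    lam = H / math.sqrt(2.0 * math.pi * M_E * K_B * T)
    return math.log(0.25 * n * lam**3) + M_E * C**2 / (K_B * T)

def pair_formation_temperature(n, T_lo=1.0e7, T_hi=1.0e10, iters=200):
    """Temperature T_p at which n_+ = n, i.e. y = 1/sqrt(8), by bisection in ln T.

    ln y decreases monotonically with T, so the root inside the bracket is unique.
    """
    target = -0.5 * math.log(8.0)
    for _ in range(iters):
        T_mid = math.sqrt(T_lo * T_hi)
        if log_pair_y(n, T_mid) > target:
            T_lo = T_mid   # y still too large (too few pairs), so T_p is higher
        else:
            T_hi = T_mid
    return math.sqrt(T_lo * T_hi)

print(ionization_fraction(1.0e4, 1.0e20))  # hydrogen at 1e4 K with 1e20 nuclei per m^3
print(pair_formation_temperature(1.0e3))   # 1e3 protons per m^3, about 1.7e-27 g/cm^3
```

Setting $n_+ = n$ in Eq. (4.43a) gives $y^2 = 1/8$ exactly, which is where the value $y = 0.354$ quoted with Fig. 4.6 comes from.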

4.5

Fluctuations of Systems in Statistical Equilibrium

Table 4.2: Deviations from Statistical Equilibrium; cf. Table 4.1.

| Bath | Fundamental potential | Total entropy $S + S_b$ | Second law | Fluctuational probability |
|---|---|---|---|---|
| None | $S(\rho)$ with $E$ const | $S$ + const | $dS \ge 0$ | $\propto e^{S/k_B}$ |
| $V$ & $E$ with $dE = -P\,dV$ | $S(P; \rho)$ with $H = E + PV$ const | $S$ + const | $dS \ge 0$ (see Ex. 4.8) | $\propto e^{S/k_B}$ |
| Heat | $F(T; \rho) = \bar E - TS$ | $-F/T$ + const | $dF \le 0$ | $\propto e^{-F/k_BT}$ |
| Heat & Volume | $G(T, P; \rho) = \bar E + P\bar V - TS$ | $-G/T$ + const | $dG \le 0$ | $\propto e^{-G/k_BT}$ |
| Heat & Particle | $\Omega(T, \tilde\mu; \rho) = \bar E - \tilde\mu\bar N - TS$ | $-\Omega/T$ + const | $d\Omega \le 0$ | $\propto e^{-\Omega/k_BT}$ |

As we saw in Chap. 3, statistical mechanics is built on a distribution function $\rho$, which is equal to the probability of finding a chosen system in a quantum state at some chosen location in the system’s phase space. For systems in statistical equilibrium this probability is given by the microcanonical or canonical or grand canonical or Gibbs or . . . distribution, depending on the nature of the interaction of the system with its surroundings. Classical thermodynamics, as studied in this chapter, makes use of only a tiny portion of the information in this probability distribution: the mean values of a few macroscopic parameters (energy, volume, pressure, . . .) and the entropy. Also contained in the distribution function, but ignored by classical thermodynamics, is detailed information about fluctuations of a system away from its mean values. We studied a simple example of this in Ex. 3.7: fluctuations in the number of particles in some chosen volume $V$ of a dilute, nonrelativistic gas. The chosen volume, regarded as a system, was described by the grand canonical ensemble because it could freely exchange heat and particles with its surroundings (i.e. it had imaginary walls); and from the grand-canonical probability distribution we found that the number of particles in the volume $V$ was described by a Poisson distribution

$$p_N = e^{-\bar N}\,\frac{\bar N^N}{N!}. \qquad (4.45)$$

The mean $\bar N$ of this distribution is equal to the number predicted by thermodynamics [Eq. (3.39)], and the root-mean-square deviation from the mean, $\sigma_N$, is equal to $\sqrt{\bar N}$. When $\bar N$ is huge, as it is for all the systems studied in this chapter, the Poisson distribution (4.45) is extremely well approximated by a Gaussian. To convert $p_N$ to that Gaussian, take the logarithm of Eq. (4.45), use Stirling's formula $N! \simeq \sqrt{2\pi N}\,(N/e)^N$, and expand in powers of $N - \bar N$, keeping only terms up through quadratic order. The result, after exponentiating, is

$$p_N = \frac{1}{\sqrt{2\pi\bar N}}\exp\left[-\frac{(N - \bar N)^2}{2\bar N}\right]. \qquad (4.46)$$

In the next chapter (Sec. 5.2) we shall learn that the probability distribution (4.45) had to be very nearly Gaussian: any probability distribution that is produced by a superposition of the influences of many independent random variables (in this case the independent, random motions of many gas particles) must be Gaussian to very high precision.

In this section we shall sketch the general theory of fluctuations of large systems in statistical equilibrium, a theory of which the above example is a special case. We begin by confining attention to a second specific case, and then we shall generalize. Our second specific case is a microcanonical ensemble of boxes, each with volume $V$ and each containing precisely $N$ identical, dilute ($\eta \ll 1$), nonrelativistic gas particles and containing energy (excluding rest mass) between $E$ and $E + \delta E$, where $\delta E \ll E$. (Remember the "kludge" that was necessary in Ex. 3.8.) Focus attention on a set of quantities $y_j$ which characterize these boxes of gas and which are not fixed by the set $E, V, N$. For example, $y_1$ might be the total number $n$ of particles in the right half of a box, and $y_2$ might be the total energy $\varepsilon$ in the right half. We seek a joint probability distribution for these $y_j$'s.
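To see how good the Gaussian approximation (4.46) is, the short sketch below compares it with the exact Poisson distribution (4.45), evaluated in log space with `math.lgamma` so that large $\bar N$ does not overflow. The mean values tried and the $\pm 3\sigma$ comparison window are arbitrary illustrative choices.

```python
import math

def poisson_pmf(n, nbar):
    """Exact Poisson distribution, Eq. (4.45), evaluated in log space."""
    return math.exp(-nbar + n * math.log(nbar) - math.lgamma(n + 1))

def gaussian_approx(n, nbar):
    """Gaussian approximation, Eq. (4.46): mean nbar and variance nbar."""
    return math.exp(-(n - nbar) ** 2 / (2 * nbar)) / math.sqrt(2 * math.pi * nbar)

def max_relative_error(nbar, width=3.0):
    """Largest |Gaussian/Poisson - 1| over integers within `width` std. deviations of nbar."""
    sigma = math.sqrt(nbar)
    lo = max(0, math.ceil(nbar - width * sigma))
    hi = math.floor(nbar + width * sigma)
    return max(abs(gaussian_approx(n, nbar) / poisson_pmf(n, nbar) - 1)
               for n in range(lo, hi + 1))

for nbar in (10, 100, 10_000):
    print(nbar, max_relative_error(nbar))
```

The mismatch in the wings comes from the skewness of the Poisson distribution, which Eq. (4.46) drops; it shrinks roughly as $1/\sqrt{\bar N}$, so for macroscopic $\bar N$ the two are indistinguishable.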
If the $y_j$'s can take on only discrete values (e.g., $y_1 = n$), then the total number of quantum states that correspond to specific values of the $y_j$'s is related to the entropy $S$ by the standard microcanonical relation

$$N_{\rm states}(y_j; E, V, N) = \exp[S(y_j; E, V, N)/k_B]; \qquad (4.47)$$

and correspondingly, since all states are equally probable in the microcanonical ensemble, the probability of finding a system of the ensemble to have the specific values $y_j$ is

$$p(y_j; E, V, N) = \frac{N_{\rm states}(y_j; E, V, N)}{\sum_{y_j} N_{\rm states}(y_j; E, V, N)} = {\rm const} \times \exp\left[\frac{S(y_j; E, V, N)}{k_B}\right]. \qquad (4.48a)$$

Similarly, if the $y_j$ take on a continuous range of values (e.g., $y_2 = \varepsilon$), then the probability of finding $y_j$ in some tiny, fixed range $dy_j$ is proportional to $\exp[S(y_j; E, V, N)/k_B]$, and correspondingly the probability per unit $y_j$ interval of finding a system to have specific values is

$$\frac{dp(y_j; E, V, N)}{dy_1\,dy_2 \cdots dy_r} = {\rm const} \times \exp\left[\frac{S(y_j; E, V, N)}{k_B}\right]. \qquad (4.48b)$$

In expressions (4.48a) and (4.48b), the entropy $S(y_j; E, V, N)$ is to be computed via statistical mechanics (or, when possible, via thermodynamics) not for the original ensemble of boxes in which the $y_j$ were allowed to vary, but rather for an ensemble in which the $y_j$'s are fixed at the chosen values.

The probability distributions (4.48a) and (4.48b), though "exact," are not terribly instructive. To get better insight we expand $S$ in powers of the deviation of $y_j$ from its mean. Denote by $\bar y_j$ the value of $y_j$ that maximizes the entropy (this will turn out also to be the mean of the distribution). Then for small $|y_j - \bar y_j|$, Eq. (4.48b) becomes

$$\frac{dp(y_j; E, V, N)}{dy_1\,dy_2 \cdots dy_r} = {\rm const} \times \exp\left[\frac{1}{2k_B}\frac{\partial^2 S}{\partial y_j\,\partial y_k}(y_j - \bar y_j)(y_k - \bar y_k)\right] \qquad (4.48c)$$

(with the repeated indices $j$ and $k$ summed over), and similarly for Eq. (4.48a). Here the second partial derivative of the entropy is to be evaluated at the maximum-entropy location, where $y_j = \bar y_j$ for all $j$. Expression (4.48c) is a (multidimensional) Gaussian probability distribution, as expected. Moreover, for this distribution the values $\bar y_j$ that were defined to give maximal entropy (i.e., the "most probable" values) are also the means.

For the specific example where $y_1 \equiv n$ = (number of particles in right half of box) and $y_2 \equiv \varepsilon$ = (amount of energy in right half of box), we can infer $S(n, \varepsilon; N, E, V)$ from Eq. (3.53) as applied to the two halves of the box and then added:

$$S(n,\varepsilon;N,E,V) = k_B n \ln\left[\left(\frac{4\pi m}{3h^2}\right)^{3/2} e^{5/2}\,\frac{V}{2}\,\frac{\varepsilon^{3/2}}{n^{5/2}}\right] + k_B (N - n)\ln\left[\left(\frac{4\pi m}{3h^2}\right)^{3/2} e^{5/2}\,\frac{V}{2}\,\frac{(E - \varepsilon)^{3/2}}{(N - n)^{5/2}}\right]. \qquad (4.49a)$$

It is straightforward to compute from expression (4.49a) the values $\bar\varepsilon$ and $\bar n$ of $\varepsilon$ and $n$ that maximize the entropy:

$$\bar\varepsilon = \frac{E}{2}, \qquad \bar n = \frac{N}{2}. \qquad (4.49b)$$

Thus, in agreement with intuition, the mean values of the energy and particle number in the right half box are equal to half of the box's total energy and particle number. It is also straightforward to compute from expression (4.49a) the second partial derivatives of the entropy with respect to $\varepsilon$ and $n$, evaluate them at $\varepsilon = \bar\varepsilon$ and $n = \bar n$, and plug them into the probability distribution (4.48c). The result is

$$\frac{dp_n}{d\varepsilon} = {\rm const} \times \exp\left[\frac{-(n - N/2)^2}{2(N/4)} + \frac{-[(\varepsilon - E/2) - (E/N)(n - N/2)]^2}{2(N/6)(E/N)^2}\right]. \qquad (4.49c)$$

[There is no $dn$ in the denominator of the left side because $n$ is a discrete variable; cf. Eqs. (4.48a) and (4.48b).] This Gaussian distribution has the following interpretation: (i) there is a correlation between the energy $\varepsilon$ and the particle number $n$ in the right half of the box, as one might have expected: if there is an excess of particles in the right half, then we must expect an excess of energy there as well. (ii) The quantity that is not correlated with $n$ is $\varepsilon - (E/N)n$, as one might have expected. (iii) For fixed $n$, $dp_n/d\varepsilon$ is Gaussian with mean $\bar\varepsilon = E/2 + (E/N)(n - N/2)$ and with standard deviation $\sigma_\varepsilon = (E/N)\sqrt{N/6}$. (iv) After integrating over $\varepsilon$ we obtain

$$p_n = {\rm const} \times \exp\left[\frac{-(n - N/2)^2}{2N/4}\right]. \qquad (4.49d)$$

This is Gaussian with mean $\bar n = N/2$ and standard deviation $\sigma_n = \sqrt{N/4}$. By contrast, if the right half of the box had been in equilibrium with a bath far larger than itself, $n$ would have had a standard deviation equal to the square root of its mean, $\sigma_n = \sqrt{N/2}$. The fact that the "companion" of the right half has only the same size as the right half, rather than being far larger, has reduced the standard deviation of the number of particles in the right half from $\sqrt{N/2}$ to $\sqrt{N/4}$.

Notice that all the concrete probability distributions we have derived, Eqs. (4.45), (4.49c), and (4.49d), are exceedingly sharply peaked about their means: their standard deviations ("half-widths") divided by their means, i.e., the magnitudes of their fractional fluctuations, are all of order $1/\sqrt{\bar N}$, where $\bar N$ is the mean number of particles in a system; and in realistic situations $\bar N$ is very large. (For example, $\bar N$ is of order $10^{29}$ for a cubic meter of gas inside the sun, and thus the fractional fluctuations of thermodynamic quantities are of order $10^{-14}$.)
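The coefficients in Eq. (4.49c) can be checked numerically. The sketch below differentiates the entropy (4.49a) by finite differences at $(\bar n, \bar\varepsilon) = (N/2, E/2)$, inverts minus the Hessian of $S/k_B$ to obtain the covariance matrix implied by (4.48c), and compares with the widths quoted above. Assumptions for illustration: units are arbitrary (only $E/N$ matters), the additive constant in (4.49a) is dropped because it has no second derivatives, and $n$ is treated as continuous for the purpose of differentiation.

```python
import math

def entropy_over_kB(n, eps, N, E):
    """Eq. (4.49a) divided by k_B, without the constant N ln[(4 pi m/3h^2)^{3/2} e^{5/2} V/2]."""
    right_half = 1.5 * n * math.log(eps) - 2.5 * n * math.log(n)
    left_half = 1.5 * (N - n) * math.log(E - eps) - 2.5 * (N - n) * math.log(N - n)
    return right_half + left_half

def covariance(N, E, h=1.0):
    """Covariance of (n, eps) from Eq. (4.48c): the inverse of minus the Hessian of S/k_B."""
    n0, e0 = N / 2, E / 2  # maximum-entropy point, Eq. (4.49b)
    f = lambda dn, de: entropy_over_kB(n0 + dn, e0 + de, N, E)
    f_nn = (f(h, 0) - 2 * f(0, 0) + f(-h, 0)) / h**2
    f_ee = (f(0, h) - 2 * f(0, 0) + f(0, -h)) / h**2
    f_ne = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h**2)
    a, b, d = -f_nn, -f_ne, -f_ee  # entries of minus the Hessian
    det = a * d - b * b
    return d / det, -b / det, a / det  # var(n), cov(n, eps), var(eps)

N, E = 10_000.0, 15_000.0
var_n, cov_ne, var_e = covariance(N, E)
sigma_n = math.sqrt(var_n)
sigma_e_given_n = math.sqrt(var_e - cov_ne**2 / var_n)  # width of eps at fixed n
print(sigma_n, math.sqrt(N / 4))
print(sigma_e_given_n, (E / N) * math.sqrt(N / 6))
print(cov_ne / var_n, E / N)  # slope of the conditional mean of eps versus n
```

The last line reproduces the slope $E/N$ in the conditional mean $\bar\varepsilon = E/2 + (E/N)(n - N/2)$, which is the correlation described in items (i) and (ii).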
It is this extremely sharp peaking that makes classical thermodynamics insensitive to the choice of type of equilibrium ensemble, i.e., sensitive only to means and not to fluctuations about the means.

How does the above microcanonical analysis generalize to ensembles of systems in equilibrium with various kinds of baths? Consider, as a specific example, a Gibbs ensemble of identical systems in equilibrium with a set of identical heat and volume baths. The systems all contain the same number of particles $N$, but they have a spread of energies $E$ and volumes $V$, as well as a spread of some set of variables $y_i$ that interest us. (For example, one of the $y_i$'s might be the energy in the right half of a system.) For ease of notation and presentation we shall assume that $E$ and $V$ are completely determined by the full set of chosen $y_i$'s together with $N$.

Suppose that we pick a system at random from our equilibrium ensemble. We would like to know the probability $p(y_i)$ that it has a specific set of values of its variables $y_i$ (or, for those variables that can vary continuously, the probability it has the specific values $y_i$ in a chosen, fixed, tiny range $dy_i$). Because each system plus bath is closed, the systems plus their baths are microcanonically distributed. Therefore, the probability $p(y_i)$ is proportional to the total number of quantum states that a system plus its bath can have when the system is constrained to the values $y_i$ in $dy_i$, $p(y_i) \propto N_{\rm states}$; the total number of quantum states in turn is related to the entropy of system plus bath by $N_{\rm states} = e^{(S_{\rm system} + S_{\rm bath})/k_B}$; and the entropy of system plus bath in turn is related to the Gibbs potential of the system by

$$S_{\rm system} + S_{\rm bath} = -G/T_b + {\rm constant} \qquad (4.50)$$

[Eq. (4.32)]. Thus, in the case of discrete $y_i$,

$$p(y_i; N, T_b, P_b) = {\rm const} \times e^{-G(y_i, N, T_b, P_b)/k_B T_b}, \qquad (4.51a)$$

and similarly for continuous $y_i$:

$$\frac{dp(y_i; N, T_b, P_b)}{dy_1\,dy_2 \cdots dy_r} = {\rm const} \times e^{-G(y_i, N, T_b, P_b)/k_B T_b}. \qquad (4.51b)$$

Here the Gibbs potential is not that of the equilibrium Gibbs ensemble; rather, in accord with the derivation of Eq. (4.32), it is a potential constructed from the bath temperature (denoted $T$ in (4.32) but here and below denoted $T_b$), the bath pressure $P_b$, the energy $E(y_i, N)$ of the system, the volume $V(y_i, N)$ of the system, and the entropy $S(y_i, N)$ of a microcanonical subensemble of systems that have the specified values of the $y_i$'s, and the same value of $N$ as all the systems in the original equilibrium ensemble:

$$G(y_i, N, T_b, P_b) = E(y_i, N) + P_b V(y_i, N) - T_b S(y_i, N). \qquad (4.52)$$

As a specific example of the probability distribution (4.51a), consider a monatomic gas and examine its fluctuations of temperature and volume. More specifically, inside a huge bath of monatomic gas that is in statistical equilibrium, pick out at random a small sample containing precisely $N$ atoms (with $N \gg 1$). That sample (the system) will have, because of statistical fluctuations, a temperature $T$ that differs slightly from $T_b$ and a volume $V$ that differs slightly from the equilibrium value $\bar V$ predicted by the Gibbs ensemble. We want to know the probability that it will have specific values of $T$ and $V$ in specific ranges $dT$ and $dV$. The answer is given by Eq. (4.51a) with $y_1 = T$ and $y_2 = V$. Expanding the $G$ of that answer in powers of $V - \bar V$ and $T - T_b$, setting $p(y_i) = dp$ (a mere change of notation) and dividing by $dT\,dV$, we obtain

$$\frac{dp}{dT\,dV} = {\rm const} \times \exp\left\{-\frac{1}{2k_B T_b}\left[\frac{\partial^2 G}{\partial V^2}(V - \bar V)^2 + \frac{\partial^2 G}{\partial T^2}(T - T_b)^2 + 2\frac{\partial^2 G}{\partial T\,\partial V}(V - \bar V)(T - T_b)\right]\right\}. \qquad (4.53a)$$

Here the Gibbs function to be differentiated is [cf. the nonrelativistic version of Eq. (4.52)]

$$G = E(T, V, N) + P_b V - T_b S(T, V, N), \qquad (4.53b)$$

with $E(T, V, N)$ and $S(T, V, N)$ being the energy and entropy of the system (gas cell with $N$ particles) at its specified values of $T$, $V$, and $N$; and the derivatives are to be evaluated at $T = T_b$ and $V = \bar V$. The terms linear in $V - \bar V$ and $T - T_b$ in the expansion (4.53a) have been omitted because (as we shall see) their coefficients $\partial G/\partial V$ and $\partial G/\partial T$ vanish when $V = \bar V$ and $T = T_b$.

By straightforwardly differentiating Eq. (4.53b) once with respect to $T$ and $V$ (while holding $N$ fixed) and invoking the first law of thermodynamics $dE = -P\,dV + T\,dS + \mu\,dN$, we obtain

$$\left(\frac{\partial G}{\partial T}\right)_{V,N} = (T - T_b)\left(\frac{\partial S}{\partial T}\right)_{V,N}, \qquad \left(\frac{\partial G}{\partial V}\right)_{T,N} = (P_b - P) + (T - T_b)\left(\frac{\partial S}{\partial V}\right)_{T,N}. \qquad (4.53c)$$

These quantities vanish when $V = \bar V$ and $T = T_b$, as promised. Their vanishing, in fact, is guaranteed by the statistical equilibrium of the bath. The second derivatives of $G$, evaluated for $V = \bar V$ and $T = T_b$, are readily computed from expressions (4.53c) to be

$$\left(\frac{\partial^2 G}{\partial T^2}\right)_{V,N} = \frac{C_V}{T_b}, \qquad \left(\frac{\partial^2 G}{\partial V^2}\right)_{T,N} = \frac{1}{\kappa}, \qquad \left(\frac{\partial^2 G}{\partial T\,\partial V}\right)_N = 0, \qquad (4.53d)$$

where $C_V$ is the gas sample's specific heat at fixed volume and $\kappa$ is its compressibility at fixed temperature:

$$C_V \equiv \left(\frac{\partial E}{\partial T}\right)_{V,N}, \qquad \kappa \equiv -\left(\frac{\partial V}{\partial P}\right)_{T,N}. \qquad (4.53e)$$

Inserting these relations into expression (4.53a) we obtain the final form of the probability distribution for temperature and volume fluctuations in the monatomic gas:

$$\frac{dp}{dT\,dV} = {\rm const} \times \exp\left[-\frac{(V - \bar V)^2}{2k_B T_b \kappa} - \frac{C_V (T - T_b)^2}{2k_B T_b^2}\right]. \qquad (4.53f)$$

Notice that the root-mean-square fluctuations of the volume in this Gaussian probability distribution are $\sigma_V = \sqrt{k_B T_b \kappa}$, and those of the temperature are $\sigma_T = \sqrt{k_B T_b^2/C_V}$. Since $C_V$ and $\kappa$ are proportional to the number of atoms, $N$, in the sample of gas, $\sigma_T$ scales as $1/\sqrt N$, and $\sigma_V$ scales as $\sqrt N$, as one might expect. Because there are no cross terms, $(V - \bar V)(T - T_b)$, in the probability distribution (4.53f), the volume and temperature fluctuations are uncorrelated.

It is straightforward to generalize the probability distribution (4.51a) to other kinds of baths. In general, the quantity that replaces $G$ in (4.51a) is the fundamental potential for the chosen kind of bath: the physical free energy $F$ in the case of a heat bath, the enthalpy $H$ in the case of a volume bath, and the grand potential $\Omega$ in the case of a heat and particle bath.
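As a hedged numerical illustration of Eq. (4.53f), assume the sample is an ideal monatomic gas, for which $E = \frac32 N k_B T$ and $PV = N k_B T$, so that $C_V = \frac32 N k_B$ and $\kappa = V/P$. The sketch evaluates $\sigma_V$ and $\sigma_T$ from the definitions (4.53e), with $\kappa$ obtained by a finite difference of the ideal-gas law, and displays the scalings $\sigma_V/\bar V = 1/\sqrt N$ and $\sigma_T/T_b = \sqrt{2/(3N)}$. The bath values $T_b = 300$ K and $P_b = 10^5$ Pa are arbitrary choices.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)

def ideal_gas_volume(P, T, N):
    """Volume from the ideal-gas law PV = N k_B T."""
    return N * K_B * T / P

def fluctuation_widths(N, T_b, P_b, rel_step=1e-6):
    """Return (V_bar, sigma_V, sigma_T) from Eq. (4.53f) for an ideal monatomic gas sample."""
    V_bar = ideal_gas_volume(P_b, T_b, N)
    # kappa = -(dV/dP)_{T,N}, Eq. (4.53e), by a central finite difference
    dP = rel_step * P_b
    kappa = -(ideal_gas_volume(P_b + dP, T_b, N)
              - ideal_gas_volume(P_b - dP, T_b, N)) / (2 * dP)
    C_V = 1.5 * N * K_B  # (dE/dT)_{V,N} for E = (3/2) N k_B T
    sigma_V = math.sqrt(K_B * T_b * kappa)
    sigma_T = math.sqrt(K_B * T_b**2 / C_V)
    return V_bar, sigma_V, sigma_T

for N in (10**4, 10**6, 10**8):
    V_bar, sV, sT = fluctuation_widths(N, T_b=300.0, P_b=1.0e5)
    print(f"N = {N:>9d}   sigma_V/V = {sV / V_bar:.3e}   sigma_T/T = {sT / 300.0:.3e}")
```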
****************************

EXERCISES

Exercise 4.7 Example: Fluctuations and Phase Transitions in a van der Waals Gas

A real monatomic gas exhibits attractive forces between its atoms (or molecules) when the atoms are moderately far apart, and repulsive forces when they are close together. These forces modify expression (4.9c) for the gas's nonrelativistic energy $E$ in terms of its volume $V$, entropy $S$, and number of atoms $N$. A simple analytic approximation to these modifications is given by the van der Waals equation

$$E(V, S, N) = N\,\frac{3h^2}{4\pi m}\left(\frac{V}{N} - b\right)^{-2/3}\exp\left(\frac{2S}{3k_B N} - \frac{5}{3}\right) - \frac{aN^2}{V}. \qquad (4.54)$$

[Figure 4.7: (a) isotherms of $P$ versus $V/N$ labeled $T > T_{\rm crit}$, $T = T_{\rm crit}$, and $T < T_{\rm crit}$; (b) an isotherm of $P$ versus $V/N$ with points A and B marked.]

Fig. 4.7: (a) The van der Waals equation of state $P(N, V, T)$ plotted as pressure $P$ versus specific volume $V/N$ at fixed temperature $T$, for various values of the temperature $T$. (b) The route of a phase transition in the van der Waals gas. The transition is a discontinuous jump from point A to point B.

Here $b$ is the specific volume (volume per particle) at which the repulsion becomes so strong that this approximation idealizes it as infinite. The term $-aN^2/V$ is associated with the attractive force at moderate distances, and a characteristic temperature below which this attractive force is important is given by $T_o \equiv a/(b k_B)$.

(a) Derive the equation of state $P = P(N, V, T)$ for this gas, and compare it with that of an ideal gas. Show that the equation of state has the form depicted in Fig. 4.7. What is the critical temperature $T_{\rm crit}$ below which the curve in Fig. 4.7(a) has a local maximum and a local minimum?

(b) Where along the curves in Fig. 4.7(a) is the gas stable against volume fluctuations, and where is it unstable? For what range of T and P can there be two different phases that are both stable against volume fluctuations?

(c) Let the temperature $T$ be fixed at $T < T_{\rm crit}$, and gradually increase the density from zero (decrease the volume from infinity). At low densities the gas will be vaporous, and at high densities it will be liquid. The phase transition from vapor to liquid involves a discontinuous jump from some point A in Fig. 4.7(b) to another point B. Use the principle of minimum Gibbs potential (Sec. 4.4) to prove that the straight line from A to B in Fig. 4.7(b) is horizontal and has a height such that the areas of the two stippled regions are equal.

(d) At what values of the pressure $P$ and specific volume $V/N$ does the gas exhibit huge volume fluctuations?

Exercise 4.8 Example: Fluctuations of Systems in Contact with a Volume Bath

Exercise 4.3 explored the enthalpy representation of thermodynamics for an equilibrium ensemble of systems in contact with a volume bath. Here we extend that analysis to an ensemble out of equilibrium. We denote by $P_b$ the bath pressure.

(a) The systems are free to exchange volume with the bath but not heat or particles. Explain why, even though the ensemble may be far from equilibrium, any system's volume change $dV$ must be accompanied by an energy change $dE = -P_b\,dV$. This implies that the system's enthalpy $H = E + P_b V$ is conserved. All systems in the ensemble are assumed to have the same enthalpy $H$ and the same number of particles $N$.

(b) Using equilibrium considerations for the bath, show that interaction with a system cannot change the bath's entropy.

(c) Show that the ensemble will always evolve toward increasing entropy $S$, and that when the ensemble finally reaches statistical equilibrium with the bath, its distribution function will be that of the enthalpy ensemble (Table 4.1): $\rho = e^{-S/k_B} = {\rm const}$ for all regions of phase space that have the specified particle number $N$ and enthalpy $H$.

(d) Show that fluctuations away from equilibrium are described by the probability distributions (4.48a) and (4.48b), but with the system energy $E$ replaced by the system enthalpy $H$; cf. Table 4.2.

****************************

4.6 T2 The Ising Model and Renormalization Group Methods

Having presented a thermodynamic description and classification of phase transitions, we now seek microphysical insight into them. After a little contemplation, one discovers that this is an extremely challenging problem because a phase change is an intrinsically nonperturbative process. Perhaps for this reason, the statistical mechanics of phase transitions has inspired the development of some of the most beautiful and broadly applicable methods in modern theoretical physics. In this section and the next we shall give the flavor of these developments by presenting simple examples of two methods of analysis: the renormalization group, and Monte Carlo techniques.[7]

We shall illustrate these methods using a simplified model of a second order ferromagnetic phase transition, which involves spins arranged on a two dimensional square lattice. Each spin $s$ can take on the discrete values $+1$ ("up") and $-1$ ("down"), and it is idealized as interacting solely with each of its four nearest neighbors, with an interaction energy $-Jss'$ that is attractive if the spins are aligned and repulsive if they are opposite. (Note that we do not explicitly include more distant interactions, although these are surely present. As we shall see, these are not essential to give us a model of a phase transition. However, as we shall also see, the "knock-on" effect from one spin to the next does introduce an indirect long-range organization which can propagate across the lattice as the temperature is reduced below its critical value.) The proportionality constant in the interaction energy depends on $V/N$, where $N$ is the total number of spins and $V$ is the lattice's 2-dimensional volume (i.e., its area). We assume that these are both held constant. (Recall from Sec. 4.4 that the volume does not change at a second order phase transition.) For ease of later notation, we shall write the interaction energy between two neighboring spins as

$$-Jss' = -k_B T K ss', \qquad {\rm where} \quad K = \frac{F(V/N)}{k_B T}, \qquad (4.55)$$

with $F$ a function whose actual form will be unimportant. Note that $K = F/k_B T$ is dimensionless. As the interaction is attractive, $K > 0$. This is the Ising model, named after E. Ising, who first investigated it in 1925.

When the temperature is so high that $J \ll k_B T$, i.e., when $K \ll 1$, the spins will be almost randomly aligned and the total interaction energy will be close to zero. Conversely, at low temperatures, where $K \gg 1$, the strong coupling will make it energetically favorable for most of the spins to be aligned over large volumes. In the limit, the total interaction energy $\to -2NJ$. At some critical intermediate temperature $T_c$ and corresponding value $K_c$ of $K$, there will be a phase transition. We shall compute the critical $K_c$ and the dependence of the lattice's specific heat on $T$ near $T_c$, using renormalization group methods in this section and Monte Carlo methods in the next; and we shall examine the accuracy of these methods by comparing our results with an exact solution for the Ising model, derived in a celebrated 1944 paper by Lars Onsager.

[7] Our presentation is based in part on Maris and Kadanoff (1978) and in part on Chandler (1987).

[Figure 4.8: square lattice of interlaced solid and open circles; one open-circle site, labeled 5, is surrounded by its four solid-circle nearest neighbors, labeled 1 to 4.]

Fig. 4.8: Partition of a square lattice into two interlaced square lattices (solid circles and open circles). In the renormalization group approach, the open-circle spins are removed from the lattice, and all their interactions are replaced by modified interactions between the remaining solid-circle spins. The new lattice is rotated by $\pi/4$ with respect to the original lattice and the lattice spacing increases by a factor $2^{1/2}$.
The key idea behind the renormalization group approach is to try to replace the full lattice by a sparser lattice that has similar thermodynamic properties and then to iterate, making the lattice more and more sparse; cf. Fig. 4.8. In implementing this procedure, we shall embody all the lattice's thermodynamic properties in its physical free energy $F(N, V, T)$ (the appropriate fundamental potential for our situation of fixed $N$ and $V$ and interaction with a heat bath); we shall evaluate $F$ using the canonical-ensemble sum over states $e^{-F/k_B T} \equiv z = \sum_n e^{-E_n/k_B T}$. For our Ising model with its nearest-neighbor interaction energies, Eq. (4.55), this sum becomes

$$z = \sum_{\{s_1 = \pm1,\, s_2 = \pm1,\, \ldots\}} e^{K \Sigma^1 s_i s_j}. \qquad (4.56a)$$

Here in the exponential $\Sigma^1$ means a sum over all pairs of nearest neighbor sites $\{i, j\}$. The first step in the renormalization group method is to rewrite Eq. (4.56a) so that each of the open-circle spins of Fig. 4.8, e.g. $s_5$, appears in only one term in the exponential, and then explicitly sum each of those spins over $\pm1$ so they no longer appear in the summations:

$$z = \sum_{\{\ldots,\, s_4 = \pm1,\, s_5 = \pm1,\, s_6 = \pm1,\, \ldots\}} \cdots e^{K(s_1 + s_2 + s_3 + s_4)s_5} \cdots = \sum_{\{\ldots,\, s_4 = \pm1,\, s_6 = \pm1,\, \ldots\}} \cdots \left[e^{K(s_1 + s_2 + s_3 + s_4)} + e^{-K(s_1 + s_2 + s_3 + s_4)}\right] \cdots. \qquad (4.56b)$$

(This rewriting of $z$ is possible because each open-circle spin interacts only with solid-circle spins.) The partition function is now a product of terms like those in the square brackets, one for each open-circle lattice site that we have "removed". We would like to rewrite each square-bracketed term in a form involving solely nearest-neighbor interactions of the solid-circle spins, so that we can then iterate our procedure. Such a rewrite, however, is not possible; after some experimentation, one can verify that the rewrite also requires next-nearest-neighbor interactions and four-site interactions:

$$e^{K(s_1 + s_2 + s_3 + s_4)} + e^{-K(s_1 + s_2 + s_3 + s_4)} = f(K)\, e^{\frac12 K_1(s_1 s_2 + s_2 s_3 + s_3 s_4 + s_4 s_1) + K_2(s_1 s_3 + s_2 s_4) + K_3 s_1 s_2 s_3 s_4}, \qquad (4.56c)$$

where we can determine the functions $K_1(K)$, $K_2(K)$, $K_3(K)$, $f(K)$ by substituting each of the four distinct combinations of $\{s_1, s_2, s_3, s_4\}$ (all four aligned; one opposite the other three; two adjacent spins opposite the other two; alternating) into Eq. (4.56c). The result is

$$K_1 = \tfrac14\ln\cosh(4K), \qquad K_2 = \tfrac18\ln\cosh(4K), \qquad K_3 = \tfrac18\ln\cosh(4K) - \tfrac12\ln\cosh(2K), \qquad f(K) = 2[\cosh(2K)]^{1/2}[\cosh(4K)]^{1/8}. \qquad (4.56d)$$
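Because Eq. (4.56c) is an exact identity, it can be checked by brute force. The sketch below evaluates both sides for all $2^4 = 16$ configurations of $s_1, \ldots, s_4$ using the couplings (4.56d); the left side is $2\cosh[K(s_1 + s_2 + s_3 + s_4)]$, the result of summing $s_5$ over $\pm 1$. The sample values of $K$ are arbitrary.

```python
import itertools
import math

def decimation_couplings(K):
    """Eq. (4.56d): couplings K1, K2, K3 and prefactor f produced by summing out s5."""
    lc4 = math.log(math.cosh(4 * K))
    lc2 = math.log(math.cosh(2 * K))
    K1 = 0.25 * lc4
    K2 = 0.125 * lc4
    K3 = 0.125 * lc4 - 0.5 * lc2
    f = 2 * math.sqrt(math.cosh(2 * K)) * math.cosh(4 * K) ** 0.125
    return K1, K2, K3, f

def max_identity_error(K):
    """Largest relative mismatch between the two sides of Eq. (4.56c) over all 16 configurations."""
    K1, K2, K3, f = decimation_couplings(K)
    worst = 0.0
    for s1, s2, s3, s4 in itertools.product((-1, 1), repeat=4):
        lhs = 2 * math.cosh(K * (s1 + s2 + s3 + s4))  # sum over s5 = +1 and -1
        exponent = (0.5 * K1 * (s1 * s2 + s2 * s3 + s3 * s4 + s4 * s1)
                    + K2 * (s1 * s3 + s2 * s4)
                    + K3 * s1 * s2 * s3 * s4)
        rhs = f * math.exp(exponent)
        worst = max(worst, abs(lhs - rhs) / lhs)
    return worst

for K in (0.1, 0.507, 2.0):
    print(K, max_identity_error(K))
```

The printed mismatches are at the level of floating-point rounding, confirming that no approximation has been made up to this point.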

By inserting expression (4.56c) and the analogous expressions for the other terms into Eq. (4.56b), we obtain the partition function for our original $N$-spin lattice of open and closed circles, expressed as a sum over the $N/2$-spin lattice of closed circles:

$$z(N, K) = [f(K)]^{N/2}\sum e^{K_1\Sigma^1 s_i s_j + K_2\Sigma^2 s_i s_j + K_3\Sigma^3 s_i s_j s_k s_l}. \qquad (4.56e)$$

Here the symbol $\Sigma^1$ still represents a sum over all nearest neighbors but now in the $N/2$ lattice, $\Sigma^2$ is a sum over all next-nearest-neighbor pairs (each site has four next nearest neighbors), and $\Sigma^3$ is a sum over spins located at the vertices of a unit cell. (The reason we defined $K_1$ with the 1/2 in Eq. (4.56c) was because each nearest neighbor interaction appears in two adjacent squares of the solid-circle lattice, thereby converting the 1/2 to a 1 in Eq. (4.56e).)

So far, what we have done is exact. We now make two drastic approximations that are designed to simplify the remainder of the calculation and thereby elucidate the renormalization group method. First, in evaluating the partition function (4.56e), we drop completely the quadruple interaction (i.e., we set $K_3 = 0$). This is likely to be decreasingly accurate as we lower the temperature and the spins become more aligned. Second, we assume that near the critical point, in some average sense, the degree of alignment of next nearest neighbors (of which there are as many as nearest neighbors) is "similar" to that of the nearest neighbors, so that we can set $K_2 = 0$ but increase $K_1$ to

$$K' = K_1 + K_2 = \tfrac38\ln\cosh(4K). \qquad (4.57a)$$

(If we simply ignored $K_2$ we would not get a phase transition.) This substitution ensures that the energy of a lattice with $N/2$ aligned spins, and therefore $N$ nearest neighbor and $N$ next nearest neighbor bonds, namely $-(K_1 + K_2)N k_B T$, is the same as that of a lattice in which we just include the nearest neighbor bonds, but strengthen the interaction. Clearly this will be unsatisfactory at high temperature. These approximations bring the partition function (4.56e) into the form

$$z(N, K) = [f(K)]^{N/2}\, z(N/2, K'), \qquad (4.57b)$$

which relates the partition function for our original Ising lattice of $N$ spins and interaction constant $K$ to that of a similar lattice with $N/2$ spins and interaction constant $K'$.

As the next key step in the renormalization procedure, we note that because the free energy, $F = -k_B T\ln z$, is an extensive variable, $\ln z$ must increase in direct proportion to the number of spins; i.e., it must have the form

$$-F/k_B T \equiv \ln z(N, K) = N g(K), \qquad (4.58a)$$

defining the function $g(K)$. By combining Eqs. (4.57b) and (4.58a) we obtain a relation for the function $g(K)$ in terms of the function $f(K)$:

$$g(K') = 2g(K) - \ln f(K). \qquad (4.58b)$$

Here $K'$ is given by Eq. (4.57a) and $f(K)$ by Eq. (4.56d). Eqs. (4.57a) and (4.58b) are the fundamental equations that allow us to calculate thermodynamic properties under this approximation. Let us examine them more carefully.

The iterative map (4.57a), which expresses the coupling constant $K'$ for a lattice of size $N/2$ in terms of $K$ for a lattice of size $N$, has a fixed point which is obtained by setting $K' = K$, or

$$K_c = \tfrac38\ln\cosh(4K_c); \qquad {\rm i.e.,} \quad K_c = 0.507. \qquad (4.59)$$
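The fixed point (4.59) is easy to locate numerically. The sketch below uses plain bisection (an illustrative choice of root finder) on $K' - K$, which is negative for small $K$ and positive for large $K$, and also evaluates the slope $dK'/dK = \frac32\tanh(4K)$ at the fixed point. The slope exceeds unity, which is the instability of the fixed point discussed next.

```python
import math

def k_prime(K):
    """Approximate renormalization map, Eq. (4.57a): K' = (3/8) ln cosh(4K)."""
    return 0.375 * math.log(math.cosh(4 * K))

def critical_coupling(lo=0.1, hi=1.0, tol=1e-13):
    """Bisect k_prime(K) - K, which is negative at lo and positive at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if k_prime(mid) < mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

K_c = critical_coupling()
slope = 1.5 * math.tanh(4 * K_c)  # dK'/dK evaluated at the fixed point
print(f"K_c = {K_c:.4f}, dK'/dK at K_c = {slope:.3f}")
```

This should reproduce $K_c \approx 0.507$ and a slope of about 1.45, so a small deviation $K - K_c$ grows by that factor with each iteration of (4.57a).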

This fixed point corresponds to the critical point for the lattice. We can make the identification on physical grounds. Suppose that $K$ is slightly smaller than $K_c$ and we make successive iterations. As $dK/dK'(K_c) < 1$, the difference increases with each step; the fixed point is unstable. What this means is that as we look on larger and larger scales, the lattice becomes more disordered. Conversely, at low temperature, when $K > K_c$, the lattice becomes more ordered with increasing scale. Only when $K = K_c$ does the lattice appear to be comparably disordered on all scales. It is here that the increase of order with length scale changes from the inside out (high temperature) to the outside in (low temperature).

To demonstrate that $K = K_c$ is indeed the location of a phase transition, we shall compute the lattice's specific heat in the vicinity of $K_c$. The first step in the computation is to compute the lattice's entropy, $S = -(\partial F/\partial T)_{V,N}$. Recalling that $K \propto 1/T$ at fixed $V, N$ [Eq. (4.55)] and using expression (4.58a) for $F$, we see that

$$S = -\left(\frac{\partial F}{\partial T}\right)_{V,N} = N k_B\left(g - K\frac{dg}{dK}\right). \qquad (4.60a)$$

The specific heat at constant volume is then, in turn, given by

$$C_V = T\left(\frac{\partial S}{\partial T}\right)_{V,N} = N k_B K^2\frac{d^2 g}{dK^2}. \qquad (4.60b)$$

Next we note that, as the iteration Eq. (4.57a) is unstable near $K_c$, the inverse iteration

$$K = \tfrac14\cosh^{-1}[\exp(8K'/3)] \qquad (4.60c)$$

is stable. The corresponding inverse transformation for the function $g(K)$ is

$$g(K) = \tfrac12 g(K') + \tfrac12\ln\left\{2\exp(2K'/3)\,[\cosh(4K'/3)]^{1/4}\right\}. \qquad (4.60d)$$
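A minimal sketch of the inverse iteration (4.60c) and (4.60d), started from the high-temperature limit $g \simeq \ln 2$ at a small seed coupling (here $K' = 10^{-3}$, an arbitrary choice). Each step produces a pair $(K, g(K))$; the couplings climb monotonically toward $K_c$ from below, and any error in the seed value of $g$ is halved at every step. The function names are illustrative, and `ln_f` is included only to check that (4.60d) is Eq. (4.58b) run backward with $f(K)$ from Eq. (4.56d).

```python
import math

def k_prime(K):
    """Forward map, Eq. (4.57a)."""
    return 0.375 * math.log(math.cosh(4 * K))

def ln_f(K):
    """ln f(K), with f(K) from Eq. (4.56d)."""
    return math.log(2.0) + 0.5 * math.log(math.cosh(2 * K)) + 0.125 * math.log(math.cosh(4 * K))

def inverse_step(K_prime, g_prime):
    """One step of Eqs. (4.60c) and (4.60d): from (K', g(K')) to (K, g(K))."""
    K = 0.25 * math.acosh(math.exp(8 * K_prime / 3))
    log_term = math.log(2 * math.exp(2 * K_prime / 3) * math.cosh(4 * K_prime / 3) ** 0.25)
    return K, 0.5 * g_prime + 0.5 * log_term

# Start deep in the disordered (high-temperature) regime, where g(K) is close to ln 2.
K, g = 1e-3, math.log(2.0)
path = [(K, g)]
for _ in range(80):
    K, g = inverse_step(K, g)
    path.append((K, g))

for K_i, g_i in path[:6]:
    print(f"K = {K_i:.5f}   g(K) = {g_i:.6f}")

# Consistency with Eq. (4.58b) for the last step: should be at rounding level.
(Kp, gp), (Kn, gn) = path[-2], path[-1]
print("residual of (4.58b):", gp - (2 * gn - ln_f(Kn)))
```

Derivatives $g'(K)$ and $g''(K)$ for Eqs. (4.60a) and (4.60b) can then be estimated from such $(K, g)$ pairs, which is how the curves in Fig. 4.9 are obtained.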

Now we know that at low temperature, $K \gg K_c$, all the spins are aligned and $g(K) \simeq 2K$. Conversely, at high temperature, there is complete disorder and $K \to 0$. This means that every one of the $2^N$ terms in the partition function is unity and $g(K) \simeq \ln 2$. We can therefore use the iterative map, Eqs. (4.60c) and (4.60d), to approach $K = K_c$ from either side, starting with the high-temperature and low-temperature limits. This allows us to compute thermodynamic quantities (Fig. 4.9). For each value of $K$, we evaluate $g(K)$, $g'(K)$, and $g''(K)$ numerically and use the results to compute $F$, $S$, and $C_V$ using Eqs. (4.58a), (4.60a), and (4.60b). Note that the specific heat diverges as $K \to K_c$, verifying that this is a second order phase transition.

[Figure 4.9: four panels over the range 0.502 to 0.510: (a) the iteration map relating $K$ and $K'$; (b) $F/NJ$ versus $J/kT$; (c) $S/Nk$ versus $J/kT$; (d) $C_V/Nk$ versus $J/kT$.]

Fig. 4.9: a. Iteration map $K(K')$ in the vicinity of the critical point. b. Free energy per spin. c. Entropy per spin. d. Specific heat per spin.

In order to calculate the form of this divergence, suppose that $g(K)$ is a sum of an analytic (infinitely differentiable) function and a non-analytic part, the latter being designated with a tilde. Suppose that $\tilde g(K) \sim |K - K_c|^{2-\alpha}$ for some "critical exponent" $\alpha$. This implies that $C_V$ diverges $\propto |K - K_c|^{-\alpha} \propto |T - T_c|^{-\alpha}$, where $T_c$ is the critical temperature. Now, from Eq. (4.60d), we have that

$$|K' - K_c|^{2-\alpha} = 2|K - K_c|^{2-\alpha}, \qquad (4.61a)$$

or equivalently,

$$\frac{dK'}{dK} = 2^{1/(2-\alpha)}. \qquad (4.61b)$$

Evaluating the derivative at $K = K_c$ from Eq. (4.60c), we obtain

$$\alpha = 2 - \frac{\ln 2}{\ln(dK'/dK)_c} = 0.131, \qquad (4.61c)$$

which is consistent with the numerical calculation. For comparison, the exact Onsager analysis gives $K_c = 0.441$ and $C_V \propto -\ln|T - T_c|$. This analysis appears to have a serious problem in that it gives a negative value for the entropy in the vicinity of the critical point. This is surely unphysical. (The entropy becomes positive on either side of the critical point.) This is an artificiality associated with our particular ansatz, which it does not seem easy to cure. For example, if we write $h(K) = \ln f(K)$ in Eq. (4.58b), then the entropy at the critical point is given by

$$S_c = -\frac{N k_B\, h(K)}{2 - d\ln K'/d\ln K}\left(\frac{d\ln h}{d\ln K} + \frac{d\ln K'}{d\ln K} - 2\right), \qquad (4.62)$$

all evaluated at $K = K_c$. Simply multiplying $K'(K)$ and $h(K)$ by different coefficients cannot change the sign of $S_c$. Nonetheless this procedure does exhibit the physical essentials of the renormalization group approach to critical phenomena.
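The two closing numbers of this analysis, the exponent (4.61c) and the critical-point entropy (4.62), follow directly from $K'(K)$ and $h(K) = \ln f(K)$. The sketch below evaluates both. It relies on the observation that at the fixed point $d\ln K'/d\ln K = dK'/dK$ (because $K' = K$ there), and it cross-checks (4.62) against $S/Nk_B = g - K\,dg/dK$ from (4.60a) with $g(K_c) = h(K_c)$ and $g'(K_c) = h'(K_c)/[2 - (dK'/dK)_c]$, both obtained by differentiating (4.58b) at $K_c$. Bisection is again an illustrative choice of root finder.

```python
import math

def k_prime(K):
    return 0.375 * math.log(math.cosh(4 * K))  # Eq. (4.57a)

def h(K):
    # h(K) = ln f(K), with f(K) from Eq. (4.56d)
    return math.log(2.0) + 0.5 * math.log(math.cosh(2 * K)) + 0.125 * math.log(math.cosh(4 * K))

def critical_coupling(lo=0.1, hi=1.0, tol=1e-13):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if k_prime(mid) < mid else (lo, mid)
    return 0.5 * (lo + hi)

K_c = critical_coupling()
lam = 1.5 * math.tanh(4 * K_c)             # (dK'/dK)_c, equal to d ln K'/d ln K at K_c
alpha = 2 - math.log(2) / math.log(lam)    # Eq. (4.61c)

dh_dK = math.tanh(2 * K_c) + 0.5 * math.tanh(4 * K_c)
mu = K_c * dh_dK / h(K_c)                  # d ln h / d ln K at K_c
S_c = -h(K_c) / (2 - lam) * (mu + lam - 2)  # Eq. (4.62), in units of N k_B

# Cross-check via Eq. (4.60a): S/(N k_B) = g - K g' at the fixed point
S_c_direct = h(K_c) - K_c * dh_dK / (2 - lam)

print(f"alpha = {alpha:.3f}, S_c/(N k_B) = {S_c:.3f}, direct = {S_c_direct:.3f}")
```

The entropy comes out small and negative, which is the unphysical feature of this ansatz described above.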

Why did we bother to go through this cumbersome procedure when Onsager has given us an exact analytical solution? The answer is that it is not possible to generalize the Onsager solution to more complex and realistic problems. In particular, it has not even been possible to find a three dimensional counterpart. However, once the machinery of the renormalization group has been mastered, it can produce approximate answers, with an accuracy that can be estimated, for a whole variety of problems. In the following section we shall look at a quite different approach to the same 2D Ising problem with exactly the same motivation in mind.

****************************

EXERCISES

Exercise 4.9 Example: One Dimensional Ising Lattice

(a) Write down the partition function for a one dimensional Ising lattice as a sum over terms describing all possible spin organisations.

(b) Show that by separating into even and odd numbered spins, it is possible to factorize the partition function and relate $z(N, K)$ exactly to $z(N/2, K')$. Specifically show that

$$z(N, K) = f(K)^{N/2}\, z(N/2, K') \qquad (4.63)$$

where $K' = \ln[\cosh(2K)]/2$ and $f(K) = 2[\cosh(2K)]^{1/2}$.

(c) Use these relations to demonstrate that the one dimensional Ising lattice does not exhibit a second order phase transition.

Exercise 4.10 Derivation: One Dimensional Ising Lattice

Derive Eq. (4.62) (or show that it is incorrect).

****************************

4.7 T2 Monte Carlo Methods

We now turn to our second general method for approximately analyzing phase transitions (and a much larger class of problems in statistical physics). This is the Monte Carlo approach. The name is a laconic reference to the casino whose patrons believe that they will profit by exploiting random processes. It will be instructive to tackle the same two-dimensional Ising problem that we discussed in the last section. The approach is much more straightforward in principle. We set up a square lattice of spins as in Sec. 4.6 and initialize the spins randomly. (This calculation will be performed numerically and will need a (pseudo)random number generator. Most programming languages now supply this utility, which is mostly used uncritically, occasionally with unintended consequences. Defining and testing randomness is an important topic which, unfortunately, we shall not address. See, for example, Press et al 1992.) We now imagine that this lattice is in contact with a thermal bath with a fixed temperature T (it is one member of a canonical ensemble of systems). We allow it to approach equilibrium by changing the orientations of the spins in a prescribed manner. Our goal is to compute thermodynamic quantities using
$$\bar X = z^{-1}\sum e^{-E/k_BT} X ,$$
where the sum is over all states. For example, we can compute the specific heat (at constant volume) from
$$C_V = \frac{d\bar E}{dT} = \frac{\partial}{\partial T}\left(\frac{\sum e^{-E/k_BT} E}{\sum e^{-E/k_BT}}\right) = \frac{\overline{E^2} - \bar E^2}{k_B T^2} .$$
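The fluctuation formula for $C_V$ above can be checked against a direct numerical derivative $d\bar E/dT$. This requires a lattice small enough to enumerate exhaustively. The sketch below makes several choices of its own, for illustration only: a 3 × 3 lattice with periodic boundaries, nearest-neighbour coupling $E = -J\sum s_i s_j$, and units with $J = k_B = 1$.

```python
import itertools
import math


def lattice_energies(L):
    """E/J for every configuration of an L x L periodic Ising lattice,
    assuming E = -J * (sum over nearest-neighbour pairs of s_i s_j)."""
    energies = []
    for flat in itertools.product((-1, 1), repeat=L * L):
        s = [flat[r * L:(r + 1) * L] for r in range(L)]
        e = 0
        for r in range(L):
            for c in range(L):
                # each bond counted once: right neighbour and lower neighbour
                e -= s[r][c] * (s[r][(c + 1) % L] + s[(r + 1) % L][c])
        energies.append(e)
    return energies


def moments(energies, T):
    """Canonical averages <E> and <E^2> over all states (units J = k_B = 1)."""
    weights = [math.exp(-e / T) for e in energies]
    z = sum(weights)
    e1 = sum(w * e for w, e in zip(weights, energies)) / z
    e2 = sum(w * e * e for w, e in zip(weights, energies)) / z
    return e1, e2


energies = lattice_energies(3)            # 2**9 = 512 states, enumerated exactly
T, h = 2.0, 1e-4
e1, e2 = moments(energies, T)
cv_fluctuation = (e2 - e1 ** 2) / T ** 2  # (<E^2> - <E>^2) / (k_B T^2)
cv_derivative = (moments(energies, T + h)[0] - moments(energies, T - h)[0]) / (2 * h)
print(cv_fluctuation, cv_derivative)
```

For a realistic lattice, exhaustive enumeration is hopeless, which is why the sampling approach below is needed.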

(Note how a singularity in the specific heat at a phase transition will be associated with large fluctuations in the energy, as we discussed in Sec. 4.5.) In order to compute quantities like $C_V$, we replace ensemble averages by averages over successive configurations of the lattice. Clearly, we cannot visit every one of the $2^N$ configurations (N being the number of spins), and so we must sample them fairly. How do we prescribe the rules for changing the spins? It turns out that there are many answers to this question, and we shall just give one of the simplest, due to Metropolis et al (1953). To understand it, we must appreciate that we do not need to know the detailed dynamics through which a spin in a lattice flips. All that is required is that the prescription we adopt should maintain thermodynamic equilibrium. Let us label a single lattice state, specified by a matrix whose entries are ±1, by $S_i$, and let its total energy be $E_i$. In addition, let us denote by $p_{ii'}$ the probability of making a transition from a state $S_i$ to a new state $S_{i'}$. Now, in a steady state,
$$\sum_{i'} \rho_{i'}\, p_{i'i} = \rho_i \sum_{i'} p_{ii'} . \qquad (4.64a)$$
However, we know that in equilibrium,
$$\rho_{i'} = \rho_i \exp[(E_i - E_{i'})/k_BT] . \qquad (4.64b)$$

A simple way to satisfy (4.64a) is to impose detailed balance, $\rho_{i'} p_{i'i} = \rho_i p_{ii'}$, term by term. By (4.64b) this requires $p_{ii'}/p_{i'i} = \exp[(E_i - E_{i'})/k_BT]$. The Metropolis rule is simple: if $E_i > E_{i'}$, then $p_{ii'} = 1$, but if $E_i < E_{i'}$, then $p_{ii'} = \exp[(E_i - E_{i'})/k_BT]$. This satisfies detailed balance. It therefore maintains thermodynamic equilibrium and, as can easily be shown, drives an out-of-equilibrium system towards equilibrium. The numerical expression of this procedure is to start with a random lattice and then choose one spin, at random, to make a trial flip. If the new configuration has a lower energy, we always accept the change. If it has a higher energy, we accept the change only with probability $\exp[-\Delta E/k_BT]$, where $\Delta E > 0$ is the energy change. (Actually, there is a small subtlety here. The probability of making a given transition is the product of the probability of making the trial flip and the probability of accepting it. However, the probability of making a trial flip from up to down is the same as that for down to up, and these trial probabilities cancel. So it is only the ratio of the probabilities of acceptance that matters.) In this way, we choose a sequence of states that will ultimately have the equilibrium distribution function. We can then perform our thermodynamic averages using this sequence in an unweighted fashion. This is a particularly convenient procedure for the Ising problem. Because we change one spin at a time, $\Delta E$ can take only one of 5 values ($0, \pm 4J, \pm 8J$ for a square lattice with nearest-neighbour coupling J), and it is possible to change from one state to the next very quickly. (It also helps to store the two threshold probabilities for making an energy-raising transition, and so avoid evaluating exponentials at every step.)

Fig. 4.10: Typical Ising lattices for $T = 1, 2, 3\,J/k_B$ (panels labelled T=1, T=2, T=3).

How big a lattice do we need, and how many states should we consider? The lattice size can be surprisingly small and still give qualitatively correct results, if we adopt periodic boundary conditions. That is to say, we imagine an infinite tiling of our actual lattice. Every time we need to know the spin at a site beyond the last column, we use the corresponding spin in the first column, and so on. This device minimizes the effects of the boundary on the final answer. Lattices as small as 32 × 32 can be useful. The length of the computation depends upon the required accuracy. (In practice, this is usually implemented the other way round: the time available on a computer of given speed determines the accuracy.) One thing should be clear: it is necessary that we explore a reasonable volume of state space in order to sample it fairly and compute meaningful estimates of thermodynamic quantities. The final lattice should exhibit no vestigial patterns from the configuration when the computation was half complete. In practice, it is this consideration that limits the size of the lattice. It is one drawback of the Metropolis algorithm that the step sizes are necessarily small. Monte Carlo simulation has a large bag of tricks for variance reduction and estimation, but we concern ourselves here only with the general method. Returning to the Ising problem, Fig. 4.10 shows typical equilibrium lattices for three temperatures (measured in units of $J/k_B$). Recall that the critical temperature is $T_c = J/(k_B K_c) = 2.269\,J/k_B$. Note the increasingly long-range order as the temperature is reduced.
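The Metropolis procedure described above can be sketched in a few dozen lines of Python. This is a minimal, unoptimized illustration, and several of its choices are assumptions rather than the text's:

- $J = k_B = 1$ and nearest-neighbour coupling $E = -J\sum s_i s_j$.
- Periodic boundary conditions.
- Single-spin trial flips chosen at random.
- The function names, lattice size, temperatures, sweep counts and seeds are arbitrary.

Averages are taken, unweighted, over the sequence of states after an initial burn-in is discarded.

```python
import math
import random


def metropolis_ising(L, T, n_sweeps, seed=0, start="random"):
    """Single-spin-flip Metropolis sampling of an L x L Ising lattice.

    Assumes J = k_B = 1, E = -sum over nearest-neighbour pairs of s_i s_j, and
    periodic boundaries. Returns lists of total energy E and magnetization M,
    recorded once per sweep (one sweep = L*L trial flips).
    """
    rng = random.Random(seed)
    if start == "up":
        s = [[1] * L for _ in range(L)]
    else:
        s = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    # Only the energy-raising changes dE = 4 and dE = 8 need exp(-dE/T); store them once.
    accept = {4: math.exp(-4.0 / T), 8: math.exp(-8.0 / T)}
    E = -sum(s[r][c] * (s[r][(c + 1) % L] + s[(r + 1) % L][c])
             for r in range(L) for c in range(L))
    M = sum(sum(row) for row in s)
    energies, mags = [], []
    for _ in range(n_sweeps):
        for _ in range(L * L):
            r, c = rng.randrange(L), rng.randrange(L)      # trial spin, chosen at random
            nb = (s[(r + 1) % L][c] + s[(r - 1) % L][c]
                  + s[r][(c + 1) % L] + s[r][(c - 1) % L])
            dE = 2 * s[r][c] * nb                          # one of -8, -4, 0, 4, 8
            if dE <= 0 or rng.random() < accept[dE]:       # Metropolis acceptance
                s[r][c] = -s[r][c]
                E += dE
                M += 2 * s[r][c]
        energies.append(E)
        mags.append(M)
    return energies, mags


def summarize(L, T, energies, mags, burn_in):
    """Unweighted averages over post-burn-in sweeps: energy per spin,
    |magnetization| per spin, and C_V per spin from energy fluctuations."""
    N = L * L
    e = energies[burn_in:]
    m = mags[burn_in:]
    mean_e = sum(e) / len(e)
    var_e = sum((x - mean_e) ** 2 for x in e) / len(e)
    return mean_e / N, sum(abs(x) for x in m) / (len(m) * N), var_e / (T ** 2 * N)


L = 16
cold = summarize(L, 1.5, *metropolis_ising(L, 1.5, 1000, seed=1, start="up"), burn_in=200)
hot = summarize(L, 3.5, *metropolis_ising(L, 3.5, 1000, seed=2), burn_in=200)
print("T=1.5:", cold)
print("T=3.5:", hot)
```

At $T = 1.5\,J/k_B$, below $T_c$, a lattice started in the all-up state should stay strongly magnetized. At $T = 3.5\,J/k_B$, the mean |magnetization| per spin should be small. Precomputing the two acceptance probabilities follows the remark in the text about avoiding exponentials at every step.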
We have concluded this chapter with an examination of a very simple system that can approach equilibrium according to specified rules and which can exhibit strong fluctuations. In the following chapter, we shall examine these matters more systematically.

****************************
EXERCISES

Exercise 4.11 Practice: Direct Computation of Thermodynamic Integrals
Estimate how long it would take a PC to compute the partition function for a 32 × 32 Ising lattice by evaluating every possible configuration.

Exercise 4.12 Example: Monte Carlo Approach to Phase Transition
Write a simple computer program to compute the energy and the specific heat of a two-dimensional Ising lattice as described in the text. Examine the accuracy of your answers by varying the size of the lattice and the number of states sampled. (You might also try to compute a formal variance estimate.)

Exercise 4.13 Problem: Ising Lattice with an Applied Field
It is straightforward to generalize our approach to the problem of a lattice placed in a uniform magnetic field B. This adds a term ∝ $-B\sum_i s_i$ to the energy, Eq. (4.55). Modify the computer program from Ex. 4.12 to include this term, and compute the magnetization and the magnetic susceptibility.

****************************

Bibliographic Note There is no shortage of textbooks on Statistical Thermodynamics. A particularly useful treatment of phase transitions, on which Sec. 4.6 is based, is Chandler (1987).

Bibliography
Binney, J. J., Dowrick, N. J., Fisher, A. J. & Newman, M. E. J. 1992 The Theory of Critical Phenomena, Oxford: Oxford University Press
Chandler, D. 1987 Introduction to Modern Statistical Mechanics, Oxford: Oxford University Press

Goldstein, H. 1959 Classical Mechanics, New York: Addison Wesley
Kittel, C. 1958 Elementary Statistical Physics, New York: Wiley
Maris, H. J. & Kadanoff, L. P. 1978 Am. J. Phys. 46 652
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A. & Teller, E. 1953 J. Chem. Phys. 21 1087
Pathria, R. K. 1972 Statistical Mechanics, Oxford: Oxford University Press
Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P. 1992 Numerical Recipes, Cambridge: Cambridge University Press
Tranah, D. & Landsberg, P. T. 1980 Collective Phenomena 3 81
Zel’dovich, Ya. B. & Novikov, I. D. 1971 Relativistic Astrophysics. Volume 1, Stars and Relativity, Chicago: University of Chicago Press

Contents
5 Random Processes
5.1 Overview
5.2 Random Processes and their Probability Distributions
  5.2.1 Markov Processes
  5.2.2 Gaussian Processes and the Central Limit Theorem
5.3 Correlation Functions, Spectral Densities, and Ergodicity
  5.3.1 Correlation Functions
  5.3.2 Ergodic Hypothesis
  5.3.3 Fourier Transforms and Spectral Densities
  5.3.4 Doob’s Theorem for Gaussian, Markov Processes
5.4 Noise and its Types of Spectra
  5.4.1 Intuitive Meaning of Spectral Density
  5.4.2 Shot Noise, Flicker Noise and Random-Walk Noise
  5.4.3 Information Missing from Spectral Density
5.5 Filters, Signal-to-Noise Ratio, and Shot Noise
  5.5.1 Filters, their Kernels, and the Filtered Spectral Density
  5.5.2 Band-Pass Filter and Signal to Noise Ratio
  5.5.3 Shot Noise
5.6 Fluctuation-Dissipation Theorem
  5.6.1 Generalized Coordinate and its Impedance
  5.6.2 Fluctuation-Dissipation Theorem for Generalized Coordinate Interacting with Thermalized Heat Bath
  5.6.3 Johnson Noise and Langevin Equations
5.7 Fokker-Planck Equation for a Markov Process’s Conditional Probability
  5.7.1 Fokker-Planck for a One-Dimensional Markov Process
  5.7.2 Fokker-Planck for a Multi-Dimensional Markov Process
  5.7.3 Brownian Motion

Chapter 5
Random Processes
Version 0805.1.K, 29 Oct 08

Box 5.1
Reader’s Guide
• Relativity does not enter into this chapter.
• This chapter does not rely in any major way on previous chapters. It does, however, make occasional reference to results from Chaps. 3 and 4 about statistical equilibrium and fluctuations in and away from statistical equilibrium.
• No subsequent chapter relies in any major way on this chapter. However:
– The concepts of spectral density and correlation function, developed in Sec. 5.3, will be used:
  – in Ex. 8.7 in treating coherence properties of radiation;
  – in Sec. 10.5 in studying thermal noise in solids;
  – in Sec. 14.3 in studying turbulence in fluids;
  – in Sec. 22.2.1 in treating the quasilinear formalism for weak plasma turbulence;
  – in Sec. 27.5.7 in discussing observations of the anisotropy of the cosmic microwave background radiation.
– The fluctuation-dissipation theorem, developed in Sec. 5.6, will be used in Ex. 10.14 for thermoelastic noise in solids, and in Sec. 11.5 for normal modes of an elastic body.
– The Fokker-Planck equation, developed in Sec. 5.7, will be referred to in Sec. 19.4.3 and Ex. 19.8 when discussing thermal equilibration in a plasma and thermoelectric transport coefficients. It will be used in Sec. 22.3.1 in developing the quasilinear theory of wave-particle interactions in a plasma.

5.1

Overview

In this chapter we shall analyze, among others, the following issues:
• What is the time evolution of the distribution function for an ensemble of systems that begins out of statistical equilibrium and is brought into equilibrium through contact with a heat bath?
• How can one characterize the noise introduced into experiments or observations by noisy devices such as resistors, amplifiers, etc.?
• What is the influence of such noise on one’s ability to detect weak signals?
• What filtering strategies will improve one’s ability to extract weak signals from strong noise?
• Frictional damping of a dynamical system generally arises from coupling to many other degrees of freedom (a bath) that can sap the system’s energy. What is the connection between the fluctuating (noise) forces that the bath exerts on the system and its damping influence?

The mathematical foundation for analyzing such issues is the theory of random processes, and a portion of that subject is the theory of stochastic differential equations. The two sections following this overview constitute a quick introduction to the theory of random processes. Subsequent sections then use that theory to analyze the above issues and others. More specifically:
• Section 5.2 introduces the concept of a random process and the various probability distributions that describe it. It also discusses two special classes of random processes: Markov processes and Gaussian processes.
• Section 5.3 introduces two powerful mathematical tools for the analysis of random processes: the correlation function and the spectral density.
• In Secs. 5.4 and 5.5 we meet the first application of random processes: to noise and its characterization, and to types of signal processing that can extract weak signals from large noise.
• Finally, in Secs. 5.6 and 5.7 we use the theory of random processes to study the details of how an ensemble of systems, interacting with a bath, evolves into statistical equilibrium.
As we shall see, the evolution is governed by a stochastic differential equation called the “Langevin equation,” whose solution is described by an evolving probability distribution (the distribution function). We develop two powerful tools for studying the probability’s evolution. In Sec. 5.6 we develop the fluctuation-dissipation theorem, which characterizes the forces by which the bath interacts with the systems. In Sec. 5.7 we develop the Fokker-Planck equation, which describes how the probability diffuses through phase space.

5.2

Random Processes and their Probability Distributions

Definition of “random process”. A (one-dimensional) random process is a (scalar) function y(t), where t is usually time, for which the future evolution is not determined uniquely by any set of initial data, or at least by any set that is knowable to you and me. In other words, “random process” is just a fancy phrase that means “unpredictable function”. Throughout this chapter we shall insist for simplicity that our random processes y take on a continuum of values ranging over some interval, often but not always −∞ to +∞. The generalization to y’s with discrete (e.g., integral) values is straightforward. Examples of random processes are:
(i) the total energy E(t) in a cell of gas that is in contact with a heat bath;
(ii) the temperature T(t) at the corner of Main Street and Center Street in Logan, Utah;
(iii) the earth-longitude φ(t) of a specific oxygen molecule in the earth’s atmosphere.
One can also deal with random processes that are vector or tensor functions of time. In this chapter’s brief introduction we shall refrain from doing so, except for occasional side remarks, equations and brief paragraphs.

Ensembles of random processes. Since the precise time evolution of a random process is not predictable, one can make predictions only probabilistically. The foundation for probabilistic predictions is an ensemble of random processes, i.e., a collection of a huge number of random processes, each of which behaves in its own, unpredictable way. In the next section we will use the ergodic hypothesis to construct a conceptual ensemble from a single random process that interests us. The statistical properties of that ensemble carry information about the time evolution of the interesting process. Until then, however, we will assume that someone else has given us an ensemble, and we shall develop a probabilistic characterization of it.

Probability distributions. An ensemble of random processes is characterized completely by a set of probability distributions $p_1, p_2, p_3, \ldots$ defined as follows. The quantity
$$p_n(y_n, t_n; \ldots; y_2, t_2; y_1, t_1)\, dy_n \ldots dy_2\, dy_1 \qquad (5.1)$$
tells us the probability that a process y(t) drawn at random from the ensemble:
(i) will take on a value between $y_1$ and $y_1 + dy_1$ at time $t_1$, and
(ii) also will take on a value between $y_2$ and $y_2 + dy_2$ at time $t_2$, and so on, and
(iii) also will take on a value between $y_n$ and $y_n + dy_n$ at time $t_n$.
(Note that the subscript n on $p_n$ tells us how many independent values of y appear in $p_n$. Note also that earlier times are placed to the right, a practice common among physicists.) Suppose we knew the values of all of an ensemble’s probability distributions (an infinite number of them!). Suppose we knew them for all possible choices of their times (an infinite number of choices for each time that appears in each probability distribution), and for all possible values of y (an infinite number of possible values for each time that appears in each probability distribution). We would then have full information about the ensemble’s statistical properties. Not surprisingly, it will turn out that, if the ensemble is in some sense in statistical equilibrium, we can compute all its probability distributions from a very small amount of information. But that comes later; first we must develop more formalism.

Ensemble averages. From the probability distributions we can compute ensemble averages (denoted by brackets). For example, the quantity
$$\langle y(t_1)\rangle \equiv \int y_1\, p_1(y_1, t_1)\, dy_1 \qquad (5.2a)$$
is the ensemble-averaged value of y at time $t_1$. Similarly,
$$\langle y(t_2)\, y(t_1)\rangle \equiv \int y_2\, y_1\, p_2(y_2, t_2; y_1, t_1)\, dy_2\, dy_1 \qquad (5.2b)$$

is the average value of the product $y(t_2)\, y(t_1)$.

Conditional probabilities. Besides the (absolute) probability distributions $p_n$, we shall also find useful an infinite series of conditional probability distributions $P_1, P_2, \ldots$, defined as follows. The quantity
$$P_n(y_n, t_n | y_{n-1}, t_{n-1}; \ldots; y_1, t_1)\, dy_n \qquad (5.3)$$
is the probability that, if y(t) took on the value $y_1$ at time $t_1$, $y_2$ at time $t_2$, and so on up to $y_{n-1}$ at time $t_{n-1}$, then it will take on a value between $y_n$ and $y_n + dy_n$ at time $t_n$. It should be obvious from the definitions of the probability distributions that
$$p_n(y_n, t_n; \ldots; y_1, t_1) = P_n(y_n, t_n | y_{n-1}, t_{n-1}; \ldots; y_1, t_1)\, p_{n-1}(y_{n-1}, t_{n-1}; \ldots; y_1, t_1) . \qquad (5.4)$$
Using this relation, one can compute all the conditional probability distributions $P_n$ from the absolute distributions $p_1, p_2, \ldots$. Conversely, using this relation recursively, one can build up all the absolute probability distributions $p_n$ from the first one, $p_1(y_1, t_1)$, and all the conditional distributions $P_2, P_3, \ldots$.

Stationary random processes. An ensemble of random processes is said to be stationary if and only if its probability distributions $p_n$ depend only on time differences, not on absolute time:
$$p_n(y_n, t_n + \tau; \ldots; y_2, t_2 + \tau; y_1, t_1 + \tau) = p_n(y_n, t_n; \ldots; y_2, t_2; y_1, t_1) . \qquad (5.5)$$
If this property holds for the absolute probabilities $p_n$, then Eq. (5.4) guarantees it also will hold for the conditional probabilities $P_n$. Colloquially, one says that “the random process y(t) is stationary”. What one really means is that “the ensemble from which the process y(t) comes is stationary”. More generally, one often speaks of “a random process y(t)” when what one really means is “an ensemble of random processes {y(t)}”. Nonstationary random processes arise when one is studying a system whose evolution is influenced by some sort of clock that cares about absolute time.
For example, the speeds v(t) of the oxygen molecules in downtown Logan, Utah make up an ensemble of random processes. This ensemble is regulated in part by the rotation of the earth and by the orbital motion of the earth around the sun. The influence of these clocks makes v(t) a nonstationary random process. By contrast, stationary random processes arise in the absence of any regulating clocks. An example is the speeds v(t) of oxygen molecules in a room kept at constant temperature.

Stationarity does not mean “no time evolution of probability distributions”. For example, suppose one knows that the speed of a specific oxygen molecule vanishes at time $t_1$. Suppose also that one is interested in the probability that the molecule will have speed $v_2$ at time $t_2$. That probability, $P_2(v_2, t_2 | 0, t_1)$, will be sharply peaked around $v_2 = 0$ for small time differences $t_2 - t_1$. It will be Maxwellian for large time differences $t_2 - t_1$ (Fig. 5.1). Despite this evolution, the process is stationary (assuming constant temperature). This probability does not depend on the specific time $t_1$ at which v happened to vanish, only on the time difference $t_2 - t_1$: $P_2(v_2, t_2 | 0, t_1) = P_2(v_2, t_2 - t_1 | 0, 0)$.

Fig. 5.1: The probability $P_2(v_2, t_2 | 0, t_1)$ that a molecule which has vanishing speed at time $t_1$ will have speed $v_2$ (in a unit interval $dv_2$) at time $t_2$. The curves, plotted against $v_2$, are labelled extremely small, small, and large $t_2 - t_1$. Although the molecular speed is a stationary random process, this probability evolves in time.

Henceforth, throughout this chapter, we shall restrict attention to random processes that are stationary (at least on the timescales of interest to us). Accordingly, we shall denote
$$p_1(y) \equiv p_1(y, t_1) \qquad (5.6a)$$

since it does not depend on the time $t_1$. We shall also denote
$$P_2(y_2, t | y_1) \equiv P_2(y_2, t | y_1, 0) \qquad (5.6b)$$
for the probability that, if a random process begins with the value $y_1$, then after the lapse of a time t it has the value $y_2$.

5.2.1

Markov Processes

Markov process. A random process y(t) is said to be Markov (also sometimes called Markovian) if and only if all of its future probabilities are determined by its most recently known value:
$$P_n(y_n, t_n | y_{n-1}, t_{n-1}; \ldots; y_1, t_1) = P_2(y_n, t_n | y_{n-1}, t_{n-1}) \quad \text{for all } t_n \ge \ldots \ge t_2 \ge t_1 . \qquad (5.7)$$

This relation guarantees that any Markov process (which, of course, we require to be stationary without saying so) is completely characterized by the probabilities
$$p_1(y) \quad\text{and}\quad P_2(y_2, t | y_1) \equiv \frac{p_2(y_2, t; y_1, 0)}{p_1(y_1)} ; \qquad (5.8)$$

i.e., by one function of one variable and one function of three variables. From these $p_1(y)$ and $P_2(y_2, t|y_1)$ one can reconstruct all of the process’s distribution functions. This uses the Markovian relation (5.7) and the general relation (5.4) between conditional and absolute probabilities. As an example, consider a dust particle in a room filled with constant-temperature air. The x-component of its velocity, $v_x(t)$, is Markov (if we ignore the effects of the floor, ceiling, and walls by making the room arbitrarily large). By contrast, the position x(t) of the particle is not Markov. The probabilities of future values of x depend not just on the initial value of x, but also on the initial velocity $v_x$. Equivalently, they depend on the values of x at two initial, closely spaced times. The pair $\{x(t), v_x(t)\}$ is a two-dimensional Markov process.

The Smoluchowski equation. Choose three (arbitrary) times $t_1$, $t_2$, and $t_3$ that are ordered, so $t_1 < t_2 < t_3$. Consider an arbitrary random process that begins with a known value $y_1$ at $t_1$. Ask for the probability $P_2(y_3, t_3 | y_1, t_1)$ (per unit $y_3$) that it will be at $y_3$ at time $t_3$. The process must go through some value $y_2$ at the intermediate time $t_2$ (though we don’t care what that value is). It must therefore be possible to write the probability to reach $y_3$ as
$$P_2(y_3, t_3 | y_1, t_1) = \int P_3(y_3, t_3 | y_2, t_2; y_1, t_1)\, P_2(y_2, t_2 | y_1, t_1)\, dy_2 ,$$
where the integration is over all allowed values of $y_2$. This is not a terribly interesting relation. Much more interesting is its specialization to the case of a Markov process. In that case $P_3(y_3, t_3 | y_2, t_2; y_1, t_1)$ can be replaced by $P_2(y_3, t_3 | y_2, t_2) = P_2(y_3, t_3 - t_2 | y_2, 0) \equiv P_2(y_3, t_3 - t_2 | y_2)$. The result is an integral equation involving only $P_2$. Because of stationarity, it is adequate to write that equation for the case $t_1 = 0$:
$$P_2(y_3, t_3 | y_1) = \int P_2(y_3, t_3 - t_2 | y_2)\, P_2(y_2, t_2 | y_1)\, dy_2 . \qquad (5.9)$$

This is the Smoluchowski equation, valid for any Markov random process and for times 0 < t2 < t3 . We shall discover its power in our derivation of the Fokker-Planck equation in Sec. 5.7.1 below.
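The Smoluchowski equation (5.9) can be checked numerically for a concrete Markov process. The sketch below assumes, purely for illustration, the conditional probability of a stationary Ornstein-Uhlenbeck process. This is a Gaussian, Markov process of the kind met later in the chapter. Its $P_2(y, t|y_1)$ is Gaussian with mean $y_1 e^{-t/\tau_r}$ and variance $\sigma^2(1 - e^{-2t/\tau_r})$. The function names and parameter values are our own choices.

```python
import math


def p2_ou(y, t, y1, sigma=1.0, tau_r=1.0):
    """Assumed example: P2(y, t | y1) for a stationary Ornstein-Uhlenbeck process,
    Gaussian with mean y1*exp(-t/tau_r) and variance sigma^2 (1 - exp(-2 t/tau_r))."""
    decay = math.exp(-t / tau_r)
    var = sigma ** 2 * (1.0 - decay ** 2)
    return math.exp(-(y - y1 * decay) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def smoluchowski_rhs(y3, t3, t2, y1, n=4001, half_width=10.0):
    """Right side of Eq. (5.9): integrate P2(y3, t3 - t2 | y2) P2(y2, t2 | y1) over y2,
    using the trapezoidal rule on [-half_width, +half_width]."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        y2 = -half_width + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * p2_ou(y3, t3 - t2, y2) * p2_ou(y2, t2, y1)
    return total * h


lhs = p2_ou(0.3, 1.5, 0.7)                  # left side of Eq. (5.9): P2(y3=0.3, t3=1.5 | y1=0.7)
for t2 in (0.2, 0.6, 1.2):                  # any intermediate time 0 < t2 < t3
    print(t2, lhs, smoluchowski_rhs(0.3, 1.5, t2, 0.7))
```

The integrated result should not depend on which intermediate time $t_2$ is chosen, as (5.9) requires.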

5.2.2

Gaussian Processes and the Central Limit Theorem

Gaussian processes. A random process is said to be Gaussian if and only if all of its (absolute) probability distributions are Gaussian, i.e., have the following form:
$$p_n(y_n, t_n; \ldots; y_2, t_2; y_1, t_1) = A \exp\left[-\sum_{j=1}^{n}\sum_{k=1}^{n} \alpha_{jk}(y_j - \bar y)(y_k - \bar y)\right] , \qquad (5.10a)$$
where:
(i) A and $\alpha_{jk}$ depend only on the time differences $t_2 - t_1, t_3 - t_1, \ldots, t_n - t_1$;
(ii) A is a positive normalization constant;
(iii) $[\alpha_{jk}]$ is a positive-definite matrix (otherwise $p_n$ would not be normalizable);
(iv) $\bar y$ is a constant, which one readily can show is equal to the ensemble average of y,
$$\bar y \equiv \langle y\rangle = \int y\, p_1(y)\, dy . \qquad (5.10b)$$

Gaussian random processes are very common in physics. For example, consider a gas cell in statistical equilibrium with a heat bath. The total number of particles N(t) in the cell is a Gaussian random process [Eq. (4.46) and associated discussion]. In fact, as we saw in Sec. 4.5, macroscopic variables that characterize huge systems in statistical equilibrium always have Gaussian probability distributions. The underlying reason is that, when a random process is driven by a large number of statistically independent, random influences, its probability distributions become Gaussian. This general fact is a consequence of the “central limit theorem” of probability theory.

Fig. 5.2: Example of the central limit theorem. The random variable y with the probability distribution p(y) shown in (a) produces, for various values of N, the variable $Y = (y_1 + \ldots + y_N)/N$. The probability distributions p(Y) shown in (b) are for small, medium, and large N. In the limit of very large N, p(Y) is a Gaussian.

Central limit theorem. Let y be a random variable. It need not be a random process, and there need not be any times involved, although our application is to random processes. Suppose that y is characterized by an arbitrary probability distribution p(y) [e.g., that of Fig. 5.2(a)], so the probability of the variable taking on a value between y and y + dy is p(y)dy. Denote by $\bar y$ and $\sigma_y$ the mean value of y and its standard deviation (the square root of its variance):
$$\bar y \equiv \langle y\rangle = \int y\, p(y)\, dy , \qquad (\sigma_y)^2 \equiv \langle (y - \bar y)^2\rangle = \langle y^2\rangle - \bar y^2 . \qquad (5.11a)$$
Randomly draw from this distribution a large number, N, of values $\{y_1, y_2, \ldots, y_N\}$ and average them to get a number
$$Y \equiv \frac{1}{N}\sum_{i=1}^{N} y_i . \qquad (5.11b)$$
Repeat this many times, and examine the resulting probability distribution for Y. In the limit of arbitrarily large N, that distribution will be Gaussian with mean and standard deviation
$$\bar Y = \bar y , \qquad \sigma_Y = \frac{\sigma_y}{\sqrt N} ; \qquad (5.11c)$$
i.e., it will have the form
$$p(Y) = \frac{1}{\sqrt{2\pi\sigma_Y^2}} \exp\left[-\frac{(Y - \bar Y)^2}{2\sigma_Y^2}\right] \qquad (5.11d)$$

with $\bar Y$ and $\sigma_Y$ given by Eq. (5.11c). See Fig. 5.2(b).

Proof of Central Limit Theorem: The key to proving this theorem is the Fourier transform of the probability distribution. (That Fourier transform is called the distribution’s characteristic function, but we shall not in this chapter delve into the details of characteristic functions.) Denote the Fourier transform of p(y) by
$$\tilde p_y(f) \equiv \int_{-\infty}^{+\infty} e^{i2\pi f y}\, p(y)\, dy = \sum_{n=0}^{\infty} \frac{(i2\pi f)^n}{n!}\langle y^n\rangle . \qquad (5.12a)$$

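The second expression follows from a power series expansion of the first. A power series expansion analogous to (5.12a) must hold for $\tilde p_Y(f)$. Moreover, $\langle Y^n\rangle$ can be computed from
$$\langle Y^n\rangle = \langle N^{-n}(y_1 + y_2 + \ldots + y_N)^n\rangle = N^{-n}\int (y_1 + \ldots + y_N)^n\, p(y_1)\ldots p(y_N)\, dy_1 \ldots dy_N . \qquad (5.12b)$$
It must therefore be that
$$\begin{aligned}
\tilde p_Y(f) &= \sum_{n=0}^{\infty}\frac{(i2\pi f)^n}{n!}\langle Y^n\rangle = \int \exp[i2\pi f N^{-1}(y_1 + \ldots + y_N)]\, p(y_1)\ldots p(y_N)\, dy_1 \ldots dy_N \\
&= \left[\int e^{i2\pi f y/N}\, p(y)\, dy\right]^N = \left[1 + \frac{i2\pi f \bar y}{N} - \frac{(2\pi f)^2\langle y^2\rangle}{2N^2} + O\!\left(\frac{1}{N^3}\right)\right]^N \\
&= \exp\left[i2\pi f \bar y - \frac{(2\pi f)^2(\langle y^2\rangle - \bar y^2)}{2N} + O\!\left(\frac{1}{N^2}\right)\right] .
\end{aligned} \qquad (5.12c)$$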

Here the last equality can be obtained by taking the logarithm of the preceding quantity, expanding in powers of 1/N, and then exponentiating. By inverting the Fourier transform (5.12c) and using $(\sigma_y)^2 = \langle y^2\rangle - \bar y^2$, we obtain for p(Y) the Gaussian (5.11d). QED
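The theorem is easy to see in a quick simulation. The sketch below assumes an exponential distribution of unit mean for p(y), so that $\bar y = \sigma_y = 1$. This is a deliberately skewed, non-Gaussian choice. The values N = 50, the 5000 repetitions, and the seed are arbitrary.

```python
import math
import random
import statistics


def sample_means(n_per_mean, n_means, seed=1):
    """Return n_means independent values of Y = (y_1 + ... + y_N)/N, Eq. (5.11b),
    with each y_i drawn from an exponential distribution of unit mean
    (an arbitrary, skewed choice of p(y) for which ybar = sigma_y = 1)."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n_per_mean)) / n_per_mean
            for _ in range(n_means)]


N = 50
Y = sample_means(N, 5000)
mean_Y = statistics.fmean(Y)
sigma_Y = statistics.pstdev(Y)
predicted_sigma = 1.0 / math.sqrt(N)            # Eq. (5.11c): sigma_y / sqrt(N)
# For a Gaussian, about 68.3% of the probability lies within one sigma of the mean.
frac_within = sum(abs(v - 1.0) < predicted_sigma for v in Y) / len(Y)
print(mean_Y, sigma_Y, predicted_sigma, frac_within)
```

Increasing N makes the histogram of Y narrower, as $1/\sqrt N$, and steadily more symmetric, even though p(y) itself is strongly asymmetric.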

5.3

Correlation Functions, Spectral Densities, and Ergodicity

5.3.1

Correlation Functions

Time averages. Forget, between here and Eq. (5.16), that we have occasionally used $\bar y$ to denote the numerical value of an ensemble average, $\langle y\rangle$. Instead, insist that bars denote time averages. Thus, if y(t) is a random process and F is a function of y, then
$$\bar F \equiv \lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{+T/2} F[y(t)]\, dt . \qquad (5.13)$$

Correlation function. Let y(t) be a random process with time average $\bar y$. Then the correlation function of y(t) is defined by
$$C_y(\tau) \equiv \overline{[y(t) - \bar y][y(t+\tau) - \bar y]} \equiv \lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{+T/2}[y(t) - \bar y][y(t+\tau) - \bar y]\, dt . \qquad (5.14)$$

Fig. 5.3: Example of a correlation function $C_y(\tau)$, equal to $\sigma_y^2$ at τ = 0, that becomes negligible for delay times τ larger than some relaxation time $\tau_r$.

This quantity, as its name suggests, is a measure of the extent to which the values of y at times t and t + τ tend to be correlated. The quantity τ is sometimes called the delay time, and by convention it is taken to be positive. [One can easily see that, if one also defines $C_y(\tau)$ for negative delay times τ by Eq. (5.14), then $C_y(-\tau) = C_y(\tau)$. Thus, nothing is lost by restricting attention to positive delay times.]

Relaxation time. Random processes encountered in physics usually have correlation functions that become negligibly small for all delay times τ that greatly exceed some “relaxation time” $\tau_r$; i.e., they have $C_y(\tau)$ qualitatively like that of Fig. 5.3. Henceforth we shall restrict attention to random processes with this property.
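Because (5.14) is a time average over a single realization, it can be estimated from one long but finite record. The sketch below assumes, for illustration only, a process whose correlation function is known in advance: $C_y(\tau) = \sigma^2 e^{-\tau/\tau_r}$. This is an Ornstein-Uhlenbeck process sampled with its exact discrete update. The sketch compares the finite-T estimate with that known form. The record length, time step, and seed are arbitrary choices.

```python
import math
import random


def ou_series(n_steps, dt, tau_r=1.0, sigma=1.0, seed=2):
    """Illustrative stationary process (an assumption, not from the text): samples of an
    Ornstein-Uhlenbeck process at spacing dt, using the exact update
    y_{k+1} = a y_k + sigma sqrt(1 - a^2) xi_k with a = exp(-dt/tau_r).
    Its correlation function is sigma^2 exp(-tau/tau_r), so tau_r is the relaxation time."""
    rng = random.Random(seed)
    a = math.exp(-dt / tau_r)
    b = sigma * math.sqrt(1.0 - a * a)
    y = rng.gauss(0.0, sigma)              # start in the stationary distribution
    out = []
    for _ in range(n_steps):
        out.append(y)
        y = a * y + b * rng.gauss(0.0, 1.0)
    return out


def correlation(y, lag):
    """Finite-T version of the time average (5.14) at delay tau = lag * dt."""
    n = len(y) - lag
    ybar = sum(y) / len(y)
    return sum((y[k] - ybar) * (y[k + lag] - ybar) for k in range(n)) / n


dt = 0.1
y = ou_series(200_000, dt)
for lag in (0, 10, 20, 50):
    print(lag * dt, correlation(y, lag), math.exp(-lag * dt))
```

The estimate at τ = 0 approximates $\sigma_y^2$. Beyond a few relaxation times it is dominated by statistical scatter, which shrinks as the record gets longer.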

5.3.2

Ergodic Hypothesis

An ensemble E of (stationary) random processes will be said to satisfy the ergodic hypothesis if and only if it has the following property. Let y(t) be any random process in the ensemble E. Construct from y(t) a new ensemble E′ whose members are
$$Y^K(t) \equiv y(t + KT) , \qquad (5.15)$$

where K runs over all integers, negative and positive, and where T is a time interval large compared to the process’s relaxation time, $T \gg \tau_r$. Then E′ has the same probability distributions $p_n$ as E, for all times such that $|t_i - t_j| < T$. That is, $p_n(Y_n, t_n; \ldots; Y_1, t_1)$ has the same functional form as $p_n(y_n, t_n; \ldots; y_1, t_1)$. This is essentially the same ergodic hypothesis as we met in Sec. 3.5. As in Sec. 3.5, the ergodic hypothesis guarantees that time averages defined using any random process y(t) drawn from the ensemble E are equal to ensemble averages:
$$\bar F \equiv \langle F\rangle , \qquad (5.16)$$

where F is any function of y: F = F(y). In this sense, each random process in the ensemble is representative, when viewed over sufficiently long times, of the statistical properties of the entire ensemble, and conversely. Henceforth we shall restrict attention to ensembles that satisfy the ergodic hypothesis. This, in principle, is a severe restriction. In practice, for a physicist, it is not severe at all. In physics, one’s objective when introducing ensembles is usually to acquire computational techniques for dealing with a single random process, or a small number of them. One acquires those techniques by defining one’s conceptual ensembles in such a way that they satisfy the ergodic hypothesis.

Because we insist that the ergodic hypothesis be satisfied for all our random processes, the value of the correlation function at zero time delay will be
$$C_y(0) \equiv \overline{(y - \bar y)^2} = \langle (y - \bar y)^2\rangle , \qquad (5.17a)$$

which by definition is the variance $\sigma_y^2$ of y:
$$C_y(0) = \sigma_y^2 . \qquad (5.17b)$$

If x(t) and y(t) are two random processes, then by analogy with the correlation function $C_y(\tau)$ we define their cross correlation as
$$C_{xy}(\tau) \equiv \overline{x(t)\, y(t+\tau)} . \qquad (5.18a)$$

Sometimes $C_y(\tau)$ is called the autocorrelation function of y, to distinguish it clearly from this cross correlation function. Notice that the cross correlation satisfies
$$C_{xy}(-\tau) = C_{yx}(\tau) , \qquad (5.18b)$$
and the cross correlation of a random process with itself is equal to its autocorrelation, $C_{yy}(\tau) = C_y(\tau)$. The matrix
$$\begin{pmatrix} C_x(\tau) & C_{xy}(\tau) \\ C_{yx}(\tau) & C_y(\tau) \end{pmatrix} = \begin{pmatrix} C_{xx}(\tau) & C_{xy}(\tau) \\ C_{yx}(\tau) & C_{yy}(\tau) \end{pmatrix} \qquad (5.18c)$$
can be regarded as a correlation matrix for the 2-dimensional random process {x(t), y(t)}. We now turn to some issues which will prepare us for defining the concept of “spectral density”.

5.3.3

Fourier Transforms and Spectral Densities

Fourier transforms. There are several different sets of conventions for the definition of Fourier transforms. In this book we adopt a set which is commonly (but not always) used in the theory of random processes, but which differs from that common in quantum theory. Instead of using the angular frequency $\omega$, we shall use the ordinary frequency $f \equiv \omega/2\pi$; and we shall define the Fourier transform of a function $y(t)$ by
$$\tilde y(f) \equiv \int_{-\infty}^{+\infty} y(t)\, e^{i2\pi f t}\, dt \,. \tag{5.19a}$$
Knowing the Fourier transform $\tilde y(f)$, we can invert (5.19a) to get $y(t)$ using
$$y(t) = \int_{-\infty}^{+\infty} \tilde y(f)\, e^{-i2\pi f t}\, df \,. \tag{5.19b}$$
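To make the conventions (5.19a) and (5.19b) concrete, here is a minimal numerical sketch in Python (standard library only). It approximates both integrals with a midpoint sum on a finite grid, which assumes the function decays quickly. The Gaussian $e^{-\pi t^2}$ is an illustrative choice, not taken from the text: with the $f$-based convention it is its own Fourier transform, so the absence of $2\pi$ prefactors can be checked directly.

```python
import cmath
import math


def forward_transform(y, f, half_width=5.0, n=4000):
    """Midpoint-rule approximation to Eq. (5.19a):
    y~(f) = integral of y(t) exp(+i 2 pi f t) dt.
    Assumes y(t) is negligible outside [-half_width, half_width]."""
    dt = 2.0 * half_width / n
    total = 0j
    for k in range(n):
        t = -half_width + (k + 0.5) * dt
        total += y(t) * cmath.exp(2j * math.pi * f * t)
    return total * dt


def inverse_transform(y_tilde, t, half_width=5.0, n=4000):
    """Midpoint-rule approximation to Eq. (5.19b):
    y(t) = integral of y~(f) exp(-i 2 pi f t) df, with no 1/(2 pi) prefactor."""
    df = 2.0 * half_width / n
    total = 0j
    for k in range(n):
        f = -half_width + (k + 0.5) * df
        total += y_tilde(f) * cmath.exp(-2j * math.pi * f * t)
    return total * df


# Illustrative test function (an assumption, not from the text):
# exp(-pi x^2) is its own transform under the f-based convention.
def gaussian(x):
    return math.exp(-math.pi * x * x)


for f in (0.0, 0.5, 1.0):
    print(f, forward_transform(gaussian, f).real, gaussian(f))
print(inverse_transform(gaussian, 0.3).real, gaussian(0.3))
```

With the angular-frequency convention, by contrast, the same Gaussian would pick up factors of $2\pi$ or $\sqrt{2\pi}$ in the transform or its inverse.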

Notice that with this set of conventions there are no factors of $1/2\pi$ or $1/\sqrt{2\pi}$ multiplying the integrals. Those factors have been absorbed into the $df$ of (5.19b), since $df = d\omega/2\pi$.

Fourier transforms are not directly useful when dealing with random processes. The reason is that a random process $y(t)$ is generally presumed to go on and on forever; as a result, its Fourier transform $\tilde y(f)$ is divergent. One gets around this problem by crude trickery: (i) From $y(t)$ construct, by truncation, the function
$$y_T(t) \equiv y(t) \ \text{ if } -T/2 < t < +T/2 \,, \qquad y_T(t) \equiv 0 \ \text{ otherwise} \,. \tag{5.20a}$$

Then the Fourier transform $\tilde y_T(f)$ is finite; and by Parseval's theorem it satisfies
$$\int_{-T/2}^{+T/2} [y(t)]^2\, dt = \int_{-\infty}^{+\infty} [y_T(t)]^2\, dt = \int_{-\infty}^{+\infty} |\tilde y_T(f)|^2\, df = 2\int_0^{\infty} |\tilde y_T(f)|^2\, df \,. \tag{5.20b}$$

Here in the last equality we have used the fact that, because $y_T(t)$ is real, $\tilde y_T^*(f) = \tilde y_T(-f)$, where $^*$ denotes complex conjugation; consequently, the integral from $-\infty$ to 0 of $|\tilde y_T(f)|^2$ is the same as the integral from 0 to $+\infty$. Now, the quantities on the two sides of (5.20b) diverge in the limit $T \to \infty$, and it is obvious from the left side that they diverge linearly in $T$. Correspondingly, the limit
$$\lim_{T\to\infty} \frac{1}{T}\int_{-T/2}^{+T/2} [y(t)]^2\, dt = \lim_{T\to\infty} \frac{2}{T}\int_0^{\infty} |\tilde y_T(f)|^2\, df \tag{5.20c}$$
is convergent.

Spectral density. These considerations motivate the following definition of the spectral density (also sometimes called the power spectrum) $S_y(f)$ of the random process $y(t)$:
$$S_y(f) \equiv \lim_{T\to\infty} \frac{2}{T} \left| \int_{-T/2}^{+T/2} [y(t) - \bar y]\, e^{i2\pi f t}\, dt \right|^2 . \tag{5.21}$$

Notice that the quantity inside the absolute value sign is just $\tilde y_T(f)$, but with the mean of $y$ removed before computation of the Fourier transform. (The mean is removed so as to avoid an uninteresting delta function in $S_y(f)$ at zero frequency.) Correspondingly, by virtue of our motivating result (5.20c), the spectral density satisfies $\int_0^\infty S_y(f)\, df = \lim_{T\to\infty} \frac{1}{T}\int_{-T/2}^{+T/2} [y(t) - \bar y]^2\, dt = \overline{(y - \bar y)^2} = \sigma_y^2$; i.e.,
$$\int_0^{\infty} S_y(f)\, df = C_y(0) = \sigma_y^2 \,. \tag{5.22}$$
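The definition (5.21) and the sum rule (5.22) can be illustrated with a finite, sampled record. The Python sketch below is an illustration under stated assumptions, not a production spectral estimator. It replaces the integral in (5.21) by a sum over $N$ samples spaced by $dt$, evaluates it at the frequencies $f_k = k/T$ with $T = N\,dt$, and checks that the positive-frequency Riemann sum of $S(f_k)$ reproduces the sample variance. For a finite record this is an exact discrete Parseval identity, so the agreement holds to rounding error. The raw, unaveraged estimate of $S_y(f)$ at any single frequency is itself very noisy.

```python
import cmath
import math
import random


def one_sided_periodogram(samples, dt):
    """Discrete analogue of Eq. (5.21) for N samples spaced by dt (N even):
    S(f_k) = (2/T) |dt * sum_n (y_n - ybar) exp(+i 2 pi f_k t_n)|^2,
    with T = N dt and f_k = k/T for k = 0, 1, ..., N/2.
    Uses a plain O(N^2) discrete transform for clarity rather than an FFT."""
    n = len(samples)
    total_time = n * dt
    mean = sum(samples) / n
    centered = [y - mean for y in samples]  # remove the mean, as in Eq. (5.21)
    freqs, spectrum = [], []
    for k in range(n // 2 + 1):
        acc = 0j
        for m, y in enumerate(centered):
            acc += y * cmath.exp(2j * math.pi * k * m / n)  # f_k t_m = k m / N
        freqs.append(k / total_time)
        spectrum.append(2.0 / total_time * abs(dt * acc) ** 2)
    return freqs, spectrum


def integrate_spectrum(spectrum, dt):
    """Riemann sum of S(f) df over f >= 0 with df = 1/T. The Nyquist bin
    k = N/2 is shared with negative frequencies, so it gets half weight."""
    n = 2 * (len(spectrum) - 1)
    df = 1.0 / (n * dt)
    return df * (sum(spectrum[:-1]) + 0.5 * spectrum[-1])


rng = random.Random(1)  # seeded, illustrative data
dt = 0.01
y = [rng.gauss(0.0, 1.0) for _ in range(256)]
freqs, S = one_sided_periodogram(y, dt)
mean = sum(y) / len(y)
variance = sum((v - mean) ** 2 for v in y) / len(y)
print(integrate_spectrum(S, dt), variance)  # equal up to rounding, cf. Eq. (5.22)
```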

In words: The integral of the spectral density of $y$ over all positive frequencies is equal to the variance of $y$.

By convention, our spectral density is defined only for nonnegative frequencies $f$. This is because, were we to define it also for negative frequencies, the fact that $y(t)$ is real would imply that $S_y(f) = S_y(-f)$, so the negative frequencies contain no new information. Our insistence that $f$ be nonnegative goes hand in hand with the factor 2 in the $2/T$ of the definition (5.21): that factor 2 in essence folds the negative-frequency part over onto the positive-frequency part. This choice of convention is called the single-sided spectral density. Some of the literature uses a double-sided spectral density,
$$S_y^{\rm double\text{-}sided}(f) = \tfrac{1}{2}\, S_y(f) \,, \tag{5.23}$$
in which $f$ is regarded as both positive and negative, and frequency integrals generally run from $-\infty$ to $+\infty$ instead of 0 to $\infty$.

Notice that the spectral density has units of $y^2$ per unit frequency; or, more colloquially (since frequency $f$ is usually measured in Hertz, i.e., cycles per second), its units are $y^2/{\rm Hz}$.

Spectral densities are widely used in science and engineering. Figure 5.4 shows an example: the spectral density of the noise in an interferometric gravitational-wave detector (Sec. 26.5). As we shall learn in Chap. 26, the gravitational wave from some distant source (e.g., two colliding black holes) pushes two mirrors (hanging by wires) back and forth with respect to each other. Laser interferometry is used to monitor the separation $x(t)$ between the mirrors. The measured $x(t)$ is influenced by noise in the instrument as well as by gravitational waves. Figure 5.4 shows the square root of the spectral density of the noise-induced fluctuations in $x(t)$. Note that this $\sqrt{S_x(f)}$ has units ${\rm m}/\sqrt{\rm Hz}$.

[Figure 5.4: The square root of the spectral density, $\sqrt{S_x(f)}$ in m/√Hz (vertical axis), versus frequency in Hz (horizontal axis), of the separation $x(t)$ between hanging mirrors in the 4 km long LIGO gravitational-wave detector at Hanford, Washington, as measured on March 18, 2007. The black curve is the noise that was specified as this instrument's goal. At frequencies above about 150 Hz, the observed noise is due to random arrival times of the photons used to measure $x(t)$ (shot noise; Sec. 5.5.3). Between 40 Hz and 150 Hz, the noise is partially due to fluctuating forces inside the mirrors and in the wires by which they hang (thermal noise, described by the Fluctuation-Dissipation Theorem of Sec. 5.6). Below 40 Hz, the noise is largely due to shaking of the ground ("ambient seismic noise") sneaking through mechanical filters that isolate the mirrors from their environment.]

If $x(t)$ and $y(t)$ are two random processes, then by analogy with the spectral density $S_y(f)$ we define their cross spectral density as
$$S_{xy}(f) = \lim_{T\to\infty} \frac{2}{T} \int_{-T/2}^{+T/2} [x(t) - \bar x]\, e^{-i2\pi f t}\, dt \int_{-T/2}^{+T/2} [y(t') - \bar y]\, e^{+i2\pi f t'}\, dt' \,. \tag{5.24a}$$
Notice that the cross spectral density of a random process with itself is equal to its spectral density, $S_{yy}(f) = S_y(f)$, and is real; but if $x(t)$ and $y(t)$ are different random processes, then $S_{xy}(f)$ is generally complex, with
$$S_{xy}^*(f) = S_{xy}(-f) = S_{yx}(f) \,. \tag{5.24b}$$

This relation allows us to confine attention to positive $f$ without any loss of information. The matrix
$$\begin{pmatrix} S_{xx}(f) & S_{xy}(f) \\ S_{yx}(f) & S_{yy}(f) \end{pmatrix} = \begin{pmatrix} S_x(f) & S_{xy}(f) \\ S_{xy}^*(f) & S_y(f) \end{pmatrix} \tag{5.24c}$$
can be regarded as a spectral density matrix that describes how the power in the 2-dimensional random process $\{x(t), y(t)\}$ is distributed over frequency.

The Wiener-Khintchine Theorem says that for any random process $y(t)$ the correlation function $C_y(\tau)$ and the spectral density $S_y(f)$ are the cosine transforms of each other and thus contain precisely the same information:
$$C_y(\tau) = \int_0^{\infty} S_y(f)\cos(2\pi f\tau)\, df \,, \qquad S_y(f) = 4\int_0^{\infty} C_y(\tau)\cos(2\pi f\tau)\, d\tau \,; \tag{5.25a}$$
and similarly the cross correlation $C_{xy}(\tau)$ and cross spectral density $S_{xy}(f)$ of any two random processes $x(t)$ and $y(t)$ are the ordinary Fourier transforms of each other and thus contain the same information:
$$\begin{aligned} C_{xy}(\tau) &= \frac{1}{2}\int_{-\infty}^{+\infty} S_{xy}(f)\, e^{-i2\pi f\tau}\, df = \frac{1}{2}\int_0^{\infty} \left[ S_{xy}(f)\, e^{-i2\pi f\tau} + S_{yx}(f)\, e^{+i2\pi f\tau} \right] df \,, \\ S_{xy}(f) &= 2\int_{-\infty}^{+\infty} C_{xy}(\tau)\, e^{i2\pi f\tau}\, d\tau = 2\int_0^{\infty} \left[ C_{xy}(\tau)\, e^{+i2\pi f\tau} + C_{yx}(\tau)\, e^{-i2\pi f\tau} \right] d\tau \,. \end{aligned} \tag{5.25b}$$

The factors 4, 1/2, and 2 in these formulas result from our folding negative frequencies into positive in our definitions of the spectral density.

Proof of Wiener-Khintchine Theorem: This theorem is readily proved as a consequence of Parseval's theorem. Assume, from the outset, that the means have been subtracted from $x(t)$ and $y(t)$, so $\bar x = \bar y = 0$. [This is not really a restriction on the proof, since $C_y$, $C_{xy}$, $S_y$ and $S_{xy}$ are insensitive to the means of $y$ and $x$.] Denote by $y_T(t)$ the truncated $y$ of Eq. (5.20a) and by $\tilde y_T(f)$ its Fourier transform, and similarly for $x$. Then the generalization of Parseval's theorem¹
$$\int_{-\infty}^{+\infty} (g h^* + h g^*)\, dt = \int_{-\infty}^{+\infty} (\tilde g \tilde h^* + \tilde h \tilde g^*)\, df \tag{5.26a}$$
[with $g = x_T(t)$ and $h = y_T(t+\tau)$ both real, and $\tilde g = \tilde x_T(f)$, $\tilde h = \tilde y_T(f)\, e^{-i2\pi f\tau}$] says
$$\int_{-\infty}^{+\infty} x_T(t)\, y_T(t+\tau)\, dt = \int_{-\infty}^{+\infty} \tilde x_T^*(f)\, \tilde y_T(f)\, e^{-i2\pi f\tau}\, df \,. \tag{5.26b}$$
By dividing by $T$, taking the limit as $T \to \infty$, and using Eqs. (5.18a) and (5.24a), we obtain the first equality in Eqs. (5.25b). The second equality follows from $S_{xy}(-f) = S_{yx}(f)$, and the second line in Eqs. (5.25b) follows from Fourier inversion. Equations (5.25a) for $S_y$ and $C_y$ follow by setting $x = y$. QED

¹ This follows by subtracting Parseval's theorem for $g$ and for $h$ from Parseval's theorem for $g + h$.
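A discrete analogue of the Wiener-Khintchine relation can be checked numerically. In the Python sketch below, a finite record is treated as periodic (an assumption made so that the identity is exact). Its circular autocorrelation $c_m$ plays the role of $C_y(\tau)$, and the discrete transform $\sum_m c_m e^{i2\pi km/N}$ is compared with $|Y_k|^2/N$, where $Y_k$ is the discrete transform of the mean-subtracted samples. Multiplying both sides by $2\,dt$ turns this into the discrete counterpart of $S_y(f) = 2\int_{-\infty}^{+\infty} C_y(\tau)\, e^{i2\pi f\tau}\, d\tau$.

```python
import cmath
import math
import random


def dft(values):
    """Y_k = sum_m v_m exp(+i 2 pi k m / N): the discrete analogue of Eq. (5.19a)."""
    n = len(values)
    return [sum(v * cmath.exp(2j * math.pi * k * m / n) for m, v in enumerate(values))
            for k in range(n)]


def circular_autocorrelation(values):
    """c_lag = (1/N) sum_i v_i v_{(i + lag) mod N}: a time average of y(t) y(t + tau)
    for a record treated as periodic (the periodicity is an assumption of this sketch)."""
    n = len(values)
    return [sum(values[i] * values[(i + lag) % n] for i in range(n)) / n
            for lag in range(n)]


rng = random.Random(7)  # seeded, illustrative data
y = [rng.gauss(0.0, 1.0) for _ in range(64)]
mean = sum(y) / len(y)
centered = [v - mean for v in y]

Y = dft(centered)
C = circular_autocorrelation(centered)
C_transform = dft(C)

# Discrete Wiener-Khintchine: transform of the correlation = |Y_k|^2 / N.
max_error = max(abs(C_transform[k] - abs(Y[k]) ** 2 / len(y)) for k in range(len(y)))
print(max_error)  # of order rounding error
```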

The Wiener-Khintchine theorem implies the following formulas for ensemble-averaged products of Fourier transforms of random processes:
$$2\langle \tilde y(f)\, \tilde y^*(f')\rangle = S_y(f)\, \delta(f - f') \,, \tag{5.27a}$$
$$2\langle \tilde x^*(f)\, \tilde y(f')\rangle = S_{xy}(f)\, \delta(f - f') \,. \tag{5.27b}$$
Eq. (5.27a) quantifies the strength of the infinite value of $|\tilde y(f)|^2$, which motivated our definition (5.21) of the spectral density. To prove Eq. (5.27b) we proceed as follows:
$$\langle \tilde x^*(f)\, \tilde y(f')\rangle = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \langle x(t)\, y(t')\rangle\, e^{-2\pi i f t}\, e^{+2\pi i f' t'}\, dt\, dt' \,. \tag{5.28a}$$
Setting $t' = t + \tau$ and using the ergodic hypothesis and the definition (5.18a) of the cross correlation, we bring this into the form
$$\int_{-\infty}^{+\infty} C_{xy}(\tau)\, e^{2\pi i f'\tau}\, d\tau \int_{-\infty}^{+\infty} e^{2\pi i (f' - f)t}\, dt = \frac{1}{2}\, S_{xy}(f)\, \delta(f - f') \,, \tag{5.28b}$$
where we have used the Wiener-Khintchine relation (5.25b) and also the expression $\delta(\nu) = \int_{-\infty}^{+\infty} e^{2\pi i\nu t'}\, dt'$ for the Dirac delta function $\delta(\nu)$. This proves Eq. (5.27b); Eq. (5.27a) follows by setting $x = y$ (and using the reality of $S_y$).

5.3.4

Doob’s Theorem for Gaussian, Markov Processes

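Doob's Theorem. A large fraction of the random processes that one meets in physics are Gaussian, and many of them are Markov. As a result, the following remarkable theorem about processes that are both Gaussian and Markov is quite important: Any one-dimensional random process $y(t)$ that is both Gaussian and Markov has the following forms for its correlation function, its spectral density, and the two probability distributions $p_1$ and $P_2$ which determine all the others:
$$C_y(\tau) = \sigma_y^2\, e^{-\tau/\tau_r} \,, \tag{5.29a}$$
$$S_y(f) = \frac{(4/\tau_r)\,\sigma_y^2}{(2\pi f)^2 + (1/\tau_r)^2} \,, \tag{5.29b}$$
$$p_1(y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left[-\frac{(y - \bar y)^2}{2\sigma_y^2}\right] , \tag{5.29c}$$
$$P_2(y_2, \tau\,|\,y_1) = \frac{1}{\left[2\pi\left(1 - e^{-2\tau/\tau_r}\right)\sigma_y^2\right]^{1/2}} \exp\left[-\frac{\left[y_2 - \bar y - e^{-\tau/\tau_r}(y_1 - \bar y)\right]^2}{2\left(1 - e^{-2\tau/\tau_r}\right)\sigma_y^2}\right] . \tag{5.29d}$$

[Figure 5.5: (a) The correlation function (5.29a), $C_y$ versus $\tau$, which falls from $\sigma_y^2$ at $\tau = 0$ on the timescale $\tau_r$. (b) The spectral density (5.29b), $S_y$ versus $f$, which is flat at $4\sigma_y^2\tau_r$ at low frequencies and falls off as $\sigma_y^2/(\tau_r\pi^2 f^2)$ at high frequencies. Both are for a Gaussian, Markov process.]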

Here $\bar y$ is the process's mean, $\sigma_y$ is its standard deviation ($\sigma_y^2$ is its variance), and $\tau_r$ is its relaxation time. This result is Doob's theorem.² The correlation function (5.29a) and spectral density (5.29b) are plotted in Fig. 5.5.

Note the great power of Doob's theorem: because all of $y$'s probability distributions are computable from $p_1$ [Eq. (5.29c)] and $P_2$ [Eq. (5.29d)], and these are determined by $\bar y$, $\sigma_y$, and $\tau_r$, this theorem says that all statistical properties of a Gaussian, Markov process are determined by just three parameters: its mean $\bar y$, its variance $\sigma_y^2$, and its relaxation time $\tau_r$.

Proof of Doob's Theorem: Let $y(t)$ be Gaussian and Markov (and, of course, stationary). For ease of notation, set $y_{\rm new} = (y_{\rm old} - \bar y_{\rm old})/\sigma_{y_{\rm old}}$, so $\bar y_{\rm new} = 0$ and $\sigma_{y_{\rm new}} = 1$. If the theorem is true for $y_{\rm new}$, then by the rescalings inherent in the definitions of $C_y(\tau)$, $S_y(f)$, $p_1(y)$, and $P_2(y_2, \tau|y_1)$, it will also be true for $y_{\rm old}$. Since $y \equiv y_{\rm new}$ is Gaussian, its first two probability distributions must have the following Gaussian forms (these are the most general Gaussians with the required mean $\bar y = 0$ and variance $\sigma_y^2 = 1$):
$$p_1(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2} \,, \tag{5.30a}$$
$$p_2(y_2, t_2; y_1, t_1) = \frac{1}{\sqrt{(2\pi)^2\,(1 - C_{21}^2)}} \exp\left[-\frac{y_1^2 + y_2^2 - 2C_{21}\, y_1 y_2}{2(1 - C_{21}^2)}\right] . \tag{5.30b}$$
By virtue of the ergodic hypothesis, this $p_2$ determines the correlation function:
$$C_y(t_2 - t_1) \equiv \langle y(t_2)\, y(t_1)\rangle = \int p_2(y_2, t_2; y_1, t_1)\, y_2\, y_1\, dy_2\, dy_1 = C_{21} \,. \tag{5.30c}$$
Thus, the constant $C_{21}$ in $p_2$ is the correlation function. From the general expression (5.4) for conditional probabilities in terms of absolute probabilities we can compute $P_2$:
$$P_2(y_2, t_2\,|\,y_1, t_1) = \frac{1}{\sqrt{2\pi(1 - C_{21}^2)}} \exp\left[-\frac{(y_2 - C_{21}\, y_1)^2}{2(1 - C_{21}^2)}\right] . \tag{5.30d}$$

² It is so named because it was first identified and proved by J. L. Doob (1942).

We can also use the general expression (5.4) for the relationship between conditional and absolute probabilities to compute $p_3$:
$$\begin{aligned} p_3(y_3, t_3; y_2, t_2; y_1, t_1) &= P_3(y_3, t_3\,|\,y_2, t_2; y_1, t_1)\, p_2(y_2, t_2; y_1, t_1) = P_2(y_3, t_3\,|\,y_2, t_2)\, p_2(y_2, t_2; y_1, t_1) \\ &= \frac{1}{\sqrt{2\pi(1 - C_{32}^2)}} \exp\left[-\frac{(y_3 - C_{32}\, y_2)^2}{2(1 - C_{32}^2)}\right] \times \frac{1}{\sqrt{(2\pi)^2\,(1 - C_{21}^2)}} \exp\left[-\frac{y_1^2 + y_2^2 - 2C_{21}\, y_1 y_2}{2(1 - C_{21}^2)}\right] . \end{aligned} \tag{5.30e}$$
Here the second equality follows from the fact that $y$ is Markov, and in order that it be valid we insist that $t_1 < t_2 < t_3$. From the explicit form (5.30e) of $p_3$ we can compute
$$C_y(t_3 - t_1) \equiv C_{31} \equiv \langle y(t_3)\, y(t_1)\rangle = \int p_3(y_3, t_3; y_2, t_2; y_1, t_1)\, y_3\, y_1\, dy_3\, dy_2\, dy_1 \,. \tag{5.30f}$$
The result is
$$C_{31} = C_{32}\, C_{21} \,. \tag{5.30g}$$
In other words,
$$C_y(t_3 - t_1) = C_y(t_3 - t_2)\, C_y(t_2 - t_1) \quad \text{for any } t_3 > t_2 > t_1 \,. \tag{5.30h}$$
The unique continuous, bounded solution to this equation, with the "initial condition" that $C_y(0) = \sigma_y^2 = 1$, is
$$C_y(\tau) = e^{-\tau/\tau_r} \,, \tag{5.30i}$$
where $\tau_r$ is a constant (which we identify as the relaxation time; cf. Fig. 5.3). From the Wiener-Khintchine relation (5.25a) and this correlation function we obtain
$$S_y(f) = \frac{4/\tau_r}{(2\pi f)^2 + (1/\tau_r)^2} \,. \tag{5.30j}$$
Equations (5.30i), (5.30j), (5.30a), and (5.30d) [the last with $C_{21} = e^{-\tau/\tau_r}$] are the asserted forms (5.29a)–(5.29d) of the correlation function, spectral density, and probability distributions in the case of our $y_{\rm new}$ with $\bar y = 0$ and $\sigma_y = 1$. From these, by rescaling, we obtain the forms (5.29a)–(5.29d) for $y_{\rm old}$. Thus, Doob's theorem is proved. QED
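Doob's theorem also suggests how to simulate a Gaussian, Markov process. Because the process is Markov, successive samples can be drawn one after another from the conditional distribution $P_2$ of Eq. (5.29d). The Python sketch below does exactly that; this exact update is the usual way of sampling what is often called an Ornstein-Uhlenbeck process. The parameter values are arbitrary illustrations. The sketch then checks that the time-averaged correlation function follows the exponential (5.29a).

```python
import math
import random


def simulate_gauss_markov(n_steps, dt, mean, sigma, tau_r, seed=0):
    """Sample a stationary Gaussian, Markov process at times 0, dt, 2 dt, ...
    The first value is drawn from p1 (Eq. 5.29c); each later value is drawn
    from the conditional distribution P2 (Eq. 5.29d) with tau = dt, i.e. a
    Gaussian with mean ybar + exp(-dt/tau_r) (y_prev - ybar) and variance
    (1 - exp(-2 dt/tau_r)) sigma^2."""
    rng = random.Random(seed)
    decay = math.exp(-dt / tau_r)
    step_sd = sigma * math.sqrt(1.0 - decay * decay)
    y = mean + sigma * rng.gauss(0.0, 1.0)
    path = [y]
    for _ in range(n_steps - 1):
        y = mean + decay * (y - mean) + step_sd * rng.gauss(0.0, 1.0)
        path.append(y)
    return path


def sample_correlation(path, lag):
    """Time-average estimate of C_y(lag * dt) = <(y(t) - ybar)(y(t + tau) - ybar)>."""
    n = len(path)
    m = sum(path) / n
    return sum((path[i] - m) * (path[i + lag] - m) for i in range(n - lag)) / (n - lag)


# Illustrative parameters (assumptions): ybar = 2, sigma_y = 0.5, tau_r = 1, dt = 0.1.
path = simulate_gauss_markov(200_000, 0.1, 2.0, 0.5, 1.0, seed=3)
c0 = sample_correlation(path, 0)
for lag in (0, 5, 10, 20):
    tau = lag * 0.1
    print(tau, sample_correlation(path, lag) / c0, math.exp(-tau / 1.0))
```

Because each step samples $P_2$ exactly, the result does not depend on $dt$ being small; $dt$ only sets how finely the correlation function is resolved.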

5.4

Noise and its Types of Spectra

Experimental physicists and engineers encounter random processes in the form of "noise" that is superposed on signals they are trying to measure. Examples: (i) In radio communication, "static" on the radio is noise. (ii) When modulated laser light is used for optical communication, random fluctuations in the arrival times of photons always contaminate the signal; the effects of such fluctuations are called "shot noise" and will be studied below. (iii) Even the best of atomic clocks fail to tick with absolutely constant angular frequencies $\omega$; their frequencies fluctuate ever so slightly relative to an ideal clock, and those fluctuations can be regarded as noise.

Sometimes the "signal" that one studies amidst noise is actually itself some very special noise ("one person's signal is another person's noise"). An example is in radio astronomy, where the electric field $E_x(t)$ of the waves from a quasar, in the x-polarization state, is a random process whose spectrum (spectral density) the astronomer attempts to measure. Notice from its definition that the spectral density $S_{E_x}(f)$ is, up to a factor $4\pi/c$, the specific intensity $I_\nu$ [Eq. (2.11) with $\nu = f$] integrated over the solid angle subtended by the source:
$$S_{E_x}(f) = \frac{4\pi}{c}\, \frac{d\,\text{Energy}}{d\,\text{Area}\; d\,\text{time}\; df} = \frac{4\pi}{c}\int I_\nu\, d\Omega \,. \tag{5.31}$$
(Here $\nu$ and $f$ are just two alternative notations for the same frequency, and we use Gaussian units.) It is precisely this $S_{E_x}(f)$ that radio astronomers seek to measure; and they must do so in the presence of noise due to other, nearby radio sources, noise in their radio receivers, and "noise" produced by commercial radio stations.

5.4.1

Intuitive Meaning of Spectral Density

As an aid to understanding various types of noise, we shall seek an intuitive understanding of the meaning of the spectral density $S_y(f)$. Suppose that we examine the time evolution of a random process $y(t)$ over a specific interval of time $\Delta t$. That time evolution will involve fluctuations at various frequencies from $f = \infty$ on down to the lowest frequency for which we can fit at least one period into the time interval studied, i.e., down to $f = 1/\Delta t$. Choose a frequency $f$ in this range, and ask what are the mean square fluctuations in $y$ at that frequency. By definition, they will be
$$[\Delta y(\Delta t, f)]^2 \equiv \lim_{N\to\infty} \frac{2}{N} \sum_{n=-N/2}^{+N/2} \left| \frac{1}{\Delta t}\int_{n\Delta t}^{(n+1)\Delta t} [y(t) - \bar y]\, e^{i2\pi f t}\, dt \right|^2 . \tag{5.32a}$$

Here the factor 2 in $2/N$ accounts for our insistence on folding negative frequencies $f$ into positive, and thereby regarding $f$ as nonnegative; i.e., the quantity (5.32a) is the mean square fluctuation at frequency $-f$ plus that at $+f$. The phases of the finite Fourier transforms appearing in (5.32a) (one transform for each interval of time $\Delta t$) will be randomly distributed with respect to each other. As a result, if we add these Fourier transforms and then compute their absolute square, rather than computing their absolute squares first and then adding, the new terms we introduce will have random relative phases that cause them to cancel each other. In other words, with vanishing error in the limit $N \to \infty$, we can rewrite (5.32a) as
$$[\Delta y(\Delta t, f)]^2 = \lim_{N\to\infty} \frac{2}{N} \left| \sum_{n=-N/2}^{+N/2} \frac{1}{\Delta t}\int_{n\Delta t}^{(n+1)\Delta t} [y(t) - \bar y]\, e^{i2\pi f t}\, dt \right|^2 . \tag{5.32b}$$

By defining $T \equiv N\Delta t$ and noting that a constant in $y(t)$ contributes nothing to the Fourier transform at finite (nonzero) frequency $f$, we can rewrite this expression as
$$[\Delta y(\Delta t, f)]^2 = \lim_{T\to\infty} \frac{2}{T}\,\frac{1}{\Delta t} \left| \int_{-T/2}^{+T/2} (y - \bar y)\, e^{i2\pi f t}\, dt \right|^2 = \frac{1}{\Delta t}\, S_y(f) \,. \tag{5.32c}$$

[Figure 5.6: (a), (b) Examples of two random processes $y(t)$, plotted versus time $t$, that have flicker noise spectra, $S_y(f) \propto 1/f$. From Press (1978).]

It is conventional to call the reciprocal of the time $\Delta t$ on which these fluctuations are studied the bandwidth $\Delta f$ of the study; i.e.,
$$\Delta f \equiv 1/\Delta t \,, \tag{5.33}$$
and correspondingly it is conventional to interpret (5.32c) as saying that the root-mean-square (rms) fluctuations at frequency $f$ and during a time $\Delta t \ge f^{-1}$ are
$$\Delta y(\Delta t, f) = \sqrt{S_y(f)\,\Delta f} \quad \text{where } \Delta f = 1/\Delta t \,. \tag{5.34}$$

5.4.2

Shot Noise, Flicker Noise and Random-Walk Noise

Special noise spectra. Certain spectra have been given special names:
$$S_y(f) \text{ independent of } f: \quad \text{white noise spectrum,} \tag{5.35a}$$
$$S_y(f) \propto 1/f: \quad \text{flicker noise spectrum,} \tag{5.35b}$$
$$S_y(f) \propto 1/f^2: \quad \text{random walk spectrum.} \tag{5.35c}$$

White noise is called "white" because it has equal amounts of "power per unit frequency" $S_y$ at all frequencies, just as white light has roughly equal powers at all light frequencies. Put differently, if $y(t)$ has a white-noise spectrum, then its rms fluctuations over a fixed time interval $\Delta t$ (i.e., in a fixed bandwidth $\Delta f$) are independent of frequency $f$; i.e., $\Delta y(\Delta t, f) = \sqrt{S_y/\Delta t}$ is independent of $f$ since $S_y$ is independent of $f$.

Flicker noise gets its name from the fact that, when one looks at the time evolution $y(t)$ of a random process with a flicker-noise spectrum, one sees fluctuations ("flickering") on all timescales, and the rms amplitude of flickering is independent of the timescale one chooses. Stated more precisely, choose any timescale $\Delta t$ and then choose a frequency $f \sim 3/\Delta t$ so one can fit roughly three periods of oscillation into the chosen timescale. Then the rms amplitude of the fluctuations one observes will be
$$\Delta y(\Delta t, f = 3/\Delta t) = \sqrt{S_y(f)\, f/3} \,, \tag{5.36}$$

which is a constant independent of $f$ when the spectrum is that of flicker noise, $S_y \propto 1/f$. Stated differently, flicker noise has the same amount of power in each octave of frequency. Figure 5.6 is an illustration: both graphs shown there depict random processes with flicker-noise spectra. (The differences between the two graphs will be explained below.) No matter what time interval one chooses, these processes look roughly periodic with one or two or three oscillations in that time interval; and the amplitudes of those oscillations are independent of the chosen time interval.

Random-walk spectra arise when the random process $y(t)$ undergoes a random walk. We shall study an example in Sec. 5.7.3 below.

Notice that for a Gaussian Markov process the spectrum (Fig. 5.5) is white at frequencies $f \ll 1/(2\pi\tau_r)$, where $\tau_r$ is the relaxation time, and it is random-walk at frequencies $f \gg 1/(2\pi\tau_r)$. This is typical: random processes encountered in the real world tend to have one type of spectrum over one large interval of frequency, then switch to another type over another large interval. The angular frequency of ticking of a Rubidium atomic clock furnishes another example. That angular frequency fluctuates slightly with time, $\omega = \omega(t)$; and those fluctuations have the form shown in Fig. 5.7. At low frequencies, $10^{-7}\,{\rm Hz} \lesssim f \lesssim 10^{-2}\,{\rm Hz}$ (over long timescales $100\,{\rm s} \lesssim \Delta t \lesssim 10^{7}\,{\rm s} \approx 4$ months), $\omega$ exhibits random-walk noise; and at higher frequencies, $10^{-2}\,{\rm Hz} \lesssim f \lesssim 10\,{\rm Hz}$ (timescales $0.1\,{\rm s} \lesssim \Delta t \lesssim 100\,{\rm s}$), it exhibits white noise.

[Figure 5.7: The spectral density $S_\omega(f)$ (logarithmic vertical axis) of the fluctuations in angular frequency $\omega$ of ticking of a Rubidium atomic clock, versus frequency $f$ in Hz. The spectrum is random-walk, $S_\omega \sim 1/f^2$, at low frequencies and white, $S_\omega = {\rm const}$, at higher frequencies.]
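The scalings behind Eqs. (5.34) and (5.36) can be tabulated directly. The Python sketch below evaluates the rms fluctuation (5.36) on several timescales for white, flicker, and random-walk spectra (the amplitude $A$ is an arbitrary illustrative value). It also integrates each spectrum over octaves, to show that only the flicker spectrum puts the same power, $A\ln 2$, in every octave.

```python
import math


def rms_on_timescale(spectrum, delta_t):
    """Eq. (5.36): rms fluctuation seen on timescale delta_t, using f = 3/delta_t."""
    f = 3.0 / delta_t
    return math.sqrt(spectrum(f) * f / 3.0)


def band_power(spectrum, f_low, f_high, n=10_000):
    """Midpoint-rule integral of S_y(f) df from f_low to f_high (power in that band)."""
    width = (f_high - f_low) / n
    return width * sum(spectrum(f_low + (k + 0.5) * width) for k in range(n))


A = 1.0e-6  # arbitrary spectral amplitude chosen for illustration


def white(f):
    return A


def flicker(f):
    return A / f


def random_walk(f):
    return A / f ** 2


# White: rms falls as 1/sqrt(delta_t); flicker: constant; random walk: grows as sqrt(delta_t).
for delta_t in (0.1, 1.0, 10.0, 100.0):
    print(delta_t, [rms_on_timescale(s, delta_t) for s in (white, flicker, random_walk)])

# Power per octave [f, 2f]: constant (= A ln 2) only for the flicker spectrum.
for f in (1.0, 10.0, 100.0):
    print(f, band_power(flicker, f, 2 * f), band_power(white, f, 2 * f))
```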

5.4.3

Information Missing from Spectral Density

In experimental studies of noise, attention focuses very heavily on the spectral density $S_y(f)$ and on quantities that one can compute from it. In the special case of a Gaussian-Markov process, the spectrum $S_y(f)$ and the mean $\bar y$ together contain full information about all statistical properties of the random process. However, most random processes that one encounters are not Markov (though most are Gaussian). (Whenever the spectrum deviates from the special form in Fig. 5.5, one can be sure the process is not Gaussian-Markov.) Correspondingly, for most processes the spectrum contains only a tiny part of the statistical information required to characterize the process. The two random processes shown in Fig. 5.6 above are a good example. They were constructed on a computer as superpositions of pulses $F(t - t_o)$ with random arrival times $t_o$ and with identical forms
$$F(t) = 0 \ \text{ for } t < 0 \,, \qquad F(t) = K/\sqrt{t} \ \text{ for } t > 0 \,. \tag{5.37}$$
The two $y(t)$'s look very different because the first [Fig. 5.6(a)] involves frequent small pulses, while the second [Fig. 5.6(b)] involves less frequent, larger pulses. These differences are obvious to the eye in the time evolutions $y(t)$. However, they do not show up at all in the spectra $S_y(f)$: the spectra are identical; both are of flicker type. Moreover, the differences do not show up in $p_1(y_1)$ or in $p_2(y_2, t_2; y_1, t_1)$, because the two processes are both superpositions of many independent pulses and thus are Gaussian; and for Gaussian processes $p_1$ and $p_2$ are determined fully by the mean and the correlation function, or equivalently by the mean and spectral density, which are the same for the two processes. Thus, the differences between the two processes show up only in the probabilities $p_n$ of third order and higher, $n \ge 3$.

5.5

Filters, Signal-to-Noise Ratio, and Shot Noise

5.5.1

Filters, their Kernels, and the Filtered Spectral Density

Filters. In experimental physics and engineering one often takes a signal $y(t)$ or a random process $y(t)$ and filters it to produce a new function $w(t)$ that is a linear functional of $y(t)$:
$$w(t) = \int_{-\infty}^{+\infty} K(t - t')\, y(t')\, dt' \,. \tag{5.38}$$

The quantity $y(t)$ is called the filter's input; $K(t - t')$ is the filter's kernel, and $w(t)$ is its output. We presume throughout this chapter that the kernel depends only on the time difference $t - t'$ and not on absolute time. One says that the filter is stationary when this is so; and when it is violated, so that $K = K(t, t')$ depends on absolute time, the filter is said to be nonstationary. Our restriction to stationary filters goes hand-in-hand with our restriction to stationary random processes, since if $y(t)$ is stationary as we require, and if the filter is stationary as we require, then the filtered process $w(t) = \int_{-\infty}^{+\infty} K(t - t')\, y(t')\, dt'$ is stationary. Some examples of kernels and their filtered outputs are these:
$$\begin{aligned} K(\tau) &= \delta(\tau): & w(t) &= y(t) \,, \\ K(\tau) &= \delta'(\tau): & w(t) &= dy/dt \,, \\ K(\tau) &= 0 \text{ for } \tau < 0 \text{ and } 1 \text{ for } \tau > 0: & w(t) &= \int_{-\infty}^{t} y(t')\, dt' \,. \end{aligned} \tag{5.39}$$
As with any function, a knowledge of the kernel $K(\tau)$ is equivalent to a knowledge of its Fourier transform
$$\tilde K(f) \equiv \int_{-\infty}^{+\infty} K(\tau)\, e^{i2\pi f\tau}\, d\tau \,. \tag{5.40}$$

This Fourier transform plays a central role in the theory of filtering (also called the theory of linear signal processing): The convolution theorem of Fourier transform theory says that, if $y(t)$ is a function whose Fourier transform $\tilde y(f)$ exists (converges), then the Fourier transform of the filter's output $w(t)$ [Eq. (5.38)] is given by
$$\tilde w(f) = \tilde K(f)\, \tilde y(f) \,. \tag{5.41}$$

[Figure 5.8: The kernel $K(\tau)$ (5.44a), plotted versus $\tau$, whose filter multiplies the spectral density by a factor $1/f$, thereby converting white noise into flicker noise, and flicker noise into random-walk noise.]

Similarly, by virtue of the definition (5.21) of spectral density in terms of Fourier transforms, if $y(t)$ is a random process with spectral density $S_y(f)$, then the filter's output $w(t)$ will be a random process with spectral density
$$S_w(f) = |\tilde K(f)|^2\, S_y(f) \,. \tag{5.42}$$
[Note that, although $\tilde K(f)$, like all Fourier transforms, is defined for both positive and negative frequencies, when its modulus is used in (5.42) to compute the effect of the filter on a spectral density, only positive frequencies are relevant; spectral densities are strictly positive-frequency quantities.]

The quantity $|\tilde K(f)|^2$ that appears in the very important relation (5.42) is most easily computed not by evaluating directly the Fourier transform (5.40) and then squaring, but rather by sending the function $e^{i2\pi f t}$ through the filter and then squaring. To see that this works, notice that the result of sending $e^{i2\pi f t}$ through the filter is
$$\int_{-\infty}^{+\infty} K(t - t')\, e^{i2\pi f t'}\, dt' = \tilde K^*(f)\, e^{i2\pi f t} \,, \tag{5.43}$$

which differs from $\tilde K(f)$ by complex conjugation and a change of phase, and which thus has absolute value squared equal to $|\tilde K(f)|^2$. For example, if $w(t) = d^n y/dt^n$, then when we send $e^{i2\pi f t}$ through the filter we get $(i2\pi f)^n e^{i2\pi f t}$; and, accordingly, $|\tilde K(f)|^2 = (2\pi f)^{2n}$, and $S_w(f) = (2\pi f)^{2n} S_y(f)$.

This last example shows that by differentiating a random process once, one changes its spectral density by a multiplicative factor $(2\pi f)^2$; for example, one can thereby convert random-walk noise into white noise. Similarly, by integrating a random process once in time (the inverse of differentiating), one multiplies its spectral density by $(2\pi f)^{-2}$. If one wants, instead, to multiply by $f^{-1}$, one can achieve that using the filter
$$K(\tau) = 0 \ \text{ for } \tau < 0 \,, \qquad K(\tau) = \sqrt{\frac{2}{\tau}} \ \text{ for } \tau > 0 \,; \tag{5.44a}$$
see Fig. 5.8. Specifically, it is easy to show, by sending a sinusoid through this filter, that
$$w(t) \equiv \int_{-\infty}^{t} \sqrt{\frac{2}{t - t'}}\; y(t')\, dt' \tag{5.44b}$$
has
$$S_w(f) = \frac{1}{f}\, S_y(f) \,. \tag{5.44c}$$
Thus, by filtering in this way one can convert white noise into flicker noise, and flicker noise into random-walk noise.

[Figure 5.9: A band-pass filter: $|\tilde K(f)|^2$ versus $f$, peaking at $|\tilde K(f_o)|^2$ at the central frequency $f_o$, with bandwidth $\Delta f$.]
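The recipe of Eq. (5.43), sending $e^{i2\pi f t}$ through the filter and squaring, is easy to carry out numerically. The Python sketch below does so for a hypothetical causal kernel $K(\tau) = e^{-\tau/\tau_0}/\tau_0$ for $\tau > 0$ (an RC-type low-pass filter, not one of the kernels discussed in the text), and compares the result with the closed form $|\tilde K(f)|^2 = 1/[1 + (2\pi f\tau_0)^2]$. The output time $t$ is arbitrary, and the result should not depend on it.

```python
import cmath
import math


def send_exponential_through(kernel, f, t=0.0, tau_max=40.0, n=20_000):
    """Eq. (5.43) for a causal kernel (K = 0 for tau < 0): the output
    w(t) = integral_0^tau_max K(tau) exp(i 2 pi f (t - tau)) dtau
    when exp(i 2 pi f t) is the input (midpoint rule; assumes K is negligible
    beyond tau_max)."""
    d_tau = tau_max / n
    total = 0j
    for k in range(n):
        tau = (k + 0.5) * d_tau
        total += kernel(tau) * cmath.exp(2j * math.pi * f * (t - tau))
    return total * d_tau


def power_gain(kernel, f, t=0.0):
    """|K~(f)|^2: strip the input phase exp(i 2 pi f t) and square the modulus."""
    output = send_exponential_through(kernel, f, t)
    return abs(output / cmath.exp(2j * math.pi * f * t)) ** 2


TAU0 = 1.0  # hypothetical time constant of an RC-type low-pass kernel


def rc_kernel(tau):
    return math.exp(-tau / TAU0) / TAU0


for f in (0.0, 0.1, 0.5, 2.0):
    expected = 1.0 / (1.0 + (2.0 * math.pi * f * TAU0) ** 2)
    print(f, power_gain(rc_kernel, f, t=0.37), expected)
```

By Eq. (5.42), white noise sent through this kernel emerges with $S_w(f) = S_y/[1 + (2\pi f\tau_0)^2]$, which has the same shape as the Gaussian-Markov spectrum (5.29b) with $\tau_r = \tau_0$.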

5.5.2

Band-Pass Filter and Signal to Noise Ratio

In experimental physics and engineering one often meets a random process $Y(t)$ that consists of a sinusoidal signal on which is superposed noise $y(t)$:
$$Y(t) = \sqrt{2}\, Y_s \cos(2\pi f_o t + \delta_o) + y(t) \,. \tag{5.45a}$$

We shall assume that the frequency $f_o$ and phase $\delta_o$ of the signal are known, and we want to determine the signal's root-mean-square amplitude $Y_s$. (The factor $\sqrt{2}$ is included in (5.45a) because the time average of the square of the cosine is 1/2; and, correspondingly, with the factor $\sqrt{2}$ present, $Y_s$ is the rms signal amplitude.) The noise $y(t)$ is an impediment to the determination of $Y_s$. To reduce that impediment, we can send $Y(t)$ through a band-pass filter, i.e., a filter with a shape like that of Fig. 5.9. For such a filter, with central frequency $f_o$ and with bandwidth $\Delta f \ll f_o$, the bandwidth is defined by
$$\Delta f \equiv \frac{\int_0^\infty |\tilde K(f)|^2\, df}{|\tilde K(f_o)|^2} \,. \tag{5.45b}$$
The output $W(t)$ of such a filter, when $Y(t)$ is sent in, will have the form
$$W(t) = |\tilde K(f_o)|\, \sqrt{2}\, Y_s \cos(2\pi f_o t + \delta_1) + w(t) \,, \tag{5.45c}$$

where the first term is the filtered signal and the second is the filtered noise. The output signal's phase $\delta_1$ may be different from the input signal's phase $\delta_o$, but that difference can be evaluated in advance for one's filter and can be taken into account in the measurement of $Y_s$, and thus it is of no interest to us. Assuming, as we shall, that the input noise $y(t)$ has spectral density $S_y$ which varies negligibly over the small bandwidth of the filter, the filtered noise $w(t)$ will have spectral density
$$S_w(f) = |\tilde K(f)|^2\, S_y(f_o) \,. \tag{5.45d}$$

Correspondingly, by virtue of Eq. (5.32c) for the rms fluctuations of a random process at various frequencies and on various timescales, $w(t)$ will have the form
$$w(t) = w_o(t)\cos[2\pi f_o t + \phi(t)] \,, \tag{5.45e}$$
with an amplitude $w_o(t)$ and phase $\phi(t)$ that fluctuate randomly on timescales $\Delta t \sim 1/\Delta f$, but that are nearly constant on timescales $\Delta t \ll 1/\Delta f$. Here $\Delta f$ is the bandwidth of the filter, and hence [Eq. (5.45d)] the bandwidth within which $S_w(f)$ is concentrated. The filter's net output $W(t)$ thus consists of a precisely sinusoidal signal at frequency $f_o$, with known phase $\delta_1$, and with an amplitude that we wish to determine, plus a noise $w(t)$ that is also sinusoidal at frequency $f_o$ but that has amplitude and phase which wander randomly on timescales $\Delta t \sim 1/\Delta f$. The rms output signal is
$$S \equiv |\tilde K(f_o)|\, Y_s \tag{5.45f}$$
[Eq. (5.45c)], while the rms output noise is
$$N \equiv \sigma_w = \left[\int_0^\infty S_w(f)\, df\right]^{1/2} = \sqrt{S_y(f_o)}\left[\int_0^\infty |\tilde K(f)|^2\, df\right]^{1/2} = |\tilde K(f_o)|\, \sqrt{S_y(f_o)\,\Delta f} \,, \tag{5.45g}$$
where the first equality follows from Eq. (5.22), the second from Eq. (5.45d), and the third from the definition (5.45b) of the bandwidth $\Delta f$. The ratio of the rms signal (5.45f) to the rms noise (5.45g) after filtering is
$$\frac{S}{N} = \frac{Y_s}{\sqrt{S_y(f_o)\,\Delta f}} \,. \tag{5.46}$$

Thus, the rms output $S + N$ of the filter gives the filtered signal amplitude to within an rms fractional error $N/S$, which is the reciprocal of (5.46). Notice that the narrower the filter's bandwidth, the more accurate will be the measurement of the signal. In practice, of course, one does not know the signal frequency with complete precision in advance, and correspondingly one does not want to make one's filter so narrow that the signal might be lost from it.

A simple example of a band-pass filter is the following finite-Fourier-transform filter:
$$w(t) = \int_{t-\Delta t}^{t} \cos[2\pi f_o (t - t')]\, y(t')\, dt' \quad \text{where } \Delta t \gg 1/f_o \,. \tag{5.47a}$$
In Ex. 5.2 it is shown that this is indeed a band-pass filter, and that the integration time $\Delta t$ used in the Fourier transform is related to the filter's bandwidth by
$$\Delta f = \frac{1}{\Delta t} \,. \tag{5.47b}$$

This is precisely the relation (5.33) that we introduced when discussing the temporal characteristics of a random process; and (setting the filter's "gain" $|\tilde K(f_o)|$ to unity), Eq. (5.45g) for the rms noise after filtering, rewritten as $N = \sigma_w = \sqrt{S_w(f_o)\,\Delta f}$, is precisely expression (5.34) for the rms fluctuations in the random process $w(t)$ at frequency $f_o$ and on timescale $\Delta t = 1/\Delta f$.
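The bandwidth definition (5.45b) and the result (5.47b) can be checked numerically for the finite-Fourier-transform filter (5.47a). The Python sketch below uses the closed-form transform of its kernel, $K(\tau) = \cos(2\pi f_o\tau)$ for $0 < \tau < \Delta t$, integrates $|\tilde K(f)|^2$ over a truncated frequency range, and also evaluates the signal-to-noise ratio (5.46). The values $f_o = 50$ Hz, $\Delta t = 1$ s and 2 s, and the numbers fed to the $S/N$ formula are illustrative assumptions. With $f_o\Delta t$ an integer, the bandwidth is expected to come out very close to $1/\Delta t$.

```python
import cmath
import math


def finite_ft_filter_transform(f, f0, window):
    """K~(f) for the finite-Fourier-transform filter (5.47a), whose kernel is
    K(tau) = cos(2 pi f0 tau) for 0 < tau < window and zero otherwise:
    K~(f) = [g(f + f0) + g(f - f0)] / 2, g(nu) = integral_0^window exp(i 2 pi nu tau) dtau."""
    def g(nu):
        if abs(nu) < 1e-12:
            return complex(window)
        return (cmath.exp(2j * math.pi * nu * window) - 1.0) / (2j * math.pi * nu)
    return 0.5 * (g(f + f0) + g(f - f0))


def bandwidth(f0, window, f_max=500.0, df=0.01):
    """Eq. (5.45b): integral_0^inf |K~|^2 df / |K~(f0)|^2, with the integral
    truncated at f_max and evaluated by the midpoint rule."""
    n = int(round(f_max / df))
    total = sum(abs(finite_ft_filter_transform((k + 0.5) * df, f0, window)) ** 2
                for k in range(n)) * df
    return total / abs(finite_ft_filter_transform(f0, f0, window)) ** 2


def signal_to_noise(y_s, s_y_at_f0, delta_f):
    """Eq. (5.46): S/N = Y_s / sqrt(S_y(f0) * delta_f)."""
    return y_s / math.sqrt(s_y_at_f0 * delta_f)


# Illustrative numbers (assumptions): f0 = 50 Hz, integration times 1 s and 2 s.
results = {window: bandwidth(50.0, window) for window in (1.0, 2.0)}
for window, bw in results.items():
    print(window, bw, 1.0 / window)

# Hypothetical signal: Y_s = 1e-3, S_y(f0) = 1e-8 per Hz, 1 Hz bandwidth.
print(signal_to_noise(1e-3, 1e-8, 1.0))
```

Doubling the integration time halves $\Delta f$ and so, by (5.46), raises $S/N$ by a factor $\sqrt{2}$.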

[Figure 5.10: (a) A broad-band pulse $F(\tau)$, of duration $\tau_p$, that produces shot noise by arriving at random times. (b) The spectral density $S_y(f)$ of the shot noise produced by that pulse, which cuts off at $f \sim 1/\tau_p$.]

5.5.3

Shot Noise

A specific kind of noise that one frequently meets and frequently wants to filter is shot noise. A random process $y(t)$ is said to consist of shot noise if it is a random superposition of a large number of pulses. In this chapter we shall restrict attention to a simple variant of shot noise in which the pulses all have identically the same shape, $F(\tau)$ [e.g., Fig. 5.10(a)], but their arrival times $t_i$ are random:
$$y(t) = \sum_i F(t - t_i) \,. \tag{5.48a}$$
We denote by $R$ the mean rate of pulse arrivals (the mean number per second). It is straightforward, from the definition (5.21) of spectral density, to see that the spectral density of $y$ is
$$S_y(f) = 2R\,|\tilde F(f)|^2 \,, \tag{5.48b}$$
where $\tilde F(f)$ is the Fourier transform of $F(\tau)$ [e.g., Fig. 5.10(b)]. Note that, if the pulses are broad-band bursts without much substructure in them [as in Fig. 5.10(a)], then the duration $\tau_p$ of the pulse is related to the frequency $f_{\max}$ at which the spectral density starts to cut off by $f_{\max} \sim 1/\tau_p$; and since the correlation function is the cosine transform of the spectral density, the relaxation time in the correlation function is $\tau_r \sim 1/f_{\max} \sim \tau_p$.

In the common (but not universal) case that many pulses are on at once on average, $R\tau_p \gg 1$, $y(t)$ at any moment of time is the sum of many random processes; and, correspondingly, the central limit theorem guarantees that $y$ is a Gaussian random process. Over time intervals smaller than $\tau_p \sim \tau_r$ the process will not generally be Markov, because a knowledge of both $y(t_1)$ and $y(t_2)$ gives some rough indication of how many pulses happen to be on and how many new ones turned on during the time interval between $t_1$ and $t_2$ and thus are still in their early stages at time $t_3$; and this knowledge helps one predict $y(t_3)$ with greater confidence than if one knew only $y(t_2)$. In other words, $P_3(y_3, t_3|y_2, t_2; y_1, t_1)$ is not equal to $P_2(y_3, t_3|y_2, t_2)$, which implies non-Markovian behavior.
On the other hand, if many pulses are on at once, and if one takes a coarse-grained view of time, never examining time intervals as short as $\tau_p$ or shorter, then a knowledge of $y(t_1)$ is of no special help in predicting $y(t_2)$: all correlations between different times are lost, and the process is Markov. Because it is a random superposition of many independent influences it is also Gaussian (an example of the central limit theorem at work), and it thus must have the standard Gaussian-Markov spectral density (5.29b) with vanishing correlation time $\tau_r$, i.e., it must be white. Indeed, it is: the limit of Eq. (5.48b) for $f \ll 1/\tau_p$ and the corresponding correlation function are

$$S_y(f) = 2R\,|\tilde F(0)|^2 , \qquad C_y(\tau) = R\,|\tilde F(0)|^2\,\delta(\tau) . \tag{5.48c}$$
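As a rough numerical illustration of Eqs. (5.48a,b), the sketch below superposes randomly timed pulses and compares the sample mean and variance of $y$ with predictions. Integrating (5.48b) over all $f$ and using Parseval's theorem gives a variance $R\int F^2\,d\tau$, while the mean is $\bar y = R\int F\,d\tau$ ($R$ pulses per unit time, each contributing $\int F\,d\tau$). The Poisson arrival statistics, the one-sided exponential pulse $F(\tau) = e^{-\tau/\tau_p}$ for $\tau \ge 0$, the function name, and all parameter values are assumptions made for this example only; for this pulse $\int F\,d\tau = \tau_p$ and $\int F^2\,d\tau = \tau_p/2$.

```python
import numpy as np

def simulate_shot_noise(rate, tau_p, t_total, dt, seed=0):
    """Sample y(t) = sum_i F(t - t_i), Eq. (5.48a), on a grid of spacing dt.

    Assumptions (illustration only): arrival times form a Poisson process with
    mean rate `rate`, and the pulse is F(tau) = exp(-tau/tau_p) for tau >= 0.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_total / dt))
    # number of pulses that switch on in each time bin
    arrivals = rng.poisson(rate * dt, size=n_steps).astype(float)
    # sampled pulse shape, truncated after 10 e-folding times
    pulse = np.exp(-np.arange(0.0, 10.0 * tau_p, dt) / tau_p)
    # y is the convolution of the arrival train with the pulse; zero-padded FFTs
    # of length >= n_steps + pulse.size avoid wrap-around
    n_fft = 1 << int(np.ceil(np.log2(n_steps + pulse.size)))
    y = np.fft.irfft(np.fft.rfft(arrivals, n_fft) * np.fft.rfft(pulse, n_fft), n_fft)[:n_steps]
    # discard the start-up transient (pulses that began before t = 0 are missing)
    return y[pulse.size:]

rate, tau_p = 200.0, 0.05            # R * tau_p = 10 pulses on at once, on average
y = simulate_shot_noise(rate, tau_p, t_total=400.0, dt=1e-3)
mean_pred = rate * tau_p             # R * (integral of F)
var_pred = rate * tau_p / 2          # R * (integral of F^2), from (5.48b) plus Parseval
print(y.mean() / mean_pred, y.var() / var_pred)   # both ratios should be close to 1
```

The choice $R\tau_p = 10$ puts the process in the many-pulses-on regime discussed above; sampling at $dt = 10^{-3}$ s introduces biases of order $dt/\tau_p$, i.e. a couple of percent.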

****************************

EXERCISES

Exercise 5.1 Practice: Spectral density of the sum of two random processes
Let $u$ and $v$ be two random processes. Show that

$$S_{u+v}(f) = S_u(f) + S_v(f) + S_{uv}(f) + S_{vu}(f) = S_u(f) + S_v(f) + 2\,\Re S_{uv}(f) . \tag{5.49}$$

Exercise 5.2 Derivation and Example: Bandwidths of a finite-Fourier-transform filter and an averaging filter
(a) If $y$ is a random process with spectral density $S_y(f)$, and $w(t)$ is the output of the finite-Fourier-transform filter (5.47a), what is $S_w(f)$?
(b) Draw a sketch of the filter function $|\tilde K(f)|^2$ for this finite-Fourier-transform filter, and show that its bandwidth is given by (5.47b).
(c) An "averaging filter" is one which averages its input over some fixed time interval $\Delta t$:

$$w(t) \equiv \frac{1}{\Delta t}\int_{t-\Delta t}^{t} y(t')\,dt' . \tag{5.50a}$$

What is $|\tilde K(f)|^2$ for this filter? Draw a sketch of this $|\tilde K(f)|^2$.
(d) Suppose that $y(t)$ has a spectral density that is very nearly constant at all frequencies $f \lesssim 1/\Delta t$, and that this $y$ is put through the averaging filter (5.50a). Show that the rms fluctuations in the averaged output $w(t)$ are

$$\sigma_w = \sqrt{S_y(0)\,\Delta f} , \tag{5.50b}$$

where $\Delta f$, interpretable as the bandwidth of the averaging filter, is

$$\Delta f = \frac{1}{2\Delta t} . \tag{5.50c}$$

(Recall that in our formalism we insist that $f$ be nonnegative.) Why the factor 1/2 here and no 1/2 for the finite-Fourier-transform filter, Eq. (5.47b)? Because here, with $f$ restricted to nonnegative frequencies and the filter centered on zero frequency, we see only the right half of the filter: $f \ge f_o = 0$ in Fig. 5.9.
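A quick Monte Carlo check of (5.50b,c), under the assumption that $y$ is discretely sampled white noise: independent Gaussian samples of variance $\sigma^2$ at spacing $\delta t$, whose correlation function is $\sigma^2\delta t\,\delta(\tau) = (S_y/2)\,\delta(\tau)$, so that the one-sided spectral density is $S_y = 2\sigma^2\delta t$ (flat up to the Nyquist frequency $1/2\delta t$). The averaging integral (5.50a) then becomes a mean over $\Delta t/\delta t$ samples. The function name and all numbers are hypothetical.

```python
import numpy as np

def averaging_filter_rms(sigma, dt, avg_time, n_windows, seed=1):
    """rms of w = (1/Δt) ∫ y dt' (Eq. 5.50a) for discretely sampled white noise y."""
    rng = np.random.default_rng(seed)
    n_per_window = int(round(avg_time / dt))
    y = rng.normal(0.0, sigma, size=(n_windows, n_per_window))
    w = y.mean(axis=1)            # discrete version of the averaging integral
    return w.std()

sigma, dt, avg_time = 1.0, 1e-3, 0.1       # hypothetical values; Δt spans 100 samples
s_y0 = 2.0 * sigma**2 * dt                 # one-sided spectral density of the samples
delta_f = 1.0 / (2.0 * avg_time)           # Eq. (5.50c)
predicted = np.sqrt(s_y0 * delta_f)        # Eq. (5.50b)
measured = averaging_filter_rms(sigma, dt, avg_time, n_windows=20_000)
print(measured / predicted)                # should be close to 1
```

Using $\Delta f = 1/\Delta t$ instead of $1/(2\Delta t)$ would make `predicted` too large by $\sqrt 2$, which is the factor-of-1/2 point made above.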

Exercise 5.3 ***Example: Wiener's Optimal Filter
Suppose that you have a noisy receiver of weak signals (a radio telescope, or a gravitational-wave detector, or . . .). You are expecting a signal $s(t)$ with finite duration and known form to come in, beginning at a predetermined time $t = 0$, but you are not sure whether it is present or not. If it is present, then your receiver's output will be

$$Y(t) = s(t) + y(t) , \tag{5.51a}$$

where $y(t)$ is the receiver's noise, a random process with spectral density $S_y(f)$ and with zero mean, $\bar y = 0$. If it is absent, then $Y(t) = y(t)$. A powerful way to find out whether the signal is present or not is to pass $Y(t)$ through a filter with a carefully chosen kernel $K(t)$. More specifically, compute the number

$$W \equiv \int_{-\infty}^{+\infty} K(t)\,Y(t)\,dt . \tag{5.51b}$$

If $K(t)$ is chosen optimally, then $W$ will be maximally sensitive to the signal $s(t)$ and minimally sensitive to the noise $y(t)$; correspondingly, if $W$ is large you will infer that the signal was present, and if it is small you will infer that the signal was absent. This exercise derives the form of the optimal filter $K(t)$, i.e., the filter that will most effectively discern whether the signal is present or not. As tools in the derivation we use the quantities $S$ and $N$ defined by

$$S \equiv \int_{-\infty}^{+\infty} K(t)\,s(t)\,dt , \qquad N \equiv \int_{-\infty}^{+\infty} K(t)\,y(t)\,dt . \tag{5.51c}$$

Note that $S$ is the filtered signal, $N$ is the filtered noise, and $W = S + N$. Since $K(t)$ and $s(t)$ are precisely defined functions, $S$ is a number; but since $y(t)$ is a random process, the value of $N$ is not predictable, and instead is given by some probability distribution $p_1(N)$. We shall also need the Fourier transform $\tilde K(f)$ of the kernel $K(t)$.
(a) In the measurement being done one is not filtering a function of time to get a new function of time; rather, one is just computing a number, $W = S + N$. Nevertheless, as an aid in deriving the optimal filter it is helpful to consider the time-dependent output of the filter which results when noise $y(t)$ is fed continuously into it:

$$N(t) \equiv \int_{-\infty}^{+\infty} K(t - t')\,y(t')\,dt' . \tag{5.52a}$$

Show that this random process has a mean squared value

$$\overline{N^2} = \int_0^\infty |\tilde K(f)|^2\,S_y(f)\,df . \tag{5.52b}$$

Explain why this quantity is equal to the average of the number $N^2$ computed via (5.51c) in an ensemble of many experiments:

$$\overline{N^2} = \langle N^2 \rangle \equiv \int p_1(N)\,N^2\,dN = \int_0^\infty |\tilde K(f)|^2\,S_y(f)\,df . \tag{5.52c}$$

(b) Show that of all choices of $K(t)$, the one that will give the largest value of

$$\frac{S}{\langle N^2 \rangle^{1/2}} \tag{5.52d}$$

is Norbert Wiener's (1949) optimal filter: the $K(t)$ whose Fourier transform $\tilde K(f)$ is given by

$$\tilde K(f) = \text{const} \times \frac{\tilde s(f)}{S_y(f)} , \tag{5.53a}$$

where $\tilde s(f)$ is the Fourier transform of the signal $s(t)$ and $S_y(f)$ is the spectral density of the noise. Note that when the noise is white, so $S_y(f)$ is independent of $f$, this optimal filter function is just $K(t) = \text{const} \times s(t)$; i.e., one should simply multiply the known signal form into the receiver's output and integrate. On the other hand, when the noise is not white, the optimal filter (5.53a) is a distortion of $\text{const} \times s(t)$ in which frequency components at which the noise is large are suppressed, while frequency components at which the noise is small are enhanced.
(c) Show that when the optimal filter (5.53a) is used, the square of the signal-to-noise ratio is

$$\frac{S^2}{\langle N^2 \rangle} = 4\int_0^\infty \frac{|\tilde s(f)|^2}{S_y(f)}\,df . \tag{5.53b}$$

Exercise 5.4 ***Example: Allan Variance of Clocks³
Highly stable clocks (e.g., rubidium clocks or hydrogen maser clocks) have angular frequencies $\omega$ of ticking which tend to wander so much over long time scales that their variances are divergent. More specifically, they typically show random-walk noise on long time scales (low frequencies),

$$S_\omega(f) \propto 1/f^2 \quad \text{at low } f ; \tag{5.54a}$$

and correspondingly,

$$\sigma_\omega^2 = \int_0^\infty S_\omega(f)\,df = \infty ; \tag{5.54b}$$

cf. Fig. 5.7 and the associated discussion. For this reason, clock makers have introduced a special technique for quantifying the frequency fluctuations of their clocks. They define

$$\phi(t) = \int_0^t \omega(t')\,dt' = (\text{phase}) , \tag{5.55a}$$

$$\Phi_\tau(t) = \frac{[\phi(t+2\tau) - \phi(t+\tau)] - [\phi(t+\tau) - \phi(t)]}{\sqrt{2}\,\bar\omega\,\tau} , \tag{5.55b}$$

where $\bar\omega$ is the mean frequency. Aside from the $\sqrt 2$, this is the fractional difference of clock readings for two successive intervals of duration $\tau$. [In practice the measurement of $t$ is made by a clock more accurate than the one being studied; or, if a more accurate clock is not available, by a clock or ensemble of clocks of the same type as is being studied.]

³ For a readable review article on how to characterize frequency fluctuations of clocks, see Rutman (1978).
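The definitions (5.55a,b) translate directly into an estimator for the Allan variance defined in part (b). The sketch below assumes $\omega(t)$ is sampled at spacing $\delta t$ and, purely for illustration, that its fluctuations are white and independent from sample to sample with variance $\sigma^2$. In that case the two phase increments in (5.55b) are independent sums of $n = \tau/\delta t$ samples, so $\sigma_\tau^2 = \sigma^2/(\bar\omega^2 n) = \sigma^2\delta t/(\bar\omega^2\tau)$ exactly, which falls as $1/\tau$, as claimed for white noise in part (c). The function name and parameters are hypothetical.

```python
import numpy as np

def allan_variance(omega, dt, n):
    """Estimate sigma_tau^2 = <Phi_tau^2> from Eqs. (5.55a,b) with tau = n * dt.

    omega: samples of the angular frequency omega(t) at spacing dt.
    """
    omega_bar = omega.mean()
    tau = n * dt
    # phase phi(t) = integral of omega dt', Eq. (5.55a), sampled every tau
    phi = np.concatenate(([0.0], np.cumsum(omega) * dt))[::n]
    # [phi(t+2tau) - phi(t+tau)] - [phi(t+tau) - phi(t)] for successive t
    second_diff = phi[2:] - 2.0 * phi[1:-1] + phi[:-2]
    Phi = second_diff / (np.sqrt(2.0) * omega_bar * tau)    # Eq. (5.55b)
    return np.mean(Phi**2)

rng = np.random.default_rng(2)
omega_bar, sigma, dt = 1000.0, 1.0, 1e-3                     # hypothetical clock parameters
omega = omega_bar + sigma * rng.standard_normal(2_000_000)   # white frequency noise
for n in (10, 40, 160):
    # averaging time, estimated Allan variance, white-noise prediction
    print(n * dt, allan_variance(omega, dt, n), sigma**2 / (omega_bar**2 * n))
```

Successive estimates of $\Phi_\tau$ share one interval and so are correlated; this does not bias the mean of $\Phi_\tau^2$, it only makes the statistical error somewhat larger than for independent samples.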

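(a) Show that the spectral density of $\Phi_\tau(t)$ is related to that of $\omega(t)$ by

$$S_{\Phi_\tau}(f) = \frac{2}{\bar\omega^2}\left(\frac{\cos 2\pi f\tau - 1}{2\pi f\tau}\right)^2 S_\omega(f) \;\propto\; \begin{cases} f^2\,S_\omega(f) & \text{at } f \ll 1/2\pi\tau , \\ f^{-2}\,S_\omega(f) & \text{at } f \gg 1/2\pi\tau . \end{cases} \tag{5.56}$$

Note that $S_{\Phi_\tau}(f)$ is much better behaved (more strongly convergent when integrated) than $S_\omega(f)$, both at low frequencies and at high.
(b) The Allan variance of the clock is defined as

$$\sigma_\tau^2 \equiv [\text{variance of } \Phi_\tau(t)] = \int_0^\infty S_{\Phi_\tau}(f)\,df . \tag{5.57}$$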

Show that

$$\sigma_\tau = \left[\alpha\,\frac{S_\omega(1/2\tau)}{\bar\omega^2}\,\frac{1}{2\tau}\right]^{1/2} , \tag{5.58}$$

where $\alpha$ is a constant of order unity which depends on the spectral shape of $S_\omega(f)$ near $f = 1/2\tau$.
(c) Show that if $\omega$ has a white-noise spectrum, then the clock stability is better for long averaging times than for short [$\sigma_\tau \propto 1/\sqrt\tau$]; that if $\omega$ has a flicker-noise spectrum, then the clock stability is independent of averaging time; and if $\omega$ has a random-walk spectrum, then the clock stability is better for short averaging times than for long [$\sigma_\tau \propto \sqrt\tau$].

Exercise 5.5 ***Example: Cosmological Density Fluctuations
Random processes can be stochastic functions of some other variable or variables rather than time. For example, we can describe relative density fluctuations in the large-scale distribution of mass in the universe using the quantity

$$\delta(\mathbf x) \equiv \frac{\rho(\mathbf x) - \langle\rho\rangle}{\langle\rho\rangle} \tag{5.59}$$

(not to be confused with a Dirac delta function). This is a function of three-dimensional position rather than one-dimensional time, and $\langle\ldots\rangle$ is to be interpreted conceptually as an ensemble average and practically as a volume average.
(a) Define the Fourier transform of $\delta$ over some large averaging volume $V$ by

$$\tilde\delta_V(\mathbf k) = \int_V d^3x\, e^{i\mathbf k\cdot\mathbf x}\,\delta(\mathbf x) , \tag{5.60a}$$

and define a spectral density by

$$P_\delta(\mathbf k) \equiv \lim_{V\to\infty} \frac{1}{V}\,|\tilde\delta_V(\mathbf k)|^2 . \tag{5.60b}$$

(Note that we here use cosmologists' normalization for $P_\delta$, which is different from our normalization for a random process in time; we do not fold negative values of $k_j$ onto positive values.) Show that the two-point correlation function for cosmological density fluctuations, defined by

$$\xi_\delta(\mathbf r) \equiv \langle \delta(\mathbf x)\,\delta(\mathbf x + \mathbf r)\rangle , \tag{5.60c}$$

is given by the following version of the Wiener-Khintchine equations:

$$\xi_\delta(r) = \int \frac{d^3k}{(2\pi)^3}\, e^{-i\mathbf k\cdot\mathbf r}\,P_\delta(\mathbf k) = \int_0^\infty \frac{dk}{2\pi^2}\, k^2\,\mathrm{sinc}(kr)\,P_\delta(k) , \tag{5.60d}$$

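where $\mathrm{sinc}\,x \equiv \sin x / x$ and we have used the fact that the universe is isotropic to obtain the second identity.
(b) Show that the variance of the total mass $M$ inside a sphere of radius $R$, expressed as a fraction of its mean (i.e., the variance of $\delta M/\langle M\rangle$), is

$$\sigma_M^2 = \int_0^\infty \frac{dk}{2\pi^2}\, k^2\,P_\delta(k)\,W^2(kR) , \tag{5.60e}$$

where

$$W(x) = \frac{3(\mathrm{sinc}\,x - \cos x)}{x^2} . \tag{5.60f}$$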

****************************
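For Exercise 5.5(b), one way to sanity-check Eqs. (5.60e,f) numerically is the following. If $P_\delta$ were a constant $P_0$ (a hypothetical "white" spectrum), Eq. (5.60d) would give $\xi_\delta(r) = P_0\,\delta^3(\mathbf r)$, an uncorrelated field, whose average over a sphere of volume $V = 4\pi R^3/3$ has variance $P_0/V$. The sketch integrates (5.60e) with the trapezoidal rule and compares. Note that the text's $\mathrm{sinc}\,x = \sin x/x$ differs from NumPy's normalized `np.sinc`; the helper names are hypothetical.

```python
import numpy as np

def sinc(x):
    """sinc x = sin x / x, the convention of Eq. (5.60d); np.sinc is sin(pi x)/(pi x)."""
    return np.sinc(x / np.pi)

def top_hat_window(x):
    """W(x) = 3 (sinc x - cos x) / x^2, Eq. (5.60f), using W ≈ 1 - x^2/10 near x = 0."""
    x = np.asarray(x, dtype=float)
    small = np.abs(x) < 1e-3
    xs = np.where(small, 1.0, x)                 # placeholder avoids 0/0 on the small-x branch
    exact = 3.0 * (sinc(xs) - np.cos(xs)) / xs**2
    return np.where(small, 1.0 - x**2 / 10.0, exact)

def mass_variance(power, radius, k_max, n_k=400_001):
    """Eq. (5.60e) evaluated by the trapezoidal rule on 0 <= k <= k_max."""
    k = np.linspace(0.0, k_max, n_k)
    integrand = k**2 * power(k) * top_hat_window(k * radius) ** 2 / (2.0 * np.pi**2)
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(k))

# hypothetical check: a constant spectrum P_delta = P0 should give P0 / (4 pi R^3 / 3)
P0, R = 1.0, 1.0
sigma2 = mass_variance(lambda k: np.full_like(k, P0), R, k_max=2000.0)
print(sigma2, P0 / (4.0 * np.pi * R**3 / 3.0))
```

The integrand falls off only as $1/k^2$ at large $k$, so the cutoff $k_{\max} = 2000/R$ leaves a relative truncation error of roughly $5\times10^{-4}$.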

5.6 Fluctuation-Dissipation Theorem

5.6.1 Generalized Coordinate and its Impedance

In the remainder of this chapter, we use the theory of random processes to study the evolution of a semiclosed system which is interacting weakly with a heat bath. For example, we shall study the details of how an ensemble of such systems moves from a very well known state, with low entropy and with its systems concentrated in a tiny region of phase space, into statistical equilibrium with high entropy and systems spread out widely over phase space. We develop two tools to aid in analyzing such situations: the fluctuation-dissipation theorem (this section) and the Fokker-Planck equation (next section). The fluctuation-dissipation theorem describes the behavior of any generalized coordinate $q$ of any system that is weakly coupled to a thermalized bath with many degrees of freedom. For example: (i) $q$ could be the $x$ coordinate of a dust particle floating in air, and the bath would then consist of the air molecules that buffet it. (ii) $q$ could be the electric charge on a capacitor in an electrical circuit, and the bath would then consist of the thermalized internal degrees of freedom of all the resistors in the circuit.

(iii) $q$ could be the horizontal position of a pendulum in vacuum, and the bath would then consist of the vibrational normal modes ("phonon modes") of the pendulum's wire and its overhead support. (iv) $q$ could be the location of the front face of a mirror as measured by a reflecting laser beam, i.e.,

$$q = \int \frac{e^{-r^2/r_o^2}}{\pi r_o^2}\, z(r,\phi)\, r\,d\phi\,dr , \tag{5.61}$$

where $z(r,\phi)$ is the longitudinal location of the mirror face at transverse position $(r,\phi)$ and $r_o$ is the transverse radius at which the beam's Gaussian energy-flux profile has dropped to $1/e$ of its central value. In this case the bath would consist of the mirror's vibrational normal modes (phonon modes). This last example, due to Levin (1998), illustrates the fact that $q$ need not be the generalized coordinate of an oscillator or a free mass. In this case, instead, $q$ is a linear superposition of the generalized coordinates of many different oscillators (the mirror's normal modes whose eigenfunctions entail significant motion of the mirror's front face). See Ex. 5.8 for further detail on this example.

When a sinusoidal external force $F = F_o e^{-i\omega t}$ acts on the generalized coordinate $q$ [so $q$'s canonically conjugate momentum $p$ is driven as $(dp/dt)_{\rm drive} = F_o e^{-i\omega t}$], then the velocity of the resulting sinusoidal motion will be

$$\dot q \equiv \frac{dq}{dt} = -i\omega q = \frac{1}{Z(\omega)}\,F_o e^{-i\omega t} , \tag{5.62a}$$

where the real part of each expression is to be taken. The ratio $Z(\omega)$ of force to velocity, which appears here (before the real part is taken), is $q$'s complex impedance; it is determined by the system's details. If the system were completely conservative, then the impedance would be purely imaginary, $Z = iI$. For example, for a freely moving dust particle in vacuum [(i) above with the air removed], driven by a sinusoidal force, the momentum is $p = m\dot q$ (where $m$ is the particle's mass), the equation of motion is $F = dp/dt = m(d/dt)\dot q = m(-i\omega)\dot q$, and so the impedance is $Z = -im\omega$, which is purely imaginary.

The bath prevents the system from being conservative: energy can be fed back and forth between the generalized coordinate $q$ and the bath's many degrees of freedom. This energy coupling influences the generalized coordinate $q$ in two important ways. First, it changes the impedance $Z(\omega)$ from purely imaginary to complex,

$$Z(\omega) = iI(\omega) + R(\omega) , \tag{5.62b}$$

where $R$ is the (frictional) resistance experienced by $q$; correspondingly, when the sinusoidal force $F = F_o e^{-i\omega t}$ is applied, the resulting motions of $q$ feed energy into the bath, frictionally dissipating power at a rate $W_{\rm diss} = \langle \Re(F)\,\Re(\dot q)\rangle = \langle \Re(F_o e^{-i\omega t})\,\Re(F_o e^{-i\omega t}/Z)\rangle = \langle F_o\cos\omega t\,[\Re(1/Z)\cos\omega t + \Im(1/Z)\sin\omega t]\,F_o\rangle$. The $\sin\omega t\cos\omega t$ term averages to zero, $\langle\cos^2\omega t\rangle = 1/2$, and $\Re(1/Z) = R/|Z|^2$, so

$$W_{\rm diss} = \frac{1}{2}\,\frac{R}{|Z|^2}\,F_o^2 . \tag{5.63}$$

Second, the thermal motions of the bath exert a randomly fluctuating force $F'(t)$ on $q$, driving its generalized momentum as $(dp/dt)_{\rm drive} = F'$. This is illustrated by the following two examples.
(i) For a dust particle floating in air with $q$ the particle's $x$ coordinate, the air molecules produce a frictional force $-R\dot q$ (with $R$ the coefficient of friction) that slows the particle down, and they also produce the fluctuating force $F'$ that buffets it randomly. We identify the impedance by writing down the equation of motion with a sinusoidal force $F = F_o e^{-i\omega t}$ imposed and the fluctuating force $F'$ ignored: $F = dp/dt + R\dot q = m\,d\dot q/dt + R\dot q = [m(-i\omega) + R]\,\dot q$, from which we read off $Z(\omega) = F/\dot q = -im\omega + R$. Thus, as expected, the coefficient of friction $R$ is the real part of the impedance, i.e., it is the resistance.
(ii) For a circuit that contains an inductance $L$, capacitance $C$, and resistance $R$ (resistance in the sense of electrical circuit theory), the bath is the many thermalized degrees of freedom in the resistor; they produce the resistance $R$ and also produce a fluctuating voltage $V'(t)$ across the resistor. We choose as the generalized coordinate the charge $q$ on the capacitor (so $\dot q$ is the current in the circuit), and we can then identify the generalized momentum by shutting off the bath, writing down the Lagrangian for the resulting L-C circuit, $\mathcal L = \frac12 L\dot q^2 - \frac12 q^2/C$, and computing $p = \partial\mathcal L/\partial\dot q = L\dot q$. (Equally well, we can identify $p$ from one of Hamilton's equations for the Hamiltonian $H = p^2/2L + \frac12 q^2/C$.) We then evaluate the impedance from the equation of motion for this Lagrangian with the bath's resistance added (but not its fluctuating voltage), and with a sinusoidal voltage $V = V_o e^{-i\omega t}$ imposed:

$$\frac{dp}{dt} + \frac{q}{C} + R\dot q = L\frac{d\dot q}{dt} + \frac{q}{C} + R\dot q = \left(-i\omega L + \frac{1}{-i\omega C} + R\right)\dot q = V_o e^{-i\omega t} . \tag{5.64a}$$

Evidently, $V = V_o e^{-i\omega t}$ is the generalized force $F$ that drives the generalized momentum, and the complex impedance (ratio of force to velocity) is

$$Z(\omega) = \frac{V}{\dot q} = -i\omega L + \frac{1}{-i\omega C} + R . \tag{5.64b}$$

This is identical to the impedance as defined in the standard theory of electrical circuits (which is what motivates our "$F/\dot q$" definition of impedance), and, as expected, the real part of this impedance is the circuit's resistance $R$.
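The impedance (5.64b) and the dissipation formula (5.63) can be checked against each other numerically. The sketch evaluates $Z(\omega)$ for a series L-C-R circuit with hypothetical component values, confirms that its real part is $R$ and that its imaginary part vanishes at $\omega_0 = 1/\sqrt{LC}$, and compares the time average $\langle\Re(V)\,\Re(\dot q)\rangle$ over whole periods with (5.63). At resonance $|Z| = R$, so (5.63) reduces to $V_o^2/2R$. The function names are illustrative, not from the text.

```python
import numpy as np

def lcr_impedance(omega, L, C, R):
    """Complex impedance of the series L-C-R circuit, Eq. (5.64b)."""
    return -1j * omega * L + 1.0 / (-1j * omega * C) + R

def dissipated_power_time_average(V0, Z, omega, n_periods=50, n_per_period=2000):
    """Average of Re(V) * Re(qdot) over whole periods, with qdot = V0 e^{-i w t} / Z."""
    period = 2.0 * np.pi / omega
    t = np.linspace(0.0, n_periods * period, n_periods * n_per_period, endpoint=False)
    V = V0 * np.exp(-1j * omega * t)
    qdot = V / Z
    return np.mean(V.real * qdot.real)

L, C, R, V0 = 1e-3, 1e-6, 10.0, 2.0      # hypothetical values (H, F, ohm, V)
omega_0 = 1.0 / np.sqrt(L * C)           # resonance, where Im Z vanishes

for omega in (0.5 * omega_0, omega_0, 2.0 * omega_0):
    Z = lcr_impedance(omega, L, C, R)
    w_avg = dissipated_power_time_average(V0, Z, omega)
    w_563 = 0.5 * Z.real / abs(Z) ** 2 * V0**2       # Eq. (5.63)
    print(f"omega/omega_0 = {omega / omega_0:.1f}: Re Z = {Z.real:.3f}, "
          f"W_avg = {w_avg:.6e}, W(5.63) = {w_563:.6e}")
```

Sampling an integer number of periods uniformly makes the discrete average of the oscillating cross term vanish to rounding error, so the two power columns agree essentially exactly.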

5.6.2 Fluctuation-Dissipation Theorem for Generalized Coordinate Interacting with Thermalized Heat Bath

Because the fluctuating force $F'$ (equal to the fluctuating voltage $V'$ in the case of the circuit) and the resistance $R$ to an external force both arise from interaction with the same heat bath, there is an intimate connection between them. For example, the stronger the coupling to the bath, the stronger will be the resistance $R$ and the stronger will be $F'$. The precise relationship between the dissipation embodied in $R$ and the fluctuations embodied in $F'$ is given by the following formula for the spectral density $S_{F'}(f)$ of $F'$:

$$S_{F'}(f) = 4R(f)\left(\frac{hf}{2} + \frac{hf}{e^{hf/k_BT} - 1}\right) \quad \text{in general,} \tag{5.65a}$$

$$S_{F'}(f) = 4R(f)\,k_BT \quad \text{in the classical limit, } k_BT \gg hf , \tag{5.65b}$$

which is valid at all frequencies $f$ that are coupled to the bath. Here $T$ is the temperature of the bath, $h$ is Planck's constant, and we have written the resistance as $R(f)$ to emphasize that it can depend on frequency $f = \omega/2\pi$. This formula has two names: the fluctuation-dissipation theorem and the generalized Nyquist theorem.⁴

Notice that in the classical domain, $k_BT \gg hf$, the spectral density $S_{F'}(f)$ has a white-noise spectrum. Moreover, since $F'$ is produced by interaction with a huge number of bath degrees of freedom, it must be Gaussian, and it will typically also be Markov. Thus, in the classical domain $F'$ is typically a Gaussian, Markov, white-noise process. In the extreme quantum domain $f \gg k_BT/h$, by contrast, $S_{F'}(f)$ consists of a portion $4R(hf/2)$ that is purely quantum mechanical in origin (it arises from coupling to the zero-point motions of the bath's degrees of freedom), plus a thermal portion $4Rhf\,e^{-hf/k_BT}$ that is exponentially suppressed, because any degrees of freedom in the bath that possess such high characteristic frequencies have exponentially small probabilities of containing any thermal quanta at all, and thus exponentially small probabilities of producing thermal fluctuating forces on $q$. Since this quantum-domain $S_{F'}(f)$ does not have the standard Gaussian-Markov frequency dependence (5.29b), in the quantum domain $F'$ is not a Gaussian-Markov process.

Derivation of the Fluctuation-Dissipation Theorem: Consider a thought experiment in which the system's generalized coordinate $q$ is weakly coupled to an external oscillator that has a very large mass $M$ and an angular eigenfrequency $\omega_o$ near which we wish to derive the fluctuation-dissipation formula (5.65). Denote by $Q$ and $P$ the external oscillator's generalized coordinate and momentum and by $K$ the weak coupling constant between the oscillator and $q$, so the Hamiltonian of system plus oscillator is

$$H = H_{\rm system}(q, p, \ldots) + \frac{P^2}{2M} + \frac{1}{2}M\omega_o^2 Q^2 + KQq . \tag{5.66a}$$

Here the "..." refers to the other degrees of freedom of the system, some of which might be strongly coupled to $q$ and $p$ [as is the case, e.g., for the laser-measured mirror of example (iv) above and Ex. 5.8]. Hamilton's equations state that the external oscillator's generalized coordinate $Q(t)$ has a Fourier transform $\tilde Q(\omega)$ at angular frequency $\omega$ given by

$$M(-\omega^2 + \omega_o^2)\,\tilde Q = -K\tilde q , \tag{5.66b}$$

where $-K\tilde q$ is the Fourier transform of the weak force exerted on the oscillator by the system. Hamilton's equations also state that the external oscillator exerts a force $-KQ(t)$ on the system.

⁴ This theorem was derived for the special case of voltage fluctuations across a resistor by Nyquist (1928) and was derived in the very general form presented here by Callen and Welton (1951).

In the Fourier domain the system responds to the sum of this force and the bath's fluctuating force $F'(t)$ with a displacement given by the impedance-based expression

$$\tilde q = \frac{1}{-i\omega Z(\omega)}\left(-K\tilde Q + \tilde F'\right) . \tag{5.66c}$$

Inserting Eq. (5.66c) into Eq. (5.66b) and splitting the impedance into its imaginary and real parts, $Z = iI + R$ [so that $K^2/(-i\omega Z) = K^2(I + iR)/(\omega|Z|^2)$], we obtain for the equation of motion of the external oscillator

$$\left[M(-\omega^2 + \omega_o'^2) - \frac{iK^2R}{\omega|Z|^2}\right]\tilde Q = \frac{K}{i\omega Z}\,\tilde F' , \tag{5.66d}$$

where $\omega_o'^2 = \omega_o^2 - K^2 I/(M\omega|Z|^2)$, which we can make as close to $\omega_o^2$ as we wish by choosing the coupling constant $K$ sufficiently small. This equation can be regarded as a filter which produces from the random process $F'(t)$ a random evolution $Q(t)$ of the external oscillator, so by the general influence (5.42) of a filter on the spectrum of a random process, $S_Q$ must be

$$S_Q = \frac{(K/\omega|Z|)^2\,S_{F'}}{M^2(-\omega^2 + \omega_o'^2)^2 + K^4R^2/(\omega|Z|^2)^2} . \tag{5.66e}$$

We make the resonance as sharp as we wish by choosing the coupling constant $K$ sufficiently small, and thereby we guarantee that throughout the resonance the resistance $R$ and impedance $Z$ are as constant as desired.

The mean energy of the oscillator, averaged over an arbitrarily long timescale, can be computed in either of two ways. (i) Because the oscillator is a mode of some boson field and (via its coupling through $q$) must be in statistical equilibrium with the bath, its mean occupation number must have the standard Bose-Einstein value $\bar\eta = 1/(e^{\hbar\omega_o'/k_BT} - 1)$ plus $\frac12$ to account for the oscillator's zero-point fluctuations; and since each quantum carries an energy $\hbar\omega_o'$, its mean energy is⁵

$$\bar E = \frac{1}{2}\hbar\omega_o' + \frac{\hbar\omega_o'}{e^{\hbar\omega_o'/k_BT} - 1} . \tag{5.66f}$$

(ii) Because on average half the oscillator's energy is potential and half kinetic, its mean potential energy is $\frac12 M\omega_o'^2\langle Q^2\rangle$, and the ergodic hypothesis tells us that time averages are the same as ensemble averages, it must be that

$$\bar E = 2\cdot\frac{1}{2}M\omega_o'^2\langle Q^2\rangle = M\omega_o'^2\int_0^\infty S_Q(f)\,df . \tag{5.66g}$$

By inserting the spectral density (5.66e) and performing the frequency integral with the help of the sharpness of the resonance, we obtain

$$\bar E = \frac{S_{F'}(f = \omega_o'/2\pi)}{4R} . \tag{5.66h}$$

Equating this to our statistical-equilibrium expression (5.66f) for the mean energy (with $\hbar\omega_o' = hf$), we see that at the frequency $f = \omega_o'/2\pi$ the spectral density $S_{F'}(f)$ has the form (5.65) claimed in the fluctuation-dissipation theorem. Moreover, since $\omega_o'/2\pi$ can be chosen to be any frequency we wish (in the range coupled to the bath), the spectral density $S_{F'}(f)$ has the claimed form anywhere in this range. QED

⁵ Callen and Welton (1951) give an alternative proof in which the inclusion of the zero-point energy is justified more rigorously.
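Eqs. (5.65a,b) are easy to evaluate. The sketch below, assuming a hypothetical 1 kΩ resistor at 300 K, prints the classical Johnson-noise level $\sqrt{4Rk_BT}$ and then the ratio of the full expression (5.65a) to the classical one and to the zero-point part $4R(hf/2) = 2Rhf$. It shows the crossover from the white, classical regime at $hf \ll k_BT$ to the zero-point-dominated regime at $hf \gg k_BT$ (around $f \sim k_BT/h \approx 6\times10^{12}$ Hz here). The constants are the exact SI values; the function names are illustrative.

```python
import numpy as np

H_PLANCK = 6.62607015e-34    # J s (exact SI value)
K_BOLTZMANN = 1.380649e-23   # J/K (exact SI value)

def force_spectral_density(R, f, T):
    """Eq. (5.65a): S_F'(f) = 4 R [hf/2 + hf/(exp(hf/k_B T) - 1)]."""
    hf = H_PLANCK * f
    return 4.0 * R * (0.5 * hf + hf / np.expm1(hf / (K_BOLTZMANN * T)))

def force_spectral_density_classical(R, T):
    """Eq. (5.65b): S_F' = 4 R k_B T, valid when k_B T >> h f."""
    return 4.0 * R * K_BOLTZMANN * T

R, T = 1.0e3, 300.0                     # hypothetical 1 kOhm resistor at room temperature
s_classical = force_spectral_density_classical(R, T)
print(np.sqrt(s_classical))             # rms Johnson-noise voltage per sqrt(Hz), about 4.07e-9 V
for f in (1e3, 1e9, 1e13, 1e14):
    s = force_spectral_density(R, f, T)
    # ratio to the classical value, and ratio to the zero-point part 2 R h f
    print(f, s / s_classical, s / (2.0 * R * H_PLANCK * f))
```

`np.expm1` is used so that the Bose-Einstein factor stays accurate when $hf/k_BT$ is tiny.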

[Figure: circuit with inductor L, capacitor C, and resistor R; labeled points α, β, γ.]
Fig. 5.11: The circuit appearing in Ex. 5.6

5.6.3 Johnson Noise and Langevin Equations

One example of the fluctuation-dissipation theorem is the Johnson noise in a resistor. The equation of motion for the charge $q$ on the capacitance is [cf. Eq. (5.64a) above]

$$L\ddot q + C^{-1}q + R\dot q = V + V' , \tag{5.67}$$

where $V(t)$ is whatever voltage is imposed on the circuit and $V'(t)$ is the random-process voltage produced by the resistor's thermalized internal degrees of freedom. The spectral density of $V'$ is given, in the classical limit (which is almost always the relevant regime), by $S_{V'} = 4Rk_BT$. This fluctuating voltage is called Johnson noise, and the fluctuation-dissipation relationship $S_{V'}(f) = 4Rk_BT$ is called Nyquist's theorem, because J. B. Johnson (1928) discovered the voltage fluctuations $V'(t)$ experimentally and H. Nyquist (1928) derived this fluctuation-dissipation relationship for a resistor in order to explain them.

Because the circuit's equation of motion (5.67) involves a driving force $V'(t)$ that is a random process, one cannot solve it to obtain $q(t)$. Instead, one must solve it in a statistical way to obtain the evolution of $q$'s probability distributions $p_n(q_n, t_n; \ldots; q_1, t_1)$ and/or the spectral density of $q$. This and other evolution equations which involve random-process driving terms are called, by modern mathematicians, stochastic differential equations, and there is an extensive body of mathematical formalism for solving them. In statistical physics, stochastic differential equations such as (5.67) are known as Langevin equations.

****************************

EXERCISES

Exercise 5.6 Practice: Noise in an L-C-R Circuit
Consider an L-C-R circuit as shown in Fig. 5.11. This circuit is governed by the differential equation (5.67), where $V'$ is the fluctuating voltage produced by the resistor's microscopic degrees of freedom, and $V$ vanishes since there is no driving voltage in the circuit. Assume that the resistor has temperature $T \gg hf_o/k_B$, where $f_o$ is the circuit's resonant frequency, and that the circuit has a large quality factor (weak damping), so $R \ll 1/(\omega_o C) \simeq \omega_o L$.
(a) Initially consider the resistor $R$ decoupled from the rest of the circuit, so current cannot flow across it. What is the spectral density of the voltage $V_{\alpha\beta}$ across this resistor?

(b) Now place the resistor into the circuit as shown in Fig. 5.11. There will be an additional fluctuating voltage produced by a fluctuating current. What now is the spectral density of $V_{\alpha\beta}$?
(c) What is the spectral density of the voltage $V_{\alpha\gamma}$ between points $\alpha$ and $\gamma$?
(d) What is the spectral density of the voltage $V_{\beta\gamma}$?
(e) The voltage $V_{\alpha\beta}$ is averaged from time $t = t_0$ to $t = t_0 + \tau$ (with $\tau \gg 1/f_o$), giving some average value $U_0$. The average is measured once again from $t_1$ to $t_1 + \tau$, giving $U_1$. A long sequence of such measurements gives an ensemble of numbers $\{U_0, U_1, \ldots, U_n\}$. What are the mean $\bar U$ and root-mean-square deviation $\Delta U \equiv \langle (U - \bar U)^2\rangle^{1/2}$ of this ensemble?

Exercise 5.7 Example: Thermal Noise in a Resonant-Mass Gravitational Wave Detector
The fundamental mode of end-to-end vibration of a solid cylinder obeys the harmonic-oscillator equation

$$m\left(\ddot x + \frac{2}{\tau_*}\dot x + \omega^2 x\right) = F(t) + F'(t) , \tag{5.68}$$

where $x$ is the displacement of the cylinder's end associated with that mode; $m$, $\omega$, $\tau_*$ are the effective mass, angular frequency, and amplitude damping time associated with the mode; $F(t)$ is an external driving force; and $F'(t)$ is the fluctuating force associated with the dissipation that gives rise to $\tau_*$. Assume that $\omega\tau_* \gg 1$.
(a) Weak coupling to other modes is responsible for the damping. If the other modes are thermalized at temperature $T$, what is the spectral density $S_{F'}(f)$ of the fluctuating force $F'$? What is the spectral density $S_x(f)$ of $x$?
(b) A very weak sinusoidal force drives the fundamental mode precisely on resonance:

$$F = \sqrt{2}\,F_s\cos\omega t . \tag{5.69}$$

Here $F_s$ is the rms signal. What is the $x(t)$ produced by this signal force?
(c) A noiseless sensor monitors this $x(t)$ and feeds it through a narrow-band filter with central frequency $f = \omega/2\pi$ and bandwidth $\Delta f = 1/\hat\tau$ (where $\hat\tau$ is the averaging time used by the filter). Assume that $\hat\tau \gg \tau_*$. What is the rms thermal noise $\sigma_x$ after filtering? What is the strength $F_s$ of the signal force that produces a signal $x(t) = \sqrt{2}\,x_s\cos(\omega t + \delta)$ with rms amplitude $x_s$ equal to $\sigma_x$? This is the minimum detectable force at the "one-σ level".
(d) If the force $F$ is due to a sinusoidal gravitational wave with dimensionless wave field $h_+(t)$ at the crystal given by $h_+ = \sqrt{2}\,h_s\cos\omega t$, then $F_s \sim m\omega^2\ell h_s$, where $\ell$ is the length of the crystal; see Chap. 26. What is the minimum detectable gravitational-wave strength $h_s$ at the one-σ level? Evaluate $h_s$ for the parameters of gravitational-wave detectors that were operating in Europe and the US in the early 2000s: cylinders made of aluminum and cooled to $T \sim 0.1$ K (100 millikelvin), with masses $m \sim 2000$ kg, lengths $\ell \sim 3$ m, angular frequencies $\omega \sim 2\pi \times 900$ Hz, quality factors $Q = \omega\tau_*/\pi \sim 3\times10^6$, and averaging times $\hat\tau \sim 1$ year. [Note: thermal noise is not the only kind of noise that plagues these detectors, but in the narrow-band observations of this exercise, the thermal noise is the most serious.]

Exercise 5.8 Problem: Fluctuations of Mirror Position as Measured by a Laser
Consider a mirror that resides in empty space and interacts only with a laser beam. The beam reflects from the mirror, and in reflecting acquires a phase shift that is proportional to the position $q$ of the mirror averaged over the beam's transverse light distribution [Eq. (5.61)]. This averaged position $q$ fluctuates due to coupling of the mirror's face to its internal, thermalized phonon modes (assumed to be in statistical equilibrium at temperature $T$). Show that the spectral density of $q$ is given by

$$S_q(f) = \frac{8k_BT}{(2\pi f)^2}\,\frac{W_{\rm diss}}{F_o^2} , \tag{5.70}$$

where $F_o$ and $W_{\rm diss}$ are defined in terms of the following thought experiment: The laser beam is turned off, and then a sinusoidal pressure is applied to the face of the mirror at the location where the laser beam had been. The transverse pressure profile is given by the same Gaussian distribution as the laser light, and the pressure's net force integrated over the mirror face is $F_o e^{-i2\pi ft}$. This sinusoidal pressure produces sinusoidal internal motions in the mirror, which in turn dissipate energy at a rate $W_{\rm diss}$. The $F_o$ and $W_{\rm diss}$ in Eq. (5.70) are the amplitude of the force and the power dissipation in this thought experiment. [For the solution of this problem and a discussion of its application to laser interferometer gravitational-wave detectors, see Levin (1998).]

Exercise 5.9 ***Challenge: Quantum Limit for a Measuring Device
Consider any device that is designed to measure a generalized coordinate $q$ of any system. The device inevitably will superpose fluctuating measurement noise $q'(t)$ on its output, so that the measured coordinate is $q(t) + q'(t)$. The device also inevitably will produce a fluctuating back-action noise force $F'(t)$ on the measured system, so the generalized momentum $p$ conjugate to $q$ gets driven as $(dp/dt)_{\rm drive} = F'(t)$. As an example, $q$ might be the position of a charged particle, the measuring device might be the light of a Heisenberg microscope (as described in standard quantum mechanics textbooks when introducing the uncertainty principle), and in this case $q'$ will arise from the light's photon shot noise and $F'$ will be the fluctuating radiation-pressure force that the light exerts on the particle. The laws of quantum mechanics dictate that the back-action noise $F'$ must enforce the uncertainty principle, so that if the rms error of the measurement of $q$ [as determined by the device's measurement noise $q'(t)$] is $\Delta q$ and the rms perturbation of $p$ produced by $F'(t)$ is $\Delta p$, then $\Delta q\,\Delta p \ge \hbar/2$.
(a) Suppose that $q'(t)$ and $F'(t)$ are uncorrelated. Show, by a thought experiment for a measurement that lasts for a time $\hat\tau \sim 1/f$ for any chosen frequency $f$, that

$$S_{q'}(f)\,S_{F'}(f) \gtrsim \hbar^2 . \tag{5.71}$$

(b) Continuing to assume that $q'(t)$ and $F'(t)$ are uncorrelated, invent a thought experiment by which to prove the precise uncertainty relation

$$S_{q'}(f)\,S_{F'}(f) \ge \hbar^2 . \tag{5.72a}$$

[Hint: Adjust the system so that $q$ and $p$ are the generalized coordinate and momentum of a harmonic oscillator with eigenfrequency $2\pi f$, and use a thought experiment with a modulated coupling designed to measure the complex amplitude of excitation of the oscillator by averaging over a very long time.]
(c) Now assume that $q'(t)$ and $F'(t)$ are correlated. Show by a thought experiment like that in part (b) that the determinant of their correlation matrix satisfies the uncertainty relation

$$S_{q'}S_{F'} - S_{q'F'}S_{F'q'} = S_{q'}S_{F'} - |S_{q'F'}|^2 \ge \hbar^2 . \tag{5.72b}$$

The uncertainty relation (5.72a) without correlations is called the "standard quantum limit" on measurement accuracies, and it holds for any measuring device with uncorrelated measurement and back-action noises. By clever experimental designs one can use the correlations embodied in the modified uncertainty relation (5.72b) to make one's experimental output insensitive to the back-action noise. For a detailed discussion, see Braginsky and Khalili (1992); for an example, see, e.g., Braginsky et al. (2000), especially Sec. II.

****************************

5.7 Fokker-Planck Equation for a Markov Process's Conditional Probability

5.7.1 Fokker-Planck for a One-Dimensional Markov Process

Turn attention next to the details of how interaction with a heat bath drives an ensemble of simple systems, with one degree of freedom $y$, into statistical equilibrium. We shall require in our analysis that $y(t)$ be Markov. Thus, for example, $y$ could be the $x$-velocity $v_x$ of a dust particle that is buffeted by air molecules, in which case it would be governed by the Langevin equation

$$m\ddot x + R\dot x = F'(t) , \quad \text{i.e.,} \quad m\dot y + Ry = F'(t) . \tag{5.73}$$

However, $y$ could not be the generalized coordinate $q$ or momentum $p$ of a harmonic oscillator (e.g., of the fundamental mode of a solid cylinder), since neither of them is Markov. On the other hand, if we were considering two-dimensional random processes (which we are not, until the end of this section), then $y$ could be the pair $(q, p)$ of the oscillator, since that pair is Markov; see Ex. 5.12. Because the random evolution of $y(t)$ is produced by interaction with the heat bath's huge number of degrees of freedom, the central limit theorem guarantees that $y$ is Gaussian.
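The Langevin equation (5.73) can be integrated numerically once a model for $F'$ is chosen. The sketch assumes, as the classical limit (5.65b) of the fluctuation-dissipation theorem suggests, that $F'$ is white with $S_{F'} = 4Rk_BT$; with this book's one-sided normalization its correlation function is $(S_{F'}/2)\,\delta(\tau) = 2Rk_BT\,\delta(\tau)$, so the impulse $\int F'\,dt$ delivered during a step $\delta t$ has variance $2Rk_BT\,\delta t$. If that is right, the ensemble should relax to equipartition, $\langle y^2\rangle = k_BT/m$. The units, parameters, and function name are hypothetical.

```python
import numpy as np

def langevin_mean_square_velocity(m, R, kT, dt, n_steps, n_particles, seed=3):
    """Euler-Maruyama integration of m dy/dt + R y = F'(t), Eq. (5.73).

    F' is modeled as white noise with S_F' = 4 R k_B T, so the impulse it
    delivers in one step has variance 2 R k_B T dt.
    """
    rng = np.random.default_rng(seed)
    y = np.zeros(n_particles)                      # ensemble starts at rest
    impulse_rms = np.sqrt(2.0 * R * kT * dt)
    for _ in range(n_steps):
        impulse = impulse_rms * rng.standard_normal(n_particles)
        y += (-R * y * dt + impulse) / m
    return np.mean(y**2)

m, R, kT = 1.0, 1.0, 1.0            # hypothetical units; relaxation time m/R = 1
v2 = langevin_mean_square_velocity(m, R, kT, dt=0.01, n_steps=2000, n_particles=20_000)
print(v2, kT / m)                   # equipartition: <y^2> -> k_B T / m, up to O(dt) bias
```

The simple Euler-Maruyama update used here overestimates $\langle y^2\rangle$ by a fraction of about $R\,\delta t/2m$, i.e. 0.5% for these parameters.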

Because our one-dimensional y(t) is Markov, all of its statistical properties are determined by its first absolute probability distribution $p_1(y)$ and its first conditional probability distribution $P_2(y,t|y_o)$. Moreover, because y is interacting with a bath, which keeps producing fluctuating forces that drive it in stochastic ways, y ultimately must reach statistical equilibrium with the bath. This means that at very late times the conditional probability $P_2(y,t|y_o)$ forgets about its initial value $y_o$ and assumes a time-independent form which is the same as $p_1(y)$:
$$\lim_{t\to\infty} P_2(y,t|y_o) = p_1(y). \qquad (5.74)$$

Thus, the conditional probability $P_2$ by itself contains all the statistical information about the Markov process y(t). As a tool in computing the conditional probability distribution $P_2(y,t|y_o)$, we shall derive a differential equation for it, called the Fokker-Planck equation. The Fokker-Planck equation has a much wider range of applicability than just to our degree of freedom y interacting with a heat bath; it is valid for (almost) any Markov process, regardless of the nature of the stochastic forces that drive the evolution of y; see below. The Fokker-Planck equation says
$$\frac{\partial P_2}{\partial t} = -\frac{\partial}{\partial y}\big[A(y)P_2\big] + \frac{1}{2}\frac{\partial^2}{\partial y^2}\big[B(y)P_2\big]. \qquad (5.75)$$

Here $P_2 = P_2(y,t|y_o)$ is to be regarded as a function of the variables y and t with $y_o$ fixed; i.e., (5.75) is to be solved subject to the initial condition
$$P_2(y,0|y_o) = \delta(y - y_o). \qquad (5.76)$$

As we shall see later, the Fokker-Planck equation is a diffusion equation for the probability $P_2$: as time passes the probability diffuses away from its initial location, $y = y_o$, spreading gradually out over a wide range of values of y. In the Fokker-Planck equation (5.75) the function A(y) produces a motion of the mean away from its initial location, while the function B(y) produces a diffusion of the probability. If one can deduce the evolution of $P_2$ for very short times by some other method [e.g., in the case of our dust particle, by solving the Langevin equation (5.73)], then from that short-time evolution one can compute the functions A(y) and B(y):
$$A(y) = \lim_{\Delta t\to 0}\frac{1}{\Delta t}\int (y'-y)\,P_2(y',\Delta t|y)\,dy', \qquad (5.77a)$$
$$B(y) = \lim_{\Delta t\to 0}\frac{1}{\Delta t}\int (y'-y)^2\,P_2(y',\Delta t|y)\,dy'. \qquad (5.77b)$$
[These equations can be deduced by reexpressing the limit as an integral of the time derivative $\partial P_2/\partial t$, then inserting the Fokker-Planck equation and integrating by parts; Ex. 5.10.] Note that the integral (5.77a) for A(y) is the mean change $\overline{\Delta y}$ in the value of y that occurs in time $\Delta t$, if at the beginning of $\Delta t$ (at t = 0) the value of the process is precisely y; moreover (since the integral of $yP_2$ is just equal to y, which is a constant), A(y) is also the rate of change of the mean, $d\bar y/dt$. Correspondingly we can write (5.77a) in the more suggestive form
$$A(y) = \lim_{\Delta t\to 0}\frac{\overline{\Delta y}}{\Delta t} = \left(\frac{d\bar y}{dt}\right)_{t=0}. \qquad (5.78a)$$
Similarly the integral (5.77b) for B(y) is the mean-square change in y, $\overline{(\Delta y)^2}$, if at the beginning of $\Delta t$ the value of the process is precisely y; and (one can fairly easily show; Ex. 5.10) it is also the rate of change of the variance $\sigma_y^2 = \int (y'-\bar y)^2 P_2\,dy'$. Correspondingly, (5.77b) can be written
$$B(y) = \lim_{\Delta t\to 0}\frac{\overline{(\Delta y)^2}}{\Delta t} = \left(\frac{d\sigma_y^2}{dt}\right)_{t=0}. \qquad (5.78b)$$
It may seem surprising that $\overline{\Delta y}$ and $\overline{(\Delta y)^2}$ can both increase linearly in time for small times [cf. the $\Delta t$ in the denominators of both (5.78a) and (5.78b)], thereby both giving rise to finite functions A(y) and B(y). In fact, this is so: the linear evolution of $\overline{\Delta y}$ at small t corresponds to the motion of the mean, i.e., of the peak of the probability distribution, while the linear evolution of $\overline{(\Delta y)^2}$ corresponds to the diffusive broadening of the probability distribution.

Derivation of the Fokker-Planck equation (5.75): Because y is Markov, it satisfies the Smoluchowski equation (5.9), which we rewrite here with a slight change of notation:
$$P_2(y,t+\tau|y_o) = \int_{-\infty}^{+\infty} P_2(y-\xi,t|y_o)\,P_2(y-\xi+\xi,\tau|y-\xi)\,d\xi. \qquad (5.79a)$$

Take τ and ξ to be small, and expand in a Taylor series in τ on the left side of (5.79a) and in the ξ of y − ξ on the right side:
$$P_2(y,t|y_o) + \sum_{n=1}^{\infty}\frac{1}{n!}\frac{\partial^n}{\partial t^n}P_2(y,t|y_o)\,\tau^n = \int_{-\infty}^{+\infty}P_2(y,t|y_o)\,P_2(y+\xi,\tau|y)\,d\xi + \sum_{n=1}^{\infty}\frac{1}{n!}\int_{-\infty}^{+\infty}(-\xi)^n\,\frac{\partial^n}{\partial y^n}\big[P_2(y,t|y_o)\,P_2(y+\xi,\tau|y)\big]\,d\xi. \qquad (5.79b)$$

In the first integral on the right side the first factor is independent of ξ and can be pulled out from under the integral, and the second factor then integrates to one; thereby the first integral on the right reduces to $P_2(y,t|y_o)$, which cancels the first term on the left. The result then is
$$\sum_{n=1}^{\infty}\frac{1}{n!}\frac{\partial^n}{\partial t^n}P_2(y,t|y_o)\,\tau^n = \sum_{n=1}^{\infty}\frac{(-1)^n}{n!}\frac{\partial^n}{\partial y^n}\left[P_2(y,t|y_o)\int_{-\infty}^{+\infty}\xi^n\,P_2(y+\xi,\tau|y)\,d\xi\right]. \qquad (5.79c)$$

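Divide by τ, take the limit τ → 0 (so only the n = 1 term survives on the left), and set ξ ≡ y′ − y to obtain
$$\frac{\partial}{\partial t}P_2(y,t|y_o) = \sum_{n=1}^{\infty}\frac{(-1)^n}{n!}\frac{\partial^n}{\partial y^n}\big[M_n(y)\,P_2(y,t|y_o)\big], \qquad (5.80a)$$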

where
$$M_n(y) \equiv \lim_{\Delta t\to 0}\frac{1}{\Delta t}\int (y'-y)^n\,P_2(y',\Delta t|y)\,dy' \qquad (5.80b)$$

is the “n’th moment” of the probability distribution $P_2$ after time Δt, divided by Δt. This is a form of the Fokker-Planck equation that has slightly wider validity than (5.75). Almost always, however, the only nonvanishing functions $M_n(y)$ are $M_1 \equiv A$, which describes the linear motion of the mean, and $M_2 \equiv B$, which describes the linear growth of the variance. Other moments of $P_2$ grow as higher powers of Δt than the first power, and correspondingly their $M_n$'s vanish. Thus, almost always (and always, so far as we shall be concerned), Eq. (5.80a) reduces to the simpler version (5.75) of the Fokker-Planck equation. QED
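As a numerical illustration of this last claim, the sketch below evaluates the first four short-time moment rates $M_n$ for the Gaussian Brownian-motion kernel (5.84) derived in Sec. 5.7.3, using the standard raw-moment formulas for a Gaussian. The units (τ∗ = 1, kT/m = 1) and the starting velocity $v_0 = 2$ are arbitrary choices made for illustration. As Δt shrinks, $M_1$ and $M_2$ should approach A = −v₀/τ∗ and B = 2kT/(mτ∗) (the early-time slope of (5.84c)), while $M_3$ and $M_4$ fall off roughly linearly in Δt.

```python
import math


def short_time_moments(v0, dt, tau=1.0, kT_over_m=1.0):
    """Return [M1, M2, M3, M4], where M_n = <(v' - v0)^n> / dt, for the Gaussian
    conditional probability P2(v', dt | v0) of Eq. (5.84)."""
    mu = v0 * (math.exp(-dt / tau) - 1.0)               # mean change vbar(dt) - v0
    s2 = kT_over_m * (1.0 - math.exp(-2.0 * dt / tau))  # variance sigma^2(dt)
    raw = (
        mu,                                       # <dv>
        mu**2 + s2,                               # <dv^2>
        mu**3 + 3.0 * mu * s2,                    # <dv^3> for a Gaussian
        mu**4 + 6.0 * mu**2 * s2 + 3.0 * s2**2,   # <dv^4> for a Gaussian
    )
    return [m / dt for m in raw]


v0 = 2.0
for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    M1, M2, M3, M4 = short_time_moments(v0, dt)
    print(f"dt={dt:.0e}  M1={M1:+.4f}  M2={M2:.4f}  M3={M3:+.1e}  M4={M4:.1e}")
# Limits to compare with: M1 -> -v0/tau = -2, M2 -> 2 kT/(m tau) = 2, M3, M4 -> 0
```

Only $M_1$ and $M_2$ stay finite as Δt → 0, which is why the series (5.80a) truncates to the two-term form (5.75) for this process.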

Time-Independent Fokker-Planck Equation

For our applications below it will be true that $p_1(y)$ can be deduced as the limit of $P_2(y,t|y_o)$ for arbitrarily large times t. Occasionally, however, this might not be so. Then, and in general, $p_1$ can be deduced from the time-independent Fokker-Planck equation:
$$-\frac{\partial}{\partial y}\big[A(y)\,p_1(y)\big] + \frac{1}{2}\frac{\partial^2}{\partial y^2}\big[B(y)\,p_1(y)\big] = 0. \qquad (5.81)$$
This equation is a consequence of the following expression for $p_1$ in terms of $P_2$,
$$p_1(y) = \int_{-\infty}^{+\infty} P_2(y,t|y_o)\,p_1(y_o)\,dy_o, \qquad (5.82)$$
plus the fact that this $p_1$ is independent of t despite the presence of t in $P_2$, plus the Fokker-Planck equation (5.75) for $P_2$. Notice that, if $P_2(y,t|y_o)$ settles down into a stationary (time-independent) state at large times t, it then satisfies the same time-independent Fokker-Planck equation as $p_1(y)$, which is in accord with the obvious fact that it must then become equal to $p_1(y)$.
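Integrating (5.81) once gives $-A p_1 + \tfrac12\, d(Bp_1)/dy = \text{const}$; if one assumes the probability flux vanishes (natural boundaries) and B > 0, the constant is zero and $p_1 \propto B^{-1}\exp\!\big(\int 2A/B\,dy\big)$. The sketch below evaluates that quadrature on a grid. The helper name `stationary_p1` and the dimensionless parameters are illustrative assumptions. For the dust particle it uses A(v) = −v/τ∗ and B = 2kT/(mτ∗), the short-time limits of the Brownian-motion kernel (5.84), and should recover a Maxwellian of variance kT/m.

```python
import numpy as np


def stationary_p1(A, B, y):
    """Zero-flux solution of the time-independent Fokker-Planck equation (5.81):
        p1(y) = (C / B(y)) * exp( integral of 2 A/B dy ),
    normalized on the grid y. Assumes B(y) > 0 and zero probability flux."""
    b = B(y)
    f = 2.0 * A(y) / b
    # Cumulative trapezoidal integral of 2A/B, measured from y[0].
    F = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(y))))
    p = np.exp(F - F.max()) / b          # shift by max(F) to avoid overflow
    norm = np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(y))
    return p / norm


# Dust-particle velocity in assumed dimensionless units: tau_* = 1, kT/m = 1.5.
tau, kT_over_m = 1.0, 1.5
v = np.linspace(-10.0, 10.0, 2001)
p1 = stationary_p1(lambda y: -y / tau,
                   lambda y: np.full_like(y, 2.0 * kT_over_m / tau), v)
maxwell = np.exp(-v**2 / (2.0 * kT_over_m)) / np.sqrt(2.0 * np.pi * kT_over_m)
print(np.max(np.abs(p1 - maxwell)))     # essentially round-off: Maxwellian recovered
```

Because A and B are passed as functions, the same routine applies to any one-dimensional drift and diffusion for which the zero-flux assumption holds.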

5.7.2 Fokker-Planck for a Multi-Dimensional Markov Process

Few one-dimensional random processes are Markov, so only a few can be treated using the one-dimensional Fokker-Planck equation. However, it is frequently the case that, if one adjoins additional variables to the random process, it becomes Markov. An important example is a harmonic oscillator driven by a Gaussian random force (Ex. 5.12). Neither the oscillator's position x(t) nor its velocity v(t) is Markov, but the pair {x, v} is a 2-dimensional Markov process. For such a process, and more generally for any n-dimensional, Gaussian, Markov process $\{y_1(t), y_2(t), \ldots, y_n(t)\} \equiv \{\mathbf y(t)\}$, the conditional probability distribution $P_2(\mathbf y,t|\mathbf y_o)$ satisfies the following Fokker-Planck equation [the obvious generalization of Eq. (5.75)]:
$$\frac{\partial P_2}{\partial t} = -\frac{\partial}{\partial y_j}\big[A_j(\mathbf y)P_2\big] + \frac{1}{2}\frac{\partial^2}{\partial y_j\,\partial y_k}\big[B_{jk}(\mathbf y)P_2\big], \qquad (5.83a)$$
with summation over the repeated indices j and k.

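Here the functions $A_j$ and $B_{jk}$, by analogy with Eqs. (5.77a)–(5.78b), are
$$A_j(\mathbf y) = \lim_{\Delta t\to 0}\frac{1}{\Delta t}\int (y'_j - y_j)\,P_2(\mathbf y',\Delta t|\mathbf y)\,d^n y' = \lim_{\Delta t\to 0}\frac{\overline{\Delta y_j}}{\Delta t}, \qquad (5.83b)$$
$$B_{jk}(\mathbf y) = \lim_{\Delta t\to 0}\frac{1}{\Delta t}\int (y'_j - y_j)(y'_k - y_k)\,P_2(\mathbf y',\Delta t|\mathbf y)\,d^n y' = \lim_{\Delta t\to 0}\frac{\overline{\Delta y_j\,\Delta y_k}}{\Delta t}. \qquad (5.83c)$$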

In Ex. 5.12 we shall use this Fokker-Planck equation to explore how a harmonic oscillator settles into equilibrium with a dissipative heat bath.
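Without working Ex. 5.12, one can sample the 2-dimensional process {x, v} directly and watch it reach equilibrium. The sketch below assumes the oscillator obeys $m\ddot x + R\dot x + m\omega^2 x = F'(t)$, with white-noise force normalized so that, in the language of (5.83b,c), the drift is $A = (v,\, -\omega^2 x - v/\tau_*)$ and the only nonzero diffusion coefficient is $B_{vv} = 2kT/(m\tau_*)$. The function name, the semi-implicit Euler-Maruyama stepping, and all parameter values are illustrative choices. At late times one expects equipartition, $\overline{x^2} = kT/(m\omega^2)$ and $\overline{v^2} = kT/m$.

```python
import numpy as np


def simulate_oscillator(n=20_000, omega=2.0, tau=1.0, kT_over_m=1.0,
                        dt=0.01, t_end=30.0, seed=0):
    """Sample the 2-D Markov process {x, v} of a damped, noise-driven oscillator:
        dx = v dt,  dv = (-omega^2 x - v/tau) dt + sqrt(B_vv dt) * N(0, 1),
    with B_vv = 2 kT/(m tau) and all other B_jk = 0 (assumed normalization)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)                 # every ensemble member starts at rest at x = 0
    v = np.zeros(n)
    kick = np.sqrt(2.0 * kT_over_m / tau * dt)
    for _ in range(int(round(t_end / dt))):
        # Semi-implicit (symplectic) Euler-Maruyama step: update v, then x.
        v = v + (-omega**2 * x - v / tau) * dt + kick * rng.standard_normal(n)
        x = x + v * dt
    return x, v


x, v = simulate_oscillator()
print(np.var(x), 1.0 / 2.0**2)      # compare with kT/(m omega^2)
print(np.var(v), 1.0)               # compare with kT/m
```

Neither x nor v alone would satisfy a one-dimensional equation of the form (5.75), but the pair, updated together, is Markov, which is exactly what the stepping loop relies on.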

5.7.3 Brownian Motion

As an application of the Fokker-Planck equation, we use it in Ex. 5.11 to derive the following description of the evolution into statistical equilibrium of an ensemble of dust particles, all with the same mass m, being buffeted by air molecules. Denote by v(t) the x-component (or, equally well, the y- or z-component) of velocity of a dust particle. The conditional probability $P_2(v,t|v_o)$ describes the evolution into statistical equilibrium from an initial state, at time t = 0, when all the particles in the ensemble have velocity $v = v_o$. We shall restrict attention to time intervals large compared to the extremely small time between collisions with air molecules; i.e., we shall perform a coarse-grain average over some timescale large compared to the mean collision time. Then the fluctuating force $F'(t)$ of the air molecules on the dust particle can be regarded as a Gaussian, Markov process with white-noise spectral density given by the classical version of the fluctuation-dissipation theorem. Correspondingly, v(t) will also be Gaussian and Markov, and will satisfy the Fokker-Planck equation (5.75). In Ex. 5.11 we shall use the Fokker-Planck equation to show that the explicit, Gaussian form of the conditional probability $P_2(v,t|v_o)$, which describes evolution into statistical equilibrium, is
$$P_2(v,t|v_o) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(v-\bar v)^2}{2\sigma^2}\right]. \qquad (5.84a)$$
Here the mean velocity at time t is
$$\bar v = v_o\,e^{-t/\tau_*}, \quad \text{with} \quad \tau_* \equiv \frac{m}{R} \qquad (5.84b)$$
the damping time due to friction; and the variance of the velocity at time t is
$$\sigma^2 = \frac{k_B T}{m}\left(1 - e^{-2t/\tau_*}\right). \qquad (5.84c)$$
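A quick Monte Carlo check of (5.84b) and (5.84c) integrates the Langevin equation (5.73) with y = v for a large ensemble that starts at $v = v_o$. The sketch assumes dimensionless units τ∗ = 1 and kT/m = 1, and normalizes the white-noise force (as the classical fluctuation-dissipation theorem is meant to) so that the velocity diffusion coefficient is B = 2kT/(mτ∗), the early-time slope of (5.84c). The function name, Euler-Maruyama step size, and ensemble size are illustrative choices.

```python
import numpy as np


def langevin_ensemble(v0, t, tau=1.0, kT_over_m=1.0, n=100_000, dt=1e-3, seed=1):
    """Euler-Maruyama integration of m dv/dt + R v = F'(t) for n particles that
    all start at v = v0. Here tau = m/R, and each step's random velocity kick
    has variance B dt with B = 2 kT/(m tau) (assumed noise normalization)."""
    rng = np.random.default_rng(seed)
    v = np.full(n, float(v0))
    kick = np.sqrt(2.0 * kT_over_m / tau * dt)
    for _ in range(int(round(t / dt))):
        v += -v / tau * dt + kick * rng.standard_normal(n)
    return v


v0, t = 3.0, 0.5
v = langevin_ensemble(v0, t)
print(v.mean(), v0 * np.exp(-t))             # ensemble mean vs Eq. (5.84b)
print(v.var(), 1.0 - np.exp(-2.0 * t))       # ensemble variance vs Eq. (5.84c)
```

A histogram of `v` at the chosen t should also look Gaussian, as (5.84a) asserts; the ensemble size mainly controls the statistical scatter of these comparisons.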

[Side remark: for free masses the damping time is τ∗ = m/R as in (5.84b), while for oscillators it is τ∗ = 2m/R, because half the time an oscillator's energy is stored in potential form, where it is protected from frictional damping, and thereby the damping time is doubled.] Notice that at very early times the variance (5.84c) grows linearly with time (as the Fokker-Planck

[Figure: $P_2(v,t|v_o)$ shown at t = 0, t ∼ τ∗/2, t ∼ τ∗, and t = ∞, where its width is $\sqrt{kT/m}$.]

t r∗, so the amplitude is slowly dying and (as you shall see) the frequency is slowly decreasing, due to gradual spindown of the star. (a) What are ω(x, t) and k(x, t) for these gravitational waves? (b) Verify that these ω and k satisfy the dispersion relation (6.4). (c) For this simple dispersion relation there is no dispersion; the group and phase velocities are the same. Explain why this means that the phase must be constant along the rays, dϕ/dt = 0. From this fact, deduce that the rays are given by $\{t - r_*, \theta, \phi\} = \text{constant}$. Explain why this means that $t - r_*$ can be regarded as the retarded time for these waves. (d) Verify that the waves' amplitude satisfies the propagation law (6.24).

Exercise 6.5 Derivation: Hamilton's Equations for Dispersionless Waves
Show that Hamilton's equations for the standard dispersionless dispersion relation (6.4) imply the same ray equation (6.42) as we derived using Fermat's principle.

Exercise 6.6 Problem: Propagation of Sound Waves in a Wind
Consider sound waves propagating in an isothermal atmosphere with constant sound speed c, in which there is a horizontal wind shear. Let the (horizontal) wind velocity $\mathbf u = u_x\mathbf e_x$ increase linearly with height z above the ground according to $u_x = Sz$, where S is the constant shearing rate. Just consider rays in the x-z plane. (a) Give an expression for the dispersion relation ω = Ω(x, t; k). [Hint: in the local rest frame of the air, Ω should have its standard sound-wave form.] (b) Show that $k_x$ is constant along a ray path and then demonstrate that sound waves will not propagate when
$$\frac{\omega}{k_x} - u_x(z) < c. \qquad (6.45)$$

(c) Consider sound rays generated on the ground which make an angle θ to the horizontal initially. Derive the equations describing the rays and use them to sketch the rays, distinguishing values of θ both less than and greater than π/2. (You might like to perform this exercise numerically.)

Exercise 6.7 Example: Self-Focusing Optical Fibers
Optical fibers in which the refractive index varies with radius are commonly used to transport optical signals. Provided that the diameter of the fiber is many wavelengths, we can use geometric optics. Let the refractive index be
$$n = n_0\,(1 - \alpha^2 r^2)^{1/2}, \qquad (6.46a)$$
where $n_0$ and α are constants and r is radial distance from the fiber's axis. (a) Consider a ray that leaves the axis of the fiber along a direction that makes a small angle θ to the axis. Solve the ray transport equation (6.42) to show that the radius of the ray is given by
$$r = \frac{\sin\theta}{\alpha}\,\sin\!\left(\frac{\alpha z}{\cos\theta}\right), \qquad (6.46b)$$
where z measures distance along the fiber. (b) Next consider the propagation time T for a light pulse propagating along a long length L of fiber. Show that
$$T = \frac{n_0 L}{c}\,\big[1 + O(\theta^4)\big], \qquad (6.46c)$$
and comment on the implications of this result for the use of fiber optics for communication.

Exercise 6.8 *** Example: Geometric Optics for the Schrödinger Equation
Consider the non-relativistic Schrödinger equation for a particle moving in a time-dependent, 3-dimensional potential well:
$$-\frac{\hbar}{i}\frac{\partial\psi}{\partial t} = \left[\frac{1}{2m}\left(\frac{\hbar}{i}\nabla\right)^2 + V(\mathbf x,t)\right]\psi. \qquad (6.47)$$
(a) Seek a geometric-optics solution to this equation with the form $\psi = Ae^{iS/\hbar}$, where A and V are assumed to vary on a lengthscale L and timescale T long compared to those, 1/k and 1/ω, on which S varies. Show that the leading-order terms in the two-lengthscale expansion of the Schrödinger equation give the Hamilton-Jacobi equation
$$\frac{\partial S}{\partial t} + \frac{1}{2m}(\nabla S)^2 + V = 0. \qquad (6.48a)$$
Our notation ϕ ≡ S/ħ for the phase ϕ of the wave function ψ is motivated by the fact that the geometric-optics limit of quantum mechanics is classical mechanics, and the function S = ħϕ becomes, in that limit, “Hamilton's principal function,” which obeys the Hamilton-Jacobi equation.⁸

⁸ See, e.g., Chap. 10 of Goldstein (1980).

(b) From this equation derive the equation of motion for the rays (which of course is identical to the equation of motion for a wave packet and therefore is also the equation of motion for a classical particle):
$$\frac{d\mathbf x}{dt} = \frac{\mathbf p}{m}, \qquad \frac{d\mathbf p}{dt} = -\nabla V, \qquad (6.48b)$$
where $\mathbf p = \nabla S$. (c) Derive the propagation equation for the wave amplitude A and show that it implies
$$\frac{d|A|^2}{dt} + |A|^2\,\frac{\nabla\cdot\mathbf p}{m} = 0. \qquad (6.48c)$$

Interpret this equation quantum mechanically.

Exercise 6.9 *** Example: Energy Density and Flux, and Adiabatic Invariant, for a Dispersionless Wave
(a) Show that the standard dispersionless scalar wave equation (6.17) follows from the variational principle
$$\delta\int \mathcal L\,dt\,d^3x = 0, \qquad (6.49a)$$
where $\mathcal L$ is the Lagrangian density
$$\mathcal L = W\left[\frac{1}{2}\left(\frac{\partial\psi}{\partial t}\right)^2 - \frac{1}{2}C^2(\nabla\psi)^2\right] \qquad (6.49b)$$
(not to be confused with the lengthscale L of inhomogeneities in the medium). (b) For any scalar-field Lagrangian $\mathcal L(\dot\psi, \nabla\psi, \mathbf x, t)$, there is a canonical, relativistic procedure for constructing a stress-energy tensor:
$$T_\mu{}^\nu = -\frac{\partial\mathcal L}{\partial\psi_{,\nu}}\,\psi_{,\mu} + \delta_\mu{}^\nu\,\mathcal L. \qquad (6.49c)$$
Show that, if $\mathcal L$ has no explicit time dependence (e.g., for the Lagrangian (6.49b) if C = C(x) and W = W(x) do not depend on time t), then the field's energy is conserved, $T^{0\nu}{}_{,\nu} = 0$. A similar calculation shows that if the Lagrangian has no explicit space dependence (e.g., if C and W are independent of x), then the field's momentum is conserved, $T^{j\nu}{}_{,\nu} = 0$. Here and throughout this chapter we use Cartesian spatial coordinates, so spatial partial derivatives (denoted by commas) are the same as covariant derivatives. (c) Show that expression (6.49c) for the field's energy density $U = T^{00} = -T_0{}^0$ and its energy flux $F_i = T^{0i} = -T_0{}^i$ agree with Eqs. (6.18).

(d) Now, regard the wave amplitude ψ as a generalized coordinate. Use the Lagrangian $L = \int\mathcal L\,d^3x$ to define a momentum Π conjugate to this ψ, and then compute a wave action
$$J \equiv \int_0^{2\pi/\omega}\!\!\int \Pi\,\frac{\partial\psi}{\partial t}\,d^3x\,dt, \qquad (6.49d)$$
which is the continuum analog of Eq. (6.38). Note that the temporal integral is over one wave period. Show that this J is proportional to the wave energy divided by the frequency and thence to the number of quanta in the wave. [Comment: It is shown in standard texts on classical mechanics that, for approximately periodic oscillations, the particle action (6.38), with the integral limited to one period of oscillation of q, is an adiabatic invariant. By the extension of that proof to continuum physics, the wave action (6.49d) is also an adiabatic invariant. This means that the wave action (and thence also the number of quanta in the waves) is conserved when the medium (in our case the index of refraction n(x)) changes very slowly in time, a result asserted in the text and one that also follows from quantum mechanics. We shall study the particle version of this adiabatic invariant, Eq. (6.38), in detail when we analyze charged-particle motion in a magnetic field in Chap. 19.]

****************************

6.4 Paraxial Optics

It is quite common in optics to be concerned with a bundle of rays that are almost parallel. This implies that the angle that the rays make with some reference ray can be treated as small, an approximation that underlies the first-order theory of simple optical instruments like the telescope and the microscope. This approximation is called paraxial optics, and it permits one to linearize the geometric-optics equations and use matrix methods to trace their rays. We shall develop the paraxial optics formalism for waves whose dispersion relation has the simple, time-independent, nondispersive form Ω = kc/n(x). Recall that this applies to light in a dielectric medium, the usual application. As we shall see below, it also applies to charged particles in a storage ring (Sec. 6.4.2) and to light being lensed by a weak gravitational field (Sec. 6.5). We start by linearizing the ray propagation equation (6.42). Let z measure distance along a reference ray. Let the two-dimensional vector x(z) be the transverse displacement of some other ray from this reference ray, and denote by $(x, y) = (x_1, x_2)$ the Cartesian components of x, with the transverse Cartesian basis vectors $\mathbf e_x$ and $\mathbf e_y$ parallel-transported along the reference ray. Under paraxial conditions, |x| is small compared to the z-lengthscales of the propagation. Now, let us Taylor expand the refractive index n(x, z):
$$n(\mathbf x,z) = n(0,z) + x_i\,n_{,i}(0,z) + \frac{1}{2}\,x_i x_j\,n_{,ij}(0,z) + \cdots, \qquad (6.50a)$$

where the subscript commas denote partial derivatives with respect to the transverse coordinates, $n_{,i} \equiv \partial n/\partial x_i$. The linearized form of Eq. (6.42) is then given by
$$\frac{d}{dz}\left[n(0,z)\,\frac{dx_i}{dz}\right] = n_{,i}(0,z) + x_j\,n_{,ij}(0,z). \qquad (6.50b)$$
It is helpful to regard z as “time” and think of Eq. (6.50b) as an equation for the two-dimensional simple harmonic motion of a particle (the ray) in a quadratic potential well. We are usually concerned with aligned optical systems in which there is a particular choice of reference ray called the optic axis, for which the term $n_{,i}(0,z)$ on the right-hand side of Eq. (6.50b) vanishes. If we choose the reference ray to be the optic axis, then Eq. (6.50b) is a linear, homogeneous, second-order equation for x(z),
$$\frac{d}{dz}\left(n\,\frac{dx_i}{dz}\right) = x_j\,n_{,ij}, \qquad (6.50c)$$

which we can solve given starting values $\mathbf x(z')$, $\dot{\mathbf x}(z')$, where the dot denotes differentiation with respect to z, and z′ is the starting location. The solution at some point z is linearly related to the starting values. We can capitalize on this linearity by treating $\{\mathbf x(z), \dot{\mathbf x}(z)\}$ as a 4-dimensional vector $V_i(z)$, with
$$V_1 = x, \quad V_2 = \dot x, \quad V_3 = y, \quad V_4 = \dot y, \qquad (6.51)$$
and embodying the linear transformation from location z′ to location z in a transfer matrix $J_{ab}(z,z')$:
$$V_a(z) = J_{ab}(z,z')\,V_b(z'). \qquad (6.52)$$
The transfer matrix contains full information about the change of position and direction of all rays that propagate from z′ to z. As is always the case for linear systems, the transfer matrix for propagation over a large interval, from z′ to z, can be written as the product of the matrices for two subintervals, from z′ to z′′ and from z′′ to z:
$$J_{ac}(z,z') = J_{ab}(z,z'')\,J_{bc}(z'',z'). \qquad (6.53)$$

6.4.1 Axisymmetric, Paraxial Systems

If the index of refraction is everywhere axisymmetric, so $n = n(\sqrt{x^2+y^2}, z)$, then there is no coupling between the motions of rays along the x and y directions, and the equations of motion along x are identical to those along y. In other words, $J_{11} = J_{33}$, $J_{12} = J_{34}$, $J_{21} = J_{43}$, and $J_{22} = J_{44}$ are the only nonzero components of the transfer matrix. This reduces the dimensionality of the propagation problem from 4 dimensions to 2: $V_a$ can be regarded as either $\{x(z), \dot x(z)\}$ or $\{y(z), \dot y(z)\}$, and in both cases the 2 × 2 transfer matrix $J_{ab}$ is the same.

Let us illustrate the paraxial formalism by deriving the transfer matrices of a few simple, axisymmetric optical elements. In our derivations it is helpful conceptually to focus on rays that move in the x-z plane, i.e., that have y = ẏ = 0. We shall write the 2-dimensional $V_a$ as a column vector
$$V_a = \begin{pmatrix} x \\ \dot x \end{pmatrix}. \qquad (6.54)$$

Fig. 6.5: Simple converging lens used to illustrate the use of transfer matrices; a source lies a distance u in front of the lens and the image plane a distance v behind it. The total transfer matrix is formed by taking the product of the straight-section transfer matrix with the lens matrix and another straight-section matrix.

The simplest case is a straight section of length d extending from z′ to z = z′ + d. The components of V will change according to

$$x = x' + \dot x'\,d, \qquad \dot x = \dot x', \qquad (6.55)$$
so
$$J_{ab} = \begin{pmatrix} 1 & d \\ 0 & 1 \end{pmatrix} \quad \text{for a straight section of length } d, \qquad (6.56)$$
where x′ = x(z′), etc. Next, consider a thin lens with focal length f. The usual convention in optics is to give f a positive sign when the lens is converging and a negative sign when diverging. A thin lens gives a deflection to the ray that is linearly proportional to its displacement from the optic axis, but does not change its transverse location. Correspondingly, the transfer matrix in crossing the lens (ignoring its thickness) is
$$J_{ab} = \begin{pmatrix} 1 & 0 \\ -f^{-1} & 1 \end{pmatrix} \quad \text{for a thin lens with focal length } f. \qquad (6.57)$$

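Similarly, a spherical mirror with radius of curvature R (again adopting a positive sign for a converging mirror and a negative sign for a diverging mirror) has a transfer matrix
$$J_{ab} = \begin{pmatrix} 1 & 0 \\ -2R^{-1} & 1 \end{pmatrix} \quad \text{for a spherical mirror with radius of curvature } R, \qquad (6.58)$$
i.e., it acts like a thin lens with focal length f = R/2.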
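The element matrices (6.56)–(6.58) and the composition rule (6.53) translate directly into a few lines of code. In this sketch the function names are illustrative; later elements along the beam multiply from the left, and the mirror is treated with the ray path unfolded, so that it acts like a lens.

```python
import numpy as np


def straight(d):
    """Transfer matrix (6.56) for a straight section of length d."""
    return np.array([[1.0, d], [0.0, 1.0]])


def thin_lens(f):
    """Transfer matrix (6.57) for a thin lens of focal length f (f > 0 converging)."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])


def mirror(R):
    """Transfer matrix (6.58) for a spherical mirror of radius R (R > 0 converging),
    with the reflected ray path unfolded."""
    return np.array([[1.0, 0.0], [-2.0 / R, 1.0]])


# Composition rule (6.53): two straight sections combine into one.
lengths_add = np.allclose(straight(1.5) @ straight(2.5), straight(4.0))
# A mirror of radius R acts like a thin lens with f = R/2.
mirror_is_lens = np.allclose(mirror(2.0), thin_lens(1.0))
# A ray parallel to the axis at height 0.1 crosses the axis one focal length past the lens.
ray_after = straight(1.0) @ thin_lens(1.0) @ np.array([0.1, 0.0])
print(lengths_add, mirror_is_lens, ray_after)
```

Each element's matrix has unit determinant, so any product of them does as well.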

As a simple illustration, let us consider rays that leave a point source which is located a distance u in front of a converging lens of focal length f, and solve for the ray positions a distance v behind the lens (Fig. 6.5). The total transfer matrix is the product of the transfer matrix for a straight section, Eq. (6.56), with the product of the lens transfer matrix and a second straight-section transfer matrix:
$$J_{ab} = \begin{pmatrix} 1 & v \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ -f^{-1} & 1 \end{pmatrix}\begin{pmatrix} 1 & u \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 - vf^{-1} & u + v - uvf^{-1} \\ -f^{-1} & 1 - uf^{-1} \end{pmatrix}. \qquad (6.59)$$
When the 1-2 element (upper right entry) of this transfer matrix vanishes, the position of the ray after traversing the optical system is independent of the starting direction. In other words, rays from the point source form a point image; when this happens, the planes containing the source and the image are said to be conjugate. The condition for this to occur is
$$\frac{1}{u} + \frac{1}{v} = \frac{1}{f}. \qquad (6.60)$$
This is the standard thin-lens equation. The linear magnification of the image is given by $M = J_{11} = 1 - v/f$, i.e.,
$$M = -\frac{v}{u}, \qquad (6.61)$$
where the negative sign indicates that the image is inverted. Note that the system does not change with time, so we could have interchanged the source and the image planes.
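The imaging argument can be checked numerically by multiplying out (6.59) for one choice of u and f, taking v from the thin-lens equation (6.60). The distances u = 3 and f = 2 (so v = 6) are arbitrary illustrative values, and `imaging_matrix` is a hypothetical helper name.

```python
import numpy as np


def straight(d):
    return np.array([[1.0, d], [0.0, 1.0]])            # Eq. (6.56)


def thin_lens(f):
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])      # Eq. (6.57)


def imaging_matrix(u, v, f):
    """Eq. (6.59): source plane -> distance u -> lens -> distance v."""
    return straight(v) @ thin_lens(f) @ straight(u)


u, f = 3.0, 2.0
v = 1.0 / (1.0 / f - 1.0 / u)       # thin-lens equation (6.60): v = 6 (up to round-off)
J = imaging_matrix(u, v, f)
print(J[0, 1])                      # ~0: the two planes are conjugate
print(J[0, 0], -v / u)              # magnification (6.61): both about -2

# Rays leaving the same source point x = 0.5 with different slopes
# should all arrive at the same image point, M * 0.5 = -1.
image_points = [(J @ np.array([0.5, slope]))[0] for slope in (-0.2, 0.0, 0.3)]
print(image_points)
```

Choosing any v that violates (6.60) makes `J[0, 1]` nonzero, and the image points then spread with the starting slope.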

6.4.2 Converging Magnetic Lens

Since geometric optics is the same as particle dynamics, these matrix equations can be used for paraxial motions of electrons and ions in a storage ring. (Note, however, that the Hamiltonian for such particles is dispersive, since it does not depend linearly on the particle momentum, and so for our simple matrix formalism to be valid, we must confine attention to a monoenergetic beam.) Quadrupolar magnetic fields are used to guide the particles around the storage ring. Since these magnetic fields are not axisymmetric, to analyze them we must deal with a four-dimensional vector V. The simplest practical magnetic lens is quadrupolar. If we orient our axes appropriately, the magnetic field can be expressed in the form
$$\mathbf B = \frac{B_0}{r_0}\,(y\,\mathbf e_x + x\,\mathbf e_y). \qquad (6.62)$$

Particles traversing this magnetic field will be subjected to a Lorentz force which will curve their trajectories. In the paraxial approximation, a particle's coordinates will satisfy the two differential equations
$$\ddot x = -\frac{x}{\lambda^2}, \qquad \ddot y = \frac{y}{\lambda^2}, \qquad (6.63a)$$
where the dots (as above) mean $d/dz = v^{-1}\,d/dt$ and
$$\lambda = \left(\frac{p\,r_0}{q B_0}\right)^{1/2}, \qquad (6.63b)$$

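Fig. 6.6: Quadrupolar magnetic lens. [Figure: axes $\mathbf e_x$ and $\mathbf e_y$, with four magnetic poles, alternately N and S, around the optic axis.] The magnetic field lines lie in a plane perpendicular to the optic axis. Positively charged particles moving along $\mathbf e_z$ are converged when y = 0 and diverged when x = 0.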

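with q the particle's charge (assumed positive) and p its momentum. The motions in the x and y directions are decoupled. It is convenient in this case to work with two 2-dimensional vectors, $\{V_{x1}, V_{x2}\} \equiv \{x, \dot x\}$ and $\{V_{y1}, V_{y2}\} \equiv \{y, \dot y\}$. From the elementary solutions to the equations of motion (6.63a), we infer that the transfer matrices from the magnet's entrance to its exit are $J_{x\,ab}$ and $J_{y\,ab}$, where
$$J_{x\,ab} = \begin{pmatrix} \cos\phi & \lambda\sin\phi \\ -\lambda^{-1}\sin\phi & \cos\phi \end{pmatrix}, \qquad (6.64a)$$
$$J_{y\,ab} = \begin{pmatrix} \cosh\phi & \lambda\sinh\phi \\ \lambda^{-1}\sinh\phi & \cosh\phi \end{pmatrix}, \qquad (6.64b)$$
and φ = L/λ, with L the distance from entrance to exit. The matrices $J_{x\,ab}$ and $J_{y\,ab}$ can be decomposed as follows:
$$J_{x\,ab} = \begin{pmatrix} 1 & \lambda\tan\frac{\phi}{2} \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ -\lambda^{-1}\sin\phi & 1 \end{pmatrix}\begin{pmatrix} 1 & \lambda\tan\frac{\phi}{2} \\ 0 & 1 \end{pmatrix}, \qquad (6.64c)$$
$$J_{y\,ab} = \begin{pmatrix} 1 & \lambda\tanh\frac{\phi}{2} \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ \lambda^{-1}\sinh\phi & 1 \end{pmatrix}\begin{pmatrix} 1 & \lambda\tanh\frac{\phi}{2} \\ 0 & 1 \end{pmatrix}. \qquad (6.64d)$$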

Comparing with Eqs. (6.56), (6.57), we see that the action of a single magnet is equivalent to the action of a straight section, followed by a thin lens, followed by another straight section. Unfortunately, if the lens is focusing in the x direction, it must be de-focusing in the y direction and vice versa. However, we can construct a lens that is focusing along both directions by combining two magnets that have opposite polarity but the same focusing strength φ = L/λ:

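Consider the motion in the x direction first. Let $f_+ = \lambda/\sin\phi$ be the equivalent focal length of the first, converging lens and $f_- = -\lambda/\sinh\phi$ that of the second, diverging lens. If we separate the magnets by a distance s, this must be added to the effective lengths of the two magnets to give an equivalent separation, $d = \lambda\tan(\phi/2) + s + \lambda\tanh(\phi/2)$, for the two equivalent thin lenses. The combined transfer matrix for the two thin lenses separated by this distance d is then
$$\begin{pmatrix} 1 & 0 \\ -f_-^{-1} & 1 \end{pmatrix}\begin{pmatrix} 1 & d \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ -f_+^{-1} & 1 \end{pmatrix} = \begin{pmatrix} 1 - d f_+^{-1} & d \\ -f_*^{-1} & 1 - d f_-^{-1} \end{pmatrix}, \qquad (6.65a)$$
where
$$\frac{1}{f_*} = \frac{1}{f_-} + \frac{1}{f_+} - \frac{d}{f_- f_+} = \frac{\sin\phi}{\lambda} - \frac{\sinh\phi}{\lambda} + \frac{d\,\sin\phi\,\sinh\phi}{\lambda^2}. \qquad (6.65b)$$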
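The decompositions (6.64c,d) and the doublet focal length (6.65b) can be checked by multiplying matrices. The sketch below uses illustrative values λ = 1, φ = 0.2, s = 0. For x-motion the first magnet focuses and the second (opposite polarity) defocuses, and for y-motion the roles are swapped. The outer straight sections of the full product leave its lower-left entry unchanged, so $-1/J_{21}$ of the full two-magnet product should equal $f_*$ from (6.65b) for both directions, and should be close to the small-φ value 3λ/(2φ³) that the text goes on to derive.

```python
import numpy as np


def straight(d):
    return np.array([[1.0, d], [0.0, 1.0]])              # Eq. (6.56)


def thin_lens(f):
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])         # Eq. (6.57)


def quad_focus(phi, lam):
    """Eq. (6.64a): transfer matrix along the magnet's focusing direction."""
    return np.array([[np.cos(phi), lam * np.sin(phi)],
                     [-np.sin(phi) / lam, np.cos(phi)]])


def quad_defocus(phi, lam):
    """Eq. (6.64b): transfer matrix along the magnet's defocusing direction."""
    return np.array([[np.cosh(phi), lam * np.sinh(phi)],
                     [np.sinh(phi) / lam, np.cosh(phi)]])


lam, phi, s = 1.0, 0.2, 0.0          # illustrative values (assumed)

# Decompositions (6.64c) and (6.64d): straight section, thin lens, straight section.
a, b = lam * np.tan(phi / 2), lam * np.tanh(phi / 2)
ok_x = np.allclose(quad_focus(phi, lam),
                   straight(a) @ thin_lens(lam / np.sin(phi)) @ straight(a))
ok_y = np.allclose(quad_defocus(phi, lam),
                   straight(b) @ thin_lens(-lam / np.sinh(phi)) @ straight(b))

# Two magnets of opposite polarity separated by s (later elements on the left).
Jx = quad_defocus(phi, lam) @ straight(s) @ quad_focus(phi, lam)
Jy = quad_focus(phi, lam) @ straight(s) @ quad_defocus(phi, lam)
f_x, f_y = -1.0 / Jx[1, 0], -1.0 / Jy[1, 0]

d = a + s + b                                           # equivalent lens separation
f_star = 1.0 / (np.sin(phi) / lam - np.sinh(phi) / lam
                + d * np.sin(phi) * np.sinh(phi) / lam**2)   # Eq. (6.65b)
print(ok_x, ok_y, f_x, f_y, f_star, 3.0 * lam / (2.0 * phi**3))
```

The equality of `f_x` and `f_y` is the order-independence of $f_*$ that follows from the symmetry of (6.65b).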

Now if we assume that φ ≪ 1 and s ≪ L, then we can expand as a Taylor series in φ to obtain
$$f_* \simeq \frac{3\lambda}{2\phi^3} = \frac{3\lambda^4}{2L^3}. \qquad (6.66)$$
The effective focal length of the combined magnets, $f_*$, is positive, and so the lens has a net focusing effect. From the symmetry of Eq. (6.65b) under interchange of $f_+$ and $f_-$, it should be clear that $f_*$ is independent of the order in which the magnets are encountered. Therefore, if we were to repeat the calculation for the motion in the y direction, we would get the same focusing effect. (The diagonal elements of the transfer matrix are interchanged, but as they are both close to unity, this is a fairly small difference.) The combination of two quadrupole lenses of opposite polarity can therefore imitate the action of a converging lens. Combinations of magnets like this are used to collimate particle beams in storage rings and particle accelerators.

****************************

EXERCISES

Exercise 6.10 Problem: Matrix Optics for a Simple Refracting Telescope
Consider a simple refracting telescope that comprises two thin converging lenses and that takes parallel rays of light from distant stars, which make an angle θ with the optic axis, and converts them into parallel rays making an angle −Mθ, where M ≫ 1 is the magnification (Fig. 6.7). (a) Use matrix methods to investigate how the output rays depend on the separation of the two lenses, and hence find the condition that the output rays are parallel when the input rays are parallel. (b) How does the magnification M depend on the ratio of the focal lengths of the two lenses?

Fig. 6.7: Simple refracting telescope.

Fig. 6.8: An optical cavity formed by two mirrors, and a light beam bouncing back and forth inside it.

Exercise 6.11 Example: Rays bouncing between two mirrors
Consider two spherical mirrors, each with radius of curvature R, separated by distance d so as to form an “optical cavity,” as shown in Fig. 6.8. A laser beam bounces back and forth between the two mirrors. The center of the beam travels along a geometric-optics ray. (a) Show, using matrix methods, that the central ray hits one of the mirrors (either one) at successive locations $\mathbf x_1, \mathbf x_2, \mathbf x_3, \ldots$ (where $\mathbf x \equiv (x, y)$ is a 2-dimensional vector in the plane perpendicular to the optic axis), which satisfy the difference equation
$$\mathbf x_{k+2} - 2b\,\mathbf x_{k+1} + \mathbf x_k = 0,$$
where
$$b = 1 - \frac{4d}{R} + \frac{2d^2}{R^2}.$$
Explain why this is a difference-equation analogue of the simple-harmonic-oscillator equation.

(b) Show that this difference equation has the general solution $\mathbf x_k = \mathbf A\cos(k\cos^{-1}b) + \mathbf B\sin(k\cos^{-1}b)$. Obviously, A is the transverse position $\mathbf x_0$ of the ray at its 0'th bounce. The ray's 0'th position $\mathbf x_0$ and its 0'th direction of motion $\dot{\mathbf x}_0$ together determine B. (c) Show that if 0 ≤ d ≤ 2R, the mirror system is “stable”; in other words, all rays oscillate about the optic axis. Similarly, show that if d > 2R, the mirror system is unstable and the rays diverge from the optic axis.

(d) For an appropriate choice of initial conditions $\mathbf x_0$ and $\dot{\mathbf x}_0$, the laser beam's successive spots on the mirror lie on a circle centered on the optic axis. When operated in this manner, the cavity is called a Herriott delay line. How must d/R be chosen so that the spots have an angular step size θ? (There are two possible choices.)
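A numerical illustration (not a proof) of parts (a) and (c): build the round-trip matrix from (6.56) and (6.58), treating each reflection as a lens along the unfolded ray path, and iterate it. The helper names and the sample separations d = 0.5R, 1.5R, 2.5R are illustrative choices. Because the round-trip matrix M has unit determinant, the Cayley-Hamilton identity $M^2 - (\operatorname{tr}M)\,M + I = 0$ yields the difference equation with $b = \tfrac12\operatorname{tr}M$.

```python
import numpy as np


def straight(d):
    return np.array([[1.0, d], [0.0, 1.0]])              # Eq. (6.56)


def mirror(R):
    return np.array([[1.0, 0.0], [-2.0 / R, 1.0]])         # Eq. (6.58)


def round_trip(d, R):
    """(x, xdot) just before reflection at one mirror -> the same point one round
    trip later: reflect, travel d, reflect, travel d (unfolded ray path)."""
    one_pass = straight(d) @ mirror(R)
    return one_pass @ one_pass


def b_param(d, R):
    return 1.0 - 4.0 * d / R + 2.0 * d**2 / R**2


def spot_positions(d, R, x0=0.01, slope0=0.0, n_bounces=50):
    """Successive transverse positions of the ray on the same mirror."""
    M, V = round_trip(d, R), np.array([x0, slope0])
    xs = []
    for _ in range(n_bounces):
        xs.append(V[0])
        V = M @ V
    return np.array(xs)


R = 1.0
for d in (0.5, 1.5, 2.5):
    xs = spot_positions(d, R)
    b = b_param(d, R)
    # Relative residual of x_{k+2} - 2 b x_{k+1} + x_k = 0.
    resid = np.max(np.abs(xs[2:] - 2 * b * xs[1:-1] + xs[:-2])) / np.max(np.abs(xs))
    print(d, np.trace(round_trip(d, R)) / 2, b, np.max(np.abs(xs)), resid)
```

For the two stable separations |b| ≤ 1 and the spots stay near the axis, while for d = 2.5R one has b > 1 and the spot positions grow geometrically.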

****************************

6.5 T2 Caustics and Catastrophes: Gravitational Lenses

Albert Einstein's general relativity theory (Part VI of this book) predicts that light rays should be deflected by the gravitational field of the Sun. Newton's law of gravity combined with his corpuscular theory of light also predicts such a deflection, but through an angle half as great as relativity predicts. A famous measurement, during a 1919 solar eclipse, confirmed the relativistic prediction, thereby making Einstein world famous. The deflection of light by gravitational fields allows a cosmologically distant galaxy to behave like a crude lens and, in particular, to produce multiple images of a more distant quasar. Many examples of this phenomenon have been observed. The optics of these gravitational lenses provides an excellent illustration of the use of Fermat's principle and also of the properties of caustics.⁹

6.5.1 T2 Formation of Multiple Images

The action of a gravitational lens can only be understood properly using general relativity. However, when the gravitational field is weak, there exists an equivalent Newtonian model which is adequate for our purposes. In this model, curved spacetime behaves as if it were spatially flat and endowed with a refractive index given by
$$n = 1 - \frac{2\Phi}{c^2}, \qquad (6.67)$$

where Φ is the Newtonian gravitational potential, normalized to vanish far from the source of the gravitational field and chosen to have a negative sign (so, e.g., the potential at a distance r from a point mass M is Φ = −GM/r). Time is treated in the Newtonian manner in this model. We will justify this index-of-refraction model in Part VI. Consider, first, a ray which passes by a point mass M with an impact parameter b. The ray trajectory is determined by solving the paraxial ray equation (6.50c), $(d/dz)(n\,d\mathbf x/dz) = (\mathbf x\cdot\nabla)(\nabla n)$, where x(z) is the ray's transverse position relative to an optic axis that passes through the point mass, and z is distance along the optic axis. The ray will be bent through a deflection angle α; cf. Fig. 6.9. An equivalent way of expressing the motion is to say that the photons moving with speed c are subject to a Newtonian gravitational force and are accelerated kinematically with twice the Newtonian acceleration.

⁹ Schneider, Ehlers & Falco (1999).

Fig. 6.9: Geometry for a gravitational lens. Light from a distant quasar, Q, treated as a point source, passes by a galaxy G and is deflected through an angle α on its way to Earth ⊕. The galaxy is a distance u from the quasar and v from Earth.

The problem is therefore just that of computing the deflection of a charged particle passing by an oppositely charged particle. Under the impulse approximation, the deflection is

$$ \alpha = \frac{-4\Phi(r=b)}{c^2} = \frac{4GM}{bc^2} , \tag{6.68} $$

where b is the ray’s impact parameter. For a ray from a distant star passing close to the limb of the Sun, this deflection is α = 1.75 arcseconds.

Now let us replace the Sun with a cosmologically distant galaxy and the star with a more distant quasar. Let the distance from the galaxy to Earth be v and from the galaxy to the quasar be u (Fig. 6.9).¹⁰ It is instructive to make an order-of-magnitude estimate of Φ ∼ −GM/b, where M is the mass interior to the impact parameter b. It is convenient to relate the deflection to the mean square velocity σ² of the constituent stars (measured in one dimension), a quantity that can be measured spectroscopically. We find that Φ ∼ −3σ²/2, so an order-of-magnitude estimate of the deflection angle is α ∼ 6σ²/c². If we do a more careful calculation for a simple model of a galaxy in which the mass density varies inversely with the square of the radius, we obtain

$$ \alpha \sim \frac{4\pi\sigma^2}{c^2} . \tag{6.69} $$
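As a quick numerical check of Eqs. (6.68) and (6.69), the sketch below evaluates the point-mass deflection for a ray grazing the Sun and the isothermal-galaxy deflection 4πσ²/c² for a few velocity dispersions. The solar mass and radius are standard values, and the sample values of σ are illustrative inputs chosen here; neither is taken from the text.

```python
import math

G = 6.674e-11          # gravitational constant [m^3 kg^-1 s^-2]
C = 2.998e8            # speed of light [m/s]
M_SUN = 1.989e30       # solar mass [kg]
R_SUN = 6.957e8        # solar radius [m], used as the impact parameter b
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0

def point_mass_deflection(mass, b):
    """Eq. (6.68): alpha = 4GM/(b c^2), in radians."""
    return 4.0 * G * mass / (b * C**2)

def isothermal_deflection(sigma):
    """Eq. (6.69): alpha ~ 4 pi sigma^2 / c^2 for density proportional to 1/r^2, in radians."""
    return 4.0 * math.pi * sigma**2 / C**2

sun_arcsec = point_mass_deflection(M_SUN, R_SUN) * RAD_TO_ARCSEC
print(f"ray grazing the Sun: {sun_arcsec:.2f} arcsec")   # prints 1.75 arcsec
for sigma_kms in (200.0, 250.0, 300.0):
    alpha_arcsec = isothermal_deflection(sigma_kms * 1e3) * RAD_TO_ARCSEC
    print(f"sigma = {sigma_kms:.0f} km/s: alpha ~ {alpha_arcsec:.1f} arcsec")
```

Over this range of σ, Eq. (6.69) gives roughly 1 to 3 arcsec, the same order as the typical galaxy deflections quoted in the text.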

For typical galaxies, σ ∼ 300 km s⁻¹ and α ∼ 1–2 arcsec, so the paraxial approximation is fully justified. The distances u and v are roughly ten billion light years ∼ 10²⁶ m, so the transverse displacement of the ray due to the galaxy is ∼ vα/2 ∼ 3 × 10²⁰ m, which is well within the galaxy. This means that light from a quasar lying behind the galaxy can pass by either side of the galaxy, and we should see at least two distinct images of the quasar separated by an angular distance ∼ α/2.

¹⁰ There is a complication in the fact that the universe is expanding and that it contains large-scale gravitational fields. However, as we discuss in Chap. 27, the universe is spatially flat, so the relation between angles and lengths is the same as in the solar system. We can also imagine performing this observation after the universe has expanded to its present age everywhere and then stopping the expansion so as to measure the distances u, v. If we use this definition of distance, we can ignore cosmological effects in understanding the optics.

The imaging is illustrated in Fig. 6.9. First trace a ray backward from the observer, in the absence of the intervening galaxy, to the quasar. We call this the reference ray. Next, reinstate the galaxy and consider a virtual ray that propagates at an angle θ (a 2-dimensional vector on the sky) to the reference ray, in a straight line from Earth to the galaxy, where it is deflected toward the quasar. (A virtual ray is a path that will become a real ray if it satisfies Fermat’s principle.) The optical phase for light propagating along this virtual ray exceeds that along the reference ray by an amount ∆ϕ called the phase delay. There are two contributions to ∆ϕ. First, the geometrical length of the path is longer than the reference ray by an amount (u + v)vθ²/2u (cf. Fig. 6.9), so the travel time is longer by (u + v)vθ²/2uc. Second, the light is delayed as it passes through the potential well by a time ∫(n − 1)ds/c = −2∫Φ ds/c³, where ds is an element of length along the path. We can express this second delay as −2Φ₂/c³, where

$$ \Phi_2 = \int \Phi \, ds \tag{6.70} $$

is the two-dimensional (2D) Newtonian potential, which can be computed from the 2D Poisson equation

$$ \nabla^2 \Phi_2 = 4\pi G \Sigma , \qquad \text{where} \qquad \Sigma = \int \rho \, ds \tag{6.71a} $$

is the surface density of mass in the galaxy integrated along the line of sight. Therefore, the phase delay ∆ϕ is given by

$$ \Delta\varphi = \omega \left[ \frac{(u+v)v}{2uc}\,\theta^2 - \frac{2\Phi_2(\theta)}{c^3} \right] . \tag{6.71b} $$

We can now invoke Fermat’s principle. Of all possible virtual rays, parametrized by the angular coordinate θ, the only ones that correspond to real rays are those for which the phase difference is stationary, i.e. those for which

$$ \frac{\partial \Delta\varphi}{\partial \theta_j} = 0 , \tag{6.71c} $$

where θj (with j = x, y) are the Cartesian components of θ. Differentiating Eq. (6.71b), we obtain a 2D vector equation for the angular location of the images as viewed from Earth:

$$ \theta_j = \frac{2u}{(u+v)vc^2}\,\frac{\partial \Phi_2}{\partial \theta_j} . \tag{6.71d} $$

Note that ∂Φ₂/∂θj is a function of θj, so if Φ₂(θj) is known, this becomes a (usually) nonlinear equation to be solved for the vector θj. Referring to Fig. 6.9 and using simple geometry, we can identify the deflection angle for the image’s ray:

$$ \alpha_j = \frac{2}{vc^2}\,\frac{\partial \Phi_2}{\partial \theta_j} . \tag{6.71e} $$
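Equation (6.71d) can be solved numerically once a model for Φ₂ is chosen. The sketch below assumes a hypothetical cored isothermal lens and works with the dimensionless potential ψ(θ) = [2u/((u + v)vc²)] Φ₂(θ), in terms of which Eq. (6.71b) reads ∆ϕ ∝ θ²/2 − ψ(θ) and Eq. (6.71d) reads θ = ∇ψ. For an axisymmetric lens every image lies on the line through the reference-ray direction and the lens center, so a 1D search along that line suffices. The model ψ = θ_E[(θ − θ_L)² + θ_c²]^{1/2}, the parameter values, and the function names are illustrative assumptions, not part of the text.

```python
import numpy as np

THETA_E = 1.0   # hypothetical lens strength; sets the angular unit
THETA_C = 0.1   # hypothetical core radius, in units of THETA_E

def stationarity_residual(x, x_lens):
    """d/dx of the scaled phase delay x**2/2 - psi(x) along the source-lens line.

    Zeros of this function are the stationary points of Eq. (6.71b), i.e. the images.
    """
    r = x - x_lens
    return x - THETA_E * r / np.sqrt(r**2 + THETA_C**2)

def find_images(x_lens, lo=-5.0, hi=5.0, n=100001, tol=1e-12):
    """Bracket sign changes on a fine grid, then refine each root by bisection."""
    xs = np.linspace(lo, hi, n)
    g = stationarity_residual(xs, x_lens)
    images = []
    for i in np.nonzero(g[:-1] * g[1:] < 0)[0]:
        a, b, ga = xs[i], xs[i + 1], g[i]
        while b - a > tol:
            m = 0.5 * (a + b)
            gm = stationarity_residual(m, x_lens)
            if ga * gm <= 0:
                b = m
            else:
                a, ga = m, gm
        images.append(0.5 * (a + b))
    return images

for x_lens in (3.0, 0.2):   # lens far from, then close to, the line of sight
    imgs = find_images(x_lens)
    print(f"lens at {x_lens}: {len(imgs)} image(s) at", np.round(imgs, 3))
```

With the lens well away from the line of sight the search returns a single image; moving it close in produces three, one of them near the lens core, in line with the odd image counts of Fig. 6.10. The 1D reduction relies on axisymmetry; a non-axisymmetric Φ₂ would need a 2D root search.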

We can understand quite a lot about the properties of the images by inspecting a contour plot of the phase delay function ∆ϕ(θ) (Fig. 6.10). When the galaxy is very light or quite distant from the line of sight, there is a single minimum in the phase delay. However, a massive galaxy along the line of sight to the quasar can create two or even four additional stationary points and therefore three or five images. Note that with a transparent galaxy, the additional images are created in pairs. Note, in addition, that the stationary points are not necessarily minima: they can be minima (labeled L in the figure), maxima (labeled H), or saddle points (labeled S). This is inconsistent with Fermat’s original statement of his principle (“minimum phase delay”), but there are images at all the stationary points nevertheless.

Fig. 6.10: Contour plots of the phase delay ∆ϕ(θ) for four different gravitational lenses. (a) In the absence of a lens (Φ₂ = 0), the phase delay (6.71b) has a single minimum corresponding to a single undeflected image. (b) When a small galaxy with a shallow potential Φ₂ is interposed, it pushes the phase delay ∆ϕ up in its vicinity [Eq. (6.71b) with negative Φ₂], so the minimum, and hence the image, is deflected slightly away from the galaxy’s center. (c) When a galaxy with a deeper potential well is present, the delay surface is raised so much near the galaxy’s center that additional stationary points are created and two more images are produced. (d) If the potential well deepens even more, five images can be produced. In all four plots the local extrema of ∆ϕ are denoted L for a low point (local minimum), H for a high point (local maximum), and S for a saddle point.

Now suppose that the quasar is displaced by a small angle δθ′ as seen from Earth in the absence of the lens. This is equivalent to moving the lens by a small angle −δθ′ as seen from Earth. Equation (6.71d) says that the image will be displaced by a small angle δθ satisfying

$$ \delta\theta_i - \delta\theta'_i = \frac{2u}{(u+v)vc^2}\,\frac{\partial^2 \Phi_2}{\partial\theta_i\,\partial\theta_j}\,\delta\theta_j . \tag{6.72a} $$

By combining this with Eq. (6.71b), we can rewrite it as

$$ \delta\theta'_i = H_{ij}\,\delta\theta_j , \tag{6.72b} $$

where the matrix [Hij] is

$$ H_{ij} = \frac{uc/\omega}{(u+v)v}\,\frac{\partial^2 \Delta\varphi}{\partial\theta_i\,\partial\theta_j} = \delta_{ij} - \frac{2u}{(u+v)vc^2}\,\Phi_{2,ij} . \tag{6.72c} $$

Now consider a small solid angle of source dΩ′ = dθ′₁dθ′₂. Its image in the presence of the lens will have solid angle dΩ = dθ₁dθ₂. Because the specific intensity of the light, Iν = dE/(dt dA dΩ dν), is conserved along a ray (unaffected by the lensing), the flux of light received from the source is Iν dΩ′ in the absence of the lens and Iν dΩ in its presence. The ratio of these fluxes is the magnification M = dΩ/dΩ′. From Eq. (6.72b) we see that the magnification is just the determinant of the inverse of the matrix Hij:

$$ \mathcal{M} = \frac{d\Omega}{d\Omega'} = \frac{1}{\det[H_{ij}]} . \tag{6.73} $$
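Equations (6.72c) and (6.73) can be evaluated numerically at each image. The sketch below again assumes a hypothetical cored isothermal lens (an illustrative model, not from the text) and uses the scaled potential ψ = [2u/((u + v)vc²)] Φ₂, for which Eq. (6.72c) becomes H_ij = δ_ij − ψ_ij. It computes ψ_ij by central finite differences, classifies each stationary point from the signs of the eigenvalues of H (both positive: minimum L; both negative: maximum H; mixed: saddle S), and returns the magnification 1/det H, which is negative for a saddle. It also prints the scaled phase delay at each image; multiplying by (u + v)v/(uc) converts it to an arrival time. All names and parameter values are hypothetical.

```python
import numpy as np

THETA_E, THETA_C = 1.0, 0.1          # hypothetical lens strength and core radius
LENS = np.array([0.2, 0.0])          # hypothetical lens center; the source direction is the origin

def psi(t):
    """Scaled potential psi = 2u/((u+v) v c^2) * Phi_2 for a cored isothermal lens."""
    return THETA_E * np.sqrt(np.sum((t - LENS) ** 2) + THETA_C**2)

def H_matrix(t, h=1e-4):
    """Scaled Eq. (6.72c): H_ij = delta_ij - d^2 psi/dtheta_i dtheta_j (central differences)."""
    e = np.eye(2) * h
    hess = np.empty((2, 2))
    for i in range(2):
        for j in range(2):
            hess[i, j] = (psi(t + e[i] + e[j]) - psi(t + e[i] - e[j])
                          - psi(t - e[i] + e[j]) + psi(t - e[i] - e[j])) / (4 * h * h)
    return np.eye(2) - 0.5 * (hess + hess.T)

def classify(t):
    """Return ('L' | 'H' | 'S', magnification) for an image at angular position t."""
    H = H_matrix(t)
    eig = np.linalg.eigvalsh(H)
    kind = "L" if eig.min() > 0 else ("H" if eig.max() < 0 else "S")
    return kind, 1.0 / np.linalg.det(H)      # Eq. (6.73)

def images_on_axis(lo=-5.0, hi=5.0, n=100001, tol=1e-12):
    """Stationary points of |t|^2/2 - psi on the line through the source and the lens."""
    def g(x):
        r = x - LENS[0]
        return x - THETA_E * r / np.sqrt(r * r + THETA_C**2)
    xs = np.linspace(lo, hi, n)
    gs = g(xs)
    roots = []
    for i in np.nonzero(gs[:-1] * gs[1:] < 0)[0]:
        a, b = xs[i], xs[i + 1]
        while b - a > tol:
            m = 0.5 * (a + b)
            if g(a) * g(m) <= 0:
                b = m
            else:
                a = m
        roots.append(np.array([0.5 * (a + b), 0.0]))
    return roots

for t in images_on_axis():
    kind, mag = classify(t)
    delay = 0.5 * (t @ t) - psi(t)   # scaled phase delay; differences give relative arrival times
    print(f"image at x = {t[0]:+.3f}: type {kind}, magnification {mag:+.3f}, scaled delay {delay:+.4f}")
```

For these parameters the three images come out as a minimum, a strongly demagnified maximum near the lens core, and a saddle with negative magnification, i.e. opposite parity.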

The curvature of the phase delay surface [embodied in det[∂²∆ϕ/∂θi∂θj], which appears in Eq. (6.72c)] is therefore a quantitative measure of the magnification: small curvature implies large magnification of the images, and vice versa. Furthermore, images associated with saddle points in the phase delay surface have opposite parity to the source. Those associated with maxima and minima have the same parity as the source, although the maxima are rotated on the sky by 180°. These effects have been seen in observed gravitational lenses. There is an additional, immediate connection to the observations: the phase delay function at the stationary points is equal to ω times the extra time it takes a signal to arrive along that ray. In order of magnitude, the time delay difference between images will be ∼ vα²/8c ∼ 1 month. Many quasars are intrinsically variable, and if we monitor the variation in two or more images, we should be able to measure the time delay between them. This, in turn, allows us to measure the distance to the quasar and, consequently, provides a measurement of the size of the universe.
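The ∼1 month figure can be reproduced by inserting the representative numbers used earlier in this section, v ∼ 10²⁶ m and a deflection of 1.5 arcsec (a value picked from within the quoted 1–2 arcsec range), into vα²/8c:

$$ \alpha \approx 1.5'' \approx 7.3\times10^{-6}\ {\rm rad}, \qquad \Delta t \sim \frac{v\alpha^2}{8c} \approx \frac{(10^{26}\ {\rm m})\,(5.3\times10^{-11})}{8\,(3.0\times10^{8}\ {\rm m\,s^{-1}})} \approx 2.2\times10^{6}\ {\rm s} \approx 25\ {\rm days} . $$

Because the estimate scales as α², taking α = 2 arcsec instead gives roughly 45 days.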

6.5.2 T2 Catastrophe Optics — Formation of Caustics

Many simple optical instruments are carefully made so as to form point images from point sources. However, naturally occurring optical systems, and indeed precision optical instruments, when examined in detail, bring light to a focus on a 2D surface in 3D space, called a caustic.¹¹ Caustics are quite familiar and can be formed when sunlight is refracted or reflected by the choppy water on the surface of a swimming pool. The bright lines one sees on the pool’s bottom are intersections of 2D caustics with the pool’s 2D bottom surface. Another good example is the cusped curve (called a nephroid) formed by light from a distant source reflected off the cylindrical walls of a mug of coffee onto the surface of the coffee. What is surprising is that caustics formed under quite general conditions can be classified into a rather small number of types.

As a simple, concrete example that illustrates the theory of caustics, let us consider the problem of the refraction of light by an axisymmetric, converging lens, for example a gravitational lens (cf. Fig. 6.11). Consider a set of rays from a distant source with impact parameter s at the lens. Let these rays pass through a point of observation a distance d from the lens with radial coordinate x = s − θd …

¹¹ See, for example, Berry & Upstill (1980).

Fig. 6.11: The formation of caustics by a phase-changing screen that is constrained to be circularly symmetric. Light from a distant source is refracted at the screen. The envelope of the refracted rays forms a caustic surface C. An observer at a point A, outside C, will see a single image of the distant source, whereas one at point B, inside C, will see three images. If the observer at B moves toward the caustic, she will see two of the images approach each other, merge, and then vanish. If the source has a finite angular size, the observed angular size of the two merging images will increase prior to their vanishing, and the energy flux from the two images will also increase. In this example, the caustic terminates in a cusp point, which becomes structurally unstable if we remove the constraint that the phase screen be axisymmetric.

Fig. 6.12: [figure not reproduced; only the end of the caption survives] … (x > 0) and no rays when she lies outside (x < 0).

written in the form of a Taylor series for which the leading terms are

$$ \varphi(s; d, x) = \frac{1}{3}\,a s^3 - b x s + \dots , \tag{6.74} $$

where the factor 1/3 is just a convention and we have dropped a constant. Note that, by changing coordinates, we have removed the quadratic terms. For any given lens we can compute the coefficients a and b accurately through a careful Taylor expansion about the caustic. However, their precise form does not interest us here, as we are concerned only with scaling laws. Invoking Fermat’s principle and differentiating Eq. (6.74) with respect to s, we see that there are two true rays and two images for x > 0 (passing through s = ±(bx/a)^{1/2}), and no images for x < 0; x = 0 marks the location of the caustic at this distance behind the lens. We can now compute the magnification of the images as the caustic is approached. It is given by

$$ \mathcal{M} \propto \frac{ds}{dx} = \frac{1}{2}\left(\frac{b}{ax}\right)^{1/2} . \tag{6.75} $$

Notice that the magnification, and thus also the total flux in each image, scales inversely with the square root of the distance from the caustic. This does not depend on the optical details (i.e. on the coefficients in our power series expansions). It is therefore equally true for reflection at a spherical mirror, refraction by a gravitational lens, or refraction by the rippled surface of the water in a swimming pool. This is just one example of several scaling laws that apply to caustics.

The theory of optical caustics is a special case of a more general formalism called catastrophe theory, and caustics are examples of catastrophes. In this theory, it is shown that there are only a few types of catastrophe and that they have many generic properties. The key mathematical requirement is that the behavior of the solution be structurally stable; that is to say, if we make small changes in the physical conditions, the scaling laws etc. are robust. The catastrophe that we have just considered is the most elementary example and is called the fold. A gravitational lens example is shown in Fig. 6.13.

Fig. 6.13: Gravitational lens in which a distant quasar, Q1115+080, is quadruply imaged by an intervening galaxy. (There is also a fifth, unseen image.) Two of the images are observed to be much brighter and closer together than the other two because the quasar is located quite close to a fold caustic surface. The images are observed to vary in the order predicted by modeling the gravitational potential of the galaxy, and this variation can be used to estimate the size of the universe.

The next simplest catastrophe, known as the cusp, is the curve where two fold surfaces meet. (The point cusp displayed in Fig. 6.12 is actually structurally unstable as a consequence of the assumed strict axisymmetry. However, if we regard s, x as just 1D Cartesian coordinates, then Fig. 6.12 provides a true representation of the geometry.) In total there are seven elementary catastrophes. Catastrophe theory has many interesting applications in optics, dynamics, and other branches of physics.

Let us conclude with an important remark. If we have a point source, the magnification will diverge to infinity as the caustic is approached, according to Eq. (6.75). Two factors prevent the magnification from becoming truly infinite. The first is that a point source is only an idealization, and if we allow the source to have finite size, different parts will produce caustics at slightly different locations. The second is that geometric optics, on which our analysis was based, pretends that the wavelength of light is vanishingly small. In actuality, the wavelength is always nonzero, and near a caustic its finiteness leads to diffraction effects, which limit the magnification to a finite value. Diffraction is the subject of the next chapter.

****************************

EXERCISES

Exercise 6.12 Example: Stellar gravitational lens
Consider a star of mass M that is located a distance v from us and acts as a gravitational lens to produce multiple images of a second star that is a further distance u behind the lens.
(a) Verify Eq. (6.68) under the assumption that the deflection angle is small.
(b) Use this equation to show that when the second star lies on the continuation of the Earth-lens line, it will produce a thin-ring image at the observer of angular radius given by

$$ \theta_E = \left(\frac{4GM}{Dc^2}\right)^{1/2} , \tag{6.76} $$

and evaluate the distance D in terms of the distances u, v. (This ring is known as the Einstein ring and the radius as the Einstein radius.)
(c) Show that when the source is displaced from this line, there will be just two images, one lying within the Einstein ring, the other lying outside. Find their locations.
(d) Denote the ratio of the fluxes in these two images by R. Show that the angular radii of the two images can be expressed in the form θ± = ±θE R^{±1/4}, and explain how to measure the Einstein radius observationally. Incidentally, Einstein effectively performed this calculation, in one of his unpublished notebooks, prior to understanding general relativity.

Exercise 6.13 Challenge: Catastrophe Optics of an Elliptical Lens
Consider an elliptical gravitational lens where the potential at the lens plane varies as

$$ \Phi_2(\theta) = \left(1 + A\theta_1^2 + 2B\theta_1\theta_2 + C\theta_2^2\right)^q , \qquad 0 < q < 1/2 . $$

Determine the generic form of the caustic surfaces and the types of catastrophe encountered. Note that it is in the spirit of catastrophe theory not to compute exact expressions but to determine scaling laws and to understand the qualitative behavior of the images.

****************************
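The fold scaling law (6.75) can be checked numerically straight from the normal form (6.74). The sketch below picks hypothetical coefficients a and b (their values should not matter, which is the point), finds the positive stationary point of ϕ(s) = as³/3 − bxs by bisection for several small x > 0, estimates ds/dx by a centered finite difference, and fits the slope of log(ds/dx) against log x. The negative root is the mirror image and is not tracked. Names and numbers are illustrative.

```python
import numpy as np

A_COEF, B_COEF = 0.7, 2.3   # hypothetical Taylor coefficients a, b of Eq. (6.74)

def dphi_ds(s, x):
    """Fermat condition for the fold normal form phi = a s^3/3 - b x s."""
    return A_COEF * s**2 - B_COEF * x

def upper_ray(x, tol=1e-14):
    """Positive stationary point s(x) for x > 0, found by bisection."""
    lo, hi = 0.0, 1.0 + B_COEF * x / A_COEF   # dphi_ds(lo) < 0 < dphi_ds(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dphi_ds(mid, x) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

xs = np.logspace(-6, -2, 9)   # observer positions just inside the caustic (x > 0)
dsdx = np.array([(upper_ray(1.001 * x) - upper_ray(0.999 * x)) / (0.002 * x) for x in xs])
slope = np.polyfit(np.log(xs), np.log(dsdx), 1)[0]
print(f"fitted power-law exponent of ds/dx: {slope:.4f}")   # prints -0.5000
```

Changing A_COEF or B_COEF shifts the amplitude of ds/dx but not the fitted exponent, which is the coefficient-independence claimed in the text.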

6.6 Polarization

In our geometric optics analyses thus far, we have either dealt with a scalar wave (e.g., a sound wave) or simply supposed that individual components of vector or tensor waves can be treated as scalars. For most purposes, this is indeed the case and we shall continue to use this simplification in the following chapters. However, there are some important wave properties that are unique to vector (or tensor) waves. Most of these come under the heading of polarization effects. In Part VI we shall study polarization effects for (tensorial) gravitational waves. Here and in several other chapters we shall examine them for electromagnetic waves.

6.6.1 Polarization Vector and its Geometric-Optics Propagation Law

An electromagnetic wave in vacuo has its electric and magnetic fields E and B perpendicular to its propagation direction k̂ and perpendicular to each other. In a medium, E and B may or may not remain perpendicular to k̂, depending on the medium’s properties. For example, an Alfvén wave has its vibrating magnetic field perpendicular to the background magnetic field, which can make an arbitrary angle with respect to k̂. By contrast, in the simplest case of an isotropic dielectric medium, where the dispersion relation has our standard dispersion-free form Ω = (c/n)k, the group and phase velocities are parallel to k̂, and E and B turn out to be perpendicular to k̂ and to each other, as in vacuum. In this section, we shall confine attention to this simple situation, and to linearly polarized waves, for which E oscillates linearly back and forth along a polarization direction f̂ that is perpendicular to k̂:

$$ \mathbf{E} = A e^{i\varphi}\,\hat{\mathbf f} , \qquad \hat{\mathbf f}\cdot\hat{\mathbf k} \equiv \hat{\mathbf f}\cdot\nabla\varphi = 0 . \tag{6.77} $$

In the eikonal approximation, A e^{iϕ} ≡ ψ satisfies the geometric-optics propagation laws of Sec. 6.3, and the polarization vector f̂, like the amplitude A, will propagate along the rays. The propagation law for f̂ can be derived by applying the eikonal approximation to Maxwell’s equations, but it is easier to infer that law by simple physical reasoning. (i) If the ray is straight, then the medium, being isotropic, is unable to distinguish a slow right-handed rotation of f̂ from a slow left-handed rotation, so there will be no rotation at all: f̂ will continue always to point in the same direction, i.e. f̂ will be kept parallel to itself during transport along the ray. (ii) If the ray bends, so dk̂/ds ≠ 0 (where s is distance along the ray), then f̂ will have to change as well, so as always to remain perpendicular to k̂. The direction of f̂’s change must be k̂, since the medium, being isotropic, cannot provide any other preferred direction for the change. The magnitude of the change is determined by the requirement that f̂ · k̂ remain zero all along the ray and that k̂ · k̂ = 1. This immediately implies that the propagation law for f̂ is

$$ \frac{d\hat{\mathbf f}}{ds} = -\hat{\mathbf k}\left(\hat{\mathbf f}\cdot\frac{d\hat{\mathbf k}}{ds}\right) . \tag{6.78} $$

We say that the vector f̂ is parallel-transported along k̂. Here “parallel transport” means: (i) Carry f̂ a short distance along the trajectory, keeping it parallel to itself in 3-dimensional space. This will cause f̂ to no longer be perpendicular to k̂. (ii) Project f̂ perpendicular to k̂ by adding onto it the appropriate multiple of k̂. (The techniques of differential geometry for curved surfaces, which we shall develop in Part VI when studying general relativity, give powerful mathematical tools for analyzing this parallel transport.)

Fig. 6.14: (a) The ray along the optic axis of a circular loop of optical fiber, and the polarization vector f̂ that is transported along the ray by the geometric-optics transport law df̂/ds = −k̂(f̂ · dk̂/ds). (b) The polarization vector f̂ drawn on the unit sphere. The vector from the center of the sphere to each of the points A, B, ... is the ray’s propagation direction k̂, and the polarization vector (which is orthogonal to k̂ and thus tangent to the sphere) is identical to that in the physical space of the ray [drawing (a)].
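The step from the two requirements to Eq. (6.78) is short enough to write out. Since the change of f̂ must lie along k̂, write df̂/ds = λk̂ for some scalar λ, and differentiate the constraint f̂ · k̂ = 0 along the ray, using k̂ · k̂ = 1:

$$ 0 = \frac{d}{ds}\left(\hat{\mathbf f}\cdot\hat{\mathbf k}\right) = \frac{d\hat{\mathbf f}}{ds}\cdot\hat{\mathbf k} + \hat{\mathbf f}\cdot\frac{d\hat{\mathbf k}}{ds} = \lambda + \hat{\mathbf f}\cdot\frac{d\hat{\mathbf k}}{ds} \quad\Longrightarrow\quad \lambda = -\hat{\mathbf f}\cdot\frac{d\hat{\mathbf k}}{ds} , $$

which is Eq. (6.78). The same law also preserves the length of f̂, since d(f̂ · f̂)/ds = 2λ(f̂ · k̂) = 0.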

6.6.2 T2 Geometric Phase

We shall use the polarization propagation law (6.78) to illustrate a quite general phenomenon known as the geometric phase.¹² Consider a linearly polarized, monochromatic light beam that propagates in an optical fiber. The fiber’s optic axis is the principal ray along which the light propagates. We can imagine bending the fiber into any desired shape, and thereby controlling the shape of the ray. The ray’s shape in turn will control the propagation of the polarization via Eq. (6.78). If the fiber and ray are straight, then the propagation law (6.78) keeps f̂ constant. If the fiber and ray are circular, then the propagation law (6.78) causes f̂ to rotate in such a way as always to point along the generator of a cone, as shown in Fig. 6.14(a). This polarization behavior, and that for any other ray shape, can be deduced with the aid of a unit sphere on which we plot the ray direction k̂ [Fig. 6.14(b)]. For example, the ray directions at ray locations C and H [drawing (a)] are as shown in drawing (b). Notice that the trajectory of k̂ around the unit sphere is a great circle. This is because the ray in physical space is a closed circle. If, instead, the fiber and ray were bent into a helix (Fig. 6.15), then the trajectory on the unit sphere would be a smaller circle.

¹² Berry (1990).

Fig. 6.15: (a) The ray along the optic axis of a helical loop of optical fiber, and the polarization vector f̂ that is transported along the ray by the geometric-optics transport law df̂/ds = −k̂(f̂ · dk̂/ds). The ray’s propagation direction k̂ makes an angle θ = 73° to the vertical direction. (b) The trajectory of k̂ on the unit sphere (a circle with polar angle θ = 73°), and the polarization vector f̂ that is parallel transported along that trajectory. The polarization vectors in drawing (a) are deduced from the parallel transport law of drawing (b). The lag angle αLag = 2π(1 − cos θ) = 1.42π is equal to the solid angle contained inside the trajectory of k̂ (the θ = 73° circle).

On the unit sphere we also plot the polarization vector f̂, one vector at each point corresponding to a ray direction. Because f̂ · k̂ = 0, the polarization vectors are always tangent to the unit sphere. Notice that each f̂ on the unit sphere is identical in length and direction to the corresponding one in the physical space of drawing (a). For the circular, closed ray of Fig. 6.14(a), the parallel transport law keeps constant the angle α between f̂ and the trajectory of k̂ [drawing (b)]. Translated back to drawing (a), this constancy of α implies that the polarization vector always points along the generators of the cone whose opening angle is π/2 − α, as shown.

For the helical ray of Fig. 6.15(a), the propagation direction k̂ rotates, always maintaining the same angle θ to the vertical direction, and correspondingly its trajectory on the unit sphere of Fig. 6.15(b) is a circle of constant polar angle θ. In this case (as one can see, e.g., with the aid of a large globe of the Earth and a pencil that one transports around a circle of latitude 90° − θ), the parallel transport law dictates that the angle α between f̂ and the circle not remain constant, but instead rotate at the rate

$$ \frac{d\alpha}{d\phi} = \cos\theta . \tag{6.79} $$

Here φ is the angle (longitude on the globe) around the circle. (This is the same propagation law as for the direction of swing of a Foucault pendulum as the Earth turns, and for the same reason: the gyroscopic action of the Foucault pendulum is described by parallel transport of its plane along the Earth’s spherical surface.) In the case θ ≃ 0 (a nearly straight ray), the transport equation (6.79) predicts dα/dφ = 1: although f̂ remains constant, the trajectory of k̂ turns rapidly around a tiny circle about the pole of the unit sphere, so α changes rapidly, by a total amount ∆α = 2π after one trip around the pole. For an arbitrary helical pitch angle θ, the propagation equation (6.79) predicts that during one round trip α will change by an amount 2π cos θ that lags behind its change for a tiny circle (nearly straight ray) by the lag angle αLag = 2π(1 − cos θ), which is also the solid angle ∆Ω enclosed by the path of k̂ on the unit sphere:

$$ \alpha_{\rm Lag} = \Delta\Omega . \tag{6.80} $$
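Equation (6.80) can be checked by integrating the transport law (6.78) numerically around one turn of a helix. The sketch below parametrizes the ray direction by the longitude φ, k̂(φ) = (−sin θ sin φ, sin θ cos φ, cos θ), and integrates df̂/dφ = −k̂(f̂ · dk̂/dφ). This is Eq. (6.78) with s replaced by φ, which is allowed because both sides scale the same way under reparametrization. It uses a hand-written fourth-order Runge-Kutta step and compares the net rotation of f̂ about k̂ after one turn with ∆Ω = 2π(1 − cos θ), modulo 2π. The function names and the step count are illustrative choices.

```python
import numpy as np

def k_hat(phi, theta):
    """Unit ray direction on a helix: polar angle theta, longitude phi (cf. Fig. 6.15)."""
    st = np.sin(theta)
    return np.array([-st * np.sin(phi), st * np.cos(phi), np.cos(theta)])

def dk_dphi(phi, theta):
    st = np.sin(theta)
    return np.array([-st * np.cos(phi), -st * np.sin(phi), 0.0])

def rhs(phi, f, theta):
    """Eq. (6.78) with arc length replaced by phi: df/dphi = -k (f . dk/dphi)."""
    return -k_hat(phi, theta) * np.dot(f, dk_dphi(phi, theta))

def transport_one_turn(theta, steps=4000):
    """RK4-integrate f once around the helix; return initial f, final f, and k0."""
    k0 = k_hat(0.0, theta)
    f0 = np.array([1.0, 0.0, 0.0])          # perpendicular to k0 = (0, sin theta, cos theta)
    f, h = f0.copy(), 2.0 * np.pi / steps
    for n in range(steps):
        phi = n * h
        a = rhs(phi, f, theta)
        b = rhs(phi + h / 2, f + h / 2 * a, theta)
        c = rhs(phi + h / 2, f + h / 2 * b, theta)
        d = rhs(phi + h, f + h * c, theta)
        f = f + h / 6 * (a + 2 * b + 2 * c + d)
    return f0, f, k0

def wrap(angle):
    """Map an angle into [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi

def rotation_after_one_turn(theta):
    """Signed rotation of f about k0 (k0 is also the ray direction after one full turn)."""
    f0, f1, k0 = transport_one_turn(theta)
    return np.arctan2(np.dot(np.cross(f0, f1), k0), np.dot(f0, f1))

for deg in (10.0, 45.0, 73.0, 90.0):
    th = np.radians(deg)
    d_omega = 2 * np.pi * (1 - np.cos(th))   # solid angle enclosed by the k-hat circle
    print(f"theta = {deg:4.0f} deg: rotation = {rotation_after_one_turn(th):+.6f}, "
          f"Delta Omega mod 2pi = {wrap(d_omega):+.6f}")
```

For θ = 90° (the closed circle of Fig. 6.14) the rotation vanishes, so f̂ returns to its starting value, and for θ = 73° it agrees, modulo 2π, with the 1.42π lag of Fig. 6.15. The sign of the rotation reflects the sense in which this particular parametrization traverses the circle.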

For the circular ray of Fig. 6.14, the enclosed solid angle is ∆Ω = 2π steradians, so the lag angle is 2π radians, which means that f̂ returns to its original value after one trip around the optical fiber, in accord with the drawings in the figure.

By itself, the relationship αLag = ∆Ω is merely a cute phenomenon. However, it turns out to be just one example of a very general property of both classical and quantum mechanical systems when they are forced to make slow, adiabatic changes described by circuits in the space of parameters that characterize them. In the more general case one focuses on a phase lag, rather than a direction-angle lag. We can easily translate our example into such a phase lag: the apparent rotation of f̂ by the lag angle αLag = ∆Ω can be regarded as an advance of the phase of one circularly polarized component of the wave by ∆Ω and a phase retardation of the other circular polarization by the same amount. This implies that the phase of a circularly polarized wave will change, after one circuit around the fiber’s helix, by an amount equal to the usual phase advance ∆ϕ = ∫k · dx (where dx is displacement along the fiber) plus an extra geometric phase change ±∆Ω, where the sign is given by the sense of circular polarization. This type of geometric phase change is found quite generally when classical vector or tensor waves propagate through backgrounds that change slowly, either temporally or spatially; and the phases of the wave functions of quantum mechanical particles with spin behave similarly.

****************************

EXERCISES

Exercise 6.14 Derivation: Parallel Transport
Use the parallel-transport law (6.78) to derive the relation (6.79).

Exercise 6.15 Example: Martian Rover
A Martian rover is equipped with a single gyroscope that is free to pivot about the direction perpendicular to the plane containing its wheels.
In order to climb a steep hill on Mars without straining its motor, it must circle the summit in a decreasing spiral trajectory. Explain why there will be an error in its measurement of North after it has reached the summit. Could it be programmed to navigate correctly? Will a stochastic error build up as it traverses a rocky terrain?

****************************

Box 6.3 Important Concepts in Chapter 6

• General Concepts
  – Dispersion relation – Sec. 6.2.1, Ex. 6.2, Eq. (6.34)
  – Phase velocity and group velocity – Eqs. (6.2)
  – Wave packet, its motion and spreading, and its internal waves – Sec. 6.2.2
  – Quanta associated with geometric-optics waves – Secs. 6.2.2 and 6.3.2

• General formulation of geometric optics: Sec. 6.3.3 and the following:
  – Eikonal (geometric optics) approximation – beginning of Sec. 6.3
  – Bookkeeping parameter for eikonal approximation – Box 6.2
  – Hamilton’s equations for rays – Eqs. (6.25)
  – Connection to quantum theory – Sec. 6.3.2
  – Connection to Hamilton-Jacobi theory – Ex. 6.8, Eq. (6.26a)
  – Propagation law for amplitude (conservation of quanta) – Eqs. (6.33)
  – Fermat’s principle – Eq. (6.39)
  – Breakdown of geometric optics – Sec. 6.3.5

• EM waves in dielectric medium, sound waves in fluid or isotropic solid, and gravitational waves in a weak, Newtonian gravitational field
  – Lagrangian, wave equation, energy density and energy flux – Ex. 6.9, Eqs. (6.17)–(6.19)
  – Dispersion relation Ω = C(x, t)k – Eq. (6.23)
  – Ray equation in second-order form – Eq. (6.42)
  – Fermat’s principle for rays – Eqs. (6.40)
  – Snell’s law for rays in a stratified medium – Eq. (6.43)
  – Conservation of phase along a ray – Eq. (6.28)
  – Propagation law for amplitude – Eq. (6.24)
      ∗ Geometric phase – Sec. 6.6.2
  – Parallel propagation law for polarization vector – Eq. (6.78)
  – Paraxial optics – Sec. 6.4
      ∗ Matrix formalism for rays – Secs. 6.4, 6.4.1
      ∗ Application to charged particles in storage ring – Sec. 6.4.2
  – Multiple images, crossing of rays, coalescence of images and caustics – Sec. 6.5.1
  – Magnification at a caustic – Eq. (6.75)

Bibliographic Note

Most texts on optics are traditional in outlook, which is entirely appropriate for a subject with an extensive history that underlies its contemporary technological applications. The most comprehensive of these traditional texts with an extensive discussion of geometric optics is Born & Wolf (1970). A standard text written from a more modern perspective is Hecht (1987). A clear, brief contemporary monograph is Welford (1990). An excellent summary of catastrophe optics, covering both the geometric limit and diffractive effects, is Berry (1982).

Bibliography

Berry, M. V. 1990. Physics Today 43(12), 34.
Berry, M. V. & Upstill, C. 1980. Progress in Optics 18.
Blandford, R. D. & Narayan, R. 1992. Cosmological Applications of Gravitational Lenses. Ann. Rev. Astr. Astrophys. 30, 311.
Born, M. & Wolf, E. 1970. Principles of Optics. Oxford: Oxford University Press.
Goldstein, H. 1980. Classical Mechanics. New York: Addison Wesley.
Guenther, R. 1990. Modern Optics. New York: Wiley.
Hecht, E. 1987. Optics. New York: Wiley.
Klein, M. V. & Furtak, T. E. 1986. Optics. New York: Wiley.
Schneider, P., Ehlers, J. & Falco, E. E. 1999. Gravitational Lenses. Berlin: Springer-Verlag.
Welford, W. T. 1990. Optics. Oxford: Oxford Science Publications.

Contents

7 Diffraction
  7.1 Overview
  7.2 Helmholtz-Kirchhoff Integral
    7.2.1 Diffraction by an Aperture
    7.2.2 Spreading of the Wavefront: Fresnel and Fraunhofer Regions
  7.3 Fraunhofer Diffraction
    7.3.1 Diffraction Grating
    7.3.2 Airy Pattern of a Circular Aperture: Hubble Space Telescope
    7.3.3 Babinet’s Principle
  7.4 Fresnel Diffraction
    7.4.1 Rectangular Aperture, Fourier Integrals and Cornu Spiral
    7.4.2 Unobscured Plane Wave
    7.4.3 Fresnel Diffraction by a Straight Edge: Lunar Occultation of a Radio Source
    7.4.4 Circular Apertures: Fresnel Zones and Zone Plates
  7.5 Paraxial Fourier Optics
    7.5.1 Coherent Illumination
    7.5.2 Point Spread Functions
    7.5.3 Abbé’s Description of Image Formation by a Thin Lens
    7.5.4 Phase Contrast Microscopy
    7.5.5 Gaussian Beams: Interferometric Gravitational-Wave Detectors
  7.6 Diffraction at a Caustic

Chapter 7
Diffraction
Version 0807.1.K, 12 Nov 2008
Please send comments, suggestions, and errata via email to [email protected] and to [email protected], or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 7.1 Reader’s Guide
• This chapter depends substantially on Secs. 6.1–6.4 of Chap. 6 (Geometric Optics).
• In addition, Sec. 7.6 of this chapter (diffraction at a caustic) depends on Sec. 6.5 of Chap. 6.
• Chapters 7 and 8 depend substantially on Secs. 7.1–7.5 of this chapter.
• Nothing else in this book relies on this chapter.

7.1 Overview

The previous chapter was devoted to the classical mechanics of wave propagation. We showed how a classical wave equation can be solved in the short-wavelength approximation to yield Hamilton’s dynamical equations. We showed that, when the medium is time-independent (as we shall require in this chapter), the frequency of a wave packet is constant. And for time-independent media, we imported a result from classical mechanics, the principle of stationary action, to show that the true geometric-optics rays coincide with paths along which the action, or the integral of the phase, is stationary [Eq. (6.39) and associated discussion]. Our physical interpretation of this result was that the waves do indeed travel along every path from some source to a point of observation, where they are added together; but they give a significant net contribution only when they can add coherently in phase, i.e. along the true rays. This is, essentially, Huygens’ model of wave propagation, or, in modern language, a path integral.

Huygens’ principle asserts that every point on a wave front acts as a source of secondary waves that combine so that their envelope constitutes the advancing wave front. This principle must be supplemented by two ancillary conditions: that the secondary waves are formed only in the forward direction, not backward, and that a π/2 phase shift be introduced into the secondary waves. The reason for the former condition is obvious; that for the latter, less so. We shall discuss both, together with the formal justification of Huygens’ construction, below.

We begin our exploration of the “wave mechanics” of optics in this chapter, and we shall continue it in Chapters 8 and 9. Wave mechanics differs increasingly from geometric optics as the reduced wavelength ƛ increases relative to the length scales R of the phase fronts and L of the medium’s inhomogeneities. The number of paths that can combine constructively increases, and the rays that connect two points become blurred. In quantum mechanics, we recognize this phenomenon as the uncertainty principle, and it is just as applicable to photons as to electrons.

Solving the wave equation exactly is very hard except in very simple circumstances. Geometric optics is one approximate method of solving it, a method that works well in the short-wavelength limit. In this chapter and the next two, we shall develop approximate techniques that work when the wavelength becomes longer and geometric optics fails.

In this book, we shall make a somewhat artificial distinction between phenomena that arise when an effectively infinite number of paths are involved, which we call diffraction and which we describe in this chapter, and those that arise when a few paths, or, more correctly, a few tight bundles of rays are combined, which we term interference, and whose discussion we defer to the next chapter.

In Sec. 7.2, we shall present the Fresnel-Helmholtz-Kirchhoff theory that underlies most elementary discussions of diffraction, and we shall then distinguish between Fraunhofer diffraction (the limiting case when spreading of the wavefront mandated by the uncertainty principle is very important) and Fresnel diffraction (where wavefront spreading is a modest effect and geometric optics is beginning to work, at least roughly). In Sec. 7.3, we shall illustrate Fraunhofer diffraction by computing the expected angular resolution of the Hubble Space Telescope, and in Sec. 7.4, we shall analyze Fresnel diffraction and illustrate it using lunar occultation of radio waves and zone plates.

Many contemporary optical devices can be regarded as linear systems that take an input wave signal and transform it into a linearly related output. Their operation, particularly as image-processing devices, can be considerably enhanced by processing the signal in the Fourier domain, a procedure known as spatial filtering. In Sec. 7.5 we shall introduce a tool for analyzing such devices: paraxial Fourier optics, a close analog of the paraxial geometric optics of Sec. 6.4. We shall use paraxial Fourier optics in Sec. 7.5 to analyze the phase contrast microscope and to develop the theory of Gaussian beams, the kind of light beam produced by lasers when their optically resonating cavities have spherical mirrors. Finally, in Sec. 7.6 we shall analyze diffraction near a caustic of a wave’s phase field, a location where geometric optics predicts a divergent magnification of the wave (Sec. 6.5 of the preceding chapter). As we shall see, diffraction makes the magnification finite and produces an oscillating intensity pattern (interference fringes).

7.2 Helmholtz-Kirchhoff Integral

In this section, we shall derive a formalism for describing diffraction. We shall restrict attention to the simplest (and, fortunately, the most widely useful) case: a monochromatic scalar wave
$$\Psi = \psi(\mathbf{x})\,e^{-i\omega t} \qquad (7.1a)$$
with field variable ψ that satisfies the Helmholtz equation
$$\nabla^2\psi + k^2\psi = 0 \quad \text{with} \quad k = \omega/c \qquad (7.1b)$$

except at boundaries. Generally Ψ will represent a real-valued physical quantity, but for mathematical convenience we give it a complex representation and take the real part of Ψ when making contact with physical measurements. This is in contrast to a quantum mechanical wave function satisfying the Schrödinger equation, which is an intrinsically complex function. We shall assume that the wave (7.1) is monochromatic (constant ω) and nondispersive, and that the medium is isotropic and homogeneous (constant phase and group speed c), so k is constant. Each of these assumptions can be relaxed, but with some technical penalty. The scalar formalism that we shall develop based on Eq. (7.1b) is fully valid for weak sound waves in a fluid, e.g. air (Chap. 15). It is also fairly accurate, but not precisely so, for the most widely used application of diffraction theory: electromagnetic waves in vacuo or in a medium with homogeneous dielectric constant. In this case ψ can be regarded as one of the Cartesian components of the electric field vector, e.g. Ex (or equally well a Cartesian component of the vector potential or the magnetic field vector). In vacuo or in a homogeneous dielectric medium, Maxwell’s equations imply that this ψ = Ex satisfies the scalar wave equation and thence, for fixed frequency, the Helmholtz equation (7.1b). However, when the wave hits a boundary of the medium (e.g. the edge of an aperture, or the surface of a mirror or lens), its interaction with the boundary can couple the various components of E, thereby invalidating the simple scalar theory we shall develop.
Fortunately, this polarizational coupling is usually very weak in the paraxial (small-angle) limit, and also under a variety of other circumstances, thereby making our simple scalar formalism quite accurate.¹

The Helmholtz equation (7.1b) is an elliptic, linear, partial differential equation, and we can thus express the value ψP of ψ at any point P inside some closed surface S as an integral over S of some linear combination of ψ and its normal derivative; see Fig. 7.1. To derive such an expression, we first augment the actual wave ψ in the interior of S with a second solution of the Helmholtz equation, namely
$$\psi_0 = \frac{e^{ikr}}{r} . \qquad (7.2)$$

This is a spherical wave originating from the point P, and r is the distance from P to the point where ψ0 is evaluated.

¹ For a formulation of diffraction that takes account of these polarization effects, see, e.g., Chap. 11 of Born and Wolf (1999).

Fig. 7.1: Geometry for the Helmholtz-Kirchhoff integral, which expresses the field ψP at the point P in terms of an integral of the field and its normal derivative on the surrounding surface S. The small sphere So surrounds the observation point P, and V is the volume bounded by S and So. The aperture Q, the vectors n and n′ at the aperture, the incoming wave ψ′, and the point P′ are irrelevant to the formulation of the Helmholtz-Kirchhoff integral, but appear in subsequent applications.

Next we apply Gauss’s theorem, Eq. (1.71a), to the vector field ψ∇ψ0 − ψ0∇ψ and invoke Eq. (7.1b), thereby arriving at Green’s theorem:
$$\int_{S+S_o} (\psi\nabla\psi_0 - \psi_0\nabla\psi)\cdot d\mathbf{\Sigma} = -\int_V (\psi\nabla^2\psi_0 - \psi_0\nabla^2\psi)\,dV = 0 . \qquad (7.3)$$

Here we have introduced a small sphere So of radius ro surrounding P (Fig. 7.1); V is the volume between the two surfaces So and S; and for future convenience we have made an unconventional choice of direction for the integration element dΣ: it points into V instead of outward, thereby producing the minus sign in the second expression in Eq. (7.3). As we let the radius ro decrease to zero, we find that ψ∇ψ0 − ψ0∇ψ → −ψP r̂/ro² + O(1/ro), where r̂ is the unit vector pointing radially away from P (i.e., along dΣ on So) and ψP ≡ ψ(P); so the integral over So becomes −4πψP. Rearranging, we obtain
$$\psi_P = \frac{1}{4\pi}\int_S \left(\psi\,\nabla\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\,\nabla\psi\right)\cdot d\mathbf{\Sigma} . \qquad (7.4)$$

Equation (7.4), known as the Helmholtz-Kirchhoff integral, is the promised expression for the field ψ at some point P in terms of a linear combination of its value and normal derivative on a surrounding surface. The specific combination of ψ and dΣ·∇ψ that appears in this formula is perfectly immune to contributions from any wave that might originate at P and pass outward through S (any “outgoing wave”): for ψ ∝ e^{ikr}/r the integrand ψ∇ψ0 − ψ0∇ψ vanishes identically. The integral thus is influenced only by waves that enter V through S, propagate through V, and then leave through S. [There cannot be sources inside S, except conceivably at P, because we assumed ψ satisfies the source-free Helmholtz equation throughout V.] If P is many wavelengths away from the boundary S, then, to high accuracy, the integral is influenced by the waves ψ only when they are entering through S (when they are incoming), and not when they are leaving (outgoing). This fact is important for applications, as we shall see.
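A quick numerical check can make Eq. (7.4) concrete. The sketch below is our own illustration, not part of the text: it assumes a plane wave ψ = exp(ik·x), which satisfies the source-free Helmholtz equation everywhere, takes S to be a unit sphere centred on the origin with dΣ pointing into V as in Eq. (7.3), and evaluates the surface integral by Gauss-Legendre quadrature in cos θ and uniform sampling in φ. The function name `helmholtz_kirchhoff`, the wave number, the field point P and the grid sizes are all hypothetical choices.

```python
import numpy as np

def helmholtz_kirchhoff(psi, grad_psi, k, p, radius=1.0, n_mu=200, n_phi=400):
    """Evaluate Eq. (7.4) on a sphere S centred on the origin.

    As in the text, dSigma points into the enclosed volume V (i.e. inward)."""
    mu, w_mu = np.polynomial.legendre.leggauss(n_mu)      # nodes in mu = cos(theta)
    phi = 2 * np.pi * np.arange(n_phi) / n_phi              # periodic, equally spaced
    MU, PHI = np.meshgrid(mu, phi, indexing="ij")
    s = np.sqrt(1 - MU**2)
    outward = np.stack([s * np.cos(PHI), s * np.sin(PHI), MU], axis=-1)
    x = radius * outward                                    # points on S
    area = np.outer(w_mu, np.full(n_phi, 2 * np.pi / n_phi)) * radius**2
    d = x - p                                               # vector from P to the surface point
    r = np.linalg.norm(d, axis=-1)
    G = np.exp(1j * k * r) / r                              # e^{ikr}/r
    grad_G = ((1j * k - 1 / r) * G)[..., None] * d / r[..., None]
    integrand = psi(x)[..., None] * grad_G - G[..., None] * grad_psi(x)
    inward_component = np.sum(integrand * (-outward), axis=-1)
    return np.sum(inward_component * area) / (4 * np.pi)

k = 10.0
k_vec = k * np.array([0.0, 0.6, 0.8])      # assumed propagation direction

def psi(x):
    # plane wave: a source-free solution of the Helmholtz equation
    return np.exp(1j * (x @ k_vec))

def grad_psi(x):
    return 1j * psi(x)[..., None] * k_vec

P = np.array([0.2, -0.1, 0.3])             # any point inside S
approx = helmholtz_kirchhoff(psi, grad_psi, k, P)
print(abs(approx - psi(P)))
```

For any P inside S the quadrature should reproduce ψP to many digits; if P is moved outside S, the same integral should instead tend to zero, because the sphere then encloses no singularity of e^{ikr}/r.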

7.2.1 Diffraction by an Aperture

Next, let us suppose that some aperture Q of size much larger than a wavelength but much smaller than the distance to P is illuminated by a distant wave source (Fig. 7.1). (If the aperture were comparable to a wavelength in size, or if part of it were only a few wavelengths from P, then polarizational coupling effects at the aperture would be large;¹ our assumption avoids this complication.) Let the surface S pass through Q, and denote by ψ′ the wave incident on Q. We assume that the diffracting aperture has a local and linear effect on ψ′. More specifically, we suppose that the wave transmitted through the aperture is given by
$$\psi_Q = t\,\psi' , \qquad (7.5)$$

where t is a complex transmission function that varies over the aperture. In practice, t is usually zero (completely opaque region) or unity (completely transparent region). However, t can also represent a variable phase factor when, for example, the aperture comprises a medium (lens) of variable thickness and of different refractive index from that of the homogeneous medium outside the aperture, as is the case in microscopes, telescopes, and other optical devices. What this formalism does not allow, though, is that ψQ at any point on the aperture be influenced by the wave’s interaction with other parts of the aperture. For this reason, not only the aperture, but any structure that it contains must be many wavelengths across. To give a specific example of what might go wrong, suppose that electromagnetic radiation is normally incident upon a wire grid. A surface current will be induced in each wire by the wave’s electric field, and that current will produce a secondary wave that cancels the primary wave immediately behind the wire, thereby “eclipsing” the wave. If the secondary wave from the current flowing in the next wire is comparable with the first wire’s secondary wave, then the transmitted net wave field will get modified in a complex, polarization-dependent manner. Such modifications are negligible if the wires are many wavelengths apart.

Let us now use the Helmholtz-Kirchhoff formula (7.4) to compute the field at P due to the wave ψQ = tψ′ transmitted through the aperture. Let the surface S of Fig. 7.1 comprise the aperture Q, a sphere of radius R ≫ r centered on P, and the linear extension of the aperture to meet the sphere; and assume that the only incoming waves are those which pass through the aperture.
Then, as noted above, when the incoming waves subsequently pass on outward through S, they contribute negligibly to the integral (7.4), so the only contribution is from the aperture itself.² On the aperture, because kr ≫ 1, we can write ∇(e^{ikr}/r) ≃ −ik n e^{ikr}/r, where n is a unit vector pointing towards P (Fig. 7.1). Similarly, we write ∇ψ ≃ ik t n′ψ′, where n′ is a unit vector along the direction of propagation of the incident wave (and where our assumption that anything in the aperture varies on scales long compared to ƛ = 1/k permits us to ignore the gradient of t). Inserting these gradients into the Helmholtz-Kirchhoff formula, we obtain
$$\psi_P = -\frac{ik}{2\pi}\int_Q d\mathbf{\Sigma}\cdot\left(\frac{\mathbf{n}+\mathbf{n}'}{2}\right)\frac{e^{ikr}}{r}\,t\,\psi' . \qquad (7.6)$$

² Actually, the incoming waves will diffract around the edge of the aperture onto the back side of the screen that bounds the aperture, i.e. the side facing P; and this diffracted wave will contribute to the Helmholtz-Kirchhoff integral in a polarization-dependent way; see Chap. 11 of Born and Wolf (1999). However, because the diffracted wave decays along the screen with an e-folding length of order a wavelength, its contribution will be negligible if the aperture is many wavelengths across and P is many wavelengths away from the edge of the aperture, as we have assumed.

Equation (7.6) can be used to compute the wave from a small aperture at any point P in the far field. It has the form of an integral transform of the incident field variable, ψ′, where the integral is over the area of the aperture. The kernel of the transform is the product of several factors. There is a factor 1/r, which guarantees that the flux falls off as the inverse square of the distance to the aperture, as we might have expected. There is also a phase factor −i e^{ikr}, which advances the phase of the wave by an amount equal to the optical path length between the element dΣ of the aperture and P, minus π/2 (the phase of −i). The amplitude and phase of the wave ψ′ can also be changed by the transmission function t. Finally, there is the geometric factor dΣ̂·(n + n′)/2 (with dΣ̂ the unit vector normal to the aperture). This is known as the obliquity factor, and it ensures that the waves from the aperture propagate only forward with respect to the original wave and not backward (not in the direction n = −n′). More specifically, this factor prevents the backward-propagating secondary wavelets in Huygens’ construction from reinforcing each other to produce a backscattered wave. When dealing with paraxial Fourier optics (Sec. 7.5), we can usually set the obliquity factor to unity.

It is instructive to specialize to a point source seen through a small diffracting aperture. If we suppose that the source has unit strength and is located at P′, a distance r′ before the aperture Q (Fig. 7.1), then ψ′ = −e^{ikr′}/4πr′, and ψP can be written in the symmetric form
$$\psi_P = \int_Q \frac{e^{ikr}}{4\pi r}\; i\,t\,(\mathbf{k}+\mathbf{k}')\cdot d\mathbf{\Sigma}\; \frac{e^{ikr'}}{4\pi r'} , \qquad (7.7)$$
where k ≡ kn and k′ ≡ kn′.

We can think of this expression as the Green’s function response at P to a δ-function source at P′. Alternatively, we can regard it as a propagator from P′ to P by way of the aperture.
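For normal incidence (n′ along the aperture normal), the obliquity factor dΣ̂·(n + n′)/2 reduces to (1 + cos θ)/2, where θ is the angle between n and the normal. The short sketch below is an illustration with hypothetical helper names, assuming the aperture lies in the x-y plane; it evaluates the factor for a few directions, showing that it is unity straight ahead, about 0.9975 at θ = 0.1 rad, and zero in the backward direction n = −n′, which is why it can usually be set to unity in paraxial work.

```python
import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def obliquity(n_to_p, n_incident, aperture_normal):
    """Obliquity factor dSigma_hat . (n + n')/2 appearing in Eq. (7.6)."""
    return float(np.dot(unit(aperture_normal), (unit(n_to_p) + unit(n_incident)) / 2))

aperture_normal = [0.0, 0.0, 1.0]   # assumed: aperture in the x-y plane, dSigma along +z
n_incident = [0.0, 0.0, 1.0]        # normally incident wave, n'
for theta in (0.0, 0.1, np.pi / 2, np.pi):
    n_to_p = [np.sin(theta), 0.0, np.cos(theta)]   # n, from the aperture element toward P
    print(f"theta = {theta:.3f}  obliquity = {obliquity(n_to_p, n_incident, aperture_normal):.4f}")
```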

7.2.2 Spreading of the Wavefront: Fresnel and Fraunhofer Regions

Equation (7.6) [or (7.7)] gives a general prescription for computing the diffraction pattern from an illuminated aperture. It is commonly used in two complementary limits, called “Fraunhofer” and “Fresnel”. Suppose that the aperture has linear size a (as in Fig. 7.2) and is roughly centered on the geometric ray from the source point P′ to the field point P. Consider the variations of the phase ϕ of the contributions to ψP that come from various places in the aperture. Using elementary trigonometry, we can estimate that locations on the aperture’s opposite sides produce phases at P that differ by ∆ϕ = k(ρ2 − ρ1) ∼ ka²/2ρ, where ρ1 and ρ2 are the distances of P from the two edges of the aperture and ρ is the distance from the center of the aperture. There are two limiting regions for ρ, depending on whether P’s so-called Fresnel length
$$r_F \equiv \left(\frac{2\pi\rho}{k}\right)^{1/2} = (\lambda\rho)^{1/2} \qquad (7.8)$$
(a surrogate for the distance ρ) is large or small compared to the aperture. When rF ≫ a (field point far from the aperture), the phase variation across the aperture, ∆ϕ ∼ ka²/2ρ, is ≪ π and can be ignored, so the contributions at P from different parts of the aperture are essentially in phase with each other. This is the Fraunhofer region. When rF ≪ a (near the aperture), the phase variation is ∆ϕ ≫ π and therefore is of utmost importance in determining the observed intensity³ pattern F ∝ |ψP|². This is the Fresnel region; see Fig. 7.2.

Fig. 7.2: Fraunhofer and Fresnel Diffraction. [Figure: an aperture of size a in a screen; in the Fraunhofer region, where rF = (λρ)^{1/2} > a, the beam spreads with opening angle ∼λ/a.]

We can use an argument familiar, perhaps, from quantum mechanics to deduce the qualitative form of the intensity patterns in these two regions. For simplicity, let the incoming wave be planar (r′ huge) and let it propagate perpendicular to the aperture as shown in Fig. 7.2. Then geometric optics (photons treated like classical particles) would predict that an opaque screen will cast a sharp shadow; the wave leaves the aperture plane as a beam with a sharp edge. However, wave optics insists that the transverse localization of the wave into a region of size ∆x ∼ a must produce a spread in its transverse wave vector, ∆kx ∼ 1/a (a momentum uncertainty ∆px = ħ∆kx ∼ ħ/a in the language of the Heisenberg uncertainty principle). This uncertain transverse wave vector produces, after propagating a distance ρ, a corresponding uncertainty (∆kx/k)ρ ∼ rF²/a in the beam’s transverse size. This uncertainty superposes incoherently on the aperture-induced size a of the beam to produce a net transverse beam size
$$\Delta x \sim \sqrt{a^2 + (r_F^2/a)^2} \sim \begin{cases} a & \text{if } r_F \ll a, \text{ i.e., } \rho \ll a^2/\lambda \quad \text{(Fresnel region)},\\[4pt] \dfrac{\lambda}{a}\,\rho & \text{if } r_F \gg a, \text{ i.e., } \rho \gg a^2/\lambda \quad \text{(Fraunhofer region)}. \end{cases} \qquad (7.9)$$

³ In optics it is conventional to use the word intensity to mean energy flux, F = dE/dAdt. Astronomers often mean by “intensity” I = dE/dAdtdΩ. The phrase “specific intensity” means, pretty universally, Iν = dE/dAdtdΩdν (d Energy / d everything). In Chaps. 7, 8 and 9 we shall follow the optics convention: intensity is the same as energy flux.

Fig. 7.3: The one-dimensional intensity diffraction pattern |ψ|² produced by a slit, t(x) = 1 for |x| < a/2 and t(x) = 0 for |x| > a/2. Four patterns are shown, each for a different value of rF/a = √(λz)/a. For rF/a = 0.05 (very near the slit; extreme Fresnel region), the intensity distribution resembles the slit itself: sharp edges at x = ±a/2, but with damped oscillations (interference fringes) near the edges. For rF/a = 2 (beginning of Fraunhofer region) there is a bright central peak and low-brightness, oscillatory side bands. As rF/a increases from 0.05 to 2, the pattern transitions (quite rapidly between rF/a = 0.5 and 2) from the Fresnel pattern to the Fraunhofer pattern. These intensity distributions are derived in Ex. 7.8. [Panels, each plotting intensity against x with ticks at −a/2, 0, a/2: rF/a = 0.05, 0.5 and 1 (Fresnel region); rF/a = 2 (Fraunhofer region).]

In the nearby, Fresnel region, the aperture creates a beam whose edges will have the same shape and size as the aperture itself, and will be reasonably sharp (but with some oscillatory blurring, associated with the wave-packet spreading, that we shall analyze below); see Fig. 7.3. Thus, in the Fresnel region the field behaves approximately as one would predict using geometric optics. By contrast, in the more distant, Fraunhofer region, wave-front spreading will cause the transverse size of the entire beam to grow linearly with distance; and, as illustrated in Fig. 7.3, the intensity pattern will differ markedly from the aperture’s shape. We shall analyze the distant, Fraunhofer region in Sec. 7.3, and the near, Fresnel region in Sec. 7.4.
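The transition shown in Fig. 7.3 can be reproduced numerically. The sketch below is an illustration under stated assumptions rather than the derivation of Ex. 7.8: it evaluates Eq. (7.6) for a slit in one dimension, with normally incident plane-wave illumination, the obliquity factor set to unity, and r expanded to second order in the transverse offset, so that the kernel becomes exp[ik(x − x′)²/2z] = exp[iπ(x − x′)²/rF²] with rF = (λz)^{1/2} from Eq. (7.8). (For scale, λ = 500 nm and z = 1 m give rF ≈ 0.71 mm.) The integral is done by brute-force midpoint quadrature and normalized to the unobstructed plane wave; the function name and sample count are hypothetical.

```python
import numpy as np

def slit_intensity(x, a, r_f, n=200_000):
    """Paraxial intensity at transverse position x behind a slit of width a.

    r_f = sqrt(lambda * z) is the Fresnel length of Eq. (7.8). The result is
    normalized so that an unobstructed plane wave has intensity 1."""
    h = a / n
    xp = -a / 2 + h * (np.arange(n) + 0.5)                 # midpoints across the slit
    kernel = np.exp(1j * np.pi * (x - xp) ** 2 / r_f**2)   # exp[ik(x - x')^2 / 2z]
    amplitude = kernel.sum() * h
    return abs(amplitude) ** 2 / r_f**2   # unobstructed wave gives |integral|^2 = r_f^2

a = 1.0
# Extreme Fresnel region, r_F/a = 0.05: nearly a geometric shadow of the slit
centre = slit_intensity(0.0, a, 0.05 * a)
edge = slit_intensity(a / 2, a, 0.05 * a)
# Fraunhofer region, r_F/a = 10: first null expected at x = lambda z / a = r_F^2 / a
r_f = 10 * a
null_ratio = slit_intensity(r_f**2 / a, a, r_f) / slit_intensity(0.0, a, r_f)
print(centre, edge, null_ratio)
```

Deep in the Fresnel region the on-axis intensity comes out close to that of the unobstructed wave, and the intensity at the geometric edge close to 1/4, the familiar straight-edge value; in the Fraunhofer region the intensity nearly vanishes at x = rF²/a, the first null of the sinc² pattern of Eq. (7.12c) in Sec. 7.3.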

Fig. 7.4: Geometry for computing the path length between a point Q in the aperture and the point of observation P. The transverse vector x is used to identify Q in our Fraunhofer analysis (Sec. 7.3), and x′ is used in our Fresnel analysis (Sec. 7.4). [Figure labels: path length, x′, x, ρ, z, k, optic axis.]

7.3 Fraunhofer Diffraction

Consider the Fraunhofer region of strong wavefront spreading, rF ≫ a, and for simplicity specialize to the case of an incident plane wave with wave vector k orthogonal to the aperture plane; see Fig. 7.4. Regard the line along k through the center of the aperture Q as the “optic axis”; identify points in the aperture by their transverse two-dimensional vectorial separation x from that axis; identify P by its distance ρ from the aperture center and its two-dimensional transverse separation ρθ from the optic axis; and restrict attention to small-angle diffraction, |θ| ≪ 1. Then the geometric path length between P and a point x on Q [the length denoted r in Eq. (7.6)] can be expanded as
$$\text{Path length} = r = (\rho^2 - 2\rho\,\mathbf{x}\cdot\boldsymbol{\theta} + x^2)^{1/2} \simeq \rho - \mathbf{x}\cdot\boldsymbol{\theta} + \frac{x^2}{2\rho} + \dots \qquad (7.10)$$

cf. Fig. 7.4. The first term in this expression, ρ, just contributes an x-independent phase e^{ikρ} to the ψP of Eq. (7.6). The third term, x²/2ρ, contributes a phase variation that is ≪ 1 here in the Fraunhofer region (but that will be important in the Fresnel region, Sec. 7.4 below). Therefore, in the Fraunhofer region we can retain just the second term, −x·θ, and write Eq. (7.6) in the form
$$\psi_P(\boldsymbol{\theta}) \propto \int e^{-ik\mathbf{x}\cdot\boldsymbol{\theta}}\, t(\mathbf{x})\, d\Sigma \equiv \tilde t(\boldsymbol{\theta}) , \qquad (7.11a)$$

where dΣ is the surface area element in the aperture plane and we have dropped a constant phase factor and constant multiplicative factors. Thus, ψP(θ) in the Fraunhofer region is given by the two-dimensional Fourier transform, denoted t̃(θ), of the transmission function t(x), with x made dimensionless in the transform by multiplying by k = 2π/λ. The intensity distribution F = dE/dAdt of the diffracted waves is
$$F(\boldsymbol{\theta}) = \overline{\left(\Re[\psi_P(\boldsymbol{\theta})\,e^{-i\omega t}]\right)^2} = \tfrac{1}{2}|\psi_P(\boldsymbol{\theta})|^2 \propto |\tilde t(\boldsymbol{\theta})|^2 , \qquad (7.11b)$$
where ℜ means take the real part, and the bar means average over time.

As an example, the bottom curve in Fig. 7.3 above shows the intensity distribution from a slit,
$$t(x) = H_1(x) \equiv \begin{cases} 1 & |x| < a/2,\\ 0 & |x| > a/2, \end{cases} \qquad (7.12a)$$
for which
$$\psi_P(\theta) \propto \tilde H_1 \propto \int_{-a/2}^{a/2} e^{ikx\theta}\,dx \propto \operatorname{sinc}\left(\tfrac{1}{2}ka\theta\right), \qquad (7.12b)$$
$$F(\theta) \propto \operatorname{sinc}^2\left(\tfrac{1}{2}ka\theta\right). \qquad (7.12c)$$

Here sinc(ξ) ≡ sin(ξ)/ξ. The bottom intensity curve is almost, but not quite, described by Eq. (7.12c); the differences (e.g., the not-quite-zero value of the minimum between the central peak and the first side lobe) are due to the field point not being fully in the Fraunhofer region: rF/a = 2 rather than rF/a ≫ 1.

It is usually uninteresting to normalise Fraunhofer diffraction patterns. On those rare occasions when the absolute value of the observed flux is needed, rather than just the angular shape of the diffraction pattern, it typically can be derived most easily from conservation of the total wave energy. This is why we ignore the proportionality factors in the above diffraction patterns.

All of the techniques for handling Fourier transforms (which should be familiar from quantum mechanics and elsewhere) can be applied to derive Fraunhofer diffraction patterns. In particular, the convolution theorem turns out to be very useful. It says that the Fourier transform of the convolution
$$f_2 \otimes f_1 \equiv \int_{-\infty}^{+\infty} f_2(\mathbf{x}-\mathbf{x}')\,f_1(\mathbf{x}')\,d\Sigma' \qquad (7.13)$$
of two functions f1 and f2 is equal to the product f̃2(θ)f̃1(θ) of their Fourier transforms, and conversely. [Here and throughout this chapter we use the optics version of a Fourier transform in which two-dimensional transverse position x is made dimensionless via the wave number k; Eq. (7.11a) above.] As an example of the convolution theorem’s power, we shall compute the diffraction pattern produced by a diffraction grating:

7.3.1 Diffraction Grating

A diffraction grating can be modeled as a finite series of alternating transparent and opaque, long, parallel stripes. Let there be N transparent and N opaque stripes, each of width a ≫ λ (Fig. 7.5a), and idealize them as infinitely long so their diffraction pattern is one-dimensional. We shall outline how to use the convolution theorem to derive their Fraunhofer diffraction pattern. The details are left as an exercise for the reader (Ex. 7.4).

Fig. 7.5: (a) Diffraction grating t(x) formed by N alternating transparent and opaque stripes, each of width a. (b) Decomposition of this finite grating into an infinite series of equally spaced δ-functions that are convolved (the symbol ⊗) with the shape of an individual transparent stripe and then multiplied (the symbol ×) by a large aperture function covering N such stripes; cf. Eq. (7.16). (c) The resulting Fraunhofer diffraction pattern t̃(θ), shown schematically as the Fourier transform of a series of δ-functions multiplied by the Fourier transform of a single stripe and then convolved with the transform of the large aperture. (d) The intensity F ∝ |t̃(θ)|² of this diffraction pattern. [Panel labels: stripe width a, period 2a, total width 2Na; H2N(x); H1(x); H̃1(θ).]

Our idealized N-slit grating can be regarded as an infinite series of δ-functions with separation 2a convolved with the transmission function H1 [Eq. (7.12a)] for a single slit of width a,
$$\int_{-\infty}^{\infty}\left[\sum_{n=-\infty}^{+\infty}\delta(\xi-2an)\right]H_1(x-\xi)\,d\xi , \qquad (7.14)$$
and then multiplied by an aperture function with width 2Na,
$$H_{2N}(x) \equiv \begin{cases} 1 & |x| < Na,\\ 0 & |x| > Na. \end{cases} \qquad (7.15)$$
More explicitly,
$$t(x) = \left(\int_{-\infty}^{\infty}\left[\sum_{n=-\infty}^{+\infty}\delta(\xi-2an)\right]H_1(x-\xi)\,d\xi\right) H_{2N}(x) , \qquad (7.16)$$

which is shown graphically in Fig. 7.5(b). Let us apply the convolution theorem to expression (7.16) for our transmission grating. The diffraction pattern of the infinite series of δ-functions with spacing 2a is itself an infinite series of δ-functions with reciprocal spacing 2π/(2ka) = λ/2a (see the hint in Ex. 7.4). This must be multiplied by the Fourier transform H̃1(θ) ∝ sinc(½kaθ) of the single narrow slit, and then be convolved with the Fourier transform H̃2N(θ) ∝ sinc(Nkaθ) of the wide slit. The result is shown schematically in Fig. 7.5(c). (Each of the transforms is real, so the one-dimensional functions shown in the figure fully embody them.)

The resulting diffracted intensity, F ∝ |t̃(θ)|² (as computed in Ex. 7.4), is shown in Fig. 7.5(d). The grating has channeled the incident radiation into a few equally spaced beams with directions θ = πp/ka, where p is an integer known as the order of the beam. Each of these beams has a shape given by |H̃2N(θ)|²: a sharp central peak with half width (distance from center of peak to first null of the intensity) λ/2Na, followed by a set of side lobes whose intensities are ∝ N⁻¹. The fact that the deflection angles θ = πp/ka = pλ/2a of these beams are proportional to λ underlies the use of diffraction gratings for spectroscopy.

It is of interest to ask what the wavelength resolution of such an idealized grating might be. If one focuses attention on the p’th order beams at two wavelengths λ and λ + δλ (which are located at θ = pλ/2a and p(λ + δλ)/2a), then one can distinguish the beams from each other when their separation δθ = pδλ/2a is at least as large as the angular distance λ/2Na between the maximum of each beam’s diffraction pattern and its first minimum, i.e., when
$$\frac{\lambda}{\delta\lambda} \lesssim R \equiv Np . \qquad (7.17)$$

R is called the grating’s chromatic resolving power. Real gratings are not this simple. First, they usually work not by modulating the amplitude of the incident radiation in this simple manner, but instead by modulating the phase. Second, the manner in which the phase is modulated is such as to channel most of the incident power into a particular order, a technique known as blazing. Third, gratings are often used in reflection rather than transmission. Despite these complications, the principles of a real grating’s operation are essentially the same as those of our idealized grating. Manufactured gratings typically have N ≳ 10,000, giving a wavelength resolution for visual light that can be as small as λ/10⁵ ∼ 10 pm, i.e. 10⁻¹¹ m.
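The predictions above for the idealized grating (beams at θ = pλ/2a, each with half width λ/2Na) can be checked by evaluating the one-dimensional version of Eq. (7.11a) directly, by brute-force summation rather than through the convolution theorem. In the sketch below, the wavelength unit, a = 5λ, N = 8 and the sampling density are illustrative assumptions, and `grating_intensity` is a hypothetical helper.

```python
import numpy as np

# Units in which the wavelength is 1; a and N are illustrative choices (assumptions).
lam = 1.0
k = 2 * np.pi / lam
a = 5.0 * lam   # stripe width, a >> lambda
N = 8           # N transparent stripes of width a, period 2a, total width 2Na

def grating_intensity(theta, samples_per_stripe=400):
    """|t~(theta)|^2 for the idealized grating, via a midpoint-rule 1-D Eq. (7.11a)."""
    h = a / samples_per_stripe
    offsets = h * (np.arange(samples_per_stripe) + 0.5)
    left_edges = -N * a + 2 * a * np.arange(N)            # open stripes start every 2a
    x = (left_edges[:, None] + offsets[None, :]).ravel()   # sample points in open stripes
    theta = np.atleast_1d(theta)
    amplitude = np.exp(-1j * k * np.outer(theta, x)).sum(axis=1) * h
    return np.abs(amplitude) ** 2

I0 = grating_intensity(0.0)[0]
first_order = grating_intensity(lam / (2 * a))[0] / I0       # p = 1 beam, theta = lambda/2a
second_order = grating_intensity(2 * lam / (2 * a))[0] / I0  # p = 2 beam
first_null = grating_intensity(lam / (2 * N * a))[0] / I0    # half width lambda/2Na
print(first_order, 4 / np.pi**2, second_order, first_null)
```

With equal transparent and opaque widths, the first-order beam peaks at sinc²(π/2) = 4/π² ≈ 0.405 of the central peak’s intensity, and the second-order beam vanishes because the single-stripe envelope sinc(½kaθ) has a zero at θ = λ/a; the null at θ = λ/2Na confirms the half width. For the resolving power, Eq. (7.17) with N = 10,000 and p = 1 gives δλ ≈ λ/10⁴ = 50 pm at λ = 500 nm.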

7.3.2 Airy Pattern of a Circular Aperture: Hubble Space Telescope

The Hubble Space Telescope was launched in April 1990 to observe planets, stars and galaxies above the earth’s atmosphere. One reason for going into space is to avoid the irregular refractive-index variations in the earth’s atmosphere, known generically as seeing, which degrade the quality of the images. (Another reason is to observe the ultraviolet part of the spectrum, which is absorbed in the earth’s atmosphere.) Seeing typically limits the angular resolution of Earth-bound telescopes at visual wavelengths to ∼ 0.5″. We wish to compute how much the angular resolution improves by going into space. As we shall see, the computation is essentially an exercise in Fraunhofer diffraction theory.

The essence of the computation is to idealise the telescope as a circular aperture with diameter equal to the diameter of the primary mirror. Light from this mirror is actually reflected onto a secondary mirror and then follows a complex optical path before being focused onto a variety of detectors. However, this path is irrelevant to the angular resolution. The purpose of the optics is merely to bring the Fraunhofer-region light to a focus close to the mirror, in order to produce an instrument that is compact enough to be launched, and to match the sizes of stars’ images to the pixel size on the detector. In doing so, however, the optics leaves the angular resolution unchanged; the resolution is the same as if we were to observe the light that passes through the primary mirror’s circular aperture far beyond the mirror, in the Fraunhofer region.

If the telescope aperture were very small, for example a pin hole, then the light from a point source (a very distant star) would create a broad diffraction pattern, and the telescope’s angular resolution would be correspondingly poor. As we increase the diameter of the aperture, we still see a diffraction pattern, but its angular width diminishes. Using these considerations, we can compute how well the telescope can distinguish neighboring stars. We do not expect it to resolve them fully if they are closer together on the sky than the angular width of the diffraction pattern. Of course, optical imperfections and pointing errors in a real telescope may degrade the image quality even further, but this is the best that we can do, limited only by the uncertainty principle.

The calculation of the Fraunhofer amplitude far from the aperture is straightforward (Fig. 7.5):
$$\psi(\theta) \propto \int_{\text{disk with diameter } D} e^{-ik\mathbf{x}\cdot\boldsymbol{\theta}}\,d\Sigma \propto \operatorname{jinc}\left(\frac{kD\theta}{2}\right), \qquad (7.18)$$
where D is the diameter of the aperture (i.e., of the telescope’s primary mirror), θ ≡ |θ| is the angle from the optic axis, and jinc(x) ≡ J1(x)/x with J1 the Bessel function of order one. The flux from the star observed at angle θ is therefore ∝ jinc²(kDθ/2). This intensity pattern, known as the Airy pattern, is shown in Fig. 7.6. There is a central “Airy disk” surrounded by a circle where the flux vanishes, and then further surrounded by a series of concentric rings whose flux diminishes with radius. Only 16 percent of the total light falls outside the central Airy disk. The angular radius θA of the Airy disk, i.e. the radius of the dark circle surrounding it, is determined by the first zero of J1(kDθ/2):
$$\theta_A = 1.22\,\lambda/D . \qquad (7.19)$$

Fig. 7.6: Airy diffraction pattern produced by a circular aperture. [Intensity plotted against Dθ/λ; the first dark ring lies at Dθ/λ = 1.22.]

A conventional, though essentially arbitrary, criterion for angular resolution is to say that two point sources can be distinguished if they are separated in angle by more than θA. For the Hubble Space Telescope, D = 2.4 m and θA ∼ 0.04″ at visual wavelengths, which is over ten times better than is achievable on the ground with conventional (non-adaptive) optics.

Initially, there was a serious problem with Hubble’s telescope optics. The hyperboloidal primary mirror was ground to the wrong shape, so rays parallel to the optic axis did not pass through a common focus after reflection off a convex hyperboloidal secondary mirror. This defect, known as spherical aberration, created blurred images. However, it was possible to correct this error in subsequent instruments in the optical train, and the Hubble Space Telescope became the most successful telescope of all time, transforming our view of the Universe.
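The factor 1.22 in Eq. (7.19), the 16 percent of light outside the Airy disk, and Hubble’s θA can all be reproduced from the Bessel functions. To keep the sketch dependent only on NumPy, it evaluates J0 and J1 from their integral representation rather than calling a special-function library, finds the first zero of J1 by bisection, and uses Rayleigh’s encircled-energy result, 1 − J0²(x) − J1²(x) for the fraction of the Airy pattern’s power inside the radius labelled by x = kDθ/2. The wavelength of 500 nm is our assumption for “visual”, and the helper names are hypothetical.

```python
import numpy as np

def bessel_j(n, x, m=256):
    """J_n(x) = (1/2pi) * integral_0^{2pi} cos(n*tau - x*sin(tau)) dtau (integer n, scalar x).

    The integrand is smooth and periodic, so an equally spaced average converges rapidly."""
    tau = 2 * np.pi * np.arange(m) / m
    return float(np.mean(np.cos(n * tau - x * np.sin(tau))))

def first_zero_j1(lo=3.0, hi=4.5, iters=60):
    """Bisection for the first positive zero of J_1, which lies between 3 and 4.5."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bessel_j(1, lo) * bessel_j(1, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

x1 = first_zero_j1()                 # k D theta_A / 2 = x1
airy_coefficient = x1 / np.pi        # theta_A = (x1/pi) * lambda / D, cf. Eq. (7.19)
# fraction of the Airy pattern's power outside the first dark ring: J0^2 + J1^2 at x1
fraction_outside = bessel_j(0, x1) ** 2 + bessel_j(1, x1) ** 2

D, lam = 2.4, 500e-9                 # Hubble primary mirror; assumed visual wavelength
theta_A_arcsec = airy_coefficient * lam / D * (180 / np.pi) * 3600
print(f"{airy_coefficient:.4f}  {fraction_outside:.3f}  {theta_A_arcsec:.3f} arcsec")
```

At λ = 500 nm this gives θA of roughly 0.05″ for D = 2.4 m, the same order as the ∼0.04″ quoted above.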

7.3.3

Babinet’s Principle

Suppose that monochromatic light falls normally onto a large circular aperture with diameter D. At distances $z \lesssim D^2/\lambda$ (i.e., $r_F \lesssim D$), the transmitted light will be collimated into a beam with diameter D, and at larger distances, the beam will become conical with opening angle λ/D and flux distribution given by the Airy diffraction pattern of Fig. 7.6. Now, place into this aperture a significantly smaller object (size a ≪ D; Fig. 7.7) with transmissivity t1(x), for example an opaque star-shaped object. This object will produce a Fraunhofer diffraction pattern with opening angle λ/a ≫ λ/D that extends well beyond the large aperture’s beam. Outside that beam, the diffraction pattern will be insensitive to the shape and size of the large aperture because only the small object can diffract light to these large angles; so the diffracted flux will be $F_1(\theta) \propto |\tilde t_1(\theta)|^2$. Suppose, next, that we replace the small object by one with a complementary transmissivity t2, complementary in the sense that

$$t_1(\mathbf{x}) + t_2(\mathbf{x}) = 1. \tag{7.20a}$$

For example, we replace a small, opaque star-shaped object by an opaque screen that fills the original, large aperture except for a star-shaped hole. This new, complementary object will produce a diffraction pattern $F_2(\theta) \propto |\tilde t_2(\theta)|^2$. Outside the large aperture’s beam, this pattern again is insensitive to the size and shape of the large aperture, i.e., insensitive to the 1 in $t_2 = 1 - t_1$; so at these large angles, $\tilde t_2(\theta) = -\tilde t_1(\theta)$, which implies that the intensity diffraction pattern of the original object and the new, complementary object will be the same, outside the large aperture’s beam:

$$F_2(\theta) \propto |\tilde t_2(\theta)|^2 = |\tilde t_1(\theta)|^2 \propto F_1(\theta). \tag{7.20b}$$
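The sign-flip argument behind Eq. (7.20b) can be checked exactly in a discrete, periodic, one-dimensional analogue. In this analogue the large aperture is the whole sampling window, so its "beam" occupies only the zero-frequency bin of a discrete Fourier transform. The object's pixel pattern is a hypothetical choice, and the O(N²) transform is written out by hand to stay within the standard library.

```python
import cmath
import math


def dft(values):
    """Plain O(N^2) discrete Fourier transform, standing in for the far-field map t(x) -> t~(theta)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * m * j / n) for j, v in enumerate(values))
            for m in range(n)]


N = 256                                                        # samples across the large aperture
object_pixels = set(range(100, 110)) | set(range(115, 118))   # hypothetical small, irregular object
t1 = [0.0 if j in object_pixels else 1.0 for j in range(N)]   # opaque object inside the aperture
t2 = [1.0 - v for v in t1]                                     # complement: screen with a same-shaped hole

T1, T2 = dft(t1), dft(t2)
# Bin m = 0 plays the role of the large aperture's beam; every other bin lies "outside the beam".
mismatch = max(abs(abs(a) ** 2 - abs(b) ** 2) for a, b in zip(T1[1:], T2[1:]))
print("largest |F1 - F2| outside the beam:", mismatch)
print("on-axis fluxes:", abs(T1[0]) ** 2, abs(T2[0]) ** 2)
```

Because the constant "1" transforms into the m = 0 bin alone, the two patterns agree to rounding error everywhere else. They differ only on axis, which is exactly the content of Babinet's Principle.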

[Figure 7.7: screen with a large aperture of diameter D containing a small object of size a; opening angles ∼λ/D and ∼λ/a.]

Fig. 7.7: Geometry for Babinet’s Principle. The beam produced by the large aperture D is confined between the long-dashed lines. Outside this beam, the intensity pattern F(θ) ∝ |t̃(θ)|² produced by a small object (size a) and its complement are the same, Eqs. (7.20).

This is called Babinet’s Principle.

****************************

EXERCISES

Exercise 7.1 Practice: Convolutions and Fourier Transforms
(a) Calculate the one-dimensional Fourier transforms of the functions $f_1(x) \equiv e^{-x^2/2\sigma^2}$, and $f_2(x) \equiv 0$ for $x < 0$, $f_2(x) \equiv e^{-x/h}$ for $x \ge 0$.

(b) Take the inverse transforms of your answers to part (a) and recover the original functions. (c) Convolve the exponential function f2 with the Gaussian function f1 and then compute the Fourier transform of their convolution. Verify that the result is the same as the product of the Fourier transforms of f1 and f2 .

Exercise 7.2 Problem: Pointillist Painting
The neo-impressionist painter Georges Seurat was a member of the pointillist school. His paintings consisted of an enormous number of closely spaced dots of pure pigment (of size ranging from ∼0.4 mm in his smaller paintings to ∼4 mm in his largest paintings, such as A Sunday Afternoon on the Island of La Grande Jatte, Fig. 7.8). The illusion of color mixing was produced only in the eye of the observer. How far from the painting should one stand in order to obtain the desired blending of color?


Fig. 7.8: Left: Georges Seurat’s painting A Sunday Afternoon on the Island of La Grande Jatte. When viewed from sufficient distance, adjacent dots of paint with different colors blend together in the eye to form another color. Right: Enlargement of the woman at the center of the painting. In this enlargement one sees clearly the individual dots of paint.

Exercise 7.3 Problem: Thickness of a Human Hair
Conceive and carry out an experiment using light diffraction to measure the thickness of a hair from your head, accurate to within a factor ∼2. [Hint: make sure the source of light that you use is small enough that its finite size has negligible influence on your result.]

Exercise 7.4 Derivation: Diffraction Grating
Use the convolution theorem to carry out the calculation of the Fraunhofer diffraction pattern from the grating shown in Fig. 7.5. [Hint: To show that the Fourier transform of the infinite sequence of equally spaced delta functions is a similar sequence of delta functions, perform the Fourier transform to get $\sum_{n=-\infty}^{+\infty} e^{i2kan\theta}$ (aside from a multiplicative factor); then use the formulas for a Fourier series expansion, and its inverse, for any function that is periodic with period π/ka to show that $\sum_{n=-\infty}^{+\infty} e^{i2kan\theta}$ is a sequence of delta functions.]

Exercise 7.5 Derivation: Airy Pattern
Derive and plot the Airy diffraction pattern (7.18) and show that 84 percent of the light is contained within the Airy disk.

Exercise 7.6 Problem: Triangular Diffraction Grating
Sketch the Fraunhofer diffraction pattern you would expect to see from a diffraction grating made from three groups of parallel lines aligned at angles of 120° to each other (Fig. 7.9).


Fig. 7.9: Diffraction grating formed from three groups of parallel lines.

Exercise 7.7 Problem: Light Scattering by Particles
Consider the scattering of light by an opaque particle of size a ≫ 1/k. One component of the scattered radiation is due to diffraction around the particle. This component is confined to a cone with opening angle ∆θ ∼ π/ka ≪ 1 about the incident wave direction. It contains power $P_S = FA$, where F is the incident intensity and A is the cross-sectional area of the particle perpendicular to the incident wave. (a) Give a semi-quantitative derivation of ∆θ and $P_S$ using Babinet’s principle. (b) Explain why the total “extinction” (absorption plus scattering) cross section is equal to 2A independent of the shape of the opaque particle.

****************************

7.4

Fresnel Diffraction

We next turn to the Fresnel region of observation points P with $r_F = \sqrt{\lambda\rho}$ much smaller than the aperture. In this region, the field at P arriving from different parts of the aperture has significantly different phase, ∆ϕ ≫ 1. We again specialize to incoming wave vectors that are approximately orthogonal to the aperture plane and to small diffraction angles, so that we can ignore the obliquity factor. By contrast with the Fraunhofer case, however, we identify P by its distance z from the aperture plane instead of its distance ρ from the aperture center, and we use as our integration variable in the aperture $\mathbf{x}' \equiv \mathbf{x} - \rho\boldsymbol{\theta}$ (cf. Fig. 7.4), thereby writing the dependence of the phase at P on x in the form

$$\Delta\varphi \equiv k \times [(\text{path length from } \mathbf{x} \text{ to } P) - z] = \frac{k x'^2}{2z} + O\!\left(\frac{k x'^4}{z^3}\right). \tag{7.21}$$

In the Fraunhofer region (Sec. 7.3 above), only the linear term $-k\mathbf{x}\cdot\boldsymbol{\theta}$ in $k x'^2/2z \simeq k(\mathbf{x} - \rho\boldsymbol{\theta})^2/2\rho$ was significant. In the Fresnel region the term quadratic in x is also significant (and we have changed variables to x′ so as to simplify it), but the $O(x'^4)$ term is negligible.
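Here is a quick numerical look at Eq. (7.21). The sketch compares the exact path-length phase with its quadratic (Fresnel) approximation and shows that the remainder approaches the quartic term −kx′⁴/8z³ as x′/z shrinks. It then sizes both terms for illustrative laboratory numbers: λ = 0.5 µm, z = 2 m and x′ = 3 mm. These numbers are our assumption and do not come from the text.

```python
import math


def exact_phase(k, x, z):
    """k * (sqrt(z^2 + x^2) - z), rewritten as k x^2 / (sqrt(z^2 + x^2) + z) to avoid cancellation."""
    return k * x * x / (math.hypot(z, x) + z)


def fresnel_phase(k, x, z):
    """Quadratic (Fresnel) approximation k x^2 / 2z from Eq. (7.21)."""
    return k * x * x / (2 * z)


# Dimensionless check with k = z = 1: the residual should approach -x^4 / 8.
ratios = {}
for x in (0.1, 0.03, 0.01):
    residual = exact_phase(1.0, x, 1.0) - fresnel_phase(1.0, x, 1.0)
    ratios[x] = residual / (-(x ** 4) / 8)
    print(f"x/z = {x}: residual / (-x^4/8) = {ratios[x]:.6f}")

# Illustrative laboratory numbers (assumed): lambda = 0.5 micron, z = 2 m, x' = 3 mm.
k_lab = 2 * math.pi / 0.5e-6
quadratic = fresnel_phase(k_lab, 3e-3, 2.0)
quartic = exact_phase(k_lab, 3e-3, 2.0) - quadratic
print(f"quadratic phase = {quadratic:.1f} rad, neglected remainder = {quartic:.1e} rad")
```

For these numbers the quadratic phase is tens of radians, so the Fresnel term matters. The neglected remainder is of order 10⁻⁵ rad, which is why the $O(x'^4)$ term can be dropped.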

Let us consider the Fresnel diffraction pattern formed by a simple aperture of arbitrary shape, illuminated by a normally incident plane wave. It is convenient to introduce transverse Cartesian coordinates (x′, y′) and to define

$$\sigma = \left(\frac{k}{\pi z}\right)^{1/2} x', \qquad \tau = \left(\frac{k}{\pi z}\right)^{1/2} y'. \tag{7.22a}$$

[Notice that $(k/\pi z)^{1/2}$ is $\sqrt{2}/r_F$; cf. Eq. (7.8).] We can thereby rewrite Eq. (7.6) (setting the obliquity factor to one) in the form

$$\psi_P = -\frac{ik\,e^{ikz}}{2\pi z}\int_Q e^{i\Delta\varphi}\,\psi_Q\,dx'\,dy' = -\frac{i}{2}\int\!\!\int_Q e^{i\pi\sigma^2/2}\,e^{i\pi\tau^2/2}\,\psi_Q\,e^{ikz}\,d\sigma\,d\tau. \tag{7.22b}$$

We shall use this rather general expression in Sec. 7.5, when discussing Paraxial Fourier optics. In this section we shall focus on the details of the Fresnel diffraction pattern for an incoming plane wave that falls perpendicularly on the aperture, so ψQ is constant over the aperture.

7.4.1

Rectangular Aperture, Fresnel Integrals and Cornu Spiral

For simplicity, we initially confine attention to a rectangular aperture with edges along the x′ and y′ directions. Then the two integrals have limits that are independent of each other, and the integrals can be expressed in the form E(σmax) − E(σmin) and E(τmax) − E(τmin), so

$$\psi_P = -\frac{i}{2}\,[E(\sigma_{\max}) - E(\sigma_{\min})]\,[E(\tau_{\max}) - E(\tau_{\min})]\,\psi_Q e^{ikz} \equiv -\frac{i}{2}\,\Delta E_\sigma\,\Delta E_\tau\,\psi_Q e^{ikz}, \tag{7.23a}$$

where the arguments are the limits of integration and where

$$E(\xi) \equiv \int_0^{\xi} e^{i\pi\sigma^2/2}\,d\sigma \equiv C(\xi) + iS(\xi). \tag{7.23b}$$

Here

$$C(\xi) \equiv \int_0^{\xi} d\sigma\,\cos(\pi\sigma^2/2), \qquad S(\xi) \equiv \int_0^{\xi} d\sigma\,\sin(\pi\sigma^2/2) \tag{7.23c}$$

are known as Fresnel Integrals, and are standard functions tabulated in many books and known to Mathematica and Maple. Notice that the intensity distribution is

$$F \propto |\psi_P|^2 \propto |\Delta E_\sigma|^2\,|\Delta E_\tau|^2. \tag{7.23d}$$

It is convenient to exhibit the Fresnel integrals graphically using a Cornu spiral, Fig. 7.10. This is a graph of the parametric equation [C(ξ), S(ξ)], or equivalently a graph of E(ξ) = C(ξ) + iS(ξ) in the complex plane.

[Figure 7.10: Cornu spiral; C on the horizontal axis, S on the vertical axis, points along the curve labeled by ξ, and an arrow from ξ = σmin to ξ = σmax.]

Fig. 7.10: Cornu Spiral in the complex plane; the real part of E(ξ) = C(ξ) + iS(ξ) is plotted horizontally and the imaginary part vertically; the point ξ = 0 is at the origin, positive ξ in the upper right quadrant, and negative ξ in the lower left quadrant. The diffracted intensity is proportional to the squared length of the arrow reaching from ξ = σmin to ξ = σmax.

The two terms ∆Eσ and ∆Eτ in Eq. (7.23a) can be represented in amplitude and phase by arrows in the (C, S) plane reaching from ξ = σmin on the Cornu spiral to ξ = σmax (Fig. 7.10), and from ξ = τmin to ξ = τmax. Correspondingly, the intensity, F [Eq. (7.23d)], is proportional to the product of the squared lengths of these two vectors.
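For readers without Mathematica or Maple at hand, here is a minimal standard-library sketch of the Fresnel integrals (7.23c), computed by composite Simpson's rule. It also shows the Cornu-spiral arrow ∆E of Eq. (7.23a). The rectangular-aperture limits below are hypothetical and are given in the σ, τ units of Eq. (7.22a). The flux is normalized to the unobstructed value, using |E(∞) − E(−∞)|² = 2.

```python
import cmath
import math


def fresnel_E(xi):
    """E(xi) = integral_0^xi exp(i pi s^2 / 2) ds = C(xi) + i S(xi), by composite Simpson's rule."""
    n = 2 * max(100, int(100 * xi * xi))      # even panel count, refined as the spiral tightens
    h = xi / n
    total = 0j
    for j in range(n + 1):
        s = j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * cmath.exp(0.5j * math.pi * s * s)
    return total * h / 3


def arrow(lo, hi):
    """Delta E: the arrow on the Cornu spiral from xi = lo to xi = hi, as in Eq. (7.23a)."""
    return fresnel_E(hi) - fresnel_E(lo)


E1 = fresnel_E(1.0)
print(f"C(1) = {E1.real:.6f}, S(1) = {E1.imag:.6f}")

# Hypothetical rectangular aperture. Relative flux F / F_unobstructed = |dE_sigma|^2 |dE_tau|^2 / 4.
d_sigma, d_tau = arrow(-1.0, 1.0), arrow(-0.5, 2.0)
relative_flux = abs(d_sigma) ** 2 * abs(d_tau) ** 2 / 4
print(f"relative on-axis flux: {relative_flux:.3f}")
```

Simpson's rule is used here only for transparency. Where a library such as SciPy is available, its Fresnel-integral routine would be the natural replacement (note that conventions for argument scaling differ between libraries).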

7.4.2

Unobscured Plane Wave

The simplest illustration is the totally unobscured plane wave. In this case, the limits of both integrations extend from −∞ to +∞, which as we see in Fig. 7.10 is an arrow of length $2^{1/2}$ and phase π/4. Therefore, ψP is equal to $(2^{1/2}e^{i\pi/4})^2(-i/2)\,\psi_Q e^{ikz} = \psi_Q e^{ikz}$, as we could have deduced simply by solving the Helmholtz equation (7.1b) for a plane wave. This unobscured-wavefront calculation elucidates three issues that we have already met. First, it illustrates our interpretation of Fermat’s principle in geometric optics. In the limit of short wavelength, the paths that contribute to the wave field are just those along which the phase is stationary to small variations in path. Our present calculation shows that, because of the tightening of the Cornu spiral as one moves toward a large argument, the paths that contribute significantly to ψP are those that are separated from the geometric-optics path by less than a few Fresnel lengths at Q. (For a laboratory experiment with light and z ∼ 2 m, a Fresnel length is typically $\sqrt{\lambda z} \sim 1$ mm.) A second, and related, point is that in computing the Fresnel diffraction pattern from a more complicated aperture, we need only perform the integral (7.6) in the immediate vicinity of the geometric-optics ray. We can ignore the contribution from the extension of the aperture Q to meet the “sphere at infinity” (the surface E in Fig. 7.1) even when the wave is unobstructed there. The rapid phase variation makes the contribution from E sum to zero.

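[Figure 7.11: Cornu spiral with points A, B and C marked, and the intensity |ψ|² plotted against x, with the levels 1 and 1/4 indicated.]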

Fig. 7.11: Intensity diffraction pattern formed by a straight edge, and graphical interpretation using Cornu Spiral. The intensity |ψ|² ∝ |∆Eσ|² is proportional to the squared length of the vector whose tail is at the center of the lower left spiral and whose tip moves along the spiral curve.

Third, in integrating over the whole area of the wave front at Q, we have summed contributions with increasingly large phase differences that add in such a way that the total has a net extra phase of π/2, relative to the geometric-optics ray. This phase factor cancels exactly the prefactor −i in the Fresnel-Kirchhoff integral, Eq. (7.6). (This phase factor is unimportant in the limit of geometric optics.)
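The unobscured result, and the claim that only the first few Fresnel lengths matter, can be checked numerically. The sketch below evaluates Eq. (7.23a) for a square aperture of growing half-width L, measured in σ units (σ = L corresponds to x′ = L r_F/√2), and compares the result with the infinite-aperture limit. The square shape and the list of widths are illustrative choices of ours.

```python
import cmath
import math


def fresnel_E(xi):
    """E(xi) = C(xi) + i S(xi) by composite Simpson's rule."""
    n = 2 * max(100, int(100 * xi * xi))
    h = xi / n
    total = 0j
    for j in range(n + 1):
        s = j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * cmath.exp(0.5j * math.pi * s * s)
    return total * h / 3


def psi_square_aperture(half_width):
    """psi_P / (psi_Q e^{ikz}) from Eq. (7.23a) for |sigma|, |tau| < half_width."""
    d = 2 * fresnel_E(half_width)          # E is odd, so E(L) - E(-L) = 2 E(L)
    return -0.5j * d * d


limit = -0.5j * (1 + 1j) ** 2              # infinite aperture: E(+inf) - E(-inf) = 1 + i
errors = {L: abs(psi_square_aperture(L) - 1) for L in (1, 2, 5, 10, 20)}
print("infinite aperture:", limit)
for L, err in errors.items():
    print(f"half-width {L:>2} (sigma units): |psi - 1| = {err:.3f}")
```

The error shrinks roughly in proportion to 1/L, while its phase rotates as the aperture edge moves along the tightening spiral. This is the numerical counterpart of the statement that the outer parts of the wave front contribute contributions that average away.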

7.4.3

Fresnel Diffraction by a Straight Edge: Lunar Occultation of a Radio Source

The next simplest case of Fresnel diffraction is the pattern formed by a straight edge. As a specific example, consider a cosmologically distant source of radio waves that is occulted by the moon. If we treat the lunar limb as a straight edge, then the radio source will create a changing diffraction pattern as it passes behind the moon, and the diffraction pattern can be measured by a radio telescope on earth. We orient our coordinates so the moon’s edge is along the y′ direction (τ direction). Then in Eq. (7.23a) $\Delta E_\tau \equiv E(\tau_{\max}) - E(\tau_{\min}) = \sqrt{2}\,e^{i\pi/4}$ is constant, and $\Delta E_\sigma \equiv E(\sigma_{\max}) - E(\sigma_{\min})$ is described by the Cornu spiral. Long before the occultation, ∆Eσ will be given by the arrow from (−1/2, −1/2) to (1/2, 1/2), i.e. $\Delta E_\sigma = \sqrt{2}\,e^{i\pi/4}$. The observed wave amplitude, Eq. (7.23a), is therefore $\psi_Q e^{ikz}$. When the moon starts to occult the radio source, the upper bound on the Fresnel integral begins to diminish from σmax = +∞, and the complex vector on the Cornu spiral begins to oscillate in length (e.g., from A to B in Fig. 7.11) and in phase. The observed flux will also oscillate, more and more strongly as geometric occultation is approached. At the point of geometric occultation (point C in Fig. 7.11), the complex vector extends from (−1/2, −1/2) to (0, 0), and so the observed wave amplitude is one half the unocculted value and the intensity is reduced to one fourth. As the occultation proceeds, the length of the complex


Fig. 7.12: Fresnel diffraction pattern in the shadow of Mary’s hand holding a dime — a photograph by Eugene Hecht, from Fig. 10.1 of Hecht (1998).

vector and the observed flux will decrease monotonically to zero, while the phase continues to oscillate. Historically, diffraction of a radio source’s waves by the moon led to the discovery of quasars, the hyperactive nuclei of distant galaxies. In the early 1960s, a team of British radio observers led by Cyril Hazard knew that the moon would occult a powerful radio source named 3C273, so they set up their telescope to observe the development of the diffraction pattern as the occultation proceeded. From the pattern’s observed times of ingress (passage into the moon’s shadow) and egress (emergence from the moon’s shadow), Hazard determined the coordinates of 3C273 on the sky. These coordinates enabled Maarten Schmidt at the 200-inch telescope on Palomar Mountain to identify 3C273 optically and discover (from its optical redshift) that it was surprisingly distant and consequently had an unprecedented luminosity. In Hazard’s occultation measurements, the observing wavelength was λ ∼ 0.2 m. Since the moon is roughly z ∼ 400,000 km distant, the Fresnel length was about $r_F = \sqrt{\lambda z} \sim 10$ km. The moon’s orbital speed is v ∼ 200 m s⁻¹, so the diffraction pattern took a time $\sim 5r_F/v \sim 4$ min to pass through the telescope. The straight-edge diffraction pattern of Fig. 7.11 occurs universally along the edge of the shadow of any object, so long as the source of light is sufficiently small and the shadow’s edge bends on lengthscales long compared to the Fresnel length $r_F = \sqrt{\lambda z}$. Examples are the diffraction patterns on the two edges of a slit’s shadow in the upper left curve in Fig. 7.3, and the diffraction pattern along the edge of a shadow cast by a person’s hand in Fig. 7.12.
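The straight-edge profile of Fig. 7.11 follows directly from Eq. (7.23a). With $|\Delta E_\tau|^2 = 2$ and ∆Eσ running from E(−∞) = −(1+i)/2 to E(σmax), the intensity is $|\psi_P/\psi_Q|^2 = |\Delta E_\sigma|^2/2$. The sketch below evaluates this expression and then repeats the order-of-magnitude arithmetic for the 3C273 occultation, using the wavelength, distance and speed quoted in the text. The function names are illustrative.

```python
import cmath
import math


def fresnel_E(xi):
    """E(xi) = C(xi) + i S(xi) by composite Simpson's rule."""
    n = 2 * max(100, int(100 * xi * xi))
    h = xi / n
    total = 0j
    for j in range(n + 1):
        s = j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * cmath.exp(0.5j * math.pi * s * s)
    return total * h / 3


def edge_intensity(sigma_max):
    """|psi_P / psi_Q|^2 behind a straight edge; sigma_max > 0 is the illuminated side."""
    delta_e_sigma = fresnel_E(sigma_max) + (1 + 1j) / 2   # arrow from E(-inf) = -(1+i)/2
    return abs(delta_e_sigma) ** 2 / 2                    # |Delta E_tau|^2 / 4 = 1/2


at_edge = edge_intensity(0.0)                              # geometric shadow edge (point C)
bright_side = [edge_intensity(0.01 * j) for j in range(301)]
deep_shadow = edge_intensity(-3.0)
print(f"at edge: {at_edge:.3f}, first maximum: {max(bright_side):.3f}, deep shadow: {deep_shadow:.4f}")

# Order-of-magnitude numbers for the 3C273 occultation, using the values quoted in the text.
wavelength, distance, speed = 0.2, 4.0e8, 200.0            # m, m, m/s
r_F = math.sqrt(wavelength * distance)
crossing_time = 5 * r_F / speed
print(f"r_F = {r_F / 1e3:.1f} km, crossing time = {crossing_time / 60:.1f} min")
```

The sketch reproduces the one-fourth intensity at the geometric edge and a first fringe about 1.37 times brighter than the unocculted flux. It also gives a Fresnel length near 9 km and a crossing time of a few minutes, consistent with the estimates above.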

7.4.4

Circular Apertures: Fresnel Zones and Zone Plates

We have shown how the Fresnel diffraction pattern for a plane wave can be thought of as formed by waves that derive from a patch a few Fresnel lengths in size. This notion can be made quantitatively useful by reanalyzing the unobstructed wave front in circular polar coordinates. More specifically, consider a plane wave incident on an aperture Q that is infinitely large (no obstruction), and define $\varpi \equiv |\mathbf{x}'|/r_F = \sqrt{(\sigma^2 + \tau^2)/2}$. Then the phase factor in Eq. (7.22b) is $\Delta\varphi = \pi\varpi^2$, and the observed wave will thus be given by

$$\psi_P = -i\int_0^{\varpi} 2\pi\varpi'\,d\varpi'\,e^{i\pi\varpi'^2}\,\psi_Q e^{ikz} = \left(1 - e^{i\pi\varpi^2}\right)\psi_Q e^{ikz}. \tag{7.24}$$

[Figure 7.13: amplitude-and-phase diagram in the (Re ψ, Im ψ) plane, with points labeled ϖ² = 0, 2, 4, … and ϖ² = 1, 3, 5, …, and a sketch of the Fresnel zones.]

Fig. 7.13: Amplitude-and-phase diagram for an unobstructed plane wave front, decomposed into Fresnel zones; Eq. (7.24).

Now, this integral does not appear to converge as ϖ → ∞. We can see what is happening if we sketch an amplitude-and-phase diagram (Fig. 7.13). Adding up the contributions to ψP from each annular ring, we see that as we integrate outward from ϖ = 0, the complex vector has the initial phase retardation of π/2 but then moves on a semi-circle, so that by the time we have integrated out to a radius of r_F (ϖ = 1), the contribution to the observed wave is ψP = 2ψQ, in phase with the incident wave. Then, when the integration has been extended onward to $\sqrt{2}\,r_F$ (ϖ = √2), the circle has been completed and ψP = 0! The integral continues on around the same circle as the upper-bound radius is further increased. Of course, the field must actually have a well-defined value, despite this apparent failure of the integral to converge. To understand how the field becomes well-defined, imagine splitting the aperture Q up into concentric annular rings, known as Fresnel half-period zones, of radius $\sqrt{n}\,r_F$, where n = 1, 2, 3, …. The integral fails to converge because the contribution from each odd-numbered ring cancels that from an adjacent even-numbered ring. However, the thickness of these rings decreases as $1/\sqrt{n}$, and eventually we must allow for the fact that the incoming wave is not exactly planar; or, equivalently and more usefully, we must allow for the fact that the wave’s distant source has some finite angular size. The finite size causes different pieces of the source to have their Fresnel rings centered at slightly different points in the aperture plane, and this causes our computation of ψP to begin averaging over rings. This averaging forces the tip of the complex vector to asymptote to the center of the circle in Fig. 7.13. Correspondingly, due to the averaging, the observed intensity asymptotes to |ψQ|² (Eq. (7.24) with the exponential going to zero).
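The zone arithmetic can be checked in a few lines. The sketch below verifies that the closed form (7.24) matches a direct midpoint-rule integration, that each half-period zone (radii √n r_F) has the same area π r_F², and that keeping only the odd zones adds one circle diameter (amplitude 2) per open zone. It also anticipates the zone-plate lens discussed next by evaluating f = kA/2π². The wavelength and zone area used for that evaluation are illustrative assumptions.

```python
import cmath
import math


def psi_within(varpi):
    """psi_P / (psi_Q e^{ikz}) after integrating out to radius varpi * r_F: Eq. (7.24)."""
    return 1 - cmath.exp(1j * math.pi * varpi ** 2)


def psi_within_numeric(varpi, steps=4000):
    """Midpoint-rule evaluation of -i * integral_0^varpi 2 pi w exp(i pi w^2) dw, as a check."""
    h = varpi / steps
    acc = 0j
    for j in range(steps):
        w = (j + 0.5) * h
        acc += 2 * math.pi * w * cmath.exp(1j * math.pi * w * w)
    return -1j * acc * h


def zone_plate_amplitude(open_zones):
    """Keep zones 1, 3, 5, ... (the first `open_zones` odd zones); block the even ones."""
    total = 0j
    for m in range(open_zones):
        inner, outer = math.sqrt(2 * m), math.sqrt(2 * m + 1)
        total += psi_within(outer) - psi_within(inner)
    return total


# Half-period zone radii sqrt(n) r_F (r_F = 1 here) enclose rings of equal area.
radii = [math.sqrt(n) for n in range(6)]
areas = [math.pi * (radii[n + 1] ** 2 - radii[n] ** 2) for n in range(5)]

# Zone-plate focal length f = k A / (2 pi^2), equivalently A = pi * lambda * f.
wavelength = 0.5e-6          # m (assumed)
zone_area = 2e-6             # m^2, i.e. "a few mm^2" (assumed)
k = 2 * math.pi / wavelength
focal_length = k * zone_area / (2 * math.pi ** 2)

print("first zone:", psi_within(1.0), "numeric:", psi_within_numeric(1.0))
print("two zones:", abs(psi_within(math.sqrt(2))))
print("5 open odd zones, amplitude:", abs(zone_plate_amplitude(5)))
print(f"focal length: {focal_length:.2f} m")
```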
Although this may not seem to be a particularly wise way to decompose a plane wave front, it does allow a particularly striking experimental verification of our theory of diffraction. Suppose that we fabricate an aperture (called a zone plate) in which, for a chosen

observation point P on the optic axis, alternate half-period zones are obscured. Then the wave observed at P will be the linear sum of several diameters of the circle in Fig. 7.13, and therefore will be far larger than ψQ. This strong amplification is confined to our chosen spot on the optic axis; almost everywhere else the field’s intensity is reduced, thereby conserving energy. Thus, the zone plate behaves like a lens. The lens’s focal length is $f = kA/2\pi^2$, where A (typically chosen to be a few mm² for light) is the common area of each of the half-period zones. Zone plates are only good lenses when the radiation is monochromatic, since the focal length is wavelength-dependent, $f \propto \lambda^{-1}$. They have the further interesting property that they possess secondary foci, where the fields from 3, 5, 7, … contiguous zones add up coherently (Ex. 7.9).

****************************

EXERCISES

Exercise 7.8 Exercise: Diffraction Pattern from a Slit
Derive a formula for the intensity diffraction pattern F(x) of a slit with width a, as a function of distance x from the center of the slit, in terms of Fresnel integrals. Plot your formula for various distances z from the slit’s plane, i.e. for various values of $r_F/a = \sqrt{\lambda z/a^2}$ (using, e.g., Mathematica or Maple), and compare with Fig. 7.3.

Exercise 7.9 Problem: Zone Plate
(a) Use an amplitude-and-phase diagram to explain why a zone plate has secondary foci at distances of f/3, f/5, f/7, …. (b) An opaque, perfectly circular disk of diameter D is placed perpendicular to an incoming plane wave. Show that, at distances r such that r_F ≪ D, the disk casts a rather sharp shadow, but at the precise center of the shadow there should be a bright spot.⁴ How bright?

Exercise 7.10 Example: Seeing in the atmosphere
Stars viewed through the atmosphere appear to have angular diameters of order an arc second and to exhibit large-amplitude fluctuations of flux with characteristic frequencies that can be as high as 100 Hz.
Both of these phenomena are a consequence of irregular variations in the refractive index of the atmosphere. An elementary model of this effect consists of a thin phase-changing screen, about a km above the ground, on which the rms phase variation is ∆ϕ ≳ 1 and the characteristic spatial scale, on which the phase changes by ∼∆ϕ, is a.

⁴ Poisson predicted the existence of this spot as a consequence of Fresnel’s wave theory of light, in order to demonstrate that Fresnel’s theory was wrong. However, Dominique Arago quickly demonstrated experimentally that the bright spot existed.

(a) Explain why the rays will be irregularly deflected through a scattering angle ∆θ ∼ (λ/a)∆ϕ. Strong intensity variation requires that several rays deriving from points on the screen separated by more than a combine at each point on the ground. These rays combine to create a diffraction pattern on the ground with scale b. (b) Show that the Fresnel length in the screen is $\sim\sqrt{ab}$. Now the time variation arises because winds in the upper atmosphere with speeds u ∼ 30 m s⁻¹ blow the irregularities and the diffraction pattern past the observer. Use this information to estimate the Fresnel length, r_F, the atmospheric fluctuation scale size a, and the rms phase variation ∆ϕ. Do you think the assumptions of this model are well satisfied?

Exercise 7.11 Challenge: Multi-Conjugate Adaptive Optics
The technique of Adaptive Optics can be used to improve the quality of the images observed by a telescope. Bright artificial “laser stars” are created by shining several lasers at layers of sodium atoms in the upper atmosphere and observing the scattered light. The wavefronts from these “stars” will be deformed at the telescope due to inhomogeneities in the lower atmosphere, and the deformed wavefront shapes can be measured across the image plane. The light from a much dimmer adjacent astronomical source can then be processed, e.g. using a deformable reflecting surface, so as to remove its wavefront distortions. Discuss some of the features that an effective adaptive optics system needs. Assume the atmospheric model discussed in Ex. 7.10.

Exercise 7.12 Problem: Spy Satellites
Telescopes can also look down through the same atmospheric irregularities as those discussed in the previous example. In what important respects will the optics differ from that for telescopes looking upward?

****************************

7.5

Paraxial Fourier Optics

We have developed a linear theory of wave optics which has allowed us to calculate diffraction patterns in the Fraunhofer and Fresnel limiting regions. That these calculations agree with laboratory measurements provides some vindication of the theory and the assumptions implicit in it. We now turn to practical applications of these ideas, specifically to the acquisition and processing of images by instruments operating throughout the electromagnetic spectrum. As we shall see, these instruments rely on an extension of paraxial geometric optics (Sec. 6.4) to situations where diffraction effects are important. Because of the central role played by Fourier transforms in diffraction [e.g. Eq. (7.11a)], the theory underlying these instruments is called paraxial Fourier optics, or just Fourier optics.

Although the conceptual framework and mathematical machinery for image processing by Fourier optics were developed over a century ago, Fourier optics has only been widely exploited during the past thirty years. This maturation has been driven in part by a growing recognition of similarities between optics and communication theory, for example the realization that a microscope is simply an image processing system. The development of electronic computation has also triggered enormous strides; computers are now seen as extensions of optical devices, and vice versa. It is a matter of convenience, economics and practicality to decide which parts of the image processing are carried out with mirrors, lenses, etc., and which parts are performed numerically. One conceptually simple example of optical image processing would be an improvement in one’s ability to identify a faint star in the Fraunhofer diffraction rings (“fringes”) of a much brighter star. As we shall see below [Eq. (7.30) and subsequent discussion], the bright image of a source in a telescope’s or microscope’s focal plane has the same Airy diffraction pattern as we met in Eq. (7.18) and Fig. 7.6. If the shape of that image could be changed from the ring-endowed Airy pattern to a Gaussian, then it would be far easier to identify a nearby feature or faint star. One way to achieve this would be to attenuate the incident radiation at the telescope aperture in such a way that, immediately after passing through the aperture, it has a Gaussian profile instead of a sharp-edged profile. Its Fourier transform (the diffraction pattern in the focal plane) would then also be a Gaussian. Such a Gaussian-shaped attenuation is difficult to achieve in practice, but it turns out, as we shall see, that there are easier options. Before exploring these options, we must lay some foundations, beginning with the concept of coherent illumination in Sec. 7.5.1, and then point spread functions in Sec. 7.5.2.
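The claim that a Gaussian-tapered pupil removes the diffraction rings can be illustrated with a one-dimensional stand-in: a slit rather than the circular aperture of the text. The sketch below computes the Fraunhofer intensity of a sharp-edged pupil and of a hypothetical Gaussian taper of width 0.3 (in units of the pupil half-width). It then counts local maxima ("rings") brighter than 10⁻⁴ of the peak. The taper width and the threshold are our own illustrative choices.

```python
import cmath
import math


def far_field(profile, u, n=800):
    """1D Fraunhofer amplitude: integral_{-1}^{1} A(s) exp(-i u s) ds by Simpson's rule.

    s is position across the pupil in units of its half-width D/2, so u = k D theta / 2.
    """
    h = 2.0 / n
    total = 0j
    for j in range(n + 1):
        s = -1.0 + j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * profile(s) * cmath.exp(-1j * u * s)
    return total * h / 3


def top_hat(s):
    return 1.0                                   # sharp-edged, unattenuated pupil


def gaussian_taper(s, width=0.3):
    return math.exp(-s * s / (2 * width ** 2))   # hypothetical Gaussian attenuation


def relative_intensity(profile, us):
    peak = abs(far_field(profile, 0.0)) ** 2
    return [abs(far_field(profile, u)) ** 2 / peak for u in us]


def count_rings(values, floor=1e-4):
    """Number of interior local maxima above `floor`, i.e. visible diffraction rings."""
    return sum(1 for a, b, c in zip(values, values[1:], values[2:]) if b > a and b > c and b > floor)


us = [0.1 * j for j in range(301)]               # u from 0 to 30
sharp = relative_intensity(top_hat, us)
tapered = relative_intensity(gaussian_taper, us)
first_ring = max(v for u, v in zip(us, sharp) if math.pi <= u <= 2 * math.pi)
print(f"top hat: {count_rings(sharp)} rings, first ring at {first_ring:.3f} of peak")
print(f"Gaussian taper: {count_rings(tapered)} rings above 1e-4 of peak")
```

The price of the taper is a broader central peak, which is the usual trade-off of apodization.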

7.5.1

Coherent Illumination

If the radiation that arrives at the input of an optical system derives from a single source, e.g. a point source that has been collimated into a parallel beam by a converging lens, then the radiation is best described by its complex amplitude ψ (as we are doing in this chapter). An example might be a biological specimen on a microscope slide, illuminated by an external point source, for which the phases of the waves leaving different parts of the slide are strongly correlated with each other. This is called coherent illumination. If, by contrast, the source is self-luminous and of non-negligible size, with the atoms or molecules in its different parts radiating independently (for example a cluster of stars), then the phases of the radiation from different parts are uncorrelated, and it may be the radiation’s intensity, not its complex amplitude, that obeys well-defined (non-probabilistic) evolution laws. This is called incoherent illumination. In this chapter we shall develop Fourier optics for a coherently illuminating source (the kind of illumination tacitly assumed in previous sections of the chapter). A parallel theory with a similar vocabulary can be developed for incoherent sources, and some of the foundations for it will be laid in Chap. 8. In Chap. 8 we shall also develop a more precise formulation of the concept of coherence.


7.5.2

Point Spread Functions

In our treatment of paraxial geometric optics (Sec. 6.4), we showed how it is possible to regard a group of optical elements as a sequence of linear devices and relate the output rays to the input by linear operators, i.e. matrices. This chapter’s theory of diffraction is also linear and so a similar approach can be followed. As in Sec. 6.4, we will restrict attention to small angles relative to some optic axis (“paraxial Fourier optics”). We shall describe the wave field at some distance zj along the optic axis by the function ψj(x), where x is a two-dimensional vector perpendicular to the optic axis as in Fig. 7.4. If we consider a single linear optical device, then we can relate the output field ψ2 at z2 to the input ψ1 at z1 using a Green’s function denoted P21(x2, x1):

$$\psi_2(\mathbf{x}_2) = \int P_{21}(\mathbf{x}_2, \mathbf{x}_1)\,d\Sigma_1\,\psi_1(\mathbf{x}_1). \tag{7.25}$$

If ψ1 were a δ-function, then the output would be simply given by the function P21, up to normalization. For this reason, P21 is usually known as the Point Spread Function. Alternatively, we can think of it as a propagator. If we now combine two optical devices sequentially, so the output of the first device ψ2 is the input of the second, then the point spread functions combine in the natural manner of any linear propagator to give a total point spread function

$$P_{31}(\mathbf{x}_3, \mathbf{x}_1) = \int P_{32}(\mathbf{x}_3, \mathbf{x}_2)\,d\Sigma_2\,P_{21}(\mathbf{x}_2, \mathbf{x}_1). \tag{7.26}$$

Just as the simplest matrix for paraxial, geometric-optics propagation is that for free propagation through some distance d, so also the simplest point spread function is that for free propagation. From Eq. (7.22b) we see that it is given by

$$P_{21} = \frac{-ik}{2\pi d}\,e^{ikd}\exp\left[\frac{ik(\mathbf{x}_1 - \mathbf{x}_2)^2}{2d}\right] \quad \text{for free propagation through a distance } d = z_2 - z_1. \tag{7.27}$$

Note that this P21 depends only on x1 − x2 and not on x1 or x2 individually, as it should because there is translational invariance in the x1, x2 planes. A thin lens adds or subtracts an extra phase ∆ϕ to the wave, and ∆ϕ depends quadratically on distance |x| from the optic axis, so the angle of deflection, which is proportional to the gradient of the phase, will depend linearly on x. Correspondingly, the point spread function for a thin lens is

$$P_{21} = \exp\left[\frac{-ik|\mathbf{x}_1|^2}{2f}\right]\delta(\mathbf{x}_2 - \mathbf{x}_1) \quad \text{for a thin lens with focal length } f. \tag{7.28}$$

For a converging lens, f is positive; for a diverging lens, it is negative.
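Equation (7.26) implies that free propagation through d/2 followed by another d/2 must equal free propagation through d. The sketch below checks this for the kernel (7.27) under simplifying assumptions of our own. It uses a one-dimensional analogue of the kernel (prefactor (k/2πid)^{1/2} instead of −ik/2πd, with the common factor e^{ikd} dropped), a Gaussian input field, and a plain Riemann sum on a uniform grid. The closed-form Gaussian-beam expression used for comparison is a standard result; it is not derived in this chapter.

```python
import cmath
import math


def propagate_1d(field, xs, dx, k, d, x_out):
    """1D analogue of Eq. (7.27), common factor e^{ikd} dropped:
    psi(x2) = sqrt(k / (2 pi i d)) * sum_x1 exp(i k (x2 - x1)^2 / 2d) psi(x1) dx.
    """
    prefactor = cmath.sqrt(k / (2j * math.pi * d))
    out = []
    for x2 in x_out:
        acc = 0j
        for x1, value in zip(xs, field):
            acc += cmath.exp(1j * k * (x2 - x1) ** 2 / (2 * d)) * value
        out.append(prefactor * acc * dx)
    return out


def gaussian_beam(x, z, k, w0):
    """Closed-form 1D paraxial evolution of exp(-x^2 / w0^2) after a distance z."""
    q = 1 + 2j * z / (k * w0 ** 2)
    return cmath.exp(-x ** 2 / (w0 ** 2 * q)) / cmath.sqrt(q)


k, w0 = 4.0, 1.0                            # illustrative wavenumber and beam radius
dx = 0.02
xs = [-8.0 + j * dx for j in range(801)]    # grid wide enough that the beam is ~0 at its edges
psi0 = [gaussian_beam(x, 0.0, k, w0) for x in xs]
probes = [0.0, 0.5, 1.0]

one_step = propagate_1d(psi0, xs, dx, k, 1.0, probes)       # P(d = 1)
halfway = propagate_1d(psi0, xs, dx, k, 0.5, xs)            # P(d = 1/2) on the whole grid
two_step = propagate_1d(halfway, xs, dx, k, 0.5, probes)    # P(1/2) applied twice
analytic = [gaussian_beam(x, 1.0, k, w0) for x in probes]

for x, a, b, c in zip(probes, one_step, two_step, analytic):
    print(f"x = {x}: |one - two| = {abs(a - b):.1e}, |one - exact| = {abs(a - c):.1e}")
```

The printed differences should be tiny. The grid spacing and width are illustrative and would have to be refined for larger k or shorter distances, where the chirp exp(ik(x₂ − x₁)²/2d) oscillates faster.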

[Figure 7.14: source plane (xS), lens plane (xL, xL′), focal plane (xF) and image plane (xI), with distances u, f and v marked.]

Fig. 7.14: Wave theory of a single converging lens. The focal plane is a distance f = (lens focal length) from the lens plane, and the image plane is a distance v = fu/(u − f) from the lens plane.

7.5.3

Abbé’s Description of Image Formation by a Thin Lens

We can use these two point spread functions to give a wave description of the production of images by a single converging lens, in parallel to the geometric-optics description of Figs. 6.5 and 6.7. We shall do this in two stages. First, we shall propagate the wave from the source plane S a distance u in front of the lens, through the lens L, to its focal plane F a distance f behind the lens (Fig. 7.14). Then we shall propagate the wave a further distance v − f from the focal plane to the image plane. We know from geometric optics that v = fu/(u − f) [Eq. (6.60)]. We shall restrict ourselves to u > f, so v is positive and the lens forms a real image. Using Eqs. (7.26), (7.27) and (7.28), we obtain for the propagator from the source plane to the focal plane

$$\begin{aligned}
P_{FS} &= \int P_{FL'}\,d\Sigma_{L'}\,P_{L'L}\,d\Sigma_L\,P_{LS} \\
&= \int d\Sigma_{L'}\,\frac{-ik}{2\pi f}\,e^{ikf}\exp\left[\frac{ik(\mathbf{x}_F - \mathbf{x}_{L'})^2}{2f}\right]\delta(\mathbf{x}_{L'} - \mathbf{x}_L)\exp\left[\frac{-ik|\mathbf{x}_L|^2}{2f}\right] \\
&\qquad \times d\Sigma_L\,\frac{-ik}{2\pi u}\,e^{iku}\exp\left[\frac{ik(\mathbf{x}_L - \mathbf{x}_S)^2}{2u}\right] \\
&= \frac{-ik}{2\pi f}\,e^{ik(f+u)}\exp\left[-\frac{ik x_F^2}{2(v-f)}\right]\exp\left[-\frac{ik\,\mathbf{x}_F\cdot\mathbf{x}_S}{f}\right].
\end{aligned} \tag{7.29}$$

Here we have extended all integrations to ±∞ and have used the values of the Fresnel integrals at infinity, E(±∞) = ±(1 + i)/2, to get the expression on the last line. The wave in the focal plane is given by $\psi_F(\mathbf{x}_F) = \int P_{FS}\,d\Sigma_S\,\psi_S(\mathbf{x}_S)$, which integrates to

$$\psi_F(\mathbf{x}_F) = -\frac{ik}{2\pi f}\,e^{ik(f+u)}\exp\left[-\frac{ik x_F^2}{2(v-f)}\right]\tilde\psi_S(\mathbf{x}_F/f). \tag{7.30}$$

Here

$$\tilde\psi_S(\boldsymbol{\theta}) = \int d\Sigma_S\,\psi_S(\mathbf{x}_S)\,e^{-ik\boldsymbol{\theta}\cdot\mathbf{x}_S}. \tag{7.31}$$

Thus, we have shown that the field in the focal plane is, apart from an unimportant phase factor, proportional to the Fourier transform of the field in the source plane; in other words, the focal-plane field is the Fraunhofer diffraction pattern of the input wave. That this has to be the case can be understood from Fig. 7.14. The focal plane F is where the converging lens brings parallel rays from the source plane to a focus. By doing so, the lens in effect brings in from “infinity” the Fraunhofer diffraction pattern of the source, and places it into the focal plane. It now remains to propagate the final distance from the focal plane to the image plane. We do so with the free-propagation point spread function of Eq. (7.27): $\psi_I = \int P_{IF}\,d\Sigma_F\,\psi_F$, which integrates to

$$\psi_I(\mathbf{x}_I) = -\frac{u}{v}\,e^{ik(u+v)}\exp\left[\frac{ik x_I^2}{2(v-f)}\right]\psi_S(\mathbf{x}_S = -\mathbf{x}_I u/v). \tag{7.32}$$

This says that (again ignoring a phase factor) the wave in the image plane is just a magnified, inverted version of the wave in the source plane, as we might have expected from geometric optics. In words, the lens acts by taking the Fourier transform of the source and then taking the Fourier transform again to recover the source structure.⁵ The focal plane is a convenient place to process the image by altering its Fourier transform, a process known as spatial filtering. One simple example is a low-pass filter, in which a small circular aperture or “stop” is introduced into the focal plane, thereby allowing only the low-order spatial Fourier components to be transmitted to the image plane. This will lead to considerable smoothing of the wave. An application is to the output beam from a laser (Chap. 9), which ought to be smooth but has high-spatial-frequency structure on account of noise and imperfections in the optics. A low-pass filter can be used to clean the beam. In the language of Fourier transforms, if we multiply the transform of the source, in the focal plane, by a small-diameter circular aperture function, we will thereby convolve the image with a broad Airy-disk smoothing function. Conversely, we can exclude the low spatial frequencies with a high-pass filter, e.g. by placing an opaque circular disk in the focal plane, centered on the optic axis. This will have the effect of accentuating boundaries and discontinuities in the source and can be used to highlight features where the gradient of the brightness is large. Another type of filter is used when the image is pixellated and thus has unwanted structure with wavelength equal to the pixel size: a narrow range of frequencies centered around this spatial frequency is removed by putting an appropriate filter in the focal plane.
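Spatial filtering can be mimicked with a discrete Fourier transform that plays the role of the lens. The sketch is one-dimensional and periodic, and it ignores the inversion and magnification of Eq. (7.32). In it, a smooth hypothetical source carries an unwanted "pixel" ripple with a period of four samples. Both a low-pass stop and a narrow notch at the pixel frequency remove the ripple and leave the smooth part intact.

```python
import cmath
import math


def dft(values, sign=-1):
    """O(N^2) discrete Fourier transform; the focal plane is modeled as this transform."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * m * j / n) for j, v in enumerate(values))
            for m in range(n)]


def inverse_dft(coefficients):
    n = len(coefficients)
    return [v / n for v in dft(coefficients, sign=+1)]


def frequency_index(m, n):
    """|spatial frequency| of DFT bin m, in cycles per window."""
    return min(m, n - m)


N = 128
smooth = [math.cos(2 * math.pi * 3 * j / N) + 0.5 * math.sin(2 * math.pi * 5 * j / N) for j in range(N)]
pixel_ripple = [0.2 * math.cos(2 * math.pi * 32 * j / N) for j in range(N)]   # period = 4 samples
source = [a + b for a, b in zip(smooth, pixel_ripple)]

focal_plane = dft(source)
# Low-pass "stop": transmit only |frequency| <= 10.
low_pass = [c if frequency_index(m, N) <= 10 else 0 for m, c in enumerate(focal_plane)]
# Notch filter: block a narrow band around the pixel frequency 32 only.
notch = [0 if abs(frequency_index(m, N) - 32) <= 1 else c for m, c in enumerate(focal_plane)]

image_low = [v.real for v in inverse_dft(low_pass)]
image_notch = [v.real for v in inverse_dft(notch)]
print("low-pass residual:", max(abs(a - b) for a, b in zip(image_low, smooth)))
print("notch residual:", max(abs(a - b) for a, b in zip(image_notch, smooth)))
```

The low-pass stop would also blur any genuine fine structure in the source. The notch removes only the pixel frequency, which is why it is preferred for pixellated images.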

7.5.4

Phase Contrast Microscopy

“Phase contrast microscopy” (Fig. 7.15) is a useful technique for studying small objects, such as transparent biological specimens, that modify the phase of coherent illuminating light but not its amplitude. Suppose that the phase change in the specimen, ϕ(x), is small, |ϕ| ≪ 1, as is often the case for biological specimens. We can then write the field just after it passes through the specimen as

$$\psi_S(\mathbf{x}) = H(\mathbf{x})\, e^{i\phi(\mathbf{x})} \simeq H(\mathbf{x}) + i\phi(\mathbf{x})\, H(\mathbf{x}) . \qquad (7.33)$$

⁵ This description of image formation was developed by Ernst Abbe in 1873.

Fig. 7.15: Schematic Phase Contrast Microscope (labeled elements: specimen, aperture edges, Airy disk, focal plane with a 1/4-wave phase plate and attenuator, image plane).

Here H is the microscope’s aperture function, unity for |x| < D/2 and zero for |x| > D/2, with D the aperture diameter. The intensity is not modulated, and therefore the effect of the specimen on the wave is very hard to observe unless one is clever. Equation (7.33) and the linearity of the Fourier transform imply that the wave in the focal plane is the sum of (i) the Fourier transform of the aperture function, i.e. an Airy function (bright spot with very small diameter), and (ii) the transform of the phase function convolved with that of the aperture (in which the fine-scale variations of the phase function dominate and push ϕ̃ to large radii in the focal plane, Fig. 7.15, and the aperture has little influence):

$$\psi_F \sim \mathrm{jinc}\!\left(\frac{kD|\mathbf{x}_F|}{2f}\right) + i\,\tilde\phi\!\left(\frac{k\mathbf{x}_F}{f}\right) . \qquad (7.34)$$

If a high-pass filter is used to remove the Airy disk completely, then the remaining wave field in the image plane will be essentially ϕ magnified by v/u. However, the intensity F ∝ (ϕv/u)² will be quadratic in the phase, and so the contrast in the image will still be small. A better technique is to phase shift the Airy disk in the focal plane by π/2 so that the two terms in Eq. (7.34) are in phase. The intensity variations, F ∼ (1 ± ϕ)² ≃ 1 ± 2ϕ, will now be linear in the phase ϕ. An even better procedure is to attenuate the Airy disk until its amplitude is comparable with the rms value of ϕ and also phase shift it by π/2 (as indicated by the “1/4-wave phase plate and attenuator” in Fig. 7.15). This will maximise the contrast in the final image. Analogous techniques are used in communications to interconvert amplitude-modulated and phase-modulated signals.
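The advantage of shifting, rather than removing, the Airy disk can be checked with a toy calculation. The sketch below assumes a one-dimensional, periodic specimen much smaller than the aperture, so that the zero-spatial-frequency sample of NumPy's FFT stands in for the Airy disk of Eq. (7.34); the helper name `image_intensity` and the numbers are illustrative, not taken from the text. It compares the image-plane intensity swing with no filter, with the disk blocked, and with the disk shifted by π/2.

```python
import numpy as np

def image_intensity(phi, dc_factor):
    """Image-plane intensity of a weak phase object exp(i*phi) after focal-plane filtering.

    Sketch assumptions: unit magnification, aperture wider than the sample
    (H = 1 over the periodic grid), and the DFT's zero-frequency bin playing the
    role of the Airy disk. Only that bin is multiplied by dc_factor; all other
    spatial frequencies pass unchanged.
    """
    psi_focal = np.fft.fft(np.exp(1j * phi))      # focal-plane field
    psi_focal[0] *= dc_factor                     # act on the central spot only
    return np.abs(np.fft.ifft(psi_focal)) ** 2    # back to the image plane

n = 512
x = np.arange(n) / n
phi = 0.05 * np.sin(2 * np.pi * 8 * x)            # weak, zero-mean phase object

plain = image_intensity(phi, 1.0)      # no filter: intensity stays uniform
blocked = image_intensity(phi, 0.0)    # Airy disk removed: signal ~ phi^2
shifted = image_intensity(phi, 1j)     # Airy disk shifted by pi/2: ~ 1 + 2*phi

for name, img in (("unfiltered", plain), ("disk removed", blocked), ("pi/2 shift", shifted)):
    print(f"{name:12s} intensity swing = {img.max() - img.min():.2e}")
```

For a phase amplitude ϕ0 = 0.05 the swing should come out essentially zero, then close to ϕ0² ≈ 2.5 × 10⁻³, then close to 4ϕ0 = 0.2, reproducing the quadratic versus linear dependence on ϕ described above.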


7.5.5

Gaussian Beams: Interferometric Gravitational-Wave Detectors

The mathematical techniques of Fourier optics enable us to analyze the structure and propagation of light beams that have Gaussian profiles. (Such Gaussian beams are the natural output of ideal lasers, they are the real output of spatially filtered lasers, and they are widely used for optical communications, interferometry and other practical applications. Moreover, they are the closest one can come in the real world of wave optics to the idealization of a geometric-optics pencil beam.) Consider a beam that is precisely plane-fronted, with a Gaussian profile, at location z = 0 on the optic axis,

$$\psi_0 = \exp\!\left(-\frac{\varpi^2}{\sigma_0^2}\right) ; \qquad (7.35)$$

here ϖ = |x| is radial distance from the optic axis. The form of this same wave at a distance z further down the optic axis can be computed by folding this ψ0 into the point spread function (7.27) (with the distance d replaced by z). The result is

$$\psi_z = \frac{\sigma_0}{\sigma_z} \exp\!\left(-\frac{\varpi^2}{\sigma_z^2}\right) \exp\!\left[i\left(\frac{k\varpi^2}{2R_z} + kz - \tan^{-1}\frac{z}{z_0}\right)\right] , \qquad (7.36a)$$

where

$$z_0 = \frac{k\sigma_0^2}{2} = \frac{\pi\sigma_0^2}{\lambda} , \qquad \sigma_z = \sigma_0\left(1 + \frac{z^2}{z_0^2}\right)^{1/2} , \qquad R_z = z\left(1 + \frac{z_0^2}{z^2}\right) . \qquad (7.36b)$$
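Equations (7.36) can be checked by propagating the initial Gaussian (7.35) numerically. The sketch below is an illustration under stated assumptions rather than the method used in the text: it works in one transverse dimension (the width law for σz is the same per transverse dimension, although the amplitude prefactor and Gouy phase would differ), it replaces folding with the point spread function (7.27) by an FFT-based paraxial transfer function, and it uses illustrative values λ = 1 µm and σ0 = 50 µm. The helper names `gaussian_beam` and `paraxial_propagate_1d` are hypothetical.

```python
import numpy as np

def gaussian_beam(sigma0, wavelength, z):
    """Return z0, sigma_z, R_z and the Gouy phase tan^-1(z/z0), following Eqs. (7.36)."""
    k = 2 * np.pi / wavelength
    z0 = k * sigma0**2 / 2                           # equals pi sigma0^2 / lambda
    sigma_z = sigma0 * np.sqrt(1 + (z / z0) ** 2)
    R_z = np.inf if z == 0 else z * (1 + (z0 / z) ** 2)
    gouy = np.arctan(z / z0)
    return z0, sigma_z, R_z, gouy

def paraxial_propagate_1d(psi, dx, wavelength, z):
    """Propagate a 1D transverse field a distance z with the paraxial transfer function.

    Each plane-wave component exp(i kx x) acquires exp(-i kx^2 z / 2k) relative to
    exp(ikz); the common factor exp(ikz) is dropped because it does not affect |psi|.
    """
    k = 2 * np.pi / wavelength
    kx = 2 * np.pi * np.fft.fftfreq(psi.size, d=dx)
    return np.fft.ifft(np.fft.fft(psi) * np.exp(-1j * kx**2 * z / (2 * k)))

# Illustrative numbers (not from the text): 1 micron light, 50 micron waist radius.
wavelength, sigma0 = 1.0e-6, 50e-6
x = np.linspace(-2e-3, 2e-3, 4096, endpoint=False)
dx = x[1] - x[0]
psi0 = np.exp(-x**2 / sigma0**2)                     # Eq. (7.35), one transverse dimension

z0 = gaussian_beam(sigma0, wavelength, 0.0)[0]
results = []
for z in (0.5 * z0, z0, 3.0 * z0):
    intensity = np.abs(paraxial_propagate_1d(psi0, dx, wavelength, z)) ** 2
    # For intensity ~ exp(-2 x^2 / sigma^2), the second moment <x^2> equals sigma^2 / 4.
    sigma_numeric = 2 * np.sqrt(np.sum(x**2 * intensity) / np.sum(intensity))
    sigma_formula = gaussian_beam(sigma0, wavelength, z)[1]
    results.append((z / z0, sigma_numeric, sigma_formula))
    print(f"z = {z / z0:.1f} z0: numeric {sigma_numeric * 1e6:.2f} um, "
          f"Eq. (7.36b) {sigma_formula * 1e6:.2f} um")
```

The numerically measured widths should agree closely with σz from Eq. (7.36b) at every distance, and `gaussian_beam` also returns Rz = 2z0 at z = z0, the minimum of the wave-front radius of curvature.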

These equations for the freely propagating Gaussian beam are valid for negative z as well as positive. From these equations we learn the following properties of the beam:

• The beam’s cross-sectional intensity distribution F ∝ |ψz|² ∝ exp(−2ϖ²/σz²) remains a Gaussian as the wave propagates, with a beam radius σz = σ0 √(1 + z²/z0²) that is a minimum at z = 0 (the beam’s waist) and grows away from the waist, both forward and backward, in just the manner one expects from our uncertainty-principle discussion of wave-front spreading [Eq. (7.9)]. At distances |z| ≪ z0 from the waist location (corresponding to a Fresnel length rF = √(λ|z|) ≪ √π σ0), the beam radius is nearly constant; this is the Fresnel region. At distances |z| ≫ z0 (rF ≫ √π σ0), the beam radius increases linearly, i.e. the beam spreads with an opening angle θ = σ0/z0 = λ/(πσ0); this is the Fraunhofer region.

• The beam’s wave fronts (surfaces of constant phase) have ϕ = kϖ²/2Rz + kz − tan⁻¹(z/z0) = constant. The tangent term (called the wave’s Gouy phase) varies far more slowly with changing z than does the kz term, so the wave fronts are almost precisely z = constant − ϖ²/2Rz, which is a segment of a sphere of radius Rz. Thus, the wave fronts are spherical, with radii of curvature Rz = z(1 + z0²/z²), which is infinite (flat phase fronts) at the waist z = 0, decreases to its minimum value 2z0 at z = z0 (boundary between Fresnel and Fraunhofer regions and beginning of substantial wave-front spreading), and then increases as Rz ≃ z (gradual flattening of the spreading wave fronts) as one moves deep into the Fraunhofer region.

• The Gaussian beam’s form (7.36) at some arbitrary location is fully characterized by three parameters: the wavelength λ = 2π/k, the distance z to the waist, and the beam radius at the waist σ0 [from which one can compute the local beam radius σz and the local wave-front radius of curvature Rz via Eqs. (7.36b)].

One can easily compute the effects of a thin lens on a Gaussian beam by folding the ψz at the lens’s location into the lens point spread function (7.28). The result is a phase change that preserves the general Gaussian form of the wave, but alters the distance z to the waist and the radius σ0 at the waist. Thus, by judicious placement of lenses (or, equally well, curved mirrors), and with judicious choices of the lenses’ and mirrors’ focal lengths, one can tailor the parameters of a Gaussian beam to fit whatever optical device one is working with. For example, if one wants to send a Gaussian beam into a self-focusing optical fiber (Exs. 6.7 and 7.14), one should place its waist at the entrance to the fiber, and adjust its waist size there to coincide with that of the fiber’s Gaussian mode of propagation (the mode analyzed in Ex. 7.14). The beam will then enter the fiber smoothly, and will propagate steadily along the fiber, with the effects of the transversely varying index of refraction continually compensating for the effects of diffraction so as to keep the phase fronts flat and the waist size constant.

Gaussian beams are used (among many other places) in interferometric gravitational-wave detectors, such as LIGO (the Laser Interferometer Gravitational-wave Observatory). We shall learn how these GW interferometers work in Sec. 8.5.
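The lens manipulations described above are often carried out in practice with the complex beam parameter q, a standard bookkeeping device that this text does not itself use. The sketch below assumes the convention ψz ∝ exp(ikϖ²/2q) with q = z − iz0, which reproduces both the curvature term and the Gaussian envelope of Eq. (7.36a), and assumes that a thin converging lens multiplies the field by exp(−ikϖ²/2f). Under those assumptions a lens maps 1/q to 1/q − 1/f and free propagation maps q to q + d. All function names and numbers are illustrative.

```python
import numpy as np

def q_at(z, z0):
    """Complex beam parameter at a distance z past the waist (z < 0 means before it)."""
    return z - 1j * z0

def waist_of(q):
    """Invert q = z - i z0: return (distance past the waist, z0)."""
    return q.real, -q.imag

def thin_lens(q, f):
    """Lens phase exp(-i k varpi^2 / 2f) combined with exp(i k varpi^2 / 2q)."""
    return 1.0 / (1.0 / q - 1.0 / f)

def free_space(q, d):
    """Propagating a distance d along the optic axis just adds d to q."""
    return q + d

def waist_radius(z0, wavelength):
    """sigma0 from z0 = pi sigma0^2 / lambda, Eq. (7.36b)."""
    return np.sqrt(wavelength * z0 / np.pi)

# Illustrative example: focus a nearly collimated 1 mm beam with an f = 10 cm lens.
wavelength, sigma0, f = 1.06e-6, 1.0e-3, 0.10
z0 = np.pi * sigma0**2 / wavelength            # about 3 m, much longer than f
q_out = thin_lens(q_at(0.0, z0), f)            # lens placed at the input waist
z_out, z0_out = waist_of(q_out)
dist_to_new_waist = -z_out                      # new waist lies downstream of the lens
sigma0_out = waist_radius(z0_out, wavelength)

print(f"new waist {dist_to_new_waist * 100:.2f} cm beyond the lens, "
      f"radius {sigma0_out * 1e6:.1f} um")
print(f"compare f*lambda/(pi*sigma0) = {f * wavelength / (np.pi * sigma0) * 1e6:.1f} um")

# Consistency check: after propagating to the new waist, q is purely imaginary.
z_check, _ = waist_of(free_space(q_out, dist_to_new_waist))
```

For a nearly collimated input (z0 ≫ f) the new waist lands close to the focal distance f, with a radius close to fλ/(πσ0); moving the lens or changing f shifts and resizes the waist, which is the tailoring described above.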
For the present, all we need to know is that a GW interferometer entails an optical cavity formed by mirrors facing each other, as in Fig. 6.8 of Chap. 6. A Gaussian beam travels back and forth between the two mirrors, with its light superposing on itself coherently after each round trip, i.e. the light resonates in the cavity formed by the two mirrors. Each mirror hangs from an overhead support, and when a gravitational wave passes, it pushes the hanging mirrors back and forth with respect to each other, causing the cavity to lengthen and shorten by a very tiny fraction of a light wavelength. This puts a tiny phase shift on the resonating light, which is measured by allowing some of the light to leak out of the cavity and interfere with light from another, similar cavity. See Sec. 8.5.

In order for the light to resonate in the cavity, the mirrors’ surfaces must coincide with the Gaussian beam’s wave fronts. Suppose that the mirrors are identical, with radii of curvature R, and are separated by a distance L = 4 km, as in LIGO. Then the beam must be symmetric around the center of the cavity, so its waist must be halfway between the mirrors. What is the smallest that the beam radius can be, at the mirrors’ locations z = ±L/2 = ±2 km? From σz = σ0(1 + z²/z0²)^(1/2) together with z0 = πσ0²/λ, we have σL/2² = (λ/π)[z0 + (L/2)²/z0], which is minimized when z0 = L/2 = 2 km. If the wavelength is λ = 1.06 µm (Nd:YAG laser light) as in LIGO, then the beam radii at the waist and at the mirrors are σ0 = √(λz0/π) = √(λL/2π) = 2.6 cm and σL/2 = √2 σ0 = 3.7 cm, and the mirrors’ radii of curvature are RL/2 = L = 4 km. This was approximately the regime of parameters used for LIGO’s initial GW interferometers, which carried out a two-year-long search for gravitational waves from autumn 2005 to autumn

2007. A new generation of GW interferometers, called “Advanced LIGO”, is in preparation. In these GW interferometers, the spot sizes on the mirrors will be made much larger, so as to reduce thermal noise by averaging over a much larger spatial sampling of thermal fluctuations of the mirror surfaces (cf. Sec. 10.5 and Exs. 5.8 and 10.14). How can the spot sizes on the mirrors be enlarged? From Eqs. (7.36b) we see that, in the limit z0 = πσ0²/λ → 0, the mirrors’ radii of curvature approach the cavity half-length, RL/2 → L/2, and the beam radii on the mirrors diverge as σL/2 → Lλ/(2πσ0) → ∞. This is the same instability as we discovered, in the geometric-optics limit, in Ex. 6.11. Advanced LIGO takes advantage of this instability by moving toward the near-unstable regime, causing the beams on the mirrors to enlarge. The mirrors’ radii of curvature are set at RL/2 = 2.079 km, just 4 per cent above the unstable point R = L/2 = 2 km; and Eqs. (7.36b) then tell us that σ0 = 1.16 cm, z0 = 0.399 km ≪ L/2 = 2 km, and σL/2 has been pushed up by a factor of about 1.6, to σL/2 = 5.93 cm. The mirrors are deep into the Fraunhofer, wave-front-spreading region.

****************************

EXERCISES

Exercise 7.13 Problem: Convolution via Fourier Optics

(a) Suppose that you have two thin sheets with transmission functions t = g(x, y) and t = h(x, y), and you wish to compute via Fourier optics the convolution

$$g \otimes h(x_o, y_o) \equiv \iint g(x, y)\, h(x + x_o, y + y_o)\, dx\, dy . \qquad (7.37)$$

Devise a method for doing so using Fourier optics. [Hint: use several lenses and a projection screen with a pinhole through which passes light whose intensity is proportional to the convolution; place the two sheets at strategically chosen locations along the optic axis, and displace one of the two sheets transversely with respect to the other.]

(b) Suppose you wish to convolve a large number of different one-dimensional functions simultaneously, i.e.
you want to compute

$$g_j \otimes h_j(x_o) \equiv \int g_j(x)\, h_j(x + x_o)\, dx \qquad (7.38)$$

for j = 1, 2, …. Devise a way to do this via Fourier optics using appropriately constructed transmissive sheets and cylindrical lenses.

Exercise 7.14 Problem: Guided Gaussian Beams

Consider the self-focusing optical fiber discussed in Sec. 6.7, in which the refractive index is

$$n(\mathbf{x}) = n_0 \left(1 - \alpha^2 \varpi^2\right)^{1/2} , \qquad (7.39)$$

where ϖ = |x|.

(a) Write down the Helmholtz equation in cylindrical polar coordinates and seek an axisymmetric mode for which ψ = R(ϖ)Z(z), where R, Z are functions to be determined and z measures distance along the fiber. In particular, show that there exists a mode with a Gaussian radial profile that propagates along the fiber without spreading.

(b) Compute the group and phase velocities along the fiber for this mode.

Exercise 7.15 Exercise: Noise Due to Scattered Light in LIGO

In LIGO and other GW interferometers, one potential source of noise is scattered light: When the Gaussian beam in one of LIGO’s cavities reflects off a mirror, a small portion of the light gets scattered toward the walls of the cavity’s vacuum tube. Some of this scattered light can reflect or scatter off the tube wall and then propagate toward the distant mirror, where it scatters back into the Gaussian beam; see Fig. 7.16 (without the baffles that are shown dashed). This is troublesome because the tube wall vibrates due to sound-wave excitations and seismic excitations, and those vibrations put a phase shift on the scattered light. Although the fraction of all the light that scatters in this way is tiny, the phase shift is huge compared to that produced in the Gaussian beam by gravitational waves; and when the tiny amount of scattered light with its huge oscillating phase shift recombines into the Gaussian beam, it produces a net Gaussian-beam phase shift that can be large enough to mask a gravitational wave. This exercise will explore some aspects of this scattered noise and its control.

(a) The scattering of Gaussian-beam light off the mirror is caused by bumps in the mirror surface (imperfections). Denote by h(x) the height of the mirror surface, relative to the desired shape (a segment of a sphere with radius of curvature that matches the Gaussian beam’s wave fronts). Show that, if the Gaussian-beam field emerging from a perfect mirror is ψG(x) [Eq.
(7.36)] at the mirror plane, then the beam emerging from the actual mirror is ψ′(x) = ψG(x) exp[−i2kh(x)]. The magnitude of the mirror irregularities is very small compared to a wavelength, so |2kh| ≪ 1, and the wave field emerging from the mirror is ψ′(x) = ψG(x)[1 − i2kh(x)]. Explain why the factor 1 does not contribute at all to the scattered light (where does its light go?), so the scattered light field, emerging from the mirror, is

$$\psi^S(\mathbf{x}) = -i\,\psi^G(\mathbf{x})\, 2kh(\mathbf{x}) . \qquad (7.40)$$

Fig. 7.16: Scattered light in LIGO’s beam tube (labeled elements: the two mirrors, the Gaussian beam between them, the vacuum tube wall, and the baffles).

(b) Assume that, when it arrives at the vacuum-tube wall, the scattered light is in the Fraunhofer region. You will justify this below. Then at the tube wall, the scattered light field is given by the Fraunhofer formula

$$\psi^S(\boldsymbol{\theta}) \propto \int \psi^G(\mathbf{x})\, k h(\mathbf{x})\, e^{ik\mathbf{x}\cdot\boldsymbol{\theta}}\, d^2x . \qquad (7.41)$$

Show that the light that hits the tube wall at an angle θ = |θ| to the optic axis arises from irregularities in the mirror that have spatial wavelengths λmirror ∼ λ/θ. The radius of the beam tube is R = 60 cm in LIGO and the length of the tube (distance between cavity mirrors) is L = 4 km. What is the spatial wavelength of the mirror irregularities which scatter light to the tube wall at distances z ∼ L/2 (which can then reflect or scatter off the wall toward the distant mirror and there scatter back into the Gaussian beam)? Show that for these irregularities, the tube wall is, indeed, in the Fraunhofer region. (Hint: the irregularities have a coherence length of only a few wavelengths λmirror.)

(c) In the initial LIGO interferometers, the mirrors’ scattered light consisted of two components: one peaked strongly toward small angles so it hit the distant tube wall, e.g. at z ∼ L/2, and the other roughly isotropically distributed. What was the size of the irregularities that produced the isotropic component?

(d) To reduce substantially the amount of scattered light reaching the distant mirror via reflection or scattering from the tube wall, a set of baffles was installed in the tube, in such a way as to hide the wall from scattered light (dashed lines in Fig. 7.16). The baffles have an angle of 35° to the tube wall, so when light hits a baffle, it reflects at a steep angle, ∼70°, toward the opposite tube wall and after a few bounces gets absorbed. However, a small portion of the scattered light can now diffract off the top of each baffle and then propagate to the distant mirror and scatter back into the main beam.
Especially troublesome is the case of a mirror in the center of the beam tube’s cross section, because light that scatters off such a mirror travels nearly the same total distance from mirror to the top of some baffle and then to the distant mirror, independent of the azimuthal angle φ on the baffle at which it diffracts. There is then a danger of coherent superposition of all the scattered light that diffracts off all angular locations around any given baffle, and coherent superposition means a much enlarged net noise. To protect against any such coherence, the baffles in the LIGO beam tubes are serrated, i.e. they have saw-tooth edges, and the heights of the teeth are drawn from a random (Gaussian) probability distribution. The typical tooth heights are large enough to extend through about six Fresnel zones. Questions for which part (e) may be helpful: How wide is each Fresnel zone at the baffle location, and correspondingly, how high must the typical baffle tooth be? By approximately how much do the random serrations reduce the light-scattering noise, relative to what it would be with no serrations and with coherent scattering?

(e) To aid you in answering part (d), show that the propagator (point spread function) for light that begins at the center of one mirror, travels to the edge of a baffle [at a

radial distance R(φ) from the beam-tube axis, where φ is azimuthal angle around the beam tube, and at a distance ℓ down the tube from the scattering mirror] and then propagates to the center of the distant mirror is

$$P \propto \int_0^{2\pi} \exp\!\left[\frac{ikR^2(\phi)}{2\ell_{\rm red}}\right] d\phi , \qquad \text{where} \quad \frac{1}{\ell_{\rm red}} = \frac{1}{\ell} + \frac{1}{L-\ell} . \qquad (7.42)$$

Note that ℓred is the “reduced baffle distance” by analogy with the “reduced mass” in a binary system. One can show that the time-varying part of the scattered-light amplitude (i.e. the part whose time dependence is produced by baffle vibrations) is proportional to this propagator. Explain why this is plausible. Then explain how the baffle serrations, embodied in the φ dependence of R(φ), produce the reduction of scattered-light amplitude in the manner described in part (d).

****************************
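The LIGO cavity numbers quoted in Sec. 7.5.5 follow directly from Eqs. (7.36b). Requiring Rz = R at the mirror position z = L/2 and solving for z0 gives z0² = (L/2)(R − L/2), after which σ0 and the spot radius on the mirrors follow. The sketch below (the function name `symmetric_cavity` is hypothetical) evaluates this for the initial-LIGO choice R = L and the Advanced-LIGO choice R = 2.079 km, with L = 4 km and λ = 1.06 µm as quoted in the text.

```python
import math

WAVELENGTH = 1.06e-6   # Nd:YAG wavelength in metres, as quoted in the text
L = 4000.0             # mirror separation in metres

def symmetric_cavity(R, L, wavelength):
    """Gaussian mode of a cavity with two identical mirrors of curvature radius R.

    The waist sits midway, so the mirrors are at z = +/- L/2, and matching
    R_z = z (1 + z0^2 / z^2) there gives z0^2 = (L/2)(R - L/2).
    Only R > L/2 is handled in this sketch.
    """
    half = L / 2
    if R <= half:
        raise ValueError("this sketch needs R > L/2 so that z0 is real")
    z0 = math.sqrt(half * (R - half))
    sigma0 = math.sqrt(wavelength * z0 / math.pi)          # from z0 = pi sigma0^2 / lambda
    sigma_mirror = sigma0 * math.sqrt(1 + (half / z0) ** 2)
    return z0, sigma0, sigma_mirror

for label, R in (("initial LIGO", 4000.0), ("Advanced LIGO", 2079.0)):
    z0, sigma0, sigma_mirror = symmetric_cavity(R, L, WAVELENGTH)
    print(f"{label:13s}: z0 = {z0 / 1000:.3f} km, sigma0 = {sigma0 * 100:.2f} cm, "
          f"spot on mirrors = {sigma_mirror * 100:.2f} cm")
```

The results come out near 2.6 cm and 3.7 cm for R = L, and near z0 ≈ 0.40 km, σ0 ≈ 1.16 cm and 5.9 cm for R = 2.079 km. The small differences from the quoted z0 = 0.399 km and 5.93 cm are consistent with R being quoted to only four significant figures.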

7.6

Diffraction at a Caustic

In Sec. 6.5, we described how caustics can be formed in general in the geometric-optics limit, e.g., on the bottom of a swimming pool when the water’s surface is randomly rippled, or behind a gravitational lens. We chose as an example a simple phase-changing screen illuminated by a point source and observed from some fixed distance r, and we showed how a pair of images would merge as the transverse distance x of the observer from the caustic decreases to zero. We expanded the phase in a Taylor series, ϕ(s, x) = as³/3 − bxs, where the coefficients a, b are constant and s is a transverse coordinate in the screen (cf. Fig. 6.11). We were then able to show that the magnification of the images diverged ∝ x^(−1/2) [Eq. (6.75)] as the caustic was approached, then crashed to zero just past the caustic (the two images disappeared). This singular behavior raised the question of what happens when we take into account the finite wavelength of the wave. We are now in a position to answer this question. We simply use the Helmholtz-Kirchhoff integral (7.6) to write the expression for the amplitude measured at position x in the form

$$\psi(x) \propto \frac{1}{\lambda r}\int ds\, e^{i\phi(s,x)} = \frac{1}{\lambda r}\int ds\, (\cos\phi + i\sin\phi) , \qquad (7.43)$$

ignoring multiplicative constants and constant phase factors. The phase ϕ varies rapidly with s at large |s|, so we can treat the limits of integration as ±∞. Because ϕ(s, x) is odd in s, the sin term integrates to zero, and the integral turns out to be the Airy function

$$\psi \propto \frac{1}{\lambda r}\int_{-\infty}^{\infty} ds\, \cos\!\left(\frac{as^3}{3} - bxs\right) = \frac{2\pi}{\lambda r\, a^{1/3}}\, \mathrm{Ai}\!\left(-\frac{bx}{a^{1/3}}\right) . \qquad (7.44)$$

Ai(ξ) is displayed in Fig. 7.17.


Fig. 7.17: The Airy function Ai(ξ) describing diffraction at a caustic. The argument is ξ = −bx/a^(1/3), where x is distance from the caustic and a, b are constants.

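The asymptotic behavior of Ai(ξ) is

$$\mathrm{Ai}(\xi) \sim \frac{\sin\!\left(\tfrac{2}{3}|\xi|^{3/2} + \tfrac{\pi}{4}\right)}{\pi^{1/2}\,|\xi|^{1/4}} , \quad \xi \to -\infty ; \qquad \mathrm{Ai}(\xi) \sim \frac{e^{-2\xi^{3/2}/3}}{2\pi^{1/2}\,\xi^{1/4}} , \quad \xi \to \infty . \qquad (7.45)$$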
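To see Eqs. (7.44) and (7.45) at work without a special-function library, the sketch below evaluates Ai(ξ) directly from its integral representation. As an implementation assumption, not a step taken in the text, the integration contour is rotated into the complex plane so that the integrand decays instead of oscillating; the helper names `airy_ai` and `airy_asymptotic` are hypothetical. The script compares the result with the asymptotic forms (7.45) and shows that Ai, and hence the amplitude at the caustic, stays finite at ξ = 0.

```python
import numpy as np

def airy_ai(xi, tau_max=10.0, n=40_001):
    """Ai(xi) from Ai(xi) = (1/2pi) * integral over real t of exp[i(t^3/3 + xi t)].

    The two halves of the real-t contour are rotated onto the rays arg t = pi/6
    and arg t = 5pi/6, where the integrand decays like exp(-tau^3/3). The two
    rays give complex-conjugate contributions, leaving
        Ai(xi) = (1/pi) Re integral_0^inf exp(-tau^3/3 + i xi tau w) w dtau,
    with w = exp(i pi/6); this integral is done with the trapezoid rule.
    """
    w = np.exp(1j * np.pi / 6)
    tau = np.linspace(0.0, tau_max, n)
    f = np.exp(-tau**3 / 3 + 1j * xi * tau * w) * w
    h = tau[1] - tau[0]
    integral = h * (f.sum() - 0.5 * (f[0] + f[-1]))
    return integral.real / np.pi

def airy_asymptotic(xi):
    """Leading-order asymptotic forms of Eq. (7.45); not valid near xi = 0."""
    if xi < 0:
        s = -xi
        return np.sin(2 * s**1.5 / 3 + np.pi / 4) / (np.sqrt(np.pi) * s**0.25)
    return np.exp(-2 * xi**1.5 / 3) / (2 * np.sqrt(np.pi) * xi**0.25)

for xi in (-12.0, -6.0, -2.0, 2.0, 5.0):
    print(f"xi = {xi:6.1f}   Ai = {airy_ai(xi): .5f}   asymptotic = {airy_asymptotic(xi): .5f}")

# At the caustic itself the wave amplitude is finite, unlike the geometric-optics result.
print(f"Ai(0) = {airy_ai(0.0):.6f}")
```

The agreement between the two columns improves as |ξ| grows, as expected of leading-order asymptotic forms, while the value at ξ = 0 is a finite number of order 0.36.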

We see that the amplitude ψ remains finite as the caustic is approached, instead of diverging as in the geometric-optics limit, and it decreases smoothly toward zero once the caustic is passed, instead of crashing abruptly to zero. For x > 0 (ξ = −bx/a^(1/3) < 0; left part of Fig. 7.17), where an observer sees two geometric-optics images, the envelope of ψ diminishes ∝ x^(−1/4), so the intensity |ψ|² decreases ∝ x^(−1/2), just as in the geometric-optics limit. The peak magnification is ∝ a^(−2/3). What is actually seen is a series of bands alternating dark and light, with spacing calculable using Δ(2|ξ|^(3/2)/3) = π, or Δx ∝ x^(−1/2). At sufficient distance from the caustic, it will not be possible to resolve these bands, and a uniform illumination of average intensity will be observed, so we recover the geometric-optics limit. The near-caustic scalings derived above and others in Ex. 7.16, like the geometric-optics scalings [text following Eq. (6.75)], are a universal property of this type of caustic (the simplest caustic of all, the “fold”).

There is a helpful analogy, familiar from quantum mechanics. Consider a particle in a harmonic potential well in a very excited state. Its wave function is given in the usual way using Hermite polynomials of large order. Close to the classical turning point, these functions change from being oscillatory to an exponential decay, just like the Airy function (and if we were to expand about the turning point, we would recover Airy functions). What is happening, of course, is that the probability density of finding the particle close to its turning point diverges classically because it is moving vanishingly slowly at the turning point; the oscillations are due to interference between waves associated with the particle moving in opposite directions at the turning point. For light near a caustic, if we consider the motions of photons transverse to the screen, then we have essentially the same problem.
The field’s oscillations are associated with interference of the waves associated with the motions of the photons in two geometric-optics

beams coming from slightly different directions and thus having slightly different transverse photon speeds. This is our first illustration of the formation of large-contrast interference fringes when only a few beams are combined. We shall meet other examples of such interference in the following chapter.

****************************

EXERCISES

Exercise 7.16 Problem: Wavelength scaling at a caustic

Assume that the phase variation introduced at the screen in Fig. 6.12 is non-dispersive, so that the ϕ(s, x) in Eq. (7.43) is ϕ ∝ λ^(−1). Show that the peak magnification of the interference fringes at the caustic scales with wavelength ∝ λ^(−4/3). Also show that the spacing of the fringes at a given observing position is ∝ λ.

****************************

Box 7.2
Important Concepts in Chapter 7

• Helmholtz equation for a propagating, monochromatic wave – Eq. (7.1b)
• Helmholtz-Kirchhoff integral – Eq. (7.4)
• Complex transmission function – Eq. (7.5)
• Helmholtz-Kirchhoff for wave propagating through an aperture – Eqs. (7.6), (7.7)
• Fresnel and Fraunhofer Diffraction compared:
  – Fresnel length and criteria for Fraunhofer and Fresnel Regions – Sec. 7.2.2
  – Qualitative forms of diffraction in Fresnel and Fraunhofer regions – Sec. 7.2.2
  – Wavefront spreading, at angle θ ∼ λ/a, in Fraunhofer region – Sec. 7.2.2
• Fraunhofer Diffraction – Sec. 7.3
  – Diffracted field as Fourier transform of transmission function t(θ) – Eq. (7.11a)
  – Diffracted intensity F ∝ |t(θ)|² – Eq. (7.11b)
  – Diffraction patterns for a slit and a circular aperture – bottom curve in Fig. 7.3, Eqs. (7.12), Sec. 7.3.2
  – Use of convolution theorem to analyze diffraction grating – Sec. 7.3.1
  – Babinet’s principle – Sec. 7.3.3
• Fresnel Diffraction – Sec. 7.4
  – As integral over the aperture with quadratically varying phase – Eqs. (7.22)
  – For rectangular aperture, slit, and straight edge, in terms of Fresnel integrals and Cornu spiral – Secs. 7.4.1 and 7.4.3
• Paraxial Fourier Optics
  – Point spread functions, as propagators down the optic axis – Sec. 7.5.2
  – Thin lens: field at focal plane as Fourier transform of source field (Fraunhofer region brought to focus) – Eq. (7.30)
  – Thin lens: field at image plane as inverted and magnified source field – Eq. (7.32)
  – Image and signal processing by optical techniques (e.g., phase contrast microscope) – last paragraph of Sec. 7.5.3, plus Sec. 7.5.4
• Gaussian beams – Sec. 7.5.5
  – Evolution of beam radius and phase front curvature – Eqs. (7.36)
  – Manipulating Gaussian beams with lenses and mirrors – Sec. 7.5.5
• Diffraction at a caustic: Airy pattern – Sec. 7.6


Bibliographic Note

Hecht (1998) has an excellent treatment of diffraction at roughly the same level as this chapter, but much more detailed. For a more advanced treatment, including mixing of polarizations by interaction of an electromagnetic wave with the edges of apertures and other objects, see Born and Wolf (1999). Other good texts are listed in the bibliography:

Bibliography

Berry, M. V. & Upstill, C. 1980 Prog. Optics 18, 257
Born, M. & Wolf, E. 1999 Principles of Optics, Seventh Edition, Cambridge: Cambridge University Press
Goodman, J. W. Introduction to Fourier Optics, New York: McGraw-Hill
Hecht, E. 1998 Optics, Third Edition, New York: Addison Wesley
Longhurst, R. S. 1973 Geometrical and Physical Optics, London: Longmans
Welford, W. T. 1988 Optics, Oxford: Oxford University Press

Contents

8 Interference
8.1 Overview
8.2 Coherence
8.2.1 Young’s Slits
8.2.2 Interference with an Extended Source: van Cittert-Zernike Theorem
8.2.3 More General Formulation of Spatial Coherence; Lateral Coherence Length
8.2.4 Generalization to two dimensions
8.2.5 Michelson Stellar Interferometer
8.2.6 Temporal Coherence
8.2.7 Michelson Interferometer and Fourier-Transform Spectroscopy
8.2.8 Degree of Coherence; Relation to Theory of Random Processes
8.3 Radio Telescopes
8.3.1 Two-Element Radio Interferometer
8.3.2 Multiple Element Radio Interferometer
8.3.3 Closure Phase
8.3.4 Angular Resolution
8.4 Etalons and Fabry-Perot Interferometers
8.4.1 Multiple Beam Interferometry; Etalons
8.4.2 Fabry-Perot Interferometer
8.4.3 Lasers
8.5 T2 Laser Interferometer Gravitational Wave Detectors
8.6 T2 Intensity Correlation and Photon Statistics


Chapter 8

Interference

Version 0808.2.K.pdf, 22 November 2008. [Same as 0808.1.K.pdf except for revised Fig. 8.1 and small clarifications of the discussion of this figure in the text, correction of several typos, and addition of Bibliographic Note and Box 8.2. There are no changes in any exercises.]

Please send comments, suggestions, and errata via email to [email protected] and to [email protected], or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 8.1
Reader’s Guide

• This chapter depends substantially on
  – Secs. 7.2, 7.3 and 7.5.5 of Chap. 7
  – The Wiener-Khintchine theorem for random processes, Sec. 5.3.3 of Chap. 5.
• The concept of coherence length or coherence time, as developed in this chapter, will be used in Chaps. 8, 14, 15 and 22 of this book.
• Interferometry as developed in this chapter, especially in Sec. 8.5, is a foundation for the discussion of gravitational-wave detection in Chap. 26.
• Nothing else in this book relies substantially on this chapter.

8.1

Overview

In the last chapter, we considered superpositions of waves that pass through a (typically large) aperture. The foundation for our analysis was the Helmholtz-Kirchhoff expression for the field at a chosen point P as a sum of contributions from all points on a closed surface surrounding P. The spatially varying field pattern resulting from this superposition of many different contributions was called diffraction.

In this chapter, we continue our study of superposition, but for the more special case where only two or at most several discrete beams are being superposed. For this special case one uses the term interference rather than diffraction. Interference is important in a wide variety of practical instruments designed to measure or utilize the spatial and temporal structures of electromagnetic radiation. However interference is not just of practical importance. Attempting to understand it forces us to devise ways of describing the radiation field that are independent of the field’s origin and independent of the means by which it is probed; and such descriptions lead us naturally to the fundamental concept of coherence (Sec. 8.2). The light from a distant, monochromatic point source is effectively a plane wave; we call it “perfectly coherent” radiation. In fact, there are two different types of coherence present: lateral or spatial coherence (coherence in the angular structure of the radiation field), and temporal or longitudinal coherence (coherence in the field’s temporal structure, which clearly must imply something also about its frequency structure). We shall see in Sec. 8.2 that for both types of coherence there is a measurable quantity, called the degree of coherence, that is the Fourier transform of either the angular intensity distribution or the spectrum of the radiation. Interspersed with our development of the theory of coherence are an application to the stellar interferometer (Sec. 8.2.5), by which Michelson measured the diameters of Jupiter’s moons and several bright stars using spatial coherence; and applications to a Michelson interferometer and its practical implementation in a Fourier-transform spectrometer (Sec. 8.2.7), which use temporal coherence to measure electromagnetic spectra, e.g. the spectrum of the cosmic microwave background radiation (CMB). After developing our full formalism for coherence, we shall go on in Sec.
8.3 to apply it to the operation of radio telescopes, which function by measuring the spatial coherence of the radiation field. In Sec. 8.4 we shall turn to multiple beam interferometry, in which incident radiation is split many times into several different paths and then recombined. A simple example is a Fabry-Perot etalon made from two parallel, highly reflecting surfaces. A cavity resonator (e.g. in a laser), which traps radiation for a large number of reflections, is essentially a large scale etalon. These principles find exciting application in laser interferometer gravitational-wave detectors, discussed in Sec. 8.5. In these devices, two very large etalons are used to trap laser radiation for a few tens of milliseconds, and the light beams emerging from the two etalons are then interfered with each other. Gravitational-wave-induced changes in the lengths of the etalons are monitored by observing time variations in the interference. Finally, in Sec. 8.6, we shall turn to the intensity interferometer, which, although it has not proved especially powerful in application, does illustrate some quite subtle issues of physics and, in particular, highlights the relationship between the classical and quantum theories of light.

8.2

Coherence

8.2.1

Young’s Slits

The most elementary example of interference is provided by Young’s slits. Suppose two long, narrow, parallel slits are illuminated coherently by monochromatic light from a distant


Fig. 8.1: (a) Young’s Slits. (b) Interference fringes observed in a transverse plane [Eq. (8.1b)]. (c) The propagation direction of the incoming waves is rotated to make an angle α to the optic axis; as a result, the angular positions of the interference fringes in drawing (b) are shifted by ∆θ = α [Eq. (8.3); not shown]. (d) Interference fringes observed from an extended source [Eq. (8.8)].

source that lies on the perpendicular bisector of the line joining the slits (the optic axis), so an incident wavefront reaches the slits simultaneously [Fig. 8.1(a)]. This situation can be regarded as having only one lateral dimension. The waves from the slits (effectively, two one-dimensional beams) fall onto a screen in the distant, Fraunhofer region, and there they interfere. The Fraunhofer interference pattern observed at a point P, whose position is specified using polar coordinates r, θ, is proportional to the spatial Fourier transform of the transmission function [Eq. (7.11a)]. If the slits are very narrow, we can regard the transmission function as two δ-functions, separated by the slit spacing a, and its Fourier transform will be kaθ −ikaθ/2 ikaθ/2 ψP (θ) ∝ e +e ∝ cos , (8.1a) 2 where k = 2π/λ is the light’s wave number. (That we can sum the wave fields from the two slits in this manner is a direct consequence of the linearity of the underlying wave equation.) The energy flux (energy per unit time crossing a unit area) at P will be FP (θ) ∝ |ψ|2 c ∝ cos2 (kaθ/2) ;

(8.1b)

cf. Fig. 8.1(b). The alternating regions of dark and bright illumination in this flux distribution are known as interference fringes. Notice that the flux falls to zero between the bright fringes. This will be very nearly so even if (as is always the case in practice) the field is very slightly non-monochromatic, i.e. even if the field hitting the slits has the form $e^{i[\omega_o t + \delta\varphi(t)]}$, where $\omega_o = ck$ and $\delta\varphi(t)$ is a phase that varies randomly on a timescale extremely long compared to $1/\omega_o$.¹ Notice also that there are many fringes, symmetrically disposed with respect to the optic axis. [If we were to take account of the finite width $w \ll a$ of the two slits, then we would find, by contrast with Eq. (8.1b), that the actual number of fringes is finite, in fact of order $a/w$; cf. Fig. 7.5 and associated discussion.] This type of interferometry is sometimes known as interference by division of the wave front.

This Young's slits experiment is, of course, familiar from quantum mechanics, where it is often used as a striking example of the non-particulate behavior of electrons.² Just as for electrons, so also for photons, it is possible to produce interference fringes even if only one photon is in the apparatus at any time, as was demonstrated in a famous experiment performed by G. I. Taylor in 1909. However, our concerns in this chapter are with the classical limit, where many photons are present simultaneously and their fields can be described by Maxwell's equations. In the next subsection we shall depart from the usual quantum mechanical treatment by asking what happens to the fringes when the source of radiation is spatially extended.

¹ More precisely, if $\delta\varphi(t)$ wanders by $\sim\pi$ on a timescale $\tau_c \gg 2\pi/\omega_o$ (the waves' coherence time), then the waves are contained in a bandwidth $\Delta\omega \sim 2\pi/\tau_c \ll \omega_o$ centered on $\omega_o$, $k$ is in a band $\Delta k \sim k\,\Delta\omega/\omega_o$, and the resulting superposition of precisely monochromatic waves has fringe minima with fluxes $F_{\min}$ that are smaller than the maxima by $F_{\min}/F_{\max} \sim (\pi\Delta\omega/\omega_o)^2 \ll 1$. (One can see this in order of magnitude by superposing the flux (8.1b) with wave number $k$ and the same flux with wave number $k + \Delta k$.) Throughout this section, until Eq. (8.15), we presume that the waves have such a small bandwidth (such a long coherence time) that this $F_{\min}/F_{\max}$ is completely negligible; for example, $1 - F_{\min}/F_{\max}$ is far closer to unity than any fringe visibility $V$ [Eq. (8.8) below] that is of interest to us. This can be achieved in practice either by controlling the waves' source or by band-pass filtering the measured signals just before detecting them.

² See, e.g., Chapter 1 of Volume III of Feynman, Leighton, and Sands (1965).

8.2.2 Interference with an Extended Source: van Cittert-Zernike Theorem

We shall approach the topic of extended sources in steps. Our first step was taken in the last subsection, where we dealt with an idealized, single, incident plane wave, such as might be produced by an ideal, distant laser. We have called this type of radiation perfectly coherent, by which we have implicitly meant that the field oscillates with a fixed frequency $\omega_o$ and a randomly but very slowly varying phase $\delta\varphi(t)$ (see footnote 1); thus, for all practical purposes, there is a time-independent phase difference between any two points within the region under consideration.

As our second step, keep the incoming waves perfectly coherent and perfectly planar, but change their incoming direction in Fig. 8.1 so it makes a small angle α to the optic axis (and correspondingly its wave fronts make an angle α to the plane of the slits), as shown in Fig. 8.1(c). Then the distribution of energy flux in the Fraunhofer diffraction pattern on the screen will be modified to

$$F_P(\theta) \propto \left|e^{-ika(\theta-\alpha)/2} + e^{+ika(\theta-\alpha)/2}\right|^2 \propto \cos^2\left[\frac{ka(\theta-\alpha)}{2}\right] \propto 1 + \cos[ka(\theta-\alpha)]\,. \tag{8.2}$$

Notice that, as the direction α of the incoming waves is varied, the locations of the bright and dark fringes change by $\Delta\theta = \alpha$, but the fringes remain fully sharp (their minima remain essentially zero; cf. footnote 1). Thus, the positions of the fringes carry information about the direction to the source.

Now, in our third and final step, we will deal with an extended source, i.e. one whose radiation comes from a finite range of angles α, with (for simplicity) $|\alpha| \ll 1$. We shall assume that the source is monochromatic (and in practice we can make it very nearly monochromatic by band-pass filtering the waves just before detection). However, in keeping with how all realistic monochromatic sources (including band-pass filtered sources) behave, we shall give it a randomly fluctuating phase $\delta\varphi(t)$ (and amplitude $A$), and shall require that the timescale on which the phase wanders (the waves' coherence time) be very long compared to the waves' period $2\pi/\omega_o$; cf. footnote 1. We shall also assume that, as for almost all realistic sources, the fluctuating phases in the waves from different directions are completely uncorrelated. To make this precise, we write the field in the form³

$$\Psi(x, z, t) = e^{i(kz-\omega_o t)} \int \psi(\alpha, t)\, e^{ik\alpha x}\, d\alpha\,, \tag{8.3}$$

where $\psi(\alpha, t) = A e^{-i\delta\varphi}$ is the slowly wandering complex amplitude of the waves from direction α. When we consider the total flux arriving at a given point $(x, z)$ from two different directions $\alpha_1$ and $\alpha_2$ and average it over times long compared to the waves' coherence time, then we lose all interference between the two contributions:

$$\overline{|\psi(\alpha_1, t) + \psi(\alpha_2, t)|^2} = \overline{|\psi(\alpha_1, t)|^2} + \overline{|\psi(\alpha_2, t)|^2}\,. \tag{8.4}$$

Such radiation is said to be incoherent in the incoming angle α, and we say that the contributions from different directions superpose incoherently. This is just a fancy way of saying that their intensities (averaged over time) add linearly.

The angularly incoherent light from our extended source is sent through two Young's slits and produces fringes on a screen in the distant Fraunhofer region. We assume that the coherence time for the light from each source point is very long compared to the difference in light travel time to the screen via the two different slits. Then the light from each source point in the extended source forms the sharp interference fringes described by Eq. (8.2). However, because contributions from different source directions add incoherently, the flux distribution on the screen is a linear sum of the fluxes from all the source points:

$$F_P(\theta) \propto \int d\alpha\, I(\alpha)\,\{1 + \cos[ka(\theta-\alpha)]\}\,. \tag{8.5}$$

Here $I(\alpha)\,d\alpha \propto \overline{|\psi(\alpha, t)|^2}\,d\alpha$ is the flux incident on the plane of the slits from the infinitesimal range $d\alpha$ of directions, i.e. $I(\alpha)$ is the radiation's intensity⁴ (its energy per unit time falling onto a unit area and coming from a unit angle). The remainder of the integrand, $1 + \cos[ka(\theta-\alpha)]$, is the Fraunhofer diffraction pattern (8.2) for coherent radiation from direction α. We presume that the range of angles present in the waves, $\Delta\alpha$, is large compared to their fractional bandwidth, $\Delta\alpha \gg \Delta\omega/\omega_o$; so, whereas the finite but tiny bandwidth produced negligible smearing out of the interference fringes (footnote 1), the finite but small range of directions may produce significant smearing, i.e. the minima of $F_P(\theta)$ might not be very sharp.

³ As in Chap. 7, we denote the full field by Ψ and reserve ψ to denote the portion of the field from which a monochromatic part $e^{-i\omega_o t}$ or $e^{i(kz-\omega_o t)}$ has been factored out.

⁴ By contrast with Chap. 7, where we used "intensity" to mean energy flux, in this chapter we shall restrict it to mean energy flux per unit angle or solid angle.

We quantify the fringes' non-sharpness and their locations by writing the slit-produced flux distribution (8.5) in the form

$$F_P(\theta) = F_S\left[1 + \Re\left\{\gamma_\perp(ka)\, e^{-ika\theta}\right\}\right], \tag{8.6a}$$

where

$$F_S \equiv \int d\alpha\, I(\alpha) \tag{8.6b}$$

(subscript S for "source") is the total flux arriving at the slits from the source, and

$$\gamma_\perp(ka) \equiv \frac{\int d\alpha\, I(\alpha)\, e^{ika\alpha}}{F_S} \tag{8.7a}$$

is known as the radiation's degree of spatial (or lateral) coherence. The phase of $\gamma_\perp$ determines the angular locations of the fringes; its modulus determines their depth (the amount of their smearing due to the source's finite angular size). The nonzero value of $\gamma_\perp(ka)$ reflects the fact that there is some amount of relative coherence between the waves arriving at the two slits, whose separation is a. The radiation can have this finite spatial coherence, despite its complete lack of angular coherence, because each angle contributes coherently to the field at the two slits. The lack of coherence for different angles reduces the net spatial coherence (smears the fringes), but does not drive the coherence all the way to zero (does not completely destroy the fringes).

Eq. (8.7a) says that the degree of spatial coherence of the radiation from an extended, angularly incoherent source is the Fourier transform of the source's angular intensity pattern. Correspondingly, if one knows the degree of spatial coherence as a function of the (dimensionless) distance ka, from it one can reconstruct the source's angular intensity pattern by Fourier inversion:

$$I(\alpha) = F_S \int \frac{d(ka)}{2\pi}\, \gamma_\perp(ka)\, e^{-ika\alpha}\,. \tag{8.7b}$$

The two Fourier relations (8.7a), (8.7b) are called the van Cittert-Zernike Theorem. In Ex. 8.7, we shall see that this theorem is a complex-variable version of Chap. 5's Wiener-Khintchine Theorem for random processes.

Because of its relationship to the source's angular intensity pattern, the degree of spatial coherence is of great practical importance. For a given choice of ka (a given distance between the slits), $\gamma_\perp$ is a complex number that one can read off the interference fringes of Eq. (8.6a) and Fig. 8.1(d) as follows: its modulus is

$$|\gamma_\perp| \equiv V = \frac{F_{\max} - F_{\min}}{F_{\max} + F_{\min}}\,, \tag{8.8}$$

where $F_{\max}$ and $F_{\min}$ are the maximum and minimum values of the flux $F_P$ on the screen; and its phase $\arg(\gamma_\perp)$ is ka times the displacement $\Delta\theta$ of the centers of the bright fringes from the optic axis. The modulus is called the fringe visibility, or simply the visibility, because it measures the fractional contrast in the fringes [Eq. (8.8)], and this name is the reason for the symbol V. Analogously, the complex quantity $\gamma_\perp$ (or a close relative) is sometimes known as the complex fringe visibility. Notice that V can lie anywhere in the range from zero (no contrast; fringes completely undetectable) to unity (monochromatic plane wave; contrast as large as possible). When the phase $\arg(\gamma_\perp)$ of the complex visibility (degree of coherence) is zero, there is a bright fringe precisely on the optic axis. This will be the case, e.g., for a source that is symmetric about the optic axis. If the symmetry point of such a source is gradually moved off the optic axis by an angle δα, the fringe pattern will shift correspondingly by $\delta\theta = \delta\alpha$, and this will show up as a corresponding shift in the argument of the fringe visibility, $\arg(\gamma_\perp) = ka\,\delta\alpha$.

The above analysis shows that Young's slits are nicely suited to measuring both the modulus and the phase of the complex fringe visibility (the degree of spatial coherence) of the radiation from an extended source.
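A minimal numerical sketch of Eqs. (8.5) to (8.8), assuming a one-dimensional top-hat source (uniform $I(\alpha)$ over a small band of angles) and illustrative values of λ, a, and the source's width and offset; none of these numbers comes from the text. It evaluates $\gamma_\perp$ from (8.7a) as a Riemann sum, builds the screen flux (8.5) by adding intensities (not fields) from each direction, and reads the visibility (8.8) and the bright-fringe offset off the simulated pattern. For a top-hat of half-width $\alpha_h$ centred on $\alpha_c$, Eq. (8.7a) gives $\gamma_\perp = e^{ika\alpha_c}\sin(ka\alpha_h)/(ka\alpha_h)$, so the measured V and $\arg(\gamma_\perp)/ka$ can be compared with that closed form.

```python
import cmath
import math

# Illustrative parameters (assumptions for this sketch, not values from the text).
WAVELENGTH = 500e-9                                # lambda [m]
SLIT_SEPARATION = 1e-3                             # a [m]
KA = 2 * math.pi * SLIT_SEPARATION / WAVELENGTH    # dimensionless ka


def top_hat_source(center, half_width, n=1001):
    """Uniform I(alpha) on [center - half_width, center + half_width], sampled at n angles."""
    alphas = [center - half_width + 2 * half_width * j / (n - 1) for j in range(n)]
    return alphas, [1.0] * n


def gamma_perp(alphas, intensity, ka):
    """Degree of spatial coherence, Eq. (8.7a); the uniform grid spacing cancels in the ratio."""
    f_s = sum(intensity)
    return sum(i * cmath.exp(1j * ka * al) for al, i in zip(alphas, intensity)) / f_s


def fringe_flux(theta, alphas, intensity, ka):
    """Screen flux, Eq. (8.5), up to a constant: each direction's fringes (8.2) add in intensity."""
    return sum(i * (1.0 + math.cos(ka * (theta - al))) for al, i in zip(alphas, intensity))


def measure_fringes(alphas, intensity, ka, n_theta=1000):
    """Scan one fringe period; return the visibility of Eq. (8.8) and the bright-fringe angle."""
    period = 2 * math.pi / ka
    thetas = [-period / 2 + period * j / n_theta for j in range(n_theta)]
    fluxes = [fringe_flux(t, alphas, intensity, ka) for t in thetas]
    f_max, f_min = max(fluxes), min(fluxes)
    return (f_max - f_min) / (f_max + f_min), thetas[fluxes.index(f_max)]


# Usage: a source of half-width 1e-4 rad centred 2e-5 rad off the optic axis.
alphas, intensity = top_hat_source(center=2e-5, half_width=1e-4)
g = gamma_perp(alphas, intensity, KA)
V, theta_bright = measure_fringes(alphas, intensity, KA)
x = KA * 1e-4
print(f"|gamma| = {abs(g):.4f}, measured V = {V:.4f}, sinc(ka*alpha_h) = {math.sin(x) / x:.4f}")
print(f"arg(gamma)/ka = {cmath.phase(g) / KA:.3e} rad, bright fringe at {theta_bright:.3e} rad")
```

The plain loops are chosen for transparency rather than speed; for fine grids or two-dimensional sources an FFT-based evaluation of (8.7a) would be the natural replacement.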

8.2.3 More General Formulation of Spatial Coherence; Lateral Coherence Length

It is not necessary to project the light onto a screen to determine the contrast and angular positions of the fringes. For example, if we had measured the field at the locations of the two slits, we could have combined the signals electronically and cross-correlated them numerically to determine what the fringe pattern would be with slits. All we are doing with the Young's slits is sampling the wave field at two different points, which we now shall label 1 and 2. Observing the fringes corresponds to adding a phase $\varphi$ ($= ka\theta$) to the field at one of the points, then adding the fields and measuring the flux $\propto \overline{|\psi_1 + \psi_2 e^{i\varphi}|^2}$ averaged over many periods. Now, since the source is far away, the rms value of the wave field will be the same at the two slits, $\overline{|\psi_1|^2} = \overline{|\psi_2|^2} \equiv \overline{|\psi|^2}$. We can therefore express this time-averaged flux in the symmetric-looking form

$$F_P(\varphi) \propto \overline{(\psi_1 + \psi_2 e^{i\varphi})(\psi_1^* + \psi_2^* e^{-i\varphi})} \propto 1 + \Re\left(\frac{\overline{\psi_1\psi_2^*}}{\overline{|\psi|^2}}\, e^{-i\varphi}\right). \tag{8.9}$$

Here a bar denotes an average over times long compared to the coherence times for $\psi_1$ and $\psi_2$. Comparing with Eq. (8.6a) and using $\varphi = ka\theta$, we identify

$$\gamma_{\perp 12} = \frac{\overline{\psi_1\psi_2^*}}{\overline{|\psi|^2}} \tag{8.10}$$

as the degree of spatial coherence in the radiation field between the two points 1, 2. Equation (8.10) is the general definition of degree of spatial coherence. Equation (8.6a) is the special case for points separated by a lateral distance a.

If the radiation field is strongly correlated between the two points, we describe it as having strong spatial or lateral coherence. Correspondingly, we shall define a field's lateral coherence length $l_\perp$ as the linear size of a region over which the field is strongly correlated (has $V = |\gamma_\perp| \sim 1$). If the angle subtended by the source is $\sim\delta\alpha$, then by virtue of the van Cittert-Zernike theorem (8.7) and the usual reciprocal relation for Fourier transforms, the radiation field's lateral coherence length will be

$$l_\perp \sim \frac{2\pi}{k\,\delta\alpha} = \frac{\lambda}{\delta\alpha}\,. \tag{8.11}$$

This relation has a simple physical interpretation. Consider two beams of radiation coming from opposite sides of the brightest portion of the source. These beams will be separated by the incoming angle δα. As one moves laterally in the plane of the Young's slits, one will see a varying relative phase delay between these two beams. The coherence length $l_\perp$ is the distance over which the variations in that relative phase delay are of order 2π: $k\,\delta\alpha\, l_\perp \sim 2\pi$.
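The electronic cross-correlation described above can be sketched as a Monte Carlo estimate of Eq. (8.10). The model here is an assumption for illustration, not the text's: a finite set of plane waves filling a top-hat of full width δα, each with an independent random phase redrawn for every sample (as if samples were taken more than one coherence time apart). For such a source, Eq. (8.7a) predicts $|\gamma_\perp| = |\sin x/x|$ with $x = ka\,\delta\alpha/2$, whose first zero falls at $a = \lambda/\delta\alpha$, which is the scale $l_\perp$ of Eq. (8.11). The wavelength and source width in the usage lines are hypothetical.

```python
import cmath
import math
import random


def estimate_gamma12(separation, wavelength, delta_alpha, n_dirs=40, n_samples=2000, seed=1):
    """Estimate the degree of spatial coherence, Eq. (8.10), from sampled fields.

    Assumed toy model (not from the text): n_dirs plane waves whose angles fill a top-hat
    of full width delta_alpha, each carrying an independent random phase that is redrawn
    for every sample. Point 1 sits at x = separation and point 2 at x = 0.
    """
    rng = random.Random(seed)
    k = 2 * math.pi / wavelength
    alphas = [delta_alpha * (-0.5 + (m + 0.5) / n_dirs) for m in range(n_dirs)]
    geo1 = [cmath.exp(1j * k * al * separation) for al in alphas]   # e^{i k alpha x} at point 1
    cross, power1, power2 = 0j, 0.0, 0.0
    for _ in range(n_samples):
        amps = [cmath.exp(1j * rng.uniform(0.0, 2 * math.pi)) for _ in alphas]
        psi1 = sum(amp * g for amp, g in zip(amps, geo1))
        psi2 = sum(amps)                                  # at x = 0 every geometric factor is 1
        cross += psi1 * psi2.conjugate()
        power1 += abs(psi1) ** 2
        power2 += abs(psi2) ** 2
    # Time averages: the 1/n_samples factors cancel between numerator and denominator.
    return cross / math.sqrt(power1 * power2)


# Usage with hypothetical numbers: 500 nm light from a source 1 mrad across.
lam, dalpha = 500e-9, 1e-3
l_perp = lam / dalpha                                     # Eq. (8.11): 0.5 mm
for a in (0.0, 0.5 * l_perp, l_perp):
    x = math.pi * a / l_perp                              # k a delta_alpha / 2
    expected = 1.0 if a == 0.0 else math.sin(x) / x       # top-hat result of Eq. (8.7a)
    g = estimate_gamma12(a, lam, dalpha)
    print(f"a = {a * 1e3:.2f} mm: |gamma_12| = {abs(g):.3f}, expected {abs(expected):.3f}")
```

With 2000 samples the statistical scatter in the estimate is roughly $1/\sqrt{2000} \approx 0.02$, so the estimate near $a = l_\perp$ is small but not exactly zero.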

8.2.4 Generalization to Two Dimensions

We have so far just considered a one-dimensional intensity distribution $I(\alpha)$ observed through the familiar Young's slits. However, most sources will be two-dimensional, so in order to investigate the full radiation pattern, we should allow the waves to come from two-dimensional angular directions $\boldsymbol{\alpha}$, so

$$\Psi = e^{i(kz-\omega_o t)} \int \psi(\boldsymbol{\alpha}, t)\, e^{ik\boldsymbol{\alpha}\cdot\mathbf{x}}\, d^2\alpha \equiv e^{i(kz-\omega_o t)}\, \psi(\mathbf{x}, t) \tag{8.12a}$$

[where $\psi(\boldsymbol{\alpha}, t)$ is slowly varying], and we should use several pairs of slits aligned along different directions. Stated more generally, we should sample the wave field (8.12a) at a variety of points separated by a variety of two-dimensional vectors $\mathbf{a}$ transverse to the direction of wave propagation. The complex visibility (degree of spatial coherence) will then be a function of $k\mathbf{a}$,

$$\gamma_\perp(k\mathbf{a}) = \frac{\overline{\psi(\mathbf{x}, t)\,\psi^*(\mathbf{x}+\mathbf{a}, t)}}{\overline{|\psi|^2}}\,, \tag{8.12b}$$

and the van Cittert-Zernike Theorem (8.7) [actually the Wiener-Khintchine theorem in disguise; Ex. 8.7] will take the two-dimensional form

$$\gamma_\perp(k\mathbf{a}) = \frac{\int d\Omega_\alpha\, I(\boldsymbol{\alpha})\, e^{ik\mathbf{a}\cdot\boldsymbol{\alpha}}}{F_S}\,, \tag{8.13a}$$

$$I(\boldsymbol{\alpha}) = F_S \int \frac{d^2(k\mathbf{a})}{(2\pi)^2}\, \gamma_\perp(k\mathbf{a})\, e^{-ik\mathbf{a}\cdot\boldsymbol{\alpha}}\,. \tag{8.13b}$$

Here $I(\boldsymbol{\alpha}) \propto \overline{|\psi(\boldsymbol{\alpha}, t)|^2}$ is the source's intensity (energy per unit time crossing a unit area from a unit solid angle $d\Omega_\alpha$); $F_S = \int d\Omega_\alpha\, I(\boldsymbol{\alpha})$ is the source's total energy flux; and $d^2(k\mathbf{a}) = k^2\, d\Sigma_a$ is a (dimensionless) surface area element in the lateral plane.

****************************
EXERCISES

Exercise 8.1 Problem: Single Mirror Interference
X-rays with wavelength 8.33 Å (0.833 nm) coming from a point source can be reflected at shallow angles of incidence from a plane mirror. The direct ray from a point source to a detector 3 m away interferes with the reflected ray to produce fringes with spacing 25 µm. Calculate the distance of the X-ray source from the mirror plane.

Exercise 8.2 Problem: Lateral Coherence of Solar Radiation
How closely separated must a pair of Young's slits be to see strong fringes from the sun (angular diameter ∼ 0.5°) at visual wavelengths? Suppose that this condition is just satisfied and the slits are 10 µm in width. Roughly how many fringes would you expect to see?

Exercise 8.3 Problem: Degree of Coherence for a Source with Gaussian Intensity Distribution
A circularly symmetric light source has an intensity distribution $I(\alpha) = I_0 \exp(-\alpha^2/2\alpha_0^2)$, where α is the angular radius measured from the optic axis. Compute the degree of spatial coherence. What is the lateral coherence length? What happens to the degree of spatial coherence and the interference fringe pattern if the source is displaced from the optic axis?

****************************

8.2.5 Michelson Stellar Interferometer

Fig. 8.2: Schematic Illustration of a Michelson Stellar Interferometer. [Figure labels: mirror separation a; Fringes.]

The classic implementation of Young's slits for measuring spatial coherence is Michelson's stellar interferometer, which Albert A. Michelson used for measuring the angular diameters of Jupiter's moons and some bright stars in 1920 and a bit earlier. The light is sampled at two small mirrors separated by a variable distance a and then reflected onto a telescope to form interference fringes; cf. Fig. 8.2. (As we have emphasized, the way in which the fringes are formed is unimportant; all that matters is the two locations where the light is sampled, i.e. the first two mirrors in Fig. 8.2.) It is found that as the separation a between the mirrors is increased, the fringe visibility V decreases. If we model a star (rather badly, in fact) as a circular disk of uniform brightness, then the degree of spatial coherence of the light from it is given, according to Eqs. (8.13a) and (7.18), by

$$\gamma_\perp = 2\,\mathrm{jinc}(ka\alpha_r)\,, \tag{8.14}$$

where $\alpha_r$ is the angular radius of the star and $\mathrm{jinc}(\xi) = J_1(\xi)/\xi$. Michelson found that for the star Betelgeuse observed at wavelength λ = 570 nm, the fringes disappeared when a ∼ 3 m. Associating this with the first zero of the function jinc(ξ), at ξ ≈ 3.83, Michelson inferred that the angular radius of Betelgeuse is ∼ 0.02 arc seconds, which at Betelgeuse's (parallax-measured) distance of 200 pc (600 lyr) corresponds to a physical radius of order 10³ times larger than that of the Sun, a reasonable value in light of the modern theory of stellar structure. This technique only works for big, bright stars and is very difficult to use because fluctuations in the atmosphere cause the fringes to keep moving about.
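A short sketch reproducing the Betelgeuse estimate from the numbers quoted above (λ = 570 nm, a = 3 m, d = 200 pc). It locates the first zero of jinc, i.e. of $J_1$, by bisection on $J_1$'s power series (so no special-function library is assumed), sets $ka\alpha_r$ equal to that zero as in Eq. (8.14), and converts to arcseconds and solar radii using standard values of the parsec and the nominal solar radius; those two constants are supplied here, not taken from the text. With these inputs the angular radius comes out near 0.024 arcsec and the physical radius near 10³ solar radii.

```python
import math


def bessel_j1(x, terms=40):
    """J_1(x) from its power series; adequate for the moderate x used here."""
    total = 0.0
    for m in range(terms):
        total += (-1) ** m * (x / 2) ** (2 * m + 1) / (math.factorial(m) * math.factorial(m + 1))
    return total


def first_zero_j1(lo=3.0, hi=4.5, tol=1e-12):
    """Bisection for the first positive zero of J_1 (the first zero of jinc)."""
    f_lo = bessel_j1(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j1(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


WAVELENGTH = 570e-9      # m, value quoted in the text
SEPARATION = 3.0         # m, mirror separation at which the fringes vanished
DISTANCE_PC = 200.0      # pc, distance quoted in the text
PC_M = 3.0857e16         # metres per parsec (standard value)
R_SUN_M = 6.957e8        # nominal solar radius in metres (standard value)
ARCSEC = math.pi / (180 * 3600)

xi0 = first_zero_j1()                                      # about 3.8317
alpha_r = xi0 * WAVELENGTH / (2 * math.pi * SEPARATION)    # from k a alpha_r = xi0
radius_m = alpha_r * DISTANCE_PC * PC_M                    # small-angle approximation
print(f"first zero of jinc: {xi0:.4f}")
print(f"angular radius: {alpha_r / ARCSEC:.4f} arcsec")
print(f"radius: {radius_m / R_SUN_M:.0f} solar radii")
```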

8.2.6 Temporal Coherence

In addition to the degree of spatial (or lateral) coherence, which measures the correlation of the field transverse to the direction of wave propagation, we can also measure the degree of temporal coherence, also called the degree of longitudinal coherence. This describes the correlation at a given time at two points separated by a distance s along the direction of propagation. Equivalently, it measures the field sampled at a fixed position at two times differing by τ = s/c. When (as in our discussion of spatial coherence) the waves are nearly monochromatic, so the field arriving at the fixed position has the form $\Psi = \psi(t)e^{-i\omega_o t}$, the degree of longitudinal coherence is complex and has a form completely analogous to the transverse case:

$$\gamma_\parallel(\tau) = \frac{\overline{\psi(t)\,\psi^*(t+\tau)}}{\overline{|\psi|^2}} \qquad \text{for nearly monochromatic radiation.} \tag{8.15}$$

Here the average is over sufficiently long times t for the averaged value to settle down to an unchanging value.

When studying temporal coherence, one often wishes to deal with waves that contain a wide range of frequencies, e.g., the nearly Planckian (black-body) cosmic microwave radiation emerging from the very early universe (Ex. 8.5). In this case, one should not factor any $e^{-i\omega_o t}$ out of the field Ψ, and one gains nothing by regarding Ψ(t) as complex, so the longitudinal coherence

$$\gamma_\parallel(\tau) = \frac{\overline{\Psi(t)\,\Psi(t+\tau)}}{\overline{\Psi^2}} \qquad \text{for real } \Psi \text{ and broad-band radiation} \tag{8.16}$$

is also real. We shall use this real $\gamma_\parallel$ throughout this subsection and the next. For τ = 0 this degree of temporal coherence is unity. As τ is increased, $\gamma_\parallel$ typically remains near unity until some critical value $\tau_c$ is reached, and then begins to fall off toward zero. The critical value $\tau_c$, the longest time over which the field is strongly coherent, is the coherence time, of which we have already spoken: if the wave is roughly monochromatic, so $\Psi(t) \propto \cos[\omega_o t + \delta\varphi(t)]$ with $\omega_o$ fixed and the phase δϕ randomly varying in time, then the mean time for δϕ to change by an amount of order unity is, indeed, the coherence time $\tau_c$ at which $\gamma_\parallel$ begins to fall significantly.

The uncertainty principle dictates that a field with coherence time $\tau_c$, when Fourier analyzed in time, must contain significant power over a bandwidth $\Delta\omega \sim 1/\tau_c$. Correspondingly, if we define the field's longitudinal coherence length by

$$l_\parallel \equiv c\,\tau_c\,, \tag{8.17}$$

then $l_\parallel$ for broad-band radiation will be only a few times the peak wavelength, but for a narrow spectral line of width Δλ, it will be $\sim\lambda^2/\Delta\lambda$. These relations between the coherence time or longitudinal coherence length and the field's spectrum are order-of-magnitude consequences not only of the uncertainty relation, but also of the temporal analog of the van Cittert-Zernike Theorem. In that analog (which can be derived by the same methods as we used in the transverse spatial domain), the degree of lateral coherence $\gamma_\perp$ is replaced by the degree of temporal coherence $\gamma_\parallel$, and the angular intensity distribution $I(\alpha)$ (distribution of energy over angle) is replaced by the field's spectrum $F_\omega(\omega)$, the energy crossing a unit area per unit time and per unit angular frequency ω.⁵ The theorem takes the explicit form

$$\gamma_\parallel(\tau) = \frac{\int_{-\infty}^{\infty} d\omega\, F_\omega(\omega)\, e^{i\omega\tau}}{F_S} = \frac{2\int_0^{\infty} d\omega\, F_\omega(\omega)\cos\omega\tau}{F_S} \qquad \text{for real } \Psi(t), \text{ valid for broad-band radiation,} \tag{8.18a}$$

and

$$F_\omega(\omega) = F_S \int_{-\infty}^{\infty} \frac{d\tau}{2\pi}\, \gamma_\parallel(\tau)\, e^{-i\omega\tau} = 2F_S \int_0^{\infty} \frac{d\tau}{2\pi}\, \gamma_\parallel(\tau)\cos\omega\tau\,. \tag{8.18b}$$

[The normalization of our Fourier transform and the sign of its exponential are those conventionally used in optics, and differ from those used in the theory of random processes (Chap. 5). Also, because we have chosen Ψ to be real, $F_\omega(-\omega) = F_\omega(+\omega)$ and $\gamma_\parallel(-\tau) = \gamma_\parallel(+\tau)$.]

One can measure $\gamma_\parallel$ by combining the radiation from two points displaced longitudinally to produce interference fringes, just as we did in measuring spatial coherence. This type of interference is sometimes called interference by division of the amplitude, in contrast with "interference by division of the wave front" for a Young's-slit-type measurement of lateral spatial coherence (next to the last paragraph of Sec. 8.2.1).

⁵ Note that the spectrum is simply related to the spectral density of the field: if the field Ψ is so normalized that the energy density is $U = \beta\,\overline{\Psi_{,t}\Psi_{,t}}$ with β some constant, then $F_\omega(\omega) = [\beta c/(2\pi)]\, S_\Psi(f)$, with $f = \omega/2\pi$.
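A quick numerical check of the broad-band theorem (8.18), assuming a Gaussian line of the shape used as the example in Sec. 8.2.7, but in illustrative dimensionless units with $\omega_o = 1$ and $\Delta\omega = 0.1$ (a far broader line than a real thermal one, chosen so a coarse grid suffices). The sketch evaluates the one-sided cosine form of (8.18a) by the midpoint rule, compares it with the closed form $e^{-\tau^2\Delta\omega^2/2}\cos\omega_o\tau$, and then feeds the sampled $\gamma_\parallel$ back through (8.18b) to recover the spectrum, which is the essence of Fourier transform spectroscopy.

```python
import math

# Illustrative line (dimensionless units with omega_o = 1); these values are assumptions.
OMEGA_O, DELTA_OMEGA = 1.0, 0.1
N_OMEGA = 1000
H_OMEGA = 2.0 / N_OMEGA
OMEGAS = [(i + 0.5) * H_OMEGA for i in range(N_OMEGA)]      # midpoints covering (0, 2)


def spectrum(omega):
    """Gaussian line of the Eq. (8.19a) shape, unnormalized; even in omega because Psi is real."""
    return math.exp(-((abs(omega) - OMEGA_O) ** 2) / (2 * DELTA_OMEGA ** 2))


F_VALUES = [spectrum(w) for w in OMEGAS]
F_S = 2 * sum(F_VALUES) * H_OMEGA        # integral of F_omega over negative and positive omega


def gamma_parallel(tau):
    """Degree of temporal coherence from Eq. (8.18a), one-sided cosine form, midpoint rule."""
    return 2 * sum(f * math.cos(w * tau) for w, f in zip(OMEGAS, F_VALUES)) * H_OMEGA / F_S


def spectrum_from_gamma(omega, gammas, d_tau):
    """Spectrum from Eq. (8.18b), one-sided cosine form, trapezoid rule on tau = j * d_tau."""
    total = 0.5 * gammas[0] + sum(g * math.cos(omega * j * d_tau) for j, g in enumerate(gammas) if j > 0)
    return 2 * F_S * total * d_tau / (2 * math.pi)


# Usage: sample gamma_parallel out to tau = 80 (where it is ~1e-14) and invert it.
d_tau = 0.1
gammas = [gamma_parallel(j * d_tau) for j in range(800)]
for tau in (5.0, 10.0, 20.0):
    closed_form = math.exp(-0.5 * (tau * DELTA_OMEGA) ** 2) * math.cos(OMEGA_O * tau)
    print(f"tau = {tau:4.1f}: numerical {gamma_parallel(tau):+.6f}, closed form {closed_form:+.6f}")
for omega in (1.0, 1.1):
    recovered = spectrum_from_gamma(omega, gammas, d_tau)
    print(f"omega = {omega}: recovered {recovered:.6f}, true {spectrum(omega):.6f}")
```

The lag spacing `d_tau` must be fine enough to sample the oscillation at $\omega_o$ (here about 60 samples per period), and the largest lag must extend well past $1/\Delta\omega$; these are the sampling and span requirements a real Fourier-transform spectrometer's mirror scan must also meet.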

8.2.7 Michelson Interferometer and Fourier-Transform Spectroscopy

Fig. 8.3: Michelson Interferometer. [Figure labels: Light Source; beam splitter; Interference Fringes.]

The classic instrument for measuring the degree of longitudinal coherence is the Michelson interferometer of Fig. 8.3 (not to be confused with the Michelson stellar interferometer). In the simplest version, incident light (e.g. in the form of a Gaussian beam; Sec. 7.5.5) is split by a beam splitter into two beams, which are reflected off different plane mirrors and then recombined. The relative positions of the mirrors are adjustable so that the two light paths can have slightly different lengths. (An early version of this instrument was used in the famous Michelson-Morley experiment.)

There are two ways to view the fringes. One way is to tilt one of the reflecting mirrors slightly so there is a range of path lengths in one of the arms. Light and dark interference bands (fringes) can then be seen across the circular cross section of the recombined beam. The second method is conceptually more direct but requires aligning the mirrors accurately enough that the phase fronts of the two beams are parallel after recombination and the recombined beam has no banded structure. The end mirror in one arm of the interferometer is then slowly moved backward or forward, and as it moves, the recombined light slowly changes from dark to light to dark and so on. It is interesting to interpret this second method in terms of the Doppler shift: one beam of light undergoes a Doppler shift on reflection off the moving mirror, and a beat wave is produced when it is recombined with the unshifted radiation of the other beam. Whichever method is used (tilted mirror or longitudinal motion of the mirror), the visibility of the interference fringes measures the beam's degree of longitudinal coherence, which is related to the spectrum by Eqs. (8.18).

Let us give an example. Suppose we observe a spectral line with rest frequency $\omega_o$ that is broadened by random thermal motions of the emitting atoms, so the line profile is

$$F_\omega \propto \exp\left[-\frac{(\omega_o - \omega)^2}{2(\Delta\omega)^2}\right]. \tag{8.19a}$$

The width of the line is given by the formula for the Doppler shift, $\Delta\omega \sim \omega_o (k_B T/mc^2)^{1/2}$, where T is the temperature of the emitting atoms and m is their mass. (We ignore other sources of line broadening, e.g. natural broadening and pressure broadening, which actually dominate under normal conditions.) For example, with hydrogen at $10^3$ K, $\Delta\omega \sim 10^{-5}\omega_o$. By Fourier transforming this line profile, using the well-known result that the Fourier transform of a Gaussian is another Gaussian, and invoking the fundamental relations (8.18) between the spectrum and temporal coherence, we obtain

$$\gamma_\parallel(\tau) = \exp\left[-\frac{\tau^2(\Delta\omega)^2}{2}\right]\cos\omega_o\tau\,. \tag{8.19b}$$

If we had used the nearly monochromatic formalism with the field written as $\Psi = \psi(t)e^{-i\omega_o t}$, then we would have obtained

$$\gamma_\parallel(\tau) = \exp\left[-\frac{\tau^2(\Delta\omega)^2}{2}\right] e^{i\omega_o\tau}\,, \tag{8.19c}$$

the real part of which is our broad-band formalism's $\gamma_\parallel$. In either case, $\gamma_\parallel$ oscillates with frequency $\omega_o$, and the amplitude of this oscillation is the fringe visibility V:

$$V = \exp\left[-\frac{\tau^2(\Delta\omega)^2}{2}\right]. \tag{8.19d}$$

The variation V(τ) of this visibility with lag time τ is sometimes called an interferogram. For time lags $\tau \ll (\Delta\omega)^{-1}$, the line appears to be monochromatic and fringes with unit visibility should be seen. However, for lags $\tau \gtrsim (\Delta\omega)^{-1}$, the fringe visibility will decrease exponentially with τ². In our example, if the rest frequency is $\omega_o \sim 3\times10^{15}\ \mathrm{rad\,s^{-1}}$, then the longitudinal coherence length will be $l_\parallel = c\tau_c \sim 10$ mm, and no fringes will be seen when the radiation is combined from points separated by much more than this distance.

This procedure is an example of Fourier transform spectroscopy, in which, by measuring the degree of temporal coherence $\gamma_\parallel(\tau)$ and then Fourier transforming it, one infers the shape of the radiation's spectrum, or in this case, the width of a specific spectral line.

When (as in Ex. 8.5) the waves are very broad band, the degree of longitudinal coherence $\gamma_\parallel(\tau)$ will not have the form of a sinusoidal oscillation (regular fringes) with slowly varying amplitude (visibility). Nevertheless, the broad-band van Cittert-Zernike theorem (8.18) still guarantees that the spectrum will be the Fourier transform of the coherence $\gamma_\parallel(\tau)$, which can be measured by a Michelson interferometer.

****************************
EXERCISES

Exercise 8.4 Problem: Longitudinal Coherence of Radio Waves
An FM radio station has a carrier frequency of 91.3 MHz and transmits heavy metal rock music. Estimate the coherence length of the radiation.

Exercise 8.5 Problem: COBE Measurement of the Cosmic Microwave Background Radiation
An example of a Michelson interferometer is the Far Infrared Absolute Spectrophotometer (FIRAS) carried by the Cosmic Background Explorer Satellite (COBE). COBE studied the spectrum and anisotropies of the cosmic microwave background radiation (CMB) that emerged from the very early, hot phase of our universe's expansion (Chap. 27). One of the goals of the COBE mission was to see if the CMB spectrum really had the shape of 2.7 K black-body (Planckian) radiation, or if it was highly distorted, as some measurements made on rocket flights had suggested. COBE's spectrophotometer used Fourier transform spectroscopy to meet this goal: it compared accurately the degree of longitudinal coherence $\gamma_\parallel$ of the CMB radiation with that of a calibrated source on board the spacecraft, which was known to be a black body at about 2.7 K. The comparison was made by alternately feeding radiation from the microwave background and radiation from the calibrated source into the same Michelson interferometer and comparing their fringe spacings. The result (Mather et al. 1994) was that the background radiation has a spectrum that is Planckian with temperature 2.726 ± 0.010 K over the wavelength range 0.5–5 mm, in agreement with simple cosmological theory that we shall explore in the last chapter of this book.

(a) Suppose that the CMB had had a Wien spectrum $F_\omega \propto |\omega|^3 \exp(-\hbar|\omega|/k_B T)$, where T = 2.74 K. Show that the visibility of the fringes would have been

$$V = |\gamma_\parallel| \propto \frac{|s^4 - 6s_0^2 s^2 + s_0^4|}{(s^2 + s_0^2)^4}\,, \tag{8.20}$$

where s = cτ is longitudinal distance, and calculate a numerical value for $s_0$.

(b) Compute the interferogram V(τ) for a Planck function either analytically (perhaps with the help of a computer) or numerically using a Fast Fourier Transform. Compare graphically the interferogram for the Wien and Planck spectra.

****************************
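The order-of-magnitude estimate in Sec. 8.2.7 for a thermally broadened hydrogen line can be reproduced in a few lines. This sketch assumes standard SI values for $k_B$, c, and the hydrogen-atom mass (supplied here, not taken from the text), uses the text's $\omega_o \sim 3\times10^{15}\ \mathrm{rad\,s^{-1}}$ and $T = 10^3$ K, takes $\tau_c \sim 1/\Delta\omega$ as in Sec. 8.2.6, and evaluates the visibility (8.19d) at a few longitudinal path differences $s = c\tau$.

```python
import math

# Physical constants (SI; standard values, rounded).
K_B = 1.380649e-23      # Boltzmann constant [J/K]
C = 2.99792458e8        # speed of light [m/s]
M_H = 1.6735e-27        # mass of a hydrogen atom [kg]


def doppler_width(omega_o, temperature, mass):
    """Order-of-magnitude thermal line width: Delta_omega ~ omega_o (k_B T / m c^2)^(1/2)."""
    return omega_o * math.sqrt(K_B * temperature / (mass * C ** 2))


def visibility(path_difference, delta_omega):
    """Eq. (8.19d) with tau = s / c: fringe visibility at longitudinal path difference s."""
    tau = path_difference / C
    return math.exp(-0.5 * (tau * delta_omega) ** 2)


omega_o = 3e15                                  # rad/s, the value used in the text
d_omega = doppler_width(omega_o, 1e3, M_H)      # hydrogen at 10^3 K
l_par = C / d_omega                             # l_parallel = c tau_c with tau_c ~ 1/Delta_omega
print(f"Delta_omega / omega_o = {d_omega / omega_o:.2e}")
print(f"coherence length ~ {l_par * 1e3:.1f} mm")
for s in (1e-3, 1e-2, 3e-2):
    print(f"s = {s * 1e3:4.0f} mm: V = {visibility(s, d_omega):.3f}")
```

Because $\tau_c \sim 1/\Delta\omega$ is only an order-of-magnitude identification, the printed coherence length (about 10 mm) should be read the same way; the visibility values show the Gaussian fall-off of Eq. (8.19d) around that scale.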

8.2.8 Degree of Coherence; Relation to Theory of Random Processes

Having separately discussed spatial and temporal coherence, we now can easily perform a final generalization and define the full degree of coherence of the radiation field between two points separated both laterally by a vector $\mathbf{a}$ and longitudinally by a distance s, or equivalently by a time τ = s/c. If we take the time-separation viewpoint, so $\mathbf{x}_1$ and $\mathbf{x}_2$ have a purely transverse spatial separation $\mathbf{a} = \mathbf{x}_2 - \mathbf{x}_1$, and if we restrict ourselves to nearly monochromatic waves and use the complex formalism, so the waves are written as $\Psi = e^{i(kz-\omega_o t)}\psi(\mathbf{x}, t)$ [Eq. (8.12a)], then

$$\gamma_{12}(k\mathbf{a}, \tau) \equiv \frac{\overline{\psi(\mathbf{x}_1, t)\,\psi^*(\mathbf{x}_1+\mathbf{a}, t+\tau)}}{\left[\overline{|\psi(\mathbf{x}_1, t)|^2}\;\overline{|\psi(\mathbf{x}_1+\mathbf{a}, t)|^2}\right]^{1/2}} = \frac{\overline{\psi(\mathbf{x}_1, t)\,\psi^*(\mathbf{x}_1+\mathbf{a}, t+\tau)}}{\overline{|\psi|^2}}\,. \tag{8.21}$$

In the denominator of the second expression we have used the fact that, because the source is far away, $\overline{|\psi|^2}$ is independent of the spatial location at which it is evaluated, in the region of interest. Consistent with the definition (8.21), we can define a volume of coherence as the product of the longitudinal coherence length $l_\parallel = c\tau_c$ and the square of the transverse coherence length, $l_\perp^2$.

The three-dimensional version of the van Cittert-Zernike theorem relates the complex degree of coherence (8.21) to the radiation's specific intensity, $I_\omega(\boldsymbol{\alpha}, \omega)$, i.e. to the energy crossing a unit area per unit time per unit solid angle and per unit angular frequency (energy "per unit everything"). (Since the frequency ν and the angular frequency ω are related by ω = 2πν, the specific intensity $I_\omega$ of this chapter and the $I_\nu$ of Chap. 2 are related by $I_\nu = 2\pi I_\omega$.) The three-dimensional van Cittert-Zernike theorem states that

$$\gamma_{12}(k\mathbf{a}, \tau) = \frac{\int d\Omega_\alpha\, d\omega\, I_\omega(\boldsymbol{\alpha}, \omega)\, e^{i(k\mathbf{a}\cdot\boldsymbol{\alpha} + \omega\tau)}}{F_S}\,, \tag{8.22a}$$

and

$$I_\omega(\boldsymbol{\alpha}, \omega) = F_S \int \frac{d\tau\, d^2(k\mathbf{a})}{(2\pi)^3}\, \gamma_{12}(k\mathbf{a}, \tau)\, e^{-i(k\mathbf{a}\cdot\boldsymbol{\alpha} + \omega\tau)}\,. \tag{8.22b}$$

There must obviously be an intimate relationship between the theory of random processes, as developed in Chap. 5, and the theory of a wave's coherence, as we have developed it in this section, Sec. 8.2. That relationship is explained in Ex. 8.7. In particular, it is shown there that the van Cittert-Zernike theorem is nothing but the wave's Wiener-Khintchine theorem in disguise.

****************************
EXERCISES

Exercise 8.6 Problem: Reduction of Degree of Coherence

We have defined the degree of coherence $\gamma_{12}(\mathbf{a},\tau)$ for two points in the radiation field separated laterally by a distance a and longitudinally by a time τ. Under what conditions will this be given by the product of the spatial and temporal degrees of coherence,

$$\gamma_{12}(\mathbf{a},\tau) = \gamma_\perp(\mathbf{a})\,\gamma_\parallel(\tau)\,? \tag{8.23}$$

Exercise 8.7 *** Example: Complex Random Processes and the van Cittert-Zernike Theorem

In Chap. 5 we developed the theory of real-valued random processes that vary randomly with time t, i.e. that are defined on a one-dimensional space in which t is a coordinate. Here we shall generalize a few elements of that theory to a complex-valued random process Φ(x) defined on a (Euclidean) space with n dimensions. We assume the process to be stationary and to have vanishing mean (cf. Chap. 5 for definitions). For Φ(x) we define a complex-valued correlation function by

$$C_\Phi(\boldsymbol{\xi}) \equiv \langle \Phi(\mathbf{x})\,\Phi^*(\mathbf{x}+\boldsymbol{\xi})\rangle \tag{8.24a}$$

(where the ∗ denotes complex conjugation) and a real-valued spectral density by

$$S_\Phi(\mathbf{k}) = \lim_{L\to\infty} \frac{1}{L^n}\,|\tilde\Phi_L(\mathbf{k})|^2\,. \tag{8.24b}$$

Here $\Phi_L$ is Φ confined to a box of side L (i.e. set to zero outside that box). The tilde denotes a Fourier transform defined using the conventions of Chap. 5:

$$\tilde\Phi_L(\mathbf{k}) = \int \Phi_L(\mathbf{x})\,e^{-i\mathbf{k}\cdot\mathbf{x}}\,d^n x\,, \qquad \Phi_L(\mathbf{x}) = \int \tilde\Phi_L(\mathbf{k})\,e^{+i\mathbf{k}\cdot\mathbf{x}}\,\frac{d^n k}{(2\pi)^n}\,. \tag{8.25}$$

Because Φ is complex rather than real, $C_\Phi(\boldsymbol{\xi})$ is complex. As we shall see below, its complexity implies that $S_\Phi(-\mathbf{k}) \ne S_\Phi(\mathbf{k})$, even though $S_\Phi(\mathbf{k})$ is real. This fact prevents us from folding negative k into positive k and thereby making $S_\Phi(\mathbf{k})$ into a "single-sided" spectral density, as we did for real random processes in Chap. 5. In this complex case we must distinguish −k from +k, and similarly −ξ from +ξ.

(a) The complex Wiener-Khintchine theorem [analog of Eq. (5.25a)] says that

$$S_\Phi(\mathbf{k}) = \int C_\Phi(\boldsymbol{\xi})\,e^{+i\mathbf{k}\cdot\boldsymbol{\xi}}\,d^n\xi\,, \tag{8.26a}$$

$$C_\Phi(\boldsymbol{\xi}) = \int S_\Phi(\mathbf{k})\,e^{-i\mathbf{k}\cdot\boldsymbol{\xi}}\,\frac{d^n k}{(2\pi)^n}\,. \tag{8.26b}$$

Derive these relations. [Hint: use Parseval's theorem in the form $\int A(\mathbf{x})B^*(\mathbf{x})\,d^n x = \int \tilde A(\mathbf{k})\tilde B^*(\mathbf{k})\,d^n k/(2\pi)^n$ with $A(\mathbf{x}) = \Phi(\mathbf{x})$ and $B(\mathbf{x}) = \Phi(\mathbf{x}+\boldsymbol{\xi})$, and then take the limit as $L\to\infty$.] Because $S_\Phi(\mathbf{k})$ is real, this Wiener-Khintchine theorem implies that $C_\Phi(-\boldsymbol{\xi}) = C_\Phi^*(\boldsymbol{\xi})$. Show that this is so directly from the definition (8.24a) of $C_\Phi(\boldsymbol{\xi})$. Because $C_\Phi(\boldsymbol{\xi})$ is complex, the Wiener-Khintchine theorem implies that $S_\Phi(\mathbf{k}) \ne S_\Phi(-\mathbf{k})$.

(b) Let ψ(x, t) be the complex-valued wave field defined in Eq. (8.12a). Restrict x to vary only over the two transverse dimensions, so that ψ is defined on a 3-dimensional space. Define $\Phi(\mathbf{x},t) \equiv \psi(\mathbf{x},t)/\langle|\psi(\mathbf{x},t)|^2\rangle^{1/2}$. Show that

$$C_\Phi(\mathbf{a},\tau) = \gamma_{12}(k\mathbf{a},\tau)\,, \qquad S_\Phi(-\boldsymbol{\alpha}k,-\omega) = \text{const}\times\frac{I_\omega(\boldsymbol{\alpha},\omega)}{F_S}\,, \tag{8.27}$$

and that the complex Wiener-Khintchine theorem (8.26) is the van Cittert-Zernike theorem (8.22). (Note: the minus signs in $S_\Phi$ result from the difference in Fourier-transform conventions between the theory of random processes [Eq. (8.25) above and Chap. 5] and the theory of optical coherence [this chapter].) Evaluate the constant in Eq. (8.27).
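The relations of part (a) have an exact discrete counterpart that is easy to check numerically. The sketch below rests on several assumptions of ours:

- a one-dimensional (n = 1) process sampled at N points with periodic boundary conditions;
- an average over x in place of the ensemble average of Eq. (8.24a);
- numpy's FFT, whose forward transform carries e^{−ikx} as in Eq. (8.25).

Under these assumptions, S[k] = |Φ̃[k]|²/N equals Σ_m C[m] e^{+2πikm/N}, which is the discrete analog of Eq. (8.26a). The test signal is a hypothetical complex exponential plus noise, chosen so that S(−k) ≠ S(k) is obvious.

```python
import numpy as np


def correlation_function(phi):
    """C[m] = mean over x of Phi(x) Phi*(x + m), with periodic wrap-around (cf. Eq. 8.24a)."""
    phi = np.asarray(phi, dtype=complex)
    return np.array([np.mean(phi * np.conj(np.roll(phi, -m))) for m in range(len(phi))])


def spectral_density(phi):
    """S[k] = |Phi~[k]|^2 / N, with Phi~[k] = sum_x Phi[x] e^{-2 pi i k x / N} (cf. Eq. 8.24b)."""
    phi = np.asarray(phi, dtype=complex)
    return np.abs(np.fft.fft(phi)) ** 2 / len(phi)


rng = np.random.default_rng(1)
n_samples = 256
x = np.arange(n_samples)
# Hypothetical complex test process: a wave with positive wave number 5, plus weak noise.
phi = np.exp(2j * np.pi * 5 * x / n_samples) + 0.1 * (
    rng.normal(size=n_samples) + 1j * rng.normal(size=n_samples))

C = correlation_function(phi)
S = spectral_density(phi)
# Discrete Wiener-Khintchine: S[k] = sum_m C[m] e^{+2 pi i k m / N} = N * ifft(C)[k]
S_from_C = n_samples * np.fft.ifft(C)

print(np.allclose(S, S_from_C))            # discrete form of Eq. (8.26a)
print(np.allclose(C[-1], np.conj(C[1])))   # C(-xi) = C*(xi)
print(S[5], S[-5])                         # S(k) differs from S(-k) for a complex process
```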

****************************

[Figure: two telescopes separated by a, each with telescope & amplifier, a delay, and a correlator producing γ(a).] Fig. 8.4: Two-Element Radio Interferometer.

8.3

Radio Telescopes

The technique pioneered by Michelson for measuring the angular sizes of stars at visual wavelengths has been applied with great effect in radio astronomy. A modern radio telescope is a large, steerable surface that reflects radio waves onto a "feed". There the fluctuating electric field in the radio wave creates a very small voltage, which can subsequently be amplified and measured electronically. A large telescope has a diameter D ∼ 100 m, and a typical observing wavelength might be λ ∼ 6 cm. This implies an angular resolution θ_A ∼ λ/D ∼ 2 arc minutes [Eq. (7.18) and subsequent discussion]. However, many of the most interesting cosmic sources are much smaller than this. To achieve much better angular resolution, the technique of radio interferometry was developed in the 1960s and 70s. The analogous optical interferometry is currently (2000s) under rapid development.
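The order-of-magnitude estimate θ_A ∼ λ/D quoted above is simple arithmetic. The helper below reproduces it for D = 100 m and λ = 6 cm. Its name is hypothetical, and it ignores the order-unity prefactor of the exact diffraction pattern.

```python
import math

RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0


def angular_resolution_arcsec(wavelength, aperture):
    """Order-of-magnitude diffraction limit theta ~ lambda / D, converted to arcseconds.

    wavelength and aperture must be given in the same length units.
    """
    return wavelength / aperture * RAD_TO_ARCSEC


theta = angular_resolution_arcsec(0.06, 100.0)   # lambda ~ 6 cm, D ~ 100 m
print(f"theta_A ~ {theta:.0f} arcsec ~ {theta / 60:.1f} arcmin")
```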

8.3.1

Two-Element Radio Interferometer

If we have two radio telescopes, we can think of them as two Young's slits, and we can link them using a combination of waveguides and electric cables as shown in Fig. 8.4. When they are both pointed at a source, they both measure the electric field in waves from that source. We combine their signals by first narrow-band filtering their voltages to make them nearly monochromatic. We then either add the filtered voltages and measure the power as above, or multiply the two voltages directly. In either case the degree of coherence, Eq. (8.10), can be measured. (If the source is not vertically above the two telescopes, one obtains some non-lateral component of the full degree of coherence γ12(a, τ). However, by introducing a time delay into one of the signals we can measure the degree of lateral coherence γ⊥(a), which is what the astronomer usually needs.)

The objective is usually to produce an image of the radio waves' source. This is achieved by Fourier inverting the lateral degree of coherence γ⊥(a) [Eq. (8.13b)]. That quantity must therefore be measured for a variety of values of the relative separation vector a of the telescopes, perpendicular to the direction of the source. As the earth rotates, the separation vector will trace out half an ellipse in the two-dimensional a plane every twelve hours. (The source intensity is a real quantity, so we can use Eq. (8.13b) to deduce that $\gamma_\perp(-\mathbf{a}) = \gamma_\perp^*(\mathbf{a})$. The same measurements therefore also supply the other half of the ellipse.) By changing the spacing between the two telescopes twice a day and collecting data for a number of days, the degree of coherence can be well sampled. This technique is known as Earth-Rotation Aperture Synthesis. The name reflects that, with the aid of the earth's rotation, the telescopes are made to behave like a giant telescope as big as their maximum separation.
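The half ellipse mentioned above can be sketched under simplifying assumptions that are ours, not the text's:

- a purely east-west baseline of length b;
- a source at declination δ;
- the standard projection in which, at hour angle H, the components of a perpendicular to the line of sight are (b cos H, b sin δ sin H).

Over twelve hours of hour angle, the point traces half of an ellipse with semi-axes b and b sin δ. The relation γ⊥(−a) = γ⊥*(a) gives the reflected points −a, which complete the ellipse. The baseline length and declination below are hypothetical.

```python
import numpy as np


def projected_baseline(b, declination, hour_angle):
    """Components (u, v) of an east-west baseline of length b, projected onto the
    plane perpendicular to a source at the given declination (angles in radians)."""
    u = b * np.cos(hour_angle)
    v = b * np.sin(declination) * np.sin(hour_angle)
    return u, v


b = 1.0e3                                     # hypothetical 1 km east-west baseline
dec = np.radians(40.0)                        # hypothetical source declination
H = np.linspace(-np.pi / 2, np.pi / 2, 7)     # twelve hours of hour angle
u, v = projected_baseline(b, dec, H)

# Every (u, v) lies on the ellipse (u/b)^2 + (v/(b sin dec))^2 = 1; u >= 0 means half of it.
print(np.allclose((u / b) ** 2 + (v / (b * np.sin(dec))) ** 2, 1.0))
```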

8.3.2

Multiple Element Radio Interferometer

In practice, a modern interferometer has many more than two telescopes. The Very Large Array (VLA) in New Mexico has 27 individual telescopes arranged in a Y pattern and operating simultaneously. The degree of coherence can thus be measured at once over 27 × 26/2 = 351 different relative separations. The results of these measurements can then be interpolated to give values of γ⊥(a) on a regular grid of points (usually $2^N \times 2^N$ for some integer N). This is then suitable for applying the Fast Fourier Transform algorithm to infer the source structure I(α).
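The baseline count and the final FFT step can be sketched in a toy version. It rests on two stated assumptions:

- the interpolated γ⊥ is available at every point of a 2^N × 2^N grid;
- on that grid, γ⊥ is the discrete analog of the flux-normalized transform of I with an e^{+i} kernel. The sign convention is our assumption, since Eq. (8.13b) is not reproduced in this section.

The random "source" is hypothetical. Real arrays sample only part of the grid, and this sketch ignores that.

```python
import numpy as np
from itertools import combinations


def n_baselines(n_telescopes):
    """Number of distinct telescope pairs, i.e. simultaneously measured separations."""
    return sum(1 for _ in combinations(range(n_telescopes), 2))


print(n_baselines(27))                 # VLA: 27 telescopes

N = 5
M = 2 ** N
rng = np.random.default_rng(2)
source = rng.random((M, M))            # hypothetical intensity map I(alpha) on the grid

# Discrete analog of gamma_perp(a) = sum_alpha I(alpha) e^{+i k a.alpha} / sum_alpha I(alpha)
gamma = np.fft.ifft2(source) * source.size / source.sum()

# With every grid point sampled, the forward FFT recovers the source exactly.
recovered = np.fft.fft2(gamma).real * source.sum() / source.size
print(np.allclose(recovered, source))
```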

8.3.3

Closure Phase

Among the many technical complications of interferometry is one which brings out an interesting point about Fourier methods. It is usually much easier to measure the modulus than the phase of the complex degree of coherence. There are two reasons. First, it is hard to introduce the necessary delays in the electronics accurately enough to know where the zero of the fringe pattern should be located. Second, unknown, fluctuating phase delays are introduced into the phase of the field as the wave propagates through the upper atmosphere and ionosphere. (This is a radio variant of the problem of "seeing" for optical telescopes, cf. Ex. 7.10, and it also plagues the Michelson stellar interferometer.) It might therefore be thought that we would have to make do with just the modulus of the degree of coherence, i.e. the fringe visibility, to perform the Fourier inversion for the source structure. This is not so.

Consider a three-element interferometer measuring fields ψ1, ψ2, ψ3, and suppose that at each telescope there are unknown phase errors δϕ1, δϕ2, δϕ3; cf. Fig. 8.5. For baseline a12, we measure the degree of coherence $\gamma_{\perp 12} \propto \langle\psi_1\psi_2^*\rangle$. This is a complex number with phase $\Phi_{12} = \varphi_{12} + \delta\varphi_1 - \delta\varphi_2$, where $\varphi_{12}$ is the phase of $\gamma_{\perp 12}$ in the absence of phase errors. Suppose we also measure the degrees of coherence for the other two pairs of telescopes in the triangle, and derive their phases Φ23 and Φ31. We can then calculate the quantity

$$C_{123} = \Phi_{12} + \Phi_{23} + \Phi_{31} = \varphi_{12} + \varphi_{23} + \varphi_{31}\,, \tag{8.28}$$

from which the phase errors cancel out. The quantity C123, known as the closure phase, can be measured with high accuracy. In the VLA, there are 27 × 26 × 25/6 = 2925 such closure phases, and they can all be measured with considerable redundancy. Absolute phase information cannot be recovered. Nevertheless, 93 per cent of the relative phases can be inferred in this manner and used to construct an image far superior to what one would get without any phase information.

[Figure: triangle of telescopes 1, 2, 3 with baselines a12, a23, a31.] Fig. 8.5: Closure phase measurement using a triangle of telescopes.
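The cancellation in Eq. (8.28) holds for every triangle. The sketch below checks this with random phases and random per-telescope errors (all values hypothetical). It also counts the triangles, 27 × 26 × 25/6 = 2925.

Finally, it compares two counts. The first is the number of independent closure phases, (N − 1)(N − 2)/2, a standard counting result that we assume here rather than derive. The second is the number of baseline phases, N(N − 1)/2. For N = 27 the ratio is 325/351 ≈ 0.93, consistent with the 93 per cent quoted above.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
n_tel = 27                                   # as in the VLA

# Hypothetical error-free baseline phases phi_ij (i < j); phi_ji = -phi_ij because
# gamma_perp,ji is the complex conjugate of gamma_perp,ij.
true = {p: rng.uniform(-np.pi, np.pi) for p in combinations(range(n_tel), 2)}
# Unknown phase error introduced at each telescope (e.g. by the ionosphere).
delta = rng.uniform(-np.pi, np.pi, n_tel)


def true_phase(i, j):
    return true[(i, j)] if i < j else -true[(j, i)]


def measured_phase(i, j):
    """Phi_ij = phi_ij + delta_phi_i - delta_phi_j."""
    return true_phase(i, j) + delta[i] - delta[j]


def wrap(angle):
    """Reduce an angle to the interval (-pi, pi]."""
    return np.angle(np.exp(1j * angle))


triangles = list(combinations(range(n_tel), 3))
worst = max(
    abs(wrap(measured_phase(i, j) + measured_phase(j, k) + measured_phase(k, i)
             - true_phase(i, j) - true_phase(j, k) - true_phase(k, i)))
    for i, j, k in triangles)

n_base = n_tel * (n_tel - 1) // 2            # baseline phases
n_indep = (n_tel - 1) * (n_tel - 2) // 2     # independent closure phases
print(len(triangles), worst)
print(n_base, n_indep, n_indep / n_base)
```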

8.3.4

Angular Resolution

When the telescope spacings are well sampled and the source is bright enough to carry out these image-processing techniques, an interferometer can have an angular resolving power approaching that of an equivalent filled aperture as large as the maximum telescope spacing. For the VLA this is 35 km. That gives an angular resolution of a fraction of a second of arc at 6 cm wavelength, which is 350 times better than the resolution of a single 100 m telescope.

Even greater angular resolution is achieved in a technique known as Very Long Baseline Interferometry (VLBI). Here the telescopes can be located on different continents, and they are not linked directly. Instead, the oscillating field amplitudes ψ(t) are stored on magnetic tape and then combined digitally long after the observation. This yields the complex degree of coherence, and thence the source structure I(α). In this way angular resolutions over 300 times better than those achievable by the VLA can be obtained. Structure smaller than a milliarcsecond, corresponding to a few light years at cosmological distances, can be measured in this manner.

****************************
EXERCISES

Exercise 8.8 Example: Interferometry from Space

The longest radio-telescope separation currently available is between telescopes on the earth's surface and an 8-m diameter radio telescope in the Japanese HALCA satellite, which orbits the earth at 6 earth radii. Radio astronomers conventionally describe the specific intensity $I_\omega(\boldsymbol{\alpha},\omega)$ of a source in terms of its brightness temperature. This is the temperature $T_b(\omega)$ that a black body would need in order to emit the same specific intensity as the source in the Rayleigh-Jeans (low-frequency) end of its spectrum.

(a) Consider a single (linear or circular) polarization. Suppose the solid angle subtended by a source is ΔΩ, and the specific flux (also called spectral flux) measured from the source is $F_\omega \equiv \int I_\omega\, d\Omega = I_\omega\,\Delta\Omega$. Show that the brightness temperature is

$$T_b = \frac{2(2\pi)^3 c^2 I_\omega}{k_B\,\omega^2} = \frac{2(2\pi)^3 c^2 F_\omega}{k_B\,\omega^2\,\Delta\Omega}\,, \tag{8.29}$$

where $k_B$ is Boltzmann's constant.

(b) The brightest quasars emit radio spectral fluxes of about $F_\omega = 10^{-25}\ \mathrm{W\,m^{-2}\,Hz^{-1}}$, independent of frequency. The smaller such a quasar is, the larger will be its brightness temperature. Thus, one can characterize the smallest sources that a radio telescope system can resolve by the highest brightness temperatures it can measure. Show that the maximum brightness temperature measurable by the earth-to-orbit interferometer is independent of the frequency at which the observation is made, and estimate its numerical value.

****************************

8.4

Etalons and Fabry-Perot Interferometers

We have shown how a Michelson interferometer can be used as a Fourier-transform spectrometer: one measures the complex fringe visibility as a function of the two arms’ optical path difference and then takes the visibility’s Fourier transform to obtain the spectrum of the radiation. The inverse process is also powerful: One can drive a Michelson interferometer with radiation with a known, steady spectrum, and look for time variations of the positions of its fringes caused by changes in the relative optical path lengths of the interferometer’s two arms. This was the philosophy of the famous Michelson-Morley experiment to search for ether drift, and it is also the underlying principle of a laser interferometer (“interferometric”) gravitational-wave detector. To reach the sensitivity required for gravitational-wave detection one must modify the Michelson interferometer by making the light travel back and forth in each arm many times. This is achieved by converting each arm into a Fabry-Perot interferometer. In this section we shall study Fabry-Perot interferometers and some of their other applications, and in the next section we shall explore their use in gravitational-wave detection.

8.4.1

Multiple Beam Interferometry; Etalons

Fabry-Perot interferometry is based on trapping monochromatic light between two highly reflecting surfaces. To understand such trapping, consider the concrete situation where the reflecting surfaces are flat and parallel to each other. The transparent medium between the surfaces has one index of refraction n, while the medium outside the surfaces has another index n′ (Fig. 8.6). Such a device is sometimes called an etalon. One example is a glass slab in air (n ≃ 1.5, n′ ≃ 1); another is a vacuum maintained between two glass mirrors (n = 1, n′ ≃ 1.5).

[Figure: (a) the sequence of reflections and transmissions of an incident ray in a slab of thickness d and index n, surrounded by a medium of index n′; (b) the incident (i), reflected (r), transmitted (t) and internal (a, b) waves.] Fig. 8.6: Multiple beam interferometry using a type of Fabry-Perot Etalon.

Suppose that a plane wave (i.e. parallel rays) with frequency ω is incident on one of the reflecting surfaces, where it is partially reflected and partially transmitted with refraction. The transmitted wave will propagate through to the second surface, where it will be partially reflected and partially transmitted. The reflected portion will return to the first surface, where it too will be split, and so on [Fig. 8.6(a)]. The resulting total fields in and outside the slab could be computed by summing the series of sequential reflections and transmissions (Ex. 8.9).

Alternatively, they can be computed as follows. We shall assume, for pedagogical simplicity, that there is translational invariance along the slab (i.e. the slab and incoming wave are perfectly planar). Then the series, if summed, would lead to the five waves shown in Fig. 8.6(b): an incident wave (ψi), a reflected wave (ψr), a transmitted wave (ψt), and two internal waves with fields (ψa, ψb). We introduce amplitude reflection and transmission coefficients, denoted r and t, for waves incident upon the slab surface from outside. Likewise, we introduce coefficients r′, t′ for waves incident upon the slab from inside. These coefficients are functions of the angles of incidence and the polarization. They can be computed using electromagnetic theory (e.g. Sec. 4.6.2 of Hecht 1990), but this will not concern us here. Armed with these definitions, we can express the reflected and transmitted waves at the first surface (location A in Fig. 8.7) in the form

$$\psi_r = r\psi_i + t'\psi_b\,, \qquad \psi_a = t\psi_i + r'\psi_b\,, \tag{8.30a}$$

where ψi, ψa, ψb, and ψr are the values of ψ at A for waves impinging on or leaving the surface along the paths i, a, b, and r depicted in Fig. 8.7.

[Figure: paths i, r, a, b, t at the two faces of a slab of thickness d, with point A on the first face, s1 = d sec θ, lateral offset 2d tan θ, and s2 = 2d tan θ sin θ.] Fig. 8.7: Construction for calculating the phase differences across the slab for the two internal waves in an etalon.

Simple geometry shows that the waves at the second surface are as depicted in Fig. 8.7. Correspondingly, the relationships between the ingoing and outgoing waves there are

$$\psi_b\,e^{-iks_1} = r'\psi_a\,e^{ik(s_1-s_2)}\,, \qquad \psi_t = t'\psi_a\,e^{iks_1}\,, \tag{8.30b}$$

where k = nω/c is the wave number in the slab and (as is shown in the figure)

$$s_1 = d\sec\theta\,, \qquad s_2 = 2d\tan\theta\,\sin\theta\,, \tag{8.30c}$$

with d the thickness of the slab and θ the angle that the wave fronts inside the slab make to the slab's faces.

In solving Eqs. (8.30) for the net transmitted and reflected waves ψt and ψr in terms of the incident wave ψi, we shall need relations between two sets of coefficients. One set is r, t, for waves that hit the reflecting surfaces from one side; the other is r′, t′, for waves from the other side. These reciprocity relations are analyzed quite generally in Ex. 8.10. To derive them in our case of sharp boundaries between homogeneous media, consider the limit in which the slab thickness d → 0. This is allowed because the wave equation is linear, and the solution for one surface can be superposed on that for the other surface. In this limit s1 = s2 = 0, and the slab must become transparent, so

$$\psi_r = 0\,, \qquad \psi_t = \psi_i\,. \tag{8.31}$$

Eqs. (8.30a), (8.30b), and (8.31) are then six homogeneous equations in the five wave amplitudes ψi, ψr, ψt, ψa, ψb. From them we can extract the two desired reciprocity relations:

$$r' = -r\,, \qquad tt' - rr' = 1\,. \tag{8.32}$$
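The elimination behind Eq. (8.32) takes only a few lines. With s1 = s2 = 0, Eq. (8.30b) gives ψb = r′ψa and ψt = t′ψa. Eq. (8.31) then gives ψa = ψi/t′ and ψb = r′ψi/t′. Substituting these into the two halves of Eq. (8.30a):

$$\begin{aligned}
0 = \psi_r &= r\psi_i + t'\psi_b = (r + r')\,\psi_i &&\Longrightarrow\quad r' = -r\,,\\
\frac{\psi_i}{t'} = \psi_a &= t\psi_i + r'\psi_b = \Big(t + \frac{r'^2}{t'}\Big)\psi_i &&\Longrightarrow\quad 1 = tt' + r'^2 = tt' - rr'\,.
\end{aligned}$$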

Since there is no mechanism to produce a phase shift as the waves propagate across a perfectly sharp boundary, it is reasonable to expect r, r′, t and t′ all to be real, as indeed they are (Ex. 8.10). (If the interface has a finite thickness, it is possible to adjust the spatial origins on the two sides of the interface so as to make r, r′, t and t′ all real. That leads to the reciprocity relations (8.32), but a price will be paid; see Ex. 8.10.)

Return, now, to the case of finite slab thickness. By solving Eqs. (8.30) for the reflected and transmitted fields and invoking the reciprocity relations (8.32), we obtain

$$\psi_r = \frac{r\,(1-e^{i\varphi})}{1-r^2 e^{i\varphi}}\,\psi_i\,, \qquad \psi_t = \frac{(1-r^2)\,e^{i\varphi/(2\cos^2\theta)}}{1-r^2 e^{i\varphi}}\,\psi_i\,. \tag{8.33a}$$

Here $\varphi \equiv k(2s_1 - s_2)$. Since $2\sec\theta - 2\tan\theta\sin\theta = 2(1-\sin^2\theta)/\cos\theta = 2\cos\theta$ and $k = n\omega/c$, this reduces to

$$\varphi = 2n\omega d\cos\theta/c\,. \tag{8.33b}$$

It is the light's round-trip phase shift (along path a then b) inside the etalon, relative to the phase of the incoming light that it meets at location A. If ϕ is a multiple of 2π, the round-trip light will superpose coherently on the new, incoming light. We are particularly interested in the total reflection and transmission coefficients for the flux. These tell us what fraction of the total flux incident on the two-faced slab (etalon) is reflected by it, and what fraction emerges from its other side:

$$R = \frac{|\psi_r|^2}{|\psi_i|^2} = \frac{2r^2(1-\cos\varphi)}{1-2r^2\cos\varphi+r^4}\,, \qquad T = \frac{|\psi_t|^2}{|\psi_i|^2} = \frac{(1-r^2)^2}{1-2r^2\cos\varphi+r^4}\,. \tag{8.33c}$$

From these expressions, we see that

$$R + T = 1\,, \tag{8.34a}$$

which says that the energy flux reflected from the slab plus that transmitted is equal to that impinging on the slab (energy conservation). It is actually the reciprocity relations (8.32) for the amplitude reflection and transmission coefficients that have enforced this energy conservation. If they had contained a provision for absorption or scattering of light in the interfaces, R + T would have been less than one. The above expression for the flux transmission coefficient can be appreciated more clearly if we introduce the finesse

$$\mathcal{F} \equiv \pi r/(1-r^2)\,, \tag{8.34b}$$

in terms of which

$$T = \frac{1}{1 + (2\mathcal{F}/\pi)^2 \sin^2\tfrac{1}{2}\varphi}\,. \tag{8.34c}$$
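The flux coefficients can be evaluated directly to check several of the claims above:

- energy conservation, Eq. (8.34a);
- the finesse form, Eq. (8.34c);
- full transmission at multiples of 2π;
- a half width close to π/𝓕 for large finesse.

The reflectivities are those plotted in Fig. 8.8, plus a hypothetical high-reflectivity case r = 0.99.

```python
import numpy as np


def etalon_flux_coefficients(r, phi):
    """Flux reflection and transmission coefficients R, T of Eq. (8.33c)."""
    denom = 1 - 2 * r**2 * np.cos(phi) + r**4
    R = 2 * r**2 * (1 - np.cos(phi)) / denom
    T = (1 - r**2) ** 2 / denom
    return R, T


def finesse(r):
    """Eq. (8.34b)."""
    return np.pi * r / (1 - r**2)


phi = np.linspace(0.0, 5 * np.pi, 2001)
for r in (0.2, 0.4, 0.9):                                  # reflectivities of Fig. 8.8
    R, T = etalon_flux_coefficients(r, phi)
    T_finesse = 1 / (1 + (2 * finesse(r) / np.pi) ** 2 * np.sin(phi / 2) ** 2)   # Eq. (8.34c)
    print(r, np.max(np.abs(R + T - 1)), np.max(np.abs(T - T_finesse)))

# High-finesse etalon: T at delta phi = pi / F from resonance should be close to 1/2.
r = 0.99
_, T_half = etalon_flux_coefficients(r, np.pi / finesse(r))
print(T_half)
```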

Suppose that the etalon's surfaces are highly reflecting (as can be achieved with dielectric coatings; first paragraph of Ex. 8.10), so r ≃ 1. Then 𝓕 is very large, and the transmissivity T exhibits resonances (Fig. 8.8). Unless sin(ϕ/2) is small, almost all the incident light is reflected by the etalon (just as one might naively expect). The (perhaps surprising) exception arises when sin(ϕ/2) is small. Then the total transmission can be large, reaching unity in the limit sin(ϕ/2) → 0 (i.e., on resonance, when the round-trip phase shift ϕ inside the etalon is a multiple of 2π). Notice that for large finesse, the half width of the resonance is $\delta\varphi_{1/2} = \pi/\mathcal{F}$; this is the value of $\delta\varphi \equiv \varphi - \varphi_{\rm resonance}$ at which T falls to 1/2. The separation between resonances (sometimes called the free spectral range) is δϕ = 2π. The finesse is therefore the ratio of the free spectral range to the full width of the resonance at half maximum, $2\delta\varphi_{1/2} = 2\pi/\mathcal{F}$.

[Figure: T versus ϕ from 0 to 5π for r = 0.2, 0.4, and 0.9.] Fig. 8.8: Flux transmission coefficient for an etalon as a function of the round-trip phase shift ϕ (relative to the incoming light) inside an etalon.

The large transmissivity at resonance can be understood by considering what happens when one first turns on the incident wave. Assume, as we shall, that the reflectivity of the faces is near unity. Then the incoming wave has a large amplitude for reflection, and correspondingly only a tiny amplitude for transmission into the slab. The tiny bit that gets transmitted travels through the slab and is strongly reflected from the second face. It then returns to the first face precisely in phase with the incoming wave (ϕ an integer multiple of 2π). Correspondingly, it superposes coherently on the tiny field being transmitted by the incoming wave, and so the net wave inside the slab is doubled. After one more round trip inside the slab, this wave again returns to the first face in phase with the tiny field being transmitted by the incoming wave. Again they superpose coherently, and the internal wave now has three times the amplitude it began with. This process continues until a very strong field has built up inside the slab (Ex. 8.9). As it builds up, that field begins to leak out of the slab's first face with just such a phase as to destructively interfere with the wave being reflected there. The net reflected wave is thereby driven close to zero. The field leaking out the second face has no other wave to interfere with, so it remains strong. The etalon thus settles down into a steady state with strong net transmission.
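The build-up just described is a geometric series, and the sketch below simply iterates it; Ex. 8.9 asks for the analytic sum. By Eqs. (8.30) and the reciprocity relations (8.32), each round trip multiplies the field circulating at A by (r′)² e^{iϕ} = r² e^{iϕ}. We track ψa in units of the first transmitted field tψi, so no particular value of t needs to be assumed. The reflectivities and numbers of round trips are illustrative choices.

```python
import numpy as np


def internal_buildup(r, phi, n_round_trips):
    """psi_a / (t psi_i) at point A after 0, 1, ..., n_round_trips round trips.

    Each round trip multiplies the circulating field by (r')^2 e^{i phi} = r^2 e^{i phi},
    and the incoming wave adds a fresh contribution t psi_i (cf. Ex. 8.9).
    """
    loop_factor = r**2 * np.exp(1j * phi)
    amp = 1.0 + 0.0j
    history = [amp]
    for _ in range(n_round_trips):
        amp = 1.0 + loop_factor * amp
        history.append(amp)
    return np.array(history)


# On resonance with high reflectivity, the field at A nearly doubles after one round trip.
print(abs(internal_buildup(0.999, 0.0, 1)[1]))

# Steady state for a moderately reflecting etalon: the series sums to 1 / (1 - r^2 e^{i phi}).
r, phi = 0.9, 0.0
history = internal_buildup(r, phi, 500)
steady = 1 / (1 - r**2 * np.exp(1j * phi))
# Transmitted flux |psi_t / psi_i|^2 = |t t'|^2 |psi_a / (t psi_i)|^2, with t t' = 1 - r^2.
print(abs(history[-1] - steady), abs((1 - r**2) * history[-1]) ** 2)
```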
Heuristically, one can say that, because the wave inside the slab is continually constructively superposing on itself, the slab "sucks" almost all the incoming wave into itself, and then ejects it on out the other side. (Quantum mechanically, this sucking is due to the photons' Bose-Einstein statistics: the photons "want" to be in the same quantum state. We shall study this phenomenon, in the context of plasmons that obey Bose-Einstein statistics, in Chap. 22 [passage following Eq. (22.38)].)

In addition to its resonant transmission when |sin(ϕ/2)| ≪ 1, the etalon exhibits two other important features. One is the slowness of its response to a changing input flux when it is operating near resonance, which the above discussion makes clear. The other is a rapid change of the phase of the transmitted light ψt as ϕ is gradually changed through resonance. Near resonance, Eq. (8.33a) for ψt gives a phase

$$\arg(\psi_t) \simeq \arg(\psi_i) + \tan^{-1}\!\left(\frac{\delta\varphi}{1-r^2}\right) \simeq \arg(\psi_i) + \frac{\mathcal{F}}{\pi}\,\delta\varphi\,, \tag{8.34d}$$

up to a slowly varying term. Here δϕ is the amount by which ϕ = 2nωd cos θ/c differs from its resonant value (some multiple of 2π). This rapid phase shift of the transmitted light near resonance is a general feature of high-quality oscillators and resonators. As we shall see in the next section, it is crucial for interferometric gravitational-wave detectors.
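The steep phase change near resonance can be checked by differentiating the phase of ψt/ψi from Eq. (8.33a) numerically at normal incidence (θ = 0). The sketch compares the magnitude of the slope with 𝓕/π. The smooth factor e^{iϕ/(2cos²θ)} in Eq. (8.33a) adds a contribution of order unity, which is negligible when 𝓕 ≫ 1. The value r = 0.99 is illustrative.

```python
import numpy as np


def transmitted_ratio(r, phi, theta=0.0):
    """psi_t / psi_i from Eq. (8.33a)."""
    return ((1 - r**2) * np.exp(1j * phi / (2 * np.cos(theta) ** 2))
            / (1 - r**2 * np.exp(1j * phi)))


r = 0.99
finesse = np.pi * r / (1 - r**2)
h = 1e-6                                  # small step in phi around the resonance at phi = 0
slope = (np.angle(transmitted_ratio(r, h)) - np.angle(transmitted_ratio(r, -h))) / (2 * h)
print(slope, finesse / np.pi)
```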

8.4.2

Fabry-Perot Interferometer

When the etalon's two faces are highly reflecting (r near unity; 𝓕 ≫ 1), we can think of them as mirrors, between which the light resonates. The higher the mirror reflectivity, the sharper the resonance (Fig. 8.8), the more rapid the change of phase near resonance [Eq. (8.34d)], and the more sluggish the response to changes of input flux near resonance. Such a high-reflectivity etalon is a special case of a Fabry-Perot interferometer. The general case is any device in which light resonates between two high-reflectivity mirrors. The mirrors need not be planar and need not have the same reflectivities, and the resonating light need not be plane fronted. Consider, for example, an interferometric gravitational-wave detector (Fig. 8.11 below). Each detector arm is a Fabry-Perot cavity with spherical mirrors at its ends. The mirrors have very different but high reflectivities, and the resonating light has a Gaussian-beam profile.

In the case of a Fabry-Perot etalon (parallel mirrors, plane-parallel light beam), the resonant transmission enables the etalon to be used as a spectrometer. The round-trip phase change ϕ = 2nωd cos θ/c inside the etalon varies linearly with the wave's frequency ω. However, only waves with phases ϕ near integer multiples of 2π will be transmitted efficiently. The etalon can be tuned to a particular frequency by varying either the slab width d or the angle of incidence of the radiation (and thence the angle θ inside the etalon). Either way, impressively good chromatic resolving power can be achieved. We say that waves with two nearby frequencies can just be resolved by an etalon when the half-power point of the transmission coefficient of one wave coincides with that of the other. Using Eq. (8.33c), we find that the phases for the two frequencies must differ by δϕ ∼ 2π/𝓕. Since ϕ = 2nωd cos θ/c = 4πnd cos θ/λ_vac, the chromatic resolving power (for near-normal incidence, cos θ ≃ 1) is correspondingly

$$\mathcal{R} = \frac{\lambda}{\delta\lambda} = \frac{4\pi n d}{\lambda_{\rm vac}\,\delta\varphi} = \frac{2nd\mathcal{F}}{\lambda_{\rm vac}}\,. \tag{8.35}$$

Here λ_vac is the wavelength in vacuum, i.e. outside the etalon. If we regard the etalon as a resonant cavity, then the finesse 𝓕 can be regarded as the effective quality factor Q for the resonator. It is roughly the number of times a typical photon traverses the etalon before escaping. Correspondingly, the response time of the etalon on resonance, when one changes the incoming flux, is roughly the round-trip travel time for light inside the etalon, multiplied by the finesse. Note, moreover, that as one slowly changes the round-trip phase ϕ, the rate of change of the phase of the transmitted wave, d arg(ψt)/dϕ, is π⁻¹ times the finesse [Eq. (8.34d)].
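Eq. (8.35) and the response-time estimate are easy to evaluate. The etalon parameters below are a hypothetical example, not taken from the text. The last lines check that a wavelength shift δλ = λ/𝓡 changes the round-trip phase ϕ = 4πnd/λ_vac (normal incidence) by 2π/𝓕, the resolution criterion used above.

```python
import math

C_LIGHT = 2.99792458e8                     # speed of light, m/s


def finesse(r):
    """Eq. (8.34b)."""
    return math.pi * r / (1 - r**2)


def resolving_power(n, d, r, wavelength_vac):
    """Chromatic resolving power lambda / delta lambda = 2 n d F / lambda_vac, Eq. (8.35)."""
    return 2 * n * d * finesse(r) / wavelength_vac


def response_time(n, d, r):
    """Round-trip travel time 2 n d / c (normal incidence) multiplied by the finesse."""
    return 2 * n * d / C_LIGHT * finesse(r)


# Hypothetical vacuum-gap etalon: n = 1, d = 1 cm, r = 0.99, observed at 500 nm.
n, d, r, lam = 1.0, 0.01, 0.99, 500e-9
print(f"F = {finesse(r):.0f}, R = {resolving_power(n, d, r, lam):.2e}, "
      f"response time = {response_time(n, d, r):.1e} s")

# Phase change produced by the just-resolvable wavelength shift, compared with 2 pi / F.
dlam = lam / resolving_power(n, d, r, lam)
dphi = 4 * math.pi * n * d * (1 / lam - 1 / (lam + dlam))
print(dphi, 2 * math.pi / finesse(r))
```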

8.4.3

Lasers

Fabry-Perot interferometers are exploited in the construction of many types of lasers. For example, in a gas-phase laser, the atoms are excited to emit a spectral line. This radiation is spontaneously emitted isotropically over a wide range of frequencies. Placing the gas between the mirrors of a Fabry-Perot interferometer allows one or more highly collimated and narrow-band modes to be trapped and, while trapped, to be amplified by stimulated emission.

****************************
EXERCISES

Exercise 8.9 *** Example: Etalon's Light Fields Computed by Summing the Contributions from a Sequence of Round Trips

Study the step-by-step buildup of the field inside an etalon, and of the etalon's transmitted field, when the input field is suddenly turned on. More specifically:

(a) When the wave first turns on, the transmitted field inside the etalon, at point A of Fig. 8.7, is $\psi_a = t\psi_i$. This is very small if the reflectivity is high, so that |t| ≪ 1. Show (with the aid of Fig. 8.7) that, after one round-trip travel time in the etalon, the transmitted field at A is $\psi_a = t\psi_i + (r')^2 e^{i\varphi}\, t\psi_i$. Show that for high reflectivity and on resonance, the tiny transmitted field has doubled in amplitude and its energy flux has quadrupled.

(b) Compute the transmitted field ψa at A after more and more round trips, and watch it build up. Sum the series to obtain the steady-state field ψa. Explain the final, steady-state amplitude: why is it not infinite, and why, physically, does it have the value you have derived?

(c) Show that, at any time during this buildup, the field transmitted out the far side of the etalon is $\psi_t = t'\psi_a e^{iks_1}$ [Eq. (8.30b)]. What is the final, steady-state transmitted field? Your answer should be Eq. (8.33a).

Exercise 8.10 *** Example: Reciprocity Relations for a Locally Planar Optical Device

Modern mirrors, etalons, beam splitters, and other optical devices are generally made of glass or fused silica (quartz), with dielectric coatings on their surfaces.
The coatings consist of alternating layers of materials with different dielectric constants, so the index of refraction n varies periodically. If, for example, the period of n’s variations is half a wavelength of the radiation, then waves reflected from successive dielectric layers build up coherently, producing a large net reflection coefficient; the result is a highly reflecting mirror.

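[Figure: (a) incident wave ψi e^{i ki·x}, reflected wave r ψi e^{i kr·x}, and transmitted wave t ψi e^{i kt·x′} at a planar device with normal along z; (b) the time-reversed waves ψi* e^{−i ki·x}, r* ψi* e^{−i kr·x}, and t* ψi* e^{−i kt·x′}.] Fig. 8.9: Construction for deriving reciprocity relations for amplitude transmission and reflection coefficients.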

In this exercise we shall use a method due to Stokes to derive the reciprocity relations for locally plane-fronted, monochromatic waves impinging on an arbitrary, locally planar, lossless optical device. [By "locally" plane-fronted and planar, we mean that transverse variations are on scales sufficiently long compared to the wavelength of light that we can use the plane-wave analysis sketched below. For example, the spherical mirrors and Gaussian beams of an interferometric gravitational-wave detector (Fig. 8.11) easily satisfy this requirement. By lossless we mean that there is no absorption or scattering of the light.] The device could be a mirror, a surface with an antireflection coating (Ex. 8.12 below), an etalon, or any sequence of such objects with parallel surfaces.

Let a plane, monochromatic wave $\psi_i e^{i\mathbf{k}_i\cdot\mathbf{x}} e^{-i\omega t}$ impinge on the optical device from above. Orient the device so its normal is in the z direction and it is translation invariant in the x and y directions; see Fig. 8.9(a). Then the reflected and transmitted waves are as shown in the figure. Because the medium below the device can have a different index of refraction from that above, the waves' propagation direction below may be different from that above, as shown. For reasons explained in part (e) below, we denote position below the device by x′ and position above the device by x.

(a) Consider a thought experiment in which the waves of Fig. 8.9(a) are time-reversed, so they impinge on the device from the original reflection and transmission directions and emerge toward the original input direction, as shown in Fig. 8.9(b). If the device had been lossy, the time-reversed waves would not satisfy the field's wave equation; the absence of losses guarantees they do. Show that, mathematically, the time reversal can be achieved by complex conjugating the spatial part of the waves, while leaving the temporal part $e^{-i\omega t}$ unchanged.
(Such phase conjugation can be achieved in practice using techniques of nonlinear optics, as we shall see in the next chapter.) Show, correspondingly, that the spatial part of the waves is described by the formulas shown in Fig. 8.9(b).

(b) Use the reflection and transmission coefficients to compute the waves produced by the inputs of Fig. 8.9(b). The wave emerging from the device's upward side must have the form shown in the figure. From this requirement, conclude that $\psi_i^* e^{-i\mathbf{k}_i\cdot\mathbf{x}} = t'(t^*\psi_i^* e^{-i\mathbf{k}_i\cdot\mathbf{x}}) + r(r^*\psi_i^* e^{-i\mathbf{k}_i\cdot\mathbf{x}})$, so that

$$1 = rr^* + t't^*\,. \tag{8.36a}$$

Similarly, from the requirement that no wave emerge from the device's downward side, conclude that

$$0 = tr^* + t^*r'\,. \tag{8.36b}$$

Eqs. (8.36) are the most general form of the reciprocity relations for lossless, planar devices.

(c) For a sharp interface between two homogeneous media, combine these general reciprocity relations with the ones derived in the text, Eq. (8.32), to show that t, t′, r and r′ are all real (as was asserted in the text).

(d) For the etalon of Figs. 8.6 and 8.7, what are the four complex reflection and transmission coefficients implied by Eq. (8.33a)?

(e) Show that for a general optical device, the reflection and transmission coefficients can all be made real by appropriate, independent adjustments of the origins of the vertical coordinates z (for points above the device) and z′ (for points below the device). More specifically, set $z_{\rm new} = z_{\rm old} + \delta z$ and $z'_{\rm new} = z'_{\rm old} + \delta z'$, and show that by choosing δz and δz′ appropriately, one can make t and r real. Show further that the reciprocity relations (8.36a) and (8.36b) then imply that t′ and r′ are also real. Finally, show that this adjustment of origins brings the real reciprocity relations into the same form (8.32) as for a sharp interface between two homogeneous media.

As attractive as it may be to have these coefficients real, one must keep in mind some disadvantages. (i) The displaced origins for z and z′ will in general depend on frequency. Correspondingly, (ii) frequency-dependent information (most importantly, frequency-dependent phase shifts of the light) is lost by making the coefficients real. If the phase shifts depend only weakly on frequency over the band of interest (as is typically the case for the dielectric coating of a mirror face), then these disadvantages are unimportant, and it is conventional to choose the coefficients real.
If the phase shifts depend strongly on frequency over the band of interest (e.g., for a Fabry-Perot interferometer near resonance), the disadvantages are severe. One then generally leaves the origins frequency independent, and correspondingly leaves r, r′, t and t′ complex.

Exercise 8.11 Example: Transmission and Reflection Coefficients for an Interface Between Dielectric Media

Consider monochromatic electromagnetic waves that propagate from a medium with index of refraction n1 into a medium with index of refraction n2. Let z be a Cartesian coordinate perpendicular to the planar interface between the media.


Fig. 8.10: Sagnac interferometer used as a type of laser gyro. [Figure labels: laser L, beam splitter B, telescope T.]

(a) From the wave equation $[\omega^2 + (c^2/n^2)\nabla^2]\psi = 0$, show that both ψ and ψ,z must be continuous across the interface. (b) Using these continuity requirements, show that for light that propagates orthogonal to the interface (z direction), the reflection and transmission coefficients, in going from medium 1 to medium 2, are

$$r = \frac{n_1 - n_2}{n_1 + n_2}, \qquad t = \frac{2 n_1}{n_1 + n_2}. \qquad (8.37)$$

Notice that these r and t are both real. (c) Use the reciprocity relations (8.36a) to deduce the reflection and transmission coefficients r′ and t′ for a wave propagating in the opposite direction, from medium 2 to medium 1. Exercise 8.12 *** Example: Anti-reflection Coating A common technique used to reduce the reflection at the surface of a lens is to coat it with a quarter-wavelength-thick layer of material with refractive index equal to the geometric mean of the refractive indices of air and glass. (a) Show that this does indeed lead to perfect transmission of normally incident light. (b) Roughly how thick must the layer be to avoid reflection of blue light? Estimate the flux reflection coefficient for red light in this case. Note: The amplitude reflection coefficient at an interface is given by Eq. (8.37). Exercise 8.13 *** Problem: Sagnac Interferometer A Sagnac interferometer is a rudimentary version of a laser gyroscope for measuring rotation with respect to an inertial frame. The optical configuration is shown in Fig. 8.10. Light from a laser L is split by a beam splitter B and travels both clockwise and counter-clockwise around the optical circuit, reflecting off three plane mirrors. The light is then recombined at B, and interference fringes are viewed through the telescope T. The whole assembly rotates with angular velocity Ω. Calculate the difference in the time it takes light to traverse the circuit in the two directions, and show that the consequent fringe shift (total number of fringes passing some cross hairs in T) can be expressed as $\Delta N = 4A\Omega/(c\lambda)$, where λ is the wavelength and A is the area bounded by the beams.

****************************
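As a numerical companion to Exs. 8.11 and 8.12, the sketch below evaluates the flux reflectivity |r|² at normal incidence using the standard characteristic-matrix (transfer-matrix) method for thin dielectric films, a technique not developed in this chapter. For a bare interface it reproduces the square of Eq. (8.37); for a quarter-wave layer of index √(n_air n_glass) it gives essentially zero reflection at the design wavelength. The glass index 1.5, the "blue" design wavelength 450 nm, and the "red" wavelength 650 nm are illustrative assumptions, not values from the text.

```python
import cmath
import math


def reflectance(n0, n_sub, layers, wavelength):
    """Flux reflectivity |r|^2 at normal incidence for a stack of thin films.

    n0: index of the incident medium (e.g. air)
    n_sub: index of the substrate (e.g. glass)
    layers: list of (index, thickness) pairs, ordered from the incident side
    wavelength: vacuum wavelength, in the same length units as the thicknesses
    """
    # Characteristic matrix of the whole stack, starting from the identity.
    m11, m12, m21, m22 = 1 + 0j, 0j, 0j, 1 + 0j
    for n, thickness in layers:
        delta = 2 * math.pi * n * thickness / wavelength  # phase thickness of the layer
        a, b = cmath.cos(delta), 1j * cmath.sin(delta) / n
        c, d = 1j * n * cmath.sin(delta), cmath.cos(delta)
        # Multiply the running matrix by this layer's matrix on the right.
        m11, m12, m21, m22 = (m11 * a + m12 * c, m11 * b + m12 * d,
                              m21 * a + m22 * c, m21 * b + m22 * d)
    B = m11 + m12 * n_sub
    C = m21 + m22 * n_sub
    r = (n0 * B - C) / (n0 * B + C)
    return abs(r) ** 2


n_air, n_glass = 1.0, 1.5             # assumed refractive indices
blue, red = 450e-9, 650e-9            # assumed wavelengths, m
n_coat = math.sqrt(n_air * n_glass)   # geometric-mean index
t_coat = blue / (4 * n_coat)          # quarter wavelength inside the coating

R_bare = reflectance(n_air, n_glass, [], blue)  # square of Eq. (8.37)
R_blue = reflectance(n_air, n_glass, [(n_coat, t_coat)], blue)
R_red = reflectance(n_air, n_glass, [(n_coat, t_coat)], red)

print(f"bare interface: R = {R_bare:.4f}")
print(f"coating thickness = {t_coat * 1e9:.1f} nm")
print(f"coated, blue: R = {R_blue:.2e}")
print(f"coated, red:  R = {R_red:.4f}")
```

With these assumed numbers the coating is roughly 90 nm thick, and for red light it still reflects a little under 1 per cent, well below the 4 per cent of bare glass, because the layer's phase thickness departs only moderately from π/2.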

8.5

T2 Laser Interferometer Gravitational Wave Detectors

As we shall discuss in Chap. 26, gravitational waves are predicted to exist by general relativity theory, and their emission by a binary neutron-star system has already been monitored, via their back-action on the binary's orbital motion. As orbital energy is lost to gravitational waves, the binary gradually spirals inward, so its orbital angular velocity gradually increases. The measured rate of increase agrees with general relativity's predictions to within the experimental accuracy of a fraction of a percent (for which Russell Hulse and Joseph Taylor received the 1993 Nobel Prize). However, the gravitational analog of Hertz's famous laboratory emission and detection of electromagnetic waves has not yet been performed, and probably cannot be in the authors' lifetime because of the waves' extreme weakness. For waves strong enough to be detectable, one must turn to violent astrophysical events, such as the collision and coalescence of two neutron stars or black holes. When they reach Earth and pass through a laboratory, the gravitational waves should produce tiny relative accelerations of free test masses. The tiny, oscillatory variation of the spacing between two such masses can be measured optically using a Michelson interferometer, in which (to increase the signal strength) each of the two arms is operated as a Fabry-Perot cavity. The two cavities are aligned along perpendicular directions as shown in Fig. 8.11. A Gaussian beam of light from a laser passes through a beam splitter, creating two beams with correlated phases. The beams excite the two cavities near resonance. Each cavity has an end mirror with extremely high reflectivity, $1 - r_e^2 < 10^{-4}$, and a corner mirror ("input mirror") with a lower reflectivity, $1 - r_i^2 \sim 0.01$. Because of this lower reflectivity, by contrast with the etalons discussed above, the resonant light leaks out through the input mirror instead of through the end mirror.
The reflectivity of the input mirror is so adjusted that the typical photon is stored in the cavity for roughly half the period of the expected gravitational waves (a few milliseconds), which means that the input mirror's reflectivity r_i², the arm length d, and the gravitational-wave frequency ω_gw are related by

$$\frac{d}{c\,(1 - r_i^2)} \sim \frac{1}{\omega_{\rm gw}}. \qquad (8.38)$$
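Eq. (8.38) fixes the input-mirror transmissivity once the arm length and target frequency are chosen. The lines below simply rearrange it as 1 − r_i² ∼ ω_gw d/c, taking d = 4 km (the LIGO arm length quoted later in this section) and ω_gw ∼ 10³ s⁻¹ (the order-of-magnitude frequency used in the sensitivity estimate below); this is an order-of-magnitude check only, not a design calculation.

```python
C_LIGHT = 2.998e8      # speed of light, m/s
d = 4.0e3              # arm length, m (LIGO value quoted in this section)
omega_gw = 1.0e3       # gravitational-wave angular frequency, 1/s (order of magnitude)

# Eq. (8.38): d / [c (1 - r_i^2)] ~ 1/omega_gw  =>  1 - r_i^2 ~ omega_gw d / c
transmissivity = omega_gw * d / C_LIGHT
# Storage time implied by Eq. (8.38); equals 1/omega_gw by construction.
storage_time = d / (C_LIGHT * transmissivity)

print(f"1 - r_i^2 ~ {transmissivity:.3f}")
print(f"storage time ~ {storage_time * 1e3:.1f} ms")
```

The result, 1 − r_i² of order 0.01, is consistent with the input-mirror reflectivity quoted above.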

Fig. 8.11: Laser Interferometer Gravitational Wave Detector. [Figure labels: laser, recycling mirror, beam splitter, photodetector; cavity 1 and cavity 2, of lengths d + δd1 and d + δd2, each bounded by an input mirror and an end mirror.]

The light emerging from the cavity, like that transmitted by an etalon, has a phase that is highly sensitive to the separation between the mirrors: a tiny change δd in their separation produces a change in the outgoing phase

$$\delta\phi_o \simeq \frac{8\omega\,\delta d}{c\,(1 - r_i^2)} \sim \frac{\omega}{\omega_{\rm gw}}\,\frac{\delta d}{d} \qquad (8.39)$$

in the limit 1 − r_i ≪ 1; see Ex. 8.14. The outgoing light beams from the two cavities return to the beam splitter, where they are recombined. The relative distances from the beam splitter to the cavities are adjusted so that, in the absence of any perturbations of the cavity lengths, almost all the interfered light goes back toward the laser, and only a tiny (but nonzero) amount goes toward the photodetector of Fig. 8.11, which monitors the output. Perturbations δd1 and δd2 in the cavity lengths then produce a change

$$\delta\phi_{o1} - \delta\phi_{o2} \sim \frac{\omega}{\omega_{\rm gw}\,d}\,(\delta d_1 - \delta d_2) \qquad (8.40)$$

in the relative phases at the beam splitter, and this in turn produces a change of the light intensity into the photodetector. By using two cavities in this way, and keeping their light storage times (and hence response times) the same, one makes the intensity of the light entering the photodetector insensitive to fluctuations in the laser frequency; this is crucial for obtaining the high sensitivities that gravitational-wave detection requires. The mirrors at the ends of each cavity are suspended as pendula, and when a gravitational wave with dimensionless amplitude h (to be discussed in Chap. 26) passes, it moves the mirrors back and forth, producing changes

$$\delta d_1 - \delta d_2 \sim h\,d \qquad (8.41)$$

in the arm-length difference. The resulting change in the relative phases of the two beams returning to the beam splitter,

$$\delta\phi_{o1} - \delta\phi_{o2} \sim \frac{\omega}{\omega_{\rm gw}}\,h, \qquad (8.42)$$

is monitored via the changes in intensity that it produces for the light going into the photodetector. If one builds the entire detector optimally and uses the best possible photodetector, these phase changes can be measured with a photon shot-noise-limited precision of ∼ $1/\sqrt{N}$, where $N \sim (I_L/\hbar\omega)(1/\omega_{\rm gw})$ is the number of photons put into the detector by the laser during half a gravitational-wave period.⁶ By combining this with Eq. (8.42) we see that the weakest wave that can be detected is

$$h \sim \left(\frac{\hbar\,\omega_{\rm gw}^3}{\omega\,I_L}\right)^{1/2}. \qquad (8.43)$$

For a laser power I_L ∼ 5 W, ω_gw ∼ 10³ s⁻¹, and ω ∼ 3 × 10¹⁵ s⁻¹, this gravitational-wave sensitivity is h ∼ 3 × 10⁻²¹. When operated in this manner, about 99 per cent of the light returns toward the laser from the beam splitter, and the other 1 per cent goes out the end mirror or into the photodetector or gets absorbed or scattered due to imperfections in the optics. The 99 per cent returning toward the laser can be recycled back into the interferometer, in phase with new laser light, by placing a mirror between the laser and the beam splitter. This "recycling mirror" (shown dashed in Fig. 8.11) makes the entire optical system into a big optical resonator with two subresonators (the arms' Fabry-Perot cavities), and the practical result is a 30-fold increase in the input light power, from 5 W to 150 W, and an optical power in each arm of 100 kW. When operated in this manner, the interferometer can achieve a sensitivity h ∼ 3 × 10⁻²², which is in the range expected for the waves from colliding neutron stars, black holes, and other astrophysical sources; see Chap. 26. For a more accurate analysis of the sensitivity, see Exs. 8.14 and 8.15. This estimate of sensitivity is actually the rms noise in a bandwidth equal to frequency at the minimum of LIGO's noise curve. Figure 5.4 in Chap.
5 shows the noise curve as the square root of the spectral density of the measured arm-length separation, $\sqrt{S_x(f)}$, or in the notation of this chapter, $\sqrt{S_d(f)}$. Since the waves produce a change of d given by δd ∼ hd, the corresponding noise-induced fluctuations in the measured h have $S_h \sim S_d/d^2$, and the rms noise fluctuations in a bandwidth equal to frequency f are $h_{\rm rms} \sim \sqrt{S_h f} \sim (1/d)\sqrt{S_d f}$. Inserting $\sqrt{S_d} \simeq 10^{-19}\ {\rm m\,Hz^{-1/2}}$ and f ≃ 100 Hz from Fig. 5.4, and d = 4 km for the LIGO arm length, we obtain h_rms ∼ 3 × 10⁻²², in accord with the above estimate. There are enormous obstacles to achieving such high sensitivity. To name just a few: Imperfections in the optics will absorb some of the high light power, heating the mirrors and beam splitter and causing them to deform. Even without such heating, the mirrors and beam splitter must be exceedingly smooth and nearly perfectly shaped to minimize the scattering of light from them. Thermal noise in the mirrors and their suspensions (described by the fluctuation-dissipation theorem) will cause the mirrors to move in manners that simulate the effects of a gravitational wave, as will seismic- and acoustic-induced vibrations of the mirror suspensions. LIGO's arms must be long (4 km) in order to minimize the effects of these noises. While photon shot noise dominates near and above the noise curve's minimum, f ≳ 100 Hz, these and other noises dominate at lower frequencies. The initial LIGO interferometers operated at their design sensitivity from autumn 2005 to autumn 2007, carrying out a 2-year-long gravitational-wave search, much of the time in collaboration with international partners (the French-Italian VIRGO and British/German GEO600 interferometers). LIGO's interferometers are being upgraded by a factor ∼ 15 in amplitude sensitivity ("advanced LIGO"), with searches near the new design sensitivity likely to begin around 2014. The initial LIGO design sensitivity is at a level ∼ 10⁻²¹ where detection is plausible but not highly likely; advanced LIGO will be at a level h ∼ 10⁻²² where it will be surprising if a number of gravitational-wave sources are not detected.

⁶ This measurement accuracy is related to the Poisson distribution of the photons entering the interferometer's two arms: if N is the mean number of photons during a half gravitational-wave period, then the variance is N, and the fractional fluctuation is $1/\sqrt{N}$. The interferometer's shot noise is actually caused by a beating of quantum electrodynamical vacuum fluctuations against the laser's light; for details see Caves (1980).

**************************** EXERCISES Exercise 8.14 Derivation and Problem: Phase Shift in LIGO Arm Cavity (a) For the interferometric gravitational wave detector depicted in Fig. 8.11 (with the arms' input mirrors having amplitude reflectivities r_i close to unity and the end mirrors perfectly reflecting), analyze the light propagation in cavity 1 by the same techniques as were used for an etalon in Sec. 8.4.1. Show that, if ψ_i1 is the light field impinging on the input mirror, then the total reflected light field ψ_r1 is

$$\psi_{r1} = e^{i\phi_1}\,\frac{1 - r_i e^{-i\phi_1}}{1 - r_i e^{i\phi_1}}\,\psi_{i1}, \qquad \text{where}\quad \phi_1 = 2kd_1. \qquad (8.44a)$$

(b) From this, infer that the reflected flux |ψ_r1|² is identical to the cavity's input flux |ψ_i1|², as it must be since no light can emerge through the perfectly reflecting end mirror. (c) The arm cavity is operated on resonance, so ϕ1 is an integer multiple of 2π. From Eq. (8.44a) infer that (up to fractional errors of order 1 − r_i) a change δd1 in the length of cavity 1 produces a change

$$\delta\phi_{r1} = \frac{8k\,\delta d_1}{1 - r_i^2}. \qquad (8.44b)$$

With slightly different notation, this is Eq. (8.39), which we used in the text's order-of-magnitude analysis of LIGO's sensitivity. In this exercise and the next, we will carry out a more precise analysis. Exercise 8.15 Example: Photon Shot Noise in LIGO (a) Continuing the preceding exercise: Denote by ψ_L the light field from the laser that impinges on the beam splitter and gets split in two, with half going into each arm. Using the above equations, infer that the light field returning to the beam splitter from arm 1 is $\psi_{s1} = \frac{1}{\sqrt{2}}\,\psi_L\,e^{i\phi_1}(1 + i\,\delta\phi_{r1})$, where ϕ1 is some net accumulated phase that depends on the separation between the beam splitter and the input mirror of arm 1.

(b) Using the same formula for the field ψ_s2 from arm 2, and assuming that the phase changes between beam splitter and input mirror are almost the same in the two arms, so ϕ_o ≡ ϕ1 − ϕ2 is small compared to unity (mod 2π), show that the light field that emerges from the beam splitter, traveling toward the photodetector, is

$$\psi_{\rm pd} = \frac{1}{\sqrt{2}}(\psi_{s1} - \psi_{s2}) = \frac{i}{2}(\phi_o + \delta\phi_{r1} - \delta\phi_{r2})\,\psi_L \qquad (8.45a)$$

to first order in the small phases. Show that the condition |ϕ_o| ≪ 1 corresponds to the experimenters' having adjusted the positions of the input mirrors in such a way that almost all of the light returns toward the laser and only a small fraction goes toward the photodetector. (c) For simplicity, let the gravitational wave travel through the interferometer from directly overhead and have an optimally oriented polarization. Then, as we shall see in Chap. 26, the dimensionless gravitational-wave field h(t) produces the arm-length changes δd1 = −δd2 = ½h(t)d, where d is the unperturbed arm length. Show, then, that the field traveling toward the photodetector is

$$\psi_{\rm pd} = \frac{i}{2}(\phi_o + \delta\phi_{\rm gw})\,\psi_L, \qquad \text{where}\quad \delta\phi_{\rm gw} = \frac{16\pi d/\lambda}{1 - r_i^2}\,h(t) = \frac{8kd}{1 - r_i^2}\,h(t). \qquad (8.45b)$$

The experimenter adjusts ϕ_o so it is large compared to the tiny δϕ_gw. (d) Actually, this equation has been derived assuming, when analyzing the arm cavities [Eq. (8.44a)], that the arm lengths are static. Explain why it should still be nearly valid when the gravitational waves are moving the mirrors, so long as the gravitational-wave half period 1/(2f) = π/ω_gw is somewhat longer than the mean time that a photon is stored inside an arm cavity, i.e., so long as f ≲ f_o, where

$$f_o \equiv \frac{1 - r_i^2}{4\pi}\,\frac{c}{2d}. \qquad (8.46)$$

Assume that this is so. (e) Show that, if I_L is the laser power impinging on the beam splitter (proportional to |ψ_L|²), then the steady-state light power going toward the photodetector is I_pd = (ϕ_o/2)² I_L, and the time variation in that light power due to the gravitational wave (the gravitational-wave signal) is

$$I_{\rm gw}(t) = \sqrt{I_L I_{\rm pd}}\;\frac{16\pi d/\lambda}{1 - r_i^2}\,h(t). \qquad (8.47a)$$

The photodetector monitors these changes I_gw(t) in the light power I_pd and from them infers the gravitational-wave field h(t). This is called a "DC" or "homodyne" readout system; it works by beating the gravitational-wave signal field (∝ δϕ_gw) against the steady light field ("local oscillator", ∝ ϕ_o) to produce the signal light power I_gw(t) ∝ h(t).

(f) Shot noise in the interferometer's output light power I_pd gives rise to noise in the measured gravitational-wave field h(t). From Eq. (8.47a) show that the spectral density of the noise in the measured h(t) is

$$S_h(f) = \left[\frac{(1 - r_i^2)\lambda}{16\pi d}\right]^2 \frac{S_{I_{\rm pd}}}{I_L\,I_{\rm pd}}. \qquad (8.47b)$$

(g) The light power I_pd impinging on the photodiode is carried by individual photons, each of which has an energy ℏω; the average arrival rate of photons is R = I_pd/ℏω. Explain why photon j brings a power I_j(t) = ℏω J_j(t − t_j), where J_j(τ) is the shape of the photon's wave packet and its time integral is unity. From the analysis of shot noise in Sec. 5.5.3, and assuming (as is surely true) that the durations of the photon wave packets are very short compared to 1/f ∼ 0.01 s, show that the randomness in the arrival times of the photons produces fluctuations in I_pd with the white-noise spectral density

$$S_{I_{\rm pd}}(f) = 2R(\hbar\omega)^2 = 2 I_{\rm pd}\,\hbar\omega. \qquad (8.48)$$

Combining with Eq. (8.47b), infer your final formula for the spectral density of the noise in the inferred gravitational-wave signal,

$$S_h(f) = \left[\frac{(1 - r_i^2)\lambda}{16\pi d}\right]^2 \frac{2}{I_L/\hbar\omega}; \qquad (8.49a)$$

and from this infer the rms noise in a bandwidth equal to frequency,

$$h_{\rm rms} = \sqrt{f S_h} = \frac{(1 - r_i^2)\lambda}{16\pi d\,\sqrt{N}}, \qquad \text{where}\quad N = \frac{I_L}{\hbar\omega}\,\frac{1}{2f} \qquad (8.49b)$$

is the number of photons that impinge on the beam splitter, from the laser, in half a gravitational-wave period. (h) In the next exercise we shall derive (as a challenge) the modification to the spectral density that arises at frequencies f ≳ f_o. The signal strength that gets through the interferometer is reduced because the arm length is increasing, then decreasing, then increasing again, ... while the typical photon is in an arm cavity. The result of the analysis is an increase of S_h(f) by a factor 1 + (f/f_o)², so

$$S_h(f) = \left[\frac{(1 - r_i^2)\lambda}{16\pi d}\right]^2 \frac{2}{I_L/\hbar\omega}\left(1 + \frac{f^2}{f_o^2}\right). \qquad (8.50)$$

Compare this with the measured noise, at frequencies above f_o = 100 Hz, in the initial-LIGO detectors (Fig. 5.4 with x = hd), using the initial-LIGO parameters λ = 1.06 µm, ω = 2πc/λ ≃ 2 × 10¹⁵ s⁻¹, d = 4 km, I_L = 150 W, and 1 − r_i² = 1/30. It should agree rather well with the measured noise.

Fig. 8.12: Hanbury-Brown and Twiss Intensity Interferometer. [Figure labels: Hg line source; two photodetectors, each producing a current I; correlator; intensity fringes.]

Exercise 8.16 Challenge: LIGO Shot Noise at f ≳ f_o Derive the factor 1 + (f/f_o)² by which the spectral density of the shot noise is increased at frequencies f ≳ f_o. [Hint: Redo the analysis of the arm-cavity fields, part (a) of Ex. 8.14, using an arm length that varies sinusoidally at frequency f due to a sinusoidal gravitational wave, and then use the techniques of Sec. 5.5.1 to deduce S_h(f).] ****************************
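A numerical sketch for Ex. 8.15: it evaluates f_o from Eq. (8.46) and the shot-noise spectral density of Eq. (8.50) with the initial-LIGO parameters listed in part (h), then checks Eq. (8.48) by simulating photon arrivals with exponentially distributed gaps (a Poisson process). The photon rate R, bin length, and run duration in the simulation are arbitrary illustrative choices, far from real LIGO values, and the relation Var(bin average) = S/(2Δt) for white noise with one-sided spectral density S is a standard result assumed here rather than taken from the text.

```python
import math

import numpy as np

HBAR = 1.0546e-34     # J s
C_LIGHT = 2.998e8     # m/s

# Initial-LIGO parameters quoted in Ex. 8.15(h)
wavelength = 1.06e-6  # m
d = 4.0e3             # arm length, m
I_L = 150.0           # laser power on the beam splitter (after recycling), W
T_i = 1.0 / 30.0      # input-mirror transmissivity 1 - r_i^2

omega = 2 * math.pi * C_LIGHT / wavelength
f_o = T_i * C_LIGHT / (8 * math.pi * d)  # Eq. (8.46)


def S_h(f):
    """Shot-noise spectral density of h from Eq. (8.50), in 1/Hz."""
    prefactor = (T_i * wavelength / (16 * math.pi * d)) ** 2
    return prefactor * 2.0 / (I_L / (HBAR * omega)) * (1 + (f / f_o) ** 2)


print(f"f_o = {f_o:.1f} Hz")
for f in (100.0, 300.0, 1000.0):
    print(f"f = {f:6.0f} Hz: sqrt(S_h) = {math.sqrt(S_h(f)):.2e} /sqrt(Hz), "
          f"d*sqrt(S_h) = {d * math.sqrt(S_h(f)):.2e} m/sqrt(Hz)")

# Monte Carlo check of Eq. (8.48) with illustrative (not LIGO) numbers.
rng = np.random.default_rng(0)
R = 1.0e4          # photon arrival rate, 1/s
T_total = 200.0    # simulated duration, s
dt = 1.0e-3        # averaging time per bin, s
gaps = rng.exponential(1.0 / R, size=int(1.1 * R * T_total))
arrivals = np.cumsum(gaps)
arrivals = arrivals[arrivals < T_total]
counts, _ = np.histogram(arrivals, bins=int(round(T_total / dt)), range=(0.0, T_total))
power = HBAR * omega * counts / dt        # bin-averaged light power
S_I_estimate = 2 * dt * power.var()       # white noise: Var(bin average) = S / (2 dt)
S_I_theory = 2 * R * (HBAR * omega) ** 2  # Eq. (8.48) with I_pd = R * hbar * omega
print(f"S_I simulated / theory = {S_I_estimate / S_I_theory:.3f}")
```

Multiplying √S_h by d converts it to the arm-length noise plotted in Fig. 5.4, which makes the comparison asked for in part (h) direct.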

8.6

T2 Intensity Correlation and Photon Statistics.

A type of interferometer that is rather different from those studied above was proposed and constructed by Hanbury-Brown and Twiss. In this interferometer, the intensities rather than the amplitudes of the radiation are combined to measure the degree of coherence of the radiation field. In their original experiment, Hanbury-Brown and Twiss divided light from an incandescent mercury lamp and sent it along two paths of different length before detecting photons in each beam separately using a photodetector; see Fig. 8.12. The electrical output from each photodetector measures the rate of arrival of photons from its beam, I(t), which we can write as K|Ψ|², where K is a constant. I exhibits fluctuations δI about its mean value $\bar I$, and it was found that the fluctuations in the two beams were correlated. How can this be? The light that was detected originated from many random and independent emitters and therefore obeys Gaussian statistics, according to the central limit theorem (Chap. 5). This turns out to mean that the fourth-order correlations of the wave field Ψ with itself can be expressed in terms of the second-order correlations, which means in terms of the degree of coherence. More specifically: Continuing to treat the wave field Ψ as a scalar, we can express the intensity (ℜΨ)² in terms of a sum over a set of Fourier components Ψ_j with precise frequencies ω_j and slowly wandering, complex amplitudes. By (i) writing $I(t) = (\sum_j \Re\Psi_j)^2$, (ii) forming the product I(t)I(t + τ), (iii) keeping only terms that will have nonzero averages by virtue of containing products of the form $e^{+i\omega_j t}e^{-i\omega_j t}e^{+i\omega_k t}e^{-i\omega_k t}$ (where j and k are generally not the same), and then averaging over time, we obtain

$$\overline{I(t)I(t+\tau)} = K^2\,\overline{\Psi(t)\Psi^*(t)}\;\overline{\Psi(t+\tau)\Psi^*(t+\tau)} + K^2\,\overline{\Psi(t)\Psi^*(t+\tau)}\;\overline{\Psi^*(t)\Psi(t+\tau)} = \bar I^2\,[1 + |\gamma_\parallel(\tau)|^2]. \qquad (8.51)$$

If we now measure the relative fluctuations, we find that

$$\frac{\overline{\delta I(t)\,\delta I(t+\tau)}}{\bar I^2} = \frac{\overline{I(t)I(t+\tau)} - \bar I^2}{\bar I^2} = |\gamma_\parallel(\tau)|^2. \qquad (8.52)$$
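Eq. (8.52) can be checked numerically. The sketch below builds a chaotic (Gaussian) field by smoothing circular complex white noise with a boxcar, which mimics light from many independent emitters; the boxcar shape and its length of 50 samples (which sets the coherence time) are arbitrary assumptions. It then compares the normalized intensity covariance with |γ∥(τ)|² computed from the same field. At τ = 0 both sides should be close to 1, i.e., the mean of I² is about twice the square of the mean intensity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2**20
corr_len = 50  # boxcar length in samples (assumed; sets the coherence time)

# Circular complex Gaussian white noise, smoothed by a boxcar: still Gaussian,
# but now correlated over ~corr_len samples, like light from many emitters.
white = rng.normal(size=n) + 1j * rng.normal(size=n)
kernel = np.ones(corr_len) / np.sqrt(corr_len)
psi = np.convolve(white, kernel, mode="same")

intensity = np.abs(psi) ** 2   # I = K |Psi|^2 with K = 1
I_mean = intensity.mean()
dI = intensity - I_mean


def gamma_par(lag):
    """Degree of temporal coherence: <Psi(t) Psi*(t+lag)> / <|Psi|^2>."""
    num = np.mean(psi[: n - lag] * np.conj(psi[lag:]))
    return num / np.mean(np.abs(psi) ** 2)


def intensity_corr(lag):
    """Left side of Eq. (8.52): <dI(t) dI(t+lag)> / I_mean^2."""
    return np.mean(dI[: n - lag] * dI[lag:]) / I_mean**2


for lag in (0, 10, 25, 40, 60):
    print(f"lag {lag:3d}: intensity corr = {intensity_corr(lag):.3f}, "
          f"|gamma|^2 = {abs(gamma_par(lag)) ** 2:.3f}")
```

For a boxcar of length L the field correlation falls linearly to zero at a lag of L samples, so both columns should track ((L − lag)/L)² and vanish beyond lag 50, up to statistical scatter of a few per cent.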

[Note: This analysis is only correct if the radiation comes from many uncorrelated sources (the many independently emitting mercury atoms in Fig. 8.12) and therefore has Gaussian statistics.] Equation (8.52) tells us that the fluxes as well as the amplitudes of coherent radiation should exhibit positive longitudinal correlation, and that the degree of coherence for the fluxes is equal to the squared modulus of the degree of coherence for the amplitudes. Although this result was rather controversial at the time the experiments were first performed, it is easy to interpret qualitatively if we think in terms of photons rather than classical waves. Photons are bosons and are therefore positively correlated even in thermal equilibrium; cf. Chaps. 2 and 3. When they arrive at the beam splitter, they clump more than would be expected for a random distribution of classical particles. In fact, treating the problem from the point of view of photon statistics gives an answer equivalent to Eq. (8.52). Some practical considerations should be mentioned. The first is that our result, Eq. (8.52), derived for a scalar wave, is really only valid for completely polarized radiation. If the incident radiation is unpolarized, then the intensity fluctuations are reduced by a factor two. The second point is that the photon counts are actually averaged over longer times than the correlation time of the incident radiation. This reduces the magnitude of the measured effect further. Nevertheless, after successfully measuring temporal intensity correlations, Hanbury-Brown and Twiss constructed a Stellar Interferometer with which they were able to measure the angular diameters of bright stars. This method had the advantage that it did not depend upon the phase of the incident radiation, so the results were insensitive to atmospheric fluctuations, one of the drawbacks of the Michelson Stellar Interferometer. Indeed, it is not even necessary to use accurately ground mirrors to measure the effect.
The method has the disadvantage that it can only measure the modulus of the degree of coherence; the phase is lost. **************************** EXERCISES Exercise 8.17 Derivation: Intensity Correlations By expressing the field as either a Fourier sum or a Fourier integral, complete the argument outlined in Eq. (8.51). Exercise 8.18 Problem: Electron Intensity Interferometry Is it possible to construct an intensity interferometer to measure the coherence properties of an electron source? What qualitative differences do you expect there to be from a photon intensity interferometer? ****************************

Box 8.2
Important Concepts in Chapter 8
• Interference Fringes – Sec. 8.2.1 and Fig. 8.1
• Incoherent radiation – Eqs. (8.4) and (8.5)
• Degrees of Coherence and Fringe Visibility
  – Degree of lateral coherence (complex fringe visibility) for nearly monochromatic radiation, γ⊥ – Eqs. (8.6a), (8.10) and (8.12); and discussion after Eq. (8.8)
  – Visibility for lateral coherence: V = |γ⊥| – Eq. (8.8)
  – Degree of temporal (or longitudinal) coherence for nearly monochromatic radiation – Eq. (8.15)
  – Degree of temporal coherence for broad-band radiation – Eq. (8.16)
  – Three-dimensional degree of coherence – Sec. 8.2.8
• Coherence lengths and times – Eqs. (8.11), (8.17) and associated discussions, and passage following Eq. (8.21)
• van Cittert-Zernike Theorem relating degree of coherence to angular distribution and/or spectrum of the source
  – For lateral coherence – Eqs. (8.7) and (8.13)
  – For temporal coherence of broad-band radiation – Eqs. (8.18)
  – Three-dimensional (lateral and longitudinal together) – Eqs. (8.22)
  – Relationship to Wiener-Khintchine theorem – Ex. (8.7b)
• Michelson interferometer and Fourier-transform spectroscopy – Fig. 8.3, Sec. 8.2.7
• Complex random processes – Ex. 8.7
• Radio Telescope: How one constructs images of the source, and what determines its angular resolution – Sec. 8.3
• Amplitude reflection and transmission coefficients – Eq. (8.30a)
• Reciprocity relations for reflection and transmission coefficients – Eqs. (8.32), Ex. 8.10
• Etalon and Fabry-Perot interferometer – Secs. 8.4.1 and 8.4.2
  – Finesse and its influence on half-width of resonance and phase shift across resonance – Eqs. (8.34) and associated discussion
  – Free spectral range – passage following Eq. (8.34c)
  – Spectrometer based on a Fabry-Perot interferometer; its resolving power – Sec. 8.4.2
• High reflectivity coatings and anti-reflection coatings constructed from alternating dielectric layers – Exs. 8.10 (first paragraph) and 8.12
• Sagnac interferometer – Ex. 8.13
• Laser interferometer gravitational-wave detector, and how it works – Sec. 8.5
• Intensity correlations – Sec. 8.6


Bibliographic Note For pedagogical introductions to interference and coherence with greater detail than this chapter, see Hecht (1998), and Klein & Furtak (1986). For more advanced treatments, see Francon (1966) and Goodman (1968).

Bibliography

Caves, C. M. 1980. Quantum-mechanical radiation pressure fluctuations in an interferometer. Physical Review Letters 45, 75.
Feynman, R. P., Leighton, R. B., & Sands, M. 1965. The Feynman Lectures on Physics. New York: Addison Wesley.
Francon, M. 1966. Optical Interferometry. New York: Academic Press.
Goodman, J. W. 1968. Introduction to Fourier Optics. New York: McGraw-Hill.
Hecht, E. 1998. Optics. New York: Addison Wesley.
Klein, M. V. & Furtak, T. E. 1986. Optics. New York: Wiley.
Mather, J. C., et al. 1994. Measurement of the cosmic microwave background spectrum by the COBE FIRAS instrument. Astrophysical Journal 420, 439.

Contents
9 Nonlinear Optics
9.1 Overview
9.2 Lasers
  9.2.1 Basic Principles of the Laser
  9.2.2 T2 Types of Pumping and Types of Lasers
9.3 Holography
  9.3.1 Recording a Hologram
  9.3.2 Reconstructing the 3-Dimensional Image from a Hologram
9.4 Phase-Conjugate Optics
9.5 Wave-Wave Mixing in Nonlinear Crystals
  9.5.1 Maxwell's Equations and Nonlinear Dielectric Susceptibilities
  9.5.2 Wave-Wave Mixing; Resonance Conditions for 3-Wave Mixing
  9.5.3 Three-Wave Mixing: Growth Equation for an Idealized, Dispersion-Free, Isotropic Medium
  9.5.4 Three-Wave Mixing: Resonance Conditions and Growth Equation for an Anisotropic, Axisymmetric Medium; Frequency Doubling
9.6 Applications of Wave-Wave Mixing: Frequency Doubling, Phase Conjugation, and Squeezing
  9.6.1 Frequency Doubling
  9.6.2 Phase Conjugation
  9.6.3 Squeezed Light
9.7 Other Methods to Produce Wave-Wave Mixing
  9.7.1 Photorefractive Effect
  9.7.2 Ponderomotive Squeezing by a Suspended Mirror

Chapter 9
Nonlinear Optics
Version 0809.1.K, 26 Nov 2008
Please send comments, suggestions, and errata via email to [email protected] and [email protected], or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 9.1
Reader's Guide
• This chapter depends substantially on Secs. 6.2, 6.3 and 6.6.1 of Chap. 6, Geometric Optics.
• Sec. 9.5, on wave-wave mixing, is an important foundation for Chap. 22 on the nonlinear dynamics of plasmas, and (to a lesser extent) for the discussions of solitary waves (solitons) in Chaps. 15 and 20.
• Nothing else in this book relies substantially on this chapter.

9.1

Overview

Communication technology is undergoing a revolution, and computer technology may do so soon, in which the key devices used (e.g., switches and communication lines) are changing from radio and microwave frequency devices to optical frequencies. This revolution has been made possible by the invention and development of lasers (most especially semiconductor diode lasers) and other technology developments such as dielectric crystals whose polarization P_i is a nonlinear function of the applied electric field, $P_i = \epsilon_0(\chi_{ij}E_j + \chi_{ijk}E_jE_k + \chi_{ijkl}E_jE_kE_l + \cdots)$. In this chapter we shall study lasers, nonlinear crystals, and various nonlinear-optics applications that are based on them.

Most courses in elementary physics idealize the world as linear. From the simple harmonic oscillator to Maxwell's equations to the Schrödinger equation, almost all the elementary physical laws one studies are linear, and almost all the applications one studies make use of this linearity. In the real world, however, nonlinearities abound, creating such phenomena as avalanches, breaking ocean waves, holograms, optical switches, and neural networks; and in the past three decades nonlinearities and their applications have become major themes in physics research, both basic and applied. This chapter, with its exploration of nonlinear effects in optics, serves as a first introduction to some fundamental nonlinear phenomena and their present and future applications. In later chapters we shall revisit some of these phenomena and shall meet others, in the context of fluids (Chaps. 14 and 15), plasmas (Chap. 22), and spacetime curvature (Chaps. 24, 25, 26).

Since highly coherent and monochromatic laser light is one of the key foundations on which modern nonlinear optics has been built, we shall begin in Sec. 9.2 with a review of the basic physics principles that underlie the laser: the pumping of an active medium to produce a molecular population inversion, and the stimulated emission of radiation from the inverted population of molecules. Then we shall describe the details of how a number of different lasers are pumped and the characteristics of the light they emit. Most important among these characteristics are high frequency stability and high power. In Sec. 9.3 we shall meet our first example of an application of nonlinear optics: holography. In holography a three-dimensional, monochromatic image of an object is produced by a two-step process: recording a hologram, and then passing coherent light through the hologram. Holography differs from more modern nonlinear-optics applications in not being a real-time process. Real-time processes have been made possible by nonlinear crystals. In Sec. 9.4 we study an example of a real-time, nonlinear-optics process: phase conjugation of light by a phase-conjugating mirror; and we see how such phase conjugation can be used to prevent distortion of images and signals carried in an optical fiber. In Sec. 9.5 we study the wave-wave mixing in nonlinear crystals that makes possible phase conjugation, frequency doubling and other nonlinear optical processes, and we analyze, as an important example, frequency doubling. In Sec. 9.6 we discuss several applications of wave-wave mixing: frequency doubling (practical aspects of the process analyzed in the preceding section), phase conjugation achieved via 4-wave mixing, and the generation of squeezed light.
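To see why the quadratic term in P_i matters, the sketch below uses a scalar toy version of the polarization expansion, P = ε0(χ1 E + χ2 E²), driven by a monochromatic field E = E0 cos ωt, and takes a discrete Fourier transform. The susceptibility values are arbitrary illustrative numbers, not material constants, and the tensor indices are dropped. The χ2 term puts power at zero frequency and at 2ω, which is the seed of the frequency doubling analyzed in Secs. 9.5 and 9.6.

```python
import numpy as np

eps0 = 8.854e-12
chi1, chi2 = 1.0, 0.1   # illustrative susceptibilities (chi2 in units of 1/E0)
E0 = 1.0
n_samples = 1024
cycles = 16             # whole number of cycles in the window, so no spectral leakage

t = np.arange(n_samples) / n_samples          # time in units of the window length
E = E0 * np.cos(2 * np.pi * cycles * t)
P = eps0 * (chi1 * E + chi2 * E**2)

# One-sided amplitude spectrum: index k corresponds to k cycles per window.
spectrum = np.abs(np.fft.rfft(P)) / n_samples
spectrum[1:] *= 2                              # fold in the negative frequencies
for k in np.nonzero(spectrum > 1e-6 * spectrum.max())[0]:
    print(f"{k:4d} cycles/window: amplitude {spectrum[k] / eps0:.3f} (units of eps0)")
```

Only three components survive: a static part and a 2ω part, each of amplitude χ2 E0²/2 (from cos² ωt = ½ + ½ cos 2ωt), plus the linear response at ω.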

9.2

Lasers

9.2.1

Basic Principles of the Laser

In quantum mechanics one identifies three different types of interaction of light with material systems (atoms, molecules, atomic nuclei, electrons, . . .): (i) Spontaneous emission, in which a material system in an excited state spontaneously drops into a state of lesser excitation and emits a photon in the process. (ii) Absorption, in which an incoming photon is absorbed by a material system, exciting it. (iii) Stimulated emission, in which a material system, initially in some excited state, is “tickled” by passing photons, and this tickling stimulates it to emit a photon of the same sort (in the same state) as the photons that tickled it.

As peculiar as stimulated emission may seem at first sight, it is in fact easily understood and analyzed classically. It is nothing but “negative absorption”: In classical physics, when a light beam with electric field $E = \Re[A e^{i(kz-\omega t+\phi)}]$ travels through an absorbing medium, its real amplitude $A$ decays exponentially with the distance propagated, $A \propto e^{-\mu z/2}$ (corresponding to an intensity decay $I \propto e^{-\mu z}$), while its frequency $\omega$, wave number $k$, and phase $\phi$ remain very nearly constant.

[Figure: two energy-level diagrams, (a) and (b), each showing a lower level |1⟩ with energy E1, an upper level |2⟩ with energy E2, and a photon of energy ħω = E2 − E1.]

Fig. 9.1: (a) Photon Absorption: A photon with energy ħω = E2 − E1 excites a molecule from its ground state, with energy E1, to an excited state with energy E2 (as depicted by an energy-level diagram). (b) Stimulated Emission: The molecule is initially in its excited state, and the incoming photon stimulates it to deexcite into its ground state, emitting a photon identical to the incoming one.

For normal materials, the absorption rate $\mu = -I^{-1}\,dI/dz$ is positive, and the energy lost goes ultimately into heat. However, one can imagine a material with an internally stored energy that amplifies a passing light beam. Such a material would have a negative absorption rate, $\mu < 0$, and correspondingly the amplitude of the passing light would grow with the distance traveled, $A \propto e^{+|\mu| z/2}$, while its frequency, wave number, and phase would remain constant. Such materials do exist; they are called “active media,” and their amplification of passing waves is called “stimulated emission.” This elementary, classical description of stimulated emission is equivalent to the quantum mechanical description in the domain where the stimulated emission is strong: the domain of large photon occupation numbers (which, as we learned in Chaps. 2 and 3, is the domain of classical waves).

The classical description of stimulated emission takes for granted the existence of an active medium. To understand the nature of such a medium, we must turn to quantum mechanics. As a first step toward such understanding, consider a beam of monochromatic light with frequency ω that impinges on a collection of molecules (or atoms or charged particles) that are all in the same quantum mechanical state |1⟩. Suppose the molecules have a second state |2⟩ with energy E2 = E1 + ħω. Then the light will resonantly excite the molecules from their initial state |1⟩ to the higher state |2⟩, and in the process photons will be absorbed [Fig. 9.1(a)]. The strength of the interaction is proportional to the beam’s energy flux F (which we shall call the “intensity” for short in this chapter¹). Stated more precisely, the rate of absorption of photons is proportional to the number flux of photons in the beam, $dn/dA\,dt = F/\hbar\omega$, in accord with the classical description of absorption.
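The sign convention for the absorption rate can be made concrete numerically. The minimal sketch below uses arbitrary illustrative values μ = ±0.5 (in inverse length units) and unit initial intensity, which are assumptions rather than data from the text. It generates $I(z) = I_o e^{-\mu z}$, recovers μ as $-d\ln I/dz$ from a least-squares fit, and checks that the amplitude law $A \propto e^{-\mu z/2}$ squares to the intensity law; a negative fitted μ signals an active medium.

```python
import math

def intensity(z, mu, i0=1.0):
    """Intensity after distance z in a medium with absorption rate mu: I = I0 exp(-mu z)."""
    return i0 * math.exp(-mu * z)

def amplitude(z, mu, a0=1.0):
    """Real field amplitude A = A0 exp(-mu z / 2), so that I is proportional to A**2."""
    return a0 * math.exp(-mu * z / 2)

def estimate_mu(zs, intensities):
    """Estimate mu = -(1/I) dI/dz = -d(ln I)/dz from a least-squares line fit of ln I vs z."""
    n = len(zs)
    logs = [math.log(i) for i in intensities]
    zbar = sum(zs) / n
    lbar = sum(logs) / n
    slope = (sum((z - zbar) * (lg - lbar) for z, lg in zip(zs, logs))
             / sum((z - zbar) ** 2 for z in zs))
    return -slope

zs = [0.1 * j for j in range(11)]
absorbing = [intensity(z, mu=0.5) for z in zs]   # ordinary absorbing medium
active = [intensity(z, mu=-0.5) for z in zs]     # "negative absorption": an active medium
print(round(estimate_mu(zs, absorbing), 6))      # 0.5
print(round(estimate_mu(zs, active), 6))         # -0.5
print(math.isclose(amplitude(2.0, 0.5) ** 2, intensity(2.0, 0.5)))  # True
```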
Suppose, now, that when the light beam first arrives, the atoms are all in the higher state |2⟩ rather than the lower state |1⟩. There will still be a resonant interaction, but this time the interaction will deexcite the atoms, with an accompanying emission of photons [Fig. 9.1(b)]. As in the absorption case, the strength of the interaction is proportional to the intensity of the incoming beam; i.e., the rate of emission of new photons is proportional to the number flux of photons that the beam already has. A quantum mechanical analysis shows that the photons from this stimulated emission come out in the same quantum state as is occupied by the photons of the incoming beam (Bose-Einstein statistics: photons, being bosons, like to congregate in the same state). Correspondingly, when viewed classically, the beam’s intensity will be amplified at a rate proportional to its initial intensity, with no change of its frequency, wave number, or phase.

¹ This is the same terminology as in Chap. 7, but not as in Chap. 8, where “intensity” was reserved for energy flux per unit solid angle.

[Figure: energy-level diagram with a ground state, a band of absorption states, a metastable state |2⟩, and a lower state |1⟩; arrows mark the pump transition (ground state to absorption states), fast decay (absorption states to |2⟩), the laser transition (|2⟩ to |1⟩), and fast decay (|1⟩ to the ground state).]

Fig. 9.2: The mechanism for creating the population inversion that underlies laser action. The horizontal lines and band represent energy levels of a molecule, and the arrows represent transitions in which the molecules are excited by pumping or decay by emission of photons.

In Nature molecules usually have their energy levels populated in accord with the laws of statistical (thermodynamic) equilibrium. Such thermalized populations, as we saw in Chap. 3, entail a ratio $N_2/N_1 = \exp[-(E_2-E_1)/k_B T] < 1$ for the number $N_2$ of molecules in state |2⟩ to the number $N_1$ in state |1⟩. Here T is the molecular temperature, and for simplicity it is assumed that the states are nondegenerate. Since there are more molecules in the lower state |1⟩ than in the higher one |2⟩, an incoming light beam will experience more absorption than stimulated emission. On the other hand, occasionally in Nature and often in the laboratory, a collection of molecules develops a “population inversion” in which $N_2 > N_1$. The two states can then be thought of as having a negative temperature with respect to each other. Light propagating through population-inverted molecules will experience more stimulated emission than absorption; i.e., it will be amplified. The result is “light amplification by stimulated emission,” or “laser” action.

This basic principle underlying the laser has been known since the early years of quantum mechanics, but only in the 1950s did physicists succeed in designing, constructing, and operating real lasers. The first proposals for practical devices were made, independently, in the U.S. by Weber (1953) and Gordon, Zeiger, and Townes (1954), and in Russia by Basov and Prokhorov (1954, 1955).
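To see how far thermal equilibrium is from a population inversion at optical frequencies, one can evaluate the Boltzmann ratio directly. The sketch below assumes nondegenerate levels (as the text does) and, for illustration, a transition at the 0.6328-micron helium-neon wavelength mentioned in Sec. 9.2.2 and a temperature of 300 K. It also inverts the Boltzmann law to find the "temperature" that a given ratio $N_2/N_1$ would imply; for an inverted pair of levels that temperature is negative.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
KB = 1.380649e-23        # Boltzmann constant, J/K
C = 2.99792458e8         # speed of light, m/s

def boltzmann_ratio(delta_e, temperature):
    """N2/N1 = exp[-(E2 - E1)/(kB T)] for nondegenerate levels in thermal equilibrium."""
    return math.exp(-delta_e / (KB * temperature))

def pair_temperature(delta_e, n2_over_n1):
    """Temperature implied by a ratio N2/N1; negative when N2 > N1 (population inversion)."""
    return -delta_e / (KB * math.log(n2_over_n1))

wavelength = 632.8e-9                      # assumed transition wavelength, m
delta_e = HBAR * 2 * math.pi * C / wavelength   # E2 - E1 = hbar * omega, about 2 eV
print(boltzmann_ratio(delta_e, 300.0))     # of order 1e-33: |2> essentially empty
print(pair_temperature(delta_e, 2.0))      # roughly -3.3e4 K for N2 = 2 N1
```

The tiny equilibrium ratio is why an optical population inversion has to be manufactured by a nonequilibrium pump, as described next.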
The first successful construction and operation of a laser was by Gordon, Zeiger, and Townes (1954, 1955), and soon thereafter by Basov and Prokhorov [KIP CHECK], though these first lasers actually used radiation not at optical frequencies but rather at microwave frequencies (based on a population inversion of ammonia molecules²) and thus were called masers. The first optical-frequency laser, one based on a population inversion of chromium ions in a ruby crystal, was constructed and operated by Maiman (1960).

² For the basic principles of the ammonia maser, see, e.g., Chap. 9 of Feynman, Leighton, and Sands (1965).

[Figure: an active medium inside a Fabry-Perot cavity, with escaping laser light emerging from the cavity.]

Fig. 9.3: The use of a Fabry-Perot cavity to enhance the interaction of the light in a laser with its active medium.

The key to laser action is the population inversion. Population inversions are incompatible with thermodynamic equilibrium; thus, to achieve them, one must manipulate the molecules in a nonequilibrium way. This is usually done by some concrete variant of the process shown in the energy-level diagram of Fig. 9.2. Some sort of pump mechanism (to be discussed in the next section) rapidly excites molecules from the ground state into some group of “absorption” states. The molecules then decay rapidly from the absorption states into the state |2⟩, which is metastable (i.e., has a long lifetime against spontaneous decay). The laser transition is from state |2⟩ into state |1⟩. Once a molecule has decayed into state |1⟩, it quickly decays on down to the ground state and then may be quickly pumped back up into the absorption states. If the pump acts suddenly and briefly, this process will produce a temporary population inversion of states |2⟩ and |1⟩, with which an incoming, weak burst of “seed” light can interact to produce a burst of amplification. The result is a pulsed laser. If the pump acts continually, the result may be a permanently maintained population inversion with which continuous seed light can interact to produce continuous-wave laser light.

As the laser beam travels through the active medium (the population-inverted molecules), its intensity I builds up with distance z as $dI/dz = I/\ell_o$, so $I(z) = I_o e^{z/\ell_o}$. Here $I_o$ is the initial intensity, and $\ell_o \equiv 1/|\mu|$, the e-folding length, depends on the strength of the population inversion and the strength of the coupling between the light and the active medium. Typically $\ell_o$ is so long that strong lasing action cannot be achieved by a single pass through the active medium. In this case, the lasing action is enhanced by placing the active medium inside a Fabry-Perot cavity (Fig. 9.3). The length L of the cavity is adjusted so the lasing transition frequency $\omega = (E_2-E_1)/\hbar$ is an eigenfrequency of the cavity.
The lasing action then excites a standing-wave mode of the cavity, from which the light leaks out through one or both cavity mirrors. If $\mathcal{F}$ is the cavity’s finesse [approximately the average number of times a photon bounces back and forth inside the cavity before escaping through a mirror; cf. Eq. (8.34b)], then the cavity increases the distance that typical photons travel through the active medium by a factor $\sim\mathcal{F}$, thereby increasing the intensity of the light output by a factor $\sim e^{\mathcal{F}L/\ell_o}$. In addition, oblique optical elements that transmit only a single polarization state are often added at the ends of the laser.

For an ideal laser (one, e.g., with a perfectly steady pump maintaining a perfectly steady population inversion that in turn maintains perfectly steady lasing), the light comes out in the most perfectly classical state that quantum mechanics allows. This state, called a quantum mechanical coherent state, has a perfectly sinusoidally oscillating electric field on which is superimposed the smallest amount of noise (the smallest wandering of phase and amplitude) allowed by quantum mechanics: the noise of quantum electrodynamical vacuum fluctuations. The value of the oscillations’ well-defined phase is determined by the phase of the seed field from which the coherent state was built up by lasing. Real lasers have additional noise due to a variety of practical factors, but nevertheless their outputs are usually highly coherent, with long coherence times.
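The benefit of the Fabry-Perot cavity described above can be illustrated by comparing the single-pass factor $e^{L/\ell_o}$ with the rough cavity factor $e^{\mathcal{F}L/\ell_o}$. The sketch below uses hypothetical values (a 0.3 m active medium, a 10 m e-folding length, and finesse 100) chosen only for illustration, and, like the order-of-magnitude estimate in the text, it ignores gain saturation, mirror losses, and mode structure.

```python
import math

def single_pass_gain(length, efold_length):
    """I_out / I_in = exp(L / l_o) for one pass through an active medium of length L."""
    return math.exp(length / efold_length)

def cavity_gain(length, efold_length, finesse):
    """Rough enhancement exp(F L / l_o): a typical photon makes ~F passes before escaping.
    Ignores gain saturation, mirror losses, and the detailed cavity mode structure."""
    return math.exp(finesse * length / efold_length)

# Hypothetical illustrative values (not from the text), in metres.
L, l_o, finesse = 0.3, 10.0, 100
print(f"single pass: {single_pass_gain(L, l_o):.3f}")         # 1.030
print(f"in cavity:   {cavity_gain(L, l_o, finesse):.1f}")      # 20.1
```

A 3% single-pass gain becomes a factor of about 20 once the photons are recycled roughly $\mathcal{F}$ times through the medium.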

9.2.2 T2 Types of Pumping and Types of Lasers

Lasers can be pumped radiatively, collisionally, chemically, electrically, and even by nuclear explosions; and each method of pumping produces a laser with special properties that have special uses. In this section we shall describe a few examples. [WARNING: THIS SECTION NEEDS UPDATING; IT IS OUT OF DATE BY ABOUT 15 YEARS.]

Radiative pumping: In radiative pumping a burst of “pump” photons excites the active medium’s molecules from their ground state to a group of absorption states. The pump photons are typically produced by a flash tube that surrounds the active medium, or whose light is focussed onto the active medium by mirrors. This was the type of pumping used by Maiman in his first ruby laser. The strongest pulsed lasers now available (neodymium glass lasers) use a variant of this called Q-switching. In Q-switching, the resonant interaction between the laser light and the active medium is temporarily turned off (e.g., by removing the Fabry-Perot cavity from around the active medium) while the medium is radiatively pumped. Thereby a very strong population inversion is built up, and when the resonance is turned back on, an enormous but very brief pulse of laser light is produced: as much as 100 J in a picosecond. Even shorter pulses, with durations ∼10–100 fs, can be produced. These enable investigations of fast chemical reactions, a discipline called “femtochemistry.”

Collisional pumping: The continuous-wave helium-neon laser uses collisional pumping. A mixture of helium and neon gas (roughly 10 helium atoms for each neon atom) is subjected to a continuous electrical discharge. The electrons in the discharge collide with the many helium atoms, exciting them into absorption states that then decay rapidly into a long-lived metastable state. The resulting population inversion of the helium atoms, however, is not used directly for laser action.
Rather, the many excited helium atoms collide with the fewer ground-state neon atoms, resonantly exciting them into a metastable neon state that has nearly the same energy as the excited helium state. The resulting population inversion of neon then acts as the laser’s active medium. [There actually are several metastable states of neon that get population-inverted in this way, and the helium-neon laser thereby can lase at several different wavelengths: 0.6328 microns (in the red), 1.15 and 3.39 microns (in the infrared), and others.]

Chemical pumping: In chemical pumping a nonequilibrium chemical reaction creates products in excited, metastable states that then lase. An example is the reaction H + F → HF, which leaves the hydrogen fluoride molecule in a metastable, lasing state.

Electrical pumping: In electrical pumping, electric fields and associated currents are used to produce population inversions. Two important examples are semiconductor diode lasers and free-electron lasers. In semiconductor diode lasers, the flows of electrons and holes,³ in response to an electrical bias, populate a portion of the conduction band and depopulate a portion of the valence band in a thin layer of a semiconductor (e.g., a 0.2-micron-thick layer of gallium arsenide that is sandwiched between one material that injects conduction-band electrons into the gallium arsenide and another that injects holes). A weak beam of light passing along the thin layer stimulates electrons to drop out of the conduction band into holes in the valence band, thereby emitting photons that amplify the light. The resulting continuous-wave laser is easily modulated at frequencies as high as tens of GHz by modulating the bias voltage. This and the diode’s tiny size make such lasers ideally suited for optical communication.

³ A “hole” is the absence of an electron in a “degenerate Fermi sea”; i.e., it is an empty single-particle state (mode) of the electron field, in a distribution function for which, up to some momentum, almost all the other electron states are occupied.

In the free-electron laser, a nearly monoenergetic beam of electrons, created by a particle accelerator, is sent through a static, transverse, spatially alternating magnetic field. The magnetic field is called an “undulator” because of its diffraction-grating-like undulations. The field’s alternating Lorentz force causes the moving electrons to oscillate back and forth transversely and radiate. These electron oscillations resonate with the light they emit: the light moves forward, relative to each electron, by one optical wavelength while the electron undergoes one oscillation. In this device the electrons’ population-inverted energy distribution (many high-speed electrons, fewer lower-speed electrons) is produced electrically, by the particle accelerator, and the photon emission drives the electrons from their initial, strongly populated states of high kinetic energy to more sparsely populated states of lower kinetic energy. In recent years, there has been a drive to push free-electron lasers into the X-ray band using high-current beams from particle accelerators such as DESY in Germany. These can produce picosecond pulses, which are useful for studying biological specimens.

Nuclear-explosion pumping: A device much ballyhooed in America during the reign of Ronald Reagan, but never built, was a futuristic, super-powerful X-ray laser pumped by a nuclear explosion. As part of Reagan’s Strategic Defense Initiative (“Star Wars”), this laser was supposed to shoot down Soviet missiles. In Ex. 9.1 the reader is invited to speculate about the design of such a laser.

Discussion: As the above examples show, lasers come in a wide variety of configurations, and the light they produce can have a wide variety of properties. Pulsed lasers can achieve very high instantaneous powers (100 J in a picosecond, corresponding to $10^{14}$ W). Continuous-wave lasers can also achieve large powers; for example, CO2 lasers putting out as much as $10^{9}$ W can have their light concentrated into regions with transverse dimensions as small as one wavelength (a micron, but no smaller because of diffraction), thereby yielding a local energy flux of $10^{21}$ W m$^{-2}$. Let us translate this energy flux into field amplitudes. The rms magnetic field strength in the wave is ∼3 kT (kilotesla), and the corresponding electric field is ∼1 TV m$^{-1}$. The electrical potential difference across a wavelength (∼1 µm) is then ∼1 MV, so an electron accelerated across one wavelength gains ∼1 MeV. It is then not surprising that high-power lasers are able to create electron-positron pair plasmas.

For many applications large power is irrelevant or undesirable, but high frequency stability (a long coherence time) is crucial. By locking the frequencies of lasers to optical cavities or to molecular transitions, one can suppress the wandering of the phase of the laser light and thereby achieve frequency stabilities as high as ∆f ∼ 1 mHz, corresponding to coherence times of ∼1000 s and coherence lengths of ∼$3\times10^{8}$ km.

When first invented, lasers were called “a solution looking for a problem.” Now they permeate everyday life as well as high technology. Examples are supermarket bar-code readers, laser pointers, CD players, eye surgery, laser printers, laser gyroscopes (which are now standard on commercial aircraft), laser-based surveying, Raman spectroscopy, laser fusion, optical communication, optically based computers, holography, maser amplifiers, and hydrogen-maser clocks.

****************************
EXERCISES

Exercise 9.1 Challenge: Nuclear-Powered X-Ray Laser
Motivated by Ronald Reagan’s “Star Wars” dreams, how would you design a nuclear-powered X-ray laser? The energy for the pump comes from a nuclear explosion that you set off in space above the earth. You want to use that energy to create a population inversion in an active medium that will lase at X-ray wavelengths; and you want to focus the resulting X-ray beam onto an intercontinental ballistic missile that is rising out of the earth’s atmosphere. What would you use for the active medium? How would you guarantee that a population inversion is created in the active medium? How would you focus the resulting X-ray beam? (Note: This is a highly nontrivial exercise, intended more as a stimulus for thought than as a test of one’s understanding of things taught in this book.)

****************************
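The order-of-magnitude estimates in the Discussion of Sec. 9.2.2 can be reproduced with a few lines of arithmetic. The sketch below assumes a monochromatic vacuum plane wave, for which the time-averaged energy flux is $F = c B_{\rm rms}^2/\mu_0$ and $E = cB$, and uses the text's values $F = 10^{21}$ W m$^{-2}$, a 1 µm wavelength, and ∆f = 1 mHz; it takes the coherence time as τ ∼ 1/∆f, which is an order-of-magnitude convention. Under these assumptions the rms magnetic field comes out near 2 kT and the peak field near 3 kT, both consistent with the text's ∼3 kT at the order-of-magnitude level.

```python
import math

C = 2.99792458e8            # speed of light, m/s
MU0 = 4e-7 * math.pi        # vacuum permeability, T m/A (very close to the SI value)

def fields_from_flux(flux):
    """Return (B_rms, E_rms) for a plane wave whose time-averaged energy flux is
    flux = c * B_rms**2 / mu0 (SI units); E = c B for a vacuum plane wave."""
    b_rms = math.sqrt(flux * MU0 / C)
    return b_rms, C * b_rms

flux = 1e21                 # W m^-2, the text's focused CO2-laser flux
wavelength = 1e-6           # m, the text's "about a micron"
b_rms, e_rms = fields_from_flux(flux)
b_peak, e_peak = math.sqrt(2) * b_rms, math.sqrt(2) * e_rms

print(f"B: rms {b_rms:.2e} T, peak {b_peak:.2e} T")
print(f"E: rms {e_rms:.2e} V/m, peak {e_peak:.2e} V/m")
print(f"potential difference across one wavelength ~ {e_peak * wavelength:.1e} V")

delta_f = 1e-3              # Hz, the quoted frequency stability
tau = 1.0 / delta_f         # coherence time ~ 1/delta_f, in s
coherence_length_km = C * tau / 1e3
print(f"coherence time ~ {tau:.0f} s, coherence length ~ {coherence_length_km:.1e} km")
```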

9.3 Holography

Thus far in this book, our study of optics has focused on situations where waves propagate linearly, i.e., where they superpose linearly (additively). In the 1970s and 1980s the technology of lasers and of “nonlinear crystals” began to make possible processes in which light waves interact with each other nonlinearly (Sec. 9.5 below). The resulting nonlinear optics has promising applications in such diverse areas as computers, communication, optical astronomy, gravitational-wave detection, spectroscopy, holography, . . . .

Holography is an old and well-explored example of nonlinear optics, an example in which the nonlinear interaction of light with itself is produced not in real time, but rather by means of a recording followed by a later readout.⁴ Holography is to be contrasted with ordinary photography. Ordinary photography (Fig. 9.4) produces a colored, 2-dimensional image of 3-dimensional objects. Holography (Figs. 9.5, 9.7 below) produces a monochromatic 3-dimensional image of 3-dimensional objects. Note that, roughly speaking, the two processes contain the same amount of information: two items of information at each location in the image. The two items in an ordinary photograph are the intensity and the color; the two items in a holographic photograph (hologram) are the intensity and the phase of the monochromatic light. It is the phase of the light, lost from an ordinary photograph but preserved in a hologram, that contains the information about the third dimension: Our brain deduces the distance to a point on an object from the difference in the directions of propagation of the point’s light as it arrives at our two eyes. Those propagation directions are encoded in the light as variations of the light’s phase with transverse location [see, e.g., the point-spread function for a thin lens, Eq. (7.28)]. Thus, the transverse variations in phase contain the three-dimensional information, and it is just those transverse phase variations that are preserved in a hologram.

⁴ Holography is discussed and analyzed in most standard optics textbooks; e.g., Chapter 8 of Ghatak and Thyagarajan (1978). A number of practical applications of holography are discussed by Iizuka (1987) and by Cathey (1974).

[Figure: illuminating light falls on an object; a lens focuses the light scattered by the object onto a photographic plate.]

Fig. 9.4: Ordinary photography.

In an ordinary photograph (Fig. 9.4), white light scatters off an object, with different colors scattering with different strengths. The resulting colored light is focused through a lens to form a colored image on a photographic plate or layer of “photoresist.” The plate records the color and intensity of the light at each point in the focal plane, thereby producing the ordinary photograph. In holography one records a hologram (Fig. 9.5), and one then uses the hologram to reconstruct the holographic image (Fig. 9.7 below).

9.3.1 Recording a Hologram

Consider, first, the recording of the hologram. Monochromatic, linearly polarized plane-wave light with electric field

$$E = \Re\left[\psi(x,y,z)\,e^{-i\omega t}\right], \tag{9.1}$$

angular frequency ω, and wave number k = ω/c illuminates the object and also a mirror, as shown in Fig. 9.5. The light must be spatially coherent over the entire region of mirror plus object. The illuminating light propagates in the y–z plane, at some angle θo to the z axis, and the mirror lies in the x–y plane. The mirror reflects the illuminating light, producing a so-called reference beam, which we shall call the mirror wave:

$$\psi_{\rm mirror} = M e^{ik(z\cos\theta_o - y\sin\theta_o)}, \tag{9.2}$$

where M is a real constant. The object scatters the illuminating light, producing a wave propagating toward the photographic plate (z direction) that we shall call the object wave and shall denote

$$\psi_{\rm object} = O(x,y,z)\,e^{ikz}. \tag{9.3}$$

It is the slowly varying complex amplitude O(x, y, z) of this object wave that carries the three-dimensional, but monochromatic, information about the object’s appearance, and it thus is this O(x, y, z) that will be reconstructed in the second step of holography.

[Figure: illuminating light, at angle θo to the z axis, falls on a mirror and on an object; the mirror wave $M e^{ik(z\cos\theta_o - y\sin\theta_o)}$ and the object wave $O(x,y,z)e^{ikz}$ meet at a photographic plate at z = 0; axes y and z.]

Fig. 9.5: Recording a hologram.

In the first step [Fig. 9.5 and Eq. (9.3)], the object wave propagates along the z direction to the photographic plate at z = 0, where it interferes with the mirror wave to produce a transverse intensity¹ pattern

$$F(x,y) \propto \left|O + M e^{-iky\sin\theta_o}\right|^2 = M^2 + |O(x,y,z=0)|^2 + O(x,y,z=0)\,M e^{iky\sin\theta_o} + O^*(x,y,z=0)\,M e^{-iky\sin\theta_o}. \tag{9.4}$$

(Here and throughout this chapter a * denotes complex conjugation.) The plate is blackened at each point in proportion to this intensity. The plate is then developed, and a positive or negative print (it doesn’t matter which, because of Babinet’s principle) is made on a transparent sheet of plastic or glass. This print, the hologram, has a transmissivity as a function of x and y that is proportional to the intensity distribution (9.4):

$$t(x,y) \propto M^2 + |O(x,y,z=0)|^2 + M O(x,y,z=0)\,e^{iky\sin\theta_o} + M O^*(x,y,z=0)\,e^{-iky\sin\theta_o}. \tag{9.5}$$

In this transmissivity we meet our first example of nonlinearity: t(x, y) is a nonlinear superposition of the mirror wave and the object wave. Stated more precisely, the superposition is not a linear sum of wave fields, but instead is a sum of products of one wave field with the complex conjugate of another wave field. A further nonlinearity will arise in the reconstruction of the holographic image, Eq. (9.7) below.
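The four-term expansion in Eq. (9.4) is just $|a+b|^2 = |a|^2 + |b|^2 + a b^* + a^* b$ with a real mirror amplitude M. The sketch below checks it numerically at randomly chosen points; the wavelength (the 0.6328-micron helium-neon line), the 20° illumination angle, M = 1.3, and the random local object amplitudes are all illustrative assumptions, not values from the text.

```python
import cmath
import math
import random

def plate_intensity(o, m, k, y, theta_o):
    """|O + M exp(-i k y sin(theta_o))|^2 at the plate (z = 0): left side of Eq. (9.4)."""
    return abs(o + m * cmath.exp(-1j * k * y * math.sin(theta_o))) ** 2

def four_terms(o, m, k, y, theta_o):
    """M^2 + |O|^2 + O M e^{+iky sin} + O* M e^{-iky sin}: right side of Eq. (9.4)."""
    phase = cmath.exp(1j * k * y * math.sin(theta_o))   # unit modulus, so 1/phase = conj
    total = m ** 2 + abs(o) ** 2 + o * m * phase + o.conjugate() * m / phase
    return total.real   # the two cross terms are complex conjugates, so the sum is real

random.seed(0)
k = 2 * math.pi / 632.8e-9       # assumed wavelength, m
theta_o = math.radians(20)       # assumed illumination angle
m = 1.3                          # real mirror-wave amplitude M (arbitrary)
for _ in range(5):
    o = complex(random.gauss(0, 1), random.gauss(0, 1))   # arbitrary local object amplitude
    y = random.uniform(0, 1e-5)                           # position on the plate, m
    assert math.isclose(plate_intensity(o, m, k, y, theta_o),
                        four_terms(o, m, k, y, theta_o), rel_tol=1e-9, abs_tol=1e-12)
print("Eq. (9.4) expansion verified")
```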

[Figure: three panels, (a), (b), and (c), each with axes x and y.]

Fig. 9.6: (a) Ordinary photograph of an object. (b) Hologram of the same object. (c) Enlargement of the hologram. [Adapted from Fig. 8.3 of Ghatak and Thyagarajan, 1978.]

Fig. 9.6 shows an example. Fig. 9.6(a) is an ordinary photograph of an object, (b) is a hologram of the same object, and (c) is a blow-up of a portion of that hologram. The object is not at all recognizable in the hologram because the object wave O was not focused to form an image at the plane of the photographic plate. Rather, light from each region of the object was scattered to and recorded by all regions of the photographic plate. Nevertheless, the plate contains the full details of the scattered light O(x, y, z = 0), including its phase. That information is recorded in the piece $M(O e^{iky\sin\theta_o} + O^* e^{-iky\sin\theta_o}) = 2M\,\Re(O e^{iky\sin\theta_o})$ of the hologram’s transmissivity. This piece oscillates sinusoidally in the y direction with wavelength $2\pi/(k\sin\theta_o)$, and the amplitude and phase of its oscillations are modulated by the object wave O(x, y, z = 0). Those modulated oscillations show up clearly when one magnifies the hologram [Fig. 9.6(c)]; they make the hologram into a sort of diffraction grating, with the object wave O(x, y, z = 0) encoded as variations of the darkness and spacings of the grating lines.

What about the other pieces of the transmissivity (9.5), which superpose linearly on the diffraction grating? One piece, $t \propto M^2$, is spatially uniform and thus has no effect except to make the lightest parts of the hologram slightly grey rather than leaving them absolutely transparent (since this hologram is a negative rather than a positive). The other piece, $t \propto |O|^2$, is the intensity of the object’s unfocused, scattered light. It produces a greying and whitening of the hologram [Fig. 9.6(b)] that varies on lengthscales long compared to the grating’s wavelength $2\pi/(k\sin\theta_o)$, and that thus blots out the diffraction grating a bit here and there, but does not change the amplitude or phase of the grating’s modulation.
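The grating wavelength $2\pi/(k\sin\theta_o) = \lambda/\sin\theta_o$ explains why the fringes only become visible under magnification in Fig. 9.6(c). The short sketch below tabulates it for a few illumination angles, assuming (for illustration only) a recording wavelength equal to the 0.6328-micron helium-neon line; the photographic medium must resolve structure on this scale for the grating to be recorded.

```python
import math

def grating_period(wavelength, theta_o):
    """Fringe spacing 2*pi/(k sin(theta_o)) = wavelength / sin(theta_o) of the hologram grating."""
    k = 2 * math.pi / wavelength
    return 2 * math.pi / (k * math.sin(theta_o))

lam = 632.8e-9   # assumed recording wavelength, m
for deg in (5, 10, 20, 30):
    period = grating_period(lam, math.radians(deg))
    print(f"theta_o = {deg:2d} deg: period = {period * 1e6:.2f} micron, "
          f"{1e-3 / period:.0f} lines per mm")
```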

9.3.2 Reconstructing the 3-Dimensional Image from a Hologram

To reconstruct the object’s 3-dimensional wave, $O(x,y,z)e^{ikz}$, one sends through the hologram monochromatic, plane-wave light identical to the mirror light used in making the hologram; cf. Fig. 9.7. If, for pedagogical simplicity, we place the hologram at the same location z = 0 as was previously occupied by the photographic plate, then the incoming light has the same form (9.2) as the original mirror wave, but with an amplitude that we shall denote R, corresponding to the phrase “reference beam” that is used to describe this incoming light:

$$\psi_{\rm reference} = R e^{ik(z\cos\theta_o - y\sin\theta_o)}. \tag{9.6}$$

[Figure: the reference wave $R e^{-iky\sin\theta_o} e^{ikz\cos\theta_o}$ passes through the hologram, $t(x,y) = M^2 + |O(x,y,z=0)|^2 + M O(x,y,z=0)e^{iky\sin\theta_o} + M O^*(x,y,z=0)e^{-iky\sin\theta_o}$, and emerges as three waves: the reconstructed object wave, $\text{const}\times MR\,O(x,y,z)\,e^{ikz}$; the modulated mirror wave, $\text{const}\times R\,(|O|^2 + M^2)\,e^{-iky\sin\theta_o}e^{ikz\cos\theta_o}$; and the phase-conjugated object wave, $\text{const}\times MR\,O^*\,e^{-iky\sin\theta_s}e^{ikz\cos\theta_s}$; axes y and z.]

Fig. 9.7: Reconstructing the holographic image from the hologram. Note that sin θs = 2 sin θo.

In passing through the hologram at z = 0, this reference beam is partially absorbed and partially transmitted. The result, immediately upon exiting from the hologram, is a “reconstructed” light-wave field whose value $\psi_{\rm reconstructed} = \mathcal{R}(x,y,z=0)$ and normal derivative $\partial\psi_{\rm reconstructed}/\partial z = \mathcal{Z}(x,y,z=0)$ are given by [cf. Eq. (9.5)]

$$\begin{aligned}
\psi_{\rm reconstructed}\big|_{z=0} \equiv \mathcal{R}(x,y,z=0) &= t(x,y)\,R\,e^{-iky\sin\theta_o} \\
&= \left[M^2 + |O(x,y,z=0)|^2\right] R\,e^{-iky\sin\theta_o} + MR\,O(x,y,z=0) + MR\,O^*(x,y,z=0)\,e^{-2iky\sin\theta_o}, \\
\frac{\partial\psi_{\rm reconstructed}}{\partial z}\bigg|_{z=0} \equiv \mathcal{Z}(x,y,z=0) &= ik\cos\theta_o\,\mathcal{R}(x,y,z=0).
\end{aligned} \tag{9.7}$$

This field and normal derivative act as initial data for the subsequent evolution of the reconstructed wave. Note that the field and derivative, and thus also the reconstructed wave, are triply nonlinear: each term in Eq. (9.7) is a product of (i) the original mirror wave M used to construct the hologram or the original object wave O, times (ii) O* or M* = M, times (iii) the reference wave R that is being used in the holographic reconstruction.

The evolution of the reconstructed wave beyond the hologram (at z > 0) can be computed by combining the initial data (9.7) for $\psi_{\rm reconstructed}$ and $\partial\psi_{\rm reconstructed}/\partial z$ at z = 0 with the Helmholtz-Kirchhoff formula (7.4); see Ex. 9.2. From the four terms in the initial data, Eq. (9.7) [which arise from the four terms in the hologram’s transmissivity t(x, y), Eq. (9.5)], the reconstruction produces four wave fields; see Fig. 9.7. The direction of propagation of each of these waves can easily be inferred from the vertical spacing of its phase fronts along the outgoing face of the hologram, or equivalently from the relation $\partial\psi_{\rm reconstructed}/\partial y = ik_y\psi_{\rm reconstructed} = -ik\sin\theta\,\psi_{\rm reconstructed}$, where θ is the angle of propagation relative to the horizontal z direction. Since, immediately in front of the hologram, $\psi_{\rm reconstructed} = \mathcal{R}$, the propagation angle is

$$\sin\theta = \frac{\partial\mathcal{R}/\partial y}{-ik\mathcal{R}}. \tag{9.8}$$

Comparing with Eqs. (9.5) and (9.7), we see that the first two, slowly spatially varying terms in the transmissivity, $t \propto M^2$ and $t \propto |O|^2$, both produce waves that propagate in the same direction as the reference wave, θ = θo. This combined wave has an uninteresting, smoothly and slowly varying intensity pattern. The two diffraction-grating terms in the hologram’s transmissivity produce two interesting waves. One, arising from $t \propto O(x,y,z=0)\,M e^{iky\sin\theta_o}$ [and produced by the $MRO$ term of the initial conditions (9.7)], is precisely the same object wave $\psi_{\rm object} = O(x,y,z)e^{ikz}$ (aside from overall amplitude) as one would have seen while making the hologram if one had replaced the photographic plate by a window and looked through it. This object wave, carrying [encoded in O(x, y, z)] the famous holographic image with full 3-dimensionality, propagates in the z direction, θ = 0. The transmissivity’s second diffraction-grating term, $t \propto O^*(x,y,z=0)\,M e^{-iky\sin\theta_o}$, acting via the $MRO^*$ term of the initial conditions (9.7), gives rise to a secondary wave which [according to Eq. (9.8)] propagates at an angle θs to the z axis, where

$$\sin\theta_s = 2\sin\theta_o. \tag{9.9}$$
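A rough one-dimensional numerical check of Eqs. (9.7)–(9.9) is possible by ignoring the x dependence and the propagation beyond z = 0, sampling the field just past the hologram along y, and decomposing it into plane waves with a discrete Fourier transform. The sketch below assumes NumPy, sets M = R = 1, uses a hypothetical real, slowly varying object amplitude, and picks $\sin\theta_o = 0.3125$ (θo ≈ 18°) so that every component falls exactly on a Fourier grid point. Applying Eq. (9.8) in the form $\sin\theta = -k_y/k$ to the dominant components should recover the three directions 0, $\sin\theta_o$, and $2\sin\theta_o$.

```python
import numpy as np

n, dy = 1024, 1.0 / 8                 # samples and spacing, in units of the wavelength
y = np.arange(n) * dy
k = 2 * np.pi                         # wave number for wavelength = 1
bin_spacing = 2 * np.pi / (n * dy)    # spacing of the discrete k_y grid
sin_theta_o = 40 * bin_spacing / k    # assumed angle placed on a grid point: 0.3125

# Hypothetical slowly varying, real object amplitude O(y) at the plate (M = R = 1).
obj = 0.3 + 0.1 * np.cos(2 * np.pi * 3 * y / (n * dy))
mirror = np.exp(-1j * k * y * sin_theta_o)      # M e^{-i k y sin(theta_o)} at z = 0

t = np.abs(obj + mirror) ** 2                   # hologram transmissivity, Eq. (9.5)
field = t * mirror                              # field just past the hologram, Eq. (9.7)

spectrum = np.abs(np.fft.fft(field)) / n        # amplitude of each plane-wave component
ky = 2 * np.pi * np.fft.fftfreq(n, d=dy)        # transverse wave number of each component

strong = spectrum > 0.2 * spectrum.max()        # keep only the dominant components
sin_theta = np.sort(-ky[strong] / k)            # Eq. (9.8): sin(theta) = -k_y / k
print(sin_theta)                                # expect 0, sin(theta_o), 2 sin(theta_o)
```

Raising `sin_theta_o` above 0.5 pushes the third component past $|k_y| = k$; in the full problem that component would be evanescent rather than propagating, in line with the statement that no secondary wave exists for θo > 30°.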

(If θo > 30°, then 2 sin θo > 1, which means θs cannot be a real angle, and there will be no secondary wave.) This secondary wave, if it exists, carries an image that is encoded in the complex conjugate O*(x, y, z = 0) of the transverse (i.e., x, y) part of the original object wave. Since complex conjugation of an oscillatory wave just reverses the sign of the wave’s phase, this wave in some sense is a “phase conjugate” of the original object wave. When one recalls that the electric and magnetic fields that make up an electromagnetic wave are actually real rather than complex, and that we are using complex wave fields to describe electromagnetic waves only for mathematical convenience, one then realizes that this phase conjugation of the object wave is actually a highly nonlinear process. There is no way, by linear manipulations of the real electric and magnetic fields, to produce the phase-conjugated wave from the original object wave.

In Sec. 9.4 we shall develop in detail the theory of phase-conjugated waves, and in Ex. 9.2(b) we shall relate our holographically constructed secondary wave to that theory. As we shall see, our secondary wave is not quite the same as the “phase-conjugated object wave,” but it is the same aside from some distortion along the y direction and a change in propagation direction. Using the theory in the next section, we shall deduce [Ex. 9.2(b)] the following comparison between our holographic object wave and our secondary wave: If one looks into the object wave with one’s eyes (i.e., if one focuses it onto one’s retinas), one sees the original object in all its three-dimensional glory, though single-colored, sitting behind the hologram at the object’s original position. Because the image one sees is behind the hologram, it is called a virtual image. If, instead, one looks into the secondary wave with one’s eyes (i.e., if one focuses it onto one’s retinas), one sees the original three-dimensional object, sitting in front of the hologram but turned inside out and distorted; for example, if the object is a human face, the secondary image looks like the interior of a mask made from that human face, with some distortion along the y direction. Because this secondary image appears to be in front of the hologram, it is called a real image, even though one can pass one’s hands through it and feel nothing but thin air.

There are many variants on the basic holographic technique that we have described in Figs. 9.5–9.7. In one, called volume holography, the hologram is a number of wavelengths deep rather than being just two-dimensional. For example, it could be made from a photographic emulsion a number of wavelengths thick, in which the absorption length for light (before developing) is longer than the thickness. Such a hologram has a three-dimensional grating structure (grating “surfaces” rather than grating “lines”), and when one reconstructs the holographic image from it in the manner of Fig. 9.7, the third dimension of the grating suppresses the phase-conjugated wave while enhancing the (desired) object wave. In another variant, one reflects light off the hologram instead of transmitting light through it; in such reflection holography, the hologram’s diffraction grating produces a three-dimensional holographic image by the same process as in transmission. Other variants are optimized for reconstructing the holographic image with white light (light that has a broad range of frequencies). Even for the simple two-dimensional hologram of Fig.
9.7, if one sends in white light at the angle θo , one will get a three-dimensional object wave: The hologram’s grating will diffract various wavelengths in various directions. In the direction of the original object wave (the horizontal direction in Fig. 9.7), one will get a 3-dimensional reconstructed image of the same color as was used when constructing the hologram. When one moves vertically away from that direction (as shown in Fig. 9.7), one will see the color of the 3-dimensional image continuously change. A white-light hologram of this type (though one relying on reflection rather than transmission) is used on many credit cards as an impediment to counterfeiting, and has even been used on postage stamps. Holograms are much used in everyday life and technology. Examples are credit cards and holographic lenses in supermarket checkouts. Other examples, that are still in the developmental stage but that may become widespread in a few years, are volume holograms used for three-dimensional movies, and volume holograms for storage of large amounts of data — up to terabytes cm−3 . [KIP: UPDATE THIS] Just as one can draw two-dimensional pictures numerically, pixel-by-pixel, so one can also create and modify holograms numerically.
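The two directional statements above (no secondary wave for $\theta_o > 30^\circ$, and color changing with viewing direction under white light) can be checked with a short sketch. It treats the simple two-dimensional hologram as a thin sinusoidal grating (our assumption; volume effects are ignored) whose fringes carry transverse wave number $k_0\sin\theta_o$, with $k_0$ the recording wave number. Angles are measured from the original object-wave direction, and the sign convention is our own. The secondary-wave relation $\sin\theta_s = 2\sin\theta_o$ is the one implied by the $30^\circ$ condition stated above. The white-light formula $\sin\theta = \sin\theta_o(1 - \lambda/\lambda_0)$ follows from the same transverse-wave-number bookkeeping and is offered as a hedged illustration relevant to Ex. 9.2(c), not as the book’s solution.

```python
import math

def secondary_angle(theta_o):
    """Secondary-wave direction from sin(theta_s) = 2 sin(theta_o); None if no real angle exists."""
    s = 2.0 * math.sin(theta_o)
    return math.asin(s) if abs(s) <= 1.0 else None

def white_light_object_angle(theta_o, wavelength, recording_wavelength):
    """Direction of the reconstructed object wave for light of a given wavelength.

    Thin-grating assumption: the fringes carry transverse wave number k0*sin(theta_o);
    light of wave number k incident at theta_o leaves with (k - k0)*sin(theta_o),
    i.e. sin(theta) = sin(theta_o) * (1 - wavelength / recording_wavelength).
    """
    s = math.sin(theta_o) * (1.0 - wavelength / recording_wavelength)
    return math.asin(s) if abs(s) <= 1.0 else None

# Usage with hypothetical numbers: hologram recorded in green light (532 nm) at 45 degrees.
theta_o = math.radians(45.0)
print(secondary_angle(theta_o))                   # None: 2 sin(45 deg) > 1
print(math.degrees(secondary_angle(math.radians(20.0))))
for name, lam in [("green", 532e-9), ("red", 650e-9)]:
    angle = white_light_object_angle(theta_o, lam, 532e-9)
    print(name, math.degrees(angle))
```

The 532 nm “green” and 650 nm “red” wavelengths are illustrative choices, not values from the text.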


****************************
EXERCISES

Exercise 9.2 Derivation: The Holographically Reconstructed Wave

(a) Use the Helmholtz-Kirchhoff integral (7.4) to compute all four pieces of the holographically reconstructed wave field. Show that the piece generated by $t \propto O(x,y,z=0)\,M e^{iky\sin\theta_o}$ is the same as the field $\psi_{\rm object} = O(x,y,z)\,e^{-i\omega t}$ that would have resulted, when making the hologram (Fig. 9.5), had the mirror wave been absent and the photographic plate been replaced by a window. Show that the other pieces have the forms and propagation directions indicated heuristically in Fig. 9.7.

(b) Consider the secondary wave generated by $t \propto M O^* e^{-iky\sin\theta_o}$. Assume, for simplicity, that the mirror and reference waves propagate nearly perpendicular to the hologram, so $\theta_o \ll 90^\circ$ and $\theta_s \simeq 2\theta_o \ll 90^\circ$; but assume that $\theta_s$ is still large enough that, fairly far from the hologram, the object wave and secondary wave separate cleanly from each other. Then, taking account of the fact that the object wave field has the form $O(x,y,z)\,e^{ikz}$, show that the secondary wave is the “phase-conjugated object wave” as defined in Sec. 9.4, except that it is propagating in the $+z$ direction rather than $-z$, i.e., it has been reflected through the $z = 0$ plane. Then use this, and the discussion of phase conjugation in Sec. 9.4, to show that the secondary wave carries an image that resides in front of the hologram and is turned inside out, as discussed near the end of Sec. 9.3. Show, further, that if $\theta_o$ is not $\ll 90^\circ$ (but is $< 30^\circ$, so $\theta_s$ is a real angle and the secondary image actually exists), then the secondary image is changed by a distortion along the y direction. What is the nature of the distortion: a squashing or a stretch?

(c) Suppose that plane-parallel white light is used in the holographic reconstruction of Fig. 9.7. Derive an expression for the direction in which one sees the object’s three-dimensional image have a given color (or, equivalently, wave number).
Assume that the original hologram was made with green light and $\theta_o = 45^\circ$. What are the angles at which one sees the image as green and as red?

Exercise 9.3 *** Problem: Compact Disks, DVDs and Blu-ray Disks

Information on compact disks (CDs), DVDs and Blu-ray disks (BDs) is recorded and read out using holographic lenses. In each successive generation, the laser light has been pushed to a shorter wavelength ($\lambda$ = 780 nm for CDs, 650 nm for DVDs, 405 nm for BDs), and in each generation the efficiency of information storage has been improved. In CDs, the information is stored in a single holographic layer on the surface of the disk; in DVDs and BDs, it is usually stored in a single layer, but can also be stored in two layers, one above the other, though at some price in access time.

(a) Explain why one can expect to record in a disk’s recording layer, at the very most, (close to) one bit of information per square wavelength of the recording light.

(b) The actual storage capacities are up to 900 MB for CDs, 4.7 GB for DVDs, and 25 GB for Blu-ray disks. How efficient is each of these technologies relative to the maximum of part (a)?

(c) Estimate the number of volumes of the Encyclopedia Britannica that can be recorded on a CD, on a DVD and on a BD.

****************************

9.4

Phase-Conjugate Optics

Nonlinear optical techniques make it possible to phase conjugate an optical wave in real time, by contrast with holography, where phase conjugation requires recording a hologram and then reconstructing the wave later. In this section we shall explore the properties of phase-conjugated waves of any sort (light, sound, plasma waves, ...), and in the next section we shall discuss the technology by which real-time phase conjugation is achieved for light.

The basic ideas and foundations for phase conjugation of waves were laid in Moscow, Russia, by Boris Yakovlevich Zel’dovich⁵ and his colleagues (1972) and at Caltech by Amnon Yariv (1977). Phase conjugation is the process of taking a monochromatic wave

$$\Psi_O = \Re\big[\psi(x,y,z)\,e^{-i\omega t}\big] = \tfrac{1}{2}\big(\psi\,e^{-i\omega t} + \psi^*\,e^{+i\omega t}\big) \tag{9.10a}$$

and from it constructing the wave

$$\Psi_{PC} = \Re\big[\psi^*(x,y,z)\,e^{-i\omega t}\big] = \tfrac{1}{2}\big(\psi^*\,e^{-i\omega t} + \psi\,e^{+i\omega t}\big). \tag{9.10b}$$
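As a quick numerical check of Eqs. (9.10a) and (9.10b), the sketch below evaluates both real fields at a single point, using a hypothetical complex spatial amplitude $\psi$ and angular frequency $\omega$, and confirms that the conjugated wave is the original wave evaluated at $-t$.

```python
import numpy as np

def real_field(psi, omega, t):
    """Real wave Re[psi * exp(-i omega t)] at a fixed position, as in Eq. (9.10a)."""
    return np.real(psi * np.exp(-1j * omega * t))

psi = 0.8 - 0.6j                 # hypothetical complex amplitude at one point
omega = 2.0 * np.pi              # hypothetical angular frequency
t = np.linspace(-3.0, 3.0, 601)

psi_o = real_field(psi, omega, t)                 # original wave, Eq. (9.10a)
psi_pc = real_field(np.conj(psi), omega, t)       # phase-conjugated wave, Eq. (9.10b)
psi_o_reversed = real_field(psi, omega, -t)       # original wave with t -> -t

print(np.allclose(psi_pc, psi_o_reversed))  # True: conjugation acts as time reversal
print(np.allclose(psi_pc, psi_o))           # False: psi has a nonzero imaginary part
```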

⁵ Zel’dovich is the famous son of a famous Russian/Jewish physicist, Yakov Borisovich Zel’dovich, who with Andrei Dmitrievich Sakharov fathered the Soviet hydrogen bomb and then went on to become a dominant figure internationally in astrophysics and cosmology.

[Figure 9.8: two panels, (a) ordinary mirror and (b) phase-conjugating mirror, each showing an incoming wave $\Psi_O$ passing through an inhomogeneous medium and the reflected wave ($\Psi_R$ or $\Psi_{PC}$) passing back through it.]

Fig. 9.8: A rightward propagating wave and the reflected wave produced by (a) an ordinary mirror and (b) a phase-conjugating mirror. In both cases the waves propagate through a medium with spatially variable properties, which distorts their phase fronts. In case (a) the distortion is reinforced by the second passage through the variable medium; in case (b) the distortion is removed by the second passage.

Notice that the phase-conjugated wave $\Psi_{PC}$ is obtainable from the original wave $\Psi_O$ by time reversal, $t \to -t$. This has a number of important consequences. One is that $\Psi_{PC}$ propagates in the opposite direction to $\Psi_O$. Others are explained most clearly with the help of a phase-conjugating mirror. Consider a wave $\Psi_O$ with spatial modulation (i.e., a wave that carries a picture or a signal of some sort). Let the wave propagate in the z direction (rightward in Fig. 9.8), so

$$\psi = \mathcal{A}(x,y,z)\,e^{ikz}, \qquad \Psi_O = \Re\big[\mathcal{A}\,e^{i(kz-\omega t)}\big], \qquad \mathcal{A} = A\,e^{i\varphi}, \tag{9.11}$$

where $\mathcal{A}$ is a complex amplitude whose modulus $A$ and phase $\varphi$ change slowly in $x, y, z$ (slowly compared to the wave’s wavelength $\lambda = 2\pi/k$). Suppose that this wave propagates through a time-independent medium with slowly varying physical properties (e.g., a dielectric medium with slowly varying index of refraction $n(x,y,z)$). These slow variations will distort the wave’s complex amplitude as it propagates. The wave equation for the real, classical field $\Psi = \Re[\psi e^{-i\omega t}]$ will have the form $\mathcal{L}\Psi - \partial^2\Psi/\partial t^2 = 0$, where $\mathcal{L}$ is a spatial differential operator that depends on the medium’s slowly varying physical properties. This wave equation implies that the complex field $\psi$ satisfies

$$\mathcal{L}\psi + \omega^2\psi = 0 . \tag{9.12}$$

This is the evolution equation for the wave’s complex amplitude. Let the distorted, rightward propagating wave $\Psi_O$ reflect off a mirror located at $z = 0$. If the mirror is a phase-conjugating one, then very near it (at $z$ near zero) the reflected wave will have the form

$$\Psi_{PC} = \Re\big[\mathcal{A}^*(x,y,z=0)\,e^{i(-kz-\omega t)}\big], \tag{9.13}$$

while if it is an ordinary mirror, then the reflected wave will be

$$\Psi_R = \Re\big[\pm\mathcal{A}(x,y,z=0)\,e^{i(-kz-\omega t)}\big]. \tag{9.14}$$

(Here the sign, + or −, depends on the physics of the wave; for example, if $\Psi$ is the transverse electric field of an electromagnetic wave and the mirror is a perfect conductor, the sign will be − to guarantee that the total electric field, original plus reflected, vanishes at the mirror’s surface.) These two waves, the phase-conjugated one $\Psi_{PC}$ and the ordinary reflected one $\Psi_R$, have very different surfaces of constant phase (phase fronts). The phase of the incoming wave $\Psi_O$ [Eq. (9.11)] as it nears the mirror ($z = 0$) is $\varphi + kz$, so (taking account of the fact that $\varphi$ is slowly varying) its surfaces of constant phase are $z = -\varphi(x,y,z=0)/k$ (up to an additive constant). Similarly, the phase of the wave $\Psi_R$ [Eq. (9.14)] reflected from the ordinary mirror is $\varphi - kz$, so its surfaces of constant phase near the mirror are $z = +\varphi(x,y,z=0)/k$, which are reversed from those of the incoming wave, as shown in the upper right of Fig. 9.8. Finally, the phase of the wave $\Psi_{PC}$ [Eq. (9.13)] reflected from the phase-conjugating mirror is $-\varphi - kz$, so its surfaces of constant phase near the mirror are $z = -\varphi(x,y,z=0)/k$, which are the same as those of the incoming wave (lower right of Fig. 9.8), even though the two waves are propagating in opposite directions.

[Figure 9.9: an optical transmission line consisting of a first fiber segment, a beam splitter with an ordinary mirror, a phase-conjugating mirror, and a second fiber segment identical to the first.]

Fig. 9.9: The use of a phase-conjugating mirror in an optical transmission line to prevent the fiber from distorting an optical image. The distortions put onto the image as it propagates through the first segment of fiber are removed during propagation through the second segment.

The phase fronts of the original incoming wave and the phase-conjugated wave are the same not only near the phase-conjugating mirror; they are the same everywhere. More specifically, as the phase-conjugated wave $\Psi_{PC}$ propagates away from the mirror [near which it is described by Eq. (9.13)], the propagation equation (9.12) forces it to evolve in such a way as to remain always the phase conjugate of the incoming wave:

$$\Psi_{PC} = \Re\big[\mathcal{A}^*(x,y,z)\,e^{-ikz}\,e^{-i\omega t}\big]. \tag{9.15}$$

This should be obvious from the fact that, because the differential operator $\mathcal{L}$ in the propagation equation (9.12) for $\psi(x,y,z) = \mathcal{A}e^{ikz}$ is real, $\psi^*(x,y,z) = \mathcal{A}^*e^{-ikz}$ will satisfy this propagation equation whenever $\psi(x,y,z)$ does. The fact that the reflected wave $\Psi_{PC}$ remains always the phase conjugate of the incoming wave $\Psi_O$ means that the distortions put onto the incoming wave, as it propagates rightward through the inhomogeneous medium, get removed from the phase-conjugated wave as it propagates back leftward; see Fig. 9.8.

This removal of distortions has a number of important applications. One is for image transmission in optical fibers. Normally, when an optical fiber is used to transmit an optical image, the transverse spatial variations $n(x,y)$ of the fiber’s index of refraction (which are required to hold the light in the fiber) distort the image somewhat. The distortions can be eliminated by using a sequence of identical segments of optical fibers separated by phase-conjugating mirrors (Fig. 9.9). A few other applications include (i) real-time holography, (ii) removal of phase distortions in Fabry-Perot cavities by making one of the mirrors a phase-conjugating one, with a resulting improvement in the shape of the beam that emerges from the cavity, (iii) devices that can memorize an optical image and compare it to other images, (iv) the production of squeezed light (Ex. 9.10), and (v) focusing of laser light for laser fusion (Part V of this book).

As we shall see in the next section, phase-conjugating mirrors rely crucially on the sinusoidal time evolution of the wave field; they integrate up that sinusoidal evolution coherently over some timescale $\hat\tau$ (typically microseconds to nanoseconds) in order to produce the phase-conjugated wave. Correspondingly, if an incoming wave varies on timescales $\tau$ long compared to this $\hat\tau$ (e.g., if it carries a temporal modulation with bandwidth $\Delta\omega \sim 1/\tau$ small compared to $1/\hat\tau$), then the wave’s temporal modulations will not get time reversed by the phase-conjugating mirror. For example, if the wave impinging on a phase-conjugating mirror has a frequency that is $\omega_a$ initially, and then gradually, over a time $\tau$, increases to $\omega_b = \omega_a + 2\pi/\tau$, then the phase-conjugated wave will not emerge from the mirror with frequency $\omega_b$ first and $\omega_a$ later. Rather, it will emerge with $\omega_a$ first and $\omega_b$ later (the same order as for the original wave). When the incoming wave’s temporal variations are fast compared to the mirror’s integration time, $\tau \ll \hat\tau$, the mirror encounters a variety of frequencies during its integration time and ceases to function properly. Thus, even though phase conjugation is equivalent to time reversal in a formal sense, a phase-conjugating mirror cannot time reverse a temporal signal. It time reverses only monochromatic waves (which might carry a spatial signal).
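The distortion-removal argument of Fig. 9.8 can be illustrated with a toy numerical model. In the sketch below, all modelling choices are our assumptions rather than the book’s: the inhomogeneous medium is replaced by a thin phase screen $e^{i\phi(x)}$ followed by a gap of free space; propagation is paraxial, in one transverse dimension, and computed with FFTs; the ± sign of the ordinary mirror is dropped; and the screen profile, wavelength and gap are hypothetical. Because the free-space kernel has unit modulus and is even in $k_x$, the propagation operator has the time-reversal property described above, so the conjugated wave retraces its path and leaves the screen as an undistorted plane wave.

```python
import numpy as np

def propagate(u, dx, wavelength, distance):
    """Paraxial angular-spectrum propagation of a 1-D transverse field u(x).

    Each Fourier component is multiplied by exp(-i kx^2 L / 2k); the common
    factor exp(ikL) is dropped. The kernel has unit modulus and is even in kx.
    """
    k = 2.0 * np.pi / wavelength
    kx = 2.0 * np.pi * np.fft.fftfreq(u.size, d=dx)
    return np.fft.ifft(np.fft.fft(u) * np.exp(-1j * kx**2 * distance / (2.0 * k)))

def round_trip(screen_phase, dx, wavelength, gap, conjugate):
    """Plane wave -> phase screen -> gap -> mirror -> gap -> same phase screen."""
    u = np.exp(1j * screen_phase)                  # first pass: distortion imprinted
    u = propagate(u, dx, wavelength, gap)          # travel to the mirror
    u = np.conj(u) if conjugate else u             # phase-conjugating vs ordinary mirror
    u = propagate(u, dx, wavelength, gap)          # travel back
    return u * np.exp(1j * screen_phase)           # second pass through the screen

def rms_distortion(u):
    """RMS deviation from an undistorted, unit-amplitude plane wave."""
    return float(np.sqrt(np.mean(np.abs(u - 1.0) ** 2)))

# Hypothetical set-up: 1024 samples at 1 micron spacing, 0.5 micron light, 2 mm gaps.
N, dx = 1024, 1e-6
x = (np.arange(N) - N / 2) * dx
screen = 2.0 * np.sin(2 * np.pi * x / 50e-6) + 1.5 * np.exp(-(x / 80e-6) ** 2)  # radians
pc = rms_distortion(round_trip(screen, dx, 0.5e-6, 2e-3, conjugate=True))
ordinary = rms_distortion(round_trip(screen, dx, 0.5e-6, 2e-3, conjugate=False))
print(f"phase-conjugating mirror: {pc:.1e}")        # round-off level: distortion removed
print(f"ordinary mirror:          {ordinary:.1e}")  # not small: distortion reinforced
```

Replacing a thick medium by a single screen keeps the sketch short; the same cancellation holds for any sequence of screens and gaps, since each step is undone by its conjugate in reverse order.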

9.5

Wave-Wave Mixing in Nonlinear Crystals

9.5.1

Maxwell’s Equations and Nonlinear Dielectric Susceptibilities

In nonlinear optics one is often concerned with media that are electrically polarized, with polarization (electric dipole moment per unit volume) $\mathbf{P}$, but that have no free charges or currents and are unmagnetized. In such a medium, the charge and current densities associated with the polarization are

$$\rho_P = -\nabla\cdot\mathbf{P}, \qquad \mathbf{j}_P = \frac{\partial\mathbf{P}}{\partial t}, \tag{9.16a}$$

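and Maxwell’s equations in SI units take the form

$$\nabla\cdot\mathbf{E} = \frac{\rho_P}{\epsilon_0}, \qquad \nabla\cdot\mathbf{B} = 0, \qquad \nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}, \qquad \nabla\times\mathbf{B} = \mu_0\left(\mathbf{j}_P + \epsilon_0\frac{\partial\mathbf{E}}{\partial t}\right), \tag{9.16b}$$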

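which should be familiar. When rewritten in terms of the electric displacement vector

$$\mathbf{D} \equiv \epsilon_0\mathbf{E} + \mathbf{P}, \tag{9.17}$$

these Maxwell equations take the following alternative form:

$$\nabla\cdot\mathbf{D} = 0, \qquad \nabla\cdot\mathbf{B} = 0, \qquad \nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}, \qquad \nabla\times\mathbf{B} = \mu_0\frac{\partial\mathbf{D}}{\partial t}, \tag{9.18}$$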

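which should also be familiar. By taking the curl of the third Maxwell equation (9.16b), using the relation $\nabla\times\nabla\times\mathbf{E} = -\nabla^2\mathbf{E} + \nabla(\nabla\cdot\mathbf{E})$, and combining with the time derivative of the fourth Maxwell equation (9.16b) and with $\epsilon_0\mu_0 = 1/c^2$ and $\mathbf{j}_P = \partial\mathbf{P}/\partial t$, we obtain the following wave equation for the electric field, sourced by the medium’s polarization:

$$\nabla^2\mathbf{E} - \nabla(\nabla\cdot\mathbf{E}) = \frac{1}{c^2}\frac{\partial^2(\mathbf{E} + \mathbf{P}/\epsilon_0)}{\partial t^2}. \tag{9.19}$$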

If the electric field is sufficiently weak and the medium is isotropic (the case treated in most textbooks on electromagnetic theory), the polarization $\mathbf{P}$ is proportional to the electric field: $\mathbf{P} = \epsilon_0\chi_0\mathbf{E}$, where $\chi_0$ is the medium’s electrical susceptibility. In this case the medium does not introduce any nonlinearities into Maxwell’s equations. In many dielectric media, however, a strong electric field can produce a polarization that is nonlinear in the field. In such “nonlinear media,” the general expression for the polarization $\mathbf{P}$ in terms of the electric field is

$$P_i = \epsilon_0\left(\chi_{ij}E_j + \chi_{ijk}E_jE_k + \chi_{ijkl}E_jE_kE_l + \dots\right). \tag{9.20}$$

Here $\chi_{ij}$, the linear susceptibility, is proportional to the 3-dimensional metric, $\chi_{ij} = \chi_0 g_{ij} = \chi_0\delta_{ij}$, if the medium is isotropic (i.e., if all directions in it are equivalent), but otherwise is more complicated; and $\chi_{ijk}, \chi_{ijkl}, \dots$ are nonlinear susceptibilities. The normalizations used for these susceptibilities differ from one researcher to another; sometimes the factor $\epsilon_0$ is omitted in Eq. (9.20); sometimes factors of 2 or 4 or ... are inserted.

When the nonlinear susceptibilities are important and a monochromatic wave at frequency $\omega$ enters the medium, the nonlinearities lead to harmonic generation, i.e., the production of secondary waves with frequencies $2\omega, 3\omega, \dots$; see below. As a result, an electric field in the medium cannot oscillate at just one frequency, and each of the electric fields in expression (9.20) for the polarization must be a sum of pieces with different frequencies. Because the susceptibilities can depend on frequency, this means that when using expression (9.20) one sometimes must break $P_i$ and each $E_i$ up into its frequency components and use different values of the susceptibility to couple the different frequencies together. For example, one of the terms in Eq. (9.20) will become

$$P_i^{(1)} = \epsilon_0\,\chi_{ijkl}^{(1234)}\,E_j^{(2)}E_k^{(3)}E_l^{(4)}, \tag{9.21}$$

where $P_i^{(1)}$ oscillates at frequency $\omega_1$, $E_j^{(A)}$ oscillates at frequency $\omega_A$, and $\chi_{ijkl}^{(1234)}$ depends on the four frequencies $\omega_1, \dots, \omega_4$. Although this is complicated in the general case, in most practical applications resonant couplings (or, equivalently, energy and momentum conservation for photons) guarantee that only a single set of frequencies is important, and the resulting analysis simplifies substantially; see Sec. 9.5.2 below.

Because all the tensor indices on the susceptibilities except the first get contracted into the electric field in expression (9.20), we are free to (and it is conventional to) define the susceptibilities as symmetric under interchange of any pair of indices that does not include the first. When [as has been tacitly assumed in Eq. (9.20)] there is no hysteresis in the medium’s response to the electric field, the energy density of interaction between the polarization and the electric field is

$$U = -\epsilon_0\left(\frac{\chi_{ij}E_iE_j}{2} + \frac{\chi_{ijk}E_iE_jE_k}{3} + \frac{\chi_{ijkl}E_iE_jE_kE_l}{4} + \cdots\right), \tag{9.22a}$$

and the polarization is related to this energy of interaction, in Cartesian coordinates, by

$$P_i = -\frac{\partial U}{\partial E_i}, \tag{9.22b}$$

which agrees with Eq. (9.20) provided the susceptibilities are symmetric under interchange of all pairs of indices, including the first. We shall assume such symmetry.⁶ If the crystal is isotropic (as will be the case if it has cubic symmetry and reflection symmetry), then each of its tensorial susceptibilities is constructible from the metric $g_{ij} = \delta_{ij}$ and a single scalar susceptibility; see Ex. 9.4:

$$\chi_{ij} = \chi_0 g_{ij}, \qquad \chi_{ijk} = 0, \qquad \chi_{ijkl} = \tfrac{1}{3}\chi_4\left(g_{ij}g_{kl} + g_{ik}g_{jl} + g_{il}g_{jk}\right), \qquad \chi_{ijklm} = 0, \qquad \cdots. \tag{9.23}$$

A simple model of a crystal which explains how nonlinear susceptibilities can arise is the following. Imagine each ion in the crystal as having a valence electron that can oscillate in response to a sinusoidal electric field. The electron can be regarded as residing in a potential well which, for low-amplitude oscillations, is very nearly harmonic (potential energy quadratic in displacement; restoring force proportional to displacement; “spring constant” independent of displacement). However, if the electron’s displacement from equilibrium becomes a significant fraction of the interionic distance, it will begin to feel the electrostatic attraction of the neighboring ions, and its spring constant will weaken. This means that the potential the electron sees is really not that of a harmonic oscillator, but rather that of an anharmonic oscillator, $V(x) = \alpha x^2 - \beta x^3 + \cdots$, where $x$ is the electron’s displacement from equilibrium. The nonlinearities in this potential cause the electron’s amplitude of oscillation, when driven by a sinusoidal electric field, to be nonlinear in the field strength, and that nonlinear displacement causes the crystal’s polarization to be nonlinear.⁷ For most crystals, the spatial arrangement of the ions causes the electron’s potential energy $V$ to be different for displacements in different directions, and this causes the susceptibilities to be anisotropic.

Because the total energy required to liberate the electron from its lattice site is roughly one eV, and the separation between lattice sites is $\sim 10^{-10}$ m, the characteristic electric field for strong instantaneous nonlinearities is $\sim 1\,{\rm V}/10^{-10}\,{\rm m} = 10^{10}\,{\rm V\,m^{-1}}$. Correspondingly, since $\chi_{ijk}$ has dimensions 1/(electric field) and $\chi_{ijkl}$ has dimensions 1/(electric field)$^2$, the largest that we can expect their Cartesian components to be is

$$\chi_{ijk} \sim 10^{-10}\,{\rm m\,V^{-1}}, \qquad \chi_4 \sim \chi_{ijkl} \sim 10^{-20}\,{\rm m^2\,V^{-2}}. \tag{9.24}$$

⁶ For further details see, e.g., Secs. 16.2–16.4 and 16.7 of Yariv (1989).
⁷ Quantitative details are worked out, e.g., in Sec. 16.3 of Yariv (1989).
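The anharmonic-oscillator picture can be made concrete with a short numerical sketch. It integrates $m\ddot x = -dV/dx - \gamma m\dot x + F\cos\omega t$ with $V = \alpha x^2 - \beta x^3$ in hypothetical dimensionless units ($m = 1$; the values of $\alpha$, $\beta$, $\omega$ and the drive $F$ are made up). The damping rate $\gamma$ is not part of the text’s model; it is added only so that start-up transients die away. The cubic term gives the displacement, and hence the polarization, a component at $2\omega$ whose amplitude grows as $F^2$, i.e., as the square of the applied field, which is the oscillator-level origin of a nonzero $\chi_{ijk}$. The numerical amplitude is compared with second-order perturbation theory.

```python
import math
import numpy as np

# Hypothetical dimensionless parameters (m = 1).
ALPHA, BETA = 0.5, 0.1   # V(x) = ALPHA x^2 - BETA x^3, so omega_0^2 = 2*ALPHA = 1
GAMMA = 0.2              # damping, added only so that transients decay
OMEGA = 0.3              # drive frequency, well below resonance

def accel(x, v, t, drive):
    """x'' = -dV/dx - GAMMA x' + drive cos(OMEGA t), with m = 1."""
    return -2.0 * ALPHA * x + 3.0 * BETA * x * x - GAMMA * v + drive * math.cos(OMEGA * t)

def second_harmonic(drive, steps_per_period=2000, settle=12, sample=10):
    """Amplitude of the 2*OMEGA component of x(t), via fourth-order Runge-Kutta."""
    dt = (2.0 * math.pi / OMEGA) / steps_per_period
    x = v = t = 0.0
    ts, xs = [], []
    for n in range((settle + sample) * steps_per_period):
        if n >= settle * steps_per_period:          # record only after transients decay
            ts.append(t)
            xs.append(x)
        k1x, k1v = v, accel(x, v, t, drive)
        k2x, k2v = v + 0.5 * dt * k1v, accel(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v, t + 0.5 * dt, drive)
        k3x, k3v = v + 0.5 * dt * k2v, accel(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v, t + 0.5 * dt, drive)
        k4x, k4v = v + dt * k3v, accel(x + dt * k3x, v + dt * k3v, t + dt, drive)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        t += dt
    ts, xs = np.array(ts), np.array(xs)
    # The samples span a whole number of drive periods, so other harmonics average out.
    return abs(2.0 * np.mean(xs * np.exp(-2j * OMEGA * ts)))

def second_harmonic_perturbative(drive):
    """Second order: 1.5*BETA*a1^2 / |omega_0^2 - 4 OMEGA^2 + 2i GAMMA OMEGA|."""
    w0sq = 2.0 * ALPHA
    a1 = drive / math.hypot(w0sq - OMEGA**2, GAMMA * OMEGA)   # linear response at OMEGA
    return 1.5 * BETA * a1**2 / math.hypot(w0sq - 4.0 * OMEGA**2, 2.0 * GAMMA * OMEGA)

for drive in (0.01, 0.02):
    print(drive, second_harmonic(drive), second_harmonic_perturbative(drive))
```

Doubling the drive should roughly quadruple the $2\omega$ amplitude, as long as the displacement stays a small fraction of the scale $\alpha/\beta$ on which the cubic term matters.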

For comparison, the strongest continuous-wave electric fields that occur in practical applications are $E \sim 10^6\,{\rm V\,m^{-1}}$, corresponding to maximum intensities $F \sim 1\,{\rm kW\,mm^{-2}}$. These numbers dictate that, unless the third-order $\chi_{ijk}$ are suppressed by isotropy, they will produce much larger effects than the fourth-order $\chi_{ijkl}$, which in turn will dominate over all higher orders. Among the dielectric crystals with especially strong nonlinear susceptibilities are barium titanate (BaTiO$_3$) and lithium niobate (LiNbO$_3$); they have $\chi_{ijk} \sim (1\ \text{to}\ 10)\times 10^{-11}\,({\rm V/m})^{-1}$ at optical frequencies; i.e., they get as large as our rough estimate for the upper limit.
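The arithmetic behind these statements can be checked in a few lines. The sketch assumes the vacuum relation between intensity and field amplitude, $F = \tfrac12 c\epsilon_0 E^2$ (inside a medium of index $n$ this picks up a factor $n$), and compares each nonlinear term of Eq. (9.20) with the linear one by taking $\chi_{ij} \sim 1$, which is our own order-of-magnitude assumption.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
C = 2.99792458e8          # speed of light, m/s

# Upper estimates of Eq. (9.24): characteristic field ~ 1 V across ~1e-10 m.
E_char = 1.0 / 1e-10              # V/m
chi3_max = 1.0 / E_char           # chi_ijk, m/V
chi4_max = 1.0 / E_char**2        # chi_ijkl, m^2/V^2

E = 1e6                                    # strong continuous-wave field, V/m
intensity = 0.5 * C * EPS0 * E**2          # W/m^2, vacuum relation (assumption)
intensity_kw_per_mm2 = intensity * 1e-6 / 1e3

# Size of each term in Eq. (9.20), relative to the linear term with chi_ij ~ 1.
ratio_3wave = chi3_max * E
ratio_4wave = chi4_max * E**2

print(f"F ~ {intensity_kw_per_mm2:.2f} kW/mm^2")
print(f"chi_ijk E ~ {ratio_3wave:.0e},  chi_ijkl E^2 ~ {ratio_4wave:.0e}")
```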

9.5.2

Wave-Wave Mixing; Resonance Conditions for 3-Wave Mixing

The nonlinear susceptibilities produce wave-wave mixing when a beam of light is sent through a crystal. The mixing produced by $\chi_{ijk}$ is called three-wave mixing because three electric fields appear in the polarization-induced interaction energy, Eq. (9.22a). The mixing produced by $\chi_{ijkl}$ is similarly called four-wave mixing. Three-wave mixing dominates in an anisotropic medium, but is suppressed when the medium is isotropic, leaving four-wave mixing as the leading-order nonlinearity.

Let us examine three-wave mixing in a general anisotropic crystal. Because the nonlinear susceptibilities are so small (i.e., because the input wave will generally be far weaker than $10^{10}\,{\rm V\,m^{-1}}$), the nonlinearities can be regarded as small perturbations. Suppose that two waves, labeled $n = 1$ and $n = 2$, are injected into the anisotropic crystal, and let their wave vectors be $\mathbf{k}_n$ when one ignores the (perturbative) nonlinear susceptibilities but keeps the large linear $\chi_{ij}$. Because $\chi_{ij}$ is anisotropic and a function of frequency, the dispersion relation for these waves (ignoring the nonlinearities), $\omega = \Omega(\mathbf{k})$, will typically be anisotropic. The frequencies of the two input waves satisfy this dispersion relation, $\omega_n = \Omega(\mathbf{k}_n)$, and the waves’ forms are

$$E_j^{(n)} = \Re\left[A_j^{(n)}e^{i(\mathbf{k}_n\cdot\mathbf{x}-\omega_n t)}\right] = \frac12\left[A_j^{(n)}e^{i(\mathbf{k}_n\cdot\mathbf{x}-\omega_n t)} + A_j^{(n)*}e^{i(-\mathbf{k}_n\cdot\mathbf{x}+\omega_n t)}\right], \tag{9.25}$$

where we have denoted their vectorial complex amplitudes by $A_j^{(n)}$. We shall adopt the convention that wave 1 is the one with the larger frequency, so $\omega_1 - \omega_2 \ge 0$. These two input waves couple, via the third-order nonlinear susceptibility $\chi_{ijk}$, to produce the following contribution to the medium’s polarization vector:

$$P_i = 2\epsilon_0\chi_{ijk}E_j^{(1)}E_k^{(2)} = \epsilon_0\chi_{ijk}\,\Re\left[A_j^{(1)}A_k^{(2)}e^{i(\mathbf{k}_1+\mathbf{k}_2)\cdot\mathbf{x}}e^{-i(\omega_1+\omega_2)t} + A_j^{(1)}A_k^{(2)*}e^{i(\mathbf{k}_1-\mathbf{k}_2)\cdot\mathbf{x}}e^{-i(\omega_1-\omega_2)t}\right]. \tag{9.26}$$

This sinusoidally oscillating polarization produces source terms in Maxwell’s equations (9.16b) and the wave equation (9.19): an oscillating, polarization-induced charge density $\rho_P = -\nabla\cdot\mathbf{P}$ and current density $\mathbf{j}_P = \partial\mathbf{P}/\partial t$. This polarization charge and current, like $\mathbf{P}$ itself [Eq. (9.26)], consist of two traveling waves, one with frequency and wave vector

$$\omega_3 = \omega_1 + \omega_2, \qquad \mathbf{k}_3 = \mathbf{k}_1 + \mathbf{k}_2; \tag{9.27a}$$

the other with frequency and wave vector

$$\omega_3 = \omega_1 - \omega_2, \qquad \mathbf{k}_3 = \mathbf{k}_1 - \mathbf{k}_2. \tag{9.27b}$$
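A quick numerical check that the cross term in Eq. (9.26) contains only the sum and difference frequencies of Eqs. (9.27a) and (9.27b): the sketch samples two real input fields at a single point in the crystal (so the spatial factors drop out), forms the product $2E^{(1)}E^{(2)}$, and locates its spectral lines with an FFT. The frequencies, in cycles per unit time, are hypothetical.

```python
import numpy as np

fs = 1000.0                          # samples per unit time (hypothetical)
N = 20000                            # 20 time units: whole cycles of every tone below
t = np.arange(N) / fs
f1, f2 = 7.0, 3.0                    # input frequencies, f1 > f2 (hypothetical)
E1 = np.cos(2 * np.pi * f1 * t)      # real fields at a fixed point, unit amplitude
E2 = np.cos(2 * np.pi * f2 * t)
product = 2.0 * E1 * E2              # the cross term 2 E1 E2 that drives P_i

spectrum = np.abs(np.fft.rfft(product)) / N
freqs = np.fft.rfftfreq(N, d=1.0 / fs)
lines = freqs[spectrum > 0.1]        # keep only the strong spectral lines
print(lines)                         # expected at f1 - f2 and f1 + f2
```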

If either of these $(\omega_3, \mathbf{k}_3)$ satisfies the medium’s dispersion relation $\omega = \Omega(\mathbf{k})$, then the charge-current wave will generate an electromagnetic wave that propagates along in resonance with itself. This new electromagnetic wave, with frequency $\omega_3$ and wave vector $\mathbf{k}_3$, will grow as it propagates, with its growth being along the direction of the group velocity $V_g^j = (\partial\Omega/\partial k_j)_{\mathbf{k}=\mathbf{k}_3}$. The wave will be weakest at the “back” of the crystal (the side where $\mathbf{V}_g\cdot\mathbf{x}$ is smallest), and strongest at the “front” (the side where $\mathbf{V}_g\cdot\mathbf{x}$ is largest).

For most choices of the input waves, i.e., most choices of $\mathbf{k}_1$, $\omega_1 = \Omega(\mathbf{k}_1)$, $\mathbf{k}_2$, and $\omega_2 = \Omega(\mathbf{k}_2)$, neither of the charge-density waves $(\mathbf{k}_3 = \mathbf{k}_1 \pm \mathbf{k}_2,\ \omega_3 = \omega_1 \pm \omega_2)$ will satisfy the medium’s dispersion relation, and thus neither will be able to create a third electromagnetic wave resonantly; the wave-wave coupling is ineffective. However, for certain special choices of the input waves, resonant coupling will be achieved, and a strong third wave will be produced.

The resonance conditions have simple quantum mechanical interpretations, a fact that is not at all accidental: quantum mechanics underlies the classical theory that we are developing. Each classical wave is carried by photons that have discrete energies $E_n = \hbar\omega_n$ and discrete momenta $\mathbf{p}_n = \hbar\mathbf{k}_n$. The input waves are able to produce, resonantly, waves with $\omega_3 = \omega_1 \pm \omega_2$ and $\mathbf{k}_3 = \mathbf{k}_1 \pm \mathbf{k}_2$, if those waves satisfy the dispersion relation. Restated in quantum mechanical terms, the condition of resonance with the “+” sign rather than the “−” is

$$E_3 = E_1 + E_2, \qquad \mathbf{p}_3 = \mathbf{p}_1 + \mathbf{p}_2. \tag{9.28a}$$

This has the quantum mechanical meaning that one photon of energy $E_1$ and momentum $\mathbf{p}_1$, and another of energy $E_2$ and momentum $\mathbf{p}_2$, combine together via the medium’s nonlinearities and are annihilated (in the language of quantum field theory), and by their annihilation they create a new photon with energy $E_3 = E_1 + E_2$ and momentum $\mathbf{p}_3 = \mathbf{p}_1 + \mathbf{p}_2$.
Thus, the classical condition of resonance is the quantum mechanical condition of energy-momentum conservation for the sets of photons involved in a quantum annihilation and creation process. For this process to proceed, not only must energy-momentum conservation be satisfied, but all three photons must have energies and momenta that obey the photons’ semiclassical Hamiltonian relation $E = H(\mathbf{p})$ (i.e., the dispersion relation $\omega = \Omega(\mathbf{k})$ with $H = \hbar\Omega$, $E = \hbar\omega$, and $\mathbf{p} = \hbar\mathbf{k}$). Similarly, the classical condition of resonance with the “−” sign rather than the “+” can be written (after bringing photon 2 to the left-hand side) as

$$E_3 + E_2 = E_1, \qquad \mathbf{p}_3 + \mathbf{p}_2 = \mathbf{p}_1. \tag{9.28b}$$

This has the quantum mechanical meaning that one photon of energy $E_1$ and momentum $\mathbf{p}_1$ gets annihilated, via the medium’s nonlinearities, and from its energy and momentum there are created two photons, with energies $E_2, E_3$ and momenta $\mathbf{p}_2, \mathbf{p}_3$ that satisfy energy-momentum conservation. This process has a rate (number of such photon annihilation/creation events per second) that is proportional to the number of photons (the total energy) in wave 1, because those photons are being absorbed, and also proportional to the number (the total energy) in wave 2, because those photons are being created via stimulated emission. Since the photon fluxes are proportional to $|A_1|^2$ and $|A_2|^2$, and since the rate of reactions is proportional to the rate of growth of $|A_3|^2$, this quantum rate dependence, rate $\propto |A_1|^2|A_2|^2$, means that $A_3$ must grow at a rate proportional to $|A_1||A_2|$, which is indeed the case, as we shall see in the following example:

9.5.3

Three-Wave Mixing: Growth Equation for an Idealized, Dispersion-Free, Isotropic Medium

Consider, as an example, the simple and idealized case where the linear part of the susceptibility $\chi_{jk}$ is isotropic and frequency-independent, $\chi_{jk} = \chi_0 g_{jk}$. Then the dispersion relation, ignoring the nonlinearities, takes the simple, nondispersive form (Ex. 9.6 in the isotropic limit)

$$\omega = \frac{c}{n}k, \qquad \text{where } k = |\mathbf{k}|, \quad n = \sqrt{1+\chi_0}, \tag{9.29}$$

with $n$ a constant. Let the input waves $E^{(1)}$ and $E^{(2)}$ both propagate in the z direction. Then the resonance condition (energy-momentum conservation for photons) requires that the new, third wave also propagate in the z direction, with frequency and wave number $\omega_3 = \omega_1 \pm \omega_2$, $k_3 = k_1 \pm k_2$. This is a highly unusual situation: because the input waves satisfy the dispersion relation (9.29), so also do both of the possible output waves. Making use of the fact that the lengthscale on which the new wave grows is long compared to a wavelength (which is always the case because the fields are always much weaker than $10^{10}\,{\rm V\,m^{-1}}$), the wave equation (9.19) with $P_i = \epsilon_0\chi_0 E_i + \epsilon_0\chi_{ijk}E_jE_k$ implies the following growth equations for the new wave’s amplitudes (Ex. 9.5):

$$\begin{aligned}
\frac{dA_i^{(3)}}{dz} &= -2i\,\frac{k_3}{n^2}\,\chi_{ijk}^{(123)}A_j^{(1)}A_k^{(2)} &&\text{at } \omega_3 = \omega_1 + \omega_2,\ k_3 = k_1 + k_2,\\
\frac{dA_i^{(3)}}{dz} &= -2i\,\frac{k_3}{n^2}\,\chi_{ijk}^{(123)}A_j^{(1)}A_k^{(2)*} &&\text{at } \omega_3 = \omega_1 - \omega_2,\ k_3 = k_1 - k_2.
\end{aligned} \tag{9.30a}$$

Therefore, as claimed above, the new wave’s amplitude $A_i^{(3)}$ grows linearly with distance $z$ travelled, and its growth rate is proportional to the product of the field strengths of the two input waves.
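The contrast between resonant and non-resonant mixing can be sketched by lumping the prefactor of Eq. (9.30a) into a single coupling constant $g$ and holding the input amplitudes fixed (no depletion of the inputs, an assumption). A residual mismatch $\Delta k = k_1 + k_2 - k_3$, which the idealized medium above does not have but a dispersive medium generally does, multiplies the source by $e^{i\Delta k z}$. With $\Delta k = 0$ the output grows linearly in $z$; with $\Delta k \neq 0$ it oscillates with period $2\pi/|\Delta k|$ and never exceeds $2|gA_1A_2|/|\Delta k|$. All numbers are hypothetical, and the trapezoid integration is only an independent check of the closed form.

```python
import numpy as np

def closed_form(z, g, a1, a2, dk):
    """|A3(z)| solving dA3/dz = g a1 a2 exp(i dk z), A3(0) = 0, with a1, a2 held fixed."""
    if dk == 0.0:
        return np.abs(g * a1 * a2 * z)                                    # resonant: linear growth
    return np.abs(g * a1 * a2 * (np.exp(1j * dk * z) - 1.0) / (1j * dk))  # bounded, oscillating

def trapezoid_growth(z, g, a1, a2, dk):
    """Independent check: integrate the same equation with the trapezoid rule."""
    rate = g * a1 * a2 * np.exp(1j * dk * z)
    steps = 0.5 * (rate[1:] + rate[:-1]) * np.diff(z)
    return np.abs(np.concatenate(([0.0], np.cumsum(steps))))

z = np.linspace(0.0, 10.0, 2001)          # propagation distance, arbitrary units
g, a1, a2 = 1.0, 0.5, 0.5                 # hypothetical coupling and input amplitudes
matched = closed_form(z, g, a1, a2, 0.0)
mismatched = closed_form(z, g, a1, a2, 2.0)
print(matched[-1], mismatched.max())      # linear growth vs. a bounded oscillation
```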

9.5.4

Three-Wave Mixing: Resonance Conditions and Growth Equation for an Anisotropic, Axisymmetric Medium; Frequency Doubling

In reality, all nonlinear media have frequency-dependent dispersion relations, and many are anisotropic. An example is the crystal KH$_2$PO$_4$, also called “KDP”, which is axisymmetric. If we orient its symmetry axis along the z direction, then its linear susceptibility $\chi_{ij}$ has as its only nonzero components $\chi_{11} = \chi_{22}$ and $\chi_{33}$, which we embody in two indices of refraction,

$$n_o = \sqrt{1+\chi_{11}} = \sqrt{1+\chi_{22}}, \qquad n_e = \sqrt{1+\chi_{33}}, \tag{9.31}$$

that depend on frequency as shown in Fig. 9.10(a). The subscript “o” stands for ordinary; “e”, for extraordinary; see below. Maxwell’s equations imply that for plane, monochromatic waves propagating in the x-z plane at an angle $\theta$ to the symmetry axis [$\mathbf{k} = k(\sin\theta\,\mathbf{e}_x + \cos\theta\,\mathbf{e}_z)$], there are two dispersion relations, corresponding to the two polarizations of the electric field: (i) If $\mathbf{E}$ is orthogonal to the symmetry axis, then (as is shown in Ex. 9.6) it must also be orthogonal to the propagation direction (i.e., must point in the $\mathbf{e}_y$ direction), and the dispersion relation is

$$\frac{\omega/k}{c} = (\text{phase speed in units of speed of light}) = \frac{1}{n_o}. \tag{9.32a}$$

[Figure 9.10: (a) $n^{-1}$ versus $k$ (µm$^{-1}$) for KDP, with the upper curve $n_e^{-1}$ (extraordinary, $\theta = \pi/2$) and the lower curve $n_o^{-1}$ (ordinary, any $\theta$), and points A and B marked; (b) propagation angle $\theta$ (degrees) versus $k_1$ (µm$^{-1}$), with point A marked.]

Fig. 9.10: (a) The inverse of the index of refraction (equal to the phase speed in units of the speed of light) for electromagnetic waves propagating at an angle $\theta$ to the symmetry axis of a KDP crystal, as a function of wave number $k$ in reciprocal microns. See Eq. (9.32a) for the lower curve and Eq. (9.32b) with $\theta = \pi/2$ for the upper curve. For extraordinary waves propagating at an arbitrary angle $\theta$ to the crystal’s symmetry axis, $n^{-1}$ is a mean [Eq. (9.32b)] of the two plotted curves. (b) The angle $\theta$ to the symmetry axis at which ordinary waves with wave number $k_1$ (e.g., point A) must propagate in order that 3-wave mixing be able to produce frequency-doubled or phase-conjugated extraordinary waves (e.g., point B).

These waves are called ordinary, and their phase speed (9.32a) is the lower curve in Fig. 9.10(a); at $k = 10\,\mu{\rm m}^{-1}$ (point A), the phase speed is $0.663c$, while at $k = 20\,\mu{\rm m}^{-1}$ it is $0.649c$. (ii) If $\mathbf{E}$ is not orthogonal to the symmetry axis, then (Ex. 9.6) it must lie in the plane formed by $\mathbf{k}$ and the symmetry axis (the x-z plane), with $E_x/E_z = -(n_e/n_o)^2\cot\theta$ [which means that $\mathbf{E}$ is not orthogonal to the propagation direction unless the crystal is isotropic, $n_e = n_o$]; and the dispersion relation is

$$\frac{\omega/k}{c} = \frac{1}{n} = \sqrt{\frac{\cos^2\theta}{n_o^2} + \frac{\sin^2\theta}{n_e^2}}. \tag{9.32b}$$

In this case the waves are called extraordinary. As the propagation direction varies from parallel to the symmetry axis ($\cos\theta = 1$) to perpendicular ($\sin\theta = 1$), this extraordinary phase speed varies from $c/n_o$ (the lower curve in Fig. 9.10; $0.663c$ at $k = 10\,\mu{\rm m}^{-1}$) to $c/n_e$ (the upper curve; $0.681c$ at $k = 10\,\mu{\rm m}^{-1}$).
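One way to check Eqs. (9.32a), (9.32b) and the polarization ratio $E_x/E_z$ numerically is to solve the plane-wave form of Maxwell’s equations, $\mathbf{k}\times(\mathbf{k}\times\mathbf{E}) + (\omega/c)^2\boldsymbol{\epsilon}\,\mathbf{E} = 0$ with relative permittivity $\boldsymbol{\epsilon} = {\rm diag}(n_o^2, n_o^2, n_e^2)$, as an eigenvalue problem for $(\omega/c)^2$. This is a sketch, not the method intended in Ex. 9.6, and the index values are hypothetical rather than KDP data. One eigenvalue is zero (a spurious longitudinal mode); the other two should reproduce the ordinary and extraordinary branches.

```python
import numpy as np

def uniaxial_modes(theta, n_o, n_e, k=1.0):
    """Eigenvalues (omega/c)^2 and eigenvectors E for a plane wave in a uniaxial crystal.

    Symmetry axis along z, k in the x-z plane at angle theta to the axis.
    Solves eps^{-1} (k^2 I - k k^T) E = (omega/c)^2 E.
    """
    kvec = k * np.array([np.sin(theta), 0.0, np.cos(theta)])
    eps_inv = np.diag([1.0 / n_o**2, 1.0 / n_o**2, 1.0 / n_e**2])
    op = eps_inv @ (k**2 * np.eye(3) - np.outer(kvec, kvec))
    vals, vecs = np.linalg.eig(op)
    order = np.argsort(vals.real)        # spurious zero mode first
    return vals.real[order], vecs.real[:, order]

theta, n_o, n_e = 0.6, 1.50, 1.47        # hypothetical angle (rad) and indices, n_e < n_o
vals, vecs = uniaxial_modes(theta, n_o, n_e)
print("ordinary 1/n:", np.sqrt(vals[1]), " expected:", 1 / n_o)
print("extraordinary 1/n:", np.sqrt(vals[2]),
      " Eq. (9.32b):", np.sqrt(np.cos(theta)**2 / n_o**2 + np.sin(theta)**2 / n_e**2))
e = vecs[:, 2]
print("Ex/Ez:", e[0] / e[2], " expected:", -(n_e / n_o)**2 / np.tan(theta))
```

With $n_e < n_o$ the extraordinary eigenvalue is the largest, which is why the sketch can identify the modes by sorting.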

Now consider the resonance conditions for a frequency-doubling device (discussed in greater detail in the next section): one in which the two input waves are identical, so $\mathbf{k}_1 = \mathbf{k}_2$ and $\mathbf{k}_3 = 2\mathbf{k}_1$ point in the same direction. Let this common propagation direction be at an angle $\theta$ to the symmetry axis. Then the resonance conditions reduce to the demands that the output wave number be twice the input wave number, $k_3 = 2k_1$, and that the output phase speed be the same as the input phase speed, $\omega_3/k_3 = \omega_1/k_1$. Now, for waves of the same type (both ordinary or both extraordinary), the phase speed is a monotonically decreasing function of wave number [Fig. 9.10 and Eqs. (9.32a), (9.32b)], so there is no choice of propagation angle $\theta$ that enables these resonance conditions to be satisfied. The only way to satisfy them is by using ordinary input waves and extraordinary output waves, and then only for a special, frequency-dependent propagation direction. For example, if the input waves are ordinary, with $k = 10\,\mu{\rm m}^{-1}$ [point A in Fig. 9.10(a)], then the output waves must be extraordinary and must have the same phase speed as the input waves [same height in Fig. 9.10(a); i.e., point B]. This phase speed is between $c/n_e$ and $c/n_o$, and thus can be achieved for a special choice of propagation angle: $\theta = 56.7^\circ$ [point A in Fig. 9.10(b)]. In general, Eqs. (9.32a) and (9.32b) imply that the unique propagation direction $\theta$ at which the resonance conditions can be satisfied is the following function of the input wave number $k_1$:

$$\sin^2\theta = \frac{1/n_o^2(k_1) - 1/n_o^2(2k_1)}{1/n_e^2(2k_1) - 1/n_o^2(2k_1)}. \tag{9.33}$$

This resonance angle is plotted as a function of the input wave number $k_1$ for KDP in Fig. 9.10(b). This special case of identical input waves illustrates the very general phenomenon that, at fixed input frequencies, the resonance conditions can be satisfied only for special, discrete input and output directions.
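Eq. (9.33) is easy to evaluate once the indices are known. In the sketch below, the ordinary indices $n_o(k_1) = 1/0.663$ and $n_o(2k_1) = 1/0.649$ come from the phase speeds quoted above for $k_1 = 10\,\mu{\rm m}^{-1}$; the extraordinary index at $2k_1$ is not given in the text, so the value used is hypothetical, and the resulting angle is not the book’s $56.7^\circ$. The check confirms that, at the computed angle, the extraordinary output wave of Eq. (9.32b) has the same phase speed as the ordinary input wave.

```python
import math

def extraordinary_inverse_index(theta, n_o, n_e):
    """Phase speed over c, i.e. 1/n, of an extraordinary wave: Eq. (9.32b)."""
    return math.sqrt(math.cos(theta)**2 / n_o**2 + math.sin(theta)**2 / n_e**2)

def doubling_angle(n_o_k1, n_o_2k1, n_e_2k1):
    """Resonance angle of Eq. (9.33); None when no real angle satisfies it."""
    s2 = (1.0 / n_o_k1**2 - 1.0 / n_o_2k1**2) / (1.0 / n_e_2k1**2 - 1.0 / n_o_2k1**2)
    if not 0.0 <= s2 <= 1.0:
        return None
    return math.asin(math.sqrt(s2))

n_o_k1, n_o_2k1 = 1 / 0.663, 1 / 0.649   # from the phase speeds quoted in the text
n_e_2k1 = 1 / 0.675                      # hypothetical: not given in the text
theta = doubling_angle(n_o_k1, n_o_2k1, n_e_2k1)
print(math.degrees(theta))
# At this angle the output phase speed equals the input phase speed 1/n_o(k1) = 0.663.
print(extraordinary_inverse_index(theta, n_o_2k1, n_e_2k1))
```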
Also quite generally, once one has found wave vectors and frequencies that satisfy the resonance conditions, the growth rate of the new waves is governed by equations similar to Eq. (9.30a), but with the growth along the direction of the group velocity of the new, growing wave; see Ex. 9.7. For example, in any 3-wave process with $\mathbf k_3 = \mathbf k_1 + \mathbf k_2$ and $\omega_3 = \omega_1 + \omega_2$, the rate of growth of the new wave with distance s along the group-velocity direction is (Ex. 9.7)

$$\frac{dA^{(3)}_l}{ds} = -2i\alpha\,\frac{k_3}{n_3^2}\,\chi^{(123)}_{ijk}A^{(1)}_iA^{(2)}_jf^{(3)}_kf^{(3)}_l\,, \tag{9.34}$$

where α is a coefficient of order unity that depends on the dispersion relation, and $f^{(3)}_l$ is a unit vector pointing along the electric-field direction of the new wave (3). For our frequency-doubling example, the extraordinary dispersion relation (9.32b) for the output wave can be rewritten as

$$\omega = \Omega_e(\mathbf k) = \frac{ck}{n} = c\sqrt{\frac{k_z^2}{n_o(k)^2} + \frac{k_x^2}{n_e(k)^2}}\,, \quad\text{where } k = \sqrt{k_x^2 + k_z^2}\,. \tag{9.35}$$
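Differentiating Eq. (9.35) gives the group velocity $V_g^j = \partial\Omega_e/\partial k_j$. The minimal Python sketch below computes it in two ways:
- by central finite differences of $\Omega_e$;
- from the component formulas of Eq. (9.36), obtained by differentiating Eq. (9.35) by hand with k-dependent indices.

The linear index models `n_o(k)` and `n_e(k)` are hypothetical, chosen only so that both indices rise slowly with k as in normal dispersion. They are not KDP data, so the numbers printed are not those quoted in the text for KDP.

```python
import math

C = 1.0  # speed of light; units with c = 1, wave numbers in inverse microns


def n_o(k):
    """Hypothetical ordinary index rising slowly with k (assumption, not KDP data)."""
    return 1.50 + 0.002 * k


def n_e(k):
    """Hypothetical extraordinary index (assumption, not KDP data)."""
    return 1.46 + 0.0015 * k


def omega_e(kx, kz):
    """Extraordinary dispersion relation, Eq. (9.35)."""
    k = math.hypot(kx, kz)
    return C * math.sqrt(kz ** 2 / n_o(k) ** 2 + kx ** 2 / n_e(k) ** 2)


def group_velocity_numeric(kx, kz, h=1e-6):
    """V_g^j = dOmega_e/dk_j by central finite differences."""
    vx = (omega_e(kx + h, kz) - omega_e(kx - h, kz)) / (2 * h)
    vz = (omega_e(kx, kz + h) - omega_e(kx, kz - h)) / (2 * h)
    return vx, vz


def group_velocity_closed_form(kx, kz, dk=1e-6):
    """Component formulas from differentiating Eq. (9.35) by hand, as in Eq. (9.36);
    only the logarithmic derivatives d ln n / d ln k are taken numerically."""
    k = math.hypot(kx, kz)
    theta = math.atan2(kx, kz)  # angle from the symmetry (z) axis
    no, ne = n_o(k), n_e(k)
    n = C * k / omega_e(kx, kz)
    dln_no = k * (math.log(n_o(k + dk)) - math.log(n_o(k - dk))) / (2 * dk)
    dln_ne = k * (math.log(n_e(k + dk)) - math.log(n_e(k - dk))) / (2 * dk)
    shared = ((n ** 2 * math.cos(theta) ** 2 / no ** 2) * dln_no
              + (n ** 2 * math.sin(theta) ** 2 / ne ** 2) * dln_ne)
    v_ph = C / n
    return (v_ph * math.sin(theta) * (n ** 2 / ne ** 2 - shared),
            v_ph * math.cos(theta) * (n ** 2 / no ** 2 - shared))


if __name__ == "__main__":
    th = math.radians(56.7)
    kx, kz = 20 * math.sin(th), 20 * math.cos(th)
    vx, vz = group_velocity_numeric(kx, kz)
    print("numeric:    ", vx, vz)
    print("closed form:", group_velocity_closed_form(kx, kz))
    print("walk-off (deg):", math.degrees(math.atan2(vx, vz)) - 56.7)
```

With $n_e < n_o$, the group velocity tilts slightly away from the symmetry axis relative to k. This matches the sense of the walk-off described in the text (58.4° versus 56.7°).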

Correspondingly, the group velocity $V_g^j = \partial\Omega_e/\partial k_j$ for the output waves has components

$$V_g^x = V_{ph}\sin\theta\left[\frac{n^2}{n_e^2} - \frac{n^2\cos^2\theta}{n_o^2}\frac{d\ln n_o}{d\ln k} - \frac{n^2\sin^2\theta}{n_e^2}\frac{d\ln n_e}{d\ln k}\right],$$
$$V_g^z = V_{ph}\cos\theta\left[\frac{n^2}{n_o^2} - \frac{n^2\cos^2\theta}{n_o^2}\frac{d\ln n_o}{d\ln k} - \frac{n^2\sin^2\theta}{n_e^2}\frac{d\ln n_e}{d\ln k}\right], \tag{9.36}$$

where $V_{ph} = \omega/k = c/n$ is the phase velocity. Take ordinary input waves with $k_1 = 10$ µm⁻¹ (point A) and extraordinary output waves with $k_3 = 20$ µm⁻¹ (point B). For these, the formulae give the direction of the output group velocity (the direction along which the output waves grow) as $\theta_g = \arctan(V_g^x/V_g^z) = 58.4°$. The direction of the common input-output phase velocity is θ = 56.7°. They also give the magnitude of the group velocity as $V_g = 0.628c$, compared to the common phase velocity $V_{ph} = 0.663c$. Thus the group velocity and the phase velocity differ, but only slightly: by about 2° in direction and 5% in magnitude.

****************************
EXERCISES

Exercise 9.4 Derivation and Example: Nonlinear Susceptibilities for an Isotropic Medium
Explain why the nonlinear susceptibilities for an isotropic medium have the forms given in Eq. (9.23). [Hint: Use the following facts. The χ's must be symmetric in all their indices. Because the medium is isotropic, the χ's must be constructible from the only isotropic tensors available to us: the (symmetric) metric tensor $g_{ij}$ and the (antisymmetric) Levi-Civita tensor $\epsilon_{ijk}$.] What are the corresponding forms, in an isotropic medium, of $\chi_{ijklmn}$ and $\chi_{ijklmnp}$?

Exercise 9.5 Derivation: Growth Equation in Idealized Three-Wave Mixing
Use Maxwell's equations to derive the propagation equations (9.30a) for the new wave produced by three-wave mixing under the idealized dispersion-free conditions described in the text.

Exercise 9.6 *** Example: Dispersion Relation for an Anisotropic Medium
Consider a wave propagating through a dielectric medium that is anisotropic but, for the moment, not necessarily axisymmetric. Let the wave be sufficiently weak that nonlinear effects are unimportant.
Define the wave's displacement vector in the usual way, $D_i = \epsilon_0E_i + P_i$ [Eq. (9.17)]. (a) Show that

$$D_i = \epsilon_0\epsilon_{ij}E_j\,, \quad\text{where } \epsilon_{ij} \equiv \delta_{ij} + \chi_{ij} \equiv \text{“dielectric tensor”}; \tag{9.37}$$

$\epsilon_0$ is often absorbed into the dielectric tensor, but we find it more convenient to normalize $\epsilon_{ij}$ so that in vacuum $\epsilon_{ij} = \delta_{ij}$.

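(b) Show that the wave equation (9.19) for the electric field takes the form

$$-\nabla^2\mathbf E + \nabla(\nabla\cdot\mathbf E) = -\frac{1}{c^2}\,\boldsymbol\epsilon\cdot\frac{\partial^2\mathbf E}{\partial t^2}\,. \tag{9.38}$$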

(d) Now specialize to a monochromatic plane wave with angular frequency ω and wave vector k. Show that the wave equation (9.38) reduces to

$$L_{ij}E_j = 0\,, \quad\text{where } L_{ij} = k_ik_j - k^2\delta_{ij} + \frac{\omega^2}{c^2}\epsilon_{ij}\,. \tag{9.39a}$$

This equation says that E is an eigenvector of L with vanishing eigenvalue, which is possible if and only if

$$\det\|L_{ij}\| = 0\,. \tag{9.39b}$$

This vanishing determinant is the waves' dispersion relation. We shall use it in Chap. 20 to study waves in an unmagnetized plasma. (e) Now specialize to an axisymmetric medium and orient the symmetry axis along the z direction, so the only nonvanishing components of $\epsilon_{ij}$ are $\epsilon_{11} = \epsilon_{22}$ and $\epsilon_{33}$. Let the wave propagate in a direction $\hat{\mathbf k}$ that makes an angle θ to the symmetry axis. Show that in this case $L_{ij}$ has the form

$$\|L_{ij}\| = k^2\begin{pmatrix} (n_o/n)^2 - \cos^2\theta & 0 & \sin\theta\cos\theta \\ 0 & (n_o/n)^2 - 1 & 0 \\ \sin\theta\cos\theta & 0 & (n_e/n)^2 - \sin^2\theta \end{pmatrix}, \tag{9.40a}$$

and the dispersion relation (9.39b) reduces to

$$\left(\frac{1}{n^2} - \frac{1}{n_o^2}\right)\left(\frac{1}{n^2} - \frac{\cos^2\theta}{n_o^2} - \frac{\sin^2\theta}{n_e^2}\right) = 0\,, \tag{9.40b}$$

where $1/n = \omega/kc$, $n_o = \sqrt{\epsilon_{11}} = \sqrt{\epsilon_{22}}$, and $n_e = \sqrt{\epsilon_{33}}$, in accord with Eq. (9.31).

(f) Show that this dispersion relation has the two solutions (ordinary and extraordinary) discussed in the text, Eqs. (9.32a) and (9.32b), and show that the electric fields associated with these two solutions point in the directions described in the text.

Exercise 9.7 Growth Equation in Realistic Wave-Wave Mixing
Consider a wave-wave mixing process in which the new wave (without its source) satisfies a linearized wave equation of the form

$$L_{jk}(-i\nabla, i\partial/\partial t)\,E_k = 0\,, \tag{9.41}$$

where $L_{jk}$ is some function of its indicated arguments. For an anisotropic dielectric medium, $L_{jk}$ will have a form that can be read off Eqs. (9.38) and (9.39a). In this exercise we use the more general form (9.41) for the wave equation so that our analysis will be valid for waves in a magnetized plasma (Chap. 20) as well as in a dielectric. Let k, ω, and f be a wave vector, angular frequency, and unit vector such that $E_j = e^{i(\mathbf k\cdot\mathbf x - \omega t)}f_j$ satisfies this wave equation.

(a) Suppose that several other waves interact nonlinearly to produce a piece $P_j^{NL}$ of the polarization. This piece propagates as a plane wave with wave vector, angular frequency, and electric-field direction k, ω, and f, i.e., it propagates in resonance with the new wave. Show that this polarization will resonantly generate the new wave in a manner described by the equation

$$\left(\frac{\partial L_{jk}}{\partial\omega}\frac{\partial}{\partial t} - \frac{\partial L_{jk}}{\partial k_l}\frac{\partial}{\partial x_l}\right)E_k = i\mu_0\frac{\partial^2}{\partial t^2}P_j^{NL}\,. \tag{9.42}$$

Here, in the functional form (9.41) of $L_{jk}$, $-i\nabla$ has been replaced by k and $i\partial/\partial t$ by ω. (b) Orient the axes of a (primed) coordinate system along the eigendirections of $L_{jk}$. Then its only nonzero components (when evaluated for the resonant wave vector and frequency) are $L_{1'1'} = \lambda_1 = 0$, $L_{2'2'} = \lambda_2 \neq 0$, and $L_{3'3'} = \lambda_3 \neq 0$. (The vanishing of one of the eigenvalues is demanded by the dispersion relation $\det\|L_{ab}\| = 0$.) Show that the group velocity of the new, resonant wave is given by

$$V_g^j = -\frac{\partial\lambda_1/\partial k_j}{\partial\lambda_1/\partial\omega}\,, \tag{9.43}$$

and the direction of its electric field vector is $\mathbf f = \mathbf e_{1'}$. (c) By a computation in the primed coordinate system, show that the growth equation (9.42) for the new wave has the form

$$\frac{dA_{1'}}{dt} = \frac{-i\mu_0\omega^2P^{NL}_{1'}}{\partial\lambda_1/\partial\omega}\,, \tag{9.44a}$$

where d/dt is the time derivative moving with the group velocity,

$$\frac{d}{dt} = \frac{\partial}{\partial t} + V_g^j\frac{\partial}{\partial x^j}\,. \tag{9.44b}$$

(d) For 3-wave mixing (e.g., frequency-doubling), show that this growth equation takes the form (9.34), and evaluate the coefficient α.

****************************
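The following minimal Python sketch checks the axisymmetric dispersion relation of Ex. 9.6 numerically:
- It builds $L_{ij}/k^2$ from Eq. (9.40a).
- It confirms that the determinant vanishes at both roots of Eq. (9.40b): the ordinary root $n = n_o$ and the extraordinary root from Eq. (9.32b).
- It confirms that the extraordinary field direction $E_x/E_z = -(n_e/n_o)^2\cot\theta$ is the null eigenvector.

The indices and the angle are arbitrary illustrative values, not data for a specific crystal.

```python
import math


def dispersion_matrix(n, n_o, n_e, theta):
    """L_ij / k^2 from Eq. (9.40a): symmetry axis along z, wave vector in the x-z
    plane at angle theta to that axis, and 1/n = omega/(k c)."""
    s, c = math.sin(theta), math.cos(theta)
    return [
        [(n_o / n) ** 2 - c * c, 0.0, s * c],
        [0.0, (n_o / n) ** 2 - 1.0, 0.0],
        [s * c, 0.0, (n_e / n) ** 2 - s * s],
    ]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def mat_vec(m, v):
    """Matrix-vector product for 3x3 lists."""
    return [sum(m[r][j] * v[j] for j in range(3)) for r in range(3)]


def extraordinary_index(n_o, n_e, theta):
    """Root n of the second factor of Eq. (9.40b)."""
    return 1.0 / math.sqrt(math.cos(theta) ** 2 / n_o ** 2 + math.sin(theta) ** 2 / n_e ** 2)


# Illustrative indices and angle (assumptions, not data for a specific crystal).
N_O, N_E, THETA = 1.51, 1.47, math.radians(40.0)

if __name__ == "__main__":
    n_ext = extraordinary_index(N_O, N_E, THETA)
    print("det L at n = n_o:  ", det3(dispersion_matrix(N_O, N_O, N_E, THETA)))
    print("det L at n = n_ext:", det3(dispersion_matrix(n_ext, N_O, N_E, THETA)))
    # Extraordinary field direction: E_x/E_z = -(n_e/n_o)^2 cot(theta), E_y = 0.
    e_field = [-(N_E / N_O) ** 2 / math.tan(THETA), 0.0, 1.0]
    print("L.E:", mat_vec(dispersion_matrix(n_ext, N_O, N_E, THETA), e_field))
```

The ordinary solution corresponds to the null vector (0, 1, 0): its field is perpendicular to the plane containing k and the symmetry axis.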

9.6

Applications of Wave-Wave Mixing: Frequency Doubling, Phase Conjugation, and Squeezing

9.6.1

Frequency Doubling

[Figure 9.11: a nonlinear crystal pumped by wave $E_1$ (pump), together with a beam splitter and an ordinary mirror; the waves are labeled $E_1$, $E_2$, and $E_3$.]
Fig. 9.11: A phase-conjugating mirror based on 3-wave mixing.

Frequency doubling (also called second-harmonic generation) is one of the most important applications of wave-wave mixing. As we have seen in the previous section, it can be achieved by passing a single wave (which plays the role of both wave A = 1 and wave A = 2) through a nonlinear crystal, with the propagation direction chosen to satisfy the resonance conditions [Eq. (9.33) and Fig. 9.10]. Three-wave mixing produces an output wave A = 3 with $\omega_3 = 2\omega_1$ that grows with distance inside the crystal at a rate given by a variant of Eq. (9.30a). By satisfying the resonance conditions sufficiently well and choosing the thickness of the crystal appropriately, one can convert close to 100% of the input-wave energy into frequency-doubled energy.

As an example, consider the neodymium:YAG (Nd³⁺:YAG) laser, which is based on an yttrium aluminum garnet crystal with trivalent neodymium impurities. It is the most attractive of all lasers for its combination of high frequency stability, moderately high power, and high efficiency. However, this laser operates in the infrared, at a wavelength of 1.0641 microns. For some purposes one wants visible light. This can be achieved by frequency doubling the output of the Nd³⁺:YAG laser, which converts nearly all of the laser's output power into 0.532-micron (green) light; cf. Ex. 9.8. This is how green laser pointers, used in lecturing, work.

Frequency doubling also plays a key role in laser fusion, where intense, pulsed laser beams, focused on a pellet of fusion fuel, compress and heat the pellet to high densities and temperatures. The beam's energy flux is inversely proportional to the area of its focused cross section. The longer the wavelength, the more seriously diffraction impedes making that cross section small. It is therefore important to give the beam a very short wavelength, and this is achieved by multiple frequency doublings.
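The wavelength argument for laser fusion can be made quantitative with a minimal sketch. It assumes equal beam power and identical focusing optics at every wavelength, so that a diffraction-limited focal spot has radius proportional to the wavelength and area proportional to its square. It also ignores conversion losses in the doubling crystals. Under these assumptions each frequency doubling quadruples the achievable focal energy flux. The function names are illustrative.

```python
def doubled_wavelengths(lambda0, n_doublings):
    """Vacuum wavelengths after 0, 1, ..., n_doublings successive frequency doublings
    (each doubling sends omega -> 2*omega, i.e. wavelength -> wavelength/2)."""
    return [lambda0 / 2 ** m for m in range(n_doublings + 1)]


def relative_focal_flux(wavelength, reference_wavelength):
    """Energy flux at a diffraction-limited focus relative to the reference wavelength,
    assuming equal beam power and identical focusing optics, so that the focal-spot
    radius scales as the wavelength and its area as the wavelength squared.
    Conversion losses in the doubling crystals are ignored."""
    return (reference_wavelength / wavelength) ** 2


if __name__ == "__main__":
    for lam in doubled_wavelengths(1.0641, 2):  # Nd:YAG wavelength in microns
        print(f"{lam:.5f} micron  flux gain {relative_focal_flux(lam, 1.0641):.0f}x")
```

The first doubling reproduces the 0.532-micron green line quoted above.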

9.6.2

Phase Conjugation

A second example of three-wave mixing is phase conjugation. Phase conjugation is achieved by pumping the crystal with a very strong plane wave n = 1 at frequency $\omega_1 = 2\omega_2$, and using as wave n = 2 a much weaker wave that is to be phase conjugated. In this case the propagation equation (9.30a) produces a new wave, n = 3, at frequency $\omega_3 = \omega_1 - \omega_2 = \omega_2$, with complex amplitude $A^{(3)}_i$ proportional to $A^{(2)*}_k$. An ordinary mirror can then be used to turn the new wave around (Fig. 9.11); the result is a fully phase-conjugated wave. Thus the three-wave-coupling crystal and the ordinary mirror together function as a phase-conjugating mirror.

Four-wave mixing in an isotropic crystal can also be used for phase conjugation. It needs no ordinary mirror, and it avoids the difficulties in maintaining temporal and spatial resonance simultaneously that plague three-wave mixing. The price is working with a much weaker nonlinearity. Suppose that the goal is to phase conjugate a wave n = 1 that enters an isotropic crystal moving in the +z direction. Wave n = 4 is to be the fully phase-conjugated wave, and thus must leave the crystal moving in the −z direction. Waves n = 2 and 3, used to pump the crystal, are chosen to have the same frequency as the wave to be conjugated: $\omega_2 = \omega_3 = \omega_1 = \omega_4 \equiv \omega$. The pump waves are injected into the crystal moving perpendicular to the incoming wave n = 1 and in opposite directions, so their wave vectors are $\mathbf k_2 = -\mathbf k_3 \perp \mathbf k_1$. Then the four-wave-mixing polarization produced by the two pump waves and the incoming wave will include a term

$$P_i = \epsilon_0\,\Re\!\left[\tfrac{1}{3}\chi_4\,A^{(2)}_je^{i(\mathbf k_2\cdot\mathbf x - \omega t)}A^{(3)}_je^{i(\mathbf k_3\cdot\mathbf x - \omega t)}A^{(1)*}_ie^{-i(kz - \omega t)}\right]. \tag{9.45a}$$

Since $\mathbf k_2 + \mathbf k_3 = 0$, this term varies as $e^{-i(kz + \omega t)}$. Via Maxwell's equations it therefore generates the fully phase-conjugated wave, moving in the −z direction,

$$E^{(4)}_i = \Re\!\left[\text{const}\times A^{(1)*}_ie^{-i(kz + \omega t)}\right]. \tag{9.45b}$$

See Ex. 9.9 for details.
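Two properties of the phase-conjugated wave can be illustrated with a minimal Python sketch:
- The frequency and wave-vector bookkeeping of the term in Eq. (9.45a): $\mathbf k_2 + \mathbf k_3 - \mathbf k_1 = -\mathbf k_1$ and ω + ω − ω = ω.
- The behavior of the conjugated amplitude under aberrations.

For the second property, the sketch assumes an idealized thin phase screen (a hypothetical aberrator) that multiplies each transverse sample of the amplitude by $e^{i\phi}$ in either propagation direction. It also assumes a phase-conjugating mirror of unit reflectivity, with the constant of Eq. (9.45b) set to 1. With these assumptions, a round trip through the aberrator via the phase-conjugating mirror returns $A^{(1)*}$ with the aberration removed; via an ordinary mirror the aberration is doubled. The sample "picture" and phases are made-up numbers.

```python
import cmath


def thin_phase_screen(field, phase):
    """Pass a transversely sampled complex amplitude through a thin aberrator that
    multiplies sample j by exp(i*phase[j]) (hypothetical idealized model)."""
    return [a * cmath.exp(1j * p) for a, p in zip(field, phase)]


def phase_conjugating_mirror(field):
    """Idealized phase-conjugating mirror: reflected amplitude is A* (the constant in
    Eq. (9.45b) is set to 1, an assumption)."""
    return [a.conjugate() for a in field]


def ordinary_mirror(field):
    """Idealized ordinary mirror: reflected amplitude unchanged (sign conventions ignored)."""
    return list(field)


def mixing_term(k1, k2, k3, omega):
    """Wave vector and frequency of the A2 A3 A1* term in Eq. (9.45a)."""
    return tuple(b + c - a for a, b, c in zip(k1, k2, k3)), omega + omega - omega


# Hypothetical signal "picture" and aberration, sampled at five transverse points.
SIGNAL = [1.0, 0.8 + 0.2j, 0.5j, -0.3 + 0.1j, 0.9]
ABERRATION = [0.0, 1.3, -2.1, 0.7, 2.9]

if __name__ == "__main__":
    k = 1.0
    print(mixing_term((0, 0, k), (k, 0, 0), (-k, 0, 0), 2.0))  # wave moving in -z at omega
    out = thin_phase_screen(SIGNAL, ABERRATION)
    back_pc = thin_phase_screen(phase_conjugating_mirror(out), ABERRATION)
    back_om = thin_phase_screen(ordinary_mirror(out), ABERRATION)
    print("PCM round trip equals conj(signal):",
          all(abs(b - s.conjugate()) < 1e-12 for b, s in zip(back_pc, SIGNAL)))
    print("ordinary mirror round trip:", back_om)
```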

9.6.3

Squeezed Light

Section not yet written; will be in next week's version of this chapter.

****************************
EXERCISES

Exercise 9.8 Problem: Efficiency of Frequency Doubling
A Nd³⁺:YAG laser puts out 10 watts of linearly polarized light in a Gaussian beam at its lasing wavelength of 1.0641 microns. It is desired to frequency double a large fraction of this light by 3-wave mixing. The crystal to be used is Ba₂NaNb₅O₁₅, for which the relevant component of the susceptibility has magnitude $|\chi_{ijk}| \sim 4\times10^{-11}$ (volts/m)⁻¹. The crystal has a thickness of 1 centimeter. To what diameter $d_o$ should the laser's light beam be focused before sending it through the crystal, in order to guarantee that a large fraction of its power will be frequency doubled? Show that diffraction effects are small enough that the beam diameter can remain ≃ $d_o$ all the way through the crystal. [Hint: Recall the spreading of a Gaussian-shaped beam, as described by Eqs. (7.36).]

Exercise 9.9 Derivation and Example: Phase Conjugation by Four-Wave Mixing
(a) Consider an idealized crystal which is isotropic and has scalar susceptibilities $\chi_0$ and $\chi_4$ that are independent of frequency. Show that Maxwell's equations in the crystal imply

$$-\frac{n^2}{c^2}\frac{\partial^2\mathbf E}{\partial t^2} + \nabla^2\mathbf E = \text{const}\left[\frac{n^2}{c^2}\frac{\partial^2}{\partial t^2}(E^2\mathbf E) - \nabla\nabla\cdot(E^2\mathbf E)\right], \tag{9.46}$$

where n is the index of refraction and $E^2 = \mathbf E\cdot\mathbf E$. What is the constant in terms of $\chi_4$?

(b) Assume (as is always the case) that $\chi_4E^2 \ll 1$. Then Eq. (9.46) can be solved using perturbation theory. We shall do so, in this exercise, for the physical setup described at the end of Sec. 9.6.2, which produces a phase-conjugated output. As a concrete realization of that setup, assume that the incoming (“zero-order”) waves inside the medium are the following. (i) A signal wave which propagates in the z direction and has a slowly varying complex amplitude $A^{(1)}(x,y,z)$ containing some sort of picture,

$$\mathbf E^{(1)} = \Re\!\left[A^{(1)}(x,y,z)\,e^{i(kz - \omega t)}\,\mathbf e_x\right]; \tag{9.47a}$$

and (ii) two pump waves which propagate in the x and −x directions:

$$\mathbf E^{(2)} = \Re\!\left[A^{(2)}e^{i(kx - \omega t)}\,\mathbf e_z\right], \qquad \mathbf E^{(3)} = \Re\!\left[A^{(3)}e^{i(-kx - \omega t)}\,\mathbf e_z\right]. \tag{9.47b}$$

Here $|A^{(1)}| \ll |A^{(2)}|$ and $|A^{(1)}| \ll |A^{(3)}|$. All three waves have the same frequency ω and the same wave number $k \equiv (n/c)\omega$. Give a list of the frequencies of all the new waves $\mathbf E^{(4)}$ that are generated from these three waves by four-wave mixing. Among those frequencies is that of the original three waves, ω. A narrow-band filter is placed on the output of the crystal to ensure that only this frequency, ω, emerges. (c) When one computes the details of the four-wave mixing using the propagation equation (9.46), one finds that the only new waves with frequency ω that can be generated have wave numbers k = (n/c)ω. Explain why. Give a list of all such waves, including (i) their propagation directions, (ii) their polarization directions, and (iii) their dependences on $A^{(1)}$, $A^{(2)}$, and $A^{(3)}$. (d) Show that the only wave which emerges from the crystal propagating in the −z direction with frequency ω is the phase-conjugated signal wave. Suppose that the crystal extends along the z direction from z = 0 to z = L. What is the amplitude of this phase-conjugated wave throughout the crystal and emerging from its front face, z = 0?

Exercise 9.10 *** Example: Squeezed Light Produced by Phase Conjugation
Consider a plane electromagnetic wave in which we ignore polarization, and for which we write the complex amplitude A as $X_1 + iX_2$, so

$$E = \Re\!\left[(X_1 + iX_2)\,e^{i(kz - \omega t)}\right]. \tag{9.48}$$

This wave is slightly noisy: $X_1$ and $X_2$ are randomly varying functions of $t - zn/c$, with means $\bar X_1$, $\bar X_2$, variances $(\Delta X_1)^2 = (\Delta X_2)^2 \equiv \sigma^2$, and correlation time $\tau_*$. One can describe such waves by an error box in the complex amplitude plane [Fig. 9.12(a)]. Because $(\Delta X_1)^2 = (\Delta X_2)^2$, the error box is round. This plane wave is split into two parts by a beam splitter. One part is reflected off a phase-conjugating mirror and the other off an ordinary mirror, and the two parts are then recombined at the beam splitter.

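[Figure 9.12: three panels (a), (b), (c), each showing an error box in the complex amplitude plane with axes $X_1$ and $X_2$; panel (a) shows a round error box labeled 2σ about the point $(\bar X_1, \bar X_2)$.]
Fig. 9.12: Error boxes in the complex amplitude plane for several different waves: (a) The original wave discussed at the beginning of Ex. 9.10. (b) A wave in a squeezed state with reduced phase noise. (c) A wave in a squeezed state with reduced amplitude noise.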

(a) Suppose that the phase-conjugating mirror is a pumped, four-wave-mixing crystal of the type analyzed in Ex. 9.9, and that its length is L. Suppose, further, that the incoming wave's correlation time $\tau_*$ is long compared to the time, 2Ln/c, required for light to propagate from one end of the crystal to the other and back. Explain why the phase-conjugating mirror will not time reverse the variations of the wave's complex amplitude. (b) Suppose that the two mirrors (one phase-conjugating, the other ordinary) reflect their waves with slightly different efficiencies. Then the beams that recombine at the beam splitter have complex amplitudes whose moduli differ by a fractional amount ε ≪ 1. Show that by appropriately adjusting the relative phase delay of the recombining beams, one can make the recombined light have the form

$$E = \alpha\,\Re\!\left[(2X_1 + i\epsilon X_2)\,e^{i(kz - \omega t)}\right]. \tag{9.49}$$

Here α is a constant and z is distance along the optic axis. In this recombined light, the mean of $X_2$ and its rms fluctuation are both reduced drastically, by a factor ε/2, relative to those of $X_1$. The corresponding error box, shown in Fig. 9.12(b), is squeezed along the $X_2$ direction, and the light itself is said to be in a squeezed state. (c) By appropriately choosing the initial $\bar X_1$, $\bar X_2$, and squeeze factor ε, one can produce light with an error box of the form shown in Fig. 9.12(c) rather than (b). How should they be chosen? Explain why the light in Fig. 9.12(b) is said to have “reduced phase noise” and that in (c) to have “reduced amplitude noise.” (d) For the light with reduced phase noise, the electric field E(t) measured at some fixed location in space lies inside the stippled band shown in Fig. 9.13, with 90 per cent confidence. Draw similar stippled bands depicting the electric field E(t) for the original, unsqueezed light and for squeezed light with reduced amplitude noise.

Discussion: A quantum mechanical analysis shows that not only is classical noise squeezed by the above process; so is quantum noise. However, in quantum mechanics $X_1$ and $X_2$ are Hermitian operators which do not commute, and correspondingly there is an uncertainty relation of the form $\Delta X_1\Delta X_2 \geq$ (a quantum limit). As a result, suppose one begins with minimum-uncertainty light, $\Delta X_1 = \Delta X_2 =$ (the minimum, corresponding to an energy $\tfrac12\hbar\omega$ in the chosen wave mode), for example the vacuum state of quantum electrodynamics. Squeezing that light will then reduce $\Delta X_1$ only at the price of increasing $\Delta X_2$. Such squeezed light, including the “squeezed vacuum,” has great promise for fundamental physics experiments and technology. For example, it can be used to reduce the photon shot noise of an interferometer or a laser below the “standard quantum limit” of $\Delta N = \sqrt N$. This improves the signal-to-noise ratio in certain communications devices and in laser interferometer gravitational-wave detectors.⁸

[Figure 9.13: the electric field E plotted against time t, inside a stippled error band.]
Fig. 9.13: The error band for the electric field E(t), as measured at a fixed location in space when squeezed light with reduced phase noise passes by.

****************************
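The effect of Eq. (9.49) on the error box can be visualized with a minimal classical Monte Carlo sketch. The sketch draws independent Gaussian quadratures with equal variance $\sigma^2$ (the round error box of Fig. 9.12a). It then maps each sample $(X_1, X_2)$ to $(2X_1, \epsilon X_2)$, dropping the overall constant α. Both the mean and the spread of the second quadrature come out reduced by ε/2 relative to the first. The means, σ, ε, and sample count are arbitrary illustrative choices. Quantum noise, discussed above, is not modeled.

```python
import random
import statistics


def recombined_quadratures(x1_mean, x2_mean, sigma, eps, n_samples, seed=0):
    """Monte Carlo sketch of Eq. (9.49) with the overall constant alpha dropped:
    draw independent Gaussian quadratures (X1, X2) with equal variance sigma^2
    (the round error box of Fig. 9.12a), then map them to (2*X1, eps*X2)."""
    rng = random.Random(seed)
    out1, out2 = [], []
    for _ in range(n_samples):
        out1.append(2.0 * rng.gauss(x1_mean, sigma))
        out2.append(eps * rng.gauss(x2_mean, sigma))
    return out1, out2


if __name__ == "__main__":
    eps = 0.1
    q1, q2 = recombined_quadratures(1.0, 1.0, 0.05, eps, 20000)
    print("mean ratio  :", statistics.mean(q2) / statistics.mean(q1), "expected", eps / 2)
    print("spread ratio:", statistics.stdev(q2) / statistics.stdev(q1), "expected", eps / 2)
```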

9.7

Other Methods to Produce Wave-Wave Mixing

9.7.1

Photorefractive Effect

To be written by next Wednesday.

9.7.2

Ponderomotive Squeezing by a Suspended Mirror

To be written by next Wednesday.

Box 9.2 Important Concepts in Chapter 9
• To be supplied by next Wednesday

⁸ For detailed discussions see, e.g., Walls (1983), Wu et al. (1986), and La Porta et al. (1989).


Bibliographic Note

Three excellent textbooks on the topics covered by this chapter are Boyd (1992), Yariv (1989), and Yariv and Yeh (2006), especially Chap. 8. Also useful for nonlinear optics is Shen (1984). For in-depth discussions of recent developments in this field, see Yariv and Yeh (2006); also, though it is somewhat old by now, Agrawal and Boyd (1992) and .

Bibliography

Agrawal, G. P. & Boyd, R. W., eds. 1992, Contemporary Nonlinear Optics, New York: Academic Press
Basov, N. G. & Prokhorov, A. M. 1954, JETP 27, 431
Basov, N. G. & Prokhorov, A. M. 1955, JETP 28, 249
Boyd, R. W. 1992, Nonlinear Optics, New York: Academic Press
Cathey, W. T. 1974, Optical Information Processing and Holography, New York: Wiley
Feynman, R. P., Leighton, R. B., & Sands, M. 1965, The Feynman Lectures on Physics, New York: Addison Wesley
Ghatak, A. K. & Thyagarajan, K. 1978, Contemporary Optics, New York: Plenum
Gordon, J. P., Zeiger, H. J., & Townes, C. H. 1954, Phys. Rev. 95, 282
Gordon, J. P., Zeiger, H. J., & Townes, C. H. 1955, Phys. Rev. 99, 1264
Iizuka, K. 1987, Engineering Optics, Berlin: Springer-Verlag
Jackson, J. D. 1999, Classical Electrodynamics, New York: Wiley
La Porta, A., Slusher, R. E., & Yurke, B. 1989, Phys. Rev. Lett. 62, 28
Maiman, T. H. 1960, Nature 187, 493
Shen, Y. R. 1984, The Principles of Nonlinear Optics, New York: Wiley
Walls, D. F. 1983, Nature 306, 141
Weber, J. 1953, IRE Trans. Prof. Group on Electron Devices 3, 1
Wu, L.-A., Kimble, H. J., Hall, J. L., & Wu, H. 1986, Phys. Rev. Lett. 57, 2520
Yariv, A. 1977, J. Opt. Soc. Amer. 67, 1
Yariv, A. 1989, Quantum Electronics, New York: Wiley
Yariv, A. & Yeh, P. 2006, Photonics: Optical Electronics in Modern Communications, Oxford: Oxford University Press
Zel'dovich, B. Y., Popovichev, V. I., Ragulskii, V. V., & Faisullov, F. S. 1972, JETP 15, 109

Contents

III ELASTICITY

10 Elastostatics
10.1 Overview
10.2 Displacement and Strain; Expansion, Rotation, and Shear
10.2.1 Displacement Vector and Strain Tensor
10.2.2 Expansion, Rotation and Shear
10.3 Stress and Elastic Moduli
10.3.1 Stress Tensor
10.3.2 Elastic Moduli and Elastostatic Stress Balance
10.3.3 Energy of Deformation
10.3.4 Molecular Origin of Elastic Stress
10.4 Young's Modulus and Poisson's Ratio for an Isotropic Material: A Simple Elastostatics Problem
10.5 T2 Cylindrical and Spherical Coordinates: Connection Coefficients and Components of Strain
10.6 T2 Solving the 3-Dimensional Elastostatic Equation in Cylindrical Coordinates: Simple Methods, Separation of Variables and Green's Functions
10.6.1 Simple Methods: Pipe Fracture and Torsion Pendulum
10.6.2 Separation of Variables and Green's Functions: Thermoelastic Noise in a LIGO Mirror
10.7 Reducing the Elastostatic Equations to One Dimension for a Bent Beam; Cantilever Bridges
10.8 Bifurcation, Buckling and Mountain Folding
10.9 T2 Reducing the Elastostatic Equations to Two Dimensions for a Deformed Thin Plate: Stress-Polishing a Telescope Mirror

Part III ELASTICITY


Chapter 10

Elastostatics

Version 0810.3.K, 11 January 2009. Changes from 0810.2.K are a reworking of Sec. 10.1, Overview, to reflect the changes in this chapter's organization that I made this year, and correction of a few typos. - kip

Please send comments, suggestions, and errata via email to [email protected] or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 10.1 Reader's Guide
• This chapter relies heavily on the geometric view of Newtonian physics (including vector and tensor analysis) laid out in the sections of Chap. 1 labeled “[N]”.
• Chapter 11 (Elastodynamics) is an extension of this chapter; to understand it, this chapter must be mastered.
• The idea of the irreducible tensorial parts of a tensor, and its most important example, decomposition of the strain tensor into expansion, rotation, and shear (Sec. 10.2.2 and Box 10.2), will be encountered again in Part IV (Fluid Mechanics) and Part V (Plasma Physics).
• Differentiation of vectors and tensors with the help of connection coefficients (Sec. 10.5) will be used occasionally in Part IV (Fluid Mechanics) and Part V (Plasma Physics), and will be generalized to non-orthonormal bases in Part VI (General Relativity) and used extensively there.
• No other portions of this chapter are important for subsequent Parts of this book.


10.1

Overview

In this chapter we consider static equilibria of elastic solids. An example is the equilibrium shape and internal strains of a cantilevered balcony on a building, deformed by the weight of people standing on it. From the point of view of continuum mechanics, a solid (e.g. a wooden board in the balcony) is a substance that recovers its shape after the application and removal of any small stress. In other words, after the stress is removed, the solid can be rotated and translated to assume its original shape. Note the requirement that this be true for any stress. Many fluids (e.g. water) satisfy our definition as long as the applied stress is isotropic; however, they will deform permanently under a shear stress. Other materials (for example, the earth's crust) are elastic only for limited times, but undergo plastic flow when a stress is applied for a long time.

We shall confine our attention in this chapter to elastic solids. While a stress is applied, they deform in such a way that the magnitude of the deformation (quantified by a tensorial strain) is linearly proportional to the applied, tensorial stress. This linear, three-dimensional stress-strain relationship, which we shall develop and explore in this chapter, generalizes Hooke's famous one-dimensional law (originally expressed in the concise Latin phrase “Ut tensio, sic vis”). In English, Hooke's law says that, if an elastic wire or rod is stretched by an applied force F (Fig. 10.1a), its fractional change of length (its strain) is proportional to the force, Δℓ/ℓ ∝ F. Consider the longitudinal stress $T_{zz}$, the longitudinal force F per unit cross-sectional area A of the rod, so $T_{zz} = F/A$. In the language of stresses and strains (introduced below), Hooke's law says that this stress is proportional to the longitudinal strain $S_{zz} = \Delta\ell/\ell$. The proportionality constant E, called Young's modulus, is a property of the material from which the rod is made:

$$\frac{F}{A} \equiv T_{zz} = ES_{zz} \equiv E\,\frac{\Delta\ell}{\ell}\,. \tag{10.1}$$
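Eq. (10.1) amounts to a two-line calculation, sketched below in Python. The numbers are a hypothetical example: a steel wire 2 m long and 1 mm in diameter under a 100 N load. The value E ≈ 2 × 10¹¹ Pa is an assumed, typical order of magnitude for steel, not a value taken from this chapter.

```python
import math


def hooke_strain(force, area, youngs_modulus):
    """Eq. (10.1): S_zz = Delta l / l = T_zz / E with T_zz = F / A (SI units)."""
    stress = force / area            # T_zz = F/A, in pascals
    return stress / youngs_modulus   # dimensionless longitudinal strain


def elongation(force, area, youngs_modulus, length):
    """Delta l = S_zz * l."""
    return hooke_strain(force, area, youngs_modulus) * length


if __name__ == "__main__":
    # Hypothetical example: a 2 m steel wire, 1 mm in diameter, loaded with 100 N.
    # E ~ 2e11 Pa is a typical order of magnitude for steel (assumed value).
    area = math.pi * (0.5e-3) ** 2
    print(f"strain = {hooke_strain(100.0, area, 2e11):.2e}")
    print(f"elongation = {elongation(100.0, area, 2e11, 2.0) * 1e3:.2f} mm")
```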

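[Figure 10.1: (a) a rod stretched along z by a force F; (b) the displacement vector field ξ inside the stretched rod.]
Fig. 10.1: (a) Hooke's one-dimensional law for a rod stretched by a force F: Δℓ/ℓ ∝ F. (b) The 3-dimensional displacement vector ξ(x) inside the stretched rod.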

Hooke's law will turn out to be one component of the three-dimensional stress-strain relation. To understand it in that language, however, we must first develop a thorough understanding of the strain tensor and the stress tensor. Our approach to these tensors will follow the geometric, frame-independent philosophy introduced in Chap. 1. Some readers may wish to review that philosophy and associated mathematics by rereading the “[N]” sections of Chap. 1. We begin in Sec. 10.2 by introducing, in a frame-independent way, the vectorial displacement field ξ(x) inside a stressed body and its gradient ∇ξ, which is the strain tensor S = ∇ξ. We then express the strain tensor as the sum of its irreducible tensorial parts: an expansion Θ, a rotation R, and a shear Σ. In Sec. 10.3.1 we introduce the stress tensor for a deformed, isotropic, elastic material. In Sec. 10.3.2, we discuss how such a material resists two kinds of strain. It resists volume change (an expansion-type strain) by developing an opposing isotropic stress, with a stress/strain ratio equal to the bulk modulus K. It resists a shear-type strain by developing an opposing shear stress, with a stress/strain ratio equal to twice the shear modulus, 2µ. We then compute the elastic force density inside the material as the divergence of the sum of these two elastic stresses. We formulate the law of elastostatic stress balance as the vanishing sum of the material's internal elastic force density and any other force densities that may act (usually a gravitational force density due to the weight of the elastic material). We discuss the analogy between this elastostatic stress-balance equation and Maxwell's electrostatic or magnetostatic equations. We also describe how mathematical techniques common in electrostatics (separation of variables and Green's functions) can be applied to solve the elastostatic stress-balance equation, subject to boundary conditions that describe external forces (e.g.
the pressure of a person's feet, standing on a balcony). In Sec. 10.3.3 we evaluate the energy density stored in elastostatic strains. In Sec. 10.3.4 we discuss the atomic-force origin of the elastostatic stresses and use atomic considerations to estimate the magnitudes of the bulk and shear moduli. In Sec. 10.4 we present a simple example of how to solve the three-dimensional equation of elastostatic force balance subject to the appropriate boundary conditions on the surface of a stressed body. Specifically, we use our three-dimensional formulas to deduce Hooke's law for the one-dimensional longitudinal stress and strain in a stretched wire. We thereby relate Young's modulus E of Hooke's law to the bulk modulus K that resists three-dimensional volume changes and the shear modulus µ that resists three-dimensional shears. Elasticity theory entails computing gradients of vectors and tensors, and practical calculations are often best performed in cylindrical or spherical coordinate systems. We therefore present a mathematical digression in Sec. 10.5. It introduces practical calculations of gradients of vectors and tensors in the orthonormal bases associated with curvilinear coordinate systems. These calculations use the concept of a connection coefficient: the directional derivative of one basis vector field along another. In Sec. 10.5 we also use these connection coefficients to derive some useful differentiation formulae in cylindrical and spherical coordinate systems and bases. Sec. 10.6 and various exercises give practical examples of solutions of the elastostatic force-balance equation in cylindrical coordinates, as illustrations of both connection coefficients and elastostatic force balance:
• the stresses and strains in a pipe that contains a fluid under pressure (Sec. 10.6.1 and Ex. 10.11);
• the stresses and strains in the wire of a torsion pendulum (Ex. 10.12);
• a cylinder that is subjected to a Gaussian-shaped pressure on one face (Sec. 10.6.2).

As we shall see in Ex. 10.14, this cylinder-pressure problem is one part of computing the spectral density of thermoelastic noise inside the test-mass mirrors of a gravitational-wave interferometer. That computation is an application of the fluctuation-dissipation theorem that we introduced and discussed in Sec. 5.6 and Ex. 5.8 of Chap. 5 (Random Processes). We shall sketch how to solve this cylinder-pressure problem using the two common techniques of elastostatics and electrostatics: separation of variables (text of Sec. 10.6.2) and a Green's function (Ex. 10.15).

When the elastic body that one studies is very thin in two dimensions compared to the third (e.g., a wire or rod), we can reduce the three-dimensional elastostatic equations to a set of coupled one-dimensional equations by performing a two-lengthscale expansion. The key to this dimensional reduction is taking moments of the elastostatic equations. We illustrate this technique in Sec. 10.7, where we treat the bending of beams (e.g. for a cantilevered balcony or bridge). We also illustrate it in exercises that treat the bending of the support wire of a Foucault pendulum and the bending of a very long, thin wire to which forces are applied at the ends (elastica).

Elasticity theory, as developed in this chapter, is an example of a common (some would complain far too common) approach to physics problems, namely to linearize them. Linearization may be acceptable when the distortions are small. However, when deformed by sufficiently strong forces, elastic media may become neutrally stable to small displacements, which can then grow to large amplitude. We shall study an example of this phenomenon in Sec. 10.8, using our dimensionally reduced, one-dimensional theory.
Our example will lead us to a classic result, due originally to Euler. When an elastic solid is compressed, there comes a maximum applied force beyond which stable equilibria can disappear altogether. For an applied force in excess of this maximum, the solid will buckle, a phenomenon that gives rise, in the earth's crust, to mountains (as we shall discuss). Buckling is associated with bifurcation of equilibria, a phenomenon that is common to many physical systems, not just elastostatic ones. We illustrate bifurcation in Sec. 10.8 using a strut under a compressive load. We will encounter bifurcation again in Sec. 14.5, when we study the route to turbulence in fluids and the route to chaos in other dynamical systems. Finally, in Sec. 10.9 we discuss dimensional reduction by the method of moments for bodies that are thin in only one dimension, not two, e.g. plates and thin mirrors. In this case the three-dimensional elastostatic equations are reduced to two dimensions. We illustrate our two-dimensional formalism by the stress polishing of telescope mirrors.

10.2 Displacement and Strain; Expansion, Rotation, and Shear

We begin our study of elastostatics by introducing the elastic displacement vector, its gradient (the strain tensor), and the irreducible tensorial parts of the strain.


10.2.1 Displacement Vector and Strain Tensor

We label the position of a point (a tiny bit of solid) in an unstressed body, relative to some convenient origin in the body, by its position vector x. Let a force be applied so the body deforms and the point moves from x to x + ξ(x); we call ξ the point’s displacement vector. If ξ were constant (i.e., if its components in a Cartesian coordinate system were independent of location in the body), then the body would simply be translated and would undergo no deformation. To produce a deformation, we must make the displacement ξ change from one location to another. The simplest, coordinate-independent way to quantify those changes is by the gradient of ξ, ∇ξ. This gradient is a second-rank tensor field;¹ we shall give it the name strain tensor and shall denote it S:

$$\mathbf{S} = \nabla\boldsymbol{\xi}. \tag{10.2a}$$

This strain tensor is a geometric object, defined independent of any coordinate system in the manner described in Sec. 1.9. In slot-naming index notation (Sec. 1.5), it is denoted

$$S_{ij} = \xi_{i;j}, \tag{10.2b}$$

where the index j after the semicolon is the name of the gradient slot. In a Cartesian coordinate system the components of the gradient are always just partial derivatives [Eq. (1.54c)], and therefore the Cartesian components of the strain tensor are

$$S_{ij} = \frac{\partial \xi_i}{\partial x_j} = \xi_{i,j}. \tag{10.2c}$$

(Recall that indices following a comma represent partial derivatives.) In the next section we shall learn how to compute the components of the strain in cylindrical and spherical coordinates. For the one-dimensional Hooke’s-law situation of Fig. 10.1a, we have ξz = z(∆ℓ/ℓ) and Szz = ξz;z = ∂ξz/∂z = ∆ℓ/ℓ [Eq. (10.1)]. If we look in greater detail at the interior of the stretched rod, paying attention to its three-dimensional structure, we see that the rod’s resistance to volume changes causes it to shrink in cross section as it stretches in length. This shows up as an inward component of the displacement vector (Fig. 10.1b), so Sxx = ∂ξx/∂x < 0, Syy = ∂ξy/∂y < 0, Szz = ∂ξz/∂z > 0. In any small neighborhood of any point xo in a deformed body, we can reconstruct the displacement vector ξ from the strain tensor, up to an additive constant. In Cartesian coordinates, by virtue of a Taylor-series expansion, ξ is given by

$$\xi_i(\mathbf{x}) = \xi_i(\mathbf{x}_o) + (x_j - x_{oj})\frac{\partial \xi_i}{\partial x_j} + \dots = \xi_i(\mathbf{x}_o) + (x_j - x_{oj})S_{ij} + \dots . \tag{10.3}$$

If we place our origin of Cartesian coordinates at xo and let the origin move with the point there as the body deforms [so ξ(xo) = 0], then Eq. (10.3) becomes

$$\xi_i = S_{ij}x_j \quad \text{when } |\mathbf{x}| \text{ is sufficiently small}. \tag{10.4}$$

¹ In our treatment of elasticity theory, we shall make extensive use of the tensorial concepts introduced in Chap. 1.

We have derived this as a relationship between components of ξ, x, and S in a Cartesian coordinate system. However, the indices can also be thought of as the names of slots (Sec. 1.5) and correspondingly Eq. (10.4) can be regarded as a geometric, coordinate-independent relationship between the vectors and tensor ξ, x, and S. In Ex. 10.2 we shall use Eq. (10.4) to gain insight into the displacements associated with various types of strain.

10.2.2 Expansion, Rotation and Shear

In Box 10.2 we introduce the concept of the irreducible tensorial parts of a tensor, and we state that in physics, when one encounters a new, unfamiliar tensor, it is often useful to identify the tensor’s irreducible parts. The strain tensor S is an important example. It is a general, second-rank tensor. Therefore, as we discuss in Box 10.2, its irreducible tensorial parts are its trace Θ ≡ Tr(S) = Sii = ∇·ξ, which is called the deformed body’s expansion for reasons we shall explore below; its symmetric, trace-free part Σ, which is called the body’s shear; and its antisymmetric part R, which is called the body’s rotation:

$$\Theta = S_{ii} = \nabla\cdot\boldsymbol{\xi}, \tag{10.5a}$$

$$\Sigma_{ij} = \tfrac{1}{2}(S_{ij} + S_{ji}) - \tfrac{1}{3}\Theta g_{ij} = \tfrac{1}{2}(\xi_{i;j} + \xi_{j;i}) - \tfrac{1}{3}\Theta g_{ij}, \tag{10.5b}$$

$$R_{ij} = \tfrac{1}{2}(S_{ij} - S_{ji}). \tag{10.5c}$$

Here gij is the metric, which has components gij = δij (Kronecker delta) in Cartesian coordinates. The strain tensor can be reconstructed from these irreducible tensorial parts in the following manner [Eq. (4) of Box 10.2, rewritten in abstract notation]:

$$\nabla\boldsymbol{\xi} = \mathbf{S} = \tfrac{1}{3}\Theta\,\mathbf{g} + \boldsymbol{\Sigma} + \mathbf{R}. \tag{10.6}$$
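Because the split in Eqs. (10.5) and (10.6) is purely algebraic, it is easy to check numerically. The following is a minimal sketch in plain Python (no external libraries), assuming Cartesian components so that gij = δij; the function names `decompose` and `reconstruct` and the sample tensor are illustrative choices, not part of the text.

```python
def decompose(S):
    """Split a 3x3 Cartesian tensor S (list of rows) into its irreducible parts.

    Returns (theta, Sigma, R): theta = S_ii is the expansion, Sigma is the
    symmetric trace-free part (shear), and R is the antisymmetric part
    (rotation), as in Eqs. (10.5a)-(10.5c) with g_ij = delta_ij.
    """
    theta = sum(S[i][i] for i in range(3))
    Sigma = [[0.5 * (S[i][j] + S[j][i]) - (theta / 3.0 if i == j else 0.0)
              for j in range(3)] for i in range(3)]
    R = [[0.5 * (S[i][j] - S[j][i]) for j in range(3)] for i in range(3)]
    return theta, Sigma, R


def reconstruct(theta, Sigma, R):
    """Rebuild S_ij = (1/3) theta delta_ij + Sigma_ij + R_ij, as in Eq. (10.6)."""
    return [[(theta / 3.0 if i == j else 0.0) + Sigma[i][j] + R[i][j]
             for j in range(3)] for i in range(3)]


# A hypothetical small strain tensor, S_ij = d xi_i / d x_j (illustrative numbers).
S = [[1.0e-3, 2.0e-4, -5.0e-5],
     [6.0e-4, -4.0e-4, 3.0e-4],
     [1.0e-4, -2.0e-4, 2.0e-4]]

theta, Sigma, R = decompose(S)
S_back = reconstruct(theta, Sigma, R)

print("expansion Theta:", theta)
print("trace of Sigma: ", sum(Sigma[i][i] for i in range(3)))
print("max |S - S_back|:",
      max(abs(S[i][j] - S_back[i][j]) for i in range(3) for j in range(3)))
```

The same three pieces exist for any 3 × 3 tensor; only their interpretation as expansion, shear, and rotation is specific to the strain.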

Let us consider the physical effects of the three separate parts of S in turn. To understand expansion, consider a small 3-dimensional piece V of a deformed body (a “volume element”). An element of area² dΣ on the surface ∂V of V gets displaced through a vectorial distance ξ and in the process sweeps out a volume ξ · dΣ. Therefore, the change in the volume element’s volume, produced by an arbitrary (small) displacement field ξ, is

$$\delta V = \int_{\partial\mathcal{V}} d\boldsymbol{\Sigma}\cdot\boldsymbol{\xi} = \int_{\mathcal{V}} dV\,\nabla\cdot\boldsymbol{\xi} = \nabla\cdot\boldsymbol{\xi}\int_{\mathcal{V}} dV = (\nabla\cdot\boldsymbol{\xi})\,V. \tag{10.7}$$

Here we have invoked Gauss’ theorem in the second equality, and in the third we have used the smallness of V to infer that ∇ · ξ is essentially constant throughout V and so can be

² Note that we use dΣ (a vector) for a vectorial area element and Σ (a second-rank tensor) for the shear tensor. There should be no confusion.


Box 10.2 Irreducible Tensorial Parts of a Second-Rank Tensor in 3-Dimensional Euclidean Space

In quantum mechanics an important role is played by the “rotation group,” i.e., the set of all rotation matrices viewed as a mathematical entity called a group; see, e.g., chapter XIII of Messiah (1962) or chapter 16 of Mathews and Walker (1965). Each tensor in 3-dimensional Euclidean space, when rotated, is said to generate a specific “representation” of the rotation group. Tensors that are “big”, in a sense to be discussed below, can be broken down into a sum of several tensors that are “as small as possible.” These smallest tensors are said to generate “irreducible representations” of the rotation group. All this mumbo-jumbo is really very simple when one thinks about tensors as geometric, frame-independent objects. As an example, consider an arbitrary second-rank tensor Sij in three-dimensional Euclidean space. In the text Sij is the strain tensor. From this tensor we can construct the following “smaller” tensors by linear operations that involve only Sij and the metric gij. (As these smaller tensors are enumerated, the reader should think of the notation used as basis-independent, frame-independent, slot-naming index notation.) The smaller tensors are the “trace” of Sij,

$$\Theta \equiv S_{ij}g_{ij} = S_{ii}; \tag{1}$$

the antisymmetric part of Sij,

$$R_{ij} \equiv \tfrac{1}{2}(S_{ij} - S_{ji}); \tag{2}$$

and the symmetric, trace-free part of Sij,

$$\Sigma_{ij} \equiv \tfrac{1}{2}(S_{ij} + S_{ji}) - \tfrac{1}{3}g_{ij}S_{kk}. \tag{3}$$

It is straightforward to verify that the original tensor Sij can be reconstructed from these three “smaller” tensors, plus the metric gij, as follows:

$$S_{ij} = \tfrac{1}{3}\Theta g_{ij} + \Sigma_{ij} + R_{ij}. \tag{4}$$

One way to see the sense in which Θ, Rij, and Σij are “smaller” than Sij is by counting the number of independent real numbers required to specify their components in an arbitrary basis. (In this counting the reader is asked to think of the index notation as components on the chosen basis.) The original tensor Sij has 3 × 3 = 9 components (S11, S12, S13, S21, . . .), all of which are independent. By contrast, the 9 components of Σij are not independent; symmetry requires that Σij = Σji, which reduces the number of independent components from 9 to 6; trace-freeness, Σii = 0, reduces it further from 6 to 5. The antisymmetric tensor Rij has just three independent components, R12, R23, and R31. The scalar Θ has just one. Therefore, (5 independent components in Σij) + (3 independent components in Rij) + (1 independent component in Θ) = 9 = (number of independent components in Sij). The number of independent components (one for Θ, 3 for Rij, 5 for Σij) is a geometric, basis-independent concept: It is the same regardless of the basis used to count the components; and for each of the “smaller” tensors that make up Sij, it is easily deduced without introducing a basis at all. (Here the reader is asked to think in slot-naming index notation.) The scalar Θ is clearly specified by just one real number. The antisymmetric tensor Rij contains precisely the same amount of information as the vector

$$\phi_i \equiv -\tfrac{1}{2}\epsilon_{ijk}R_{jk}, \tag{5}$$

as one can see from the fact that Eq. (5) can be inverted to give

$$R_{ij} = -\epsilon_{ijk}\phi_k; \tag{6}$$

and the vector φi can be characterized by its direction in space (two numbers) plus its length (a third). The symmetric, trace-free tensor Σij can be characterized geometrically by the ellipsoid (gij + εΣij)ζiζj = 1, where ε is an arbitrary number ≪ 1 and ζi is a vector whose tail sits at the center of the ellipsoid and whose head moves around on the ellipsoid’s surface. Because Σij is trace-free, this ellipsoid has, to first order in ε, the same volume as the unit sphere. It therefore is specified fully by the direction of its longest principal axis (two numbers) plus the direction of a second principal axis (a third number) plus the ratio of the length of the second axis to the first (a fourth number) plus the ratio of the length of the third axis to the first (a fifth number). Each of the tensors Θ, Rij (or equivalently φi), and Σij is “irreducible” in the sense that one cannot construct any “smaller” tensors from it by any linear operation that involves only it, the metric, and the Levi-Civita tensor. Irreducible tensors in 3-dimensional Euclidean space always have an odd number of components. It is conventional to denote this number by 2l + 1, where the integer l is called the “order of the irreducible representation of the rotation group” that the tensor generates. For Θ, Rij (or equivalently φi), and Σij, l is 0, 1, and 2 respectively. These three tensors can be mapped into the spherical harmonics of order l = 0, 1, 2; and their 2l + 1 components correspond to the 2l + 1 values of the quantum number m = −l, −l + 1, . . . , l − 1, l. For details see, e.g., section II.C of Thorne (1980). In physics, when one encounters a new, unfamiliar tensor, it is often useful to identify the tensor’s irreducible parts. They almost always play important, independent roles in the physical situation one is studying. We meet one example in this chapter (the strain tensor), and shall meet another when we study fluid mechanics (Chap. 12).

[Figure: a distorted square S drawn as the sum (1/3)Θg + Σ + R]

Fig. 10.2: A simple example of the decomposition of a two-dimensional distortion of a square body into an expansion (Θ), a shear (Σ), and a rotation (R).

pulled out of the integral. Therefore, the fractional change in volume is equal to the trace of the strain tensor, i.e. the expansion:

$$\frac{\delta V}{V} = \nabla\cdot\boldsymbol{\xi} = \Theta. \tag{10.8}$$

See Figure 10.2 for a simple example.
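Equation (10.8) is a first-order statement. As a rough numerical check, assume a homogeneous strain so that Eq. (10.4) holds throughout a small cube: the cube's edge vectors are then mapped by the matrix I + S, and its volume is multiplied exactly by det(I + S). The sketch below (plain Python, with a made-up strain tensor) compares det(I + S) − 1 with Θ = Tr S.

```python
def det3(M):
    """Determinant of a 3x3 matrix (list of rows), by cofactor expansion."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def fractional_volume_change(S):
    """Exact delta V / V for a homogeneous displacement xi_i = S_ij x_j.

    Every edge vector of a small cube is mapped by the matrix I + S,
    so the cube's volume is multiplied by det(I + S).
    """
    F = [[(1.0 if i == j else 0.0) + S[i][j] for j in range(3)] for i in range(3)]
    return det3(F) - 1.0


# Hypothetical small strain tensor (illustrative numbers only).
S = [[1.0e-4, 3.0e-5, 0.0],
     [-2.0e-5, -4.0e-5, 1.0e-5],
     [0.0, 5.0e-5, 2.0e-5]]

exact = fractional_volume_change(S)
theta = S[0][0] + S[1][1] + S[2][2]     # Eq. (10.8): delta V / V = Theta to first order

print("det(I + S) - 1 =", exact)
print("Theta          =", theta)
print("difference     =", exact - theta)
```

The residual difference is of second order in the strain components, which is why the linear theory works well for the small strains typical of elastic materials.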

[Figure: integral curves (hyperbolae) of a pure-shear displacement field, plotted on axes x1 and x2]

Fig. 10.3: Shear in two dimensions. The displacement of points in a solid undergoing pure shear is the vector field ξ(x) given by Eq. (10.4) with Sji replaced by Σji: ξj = Σji xi = Σj1 x1 + Σj2 x2. The integral curves of this vector field are plotted in this figure. The figure is drawn using principal axes, which are Cartesian, so Σ12 = Σ21 = 0 and Σ11 = −Σ22, which means that ξ1 = Σ11 x1 and ξ2 = −Σ11 x2. The integral curves of this simple vector field are the hyperbolae shown in the figure. Note that the displacement increases linearly with distance from the origin.

The shear tensor Σ produces the shearing displacements illustrated in Figures 10.2 and 10.3. As it has zero trace, there is no volume change when a body undergoes a pure shear deformation. The shear tensor has five independent components (Box 10.2). However, by rotating our Cartesian coordinates appropriately, we can transform away all the off-diagonal elements, leaving the three diagonal elements, which must sum to zero. This is known as a principal-axis transformation. The components of the shear tensor in a Cartesian coordinate system can be written down immediately from Eq. (10.5b) by substituting the Kronecker delta δij for the components of the metric tensor gij and treating all derivatives as partial derivatives:

$$\Sigma_{xx} = \frac{2}{3}\frac{\partial\xi_x}{\partial x} - \frac{1}{3}\left(\frac{\partial\xi_y}{\partial y} + \frac{\partial\xi_z}{\partial z}\right), \qquad \Sigma_{xy} = \frac{1}{2}\left(\frac{\partial\xi_x}{\partial y} + \frac{\partial\xi_y}{\partial x}\right), \tag{10.9}$$
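The principal-axis transformation can be illustrated in two dimensions. This is a minimal sketch, assuming a shear confined to the x-y plane with only Σxy = Σyx nonzero (the case of Ex. 10.2c); the helper names and the size of the shear are hypothetical. Rotating the axes by the angle α that satisfies tan 2α = 2Σxy/(Σxx − Σyy) removes the off-diagonal component, leaving diagonal entries that sum to zero.

```python
import math


def rotate_2d(M, angle):
    """Components of a 2x2 tensor in Cartesian axes rotated by `angle` radians.

    New basis vectors: e1' = (cos a, sin a), e2' = (-sin a, cos a);
    the new components are M'_ij = e_i' . M . e_j'.
    """
    ca, sa = math.cos(angle), math.sin(angle)
    basis = [(ca, sa), (-sa, ca)]
    return [[sum(u[k] * M[k][l] * v[l] for k in range(2) for l in range(2))
             for v in basis] for u in basis]


def principal_angle(M):
    """Rotation angle that diagonalizes a symmetric 2x2 tensor.

    Setting M'_12 = 0 gives tan(2a) = 2 M_xy / (M_xx - M_yy).
    """
    return 0.5 * math.atan2(2.0 * M[0][1], M[0][0] - M[1][1])


# Hypothetical pure shear confined to the x-y plane with only
# Sigma_xy = Sigma_yx nonzero, as in Ex. 10.2(c).
shear = 1.0e-4
Sigma = [[0.0, shear], [shear, 0.0]]

angle = principal_angle(Sigma)
Sigma_principal = rotate_2d(Sigma, angle)

print("principal-axis angle (deg):", math.degrees(angle))
print("components on principal axes:", Sigma_principal)
```

For this shear the principal axes lie at 45° to x and y, and on them the components take the form Σ′11 = −Σ′22 used in Fig. 10.3. In three dimensions the same job is done by diagonalizing the symmetric 3 × 3 matrix Σij.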

and similarly for the other components. The analogous equations in spherical and cylindrical coordinates will be described in the next section. The third term in Eq. (10.6) describes a pure rotation, which does not deform the solid. To verify this, write ξ = φ × x, where φ is a small rotation of magnitude φ about an axis parallel to the direction of φ. Using Cartesian coordinates in three-dimensional Euclidean space, we can demonstrate by direct calculation that the symmetric part of S vanishes, i.e., Θ = Σ = 0, and that

$$R_{ij} = -\epsilon_{ijk}\phi_k, \qquad \phi_i = -\tfrac{1}{2}\epsilon_{ijk}R_{jk}. \tag{10.10a}$$

Therefore the elements of the tensor R in a Cartesian coordinate system involve just the angle φ. Note that expression (10.10a) for φ and expression (10.5c) for Rij imply that φ is half the curl of the displacement vector,

$$\boldsymbol{\phi} = \tfrac{1}{2}\nabla\times\boldsymbol{\xi}. \tag{10.10b}$$
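The claim that ξ = φ × x produces no expansion or shear, and that Eq. (10.10a) recovers φ from R, can be checked numerically. A sketch in plain Python, assuming a hypothetical small rotation vector φ and evaluating the gradient by central differences, which are exact (up to rounding) for this linear field:

```python
def cross(a, b):
    """Vector cross product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def levi_civita(i, j, k):
    """Levi-Civita symbol epsilon_ijk for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def gradient(field, x, h=1.0e-3):
    """S_ij = d field_i / d x_j by central differences (exact for linear fields)."""
    S = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = field(xp), field(xm)
        for i in range(3):
            S[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return S


phi = [2.0e-4, -1.0e-4, 3.0e-4]   # hypothetical small vectorial rotation angle (radians)


def xi(x):
    """Rigid-rotation displacement xi = phi x x."""
    return cross(phi, x)


S = gradient(xi, [0.3, -0.2, 0.5])
theta = sum(S[i][i] for i in range(3))
sym = [[0.5 * (S[i][j] + S[j][i]) for j in range(3)] for i in range(3)]
R = [[0.5 * (S[i][j] - S[j][i]) for j in range(3)] for i in range(3)]
phi_back = [-0.5 * sum(levi_civita(i, j, k) * R[j][k]
                       for j in range(3) for k in range(3))
            for i in range(3)]

print("Theta:", theta)
print("largest symmetric component:", max(abs(v) for row in sym for v in row))
print("phi recovered from R:", phi_back)
```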

A simple example of rotation is shown in the last picture in Figure 10.2. Let us consider some examples of strains that can arise in physical systems.

(i) Understanding how materials deform under various loads is central to mechanical, civil and structural engineering. As we have already remarked, in an elastic solid the deformation (i.e. strain) is proportional to the applied stress. If, for example, we have some structure of negligible weight and it supports a load, then the amount of strain will increase everywhere in proportion to this load. However, this law will only be obeyed as long as the strain is sufficiently small that the material out of which the structure is constructed behaves elastically. At a large enough strain, plastic flow will set in and the solid will not return to its original shape after the stress is removed. The point where this happens is known as the elastic limit. For a ductile substance like polycrystalline copper with a relatively low elastic limit, this occurs at strains $\sim 10^{-4}$. [However, failure (cracking or breaking of the material) will not occur until the yield point, which occurs at a strain $\sim 10^{-3}$.] For a more resilient material like cemented tungsten carbide, strains can be elastic up to $\sim 3\times10^{-3}$, and for rubber, a non-Hookean material, recoverable strains of three or four are possible. What is significant is that all these strains (with the exception of that in rubber) are small, ≪ 1. So, usually, when a material behaves elastically, the strains are small and the linear approximation is consequently pretty good.

(ii) Continental drift can be measured on the surface of the earth using Very Long Baseline Interferometry, a technique in which two or more radio telescopes are used to detect interferometric fringes using radio waves from a distant point source. (A similar technique uses the Global Positioning System to achieve comparable accuracy.) By observing the fringes, it is possible to detect changes in the spacing between the telescopes as small as a fraction of a wavelength (∼ 1 cm). As the telescopes are typically 1000 km apart, this means that dimensionless strains $\sim 10^{-8}$ can be measured. Now, the continents drift apart on a timescale $\lesssim 10^8$ yr, so it takes roughly a year for these changes to grow large enough to be measured. Such techniques are becoming useful for monitoring earthquake faults.

(iii) The smallest time-varying strains that have been measured so far involve laser interferometer gravitational wave detectors such as LIGO. In each arm of a LIGO interferometer, two mirrors hang freely, separated by 4 km. As of 2005, their separations were monitored, at frequencies ∼ 100 Hz, to $\sim 10^{-18}$ m, a thousandth the radius of a nucleon! The associated strain is $3\times10^{-22}$. Although these strains are not associated with an elastic solid, they do indicate the high accuracy of optical measurement techniques.
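The order-of-magnitude strains in examples (ii) and (iii) follow from simple ratios. The sketch below just redoes that arithmetic with the text's round numbers; reading "drift apart on a timescale $\lesssim 10^8$ yr" as a strain rate of about $10^{-8}$ per year is an interpretive assumption of this sketch.

```python
# Round numbers quoted in the text.
vlbi_resolution_m = 1.0e-2     # ~1 cm: a fraction of a radio wavelength
vlbi_baseline_m = 1.0e6        # telescopes typically ~1000 km apart
vlbi_strain = vlbi_resolution_m / vlbi_baseline_m

# Assumption: drift on a timescale of ~1e8 yr means a strain rate of ~1e-8 per year.
drift_timescale_yr = 1.0e8
strain_rate_per_yr = 1.0 / drift_timescale_yr
years_to_detect = vlbi_strain / strain_rate_per_yr

ligo_displacement_m = 1.0e-18  # mirror-separation sensitivity (circa 2005)
ligo_arm_m = 4.0e3             # 4 km arms
ligo_strain = ligo_displacement_m / ligo_arm_m

print(f"VLBI strain ~ {vlbi_strain:.0e}; detectable drift after ~{years_to_detect:.0f} yr")
print(f"LIGO strain ~ {ligo_strain:.1e}")
```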

****************************
EXERCISES

Exercise 10.1 Derivation and Practice: Reconstruction of a Tensor from its Irreducible Tensorial Parts.
Using Eqs. (1), (2), and (3) of Box 10.2, show that $\tfrac{1}{3}\Theta g_{ij} + \Sigma_{ij} + R_{ij}$ is equal to $S_{ij}$.

Exercise 10.2 Example: The Displacement Vectors Associated with Expansion, Rotation and Shear
(a) Consider a strain that is pure expansion, $S_{ij} = \tfrac{1}{3}\Theta g_{ij}$. Using Eq. (10.4), show that, in the vicinity of a chosen point, the displacement vector is $\xi_i = \tfrac{1}{3}\Theta x_i$. Draw this displacement vector field.
(b) Similarly, draw ξ(x) for a strain that is pure rotation. [Hint: express ξ in terms of the vectorial angle φ with the aid of Eq. (10.10a).]
(c) Draw ξ(x) for a strain that is pure shear. To simplify the drawing, assume that the shear is confined to the x-y plane, and make your drawing for a shear whose only nonzero components are Σxy = Σyx. Compare your drawing with Fig. 10.3, where the nonzero components are Σxx = −Σyy.

****************************

10.3 Stress and Elastic Moduli

10.3.1 Stress Tensor

The forces acting within an elastic solid are measured by a second rank tensor, the stress tensor introduced in Sec. 1.12.1 (which is also the spatial part of the stress-energy tensor of Sec. 1.12.2). Let us recall the definition of this stress tensor:

Consider two small, contiguous regions in a solid. If we take a small element of area dΣ in the contact surface with its positive sense³ (same as direction of dΣ viewed as a vector) pointing from the first region toward the second, then the first region exerts a force dF (not necessarily normal to the surface) on the second through this area. The force the second region exerts on the first (through the area −dΣ) will, by Newton’s third law, be equal and opposite to that force. The force and the area of contact are both vectors and there is a linear relationship between them. (If we double the area, we double the force.) The two vectors therefore will be related by a second-rank tensor, the stress tensor T:

$$d\mathbf{F} = \mathbf{T}\cdot d\boldsymbol{\Sigma} = \mathbf{T}(\ldots, d\boldsymbol{\Sigma}); \quad \text{i.e.,} \quad dF_i = T_{ij}\,d\Sigma_j. \tag{10.11}$$
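Equation (10.11) is just a matrix acting on a vector. A minimal sketch, assuming Cartesian components, a hypothetical pressure and shear stress, and an area element whose normal points along z: an isotropic stress T = Pg transmits a purely normal force P dΣ, while a shear stress Txz = Tzx transmits a force tangential to the surface. The last computed quantity is the normal stress ni Tij nj, which also appears in the boundary conditions derived in Sec. 10.3.2.

```python
def traction(T, dSigma):
    """Force dF_i = T_ij dSigma_j transmitted across a vectorial area element, Eq. (10.11).

    Sign convention of the text: dF is the force exerted by the region behind
    dSigma on the region in front of it (opposite to Landau and Lifshitz).
    """
    return [sum(T[i][j] * dSigma[j] for j in range(3)) for i in range(3)]


P = 1.0e5                         # hypothetical isotropic pressure, Pa
T_iso = [[P if i == j else 0.0 for j in range(3)] for i in range(3)]   # T = P g

tau = 2.0e4                       # hypothetical shear stress T_xz = T_zx, Pa
T_shear = [[0.0, 0.0, tau],
           [0.0, 0.0, 0.0],
           [tau, 0.0, 0.0]]

n = [0.0, 0.0, 1.0]               # unit normal along z
dA = 1.0e-4                       # 1 cm^2 area element
dSigma = [dA * n_j for n_j in n]

F_iso = traction(T_iso, dSigma)       # normal force of magnitude P dA along n
F_shear = traction(T_shear, dSigma)   # tangential force tau dA along x

# Normal stress n_i T_ij n_j, as in the boundary condition for an applied pressure.
normal_iso = sum(n[i] * T_iso[i][j] * n[j] for i in range(3) for j in range(3))

print("isotropic stress:", F_iso, " normal stress:", normal_iso)
print("shear stress:    ", F_shear)
```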

Thus, the tensor T is the net (vectorial) force per unit (vectorial) area that a body exerts upon its surroundings. Be aware that many books on elasticity (e.g. Landau and Lifshitz 1986) define the stress tensor with the opposite sign to (10.11). Also be careful not to confuse the shear tensor Σjk with the vectorial infinitesimal surface area dΣj. We often need to compute the total elastic force acting on some finite volume V. Let us now make an important assumption, which we discuss in Sec. 10.3.4, namely that the stress is determined by local conditions and can be computed from the local arrangement of atoms. If this assumption is valid, then (as we shall see in Sec. 10.3.4), we can compute the total force acting on the volume element by integrating the stress over its surface ∂V:

$$\mathbf{F} = -\int_{\partial\mathcal{V}} \mathbf{T}\cdot d\boldsymbol{\Sigma} = -\int_{\mathcal{V}} \nabla\cdot\mathbf{T}\,dV, \tag{10.12}$$

where we have invoked Gauss’ theorem, and the minus sign is because, for a closed surface ∂V (by convention), dΣ points out of V instead of into it. Equation (10.12) must be true for arbitrary volumes and so we can identify the elastic force density f acting on an elastic solid as

$$\mathbf{f} = -\nabla\cdot\mathbf{T}. \tag{10.13}$$

In elastostatic equilibrium, this force density must balance all other volume forces acting on the material, most commonly the gravitational force density, so that

$$\mathbf{f} + \rho\mathbf{g} = 0, \tag{10.14}$$

where g is the gravitational acceleration. (Again, there should be no confusion between the vector g and the metric tensor g.) There are other possible external forces, some of which we shall encounter later in a fluid context, e.g. an electromagnetic force density. These can be added to Eq. (10.14). Just as for the strain, the stress tensor T can be decomposed into its irreducible tensorial parts, a pure trace (the pressure P) and a symmetric trace-free part (the shear stress):

$$\mathbf{T} = P\,\mathbf{g} + \mathbf{T}^{\rm shear}; \qquad P = \tfrac{1}{3}\,{\rm Tr}(\mathbf{T}) = \tfrac{1}{3}T_{ii}. \tag{10.15}$$

³ For a discussion of area elements, including their positive sense, see Sec. 1.11.

(There is no antisymmetric part because the stress tensor is symmetric, as we saw in Sec. 1.12.) Fluids at rest exert isotropic stresses, i.e. T = P g. They cannot exert shear stress when at rest, though when moving and shearing they can exert a viscous shear stress, as we shall discuss extensively in Part IV (especially Sec. 12.6). In SI units, stress is measured in pascals, denoted Pa:

$$1\ {\rm Pa} = 1\ {\rm N/m^2} = 1\ \frac{\rm kg\ m/s^2}{\rm m^2}, \tag{10.16}$$

or sometimes in GPa = $10^9$ Pa. In cgs units, stress is measured in dyne/cm$^2$. Note that 1 Pa = 10 dyne/cm$^2$. Now let us consider some examples of stresses:

(i) Atmospheric pressure is equal to the weight of the air in a column of unit area extending above the earth, and thus is roughly $P \sim \rho g H \sim 10^5$ Pa, where $\rho \simeq 1\,{\rm kg\,m^{-3}}$ is the density of air, $g \simeq 10\,{\rm m\,s^{-2}}$ is the acceleration of gravity at the earth’s surface, and $H \simeq 10$ km is the atmospheric scale height. Thus 1 atmosphere is $\sim 10^5$ Pa (or, more precisely, $1.01325 \times 10^5$ Pa). The stress tensor is isotropic.

(ii) Suppose we hammer a nail into a block of wood. The hammer might weigh $m \sim 0.3$ kg and be brought to rest from a speed of $v \sim 10\,{\rm m\,s^{-1}}$ in a distance of, say, $d \sim 3$ mm. Then the average force exerted on the wood by the nail is $F \sim mv^2/d \sim 10^4$ N. If this is applied over an effective area $A \sim 1\,{\rm mm^2}$, then the magnitude of the typical stress in the wood is $\sim F/A \sim 10^{10}\ {\rm Pa} \sim 10^5$ atmospheres. There is a large shear component to the stress tensor, which is responsible for separating the fibers in the wood as the nail is hammered.

(iii) Neutron stars are as massive as the sun, $M \sim 2 \times 10^{30}$ kg, but have far smaller radii, $R \sim 10$ km. Their surface gravities are therefore $g \sim GM/R^2 \sim 10^{12}\,{\rm m\,s^{-2}}$, some $10^{11}$ times that encountered on earth. They have solid crusts of density $\rho \sim 10^{17}\,{\rm kg\,m^{-3}}$ that are about 1 km thick. The magnitude of the stress at the base of a neutron-star crust will then be $P \sim \rho g H \sim 10^{31}$ Pa! This stress will be mainly hydrostatic, though as the material is solid, a modest portion will be in the form of a shear stress.

(iv) As we shall discuss in Chap. 27, a popular cosmological theory called inflation postulates that the universe underwent a period of rapid, exponential expansion during its earliest epochs. This expansion was driven by the stress associated with a false vacuum. The action of this stress on the universe can be described quite adequately using a classical stress tensor. If the interaction energy is $E \sim 10^{15}$ GeV, the supposed scale of grand unification, and the associated length scale is the Compton wavelength associated with that energy, $l \sim \hbar c/E$, then the magnitude of the stress is $\sim E/l^3 \sim 10^{97}(E/10^{15}\,{\rm GeV})^4$ Pa.

(v) Elementary particles interact through forces. Although it makes no sense to describe this interaction using classical elasticity, it does make sense to make order-of-magnitude estimates of the associated stress. One promising model of these interactions involves fundamental strings with mass per unit length $\mu = g_s^2c^2/(8\pi G) \sim 0.1$ Megaton/Fermi (where Megaton is not the TNT equivalent!), and cross section of order the Planck length squared, $L_P^2 = \hbar G/c^3 \sim 10^{-70}\,{\rm m^2}$, and tension (negative pressure) $T_{zz} \sim \mu c^2/L_P^2 \sim 10^{110}$ Pa. Here ħ, G and c are Planck’s (reduced) constant, Newton’s gravitation constant, and the speed of light, and $g_s^2 \sim 0.025$ is the string coupling constant.

(vi) The highest possible stress is presumably found associated with singularities, for example at the creation of the universe or inside a black hole. Here the characteristic energy is the Planck energy $E_P = (\hbar c^5/G)^{1/2} \sim 10^{19}$ GeV, the length scale is the Planck length $L_P = (\hbar G/c^3)^{1/2} \sim 10^{-35}$ m, and the associated ultimate stress is $\sim 10^{114}$ Pa.
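The order-of-magnitude stresses in examples (i), (ii), (iv), and (vi) can be reproduced with a few lines of arithmetic. This sketch assumes CODATA 2018 values for ħ, c, G, and the electron-volt, and uses the text's round numbers for everything else; it reports only the power of ten, which is all the estimates claim.

```python
import math

# Assumed CODATA 2018 values (SI units).
hbar = 1.054571817e-34      # J s
c = 2.99792458e8            # m / s
G = 6.67430e-11             # m^3 kg^-1 s^-2
GeV = 1.602176634e-10       # J

# (i) Atmosphere: P ~ rho g H with rho ~ 1 kg/m^3, g ~ 10 m/s^2, H ~ 10 km.
P_atm = 1.0 * 10.0 * 1.0e4

# (ii) Hammer: F ~ m v^2 / d and stress ~ F / A, with m ~ 0.3 kg, v ~ 10 m/s,
# d ~ 3 mm, A ~ 1 mm^2.
F_hammer = 0.3 * 10.0**2 / 3.0e-3
stress_hammer = F_hammer / 1.0e-6

# (iv) Inflation: stress ~ E / l^3 with l ~ hbar c / E, i.e. E^4 / (hbar c)^3.
E = 1.0e15 * GeV
stress_inflation = E**4 / (hbar * c)**3

# (vi) Planck scale: E_P / L_P^3.
E_P = math.sqrt(hbar * c**5 / G)
L_P = math.sqrt(hbar * G / c**3)
stress_planck = E_P / L_P**3

for name, value in [("atmosphere", P_atm), ("hammer", stress_hammer),
                    ("inflation", stress_inflation), ("Planck", stress_planck)]:
    print(f"{name:10s} {value:.1e} Pa  (~10^{round(math.log10(value))} Pa)")
```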

10.3.2 Elastic Moduli and Elastostatic Stress Balance

Having introduced the stress and the strain tensors, we are now in a position to generalize Hooke’s law by postulating a linear relationship between them. The most general linear equation relating two second-rank tensors will involve a fourth-rank tensor known as the elastic modulus tensor, Y. In slot-naming index notation,

$$T_{ij} = -Y_{ijkl}S_{kl}. \tag{10.17}$$

Now, a general fourth-rank tensor in three dimensions has $3^4 = 81$ independent components. Elasticity can get complicated! However, the situation need not be so dire. There are several symmetries that we can exploit. Let us look first at the general case. As the stress tensor is symmetric, and only the symmetric part of the strain tensor creates stress (i.e., a solid-body rotation through some vectorial angle φ produces no stress), Y is symmetric in its first pair of slots and also in its second pair: Yijkl = Yjikl = Yijlk. There are therefore 6 independent components Yijkl for variable i, j and fixed k, l, and vice versa. In addition, as we show below, Y is symmetric under an interchange of its first and second pairs of slots: Yijkl = Yklij. There are therefore (6 × 7)/2 = 21 independent components in Y. This is an improvement over 81. Many substances, notably crystals, exhibit additional symmetries and this can reduce the number of independent components considerably.

The simplest, and in fact most common, case arises when the medium is isotropic. In other words, there are no preferred directions in the material. This occurs when the solid is polycrystalline or amorphous and completely disordered on a scale large compared with the atomic spacing, but small compared with the solid’s inhomogeneity scale. If a body is isotropic, then its elastic properties must be describable by scalars. Now, the stress tensor T, being symmetric, must have just two irreducible tensorial parts, T = (a scalar P)g + (a trace-free symmetric part Tshear); and the parts of the strain that can produce this {P, Tshear} are the scalar expansion Θ and the trace-free, symmetric shear Σ, but not the rotation. The only linear, coordinate-independent relationship between these {P, Tshear} and {Θ, Σ} involving solely scalars is P = −KΘ, Tshear = −2µΣ, corresponding to a total stress tensor

$$\mathbf{T} = -K\Theta\,\mathbf{g} - 2\mu\boldsymbol{\Sigma}. \tag{10.18}$$

Here K is called the bulk modulus and µ the shear modulus, and the factor 2 is included for purely historical reasons. In Sec. 10.4 we will deduce the relationship of these elastic moduli to Young’s modulus E (which appears in Hooke’s law for the stress in a stretched rod or fiber [Eq. (10.1) and Fig. 10.1]). In some treatments and applications of elasticity, µ and the coefficient $\lambda \equiv K - \tfrac{2}{3}\mu$ are called the Lamé coefficients (λ is usually termed the first and µ the second), and λ is used in place of K. It is commonly the case that the elastic moduli K and µ are constant, i.e. are independent of location in the medium, even though the medium is stressed in an inhomogeneous way. (This is because the strains are small and thus perturb the material properties by only small amounts.) If so, we can deduce (Ex. 10.3) an expression for the elastic force density inside the body [Eq. (10.13)]:

$$\mathbf{f} = -\nabla\cdot\mathbf{T} = K\nabla\Theta + 2\mu\nabla\cdot\boldsymbol{\Sigma} = \left(K + \tfrac{1}{3}\mu\right)\nabla(\nabla\cdot\boldsymbol{\xi}) + \mu\nabla^2\boldsymbol{\xi}. \tag{10.19}$$
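Equation (10.19) (the content of Ex. 10.3) can be spot-checked numerically without doing the vector calculus by hand. The sketch below assumes constant, made-up moduli K and µ and a hypothetical quadratic displacement field; it builds T from Eq. (10.18), takes f = −∇ · T by central differences (exact up to rounding here, because T is linear in position), and compares the result with (K + µ/3)∇(∇ · ξ) + µ∇²ξ worked out by hand for this field.

```python
K, mu = 2.0, 1.5            # hypothetical constant moduli (arbitrary units)
a, b, c = 0.7, -0.4, 1.1    # hypothetical coefficients of the displacement field


def xi(x):
    """Hypothetical quadratic displacement field xi = (a x^2, b x y, c y z)."""
    return [a * x[0] ** 2, b * x[0] * x[1], c * x[1] * x[2]]


def partial(fn, x, j, h=0.05):
    """Central difference d fn / d x_j of a list-valued function.

    Exact (up to rounding) for polynomials of degree <= 2, which is all we need.
    """
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    return [(p - m) / (2.0 * h) for p, m in zip(fn(xp), fn(xm))]


def stress(x):
    """T_ij = -K Theta delta_ij - 2 mu Sigma_ij, Eq. (10.18), flattened to 9 entries."""
    cols = [partial(xi, x, j) for j in range(3)]          # cols[j][i] = d xi_i / d x_j
    S = [[cols[j][i] for j in range(3)] for i in range(3)]
    theta = S[0][0] + S[1][1] + S[2][2]
    T = []
    for i in range(3):
        for j in range(3):
            delta = 1.0 if i == j else 0.0
            sigma = 0.5 * (S[i][j] + S[j][i]) - theta * delta / 3.0
            T.append(-K * theta * delta - 2.0 * mu * sigma)
    return T


def force_density(x):
    """f_i = -T_ij,j, Eq. (10.13), by central differences of the stress."""
    dT = [partial(stress, x, j) for j in range(3)]        # dT[j][3*i + k] = d T_ik / d x_j
    return [-sum(dT[j][3 * i + j] for j in range(3)) for i in range(3)]


x0 = [0.2, -0.3, 0.5]
f_numeric = force_density(x0)

# Right side of Eq. (10.19) for this field, worked out by hand:
# div xi = (2a + b) x + c y, so grad(div xi) = (2a + b, c, 0); laplacian xi = (2a, 0, 0).
grad_div = [2 * a + b, c, 0.0]
laplacian = [2 * a, 0.0, 0.0]
f_formula = [(K + mu / 3.0) * g + mu * l for g, l in zip(grad_div, laplacian)]

print("numerical  f:", f_numeric)
print("Eq. (10.19):", f_formula)
```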

Here ∇ · Σ in index notation is Σij;j = Σji;j. Extra terms must be added if we are dealing with less symmetric materials. However, in this book Eq. (10.19) will be sufficient for our needs. If no other countervailing forces act in the interior of the material (e.g., if there is no gravitational force), and if, as in this chapter, the material is in a static, equilibrium state rather than vibrating dynamically, then this force density will have to vanish throughout the material’s interior. This vanishing of f ≡ −∇ · T is just a fancy version of Newton’s law for static situations, F = ma = 0. If the material has density ρ and is pulled on by a gravitational acceleration g, then the sum of the elastostatic force per unit volume and gravitational force per unit volume must vanish, f + ρg = 0. When external forces are applied to the surface of an elastic body (for example, when one pushes on the face of a cylinder) and gravity acts on the interior, the distribution of the displacement ξ(x) inside the body can be computed by solving the zero-internal-force equation

$$\mathbf{f} + \rho\mathbf{g} = \left(K + \tfrac{1}{3}\mu\right)\nabla(\nabla\cdot\boldsymbol{\xi}) + \mu\nabla^2\boldsymbol{\xi} + \rho\mathbf{g} = 0, \tag{10.20}$$

subject to boundary conditions provided by the applied forces. Solving this equation for ξ(x), subject to specified boundary conditions, is a problem in elastostatics analogous to solving Maxwell’s equations for an electric field subject to boundary conditions in electrostatics, or for a magnetic field subject to boundary conditions in magnetostatics, and the types of solution techniques used in electrostatics and magnetostatics can also be used here, e.g. separation of variables and Green’s functions. We shall explore examples in Sec. 10.6.2 and Exs. 10.14 and 10.15 below.

In electrostatics one can derive boundary conditions by integrating Maxwell’s equations over the interior of a thin box (a “pill box”) with parallel faces that snuggle up to the boundary (Fig. 10.4). For example, by integrating ∇ · E = ρe/ǫo over the interior of the pill box, then applying Gauss’s law to convert the left side to a surface integral, we obtain the junction condition that the discontinuity in the normal component of the electric field is equal to 1/ǫo times the surface charge density. Similarly, in elastostatics one can derive boundary conditions by integrating the elastostatic equation ∇ · T = 0 over the pill box of Fig. 10.4 and then applying Gauss’s law:

$$0 = \int_{\mathcal{V}} \nabla\cdot\mathbf{T}\,dV = \int_{\partial\mathcal{V}} \mathbf{T}\cdot d\boldsymbol{\Sigma} = \int_{\partial\mathcal{V}} \mathbf{T}\cdot\mathbf{n}\,dA = \left[(\mathbf{T}\cdot\mathbf{n})_{\rm upper\ face} - (\mathbf{T}\cdot\mathbf{n})_{\rm lower\ face}\right]A. \tag{10.21}$$

Here in the next-to-last expression we have used dΣ = n dA, where dA is the scalar area element and n is the unit normal to the pill-box face; and in the last term we have assumed the pill box is thin enough that its sides contribute negligibly, and that its faces are small enough that T · n can be treated as constant and be pulled outside the integral. The result is the boundary condition that T · n must be continuous across any boundary, i.e. in index notation, Tij nj is continuous. Physically this is nothing but the law of force balance across the boundary: The force per unit area acting from the lower side to the upper side must be equal and opposite to that acting from upper to lower. As an example, if the upper face is bounded by vacuum then the solid’s stress tensor must satisfy Tij nj = 0 at the surface. If a normal pressure P is applied by some external agent at the upper face, then the solid must respond with a normal force equal to P: ni Tij nj = P. If a vectorial force per unit area Fi is applied at the upper face by some external agent, then it must be balanced: Tij nj = Fi.

[Figure: a thin pill box straddling a boundary, with unit normal n]

Fig. 10.4: Pill box used to derive boundary conditions in electrostatics and elastostatics.

10.3.3 Energy of Deformation

Take a wire of length ℓ and cross-sectional area A, and stretch it (e.g. via the “Hooke’s-law experiment” of Fig. 10.1) by an amount ζ′ that grows gradually from 0 to ∆ℓ. When the stretch is ζ′, the force that does the stretching is F′ = EA(ζ′/ℓ) = (EV/ℓ²)ζ′; here V = Aℓ is the wire’s volume and E is its Young’s modulus. As the wire is gradually lengthened, the stretching force F′ does work

$$W = \int_0^{\Delta\ell} F'\,d\zeta' = \int_0^{\Delta\ell} \frac{EV}{\ell^2}\,\zeta'\,d\zeta' = \tfrac{1}{2}EV\left(\frac{\Delta\ell}{\ell}\right)^2. \tag{10.22}$$

This tells us that the stored elastic energy per unit volume is

$$U = \tfrac{1}{2}E\left(\frac{\Delta\ell}{\ell}\right)^2. \tag{10.23}$$
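As a check on Eq. (10.22), the work integral can be evaluated numerically. This sketch assumes hypothetical wire parameters (a Young's modulus of roughly steel's order, a 1 m length, a 1 mm² cross section, and a 1 mm stretch) and uses the trapezoidal rule, which is exact for the linear force law.

```python
E_young = 2.0e11     # hypothetical Young's modulus, Pa (order of steel)
A = 1.0e-6           # hypothetical cross-sectional area, m^2 (1 mm^2)
ell = 1.0            # hypothetical wire length, m
dell = 1.0e-3        # final stretch Delta ell, m
V = A * ell          # wire volume, m^3


def stretching_force(zeta):
    """Hooke's law F' = E A zeta / ell = (E V / ell^2) zeta."""
    return E_young * V / ell**2 * zeta


# Trapezoidal rule for W = integral_0^dell F'(zeta) d zeta (exact for a linear force).
N = 1000
step = dell / N
W = sum(0.5 * (stretching_force(k * step) + stretching_force((k + 1) * step)) * step
        for k in range(N))

W_formula = 0.5 * E_young * V * (dell / ell) ** 2   # Eq. (10.22)
U = W / V                                           # energy per unit volume, Eq. (10.23)

print("W numerical:", W, "J   W from Eq. (10.22):", W_formula, "J")
print("U =", U, "J/m^3")
```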

To generalize this formula to three dimensions, consider an arbitrary but very small region V inside a body that has already been stressed by a displacement vector field ξi and is thus already experiencing an elastic stress Tij given by the three-dimensional stress-strain

17 relation (10.18). Imagine building up this displacement gradually from zero at the same rate everywhere in and around V, so at some moment during the buildup the displacement field is ξi′ = ξiǫ (with the parameter ǫ gradually growing from 0 to 1). At that moment, the stress tensor (by virtue of the linearity of the stress-strain relation) is Tij′ = Tij ǫ. On the boundary ∂V of the region V, this stress exerts a force ∆Fi′ = −Tij′ ∆Σj across any surface element ∆Σj , from the exterior of ∂V to its interior. As the displacement grows, this surface force does the following amount of work on V: Z Z Z 1 1 ′ ′ ′ ′ (10.24) ∆Wsurf = ∆Fi dξi = (−Tij ∆Σj )dξi = − Tij ǫ∆Σj ξi′dǫ = − Tij ∆Σj ξi . 2 0 The total amount of work done can be computed by adding up the contributions from all the surface elements of ∂V: Z Z 1 1 1 Wsurf = − Tij ξi dΣj = − (Tij ξi);j dV = − (Tij ξi );j V . (10.25) 2 ∂V 2 V 2 In the second step we have used Gauss’s theorem, and in the third step we have used the smallness of the region V to infer that the integrand is very nearly constant and the integral is the integrand times the total volume V of V. Does this equal the elastic energy stored in V? The answer is “no”, because we must also take account of the work done in the interior of V by gravity or any other non-elastic force that may be acting. Now, although it is not easy in practice to turn gravity off and then on, we must do so in this thought experiment: In the volume’s final deformed state, the divergence of its elastic stress tensor is equal to the gravitational force density, ∇ · T = ρg [Eqs. (10.13) and (10.14]; and in the initial, undeformed and unstressed state, ∇ · T must be zero, whence so must be g. Therefore, we must imagine growing the gravitational force proportional to ǫ just like we grow the displacement, strain and stress. 
During this growth, the gravitational force $\rho\mathbf g'V = \rho\mathbf gV\epsilon$ does the following amount of work on our tiny region $\mathcal V$:
$$W_{\rm grav} = \int\rho V\mathbf g'\cdot d\boldsymbol\xi' = \int_0^1\rho V\mathbf g\epsilon\cdot\boldsymbol\xi\,d\epsilon = \tfrac12\rho V\mathbf g\cdot\boldsymbol\xi = \tfrac12(\nabla\cdot\mathbf T)\cdot\boldsymbol\xi V = \tfrac12 T_{ij;j}\xi_i V . \tag{10.26}$$
The total work done to deform $\mathcal V$ is the sum of the work done by the elastic force (10.25) on its surface and by the gravitational force (10.26) in its interior:
$$W_{\rm surf} + W_{\rm grav} = -\tfrac12(\xi_iT_{ij})_{;j}V + \tfrac12 T_{ij;j}\xi_iV = -\tfrac12 T_{ij}\xi_{i;j}V .$$
This work gets stored in $\mathcal V$ as elastic energy, so the energy density is $U = -\tfrac12 T_{ij}\xi_{i;j}$. Insert $T_{ij} = -K\Theta g_{ij} - 2\mu\Sigma_{ij}$ and $\xi_{i;j} = \tfrac13\Theta g_{ij} + \Sigma_{ij} + R_{ij}$, and perform some simple algebra that relies on the symmetry properties of the expansion, shear, and rotation (Ex. 10.5). We obtain
$$U = \tfrac12 K\Theta^2 + \mu\Sigma_{ij}\Sigma_{ij} . \tag{10.27}$$

Note that this elastic energy density is positive for any nonzero expansion or shear if the elastic moduli are positive. They must be positive in order that matter be stable to small perturbations.
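Identity (10.27) can be spot-checked numerically. The Python sketch below draws a random Cartesian displacement gradient $\xi_{i;j}$ and splits it into expansion, shear and rotation. It then forms the isotropic stress $T_{ij} = -K\Theta g_{ij} - 2\mu\Sigma_{ij}$ and compares $-\tfrac12 T_{ij}\xi_{i;j}$ with $\tfrac12K\Theta^2 + \mu\Sigma_{ij}\Sigma_{ij}$. Several choices are ours:

- The helper names `decompose` and `energy_two_ways` are hypothetical.
- It assumes Cartesian components, so $g_{ij} = \delta_{ij}$.
- The moduli are illustrative.

```python
import random


def decompose(grad):
    """Split a 3x3 displacement gradient xi_{i;j} into the expansion Theta,
    the symmetric trace-free shear Sigma_ij and the antisymmetric rotation R_ij."""
    theta = sum(grad[i][i] for i in range(3))
    shear = [[0.5 * (grad[i][j] + grad[j][i]) - (theta / 3 if i == j else 0.0)
              for j in range(3)] for i in range(3)]
    rot = [[0.5 * (grad[i][j] - grad[j][i]) for j in range(3)] for i in range(3)]
    return theta, shear, rot


def energy_two_ways(grad, K, mu):
    """Return (-1/2 T_ij xi_{i;j}, 1/2 K Theta^2 + mu Sigma_ij Sigma_ij)."""
    theta, shear, _ = decompose(grad)
    # Isotropic stress-strain relation in Cartesian components (g_ij = delta_ij).
    stress = [[-K * theta * (1.0 if i == j else 0.0) - 2 * mu * shear[i][j]
               for j in range(3)] for i in range(3)]
    u_work = -0.5 * sum(stress[i][j] * grad[i][j]
                        for i in range(3) for j in range(3))
    u_formula = 0.5 * K * theta**2 + mu * sum(
        shear[i][j] ** 2 for i in range(3) for j in range(3))
    return u_work, u_formula


rng = random.Random(1)
grad = [[rng.uniform(-1e-3, 1e-3) for _ in range(3)] for _ in range(3)]
u_work, u_formula = energy_two_ways(grad, K=170e9, mu=81e9)
print(u_work, u_formula)  # the two values agree to rounding error
```

The rotation part of the gradient never enters the energy. A gradient that is pure rotation therefore stores nothing.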


Fig. 10.5: Action of electromagnetic forces within a solid. If we compute the force acting on one side of a slice of material, the integral is dominated by interactions between atoms lying in the shaded area. The force is effectively a surface force rather than a volume force. In elastostatic equilibrium, the forces acting on the two sides of the slice are effectively equal and opposite.

For the more general, anisotropic case, expression (10.27) becomes [by virtue of the stress-strain relation $T_{ij} = -Y_{ijkl}\xi_{k;l}$, Eq. (10.17)]
$$U = \tfrac12\xi_{i;j}Y_{ijkl}\xi_{k;l} . \tag{10.28}$$

The volume integral of the elastic energy density (10.27) or (10.28) can be used as an action from which to compute the stress, by varying the displacement (Ex. 10.6). Only the part of $\mathbf Y$ that is symmetric under interchange of the first and second pairs of slots contributes to $U$, so only that part can affect the action-principle-derived stress. Therefore, it must be that $Y_{ijkl} = Y_{klij}$. This is the symmetry we asserted earlier.
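The statement that only the pair-symmetric part of $\mathbf Y$ contributes to $U$ is easy to see numerically. In this Python sketch (helper names are hypothetical), a random rank-4 array is split into parts that are symmetric and antisymmetric under $(ij)\leftrightarrow(kl)$. For any displacement gradient, the antisymmetric part then contributes nothing to the energy (10.28), up to rounding.

```python
import itertools
import random

INDICES = list(itertools.product(range(3), repeat=4))


def energy(Y, grad):
    """U = (1/2) xi_{i;j} Y_ijkl xi_{k;l}, Eq. (10.28); Y maps (i, j, k, l) -> value."""
    return 0.5 * sum(grad[i][j] * Y[(i, j, k, l)] * grad[k][l]
                     for i, j, k, l in INDICES)


def split_pairs(Y):
    """Return the parts of Y symmetric and antisymmetric under (ij) <-> (kl)."""
    sym = {(i, j, k, l): 0.5 * (Y[(i, j, k, l)] + Y[(k, l, i, j)])
           for i, j, k, l in INDICES}
    anti = {(i, j, k, l): 0.5 * (Y[(i, j, k, l)] - Y[(k, l, i, j)])
            for i, j, k, l in INDICES}
    return sym, anti


rng = random.Random(0)
Y = {idx: rng.uniform(-1, 1) for idx in INDICES}
Y_sym, Y_anti = split_pairs(Y)
grad = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
print(energy(Y, grad), energy(Y_sym, grad), energy(Y_anti, grad))
```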

10.3.4

Molecular Origin of Elastic Stress

It is important to understand the microscopic origin of the elastic stress. Consider an ionic solid made of singly charged ions (e.g. sodium and chlorine). Each ion attracts its nearest neighbors through their mutual Coulomb attraction, repels its next-nearest neighbors, and so on. Overall there is a net attraction, which is balanced by the short-range repulsion of the bound electrons.

Now consider a thin slice of material, a few atomic spacings thick, so its thickness is intermediate between the interatomic spacing and the solid's inhomogeneity scale (Fig. 10.5). Suppose we calculate the force exerted on the material in the slice by the external atoms on one side of it. We find that the sum converges very close to the boundary. The electrostatic force between individual atoms is long range, but the material is electrically neutral, so when averaged over many atoms the net electric force is short range. Consider a region large enough to encompass many atoms but much smaller than the body's inhomogeneity scale. We can therefore treat the net force acting on it as a surface force governed by local conditions in the material. This is essential if we are to write down a local, linear stress-strain relation $T_{ij} = -Y_{ijkl}S_{kl}$.

It need not have been the case, and there are circumstances in which a long-range force develops. One example is certain types of crystal (e.g. tourmaline), which develop internal piezoelectric fields when strained. Our treatment so far has also implicitly assumed that matter is continuous on all scales and that derivatives are mathematically well-defined. Of course, this is not the case. In fact, we not only need to acknowledge the existence of atoms; we must use this fact to compute the elastic moduli.


Fig. 10.6: a) A perfect crystal in which the atoms are organized in a perfectly repeating lattice can develop very large shear strains without yielding. b) Real materials contain dislocations which greatly reduce their rigidity. The simplest type of dislocation, shown here, is the edge dislocation. The dislocation will move and the crystal will undergo inelastic deformation when the stress is typically less than one per cent of the yield shear stress for a perfect crystal.

| Substance | $K$ (GPa) | $\mu$ (GPa) | $E$ (GPa) | $\nu$ | $c_L$ (km s$^{-1}$) | $c_T$ (km s$^{-1}$) |
|---|---|---|---|---|---|---|
| Steel | 170 | 81 | 210 | 0.29 | 5.9 | 3.2 |
| Copper | 130 | 45 | 120 | 0.34 | 4.6 | 2.2 |
| Glass | 47 | 28 | 70 | 0.25 | 5.8 | 3.3 |
| Rubber | 10 | 0.0007 | 0.002 | 0.50 | 1.0 | 0.03 |

Table 10.1: Bulk, shear and Young's moduli and Poisson's ratio for a range of materials. The final two columns quote the longitudinal and transverse sound speeds defined in the following chapter.

We can estimate the elastic moduli of ionic or metallic materials as follows. If a crystal lattice were given a dimensionless strain of order unity, the elastic stress would be of order the electrostatic force between adjacent ions divided by the area associated with each ion. For a lattice spacing $a \sim 2$ Å and singly charged ions, this gives $K, \mu \sim e^2/4\pi\epsilon_0a^4 \sim 100$ GPa, about a million atmospheres. Covalently bonded compounds are less tightly bound and have somewhat smaller elastic moduli; see Table 10.1.

It might be thought, on the basis of this argument, that crystals can be subjected to strains of order unity before they reach their elastic limits. However, as explained above, most materials are elastic only for strains $\lesssim 10^{-3}$. The reason for this difference is that crystals are generally imperfect and are laced with dislocations. Relatively small stresses suffice for the dislocations to move through the solid, and the crystal thereby undergoes permanent deformation (Fig. 10.6).

****************************
EXERCISES

Exercise 10.3 Derivation and Practice: Elastic Force Density
From Eq. (10.18) derive expression (10.19) for the elastostatic force density inside an elastic body.

Exercise 10.4 *** Practice: Biharmonic Equation

A homogeneous, isotropic, elastic solid is in equilibrium under (uniform) gravity and applied surface stresses. Use Eq. (10.19) to show that the displacement $\boldsymbol\xi(\mathbf x)$ inside it is biharmonic, i.e. it satisfies the differential equation
$$\nabla^2\nabla^2\boldsymbol\xi = 0 . \tag{10.29a}$$
Show also that the expansion $\Theta$ satisfies the Laplace equation
$$\nabla^2\Theta = 0 . \tag{10.29b}$$

Exercise 10.5 Derivation and Practice: Elastic Energy
Beginning with $U = -\tfrac12 T_{ij}\xi_{i;j}$ [text following Eq. (10.26)], derive $U = \tfrac12K\Theta^2 + \mu\Sigma_{ij}\Sigma_{ij}$ for the elastic energy density inside a body.

Exercise 10.6 Derivation and Practice: Action Principle for Elastic Stress
Consider an anisotropic, elastic body with elastic energy density $U = \tfrac12\xi_{i;j}Y_{ijkl}\xi_{k;l}$. Integrate this energy density over a three-dimensional region $\mathcal V$ (not necessarily small) to get the total elastic energy $E$. Now consider a small variation $\delta\xi_i$ in the displacement field, and evaluate the resulting change $\delta E$ in the elastic energy without using the relation $T_{ij} = -Y_{ijkl}\xi_{k;l}$. Convert to a surface integral over $\partial\mathcal V$, and therefrom infer the stress-strain relation $T_{ij} = -Y_{ijkl}\xi_{k;l}$.

Exercise 10.7 Problem: Order of Magnitude Estimates
(a) What length of steel wire can hang vertically without breaking?
(b) What is the maximum size of a non-spherical asteroid? [Hint: if the asteroid is too large, its gravity will deform it into a spherical shape.]
(c) Can a helium balloon lift the tank used to transport the helium?

****************************
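The order-of-magnitude estimate $K, \mu \sim e^2/4\pi\epsilon_0a^4$ made before the exercises can be reproduced in a few lines. This Python sketch assumes singly charged ions and $a = 2$ Å, as in the text. The physical constants are CODATA values, and the function name is our own.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # C
EPSILON_0 = 8.8541878128e-12          # F/m
ATMOSPHERE = 101325.0                 # Pa


def ionic_modulus_estimate(a):
    """Rough elastic modulus in Pa: the Coulomb force between singly charged
    neighbours, e^2 / (4 pi eps0 a^2), divided by the area a^2 per ion."""
    return ELEMENTARY_CHARGE**2 / (4 * math.pi * EPSILON_0 * a**4)


modulus = ionic_modulus_estimate(2e-10)
print(f"{modulus / 1e9:.0f} GPa, {modulus / ATMOSPHERE:.1e} atm")
```

The result is roughly 140 GPa, or about 1.4 million atmospheres. This is consistent with the $\sim 100$ GPa and "about a million atmospheres" quoted above, and with the steel and copper entries of Table 10.1.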

10.4

Young’s Modulus and Poisson’s Ratio for an Isotropic Material: A Simple Elastostatics Problem

As a simple example of an elastostatics problem, we shall explore the connection between our three-dimensional theory of stress and strain and the one-dimensional Hooke's law [Fig. 10.1 and Eq. (10.1)]. Consider a thin rod of square cross section hanging along the $\mathbf e_z$ direction of a Cartesian coordinate system (Fig. 10.1). Subject the rod to a stretching force applied normally and uniformly at its ends. (It could just as easily be a rod under compression.) Its sides are free to expand transversely, since no force acts on them: $dF_i = T_{ij}\,d\Sigma_j = 0$.

As the rod is slender, vanishing of $dF_i$ at its $x$ and $y$ sides implies, to high accuracy, that the stress components $T_{ix}$ and $T_{iy}$ vanish throughout the interior. Otherwise there would be a very large force density $T_{ij;j}$ inside the rod. Using $T_{ij} = -K\Theta g_{ij} - 2\mu\Sigma_{ij}$, we then obtain
$$T_{xx} = -K\Theta - 2\mu\Sigma_{xx} = 0 , \tag{10.30a}$$
$$T_{yy} = -K\Theta - 2\mu\Sigma_{yy} = 0 , \tag{10.30b}$$
$$T_{yz} = -2\mu\Sigma_{yz} = 0 , \tag{10.30c}$$
$$T_{xz} = -2\mu\Sigma_{xz} = 0 , \tag{10.30d}$$
$$T_{xy} = -2\mu\Sigma_{xy} = 0 , \tag{10.30e}$$
$$T_{zz} = -K\Theta - 2\mu\Sigma_{zz} . \tag{10.30f}$$

From the first two of these equations and $\Sigma_{xx} + \Sigma_{yy} + \Sigma_{zz} = 0$, we obtain a relationship between the expansion and the nonzero components of the shear:
$$K\Theta = \mu\Sigma_{zz} = -2\mu\Sigma_{xx} = -2\mu\Sigma_{yy} ; \tag{10.31}$$

and from this and Eq. (10.30f) we obtain $T_{zz} = -3K\Theta$. The decomposition of $S_{ij}$ into its irreducible tensorial parts tells us that $S_{zz} = \xi_{z;z} = \Sigma_{zz} + \tfrac13\Theta$. Upon using Eq. (10.31), this becomes $\xi_{z;z} = [(3K+\mu)/3\mu]\Theta$. Combining with $T_{zz} = -3K\Theta$, we obtain Hooke's law and an expression for Young's modulus $E$ in terms of the bulk and shear moduli:
$$\frac{-T_{zz}}{\xi_{z;z}} = \frac{9\mu K}{3K+\mu} = E . \tag{10.32}$$
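The chain of substitutions from Eqs. (10.30) to (10.32) can be replayed numerically. The Python sketch below builds the shear components from Eq. (10.31). It confirms that the transverse stress vanishes and that $T_{zz} = -3K\Theta$, then forms $-T_{zz}/\xi_{z;z}$. The helper name is hypothetical, the moduli are the steel row of Table 10.1, and the expansion $\Theta$ is an arbitrary choice.

```python
def stretched_rod(K, mu, theta):
    """Stress and strain in a slender rod with free sides, Eqs. (10.30)-(10.31).

    Returns (T_xx, T_zz, S_zz) for a chosen expansion theta."""
    sigma_zz = K * theta / mu              # Eq. (10.31)
    sigma_xx = -K * theta / (2 * mu)       # Eq. (10.31); Sigma_yy is the same
    t_xx = -K * theta - 2 * mu * sigma_xx  # Eq. (10.30a): should vanish
    t_zz = -K * theta - 2 * mu * sigma_zz  # Eq. (10.30f)
    s_zz = sigma_zz + theta / 3            # S_zz = Sigma_zz + Theta/3
    return t_xx, t_zz, s_zz


K, mu, theta = 170e9, 81e9, 1e-4           # steel (Table 10.1), arbitrary Theta
t_xx, t_zz, s_zz = stretched_rod(K, mu, theta)
young = -t_zz / s_zz
print(young / 1e9, 9 * mu * K / (3 * K + mu) / 1e9)  # both about 209.7
```

The ratio $-T_{zz}/\xi_{z;z}$ is independent of the chosen $\Theta$, as Hooke's law requires.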

It is conventional to introduce Poisson's ratio, $\nu$. It is defined as minus the ratio of the lateral strain to the longitudinal strain during a deformation of this type, in which the transverse motion is unconstrained. It can be expressed as a ratio of elastic moduli as follows:
$$\nu = -\frac{\xi_{x,x}}{\xi_{z,z}} = -\frac{\Sigma_{xx} + \tfrac13\Theta}{\Sigma_{zz} + \tfrac13\Theta} = \frac{3K - 2\mu}{2(3K+\mu)} , \tag{10.33}$$
where we have used Eq. (10.31). We tabulate these and their inverses for future use:
$$E = \frac{9\mu K}{3K+\mu} , \qquad \nu = \frac{3K-2\mu}{2(3K+\mu)} ; \qquad K = \frac{E}{3(1-2\nu)} , \qquad \mu = \frac{E}{2(1+\nu)} . \tag{10.34}$$

We have already remarked that mechanical stability of a solid requires $K, \mu > 0$. Using Eq. (10.34), we observe that this restricts Poisson's ratio to $-1 < \nu < 1/2$.

For metals, Poisson's ratio is typically about 1/3, and the shear modulus is roughly half the bulk modulus. For a substance that is easily sheared but not easily compressed, like rubber, the bulk modulus is relatively high and $\nu \simeq 1/2$ (cf. Table 10.1). For some exotic materials, Poisson's ratio can be negative (cf. Yeganeh-Haeri et al. 1992).

Although we derived them for a square rod under tension, our expressions for Young's modulus and Poisson's ratio are quite general. To see this, observe that the derivation would be unaffected if we combined many parallel, square fibers together. All that is necessary is that the transverse motion be free, so that the only applied force is normal to a pair of parallel faces.
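Relations (10.34) can be checked against Table 10.1. The Python sketch below (function names are hypothetical) does three things:

- converts each tabulated pair $(K, \mu)$ to $(E, \nu)$;
- compares the results with the tabulated $E$ and $\nu$, which are quoted to only one or two significant figures;
- confirms the inverse formulas by a round trip.

```python
def young_poisson(K, mu):
    """(E, nu) from bulk and shear moduli, Eq. (10.34)."""
    return 9 * mu * K / (3 * K + mu), (3 * K - 2 * mu) / (2 * (3 * K + mu))


def bulk_shear(E, nu):
    """(K, mu) from Young's modulus and Poisson's ratio, Eq. (10.34)."""
    return E / (3 * (1 - 2 * nu)), E / (2 * (1 + nu))


# Table 10.1: substance -> (K, mu, E) in GPa and nu (dimensionless)
TABLE_10_1 = {
    "steel": (170, 81, 210, 0.29),
    "copper": (130, 45, 120, 0.34),
    "glass": (47, 28, 70, 0.25),
    "rubber": (10, 0.0007, 0.002, 0.50),
}

for name, (K, mu, E_tab, nu_tab) in TABLE_10_1.items():
    E, nu = young_poisson(K, mu)
    print(f"{name:7s} E = {E:8.4g} GPa (table {E_tab}), nu = {nu:.3f} (table {nu_tab})")
```

For rubber, $\mu \ll K$ drives $\nu$ to within $10^{-4}$ of $1/2$. At that point $1 - 2\nu$ is tiny, so recovering $K$ from $(E, \nu)$ is numerically delicate, though still accurate in double precision.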


10.5

T2 Cylindrical and Spherical Coordinates: Connection Coefficients and Components of Strain

Thus far, in our discussion of elasticity, we have restricted ourselves to Cartesian coordinates. However, many problems in elasticity are most efficiently solved using cylindrical or spherical coordinates, so in this section we shall develop some mathematical tools for those coordinate systems.

In doing so we follow the vectorial conventions of standard texts on electrodynamics and quantum mechanics (e.g., Jackson 1999, and Messiah 1962). We introduce an orthonormal set of basis vectors associated with each of our curvilinear coordinate systems: the coordinate lines are orthogonal to each other, and the basis vectors have unit lengths and point along the coordinate lines. We shall follow this practice in our study of continuum mechanics (Part III – Elasticity, Part IV – Fluid Mechanics, and Part V – Plasma Physics). Then, in studying General Relativity and Cosmology (Part VI), we shall introduce and use basis vectors that are not orthonormal.

Our notation for cylindrical coordinates is $(\varpi, \phi, z)$. Here $\varpi$ (pronounced “pomega”) is distance from the $z$ axis, and $\phi$ is angle around the $z$ axis, so
$$\varpi = \sqrt{x^2+y^2} , \qquad \phi = \arctan(y/x) . \tag{10.35a}$$
The unit basis vectors that point along the coordinate axes are denoted $\mathbf e_\varpi$, $\mathbf e_\phi$, $\mathbf e_z$. They are related to the Cartesian basis vectors by
$$\mathbf e_\varpi = (x/\varpi)\mathbf e_x + (y/\varpi)\mathbf e_y , \qquad \mathbf e_\phi = -(y/\varpi)\mathbf e_x + (x/\varpi)\mathbf e_y , \qquad \mathbf e_z = \text{Cartesian } \mathbf e_z . \tag{10.35b}$$

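Our notation for spherical coordinates is $(r, \theta, \phi)$, with (as should be very familiar)
$$r = \sqrt{x^2+y^2+z^2} , \qquad \theta = \arccos(z/r) , \qquad \phi = \arctan(y/x) . \tag{10.36a}$$
The unit basis vectors associated with these coordinates are
$$\mathbf e_r = \frac xr\mathbf e_x + \frac yr\mathbf e_y + \frac zr\mathbf e_z , \qquad \mathbf e_\theta = \frac zr\mathbf e_\varpi - \frac\varpi r\mathbf e_z , \qquad \mathbf e_\phi = -\frac y\varpi\mathbf e_x + \frac x\varpi\mathbf e_y . \tag{10.36b}$$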

Because our bases are orthonormal, the components of the metric of 3-dimensional space retain the Kronecker-delta values
$$g_{jk} \equiv \mathbf e_j\cdot\mathbf e_k = \delta_{jk} , \tag{10.37}$$

which permits us to keep all vector and tensor indices down. By contrast, in spacetime we must distinguish between up and down indices; cf. Sec. 1.5.⁴ In Jackson (1999), Messiah (1962) and other standard texts, formulas are written down for the gradient and Laplacian of a scalar field, and for the divergence and curl of a vector field, in cylindrical and spherical coordinates. One uses these formulas over and over again.

⁴ Occasionally it is convenient to put some indices up, e.g. in the useful equation $\epsilon_{ijm}\epsilon_{klm} = \delta^{ij}_{kl} \equiv \delta^i_k\delta^j_l - \delta^i_l\delta^j_k$ [Eq. (1.61)]. But because $g_{jk} = \delta_{jk}$, any component with an index up is equal to that same component with an index down; e.g., $\delta^i_k \equiv \delta_{ik}$.

In elasticity theory we deal largely with second rank tensors, and will need formulae for their various derivatives in cylindrical and spherical coordinates. In this book we introduce a mathematical tool, the connection coefficients $\Gamma_{ijk}$, by which those formulae can be derived when needed. The connection coefficients quantify the turning of the orthonormal basis vectors as one moves from point to point in Euclidean 3-space. That is, they tell us how the basis vectors at one point in space are connected to (related to) those at another point. More specifically, we define $\Gamma_{ijk}$ by the two equivalent relations
$$\nabla_k\mathbf e_j = \Gamma_{ijk}\mathbf e_i ; \qquad \Gamma_{ijk} = \mathbf e_i\cdot(\nabla_k\mathbf e_j) ; \tag{10.38}$$
here $\nabla_k \equiv \nabla_{\mathbf e_k}$ is the directional derivative along the orthonormal basis vector $\mathbf e_k$; cf. Eq. (1.54a). Notice that the differentiation index comes last on $\Gamma$, as is true quite generally (cf. Sec. 1.9).

Because our basis is orthonormal, it must be that $\nabla_k(\mathbf e_i\cdot\mathbf e_j) = 0$. Expanding this out using the standard rule for differentiating products, we obtain $\mathbf e_j\cdot(\nabla_k\mathbf e_i) + \mathbf e_i\cdot(\nabla_k\mathbf e_j) = 0$. Then, invoking the definition (10.38) of the connection coefficients, we see that $\Gamma_{ijk}$ is antisymmetric on its first two indices:
$$\Gamma_{ijk} = -\Gamma_{jik} . \tag{10.39}$$

In Part VI, when we use non-orthonormal bases, this antisymmetry will break down.

It is straightforward to compute the connection coefficients for cylindrical and spherical coordinates from three ingredients:

- the definition (10.38);
- expressions (10.35b) and (10.36b) for the cylindrical and spherical basis vectors in terms of the Cartesian basis vectors;
- the fact that the connection coefficients vanish in Cartesian coordinates, since $\mathbf e_x$, $\mathbf e_y$ and $\mathbf e_z$ do not rotate as one moves through Euclidean 3-space.

One can also deduce the cylindrical and spherical connection coefficients by drawing pictures of the basis vectors and observing how they change from point to point. For cylindrical coordinates, we see from Fig. 10.7 that $\nabla_\phi\mathbf e_\varpi = \mathbf e_\phi/\varpi$. A similar pictorial calculation (which the reader is encouraged to do) reveals that $\nabla_\phi\mathbf e_\phi = -\mathbf e_\varpi/\varpi$. All other derivatives vanish. Therefore, the only nonzero connection coefficients in cylindrical coordinates are
$$\Gamma_{\varpi\phi\phi} = -\frac1\varpi , \qquad \Gamma_{\phi\varpi\phi} = \frac1\varpi , \tag{10.40}$$
which have the required antisymmetry [Eq. (10.39)]. Likewise, for spherical coordinates (Ex. 10.9)
$$\Gamma_{\theta r\theta} = \Gamma_{\phi r\phi} = -\Gamma_{r\theta\theta} = -\Gamma_{r\phi\phi} = \frac1r , \qquad \Gamma_{\phi\theta\phi} = -\Gamma_{\theta\phi\phi} = \frac{\cot\theta}{r} . \tag{10.41}$$

The connection coefficients are the keys to differentiating vectors and tensors. Consider the strain tensor $\mathbf S = \nabla\boldsymbol\xi$. Applying the product rule for differentiation, we obtain
$$\nabla_k(\xi_j\mathbf e_j) = (\nabla_k\xi_j)\mathbf e_j + \xi_j(\nabla_k\mathbf e_j) = \xi_{j,k}\mathbf e_j + \xi_j\Gamma_{ljk}\mathbf e_l . \tag{10.42}$$


Fig. 10.7: Pictorial evaluation of $\Gamma_{\phi\varpi\phi}$. In the right-most assemblage of vectors we compute $\nabla_\phi\mathbf e_\varpi$ as follows. We draw the vector to be differentiated, $\mathbf e_\varpi$, at the tail of $\mathbf e_\phi$ (the vector along which we differentiate) and also at its head. We then subtract $\mathbf e_\varpi$ at the tail from that at the head; this difference is $\nabla_\phi\mathbf e_\varpi$, and it obviously points in the $\mathbf e_\phi$ direction. When we perform the same calculation at a radius $\varpi$ that is smaller by a factor 2 (left assemblage of vectors), we obtain a result, $\nabla_\phi\mathbf e_\varpi$, that is twice as large. Therefore the length of this vector must scale as $1/\varpi$. By looking quantitatively at the length at some chosen radius $\varpi$, one can see that the multiplicative coefficient is unity: $\nabla_\phi\mathbf e_\varpi = \mathbf e_\phi/\varpi$. Comparing with Eq. (10.38), we deduce that $\Gamma_{\phi\varpi\phi} = 1/\varpi$.
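As a cross-check on Fig. 10.7 and on Eqs. (10.40) and (10.41), the connection coefficients can also be evaluated straight from definition (10.38) by finite differences. The Python sketch below is our own illustration, and its helper names are hypothetical. It writes the basis vectors (10.35b) and (10.36b) in Cartesian components, differentiates them numerically along each $\mathbf e_k$, and projects the result onto $\mathbf e_i$. The sample point and step size are arbitrary choices.

```python
import math


def cylindrical_basis(x, y, z):
    """Orthonormal basis (10.35b) in Cartesian components."""
    w = math.hypot(x, y)
    return {"w": (x / w, y / w, 0.0), "phi": (-y / w, x / w, 0.0), "z": (0.0, 0.0, 1.0)}


def spherical_basis(x, y, z):
    """Orthonormal basis (10.36b) in Cartesian components."""
    w, r = math.hypot(x, y), math.sqrt(x * x + y * y + z * z)
    return {"r": (x / r, y / r, z / r),
            "theta": (z * x / (r * w), z * y / (r * w), -w / r),
            "phi": (-y / w, x / w, 0.0)}


def connection(basis, point, h=1e-6):
    """Gamma[(i, j, k)] = e_i . (nabla_{e_k} e_j), Eq. (10.38), by central differences."""
    e = basis(*point)
    gamma = {}
    for k, ek in e.items():
        plus = basis(*(p + h * d for p, d in zip(point, ek)))
        minus = basis(*(p - h * d for p, d in zip(point, ek)))
        for j in e:
            dej = [(a - b) / (2 * h) for a, b in zip(plus[j], minus[j])]
            for i, ei in e.items():
                gamma[(i, j, k)] = sum(u * v for u, v in zip(ei, dej))
    return gamma


point = (0.3, 0.4, 0.5)   # varpi = 0.5, r = sqrt(0.5), theta = 45 degrees
for (i, j, k), value in connection(cylindrical_basis, point).items():
    if abs(value) > 1e-6:
        print("cyl", i, j, k, round(value, 6))   # expect -1/varpi and +1/varpi
```

Only $\Gamma_{\varpi\phi\phi}$ and $\Gamma_{\phi\varpi\phi}$ survive in the cylindrical case. The spherical basis yields the six nonzero values of Eq. (10.41), and every computed coefficient is antisymmetric in its first two indices, as Eq. (10.39) requires.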

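In Eq. (10.42) the comma denotes the directional derivative, along a basis vector, of the components treated as scalar fields. For example, in cylindrical coordinates we have
$$\xi_{i,\varpi} = \frac{\partial\xi_i}{\partial\varpi} , \qquad \xi_{i,\phi} = \frac1\varpi\frac{\partial\xi_i}{\partial\phi} , \qquad \xi_{i,z} = \frac{\partial\xi_i}{\partial z} ; \tag{10.43}$$
and in spherical coordinates we have
$$\xi_{i,r} = \frac{\partial\xi_i}{\partial r} , \qquad \xi_{i,\theta} = \frac1r\frac{\partial\xi_i}{\partial\theta} , \qquad \xi_{i,\phi} = \frac1{r\sin\theta}\frac{\partial\xi_i}{\partial\phi} . \tag{10.44}$$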

Taking the $i$'th component of Eq. (10.42), we obtain
$$S_{ik} = \xi_{i;k} = \xi_{i,k} + \Gamma_{ijk}\xi_j . \tag{10.45}$$

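Here $\xi_{i;k}$ are the nine components of the gradient of the vector field $\boldsymbol\xi(\mathbf x)$, evaluated in any orthonormal basis. We can use Eq. (10.45) to evaluate the expansion $\Theta = {\rm Tr}\,\mathbf S = \nabla\cdot\boldsymbol\xi$. Using Eqs. (10.40) and (10.41), we obtain
$$\Theta = \nabla\cdot\boldsymbol\xi = \frac{\partial\xi_\varpi}{\partial\varpi} + \frac1\varpi\frac{\partial\xi_\phi}{\partial\phi} + \frac{\partial\xi_z}{\partial z} + \frac{\xi_\varpi}{\varpi} = \frac1\varpi\frac{\partial}{\partial\varpi}(\varpi\xi_\varpi) + \frac1\varpi\frac{\partial\xi_\phi}{\partial\phi} + \frac{\partial\xi_z}{\partial z} \tag{10.46}$$
in cylindrical coordinates, and
$$\Theta = \nabla\cdot\boldsymbol\xi = \frac{\partial\xi_r}{\partial r} + \frac1r\frac{\partial\xi_\theta}{\partial\theta} + \frac1{r\sin\theta}\frac{\partial\xi_\phi}{\partial\phi} + \frac{2\xi_r}{r} + \frac{\cot\theta\,\xi_\theta}{r} = \frac1{r^2}\frac{\partial}{\partial r}(r^2\xi_r) + \frac1{r\sin\theta}\frac{\partial}{\partial\theta}(\sin\theta\,\xi_\theta) + \frac1{r\sin\theta}\frac{\partial\xi_\phi}{\partial\phi} \tag{10.47}$$
in spherical coordinates. These agree with formulae in standard textbooks, such as the flyleaf of Jackson (1999).

The components of the rotation are most easily deduced using $R_{ij} = -\epsilon_{ijk}\phi_k$ with $\boldsymbol\phi = \tfrac12\nabla\times\boldsymbol\xi$, together with the standard expressions for the curl in cylindrical and spherical coordinates (e.g., Jackson 1999). Since the rotation does not enter into elasticity theory in a significant way, we shall refrain from writing down the results. The components of the shear are computed in Box 10.3.

By a computation analogous to Eq. (10.42), we can construct an expression for the gradient of a tensor of any rank. For a second rank tensor $\mathbf T = T_{ij}\mathbf e_i\otimes\mathbf e_j$ we obtain (Ex. 10.8)
$$T_{ij;k} = T_{ij,k} + \Gamma_{ilk}T_{lj} + \Gamma_{jlk}T_{il} . \tag{10.48}$$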

Equation (10.48) for the components of the gradient can be understood as follows. In cylindrical or spherical coordinates, the components $T_{ij}$ can change from point to point for two reasons: a change of the tensor $\mathbf T$, or the turning of the basis vectors. The two connection-coefficient terms in Eq. (10.48) remove the effects of the basis turning, leaving in $T_{ij;k}$ only the influence of the change of $\mathbf T$ itself. There are two correction terms because $\mathbf T$ has two slots (indices); the effects of basis turning on each slot are corrected one after another. If $\mathbf T$ had had $n$ slots, there would have been $n$ correction terms, each with the form of the two in Eq. (10.48).

These expressions for derivatives of tensors are not required to deal with the vector fields of introductory electromagnetic theory. They are, however, essential for manipulating the tensor fields encountered in elasticity. As we shall see in Sec. 23.3, one further generalization lets us differentiate tensors in any basis (orthonormal or non-orthonormal) in a curved spacetime, as is needed for calculations in general relativity. Evaluating the components of derivatives such as (10.48) in explicit form (e.g., in terms of $\{r, \theta, \phi\}$) can be long and tedious by hand. In the modern era of symbolic manipulation via computers (e.g. Maple or Mathematica), however, the algebra can be done quickly and accurately, yielding expressions such as Eq. (3) of Box 10.3.

****************************
EXERCISES

Exercise 10.8 Derivation and Practice: Gradient of a Second Rank Tensor
By a computation analogous to Eq. (10.42), derive Eq. (10.48) for the components of the gradient of a second rank tensor in any orthonormal basis.

Exercise 10.9 Derivation and Practice: Connection in Spherical Coordinates
(a) By drawing pictures analogous to Fig. 10.7, show that
$$\nabla_\phi\mathbf e_r = \frac1r\mathbf e_\phi , \qquad \nabla_\theta\mathbf e_r = \frac1r\mathbf e_\theta , \qquad \nabla_\phi\mathbf e_\theta = \frac{\cot\theta}{r}\mathbf e_\phi . \tag{10.49}$$

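Box 10.3 Shear Tensor in Spherical and Cylindrical Coordinates

Using our rules for forming the gradient of a vector, we can derive a general expression for the shear tensor:
$$\Sigma_{ij} = \frac12(\xi_{i;j} + \xi_{j;i}) - \frac13\delta_{ij}\xi_{k;k} = \frac12(\xi_{i,j} + \xi_{j,i} + \Gamma_{ilj}\xi_l + \Gamma_{jli}\xi_l) - \frac13\delta_{ij}(\xi_{k,k} + \Gamma_{klk}\xi_l) . \tag{1}$$
Evaluating this in cylindrical coordinates using the connection coefficients (10.40), we obtain
$$\begin{aligned}
\Sigma_{\varpi\varpi} &= \frac23\frac{\partial\xi_\varpi}{\partial\varpi} - \frac{\xi_\varpi}{3\varpi} - \frac1{3\varpi}\frac{\partial\xi_\phi}{\partial\phi} - \frac13\frac{\partial\xi_z}{\partial z} ,\\
\Sigma_{\phi\phi} &= \frac2{3\varpi}\frac{\partial\xi_\phi}{\partial\phi} + \frac{2\xi_\varpi}{3\varpi} - \frac13\frac{\partial\xi_\varpi}{\partial\varpi} - \frac13\frac{\partial\xi_z}{\partial z} ,\\
\Sigma_{zz} &= \frac23\frac{\partial\xi_z}{\partial z} - \frac13\frac{\partial\xi_\varpi}{\partial\varpi} - \frac{\xi_\varpi}{3\varpi} - \frac1{3\varpi}\frac{\partial\xi_\phi}{\partial\phi} ,\\
\Sigma_{\phi z} &= \Sigma_{z\phi} = \frac1{2\varpi}\frac{\partial\xi_z}{\partial\phi} + \frac12\frac{\partial\xi_\phi}{\partial z} ,\\
\Sigma_{z\varpi} &= \Sigma_{\varpi z} = \frac12\frac{\partial\xi_\varpi}{\partial z} + \frac12\frac{\partial\xi_z}{\partial\varpi} ,\\
\Sigma_{\varpi\phi} &= \Sigma_{\phi\varpi} = \frac12\frac{\partial\xi_\phi}{\partial\varpi} - \frac{\xi_\phi}{2\varpi} + \frac1{2\varpi}\frac{\partial\xi_\varpi}{\partial\phi} .
\end{aligned} \tag{2}$$
Likewise, in spherical coordinates using the connection coefficients (10.41), we obtain
$$\begin{aligned}
\Sigma_{rr} &= \frac23\frac{\partial\xi_r}{\partial r} - \frac2{3r}\xi_r - \frac{\cot\theta}{3r}\xi_\theta - \frac1{3r}\frac{\partial\xi_\theta}{\partial\theta} - \frac1{3r\sin\theta}\frac{\partial\xi_\phi}{\partial\phi} ,\\
\Sigma_{\theta\theta} &= \frac2{3r}\frac{\partial\xi_\theta}{\partial\theta} + \frac{\xi_r}{3r} - \frac13\frac{\partial\xi_r}{\partial r} - \frac{\cot\theta\,\xi_\theta}{3r} - \frac1{3r\sin\theta}\frac{\partial\xi_\phi}{\partial\phi} ,\\
\Sigma_{\phi\phi} &= \frac2{3r\sin\theta}\frac{\partial\xi_\phi}{\partial\phi} + \frac{2\cot\theta\,\xi_\theta}{3r} + \frac{\xi_r}{3r} - \frac13\frac{\partial\xi_r}{\partial r} - \frac1{3r}\frac{\partial\xi_\theta}{\partial\theta} ,\\
\Sigma_{\theta\phi} &= \Sigma_{\phi\theta} = \frac1{2r}\frac{\partial\xi_\phi}{\partial\theta} - \frac{\cot\theta\,\xi_\phi}{2r} + \frac1{2r\sin\theta}\frac{\partial\xi_\theta}{\partial\phi} ,\\
\Sigma_{\phi r} &= \Sigma_{r\phi} = \frac1{2r\sin\theta}\frac{\partial\xi_r}{\partial\phi} + \frac12\frac{\partial\xi_\phi}{\partial r} - \frac{\xi_\phi}{2r} ,\\
\Sigma_{r\theta} &= \Sigma_{\theta r} = \frac12\frac{\partial\xi_\theta}{\partial r} - \frac{\xi_\theta}{2r} + \frac1{2r}\frac{\partial\xi_r}{\partial\theta} .
\end{aligned} \tag{3}$$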


Fig. 10.8: Pipe with inner radius $\varpi_1$ and outer radius $\varpi_2$.

(b) From these relations deduce the connection coefficients (10.41).

Exercise 10.10 Derivation and Practice: Expansion in Cylindrical and Spherical Coordinates
Derive Eqs. (10.46) and (10.47) for the divergence of the vector field $\boldsymbol\xi$ in cylindrical and spherical coordinates, using the connection coefficients (10.40) and (10.41).

****************************
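Equations (10.46) and (10.47) can also be spot-checked numerically without doing Exercise 10.10. The Python sketch below is our own; its test fields are arbitrary and its helper names hypothetical. It specifies a displacement field by its cylindrical or spherical components and converts it to Cartesian components with the basis vectors (10.35b) and (10.36b). It then takes the Cartesian divergence by central differences and compares with the curvilinear formulas, evaluated by hand for that field.

```python
import math


def cyl_field(x, y, z):
    """Test field xi_w = w^2 z, xi_phi = w sin(phi), xi_z = z^2, in Cartesian components."""
    w, phi = math.hypot(x, y), math.atan2(y, x)
    xi_w, xi_phi, xi_z = w * w * z, w * math.sin(phi), z * z
    c, s = math.cos(phi), math.sin(phi)
    return (xi_w * c - xi_phi * s, xi_w * s + xi_phi * c, xi_z)


def sph_field(x, y, z):
    """Test field xi_r = r^2, xi_theta = r sin(theta), xi_phi = r cos(theta)."""
    r = math.sqrt(x * x + y * y + z * z)
    th, phi = math.acos(z / r), math.atan2(y, x)
    xi_r, xi_t, xi_p = r * r, r * math.sin(th), r * math.cos(th)
    st, ct, sp, cp = math.sin(th), math.cos(th), math.sin(phi), math.cos(phi)
    return (xi_r * st * cp + xi_t * ct * cp - xi_p * sp,
            xi_r * st * sp + xi_t * ct * sp + xi_p * cp,
            xi_r * ct - xi_t * st)


def cartesian_divergence(field, point, h=1e-5):
    """sum_i d(xi_i)/dx_i by central differences."""
    total = 0.0
    for i in range(3):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        total += (field(*up)[i] - field(*down)[i]) / (2 * h)
    return total


x, y, z = 0.3, 0.4, 0.5
w, phi = math.hypot(x, y), math.atan2(y, x)
r = math.sqrt(x * x + y * y + z * z)
polar = math.acos(z / r)
# Eq. (10.46) for the cylindrical test field: (1/w) d(w^3 z)/dw + cos(phi) + 2z
expansion_cyl = 3 * w * z + math.cos(phi) + 2 * z
# Eq. (10.47) for the spherical test field: (1/r^2) d(r^4)/dr + 2 cos(theta) + 0
expansion_sph = 4 * r + 2 * math.cos(polar)
print(cartesian_divergence(cyl_field, (x, y, z)), expansion_cyl)
print(cartesian_divergence(sph_field, (x, y, z)), expansion_sph)
```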

10.6

T2 Solving the 3-Dimensional Elastostatic Equation in Cylindrical Coordinates: Simple Methods, Separation of Variables and Green’s Functions

10.6.1

Simple Methods: Pipe Fracture and Torsion Pendulum

As an example of an elastostatic problem with cylindrical symmetry, consider a cylindrical pipe that carries a high-pressure fluid (water, oil, natural gas, ...); see Fig. 10.8. How thick must the pipe's wall be to ensure that it will not burst due to the fluid's pressure? We shall sketch the solution, leaving the details to the reader in Ex. 10.11.

We suppose, for simplicity, that the pipe's length is held fixed by its support system, so it does not lengthen or shorten when the fluid pressure is changed. Then, by symmetry, the displacement field in the pipe wall is purely radial and depends only on radius; i.e., its only nonzero component is $\xi_\varpi(\varpi)$. The radial dependence is governed by radial force balance,
$$f_\varpi = K\Theta_{;\varpi} + 2\mu\Sigma_{\varpi j;j} = 0 \tag{10.50}$$
[Eq. (10.19)]. The expansion and the components of shear that appear in this force-balance equation can be read off the cylindrical-coordinate Eq. (10.46) and Eq. (2) of Box 10.3. Most importantly,
$$\Theta = \frac{\partial\xi_\varpi}{\partial\varpi} + \frac{\xi_\varpi}{\varpi} . \tag{10.51}$$

The second term in the radial force-balance equation (10.50) is proportional to $\Sigma_{\varpi j;j}$. To evaluate it we use Eq. (10.48), note that the only nonzero connection coefficients are $\Gamma_{\varpi\phi\phi} = -\Gamma_{\phi\varpi\phi} = -1/\varpi$ [Eq. (10.40)], and note that symmetry requires the shear tensor to be diagonal. It then becomes
$$\Sigma_{\varpi j;j} = \Sigma_{\varpi\varpi,\varpi} + \Gamma_{\varpi\phi\phi}\Sigma_{\phi\phi} + \Gamma_{\phi\varpi\phi}\Sigma_{\varpi\varpi} . \tag{10.52}$$
Insert the components of the shear tensor from Eq. (2) of Box 10.3 and the values of the connection coefficients, and compare the result with expression (10.51) for the expansion. We obtain the remarkable result $\Sigma_{\varpi j;j} = \tfrac23\partial\Theta/\partial\varpi$. Inserting this into the radial force-balance equation (10.50), we obtain
$$f_\varpi = \left(K + \frac{4\mu}{3}\right)\frac{\partial\Theta}{\partial\varpi} = 0 . \tag{10.53}$$
Thus, inside the pipe wall, the expansion $\Theta$ is independent of radius. Correspondingly [cf. Eq. (10.51)], the radial displacement must have the form
$$\xi_\varpi = A\varpi + \frac B\varpi \tag{10.54}$$

for some constants $A$ and $B$; the uniform expansion is then $\Theta = 2A$. The values of these constants are fixed by the boundary conditions at the inner and outer faces of the pipe wall: $T_{\varpi\varpi} = P$ at $\varpi = \varpi_1$ (inner wall) and $T_{\varpi\varpi} = 0$ at $\varpi = \varpi_2$ (outer wall). Here $P$ is the pressure of the fluid that the pipe carries, and we have neglected the atmosphere's pressure on the outer face by comparison. We evaluate $T_{\varpi\varpi} = -K\Theta - 2\mu\Sigma_{\varpi\varpi}$ in terms of $\xi_\varpi$, insert (10.54), and then impose these boundary conditions. The result is
$$A = \frac{P}{2K + 2\mu/3}\,\frac{\varpi_1^2}{\varpi_2^2 - \varpi_1^2} , \qquad B = \frac{P}{2\mu}\,\frac{\varpi_1^2\varpi_2^2}{\varpi_2^2 - \varpi_1^2} . \tag{10.55}$$

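The radial component of the strain then works out to be
$$S_{\varpi\varpi} = \frac{\partial\xi_\varpi}{\partial\varpi} = -\frac{P}{\mu}\,\frac{\varpi_1^2}{\varpi_2^2 - \varpi_1^2}\left(\frac{\varpi_2^2}{2\varpi^2} - \frac{3\mu}{6K + 2\mu}\right) . \tag{10.56}$$
Its magnitude is maximal at the inner wall of the pipe. Express it there in terms of the ratio $\lambda = \varpi_2/\varpi_1$ of the outer to the inner pipe radius, and use the values $K \simeq 170$ GPa and $\mu = 81$ GPa for steel (Table 10.1), so that $3\mu/(6K+2\mu) \simeq 1/5$. This brings the maximum strain into the form
$$|S_{\varpi\varpi}| = \frac{P}{\mu}\,\frac{5\lambda^2 - 2}{10(\lambda^2 - 1)} . \tag{10.57}$$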

The pipe will break at a strain $\sim 10^{-3}$. For safety, it is best to keep the actual strain smaller than this by an order of magnitude, $|S_{\varpi\varpi}| \lesssim 10^{-4}$. A typical pressure for an oil pipeline is $P \simeq 10$ atmospheres $\simeq 10^6$ Pa. Compared to the shear modulus of steel, $\mu = 81$ GPa, this gives $P/\mu \simeq 1.2\times10^{-5}$. Inserting this into Eq. (10.57) with $|S_{\varpi\varpi}| \lesssim 10^{-4}$, we deduce that the ratio of the pipe's outer radius to its inner radius must be $\lambda = \varpi_2/\varpi_1 \gtrsim 1.02$. If the pipe has a diameter of one meter, then its wall thickness should be at least one centimeter. This is typical of the pipes in oil pipelines.

Exercise 10.12 presents a second fairly simple example of elastostatics in cylindrical coordinates: a computation of the period of a torsion pendulum.
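The numbers in this estimate are easy to reproduce. The Python sketch below has hypothetical helper names and takes its inputs from the text: $P = 10^6$ Pa, $K = 170$ GPa, $\mu = 81$ GPa, and the strain limit $10^{-4}$. It does three things:

- computes $A$ and $B$ from Eqs. (10.54) and (10.55);
- checks the two boundary conditions on $T_{\varpi\varpi}$;
- solves $|S_{\varpi\varpi}(\varpi_1)| = 10^{-4}$ for $\lambda$ by bisection, using the exact coefficient $3\mu/(6K+2\mu)$ rather than the rounded 1/5 of Eq. (10.57).

The 0.5 m inner radius corresponds to the one-meter pipe.

```python
def pipe_constants(P, K, mu, w1, w2):
    """A and B of the radial displacement xi = A w + B / w, Eq. (10.55)."""
    d = w2**2 - w1**2
    return P * w1**2 / ((2 * K + 2 * mu / 3) * d), P * w1**2 * w2**2 / (2 * mu * d)


def radial_stress(A, B, K, mu, w):
    """T_ww = -K Theta - 2 mu Sigma_ww with Theta = 2A and Sigma_ww = A/3 - B/w^2."""
    return -2 * K * A - 2 * mu * (A / 3 - B / w**2)


def inner_wall_strain(P, K, mu, lam):
    """|S_ww| = |A - B / w1^2| at the inner wall, taking w1 = 1 and w2 = lam."""
    A, B = pipe_constants(P, K, mu, 1.0, lam)
    return abs(A - B)


def min_radius_ratio(P, K, mu, s_max, lo=1.0 + 1e-9, hi=10.0):
    """Smallest lam with |S_ww(w1)| <= s_max; the strain decreases as lam grows."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if inner_wall_strain(P, K, mu, mid) > s_max:
            lo = mid
        else:
            hi = mid
    return hi


P, K, mu = 1e6, 170e9, 81e9
lam = min_radius_ratio(P, K, mu, s_max=1e-4)
wall = 0.5 * (lam - 1)   # wall thickness in metres for a 0.5 m inner radius
print(f"lambda >= {lam:.4f}, wall thickness >= {100 * wall:.2f} cm")
```

The bisection lands close to $\lambda \approx 1.02$ and a wall just under one centimeter, in line with the estimate above. The strain depends only on $\lambda$, not on the absolute radius, which is why the helper can set $\varpi_1 = 1$.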

10.6.2

Separation of Variables and Green’s Functions: Thermoelastic Noise in a LIGO Mirror

In more complicated situations that have moderate amounts of symmetry, the elastostatic equations can be solved by the same kinds of sophisticated mathematical techniques as one uses in electrostatics: separation of variables, Green's functions, complex potentials, or integral transform methods; see, e.g., Gladwell (1980). We provide an example in this section, focusing on separation of variables and Green's functions.

Our example is chosen to make contact with things we have already learned in optics (gravitational-wave interferometers; Sec. 8.5) and in statistical physics (the fluctuation-dissipation theorem; Sec. 5.6). This application is the computation of thermoelastic noise in second-generation gravitational-wave detectors such as Advanced LIGO. Our analysis is based on Braginsky, Gorodetsky and Vyatchanin (1999); see also Liu and Thorne (2000).

Thermoelastic Context for the Elastostatic Problem to be Solved

We discussed laser interferometer gravitational-wave detectors in Sec. 8.5 (see especially Fig. 8.11). Recall that in such a detector, a gravitational wave moves four test-mass mirrors relative to each other, and laser interferometry is used to monitor the resulting oscillatory changes in the mirror separations. As we discussed in Sec. 5.6.1, the separations actually measured are the differences in the mirrors' generalized coordinates $q$. Each $q$ is the longitudinal position $\xi_z$ of the test mass's mirrored front face, weighted by the laser beam's Gaussian-shaped intensity distribution and averaged over the mirror's face:
$$q = \int\frac{e^{-\varpi^2/\varpi_o^2}}{\pi\varpi_o^2}\,\xi_z(\varpi,\phi)\,\varpi\,d\phi\,d\varpi \tag{10.58}$$
[Eq. (5.61) with a change of notation]. Here $(\varpi, \phi, z)$ are cylindrical coordinates with the axis $\varpi = 0$ along the center of the laser beam. The radius at which the light's intensity has dropped by a factor $1/e$ is $\varpi_o \sim 4$ cm, and $\xi_z(\varpi,\phi)$ is the longitudinal displacement of the mirror face.

The gravitational-wave signal is the difference of mirror positions divided by the interferometer arm length $L = 4$ km: $h(t) = \{[q_1(t) - q_2(t)] - [q_3(t) - q_4(t)]\}/L$, where the subscripts label the four mirrors. The thermoelastic noise is uncorrelated between the four test masses. Because the test masses and the beam spots on them are identical, the noise is also the same in all four test masses, so the spectral densities of their noises add incoherently, giving
$$S_h(f) = \frac{4S_q(f)}{L^2} . \tag{10.59}$$
Here $S_q(f)$ is the spectral density of the fluctuations of the generalized coordinate $q$ of any one of the test masses.
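The Gaussian weighting in Eq. (10.58) has a simple normalization. A uniform displacement gives $q = \xi_z$, a tilt $\propto\varpi\cos\phi$ averages to zero, and $\xi_z = \varpi^2$ gives $q = \varpi_o^2$. The Python sketch below (our own, with hypothetical names) evaluates the integral by quadrature for such test profiles and also encodes the incoherent sum (10.59). The 4 cm beam radius and 4 km arm length are the values quoted above.

```python
import math


def beam_weighted_position(xi_z, w0, n_r=2000, n_phi=64, r_max_factor=8.0):
    """q of Eq. (10.58): Simpson's rule in varpi, trapezoid rule in phi."""
    r_max = r_max_factor * w0
    h = r_max / n_r
    dphi = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_r + 1):
        r = i * h
        simpson = 1 if i in (0, n_r) else (4 if i % 2 else 2)
        ring = sum(xi_z(r, j * dphi) for j in range(n_phi)) * dphi
        total += simpson * math.exp(-(r / w0) ** 2) / (math.pi * w0**2) * ring * r
    return total * h / 3


def strain_noise_psd(S_q, L):
    """Eq. (10.59): four identical, uncorrelated test masses."""
    return 4 * S_q / L**2


w0 = 0.04                                                      # beam radius varpi_o, metres
print(beam_weighted_position(lambda r, p: 1.0, w0))            # uniform: about 1
print(beam_weighted_position(lambda r, p: r * r, w0) / w0**2)  # about 1
```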

The thermoelastic noise is a variant of thermal noise. Fluctuations of the thermal energy distribution inside a test mass randomly cause a slight increase (or decrease) in the test-mass temperature near the laser beam spot. They thereby cause a corresponding thermal expansion (or contraction) of the test-mass material near the beam spot. This random expansion (or contraction) entails a displacement $\xi_z$ of the test-mass surface and a corresponding random change of the generalized coordinate $q$ [Eq. (10.58)].

In Ex. 10.14 we use the fluctuation-dissipation theorem (Sec. 5.6) to derive a prescription for computing the spectral density $S_q(f)$ of these random thermoelastic fluctuations of $q$. The prescription is expressed in terms of a thought experiment. Imagine applying a static, normal ($z$-directed) force $F_o$ to the face of the test mass at the location of the beam spot, with the force distributed spatially with the same Gaussian profile as $q$, so the applied stress is
$$T^{\rm applied}_{zz} = \frac{e^{-\varpi^2/\varpi_o^2}}{\pi\varpi_o^2}F_o . \tag{10.60}$$
This applied stress induces a strain distribution $\mathbf S$ inside the test mass, and that strain includes an expansion $\Theta(\varpi,\phi,z)$. The analysis in Ex. 10.14 shows that the spectral density of thermoelastic noise can be expressed in terms of an integral over the squared gradient of this expansion:
$$S_q(f) = \frac{2\kappa_{\rm th}E^2\alpha^2kT^2}{(1-2\nu)^2C_V^2\rho^2F_o^2(2\pi f)^2}\int(\nabla\Theta)^2\,\varpi\,d\phi\,d\varpi\,dz . \tag{10.61}$$
Here the symbols are:

- $\kappa_{\rm th}$, the coefficient of thermal conductivity (Sec. 2.7);
- $E$, Young's modulus, and $\nu$, the Poisson ratio;
- $\alpha$, the coefficient of linear thermal expansion (the fractional change of length induced by a unit change of temperature);
- $T$, the temperature;
- $C_V$, the specific heat per unit mass at constant volume, and $\rho$, the density;
- $f$, the frequency at which the noise is being evaluated.
The computation of the thermoelastic noise, thus, boils down to computing the distribution Θ(̟, φ, z, t) of expansion induced by the applied stress (10.60), and then evaluating the integral in Eq. (10.61). The computation is made easier by the fact that Θ and ∇Θ are concentrated in a region of size ∼ ̟o ∼ 4 cm, which is small compared to the test-mass radius and length (∼ 16 cm), so in our computation we can idealize the test mass as having infinite radius and length (i.e., as being an “infinite half space” or “half-infinite body”).5 Equations of Elasticity in Cylindrical Coordinates, and their Solution Because the applied stress is cylindrical, the induced strain and expansion will also be cylindrical, and are thus computed most easily using cylindrical coordinates. One way to compute the expansion Θ is to solve the zero-internal-force equation f = (K + 31 µ)∇(∇ · ξ) + µ∇2 ξ = 0 for the cylindrical components ξ̟ (z, ̟) and ξz (z, ̟) of the displacement (a problem in elastostatics), and then evaluate the divergence Θ = ∇ · ξ. (The component ξφ vanishes by symmetry.) It is straightforward, using the techniques of Sec. 10.5, to compute the cylindrical components of f. Reexpressing the bulk and shear moduli 5

Finiteness of the test mass turns out to increase Sq (f ) by about 20 per cent (Liu and Thorne, 2000).

31 K and µ in terms of Young’s modulus E and Poisson’s ratio ν [Eq. (10.34)] and setting the internal forces to zero, we obtain 2 E ∂ ξ̟ 1 ∂ξ̟ ξ̟ f̟ = 2(1 − ν) + − 2 2(1 + ν)(1 − 2ν) ∂̟ 2 ̟ ∂̟ ̟ 2 ∂ 2 ξz ∂ ξ̟ =0, (10.62a) + (1 − 2ν) 2 + ∂z ∂z∂̟ 2 ∂ ξz 1 ∂ξz E (1 − 2ν) + fz = 2(1 + ν)(1 − 2ν) ∂̟ 2 ̟ ∂̟ ∂ 2 ξ̟ 1 ∂ξ̟ ∂ 2 ξz =0. (10.62b) + + 2(1 − ν) 2 + ∂z ∂z∂̟ ̟ ∂z These are two coupled, linear, second-order differential equations for the two unknown components of the displacement vector. As with the analogous equations of electrostatics and magnetostatics, these can be solved by separation of variables, i.e. by setting ξ̟ = R(̟)Z(z) and inserting into Eq. (10.62a). We seek solutions that die out at large ̟ and z. The general variables-separated solutions of this sort are Z ∞ ξ̟ = [α(k) − (2 − 2ν − kz)β(k)] e−kz J1 (k̟)kdk , Z0 ∞ [α(k) + (1 − 2ν + kz)β(k)] e−kz J0 (k̟)dk , (10.63) ξz = 0

where $J_0$ and $J_1$ are Bessel functions of order 0 and 1.

Boundary Conditions

The functions $\alpha(k)$ and $\beta(k)$ are determined by boundary conditions on the face of the test mass: the force per unit area exerted across the face by the strained test-mass material, $T_{zj}$ at $z=0$ with $j = \{\varpi,\phi,z\}$, must be balanced by the applied force per unit area, $T_{zj}^{\rm applied}$ [Eq. (10.60)]. The (shear) forces in the $\phi$ direction, $T_{z\phi}$ and $T_{z\phi}^{\rm applied}$, vanish because of cylindrical symmetry and thus provide no useful boundary condition. The (shear) force in the $\varpi$ direction, which must vanish since $T_{z\varpi}^{\rm applied} = 0$, is given by [cf. Eq. (2) in Box 10.3]

$$ T_{z\varpi}(z=0) = -2\mu\Sigma_{z\varpi} = -\mu\left(\frac{\partial\xi_z}{\partial\varpi} + \frac{\partial\xi_\varpi}{\partial z}\right) = -\mu\int_0^\infty\left[\beta(k) - \alpha(k)\right]J_1(k\varpi)\,k\,dk = 0, \tag{10.64} $$

which implies that $\beta(k) = \alpha(k)$. The (normal) force in the $z$ direction, which must balance the applied normal force, is $T_{zz} = -K\Theta - 2\mu\Sigma_{zz}$; using Eq. (2) in Box 10.3 and Eqs. (10.63), this reduces to

$$ T_{zz}(z=0) = -2\mu\int_0^\infty \alpha(k)\,J_0(k\varpi)\,k\,dk = T_{zz}^{\rm applied} = \frac{e^{-\varpi^2/\varpi_o^2}}{\pi\varpi_o^2}\,F_o\cos(2\pi f t), \tag{10.65} $$

which can be inverted⁶ to give

$$ \alpha(k) = \beta(k) = -\frac{1}{4\pi\mu}\,e^{-k^2\varpi_o^2/4}\,F_o\cos(2\pi f t). \tag{10.66} $$

⁶ The inversion and the subsequent evaluation of the integral of $(\nabla\Theta)^2$ are aided by the following expressions for the Dirac delta function: $\delta(k-k') = k\int_0^\infty J_0(k\varpi)J_0(k'\varpi)\,\varpi\,d\varpi = k\int_0^\infty J_1(k\varpi)J_1(k'\varpi)\,\varpi\,d\varpi$.

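Inserting Eq. (10.66) into Eqs. (10.63) for the displacement, and then evaluating the expansion $\Theta = \nabla\cdot\boldsymbol\xi = \xi_{z,z} + \varpi^{-1}(\varpi\xi_\varpi)_{,\varpi}$, we obtain

$$ \Theta = -2(1-2\nu)\int_0^\infty \alpha(k)\,e^{-kz}J_0(k\varpi)\,k\,dk. \tag{10.67} $$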
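The claim that the modes (10.63) solve Eqs. (10.62), and the Gaussian Hankel-transform identity $\int_0^\infty e^{-k^2\varpi_o^2/4}J_0(k\varpi)\,k\,dk = (2/\varpi_o^2)\,e^{-\varpi^2/\varpi_o^2}$ that underlies the inversion of (10.65) into (10.66), can both be spot-checked numerically. The Python sketch below is one way to do so, not the method used in the text: it evaluates $J_0$ and $J_1$ from Bessel's integral representation, takes derivatives by central finite differences, and integrates by Simpson's rule. The function names and the mode parameters ($\alpha$, $\beta$, $k$, $\nu$ and the evaluation point) are illustrative choices of ours.

```python
import math


def bessel_j(n, x, npts=64):
    """J_n(x) from Bessel's integral (1/2pi) * int_0^{2pi} cos(n t - x sin t) dt.
    The trapezoid rule converges very fast because the integrand is smooth and periodic."""
    total = 0.0
    for j in range(npts):
        t = 2.0 * math.pi * j / npts
        total += math.cos(n * t - x * math.sin(t))
    return total / npts


def navier_residuals(alpha, beta, k, nu, w, z, h=1e-3):
    """Bracketed expressions of Eqs. (10.62a,b) for ONE k-mode of Eq. (10.63),
    using central finite differences with step h. Both should be ~0."""
    def xi_w(w_, z_):
        return (alpha - (2 - 2*nu - k*z_) * beta) * math.exp(-k*z_) * bessel_j(1, k*w_)

    def xi_z(w_, z_):
        return (alpha + (1 - 2*nu + k*z_) * beta) * math.exp(-k*z_) * bessel_j(0, k*w_)

    def d_w(f):
        return (f(w + h, z) - f(w - h, z)) / (2*h)

    def d_z(f):
        return (f(w, z + h) - f(w, z - h)) / (2*h)

    def d_ww(f):
        return (f(w + h, z) - 2*f(w, z) + f(w - h, z)) / h**2

    def d_zz(f):
        return (f(w, z + h) - 2*f(w, z) + f(w, z - h)) / h**2

    def d_wz(f):
        return (f(w + h, z + h) - f(w + h, z - h)
                - f(w - h, z + h) + f(w - h, z - h)) / (4*h**2)

    f_w = (2*(1 - nu) * (d_ww(xi_w) + d_w(xi_w)/w - xi_w(w, z)/w**2)
           + (1 - 2*nu) * d_zz(xi_w) + d_wz(xi_z))
    f_z = ((1 - 2*nu) * (d_ww(xi_z) + d_w(xi_z)/w)
           + 2*(1 - nu) * d_zz(xi_z) + d_wz(xi_w) + d_z(xi_w)/w)
    return f_w, f_z


def gaussian_hankel(w, w0, n=2000):
    """Simpson's rule for int_0^inf exp(-k^2 w0^2/4) J_0(k w) k dk,
    truncated at k = 12/w0, where the Gaussian factor is ~e^-36."""
    kmax = 12.0 / w0
    dk = kmax / n                     # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        k = i * dk
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * math.exp(-(k*w0)**2 / 4) * bessel_j(0, k*w) * k
    return total * dk / 3


# Illustrative mode parameters (arbitrary, not from the text)
fw, fz = navier_residuals(alpha=0.7, beta=-0.4, k=1.3, nu=0.29, w=0.8, z=0.5)
print(f"Eq. (10.62) residuals for one mode: {fw:.1e}, {fz:.1e}")

w0 = 0.04  # beam radius (m), as in the text
for w in (0.0, 0.02, 0.05):
    exact = 2 / w0**2 * math.exp(-w**2 / w0**2)
    print(f"w = {w:.2f} m: quadrature {gaussian_hankel(w, w0):.8g}, "
          f"closed form {exact:.8g}")
```

Both residuals should come out at the level of the finite-difference truncation error (well below $10^{-5}$ for these parameters), and the quadrature should match the closed form to many digits. A library routine such as `scipy.special.j0` could replace the hand-rolled Bessel function; we avoid it only to keep the sketch dependency-free.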

Side Remark: As in electrostatics and magnetostatics, so also in elasticity theory, one can solve an elastostatics problem using Green's functions instead of separation of variables. We explore this, for our applied Gaussian force, in Ex. 10.15 below. For greater detail on Green's functions in elastostatics and their applications, from an engineer's viewpoint, see Johnson (1985). For other commonly used solution techniques see, e.g., Gladwell (1980).

Noise Spectral Density

It is now straightforward to compute the gradient of this expansion, square it, and integrate it to get the spectral density $S_q(f)$ [Eq. (10.61)]. That result, when inserted into Eq. (10.59), gives for the gravitational-wave noise

$$ S_h(f) = \frac{32(1+\nu)^2\,\kappa_{\rm th}\,\alpha^2\,kT^2}{\sqrt{2\pi}\,C_V^2\,\rho^2\,\varpi_o^3\,(2\pi f)^2\,L^2}, \tag{10.68} $$

where $k$ here is Boltzmann's constant (not the wavenumber of Eq. (10.63)) and $L$ is the interferometer arm length.
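Equation (10.68) is easy to evaluate directly. The Python sketch below does so for the sapphire parameters quoted in the next paragraph; it assumes that the arm length enters as $1/L^2$ and that the $k$ in (10.68) is Boltzmann's constant. The function name and its defaults are ours.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant (J/K)


def thermoelastic_sqrt_fSh(f, nu=0.29, kappa=40.0, alpha=5.0e-6, c_v=790.0,
                           rho=4000.0, w0=0.040, L=4000.0, T=300.0):
    """sqrt(f * S_h(f)) from Eq. (10.68), SI units.
    Defaults: sapphire test masses, 4 km arms, 40 mm beam radius, 300 K."""
    omega = 2.0 * math.pi * f
    s_h = (32.0 * (1.0 + nu)**2 * kappa * alpha**2 * K_B * T**2
           / (math.sqrt(2.0 * math.pi) * c_v**2 * rho**2 * w0**3 * omega**2 * L**2))
    return math.sqrt(f * s_h)


for f in (10.0, 100.0, 1000.0):
    print(f"f = {f:6.0f} Hz: sqrt(f S_h) = {thermoelastic_sqrt_fSh(f):.2e}")
```

At $f = 100$ Hz this gives $\sqrt{fS_h}\approx 2.6\times10^{-23}$, and the $f^{-1/2}$ scaling follows because $S_h \propto f^{-2}$.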

A possible material for the test masses is sapphire, for which $\nu = 0.29$, $\kappa_{\rm th} = 40$ W m$^{-1}$ K$^{-1}$, $\alpha = 5.0\times10^{-6}$ K$^{-1}$, $C_V = 790$ J kg$^{-1}$ K$^{-1}$, $\rho = 4000$ kg m$^{-3}$. Inserting these into Eq. (10.68), along with the interferometer arm length $L = 4$ km, a laser-beam radius $\varpi_o = 40$ mm, and room temperature $T = 300$ K, we obtain the following result for the thermoelastic gravitational-wave noise in a bandwidth equal to frequency:

$$ \sqrt{f\,S_h(f)} = 2.6\times10^{-23}\left(\frac{100\ {\rm Hz}}{f}\right)^{1/2}. \tag{10.69} $$

We shall explore the consequences of this noise for gravitational-wave detection in Chap. 26.

****************************
EXERCISES

Exercise 10.11 Derivation and Practice: Fracture of a Pipe
Fill in the details of the text's analysis of the deformation of a pipe carrying a high-pressure fluid, and the wall thickness required to protect the pipe against fracture, Sec. 10.6.1.

Exercise 10.12 Practice: Torsion pendulum
A torsion pendulum is a very useful tool for performing the classical Eötvös experiment and for seeking evidence for hypothetical fifth (not to mention sixth) forces (see, e.g., Will 1993 and references therein, or Fig. 1.6 of Misner, Thorne and Wheeler 1973). It would be advantageous to design a torsion pendulum with a one-day period (Fig. 10.9). In this exercise we shall estimate whether this is possible. The pendulum consists of a thin cylindrical wire of length $\ell$ and radius $a$. At the bottom of the wire are suspended three masses $m$ at the corners of an equilateral triangle, each at a distance $b$ from the wire.

Fig. 10.9: Torsion Pendulum (a wire of diameter $2a$ supporting three masses $m$, each a distance $b$ from the wire).

(a) Show that the longitudinal strain is

$$ \xi_{z;z} = \frac{3mg}{\pi a^2 E}. \tag{10.70a} $$

(b) What component of shear is responsible for the restoring force in the wire, which causes the torsion pendulum to oscillate?

(c) Show that the pendulum undergoes torsional oscillations with period

$$ P = 2\pi\left(\frac{2b^2}{g}\right)^{1/2}\left(\frac{E\,\xi_{z;z}\,\ell}{a^2\mu}\right)^{1/2}. \tag{10.70b} $$

(d) Do you think you could design a pendulum that attains the goal of a one-day period?

Exercise 10.13 Derivation and Practice: Evaluation of Elastostatic Force in Cylindrical Coordinates
Derive Eqs. (10.62) for the cylindrical components of the internal elastostatic force per unit volume $\mathbf f = (K + \tfrac13\mu)\nabla(\nabla\cdot\boldsymbol\xi) + \mu\nabla^2\boldsymbol\xi$ in a cylindrically symmetric situation.

Exercise 10.14 Derivation and Example: Thermoelastic Noise
Derive Eq. (10.61) for the thermoelastic noise in a gravitational-wave test mass by the following steps. First, read the discussion of the fluctuation-dissipation theorem in Sec. 5.6, and then read Ex. 5.7, which is the starting point for the derivation. Our $S_q(f)$ is given by Eq. (5.70). The key unknown quantity in this equation is the dissipation rate $W_{\rm diss}$ associated with a sinusoidally oscillating applied stress [Eq. (10.60), multiplied by $\cos(2\pi f t)$].

(a) There are three important time scales in this problem: (i) the oscillation period of the applied stress, $\tau_{\rm applied} = 1/f \sim 0.01$ s; (ii) the time $\tau_{\rm sound}$ for sound waves to travel across the test mass (a distance $\sim 14$ cm; the sound speed, as we shall see in Chap. 11, is roughly $\sqrt{E/\rho}$); and (iii) the time $\tau_{\rm heat}$ for diffusive heat conductivity to substantially change the temperature distribution inside the test mass (cf. the discussion of heat conductivity in Sec. 2.7). Estimate, roughly, $\tau_{\rm sound}$ and $\tau_{\rm heat}$, and thereby show that $\tau_{\rm sound} \ll \tau_{\rm applied} \ll \tau_{\rm heat}$. Explain why this means that in evaluating $W_{\rm diss}$ we can (i) treat the test-mass strain as being produced quasistatically (i.e., ignore the inertia of the test-mass material), and (ii) treat the expansion of the test-mass material adiabatically (i.e., ignore the effects of heat flow when computing the temperature distribution in the test mass).

(b) Show that, when the test-mass material adiabatically expands by an amount $\Delta V/V = \Theta$, its temperature changes by

$$ \delta T = -\frac{\alpha E T}{C_V\rho(1-2\nu)}\,\Theta. \tag{10.71} $$

For a textbook derivation of this, see Sec. 6 of Landau and Lifshitz (1986). In that section a clean distinction is made between the bulk modulus $K$ for expansions at constant temperature and the bulk modulus $K_{\rm ad}$ for adiabatic expansion. For most materials these bulk moduli are nearly the same; for example, for sapphire they differ by only $\sim 1$ part in $10^5$ [cf. Eqs. (6.7), (6.8) of Landau and Lifshitz (1986) and the numbers for sapphire at the end of Sec. 10.6.2; and note the difference of notation: $\alpha$ of this book is 1/3 that of Landau and Lifshitz, and $C_V\rho$ of this book is $C_V$ of Landau and Lifshitz].

(c) The inhomogeneity of the expansion $\Theta$ causes the temperature perturbation $\delta T$ to be inhomogeneous, and that inhomogeneity produces a heat flux $\mathbf q = -\kappa_{\rm th}\nabla\delta T$. Whenever an amount $Q$ of heat flows from a region of high temperature $T$ to one of slightly lower temperature $T - dT$, there is an increase of entropy, $dS = Q/(T-dT) - Q/T \simeq Q\,dT/T^2$. Show that for our situation the resulting rate of entropy increase per unit volume is

$$ \frac{dS}{dV\,dt} = \frac{-\mathbf q\cdot\nabla\delta T}{T^2} = \frac{\kappa_{\rm th}(\nabla\delta T)^2}{T^2}. \tag{10.72} $$

(We shall rederive this fundamental result from a different viewpoint in Part IV.)

(d) This entropy increase entails a creation of new thermal energy at a rate per unit volume $dE_{\rm th}/dV\,dt = T\,dS/dV\,dt$. Since, for our thought experiment with temporally oscillating applied stress, this new thermal energy must come from the oscillating elastic energy, the rate of dissipation of elastic energy must be

$$ W_{\rm diss} = \int\frac{\kappa_{\rm th}(\nabla\delta T)^2}{T}\,dV. \tag{10.73} $$

By combining with Eq. (10.71), inserting into Eq. (5.70), and averaging over the period $\tau_{\rm applied}$ of the applied force, derive Eq. (10.61) for $S_q(f)$. Explain why, in this equation, we can treat the applied force as static rather than oscillatory, which is what we did in the text.
Exercise 10.15 *** Example: Green's Function for Normal Force on Half-Infinite Body
Suppose that a stress $T_{zj}^{\rm applied}(\mathbf x_o)$ is applied on the face $z=0$ of a half-infinite elastic body (one that fills the region $z>0$). Then by virtue of the linearity of the elastostatics equation $\mathbf f = (K + \tfrac13\mu)\nabla(\nabla\cdot\boldsymbol\xi) + \mu\nabla^2\boldsymbol\xi = 0$ and the linearity of its boundary conditions, $T_{zj}^{\rm internal} = T_{zj}^{\rm applied}$, there must be a Green's function $G_{jk}(\mathbf x - \mathbf x_o)$ such that the body's internal displacement $\boldsymbol\xi(\mathbf x)$ is given by

$$ \xi_j(\mathbf x) = \int G_{jk}(\mathbf x - \mathbf x_o)\,T_{zk}^{\rm applied}(\mathbf x_o)\,d^2x_o. \tag{10.74} $$

Here the integral is over all points $\mathbf x_o$ on the face of the body ($z=0$), and $\mathbf x$ can be anywhere inside the body, $z\ge 0$.

(a) Show that, if a force $F_j$ is applied on the body's surface at a single point, the origin of coordinates, then the displacement inside the body is

$$ \xi_j(\mathbf x) = G_{jk}(\mathbf x)\,F_k. \tag{10.75} $$

Thus, the Green's function can be thought of as the body's response to a point force on its surface.

(b) As a special case, consider a point force $F_z$ directed perpendicularly into the body. The resulting displacement turns out to have cylindrical components⁷

$$ \xi_z = G_{zz}(\varpi,z)F_z = \frac{1+\nu}{2\pi E}\left[\frac{2(1-\nu)}{\varpi} + \frac{z^2}{\varpi^3}\right]F_z, \qquad \xi_\varpi = G_{\varpi z}(\varpi,z)F_z = -\frac{(1+\nu)(1-2\nu)}{2\pi E}\,\frac{F_z}{\varpi}. \tag{10.76} $$

It is straightforward to show that this displacement does satisfy the elastostatics equations (10.62). Show that it also satisfies the required boundary condition $T_{z\varpi}(z=0) = -2\mu\Sigma_{z\varpi} = 0$.

(c) Show that for this displacement, $T_{zz} = -K\Theta - 2\mu\Sigma_{zz}$ vanishes everywhere on the body's surface $z=0$ except at the origin $\varpi = 0$, and is infinite there. Show that the integral of this normal stress over the surface is $F_z$, and therefore $T_{zz}(z=0) = F_z\,\delta_2(\mathbf x)$, where $\delta_2$ is the two-dimensional Dirac delta function in the surface. This is the second required boundary condition.

(d) Plot the integral curves of the displacement vector $\boldsymbol\xi$ (i.e., the curves to which $\boldsymbol\xi$ is parallel) for a reasonable choice of Poisson's ratio $\nu$. Explain physically why the curves have the form you find.

(e) One can use the Green's function (10.76) to compute the displacement $\boldsymbol\xi$ induced by the Gaussian-shaped pressure (10.60) applied to the body's face, and then to evaluate the induced expansion and thence the thermoelastic noise; see Braginsky, Gorodetsky and Vyatchanin (1999), or Liu and Thorne (2000). The results agree with Eqs. (10.67) and (10.68), which were deduced using separation of variables.

****************************

⁷ For the other components of the Green's function, written in Cartesian coordinates (since a non-normal applied force breaks the cylindrical symmetry), see Eqs. (8.18) of Landau and Lifshitz (1986).

10.7 Reducing the Elastostatic Equations to One Dimension for a Bent Beam; Cantilever Bridges

When dealing with bodies that are much thinner in two dimensions than in the third (e.g., rods, wires, and beams), one can use the method of moments to reduce the three-dimensional elastostatic equations to ordinary differential equations in one dimension (a process called dimensional reduction). We have already met an almost trivial example of this in our discussion of Hooke's law and Young's modulus (Sec. 10.4 and Fig. 10.1). In this section we shall discuss a more complicated example, the bending of a beam through a small displacement angle; and in Ex. 10.17 we shall analyze an even more complicated example: the bending of a very long, elastic wire into a complicated shape called an elastica.

Our beam-bending example is motivated by a common method of bridge construction, which uses cantilevers. (A famous historical example is the old bridge over the Firth of Forth in Scotland, completed in 1890 with a main span of half a km.) The principle is to attach two independent beams to the two shores and allow them to meet in the middle. (In practice the beams are usually supported at the shores on piers and strengthened along their lengths with trusses.)

Let us make a simple model of a cantilever (Fig. 10.10). Consider a beam clamped rigidly at one end, with length $\ell$, horizontal width $w$ and vertical thickness $h$. Introduce local Cartesian coordinates with $\mathbf e_x$ pointing along the beam and $\mathbf e_z$ pointing vertically upward. Imagine the beam extending horizontally in the absence of gravity. Now let it sag under its own weight so that each element is displaced through a small distance $\boldsymbol\xi(\mathbf x)$. The upper part of the beam is stretched while the lower part is compressed, so there must be a neutral surface where the horizontal strain $\xi_{x,x}$ vanishes. This neutral surface must itself be curved downward.
Let its downward displacement from the horizontal plane that it occupied before sagging be $\eta(x)\ (>0)$, let a plane tangent to the neutral surface make an angle $\theta(x)$ (also $>0$) with the horizontal, and adjust the $x$ and $z$ coordinates so $x$ runs along the slightly curved neutral plane and $z$ is orthogonal to it (Fig. 10.10). The longitudinal strain is then given, to first order in small quantities, by

$$ \xi_{x,x} = \frac{z}{R} = z\frac{d\theta}{dx} \simeq z\frac{d^2\eta}{dx^2}, \tag{10.77} $$

where $R = dx/d\theta > 0$ is the radius of curvature of the beam's bend and we have chosen $z = 0$ at the neutral surface. The one-dimensional displacement $\eta(x)$ will be the focus of dimensional reduction of the elastostatic equations.

As in our discussion of Hooke's law for a stretched rod (Sec. 10.4), we can regard the beam as composed of a bundle of long parallel fibers, stretched along their length and free to contract transversely. The longitudinal stress is therefore

$$ T_{xx} = -E\xi_{x,x} = -Ez\frac{d^2\eta}{dx^2}. \tag{10.78} $$

Fig. 10.10: Bending of a cantilever. a) A beam is held rigidly at one end and extends horizontally with the other end free. We introduce an orthonormal coordinate system $(x, y, z)$ with $\mathbf e_x$ extending along the beam. We only consider small departures from equilibrium. The bottom of the beam will be compressed, the upper portion extended. There is therefore a neutral surface $z = 0$ on which the strain $\xi_{x,x}$ vanishes. b) The beam shown here has rectangular cross section with horizontal width $w$, vertical thickness $h$ and length $\ell$. c) The bending torque $M$ must be balanced by the torque exerted by the vertical shearing force $S$. d) $S$ must vary along the beam so as to support the beam's weight per unit length, $W$.

We can now compute the horizontal force density, which must vanish in elastostatic equilibrium:⁸

$$ f_x = -T_{xx,x} - T_{xz,z} = Ez\frac{d^3\eta}{dx^3} - T_{xz,z} = 0. \tag{10.79} $$

This is a partial differential equation. We convert it into a one-dimensional ordinary differential equation by the method of moments: we multiply it by $z$ and integrate over $z$ (i.e., we compute its “first moment”). Integrating the second term, $\int zT_{xz,z}\,dz$, by parts and using the boundary condition $T_{xz} = 0$ on the upper and lower surfaces of the beam, we obtain

$$ \frac{Eh^3}{12}\frac{d^3\eta}{dx^3} = -\int_{-h/2}^{h/2}T_{xz}\,dz. \tag{10.80} $$

Notice (using $T_{xz} = T_{zx}$) that the integral, when multiplied by the beam's width $w$ in the $y$ direction, is the vertical shearing force $S(x)$ in the beam:

$$ S = \int T_{zx}\,dy\,dz = w\int_{-h/2}^{h/2}T_{zx}\,dz = -D\frac{d^3\eta}{dx^3}. \tag{10.81} $$

Here

$$ D \equiv E\int z^2\,dy\,dz = Ewh^3/12 \tag{10.82} $$

is called the beam's flexural rigidity. Notice that it is the second moment of the beam's Young's modulus.

⁸ Because the coordinates are curvilinear, there are connection-coefficient terms in this equation that have been omitted: $-\Gamma_{xjk}T_{jk} - \Gamma_{jkj}T_{xk}$. However, each $\Gamma$ has magnitude $1/R$, so these terms are of order $T_{jk}/R$, whereas the terms kept in Eq. (10.79) are of order $T_{xx}/\ell$ and $T_{xz}/h$; and since the thickness $h$ and length $\ell$ of the beam are small compared to the beam's radius of curvature $R$, the connection-coefficient terms are negligible.

As an aside, we can gain some insight into Eq. (10.81) by examining the torques that act on a segment of the beam with length $dx$. As shown in Fig. 10.10c, the shear forces on the two ends of the segment exert a clockwise torque $2S(dx/2) = S\,dx$. This is balanced by a counterclockwise torque due to the stretching of the upper half of the segment and compression of the lower half, i.e., due to the bending of the beam. This bending torque is

$$ M \equiv \int T_{xx}\,z\,dy\,dz = -D\frac{d^2\eta}{dx^2} \tag{10.83} $$

on the right end of the segment and minus this on the left, so torque balance says $(dM/dx)\,dx = S\,dx$, i.e.,

$$ S = \frac{dM}{dx}. \tag{10.84} $$

This is precisely Eq. (10.81).

Equation (10.81) [or equivalently (10.84)] embodies half of the elastostatic equations. It is the $x$ component of force balance $f_x = 0$, converted to an ordinary differential equation by evaluating its lowest non-vanishing moment: its first moment, $\int z f_x\,dy\,dz = 0$ [Eq. (10.80)]. The other half is the $z$ component of stress balance, which we can write as

$$ T_{zx,x} + T_{zz,z} + \rho g = 0 \tag{10.85} $$

(vertical elastic force balanced by gravitational pull on the beam). We can convert this to a one-dimensional ordinary differential equation by taking its lowest nonvanishing moment, its zeroth moment, i.e., by integrating over $y$ and $z$. The result is

$$ \frac{dS}{dx} = -W, \tag{10.86} $$

where $W = g\rho wh$ is the beam's weight per unit length. Combining our two dimensionally reduced components of force balance, Eqs. (10.81) and (10.86), we obtain a fourth-order differential equation for our one-dimensional displacement $\eta(x)$:

$$ \frac{d^4\eta}{dx^4} = \frac{W}{D}. \tag{10.87} $$

(Fourth-order differential equations are characteristic of elasticity.)

Equation (10.87) can be solved subject to four appropriate boundary conditions. However, before we solve it, notice that for a beam of a fixed length $\ell$, the deflection $\eta$ is inversely proportional to the flexural rigidity. Let us give a simple example of this scaling. American floors are conventionally supported by wooden joists of 2″ (inch) by 6″ lumber with the 6″ side vertical. Suppose an inept carpenter installed the joists with the 6″ side horizontal. Since $D \propto wh^3$, the flexural rigidity of the joist would be reduced by a factor 9, and the center of the floor would be expected to sag 9 times as much as if the joists had been properly installed, a potentially catastrophic error.

Also, before solving Eq. (10.87), let us examine the approximations that we have made. First, we have assumed that the sag is small compared with the length of the beam in making the small-angle approximation in Eq. (10.77), and we have assumed that the beam's radius of curvature is large compared to its length in neglecting connection-coefficient terms (footnote 8). These will usually be the case, but are not so for the elastica studied in Ex. 10.17. Second, by using the method of moments rather than solving for the complete local stress tensor field, we have ignored the effects of some components of the stress tensor. In particular, in evaluating the bending torque [Eq. (10.83)] we have ignored the effect of the $T_{zx}$ component of the stress tensor. This is $O(h/\ell)\,T_{xx}$, and so our equations can only be accurate for fairly slender beams. Third, the extension above the neutral surface and the compression below the neutral surface lead to changes in the cross-sectional shape of the beam. The fractional error here is of order the longitudinal strain, which is small for real materials.

The solution to Eq. (10.87) is a fourth-order polynomial with four unknown constants to be set by boundary conditions. In this problem, the beam is held horizontal at the fixed end, so that $\eta(0) = \eta'(0) = 0$, where $' = d/dx$. At the free end, $T_{zx}$ and $T_{xx}$ must vanish, so the shearing force $S$ must vanish, whence $\eta'''(\ell) = 0$ [Eq. (10.81)]; and the bending torque $M$ [Eq. (10.83)] must also vanish, whence [by Eq. (10.84)] $\int S\,dx \propto \eta''(\ell) = 0$. By imposing these four boundary conditions, $\eta(0) = \eta'(0) = \eta''(\ell) = \eta'''(\ell) = 0$, on the solution of Eq. (10.87), we obtain for the beam shape

$$ \eta(x) = \frac{W}{D}\left(\frac14\ell^2x^2 - \frac16\ell x^3 + \frac1{24}x^4\right). \tag{10.88a} $$

Therefore the end of the beam sags by

$$ \eta(\ell) = \frac{W\ell^4}{8D}. \tag{10.88b} $$

Problems in which the beam rests on supports rather than being clamped can be solved in a similar manner. The boundary conditions will be altered, but the differential equation (10.87) will be unchanged.

Now suppose that we have a cantilever bridge of constant vertical thickness $h$ and total span $2\ell \sim 100$ m, made of material with density $\rho \sim 8\times10^3$ kg m$^{-3}$ (e.g., reinforced concrete) and Young's modulus $E \sim 100$ GPa. Suppose further that we want the center of the bridge to sag by no more than $\eta \sim 1$ m. According to Eq. (10.88b), with $W = \rho gwh$ and $D = Ewh^3/12$, the thickness of the beam must satisfy

$$ h \gtrsim \left(\frac{3\rho g\ell^4}{2E\eta}\right)^{1/2} \sim 2.7\ {\rm m}. \tag{10.89} $$
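The cantilever boundary-value problem and the bridge estimate can be checked with a few lines of exact rational arithmetic. In the Python sketch below (helper names are ours), polynomials are stored as coefficient lists in the dimensionless variable $x/\ell$, with $\eta$ measured in units of $W\ell^4/D$; the bridge numbers are the ones quoted above, with $g = 9.8$ m s$^{-2}$ assumed.

```python
import math
from fractions import Fraction as Fr


def deriv(p):
    """Derivative of a polynomial stored as coefficients [c0, c1, c2, ...]."""
    return [i * c for i, c in enumerate(p)][1:]


def evaluate(p, x):
    return sum(c * x**i for i, c in enumerate(p))


# Eq. (10.88a) with x in units of ell and eta in units of W ell^4 / D:
# eta = x^2/4 - x^3/6 + x^4/24
eta = [Fr(0), Fr(0), Fr(1, 4), Fr(-1, 6), Fr(1, 24)]
d1 = deriv(eta)
d2 = deriv(d1)
d3 = deriv(d2)
d4 = deriv(d3)
print("eta''''                 =", d4[0])                    # 1, i.e. W/D
print("eta(0), eta'(0)         =", evaluate(eta, 0), evaluate(d1, 0))
print("eta''(ell), eta'''(ell) =", evaluate(d2, 1), evaluate(d3, 1))
print("end sag eta(ell)        =", evaluate(eta, 1))         # 1/8, Eq. (10.88b)


def min_thickness(rho, E, half_span, max_sag, g=9.8):
    """Invert eta(ell) = W ell^4/(8D) with W = rho g w h and D = E w h^3/12,
    i.e. eta(ell) = 3 rho g ell^4 / (2 E h^2), for the thickness h (SI units)."""
    return math.sqrt(3 * rho * g * half_span**4 / (2 * E * max_sag))


h_bridge = min_thickness(rho=8e3, E=100e9, half_span=50.0, max_sag=1.0)
print(f"bridge thickness h >~ {h_bridge:.1f} m")

# Joist example: D is proportional to w h^3 (Eq. 10.82); 2" x 6" on edge vs. laid flat
joist_ratio = Fr(2 * 6**3, 6 * 2**3)
print("flexural-rigidity ratio (proper / flat):", joist_ratio)
```

The printed values reproduce $d^4\eta/dx^4 = W/D$, the four boundary conditions, the end sag $W\ell^4/8D$, $h\approx 2.7$ m, and the factor 9 of the joist example.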

Fig. 10.11: Elastica. (a) A bent wire is in elastostatic equilibrium under the action of equal and opposite forces applied at its two ends. $x$ measures distance along the neutral surface; $z$ measures distance orthogonal to the wire in the plane of the bend. (b), (c), (d) Examples of the resulting shapes.

The estimate (10.89) makes no allowance for the extra strengthening and support present in real structures (e.g., via trusses and cables), and so it is an overestimate.

****************************
EXERCISES

Exercise 10.16 Derivation: Sag in a cantilever
(a) Verify Eqs. (10.88) for the sag in a horizontal beam clamped at one end and allowed to hang freely at the other end.
(b) Now consider a similar beam with constant cross section and loaded with weights so that the total weight per unit length is $W(x)$. Give a Green's function for the sag of the free end in terms of an integral over $W(x)$.

Exercise 10.17 *** Example: Elastica
Consider a slender wire of rectangular cross section resting on a horizontal surface (so gravity is unimportant), with horizontal thickness $h$ and vertical thickness $w$. Let the wire be bent in the horizontal plane as a result of equal and opposite forces $F$ that act at its ends; Fig. 10.11. The various shapes the wire can assume are called elastica; they were first computed by Euler in 1744 and are discussed on pp. 401–404 of Love (1927). The differential equation that governs the wire's shape is similar to that for the cantilever, Eq. (10.87), with the simplification that the wire's weight does not enter the problem and the complication that the wire is long enough to deform through large angles.

It is convenient (as in the cantilever problem, Fig. 10.10) to introduce curvilinear coordinates with coordinate $x$ measuring distance along the neutral surface, $z$ measuring distance orthogonal to $x$ in the plane of the bend (horizontal plane), and $y$ measured perpendicular to the bending plane (vertically). The unit vectors along the $x$, $y$, and $z$ directions are $\mathbf e_x$, $\mathbf e_y$, $\mathbf e_z$ (Fig. 10.11). Let $\theta(x)$ be the angle between $\mathbf e_x$ and the applied force $\mathbf F$; $\theta(x)$ is determined, of course, by force and torque balance.

(a) Show that force balance along the $x$ and $z$ directions implies

$$ F\cos\theta = \int T_{xx}\,dy\,dz, \qquad F\sin\theta = \int T_{zx}\,dy\,dz \equiv S. \tag{10.90a} $$

(b) Show that torque balance for a short segment of wire implies

$$ S = \frac{dM}{dx}, \qquad\text{where}\quad M(x) \equiv \int zT_{xx}\,dy\,dz \ \text{ is the bending torque.} \tag{10.90b} $$

(c) Show that the stress-strain relation in the wire implies

$$ M = -D\frac{d\theta}{dx}, \tag{10.90c} $$

where $D = Ewh^3/12$ is the flexural rigidity, Eq. (10.82).

(d) From the above relations, derive the following differential equation for the shape of the wire:

$$ \frac{d^2\theta}{dx^2} = -\frac{F\sin\theta}{D}. \tag{10.90d} $$

This is the same equation as the one that describes the motion of a simple pendulum!

(e) Go back through your analysis and identify any place that connection coefficients would enter into a more careful computation, and explain why the connection-coefficient terms are negligible.

(f) Find one non-trivial solution of the elastica equation (10.90d), either analytically using elliptic integrals or numerically. (The general solution can be expressed in terms of elliptic integrals.)

(g) Solve analytically or numerically for the shape adopted by the wire corresponding to your solution in (f), in terms of Cartesian coordinates $(X, Z)$ in the bending (horizontal) plane. Hint: express the curvature of the wire, $1/R = d\theta/dx$, as

$$ \frac{d\theta}{dx} = \frac{d^2X}{dZ^2}\left[1 + \left(\frac{dX}{dZ}\right)^2\right]^{-3/2}. \tag{10.90e} $$

(h) Obtain a uniform piece of wire and adjust the force $F$ to compare your answer with experiment.
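Parts (f) and (g) can also be attacked numerically. The Python sketch below integrates Eq. (10.90d), together with the equations for the shape in the bending plane, with a classical fourth-order Runge–Kutta scheme. It assumes (our choice, not fixed by the exercise) that the integration starts at a free end, where $M = 0$ and hence $d\theta/dx = 0$ by Eq. (10.90c), and that $\mathbf F$ points along $+Z$, so $dZ/dx = \cos\theta$ and $dX/dx = \sin\theta$; the values of $\theta_0$, $F/D$, and the wire length are illustrative.

```python
import math


def elastica(theta0, f_over_d=1.0, length=10.0, n=20000):
    """Integrate theta'' = -(F/D) sin(theta), Eq. (10.90d), by classical RK4.

    Assumed setup: start at a free end (dtheta/dx = 0) where theta = theta0;
    F points along +Z, so dZ/dx = cos(theta) and dX/dx = sin(theta).
    Returns a list of states (theta, dtheta/dx, X, Z) at spacing length/n."""
    def rhs(s):
        theta, dtheta, _, _ = s
        return (dtheta, -f_over_d * math.sin(theta), math.sin(theta), math.cos(theta))

    h = length / n
    s = (theta0, 0.0, 0.0, 0.0)
    path = [s]
    for _ in range(n):
        k1 = rhs(s)
        k2 = rhs(tuple(a + 0.5 * h * b for a, b in zip(s, k1)))
        k3 = rhs(tuple(a + 0.5 * h * b for a, b in zip(s, k2)))
        k4 = rhs(tuple(a + h * b for a, b in zip(s, k3)))
        s = tuple(a + h * (b1 + 2 * b2 + 2 * b3 + b4) / 6
                  for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))
        path.append(s)
    return path


theta0, f_over_d = 2.0, 1.0          # illustrative large-angle case
path = elastica(theta0, f_over_d)

# First integral: (theta')^2 = 2 (F/D) (cos(theta) - cos(theta0)) should hold along the wire
drift = max(abs(dt**2 - 2 * f_over_d * (math.cos(t) - math.cos(theta0)))
            for t, dt, _, _ in path)
print(f"max violation of the first integral: {drift:.1e}")
for t, dt, X, Z in path[::2500]:
    print(f"X = {X:7.3f}   Z = {Z:7.3f}   theta = {t:+.3f}")
```

The conserved quantity monitored here is the first integral quoted in Ex. 10.19(a); its drift is a convenient check on the step size. Plotting the $(X, Z)$ pairs for several values of $\theta_0$ should give shapes of the kind sketched in Fig. 10.11.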

Fig. 10.12: Foucault Pendulum (axes $X$ and $Z$ in the plane of the bend; $x$ measures distance along the wire; $F$ is the force at its end).

Exercise 10.18 Example: Foucault Pendulum
In the design of a Foucault pendulum for measuring the earth's general relativistic “gravitomagnetic field” (discussed further in Part VI), it is crucial that the pendulum's restoring force be isotropic, since anisotropy will make the swinging period be different in different planes and thereby will cause precession of the plane of swing (Braginsky, Polnarev and Thorne 1984). The answer to the elastica exercise 10.17 can be adapted to model the effect of anisotropy on the pendulum's period.

(a) Consider a pendulum of mass $m$ and length $\ell$ suspended as shown in Fig. 10.12 by a rectangular wire with thickness $h$ in the plane of the bend ($X$–$Z$ plane) and thickness $w$ orthogonal to that plane ($Y$ direction). Explain why the force that the wire exerts on the mass is $-\mathbf F = -(mg\cos\theta_o + m\ell\dot\theta_o^2)\,\mathbf e_x$, where $g$ is the acceleration of gravity, $\theta_o$ is defined in the figure, $\dot\theta_o$ is the time derivative of $\theta_o$ due to the swinging of the pendulum, and in the second term we have assumed that the wire is long compared to its region of bend. Express the second term in terms of the amplitude of swing $\theta_o^{\rm max}$, and show that for small amplitudes, $\theta_o^{\rm max} \ll 1$, $\mathbf F \simeq mg\,\mathbf e_x$. Use this approximation in the subsequent parts.

(b) Assuming that all along the wire its angle $\theta(x)$ to the vertical is small, $\theta \ll 1$, show that

$$ \theta(x) = \theta_o\left[1 - e^{-x/\lambda}\right], \tag{10.91a} $$

where $\lambda$ (not to be confused with the Lamé constant) is

$$ \lambda = \frac{h}{(12\epsilon)^{1/2}}, \tag{10.91b} $$

$\epsilon = \xi_{x,x}$ is the longitudinal strain in the wire, and $h$ is the wire's thickness in the plane of its bend. Note that the bending of the wire is concentrated near the support, so this is where dissipation will be most important and where most of the suspension thermal noise will arise (cf. Sec. 5.6 for discussion of thermal noise).

(c) Hence show that the shape of the wire is given in terms of Cartesian coordinates by

$$ Z = \left[X - \lambda\left(1 - e^{-X/\lambda}\right)\right]\theta_o, \tag{10.91c} $$

and that the pendulum period is

$$ P = 2\pi\left(\frac{\ell - \lambda}{g}\right)^{1/2}. \tag{10.91d} $$

(d) Finally, show that the pendulum periods when swinging along $\mathbf e_x$ and $\mathbf e_y$ differ by

$$ \frac{\delta P}{P} = \frac{h - w}{\ell\,(48\epsilon)^{1/2}}. \tag{10.91e} $$

From this one can determine how accurately the two thicknesses $h$ and $w$ must be equal to achieve a desired degree of isotropy in the period. A similar analysis can be carried out for the more realistic case of a slightly elliptical wire.
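The small-difference formula (10.91e) can be compared with the exact difference of two periods (10.91d), each computed with its own bend length (10.91b). The wire and pendulum numbers in this Python sketch are hypothetical, chosen only for illustration, and the function names are ours.

```python
import math


def bend_length(thickness, strain):
    """Eq. (10.91b): lambda = thickness / sqrt(12 * strain)."""
    return thickness / math.sqrt(12.0 * strain)


def period(ell, lam, g=9.8):
    """Eq. (10.91d): P = 2 pi sqrt((ell - lambda) / g)."""
    return 2.0 * math.pi * math.sqrt((ell - lam) / g)


# Hypothetical wire and pendulum (illustrative numbers, not from the text)
ell = 10.0                    # pendulum length (m)
strain = 1.0e-3               # longitudinal strain epsilon in the wire
h, w = 100.0e-6, 100.5e-6     # wire thicknesses in the two swing planes (m)

P_h = period(ell, bend_length(h, strain))   # bending across thickness h
P_w = period(ell, bend_length(w, strain))   # bending across thickness w
exact = (P_w - P_h) / P_h
estimate = (h - w) / (ell * math.sqrt(48.0 * strain))   # Eq. (10.91e)
print(f"(P_w - P_h)/P_h = {exact:.4e}")
print(f"Eq. (10.91e)    = {estimate:.4e}")
```

The two numbers should agree to a relative accuracy of order $\lambda/\ell$, the size of the terms dropped when (10.91d) is linearized in $\lambda$.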

****************************

10.8 Bifurcation, Buckling and Mountain Folding

So far, we have considered stable elastostatic equilibria, and have implicitly assumed that the only reason for failure of a material is exceeding the elastic limit. However, anyone who has built a house of cards knows that mechanical equilibria can be unstable, with startling consequences.

A large-scale example is the formation of mountains. The surface of the earth is covered by several interlocking horizontal plates that are driven into each other by slow (hundred-million-year) convective motions in the underlying mantle. When plates are pushed together, mountains can be formed in two ways: by folding (e.g., the Jura Mountains of France), which sometimes happens when a portion of crust is compressed in just one direction, and by forming domes (e.g., the Black Hills of Dakota), which arise when there is simultaneous compression along two directions.

For a simple model of folding, take a new playing card and squeeze it between your finger and thumb (Fig. 10.13). When you squeeze gently, the card remains flat, but when you gradually increase the force past a critical value $F_{\rm crit}$, the card suddenly “buckles,” i.e., bends; and the curvature of the bend then increases rapidly with the applied force. For still larger forces there are equilibrium states of “higher quantum number,” i.e., with one or more nodes in the transverse displacement $\eta(x)$ of the card.

Fig. 10.13: A playing card of length $\ell$, width $w$ and thickness $h$ is subjected to a compressive force $F$, applied at both ends. The ends of the card are fixed but are free to pivot.

Fig. 10.14 (axes: $\omega^2$ versus $\eta_0$; curves labeled $F = 0$, $F = F_{\rm crit}$, $F > F_{\rm crit}$): Schematic illustration of the behavior of the frequency of small oscillations about equilibrium and the displacement of the center of the card, $\eta_0$. Equilibria with $\omega^2 > 0$ are stable; those with $\omega^2 < 0$ are unstable. The applied force $F$ increases in the direction of the arrows.

To understand this quantitatively, we derive an eigenequation for the transverse displacement $\eta$ as a function of distance $x$ from one end of the card. (Although the card is effectively two dimensional, it has translation symmetry along its transverse dimension, so we can use the one-dimensional equations of the previous section.) We suppose that the ends are free to pivot but not move, so

$$ \eta(0) = \eta(\ell) = 0. \tag{10.92} $$

For small displacements, the bending torque of our dimensionally reduced one-dimensional theory is [Eq. (10.83)]

$$ M(x) = -D\frac{d^2\eta}{dx^2}, \tag{10.93} $$

where $D = wh^3E/12$ is the flexural rigidity [Eq. (10.82)]. As the card is very light (negligible gravity), the total torque around location $x$, acting on a section of the card from $x$ to one end, is the bending torque applied at $x$ plus the torque associated with the applied force, $-F\eta(x)$, and this sum must vanish:

$$ D\frac{d^2\eta}{dx^2} + F\eta = 0. \tag{10.94} $$

The eigenfunction solutions of Eq. (10.94) satisfying boundary conditions (10.92) are

$$ \eta = \eta_0\sin kx, \tag{10.95a} $$

where

$$ k = \left(\frac{F}{D}\right)^{1/2} = \frac{n\pi}{\ell} \qquad\text{for positive integers } n. \tag{10.95b} $$
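The eigenvalue condition (10.95b) can also be recovered without knowing the sine solution, by a shooting method: integrate Eq. (10.94) from the pivot at $x = 0$ with $\eta(0) = 0$ and unit slope, and adjust $F/D$ until $\eta(\ell) = 0$ as well. The Python sketch below (our own construction, with $\ell = 1$) does this with RK4 steps and bisection.

```python
import math


def end_value(k2, n=2000):
    """Integrate Eq. (10.94) as eta'' = -k2 * eta (k2 = F/D, card length 1) from
    eta(0) = 0, eta'(0) = 1 with classical RK4; return eta at the far end."""
    h = 1.0 / n
    eta, slope = 0.0, 1.0
    for _ in range(n):
        a1, b1 = slope, -k2 * eta
        a2, b2 = slope + 0.5*h*b1, -k2 * (eta + 0.5*h*a1)
        a3, b3 = slope + 0.5*h*b2, -k2 * (eta + 0.5*h*a2)
        a4, b4 = slope + h*b3, -k2 * (eta + h*a3)
        eta += h * (a1 + 2*a2 + 2*a3 + a4) / 6
        slope += h * (b1 + 2*b2 + 2*b3 + b4) / 6
    return eta


def eigenvalue_between(lo, hi, tol=1e-12):
    """Bisect for the F/D in (lo, hi) where eta(1) = 0 (assumes one sign change)."""
    f_lo = end_value(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = end_value(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


k2_1 = eigenvalue_between(1.0, 20.0)    # n = 1 root
k2_2 = eigenvalue_between(20.0, 60.0)   # n = 2 root
print(f"F_1 l^2/D = {k2_1:.8f}   (pi^2 = {math.pi**2:.8f})")
print(f"F_2 / F_1 = {k2_2 / k2_1:.6f}")
```

The first root reproduces $F\ell^2/D = \pi^2$, and the next one sits at four times that value, illustrating that the $n$-th equilibrium requires $F = n^2\pi^2 D/\ell^2$.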

Therefore, there is a critical force given by Fcrit =

π 2 wh3 E π2D = , ℓ2 12ℓ2

(10.96)

45 V

V

η0

(a)

η0

(b)

Fig. 10.15: Representation of bifurcation by a potential energy function V (ξ0 ). a) When the applied force is small, there is only one stable equilibrium. b) As the applied force F is increased, the bottom of the potential well flattens and eventually the number of equilibria increases from one to three, of which only two are stable.

below which there is no solution except η = 0 (an unbent card). When the applied force is equal to Fcrit , the unbent card is still a solution, and there is an additional solution (10.95) with n = 1 (a single arch with no nodes). The linear approximation, which we have used, cannot tell us the height η0 of the arch as a function of F ; it reports, incorrectly, that for F = Fcrit all arch heights are allowed and for F > Fcrit there is no solution with n = 1. When nonlinearities are taken into account (Ex. 10.19), the force F and the arch height ηo are related by 1 2 4 F = Fcrit 1 + sin (θo /2) + O[sin (θo /2)] , (10.97) 2 The sudden appearance of this new n = 1, arched equilibrium state as F is increased through Fcrit is called a bifurcation of equilibria. Bifurcations show up sharply in the elastodynamics of the playing card, as we shall see in Sec. 11.3.5: When F < Fcrit , small perturbations of the card’s shape oscillate stably. When F = Fcrit , the card is neutrally stable, and its zero-frequency motion leads the card from its straight equilibrium state to its n = 1 bent equilibrium. When F > Fcrit , the straight card is an unstable equilibrium: its n = 1 perturbations grow in time, driving the card toward the n = 1 equilibrium state of Eq. (10.97). Another nice way of looking at bifurcations is in terms of energy. Consider candidate equilibrium states labeled by the height η0 of their arch. For each value of η0 , give the card (for concreteness) the n = 1 sine-wave shape η = η0 sin(πx/ℓ). Compute the total elastic energy U(η0 ) associated with the card’s bending and subtract off the work F δX done on the card by the applied forces F when the card arches from η0 = 0 to height η0 . (Here δX(η0 ) is the arch-induced decrease in straight-line separation between card’s the ends). 
The resulting quantity, V(η0) = U − F δX, is the card’s free energy, analogous to the Gibbs free energy G = E − TS + PV of thermodynamics; it is the relevant energy for analyzing the card’s equilibrium and dynamics when the force F is continually being applied at the two ends. This free energy has the shapes shown in Fig. 10.15. At small values of the force [curve (a)], the free energy has only one minimum, η0 = 0, corresponding to a single stable equilibrium, the straight card. However, as the force is increased through Fcrit, the minimum flattens out and then becomes a maximum flanked by two new minima [curve (b)]. The maximum for F > Fcrit is the unstable, zero-displacement (straight-card) equilibrium, and the two minima are the two stable, finite-amplitude equilibria with positive and negative η0.

This procedure of representing a continuous system with an infinite number of degrees of freedom by just one or a few coordinates, and finding the equilibrium by minimizing a free energy, is quite common and powerful. Coordinates like η0 are sometimes called state variables, and physical parameters like the force F are then called control variables. The compressed card’s bifurcation is an example of a cusp catastrophe; it is an analog of the catastrophes we met in geometrical optics in Sec. 6.5. Other examples of bifurcations include the failure of struts under excessive compressive loads, the instability (or whirling) of a drive shaft when it rotates too rapidly, and the development of triaxiality in self-gravitating fluid masses (i.e. stars) when their rotational kinetic energy becomes comparable with their gravitational energy.

Let us now return to the problem of mountain folding with which we began this section. Our playing-card model is obviously inadequate to describe the full phenomenon, as we have omitted gravitational forces and the restoring force associated with the earth’s underlying mantle. Gravity causes no difficulties of principle, as it just changes the equilibrium state. Coupling to the mantle can be modeled by treating it as an underlying viscoelastic medium. When it departs from equilibrium, the mantle changes not on a dynamical time (the time for a seismic wave to cross it, of order minutes), but instead on the time for the mantle rocks to flow, typically millions of years. Despite these limitations, our playing-card model can give a semiquantitative understanding of why plates of rock (E ∼ 100 GPa, ν ∼ 0.25) buckle when subjected to large, horizontal, compressive forces.
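The energy argument can be made concrete with a short numerical sketch (our illustration, not the authors’ calculation). It assumes the bending energy U = (D/2)∫(d²η/dx²)² dx, the leading-order end shortening δX ≈ ½∫(dη/dx)² dx, and Fcrit = π²D/ℓ² (the value consistent with Eqs. (10.100) and (10.101) of Ex. 10.19). The hypothetical function `card_free_energy` evaluates V(η0) = U − F δX for the trial shape η = η0 sin(πx/ℓ), in units D = ℓ = 1. To this quadratic order V = (π²η0²/4ℓ)(Fcrit − F), so the curvature of V at η0 = 0 changes sign exactly at Fcrit; the quartic terms that produce the two flanking minima of curve (b) require the nonlinear treatment of Ex. 10.19.

```python
import numpy as np


def card_free_energy(eta0, F, D=1.0, ell=1.0, n_grid=2001):
    """Hypothetical helper: V = U - F*dX for eta(x) = eta0*sin(pi*x/ell),
    keeping only terms quadratic in eta0 (assumed energy expressions)."""
    x = np.linspace(0.0, ell, n_grid)
    dx = x[1] - x[0]
    k = np.pi / ell
    slope = eta0 * k * np.cos(k * x)            # d(eta)/dx
    curvature = -eta0 * k**2 * np.sin(k * x)    # d^2(eta)/dx^2

    def trapezoid(y):
        # Plain trapezoid rule on the uniform grid.
        return float(np.sum(0.5 * (y[1:] + y[:-1])) * dx)

    U = 0.5 * D * trapezoid(curvature**2)   # bending energy (D/2) int (eta'')^2 dx
    dX = 0.5 * trapezoid(slope**2)          # end shortening, leading order in eta0
    return U - F * dX


D, ell, eta0 = 1.0, 1.0, 0.01
F_crit = np.pi**2 * D / ell**2
for ratio in (0.5, 1.0, 1.5):
    F = ratio * F_crit
    numeric = card_free_energy(eta0, F, D, ell)
    analytic = np.pi**2 * eta0**2 / (4.0 * ell) * (F_crit - F)
    print(f"F/F_crit = {ratio}: V = {numeric:.4e}  (analytic {analytic:.4e})")
```

Below Fcrit the straight card (η0 = 0) sits at a minimum of V; above it, V decreases away from η0 = 0, signalling the instability described in the text.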
****************************
EXERCISES

Exercise 10.19 Derivation and Example: Bend as a Function of Applied Force
Derive Eq. (10.97) relating the angle θo ≃ (dη/dx)_{x=0} to the applied force F when the card has an n = 1, arched shape. Hint: Use the elastica differential equation $d^2\theta/dx^2 = -(F/D)\sin\theta$ [Eq. (10.90d)] for the angle θ between the card and the applied force at distance x from the card’s end. The sin θ becomes θ in the linear approximation used in the text; the nonlinearities embodied in the sine give rise to the desired relation. The following steps toward a solution are mathematically the same as those used when computing the period of a pendulum as a function of its amplitude of swing.

(a) Derive the first integral of the elastica equation

$$\left(\frac{d\theta}{dx}\right)^2 = \frac{2F}{D}\,(\cos\theta - \cos\theta_o) , \qquad (10.98)$$

where θo is an integration constant. Show that the boundary condition of no bending torque at the card’s ends (zero curvature there, dθ/dx = 0) implies |θ| = θo at x = 0 and x = ℓ; whence, by symmetry, θ = 0 at the card’s center, x = ℓ/2.

(b) Integrate the differential equation (10.98) to obtain

$$\frac{\ell}{2} = \sqrt{\frac{D}{2F}}\int_0^{\theta_o}\frac{d\theta}{\sqrt{\cos\theta - \cos\theta_o}} . \qquad (10.99)$$

(c) Perform the change of variable sin(θ/2) = sin(θo/2) sin φ and thereby bring Eq. (10.99) into the form

$$\ell = 2\sqrt{\frac{D}{F}}\int_0^{\pi/2}\frac{d\phi}{\sqrt{1 - \sin^2(\theta_o/2)\,\sin^2\phi}} = 2\sqrt{\frac{D}{F}}\;K[\sin^2(\theta_o/2)] . \qquad (10.100)$$

Here K(y) is the complete elliptic integral of the first kind, in the parametrization used by Mathematica (argument y = k², the square of the elliptic modulus), which differs from that of many books.

(d) Expand Eq. (10.100) in powers of sin²(θo/2) to obtain

$$F = F_{\rm crit}\,\frac{4}{\pi^2}\,K^2[\sin^2(\theta_o/2)] = F_{\rm crit}\left[1 + \frac{1}{2}\sin^2(\theta_o/2) + \frac{11}{32}\sin^4(\theta_o/2) + \dots\right] , \qquad (10.101)$$

which is our desired result (10.97).

Exercise 10.20 Practice: The Height of Mountains
Estimate the maximum size of a mountain by requiring that the shear stress in the underlying rocks not exceed the elastic limit. Compare your answer with the height of the tallest mountains on Earth.

Exercise 10.21 Example: Neutron Star Crusts
The crust of a neutron star is made of iron (A = 56, Z = 26) at density ρ. It is supported against the pull of gravity by the pressure of a relativistic, degenerate electron gas (Sec. 2.5.4), and its iron ions are arranged in a body-centered cubic lattice that resists shearing. Estimate the ratio of shear modulus to bulk modulus, µ/K, and use this to estimate roughly the ratio of height to width of mountains on a neutron star. You might proceed as follows:

(a) Show that the electron Fermi energy is given by $E_F = (3\pi^2 n_e)^{1/3}\hbar c$, where $n_e = Z\rho/(A m_p)$ is the free-electron density. Hence show that the Fermi pressure is given by

$$p_F = \frac{1}{4}\, n_e E_F .$$

(b) Use the definition of the bulk modulus preceding Eq. (10.18) to express it in the form

$$K = \frac{1}{3}\, n_e E_F .$$

(c) Show that the iron ions’ body-centered cubic lattice produces a shear modulus of magnitude

$$\mu = C Z^2 e^2 \left(\frac{n_e}{Z}\right)^{4/3} ,$$

where C is a numerical constant of order unity. Hence show that the ratio of the shear modulus to the bulk modulus is

$$\frac{\mu}{K} = C Z^{2/3}\left(\frac{3}{\pi}\right)^{2/3}\frac{e^2}{\hbar c} .$$

(d) By an order-of-magnitude analysis of stress balance inside a mountain, estimate its ratio of height to width. Your answer should be very small compared to unity.
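A quick numerical check of parts (b) and (c) of Ex. 10.21, offered as a sketch rather than a solution. It verifies K = n_e E_F/3 by differentiating p_F = n_e E_F/4 numerically (K = ρ dp/dρ = n_e dp/dn_e, since ρ ∝ n_e), in units ħ = c = 1. It then evaluates µ/K for iron with the order-unity constant C set to 1; that choice, and all function names, are our assumptions.

```python
import math

ALPHA = 1.0 / 137.035999  # fine-structure constant e^2/(hbar c), Gaussian units


def fermi_pressure(n_e):
    """p_F = (1/4) n_e E_F with E_F = (3 pi^2 n_e)^(1/3), units hbar = c = 1."""
    E_F = (3.0 * math.pi**2 * n_e) ** (1.0 / 3.0)
    return 0.25 * n_e * E_F


def bulk_modulus_numeric(n_e, rel_step=1e-6):
    """K = n_e dp/dn_e, estimated by a central finite difference."""
    dn = rel_step * n_e
    return n_e * (fermi_pressure(n_e + dn) - fermi_pressure(n_e - dn)) / (2.0 * dn)


def shear_to_bulk_ratio(Z, C=1.0):
    """mu/K = C Z^(2/3) (3/pi)^(2/3) e^2/(hbar c), the result of part (c)."""
    return C * Z ** (2.0 / 3.0) * (3.0 / math.pi) ** (2.0 / 3.0) * ALPHA


n_e = 1.0  # arbitrary density in these units; the check is scale-free
E_F = (3.0 * math.pi**2 * n_e) ** (1.0 / 3.0)
print(bulk_modulus_numeric(n_e) / (n_e * E_F / 3.0))  # ratio close to 1
print(shear_to_bulk_ratio(26))                         # iron, C = 1
```

With C = 1 the ratio comes out at roughly 0.06, so the crust is far softer in shear than in compression, which is the starting point for part (d).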

****************************
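Returning to Ex. 10.19, the exact relation (10.101) and its expansion can be compared numerically. This sketch computes K(m), in the same parameter convention as Eq. (10.100), through the standard arithmetic-geometric-mean identity K(m) = π/[2 AGM(1, √(1 − m))], which we use in place of a library call, and cross-checks it against direct midpoint quadrature of the integral in Eq. (10.100). The function names are hypothetical.

```python
import math


def ellipk(m):
    """Complete elliptic integral of the first kind K(m), parameter m = k^2,
    via the arithmetic-geometric mean."""
    a, b = 1.0, math.sqrt(1.0 - m)
    for _ in range(50):
        if abs(a - b) <= 1e-15 * a:
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


def ellipk_quadrature(m, n=2000):
    """Integral of Eq. (10.100) by the midpoint rule over phi in [0, pi/2]."""
    h = 0.5 * math.pi / n
    return h * sum(1.0 / math.sqrt(1.0 - m * math.sin((j + 0.5) * h) ** 2)
                   for j in range(n))


def force_ratio(theta_o):
    """Exact F/F_crit = (4/pi^2) K^2[sin^2(theta_o/2)], Eq. (10.101)."""
    return 4.0 / math.pi**2 * ellipk(math.sin(0.5 * theta_o) ** 2) ** 2


def force_ratio_series(theta_o):
    """Truncated series 1 + m/2 + 11 m^2/32 with m = sin^2(theta_o/2)."""
    m = math.sin(0.5 * theta_o) ** 2
    return 1.0 + 0.5 * m + 11.0 / 32.0 * m**2


for deg in (10, 30, 60, 90):
    th = math.radians(deg)
    print(f"theta_o = {deg:2d} deg: exact {force_ratio(th):.6f}, "
          f"series {force_ratio_series(th):.6f}")
```

For small end angles the two agree closely; by θo = 90° the neglected higher-order terms, all positive, make the exact force noticeably larger than the two-term series.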

10.9

T2 Reducing the Elastostatic Equations to Two Dimensions for a Deformed Thin Plate: Stress Polishing a Telescope Mirror

The world’s largest optical telescopes, the two ten-meter Keck telescopes, are located on Mauna Kea in Hawaii. It is very difficult to support traditional, monolithic mirrors so that they maintain their figure as the telescope slews, because they are so heavy; so for Keck a new method of fabrication was sought. The solution devised by Jerry Nelson and his colleagues was to construct the telescope out of 36 separate hexagons, each 0.9 m on a side. However, this posed a second problem: grinding each hexagon’s reflecting surface to the required hyperboloidal shape. For this, a novel technique called stressed mirror polishing was developed. This technique relies on the fact that it is relatively easy to grind a surface to a spherical shape, but technically highly challenging to create a non-axisymmetric shape. So, during the grinding, stresses are applied around the boundary of the mirror to deform it, and a spherical surface is produced. The stresses are then removed and the mirror springs into the desired nonspherical shape. Computing the necessary stresses is a problem in classical elasticity theory and, in fact, is a good example of the many applications in which the elastic body can be approximated as a thin plate and its shape can be analyzed using elasticity equations that are reduced from three dimensions to two by the method of moments. For stress polishing of mirrors, the applied stresses are so large that we can ignore gravitational forces (at least in our simplified treatment). We suppose that the hexagonal mirror has a uniform thickness h and idealize it as a circle of radius R, and we introduce Cartesian coordinates with (x, y) in the horizontal plane (the plane of the mirror before deformation and polishing begin), and z vertical. The mirror is deformed as a result of a net vertical force per unit area (pressure) F(x, y). This force is applied at the lower surface when positive and at the upper surface when negative.
In addition there are shear forces and bending moments applied around the rim of the mirror. As in our analysis of a cantilever in Sec. 10.7, we assume the existence of a neutral surface in the deformed mirror, where the horizontal strain and stress vanish, $T_{ab} = 0$. (Here and below we use letters from the early part of the Latin alphabet for horizontal $x = x_1$, $y = x_2$ components.) We denote the vertical displacement of the neutral surface by η(x, y). By applying the method of moments to the three-dimensional equation of stress balance $T_{jk,k} = 0$ in a manner similar to our cantilever analysis, we obtain the following two-dimensional equation for the mirror’s shape:

$$\nabla^2(\nabla^2\eta) = F(x,y)/D . \qquad (10.102a)$$

[Fig. 10.16 schematic: glass block (mirror blank) with radial arm; lever arms r1 and r2; forces F1, F2, F3; scale bar 10 cm.]

Fig. 10.16: Schematic showing mirror blank, radial arm and lever assembly used to apply shear forces and bending torques to the rim of a mirror in stress polishing. (F1 need not equal F2, as there is a pressure F applied to the back surface of the mirror and forces applied at 23 other points around its rim.) The shear force is S = F2 − F1 and the bending torque is M = r2F2 − r1F1.

Here ∇² is the horizontal Laplacian, i.e. $\nabla^2\eta \equiv \eta_{,aa} = \eta_{,xx} + \eta_{,yy}$. Equation (10.102a) is the two-dimensional analog of the equation $d^4\eta/dx^4 = W/D$ for the shape of a cantilever [Eq. (10.87)], and the two-dimensional flexural rigidity that appears in it is

$$D = \frac{E h^3}{12(1-\nu^2)} , \qquad (10.102b)$$

where E is the mirror’s Young’s modulus, h is its thickness and ν is its Poisson ratio. The quantity ∇²∇² that operates on η in the shape equation (10.102a) is called the biharmonic operator; it also appears in 3-dimensional form in the biharmonic equation (10.29a) for the displacement inside a homogeneous, isotropic body to which surface stresses are applied. The shape equation (10.102a) must be solved subject to boundary conditions around the mirror’s rim: the applied shear forces and bending torques.

The individual Keck mirror segments were constructed out of a ceramic material with Young’s modulus E = 89 GPa and Poisson’s ratio ν = 0.24 (cf. Table 10.1). A mechanical jig was constructed to apply the shear forces and bending torques at 24 uniformly spaced points around the rim of the mirror (Fig. 10.16). The maximum stress was applied for the six outermost mirrors and was 2.4 × 10⁶ N m⁻², 12 per cent of the breaking tensile strength (2 × 10⁷ N m⁻²). This stress polishing worked beautifully, and the Keck telescopes have become highly successful tools for astronomical research.

****************************
EXERCISES

Exercise 10.22 *** Derivation and Example: Dimensionally Reduced Shape Equation for a Stressed Plate
Use the method of moments (Sec. 10.7) to derive the two-dimensional shape equation (10.102a) for the stress-induced deformation of a thin plate, and expression (10.102b) for the 2-dimensional flexural rigidity. Here is a step-by-step guide, in case you want or need it:

(a) First show, on geometrical grounds, that the in-plane strain is related to the vertical displacement by [cf. Eq. (10.77)]

$$\xi_{a,b} = -z\,\eta_{,ab} . \qquad (10.103a)$$

(b) Next derive an expression for the horizontal components of the stress, $T_{ab}$, in terms of double derivatives of the displacement function η(x, y) [analog of $T_{xx} = -Ez\,d^2\eta/dx^2$, Eq. (10.78), for a stressed rod]. This can be done (i) by arguing on physical grounds that the vertical component of stress, $T_{zz}$, is much smaller than the horizontal components and therefore can be approximated as zero [an approximation to be checked in part (f) below], (ii) by expressing $T_{zz} = 0$ in terms of the strain and thence displacement and using Eqs. (10.34) to arrive at

$$\Theta = -\frac{1-2\nu}{1-\nu}\, z\,\nabla^2\eta , \qquad (10.103b)$$

where ∇² is the horizontal Laplacian, (iii) by then writing $T_{ab}$ in terms of Θ and $\xi_{a,b}$ and combining with Eqs. (10.103a) and (10.103b) to get the desired equation:

$$T_{ab} = Ez\left[\frac{\nu}{1-\nu^2}\,\nabla^2\eta\;\delta_{ab} + \frac{\eta_{,ab}}{1+\nu}\right] . \qquad (10.103c)$$

(c) With the aid of this equation, write the horizontal force density in the form

$$f_a = -T_{ab,b} - T_{az,z} = -\frac{Ez}{1-\nu^2}\,\nabla^2\eta_{,a} - T_{az,z} = 0 . \qquad (10.103d)$$

Then, as in the cantilever analysis [Eq. (10.80)], reduce the dimensionality of this force equation by the method of moments. The zeroth moment (integral over z) vanishes; why? Therefore, the lowest nonvanishing moment is the first (multiply by z and integrate). Show that this gives

$$S_a \equiv \int T_{za}\,dz = D\,\nabla^2\eta_{,a} , \qquad (10.103e)$$

where D is the 2-dimensional flexural rigidity (10.102b). The quantity $S_a$ is the vertical shear force per unit length acting perpendicular to a line in the mirror whose normal is in the direction a; it is the 2-dimensional analog of a stressed rod’s shear force S [Eq. (10.81)].

(d) For physical insight into Eq. (10.103e), define the bending torque per unit length (bending torque density)

$$M_{ab} \equiv \int z\,T_{ab}\,dz , \qquad (10.103f)$$

and show with the aid of Eq. (10.103c) that (10.103e) is the law of torque balance $S_a = M_{ab,b}$, the 2-dimensional analog of a stressed rod’s S = dM/dx [Eq. (10.84)].

(e) Compute the total vertical shearing force acting on a small area of the plate as the line integral of $S_a$ around its boundary, and by applying Gauss’s theorem, deduce that the vertical shear force per unit area is $S_{a,a}$. Argue that this must be balanced by the net force density F applied to the face of the plate, and thereby deduce the law of vertical force balance,

$$S_{a,a} = F . \qquad (10.103g)$$

By combining with the law of torque balance (10.103e), obtain the plate’s bending equation ∇²(∇²η) = F/D, Eq. (10.102a), the final result we were seeking.

(f) Use this bending equation to verify the approximation made in part (b), that $T_{zz}$ is small compared to the horizontal stresses; specifically, show that $T_{zz} \simeq F$ is $O[(h/R)^2]\,T_{ab}$, where h is the plate thickness and R is the plate radius.

Exercise 10.23 Example: Paraboloidal Mirror
Show how to construct a paraboloidal mirror of radius R and focal length f by stressed polishing.

(a) By comparing the shape of a paraboloid to that of a sphere of similar curvature at the origin, show that the required vertical displacement of the stressed mirror is

$$\eta(r) = \frac{r^4}{64 f^3} ,$$

where r is the radial coordinate and we retain only terms of leading order.

(b) Hence use Eq. (10.102a) to show that a uniform force per unit area

$$F = \frac{D}{f^3} ,$$

where D is the flexural rigidity, must be applied to the bottom of the mirror. (Ignore the weight of the mirror.)

(c) Hence show that if there are N equally spaced levers attached at the rim, the vertical force applied at each of them is

$$S_{zr} = \frac{\pi D R^2}{N f^3}$$

and the associated bending torque is

$$M = \frac{\pi D R^3}{2 N f^3} .$$

(d) Show that the radial displacement is

$$\xi_r = -\frac{r^3 z}{16 f^3} ,$$

where z is the vertical distance from the neutral surface, halfway through the mirror.

(e) Hence evaluate the expansion Θ and the components of the shear tensor Σ, and show that the maximum stress in the mirror is

$$T_{\max} = \frac{(3-2\nu)\,R^2 h E}{32(1-2\nu)(1+\nu) f^3} ,$$

where h is the mirror thickness. Comment on the limitations of this technique for making a thick, “fast” (i.e. 2R/f large) mirror.
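Parts (a) to (c) of Ex. 10.23 can be checked with a short sketch (ours, not the authors’). It compares a sphere of radius of curvature 2f with a paraboloid of focal length f, applies the radial 2-dimensional Laplacian to η = r⁴/(64f³) exactly using the monomial rule ∇²rⁿ = n²rⁿ⁻², and evaluates F, S_zr and M. Only E = 89 GPa, ν = 0.24 and the 24 lever points come from the text; the thickness h, radius R and focal length f below are hypothetical placeholders.

```python
import math
from fractions import Fraction


def sag_sphere(r, f):
    """Sag of a sphere with radius of curvature 2f, measured from the vertex."""
    Rc = 2.0 * f
    return Rc - math.sqrt(Rc**2 - r**2)


def sag_paraboloid(r, f):
    """Sag of a paraboloid with focal length f."""
    return r**2 / (4.0 * f)


def radial_laplacian(poly):
    """Apply (1/r) d/dr (r d/dr) to a polynomial {power: coefficient},
    using (1/r) d/dr (r d/dr) r^n = n^2 r^(n-2)."""
    return {n - 2: c * n * n for n, c in poly.items() if n != 0}


# Exact check that nabla^4 [r^4/(64 f^3)] = 1/f^3 (here f = 1).
f = Fraction(1)
eta = {4: Fraction(1, 64) / f**3}
biharmonic = radial_laplacian(radial_laplacian(eta))
print(biharmonic)  # constant term 1/f^3, hence F = D/f^3

# Flexural rigidity (10.102b) and the lever loads of part (c).
E, nu = 89e9, 0.24                 # Keck ceramic values quoted in the text
h, R, f_m, N = 0.075, 0.9, 17.5, 24  # h, R, f hypothetical; N = 24 from the text
D = E * h**3 / (12.0 * (1.0 - nu**2))
F = D / f_m**3                                 # uniform back pressure, part (b)
S = math.pi * D * R**2 / (N * f_m**3)          # vertical force per lever
M = math.pi * D * R**3 / (2.0 * N * f_m**3)    # bending torque per lever
print(D, F, S, M)
```

Two consistency checks follow directly: the N lever forces together balance the total back pressure, N S_zr = πR²F, and each torque is M = S_zr R/2.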

Box 10.4 Important Concepts in Chapter 10
• Foundational Concepts
  – Displacement vector field ξ, Sec. 10.2.1
  – Strain tensor S = ∇ξ, Sec. 10.2.1
  – Irreducible tensorial parts of strain: expansion Θ, rotation R_ij and shear Σ_ij, Sec. 10.2.2
  – Bulk and shear moduli K, µ; elastic stress tensor T = −KΘg − 2µΣ, Sec. 10.3.2
  – Molecular origin of moduli and orders of magnitude, Sec. 10.3.4
  – Elastic force on a unit volume, f = −∇·T = (K + µ/3)∇(∇·ξ) + µ∇²ξ, Sec. 10.3.2
  – Elastic force balance, Sec. 10.3.2
  – Elastic energy (energy of deformation), Sec. 10.3.3
  – Connection coefficients and their use in cylindrical and spherical coordinate systems, Sec. 10.5
• Elastostatic Equilibrium
  – Differential equation for displacement, f = 0 or f + ρg = 0, Sec. 10.3.2
  – Boundary condition: T_ij n_j continuous, Sec. 10.3.2
  – Methods of solving for displacement in full 3 dimensions: separation of variables, Green’s functions, Sec. 10.6.2 and Exs. 10.14, 10.15
  – Dimensional reduction via method of moments, and application to rods, beams and fibers, and to plates: Secs. 10.7, 10.9
  – Bifurcation of equilibria: Sec. 10.8

Bibliographic Note Elasticity Theory was developed in the 19th and early 20th centuries. The classic textbook from that era is Love (1927), which is available as a Dover reprint. An outstanding, somewhat more modern text is Landau and Lifshitz (1986) — originally written in the 1950s and revised in a third edition in 1986, shortly before Lifshitz’s death. This is, perhaps, the most readable of all the textbooks that Landau and Lifshitz wrote, and is still widely used by physicists in the early 21st century. Other good texts include Southwell (1941), and Timoshenko and Goodier (1970). For a sophisticated treatment of methods of solving the elastostatic

equations for a body on which external forces act, see Gladwell (1980). For Green’s function solutions, see Johnson (1984).

Bibliography

Braginsky, V. B., Gorodetsky, M. L., and Vyatchanin, S. P. 1999. “Thermodynamical fluctuations and photo-thermal shot noise in gravitational wave antennae,” Physics Letters A, 264, 1.
Braginsky, V. B., Polnarev, A. G., and Thorne, K. S. 1984. “Foucault pendulum at the south pole: proposal for an experiment to detect the Earth’s general relativistic gravitomagnetic field,” Physical Review Letters, 53, 863.
Braginsky, V., Mitrofanov, M., and Panov, V. 1984. Systems with Small Dissipation, Chicago: University of Chicago Press.
Chandrasekhar, S. 1962. Ellipsoidal Figures of Equilibrium, New Haven: Yale University Press.
Gladwell, G. M. L. 1980. Contact Problems in the Classical Theory of Elasticity, Alphen aan den Rijn: Sijthoff and Noordhoff.
Jackson, J. D. 1999. Classical Electrodynamics, New York: Wiley.
Johnson, K. L. 1984. Contact Mechanics, Cambridge: Cambridge University Press.
Landau, L. D., and Lifshitz, E. M. 1986. Elasticity, Third Edition, Oxford: Pergamon.
Levin, Yu. 1998. “Internal thermal noise in the LIGO test masses: a direct approach,” Physical Review D, 57, 659–663.
Liu, Y. T., and Thorne, K. S. 2000. “Thermoelastic noise and thermal noise in finite-sized gravitational-wave test masses,” Physical Review D, in preparation.
Love, A. E. H. 1927. A Treatise on the Mathematical Theory of Elasticity, Cambridge: Cambridge University Press; reprinted New York: Dover Publications (1944).
Mathews, J., and Walker, R. L. 1965. Mathematical Methods of Physics, New York: Benjamin.
Messiah, A. 1962. Quantum Mechanics, Vol. II, Amsterdam: North-Holland.
Misner, C. W., Thorne, K. S., and Wheeler, J. A. 1973. Gravitation, San Francisco: W. H. Freeman.
Southwell, R. V. 1941. An Introduction to the Theory of Elasticity for Engineers and Physicists, Second Edition, Oxford: Clarendon Press; reprinted New York: Dover Publications.
Thorne, K. S. 1980. “Multipole Moments in General Relativity,” Reviews of Modern Physics, 52, 299.
Turcotte, D. L., and Schubert, G. 1982. Geodynamics, New York: Wiley.
Timoshenko, S., and Goodier, J. N. 1970. Theory of Elasticity, Third Edition, New York: McGraw-Hill.
Will, C. M. 1993. Theory and Experiment in Gravitational Physics, Revised Edition, Cambridge: Cambridge University Press.
Yeganeh-Haeri, A., Weidner, D. J., and Parise, J. B. 1992. Science, 257, 650.

Contents
11 Elastodynamics
  11.1 Overview
  11.2 Basic Equations of Elastodynamics; Waves in a Homogeneous Medium
    11.2.1 Equation of Motion for a Strained Elastic Medium
    11.2.2 Elastodynamic Waves
    11.2.3 Longitudinal Sound Waves
    11.2.4 Transverse Shear Waves
    11.2.5 Energy of Elastodynamic Waves
  11.3 Waves in Rods, Strings and Beams
    11.3.1 Compression waves
    11.3.2 Torsion waves
    11.3.3 Waves on Strings
    11.3.4 Flexural Waves on a Beam
    11.3.5 Bifurcation of Equilibria and Buckling (once more)
  11.4 Body Waves and Surface Waves — Seismology
    11.4.1 Body Waves
    11.4.2 Edge Waves
    11.4.3 Green’s Function for a Homogeneous Half Space
    11.4.4 Free Oscillations of Solid Bodies
    11.4.5 Seismic tomography
  11.5 The Relationship of Classical Waves to Quantum Mechanical Excitations

Chapter 11
Elastodynamics
Version 0811.1.K, 14 January 2009
Please send comments, suggestions, and errata via email to [email protected], or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 11.1 Reader’s Guide
• This chapter is a companion to Chap. 10 (Elastostatics) and relies heavily on it.
• This chapter also relies rather heavily on geometric-optics concepts and formalism, as developed in Secs. 6.2 and 6.3, especially: phase velocity, group velocity, dispersion relation, rays and the propagation of waves, information and energy along them, the role of the dispersion relation as a Hamiltonian for the rays, and ray tracing.
• The discussion of continuum-mechanics wave equations in Box 11.2 underlies this book’s treatment of waves in fluids (Part IV), especially in plasmas (Part V), and in general relativity (Part VI).
• The experience that the reader gains in this chapter with waves in solids will be useful when we encounter much more complicated waves in plasmas in Part V.
• No other portions of this chapter are of great importance for subsequent Parts of this book.

11.1

Overview

In the previous chapter we considered elastostatic equilibria in which the forces acting on elements of an elastic solid were balanced so that the solid remained at rest. When this equilibrium is disturbed, the solid will undergo accelerations. This is the subject of this chapter: elastodynamics. In Sec. 11.2, we derive the equations of motion for elastic media, paying particular attention to the underlying conservation laws and focusing especially on elastodynamic waves. We show that there are two distinct wave modes that propagate in a uniform, isotropic solid, longitudinal waves and shear waves, and both are nondispersive (their phase speeds are independent of frequency). A major use of elastodynamics is in structural engineering, where one encounters vibrations (usually standing waves) on the beams that support buildings and bridges. In Sec. 11.3 we discuss the types of waves that propagate on bars, rods and beams and find that the boundary conditions at the free transverse surfaces make the waves dispersive. We also return briefly to the problem of bifurcation of equilibria (treated in Sec. 10.8) and show how, by changing the parameters controlling an equilibrium, a linear wave can be made to grow exponentially in time, thereby rendering the equilibrium unstable. A second application of elastodynamics is to seismology (Sec. 11.4). The earth is mostly a solid body through which waves can propagate. The waves can be excited naturally by earthquakes or artificially using man-made explosions. Understanding how waves propagate through the earth is important for locating the sources of earthquakes, for diagnosing the nature of an explosion (was it an illicit nuclear bomb test?) and for analyzing the structure of the earth. We briefly describe some of the wave modes that propagate through the earth and some of the inferences about the earth’s structure that have been drawn from studying their propagation.
In the process, we gain some experience in applying the tools of geometric optics to new types of waves, and we learn how rich the Green’s function for elastodynamic waves can be, even when the medium is as simple as a homogeneous half space. Finally (Sec. 11.5), we return to fundamental physics to consider the quantum theory of elastodynamic waves. We compare the classical theory with the quantum theory, specializing to quantised vibrations in an elastic solid: phonons.

11.2

Basic Equations of Elastodynamics; Waves in a Homogeneous Medium

In subsection 11.2.1 of this section, we shall derive a vectorial equation that governs the dynamical displacement ξ(x, t) of a dynamically disturbed elastic medium. We shall then specialize to monochromatic plane waves in a homogeneous medium (Subsec. 11.2.2) and shall show how the monochromatic plane-wave equation can be converted into two wave equations, one for “longitudinal” waves (Subsec. 11.2.3) and the other for “transverse” waves (Subsec. 11.2.4). From those two wave equations we shall deduce the waves’ dispersion relations, which act as Hamiltonians for geometric-optics wave propagation through inhomogeneous media. Our method of analysis is a special case of a very general approach to deriving wave equations in continuum mechanics. That general approach is sketched in Box 11.2. We shall follow that approach not only here, for elastic waves, but also in Part IV for waves in fluids, Part V for waves in plasmas and Part VI for general relativistic gravitational waves. We shall conclude this section in Subsec. 11.2.5 with a discussion of the energy density and energy flux of these waves, and in Ex. 11.4 we shall explore the relationship of this energy density and flux to a Lagrangian for elastodynamic waves.

11.2.1

Equation of Motion for a Strained Elastic Medium

In Chap. 10, we learned that, when an elastic medium undergoes a displacement ξ(x), it builds up a strain S = ∇ξ, which in turn produces an internal stress T = −KΘg − 2µΣ, where Θ ≡ ∇·ξ is the expansion and Σ ≡ (the symmetric, trace-free part of S) is the shear; see Eqs. (10.5) and (10.18). The stress T produces an elastic force per unit volume

$$f = -\nabla\cdot T = \left(K + \tfrac{1}{3}\mu\right)\nabla(\nabla\cdot\xi) + \mu\nabla^2\xi \qquad (11.1)$$

[Eq. (10.19)], where K and µ are the bulk and shear moduli. In Chap. 10, we restricted ourselves to elastic media that are in elastostatic equilibrium, so they are static. This equilibrium required that the net force per unit volume acting on the medium vanish. If the only force is elastic, then f must vanish. If the pull of gravity is also significant, then f + ρg vanishes, where ρ is the medium’s mass density and g the acceleration of gravity. In this chapter we shall focus on dynamical situations, in which an unbalanced force per unit volume causes the medium to move, with the motion, in this chapter, taking the form of an elastodynamic wave. For simplicity, we shall assume that the only significant force is elastic; i.e., that the gravitational force is negligible by comparison. In Ex. 11.2 we shall show that this is the case for elastodynamic waves in most media on Earth whenever the wave frequency ω/2π is higher than about 0.001 Hz (which is usually the case in practice). Stated more precisely, in a homogeneous medium we can ignore the gravitational force whenever the elastodynamic wave’s angular frequency ω is much larger than g/c, where g is the acceleration of gravity and c is the wave’s propagation speed. Consider, then, a dynamical, strained medium with elastic force per unit volume (11.1) and no other significant force (negligible gravity), and with velocity

$$v = \frac{\partial\xi}{\partial t} . \qquad (11.2a)$$

The law of momentum conservation states that the force per unit volume f, if nonzero, must produce a rate of change of momentum per unit volume ρv according to the equation¹

$$\frac{\partial(\rho v)}{\partial t} = f = -\nabla\cdot T = \left(K + \tfrac{1}{3}\mu\right)\nabla(\nabla\cdot\xi) + \mu\nabla^2\xi . \qquad (11.2b)$$

¹ In Sec. 12.5 of the next chapter we shall learn that the motion of the medium produces a stress ρv ⊗ v that must be included in this equation if the velocities are large. However, this subtle dynamical stress is always negligible in elastodynamic waves because the displacements and hence velocities v are tiny and ρv ⊗ v is second order in the displacement. For this reason we shall delay studying this subtle nonlinear effect until Chap. 12.

Notice that when rewritten in the form

$$\frac{\partial(\rho v)}{\partial t} + \nabla\cdot T = 0 ,$$

this is the version of the law of momentum conservation discussed in Chap. 1 [Eq. (1.51)], and it has the standard form for a conservation law (time derivative of density of something, plus divergence of flux of that something, vanishes; Sec. 1.11.3); ρv is the density of momentum, and the stress tensor T is by definition the flux of momentum. Equations (11.2a) and (11.2b), together with the law of mass conservation [the obvious analog of Eqs. (1.72) for conservation of charge and particle number],

$$\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho v) = 0 , \qquad (11.2c)$$

are a complete set of equations for the evolution of the displacement ξ(x, t), the velocity v(x, t) and the density ρ(x, t). The elastodynamic equations (11.2) are nonlinear because of the ρv terms (see below). From them we shall derive a linear wave equation for the displacement vector ξ(x, t). Our derivation provides us with a simple (almost trivial) example of the general procedure discussed in Box 11.2. To derive a linear wave equation, we must find some small parameter in which to expand. The obvious choice in elastodynamics is the strain S = ∇ξ and its components, which are all dimensionless and must be less than about 10⁻³ to remain within the non-yielding, nonbreaking, linear elastic regime (Sec. 10.2.1). Equally well, we can regard the displacement ξ itself as our small parameter. If the medium’s equilibrium state were homogeneous, the linearization would be trivial. However, we wish to be able to treat perturbations of inhomogeneous equilibria such as seismic waves in the Earth, or perturbations of slowly changing equilibria such as vibrations of a pipe or mirror that is gradually changing temperature. In almost all situations the lengthscale L and timescale T on which the medium’s equilibrium properties (ρ, K, µ) vary are extremely large compared to the lengthscale and timescale of the dynamical perturbations (their reduced wavelength ƛ = wavelength/2π and 1/ω = period/2π). This permits us to perform a two-lengthscale expansion (like the one that underlies geometric optics, Sec. 6.3) alongside our small-strain expansion. In analyzing a dynamical perturbation of an equilibrium state, we use ξ(x, t) to denote the dynamical displacement (i.e., we omit from it the equilibrium’s static displacement, and similarly we omit from ∇ξ the equilibrium strain). We write the density as ρ + δρ, where ρ(x) is the equilibrium density distribution and δρ(x, t) is the dynamical density perturbation, which is first-order in the dynamical displacement ξ.
Inserting these into the equation of mass conservation (11.2c), we obtain ∂δρ/∂t + ∇·[(ρ + δρ)v] = 0. Because v = ∂ξ/∂t is first order, the term (δρ)v is second order and can be dropped, resulting in the linearized equation ∂δρ/∂t + ∇·(ρv) = 0. Because ρ varies on a much longer lengthscale than v (L vs. ƛ), we can pull ρ out of the derivative; setting v = ∂ξ/∂t and interchanging the time derivative and divergence, we then obtain ∂δρ/∂t + ρ∂(∇·ξ)/∂t = 0. Noting that ρ varies

Box 11.2 Wave Equations in Continuum Mechanics
In this box, we make an investment for future chapters by considering wave equations in some generality. Most wave equations arise as approximations to the full set of equations that govern a dynamical physical system. It is usually possible to arrange those full equations as a set of first order partial differential equations that describe the dynamical evolution of a set of n physical quantities, V_A, with A = 1, 2, ..., n: i.e.

$$\frac{\partial V_A}{\partial t} + F_A(V_B) = 0 . \qquad (1)$$

[For elastodynamics there are n = 7 quantities V_A: {ρ, ρv_x, ρv_y, ρv_z, ξ_x, ξ_y, ξ_z} (in Cartesian coordinates); and the seven equations (1) are mass conservation, momentum conservation, and ∂ξ_j/∂t = v_j; Eqs. (11.2).] Now, most dynamical systems are intrinsically nonlinear (Maxwell’s equations in vacuo being a conspicuous exception) and it is usually quite hard to find nonlinear solutions. However, it is generally possible to make a perturbation expansion in some small physical quantity about a time-independent equilibrium and retain just the terms that are linear in this quantity. We then have a set of n linear partial differential equations that are much easier to solve than the nonlinear ones, and that usually turn out to have the character of wave equations (i.e., to be “hyperbolic”). Of course the solutions will only be a good approximation for small-amplitude waves. [In elastodynamics, we justify linearization by requiring that the strains be below the elastic limit, we linearize in the strain or displacement of the dynamical perturbation, and the resulting linear wave equation is $\rho\,\partial^2\xi/\partial t^2 = (K + \tfrac{1}{3}\mu)\nabla(\nabla\cdot\xi) + \mu\nabla^2\xi$; Eq. (11.4b).]

Boundary Conditions
In some problems, e.g. determining the normal modes of vibration of a building during an earthquake, or analyzing the sound from a violin or the vibrations of a finite-length rod, the boundary conditions are intricate and have to be incorporated as well as possible to have any hope of modeling the problem. The situation is rather similar to that familiar from elementary quantum mechanics. The waves are often localised within some region of space, like bound states, in such a way that the eigenfrequencies are discrete, for example, the standing-wave modes of a plucked string. In other problems the volume in which the wave propagates is essentially infinite, as happens with unbound states (e.g. waves on the surface of the ocean or seismic waves propagating through the earth).
Then the only boundary condition is essentially that the wave amplitude remain finite at large distances. In this case, the wave spectrum is usually continuous.


Geometric Optics Limit and Dispersion Relations

The solutions to the wave equation will reflect the properties of the medium through which the wave is propagating, as well as its boundaries. If the medium and boundaries have a finite number of discontinuities but are otherwise smoothly varying, there is a simple limiting case: waves of short enough wavelength and high enough frequency that they can be analyzed in the geometric-optics approximation (Chap. 6). The key to geometric optics is the dispersion relation, which (as we learned in Sec. 6.3) acts as a Hamiltonian for the propagation. Recall from Chap. 6 that, although the medium may actually be inhomogeneous and might even be changing with time, when deriving the dispersion relation we can approximate it as precisely homogeneous and time-independent, and can resolve the waves into plane-wave modes, i.e. modes in which the perturbations vary $\propto \exp[i(\mathbf k\cdot\mathbf x - \omega t)]$. Here $\mathbf k$ is the wave vector and $\omega$ is the angular frequency. This allows us to remove all the temporal and spatial derivatives and converts our set of partial differential equations into a set of homogeneous, linear algebraic equations. When we do this, we say that our normal modes are local. If, instead, we were to go to the trouble of solving the partial differential wave equation with its attendant boundary conditions, the modes would be referred to as global. The linear algebraic equations for a local problem can be written in the form $M_{AB}V_B = 0$, where $V_A$ is the vector of $n$ dependent variables and the elements $M_{AB}$ of the $n\times n$ matrix $\|M_{AB}\|$ depend on $\mathbf k$ and $\omega$ as well as on parameters $p_\alpha$ that describe the local conditions of the medium. This set of equations can be solved in the usual manner by requiring that the determinant of $\|M_{AB}\|$ vanish. Carrying through this procedure yields a polynomial, usually of $n$th order, for $\omega(\mathbf k, p_\alpha)$. This polynomial is the dispersion relation.
It can be solved (analytically in simple cases and numerically in general) to yield a number of complex solutions for $\omega$, with $\mathbf k$ regarded as real. (Of course, we might just as well treat the wave vector as complex, but for the moment we will regard it as real.) Armed with these solutions, we can solve for the associated eigenvectors. The eigenfrequencies fully characterize the solution of the local problem, and can be used to solve for the waves' temporal evolution from some given initial conditions in the usual manner. (As we shall see several times, especially when we discuss Landau damping in Chap. 21, some subtleties can arise.) What does a complex value of the angular frequency $\omega$ mean? We have posited that all small quantities vary $\propto \exp[i(\mathbf k\cdot\mathbf x - \omega t)]$. If $\omega$ has a positive imaginary part, then the small perturbation quantities will grow exponentially with time. Conversely, if it has a negative imaginary part, they will decay. Now, when the dispersion relation is a polynomial with real coefficients, its complex roots occur in complex-conjugate pairs. Therefore, if there is a decaying mode there must also be a growing mode. Growing modes correspond to instability, a topic that we shall encounter often.
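A minimal Python sketch of the growth/decay argument, assuming a hypothetical quadratic dispersion relation $\omega^2 + \gamma^2 = 0$ with real coefficients (chosen purely for illustration, not taken from any particular system). It finds both roots, confirms that they form a complex-conjugate pair, and evaluates the amplitude factor $|e^{-i\omega t}| = e^{{\rm Im}(\omega)\,t}$ for each.

```python
import cmath


def quadratic_roots(a, b, c):
    """Both roots of a*w**2 + b*w + c = 0 (real a != 0, b, c), using complex arithmetic."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


def amplitude_factor(omega, t):
    """|exp(-i omega t)| = exp(Im(omega) t): > 1 means growth, < 1 means decay (for t > 0)."""
    return abs(cmath.exp(-1j * omega * t))


gamma = 2.0  # hypothetical parameter in the assumed relation omega**2 + gamma**2 = 0
w_plus, w_minus = quadratic_roots(1.0, 0.0, gamma**2)
for w in (w_plus, w_minus):
    kind = "growing" if w.imag > 0 else "decaying"
    print(f"omega = {w}: {kind}, amplitude factor at t = 1 is {amplitude_factor(w, 1.0):.4f}")
```

The conjugate pairing relies on the coefficients being real; the sketch simply makes that pairing, and the resulting growing partner of every decaying mode, explicit.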

on a timescale $T$ long compared to the timescale $1/\omega$ of $\boldsymbol\xi$ and $\delta\rho$, we can integrate this to obtain the linear relation

$$\frac{\delta\rho}{\rho} = -\nabla\cdot\boldsymbol\xi . \qquad (11.3)$$

This linearized equation for the fractional perturbation of density could equally well have been derived by considering a small volume $V$ of the medium that contains mass $M = \rho V$, and by noting that the dynamical perturbations lead to a volume change $\delta V/V = \Theta = \nabla\cdot\boldsymbol\xi$ [Eq. (10.8)], so conservation of mass requires $0 = \delta M = \delta(\rho V) = V\delta\rho + \rho\,\delta V = V\delta\rho + \rho V\nabla\cdot\boldsymbol\xi$, which implies $\delta\rho/\rho = -\nabla\cdot\boldsymbol\xi$. This is the same as Eq. (11.3). The equation of momentum conservation (11.2b) can be handled similarly. By linearizing and pulling the slowly varying density out from under the time derivative, we convert $\partial(\rho\mathbf v)/\partial t$ into $\rho\,\partial\mathbf v/\partial t = \rho\,\partial^2\boldsymbol\xi/\partial t^2$. Inserting this into Eq. (11.2b), we obtain the linear wave equation

$$\rho\frac{\partial^2\boldsymbol\xi}{\partial t^2} = -\nabla\cdot\mathbf T_{\rm el} , \qquad (11.4a)$$

i.e.,

$$\rho\frac{\partial^2\boldsymbol\xi}{\partial t^2} = \left(K + \tfrac13\mu\right)\nabla(\nabla\cdot\boldsymbol\xi) + \mu\nabla^2\boldsymbol\xi . \qquad (11.4b)$$

In this equation, terms involving a derivative of $K$ or $\mu$ have been omitted because the two-lengthscale assumption $L \gg \bar\lambda$ makes them negligible compared to the terms we have kept. Equation (11.4b) is the first of many wave equations we shall encounter in elastodynamics, fluid mechanics, and plasma physics.
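A quick numerical check of the linearization behind Eq. (11.3), in a one-dimensional setting chosen for simplicity (an assumption; the text's argument is fully three-dimensional). A slab of length $dx$ stretched by a uniform strain $\partial\xi/\partial x$ has exact fractional density change $1/(1 + \partial\xi/\partial x) - 1$, which should agree with $-\partial\xi/\partial x$ up to terms quadratic in the strain.

```python
def exact_fractional_density_change(strain):
    """1-D mass conservation: length dx -> (1 + strain) dx, so rho_new/rho = 1/(1 + strain)."""
    return 1.0 / (1.0 + strain) - 1.0


def linear_fractional_density_change(strain):
    """Linearized Eq. (11.3) in 1-D: delta rho / rho = -div xi = -d xi/dx."""
    return -strain


for strain in (1e-2, 1e-4, 1e-6):
    exact = exact_fractional_density_change(strain)
    linear = linear_fractional_density_change(strain)
    # The error of the linear relation is strain**2 / (1 + strain), i.e. second order.
    print(f"strain={strain:.0e}  exact={exact:.8e}  linear={linear:.8e}  error={exact - linear:.2e}")
```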

11.2.2

Elastodynamic Waves

Continuing to follow our general procedure for deriving and analyzing wave equations as outlined in Box 11.2, we next derive dispersion relations for two types of waves (longitudinal and transverse) that are jointly incorporated into the general elastodynamic wave equation (11.4b). Recall from Chap. 6 that, although a dispersion relation can be used as a Hamiltonian for computing wave propagation through an inhomogeneous medium, one can derive the dispersion relation most easily by specializing to monochromatic plane waves propagating through a medium that is precisely homogeneous. Therefore, we seek a plane-wave solution, i.e. a solution of the form

$$\boldsymbol\xi(\mathbf x, t) \propto e^{i(\mathbf k\cdot\mathbf x - \omega t)} , \qquad (11.5)$$

to the wave equation (11.4b) with $\rho$, $K$ and $\mu$ regarded as homogeneous (constant). (To deal with more complicated perturbations of a homogeneous medium, we can think of this wave as being an individual Fourier component and linearly superpose many such waves as a Fourier integral.) Since our wave is planar and monochromatic, we can remove the derivatives in Eq. (11.4b) by making the substitutions $\nabla \to i\mathbf k$ and $\partial/\partial t \to -i\omega$ (the first of which implies $\nabla^2 \to -k^2$, $\nabla\cdot \to i\mathbf k\cdot$, $\nabla\times \to i\mathbf k\times$). We thereby reduce the partial differential equation (11.4b) to a vectorial algebraic equation:

$$\rho\omega^2\boldsymbol\xi = \left(K + \tfrac13\mu\right)\mathbf k(\mathbf k\cdot\boldsymbol\xi) + \mu k^2\boldsymbol\xi . \qquad (11.6)$$

(This reduction is only possible because the medium is uniform, or, in the geometric-optics limit, nearly uniform; otherwise, we must solve the second-order partial differential equation (11.4b) using standard techniques.) How do we solve this equation? The sure way is to write it as a $3\times3$ matrix equation $M_{ij}\xi_j = 0$ for the vector $\boldsymbol\xi$ and set the determinant of $M_{ij}$ to zero (Box 11.2 and Ex. 11.3). This is not hard for small or sparse matrices. However, some wave equations are more complicated, and it often pays to think about the waves in a geometric, coordinate-independent way before resorting to brute force. The quantity that oscillates in the elastodynamic waves (11.6) is the vector field $\boldsymbol\xi$. The nature of its oscillations is influenced by the scalar constants $\rho$, $\mu$, $K$, $\omega$ and by just one quantity that has directionality: the constant vector $\mathbf k$. It seems reasonable to expect the description (11.6) of the oscillations to simplify, then, if we resolve the oscillations into a "longitudinal" component (or "mode") along $\mathbf k$ and a "transverse" component (or "mode") perpendicular to $\mathbf k$, as shown in Fig. 11.1:

$$\boldsymbol\xi = \boldsymbol\xi_L + \boldsymbol\xi_T , \qquad \boldsymbol\xi_L = \xi_L\hat{\mathbf k} , \qquad \boldsymbol\xi_T\cdot\hat{\mathbf k} = 0 . \qquad (11.7a)$$

Here $\hat{\mathbf k} \equiv \mathbf k/k$ is the unit vector along the propagation direction. It is easy to see that the longitudinal mode $\boldsymbol\xi_L$ has nonzero expansion $\Theta \equiv \nabla\cdot\boldsymbol\xi_L \neq 0$ but vanishing rotation $\boldsymbol\phi = \tfrac12\nabla\times\boldsymbol\xi_L = 0$, and can therefore be written as the gradient of a scalar potential,

$$\boldsymbol\xi_L = \nabla\psi . \qquad (11.7b)$$

By contrast, the transverse mode has zero expansion but nonzero rotation and can thus be written as the curl of a vector potential,

$$\boldsymbol\xi_T = \nabla\times\mathbf A ; \qquad (11.7c)$$

cf. Ex. 11.1.
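The brute-force matrix route mentioned above (and worked out analytically in Ex. 11.3) can be sketched numerically in Python. This is a sketch under assumed, dimensionless values $\rho = 1$, $K = 2$, $\mu = 1$ and an arbitrary wave vector; the function names are illustrative. It builds $M_{ij} = (\rho\omega^2 - \mu k^2)\delta_{ij} - (K + \tfrac13\mu)k_ik_j$ from Eq. (11.6), checks that $\det\|M_{ij}\|$ vanishes at $\omega = C_Lk$ and $\omega = C_Tk$ [with $C_L$, $C_T$ as in Eqs. (11.9a) and (11.12a)] but not at an intermediate frequency, and confirms that $\mathbf k$ itself is annihilated at $\omega = C_Lk$ while a vector transverse to $\mathbf k$ is annihilated at $\omega = C_Tk$.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix (nested lists) by cofactor expansion along row 0."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def matvec(m, v):
    """Product of a 3x3 matrix with a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def wave_matrix(omega, k_vec, rho, K, mu):
    """M_ij such that Eq. (11.6) reads M_ij xi_j = 0:
    M_ij = (rho omega^2 - mu k^2) delta_ij - (K + mu/3) k_i k_j."""
    k2 = sum(c * c for c in k_vec)
    return [[(rho * omega**2 - mu * k2) * (1.0 if i == j else 0.0)
             - (K + mu / 3) * k_vec[i] * k_vec[j] for j in range(3)] for i in range(3)]


rho, K, mu = 1.0, 2.0, 1.0        # assumed dimensionless material constants
k_vec = (0.3, -0.4, 1.2)          # arbitrary wave vector, |k| = 1.3
e_T = (0.4, 0.3, 0.0)             # a vector perpendicular to k_vec
k = math.sqrt(sum(c * c for c in k_vec))
C_L = math.sqrt((K + 4 * mu / 3) / rho)   # Eq. (11.9a)
C_T = math.sqrt(mu / rho)                 # Eq. (11.12a)

for label, omega in (("omega = C_L k", C_L * k), ("omega = C_T k", C_T * k), ("omega = 1.5 k", 1.5 * k)):
    print(f"{label}: det M = {det3(wave_matrix(omega, k_vec, rho, K, mu)):+.3e}")
print("M(C_L k) . k   =", matvec(wave_matrix(C_L * k, k_vec, rho, K, mu), k_vec))
print("M(C_T k) . e_T =", matvec(wave_matrix(C_T * k, k_vec, rho, K, mu), e_T))
```

The determinant factorizes as $(\rho\omega^2 - \mu k^2)^2\,[\rho\omega^2 - (K + \tfrac43\mu)k^2]$, which is why the transverse root appears twice (two polarizations).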

11.2.3

Longitudinal Sound Waves

For the longitudinal mode, the algebraic wave equation (11.6) reduces to the following simple relation [as one can easily see by inserting $\boldsymbol\xi \equiv \boldsymbol\xi_L = \xi_L\hat{\mathbf k}$ into Eq. (11.6), or, alternatively, by taking the divergence of (11.6), which is equivalent to taking the scalar product with $\mathbf k$]:

$$\omega^2 = \frac{K + \tfrac43\mu}{\rho}\,k^2 ; \quad\text{i.e.}\quad \omega = \Omega(\mathbf k) = \left(\frac{K + \tfrac43\mu}{\rho}\,k^2\right)^{1/2} . \qquad (11.8)$$

This relation between $\omega$ and $\mathbf k$ is the longitudinal mode's dispersion relation.

Fig. 11.1: Displacements in an isotropic, elastic solid, perturbed by (a) a longitudinal mode, (b) a transverse mode.

From the geometric-optics analysis in Sec. 6.3 we infer that, if $K$, $\mu$ and $\rho$ vary spatially on an inhomogeneity lengthscale $L$ large compared to $1/k = \bar\lambda$, and vary temporally on a timescale $T$ large compared to $1/\omega$, then the dispersion relation (11.8), with $\Omega$ now depending on $\mathbf x$ and $t$ through $K$, $\mu$, and $\rho$, serves as a Hamiltonian for the wave propagation. In Sec. 11.4 and Fig. 11.6 below we shall use this to deduce details of the propagation of seismic waves through the Earth's inhomogeneous interior. As we discussed in great detail in Sec. 6.2, associated with any wave mode is its phase velocity, $\mathbf V_{\rm ph} = (\omega/k)\hat{\mathbf k}$, and its phase speed $V_{\rm ph} = \omega/k$. The dispersion relation (11.8) implies that, for longitudinal elastodynamic modes, the phase speed is

$$C_L = \frac{\omega}{k} = \left(\frac{K + \tfrac43\mu}{\rho}\right)^{1/2} . \qquad (11.9a)$$

As this does not depend on the wave number $k \equiv |\mathbf k|$, the mode is non-dispersive, and as it does not depend on the direction $\hat{\mathbf k}$ of propagation through the medium, the phase speed is also isotropic, naturally enough, and the group velocity $V_{g\,j} = \partial\Omega/\partial k_j$ is equal to the phase velocity:

$$\mathbf V_g = \mathbf V_{\rm ph} = C_L\hat{\mathbf k} . \qquad (11.9b)$$

Elastodynamic longitudinal modes are similar to sound waves in a fluid. However, in a fluid, as we shall see in Eq. (15.67d), the sound waves travel with phase speed $V_{\rm ph} = (K/\rho)^{1/2}$ [the limit of Eq. (11.9a) when the shear modulus vanishes].² This fluid sound speed is lower than the $C_L$ of a solid with the same bulk modulus because the longitudinal displacement necessarily entails shear (note that in Fig. 11.1a the motions are not an isotropic expansion), and in a solid there is a restoring shear stress (proportional to $\mu$) that is absent in a fluid. Because the longitudinal phase velocity is independent of frequency, we can write down general planar longitudinal-wave solutions to the elastodynamic wave equation (11.4b) in the following form:

$$\boldsymbol\xi = \xi_L\hat{\mathbf k} = F(\hat{\mathbf k}\cdot\mathbf x - C_L t)\,\hat{\mathbf k} , \qquad (11.10)$$

where $F(x)$ is an arbitrary function. This describes a wave propagating in the (arbitrary) direction $\hat{\mathbf k}$ with an arbitrary profile determined by the function $F$.

² Eq. (15.67d) says the fluid sound speed is $C = \sqrt{(\partial P/\partial\rho)_s}$, i.e. the square root of the derivative of the fluid pressure with respect to density at fixed entropy. In the language of elasticity theory, the fractional change of density is related to the expansion $\Theta$ by $\delta\rho/\rho = -\Theta$ [Eq. (11.3)], and the accompanying change of pressure is $\delta P = -K\Theta$ [paragraph preceding Eq. (10.18)], i.e. $\delta P = K(\delta\rho/\rho)$. Therefore the fluid-mechanical sound speed is $C = \sqrt{\delta P/\delta\rho} = \sqrt{K/\rho}$.

11.2.4

Transverse Shear Waves

To derive the dispersion relation for a transverse wave, we can simply make use of the transversality condition $\mathbf k\cdot\boldsymbol\xi_T = 0$ in Eq. (11.6); or, equally well, we can take the curl of Eq. (11.6) (multiply it by $i\mathbf k\times$), thereby projecting out the transverse piece, since the longitudinal part of $\boldsymbol\xi$ has vanishing curl. The result is

$$\omega^2 = \frac{\mu}{\rho}\,k^2 ; \quad\text{i.e.}\quad \omega = \Omega(\mathbf k) \equiv \left(\frac{\mu}{\rho}\,k^2\right)^{1/2} . \qquad (11.11)$$

This dispersion relation $\omega = \Omega(\mathbf k)$ serves as a geometric-optics Hamiltonian for wave propagation when $\mu$ and $\rho$ vary slowly with $\mathbf x$ and/or $t$, and it also implies that the transverse waves propagate with a phase speed $C_T$ and phase and group velocities given by

$$C_T = \left(\frac{\mu}{\rho}\right)^{1/2} ; \qquad (11.12a)$$

$$\mathbf V_{\rm ph} = \mathbf V_g = C_T\hat{\mathbf k} . \qquad (11.12b)$$

Since $C_L^2 - C_T^2 = (K + \tfrac13\mu)/\rho$ and both $K$ and $\mu$ are positive, the shear wave speed $C_T$ is always less than the speed $C_L$ of longitudinal waves [Eq. (11.9a)]. These transverse modes are known as shear waves because they are driven by the shear stress; cf. Fig. 11.1b. There is no expansion and therefore no change in volume associated with shear waves. They do not exist in fluids, but they are close analogs of the transverse vibrations of a string. Longitudinal waves can be thought of as scalar waves, since they are fully describable by a single component $\xi_L$ of the displacement $\boldsymbol\xi$: that along $\hat{\mathbf k}$. Shear waves, by contrast, are inherently vectorial. Their displacement $\boldsymbol\xi_T$ can point in any direction orthogonal to $\mathbf k$. Since the directions orthogonal to $\mathbf k$ form a two-dimensional space, once $\mathbf k$ has been chosen, there are two independent states of polarization for the shear wave. These two polarization states, together with the single one for the scalar, longitudinal wave, make up the three independent degrees of freedom in the displacement $\boldsymbol\xi$. In Ex. 11.3 we deduce these properties of $\boldsymbol\xi$ using matrix techniques.
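A minimal Python sketch comparing the three speeds met so far: $C_L$ (Eq. 11.9a), $C_T$ (Eq. 11.12a), and the fluid limit $(K/\rho)^{1/2}$. The inputs ($E = 100$ GPa, $\nu = 0.3$, $\rho = 3\times10^3$ kg m$^{-3}$) are assumed for illustration, and the conversions $K = E/[3(1-2\nu)]$, $\mu = E/[2(1+\nu)]$ are the standard isotropic relations between $\{K, \mu\}$ and $\{E, \nu\}$. The ratio $C_T/C_L = [(1-2\nu)/(2(1-\nu))]^{1/2}$ depends only on Poisson's ratio and is below 1 throughout the physical range $-1 < \nu < \tfrac12$.

```python
import math


def K_mu_from_E_nu(E, nu):
    """Standard isotropic conversions from Young's modulus E and Poisson's ratio nu."""
    return E / (3 * (1 - 2 * nu)), E / (2 * (1 + nu))


def elastic_speeds(K, mu, rho):
    """Return (C_L, C_T, fluid sound speed sqrt(K/rho))."""
    C_L = math.sqrt((K + 4 * mu / 3) / rho)   # Eq. (11.9a)
    C_T = math.sqrt(mu / rho)                 # Eq. (11.12a)
    return C_L, C_T, math.sqrt(K / rho)       # mu -> 0 limit of Eq. (11.9a)


E, nu, rho = 100e9, 0.3, 3.0e3   # assumed illustrative values, SI units
K, mu = K_mu_from_E_nu(E, nu)
C_L, C_T, C_fluid = elastic_speeds(K, mu, rho)
print(f"C_L = {C_L:.0f} m/s, C_T = {C_T:.0f} m/s, sqrt(K/rho) = {C_fluid:.0f} m/s")
ratio_formula = math.sqrt((1 - 2 * nu) / (2 * (1 - nu)))
print(f"C_T/C_L = {C_T / C_L:.4f}; from Poisson's ratio alone: {ratio_formula:.4f}")
```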

11.2.5

Energy of Elastodynamic Waves

Elastodynamic waves transport energy, just like waves on a string. The waves' kinetic energy density is obviously $\tfrac12\rho v^2 = \tfrac12\rho\dot{\boldsymbol\xi}^2$, where the dot means $\partial/\partial t$. The elastic energy density is given by Eq. (10.27), so the total energy density is

$$U = \tfrac12\rho\dot{\boldsymbol\xi}^2 + \tfrac12 K\Theta^2 + \mu\Sigma_{ij}\Sigma_{ij} . \qquad (11.13a)$$

In Ex. 11.4 we show that (as one might expect) the elastodynamic wave equation (11.4b) can be derived from an action whose Lagrangian density is the kinetic energy density minus the elastic energy density. We also show that associated with the waves is an energy flux $\mathbf F$ (not to be confused with a force, for which we use the same notation) given by

$$F_i = -K\Theta\dot\xi_i - 2\mu\Sigma_{ij}\dot\xi_j . \qquad (11.13b)$$

As the waves propagate, energy sloshes back and forth between the kinetic part and the elastic part, with the time-averaged kinetic energy being equal to the time-averaged elastic energy (equipartition of energy). For the planar, monochromatic, longitudinal mode, the time-averaged energy density and flux are

$$U_L = \rho\langle\dot\xi_L^2\rangle , \qquad \mathbf F_L = U_L C_L\hat{\mathbf k} , \qquad (11.14)$$

where $\langle\ldots\rangle$ denotes an average over one period or wavelength of the wave. Similarly, for the planar, monochromatic, transverse mode, the time-averaged density and flux of energy are

$$U_T = \rho\langle\dot{\boldsymbol\xi}_T^2\rangle , \qquad \mathbf F_T = U_T C_T\hat{\mathbf k} \qquad (11.15)$$

[Ex. 11.4]. Thus, elastodynamic waves transport energy at the same speed $C_{L,T}$ as the waves propagate, and in the same direction $\hat{\mathbf k}$. This is the same behavior as electromagnetic waves in vacuum, whose Poynting flux and energy density are related by $\mathbf F_{\rm EM} = U_{\rm EM}c\hat{\mathbf k}$ with $c$ the speed of light, and the same as all forms of dispersion-free scalar waves (e.g. sound waves in a medium); cf. Eq. (6.31). Actually, this is the dispersion-free limit of the more general result that the energy of any wave, in the geometric-optics limit, is transported with the wave's group velocity $\mathbf V_g$; see Sec. 6.2.2.

**************************** EXERCISES

Exercise 11.1 Example: Scalar and Vector Potentials for Elastic Waves in a Homogeneous Solid

Just as in electromagnetic theory, it is sometimes useful to write the displacement $\boldsymbol\xi$ in terms of scalar and vector potentials,

$$\boldsymbol\xi = \nabla\psi + \nabla\times\mathbf A . \qquad (11.16)$$

(The vector potential $\mathbf A$ is, as usual, only defined up to a gauge transformation, $\mathbf A \to \mathbf A + \nabla\varphi$, where $\varphi$ is an arbitrary scalar field.) By inserting Eq. (11.16) into the general elastodynamic wave equation (11.4b), show that the scalar and vector potentials satisfy the following wave equations in a homogeneous solid:

$$\frac{\partial^2\psi}{\partial t^2} = C_L^2\nabla^2\psi , \qquad \frac{\partial^2\mathbf A}{\partial t^2} = C_T^2\nabla^2\mathbf A . \qquad (11.17)$$

Thus, the scalar potential $\psi$ generates longitudinal waves, while the vector potential $\mathbf A$ generates transverse waves.

Exercise 11.2 *** Problem: Influence of Gravity on Wave Speed

Modify the wave equation (11.4b) to include the effect of gravity. Assume that the medium is homogeneous and the gravitational field is constant. By comparing the orders of magnitude of the terms in the wave equation, verify that the gravitational terms can be ignored for high-enough-frequency elastodynamic modes: $\omega \gg g/C_{L,T}$. For wave speeds $\sim 3$ km/s, this says $\omega/2\pi \gg 0.0005$ Hz. Seismic waves are generally in this regime.

Exercise 11.3 Example: Solving the Algebraic Wave Equation by Matrix Techniques

By using the matrix techniques discussed in the next-to-last paragraph of Box 11.2, deduce that the general solution to the algebraic wave equation (11.6) is the sum of a longitudinal mode with the properties deduced in Sec. 11.2.3 and two transverse modes with the properties deduced in Sec. 11.2.4. [Note: This matrix technique is necessary and powerful when the algebraic dispersion relation is complicated, e.g. for plasma waves; Secs. 20.3.2 and 20.4.1. Elastodynamic waves are simple enough that we did not need this matrix technique in the text.] Guidelines for solution:

(a) Rewrite the algebraic wave equation in the matrix form $M_{ij}\xi_j = 0$, thereby obtaining an explicit form for the matrix $\|M_{ij}\|$ in terms of $\rho$, $K$, $\mu$, $\omega$ and the components of $\mathbf k$.

(b) This matrix equation has a nontrivial solution if and only if the determinant of the matrix $\|M_{ij}\|$ vanishes. (Why?) Show that $\det\|M_{ij}\| = 0$ is a cubic equation for $\omega^2$ in terms of $k^2$, and that one root of this cubic equation is $\omega = C_L k$, while the other two roots are $\omega = C_T k$, with $C_L$ and $C_T$ given by Eqs. (11.9a) and (11.12a).

(c) Orient Cartesian axes so that $\mathbf k$ points in the $z$ direction. Then show that when $\omega = C_L k$, the solution to $M_{ij}\xi_j = 0$ is a longitudinal wave, i.e., a wave with $\boldsymbol\xi$ pointing in the $z$ direction, the same direction as $\mathbf k$.

(d) Show that when $\omega = C_T k$, there are two linearly independent solutions to $M_{ij}\xi_j = 0$, one with $\boldsymbol\xi$ pointing in the $x$ direction (transverse to $\mathbf k$) and the other in the $y$ direction (also transverse to $\mathbf k$).

Exercise 11.4 Example: Lagrangian and Energy for Elastodynamic Waves

Derive the energy-density, energy-flux, and Lagrangian properties of elastodynamic waves that are stated in Sec. 11.2.5. Guidelines:

(a) For ease of calculation (and for greater generality), consider an elastodynamic wave in a possibly anisotropic medium, for which

$$T_{ij} = -Y_{ijkl}\,\xi_{k;l} , \qquad (11.18)$$

with $Y_{ijkl}$ the tensorial modulus of elasticity, which is symmetric under interchange of the first two indices $ij$, under interchange of the last two indices $kl$, and under interchange of the first pair $ij$ with the last pair $kl$ [Eq. (10.17) and associated discussion]. Show that for an isotropic medium

$$Y_{ijkl} = \left(K - \tfrac23\mu\right)g_{ij}g_{kl} + \mu\left(g_{ik}g_{jl} + g_{il}g_{jk}\right) . \qquad (11.19)$$

(Recall that in the orthonormal bases to which we confine ourselves, the components of the metric are $g_{ij} = \delta_{ij}$, i.e. the Kronecker delta.)

(b) For these waves the elastic energy density is $\tfrac12 Y_{ijkl}\xi_{i;j}\xi_{k;l}$ [Eq. (10.28)]. Show that the kinetic energy density minus the elastic energy density,

$$\mathcal L = \tfrac12\rho\,\dot\xi_i\dot\xi_i - \tfrac12 Y_{ijkl}\,\xi_{i;j}\xi_{k;l} , \qquad (11.20)$$

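is a Lagrangian density for the waves; i.e., show that the vanishing of its variational derivative, $\delta\mathcal L/\delta\xi_j = 0$, is equivalent to the elastodynamic equations $\rho\ddot{\boldsymbol\xi} = -\nabla\cdot\mathbf T$.

(c) The waves' energy density and flux can be constructed by the vector-wave analog of the canonical procedure of Eq. (6.49c):

$$U = \frac{\partial\mathcal L}{\partial\dot\xi_i}\dot\xi_i - \mathcal L = \tfrac12\rho\,\dot\xi_i\dot\xi_i + \tfrac12 Y_{ijkl}\,\xi_{i;j}\xi_{k;l} , \qquad F_j = \frac{\partial\mathcal L}{\partial\xi_{i;j}}\dot\xi_i = -Y_{ijkl}\,\dot\xi_i\,\xi_{k;l} . \qquad (11.21)$$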

Verify that this energy density and flux satisfy the energy conservation law, $\partial U/\partial t + \nabla\cdot\mathbf F = 0$. It is straightforward algebra to verify, using Eq. (11.19), that for an isotropic medium expressions (11.21) for the energy density and flux become the expressions (11.13) given in the text.

(d) Show that the time average of the kinetic energy density and that of the elastic energy density in Eq. (11.21) are equal, and that therefore the energy densities of the longitudinal and transverse modes are given by the first of Eqs. (11.14) and (11.15).

(e) Show that the time average of the energy flux (11.13b) for the longitudinal and transverse modes is given by the second of Eqs. (11.14) and (11.15), so the energy propagates with the same speed and direction as the waves themselves.

****************************
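The equipartition and energy-transport statements of Sec. 11.2.5 (the content of Ex. 11.4d,e) can also be checked numerically. The Python sketch below is a numerical spot-check, not the exercise's analytic route: it evaluates Eqs. (11.13a,b) for a plane wave $\boldsymbol\xi = a\cos(kx - \omega t)\,\mathbf e$ travelling along $x$, with $\mathbf e$ either along $x$ (longitudinal, $\omega = C_Lk$) or along $y$ (transverse, $\omega = C_Tk$), averages over one period at $x = 0$, and compares $\langle F_x\rangle$ with $\langle U\rangle C_{L,T}$. The material constants, amplitude, and wave number are arbitrary assumed values.

```python
import math


def energy_terms(xi_dot, grad, rho, K, mu):
    """Kinetic and elastic parts of U (Eq. 11.13a) and the flux F_i (Eq. 11.13b).
    xi_dot[i] = d xi_i / dt;  grad[i][j] = d xi_i / d x_j."""
    theta = grad[0][0] + grad[1][1] + grad[2][2]                       # expansion
    sigma = [[0.5 * (grad[i][j] + grad[j][i]) - (theta / 3 if i == j else 0.0)
              for j in range(3)] for i in range(3)]                    # shear
    kinetic = 0.5 * rho * sum(v * v for v in xi_dot)
    elastic = 0.5 * K * theta**2 + mu * sum(s * s for row in sigma for s in row)
    flux = [-K * theta * xi_dot[i] - 2 * mu * sum(sigma[i][j] * xi_dot[j] for j in range(3))
            for i in range(3)]
    return kinetic, elastic, flux


def period_averages(pol, rho, K, mu, a=1e-3, k=2.0, n=64):
    """Average kinetic, elastic, and F_x over one period at x = 0 for
    xi = a cos(k x - omega t) e_pol (pol = 0: longitudinal, pol = 1: transverse)."""
    speed = math.sqrt((K + 4 * mu / 3) / rho) if pol == 0 else math.sqrt(mu / rho)
    omega = speed * k
    kin = ela = fx = 0.0
    for step in range(n):
        phase = -2 * math.pi * step / n          # k x - omega t at x = 0, t = step T / n
        xi_dot = [0.0, 0.0, 0.0]
        xi_dot[pol] = a * omega * math.sin(phase)
        grad = [[0.0] * 3 for _ in range(3)]
        grad[pol][0] = -a * k * math.sin(phase)  # d xi_pol / dx
        ke, ee, f = energy_terms(xi_dot, grad, rho, K, mu)
        kin += ke / n
        ela += ee / n
        fx += f[0] / n
    return kin, ela, fx, speed


rho, K, mu = 2.0, 3.0, 1.5   # assumed material constants (arbitrary units)
for pol, name in ((0, "longitudinal"), (1, "transverse")):
    kin, ela, fx, speed = period_averages(pol, rho, K, mu)
    U = kin + ela
    print(f"{name}: <kinetic>/<elastic> = {kin / ela:.6f},  <F_x>/(<U> C) = {fx / (U * speed):.6f}")
```

Both printed ratios should be 1, reproducing $U = \rho\langle\dot\xi^2\rangle$ and $\mathbf F = U C_{L,T}\hat{\mathbf k}$.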

Fig. 11.2: When a wire of circular cross section is twisted, there will be a restoring torque.

11.3

Waves in Rods, Strings and Beams

Let us now illustrate some of these ideas using the types of waves that can arise in some practical applications. In particular we discuss how the waves get modified when the medium through which they propagate is not uniform but instead is bounded. Despite this situation being formally “global” in the sense of Box 11.2, elementary considerations enable us to derive the relevant dispersion relations without much effort.

11.3.1

Compression waves

First consider a longitudinal wave propagating along a light (negligible gravity), thin, unstressed rod. Introduce a Cartesian coordinate system with the $x$-axis parallel to the rod. When there is a small displacement $\xi_x$ independent of $y$ and $z$, the restoring stress is given by $T_{xx} = -E\,\partial\xi_x/\partial x$, where $E$ is Young's modulus (cf. end of Sec. 10.3). Hence the restoring force density $\mathbf f = -\nabla\cdot\mathbf T$ is $f_x = E\,\partial^2\xi_x/\partial x^2$. The wave equation then becomes

$$\frac{\partial^2\xi_x}{\partial t^2} = \frac{E}{\rho}\frac{\partial^2\xi_x}{\partial x^2} , \qquad (11.22)$$

and so the sound speed for compression waves in a long straight rod is

$$C_C = \left(\frac{E}{\rho}\right)^{1/2} . \qquad (11.23)$$

Referring to Table 10.1 (in Chap. 10), we see that a typical value of Young's modulus in a solid is $\sim 100$ GPa. If we adopt a typical density $\sim 3\times10^3$ kg m$^{-3}$, then we estimate the compressional sound speed to be $\sim 5$ km s$^{-1}$. This is roughly 15 times the sound speed in air.
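The order-of-magnitude estimate above can be reproduced in a few lines of Python, using the text's typical values $E \sim 100$ GPa and $\rho \sim 3\times10^3$ kg m$^{-3}$ together with an assumed sound speed in air of about 340 m s$^{-1}$. It gives roughly 5.8 km s$^{-1}$, about 17 times the speed in air, consistent with the rounded figures quoted.

```python
import math

E = 100e9        # Young's modulus ~ 100 GPa (typical value quoted in the text)
rho = 3.0e3      # density ~ 3e3 kg/m^3 (typical value quoted in the text)
c_air = 340.0    # approximate sound speed in air, m/s (assumed reference value)

C_C = math.sqrt(E / rho)   # Eq. (11.23)
print(f"C_C ~ {C_C / 1e3:.1f} km/s, about {C_C / c_air:.0f} times the sound speed in air")
```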


11.3.2

Torsion waves

Next consider a wire with circular cross section of radius $a$ subjected to a twisting force (Fig. 11.2). Let us introduce an angular displacement $\Delta\phi \equiv \varphi$ that depends on $x$. The only nonzero component of the displacement vector is then $\xi_\phi = \varpi\varphi$, where $\varpi$ is the distance from the wire's axis. We can calculate the total torque by integrating over a circular cross section. For small twists, there will be no expansion, and the only nonzero components of the shear tensor are

$$\Sigma_{\phi x} = \Sigma_{x\phi} = \tfrac12\xi_{\phi,x} = \frac{\varpi}{2}\frac{\partial\varphi}{\partial x} . \qquad (11.24)$$

The torque contributed by an annular ring of radius $\varpi$ and thickness $d\varpi$ is $\varpi\cdot T_{\phi x}\cdot 2\pi\varpi\,d\varpi$, and we substitute $T_{\phi x} = -2\mu\Sigma_{\phi x}$ to obtain the total torque

$$N = \int_0^a 2\pi\mu\varpi^3\,d\varpi\,\frac{\partial\varphi}{\partial x} . \qquad (11.25)$$

Now the moment of inertia per unit length is

$$I = \frac{\pi}{2}\rho a^4 , \qquad (11.26)$$

so, equating the net torque per unit length to the rate of change of angular momentum, also per unit length, we obtain

$$\frac{\partial N}{\partial x} = I\frac{\partial^2\varphi}{\partial t^2} , \qquad (11.27)$$

or

$$\frac{\partial^2\varphi}{\partial x^2} = \frac{\rho}{\mu}\frac{\partial^2\varphi}{\partial t^2} . \qquad (11.28)$$

The speed of torsional waves is thus

$$C_T = \left(\frac{\mu}{\rho}\right)^{1/2} . \qquad (11.29)$$

Note that this is the same speed as that of shear waves in a uniform medium. This might have been anticipated, as there is no change in volume in a torsional oscillation and so only the shear stress acts to produce a restoring force.
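The two integrals in this derivation can be checked numerically. The Python sketch below evaluates the torque integral (11.25) per unit twist by the midpoint rule, compares it with the closed form $\tfrac12\pi\mu a^4$, and recovers the speed (11.29) as the square root of (torque per unit twist)/$I$, using $I$ from Eq. (11.26). The values of $\mu$, $\rho$ and $a$ are assumed, for illustration only.

```python
import math


def torque_per_unit_twist(mu, a, n=10_000):
    """N / (d varphi/dx) from Eq. (11.25): integral_0^a 2 pi mu varpi^3 d varpi (midpoint rule)."""
    h = a / n
    return sum(2 * math.pi * mu * ((i + 0.5) * h) ** 3 * h for i in range(n))


mu, rho, a = 30e9, 2.7e3, 1e-3        # assumed illustrative values: Pa, kg/m^3, m
stiffness = torque_per_unit_twist(mu, a)
closed_form = math.pi * mu * a**4 / 2
I = math.pi / 2 * rho * a**4          # Eq. (11.26), moment of inertia per unit length
# Eqs. (11.27)-(11.28): stiffness * d2phi/dx2 = I * d2phi/dt2, so the speed is sqrt(stiffness / I).
print(f"N/phi' numeric = {stiffness:.6e} N m^2, closed form = {closed_form:.6e} N m^2")
print(f"torsional speed = {math.sqrt(stiffness / I):.1f} m/s, sqrt(mu/rho) = {math.sqrt(mu / rho):.1f} m/s")
```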

11.3.3

Waves on Strings

This example is surely all too familiar. When a string under a tension force $T$ (not force per unit area) is plucked, there will be a restoring force proportional to the curvature of the string. If $\xi_z \equiv \eta$ is the transverse displacement (in the same notation as we used for rods in Secs. 10.7 and 10.8), then the wave equation will be

$$T\frac{\partial^2\eta}{\partial x^2} = \Lambda\frac{\partial^2\eta}{\partial t^2} , \qquad (11.30)$$

where $\Lambda$ is the mass per unit length. The wave speed is thus

$$C_S = \left(\frac{T}{\Lambda}\right)^{1/2} . \qquad (11.31)$$


11.3.4

Flexural Waves on a Beam

Now consider the small-amplitude displacement of a rod or beam that can be flexed. In Sec. 10.7 we showed that such a flexural displacement produces a net elastic restoring force per unit length given by $D\,\partial^4\eta/\partial x^4$, and we considered a situation where that force was balanced by the beam's weight per unit length, $W = \Lambda g$ [Eq. (10.87)]. Here

$$D = \tfrac{1}{12}Ewh^3 \qquad (11.32)$$

is the flexural rigidity [Eq. (10.82)], $h$ is the beam's thickness in the direction of bend, $w$ is its width, $\eta = \xi_z$ is the transverse displacement of the neutral surface from the horizontal, $\Lambda$ is the mass per unit length, and $g$ is the earth's acceleration of gravity. The solution of the resulting force-balance equation, $-D\,\partial^4\eta/\partial x^4 = W = \Lambda g$, was the quartic (10.88a), which described the equilibrium beam shape. When gravity is absent and the beam is allowed to move, the acceleration of gravity $g$ gets replaced by a dynamical acceleration of the beam, $\partial^2\eta/\partial t^2$; the result is a wave equation for flexural waves on the beam:

$$-D\frac{\partial^4\eta}{\partial x^4} = \Lambda\frac{\partial^2\eta}{\partial t^2} . \qquad (11.33)$$

[This derivation of the wave equation is an elementary illustration of the Principle of Equivalence (the equivalence of gravitational and inertial forces, or gravitational and inertial accelerations), which underlies Einstein's general relativity theory (Chap. 24).] The wave equations we have encountered so far in this chapter have all described nondispersive waves, for which the wave speed is independent of the frequency. Flexural waves, by contrast, are dispersive. We can see this by assuming that $\eta \propto \exp[i(kx - \omega t)]$ and thereby deducing from Eq. (11.33) the dispersion relation

$$\omega = \sqrt{D/\Lambda}\;k^2 . \qquad (11.34)$$
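Because $\omega \propto k^2$ in Eq. (11.34), the group velocity $d\omega/dk = 2\sqrt{D/\Lambda}\,k$ is exactly twice the phase velocity $\omega/k$, and both grow with $k$. The Python sketch below checks this with a central-difference derivative for a rectangular beam; the beam's dimensions and material constants are assumed values, and $\Lambda = \rho wh$ is its mass per unit length.

```python
import math


def flexural_rigidity(E, w, h):
    """D = E w h^3 / 12 for a rectangular cross section, Eq. (11.32)."""
    return E * w * h**3 / 12


def omega_flexural(k, D, Lam):
    """Dispersion relation (11.34) for a tension-free beam."""
    return math.sqrt(D / Lam) * k**2


def phase_and_group(k, D, Lam, rel_step=1e-6):
    """Phase velocity omega/k and a central-difference group velocity d omega/dk."""
    dk = rel_step * k
    v_g = (omega_flexural(k + dk, D, Lam) - omega_flexural(k - dk, D, Lam)) / (2 * dk)
    return omega_flexural(k, D, Lam) / k, v_g


E, rho, w, h = 100e9, 3.0e3, 0.02, 0.005   # assumed beam: 2 cm wide, 5 mm thick (SI units)
D = flexural_rigidity(E, w, h)
Lam = rho * w * h                          # mass per unit length
for k in (1.0, 10.0, 100.0):               # wave numbers in 1/m
    v_ph, v_g = phase_and_group(k, D, Lam)
    print(f"k = {k:6.1f} 1/m:  V_ph = {v_ph:8.2f} m/s,  V_g = {v_g:8.2f} m/s,  V_g/V_ph = {v_g / v_ph:.4f}")
```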

Before considering the implications of this dispersion, we shall complicate the equilibrium a little. Let us suppose that, in addition to the net shearing force per unit length $-D\,\partial^4\eta/\partial x^4$, the beam is also held under a tension force $T$. We can then combine the two wave equations (11.30) and (11.33) to obtain

$$-D\frac{\partial^4\eta}{\partial x^4} + T\frac{\partial^2\eta}{\partial x^2} = \Lambda\frac{\partial^2\eta}{\partial t^2} , \qquad (11.35)$$

for which the dispersion relation is

$$\omega^2 = C_S^2k^2\left(1 + \frac{k^2}{k_c^2}\right) , \qquad (11.36)$$

where $C_S = \sqrt{T/\Lambda}$ is the wave speed when the flexural rigidity $D$ is negligible, so the beam is string-like, and

$$k_c = \sqrt{T/D} \qquad (11.37)$$

is a critical wave number. If the average strain induced by the tension is $\epsilon = \xi_{x,x} = T/(Ewh)$, then $k_c = (12\epsilon)^{1/2}h^{-1}$, where $h$ is the thickness of the beam and $w$ is its width. [Notice that $k_c$ is also of order $1/\lambda$, where $\lambda$ is the lengthscale on which a pendulum's support wire ("beam") bends, as discussed in Ex. 10.18.] For short wavelengths, $k \gg k_c$, the shearing force dominates and the beam behaves like a tension-free beam; for long wavelengths, $k \ll k_c$, it behaves like a string. A consequence of dispersion is that waves with different wave numbers $k$ propagate with different speeds, and correspondingly the group velocity $V_g = d\omega/dk$ with which wave packets propagate differs from the phase velocity $V_{\rm ph} = \omega/k$ with which a wave's crests and troughs move (see Sec. 6.2.2). For the dispersion relation (11.36), the phase and group velocities are

$$V_{\rm ph} \equiv \omega/k = C_S\left(1 + k^2/k_c^2\right)^{1/2} , \qquad V_g \equiv d\omega/dk = C_S\left(1 + 2k^2/k_c^2\right)\left(1 + k^2/k_c^2\right)^{-1/2} . \qquad (11.38)$$
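The formulas (11.38) can be sanity-checked numerically. The Python sketch below compares the closed-form group velocity with a central-difference derivative of Eq. (11.36), evaluates both velocities well below, at, and well above $k_c$, and checks the identity $k_c = \sqrt{T/D} = (12\epsilon)^{1/2}/h$ for a rectangular beam. All numerical values ($C_S$, $k_c$, and the beam parameters) are assumed for illustration.

```python
import math


def omega(k, C_S, k_c):
    """Dispersion relation (11.36): omega^2 = C_S^2 k^2 (1 + k^2/k_c^2)."""
    return C_S * k * math.sqrt(1 + (k / k_c) ** 2)


def velocities(k, C_S, k_c):
    """Closed-form phase and group velocities, Eq. (11.38)."""
    x = (k / k_c) ** 2
    return C_S * math.sqrt(1 + x), C_S * (1 + 2 * x) / math.sqrt(1 + x)


def numeric_group_velocity(k, C_S, k_c, rel_step=1e-6):
    """Central-difference d omega/dk, independent of Eq. (11.38)."""
    dk = rel_step * k
    return (omega(k + dk, C_S, k_c) - omega(k - dk, C_S, k_c)) / (2 * dk)


C_S, k_c = 50.0, 20.0   # assumed string speed (m/s) and critical wave number (1/m)
for k in (1e-3 * k_c, k_c, 1e3 * k_c):
    v_ph, v_g = velocities(k, C_S, k_c)
    print(f"k/k_c = {k / k_c:g}: V_ph = {v_ph:.4g} m/s, V_g = {v_g:.4g} m/s, "
          f"numeric V_g = {numeric_group_velocity(k, C_S, k_c):.4g} m/s")

# k_c for a rectangular beam under tension T = eps E w h (assumed eps, E, w, h):
eps, E, w, h = 1e-4, 100e9, 0.02, 0.005
T, D = eps * E * w * h, E * w * h**3 / 12
print(f"sqrt(T/D) = {math.sqrt(T / D):.4f} 1/m, sqrt(12 eps)/h = {math.sqrt(12 * eps) / h:.4f} 1/m")
```

Well below $k_c$ both velocities approach $C_S$ (string-like); well above $k_c$ the ratio $V_g/V_{\rm ph}$ approaches 2, as for the tension-free beam.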

As we discussed in detail in Sec. 6.2.2 and Ex. 6.2, for dispersive waves such as this one, the fact that different Fourier components in the wave packet propagate with different speeds causes the packet to gradually spread; we explore this quantitatively for flexural waves on a stretched beam in Ex. 11.5.

11.3.5

Bifurcation of Equilibria and Buckling (once more)

We conclude this discussion by returning to the problem of buckling, which we introduced in Sec. 10.8. The example we discussed there was a playing card compressed until it wants to buckle. We can analyze small dynamical perturbations of the card, $\eta(x, t)$, by treating the tension $T$ of the previous section as negative, $T = -F$, where $F$ is the compressional force applied to the card's two ends in Fig. 10.13. Then the equation of motion (11.35) becomes

$$-D\frac{\partial^4\eta}{\partial x^4} - F\frac{\partial^2\eta}{\partial x^2} = \Lambda\frac{\partial^2\eta}{\partial t^2} . \qquad (11.39)$$

We seek solutions for which the ends of the playing card are held fixed (as shown in Fig. 10.13), $\eta = 0$ at $x = 0$ and $x = \ell$. Solving Eq. (11.39) by separation of variables, we see that

$$\eta = \eta_n\sin\left(\frac{n\pi}{\ell}x\right)e^{-i\omega_n t} . \qquad (11.40)$$

Here $n = 1, 2, 3, \ldots$ labels the card's modes of oscillation, $n - 1$ is the number of nodes in the card's sinusoidal shape for mode $n$, $\eta_n$ is the amplitude of deformation for mode $n$, and the mode's eigenfrequency $\omega_n$ (of course) satisfies the same dispersion relation (11.36) as for waves on a long, stretched beam, with $T \to -F$ and $k \to n\pi/\ell$:

$$\omega_n^2 = \frac{1}{\Lambda}\left(\frac{n\pi}{\ell}\right)^2\left[\left(\frac{n\pi}{\ell}\right)^2D - F\right] . \qquad (11.41)$$

Consider the lowest normal mode, $n = 1$, for which the playing card is bent in the single-arch manner of Fig. 10.13 as it oscillates. When the compressional force $F$ is small, $\omega_1^2$ is positive, so $\omega_1$ is real and the normal mode oscillates sinusoidally, stably. But for $F > F_{\rm crit} = \pi^2D/\ell^2$, $\omega_1^2$ is negative, so $\omega_1$ is imaginary and there are two normal-mode solutions, one decaying exponentially with time, $\eta \propto \exp(-|\omega_1|t)$, and the other increasing exponentially with time, $\eta \propto \exp(+|\omega_1|t)$, signifying an instability against buckling. Notice that the onset of instability occurs at precisely the same compressional force, $F = F_{\rm crit} \equiv \pi^2D/\ell^2$, as the bifurcation of equilibria [Eq. (10.96)], at which a new, bent equilibrium state for the playing card comes into existence. Notice, moreover, that the card's $n = 1$ normal mode has zero frequency, $\omega_1 = 0$, at this onset of instability and bifurcation of equilibria; the card can bend by an amount that grows linearly in time, $\eta = A\sin(\pi x/\ell)\,t$, with no restoring force or exponential growth. This zero-frequency motion leads the card from its original, straight equilibrium shape to its new, bent equilibrium shape. [For a free-energy-based analysis of the onset of this instability, see Ex. 11.8.] This is an example of a very general phenomenon, which we shall meet again in fluid mechanics (Sec. 14.5): For mechanical systems without dissipation (no energy losses to friction or viscosity or radiation or ...), as one gradually changes some "control parameter" (in this case the compressional force $F$), there can occur bifurcation of equilibria. At each bifurcation point, a normal mode of the original equilibrium becomes unstable, and at its onset of instability the mode has zero frequency and represents a motion from the original equilibrium (which is becoming unstable) to the new, stable equilibrium. In our simple playing-card example, we see this phenomenon repeated again and again as the control parameter $F$ is increased: one after another, the modes $n = 1$, $n = 2$, $n = 3$, ... become unstable.
At each onset of instability, $\omega_n$ vanishes, and the zero-frequency mode (with $n - 1$ nodes in its eigenfunction) leads from the original, straight-card equilibrium to the new, stable, $(n - 1)$-noded, bent equilibrium. Buckling is a serious issue in engineering. Whenever one has a vertical beam supporting a heavy weight (e.g. in the construction of a tall building), one must make sure that the beam has a large enough flexural rigidity $D$ to be stable against buckling. The reason is that, although there is a new, stable equilibrium if $F$ is only slightly larger than $F_{\rm crit}$, the bend in that equilibrium increases rapidly with increasing $F$ [Eq. (10.97)] and becomes so large, when $F$ is only moderately larger than $F_{\rm crit}$, that the beam breaks. A large enough flexural rigidity $D$ to protect against this is generally achieved not by making the beam uniformly thick, but rather by fashioning its cross section into an H shape or I shape (with cross bars). Whenever one has a long pipe exposed to night-to-day cooling-to-heating transitions (e.g. an oil or natural gas pipe, or the long vacuum tubes of a laser interferometer gravitational-wave detector), one must make sure the pipe has enough flexural rigidity to avoid buckling in the heat of the day, when it wants to expand in length.³ It can be overly expensive to make the pipe walls thick enough to achieve the required flexural rigidity, so instead of thickening the walls everywhere, engineers weld "stiffening rings" onto the outside of the pipe to increase its rigidity. Notice, in Eq. (11.41), that the longer the beam or pipe (length $\ell$), the larger must be its flexural rigidity $D$ to avoid buckling; since $F_{\rm crit} = \pi^2D/\ell^2$, the required rigidity scales as the square of the length.

³ Much of the thermal expansion is dealt with by bellows.
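Equation (11.41) makes the sequence of instabilities easy to tabulate: mode $n$ is unstable ($\omega_n^2 < 0$) exactly when $F > n^2\pi^2D/\ell^2 = n^2F_{\rm crit}$. The Python sketch below lists the unstable modes and the $n = 1$ growth rate $|\omega_1|$ for a few compressional forces; the values of $D$, $\Lambda$ and $\ell$ are assumed, for illustration only.

```python
import math


def omega_n_squared(n, F, D, Lam, ell):
    """Eq. (11.41): omega_n^2 = (1/Lam) (n pi/ell)^2 [ (n pi/ell)^2 D - F ]."""
    q = n * math.pi / ell
    return q**2 * (q**2 * D - F) / Lam


def unstable_modes(F, D, Lam, ell, n_max=10):
    """Mode numbers n <= n_max with omega_n^2 < 0, i.e. exponentially growing modes."""
    return [n for n in range(1, n_max + 1) if omega_n_squared(n, F, D, Lam, ell) < 0]


D, Lam, ell = 1.0e-3, 1.0e-2, 0.1       # assumed values: N m^2, kg/m, m
F_crit = math.pi**2 * D / ell**2         # onset of instability of the n = 1 mode
for F in (0.5 * F_crit, 1.5 * F_crit, 4.5 * F_crit):
    w2 = omega_n_squared(1, F, D, Lam, ell)
    growth = math.sqrt(-w2) if w2 < 0 else 0.0
    print(f"F/F_crit = {F / F_crit:.1f}: omega_1^2 = {w2:+.3e} s^-2, "
          f"growth rate |omega_1| = {growth:.3e} s^-1, unstable modes {unstable_modes(F, D, Lam, ell)}")
```

At $F = F_{\rm crit}$ the $n = 1$ eigenfrequency passes through zero, which is the zero-frequency bifurcation mode discussed above.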


**************************** EXERCISES

Exercise 11.5 Derivation: Dispersion of Flexural Waves

Verify Eqs. (11.36) and (11.38). Sketch the dispersion-induced evolution of a Gaussian wave packet as it propagates along a stretched beam.

Exercise 11.6 Problem: Speeds of Elastic Waves

Show that the sound speeds for the following types of elastic waves in an isotropic material are in the ratio $1 : (1-\nu^2)^{-1/2} : \left[\frac{1-\nu}{(1+\nu)(1-2\nu)}\right]^{1/2} : [2(1+\nu)]^{-1/2} : [2(1+\nu)]^{-1/2}$: longitudinal waves along a rod, longitudinal waves along a sheet, longitudinal waves along a rod embedded in an incompressible fluid, shear waves in an extended solid, torsional waves along a rod. [Note: Here and elsewhere in this book, if you encounter grungy algebra (e.g. frequent conversions from $\{K, \mu\}$ to $\{E, \nu\}$), do not hesitate to use Mathematica or Maple or other symbolic manipulation software to do the algebra!]

Exercise 11.7 Problem: Xylophones

Consider a beam of length $\ell$, whose weight is negligible in the elasticity equations, supported freely at both ends (so the slope of the beam is unconstrained at the ends). Show that the frequencies of standing flexural waves satisfy

$$\omega = \left(\frac{n\pi}{\ell}\right)^2\left(\frac{D}{\rho A}\right)^{1/2} ,$$

where A is the cross-sectional area and n is an integer. Now repeat the exercise when the ends are clamped. Hence explain why xylophones don’t have clamped ends. Exercise 11.8 ***Example: Free-Energy Analysis of Buckling Instability In this exercise you will explore the relationship of the onset of the buckling instability to the concept of free energy, which we introduced in Chap. 4 in our study of statistical mechanics and phase transitions. (a) Consider a rod with flexural rigidity D and with a compressional force F applied at each end, as in Sec. 11.3.5. Show that, if the rod gets bent slightly, with a transverse displacement η(x), its elastic energy increases by an amount E=

Z

1 1 E(ξx,x)2 dxdydz = D 2 2

Z ℓ 0

∂2η ∂x2

2

dx .

(11.42)

Here ξx is the longitudinal displacement inside the rod, ξx,x is the longitudinal strain, and the first integral is over the entire interior of the rod. [Hint: recall the first few steps in the dimensional-reduction analysis for such a rod in Sec. 10.7.]

(b) If the compressional force F were absent, then the most stable equilibrium shape would be the one that minimizes this elastic energy subject to the boundary conditions that η(x) = 0 at x = 0 and x = ℓ: i.e. the unbent shape η = 0. However, when the force F is present and is held fixed as the rod deforms, the deformation causes the rod’s right end to move inward relative to the left end by an amount
$$\delta\ell = -\int_0^\ell \frac{1}{2}\left(\frac{\partial\eta}{\partial x}\right)^2 dx .$$
[Prove this using the Pythagorean theorem.] As the rod moves inward, the force on its right end does an amount of work −F δℓ on the rod. Correspondingly, the amount of free energy that the rod has (the amount of energy adjusted for the energy −F δℓ that gets exchanged with the force F when the rod bends) is
$$H = E + F\,\delta\ell = \frac{1}{2}\int_0^\ell \left[D\left(\frac{\partial^2\eta}{\partial x^2}\right)^2 - F\left(\frac{\partial\eta}{\partial x}\right)^2\right] dx . \qquad (11.43)$$
(c) Think of the constant force F as arising from a “volume bath” with pressure P, with which the rod’s ends (of cross-sectional area hw) are in contact. Show that the free energy (11.43) can be reexpressed as H = E + P δV, where −δV = −hw δℓ is the change in the bath’s volume as a result of the rod’s ends moving when it bends. This is the enthalpy of the rod, associated with the bending (Ex. 4.3), and the laws of statistical mechanics for any system in contact with a volume bath tell us that the system’s most stable state is the one with minimum enthalpy, i.e. minimum free energy (Ex. 4.8). This minimum-free-energy state η(x) must be stationary under small changes δη(x) of the rod’s shape, subject to δη(0) = δη(ℓ) = 0. Show that this stationarity implies that η satisfies the equation of elastostatic equilibrium, Eq. (11.39) with ∂²η/∂t² = 0, and therefore must have the shape η = ηo sin(nπx/ℓ), i.e. the shape of one of the normal-mode eigenfunctions. [KIP: THERE IS SOME DELICACY OF BOUNDARY CONDITIONS THAT NEEDS TO BE SORTED OUT. PH136 STUDENTS: CAN YOU HELP?]
(d) Compare the free energy H1 for the n = 1 bent shape (no nodes) with that, H0, of the straight rod by performing the integral (11.43). Your result should be
$$H_1 - H_0 = \left(\frac{\pi\eta_1}{2\ell}\right)^2 \ell\,(F_{\rm crit} - F) , \qquad (11.44)$$
where Fcrit = π²D/ℓ² is the critical force at which the bifurcation of equilibria occurs and the straight rod becomes unstable, and η1 is the amplitude of the bend.
(e) Comment: When one includes higher-order corrections in η1, the difference in free energies between the straight rod and the n = 1 shape turns out to be
$$H_1 - H_0 = \left(\frac{\pi\eta_1}{2\ell}\right)^2 \ell\left[F_{\rm crit}\left(1 + \left(\frac{\pi\eta_1}{2\ell}\right)^2\right) - F\right] . \qquad (11.45)$$
This free-energy difference is plotted in Fig. 11.3, as a function of η1 (the amplitude of the rod’s deformation) for various applied forces F. For F < Fcrit there is only one extremum: a minimum at η1 = 0, so the only equilibrium state is that of the straight rod, and that equilibrium is stable. As F increases past Fcrit, the minimum at η1 = 0 becomes a maximum, and two new minima are created, at
$$\eta_1 = \pm\left[\frac{2\ell^2}{\pi^2}\,\frac{F - F_{\rm crit}}{F_{\rm crit}}\right]^{1/2} .$$
This is the bifurcation of equilibria, with the straight rod becoming unstable and the upward or downward bent rod, in the n = 1 shape, being stable.

[Figure 11.3: H1 − H0 plotted against η1 for F = 0, F = Fcrit, F = 1.1Fcrit and F = 1.2Fcrit.]
Fig. 11.3: The difference in free energy H1 − H0 between a rod bent in the n = 1 shape with bend amplitude η1 and the unbent rod.

****************************
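To see the bifurcation of Ex. 11.8(e) numerically, the sketch below takes Eq. (11.45) in the form H1 − H0 = (πη1/2ℓ)² ℓ [Fcrit(1 + (πη1/2ℓ)²) − F], minimizes it by brute-force search over η1 ≥ 0, and compares the result with the minimum obtained by setting d(H1 − H0)/dη1 = 0, namely η1 = [(2ℓ²/π²)(F − Fcrit)/Fcrit]^{1/2} for F > Fcrit. The grid search and its resolution are our own choices; units are arbitrary.

```python
import math


def free_energy_difference(eta1, F, F_crit, length):
    """H1 - H0 of Eq. (11.45) for an n = 1 bend of amplitude eta1."""
    x = (math.pi * eta1 / (2.0 * length)) ** 2
    return x * length * (F_crit * (1.0 + x) - F)


def analytic_minimum(F, F_crit, length):
    """Stable bend amplitude (eta1 >= 0): zero below F_crit, the bent branch above it."""
    if F <= F_crit:
        return 0.0
    return math.sqrt(2.0 * length ** 2 / math.pi ** 2 * (F - F_crit) / F_crit)


def numerical_minimum(F, F_crit, length, eta_max=1.0, n=100001):
    """Grid search on [0, eta_max] for the eta1 that minimises H1 - H0."""
    best_eta, best_H = 0.0, free_energy_difference(0.0, F, F_crit, length)
    for i in range(1, n):
        eta = eta_max * i / (n - 1)
        H = free_energy_difference(eta, F, F_crit, length)
        if H < best_H:
            best_eta, best_H = eta, H
    return best_eta


if __name__ == "__main__":
    for ratio in (0.9, 1.1, 1.2):  # F / F_crit, as in Fig. 11.3
        print(ratio, numerical_minimum(ratio, 1.0, 1.0), analytic_minimum(ratio, 1.0, 1.0))
```

Below Fcrit the search returns η1 = 0 (the straight rod); above it, the grid minimum tracks the analytic bent-rod amplitude to within the grid spacing.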

11.4

Body Waves and Surface Waves — Seismology

In Sec. 11.2 we derived the dispersion relations ω = CL k and ω = CT k for elastodynamic waves in uniform media. We now consider how the waves are modified in an inhomogeneous, finite body, the earth. The earth is well approximated as a sphere of radius R ∼ 6000 km and mean density ρ̄ ∼ 6000 kg m⁻³. The outer crust, composed of rocks of high tensile strength, rests on a denser but more malleable mantle, the two regions being separated by the famous Moho discontinuity. Underlying the mantle is an outer core composed mainly of liquid iron, which itself surrounds a denser, solid inner core; see Table 11.1 and Fig. 11.6 below. The pressure in the earth’s interior is much larger than atmospheric, and the rocks are therefore quite compressed. Their atomic structure cannot be regarded as a small perturbation from their structure in vacuo. Nevertheless, we can still use linear elasticity theory to discuss small perturbations about this equilibrium. This is because the crystal lattice has had plenty of time to re-establish a new equilibrium with a much smaller lattice spacing (Fig. 11.4). The density of lattice defects and dislocations will probably not differ appreciably from the density on the earth’s surface. The linear stress-strain relation should still apply below the elastic limit, though the elastic moduli are much greater than those measured at atmospheric pressure.

[Figure 11.4: panels (a) and (b), each plotting potential energy V against nearest-neighbor separation r, with the equilibrium spacing r0 marked.]
Fig. 11.4: Potential energy curves (dashed) for nearest neighbors in a crystal lattice. (a) At atmospheric (effectively zero) pressure, the equilibrium spacing is set by the minimum in the potential energy, which is a combination of hard electrostatic repulsion by the nearest neighbors (upper solid curve) and a softer overall attraction associated with all the nearby ions (lower solid curve). (b) At much higher pressure, the softer, attractive component is moved inward and the equilibrium spacing is greatly reduced. The bulk modulus is proportional to the curvature of the potential energy curve at its minimum, and is considerably increased.

We can estimate the magnitude of the pressure P in the earth’s interior by idealizing the earth as an isotropic medium with negligible shear stress, so its stress tensor is like that of a fluid, T = P g (where g is the metric tensor). Then the equation of static equilibrium takes the form
$$\frac{dP}{dr} = -g\rho , \qquad (11.46)$$
where ρ is density and g(r) is the acceleration of gravity at radius r. This equation can be approximated by
$$P \sim \bar\rho g R \sim 300\ {\rm GPa} \sim 3\times 10^6\ {\rm atmospheres} , \qquad (11.47)$$
where g is now the acceleration of gravity at the earth’s surface r = R, and ρ̄ is the earth’s mean density. This agrees well numerically with the accurate value of 360 GPa at the earth’s center. The bulk modulus produces the isotropic pressure P = −KΘ [Eq. (11.29)]; and since Θ = −δρ/ρ [Eq. (10.18)], the bulk modulus can be expressed as
$$K = \frac{dP}{d\ln\rho} . \qquad (11.48)$$

[Strictly speaking, we should distinguish between adiabatic and isothermal variations in Eq. (11.48), but the distinction is small for solids; see the passage following Eq. (10.71). It is significant for gases.] Typically, the bulk modulus inside the earth is 4-5 times the pressure and the shear modulus in the crust and mantle is about half the bulk modulus.
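The estimate (11.47) can be compared with the simplest model that actually solves Eq. (11.46): a sphere of uniform density, for which g(r) = (4π/3)Gρ̄r. The sketch below integrates dP/dr = −gρ̄ inward from P(R) = 0. For this model the central pressure is exactly ½ρ̄gR, roughly 170 GPa for the numbers assumed here, which falls below both ρ̄gR and the 360 GPa quoted above, as one expects for a real earth whose density is concentrated toward the center. The uniform density and the values of G, R and ρ̄ are our assumptions.

```python
import math

G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2
R = 6.37e6       # earth radius, m
RHO = 5.5e3      # assumed uniform density, kg m^-3 (close to the earth's mean density)


def gravity(r):
    """g(r) = G M(r) / r^2 for a uniform-density sphere."""
    return 4.0 * math.pi * G * RHO * r / 3.0


def central_pressure(n_steps=100000):
    """Integrate dP/dr = -g(r) rho from the surface (P = 0) to the centre (midpoint rule)."""
    dr = R / n_steps
    P = 0.0
    for i in range(n_steps):
        r_mid = R - (i + 0.5) * dr
        P += gravity(r_mid) * RHO * dr   # pressure grows by g rho |dr| going inward
    return P


if __name__ == "__main__":
    estimate = RHO * gravity(R) * R          # order-of-magnitude estimate, Eq. (11.47)
    print(f"rho g R          = {estimate / 1e9:6.1f} GPa")
    print(f"uniform-sphere P = {central_pressure() / 1e9:6.1f} GPa")
```

With these inputs the two printed values are about 343 GPa and 172 GPa.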

11.4.1

Body Waves

Virtually all our direct information about the internal structure of the earth comes from measurements of the propagation times of elastic waves generated by earthquakes. There are two fundamental kinds of body waves: the longitudinal and shear modes of Sec. 11.2.


Fig. 11.5: An incident shear wave polarized in the vertical direction (SVi), incident from above on a boundary, produces both a longitudinal (P) wave and an SV wave in reflection and in transmission. If the wave speeds increase across the boundary (the case shown), then the transmitted waves, SVt and Pt, will be refracted away from the vertical. A shear mode, SVr, will be reflected at the same angle as the incident wave. However, the reflected P mode, Pr, will be reflected at a greater angle to the vertical, because it has a greater speed.

These are known in seismology as P-modes and S-modes, respectively. The two polarizations of the shear waves are designated SH and SV, where H and V stand for “horizontal” and “vertical” displacements, i.e., displacements orthogonal to k̂ that are fully horizontal, or that are obtained by projecting the vertical direction ez orthogonal to k̂. We shall first be concerned with what seismologists call high-frequency (of order 1 Hz) modes. This leads to three related simplifications. As typical wave speeds lie in the range 3–14 km s⁻¹, the wavelengths lie in the range ∼1–10 km, which is generally small compared with the distance over which gravity causes the pressure to change significantly (the pressure scale height). It turns out that we then can ignore the effects of gravity on the propagation of small perturbations. In addition, we can regard the medium as effectively homogeneous and infinite and use the local dispersion relations ω = cL,T k. Finally, as the wavelengths are short, we can trace rays through the earth using geometrical optics (Sec. 6.3).

Zone         R (10³ km)  ρ (10³ kg m⁻³)  K (GPa)    µ (GPa)   CP (km s⁻¹)  CS (km s⁻¹)
Inner Core   1.2         13              1400       160       11           2
Outer Core   3.5         10–12           600–1300   –         8–10         –
Mantle       6.35        3–5             100–600    70–250    8–14         5–7
Crust        6.37        3               50         30        6–7          3–4
Ocean        6.37        1               2          –         1.5          –

Table 11.1: Typical outer radii (R), densities (ρ), bulk moduli (K), shear moduli (µ), P-wave speeds and S-wave speeds within different zones of the earth. Note the absence of shear waves in the fluid regions. (Adapted from Stacey 1977.)
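The speed columns of Table 11.1 follow, roughly, from the moduli and densities through CP = [(K + 4µ/3)/ρ]^{1/2} and CS = (µ/ρ)^{1/2}. The sketch below makes that check. Where the table gives a range, the single representative values are our own picks from within it, and the outer core and ocean are assigned µ = 0 because they are fluid.

```python
import math

# Representative (rho [10^3 kg m^-3], K [GPa], mu [GPa]) picked from the ranges in Table 11.1.
ZONES = {
    "outer core": (11.0, 1000.0, 0.0),   # fluid: no shear modulus
    "mantle": (4.0, 350.0, 150.0),
    "crust": (3.0, 50.0, 30.0),
    "ocean": (1.0, 2.0, 0.0),            # fluid: no shear modulus
}


def wave_speeds(rho, K, mu):
    """Return (C_P, C_S) in km/s.

    With rho in 10^3 kg m^-3 and moduli in GPa, modulus/rho is in 10^6 m^2 s^-2,
    so its square root comes out directly in km/s.
    """
    c_p = math.sqrt((K + 4.0 * mu / 3.0) / rho)
    c_s = math.sqrt(mu / rho)
    return c_p, c_s


if __name__ == "__main__":
    for zone, (rho, K, mu) in ZONES.items():
        c_p, c_s = wave_speeds(rho, K, mu)
        print(f"{zone:10s}  C_P = {c_p:5.2f} km/s   C_S = {c_s:5.2f} km/s")
```

For these picks the outer-core, mantle and ocean speeds land in or near the tabulated ranges, while the crust gives CP ≈ 5.5 km/s, a little below the tabulated 6–7 km/s.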

Despite these simplifications, the earth is quite inhomogeneous and the sound speeds vary significantly with radius; see Table 11.1. Two types of variation can be distinguished, the abrupt and the gradual. To a fair approximation, the earth is horizontally stratified below the outer crust. However, there are several abrupt changes in composition in the crust and mantle (including the Moho discontinuity) where the density, pressure and elastic constants apparently change over distances short compared with a wavelength. Seismic waves incident on these discontinuities behave like light incident on the surface of a glass plate; they can be reflected and refracted. In addition, as there are now two different waves with different phase speeds, it is possible to generate SV waves from pure P waves and vice versa at a discontinuity (Fig. 11.5). However, this wave-wave mixing is confined to SV and P; the SH waves do not mix with SV or P. The junction conditions that control this wave mixing and all other details of the waves’ behavior at a discontinuity are: (i) the displacement ξ must be continuous across the boundary (otherwise there would be infinite strain and infinite stress there); and (ii) the net force acting on an element of surface must be zero (otherwise the surface, having no mass, would have infinite acceleration), so the force per unit area acting from the front face of the boundary to the back must be balanced by that acting from the back to the front. If we take the unit normal to the horizontal boundary to be ez, then these boundary conditions become
$$[\xi_j] = [T_{jz}] = 0 , \qquad (11.49)$$

where the notation [X] signifies the difference in X across the boundary and j is a vector index. (For an alternative, more formal derivation of [Tjz] = 0, see Ex. 11.9.) One consequence of these boundary conditions is Snell’s law for the directions of propagation of the waves. Since these continuity conditions must be satisfied all along the boundary and at all times, the phase φ = k·x − ωt of the wave must be continuous across the boundary at all locations x on it and at all times, which means that φ must be the same on the boundary for all transmitted and reflected waves as for the incident wave. This is possible only if the frequency ω, the horizontal wave number kH = k sin α, and the horizontal phase speed cH = ω/kH = ω/(k sin α) are the same for all the waves. (Here kH is the magnitude of the horizontal component of a wave’s propagation vector and α is the angle between its propagation direction and the vertical; cf. Fig. 11.5.) Thus we arrive at Snell’s law: for every reflected or transmitted wave J, the horizontal phase speed must be the same as for the incident wave,
$$\frac{c_J}{\sin\alpha_J} = c_H \quad \text{is the same for all } J . \qquad (11.50)$$
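Snell’s law (11.50) is easy to apply numerically: the incident wave fixes cH = c_inc/sin α_inc, and each outgoing wave J then has sin αJ = cJ/cH, with no propagating wave when cJ > cH. The sketch below does this for an incident SV wave; the four speeds in the example are hypothetical values of roughly mantle-like size, not taken from a specific earth model.

```python
import math


def outgoing_angle(c_incident, alpha_incident_deg, c_out):
    """Angle from the vertical (degrees) of a reflected or transmitted wave with speed c_out,
    or None when c_out exceeds the horizontal phase speed c_H (no propagating wave)."""
    c_H = c_incident / math.sin(math.radians(alpha_incident_deg))
    s = c_out / c_H
    if s > 1.0:
        return None
    return math.degrees(math.asin(s))


def critical_angle(c_incident, c_out):
    """Incidence angle beyond which a wave of speed c_out (> c_incident) cannot be generated."""
    return math.degrees(math.asin(c_incident / c_out))


if __name__ == "__main__":
    c_T1, c_L1 = 6.0, 11.0   # hypothetical S and P speeds above the boundary, km/s
    c_T2, c_L2 = 7.0, 13.0   # hypothetical S and P speeds below the boundary, km/s
    for alpha in (20.0, 40.0):
        print(f"incident SV at {alpha:.0f} deg:",
              {"SV_r": outgoing_angle(c_T1, alpha, c_T1),
               "P_r": outgoing_angle(c_T1, alpha, c_L1),
               "SV_t": outgoing_angle(c_T1, alpha, c_T2),
               "P_t": outgoing_angle(c_T1, alpha, c_L2)})
    print("critical angles: P_r", critical_angle(c_T1, c_L1), "P_t", critical_angle(c_T1, c_L2))
```

At 20° all four outgoing waves exist; at 40° both P waves are absent, since 40° exceeds both P critical angles (about 33° for the reflected and 27° for the transmitted P wave with these speeds).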

It is straightforward but algebraically tedious to compute the reflection and transmission coefficients (e.g. the strength of the transmitted P wave produced by an incident SV wave) for the general case using the boundary conditions (11.49); see, e.g., Eringen and Suhubi (1975). For the very simplest of examples, see Ex. 11.10.

Gradual variation in the wave speeds, due to gradual variations of the elastic moduli and density inside the earth, can be handled using geometrical optics. In the regions between the discontinuities, the pressures and consequently the elastic moduli increase steadily, over many wavelengths, with depth. The elastic moduli generally increase more rapidly than the density, so the wave speeds generally also increase with depth, i.e. dc/dr < 0. This radial variation in c causes the rays along which the waves propagate to bend. The details of this bending are governed by Hamilton’s equations, with the Hamiltonian Ω(x, k) determined by the simple nondispersive dispersion relation Ω = c(x)k (Sec. 6.3.1). Hamilton’s equations in this case reduce to the simple ray equation (6.42), which (since the index of refraction is ∝ 1/c) can be rewritten as
$$\frac{d}{ds}\left(\frac{1}{c}\frac{d\mathbf{x}}{ds}\right) = \nabla\left(\frac{1}{c}\right) . \qquad (11.51)$$
Here s is distance along the ray, so dx/ds = n is the unit vector tangent to the ray. This ray equation can be reexpressed in the form
$$\frac{d\mathbf{n}}{ds} = -(\nabla\ln c)_\perp , \qquad (11.52)$$
where the subscript ⊥ means “projected perpendicular to the ray”; and this in turn means that the ray bends away from the direction in which c increases (i.e., it bends upward inside the earth, since c increases downward), with the radius of curvature of the bend given by
$$R = \frac{1}{|(\nabla\ln c)_\perp|} = \frac{1}{|(d\ln c/dr)\sin\alpha|} . \qquad (11.53)$$

Here α is the angle between the ray and the radial direction; see the bending rays in Fig. 11.6. Figure 11.6 shows schematically the propagation of seismic waves through the earth. At each discontinuity in the earth’s material, Snell’s law governs the directions of the reflected and transmitted waves. As an example, note from Eq. (11.50) that an SV mode incident on a boundary cannot generate a transmitted P mode when its angle of incidence exceeds sin⁻¹(cTi/cLt), nor a reflected P mode when the angle exceeds sin⁻¹(cTi/cLi). (Here we use the standard notation CT for the phase speed of an S wave and CL for that of a P wave; the subscripts i and t denote the incident and transmitted media.) This is what happens at points b and c in Fig. 11.6.
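The ray equation (11.52) can be integrated step by step. The sketch below does this for a hypothetical flat-layered model in which c increases linearly with depth, c = c0 + γ·depth, ignoring the earth’s curvature. For such a linear profile the rays are circular arcs: a ray launched at angle α0 from the vertical turns where c = c0/sin α0, i.e. at depth (c0/γ)(1/sin α0 − 1), and resurfaces a horizontal distance 2(c0/γ)cot α0 away, which gives a check on the integration. The simple Euler stepping and the step size are our own choices.

```python
import math


def trace_ray(c0, gamma, alpha0_deg, ds=0.01):
    """Trace a ray launched downward from the surface z = 0 into a medium with
    c(z) = c0 - gamma * z (z <= 0 inside, so c grows with depth), stepping
    dn/ds = -(grad ln c)_perp with a simple Euler scheme.

    Returns (horizontal range, maximum depth) when the ray returns to z = 0.
    """
    if not 0.0 < alpha0_deg < 90.0:
        raise ValueError("launch angle must lie strictly between 0 and 90 degrees")
    a = math.radians(alpha0_deg)
    x, z = 0.0, 0.0
    nx, nz = math.sin(a), -math.cos(a)        # unit tangent, alpha0 from the vertical
    max_depth = 0.0
    while True:
        c = c0 - gamma * z
        gx, gz = 0.0, -gamma / c              # grad ln c; points downward
        dot = nx * gx + nz * gz
        perp_x, perp_z = gx - dot * nx, gz - dot * nz   # part of grad ln c perpendicular to ray
        x, z = x + nx * ds, z + nz * ds
        nx, nz = nx - perp_x * ds, nz - perp_z * ds
        norm = math.hypot(nx, nz)
        nx, nz = nx / norm, nz / norm          # keep n a unit vector
        max_depth = max(max_depth, -z)
        if z >= 0.0:
            return x, max_depth


if __name__ == "__main__":
    c0, gamma, alpha0 = 6.0, 0.01, 60.0       # km/s, (km/s)/km, degrees (hypothetical)
    X, D = trace_ray(c0, gamma, alpha0)
    a = math.radians(alpha0)
    print("range:", X, "expected", 2.0 * (c0 / gamma) / math.tan(a))
    print("depth:", D, "expected", (c0 / gamma) * (1.0 / math.sin(a) - 1.0))
```

The ray curves back up toward the surface, as Eq. (11.52) requires when c increases with depth.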

11.4.2

Edge Waves

One phenomenon that is important in seismology but is absent for many other types of wave motion is the existence of “edge waves”, i.e., waves that propagate along a discontinuity in the elastic medium. An important example is surface waves, which propagate along the surface of a medium (e.g., the earth) and decay exponentially with depth. Waves with such exponential decay are sometimes called evanescent. The simplest type of surface wave is called a Rayleigh wave. We shall now analyze Rayleigh waves for the idealization of a plane semi-infinite solid. This discussion must be modified to allow for both the density stratification and the surface curvature when it is applied to the earth. However, the qualitative character of the mode is unchanged. Rayleigh waves are an intertwined mixture of P and SV waves; and, in analyzing them, it is useful to resolve their displacement vector ξ into a sum of a (longitudinal) P-wave component, ξL, and a (transverse) S-wave component, ξT. Consider a semi-infinite elastic medium and introduce a local Cartesian coordinate system with ez normal to the surface, with ex lying in the surface, and with the propagation vector k in the ez-ex plane. The propagation vector will have a real component along the horizontal, ex direction, corresponding to true propagation, and an imaginary component along the ez direction, corresponding to an exponential decay of the amplitude as one goes downward into the medium.

Fig. 11.6: Seismic wave propagation in a schematic earth model (inner core, outer core, mantle, crust). An SV wave made by an earthquake, E, propagates to the crust-mantle boundary at a, where it generates two transmitted waves (SV and P) and two reflected waves (SV and P). The transmitted SV wave propagates along rays that bend upward a bit (geometric-optics bending) and hits the mantle-outer-core boundary at b. There can be no transmitted SV wave at b because the outer core is fluid; there can be no transmitted or reflected P wave because the angle of incidence of the SV wave is too great; so the SV wave is perfectly reflected. It then travels along an upward-curving ray to the crust-mantle interface at d, where it generates four waves, two of which hit the earth’s surface. The earthquake E also generates an SV wave traveling almost radially inward, through the crust-mantle interface at q, to the mantle-outer-core interface at r. Because the outer core is liquid, it cannot support an SV wave, so only a P wave is transmitted into the outer core at r. That P wave propagates to the interface with the inner core at s, where it regenerates an SV wave (shown) along with the transmitted and reflected P waves. The SV wave refracts back upward in the inner core, and generates a P wave at the interface with the outer core at t; that P wave propagates through the liquid outer core to u, where it generates an SV wave along with its transmitted and reflected P waves; that SV wave travels nearly radially outward, through v to the earth’s surface.

Fig. 11.7: Rayleigh waves in a semi-infinite elastic medium.

In order for the longitudinal (P-wave) and transverse (SV-wave) parts of the wave to remain in phase with each other as they propagate along the boundary, they must have the same values of the frequency ω and horizontal wave number kx. However, there is no reason why their vertical e-folding lengths should be the same, i.e. why their imaginary kz’s should be the same. We therefore shall denote their imaginary kz’s by −iqL for the longitudinal (P-wave) component and −iqT for the transverse (S-wave) component, and we shall denote kx by k. Focus attention, first, on the longitudinal part of the wave. Its displacement must have the form
$$\boldsymbol{\xi}^L = \mathbf{A}\,e^{q_L z + i(kx - \omega t)} , \qquad z \le 0 . \qquad (11.54)$$
Substituting into the general dispersion relation ω² = CL²(kx² + kz²) for longitudinal waves, with kx = k and kz = −iqL, we obtain
$$q_L = \left(k^2 - \frac{\omega^2}{c_L^2}\right)^{1/2} . \qquad (11.55)$$

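Now, the longitudinal displacement field is irrotational (curl-free), so we can write
$$\xi^L_{x,z} = \xi^L_{z,x} \qquad (11.56)$$
or
$$\xi^L_z = -\frac{i q_L}{k}\,\xi^L_x . \qquad (11.57)$$
As the transverse component is solenoidal (divergence-free), the expansion of the combined P-T wave is produced entirely by the P component:
$$\Theta = \nabla\cdot\boldsymbol{\xi} = ik\left(1 - \frac{q_L^2}{k^2}\right)A_x , \qquad (11.58)$$
where Ax is the x component of A.

Now turn to the transverse (SV-wave) component. We write
$$\boldsymbol{\xi}^T = \mathbf{B}\,e^{q_T z + i(kx - \omega t)} , \qquad z \le 0 , \qquad (11.59)$$
where (by virtue of the transverse dispersion relation)
$$q_T = \left(k^2 - \frac{\omega^2}{c_T^2}\right)^{1/2} . \qquad (11.60)$$
As the transverse mode is solenoidal, we obtain
$$\xi^T_z = -\frac{ik}{q_T}\,\xi^T_x , \qquad (11.61)$$
and for the rotation
$$\phi_y = \frac{1}{2}\,\mathbf{e}_y\cdot\nabla\times\boldsymbol{\xi}^T = \frac{1}{2}\,q_T\left(1 - \frac{k^2}{q_T^2}\right)B_x , \qquad (11.62)$$
where Bx is the x component of B.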

We must next impose boundary conditions at the surface. As the surface is free, there will be no force acting upon it, so
$$\mathbf{T}\cdot\mathbf{e}_z\big|_{z=0} = 0 , \qquad (11.63)$$
which is a special case of the general boundary condition (11.49). (Note that we can evaluate the stress at the unperturbed surface location rather than at the displaced surface, as we are only working to linear order.) The normal stress is
$$-T_{zz} = K\Theta + 2\mu\left(\xi_{z,z} - \tfrac{1}{3}\Theta\right) = 0 , \qquad (11.64)$$
and the tangential stress is
$$-T_{xz} = \mu\,(\xi_{z,x} + \xi_{x,z}) = 0 . \qquad (11.65)$$
Combining Eqs. (11.58), (11.62), (11.64) and (11.65), we obtain
$$(k^2 + q_T^2)^2 = 4 q_L q_T k^2 . \qquad (11.66)$$

Next we substitute for qL, qT from (11.55) and (11.60) to obtain the dispersion relation
$$\zeta^3 - 8\zeta^2 + \frac{8(2-\nu)}{1-\nu}\,\zeta - \frac{8}{1-\nu} = 0 , \qquad (11.67)$$
where
$$\zeta = \left(\frac{\omega}{C_T k}\right)^2 . \qquad (11.68)$$
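The physically relevant root of Eq. (11.67) can be found numerically. Writing f(ζ) for the left-hand side, f(0) = −8/(1 − ν) < 0 and f(1) = 1 > 0 for every ν < 1, so bisection on 0 < ζ < 1 always brackets a root for which qL and qT are real. The sketch below does this and then, as a consistency check, substitutes the root back into Eq. (11.66) using CL²/CT² = 2(1 − ν)/(1 − 2ν). Bisection is our choice of root finder, not a method taken from the text.

```python
import math


def rayleigh_zeta(nu, tol=1e-14):
    """Root 0 < zeta < 1 of Eq. (11.67), zeta = (omega / (C_T k))^2 = (C_R / C_T)^2."""
    def f(z):
        return z ** 3 - 8.0 * z ** 2 + 8.0 * (2.0 - nu) / (1.0 - nu) * z - 8.0 / (1.0 - nu)

    lo, hi = 0.0, 1.0          # f(lo) < 0 < f(hi) for every nu < 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def boundary_residual(nu, zeta):
    """(k^2 + q_T^2)^2 - 4 q_L q_T k^2 from Eq. (11.66), in units k = C_T = 1."""
    cL2_over_cT2 = 2.0 * (1.0 - nu) / (1.0 - 2.0 * nu)
    q_L = math.sqrt(1.0 - zeta / cL2_over_cT2)
    q_T = math.sqrt(1.0 - zeta)
    return (1.0 + q_T ** 2) ** 2 - 4.0 * q_L * q_T


if __name__ == "__main__":
    for nu in (0.0, 0.2, 0.25, 0.3, 0.4):
        zeta = rayleigh_zeta(nu)
        print(f"nu = {nu:4.2f}  C_R/C_T = {math.sqrt(zeta):.4f}  "
              f"residual = {boundary_residual(nu, zeta):.1e}")
```

For ν = 0.25 the root is ζ = 2 − 2/√3, giving CR ≈ 0.919 CT, in line with the “roughly 0.9” quoted in the text for rocks.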

The dispersion relation (11.67) is a third-order polynomial in ζ ∝ ω². Its physically relevant root is the one with 0 < ζ < 1, for which qL and qT are both real, so that the wave is evanescent; the other roots can also be real and positive (for ν = 0.25, for example, they are ζ = 4 and ζ = 2 + 2/√3, both greater than unity). From Eqs. (11.67) and (11.68), we see that for a Poisson ratio characteristic of rocks, 0.2 ≲ ν ≲ 0.3, the phase speed of a Rayleigh wave is roughly 0.9 times the speed of a pure shear wave; cf. Fig. 11.8. Rayleigh waves propagate around the surface of the earth rather than penetrate the interior. However, our treatment is inadequate because their wavelengths, typically 1–10 km if generated by an earthquake, are not necessarily small compared with the pressure scale heights in the outer crust. Our wave equation has to be modified to include these vertical gradients. This vertical stratification has an important additional consequence. If, ignoring these gradients, we attempt to find an orthogonal surface mode involving just SH waves, we find that we cannot simultaneously satisfy the surface boundary conditions on displacement and stress with a single evanescent wave. We need two modes to do this. However, when we allow for stratification, the strong refraction allows an SH surface wave to propagate. This is known as a Love wave. Its practical importance stems from the fact that seismic waves are also created by underground nuclear explosions, and it is necessary to be able to distinguish explosion-generated waves from earthquake waves. An earthquake is usually caused by the transverse slippage of two blocks of crust across a fault line. It is therefore an efficient generator of shear modes, including Love waves. By contrast, explosions involve radial motions away from the point of explosion and are inefficient emitters of Love waves. This allows these two sources of seismic disturbance to be distinguished.

[Figure 11.8: CR/CT (vertical axis, 0.8 to 1.0) plotted against Poisson’s ratio ν (horizontal axis, 0 to 0.5).]
Fig. 11.8: Solution of the dispersion relation (11.67) for different values of Poisson’s ratio, ν.

11.4.3

Green’s Function for a Homogeneous Half Space

To get insight into the combination of waves generated by a localized source, such as an explosion or earthquake, it is useful to examine the Green’s function for excitations in a homogeneous half space. Physicists define the Green’s function Gjk(x, t; x′, t′) to be the displacement response ξj(x, t) to a unit delta-function force in the ek direction at location x′ and time t′, F = δ(x − x′)δ(t − t′)ek. Geophysicists sometimes find it useful to work, instead, with the “Heaviside Green’s function,” G^H_jk(x, t; x′, t′), which is the displacement response ξj(x, t) to a unit step-function force (one that turns on to unit strength and remains forever constant afterwards) at x′ and t′: F = δ(x − x′)H(t − t′)ek. Because δ(t − t′) is the time derivative of the Heaviside step function H(t − t′), the Heaviside Green’s function is the time integral of the physicists’ Green’s function. The Heaviside Green’s function has the advantage that one can easily see, visually, the size of the step functions it contains, by contrast with the size of the delta functions contained in the physicists’ Green’s function.

It is a rather complicated task to compute the Heaviside Green’s function, and geophysicists have devoted much effort to doing so. We shall not give details of such computations, but merely show the Green’s function graphically in Fig. 11.9 for an instructive situation: the displacement produced by a step-function force in a homogeneous half space with the observer at the surface and the force at two different locations: (a) a point nearly beneath the observer, and (b) a point close to the surface and some distance away in the x direction. Several features of this Green’s function deserve note: (i) For the source nearly beneath the observer [graphs (a)], there is no sign of any Rayleigh wave, whereas for the source close to the surface, the Rayleigh wave is the strongest feature in the x and z (longitudinal and vertical) displacements but is absent from the y (transverse) displacement. (ii) The y (transverse) component of force produces a transverse displacement that is strongly concentrated in the S-wave. (iii) The x and z (longitudinal and vertical) components of force produce x and z displacements that include P-waves, S-waves, and (for the source near the surface) Rayleigh waves. (iv) The gradually changing displacements that occur between the arrival of the turn-on P-wave and the turn-on S-wave are due to P-waves that hit the surface some distance from the observer, and from there diffract to the observer as a mixture of P- and S-waves; similarly for the gradual changes of displacement after the turn-on S-wave. The complexity of seismic waves arises in part from the richness of features in this homogeneous-half-space Green’s function, in part from the influences of the earth’s inhomogeneities, and in part from the complexity of an earthquake’s or explosion’s forces.

[Figure 11.9: for each source position, (a) and (b), panels show G^H_xx and G^H_zx (horizontal, longitudinal force), G^H_yy (horizontal, transverse force), and G^H_xz and G^H_zz (vertical force) versus t (sec) from 1 to 4 s; the P and S arrivals are marked in (a), and the P, S and Rayleigh (R) arrivals in (b).]
Fig. 11.9: The Heaviside Green’s function (displacement response to a step-function force) in a homogeneous half space; adapted from Figs. 2 and 4 of Johnson (1974). The observer is at the surface. The force is applied at a point in the x−z plane, with a direction given by the second index of G^H; the displacement direction is given by the first index of G^H. In (a), the source is nearly directly beneath the observer, so the waves propagate nearly vertically upward; more specifically, the source is at 10 km depth and 2 km distance along the horizontal x direction. In (b), the source is close to the surface and the waves propagate nearly horizontally, in the x direction; more specifically, the source is at 2 km depth and 10 km distance along the horizontal x direction. The longitudinal and transverse speeds are CP = 8 km/s and CS = 4.62 km/s, and the density is 3.30 g/cm³. For a force of 1 dyne, a division on the vertical scale is 10⁻¹⁹ cm. The moments of arrival of the P-wave, S-wave and Rayleigh wave, measured from the moment of force turn-on, are indicated on the horizontal axis.

Fig. 11.10: Surface displacements associated with three simple classes of free oscillation. (a) Radial modes. (b) l = 2 spheroidal mode. (c) Torsional mode.
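The direct P and S arrival times for the Fig. 11.9 geometry follow from the caption’s parameters alone. The sketch below computes straight-line travel times r/CP and r/CS from the source to the surface observer, using the caption’s longitudinal and transverse speeds, 8 km/s and 4.62 km/s, and also recovers the Poisson ratio implied by (CP/CS)² = 2(1 − ν)/(1 − 2ν). Diffracted arrivals and the Rayleigh wave are not modeled.

```python
import math

C_P = 8.0    # longitudinal speed from the Fig. 11.9 caption, km/s
C_S = 4.62   # transverse speed from the Fig. 11.9 caption, km/s


def direct_arrivals(depth_km, horizontal_km):
    """Straight-ray P and S travel times (s) from a buried source to a surface observer."""
    r = math.hypot(depth_km, horizontal_km)
    return r / C_P, r / C_S


def poisson_ratio(c_p, c_s):
    """Invert (c_p / c_s)^2 = 2(1 - nu) / (1 - 2 nu) for nu."""
    x = (c_p / c_s) ** 2
    return (x - 2.0) / (2.0 * (x - 1.0))


if __name__ == "__main__":
    for label, depth, horizontal in (("(a)", 10.0, 2.0), ("(b)", 2.0, 10.0)):
        t_p, t_s = direct_arrivals(depth, horizontal)
        print(f"{label}: t_P = {t_p:.2f} s, t_S = {t_s:.2f} s")
    print(f"implied Poisson ratio: {poisson_ratio(C_P, C_S):.3f}")
```

Both source positions lie √104 ≈ 10.2 km from the observer, so both give tP ≈ 1.27 s and tS ≈ 2.21 s, which is where the P and S marks should fall on the time axis; the implied ν ≈ 0.25.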

11.4.4

Free Oscillations of Solid Bodies

In computing the dispersion relations for body (P- and S-wave) and surface (Rayleigh-wave) modes, we have assumed that the wavelength is small compared with the earth’s radius, so that the modes form a continuous frequency spectrum. However, it is also possible to excite global wave modes in which the whole earth “rings”. If we regard the earth as spherically symmetric, then we can isolate three fundamental types of oscillation: radial, spheroidal and torsional. If we introduce spherical polar coordinates for the displacement, then it is possible to separate and solve the equations of elastodynamics to find the normal modes, just as one solves the Schrödinger equation for a central potential. Each of the three types of modes has a displacement vector ξ characterized by its own type of spherical harmonic. The spheroidal modes have radial displacements proportional to Ylm(θ, φ)er (where θ, φ are spherical coordinates, Ylm is the scalar spherical harmonic of order l and m, and er is the unit radial vector), and they have nonradial components proportional to ∇Ylm. When one ignores the tiny nonsphericity of the earth and ignores Coriolis and centrifugal forces due to the earth’s rotation, their eigenfrequencies are independent of m, so they can be studied by specializing to m = 0, in which case the displacements become
$$\xi_r \propto P_l(\cos\theta) , \qquad \xi_\theta \propto \sin\theta\,P_l'(\cos\theta) . \qquad (11.69)$$
For the special case l = 2, these displacements deform the earth in a spheroidal manner, which is the origin of the name “spheroidal”. [In Eq. (11.69) Pl is the Legendre polynomial and Pl′ is the derivative of Pl with respect to its argument.] The radial modes are the special case l = 0 of these spheroidal modes. It is often mistakenly asserted that there are no l = 1 modes because of conservation of momentum. In fact, l = 1 modes do exist: for example, the central regions of the earth can move up while the outer regions move down. The l = 2 spheroidal mode has a period of 53 min and can ring for about 1000 periods. (We say that its quality factor is Q ∼ 1000.) This is typical for solid planets. Toroidal (torsional) modes have vanishing radial displacements, and their nonradial displacements are proportional to the vector spherical harmonic er × ∇Ylm. As for spheroidal modes, spherical symmetry of the unperturbed earth guarantees that the eigenfrequencies will be independent of the azimuthal quantum number m, so m = 0 is representative. For m = 0 the only nonzero component of the vector spherical harmonic er × ∇Ylm is in the φ direction, and it gives
$$\xi_\phi \propto \sin\theta\,P_l'(\cos\theta) . \qquad (11.70)$$
In these modes alternate zones of different latitude oscillate in opposite directions (clockwise or counterclockwise at some chosen moment of time), in such a way as to conserve total angular momentum. When one writes the displacement vector ξ for a general vibration of the earth as a sum over these various types of normal modes, and inserts that sum into the wave equation (11.4b) (augmented, for greater realism, by gravitational forces), spherical symmetry of the unperturbed earth guarantees that the various modes will separate from each other, and for each mode the wave equation will give a radial wave equation analogous to that for a hydrogen atom in quantum mechanics. The boundary condition T · er = 0 at the earth’s surface constrains the solutions of the radial wave equation, for each mode, to be a discrete set, which one can label by the number n of radial nodes that they possess (just as for the hydrogen atom).
The frequencies of the modes increase with both n and l. For small values of the quantum numbers, the modes are quite sensitive to the model assumed for the earth’s structure. For example, they are sensitive to whether one correctly includes the gravitational restoring force in the wave equation. However, for large l and n, the spheroidal and toroidal modes become standing combinations of P waves, SV waves, SH waves, Rayleigh and Love waves, and therefore they are rather insensitive to one’s ignoring the effects of gravity.
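The m = 0 angular patterns (11.69) and (11.70) are easy to tabulate. The sketch below builds Pl and Pl′ from the standard Legendre recurrences and evaluates ξr ∝ Pl(cos θ) and ξθ ∝ sin θ Pl′(cos θ); the latter is also the angular factor of ξφ for a toroidal mode. Overall signs and normalizations are arbitrary, as in the text’s proportionalities, and the function names are ours.

```python
import math


def legendre_and_derivative(l, x):
    """Return (P_l(x), P_l'(x)) for -1 <= x <= 1 using the standard recurrences."""
    if l == 0:
        return 1.0, 0.0
    p_prev, p = 1.0, x                      # P_0, P_1
    for n in range(1, l):                   # (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    if abs(x) == 1.0:                       # endpoint values: P_l'(+-1) = (+-1)^(l+1) l(l+1)/2
        return p, x ** (l + 1) * l * (l + 1) / 2.0
    return p, l * (x * p - p_prev) / (x * x - 1.0)   # (x^2 - 1) P_l' = l (x P_l - P_{l-1})


def spheroidal_m0(l, theta):
    """Angular factors (xi_r, xi_theta) of Eq. (11.69) at colatitude theta (radians)."""
    p, dp = legendre_and_derivative(l, math.cos(theta))
    return p, math.sin(theta) * dp


if __name__ == "__main__":
    for deg in (0, 30, 45, 60, 90):
        xi_r, xi_t = spheroidal_m0(2, math.radians(deg))
        print(f"theta = {deg:2d} deg: xi_r ~ {xi_r:+.3f}, xi_theta ~ {xi_t:+.3f}")
```

For l = 2 the radial factor is +1 at the poles and −1/2 at the equator, so during the oscillation the earth alternately becomes prolate and oblate.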

11.4.5

Seismic tomography

Observations of all of these types of seismic waves clearly encode much information about the earth’s structure, and inverting the measurements to infer this structure has become a highly sophisticated and numerically intensive branch of geophysics. The travel times of the P and S body waves can be measured at various points over the earth’s surface and essentially allow CL and CT, and hence K/ρ and µ/ρ, to be determined as functions of radius inside the earth. Travel times are ≲ 1 hour. Using this type of analysis, seismologists can infer the presence of hot and cold regions within the mantle and then show how the rocks are circulating under the crust.
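A crude check of the statement that travel times are ≲ 1 hour: the sketch below adds up the time for a P wave to cross the earth along a diameter, using the outer radii of Table 11.1 and one P speed per zone taken from the middle of each tabulated range. Straight rays and piecewise-constant speeds are simplifying assumptions; real rays refract, as described in Sec. 11.4.1.

```python
# (zone, outer radius in km, assumed P speed in km/s), innermost first; radii from
# Table 11.1, speeds chosen from the middle of each tabulated range.
ZONES = [
    ("inner core", 1200.0, 11.0),
    ("outer core", 3500.0, 9.0),
    ("mantle", 6350.0, 11.0),
    ("crust", 6370.0, 6.5),
]


def diametral_travel_time(zones):
    """Seconds for a straight P ray to cross the earth through its centre."""
    t, r_inner = 0.0, 0.0
    for _zone, r_outer, c in zones:
        t += 2.0 * (r_outer - r_inner) / c   # each shell is crossed twice
        r_inner = r_outer
    return t


if __name__ == "__main__":
    t = diametral_travel_time(ZONES)
    print(f"{t:.0f} s = {t / 60.0:.1f} min")
```

This gives about 21 minutes for the longest direct path, comfortably under an hour.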

It is also possible to combine the observed travel times with the earth’s equation of elastostatic equilibrium,
$$\frac{dP}{dr} = \frac{K}{\rho}\frac{d\rho}{dr} = -g(r)\rho, \qquad (11.71)$$
where the local gravity is given by
$$g(r) = \frac{4\pi G}{r^2}\int_0^r r'^2\rho(r')\,dr', \qquad (11.72)$$

to determine the distributions of density, pressure and elastic constants. Measurements of Rayleigh and Love waves can be used to probe the surface layers. The results of this procedure are then input to obtain free-oscillation frequencies, which compare well with the observations. The damping rates for the free oscillations furnish information on the interior viscosity.

****************************

EXERCISES

Exercise 11.9 Derivation: Junction Condition at a Discontinuity
Derive the junction condition $[T_{jz}] = 0$ at a horizontal discontinuity between two media. Use the same method as one uses in electrodynamics to show that the normal component of the magnetic field must be continuous: integrate the equation of motion $\rho\, d\mathbf{v}/dt = -\nabla\cdot\mathbf{T}$ over the volume of an infinitesimally thin “pill box” centered on the boundary, and convert the volume integral to a surface integral via Gauss’s theorem.

Exercise 11.10 Example: Reflection and Transmission of Normal, Longitudinal Waves at a Boundary
Consider a longitudinal elastic wave incident normally on the boundary between two media, labeled 1 and 2. By matching the displacement and the normal component of stress at the boundary, show that the ratio of the transmitted wave amplitude to the incident amplitude is given by
$$t = \frac{2Z_1}{Z_1 + Z_2},$$
where $Z_{1,2} = [\rho_{1,2}(K_{1,2} + 4\mu_{1,2}/3)]^{1/2}$ is known as the acoustic impedance. (The impedance is independent of frequency and is just a characteristic of the material.) Likewise, evaluate the amplitude reflection coefficient and verify that wave energy flux is conserved.

Exercise 11.11 Example: Earthquakes
The magnitude $M$ of an earthquake is a quantitative measure of the strength of the seismic waves it creates. Roughly speaking, the elastic wave energy release can be inferred semiempirically from the magnitude using the formula
$$E = 10^{5.2 + 1.44M}\ \mathrm{J}.$$

Fig. 11.11: Earthquake: The region of the fault that slips (solid rectangle), and the volume over which the strain is relieved, on one side of the fault (dashed region). [Figure labels: Fault, $\lambda$, $VT$.]

The largest earthquakes have magnitude $\sim 8.5$. One type of earthquake is caused by slippage along a fault deep in the crust. Suppose that most of the seismic power in an earthquake with $M \sim 8.5$ is emitted at frequencies $\sim 1$ Hz, and that the quake lasts for a time $T \sim 100$ s. If $V$ is an average wave speed, then it is believed that the stress is relieved over an area of fault of length $\sim VT$ and a depth of order one wavelength (Fig. 11.11). By comparing the stored elastic energy with the measured energy release, make an estimate of the minimum strain prior to the earthquake. Is this reasonable? Hence estimate the typical displacement during the earthquake in the vicinity of the fault. Make an order-of-magnitude estimate of the acceleration measurable by a seismometer in the next state and in the next continent. (Ignore the effects of density stratification, which are actually quite significant.)

Exercise 11.12 Example: Normal Modes of an Elastic, Homogeneous Sphere
EXERCISE NOT YET WRITTEN.

****************************
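As a starting point for Exercise 11.11, the semiempirical energy formula can be evaluated directly. Here is a small sketch, assuming only the numbers quoted in the exercise ($M = 8.5$, $T = 100$ s). The function name is hypothetical. The sketch gives $E \approx 3\times10^{17}$ J and a mean seismic power $E/T$. It also shows that each unit of magnitude multiplies the energy by $10^{1.44} \approx 28$.

```python
import math


def quake_energy(magnitude):
    """Elastic-wave energy (J) from the semiempirical E = 10**(5.2 + 1.44 M) J."""
    return 10.0 ** (5.2 + 1.44 * magnitude)


M = 8.5    # magnitude of the largest earthquakes, from the exercise
T = 100.0  # quake duration in seconds, from the exercise

E = quake_energy(M)
ratio_per_unit_magnitude = quake_energy(M) / quake_energy(M - 1.0)
mean_power = E / T  # average seismic power over the quake, W

print(f"E(M = {M}) = {E:.2e} J")
print(f"energy ratio per unit magnitude = {ratio_per_unit_magnitude:.1f}")
print(f"mean seismic power = {mean_power:.2e} W")
```

The remaining steps of the exercise (stored elastic energy, strain, displacement, acceleration) are left to the reader.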

11.5

The Relationship of Classical Waves to Quantum Mechanical Excitations.

In the previous chapter, we explored the limits of the continuum approximation. We showed how we must acknowledge that solids are composed of atoms in order to account for the magnitude of the elastic constants and to explain why most solids yield under comparatively small strain. A quite different demonstration of the limits of the continuum approximation is provided by the normal modes of vibration of a finite-sized solid body, e.g., the sphere treated in Sec. 11.4.4 and Ex. 11.12. For any such body, one can solve the wave equation (11.4b) [subject to the vanishing-surface-force boundary condition $\mathbf{T}\cdot\mathbf{n} = 0$, Eq. (10.21)] to find the body’s normal modes, as we did in Ex. 11.12 for the sphere. We shall label the normal modes by a single index $N$. We shall denote the eigenfrequency of mode $N$ by $\omega_N$ and its (typically complex) eigenfunction by $\boldsymbol{\xi}_N$. Then any general, small-amplitude disturbance in the body can be decomposed into a linear superposition of these normal modes:
$$\boldsymbol{\xi}(\mathbf{x},t) = \Re \sum_N a_N(t)\,\boldsymbol{\xi}_N(\mathbf{x}), \qquad a_N = A_N \exp(-i\omega_N t). \qquad (11.73)$$

Here $\Re$ means to take the real part, $a_N$ is the complex generalized coordinate of mode $N$, and $A_N$ is its complex amplitude. It is convenient to normalize the eigenfunctions so that
$$\int \rho\,|\boldsymbol{\xi}_N|^2\,dV = M, \qquad (11.74)$$
where $M$ is the mass of the body; $A_N$ then measures the mean physical displacement in mode $N$.

Classical electromagnetic waves in vacuo are described by linear Maxwell equations and, once excited, will essentially propagate forever. This is not so for elastic waves, where the linear wave equation is only an approximation. Nonlinearities, and most especially impurities and defects in the homogeneous structure of the body’s material, will cause the different modes to interact weakly. As a result, their complex amplitudes $A_N$ change slowly with time according to a damped simple harmonic oscillator differential equation of the form
$$\ddot a_N + (2/\tau_N)\,\dot a_N + \omega_N^2 a_N = F'_N/M. \qquad (11.75)$$
Here the second term on the left-hand side is a damping term that will cause the mode to decay as long as $\tau_N > 0$, and $F'_N$ is a fluctuating or stochastic force on mode $N$ caused by weak coupling to the other modes. Equation (11.75) is the Langevin equation that we studied in Chap. 5. The strength and spectrum of the fluctuating force $F'_N$ are determined by the fluctuation-dissipation theorem, Eq. (5.111). If the modes are thermalized at temperature $T$, then the fluctuating forces maintain an average energy of $kT$ in each one.

Now, what happens quantum mechanically? The ions and electrons in an elastic solid interact so strongly that it is very difficult to analyze them directly. A quantum mechanical treatment is much easier if one makes a canonical transformation from the coordinates and momenta of the individual ions or atoms to new, generalized coordinates $\hat x_N$ and momenta $\hat p_N$ that represent weakly interacting normal modes.
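These coordinates and momenta are Hermitian operators, and they are related to the quantum mechanical complex generalized coordinate $\hat a_N$ by
$$\hat x_N = \tfrac{1}{2}\left(\hat a_N + \hat a_N^\dagger\right), \qquad (11.76a)$$
$$\hat p_N = \frac{M\omega_N}{2i}\left(\hat a_N - \hat a_N^\dagger\right), \qquad (11.76b)$$
where the dagger denotes the Hermitian adjoint. We can transform back to obtain an expression for the displacement of the $i$’th ion,
$$\hat{\boldsymbol{\xi}}(\mathbf{x}_i) = \tfrac{1}{2}\sum_N \left[\hat a_N\,\boldsymbol{\xi}_N(\mathbf{x}_i) + \hat a_N^\dagger\,\boldsymbol{\xi}_N^*(\mathbf{x}_i)\right] \qquad (11.77)$$
[a quantum version of Eq. (11.73)]. The Hamiltonian can be written in terms of these coordinates as
$$\hat H = \sum_N \left(\frac{\hat p_N^2}{2M} + \tfrac{1}{2}M\omega_N^2\,\hat x_N^2\right) + \hat H_{\rm int}, \qquad (11.78)$$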

where the first term is a sum of simple harmonic oscillator Hamiltonians for individual modes. $\hat H_{\rm int}$ is the perturbative interaction Hamiltonian, which takes the place of the combined damping and stochastic forcing terms in the classical Langevin equation (11.75). When the various modes are thermalized, the mean energy in mode $N$ takes on the standard Bose-Einstein form
$$\bar E_N = \hbar\omega_N\left[\frac{1}{2} + \frac{1}{\exp(\hbar\omega_N/kT) - 1}\right] \qquad (11.79)$$
[Eq. (3.27b) with vanishing chemical potential and augmented by a “zero-point energy” of $\tfrac{1}{2}\hbar\omega$], which reduces to $kT$ in the classical limit $\hbar \to 0$.

As the unperturbed Hamiltonian for each mode is identical to that for a particle in a harmonic oscillator potential well, it is sensible to think of each wave mode in a manner analogous to such a particle-in-well. The particle-in-well can reside in any one of a series of discrete energy levels lying above the “zero point” energy of $\hbar\omega/2$ and separated by $\hbar\omega$. In the same way, each wave mode with frequency $\omega_N$ must have an energy $(n + 1/2)\hbar\omega_N$, where $n$ is a non-negative integer. The operator that causes the energy of the mode to decrease by $\hbar\omega_N$ is the annihilation operator for mode $N$,
$$\hat\alpha_N = \left(\frac{M\omega_N}{2\hbar}\right)^{1/2}\hat a_N \qquad (11.80)$$
[this normalization gives $[\hat\alpha_N, \hat\alpha_N^\dagger] = 1$, because Eqs. (11.76) together with $[\hat x_N, \hat p_N] = i\hbar$ imply $[\hat a_N, \hat a_N^\dagger] = 2\hbar/M\omega_N$],

and the operator that causes an increase in the energy by $\hbar\omega_N$ is its Hermitian conjugate, the creation operator $\hat\alpha_N^\dagger$. In the case of wave modes, it is useful to think of each increase or decrease in the energy as the creation or annihilation of an individual quantum or “particle” of energy. Thus, when the energy in mode $N$ is $(n + 1/2)\hbar\omega_N$, there are $n$ quanta (particles) present. These particles are called phonons. Phonons are not conserved, and because they can coexist in the same state (the same mode), they are bosons. They have individual energies and momenta, which must be conserved in their interactions with each other and with other types of particles, e.g. electrons.

The important question is now: given an elastic solid at finite temperature, do we think of its thermal agitation as a superposition of classical wave modes, or do we regard it as a gas of quanta? The answer depends on what we want to do. From a purely fundamental viewpoint, the quantum mechanical description takes precedence. However, for many problems where the number of phonons per mode $n_N \sim kT/\hbar\omega_N$ is large compared to one, the classical description is amply adequate and much easier to handle. We do not need a quantum treatment when computing the normal modes of a vibrating building excited by an earthquake, or when trying to understand how to improve the sound quality of a violin. Here the difficulty is in accommodating the boundary conditions so as to determine the normal modes. All this was expected. What comes as more of a surprise is that often, for purely classical problems where $\hbar$ is quantitatively irrelevant, the fastest way to proceed formally is to follow the quantum route and then take the limit $\hbar \to 0$. We shall see this graphically demonstrated when we discuss nonlinear plasma physics in Chap. 22.
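Equation (11.79) makes the criterion $n_N \sim kT/\hbar\omega_N \gg 1$ quantitative. Here is a minimal sketch in Python, assuming CODATA values for $\hbar$ and $k$ and a temperature of 300 K. The 53-minute earth mode comes from Sec. 11.4.4. The 10 THz lattice frequency is an illustrative assumption, not a value from the text. `math.expm1` is used so that the occupation stays accurate when $\hbar\omega_N \ll kT$.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K


def phonon_occupation(omega, T):
    """Mean number of quanta 1/(exp(hbar omega / kT) - 1) in a thermalized mode."""
    return 1.0 / math.expm1(HBAR * omega / (K_B * T))


def mean_mode_energy(omega, T):
    """Eq. (11.79): hbar omega [1/2 + 1/(exp(hbar omega / kT) - 1)]."""
    return HBAR * omega * (0.5 + phonon_occupation(omega, T))


T = 300.0                                  # assumed temperature, K
omega_earth = 2.0 * math.pi / (53 * 60.0)  # l = 2 spheroidal earth mode, 53-min period
omega_thz = 2.0 * math.pi * 1.0e13         # an assumed 10 THz lattice phonon

for name, omega in [("earth mode", omega_earth), ("10 THz phonon", omega_thz)]:
    n = phonon_occupation(omega, T)
    ratio = mean_mode_energy(omega, T) / (K_B * T)
    print(f"{name}: n = {n:.3g}, E/kT = {ratio:.6f}")
```

For the earth mode, the occupation is of order $10^{16}$ and $\bar E_N$ equals $kT$ to many digits, so a classical treatment suffices. For the 10 THz phonon, $n_N < 1$, and the zero-point term pushes $\bar E_N$ about 20% above $kT$.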

Box 11.3 Important Concepts in Chapter 11
• Elastodynamic conservation of mass and momentum, Sec. 11.2.1
• Methods of deriving and solving wave equations in continuum mechanics, Box 11.2
• Decomposition of elastodynamic waves into longitudinal and transverse components, Sec. 11.2.2 and Ex. 11.1
• Dispersion relation and propagation speeds for longitudinal and transverse waves, Secs. 11.2.3 and 11.2.4
• Energy density and energy flux of elastodynamic waves, Sec. 11.2.5
• Waves on rods: compression waves, torsion waves, string waves, flexural waves, Secs. 11.3.1–11.3.4
• Edge waves, and Rayleigh waves as an example, Sec. 11.4.2
• Wave mixing in reflections off boundaries, Sec. 11.4.1
  – Conservation of tangential phase speed and its implications for directions of wave propagation, Sec. 11.4.1
  – Boundary conditions on stress and displacement, Sec. 11.4.1
• Green’s functions for elastodynamic waves; Heaviside vs. physicists’ Green’s functions, Sec. 11.4.3
• Elastodynamic free oscillations (normal modes), Secs. 11.4.4 and 11.5
• Relation of classical waves to quantum mechanical excitations, Sec. 11.5
• Onset of instability and zero-frequency modes related to bifurcation of equilibria, Sec. 11.3.5
• Free energy for a system on which a force is acting, and its use to diagnose stability, Ex. 11.8

Bibliographic Note For a discussion of textbooks on elasticity theory, including both elastostatics and elastodynamics, see the Bibliographic note at the end of Chap. 10.


Contents

IV FLUID MECHANICS

12 Foundations of Fluid Dynamics
12.1 Overview
12.2 The Macroscopic Nature of a Fluid: Density, Pressure, Flow velocity; Fluids vs. Gases
12.3 Hydrostatics
12.3.1 Archimedes’ Law
12.3.2 Stars and Planets
12.3.3 Hydrostatics of Rotating Fluids
12.4 Conservation Laws
12.5 Conservation Laws for an Ideal Fluid
12.5.1 Mass Conservation
12.5.2 Momentum Conservation
12.5.3 Euler Equation
12.5.4 Bernoulli’s Theorem; Expansion, Vorticity and Shear
12.5.5 Conservation of Energy
12.6 Incompressible Flows
12.7 Viscous Flows - Pipe Flow
12.7.1 Decomposition of the Velocity Gradient
12.7.2 Navier-Stokes Equation
12.7.3 Energy conservation and entropy production
12.7.4 Molecular Origin of Viscosity
12.7.5 Reynolds’ Number
12.7.6 Blood Flow

Part IV FLUID MECHANICS


Chapter 12 Foundations of Fluid Dynamics
Version 0812.1.K, 21 January 2009

Box 12.1 Reader’s Guide
• This chapter relies heavily on the geometric view of Newtonian physics (including vector and tensor analysis) laid out in the sections of Chap. 1 labeled “[N]”.
• This chapter also relies on the concepts of strain and its irreducible tensorial parts (the expansion, shear and rotation) introduced in Chap. 11.
• Chapters 13–18 (fluid mechanics and magnetohydrodynamics) are extensions of this chapter; to understand them, this chapter must be mastered.
• Portions of Part V, Plasma Physics (especially Chap. 20 on the “two-fluid formalism”), rely heavily on this chapter.
• Small portions of Part VI, General Relativity, will entail relativistic fluids, for which concepts in this chapter will be important.

12.1

Overview

Having studied elasticity theory, we now turn to a second branch of continuum mechanics: fluid dynamics. Three of the four states of matter (gases, liquids and plasmas) can be regarded as fluids and so it is not surprising that interesting fluid phenomena surround us in our everyday lives. Fluid dynamics is an experimental discipline and much of what has been learned has come in response to laboratory investigations. Fluid dynamics finds experimental application in engineering, physics, biophysics, chemistry and many other fields.

The observational sciences of oceanography, meteorology, astrophysics and geophysics, in which experiments are less frequently performed, are also heavily reliant upon fluid dynamics. Many of these fields have enhanced our appreciation of fluid dynamics by presenting flows under conditions that are inaccessible to laboratory study. Despite this rich diversity, the fundamental principles are common to all of these applications.

The fundamental assumption underlying the governing equations that describe the motion of a fluid is that the length and time scales associated with the flow are long compared with the corresponding microscopic scales, so the continuum approximation can be invoked. In this chapter, we will derive and discuss these fundamental equations. They are, in some respects, simpler than the corresponding laws of elastodynamics. However, as with particle dynamics, simplicity in the equations does not imply that the solutions are simple, and indeed they are not! One reason is that there is no restriction that fluid displacements be small (by contrast with elastodynamics, where the elastic limit keeps them small), so most fluid phenomena are immediately nonlinear.

Relatively few problems in fluid dynamics admit complete, closed-form, analytic solutions. Progress in describing fluid flows has therefore usually come from the introduction of clever physical “models” and the use of judicious mathematical approximations. In more recent years numerical fluid dynamics has come of age, and in many areas of fluid mechanics finite-difference simulations have begun to complement laboratory experiments and measurements.

Fluid dynamics is a subject where considerable insight accrues from being able to visualize the flow.
This is true of fluid experiments, where much technical skill is devoted to marking the fluid so it can be photographed. It is also true of numerical simulations, where frequently more time is devoted to computer graphics than to solving the underlying partial differential equations. We shall pay some attention to flow visualization. The reader should be warned that obtaining an analytic solution to the equations of fluid dynamics is not the same as understanding the flow. It is usually a good idea to sketch the flow pattern at the very least, as a tool for understanding.

We shall begin this chapter in Sec. 12.2 with a discussion of the physical nature of a fluid: the possibility to describe it by a piecewise continuous density, velocity, and pressure, and the relationship between density changes and pressure changes. Then in Sec. 12.3 we shall discuss hydrostatics (density and pressure distributions of a static fluid in a static gravitational field); this will parallel our discussion of elastostatics in Chap. 10. Following a discussion of atmospheres, stars and planets, we shall explain the microphysical basis of Archimedes’ law.

Our foundation for moving from hydrostatics to hydrodynamics will be conservation laws for mass, momentum and energy. To facilitate that transition, in Sec. 12.4 we shall examine in some depth the physical and mathematical origins of these conservation laws in Newtonian physics.

The stress tensor associated with most fluids can be decomposed into an isotropic pressure and a viscous term linear in the rate of shear or velocity gradient. Under many conditions the viscous stress can be neglected over most of the flow, and diffusive heat conductivity (Chap. 17) is negligible. The fluid is then called ideal.¹ We shall study the laws governing ideal fluids in Sec. 12.5. After deriving the relevant conservation laws and equation of motion, we shall derive and discuss the Bernoulli theorem (which relies on negligible viscosity) and show how it can simplify the description of many flows. In flows whose speed approaches neither the speed of sound nor the gravitational escape velocity, the fractional changes in fluid density are relatively small. It can then be a good approximation to treat the fluid as incompressible, which leads to considerable simplification; we study this in Sec. 12.6. As we shall see, incompressibility can be a good approximation not just for liquids, which tend to have large bulk moduli, but also, more surprisingly, for gases.

In Sec. 12.7 we augment our basic equations with terms describing the action of the viscous stresses. This allows us to derive the famous Navier-Stokes equation and to illustrate its use by analyzing pipe flow. Much of our study of fluids in future chapters will focus on this Navier-Stokes equation.

In our study of fluids we shall often deal with the influence of a uniform gravitational field, such as that on earth, on lengthscales small compared to the earth’s radius. Occasionally, however, we shall consider inhomogeneous gravitational fields produced by the fluid whose motion we study. For such situations it is useful to introduce gravitational contributions to the stress tensor and energy density and flux. We present and discuss these in a box, Box 12.3, where they will not impede the flow of the main stream of ideas.

¹ An ideal fluid (also called a perfect fluid) should not be confused with an ideal or perfect gas: one whose pressure is due solely to kinetic motions of particles and thus is given by $P = nk_BT$, with $n$ the particle number density, $k_B$ Boltzmann’s constant, and $T$ temperature; see Box 12.2.

12.2

The Macroscopic Nature of a Fluid: Density, Pressure, Flow velocity; Fluids vs. Gases

The macroscopic nature of a fluid follows from two simple observations. The first is that in most flows the macroscopic continuum approximation is valid. Because, in a fluid, the molecular mean free paths are small compared to macroscopic lengthscales, we can define a mean local velocity $\mathbf{v}(\mathbf{x},t)$ of the fluid’s molecules, which varies smoothly both spatially and temporally; we call this the fluid’s velocity. For the same reason, other quantities that characterize the fluid, e.g. the density $\rho(\mathbf{x},t)$, also vary smoothly on macroscopic scales. This need not be the case everywhere in the flow, however. One exception is a shock front, which we shall study in Chap. 16; there the flow varies rapidly, over a length of order the collision mean free path of the molecules. In this case, the continuum approximation is only piecewise valid, and we must perform a matching at the shock front. One might think that a second exception is a turbulent flow. There, the average molecular velocity might vary rapidly on whatever length scale we choose to study, all the way down to intermolecular distances, so that averaging becomes problematic. As we shall see in Chap. 14, this is not the case; in turbulent flows there is generally a length scale far larger than intermolecular distances below which the flow varies smoothly.

The second observation is that fluids do not oppose a steady shear strain. This is easy to understand on microscopic grounds: there is no lattice to deform, and the molecular velocity distribution remains isotropic in the presence of a static shear. By kinetic theory considerations (Chap. 2), we therefore expect that a fluid’s stress tensor $\mathbf{T}$ will be isotropic in the local rest frame of the fluid (i.e., in a frame where $\mathbf{v} = 0$). This allows us to write

$\mathbf{T} = P\mathbf{g}$ in the local rest frame, where $P$ is the fluid’s pressure and $\mathbf{g}$ is the metric (with Kronecker delta components, $g_{ij} = \delta_{ij}$).

The laws of fluid mechanics, as we shall develop them, are valid equally well for liquids, gases, and (under many circumstances) plasmas. In a liquid, as in a solid, the molecules are packed side by side (but can slide over each other easily). In a gas or plasma the molecules are separated by distances large compared to their sizes. This difference leads to different behaviors under compression. For a liquid, e.g. the water in a lake, the molecules resist strongly even a very small compression. As a result, it is useful to characterize the pressure increase by a bulk modulus $K$, as in an elastic solid (Chap. 10):
$$\delta P = -K\Theta = K\frac{\delta\rho}{\rho} \quad \text{for a liquid}. \qquad (12.1)$$

(Here we have used the fact that the expansion $\Theta$ is the fractional increase in volume, or equivalently, by mass conservation, the fractional decrease in density.) The bulk modulus for water is about 2.2 GPa. Suppose one goes downward in a lake far enough to double the pressure from one atmosphere ($10^5$ Pa to $2\times10^5$ Pa). The fractional change in density is then only $\delta\rho/\rho = \delta P/K = 10^5/(2.2\times10^9) \simeq 5\times10^{-5}$, about one part in 20,000.

Gases and plasmas, by contrast, are much less resistant to compression. Due to the large distance between molecules, a doubling of the pressure requires, in order of magnitude, a doubling of the density; i.e.
$$\frac{\delta P}{P} = \Gamma\frac{\delta\rho}{\rho} \quad \text{for a gas}, \qquad (12.2)$$
where $\Gamma$ is a proportionality factor of order unity. The numerical value of $\Gamma$ depends on the physical situation. Suppose the gas is ideal (i.e., perfect) [so $P = \rho k_B T/\mu m_p$ in the notation of Box 12.2, Eq. (4)], and the temperature $T$ is held fixed by thermal contact with some heat source as the density changes (isothermal process). Then $\delta P \propto \delta\rho$ and $\Gamma = 1$. Alternatively, and much more commonly, the fluid’s entropy might remain constant because no significant heat can flow into or out of a fluid element during the density change. In this case $\Gamma$ is called the adiabatic index. Continuing to assume ideality ($P = \rho k_B T/\mu m_p$), it can be shown using the laws of thermodynamics that
$$\Gamma = \gamma \equiv C_P/C_V \quad \text{for an adiabatic process in an ideal gas}, \qquad (12.3)$$

where $C_P$ and $C_V$ are the specific heats at constant pressure and volume; see Ex. 12.2. [Our specific heats, like the energy, entropy and enthalpy, are defined on a per-unit-mass basis. Thus $C_P = T(\partial s/\partial T)_P$ is the amount of heat that must be added to a unit mass of the fluid to increase its temperature by one unit, and similarly for $C_V = T(\partial s/\partial T)_\rho$.] From Eqs. (12.1) and (12.2) we see that $\Gamma = K/P$; so why do we use $K$ for liquids and $\Gamma$ for gases and plasmas? Because in a liquid $K$ remains nearly constant when $P$ changes by large fractional amounts, $\delta P/P \gtrsim 1$, while in a gas or plasma it is $\Gamma$ that remains nearly constant. For other thermodynamic aspects of fluid dynamics, which will be very important as we proceed, see Box 12.2.
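The contrast between Eqs. (12.1) and (12.2) can be made concrete with a few lines of arithmetic. Here is a small sketch in Python, assuming a 1 kPa pressure perturbation about one atmosphere. It also assumes $\gamma = 1.4$ for air, a standard diatomic-gas value that is not given in the text. The function names are hypothetical. For water at one atmosphere, $\Gamma = K/P \approx 2\times10^4$, which is why liquids behave as nearly incompressible.

```python
def density_change_liquid(dP, K):
    """Fractional density change delta rho / rho = delta P / K, Eq. (12.1)."""
    return dP / K


def density_change_gas(dP, P, Gamma):
    """Fractional density change delta rho / rho = delta P / (Gamma P), Eq. (12.2).

    Linearized relation, so only valid for |dP| << P.
    """
    return dP / (Gamma * P)


K_WATER = 2.2e9   # bulk modulus of water quoted in the text, Pa
P_ATM = 1.0e5     # one atmosphere, Pa
GAMMA_AIR = 1.4   # assumed adiabatic index of air (diatomic ideal gas)

effective_gamma_water = K_WATER / P_ATM  # Gamma = K / P
dP = 1.0e3                               # a 1 kPa pressure perturbation

print(f"Gamma for water at 1 atm: {effective_gamma_water:.3g}")
print(f"water:           d rho / rho = {density_change_liquid(dP, K_WATER):.2e}")
print(f"air, isothermal: d rho / rho = {density_change_gas(dP, P_ATM, 1.0):.2e}")
print(f"air, adiabatic:  d rho / rho = {density_change_gas(dP, P_ATM, GAMMA_AIR):.2e}")
```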

Box 12.2 Thermodynamic Considerations
One feature of fluid dynamics, especially gas dynamics, distinguishes it from elastodynamics: the thermodynamic properties of the fluid are often very important, and we must treat energy conservation explicitly. In this box we review, from Chap. 4, some of the thermodynamic concepts we shall need in our study of fluids; see also, e.g., Reif (1959). We shall have no need for partition functions, ensembles and other statistical aspects of thermodynamics. Instead, we shall only need elementary thermodynamics. We begin with the nonrelativistic first law of thermodynamics (4.8) for a sample of fluid with energy $E$, entropy $S$, volume $V$, number $N_I$ of molecules of species $I$, temperature $T$, pressure $P$, and chemical potential $\mu_I$ for species $I$:
$$dE = T\,dS - P\,dV + \sum_I \mu_I\,dN_I. \qquad (1)$$

Almost everywhere in our treatment of fluid mechanics (and throughout this chapter), we shall assume that the term $\sum_I \mu_I\,dN_I$ vanishes. Physically this happens for two reasons. First, all relevant nuclear reactions are frozen (they occur on timescales $\tau_{\rm react}$ far longer than the dynamical timescales $\tau_{\rm dyn}$ of interest to us), so $dN_I = 0$. Second, each chemical reaction is either frozen, $dN_I = 0$, or goes so rapidly ($\tau_{\rm react} \ll \tau_{\rm dyn}$) that it and its inverse are in local thermodynamic equilibrium (LTE): $\sum_I \mu_I\,dN_I = 0$ for those species involved in the reactions. In the intermediate situation, where some relevant reaction has $\tau_{\rm react} \sim \tau_{\rm dyn}$, we would have to keep careful track of the relative abundances of the chemical or nuclear species and their chemical potentials.

Consider a small fluid element with mass $\Delta m$, energy per unit mass $u$, entropy per unit mass $s$, and volume per unit mass $1/\rho$. Insert $E = u\,\Delta m$, $S = s\,\Delta m$ and $V = \Delta m/\rho$ into the first law $dE = T\,dS - P\,dV$. We then obtain the form of the first law that we shall use in almost all of our fluid dynamics studies:
$$du = T\,ds - P\,d\!\left(\frac{1}{\rho}\right). \qquad (2)$$

The internal energy (per unit mass) u comprises the random translational energy of the molecules that make up the fluid, together with the energy associated with their internal degrees of freedom (rotation, vibration etc.) and with their intermolecular forces. The term T ds represents some amount of heat (per unit mass) that may get injected into a fluid element, e.g. by viscous heating (last section of this chapter), or may get removed, e.g. by radiative cooling. In fluid mechanics it is useful to introduce the enthalpy H = E + P V of a fluid element (cf. Ex. 4.3) and the corresponding enthalpy per unit mass h = u+P/ρ. Inserting u = h − P/ρ into the left side of the first law (2), we obtain the first law in the “enthalpy representation” [Eq. (4.23)]:

Box 12.2, Continued
$$dh = T\,ds + \frac{dP}{\rho}. \qquad (3)$$

Because all reactions are frozen or are in LTE, the relative abundances of the various nuclear and chemical species are fully determined by a fluid element’s density $\rho$ and temperature $T$ (or by any two other variables in the set $\rho$, $T$, $s$, and $P$). Correspondingly, the thermodynamic state of a fluid element is completely determined by any two of these variables. To calculate all features of that state from two variables, we must know the relevant equations of state. Examples are $P(\rho, T)$ and $s(\rho, T)$, or $P = P(\rho, s)$ and $T = T(\rho, s)$. Alternatively, we must know the fluid’s fundamental thermodynamic potential (Table 4.1), from which the equations of state follow.

We shall often deal with perfect gases (also called ideal gases: gases in which intermolecular forces and the volume occupied by the molecules are treated as totally negligible). For any ideal gas, the pressure arises solely from the kinetic motions of the molecules, and so the equation of state $P(\rho, T)$ is
$$P = \frac{\rho k_B T}{\mu m_p}. \qquad (4)$$

Here $\mu$ is the mean molecular weight and $m_p$ is the proton mass [cf. Eq. (3.47c), with the number density of particles $n = \bar N/V$ reexpressed as $\rho/\mu m_p$]. The mean molecular weight $\mu$ is the mean mass per gas molecule in units of the proton mass (e.g., $\mu = 1$ for hydrogen, $\mu = 32$ for oxygen O$_2$, $\mu = 28.8$ for air). This $\mu$ should not be confused with the chemical potential of species $I$, $\mu_I$ (which will rarely if ever be used in our fluid mechanics analyses). [The concept of an ideal gas must not be confused with an ideal fluid: one for which dissipative processes (viscosity and heat conductivity) are negligible.]

An idealisation that is often accurate in fluid dynamics is that the fluid is adiabatic. That is to say, there is no heating or cooling resulting from dissipative processes, such as viscosity, thermal conductivity or the emission and absorption of radiation. When this is a good approximation, the entropy per unit mass $s$ of a fluid element is constant following a volume element with the flow, i.e.
$$ds/dt = 0. \qquad (5)$$

In an adiabatic flow, there is only one thermodynamic degree of freedom and so we can write P = P (ρ, s) = P (ρ). Of course, this function will be different for fluid elements that have different s. In the case of an ideal gas, a standard thermodynamic argument (Ex. 12.2) shows that the pressure in an adiabatically expanding or contracting fluid element varies with density as δP/P = γδρ/ρ, where γ = CP /CV is the adiabatic index

[Eqs. (12.2) and (12.3)]. If, as is often the case, the adiabatic index remains constant over a number of doublings of the pressure and density, then we can integrate this to obtain the equation of state
$$P = K(s)\rho^\gamma, \qquad (6)$$
where $K(s)$ is some function of the entropy. This is sometimes called the polytropic equation of state, and a polytropic index $n$ (not to be confused with the number density of particles!) is defined by $\gamma = 1 + 1/n$. See, e.g., the discussion of stars and planets in Sec. 12.3.2, and Ex. 12.5.

A special case of adiabatic flow is isentropic flow. In this case, the entropy is constant everywhere, not just along individual streamlines. Whenever the pressure can be regarded as a function of the density alone (the same function everywhere), the fluid is called barotropic. Note that barotropes are not necessarily isentropes. For example, in a fluid of sufficiently high thermal conductivity, the temperature will be constant everywhere (isothermal), thereby causing both $P$ and $s$ to be unique functions of $\rho$.
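Two relations of Box 12.2 lend themselves to a quick numerical check: the ideal-gas law, Eq. (4), and the polytropic law, Eq. (6), together with its index $n = 1/(\gamma - 1)$. Here is a minimal sketch in Python, assuming CODATA constants, air at 0 °C and one atmosphere with $\mu = 28.8$ as in the text, and an arbitrary constant standing in for $K(s)$. The helper names are hypothetical. The finite-difference slope confirms that $d\ln P/d\ln\rho = \gamma$ for a polytrope, i.e. $\delta P/P = \gamma\,\delta\rho/\rho$.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
M_P = 1.67262192369e-27  # proton mass, kg


def ideal_gas_density(P, T, mu):
    """Invert Eq. (4) of Box 12.2, P = rho k_B T / (mu m_p), for rho in kg/m^3."""
    return P * mu * M_P / (K_B * T)


def polytropic_index(gamma):
    """Polytropic index n defined by gamma = 1 + 1/n."""
    return 1.0 / (gamma - 1.0)


def log_slope(pressure_of_density, rho, eps=1e-6):
    """Centered finite-difference estimate of d ln P / d ln rho at rho."""
    hi, lo = rho * (1.0 + eps), rho * (1.0 - eps)
    return (math.log(pressure_of_density(hi)) - math.log(pressure_of_density(lo))) / (
        math.log(hi) - math.log(lo)
    )


# Air at 0 C and one atmosphere, with mu = 28.8 as in the text.
rho_air = ideal_gas_density(1.01325e5, 273.15, 28.8)

# Polytrope P = K(s) rho**gamma; the constant 3.0 stands in for K(s) (assumed).
gamma = 5.0 / 3.0  # monatomic ideal gas
slope = log_slope(lambda rho: 3.0 * rho ** gamma, rho=1.2)

print(f"rho_air ~ {rho_air:.3f} kg/m^3")
print(f"n(5/3) = {polytropic_index(gamma):.2f}, n(7/5) = {polytropic_index(1.4):.2f}")
print(f"d ln P / d ln rho = {slope:.6f}")
```

The computed air density of about 1.29 kg/m³ is close to the familiar sea-level value, which is a useful sanity check on Eq. (4).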

12.3

Hydrostatics

Just as we began our discussion of elasticity with a treatment of elastostatics, so we will introduce fluid mechanics by discussing hydrostatic equilibrium. The equation of hydrostatic equilibrium for a fluid at rest in a gravitational field $\mathbf{g}$ is the same as the equation of elastostatic equilibrium with a vanishing shear stress, so $\mathbf{T} = P\mathbf{g}$:
$$\nabla\cdot\mathbf{T} = \nabla P = \rho\mathbf{g} \qquad (12.4)$$

[Eq. (10.14) with $\mathbf f = -\nabla\cdot\mathbf T$]. Here $\mathbf g$ is the acceleration of gravity (which need not be constant; e.g., it varies from location to location inside the Sun). It is often useful to express $\mathbf g$ as the gradient of the Newtonian gravitational potential Φ,

$$\mathbf g = -\nabla\Phi . \quad (12.5)$$

Note our sign convention: Φ is negative near a gravitating body and zero far from all bodies. It is determined by Newton's field equation for gravity,

$$\nabla^2\Phi = -\nabla\cdot\mathbf g = 4\pi G\rho . \quad (12.6)$$

From Eq. (12.4), we can draw some immediate and important inferences. Writing Eq. (12.4) as ∇P = −ρ∇Φ and taking its curl (the curl of a gradient vanishes), we obtain

$$\nabla\Phi\times\nabla\rho = 0 . \quad (12.7)$$

This tells us that, in hydrostatic equilibrium, the contours of constant density coincide with the equipotential surfaces, i.e. ρ = ρ(Φ), and Eq. (12.4) itself tells us that as we move from point to point in the fluid, the changes in P and Φ are related by dP/dΦ = −ρ(Φ). This, in turn, implies that the difference in pressure between two equipotential surfaces Φ₁ and Φ₂ is given by

$$\Delta P = -\int_{\Phi_1}^{\Phi_2}\rho(\Phi)\,d\Phi . \quad (12.8)$$

Fig. 12.1: Elementary demonstration of the principle of hydrostatic equilibrium. Water and mercury, two immiscible fluids of different density, are introduced into a container with two connected chambers as shown. The pressure at each point on the bottom of the container is equal to the weight per unit area of the overlying fluids. The pressures P1 and P2 at the bottom of the left chamber are equal, but because of the density difference between mercury and water, they differ from the pressure P3 at the bottom of the right chamber.

Moreover, as ∇P ∝ ∇Φ, the surfaces of constant pressure (the isobars) coincide with the gravitational equipotentials. This is all true whether $\mathbf g$ varies inside the fluid or is constant. The gravitational acceleration g is actually constant to high accuracy in most nonastrophysical applications of fluid dynamics, for example on the surface of the earth. In this case, the pressure at a point in a fluid is, from Eq. (12.8), equal to the total weight of fluid per unit area above the point,

$$P(z) = g\int_z^{\infty}\rho\,dz' , \quad (12.9)$$

where the integral is performed by integrating upward in the gravitational field; cf. Fig. 12.1. For example, the deepest point in the world’s oceans is the bottom of the Marianas trench in the Pacific, at a depth of 11.03 km. Adopting a density ≃ 10³ kg m⁻³ for water and a value ≃ 10 m s⁻² for g, we obtain a pressure of ≃ 10⁸ Pa, or ≃ 10³ atmospheres. This is comparable with the yield stress of the strongest materials. It should therefore come as no surprise to discover that the deepest dive ever recorded by a submersible was made by the Trieste in 1960, when it reached a depth of 10.91 km, just a bit shy of the lowest point in the trench. Since the bulk modulus of water is K = 2.2 GPa, at the bottom of the trench the water is compressed by δρ/ρ = P/K ≃ 5 per cent.
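The trench estimate is simple arithmetic on Eq. (12.9) with constant ρ and g, and the short script below reproduces it. The inputs are the rounded values quoted in the text, plus the standard conversion 1 atm = 1.013 × 10⁵ Pa; the surface atmospheric pressure (about one extra atmosphere) and the slight compression of the water itself are ignored, and the function name `column_pressure` is illustrative.

```python
# Rounded inputs from the text (SI units)
RHO_WATER = 1.0e3   # kg m^-3, density of water, treated as constant
G_EARTH = 10.0      # m s^-2, surface gravity, rounded
DEPTH = 11.03e3     # m, depth of the bottom of the Marianas trench
K_WATER = 2.2e9     # Pa, bulk modulus of water
ATM = 1.013e5       # Pa per standard atmosphere


def column_pressure(depth, rho=RHO_WATER, g=G_EARTH):
    """Eq. (12.9) for constant rho and g: weight per unit area of the overlying water."""
    return rho * g * depth


p = column_pressure(DEPTH)
print(f"P ≈ {p:.2e} Pa ≈ {p / ATM:.0f} atm")
print(f"δρ/ρ = P/K ≈ {p / K_WATER:.1%}")
```

This gives a pressure of about 1.1 × 10⁸ Pa (roughly 1.1 × 10³ atm) and a fractional compression of about 5 per cent, matching the estimates above.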

Fig. 12.2: Derivation of Archimedes’ Law.

12.3.1 Archimedes’ Law

The Law of Archimedes states that when a solid body is totally or partially immersed in a fluid in a uniform gravitational field $\mathbf g = -g\,\mathbf e_z$, the total buoyant upward force of the fluid on the body is equal to the weight of the displaced fluid. A formal proof can be made as follows; see Fig. 12.2. The fluid, pressing inward on the body across a small element of the body’s surface $d\boldsymbol\Sigma$, exerts a force $d\mathbf F^{\rm buoy} = \mathbf T(\_\,, -d\boldsymbol\Sigma)$, where $\mathbf T$ is the fluid’s stress tensor and the minus sign is because, by convention, $d\boldsymbol\Sigma$ points out of the body rather than into it. Converting to index notation and integrating over the body’s surface $\partial\mathcal V$, we obtain for the net buoyant force

$$F_i^{\rm buoy} = -\int_{\partial\mathcal V} T_{ij}\,d\Sigma_j . \quad (12.10)$$

Now, imagine removing the body and replacing it by fluid that has the same pressure P(z) and density ρ(z), at each height z, as the surrounding fluid; this is the fluid that was originally displaced by the body. Since the fluid stress on $\partial\mathcal V$ has not changed, the buoyant force will be unchanged. Use Gauss’s law to convert the surface integral (12.10) into a volume integral over the interior fluid (the originally displaced fluid):

$$F_i^{\rm buoy} = -\int_{\mathcal V} T_{ij;j}\,dV . \quad (12.11)$$

The displaced fluid obviously is in hydrostatic equilibrium with the surrounding fluid, and its equation of hydrostatic equilibrium $T_{ij;j} = \rho g_i$ [Eq. (12.4)], when inserted into Eq. (12.11), implies that

$$\mathbf F^{\rm buoy} = -\mathbf g\int_{\mathcal V}\rho\,dV = -M\mathbf g , \quad (12.12)$$

where M is the mass of the displaced fluid. Thus, the upward buoyant force on the original body is equal in magnitude to the weight Mg of the displaced fluid. Clearly, if the body has a higher density than the fluid, then the downward gravitational force on it (its weight) will exceed the weight of the displaced fluid and thus exceed the buoyant force it feels, and the body will fall. If the body’s density is less than that of the fluid, the buoyant force will exceed its weight and it will be pushed upward. A key piece of physics underlying Archimedes’ law is the fact that the intermolecular forces acting in a fluid, like those in a solid (cf. Sec. 10.3), are of short range. If, instead, the forces were of long range, Archimedes’ law could fail. For example, consider a fluid that is electrically conducting, with currents flowing through it that produce a magnetic field and resulting long-range magnetic forces (the magnetohydrodynamic situation studied in Chap. 18). If we then substitute an insulating solid for some region $\mathcal V$ of the conducting fluid, the force that acts on the solid will be different from the force that acted on the displaced fluid.
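Archimedes’ law can also be checked by brute force, summing dF = −P dΣ over the surface of an immersed body without invoking Gauss’s law. The sketch below does this for a sphere in an incompressible fluid under uniform gravity, using a midpoint rule in the polar angles θ and φ. The sphere, the grid resolution and the helper name `buoyant_force_on_sphere` are illustrative choices, not part of the argument above. The vertical force should equal the weight of the displaced fluid, the horizontal components should vanish, and neither the surface pressure nor the sphere’s depth should matter.

```python
import math


def buoyant_force_on_sphere(radius, depth, rho, g, p_surface=0.0, n_theta=200, n_phi=200):
    """Return (Fx, Fy, Fz) = -sum of P n dA over a sphere's surface.

    z points upward, the free surface is at z = 0, and the sphere's centre is
    at z = -depth, so the hydrostatic pressure is P(z) = p_surface - rho*g*z.
    """
    fx = fy = fz = 0.0
    dth = math.pi / n_theta
    dph = 2.0 * math.pi / n_phi
    for i in range(n_theta):
        th = (i + 0.5) * dth
        sin_t, cos_t = math.sin(th), math.cos(th)
        z = -depth + radius * cos_t
        p = p_surface - rho * g * z
        dA = radius**2 * sin_t * dth * dph   # area element on the sphere
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            # outward unit normal n = (sin th cos ph, sin th sin ph, cos th)
            fx -= p * sin_t * math.cos(ph) * dA
            fy -= p * sin_t * math.sin(ph) * dA
            fz -= p * cos_t * dA
    return fx, fy, fz


a, rho, g = 1.0, 1.0e3, 9.8
weight_displaced = rho * g * (4.0 / 3.0) * math.pi * a**3
for depth, p_surf in ((2.0, 0.0), (10.0, 1.0e5)):
    fx, fy, fz = buoyant_force_on_sphere(a, depth, rho, g, p_surface=p_surf)
    print(f"depth {depth}: Fz / (M g) = {fz / weight_displaced:.6f}, "
          f"Fx = {fx:.1e}, Fy = {fy:.1e}")
```

Both cases should print a ratio very close to 1 and horizontal forces at the level of rounding error, illustrating that only the pressure difference across the body matters.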

12.3.2 Stars and Planets

Stars and massive planets, if we ignore their rotation, are self-gravitating fluid spheres. We can model the structure of such a non-rotating, spherical, self-gravitating fluid body by combining the equation of hydrostatic equilibrium (12.4) in spherical polar coordinates,

$$\frac{dP}{dr} = -\rho\,\frac{d\Phi}{dr} , \quad (12.13)$$

with Poisson’s equation,

$$\nabla^2\Phi = \frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\Phi}{dr}\right) = 4\pi G\rho , \quad (12.14)$$

to obtain

$$\frac{1}{r^2}\frac{d}{dr}\left(\frac{r^2}{\rho}\frac{dP}{dr}\right) = -4\pi G\rho . \quad (12.15)$$

This can be integrated once radially with the aid of the boundary condition dP/dr = 0 at r = 0 (pressure cannot have a cusp-like singularity) to obtain

$$\frac{dP}{dr} = -\rho\,\frac{Gm}{r^2} , \quad (12.16a)$$

where

$$m = m(r) \equiv \int_0^r 4\pi\rho\,r'^2\,dr' \quad (12.16b)$$

is the total mass inside radius r. Equation (12.16a) is an alternative form of the equation of hydrostatic equilibrium at radius r inside the body: $Gm/r^2$ is the gravitational acceleration g at r, $\rho(Gm/r^2) = \rho g$ is the downward gravitational force per unit volume on the fluid, and −dP/dr is the upward buoyant (pressure-gradient) force per unit volume. Equations (12.13)–(12.16b) are a good approximation for solid planets such as Earth, as well as for stars and fluid planets such as Jupiter, because at the enormous stresses encountered in the interior of a solid planet, the strains are so large that plastic flow will occur. In other words, the limiting shear stresses are much smaller than the isotropic part of the stress tensor.

Let us make an order-of-magnitude estimate of the interior pressure in a star or planet of mass M and radius R. We use the equation of hydrostatic equilibrium (12.4) or (12.16a), approximating dP/dr by P/R, m by M, the density ρ by M/R³, and the gravitational acceleration by GM/R², so that P/R ∼ (M/R³)(GM/R²), i.e.

$$P \sim \frac{GM^2}{R^4} . \quad (12.17)$$

In order to improve upon this estimate, we must solve Eq. (12.15). We therefore need a prescription for relating the pressure to the density. A common idealization is the polytropic relation, namely that

$$P \propto \rho^{1+1/n} , \quad (12.18)$$

where n is called the polytropic index (cf. last part of Box 12.2). [This finesses the issue of the thermal balance of stellar interiors, which determines the temperature T(r) and thence the pressure P(ρ, T).] Low-mass white dwarf stars are well approximated as n = 1.5 polytropes [Eq. (2.50c)], and red giant stars are somewhat similar in structure to n = 3 polytropes. The giant planets Jupiter and Saturn mainly comprise an H-He fluid which is well approximated by an n = 1 polytrope, and the density of a small planet like Mercury is very roughly constant (n = 0). We also need boundary conditions to solve Eqs. (12.16). We can choose some density $\rho_c$ and corresponding pressure $P_c = P(\rho_c)$ at the star’s center r = 0, then integrate Eqs. (12.16) outward until the pressure P drops to zero, which will be the star’s (or planet’s) surface. The values of r and m there will be the star’s radius R and mass M. For details of polytropic stellar models constructed in this manner see, e.g., Chandrasekhar (1939); for the case n = 1, see Ex. 12.5 below. We can easily solve the equation of hydrostatic equilibrium (12.16a) for a constant-density (n = 0) star to obtain

$$P = P_0\left(1 - \frac{r^2}{R^2}\right) , \quad (12.19)$$

where the central pressure is

$$P_0 = \frac{3}{8\pi}\,\frac{GM^2}{R^4} , \quad (12.20)$$

consistent with our order-of-magnitude estimate (12.17).
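The outward-integration procedure described above can be sketched in a few lines. The hypothetical helper `integrate_star` below applies fixed-step fourth-order Runge–Kutta to Eqs. (12.16a,b), starts one small step off-centre using the series solution near r = 0, and stops where P first becomes negative, interpolating linearly to locate the surface. The equation of state is passed in as a function ρ(P). It is tested here only on the constant-density (n = 0) case, where Eqs. (12.19) and (12.20) give the exact answer; the density and central pressure are arbitrary illustrative values.

```python
import math

G = 6.674e-11  # Newton's gravitational constant, m^3 kg^-1 s^-2


def integrate_star(p_c, rho_of_p, dr, r_max=1.0e10):
    """Integrate Eqs. (12.16a,b) outward from the centre until P falls to zero.

    p_c      : central pressure
    rho_of_p : equation of state, returns the density for a given pressure
    dr       : fixed radial step
    Returns (R, M): the radius where P = 0 and the mass m(R) inside it.
    """
    def derivs(r, p, m):
        rho = rho_of_p(max(p, 0.0))   # guard against slightly negative trial P
        return -rho * G * m / r**2, 4.0 * math.pi * rho * r**2

    # Series solution near r = 0 avoids the 0/0 in G m / r^2 at the centre
    rho_c = rho_of_p(p_c)
    r = dr
    m = 4.0 / 3.0 * math.pi * rho_c * r**3
    p = p_c - 2.0 / 3.0 * math.pi * G * rho_c**2 * r**2
    while r < r_max:
        k1p, k1m = derivs(r, p, m)
        k2p, k2m = derivs(r + dr / 2, p + dr / 2 * k1p, m + dr / 2 * k1m)
        k3p, k3m = derivs(r + dr / 2, p + dr / 2 * k2p, m + dr / 2 * k2m)
        k4p, k4m = derivs(r + dr, p + dr * k3p, m + dr * k3m)
        p_new = p + dr / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        m_new = m + dr / 6 * (k1m + 2 * k2m + 2 * k3m + k4m)
        if p_new <= 0.0:
            frac = p / (p - p_new)    # linear interpolation to P = 0
            return r + frac * dr, m + frac * (m_new - m)
        r, p, m = r + dr, p_new, m_new
    raise RuntimeError("surface not reached before r_max")


rho0, p_c = 5.5e3, 1.0e11             # illustrative: kg m^-3 and Pa
R, M = integrate_star(p_c, lambda p: rho0, dr=1.0e3)
R_exact = math.sqrt(3.0 * p_c / (2.0 * math.pi * G * rho0**2))   # from Eq. (12.19)
print(f"R = {R:.6e} m (exact {R_exact:.6e} m), M = {M:.4e} kg")
print(f"P_c / (G M^2 / R^4) = {p_c / (G * M**2 / R**4):.5f}; "
      f"3/(8 pi) = {3.0 / (8.0 * math.pi):.5f}")
```

The ratio in the last line reproduces the coefficient 3/(8π) ≈ 0.119 of Eq. (12.20), i.e. the rough estimate (12.17) overshoots the constant-density central pressure by a factor of about 8. For a polytrope (12.18) one could instead pass, e.g., `rho_of_p=lambda p: (p / K) ** (n / (n + 1))` for some assumed constant K; because the density then falls to zero at the surface, the simple surface-finding step used here is cruder in that case and may need refinement.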

12.3.3 Hydrostatics of Rotating Fluids

The equation of hydrostatic equilibrium (12.4) and the applications of it discussed above are valid only when the fluid is static in a reference frame that is rotationally inertial. However, they are readily extended to bodies that rotate rigidly, with some uniform angular velocity Ω relative to an inertial frame. In a frame that corotates with the body, the fluid will have vanishing velocity v, i.e. will be static, and the equation of hydrostatic equilibrium (12.4) will be changed only by the addition of the centrifugal force per unit volume:

$$\nabla P = \rho(\mathbf g + \mathbf g_{\rm cen}) = -\rho\nabla(\Phi + \Phi_{\rm cen}) . \quad (12.21)$$

Here

$$\mathbf g_{\rm cen} = -\boldsymbol\Omega\times(\boldsymbol\Omega\times\mathbf r) = -\nabla\Phi_{\rm cen} \quad (12.22)$$

is the centrifugal acceleration; $\rho\,\mathbf g_{\rm cen}$ is the centrifugal force per unit volume; and

$$\Phi_{\rm cen} = -\frac{1}{2}(\boldsymbol\Omega\times\mathbf r)^2 \quad (12.23)$$

is a centrifugal potential whose negative gradient is the centrifugal acceleration in our situation of constant Ω. The centrifugal potential can be regarded as an augmentation of the gravitational potential Φ. Indeed, in the presence of uniform rotation, all hydrostatic theorems [e.g., Eqs. (12.7) and (12.8)] remain valid with Φ replaced by $\Phi + \Phi_{\rm cen}$.

We can illustrate this by considering the shape of a spinning fluid planet. Let us suppose that almost all the mass of the planet is concentrated in its core, so the gravitational potential Φ = −GM/r is unaffected by the rotation. Now, the surface of the planet must be an equipotential of $\Phi + \Phi_{\rm cen}$ (coinciding with the zero-pressure isobar) [cf. Eq. (12.7) and subsequent sentences, with $\Phi \to \Phi + \Phi_{\rm cen}$]. The contribution of the centrifugal potential is $-\Omega^2R_e^2/2$ at the equator and zero at the pole. The difference in the gravitational potential Φ between the equator and the pole is $\simeq g(R_e - R_p)$, where $R_e$ and $R_p$ are the equatorial and polar radii respectively and g is the gravitational acceleration at the planet’s surface. Therefore, adopting this centralized-mass model, we estimate the difference between the polar and equatorial radii to be

$$R_e - R_p \simeq \frac{\Omega^2R^2}{2g} . \quad (12.24)$$

The earth, although not a fluid, is unable to withstand large shear stresses (because its shear strain cannot exceed ∼ 0.001); therefore its surface will not deviate by more than the maximum height of a mountain from its equipotential. If we substitute g ≃ 10 m s⁻², R ≃ 6 × 10⁶ m and Ω ≃ 7 × 10⁻⁵ rad s⁻¹, we obtain $R_e - R_p \simeq 10$ km, about half the correct value of 21 km. The reason for this discrepancy lies in our assumption that all the mass lies in the center. In fact, it is distributed fairly uniformly in radius and, in particular, some mass is found in the equatorial bulge. This deforms the gravitational equipotential surfaces from spheres to ellipsoids, which accentuates the flattening.
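The Earth numbers quoted above follow directly from Eq. (12.24); the few lines below reproduce them with the same rounded values of g, R and Ω. The function name is illustrative only.

```python
def bulge_central_mass(omega, radius, g):
    """Eq. (12.24): R_e - R_p ≈ Ω² R² / (2 g) for a planet with all its mass at the centre."""
    return omega**2 * radius**2 / (2.0 * g)


d_r = bulge_central_mass(omega=7.0e-5, radius=6.0e6, g=10.0)
print(f"R_e - R_p ≈ {d_r / 1e3:.1f} km")
```

This prints about 8.8 km, which the text rounds to 10 km, compared with the measured 21 km.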
If, following Newton (in his Principia Mathematica, 1687), we assume that the earth has uniform density, then the flattening estimate is about 2.5 times larger than the centralized-mass estimate (12.24) (Ex. 12.6), in fairly good agreement with the Earth’s actual shape.

****************************

EXERCISES

Exercise 12.1 Practice: Weight in Vacuum
How much more would you weigh in vacuo?

Exercise 12.2 Derivation: Adiabatic Index
Show that for an ideal gas [one with equation of state $P = (k/\mu m_p)\rho T$; Eq. (4) of Box 12.2], the specific heats are related by $C_P = C_V + k/(\mu m_p)$, and the adiabatic index is $\Gamma = \gamma \equiv C_P/C_V$. [The solution is given in most thermodynamics textbooks.]

Exercise 12.3 Example: Earth’s Atmosphere
As mountaineers know, it gets cooler as you climb. However, the rate at which the temperature falls with altitude depends upon the assumed thermal properties of air. Consider two limiting cases.

[Figure: altitude (km) versus temperature T (K) in the Earth’s atmosphere, with the troposphere, stratosphere, stratopause, mesosphere, mesopause and thermosphere labeled.]

Fig. 12.3: Actual temperature variation in the Earth’s mean atmosphere at temperate latitudes.

(a) In the lower stratosphere (Fig. 12.3), the air is isothermal. Use the equation of hydrostatic equilibrium (12.4) to show that the pressure decreases exponentially with height z, P ∝ exp(−z/H), where the scale height H is given by

$$H = \frac{k_B T}{\mu m_p g}$$

and µ is the mean molecular weight of air and $m_p$ is the proton mass. Evaluate this numerically for the lower stratosphere and compare with the stratosphere’s thickness. By how much does P drop between the bottom and top of the isothermal region?

(b) Suppose that the air is isentropic so that $P \propto \rho^\gamma$ [Eq. (6) of Box 12.2], where γ is the specific heat ratio. (For diatomic gases like nitrogen and oxygen, γ ∼ 1.4.) Show that the temperature gradient satisfies

$$\frac{dT}{dz} = -\frac{\gamma-1}{\gamma}\,\frac{g\mu m_p}{k} .$$

Note that the temperature gradient vanishes when γ → 1. Evaluate the temperature gradient, otherwise known as the lapse rate, at low altitudes. The average lapse rate at low altitudes is measured to be ∼ 6 K km⁻¹ (Fig. 12.3). Show that this is intermediate between the two limiting cases of an isentropic and an isothermal lapse rate.

Fig. 12.4: Stability of a Boat. We can understand the stability of a boat to small rolling motions by defining both a center of gravity for the weight of the boat and a center of buoyancy for the upthrust exerted by the water.

Exercise 12.4 Problem: Stability of Boats
Use Archimedes’ law to explain qualitatively the conditions under which a boat floating in still water will be stable to small rolling motions from side to side. [Hint: you might want to introduce a center of buoyancy inside the boat, as in Fig. 12.4.]

Exercise 12.5 Problem: Jupiter and Saturn
The text described how to compute the central pressure of a non-rotating, constant-density planet. Repeat this exercise for the polytropic relation P = Kρ² (polytropic index n = 1), appropriate to Jupiter and Saturn. Use the information that $M_J = 2\times10^{27}$ kg, $M_S = 6\times10^{26}$ kg and $R_J = 7\times10^{4}$ km to estimate the radius of Saturn. Hence, compute the central pressures, gravitational binding energies and polar moments of inertia of both planets.

Exercise 12.6 Example: Shape of a constant density, spinning planet
(a) Show that the spatially variable part of the gravitational potential for a uniform-density, non-rotating planet can be written as Φ = 2πGρr²/3, where ρ is the density.
(b) Hence argue that the gravitational potential for a slowly spinning planet can be written in the form

$$\Phi = \frac{2\pi G\rho r^2}{3} + A r^2 P_2(\mu) ,$$

where A is a constant and $P_2$ is a Legendre polynomial of µ = sin(latitude). What happens to the $P_1$ term?
(c) Give an equivalent expansion for the potential outside the planet.
(d) Now transform into a frame spinning with the planet and add the centrifugal potential to give a total potential.
(e) By equating the potential and its gradient at the planet’s surface, show that the difference between the polar and the equatorial radii is given by

$$R_e - R_p \simeq \frac{5\Omega^2R^2}{4g} ,$$

where g is the gravitational acceleration at the surface. Note that this is 5/2 times the answer for a planet whose mass is all concentrated at its center [Eq. (12.24)].

Exercise 12.7 Problem: Shapes of Stars in a Tidally Locked Binary System
Consider two stars, with the same mass M, orbiting each other in a circular orbit with diameter (separation between the stars’ centers) a. Kepler’s laws tell us that their orbital angular velocity is $\Omega = \sqrt{2GM/a^3}$. Assume that each star’s mass is concentrated near its center so that everywhere except near a star’s center the gravitational potential, in an inertial frame, is $\Phi = -GM/r_1 - GM/r_2$, with $r_1$ and $r_2$ the distances of the observation point from the centers of star 1 and star 2. Suppose that the two stars are “tidally locked”, i.e. tidal gravitational forces have driven them each to rotate with rotational angular velocity equal to the orbital angular velocity Ω. (The moon is tidally locked to the earth; that is why it always keeps the same face toward the earth.) Then in a reference frame that rotates with angular velocity Ω, each star’s gas will be at rest, v = 0.
(a) Write down the total potential $\Phi + \Phi_{\rm cen}$ for this binary system.
(b) Using Mathematica or Maple or some other computer software, plot the equipotentials $\Phi + \Phi_{\rm cen}$ = (constant) for this binary in its orbital plane, and use these equipotentials to describe the shapes that these stars will take if they expand to larger and larger radii (with a and M held fixed). You should obtain a sequence in which the stars, when compact, are well separated and nearly round, and as they grow tidal gravity elongates them, ultimately into tear-drop shapes followed by merger into a single, highly distorted star. With further expansion there should come a point where they start flinging mass off into the surrounding space (a process not included in this hydrostatic analysis).

****************************

12.4 Conservation Laws

As a foundation for making the transition from hydrostatics to hydrodynamics [to situations with nonzero fluid velocity v(x, t)], we shall give a general discussion of Newtonian conservation laws, focusing especially on the conservation of mass and of linear momentum. We begin with the differential law of mass conservation,

$$\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\mathbf v) = 0 , \quad (12.25)$$

which we met and used in our study of elastic media [Eq. (11.2c)]. This is the obvious analog of the laws of conservation of charge, $\partial\rho_e/\partial t + \nabla\cdot\mathbf j = 0$, and of particles, $\partial n/\partial t + \nabla\cdot\mathbf S = 0$, which we met in Chapter 1 [Eqs. (1.73)]. In each case the law says (∂/∂t)(density of something) = −∇·(flux of that something). This, in fact, is the universal form for a differential conservation law. Each Newtonian differential conservation law has a corresponding integral conservation law, which we obtain by integrating the differential law over some arbitrary 3-dimensional volume $\mathcal V$, e.g. the volume used in Fig. 12.2 above to discuss Archimedes’ Law: $(d/dt)\int_{\mathcal V}\rho\,dV = \int_{\mathcal V}(\partial\rho/\partial t)\,dV = -\int_{\mathcal V}\nabla\cdot(\rho\mathbf v)\,dV$. Applying Gauss’s law to the last integral, we obtain

$$\frac{d}{dt}\int_{\mathcal V}\rho\,dV = -\int_{\partial\mathcal V}\rho\mathbf v\cdot d\boldsymbol\Sigma , \quad (12.26)$$

where $\partial\mathcal V$ is the closed surface bounding $\mathcal V$. The left side is the rate of change of mass inside the region $\mathcal V$. The right side is the rate at which mass flows into $\mathcal V$ through $\partial\mathcal V$ (since ρv is the mass flux, and the inward-pointing surface element is $-d\boldsymbol\Sigma$). This is the same argument, connecting differential to integral conservation laws, as we gave in Eqs. (1.72) and (1.73) for electric charge and for particles, but going in the opposite direction. And this argument depends in no way on whether the flowing material is a fluid or not. The mass conservation laws (12.25) and (12.26) are valid for any kind of material whatsoever.

Writing the differential conservation law in the form (12.25), where we monitor the changing density at a given location in space rather than moving with the material, is called the Eulerian approach. There is an alternative Lagrangian approach to mass conservation, in which we focus on changes of density as measured by somebody who moves, locally, with the material, i.e. with velocity v. We obtain this approach by expanding the divergence of the product ρv in Eq. (12.25), ∇·(ρv) = ρ∇·v + v·∇ρ, to obtain

$$\frac{d\rho}{dt} = -\rho\,\nabla\cdot\mathbf v , \quad (12.27)$$

where

$$\frac{d}{dt} \equiv \frac{\partial}{\partial t} + \mathbf v\cdot\nabla . \quad (12.28)$$

The operator d/dt is known as the convective time derivative (or advective time derivative) and crops up often in continuum mechanics. Its physical interpretation is very simple. Consider first the partial derivative $(\partial/\partial t)_{\mathbf x}$. This is the rate of change of some quantity [the density ρ in Eq. (12.27)] at a fixed point in space in some reference frame. In other words, if there is motion, ∂/∂t compares this quantity at the same point P in space for two different points in the material: one that was at P at time t + dt; the other that was at P at the earlier time t. By contrast, the convective time derivative d/dt follows the motion, taking the difference in the value of the quantity at successive times at the same point in the moving matter. It therefore measures the rate of change of ρ (or any other quantity) following the material rather than at a fixed point in space; it is the time derivative for the Lagrangian approach. Note that the convective derivative d/dt is the Newtonian limit of relativity’s proper time derivative along the world line of a bit of matter, $d/d\tau = u^\alpha\,\partial/\partial x^\alpha = (dx^\alpha/d\tau)\,\partial/\partial x^\alpha$ [Secs. 1.4.2 and 1.6].

The Lagrangian approach can also be expressed in terms of fluid elements. Consider a fluid element, with a bounding surface attached to the fluid, and denote its volume by ΔV. The mass inside the fluid element is ΔM = ρΔV. As the fluid flows, this mass must be conserved, so d(ΔM)/dt = (dρ/dt)ΔV + ρ d(ΔV)/dt = 0, which we can rewrite as

$$\frac{d\rho}{dt} = -\rho\,\frac{d\Delta V/dt}{\Delta V} . \quad (12.29)$$

Comparing with Eq. (12.27), we see that

$$\nabla\cdot\mathbf v = \frac{d\Delta V/dt}{\Delta V} . \quad (12.30)$$

Thus, the divergence of v is the fractional rate of increase of a fluid element’s volume. Notice that this is just the time derivative of our elastostatic equation ΔV/V = ∇·ξ = Θ [Eq. (10.8)] (since v = dξ/dt), and correspondingly we denote

$$\nabla\cdot\mathbf v \equiv \theta = \frac{d\Theta}{dt} , \quad (12.31)$$

and call it the fluid’s rate of expansion.

Equation (12.25) is our model for Newtonian conservation laws. It says that there is a quantity, in this case mass, with a certain density, in this case ρ, and a certain flux, in this case ρv, and this quantity is neither created nor destroyed. The temporal derivative of the density (at a fixed point in space) added to the divergence of the flux must vanish. Of course, not all physical quantities have to be conserved. If there were sources or sinks of mass, then these would be added to the right-hand side of Eq. (12.25).

Turn, now, to momentum conservation. The (Newtonian) law of momentum conservation must take the standard conservation-law form (∂/∂t)(momentum density) + ∇·(momentum flux) = 0. If we just consider the mechanical momentum associated with the motion of mass, its density is the vector field ρv. There can also be other forms of momentum density, e.g. electromagnetic, but these do not enter into Newtonian fluid mechanics. For fluids, as for an elastic medium (Chap. 11), the momentum density is simply ρv. The momentum flux is more interesting and rich. Quite generally it is, by definition, the stress tensor T, and the differential conservation law says

$$\frac{\partial(\rho\mathbf v)}{\partial t} + \nabla\cdot\mathbf T = 0 \quad (12.32)$$

[Eq. (1.90)]. For an elastic medium, $\mathbf T = -K\Theta\,\mathsf g - 2\mu\boldsymbol\Sigma$ [Eq. (10.18)] and the conservation law (12.32) gives rise to the elastodynamic phenomena that we explored in Chap. 11. For a fluid we shall build up T piece by piece. We begin with the rate dp/dt that mechanical momentum flows through a small element of surface area dΣ, from its back side to its front (i.e. the rate that it flows in the “positive sense”; cf. Fig. 1.16b). The rate that mass flows through is ρv·dΣ, and we multiply that mass by its velocity v to get the momentum flow: dp/dt = (ρv)(v·dΣ). This flow of momentum is the same thing as a force F = dp/dt acting across dΣ; so it can be computed by inserting dΣ into the second slot of a “mechanical” stress tensor $\mathbf T_{\rm m}$: $d\mathbf p/dt = \mathbf T_{\rm m}(\_\,, d\boldsymbol\Sigma)$ [cf. the definition (1.88) of the stress tensor]. By writing these two expressions for the momentum flow in index notation, $dp_i/dt = (\rho v_i)v_j\,d\Sigma_j = (T_{\rm m})_{ij}\,d\Sigma_j$, we read off the mechanical stress tensor:

$$(T_{\rm m})_{ij} = \rho v_i v_j ; \quad\text{i.e.,}\quad \mathbf T_{\rm m} = \rho\,\mathbf v\otimes\mathbf v . \quad (12.33)$$

This tensor is symmetric (as any stress tensor must be), and it obviously is the flux of mechanical momentum, since it has the form (momentum density) ⊗ (velocity). Let us denote by f the net force per unit volume that acts on the fluid. Then, instead of writing momentum conservation in the usual Eulerian differential form (12.32), we can write it as

$$\frac{\partial(\rho\mathbf v)}{\partial t} + \nabla\cdot\mathbf T_{\rm m} = \mathbf f \quad (12.34)$$

(a conservation law with a source on the right-hand side!). Inserting $\mathbf T_{\rm m} = \rho\,\mathbf v\otimes\mathbf v$ into this equation, converting to index notation, using the rule for differentiating products, and combining with the law of mass conservation, we obtain the Lagrangian law

$$\rho\,\frac{d\mathbf v}{dt} = \mathbf f . \quad (12.35)$$
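Spelled out in index notation (with summation over the repeated index j), the step from Eq. (12.34) to Eq. (12.35) is

$$
\begin{aligned}
\frac{\partial(\rho v_i)}{\partial t} + \frac{\partial(\rho v_i v_j)}{\partial x_j}
&= v_i\left[\frac{\partial\rho}{\partial t} + \frac{\partial(\rho v_j)}{\partial x_j}\right]
 + \rho\left[\frac{\partial v_i}{\partial t} + v_j\frac{\partial v_i}{\partial x_j}\right] \\
&= \rho\,\frac{dv_i}{dt} = f_i ,
\end{aligned}
$$

where the first bracket vanishes by mass conservation (12.25) and the second is the convective derivative (12.28) of $v_i$.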

Here d/dt = ∂/∂t + v·∇ is the convective time derivative, i.e. the time derivative moving with the fluid; so this equation is just Newton’s “F = ma”, per unit volume. In order for the equivalent versions (12.34) and (12.35) of momentum conservation to also be equivalent to the Eulerian formulation (12.32), there must be a stress tensor $\mathbf T_{\rm f}$ such that

$$\mathbf f = -\nabla\cdot\mathbf T_{\rm f} \quad\text{and}\quad \mathbf T = \mathbf T_{\rm m} + \mathbf T_{\rm f} . \quad (12.36)$$

Then Eq. (12.34) becomes the Eulerian conservation law (12.32). Evidently, a knowledge of the stress tensor $\mathbf T_{\rm f}$ for some material is equivalent to a knowledge of the force density f that acts on it. Now, it often turns out to be much easier to figure out the form of the stress tensor, for a given situation, than the form of the force. Correspondingly, as we add new pieces of physics to our fluid analysis (isotropic pressure, viscosity, gravity, magnetic forces), an efficient way to proceed at each stage is to insert the relevant physics into the stress tensor T, and then evaluate the resulting contribution $\mathbf f = -\nabla\cdot\mathbf T_{\rm f}$ to the force and thence to the Lagrangian law of force balance (12.35). At each step, we get out in $\mathbf f = -\nabla\cdot\mathbf T_{\rm f}$ the physics that we put into $\mathbf T_{\rm f}$.

There may seem something tautological about the procedure (12.36) by which we went from the Lagrangian “F = ma” equation (12.35) to the Eulerian conservation law (12.32). The “F = ma” equation makes it look as though mechanical momentum is not conserved in the presence of the force density f. But we make it be conserved by introducing the momentum flux $\mathbf T_{\rm f}$. It is almost as if we regard conservation of momentum as a principle to be preserved at all costs, so that every time there appears to be a momentum deficit, we simply define it as a bit of the momentum flux. This, however, is not the whole story. What is important is that the force density f can always be expressed as the divergence of a stress tensor; that fact is central to the nature of force and of momentum conservation. An erroneous formulation of the force would not necessarily have this property, and there would then be no differential conservation law. So the fact that we can create elastostatic, thermodynamic, viscous, electromagnetic, gravitational, etc. contributions to some grand stress tensor (that go to zero outside the regions occupied by the relevant matter or fields), as we shall see in the coming chapters, is significant and affirms that our physical model is complete at the level of approximation to which we are working.

We can proceed in the same way with energy conservation as we have with momentum. There is an energy density U(x, t) for a fluid and an energy flux F(x, t), and they obey a conservation law with the standard form

$$\frac{\partial U}{\partial t} + \nabla\cdot\mathbf F = 0 . \quad (12.37)$$

At each stage in our buildup of fluid mechanics (adding, one by one, the influences of compressional energy, viscosity, gravity, magnetism), we can identify the relevant contributions to U and F and then grind out the resulting conservation law (12.37). At each stage we get out the physics that we put into U and F. We conclude with a remark about relativity. In going from Newtonian physics (this chapter) to special relativity (Chap. 1), mass and energy get combined (added) to form a conserved mass-energy or total energy. That total energy and the momentum are the temporal and spatial parts of a spacetime 4-vector, the 4-momentum; and correspondingly, the conservation laws for mass [Eq. (12.25)], nonrelativistic energy [Eq. (12.37)], and momentum [Eq. (12.32)] get unified into a single conservation law for 4-momentum, which is expressed as the vanishing 4-dimensional, spacetime divergence of the 4-dimensional stress-energy tensor (Sec. 1.12).

12.5 Conservation Laws for an Ideal Fluid

We now turn from hydrostatic situations to fully dynamical fluids. We shall derive the fundamental equations of fluid dynamics in several stages. In this section, we will confine our attention to ideal fluids, i.e., flows for which it is safe to ignore dissipative processes (viscosity and thermal conductivity), and for which, therefore, the entropy of a fluid element remains constant with time. In the next section we will introduce the effects of viscosity, and in Chap. 17 we will introduce heat conductivity. At each stage, we will derive the fundamental fluid equations from the even-more-fundamental conservation laws for mass, momentum, and energy.

12.5.1 Mass Conservation

Mass conservation, as we have seen, takes the (Eulerian) form ∂ρ/∂t + ∇·(ρv) = 0 [Eq. (12.25)], or equivalently the (Lagrangian) form dρ/dt = −ρ∇·v [Eq. (12.27)], where d/dt = ∂/∂t + v·∇ is the convective time derivative (moving with the fluid) [Eq. (12.28)]. We define a fluid to be incompressible when dρ/dt = 0. Note: incompressibility does not mean that the fluid cannot be compressed; rather, it merely means that in the situation being studied, the density of each fluid element remains constant as time passes. From Eq. (12.27), we see that incompressibility implies that the velocity field has vanishing divergence (i.e. it is solenoidal, i.e. expressible as the curl of some vector potential). The condition that the fluid be incompressible is a weaker condition than that the density be constant everywhere; for example, the density varies substantially from the earth’s center to its surface, but if the material inside the earth were moving more or less on surfaces of constant radius, the flow would be incompressible. As we shall shortly see, approximating a flow as incompressible is a good approximation when the flow speed is much less than the speed of sound and the fluid does not move through overly large gravitational potential differences.
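Equation (12.30) and the incompressibility condition can be illustrated with a linear velocity field v = A·x, an assumption chosen because it can be followed exactly and has ∇·v = tr A everywhere. The sketch below tracks the three edge vectors of a small fluid element; each obeys de/dt = A·e because it is the separation between neighboring fluid particles. The element’s volume should grow as exp(t tr A), and should stay fixed when A is traceless, i.e. when the flow is divergence-free and hence incompressible. The matrices, the step count and the function names are illustrative.

```python
import math


def det3(a, b, c):
    """Volume of the parallelepiped spanned by the vectors a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def mat_vec(A, x):
    """Product of a 3x3 matrix (list of rows) with a 3-vector."""
    return [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]


def element_volume(A, t_end, steps=2_000):
    """Volume at t_end of a fluid element that starts as a unit cube in the flow v = A x."""
    edges = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    dt = t_end / steps
    for _ in range(steps):
        new_edges = []
        for e in edges:
            k1 = mat_vec(A, e)                                    # de/dt = A e
            mid = [e[i] + 0.5 * dt * k1[i] for i in range(3)]
            k2 = mat_vec(A, mid)
            new_edges.append([e[i] + dt * k2[i] for i in range(3)])  # midpoint step
        edges = new_edges
    return det3(*edges)


expanding = [[0.1, 0.3, 0.0], [0.0, 0.2, 0.0], [0.0, 0.5, 0.3]]    # tr A = 0.6
solenoidal = [[0.2, 1.0, 0.0], [0.0, -0.5, 0.3], [0.4, 0.0, 0.3]]  # tr A = 0
print(element_volume(expanding, 1.0), math.exp(0.6))  # grows like exp(t tr A)
print(element_volume(solenoidal, 1.0))                 # stays close to 1
```

The second flow shears and rotates the element substantially, yet its volume, and hence by (12.29) its density, is unchanged: exactly the sense in which "incompressible" is used here.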

12.5.2 Momentum Conservation

For an ideal fluid, the only forces that can act are those of gravity and of the fluid’s isotropic pressure P. We have already met and discussed the contribution of P to the stress tensor, $\mathbf T = P\,\mathsf g$, when dealing with elastic media (Chap. 10) and in hydrostatics (Sec. 12.3). The gravitational force density, ρg, is so familiar that it is easier to write it down than the corresponding gravitational contribution to the stress. Correspondingly, we can most easily write momentum conservation in the form

$$\frac{\partial(\rho\mathbf v)}{\partial t} + \nabla\cdot\mathbf T = \rho\mathbf g ; \quad\text{i.e.}\quad \frac{\partial(\rho\mathbf v)}{\partial t} + \nabla\cdot(\rho\,\mathbf v\otimes\mathbf v + P\,\mathsf g) = \rho\mathbf g , \quad (12.38)$$

where the stress tensor is given by

$$\mathbf T = \rho\,\mathbf v\otimes\mathbf v + P\,\mathsf g \quad \text{for an ideal fluid} \quad (12.39)$$

[cf. Eqs. (12.33), (12.34) and (12.4)]. The first term, ρv ⊗ v, is the mechanical momentum flux (also called the kinetic stress), and the second, P g, is that associated with the fluid’s pressure. In most of our applications, the gravitational field g will be externally imposed, i.e., it will be produced by some object such as the Earth that is different from the fluid we are studying. However, the law of momentum conservation remains the same, Eq. (12.38), independently of what produces gravity, the fluid or an external body or both. And independently of its source, one can write the stress tensor Tg for the gravitational field g in a form presented and discussed in Box 12.3 below — a form that has the required property −∇ · Tg = ρg = (the gravitational force density).

12.5.3 Euler Equation

The “Euler equation” is the equation of motion that one gets out of the momentum conservation law (12.38) by performing the differentiations and invoking mass conservation (12.25):

$$\frac{d\mathbf{v}}{dt} = -\frac{\nabla P}{\rho} + \mathbf{g} \quad \text{for an ideal fluid.} \tag{12.40}$$
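The differentiation step mentioned above can be spelled out explicitly. The following sketch uses only the product rule, ∇ · (P g) = ∇P for the metric tensor g, and mass conservation (12.25):

$$
\frac{\partial(\rho\mathbf{v})}{\partial t} + \nabla\cdot(\rho\mathbf{v}\otimes\mathbf{v})
= \rho\left[\frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v}\right]
+ \mathbf{v}\underbrace{\left[\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\mathbf{v})\right]}_{=\,0 \text{ by (12.25)}}
= \rho\,\frac{d\mathbf{v}}{dt},
$$

so Eq. (12.38) becomes ρ dv/dt + ∇P = ρg, and dividing by ρ gives Eq. (12.40).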

This Euler equation was first derived in 1757 by the Swiss mathematician and physicist Leonhard Euler. The Euler equation has a very simple physical interpretation: dv/dt is the convective derivative of the velocity, i.e. the derivative moving with the fluid, which means it is the acceleration of a fluid element. This acceleration has two causes: gravity, g, and the pressure gradient ∇P. In a hydrostatic situation, v = 0, the Euler equation reduces to the equation of hydrostatic equilibrium, ∇P = ρg [Eq. (12.4)].

In Cartesian coordinates, the Euler equation (12.40) and mass conservation (12.25) comprise four equations in five unknowns, ρ, P, v_x, v_y, v_z. In order to close this system of equations, we must relate P to ρ. For an ideal fluid, we use the fact that the entropy of each fluid element is conserved (because there is no mechanism for dissipation),

$$\frac{ds}{dt} = 0, \tag{12.41}$$

together with an equation of state for the pressure in terms of the density and the entropy, P = P(ρ, s). In practice, the equation of state is often well approximated by incompressibility, ρ = constant, or by a polytropic relation, P = K(s)ρ^{1+1/n} [Eq. (12.18)].

12.5.4

Bernoulli’s Theorem; Expansion, Vorticity and Shear

Bernoulli’s theorem is well known. Less well appreciated are the conditions under which it is true. In order to deduce these, we must first introduce a kinematic quantity known as the vorticity,

$$\boldsymbol{\omega} = \nabla\times\mathbf{v}. \tag{12.42}$$

The attentive reader may have noticed that there is a parallel between elasticity and fluid dynamics. In elasticity, we are concerned with the gradient ∇ξ of the displacement vector field ξ, and we decompose it into expansion Θ, rotation R or φ = ½∇ × ξ, and shear Σ. In fluid dynamics, we are interested in the gradient ∇v of the velocity field v = dξ/dt, and we make an analogous decomposition. The fluid analog of expansion Θ = ∇ · ξ is [as we saw when discussing mass conservation, Eq. (12.31)] its time derivative θ ≡ ∇ · v = dΘ/dt, the rate of expansion. Rotation φ is uninteresting in elastostatics because it causes no stress. Vorticity ω ≡ ∇ × v = 2dφ/dt is its fluid counterpart, and although primarily a kinematic quantity, it plays a vital role in fluid dynamics because of its close relation to angular momentum; we shall discuss it in more detail in the following chapter. Shear Σ is responsible for the shear stress in elasticity. We shall meet its counterpart, the rate of shear tensor σ = dΣ/dt, below when we introduce the viscous stress tensor.

To derive the Bernoulli theorem, we begin with the Euler equation dv/dt = −(1/ρ)∇P + g; we express g as −∇Φ; we convert the convective derivative of velocity (i.e. the acceleration) into its two parts dv/dt = ∂v/∂t + (v · ∇)v; and we rewrite (v · ∇)v using the vector identity

$$\mathbf{v}\times\boldsymbol{\omega} \equiv \mathbf{v}\times(\nabla\times\mathbf{v}) = \tfrac{1}{2}\nabla v^2 - (\mathbf{v}\cdot\nabla)\mathbf{v}. \tag{12.43}$$

The result is

$$\frac{\partial\mathbf{v}}{\partial t} + \nabla\left(\tfrac{1}{2}v^2 + \Phi\right) + \frac{\nabla P}{\rho} - \mathbf{v}\times\boldsymbol{\omega} = 0. \tag{12.44}$$

This is just the Euler equation written in a new form, but it is also the most general version of the Bernoulli theorem. Two special cases are of interest:

(i) Steady flow of an ideal fluid. A steady flow is one in which ∂(everything)/∂t = 0, and an ideal fluid is one in which dissipation (due to viscosity and heat flow) can be ignored.

Ideality implies that the entropy is constant following the flow, i.e. ds/dt = (v · ∇)s = 0. From the thermodynamic identity dh = T ds + dP/ρ [Eq. (3) of Box 12.2] we obtain

$$(\mathbf{v}\cdot\nabla)P = \rho(\mathbf{v}\cdot\nabla)h. \tag{12.45}$$

(Remember that the flow is steady, so there are no time derivatives.) Now define the Bernoulli function, B, by

$$B \equiv \tfrac{1}{2}v^2 + h + \Phi. \tag{12.46}$$

Taking the scalar product of Eq. (12.44) with the velocity v, noting that v · (v × ω) = 0 and using Eq. (12.45) to replace (v · ∇)P/ρ by (v · ∇)h, we can rewrite Eq. (12.44) in the form

$$\frac{dB}{dt} = (\mathbf{v}\cdot\nabla)B = 0. \tag{12.47}$$

This says that the Bernoulli function, like the entropy, does not change with time in a fluid element. Let us define streamlines, analogous to lines of force of a magnetic field, by the differential equations

$$\frac{dx}{v_x} = \frac{dy}{v_y} = \frac{dz}{v_z}. \tag{12.48}$$

In the language of Sec. 1.5, these are just the integral curves of the (steady) velocity field; they are also the spatial world lines of the fluid elements. Equation (12.47) says that the Bernoulli function is constant along streamlines in a steady, ideal flow.

(ii) Irrotational flow of an isentropic fluid. An even more specialized type of flow is one where the vorticity vanishes and the entropy is constant everywhere. A flow in which ω = 0 is called an irrotational flow. (Later we shall learn that, if an incompressible flow initially is irrotational and it encounters no walls and experiences no significant viscous stresses, then it remains always irrotational.) Now, as the curl of the velocity field vanishes, we can follow the electrostatic precedent and introduce a velocity potential ψ(x, t) so that at any time

$$\mathbf{v} = \nabla\psi \quad \text{for an irrotational flow.} \tag{12.49}$$

A flow in which the entropy is constant everywhere is called isentropic (Box 12.2). Now, the first law of thermodynamics [Eq. (3) of Box 12.2] implies that ∇h = T∇s + (1/ρ)∇P. Therefore, in an isentropic flow, ∇P = ρ∇h. Imposing these conditions on Eq. (12.44), with ∂v/∂t = ∇(∂ψ/∂t) and v × ω = 0, we obtain, for an isentropic, irrotational flow:

$$\nabla\left(\frac{\partial\psi}{\partial t} + B\right) = 0. \tag{12.50}$$

Thus, the quantity ∂ψ/∂t + B will be constant everywhere in the flow, not just along streamlines. (If it is a function of time, we can absorb that function into ψ without affecting v, leaving it constant in time as well as in space.) Of course, if the flow is steady so ∂(everything)/∂t = 0, then B itself is constant. Note the important restriction that the vorticity in the flow must vanish.


Fig. 12.5: Schematic illustration of a Pitot tube used to measure airspeed. The tube points into the flow well away from the boundary layer. A manometer measures the pressure difference between the stagnation points S, where the external velocity is very small, and several orifices O in the side of the tube where the pressure is almost equal to that in the free air flow. The air speed can then be inferred by application of the Bernoulli theorem.

The most immediate consequence of Bernoulli’s theorem in a steady, ideal flow (constancy of B = ½v² + h + Φ along flow lines) is that the enthalpy h falls when the speed increases. For an ideal gas in which the adiabatic index γ is constant over a large range of densities, so that P ∝ ρ^γ, the enthalpy is simply h = c²/(γ − 1), where c is the speed of sound. For an incompressible liquid, it is P/ρ (plus the internal energy per unit mass, which stays constant). Microscopically, what is happening is that we can decompose the motion of the constituent molecules into a bulk motion and a random motion. The total kinetic energy should be constant after allowing for variation in the gravitational potential. As the bulk kinetic energy increases, the random or thermal kinetic energy must decrease, leading to a reduction in pressure.

A simple though important application of the Bernoulli theorem is to the Pitot tube, which is used to measure air speed in an aircraft (Figure 12.5). A Pitot tube extends out from the side of the aircraft and points into the flow. There is one small orifice at the end, where the speed of the gas relative to the tube is small, and several apertures along the tube, where the gas moves with approximately the air speed. The pressure difference between the end of the tube and the sides is measured using an instrument called a manometer and is then converted into an airspeed using the formula v = (2∆P/ρ)^{1/2}, which follows from Bernoulli’s theorem with the air treated as incompressible. For v ∼ 100 m s⁻¹ and ρ ∼ 1 kg m⁻³, ∆P ∼ 5000 N m⁻² ∼ 0.05 atmospheres. Note that the density of the air ρ will vary with height, so it must be known in order to convert ∆P into v.
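The Pitot-tube conversion and the order-of-magnitude numbers above can be checked with a few lines of Python. This is a minimal sketch assuming steady, ideal, incompressible flow, so that ∆P = ½ρv² between the stagnation point and the side orifices; the function names are illustrative only, and the halved density in the last line is an assumed value chosen to mimic thinner air at altitude.

```python
import math

def pitot_airspeed(delta_p, rho):
    """Airspeed v = (2*delta_p/rho)**0.5 from a Pitot-tube pressure difference.
    Assumes steady, ideal, incompressible flow (Bernoulli: delta_p = rho*v**2/2).
    delta_p in Pa (N m^-2), rho in kg m^-3; returns m s^-1."""
    if delta_p < 0 or rho <= 0:
        raise ValueError("need delta_p >= 0 and rho > 0")
    return math.sqrt(2.0 * delta_p / rho)

def pitot_pressure_difference(v, rho):
    """Inverse relation, delta_p = rho*v**2/2."""
    return 0.5 * rho * v * v

ATM = 1.01325e5  # one standard atmosphere, Pa
dp = pitot_pressure_difference(100.0, 1.0)
print(dp, dp / ATM)                 # 5000.0 Pa, about 0.049 atm
print(pitot_airspeed(dp, 1.0))      # 100.0 m/s
print(pitot_airspeed(dp, 0.5))      # same reading in air of half the density: about 141 m/s
```

The last line shows why the local air density matters: the same manometer reading implies a larger airspeed when ρ is smaller.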

12.5.5

Conservation of Energy

As well as imposing conservation of mass and momentum, we must also address energy conservation. So far, in our treatment of fluid dynamics, we have finessed this issue by simply postulating some relationship between the pressure P and the density ρ. In the case of ideal fluids, this is derived by requiring that the entropy be constant following the flow. In this case, we are not required to consider the energy to derive the flow. However, understanding how energy is conserved is often very useful for gaining physical insight. Furthermore, it is imperative when dissipative processes operate.

| Quantity | Density | Flux |
|---|---|---|
| Mass | ρ | ρv |
| Momentum | ρv | T = P g + ρv ⊗ v |
| Energy | U = (½v² + u + Φ)ρ | F = (½v² + h + Φ)ρv |

Table 12.1: Densities and fluxes of mass, momentum, and energy for an ideal fluid in an externally produced gravitational field.

The most fundamental formulation of the law of energy conservation is Eq. (12.37): ∂U/∂t + ∇ · F = 0. To explore its consequences for an ideal fluid, we must insert the appropriate ideal-fluid forms of the energy density U and energy flux F. When (for simplicity) the fluid is in an externally produced gravitational field Φ, its energy density is obviously

$$U = \rho\left(\tfrac{1}{2}v^2 + u + \Phi\right) \quad \text{for an ideal fluid with external gravity.} \tag{12.51}$$

Here the three terms are kinetic, internal, and gravitational. When the fluid participates in producing gravity and one includes the energy of the gravitational field itself, the energy density is a bit more subtle; see Box 12.3.

In an external field one might expect the energy flux to be F = Uv, but this is not quite correct. Consider a bit of surface area dA orthogonal to the direction in which the fluid is moving, i.e., orthogonal to v. The fluid element that crosses dA during time dt moves through a distance dl = v dt, and as it moves, the fluid behind this element exerts a force P dA on it. That force, acting through the distance dl, feeds an energy dE = (P dA)dl = Pv dA dt across dA; the corresponding energy flux across dA has magnitude dE/(dA dt) = Pv and points in the v direction, so it contributes Pv to the energy flux F. This contribution is missing from our initial guess F = Uv. We shall explore its importance at the end of this subsection. When it is added to our guess, we obtain for the total energy flux

$$\mathbf{F} = \rho\mathbf{v}\left(\tfrac{1}{2}v^2 + h + \Phi\right) \quad \text{for an ideal fluid with external gravity.} \tag{12.52}$$

Here h = u + P/ρ is the enthalpy per unit mass [cf. Box 12.2]. Inserting Eqs. (12.51) and (12.52) into the law of energy conservation (12.37), and requiring that the external gravity be static (time independent) so the work it does on the fluid is conservative, we obtain the following ideal-fluid equation of energy balance:

$$\frac{\partial}{\partial t}\left[\rho\left(\tfrac{1}{2}v^2 + u + \Phi\right)\right] + \nabla\cdot\left[\rho\mathbf{v}\left(\tfrac{1}{2}v^2 + h + \Phi\right)\right] = 0 \quad \text{for an ideal fluid and static external gravity.} \tag{12.53}$$

When the gravitational field is dynamical and/or generated by the fluid itself, we must use a more complete gravitational energy density and stress; see Box 12.3. By combining this law of energy conservation with the corresponding laws of mass and momentum conservation, (12.25) and (12.38), and using the first law of thermodynamics dh = T ds + (1/ρ)dP, we obtain the remarkable result that the entropy per unit mass is conserved moving with the fluid:

$$\frac{ds}{dt} = 0 \quad \text{for an ideal fluid.} \tag{12.54}$$

The same conclusion can be obtained when the gravitational field is dynamical and not external (cf. Box 12.3 and Ex. 12.14), so no statement about gravity is included with this equation. This entropy conservation should not be surprising. If we put no dissipative processes into the energy density or stress tensor, then we get no dissipation out. Moreover, the calculation that leads to Eq. (12.54) assures us that, so long as we take full account of mass and momentum conservation, the full and sole content of the law of energy conservation for an ideal fluid is ds/dt = 0.

Fig. 12.6: Joule-Kelvin cooling of a gas. Gas flows steadily through a nozzle from a chamber at high pressure (P₁, speed v₁) to one at low pressure (P₂, speed v₂). The flow proceeds at constant enthalpy. Work done against the intermolecular forces leads to cooling. The efficiency of cooling is enhanced by exchanging heat between the two chambers. Gases can also be liquefied in this manner as shown here.

Let us return to the contribution Pv to the energy flux. A good illustration of the necessity for this term is provided by the Joule-Kelvin method commonly used to cool gases (Fig. 12.6). In this method, gas is driven under pressure through a nozzle or porous plug into a chamber where it can expand and cool. Microscopically, what is happening is that the molecules in a gas are not completely free but attract one another through intermolecular forces. When the gas expands, work is done against these forces and the gas therefore cools. Now let us consider a steady flow of gas from a high-pressure chamber to a low-pressure chamber. The flow is invariably so slow (and gravity so weak!) that the kinetic and gravitational potential energy contributions can be ignored. In a steady flow, both the mass flux ρv and the energy flux, which here reduces to ρvh, are constant, so the enthalpy per unit mass h must be the same in both chambers. (It is the Pv term, hidden in h = u + P/ρ, that makes h rather than u the conserved quantity.) The actual temperature drop is given by

$$\Delta T = \int_{P_1}^{P_2} \mu_{\rm JK}\, dP, \tag{12.55}$$

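where µ_JK = (∂T/∂P)_h is the Joule-Kelvin coefficient. A straightforward thermodynamic calculation yields the identity

$$\mu_{\rm JK} = -\frac{1}{\rho^2 C_p}\left(\frac{\partial(\rho T)}{\partial T}\right)_P. \tag{12.56}$$

The Joule-Kelvin coefficient of a perfect gas obviously vanishes, since P ∝ ρT makes ρT constant at fixed P.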
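Equations (12.55) and (12.56) can be evaluated numerically for any equation of state ρ(T, P). The sketch below is illustrative only: it assumes C_p is the specific heat per unit mass, takes the derivative in Eq. (12.56) by central differences, and integrates dT/dP = µ_JK along the isenthalp with a midpoint rule. The two equations of state (`ideal_gas`, and a hypothetical `expanding_liquid` whose density falls linearly with temperature) are toy models chosen for testing, not data for real substances.

```python
def mu_jk(rho_of, T, P, cp, dT=1e-3):
    """Joule-Kelvin coefficient, Eq. (12.56):
        mu_JK = -(1/(rho**2 * cp)) * d(rho*T)/dT at constant P.
    rho_of(T, P) returns mass density; cp is per unit mass (assumed).
    The T-derivative is a central finite difference with step dT."""
    rho = rho_of(T, P)
    d_rhoT = (rho_of(T + dT, P) * (T + dT)
              - rho_of(T - dT, P) * (T - dT)) / (2 * dT)
    return -d_rhoT / (rho * rho * cp)

def delta_T(rho_of, T1, P1, P2, cp, steps=200):
    """Temperature change across the plug, Eq. (12.55): integrate
    dT/dP = mu_JK(T, P) from P1 to P2 with the midpoint (RK2) rule,
    letting T evolve along the constant-enthalpy curve."""
    h = (P2 - P1) / steps
    T, P = T1, P1
    for _ in range(steps):
        k1 = mu_jk(rho_of, T, P, cp)
        k2 = mu_jk(rho_of, T + 0.5 * h * k1, P + 0.5 * h, cp)
        T += h * k2
        P += h
    return T - T1

R_AIR = 287.0  # specific gas constant of air, J kg^-1 K^-1

def ideal_gas(T, P):
    return P / (R_AIR * T)

def expanding_liquid(T, P, rho0=1000.0, beta=2e-4, T0=293.0):
    # Hypothetical toy liquid: density falls linearly with T, independent of P.
    return rho0 * (1.0 - beta * (T - T0))

print(mu_jk(ideal_gas, 300.0, 1e5, 1005.0))                 # ~0: perfect gas
print(delta_T(expanding_liquid, 300.0, 2e6, 1e6, 4200.0))   # about +0.22 K
```

For the toy liquid µ_JK is negative, so throttling to lower pressure warms it; for the perfect gas the result is zero to rounding error, as Eq. (12.56) requires.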


12.6

Incompressible Flows

A common assumption that is made when discussing the fluid dynamics of highly subsonic flows is that the density is constant, i.e., that the fluid is incompressible. This is a natural approximation to make when dealing with a liquid like water, which has a very large bulk modulus. It is a bit of a surprise that it is also useful for flows of gases, which are far more compressible under static conditions. To see its validity, suppose that we have a flow in which the characteristic length L over which the fluid variables P, ρ, v etc. vary is related to the characteristic timescale T over which they vary by L ≲ vT. In this case, we can compare the magnitudes of the various terms in the Euler equation (12.40) to obtain an estimate of the magnitude of the pressure variation:

$$\underbrace{\frac{\partial\mathbf{v}}{\partial t}}_{v/T} + \underbrace{(\mathbf{v}\cdot\nabla)\mathbf{v}}_{v^2/L} = -\underbrace{\frac{\nabla P}{\rho}}_{\delta P/\rho L} - \underbrace{\nabla\Phi}_{\delta\Phi/L}. \tag{12.57}$$

Multiplying through by L and using L/T ≲ v, we obtain δP/ρ ∼ v² + |δΦ|. Now, the variation in pressure will be related to the variation in density by δP ∼ c²δρ, where c is the sound speed (not light speed), and we drop constants of order unity in making these estimates. Inserting this into our expression for δP, we obtain the estimate for the fractional density fluctuation

$$\frac{\delta\rho}{\rho} \sim \frac{v^2}{c^2} + \frac{|\delta\Phi|}{c^2}. \tag{12.58}$$

Therefore, if the fluid speeds are highly subsonic (v ≪ c) and the gravitational potential does not vary greatly along flow lines, |δΦ| ≪ c², then we can ignore the density variations moving with the fluid in solving for the velocity field. Correspondingly, since −ρ⁻¹dρ/dt = ∇ · v = θ [Eq. (12.27)], we can make the approximation

$$\nabla\cdot\mathbf{v} \simeq 0. \tag{12.59}$$
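To get a feel for the estimate (12.58), the sketch below tabulates δρ/ρ for a few flow speeds in air, using the text’s rough sound speed c ∼ 300 m/s and, like the text, dropping all order-unity factors. The 1 km vertical excursion and g = 9.8 m s⁻² in the last line are assumed illustrative values for the gravitational term.

```python
def fractional_density_change(v, c, delta_phi=0.0):
    """Order-of-magnitude estimate, Eq. (12.58):
    delta_rho/rho ~ v**2/c**2 + |delta_phi|/c**2 (order-unity factors dropped)."""
    return (v * v + abs(delta_phi)) / (c * c)

C_AIR = 300.0   # rough sound speed in air used in the text, m s^-1
for v in (3.0, 30.0, 100.0, 300.0):
    print(v, fractional_density_change(v, C_AIR))
# Gravitational term for a 1 km vertical excursion (assumed g = 9.8 m s^-2):
print(fractional_density_change(0.0, C_AIR, 9.8 * 1000.0))   # about 0.11
```

The estimate says only when incompressibility is safe (δρ/ρ ≪ 1); it is not a correction to be added to an incompressible solution.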

This argument breaks down when we are dealing with sound waves, for which L ∼ cT. For air at atmospheric pressure the speed of sound is c ∼ 300 m/s, which is very fast compared to most flow speeds one encounters, so most flows are “incompressible”. It should be emphasized, though, that “incompressibility”, which is an approximation made in deriving the velocity field, does not imply that the density variation can be neglected in all other contexts. A particularly good example of this is provided by convection flows, which are driven by buoyancy, as we shall discuss in Chap. 17.

****************************

EXERCISES

Exercise 12.8 Problem: A Hole in My Bucket
There’s a hole in my bucket. How long will it take to empty? (Try an experiment, and if the time does not agree with the estimate, suggest why this is so.)

Box 12.3 Self Gravity T2

In the text, we mostly treat the gravitational field as externally imposed and independent of the behavior of the fluid. This is usually a good approximation. However, it is inadequate for discussing the properties of planets and stars. It is easiest to discuss the modifications required by self-gravitational effects by amending the conservation laws.

As long as we work within the domain of Newtonian physics, the mass conservation equation (12.25) is unaffected. However, we included the gravitational force per unit volume ρg as a source of momentum in the momentum conservation law. It would fit much more neatly into our formalism if we could express it as the divergence of a gravitational stress tensor T_g. To see that this is indeed possible, use Poisson’s equation ∇ · g = −4πGρ, together with ∇ × g = 0, to write

$$\nabla\cdot\mathbf{T}_g = -\rho\mathbf{g} = \frac{(\nabla\cdot\mathbf{g})\,\mathbf{g}}{4\pi G} = \frac{\nabla\cdot\left[\mathbf{g}\otimes\mathbf{g} - \tfrac{1}{2}g^2\,\mathsf{g}\right]}{4\pi G},$$

so

$$\mathbf{T}_g = \frac{\mathbf{g}\otimes\mathbf{g} - \tfrac{1}{2}g^2\,\mathsf{g}}{4\pi G}. \tag{1}$$

Readers familiar with classical electromagnetic theory will notice an obvious and understandable similarity to the Maxwell stress tensor, whose divergence equals the Lorentz force density. What of the gravitational momentum density? We expect that this can be related to the gravitational energy density using a Lorentz transformation. That is to say, it is O(v/c²) times the gravitational energy density, where v is some characteristic speed. However, in the Newtonian approximation, the speed of light c is regarded as infinite, and so we should expect the gravitational momentum density to be identically zero in Newtonian theory, and indeed it is. We therefore can write the full equation of motion (12.38), including gravity, as a conservation law

$$\frac{\partial(\rho\mathbf{v})}{\partial t} + \nabla\cdot\mathbf{T}_{\rm total} = 0, \tag{2}$$

where T_total includes T_g. Turn to energy conservation: we have seen in the text that, in a constant, external gravitational field, the fluid’s total energy density U and flux F are given by Eqs. (12.51) and (12.52). In a general situation, we must add to these some field energy density and flux. On dimensional grounds, these must be U_field ∝ g²/G and F_field ∝ (∂Φ/∂t) g/G (where g = −∇Φ). The proportionality constants can be deduced by demanding that for an ideal fluid in the presence of gravity, the law of energy conservation, when combined with mass conservation, momentum conservation, and the first law of thermodynamics, leads to ds/dt = 0 (no dissipation in, so no dissipation out); see Eq. (12.54) and associated discussion. The result [Ex. 12.14] is

$$U = \rho\left(\tfrac{1}{2}v^2 + u + \Phi\right) + \frac{g^2}{8\pi G}, \tag{3}$$

$$\mathbf{F} = \rho\mathbf{v}\left(\tfrac{1}{2}v^2 + h + \Phi\right) + \frac{1}{4\pi G}\frac{\partial\Phi}{\partial t}\,\mathbf{g}. \tag{4}$$

Actually, there is an ambiguity in how the gravitational energy is localized. This ambiguity arises physically from the fact that one can transform away the gravitational acceleration g, at any point in space, by transforming to a reference frame that falls freely there. Correspondingly, it turns out, one can transform away the gravitational energy density at any desired point in space. This possibility is embodied mathematically in the possibility to add to the energy flux F the time derivative of αΦ∇Φ/4πG and add to the energy density U minus the divergence of this quantity (where α is an arbitrary constant), while preserving energy conservation ∂U/∂t + ∇ · F = 0. Thus, the following choice of energy density and flux is just as good as Eqs. (3) and (4); both satisfy energy conservation:

$$U = \rho\left(\tfrac{1}{2}v^2 + u + \Phi\right) + \frac{g^2}{8\pi G} - \alpha\nabla\cdot\left(\frac{\Phi\nabla\Phi}{4\pi G}\right) = \rho\left[\tfrac{1}{2}v^2 + u + (1-\alpha)\Phi\right] + (1-2\alpha)\frac{g^2}{8\pi G}, \tag{5}$$

$$\mathbf{F} = \rho\mathbf{v}\left(\tfrac{1}{2}v^2 + h + \Phi\right) + \frac{1}{4\pi G}\frac{\partial\Phi}{\partial t}\,\mathbf{g} + \alpha\frac{\partial}{\partial t}\left(\frac{\Phi\nabla\Phi}{4\pi G}\right) = \rho\mathbf{v}\left(\tfrac{1}{2}v^2 + h + \Phi\right) + (1-\alpha)\frac{1}{4\pi G}\frac{\partial\Phi}{\partial t}\,\mathbf{g} - \frac{\alpha}{4\pi G}\,\Phi\,\frac{\partial\mathbf{g}}{\partial t}. \tag{6}$$

[Here we have used the gravitational field equation ∇²Φ = 4πGρ and g = −∇Φ.] Note that the choice α = 1/2 puts all of the gravitational energy density into the ρΦ term, while the choice α = 1 puts all of it into the field term, −g²/8πG. In Ex. 12.15 it is shown that the total gravitational energy of an isolated system is independent of the arbitrary parameter α, as it must be on physical grounds. A full understanding of the nature and limitations of the concept of gravitational energy requires the general theory of relativity (Part VI). The relativistic analog of the arbitrariness of Newtonian energy localization is an arbitrariness in the gravitational “stress-energy pseudotensor”.
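The claimed independence of α can be checked numerically in one concrete case. The sketch below integrates the gravitational part of U in Eq. (5), (1 − α)ρΦ + (1 − 2α)g²/8πG, over all space for a uniform-density sphere of mass M and radius R, using the standard interior potential Φ = −GM(3R² − r²)/2R³. The sphere and the units G = M = R = 1 are assumptions made for illustration; every α should give the familiar binding energy −3GM²/(5R).

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def total_grav_energy_uniform_sphere(alpha, G=1.0, M=1.0, R=1.0):
    """Integral over all space of (1 - alpha)*rho*Phi + (1 - 2*alpha)*g**2/(8*pi*G)
    for a uniform sphere, using spherical shells dV = 4*pi*r**2 dr."""
    rho = 3 * M / (4 * math.pi * R ** 3)
    phi_in = lambda r: -G * M * (3 * R ** 2 - r ** 2) / (2 * R ** 3)  # Phi inside
    g_in = lambda r: G * M * r / R ** 3                               # |g| inside
    # rho*Phi vanishes outside the body, so only the interior contributes.
    int_rho_phi = simpson(lambda r: 4 * math.pi * r ** 2 * rho * phi_in(r), 0.0, R)
    int_g2 = simpson(lambda r: 4 * math.pi * r ** 2 * g_in(r) ** 2, 0.0, R)
    int_g2 += 4 * math.pi * (G * M) ** 2 / R  # exterior (g = GM/r^2), integrated exactly
    return (1 - alpha) * int_rho_phi + (1 - 2 * alpha) * int_g2 / (8 * math.pi * G)

for alpha in (0.0, 0.5, 1.0, 2.0):
    print(alpha, total_grav_energy_uniform_sphere(alpha))   # each about -0.6
```

The α-dependence cancels because ∫ρΦ dV = −2∫(g²/8πG) dV for an isolated body, which is the content of Ex. 12.15 in this special case.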

Box 12.4 Flow Visualization

There are various methods for visualizing fluid flows. We have already met the streamlines, which are the integral curves of the velocity field v at a given time. They are the analog of magnetic lines of force. They will coincide with the paths of individual fluid elements if the flow is stationary. However, when the flow is time-dependent, the paths will not be the same as the streamlines. In general, the paths will be the solutions of the equations

$$\frac{d\mathbf{x}}{dt} = \mathbf{v}(\mathbf{x}, t). \tag{1}$$

These paths are the analog of particle trajectories in mechanics. Yet another type of flow line is a streak. This is a common way of visualizing a flow experimentally. Streaks are usually produced by introducing some colored or fluorescent tracer into the flow continuously at some fixed point, say x₀, and observing the locus of the tracer at some fixed time, say t₀. Now, if x(t; x₀, t₀) is the expression for the location of a particle released at time t at x₀ and observed at time t₀, then the equation for the streak emanating from x₀ and observed at time t₀ is the parametric relation x(t) = x(t; x₀, t₀), with the release time t as the parameter. Streamlines, paths and streaks are exhibited below.

[Figure: streamlines at fixed time t = t₀ = const; individual paths; and a streak x(t) emanating from x₀.]
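The distinction among streamlines, paths, and streaks can be made concrete with a small numerical experiment. The sketch below uses a hypothetical unsteady flow that is uniform in space, v = (U, A cos ωt), with arbitrarily chosen parameter values; its streamlines at any instant are straight lines, yet the streak it paints is a sinusoid. Paths are found by integrating Eq. (1) of the box with classical fourth-order Runge-Kutta, and the streak is the set of positions x(t; x₀, t₀) over a range of release times t.

```python
import math

# Hypothetical unsteady flow, uniform in space: v = (U, A cos(OMEGA t)).
U, A, OMEGA = 1.0, 0.5, 2.0

def velocity(x, y, t):
    return U, A * math.cos(OMEGA * t)

def path_position(x0, y0, t_release, t_obs, steps=400):
    """Position at t_obs of the fluid element released from (x0, y0) at
    t_release, from dx/dt = v(x, t) [Eq. (1) of the box] via classical RK4."""
    x, y, t = x0, y0, t_release
    h = (t_obs - t_release) / steps
    for _ in range(steps):
        k1 = velocity(x, y, t)
        k2 = velocity(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1], t + 0.5 * h)
        k3 = velocity(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1], t + 0.5 * h)
        k4 = velocity(x + h * k3[0], y + h * k3[1], t + h)
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        t += h
    return x, y

def streak(x0, y0, t_obs, duration, n=50):
    """Streak observed at t_obs: x(t; x0, t_obs) for release times t
    in [t_obs - duration, t_obs]."""
    return [path_position(x0, y0, t_obs - duration * k / n, t_obs)
            for k in range(n + 1)]

def streamline_slope(t_obs):
    """Streamlines at fixed t_obs solve dx/vx = dy/vy; for this flow they
    are straight lines of slope A cos(OMEGA t_obs) / U."""
    vx, vy = velocity(0.0, 0.0, t_obs)
    return vy / vx

t0 = 3.0
print("streamline slope at t0:", streamline_slope(t0))
for X, Y in streak(0.0, 0.0, t0, 3.0, n=6):
    print(f"streak point ({X:.3f}, {Y:.3f})")
```

For this flow the exact streak is X = U(t₀ − t), Y = (A/ω)[sin ωt₀ − sin ωt], which the numerical points reproduce; a spatially uniform field was chosen precisely so that the answer is known in closed form.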

Exercise 12.9 Problem: Rotating Planets, Stars and Disks
Consider a stationary, axisymmetric planet, star, or disk differentially rotating under the action of a gravitational field. In other words, the motion is purely in the azimuthal direction.
(a) Suppose that the fluid has a barotropic equation of state P = P(ρ). Write down the equations of hydrostatic equilibrium including the centrifugal force in cylindrical polar coordinates. Hence show that the angular velocity must be constant on surfaces of constant cylindrical radius. This is called von Zeipel’s theorem. (As an application, Jupiter is differentially rotating and therefore might be expected to have similar rotation periods at the same latitude in the north and the south. This is only roughly true, suggesting that the equation of state is not completely barotropic.)
(b) Now suppose that the structure is such that the surfaces of constant entropy per unit mass and angular momentum per unit mass coincide. (This state of affairs can arise if slow convection is present.) Show that the Bernoulli function [Eq. (12.46)] is also constant on these surfaces. (Hint: Evaluate ∇B.)

Fig. 12.7: Water flowing past a hydrofoil as seen in the hydrofoil’s rest frame. [Figure labels: flow speed V; depth D of the hydrofoil below the surface.]

Exercise 12.10 Problem: Crocco’s Theorem
(a) Consider steady flow of an ideal fluid. The Bernoulli function is conserved along streamlines. Show that the variation of B across streamlines is given by

$$\nabla B = T\nabla s + \mathbf{v}\times\boldsymbol{\omega}. \tag{12.60}$$

(b) As an example, consider the air in a tornado. In the tornado’s core, the velocity vanishes; and it also vanishes beyond the tornado’s outer edge. Use Crocco’s theorem to show that the pressure in the core is substantially different from that at the outer edge. Is it lower, or is it higher? How does this explain the ability of a tornado to make the walls of a house explode?

Exercise 12.11 Derivation: Joule-Kelvin Coefficient
Verify Eq. (12.56).

Exercise 12.12 Problem: Cavitation
A hydrofoil moves with velocity V at a depth D = 3 m below the surface of a lake. (See Figure 12.7.) How fast must the hydrofoil move to make the water next to it boil? (Boiling results from the pressure P trying to go negative.)

Exercise 12.13 Example: Collapse of a bubble
Suppose that a spherical bubble has just been created in the water above the hydrofoil in the previous question. We will analyze its collapse, i.e. the decrease of its radius R(t) from its value R₀ at creation. First show that the assumption of incompressibility implies that the radial velocity of the fluid at any radial location r can be written in the form v = F(t)/r². Then use the radial component of the Euler equation (12.40) to show that

$$\frac{1}{r^2}\frac{dF}{dt} + v\frac{\partial v}{\partial r} + \frac{1}{\rho}\frac{\partial P}{\partial r} = 0,$$

and integrate this outward from the bubble surface at radius R to infinite radius to obtain

$$-\frac{1}{R}\frac{dF}{dt} + \frac{1}{2}v^2(R) = \frac{P_0}{\rho},$$

where P₀ is the ambient pressure. Hence show that the bubble surface moves with speed

$$v(R) = \left(\frac{2P_0}{3\rho}\right)^{1/2}\left[\left(\frac{R_0}{R}\right)^3 - 1\right]^{1/2}.$$

Suppose that bubbles formed near the pressure minimum on the surface of the hydrofoil are swept back onto a part of the surface where the pressure is much larger. By what factor must the bubbles collapse if they are to create stresses which inflict damage on the hydrofoil?

A modification of this solution is also important in interpreting the fascinating phenomenon of sonoluminescence (Brenner, Hilgenfeldt & Lohse 2002). This arises when fluids are subjected to high-frequency acoustic waves which create oscillating bubbles. The temperatures inside these bubbles can get so large that the air becomes ionized and radiates.

Exercise 12.14 T2 Derivation: No dissipation “in” means no dissipation “out”, and verification of the claimed gravitational energy density and flux
Consider an ideal fluid interacting with a (possibly dynamical) gravitational field that the fluid itself generates via ∇²Φ = 4πGρ. For this fluid, take the law of energy conservation ∂U/∂t + ∇ · F = 0 and from it subtract the scalar product of v with the law of momentum conservation, v · [∂(ρv)/∂t + ∇ · T]; then simplify using the law of mass conservation and the first law of thermodynamics, to obtain ρ ds/dt = 0. In your computation, use for U and F the expressions given in Eqs. (3) and (4) of Box 12.3. This calculation tells us two things: (i) The law of energy conservation for an ideal fluid reduces simply to conservation of entropy moving with the fluid; we have put no dissipative physics into the fluxes of momentum and energy, so we get no dissipation out.
(ii) The gravitational energy density and flux contained in Eqs. (3) and (4) of Box 12.3 must be correct, since they guarantee that gravity does not alter this “no dissipation in, no dissipation out” result.

Exercise 12.15 T2 Example: Gravitational Energy
Integrate the energy density U of Eq. (5) of Box 12.3 over the interior and surroundings of an isolated gravitating system to obtain the system’s total energy. Show that the gravitational contribution to this total energy (i) is independent of the arbitrariness (parameter α) in the energy’s localization, and (ii) can be written in the following forms:

$$E_g = \frac{1}{2}\int dV\,\rho\Phi = -\frac{1}{8\pi G}\int dV\,g^2 = -\frac{G}{2}\int\!\!\int dV\,dV'\,\frac{\rho(\mathbf{x})\,\rho(\mathbf{x}')}{|\mathbf{x}-\mathbf{x}'|}. \tag{12.61}$$

Interpret each of these expressions physically.

****************************
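The bubble-surface speed derived in Ex. 12.13 can be integrated to give the total time for an empty bubble to collapse, t_c = ∫₀^{R₀} dR/v(R). The sketch below assumes, as the formula for v(R) implicitly does, zero pressure inside the bubble. It evaluates the integral twice: by Simpson’s rule after the substitution R/R₀ = 1 − s², which removes the integrable singularity at R = R₀, and in closed form through the Beta function B(5/6, 1/2)/3. Both reproduce the classic Rayleigh collapse time t_c ≈ 0.915 R₀(ρ/P₀)^{1/2}. The 1 mm bubble in water at one atmosphere is an illustrative choice, not a value from the exercise.

```python
import math

def collapse_time_quadrature(R0, rho, P0, n=2000):
    """t_c = int_0^R0 dR / v(R), v = sqrt(2 P0/(3 rho)) * sqrt((R0/R)**3 - 1).
    With R/R0 = 1 - s**2 the integrand becomes
    2 (1 - s^2)^(3/2) / sqrt(3 - 3 s^2 + s^4) on [0, 1] (composite Simpson, n even)."""
    f = lambda s: 2 * (1 - s * s) ** 1.5 / math.sqrt(3 - 3 * s * s + s ** 4)
    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return R0 * math.sqrt(3 * rho / (2 * P0)) * total * h / 3

def collapse_time_exact(R0, rho, P0):
    """Same integral in closed form: R0 sqrt(3 rho/(2 P0)) * B(5/6, 1/2) / 3."""
    beta = math.gamma(5 / 6) * math.gamma(0.5) / math.gamma(4 / 3)
    return R0 * math.sqrt(3 * rho / (2 * P0)) * beta / 3

print(collapse_time_exact(1.0, 1.0, 1.0))          # dimensionless coefficient, about 0.915
print(collapse_time_quadrature(1.0, 1.0, 1.0))     # agrees with the closed form
print(collapse_time_exact(1.0e-3, 1000.0, 1.0e5))  # 1 mm bubble in water at 1 atm: about 9.1e-5 s
```

The collapse time scales as R₀(ρ/P₀)^{1/2}, so larger ambient pressure on the downstream part of the hydrofoil makes the bubbles collapse faster.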


12.7

Viscous Flows - Pipe Flow

12.7.1

Decomposition of the Velocity Gradient

It is an observational fact that many fluids develop a shear stress when they flow. Pouring honey from a spoon provides a convenient example. The stresses that are developed are known as viscous stresses. Most fluids, however, appear to flow quite freely; for example, a cup of tea appears to offer little resistance to stirring other than the inertia of the water. It might then be thought that viscous effects only account for a negligible correction to the description of the flow. However, this is not the case. Despite the fact that many fluids behave in a nearly ideal fashion almost always and almost everywhere, the effects of viscosity are still of great consequence. One of the main reasons for this is that most flows that we encounter touch solid bodies, at whose surfaces the velocity (relative to the body) must vanish. This leads to the formation of boundary layers whose thickness is controlled by the strength of the viscous forces. This boundary layer can then exert a controlling influence on the bulk flow. It may also lead to the development of turbulence.

We must therefore augment our equations of fluid dynamics to include viscous stress. Our formal development proceeds in parallel to that used in elasticity, with the velocity field v = dξ/dt replacing the displacement field ξ. As already discussed briefly in Sec. 12.5.4, we decompose the velocity gradient tensor ∇v into its irreducible tensorial parts: a rate of expansion θ, a symmetric, trace-free rate of shear tensor σ, and an antisymmetric rate of rotation tensor r, i.e.

$$\nabla\mathbf{v} = \tfrac{1}{3}\theta\,\mathsf{g} + \boldsymbol{\sigma} + \mathbf{r}. \tag{12.62}$$

Note that we use lower case symbols to distinguish the fluid case from its elastic counterpart: θ = dΘ/dt, σ = dΣ/dt, r = dR/dt. Proceeding directly in parallel to the treatment in Chap. 10, we write

$$\theta = \nabla\cdot\mathbf{v}, \tag{12.63a}$$

$$\sigma_{ij} = \tfrac{1}{2}\left(v_{i;j} + v_{j;i}\right) - \tfrac{1}{3}\theta\,g_{ij}, \tag{12.63b}$$

$$r_{ij} = \tfrac{1}{2}\left(v_{i;j} - v_{j;i}\right) = -\tfrac{1}{2}\epsilon_{ijk}\,\omega^k, \tag{12.63c}$$

where ω = 2dφ/dt is the vorticity, which is the counterpart of the rotation vector φ.
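The decomposition (12.62)–(12.63) is easy to carry out numerically for any 3×3 velocity gradient. The sketch below works in Cartesian coordinates and assumes the index convention G[i][j] = ∂v_i/∂x_j = v_{i;j}; it returns θ, σ, r, and the vorticity ω = ∇ × v, and the simple-shear example v = (s y, 0, 0) with s = 2 is an illustrative choice.

```python
def decompose_velocity_gradient(G):
    """Split a Cartesian velocity gradient G[i][j] = dv_i/dx_j (= v_{i;j})
    into expansion theta, symmetric trace-free shear sigma, and antisymmetric
    rotation r, per Eqs. (12.62)-(12.63). Also returns omega = curl v."""
    theta = G[0][0] + G[1][1] + G[2][2]                              # (12.63a)
    sigma = [[0.5 * (G[i][j] + G[j][i]) - (theta / 3 if i == j else 0.0)
              for j in range(3)] for i in range(3)]                  # (12.63b)
    r = [[0.5 * (G[i][j] - G[j][i]) for j in range(3)]
         for i in range(3)]                                          # (12.63c)
    omega = [G[2][1] - G[1][2],   # dv_z/dy - dv_y/dz
             G[0][2] - G[2][0],   # dv_x/dz - dv_z/dx
             G[1][0] - G[0][1]]   # dv_y/dx - dv_x/dy
    return theta, sigma, r, omega

def levi_civita(i, j, k):
    """Permutation symbol for indices 0..2: +1, -1, or 0."""
    return (i - j) * (j - k) * (k - i) // 2

# Simple shear v = (s*y, 0, 0): no expansion, equal parts shear and rotation.
s = 2.0
G = [[0.0, s, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
theta, sigma, r, omega = decompose_velocity_gradient(G)
print(theta, sigma[0][1], r[0][1], omega)   # 0.0 1.0 1.0 [0.0, 0.0, -2.0]
```

Simple shear is the standard example of a flow that is neither pure shear nor pure rotation: half of ∇v goes into σ and half into r.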

12.7.2

Navier-Stokes Equation

Although, as we have emphasized, a fluid at rest does not exert a shear stress, and this distinguishes it from an elastic solid, a fluid in motion can resist shear in the velocity field. It has been found experimentally that in most fluids the magnitude of this shear stress is linearly related to the velocity gradient. This law, due to Hooke’s contemporary, Isaac Newton, is the analogue of the linear relation between stress and strain that we used in our discussion of elasticity. Fluids that obey this law are known as Newtonian. (Some examples of the behavior of non-Newtonian fluids are exhibited in Figure 12.8.)

[Figure 12.8, panel (a): shear stress versus time for rheopectic, Newtonian, and thixotropic fluids; panel (b): shear stress versus rate of strain for Newtonian and plastic fluids.]

Fig. 12.8: Some examples of non-Newtonian behavior in fluids. (a) In a Newtonian fluid the shear stress is proportional to the rate of shear σ and does not vary with time when σ is constant. However, some substances, such as paint, flow more freely with time and are said to be thixotropic. Microscopically, what happens is that the molecules become aligned with the flow, which reduces the resistance. The opposite behavior is exhibited by rheopectic substances. (b) An alternative type of non-Newtonian behavior is exhibited by various plastics, where a threshold stress is needed before flow will commence.

Fluids are usually isotropic. (Important exceptions include smectic liquid crystals.) Therefore, by analogy with the theory of elasticity, we can describe the linear relation between stress and rate of strain using two constants called the coefficients of bulk and shear viscosity, denoted ζ and η respectively. We write the viscous contribution to the stress tensor as

$$\mathbf{T}_{\rm vis} = -\zeta\theta\,\mathsf{g} - 2\eta\boldsymbol{\sigma}, \tag{12.64}$$

by analogy to Eq. (10.18). If we add this viscous contribution to the stress tensor, then the law of momentum conservation ∂(ρv)/∂t + ∇ · T = ρg gives the following modification of Euler’s equation (12.40), which contains viscous forces:

$$\rho\frac{d\mathbf{v}}{dt} = -\nabla P + \rho\mathbf{g} + \nabla(\zeta\theta) + 2\nabla\cdot(\eta\boldsymbol{\sigma}). \tag{12.65}$$

This is called the Navier-Stokes equation, and the last two terms are the viscous force density. For incompressible flows (e.g., whenever the flow is highly subsonic; Sec. 12.6), θ can be approximated as zero, so the bulk viscosity can be ignored. If, in addition, the shear viscosity η is spatially uniform, then 2∇ · σ = ∇²v (using ∇ · v = 0), and Eq. (12.65) simplifies to

$$\frac{d\mathbf{v}}{dt} = -\frac{\nabla P}{\rho} + \mathbf{g} + \nu\nabla^2\mathbf{v}, \tag{12.66}$$

where

$$\nu = \frac{\eta}{\rho} \tag{12.67}$$

is known as the kinematic viscosity. This is the commonly quoted form of the Navier-Stokes equation.
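The step from 2∇ · (ησ) to η∇²v in Eq. (12.66) relies on ∇ · v = 0 and uniform η. The sketch below checks this numerically at one point by central finite differences, using as a test case the divergence-free Taylor-Green vortex pattern v = (sin x cos y, −cos x sin y, 0), frozen in time. The test field, the evaluation point, and the step size H are assumptions chosen for illustration; for this field ∇²v = −2v exactly.

```python
import math

def v_tg(x):
    """Hypothetical divergence-free test field (Taylor-Green pattern):
    v = (sin x cos y, -cos x sin y, 0)."""
    return [math.sin(x[0]) * math.cos(x[1]), -math.cos(x[0]) * math.sin(x[1]), 0.0]

H = 1e-3  # finite-difference step

def shifted(x, j, step):
    y = list(x)
    y[j] += step
    return y

def grad(v, x):
    """G[i][j] = dv_i/dx_j by central differences."""
    plus = [v(shifted(x, j, H)) for j in range(3)]
    minus = [v(shifted(x, j, -H)) for j in range(3)]
    return [[(plus[j][i] - minus[j][i]) / (2 * H) for j in range(3)] for i in range(3)]

def two_div_sigma(v, x):
    """(2 div sigma)_i = sum_j d/dx_j [v_{i;j} + v_{j;i} - (2/3) theta delta_ij]."""
    def two_sigma(y):
        G = grad(v, y)
        th = G[0][0] + G[1][1] + G[2][2]
        return [[G[i][j] + G[j][i] - (2 * th / 3 if i == j else 0.0)
                 for j in range(3)] for i in range(3)]
    out = [0.0, 0.0, 0.0]
    for j in range(3):
        sp, sm = two_sigma(shifted(x, j, H)), two_sigma(shifted(x, j, -H))
        for i in range(3):
            out[i] += (sp[i][j] - sm[i][j]) / (2 * H)
    return out

def laplacian(v, x):
    """(nabla^2 v)_i by second central differences."""
    v0 = v(x)
    out = [0.0, 0.0, 0.0]
    for j in range(3):
        vp, vm = v(shifted(x, j, H)), v(shifted(x, j, -H))
        for i in range(3):
            out[i] += (vp[i] - 2 * v0[i] + vm[i]) / H ** 2
    return out

x = [0.3, 0.7, 0.0]
print(two_div_sigma(v_tg, x))
print(laplacian(v_tg, x))
print([-2 * c for c in v_tg(x)])   # analytic Laplacian of this field
```

For a compressible test field the two results would differ by (1/3)∇θ, which is why the bulk-viscosity and θ terms can be dropped only when ∇ · v ≃ 0.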


12.7.3

Energy conservation and entropy production

The viscous stress tensor represents an additional momentum flux, which can do work on the fluid at a rate T_vis · v per unit area. There is therefore a contribution

$$\mathbf{F}_{\rm vis} = \mathbf{T}_{\rm vis}\cdot\mathbf{v} \tag{12.68}$$

to the energy flux, just like the term Pv appearing (via ρvh) in Eq. (12.52). Diffusive heat flow (thermal conductivity) can also contribute to the energy flux; its contribution is [Eq. (2.67b)]

$$\mathbf{F}_{\rm cond} = -\kappa\nabla T, \tag{12.69}$$

where κ is the coefficient of thermal conductivity. The molecules or particles that produce the viscosity and the heat flow also carry energy, but their energy density is already included in u, the total internal energy per unit mass. The total energy flux, including these contributions, is shown in Table 12.2, along with the energy density and the density and flux of momentum.

We see most clearly the influence of the dissipative viscous forces and heat conduction on energy conservation by inserting the energy density and flux from Table 12.2 into the law of energy conservation ∂U/∂t + ∇ · F = 0, subtracting v · [∂(ρv)/∂t + ∇ · T = 0] (v dotted into momentum conservation), and simplifying using mass conservation and the first law of thermodynamics. The result [Ex. 12.18] is the following equation for the evolution of entropy:

$$T\left[\rho\frac{ds}{dt} + \nabla\cdot\left(\frac{\mathbf{F}_{\rm cond}}{T}\right)\right] = \zeta\theta^2 + 2\eta\,\boldsymbol{\sigma}:\boldsymbol{\sigma} + \frac{\kappa}{T}(\nabla T)^2. \tag{12.70}$$

The term in square brackets on the left side represents an increase of entropy per unit volume moving with the fluid due to dissipation (the total increase minus that due to heat flowing conductively into a unit volume); multiplied by T, this is the rate of dissipative increase of thermal energy per unit volume. This increase of random, thermal energy is being produced, on the right side, by viscous heating (first two terms), and by the flow of heat F_cond = −κ∇T down a temperature gradient −∇T (third term). The dissipation equation (12.70) is the full content of the law of energy conservation for a dissipative fluid, when one takes account of mass conservation, momentum conservation, and the first law of thermodynamics.
Remarkably, we can combine this Lagrangian rate of viscous dissipation with the equation of mass conservation (12.25) to obtain an Eulerian differential equation for the entropy increase:

$$\frac{\partial(\rho s)}{\partial t} + \nabla\cdot(\rho s\mathbf{v} - \kappa\nabla\ln T) = \frac{1}{T}\left[\zeta\theta^2 + 2\eta\,\boldsymbol{\sigma}:\boldsymbol{\sigma} + \frac{\kappa}{T}(\nabla T)^2\right]. \qquad (12.71)$$

The left-hand side of this equation describes the rate of change of entropy density plus the divergence of entropy flux. The right-hand side is therefore the rate of production of entropy per unit volume. Invoking the second law of thermodynamics, this must be non-negative for every flow. Therefore the two coefficients of viscosity, like the bulk and shear moduli, must be positive, as must the coefficient of thermal conductivity κ (heat must flow from hotter regions to cooler regions).
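The right-hand side of Eq. (12.71) can be evaluated directly from a velocity gradient. The Python sketch below does so for a spatially uniform gradient, splitting it into the expansion θ and the shear σ and forming ζθ² + 2ησ:σ + (κ/T)(∇T)². The nested-list layout grad_v[i][j] = ∂v_j/∂x_i and the function names are assumptions made here for illustration. Because σ:σ and (∇T)² are sums of squares, the result cannot be negative when ζ, η, κ ≥ 0, which is the content of the second-law argument above; a pure rotation contributes nothing.

```python
def expansion_and_shear(grad_v):
    """Split a 3x3 velocity gradient grad_v[i][j] = dv_j/dx_i into the
    expansion rate theta = div v and the symmetric, trace-free shear sigma."""
    theta = sum(grad_v[i][i] for i in range(3))
    sigma = [[0.5 * (grad_v[i][j] + grad_v[j][i]) - (theta / 3.0 if i == j else 0.0)
              for j in range(3)]
             for i in range(3)]
    return theta, sigma


def viscous_heating(grad_v, zeta, eta):
    """zeta*theta^2 + 2*eta*sigma:sigma, the first two terms on the right of Eq. (12.70)."""
    theta, sigma = expansion_and_shear(grad_v)
    sigma_sq = sum(sigma[i][j] ** 2 for i in range(3) for j in range(3))
    return zeta * theta ** 2 + 2.0 * eta * sigma_sq


def entropy_production(grad_v, grad_T, T, zeta, eta, kappa):
    """Right-hand side of Eq. (12.71): entropy produced per unit volume per unit time."""
    conduction = kappa * sum(g * g for g in grad_T) / T
    return (viscous_heating(grad_v, zeta, eta) + conduction) / T


if __name__ == "__main__":
    s = 2.0
    shear_flow = [[0.0, 0.0, 0.0], [s, 0.0, 0.0], [0.0, 0.0, 0.0]]   # v_x = s*y
    rotation = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # v = (-y, x, 0)
    print(viscous_heating(shear_flow, zeta=0.0, eta=1e-3))  # equals eta * s**2
    print(viscous_heating(rotation, zeta=0.0, eta=1e-3))    # rigid rotation: no heating
```

For the simple shear v_x = s y the viscous heating reduces to ηs², the familiar dissipation rate of plane shear flow.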

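| Quantity | Mass | Momentum | Energy |
|---|---|---|---|
| Density | $\rho$ | $\rho\mathbf{v}$ | $U = (\tfrac{1}{2}v^2 + u + \Phi)\rho$ |
| Flux | $\rho\mathbf{v}$ | $\mathbf{T} = \rho\mathbf{v}\otimes\mathbf{v} + P\mathbf{g} - \zeta\theta\mathbf{g} - 2\eta\boldsymbol{\sigma}$ | $\mathbf{F} = (\tfrac{1}{2}v^2 + h + \Phi)\rho\mathbf{v} - \zeta\theta\mathbf{v} - 2\eta\boldsymbol{\sigma}\cdot\mathbf{v} - \kappa\nabla T$ |

Table 12.2: Densities and fluxes of mass, momentum, and energy for a dissipative fluid in an externally produced gravitational field. For self-gravitating systems see Box 12.3.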

12.7.4

Molecular Origin of Viscosity

Microscopically, we can distinguish gases from liquids. In gases, molecules of mass m travel a distance of order their mean free path λ before they collide. If there is a velocity gradient ∇v in the fluid, then they will, on average, transport a momentum ∼ mλ∇v with themselves. If there are n molecules per unit volume traveling with mean speed c̄, then the extra momentum crossing a unit area in unit time is ∼ nmc̄λ∇v, from which we may extract an estimate of the coefficient of shear viscosity

$$\eta = \tfrac{1}{3}\rho\bar{c}\lambda. \qquad (12.72)$$

Here the numerical coefficient of 1/3 has been inserted to agree with a proper kinetic-theory calculation. (Since, in the language of Chap. 2, the viscosity coefficients are actually "transport coefficients" for momentum, a kinetic-theory calculation can be made using the techniques of Section 2.7.) Note from Eq. (12.72) that in a gas the coefficient of viscosity is independent of density (the mean free path scales as λ ∝ 1/n, so ρλ is fixed) and increases with temperature as η ∝ c̄ ∝ T^{1/2}. In a liquid, where the molecules are less mobile, it is the close intermolecular attraction that produces the shear stress. The ability of molecules to slide past one another therefore increases rapidly with their thermal activation, causing typical liquid viscosity coefficients to fall dramatically with temperature.
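Equation (12.72) can be turned into a numerical estimate for air, in the spirit of Ex. 12.16. The sketch below assumes hard-sphere molecules with an effective diameter d ≈ 3.7 × 10⁻¹⁰ m, a mean molecular weight of 29, the hard-sphere mean free path λ = 1/(√2 nπd²), and the Maxwellian mean speed c̄ = (8k_BT/πm)^{1/2}; none of these inputs come from the text, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant [J/K]
AMU = 1.66053907e-27   # atomic mass unit [kg]


def gas_viscosity(T=300.0, P=1.0e5, mu=29.0, d=3.7e-10):
    """Kinetic-theory estimate eta = (1/3) rho cbar lambda, Eq. (12.72).

    Assumed model: hard spheres of effective diameter d [m], mean molecular
    weight mu, ideal gas P = n k_B T, mean free path 1/(sqrt(2) n pi d^2),
    Maxwellian mean speed sqrt(8 k_B T / (pi m)).
    Returns (lambda, cbar, eta, nu) in SI units.
    """
    n = P / (K_B * T)                                     # number density [m^-3]
    m = mu * AMU                                          # molecular mass [kg]
    mfp = 1.0 / (math.sqrt(2.0) * n * math.pi * d ** 2)   # mean free path [m]
    cbar = math.sqrt(8.0 * K_B * T / (math.pi * m))       # mean speed [m/s]
    rho = n * m
    eta = rho * cbar * mfp / 3.0                          # shear viscosity [Pa s]
    return mfp, cbar, eta, eta / rho


mfp, cbar, eta, nu = gas_viscosity()
print(f"lambda ~ {mfp:.1e} m, cbar ~ {cbar:.0f} m/s, nu ~ {nu:.1e} m^2/s")
```

At 300 K and 10⁵ Pa this gives λ ≈ 7 × 10⁻⁸ m, c̄ ≈ 470 m s⁻¹ and ν ≈ 1 × 10⁻⁵ m² s⁻¹, consistent with the entry for air in Table 12.3. Raising T by a factor of four doubles η, while changing P leaves η unchanged, as the scaling argument above predicts.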

12.7.5

Reynolds’ Number

The kinematic viscosity ν has dimensions [L]²[T]⁻¹. This suggests that we quantify the importance of viscosity by comparing ν with the product of a characteristic velocity in the flow, V, and a characteristic length, L. The dimensionless combination

$$R = \frac{LV}{\nu} \qquad (12.73)$$

is known as the Reynolds’ number, and is the first of many dimensionless numbers we shall encounter in our study of fluid mechanics. Flows with Reynolds number much less than unity – such as the tragic Boston molasses tank explosion in 1919 – are dominated by viscosity. Large Reynolds’ number flows can still be controlled by viscosity (as we shall see in later chapters), especially when acting near boundaries, despite the fact that the viscous stresses are negligible over most of the volume.
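Computing R is a one-line formula; the sketch below applies Eq. (12.73) to a few flows using the kinematic viscosities listed in Table 12.3. The lengths and speeds are hypothetical values chosen only to show the range of R, and the two-way labeling is a rough guide rather than a sharp criterion.

```python
# Kinematic viscosities from Table 12.3 [m^2 s^-1]
NU = {"water": 1e-6, "air": 1e-5, "glycerine": 1e-3, "blood": 3e-6}


def reynolds(L, V, nu):
    """Reynolds' number R = L V / nu, Eq. (12.73)."""
    return L * V / nu


# Hypothetical flows: (description, L [m], V [m/s], fluid)
EXAMPLES = [
    ("water in a 1 cm pipe at 1 m/s", 0.01, 1.0, "water"),
    ("air past a 1 m body at 10 m/s", 1.0, 10.0, "air"),
    ("glycerine past a 1 cm sphere at 1 cm/s", 0.01, 0.01, "glycerine"),
]

for text, L, V, fluid in EXAMPLES:
    R = reynolds(L, V, NU[fluid])
    regime = ("viscosity dominates" if R < 1.0
              else "viscous stress negligible over most of the volume")
    print(f"{text}: R = {R:.1e} ({regime})")
```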

| Fluid | Kinematic viscosity ν (m² s⁻¹) |
|---|---|
| Water | 10⁻⁶ |
| Air | 10⁻⁵ |
| Glycerine | 10⁻³ |
| Blood | 3 × 10⁻⁶ |

Table 12.3: Kinematic viscosity for common fluids.

12.7.6

Blood Flow

Let us now consider one simple example of a viscous stress at work, namely the flow of blood down an artery. Let us model the artery as a cylindrical pipe of radius R, down which the blood is forced by a pressure gradient. This is an example of what is called pipe flow. In the absence of external forces and time dependence, the divergence of the total stress tensor must vanish. Therefore,

$$\nabla\cdot\left[\rho\mathbf{v}\otimes\mathbf{v} + P\mathbf{g} - 2\eta\boldsymbol{\sigma}\right] = 0. \qquad (12.74)$$

Now, in most instances of pipe flow ρv² ≪ ∆P (the pressure difference between the two ends), so we can neglect the first term in Eq. (12.74). We now suppose that the flow is solely along the z direction and is a function only of cylindrical radius ϖ. (This is an example of laminar flow.) This is, in fact, a very important restriction. As we shall discuss in detail in Chap. 14, many flows become turbulent, and this has a major impact on the result. As the density is effectively constant (we satisfy the conditions for incompressible flow) and we must conserve mass, the velocity cannot change along the pipe. Therefore the only non-vanishing component of the shear tensor is the ϖz component. Re-expressing Eq. (12.74) in cylindrical coordinates, and inferring from it that the pressure is a function of z only and not of ϖ, we obtain

$$\frac{1}{\varpi}\frac{d}{d\varpi}\left(\varpi\,\eta\,\frac{dv}{d\varpi}\right) = \frac{dP}{dz}, \qquad (12.75)$$

where dP/dz is the pressure gradient along the pipe. This differential equation must be solved subject to the boundary conditions that the velocity gradient vanish at the center of the pipe and that the velocity vanish at its walls. The solution is

$$v(\varpi) = -\frac{dP}{dz}\,\frac{R^2 - \varpi^2}{4\eta}. \qquad (12.76)$$

We can now evaluate the total discharge, or mass of fluid flowing along the pipe per unit time:

$$\frac{dm}{dt} = \int_0^R \rho v\,2\pi\varpi\,d\varpi = -\frac{\pi\rho R^4}{8\eta}\frac{dP}{dz}. \qquad (12.77)$$

This relation is known as Poiseuille’s law. Now let us apply it to blood. Consider an artery of radius R = 1 mm. An estimate of the pressure gradient may be obtained from the difference between the diastolic and systolic pressures measured by a doctor (∼ 40 mm of mercury ∼ 5 × 10³ N m⁻² in a healthy adult), divided by the length of the artery, ∼ 1 m. The kinematic viscosity is ν = η/ρ = 3 × 10⁻⁶ m² s⁻¹ from Table 12.3. The rate of blood flow is then ∼ 6 × 10⁻⁴ kg s⁻¹, or ∼ 6 × 10⁻⁷ m³ s⁻¹. Supposing there are ten such arteries of this size and length, the total blood flow will be ∼ 6 × 10⁻⁶ m³ s⁻¹. Actually, the heart of a healthy adult pumps the full complement of blood, ∼ 5 litres or ∼ 5 × 10⁻³ m³, every minute, at a mean rate of ∼ 10⁻⁴ m³ s⁻¹, roughly fifteen times faster than this estimate. The main reason for this large discrepancy is that we have assumed in our calculation that the walls of an artery are rigid. They are not. They are quite elastic and are able to contract and expand in a wave-like manner so as to boost the blood flow considerably. Note that the Poiseuille formula is very sensitive to the radius of the pipe, dm/dt ∝ R⁴, so a factor two increase in radius increases the flow of blood by a factor of sixteen. Both hardening and thinning of the arteries will therefore strongly inhibit the flow of blood. Eat salads!

****************************
EXERCISES

Exercise 12.16 Problem: Mean free path
Estimate the collision mean free path of the air molecules around you. Hence verify the estimate for the kinematic viscosity of air given in Table 12.3.

Exercise 12.17 Example: Kinematic interpretation of Vorticity
Consider a velocity field with non-vanishing curl. Define a locally orthonormal basis at a point in the velocity field so that one basis vector, e_x, is parallel to the vorticity. Now imagine the remaining two basis vectors as being frozen into the fluid. Show that they will both rotate about the axis defined by e_x and that the vorticity will be the sum of their angular velocities (i.e. twice the average of their angular velocities).
Exercise 12.18 Derivation: Entropy Increase
Derive the Lagrangian equation (12.70) for the rate of increase of entropy in a dissipative fluid by the steps in the sentence preceding that equation. [Hints: If you have already done the analogous problem, Ex. 12.14, for an ideal fluid, then you need only compute the new terms that arise from the dissipative momentum flux $\mathbf{T}_{\rm vis} = -\zeta\theta\mathbf{g} - 2\eta\boldsymbol{\sigma}$ and the dissipative energy fluxes $\mathbf{F}_{\rm vis} = \mathbf{T}_{\rm vis}\cdot\mathbf{v}$ and $\mathbf{F}_{\rm cond} = -\kappa\nabla T$. The sum of these new contributions, when you subtract v · (momentum conservation) from energy conservation, is $\nabla\cdot(\mathbf{T}_{\rm vis}\cdot\mathbf{v}) - \mathbf{v}\cdot(\nabla\cdot\mathbf{T}_{\rm vis}) + \nabla\cdot\mathbf{F}_{\rm cond}$; this must be added to the left side of the result ρT ds/dt = 0, Eq. (12.54), for an ideal fluid.]
****************************
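A quick numerical check of the pipe-flow results and of the artery estimate in Sec. 12.7.6 can be sketched as follows. The code integrates the profile (12.76) by the midpoint rule, compares it with Poiseuille's law (12.77), and evaluates the blood-flow numbers quoted in the text. The blood density ρ = 10³ kg m⁻³ is an assumption (only ν is given in Table 12.3), and the function names are hypothetical.

```python
import math


def poiseuille_velocity(w, R, dPdz, eta):
    """Velocity profile of Eq. (12.76) at cylindrical radius w (0 <= w <= R)."""
    return -dPdz * (R ** 2 - w ** 2) / (4.0 * eta)


def mass_flow_exact(R, dPdz, eta, rho):
    """Poiseuille's law, Eq. (12.77)."""
    return -math.pi * rho * R ** 4 * dPdz / (8.0 * eta)


def mass_flow_numeric(R, dPdz, eta, rho, n=2000):
    """Midpoint-rule evaluation of the integral in Eq. (12.77)."""
    h = R / n
    total = 0.0
    for k in range(n):
        w = (k + 0.5) * h
        total += rho * poiseuille_velocity(w, R, dPdz, eta) * 2.0 * math.pi * w * h
    return total


# Artery numbers from the text; the density rho is an assumed value.
rho, nu = 1.0e3, 3.0e-6
eta = rho * nu
R, dPdz = 1.0e-3, -5.0e3   # 1 mm radius; ~5e3 Pa pressure drop over ~1 m
mdot = mass_flow_exact(R, dPdz, eta, rho)
print(f"dm/dt = {mdot:.2e} kg/s, volume flow = {mdot / rho:.2e} m^3/s")
print(f"ten arteries: {10 * mdot / rho:.1e} m^3/s vs heart: {5e-3 / 60:.1e} m^3/s")
```

The mass flow comes out at about 6.5 × 10⁻⁴ kg s⁻¹ per artery; the density cancels from the volume flow, which depends only on ν. Doubling R multiplies the flow by 16, the R⁴ sensitivity noted in the text.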

Bibliographic Note

There are many good texts on fluid mechanics, most directed toward an engineering or applied mathematics audience. Among those we find useful are Acheson (1990) at an elementary level, and Batchelor (1970) and Lighthill (1986) at a more advanced level. Landau and Lifshitz (1959), as always, is terse, but good for physicists who already have some knowledge of the subject. Tritton (1977) takes an especially physical approach to the subject, with lots of useful diagrams and photographs of fluid flows.

Physical intuition is very important in fluid mechanics, and is best developed with the aid of visualizations, both movies and photographs. In recent years many visualizations have been made available on the web. For a catalog, see University of Iowa Fluids Laboratory (1999). Movies that we have found especially useful are those of Hunter Rouse (1965) and the National Committee for Fluid Mechanics Films (1963).

Box 12.5 Terminology in Chapter 12
This chapter introduces a large amount of terminology. We list much of it here.
adiabatic: A process in which each fluid element conserves its entropy.
adiabatic index: The parameter Γ that relates pressure and density changes, δP/P = Γ δρ/ρ, in an adiabatic process. For an ideal gas, it is the ratio of specific heats, Γ = γ ≡ C_P/C_V.
advective time derivative: The time derivative d/dt = ∂/∂t + v · ∇ moving with the fluid.
barotropic: A process or equation in which pressure can be regarded as a function solely of density, P = P(ρ).
Bernoulli function (also sometimes called the Bernoulli constant): B = ρ(½v² + h + Φ).
bulk viscosity, coefficient of: The proportionality constant ζ relating rate of expansion to viscous stress, T_vis = −ζθg.
convective time derivative: Same as advective time derivative.
dissipation: A process that increases the entropy. Viscosity and diffusive heat flow are forms of dissipation.
equation of state: In this chapter, where chemical and nuclear reactions do not occur, relations of the form u(ρ, s), P(ρ, s) or u(ρ, T), P(ρ, T).
Eulerian changes: Changes in a quantity at fixed location; cf. Lagrangian changes.
Euler equation: Newton’s “F = ma” equation for an ideal fluid, ρ dv/dt = −∇P + ρg.
expansion, rate of: Fractional rate of increase of a fluid element’s volume, θ = ∇ · v.
gas: A fluid in which the separations between molecules are large compared to the molecular sizes and there are no long-range forces between molecules except gravity; contrast this with a liquid.
ideal gas (also called “perfect gas”): A gas in which the sizes of the molecules and (nongravitational) forces between them are completely neglected, so the pressure is due solely to kinetic motions of molecules, P = n k_B T.
ideal flow: A flow in which there is no dissipation.
ideal fluid (also called “perfect fluid”): A fluid in which there are no dissipative processes.
incompressible: A process or fluid in which the fractional changes of density are small, δρ/ρ ≪ 1.
inviscid: With negligible viscosity.
irrotational: A flow or fluid with vanishing vorticity.
isentropic: A process or fluid in which the entropy per unit rest mass s is the same everywhere.
isothermal: A process or fluid in which the temperature is the same everywhere.
isobar: A surface of constant pressure.
kinematic viscosity: ν ≡ η/ρ, the ratio of the coefficient of shear viscosity to the density.
Lagrangian changes: Changes measured moving with the fluid; cf. Eulerian changes.
laminar flow: A non-turbulent flow.
liquid: A fluid such as water in which the molecules are packed side by side; contrast this with a gas.
mean molecular weight: The average mass of a molecule in a gas, divided by the mass of a proton.
Navier-Stokes equation: Newton’s “F = ma” equation for a viscous, incompressible fluid, dv/dt = −(1/ρ)∇P + ν∇²v + g.
Newtonian fluid: Two meanings: (i) nonrelativistic fluid; (ii) a fluid in which the only anisotropic stresses are those due to bulk and shear viscosity.
perfect gas: Ideal gas.
perfect fluid: Ideal fluid.
polytropic: A barotropic pressure-density relation of the form $P \propto \rho^{1+1/n}$ for some constant n called the polytropic index. The proportionality constant is often some function of entropy.
Reynolds’ number: The ratio R = LV/ν, where L is the characteristic lengthscale of a flow, V is the characteristic velocity, and ν is the kinematic viscosity. In order of magnitude this is the ratio of the inertial acceleration (v · ∇)v to the viscous acceleration ν∇²v in the Navier-Stokes equation.
rotation, rate of: Antisymmetric part of the gradient of velocity; vorticity converted into an antisymmetric tensor using the Levi-Civita tensor.
shear, rate of: Symmetric, trace-free part of the gradient of velocity.
steady flow: One that is independent of time in some chosen coordinate system.
turbulent flow: A flow characterized by chaotic fluid motions.
vorticity: The curl of the velocity field.

Box 12.6 Important Concepts in Chapter 12
• Dependence of pressure on density: equation of state; δP = K δρ/ρ for a liquid; δP/P = Γ δρ/ρ for a gas, Sec. 12.2
• Hydrostatic equilibrium, Sec. 12.3
• Archimedes’ law, Sec. 12.3.1
• Shapes of rotating bodies, Sec. 12.3.2
• Centrifugal potential and hydrostatics in rotating reference frame, Sec. 12.3.3
• Conservation laws: mass, momentum and energy; Lagrangian vs. Eulerian approach, Sec. 12.4
• Gravitational field: densities and fluxes of momentum and energy, Box 12.3
• Viscous stress and energy flux, Sec. 12.7.2
• Thermal conductivity and diffusive energy flux, Sec. 12.7.2
• Densities and fluxes of mass, momentum, and energy summarized, Tables 12.1 and 12.2
• Euler equation (momentum conservation) for an ideal fluid, Secs. 12.5.2, 12.5.3
• Bernoulli’s theorem, Sec. 12.5.4
• Incompressibility of subsonic gas, Sec. 12.6
• Rates of expansion, rotation, and shear, and vorticity, Secs. 12.5.4 and 12.7.1
• Navier-Stokes equation (momentum conservation) for viscous, incompressible fluid, Sec. 12.7.2
• Energy conservation equivalent to a law for the evolution of entropy, Secs. 12.5.5, 12.7.3
• Entropy increase (dissipation) due to viscosity and diffusive heat flow, Sec. 12.7.3
• Molecular origin of viscosity, Sec. 12.7.4

Bibliography

Acheson, D. J. 1990. Elementary Fluid Dynamics, Oxford: Clarendon Press.
Batchelor, G. K. 1970. An Introduction to Fluid Dynamics, Cambridge: Cambridge University Press.
Brenner, M. P., Hilgenfeldt, S. and Lohse, D. 2002. Rev. Mod. Phys. 74, 425.
Chandrasekhar, S. 1939. Stellar Structure, Chicago: University of Chicago Press; reprinted by Dover Publications.
Landau, L. D. and Lifshitz, E. M. 1959. Fluid Mechanics, Oxford: Pergamon.
Lighthill, J. 1986. An Informal Introduction to Theoretical Fluid Mechanics, Oxford: Oxford University Press.
National Committee for Fluid Mechanics Films, ca. 1963. Fluid Mechanics Films. Available at http://web.mit.edu/fluids/www/Shapiro/ncfmf.html
Reif, F. 1959. Fundamentals of Statistical and Thermal Physics, New York: McGraw-Hill.
Rouse, H. ca. 1965. Fluid Mechanics Movies. Available at http://users.rowan.edu/~orlins/fm/movies.html
Tritton, D. J. 1977. Physical Fluid Dynamics, Wokingham: van Nostrand-Reinhold.
University of Iowa Fluids Laboratory. 1999. Flow Visualization & Simulation Gallery. Available at http://css.engineering.uiowa.edu/fluidslab/referenc/visualizations.html
White, F. M. 1974. Viscous Fluid Flow, New York: McGraw-Hill.

Contents
13 Vorticity
13.1 Overview
13.2 Vorticity and Circulation
13.2.1 Vorticity Transport
13.2.2 Tornados
13.2.3 Kelvin’s Theorem
13.2.4 Diffusion of Vortex Lines
13.2.5 Sources of Vorticity
13.3 Low Reynolds Number Flow – Stokes Flow, Sedimentation, and Climate Change
13.3.1 Stokes Flow
13.3.2 Sedimentation Rate
13.4 High Reynolds Number Flow – Laminar Boundary Layers
13.4.1 Vorticity Profile
13.4.2 Separation
13.5 Nearly Rigidly Rotating Flow – Earth’s Atmosphere and Oceans
13.5.1 Equations of Fluid Dynamics in a Rotating Reference Frame
13.5.2 Geostrophic Flows
13.5.3 Taylor-Proudman Theorem
13.5.4 Ekman Boundary Layers
13.6 Kelvin-Helmholtz Instability – Excitation of Ocean Waves by Wind
13.6.1 Temporal growth
13.6.2 Spatial Growth
13.6.3 Relationship between temporal and spatial growth; Excitation of ocean waves by wind
13.6.4 Physical Interpretation
13.6.5 Rayleigh and Richardson Stability Criteria

Chapter 13 Vorticity

Version 0813.1.K, 28 January 2008. Please send comments, suggestions, and errata via email to [email protected] or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 13.1 Reader’s Guide
• This chapter relies heavily on Chap. 12, Fundamentals of Fluid Dynamics.
• Chapters 14–18 (fluid mechanics and magnetohydrodynamics) are extensions of this chapter; to understand them, this chapter must be mastered.
• Portions of Part V, Plasma Physics (especially Chap. 20 on the “two-fluid formalism”), rely on this chapter.

13.1

Overview

In the last chapter, we introduced an important quantity called vorticity, which is the subject of the present chapter. Although the mathematically simplest flows are “potential flows”, with velocity of the form v = ∇ψ for some ψ, so that the vorticity ω = ∇ × v vanishes, the majority of naturally occurring flows are vortical. We shall find that studying vorticity allows us to develop an intuitive understanding of how flows evolve. Furthermore, computing the vorticity can provide an important step along the path to determining the full velocity field of a flow.

We all think we know how to recognise a vortex. The most hackneyed example is water disappearing down a drainhole in a bathtub. Here what happens is that water at large distances has a small angular velocity about the drain, which increases as the water flows towards the drain in order to conserve angular momentum. This in turn means that the product of the circular velocity $v_\phi$ and the radius r is independent of radius, which, in turn, implies that ∇ × v ≈ 0. So this is a vortex without much vorticity (except, as we shall see, a delta-function spike of vorticity right at the drainhole’s center)! Vorticity is a precise physical quantity defined by ω = ∇ × v, not any vaguely circulatory motion.¹

In Sec. 13.2 we shall introduce several tools for analyzing and utilizing vorticity. Vorticity is a vector field and therefore has integral curves obtained by solving dx/dλ = ω for some parameter λ. These are called vortex lines, and they are quite analogous to magnetic field lines. (We shall also introduce an integral quantity called the circulation, analogous to the magnetic flux, and show how this is also helpful for understanding flows.) In fact, the analogy with magnetic fields turns out to be extremely useful. Vorticity, like a magnetic field, automatically has vanishing divergence, which means that the vortex lines are continuous, just like magnetic field lines. Vorticity, again like a magnetic field, is an axial vector and thus can be written as the curl of a polar vector potential, the velocity v.² Vorticity has the interesting property that it evolves in a perfect fluid in such a manner that the flow carries the vortex lines along with it. Furthermore, when viscous stresses are important, vortex lines diffuse through the moving fluid with a diffusion coefficient that is equal to the kinematic viscosity.

In Sec. 13.3 we study a classical problem that illustrates both the action and the propagation of vorticity: the creeping flow of a low Reynolds number fluid around a sphere. (Low Reynolds number flow arises when the magnitude of the viscous stress in the equation of motion exceeds the magnitude of the inertial acceleration.) The solution to this problem finds contemporary application in computing the sedimentation rates of soot particles in the atmosphere. In Sec. 13.4, we turn to high Reynolds number flows, in which the viscous stress is quantitatively weak over most of the fluid.
Here, the action of vorticity can be concentrated in relatively thin boundary layers in which the vorticity, created at the wall, diffuses away into the main body of the flow. Boundary layers arise because in real fluids, intermolecular attraction requires that the component of the fluid velocity parallel to the boundary (not just the normal component) vanish. It is the vanishing of both components of velocity that distinguishes real fluid flow at high Reynolds number (i.e. small viscosity) from the solutions obtained assuming vanishing vorticity. Nevertheless, it is often a good approximation to adopt a solution to the equations of fluid dynamics in which vortex-free fluid slips freely past the solid, and then match it onto the solid using a boundary-layer solution.

Stirred water in a teacup and the Earth’s oceans and atmosphere rotate nearly rigidly, and so are most nicely analyzed in a co-rotating reference frame. In Sec. 13.5 we use such an analysis to discover novel phenomena produced by Coriolis forces, including winds around pressure depressions, “Taylor columns” of fluid that hang together in a rigid-body-like way, spiral-shaped boundary layers, gyres such as the Sargasso Sea around which ocean currents such as the Gulf Stream circulate, and tea leaves that accumulate at the bottom center of a tea cup.

All the above issues will be discussed in some detail in this chapter. As a final issue, in Sec. 13.6 we shall consider a simple vortex sheet, ignoring viscous stresses. We shall find that this type of flow is generically unstable. This will provide a good introduction to the principal topic of the next chapter, turbulence.

¹ Incidentally, in a bathtub the magnitude of the Coriolis force resulting from the earth’s rotation with angular velocity Ω is a fraction ∼ Ω(r/g)^{1/2} ∼ 3 × 10⁻⁶ of the typical centrifugal force in the vortex (where g is the acceleration of gravity and r ∼ 2 cm is the radius of the vortex at the point where the water’s surface achieves an appreciable downward slope). Thus, only under the most controlled of conditions will the hemisphere in which the drainhole is located influence the direction of rotation.

² Pursuing the electromagnetic analogy further, we can ask the question, “Given a specified vorticity field ω(x, t), can I solve uniquely for the velocity v(x, t)?” The answer, of course, is “No”. There is gauge freedom and many solutions exist. Interestingly, if we specify that the flow be incompressible, ∇ · v = 0 (i.e. be in the analog of Coulomb gauge), and impose suitable boundary conditions (for example v → 0 at infinity), then v(x, t) is unique.

Fig. 13.1: Illustration of vorticity in three two-dimensional flows. (a) Constant angular velocity Ω. If we measure radius r from the center, the circular velocity satisfies v = Ωr. This flow has vorticity ω = 2Ω. (b) Constant angular momentum per unit mass j, with v = j/r. This flow has zero vorticity except at its center, ω = 2πj δ(x). (c) Shear flow in a laminar boundary layer, $v_x = \omega y$. In this flow the vorticity has magnitude $\omega = v_x/y$.

13.2

Vorticity and Circulation

We have already defined the vorticity as the curl of the velocity, ω = ∇ × v, analogous to defining the magnetic field as the curl of a vector potential. We can illustrate vorticity by considering the three simple 2-dimensional flows shown in Fig. 13.1. Fig. 13.1(a) shows uniform rotation with angular velocity $\boldsymbol{\Omega} = \Omega\mathbf{e}_z$. The velocity field is v = Ω × x, where x is measured from the rotation axis. Taking its curl, we discover that ω = 2Ω. Fig. 13.1(b) shows a flow in which the angular momentum per unit mass $\mathbf{j} = j\mathbf{e}_z$ is constant because it is approximately conserved in a flow lasting many rotation periods; i.e. v = j × x/r² (where r = |x| and j = constant). This is the kind of flow that occurs around a bathtub vortex and around a tornado. In this case, the vorticity is ω = 2πj δ(x), i.e. it vanishes everywhere except at the center, x = 0. What is different in this case is that the fluid rotates differentially: two neighboring fluid elements separated tangentially rotate about each other with an angular velocity j/r², but when the two elements are separated radially, their angular velocity is −j/r². The average of these two angular velocities vanishes, and the vorticity vanishes.

The vanishing vorticity in this case is an illustration of a simple geometrical description of vorticity in any two-dimensional flow. If we orient the $\mathbf{e}_z$ axis of a Cartesian coordinate system normal to the velocity field, then

$$\omega_z = \frac{\partial v_y}{\partial x} - \frac{\partial v_x}{\partial y}. \qquad (13.1)$$

From this expression, it is apparent that the vorticity at a point is the sum of the angular velocities of any pair of mutually perpendicular, infinitesimal lines passing through that point and moving with the fluid. If we float a little vane with orthogonal fins in the flow, the vane will rotate with an angular velocity that is the average of the flow’s angular velocities at its fins, which is half the vorticity. Equivalently, the vorticity is twice the rotation rate of the vane. In the case of constant angular momentum flow in Fig. 13.1(b), the average of the two angular velocities is zero, the vane doesn’t rotate, and the vorticity vanishes. Figure 13.1(c) shows the flow in a plane-parallel shear layer. In this case, a line in the flow along the x direction does not rotate, while a line along the y direction rotates with angular velocity ω. The sum of these two angular velocities, 0 + ω = ω, is the vorticity. Evidently, curved streamlines are not a necessary condition for vorticity.
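Equation (13.1) is easy to evaluate numerically. The sketch below applies centred finite differences to the three flows of Fig. 13.1; the parameter values OMEGA, J and S and the helper names are hypothetical. It returns 2Ω for rigid rotation and zero for the constant-j flow at any point away from the center. For the shear layer v_x = Sy it returns −S: the y-directed line element rotates clockwise, so the discussion above, which adds angular velocities as 0 + ω, is quoting magnitudes.

```python
OMEGA = 0.7   # hypothetical angular velocity for flow (a)
J = 1.3       # hypothetical angular momentum per unit mass for flow (b)
S = 2.0       # hypothetical shear rate for flow (c), v_x = S*y


def vorticity_z(v, x, y, h=1e-5):
    """omega_z = dv_y/dx - dv_x/dy, Eq. (13.1), by centred differences.
    v(x, y) must return the pair (v_x, v_y)."""
    dvy_dx = (v(x + h, y)[1] - v(x - h, y)[1]) / (2.0 * h)
    dvx_dy = (v(x, y + h)[0] - v(x, y - h)[0]) / (2.0 * h)
    return dvy_dx - dvx_dy


def rigid_rotation(x, y):   # Fig. 13.1(a): v = Omega x x
    return (-OMEGA * y, OMEGA * x)


def constant_j(x, y):       # Fig. 13.1(b): v = j x x / r^2
    r2 = x * x + y * y
    return (-J * y / r2, J * x / r2)


def shear_layer(x, y):      # Fig. 13.1(c): v_x = S y
    return (S * y, 0.0)


for name, flow in [("(a)", rigid_rotation), ("(b)", constant_j), ("(c)", shear_layer)]:
    print(name, vorticity_z(flow, 0.8, 0.5))
```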

13.2.1

Vorticity Transport

By analogy with magnetic field lines, we define a flow’s vortex lines to be parallel to the vorticity vector ω and to have a line density proportional to ω = |ω|. These vortex lines will always be continuous throughout the fluid because the vorticity field, like the magnetic field, is a curl and therefore is necessarily solenoidal (∇ · ω = 0). However, vortex lines can begin and end on solid surfaces, as the equations of fluid dynamics no longer apply there. Vorticity depends on the velocity field at a particular instant, and will evolve with time as the velocity field evolves. We can determine how by manipulating the Navier-Stokes equation

$$\frac{d\mathbf{v}}{dt} \equiv \frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} = -\frac{\nabla P}{\rho} - \nabla\Phi + \nu\nabla^2\mathbf{v} \qquad (13.2)$$

[Eq. (12.66)]. (Here and throughout this chapter we keep matters simple by assuming that the bulk viscosity is ignorable and the coefficient of shear viscosity is constant.) We take the curl of (13.2) and use the vector identity (v · ∇)v = ∇(v²)/2 − v × ω (easily derivable using the Levi-Civita tensor and index notation) to obtain

$$\frac{\partial\boldsymbol{\omega}}{\partial t} = \nabla\times(\mathbf{v}\times\boldsymbol{\omega}) - \frac{\nabla P\times\nabla\rho}{\rho^2} + \nu\nabla^2\boldsymbol{\omega}. \qquad (13.3)$$

It is convenient to rewrite this vorticity evolution law with the aid of the relation (again derivable using the Levi-Civita tensor)

$$\nabla\times(\mathbf{v}\times\boldsymbol{\omega}) = (\boldsymbol{\omega}\cdot\nabla)\mathbf{v} + \mathbf{v}(\nabla\cdot\boldsymbol{\omega}) - \boldsymbol{\omega}(\nabla\cdot\mathbf{v}) - (\mathbf{v}\cdot\nabla)\boldsymbol{\omega}. \qquad (13.4)$$
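The identity (v · ∇)v = ∇(v²)/2 − v × ω, used above to pass from Eq. (13.2) to Eq. (13.3), can be checked numerically at a point. In the sketch below the test field is an arbitrary smooth choice made for illustration, and the gradient is taken by centred differences with G[i][j] = ∂v_j/∂x_i. Because the identity is algebraic in the components of ∇v, it holds for the finite-difference gradient to rounding error, so this checks the index bookkeeping rather than the accuracy of the derivatives.

```python
import math


def velocity(p):
    """An arbitrary smooth test field (hypothetical, for illustration only)."""
    x, y, z = p
    return (math.sin(y) + z * z, x * z, math.cos(x) * y)


def gradient(f, p, h=1e-5):
    """G[i][j] = d f_j / d x_i by centred differences."""
    G = []
    for i in range(3):
        up, dn = list(p), list(p)
        up[i] += h
        dn[i] -= h
        fu, fd = f(up), f(dn)
        G.append([(fu[j] - fd[j]) / (2.0 * h) for j in range(3)])
    return G


def identity_residual(p):
    """Largest component of (v.grad)v - [grad(v^2/2) - v x omega] at point p."""
    v = velocity(p)
    G = gradient(velocity, p)
    advective = [sum(v[i] * G[i][j] for i in range(3)) for j in range(3)]
    grad_half_v2 = [sum(v[j] * G[i][j] for j in range(3)) for i in range(3)]
    omega = (G[1][2] - G[2][1], G[2][0] - G[0][2], G[0][1] - G[1][0])   # curl v
    v_cross_omega = (v[1] * omega[2] - v[2] * omega[1],
                     v[2] * omega[0] - v[0] * omega[2],
                     v[0] * omega[1] - v[1] * omega[0])
    return max(abs(advective[k] - (grad_half_v2[k] - v_cross_omega[k])) for k in range(3))


print(identity_residual((0.3, -1.2, 0.7)))
```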

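Inserting this into Eq. (13.3), using ∇ · ω = 0, and introducing a new type of time derivative³

$$\frac{D\boldsymbol{\omega}}{Dt} \equiv \frac{\partial\boldsymbol{\omega}}{\partial t} + (\mathbf{v}\cdot\nabla)\boldsymbol{\omega} - (\boldsymbol{\omega}\cdot\nabla)\mathbf{v} = \frac{d\boldsymbol{\omega}}{dt} - (\boldsymbol{\omega}\cdot\nabla)\mathbf{v}, \qquad (13.5)$$

we bring Eq. (13.3) into the following form:

$$\frac{D\boldsymbol{\omega}}{Dt} = -\boldsymbol{\omega}\,\nabla\cdot\mathbf{v} - \frac{\nabla P\times\nabla\rho}{\rho^2} + \nu\nabla^2\boldsymbol{\omega}. \qquad (13.6)$$

³ The combination of spatial derivatives appearing here is called the “Lie derivative” and denoted $\mathcal{L}_{\mathbf{v}}\boldsymbol{\omega} \equiv (\mathbf{v}\cdot\nabla)\boldsymbol{\omega} - (\boldsymbol{\omega}\cdot\nabla)\mathbf{v}$; it is also the “commutator” of v and ω and denoted [v, ω]. It is often encountered in differential geometry.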

This is our favorite form for the vorticity evolution law. In the remainder of this section we shall explore its predictions.

Fig. 13.2: Equation of motion for an infinitesimal vector ∆x connecting two fluid elements. As the fluid elements at P and Q move to P′ and Q′ in a time interval dt, the vector changes by (∆x · ∇)v dt.

The operator D/Dt (defined by Eq. (13.5) when acting on a vector and by D/Dt = d/dt when acting on a scalar) is called the fluid derivative. (The reader should be warned that the notation D/Dt is used in some older texts for the convective derivative d/dt.) The geometrical meaning of the fluid derivative can be understood from Fig. 13.2. Denote by ∆x(t) the vector connecting two points P and Q that are moving with the fluid. Then the convective derivative d∆x/dt must equal the relative velocity of these two points, namely (∆x · ∇)v. In other words, the fluid derivative of ∆x vanishes:

$$\frac{D\Delta\mathbf{x}}{Dt} = 0. \qquad (13.7)$$
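Equation (13.7) can be illustrated by following two nearby fluid elements. The sketch below assumes a hypothetical steady, incompressible cellular flow v = (sin x cos y, −cos x sin y), advects points P and Q with a fourth-order Runge-Kutta integrator, and separately integrates d∆x/dt = (∆x · ∇)v along P's path. The mismatch between Q − P and the integrated ∆x, relative to |∆x|, comes from terms of second order in the separation, so it shrinks as the two elements are brought closer together.

```python
import math


def velocity(x, y):
    """Hypothetical steady, incompressible cellular flow used for illustration."""
    return (math.sin(x) * math.cos(y), -math.cos(x) * math.sin(y))


def velocity_gradient(x, y):
    """Returns (dvx/dx, dvx/dy, dvy/dx, dvy/dy) for the flow above."""
    return (math.cos(x) * math.cos(y), -math.sin(x) * math.sin(y),
            math.sin(x) * math.sin(y), -math.cos(x) * math.cos(y))


def rhs(state):
    """P and Q move with the fluid; D obeys dD/dt = (D . grad) v evaluated at P."""
    px, py, qx, qy, dx, dy = state
    vpx, vpy = velocity(px, py)
    vqx, vqy = velocity(qx, qy)
    dvx_dx, dvx_dy, dvy_dx, dvy_dy = velocity_gradient(px, py)
    return [vpx, vpy, vqx, vqy,
            dx * dvx_dx + dy * dvx_dy,
            dx * dvy_dx + dy * dvy_dy]


def rk4_step(state, dt):
    k1 = rhs(state)
    k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = rhs([s + dt * k for s, k in zip(state, k3)])
    return [s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def separation_mismatch(eps=1e-6, t_end=2.0, n_steps=2000):
    """Relative difference between Q - P and the linearized Delta x at t_end."""
    px0, py0 = 0.3, 0.4
    state = [px0, py0, px0 + eps, py0, eps, 0.0]
    for _ in range(n_steps):
        state = rk4_step(state, t_end / n_steps)
    px, py, qx, qy, dx, dy = state
    err = math.hypot((qx - px) - dx, (qy - py) - dy)
    return err / math.hypot(dx, dy)


print(separation_mismatch())
```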

More generally, the fluid derivative of any vector can be understood as its rate of change relative to a vector moving with the fluid.

In order to understand the vorticity evolution law (13.6) physically, let us consider a barotropic [P = P(ρ)], inviscid (ν = 0) fluid flow. (This is the kind of flow that usually occurs in the Earth’s atmosphere and oceans, well away from solid boundaries.) For barotropic flow ∇P is parallel to ∇ρ, so ∇P × ∇ρ = 0; hence the last two terms in Eq. (13.6) vanish, leaving

$$\frac{D\boldsymbol{\omega}}{Dt} = -\boldsymbol{\omega}\,\nabla\cdot\mathbf{v}. \qquad (13.8)$$

Equation (13.8) says that the vorticity has a fluid derivative parallel to itself. In this sense we can speak of the vortex lines as being frozen into the moving fluid. We can actually make the fluid derivative vanish by substituting ∇ · v = −ρ⁻¹ dρ/dt (the equation of mass conservation) into Eq. (13.8); the result is

$$\frac{D}{Dt}\left(\frac{\boldsymbol{\omega}}{\rho}\right) = 0 \quad \text{for barotropic, inviscid flow.} \qquad (13.9)$$

Fig. 13.3: Simple demonstration of the kinematics of vorticity propagation in a barotropic, inviscid flow. A short, thick cylindrical fluid element with generators parallel to the local vorticity gets deformed, by the flow, into a long, slender cylinder. By virtue of Eq. (13.9), we can think of the vortex lines as being convected with the fluid, with no creation of new lines or destruction of old ones, so that the number of vortex lines passing through the cylinder (through its end surface ∆Σ) remains constant.

Therefore, the quantity ω/ρ evolves according to the same equation as the separation ∆x of two points in the fluid. To see what this implies, consider a small cylindrical fluid element whose symmetry axis is parallel to ω (Fig. 13.3). Denote its vectorial length by ∆x and its vectorial cross-sectional area by ∆Σ. Then, since ω/ρ points along ∆x and both are frozen into the fluid, it must be that ω/ρ = constant × ∆x. Therefore, the fluid element’s conserved mass is ∆M = ρ∆x · ∆Σ = constant × ω · ∆Σ, so ω · ∆Σ is conserved as the cylindrical fluid element moves and deforms. We thereby conclude that the fluid’s vortex lines, with number per unit area directly proportional to |ω|, are convected by our barotropic, inviscid fluid, without having to be created or destroyed.

If the flow is not only barotropic and inviscid, but also incompressible (as it usually is to high accuracy in the oceans and atmosphere), then Eqs. (13.8) and (13.9) say that Dω/Dt = 0. Suppose, in addition, that the flow is 2-dimensional (as it commonly is to moderate accuracy averaged over scales large compared to the thickness of the atmosphere and oceans), so v is in the x and y directions and independent of z. This means that $\boldsymbol{\omega} = \omega\mathbf{e}_z$ and we can regard the vorticity as the scalar ω. Then Eq. (13.5) with (ω · ∇)v = 0 implies that the vorticity obeys the simple propagation law

$$\frac{d\omega}{dt} = 0. \qquad (13.10)$$

Thus, in a 2-dimensional, incompressible, barotropic, inviscid flow, the vorticity is convected conservatively, just like entropy per unit mass in an adiabatic fluid.
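Equations (13.8) and (13.9) can be seen at work in the simplest vortex-stretching flow. The sketch below assumes a hypothetical incompressible straining flow v = (−αx/2, −αy/2, αz) plus a uniform rotation about the z axis. A fluid cylinder aligned with z then lengthens as L ∝ e^{αt}, its cross section shrinks as ∆Σ ∝ e^{−αt}, and Eq. (13.8) with ∇ · v = 0 gives dω/dt = (ω · ∇)v = αω. Integrating the three equations numerically shows ω · ∆Σ and ω/L staying fixed, the frozen-in behavior described above and the mechanism behind the tornado of Sec. 13.2.2.

```python
import math


def stretch_vortex_tube(alpha, omega0, length0, area0, t_end, n_steps=1000):
    """RK4 integration for a fluid cylinder aligned with the z axis in the
    (hypothetical) incompressible flow v = (-alpha*x/2, -alpha*y/2, alpha*z)
    plus a uniform rotation about z.  Returns (omega, length, area) at t_end."""
    def rhs(state):
        omega, length, area = state
        # d(omega)/dt = (omega . grad) v = alpha*omega, from Eq. (13.8) with div v = 0;
        # the length grows and the cross-sectional area shrinks at the same rate alpha.
        return (alpha * omega, alpha * length, -alpha * area)

    dt = t_end / n_steps
    state = (omega0, length0, area0)
    for _ in range(n_steps):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state


omega, length, area = stretch_vortex_tube(alpha=1.0, omega0=1.0, length0=1.0,
                                          area0=1.0, t_end=2.0)
print(omega, math.exp(2.0))           # vorticity amplified by exp(alpha * t)
print(omega * area, omega / length)   # both remain at their initial value
```

Halving the tube's cross section doubles its vorticity, which is why the lengthening updraft in a tornado spins the air up.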

13.2.2 Tornados

A particularly graphic illustration of the behavior of vorticity is provided by a tornado. Tornadoes in North America are most commonly formed at a front where cold, dry air from the north meets warm, moist air from the south, and huge cumulonimbus thunderclouds form. Air is set into counter-clockwise circulatory motion just below the clouds by Coriolis forces.⁴ A low-pressure vortical core is created at the center of this spinning fluid, and there will be an upflow of air which will cause the spinning region to lengthen. Now, consider this in the context of vorticity propagation. As the air is, to first approximation, incompressible, a lengthening of the vortex lines corresponds to a reduction in the cross section and a strengthening of the vorticity. This, in turn, corresponds to an increase in the circulatory speeds found in a tornado. (Speeds in excess of 300 mph have been reported.) If and when the tornado touches down to the ground and its very-low-pressure core passes over the walls and roof of a building, the far larger, normal atmospheric pressure inside the building can cause the building to explode.

⁴ In the southern hemisphere, the Coriolis forces act in the opposite direction, producing clockwise motion.
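The stretching argument can be made quantitative with a minimal sketch. It assumes (our simplification, not the text’s) a straight vortex tube with uniform vorticity across its core, embedded in incompressible, inviscid air, so that both its volume and its vortex-line flux ω · ∆Σ (the circulation Γ = ωπr²) are conserved as it is stretched by a factor s = L/L₀. The function name and starting values are hypothetical.

```python
import math

def stretch_vortex_tube(omega0, r0, stretch):
    """Stretch a uniform-vorticity tube by the factor stretch = L/L0.

    Assumes incompressible, inviscid flow, so the tube's volume (pi r^2 L)
    and its vortex-line flux (omega * pi r^2) are both conserved.
    Returns (omega, radius, edge_speed)."""
    radius = r0 / math.sqrt(stretch)      # fixed volume: r scales as s**(-1/2)
    omega = omega0 * stretch              # fixed flux: omega scales as 1/area, i.e. as s
    edge_speed = 0.5 * omega * radius     # Gamma / (2 pi r) at the edge of a uniform core
    return omega, radius, edge_speed

# Hypothetical parent rotation: omega0 = 0.02 s^-1 over a 1 km radius.
for s in (1.0, 10.0, 100.0):
    w, r, v = stretch_vortex_tube(0.02, 1000.0, s)
    print(f"stretch {s:5.0f}: omega = {w:6.2f} 1/s, r = {r:6.1f} m, edge speed = {v:5.1f} m/s")
```

Because the circulation stays fixed while the radius shrinks as s^(−1/2), the edge speed grows as s^(1/2): in this toy model a hundredfold stretch turns 10 m/s into 100 m/s, roughly 220 mph.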

13.2.3 Kelvin’s Theorem

Intimately related to vorticity is a quantity called the circulation Γ; it is defined as the line integral of the velocity around a closed contour ∂S lying in the fluid,

$$\Gamma \equiv \oint_{\partial S} \mathbf{v}\cdot d\mathbf{x} , \qquad (13.11)$$

and it can be regarded as a property of the closed contour ∂S. We can invoke Stokes’ theorem to convert this circulation into a surface integral of the vorticity passing through a surface S bounded by the same contour:

$$\Gamma = \int_S \boldsymbol{\omega}\cdot d\boldsymbol{\Sigma} . \qquad (13.12)$$
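The equality of (13.11) and (13.12) can be checked numerically for two textbook 2-dimensional flows: solid-body rotation v = (−y, x), whose vorticity is 2 everywhere, and the irrotational vortex v_φ = 1/r. In this sketch the helper names, the chord-midpoint rule for the line integral and the polar midpoint grid for the surface integral are all our own choices.

```python
import math

def circulation(vel, radius, n=2000):
    """Line integral of v . dx counter-clockwise around a circle centred on
    the origin, using each chord and the velocity at its midpoint."""
    total = 0.0
    for k in range(n):
        t0, t1 = 2 * math.pi * k / n, 2 * math.pi * (k + 1) / n
        x0, y0 = radius * math.cos(t0), radius * math.sin(t0)
        x1, y1 = radius * math.cos(t1), radius * math.sin(t1)
        vx, vy = vel(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += vx * (x1 - x0) + vy * (y1 - y0)
    return total

def vorticity_flux(vel, radius, n=100, h=1e-5):
    """Surface integral of omega_z = dv_y/dx - dv_x/dy over the disk, using
    central differences on a midpoint grid in polar coordinates."""
    total = 0.0
    dr, dphi = radius / n, 2 * math.pi / n
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            x, y = r * math.cos((j + 0.5) * dphi), r * math.sin((j + 0.5) * dphi)
            dvy_dx = (vel(x + h, y)[1] - vel(x - h, y)[1]) / (2 * h)
            dvx_dy = (vel(x, y + h)[0] - vel(x, y - h)[0]) / (2 * h)
            total += (dvy_dx - dvx_dy) * r * dr * dphi
    return total

def solid_body(x, y):
    return -y, x                                   # omega_z = 2

def point_vortex(x, y):
    r2 = x * x + y * y
    return -y / r2, x / r2                         # v_phi = 1/r, omega_z = 0 for r > 0

for name, v in (("solid body", solid_body), ("point vortex", point_vortex)):
    print(f"{name}: circulation = {circulation(v, 1.0):.5f}, "
          f"vorticity flux = {vorticity_flux(v, 1.0):.5f}")
```

For solid-body rotation both numbers equal 2π. For the point vortex the circulation is still 2π, but the grid finds no vorticity: all of it sits in a singular line at r = 0 that the smooth integrand never samples, so the region over which the vorticity is smooth is not simply connected and (13.12) does not hold for it, which is the situation covered by the caveat that follows.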

[Note, though, that Eq. (13.12) is only valid if the area bounded by the contour is simply connected; in particular, if the area enclosed contains a solid body, Eq. (13.12) may fail.] Equation (13.12) says that the circulation Γ is the flux of vorticity through S, or equivalently the number of vortex lines passing through S. Circulation is thus the fluid counterpart of magnetic flux.

Kelvin’s theorem tells us the rate of change of the circulation associated with a particular contour ∂S that is attached to the moving fluid. Let us evaluate this directly using the convective derivative of Γ. We do this by differentiating the two vector quantities inside the integral (13.11):

$$\frac{d\Gamma}{dt} = \oint_{\partial S}\frac{d\mathbf{v}}{dt}\cdot d\mathbf{x} + \oint_{\partial S}\mathbf{v}\cdot d\!\left(\frac{d\mathbf{x}}{dt}\right) = -\oint_{\partial S}\frac{\nabla P}{\rho}\cdot d\mathbf{x} - \oint_{\partial S}\nabla\Phi\cdot d\mathbf{x} + \nu\oint_{\partial S}(\nabla^2\mathbf{v})\cdot d\mathbf{x} + \oint_{\partial S} d\!\left(\tfrac{1}{2}v^2\right) , \qquad (13.13)$$

where we have used the Navier-Stokes equation (13.2) with ν = constant, together with d(dx/dt) = dv, so that v · dv = d(½v²). The second and fourth terms on the right hand side of Eq. (13.13) are integrals of exact differentials, so they vanish around a closed curve, and the first can be rewritten in different notation to give

$$\frac{d\Gamma}{dt} = -\oint_{\partial S}\frac{dP}{\rho} + \nu\oint_{\partial S}(\nabla^2\mathbf{v})\cdot d\mathbf{x} . \qquad (13.14)$$

This is Kelvin’s theorem for the evolution of circulation. In a rotating reference frame it must be augmented by the integral of the Coriolis acceleration −2Ω × v around the closed curve ∂S, and if the fluid is electrically conducting and possesses a magnetic field it must be augmented by the integral of the Lorentz force per unit mass J × B/ρ around ∂S. If the fluid is barotropic, P = P(ρ), and the effects of viscosity are negligible (and the coordinates are inertial and there is no magnetic field and electric current), then the right hand side of Eq. (13.14) vanishes, because dP/ρ is then the exact differential of the function ∫dP/ρ, whose integral around a closed curve is zero. Kelvin’s theorem then takes the simple form

$$\frac{d\Gamma}{dt} = 0 \qquad \text{for barotropic, inviscid flow.} \qquad (13.15)$$

This is just the global version of our result that the circulation ω · ∆Σ of an infinitesimal fluid element is conserved. The qualitative content of Kelvin’s theorem is that vorticity in a fluid is long-lived. Once a fluid develops some circulation, this circulation will persist unless and until the fluid can develop some shear stress, either directly through viscous action, or indirectly through the ∇P × ∇ρ term or a Coriolis or Lorentz force term in Eq. (13.3).
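Kelvin’s theorem (13.15) can be illustrated numerically by carrying a closed loop of fluid particles through a steady inviscid flow and recomputing its circulation. The flow below is an assumed example, not one from the text: the cellular flow with stream function sin x sin y, whose vorticity 2 sin x sin y depends on the stream function alone, which makes it an exact steady solution of the incompressible, inviscid equations. The particle tracking (fourth-order Runge-Kutta), the chord-midpoint circulation estimate and all parameter values are our own choices.

```python
import math

def cellular_velocity(x, y):
    """Steady 2-D cellular flow with stream function sin(x) sin(y) (an assumed
    example). Its vorticity, 2 sin(x) sin(y), is a function of the stream
    function alone, so it solves the incompressible, inviscid equations."""
    return math.sin(x) * math.cos(y), -math.cos(x) * math.sin(y)

def loop_circulation(points):
    """Midpoint-rule estimate of the line integral of v . dx around a closed polygon."""
    total = 0.0
    for k in range(len(points)):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % len(points)]
        vx, vy = cellular_velocity(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += vx * (x1 - x0) + vy * (y1 - y0)
    return total

def rk4_step(x, y, dt):
    """Move one fluid particle forward by dt with fourth-order Runge-Kutta."""
    k1 = cellular_velocity(x, y)
    k2 = cellular_velocity(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
    k3 = cellular_velocity(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
    k4 = cellular_velocity(x + dt * k3[0], y + dt * k3[1])
    return (x + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            y + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)

def material_loop_circulation(center, radius, n_points=1000, t_final=2.0, dt=0.01):
    """Carry a circle of fluid particles with the flow; return its circulation
    at t = 0 and at t = t_final."""
    cx, cy = center
    loop = [(cx + radius * math.cos(2 * math.pi * k / n_points),
             cy + radius * math.sin(2 * math.pi * k / n_points))
            for k in range(n_points)]
    gamma_start = loop_circulation(loop)
    for _ in range(round(t_final / dt)):
        loop = [rk4_step(x, y, dt) for x, y in loop]
    return gamma_start, loop_circulation(loop)

g0, g1 = material_loop_circulation(center=(1.0, 1.0), radius=0.4)
print(f"Gamma(0) = {g0:.6f}, Gamma(2) = {g1:.6f}")
```

The loop is visibly sheared by the differential rotation of the cell, yet its circulation is unchanged to within the discretization error. Advecting the same loop with a velocity field that is not a solution of the inviscid equations would generally let Γ drift, since the material acceleration would then no longer be a pure gradient.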

13.2.4 Diffusion of Vortex Lines

Next, consider the action of viscous stresses on an existing vorticity field. For an incompressible, barotropic fluid with nonnegligible viscosity, the vorticity evolution law (13.6) says

$$\frac{D\boldsymbol{\omega}}{Dt} = \nu\nabla^2\boldsymbol{\omega} \qquad \text{for incompressible, barotropic fluid.} \qquad (13.16)$$

This is a “convective” vectorial diffusion equation: the viscous term ν∇²ω causes the vortex lines to diffuse through the moving fluid. When viscosity is negligible, the vortex lines are frozen into the flow. When the shear viscous stress is large, vorticity will spread away from its source to occupy an area ∼ νt in the moving fluid after a time t. The kinematic viscosity, therefore, not only has the dimensions of a diffusion coefficient, it actually controls the diffusion of vortex lines relative to the moving fluid.

As a simple example of the spreading of vortex lines, consider an infinite plate moving parallel to itself relative to a fluid at rest. Let us transform to the frame of the plate [Fig. 13.4(a)] so the fluid moves past it. Suppose that at time t = 0 the velocity has only a component v_x parallel to the plate, which depends solely on the distance y from the plate, and suppose further that v_x is constant away from the plate, but in a thin boundary layer along the plate it decreases to 0 at y = 0 (as it must, because of the plate’s “no-slip” boundary condition); cf. Fig. 13.4(b). As the flow is a function only of y (and t), and v and ω point in directions orthogonal to e_y, the fluid derivative (13.5) in our situation reduces to a partial derivative, and the convective diffusion equation (13.16) becomes an ordinary diffusion equation, ∂ω/∂t = ν∇²ω. Let the initial thickness of the boundary layer be δ(0). Then this diffusion equation says that the vorticity will diffuse through the fluid under the action of viscous stress, and as a result the boundary-layer thickness will increase with time as

$$\delta(t) \sim (\nu t)^{1/2} \qquad \text{for } t \gtrsim \delta(0)^2/\nu . \qquad (13.17)$$
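The √(νt) growth in (13.17) can be checked with a short explicit finite-difference sketch. It assumes an impulsively started plate, so that δ(0) is only one grid cell, and it measures the layer thickness as the height at which v_x reaches 0.9V; that thickness measure, the function name and all numerical parameters are our own choices. For this impulsive start the exact solution is v_x = V erf[y/(2√(νt))], which puts the 0.9V point at y ≈ 2.33√(νt).

```python
import math

def shear_layer_thickness(nu=1.0, V=1.0, length=6.0, ny=200, times=(0.25, 1.0)):
    """Solve dv_x/dt = nu d^2 v_x/dy^2 with v_x(0) = 0 (no slip) and v_x = V at
    y = length, starting from v_x = V everywhere above the wall (an impulsively
    started plate). Returns the height where v_x first reaches 0.9 V at each
    requested time (in increasing order of time)."""
    dy = length / ny
    dt = 0.4 * dy * dy / nu               # inside the explicit-scheme limit dy^2/(2 nu)
    v = [0.0] + [V] * ny
    t, heights = 0.0, []
    for t_target in sorted(times):
        while t < t_target - 1e-12:
            step = min(dt, t_target - t)
            lam = nu * step / (dy * dy)
            v = ([v[0]]
                 + [v[j] + lam * (v[j + 1] - 2 * v[j] + v[j - 1]) for j in range(1, ny)]
                 + [v[ny]])
            t += step
        for j in range(1, ny + 1):        # linear interpolation to the 0.9 V point
            if v[j] >= 0.9 * V:
                heights.append((j - 1 + (0.9 * V - v[j - 1]) / (v[j] - v[j - 1])) * dy)
                break
    return heights

d_early, d_late = shear_layer_thickness()
print(f"thickness ratio = {d_late / d_early:.3f} (square root of the time ratio is 2)")
print(f"late thickness = {d_late:.3f}, erf solution gives {2 * 1.1631 * math.sqrt(1.0):.3f}")
```

Quadrupling the elapsed time doubles the thickness, as (13.17) predicts once t ≫ δ(0)²/ν.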

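Fig. 13.4: A simple shear layer in which the vortex-line freezing term ∇ × (v × ω) vanishes. (a) In the plate’s frame, the fluid flows past the plate along x, with y measured perpendicular to the plate. (b) The velocity profile v_x(y) at t = 0, with boundary-layer thickness δ(0). (c) The profile at t ≫ δ(0)²/ν, when δ(t) ∼ (νt)^{1/2}. Vorticity will diffuse away from the midplane under the action of viscous torques in much the same way that heat diffuses away from a heated surface.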

13.2.5 Sources of Vorticity

Having discussed how vorticity is conserved in simple inviscid flows and how it diffuses away under the action of viscosity, we must now consider its sources. The most important source is a solid surface. When fluid suddenly encounters a solid surface like the leading edge of an airplane wing, intermolecular forces act to decelerate the fluid very rapidly in a thin boundary layer along the surface. This introduces circulation and consequently vorticity into the flow, where none existed before; that vorticity then diffuses into the bulk flow, thickening the boundary layer (Sec. 13.4 below).

If the fluid is non-barotropic, then pressure gradients can also create vorticity, as described by the second term on the right hand side of our vorticity evolution law (13.6). Physically, what happens is that when the isobars do not coincide with the contours of constant density, known as isochors, the net force on a small element of fluid does not pass through its center of mass, so it exerts a torque on the element, introducing some rotational motion and vorticity (see Fig. 13.5). Non-barotropic pressure gradients can therefore create vorticity within the body of the fluid. Note that, as vortex lines must be continuous, any fresh ones that are created within the fluid must be created as loops that expand from a point or a line.

There are three other common sources of vorticity in fluid dynamics: Coriolis forces (when one’s reference frame is rotating rigidly), curving shock fronts (when the speed is supersonic), and Lorentz forces (when the fluid is electrically conducting). We shall discuss these in Chaps. 15, 16 and 18, respectively.

****************************

Fig. 13.5: Mechanical explanation for the creation of vorticity in a non-barotropic fluid. The net pressure-gradient force F acting on a small fluid element is normal to the isobars (P = const, solid lines), which are not parallel to the isochors (ρ = const); F therefore does not pass through the center of mass of the element, and a torque is produced.
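The torque in Fig. 13.5 corresponds to the curl of the pressure acceleration, ∇ × (−∇P/ρ) = ∇ρ × ∇P/ρ² (equivalently −∇P × ∇ρ/ρ²), the non-barotropic source term in the vorticity evolution law. A minimal sketch evaluates it from given local gradients; the helper names and the gradient values are hypothetical, chosen only to mimic a weak horizontal density contrast across a hydrostatic, vertical pressure gradient in air.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def baroclinic_source(grad_p, grad_rho, rho):
    """Vorticity generation rate grad(rho) x grad(P) / rho^2, i.e. the curl of
    -grad(P)/rho; it vanishes whenever isobars and isochors coincide."""
    c = cross(grad_rho, grad_p)
    return tuple(ci / rho**2 for ci in c)

# Hypothetical values in SI units, with z pointing up.
grad_p = (0.0, 0.0, -12.0)     # Pa/m, roughly -rho g for air
rho = 1.2                      # kg/m^3
print(baroclinic_source(grad_p, (1e-5, 0.0, 0.0), rho))   # tilted isochors: nonzero y component
print(baroclinic_source(grad_p, (0.0, 0.0, -1e-4), rho))  # parallel gradients: zero
```

Only the component of ∇ρ perpendicular to ∇P matters, which is the vector statement of the geometrical argument in Fig. 13.5.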

EXERCISES

Exercise 13.1 Practice: Vorticity and incompressibility
Sketch the streamlines for the following stationary two-dimensional flows, determine if the fluid is compressible, and evaluate its vorticity. The coordinates are Cartesian in (i) and (ii), and are circular polar with orthonormal bases {e_r, e_φ} in (iii) and (iv).
(i) v_x = 2xy, v_y = x².
(ii) v_x = x², v_y = −2xy.
(iii) v_r = 0, v_φ = r.
(iv) v_r = 0, v_φ = r⁻¹.

Exercise 13.2 ***Example: Joukowski’s Theorem
When an appropriately curved airfoil is introduced into a steady flow of air, the air has to flow faster along the upper surface than along the lower surface, and this can create a lifting force (Fig. 13.6). In this situation, compressibility and gravity are usually unimportant for the flow. Show that the pressure difference across the airfoil is given approximately by

$$\Delta P = \tfrac{1}{2}\rho\,\Delta(v^2) = \rho v\,\Delta v .$$

Hence show that the lift exerted by the air on an airfoil of length L is given approximately by

$$F_L = L\int \Delta P\, dx = \rho v L \Gamma ,$$

where Γ is the circulation around the airfoil. This is known as Joukowski’s theorem. Interpret this result in terms of the conservation of linear momentum, and sketch the overall flow pattern.

Fig. 13.6: Flow around an airfoil.

Exercise 13.3 ***Example: Rotating Superfluids
Certain fluids at low temperature undergo a phase transition to a superfluid state. A good example is ⁴He, for which the transition temperature is 2.2 K. As a superfluid has no viscosity, it cannot develop vorticity. How then can it rotate? The answer (e.g. Feynman 1972) is that not all the fluid is in a superfluid state; some of it is normal and can have vorticity. When the fluid rotates, all the vorticity is concentrated within microscopic vortex cores of normal fluid that are parallel to the rotation axis and have quantized circulations Γ = h/m, where m is the mass of the atoms and h is Planck’s constant. The fluid external to these vortex cores is irrotational. These normal-fluid vortices may interact with the walls of the container.
(i) Explain, using a diagram, how the vorticity of the macroscopic velocity field, averaged over many vortex cores, is twice the mean angular velocity of the fluid.
(ii) Make an order-of-magnitude estimate of the spacing between these vortex cores in a beaker of superfluid helium on a turntable rotating at 10 rpm.
(iii) Repeat this estimate for a millisecond neutron star, which mostly comprises superfluid neutron pairs at the density of nuclear matter and spins with a period of order a millisecond. (The mass of the star is roughly 3 × 10³⁰ kg.)

****************************

13.3 Low Reynolds Number Flow – Stokes Flow, Sedimentation, and Climate Change

In the last chapter, we defined the Reynolds number, R, to be the ratio of the product of the characteristic speed and lengthscale of the flow to its kinematic viscosity. The significance of the Reynolds number follows from the fact that in the Navier-Stokes equation (13.2), the ratio of the magnitude of the inertial term |(v · ∇)v| to that of the viscous acceleration |ν∇²v| is approximately equal to R. Therefore, when R ≪ 1, the inertial acceleration can often be ignored, and the velocity field is determined by balancing the pressure gradient against the viscous stress. The velocity will then scale linearly with the magnitude of the pressure gradient and will vanish when the pressure gradient vanishes. This has the amusing consequence that a low Reynolds number flow driven by a solid object moving through a fluid at rest is effectively reversible: if the motion of the object is reversed, then the fluid elements will return almost to their original positions. From the magnitudes of viscosities of real fluids [Table 12.3 (on page 12.36)], it follows that the low Reynolds number limit is appropriate either for very small-scale flows (e.g. the motion of micro-organisms) or for very viscous fluids (e.g. the earth’s mantle).

One important example of a small-scale flow arises in the issue of the degree to which cooling of the Earth due to volcanic explosions can mitigate global warming. The context is concern about anthropogenic (man-made) climate change. The Earth’s atmosphere is a subtle and fragile protector of the environment that allows life to flourish. Recent attention has focused on the increase in atmospheric carbon dioxide, by nearly 25% over the past fifty years, to a mass of 3 × 10¹⁵ kg. As an important greenhouse gas, carbon dioxide traps the infrared radiation emitted by the earth’s surface. Increases in its concentration are contributing to the observed increase in mean surface temperature, the rise of sea levels, and the release of oceanic carbon dioxide, with potential runaway consequences. These effects are partially mitigated by volcanos like Krakatoa, which exploded in 1883, releasing roughly 200 megatons or ∼ 10¹⁸ J of energy and nearly 10¹⁴ kg of aerosol particles (soot etc.), of which ∼ 10¹² kg was raised into the stratosphere, where it remained for several years. These micron-sized particles absorb light with roughly their geometrical cross section. As the area of the earth’s surface is roughly 5 × 10¹⁴ m², and the density of the particles is roughly 2000 kg m⁻³, a mass M of about 10¹² kg (the amount ejected into the stratosphere by Krakatoa) is sufficient to blot out the sun.
More specifically, the micron-sized particles absorb solar optical and ultraviolet radiation while remaining reasonably transparent to infrared radiation escaping from the earth’s surface. The result is a noticeable global cooling of the earth for as long as the soot remains suspended in the atmosphere.⁵ A key issue in assessing how our environment is likely to change over the next century is how long the small particles of soot etc. will remain in the atmosphere after volcanic explosions, i.e. their rate of sedimentation. This is a problem in low Reynolds number flow. We shall model the sedimentation by computing the speed at which a spherical soot particle falls through quiescent air when the Reynolds number is small. The speed is governed by a balance between the downward force of gravity and the speed-dependent upward drag force of the air. We shall compute this speed by first evaluating the force of the air on the moving particle ignoring gravity, and then, at the end of the calculation, inserting the influence of gravity.
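The estimate that ∼ 10¹² kg of micron-sized particles can blot out the sun follows from counting cross sections: N = M/(4πa³ρ/3) spheres present a total geometrical area Nπa² = 3M/(4aρ). A short sketch, assuming particles of radius a = 1 µm that each block light over exactly their geometrical cross section with no overlap, and taking the Earth’s surface area as 4πR⊕² with R⊕ ≈ 6.371 × 10⁶ m; the function name is ours.

```python
import math

def aerosol_cover_fraction(mass, radius, density, area):
    """Total geometrical cross section of `mass` kg of identical spheres,
    divided by `area`; algebraically 3 M / (4 a rho A)."""
    n_particles = mass / (4.0 / 3.0 * math.pi * radius**3 * density)
    return n_particles * math.pi * radius**2 / area

earth_area = 4 * math.pi * 6.371e6**2            # about 5.1e14 m^2
frac = aerosol_cover_fraction(1e12, 1e-6, 2000.0, earth_area)
print(f"covering factor = {frac:.2f}")
```

The covering factor comes out of order unity (about 0.7), which is what “sufficient to blot out the sun” means at this level of estimate.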

13.3.1 Stokes Flow

We model the soot particle as a sphere of radius a. The flow of a viscous fluid past such a sphere is known as Stokes flow. We will calculate this flow’s velocity field, and then from it, the force of the fluid on the sphere. (This calculation also finds application in the famous Millikan oil drop experiment.)

⁵ Similar effects would follow the explosion of nuclear weapons in a major nuclear war according to Turco et al (1983), a phenomenon that they dubbed “nuclear winter”.

It is easiest to tackle this problem in the frame of the sphere and to seek a solution in which the flow velocity v(x) tends to a constant value V (the velocity of the sphere through the fluid) at large distances from the sphere’s center. The asymptotic velocity V is presumed to be highly subsonic, and so the flow is effectively incompressible, ∇ · v = 0. It is also stationary, ∂v/∂t = 0, by virtue of the stationarity of the sphere and the distant flow.

Fig. 13.7: Stokes flow around a sphere.

We define the Reynolds number for this flow by R = ρVa/η = Va/ν. As this is, by assumption, small, we can ignore the inertial term, which is O(V∆v/a) in the Navier-Stokes Eq. (13.2), in comparison with the viscous term, which is O(η∆v/(ρa²)); here ∆v ∼ V is the total velocity variation. The Navier-Stokes equation (13.2) can thus be well approximated by

$$\nabla P = \eta\nabla^2\mathbf{v} . \qquad (13.18)$$

The full details of the flow are governed by this force-balance equation, the flow’s incompressibility

$$\nabla\cdot\mathbf{v} = 0 , \qquad (13.19)$$

and the boundary conditions v = 0 at r = a and v → V as r → ∞.

From force balance (13.18) we infer that in order of magnitude the difference between the fluid’s pressure on the front of the sphere and that on the back is ∆P ∼ ηV/a. We also expect a viscous drag stress along the sphere’s sides of magnitude T_rθ ∼ ηV/a, where V/a is the magnitude of the shear. These two stresses, acting on the sphere’s surface area ∼ a², will produce a net drag force F ∼ ηVa. Our goal is to verify this order-of-magnitude estimate, compute the force more accurately, then balance this force against gravity and thereby infer the speed of fall V of a soot particle.

For a highly accurate analysis of the flow, we could write the full solution as a perturbation expansion in powers of the Reynolds number R. We shall compute only the leading term in this expansion; the next term, which corrects for inertial effects, will be smaller than our solution by a factor O(R). Our solution to this classic problem is based on some general principles that ought to be familiar from other areas of physics. First, we observe that the quantities in which we are interested are the pressure P, the velocity v and the vorticity ω: a scalar, a polar vector and an axial vector, respectively. The only scalar we can form, linear in V, is V · x, and we expect the variable part of the pressure to be proportional to this combination. For the velocity we have two choices, a part ∝ V and a part ∝ (V · x)x, and both terms are present. Finally, for the vorticity, our only option is a term ∝ V × x.

Now take the divergence of Eq. (13.18); since ∇ · (∇²v) = ∇²(∇ · v) = 0, we conclude that the pressure must satisfy Laplace’s equation, ∇²P = 0. The solution should be axisymmetric about V, and we know that axisymmetric solutions to Laplace’s equation that decay as r → ∞ can be expanded as a sum over Legendre polynomials, $\sum_{\ell=0}^{\infty} P_\ell(\mu)/r^{\ell+1}$, where µ is the cosine of the angle θ between V and x, and r is |x|. The dipolar, ℓ = 1 term [for which P₁(µ) = µ = V · x/(Vr)] is all we need, and so we write

$$P = P_\infty + \frac{k\eta\,(\mathbf{V}\cdot\mathbf{x})\,a}{r^3} + \dots . \qquad (13.20)$$

Here k is a numerical constant which we must determine, we have introduced the factors η and a to make k dimensionless, and P∞ is the pressure far from the sphere. Next consider the vorticity, which we can write in the form

$$\boldsymbol{\omega} = \frac{\mathbf{V}\times\mathbf{x}}{a^2}\, f(r/a) . \qquad (13.21)$$

The factor a² appears in the denominator to make the unknown function f(ξ) dimensionless. We determine this unknown function by using ∇ · v = 0 and the identity ∇²v = ∇(∇ · v) − ∇ × ω to rewrite Eq. (13.18) in the form

$$\nabla P = -\eta\nabla\times\boldsymbol{\omega} , \qquad (13.22)$$

and substituting Eq. (13.21) to obtain f(ξ) = kξ⁻³, whence

$$\boldsymbol{\omega} = \frac{k\,(\mathbf{V}\times\mathbf{x})\,a}{r^3} . \qquad (13.23)$$

Now, Eq. (13.23) for the vorticity looks familiar. It has the form of the Biot-Savart law for the magnetic field from a current element. We can therefore write down immediately a formula for its associated “vector potential”, which in this case is the velocity

$$\mathbf{v}(\mathbf{x}) = \frac{ka\mathbf{V}}{r} + \nabla\psi . \qquad (13.24)$$

The addition of the ∇ψ term corresponds to the familiar gauge freedom in defining the vector potential. However, in fluid dynamics, where the velocity is a directly observable quantity, the choice of the scalar ψ is fixed by the boundary conditions instead of being free. As ψ is a scalar, it must be expressible in terms of a second dimensionless function g(ξ) as

$$\psi = g(r/a)\,\mathbf{V}\cdot\mathbf{x} . \qquad (13.25)$$

Next we recall that the flow is incompressible, i.e. ∇ · v = 0. Substituting Eq. (13.25) into Eq. (13.24) and setting the divergence, expressed in spherical polar coordinates, to zero, we obtain an ordinary differential equation for g:

$$\frac{d^2g}{d\xi^2} + \frac{4}{\xi}\frac{dg}{d\xi} - \frac{k}{\xi^3} = 0 . \qquad (13.26)$$

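This has the solution

$$g(\xi) = A - \frac{k}{2\xi} + \frac{B}{\xi^3} , \qquad (13.27)$$

where A and B are integration constants. As v → V far from the sphere, the constant A = 1. The constants B and k can be found by imposing the boundary condition v = 0 at r = a. We thereby obtain B = −1/4, k = −3/2, and after substituting into Eq. (13.24) we obtain for the velocity field

$$\mathbf{v} = \left[1 - \frac{3}{4}\frac{a}{r} - \frac{1}{4}\left(\frac{a}{r}\right)^3\right]\mathbf{V} - \frac{3}{4}\left(\frac{a}{r}\right)^3\left[1 - \left(\frac{a}{r}\right)^2\right]\frac{(\mathbf{V}\cdot\mathbf{x})\,\mathbf{x}}{a^2} . \qquad (13.28)$$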
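Equation (13.28) is easy to check numerically. The sketch below, whose helper names are hypothetical, evaluates it and confirms the no-slip condition on r = a, the incompressibility condition (13.19) by central differences, and the slow approach to V far away; along the axis, v_z = V[1 − 3a/(2r) + a³/(2r³)], so the velocity deficit decays only as 1/r.

```python
import math

def stokes_velocity(x, V, a):
    """Velocity field (13.28) of Stokes flow past a sphere of radius a, in the
    sphere's frame; x and V are 3-sequences, V being the velocity far away."""
    r = math.sqrt(sum(c * c for c in x))
    s = a / r
    v_dot_x = sum(vi * xi for vi, xi in zip(V, x))
    c1 = 1.0 - 0.75 * s - 0.25 * s**3          # coefficient of V
    c2 = 0.75 * s**3 * (1.0 - s**2) / a**2      # coefficient of (V . x) x
    return tuple(c1 * vi - c2 * v_dot_x * xi for vi, xi in zip(V, x))

def divergence(field, x, h=1e-5):
    """Central-difference estimate of div(field) at the point x."""
    total = 0.0
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        total += (field(xp)[i] - field(xm)[i]) / (2 * h)
    return total

V, a = (0.0, 0.0, 1.0), 1.0
on_sphere = [(a * math.sin(t), 0.0, a * math.cos(t)) for t in (0.3, 1.2, 2.5)]
print(max(abs(c) for p in on_sphere for c in stokes_velocity(p, V, a)))   # no slip
print(divergence(lambda p: stokes_velocity(p, V, a), (0.7, -0.4, 1.1)))   # incompressible
print(stokes_velocity((0.0, 0.0, 1.0e6), V, a)[2])                       # about 1 - 1.5e-6
```

The first two printed numbers are zero to rounding error; the third shows that even a million radii away the flow still differs from V at the 10⁻⁶ level.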

The associated pressure and vorticity are given by

$$P = P_\infty - \frac{3\eta a\,(\mathbf{V}\cdot\mathbf{x})}{2r^3} , \qquad \boldsymbol{\omega} = \frac{3a\,(\mathbf{x}\times\mathbf{V})}{2r^3} . \qquad (13.29)$$

The pressure is seen to be largest on the upstream hemisphere, as expected. However, the vorticity, which points in the direction of e_φ, is seen to be symmetric between the front and the back of the sphere. This is because, under our low Reynolds number approximation, we are neglecting the convection of vorticity by the velocity field and only retaining the diffusive term. Vorticity is generated on the front surface of the sphere at a rate $\nu\,\omega_{\phi,r} = 3\nu V\sin\theta/a^2$ per unit length and diffuses into the surrounding flow; then, after the flow passes the sphere’s equator, the vorticity diffuses back inward and is absorbed onto the sphere’s back face, according to (13.28). An analysis that includes higher orders in the Reynolds number would show that not all of the vorticity is reabsorbed; some is left in the fluid, downstream from the sphere.

We have been able to obtain a simple solution for low Reynolds number flow past a sphere. Although closed-form solutions like this are not so common, the methods that were used to derive it are of widespread applicability. Let us recall them. First, we approximated the equation of motion by omitting the sub-dominant inertial term and invoked a symmetry argument. We used our knowledge of elementary electrostatics to write the pressure in the form (13.20). We then invoked a second symmetry argument to solve for the vorticity and drew upon another analogy with electromagnetic theory to derive a differential equation for the velocity field, which was solved subject to the no-slip boundary condition on the surface of the sphere.

Having obtained a solution for the velocity field and pressure, it is instructive to reexamine our approximations. The first point to notice is that the velocity perturbation, given by Eq. (13.28), dies off slowly, inversely proportional to distance r from the sphere. This implies that the region through which the sphere is moving must be much larger than the sphere; otherwise the boundary conditions at r → ∞ have to be modified. This is not a concern for a soot particle in the atmosphere. A second, related point is that, if we compare the sizes of the inertial term (which we neglected) and the pressure gradient (which we kept) in the full Navier-Stokes equation, we find

$$\left|\frac{\nabla P}{\rho}\right| \sim \frac{\eta a V}{\rho r^3} , \qquad |(\mathbf{v}\cdot\nabla)\mathbf{v}| \sim \frac{V^2 a}{r^2} . \qquad (13.30)$$

Evidently, the inertial term becomes significant at a distance r ≳ η/(ρV) ∼ a/R from the sphere. In fact, in order to improve upon our zero-order solution we must perform a second expansion at large r including inertial effects and then match it asymptotically to a near-zone expansion. This technique of matched asymptotic expansions is a very powerful and general way of finding approximate solutions valid over a wide range of length scales where the dominant physics changes from one scale to the next.

Let us return to the problem which motivated this calculation, estimating the drag force on the sphere. This can be computed by integrating the stress tensor T = Pg − 2ησ over the surface of the sphere. If we introduce a local orthonormal basis e_r, e_θ, e_φ, we readily see that the only non-zero viscous contribution to the surface stress tensor is T_rθ = T_θr = η∂v_θ/∂r. The net resistive force along the direction of the velocity is then given by

$$F = \int_{r=a} d\boldsymbol{\Sigma}\cdot\mathbf{T}\cdot\frac{\mathbf{V}}{V} = -\int_0^{\pi}\left(-P_\infty\cos\theta + \frac{3\eta V\cos^2\theta}{2a} + \frac{3\eta V\sin^2\theta}{2a}\right) 2\pi a^2\sin\theta\, d\theta . \qquad (13.31)$$

The integrals are elementary; they give for the total resistive force

$$F = -6\pi\eta a V . \qquad (13.32)$$

This is Stokes’ Law for the drag force in low Reynolds number flow. Two-thirds of the force comes from the viscous stress and one third from the pressure.
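The elementary integrals in (13.31), and the two-thirds/one-third split between viscous stress and pressure, can be confirmed with Simpson’s rule. In this sketch (the function name is ours) the “pressure” part collects the P∞ and cos²θ terms and the “viscous” part is the sin²θ term; P∞ is given an arbitrary nonzero value to show that it drops out.

```python
import math

def stokes_drag_parts(eta, a, V, p_inf, n=2000):
    """Evaluate the integral (13.31) with Simpson's rule (n must be even).
    Returns (pressure part, viscous part); their sum should be -6 pi eta a V."""
    h = math.pi / n

    def simpson(g):
        total = g(0.0) + g(math.pi)
        for k in range(1, n):
            total += (4 if k % 2 else 2) * g(k * h)
        return total * h / 3

    def ring_area(th):                       # dSigma per unit dtheta on r = a
        return 2 * math.pi * a**2 * math.sin(th)

    pressure = -simpson(lambda th: (-p_inf * math.cos(th)
                                    + 1.5 * eta * V * math.cos(th)**2 / a) * ring_area(th))
    viscous = -simpson(lambda th: 1.5 * eta * V * math.sin(th)**2 / a * ring_area(th))
    return pressure, viscous

p, v = stokes_drag_parts(eta=1.0, a=1.0, V=1.0, p_inf=1.0e5)
print(f"total = {p + v:.8f} (expected {-6 * math.pi:.8f}), viscous share = {v / (p + v):.6f}")
```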

13.3.2 Sedimentation Rate

In order to compute the sedimentation rate of soot particles relative to the air we must restore gravity to our analysis. We can do so by regarding ∇P throughout the above analysis as that portion of the gradient of P which is not balancing the gravitational force on the fluid. Having done so, we obtain precisely the same flow in the (spherical) soot particle’s reference frame as we obtained in the absence of gravity, and precisely the same viscous force on the particle’s surface. This viscous force 6πηaV must balance the gravitational force on the particle, 4πρ_s a³g_e/3, where ρ_s ∼ 2000 kg m⁻³ is the density of soot. Hence,

$$V = \frac{2\rho_s a^2 g_e}{9\eta} . \qquad (13.33)$$

Now, the kinematic viscosity of air at sea level is, according to Table 12.3, ν ∼ 10⁻⁵ m² s⁻¹ and the density is ρ_a ∼ 1 kg m⁻³, so the coefficient of viscosity is η = ρ_a ν ∼ 10⁻⁵ kg m⁻¹ s⁻¹. This viscosity is proportional to the square root of temperature and independent of the density [cf. Eq. (12.68)]; however, the temperature does not vary by more than about 25 per cent through the atmosphere, so for order-of-magnitude calculations we can use its value at sea level. Substituting the above values into (13.33), we obtain an equilibrium velocity of

$$V \sim 0.5\left(\frac{a}{1\,\mu\mathrm{m}}\right)^2\ \mathrm{mm\,s^{-1}} . \qquad (13.34)$$

We should also, for self-consistency, estimate the Reynolds number; it is

$$R \sim \frac{2aV}{\nu} \sim 10^{-4}\left(\frac{a}{1\,\mu\mathrm{m}}\right)^3 . \qquad (13.35)$$
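The chain of estimates (13.33) to (13.35), together with the settling time from a height of ∼ 30 km that is estimated next, can be reproduced in a few lines. The sketch uses the order-of-magnitude inputs quoted in the text (ρ_s = 2000 kg m⁻³, ν = 10⁻⁵ m² s⁻¹, ρ_a = 1 kg m⁻³) plus g_e = 9.8 m s⁻²; the function and constant names are ours.

```python
G_E = 9.8            # m s^-2
RHO_SOOT = 2000.0    # kg m^-3, value used in the text
NU_AIR = 1e-5        # m^2 s^-1, order-of-magnitude value used in the text
RHO_AIR = 1.0        # kg m^-3
ETA_AIR = RHO_AIR * NU_AIR
YEAR = 3.156e7       # s

def stokes_settling_speed(a):
    """Terminal speed (13.33) of a sphere of radius a [m]: 6 pi eta a V = (4/3) pi rho_s a^3 g_e."""
    return 2 * RHO_SOOT * a**2 * G_E / (9 * ETA_AIR)

def reynolds_number(a):
    """R = (diameter)(speed)/nu, as in (13.35)."""
    return 2 * a * stokes_settling_speed(a) / NU_AIR

def settling_time(a, height=30e3):
    """Time to fall `height` metres at the terminal speed."""
    return height / stokes_settling_speed(a)

for a_um in (0.3, 1.0, 10.0):
    a = a_um * 1e-6
    print(f"a = {a_um:4} um: V = {1e3 * stokes_settling_speed(a):.3g} mm/s, "
          f"R = {reynolds_number(a):.2g}, fall time = {settling_time(a) / YEAR:.3g} yr")
```

For a = 1 µm this gives V ≈ 0.44 mm s⁻¹, R ≈ 9 × 10⁻⁵ and a fall time of about 7 × 10⁷ s, i.e. roughly two years; because R grows as a³, it reaches ∼ 0.1 at a = 10 µm.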

Our analysis is therefore only likely to be adequate for particles of radius a ≲ 10 µm. The drift velocity (13.34) is much smaller than wind speeds in the upper atmosphere, v_wind ∼ 30 m s⁻¹. However, as the stratosphere is reasonably stratified, the net vertical motion due to the winds is quite small, and so we can estimate the settling time by dividing the height ∼ 30 km by the speed (13.34) to obtain

$$t_{\rm settle} \sim 6\times10^{7}\left(\frac{a}{1\,\mu\mathrm{m}}\right)^{-2}\ \mathrm{s} \sim 2\left(\frac{a}{1\,\mu\mathrm{m}}\right)^{-2}\ \text{years} . \qquad (13.36)$$

This calculation is a simple model for more serious and complex analyses of sedimentation after volcanic eruptions, and the resulting mitigation of global warming. Of course, huge volcanic eruptions are rare, so no matter the result of reliable future analyses, we cannot count on volcanos to save humanity from runaway global warming.

****************************

EXERCISES

Exercise 13.4 Problem: Oseen’s Paradox
Consider low Reynolds number flow past an infinite cylinder and try to repeat the analysis we used for a sphere to obtain an order-of-magnitude estimate for the drag force per unit length. Do you see any difficulty, especially concerning the analog of Eq. (13.30)?

Exercise 13.5 Problem: Viscosity of the Earth’s mantle
Episodic glaciation subjects the earth’s crust to loading and unloading by ice. The last major ice age was 10,000 years ago, and the subsequent unloading produces a non-tidal contribution to the acceleration of the earth’s rotation rate of order

$$\frac{|\boldsymbol{\Omega}|}{|\dot{\boldsymbol{\Omega}}|} \simeq 6\times10^{11}\ \mathrm{yr} ,$$

detectable from observing the positions of distant stars. Corresponding changes in the earth’s oblateness produce a decrease in the rate of nodal line regression of the geodetic satellite LAGEOS. Estimate the speed with which the polar regions (treated as spherical caps of radius ∼ 1000 km) are rebounding now. Do you think the speed was much greater in the past? Geological evidence suggests that a particular glaciated region of radius about 1000 km sank in ∼ 3000 yr during the last ice age. By treating this as a low Reynolds number viscous flow, make an estimate of the coefficient of viscosity for the mantle.

****************************

Fig. 13.8: Laminar boundary layer of thickness δ(x) formed by a long, thin plate in a flow with asymptotic speed V. The length ℓ of the plate must give a Reynolds number R_ℓ ≡ Vℓ/ν in the range 10 ≲ R_ℓ ≲ 10⁶; if R_ℓ is much less than 10, the plate will be in or near the regime of low Reynolds number flow (Sec. 13.3 above), and the boundary layer will be so thick everywhere that our analysis will fail. If R_ℓ is much larger than 10⁶, then at sufficiently great distances x down the plate (R_x = Vx/ν ≳ 10⁶), the boundary layer will become unstable and its simple laminar structure will be destroyed; see Chap. 14.

13.4 High Reynolds Number Flow – Laminar Boundary Layers

As we have described, flow near a solid surface creates vorticity and, consequently, the velocity field near the surface cannot be derived from a scalar potential, v = ∇ψ. However, if the Reynolds number is high, then the vorticity may be localized within a thin boundary layer adjacent to the surface, as in Fig. 13.4 above, and the flow may be very nearly of potential form v = ∇ψ outside that boundary layer. In this section we shall use the equations of hydrodynamics to model the flow in the simplest example of such a boundary layer: that formed when a long, thin plate is placed in a steady, uniform flow V e_x with its surface parallel to the flow (Fig. 13.8). If the plate is not too long (see caption of Fig. 13.8), then the flow will be laminar, i.e. steady and two-dimensional, a function only of the distances x along the plate’s length and y perpendicular to the plate (both being measured from an origin at the plate’s front). We assume the flow to be very subsonic, so it can be regarded as incompressible. As the viscous stress decelerates the fluid close to the plate, that fluid must be deflected away from the plate to avoid accumulating, thereby producing a small y component of velocity along with the larger x component. As the velocity is uniform well away from the plate, the pressure is constant outside the boundary layer. We use this to motivate the approximation that P is also constant within the boundary layer. After solving for the flow, we will check the self-consistency of this ansatz. With P = constant and the flow stationary, only the inertial and viscous terms remain in the Navier-Stokes equation (13.2):

$$(\mathbf{v}\cdot\nabla)\mathbf{v} \simeq \nu\nabla^2\mathbf{v} . \qquad (13.37)$$

This equation must be solved in conjunction with ∇ · v = 0 and the boundary conditions v → V e_x as y → ∞ and v → 0 as y → 0.

The fluid first encounters the no-slip boundary condition at the front of the plate, x = y = 0. The flow there abruptly decelerates to vanishing velocity, creating a sharp velocity gradient that contains a sharp spike of vorticity. This is the birth of the boundary layer. As the fluid flows on down the plate, from x = 0 to larger x, the vorticity gradually diffuses outward from the wall into the flow, thickening the boundary layer. Let us compute, in order of magnitude, the boundary layer’s thickness δ(x) as a function of distance x down the plate. Incompressibility, ∇ · v = 0, implies that v_y ∼ v_x δ/x. Using this to estimate the relative magnitudes of the various terms in the x component of the force-balance equation (13.37), we see that the dominant inertial term (left hand side) is ∼ V²/x and the dominant viscous term (right hand side) is ∼ νV/δ². We therefore obtain the estimate δ ∼ √(νx/V). This motivates us to define the function

$$\delta(x) \equiv \left(\frac{\nu x}{V}\right)^{1/2} \qquad (13.38)$$

for use in our quantitative analysis. Our analysis will reveal that the actual thickness of the boundary layer is several times larger than this δ(x). Equation (13.38) shows that the boundary layer has a parabolic shape. To keep our analysis manageable, we shall confine ourselves to the region, not too close to the front of the plate, where the layer is thin, δ ≪ x, and the velocity is nearly parallel to the plate, v_y ∼ (δ/x)v_x ≪ v_x.

To proceed further we use a technique of widespread applicability in fluid mechanics: we make a similarity ansatz. We suppose that, once the boundary layer has become thin (δ ≪ x), the cross-sectional shape of the flow is independent of distance x down the plate (it is “similar” at all x). Stated more precisely, we assume that v_x(x, y) (which has magnitude ∼ V) and (x/δ)v_y (which also has magnitude ∼ V) are functions only of the single dimensionless variable

$$\xi = \frac{y}{\delta(x)} = y\sqrt{\frac{V}{\nu x}} . \qquad (13.39)$$

Our task, then, is to compute v(ξ) subject to the boundary conditions v = 0 at ξ = 0, and v = V e_x at ξ ≫ 1. We do so with the aid of a second, very useful calculational device. Recall that any vector field [v(x) in our case] can be expressed as the sum of the gradient of a scalar potential and the curl of a vector potential, v = ∇ψ + ∇ × A. If our flow were irrotational, ω = 0, we would need only ∇ψ, but it is not; the vorticity in the boundary layer is large. On the other hand, to high accuracy the flow is incompressible, θ = ∇ · v = 0, which means we need only the vector potential, v = ∇ × A; and because the flow is two-dimensional (depends only on x and y and has v pointing only in the x and y directions), the vector potential need only have a z component, A = A_z e_z. We denote its nonvanishing component by A_z ≡ ζ(x, y) and give it the name stream function, since it governs how the laminar flow streams. In terms of the stream function, the relation v = ∇ × A takes the simple form

$$v_x = \frac{\partial\zeta}{\partial y} , \qquad v_y = -\frac{\partial\zeta}{\partial x} . \qquad (13.40)$$

This automatically satisfies ∇ · v = 0.

Since the stream function varies on the length scale δ, it must have magnitude ∼ Vδ in order to produce a velocity field with magnitude ∼ V. This motivates us to guess that it has the functional form

$$\zeta = V\,\delta(x)\,f(\xi) , \qquad (13.41)$$

where f(ξ) is some dimensionless function of order unity. This will be a good guess if, when inserted into Eq. (13.40), it produces a self-similar flow, i.e. one with $v_x$ and $(x/\delta)v_y$ depending only on ξ. Indeed, inserting Eq. (13.41) into Eq. (13.40), we obtain

$$v_x = V f' , \qquad v_y = \frac{\delta(x)\,V}{2x}\left(\xi f' - f\right) , \qquad (13.42)$$

where the prime means d/dξ. This has the desired self-similar form. By inserting these self-similar $v_x$ and $v_y$ into the x component of the force-balance equation (13.37), we obtain a nonlinear, third-order differential equation for f(ξ):

$$\frac{d^3 f}{d\xi^3} + \frac{f}{2}\,\frac{d^2 f}{d\xi^2} = 0 . \qquad (13.43)$$

The fact that this equation involves x and y only in the combination $\xi = y(V/\nu x)^{1/2}$ confirms that our self-similar ansatz was a good one. Equation (13.43) must be solved subject to the boundary conditions that the velocity vanish at the surface and approach V as y → ∞; i.e. [cf. Eqs. (13.42)]

$$f(0) = f'(0) = 0 , \qquad f'(\infty) = 1 . \qquad (13.44)$$
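Equation (13.43) is third order. Two of its conditions sit at ξ = 0 and one at ξ → ∞, so a standard way to compute a numerical solution is shooting. One integrates outward from the plate with a trial value of f''(0) and adjusts that value until f'(ξ) → 1. The sketch below is one way to do this and is not necessarily the method behind Fig. 13.9. It makes three assumptions:

- integration uses a fixed-step fourth-order Runge-Kutta scheme;
- ξ → ∞ is replaced by a finite ξ_max = 10, which is adequate because f' approaches 1 very rapidly;
- the trial value f''(0) is refined by bisection.

All function names are hypothetical.

```python
def blasius_rhs(state):
    """Eq. (13.43) as a first-order system in state = (f, f', f'')."""
    f, fp, fpp = state
    return (fp, fpp, -0.5 * f * fpp)


def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = blasius_rhs(state)
    k2 = blasius_rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = blasius_rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = blasius_rhs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(fpp0, xi_max=10.0, n=2000):
    """March out from the plate with f(0) = f'(0) = 0 and trial f''(0) = fpp0.
    Returns the sampled profile as a list of (xi, f, f', f'')."""
    h = xi_max / n
    state = (0.0, 0.0, fpp0)
    profile = [(0.0, *state)]
    for i in range(1, n + 1):
        state = rk4_step(state, h)
        profile.append((i * h, *state))
    return profile


def shoot(lo=0.1, hi=1.0, xi_max=10.0, tol=1e-9):
    """Bisect on f''(0) until f'(xi_max) = 1, standing in for f'(infinity) = 1."""
    def miss(fpp0):
        return integrate(fpp0, xi_max)[-1][2] - 1.0

    miss_lo = miss(lo)  # negative for lo = 0.1; miss(hi) is positive for hi = 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        miss_mid = miss(mid)
        if miss_lo * miss_mid <= 0.0:
            hi = mid
        else:
            lo, miss_lo = mid, miss_mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    fpp0 = shoot()
    profile = integrate(fpp0)
    print(f"f''(0) = {fpp0:.5f}")
    print(f"4 f''(0) = {4 * fpp0:.4f}")  # coefficient appearing in Eq. (13.47)
    for xi, f, fp, fpp in profile[::200]:
        print(f"xi = {xi:4.1f}   v_x/V = f' = {fp:.4f}")
```

The bracket [0.1, 1.0] relies on the scaling symmetry of (13.43): if f(ξ) is a solution, so is c f(cξ). Under this map f''(0) → c³f''(0) and f'(∞) → c²f'(∞), so the miss changes sign across the bracket.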

[Figure 13.9: plot of $v_x/V$ and $\omega\delta/V$ against ξ = y/δ, for ξ from 0 to 5]

Fig. 13.9: Scaled velocity profile $v_x/V = f'(\xi)$ (solid line) for a laminar boundary layer as a function of scaled perpendicular distance ξ = y/δ. Note that the flow speed reaches roughly 90 per cent of V at a distance of about 3δ from the surface, so δ is a good measure of the thickness of the boundary layer. Also shown is the scaled vorticity profile ωδ/V. [These figures are hand-sketched; a more accurate numerically generated solution will be provided in a future version of this chapter.]

Not surprisingly, Eq. (13.43) does not admit an analytic solution. However, it is simple to compute a numerical solution with the boundary conditions (13.44). The result for $v_x/V = f'(\xi)$ is shown in Fig. 13.9. This solution, the Blasius profile, has qualitatively the form we expected: the velocity $v_x$ rises smoothly from 0 to V as one moves outward from the plate, achieving a sizable fraction of V at a distance several times larger than δ(x). This Blasius profile is our first example of a common procedure in fluid dynamics: taking account of a natural scaling in the problem to make a self-similar ansatz, and thereby transforming the partial differential fluid equations into ordinary differential equations. Solutions of this type are known as similarity solutions.

The motivation for using similarity solutions is obvious: the nonlinear partial differential equations of fluid dynamics are much harder to solve, even numerically, than ordinary differential equations. Elementary similarity solutions are especially appropriate for problems where there is no characteristic length or timescale associated with the relevant physical quantities except those explicitly involving the spatial and temporal coordinates. Large-Reynolds-number flow past a large plate has a useful similarity solution, whereas flow with $R_\ell \sim 1$, where the size of the plate is clearly a significant scale in the problem, does not. We shall encounter more examples of similarity solutions in the following chapters.

Now that we have a solution for the flow, we must examine a key approximation that underlies it: constancy of the pressure P. To do this, we begin with the y component of the force-balance equation (13.37), a component that we never used explicitly in our analysis. The inertial and viscous terms are both $O(V^2\delta/x^2)$, so if we reinstate a term $-\nabla P/\rho \sim -\Delta P/(\rho\delta)$, it can be no larger than the other two terms. From this we estimate that the pressure difference across the boundary layer is $\Delta P \lesssim \rho V^2\delta^2/x^2$.
Using this estimate in the x component of force balance (13.37) (the component on which our analysis was based), we verify that the pressure-gradient term is smaller than those we kept by a factor $\lesssim \delta^2/x^2 \ll 1$. For this reason, when the boundary layer is thin we can indeed neglect pressure gradients across it in computing its structure from longitudinal force balance.

It is of interest to compute the total drag force exerted on the plate. Let ℓ be the plate’s length and w ≫ ℓ be its width. Noting that the plate has two sides, the drag force produced by the viscous stress acting on the plate’s surface is

$$F = 2w\int_0^\ell \rho\nu\left(\frac{\partial v_x}{\partial y}\right)_{y=0} dx . \qquad (13.45)$$

Inserting $\partial v_x/\partial y = (V/\delta)f''(0) = V(V/\nu x)^{1/2} f''(0)$ from Eq. (13.42) and performing the integral, we obtain

$$F = \tfrac{1}{2}\rho V^2 \times (2\ell w) \times C_D , \qquad (13.46)$$

where

$$C_D = 4 f''(0)\,R_\ell^{-1/2} . \qquad (13.47)$$

Here we have introduced an often-used notation for expressing the drag force of a fluid on a solid body. We have written the force as half the incoming fluid’s kinetic stress $\rho V^2$, times the surface area 2ℓw of the body on which the drag force acts, times a dimensionless drag coefficient $C_D$. We have expressed the drag coefficient in terms of the Reynolds number

$$R_\ell = \frac{V\ell}{\nu} , \qquad (13.48)$$

formed from the body’s relevant dimension, ℓ, and the speed and viscosity of the incoming fluid. From Fig. 13.9, we estimate that f''(0) ≃ 0.3 (an accurate numerical value is 0.332), and so $C_D \simeq 1.328\,R_\ell^{-1/2}$. Note that the drag coefficient decreases as the viscosity decreases and the Reynolds number increases. However, as we shall discuss in the next chapter, this model breaks down for very large Reynolds number, $R_\ell \gtrsim 10^6$, because the boundary layer becomes turbulent.
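Equations (13.38) and (13.45)-(13.48) are easy to evaluate for a concrete plate. The sketch below uses the value f''(0) = 0.332 quoted in the text. Its sample numbers are illustrative assumptions: a 1 m by 10 m plate in air, with ρ = 1.2 kg m⁻³, ν = 1.5 × 10⁻⁵ m² s⁻¹ and V = 1 m s⁻¹. The guard at $R_\ell \sim 10^6$ only encodes the warning above that the laminar result fails there. The function names are hypothetical. As a consistency check, the code computes the force both from (13.46) and directly from the integral (13.45).

```python
import math

FPP0 = 0.332  # f''(0) of the Blasius profile, as quoted in the text


def delta(x, V, nu):
    """Scale thickness delta(x) = (nu x / V)^(1/2), Eq. (13.38)."""
    return math.sqrt(nu * x / V)


def reynolds(V, ell, nu):
    """R_ell = V ell / nu, Eq. (13.48)."""
    return V * ell / nu


def drag_coefficient(R):
    """Laminar flat-plate drag coefficient C_D = 4 f''(0) R^(-1/2), Eq. (13.47)."""
    return 4.0 * FPP0 / math.sqrt(R)


def plate_drag(rho, V, ell, w, nu):
    """Two-sided laminar drag F = (1/2) rho V^2 (2 ell w) C_D, Eq. (13.46)."""
    R = reynolds(V, ell, nu)
    if R > 1e6:  # the text warns that the layer turns turbulent for R_ell >~ 1e6
        raise ValueError(f"R_ell = {R:.3g}: laminar Blasius drag not applicable")
    return 0.5 * rho * V**2 * (2.0 * ell * w) * drag_coefficient(R)


def plate_drag_direct(rho, V, ell, w, nu):
    """Same force from Eq. (13.45) with dv_x/dy = V (V / nu x)^(1/2) f''(0);
    the x-integral of x^(-1/2) from 0 to ell equals 2 ell^(1/2)."""
    return 2.0 * w * rho * nu * V * math.sqrt(V / nu) * FPP0 * 2.0 * math.sqrt(ell)


if __name__ == "__main__":
    rho, V, ell, w, nu = 1.2, 1.0, 1.0, 10.0, 1.5e-5  # assumed sample values (air)
    R = reynolds(V, ell, nu)
    print(f"R_ell      = {R:.3g}")
    print(f"delta(ell) = {delta(ell, V, nu) * 1e3:.2f} mm")
    print(f"C_D        = {drag_coefficient(R):.4g}")
    print(f"F (13.46)  = {plate_drag(rho, V, ell, w, nu):.4g} N")
    print(f"F (13.45)  = {plate_drag_direct(rho, V, ell, w, nu):.4g} N")
```

The two routes to F agree identically, which is just the algebra leading from (13.45) to (13.47).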

13.4.1

Vorticity Profile

It is illuminating to consider the structure of the boundary layer in terms of the vorticity. Since the flow is two-dimensional with velocity $\mathbf v = \nabla\times(\zeta\mathbf e_z)$, its vorticity is $\boldsymbol\omega = \nabla\times\nabla\times(\zeta\mathbf e_z) = -\nabla^2(\zeta\mathbf e_z)$, which has as its only nonzero component

$$\omega \equiv \omega_z = -\nabla^2\zeta \simeq -\frac{V}{\delta}\,f''(\xi) . \qquad (13.49)$$

This vorticity is exhibited in Fig. 13.9. From Eq. (13.43), we observe that the gradient of vorticity vanishes at the plate: $\partial\omega/\partial y \simeq -(V/\delta^2)f'''(0) = (V/2\delta^2)f(0)f''(0) = 0$, since f(0) = 0. This means that the vorticity is not diffusing out of the plate’s surface. Neither is it being convected away from the plate’s surface, as the perpendicular velocity vanishes there. However, there is no vorticity upstream, so it must have been created somewhere. Its source is the plate’s leading edge, where, as we have already remarked, our approximations must break down. The vorticity created impulsively there diffuses away from the plate in the downstream flow. If we transform into a frame moving with an intermediate speed ∼ V/2, and measure time t since passing the leading edge, the vorticity will diffuse a distance $\sim(\nu t)^{1/2}$ away from the surface after time t, just like heat in a thermal conductor. This accounts for the parabolic shape of the boundary layer.
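The diffusion argument can be made explicit as follows. We take the intermediate frame speed to be exactly V/2, an illustrative choice, since the text only specifies ∼ V/2. A fluid element then reaches distance x from the leading edge after t ≃ 2x/V, and the diffusion estimate reproduces the scaling of (13.38):

$$t \simeq \frac{x}{V/2} = \frac{2x}{V} , \qquad (\nu t)^{1/2} \simeq \left(\frac{2\nu x}{V}\right)^{1/2} = \sqrt{2}\,\delta(x) \propto x^{1/2} .$$

At this order-of-magnitude level the factor √2 carries no significance. What matters is the $x^{1/2}$ growth, i.e. the parabolic shape.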

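[Figure 13.10: rectangular circuit C straddling the boundary layer (v = 0 at the plate, uniform flow V outside); x along the plate, y perpendicular to it]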

Fig. 13.10: The circulation $\Gamma = \oint_C \mathbf v\cdot d\mathbf x$. The circuit consists of a rectangle with one side of length ∆x parallel to the surface and the other extending from the surface perpendicularly through the boundary layer and into the surrounding homogeneous flow. The circulation is therefore given simply by Γ = V∆x.

Equivalently, we can consider what happens to the circulation Γ. The circulation associated with the circuit shown in Fig. 13.10 is just V∆x, or simply V per unit length of the plate. Since V is constant along the surface (independent of x), no vorticity is created along the solid surface; it must all be created at the leading edge.

[Figure 13.11: incoming flow V and the surface of separation]

Fig. 13.11: Separation of a boundary layer in the presence of an adverse pressure gradient.

13.4.2

Separation

Next consider flow past a nonplanar surface, e.g. an aircraft wing. In this case there will in general be a pressure gradient along the boundary layer which, in contrast to the pressure gradient across the boundary layer, cannot be ignored. If the pressure decreases along the flow, the flow will accelerate, and more vorticity will be created at the surface and will diffuse away from it. However, if there is an “adverse” pressure gradient and the flow decelerates, then negative vorticity must be created at the wall. For a sufficiently adverse gradient, the negative vorticity gets so strong that it cannot diffuse fast enough into and through the boundary layer to maintain a simple boundary-layer-type flow. Instead, the boundary layer separates from the surface, as shown in Fig. 13.11, and the negative vorticity generates a backward flow beyond the separation point. This phenomenon can occur on an aircraft when the wings’ angle of attack (i.e. the inclination of the wings to the horizontal) is too great. An unfavorable pressure gradient develops on the upper wing surfaces, the flow separates, and the plane stalls. The designers of wings make great efforts to prevent this, as we shall discuss briefly in the next chapter.

****************************
EXERCISES

Exercise 13.6 Example: Potential Flow around a Cylinder (D’Alembert’s Paradox)
Consider stationary incompressible flow around a cylinder of radius a with sufficiently large Reynolds number that viscosity may be ignored except in a thin boundary layer, which is assumed to extend all the way around the cylinder. The velocity is assumed to have the uniform value V at large distances from the cylinder. Show that the velocity field outside the boundary layer may be derived from a scalar potential ψ which satisfies Laplace’s equation, ∇²ψ = 0. Write down suitable boundary conditions for ψ. Now write the velocity potential in the form ψ = V · x + f(x) and solve for f. Sketch the streamlines and equipotentials. Next use Bernoulli’s equation to compute the pressure distribution over the surface and the net drag force given by this solution. Does this seem reasonable? Finally, consider the effect of the pressure distribution on the boundary layer. How do you think the real flow will differ from the potential solution? How will the drag change?

Exercise 13.7 Problem: Reynolds Numbers
Estimate the Reynolds numbers for the following flows. Make sketches of the flow fields, pointing out any salient features. (i) A hang glider in flight. (ii) Plankton in the ocean. (iii) A physicist waving her hands.

Exercise 13.8 Problem: Fluid Dynamical Scaling
An auto manufacturer wishes to reduce the drag force on a new model by changing its design. She does this by building a one-eighth scale model and putting it into a wind tunnel. How fast must the air travel in the wind tunnel to simulate the flow at 60 mph on the road?

Exercise 13.9 Example: Stationary Laminar Flow down a Long Pipe
Fluid flows down a long cylindrical pipe with length b much larger than radius a, from a reservoir maintained at pressure $P_0$ to a free end where the pressure is negligible. In this problem, we try to understand the velocity profile for a given “discharge” (i.e. mass flow per unit time) $\dot M$ of fluid along the pipe. We assume that the Reynolds number is small enough for the flow to be treated as laminar. (i) Close to the entrance of the pipe, the boundary layer will be very thin and the velocity will be nearly independent of radius. What will be the fluid velocity there, in terms of its density and $\dot M$? How far must the fluid travel along the pipe before the vorticity diffuses into the center of the flow and the boundary layer becomes as thick as the radius? An order-of-magnitude calculation is adequate, and you may assume that the pipe is much longer than your estimate. (ii) Use the Poiseuille formula derived in Sec. 12.7.6 to derive $\dot M$ and the velocity profile. Hence sketch how the velocity profile changes along the pipe. (iii) Outline a procedure for computing the discharge in a long pipe of arbitrary cross section.

****************************

13.5

Nearly Rigidly Rotating Flow — Earth’s Atmosphere and Oceans

One often encounters, in Nature, fluids that are nearly rigidly rotating, i.e. fluids with a nearly uniform distribution of vorticity. The Earth’s oceans and atmosphere are important examples, where the rotation is forced by the underlying rotation of the Earth. Such rotating fluids are best analyzed in a rotating reference frame, in which the unperturbed fluid is at rest and the perturbations are influenced by Coriolis forces, resulting in surprising phenomena. We shall explore some of these phenomena in this section.


13.5.1

Equations of Fluid Dynamics in a Rotating Reference Frame

As a foundation for this exploration, we wish to transform the Navier-Stokes equation from the inertial frame in which it was derived to a uniformly rotating frame: the mean rest frame of the flows we shall study. We begin by observing that the Navier-Stokes equation has the same form as Newton’s second law for particle motion:

$$\frac{d\mathbf v}{dt} = \mathbf f , \qquad (13.50)$$

where the force per unit mass is $\mathbf f = -\nabla P/\rho - \nabla\Phi + \nu\nabla^2\mathbf v$. We transform to a frame rotating with uniform angular velocity Ω by adding “fictitious” Coriolis and centrifugal accelerations, given respectively by $-2\boldsymbol\Omega\times\mathbf v$ and $-\boldsymbol\Omega\times(\boldsymbol\Omega\times\mathbf x)$, and by expressing the force f in rotating coordinates. The fluid velocity transforms as

$$\mathbf v \to \mathbf v + \boldsymbol\Omega\times\mathbf x . \qquad (13.51)$$

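It is straightforward to verify that this transformation leaves the expression for the viscous acceleration, ν∇²v, unchanged: Ω × x is linear in x, so $\nabla^2(\boldsymbol\Omega\times\mathbf x) = 0$. Therefore the expression for the force is unchanged, and the Navier-Stokes equation in rotating coordinates becomes

$$\frac{d\mathbf v}{dt} = -\frac{\nabla P}{\rho} - \nabla\Phi + \nu\nabla^2\mathbf v - 2\boldsymbol\Omega\times\mathbf v - \boldsymbol\Omega\times(\boldsymbol\Omega\times\mathbf x) . \qquad (13.52)$$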

Now, the centrifugal acceleration $-\boldsymbol\Omega\times(\boldsymbol\Omega\times\mathbf x)$ can be expressed as the gradient of a centrifugal potential, $\nabla[\tfrac12(\boldsymbol\Omega\times\mathbf x)^2] = \nabla[\tfrac12(\Omega\varpi)^2]$, where ϖ is distance from the rotation axis. For simplicity, we shall confine ourselves to an incompressible fluid, so that ρ is constant. This allows us to define an effective pressure

$$P' = P + \rho\left[\Phi - \tfrac{1}{2}(\boldsymbol\Omega\times\mathbf x)^2\right] \qquad (13.53)$$

that includes the combined effects of the real pressure, gravity and the centrifugal force. In terms of P′ the Navier-Stokes equation in the rotating frame becomes

$$\frac{d\mathbf v}{dt} = -\frac{\nabla P'}{\rho} + \nu\nabla^2\mathbf v - 2\boldsymbol\Omega\times\mathbf v . \qquad (13.54a)$$
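The step that absorbs the centrifugal term into P′ relies on $-\boldsymbol\Omega\times(\boldsymbol\Omega\times\mathbf x) = \nabla[\tfrac12(\boldsymbol\Omega\times\mathbf x)^2]$. This identity can be checked numerically at an arbitrary point. The plain-Python sketch below makes that comparison. Its angular velocity and field point are arbitrary illustrative values, with Ω deliberately not vertical. The helper names are hypothetical. It compares the centrifugal acceleration with a central-difference gradient of the centrifugal potential.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def centrifugal_potential(omega, x):
    """(1/2)|Omega x x|^2, i.e. (1/2)(Omega varpi)^2."""
    c = cross(omega, x)
    return 0.5 * sum(ci * ci for ci in c)


def centrifugal_acceleration(omega, x):
    """-Omega x (Omega x x), the last term of Eq. (13.52)."""
    return tuple(-ci for ci in cross(omega, cross(omega, x)))


def numerical_gradient(func, x, h=1e-5):
    """Central-difference gradient of a scalar function of a 3-vector."""
    grad = []
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2.0 * h))
    return tuple(grad)


if __name__ == "__main__":
    omega = (0.3, -0.2, 1.1)  # assumed angular velocity (rad/s), not vertical on purpose
    x = (2.0, -1.5, 0.7)      # assumed field point (m)
    exact = centrifugal_acceleration(omega, x)
    approx = numerical_gradient(lambda p: centrifugal_potential(omega, p), x)
    print("-Omega x (Omega x x)    =", exact)
    print("grad (1/2)(Omega x x)^2 =", approx)
```

Since the potential is quadratic in x, the central difference is exact apart from rounding. Any disagreement would therefore point to a sign error rather than to discretization.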

The quantity P′ will be constant if the fluid is at rest, in contrast to the true pressure P, which does have a gradient. Equation (13.54a) is the most useful form for the Navier-Stokes equation in a rotating frame. For incompressible (very subsonic) flows we augment it by the incompressibility condition ∇ · v = 0. Because $\nabla\cdot(\boldsymbol\Omega\times\mathbf x) = 0$, this condition is left unchanged by the transformation (13.51) to a rotating reference frame:

$$\nabla\cdot\mathbf v = 0 . \qquad (13.54b)$$

It should be evident from Eq. (13.54a) that two dimensionless numbers characterize rotating fluids. The first is the Rossby number,

$$Ro = \frac{V}{L\Omega} , \qquad (13.55)$$

where V is a characteristic velocity of the flow relative to the rotating frame and L is a characteristic length. Ro measures the relative strength of the inertial acceleration and the Coriolis acceleration:

$$Ro \sim \frac{|(\mathbf v\cdot\nabla)\mathbf v|}{|2\boldsymbol\Omega\times\mathbf v|} \sim \frac{\text{inertial force}}{\text{Coriolis force}} . \qquad (13.56)$$

The second dimensionless number is the Ekman number,

$$Ek = \frac{\nu}{\Omega L^2} , \qquad (13.57)$$

which similarly measures the relative strengths of the viscous and Coriolis accelerations:

$$Ek \sim \frac{|\nu\nabla^2\mathbf v|}{|2\boldsymbol\Omega\times\mathbf v|} \sim \frac{\text{viscous force}}{\text{Coriolis force}} . \qquad (13.58)$$

Notice that $Ro/Ek = VL/\nu = R$ is the Reynolds number. The three traditional examples of rotating flows are large-scale storms and other weather patterns on the rotating Earth, deep currents in the Earth’s oceans, and water in a stirred teacup.

For a typical storm, the wind speed might be $V \sim 25$ mph $\sim 10\ \mathrm{m\,s^{-1}}$, and a characteristic length scale might be L ∼ 1000 km. The effective angular velocity at a temperate latitude is $\Omega_* = \Omega_\oplus\sin 45^\circ \sim 10^{-4}\ \mathrm{rad\,s^{-1}}$, where $\Omega_\oplus$ is the Earth’s angular velocity (cf. the next paragraph). As the air’s kinematic viscosity is $\nu \sim 10^{-5}\ \mathrm{m^2\,s^{-1}}$, we find that $Ro \sim 0.1$ and $Ek \sim 10^{-13}$. This tells us immediately that rotational effects are important but not totally dominant in controlling the weather, and that viscous boundary layers will be very thin.

For deep ocean currents such as the Gulf Stream, V ranges from $\sim 0.01$ to $\sim 1\ \mathrm{m\,s^{-1}}$, so we use $V \sim 0.1\ \mathrm{m\,s^{-1}}$. Length scales are L ∼ 1000 km and the kinematic viscosity of water is $\nu \sim 10^{-6}\ \mathrm{m^2\,s^{-1}}$, so $Ro \sim 10^{-3}$ and $Ek \sim 10^{-14}$. Thus, Coriolis accelerations are very important and boundary layers are very thin.

For water stirred in a teacup (with parameters typical of many flows in the laboratory), L ∼ 10 cm, $\Omega \sim V/L \sim 10\ \mathrm{rad\,s^{-1}}$ and $\nu \sim 10^{-6}\ \mathrm{m^2\,s^{-1}}$, giving $Ro \sim 1$ and $Ek \sim 10^{-5}$. Viscous forces are somewhat more important in this case.

For large-scale flows in the Earth’s atmosphere or oceans (e.g. storms), the rotation of the unperturbed fluid is that due to rotation of the Earth. One might think that this means we should take, as the angular velocity Ω in the Coriolis term of the Navier-Stokes equation (13.54a), the Earth’s angular velocity $\Omega_\oplus$. Not so. The atmosphere and ocean are so thin vertically that vertical motions cannot achieve small Rossby numbers; i.e. Coriolis forces are unimportant for vertical motions. Correspondingly, the only component of the Earth’s angular velocity $\boldsymbol\Omega_\oplus$ that is important for Coriolis forces is the one that couples horizontal flows to horizontal flows: the vertical component $\Omega_* = \Omega_\oplus\sin(\text{latitude})$. (A similar situation occurs for a Foucault pendulum.) Thus, in the Coriolis term of the Navier-Stokes equation we must set $\boldsymbol\Omega = \Omega_*\mathbf e_z = \Omega_\oplus\sin(\text{latitude})\,\mathbf e_z$, where $\mathbf e_z$ is the vertical unit vector. By contrast, in the centrifugal potential $\tfrac12(\boldsymbol\Omega\times\mathbf x)^2$, Ω remains the full angular velocity of the Earth, $\boldsymbol\Omega_\oplus$. Uniform rotational flows in a teacup or other vessel also typically have Ω vertically directed. This will be the case for all flows considered in this section.
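The order-of-magnitude numbers for the three examples follow directly from (13.55) and (13.57). The short script below reproduces them. The parameter values are the rough ones used for the storm, the ocean current and the teacup. The ocean entry uses water's kinematic viscosity, ν ∼ 10⁻⁶ m² s⁻¹. The teacup velocity is taken as V = ΩL. The table and function names are hypothetical.

```python
# Rough parameters (SI units) for the three examples of rotating flow.
EXAMPLES = {
    # name: (V [m/s], L [m], Omega [rad/s], nu [m^2/s])
    "storm":  (10.0, 1.0e6, 1.0e-4, 1.0e-5),
    "ocean":  (0.1,  1.0e6, 1.0e-4, 1.0e-6),
    "teacup": (1.0,  0.1,   10.0,   1.0e-6),  # V ~ Omega L
}


def rossby(V, L, omega):
    """Ro = V / (L Omega), Eq. (13.55)."""
    return V / (L * omega)


def ekman(nu, omega, L):
    """Ek = nu / (Omega L^2), Eq. (13.57)."""
    return nu / (omega * L**2)


if __name__ == "__main__":
    for name, (V, L, omega, nu) in EXAMPLES.items():
        ro, ek = rossby(V, L, omega), ekman(nu, omega, L)
        # Ro/Ek should equal the Reynolds number V L / nu
        print(f"{name:7s} Ro = {ro:8.1e}  Ek = {ek:8.1e}  "
              f"Ro/Ek = {ro / ek:8.1e}  VL/nu = {V * L / nu:8.1e}")
```

The last two columns check the identity Ro/Ek = R for each case.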


13.5.2

Geostrophic Flows

Stationary flows, ∂v/∂t = 0, in which both the Rossby and Ekman numbers are small (i.e. with Coriolis forces big compared to inertial and viscous forces) are called geostrophic, even in the laboratory. Geostrophic flow is confined to the bulk of the fluid, well away from all boundary layers, since viscosity will become important in those layers. For such geostrophic flow, the Navier-Stokes equation (13.54a) reduces to

$$2\boldsymbol\Omega\times\mathbf v = -\frac{\nabla P'}{\rho} \qquad \text{for geostrophic flow.} \qquad (13.59)$$

This equation says that the velocity v (measured in the rotating frame) is orthogonal to the body force ∇P′ which drives it. Correspondingly, the streamlines are perpendicular to the gradient of the generalized pressure; i.e. they lie in the surfaces of constant P′. An example of geostrophic flow is the motion of atmospheric winds around a low-pressure region or depression. The geostrophic equation (13.59) tells us that such winds must be counterclockwise in the northern hemisphere as seen from a satellite, and clockwise in the southern hemisphere. For a flow with speed $v \sim 10\ \mathrm{m\,s^{-1}}$ around a ∼ 1000 km depression, the drop in effective pressure at the depression’s center is ∆P′ ∼ 1 kPa ∼ 10 mbar ∼ 0.01 atmosphere ∼ 0.3 inches of mercury. Around a high-pressure region, winds will circulate in the opposite direction.

It is here that we can see the power of introducing the effective pressure P′. In the case of atmospheric and oceanic flows, the true pressure P changes significantly vertically, and the pressure scale height is generally much shorter than the horizontal length scale. However, the effective pressure will be almost constant vertically. Any small vertical variation is responsible for minor updrafts and downdrafts, which we generally ignore when describing the wind or current flow pattern. It is the horizontal pressure gradients that are responsible for driving the flow. When pressures are quoted, they must therefore be referred to some reference equipotential surface, $\Phi - \tfrac12(\boldsymbol\Omega\times\mathbf x)^2 = $ constant. The convenient one to use is the equipotential associated with the surface of the ocean, usually called “mean sea level”. This is the pressure that appears on a meteorological map.
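Crossing (13.59) with $\mathbf e_z$ gives, for horizontal v and $\boldsymbol\Omega = \Omega\mathbf e_z$, the explicit solution $\mathbf v = \mathbf e_z\times\nabla P'/(2\rho\Omega)$. This form makes the sense of circulation around a low easy to read off. The sketch below evaluates it together with the order-of-magnitude pressure drop $\Delta P' \sim |\nabla P'|\,L = 2\rho\Omega v L$. The air density ρ ≈ 1.2 kg m⁻³ is an assumed value that does not appear in the text. The result should be read only as an order of magnitude, comparable to the ∼ 1 kPa quoted for a depression. The function names are hypothetical.

```python
RHO_AIR = 1.2      # kg/m^3, assumed sea-level air density
OMEGA_STAR = 1e-4  # rad/s, vertical component of Earth's rotation at mid-latitude


def geostrophic_velocity(grad_p, rho, omega_z):
    """Horizontal solution of 2 Omega x v = -grad P'/rho with Omega = omega_z e_z:
    v = e_z x grad P' / (2 rho omega_z)."""
    gx, gy = grad_p
    k = 1.0 / (2.0 * rho * omega_z)
    return (-gy * k, gx * k)


def pressure_drop(v, L, rho, omega_z):
    """Order-of-magnitude drop Delta P' ~ |grad P'| L = 2 rho |Omega| v L."""
    return 2.0 * rho * abs(omega_z) * v * L


if __name__ == "__main__":
    # A point due east of a low: grad P' points away from the low, i.e. along +x.
    grad = (2.0 * RHO_AIR * OMEGA_STAR * 10.0, 0.0)  # magnitude chosen so |v| = 10 m/s
    print("north:", geostrophic_velocity(grad, RHO_AIR, OMEGA_STAR))   # +y: counterclockwise
    print("south:", geostrophic_velocity(grad, RHO_AIR, -OMEGA_STAR))  # -y: clockwise
    print(f"Delta P' ~ {pressure_drop(10.0, 1.0e6, RHO_AIR, OMEGA_STAR):.0f} Pa")
```

In the southern hemisphere the vertical component $\Omega_*$ is negative, which is what reverses the circulation in the second call.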

13.5.3

Taylor-Proudman Theorem

There is a simple theorem due to Taylor and Proudman which simplifies the description of three-dimensional geostrophic flows. Take the curl of Eq. (13.59) and use ∇ · v = 0 for incompressible flow; the result is

$$(\boldsymbol\Omega\cdot\nabla)\mathbf v = 0 . \qquad (13.60)$$

Thus, there can be no vertical gradient of the velocity under geostrophic conditions. This result provides a good illustration of the stiffness of vortex lines. The simplest demonstration of this theorem is the Taylor column of Fig. 13.12. It is easy to see that any vertically constant, divergence-free velocity field v(x, y) can be a solution to the geostrophic equation (13.59). The generalized pressure P′ can be adjusted to make it a solution. However, one must keep in mind that to guarantee it is also a true

[Figure 13.12: Taylor column, shown from above and from the side]

Fig. 13.12: Taylor Column. A solid cylinder is placed in a large container of water, which is then spun up on a turntable to a high enough angular velocity Ω that the Ekman number is small, $Ek = \nu/(\Omega L^2) \ll 1$. A slow, steady flow relative to the cylinder is then induced. (The flow’s velocity v in the rotating frame must be small enough to keep the Rossby number ≪ 1.) The water in the bottom half of the container flows around the cylinder. The water in the top half does the same, as if there were an invisible cylinder present. This is an illustration of the Taylor-Proudman theorem, which states that there can be no vertical gradients in the velocity field. The effect can also be demonstrated with vertical velocity. If the cylinder is slowly made to rise, then the fluid immediately above it will also be pushed upward rather than flow past the cylinder. The exception is the water’s surface, where the geostrophic flow breaks down. The fluid above the cylinder, which behaves as though it were rigidly attached to the cylinder, is called a Taylor column.

(approximate) solution of the full Navier-Stokes equation (13.54a), its Rossby and Ekman numbers must be ≪ 1.
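The curl step behind (13.60) is short enough to write out, under this section's assumptions of constant Ω and constant ρ. The vector identity for the curl of a cross product is combined with ∇ · v = 0 and ∇ × ∇P′ = 0:

$$\nabla\times(2\boldsymbol\Omega\times\mathbf v) = 2\left[\boldsymbol\Omega(\nabla\cdot\mathbf v) - (\boldsymbol\Omega\cdot\nabla)\mathbf v\right] = -2(\boldsymbol\Omega\cdot\nabla)\mathbf v = -\nabla\times\frac{\nabla P'}{\rho} = 0 .$$

For vertical Ω this is $\Omega\,\partial\mathbf v/\partial z = 0$, the statement that v cannot vary along the rotation axis.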

13.5.4

Ekman Boundary Layers

As we have seen, Ekman numbers are typically very small in the bulk of a rotating fluid. However, as was true in the absence of rotation, the no-slip condition at a solid surface generates a boundary layer that can indirectly exert a major influence on the global velocity field. When the Rossby number is small, Ro ≲ 1, the structure of a laminar boundary layer is dictated by a balance between viscous and Coriolis forces rather than between viscous and inertial forces. Balancing the relevant terms in Eq. (13.54a), $\nu V/\delta^2 \sim \Omega V$, we obtain an estimate of the boundary-layer thickness:

$$\text{thickness} \sim \delta_E \equiv \left(\frac{\nu}{\Omega}\right)^{1/2} . \qquad (13.61)$$

In other words, the thickness of the boundary layer is that which makes the layer’s Ekman number unity, $Ek(\delta_E) = \nu/(\Omega\delta_E^2) = 1$.
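Evaluating (13.61) with the rough parameters used earlier for the storm, the ocean and the teacup shows how thin laminar Ekman layers are. This sketch uses molecular viscosities only, which is an assumption. Real atmospheric and oceanic Ekman layers are turbulent and considerably thicker; compare the ∼ 30 m surface currents of Ex. 13.10. The table and function names are hypothetical.

```python
import math

# name: (nu [m^2/s], Omega [rad/s], L [m]); rough values for the three examples
CASES = {
    "storm":  (1.0e-5, 1.0e-4, 1.0e6),
    "ocean":  (1.0e-6, 1.0e-4, 1.0e6),
    "teacup": (1.0e-6, 10.0,   0.1),
}


def ekman_thickness(nu, omega):
    """delta_E = (nu / Omega)^(1/2), Eq. (13.61)."""
    return math.sqrt(nu / omega)


if __name__ == "__main__":
    for name, (nu, omega, L) in CASES.items():
        dE = ekman_thickness(nu, omega)
        ek_layer = nu / (omega * dE**2)  # Ekman number of the layer itself: 1 by construction
        print(f"{name:7s} delta_E = {dE:.2e} m  delta_E/L = {dE / L:.1e}  "
              f"Ek(delta_E) = {ek_layer:.3f}")
```

The ratio δ_E/L printed in the middle column is just $Ek^{1/2}$ for the flow as a whole.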

[Figure 13.13: hodograph ($v_y$ against $v_x$) of the wind-driven Ekman spiral, with the applied stress and the net mass flux marked]

Fig. 13.13: Ekman pumping at a surface where the wind exerts a stress.

Consider such an “Ekman boundary layer” at the bottom or top of a layer of geostrophically flowing fluid. For the same reasons as we met in the case of ordinary laminar boundary layers, the generalized pressure P′ will be nearly independent of height z through the Ekman layer; i.e. it will have the value dictated by the flow just outside the layer: $\nabla P' = -2\rho\boldsymbol\Omega\times\mathbf V = $ constant. Here V is the velocity just outside the layer (the velocity of the bulk flow), which we assume to be constant on scales ∼ δ_E. Since Ω is vertical, ∇P′, like V, will be horizontal, i.e. both will lie in the x, y plane. To simplify the analysis, we introduce the fluid velocity relative to the bulk flow,

$$\mathbf w \equiv \mathbf v - \mathbf V , \qquad (13.62)$$

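which goes to zero outside the boundary layer. When rewritten in terms of w, the Navier-Stokes equation (13.54a) takes the simple form $d^2\mathbf w/dz^2 = (2/\nu)\boldsymbol\Omega\times\mathbf w$. Here we have used $\nabla P'/\rho = -2\boldsymbol\Omega\times\mathbf V$, and $d\mathbf v/dt = \partial\mathbf v/\partial t + (\mathbf v\cdot\nabla)\mathbf v = 0$ because the flow is steady and v is horizontal and varies only vertically. Assuming Ω is in the +z direction and introducing the complex quantity

$$w = w_x + i w_y , \qquad (13.63)$$

we can rewrite this as

$$\frac{d^2 w}{dz^2} = \frac{2i}{\delta_E^2}\,w = \left(\frac{1+i}{\delta_E}\right)^2 w . \qquad (13.64)$$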
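The passage from the vector equation to (13.64) uses one fact. For $\boldsymbol\Omega = \Omega\mathbf e_z$, the cross product acts on the complex velocity as multiplication by iΩ. Combined with $\delta_E^2 = \nu/\Omega$ and $2i = (1+i)^2$, this gives the general decaying solutions at once:

$$\boldsymbol\Omega\times\mathbf w = \Omega(-w_y\,\mathbf e_x + w_x\,\mathbf e_y) \;\longleftrightarrow\; \Omega(-w_y + i w_x) = i\Omega\,w , \qquad \frac{2}{\nu}\,i\Omega\,w = \frac{2i}{\delta_E^2}\,w ,$$

$$w(z) = A\,e^{(1+i)z/\delta_E} \ \ (\text{decaying as } z\to-\infty) , \qquad w(z) = B\,e^{-(1+i)z/\delta_E} \ \ (\text{decaying as } z\to+\infty) .$$

A and B are complex constants fixed by the condition at the boundary. As an illustration, a surface stress condition $dw/dz|_{z=0} = T_{xz}/(\nu\rho)$, of the kind used for a wind-driven layer, gives $A = T_{xz}\delta_E/[(1+i)\nu\rho]$.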

This must be solved subject to w → 0 far from the water’s boundary and some appropriate condition at the boundary. For a first illustration of an Ekman layer, consider the effects of a wind blowing in the $\mathbf e_x$ direction above a still ocean, and orient $\mathbf e_z$ vertically upward from the ocean’s surface. Through a turbulent boundary layer of air, the wind will exert a stress $T_{xz}$ on the ocean’s surface, and this must be balanced by an equal viscous stress at the top of the water’s boundary layer. Thus, there must be a velocity gradient $dv_x/dz = T_{xz}/(\nu\rho)$ in the water at z = 0. (This replaces the “no-slip” boundary condition that we have when the boundary is a solid surface.) Imposing this boundary condition along with w → 0 as z → −∞, we find

[Figure 13.14: hodograph ($v_y$ against $v_x$) of the bottom Ekman spiral, from z = 0 up to the bulk velocity V]

Fig. 13.14: Ekman Spiral: The velocity in the bottom boundary layer when the bulk flow above it moves geostrophically with speed $V\mathbf e_x$.

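from Eqs. (13.64) and (13.63):

$$w_x = \frac{T_{xz}\,\delta_E}{\sqrt2\,\nu\rho}\,e^{z/\delta_E}\cos\!\left(\frac{z}{\delta_E} - \frac{\pi}{4}\right) , \qquad w_y = \frac{T_{xz}\,\delta_E}{\sqrt2\,\nu\rho}\,e^{z/\delta_E}\sin\!\left(\frac{z}{\delta_E} - \frac{\pi}{4}\right) , \qquad (13.65)$$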

for z ≤ 0; cf. Fig. 13.13. As a function of depth, this velocity field has the form of a spiral, the so-called Ekman spiral. When Ω points toward us (as in Fig. 13.13), the spiral is clockwise and tightens as we move away from the boundary (z = 0 in the figure) into the bulk flow. By integrating the mass flux ρv over z, we find for the total mass flowing per unit time per unit length of the ocean’s surface

$$\mathbf F = \rho\int_{-\infty}^{0}\mathbf v\,dz = -\frac{\delta_E^2}{2\nu}\,T_{xz}\,\mathbf e_y ; \qquad (13.66)$$

see Fig. 13.13. Thus, the wind, blowing in the $\mathbf e_x$ direction, causes a net mass flow in the direction of $\mathbf e_x\times\boldsymbol\Omega/\Omega = -\mathbf e_y$. This response may seem less paradoxical if one recalls how a gyroscope responds to applied forces. This mechanism is responsible for the creation of gyres in the oceans; cf. Ex. 13.10 and Fig. 13.15.

As a second illustration of an Ekman boundary layer, we consider a geostrophic flow with nonzero velocity $\mathbf V = V\mathbf e_x$ in the bulk of the fluid, and we examine this flow’s interaction with a static, solid surface at its bottom. The structure of the boundary layer on the bottom can be inferred from that of our previous example by adding a constant velocity and a corresponding constant pressure gradient and setting z → −z [note that the differential equation (13.64) is invariant under z → −z]. Imposing no slip, v = 0 (i.e. w = −V) at z = 0, and w → 0 as z → ∞, the resulting solution is

$$v_x = V + w_x = V\left[1 - e^{-z/\delta_E}\cos\!\left(\frac{z}{\delta_E}\right)\right] , \qquad v_y = w_y = V e^{-z/\delta_E}\sin\!\left(\frac{z}{\delta_E}\right) . \qquad (13.67)$$
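The Ekman solutions lend themselves to a quick numerical check with complex arithmetic. The sketch below uses hypothetical helper names and illustrative values $T_{xz} = \rho = \nu = \Omega = V = 1$ in consistent units. It confirms four things:

1. the wind-driven profile (13.65) satisfies (13.64);
2. the same profile satisfies the surface stress condition;
3. its depth-integrated mass flux matches (13.66);
4. a bottom layer built directly from (13.64) as $w = -V e^{-(1+i)z/\delta_E}$ satisfies no slip and approaches the bulk flow.

The depth integral is truncated at 40δ_E, an assumption that is harmless because the integrand decays like $e^{z/\delta_E}$.

```python
import cmath
import math


def surface_w(z, T, rho, nu, omega):
    """Wind-driven layer, Eq. (13.65), as complex w = w_x + i w_y (valid for z <= 0)."""
    dE = math.sqrt(nu / omega)
    amp = T * dE / (math.sqrt(2.0) * nu * rho)
    return amp * math.exp(z / dE) * cmath.exp(1j * (z / dE - math.pi / 4.0))


def bottom_v(z, V, nu, omega):
    """Bottom layer (z >= 0) under bulk flow V e_x: v = V + w, w = -V exp[-(1+i) z / delta_E]."""
    dE = math.sqrt(nu / omega)
    return V - V * cmath.exp(-(1.0 + 1.0j) * z / dE)


def second_derivative(func, z, h=1e-4):
    """Central-difference d^2/dz^2 of a complex-valued function."""
    return (func(z + h) - 2.0 * func(z) + func(z - h)) / h**2


def depth_integrated_flux(T, rho, nu, omega, depth_in_dE=40.0, n=40000):
    """rho * integral of w dz from -depth to 0 by the trapezoidal rule (complex F_x + i F_y)."""
    dE = math.sqrt(nu / omega)
    a, h = -depth_in_dE * dE, depth_in_dE * dE / n
    total = 0.5 * (surface_w(a, T, rho, nu, omega) + surface_w(0.0, T, rho, nu, omega))
    total += sum(surface_w(a + k * h, T, rho, nu, omega) for k in range(1, n))
    return rho * h * total


if __name__ == "__main__":
    T = rho = nu = omega = V = 1.0  # illustrative values in consistent units
    dE = math.sqrt(nu / omega)
    w = lambda z: surface_w(z, T, rho, nu, omega)
    z0 = -0.7 * dE
    print("ODE residual (13.64):", abs(second_derivative(w, z0) - 2j / dE**2 * w(z0)))
    print("dw/dz at surface:", (w(0.0) - w(-1e-6)) / 1e-6, " vs T/(nu rho) =", T / (nu * rho))
    print("flux, numerical: ", depth_integrated_flux(T, rho, nu, omega))
    print("flux, Eq. (13.66):", -1j * dE**2 * T / (2.0 * nu))
    print("bottom v at z = 0:", bottom_v(0.0, V, nu, omega),
          " at z = 10 delta_E:", bottom_v(10 * dE, V, nu, omega))
```

The imaginary part of the computed flux is the $-\mathbf e_y$ transport of (13.66). For the bottom layer, the positive $v_y$ near the wall is the deflection toward low P′ that drives the inflow discussed next.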

This solution, shown in Fig. 13.14, is a second example of the Ekman spiral.

Ekman boundary layers are important because they can circulate rotating fluids faster than viscous diffusion. Suppose we have a nonrotating container (e.g., a tea cup) of radius ∼ L, containing a fluid that rotates with angular velocity Ω (e.g., due to stirring; cf. Ex. 13.11). As you will see in your analysis of Ex. 13.11, the Ekman layer at the container’s bottom experiences a pressure difference between the wall and the container’s center given by $\Delta P \sim \rho L^2\Omega^2$. This drives a fluid circulation in the Ekman layer, from the wall toward the center, with radial speed V ∼ ΩL. The circulating fluid must upwell at the bottom’s center from the Ekman layer into the bulk fluid. This produces a poloidal mixing of the fluid. Its timescale is the fluid volume ∼ L³ divided by the volume flux ∼ Lδ_E V carried by the Ekman layer:

$$t_E \sim \frac{L^3}{L\,\delta_E\,V} \sim \frac{L\,\delta_E}{\nu} , \qquad (13.68)$$

where the second form uses V ∼ ΩL and $\delta_E^2 = \nu/\Omega$. This is shorter than the timescale for simple diffusion of vorticity, $t_\nu \sim L^2/\nu$, by a factor $t_E/t_\nu \sim \delta_E/L \sim Ek^{1/2}$, which, as we have seen, can be very small. This circulation and mixing are key to the piling up of tea leaves at the bottom center of a stirred tea cup, and to the mixing of the tea or milk into the cup’s hot water; cf. Ex. 13.11.

[Figure 13.15: map of the north Atlantic (North America, Europe, Africa) showing the Westerly Winds, the Trade Winds, the Gulf Stream, the West Wind Drift, the Canaries Current, the North Equatorial Current and the Sargasso Sea, with an upper inset of the ocean-surface currents]

Fig. 13.15: Winds and ocean currents in the north Atlantic. The upper inset shows the surface currents, along the dotted north-south line, that produce the Sargasso-Sea gyre.

****************************
EXERCISES

Exercise 13.10 ***Example: Winds and Ocean Currents in the North Atlantic
In the north Atlantic Ocean there is the pattern of winds and ocean currents shown in Fig. 13.15. Westerly winds blow from west to east at 40 degrees latitude. Trade winds blow from east to west at 20 degrees latitude. In between, around 30 degrees latitude, is the Sargasso Sea: a 1.5-meter-high gyre (raised hump of water). The gyre is created by ocean surface currents, extending down to a depth of only about 30 meters, that flow northward from the trade-wind region and southward from the westerly-wind region; see the upper

inset in Fig. 13.15. A deep ocean current, extending from the surface down to near the bottom, circulates around the Sargasso-Sea gyre in a clockwise manner. This current goes under different names in different regions of the ocean: Gulf Stream, West Wind Drift, Canaries Current, and North Equatorial Current. Explain both qualitatively and semiquantitatively (in order of magnitude) how the winds are ultimately responsible for all these features of the ocean. More specifically: (a) Explain the surface currents in terms of an Ekman layer at the top of the ocean, and explain why their depth is about 30 meters. Explain, further, why the height of the gyre that they produce in the Sargasso Sea is about 1.5 meters. (b) Explain the deep ocean current (Gulf Stream, etc.) in terms of a geostrophic flow, and estimate the speed of this current. (c) If there were no continents on Earth, but only an ocean of uniform depth, what would be the flow pattern of this deep current: its directions of motion at various locations around the Earth, and its speeds? The continents (North America, Europe, Africa) must be responsible for the deviation of the actual current (Gulf Stream, etc.) from this continent-free flow pattern. How do you think the continents give rise to the altered flow pattern?

Exercise 13.11 ***Example: Circulation in a Tea Cup
Place tea leaves and water in a tea cup, glass, or other larger container. Stir the water until it is rotating uniformly, and then stand back and watch the motion of the water and leaves. Notice that the tea leaves tend to pile up at the cup’s center. An Ekman boundary layer on the bottom of the cup is responsible for this. In this exercise you will explore the origin and consequences of this Ekman layer. (a) Evaluate the pressure distribution P(ϖ, z) in the bulk flow (outside all boundary layers), assuming that it rotates rigidly. (Here z is height and ϖ is distance from the water’s rotation axis.) Perform your evaluation in the water’s rotating reference frame. From this P(ϖ, z) deduce the shape of the top surface of the water. Compare your deduced shape with the actual shape in your experiment. (b) Estimate the thickness of the Ekman layer at the bottom of your container. It is very thin. Show, using the Ekman spiral diagram, that the water in this Ekman layer flows inward toward the container’s center, causing the tea leaves to pile up at the center. Estimate the radial speed of this Ekman-layer flow and the mass flux that it carries. (c) To get a simple physical understanding of this inward flow, examine the radial gradient ∂P/∂ϖ of the pressure P in the bulk flow just above the Ekman layer. Explain why ∂P/∂ϖ in the Ekman layer will be the same as in the rigidly rotating flow above it. Then apply force balance in an inertial frame to deduce that the water in the Ekman layer will be accelerated inward toward the center.
(d) Using geostrophic-flow arguments, deduce the fate of the boundary-layer water after it reaches the center of the container’s bottom: where does it go? What is the large-scale circulation pattern that results from the “driving force” of the Ekman layer’s mass flux? What is the Rossby number for this large-scale circulation pattern? How and where does water from the bulk, nearly rigidly rotating flow enter the bottom boundary layer so as to be swept inward toward the center? (e) Explain how this large-scale circulation pattern can mix much of the water through the boundary layer in the time $t_E$ of Eq. (13.68). What is the value of this $t_E$ for the water in your container? Explain why this, then, must also be the time for the angular velocity of the bulk flow to slow substantially. Compare your computed value of $t_E$ with the observed slow-down time for the water in your container.

****************************

13.6

Kelvin-Helmholtz Instability — Excitation of Ocean Waves by Wind

A particularly simple type of vorticity distribution is one where the vorticity is confined to a thin, plane interface between two immiscible fluids. In other words, we suppose that one fluid is in uniform motion relative to the other. This type of flow arises quite frequently, for example when the wind blows over the ocean or when smoke from a smokestack discharges into the atmosphere. Following the discussion of the previous section, we expect that a boundary layer will form at the interface and the vorticity will slowly diffuse away from it. However, quite often the boundary layer is much thinner than the length scale associated with the flow, and we can therefore treat it as a plane discontinuity in the velocity field. It turns out that this type of flow is unstable, as we shall now demonstrate. This instability is known as the Kelvin-Helmholtz instability. It provides another good illustration of the behavior of vorticity, as well as an introduction to the techniques that are commonly used to analyze fluid instabilities. We restrict attention to the simplest version of the instability. Consider an equilibrium with a fluid of density ρ+ moving horizontally with speed V above a second fluid, which is at rest, with density ρ−. Let x be a coordinate measured along the interface and let y be measured perpendicular to the interface. The equilibrium contains a sheet of vorticity lying in the plane y = 0, across which the velocity changes discontinuously. Strictly, this discontinuity ought to be treated as a boundary layer, with a thickness determined by the viscosity. However, in this problem we shall analyze disturbances with length scales much greater than the thickness of the boundary layer. A corollary of this assumption is that we can ignore viscous stresses in the body of the flow. We also specialise to very subsonic speeds, for which the flow can be treated as incompressible, and we ignore the effects of surface tension and gravity.

[Fig. 13.16: Kelvin-Helmholtz Instability. A fluid moving with speed V lies above a fluid at rest, shown at t = 0 and t > 0. (a) Temporally growing mode. (b) Spatially growing mode.]

A full description of this flow requires solving the full equations of fluid dynamics, which are quite non-linear and must be attacked numerically. However, we can make analytical progress on an important sub-problem: whether or not the simple vortex sheet with uniform flow above and below it is stable to small perturbations. This allows us to linearize in the amplitude of these perturbations. If we discover that these perturbations grow with time, then we will eventually be forced to consider the full non-linear problem. However, uncovering the conditions for instability is a very important first step in our analysis of the flow. Let us imagine a small perturbation to the location of the interface by an amount ξ(x, t) [Fig. 13.16(a)]. We denote the associated perturbations to the pressure and velocity by δP, δv. That is to say, we write

$$P(\mathbf{x}, t) = P_0 + \delta P(\mathbf{x}, t)\,, \qquad \mathbf{v} = V H(y)\,\mathbf{e}_x + \delta\mathbf{v}(\mathbf{x}, t)\,, \qquad (13.69)$$

where P0 is the constant pressure in the equilibrium flow about which we are perturbing, V is the constant speed of the flow above the interface, and H(y) is the Heaviside step function (1 for y > 0 and 0 for y < 0). We substitute these P(x, t) and v(x, t) into the governing equations: the incompressibility relation

$$\nabla\cdot\mathbf{v} = 0\,, \qquad (13.70)$$

and the viscosity-free Navier-Stokes equation, i.e., the Euler equation,

$$\frac{d\mathbf{v}}{dt} = -\frac{\nabla P}{\rho}\,. \qquad (13.71)$$

We then subtract off the equations satisfied by the equilibrium quantities and keep only terms linear in the perturbations, to obtain

$$\nabla\cdot\delta\mathbf{v} = 0\,, \qquad (13.72)$$

$$\frac{d\,\delta\mathbf{v}}{dt} = -\frac{\nabla\delta P}{\rho}\,, \qquad (13.73)$$

where, to linear order, d/dt = ∂/∂t + V H(y) ∂/∂x is the time derivative moving with the unperturbed flow.

Combining these two equations, we find once again that the pressure satisfies Laplace’s equation,

$$\nabla^2\delta P = 0 \qquad (13.74)$$

(cf. Sec. 13.3). We now follow the procedure that we used in Sec. 11.4.2 when treating Rayleigh waves on the surface of an elastic medium: we seek a wave mode in which the perturbed quantities vary ∝ exp[i(kx − ωt)] f(y), with f(y) dying out away from the interface. From Laplace’s equation (13.74), we infer an exponential falloff with |y|:

$$\delta P = \delta P_0\, e^{-k|y| + i(kx - \omega t)}\,, \qquad (13.75)$$

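where δP0 is a constant. Our next step is to substitute this δP into the y component of the perturbed Euler equation (13.73). Acting on a perturbation ∝ exp[i(kx − ωt)], d/dt gives −i(ω − kV) above the interface and −iω below it, so we obtain

$$\delta v_y = \begin{cases} \dfrac{ik\,\delta P}{(\omega - kV)\rho_+} & \text{at } y > 0\,,\\[2ex] \dfrac{-ik\,\delta P}{\omega\rho_-} & \text{at } y < 0\,. \end{cases} \qquad (13.76)$$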

We must impose two boundary conditions at the interface between the fluids: continuity of the vertical displacement ξ of the interface (the tangential displacement need not be continuous, since we are examining scales large compared to the boundary layer), and continuity of the pressure P across the interface. [See Eq. (11.49) and associated discussion for the analogous boundary conditions at a discontinuity in an elastic medium.] Now, the vertical interface displacement ξ is related to the velocity perturbation by dξ/dt = δvy(y = 0), which, for the mode’s exp[i(kx − ωt)] dependence, implies that

$$\xi = \begin{cases} \dfrac{i\,\delta v_y}{\omega - kV} & \text{at } y = 0^+ \text{ (immediately above the interface)},\\[2ex] \dfrac{i\,\delta v_y}{\omega} & \text{at } y = 0^- \text{ (immediately below the interface)}. \end{cases} \qquad (13.77)$$

Then, by virtue of Eqs. (13.75), (13.76), and (13.77), the continuity of pressure and of vertical displacement at y = 0 imply that

$$\rho_+(\omega - kV)^2 + \rho_-\,\omega^2 = 0\,, \qquad (13.78)$$

where ρ+ and ρ− are the densities of the fluid above and below the interface. Solving for the frequency ω as a function of horizontal wave number k, we obtain the following dispersion relation for linear Kelvin-Helmholtz modes:

$$\omega = kV\,\frac{\rho_+ \pm i(\rho_+\rho_-)^{1/2}}{\rho_+ + \rho_-}\,. \qquad (13.79)$$
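As a quick consistency check, one can solve the quadratic (13.78) numerically and compare its roots with the closed form (13.79). This is a minimal sketch: the function names and the sample values of k, V, ρ+ and ρ− are illustrative assumptions, and NumPy’s `np.roots` supplies the polynomial roots.

```python
import numpy as np

def kh_frequencies(k, V, rho_plus, rho_minus):
    """Roots omega of rho+ (omega - k V)^2 + rho- omega^2 = 0, Eq. (13.78)."""
    # Expanded form: (rho+ + rho-) omega^2 - 2 rho+ k V omega + rho+ (k V)^2 = 0
    a = rho_plus + rho_minus
    b = -2.0 * rho_plus * k * V
    c = rho_plus * (k * V) ** 2
    return np.roots([a, b, c])

def kh_closed_form(k, V, rho_plus, rho_minus):
    """The two roots written in the closed form of Eq. (13.79)."""
    s = rho_plus + rho_minus
    root = np.sqrt(rho_plus * rho_minus)
    return (k * V * (rho_plus + 1j * root) / s,
            k * V * (rho_plus - 1j * root) / s)

# Illustrative values: light fluid (1.2) moving over a dense fluid (1000) at rest
numeric = kh_frequencies(2.0, 3.0, 1.2, 1000.0)
closed = kh_closed_form(2.0, 3.0, 1.2, 1000.0)
print(sorted(numeric, key=lambda z: z.imag))
print(sorted(closed, key=lambda z: z.imag))
```

The two printed pairs agree; the root with positive imaginary part is the growing mode for an exp(−iωt) time dependence.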


13.6.1

Temporal growth

Suppose that we create some small, localised disturbance at time t = 0. We can Fourier analyse the disturbance in space and, as we have linearized the problem, can consider the temporal evolution of each Fourier component. Now, what we ought to do is to solve the initial value problem carefully, taking account of the initial conditions. However, when there are growing modes, we can usually infer the long-term behavior by ignoring the transients and just considering the growing solutions. In our case, we infer from Eqs. (13.75), (13.76), (13.77) and the dispersion relation (13.79) that a mode with wave number k must grow as

$$\delta P,\ \xi \propto \exp\!\left[\frac{kV(\rho_+\rho_-)^{1/2}}{\rho_+ + \rho_-}\,t\right]\exp\!\left[ik\left(x - \frac{\rho_+ V}{\rho_+ + \rho_-}\,t\right)\right]. \qquad (13.80)$$

Thus, this mode grows exponentially with time [Fig. 13.16(a)]. Note that the mode is nondispersive: its phase speed, ρ+V/(ρ+ + ρ−), is independent of k. If the two densities are equal, the real and imaginary parts of ω are both kV/2, so the mode e-folds in a time 2/(kV), roughly one sixth of its oscillation period. Because the growth rate is proportional to k, the fastest modes to grow are those which have the shortest periods and hence the shortest wavelengths. (However, the wavelength must not approach the thickness of the boundary layer and thereby compromise our assumption that the effects of viscosity are negligible.) We can understand this growing mode somewhat better by transforming into the center-of-momentum frame, which moves with speed ρ+V/(ρ+ + ρ−) relative to the frame in which the lower fluid is at rest. In this (primed) frame, the velocity of the upper fluid is V′ = ρ−V/(ρ+ + ρ−), and the perturbations evolve as

$$\delta P,\ \xi \propto \exp\!\left[kV'\left(\frac{\rho_+}{\rho_-}\right)^{1/2} t\right]\exp(ikx')\,. \qquad (13.81)$$

In this frame the wave is purely growing, whereas in our original frame it oscillates with time as it grows.
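The growing root of Eq. (13.79) can be evaluated directly. This minimal sketch uses hypothetical helper names and arbitrary consistent units. It returns the real frequency and e-folding rate that appear in Eq. (13.80), and checks that the center-of-momentum form of Eq. (13.81) gives the same growth rate.

```python
import math

def temporal_mode(k, V, rho_plus, rho_minus):
    """(omega_r, omega_i) of the growing Kelvin-Helmholtz root, Eq. (13.79)."""
    s = rho_plus + rho_minus
    omega_r = k * V * rho_plus / s                         # gives pattern speed rho+ V / s
    omega_i = k * V * math.sqrt(rho_plus * rho_minus) / s  # e-folding rate in Eq. (13.80)
    return omega_r, omega_i

def growth_rate_com_frame(k, V, rho_plus, rho_minus):
    """Growth rate written as k V' (rho+/rho-)^{1/2}, Eq. (13.81)."""
    V_prime = rho_minus * V / (rho_plus + rho_minus)  # upper-fluid speed in the primed frame
    return k * V_prime * math.sqrt(rho_plus / rho_minus)

# Equal densities, k = V = 1 (arbitrary units)
omega_r, omega_i = temporal_mode(1.0, 1.0, 1.0, 1.0)
e_folding_time = 1.0 / omega_i
period = 2.0 * math.pi / omega_r
print(omega_r, omega_i)            # 0.5 0.5
print(e_folding_time / period)     # 1/(2 pi), about 0.159
```

For equal densities the e-folding time is 1/(2π) of the oscillation period. Both functions give the same growth rate for any density ratio, because the frame change only removes the real part of ω.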

13.6.2

Spatial Growth

An alternative type of mode is one in which a small perturbation is excited, with some fixed real frequency, at the point where the shear layer begins [Fig. 13.16(b)]. In this case we regard the frequency ω as real and look for a complex wave number k whose imaginary part corresponds to spatial growth away from the point of excitation. Solving Eq. (13.78) for k [equivalently, inverting Eq. (13.79)], we obtain

$$k = \frac{\omega}{V}\left[1 \pm i\left(\frac{\rho_-}{\rho_+}\right)^{1/2}\right]. \qquad (13.82)$$

For the root whose factor e^{ikx} grows with increasing x (Im k < 0), the mode grows exponentially with distance from the point of excitation.

13.6.3

Relationship between temporal and spatial growth; Excitation of ocean waves by wind

An illustrative application is to a wind blowing steadily over the ocean. In this case ρ+/ρ− ∼ 10⁻³, and for temporal growth of the waves the imaginary part of the frequency satisfies

$$\omega_i \simeq \left(\frac{\rho_+}{\rho_-}\right)^{1/2} kV \sim 0.03\,kV \qquad (13.83)$$

for the growing mode [cf. Eq. (13.80)]. For spatial growth at the same real frequency, ω = ωr ≃ (ρ+/ρ−)kV, Eq. (13.82) gives

$$|k_i| = \left(\frac{\rho_-}{\rho_+}\right)^{1/2}\frac{\omega}{V} \simeq \left(\frac{\rho_+}{\rho_-}\right)^{1/2} k \sim 0.03\,k\,. \qquad (13.84)$$

In other words, the amplitude of the ocean waves increases by a factor ∼e in a time ∼30/(kV) (about five times the interval 2π/(kV) in which the wind crosses one wavelength), or over a distance ∼30/k (about five wavelengths). This relation between temporal and spatial growth is an instance of a more general result. Consider any type of wave for which the dispersion relation ω(k) is an analytic function. Suppose that, among the solutions to this dispersion relation, there is a wave with real k that grows slowly with time; its growth rate can then be expressed as ωi(kr), where the subscripts i and r denote “imaginary” and “real”. There will be another solution, describing a wave with real frequency ω and a slow spatial growth of amplitude, with e-folding rate given by ki(ωr). The growth rates for these two solutions are related by the following equation, which follows from the Cauchy-Riemann equations for any complex analytic function:

$$\frac{\omega_i(k_r)}{k_i(\omega_r)} \simeq \frac{\partial\omega_i}{\partial k_i} = \frac{\partial\omega_r}{\partial k_r} = v_g\,. \qquad (13.85)$$

This ratio is recognised as the group velocity. In the present application, the ratio ωi/|ki| is the wind velocity, V ∼ 10 m s⁻¹. The computational procedure that we have used in studying the Kelvin-Helmholtz instability (seeking eigenfunction solutions to the equations of small perturbations), although in widespread use, must be applied with care. For example, if the full temporal evolution of some initial configuration is needed, then there is no escape from solving the initial value problem, and the resulting transients may exhibit important behaviors that are absent in the individual growing modes.
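To make the wind-over-ocean numbers concrete, the sketch below takes ρ+/ρ− = 10⁻³ as in the text. The wind speed V = 10 m s⁻¹ and the wave number k = 1 m⁻¹ are illustrative assumptions, and the function names are hypothetical. The code evaluates the temporal growth rate from Eq. (13.79). It then finds the spatially growing wave number by solving Eq. (13.78) for complex k at fixed real ω. It keeps the root with Im k < 0, which grows downstream for an exp[i(kx − ωt)] dependence.

```python
import math

def temporal_growth(k, V, rho_plus, rho_minus):
    """(omega_r, omega_i) of the growing root of Eq. (13.79) for real k."""
    s = rho_plus + rho_minus
    return k * V * rho_plus / s, k * V * math.sqrt(rho_plus * rho_minus) / s

def spatial_wavenumber(omega, V, rho_plus, rho_minus):
    """Complex k solving rho+ (omega - k V)^2 + rho- omega^2 = 0 at real omega,
    choosing the root with Im k < 0 (growth toward +x for exp[i(kx - omega t)])."""
    return (omega / V) * (1.0 - 1j * math.sqrt(rho_minus / rho_plus))

V, k = 10.0, 1.0                # assumed wind speed (m/s) and wave number (1/m)
rho_air, rho_water = 1.0, 1000.0  # only the ratio 1e-3 matters here

omega_r, omega_i = temporal_growth(k, V, rho_air, rho_water)
k_spatial = spatial_wavenumber(omega_r, V, rho_air, rho_water)

print(omega_i / (k * V))              # about 0.03: temporal e-folding rate per unit kV
print(abs(k_spatial.imag) / k)        # about 0.03: spatial e-folding rate per unit k
print(omega_i / abs(k_spatial.imag))  # the ratio equals V
```

Both growth rates come out at about 0.03 in units of kV and k respectively. Their ratio is exactly the wind speed V, as stated above.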

13.6.4

Physical Interpretation

We have performed a normal-mode analysis of a particular flow and discovered that there are unstable modes. However carefully we calculate the form of the growing modes, we cannot claim to understand the instability until we can explain it physically. In the case of the Kelvin-Helmholtz instability, this is a simple task.

[Fig. 13.17: Physical explanation for the Kelvin-Helmholtz instability. Over the wave crests the air speed v is large and the pressure P small; over the troughs v is small and P large.]

The flow pattern is shown schematically in Fig. 13.17. The air will have to move faster when passing over a crest in the water wave, because the cross-sectional area of a flow tube diminishes there and the flux of air must be conserved. By Bernoulli’s theorem, the pressure will be lower than ambient at this point, and so the water in the crest will rise even higher. Conversely, in the trough of the wave, the air will travel more slowly and the pressure will increase. This pressure differential will push the trough downward, making it grow. Equivalently, we can regard the boundary layer as a plane containing parallel vortex lines that interact with one another, much as magnetic field lines exert pressure on one another. When the vortex lines all lie strictly in a plane, they are in equilibrium, as the repulsive force exerted on each line by one neighbor is balanced by an opposite force exerted by the neighbor on the other side. However, when this equilibrium is disturbed, the forces become unbalanced and the vortex sheet effectively buckles.

More generally, whenever there is a large amount of relative kinetic energy available in a large Reynolds number flow, there exists the possibility of unstable modes that can tap this energy and convert it into a spectrum of growing modes, which can then interact non-linearly to create fluid turbulence, which ultimately is dissipated as heat. However, the fact that free kinetic energy is available does not necessarily imply that the flow is unstable; sometimes it is stable. Instability must be demonstrated, which is often a very difficult task.

13.6.5

Rayleigh and Richardson Stability Criteria.

Let us conclude this chapter by examining the stability of two interesting types of shear flow. The analyses we shall give are more directly physical than our routine solution for small-amplitude normal modes. First, we consider Couette flow, i.e. the azimuthal flow of an incompressible, effectively inviscid fluid between two coaxial cylinders. Let us consider the stability of this flow to purely axisymmetric perturbations by interchanging two rings of fluid. As there are no azimuthal forces, the interchange will occur at constant angular momentum. Now suppose that the ring that moves outward in radius r has lower specific angular momentum j than its new surroundings. Then it will have less centrifugal force per unit mass, j²/r³, than its surroundings and will thus experience a restoring force that drives it back to its original position. Conversely, if the angular momentum decreases outward, the ring will continue to expand. We conclude on this basis that Couette and similar flows are unstable when the angular momentum decreases outward (more precisely, when |j| decreases outward). This is known as the Rayleigh criterion. We shall return to Couette flow in the following chapter.

Compact stellar objects (black holes, neutron stars and white dwarfs) are sometimes surrounded by orbiting accretion disks of gas. The gas in these disks orbits roughly with the angular velocity dictated by Kepler’s laws, Ω ∝ r^{−3/2}. Therefore the specific angular momentum j = r²Ω of the gas increases radially outward, proportional to the square root of the orbital radius. Consequently, accretion disks are stable to this type of instability.

We ignored the influence of gravity on the Kelvin-Helmholtz instability. This is not always valid; gravity can sometimes be quite important. Consider, as an example, the Earth’s stratosphere. Its density decreases upward much faster than if the stratosphere were isentropic.
This means that when a fluid element moves upward (adiabatically), its pressure-induced density changes are small compared to those of its ambient surroundings, which in turn means that the air’s motions can be regarded as incompressible. This incompressibility, together with the upward decrease of density, guarantees that the stratosphere is stably stratified. It may be possible, however, for the stratosphere to tap the relative kinetic energy in its horizontal winds so as to mix the air vertically. Consider the pedagogical example of two thin streams separated vertically by a distance δℓ large compared to their thicknesses, and moving horizontally with relative speed δV = V′δℓ. In the center-of-velocity frame, the streams each have speed δV/2, and they differ in density by δρ = |ρ′δℓ|. To interchange these streams requires doing a work per unit mass δW = g(δρ/2ρ)δℓ (where the factor 2 comes from the fact that there are two unit masses involved in the interchange, one going up and the other coming down). This work can be supplied by the streams’ kinetic energy, if the available kinetic energy per unit mass, δEk = (δV/2)²/2, exceeds the required work. A necessary condition for instability is then that

$$\delta E_k = \frac{(\delta V)^2}{8} > \delta W = g\left|\frac{\rho'}{2\rho}\right|\delta\ell^2\,, \qquad (13.86)$$

or

$$\mathrm{Ri} \equiv \frac{|\rho'|\,g}{\rho V'^2} < \frac{1}{4}\,, \qquad (13.87)$$

where V′ = dV/dz is the velocity gradient. This is known as the Richardson criterion, and Ri is the Richardson number. Under a wide variety of circumstances this criterion turns out to be not only necessary but also sufficient for instability. The density scale height in the stratosphere is ρ/|ρ′| ∼ 10 km. Therefore the maximum velocity gradient allowed by the Richardson criterion is

$$V' \lesssim 60\ \mathrm{m\,s^{-1}\,km^{-1}}\,. \qquad (13.88)$$

Larger velocity gradients are rapidly disrupted by instabilities. This instability is responsible for much of the clear air turbulence encountered by airplanes and it is to a discussion of turbulence that we shall turn in the next chapter.
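Both criteria of this section reduce to simple tests. The sketch below uses hypothetical function names. It checks the Rayleigh criterion by testing whether j² increases outward on a grid of radii. It also evaluates the Richardson number (13.87) and the largest stable shear, V′ = 2(g|ρ′|/ρ)^{1/2}, taking an assumed g = 9.8 m s⁻² and the 10 km density scale height quoted above. The Couette example uses the standard profile Ω = A + B/r², so j = Ar² + B. That profile is an assumption here, since it is not derived in this section.

```python
import math

def rayleigh_stable(j_of_r, r_inner, r_outer, n=200):
    """Rayleigh criterion (sketch): True if j(r)^2 increases outward at every
    step of a uniform grid between r_inner and r_outer."""
    radii = [r_inner + (r_outer - r_inner) * i / (n - 1) for i in range(n)]
    j_squared = [j_of_r(r) ** 2 for r in radii]
    return all(outer > inner for inner, outer in zip(j_squared, j_squared[1:]))

def richardson_number(g, drho_dz, rho, dV_dz):
    """Ri = g |rho'| / (rho V'^2), Eq. (13.87)."""
    return g * abs(drho_dz) / (rho * dV_dz ** 2)

def max_stable_shear(g, scale_height):
    """Largest V' with Ri >= 1/4: V' = 2 (g |rho'|/rho)^{1/2} = 2 (g/H)^{1/2}."""
    return 2.0 * math.sqrt(g / scale_height)

def keplerian_j(r):
    # Keplerian disk: j = r^2 Omega is proportional to r^{1/2} (units with GM = 1)
    return math.sqrt(r)

def couette_outer_at_rest_j(r):
    # Omega = A + B/r^2 with the outer cylinder (r = 2) at rest: A = -1/4, B = 1
    return 1.0 - r ** 2 / 4.0

print(rayleigh_stable(keplerian_j, 1.0, 10.0))             # True
print(rayleigh_stable(couette_outer_at_rest_j, 1.0, 2.0))  # False
print(1000.0 * max_stable_shear(9.8, 1.0e4))                # m/s per km, about 62.6
```

The Keplerian disk passes the Rayleigh test. Couette flow with only the inner cylinder rotating fails it, because j falls to zero at the resting outer wall. The stratospheric estimate reproduces the ∼60 m s⁻¹ km⁻¹ bound of Eq. (13.88).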

Bibliographic Note

Vorticity is so fundamental to fluid mechanics that it is treated in detail in all fluid dynamical textbooks. Among those we find useful are Acheson (1990) at an elementary level; Batchelor (1970), Lighthill (1986) and Landau and Lifshitz (1959) at a more advanced level; and Tritton (1977) for an especially physical approach. To build up physical intuition, we recommend studying the movie Vorticity by Shapiro (1961).

Bibliography

Acheson, D. J. 1990. Elementary Fluid Dynamics, Oxford: Clarendon Press.
Batchelor, G. K. 1970. An Introduction to Fluid Dynamics, Cambridge: Cambridge University Press.
Feynman, R. P. 1972. Statistical Mechanics, Reading: Benjamin.
Landau, L. D. and Lifshitz, E. M. 1959. Fluid Mechanics, Oxford: Pergamon.
Shapiro, A. 1961. Vorticity, a movie (National Committee for Fluid Mechanics Films); available in two parts at http://web.mit.edu/hml/ncfmf.html.
Tritton, D. J. 1977. Physical Fluid Dynamics, Wokingham: van Nostrand-Reinhold.
Turco et al. 1986. Science, 222, 283.
White, F. M. 1974. Viscous Fluid Flow, New York: McGraw-Hill.

Box 13.2
Important Concepts in Chapter 13

• Differential equation for evolution of vorticity, Sec. 13.2.1
• Circulation, Sec. 13.2.3
• Kelvin’s theorem for evolution of circulation, Sec. 13.2.3
• Vortex lines, Sec. 13.2.1
• Freezing of vortex lines into the flow for barotropic, inviscid fluid, Sec. 13.2.1
• Stretching of vortex lines, e.g. in a tornado, Sec. 13.2.2
• Diffusion of vorticity due to viscosity, Sec. 13.2.4
• Processes that create vortex lines, Sec. 13.2.5
• Lift on aerofoil related to circulation around it, Ex. 13.2
• Small change of pressure across boundary layer, Sec. 13.4
• Laminar boundary layers, Sec. 13.4
• Separation of a boundary layer, Sec. 13.4.2
• Stokes flow around an obstacle at low Reynolds number, Sec. 13.3.1
• Coriolis and centrifugal forces in rotating reference frame, Sec. 13.5.1
• Rossby number as ratio of inertial to Coriolis acceleration, Sec. 13.5.1
• Ekman number as ratio of viscous to Coriolis acceleration, Sec. 13.5.1
• Geostrophic flows, Secs. 13.5.2, 13.5.3
• Taylor-Proudman theorem: 2-D flow orthogonal to vortex lines, Sec. 13.5.3
• Taylor column, Fig. 13.12
• Kelvin-Helmholtz instability and excitation of ocean waves by wind, Sec. 13.6
• Richardson criterion for gravity to suppress Kelvin-Helmholtz instability, Sec. 13.6.5
• Drag force and drag coefficient, Sec. 13.4
• Rayleigh criterion for stability of rotating Couette flow: angular momentum must increase outward, Sec. 13.6.5
• Methods of analysis
  – Order of magnitude analysis, Sec. 13.4
  – Velocity as gradient of a potential when vorticity vanishes, Secs. 13.1, 13.4
  – Stream function: vector potential for velocity in incompressible flow; becomes a scalar potential in 2-dimensional flow, Sec. 13.4
  – Matched asymptotic expansions, Sec. 13.3.1
  – Similarity solution, Sec. 13.4

Contents

14 Turbulence
14.1 Overview
14.2 The Transition to Turbulence - Flow Past a Cylinder
14.3 Semi-Quantitative Analysis of Turbulence
14.3.1 Weak Turbulence
14.3.2 Turbulent Viscosity
14.3.3 Relationship to Vorticity
14.3.4 Kolmogorov Spectrum for Homogeneous and Isotropic Turbulence
14.4 Turbulent Boundary Layers
14.4.1 Profile of a Turbulent Boundary Layer
14.4.2 Instability of a Laminar Boundary Layer
14.4.3 The Flight of a Ball
14.5 The Route to Turbulence: Onset of Chaos
14.5.1 Couette Flow
14.5.2 Feigenbaum Sequence

Chapter 14
Turbulence
Version 0814.1.K, 4 February 2009

Box 14.1
Reader’s Guide
• This chapter relies heavily on Chaps. 12 and 13.
• The remaining chapters on fluid mechanics and magnetohydrodynamics (Chaps. 15–18) do not rely significantly on this chapter, nor do any of the remaining chapters in this book.

14.1

Overview

In Sec. 12.7.6, we derived the Poiseuille formula for the flow of a viscous fluid down a pipe by assuming that the flow is laminar, i.e. that it has a velocity parallel to the pipe wall. We showed how balancing the stress across a cylindrical surface led to a parabolic velocity profile and a rate of flow proportional to the fourth power of the pipe diameter, d. We also defined the Reynolds number; for pipe flow it is Rd ≡ v̄d/ν, where v̄ is the mean speed in the pipe. Now, it turns out experimentally that the flow only remains laminar up to a critical Reynolds number that has a value in the range ∼10³–10⁵, depending upon the smoothness of the pipe’s entrance and the roughness of its walls. If the pressure gradient is increased further (and thence the mean speed v̄ and Reynolds number Rd are increased), then the velocity field in the pipe becomes irregular both temporally and spatially, a condition known as turbulence.

Turbulence is common in high Reynolds number flows. Much of our experience of fluids involves air or water, for which the kinematic viscosities are ∼10⁻⁵ and 10⁻⁶ m² s⁻¹ respectively. For a typical everyday flow with a characteristic speed of v ∼ 10 m s⁻¹ and a characteristic length of d ∼ 1 m, the Reynolds number is huge: Rd = vd/ν ∼ 10⁶–10⁷. It is therefore not surprising that we see turbulent flows all around us. Smoke in a smokestack, a cumulus cloud and the wake of a ship are three examples. In Sec. 14.2 we shall illustrate the phenomenology of the transition to turbulence as Rd increases, using a particularly simple example: the flow of a fluid past a circular cylinder oriented perpendicular to the line of sight. We shall see how the flow pattern is dictated by the Reynolds number, and how the velocity changes from steady creeping flow at low Rd to fully developed turbulence at high Rd.

What is turbulence? Fluid dynamicists can certainly recognize it, but they have a hard time defining it precisely,¹ and an even harder time describing it quantitatively. At first glance, a quantitative description appears straightforward. Decompose the velocity field into Fourier components, just like the electromagnetic field when analysing electromagnetic radiation. Then recognize that the equations of fluid dynamics are nonlinear, so there will be coupling between different modes (akin to wave-wave coupling between light modes in a nonlinear crystal, discussed in Chap. 9). Analyze that coupling perturbatively. The resulting weak-turbulence theory (some of which we sketch in Sec. 14.3 and Ex. 14.3) is useful when the turbulence is not too strong. In this theory, among other things, one averages the spectral energy density over many realizations of a stationary turbulent flow to obtain a mean spectral energy density for the fluid’s motions. Then, if this energy density extends over several octaves of wavelength, scaling arguments can be invoked to infer the shape of the energy spectrum. This produces the famous Kolmogorov spectrum for turbulence, which has been verified experimentally under many different conditions. (Another weak-turbulence theory developed along similar lines is the quasi-linear theory of nonlinear plasma interactions, which we shall develop in Chap. 22.)
Most turbulent flows come under the heading of fully developed or strong turbulence and cannot be well described in this weak-turbulence manner. Part of the problem is that the (v · ∇)v term in the Navier-Stokes equation is a strong nonlinearity, not a weak coupling between linear modes. As a consequence, eddies of size ℓ persist for typically no more than one turnover timescale ∼ℓ/v before they are broken up, and so do not behave like weakly coupled normal modes. Another, related, problem is that it is not a good approximation to assume that the phases of the modes are random, either spatially or temporally. If we look at a snapshot of a turbulent flow, we frequently observe large, well-defined coherent structures like eddies and jets, which suggests that the flow is more organized than a purely random superposition of modes, just as the light reflected from the surface of a painting differs from that emitted by a black body. Moreover, if we monitor the time variation of some fluid variable, such as one component of the velocity at a given point in the flow, we can recognize intermittency: the irregular starting and ceasing of strong turbulence. Again, this is such a marked effect that there is more than a random-mode superposition at work, reminiscent of the distinction between noise and music (at least some music). Strong turbulence is therefore not just a problem in perturbation theory. In the absence of a decent quantitative theory, it becomes necessary to devise intuitive, qualitative and semiquantitative approaches to the physical description of turbulence (Secs. 14.3 and 14.4). We emphasize the adjective physical, because our goal is not just to produce empirical descriptions of the consequences of turbulence, but rather to comprehend the underlying physical character of turbulence, going beyond purely empirical rules on one hand and uninstructive mathematical expansions on the other. This means that the reader must be prepared to settle for order of magnitude scaling relations based on comparing the relative magnitudes of individual terms in the governing fluid dynamical equations. At first, this will seem quite unsatisfactory. However, much contemporary physics has to proceed with this methodology. It is simply too hard, in turbulence and some other phenomena, to discover elegant mathematical counterparts to the Kepler problem or the solution of the Schrödinger equation for the hydrogen atom.

¹ The analogy to Justice Potter Stewart’s definition of pornography should be resisted.

A good example of a turbulent flow that embodies some of these principles is a turbulent boundary layer along a solid surface (Sec. 14.4). Turbulent boundary layers generally exert more shear stress on a surface than laminar boundary layers, but nevertheless usually produce less total drag, because they are less prone to separation from the surface when subjected to an adverse pressure gradient. For this reason, turbulence is often induced artificially in boundary layers, e.g. those going over the top of an airplane wing, or those on one face of a ball. In Sec. 14.4 we shall study the structure of a turbulent boundary layer and briefly discuss how its turbulence can arise through instability of a laminar boundary layer; and we shall examine some applications to balls moving through the air (golf balls, cricket balls, baseballs, ...). Whether or not a flow becomes turbulent can have a major influence on how fast chemical reactions occur in liquids and gases; this is another motivation for artificially inducing or suppressing turbulence.
One can gain additional insight into turbulence by a technique that is often useful when struggling to understand complex physical phenomena: Replace the system being studied by a highly idealized model system that is much simpler than the original one, both conceptually and mathematically, but that retains at least one central feature of the original system. Then analyze the model system completely, with the hope that the quantitative insight so gained will be useful in understanding the original problem. This approach is central, e.g., to research in quantum cosmology, where one is trying to understand how the initial conditions for the expansion of the universe were set, and to do so one works with model universes that have only a few degrees of freedom. Similarly, during the 1970’s and especially the 1980’s and 1990’s, new insights into turbulence came from studying idealized dynamical systems that have very small numbers of degrees of freedom, but have the same kinds of nonlinearities as produce turbulence in fluids. We shall examine several such low-dimensional dynamical systems and the insights they give in Sec. 14.5. The most useful of those insights deal with the onset of weak turbulence, and the fact that it seems to have much in common with the onset of chaos (irregular and unpredictable dynamical behavior) in a wide variety of other dynamical systems — e.g., coupled pendula, electric circuits, and planets, asteroids, satellites and rings in the solar system. A great discovery of modern classical physics/mathematics has been that there exist organizational principles that govern the behavior of these seemingly quite different chaotic physical systems. In Sec. 14.5 we shall discuss some very simple models for the onset of chaos and shall outline the relationship of those models to the behavior of turbulent fluids. Despite these elegant modern insights, those features of turbulence that are important in


R 0 develops outside the boundary layer, near the cylinder’s downstream face, and causes the separated boundary layer to be replaced by these two counter-rotating eddies. The pressure in these eddies is of order the flow’s incoming pressure P0 and is significantly less than the stagnation pressure Pstag = P0 + ½ρV² at the cylinder’s front face, so the drag coefficient stabilizes at CD ∼ 1. As the Reynolds number increases above Rd ∼ 5, the size of the two eddies increases until, at Rd ∼ 100, the eddies are shed dynamically and the flow becomes non-stationary. The eddies tend to be shed alternately in time, first one and then the other, producing a pattern of alternating vortices downstream known as a Karman vortex street [Fig. 14.1(d)]. When Rd ∼ 1000, the downstream vortices are no longer visible and the wake behind the cylinder contains a velocity field irregular on all scales [Fig. 14.1(e)]. This downstream flow has become turbulent. Finally, at Rd ∼ 3 × 10⁵, the boundary layer, which has been laminar up to this point, itself becomes turbulent [Fig. 14.1(f)], noticeably reducing the drag coefficient (Fig. 14.2). We shall explore the cause of this reduction below. [The physically relevant Reynolds number for onset of turbulence in the boundary layer is computed not from the cylinder diameter d, Rd = Vd/ν, but rather from the boundary-layer thickness δ ∼ d/Rd^{1/2}:

$$R_\delta = \frac{V\delta}{\nu} \sim \frac{V d\, R_d^{-1/2}}{\nu} = R_d^{1/2}\,. \qquad (14.7)$$

The onset of boundary-layer turbulence is at Rδ ∼ (3 × 10⁵)^{1/2} ∼ 500, about the same as the Rd ∼ 1000 for onset of turbulence in the wake.]
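The Reynolds-number estimates used so far in this chapter are simple enough to script. In this sketch the helper names are hypothetical. The kinematic viscosities are the order-of-magnitude values ∼10⁻⁵ and 10⁻⁶ m² s⁻¹ quoted in Sec. 14.1 for air and water. The boundary-layer thickness is the laminar estimate δ ∼ d/Rd^{1/2} used in Eq. (14.7).

```python
import math

# Order-of-magnitude kinematic viscosities quoted in Sec. 14.1 (m^2 s^-1)
NU_AIR = 1e-5
NU_WATER = 1e-6

def reynolds_number(speed, length, kinematic_viscosity):
    """R = speed * length / nu (dimensionless)."""
    return speed * length / kinematic_viscosity

def boundary_layer_thickness(d, R_d):
    """Laminar boundary-layer thickness delta ~ d / R_d^{1/2} (order of magnitude)."""
    return d / math.sqrt(R_d)

def boundary_layer_reynolds(R_d):
    """R_delta = V delta / nu ~ R_d^{1/2}, Eq. (14.7)."""
    return math.sqrt(R_d)

# Everyday flow: v ~ 10 m/s, d ~ 1 m
print(reynolds_number(10.0, 1.0, NU_AIR))    # ~1e6
print(reynolds_number(10.0, 1.0, NU_WATER))  # ~1e7

# Boundary-layer transition on the cylinder at R_d ~ 3e5
print(boundary_layer_reynolds(3e5))          # about 548, i.e. ~500
```

Applying `reynolds_number` to the thickness from `boundary_layer_thickness` reproduces `boundary_layer_reynolds`. This is just Eq. (14.7) written as code.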

An important feature of this changing flow pattern is the fact that at Rd ≪ 1000 [Figs. 14.1(a)–(d)], before any turbulence sets in, the flow (whether steady or dynamical) is translation symmetric; i.e., it is independent of distance z down the cylinder, so it is two-dimensional. This is true even of the Karman vortex street. By contrast, the turbulent velocity field at Rd ≳ 1000 is fully three-dimensional. At these large Reynolds numbers, small, non-translation-symmetric perturbations of the translation-symmetric flow grow into vigorous, three-dimensional turbulence. This is a manifestation of the fact (which we shall explore below) that two-dimensional flows cannot exhibit true turbulence. True turbulence requires chaotic motions in all three dimensions. The most important feature of this family of flows, a feature that is characteristic of most such families, is that there is a critical Reynolds number for the onset of turbulence. That critical number can range from ∼30 to ∼10⁵, depending on the geometry of the flow and on precisely what length and speed are used to define the Reynolds number.

****************************

EXERCISES

Exercise 14.1 Example: Spreading of a laminar wake
Consider stationary, incompressible flow around a long circular cylinder with diameter d, extending perpendicular to the velocity. Let the flow far in front of the cylinder be uniform with speed V. Let the Reynolds number Rd be small enough for the flow to remain laminar. (We will treat the turbulent regime at higher Rd in Ex. 14.4 below.) Behind the body is a wake with width w(x), at a distance x downstream from the cylinder. At the center of the wake, the flow speed is reduced to V − Δv, where Δv is called the velocity deficit. (a) Use momentum conservation to derive the approximate relationship Δv ∝ w⁻¹ between the velocity deficit and the width of the wake at a distance x far downstream from the cylinder.
Then use the Navier-Stokes equation and invoke self similarity (like that of the Blasius profile in Sec. 13.4) to show that ∆v ∝ x−1/2 . (b) How will this scaling law be modified if we replace the cylinder by a sphere?

Exercise 14.2 Example: Spreading of Laminar Jets

Consider a narrow, two-dimensional, incompressible (i.e. subsonic) jet emerging from a two-dimensional nozzle into ambient fluid of the same composition and pressure, at rest. (By two-dimensional we mean that the nozzle and jet are translation symmetric in the third dimension.) Let the Reynolds number be low enough for the flow to be laminar; we shall study the turbulent regime in Ex. 14.5 below. We want to understand how rapidly this laminar jet spreads. (a) Show that the pressure forces far downstream from the nozzle are likely to be much smaller than the viscous forces and can therefore be ignored.

[Figure: a jet emerging along the $x$ axis, with $y$ perpendicular to it, width $w(x)$, and ambient fluid at rest on both sides.]

Fig. 14.3: Two-dimensional laminar jet. As the jet widens, it entrains ambient fluid.

(b) Let the jet’s thrust per unit length (i.e. the momentum per unit time per unit length flowing through the nozzle) be $F$. Introduce Cartesian coordinates $x$, $y$, with $x$ parallel to and $y$ perpendicular to the jet (cf. Fig. 14.3). Use the Navier-Stokes equation to make an order-of-magnitude estimate of the speed $v_x$ and the width $w$ of the jet as a function of distance $x$ downstream, in a manner similar to the previous problem. (c) Use these scalings to modify the self-similarity analysis that we used for the laminar boundary layer in Sec. 13.4, and thereby obtain the following approximate solution for the jet velocity profile:

$$v_x = \left(\frac{3F^2}{32\rho^2\nu x}\right)^{1/3}\operatorname{sech}^2\!\left[\left(\frac{F}{48\rho\nu^2x^2}\right)^{1/3} y\right]. \qquad (14.8)$$
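As a quick consistency check on Eq. (14.8) (a sketch, not part of the exercise itself), the momentum flux $\int\rho v_x^2\,dy$ across the jet should equal the thrust per unit length $F$ at every distance $x$, since pressure forces are negligible far from the nozzle. The Python below integrates $\rho v_x^2$ with a hand-written trapezoidal rule; the function names and the parameter values (SI units, a water-like fluid) are illustrative assumptions.

```python
import numpy as np

def jet_profile(y, x, F, rho, nu):
    """Laminar 2D jet velocity v_x(x, y) from Eq. (14.8)."""
    amp = (3.0 * F**2 / (32.0 * rho**2 * nu * x)) ** (1.0 / 3.0)
    inv_width = (F / (48.0 * rho * nu**2 * x**2)) ** (1.0 / 3.0)
    return amp / np.cosh(inv_width * y) ** 2          # sech^2 profile

def thrust_per_length(x, F, rho, nu, n=200_001):
    """Momentum flux per unit length, integral of rho * v_x^2 dy (trapezoidal rule)."""
    inv_width = (F / (48.0 * rho * nu**2 * x**2)) ** (1.0 / 3.0)
    half_span = 40.0 / inv_width                      # sech^4 is negligible beyond this
    y = np.linspace(-half_span, half_span, n)
    f = rho * jet_profile(y, x, F, rho, nu) ** 2
    dy = y[1] - y[0]
    return dy * (f.sum() - 0.5 * (f[0] + f[-1]))

# Illustrative (assumed) values in SI units.
F, rho, nu = 0.05, 1000.0, 1e-6
for x in (0.1, 1.0, 10.0):
    print(x, thrust_per_length(x, F, rho, nu) / F)
```

Analytically, $\int\operatorname{sech}^4(ay)\,dy = 4/(3a)$, which combined with the prefactors in Eq. (14.8) gives exactly $F$; the printed ratio should therefore be 1 to high accuracy at every $x$.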

****************************

14.3 Semi-Quantitative Analysis of Turbulence

14.3.1 Weak Turbulence

The considerations of the last section motivate us to attempt to construct a semi-quantitative, mathematical description of turbulence. We shall begin with an approach that is moderately reasonable for weak turbulence, but becomes much less so when the turbulence is strong. (In Sec. 14.3.4 below, we shall develop a model for turbulence based on interacting eddies. One can regard the turbulence as weak if the timescale τ∗ for a big eddy to feed most of its energy to smaller eddies is long compared to the eddy’s turnover time τ , i.e., its “rotation period”. Unfortunately, turbulence is usually strong, so the eddy’s energy loss time is of order its turnover time, τ∗ ∼ τ , which means the eddy loses its identity in roughly one turnover time. For such strong turbulence, the weak-turbulence formalism that we shall sketch here is only semiquantitatively accurate.)

The theory of weak turbulence (with gravity negligible and the flow very subsonic, so it can be regarded as incompressible) is based on the standard incompressibility equation and the time-dependent Navier-Stokes equation, which we write in the following forms:

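$$\nabla\cdot\mathbf v = 0, \qquad (14.9a)$$

$$\rho\frac{\partial\mathbf v}{\partial t} + \nabla\cdot(\rho\,\mathbf v\otimes\mathbf v) = -\nabla P + \rho\nu\nabla^2\mathbf v. \qquad (14.9b)$$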

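[Eq. (14.9b) is equivalent to (14.2b) with $\partial\mathbf v/\partial t$ added because of the time dependence, and with the inertial-force term rewritten via $\nabla\cdot(\rho\,\mathbf v\otimes\mathbf v) = \rho(\mathbf v\cdot\nabla)\mathbf v$, or equivalently in index notation, $(\rho v_i v_j)_{;i} = \rho_{,i}v_iv_j + \rho(v_{i;i}v_j + v_iv_{j;i}) = \rho v_iv_{j;i}$, where the last step uses the constancy of $\rho$ and $\nabla\cdot\mathbf v = 0$.] Equations (14.9a) and (14.9b) are four scalar equations for four unknowns, $P(\mathbf x, t)$ and the three components of $\mathbf v(\mathbf x, t)$; $\rho$ and $\nu$ can be regarded as constants in these equations. To obtain the weak-turbulence versions of these equations, we split the velocity field $\mathbf v(\mathbf x, t)$ and pressure $P(\mathbf x, t)$ into steady parts $\bar{\mathbf v}$, $\bar P$ plus fluctuating parts $\delta\mathbf v$, $\delta P$:

$$\mathbf v = \bar{\mathbf v} + \delta\mathbf v, \qquad P = \bar P + \delta P. \qquad (14.10)$$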

We can think of (or, in fact, define) $\bar{\mathbf v}$ and $\bar P$ as the time averages of $\mathbf v$ and $P$, and define $\delta\mathbf v$ and $\delta P$ as the differences between the exact quantities and the time-averaged quantities. The time-averaged variables $\bar{\mathbf v}$ and $\bar P$ are governed by the time averages of the incompressibility and Navier-Stokes equations (14.9). Because the incompressibility equation is linear, its time average

$$\nabla\cdot\bar{\mathbf v} = 0 \qquad (14.11a)$$

entails no coupling of the steady variables to the fluctuating variables. By contrast, the nonlinear inertial term $\nabla\cdot(\rho\,\mathbf v\otimes\mathbf v)$ in the Navier-Stokes equation gives rise to such a coupling in the (time-independent) time-averaged equation:

$$\rho(\bar{\mathbf v}\cdot\nabla)\bar{\mathbf v} = -\nabla\bar P + \nu\rho\nabla^2\bar{\mathbf v} - \nabla\cdot T_R, \qquad (14.11b)$$

$$T_R \equiv \rho\,\overline{\delta\mathbf v\otimes\delta\mathbf v}. \qquad (14.11c)$$

Here $T_R$ is known as the Reynolds stress tensor. It serves as a “driving term” in the time-averaged Navier-Stokes equation (14.11b): a term by which the fluctuating part of the flow acts back on, and influences, the time-averaged flow. This Reynolds stress $T_R$ can be regarded as an additional part of the total stress tensor, analogous to the gas pressure computed in kinetic theory,² $P = \frac13\rho\overline{v^2}$, where $v$ is the molecular speed. $T_R$ will be dominated by the largest eddies present, and it can be anisotropic, especially when the largest-scale turbulent velocity fluctuations are distorted by interaction with an averaged shear flow, i.e., when $\bar\sigma_{ij} = \frac12(\bar v_{i;j} + \bar v_{j;i})$ is large. If the turbulence is both stationary and homogeneous (a case we shall specialize to below when studying the “Kolmogorov spectrum”), then the Reynolds stress tensor can be written in the form $T_R = P_R\,\mathbf g$, where $P_R$ is the Reynolds pressure, which is independent of position, and $\mathbf g$ is the metric, so $g_{ij} = \delta_{ij}$. In this case, the turbulence exerts no force density on the mean flow; i.e., $\nabla\cdot T_R = \nabla P_R$ vanishes in the time-averaged Navier-Stokes equation (14.11b). By contrast, near the edge of a turbulent region (e.g., near the edge of a turbulent wake, jet, or boundary layer), the turbulence is inhomogeneous and thereby (as we shall see in the next subsection) exerts an important influence on the time-independent, averaged flow.

² Deducible from Eq. (2.35c), or from Eqs. (2.37b) and (2.37c) with mean energy per particle $\bar E = \frac12 m\overline{v^2}$.

Notice that the Reynolds stress tensor is the tensorial autocorrelation function of the velocity fluctuation field (multiplied by density $\rho$). It is possible to extend this “theory” of weak turbulence with the aid of a cross-correlation of the velocity field. The cross-correlation function involves taking the time average of products of velocity components (or other relevant physical quantities) at different points simultaneously, or at the same point at different times. (It is relatively straightforward to measure these correlation functions experimentally.) As we discuss in greater detail below (and as we also saw for one-dimensional random processes in Sec. 5.3 and for multidimensional, complex random processes in Ex. 8.7), the Fourier transforms of these correlation functions give the spatial and temporal spectral densities of the fluctuating quantities. Just as the structure of the time-averaged flow is governed by the time-averaged incompressibility and Navier-Stokes equations (14.11) (with the fluctuating variables acting on the time-averaged flow through the Reynolds stress), so also the fluctuating part of the flow is governed by the fluctuating (difference between exact and time-averaged) incompressibility and Navier-Stokes equations; see Ex. 14.3.
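As a minimal numerical illustration of the split (14.10) and the definition (14.11c), the sketch below treats a sampled velocity record at a single point as the exact field: the sample average stands in for the time average $\bar{\mathbf v}$, the residual is $\delta\mathbf v$, and $T_R$ is $\rho$ times the averaged outer product. The synthetic signal, the function name `reynolds_decomposition`, and the numbers (air-like $\rho = 1.2$ kg m$^{-3}$, mean flow 10 m s$^{-1}$) are assumptions for illustration; a measured record would replace the random numbers.

```python
import numpy as np

def reynolds_decomposition(v, rho):
    """Split sampled velocities v[t, i] into mean and fluctuation; return rho * <dv_i dv_j>.

    Time averages are approximated by averages over equally spaced samples.
    """
    v_bar = v.mean(axis=0)                         # steady part, Eq. (14.10)
    dv = v - v_bar                                 # fluctuating part
    # Reynolds stress, Eq. (14.11c): time average of the outer product dv (x) dv
    T_R = rho * np.einsum("ti,tj->ij", dv, dv) / v.shape[0]
    return v_bar, dv, T_R

# Synthetic record (assumption): x- and y-fluctuations share the random term a,
# so <dv_x dv_y> should be close to 0.5 (m/s)^2 and T_R[0, 1] close to 0.6.
rng = np.random.default_rng(0)
nt = 100_000
a, b, c = rng.standard_normal((3, nt))
v = np.column_stack([10.0 + a, 0.5 * a + b, c])
v_bar, dv, T_R = reynolds_decomposition(v, rho=1.2)
print(np.round(T_R, 2))
```

With real data the averaging window must be long compared with the turnover time of the largest eddies; the off-diagonal entries are the ones that exert a shear stress on the mean flow.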

14.3.2 Turbulent Viscosity

Additional tools that are often introduced in the theory of weak turbulence come from taking the analogy with the kinetic theory of gases one stage further and defining turbulent transport coefficients (most importantly a turbulent viscosity that governs the turbulent transport of momentum). These turbulent transport coefficients are derived by simple analogy with the kinetic-theory transport coefficients (Sec. 2.7). Momentum, heat, etc. are transported most efficiently by the largest turbulent eddies in the flow; therefore, in estimating the transport coefficients we replace the particle mean free path by the size $\ell$ of the largest eddies and the mean particle speed by the magnitude $v_\ell$ of the fluctuations of velocity in the largest eddies. The result, for momentum transport, is a model turbulent viscosity

$$\nu_t \simeq \frac13 v_\ell\,\ell \qquad (14.12)$$

[cf. Eq. (12.72) for molecular viscosity, with $\nu = \eta/\rho$]. The Reynolds stress is then approximated as a turbulent shear stress of the standard form

$$T_R \simeq -2\rho\nu_t\bar{\boldsymbol\sigma}. \qquad (14.13)$$

Here $\bar{\boldsymbol\sigma}$ is the rate-of-shear tensor (12.63b) evaluated using the mean velocity field $\bar{\mathbf v}$. Note that the turbulent kinematic viscosity defined in this manner, $\nu_t$, is a property of the turbulent flow and not an intrinsic property of the fluid; it differs from molecular viscosity in this important respect.

By considerations similar to these for turbulent viscosity, one can define and estimate a turbulent thermal conductivity for the spatial transport of time-averaged heat (cf. Sec. 2.7.2) and a turbulent diffusion coefficient for the spatial transport of one component of a time-averaged fluid through another, for example an odor crossing a room (cf. Ex. 2.19). In Ex. 14.4 we shall see how the Reynolds stress (14.13), expressed in terms of the turbulent viscosity, produces (via vortex-line diffusion) the spatial widening of the time-averaged turbulent wake behind a cylinder, and in Ex. 14.5 we shall see it for a turbulent jet. In the case of a gas, $\nu_t$ and the other turbulent transport coefficients can be far, far larger than their kinetic-theory values. For example, air in a room subject to typical uneven heating and cooling might circulate with an average largest-eddy velocity of $v_\ell \sim 1$ cm s$^{-1}$ and an associated eddy size of $\ell \sim 3$ m. (This can be estimated by observing the motion of illuminated dust particles.) The kinematic turbulent viscosity $\nu_t$, and also the turbulent diffusion coefficient $D_t$ (Ex. 2.19), associated with these motions are $\nu_t \sim D_t \sim 10^{-2}$ m$^2$ s$^{-1}$, some three orders of magnitude larger than the molecular values. Correspondingly, a time-averaged turbulent wake (or boundary layer or jet) will widen downstream, due to diffusion of time-averaged vorticity, much more rapidly than will its laminar counterpart.
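The room-air estimate above follows directly from Eq. (14.12). The short sketch below reproduces it; the molecular kinematic viscosity of air near room temperature, about $1.5\times10^{-5}$ m$^2$ s$^{-1}$, is an assumed value not taken from the text, and the function name is hypothetical.

```python
# Order-of-magnitude estimate of Eq. (14.12), nu_t ~ (1/3) v_l * l, for room air.
V_L = 1e-2       # largest-eddy speed, m/s (1 cm/s, from the text)
ELL = 3.0        # largest-eddy size, m (from the text)
NU_AIR = 1.5e-5  # molecular kinematic viscosity of air, m^2/s (assumed value)

def turbulent_viscosity(v_l, ell):
    """Kinetic-theory analogy: (1/3) * speed * mean free path -> (1/3) * v_l * l."""
    return v_l * ell / 3.0

nu_t = turbulent_viscosity(V_L, ELL)
ratio = nu_t / NU_AIR
print(f"nu_t ~ {nu_t:.1e} m^2/s, about {ratio:.0f} times the molecular value")
```

The ratio comes out near $7\times10^2$, i.e. roughly three orders of magnitude, in line with the estimate in the text.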

Fig. 14.4: Schematic illustration of the propagation of turbulence by the stretching of vortex lines. The tube of vortex lines in (a) gets stretched and thereby is forced into a reduced cross section by the turbulent evolution from (a) to (b) to (c). The reduced cross section means an enhanced vorticity on smaller scales.

14.3.3 Relationship to Vorticity

Three-dimensional turbulent flows contain tangled vorticity. As we discussed in Chap. 13, vortex lines move with the fluid and can be stretched by the action of neighboring vortex lines. As a bundle of vortex lines is stretched and twisted (Fig. 14.4), the incompressibility of the fluid causes the bundle’s cross section to decrease, correspondingly causing the magnitude of its vorticity to increase and the lengthscale on which the vorticity changes to decrease (cf. Sec. 13.2). The continuous lengthening and twisting of the fluid therefore creates vorticity on progressively smaller length scales. Note that, when the flow is two-dimensional (i.e. translation symmetric), there is no stretching of the vortex lines and thus no inexorable driving of the turbulent energy to smaller and smaller length scales. This is one reason why true turbulence does not occur in two dimensions, only in three.

14.3.4 Kolmogorov Spectrum for Homogeneous and Isotropic Turbulence

When a fluid exhibits turbulence over a large volume that is well removed from any solid bodies, there will be no preferred directions and no substantial gradients in the statistically averaged properties of the turbulent velocity field. This suggests that the turbulence will be stationary and isotropic. We shall now derive a semi-quantitative description of some of the statistical properties of such stationary, isotropic turbulence. Our derivation will be based on the following simple physical model: We shall idealize the turbulent velocity field as made of a set of large eddies, each of which contains a set of smaller eddies, and so on. We suppose that each eddy splits into eddies roughly half its size after a few turnover times. This can be described mathematically as nonlinear, or triple velocity correlation, terms producing, in the law of energy conservation, an energy transfer (a “cascade” of energy) from larger-scale eddies to smaller-scale eddies. Now, for large enough eddies, we can ignore the effects of genuine viscosity in the flow. However, for small enough eddy scales, viscous dissipation will convert the eddy bulk kinetic energy into heat. This simple model will enable us to derive a remarkably successful formula (the “Kolmogorov spectrum”) for the distribution of turbulent energy over eddy size.

We must first introduce and define the turbulent energy per unit wave number and per unit mass, $u_k(k)$. Consider the Fourier transform $\delta\tilde{\mathbf v}(\mathbf k)$ of the fluctuating part of the velocity field $\delta\mathbf v(\mathbf x)$:

$$\delta\mathbf v = \int\frac{d^3k}{(2\pi)^3}\,\delta\tilde{\mathbf v}\,e^{i\mathbf k\cdot\mathbf x}. \qquad (14.14)$$

We perform this Fourier transform in much the same way that we take the Fourier transform of the wave function $\psi$ in quantum mechanics: we introduce an imaginary cubical “box” of volume $V$ and side $V^{1/3}$ in which to compute the transform, we require that $V$ be much larger than the volumes of the largest turbulent eddies, and we treat $\delta\mathbf v$ mathematically as though it vanished outside the box’s walls.
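We then define the total energy per unit mass $u$ in the turbulence by

$$u = \int\frac{d^3x}{V}\,\frac12\overline{|\delta\mathbf v|^2} = \int\frac{d^3k}{(2\pi)^3}\,\frac{\overline{|\delta\tilde{\mathbf v}|^2}}{2V} = \int_0^\infty dk\,u_k(k), \qquad (14.15)$$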

where we have used Parseval’s theorem in the second equality, we have used $d^3k = 4\pi k^2dk$, and we have defined

$$u_k(k) \equiv \frac{\overline{|\delta\tilde{\mathbf v}|^2}\,k^2}{4\pi^2V}. \qquad (14.16)$$

Here the bars denote a time average, $k$ is the magnitude of the wave vector, $k \equiv |\mathbf k|$ (i.e. the wave number, or equivalently $2\pi$ divided by the wavelength), and $u_k(k)$ is the spectral energy per unit mass of the turbulent velocity field $\delta\mathbf v$. In the third equality in Eq. (14.15), we have assumed that the turbulence is isotropic, so the integrand depends only on wave number $k$ and not on the direction of $\mathbf k$. Correspondingly, we have defined $u_k(k)$ as the energy per unit wave number rather than the energy per unit volume of k-space. Therefore, $u_k(k)\,dk$ is the average kinetic energy per unit mass associated with modes that have $k$ lying in the interval $dk$; we treat $k$ as positive. In Chap. 5 we introduced the concept of a “random process” and its “spectral density.” The Cartesian components of the fluctuating velocity, $\delta v_x$, $\delta v_y$, $\delta v_z$, obviously are random processes, and it is straightforward to show that their spectral densities are related to $u_k(k)$ by

$$S_{v_x}(k) = S_{v_y}(k) = S_{v_z}(k) = \text{const}\times u_k(k), \qquad (14.17)$$

where the constant is of order unity. We shall now use our physical model of turbulence to derive an expression for $u_k(k)$. Denote by $k_{\min} = 2\pi/\ell$ the wave number of the largest eddies, and by $k_{\max}$ that of the smallest ones (those in which viscosity dissipates the cascading turbulent energy). Our derivation, and its result, will be valid only when $k_{\max}/k_{\min} \gg 1$, i.e. only when there is a long sequence of eddies from the largest, to half the largest, to a quarter the largest, ... down to the smallest. As a tool in computing $u_k(k)$, we introduce the root-mean-square turbulent turnover speed of the eddies with wave number $k$, $v(k) \equiv v$; and, ignoring factors of order unity, we treat the size of these eddies as $k^{-1}$. Then their turnover time is $\tau(k) \sim k^{-1}/v(k) = 1/[kv(k)]$. Our model presumes that in this same time $\tau$ (to within a factor of order unity), each eddy of size $k^{-1}$ splits into eddies of half this size; i.e. the turbulent energy cascades from $k$ to $2k$. Since the energy cascade is presumed stationary (i.e.
no energy is accumulating at any wave number), the energy per unit mass that cascades in a unit time from $k$ to $2k$ must be independent of $k$. Denote by $q$ that k-independent cascading energy per unit mass per unit time. Since the energy per unit mass in the eddies of size $k^{-1}$ is $v^2$ (aside from a factor 2, which we neglect), and the cascade time is $\tau \sim 1/(kv)$, then $q \sim v^2/\tau \sim v^3k$. This tells us that the rms turbulent velocity is

$$v(k) \sim (q/k)^{1/3}. \qquad (14.18)$$

Our model lumps together all eddies with wave number within a range $\Delta k \sim k$ around $k$, and treats them all as having wave number $k$. The total energy per unit mass in these eddies is $u_k(k)\Delta k \sim k\,u_k(k)$ when expressed in terms of the sophisticated quantity $u_k(k)$, and it is $\sim v(k)^2$ when expressed in terms of our simple model. Thus, our model predicts that $u_k(k) \sim v(k)^2/k$, which by Eq. (14.18) implies

$$u_k(k) \sim q^{2/3}k^{-5/3} \quad \text{for } k_{\min} \ll k \ll k_{\max}; \qquad (14.19)$$

see Fig. 14.5. This is the Kolmogorov spectrum for the spectral energy density of stationary, isotropic, incompressible turbulence. It is valid only in the range kmin ≪ k ≪ kmax because only in this range are the turbulent eddies continuously receiving energy from larger lengthscales and passing it on to smaller scales. At the ends of the range, the spectrum will be modified in the manner illustrated qualitatively in Fig. 14.5.
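The spectral bookkeeping of Eqs. (14.15)–(14.16) and the $-5/3$ law (14.19) can be illustrated on a periodic grid, where the “box” of volume $V$ is literal. The sketch below, assuming NumPy's unnormalized FFT convention, (1) synthesizes a random real field whose Fourier amplitudes are filtered by $k^{-11/6}$, so that $u_k \propto k^2\,|\delta\tilde{\mathbf v}|^2 \propto k^{-5/3}$, (2) bins $\frac12|\delta\tilde{\mathbf v}|^2$ into spherical shells of width $\Delta k$, and (3) checks Parseval's theorem and the fitted slope. The synthetic field is an assumption: it is a single realization (no time average), is not divergence-free, and has no cascade dynamics; the function names are hypothetical.

```python
import numpy as np

def wavenumber_magnitude(n, box_size):
    """|k| on an n^3 grid with angular wave numbers k = 2*pi*m/box_size (m integer)."""
    k1 = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1, k1, k1, indexing="ij")
    return np.sqrt(kx**2 + ky**2 + kz**2)

def synthetic_field(n, box_size, rng):
    """Real random 3-component field with Fourier amplitudes filtered by k^(-11/6)."""
    kmag = wavenumber_magnitude(n, box_size)
    filt = np.zeros_like(kmag)
    filt[kmag > 0] = kmag[kmag > 0] ** (-11.0 / 6.0)   # |dv_k|^2 ~ k^(-11/3); mean removed
    noise_k = np.fft.fftn(rng.standard_normal((3, n, n, n)), axes=(1, 2, 3))
    # filt(k) = filt(-k), so Hermitian symmetry is kept and the inverse transform is real
    return np.real(np.fft.ifftn(noise_k * filt, axes=(1, 2, 3)))

def shell_spectrum(dv, box_size):
    """Shell-binned u_k(k): discrete analogue of Eqs. (14.15)-(14.16) in a periodic box."""
    n = dv.shape[1]
    u_real = 0.5 * np.mean(np.sum(dv**2, axis=0))        # box average of |dv|^2 / 2
    dv_k = np.fft.fftn(dv, axes=(1, 2, 3))
    # NumPy's unnormalized FFT: mean_x |f|^2 = sum_k |F_k|^2 / n^6 (Parseval)
    e_mode = 0.5 * np.sum(np.abs(dv_k) ** 2, axis=0) / n**6
    dk = 2.0 * np.pi / box_size
    shell = np.rint(wavenumber_magnitude(n, box_size) / dk).astype(int)
    u_shell = np.bincount(shell.ravel(), weights=e_mode.ravel())
    return dk * np.arange(u_shell.size), u_shell / dk, dk, u_real

rng = np.random.default_rng(1)
n, box = 64, 2.0 * np.pi
dv = synthetic_field(n, box, rng)
k, u_k, dk, u_real = shell_spectrum(dv, box)
inertial = (k >= 4) & (k <= 20)
slope = np.polyfit(np.log(k[inertial]), np.log(u_k[inertial]), 1)[0]
print(f"Parseval: {np.sum(u_k) * dk:.6e} vs {u_real:.6e}; fitted slope {slope:.3f}")
```

The shell sum should reproduce the real-space energy to round-off, which is the discrete form of the Parseval step in Eq. (14.15), and the fitted slope should come out close to $-5/3$, limited by the finite number of modes per shell.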

[Figure: log–log plot of $u_k(k)$ versus $k$, showing the inertial range $u_k \sim q^{2/3}k^{-5/3}$ between $k_{\min} \sim R_\ell^{-3/4}k_{\max}$ and $k_{\max} \sim q^{1/4}\nu^{-3/4}$.]

Fig. 14.5: The Kolmogorov spectral energy density for stationary, homogeneous turbulence.

The smallest lengthscales present, $k_{\max}^{-1}$, are determined by the fact that there viscous forces become competitive with inertial forces in the Navier-Stokes equation, and thereby convert the cascading energy into heat. Since the ratio of inertial forces to viscous forces is the Reynolds number, the smallest eddies have a Reynolds number of order unity: $R_{k_{\max}} = v(k_{\max})\,k_{\max}^{-1}/\nu \sim 1$. Inserting Eq. (14.18) for $v(k)$, we obtain

$$k_{\max} \sim q^{1/4}\nu^{-3/4}. \qquad (14.20)$$
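The exponents in Eqs. (14.19) and (14.20) are fixed by dimensions alone: in the inertial range only $q$ (dimensions $L^2T^{-3}$) and $k$ ($L^{-1}$) can enter $u_k$ ($L^3T^{-2}$), and at the dissipation scale only $q$ and $\nu$ ($L^2T^{-1}$) can set $k_{\max}$. The sketch below solves the two exponent equations (one for length, one for time) exactly with rational arithmetic via Cramer's rule; the function name is hypothetical.

```python
from fractions import Fraction as Fr

# Dimensions as (length exponent, time exponent).
Q = (Fr(2), Fr(-3))    # q: energy per unit mass per unit time, L^2 T^-3
NU = (Fr(2), Fr(-1))   # nu: kinematic viscosity, L^2 T^-1
K = (Fr(-1), Fr(0))    # k: wave number, L^-1
U_K = (Fr(3), Fr(-2))  # u_k: energy per unit mass per unit wave number, L^3 T^-2

def solve_exponents(a, b, target):
    """Find (alpha, beta) such that a**alpha * b**beta has the dimensions of target."""
    det = a[0] * b[1] - b[0] * a[1]
    if det == 0:
        raise ValueError("dimensions of a and b are not independent")
    alpha = (target[0] * b[1] - b[0] * target[1]) / det
    beta = (a[0] * target[1] - target[0] * a[1]) / det
    return alpha, beta

print(solve_exponents(Q, K, U_K))   # exponents of q and k in u_k
print(solve_exponents(Q, NU, K))    # exponents of q and nu in k_max
```

The first call returns $(2/3, -5/3)$ and the second $(1/4, -3/4)$, matching Eqs. (14.19) and (14.20); no order-unity factors can be recovered this way, only the powers.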

The largest eddies have sizes $\ell \sim k_{\min}^{-1}$ and turnover speeds $v_\ell = v(k_{\min}) \sim (q/k_{\min})^{1/3}$. By combining these relations with Eq. (14.20), we see that the ratio of the largest wave numbers present in the turbulence to the smallest is

$$\frac{k_{\max}}{k_{\min}} \sim \left(\frac{v_\ell\,\ell}{\nu}\right)^{3/4} = R_\ell^{3/4}. \qquad (14.21)$$

Here $R_\ell$ is the Reynolds number for the flow’s largest eddies. Let us now take stock of our results: If we know the scale $\ell$ of the largest eddies and their rms turnover speeds $v_\ell$ (and, of course, the viscosity of the fluid), then from these we can compute their Reynolds number $R_\ell$; from that, Eq. (14.21), and $k_{\min} \sim \ell^{-1}$, we can compute the flow’s maximum and minimum wave numbers; and from $q \sim v_\ell^3/\ell$ and Eq. (14.19) we can compute the spectral energy density in the turbulence. We can also compute the total time required for energy to cascade from the largest eddies to the smallest: since $\tau(k) \sim 1/(kv) \sim 1/(q^{1/3}k^{2/3})$, each successive set of eddies feeds its energy downward in a time $2^{-2/3}$ times that of the preceding set. As a result, it takes roughly the same amount of time for energy to pass from the second-largest eddies (size $\ell/2$) to the very smallest (size $k_{\max}^{-1}$) as it takes for the second largest to extract the energy from the very largest. The total cascade occurs in a time of several $\ell/v_\ell$ (during which time, of course, the mean flow has fed new energy into the largest eddies and they are sending it on downwards).
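The recipe in the preceding paragraph can be collected into one function. This is an order-of-magnitude sketch under the text's conventions, with factors of order unity dropped: it takes $k_{\min} = 1/\ell$ (rather than $2\pi/\ell$), $q = v_\ell^3/\ell$, $k_{\max}$ from Eq. (14.20), and sums the turnover times $\tau(k) \sim 1/(q^{1/3}k^{2/3})$ over octaves $k = 2^nk_{\min}$. The function name and the molecular viscosity of air are assumptions; $\ell$ and $v_\ell$ are the room-air values of Sec. 14.3.2.

```python
import math

def cascade_summary(ell, v_ell, nu):
    """Kolmogorov bookkeeping from the largest-eddy scale and speed (order unity dropped)."""
    R_ell = v_ell * ell / nu                  # Reynolds number of the largest eddies
    q = v_ell**3 / ell                        # cascade rate, energy per unit mass per unit time
    k_min = 1.0 / ell
    k_max = q**0.25 * nu**-0.75               # Eq. (14.20)
    # Turnover times tau(k) ~ 1/(q^(1/3) k^(2/3)) for octaves k = 2^n k_min up to k_max
    n_octaves = int(math.log2(k_max / k_min))
    taus = [1.0 / (q ** (1 / 3) * (k_min * 2**n) ** (2 / 3)) for n in range(n_octaves + 1)]
    return {
        "R_ell": R_ell,
        "k_max/k_min": k_max / k_min,
        "cascade_time": sum(taus),
        "turnover_time": ell / v_ell,
    }

# Room-air example; nu = 1.5e-5 m^2/s for air is an assumed value.
s = cascade_summary(ell=3.0, v_ell=1e-2, nu=1.5e-5)
print(s)
```

For these inputs $R_\ell \approx 2\times10^3$, $k_{\max}/k_{\min} = R_\ell^{3/4} \approx 3\times10^2$, and the summed cascade time is about $2.7\,\ell/v_\ell$: the geometric series $\sum_n 2^{-2n/3}$ converges to $1/(1-2^{-2/3}) \approx 2.7$, which is why the whole cascade takes several turnover times of the largest eddies.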

These results are accurate only to within factors of order unity, with one major exception: the −5/3 power law in the Kolmogorov spectrum is very accurate. That this ought to be so one can verify in two equivalent ways: (i) Repeat the above derivation inserting arbitrary factors of order unity at every step. These factors will influence the final multiplicative factor in the Kolmogorov spectrum, but will not influence the −5/3 power. (ii) Notice that the only dimensioned entities that can influence the spectrum in the region $k_{\min} \ll k \ll k_{\max}$ are the energy cascade rate $q$ and the wave number $k$. Then notice that the only quantity with the dimensions of $u_k(k)$ (energy per unit mass per unit wave number) that can be constructed from $q$ and $k$ is $q^{2/3}k^{-5/3}$. Thus, aside from a multiplicative factor of order unity, this must be the form of $u_k(k)$. Let us now review and critique the assumptions that went into our derivation of the Kolmogorov spectrum. First, we assumed that the turbulence is stationary and homogeneous. Real turbulence is neither, since it exhibits intermittency (Sec. 14.1), and smaller eddies tend to occupy less volume overall than larger eddies. Second, we assumed that the energy source is large-length-scale motion and that the energy transport is local in k-space, from the large length scales to steadily smaller ones. In the language of a Fourier decomposition into normal modes, we assumed that nonlinear coupling between modes with wave number $k$ causes modes with wave number of order $2k$ to grow, but does not significantly enhance modes with wave number $100k$ or $0.01k$. Again this is not completely in accord with observations, which reveal the development of coherent structures: large-scale regions with correlated vorticity in the flow.
These structures are evidence for a reversed flow of energy in k-space from small scales to large scales, and they play a major role in another feature of real turbulence, entrainment: the spreading of an organised motion, e.g. a jet, into the surrounding fluid (Ex. 14.6). Despite these qualifications, the Kolmogorov law is surprisingly useful. It has been verified in many laboratory flows, and it describes many naturally occurring instances of turbulence. For example, the twinkling of starlight is caused by refractive-index fluctuations in the earth’s atmosphere, whose power spectrum we can determine optically. The underlying turbulence spectrum turns out to be of Kolmogorov form.

****************************

EXERCISES

Exercise 14.3 Example: Reynolds Stress, and Fluctuating Part of Navier-Stokes Equation in Weak Turbulence

(a) Derive the time-averaged Navier-Stokes equation (14.11b) from the time-dependent form of the equation, (14.9b), and thereby infer the definition (14.11c) for the Reynolds stress. Equation (14.11b) shows how the Reynolds stress affects the evolution of the mean velocity. However, it does not tell us how the Reynolds stress evolves. (b) Explain why an equation for the evolution of the Reynolds stress must involve averages of triple products of the velocity fluctuation. Similarly, the time evolution of the averaged triple products will involve averaged quartic products, and so on (cf. the BBGKY hierarchy of equations in plasma physics, Sec. 21.6). How do you think you might “close” this sequence of equations, i.e. terminate it at some low order and get a fully determined system of equations? [Hint: the simplest way is via the turbulent viscosity.] (c) Show that the fluctuating part of the Navier-Stokes equation (the difference between the exact Navier-Stokes equation and its time average) takes the following form:

$$\frac{\partial\delta\mathbf v}{\partial t} + (\bar{\mathbf v}\cdot\nabla)\delta\mathbf v + (\delta\mathbf v\cdot\nabla)\bar{\mathbf v} + \left[(\delta\mathbf v\cdot\nabla)\delta\mathbf v - \overline{(\delta\mathbf v\cdot\nabla)\delta\mathbf v}\right] = -\frac1\rho\nabla\delta P + \nu\nabla^2(\delta\mathbf v). \qquad (14.22a)$$

This equation and the fluctuating part of the incompressibility equation

$$\nabla\cdot\delta\mathbf v = 0 \qquad (14.22b)$$

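govern the evolution of the fluctuating variables $\delta\mathbf v$ and $\delta P$. [The challenge, of course, is to devise ways to solve these equations despite the nonlinearities and the coupling to the mean flow that show up strongly in Eq. (14.22a).] (d) By dotting $\delta\mathbf v$ into Eq. (14.22a) and then taking its time average, derive the following law for the spatial evolution of the turbulent energy density $\frac12\rho\overline{\delta v^2}$:

$$\bar{\mathbf v}\cdot\nabla\left(\tfrac12\rho\overline{\delta v^2}\right) + \nabla\cdot\left(\tfrac12\rho\overline{\delta v^2\,\delta\mathbf v} + \overline{\delta P\,\delta\mathbf v}\right) = -T_{Rij}\,\bar v_{i,j} + \nu\rho\,\overline{\delta\mathbf v\cdot(\nabla^2\delta\mathbf v)}. \qquad (14.23)$$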

Here $T_{Rij} = \rho\overline{\delta v_i\,\delta v_j}$ is the Reynolds stress [Eq. (14.11c)]. Interpret each term in this equation. (e) Now derive a similar law for the spatial evolution of the energy density of ordered motion, $\frac12\rho\bar v^2$. Show that the energy lost by the ordered motion is compensated by the energy gained by the turbulence.

Exercise 14.4 Example: Turbulent Wake

Consider a turbulent wake formed by high-Reynolds-number flow past a cylinder, as in Ex. 14.1. Let the width at a distance $x$ downstream be $w(x)$, the flow speed far upstream be $V$, and the velocity deficit in the mean velocity field of the wake be $\Delta\bar v$. (a) Argue that the kinematic turbulent viscosity in the wake should be $\nu_t \sim \Delta\bar v\,w$. Hence, using an order-of-magnitude analysis similar to that of Ex. 14.1, show that the width of the wake is $w \sim (xd)^{1/2}$, where $d$ is the cylinder diameter, and that the velocity deficit is $\Delta\bar v \sim V(d/x)^{1/2}$. (b) Repeat this exercise for a sphere.

Fig. 14.6: The Coanda effect. A turbulent jet emerging from an orifice in the left wall is attracted by the solid bottom wall.

Exercise 14.5 Problem: Turbulent Jet Now consider a two-dimensional, turbulent jet emerging into an ambient fluid at rest, and contrast it to the laminar jet analyzed in Ex. 14.2. (a) Find how the velocity and jet width scale with distance downstream from the nozzle.

(b) Repeat the exercise for a three-dimensional jet.

Exercise 14.6 Problem: Entrainment and the Coanda Effect

(a) Evaluate the scaling of the rate of mass flow (discharge) $\dot M(x)$ along the three-dimensional turbulent jet of the previous exercise. Show that it must increase with distance from the nozzle, so that mass must be entrained into the flow and become turbulent.

(b) Entrainment is responsible for the Coanda effect in which a turbulent jet is attracted by a solid surface (Fig. 14.6). Can you offer a physical explanation for this effect?

(c) Compare the entrainment rate for a turbulent jet with that for a laminar jet (Ex. 14.2). Do you expect the Coanda effect to be larger for a turbulent or a laminar jet? The Coanda effect is important in aeronautics; for example, it is exploited to prevent the separation of the boundary layer from the upper surface of a wing, thereby improving the wing’s lift and reducing its drag.

Exercise 14.7 Example: Excitation of Earth’s Normal Modes by Atmospheric Turbulence³

³ Problem devised by David Stevenson; based in part on Tanimoto and Um (1999) who, however, use the pressure spectrum deduced in part (a) rather than the more nearly correct spectrum of part (b). The difference in spectra does not much affect their conclusions.

The Earth has normal modes of oscillation, many of which are in the millihertz frequency range. Large earthquakes occasionally excite these modes strongly, but the quakes are usually widely spaced in time compared to the ringdown time of a particular mode (typically a few days). There is evidence of a background level of continuous excitation of these modes, with an rms ground acceleration per mode $\sim 10^{-10}$ cm s$^{-2}$ at seismically “quiet” times. Stochastic forcing by the pressure fluctuations associated with atmospheric turbulence is suspected. This exercise deals with some aspects of this hypothesis. (a) Estimate the rms pressure fluctuations $P(f)$ at frequency $f$, in a bandwidth equal to frequency, $\Delta f = f$, produced on the earth’s surface by atmospheric turbulence, assuming a Kolmogorov spectrum for the turbulent velocities and energy. Make your estimate in two ways: (1) via dimensional analysis (what quantity can you construct from the energy cascade rate $q$, atmospheric density $\rho$, and frequency $f$ that has dimensions of pressure?), and (2) via the kinds of arguments about eddy sizes and speeds developed in Sec. 14.3.4. (b) Your answer in part (a) should scale with frequency as $P(f) \propto 1/f$. In actuality, the measured pressure spectra have a scaling law more nearly like $P(f) \propto 1/f^{2/3}$, not $P(f) \propto 1/f$ [e.g., Fig. 2a of Tanimoto and Um (1999)]. Explain this; i.e., what is wrong with the argument in (a), and how can you correct it to give $P(f) \propto 1/f^{2/3}$? Hint: There is a poem by Lewis Fry Richardson, which says: Big whirls have little whirls, which feed on their velocity. Little whirls have lesser whirls, and so on to viscosity. (c) The low-frequency cutoff for this pressure spectrum is about 0.5 mHz, and at 1 mHz, $P(f)$ has the value $P(f = 1\ \mathrm{mHz}) \sim 0.3$ Pa, which is about $3\times10^{-6}$ of atmospheric pressure.
Assuming that 0.5 mHz corresponds to the largest eddies, which have a length scale of a few km (a little less than the scale height of the atmosphere), derive an estimate for the eddies’ turbulent viscosity $\nu_t$ in the lower atmosphere. By how many orders of magnitude does this exceed the molecular viscosity? What fraction of the sun’s energy input to Earth ($\sim 10^6$ erg cm$^{-2}$ s$^{-1}$) goes into maintaining this turbulence (assumed to be distributed over the lowermost 10 km of the atmosphere)? (d) At $f = 1$ mHz, what is the characteristic spatial scale (wavelength) of the relevant normal modes of the Earth? [Hint: The relevant modes have few or no nodes in the radial direction. All you need to answer this is a typical wave speed for seismic shear waves, which you can take to be 5 km/s.] What is the characteristic spatial scale (eddy size) of the atmospheric pressure fluctuations at this same frequency, assuming isotropic turbulence? Suggest a plausible estimate for the rms amplitude of the pressure fluctuation averaged over a surface area equal to one square wavelength of the earth’s normal modes. (You must keep in mind the random spatially and temporally fluctuating character of the turbulence.)

(e) Using your answer from (d) and a characteristic shear and bulk modulus for the Earth’s deformation of $K \sim \mu \sim 10^{12}$ dyne cm$^{-2}$, comment on how the observed rms normal-mode acceleration ($10^{-10}$ cm s$^{-2}$) compares with that expected from stochastic forcing due to atmospheric turbulence. You may need to go back to Chaps. 10 and 11, and think about the relationship between surface force and surface deformation. [Note: There are several issues in doing this assessment accurately that have not been dealt with in this exercise, e.g. the number of modes in a given frequency range; so don’t expect to be able to get an answer more accurate than an order of magnitude.]

****************************

14.4 Turbulent Boundary Layers

Much interest surrounds the projection of spheres of cork, rubber, leather and string by various parts of the human anatomy, with and without the mechanical advantage of levers of willow, ceramic and the finest Kentucky ash. As is well-known, changing the surface texture, orientation, and spin of a ball in various sports can influence the trajectory markedly. Much study has been made of ways to do this both legally and illegally. Some procedures used by professional athletes are pure superstition, but many others find physical explanations that are good examples of the behavior of boundary layers. Many sports involve the motion of balls where the boundary layers can be either laminar or turbulent, and this allows opportunities for controlling the flow. With the goal of studying this, let us now consider the structure of a turbulent boundary layer—first along a straight wall, and later along a ball’s surface.

14.4.1 Profile of a Turbulent Boundary Layer

In Chap. 13, we derived the Blasius profile for a laminar boundary layer and showed that its thickness a distance $x$ downstream from the start of the boundary layer was roughly $3\delta = 3(\nu x/V)^{1/2}$, where $V$ is the free-stream speed; cf. Fig. 13.9. As we have described, when the Reynolds number is large enough, $R_d = Vd/\nu \sim 3\times10^5$ or $R_\delta \sim \sqrt{R_d} \sim 500$ in the case of flow past a cylinder (Figs. 14.1 and 14.2), the boundary layer becomes turbulent. A turbulent boundary layer consists of a thin laminar sublayer of thickness $\delta_{ls}$ close to the wall and a much thicker turbulent zone of thickness $\delta_t$; Fig. 14.7. In the following paragraphs we shall use the turbulence concepts developed above to compute, in order of magnitude, the structure of the laminar sublayer and the turbulent zone, and the manner in which those structures evolve along the boundary. We denote by $y$ distance perpendicular to the boundary, and by $x$ distance along it in the direction of the flow.

[Figure 14.7: (a) a turbulent zone of thickness $\delta_t$ lying above a laminar sublayer of thickness $\delta_{ls}$, in a flow along $x$ with free-stream speed $V$; (b) the mean-velocity profile $v_x$ versus $y$, with the laminar sublayer of thickness $\delta_{ls}$ next to the wall.]

Fig. 14.7: Structure of a turbulent boundary layer.

One key to the structure of the boundary layer is the fact that, in the $x$ component of the time-averaged Navier-Stokes equation, the stress-divergence term $T_{xy,y}$ has the potential to be so huge (because of the boundary layer’s small thickness) that no other term can compensate it. This is true in the turbulent zone, where $T_{xy}$ is the huge Reynolds stress, and also true in the laminar sublayer, where $T_{xy}$ is the huge viscous stress produced by a huge shear that results from the thinness of the layer. (One can check at the end of the following analysis that, for the computed boundary-layer structure, other terms in the $x$ component of the Navier-Stokes equation are indeed so small that they could not compensate a significantly nonzero $T_{xy,y}$.) This potential dominance of $T_{xy,y}$ implies that the flow must adjust itself so as to make $T_{xy,y}$ nearly zero, i.e. to make $T_{xy}$ (very nearly) independent of distance $y$ from the boundary.

In the turbulent zone $T_{xy}$ is the Reynolds stress, $\rho v_\ell^2$, where $v_\ell$ is the turbulent velocity of the largest eddies at a distance $y$ from the wall; therefore constancy of $T_{xy}$ implies constancy of $v_\ell$. The largest eddies at $y$ will have a size $\ell$ of order the distance $y$ from the wall, and correspondingly the turbulent viscosity will be $\nu_t \sim v_\ell y/3$. Equating the expression $\rho v_\ell^2$ for the Reynolds stress to the alternative expression $2\rho\nu_t \cdot \tfrac12\bar v_{,y}$ (where $\bar v$ is the mean flow speed at $y$ and $\tfrac12\bar v_{,y}$ is the shear), and using $\nu_t \sim v_\ell y/3$ for the turbulent viscosity, we discover that in the turbulent zone the mean flow speed varies logarithmically with distance from the wall: $\bar v \sim v_\ell \ln y + \text{constant}$. Since the turbulence is created at the inner edge of the turbulent zone, $y \sim \delta_{ls}$, by interaction of the mean flow with the laminar sublayer, the largest turbulent eddies there must have turnover speeds $v_\ell$ equal to the mean-flow speed there: $\bar v \sim v_\ell$ at $y \sim \delta_{ls}$. This fixes the normalization of the logarithmically varying mean flow speed:

$$\bar v \sim v_\ell\left[1 + \ln(y/\delta_{ls})\right] \quad \text{at } y \gtrsim \delta_{ls}. \qquad (14.24)$$
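The intermediate step behind Eq. (14.24) can be written out explicitly. Equate the two expressions for the Reynolds stress, keeping the factor 3 from $\nu_t \sim v_\ell y/3$ to show where it goes (the text absorbs such order-unity factors into "$\sim$"):

$$\rho v_\ell^2 = 2\rho\nu_t\cdot\tfrac12\bar v_{,y} \sim \rho\,\frac{v_\ell y}{3}\,\frac{d\bar v}{dy} \quad\Longrightarrow\quad \frac{d\bar v}{dy} \sim \frac{3v_\ell}{y} \quad\Longrightarrow\quad \bar v \sim 3v_\ell \ln y + \text{constant}.$$

Now drop the factor 3 and choose the constant so that $\bar v \sim v_\ell$ at $y = \delta_{ls}$. This gives $\bar v \sim v_\ell[1 + \ln(y/\delta_{ls})]$, which is Eq. (14.24).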

Turn next to the structure of the laminar sublayer. There the constant shear stress is viscous, $T_{xy} = \rho\nu\bar v_{,y}$. Stress balance at the interface between the laminar sublayer and the turbulent zone requires that this viscous stress equal the turbulent zone’s $\rho v_\ell^2$. This equality implies a linear profile for the mean flow velocity in the laminar sublayer, $\bar v = (v_\ell^2/\nu)\,y$. The thickness of the sublayer is then fixed by continuity of $\bar v$ at its outer edge, $(v_\ell^2/\nu)\,\delta_{ls} = v_\ell$. Combining these last two relations, we obtain the following profile and laminar-sublayer thickness:

$$\bar v \sim v_\ell\,\frac{y}{\delta_{ls}} \quad \text{at } y \lesssim \delta_{ls}, \qquad \delta_{ls} \sim \frac{\nu}{v_\ell}. \qquad (14.25)$$

Having deduced the internal structure of the boundary layer, we turn to the issue of what determines the $y$-independent turbulent velocity $v_\ell$ of the largest eddies. This $v_\ell$ is fixed by matching the turbulent zone to the free-streaming region outside it. The free-stream velocity $V$ must be equal to the mean flow velocity $\bar v$ [Eq. (14.24)] at the outer edge of the turbulent zone. The logarithmic term will dominate, so $V \sim v_\ell \ln(\delta_t/\delta_{ls})$. Introducing an overall Reynolds number for the boundary layer,

$$R_\delta \equiv \frac{V\delta_t}{\nu}, \qquad (14.26)$$

and noting that turbulence requires a huge value ($\gtrsim 1000$) of this $R_\delta$, we can re-express $V$ as $V \sim v_\ell \ln R_\delta$. The reason is that $\delta_t/\delta_{ls} \sim v_\ell\delta_t/\nu \sim R_\delta/\ln R_\delta$, and $\ln\ln R_\delta \ll \ln R_\delta$, so the logarithm is essentially $\ln R_\delta$. This should actually be regarded as an equation for the turbulent velocity of the largest-scale eddies in terms of the free-stream velocity:

$$v_\ell \sim \frac{V}{\ln R_\delta}. \qquad (14.27)$$

If the thickness $\delta_t$ of the entire boundary layer and the free-stream velocity $V$ are given, then Eq. (14.26) determines the boundary layer’s Reynolds number, Eq. (14.27) then determines the turbulent velocity, and Eqs. (14.25) and (14.24) determine the layer’s internal structure. Turn, finally, to the issue of how the boundary-layer thickness $\delta_t$ evolves with distance $x$ down the wall (and correspondingly, how all the rest of the boundary layer’s structure, which is fixed by $\delta_t$, evolves). At the turbulent zone’s outer edge, the largest turbulent eddies move with speed $v_\ell$ into the free-streaming fluid, entraining that fluid into themselves (cf. Ex. 14.6 on entrainment and the Coanda effect). Since the layer is simultaneously carried downstream at speed $\sim V$, the thickness grows with $x$ at the rate

$$\frac{d\delta_t}{dx} = \frac{v_\ell}{V} = \frac{1}{\ln R_\delta}. \qquad (14.28)$$
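The chain of Eqs. (14.24)–(14.28) can be evaluated numerically, which makes the order-of-magnitude reasoning concrete. The following is a minimal Python sketch, not the authors’ method. It sets every order-unity factor to 1, as the text does, and integrates Eq. (14.28) with a simple forward-Euler step. The function names and the air-like values ($V = 30$ m/s, $\nu = 1.5\times10^{-5}$ m$^2$/s) are illustrative assumptions.

```python
import math


def turbulent_layer_structure(V, delta_t, nu):
    """Return (R_delta, v_ell, delta_ls) for a turbulent boundary layer.

    Order-of-magnitude model of Eqs. (14.25)-(14.27), O(1) factors set to 1.
    V: free-stream speed [m/s]; delta_t: turbulent-zone thickness [m];
    nu: kinematic viscosity [m^2/s].
    """
    R_delta = V * delta_t / nu              # Eq. (14.26)
    if R_delta < 1000:
        raise ValueError("the model assumes a turbulent layer, R_delta >~ 1000")
    v_ell = V / math.log(R_delta)           # Eq. (14.27): largest-eddy speed
    delta_ls = nu / v_ell                   # Eq. (14.25): laminar-sublayer thickness
    return R_delta, v_ell, delta_ls


def mean_speed(y, v_ell, delta_ls):
    """Mean flow speed at height y: linear in the sublayer, logarithmic above it."""
    if y <= delta_ls:
        return v_ell * y / delta_ls                   # Eq. (14.25)
    return v_ell * (1.0 + math.log(y / delta_ls))     # Eq. (14.24)


def grow_turbulent_layer(V, nu, delta0, x0, x1, steps=10_000):
    """Integrate d(delta_t)/dx = 1/ln(R_delta), Eq. (14.28), by forward Euler."""
    dx = (x1 - x0) / steps
    delta = delta0
    for _ in range(steps):
        delta += dx / math.log(V * delta / nu)
    return delta


def laminar_thickness(V, nu, x):
    """Laminar (Blasius) thickness ~ 3 (nu x / V)^(1/2), as quoted from Chap. 13."""
    return 3.0 * math.sqrt(nu * x / V)


if __name__ == "__main__":
    V, nu = 30.0, 1.5e-5            # illustrative air-like values (assumptions)
    R, v_ell, d_ls = turbulent_layer_structure(V, 0.01, nu)
    print(f"R_delta = {R:.3g}, v_ell = {v_ell:.3g} m/s, delta_ls = {d_ls:.2g} m")
    print(f"mean speed at y = delta_t: {mean_speed(0.01, v_ell, d_ls):.3g} m/s")
    for x in (0.5, 1.0, 2.0):
        turb = grow_turbulent_layer(V, nu, 1e-3, 0.1, x)
        lam = laminar_thickness(V, nu, x)
        print(f"x = {x} m: turbulent {turb:.3g} m, laminar {lam:.3g} m")
```

For $\delta_t = 1$ cm these values give $R_\delta = 2\times10^4$, $v_\ell \approx 3$ m/s and $\delta_{ls} \approx 5\,\mu$m. The mean speed at $y = \delta_t$ comes out near $0.9V$, a rough self-consistency check of the matching argument. The integrated turbulent layer is tens of times thicker than the laminar estimate at $x = 1$ m. Given the dropped order-unity factors, that ratio should be read only as an order-of-magnitude comparison.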

Since $\ln R_\delta$ depends only extremely weakly on $\delta_t$, the turbulent boundary layer expands essentially linearly with distance $x$, by contrast with a laminar boundary layer’s $\delta \propto x^{1/2}$. One can easily verify that not only does the turbulent boundary layer expand more rapidly than the corresponding laminar boundary layer would if it were stable, but the turbulent layer is also thicker at all locations down the wall. Physically, this can be traced to the fact that the turbulent boundary layer involves a three-dimensional velocity field, whereas the corresponding laminar layer would involve only a two-dimensional field. The enhanced thickness and expansion contribute to an enhanced ability to withstand an adverse pressure gradient and to cling to the surface without separation (the “Coanda effect”; Fig. 14.6 and Ex. 14.6).

However, there is a price to be paid for this benefit. Since the velocity gradient is increased close to the surface, the actual surface shear stress exerted by the turbulent layer, through its laminar sublayer, is significantly larger than in the corresponding laminar boundary layer. As a result, if the layer were to remain laminar, the portion that adheres to the surface would produce less viscous drag than the corresponding portion of the turbulent layer. Correspondingly, in a long, straight pipe, the drag on the pipe wall goes up when the boundary layer becomes turbulent. However, for flow around a cylinder or other bluff body, the drag goes down (cf. Fig. 14.2)! The reason is that in the separated, laminar boundary layer the dominant source of drag is not viscosity but rather a pressure differential between the front face of the cylinder, where the layer adheres, and the back face, where the reverse eddies circulate. The pressure is much lower in the back-face eddies than in the front-face boundary layer. That pressure differential gives rise to a significant drag, which is reduced when the layer goes turbulent and adheres to the back face. Therefore, if one’s goal is to reduce the overall drag and the laminar flow is prone to separation, a nonseparating (or delayed-separation) turbulent layer is to be preferred over the laminar layer. Similarly (and for essentially the same reason), for an airplane wing, if one’s goal is to maintain a large lift, then a nonseparating (or delayed-separation) turbulent layer is to be preferred over a separating, laminar one.⁴

For this reason, steps are often taken in engineering flows to ensure that boundary layers become and remain turbulent. A crude but effective example is provided by the vortex generators that are installed on the upper surfaces of some airfoils. These are small obstacles on the wing that penetrate through a laminar boundary layer into the free flow. By changing the pressure distribution, they force air into the boundary layer and initiate three-dimensional vortical motion in it, forcing it to become partially turbulent. This allows airplanes to climb more steeply without stalling due to boundary-layer separation, and it helps reduce aerodynamic drag.

[Figure 14.8: wave number $k$ versus $R_\delta$, showing a region of unstable modes ($\mathrm{Im}\,\omega > 0$) lying between regions of stable modes ($\mathrm{Im}\,\omega < 0$) and beginning at $R_{\rm crit} \sim 500$; one boundary curve is labeled “no inflection,” the other “with inflection.”]

Fig. 14.8: Values of wave number $k$ for stable and unstable wave modes in a laminar boundary layer with thickness $\delta$, as a function of the boundary layer’s Reynolds number $R_\delta = V\delta/\nu$. Suppose the unperturbed velocity distribution $v_x(y)$ has no inflection point, i.e. $d^2v_x/dy^2 < 0$ everywhere, as is the case for the Blasius profile (Fig. 13.9). Then the unstable modes are confined to the shaded region. Suppose instead there is an inflection point (so $d^2v_x/dy^2 > 0$ near the wall but becomes negative farther from the wall), as is the case near a surface of separation (Fig. 13.11). Then the unstable region is larger and does not asymptote to $k = 0$ as $R_\delta \to \infty$, i.e. it has a boundary like the dashed curve.

14.4.2

Instability of a Laminar Boundary Layer

Much work has been done on the linear stability of laminar boundary layers. The principles of such stability analyses should now be familiar, although the technical details are formidable.

⁴ Another example of separation occurs in “lee waves,” which can form when wind blows over a mountain range. These consist of standing-wave eddies in the separated boundary layer, somewhat analogous to the Kármán vortex street of Fig. 14.2(d). They are sometimes used by glider pilots to regain altitude.

[Figure 14.9: panels (a) Golf Ball, (b) Cricket Ball, (c) Baseball; labels include turbulent boundary layer, laminar boundary layer, turbulent wake, circulation $\Gamma$, and Force.]

Fig. 14.9: Boundary layers around golf balls, cricket balls, and baseballs, as they move leftward relative to the air, i.e., as the air flows rightward as seen in their rest frames.

In the simplest case an equilibrium flow like the Blasius profile is identified and the equations governing the time evolution of small perturbations are written down. The spatial and temporal dependence of individual Fourier components is assumed to be $\exp i(\mathbf{k}\cdot\mathbf{x} - \omega t)$. We seek modes that have zero velocity perturbation on the solid surface past which the fluid flows and that decay to zero in the free stream. We then ask whether there are unstable modes, i.e., modes with real $\mathbf{k}$ for which the imaginary part of $\omega$ is positive, so they grow exponentially in time. The results can generally be expressed in the form of a diagram like Fig. 14.8. It is found that there is generally a critical Reynolds number at which one mode becomes unstable. At higher values of the Reynolds number a range of $\mathbf{k}$-vectors is unstable. One interesting result of these calculations concerns the absence of viscous forces (i.e., the limit $R_\delta \to \infty$). There, the boundary layer can be unstable only if there is a point of inflection in the velocity profile, a point where $d^2v_x/dy^2$ changes sign. This is Rayleigh’s inflection-point criterion; cf. Fig. 14.8 and Ex. 14.9. Although, in the absence of an inflection, an inviscid flow $v_x(y)$ is stable, for some such profiles even the slightest viscosity can trigger instability. Physically, this is because viscosity can tap the relative kinetic energies of adjacent flow lines. Viscously triggered instabilities of this sort are sometimes called secular instabilities, by contrast with the dynamical instabilities that arise in the absence of viscosity. Secular instabilities are quite common in fluid mechanics.
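As an illustration of the inflection criterion, one can check directly that the Blasius profile has no inflection point. The sketch below assumes the common similarity form $v_x = V f'(\eta)$ with $\eta = y\,(V/\nu x)^{1/2}$, in which the Blasius equation reads $f''' + \tfrac12 f f'' = 0$. Other normalizations change only positive constants, which does not affect the sign argument. Differentiating twice at fixed $x$,

$$\frac{\partial^2 v_x}{\partial y^2} = \frac{V^2}{\nu x}\,f'''(\eta) = -\frac{V^2}{2\nu x}\,f(\eta)\,f''(\eta).$$

The same equation gives $f''(\eta) = f''(0)\exp\!\left(-\tfrac12\int_0^{\eta} f\,d\eta'\right) > 0$. Also, $f(0) = 0$ together with $f' > 0$ makes $f > 0$ for $\eta > 0$. So the curvature is negative everywhere above the wall and vanishes only at the wall itself. It never changes sign, so the profile has no inflection point.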

14.4.3

The flight of a ball.

Having developed some insights into boundary layers and their stability, we now apply those insights to the balls used in various sports. The simplest application is to the dimples on a golf ball [Fig. 14.9(a)]. The dimples provide finite-amplitude disturbances in the flow, which can initiate the formation of growing wave modes and turbulence in the boundary layer. The adherence of the boundary layer to the ball is improved, and separation occurs farther behind the ball, leading to a lower drag coefficient and a greater range of flight; see Figs. 14.2 and 14.9(a). A variant on this mechanism is found in the game of cricket, which is played with a ball whose surface is polished leather with a single equatorial seam of rough stitching. Suppose the ball is “bowled” in a non-spinning way with the seam inclined to the direction of motion. Then there is a laminar boundary layer on the smooth side and a turbulent boundary layer on the side with the rough seam [Fig. 14.9(b)]. The turbulent layer on the seam side separates later than the laminar layer on the smooth side, so the two layers separate at different points on the ball and the air is deflected toward the smooth side. The ball will therefore swerve toward the side with the leading seam. (The effect is strongest when the ball is new and still shiny, and on days when the humidity is high, so the thread in the seam swells and is more efficient at making turbulence.)

This mechanism is different from that used to throw a slider or curveball in baseball, in which the pitcher causes the ball to spin about an axis roughly perpendicular to the direction of motion. In the slider the axis is vertical; for a curveball it is inclined at about 45° to the vertical. The spin of the ball creates circulation (in a nonrotating, inertial frame) like that around an airfoil. The pressure forces associated with this circulation produce a net sideways force in the direction of the baseball’s rotational velocity on its leading hemisphere, i.e. as seen by the hitter [Fig. 14.9(c)]. The physical origin of this effect is actually quite complex and is properly described only with reference to experimental data. The major effect is that separation is delayed on the side of the ball where the rotational velocity is in the same direction as the airflow, and happens sooner on the opposite side [Fig. 14.9(c)], leading to a pressure differential.

The reader may be curious as to how this circulation can be established in view of Kelvin’s theorem, Eq. (13.15). That theorem tells us that the circulation must be zero around a circuit so far from the ball and its wake that viscous forces cannot cause the vorticity to diffuse to it. What actually happens is that when the flow is initiated, starting vortices are shed by the ball and are then convected downstream, leaving behind the net circulation $\Gamma$ that passes through the ball (Fig. 14.10).

[Figure 14.10: a ball spinning with angular velocity $\omega$ in a flow of speed $V$, with circulation $\Gamma$; vortex lines connect the ball to the starting vortex downstream.]

Fig. 14.10: Vortex lines passing through a spinning ball. The starting vortex is created and shed when the ball is thrown, and is carried downstream by the flow as seen in the ball’s frame of reference. The vortex lines connecting this starting vortex to the ball lengthen as the flow continues.
This effect is very much larger in two dimensions, with a rotating cylinder, than in three dimensions, because the magnitude of the shed vorticity is much larger. It goes by the name of Magnus effect in two dimensions and Robins effect in three. It is also useful in understanding the lift in airfoils. In table tennis, a drive is often hit with topspin, so that the ball rotates about a horizontal axis perpendicular to the direction of motion. In this case, the net force is downward and the ball falls faster toward the ground, the effect being largest after it has somewhat decelerated. This allows a ball to be hit hard over the net and bounce before passing the end of the table, increasing the margin for errors in the direction of the hit. Those wishing to improve their curveballs or cure a bad slice are referred to the monographs by Adair (1990), Armenti (1992) and Lighthill (1986).

****************************
EXERCISES

Exercise 14.8 Problem: Effect of drag
A well-hit golf ball travels about 300 yards. A fast bowler or fastball pitcher throws a ball at over 90 m.p.h. (miles per hour). A table tennis player can hit a forehand return at about 30 m.p.h. The masses and sizes of each of these three types of balls are $m_g \sim 46$ g, $d_g \sim 43$ mm (golf); $m_c \sim 160$ g, $d_c \sim 70$ mm (cricket); $m_b \sim 140$ g, $d_b \sim 75$ mm (baseball); and $m_{tt} \sim 2.5$ g, $d_{tt} \sim 38$ mm (table tennis).
(a) For golf, cricket (or baseball) and table tennis, estimate the Reynolds number of the flow and infer the drag coefficient, $C_D$. (The variation of $C_D$ with $R_d$ can be assumed to be similar to that in flow past a cylinder.)
(b) Hence estimate the importance of aerodynamic drag in determining the range of a ball in each of these three cases.

Exercise 14.9 Problem: Tollmien-Schlichting Waves
Consider an inviscid ($\nu = 0$), incompressible flow near a plane wall where a boundary layer is established. Introduce coordinates $x$ parallel to the wall and $y$ perpendicular to the wall. Let the components of the equilibrium velocity be $\{v_x(y), v_y(y), 0\}$.
(a) Show that a small perturbation in the velocity, $\delta v_y \propto \exp ik(x - ct)$, with $k$ real and frequency $ck$ possibly complex, satisfies the differential equation

$$\frac{\partial^2 \delta v_y}{\partial y^2} = \left[\frac{1}{v_x - c}\,\frac{d^2 v_x}{dy^2} + k^2\right]\delta v_y. \qquad (14.29)$$

Hence argue that a necessary condition for unstable wave modes ($\mathrm{Im}(c) > 0$) is that the velocity field possess a point of inflection; cf. Fig. 14.8. (The boundary layer can also be unstable in the absence of a point of inflection, but viscosity must be present to trigger the instability.)

****************************
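As a starting point for part (a) of Exercise 14.8, the Reynolds numbers $R_d = Vd/\nu$ follow directly from the quoted speeds and sizes. The following is a minimal Python sketch. The air properties ($\nu = 1.5\times10^{-5}$ m$^2$/s, $\rho = 1.2$ kg/m$^3$) are assumed values, the function names are hypothetical, and golf is omitted because the exercise quotes a range rather than a launch speed. The drag-to-weight ratio is returned per unit $C_D$, since reading off $C_D$ is part of the exercise.

```python
import math

MPH = 0.44704      # metres per second per mile per hour (exact conversion)
NU_AIR = 1.5e-5    # assumed kinematic viscosity of air, m^2/s
RHO_AIR = 1.2      # assumed density of air, kg/m^3
G = 9.81           # gravitational acceleration, m/s^2

# (mass [kg], diameter [m], speed [m/s]) taken from Exercise 14.8.
# Golf is left out because the exercise quotes a range, not a speed.
BALLS = {
    "cricket": (0.160, 0.070, 90 * MPH),
    "baseball": (0.140, 0.075, 90 * MPH),
    "table tennis": (0.0025, 0.038, 30 * MPH),
}


def reynolds_number(speed, diameter, nu=NU_AIR):
    """R_d = V d / nu, the same Reynolds number used for the cylinder."""
    return speed * diameter / nu


def drag_to_weight(mass, diameter, speed, c_d, rho=RHO_AIR):
    """Ratio of aerodynamic drag (1/2) rho C_D A V^2 to weight m g."""
    area = math.pi * diameter ** 2 / 4.0
    return 0.5 * rho * c_d * area * speed ** 2 / (mass * G)


if __name__ == "__main__":
    for name, (m, d, v) in BALLS.items():
        # c_d = 1.0 is a placeholder: read the real C_D off the cylinder
        # curve at the computed R_d, as part (a) of the exercise asks.
        print(f"{name:13s} R_d = {reynolds_number(v, d):.2g}, "
              f"drag/weight = {drag_to_weight(m, d, v, c_d=1.0):.2g} x C_D")
```

These inputs give $R_d \approx 2\times10^5$ for cricket and baseball and $R_d \approx 3\times10^4$ for table tennis. At those speeds the drag per unit $C_D$ exceeds the weight for both the baseball and the much lighter table-tennis ball.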

[Figure 14.11: (a) Couette flow between concentric cylinders in coordinates $r$, $\phi$, $z$, with Taylor rolls; (b) bifurcation diagram against $R$, with stable equilibria with Taylor rolls branching off at the bifurcation point $R_{c1}$, beyond which the equilibria with no Taylor rolls are unstable; (c) Taylor-roll cross sections in the $r$-$z$ plane, dashed: center of unperturbed Taylor roll, solid: center of perturbed, wavy Taylor roll.]

Fig. 14.11: Bifurcation in Couette flow. (a) Equilibrium flow with Taylor rolls. (b) Bifurcation diagram in which the amplitude of the poloidal circulation $|\Gamma_P|$ in a Taylor roll is plotted against the Reynolds number $R$. At low $R$ ($R < R_{c1}$) the only equilibrium flow configuration is smooth, azimuthal flow. At larger $R$ ($R_{c1} < R < R_{c2}$) there are two equilibria: one with Taylor rolls, which is stable, and the other the smooth, azimuthal flow, which has become unstable. (c) Shape of a Taylor roll at $R_{c1} < R < R_{c2}$ (dashed ellipse) and at higher $R$, $R_{c2} < R < R_{c3}$ (wavy curve).

14.5

The Route to Turbulence: Onset of Chaos

14.5.1

Couette Flow

Let us examine qualitatively how a viscous flow becomes turbulent. A good example is Couette flow between two long, concentric, relatively rotating cylinders, as introduced in Chap. 13 and depicted in Fig. 14.11(a). The Rayleigh stability criterion (flow unstable if and only if angular momentum per unit mass decreases outward) was derived in Chap. 13 ignoring viscous stress. Now suppose we have a flow that is stable according to the Rayleigh criterion. Suppose, further, that the fluid is a liquid and we steadily decrease its viscosity by heating it, so that the Reynolds number steadily increases. At low $R$, the equilibrium flow is stationary and azimuthal [strictly in the $\phi$ direction in Fig. 14.11(a)]. However, at some critical Reynolds number $R_{c1}$, the flow becomes unstable to the growth of small perturbations. These perturbations drive a transition to a new, stationary equilibrium that involves poloidal circulation (quasi-circular motions in the $r$ and $z$ directions, called Taylor rolls); see Fig. 14.11(a). What has happened is that an equilibrium with a high degree of symmetry has become unstable and a new, lower-symmetry, stable equilibrium has taken over. Translational invariance in the direction of the cylinder axis has been lost from the flow, despite the fact that the boundary conditions remain translationally symmetric. This change of equilibrium mode is another example of a bifurcation like that discussed when we treated the buckling of beams and playing cards (Chaps. 10 and 11).

[Figure 14.12: energy spectrum $U(f)$ versus frequency $f$. (a) Lines at $f_1$, $2f_1$, $3f_1$. (b) and (c) Lines at $f_2 - f_1$, $f_1$, $f_2$, $2f_1$, $f_1 + f_2$, $2f_1 + f_2$.]

Fig. 14.12: The energy spectrum of velocity fluctuations in rotating Couette flow (schematic). (a) For a moderate Reynolds number, $R_{c2} < R < R_{c3}$, at which the stable equilibrium flow is that with the wavy Taylor rolls of Fig. 14.11(c). (b) For a higher Reynolds number, $R_{c3} < R < R_{c4}$, at which the stable flow has wavy Taylor rolls with two incommensurate fundamental frequencies present. (c) For a still higher Reynolds number, $R > R_{c4}$, at which turbulence has set in.

As $R$ is increased further, this process repeats. At a second critical Reynolds number $R_{c2}$ there is a second bifurcation of equilibria, in which the azimuthally smooth Taylor rolls become unstable and are replaced by new, azimuthally wavy Taylor rolls; see Fig. 14.11(c). Again, an equilibrium with higher symmetry (rotation invariance) has been replaced, at a bifurcation point, by one of lower symmetry (no rotation invariance). There is a fundamental frequency $f_1$ associated with the wavy Taylor rolls’ motion as they circulate around the central cylinder. Since the waves are nonlinearly large, harmonics of this fundamental are also seen when one Fourier decomposes the velocity field; cf. Fig. 14.12(a). When $R$ is increased still further, to some third critical value $R_{c3}$, there is yet another bifurcation. The Taylor rolls now develop a second set of waves, superimposed on the first, with a corresponding new fundamental frequency $f_2$ that is incommensurate with $f_1$. In the energy spectrum one now sees various harmonics of $f_1$ and of $f_2$, as well as sums and differences of these two fundamentals; cf. Fig. 14.12(b). It is exceedingly difficult to construct experimental apparatus that is clean enough, and free enough from the effects of the cylinders’ finite lengths, to reveal what happens next as one turns up the Reynolds number. However, despite the absence of clean experiments, it seemed obvious before the 1970’s what would happen. The sequence of bifurcations would continue, with ever-decreasing intervals of Reynolds number $\Delta R$ between them. After a while this would produce such a complex maze of frequencies, harmonics, sums, and differences as to be interpreted as turbulence. Indeed, one finds the onset of turbulence described in just this manner in the classic fluid mechanics textbook of Landau and Lifshitz (1959). The 1970’s and 1980’s brought a major breakthrough in our understanding of the onset of turbulence.
This breakthrough came from studies of model dynamical systems with only a few degrees of freedom, in which nonlinear effects play roles similar to those of the nonlinearities of the Navier-Stokes equation. These studies revealed only a handful of routes to irregular or unpredictable behavior known as chaos, and none were of the Landau-Lifshitz type. However, one of these routes starts out in the same manner as does rotating Couette flow. A control parameter (Reynolds number for Couette flow) is gradually increased. First, oscillations with one fundamental frequency $f_1$ and its harmonics turn on. Then a second frequency $f_2$ and its harmonics turn on, along with sums and differences of $f_1$ and $f_2$. Then, suddenly, chaos sets in. Moreover, the chaos is clearly not being produced by a complicated superposition of other, new frequencies; it is fundamentally different from that. The best Couette-flow experiments of the 1980’s and later appear to confirm that the onset of turbulence goes by this route; see Fig. 14.12(c).
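Why sums and differences of $f_1$ and $f_2$ appear in Fig. 14.12(b) can be seen from the simplest nonlinearity. As an illustration only, square a signal containing two waves of angular frequencies $\omega_1 = 2\pi f_1$ and $\omega_2 = 2\pi f_2$. The quadratic term here is an assumed stand-in for the Navier-Stokes nonlinearity, not a model of Couette flow:

$$(\cos\omega_1 t + \cos\omega_2 t)^2 = 1 + \tfrac12\cos 2\omega_1 t + \tfrac12\cos 2\omega_2 t + \cos(\omega_1 + \omega_2)t + \cos(\omega_2 - \omega_1)t.$$

Higher powers generate further combinations $mf_1 + nf_2$ with integers $m$ and $n$, such as $2f_1 + f_2$. With only $f_1$ present, the same argument yields just the harmonics $mf_1$ of Fig. 14.12(a).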

14.5.2

Feigenbaum Sequence

The very simplest of systems in which one can study the several possible routes to chaos are one-dimensional mathematical maps. A lovely example is the “Feigenbaum sequence,” explored by Mitchell Feigenbaum in the 1970’s. The Feigenbaum sequence is a sequence $\{x_1, x_2, x_3, \ldots\}$ of values of a real variable $x$, given by the rule (sometimes called the logistic equation)⁵

$$x_{n+1} = 4a\,x_n(1 - x_n). \qquad (14.30)$$

Here $a$ is a fixed “control” parameter. It is easy to compute Feigenbaum sequences $\{x_n\}$ for different values of $a$ on a personal computer (Ex. 14.10). What is found is that there are critical parameters $a_1, a_2, \ldots$ at which the character of the sequence changes sharply. For $a < a_1$, the sequence asymptotes to a stable fixed point. For $a_1 < a < a_2$, the sequence asymptotes to stable, periodic oscillations between two values. If we increase the parameter further, so that $a_2 < a < a_3$, the sequence becomes a periodic oscillation among four values: the period of the oscillation has doubled. This period doubling (NOT frequency doubling) happens again: when $a_3 < a < a_4$, $x$ asymptotes to regular motion among eight values. The period doublings occur at shorter and shorter intervals of $a$ until, at some value $a_\infty$, the period becomes infinite and the sequence does not repeat. Chaos has set in. This period doubling is a second route to chaos, very different in character from the “one-frequency, two-frequencies, chaos” route that one meets in Couette flow. Remarkably, fluid dynamical turbulence can set in by this second route as well as by the first. It does so in certain very clean experiments on convection in liquid helium. We shall return to this below, and then again in Chap. 17.

How can so starkly simple and discrete a thing as a one-dimensional map bear any relationship at all to the continuous solutions of the fluid dynamical differential equations? The answer is quite remarkable. Consider a steady flow in which one parameter $a$ (e.g. the Reynolds number) can be adjusted. As we change $a$ and approach turbulence, the flow may develop a periodic oscillation with a single frequency $f_1$. We could measure this by inserting a probe at a fixed point in the flow to measure a fluid variable $y$, e.g. one component of the velocity. We can detect the periodicity either by inspecting the readout $y(t)$ or its Fourier transform $\tilde y$.
However, there is another way, which may be familiar from classical mechanics: regard $\{y, \dot y\}$ as the two coordinates of a two-dimensional phase space. (Of course the dimensionality of the phase space could be arbitrarily large, but let us keep matters as simple as possible.)

⁵ This equation first appeared in discussions of population biology (Verhulst, 1838). Consider $x_n$ as being proportional to the number of animals in a species. The number in the next season should then be proportional to the number of animals already present and to the availability of resources, which will decrease as $x_n$ approaches some maximum value, in this case unity. Hence the factors $x_n$ and $1 - x_n$ in Eq. (14.30).

[Figure 14.13: phase-space trajectories in the $(y, \dot y)$ plane, cut by a section carrying the coordinate $x$. (a) A single closed loop, with crossing points $x_1 = x_3$ and $x_2$. (b) After period doubling, a double loop with crossing points $x_1 = x_5$, $x_2$, $x_3$, $x_4$.]

Fig. 14.13: (a) Representation of a single periodic oscillation as motion in phase space. (b) Motion in phase space after period doubling. The behavior of the system may also be described by using the coordinate $x$ of the Poincaré map.

For a single periodic oscillation, the system will follow a closed path in this phase space [Fig. 14.13(a)]. As we increase $a$ further, a period doubling may occur, and the trajectory in phase space may look like Fig. 14.13(b). Now, as we are primarily interested in the development of the oscillations, we need only keep one number for every fundamental period $P_1 = 1/f_1$. Let us do this by taking a section through phase space and introducing a coordinate $x$ on this section, as shown in Fig. 14.13. The $n$’th time the trajectory crosses through this section, its crossing point is $x_n$. The mapping from $x_n$ to $x_{n+1}$ can then be taken as a representative characterization of the flow. When only the frequency $f_1$ is present, the map will read $x_{n+2} = x_n$ [Fig. 14.13(a)]. When $f_1$ and $f_2 = \tfrac12 f_1$ are present, the map will read $x_{n+4} = x_n$ [Fig. 14.13(b)]. (These specific maps are overly simple compared to what one may encounter in a real flow, but they illustrate the idea.) To reiterate, instead of describing the flow by the full solution $\mathbf{v}(\mathbf{x}, t)$ to the Navier-Stokes equations and the flow’s boundary conditions, we can construct the simple map $x_n \to x_{n+1}$ to characterize the flow. This procedure is known as a Poincaré map. The mountains have labored and brought forth a mouse! However, this mouse turns out to be all that we need. For the convection experiments, just the same period-doubling behavior and approach to chaos are present in these maps as in the original phase-space diagram and in the full solution to the fluid dynamical equations. When observed in the Poincaré maps, the behavior looks qualitatively the same as in the Feigenbaum sequence. It is remarkable that for a system with so many degrees of freedom, chaotic behavior can be observed by suppressing almost all of them.

If, in the period-doubling route to chaos, we compute the limiting ratio

$$F = \lim_{j\to\infty}\frac{a_j - a_{j-1}}{a_{j+1} - a_j}, \qquad (14.31)$$

we find that it has the value $F = 4.6692016090\ldots$. This (Feigenbaum) number seems to be a universal constant characteristic of most period-doubling routes to chaos, independent of the particular map that was used. For example, if we had used

$$x_{n+1} = a\sin\pi x_n \qquad (14.32)$$

we would have got the same constant. The most famous illustration of the period-doubling route to chaos is a classic experiment by Libchaber and Maurer (1978) on convection in liquid helium. The temperature at a point was monitored with time as the helium’s vertical temperature gradient was slowly increased. Initially, the temperature was found to oscillate with a single period, but then subharmonics started appearing one after another, until, eventually, the flow became turbulent. Libchaber was able to measure the ratio (14.31) to an accuracy of about 3 per cent (with $a_j$ the temperature at which the $j$’th period doubling occurred). His result agreed with Feigenbaum’s number to within his experimental accuracy! For several other routes to chaos identified in convection experiments, see Gollub and Benson (1980). When chaos sets in, the evolution of the system becomes essentially incalculable. This is because, as can be shown mathematically, the future state becomes highly sensitive to the assumed initial conditions. That future state can be measured by the values of a set of fluid variables at some subsequent time, or by the value of a map. Paths in phase space (or in the mapping) that start extremely close to one another diverge from each other exponentially rapidly with time. It is important to distinguish the unpredictability of classical chaos from unpredictability in the evolution of a quantum mechanical system. A classical system evolves under precisely deterministic differential equations. Given a full characterization of the system at any time $t$, the system is fully specified at a time $t + \Delta t$ later, for any $\Delta t$. However, what characterizes a chaotic system is that two identical systems starting in neighboring initial states will eventually follow totally different histories. The time for this to happen is called the Lyapunov time (see Ex. 14.11).
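Both the period-doubling sequence and the sensitivity to initial conditions can be seen by iterating Eq. (14.30) directly, as Exs. 14.10 and 14.11 suggest. The following is a minimal plain-Python sketch. The function names, the starting value $x_1 = 0.3$, the transient lengths and the tolerance are illustrative assumptions, not part of the text. The period of the attractor is estimated by comparing $x_{n+p}$ with $x_n$ after transients have died away. The Lyapunov exponent is estimated as the orbit average of $\ln|f'(x_n)|$, the standard estimator of the limit in Eq. (14.33) for a one-dimensional map.

```python
import math


def logistic(x, a):
    """One step of the Feigenbaum (logistic) map, Eq. (14.30)."""
    return 4.0 * a * x * (1.0 - x)


def attractor_period(a, x0=0.3, transient=20_000, max_period=64, tol=1e-7):
    """Smallest p with x_{n+p} = x_n (to within tol) after transients, else None."""
    x = x0
    for _ in range(transient):          # let the orbit settle onto its attractor
        x = logistic(x, a)
    orbit = [x]
    for _ in range(2 * max_period):
        x = logistic(x, a)
        orbit.append(x)
    for p in range(1, max_period + 1):
        if all(abs(orbit[i + p] - orbit[i]) < tol for i in range(max_period)):
            return p
    return None                         # no short period: chaos, or a very long cycle


def lyapunov_exponent(a, x0=0.3, transient=1_000, n=100_000):
    """Orbit average of ln|f'(x_n)|, with f'(x) = 4a(1 - 2x); cf. Eq. (14.33)."""
    x = x0
    for _ in range(transient):
        x = logistic(x, a)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(4.0 * a * (1.0 - 2.0 * x)))
        x = logistic(x, a)
    return total / n


def divergence_steps(a, x0=0.3, eps=1e-10, threshold=0.1, max_steps=10_000):
    """Iterations until two orbits started eps apart differ by more than threshold."""
    x, y = x0, x0 + eps
    for n in range(1, max_steps + 1):
        x, y = logistic(x, a), logistic(y, a)
        if abs(x - y) > threshold:
            return n
    return None


if __name__ == "__main__":
    for a in (0.70, 0.80, 0.87, 0.89, 1.0):
        print(f"a = {a:.2f}: period = {attractor_period(a)}")
    lam = lyapunov_exponent(1.0)
    print(f"Lyapunov exponent at a = 1: {lam:.3f} (ln 2 = {math.log(2):.3f})")
    print(f"orbits 1e-10 apart separate after {divergence_steps(1.0)} iterations at a = 1")
```

The values $a = 0.70$, $0.80$, $0.87$ and $0.89$ were chosen to lie inside successive windows between the critical parameters $a_j$. For them the sketch reports periods 1, 2, 4 and 8, while at $a = 1$ it finds no short period. There the exponent comes out close to $\ln 2$, the value derived in Ex. 14.11. An initial separation of $10^{-10}$ grows to order $0.1$ within a few tens of iterations, consistent with roughly one factor of 2 per step. Locating the $a_j$ themselves, as Ex. 14.10 asks, requires an additional search in $a$ that this sketch does not attempt.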
The practical significance of this essentially mathematical feature is as follows. As will always be the case, we can only specify the initial state up to a given accuracy (due to practical considerations, not issues of principle). The true initial state could then be any one of those lying in some region, and we have no way of predicting what the state will be after a few Lyapunov times. Quantum mechanical indeterminacy is different. If we can prepare a system in a given state described by a wave function, the evolution will be governed quite deterministically by the time-dependent Schrödinger equation. However, if we choose to make a measurement of an observable, many quite distinct outcomes are immediately possible, and the system will be left in an eigenstate corresponding to the actual measured outcome. The quantum mechanical description of classical chaos is the subject of quantum chaos. The realization that many classical systems have an intrinsic unpredictability despite being deterministic from instant to instant has been widely publicized in popularizations of research into chaos. However, it is not particularly new. It was well understood, for example, by Poincaré at the turn of the twentieth century, and watching the weather report on the nightly news bears witness to its dissemination into the popular culture! What is new and intriguing is the manner in which the transition from a deterministic to a non-predictable evolution happens. Chaotic behavior is well documented in a variety of physical dynamical systems: electrical circuits, nonlinear pendula, dripping faucets, planetary motions, and so on. The extent to which the principles that have been devised to describe chaos in these systems can also be applied to general fluid turbulence remains a matter for debate. There is no question that there are similarities, and there has been quantitative success in applying chaos results to a limited form of turbulent convection. However, most forms of turbulence are not so easily described, and there is still a huge gap between the intriguing mathematics of chaotic dynamics and practical applications to natural and technological flows.

****************************
EXERCISES

Exercise 14.10 Problem: Feigenbaum Sequence
Use a computer to calculate the first five critical parameters $a_j$ in the Feigenbaum sequence, Eq. (14.30). Hence verify that the ratio of successive differences tends toward the limit quoted in Eq. (14.31). (Hint: You might find it helpful to construct a graph to find suitable starting values $x_1$ and starting parameters $a$.)

Exercise 14.11 Example: Lyapunov Exponent
Consider the logistic equation (14.30) for the special case $a = 1$, which is large enough to ensure that chaos has set in.
(a) Make the substitution $x_n = \sin^2\pi\theta_n$ and show that the equation can be expressed in the form $\theta_{n+1} = 2\theta_n \pmod 1$; i.e., $\theta_{n+1}$ = fractional part of $2\theta_n$.
(b) Write $\theta_n$ as a “binimal” (binary decimal). For example, $11/16 = 1/2 + 1/8 + 1/16$ has the binary decimal form 0.1011. Explain what happens to this number in each successive iteration.
(c) Now suppose that an error is made in the $i$’th digit of the starting binimal. When will it cause a major error in the predicted value of $x_n$?
(d) If the error after $n$ iterations is written $\epsilon_n$, show that the Lyapunov exponent $p$ defined by

$$p = \lim_{n\to\infty}\frac{1}{n}\ln\frac{\epsilon_n}{\epsilon_0} \qquad (14.33)$$

is $\ln 2$ (so $\epsilon_n \simeq 2^n\epsilon_0$ for large enough $n$). Lyapunov exponents play an important role in the theory of dynamical systems.

Exercise 14.12 Example: Strange Attractors
Another interesting one-dimensional map is provided by the recursion relation
$$x_{n+1} = a\left(1 - 2\left|x_n - \tfrac{1}{2}\right|\right) . \qquad (14.34)$$

(a) Consider the asymptotic behavior of the variable $x_n$ for different values of the parameter $a$, with both $x_n$ and $a$ being confined to the interval [0, 1]. In particular, show that for $0 < a < a_{\rm crit}$ (for some $a_{\rm crit}$), the sequence $x_n$ converges to a stable fixed point, but for $a_{\rm crit} < a < 1$, the sequence wanders chaotically through some interval $[x_{\rm min}, x_{\rm max}]$.
(b) Using a computer, calculate the value of $a_{\rm crit}$ and the interval $[x_{\rm min}, x_{\rm max}]$ for $a = 0.8$.
(c) The interval $[x_{\rm min}, x_{\rm max}]$ is an example of a strange attractor. It has the property that if we consider sequences with arbitrarily close starting values, their values of $x_n$ in this range will eventually diverge. Show that the attractor is strange by computing the sequences with $a = 0.8$ and starting values $x_1 = 0.5, 0.51, 0.501, 0.5001$. Determine the number of iterations $n_\epsilon$ required to produce significant divergence as a function of $\epsilon = x_1 - 0.5$. It is claimed that $n_\epsilon \sim -\log_2 \epsilon$. Can you verify this?
Note that the onset of chaos at $a = a_{\rm crit}$ is quite sudden in this case, unlike the behavior exhibited by the Feigenbaum sequence.

Exercise 14.13 Problem: Lorenz equations
One of the first discoveries of chaos in a mathematical model was by Lorenz (1963), who made a simple model of atmospheric convection. In this model, the temperature and velocity field are characterized by three variables, $x$, $y$, $z$, which satisfy the coupled, nonlinear differential equations
$$\dot x = 10(y - x) , \qquad \dot y = -xz + 28x - y , \qquad \dot z = xy - 8z/3 . \qquad (14.35)$$

(The precise definitions of x, y, z need not concern us here.) Integrate these equations numerically to show that x, y, z follow non-repeating orbits in the three-dimensional phase space that they span, but follow certain broadly defined paths in this space. (It may be helpful to plot out the trajectories of pairs of the dependent variables.) [Note: These Lorenz equations are often studied with the numbers 10, 28, 8/3 replaced by parameters σ, ρ, and β. As these parameters are varied, the behavior of the system changes.]
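A minimal sketch for Ex. 14.13, using a hand-written classical fourth-order Runge-Kutta integrator in plain Python. The step size dt = 0.01, the integration time, and the starting point (1, 1, 1) are assumptions chosen for illustration, not values from the text. Two orbits whose initial $x$ values differ by $10^{-8}$ are followed side by side: each stays within a bounded region of $(x, y, z)$ space, while their separation grows until it is comparable to the size of that region.

```python
import math

# Coefficients of Eq. (14.35); they are often written sigma, rho, beta.
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0


def lorenz_rhs(s):
    """Right-hand side (xdot, ydot, zdot) of Eq. (14.35)."""
    x, y, z = s
    return (SIGMA * (y - x), -x * z + RHO * x - y, x * y - BETA * z)


def rk4_step(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz_rhs(s)
    k2 = lorenz_rhs(shifted(k1, dt / 2))
    k3 = lorenz_rhs(shifted(k2, dt / 2))
    k4 = lorenz_rhs(shifted(k3, dt))
    return tuple(si + dt / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))


def integrate(s0, dt=0.01, nsteps=4000):
    """Return the list of states s_0, s_1, ..., s_nsteps."""
    traj = [tuple(s0)]
    for _ in range(nsteps):
        traj.append(rk4_step(traj[-1], dt))
    return traj


a = integrate((1.0, 1.0, 1.0))
b = integrate((1.0 + 1e-8, 1.0, 1.0))   # tiny change in the initial x
for n in (0, 1000, 2000, 3000, 4000):
    print(f"t = {n * 0.01:5.1f}  x = {a[n][0]:8.3f}  "
          f"separation = {math.dist(a[n], b[n]):.3e}")
```

Plotting, say, $x$ against $z$ from the returned list of states shows the broadly defined two-lobed paths mentioned in the exercise; a library integrator with adaptive step control would serve equally well.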

****************************
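For Ex. 14.12, a minimal Python sketch is given below. It reads Eq. (14.34) as the tent map $x_{n+1} = a(1 - 2|x_n - 1/2|)$, and it takes “significant divergence” to mean a separation larger than 0.1. That threshold is a hypothetical choice; the resulting counts $n_\epsilon$ depend on it only logarithmically.

```python
def tent(x, a):
    """One iteration of Eq. (14.34), read as x -> a(1 - 2|x - 1/2|)."""
    return a * (1.0 - 2.0 * abs(x - 0.5))


def orbit(x1, a, n):
    """The first n members x_1, ..., x_n of the sequence."""
    xs = [x1]
    for _ in range(n - 1):
        xs.append(tent(xs[-1], a))
    return xs


def divergence_time(a, x1, eps, threshold=0.1, nmax=10_000):
    """Iterations until orbits from x1 and x1 + eps differ by more than threshold."""
    x, y = x1, x1 + eps
    for n in range(1, nmax + 1):
        x, y = tent(x, a), tent(y, a)
        if abs(x - y) > threshold:
            return n
    return nmax


# Late-time behavior for one value of a below and one above the onset of chaos.
for a in (0.3, 0.8):
    tail = orbit(0.5, a, 2000)[-1000:]
    print(f"a = {a}: late iterates lie in [{min(tail):.4f}, {max(tail):.4f}]")

# Divergence of nearby starting values x1 = 0.5 + eps for a = 0.8.
for eps in (1e-2, 1e-3, 1e-4):
    print(f"eps = {eps:.0e}: n_eps = {divergence_time(0.8, 0.5, eps)}")
```

Because each branch of the map has slope $\pm 2a$, a separation can grow by at most a factor $2a$ per iteration; comparing the printed counts with $-\log_2\epsilon$ is the comparison the exercise asks for.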

Bibliographic Note

Turbulence is omitted from many standard textbooks on fluid mechanics, aside from brief descriptions, presumably because it is so poorly understood. Good textbook treatments can be found in White (1991), Tennekes and Lumley (1972), and, from a more physical perspective, Tritton (1977). To develop physical insight into turbulence, we recommend viewing the movie by Stewart (196?) and looking at photographs, e.g. in Van Dyke (1982). For the influence of boundary layers and turbulence on the flight of balls of various sorts, see Adair (1990), Armenti (1992), and Lighthill (1986). For the onset of turbulence, and more generally the onset of chaos in dynamical systems and mathematical maps, see Sagdeev, Usikov and Zaslavsky (1988), and Acheson (1990).

Box 14.2 Important Concepts in Chapter 14
• Weak turbulence contrasted with strong or fully developed turbulence, Sec. 14.1
• Scaling relation, Sec. 14.2
• Stagnation pressure, Sec. 14.2
• Drag coefficient, Sec. 14.2
• Karman vortex street, Sec. 14.2
• Critical Reynolds number, $R_d \sim 1000$, for onset of turbulence, Sec. 14.2
• Entrainment, the Coanda effect, and its role in airplane wings, Secs. 14.2, 14.4.1, Ex. 14.6
• Intermittency, Sec. 14.1
• Role of vorticity in turbulence: stretching of vortex tubes, Sec. 14.3.3, Fig. 14.4
• Eddies, energy cascade, viscous damping at smallest scale, Sec. 14.3.4
• Kolmogorov spectrum, Sec. 14.3.4
• Weak turbulence theory, Sec. 14.3
  – Decomposition into time-averaged flow and fluctuating flow, Sec. 14.3.1
  – Reynolds stress, turbulent viscosity, and their role in coupling fluctuating flow to time-averaged flow, Secs. 14.3.1, 14.3.2
  – The physics that governs the structure of the time-averaged flow in boundary layers, wakes and jets, Sec. 14.4.1, Exs. 14.4, 14.5
• Secular instability contrasted with dynamical instability, Sec. 14.4.2
• Rotating Couette flow, Sec. 14.5.1
• Poincaré map and its use to produce discrete maps that characterize a flow, Sec. 14.5.2
• Lyapunov exponent, Ex. 14.11
• Strange attractor, Ex. 14.12


Bibliography

Acheson, D. J. 1990. Elementary Fluid Dynamics, Oxford: Clarendon Press.
Adair, R. K. 1990. The Physics of Baseball, New York: Harper and Row.
Armenti, A., Jr., editor 1992. The Physics of Sports, New York: The American Institute of Physics.
Drazin, P. G. and Reid, W. H. 1981. Hydrodynamic Stability, Cambridge: Cambridge University Press.
Feigenbaum, M. 1978. “Universal behavior in nonlinear systems,” J. Stat. Phys., 19, 25.
Gollub, J. P. 1980. “Many routes to turbulent convection,” J. Fluid Mech., 100, 449.
Landau, L. D. and Lifshitz, E. M. 1959. Fluid Mechanics, Reading, Massachusetts: Addison Wesley.
Libchaber, A. and Maurer, J. 1978. “Local probe in a Rayleigh-Bénard experiment in liquid helium,” Phys. Lett. Paris, 39, L369.
Lighthill, M. J. 1986. An Informal Introduction to Theoretical Fluid Mechanics, Oxford: Oxford Science Publications.
Lorenz, E. N. 1963. “Deterministic nonperiodic flow,” J. Atmospheric Sciences, 20, 130.
Ott, E. 1981. “Strange attractors and chaotic motions of dynamical systems,” Rev. Mod. Phys., 53, 655.
Ott, E. 1993. Chaos in Dynamical Systems, Cambridge: Cambridge University Press.
Sagdeev, R. Z., Usikov, D. A., and Zaslavsky, G. M. 1988. Non-Linear Physics from the Pendulum to Turbulence and Chaos, Harwood.
Stewart, R. W. 196?. Turbulence, a movie (National Committee for Fluid Mechanics Films); available at http://web.mit.edu/fluids/www/Shapiro/ncfmf.html.
Tanimoto, T. and Um, J. 1999. “Cause of Continuous Oscillations of the Earth,” J. Geophysical Research, 104, No. B12, pp. 28723–28739.
Tennekes, H. and Lumley, J. L. 1972. A First Course in Turbulence, Cambridge: MIT Press.
Tritton, D. J. 1977. Physical Fluid Dynamics, New York: Van Nostrand Reinhold.
Van Dyke, M. 1982. An Album of Fluid Flow, Stanford: Parabolic Press.
Verhulst, P. F. 1838. “Notice sur la loi que la population poursuit dans son accroissement,” Correspondance Mathématique et Physique, 10, 113.
White, F. M. 1991. Viscous Fluid Flow, second edition, New York: McGraw Hill.

Contents

15 Waves and Convection
15.1 Overview
15.2 Gravity Waves on the Surface of a Fluid
15.2.1 Deep Water Waves
15.2.2 Shallow Water Waves
15.2.3 Capillary Waves
15.2.4 Helioseismology
15.3 Nonlinear Shallow Water Waves and Solitons
15.3.1 Korteweg-de Vries (KdV) Equation
15.3.2 Physical Effects in the KdV Equation
15.3.3 Single-Soliton Solution
15.3.4 Two-Soliton Solution
15.3.5 Solitons in Contemporary Physics
15.4 Rossby Waves in a Rotating Fluid
15.5 Sound Waves
15.5.1 Wave Energy
15.5.2 Sound Generation
15.5.3 T2 Radiation Reaction, Runaway Solutions, and Matched Asymptotic Expansions¹
15.6 T2 Convection
15.6.1 T2 Heat Conduction
15.6.2 T2 Boussinesq Approximation
15.6.3 T2 Rayleigh-Bénard Convection
15.6.4 T2 Convection in Stars
15.6.5 T2 Double Diffusion: Salt Fingers

¹ Our treatment is based on Burke (1970).

Chapter 15
Waves and Convection
Version 0815.2.K, 11 February 2009

Box 15.1 Reader’s Guide
• This chapter relies heavily on Chaps. 12 and 13.
• Chap. 16 (compressible flows) relies to some extent on Secs. 15.2, 15.3 and 15.5 of this chapter.
• The remaining chapters of this book do not rely significantly on this chapter.

15.1 Overview

In the preceding chapters, we have derived the basic equations of fluid dynamics and developed a variety of techniques to describe stationary flows. We have also demonstrated how, even if there exists a rigorous, stationary solution of these equations for a time-steady flow, instabilities may develop and the amplitude of oscillatory disturbances can grow with time. The unstable modes of such a flow can usually be thought of as waves that interact strongly with the flow and extract energy from it. Waves, though, are quite general and can be studied independently of their sources. Fluid dynamical waves come in a wide variety of forms. They can be driven by a combination of gravitational, pressure, rotational and surface-tension stresses, and also by mechanical disturbances, such as water rushing past a boat or air passing through a larynx. In this chapter, we shall describe a few examples of wave modes in fluids, chosen to illustrate general wave properties.

The most familiar types of wave are probably gravity waves on a large body of water (Sec. 15.2), e.g. ocean waves and waves on the surfaces of lakes and rivers. We consider these in the linear approximation and find that they are dispersive in general, though they become nondispersive in the long-wavelength (shallow-water) limit. We shall illustrate gravity waves by their roles in helioseismology, the study of coherent-wave modes excited within the body of the sun by convective overturning motions. We shall also examine the effects of surface tension on gravity waves, and in this connection shall develop a mathematical description of surface tension (Box 15.2).

In contrast to the elastodynamic waves of Chap. 11, waves in fluids often develop amplitudes large enough that nonlinear effects become important (Sec. 15.3). The nonlinearities can cause the front of a wave to steepen and then break, a phenomenon we have all seen at the sea shore. It turns out that, at least under some restrictive conditions, nonlinear waves have some very surprising properties. There exist soliton or solitary-wave modes in which the front-steepening due to nonlinearity is stably held in check by dispersion, and particular wave profiles are quite robust and can propagate for long intervals of time without breaking or dispersing. We shall demonstrate this by studying flow in a shallow channel. We shall also explore the remarkable behaviors of such solitons when they pass through each other.

In a nearly rigidly rotating fluid, there is a remarkable type of wave whose restoring force is the Coriolis effect and whose group and phase velocities are oppositely directed. These so-called Rossby waves, studied in Sec. 15.4, are important in both the oceans and the atmosphere.

The simplest fluid waves of all are small-amplitude sound waves, a paradigm for scalar waves. These are nondispersive, just like electromagnetic waves, and are therefore sometimes useful for human communication.
We shall study sound waves in Sec. 15.5 and shall use them to explore an issue in fundamental physics: the radiation reaction force that acts back on a wave-emitting object. We shall also explore how sound waves can be produced by fluid flows. This will be illustrated with the problem of sound generation by high-speed turbulent flows, a problem that provides a good starting point for the topic of the following chapter, compressible flows.

The last section of this chapter, Sec. 15.6, deals with dynamical motions of a fluid that are driven by thermal effects: convection. To understand convection, one must first understand diffusive heat conduction. When viewed microscopically, heat conduction is a transport process similar to viscosity, and it is responsible for analogous physical effects. If a viscous fluid has high viscosity, then vorticity diffuses through it rapidly; similarly, if a fluid has high thermal conductivity, then heat diffuses through it rapidly. In the other extreme, when viscosity is low (i.e., when the Reynolds number is high), instabilities produce turbulence, which transports vorticity far more rapidly than diffusion could possibly do. Analogously, in heated fluids with modest conductivity, the accumulation of heat drives the fluid into convective motion, and the heat is transported much more efficiently by this motion than by thermal conduction alone. As the convective heat transport increases, the fluid motion becomes more vigorous and, if the viscosity is sufficiently low, the thermally driven flow can also become turbulent. These effects are very much in evidence near solid boundaries, where thermal boundary layers can be formed, analogous to viscous boundary layers.

In addition to thermal effects that resemble the effects of viscosity, there are also unique thermal effects, particularly the novel and subtle combined effects of gravity and heat. Heat, unlike vorticity, causes a fluid to expand and thus, in the presence of gravity, to become buoyant; and this buoyancy can drive thermal circulation or free convection in an otherwise stationary fluid. (Free convection should be distinguished from forced convection, in which heat is carried passively by a flow driven in the usual manner by externally imposed pressure gradients, for example when you blow on hot food to cool it.) The transport of heat is a fundamental characteristic of many flows. It dictates the form of global weather patterns and ocean currents. It is also of great technological importance and is studied in detail, for example, in the cooling of nuclear reactors and the design of automobile engines. From a more fundamental perspective, as we have already discussed, the analysis and experimental studies of convection have led to major insights into the route to chaos (cf. Sec. 14.5).

In Sec. 15.6, we shall describe some flows where thermal effects are predominant. We shall begin in Sec. 15.6.1 by modifying the conservation laws of fluid dynamics so as to incorporate heat conduction. Then in Sec. 15.6.2 we shall discuss the Boussinesq approximation, which is appropriate for modest-scale flows where buoyancy is important. This allows us in Sec. 15.6.3 to derive the conditions under which convection is initiated. Unfortunately, this Boussinesq approximation sometimes breaks down. In particular, as we discuss in Sec. 15.6.4, it is inappropriate for application to convection in stars and planets, where circulation takes place over several gravitational scale heights. Here, we shall have to use alternative, more heuristic arguments to derive the relevant criterion for convective instability, known as the Schwarzschild criterion, and to quantify the associated heat flux. We shall apply this theory to the solar convection zone. Finally, in Sec. 15.6.5 we return to simple buoyancy-driven convection in a stratified fluid to consider double diffusion, a quite general type of instability which can arise when the diffusion of two physical quantities (in our case heat and the concentration of salt) renders a fluid unstable, despite the fact that the fluid would be stably stratified if there were gradients in only one of these quantities.

15.2 Gravity Waves on the Surface of a Fluid

Gravity waves¹ are waves on and beneath the surface of a fluid, for which the restoring force is the downward pull of gravity. Familiar examples are ocean waves and the waves produced on the surface of a pond when a pebble is thrown in. Less familiar are the “g-modes” of vibration of the sun, discussed at the end of this section.

Consider a small-amplitude wave propagating along the surface of a flat-bottomed lake with depth $h_0$, as shown in Fig. 15.1. As the water’s displacement is small, we can describe the wave as a linear perturbation about equilibrium. The equilibrium water is at rest, i.e. it has velocity $\mathbf{v} = 0$. The water’s perturbed motion is essentially inviscid and incompressible, so $\nabla\cdot\mathbf{v} = 0$. A simple application of the equation of vorticity transport, Eq. (13.3), assures us that, since the water is static and thus irrotational before and after the wave passes, it must also be irrotational within the wave. Therefore, we can describe the wave inside the water by a velocity potential $\psi$ whose gradient is the velocity field,
$$\mathbf{v} = \nabla\psi . \qquad (15.1)$$

¹ Not to be confused with gravitational waves, which are waves in the gravitational field that propagate at the speed of light and which we shall meet in Chap. 25.

[Fig. 15.1: Gravity waves propagating horizontally across a lake of depth $h_0$. The figure shows the vertical coordinate $z$, the horizontal coordinate $x$, the surface displacement $\xi(x,t)$, and the wavelength $2\pi/k$.]

Incompressibility, $\nabla\cdot\mathbf{v} = 0$, applied to this equation, implies that the velocity potential $\psi$ satisfies Laplace’s equation
$$\nabla^2\psi = 0 . \qquad (15.2)$$
We introduce horizontal coordinates $x$, $y$ and a vertical coordinate $z$ measured upward from the lake’s equilibrium surface (cf. Fig. 15.1), and for simplicity we confine attention to a sinusoidal wave propagating in the $x$ direction with angular frequency $\omega$ and wave number $k$. Then $\psi$ and all other perturbed quantities will have the form $f(z)\exp[i(kx - \omega t)]$ for some function $f(z)$. More general disturbances can be expressed as a superposition of many of these elementary wave modes propagating in various horizontal directions (and, in the limit, as a Fourier integral). All of the properties of such superpositions follow straightforwardly from those of our elementary plane-wave mode, so we shall continue to focus on it.

We must use Laplace’s equation (15.2) to solve for the vertical variation, $f(z)$, of the velocity potential. As the horizontal variation at a particular time is $\propto \exp(ikx)$, direct substitution into Eq. (15.2) gives $d^2f/dz^2 = k^2 f$, so there are two possible vertical variations, $\psi \propto \exp(\pm kz)$. The precise linear combination of these two forms is dictated by the boundary conditions. The one that we shall need is that the vertical component of velocity, $v_z = \partial\psi/\partial z$, vanish at the bottom of the lake ($z = -h_0$). The only combination for which $v_z$ vanishes there is $v_z \propto \sinh[k(z + h_0)]$; its integral, the velocity potential, therefore involves a cosh function:
$$\psi = \psi_0 \cosh[k(z + h_0)] \exp[i(kx - \omega t)] . \qquad (15.3)$$

An alert reader might note at this point that the horizontal velocity does not vanish at the lake bottom, whereas a no-slip condition should apply in practice. In fact, as we discussed in Sec. 13.4, a thin, viscous boundary layer along the bottom of the lake will join our potential-flow solution (15.3) to nonslipping fluid at the bottom. We shall ignore the boundary layer under the (justifiable) assumption that for our oscillating waves it is too thin to affect much of the flow.

Returning to the potential flow, we must also impose a boundary condition at the surface. This can be obtained from Bernoulli’s law. The version of Bernoulli’s law that we need is that for an irrotational, isentropic, time-varying flow:
$$\frac{v^2}{2} + h + \Phi + \frac{\partial\psi}{\partial t} = \text{constant everywhere in the flow} \qquad (15.4)$$

[Eqs. (12.46), (12.50)]. We shall apply this law at the surface of the perturbed water. Let us examine each term:
(i) The term $v^2/2$ is quadratic in a perturbed quantity and therefore can be dropped.
(ii) The enthalpy $h = u + P/\rho$ (cf. Box 12.1) is a constant, since $u$ and $\rho$ are constants throughout the fluid and $P$ is constant on the surface and equal to the atmospheric pressure. [Actually, there will be a slight variation of the surface pressure caused by the varying weight of the air above the surface, but as the density of air is typically $\sim 10^{-3}$ that of water, this is a very small correction.]
(iii) The gravitational potential at the fluid surface is $\Phi = g\xi$, where $\xi(x, t)$ is the surface’s vertical displacement from equilibrium and we ignore an additive constant.
(iv) The constant on the right-hand side, which could depend on time, $C(t)$, can be absorbed into the velocity potential term $\partial\psi/\partial t$ without changing the physical observable $\mathbf{v} = \nabla\psi$.
Bernoulli’s law applied at the surface therefore simplifies to give
$$g\xi + \frac{\partial\psi}{\partial t} = 0 . \qquad (15.5)$$

Now, the vertical component of the surface velocity in the linear approximation is just $v_z(z = 0, t) = \partial\xi/\partial t$. Expressing $v_z$ in terms of the velocity potential, we then obtain
$$\frac{\partial\xi}{\partial t} = v_z = \frac{\partial\psi}{\partial z} . \qquad (15.6)$$

Combining this with the time derivative of Eq. (15.5), we obtain an equation for the vertical gradient of $\psi$ in terms of its time derivative:
$$g\frac{\partial\psi}{\partial z} = -\frac{\partial^2\psi}{\partial t^2} . \qquad (15.7)$$

Finally, substituting Eq. (15.3) into Eq. (15.7) and setting $z = 0$ [because we derived Eq. (15.7) only at the water’s surface], we obtain $gk\psi_0\sinh(kh_0) = \omega^2\psi_0\cosh(kh_0)$, i.e. the dispersion relation for linearized gravity waves:
$$\omega^2 = gk\tanh(kh_0) . \qquad (15.8)$$
How do the individual elements of fluid move in a gravity wave? We can answer this question by first computing the vertical and horizontal components of the velocity by differentiating Eq. (15.3) [Ex. 15.1]. We find that the fluid elements undergo elliptical motion similar to that found for Rayleigh waves on the surface of a solid (Sec. 11.4). However, in gravity waves, the sense of rotation of the particles is always the same at a particular phase of the wave, in contrast to the reversals found in Rayleigh waves. We now consider two limiting cases: deep water and shallow water.
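Before turning to those limits, it may help to see Eq. (15.8) used in practice. The Python sketch below works in SI units with $g = 9.81$ m s$^{-2}$ assumed; it evaluates $\omega(k)$ and inverts it for $k$ at a given frequency by bisection, which is safe because $k\tanh(kh_0)$ increases monotonically with $k$. The function names are illustrative.

```python
import math

G = 9.81  # m s^-2 (SI units assumed)


def omega_of_k(k, h0, g=G):
    """Angular frequency from Eq. (15.8): omega^2 = g k tanh(k h0)."""
    return math.sqrt(g * k * math.tanh(k * h0))


def k_of_omega(omega, h0, g=G, rtol=1e-12):
    """Invert Eq. (15.8) for k by bisection.

    k tanh(k h0) increases monotonically with k, so the root is unique."""
    lo, hi = 0.0, 1.0
    while omega_of_k(hi, h0, g) < omega:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > rtol * hi:
        mid = 0.5 * (lo + hi)
        if omega_of_k(mid, h0, g) < omega:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Wavelength of a 10-second swell in deep (1 km) and shallow (1 m) water.
T = 10.0
omega = 2 * math.pi / T
for h0 in (1000.0, 1.0):
    k = k_of_omega(omega, h0)
    print(f"h0 = {h0:7.1f} m: wavelength = {2 * math.pi / k:6.1f} m, k h0 = {k * h0:.3g}")
```

For a 10 s swell the deep-water wavelength should come out close to $gT^2/2\pi \approx 156$ m, and in 1 m of water close to $\sqrt{gh_0}\,T \approx 31$ m, anticipating the two limits discussed next.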


15.2.1 Deep Water Waves

When the water is deep compared to the wavelength of the waves, $kh_0 \gg 1$, the dispersion relation (15.8) is approximately
$$\omega = \sqrt{gk} . \qquad (15.9)$$
Thus, deep water waves are dispersive; their group velocity $V_g \equiv d\omega/dk = \tfrac{1}{2}\sqrt{g/k}$ is half their phase velocity, $V_\phi \equiv \omega/k = \sqrt{g/k}$. [Note: We could have deduced the deep-water dispersion relation (15.9), up to a dimensionless multiplicative constant, by dimensional arguments: The only frequency that can be constructed from the relevant variables $g$, $k$, $\rho$ is $\sqrt{gk}$.]
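Differentiating Eq. (15.8) gives the group-to-phase velocity ratio at arbitrary depth, $V_g/V_\phi = \tfrac{1}{2}[1 + 2kh_0/\sinh(2kh_0)]$, which tends to $\tfrac{1}{2}$ in deep water and to 1 in shallow water. The Python sketch below (with $g = 9.81$ m s$^{-2}$ and a depth of 1 m assumed) checks that closed form against a centered finite-difference estimate of $d\omega/dk$.

```python
import math

G = 9.81  # m s^-2 (assumed)


def omega(k, h0, g=G):
    """Dispersion relation, Eq. (15.8)."""
    return math.sqrt(g * k * math.tanh(k * h0))


def phase_velocity(k, h0, g=G):
    return omega(k, h0, g) / k


def group_velocity_numeric(k, h0, g=G, rel_step=1e-6):
    """Centered finite-difference estimate of d(omega)/dk."""
    dk = rel_step * k
    return (omega(k + dk, h0, g) - omega(k - dk, h0, g)) / (2 * dk)


def group_to_phase_ratio(k, h0):
    """Closed form from differentiating Eq. (15.8):
    V_g / V_phi = (1/2) [1 + 2 k h0 / sinh(2 k h0)]."""
    x = 2 * k * h0
    return 0.5 * (1 + x / math.sinh(x))


h0 = 1.0  # depth in m, so k h0 equals k numerically
for kh0 in (0.05, 0.5, 1.0, 3.0, 10.0):
    numeric = group_velocity_numeric(kh0, h0) / phase_velocity(kh0, h0)
    print(f"k h0 = {kh0:5.2f}: V_g/V_phi = {numeric:.6f} "
          f"(closed form {group_to_phase_ratio(kh0, h0):.6f})")
```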

15.2.2 Shallow Water Waves

For shallow water waves, with $kh_0 \ll 1$, the dispersion relation (15.8) becomes
$$\omega = \sqrt{gh_0}\, k . \qquad (15.10)$$
Thus, these waves are nondispersive; their phase and group velocities are $V_\phi = V_g = \sqrt{gh_0}$.

Below, when studying solitons, we shall need two special properties of shallow water waves. First, when the depth of the water is small compared with the wavelength, but not very small, the waves will be slightly dispersive. We can obtain a correction to Eq. (15.10) by expanding the tanh function of Eq. (15.8) as $\tanh x = x - x^3/3 + \dots$. The dispersion relation then becomes
$$\omega = \sqrt{gh_0}\left(1 - \frac{1}{6}k^2h_0^2\right)k . \qquad (15.11)$$
Second, by computing $\mathbf{v} = \nabla\psi$ from Eq. (15.3), we find that in the shallow-water limit the horizontal motions are much larger than the vertical motions, and are essentially independent of depth. The reason, physically, is that the fluid acceleration is produced almost entirely by a horizontal pressure gradient (caused by spatially variable water depth) that is independent of height; see Ex. 15.1.
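The accuracy of the corrected relation (15.11) can be checked numerically by comparing both shallow-water forms with the exact relation (15.8), written as the dimensionless ratio $\omega/(\sqrt{gh_0}\,k) = \sqrt{\tanh(kh_0)/(kh_0)}$. A minimal Python sketch:

```python
import math


def exact_ratio(x):
    """omega / (sqrt(g h0) k) from Eq. (15.8), as a function of x = k h0."""
    return math.sqrt(math.tanh(x) / x)


def corrected_ratio(x):
    """The same ratio from the slightly dispersive relation, Eq. (15.11)."""
    return 1 - x * x / 6


for x in (0.3, 0.1, 0.05):
    err_nondispersive = abs(1 - exact_ratio(x))           # error of Eq. (15.10)
    err_corrected = abs(corrected_ratio(x) - exact_ratio(x))  # error of Eq. (15.11)
    print(f"k h0 = {x:4.2f}: (15.10) error {err_nondispersive:.2e}, "
          f"(15.11) error {err_corrected:.2e}")
```

The error of (15.10) shrinks like $(kh_0)^2/6$, while that of (15.11) shrinks roughly like $(kh_0)^4$, so the correction is worth keeping whenever $kh_0$ is small but not negligible.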

15.2.3 Capillary Waves

When the wavelength is very short (so $k$ is very large), we must include the effects of surface tension on the surface boundary condition. This can be done by a very simple, heuristic argument. Surface tension is usually treated as an isotropic force per unit length, $\gamma$, that lies in the surface and is unaffected by changes in the shape or size of the surface; see Box 15.2. In the case of a gravity wave, this tension produces on the fluid’s surface a net downward force per unit area $-\gamma\, d^2\xi/dx^2 = \gamma k^2\xi$, where $k$ is the horizontal wave number. [This downward force is like that on a curved violin string; cf. Eq. (11.27) and associated discussion.] This additional force must be included in Eq. (15.5) (multiplied through by $\rho$) as an augmentation of $\rho g$. Correspondingly, the effect of surface tension on a mode with wave number $k$ is simply to change the true gravity to an effective gravity
$$g \to g + \frac{\gamma k^2}{\rho} . \qquad (15.12)$$
The remainder of the derivation of the dispersion relation for deep gravity waves carries over unchanged, and the dispersion relation becomes
$$\omega^2 = gk + \frac{\gamma k^3}{\rho} \qquad (15.13)$$
[cf. Eqs. (15.9) and (15.12)]. When the second term dominates, the waves are sometimes called capillary waves.

Box 15.2 Surface Tension
In a water molecule, the two hydrogen atoms stick out from the larger oxygen atom somewhat like Mickey Mouse’s ears, with an H-O-H angle of 105 degrees. This asymmetry of the molecule gives rise to a rather large electric dipole moment. In the interior of a body of water, the dipole moments are oriented rather randomly, but near the water’s surface they tend to be parallel to the surface and bond with each other so as to create surface tension: a macroscopically isotropic, two-dimensional tension force (force per unit length) $\gamma$ that is confined to the water’s surface.

[Box 15.2 drawings: (a) a line L of unit length in the water’s surface, with the tension $\gamma$ acting orthogonally across it; (b) a point P in the surface, with local Cartesian axes $x$, $y$ in the surface and $z$ orthogonal to it.]

More specifically, consider a line L in the water’s surface, with unit length [drawing (a) above]. The surface water on one side of L exerts a tension (pulling) force on the surface water on the other side. The magnitude of this force is $\gamma$, and it is orthogonal to the line L regardless of L’s orientation. This is analogous to an isotropic pressure $P$ in three dimensions, which acts orthogonally across any unit area. Choose a point P in the water’s surface and introduce local Cartesian coordinates there with $x$ and $y$ lying in the surface and $z$ orthogonal to it [drawing (b) above]. In this coordinate system, the 2-dimensional stress tensor associated with surface tension has components ${}^{(2)}T_{xx} = {}^{(2)}T_{yy} = -\gamma$, analogous to the 3-dimensional stress tensor for an isotropic pressure, $T_{xx} = T_{yy} = T_{zz} = P$. We can also use a 3-dimensional stress tensor to describe the surface tension: $T_{xx} = T_{yy} = -\gamma\delta(z)$; all other $T_{jk} = 0$. If we integrate this 3-dimensional stress tensor through the water’s surface, we obtain the 2-dimensional stress tensor: $\int T_{jk}\, dz = {}^{(2)}T_{jk}$; i.e., $\int T_{xx}\, dz = \int T_{yy}\, dz = -\gamma$. The 2-dimensional metric of the surface is ${}^{(2)}\mathbf{g} = \mathbf{g} - \mathbf{e}_z \otimes \mathbf{e}_z$; in terms of this 2-dimensional metric, the surface tension’s 3-dimensional stress tensor is $\mathbf{T} = -\gamma\delta(z)\,{}^{(2)}\mathbf{g}$.

Water is not the only fluid that exhibits surface tension; all fluids do so, at the interfaces between themselves and other substances. For a thin film, e.g. a soap bubble, there are two interfaces (the top face and the bottom face of the film), so the stress tensor is twice as large as for a single surface, $\mathbf{T} = -2\gamma\delta(z)\,{}^{(2)}\mathbf{g}$. The hotter the fluid, the more randomly oriented are its surface molecules, and hence the smaller the fluid’s surface tension $\gamma$. For water, $\gamma$ varies from 75.6 dyne/cm at $T = 0$ C, to 72.0 dyne/cm at $T = 25$ C, to 58.9 dyne/cm at $T = 100$ C. In Exs. 15.3 and 15.4 we explore some applications of surface tension. In Sec. 15.2.3 and Ex. 15.5 we explore the influence of surface tension on water waves.
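One easily computed consequence of Eq. (15.13) is that the phase velocity $V_\phi = \omega/k = \sqrt{g/k + \gamma k/\rho}$ has a minimum at $k_* = \sqrt{\rho g/\gamma}$, where the gravity and surface-tension terms are equal; shorter waves are dominated by surface tension and longer ones by gravity. The Python sketch below evaluates this in CGS units, assuming $g = 981$ cm s$^{-2}$, $\rho = 1$ g cm$^{-3}$ and the 25 C value $\gamma = 72.0$ dyne/cm from Box 15.2, and confirms the analytic minimum with a brute-force grid search.

```python
import math

# CGS values assumed for illustration.
G = 981.0      # cm s^-2
GAMMA = 72.0   # dyne/cm, water at 25 C (Box 15.2)
RHO = 1.0      # g cm^-3


def phase_velocity(k, g=G, gamma=GAMMA, rho=RHO):
    """V_phi = omega/k from Eq. (15.13): omega^2 = g k + gamma k^3 / rho."""
    return math.sqrt(g / k + gamma * k / rho)


# Analytic minimum: d(V_phi^2)/dk = -g/k^2 + gamma/rho = 0.
k_star = math.sqrt(RHO * G / GAMMA)
v_min = phase_velocity(k_star)

# Brute-force check on a logarithmic grid from k = 0.01 to 100 cm^-1.
ks = [10 ** (-2 + 4 * i / 100_000) for i in range(100_001)]
k_grid = min(ks, key=phase_velocity)

print(f"k* = {k_star:.3f} cm^-1, wavelength = {2 * math.pi / k_star:.2f} cm")
print(f"minimum phase velocity = {v_min:.1f} cm/s (grid search gives k = {k_grid:.3f})")
```

With these values the minimum phase speed is about 23 cm/s, at a wavelength of about 1.7 cm.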

15.2.4 Helioseismology

The sun provides an excellent example of the excitation of small-amplitude waves in a fluid body. In the 1960s, Robert Leighton and colleagues discovered that the surface of the sun oscillates vertically with a period of roughly five minutes and a speed of $\sim 1$ km s$^{-1}$. This was thought to be an incoherent surface phenomenon until it was shown that the observed variation was, in fact, the superposition of thousands of highly coherent wave modes excited within the sun’s interior: normal modes of the sun. Present-day techniques allow surface velocity amplitudes as small as 2 mm s$^{-1}$ to be measured, and phase coherence for intervals as long as a year has been observed. Studying the frequency spectrum and its variation provides a unique probe of the sun’s interior structure, just as the measurement of conventional seismic waves, as described in Sec. 11.4, probes the earth’s interior.

The description of the normal modes of the sun requires some modification of our treatment of gravity waves. We shall eschew details and just outline the principles. First, the sun is (very nearly) spherical. We therefore work in spherical polar coordinates rather than Cartesian coordinates. Second, the sun is made of hot gas, and it is no longer a good approximation to assume that the fluid is always incompressible. We must therefore replace the equation $\nabla\cdot\mathbf{v} = 0$ with the full equation of continuity (mass conservation) together with the equation of energy conservation, which governs the relationship between the density and pressure perturbations. Third, the sun is not uniform. The pressure and density in the unperturbed gas vary with radius in a known manner and must be included. Fourth, the sun has a finite surface area. Instead of assuming that there will be a continuous spectrum of waves, we must now anticipate that the boundary conditions will lead to a discrete spectrum of normal modes.

Allowing for these complications, it is possible to derive a differential equation to replace Eq. (15.7). It turns out that a convenient dependent variable (replacing the velocity potential $\psi$) is the pressure perturbation. The boundary conditions are that the displacement vanish at the center of the sun and that the pressure perturbation vanish at the surface. At this point the problem is reminiscent of the famous solution for the eigenfunctions of the Schrödinger equation for a hydrogen atom in terms of associated Laguerre polynomials. The wave frequencies of the sun’s normal modes are given by the eigenvalues of the differential equation. The corresponding eigenfunctions can be classified using three quantum numbers, $n$, $l$, $m$, where $n$ counts the number of radial nodes in the eigenfunction and the angular variation is proportional to the spherical harmonic $Y_{lm}(\theta, \phi)$. If the sun were precisely spherical, the modes that are distinguished only by their $m$ quantum number would be degenerate, just as is the case with an atom when there is no preferred direction in space. However, the sun rotates with a latitude-dependent period in the range $\sim$25 to 30 days, and this breaks the degeneracy, just as an applied magnetic field in an atom breaks the degeneracy of the atom’s states (the Zeeman effect). From the splitting of the solar-mode spectrum, it is possible to learn about the distribution of rotational angular momentum inside the sun.

When this problem is solved in detail, it turns out that there are two general classes of modes. One class is similar to gravity waves, in the sense that the forces which drive the gas’s motions are produced primarily by gravity (either directly, or indirectly via the weight of overlying material producing pressure that pushes on the gas). These are called g modes. In the second class (known as p and f modes), the pressure forces arise mainly from the compression of the fluid, just as in sound waves (which we shall study in Sec. 15.5 below). Now, it turns out that the g modes have large amplitudes in the middle of the sun, whereas the p and f modes are dominant in the outer layers [cf. Fig. 15.2(b)]. The reasons for this are relatively easy to understand and introduce ideas to which we shall return: The sun is a hot body, much hotter at its center ($T \sim 1.5 \times 10^7$ K) than on its surface ($T \sim 6000$ K). The sound speed $c$ is therefore much greater in its interior, and so p and f modes of a given frequency $\omega$ can carry their energy flux $\sim \rho\xi^2\omega^2 c$ (Sec. 15.5) with much smaller amplitudes $\xi$ than near the surface. That is why the p- and f-mode amplitudes are much smaller in the center of the sun than near the surface.

[Fig. 15.2 panels: (a) frequency (mHz) versus spherical harmonic degree $l$; (b) radial-velocity eigenfunctions versus $r/R$ for the modes g10 ($l = 2$), g18 ($l = 4$), p17 ($l = 20$) and p10 ($l = 60$).]

Fig. 15.2: (a) Measured frequency spectrum for solar p-modes with different values of the quantum numbers $n$, $l$. The error bars are magnified by a factor 1000. Frequencies for modes with $n > 30$ and $l > 1000$ have been measured. (b) Sample eigenfunctions for g and p modes labeled by $n$ (subscripts) and $l$ (parentheses). The ordinate is the radial velocity and the abscissa is fractional radial distance from the sun’s center to its surface. The solar convection zone is the dashed region at the bottom. (Adapted from Libbrecht and Woodard 1991.)
The g modes are controlled by different physics and thus behave differently: The outer ∼ 30 percent (by radius) of the sun is convective (cf. Sec. 15.6.4) because the diffusion of photons is inadequate to carry the huge amount of nuclear energy being generated in the solar core. The convection produces an equilibrium variation of pressure and density with radius that is just such as to keep the sun almost neutrally stable, so that regions that are slightly hotter (cooler) than their surroundings will rise (sink) in the solar gravitational field. Therefore there cannot be much of a mechanical restoring force to make these regions oscillate about their average positions, and so the g modes (which are influenced almost solely by gravity) are evanescent in the convection zone, and their amplitudes decay quickly with increasing radius there. We should therefore expect only p and f modes to be seen in the surface motions, and this is, indeed, the case. Furthermore, we should not expect the properties of these modes to be very sensitive to the physical conditions in the core. A more detailed analysis bears this out.

****************************
EXERCISES

Exercise 15.1 Problem: Fluid Motions in Gravity Waves
(a) Show that in a gravity wave in water of arbitrary depth, each fluid element undergoes elliptical motion. (Assume that the amplitude of the water’s displacement is small compared to a wavelength.)
(b) Calculate the longitudinal diameter of the motion’s ellipse, and the ratio of vertical to longitudinal diameters, as functions of depth.
(c) Show that for a deep-water wave, $kh_o \gg 1$, the ellipses are all circles with diameters that die out exponentially with depth.
(d) We normally think of a circular motion of fluid as entailing vorticity, but a gravity wave in water has vanishing vorticity. How can this vanishing vorticity be compatible with the circular motion of fluid elements?
(e) Show that for a shallow-water wave, $kh_o \ll 1$, the motion is (nearly) horizontal and independent of height z.
(f) Compute the pressure perturbation P(x, z) inside the fluid for arbitrary depth. Show that, for a shallow-water wave, the pressure is determined by the need to balance the weight of the overlying fluid, but for general depth, vertical fluid accelerations alter this condition of weight balance.

Exercise 15.2 Problem: Maximum size of a water droplet
What is the maximum size of water droplets that can form by water very slowly dripping out of a pipette? And out of a water faucet?

Exercise 15.3 Problem: Force balance for an interface between two fluids
Consider a point P in the curved interface between two fluids. Introduce Cartesian coordinates at P with x and y parallel to the interface and z orthogonal [as in diagram (b) of Box 15.2], and orient the x and y axes along the directions of the interface’s “principal curvatures”, so the local equation for the interface is
$$ z = \frac{x^2}{2R_1} + \frac{y^2}{2R_2} . \tag{15.14} $$
Here $R_1$ and $R_2$ are the surface’s “principal radii of curvature” at P; note that each of them can be positive or negative, depending on whether the surface bends up or down along their directions. Show that stress balance ∇ · T = 0 for the surface implies that the pressure difference across the surface is
$$ \Delta P = \gamma\left(\frac{1}{R_1} + \frac{1}{R_2}\right) , \tag{15.15} $$
where γ is the surface tension.

Exercise 15.4 Challenge: Minimum Area of Soap Film
For a soap film that is attached to a bent wire (e.g. to the circular wire that a child uses to blow a bubble), the air pressure on the film’s two sides is the same. Therefore, Eq. (15.15) (with γ replaced by 2γ since the film has two faces) tells us that at every point of the film, its two principal radii of curvature must be equal and opposite, $R_1 = -R_2$. It is an interesting exercise in differential geometry to show that this means that the soap film’s surface area is an extremum with respect to variations of the film’s shape, holding its boundary on the wire fixed. If you know enough differential geometry, prove this extremal-area property of soap films, and then show that, in order for the film’s shape to be stable, its extremal area must actually be a minimum.

Exercise 15.5 Problem: Capillary Waves
Consider deep-water gravity waves of short enough wavelength that surface tension must be included, so the dispersion relation is Eq. (15.13). Show that there is a minimum value of the group velocity and find its value together with the wavelength of the associated wave. Evaluate these for water (γ ∼ 0.07 N m⁻¹). Try performing a crude experiment to verify this phenomenon.

Exercise 15.6 Example: Boat Waves
A toy boat moves with uniform velocity u across a deep pond (Fig. 15.3). Consider the wave pattern (time-independent in the boat’s frame) produced on the water’s surface at distances large compared to the boat’s size. Both gravity waves and surface-tension or capillary waves are excited.
Show that capillary waves are found both ahead of and behind the boat, and gravity waves solely inside a trailing wedge. More specifically:
(a) In the rest frame of the water, the waves’ dispersion relation is Eq. (15.13). Change notation so ω is the waves’ angular frequency as seen in the boat’s frame and $\omega_o$ in the water’s frame, so the dispersion relation is $\omega_o^2 = gk + (\gamma/\rho)k^3$. Use the Doppler shift (i.e. the transformation between frames) to derive the boat-frame dispersion relation ω(k).
(b) The boat radiates a spectrum of waves in all directions. However, only those with vanishing frequency in the boat’s frame, ω = 0, contribute to the time-independent (“stationary”) pattern. As seen in the water’s frame and analyzed in the geometric-optics approximation of Chap. 6, these waves are generated by the boat (at points along its dash-dot trajectory in Fig. 15.3) and travel outward with the group velocity $V_{go}$. Regard Fig. 15.3 as a snapshot of the boat and water at a particular moment of time. Consider a wave that was generated at an earlier time, when the boat was at location P, and that traveled outward from there with speed $V_{go}$ at an angle φ to the boat’s direction of motion. (You may restrict yourself to 0 ≤ φ ≤ π/2.) Identify the point Q that this wave has reached, at the time of the snapshot, by the angle θ shown in the figure. Show that θ is given by
$$ \tan\theta = \frac{V_{go}(k)\sin\phi}{u - V_{go}(k)\cos\phi} , \tag{15.16a} $$
where k is determined by the dispersion relation $\omega_o(k)$ together with the “vanishing ω” condition
$$ \omega_o(k, \phi) = uk\cos\phi . \tag{15.16b} $$

[Figure 15.3: the boat, moving with velocity u, and its wave pattern; labels mark the emission point P, the point Q reached by a wave traveling with group velocity $V_{go}$, and the gravity-wave wedge of half-angle $\theta_{gw}$.]
Fig. 15.3: Capillary and Gravity waves excited by a small boat (Ex. 15.6).

(c) Specialize to capillary waves [$k \gg \sqrt{g\rho/\gamma}$]. Show that
$$ \tan\theta = \frac{3\tan\phi}{2\tan^2\phi - 1} . \tag{15.17} $$
Demonstrate that the capillary-wave pattern is present for all values of θ (including in front of the boat, π/2 < θ < π, and behind it, 0 ≤ θ ≤ π/2).

(d) Next, specialize to gravity waves and show that
$$ \tan\theta = \frac{\tan\phi}{2\tan^2\phi + 1} . \tag{15.18} $$
Demonstrate that the gravity-wave pattern is confined to a trailing wedge with angles $\theta < \theta_{gw} = \sin^{-1}(1/3) = 19.47^\circ$; cf. Fig. 15.3. You might try to reproduce these results experimentally.

Exercise 15.7 Example: Shallow-Water Waves with Variable Depth; Tsunamis²
Consider shallow-water waves in which the height of the bottom boundary varies, so the unperturbed water’s depth is variable, $h_o = h_o(x, y)$.
(a) Show that the wave equation for the perturbation ξ(x, y, t) of the water’s height takes the form
$$ \frac{\partial^2\xi}{\partial t^2} - \nabla\cdot(gh_o\nabla\xi) = 0 . \tag{15.19} $$
Note that $gh_o$ is the square of the wave’s propagation speed $c^2$ (phase speed and group speed), so this equation takes the form that we studied in the geometric-optics approximation in Sec. 6.3.1.
(b) Describe what happens to the direction of propagation of a wave as the depth $h_o$ of the water varies (either as a set of discrete jumps in $h_o$ or as a slowly varying $h_o$). As a specific example, how must the propagation direction change as waves approach a beach (but when they are sufficiently far out from the beach that nonlinearities have not yet caused them to begin to break)? Compare with your own observations at a beach.
(c) Tsunamis are waves with enormous wavelengths, ∼ 100 km or so, that propagate on the deep ocean. Since the ocean depth is typically ∼ 4 km, tsunamis are governed by the shallow-water wave equation (15.19). What would you have to do to the ocean floor to create a lens that would focus a tsunami, generated by an earthquake near Japan, so that it destroys Los Angeles? For simulations of tsunami propagation, see, e.g., http://bullard.esc.cam.ac.uk/~taylor/Tsunami.html .
(d) The height of a tsunami, when it is in the ocean with depth $h_o \sim 4$ km, is only ∼ 1 meter or less. Use Eq. (15.19) to show that the tsunami height will increase by a large factor as the tsunami nears the shore.

****************************

² Exercise courtesy David Stevenson.
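Exercise 15.6(d) quotes the wedge half-angle $\theta_{gw} = \sin^{-1}(1/3)$. As a quick numerical cross-check (a sketch, not the analytic demonstration the exercise asks for), one can scan the emission angle φ in Eqs. (15.17) and (15.18) and record the range of θ that results. The function names are illustrative; `math.atan2` is used for the capillary branch so that θ lands in (0, π) when the denominator of Eq. (15.17) changes sign.

```python
import math

def gravity_wake_angle(phi):
    """Angle theta of the point Q for a gravity wave emitted at angle phi, Eq. (15.18)."""
    t = math.tan(phi)
    return math.atan(t / (2.0 * t * t + 1.0))

def capillary_wake_angle(phi):
    """Eq. (15.17); atan2 keeps theta in (0, pi) when 2 tan^2(phi) - 1 turns negative."""
    t = math.tan(phi)
    return math.atan2(3.0 * t, 2.0 * t * t - 1.0)

# Sample emission angles strictly inside 0 < phi < pi/2.
n = 100000
phis = [(i + 0.5) * (math.pi / 2) / n for i in range(n)]

theta_max = max(gravity_wake_angle(p) for p in phis)
print(f"largest gravity-wave theta: {math.degrees(theta_max):.2f} deg")
print(f"arcsin(1/3):                {math.degrees(math.asin(1 / 3)):.2f} deg")

cap = [capillary_wake_angle(p) for p in phis]
print(f"capillary theta spans {math.degrees(min(cap)):.1f} to {math.degrees(max(cap)):.1f} deg")
```

The largest gravity-wave angle agrees with $\sin^{-1}(1/3) \approx 19.47^\circ$, while the capillary angles fill essentially the whole range 0 < θ < π, consistent with capillary waves appearing both ahead of and behind the boat.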

15.3

Nonlinear Shallow Water Waves and Solitons

In recent decades, solitons or solitary waves have been studied intensively in many different areas of physics. However, fluid dynamicists became familiar with them in the nineteenth century. In an oft-quoted passage, John Scott-Russell (1844) described how he was riding along a narrow canal and watched a boat stop abruptly. This deceleration launched a single smooth pulse of water which he followed on horseback for one or two miles, observing it “rolling on at a rate of some eight or nine miles an hour, preserving its original figure some thirty feet long and a foot to a foot and a half in height”. This was a soliton: a one-dimensional, nonlinear wave with fixed profile traveling with constant speed. Solitons can be observed fairly readily when gravity waves are produced in shallow, narrow channels. We shall use the particular example of a shallow, nonlinear gravity wave to illustrate solitons in general.

15.3.1

Korteweg-de Vries (KdV) Equation

The key to a soliton’s behavior is a robust balance between the effects of dispersion and the effects of nonlinearity. When one grafts these two effects onto the wave equation for shallow-water waves, then at leading order in the strengths of the dispersion and nonlinearity one gets the Korteweg-de Vries (KdV) equation for solitons. Since a completely rigorous derivation of the KdV equation is quite lengthy, we shall content ourselves with a somewhat heuristic derivation that is based on this grafting process, and is designed to emphasize the equation’s physical content.

We choose as the dependent variable in our wave equation the height ξ of the water’s surface above its quiescent position, and we confine ourselves to a plane wave that propagates in the horizontal x direction, so ξ = ξ(x, t). In the limit of very weak waves, ξ(x, t) is governed by the shallow-water dispersion relation $\omega = \sqrt{gh_o}\,k$, where $h_o$ is the depth of the quiescent water. This dispersion relation implies that ξ(x, t) must satisfy the following elementary wave equation:
$$ 0 = \frac{\partial^2\xi}{\partial t^2} - gh_o\frac{\partial^2\xi}{\partial x^2} = \left(\frac{\partial}{\partial t} - \sqrt{gh_o}\,\frac{\partial}{\partial x}\right)\left(\frac{\partial}{\partial t} + \sqrt{gh_o}\,\frac{\partial}{\partial x}\right)\xi . \tag{15.20} $$
In the second expression, we have factored the wave operator into two pieces, one that governs waves propagating rightward, and the other leftward. To simplify our derivation and the final wave equation, we shall confine ourselves to rightward-propagating waves, and correspondingly we can simply remove the left-propagation operator from the wave equation, obtaining
$$ \frac{\partial\xi}{\partial t} + \sqrt{gh_o}\,\frac{\partial\xi}{\partial x} = 0 . \tag{15.21} $$
(Leftward-propagating waves are described by this same equation with a change of sign.)

We now graft the effects of dispersion onto this rightward wave equation. The dispersion relation, including the effects of dispersion at leading order, is $\omega = \sqrt{gh_o}\,k\,(1 - \tfrac{1}{6}k^2h_o^2)$ [Eq. (15.11)]. Now, this dispersion relation ought to be derivable by assuming a variation ξ ∝ exp[i(kx − ωt)] and substituting into a generalization of Eq. (15.21) with corrections that take account of the finite depth of the channel. We will take a shortcut and reverse this process to obtain the generalization of Eq. (15.21) from the dispersion relation. The result is
$$ \frac{\partial\xi}{\partial t} + \sqrt{gh_o}\,\frac{\partial\xi}{\partial x} = -\frac{1}{6}\sqrt{gh_o}\,h_o^2\,\frac{\partial^3\xi}{\partial x^3} , \tag{15.22} $$
as a direct calculation confirms. This is the “linearized KdV equation”. It incorporates weak dispersion associated with the finite depth of the channel but is still a linear equation, only useful for small-amplitude waves.

Now let us set aside the dispersive correction and tackle nonlinearity. For this purpose we return to first principles for waves in very shallow water. Let the height of the surface above the channel bottom be $h = h_o + \xi$. Since the water is very shallow, the horizontal velocity, $v \equiv v_x$, is almost independent of depth (aside from the boundary layer, which we ignore); cf. the discussion following Eq. (15.11). The flux of water mass, per unit width of channel, is therefore ρhv and the mass per unit width is ρh. The law of mass conservation therefore takes the form
$$ \frac{\partial h}{\partial t} + \frac{\partial(hv)}{\partial x} = 0 , \tag{15.23a} $$
where we have canceled the constant density. This equation contains a nonlinearity in the product hv. A second nonlinear equation for h and v can be obtained from the x component of the inviscid Navier-Stokes equation ∂v/∂t + v ∂v/∂x = −(1/ρ)∂p/∂x, with p determined by the weight of the overlying water, p = gρ[h(x) − z]:
$$ \frac{\partial v}{\partial t} + v\frac{\partial v}{\partial x} + g\frac{\partial h}{\partial x} = 0 . \tag{15.23b} $$
Equations (15.23a) and (15.23b) can be combined to obtain
$$ \frac{\partial\left(v - 2\sqrt{gh}\right)}{\partial t} + \left(v - \sqrt{gh}\right)\frac{\partial\left(v - 2\sqrt{gh}\right)}{\partial x} = 0 . \tag{15.23c} $$
This equation shows that the quantity $v - 2\sqrt{gh}$ is constant along characteristics that propagate with speed $v - \sqrt{gh}$. (This constant quantity is a special case of a “Riemann invariant”, a concept that we shall study in Chap. 16.) When, as we shall require below, the nonlinearities are modest so h does not differ greatly from $h_o$, these characteristics propagate leftward, which implies that for rightward-propagating waves they begin at early times in undisturbed fluid where v = 0 and $h = h_o$. Therefore, the constant value of $v - 2\sqrt{gh}$ is $-2\sqrt{gh_o}$, and correspondingly in regions of disturbed fluid
$$ v = 2\left(\sqrt{gh} - \sqrt{gh_o}\right) . \tag{15.24} $$

Substituting this into Eq. (15.23a), we obtain
$$ \frac{\partial h}{\partial t} + \left(3\sqrt{gh} - 2\sqrt{gh_o}\right)\frac{\partial h}{\partial x} = 0 . \tag{15.25} $$
We next substitute $\xi = h - h_o$ and expand to second order in ξ [using $\sqrt{gh} \simeq \sqrt{gh_o}\,(1 + \xi/2h_o)$, so that $3\sqrt{gh} - 2\sqrt{gh_o} \simeq \sqrt{gh_o}\,(1 + 3\xi/2h_o)$] to obtain the final form of our wave equation with nonlinearities but no dispersion:
$$ \frac{\partial\xi}{\partial t} + \sqrt{gh_o}\,\frac{\partial\xi}{\partial x} = -\frac{3\xi}{2}\sqrt{\frac{g}{h_o}}\,\frac{\partial\xi}{\partial x} , \tag{15.26} $$
where the term on the right-hand side is the nonlinear correction.

We now have separate dispersive corrections (15.22) and nonlinear corrections (15.26) to the rightward wave equation (15.21). Combining the two corrections into a single equation, we obtain
$$ \frac{\partial\xi}{\partial t} + \sqrt{gh_o}\left[\left(1 + \frac{3\xi}{2h_o}\right)\frac{\partial\xi}{\partial x} + \frac{h_o^2}{6}\frac{\partial^3\xi}{\partial x^3}\right] = 0 . \tag{15.27} $$
Finally, we substitute
$$ \chi \equiv x - \sqrt{gh_o}\,t \tag{15.28} $$
to transform into a frame moving rightward with the speed of small-amplitude gravity waves. The result is the full Korteweg-de Vries or KdV equation:
$$ \frac{\partial\xi}{\partial t} + \frac{3}{2}\sqrt{\frac{g}{h_o}}\left(\xi\frac{\partial\xi}{\partial\chi} + \frac{1}{9}h_o^3\frac{\partial^3\xi}{\partial\chi^3}\right) = 0 . \tag{15.29} $$

[Figure 15.4: three panels showing the profile ξ versus χ, for −2 ≤ χ ≤ 2, at successive times.]
Fig. 15.4: Steepening of a Gaussian wave profile by the nonlinear term in the KdV equation. The increase of wave speed with amplitude causes the leading part of the profile to steepen with time and the trailing part to flatten. In the full KdV equation, this effect can be balanced by the effect of dispersion, which causes the high-frequency Fourier components in the wave to travel slightly slower than the low-frequency components. This allows stable solitons to form.

15.3.2

Physical Effects in the KdV Equation

Before exploring solutions to the KdV equation (15.29), let us consider the physical effects of its nonlinear and dispersive terms. The second, nonlinear term derives from the nonlinearity in the (v · ∇)v term of the Navier-Stokes equation. The effect of this nonlinearity is to steepen the leading edge of a wave profile and flatten the trailing edge (Fig. 15.4). Another way to understand the effect of this term is to regard it as a nonlinear coupling of linear waves. Since it is nonlinear in the wave amplitude, it can couple together waves with different wave numbers k. For example, if we have a purely sinusoidal wave ∝ exp(ikx), then this nonlinearity will lead to the growth of a harmonic at twice the wave number, ∝ exp(2ikx). Similarly, when two linear waves with spatial frequencies k, k′ are superposed, this term will describe the production of new waves at the sum and difference spatial frequencies. We have already met such wave-wave coupling in our study of nonlinear optics (Chap. 9), and in the route to turbulence for rotating Couette flow (Fig. 14.12), and we shall meet it again in nonlinear plasma physics (Chap. 22).
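The statement that the nonlinear term turns a pure sinusoid into a harmonic at twice the wave number can be seen directly: for ξ = A cos kx one has ξ ∂ξ/∂x = −(A²k/2) sin 2kx, whose only Fourier content sits at 2k with amplitude A²k/4. The sketch below checks this numerically; the amplitude and grid are hypothetical, and a plain O(N²) discrete Fourier transform is used only to stay within the standard library.

```python
import cmath
import math

def dft(values):
    """Plain O(N^2) discrete Fourier transform, normalized by N."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * m * j / n) for j, v in enumerate(values)) / n
            for m in range(n)]

# Hypothetical example: one sinusoid xi = A cos(k x) sampled over one period of length L.
A, L, N = 0.1, 2 * math.pi, 64
k = 2 * math.pi / L
x = [L * j / N for j in range(N)]
xi = [A * math.cos(k * xj) for xj in x]
dxi_dx = [-A * k * math.sin(k * xj) for xj in x]   # exact derivative of xi
nonlinear = [a * b for a, b in zip(xi, dxi_dx)]     # the product xi * d(xi)/dx

spectrum = dft(nonlinear)
for m in range(4):
    print(m, round(abs(spectrum[m]), 6))            # only the m = 2 (wave number 2k) entry is nonzero
```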

[Figure 15.5: two panels, (a) and (b), of wave height versus x; labels mark “Stable Solitons” and “dispersing waves”.]
Fig. 15.5: Production of stable solitons out of an irregular initial wave profile.

The third term in (15.29) is linear and is responsible for a weak dispersion of the wave. The higher-frequency Fourier components travel with slower phase velocities than lower-frequency components. This has two effects. One is an overall spreading of a wave in a manner qualitatively familiar from elementary quantum mechanics; cf. Ex. 6.2. For example, in a Gaussian wave packet with width ∆x, the range of wave numbers k contributing significantly to the profile is ∆k ∼ 1/∆x. The spread in the group velocity is then $\sim \Delta k\,\partial^2\omega/\partial k^2 \sim (gh_o)^{1/2}h_o^2\,k\,\Delta k$ [cf. Eq. (15.11)]. The wave packet will then double in size in a time
$$ t_{\rm spread} \sim \frac{\Delta x}{\Delta v_g} \sim \left(\frac{\Delta x}{h_o}\right)^2\frac{1}{k\sqrt{gh_o}} . \tag{15.30} $$
The second effect is that, since the high-frequency components travel somewhat slower than the low-frequency components, there will be a tendency for the profile to become asymmetric, with the trailing edge steeper than the leading edge.

Given the opposite effects of these two corrections (nonlinearity makes the wave’s leading edge steeper; dispersion reduces its steepness), it should not be too surprising in hindsight that it is possible to find solutions to the KdV equation with constant profile, in which nonlinearity balances dispersion. What is quite surprising, though, is that these solutions, called solitons, are very robust and arise naturally out of random initial data. That is to say, if we solve an initial-value problem numerically starting with several peaks of random shape and size, then although much of the wave will spread and disappear due to dispersion, we will typically be left with several smooth soliton solutions, as in Fig. 15.5.
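The dispersive statements above all follow from the weakly dispersive relation $\omega = \sqrt{gh_o}\,k(1 - k^2h_o^2/6)$ of Eq. (15.11). A minimal sketch, assuming g = 9.81 m s⁻² and a hypothetical depth of 1 m (neither value is from the text): it checks that a plane wave with this ω satisfies the linearized KdV equation (15.22), confirms that the phase velocity falls with k and that the group velocity lies below it, and evaluates the order-of-magnitude spreading time (15.30) for an illustrative 20 m packet.

```python
import math

G = 9.81  # m s^-2 (assumed)

def omega(k, h):
    """Weakly dispersive shallow-water relation, Eq. (15.11)."""
    return math.sqrt(G * h) * k * (1.0 - (k * h) ** 2 / 6.0)

def linear_kdv_residual(k, h):
    """Left side minus right side of Eq. (15.22) for xi = exp[i(kx - omega t)].

    With d/dt -> -i*omega and d/dx -> i*k the residual is
    -i*omega + c*(i*k) + (c*h**2/6)*(i*k)**3, where c = sqrt(g*h).
    """
    c = math.sqrt(G * h)
    ik = 1j * k
    return -1j * omega(k, h) + c * ik + c * h**2 * ik**3 / 6.0

def phase_velocity(k, h):
    return omega(k, h) / k

def group_velocity(k, h):
    """Analytic d(omega)/dk = sqrt(g h) (1 - k^2 h^2 / 2)."""
    return math.sqrt(G * h) * (1.0 - (k * h) ** 2 / 2.0)

def spreading_time(dx, h, k):
    """Order-of-magnitude estimate of Eq. (15.30)."""
    return (dx / h) ** 2 / (k * math.sqrt(G * h))

h = 1.0  # hypothetical depth in metres
for k in (0.05, 0.1, 0.2):  # k h << 1, where Eq. (15.11) applies
    print(f"k={k}: |residual|={abs(linear_kdv_residual(k, h)):.1e}, "
          f"v_ph={phase_velocity(k, h):.4f} m/s, v_g={group_velocity(k, h):.4f} m/s")
print(f"t_spread ~ {spreading_time(dx=20.0, h=h, k=0.1):.0f} s for a 20 m packet")
```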

15.3.3

Single-Soliton Solution

We can discard some unnecessary algebraic luggage in the KdV equation (15.29) by rescaling the dependent variable and both independent variables:
$$ \zeta = \frac{\xi}{h_o} , \qquad \eta = \frac{3\chi}{h_o} , \qquad \tau = \frac{9}{2}\sqrt{\frac{g}{h_o}}\;t . \tag{15.31} $$

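[Figure 15.6: soliton profile versus χ, with peak height $\xi_o$, half-width $\chi_{1/2}$, and propagation speed $(gh_o)^{1/2}[1 + \xi_o/(2h_o)]$.]
Fig. 15.6: Profile of the single-soliton solution (15.33), (15.31) of the KdV equation. The width $\chi_{1/2}$ is inversely proportional to the square root of the peak height $\xi_o$.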

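The KdV equation then becomes
$$ \frac{\partial\zeta}{\partial\tau} + \zeta\frac{\partial\zeta}{\partial\eta} + \frac{\partial^3\zeta}{\partial\eta^3} = 0 . \tag{15.32} $$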

There are well-understood mathematical techniques³ for solving equations like the KdV equation. However, we shall just quote solutions and explore their properties. The simplest solution to the dimensionless KdV equation (15.32) is
$$ \zeta = \zeta_0\,{\rm sech}^2\left[\left(\frac{\zeta_0}{12}\right)^{1/2}\left(\eta - \frac{1}{3}\zeta_0\tau\right)\right] . \tag{15.33} $$

This solution describes a one-parameter family of stable solitons. For each such soliton (each $\zeta_0$), the soliton maintains its shape while propagating at speed $d\eta/d\tau = \zeta_0/3$ relative to a weak wave. By transforming to the rest frame of the unperturbed water using Eqs. (15.31) and (15.28), we find for the soliton’s speed there
$$ \frac{dx}{dt} = \sqrt{gh_o}\left(1 + \frac{\xi_o}{2h_o}\right) . \tag{15.34} $$
The first term is the propagation speed of a weak (linear) wave. The second term is the nonlinear correction, proportional to the wave amplitude $\xi_o = h_o\zeta_0$. The half-width of the wave may be defined by setting the argument of the hyperbolic secant to unity:
$$ \chi_{1/2} = \left(\frac{4h_o^3}{3\xi_o}\right)^{1/2} . \tag{15.35} $$
The larger the wave amplitude, the narrower the wave and the faster it propagates; cf. Fig. 15.6.

³ See, for example, Whitham (1974).
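Exercise 15.9 asks for an analytic check that Eq. (15.33) solves Eq. (15.32). A cheap numerical cross-check, assuming central finite differences with step d = 0.01 are accurate enough (their truncation error scales as d²), evaluates the residual of Eq. (15.32) on a grid of η. For contrast, it also evaluates a sech² profile of the same height moving at the wrong speed; the optional `speed` argument is an illustrative addition, not part of Eq. (15.33).

```python
import math

def soliton(eta, tau, zeta0, speed=None):
    """Single-soliton profile of Eq. (15.33); speed defaults to zeta0/3 as the text requires."""
    if speed is None:
        speed = zeta0 / 3.0
    arg = math.sqrt(zeta0 / 12.0) * (eta - speed * tau)
    return zeta0 / math.cosh(arg) ** 2

def kdv_residual(eta, tau, zeta0, speed=None, d=1e-2):
    """Central-difference estimate of zeta_tau + zeta*zeta_eta + zeta_etaetaeta, Eq. (15.32)."""
    def z(e, t):
        return soliton(e, t, zeta0, speed)
    z_tau = (z(eta, tau + d) - z(eta, tau - d)) / (2 * d)
    z_eta = (z(eta + d, tau) - z(eta - d, tau)) / (2 * d)
    z_eee = (z(eta + 2 * d, tau) - 2 * z(eta + d, tau)
             + 2 * z(eta - d, tau) - z(eta - 2 * d, tau)) / (2 * d**3)
    return z_tau + z(eta, tau) * z_eta + z_eee

# Sample eta in [-10, 10] at tau = 0.7 for a soliton with zeta0 = 3.
grid = [i / 10.0 for i in range(-100, 101)]
worst = max(abs(kdv_residual(e, 0.7, 3.0)) for e in grid)
wrong = max(abs(kdv_residual(e, 0.7, 3.0, speed=1.5)) for e in grid)
print(f"largest residual, correct speed: {worst:.1e}")
print(f"largest residual, wrong speed:   {wrong:.1e}")
```

The first residual is at the level of the finite-difference error, while the second is of order unity, so the small value is a genuine test rather than an artifact of the differencing.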

[Figure 15.7: three panels of ζ versus η, at τ = −9, τ = 0, and τ = 9.]
Fig. 15.7: Two-soliton solution to the dimensionless KdV equation (15.32). This solution describes two waves, well separated for τ → −∞, that coalesce and then separate, producing the original two waves in reverse order as τ → +∞. The notation is that of Eq. (15.36); the values of the parameters in that equation are η1 = η2 = 0 (so the solitons will be merged at time τ = 0), α1 = 1, α2 = 1.4.

Let us return to Scott-Russell’s soliton. Converting to SI units, the speed was about 4 m s⁻¹, giving an estimate of the depth of the canal as $h_o \sim 1.6$ m. Using the width $\chi_{1/2} \sim 5$ m, we obtain a peak height $\xi_o \sim 0.25$ m, somewhat smaller than quoted but within the errors, allowing for the uncertainty in the definition of the width and an (appropriate) element of hyperbole in the account.
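The Scott-Russell estimate above can be reproduced in a few lines. This sketch assumes, as the estimate does, that the nonlinear correction in Eq. (15.34) is small, so the depth follows from speed² ≈ $gh_o$; it then inverts Eq. (15.35) for the peak height. The inputs are the rough figures quoted in the text, and g = 9.81 m s⁻² is assumed.

```python
G = 9.81  # m s^-2 (assumed)

# Rough figures from the text.
speed = 4.0       # m/s, Scott-Russell's "eight or nine miles an hour"
half_width = 5.0  # m, chi_1/2

# Leading order in Eq. (15.34): speed ~ sqrt(g h_o), so h_o ~ speed^2 / g.
h_o = speed**2 / G

# Invert Eq. (15.35): chi_1/2^2 = 4 h_o^3 / (3 xi_o), so xi_o = 4 h_o^3 / (3 chi_1/2^2).
xi_o = 4.0 * h_o**3 / (3.0 * half_width**2)

print(f"depth h_o ~ {h_o:.2f} m, peak height xi_o ~ {xi_o:.2f} m")
```

The result, roughly 1.6 m and 0.2 m, matches the order of magnitude quoted in the text; with estimates this crude, the second decimal place carries no meaning.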

15.3.4

Two-Soliton Solution

One of the most fascinating properties of solitons is the way that two or more waves interact. The expectation, derived from physics experience with weakly coupled normal modes, might be that if we have two well-separated solitons propagating in the same direction with the larger wave chasing the smaller wave, then the larger will eventually catch up with the smaller, and nonlinear interactions between the two waves will essentially destroy both, leaving behind a single, irregular pulse which will spread and decay after the interaction. However, this is not what happens. Instead, the two waves pass through each other unscathed and unchanged, except for a shift in position: compared with where each would have been had it moved at its original speed throughout the interaction, the larger soliton emerges slightly ahead and the smaller slightly behind. See Fig. 15.7. We shall not pause to explain why the two waves survive unscathed, save to remark that there are topological invariants in the solution which must be preserved. However, we can exhibit one such two-soliton solution analytically:
$$ \zeta = \frac{\partial^2}{\partial\eta^2}\left[12\ln F(\eta,\tau)\right] , \quad {\rm where} \quad F = 1 + f_1 + f_2 + \left(\frac{\alpha_2 - \alpha_1}{\alpha_2 + \alpha_1}\right)^2 f_1 f_2 \quad {\rm and} \quad f_i = \exp\left[-\alpha_i(\eta - \eta_i) + \alpha_i^3\tau\right] ; \tag{15.36} $$
here $\alpha_i$ and $\eta_i$ are constants. This solution is depicted in Fig. 15.7.
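Exercise 15.10(c) suggests following the two-soliton solution on a computer. A minimal sketch, assuming the parameter values of Fig. 15.7 (α₁ = 1, α₂ = 1.4, η₁ = η₂ = 0), evaluates Eq. (15.36) using the exact derivatives of F, so that ζ = 12(FF″ − F′²)/F², rather than differentiating ln F numerically. Away from the interaction, the taller peak should have the single-soliton height 3α₂² = 5.88. The exponentials are safe over the ranges used here; for much larger |η| or |τ|, F² can overflow and the formula would need rescaling.

```python
import math

def two_soliton(eta, tau, a1=1.0, a2=1.4, eta1=0.0, eta2=0.0):
    """zeta = 12 d^2(ln F)/d eta^2 for the two-soliton solution, Eq. (15.36)."""
    f1 = math.exp(-a1 * (eta - eta1) + a1**3 * tau)
    f2 = math.exp(-a2 * (eta - eta2) + a2**3 * tau)
    A = ((a2 - a1) / (a2 + a1)) ** 2
    F = 1.0 + f1 + f2 + A * f1 * f2
    dF = -a1 * f1 - a2 * f2 - (a1 + a2) * A * f1 * f2                  # dF/d eta
    d2F = a1**2 * f1 + a2**2 * f2 + (a1 + a2) ** 2 * A * f1 * f2      # d^2F/d eta^2
    return 12.0 * (F * d2F - dF**2) / F**2

etas = [-25.0 + 0.01 * i for i in range(5001)]
for tau in (-9.0, 0.0, 9.0):
    profile = [two_soliton(e, tau) for e in etas]
    peak = max(profile)
    print(f"tau={tau:+.0f}: tallest peak {peak:.3f} at eta={etas[profile.index(peak)]:.2f}")
print("3 * alpha_2^2 =", 3 * 1.4**2)
```

Comparing the peak locations at τ = ±9 with the unperturbed trajectories η = α²τ shows the position shifts described in the text.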


15.3.5

Solitons in Contemporary Physics

Solitons were rediscovered in the 1960s, when they were found in numerical plasma simulations. Their topological properties were soon discovered and general methods to generate solutions were derived. Solitons have been isolated in such different subjects as the propagation of magnetic flux in a Josephson junction, elastic waves in anharmonic crystals, quantum field theory (as instantons) and classical general relativity (as solitary, nonlinear gravitational waves). Most classical solitons are solutions to one of a relatively small number of nonlinear partial differential equations, including the KdV equation, Burgers’ equation and the sine-Gordon equation. Unfortunately, it has proved difficult to generalize these equations and their soliton solutions to two and three spatial dimensions. Just like research into chaos, studies of solitons have taught physicists that nonlinearity need not lead to maximal disorder in physical systems, but instead can create surprisingly stable, ordered structures.

****************************
EXERCISES

Exercise 15.8 Example: Breaking of a Dam
Consider the flow of water along a horizontal channel of constant width after a dam breaks. Sometime after the initial transients have died away, the flow may be described by the nonlinear shallow-water wave equations (15.23):
$$ \frac{\partial h}{\partial t} + \frac{\partial(hv)}{\partial x} = 0 , \qquad \frac{\partial v}{\partial t} + v\frac{\partial v}{\partial x} + g\frac{\partial h}{\partial x} = 0 . \tag{15.37} $$

Here h is the height of the flow, v is the horizontal speed of the flow, and x is distance along the channel measured from the location of the dam. Solve for the flow assuming that initially (at t = 0) $h = h_o$ for x < 0 and h = 0 for x > 0 (no water). Your solution should have the form shown in Fig. 15.8. What is the speed of the front of the water? [Hints: Note that from the parameters of the problem we can construct only one velocity, $\sqrt{gh_o}$, and no length except $h_o$. It therefore is a reasonable guess that the solution has the self-similar form $h = h_o\tilde h(\xi)$, $v = \sqrt{gh_o}\,\tilde v(\xi)$, where $\tilde h$ and $\tilde v$ are dimensionless functions of the similarity variable
$$ \xi = \frac{x/t}{\sqrt{gh_o}} . \tag{15.38} $$
Using this ansatz, convert the partial differential equations (15.37) into a pair of ordinary differential equations which can be solved so as to satisfy the initial conditions.]

Exercise 15.9 Derivation: Single-Soliton Solution
Verify that expression (15.33) does indeed satisfy the dimensionless KdV equation (15.32).

Exercise 15.10 Derivation: Two-Soliton Solution

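[Figure 15.8: water height h versus x, starting from a step of height $h_o$ at t = 0, with later profiles labeled 1, 2, 3.]
Fig. 15.8: The water’s height h(x, t) after a dam breaks.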

(a) Verify, using symbolic-manipulation computer software (e.g., Macsyma, Maple or Mathematica), that the two-soliton expression (15.36) satisfies the dimensionless KdV equation. (Warning: Considerable algebraic travail is required to verify this by hand, directly.)
(b) Verify analytically that the two-soliton solution (15.36) has the properties claimed in the text: First consider the solution at early times in the spatial region where $f_1 \sim 1$, $f_2 \ll 1$. Show that the solution is approximately that of the single soliton described by Eq. (15.33). Demonstrate that the amplitude is $\zeta_{01} = 3\alpha_1^2$ and find the location of its peak. Repeat the exercise for the second wave and for late times.
(c) Use a computer to follow, numerically, the evolution of this two-soliton solution as time τ passes (thereby filling in timesteps between those shown in Fig. 15.7).

****************************

15.4

Rossby Waves in a Rotating Fluid

In a nearly rigidly rotating fluid, the Coriolis effect (observed in a co-rotating reference frame; Sec. 13.5) provides the restoring force for an unusual type of wave motion called “Rossby waves.” These waves are seen in the Earth’s oceans and atmosphere. For a simple example, we consider the sea above a sloping seabed; Fig. 15.9. We assume the unperturbed fluid has vanishing velocity v = 0 in the Earth’s rotating frame, and we study weak waves in the sea with oscillating velocity v. (Since the fluid is at rest in the equilibrium state about which we are perturbing, we write the perturbed velocity as v rather than δv.) We assume that the wavelengths are long enough that viscosity is negligible. We shall also, in this case, restrict attention to small-amplitude waves so that nonlinear terms can be dropped from our dynamical equations. The perturbed Navier-Stokes equation (13.54a) then becomes (after linearization)
$$ \frac{\partial\mathbf v}{\partial t} + 2\boldsymbol\Omega\times\mathbf v = -\frac{\nabla\delta P'}{\rho} . \tag{15.39} $$
Here, as in Sec. 13.5, δP′ is the perturbation in the effective pressure [which includes gravitational and centrifugal effects, $P' = P + \rho\Phi - \frac{1}{2}\rho(\boldsymbol\Omega\times\mathbf x)^2$]. Taking the curl of Eq. (15.39), we obtain for the time derivative of the waves’ vorticity
$$ \frac{\partial\boldsymbol\omega}{\partial t} = 2(\boldsymbol\Omega\cdot\nabla)\mathbf v . \tag{15.40} $$
We seek a wave mode in which the horizontal fluid velocity oscillates in the x direction, $v_x, v_y \propto \exp[i(kx - \omega t)]$, and is independent of z in accord with the Taylor-Proudman theorem (Sec. 13.5.3):
$$ v_x \ {\rm and}\ v_y \propto \exp[i(kx - \omega t)] , \qquad \frac{\partial v_x}{\partial z} = \frac{\partial v_y}{\partial z} = 0 . \tag{15.41} $$
The only allowed vertical variation is in the vertical velocity $v_z$, and differentiating ∇ · v = 0 with respect to z, we obtain
$$ \frac{\partial^2 v_z}{\partial z^2} = 0 . \tag{15.42} $$
The vertical velocity therefore varies linearly between the surface and the sea floor. Now, one boundary condition is that the vertical velocity must vanish at the surface. The other is that, at the sea floor z = −h, we must have $v_z(-h) = -\alpha v_y(x)$, where α is the tangent of the angle of inclination of the sea floor. The solution to Eq. (15.42) satisfying these boundary conditions is
$$ v_z = \frac{\alpha z}{h}\,v_y . \tag{15.43} $$
Taking the vertical component of Eq. (15.40) and evaluating $\omega_z = v_{y,x} - v_{x,y} = ikv_y$ (so that $\partial\omega_z/\partial t = \omega k v_y$), we obtain
$$ \omega k v_y = 2\Omega\frac{\partial v_z}{\partial z} = \frac{2\Omega\alpha v_y}{h} . \tag{15.44} $$
The dispersion relation therefore has the quite unusual form
$$ \omega k = \frac{2\Omega\alpha}{h} . \tag{15.45} $$

[Figure 15.9: cross-section of the sea, of depth h, above a sloping sea floor, with the y and z axes marked.]
Fig. 15.9: Geometry of ocean for Rossby waves.

Rossby waves have interesting properties: They can only propagate in one direction, parallel to the intersection of the sea floor with the horizontal (our $\mathbf e_x$ direction). Their phase velocity $\mathbf V_{\rm ph}$ and group velocity $\mathbf V_g$ are equal in magnitude but in opposite directions,
$$ \mathbf V_{\rm ph} = -\mathbf V_g = \frac{2\Omega\alpha}{k^2h}\,\mathbf e_x . \tag{15.46} $$

If we use ∇ · v = 0, we discover that the two components of horizontal velocity are in quadrature, $v_x = i\alpha v_y/(kh)$. This means that, when seen from above, the fluid circulates with the opposite sense to the angular velocity Ω. Rossby waves play an important role in the circulation of the Earth’s oceans; see, e.g., Chelton and Schlax (1996). A variant of these Rossby waves in air can be seen as undulations in the atmosphere’s jet stream produced when the stream goes over sloping terrain such as that of the Rocky Mountains; and another variant in neutron stars generates gravitational waves (ripples of spacetime curvature) that are a promising source for ground-based detectors such as LIGO.
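To get a feel for Eqs. (15.45) and (15.46), here is a sketch with hypothetical parameters that do not come from the text: depth h = 4 km, seabed slope α = 0.01, wavelength 100 km, and Ω taken as the Earth’s rotation rate and treated as vertical, as in the idealized geometry of Fig. 15.9. It also checks by finite differences that the group velocity is the negative of the phase velocity.

```python
import math

OMEGA_EARTH = 7.292e-5  # rad s^-1, Earth's rotation rate (treated here as vertical)

def rossby_omega(k, alpha, h, Omega=OMEGA_EARTH):
    """Dispersion relation omega * k = 2 Omega alpha / h, Eq. (15.45)."""
    return 2.0 * Omega * alpha / (h * k)

# Hypothetical ocean parameters (illustrative only).
alpha, h = 0.01, 4000.0      # seabed slope, depth in m
k = 2 * math.pi / 100e3      # wave number for a 100 km wavelength

w = rossby_omega(k, alpha, h)
v_phase = w / k
dk = 1e-6 * k
v_group = (rossby_omega(k + dk, alpha, h) - rossby_omega(k - dk, alpha, h)) / (2 * dk)

print(f"period ~ {2 * math.pi / w / 86400:.1f} days")
print(f"V_ph = {v_phase:.4f} m/s, V_g = {v_group:.4f} m/s")
```

With these inputs the period is of order ten days and the phase speed of order 0.1 m s⁻¹; both scale linearly with the assumed slope and inversely with the assumed depth.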

15.5

Sound Waves

So far, our discussion of fluid dynamics has mostly been concerned with flows sufficiently slow that the density can be treated as constant. We now introduce the effects of compressibility by discussing sound waves (in a non-rotating reference frame). Sound waves are prototypical scalar waves and therefore are simpler in many respects than vector electromagnetic waves and tensor gravitational waves. For a sound wave in the small-amplitude limit, we linearize the Euler and continuity (mass conservation) equations to obtain
$$ \rho\frac{\partial\mathbf v}{\partial t} = -\nabla\delta P , \tag{15.47a} $$
$$ \frac{\partial\delta\rho}{\partial t} = -\rho\nabla\cdot\mathbf v . \tag{15.47b} $$
As the flow is irrotational (vanishing vorticity before the wave arrives implies vanishing vorticity as it is passing), we can write the fluid velocity as the gradient of a velocity potential:
$$ \mathbf v = \nabla\psi . \tag{15.48a} $$
Inserting this into Eq. (15.47a) and integrating spatially (with ρ regarded as constant since its perturbation would give a second-order term), we obtain
$$ \delta P = -\rho\frac{\partial\psi}{\partial t} . \tag{15.48b} $$

Setting $\delta\rho = (\partial\rho/\partial P)_s\,\delta P$ in Eq. (15.47b) (with the derivative performed at constant entropy s because there generally will not be enough time in a single wave period for heat to cross a wavelength), and expressing δP and ∇ · v in terms of the velocity potential [Eqs. (15.48a) and (15.48b)], we obtain the dispersion-free wave equation
$$ \nabla^2\psi = \frac{1}{c^2}\frac{\partial^2\psi}{\partial t^2} . \tag{15.48c} $$
Here
$$ c = \left(\frac{\partial P}{\partial\rho}\right)_s^{1/2} \tag{15.48d} $$
is the adiabatic sound speed. For a perfect gas its value is $c = (\gamma P/\rho)^{1/2}$, where γ is the ratio of specific heats. The sound speed in air at 20 °C is about 340 m s⁻¹. In water under atmospheric conditions, it is about 1.5 km s⁻¹ (not much different from sound speeds in solids). The general solution of the wave equation (15.48c) for plane sound waves propagating in the ±x directions is
$$ \psi = f_1(x - ct) + f_2(x + ct) , \tag{15.49} $$
where $f_1$, $f_2$ are arbitrary functions.
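A quick check of the perfect-gas formula, using assumed sea-level values for dry air near 20 °C (γ = 1.4, P = 101325 Pa, ρ = 1.204 kg m⁻³; none of these numbers comes from the text). For contrast, the sketch also prints the isothermal estimate $(P/\rho)^{1/2}$, which is what one would get by taking the derivative in Eq. (15.48d) at constant temperature instead of constant entropy.

```python
import math

def adiabatic_sound_speed(gamma, pressure, density):
    """Perfect-gas sound speed c = (gamma P / rho)^(1/2), from Eq. (15.48d)."""
    return math.sqrt(gamma * pressure / density)

# Assumed values for dry air near 20 C at sea level (not from the text).
gamma_air, P_air, rho_air = 1.4, 101325.0, 1.204

c_air = adiabatic_sound_speed(gamma_air, P_air, rho_air)
c_isothermal = math.sqrt(P_air / rho_air)   # ignores the adiabatic factor gamma

print(f"adiabatic estimate  ~ {c_air:.0f} m/s")
print(f"isothermal estimate ~ {c_isothermal:.0f} m/s")
```

Only the adiabatic value agrees with the measured ≈ 340 m s⁻¹, which is why the derivative in Eq. (15.48d) is taken at constant entropy.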

15.5.1

Wave Energy

We shall use sound waves to illustrate how waves carry energy. The fluid’s energy density is $U = (\frac{1}{2}v^2 + u)\rho$ [Table 12.1 with Φ = 0]. The first term is the fluid’s kinetic energy; the second, its internal energy. The internal energy density can be evaluated by a Taylor expansion in the wave’s density perturbation:
$$ u\rho = [u\rho] + \left[\frac{\partial(u\rho)}{\partial\rho}\right]_s\delta\rho + \frac{1}{2}\left[\frac{\partial^2(u\rho)}{\partial\rho^2}\right]_s\delta\rho^2 , \tag{15.50} $$
where the three coefficients in brackets [] are evaluated at the equilibrium density. The first term in Eq. (15.50) is the energy of the background fluid, so we shall drop it. The second term will average to zero over a wave period, so we shall also drop it. The third term can be simplified using the first law of thermodynamics in the form du = T ds − P d(1/ρ) (which implies $[\partial(u\rho)/\partial\rho]_s = u + P/\rho$), followed by the definition h = u + P/ρ of enthalpy per unit mass, followed by the first law in the form dh = T ds + dP/ρ, followed by expression (15.48d) for the speed of sound. The result is
$$ \left[\frac{\partial^2(u\rho)}{\partial\rho^2}\right]_s = \left(\frac{\partial h}{\partial\rho}\right)_s = \frac{c^2}{\rho} . \tag{15.51} $$
Inserting this into the third term of (15.50) and averaging over a wave period and wavelength, we obtain for the wave energy per unit volume
$$ \varepsilon = \frac{1}{2}\rho\overline{v^2} + \frac{c^2}{2\rho}\overline{\delta\rho^2} = \frac{1}{2}\rho\left[\overline{(\nabla\psi)^2} + \frac{1}{c^2}\overline{\left(\frac{\partial\psi}{\partial t}\right)^2}\right] = \rho\overline{(\nabla\psi)^2} . \tag{15.52} $$

In the second equality we have used v = ∇ψ [Eq. (15.48a)] and δρ = −(ρ/c²)∂ψ/∂t [from δρ = (∂ρ/∂P)ₛ δP = δP/c² and Eq. (15.48b)]. The third equality can be deduced by multiplying the wave equation (15.48c) by ψ and averaging. Thus, there is equipartition of energy between the kinetic and internal energy terms.

The energy flux is F = (½v² + h)ρv [Table 12.1 with Φ = 0]. The kinetic energy flux (first term) is third order in the velocity perturbation and therefore vanishes on average. For a sound wave, the internal energy flux (second term) can be brought into a more useful form by expanding the enthalpy per unit mass:

$$h = [h] + \left[\frac{\partial h}{\partial P}\right]_s\delta P = [h] + \frac{\delta P}{\rho}. \tag{15.53}$$

Here we have used the first law of thermodynamics dh = T ds + (1/ρ)dP and the adiabaticity of the perturbation, s = constant; the terms in square brackets are unperturbed quantities. Insert this into F = hρv, express δP and v in terms of the velocity potential [Eqs. (15.48a) and (15.48b)], and average over a wave period and wavelength. This gives the energy flux

$$\mathbf{F} = \overline{\rho h\mathbf{v}} = \overline{\delta P\,\mathbf{v}} = -\rho\,\overline{\frac{\partial\psi}{\partial t}\nabla\psi}. \tag{15.54}$$

This equation and Eq. (15.52) are a special case of the scalar-wave energy flux and energy density discussed in Sec. 6.3.1 and Ex. 6.9 [Eqs. (6.18)]. Consider a locally plane wave with ψ = ψₒ cos(k·x − ωt + φ), where φ is an arbitrary phase. Its energy density (15.52) is ε = ½ρψₒ²k², and its energy flux (15.54) is F = ½ρψₒ²ωk. For this dispersion-free wave, the phase and group velocities are both V = (ω/k)k̂ = ck̂, where k̂ = k/k is the unit vector pointing in the wave-propagation direction. The energy density and flux are therefore related by

$$\mathbf{F} = \varepsilon\mathbf{V} = \varepsilon c\,\hat{\mathbf{k}}. \tag{15.55}$$

The energy flux is therefore the product of the energy density and the wave velocity, as we might have anticipated. When studying dispersive waves in plasmas (Chaps.
20 and 22) we shall return to the issue of energy transport. We shall see that, just as information in waves is carried at the group velocity and not the phase velocity, so energy is also carried at the group velocity. In Sec. 6.3.1 we used the above equations for the sound-wave energy density ε (there denoted u) and flux F to illustrate, via geometric-optics considerations, the behavior of wave energy in an inhomogeneous, time-varying medium. The energy flux carried by sound is conventionally measured in dB (decibels). The flux in decibels, F_dB, is related to the flux F in W m⁻² by

$$F_{\rm dB} = 120 + 10\log_{10}F. \tag{15.56}$$

Sound that is barely audible is about 1 dB. Normal conversation is about 50-60 dB. Jet aircraft and rock concerts can cause exposure to more than 120 dB with consequent damage to the ear.
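The plane-wave results above can be checked numerically: equipartition in Eq. (15.52), F = εc in Eq. (15.55), and the decibel scale of Eq. (15.56). This is a minimal sketch, assuming a hypothetical 1 kHz wave in air with potential amplitude ψₒ = 10⁻⁴ m² s⁻¹. It averages the kinetic energy density, the internal energy density and the flux over one period at a fixed point, using uniform time samples.

```python
import math

def plane_wave_averages(rho, c, psi0, omega, n=10_000):
    """Period averages at x = 0 for psi = psi0 cos(k x - omega t), k = omega / c.

    Returns (kinetic energy density, internal energy density, energy flux)
    following Eqs. (15.52) and (15.54)."""
    k = omega / c
    period = 2 * math.pi / omega
    kinetic = internal = flux = 0.0
    for i in range(n):
        t = i * period / n
        phase = -omega * t                      # k x - omega t at x = 0
        grad_psi = -psi0 * k * math.sin(phase)  # d(psi)/dx, the fluid velocity v
        dpsi_dt = psi0 * omega * math.sin(phase)
        kinetic += 0.5 * rho * grad_psi**2
        # (c^2 / 2 rho) * drho^2 with drho = -(rho / c^2) d(psi)/dt
        internal += 0.5 * rho * dpsi_dt**2 / c**2
        flux += -rho * dpsi_dt * grad_psi       # Eq. (15.54)
    return kinetic / n, internal / n, flux / n

def flux_to_db(flux_w_per_m2):
    """Eq. (15.56): F_dB = 120 + 10 log10(F), with F in W m^-2."""
    return 120.0 + 10.0 * math.log10(flux_w_per_m2)

# Hypothetical 1 kHz wave in air; psi0 in m^2/s
rho, c, omega, psi0 = 1.2, 343.0, 2 * math.pi * 1000.0, 1e-4
kinetic, internal, flux = plane_wave_averages(rho, c, psi0, omega)
energy_density = kinetic + internal
print(f"kinetic/internal = {kinetic / internal:.6f}")
print(f"F / (epsilon c)  = {flux / (energy_density * c):.6f}")
print(f"level            = {flux_to_db(flux):.1f} dB")
```

Uniform sampling over a whole period makes the sin² averages exact, so both printed ratios should equal 1 to rounding error. The assumed amplitude corresponds to a level just under 90 dB.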


15.5.2 Sound Generation

So far in this book, we have been concerned with describing how different types of waves propagate. It is also important to understand how they are emitted. We now outline some aspects of the theory of sound generation.

The reader should be familiar with the theory of electromagnetic wave emission. There, one considers a localised region containing moving charges and consequently variable currents. The source can be described as a sum over electric and magnetic multipoles, and each multipole in the source produces a characteristic angular variation of the distant radiation field. The radiation-field amplitude decays inversely with distance from the source, so the Poynting flux varies with the inverse square of the distance. Integrating over a large sphere gives the total power radiated by the source, broken down into the power radiated by each multipolar component. Suppose the reduced wavelength λ̄ = 1/k of the waves is large compared to the source. This situation is referred to as slow motion, since the source's charges then generally move slowly compared to the speed of light. In that case the most powerful radiating multipole is the electric dipole d(t). The dipole's average emitted power is given by the Larmor formula

$$\mathcal{P} = \frac{\overline{\ddot{\mathbf d}^{\,2}}}{6\pi\epsilon_0 c^3}, \tag{15.57}$$

where d̈ is the second time derivative of d, the bar denotes a time average, and c is the speed of light, not sound.

This same procedure can be followed when describing sound generation. However, as we are dealing with a scalar wave, sound can have a monopolar source. As a pedagogical example, let us set a small, spherical, elastic ball, surrounded by fluid, into radial oscillation. The oscillation need not be sinusoidal, but its frequencies are of order ω, so the emitted waves have reduced wavelengths of order λ̄ = c/ω. Let the surface of the ball have radius a + ξ(t), and impose the slow-motion and small-amplitude conditions

$$\bar\lambda \gg a \gg |\xi|. \tag{15.58}$$

As the waves will be spherical, the relevant outgoing-wave solution of the wave equation (15.48c) is

$$\psi = \frac{f(t - r/c)}{r}, \tag{15.59}$$

where f is a function to be determined. Since the fluid's velocity at the ball's surface must match that of the ball, we have (to first order in v and ψ)

$$\dot\xi\,\mathbf{e}_r = \mathbf{v}(a,t) = \nabla\psi \simeq -\frac{f(t - a/c)}{a^2}\,\mathbf{e}_r \simeq -\frac{f(t)}{a^2}\,\mathbf{e}_r, \tag{15.60}$$

where in the third equality we have used the slow-motion condition. Solving for f(t) and inserting into Eq. (15.59), we see that

$$\psi(r,t) = -\frac{a^2\,\dot\xi(t - r/c)}{r}. \tag{15.61}$$

It is customary to express the radial velocity perturbation v in terms of an oscillating fluid monopole moment

$$q = 4\pi\rho a^2\dot\xi. \tag{15.62}$$

Physically, q is the total radial discharge of fluid mass (i.e., mass per unit time) crossing an imaginary fixed spherical surface of radius slightly larger than that of the oscillating ball. In terms of q we have ξ̇(t) = q(t)/(4πρa²). Using this and Eq. (15.61), we compute the power radiated as sound waves [Eq. (15.54) integrated over a sphere centered on the ball]:

$$\mathcal{P} = \frac{\overline{\dot q^2}}{4\pi\rho c}. \tag{15.63}$$
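Equation (15.63) can be checked against a direct evaluation of the flux (15.54) for the field (15.61). The sketch below assumes a sinusoidal surface motion ξ = ξₒ cos ωt and hypothetical parameters for a 1 cm ball oscillating at 100 Hz in water. It proceeds in three steps:

- differentiate ψ by central differences;
- average the radial flux over one period at radius R;
- multiply by 4πR².

The two powers should agree for any R, because the near-zone cross term averages to zero.

```python
import math

def monopole_power_check(rho, c, a, xi0, omega, R, n=2000):
    """Compare 4 pi R^2 times the period-averaged radial flux of Eq. (15.54)
    with Eq. (15.63), for a ball surface moving as xi = xi0 cos(omega t)."""

    def xi_dot(t):
        return -xi0 * omega * math.sin(omega * t)

    def psi(r, t):
        return -a**2 * xi_dot(t - r / c) / r                  # Eq. (15.61)

    hr, ht = 1e-3 * R, 1e-4 / omega                         # finite-difference steps
    period = 2 * math.pi / omega
    mean_flux = 0.0
    for i in range(n):
        t = i * period / n
        dpsi_dt = (psi(R, t + ht) - psi(R, t - ht)) / (2 * ht)
        dpsi_dr = (psi(R + hr, t) - psi(R - hr, t)) / (2 * hr)
        mean_flux += -rho * dpsi_dt * dpsi_dr / n           # radial part of Eq. (15.54)
    power_from_flux = 4 * math.pi * R**2 * mean_flux

    # q = 4 pi rho a^2 xi_dot, so mean(q_dot^2) = (4 pi rho a^2)^2 xi0^2 omega^4 / 2
    mean_qdot_sq = (4 * math.pi * rho * a**2) ** 2 * xi0**2 * omega**4 / 2
    power_formula = mean_qdot_sq / (4 * math.pi * rho * c)  # Eq. (15.63)
    return power_from_flux, power_formula

# Hypothetical case: 1 cm ball in water, 10 micron amplitude at 100 Hz
p_flux, p_formula = monopole_power_check(
    rho=1000.0, c=1500.0, a=0.01, xi0=1e-5, omega=2 * math.pi * 100.0, R=10.0
)
print(f"flux integral: {p_flux:.4e} W, Eq. (15.63): {p_formula:.4e} W")
```

The finite-difference steps are chosen much smaller than both R and the reduced wavelength, so the remaining discrepancy is at the level of a few parts per million.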

Note that the power is inversely proportional to the signal speed. This is characteristic of monopolar emission, in contrast to the inverse-cube variation for dipolar emission [Eq. (15.57)]. The emission of monopolar waves requires that the volume of the emitting solid body oscillate. When the solid simply oscillates without changing its volume (for example, the reed on a musical instrument), dipolar emission should generally dominate. We can think of this as two monopoles of size a in antiphase, separated by some displacement b ∼ a. The velocity potential in the far field is then the sum of two monopolar contributions, which almost cancel. Making a Taylor expansion, we obtain

$$\frac{\psi_{\rm dipole}}{\psi_{\rm monopole}} \sim \frac{b}{\bar\lambda} \sim \frac{\omega b}{c}, \tag{15.64}$$

where ω and λ̄ are the characteristic magnitudes of the angular frequency and reduced wavelength of the waves (which we have not assumed to be precisely sinusoidal). This reduction of ψ by the slow-motion factor ωb/c implies that dipolar power emission is weaker than monopolar power by a factor ∼(ωb/c)² for similar frequencies and amplitudes of motion. However, to emit dipole radiation, momentum must be given to and removed from the fluid; in other words, the fluid must be forced by a solid body. In the absence of such a solid body, the lowest multipole that can be radiated effectively is quadrupolar radiation, which is weaker by yet one more factor of (ωb/c)².

These considerations are important for understanding how noise is produced by the intense turbulence created by jet engines, especially close to airports. We expect that the sound emitted by the free turbulence in the wake just behind the engine will be quadrupolar and will be dominated by emission from the largest (and hence fastest) turbulent eddies. [See the discussion of turbulent eddies in Sec. 14.3.4.] Denote by ℓ and vℓ the size and turnover speed of these largest eddies. Then:

- the characteristic size of the sound's source will be a ∼ b ∼ ℓ;
- the mass discharge will be q ∼ ρℓ²vℓ;
- the characteristic frequency will be ω ∼ vℓ/ℓ;
- the reduced wavelength of the sound waves will be λ̄ = c/ω ∼ ℓc/vℓ;
- the slow-motion parameter will be b/λ̄ ∼ ωb/c ∼ vℓ/c.

To get the quadrupolar power radiated per unit volume, divide Eq. (15.63) by the volume ℓ³ of an eddy and reduce it by a further factor ∼(bω/c)⁴. The result is

$$\frac{d\mathcal{P}}{d^3x} \sim \rho\,\frac{v_\ell^3}{\ell}\left(\frac{v_\ell}{c}\right)^5, \tag{15.65}$$

and this power will be concentrated around frequency ω ∼ vℓ/ℓ. Suppose the air has fixed sound speed and length scale, and the largest eddy speed is proportional to some characteristic speed V (e.g., the average speed of the air leaving the engine). Then the sound generation increases in proportion to the eighth power of the Mach number M = V/c. This is known as Lighthill's law. The implications for the design of jet engines should be obvious.
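The scaling in Eq. (15.65) is easy to explore numerically. In this sketch the density, eddy size, sound speed and eddy speeds are hypothetical jet-exhaust values. Doubling vℓ at fixed ℓ and c should raise the emitted power by 2⁸ = 256. The ratio of Eq. (15.65) to the turbulent dissipation rate per unit volume, ρvℓ³/ℓ, is the factor (vℓ/c)⁵.

```python
def quadrupole_power_density(rho, v_eddy, ell, c):
    """Order-of-magnitude Eq. (15.65): dP/d^3x ~ rho (v^3 / ell) (v / c)^5."""
    return rho * v_eddy**3 / ell * (v_eddy / c) ** 5

def acoustic_efficiency(v_eddy, c):
    """Ratio of Eq. (15.65) to the dissipation rate rho v^3 / ell, i.e. (v / c)^5."""
    return (v_eddy / c) ** 5

rho, ell, c = 1.2, 0.5, 340.0   # hypothetical exhaust density, eddy size, sound speed
base = quadrupole_power_density(rho, 100.0, ell, c)
doubled = quadrupole_power_density(rho, 200.0, ell, c)
print(f"power ratio for doubled speed: {doubled / base:.1f}")   # expect 2**8 = 256
print(f"acoustic efficiency at v = 100 m/s: {acoustic_efficiency(100.0, c):.2e}")
```

Because only ratios are printed, the unknown order-unity prefactors in Eq. (15.65) drop out. This is why an estimate this crude can still support a sharp scaling law.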

15.5.3 T2 Radiation Reaction, Runaway Solutions, and Matched Asymptotic Expansions⁴

Let us return to our idealized example of sound waves produced by a radially oscillating, spherical ball. We shall use this example to illustrate three deep issues in theoretical physics:

- the radiation-reaction force that acts back on a source due to its emission of radiation;
- a spurious runaway solution to the source's equation of motion, caused by the radiation-reaction force;
- matched asymptotic expansions, a mathematical technique for solving field equations when there are two different regions of space in which the equations have rather different behaviors.

We shall meet these concepts again, in a rather more complicated way, in Chap. 26, when studying the radiation-reaction force caused by emission of gravitational waves.

For our oscillating ball, the two different regions of space that we shall match to each other are the near zone, r ≪ λ̄, and the wave zone, r ≳ λ̄. We consider, first, the near zone. From a new point of view, we redo the matching of the near-zone fluid velocity to the ball's surface velocity and the computation of the pressure perturbation. Because the region near the ball is small compared to λ̄ and the fluid speeds are small compared to c, the flow is very nearly incompressible, ∇·v = ∇²ψ = 0; cf. the discussion of conditions for incompressibility in Sec. 12.6. [The near-zone equation ∇²ψ = 0 is analogous to ∇²Φ = 0 for the Newtonian gravitational potential in the weak-gravity near zone of a gravitational-wave source (Chap. 26).] The general monopolar (spherical) solution to ∇²ψ = 0 is

$$\psi = \frac{A(t)}{r} + B(t). \tag{15.66}$$

Matching the fluid's radial velocity v = ∂ψ/∂r = −A/r² at r = a to the ball's radial velocity ξ̇, we obtain

$$A(t) = -a^2\dot\xi(t). \tag{15.67}$$

From the point of view of near-zone physics, there is no mechanism for generating a nonzero, spatially constant term B(t) in ψ [Eq. (15.66)]. So if one were unaware of the emitted waves and their action back on the source, one would be inclined to set this B(t) to zero. This is analogous to a Newtonian physicist who would be inclined to write the quadrupolar contribution to an axisymmetric source's external gravitational field in the form Φ = P₂(cos θ)[A(t)r⁻³ + B(t)r²], and then, being unaware of gravitational waves and their action back on the source, would set B(t) to zero; see Chap. 26. Taking this near-zone point of view, with B = 0, we infer that the fluid's pressure perturbation acting on the ball's surface is

$$\delta P = -\rho\frac{\partial\psi(a,t)}{\partial t} = -\rho\frac{\dot A}{a} = \rho a\ddot\xi. \tag{15.68}$$

The motion ξ(t) of the ball's surface is controlled by the elastic restoring forces in its interior and the fluid pressure perturbation δP on its surface. In the absence of δP the surface would oscillate sinusoidally with some angular frequency ωₒ, so ξ̈ + ωₒ²ξ = 0. The pressure will modify this to

$$m(\ddot\xi + \omega_o^2\xi) = -4\pi a^2\,\delta P, \tag{15.69}$$

where m is an effective mass, roughly equal to the ball's true mass, and the right-hand side is the integral of the radial component of the pressure perturbation force over the sphere's surface. Inserting the near-zone viewpoint's pressure perturbation (15.68), we obtain

$$(m + 4\pi a^3\rho)\ddot\xi + m\omega_o^2\xi = 0. \tag{15.70}$$

⁴ Our treatment is based on Burke (1970).

Evidently, the fluid increases the ball's effective inertial mass (it loads additional mass onto the ball), and thereby reduces its frequency of oscillation to

$$\omega = \frac{\omega_o}{\sqrt{1+\kappa}}, \qquad\text{where}\qquad \kappa = \frac{4\pi a^3\rho}{m} \tag{15.71}$$

is a measure of the coupling strength between the ball and the fluid. In terms of this loaded frequency the equation of motion becomes

$$\ddot\xi + \omega^2\xi = 0. \tag{15.72}$$
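To get a feel for the size of the loading effect in Eq. (15.71), assume, as an illustration only, that the effective mass m equals the true mass (4/3)πa³ρ_ball of a uniform ball. Then κ = 3ρ/ρ_ball, independent of a. The densities below (steel, water, air) are assumed round values.

```python
import math

def coupling_strength(rho_fluid, rho_ball):
    """kappa = 4 pi a^3 rho / m, assuming m = (4/3) pi a^3 rho_ball (uniform ball)."""
    return 3.0 * rho_fluid / rho_ball

def loaded_frequency(omega0, kappa):
    """Eq. (15.71): omega = omega0 / sqrt(1 + kappa)."""
    return omega0 / math.sqrt(1.0 + kappa)

for label, rho_fluid in [("water", 1000.0), ("air", 1.2)]:
    kappa = coupling_strength(rho_fluid, rho_ball=7800.0)   # assumed steel density
    ratio = loaded_frequency(1.0, kappa)                     # omega / omega0
    print(f"{label}: kappa = {kappa:.3g}, omega/omega0 = {ratio:.4f}")
```

Under these assumptions, loading by water lowers the frequency by roughly 15 percent, whereas loading by air is negligible.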

This near-zone viewpoint is not quite correct, just as the standard Newtonian viewpoint is not quite correct for the near-zone gravity of a gravitational-wave source (Chap. 26). To improve on this viewpoint, we temporarily move out into the wave zone and identify the general outgoing-wave solution to the sound wave equation,

$$\psi = \frac{f(t - \epsilon r/c)}{r} \tag{15.73}$$

[Eq. (15.59)]. Here f is a function to be determined by matching to the near zone, and ϵ is a parameter that has been inserted to trace the influence of the outgoing-wave boundary condition. For outgoing waves (the real, physical situation), ϵ = +1; if the waves were ingoing, we would have ϵ = −1. This wave-zone solution remains valid down into the near zone. In the near zone we can perform a slow-motion expansion to bring it into the same form as the near-zone velocity potential (15.66):

$$\psi = \frac{f(t)}{r} - \epsilon\frac{\dot f(t)}{c} + \dots \tag{15.74}$$

The second term is sensitive to whether the waves are outgoing or ingoing. It must therefore ultimately be responsible for the radiation-reaction force that acts back on the oscillating ball, and for this reason we will call it the radiation-reaction potential.

Equating the first term of this ψ to the first term of (15.66), and using the value (15.67) of A(t) obtained by matching the fluid velocity to the ball velocity, we obtain

$$f(t) = A(t) = -a^2\dot\xi(t). \tag{15.75}$$

This equation tells us that the wave field f(t − r/c)/r generated by the ball's surface displacement ξ(t) is ψ = −a²ξ̇(t − r/c)/r [Eq. (15.61)], the result we derived more quickly in the previous section. We can regard Eq. (15.75) as matching the near-zone solution outward onto the wave-zone solution, to determine the wave field as a function of the source's motion. Equating the second term of Eq. (15.74) to the second term of the near-zone velocity potential (15.66), we obtain

$$B(t) = -\epsilon\frac{\dot f(t)}{c} = \epsilon\frac{a^2}{c}\ddot\xi(t). \tag{15.76}$$

This is the term in the near-zone velocity potential ψ = A/r + B that will be responsible for radiation reaction. We can regard this radiation-reaction potential ψᴿᴿ = B(t) as having been generated by matching the wave zone's outgoing (or ingoing) wave field back into the near zone. This pair of matchings, outward then inward, is a special, almost trivial example of the technique of matched asymptotic expansions. Applied mathematicians developed that technique to deal with much more complicated matching problems than this one (see, e.g., Cole, 1968).

The radiation-reaction potential ψᴿᴿ = B(t) = ϵ(a²/c)ξ̈(t) gives rise to a radiation-reaction contribution to the pressure on the ball's surface, δPᴿᴿ = −ρψ̇ᴿᴿ = −ϵ(ρa²/c)ξ⃛. Insert this into the equation of motion (15.69) along with the loading pressure (15.68), and perform the same algebra as before. The result is the following radiation-reaction-modified form of Eq. (15.72):

$$\ddot\xi + \omega^2\xi = \epsilon\tau\,\dddot\xi, \qquad\text{where}\qquad \tau = \frac{\kappa}{1+\kappa}\,\frac{a}{c} \tag{15.77}$$

is less than the fluid's sound travel time to cross the ball's radius, a/c. The term ϵτξ⃛ in the equation of motion is the ball's radiation-reaction acceleration. We can see this from the fact that it would change sign if we switched from outgoing waves, ϵ = +1, to ingoing waves, ϵ = −1. In the absence of radiation reaction, the ball's surface oscillates sinusoidally in time, ξ ∝ exp(±iωt).
The radiation-reaction term produces a weak damping of these oscillations:

$$\xi \propto e^{\pm i\omega t}e^{-\sigma t}, \qquad\text{where}\qquad \sigma = \frac{1}{2}\epsilon(\omega\tau)\omega \tag{15.78}$$

is the radiation-reaction-induced damping rate. Note that in order of magnitude the ratio of the damping rate to the oscillation frequency is σ/ω ∼ ωτ ≲ ωa/c = a/λ̄, which is small compared to unity by virtue of the slow-motion assumption. If the waves were ingoing rather than outgoing, ϵ = −1, the ball's oscillations would grow. In either case, the radiation-reaction force removes energy from the ball (outgoing waves) or adds it (ingoing waves). It does so at the same rate as the sound waves carry energy off or bring it in, so the total energy, wave plus ball, is conserved.

Expression (15.78) represents two linearly independent solutions to the equation of motion (15.77), one with the sign + and the other with −. Since this equation of motion has been made third order by the radiation-reaction term, there must be a third independent solution. It is easy to see that, up to a tiny fractional correction, that third solution is

$$\xi \propto e^{\epsilon t/\tau}. \tag{15.79}$$
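The three solutions (15.78) and (15.79) can be checked by substituting ξ ∝ exp(st) into Eq. (15.77), which gives the cubic s² + ω² − ϵτs³ = 0. The sketch below works in hypothetical units with ω = 1 and τ = 0.01. It starts Newton's method from the approximate roots −σ + iω and ϵ/τ and polishes them. This is our own numerical check, not a method taken from the text.

```python
def char_poly(s, omega, tau, epsilon):
    """Characteristic polynomial of Eq. (15.77) for xi ~ exp(s t)."""
    return s**2 + omega**2 - epsilon * tau * s**3

def char_poly_deriv(s, omega, tau, epsilon):
    return 2 * s - 3 * epsilon * tau * s**2

def newton_root(s0, omega, tau, epsilon, steps=50):
    """Polish an initial guess s0 with Newton's method in complex arithmetic."""
    s = complex(s0)
    for _ in range(steps):
        s -= char_poly(s, omega, tau, epsilon) / char_poly_deriv(s, omega, tau, epsilon)
    return s

# Hypothetical units: omega = 1, tau = 0.01 (so omega * tau << 1), outgoing waves
omega, tau, epsilon = 1.0, 0.01, +1
sigma = 0.5 * epsilon * (omega * tau) * omega      # damping rate, Eq. (15.78)

damped = newton_root(1j * omega - sigma, omega, tau, epsilon)   # start from Eq. (15.78)
runaway = newton_root(epsilon / tau, omega, tau, epsilon)       # start from Eq. (15.79)
print(f"damped root:  {damped:.6f}   (approximation {complex(-sigma, omega):.6f})")
print(f"runaway root: {runaway:.4f}   (approximation {epsilon / tau:.4f})")
```

The polished oscillatory root has real part −σ to within a fraction of a percent. The third root is real and sits just above 1/τ; it is the runaway solution (15.79).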

For outgoing waves, ϵ = +1, this solution grows exponentially in time, on an extremely rapid timescale τ ≲ a/c; it is called a runaway solution. Such runaway solutions are ubiquitous in equations of motion with radiation reaction. For example, a computation of the electromagnetic radiation reaction on a small, classical, electrically charged, spherical particle gives the Abraham-Lorentz equation of motion

$$m(\ddot{\mathbf x} - \tau\dddot{\mathbf x}) = \mathbf{F}_{\rm ext} \tag{15.80}$$

(Rohrlich 1965; Sec. 16.2 of Jackson 1999). Here x(t) is the particle's world line, and F_ext is the external force that causes the particle to accelerate. The particle's inertial mass m includes an electrostatic contribution analogous to 4πa³ρ in our fluid problem. The timescale τ, like that in our fluid problem, is very short, and when the external force is absent there is a runaway solution x ∝ exp(t/τ). Much human heat and confusion were generated, in the early and mid 20th century, over these runaway solutions (see, e.g., Rohrlich 1965). For our simple model problem, little heat or confusion need be expended. One can easily verify that the runaway solution (15.79) violates the slow-motion assumption a/λ̄ ≪ 1 that underlies our derivation of the radiation-reaction acceleration. It is therefore a spurious solution. Our model problem is simple enough that one can dig deeper into it. One then learns that the runaway solution arises from the slow-motion approximation trying to reproduce a genuine, rapidly damped solution and getting the sign of the damping wrong (Ex. 15.13 and Burke 1970).

****************************
EXERCISES

Exercise 15.11 Problem: Aerodynamic Sound Generation
Consider the emission of quadrupolar sound waves by a Kolmogorov spectrum of free turbulence (Sec. 14.3.4). Show that the power radiated per unit frequency interval has a spectrum P_ω ∝ ω^(−7/2). Also show that the total power radiated is roughly a fraction M⁵ of the power dissipated in the turbulence, where M is the Mach number.
Exercise 15.12 Problem: Energy Conservation for Radially Oscillating Ball Plus Sound Waves
For the radially oscillating ball as analyzed in Sec. 15.5.3, verify that the radiation-reaction acceleration removes energy from the ball, plus the fluid loaded onto it, at the same rate as the sound waves carry energy away.

Exercise 15.13 Problem: Radiation Reaction Without the Slow-Motion Approximation
Redo the computation of radiation reaction for a radially oscillating ball immersed in a fluid, without imposing the slow-motion assumption and approximation. Let ψ = r⁻¹ f(t − ϵr/c) be the sound-wave field, and define Φ(t) ≡ a⁻² f(t − ϵa/c). Thereby obtain the following coupled equations for Φ(t) and the radial displacement ξ(t) of the ball's surface:

$$\ddot\xi + \omega_o^2\xi = \kappa\dot\Phi, \qquad \dot\xi = -\Phi - \epsilon\frac{a}{c}\dot\Phi. \tag{15.81}$$

Show that in the slow-motion regime, this equation of motion has two weakly damped solutions of the same form (15.78) as we derived using the slow-motion approximation, and one rapidly damped solution ξ ∝ exp(−ϵκt/τ). Burke (1970) shows that the runaway solution (15.79) obtained using the slow-motion approximation is caused by that approximation's futile attempt to reproduce this genuine, rapidly damped solution.

Exercise 15.14 Problem: Sound Waves from a Ball Undergoing Quadrupolar Oscillations
Repeat the analysis of sound-wave emission, radiation reaction, and energy conservation, as given in Sec. 15.5.3 and Ex. 15.12, for axisymmetric, quadrupolar oscillations of an elastic ball, r_ball = a + ξ(t)P₂(cos θ).
Comment: Since the lowest multipolar order for gravitational waves is quadrupolar, this exercise is closer to the analogous problem of gravitational-wave emission than the monopolar analysis in the text.
Hint: If ω is the frequency of the ball's oscillations, then the sound waves have the form

$$\psi = K\,\Re\left\{\left[n_2(\omega r/c) - i\epsilon j_2(\omega r/c)\right]e^{-i\omega t}\right\}P_2(\cos\theta), \tag{15.82}$$

where K is a constant, ℜ(X) is the real part of X, and ϵ is +1 for outgoing waves and −1 for ingoing waves. Here j₂ and n₂ are the spherical Bessel and spherical Neumann functions of order 2. In the distant wave zone, x ≡ ωr/c ≫ 1,

$$n_2(x) - i\epsilon j_2(x) \simeq \frac{e^{i\epsilon x}}{x}; \tag{15.83}$$

in the near zone, x = ωr/c ≪ 1,

$$n_2(x) = -\frac{3}{x^3}\left(1 \,\&\, x^2 \,\&\, x^4 \,\&\, \dots\right), \qquad j_2(x) = \frac{x^2}{15}\left(1 \,\&\, x^2 \,\&\, x^4 \,\&\, \dots\right). \tag{15.84}$$

Here "& xⁿ" means "+ (some constant) xⁿ".

****************************

15.6 T2 Convection

In this last section of Chap. 15, we turn our attention to fluid motions driven by thermal effects (see the overview in Sec. 15.1). As a foundation, we begin by discussing heat transport via thermal diffusion.


15.6.1 T2 Heat Conduction

We know experimentally that heat flows in response to a temperature gradient. When the temperature differences are small on the scale of the mean free path of the heat-conducting particles (as, in practice, will almost always be the case), we can expand the heat flux as a Taylor series in the temperature gradient, F_heat = (constant) + (a term linear in ∇T) + (a term quadratic in ∇T) + ⋯. Now, the constant term must vanish; otherwise there would be heat conduction in the absence of a temperature gradient, and this would contradict the second law of thermodynamics. The first contributing term is thus the linear term, and we stop with it, just as we do for Hooke's law of elasticity and Ohm's law of electrical conductivity. Here, as in elasticity and electromagnetism, we must be on the lookout for special circumstances when the linear approximation becomes invalid, and be prepared to modify our description accordingly. This rarely happens in fluid dynamics, so in this chapter we shall ignore higher-order terms and write

$$\mathbf{F}_{\rm heat} = -\kappa\nabla T, \tag{15.85}$$

where the constant κ is known as the coefficient of thermal conductivity, or just the thermal conductivity; cf. Secs. 2.7 and 12.7.3. In general κ will be a tensor, as it describes a linear relation between two vectors, F_heat and ∇T. However, when the fluid is isotropic (as it is for the kinds of fluids we have treated thus far), κ is just a scalar. We shall confine ourselves to this case in the present chapter. In Chap. 18, however, when describing a plasma as a fluid, we shall find that a magnetic field can make the plasma's transport properties decidedly anisotropic, so the thermal conductivity is tensorial.

In this section we shall incorporate heat conduction into the fundamental equations of fluid dynamics. This can be accomplished most readily via the conservation laws for momentum and energy. (We have already done this briefly in Sec. 12.7.3; here we shall do so again, in greater detail.) On the molecular scale, the diffusing heat shows up as an anisotropic term N₁ in the momentum distributions N(p) = N₀ + N₁ of particles (molecules, atoms, electrons, photons, ...); cf., e.g., Eqs. (2.74a) and (2.74g). This anisotropic term is tiny in magnitude compared to the isotropic term N₀. The isotropic term has already been included, via u = (internal energy per unit mass) and P = (pressure), in our densities and fluxes of momentum and energy. The only place that the molecular-scale anisotropic term is of any quantitative consequence, macroscopically, is in the energy flux. Heat conductivity therefore modifies only the energy flux, and not the energy density or the momentum density or flux.⁵ Correspondingly, the law of momentum conservation (the Navier-Stokes equation) is left unchanged, while energy conservation is altered. More specifically, consider a fluid in an external gravitational field (see Box 12.3 for the more general case of fluids with significant self-gravity). Its energy density retains its standard perfect-fluid form

$$U = \rho\left(\frac{1}{2}v^2 + u + \Phi\right) \tag{15.86}$$

[Eq. (12.51)], and the energy flux gets modified by the addition of the diffusive heat-flow term (15.85):

$$\mathbf{F} = \rho\mathbf{v}\left(\frac{1}{2}v^2 + h + \Phi\right) - \zeta\theta\mathbf{v} - 2\eta\boldsymbol{\sigma}\cdot\mathbf{v} - \kappa\nabla T; \tag{15.87}$$

cf. Table 12.2. Assume there are no sources or sinks of energy beyond those already included in these U and F (no nuclear or chemical reactions, no radiation emission or absorption, ...). Then the law of energy conservation takes the standard form

$$\frac{\partial U}{\partial t} + \nabla\cdot\mathbf{F} = 0. \tag{15.88}$$

⁵ This is only true non-relativistically. In relativistic fluid dynamics, it remains true in the fluid's rest frame; but in frames where the fluid moves at high speed, the diffusive energy flux gets Lorentz transformed into heat-flow contributions to energy density, momentum density, and momentum flux.

As in our discussion of the influence of viscosity on energy conservation (Sec. 12.7), so also here, we can derive a law for the evolution of entropy by combining energy conservation (15.88) with mass conservation and the first law of thermodynamics; see Ex. 15.15 for details. The result can be written in either of two equivalent forms. The first, a conservation law for entropy, says

$$\frac{\partial(\rho s)}{\partial t} + \nabla\cdot\left[\rho s\mathbf{v} - \kappa\nabla\ln T\right] = \kappa(\nabla\ln T)^2 + \frac{2\eta\boldsymbol{\sigma}:\boldsymbol{\sigma} + \zeta\theta^2}{T}, \tag{15.89}$$

where the colon signifies a double contraction of the second-rank rate-of-shear tensor with itself. The quantity ρs is obviously the entropy density (since s is entropy per unit mass), and ρsv is obviously a contribution to the entropy flux produced by the motion of the entropy-endowed fluid. The term −κ∇ln T = F_heat/T is a flux of heat divided by temperature. Since heat divided by temperature is entropy, it must be a flux of entropy carried by the flowing heat (i.e., carried, microscopically, by the anisotropy N₁ in the momentum distributions of the molecules and other particles). The left side of Eq. (15.89) would vanish if entropy were conserved, so the right side must be the rate of production of entropy in a unit volume. It contains the viscous heating terms that we discussed in Sec. 12.7, and also a new term:

$$\frac{dS}{dt\,dV} = \kappa(\nabla\ln T)^2 = \mathbf{F}_{\rm heat}\cdot\nabla\frac{1}{T}. \tag{15.90}$$

This entropy increase per unit volume is the continuum version of a thermodynamic law. When an amount of heat dQ is transferred from a reservoir with high temperature T₁ to a reservoir with lower temperature T₂, there is a net entropy increase given by

$$dS = dQ\left(\frac{1}{T_2} - \frac{1}{T_1}\right); \tag{15.91}$$

cf. Ex. 15.16.
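Exercise 15.16 asks for an analytic derivation. As a numerical illustration only, the sketch below takes steady conduction through a slab with constant conductivity, so T is linear in x. It integrates the production rate (15.90) across the slab with the midpoint rule, and compares the result with Eq. (15.91) applied to the heat flux F_heat per unit area and time. The slab parameters are hypothetical.

```python
def slab_entropy_production(conductivity, T1, T2, L, n=100_000):
    """Entropy production per unit area per unit time in a slab 0 < x < L whose
    faces are held at T1 (hot, x = 0) and T2 (cold, x = L).

    With constant conductivity, steady conduction gives a linear T(x).
    Returns (integral of Eq. (15.90) across the slab, F_heat * (1/T2 - 1/T1))."""
    dTdx = (T2 - T1) / L
    flux = -conductivity * dTdx                 # Eq. (15.85), W m^-2
    dx = L / n
    continuum = 0.0
    for i in range(n):
        T = T1 + dTdx * (i + 0.5) * dx          # midpoint rule
        continuum += conductivity * (dTdx / T) ** 2 * dx   # kappa (grad ln T)^2
    discrete = flux * (1.0 / T2 - 1.0 / T1)    # Eq. (15.91) per unit time and area
    return continuum, discrete

# Hypothetical water-like slab: conductivity 0.6 W/(m K), 10 cm thick
continuum, discrete = slab_entropy_production(0.6, T1=350.0, T2=290.0, L=0.1)
print(f"continuum {continuum:.6e}  discrete {discrete:.6e}  W/(K m^2)")
```

For this geometry the two results agree analytically, because ∫dx/T² across the slab equals (L/ΔT)(1/T₁ − 1/T₂). The numerical agreement is therefore limited only by the quadrature error.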

Note that the second law of thermodynamics (entropy never decreases), applied to Eq. (15.89), implies that the thermal conductivity κ, like the viscosity coefficients η and ζ, must always be positive. Our second version of the law of entropy evolution can be derived from Eq. (15.89) by combining it with mass conservation. It says that the entropy per unit mass in a fluid element evolves at a rate given by

$$\rho T\frac{ds}{dt} = \kappa\nabla^2 T + 2\eta\boldsymbol{\sigma}:\boldsymbol{\sigma} + \zeta\theta^2; \tag{15.92}$$

cf. Ex. 15.15. For a viscous, heat-conducting fluid moving in an external gravitational field, the governing equations are:

- the standard law of mass conservation (12.25) or (12.27);
- the standard Navier-Stokes equation (12.65);
- the first law of thermodynamics [Eq. (2) or (3) of Box 12.2];
- either the law of energy conservation (15.88) or the law of entropy evolution (15.89) or (15.92).

This set of equations is far too complicated to solve, except via massive numerical simulations, unless some strong simplification is imposed. We must therefore introduce approximations. Our first approximation is that the thermal conductivity κ is constant; for most real applications this is close to true, and no significant physical effects are missed by assuming it. Our second approximation somewhat limits the type of problem we can address. It is that the fluid motions are very slow: slow enough that the squares of the shear and expansion (which are quadratic in the fluid speed) are negligibly small, so we can ignore viscous dissipation. This permits us to rewrite the entropy evolution equation (15.92) as

$$\rho T\frac{ds}{dt} = \kappa\nabla^2 T. \tag{15.93}$$

We can convert this entropy evolution equation into an evolution equation for temperature by expressing the changes ds/dt of entropy per unit mass in terms of changes dT/dt of temperature. The usual way to do this is to note that T ds (the amount of heat deposited in a unit mass of fluid) is given by C dT, where C is the fluid's specific heat.
However, the specific heat depends on what one holds fixed during the energy deposition: the fluid element's volume or its pressure. As we have assumed that the fluid motions are very slow, the fractional pressure fluctuations will be correspondingly small. (This does not preclude significant temperature fluctuations, provided that they are compensated by density fluctuations of opposite sign. However, any temperature fluctuations will tend to equalize through thermal conduction in such a way that the pressure does not change significantly.) Therefore, the relevant specific heat for a slowly moving fluid is the one at constant pressure, C_P, and we must write T ds = C_P dT.⁶ Eq. (15.93) then becomes a linear partial differential equation for the temperature,

$$\frac{dT}{dt} \equiv \frac{\partial T}{\partial t} + \mathbf{v}\cdot\nabla T = \chi\nabla^2 T, \tag{15.94}$$

where

$$\chi = \frac{\kappa}{\rho C_P} \tag{15.95}$$

is known as the thermal diffusivity, and we have again taken the easiest route in treating C_P and ρ as constant. When the fluid moves so slowly that the advective term v·∇T is negligible, Eq. (15.94) says that the heat simply diffuses through the fluid, with the thermal diffusivity being the diffusion coefficient for the temperature.

⁶ See, e.g., Turner 1973 for a more formal justification of the use of the specific heat at constant pressure rather than constant volume.

The diffusive transport of heat by thermal conduction is similar to the diffusive transport of vorticity by viscous stress [Eq. (13.3)], and the thermal diffusivity χ is the direct analog of the kinematic viscosity ν. This motivates us to introduce a new dimensionless number known as the Prandtl number. It measures the relative importance of viscosity and heat conduction, in the sense of their relative abilities to produce a diffusion of vorticity and of heat:

$$\mathrm{Pr} = \frac{\nu}{\chi}. \tag{15.96}$$

For gases, both ν and χ are given to order of magnitude by the product of the mean molecular speed and the mean free path, so Prandtl numbers are typically of order unity. (For air, Pr ∼ 0.7.) By contrast, in liquid metals the free electrons carry heat very efficiently compared with the transport of momentum (and vorticity) by diffusing ions, so their Prandtl numbers are small. This is one reason why liquid sodium is used as a coolant in some nuclear power reactors. At the other end of the spectrum, water is a relatively poor thermal conductor with Pr ∼ 6. Oils are quite viscous and poor conductors, and their Prandtl numbers measure in the thousands. Other Prandtl numbers are given in Table 15.1.

Fluid            | ν (m² s⁻¹) | χ (m² s⁻¹) | Pr
Earth's mantle   | 10¹⁷       | 10⁻⁶       | 10²³
Solar interior   | 10⁻²       | 10²        | 10⁻⁴
Atmosphere       | 10⁻⁵       | 10⁻⁵       | 1
Ocean            | 10⁻⁶       | 10⁻⁷       | 10

Table 15.1: Order-of-magnitude estimates for kinematic viscosity ν, thermal diffusivity χ, and Prandtl number Pr = ν/χ for earth, fire, air, and water.
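Returning to Eq. (15.94): when advection is negligible, it reduces to the diffusion equation ∂T/∂t = χ∇²T, whose Fourier modes decay as exp(−χk²t). The sketch below is a minimal illustration under four assumptions:

- one spatial dimension;
- a periodic domain;
- an explicit forward-time, centered-space scheme (our choice, not the text's);
- an assumed water-like diffusivity χ = 1.4 × 10⁻⁷ m² s⁻¹.

It compares the numerical decay of a sinusoidal temperature perturbation with the exact factor.

```python
import math

def diffuse_periodic(T, chi, dx, dt, steps):
    """Explicit FTCS update for dT/dt = chi d2T/dx2 on a periodic 1-D grid.
    Stable only if chi * dt / dx**2 <= 1/2."""
    r = chi * dt / dx**2
    assert r <= 0.5, "time step too large for the explicit scheme"
    n = len(T)
    for _ in range(steps):
        # T[i - 1] with i = 0 wraps to T[-1], giving periodic boundaries
        T = [T[i] + r * (T[(i + 1) % n] - 2 * T[i] + T[i - 1]) for i in range(n)]
    return T

chi = 1.4e-7               # assumed water-like thermal diffusivity, m^2/s
length, n = 0.01, 100      # 1 cm periodic domain, 100 cells
dx = length / n
k = 2 * math.pi / length
T0 = [math.sin(k * i * dx) for i in range(n)]   # perturbation about a uniform background

dt = 0.2 * dx**2 / chi     # r = 0.2, comfortably stable
steps = 500
T = diffuse_periodic(T0, chi, dx, dt, steps)

amplitude = max(T) / max(T0)
expected = math.exp(-chi * k**2 * dt * steps)
print(f"numerical {amplitude:.4f}  vs  exact {expected:.4f}")
```

The explicit scheme is the simplest option, but it requires χΔt/Δx² ≤ ½ for stability, which the `assert` enforces. An implicit scheme would remove that restriction at the cost of solving a linear system each step.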

One might think that, when the Prandtl number is small (so χ is large compared to ν), one should necessarily include heat flow in the fluid equations. Not so. One must distinguish the flow from the fluid. In some low-Prandtl-number flows, the heat conduction is so effective that the fluid becomes essentially isothermal, and buoyancy effects are minimised. Conversely, in some large-Prandtl-number flows, the large viscous stress reduces the velocity gradient, so that slow, thermally driven circulation takes place and thermal effects are very important. In general, the kinematic viscosity is of direct importance in controlling the transport of momentum, and hence in establishing the velocity field, whereas heat conduction enters only indirectly (Sec. 15.6.2 below). We must therefore examine each flow on its individual merits.

There is another dimensionless number that is commonly introduced when discussing thermal effects: the Péclet number. It is defined, by analogy with the Reynolds number, by

$$\mathrm{Pe} = \frac{VL}{\chi}, \qquad (15.97)$$

where L is a characteristic length scale of the flow and V is a characteristic speed. The Péclet number measures the relative importance of advection and heat conduction.

****************************
EXERCISES

Exercise 15.15 Derivation: Equations for Entropy Evolution
By combining the law of energy conservation (15.88) [with the energy density and flux given by Eqs. (15.86) and (15.87), where Φ(x) is a fixed external gravitational potential] with the law of mass conservation and the first law of thermodynamics, derive Eqs. (15.89) and (15.92) for the evolution of the fluid's entropy.

Exercise 15.16 Problem: Elementary Derivation of Entropy Production
Beginning with the elementary law (15.91) for the increase of entropy when heat dQ is transferred from a high-temperature reservoir to a low-temperature one, derive the continuum equation (15.90) for the rate of increase of entropy per unit volume due to diffusive heat flow in an isotropic medium. Note that your derivation makes no reference to the nature of the medium, other than the requirements that it be isotropic and that temperature differences be small on the scales of the mean free paths of the heat carriers (so the diffusion approximation is valid). Thus, Eq. (15.90) is valid for isotropic solids as well as for fluids.

Exercise 15.17 Example: Poiseuille Flow with a Uniform Temperature Gradient
A nuclear reactor is cooled with liquid sodium, which flows through a set of pipes from the reactor to a remote heat exchanger, where the heat's energy is used to generate electricity. Unfortunately, some heat will be lost through the walls of the pipe before it reaches the heat exchanger, and this will reduce the reactor's efficiency. In this question, we determine what fraction of the heat is lost through the pipe walls. Consider the flow of the sodium through one of the pipes, and assume that the Reynolds number is modest so the flow is steady and laminar.
Then the fluid velocity will have the parabolic Poiseuille profile

$$v = 2\bar v\left(1 - \frac{\varpi^2}{R^2}\right) \qquad (15.98)$$

[Eq. (12.76) and associated discussion]. Here R is the pipe's inner radius, ϖ is the cylindrical radial coordinate measured from the axis of the pipe, and v̄ is the mean speed along the pipe. Suppose that the pipe has length L ≫ R from the reactor to the heat exchanger, and is thermally very well insulated, so its inner wall is at nearly the same temperature as the core of the fluid. Then the total temperature drop ΔT down the length L will be small, ΔT ≪ T, and the temperature gradient will be constant, so the temperature distribution in the pipe has the form

$$T = T_0 - \Delta T\,\frac{z}{L} + f(\varpi). \qquad (15.99)$$

(a) Use Eq. (15.93) to show that

$$f = \frac{\bar v R^2 \Delta T}{2\chi L}\left(\frac{3}{4} - \frac{\varpi^2}{R^2} + \frac{1}{4}\frac{\varpi^4}{R^4}\right). \qquad (15.100)$$

(b) Derive an expression for the conductive heat flux through the walls of the pipe and show that the ratio of the heat escaping through the walls to that convected by the fluid is ΔT/T. (Ignore the influence of the temperature gradient on the velocity field and treat the thermal diffusivity and specific heat as constant throughout the flow.)
(c) Consider a nuclear reactor in which 10 kW of power has to be transported through a pipe carrying liquid sodium. If the reactor temperature is ∼ 1000 K and the exterior temperature is room temperature, estimate the flow of liquid sodium needed to achieve the required heat transport.

Exercise 15.18 Problem: Thermal Boundary Layers
In Sec. 13.4, we introduced the notion of a laminar boundary layer by analyzing flow past a thin plate. Now suppose that this same plate is maintained at a different temperature from the free flow. A thermal boundary layer will be formed, in addition to the viscous boundary layer, which we presume to be laminar. These two boundary layers will both extend outward from the wall but will (usually) have different thicknesses.
(a) Explain why their relative thicknesses depend on the Prandtl number.
(b) Using Eq. (15.94), show that in order of magnitude the thickness of the thermal boundary layer, δT, is given by $v(\delta_T)\,\delta_T^2 = \ell\chi$, where v(δT) is the fluid velocity parallel to the plate at the outer edge of the thermal boundary layer and ℓ is the distance downstream from the leading edge. Let V be the free-stream fluid velocity and ΔT be the temperature difference between the plate and the body of the flow.
(c) Estimate δT in the limits of large and small Prandtl numbers.
(d) What will be the boundary layer's temperature profile when the Prandtl number is exactly unity?

****************************
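The profile (15.100) of Exercise 15.17 can be checked numerically without redoing the algebra. The sketch below evaluates both sides of the steady heat balance v·∇T = χ∇²T for the temperature (15.99), with v taken from (15.98); because the z dependence of T is linear, only the radial Laplacian (1/ϖ)d(ϖ df/dϖ)/dϖ survives, and it is approximated here by central finite differences. The parameter values and the step size h are arbitrary illustrative assumptions.

```python
def poiseuille_speed(w, vbar, R):
    """Axial speed of Poiseuille flow, Eq. (15.98), at cylindrical radius w."""
    return 2.0 * vbar * (1.0 - (w / R) ** 2)


def f_profile(w, vbar, R, dT, chi, L):
    """Radial temperature correction f(w) of Eq. (15.100)."""
    amplitude = vbar * R**2 * dT / (2.0 * chi * L)
    return amplitude * (0.75 - (w / R) ** 2 + 0.25 * (w / R) ** 4)


def radial_laplacian(func, w, h):
    """(1/w) d/dw (w df/dw) = f'' + f'/w, by central finite differences."""
    d1 = (func(w + h) - func(w - h)) / (2.0 * h)
    d2 = (func(w + h) - 2.0 * func(w) + func(w - h)) / h**2
    return d2 + d1 / w


def max_residual(vbar, R, dT, chi, L, n=50):
    """Largest |v dT/dz - chi lap(T)| over interior radii, using dT/dz = -dT/L."""
    f = lambda w: f_profile(w, vbar, R, dT, chi, L)
    h = 1e-4 * R  # finite-difference step (an assumed, illustrative choice)
    worst = 0.0
    for i in range(1, n):
        w = R * i / n
        advection = poiseuille_speed(w, vbar, R) * (-dT / L)  # v . grad T
        diffusion = chi * radial_laplacian(f, w, h)           # chi lap T (z part is linear)
        worst = max(worst, abs(advection - diffusion))
    return worst


if __name__ == "__main__":
    print(max_residual(vbar=1.0, R=1.0, dT=1.0, chi=1.0, L=1.0))
    print(f_profile(1.0, 1.0, 1.0, 1.0, 1.0, 1.0))  # value at the wall
```

A residual at the level of finite-difference error confirms the balance, and the vanishing of f at ϖ = R confirms that the wall temperature equals the core temperature T0 − ΔT z/L, as the exercise assumes.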


15.6.2 T2 Boussinesq Approximation

When the heat fluxes are sufficiently small, we can use Eq. (15.94) to solve for the temperature distribution in a given velocity field, ignoring the feedback of the thermal effects onto the velocity. However, if we imagine increasing the flow's temperature differences so the heat fluxes also increase, at some point thermal feedback effects will begin to influence the velocity significantly. The first feedback effect to occur is typically that of buoyancy: the tendency of the hotter (and hence lower-density) fluid to rise in a gravitational field and the colder (and hence denser) fluid to descend.⁷ In this section, we shall describe the effects of buoyancy as simply as possible. The minimal approach, which is adequate surprisingly often, is called the Boussinesq approximation. It can be used to describe many laboratory flows and atmospheric flows, and some geophysical flows.

The flows for which the Boussinesq approximation is appropriate are those in which the fractional density changes are small (|Δρ| ≪ ρ). By contrast, the velocity can undergo large changes. Because the fractional density change following a fluid element is small, ρ⁻¹dρ/dt ≃ 0, we can approximate the equation of continuity (mass conservation) dρ/dt + ρ∇ · v = 0 by the "incompressibility" relation

$$\nabla\cdot\mathbf v = 0. \qquad \text{Boussinesq (1)} \qquad (15.101)$$

This does not mean that the density is constant. Rather, it means that the sole significant cause of density changes is thermal expansion. In discussing thermal expansion, it is convenient to introduce a reference density ρ0 and reference temperature T0, equal to some mean of the density and temperature in the region of fluid that one is studying. We shall denote by

$$\tau \equiv T - T_0 \qquad (15.102)$$

the perturbation of the temperature away from its reference value. The thermally perturbed density can then be written as

$$\rho = \rho_0(1 - \alpha\tau), \qquad (15.103)$$

where α is the thermal expansion coefficient for volume⁸ [evaluated at constant pressure for the same reason as CP was at constant pressure in the paragraph following Eq. (15.93)]:

$$\alpha = -\left(\frac{\partial\ln\rho}{\partial T}\right)_P. \qquad (15.104)$$

Equation (15.103) enables us to eliminate density perturbations as an explicit variable and replace them by temperature perturbations.

Turn, now, to the Navier-Stokes equation (12.66) in a uniform external gravitational field:

$$\frac{d\mathbf v}{dt} = -\frac{\nabla P}{\rho} + \mathbf g + \nu\nabla^2\mathbf v. \qquad (15.105)$$

⁷ This effect is put to good use in a domestic "gravity-fed" warm-air circulation system. The furnace generally resides in the basement, not the attic!

⁸ Note that α is three times larger than the thermal expansion coefficient for the linear dimensions of the fluid.

We expand the pressure-gradient term as

$$-\frac{\nabla P}{\rho} \simeq -\frac{\nabla P}{\rho_0}(1 + \alpha\tau), \qquad (15.106)$$

and, as in our analysis of rotating flows [Eq. (13.53)], we introduce an effective pressure designed to compensate for the first-order effects of the uniform gravitational field:

$$P' = P + \rho_0\Phi = P - \rho_0\,\mathbf g\cdot\mathbf x. \qquad (15.107)$$

(Notice that P′ measures the amount by which the pressure differs from the value it would have in supporting a hydrostatic atmosphere of the fluid at the reference density.) The Navier-Stokes equation (15.105) then becomes

$$\frac{d\mathbf v}{dt} = -\frac{\nabla P'}{\rho_0} - \alpha\tau\,\mathbf g + \nu\nabla^2\mathbf v, \qquad \text{Boussinesq (2)} \qquad (15.108)$$

where we have dropped the small term −ατ∇P′/ρ0, of order αP′. In words, a fluid element accelerates in response to a buoyancy force, which is the sum of the first and second terms on the right-hand side of Eq. (15.108), and a viscous force. In order to solve this equation we must be able to solve for the temperature perturbation τ. This evolves according to the standard equation of heat diffusion, Eq. (15.94):

$$\frac{d\tau}{dt} = \chi\nabla^2\tau. \qquad \text{Boussinesq (3)} \qquad (15.109)$$

Equations (15.101), (15.108) and (15.109) are the equations of fluid flow in the Boussinesq approximation; they control the coupled evolution of the velocity v and the temperature perturbation τ. We shall now use them to discuss free convection in a laboratory apparatus.

15.6.3 T2 Rayleigh-Bénard Convection

[Figure: fluid layer of thickness d between horizontal plates, the upper at temperature T0 − ΔT/2 and the lower at T0 + ΔT/2; x is horizontal and z vertical.]
Fig. 15.10: Rayleigh-Bénard convection. A fluid is confined between two horizontal surfaces separated by a vertical distance d. When the temperature difference ΔT between the two plates is increased sufficiently, the fluid will start to convect heat vertically. The reference effective pressure P0′ and reference temperature T0 are the values of P′ and T measured at the midplane z = 0.

In a relatively simple laboratory experiment to demonstrate convection, a fluid is confined between two rigid plates a distance d apart, each maintained at a fixed temperature, with the upper plate cooler than the lower by ΔT. When ΔT is small, viscous stresses, together with the no-slip boundary conditions at the plates, inhibit circulation; so, despite the upward buoyancy force on the hotter, less-dense fluid near the bottom plate, the fluid remains stably at rest with heat being conducted diffusively upward. If the plates' temperature difference ΔT is gradually increased, the buoyancy becomes gradually stronger. At some critical ΔT it will overcome the restraining viscous forces, and the fluid will start to circulate (convect) between the two plates. Our goal is to determine the critical temperature difference ΔTcrit for the onset of convection.

We now make some physical arguments to simplify the calculation of ΔTcrit. From our experience with earlier instability calculations, especially those involving elastic bifurcations (Secs. 10.8 and 11.3.5), we anticipate that for ΔT < ΔTcrit the response of the equilibrium
to small perturbations will be oscillatory (i.e., will have positive squared eigenfrequency ω²), while for ΔT > ΔTcrit, perturbations will grow exponentially (i.e., will have negative ω²). Correspondingly, at ΔT = ΔTcrit, ω² for some mode will be zero. This zero-frequency mode will mark the bifurcation of equilibria from one with no fluid motions to one with slow, convective motions. We shall search for ΔTcrit by searching for a solution to the Boussinesq equations (15.101), (15.108) and (15.109) that represents this zero-frequency mode. In those equations we shall choose for the reference temperature T0, density ρ0 and effective pressure P0′ the values at the midplane between the plates, z = 0; cf. Fig. 15.10.

The unperturbed equilibrium, when ΔT = ΔTcrit, is a solution of the Boussinesq equations (15.101), (15.108) and (15.109) with vanishing velocity, a time-independent vertical temperature gradient dT/dz = −ΔT/d, and a compensating, time-independent, vertical pressure gradient:

$$\mathbf v = 0, \qquad \tau = T - T_0 = -\frac{\Delta T}{d}\,z, \qquad P' = P_0' - g\rho_0\alpha\,\frac{\Delta T}{d}\,\frac{z^2}{2}. \qquad (15.110)$$

When the zero-frequency mode is present, the velocity v will be nonzero, and the temperature and effective pressure will have additional perturbations δτ and δP′:

$$\mathbf v \neq 0, \qquad \tau = T - T_0 = -\frac{\Delta T}{d}\,z + \delta\tau, \qquad P' = P_0' - g\rho_0\alpha\,\frac{\Delta T}{d}\,\frac{z^2}{2} + \delta P'. \qquad (15.111)$$

The perturbations v, δτ and δP′ are governed by the Boussinesq equations and the boundary conditions v = 0 (no-slip) and δτ = 0 at the plates, z = ±d/2. We shall manipulate these in such a way as to get a partial differential equation for the scalar temperature perturbation δτ by itself, decoupled from the velocity and the pressure perturbation.

Consider, first, the result of inserting expressions (15.111) into the Boussinesq-approximated Navier-Stokes equation (15.108). Because the perturbation mode has zero frequency, ∂v/∂t vanishes; and because v is extremely small, we can neglect the quadratic advective term v · ∇v, thereby bringing Eq. (15.108) into the form

$$\frac{\nabla\delta P'}{\rho_0} = \nu\nabla^2\mathbf v - \alpha\,\delta\tau\,\mathbf g. \qquad (15.112)$$

We want to eliminate δP′ from this equation. The other Boussinesq equations are of no help for this, since δP′ is absent from them. When we dealt with sound waves, we eliminated δP using the equation of state P = P(ρ, T); but in the present analysis our Boussinesq approximation insists that the only significant changes of density are those due to thermal expansion, i.e. it neglects the influence of pressure on density, so the equation of state cannot help us. Lacking any other way to eliminate δP′, we employ a very common trick: we take the curl of Eq. (15.112). As the curl of a gradient vanishes, δP′ drops out. We then take the curl one more time and use the fact that ∇ · v = 0 to obtain

$$\nu\nabla^2(\nabla^2\mathbf v) = \alpha\mathbf g\,\nabla^2\delta\tau - \alpha(\mathbf g\cdot\nabla)\nabla\delta\tau. \qquad (15.113)$$

Turn, next, to the Boussinesq version of the equation of heat transport, Eq. (15.109). Inserting into it Eqs. (15.111) for τ and v, setting ∂δτ/∂t to zero because our perturbation has zero frequency, linearizing in the perturbation, and using g = −g e_z, we obtain

$$v_z\,\frac{\Delta T}{d} = -\chi\nabla^2\delta\tau. \qquad (15.114)$$

This is an equation for the vertical velocity vz in terms of the temperature perturbation δτ. By inserting this vz into the z component of Eq. (15.113), we achieve our goal of a scalar equation for δτ alone:

$$\nu\chi\,\nabla^2\nabla^2\nabla^2\delta\tau = \frac{\alpha g\Delta T}{d}\left(\frac{\partial^2\delta\tau}{\partial x^2} + \frac{\partial^2\delta\tau}{\partial y^2}\right). \qquad (15.115)$$

This is a sixth-order differential equation, even more formidable than the fourth-order equations that arise in the elasticity calculations of Chaps. 10 and 11. We now see how prudent it was to make simplifying assumptions at the outset! The differential equation (15.115) is, however, linear, and we can seek solutions using separation of variables. As the equilibrium is unbounded horizontally, we look for a single horizontal Fourier component with some wave number k; i.e., we seek a solution of the form

$$\delta\tau \propto \exp(ikx)\,f(z), \qquad (15.116)$$

where f(z) is some unknown function. Such a δτ will be accompanied by motions v in the x and z directions (i.e., vy = 0) that also have the form vj ∝ exp(ikx) fj(z) for some other functions fj(z). The ansatz (15.116) converts the partial differential equation (15.115) into the single ordinary differential equation

$$\left(\frac{d^2}{dz^2} - k^2\right)^3 f + \frac{\mathrm{Ra}\,k^2}{d^4}\,f = 0, \qquad (15.117)$$

where we have introduced yet another dimensionless number

$$\mathrm{Ra} = \frac{\alpha g \Delta T d^3}{\nu\chi}, \qquad (15.118)$$

called the Rayleigh number. By virtue of the relation (15.114) between vz and δτ, the Rayleigh number is a measure of the ratio of the strength of the buoyancy term −αδτ g to the viscous term ν∇²v in the Boussinesq version (15.108) of the Navier-Stokes equation:

$$\mathrm{Ra} \sim \frac{\text{buoyancy force}}{\text{viscous force}}. \qquad (15.119)$$

The general solution of Eq. (15.117) is an arbitrary linear combination of three sine functions and three cosine functions:

$$f = \sum_{n=1}^{3}\left[A_n\cos(\mu_n k z) + B_n\sin(\mu_n k z)\right], \qquad (15.120)$$

where the dimensionless numbers µn are given by

$$\mu_n = \left[\left(\frac{\mathrm{Ra}}{k^4 d^4}\right)^{1/3} e^{2\pi n i/3} - 1\right]^{1/2}, \qquad n = 1, 2, 3, \qquad (15.121)$$

which involves the three cube roots of unity, e^{2πni/3}. The values of five of the coefficients An, Bn are fixed in terms of the sixth (an overall arbitrary amplitude) by five boundary conditions at the bounding plates, and a sixth boundary condition then determines the critical temperature difference ΔTcrit (or equivalently, the critical Rayleigh number Racrit) at which convection sets in. The six boundary conditions are: (i) The requirement that the fluid temperature be the same as the plate temperature at each plate, so δτ = 0 at z = ±d/2. (ii) The no-slip boundary condition vz = 0 at each plate, which, by virtue of Eq. (15.114) and δτ = 0 at the plates, translates into $\delta\tau_{,zz} = 0$ at z = ±d/2 (where the indices after the comma denote partial derivatives). (iii) The no-slip boundary condition vx = 0, which by virtue of incompressibility ∇ · v = 0 implies $v_{z,z} = 0$ at the plates, which in turn by Eq. (15.114) implies $\delta\tau_{,zzz} + \delta\tau_{,xxz} = 0$ at z = ±d/2. It is straightforward but computationally complex to impose these six boundary conditions and from them deduce the critical Rayleigh number for onset of convection; see Pellew and Southwell (1940). Rather than present the nasty details, we shall switch to a toy problem in which the boundary conditions are adjusted to give a simpler solution, but one with the same qualitative features as the real problem. Specifically, we shall replace the no-slip condition (iii) (vx = 0 at the plates) by a condition of no shear, (iii′) $v_{x,z} = 0$ at the plates. By virtue of incompressibility ∇ · v = 0, the x derivative of this translates into $v_{z,zz} = 0$, which by Eq. (15.114) translates to $\delta\tau_{,zzxx} + \delta\tau_{,zzzz} = 0$. To recapitulate, we seek a solution of the form (15.120), (15.121) that satisfies the boundary conditions (i), (ii), (iii′). The terms in Eq. (15.120) with n = 1, 2 always have complex arguments and thus always have z dependences that are products of hyperbolic and trigonometric functions with real arguments.
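To see the contrast between the n = 1, 2 roots and the n = 3 root concretely, the sketch below evaluates the three µn of Eq. (15.121) with Python's standard `cmath` module and checks that each one makes cos(µn kz) satisfy Eq. (15.117). The sample values Ra = 1000, kd = 2 are illustrative choices, not values from the text.

```python
import cmath


def mu_roots(Ra, kd):
    """The three mu_n of Eq. (15.121) (principal square roots), n = 1, 2, 3."""
    base = (Ra / kd**4) ** (1.0 / 3.0)
    return [cmath.sqrt(base * cmath.exp(2j * cmath.pi * n / 3) - 1.0) for n in (1, 2, 3)]


def ode_residual(mu, Ra, kd):
    """Eq. (15.117) applied to f = cos(mu k z), divided by k^6 f.

    d^2/dz^2 acting on cos(mu k z) gives -mu^2 k^2, so the operator
    (d^2/dz^2 - k^2)^3 contributes -(mu^2 + 1)^3 k^6; the Rayleigh term adds Ra k^2 / d^4.
    """
    return -((mu**2 + 1.0) ** 3) + Ra / kd**4


if __name__ == "__main__":
    Ra, kd = 1000.0, 2.0  # illustrative values
    for n, mu in enumerate(mu_roots(Ra, kd), start=1):
        print(n, mu, abs(ode_residual(mu, Ra, kd)))
```

For these inputs µ3 is real and positive (because Ra > k⁴d⁴), while µ1 and µ2 are complex conjugates with sizable imaginary parts, which is what produces the mixed hyperbolic-trigonometric z dependence described above. The overall sign of each square root is immaterial, since both cosines and sines appear in Eq. (15.120).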
For n = 3 and large enough Rayleigh number, µ3 is positive and the solutions are pure sines and cosines. Let us just consider the n = 3 terms alone, in this regime, and impose boundary condition (i), that δτ = 0 at the plates. The cosine term by itself,

$$\delta\tau = \text{constant}\times\cos(\mu_3 k z)\,e^{ikx}, \qquad (15.122)$$

satisfies this, if we set

$$\frac{\mu_3 k d}{2} \equiv \left[\left(\frac{\mathrm{Ra}}{k^4 d^4}\right)^{1/3} - 1\right]^{1/2}\frac{kd}{2} = \left(m + \tfrac{1}{2}\right)\pi, \qquad (15.123)$$

where m is an integer. It is straightforward to show, remarkably, that Eqs. (15.122), (15.123) also satisfy boundary conditions (ii) and (iii′), so they solve the toy version of our problem.

[Figure: horizontal wave number k versus Rayleigh number Ra. The marginal-stability curve (ω = 0) separates a stable region (ω real) from an unstable region (ω imaginary); its minimum lies at Ra = Racrit, k = kcrit.]
Fig. 15.11: Horizontal wave number k of the first mode to go unstable, as a function of Rayleigh number, Ra. Along the solid curve the mode has zero frequency; to the left of the curve it is stable, to the right it is unstable. Racrit is the minimum Rayleigh number for convective instability.

As ΔT is gradually increased from zero, the Rayleigh number Ra gradually grows, passing one after another through the sequence of values (15.123) with m = 0, 1, 2, ... (for any chosen k). At each of these values there is a zero-frequency, circulatory mode of fluid motion with horizontal wave number k, which is passing from stability to instability. The first of these, m = 0, represents the onset of circulation for the chosen k, and the Rayleigh number at this onset [Eq. (15.123) with m = 0] is

$$\mathrm{Ra} = \frac{(k^2 d^2 + \pi^2)^3}{k^2 d^2}. \qquad (15.124)$$
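Eq. (15.124) can be minimized numerically to locate the bottom of the marginal-stability curve in Fig. 15.11. The sketch below uses a plain golden-section search (a standard one-dimensional minimizer, chosen here only because it needs no external libraries; the search bracket [0.1, 10] is an assumption). Analytically, setting dRa/d(k²d²) = 0 gives k²d² = π²/2, so the numerical minimum should sit at kd = π/√2 ≈ 2.22 with Ra = 27π⁴/4 ≈ 657.5.

```python
import math


def rayleigh_onset(kd):
    """Toy-problem onset Rayleigh number, Eq. (15.124) (the m = 0 mode)."""
    x = kd * kd
    return (x + math.pi**2) ** 3 / x


def minimize_golden(func, lo, hi, tol=1e-9):
    """Golden-section search for the minimum of a unimodal func on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) < func(d):
            b, d = d, c                 # minimum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                 # minimum lies in [c, b]
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


if __name__ == "__main__":
    kd_crit = minimize_golden(rayleigh_onset, 0.1, 10.0)
    print(kd_crit, math.pi / math.sqrt(2.0))
    print(rayleigh_onset(kd_crit), 27.0 * math.pi**4 / 4.0)
```

Because Ra(k) has a single minimum in k²d², the golden-section search is guaranteed to converge to it; in practice one would simply use the closed form, and the numerical route is mainly a check on the algebra.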

This Ra(k) relation is plotted as a thick curve in Fig. 15.11. Notice in Fig. 15.11 that there is a critical Rayleigh number Racrit below which all modes are stable, independent of their wave numbers, and above which modes in some range kmin < k < kmax are unstable. From Eq. (15.124) we deduce that, for our toy problem, Racrit = 27π⁴/4 ≃ 660. When one imposes the correct boundary conditions (i), (ii), (iii) [instead of our toy choice (i), (ii), (iii′)] and works through the nasty details of the computation, one obtains a Ra(k) relation that looks qualitatively the same as Fig. 15.11, and one deduces that convection should set in at Racrit ≃ 1700, which agrees reasonably well with experiment. One can carry out the same computation with the fluid's upper surface free to move (e.g., due to placing air rather than a solid plate at z = d/2). Such a computation predicts that convection begins at Racrit ≃ 1100, though in practice surface tension is usually important and its effect must be included.

One feature of these critical Rayleigh numbers is very striking. Because the Rayleigh number is an estimate of the ratio of buoyancy forces to viscous forces [Eq. (15.119)], an order-of-magnitude analysis suggests that convection should set in at Ra ∼ 1, which is wrong by three orders of magnitude! This provides a vivid reminder that order-of-magnitude estimates can be quite inaccurate. In this case, the main reason for the discrepancy is that the convective onset is governed by a sixth-order differential equation (15.115), and thus is very sensitive to the length scale d used in the order-of-magnitude analysis. If we choose d/π rather than d as the length scale, then an order-of-magnitude estimate could give Ra ∼ π⁶ ∼ 1000, a much more satisfactory value.

Once convection has set in, the unstable modes grow until viscosity and nonlinearities stabilize them, at which point they carry far more heat upward between the plates than does conduction. The convection's velocity pattern depends, in practice, on the manner in which the heat is applied and the temperature dependence of the viscosity. For a limited range of Rayleigh numbers near Racrit, it is possible to excite a hexagonal pattern of convection cells as shown in Fig. 15.12. When the Rayleigh number becomes very large, the convection becomes fully turbulent and we must introduce an effective turbulent viscosity to replace the molecular viscosity (cf. Chap. 14).

Fig. 15.12: Hexagonal convection cells in Rayleigh-Bénard convection. The fluid, which is visualized using aluminum powder, rises at the centers of the hexagons and falls around the edges.

Free convection, like that in this laboratory experiment, also occurs in meteorological and geophysical flows. For example, for air in a room, the relevant parameter values are α = 1/T ∼ 0.003 K⁻¹ (Charles' Law) and ν ∼ χ ∼ 10⁻⁵ m² s⁻¹, so the Rayleigh number is Ra ∼ 3 × 10⁸ (ΔT/1 K)(d/1 m)³. Convection in a room thus occurs extremely readily, even for small temperature differences. In fact, so many modes of convective motion can be excited that heat-driven air flow is invariably turbulent. It is therefore common in everyday situations to describe heat transport using a phenomenological turbulent thermal conductivity (cf. Sec. 14.3). A further example is given in Box 15.3.
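The room-air estimate follows directly from Eq. (15.118). The sketch below assumes T = 300 K (so α = 1/T) and g = 9.8 m s⁻², round values not stated explicitly in the text, and also inverts Ra = Racrit ≃ 1700 to find the temperature difference across a 1 m layer at which the rigid-plate criterion would first be met.

```python
def rayleigh_number(alpha, g, delta_T, d, nu, chi):
    """Rayleigh number Ra = alpha g delta_T d^3 / (nu chi), Eq. (15.118)."""
    return alpha * g * delta_T * d**3 / (nu * chi)


# Air in a room, with the text's alpha = 1/T and nu ~ chi ~ 1e-5 m^2 s^-1.
# T = 300 K and g = 9.8 m s^-2 are assumed round values.
T_ROOM = 300.0
RA_PER_KELVIN = rayleigh_number(alpha=1.0 / T_ROOM, g=9.8, delta_T=1.0, d=1.0, nu=1e-5, chi=1e-5)

# Temperature difference across a 1 m layer at which Ra reaches the rigid-plate value ~1700.
RA_CRIT_RIGID = 1700.0
DELTA_T_ONSET = RA_CRIT_RIGID / RA_PER_KELVIN

if __name__ == "__main__":
    print(f"Ra per kelvin for d = 1 m: {RA_PER_KELVIN:.1e}")
    print(f"onset temperature difference: {DELTA_T_ONSET:.1e} K")
```

The onset temperature difference comes out at a few microkelvin, which makes quantitative the statement that convection in a room occurs extremely readily.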
Box 15.3 Mantle Convection and Continental Drift

As is now well known, the continents drift over the surface of the globe on a timescale of roughly a hundred million years. Despite the clear geographical evidence that the continents fit together, geophysicists were, for a long while, skeptical that this occurred because they were unable to identify the forces responsible for overcoming the visco-elastic resilience of the crust. It is now known that these motions are in fact slow convective circulation of the mantle driven by internally generated heat from the radioactive decay of unstable isotopes, principally uranium, thorium and potassium.

When the heat is generated within the convective layer, rather than passively transported from below, we must modify our definition of the Rayleigh number. Let the heat generated per unit mass per unit time be Q. In the analog of our laboratory analysis, where the fluid is assumed marginally unstable to convective motions, this Q will generate a heat flux ∼ ρQd, which must be carried diffusively. Equating this flux to κΔT/d, we can solve for the temperature difference ΔT between the lower and upper edges of the convective mantle: ΔT ∼ ρQd²/κ. Inserting this ΔT into Eq. (15.118), we obtain a modified expression for the Rayleigh number

$$\mathrm{Ra}' = \frac{\alpha\rho g Q d^5}{\kappa\chi\nu}. \qquad (1)$$

Let us now estimate the value of Ra′ for the earth's mantle. The mantle's kinematic viscosity can be measured by post-glacial rebound studies (cf. Ex. 13.5) to be ∼ 10¹⁷ m² s⁻¹. We can use the rate of attenuation of diurnal and annual temperature variation with depth in surface rock to estimate a thermal diffusivity χ ∼ 10⁻⁶ m² s⁻¹. Direct experiment furnishes an expansion coefficient α ∼ 3 × 10⁻⁵ K⁻¹. The thickness of the upper mantle is roughly 700 km and the rock density is about 4000 kg m⁻³. The rate of heat generation can be estimated both by chemical analysis and by direct measurement at the earth's surface, and turns out to be Q ∼ 10⁻¹¹ W kg⁻¹. Combining these quantities, we obtain an estimated Rayleigh number Ra′ ∼ 10⁶, well in excess of the critical value for convection under free-slip conditions, which evaluates to 868 (Turcotte & Schubert 1982). For this reason, it is now believed that continental drift is driven primarily by mantle convection.

****************************
EXERCISES

Exercise 15.19 Problem: Critical Rayleigh Number
Estimate the temperature to which pans of oil (ν ∼ 10⁻⁵ m² s⁻¹, Pr ∼ 3000), water (ν ∼ 10⁻⁶ m² s⁻¹, Pr ∼ 6) and mercury (ν ∼ 10⁻⁷ m² s⁻¹, Pr ∼ 0.02) would have to be heated in order to convect. Assume that the upper surface is at room temperature. Do not perform this experiment with mercury!

Exercise 15.20 Problem: Width of Thermal Plume
Consider a two-dimensional thermal plume transporting heat away from a hot knife edge. Introduce a temperature deficit ΔT(z) measuring the typical difference in temperature between the plume and the surrounding fluid at height z above the knife edge, and let δp(z) be the width of the plume at height z.
(a) Show that energy conservation implies the constancy of $\delta_p\,\Delta T\,\bar v_z$, where $\bar v_z(z)$ is the plume's mean vertical speed at height z.

(b) Make an estimate of the buoyancy acceleration and use this to estimate $\bar v_z$.
(c) Use Eq. (15.109) to relate the width of the plume to the speed. Hence, show that the width of the plume scales as $\delta_p \propto z^{2/5}$ and the temperature deficit as $\Delta T \propto z^{-3/5}$.
(d) Repeat this exercise for a three-dimensional plume above a hot spot.

****************************
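Eq. (1) of Box 15.3 needs the thermal conductivity κ, which the box does not quote. The sketch below assumes the standard relation χ = κ/(ρCP) with a rock specific heat CP ≈ 10³ J kg⁻¹ K⁻¹ (an assumed round value, not from the box) and g = 9.8 m s⁻²; with the box's other numbers it gives Ra′ of a few million, consistent with the box's order-of-magnitude Ra′ ∼ 10⁶ and far above the free-slip critical value 868.

```python
def internally_heated_rayleigh(alpha, rho, g, Q, d, kappa, chi, nu):
    """Ra' = alpha rho g Q d^5 / (kappa chi nu), Eq. (1) of Box 15.3."""
    return alpha * rho * g * Q * d**5 / (kappa * chi * nu)


RHO = 4000.0         # rock density, kg m^-3 (Box 15.3)
CHI = 1e-6           # thermal diffusivity, m^2 s^-1 (Box 15.3)
CP_ASSUMED = 1.0e3   # specific heat of rock, J kg^-1 K^-1 -- assumed, not in Box 15.3
KAPPA = RHO * CP_ASSUMED * CHI  # conductivity from chi = kappa / (rho C_P), W m^-1 K^-1

RA_MANTLE = internally_heated_rayleigh(
    alpha=3e-5,   # expansion coefficient, K^-1
    rho=RHO,
    g=9.8,        # m s^-2 (assumed round value)
    Q=1e-11,      # heating rate, W kg^-1
    d=7e5,        # upper-mantle thickness, m
    kappa=KAPPA,
    chi=CHI,
    nu=1e17,      # kinematic viscosity, m^2 s^-1
)

if __name__ == "__main__":
    print(f"Ra' ~ {RA_MANTLE:.1e}  (free-slip critical value 868)")
```

Because Ra′ scales as d⁵, the result is far more sensitive to the assumed layer thickness than to the order-unity uncertainty in CP.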

15.6.4 T2 Convection in Stars

The sun and other stars generate heat in their interiors by nuclear reactions. In most stars, the internal energy is predominantly in the form of hot hydrogen and helium ions and their electrons, while the thermal conductivity is due primarily to diffusing photons (Sec. 2.7), which have much longer mean free paths than the ions and electrons. When the photon mean free path becomes small due to high opacity (as happens in the outer 30 per cent of the sun; Fig. 15.13), the thermal conductivity goes down, so in order to transport the heat from nuclear burning, the star develops an increasingly steep temperature gradient. The star may then become convectively unstable and transport its energy far more efficiently by circulating its hot gas than it could have by photon diffusion. Describing this convection is a key step in understanding the interiors of the sun and other stars.

[Figure: schematic of a solar-type star showing the core, the outer convection zone, and the photosphere.]
Fig. 15.13: A convection zone occupies the outer 30 per cent of a solar-type star.

A heuristic argument provides the basis for a surprisingly simple description of this convection. As a foundation for our argument, let us identify the relevant physics. First: the pressure within stars varies through many orders of magnitude, typically 10¹² for the sun. Therefore, we cannot use the Boussinesq approximation; instead, as a fluid element rises or descends, we must allow for its density to change in response to large changes of the surrounding pressure. Second: the convection involves circulatory motions on such large scales that the attendant shears are small and viscosity is thus unimportant. Third: because the convection is driven by the ineffectiveness of conduction, we can idealize each fluid element as retaining its heat as it moves, so the flow is adiabatic. Fourth: the convection will usually be very subsonic, as subsonic motions are easily sufficient to transport the nuclear-generated heat, except very close to the solar surface.

[Figure: blobs A and B in a star, shown before and after their interchange, with the direction of gravity g indicated.]
Fig. 15.14: Convectively unstable interchange of two blobs in a star. Blob B rises to the former position of blob A and expands adiabatically to match the surrounding pressure. The entropy per unit mass of the blob is higher than that of the surrounding gas and so the blob has a lower density. It will therefore be buoyant and continue to rise. Similarly, blob A will continue to sink.

Our heuristic argument, then, focuses on convecting fluid blobs that move through the star's interior very subsonically, adiabatically, and without viscosity. As the motion is subsonic, each blob will remain in pressure equilibrium with its surroundings. Now, suppose we make a virtual interchange between two blobs at different heights (Fig. 15.14). The blob that rises (blob B in the figure) will experience a decreased pressure and thus will expand, so its density will diminish. If its density after rising is lower than that of its surroundings, then it will be buoyant and continue to rise. Conversely, if the risen blob is denser than its surroundings, then it will sink back to its original location. Therefore, a criterion for convective instability is that the risen blob has lower density than its surroundings. Since the blob and its surroundings have the same pressure, and since at fixed pressure the entropy s per unit mass of gas is larger the lower its density (there being more phase space available to its particles), the fluid is convectively unstable if the risen blob has a higher entropy than its surroundings. Now, the blob's motion was adiabatic, so its entropy per unit mass s is the same after it rises as before. Therefore, the fluid is convectively unstable if the entropy per unit mass s at the location where the blob began (lower in the star) is greater than that at the location to which it rose (higher in the star); i.e., the star is convectively unstable if its entropy per unit mass decreases outward, ds/dr < 0. For small blobs, this instability will be counteracted by both viscosity and heat conduction; but for large blobs viscosity and conduction are ineffective, and the convection proceeds. When building stellar models, astrophysicists find it convenient to determine whether a region of a model is convectively unstable by computing what its structure would be without convection, i.e., with all its heat carried radiatively.
That computation gives some temperature gradient dT/dr. If this computed dT/dr is superadiabatic, i.e., if

$$-\frac{d\ln T}{d\ln r} > -\left(\frac{\partial\ln T}{\partial\ln P}\right)_s\frac{d\ln P}{d\ln r} \equiv -\left(\frac{d\ln T}{d\ln r}\right)_s, \qquad (15.125)$$

then correspondingly the entropy s decreases outward, and the star is convectively unstable. This is known as the Schwarzschild criterion for convection, since it was formulated by the same Karl Schwarzschild who discovered the Schwarzschild solution to Einstein's equations (which describes a nonrotating black hole; Chap. 25). In practice, if the star is convective, then the convection is usually so efficient at transporting heat that the actual temperature gradient is only slightly superadiabatic; i.e., the entropy s is nearly independent of radius: it decreases outward only very slightly. (Of course, the entropy can increase significantly outward in a convectively stable zone where radiative diffusion is adequate to transport heat.)

We can demonstrate the efficiency of convection by estimating the convective heat flux when the temperature gradient is slightly superadiabatic, i.e., when Δ|∇T| ≡ |dT/dr| − |(dT/dr)s| is slightly positive. As a tool in our estimate, we introduce the concept of the mixing length, denoted by l: the typical distance a blob travels before breaking up. As the blob is in pressure equilibrium, we can estimate its fractional density difference from its surroundings by δρ/ρ ∼ δT/T ∼ Δ|∇T| l/T. Invoking Archimedes' principle, we estimate the blob's acceleration to be ∼ g δρ/ρ ∼ gΔ|∇T| l/T (where g is the local acceleration of gravity). A blob accelerating over a distance ∼ l acquires v̄² ∼ (acceleration) × l, so the average speed with which a blob rises or sinks will be $\bar v \sim (g\,\Delta|\nabla T|/T)^{1/2}\,l$. The convective heat flux is then given by

$$F_{\rm conv} \sim C_P\,\rho\,\bar v\,l\,\Delta|\nabla T| \sim C_P\,\rho\left(\frac{g}{T}\right)^{1/2}(\Delta|\nabla T|)^{3/2}\,l^2. \qquad (15.126)$$
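Dividing the Schwarzschild criterion (15.125) by −d ln P/d ln r, which is positive because pressure falls outward, turns it into the equivalent statement that the logarithmic gradient d ln T/d ln P exceeds the adiabatic value ∇ad ≡ (∂ ln T/∂ ln P)s. The sketch below codes both forms. The default ∇ad = 2/5 = (γ − 1)/γ assumes an ideal monatomic (γ = 5/3) gas, such as the fully ionized plasma discussed below, and the gradient values in the example are made up for illustration.

```python
def is_convectively_unstable(dlnT_dlnr, dlnP_dlnr, nabla_ad=0.4):
    """Schwarzschild criterion in the form of Eq. (15.125).

    Unstable when -dlnT/dlnr > -nabla_ad * dlnP/dlnr, with nabla_ad = (dlnT/dlnP)_s.
    The default 0.4 = (gamma - 1)/gamma assumes an ideal gas with gamma = 5/3.
    """
    return -dlnT_dlnr > -nabla_ad * dlnP_dlnr


def is_unstable_log_gradient(dlnT_dlnr, dlnP_dlnr, nabla_ad=0.4):
    """Equivalent form: d ln T / d ln P exceeds nabla_ad (requires dlnP/dlnr < 0)."""
    if dlnP_dlnr >= 0:
        raise ValueError("pressure must decrease outward (dlnP/dlnr < 0)")
    return dlnT_dlnr / dlnP_dlnr > nabla_ad


if __name__ == "__main__":
    # Illustrative (made-up) radiative temperature gradients with dlnP/dlnr = -10:
    for dlnT in (-3.0, -4.5, -6.0):
        print(dlnT, is_convectively_unstable(dlnT, -10.0), is_unstable_log_gradient(dlnT, -10.0))
```

With d ln P/d ln r = −10 the adiabatic temperature gradient is d ln T/d ln r = −4, so a computed radiative gradient steeper than that (e.g. −4.5 or −6) flags the region as convective.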

We can bring this into a more useful form, accurate to within factors of order unity, by setting the mixing length equal to the pressure scale height, l ∼ H = |dr/d ln P|, as is usually the case in the outer parts of a star; setting CP ∼ h/T, where h is the enthalpy per unit mass [cf. the first law of thermodynamics, Eq. (3) of Table ??]; setting g = −(P/ρ) d ln P/dr ∼ c²|d ln P/dr| [cf. the equation of hydrostatic equilibrium (12.13) and Eq. (15.48d) for the speed of sound c]; and setting |∇T| ≡ |dT/dr| ∼ T|d ln P/dr|. The resulting expression for Fconv can then be inverted to give

$$\frac{\Delta|\nabla T|}{|\nabla T|} \sim \left(\frac{F_{\rm conv}}{h\rho c}\right)^{2/3} \sim \left(\frac{F_{\rm conv}}{\tfrac{5}{2}P\sqrt{k_B T/m_p}}\right)^{2/3}. \qquad (15.127)$$

Here the last expression is obtained from the fact that the gas is fully ionized, so its enthalpy is $h = \tfrac{5}{2}P/\rho$ and its speed of sound is about the thermal speed of its protons (the most numerous massive particle), $c \sim \sqrt{k_B T/m_p}$ (with $k_B$ Boltzmann's constant and $m_p$ the proton rest mass). It is informative to apply this estimate to the convection zone of the sun (the outer $\sim 30$ per cent of its radius; Fig. 15.13). The luminosity of the sun is $\sim 4\times 10^{26}$ W and its radius is $7\times 10^5$ km, so its convective energy flux is $F_{\rm conv} \sim 10^8$ W m$^{-2}$. Consider, first, the convection zone's base. The pressure there is $P \sim 1$ TPa and the temperature is $T \sim 10^6$ K, so Eq. (15.127) predicts $\Delta|\nabla T|/|\nabla T| \sim 3\times 10^{-6}$; i.e., the temperature gradient at the base of the convection zone need only be superadiabatic by a few parts in a million in order to carry the solar energy flux.
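The two solar estimates can be made explicit with a few lines of Python. This is a sketch that plugs the text's round numbers into Eq. (15.127); the physical constants are standard values, and, as in the text, every factor of order unity other than the 5/2 in $h$ is dropped. With these inputs the value at the base comes out somewhat below $10^{-6}$; the difference from the quoted $3\times 10^{-6}$ is of the order-unity kind this estimate does not attempt to control, and the conclusion (a tiny superadiabatic excess at the base, an order-unity one at the top) is unchanged.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant [J/K]
M_P = 1.67262192e-27  # proton rest mass [kg]


def superadiabatic_fraction(f_conv, pressure, sound_speed):
    """Order-of-magnitude Delta|grad T| / |grad T| from Eq. (15.127).

    Uses h * rho * c = (5/2) P c for a fully ionized gas; all other
    order-unity factors are dropped, as in the estimate in the text.
    """
    h_rho_c = 2.5 * pressure * sound_speed  # enthalpy-flux scale [W m^-2]
    return (f_conv / h_rho_c) ** (2.0 / 3.0)


# Convective flux quoted in the text [W m^-2]. (L_sun / (4 pi R_sun^2) with
# L = 4e26 W and R = 7e8 m gives ~6.5e7, consistent to order unity.)
f_conv = 1e8

# Base of the convection zone: P ~ 1 TPa, T ~ 1e6 K, c ~ sqrt(kB T / mp).
c_base = math.sqrt(K_B * 1e6 / M_P)
base = superadiabatic_fraction(f_conv, 1e12, c_base)

# Top of the convection zone: P ~ 10 kPa, c ~ 10 km/s.
top = superadiabatic_fraction(f_conv, 1e4, 1e4)

print(f"c_base ~ {c_base:.1e} m/s")
print(f"base: Delta|grad T|/|grad T| ~ {base:.0e}")
print(f"top:  Delta|grad T|/|grad T| ~ {top:.1f}")
```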

By contrast, at the top of the convection zone (which is nearly at the solar surface), the gas pressure is only $\sim 10$ kPa and the sound speed is $\sim 10$ km s$^{-1}$, so $h\rho c \sim 10^8$ W m$^{-2}$, and $\Delta|\nabla T|/|\nabla T| \sim 1$; i.e., the temperature gradient must depart significantly from the adiabatic gradient in order to carry the heat. Moreover, the convective elements, in their struggle to carry the heat, move with a significant fraction of the sound speed, so it is no longer true that they are in pressure equilibrium with their surroundings. A more sophisticated theory of convection is therefore necessary near the solar surface.
Convection is very important in some other types of stars. It is the primary means of heat transport in the cores of stars with high mass and high luminosity, and throughout very young stars before they start to burn their hydrogen in nuclear reactions.

****************************
EXERCISES

Exercise 15.21 Problem: Radiative Transport
The density and temperature in the interior of the sun are roughly $10^5$ kg m$^{-3}$ and $1.5\times 10^7$ K.
(a) Estimate the central gas pressure and radiation pressure and their ratio.
(b) The mean free path of the radiation is determined almost equally by Thomson scattering, bound-free absorption and free-free absorption. Estimate numerically the photon mean free path and hence estimate the photon escape time and the luminosity. How well do your estimates compare with the known values for the sun?

Exercise 15.22 Problem: Bubbles
Consider a small bubble of air rising slowly in a large expanse of water. If the bubble is large enough for surface tension to be ignored, then it will form an irregular cap of radius $r$. Show that the speed with which the bubble rises is roughly $\sim (gr)^{1/2}$. (A more refined estimate gives a numerical coefficient of 2/3.)

****************************

15.6.5

T2 Double Diffusion — Salt Fingers

Convection, as we have described it so far, is driven by the presence of an unbalanced buoyancy force in an equilibrium distribution of fluid. However, it can also arise as a higher-order effect even if the fluid is stably stratified, i.e., if the density gradient is in the same direction as gravity. An example is salt fingering, a rapid mixing that can occur when warm, salty water lies at rest above colder fresh water. The higher temperature of the upper fluid initially outbalances the weight of its salt, making it more buoyant than the fresh water below. However, the heat diffuses downward faster than the salt, enabling a density inversion gradually to develop and the salt-rich fluid to begin a slow interchange with the salt-poor fluid below.
It is possible to describe this instability using a local perturbation analysis. The setup is somewhat similar to the one we used in Sec. 15.6.3 to analyze Rayleigh-Bénard convection: we consider a stratified fluid in which there is a vertical gradient in the temperature, and, as before, we measure its departure from a reference temperature $T_0$ at a midplane ($z = 0$) by $\tau \equiv T - T_0$. We presume that in the equilibrium $\tau$ varies linearly with $z$, so $\nabla\tau = (d\tau/dz)\,\mathbf{e}_z$ is constant. Similarly, we characterize the salt concentration by $C \equiv$ (concentration) − (equilibrium concentration at the midplane), and we assume that in equilibrium $C$, like $\tau$, varies linearly with height, so $\nabla C = (dC/dz)\,\mathbf{e}_z$ is constant. The density $\rho$ will be equal to the equilibrium density at the midplane plus corrections due to thermal expansion and due to salt concentration,

$$\rho = \rho_0 - \alpha\rho_0\tau + \beta\rho_0 C \tag{15.128}$$

[cf. Eq. (15.103)]. Here $\beta$ is a constant for concentration analogous to the thermal expansion coefficient $\alpha$ for temperature. In this problem, by contrast with Rayleigh-Bénard convection, it is easier to work directly with the pressure than with the modified pressure. In equilibrium, hydrostatic equilibrium dictates that its gradient be $\nabla P = \rho\,\mathbf{g}$, where the gravitational acceleration $\mathbf{g}$ points downward. Now, let us perturb about these values and write down the linearized equations for the evolution of the perturbations. We shall denote the perturbation of temperature (relative to the reference temperature) by $\delta\tau$, of salt concentration by $\delta C$, of density by $\delta\rho$, of pressure by $\delta P$, and of velocity simply by $\mathbf{v}$, since the unperturbed state has $\mathbf{v} = 0$. We shall not ask about the onset of instability, but rather (because we expect our situation to be generically unstable) we shall seek a dispersion relation $\omega(\mathbf{k})$ for the perturbations.
Correspondingly, in all our perturbation equations we shall replace $\partial/\partial t$ with $-i\omega$ and $\nabla$ with $i\mathbf{k}$, except for the equilibrium $\nabla C$ and $\nabla\tau$, which are constants. The first of our perturbation equations is the linearized Navier-Stokes equation

$$-i\omega\rho_0\mathbf{v} = -i\mathbf{k}\,\delta P + \mathbf{g}\,\delta\rho - \nu k^2\rho_0\mathbf{v} , \tag{15.129}$$

where we have kept the viscous term because we expect the Prandtl number to be of order unity (for water, Pr $\sim 6$). Low velocity implies incompressibility, $\nabla\cdot\mathbf{v} = 0$, which becomes

$$\mathbf{k}\cdot\mathbf{v} = 0 . \tag{15.130}$$

The density perturbation follows from the perturbed form of Eq. (15.128):

$$\delta\rho = -\alpha\rho_0\,\delta\tau + \beta\rho_0\,\delta C . \tag{15.131}$$

The temperature perturbation is governed by Eq. (15.109), which linearizes to

$$-i\omega\,\delta\tau + (\mathbf{v}\cdot\nabla)\tau = -\chi k^2\,\delta\tau . \tag{15.132}$$

Assuming that the timescale for the salt to diffuse is much longer than that for the temperature, we can ignore salt diffusion altogether, so that $d\,\delta C/dt = 0$, which becomes

$$-i\omega\,\delta C + (\mathbf{v}\cdot\nabla)C = 0 . \tag{15.133}$$

Equations (15.129)–(15.133) are five equations for the five unknowns $\delta P, \delta\rho, \delta C, \delta\tau, \mathbf{v}$, one of which is a three-component vector! Unless we are careful, we will end up with a seventh-order algebraic equation. Fortunately, there is a way to keep the algebra manageable. First, we eliminate the pressure perturbation by taking the curl of Eq. (15.129) [or, equivalently, by crossing $\mathbf{k}$ into Eq. (15.129)]:

$$(-i\omega + \nu k^2)\rho_0\,\mathbf{k}\times\mathbf{v} = \mathbf{k}\times\mathbf{g}\,\delta\rho . \tag{15.134}$$

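Taking the curl of this equation again allows us to incorporate incompressibility (15.130); dotting the result with $\mathbf{g}$ then gives

$$(i\omega - \nu k^2)\rho_0 k^2\,\mathbf{g}\cdot\mathbf{v} = \left[(\mathbf{k}\cdot\mathbf{g})^2 - k^2 g^2\right]\delta\rho . \tag{15.135}$$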
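Written out, the step from Eq. (15.134) to Eq. (15.135) uses the double-cross-product identity $\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = \mathbf{b}(\mathbf{a}\cdot\mathbf{c}) - \mathbf{c}(\mathbf{a}\cdot\mathbf{b})$ together with incompressibility (15.130), followed by a dot product with $\mathbf{g}$. This is only a sketch of the intermediate algebra; its last line is Eq. (15.135).

```latex
\begin{aligned}
\mathbf{k}\times(\mathbf{k}\times\mathbf{v}) &= \mathbf{k}(\mathbf{k}\cdot\mathbf{v}) - k^2\mathbf{v} = -k^2\mathbf{v}
  &&\text{[using Eq. (15.130)]},\\
\mathbf{k}\times(\mathbf{k}\times\mathbf{g}) &= \mathbf{k}(\mathbf{k}\cdot\mathbf{g}) - k^2\mathbf{g} ,\\
\mathbf{k}\times(15.134):\quad (i\omega - \nu k^2)\rho_0 k^2\,\mathbf{v} &= \left[\mathbf{k}(\mathbf{k}\cdot\mathbf{g}) - k^2\mathbf{g}\right]\delta\rho ,\\
\mathbf{g}\cdot(\text{above}):\quad (i\omega - \nu k^2)\rho_0 k^2\,\mathbf{g}\cdot\mathbf{v} &= \left[(\mathbf{k}\cdot\mathbf{g})^2 - k^2 g^2\right]\delta\rho .
\end{aligned}
```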

Since $\mathbf{g}$ points vertically, this is one equation for the density perturbation in terms of the vertical velocity perturbation $v_z$. We can obtain a second equation of this sort by inserting Eq. (15.132) for $\delta\tau$ and Eq. (15.133) for $\delta C$ into Eq. (15.131); the result is

$$\delta\rho = -\frac{\alpha\rho_0}{i\omega - \chi k^2}(\mathbf{v}\cdot\nabla)\tau + \frac{\beta\rho_0}{i\omega}(\mathbf{v}\cdot\nabla)C . \tag{15.136}$$

Since the unperturbed gradients of temperature and salt concentration are both vertical, Eq. (15.136), like (15.135), involves only $v_z$ and not $v_x$ or $v_y$. Solving both (15.135) and (15.136) for the ratio $\delta\rho/v_z$ and equating these two expressions, we obtain the following dispersion relation for our perturbations:

$$\omega(\omega + i\nu k^2)(\omega + i\chi k^2) + \left[1 - \frac{(\mathbf{k}\cdot\mathbf{g})^2}{k^2 g^2}\right]\left[\omega\alpha(\mathbf{g}\cdot\nabla)\tau - (\omega + i\chi k^2)\beta(\mathbf{g}\cdot\nabla)C\right] = 0 . \tag{15.137}$$

When $\mathbf{k}$ is real, as we shall assume, we can write this dispersion relation as a cubic equation for $p = -i\omega$ with real coefficients. The roots for $p$ are either all real or one real and two complex conjugates, and growing modes have the real part of $p$ positive. When the constant term in the cubic is negative, i.e., when

$$(\mathbf{g}\cdot\nabla)C < 0 , \tag{15.138}$$

we are guaranteed that there will be at least one positive, real root $p$, and this root will correspond to an unstable, growing mode. Therefore, a sufficient condition for instability is that the concentration of salt increase with height! By inspecting the dispersion relation we conclude that the growth rate will be maximal when $\mathbf{k}\cdot\mathbf{g} = 0$, i.e., when the wave vector is horizontal. What is the direction of the velocity $\mathbf{v}$ for these fastest-growing modes? Incompressibility (15.130) says that $\mathbf{v}$ is orthogonal to the horizontal $\mathbf{k}$; and Eq. (15.134) says that $\mathbf{k}\times\mathbf{v}$ points in the same direction as $\mathbf{k}\times\mathbf{g}$, which is horizontal since $\mathbf{g}$ is vertical. These two conditions imply that $\mathbf{v}$ points vertically. Thus, these fastest modes represent fingers of salty water descending past rising fingers of fresh water; cf. Fig. 15.15. For large $k$ (narrow fingers), the dispersion relation (15.137) predicts

$$i\omega \sim \frac{\beta(\mathbf{g}\cdot\nabla)C}{\nu k^2} , \tag{15.139}$$

so the growth rate $-i\omega$ is positive when condition (15.138) holds.
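To see condition (15.138) and the narrow-finger limit (15.139) at work, the sketch below writes Eq. (15.137) as a real cubic in $p = -i\omega$ and finds its positive root by bisection. The negative constant term guarantees a sign change between $p = 0$ and large $p$, which is exactly the argument in the text. The parameter values are illustrative assumptions, roughly appropriate to water, not values from the text; the gradients are specified through $\alpha\,d\tau/dz$ and $\beta\,dC/dz$ with $\mathbf{g}$ pointing downward, so $(\mathbf{g}\cdot\nabla)\tau = -g\,d\tau/dz$ and $(\mathbf{g}\cdot\nabla)C = -g\,dC/dz$.

```python
def cubic_coeffs(k, s, nu, chi, alpha_g_grad_tau, beta_g_grad_c):
    """Return (a2, a1, a0) for p**3 + a2*p**2 + a1*p + a0 = 0, with p = -i*omega.

    Substituting omega = i*p into Eq. (15.137) gives
        p (p + nu k^2)(p + chi k^2) - s*alpha*(g.grad)tau*p
            + s*beta*(g.grad)C*(p + chi k^2) = 0,
    where s = 1 - (k.g)^2 / (k^2 g^2); s = 1 for a horizontal wave vector.
    """
    k2 = k * k
    a2 = (nu + chi) * k2
    a1 = nu * chi * k2 * k2 + s * (beta_g_grad_c - alpha_g_grad_tau)
    a0 = s * beta_g_grad_c * chi * k2
    return a2, a1, a0


def positive_root(a2, a1, a0, rel_tol=1e-12):
    """Find a real root p > 0 by bisection; needs a0 < 0, so that f(0) < 0."""
    def f(p):
        return ((p + a2) * p + a1) * p + a0

    if a0 >= 0:
        raise ValueError("constant term is not negative: no guaranteed growing root")
    hi = 1.0
    while f(hi) <= 0:  # monic cubic: f -> +infinity as p grows, so this terminates
        hi *= 2.0
    lo = 0.0           # invariant: f(lo) <= 0 < f(hi)
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Illustrative (assumed) values, roughly those of water. g points downward,
# so (g.grad)tau = -g dtau/dz and (g.grad)C = -g dC/dz.
g = 9.8                      # m/s^2
nu, chi = 1.0e-6, 1.4e-7     # viscosity and thermal diffusivity [m^2/s]
alpha_dtau_dz = 2.0e-3       # alpha * dtau/dz [1/m]: warm water on top
beta_dc_dz = 1.0e-3          # beta * dC/dz [1/m]: salty water on top
# Net density gradient rho0*(beta_dc_dz - alpha_dtau_dz) < 0: stably stratified.
alpha_g_grad_tau = -g * alpha_dtau_dz
beta_g_grad_c = -g * beta_dc_dz          # < 0, i.e. condition (15.138) holds

k = 2000.0                               # horizontal wave number [1/m], s = 1
p = positive_root(*cubic_coeffs(k, 1.0, nu, chi, alpha_g_grad_tau, beta_g_grad_c))
p_narrow = -beta_g_grad_c / (nu * k * k)  # growth rate -i*omega from Eq. (15.139)
print(f"growth rate p = {p:.3e} 1/s; narrow-finger estimate = {p_narrow:.3e} 1/s")
```

For this narrow-finger case ($\nu\chi k^4$ much larger than the net stratification term) the bisection root and Eq. (15.139) agree to about one per cent, even though the fluid is stably stratified overall.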

Fig. 15.15: Salt fingers in a fluid in which warm, salty water lies on top of cold fresh water.

Thus, the growth of narrow fingers is driven by the concentration gradient and retarded by viscosity. For larger fingers, the temperature gradient will participate in the retardation, since the heat must diffuse in order to break the buoyant stability.
Now let us turn to the nonlinear development of this instability. Although we have just considered a single Fourier mode, the fingers that grow are roughly cylindrical rather than sheet-like. They lengthen at a rate that is slow enough for the heat to diffuse horizontally, though not so slow that the salt can diffuse. Let the diffusion coefficient for the salt be $\chi_C$, by analogy with $\chi$ for temperature. If the length of the fingers is $L$ and their width is $\delta_f$, then to facilitate heat diffusion and prevent salt diffusion, the time $L/v$ for fluid to traverse a finger must be long compared with the horizontal heat-diffusion time $\delta_f^2/\chi$ but short compared with the salt-diffusion time $\delta_f^2/\chi_C$; i.e., the vertical speed $v$ must satisfy

$$\frac{\chi_C L}{\delta_f^2} \ll v \ll \frac{\chi L}{\delta_f^2} . \tag{15.140}$$

Balancing the viscous acceleration $v\nu/\delta_f^2$ by the buoyancy acceleration $g\beta\,\delta C$, we obtain

$$v \sim \frac{g\beta\,\delta C\,\delta_f^2}{\nu} . \tag{15.141}$$

We can therefore rewrite Eq. (15.140) as

$$\left(\frac{\chi_C\,\nu L}{g\beta\,\delta C}\right)^{1/4} \ll \delta_f \ll \left(\frac{\chi\,\nu L}{g\beta\,\delta C}\right)^{1/4} . \tag{15.142}$$

Typically, $\chi_C \sim 0.01\chi$, so Eq. (15.142) implies that the widths of the fingers lie in a narrow range, as is verified in laboratory experiments. Salt fingering can also occur naturally, for example in an estuary where cold river water flows beneath sea water warmed by the sun. However, the development of salt fingers is quite slow, and in practice it only leads to mixing when the equilibrium velocity field is very small. This instability is one example of a quite general type of instability known as double diffusion, which can arise when two physical quantities can diffuse through a fluid at different rates. Other examples include the diffusion of two different solutes and the diffusion of vorticity and heat in a rotating flow.
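A quick numerical illustration of Eq. (15.142): the ratio of its upper to lower bound is $(\chi/\chi_C)^{1/4}$, independent of everything else, so with $\chi_C \sim 0.01\chi$ the allowed widths span only a factor of about 3. The water properties and the values of $L$ and $g\beta\,\delta C$ below are assumed, order-of-magnitude inputs for a laboratory tank, not numbers from the text.

```python
def finger_width_bounds(chi_c, chi, nu, length, g_beta_dc):
    """Lower and upper bounds on the finger width delta_f, Eq. (15.142)."""
    lower = (chi_c * nu * length / g_beta_dc) ** 0.25
    upper = (chi * nu * length / g_beta_dc) ** 0.25
    return lower, upper


nu = 1.0e-6          # kinematic viscosity of water [m^2/s] (assumed)
chi = 1.4e-7         # thermal diffusivity of water [m^2/s] (assumed)
chi_c = 0.01 * chi   # salt diffusivity, chi_C ~ 0.01 chi as in the text
length = 0.1         # finger length L [m] (assumed)
g_beta_dc = 9.8e-3   # g * beta * delta C [m/s^2], a 1e-3 density contrast (assumed)

lo, hi = finger_width_bounds(chi_c, chi, nu, length, g_beta_dc)
print(f"{lo * 1e3:.2f} mm << delta_f << {hi * 1e3:.2f} mm; ratio = {hi / lo:.2f}")
```

With these inputs the fingers come out millimetre-scale, and the printed ratio is $\sqrt{10} \approx 3.16$ whatever values are assumed for $L$ and $g\beta\,\delta C$.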


****************************
EXERCISES

Exercise 15.23 Problem: Laboratory experiment
Make an order-of-magnitude estimate of the size of the fingers and the time it takes for them to grow in a small transparent jar. You might like to try an experiment.

Exercise 15.24 Problem: Internal Waves
Consider a stably stratified fluid at rest and let there be a small (negative) vertical density gradient, $d\rho/dz$.
(a) By modifying the above analysis, ignoring the effects of viscosity, heat conduction and concentration gradients, show that small-amplitude linear waves, which propagate in a direction making an angle $\theta$ to the vertical, have an angular frequency given by $\omega = N|\sin\theta|$, where $N \equiv [(\mathbf{g}\cdot\nabla)\ln\rho]^{1/2}$ is known as the Brunt-Väisälä frequency. These waves are called internal waves.
(b) Show that the group velocity of these waves is orthogonal to the phase velocity and interpret this result physically.

****************************
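Part (b) of Exercise 15.24 can be checked numerically once the dispersion relation of part (a) is in hand. The sketch below takes $\omega = N|\sin\theta|$ as given, writes $\sin\theta = |k_x|/|\mathbf{k}|$ for a wave vector in the $x$-$z$ plane, differentiates by centred finite differences, and confirms that $\mathbf{v}_g\cdot\mathbf{k} \approx 0$. The value of $N$ and the wave vector are arbitrary assumptions; this is a check on the result, not the derivation the exercise asks for.

```python
import math

N = 0.01  # assumed Brunt-Vaisala frequency [rad/s]


def omega(kx, kz):
    """Internal-wave dispersion omega = N |sin theta|, theta = angle of k to vertical."""
    return N * abs(kx) / math.hypot(kx, kz)


def group_velocity(kx, kz, h=1e-6):
    """Centred finite-difference gradient of omega with respect to (kx, kz)."""
    dw_dkx = (omega(kx + h, kz) - omega(kx - h, kz)) / (2 * h)
    dw_dkz = (omega(kx, kz + h) - omega(kx, kz - h)) / (2 * h)
    return dw_dkx, dw_dkz


kx, kz = 0.3, 0.4  # an arbitrary wave vector [1/m]
vgx, vgz = group_velocity(kx, kz)
# The phase velocity is parallel to k, so orthogonality means v_g . k = 0.
dot = vgx * kx + vgz * kz
print(f"v_g = ({vgx:.3e}, {vgz:.3e}) m/s, v_g . k = {dot:.1e}")
```

Note also that $v_{g,z}$ and $k_z$ have opposite signs here: the phase moves upward while the energy moves downward.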

Bibliographic Note
For textbook treatments of waves in fluids, we recommend Lighthill (1978) and Whitham (1974), and from a more elementary and physical viewpoint, Tritton (1987). To develop physical insight into gravity waves on water and sound waves in a fluid, we suggest portions of the movie by Bryson (1964). For solitary-wave solutions to the Korteweg-de Vries equation, see materials, including brief movies, at the website of Takasaki (2006). For a brief, physically oriented introduction to Rayleigh-Bénard convection, see Chap. 4 of Tritton (1987). In their Chaps. 5 and 6, Landau and Lifshitz (1959) give a fairly succinct treatment of diffusive heat flow in fluids, the onset of convection in several different physical situations, and the concepts underlying double diffusion. In his Chaps. 2–6, Chandrasekhar (1961) gives a thorough and rich treatment of the influence of a wide variety of phenomena on the onset of convection, and on the types of fluid motions that can occur near the onset of convection. The book by Turner (1973) is a thorough treatise on the influence of buoyancy (thermally induced and otherwise) on fluid motions. It includes all topics treated in Sec. 15.6 and much more.

Box 15.4
Important Concepts in Chapter 15
• Gravity waves on water and other liquids, Sec. 15.2
  – Deep water waves and shallow water waves, Secs. 15.2.1, 15.2.2
  – Tsunamis, Ex. 15.6
  – Dispersion, Sec. 15.3.1
  – Steepening due to nonlinear effects, Sec. 15.3.1, Fig. 15.4
  – Solitons or solitary waves; nonlinear steepening balances dispersion, Sec. 15.3
  – Korteweg-de Vries equation, Secs. 15.3.1–15.3.4
• Surface tension and its stress tensor, Box 15.2
  – Capillary waves, Sec. 15.2.3
• Rossby Waves in a Rotating Fluid, Sec. 15.4
• Sound waves in fluids and gases, Sec. 15.5
  – Sound wave generation in slow-motion approximation: power proportional to squared time derivative of monopole moment, Sec. 15.5.2
  – Decibel, Sec. 15.5.2
  – Matched asymptotic expansions, Sec. 15.5.3
  – Radiation reaction force; runaway solution as a spurious solution that violates the slow-motion approximation used to derive it, Sec. 15.5.3
• Thermal effects and convection, Sec. 15.6
  – Coefficient of thermal conductivity, $\kappa$, and diffusive heat conduction, Sec. 15.6.1
  – Thermal diffusivity, $\chi = \kappa/\rho C_p$, and diffusion equation for temperature, Sec. 15.6.1
  – Thermal expansion coefficient, $\alpha = -(\partial\ln\rho/\partial T)_P$, Sec. 15.6.2
  – Prandtl number, Pr $= \nu/\chi \sim$ (vorticity diffusion)/(heat diffusion), Sec. 15.6.1
  – Péclet number, Pe $= VL/\chi \sim$ (advection)/(conduction), Sec. 15.6.1
  – Rayleigh number, Ra $= \alpha g\,\Delta T\,d^3/(\nu\chi) \sim$ (buoyancy)/(viscous force), Sec. 15.6.3
  – Boussinesq approximation for analyzing thermally induced buoyancy, Sec. 15.6.2
  – Free convection and forced convection, Sec. 17.1
  – Rayleigh-Bénard (free) convection, Sec. 15.6.3 and Fig. 15.10
  – Critical Rayleigh number for onset of Rayleigh-Bénard convection, Sec. 15.6.3
  – Schwarzschild criterion for convection in stars, Sec. 15.6.4
  – Double-diffusion instability, Sec. 15.6.5


Bibliography
Bryson, A. E. 1964. Waves in Fluids, a movie (National Committee for Fluid Mechanics Films); available at http://web.mit.edu/fluids/www/Shapiro/ncfmf.html .
Burke, W. W. 1970. “Runaway solutions: remarks on the asymptotic theory of radiation damping,” Phys. Rev. A 2, 1501–1505.
Chandrasekhar, S. 1961. Hydrodynamic and Hydromagnetic Stability, Oxford: Oxford University Press.
Chelton, D. B. and Schlax, M. G. 1996. “Global observations of oceanic Rossby waves,” Science, 272, 234.
Cole, J. 1974. Perturbation Methods in Applied Mathematics, Waltham Mass: Blaisdell.
Gill, A. E. 1982. Atmosphere Ocean Dynamics, New York: Academic Press.
Greenspan, H. P. 1973. The Theory of Rotating Fluids, Cambridge: Cambridge University Press.
Jackson, J. D. 1999. Classical Electrodynamics, third edition, New York: Wiley.
Landau, L. D. and Lifshitz, E. M. 1959. Fluid Mechanics, Oxford: Pergamon.
Libbrecht, K. G. and Woodard, M. F. 1991. Science, 253, 152.
Lighthill, M. J. 1978. Waves in Fluids, Cambridge: Cambridge University Press.
Pellew, A. and Southwell, R. V. 1940. Proceedings of the Royal Society, A176, 312.
Rohrlich, F. 1965. Classical Charged Particles, Reading Mass: Addison Wesley.
Scott-Russell, J. 1844. Proc. Roy. Soc. Edinburgh, 319.
Takasaki, K. 2006. Many Faces of Solitons, http://www.math.h.kyoto-u.ac.jp/ takasaki/soliton-lab/gallery/solitons/kdv-e.html .
Tritton, D. J. 1987. Physical Fluid Dynamics, Oxford: Oxford Science Publications.
Turcotte, D. L. and Schubert, G. 1982. Geodynamics, New York: Wiley.
Turner, J. S. 1973. Buoyancy Effects in Fluids, Cambridge: Cambridge University Press.
Whitham, G. B. 1974. Linear and Non-Linear Waves, New York: Wiley.

Contents
15 Waves and Convection
15.1 Overview
15.2 Gravity Waves on the Surface of a Fluid
15.2.1 Deep Water Waves
15.2.2 Shallow Water Waves
15.2.3 Capillary Waves
15.2.4 Helioseismology
15.3 Nonlinear Shallow Water Waves and Solitons
15.3.1 Korteweg-de Vries (KdV) Equation
15.3.2 Physical Effects in the KdV Equation
15.3.3 Single-Soliton Solution
15.3.4 Two-Soliton Solution
15.3.5 Solitons in Contemporary Physics
15.4 Rossby Waves in a Rotating Fluid
15.5 Sound Waves
15.5.1 Wave Energy
15.5.2 Sound Generation
15.5.3 T2 Radiation Reaction, Runaway Solutions, and Matched Asymptotic Expansions¹
15.6 T2 Convection
15.6.1 T2 Heat Conduction
15.6.2 T2 Boussinesq Approximation
15.6.3 T2 Rayleigh-Bénard Convection
15.6.4 T2 Convection in Stars
15.6.5 T2 Double Diffusion — Salt Fingers

¹ Our treatment is based on Burke (1970).

Chapter 15
Waves and Convection
Version 0815.4.K, 25 February 2009. Please send comments, suggestions, and errata via email to [email protected] or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 15.1
Reader's Guide
• This chapter relies heavily on Chaps. 12 and 13.
• Chap. 16 (compressible flows) relies to some extent on Secs. 15.2, 15.3 and 15.5 of this chapter.
• The remaining chapters of this book do not rely significantly on this chapter.

15.1

Overview

In the preceding chapters, we have derived the basic equations of fluid dynamics and developed a variety of techniques to describe stationary flows. We have also demonstrated how, even if there exists a rigorous, stationary solution of these equations for a time-steady flow, instabilities may develop and the amplitude of oscillatory disturbances can grow with time. These unstable modes can usually be thought of as waves that interact strongly with the flow and extract energy from it. Waves, though, are quite general and can be studied quite independently of their sources. Fluid dynamical waves come in a wide variety of forms. They can be driven by a combination of gravitational, pressure, rotational and surface-tension stresses and also by mechanical disturbances, such as water rushing past a boat or air passing through a larynx. In this chapter, we shall describe a few examples of wave modes in fluids, chosen to illustrate general wave properties.
The most familiar types of wave are probably gravity waves on a large body of water (Sec. 15.2), e.g., ocean waves and waves on the surfaces of lakes and rivers. We consider these in the linear approximation and find that they are dispersive in general, though they become nondispersive in the long-wavelength (shallow-water) limit. We shall illustrate gravity waves by their roles in helioseismology, the study of coherent-wave modes excited within the body of the sun by convective overturning motions. We shall also examine the effects of surface tension on gravity waves, and in this connection shall develop a mathematical description of surface tension (Box 15.2).
In contrast to the elastodynamic waves of Chap. 11, waves in fluids often develop amplitudes large enough that nonlinear effects become important (Sec. 15.3). The nonlinearities can cause the front of a wave to steepen and then break, a phenomenon we have all seen at the seashore. It turns out that, at least under some restrictive conditions, nonlinear waves have some very surprising properties. There exist soliton or solitary-wave modes in which the front-steepening due to nonlinearity is stably held in check by dispersion, and particular wave profiles are quite robust and can propagate for long intervals of time without breaking or dispersing. We shall demonstrate this by studying flow in a shallow channel. We shall also explore the remarkable behaviors of such solitons when they pass through each other.
In a nearly rigidly rotating fluid, there is a remarkable type of wave whose restoring force is the Coriolis effect and whose group and phase velocities are, unusually, oppositely directed. These so-called Rossby waves, studied in Sec. 15.4, are important in both the oceans and the atmosphere.
The simplest fluid waves of all are small-amplitude sound waves, a paradigm for scalar waves. These are nondispersive, just like electromagnetic waves in vacuum, and are therefore sometimes useful for human communication.
We shall study sound waves in Sec. 15.5 and shall use them to explore an issue in fundamental physics: the radiation reaction force that acts back on a wave-emitting object. We shall also explore how sound waves can be produced by fluid flows. This will be illustrated with the problem of sound generation by high-speed turbulent flows, a problem that provides a good starting point for the topic of the following chapter, compressible flows.
The last section of this chapter, Sec. 15.6, deals with dynamical motions of a fluid that are driven by thermal effects: convection. To understand convection, one must first understand diffusive heat conduction. When viewed microscopically, heat conduction is a transport process similar to viscosity, and it is responsible for analogous physical effects. If a viscous fluid has high viscosity, then vorticity diffuses through it rapidly; similarly, if a fluid has high thermal conductivity, then heat diffuses through it rapidly. In the other extreme, when viscosity is low (i.e., when the Reynolds number is high), instabilities produce turbulence, which transports vorticity far more rapidly than diffusion could possibly do. Analogously, in heated fluids with modest conductivity, the accumulation of heat drives the fluid into convective motion, and the heat is transported much more efficiently by this motion than by thermal conduction alone. As the convective heat transport increases, the fluid motion becomes more vigorous and, if the viscosity is sufficiently low, the thermally driven flow can also become turbulent. These effects are very much in evidence near solid boundaries, where thermal boundary layers can be formed, analogous to viscous boundary layers.
In addition to thermal effects that resemble the effects of viscosity, there are also unique thermal effects, particularly the novel and subtle combined effects of gravity and heat. Heat, unlike vorticity, causes a fluid to expand and thus, in the presence of gravity, to become buoyant; and this buoyancy can drive thermal circulation or free convection in an otherwise stationary fluid. (Free convection should be distinguished from forced convection, in which heat is carried passively by a flow driven in the usual manner by externally imposed pressure gradients, for example when you blow on hot food to cool it.) The transport of heat is a fundamental characteristic of many flows. It dictates the form of global weather patterns and ocean currents. It is also of great technological importance and is studied in detail, for example, in the cooling of nuclear reactors and the design of automobile engines. From a more fundamental perspective, as we have already discussed, the analysis and experimental studies of convection have led to major insights into the route to chaos (cf. Sec. 14.5).
In Sec. 15.6, we shall describe some flows where thermal effects are predominant. We shall begin in Sec. 15.6.1 by modifying the conservation laws of fluid dynamics so as to incorporate heat conduction. Then in Sec. 15.6.2 we shall discuss the Boussinesq approximation, which is appropriate for modest-scale flows where buoyancy is important. This allows us in Sec. 15.6.3 to derive the conditions under which convection is initiated. Unfortunately, the Boussinesq approximation sometimes breaks down. In particular, as we discuss in Sec. 15.6.4, it is inappropriate for convection in stars and planets, where circulation takes place over several gravitational scale heights. There we shall have to use alternative, more heuristic arguments to derive the relevant criterion for convective instability, known as the Schwarzschild criterion, and to quantify the associated heat flux. We shall apply this theory to the solar convection zone. Finally, in Sec. 15.6.5 we return to simple buoyancy-driven convection in a stratified fluid to consider double diffusion, a quite general type of instability which can arise when the diffusion of two physical quantities (in our case heat and the concentration of salt) renders a fluid unstable despite the fact that the fluid would be stably stratified if there were only a concentration gradient of one of these quantities.

15.2

Gravity Waves on the Surface of a Fluid

Gravity waves¹ are waves on and beneath the surface of a fluid, for which the restoring force is the downward pull of gravity. Familiar examples are ocean waves and the waves produced on the surface of a pond when a pebble is thrown in. Less familiar are "g-modes" of vibration of the sun, discussed at the end of this section.
Consider a small-amplitude wave propagating along the surface of a flat-bottomed lake with depth $h_o$, as shown in Fig. 15.1. As the water's displacement is small, we can describe the wave as a linear perturbation about equilibrium. The equilibrium water is at rest, i.e., it has velocity $\mathbf{v} = 0$. The water's perturbed motion is essentially inviscid and incompressible, so $\nabla\cdot\mathbf{v} = 0$. A simple application of the equation of vorticity transport, Eq. (13.3), assures us that since the water is static and thus irrotational before and after the wave passes, it must also be irrotational within the wave. Therefore, we can describe the wave inside the water by a velocity potential $\psi$ whose gradient is the velocity field,

$$\mathbf{v} = \nabla\psi . \tag{15.1}$$

Fig. 15.1: Gravity waves propagating horizontally across a lake of depth $h_o$.

¹ Not to be confused with gravitational waves, which are waves in the gravitational field that propagate at the speed of light and which we shall meet in Chap. 25.

Incompressibility, $\nabla\cdot\mathbf{v} = 0$, applied to this equation, implies that the velocity potential $\psi$ satisfies Laplace's equation

$$\nabla^2\psi = 0 . \tag{15.2}$$

We introduce horizontal coordinates $x, y$ and a vertical coordinate $z$ measured upward from the lake's equilibrium surface (cf. Fig. 15.1), and for simplicity we confine attention to a sinusoidal wave propagating in the $x$ direction with angular frequency $\omega$ and wave number $k$. Then $\psi$ and all other perturbed quantities will have the form $f(z)\exp[i(kx - \omega t)]$ for some function $f(z)$. More general disturbances can be expressed as a superposition of many of these elementary wave modes propagating in various horizontal directions (and in the limit, as a Fourier integral). All of the properties of such superpositions follow straightforwardly from those of our elementary plane-wave mode, so we shall continue to focus on it.
We must use Laplace's equation (15.2) to solve for the vertical variation, $f(z)$, of the velocity potential. As the horizontal variation at a particular time is $\propto \exp(ikx)$, direct substitution into Eq. (15.2) gives $d^2f/dz^2 = k^2 f$, i.e., two possible vertical variations, $\psi \propto \exp(\pm kz)$. The precise linear combination of these two forms is dictated by the boundary conditions. The one that we shall need is that the vertical component of velocity $v_z = \partial\psi/\partial z$ vanish at the bottom of the lake ($z = -h_o$). The only combination for which $v_z$ vanishes there is $v_z \propto \sinh[k(z + h_o)]$. Its integral, the velocity potential, therefore involves a cosh function:

$$\psi = \psi_0\cosh[k(z + h_o)]\exp[i(kx - \omega t)] . \tag{15.3}$$

An alert reader might note at this point that the horizontal velocity does not vanish at the lake bottom, whereas a no-slip condition should apply in practice. In fact, as we discussed in Sec. 13.4, a thin, viscous boundary layer along the bottom of the lake will join our potential-flow solution (15.3) to nonslipping fluid at the bottom. We shall ignore the boundary layer under the (justifiable) assumption that for our oscillating waves it is too thin to affect much of the flow.
Returning to the potential flow, we must also impose a boundary condition at the surface. This can be obtained from Bernoulli's law. The version of Bernoulli's law that we need is that for an irrotational, isentropic, time-varying flow:

$$\frac{v^2}{2} + h + \Phi + \frac{\partial\psi}{\partial t} = \text{constant everywhere in the flow} \tag{15.4}$$

[Eqs. (12.46), (12.50)]. We shall apply this law at the surface of the perturbed water. Let us examine each term: (i) The term $v^2/2$ is quadratic in a perturbed quantity and therefore can be dropped. (ii) The enthalpy $h = u + P/\rho$ (cf. Box 12.1) is a constant, since $u$ and $\rho$ are constants throughout the fluid and $P$ is constant on the surface and equal to the atmospheric pressure. [Actually, there will be a slight variation of the surface pressure caused by the varying weight of the air above the surface, but as the density of air is typically $\sim 10^{-3}$ that of water, this is a very small correction.] (iii) The gravitational potential at the fluid surface is $\Phi = g\xi$, where $\xi(x,t)$ is the surface's vertical displacement from equilibrium and we ignore an additive constant. (iv) The constant on the right-hand side, which could depend on time, $C(t)$, can be absorbed into the velocity potential term $\partial\psi/\partial t$ without changing the physical observable $\mathbf{v} = \nabla\psi$. Bernoulli's law applied at the surface therefore simplifies to give

$$g\xi + \frac{\partial\psi}{\partial t} = 0 . \tag{15.5}$$

Now, the vertical component of the surface velocity in the linear approximation is just $v_z(z = 0, t) = \partial\xi/\partial t$. Expressing $v_z$ in terms of the velocity potential, we then obtain

$$\frac{\partial\xi}{\partial t} = v_z = \frac{\partial\psi}{\partial z} . \tag{15.6}$$

Combining this with the time derivative of Eq. (15.5), we obtain an equation for the vertical gradient of $\psi$ in terms of its time derivative:

$$g\frac{\partial\psi}{\partial z} = -\frac{\partial^2\psi}{\partial t^2} . \tag{15.7}$$

Finally, substituting Eq. (15.3) into Eq. (15.7) and setting $z = 0$ [because we derived Eq. (15.7) only at the water's surface], where $\partial\psi/\partial z = k\tanh(kh_o)\,\psi$ and $\partial^2\psi/\partial t^2 = -\omega^2\psi$, we obtain the dispersion relation for linearized gravity waves:

$$\omega^2 = gk\tanh(kh_o) . \tag{15.8}$$

How do the individual elements of fluid move in a gravity wave? We can answer this question by first computing the vertical and horizontal components of the velocity by differentiating Eq. (15.3) [Ex. 15.1]. We find that the fluid elements undergo elliptical motion similar to that found for Rayleigh waves on the surface of a solid (Sec. 11.4). However, in gravity waves, the sense of rotation of the particles is always the same at a particular phase of the wave, in contrast to reversals found in Rayleigh waves.
We now consider two limiting cases: deep water and shallow water.
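Eq. (15.8) gives $\omega$ explicitly in terms of $k$, but in practice one often knows the wave period and the water depth and wants the wavelength. A minimal sketch, assuming $g = 9.8$ m s$^{-2}$ and a hypothetical 10 s swell over 20 m of water: since $gk\tanh(kh_o)$ increases monotonically with $k$, the root can be bracketed using the deep- and shallow-water forms and then found by bisection. (A Newton iteration would converge faster; bisection is used here because it cannot fail once the root is bracketed.)

```python
import math

G = 9.8  # gravitational acceleration [m/s^2]


def wavenumber(omega, depth, rel_tol=1e-12):
    """Solve Eq. (15.8), omega**2 = g k tanh(k h_o), for k by bisection.

    g k tanh(k h_o) increases monotonically with k, so the root is unique.
    Bracket: the root is no smaller than either the deep-water value
    omega**2/g or the shallow-water value omega/sqrt(g h_o), and no larger
    than omega**2 / (g tanh(k_lo h_o)).
    """
    k_lo = max(omega**2 / G, omega / math.sqrt(G * depth))
    k_hi = omega**2 / (G * math.tanh(k_lo * depth))
    while k_hi - k_lo > rel_tol * k_hi:
        k_mid = 0.5 * (k_lo + k_hi)
        if G * k_mid * math.tanh(k_mid * depth) > omega**2:
            k_hi = k_mid
        else:
            k_lo = k_mid
    return 0.5 * (k_lo + k_hi)


# Hypothetical example: a 10 s swell arriving over water 20 m deep.
period, depth = 10.0, 20.0
omega = 2 * math.pi / period
k = wavenumber(omega, depth)
print(f"k = {k:.4f} 1/m, wavelength = {2 * math.pi / k:.1f} m, "
      f"phase speed = {omega / k:.2f} m/s, k h_o = {k * depth:.2f}")
print(f"deep-water wavelength would be {2 * math.pi * G / omega**2:.1f} m")
```

For this case $kh_o \approx 1$, so neither limiting form is accurate: the wavelength is about 121 m, noticeably shorter than the deep-water value of about 156 m.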


15.2.1 Deep Water Waves

When the water is deep compared to the wavelength of the waves, $kh_o \gg 1$, the dispersion relation (15.8) is approximately

$$ \omega = \sqrt{gk} . \tag{15.9} $$

Thus, deep-water waves are dispersive; their group velocity $V_g \equiv d\omega/dk = \frac{1}{2}\sqrt{g/k}$ is half their phase velocity, $V_\phi \equiv \omega/k = \sqrt{g/k}$. [Note: We could have deduced the deep-water dispersion relation (15.9), up to a dimensionless multiplicative constant, by dimensional arguments: the only frequency that can be constructed from the relevant variables $g$, $k$, $\rho$ is $\sqrt{gk}$.]
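A minimal numerical check of the statement $V_g = V_\phi/2$, assuming $g = 9.81$ m s$^{-2}$ and a 10 m wavelength chosen only for illustration. The group velocity is estimated here by a central finite difference rather than by the analytic derivative, so the check does not presuppose the result.

```python
import math

G = 9.81  # m/s^2 (assumed)

def omega_deep(k, g=G):
    """Deep-water dispersion relation, Eq. (15.9): omega = sqrt(g k)."""
    return math.sqrt(g * k)

def group_velocity(omega, k, dk=1e-6):
    """Central finite-difference estimate of V_g = d omega / dk."""
    return (omega(k + dk) - omega(k - dk)) / (2 * dk)

k = 2 * math.pi / 10.0           # 10 m wavelength, illustrative
v_phase = omega_deep(k) / k      # V_phi = omega / k
v_group = group_velocity(omega_deep, k)
print(f"V_phi = {v_phase:.4f} m/s, V_g = {v_group:.4f} m/s, ratio = {v_group / v_phase:.6f}")
```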

15.2.2 Shallow Water Waves

For shallow water waves, with $kh_o \ll 1$, the dispersion relation (15.8) becomes

$$ \omega = \sqrt{gh_o}\,k . \tag{15.10} $$

Thus, these waves are nondispersive; their phase and group velocities are $V_\phi = V_g = \sqrt{gh_o}$. Below, when studying solitons, we shall need two special properties of shallow water waves. First, when the depth of the water is small compared with the wavelength, but not very small, the waves will be slightly dispersive. We can obtain a correction to Eq. (15.10) by expanding the tanh function of Eq. (15.8) as $\tanh x = x - x^3/3 + \dots$, so that $\omega^2 = gh_ok^2(1 - k^2h_o^2/3 + \dots)$. Taking the square root and keeping terms through order $k^2h_o^2$, the dispersion relation becomes

$$ \omega = \sqrt{gh_o}\left(1 - \frac{1}{6}k^2h_o^2\right)k . \tag{15.11} $$

Second, by computing $\mathbf v = \nabla\psi$ from Eq. (15.3), we find that in the shallow-water limit the horizontal motions are much larger than the vertical motions, and are essentially independent of depth. The reason, physically, is that the fluid acceleration is produced almost entirely by a horizontal pressure gradient (caused by spatially variable water depth) that is independent of height; see Ex. 15.1.
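The sketch below compares Eqs. (15.10) and (15.11) with the full relation (15.8) for a few values of $kh_o$. It assumes $h_o = 1$ m and $g = 9.81$ m s$^{-2}$ purely for illustration (only the product $kh_o$ matters for the relative errors). Carrying the expansion one order further shows that the residual error of (15.11) is of order $(kh_o)^4$, whereas that of (15.10) is roughly $(kh_o)^2/6$.

```python
import math

def omega_exact(k, h0, g=9.81):
    """Full dispersion relation, Eq. (15.8)."""
    return math.sqrt(g * k * math.tanh(k * h0))

def omega_shallow(k, h0, g=9.81, corrected=True):
    """Shallow-water form: Eq. (15.10), or Eq. (15.11) if corrected=True."""
    w = math.sqrt(g * h0) * k
    return w * (1 - (k * h0) ** 2 / 6) if corrected else w

def relative_errors(kh, h0=1.0):
    """Relative errors of Eqs. (15.10) and (15.11) with respect to Eq. (15.8)."""
    k = kh / h0
    exact = omega_exact(k, h0)
    err_10 = abs(omega_shallow(k, h0, corrected=False) - exact) / exact
    err_11 = abs(omega_shallow(k, h0) - exact) / exact
    return err_10, err_11

for kh in (0.1, 0.3, 0.6):
    e10, e11 = relative_errors(kh)
    print(f"k*h0 = {kh}: error of (15.10) = {e10:.1e}, error of (15.11) = {e11:.1e}")
```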

15.2.3 Capillary Waves

When the wavelength is very short (so $k$ is very large), we must include the effects of surface tension on the surface boundary condition. This can be done by a very simple, heuristic argument. Surface tension is usually treated as an isotropic force per unit length, $\gamma$, that lies in the surface and is unaffected by changes in the shape or size of the surface; see Box 15.2. In the case of a gravity wave, this tension produces on the fluid's surface a net downward force per unit area $-\gamma\,d^2\xi/dx^2 = \gamma k^2\xi$, where $k$ is the horizontal wave number. [This downward force is like that on a curved violin string; cf. Eq. (11.27) and associated discussion.] This additional restoring force per unit area adds to the hydrostatic restoring force per unit area, $\rho g\xi$; after division by $\rho$, it therefore enters Eq. (15.5) as an augmentation of the term $g\xi$. Correspondingly, the effect of surface tension on a mode with wave number $k$ is simply to change the true gravity to an effective gravity

$$ g \to g + \frac{\gamma k^2}{\rho} . \tag{15.12} $$

Box 15.2 Surface Tension

In a water molecule, the two hydrogen atoms stick out from the larger oxygen atom somewhat like Mickey Mouse's ears, with an H-O-H angle of 105 degrees. This asymmetry of the molecule gives rise to a rather large electric dipole moment. In the interior of a body of water, the dipole moments are oriented rather randomly, but near the water's surface they tend to be parallel to the surface and bond with each other so as to create surface tension: a macroscopically isotropic, two-dimensional tension force (force per unit length) $\gamma$ that is confined to the water's surface.

[Figure: (a) a line L of unit length lying in the water's surface, with the surface-tension force $\gamma$ acting orthogonally across it; (b) a point P in the surface, with local Cartesian axes $x$ and $y$ in the surface and $z$ orthogonal to it.]

More specifically, consider a line L in the water's surface, with unit length [drawing (a) above]. The surface water on one side of L exerts a tension (pulling) force on the surface water on the other side. The magnitude of this force is $\gamma$ and it is orthogonal to the line L regardless of L's orientation. This is analogous to an isotropic pressure $P$ in three dimensions, which acts orthogonally across any unit area. Choose a point P in the water's surface and introduce local Cartesian coordinates there with $x$ and $y$ lying in the surface and $z$ orthogonal to it [drawing (b) above]. In this coordinate system, the 2-dimensional stress tensor associated with surface tension has components ${}^{(2)}T_{xx} = {}^{(2)}T_{yy} = -\gamma$, analogous to the 3-dimensional stress tensor for an isotropic pressure, $T_{xx} = T_{yy} = T_{zz} = P$. We can also use a 3-dimensional stress tensor to describe the surface tension: $T_{xx} = T_{yy} = -\gamma\,\delta(z)$; all other $T_{jk} = 0$. If we integrate this 3-dimensional stress tensor through the water's surface, we obtain the 2-dimensional stress tensor: $\int T_{jk}\,dz = {}^{(2)}T_{jk}$; i.e., $\int T_{xx}\,dz = \int T_{yy}\,dz = -\gamma$. The 2-dimensional metric of the surface is ${}^{(2)}\mathbf g = \mathbf g - \mathbf e_z\otimes\mathbf e_z$; in terms of this 2-dimensional metric, the surface tension's 3-dimensional stress tensor is $\mathbf T = -\gamma\,\delta(z)\,{}^{(2)}\mathbf g$. Water is not the only fluid that exhibits surface tension; all fluids do so, at the interfaces between themselves and other substances. For a thin film, e.g. a soap bubble, there are two interfaces (the top face and the bottom face of the film), so the stress tensor is twice as large as for a single surface, $\mathbf T = -2\gamma\,\delta(z)\,{}^{(2)}\mathbf g$. The hotter the fluid, the more randomly oriented are its surface molecules and hence the smaller the fluid's surface tension $\gamma$. For water, $\gamma$ varies from 75.6 dyne/cm at $T = 0\,^\circ$C, to 72.0 dyne/cm at $T = 25\,^\circ$C, to 58.9 dyne/cm at $T = 100\,^\circ$C. In Exs. 15.3 and 15.4 we explore some applications of surface tension. In Sec. 15.2.3 and Ex. 15.5 we explore the influence of surface tension on water waves.

The remainder of the derivation of the dispersion relation for deep gravity waves carries over unchanged, and the dispersion relation becomes

$$ \omega^2 = gk + \frac{\gamma k^3}{\rho} \tag{15.13} $$

[cf. Eqs. (15.9) and (15.12)]. When the second term dominates, the waves are sometimes called capillary waves.
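One consequence of Eq. (15.13) is easy to check numerically: the gravity and surface-tension terms are equal at $k_c = (\rho g/\gamma)^{1/2}$, and the phase velocity $V_\phi = \omega/k = (g/k + \gamma k/\rho)^{1/2}$ is smallest there. The sketch assumes $\gamma = 0.072$ N m$^{-1}$ (the 72.0 dyne/cm quoted in Box 15.2 for 25 C), $\rho = 1000$ kg m$^{-3}$ and $g = 9.81$ m s$^{-2}$; with these values the crossover wavelength is about 1.7 cm and the minimum phase speed about 23 cm s$^{-1}$. (Ex. 15.5 asks instead for the minimum of the group velocity, which this sketch does not compute.)

```python
import math

# Assumed values for clean water near room temperature (SI units).
g, rho, gamma = 9.81, 1000.0, 0.072   # m/s^2, kg/m^3, N/m

# The gravity and surface-tension terms of Eq. (15.13) are equal at k_c^2 = rho g / gamma.
k_c = math.sqrt(rho * g / gamma)
lambda_c = 2 * math.pi / k_c

def phase_velocity(k):
    """V_phi = omega / k from Eq. (15.13): sqrt(g/k + gamma k / rho)."""
    return math.sqrt(g / k + gamma * k / rho)

print(f"crossover wavelength: {100 * lambda_c:.2f} cm")
print(f"minimum phase velocity: {100 * phase_velocity(k_c):.1f} cm/s")
```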

15.2.4 Helioseismology

The sun provides an excellent example of the excitation of small-amplitude waves in a fluid body. In the 1960s, Robert Leighton and colleagues discovered that the surface of the sun oscillates vertically with a period of roughly five minutes and a speed of $\sim 1$ km s$^{-1}$. This was thought to be an incoherent surface phenomenon until it was shown that the observed variation was, in fact, the superposition of thousands of highly coherent wave modes excited within the sun's interior: normal modes of the sun. Present-day techniques allow surface velocity amplitudes as small as 2 mm s$^{-1}$ to be measured, and phase coherence for intervals as long as a year has been observed. Studying the frequency spectrum and its variation provides a unique probe of the sun's interior structure, just as the measurement of conventional seismic waves, as described in Sec. 11.4, probes the earth's interior.

The description of the normal modes of the sun requires some modification of our treatment of gravity waves. We shall eschew details and just outline the principles. First, the sun is (very nearly) spherical. We therefore work in spherical polar coordinates rather than Cartesian coordinates. Second, the sun is made of hot gas and it is no longer a good approximation to assume that the fluid is always incompressible. We must therefore replace the equation $\nabla\cdot\mathbf v = 0$ with the full equation of continuity (mass conservation) together with the equation of energy conservation, which governs the relationship between the density and pressure perturbations. Third, the sun is not uniform. The pressure and density in the unperturbed gas vary with radius in a known manner and must be included. Fourth, the sun has a finite surface area. Instead of assuming that there will be a continuous spectrum of waves, we must now anticipate that the boundary conditions will lead to a discrete spectrum of normal modes. Allowing for these complications, it is possible to derive a differential equation to replace Eq. (15.7). It turns out that a convenient dependent variable (replacing the velocity potential $\psi$) is the pressure perturbation. The boundary conditions are that the displacement vanish at the center of the sun and that the pressure perturbation vanish at the surface. At this point the problem is reminiscent of the famous solution for the eigenfunctions of the Schrödinger equation for a hydrogen atom in terms of associated Laguerre polynomials. The wave frequencies of the sun's normal modes are given by the eigenvalues of the differential equation. The corresponding eigenfunctions can be classified using three quantum numbers, $n$, $l$, $m$, where $n$ counts the number of radial nodes in the eigenfunction and the angular variation is proportional to the spherical harmonic $Y_{lm}(\theta,\phi)$. If the sun were precisely spherical, the modes that are distinguished only by their $m$ quantum number would be degenerate, just as is the case with an atom when there is no preferred direction in space. However, the sun rotates with a latitude-dependent period in the range of roughly 25 to 30 days, and this breaks the degeneracy, just as an applied magnetic field in an atom breaks the degeneracy of the atom's states (the Zeeman effect). From the splitting of the solar-mode spectrum, it is possible to learn about the distribution of rotational angular momentum inside the sun.

When this problem is solved in detail, it turns out that there are two general classes of modes. One class is similar to gravity waves, in the sense that the forces which drive the gas's motions are produced primarily by gravity (either directly, or indirectly via the weight of overlying material producing pressure that pushes on the gas). These are called g modes. In the second class (known as p and f modes), the pressure forces arise mainly from the compression of the fluid, just as in sound waves (which we shall study in Sec. 15.5 below). Now, it turns out that the g modes have large amplitudes in the middle of the sun, whereas the p and f modes are dominant in the outer layers [cf. Fig. 15.2(b)]. The reasons for this are relatively easy to understand and introduce ideas to which we shall return: The sun is a hot body, much hotter at its center ($T \sim 1.5\times10^7$ K) than on its surface ($T \sim 6000$ K). The sound speed $c$ is therefore much greater in its interior, and so p and f modes of a given frequency $\omega$ can carry their energy flux $\sim \rho\xi^2\omega^2c$ (Sec. 15.5) with much smaller amplitudes $\xi$ than near the surface. That is why the p- and f-mode amplitudes are much smaller in the center of the sun than near the surface.

[Figure 15.2: panel (a) plots frequency (mHz, 0 to 4) against spherical harmonic degree (0 to 140); panel (b) plots sample eigenfunctions $g_{10}$ ($l = 2$), $p_{17}$ ($l = 20$), $g_{18}$ ($l = 4$) and $p_{10}$ ($l = 60$) against $r/R$ (0 to 1).]

Fig. 15.2: (a) Measured frequency spectrum for solar p-modes with different values of the quantum numbers n, l. The error bars are magnified by a factor 1000. Frequencies for modes with n > 30 and l > 1000 have been measured. (b) Sample eigenfunctions for g and p modes labeled by n (subscripts) and l (parentheses). The ordinate is the radial velocity and the abscissa is fractional radial distance from the sun's center to its surface. The solar convection zone is the dashed region at the bottom. (Adapted from Libbrecht and Woodard 1991.)
The g modes are controlled by different physics and thus behave differently: The outer $\sim 30$ percent (by radius) of the sun is convective (cf. Sec. 15.6.4) because the diffusion of photons is inadequate to carry the huge amount of nuclear energy being generated in the solar core. The convection produces an equilibrium variation of pressure and density with radius that are just such as to keep the sun almost neutrally stable, so that regions that are slightly hotter (cooler) than their surroundings will rise (sink) in the solar gravitational field. Therefore there cannot be much of a mechanical restoring force which would cause these regions to oscillate about their average positions, and so the g modes (which are influenced almost solely by gravity) have little restoring force and thus are evanescent in the convection zone, and so their amplitudes decay quickly with increasing radius there. We should therefore expect only p and f modes to be seen in the surface motions, and this is indeed the case. Furthermore, we should not expect the properties of these modes to be very sensitive to the physical conditions in the core. A more detailed analysis bears this out.

****************************
EXERCISES

Exercise 15.1 Problem: Fluid Motions in Gravity Waves
(a) Show that in a gravity wave in water of arbitrary depth, each fluid element undergoes elliptical motion. (Assume that the amplitude of the water's displacement is small compared to a wavelength.)
(b) Calculate the longitudinal diameter of the motion's ellipse, and the ratio of vertical to longitudinal diameters, as functions of depth.
(c) Show that for a deep-water wave, $kh_o \gg 1$, the ellipses are all circles with diameters that die out exponentially with depth.
(d) We normally think of a circular motion of fluid as entailing vorticity, but a gravity wave in water has vanishing vorticity. How can this vanishing vorticity be compatible with the circular motion of fluid elements?
(e) Show that for a shallow-water wave, $kh_o \ll 1$, the motion is (nearly) horizontal and independent of height $z$.
(f) Compute the pressure perturbation $P(x,z)$ inside the fluid for arbitrary depth. Show that, for a shallow-water wave, the pressure is determined by the need to balance the weight of the overlying fluid, but for general depth, vertical fluid accelerations alter this condition of weight balance.

Exercise 15.2 Problem: Maximum size of a water droplet
What is the maximum size of water droplets that can form by water very slowly dripping out of a pipette? And out of a water faucet?

Exercise 15.3 Problem: Force balance for an interface between two fluids
Consider a point P in the curved interface between two fluids. Introduce Cartesian coordinates at P with $x$ and $y$ parallel to the interface and $z$ orthogonal [as in diagram (b) of Box 15.2], and orient the $x$ and $y$ axes along the directions of the interface's "principal curvatures", so the local equation for the interface is

$$ z = \frac{x^2}{2R_1} + \frac{y^2}{2R_2} . \tag{15.14} $$

Here $R_1$ and $R_2$ are the surface's "principal radii of curvature" at P; note that each of them can be positive or negative, depending on whether the surface bends up or down along their directions. Show that stress balance $\nabla\cdot\mathbf T = 0$ for the surface implies that the pressure difference across the surface is

$$ \Delta P = \gamma\left(\frac{1}{R_1} + \frac{1}{R_2}\right) , \tag{15.15} $$

where $\gamma$ is the surface tension.

Exercise 15.4 Challenge: Minimum Area of Soap Film
For a soap film that is attached to a bent wire (e.g. to the circular wire that a child uses to blow a bubble), the air pressure on the film's two sides is the same. Therefore, Eq. (15.15) (with $\gamma$ replaced by $2\gamma$ since the film has two faces) tells us that at every point of the film, its two principal radii of curvature must be equal and opposite, $R_1 = -R_2$. It is an interesting exercise in differential geometry to show that this means that the soap film's surface area is an extremum with respect to variations of the film's shape, holding its boundary on the wire fixed. If you know enough differential geometry, prove this extremal-area property of soap films, and then show that, in order for the film's shape to be stable, its extremal area must actually be a minimum.

Exercise 15.5 Problem: Capillary Waves
Consider deep-water gravity waves of short enough wavelength that surface tension must be included, so the dispersion relation is Eq. (15.13). Show that there is a minimum value of the group velocity and find its value together with the wavelength of the associated wave. Evaluate these for water ($\gamma \sim 0.07$ N m$^{-1}$). Try performing a crude experiment to verify this phenomenon.

Exercise 15.6 Example: Boat Waves
A toy boat moves with uniform velocity $u$ across a deep pond (Fig. 15.3). Consider the wave pattern (time-independent in the boat's frame) produced on the water's surface at distances large compared to the boat's size. Both gravity waves and surface-tension or capillary waves are excited.
Show that capillary waves are found both ahead of and behind the boat, and gravity waves solely inside a trailing wedge. More specifically:
(a) In the rest frame of the water, the waves' dispersion relation is Eq. (15.13). Change notation so $\omega$ is the waves' angular frequency as seen in the boat's frame and $\omega_o$ that in the water's frame, so the dispersion relation is $\omega_o^2 = gk + (\gamma/\rho)k^3$. Use the Doppler shift (i.e. the transformation between frames) to derive the boat-frame dispersion relation $\omega(k)$.

[Figure 15.3: the boat moves with velocity $u$ along a dash-dot trajectory; a wave emitted when the boat was at P travels with group velocity $V_{go}$ to the point Q; the gravity waves are confined to a trailing wedge of half-angle $\theta_{gw}$.]

Fig. 15.3: Capillary and gravity waves excited by a small boat (Ex. 15.6).

(b) The boat radiates a spectrum of waves in all directions. However, only those with vanishing frequency in the boat's frame, $\omega = 0$, contribute to the time-independent ("stationary") pattern. As seen in the water's frame and analyzed in the geometric optics approximation of Chap. 6, these waves are generated by the boat (at points along its dash-dot trajectory in Fig. 15.3) and travel outward with the group velocity $V_{go}$. Regard Fig. 15.3 as a snapshot of the boat and water at a particular moment of time. Consider a wave that was generated at an earlier time, when the boat was at location P, and that traveled outward from there with speed $V_{go}$ at an angle $\phi$ to the boat's direction of motion. (You may restrict yourself to $0 \le \phi \le \pi/2$.) Identify the point Q that this wave has reached, at the time of the snapshot, by the angle $\theta$ shown in the figure. Show that $\theta$ is given by

$$ \tan\theta = \frac{V_{go}(k)\sin\phi}{u - V_{go}(k)\cos\phi} , \tag{15.16a} $$

where $k$ is determined by the dispersion relation $\omega_o(k)$ together with the "vanishing $\omega$" condition

$$ \omega_o(k,\phi) = uk\cos\phi . \tag{15.16b} $$

(c) Specialize to capillary waves [$k \gg \sqrt{g\rho/\gamma}$]. Show that

$$ \tan\theta = \frac{3\tan\phi}{2\tan^2\phi - 1} . \tag{15.17} $$

Demonstrate that the capillary-wave pattern is present for all values of $\theta$ (including in front of the boat, $\pi/2 < \theta < \pi$, and behind it, $0 \le \theta \le \pi/2$).

(d) Next, specialize to gravity waves and show that

$$ \tan\theta = \frac{\tan\phi}{2\tan^2\phi + 1} . \tag{15.18} $$

Demonstrate that the gravity-wave pattern is confined to a trailing wedge with angles $\theta < \theta_{gw} = \sin^{-1}(1/3) = 19.47^\circ$; cf. Fig. 15.3. You might try to reproduce these results experimentally.

Exercise 15.7 Example: Shallow-Water Waves with Variable Depth; Tsunamis²
Consider shallow-water waves in which the height of the bottom boundary varies, so the unperturbed water's depth is variable, $h_o = h_o(x,y)$.
(a) Show that the wave equation for the perturbation $\xi(x,y,t)$ of the water's height takes the form

$$ \frac{\partial^2\xi}{\partial t^2} - \nabla\cdot(gh_o\nabla\xi) = 0 . \tag{15.19} $$

Note that $gh_o$ is the square of the wave's propagation speed $c^2$ (phase speed and group speed), so this equation takes the form that we studied in the geometric optics approximation in Sec. 6.3.1.
(b) Describe what happens to the direction of propagation of a wave as the depth $h_o$ of the water varies (either as a set of discrete jumps in $h_o$ or as a slowly varying $h_o$). As a specific example, how must the propagation direction change as waves approach a beach (but when they are sufficiently far out from the beach that nonlinearities have not yet caused them to begin to break)? Compare with your own observations at a beach.
(c) Tsunamis are waves with enormous wavelengths, $\sim 100$ km or so, that propagate on the deep ocean. Since the ocean depth is typically $\sim 4$ km, tsunamis are governed by the shallow-water wave equation (15.19). What would you have to do to the ocean floor to create a lens that would focus a tsunami, generated by an earthquake near Japan, so that it destroys Los Angeles? For simulations of tsunami propagation, see, e.g., http://bullard.esc.cam.ac.uk/~taylor/Tsunami.html .
(d) The height of a tsunami, when it is in the ocean with depth $h_o \sim 4$ km, is only $\sim 1$ meter or less. Use Eq. (15.19) to show that the tsunami height will increase by a large factor as the tsunami nears the shore.
****************************

² Exercise courtesy David Stevenson.
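The wedge angle quoted in Ex. 15.6(d) can be checked by brute force, taking Eq. (15.18) as given: scan the emission angle $\phi$ over $(0, \pi/2)$ and record the largest pattern angle $\theta$. The same scan applied to Eq. (15.17), with the quadrant of $\theta$ fixed by the signs of the numerator and denominator of Eq. (15.16a), illustrates the capillary-wave claim of part (c). This is a numerical sanity check only (the grid size is an arbitrary choice); the exercise itself asks for the analytic argument.

```python
import math

def theta_gravity(phi):
    """Eq. (15.18); the denominator 2 tan^2(phi) + 1 is positive, so atan gives the right branch."""
    t = math.tan(phi)
    return math.atan(t / (2 * t * t + 1))

def theta_capillary(phi):
    """Eq. (15.17). atan2 picks the quadrant: for capillary waves the sign of
    2 tan^2(phi) - 1 matches that of u - V_go cos(phi) in Eq. (15.16a)."""
    t = math.tan(phi)
    return math.atan2(3 * t, 2 * t * t - 1)

n = 200_000
phis = [0.5 * math.pi * i / n for i in range(1, n)]   # open interval 0 < phi < pi/2

theta_max = max(theta_gravity(p) for p in phis)
print(f"gravity waves: max theta = {math.degrees(theta_max):.4f} deg, "
      f"arcsin(1/3) = {math.degrees(math.asin(1 / 3)):.4f} deg")

cap = [theta_capillary(p) for p in phis]
print(f"capillary waves: theta spans {math.degrees(min(cap)):.3f} "
      f"to {math.degrees(max(cap)):.3f} deg")
```

The gravity-wave maximum occurs at $\tan\phi = 1/\sqrt2$, while the capillary angles fill essentially the whole range $0 < \theta < \pi$.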

15.3 Nonlinear Shallow Water Waves and Solitons

In recent decades, solitons or solitary waves have been studied intensively in many different areas of physics. However, fluid dynamicists became familiar with them in the nineteenth century. In an oft-quoted passage, John Scott-Russell (1844) described how he was riding along a narrow canal and watched a boat stop abruptly. This deceleration launched a single smooth pulse of water which he followed on horseback for one or two miles, observing it "rolling on at a rate of some eight or nine miles an hour, preserving its original figure some thirty feet long and a foot to a foot and a half in height". This was a soliton: a one-dimensional, nonlinear wave with fixed profile traveling with constant speed. Solitons can be observed fairly readily when gravity waves are produced in shallow, narrow channels. We shall use the particular example of a shallow, nonlinear gravity wave to illustrate solitons in general.

15.3.1 Korteweg-de Vries (KdV) Equation

The key to a soliton's behavior is a robust balance between the effects of dispersion and the effects of nonlinearity. When one grafts these two effects onto the wave equation for shallow water waves, then at leading order in the strengths of the dispersion and nonlinearity one gets the Korteweg-de Vries (KdV) equation for solitons. Since a completely rigorous derivation of the KdV equation is quite lengthy, we shall content ourselves with a somewhat heuristic derivation that is based on this grafting process, and is designed to emphasize the equation's physical content.

We choose as the dependent variable in our wave equation the height $\xi$ of the water's surface above its quiescent position, and we confine ourselves to a plane wave that propagates in the horizontal $x$ direction, so $\xi = \xi(x,t)$. In the limit of very weak waves, $\xi(x,t)$ is governed by the shallow-water dispersion relation $\omega = \sqrt{gh_o}\,k$, where $h_o$ is the depth of the quiescent water. This dispersion relation implies that $\xi(x,t)$ must satisfy the following elementary wave equation:

$$ 0 = \frac{\partial^2\xi}{\partial t^2} - gh_o\frac{\partial^2\xi}{\partial x^2} = \left(\frac{\partial}{\partial t} - \sqrt{gh_o}\frac{\partial}{\partial x}\right)\left(\frac{\partial}{\partial t} + \sqrt{gh_o}\frac{\partial}{\partial x}\right)\xi . \tag{15.20} $$

In the second expression, we have factored the wave operator into two pieces, one that governs waves propagating rightward, and the other leftward. To simplify our derivation and the final wave equation, we shall confine ourselves to rightward propagating waves, and correspondingly we can simply remove the left-propagation operator from the wave equation, obtaining

$$ \frac{\partial\xi}{\partial t} + \sqrt{gh_o}\frac{\partial\xi}{\partial x} = 0 . \tag{15.21} $$

(Leftward propagating waves are described by this same equation with a change of sign.)

We now graft the effects of dispersion onto this rightward wave equation. The dispersion relation, including the effects of dispersion at leading order, is $\omega = \sqrt{gh_o}\,k\left(1 - \frac{1}{6}k^2h_o^2\right)$ [Eq. (15.11)]. Now, this dispersion relation ought to be derivable by assuming a variation $\xi \propto \exp[i(kx - \omega t)]$ and substituting into a generalization of Eq. (15.21) with corrections that take account of the finite depth of the channel. We will take a shortcut and reverse this process to obtain the generalization of Eq. (15.21) from the dispersion relation. The result is

$$ \frac{\partial\xi}{\partial t} + \sqrt{gh_o}\frac{\partial\xi}{\partial x} = -\frac{1}{6}\sqrt{gh_o}\,h_o^2\frac{\partial^3\xi}{\partial x^3} , \tag{15.22} $$

as a direct calculation confirms [with $\partial/\partial t \to -i\omega$, $\partial/\partial x \to ik$ and $\partial^3/\partial x^3 \to -ik^3$, Eq. (15.22) reproduces Eq. (15.11)]. This is the "linearized KdV equation". It incorporates weak dispersion associated with the finite depth of the channel but is still a linear equation, only useful for small-amplitude waves.

Now let us set aside the dispersive correction and tackle nonlinearity. For this purpose we return to first principles for waves in very shallow water. Let the height of the surface above the lake bottom be $h = h_o + \xi$. Since the water is very shallow, the horizontal velocity, $v \equiv v_x$, is almost independent of depth (aside from the boundary layer, which we ignore); cf. the discussion following Eq. (15.11). The flux of water mass, per unit width of channel, is therefore $\rho hv$ and the mass per unit width is $\rho h$. The law of mass conservation therefore takes the form

$$ \frac{\partial h}{\partial t} + \frac{\partial(hv)}{\partial x} = 0 , \tag{15.23a} $$

where we have canceled the constant density. This equation contains a nonlinearity in the product $hv$. A second nonlinear equation for $h$ and $v$ can be obtained from the $x$ component of the inviscid Navier-Stokes equation $\partial v/\partial t + v\,\partial v/\partial x = -(1/\rho)\,\partial p/\partial x$, with $p$ determined by the weight of the overlying water, $p = g\rho[h(x) - z]$:

$$ \frac{\partial v}{\partial t} + v\frac{\partial v}{\partial x} + g\frac{\partial h}{\partial x} = 0 . \tag{15.23b} $$

Equations (15.23a) and (15.23b) can be combined to obtain

$$ \frac{\partial\left(v - 2\sqrt{gh}\right)}{\partial t} + \left(v - \sqrt{gh}\right)\frac{\partial\left(v - 2\sqrt{gh}\right)}{\partial x} = 0 . \tag{15.23c} $$

This equation shows that the quantity $v - 2\sqrt{gh}$ is constant along characteristics that propagate with speed $v - \sqrt{gh}$. (This constant quantity is a special case of a "Riemann invariant", a concept that we shall study in Chap. 16.)
When, as we shall require below, the nonlinearities are modest so $h$ does not differ greatly from $h_o$, these characteristics propagate leftward, which implies that for rightward propagating waves they begin at early times in undisturbed fluid where $v = 0$ and $h = h_o$. Therefore, the constant value of $v - 2\sqrt{gh}$ is $-2\sqrt{gh_o}$, and correspondingly in regions of disturbed fluid

$$ v = 2\left(\sqrt{gh} - \sqrt{gh_o}\right) . \tag{15.24} $$

Substituting this into Eq. (15.23a), we obtain

$$ \frac{\partial h}{\partial t} + \left(3\sqrt{gh} - 2\sqrt{gh_o}\right)\frac{\partial h}{\partial x} = 0 . \tag{15.25} $$

We next substitute $\xi = h - h_o$ and expand to second order in $\xi$ to obtain the final form of our wave equation with nonlinearities but no dispersion:

$$ \frac{\partial\xi}{\partial t} + \sqrt{gh_o}\frac{\partial\xi}{\partial x} = -\frac{3\xi}{2}\sqrt{\frac{g}{h_o}}\frac{\partial\xi}{\partial x} , \tag{15.26} $$

where the term on the right-hand side is the nonlinear correction.

We now have separate dispersive corrections (15.22) and nonlinear corrections (15.26) to the rightward wave equation (15.21). Combining the two corrections into a single equation, we obtain

$$ \frac{\partial\xi}{\partial t} + \sqrt{gh_o}\left[\left(1 + \frac{3\xi}{2h_o}\right)\frac{\partial\xi}{\partial x} + \frac{h_o^2}{6}\frac{\partial^3\xi}{\partial x^3}\right] = 0 . \tag{15.27} $$

Finally, we substitute

$$ \chi \equiv x - \sqrt{gh_o}\,t \tag{15.28} $$

to transform into a frame moving rightward with the speed of small-amplitude gravity waves. The result is the full Korteweg-de Vries or KdV equation:

$$ \frac{\partial\xi}{\partial t} + \frac{3}{2}\sqrt{\frac{g}{h_o}}\left(\xi\frac{\partial\xi}{\partial\chi} + \frac{1}{9}h_o^3\frac{\partial^3\xi}{\partial\chi^3}\right) = 0 . \tag{15.29} $$

[Figure 15.4: three panels plotting $\xi$ (0 to 1) against $\chi$ (from -2 to 2) at successive times.]

Fig. 15.4: Steepening of a Gaussian wave profile by the nonlinear term in the KdV equation. The increase of wave speed with amplitude causes the leading part of the profile to steepen with time and the trailing part to flatten. In the full KdV equation, this effect can be balanced by the effect of dispersion, which causes the high-frequency Fourier components in the wave to travel slightly slower than the low-frequency components. This allows stable solitons to form.

15.3.2 Physical Effects in the KdV Equation

Before exploring solutions to the KdV equation (15.29), let us consider the physical effects of its nonlinear and dispersive terms. The second, nonlinear term derives from the nonlinearities in the mass flux $hv$ of Eq. (15.23a) and in the $(\mathbf v\cdot\nabla)\mathbf v$ term of the Navier-Stokes equation. The effect of this nonlinearity is to steepen the leading edge of a wave profile and flatten the trailing edge (Fig. 15.4). Another way to understand the effect of this term is to regard it as a nonlinear coupling of linear waves. Since it is nonlinear in the wave amplitude, it can couple together waves with different wave numbers $k$. For example, if we have a purely sinusoidal wave $\propto \exp(ikx)$, then this nonlinearity will lead to the growth of a second harmonic $\propto \exp(2ikx)$. Similarly, when two linear waves with spatial frequencies $k$, $k'$ are superposed, this term will describe the production of new waves at the sum and difference spatial frequencies. We have already met such wave-wave coupling in our study of nonlinear optics (Chap. 9), and in the route to turbulence for rotating Couette flow (Fig. 14.12), and we shall meet it again in nonlinear plasma physics (Chap. 22).
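The steepening can be seen without any dispersion at all. Dropping the third term of Eq. (15.29) leaves $\partial\xi/\partial t + a\,\xi\,\partial\xi/\partial\chi = 0$ with $a = \frac{3}{2}(g/h_o)^{1/2}$, whose solution carries each height $\xi$ unchanged along a characteristic $\chi = \chi_0 + a\,\xi_0(\chi_0)\,t$. The sketch below is a minimal illustration under assumed choices (units with $a = 1$, a unit Gaussian initial profile as in Fig. 15.4). It locates the time $t_{\rm break} = 1/[a\max(-d\xi_0/d\chi_0)]$ at which characteristics on the leading edge would first cross, and shows the leading-edge slope growing as that time approaches. In the full KdV equation, dispersion prevents the profile from actually overturning (cf. the caption of Fig. 15.4).

```python
import math

A_COEF = 1.0  # stands in for (3/2) sqrt(g/h_o); set to 1 by a choice of units (illustrative)

def xi0(chi):
    """Initial Gaussian profile (unit height and width, illustrative)."""
    return math.exp(-chi * chi)

def dxi0(chi):
    return -2.0 * chi * math.exp(-chi * chi)

def evolve(chi0_values, t):
    """Dispersionless solution by characteristics: the height xi0(chi0) rides along
    chi = chi0 + A_COEF * xi0(chi0) * t, so taller parts of the profile move faster."""
    return [(c + A_COEF * xi0(c) * t, xi0(c)) for c in chi0_values]

grid = [-3.0 + 0.001 * i for i in range(6001)]

# Characteristics first cross (the leading edge turns vertical) at
# t_break = 1 / (A_COEF * max(-dxi0/dchi0)).
t_break = 1.0 / (A_COEF * max(-dxi0(c) for c in grid))
print(f"t_break = {t_break:.4f}, analytic e^(1/2)/sqrt(2) = {math.exp(0.5) / math.sqrt(2):.4f}")

def leading_slope(t, chi0=1 / math.sqrt(2)):
    """d xi / d chi along the characteristic from chi0: xi0' / (1 + A_COEF xi0' t)."""
    s = dxi0(chi0)
    return s / (1 + A_COEF * s * t)

for frac in (0.0, 0.5, 0.9):
    print(f"t = {frac:.1f} t_break: leading-edge slope = {leading_slope(frac * t_break):.3f}")
```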

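[Figure 15.5: (a) an irregular initial wave profile plotted against $x$; (b) the later profile against $x$, with regions labeled "stable solitons" and "dispersing waves".]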

Fig. 15.5: Production of stable solitons out of an irregular initial wave profile.

The third term in (15.29) is linear and is responsible for a weak dispersion of the wave. The higher-frequency Fourier components travel with slower phase velocities than lower-frequency components. This has two effects. One is an overall spreading of a wave in a manner qualitatively familiar from elementary quantum mechanics; cf. Ex. 6.2. For example, in a Gaussian wave packet with width $\Delta x$, the range of wave numbers $k$ contributing significantly to the profile is $\Delta k \sim 1/\Delta x$. The spread in the group velocity is then $\sim \Delta k\,|\partial^2\omega/\partial k^2| \sim (gh_o)^{1/2}h_o^2k\,\Delta k$ [cf. Eq. (15.11)]. The wave packet will then double in size in a time

$$ t_{\rm spread} \sim \frac{\Delta x}{\Delta v_g} \sim \left(\frac{\Delta x}{h_o}\right)^2\frac{1}{k\sqrt{gh_o}} . \tag{15.30} $$

The second effect is that since the high-frequency components travel somewhat slower than the low-frequency components, there will be a tendency for the profile to become asymmetric, with the trailing edge steeper than the leading edge.

Given the opposite effects of these two corrections (nonlinearity makes the wave's leading edge steeper; dispersion reduces its steepness), it should not be too surprising in hindsight that it is possible to find solutions to the KdV equation with constant profile, in which nonlinearity balances dispersion. What is quite surprising, though, is that these solutions, called solitons, are very robust and arise naturally out of random initial data. That is to say, if we solve an initial value problem numerically starting with several peaks of random shape and size, then although much of the wave will spread and disappear due to dispersion, we will typically be left with several smooth soliton solutions, as in Fig. 15.5.

15.3.3 Single-Soliton Solution

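We can discard some unnecessary algebraic luggage in the KdV equation (15.29) by rescaling the dependent variable and both independent variables using the substitutions

$$ \zeta = \frac{\xi}{h_o} , \qquad \eta = \frac{3\chi}{h_o} , \qquad \tau = \frac{9}{2}\sqrt{\frac{g}{h_o}}\,t . \tag{15.31} $$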

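[Figure 15.6: a single-soliton profile $\xi$ versus $\chi$, with peak height $\xi_o$, half width $\chi_{1/2}$, and propagation speed $(gh_o)^{1/2}[1 + \xi_o/(2h_o)]$.]

Fig. 15.6: Profile of the single-soliton solution (15.33), (15.31) of the KdV equation. The width $\chi_{1/2}$ is inversely proportional to the square root of the peak height $\xi_o$.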

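The KdV equation then becomes

$$ \frac{\partial\zeta}{\partial\tau} + \zeta\frac{\partial\zeta}{\partial\eta} + \frac{\partial^3\zeta}{\partial\eta^3} = 0 . \tag{15.32} $$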

There are well-understood mathematical techniques³ for solving equations like the KdV equation. However, we shall just quote solutions and explore their properties. The simplest solution to the dimensionless KdV equation (15.32) is

$$ \zeta = \zeta_0\,{\rm sech}^2\left[\left(\frac{\zeta_0}{12}\right)^{1/2}\left(\eta - \frac{1}{3}\zeta_0\tau\right)\right] . \tag{15.33} $$

This solution describes a one-parameter family of stable solitons. For each such soliton (each $\zeta_0$), the soliton maintains its shape while propagating at speed $d\eta/d\tau = \zeta_0/3$ relative to a weak wave. By transforming to the rest frame of the unperturbed water using Eqs. (15.31) and (15.28), we find for the soliton's speed there:

$$ \frac{dx}{dt} = \sqrt{gh_o}\left(1 + \frac{\xi_o}{2h_o}\right) . \tag{15.34} $$

The first term is the propagation speed of a weak (linear) wave. The second term is the nonlinear correction, proportional to the wave amplitude $\xi_o = h_o\zeta_0$. The half width of the wave may be defined by setting the argument of the hyperbolic secant to unity:

$$ \chi_{1/2} = \left(\frac{4h_o^3}{3\xi_o}\right)^{1/2} . \tag{15.35} $$

The larger the wave amplitude, the narrower the wave and the faster it propagates; cf. Fig. 15.6.

³ See, for example, Whitham (1974).
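Ex. 15.9 asks for an analytic verification of Eq. (15.33). As a quick numerical sanity check, the sketch below evaluates the left-hand side of Eq. (15.32) for the soliton with central finite differences; the step size, sample points and $\zeta_0 = 3$ are arbitrary choices, so the residual reflects truncation error only. It then evaluates Eqs. (15.34) and (15.35) for two illustrative amplitudes in an assumed 1 m deep channel: quadrupling the amplitude halves the width and raises the speed.

```python
import math

def soliton(eta, tau, zeta0):
    """Single soliton, Eq. (15.33): zeta0 sech^2[(zeta0/12)^(1/2) (eta - zeta0 tau / 3)]."""
    arg = math.sqrt(zeta0 / 12.0) * (eta - zeta0 * tau / 3.0)
    return zeta0 / math.cosh(arg) ** 2

def kdv_residual(eta, tau, zeta0, h=1e-2):
    """Central-difference estimate of d_tau zeta + zeta d_eta zeta + d_eta^3 zeta, Eq. (15.32)."""
    z = lambda e, t: soliton(e, t, zeta0)
    d_tau = (z(eta, tau + h) - z(eta, tau - h)) / (2 * h)
    d_eta = (z(eta + h, tau) - z(eta - h, tau)) / (2 * h)
    d3_eta = (z(eta + 2 * h, tau) - 2 * z(eta + h, tau)
              + 2 * z(eta - h, tau) - z(eta - 2 * h, tau)) / (2 * h ** 3)
    return d_tau + z(eta, tau) * d_eta + d3_eta

worst = max(abs(kdv_residual(-10 + 0.25 * i, 0.7, 3.0)) for i in range(81))
print(f"largest residual on the grid: {worst:.1e}")  # finite-difference truncation error only

def soliton_speed(xi0, h0, g=9.81):
    """Eq. (15.34)."""
    return math.sqrt(g * h0) * (1 + xi0 / (2 * h0))

def half_width(xi0, h0):
    """Eq. (15.35)."""
    return math.sqrt(4 * h0 ** 3 / (3 * xi0))

for xi0 in (0.1, 0.4):   # illustrative amplitudes in metres, 1 m deep channel
    print(f"xi0 = {xi0} m: speed = {soliton_speed(xi0, 1.0):.3f} m/s, "
          f"half width = {half_width(xi0, 1.0):.3f} m")
```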

[Figure 15.7: three panels plotting $\zeta$ (0 to 6) against $\eta$ at $\tau = -9$, $\tau = 0$ and $\tau = 9$.]

Fig. 15.7: Two-soliton solution to the dimensionless KdV equation (15.32). This solution describes two waves, well separated for $\tau \to -\infty$, that coalesce and then separate, producing the original two waves in reverse order as $\tau \to +\infty$. The notation is that of Eq. (15.36); the values of the parameters in that equation are $\eta_1 = \eta_2 = 0$ (so the solitons will be merged at time $\tau = 0$), $\alpha_1 = 1$, $\alpha_2 = 1.4$.

Let us return to Scott-Russell's soliton. Converting to SI units, the speed was about 4 m s$^{-1}$, giving an estimate of the depth of the canal, from $dx/dt \approx \sqrt{gh_o}$, of $h_o \sim 1.6$ m. Using the width $\chi_{1/2} \sim 5$ m in Eq. (15.35), we obtain a peak height $\xi_o \sim 0.25$ m, somewhat smaller than quoted but within the errors, allowing for the uncertainty in the definition of the width and an (appropriate) element of hyperbole in the account.
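The arithmetic of this estimate, as a few lines of Python. The 4 m s$^{-1}$ speed and 5 m half width are the text's rounded values; the depth follows from the linear speed $(gh_o)^{1/2}$ (ignoring the small nonlinear term in Eq. (15.34)), and Eq. (15.35) is inverted for the peak height. The result, $\xi_o \approx 0.23$ m, is in line with the rough $\sim 0.25$ m above.

```python
g = 9.81          # m/s^2
speed = 4.0       # m/s, the rounded speed quoted in the text
chi_half = 5.0    # m, the rough half width adopted in the text

# Depth from the linear wave speed sqrt(g h0); this ignores the small nonlinear
# correction in Eq. (15.34), consistent with an order-of-magnitude estimate.
h0 = speed ** 2 / g

# Peak height from Eq. (15.35), chi_half^2 = 4 h0^3 / (3 xi0), solved for xi0.
xi0 = 4 * h0 ** 3 / (3 * chi_half ** 2)

print(f"h0 ~ {h0:.2f} m, xi0 ~ {xi0:.2f} m")
```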

15.3.4 Two-Soliton Solution

One of the most fascinating properties of solitons is the way that two or more waves interact. The expectation, derived from physics experience with weakly coupled normal modes, might be that if we have two well-separated solitons propagating in the same direction with the larger wave chasing the smaller wave, then the larger will eventually catch up with the smaller and nonlinear interactions between the two waves will essentially destroy both, leaving behind a single, irregular pulse which will spread and decay after the interaction. However, this is not what happens. Instead, the two waves pass through each other unscathed and unchanged, except that they emerge from the interaction a bit sooner than they would have had they moved with their original speeds during the interaction. See Fig. 15.7. We shall not pause to explain why the two waves survive unscathed, save to remark that there are topological invariants in the solution which must be preserved. However, we can exhibit one such two-soliton solution analytically:

$$ \zeta = \frac{\partial^2}{\partial\eta^2}\left[12\ln F(\eta,\tau)\right] , \quad F = 1 + f_1 + f_2 + \left(\frac{\alpha_2 - \alpha_1}{\alpha_2 + \alpha_1}\right)^2 f_1f_2 , \quad f_i = \exp\left[-\alpha_i(\eta - \eta_i) + \alpha_i^3\tau\right] ; \tag{15.36} $$

here $\alpha_i$ and $\eta_i$ are constants. This solution is depicted in Fig. 15.7.

20
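Equation (15.36) can also be checked numerically without symbolic software. The sketch below assumes that the dimensionless KdV equation (15.32) has the form ∂ζ/∂τ + ζ∂ζ/∂η + ∂³ζ/∂η³ = 0, which is the form consistent with the single-soliton amplitude ζ01 = 3α1² quoted in Ex. 15.10. It evaluates ζ from the analytic derivatives of ln F, estimates the KdV residual with centred finite differences, and checks the late-time amplitude of the larger soliton. Function names and step sizes are illustrative choices.

```python
import math


def two_soliton(eta, tau, a1=1.0, a2=1.4, eta1=0.0, eta2=0.0):
    """zeta = 12 d^2(ln F)/d eta^2 for the two-soliton F of Eq. (15.36).

    Uses the analytic derivatives F' and F'' with respect to eta, so that
    zeta = 12 (F F'' - F'^2) / F^2.
    """
    A = ((a2 - a1) / (a2 + a1)) ** 2
    f1 = math.exp(-a1 * (eta - eta1) + a1**3 * tau)
    f2 = math.exp(-a2 * (eta - eta2) + a2**3 * tau)
    F = 1.0 + f1 + f2 + A * f1 * f2
    dF = -a1 * f1 - a2 * f2 - (a1 + a2) * A * f1 * f2
    d2F = a1**2 * f1 + a2**2 * f2 + (a1 + a2) ** 2 * A * f1 * f2
    return 12.0 * (F * d2F - dF**2) / F**2


def kdv_residual(eta, tau, h=2e-3):
    """Centred-difference estimate of zeta_tau + zeta*zeta_eta + zeta_etaetaeta."""
    z = two_soliton
    z_t = (z(eta, tau + h) - z(eta, tau - h)) / (2 * h)
    z_x = (z(eta + h, tau) - z(eta - h, tau)) / (2 * h)
    z_xxx = (z(eta + 2 * h, tau) - 2 * z(eta + h, tau)
             + 2 * z(eta - h, tau) - z(eta - 2 * h, tau)) / (2 * h**3)
    return z_t + z(eta, tau) * z_x + z_xxx


# At late times the larger soliton is isolated; its peak should approach 3 * a2**2.
peak = max(two_soliton(15.0 + 0.01 * i, 10.0) for i in range(1001))
print(f"late-time peak {peak:.4f} vs 3*a2^2 = {3 * 1.4**2:.4f}")

# The residual should be small compared with the individual terms.
for eta, tau in [(0.0, 0.0), (1.0, 0.5), (-2.0, -1.0)]:
    print(eta, tau, kdv_residual(eta, tau))
```

Computing ζ from F, F′ and F″ directly avoids a second numerical differentiation, so only the KdV check itself relies on finite differences.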

15.3.5

Solitons in Contemporary Physics

Solitons were rediscovered in the 1960s when they were found in numerical plasma simulations. Their topological properties were soon discovered and general methods to generate solutions were derived. Solitons have been found in subjects as different as the propagation of magnetic flux in a Josephson junction, elastic waves in anharmonic crystals, quantum field theory (as instantons) and classical general relativity (as solitary, nonlinear gravitational waves). Most classical solitons are solutions to one of a relatively small number of nonlinear partial differential equations, including the KdV equation, Burgers' equation and the sine-Gordon equation. Unfortunately, it has proved difficult to generalize these equations and their soliton solutions to two and three spatial dimensions.

Just like research into chaos, studies of solitons have taught physicists that nonlinearity need not lead to maximal disorder in physical systems, but instead can create surprisingly stable, ordered structures.

****************************

EXERCISES

Exercise 15.8 Example: Breaking of a Dam
Consider the flow of water along a horizontal channel of constant width after a dam breaks. Some time after the initial transients have died away, the flow may be described by the nonlinear shallow-water wave equations (15.23):

$$ \frac{\partial h}{\partial t} + \frac{\partial(hv)}{\partial x} = 0 , \qquad \frac{\partial v}{\partial t} + v\frac{\partial v}{\partial x} + g\frac{\partial h}{\partial x} = 0 . \qquad (15.37) $$

Here h is the height of the flow, v is the horizontal speed of the flow and x is distance along the channel measured from the location of the dam. Solve for the flow assuming that initially (at t = 0) h = ho for x < 0 and h = 0 for x > 0 (no water). Your solution should have the form shown in Fig. 15.8. What is the speed of the front of the water? [Hints: Note that from the parameters of the problem we can construct only one velocity, √(g ho), and no length except ho. It therefore is a reasonable guess that the solution has the self-similar form h = ho h̃(ξ), v = √(g ho) ṽ(ξ), where h̃ and ṽ are dimensionless functions of the similarity variable

$$ \xi = \frac{x/t}{\sqrt{g h_o}} . \qquad (15.38) $$

Using this ansatz, convert the partial differential equations (15.37) into a pair of ordinary differential equations which can be solved so as to satisfy the initial conditions.]

Exercise 15.9 Derivation: Single-Soliton Solution
Verify that expression (15.33) does indeed satisfy the dimensionless KdV equation (15.32).

Exercise 15.10 Derivation: Two-Soliton Solution

Fig. 15.8: The water's height h(x, t) after a dam breaks (height h versus position x; the initial height is ho, and the profiles are labeled t = 0, 1, 2, 3).

(a) Verify, using symbolic-manipulation computer software (e.g., Macsyma, Maple or Mathematica), that the two-soliton expression (15.36) satisfies the dimensionless KdV equation. (Warning: Considerable algebraic travail is required to verify this by hand, directly.)
(b) Verify analytically that the two-soliton solution (15.36) has the properties claimed in the text: First consider the solution at early times in the spatial region where f1 ∼ 1, f2 ≪ 1. Show that the solution is approximately that of the single soliton described by Eq. (15.33). Demonstrate that the amplitude is ζ01 = 3α1² and find the location of its peak. Repeat the exercise for the second wave and for late times.
(c) Use a computer to follow, numerically, the evolution of this two-soliton solution as time τ passes (thereby filling in timesteps between those shown in Fig. 15.7).
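For Ex. 15.8, inserting the ansatz (15.38) into Eqs. (15.37) gives the ordinary differential equations (ṽ − ξ)h̃′ + h̃ṽ′ = 0 and (ṽ − ξ)ṽ′ + h̃′ = 0. One non-trivial solution that joins continuously onto the undisturbed water is the rarefaction profile h̃ = (2 − ξ)²/9, ṽ = 2(1 + ξ)/3 for −1 ≤ ξ ≤ 2. This profile is our own derivation, not the text's; the sketch below checks it against the ODEs by finite differences and evaluates the front speed it implies.

```python
import math


def dam_break_profile(xi):
    """Dimensionless (h~, v~) at similarity variable xi = (x/t)/sqrt(g h_o)."""
    if xi <= -1.0:
        return 1.0, 0.0  # undisturbed reservoir behind the rarefaction
    if xi >= 2.0:
        return 0.0, 0.0  # dry channel ahead of the front (v set to 0 by convention)
    return (2.0 - xi) ** 2 / 9.0, 2.0 * (1.0 + xi) / 3.0


def ode_residuals(xi, d=1e-6):
    """Residuals of (v~ - xi) h~' + h~ v~' = 0 and (v~ - xi) v~' + h~' = 0."""
    h, v = dam_break_profile(xi)
    h_plus, v_plus = dam_break_profile(xi + d)
    h_minus, v_minus = dam_break_profile(xi - d)
    hp = (h_plus - h_minus) / (2 * d)
    vp = (v_plus - v_minus) / (2 * d)
    return (v - xi) * hp + h * vp, (v - xi) * vp + hp


def front_speed(g, h0):
    """Speed of the leading edge of the water (xi = 2), i.e. 2 sqrt(g h_o)."""
    return 2.0 * math.sqrt(g * h0)


for xi in (-0.5, 0.0, 1.0, 1.5):
    print(xi, ode_residuals(xi))
print("front speed for h_o = 1 m:", front_speed(9.81, 1.0), "m/s")
```

The residuals vanish to rounding error inside the rarefaction, and the water front advances at 2√(g ho), twice the linear shallow-water wave speed.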

****************************

15.4

Rossby Waves in a Rotating Fluid

In a nearly rigidly rotating fluid, the Coriolis effect (observed in a co-rotating reference frame; Sec. 13.5) provides the restoring force for an unusual type of wave motion called "Rossby waves." These waves are seen in the Earth's oceans and atmosphere. For a simple example, we consider the sea above a sloping seabed; Fig. 15.9. We assume the unperturbed fluid has vanishing velocity v = 0 in the Earth's rotating frame, and we study weak waves in the sea with oscillating velocity v. (Since the fluid is at rest in the equilibrium state about which we are perturbing, we write the perturbed velocity as v rather than δv.) We assume that the wavelengths are long enough that viscosity is negligible. We shall also, in this case, restrict attention to small-amplitude waves so that nonlinear terms can be dropped from our dynamical equations. The perturbed Navier-Stokes equation (13.54a) then becomes (after linearization)

$$ \frac{\partial\mathbf v}{\partial t} + 2\boldsymbol\Omega\times\mathbf v = -\frac{\nabla\delta P'}{\rho} . \qquad (15.39) $$

Here, as in Sec. 13.5, δP′ is the perturbation in the effective pressure [which includes gravitational and centrifugal effects, P′ = P + ρΦ − ½ρ(Ω × x)²]. Taking the curl of Eq. (15.39), we obtain for the time derivative of the waves' vorticity

$$ \frac{\partial\boldsymbol\omega}{\partial t} = 2(\boldsymbol\Omega\cdot\nabla)\mathbf v . \qquad (15.40) $$

We seek a wave mode in which the horizontal fluid velocity oscillates in the x direction, vx, vy ∝ exp[i(kx − ωt)], and is independent of z in accord with the Taylor-Proudman theorem (Sec. 13.5.3):

$$ v_x \text{ and } v_y \propto \exp[i(kx - \omega t)] , \qquad \frac{\partial v_x}{\partial z} = \frac{\partial v_y}{\partial z} = 0 . \qquad (15.41) $$

The only allowed vertical variation is in the vertical velocity vz, and differentiating ∇·v = 0 with respect to z, we obtain

$$ \frac{\partial^2 v_z}{\partial z^2} = 0 . \qquad (15.42) $$

The vertical velocity therefore varies linearly between the surface and the sea floor. Now, one boundary condition is that the vertical velocity must vanish at the surface. The other is that, at the seafloor z = −h, we must have vz(−h) = −αvy(x), where α is the tangent of the angle of inclination of the sea floor. The solution to Eq. (15.42) satisfying these boundary conditions is

$$ v_z = \frac{\alpha z}{h}\,v_y . \qquad (15.43) $$

Taking the vertical component of Eq. (15.40) and evaluating ωz = vy,x − vx,y = ikvy, we obtain

$$ \omega k v_y = 2\Omega\frac{\partial v_z}{\partial z} = \frac{2\Omega\alpha v_y}{h} . \qquad (15.44) $$

The dispersion relation therefore has the quite unusual form

$$ \omega k = \frac{2\Omega\alpha}{h} . \qquad (15.45) $$

Fig. 15.9: Geometry of ocean for Rossby waves.

Fig. 15.10: Rossby Waves in a Rotating cylinder with sloping bottom.

Rossby waves have interesting properties: They can only propagate in one direction, parallel to the intersection of the sea floor with the horizontal (our ex direction). Their phase velocity Vph and group velocity Vg are equal in magnitude but in opposite directions,

$$ \mathbf V_{\rm ph} = -\mathbf V_{\rm g} = \frac{2\Omega\alpha}{k^2 h}\,\mathbf e_x . \qquad (15.46) $$
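The opposite signs of Vph and Vg follow directly from ω = 2Ωα/(kh). The sketch below evaluates ω(k), forms ω/k, and estimates dω/dk by a centred difference. The numerical values of Ω, α, h and the wavelength are illustrative assumptions, not taken from the text.

```python
import math


def rossby_omega(k, Omega, alpha, h):
    """Angular frequency from the Rossby dispersion relation omega k = 2 Omega alpha / h."""
    return 2.0 * Omega * alpha / (k * h)


def rossby_velocities(k, Omega, alpha, h, rel_dk=1e-6):
    """Phase velocity omega/k and group velocity d(omega)/dk (centred difference)."""
    v_ph = rossby_omega(k, Omega, alpha, h) / k
    dk = rel_dk * k
    v_g = (rossby_omega(k + dk, Omega, alpha, h)
           - rossby_omega(k - dk, Omega, alpha, h)) / (2 * dk)
    return v_ph, v_g


# Illustrative (assumed) values: Earth-like rotation rate, 1% slope,
# 100 m depth, 100 km wavelength.
Omega, alpha, h = 7.29e-5, 0.01, 100.0
k = 2 * math.pi / 1.0e5
v_ph, v_g = rossby_velocities(k, Omega, alpha, h)
print(f"V_ph = {v_ph:.3f} m/s, V_g = {v_g:.3f} m/s")
```

Because ω ∝ 1/k, the group velocity is exactly −ω/k, so the two printed values differ only in sign.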

If we use ∇·v = 0, we discover that the two components of horizontal velocity are in quadrature, vx = iαvy/(kh). This means that, when seen from above, the fluid circulates with the opposite sense to the angular velocity Ω. Rossby waves play an important role in the circulation of the Earth's oceans; see, e.g., Chelton and Schlax (1996). A variant of these Rossby waves in air can be seen as undulations in the atmosphere's jet stream produced when the stream goes over a sloping terrain such as that of the Rocky Mountains; and another variant in neutron stars generates gravitational waves (ripples of spacetime curvature) that are a promising source for ground-based detectors such as LIGO.

****************************

EXERCISES

Exercise 15.11 Example: Rossby Waves in a Cylindrical Tank with Sloping Bottom
In the film Rotating Fluids by David Fultz (1969), about 20 minutes 40 seconds into the film, an experiment is described in which Rossby waves are excited in a rotating cylindrical tank with inner and outer vertical walls and a sloping bottom. Figure 15.11(a) is a photograph of the tank from the side, showing its bottom, which slopes upward toward the center, and a bump on the bottom which generates the Rossby waves. The tank is filled with water, then set into rotation with an angular velocity Ω; the water is given time to settle down into rigid rotation with the cylinder. Then the cylinder's angular velocity is reduced by a small amount, so the water is rotating at angular velocity ∆Ω ≪ Ω relative to the cylinder. As the water passes over the hump on the tank bottom, the hump generates Rossby waves. Those waves are made visible by injecting dye at a fixed radius, through a syringe attached to the tank. Figure 15.11(b) is a photograph of the dye trace as seen looking down on the tank from above. If there were no Rossby waves present, the trace would be circular. The Rossby waves make it pentagonal.

In this exercise you will work out the details of the Rossby waves, explore their physics, and explain the shape of the trace. Because the slope of the bottom is cylindrical rather than planar, this is somewhat different from the situation in the text (Fig. 15.9). However, we can deduce the details of the waves in this cylindrical case from those for the planar case by geometric-optics considerations, making modest errors because the wavelength of the waves is not all that small compared to the circumference around the tank.

(a) Show that the rays along which the waves propagate are circles centered on the tank's symmetry axis.

(b) Focus on the ray that is halfway between the inner and outer walls of the tank. Let its radius be a, the depth of the water there be h, and the slope angle of the tank floor be α. Introduce quasi-Cartesian coordinates x = aφ, y = −ϖ, where {ϖ, φ, z} are cylindrical coordinates. By translating the Cartesian-coordinate waves of the text into quasi-Cartesian coordinates and noting that five wavelengths must fit into the circumference around the cylinder, show that the velocity field has the form vϖ, vφ, vz ∝ e^{i(5φ+ωt)}, and deduce the ratios of the three components of velocity to each other. This solution has nonzero radial velocity at the walls, a warning that edge effects will modify the waves somewhat. This analysis ignores those edge effects.

(c) Because the waves are generated by the ridge on the bottom of the tank, the wave pattern must remain at rest relative to that ridge, which means it must rotate relative to the fluid's frame with the angular velocity dφ/dt = −∆Ω. From the waves' dispersion relation deduce ∆Ω/Ω, the fractional slowdown of the tank that had to be imposed in order to generate the observed pentagonal wave.
(d) Compute the displacement field δx(ϖ, φ, z, t) of a fluid element whose undisplaced location (in the rigidly rotating cylindrical coordinates) is (ϖ, φ, z). Explain the pentagonal shape of the movie's dye lines in terms of this displacement field.

(e) Compute the wave's vertical vorticity field ωz (relative to the rigidly rotating flow), and show that as a fluid element moves, and the vertical vortex line through it shortens or lengthens due to the changing water depth, ωz changes proportionally to the vortex line's length (as it must).

****************************

15.5

Sound Waves

So far, our discussion of fluid dynamics has mostly been concerned with flows sufficiently slow that the density can be treated as constant. We now introduce the effects of compressibility by discussing sound waves (in a non-rotating reference frame). Sound waves are prototypical scalar waves and therefore are simpler in many respects than vector electromagnetic waves and tensor gravitational waves. For a sound wave in the small-amplitude limit, we linearize the Euler and continuity (mass conservation) equations to obtain

$$ \rho\frac{\partial\mathbf v}{\partial t} = -\nabla\delta P , \qquad (15.47a) $$

$$ \frac{\partial\delta\rho}{\partial t} = -\rho\nabla\cdot\mathbf v . \qquad (15.47b) $$

As the flow is irrotational (vanishing vorticity before the wave arrives implies vanishing vorticity as it is passing), we can write the fluid velocity as the gradient of a velocity potential:

$$ \mathbf v = \nabla\psi . \qquad (15.48a) $$

Inserting this into Eq. (15.47a) and integrating spatially (with ρ regarded as constant since its perturbation would give a second-order term), we obtain

$$ \delta P = -\rho\frac{\partial\psi}{\partial t} . \qquad (15.48b) $$

Setting δρ = (∂ρ/∂P)s δP in Eq. (15.47b) (with the derivative performed at constant entropy s because there generally will not be enough time in a single wave period for heat to cross a wavelength), and expressing δP and ∇·v in terms of the velocity potential [Eqs. (15.48a) and (15.48b)], we obtain the dispersion-free wave equation

$$ \nabla^2\psi = \frac{1}{c^2}\frac{\partial^2\psi}{\partial t^2} . \qquad (15.48c) $$

Here

$$ c = \left(\frac{\partial P}{\partial\rho}\right)_s^{1/2} \qquad (15.48d) $$

is the adiabatic sound speed. For a perfect gas its value is c = (γP/ρ)^{1/2}, where γ is the ratio of specific heats. The sound speed in air at 20°C is about 340 m s⁻¹. In water under atmospheric conditions, it is about 1.5 km s⁻¹ (not much different from sound speeds in solids). The general solution of the wave equation (15.48c) for plane sound waves propagating in the ±x directions is

$$ \psi = f_1(x - ct) + f_2(x + ct) , \qquad (15.49) $$

where f1, f2 are arbitrary functions.
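As a quick check of the perfect-gas formula c = (γP/ρ)^{1/2}, the sketch below evaluates it for dry air at 20°C and 1 atm, and again in the equivalent form (γRT/M)^{1/2}. The constants (γ = 1.4, ρ = 1.204 kg m⁻³, M = 0.02897 kg mol⁻¹) are standard textbook values supplied here as assumptions; water is not a perfect gas, so this formula does not apply to it.

```python
import math

GAMMA_AIR = 1.4      # ratio of specific heats for dry air (ideal diatomic gas)
P_ATM = 101325.0     # pressure, Pa
RHO_AIR_20C = 1.204  # density of dry air at 20 C and 1 atm, kg m^-3
R_GAS = 8.314        # molar gas constant, J mol^-1 K^-1
M_AIR = 0.02897      # molar mass of dry air, kg mol^-1


def sound_speed_perfect_gas(gamma, P, rho):
    """Adiabatic sound speed c = (gamma P / rho)**0.5 for a perfect gas."""
    return math.sqrt(gamma * P / rho)


c1 = sound_speed_perfect_gas(GAMMA_AIR, P_ATM, RHO_AIR_20C)
# Equivalent form using the perfect-gas law P/rho = R T / M at T = 293.15 K.
c2 = math.sqrt(GAMMA_AIR * R_GAS * 293.15 / M_AIR)
print(f"{c1:.1f} m/s, {c2:.1f} m/s")
```

Both forms give about 343 m s⁻¹, consistent with the rounded value quoted above.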

15.5.1

Wave Energy

We shall use sound waves to illustrate how waves carry energy. The fluid's energy density is U = (½v² + u)ρ [Table 12.1 with Φ = 0]. The first term is the fluid's kinetic energy; the second, its internal energy. The internal energy density can be evaluated by a Taylor expansion in the wave's density perturbation:

$$ u\rho = [u\rho] + \left[\frac{\partial(u\rho)}{\partial\rho}\right]_s\delta\rho + \frac12\left[\frac{\partial^2(u\rho)}{\partial\rho^2}\right]_s\delta\rho^2 , \qquad (15.50) $$

where the three coefficients in brackets [ ] are evaluated at the equilibrium density. The first term in Eq. (15.50) is the energy of the background fluid, so we shall drop it. The second term will average to zero over a wave period, so we shall also drop it. The third term can be simplified using the first law of thermodynamics in the form du = T ds − P d(1/ρ) (which implies [∂(uρ)/∂ρ]s = u + P/ρ), followed by the definition h = u + P/ρ of enthalpy per unit mass, followed by the first law in the form dh = T ds + dP/ρ, followed by expression (15.48d) for the speed of sound. The result is

$$ \left[\frac{\partial^2(u\rho)}{\partial\rho^2}\right]_s = \left[\frac{\partial h}{\partial\rho}\right]_s = \frac{c^2}{\rho} . \qquad (15.51) $$

Inserting this into the third term of (15.50) and averaging over a wave period and wavelength, we obtain for the wave energy per unit volume

$$ \varepsilon = \frac12\rho\overline{v^2} + \frac{c^2}{2\rho}\overline{\delta\rho^2} = \frac12\rho\left[\overline{(\nabla\psi)^2} + \frac{1}{c^2}\overline{\left(\frac{\partial\psi}{\partial t}\right)^2}\right] = \rho\overline{(\nabla\psi)^2} . \qquad (15.52) $$

In the second equality we have used v = ∇ψ [Eq. (15.48a)] and δρ = −(ρ/c²)∂ψ/∂t [from δρ = (∂ρ/∂P)s δP = δP/c² and Eq. (15.48b)]; the third equality can be deduced by multiplying the wave equation (15.48c) by ψ and averaging. Thus, there is equipartition of energy between the kinetic and internal energy terms.

The energy flux is F = (½v² + h)ρv [Table 12.1 with Φ = 0]. The kinetic energy flux (first term) is third order in the velocity perturbation and therefore vanishes on average. For a sound wave, the internal energy flux (second term) can be brought into a more useful form by expanding the enthalpy per unit mass:

$$ h = [h] + \left[\frac{\partial h}{\partial P}\right]_s\delta P = [h] + \frac{\delta P}{\rho} . \qquad (15.53) $$

Here we have used the first law of thermodynamics dh = T ds + (1/ρ)dP and adiabaticity of the perturbation, s = constant; and the terms in square brackets are unperturbed quantities. Inserting this into F = hρv and expressing δP and v in terms of the velocity potential [Eqs. (15.48a) and (15.48b)], and averaging over a wave period and wavelength, we obtain for the energy flux

$$ \mathbf F = \overline{\rho h\mathbf v} = \overline{\delta P\,\mathbf v} = -\rho\,\overline{\frac{\partial\psi}{\partial t}\nabla\psi} . \qquad (15.54) $$

This equation and Eq. (15.52) are a special case of the scalar-wave energy flux and energy density discussed in Sec. 6.3.1 and Ex. 6.9 [Eqs. (6.18)]. For a locally plane wave with ψ = ψo cos(k·x − ωt + ϕ) (where ϕ is an arbitrary phase), the energy density (15.52) is ε = ½ρψo²k², and the energy flux (15.54) is F = ½ρψo²ωk. Since, for this dispersion-free wave, the phase and group velocities are both V = (ω/k)k̂ = ck̂ (where k̂ = k/k is the unit vector pointing in the wave-propagation direction), the energy density and flux are related by

$$ \mathbf F = \varepsilon\mathbf V = \varepsilon c\,\hat{\mathbf k} . \qquad (15.55) $$

The energy flux is therefore the product of the energy density and the wave velocity, as we might have anticipated. When studying dispersive waves in plasmas (Chaps. 20 and 22) we shall return to the issue of energy transport, and shall see that just as information in waves is carried at the group velocity, not the phase velocity, so energy is also carried at the group velocity. In Sec. 6.3.1 we used the above equations for the sound-wave energy density ε (there denoted u) and flux F to illustrate, via geometric-optics considerations, the behavior of wave energy in an inhomogeneous, time-varying medium.

The energy flux carried by sound is conventionally measured in dB (decibels). The flux in decibels, FdB, is related to the flux F in W m⁻² by

$$ F_{\rm dB} = 120 + 10\log_{10}F . \qquad (15.56) $$

Sound that is barely audible is about 1 dB. Normal conversation is about 50-60 dB. Jet aircraft and rock concerts can cause exposure to more than 120 dB with consequent damage to the ear.
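The plane-wave results ε = ½ρψo²k² and F = εck̂, together with the decibel conversion (15.56), can be confirmed numerically. In this sketch the period average is taken by sampling the phase uniformly over one cycle; the density, sound speed, amplitude and wavelength are arbitrary illustrative values.

```python
import math


def plane_wave_averages(rho, c, psi0, k, n=64):
    """Period-averaged energy density (15.52) and flux (15.54) for psi = psi0 cos(k x - omega t)."""
    omega = c * k  # dispersion-free sound wave
    eps_sum = flux_sum = 0.0
    for j in range(n):
        phase = 2 * math.pi * j / n  # uniform samples of k x - omega t over one cycle
        dpsi_dx = -k * psi0 * math.sin(phase)
        dpsi_dt = omega * psi0 * math.sin(phase)
        eps_sum += 0.5 * rho * (dpsi_dx**2 + dpsi_dt**2 / c**2)
        flux_sum += -rho * dpsi_dt * dpsi_dx
    return eps_sum / n, flux_sum / n


def flux_to_db(F):
    """Eq. (15.56): sound flux in dB, with F in W m^-2."""
    return 120.0 + 10.0 * math.log10(F)


rho, c, psi0, k = 1.2, 340.0, 1e-3, 2 * math.pi / 0.5  # illustrative values in air
energy_density, flux = plane_wave_averages(rho, c, psi0, k)
print(energy_density, 0.5 * rho * psi0**2 * k**2)  # should agree
print(flux / (energy_density * c))                  # should be 1, Eq. (15.55)
print(flux_to_db(1e-12), flux_to_db(1.0))           # about 0 dB and exactly 120 dB
```

Uniform sampling of at least three phases averages sin² to exactly ½, so the agreement is limited only by rounding.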

15.5.2

Sound Generation

So far in this book, we have been concerned with describing how different types of waves propagate. It is also important to understand how they are emitted. We now outline some aspects of the theory of sound generation. The reader should be familiar with the theory of electromagnetic wave emission. There, one considers a localized region containing moving charges and consequently variable currents. The source can be described as a sum over electric and magnetic multipoles, and each multipole in the source produces a characteristic angular variation of the distant radiation field. The radiation-field amplitude decays inversely with distance from the source and so the Poynting flux varies with the inverse square of the distance. Integrating over a large sphere gives the total power radiated by the source, broken down into the power radiated by each multipolar component. When the reduced wavelength λ̄ = 1/k of the waves is large compared to the source (a situation referred to as slow motion, since the source's charges then generally move slowly compared to the speed of light), the most powerful radiating multipole is the electric dipole d(t). The dipole's average emitted power is given by the Larmor formula

$$ \mathcal{P} = \frac{\overline{\ddot{\mathbf d}^{\,2}}}{6\pi\epsilon_0 c^3} , \qquad (15.57) $$

where d̈ is the second time derivative of d, the bar denotes a time average, and c is the speed of light, not sound.

This same procedure can be followed when describing sound generation. However, as we are dealing with a scalar wave, sound can have a monopolar source. As a pedagogical example, let us set a small, spherical, elastic ball, surrounded by fluid, into radial oscillation (not necessarily sinusoidal) with oscillation frequencies of order ω, so the emitted waves have reduced wavelengths of order λ̄ = c/ω. Let the surface of the ball have radius a + ξ(t), and impose the slow-motion and small-amplitude conditions that

$$ \bar\lambda \gg a \gg |\xi| . \qquad (15.58) $$

As the waves will be spherical, the relevant outgoing-wave solution of the wave equation (15.48c) is

$$ \psi = \frac{f(t - r/c)}{r} , \qquad (15.59) $$

where f is a function to be determined. Since the fluid's velocity at the ball's surface must match that of the ball, we have (to first order in v and ψ)

$$ \dot\xi\,\mathbf e_r = \mathbf v(a,t) = \nabla\psi \simeq -\frac{f(t - a/c)}{a^2}\,\mathbf e_r \simeq -\frac{f(t)}{a^2}\,\mathbf e_r , \qquad (15.60) $$

where in the third equality we have used the slow-motion condition. Solving for f(t) and inserting into Eq. (15.59), we see that

$$ \psi(r,t) = -\frac{a^2\,\dot\xi(t - r/c)}{r} . \qquad (15.61) $$

It is customary to express the radial velocity perturbation v in terms of an oscillating fluid monopole moment

$$ q = 4\pi\rho a^2\dot\xi . \qquad (15.62) $$

Physically this is the total radial discharge of air mass (i.e., mass per unit time) crossing an imaginary fixed spherical surface of radius slightly larger than that of the oscillating ball. In terms of q we have ξ̇(t) = q(t)/4πρa². Using this and Eq. (15.61), we compute for the power radiated as sound waves [Eq. (15.54) integrated over a sphere centered on the ball]

$$ \mathcal{P} = \frac{\overline{\dot q^{\,2}}}{4\pi\rho c} . \qquad (15.63) $$
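As a consistency check on Eq. (15.63), one can average the energy flux (15.54) of the field (15.61) over one period at some radius r, multiply by 4πr², and compare with q̇²/4πρc. This sketch assumes sinusoidal motion ξ = ξo sin ωt (the text allows any time dependence), and the parameter values are illustrative. The near-zone part of ∂ψ/∂r is in quadrature with ∂ψ/∂t, so it averages away and the result does not depend on r.

```python
import math


def monopole_power_check(rho, c, a, xi0, omega, r, n=64):
    """Compare the period-averaged flux through radius r with Eq. (15.63).

    The ball's surface moves as xi(t) = xi0 sin(omega t); the field is
    psi = f(t - r/c)/r with f(s) = -a**2 * xi_dot(s), Eq. (15.61).
    """
    period = 2 * math.pi / omega
    flux_sum = 0.0
    for j in range(n):
        s = j * period / n - r / c                      # retarded time t - r/c
        f = -a**2 * xi0 * omega * math.cos(omega * s)   # f = -a^2 xi_dot
        f_dot = a**2 * xi0 * omega**2 * math.sin(omega * s)
        dpsi_dt = f_dot / r
        dpsi_dr = -f_dot / (c * r) - f / r**2           # includes the near-zone term
        flux_sum += -rho * dpsi_dt * dpsi_dr            # radial flux, Eq. (15.54)
    power_numeric = 4 * math.pi * r**2 * flux_sum / n
    # q = 4 pi rho a^2 xi_dot, so <q_dot^2> = (4 pi rho a^2)^2 xi0^2 omega^4 / 2.
    qdot_sq_avg = (4 * math.pi * rho * a**2) ** 2 * xi0**2 * omega**4 / 2
    power_formula = qdot_sq_avg / (4 * math.pi * rho * c)
    return power_numeric, power_formula


# Illustrative values: 1 cm ball in water, 10 micron amplitude, 1 kHz.
pn, pf = monopole_power_check(rho=1000.0, c=1500.0, a=0.01, xi0=1e-5,
                              omega=2 * math.pi * 1e3, r=5.0)
print(pn, pf)
```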

Note that the power is inversely proportional to the signal speed. This is characteristic of monopolar emission and in contrast to the inverse cube variation for dipolar emission [Eq. (15.57)]. The emission of monopolar waves requires that the volume of the emitting solid body oscillate. When the solid simply oscillates without changing its volume, for example the reed on a musical instrument, dipolar emission should generally dominate. We can think of this as two monopoles of size a in antiphase separated by some displacement b ∼ a. The velocity potential in the far field is then the sum of two monopolar contributions, which almost cancel. Making a Taylor expansion, we obtain

$$ \frac{\psi_{\rm dipole}}{\psi_{\rm monopole}} \sim \frac{b}{\bar\lambda} \sim \frac{\omega b}{c} , \qquad (15.64) $$

where ω and λ̄ are the characteristic magnitudes of the angular frequency and reduced wavelength of the waves (which we have not assumed to be precisely sinusoidal). This reduction of ψ by the slow-motion factor ωb/c implies that the dipolar power emission is weaker than monopolar power by a factor ∼ (ωb/c)² for similar frequencies and amplitudes of motion. However, to emit dipole radiation, momentum must be given to and removed from the fluid. In other words the fluid must be forced by a solid body. In the absence of such a solid body, the lowest multipole that can be radiated effectively is quadrupolar radiation, which is weaker by yet one more factor of (ωb/c)².

These considerations are important to understanding how noise is produced by the intense turbulence created by jet engines, especially close to airports. We expect that the sound emitted by the free turbulence in the wake just behind the engine will be quadrupolar and will be dominated by emission from the largest (and hence fastest) turbulent eddies. [See the discussion of turbulent eddies in Sec. 14.3.4.] Denote by ℓ and vℓ the size and turnover speed of these largest eddies. Then the characteristic size of the sound's source will be a ∼ b ∼ ℓ, the mass discharge will be q ∼ ρℓ²vℓ, the characteristic frequency will be ω ∼ vℓ/ℓ, the reduced wavelength of the sound waves will be λ̄ = c/ω ∼ ℓc/vℓ, and the slow-motion parameter will be b/λ̄ ∼ ωb/c ∼ vℓ/c. The quadrupolar power radiated per unit volume [Eq. (15.63) divided by the volume ℓ³ of an eddy and reduced by ∼ (bω/c)⁴] will therefore be

$$ \frac{d\mathcal{P}}{d^3x} \sim \rho\,\frac{v_\ell^3}{\ell}\left(\frac{v_\ell}{c}\right)^5 , \qquad (15.65) $$

and this power will be concentrated around frequency ω ∼ vℓ/ℓ. For air of fixed sound speed and length scale, and for which the largest eddy speed is proportional to some characteristic speed V (e.g. the average speed of the air leaving the engine), the sound generation increases proportional to the eighth power of the Mach number M = V/c.
This is known as Lighthill’s law. The implications for the design of jet engines should be obvious.
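The eighth-power dependence is easy to quantify. The sketch below simply evaluates the order-of-magnitude estimate (15.65), so its absolute numbers carry only order-of-magnitude meaning; the ratio between two eddy speeds at fixed ρ, ℓ and c is the robust part. The eddy speeds and size are illustrative assumptions.

```python
import math


def quadrupole_power_density(rho, v_eddy, ell, c):
    """Order-of-magnitude Eq. (15.65): dP/d^3x ~ rho (v^3/ell) (v/c)^5."""
    return rho * v_eddy**3 / ell * (v_eddy / c) ** 5


def lighthill_db_change(speed_ratio):
    """Change in sound level (dB) when the eddy speed is scaled by speed_ratio
    at fixed rho, ell and c, so that the radiated power scales as speed_ratio**8."""
    return 10.0 * math.log10(speed_ratio**8)


# Illustrative: eddy speeds 150 and 300 m/s in air (rho = 1.2, c = 340), eddy size 0.5 m.
p1 = quadrupole_power_density(1.2, 150.0, 0.5, 340.0)
p2 = quadrupole_power_density(1.2, 300.0, 0.5, 340.0)
print(p2 / p1, lighthill_db_change(2.0))  # factor 2**8 = 256, i.e. about +24 dB
```

Doubling the exhaust speed thus raises the radiated sound power by a factor of 256, about 24 dB on the scale of Eq. (15.56).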

15.5.3

T2 Radiation Reaction, Runaway Solutions, and Matched Asymptotic Expansions⁴

Let us return to our idealized example of sound waves produced by a radially oscillating, spherical ball. We shall use this example to illustrate several deep issues in theoretical physics: the radiation-reaction force that acts back on a source due to its emission of radiation, a spurious runaway solution to the source's equation of motion caused by the radiation-reaction force, and matched asymptotic expansions, a mathematical technique for solving field equations when there are two different regions of space in which the equations have rather different behaviors. We shall meet these concepts again, in a rather more complicated way, in Chap. 26, when studying the radiation reaction force caused by emission of gravitational waves. For our oscillating ball, the two different regions of space that we shall match to each other are the near zone, r ≪ λ̄, and the wave zone, r ≳ λ̄.

We consider, first, the near zone, and we redo, from a new point of view, the analysis of the matching of the near-zone fluid velocity to the ball's surface velocity and the computation of the pressure perturbation. Because the region near the ball is small compared to λ̄ and the fluid speeds are small compared to c, the flow is very nearly incompressible, ∇·v = ∇²ψ = 0; cf. the discussion of conditions for incompressibility in Sec. 12.6. [The near-zone equation ∇²ψ = 0 is analogous to ∇²Φ = 0 for the Newtonian gravitational potential in the weak-gravity near zone of a gravitational-wave source (Chap. 26).] The general monopolar (spherical) solution to ∇²ψ = 0 is

$$ \psi = \frac{A(t)}{r} + B(t) . \qquad (15.66) $$

Matching the fluid's radial velocity v = ∂ψ/∂r = −A/r² at r = a to the ball's radial velocity ξ̇, we obtain

$$ A(t) = -a^2\dot\xi(t) . \qquad (15.67) $$

From the point of view of near-zone physics there is no mechanism for generating a nonzero spatially constant term B(t) in ψ [Eq. (15.66)], so if one were unaware of the emitted waves and their action back on the source, one would be inclined to set this B(t) to zero. [This is analogous to a Newtonian physicist who would be inclined to write the quadrupolar contribution to an axisymmetric source's external gravitational field in the form Φ = P2(cos θ)[A(t)r⁻³ + B(t)r²] and then, being unaware of gravitational waves and their action back on the source, would set B(t) to zero; see Chap. 26.] Taking this near-zone point of view, with B = 0, we infer that the fluid's pressure perturbation acting on the ball's surface is

$$ \delta P = -\rho\frac{\partial\psi(a,t)}{\partial t} = -\rho\frac{\dot A}{a} = \rho a\ddot\xi . \qquad (15.68) $$

The motion ξ(t) of the ball's surface is controlled by the elastic restoring forces in its interior and the fluid pressure perturbation δP on its surface. In the absence of δP the surface would oscillate sinusoidally with some angular frequency ωo, so ξ̈ + ωo²ξ = 0. The pressure will modify this to

$$ m(\ddot\xi + \omega_o^2\xi) = -4\pi a^2\,\delta P , \qquad (15.69) $$

where m is an effective mass, roughly equal to the ball's true mass, and the right hand side is the integral of the radial component of the pressure perturbation force over the sphere's surface. Inserting the near-zone viewpoint's pressure perturbation (15.68), we obtain

$$ (m + 4\pi a^3\rho)\ddot\xi + m\omega_o^2\xi = 0 . \qquad (15.70) $$

⁴ Our treatment is based on Burke (1970).

Evidently, the fluid increases the ball's effective inertial mass (it loads additional mass onto the ball), and thereby reduces its frequency of oscillation to

$$ \omega = \frac{\omega_o}{\sqrt{1+\kappa}} , \quad \text{where} \quad \kappa = \frac{4\pi a^3\rho}{m} \qquad (15.71) $$

is a measure of the coupling strength between the ball and the fluid. In terms of this loaded frequency the equation of motion becomes

$$ \ddot\xi + \omega^2\xi = 0 . \qquad (15.72) $$
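The size of the mass-loading effect in Eq. (15.71) depends strongly on the fluid's density. A minimal sketch, using an illustrative ball of radius 5 cm and effective mass 1 kg (assumed values, not from the text), compares immersion in water and in air:

```python
import math


def loaded_frequency(omega0, a, rho_fluid, m):
    """Loaded angular frequency omega = omega0 / sqrt(1 + kappa), kappa = 4 pi a^3 rho / m (Eq. 15.71)."""
    kappa = 4 * math.pi * a**3 * rho_fluid / m
    return omega0 / math.sqrt(1.0 + kappa), kappa


# Illustrative ball: radius 5 cm, effective mass 1 kg, unloaded frequency 1e4 rad/s.
for rho_fluid in (1000.0, 1.2):  # water, then air
    omega, kappa = loaded_frequency(omega0=1.0e4, a=0.05, rho_fluid=rho_fluid, m=1.0)
    print(f"rho = {rho_fluid}: kappa = {kappa:.4f}, omega/omega0 = {omega / 1.0e4:.4f}")
```

In water κ ≈ 1.57 and the frequency drops to about 0.62ωo; in air the shift is only about 0.1 percent.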

This near-zone viewpoint is not quite correct, just as the standard Newtonian viewpoint is not quite correct for the near-zone gravity of a gravitational-wave source (Chap. 26). To improve on this viewpoint, we temporarily move out into the wave zone and identify the general, outgoing-wave solution to the sound wave equation,

$$ \psi = \frac{f(t - \epsilon r/c)}{r} \qquad (15.73) $$

[Eq. (15.59)]. Here f is a function to be determined by matching to the near zone, and ǫ is a parameter that has been inserted to trace the influence of the outgoing-wave boundary condition. For outgoing waves (the real, physical, situation), ǫ = +1; if the waves were ingoing, we would have ǫ = −1. This wave-zone solution remains valid down into the near zone. In the near zone we can perform a slow-motion expansion to bring it into the same form as the near-zone velocity potential (15.66):

$$ \psi = \frac{f(t)}{r} - \epsilon\frac{\dot f(t)}{c} + \dots . \qquad (15.74) $$

The second term is sensitive to whether the waves are outgoing or ingoing and thus must ultimately be responsible for the radiation reaction force that acts back on the oscillating ball; for this reason we will call it the radiation-reaction potential. Equating the first term of this ψ to the first term of (15.66) and using the value (15.67) of A(t) obtained by matching the fluid velocity to the ball velocity, we obtain

$$ f(t) = A(t) = -a^2\dot\xi(t) . \qquad (15.75) $$

This equation tells us that the wave field f(t − r/c)/r generated by the ball's surface displacement ξ(t) is given by ψ = −a²ξ̇(t − r/c)/r [Eq. (15.61)], the result we derived more quickly in the previous section. We can regard Eq. (15.75) as matching the near-zone solution outward onto the wave-zone solution to determine the wave field as a function of the source's motion. Equating the second term of Eq. (15.74) to the second term of the near-zone velocity potential (15.66) we obtain

$$ B(t) = -\epsilon\frac{\dot f(t)}{c} = \epsilon\frac{a^2}{c}\ddot\xi(t) . \qquad (15.76) $$

This is the term in the near-zone velocity potential ψ = A/r + B that will be responsible for radiation reaction. We can regard this radiation reaction potential ψ^RR = B(t) as having been generated by matching the wave zone's outgoing (or ingoing) wave field back into the near zone. This pair of matchings, outward then inward, is a special, almost trivial example of the technique of matched asymptotic expansions, a technique developed by applied mathematicians to deal with much more complicated matching problems than this one (see, e.g., Cole 1968).

The radiation-reaction potential ψ^RR = B(t) = ǫ(a²/c)ξ̈(t) gives rise to a radiation-reaction contribution to the pressure on the ball's surface, δP^RR = −ρψ̇^RR = −ǫ(ρa²/c)ξ⃛. Inserting this into the equation of motion (15.69) along with the loading pressure (15.68) and performing the same algebra as before, we get the following radiation-reaction-modified form of Eq. (15.72):

$$ \ddot\xi + \omega^2\xi = \epsilon\tau\dddot\xi , \quad \text{where} \quad \tau = \frac{\kappa}{1+\kappa}\,\frac{a}{c} \qquad (15.77) $$

is less than the fluid's sound travel time to cross the ball's radius, a/c. The term ǫτξ⃛ in the equation of motion is the ball's radiation-reaction acceleration, as we see from the fact that it would change sign if we switched from outgoing waves, ǫ = +1, to ingoing waves, ǫ = −1. In the absence of radiation reaction, the ball's surface oscillates sinusoidally in time, ξ = e^{±iωt}. The radiation reaction term produces a weak damping of these oscillations:

$$ \xi \propto e^{\pm i\omega t}e^{-\sigma t} , \quad \text{where} \quad \sigma = \tfrac12\epsilon(\omega\tau)\omega \qquad (15.78) $$

is the radiation-reaction-induced damping rate. Note that in order of magnitude the ratio of the damping rate to the oscillation frequency is σ/ω ∼ ωτ ≲ ωa/c = a/λ̄, which is small compared to unity by virtue of the slow-motion assumption. If the waves were ingoing rather than outgoing, ǫ = −1, the ball's oscillations would grow. In either case, outgoing waves or ingoing waves, the radiation reaction force removes energy from the ball or adds it at the same rate as the sound waves carry energy off or bring it in. The total energy, wave plus ball, is conserved.

Expression (15.78) comprises two linearly independent solutions to the equation of motion (15.77), one with the sign + and the other with −. Since this equation of motion has been made third order by the radiation-reaction term, there must be a third independent solution. It is easy to see that, up to a tiny fractional correction, that third solution is

$$ \xi \propto e^{\epsilon t/\tau} . \qquad (15.79) $$

For outgoing waves, ǫ = +1, this solution grows exponentially in time, on an extremely rapid timescale τ ≲ a/c; it is called a runaway solution. Such runaway solutions are ubiquitous in equations of motion with radiation reaction. For example, a computation of the electromagnetic radiation reaction on a small, classical, electrically charged, spherical particle gives the Abraham-Lorentz equation of motion

$$ m(\ddot{\mathbf x} - \tau\dddot{\mathbf x}) = \mathbf F_{\rm ext} \qquad (15.80) $$

(Rohrlich 1965; Sec. 16.2 of Jackson 1999). Here x(t) is the particle's world line, Fext is the external force that causes the particle to accelerate, and the particle's inertial mass m includes an electrostatic contribution analogous to 4πa³ρ in our fluid problem. The timescale τ, like that in our fluid problem, is very short, and when the external force is absent, there is a runaway solution x ∝ e^{t/τ}. Much human heat and confusion were generated, in the early and mid 20th century, over these runaway solutions (see, e.g., Rohrlich 1965). For our simple model problem, little heat or confusion need be expended. One can easily verify that the runaway solution (15.79) violates the slow-motion assumption a/λ̄ ≪ 1 that underlies our derivation of the radiation reaction acceleration. It therefore is a spurious solution.
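The three solutions of the radiation-reaction equation (15.77) can be found explicitly by substituting ξ ∝ e^{st}, which gives the cubic ǫτs³ − s² − ω² = 0. The sketch below locates its roots with Newton's method, started from the undamped values s = ±iω and from s = ǫ/τ; the values of ω and τ are illustrative, chosen so that ωτ ≪ 1. It recovers the weak damping rate σ ≈ ½ǫ(ωτ)ω of Eq. (15.78) and the runaway root s ≈ ǫ/τ of Eq. (15.79).

```python
def radiation_reaction_roots(omega, tau, eps=1, iterations=50):
    """Roots s of eps*tau*s**3 - s**2 - omega**2 = 0, i.e. xi ∝ exp(s t) in Eq. (15.77)."""

    def p(s):
        return eps * tau * s**3 - s**2 - omega**2

    def dp(s):
        return 3 * eps * tau * s**2 - 2 * s

    roots = []
    for s in (1j * omega, -1j * omega, complex(eps / tau)):  # starting guesses
        for _ in range(iterations):
            step = p(s) / dp(s)
            s -= step
            if abs(step) <= 1e-15 * abs(s):
                break
        roots.append(s)
    return roots


omega, tau = 1.0, 1e-2  # slow motion: omega * tau = 0.01
for eps in (+1, -1):    # outgoing, then ingoing waves
    s_plus, s_minus, s_run = radiation_reaction_roots(omega, tau, eps)
    sigma_formula = 0.5 * eps * omega * tau * omega
    print(eps, -s_plus.real, sigma_formula, s_run.real)
```

For outgoing waves the oscillatory roots have real part −σ ≈ −0.005 while the third root sits near +1/τ = 100, the runaway; for ingoing waves the signs reverse.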

Our model problem is sufficiently simple that one can dig deeper into it and learn that the runaway solution arises from the slow-motion approximation trying to reproduce a genuine, rapidly damped solution and getting the sign of the damping wrong (Ex. 15.14 and Burke 1970).

****************************

EXERCISES

Exercise 15.12 Problem: Aerodynamic Sound Generation
Consider the emission of quadrupolar sound waves by a Kolmogorov spectrum of free turbulence (Sec. 14.3.4). Show that the power radiated per unit frequency interval has a spectrum Pω ∝ ω^{−7/2}. Also show that the total power radiated is roughly a fraction M⁵ of the power dissipated in the turbulence, where M is the Mach number.

Exercise 15.13 Problem: Energy Conservation for Radially Oscillating Ball Plus Sound Waves
For the radially oscillating ball as analyzed in Sec. 15.5.3, verify that the radiation reaction acceleration removes energy from the ball, plus the fluid loaded onto it, at the same rate as the sound waves carry energy away.

Exercise 15.14 Problem: Radiation Reaction Without the Slow-Motion Approximation
Redo the computation of radiation reaction for a radially oscillating ball immersed in a fluid, without imposing the slow-motion assumption and approximation. Thereby obtain the following coupled equations for the radial displacement ξ(t) of the ball's surface and the function Φ(t) ≡ a⁻²f(t − ǫa/c), where ψ = r⁻¹f(t − ǫr/c) is the sound-wave field:

$$ \ddot\xi + \omega_o^2\xi = \kappa\dot\Phi , \qquad \dot\xi = -\Phi - \epsilon\frac{a}{c}\dot\Phi . \qquad (15.81) $$

Show that in the slow-motion regime, these equations of motion have two weakly damped solutions of the same form (15.78) as we derived using the slow-motion approximation, and one rapidly damped solution ξ ∝ exp(−εκ/τ). Burke (1970) shows that the runaway solution (15.79) obtained using the slow-motion approximation is caused by that approximation's futile attempt to reproduce this genuine, rapidly damped solution of Eqs. (15.81).

Exercise 15.15 Problem: Sound Waves from a Ball Undergoing Quadrupolar Oscillations
Repeat the analysis of sound-wave emission, radiation reaction, and energy conservation, as given in Sec. 15.5.3 and Ex. 15.13, for axisymmetric, quadrupolar oscillations of an elastic ball, r_ball = a + ξ(t)P₂(cos θ). Comment: Since the lowest multipolar order for gravitational waves is quadrupolar, this exercise is closer to the analogous problem of gravitational wave emission than the monopolar analysis in the text.

Hint: If ω is the frequency of the ball's oscillations, then the sound waves have the form

$$\psi = K\,\Re\left[\frac{n_2(\omega r/c) - i\epsilon\, j_2(\omega r/c)}{r}\, e^{-i\omega t}\right] , \quad (15.82)$$

where K is a constant, ℜ(X) is the real part of X, ε is +1 for outgoing waves and −1 for ingoing waves, and j₂ and n₂ are the spherical Bessel and spherical Neumann functions of order 2. In the distant wave zone, x ≡ ωr/c ≫ 1,

$$n_2(x) - i\epsilon\, j_2(x) = \frac{e^{i\epsilon x}}{x} ; \quad (15.83)$$

in the near zone, x = ωr/c ≪ 1,

$$n_2(x) = -\frac{3}{x^3}\left(1 \,\&\, x^2 \,\&\, x^4 \,\&\, \ldots\right) , \qquad j_2(x) = \frac{x^2}{15}\left(1 \,\&\, x^2 \,\&\, x^4 \,\&\, \ldots\right) . \quad (15.84)$$

Here "& xⁿ" means "+ (some constant) xⁿ".
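The near- and far-zone forms in the hint can be checked numerically. The Python sketch below assumes the standard closed forms j₂(x) = (3/x³ − 1/x) sin x − 3 cos x/x² and n₂(x) = −(3/x³ − 1/x) cos x − 3 sin x/x², with the sign convention for n₂ that matches n₂ ≈ −3/x³ in (15.84); the helper names are ours, not the text's.

```python
import cmath
import math


def j2(x):
    """Spherical Bessel function of order 2 (closed form)."""
    return (3.0 / x**3 - 1.0 / x) * math.sin(x) - 3.0 * math.cos(x) / x**2


def n2(x):
    """Spherical Neumann function of order 2 (closed form, near-zone form -3/x^3)."""
    return -(3.0 / x**3 - 1.0 / x) * math.cos(x) - 3.0 * math.sin(x) / x**2


def far_zone_error(x, eps=1):
    """x * |[n2(x) - i eps j2(x)] - exp(i eps x)/x|; this is O(1/x) for x >> 1 (Eq. 15.83)."""
    exact = n2(x) - 1j * eps * j2(x)
    approx = cmath.exp(1j * eps * x) / x
    return abs(exact - approx) * x


def near_zone_ratios(x):
    """Ratios of n2 and j2 to their leading near-zone forms in Eq. (15.84); both -> 1 as x -> 0."""
    return n2(x) / (-3.0 / x**3), j2(x) / (x**2 / 15.0)


if __name__ == "__main__":
    print("far zone, x = 1000:", far_zone_error(1000.0))
    print("near zone, x = 0.01:", near_zone_ratios(0.01))
```

Expanding the closed forms shows the far-zone discrepancy is bounded by roughly 3/x, so it shrinks steadily as x grows, while the near-zone ratios approach 1 with corrections of order x².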

****************************
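As a numerical companion to Exercise 15.14, the Python sketch below looks for solutions ξ, Φ ∝ e^{st} of Eqs. (15.81). Substituting that form (our own algebra, not quoted from the text) gives the cubic ε(a/c)s³ + (1 + κ)s² + ε(a/c)ω_o²s + ω_o² = 0. The function `ball_fluid_modes` (a hypothetical name) finds its real root by Newton's method and the remaining pair by deflation. The parameter values (ω_o = 1, κ = 0.5, a/c = 10⁻³, outgoing waves ε = +1) are illustrative assumptions chosen so that ω_o a/c ≪ 1.

```python
import cmath


def ball_fluid_modes(omega_o, kappa, a_over_c, eps=1, newton_steps=60):
    """Exponents s of solutions xi, Phi ~ exp(s t) of Eqs. (15.81).

    Substituting exp(s t) into (15.81) gives the cubic (our own algebra):
        eps*(a/c)*s**3 + (1 + kappa)*s**2 + eps*(a/c)*omega_o**2*s + omega_o**2 = 0
    Returns (s_fast, (s_slow_1, s_slow_2)).
    """
    c3 = eps * a_over_c
    c2 = 1.0 + kappa
    c1 = eps * a_over_c * omega_o**2
    c0 = omega_o**2

    def p(s):
        return ((c3 * s + c2) * s + c1) * s + c0

    def dp(s):
        return (3.0 * c3 * s + 2.0 * c2) * s + c1

    # Newton's method for the real root, started from its slow-motion
    # estimate s ~ -(1 + kappa) / (eps * a/c).
    s_fast = -c2 / c3
    for _ in range(newton_steps):
        s_fast -= p(s_fast) / dp(s_fast)

    # Synthetic division by (s - s_fast) leaves q2*s**2 + q1*s + q0.
    q2 = c3
    q1 = c2 + s_fast * q2
    q0 = c1 + s_fast * q1
    root = cmath.sqrt(q1 * q1 - 4.0 * q2 * q0)
    s_slow = ((-q1 + root) / (2.0 * q2), (-q1 - root) / (2.0 * q2))
    return s_fast, s_slow


if __name__ == "__main__":
    # Hypothetical slow-motion parameters: omega_o * a / c = 1e-3 << 1.
    fast, slow = ball_fluid_modes(omega_o=1.0, kappa=0.5, a_over_c=1e-3)
    print("rapidly damped root:", fast)
    for s in slow:
        print("weakly damped root:", s)
```

For these values it should return one large negative real root near −(1 + κ)c/a (the rapidly damped solution) and a complex pair near ±iω_o/(1 + κ)^{1/2} whose small negative real part is close to −(a/c)κω_o²/[2(1 + κ)²], a first-order perturbative estimate of our own: these are the two weakly damped oscillations.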

15.6 T2 Convection

In this last section of Chap. 15, we turn our attention to fluid motions driven by thermal effects (see the overview in Sec. 15.1). As a foundation, we begin by discussing heat transport via thermal diffusion:

15.6.1 T2 Heat Conduction

We know experimentally that heat flows in response to a temperature gradient. When the temperature differences are small on the scale of the mean free path of the heat-conducting particles (as, in practice, will almost always be the case), we can expand the heat flux as a Taylor series in the temperature gradient, F_heat = (constant) + (a term linear in ∇T) + (a term quadratic in ∇T) + .... Now, the constant term must vanish; otherwise there would be heat conduction in the absence of a temperature gradient, and this would contradict the second law of thermodynamics. The first contributing term is thus the linear term, and we stop with it, just as we do for Hooke's law of elasticity and Ohm's law of electrical conductivity. Here, as in elasticity and electromagnetism, we must be on the lookout for special circumstances when the linear approximation becomes invalid and be prepared to modify our description accordingly. This rarely happens in fluid dynamics, so in this chapter we shall ignore higher-order terms and write

$$\mathbf F_{\rm heat} = -\kappa\nabla T , \quad (15.85)$$

where the constant κ is known as the coefficient of thermal conductivity, or just the thermal conductivity; cf. Secs. 2.7 and 12.7.3, where we have discussed it previously. In general κ will be a tensor, as it describes a linear relation between two vectors, F_heat and ∇T. However, when the fluid is isotropic (as it is for the kinds of fluids we have treated thus far), κ is just a scalar. We shall confine ourselves to this case in the present chapter; but in Chap. 18, when describing a plasma as a fluid, we shall find that a magnetic field can make the plasma's transport properties decidedly anisotropic, so the thermal conductivity is tensorial. In this section we shall discuss how heat conduction is incorporated into the fundamental equations of fluid dynamics, focusing on the underlying physics and on approximations, in contrast with our earlier, quite formal analysis in Sec. 12.7.3. Heat conduction can be incorporated most readily via the conservation laws for momentum and energy. On the molecular scale, the diffusing heat shows up as an anisotropic term N₁ in the momentum distributions N(p) = N₀ + N₁ of particles (molecules, atoms, electrons, photons, ...); cf., e.g., Eqs. (2.74a) and (2.74g). This anisotropic term is tiny in magnitude compared to the isotropic term N₀, which has already been included via u (internal energy per unit mass) and P (pressure) in our densities and fluxes of momentum and energy. The only place that the molecular-scale anisotropic term is of any quantitative consequence, macroscopically, is in the energy flux; so heat conductivity modifies only the energy flux and not the energy density or the momentum density or flux.⁵ Correspondingly, the law of momentum conservation (the Navier-Stokes equation) is left unchanged, while energy conservation is altered. We explored the modifications of energy conservation, formally, in Sec. 12.7.3. As we discussed there, energy conservation is most powerfully expressed in the language of entropy. For an inviscid fluid with vanishing heat conductivity, energy conservation is equivalent to the constancy of the entropy per unit mass s moving with the fluid, ds/dt = 0.
When viscosity and heat conductivity are turned on, the entropy per unit mass of a moving fluid element changes as follows:

$$\rho T\,\frac{ds}{dt} = \kappa\nabla^2 T + 2\eta\,\boldsymbol\sigma:\boldsymbol\sigma + \zeta\theta^2 . \quad (15.86)$$

[This version of the entropy evolution law is deduced in Ex. 15.16 from one of the versions we wrote down in Chap. 12, Eq. (12.71).] This equation is most easily understood physically by rewriting it as

$$\rho T\,\frac{ds}{dt} + \nabla\cdot(-\kappa\nabla T) = 2\eta\,\boldsymbol\sigma:\boldsymbol\sigma + \zeta\theta^2 .$$

The left side is the rate of change of thermal energy per unit volume in the fluid's rest frame: the term ρT ds/dt (recall that T ds is the heat dQ injected into a unit mass) plus the divergence of the diffusive heat flux −κ∇T. The right side is the rate of viscous heating per unit volume. For a viscous, heat-conducting fluid moving in an external gravitational field, the governing equations are the standard law of mass conservation (12.25) or (12.27), the standard Navier-Stokes equation (12.65), the first law of thermodynamics [Eq. (2) or (3) of Box 12.2], and the law of entropy evolution (15.86). This set of equations is far too complicated to solve, except via massive numerical simulations, unless some strong simplification is imposed. We must therefore introduce approximations.

⁵ This is only true non-relativistically. In relativistic fluid dynamics, it remains true in the fluid's rest frame; but in frames where the fluid moves at high speed, the diffusive energy flux gets Lorentz transformed into heat-flow contributions to energy density, momentum density, and momentum flux.

Our first approximation (already implicit in the above equations) is that the thermal conductivity κ is constant; for most real applications this is close to true, and no significant physical effects are missed by assuming it. Our second approximation, which does somewhat limit the type of problem we can address, is that the fluid motions are very slow: slow enough that the squares of the shear and expansion (which are quadratic in the fluid speed) are negligibly small, so we can ignore viscous dissipation. This permits us to rewrite the entropy evolution equation (15.86) as the law of energy conservation in the fluid's rest frame,

$$\rho T\,\frac{ds}{dt} = \kappa\nabla^2 T . \quad (15.87)$$

We can convert this entropy evolution equation into an evolution equation for temperature by expressing the changes ds/dt of entropy per unit mass in terms of changes dT/dt of temperature. The usual way to do this is to note that T ds (the amount of heat deposited in a unit mass of fluid) is given by C dT, where C is the fluid's specific heat. However, the specific heat depends on what one holds fixed during the energy deposition: the fluid element's volume or its pressure. As we have assumed that the fluid motions are very slow, the fractional pressure fluctuations will be correspondingly small. (This does not preclude significant temperature fluctuations, provided that they are compensated by density fluctuations of opposite sign. However, if there are temperature fluctuations, then these will tend to equalize through thermal conduction in such a way that the pressure does not change significantly.) Therefore, the relevant specific heat for a slowly moving fluid is the one at constant pressure, C_P, and we must write T ds = C_P dT.⁶ Eq. (15.87) then becomes a linear partial differential equation for the temperature,

$$\frac{dT}{dt} \equiv \frac{\partial T}{\partial t} + \mathbf v\cdot\nabla T = \chi\nabla^2 T , \quad (15.88)$$

where

$$\chi = \frac{\kappa}{\rho C_P} \quad (15.89)$$

is known as the thermal diffusivity, and we have again taken the easiest route in treating C_P and ρ as constant. When the fluid moves so slowly that the advective term v·∇T is negligible, Eq. (15.88) says that the heat simply diffuses through the fluid, with the thermal diffusivity being the diffusion coefficient for the temperature. The diffusive transport of heat by thermal conduction is similar to the diffusive transport of vorticity by viscous stress [Eq. (13.3)], and the thermal diffusivity χ is the direct analog of the kinematic viscosity ν. This motivates us to introduce a new dimensionless number known as the Prandtl number, which measures the relative importance of viscosity and heat conduction (in the sense of their relative abilities to produce a diffusion of vorticity and of heat):

$$\mathrm{Pr} = \frac{\nu}{\chi} . \quad (15.90)$$

⁶ See, e.g., Turner 1973 for a more formal justification of the use of the specific heat at constant pressure rather than constant volume.
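In the limit where advection is negligible, Eq. (15.88) reduces to ∂T/∂t = χ∇²T. The Python sketch below integrates its one-dimensional form with forward-time, centred-space (FTCS) finite differences, a standard explicit scheme that is stable for χΔt/Δx² ≤ 1/2, and compares the result with the exact decay exp(−χk²t) of a single sine mode. The slab width (10 cm), χ = 10⁻⁵ m² s⁻¹ (the order of magnitude for air in Table 15.1), and the grid are illustrative choices, not values from the text.

```python
import math


def diffuse_sine_mode(chi=1e-5, length=0.1, n=101, t_end=100.0):
    """Integrate dT/dt = chi * d2T/dx2 (Eq. 15.88 with v = 0) on 0 <= x <= length.

    Starts from T = sin(pi x / length) with T = 0 held at both ends and returns
    (numerical, exact) temperatures at the midpoint after time t_end.
    """
    dx = length / (n - 1)
    # FTCS is stable for r = chi*dt/dx^2 <= 1/2; aim for r ~ 0.4.
    steps = math.ceil(t_end / (0.4 * dx * dx / chi))
    dt = t_end / steps
    r = chi * dt / (dx * dx)
    T = [math.sin(math.pi * i * dx / length) for i in range(n)]
    T[0] = T[-1] = 0.0
    for _ in range(steps):
        interior = [T[i] + r * (T[i + 1] - 2.0 * T[i] + T[i - 1]) for i in range(1, n - 1)]
        T = [0.0] + interior + [0.0]
    numerical = T[(n - 1) // 2]
    exact = math.exp(-chi * (math.pi / length) ** 2 * t_end)
    return numerical, exact


if __name__ == "__main__":
    num, exact = diffuse_sine_mode()
    print(f"FTCS midpoint: {num:.6f}   exact exp(-chi k^2 t): {exact:.6f}")
```

With these values the mode decays on the diffusion time 1/(χk²) = L²/(π²χ) ≈ 100 s, and the numerical and exact midpoint temperatures should agree to better than 10⁻³.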

For gases, both ν and χ are given to order of magnitude by the product of the mean molecular speed and the mean free path, and so Prandtl numbers are typically of order unity. (For air, Pr ∼ 0.7.) By contrast, in liquid metals the free electrons carry heat very efficiently compared with the transport of momentum (and vorticity) by diffusing ions, and so their Prandtl numbers are small. This is one reason liquid sodium is used as a coolant in some nuclear power reactors. At the other end of the spectrum, water is a relatively poor thermal conductor with Pr ∼ 6, and Prandtl numbers for oils, which are quite viscous and poor conductors, measure in the thousands. Other Prandtl numbers are given in Table 15.1.

Fluid | ν (m² s⁻¹) | χ (m² s⁻¹) | Pr
Earth's mantle | 10¹⁷ | 10⁻⁶ | 10²³
Solar interior | 10⁻² | 10² | 10⁻⁴
Atmosphere | 10⁻⁵ | 10⁻⁵ | 1
Ocean | 10⁻⁶ | 10⁻⁷ | 10

Table 15.1: Order-of-magnitude estimates for kinematic viscosity ν, thermal diffusivity χ, and Prandtl number Pr = ν/χ for earth, fire, air and water.
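As a quick consistency check on Table 15.1, the Python snippet below recomputes Pr = ν/χ from the tabulated ν and χ (transcribed by hand; only the orders of magnitude are meaningful).

```python
import math

# Order-of-magnitude values transcribed from Table 15.1: (nu, chi) in m^2 s^-1.
TABLE_15_1 = {
    "Earth's mantle": (1e17, 1e-6),
    "Solar interior": (1e-2, 1e2),
    "Atmosphere": (1e-5, 1e-5),
    "Ocean": (1e-6, 1e-7),
}


def prandtl(nu, chi):
    """Prandtl number Pr = nu / chi, Eq. (15.90)."""
    return nu / chi


if __name__ == "__main__":
    for fluid, (nu, chi) in TABLE_15_1.items():
        exponent = round(math.log10(prandtl(nu, chi)))
        print(f"{fluid:<15} Pr ~ 10^{exponent}")
```

It reproduces the Pr column: 10²³, 10⁻⁴, 1 and 10.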

One might think that, when the Prandtl number is small (so χ is large compared to ν), one should necessarily include heat flow in the fluid equations. Not so. One must distinguish the flow from the fluid. In some low-Prandtl-number flows, the heat conduction is so effective that the fluid becomes essentially isothermal, and buoyancy effects are minimized. Conversely, in some large-Prandtl-number flows the large viscous stress reduces the velocity gradient so that slow, thermally driven circulation takes place and thermal effects are very important. In general, the kinematic viscosity is of direct importance in controlling the transport of momentum, and hence in establishing the velocity field, whereas heat conduction enters only indirectly (Sec. 15.6.2 below). We must therefore examine each flow on its individual merits. There is another dimensionless number that is commonly introduced when discussing thermal effects: the Péclet number. It is defined, by analogy with the Reynolds number, by

$$\mathrm{Pe} = \frac{VL}{\chi} , \quad (15.91)$$

where L is a characteristic length scale of the flow and V is a characteristic speed. The Péclet number measures the relative importance of advection and heat conduction.

****************************
EXERCISES

Exercise 15.16 Derivation: Equations for Entropy Evolution
By combining the law of energy increase (12.71) (which is equivalent to energy conservation) with the law of mass conservation, derive Eq. (15.86) for the evolution of the fluid's entropy.

Exercise 15.17 Example: Poiseuille Flow with a uniform temperature gradient
A nuclear reactor is cooled with liquid sodium, which flows through a set of pipes from the reactor to a remote heat exchanger, where the heat's energy is used to generate electricity. Unfortunately, some heat will be lost through the walls of the pipe before it reaches the heat exchanger, and this will reduce the reactor's efficiency. In this question, we determine what fraction of the heat is lost through the pipe walls. Consider the flow of the sodium through one of the pipes, and assume that the Reynolds number is modest so the flow is steady and laminar. Then the fluid velocity will have the parabolic Poiseuille profile

$$v = 2\bar v\left(1 - \frac{\varpi^2}{R^2}\right) \quad (15.92)$$

[Eq. (12.76) and associated discussion]. Here R is the pipe's inner radius, ϖ is the cylindrical radial coordinate measured from the axis of the pipe, and v̄ is the mean speed along the pipe. Suppose that the pipe has length L ≫ R from the reactor to the heat exchanger, and is thermally very well insulated so its inner wall is at nearly the same temperature as the core of the fluid. Then the total temperature drop ΔT down the length L will be ΔT ≪ T, and the temperature gradient will be constant, so the temperature distribution in the pipe has the form

$$T = T_0 - \Delta T\,\frac{z}{L} + f(\varpi) . \quad (15.93)$$

(a) Use Eq. (15.87) to show that

$$f = \frac{\bar v R^2\,\Delta T}{2\chi L}\left(\frac{3}{4} - \frac{\varpi^2}{R^2} + \frac{1}{4}\frac{\varpi^4}{R^4}\right) . \quad (15.94)$$

(b) Derive an expression for the conductive heat flux through the walls of the pipe and show that the ratio of the heat escaping through the walls to that convected by the fluid is ΔT/T. (Ignore the influence of the temperature gradient on the velocity field, and treat the thermal diffusivity and specific heat as constant throughout the flow.) (c) Consider a nuclear reactor in which 10 kW of power has to be transported through a pipe carrying liquid sodium. If the reactor temperature is ∼1000 K and the exterior temperature is room temperature, estimate the flow of liquid sodium necessary to achieve the required transport of heat.

Exercise 15.18 Problem: Thermal Boundary Layers
In Sec. 13.4, we introduced the notion of a laminar boundary layer by analyzing flow past a thin plate. Now suppose that this same plate is maintained at a different temperature from the free flow. A thermal boundary layer will be formed, in addition to the viscous boundary layer, which we presume to be laminar. These two boundary layers will both extend outward from the wall but will (usually) have different thicknesses. (a) Explain why their relative thicknesses depend on the Prandtl number.

(b) Using Eq. (15.88), show that in order of magnitude the thickness of the thermal boundary layer, δ_T, is given by v(δ_T) δ_T² = ℓχ, where v(δ_T) is the fluid velocity parallel to the plate at the outer edge of the thermal boundary layer and ℓ is the distance downstream from the leading edge. Let V be the free-stream fluid velocity and ΔT be the temperature difference between the plate and the body of the flow. (c) Estimate δ_T in the limits of large and small Prandtl numbers. (d) What will be the boundary layer's temperature profile when the Prandtl number is exactly unity?

****************************
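Part (a) of Exercise 15.17 can be checked numerically. In steady state Eq. (15.88) becomes v ∂T/∂z = χ∇²T; with T from (15.93), ∂T/∂z = −ΔT/L and ∇²T reduces to the cylindrical radial Laplacian of f. The Python sketch below evaluates that Laplacian by central differences for the profile (15.94) with the Poiseuille velocity (15.92). The pipe parameters (R = 1 cm, v̄ = 1 m s⁻¹, ΔT = 10 K, χ = 10⁻⁵ m² s⁻¹, L = 10 m) are hypothetical and chosen only for illustration.

```python
def f_profile(w, R, vbar, dT, chi, L):
    """Radial temperature correction f(varpi) of Eq. (15.94)."""
    u2 = (w / R) ** 2
    return vbar * R**2 * dT / (2.0 * chi * L) * (0.75 - u2 + 0.25 * u2**2)


def residual(w, R=0.01, vbar=1.0, dT=10.0, chi=1e-5, L=10.0, h=1e-6):
    """v dT/dz - chi * lap(T) at radius w, for T = T0 - dT*z/L + f(w).

    Vanishes (up to finite-difference error) if Eq. (15.94) solves the steady
    form of Eq. (15.88), v * dT/dz = chi * lap(T), with v from Eq. (15.92).
    """
    v = 2.0 * vbar * (1.0 - (w / R) ** 2)   # Poiseuille profile, Eq. (15.92)
    advection = v * (-dT / L)                 # dT/dz = -dT/L from Eq. (15.93)

    def f(x):
        return f_profile(x, R, vbar, dT, chi, L)

    # Cylindrical radial Laplacian f'' + f'/w by central differences.
    d1 = (f(w + h) - f(w - h)) / (2.0 * h)
    d2 = (f(w + h) - 2.0 * f(w) + f(w - h)) / (h * h)
    return advection - chi * (d2 + d1 / w)


if __name__ == "__main__":
    for w in (0.001, 0.005, 0.009):
        print(f"w = {w:.3f} m   residual = {residual(w):.2e} K/s")
```

The residual should be tiny compared with the advective term (of order 1 K s⁻¹ here); note also that f(R) = 0, so the wall sits at T₀ − ΔT z/L. This is a numerical consistency check, not a substitute for the derivation the exercise asks for.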

15.6.2 T2 Boussinesq Approximation

When the heat fluxes are sufficiently small, we can use Eq. (15.88) to solve for the temperature distribution in a given velocity field, ignoring the feedback of the thermal effects onto the velocity. However, if we imagine increasing the flow's temperature differences so the heat fluxes also increase, at some point thermal feedback effects will begin to influence the velocity significantly. The first feedback effect to occur is typically buoyancy: the tendency of the hotter (and hence lower-density) fluid to rise in a gravitational field and the colder (and hence denser) fluid to descend.⁷ In this section, we shall describe the effects of buoyancy as simply as possible. The minimal approach, which is adequate surprisingly often, is called the Boussinesq approximation. It can be used to describe many laboratory flows and atmospheric flows, and some geophysical flows. The flows for which the Boussinesq approximation is appropriate are those in which the fractional density changes are small (|Δρ| ≪ ρ). By contrast, the velocity can undergo large changes. However, as the density changes following a fluid element must be small, ρ⁻¹ dρ/dt ≃ 0, we can approximate the equation of continuity (mass conservation) dρ/dt + ρ∇·v = 0 by the "incompressibility" relation

$$\nabla\cdot\mathbf v = 0 . \qquad \text{Boussinesq (1)} \quad (15.95)$$

This does not mean that the density is constant. Rather, it means that the sole significant cause of density changes is thermal expansion. In discussing thermal expansion, it is convenient to introduce a reference density ρ₀ and reference temperature T₀, equal to some mean of the density and temperature in the region of fluid that one is studying. We shall denote by

$$\tau \equiv T - T_0 \quad (15.96)$$

the perturbation of the temperature away from its reference value. The thermally perturbed density can then be written as

$$\rho = \rho_0(1 - \alpha\tau) , \quad (15.97)$$

where α is the thermal expansion coefficient for volume⁸ [evaluated at constant pressure for the same reason as C_P was at constant pressure in the paragraph following Eq. (15.87)]:

$$\alpha = -\left(\frac{\partial \ln\rho}{\partial T}\right)_P . \quad (15.98)$$

Equation (15.97) enables us to eliminate density perturbations as an explicit variable and replace them by temperature perturbations. Turn, now, to the Navier-Stokes equation (12.66) in a uniform external gravitational field:

$$\frac{d\mathbf v}{dt} = -\frac{\nabla P}{\rho} + \mathbf g + \nu\nabla^2\mathbf v . \quad (15.99)$$

⁷ This effect is put to good use in a domestic "gravity-fed" warm-air circulation system. The furnace generally resides in the basement, not the attic!

We expand the pressure-gradient term as

$$-\frac{\nabla P}{\rho} \simeq -\frac{\nabla P}{\rho_0}\,(1 + \alpha\tau) , \quad (15.100)$$

and, as in our analysis of rotating flows [Eq. (13.53)], we introduce an effective pressure designed to compensate for the first-order effects of the uniform gravitational field:

$$P' = P + \rho_0\Phi = P - \rho_0\,\mathbf g\cdot\mathbf x . \quad (15.101)$$

(Notice that P′ measures the amount the pressure differs from the value it would have in supporting a hydrostatic atmosphere of the fluid at the reference density.) The Navier-Stokes equation (15.99) then becomes

$$\frac{d\mathbf v}{dt} = -\frac{\nabla P'}{\rho_0} - \alpha\tau\,\mathbf g + \nu\nabla^2\mathbf v , \qquad \text{Boussinesq (2)} \quad (15.102)$$

dropping the small term O(αP′). In words, a fluid element accelerates in response to a buoyancy force, which is the sum of the first and second terms on the right-hand side of Eq. (15.102), and a viscous force. In order to solve this equation we must be able to solve for the temperature perturbation τ. This evolves according to the standard equation of heat diffusion, Eq. (15.88):

$$\frac{d\tau}{dt} = \chi\nabla^2\tau . \qquad \text{Boussinesq (3)} \quad (15.103)$$

Equations (15.95), (15.102) and (15.103) are the equations of fluid flow in the Boussinesq approximation; they control the coupled evolution of the velocity v and the temperature perturbation τ. We shall now use them to discuss free convection in a laboratory apparatus.

⁸ Note that α is three times larger than the thermal expansion coefficient for the linear dimensions of the fluid.
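To make Eqs. (15.97) and (15.98) concrete, consider an ideal gas, an assumption made here purely for illustration (the Boussinesq equations do not require it). Then ρ = PM/(R_gas T), so Eq. (15.98) gives α = 1/T, the Charles'-law value used for room air later in this section. The Python sketch below evaluates (15.98) by a central difference and compares the exact density ratio with the linearized form (15.97). The pressure, molar mass, T₀ = 300 K and g = 9.8 m s⁻² are assumed values.

```python
import math

R_GAS = 8.314      # J mol^-1 K^-1
P_AIR = 1.013e5    # Pa, assumed sea-level pressure
M_AIR = 0.029      # kg mol^-1, assumed mean molar mass of air
G = 9.8            # m s^-2, assumed


def rho_ideal_gas(T, P=P_AIR, M=M_AIR):
    """Density of an ideal gas at pressure P and temperature T."""
    return P * M / (R_GAS * T)


def alpha_numeric(rho_of_T, T, dT=1e-3):
    """alpha = -(d ln rho / dT) at fixed P, Eq. (15.98), via a central difference."""
    return -(math.log(rho_of_T(T + dT)) - math.log(rho_of_T(T - dT))) / (2.0 * dT)


if __name__ == "__main__":
    T0 = 300.0
    tau = 1.0  # temperature perturbation T - T0, Eq. (15.96), in K
    alpha = alpha_numeric(rho_ideal_gas, T0)
    print("alpha =", alpha, "  1/T0 =", 1.0 / T0)
    print("exact rho/rho0 =", rho_ideal_gas(T0 + tau) / rho_ideal_gas(T0))
    print("Eq. (15.97) 1 - alpha*tau =", 1.0 - alpha * tau)
    print("buoyancy acceleration alpha*g*tau =", alpha * G * tau, "m s^-2")
```

For τ = 1 K the exact and linearized density ratios differ only at the 10⁻⁵ level, which illustrates why the small-|Δρ|/ρ assumption behind the Boussinesq approximation is comfortable for laboratory temperature differences.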

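[Figure 15.11: fluid layer of thickness d between two horizontal plates, the upper at temperature T₀ − ΔT/2 and the lower at T₀ + ΔT/2; x is horizontal and z vertical.]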

Fig. 15.11: Rayleigh-Bénard convection. A fluid is confined between two horizontal surfaces separated by a vertical distance d. When the temperature difference between the two plates ∆T is increased sufficiently, the fluid will start to convect heat vertically. The reference effective pressure P0′ and reference temperature T0 are the values of P ′ and T measured at the midplane z = 0.

15.6.3 T2 Rayleigh-Bénard Convection

In a relatively simple laboratory experiment to demonstrate convection, a fluid is confined between two rigid plates a distance d apart, each maintained at a fixed temperature, with the upper plate cooler than the lower by ΔT. When ΔT is small, viscous stresses, together with the no-slip boundary conditions at the plates, inhibit circulation; so, despite the upward buoyancy force on the hotter, less-dense fluid near the bottom plate, the fluid remains stably at rest with heat being conducted diffusively upward. If the plates' temperature difference ΔT is gradually increased, the buoyancy becomes gradually stronger. At some critical ΔT it will overcome the restraining viscous forces, and the fluid will start to circulate (convect) between the two plates. Our goal is to determine the critical temperature difference ΔT_crit for the onset of convection. We now make some physical arguments to simplify the calculation of ΔT_crit. From our experience with earlier instability calculations, especially those involving elastic bifurcations (Secs. 10.8 and 11.3.5), we anticipate that for ΔT < ΔT_crit the response of the equilibrium to small perturbations will be oscillatory (i.e., will have positive squared eigenfrequency ω²), while for ΔT > ΔT_crit, perturbations will grow exponentially (i.e., will have negative ω²). Correspondingly, at ΔT = ΔT_crit, ω² for some mode will be zero. This zero-frequency mode will mark the bifurcation of equilibria from one with no fluid motions to one with slow, convective motions. We shall search for ΔT_crit by searching for a solution to the Boussinesq equations (15.95), (15.102) and (15.103) that represents this zero-frequency mode. In those equations we shall choose for the reference temperature T₀, density ρ₀ and effective pressure P₀′ the values at the midplane between the plates, z = 0; cf. Fig. 15.11.
The unperturbed equilibrium, when ΔT = ΔT_crit, is a solution of the Boussinesq equations (15.95), (15.102) and (15.103) with vanishing velocity, a time-independent vertical temperature gradient dT/dz = −ΔT/d, and a compensating, time-independent, vertical pressure gradient:

$$\mathbf v = 0 , \qquad \tau = T - T_0 = -\frac{\Delta T}{d}\,z , \qquad P' = P_0' - g\rho_0\alpha\,\frac{\Delta T}{d}\,\frac{z^2}{2} . \quad (15.104)$$

When the zero-frequency mode is present, the velocity v will be nonzero, and the temperature and effective pressure will have additional perturbations δτ and δP′:

$$\mathbf v \neq 0 , \qquad \tau = T - T_0 = -\frac{\Delta T}{d}\,z + \delta\tau , \qquad P' = P_0' - g\rho_0\alpha\,\frac{\Delta T}{d}\,\frac{z^2}{2} + \delta P' . \quad (15.105)$$

The perturbations v, δτ and δP′ are governed by the Boussinesq equations and the boundary conditions v = 0 (no-slip) and δτ = 0 at the plates, z = ±d/2. We shall manipulate these in such a way as to get a partial differential equation for the scalar temperature perturbation δτ by itself, decoupled from the velocity and the pressure perturbation. Consider, first, the result of inserting expressions (15.105) into the Boussinesq-approximated Navier-Stokes equation (15.102). Because the perturbation mode has zero frequency, ∂v/∂t vanishes; and because v is extremely small, we can neglect the quadratic advective term v·∇v, thereby bringing Eq. (15.102) into the form

$$\frac{\nabla\delta P'}{\rho_0} = \nu\nabla^2\mathbf v - \alpha\,\delta\tau\,\mathbf g . \quad (15.106)$$

We want to eliminate δP′ from this equation. The other Boussinesq equations are of no help for this, since δP′ is absent from them. When we dealt with sound waves, we eliminated δP using the equation of state P = P(ρ, T); but in the present analysis our Boussinesq approximation insists that the only significant changes of density are those due to thermal expansion, i.e., it neglects the influence of pressure on density, so the equation of state cannot help us. Lacking any other way to eliminate δP′, we employ a very common trick: we take the curl of Eq. (15.106). As the curl of a gradient vanishes, δP′ drops out. We then take the curl one more time and use the fact that ∇·v = 0 to obtain

$$\nu\nabla^2(\nabla^2\mathbf v) = \alpha\,\mathbf g\,\nabla^2\delta\tau - \alpha(\mathbf g\cdot\nabla)\nabla\delta\tau . \quad (15.107)$$

Turn, next, to the Boussinesq version of the equation of heat transport, Eq. (15.103). Inserting into it Eqs. (15.105) for τ and v, setting ∂δτ/∂t to zero because our perturbation has zero frequency, linearizing in the perturbation, and using g = −g e_z, we obtain

$$v_z\,\frac{\Delta T}{d} = -\chi\nabla^2\delta\tau . \quad (15.108)$$

This is an equation for the vertical velocity v_z in terms of the temperature perturbation δτ. By inserting this v_z into the z component of Eq. (15.107), we achieve our goal of a scalar equation for δτ alone:

$$\nu\chi\,\nabla^2\nabla^2\nabla^2\delta\tau = \frac{\alpha g\,\Delta T}{d}\left(\frac{\partial^2\delta\tau}{\partial x^2} + \frac{\partial^2\delta\tau}{\partial y^2}\right) . \quad (15.109)$$

This is a sixth-order differential equation, even more formidable than the fourth-order equations that arise in the elasticity calculations of Chaps. 10 and 11. We now see how prudent it was to make simplifying assumptions at the outset!

The differential equation (15.109) is, however, linear, and we can seek solutions using separation of variables. As the equilibrium is unbounded horizontally, we look for a single horizontal Fourier component with some wave number k; i.e., we seek a solution of the form

$$\delta\tau \propto e^{ikx} f(z) , \quad (15.110)$$

where f(z) is some unknown function. Such a δτ will be accompanied by motions v in the x and z directions (i.e., v_y = 0) that also have the form v_j ∝ e^{ikx} f_j(z) for some other functions f_j(z). The ansatz (15.110) converts the partial differential equation (15.109) into the single ordinary differential equation

$$\left(\frac{d^2}{dz^2} - k^2\right)^3 f + \frac{\mathrm{Ra}\,k^2}{d^4}\,f = 0 , \quad (15.111)$$

where we have introduced yet another dimensionless number,

$$\mathrm{Ra} = \frac{\alpha g\,\Delta T\,d^3}{\nu\chi} , \quad (15.112)$$

called the Rayleigh number. By virtue of the relation (15.108) between v_z and δτ, the Rayleigh number is a measure of the ratio of the strength of the buoyancy term −αδτ g to the viscous term ν∇²v in the Boussinesq version (15.102) of the Navier-Stokes equation:

$$\mathrm{Ra} \sim \frac{\text{buoyancy force}}{\text{viscous force}} . \quad (15.113)$$
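The Rayleigh number (15.112) is easy to evaluate. Using the room-air values quoted later in this section (α ≈ 0.003 K⁻¹, ν ≈ χ ≈ 10⁻⁵ m² s⁻¹) and assuming g = 9.8 m s⁻², the Python sketch below reproduces the estimate Ra ≈ 3 × 10⁸ for ΔT = 1 K and d = 1 m, and inverts (15.112) for the temperature difference at which Ra reaches the rigid-plate critical value of about 1700 quoted later in the section.

```python
def rayleigh(alpha, g, delta_T, d, nu, chi):
    """Rayleigh number Ra = alpha g DeltaT d^3 / (nu chi), Eq. (15.112)."""
    return alpha * g * delta_T * d**3 / (nu * chi)


if __name__ == "__main__":
    # Room-air values from the text; g = 9.8 m s^-2 is our assumption.
    ra = rayleigh(alpha=3e-3, g=9.8, delta_T=1.0, d=1.0, nu=1e-5, chi=1e-5)
    print(f"Ra ~ {ra:.2e}")
    # Ra is linear in DeltaT, so the DeltaT giving Ra = 1700 (rigid plates) for d = 1 m is:
    print(f"DeltaT_crit ~ {1700.0 / ra:.1e} K")
```

Because Ra scales as d³, halving the layer depth lowers Ra eightfold; even so, a metre-scale layer of air needs only microkelvin temperature differences to pass the critical value.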

The general solution of Eq. (15.111) is an arbitrary linear combination of three sine functions and three cosine functions:

$$f = \sum_{n=1}^{3}\left[A_n\cos(\mu_n k z) + B_n\sin(\mu_n k z)\right] , \quad (15.114)$$

where the dimensionless numbers µ_n are given by

$$\mu_n = \left[\left(\frac{\mathrm{Ra}}{k^4 d^4}\right)^{1/3} e^{2\pi n i/3} - 1\right]^{1/2} ; \qquad n = 1, 2, 3 , \quad (15.115)$$

which involves the three cube roots of unity, e^{2πni/3}. The values of five of the coefficients A_n, B_n are fixed in terms of the sixth (an overall arbitrary amplitude) by five boundary conditions at the bounding plates, and a sixth boundary condition then determines the critical temperature difference ΔT_crit (or equivalently, the critical Rayleigh number Ra_crit) at which convection sets in. The six boundary conditions are: (i) the requirement that the fluid temperature be the same as the plate temperature at each plate, so δτ = 0 at z = ±d/2; (ii) the no-slip boundary condition v_z = 0 at each plate, which, by virtue of Eq. (15.108) and δτ = 0 at the plates, translates into δτ,zz = 0 at z = ±d/2 (where the indices after the comma denote partial derivatives); (iii) the no-slip boundary condition v_x = 0, which by virtue of incompressibility ∇·v = 0 implies v_z,z = 0 at the plates, which in turn by Eq. (15.108) implies δτ,zzz + δτ,xxz = 0 at z = ±d/2. It is straightforward but computationally complex to impose these six boundary conditions and from them deduce the critical Rayleigh number for onset of convection; see Pellew and Southwell (1940). Rather than present the nasty details, we shall switch to a toy problem in which the boundary conditions are adjusted to give a simpler solution, but one with the same qualitative features as for the real problem. Specifically, we shall replace the no-slip condition (iii) (v_x = 0 at the plates) by a condition of no shear, (iii′) v_x,z = 0 at the plates. By virtue of incompressibility ∇·v = 0, the x derivative of this translates into v_z,zz = 0, which by Eq. (15.108) translates to δτ,zzxx + δτ,zzzz = 0. To recapitulate, we seek a solution of the form (15.114), (15.115) that satisfies the boundary conditions (i), (ii), (iii′). The terms in Eq. (15.114) with n = 1, 2 always have complex arguments and thus always have z dependences that are products of hyperbolic and trigonometric functions with real arguments. For n = 3 and large enough Rayleigh number (Ra > k⁴d⁴), µ₃ is real and positive and the solutions are pure sines and cosines. Let us just consider the n = 3 terms alone, in this regime, and impose boundary condition (i), that δτ = 0 at the plates. The cosine term by itself,

$$\delta\tau = \text{constant}\times\cos(\mu_3 k z)\,e^{ikx} , \quad (15.116)$$

satisfies this, if we set

$$\frac{\mu_3 k d}{2} \equiv \left[\left(\frac{\mathrm{Ra}}{k^4 d^4}\right)^{1/3} - 1\right]^{1/2}\frac{kd}{2} = \left(m + \frac{1}{2}\right)\pi , \quad (15.117)$$

where m is an integer. It is straightforward to show, remarkably, that Eqs. (15.116), (15.117) also satisfy boundary conditions (ii) and (iii′), so they solve the toy version of our problem: every even z derivative of cos(µ₃kz) is proportional to cos(µ₃kz) itself, which vanishes at the plates. As ΔT is gradually increased from zero, the Rayleigh number Ra gradually grows, passing one after another through the sequence of values (15.117) with m = 0, 1, 2, ... (for any chosen k). At each of these values there is a zero-frequency, circulatory mode of fluid motion with horizontal wave number k, which is passing from stability to instability. The first of these, m = 0, represents the onset of circulation for the chosen k, and the Rayleigh number at this onset [Eq. (15.117) with m = 0] is

$$\mathrm{Ra} = \frac{(k^2 d^2 + \pi^2)^3}{k^2 d^2} . \quad (15.118)$$
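The minimum of the toy marginal-stability curve (15.118) can be located numerically. Setting d Ra/d(k²d²) = 0 gives k²d² = π²/2, hence Ra_crit = 27π⁴/4 ≈ 657.5 and k_crit d = π/√2 ≈ 2.22. The Python sketch below checks this with a golden-section search over kd (our choice of minimizer; any one-dimensional method would do) and, as a cross-check, evaluates µ₃ from (15.115) with n = 3 to confirm that µ₃kd = π on the m = 0 curve.

```python
import math


def ra_marginal(kd):
    """Toy marginal-stability curve, Eq. (15.118): Ra = (k^2 d^2 + pi^2)^3 / (k^2 d^2)."""
    q = kd * kd
    return (q + math.pi**2) ** 3 / q


def golden_min(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):   # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:             # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def toy_critical_point():
    """Return (k_crit d, Ra_crit, mu_3 k d) for the toy boundary conditions."""
    kd = golden_min(ra_marginal, 0.1, 10.0)
    ra = ra_marginal(kd)
    # mu_3 from Eq. (15.115) with n = 3 (e^{2 pi i} = 1); on the m = 0 curve mu_3 k d = pi.
    mu3 = math.sqrt((ra / kd**4) ** (1.0 / 3.0) - 1.0)
    return kd, ra, mu3 * kd


if __name__ == "__main__":
    kd, ra, mu3kd = toy_critical_point()
    print(f"k_crit d = {kd:.6f}   (pi/sqrt(2) = {math.pi / math.sqrt(2):.6f})")
    print(f"Ra_crit  = {ra:.4f}   (27 pi^4 / 4 = {27 * math.pi**4 / 4:.4f})")
    print(f"mu_3 k d = {mu3kd:.6f}   (pi = {math.pi:.6f})")
```

Golden-section search is enough here because Ra(kd) has a single minimum on the bracket [0.1, 10]: it decreases for k²d² < π²/2 and increases beyond.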

This Ra(k) relation is plotted as a thick curve in Fig. 15.12. Notice in Fig. 15.12 that there is a critical Rayleigh number Ra_crit below which all modes are stable, independent of their wave numbers, and above which modes in some range k_min < k < k_max are unstable. From Eq. (15.118) we deduce that, for our toy problem, Ra_crit = 27π⁴/4 ≃ 660. When one imposes the correct boundary conditions (i), (ii), (iii) [instead of our toy choice (i), (ii), (iii′)] and works through the nasty details of the computation, one obtains a Ra(k) relation that looks qualitatively the same as Fig. 15.12, and one deduces that convection should set in at Ra_crit ≃ 1700, which agrees reasonably well with experiment. One can carry out the same computation with the fluid's upper surface free to move (e.g., by placing air rather than a solid plate at z = d/2). Such a computation predicts that convection begins at Ra_crit ≃ 1100, though in practice surface tension is usually important and its effect must be included.

[Figure 15.12: k versus Ra. The marginal-stability curve (ω = 0) separates a stable region (ω real) from an unstable region (ω imaginary); k_crit and Ra_crit are marked on the axes.]

Fig. 15.12: Horizontal wave number k of the first mode to go unstable, as a function of Rayleigh number, Ra. Along the solid curve the mode has zero frequency; to the left of the curve it is stable, to the right it is unstable. Ra_crit is the minimum Rayleigh number for convective instability.

Fig. 15.13: Hexagonal convection cells in Rayleigh-Bénard convection. The fluid, which is visualized using aluminum powder, rises at the centers of the hexagons and falls around the edges.

One feature of these critical Rayleigh numbers is very striking. Because the Rayleigh number is an estimate of the ratio of buoyancy forces to viscous forces [Eq. (15.113)], an order-of-magnitude analysis suggests that convection should set in at Ra ∼ 1, which is wrong by three orders of magnitude! This provides a vivid reminder that order-of-magnitude estimates can be quite inaccurate. In this case, the main reason for the discrepancy is that the convective onset is governed by a sixth-order differential equation (15.109), and thus is very sensitive to the length scale d used in the order-of-magnitude analysis. If we choose d/π rather than d as the length scale, then an order-of-magnitude estimate could give Ra ∼ π⁶ ∼ 1000, a much more satisfactory value. Once convection has set in, the unstable modes grow until viscosity and nonlinearities stabilize them, at which point they carry far more heat upward between the plates than does conduction. The convection's velocity pattern depends, in practice, on the manner in which

the heat is applied and the temperature dependence of the viscosity. For a limited range of Rayleigh numbers near Ra_crit, it is possible to excite a hexagonal pattern of convection cells as shown in Fig. 15.13. When the Rayleigh number becomes very large, the convection becomes fully turbulent and we must introduce an effective turbulent viscosity to replace the molecular viscosity (cf. Chap. 14). Free convection, like that in this laboratory experiment, also occurs in meteorological and geophysical flows. For example, for air in a room, the relevant parameter values are α = 1/T ∼ 0.003 K⁻¹ (Charles' law) and ν ∼ χ ∼ 10⁻⁵ m² s⁻¹, so the Rayleigh number is Ra ∼ 3 × 10⁸ (ΔT/1 K)(d/1 m)³. Convection in a room thus occurs extremely readily, even for small temperature differences. In fact, so many modes of convective motion can be excited that heat-driven air flow is invariably turbulent. It is therefore common in everyday situations to describe heat transport using a phenomenological turbulent thermal conductivity (cf. Sec. 14.3). A further example is given in Box 15.3.

****************************
EXERCISES

Exercise 15.19 Problem: Critical Rayleigh Number
Estimate the temperature to which pans of oil (ν ∼ 10⁻⁵ m² s⁻¹, Pr ∼ 3000), water (ν ∼ 10⁻⁶ m² s⁻¹, Pr ∼ 6) and mercury (ν ∼ 10⁻⁷ m² s⁻¹, Pr ∼ 0.02) would have to be heated in order to convect. Assume that the upper surface is at room temperature. Do not perform this experiment with mercury!

Exercise 15.20 Problem: Width of Thermal Plume
Consider a two-dimensional thermal plume transporting heat away from a hot knife edge. Introduce a temperature deficit ΔT(z) measuring the typical difference in temperature between the plume and the surrounding fluid at height z above the knife edge, and let δ_p(z) be the width of the plume at height z. (a) Show that energy conservation implies the constancy of δ_p ΔT v̄_z, where v̄_z(z) is the plume's mean vertical speed at height z.
(b) Make an estimate of the buoyancy acceleration and use this to estimate $\bar v_z$. (c) Use Eq. (15.103) to relate the width of the plume to the speed. Hence, show that the width of the plume scales as $\delta_p \propto z^{2/5}$ and the temperature deficit as $\Delta T \propto z^{-3/5}$. (d) Repeat this exercise for a three-dimensional plume above a hot spot.

****************************
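The room-air estimate of the Rayleigh number given earlier in this section is easy to check. The short Python sketch below evaluates $\mathrm{Ra} = \alpha g\,\Delta T\,d^3/(\nu\chi)$ with the order-of-magnitude values quoted for air, then uses the linearity of Ra in $\Delta T$ to find the temperature difference across a 1 m room at which Ra reaches $\mathrm{Ra}_{\rm crit} \simeq 1700$. The value $g = 9.8\ \mathrm{m\,s^{-2}}$ and the helper name `rayleigh_number` are illustrative assumptions; the script also prints $\pi^6$ for comparison with the order-of-magnitude argument based on the length scale $d/\pi$.

```python
import math

def rayleigh_number(alpha, g, delta_T, d, nu, chi):
    """Rayleigh number Ra = alpha * g * delta_T * d**3 / (nu * chi)."""
    return alpha * g * delta_T * d**3 / (nu * chi)

# Order-of-magnitude values for room air quoted in the text
alpha_air = 0.003          # K^-1, alpha = 1/T (Charles' law)
nu_air = 1e-5              # kinematic viscosity, m^2 s^-1
chi_air = 1e-5             # thermal diffusivity, m^2 s^-1
g = 9.8                    # m s^-2 (assumed; not quoted in the text)

Ra_room = rayleigh_number(alpha_air, g, delta_T=1.0, d=1.0, nu=nu_air, chi=chi_air)

# Ra is linear in delta_T, so the onset temperature difference for d = 1 m is:
Ra_crit = 1700.0           # rigid upper and lower boundaries
delta_T_onset = Ra_crit / Ra_room   # kelvin, since Ra_room was evaluated at 1 K

print(f"Ra(dT = 1 K, d = 1 m) = {Ra_room:.2e}")
print(f"onset dT for d = 1 m  = {delta_T_onset:.1e} K")
print(f"pi**6                 = {math.pi ** 6:.0f}")
```

With these inputs the onset temperature difference is only a few microkelvin, which is why convection in a room is effectively unavoidable.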

Box 15.3 Mantle Convection and Continental Drift
As is now well known, the continents drift over the surface of the globe on a timescale of roughly a hundred million years. Despite the clear geographical evidence that the continents fit together, geophysicists were, for a long while, skeptical that this occurred because they were unable to identify the forces responsible for overcoming the visco-elastic resilience of the crust. It is now known that these motions are in fact driven by slow convective circulation of the mantle, powered by internally generated heat from the radioactive decay of unstable isotopes, principally uranium, thorium and potassium. When the heat is generated within the convective layer, rather than passively transported from below, we must modify our definition of the Rayleigh number. Let the heat generated per unit mass per unit time be $Q$. In the analog of our laboratory analysis, where the fluid is assumed marginally unstable to convective motions, this $Q$ will generate a heat flux $\sim \rho Q d$, which must be carried diffusively. Equating this flux to $\kappa\,\Delta T/d$, we can solve for the temperature difference $\Delta T$ between the lower and upper edges of the convective mantle: $\Delta T \sim \rho Q d^2/\kappa$. Inserting this $\Delta T$ into Eq. (15.112), we obtain a modified expression for the Rayleigh number

$$\mathrm{Ra}' = \frac{\alpha\rho g Q d^5}{\kappa\chi\nu} . \tag{1}$$

Let us now estimate the value of $\mathrm{Ra}'$ for the earth's mantle. The mantle's kinematic viscosity can be measured by post-glacial rebound studies (cf. Ex. 13.5) to be $\sim 10^{17}\ \mathrm{m^2\,s^{-1}}$. We can use the rate of attenuation of diurnal and annual temperature variations with depth in surface rock to estimate a thermal diffusivity $\chi \sim 10^{-6}\ \mathrm{m^2\,s^{-1}}$. Direct experiment furnishes an expansion coefficient $\alpha \sim 3\times10^{-5}\ \mathrm{K^{-1}}$. The thickness of the upper mantle is roughly 700 km and the rock density is about $4000\ \mathrm{kg\,m^{-3}}$. The rate of heat generation can be estimated both by chemical analysis and by direct measurement at the earth's surface, and turns out to be $Q \sim 10^{-11}\ \mathrm{W\,kg^{-1}}$. Combining these quantities, we obtain an estimated Rayleigh number $\mathrm{Ra}' \sim 10^6$, well in excess of the critical value for convection under free-slip conditions, which evaluates to 868 (Turcotte & Schubert 1982). For this reason, it is now believed that continental drift is driven primarily by mantle convection.
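Eq. (1) of Box 15.3 needs the thermal conductivity $\kappa$, which the box does not quote. The Python sketch below assumes a rock specific heat $C_p \sim 10^3\ \mathrm{J\,kg^{-1}\,K^{-1}}$ (an assumption, not a value from the text) and takes $\kappa = \rho C_p\chi$; together with an assumed $g = 9.8\ \mathrm{m\,s^{-2}}$ and the box's other values, it reproduces the order of magnitude of $\mathrm{Ra}'$ and compares it with the free-slip critical value 868. The function name `mantle_rayleigh` is illustrative.

```python
def mantle_rayleigh(alpha, rho, g, Q, d, kappa, chi, nu):
    """Internally heated Rayleigh number Ra' = alpha rho g Q d^5 / (kappa chi nu)."""
    return alpha * rho * g * Q * d**5 / (kappa * chi * nu)

alpha = 3e-5          # thermal expansion coefficient, K^-1
rho = 4000.0          # rock density, kg m^-3
g = 9.8               # m s^-2 (assumed)
Q = 1e-11             # radiogenic heating rate, W kg^-1
d = 7e5               # upper-mantle thickness, m
chi = 1e-6            # thermal diffusivity, m^2 s^-1
nu = 1e17             # kinematic viscosity, m^2 s^-1
cp = 1e3              # specific heat, J kg^-1 K^-1 -- ASSUMED, not given in the box
kappa = rho * cp * chi                    # conductivity from chi = kappa / (rho cp)

Ra_prime = mantle_rayleigh(alpha, rho, g, Q, d, kappa, chi, nu)
Ra_crit_free_slip = 868.0
print(f"kappa ~ {kappa:.1f} W m^-1 K^-1")
print(f"Ra'   ~ {Ra_prime:.1e}, about {Ra_prime / Ra_crit_free_slip:.0f} times critical")
```

The result is a few times $10^6$, consistent with the box's estimate $\mathrm{Ra}' \sim 10^6$ and thousands of times the critical value.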

15.6.4

T2 Convection in Stars

The sun and other stars generate heat in their interiors by nuclear reactions. In most stars, the internal energy is predominantly in the form of hot hydrogen and helium ions and their electrons, while the thermal conductivity is due primarily to diffusing photons (Sec. 2.7), which have much longer mean free paths than the ions and electrons. When the photon mean free path becomes small due to high opacity (as happens in the outer 30 per cent of the sun; Fig. 15.14), the thermal conductivity goes down, so in order to transport the heat from nuclear burning, the star develops an increasingly steep temperature gradient. The star may then become convectively unstable and transport its energy far more efficiently by circulating its hot gas than it could have by photon diffusion. Describing this convection is a key step in understanding the interiors of the sun and other stars.

Fig. 15.14: A convection zone occupies the outer 30 per cent of a solar-type star. [Figure labels: Core, Convection Zone, Photosphere.]

A heuristic argument provides the basis for a surprisingly simple description of this convection. As a foundation for our argument, let us identify the relevant physics. First, the pressure within stars varies through many orders of magnitude; typically $10^{12}$ for the sun. Therefore, we cannot use the Boussinesq approximation; instead, as a fluid element rises or descends, we must allow for its density to change in response to large changes of the surrounding pressure. Second, the convection involves circulatory motions on such large scales that the attendant shears are small and viscosity is thus unimportant. Third, because the convection is driven by the ineffectiveness of conduction, we can idealize each fluid element as retaining its heat as it moves, so the flow is adiabatic. Fourth, except very close to the solar surface, the convection will usually be very subsonic, since subsonic motions are easily sufficient to transport the nuclear-generated heat. Our heuristic argument, then, focuses on convecting fluid blobs that move through the star's interior very subsonically, adiabatically, and without viscosity. As the motion is subsonic, each blob will remain in pressure equilibrium with its surroundings. Now, suppose we make a virtual interchange between two blobs at different heights (Fig. 15.15). The blob that rises (blob B in the figure) will experience a decreased pressure and thus will expand, so its density will diminish. If its density after rising is lower than that of its surroundings, then it will be buoyant and continue to rise. Conversely, if the risen blob is denser than its surroundings, then it will sink back to its original location. Therefore, a criterion for convective instability is that the risen blob has lower density than its surroundings.
Since the blob and its surroundings have the same pressure, and since at fixed pressure the entropy $s$ per unit mass of a gas is larger the lower its density (there being more phase space available to its particles), the fluid is convectively unstable if the risen blob has a higher entropy than its surroundings. Now, the blob's motion was adiabatic, so its entropy per unit mass $s$ is the same after it rises as before. Therefore, the fluid is convectively unstable if the entropy per unit mass $s$ at the location where the blob began (lower in the star) is greater than that at the location to which it rose (higher in the star); i.e., the star is convectively unstable if its entropy per unit mass decreases outward, $ds/dr < 0$. For small blobs, this instability will be counteracted by both viscosity and heat conduction; but for large blobs viscosity and conduction are ineffective, and the convection proceeds. When building stellar models, astrophysicists find it convenient to determine whether a region of a model is convectively unstable by computing what its structure would be without convection, i.e., with all its heat carried radiatively. That computation gives some temperature gradient $dT/dr$. If this computed $dT/dr$ is superadiabatic, i.e., if

$$-\frac{d\ln T}{d\ln r} > -\left(\frac{\partial\ln T}{\partial\ln P}\right)_s \frac{d\ln P}{d\ln r} \equiv -\left(\frac{d\ln T}{d\ln r}\right)_s , \tag{15.119}$$

then correspondingly the entropy $s$ decreases outward, and the star is convectively unstable. This is known as the Schwarzschild criterion for convection, since it was formulated by the same Karl Schwarzschild who discovered the Schwarzschild solution to Einstein's equations (which describes a nonrotating black hole; Chap. 25).

Fig. 15.15: Convectively unstable interchange of two blobs in a star. Blob B rises to the former position of blob A and expands adiabatically to match the surrounding pressure. The entropy per unit mass of the blob is higher than that of the surrounding gas and so the blob has a lower density. It will therefore be buoyant and continue to rise. Similarly, blob A will continue to sink. [Figure panels: Before and After; blobs A and B; gravity g.]

In practice, if the star is convective, then the convection is usually so efficient at transporting heat that the actual temperature gradient is only slightly superadiabatic; i.e., the entropy $s$ is nearly independent of radius: it decreases outward only very slightly. (Of course, the entropy can increase significantly outward in a convectively stable zone where radiative diffusion is adequate to transport heat.) We can demonstrate the efficiency of convection by estimating the convective heat flux when the temperature gradient is slightly superadiabatic, i.e., when $\Delta|\nabla T| \equiv |dT/dr| - |(dT/dr)_s|$ is slightly positive. As a tool in our estimate, we introduce the concept of the mixing length, denoted by $l$: the typical distance a blob travels before breaking up. As the blob is in pressure equilibrium, we can estimate its fractional density difference from its surroundings by $\delta\rho/\rho \sim \delta T/T \sim \Delta|\nabla T|\,l/T$. Invoking Archimedes' principle, we estimate the blob's acceleration to be $\sim g\,\delta\rho/\rho \sim g\,\Delta|\nabla T|\,l/T$ (where $g$ is the local acceleration of gravity), and hence the average speed with which a blob rises or sinks will be $\bar v \sim (g\,\Delta|\nabla T|/T)^{1/2}\,l$. The convective heat flux is then given by

$$F_{\rm conv} \sim C_P\,\rho\,\bar v\,l\,\Delta|\nabla T| \sim C_P\,\rho\,(g/T)^{1/2}\,(\Delta|\nabla T|)^{3/2}\,l^2 . \tag{15.120}$$
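Before Eq. (15.120) is refined, note that the Schwarzschild criterion (15.119) is simple to apply to a tabulated, purely radiative model. Dividing both sides of (15.119) by $-d\ln P/d\ln r > 0$ (pressure falls outward) turns it into $d\ln T/d\ln P > (\partial\ln T/\partial\ln P)_s$. The Python/NumPy sketch below applies that form zone by zone; the power-law profiles $T \propto P^n$ are hypothetical test cases, not a solar model, and $(\partial\ln T/\partial\ln P)_s = (\gamma-1)/\gamma = 0.4$ assumes an ideal gas with $\gamma = 5/3$. The function name `convectively_unstable` is illustrative.

```python
import numpy as np

def convectively_unstable(P, T, nabla_ad):
    """Schwarzschild test, zone by zone, for a profile whose pressure falls outward.

    Returns a boolean array: True where d ln T / d ln P exceeds the adiabatic
    value nabla_ad = (d ln T / d ln P)_s, i.e. where s decreases outward.
    """
    nabla = np.gradient(np.log(T), np.log(P))   # logarithmic gradient of the model
    return nabla > nabla_ad

gamma = 5.0 / 3.0
nabla_ad = (gamma - 1.0) / gamma                # 0.4 for an ideal monatomic gas

# Hypothetical power-law profiles T = T0 (P / P0)**n, pressure decreasing outward
P = np.logspace(12, 4, 50)                      # Pa
steep = convectively_unstable(P, 1e6 * (P / 1e12) ** 0.5, nabla_ad)
shallow = convectively_unstable(P, 1e6 * (P / 1e12) ** 0.3, nabla_ad)
print("n = 0.5 unstable everywhere:", bool(steep.all()))
print("n = 0.3 unstable anywhere:  ", bool(shallow.any()))
```

A profile with $n = 0.5$ is flagged as convectively unstable everywhere, while one with $n = 0.3$ is stable everywhere.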

We can bring Eq. (15.120) into a more useful form, accurate to within factors of order unity, by setting the mixing length equal to the pressure scale height, $l \sim H = |dr/d\ln P|$, as is usually the case in the outer parts of a star; setting $C_P \sim h/T$, where $h$ is the enthalpy per unit mass [cf. the first law of thermodynamics, Eq. (3) of Box 12.2]; setting $g = -(P/\rho)\,d\ln P/dr \sim c^2|d\ln P/dr|$ [cf. the equation of hydrostatic equilibrium (12.13) and Eq. (15.48d) for the speed of sound $c$]; and setting $|\nabla T| \equiv |dT/dr| \sim T\,|d\ln P/dr|$. The resulting expression for $F_{\rm conv}$ can then be inverted to give

$$\frac{\Delta|\nabla T|}{|\nabla T|} \sim \left(\frac{F_{\rm conv}}{h\rho c}\right)^{2/3} \sim \left(\frac{F_{\rm conv}}{\tfrac{5}{2}P\sqrt{k_B T/m_p}}\right)^{2/3} . \tag{15.121}$$

Here the last expression is obtained from the fact that the gas is fully ionized, so its enthalpy is $h = \frac{5}{2}P/\rho$ and its speed of sound is about the thermal speed of its protons (the most numerous massive particle), $c \sim \sqrt{k_B T/m_p}$ (with $k_B$ Boltzmann's constant and $m_p$ the proton rest mass). It is informative to apply this estimate to the convection zone of the sun (the outer $\sim 30$ per cent of its radius; Fig. 15.14). The luminosity of the sun is $\sim 4\times10^{26}$ W and its radius is $7\times10^5$ km, so its convective energy flux is $F_{\rm conv} \sim 10^8\ \mathrm{W\,m^{-2}}$. Consider, first, the convection zone's base. The pressure there is $P \sim 1$ TPa and the temperature is $T \sim 10^6$ K, so Eq. (15.121) predicts $\Delta|\nabla T|/|\nabla T| \sim 3\times10^{-6}$; i.e., the temperature gradient at the base of the convection zone need only be superadiabatic by a few parts in a million in order to carry the solar energy flux. By contrast, at the top of the convection zone (which is nearly at the solar surface), the gas pressure is only $\sim 10$ kPa and the sound speed is $\sim 10\ \mathrm{km\,s^{-1}}$, so $h\rho c \sim 10^8\ \mathrm{W\,m^{-2}}$ and $\Delta|\nabla T|/|\nabla T| \sim 1$; i.e., the temperature gradient must depart significantly from the adiabatic gradient in order to carry the heat. Moreover, the convective elements, in their struggle to carry the heat, move with a significant fraction of the sound speed, so it is no longer true that they are in pressure equilibrium with their surroundings. A more sophisticated theory of convection is therefore necessary near the solar surface. Convection is very important in some other types of stars. It is the primary means of heat transport in the cores of stars with high mass and high luminosity, and throughout very young stars before they start to burn their hydrogen in nuclear reactions.

****************************
EXERCISES

Exercise 15.21 Problem: Radiative Transport
The density and temperature in the interior of the sun are roughly $0.1\ \mathrm{kg\,m^{-3}}$ and $1.5\times10^7$ K. (a) Estimate the central gas pressure and radiation pressure and their ratio.
(b) The mean free path of the radiation is determined almost equally by Thomson scattering, bound-free absorption and free-free absorption. Estimate numerically the photon mean free path and hence estimate the photon escape time and the luminosity. How well do your estimates compare with the known values for the sun?

Exercise 15.22 Problem: Bubbles
Consider a small bubble of air rising slowly in a large expanse of water. If the bubble is large enough for surface tension to be ignored, then it will form an irregular cap of radius $r$. Show that the speed with which the bubble rises is roughly $(gr)^{1/2}$. (A more refined estimate gives a numerical coefficient of 2/3.)

****************************
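Eq. (15.121) can be evaluated directly at the two ends of the solar convection zone. The Python sketch below uses the rounded inputs quoted in Sec. 15.6.4 ($F_{\rm conv} \sim 10^8\ \mathrm{W\,m^{-2}}$; $P \sim 1$ TPa and $T \sim 10^6$ K at the base; $P \sim 10$ kPa and $c \sim 10\ \mathrm{km\,s^{-1}}$ at the top) together with CODATA values of $k_B$ and $m_p$; the function names are illustrative.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J K^-1
M_P = 1.67262192e-27   # proton rest mass, kg

def proton_thermal_speed(T):
    """Sound-speed estimate c ~ sqrt(k_B T / m_p) for a fully ionized gas."""
    return math.sqrt(K_B * T / M_P)

def superadiabatic_excess(F_conv, P, c):
    """Eq. (15.121): Delta|grad T| / |grad T| ~ (F_conv / (h rho c))**(2/3), with h rho = 2.5 P."""
    return (F_conv / (2.5 * P * c)) ** (2.0 / 3.0)

F_conv = 1e8   # W m^-2, the rounded solar flux quoted in the text

# Base of the convection zone: P ~ 1 TPa, T ~ 1e6 K
base = superadiabatic_excess(F_conv, P=1e12, c=proton_thermal_speed(1e6))
# Top of the convection zone: P ~ 10 kPa, c ~ 10 km/s
top = superadiabatic_excess(F_conv, P=1e4, c=1e4)

print(f"base: {base:.1e}   top: {top:.2f}")
```

With these inputs the base value comes out at a few parts in $10^7$, somewhat below the $3\times10^{-6}$ quoted in the text but equally tiny; the difference reflects the factors of order unity dropped in the estimate. At the top the ratio is about one half, i.e., of order unity.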

15.6.5

T2 Double Diffusion — Salt Fingers

Convection, as we have described it so far, is driven by the presence of an unbalanced buoyancy force in an equilibrium distribution of fluid. However, it can also arise as a higher-order effect even if the fluid is stably stratified, i.e., if the density gradient is in the same direction as gravity. An example is salt fingering, a rapid mixing that can occur when warm, salty water lies at rest above colder fresh water. The higher temperature of the upper fluid initially outweighs the added weight of its salt, making it more buoyant than the fresh water below. However, heat diffuses downward faster than salt, enabling a density inversion gradually to develop and the salt-rich fluid to begin a slow interchange with the salt-poor fluid below. It is possible to describe this instability using a local perturbation analysis. The setup is somewhat similar to the one we used in Sec. 15.6.3 to analyze Rayleigh-Bénard convection: We consider a stratified fluid in which there is a vertical gradient in the temperature, and as before, we measure its departure from a reference temperature $T_0$ at a midplane ($z = 0$) by $\tau \equiv T - T_0$. We presume that in equilibrium $\tau$ varies linearly with $z$, so $\nabla\tau = (d\tau/dz)\,\mathbf{e}_z$ is constant. Similarly, we characterize the salt concentration by $C \equiv$ (concentration) $-$ (equilibrium concentration at the midplane), and we assume that in equilibrium $C$, like $\tau$, varies linearly with height, so $\nabla C = (dC/dz)\,\mathbf{e}_z$ is constant. The density $\rho$ will be equal to the equilibrium density at the midplane plus corrections due to thermal expansion and due to salt concentration,

$$\rho = \rho_0 - \alpha\rho_0\tau + \beta\rho_0 C \tag{15.122}$$

[cf. Eq. (15.97)]. Here $\beta$ is a constant for concentration analogous to the thermal expansion coefficient $\alpha$ for temperature. In this problem, by contrast with Rayleigh-Bénard convection, it is easier to work directly with the pressure than with the modified pressure. In equilibrium, hydrostatic equilibrium dictates that its gradient be $\nabla P = \rho\mathbf{g}$, where $\mathbf{g}$ is the (downward-pointing) gravitational acceleration.
Now, let us perturb about these values and write down the linearized equations for the evolution of the perturbations. We shall denote the perturbation of temperature (relative to the reference temperature) by $\delta\tau$, of salt concentration by $\delta C$, of density by $\delta\rho$, of pressure by $\delta P$, and of velocity simply by $\mathbf{v}$, since the unperturbed state has $\mathbf{v} = 0$. We shall not ask about the onset of instability, but rather (because we expect our situation to be generically unstable) we shall seek a dispersion relation $\omega(\mathbf{k})$ for the perturbations. Correspondingly, in all our perturbation equations we shall replace $\partial/\partial t$ with $-i\omega$ and $\nabla$ with $i\mathbf{k}$, except for the equilibrium $\nabla C$ and $\nabla\tau$, which are constants. The first of our perturbation equations is the linearized Navier-Stokes equation

$$-i\omega\rho_0\mathbf{v} = -i\mathbf{k}\,\delta P + \mathbf{g}\,\delta\rho - \nu k^2\rho_0\mathbf{v} , \tag{15.123}$$

where we have kept the viscous term because we expect the Prandtl number to be of order unity (for water Pr $\sim 6$). Low velocity implies incompressibility, $\nabla\cdot\mathbf{v} = 0$, which becomes

$$\mathbf{k}\cdot\mathbf{v} = 0 . \tag{15.124}$$

The density perturbation follows from the perturbed form of Eq. (15.122):

$$\delta\rho = -\alpha\rho_0\,\delta\tau + \beta\rho_0\,\delta C . \tag{15.125}$$

The temperature perturbation is governed by Eq. (15.103), which linearizes to

$$-i\omega\,\delta\tau + (\mathbf{v}\cdot\nabla)\tau = -\chi k^2\,\delta\tau . \tag{15.126}$$

Assuming that the timescale for salt to diffuse is much longer than that for heat to diffuse, we can ignore salt diffusion altogether, so that $d\,\delta C/dt = 0$, which becomes

$$-i\omega\,\delta C + (\mathbf{v}\cdot\nabla)C = 0 . \tag{15.127}$$

Equations (15.123)–(15.127) are five equations for the five unknowns $\delta P$, $\delta\rho$, $\delta C$, $\delta\tau$, $\mathbf{v}$, one of which is a three-component vector! Unless we are careful, we will end up with a seventh-order algebraic equation. Fortunately, there is a way to keep the algebra manageable. First, we eliminate the pressure perturbation by taking the curl of Eq. (15.123) [or equivalently by crossing $\mathbf{k}$ into Eq. (15.123)]:

$$(-i\omega + \nu k^2)\rho_0\,\mathbf{k}\times\mathbf{v} = \mathbf{k}\times\mathbf{g}\,\delta\rho . \tag{15.128}$$

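Taking the curl of this equation again (crossing $\mathbf{k}$ into it once more), using incompressibility (15.124) in the form $\mathbf{k}\times(\mathbf{k}\times\mathbf{v}) = -k^2\mathbf{v}$, and then dotting with $\mathbf{g}$, we obtain

$$(i\omega - \nu k^2)\rho_0 k^2\,\mathbf{g}\cdot\mathbf{v} = \left[(\mathbf{k}\cdot\mathbf{g})^2 - k^2 g^2\right]\delta\rho . \tag{15.129}$$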

Since $\mathbf{g}$ points vertically, this is one equation for the density perturbation in terms of the vertical velocity perturbation $v_z$. We can obtain a second equation of this sort by inserting Eq. (15.126) for $\delta\tau$ and Eq. (15.127) for $\delta C$ into Eq. (15.125); the result is

$$\delta\rho = -\frac{\alpha\rho_0}{i\omega - \chi k^2}(\mathbf{v}\cdot\nabla)\tau + \frac{\beta\rho_0}{i\omega}(\mathbf{v}\cdot\nabla)C . \tag{15.130}$$

Since the unperturbed gradients of temperature and salt concentration are both vertical, Eq. (15.130), like (15.129), involves only $v_z$ and not $v_x$ or $v_y$. Solving both (15.129) and (15.130) for the ratio $\delta\rho/v_z$ and equating these two expressions, we obtain the following dispersion relation for our perturbations:

$$\omega(\omega + i\nu k^2)(\omega + i\chi k^2) + \left[1 - \frac{(\mathbf{k}\cdot\mathbf{g})^2}{k^2 g^2}\right]\left[\omega\alpha(\mathbf{g}\cdot\nabla)\tau - (\omega + i\chi k^2)\beta(\mathbf{g}\cdot\nabla)C\right] = 0 . \tag{15.131}$$

Fig. 15.16: Salt fingers in a fluid in which warm, salty water lies on top of cold fresh water. [Figure labels: $v_f$, $L$.]

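When $\mathbf{k}$ is real, as we shall assume, we can write this dispersion relation as a cubic equation for $p = -i\omega$ with real coefficients:

$$p^3 + (\nu+\chi)k^2 p^2 + \left[\nu\chi k^4 - s\,\alpha(\mathbf{g}\cdot\nabla)\tau + s\,\beta(\mathbf{g}\cdot\nabla)C\right]p + s\,\beta\chi k^2(\mathbf{g}\cdot\nabla)C = 0 , \qquad s \equiv 1 - \frac{(\mathbf{k}\cdot\mathbf{g})^2}{k^2 g^2} .$$

The roots for $p$ are either all real or one real and two complex conjugates, and growing modes have the real part of $p$ positive. When the constant term in the cubic is negative, i.e., for any wave vector that is not exactly vertical (so $s > 0$) when

$$(\mathbf{g}\cdot\nabla)C < 0 , \tag{15.132}$$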

we are guaranteed that there will be at least one positive, real root $p$ (the cubic is negative at $p = 0$ and grows without bound as $p \to \infty$), and this root will correspond to an unstable, growing mode. Therefore, a sufficient condition for instability is that the concentration of salt increase with height! By inspecting the dispersion relation we conclude that the growth rate will be maximal when $\mathbf{k}\cdot\mathbf{g} = 0$, i.e., when the wave vector is horizontal. What is the direction of the velocity $\mathbf{v}$ for these fastest-growing modes? Incompressibility (15.124) says that $\mathbf{v}$ is orthogonal to the horizontal $\mathbf{k}$; and Eq. (15.128) says that $\mathbf{k}\times\mathbf{v}$ points in the same direction as $\mathbf{k}\times\mathbf{g}$, which is horizontal since $\mathbf{g}$ is vertical. These two conditions imply that $\mathbf{v}$ points vertically. Thus, these fastest modes represent fingers of salty water descending past rising fingers of fresh water; cf. Fig. 15.16. For large $k$ (narrow fingers), the dispersion relation (15.131) predicts

$$i\omega \sim \frac{\beta(\mathbf{g}\cdot\nabla)C}{\nu k^2} , \tag{15.133}$$

i.e., a growth rate $-i\omega$ that is positive when condition (15.132) holds.
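Writing $p = -i\omega$ and $s \equiv 1 - (\mathbf{k}\cdot\mathbf{g})^2/(k^2g^2)$, the dispersion relation (15.131) becomes the real cubic $p^3 + (\nu+\chi)k^2p^2 + [\nu\chi k^4 - s\alpha(\mathbf{g}\cdot\nabla)\tau + s\beta(\mathbf{g}\cdot\nabla)C]\,p + s\beta\chi k^2(\mathbf{g}\cdot\nabla)C = 0$. The Python/NumPy sketch below finds its roots with `numpy.roots` for a horizontal wave vector ($s = 1$) and compares the growing root with the large-$k$ limit $p \simeq -\beta(\mathbf{g}\cdot\nabla)C/(\nu k^2)$. The water parameters and gradients are assumptions chosen for illustration: $\alpha = 2\times10^{-4}\ \mathrm{K^{-1}}$, $d\tau/dz = 1\ \mathrm{K\,m^{-1}}$, $\chi = 1.4\times10^{-7}\ \mathrm{m^2\,s^{-1}}$, and a salt gradient half as strong (in density terms) as the thermal one, so the fluid is stably stratified overall. The function name `growth_rates` is illustrative.

```python
import numpy as np

def growth_rates(k, s, nu, chi, G_T, G_C):
    """Roots p = -i*omega of the salt-finger dispersion relation (15.131).

    s   = 1 - (k.g)**2 / (k**2 g**2); s = 1 for a horizontal wave vector
    G_T = alpha * (g . grad) tau   [s^-2]
    G_C = beta  * (g . grad) C     [s^-2]
    A root with positive real part is a growing mode.
    """
    coeffs = [
        1.0,
        (nu + chi) * k**2,
        nu * chi * k**4 - s * G_T + s * G_C,
        s * chi * k**2 * G_C,
    ]
    return np.roots(coeffs)

g = 9.8                      # m s^-2
nu, chi = 1e-6, 1.4e-7       # water viscosity and thermal diffusivity, m^2 s^-1 (assumed)
G_T = -2e-4 * g * 1.0        # alpha = 2e-4 K^-1, d tau/dz = +1 K/m: warm water on top
G_C = 0.5 * G_T              # (g . grad) C < 0: salt increases upward, half the thermal effect
k = 1e4                      # m^-1, narrow fingers (horizontal wave vector, s = 1)

p_max = growth_rates(k, 1.0, nu, chi, G_T, G_C).real.max()
p_large_k = -G_C / (nu * k**2)                                   # large-k limit
p_stable = growth_rates(k, 1.0, nu, chi, G_T, -G_C).real.max()   # salt decreasing upward

print(f"numerical growth rate  p = {p_max:.3e} s^-1")
print(f"large-k approximation  p = {p_large_k:.3e} s^-1")
print(f"reversed salt gradient: max Re p = {p_stable:.3e} s^-1")
```

Reversing the salt gradient (fresh water on top) leaves both gradients stabilizing, and no root then has a positive real part.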

Thus, according to Eq. (15.133), the growth of narrow fingers is driven by the concentration gradient and retarded by viscosity. For larger fingers, the temperature gradient will participate in the retardation, since the heat must diffuse in order to break the buoyant stability. Now let us turn to the nonlinear development of this instability. Although we have just considered a single Fourier mode, the fingers that grow are roughly cylindrical rather than sheet-like. They lengthen at a rate that is slow enough for the heat to diffuse horizontally, though not so slow that the salt can diffuse. Let the diffusion coefficient for the salt be $\chi_C$, by analogy with $\chi$ for temperature. If the length of the fingers is $L$ and their width is $\delta_f$, then to facilitate heat diffusion and prevent salt diffusion, the travel time $L/v$ must exceed the heat-diffusion time $\delta_f^2/\chi$ but be shorter than the salt-diffusion time $\delta_f^2/\chi_C$; i.e., the vertical speed $v$ must satisfy

$$\frac{\chi_C L}{\delta_f^2} \ll v \ll \frac{\chi L}{\delta_f^2} . \tag{15.134}$$

Balancing the viscous acceleration $v\nu/\delta_f^2$ by the buoyancy acceleration $g\beta\,\delta C$, we obtain

$$v \sim \frac{g\beta\,\delta C\,\delta_f^2}{\nu} . \tag{15.135}$$

We can therefore rewrite Eq. (15.134) as

$$\left(\frac{\chi_C\,\nu L}{g\beta\,\delta C}\right)^{1/4} \ll \delta_f \ll \left(\frac{\chi\,\nu L}{g\beta\,\delta C}\right)^{1/4} . \tag{15.136}$$
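The bounds in Eq. (15.136) are easy to evaluate. The Python sketch below uses hypothetical laboratory values (fingers of length $L = 0.1$ m, a fractional density excess $\beta\,\delta C = 10^{-3}$, water with $\nu = 10^{-6}\ \mathrm{m^2\,s^{-1}}$ and $\chi = 1.4\times10^{-7}\ \mathrm{m^2\,s^{-1}}$, and $\chi_C = 0.01\chi$); none of these numbers comes from the text, and the function name `finger_width_bounds` is illustrative.

```python
def finger_width_bounds(chi, chi_C, nu, L, g, beta_dC):
    """Lower and upper bounds on the finger width delta_f from Eq. (15.136), in metres."""
    scale = nu * L / (g * beta_dC)       # units m^2 s; times a diffusivity gives m^4
    lower = (chi_C * scale) ** 0.25      # salt must not diffuse across the finger
    upper = (chi * scale) ** 0.25        # heat must diffuse across the finger
    return lower, upper

chi = 1.4e-7          # thermal diffusivity of water, m^2 s^-1 (assumed)
chi_C = 0.01 * chi    # salt diffusivity, m^2 s^-1 (assumed ratio)
nu = 1e-6             # kinematic viscosity of water, m^2 s^-1
L = 0.1               # finger length, m (hypothetical)
g = 9.8               # m s^-2
beta_dC = 1e-3        # fractional density excess due to salt (hypothetical)

lo, hi = finger_width_bounds(chi, chi_C, nu, L, g, beta_dC)
print(f"{lo * 1e3:.2f} mm << delta_f << {hi * 1e3:.2f} mm  (ratio {hi / lo:.2f})")
```

With these inputs the bounds are roughly 0.35 mm and 1.1 mm, and their ratio, $(\chi/\chi_C)^{1/4} = \sqrt{10} \approx 3.2$, is independent of the other parameters.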

Typically, $\chi_C \sim 0.01\chi$, so the two bounds in Eq. (15.136) differ by a factor of only $(\chi/\chi_C)^{1/4} \sim 3$, and the widths of the fingers lie in a narrow range, as is verified in laboratory experiments. Salt fingering can also occur naturally, for example in an estuary where cold river water flows beneath sea water warmed by the sun. However, the development of salt fingers is quite slow, and in practice it leads to mixing only when the equilibrium velocity field is very small. This instability is one example of a quite general type of instability known as double diffusion, which can arise when two physical quantities diffuse through a fluid at different rates. Other examples include the diffusion of two different solutes and the diffusion of vorticity and heat in a rotating flow.

****************************
EXERCISES

Exercise 15.23 Problem: Laboratory experiment
Make an order-of-magnitude estimate of the size of the fingers and the time it takes for them to grow in a small transparent jar. You might like to try an experiment.

Exercise 15.24 Problem: Internal Waves
Consider a stably stratified fluid at rest and let there be a small (negative) vertical density gradient, $d\rho/dz$. (a) By modifying the above analysis, ignoring the effects of viscosity, heat conduction and concentration gradients, show that small-amplitude linear waves, which propagate in a direction making an angle $\theta$ to the vertical, have an angular frequency given by $\omega = N|\sin\theta|$, where $N \equiv [(\mathbf{g}\cdot\nabla)\ln\rho]^{1/2}$ is known as the Brunt-Väisälä frequency. These waves are called internal waves. (b) Show that the group velocity of these waves is orthogonal to the phase velocity and interpret this result physically.

****************************

Box 15.4 Important Concepts in Chapter 15
• Gravity waves on water and other liquids, Sec. 15.2
  – Deep water waves and shallow water waves, Secs. 15.2.1, 15.2.2
  – Tsunamis, Ex. 15.6
  – Dispersion, Sec. 15.3.1
  – Steepening due to nonlinear effects, Sec. 15.3.1, Fig. 15.4
  – Solitons or solitary waves; nonlinear steepening balances dispersion, Sec. 15.3
  – Korteweg-deVries equation, Secs. 15.3.1–15.3.4
• Surface tension and its stress tensor, Box 15.2
  – Capillary waves, Sec. 15.2.3
• Rossby Waves in a Rotating Fluid, Sec. 15.4
• Sound waves in fluids and gases, Sec. 15.5
  – Sound wave generation in slow-motion approximation: power proportional to squared time derivative of monopole moment, Sec. 15.5.2
  – Decibel, Sec. 15.5.2
  – Matched asymptotic expansions, Sec. 15.5.3
  – Radiation reaction force; runaway solution as a spurious solution that violates the slow-motion approximation used to derive it, Sec. 15.5.3
• Thermal effects and convection, Sec. 15.6
  – Coefficient of thermal conductivity, $\kappa$, and diffusive heat conduction, Sec. 15.6.1
  – Thermal diffusivity, $\chi = \kappa/\rho C_p$, and diffusion equation for temperature, Sec. 15.6.1
  – Thermal expansion coefficient, $\alpha = -(\partial\ln\rho/\partial T)_P$, Sec. 15.6.2
  – Prandtl number, $\mathrm{Pr} = \nu/\chi \sim$ (vorticity diffusion)/(heat diffusion), Sec. 15.6.1
  – Péclet number, $\mathrm{Pe} = VL/\chi \sim$ (advection)/(conduction), Sec. 15.6.1
  – Rayleigh number, $\mathrm{Ra} = \alpha g\,\Delta T\,d^3/(\nu\chi) \sim$ (buoyancy)/(viscous force), Sec. 15.6.3
  – Boussinesq approximation for analyzing thermally induced buoyancy, Sec. 15.6.2
  – Free convection and forced convection, Sec. ??
  – Rayleigh-Bénard (free) convection, Sec. 15.6.3 and Fig. 15.11
  – Critical Rayleigh number for onset of Rayleigh-Bénard convection, Sec. 15.6.3
  – Schwarzschild criterion for convection in stars, Sec. 15.6.4
  – Double-diffusion instability, Sec. 15.6.5


Bibliographic Note For textbook treatments of waves in fluids, we recommend Lighthill (1978) and Whitham (1974), and from a more elementary and physical viewpoint, Tritton (1987). To develop physical insight into gravity waves on water and sound waves in a fluid, we suggest portions of the movie by Bryson (1964). For solitary-wave solutions to the Korteweg-deVries equation, see materials, including brief movies, at the website of Takasaki (2006). For a brief, physically oriented introduction to Rayleigh-Bénard convection see Chap. 4 of Tritton (1987). In their Chaps. 5 and 6, Landau and Lifshitz (1959) give a fairly succinct treatment of diffusive heat flow in fluids, the onset of convection in several different physical situations, and the concepts underlying double diffusion. In his Chaps. 2–6, Chandrasekhar (1961) gives a thorough and rich treatment of the influence of a wide variety of phenomena on the onset of convection, and on the types of fluid motions that can occur near the onset of convection. The book by Turner (1973) is a thorough treatise on the influence of buoyancy (thermally induced and otherwise) on fluid motions. It includes all topics treated in Sec. 15.6 and much much more.

Bibliography
Bryson, A. E. 1964. Waves in Fluids, a movie (National Committee for Fluid Mechanics Films); available at http://web.mit.edu/hml/ncfmf.html
Burke, W. W. 1970. "Runaway solutions: remarks on the asymptotic theory of radiation damping," Phys. Rev. A 2, 1501–1505.
Chandrasekhar, S. 1961. Hydrodynamic and Hydromagnetic Stability, Oxford: Oxford University Press.
Chelton, D. B. and Schlax, M. G. 1996. "Global observations of oceanic Rossby waves," Science 272, 234.
Cole, J. 1974. Perturbation Methods in Applied Mathematics, Waltham, Mass.: Blaisdell.
Fultz, D. 1969. Rotating Flows, a movie (National Committee for Fluid Mechanics Films); available at http://web.mit.edu/hml/ncfmf.html
Gill, A. E. 1982. Atmosphere-Ocean Dynamics, New York: Academic Press.
Greenspan, H. P. 1973. The Theory of Rotating Fluids, Cambridge: Cambridge University Press.
Jackson, J. D. 1999. Classical Electrodynamics, third edition, New York: Wiley.
Landau, L. D. and Lifshitz, E. M. 1959. Fluid Mechanics, Oxford: Pergamon.
Libbrecht, K. G. and Woodard, M. F. 1991. Science 253, 152.
Lighthill, M. J. 1978. Waves in Fluids, Cambridge: Cambridge University Press.
Pellew, A. and Southwell, R. V. 1940. Proceedings of the Royal Society A 176, 312.
Rohrlich, F. 1965. Classical Charged Particles, Reading, Mass.: Addison-Wesley.
Scott-Russell, J. 1844. Proc. Roy. Soc. Edinburgh, 319.
Takasaki, K. 2006. Many Faces of Solitons, http://www.math.h.kyoto-u.ac.jp/ takasaki/soliton-lab/gallery/solitons/kdv-e.html
Tritton, D. J. 1987. Physical Fluid Dynamics, Oxford: Oxford Science Publications.
Turcotte, D. L. and Schubert, G. 1982. Geodynamics, New York: Wiley.
Turner, J. S. 1973. Buoyancy Effects in Fluids, Cambridge: Cambridge University Press.
Whitham, G. B. 1974. Linear and Non-Linear Waves, New York: Wiley.

Contents
16 Compressible and Supersonic Flow
16.1 Overview
16.2 Equations of Compressible Flow
16.3 Stationary, Irrotational Flow
16.3.1 Quasi-One Dimensional Flow
16.3.2 Setting up a Stationary, Transonic Flow
16.3.3 Rocket Engines
16.4 One Dimensional, Time-Dependent Flow
16.4.1 Riemann Invariants
16.4.2 Shock Tube
16.5 Shock Fronts
16.5.1 Junction Conditions Across a Shock; Rankine-Hugoniot Relations
16.5.2 Internal Structure of Shock
16.5.3 Shock jump conditions in a perfect gas with constant γ
16.5.4 Mach Cone
16.6 Similarity Solutions — Sedov-Taylor Blast Wave
16.6.1 Atomic Bomb
16.6.2 Supernovae

Chapter 16
Compressible and Supersonic Flow
Version 0816.1.K.tex 25 February 2009
Please send comments, suggestions, and errata via email to [email protected] or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 16.1 Reader's Guide
• This chapter relies heavily on Chap. 12 and on Secs. 15.2, 15.3 and 15.5 of Chap. 15.
• No subsequent chapters rely substantially on this one.

16.1

Overview

So far, we have mainly been concerned with flows that are slow enough that they may be treated as incompressible. We now consider flows in which the velocity approaches or even exceeds the speed of sound and in which density changes along streamlines cannot be ignored. Such flows are common in aeronautics and astrophysics. For example, the motion of a rocket through the atmosphere is faster than the speed of sound in air; in other words, it is supersonic. Therefore, if we transform into the frame of the rocket, the flow of air past the rocket is also supersonic. When the flow speed exceeds the speed of sound in some reference frame, it is not possible for a pressure pulse to travel upstream in that frame and change the direction of the flow. However, if there is a solid body in the way (e.g., a rocket or aircraft), the flow direction must change. In a supersonic flow, this change happens nearly discontinuously, through the formation of shock fronts at which the flow suddenly decelerates from supersonic to subsonic. An example is shown in Fig. 16.1. Shock fronts are an inevitable feature of supersonic flows. In another example of supersonic flow, a rocket itself is propelled by the thrust created by escaping hot gases from its exhaust. These hot gases move through the exhaust at supersonic speeds, expanding and cooling as they accelerate. In this manner the random thermal motion of the gas molecules is converted into an organized bulk motion that carries away negative momentum from the rocket and pushes it forward.

Fig. 16.1: Complex pattern of shock waves formed around a model aircraft in a wind tunnel with air moving ten percent faster than the speed of sound (i.e., with Mach number M = 1.1). Image from W. G. Vincenti; reproduced from Van Dyke 1982.

Fig. 16.2: The supersonic solar wind forms a type of shock front known as a bow shock when it passes by a planet. [Figure labels: Sun; solar wind at $400\ \mathrm{km\,s^{-1}}$; Earth; Bow Shock.]

The solar wind furnishes yet another example of a supersonic flow. This high-speed flow of ionized gas is accelerated in the solar corona and removes a fraction $\sim 10^{-14}$ of the sun's mass every year. Its own pressure accelerates it to supersonic speeds of $\sim 400\ \mathrm{km\,s^{-1}}$. When the outflowing solar wind encounters a planet, it is rapidly decelerated to subsonic speed by passing through a strong discontinuity known as a bow shock, which surrounds the planet (Fig. 16.2). The bulk kinetic energy in the solar wind, built up during acceleration, is rapidly and irreversibly transformed into heat as it passes through this shock front. In this chapter, we shall study some properties of supersonic flows. After restating the basic equations of compressible fluid dynamics (Sec. 16.2), we shall analyze three important, simple cases: quasi-one-dimensional stationary flow (Sec. 16.3), time-dependent one-dimensional flow (Sec. 16.4), and normal adiabatic shock fronts (Sec. 16.5). In these sections, we shall apply the results of our analyses to some contemporary examples, including the Space Shuttle (Box 16.2), rocket engines, shock tubes, and the Mach cone, N-wave and sonic booms produced by supersonic projectiles and aircraft. In Sec. 16.6, we will develop similarity-solution techniques for supersonic flows and apply them to supernovae, underwater depth charges, and nuclear-bomb explosions in the earth's atmosphere.

16.2

Equations of Compressible Flow

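In Chap. 12, we derived the equations of fluid dynamics, allowing for compressibility. We expressed them as conservation laws for mass [Eq. (12.25)], momentum [$\partial(\rho\mathbf{v})/\partial t + \nabla\cdot\mathbf{T} = 0$ with $\mathbf{T}$ as given in Table 12.2], and energy [$\partial U/\partial t + \nabla\cdot\mathbf{F} = 0$ with $U$ and $\mathbf{F}$ as given in Table 12.2]; and also an evolution law for entropy [Eq. (12.71)]. When, as in this chapter, heat conduction is negligible ($\kappa_{\rm th} \to 0$) and the gravitational field is a time-independent, external one (not generated by the flowing fluid), these equations become

$$\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\mathbf{v}) = 0\,, \qquad (16.1)$$

$$\frac{\partial(\rho\mathbf{v})}{\partial t} + \nabla\cdot\left(P\mathbf{g} + \rho\,\mathbf{v}\otimes\mathbf{v} - 2\eta\boldsymbol{\sigma} - \zeta\theta\mathbf{g}\right) = \rho\mathbf{g}\,, \qquad (16.2)$$

$$\frac{\partial}{\partial t}\left[\left(\tfrac12 v^2 + u + \Phi\right)\rho\right] + \nabla\cdot\left[\left(\tfrac12 v^2 + h + \Phi\right)\rho\mathbf{v} - 2\eta\boldsymbol{\sigma}\cdot\mathbf{v} - \zeta\theta\mathbf{v}\right] = 0\,, \qquad (16.3)$$

$$\frac{\partial(\rho s)}{\partial t} + \nabla\cdot(\rho s\mathbf{v}) = \frac{1}{T}\left(2\eta\,\boldsymbol{\sigma}:\boldsymbol{\sigma} + \zeta\theta^2\right). \qquad (16.4)$$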

Here $\boldsymbol{\sigma}:\boldsymbol{\sigma}$ is index-free notation for $\sigma_{ij}\sigma_{ij}$. In Eq. (16.2), the $\mathbf{g}$ inside the divergence is the metric tensor, so $P\mathbf{g}$ is the isotropic pressure stress, whereas $\rho\mathbf{g}$ on the right-hand side is the gravitational force density, with $\mathbf{g} = -\nabla\Phi$. Some comments are in order. Equation (16.1) is the complete mass-conservation equation (continuity equation), assuming that matter is neither added to nor removed from the flow. Equation (16.2) expresses the conservation of momentum allowing for one external force, gravity. Other external forces can be added. Equation (16.3), expressing energy conservation, includes a viscous contribution to the energy flux. If there are sources or sinks of fluid energy, then these must be included on the right-hand side of this equation. Possible sources of energy include chemical or nuclear reactions; possible energy sinks include cooling by emission of radiation. We will incorporate the effects of heat conduction into the energy equation in the next chapter. Equation (16.4) expresses the evolution of entropy, and will also need modification if there are additional contributions to the energy equation. The right-hand side of this equation is the rate of increase of entropy per unit volume due to viscous heating. This equation is not independent of the preceding equations and the laws of thermodynamics, but it is often more convenient to use. In particular, one often uses it (together with the first law of thermodynamics) in place of energy conservation (16.3).

These equations must be supplemented with an equation of state in the form $P(\rho, T)$ or $P(\rho, s)$. For simplicity, we shall often focus on a perfect gas that undergoes adiabatic evolution with constant specific-heat ratio (adiabatic index) $\gamma$, so the equation of state has the simple form (Box 12.2 and Ex. 12.2)

$$P = K(s)\,\rho^\gamma\,. \qquad (16.5)$$

Here $K(s)$ is a function of the entropy per unit mass $s$ and is thus constant during adiabatic evolution, but will change across shocks because the entropy increases in a shock (Sec. 16.5). The value of $\gamma$ depends on the number of thermalized internal degrees of freedom of the gas's constituent particles (Ex. 16.1). For a gas of free particles (e.g., fully ionized hydrogen), $\gamma = 5/3$; for the earth's atmosphere, at temperatures between about 10 K and 1000 K, $\gamma = 7/5 = 1.4$ (Ex. 16.1). For such a gas, we can integrate the first law of thermodynamics (Box 12.2) to obtain a formula for the internal energy per unit mass,

$$u = \frac{P}{(\gamma - 1)\rho}\,, \qquad (16.6)$$

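where we have assumed that the internal energy vanishes as the temperature $T \to 0$. It will prove convenient to express the density $\rho$, the internal energy per unit mass $u$, and the enthalpy per unit mass $h$ in terms of the sound speed

$$c = \left(\frac{\partial P}{\partial\rho}\right)_s^{1/2} = \sqrt{\frac{\gamma P}{\rho}} \qquad (16.7)$$

[Eq. (15.48d)]. A little algebra gives

$$\rho = \left(\frac{c^2}{\gamma K}\right)^{1/(\gamma-1)}, \qquad u = \frac{c^2}{\gamma(\gamma - 1)}\,, \qquad h = u + \frac{P}{\rho} = \frac{c^2}{\gamma - 1}\,. \qquad (16.8)$$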
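The relations (16.5)–(16.8) are easy to check numerically. The short Python sketch below assumes a perfect gas with constant γ and uses arbitrary illustrative values for K, γ, and ρ (none of them come from the text); it computes P, c, u, and h from ρ and then recovers ρ from c using the first of Eqs. (16.8). The function names are our own.

```python
import math

def perfect_gas_from_density(rho, K, gamma):
    """Return (P, c, u, h) for P = K rho**gamma, following Eqs. (16.5)-(16.8)."""
    P = K * rho**gamma                  # equation of state, Eq. (16.5)
    c = math.sqrt(gamma * P / rho)      # adiabatic sound speed, Eq. (16.7)
    u = P / ((gamma - 1.0) * rho)       # internal energy per unit mass, Eq. (16.6)
    h = u + P / rho                     # enthalpy per unit mass
    return P, c, u, h

def density_from_sound_speed(c, K, gamma):
    """Invert Eq. (16.7): rho = (c**2 / (gamma K))**(1/(gamma - 1))."""
    return (c**2 / (gamma * K)) ** (1.0 / (gamma - 1.0))

if __name__ == "__main__":
    K, gamma, rho = 1.0e5, 1.4, 1.2     # illustrative values in consistent units
    P, c, u, h = perfect_gas_from_density(rho, K, gamma)
    print(density_from_sound_speed(c, K, gamma), rho)    # should agree
    print(u, c**2 / (gamma * (gamma - 1.0)))             # u = c^2 / [gamma (gamma - 1)]
    print(h, c**2 / (gamma - 1.0))                       # h = c^2 / (gamma - 1)
```

Each printed pair should agree to rounding error, confirming that the three forms in Eqs. (16.8) are mutually consistent.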

****************************
EXERCISES

Exercise 16.1 *** Example: Values of γ
Consider a gas consisting of several different particle species, e.g., oxygen molecules and nitrogen molecules in the case of the Earth's atmosphere. Consider a sample of this gas with volume V, containing $N_A$ particles of species A, all in thermodynamic equilibrium at a temperature T sufficiently low that we can ignore effects of special relativity. Let species A have $\nu_A$ internal degrees of freedom (e.g., rotation and vibration) that are thermally excited, so on average each such particle has $\frac32 k_B T$ of translational energy plus $\frac12 \nu_A k_B T$ of internal energy. Then the sample's total energy E and pressure P are

$$E = \sum_A\left(\frac32 + \frac{\nu_A}{2}\right)N_A k_B T\,, \qquad P = \frac{1}{V}\sum_A N_A k_B T\,. \qquad (16.9)$$

Fig. 16.3: The ratio of specific heats γ for air as a function of temperature T (in K).

(a) Use the laws of thermodynamics to show that the specific heats at fixed volume and pressure are

$$C_V \equiv T\left(\frac{\partial S}{\partial T}\right)_{V,N_A} = \frac{E}{T} = \sum_A\left(\frac32 + \frac{\nu_A}{2}\right)N_A k_B\,, \qquad C_P \equiv T\left(\frac{\partial S}{\partial T}\right)_{P,N_A} = C_V + \frac{PV}{T}\,, \qquad (16.10)$$

so the ratio of specific heats is

$$\gamma = \frac{C_P}{C_V} = 1 + \frac{\sum_A N_A}{\sum_A N_A\left(\frac32 + \frac{\nu_A}{2}\right)}\,. \qquad (16.11)$$

(b) If there are no thermalized internal degrees of freedom, $\nu_A = 0$ (e.g., for a fully ionized, nonrelativistic gas), then γ = 5/3. For the earth's atmosphere, at temperatures between about 10 K and 1000 K, the rotational degrees of freedom of the O₂ and N₂ molecules are thermally excited, but the temperature is too low to excite their vibrational degrees of freedom. Explain why this means that $\nu_{\rm O_2} = \nu_{\rm N_2} = 2$, which implies γ = 7/5 = 1.4. (Hint: there are just two orthogonal axes around which a diatomic molecule can rotate.)
(c) Between about 1000 K and roughly 10,000 K, the vibrational degrees of freedom are thermalized, but the molecules have not dissociated substantially into individual atoms nor become substantially ionized. Explain why this means that $\nu_{\rm O_2} = \nu_{\rm N_2} = 4$ in this temperature range, which implies γ = 9/7 ≈ 1.29. (Hint: an oscillator has kinetic energy and potential energy.)
(d) At roughly 10,000 K the two oxygen atoms in O₂ dissociate from each other, the two nitrogen atoms in N₂ dissociate, and electrons begin to ionize from the atoms. Explain why this drives γ up toward 5/3 ≃ 1.67.

The actual value of γ as a function of temperature for the range 200 K to 1300 K is shown in Fig. 16.3. Evidently, γ = 1.4 is a good approximation only up to about 400 K; the transition toward γ = 1.29 occurs gradually between about 400 K and 1400 K.

Fig. 16.4: Variation of the cross-sectional area, A, of a narrow bundle of flow lines as the Mach number, M, increases (A/A∗ plotted against M, with subsonic and supersonic branches). Note that the flow is transonic (M = 1) when A is at its minimum, A∗.

****************************
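Equation (16.11) can be evaluated exactly with rational arithmetic. The sketch below takes a list of (N_A, ν_A) pairs and returns γ as an exact fraction; the helper name `gamma_mixture` and the rough 78 : 21 : 1 nitrogen/oxygen/argon composition used for air are illustrative assumptions, not part of the exercise.

```python
from fractions import Fraction

def gamma_mixture(species):
    """Ratio of specific heats from Eq. (16.11).

    `species` is a list of (N_A, nu_A) pairs: integer particle counts (or
    relative abundances) N_A and thermalized internal degrees of freedom nu_A.
    """
    total = sum(Fraction(n) for n, _ in species)
    weighted = sum(Fraction(n) * (Fraction(3, 2) + Fraction(nu, 2)) for n, nu in species)
    return 1 + total / weighted

if __name__ == "__main__":
    print(gamma_mixture([(1, 0)]))                  # 5/3: free particles
    print(gamma_mixture([(78, 2), (21, 2)]))        # 7/5: N2 + O2, rotation only
    print(gamma_mixture([(78, 4), (21, 4)]))        # 9/7: rotation + vibration
    print(float(gamma_mixture([(78, 2), (21, 2), (1, 0)])))  # about 1.4016 with ~1% argon
```

Because every species enters through the same weighted sum, a small admixture of a monatomic gas such as argon nudges γ only slightly above 7/5.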

16.3 Stationary, Irrotational Flow

16.3.1 Quasi-One-Dimensional Flow

In their full generality, the fluid dynamic equations (16.1)–(16.4) are quite unwieldy. To demonstrate some of the novel features of supersonic flow, we shall proceed as in earlier chapters: we shall specialize to a very simple type of flow in which the physical effects of interest are strong, and extraneous effects are negligible. In particular, in this section, we shall seek insight into smooth transitions between subsonic and supersonic flow by restricting ourselves to a stationary (∂/∂t = 0), irrotational (∇ × v = 0) flow in which gravity and viscosity are negligible (Φ = η = ζ = 0), as are various effects not included in our general equations: chemical reactions, thermal conductivity, and radiative losses. (We shall explore effects of gravity in Ex. 16.4.) The vanishing viscosity implies [from the entropy evolution equation (16.4)] that the entropy per unit mass s is constant along each flow line. We shall assume that s is the same on all flow lines, so the flow is fully isentropic (s constant everywhere) and the pressure P = P(ρ, s) can thus be regarded as a function only of the density, P = P(ρ). When we need a specific form for P(ρ), we will use that of an ideal gas with constant specific-heat ratio [Eqs. (16.5)–(16.8); Ex. 16.1], but much of our analysis will be done for a general isentropic P(ρ). We will make one further approximation: that the flow is almost one-dimensional. In other words, the velocity vectors all make small angles with each other in the region of interest. These drastic simplifications are actually appropriate for many cases of practical interest.

Granted these simplifications, we can consider a narrow bundle of streamlines, which we call a streamtube, and introduce, as a tool in studying it, its cross-sectional area A normal to the flow (Fig. 16.4). As the flow is stationary, the equation of mass conservation (16.1) states that the rate at which mass passes through the streamtube's cross section must be independent of position along the tube:

$$\rho v A = \text{constant}; \qquad (16.12)$$

here v is the speed of the fluid in the streamtube. Rewriting this in differential form, we obtain

$$\frac{dA}{A} + \frac{d\rho}{\rho} + \frac{dv}{v} = 0\,. \qquad (16.13)$$

Because the flow is stationary and inviscid, the law of energy conservation (16.3) reduces to Bernoulli's theorem [Eqs. (12.46), (12.47)]:

$$h + \tfrac12 v^2 = \text{constant} \qquad (16.14)$$

along each streamline and thus along our streamtube. Since the flow is adiabatic, we can use the first law of thermodynamics (Box 12.2), $dh = dP/\rho + T\,ds = dP/\rho = c^2\,d\rho/\rho$ [where c is the speed of sound (16.7)], to write Eq. (16.14) in the differential form

$$\frac{d\rho}{\rho} + \frac{v\,dv}{c^2} = 0\,. \qquad (16.15)$$

Finally and most importantly, we combine Eqs. (16.13) and (16.15) to obtain

$$\frac{dv}{v} = \frac{dA/A}{M^2 - 1}\,, \qquad \frac{d\rho}{\rho} = -\frac{M^2\,dA/A}{M^2 - 1}\,, \qquad (16.16)$$

where

$$M \equiv v/c \qquad (16.17)$$

is the Mach number. This Mach number is an important new dimensionless number that can be used to characterize compressible flows. When the Mach number is less than 1, the flow is called subsonic; when M > 1, it is supersonic. By contrast with the Reynolds, Rossby and Ekman numbers, which are usually defined using a single set of (characteristic) values of the flow parameters (V , ν, Ω, L) and thus have a single value for any given flow, the Mach number by convention is defined at each point in the flow and thus is a flow variable M(x) similar to v(x) and ρ(x). Equations (16.16) make remarkable predictions: At points along a streamtube where the flow is extremely subsonic M ≪ 1, v varies inversely with A, in accord with everyday experience. At points where the flow is subsonic M < 1 but not extremely so, a decrease in the cross sectional area A still causes the fluid to speed up (v to increase), but not so strongly as when M ≪ 1. By contrast, at points where the flow is supersonic M > 1, a decrease of A causes the speed v to decrease, and an increase of A causes it to increase—just opposite to everyday, subsonic experience! However, because mass is still conserved, ρvA = constant, the synchronous decrease (or increase) of v and A is compensated by the opposite evolution of ρ. At points where the flow is extremely supersonic M ≫ 1, v remains nearly constant as A changes. Finally, a transition from subsonic to supersonic flow (i.e., a sonic point M = 1) can occur only at a minimum of the tube’s area A. These conclusions are very useful in analyzing stationary, high-speed flows.
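The sign reversal predicted by Eqs. (16.16) can be made concrete. The sketch below (the function name is our own) evaluates the fractional changes of v and ρ produced by a 1% contraction of a streamtube at several Mach numbers, and checks each result against the differential mass-conservation law (16.13).

```python
def response_to_area_change(dA_over_A, M):
    """Fractional changes (dv/v, drho/rho) from Eqs. (16.16)."""
    if M == 1.0:
        raise ValueError("at M = 1, Eqs. (16.16) require dA = 0 (a throat)")
    dv_over_v = dA_over_A / (M**2 - 1.0)
    drho_over_rho = -M**2 * dv_over_v      # Eq. (16.15): drho/rho = -M^2 dv/v
    return dv_over_v, drho_over_rho

if __name__ == "__main__":
    for M in (0.1, 0.5, 2.0, 10.0):
        dv, drho = response_to_area_change(-0.01, M)   # streamtube narrows by 1%
        residual = -0.01 + drho + dv                   # Eq. (16.13); should vanish
        print(f"M = {M:4.1f}: dv/v = {dv:+.5f}, drho/rho = {drho:+.5f}, residual = {residual:.1e}")
```

For M < 1 the contraction speeds the flow up; for M > 1 it slows the flow down while the density rises by more than 1%, and at M = 10 the speed barely changes.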

Fig. 16.5: Stationary flow through a channel between two chambers maintained at different pressures P1 and P2: (a) the channel, whose minimum area is Amin; (b) pressure P and (c) speed v as functions of position x for the flows labeled 1, 2, 2a, 2b, 3, and 4, with the critical point marked in (c). When the pressure difference P1 − P2 is large enough, the flow is subsonic to the left of the channel's throat and supersonic to the right. As it nears or enters the second chamber, the supersonic flow must decelerate abruptly to a subsonic speed in a strong shock.

16.3.2 Setting up a Stationary, Transonic Flow

The reader may wonder, at this point, whether it is easy to set up a flow in which the speed of the fluid changes continuously from subsonic to supersonic. The answer is quite illuminating. We can illustrate it using two chambers maintained at different pressures, P1 and P2, and connected through a narrow channel along which the cross-sectional area passes smoothly through a minimum A = A∗, the channel's throat (Fig. 16.5). When P2 = P1, there will be no flow between the two chambers. When we decrease P2 slightly below P1, there will be a slow subsonic flow through the channel (curve 1 in Fig. 16.5) that eventually will equalize the pressures. As we decrease P2 further, there comes a point (P2 = P2crit) at which the flow is forced to be transonic at the channel's throat A = A∗ (curve 2). For all pressures P2 < P2crit, the flow is also transonic at the throat and has a universal form to the left of and near the throat, independent of the value of P2 (curve 2), including a universal value for the rate of mass flow through the throat! This universal flow is supersonic to the right of the throat, but it must be brought to rest in chamber 2, since there is a hard wall at the chamber's end. How is it brought to rest? Through a shock front, where it is driven subsonic almost discontinuously (curves 3 and 4; Sec. 16.5 below).

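Fig. 16.6: Schematic illustration of a rocket engine, showing the combustion chamber (stagnation pressure P0), the De Laval nozzle with its throat (area A∗, pressure P∗), the skirt, and the exhaust speed Ve; the flow is subsonic upstream of the throat and supersonic downstream. Note the skirt, which increases the thrust produced by the escaping exhaust gases.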

How, physically, is it possible for the flow to have a universal form to the left of the shock? The key to this is that in any supersonic region of the flow, disturbances are unable to propagate upstream, so the upstream fluid has no way of knowing what the pressure P2 is in chamber 2. Although the flow to the left of the shock is universal, the location of the shock and the nature of the subsonic, post-shock flow are affected by P2 , since information can propagate upstream through that subsonic flow, from chamber 2 to the shock. The reader might now begin to suspect that the throat, in the transonic case, is a very special location. It is, and that location is known as a critical point of the stationary flow. From a mathematical point of view, critical points are singular points of the equations of stationary flow [Eqs. (16.12)–(16.16)]. This singularity shows up in the solutions to the equations, as depicted in Fig. 16.5(c). The universal solution that passes transonically through the critical point (solution 2) joins onto two different solutions to the right of the throat: solution 2a, which is supersonic, and solution 2b, which is subsonic. Which solution occurs in practice depends on conditions downstream. Other solutions that are arbitrarily near this universal solution [dashed curves in Fig. 16.5(c)] are either double valued and consequently unphysical, or are everywhere subsonic or everywhere supersonic (in the absence of shocks). The existence of critical points is a price we must pay, mathematically, for not allowing our equations to be time dependent. If we were to solve the time-dependent equations (which would then be partial differential equations), we would find that they change from elliptic to hyperbolic as the flow passes through a critical point. From a physical point of view, critical points are the places where a sound wave propagating upstream remains at rest in the flow. 
They are therefore the one type of place from which time-dependent transients, associated with setting up the flow in the first place, cannot decay away (if the equations are dissipation-free, i.e., inviscid). Thus, even the time-dependent equations can display peculiar behaviors at a critical point. However, when dissipation is introduced, these peculiarities get smeared out.

16.3.3 Rocket Engines

We have shown that, in order to push a quasi-one-dimensional flow from subsonic to supersonic, one must send it through a throat. This result is exploited in the design of rocket engines and jet engines. In a rocket engine, hot gas is produced by controlled burning of fuel in a large chamber, and it then escapes through a converging-diverging (also known as De Laval) nozzle, as shown in Fig. 16.6. The nozzle is designed with a skirt so the flow becomes supersonic smoothly when it passes through the nozzle's throat. To analyze this flow in some detail, let us approximate it as precisely steady and isentropic, and the gas as perfect (no viscosity) with constant ratio of specific heats γ. In this case, the enthalpy is $h = c^2/(\gamma - 1)$ [Eqs. (16.8)], so Bernoulli's theorem (16.14) reduces to

$$\frac{c^2}{\gamma - 1} + \frac12 v^2 = \frac{c_0^2}{\gamma - 1}\,. \qquad (16.18)$$

Here c is the sound speed in the flow and c0 is the "stagnation" sound speed, i.e., the sound speed evaluated in the rocket chamber where v = 0. Dividing this Bernoulli theorem by c² and manipulating, we learn how the sound speed varies with Mach number M = v/c:

$$c = c_0\left(1 + \frac{\gamma - 1}{2}M^2\right)^{-1/2}. \qquad (16.19)$$

From mass conservation [Eq. (16.12)], we know that the cross-sectional area A varies as $A \propto \rho^{-1}v^{-1} \propto \rho^{-1}M^{-1}c^{-1} \propto M^{-1}c^{(\gamma+1)/(1-\gamma)}$, where we have used $\rho \propto c^{2/(\gamma-1)}$ [Eqs. (16.8)]. Combining with Eq. (16.19), and noting that M = 1 where A = A∗ (i.e., the flow is transonic at the throat), we find that

$$\frac{A}{A_*} = \frac{1}{M}\left(\frac{2}{\gamma + 1} + \frac{\gamma - 1}{\gamma + 1}M^2\right)^{(\gamma+1)/[2(\gamma-1)]}. \qquad (16.20)$$
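Equation (16.20) gives A/A∗ explicitly in terms of M, but a nozzle designer usually needs the inverse: the Mach number reached at a given area ratio. The sketch below inverts Eq. (16.20) by bisection on either the subsonic or the supersonic branch; the function names, the bracketing interval (M up to 50), and the tolerance are our own assumptions, not part of the text.

```python
def area_ratio(M, gamma):
    """A/A_* as a function of Mach number M, Eq. (16.20)."""
    base = 2.0 / (gamma + 1.0) + (gamma - 1.0) / (gamma + 1.0) * M**2
    return base ** ((gamma + 1.0) / (2.0 * (gamma - 1.0))) / M

def mach_from_area(ratio, gamma, supersonic=True, tol=1e-12):
    """Invert Eq. (16.20) by bisection on the chosen branch (requires ratio >= 1)."""
    if ratio < 1.0:
        raise ValueError("A/A_* >= 1 for isentropic flow through a throat")
    # M = 50 is an assumed upper bracket, ample for practical nozzles.
    lo, hi = (1.0, 50.0) if supersonic else (1.0e-9, 1.0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        excess = area_ratio(mid, gamma) - ratio
        # A/A_* increases with M on the supersonic branch and decreases on the subsonic one.
        if (excess < 0.0) == supersonic:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(area_ratio(3.0, 1.4))                          # 343/81, about 4.23
    print(mach_from_area(4.0, 1.4))                      # supersonic branch, roughly 2.94
    print(mach_from_area(4.0, 1.4, supersonic=False))    # subsonic branch
```

With the assumed γ = 1.4, an exit area four times the throat area gives a supersonic exit Mach number a little below 3, consistent with the rough value Me ∼ 3 quoted in Box 16.2.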

The pressure P∗ at the throat can be deduced from $P \propto \rho^\gamma \propto c^{2\gamma/(\gamma-1)}$ [Eqs. (16.5) and (16.8)] together with Eq. (16.19), with M = 0 and P = P0 (the stagnation pressure) in the chamber and M = 1 at the throat:

$$P_* = P_0\left(\frac{2}{\gamma + 1}\right)^{\gamma/(\gamma-1)}. \qquad (16.21)$$

We use these formulas in Box 16.2 and Ex. 16.3 to evaluate, numerically, some features of the space shuttle and its rocket engines.

Bernoulli's theorem is a statement that the fluid's energy is conserved along a streamtube. (For conceptual simplicity we shall regard the entire interior of the nozzle as a single streamtube.) By contrast with energy, the fluid's momentum is not conserved, since the fluid pushes against the nozzle wall as it flows. As the subsonic flow accelerates down the nozzle's converging region, the area of its streamtube diminishes, and the momentum flowing per second in the streamtube, (P + ρv²)A, decreases; the momentum is being transferred to the nozzle wall. If the rocket did not have a skirt, but instead opened up completely to the outside world at its throat, the rocket thrust would be

$$T_* = (\rho_* v_*^2 + P_*)A_* = (\gamma + 1)P_* A_*\,, \qquad (16.22)$$

where the second equality uses $v_* = c_*$ at the throat and $\rho_* c_*^2 = \gamma P_*$ [Eq. (16.7)].

This is much less than if momentum had been conserved along the subsonic, accelerating streamtubes.
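Equations (16.21) and (16.22), together with the stagnation relation $P_0/P = [1 + \frac12(\gamma - 1)M^2]^{\gamma/(\gamma-1)}$ that follows from $P \propto c^{2\gamma/(\gamma-1)}$ and Eq. (16.19), are enough to reproduce the rough estimates of Box 16.2. The sketch below assumes γ = 1.4 for the exhaust (Box 16.2 does not quote a value) and uses the box's rounded figures; the function names are our own. With these assumptions it gives $v_e \approx 2\ {\rm km\,s^{-1}}$, $P_e \approx 8 \times 10^4$ Pa, and $P_0$ of roughly 29 atmospheres, the same order as the box's ∼35 atmospheres, whose precise value depends on the γ adopted.

```python
ATM = 101325.0   # pascals per standard atmosphere

def throat_pressure_ratio(gamma):
    """P_*/P_0 at the sonic throat, Eq. (16.21)."""
    return (2.0 / (gamma + 1.0)) ** (gamma / (gamma - 1.0))

def stagnation_pressure_ratio(M, gamma):
    """P_0/P at Mach number M, from P proportional to c**(2 gamma/(gamma-1)) and Eq. (16.19)."""
    return (1.0 + 0.5 * (gamma - 1.0) * M**2) ** (gamma / (gamma - 1.0))

def skirtless_thrust(P0, A_star, gamma):
    """Thrust T_* = (gamma + 1) P_* A_* of a nozzle cut off at its throat, Eq. (16.22)."""
    return (gamma + 1.0) * throat_pressure_ratio(gamma) * P0 * A_star

if __name__ == "__main__":
    gamma = 1.4                                  # assumed; not given in Box 16.2
    T, mdot, A_e, M_e = 2.0e7, 1.0e4, 20.0, 3.0  # rounded Box 16.2 figures, SI units
    v_e = T / mdot                               # valid if P_e << rho_e v_e**2
    P_e = T / (gamma * M_e**2 * A_e)             # since rho_e v_e**2 = gamma P_e M_e**2
    P_0 = P_e * stagnation_pressure_ratio(M_e, gamma)
    print(f"P_*/P_0 = {throat_pressure_ratio(gamma):.4f}")
    print(f"v_e ~ {v_e:.0f} m/s, P_e ~ {P_e:.1e} Pa, P_0 ~ {P_0 / ATM:.0f} atm")
```

For γ = 1.4 the throat pressure is about 53% of the chamber pressure, so the skirtless thrust T∗ is roughly 1.27 P0 A∗.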

Much of the "lost" momentum is regained, and the thrust is made significantly larger than T∗, by the force of the skirt on the streamtube in the diverging part of the nozzle (Fig. 16.6). The nozzle's skirt keeps the flow quasi-one-dimensional well beyond the throat, driving it more and more strongly supersonic. In this accelerating, supersonic flow the tube's momentum flow (P + ρv²)A increases downstream, and there is a compensating increase of the rocket's forward thrust. This skirt-induced force accounts for a significant fraction of the thrust of a well-designed rocket engine.

Rockets work most efficiently when the exit pressure of the gas, as it leaves the base of the skirt, matches the external pressure in the surrounding air. When the pressure in the exhaust is larger than the external pressure, the flow is termed under-expanded, and a pulse of low pressure, known as a rarefaction, will be driven into the escaping gases, causing them to expand and increasing their speed. However, the exhaust will now be pushing on the surrounding air, rather than on the rocket. More thrust could have been exerted on the rocket if the flow had not been under-expanded. By contrast, when the exhaust has a smaller pressure than the surrounding air (i.e., is over-expanded), shock fronts will form near the exit of the nozzle, affecting the fluid flow and sometimes causing separation of the flow from the nozzle's walls. It is important that the nozzle's skirt be shaped so that the exit flow is neither seriously over- nor under-expanded.

****************************
EXERCISES

Exercise 16.2 Derivation: De Laval Nozzle
Verify Eqs. (16.16) and (16.21).

Exercise 16.3 Problem: Space Shuttle's Solid-Fuel Boosters
Use the rough figures in Box 16.2 to estimate the energy released per unit mass in burning the fuel. Does your answer seem reasonable to you?
Exercise 16.4 *** Example: Adiabatic, Spherical Accretion of Gas Onto a Black Hole or Neutron Star
Consider a black hole or neutron star with mass M, at rest in interstellar gas that has constant ratio of specific heats γ. In this exercise you will derive some features of the adiabatic, spherical accretion of the gas onto the hole or star, a problem first solved by Bondi (1952). This exercise shows how gravity can play a role analogous to a De Laval nozzle: it can trigger a transition of the flow from subsonic to supersonic.
(a) Let ρ∞ and c∞ be the density and sound speed in the gas far from the hole (at radius r = ∞). Use dimensional analysis to estimate the rate of accretion of mass Ṁ onto the star or hole, in terms of the parameters of the system: M, γ, ρ∞, c∞, and Newton's gravitation constant G. [Hint: dimensional considerations alone cannot give the answer. Why? Augment your dimensional considerations by a knowledge of how the answer should scale with one of the parameters, e.g., the density ρ∞.]

Box 16.2 Space Shuttle
The Space Shuttle provides many convenient examples of the behavior of supersonic flows. At launch, the shuttle and fuel have a mass ∼2 × 10⁶ kg. The maximum thrust, T ∼ 3 × 10⁷ N, occurs at lift-off and lifts the rocket with an initial acceleration relative to the ground of ∼0.5g. This increases to ∼3g as the fuel is burned and the total mass diminishes. Most of the thrust is produced by two solid-fuel boosters, which burn fuel at a combined rate of ṁ ∼ 10⁴ kg s⁻¹ over a two-minute period. They produce a combined thrust of T ∼ 2 × 10⁷ N averaged over the two minutes, from which we can estimate the speed of the escaping gases as they leave the exhaust. Assuming that this speed is quite supersonic (so $P_e \ll \rho_e v_e^2$), we estimate that $v_e \sim T/\dot m \sim 2\ {\rm km\,s^{-1}}$. Now the combined exit area of the two exhausts is $A_e \sim 20\ {\rm m^2}$, roughly four times the combined throat area, $A_*$. Using Eq. (16.20), we deduce that the exit Mach number is $M_e \sim 3$. The exit pressure is $P_e \sim T/(\gamma M_e^2 A_e) \sim 8 \times 10^4\ {\rm N\,m^{-2}}$, about atmospheric. The stagnation pressure within the combustion region is roughly

$$P_0 \sim P_e\left[1 + \frac{(\gamma - 1)M_e^2}{2}\right]^{\gamma/(\gamma-1)} \sim 35\ \text{atmospheres}. \qquad (1)$$

Of course the actual operation is far more complex than this. For example, to optimize the final altitude, one must allow for the decreasing mass and atmospheric pressure as well as the two-dimensional gas flow through the nozzle.

The space shuttle can also be used to illustrate the properties of shock waves (Sec. 16.5). When the shuttle re-enters the atmosphere, it is traveling highly supersonically. It must therefore be preceded by a strong shock front, which heats the onrushing air and consequently heats the shuttle. The shuttle continues moving supersonically until it reaches an altitude of 15 km, and until then it creates a shock-wave pattern that can be heard on the ground as a sonic boom. The maximum heating rate occurs at an altitude of 70 km. Here, the shuttle moves at V ∼ 7 km s⁻¹ and the sound speed is about 280 m s⁻¹, giving a Mach number of 25. If we adopt a specific-heat ratio γ ∼ 1.5 and a mean molecular weight µ ∼ 10 appropriate to dissociated air, we can conclude from the Rankine-Hugoniot conditions and the ideal gas law, Eq. (4) of Box 12.2, that the post-shock temperature is

$$T \sim \frac{2(\gamma - 1)\mu m_p V^2}{(\gamma + 1)^2 k} \sim 9000\ {\rm K}. \qquad (2)$$

Exposure to gas at this high a temperature heats the nose to ∼1800 K. A second, well-known consequence of this high temperature is that it is sufficient to ionize the air partially as well as to dissociate it. This surrounds the shuttle with a sheath of plasma, which, as we shall discover in Chap. 18, prevents radio communication. The blackout is maintained for about 12 minutes.

(b) Give a simple physical argument, devoid of dimensional considerations, that produces the same answer for Ṁ as you deduced in part (a).
(c) Because the neutron star and black hole are both very compact with intense gravity near their surfaces, the inflowing gas is guaranteed to accelerate to supersonic speeds as it falls in. Explain why the speed will remain supersonic in the case of the hole, but must transition through a shock to subsonic flow near the surface of the neutron star. If the star has the same mass M as the hole, will the details of its accretion flow [ρ(r), c(r), v(r)] be the same as or different from those for the hole, outside the star's shock? Will the mass accretion rates Ṁ be the same or different? Justify your answers, physically.
(d) By combining the Euler equation for v(r) with the equation of mass conservation, $\dot M = 4\pi r^2 \rho v$, and with the sound-speed equation $c^2 = (\partial P/\partial\rho)_s$, show that

$$(v^2 - c^2)\,\frac{1}{\rho}\frac{d\rho}{dr} = \frac{GM}{r^2} - \frac{2v^2}{r}\,. \qquad (16.23)$$

Thereby deduce that the flow speed $v_s$, sound speed $c_s$, and radius $r_s$ at the sonic point (the radius of transition from subsonic to supersonic flow) are related by

$$v_s^2 = c_s^2 = \frac{GM}{2r_s}\,. \qquad (16.24)$$

(e) By combining with the Bernoulli equation (with the effects of gravity included), deduce that the sound speed at the sonic point is related to that at infinity by

$$c_s^2 = \frac{2c_\infty^2}{5 - 3\gamma} \qquad (16.25)$$

and that the radius of the sonic point is

$$r_s = \frac{(5 - 3\gamma)}{4}\,\frac{GM}{c_\infty^2}\,. \qquad (16.26)$$

Thence also deduce a precise value for the mass accretion rate Ṁ in terms of the parameters of the problem. Compare with your estimate of Ṁ in parts (a) and (b). [Comment: For γ = 5/3, which is the value for hot, ionized gas, this analysis places the sonic point at an arbitrarily small radius. In this limiting case (i) general relativistic effects strengthen the gravitational field, thereby moving the sonic point well outside the star or hole, and (ii) your answer for Ṁ has a finite value close to the general relativistic prediction. See Part VI of this book.]
(f) Much of the interstellar medium is hot and ionized, with density about one proton per cubic centimeter and temperature about 10⁴ K. In such a medium, what is the mass accretion rate onto a 10-solar-mass hole, and approximately how long does it take for the hole's mass to double?

****************************
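Equations (16.24)–(16.26) can be checked against one another numerically. The sketch below computes the sonic-point sound speed and radius from Eqs. (16.25) and (16.26), then verifies Eq. (16.24) and the Bernoulli equation with gravity, $c^2/(\gamma - 1) + \frac12 v^2 - GM/r = c_\infty^2/(\gamma - 1)$. The one-solar-mass object, γ = 1.4, and $c_\infty = 10\ {\rm km\,s^{-1}}$ are purely illustrative choices, the function names are our own, and the code deliberately stops short of the accretion rate asked for in part (e).

```python
import math

G = 6.674e-11         # Newton's constant, SI units
M_SUN = 1.989e30      # solar mass in kg

def sonic_point(M, c_inf, gamma):
    """Sound speed c_s and radius r_s of the sonic point, Eqs. (16.25)-(16.26)."""
    if not 1.0 < gamma < 5.0 / 3.0:
        raise ValueError("Eqs. (16.25)-(16.26) assume 1 < gamma < 5/3")
    c_s = math.sqrt(2.0 * c_inf**2 / (5.0 - 3.0 * gamma))
    r_s = 0.25 * (5.0 - 3.0 * gamma) * G * M / c_inf**2
    return c_s, r_s

def bernoulli_residual(c, v, r, M, c_inf, gamma):
    """Bernoulli with gravity minus its value at infinity (should vanish)."""
    return c**2 / (gamma - 1.0) + 0.5 * v**2 - G * M / r - c_inf**2 / (gamma - 1.0)

if __name__ == "__main__":
    M, c_inf, gamma = M_SUN, 1.0e4, 1.4     # illustrative parameters
    c_s, r_s = sonic_point(M, c_inf, gamma)
    print(f"c_s = {c_s:.3e} m/s, r_s = {r_s:.3e} m")
    print("Eq. (16.24) check:", c_s**2, G * M / (2.0 * r_s))
    print("Bernoulli residual:", bernoulli_residual(c_s, c_s, r_s, M, c_inf, gamma))
```

The explicit guard on γ mirrors the comment in part (e): as γ → 5/3 the sonic radius shrinks toward zero and Newtonian gravity no longer suffices.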

16.4 One-Dimensional, Time-Dependent Flow

16.4.1 Riemann Invariants

Let us turn now to time-dependent flows. Again we confine our attention to the simplest situation that illustrates the physics, in this case truly one-dimensional motion of an isentropic fluid in the absence of viscosity, thermal conductivity, and gravity, so the flow is adiabatic as well as isentropic (entropy constant in time as well as space). The motion of the gas in such a flow is described by the equation of continuity and the Euler equation specialized to one dimension:

$$\frac{d\rho}{dt} = -\rho\frac{\partial v}{\partial x}\,, \qquad \frac{dv}{dt} = -\frac{1}{\rho}\frac{\partial P}{\partial x}\,, \qquad (16.27)$$

where

$$\frac{d}{dt} = \frac{\partial}{\partial t} + v\frac{\partial}{\partial x} \qquad (16.28)$$

is the convective time derivative (the time derivative moving with the fluid). Given an isentropic equation of state P = P(ρ) that relates the pressure to the density, these two nonlinear equations can be combined into a single second-order differential equation in the velocity. However, it is more illuminating to work with the first-order set. As the gas is isentropic, the density ρ and sound speed c = (dP/dρ)^{1/2} can both be regarded as functions of a single thermodynamic variable, which we choose to be the pressure. Rewriting the continuity equation as $dP/dt = -\rho c^2\,\partial v/\partial x$ and taking linear combinations of Eqs. (16.27), we obtain two partial differential equations

$$\frac{\partial v}{\partial t} \pm \frac{1}{\rho c}\frac{\partial P}{\partial t} + (v \pm c)\left(\frac{\partial v}{\partial x} \pm \frac{1}{\rho c}\frac{\partial P}{\partial x}\right) = 0\,, \qquad (16.29)$$

which together are equivalent to Eqs. (16.27). We can rewrite these equations in terms of Riemann invariants

$$J_\pm \equiv v \pm \int\frac{dP}{\rho c} \qquad (16.30)$$

and characteristic speeds

$$V_\pm \equiv v \pm c \qquad (16.31)$$

in the following way:

$$\left(\frac{\partial}{\partial t} + V_\pm\frac{\partial}{\partial x}\right)J_\pm = 0\,. \qquad (16.32)$$

Equation (16.32) tells us that the convective derivative of each Riemann invariant J± vanishes for an observer who moves, not with the fluid speed, but, instead, with the speed V±. We say that each Riemann invariant is conserved along its characteristic (denoted by C±), which is a path through spacetime satisfying

$$C_\pm:\quad \frac{dx}{dt} = v \pm c\,. \qquad (16.33)$$

Note that in these equations, both v and c are functions of x and t.

Fig. 16.7: Spacetime diagram (t versus x) showing the characteristics (thin solid and dashed lines) for a one-dimensional adiabatic flow of an isentropic gas. The paths of the fluid elements are shown as thick solid lines. Initial data are presumed to be specified over some interval ∂S of x at time t = 0. The Riemann invariant J+ is constant along each characteristic C+ (thin dashed line) and thus at point P it has the same value, unchanged, as at point A in the initial data. Similarly, J− is invariant along each characteristic C− (thin solid line) and thus at P it has the same value as at B. The shaded area of spacetime is the domain of dependence S of ∂S.

The characteristics have a natural interpretation. They describe the motion of small disturbances traveling backward and forward relative to the fluid at the local sound speed. As seen in the fluid’s local rest frame v = 0, two neighboring events in the flow, separated by a small time interval ∆t and a space interval ∆x = +c∆t so that they lie on the same C+ characteristic, will have small velocity and pressure differences satisfying ∆v = −∆P/ρc [as one can deduce from Eqs. (16.27) with v = 0, d/dt = ∂/∂t and c2 = dP/dρ]. Now, for a linear sound wave, propagating along the positive x direction, ∆v and ∆P will separately vanish. However in a nonlinear wave, only the combination ∆J+ = ∆v + ∆P/ρc will vanish along C+ . Integrating over a finite interval of time, we recover the constancy of J+ along the characteristic C+ [Eq. (16.30)]. The Riemann invariants provide a general method for deriving the details of the flow from initial conditions. Suppose that the fluid velocity and the thermodynamic variables are specified over an interval of x, designated ∂S, at an initial time t = 0 [Fig. 16.7]. This means that J± are also specified over this interval. We can then determine J± at any point P in the domain of dependence S of ∂S (i.e., at any point linked to ∂S by two characteristics C± ) by simply propagating each of J± unchanged along its characteristic. From these values of J± at P, we can solve algebraically for all the other flow variables (v, P , ρ, ...) at P. To learn the evolution outside the domain of dependence S, we must specify the initial conditions outside ∂S. In practice, we do not actually know the characteristics C± until we have solved for the flow variables, so we must solve for the characteristics as part of the solution process. 
This means, in practice, that the solution involves algebraic manipulations of (i) the equation of state and the relations $J_\pm = v \pm \int dP/\rho c$, which give J± in terms of v and c; and (ii) the conservation laws that J± are constant along C±, i.e., along curves dx/dt = v ± c.
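For a perfect gas with constant γ, the integral in Eq. (16.30) is $\int dP/\rho c = 2c/(\gamma - 1)$ (because $dP = c^2\,d\rho$ and $c \propto \rho^{(\gamma-1)/2}$), so steps (i) and (ii) become very simple. The sketch below converts (v, c) to (J+, J−) and back, and performs one first-order method-of-characteristics step: it approximates the C+ characteristic leaving point A and the C− characteristic leaving point B (both at t = 0) as straight lines, finds where they meet, and gives that point J+ from A and J− from B, as in Fig. 16.7. The straight-line approximation and the function names are our own assumptions; a practical solver would repeat such steps on a fine grid and refine the characteristic slopes iteratively.

```python
def riemann_invariants(v, c, gamma):
    """J+ and J- = v +/- 2c/(gamma - 1) for a perfect gas (Eq. 16.30)."""
    q = 2.0 * c / (gamma - 1.0)
    return v + q, v - q

def flow_from_invariants(J_plus, J_minus, gamma):
    """Solve algebraically for the fluid speed v and sound speed c."""
    v = 0.5 * (J_plus + J_minus)
    c = 0.25 * (gamma - 1.0) * (J_plus - J_minus)
    return v, c

def characteristic_step(xA, vA, cA, xB, vB, cB, gamma):
    """Intersect C+ from A with C- from B (xA < xB, both at t = 0).

    Slopes dx/dt = v +/- c are frozen at their starting values (first order).
    Returns (xP, tP, vP, cP) at the intersection point P.
    """
    V_plus = vA + cA
    V_minus = vB - cB
    if V_plus <= V_minus:
        raise ValueError("characteristics do not meet at t > 0")
    tP = (xB - xA) / (V_plus - V_minus)
    xP = xA + V_plus * tP
    J_plus, _ = riemann_invariants(vA, cA, gamma)    # carried along C+ from A
    _, J_minus = riemann_invariants(vB, cB, gamma)   # carried along C- from B
    vP, cP = flow_from_invariants(J_plus, J_minus, gamma)
    return xP, tP, vP, cP

if __name__ == "__main__":
    # Illustrative data: gas at A moving right at 0.1 (units of c), gas at B at rest.
    print(characteristic_step(-1.0, 0.1, 1.0, 1.0, 0.0, 1.0, 1.4))
```

In a uniform fluid at rest the two characteristics meet midway at t = (xB − xA)/2c, and the state at P is unchanged, as it should be.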

Fig. 16.8: Evolution of a nonlinear sound wave, plotted as the wave variable q versus x: (a) at t = 0; (b) at t ≫ 0; q0 is the value of q in the undisturbed fluid. The fluid at the crest of the wave moves faster than the fluid in the trough. Mathematically, the flow eventually becomes triple-valued. Physically, a shock wave develops.

These algebraic manipulations have the goal of deducing c(x, t) and v(x, t) from the initial conditions on ∂S. We shall exhibit a specific example in the next subsection.

We can use Riemann invariants to understand qualitatively how a nonlinear sound wave evolves with time. If the wave propagates in the positive x direction into previously undisturbed fluid (fluid with v = 0), then the J− invariant, propagating backward along C−, is constant everywhere, so $v = \int dP/\rho c + \text{constant}$. Let us use $q \equiv \int dP/\rho c$ as our wave variable. For a perfect gas with constant ratio of specific heats γ, q = 2c/(γ − 1), so our oscillating wave variable is essentially the oscillating sound speed. Constancy of J− then says that v = q − q0, where q0 is the stagnation value of q, i.e., the value of q in the undisturbed fluid. Now, J+ = v + q is conserved on each rightward characteristic C+, and so both v and q are separately conserved on each C+. If we sketch a profile of the wave pulse as in Fig. 16.8 and measure its amplitude using the quantity q, then the relation v = q − q0 says that the fluid at the crest of the wave moves faster than the fluid in a trough. This causes the leading edge of the wave to steepen, a process we have already encountered in our discussion of shallow-water solitons (Chap. 15). Now, sound waves, by contrast with shallow-water waves, are non-dispersive, so the steepening will continue until |dv/dx| → ∞ (Fig. 16.8). When the velocity gradient becomes sufficiently large, viscosity and dissipation will become strong, producing an increase of entropy and a breakdown of our isentropic flow. This breakdown and entropy increase will occur in an extremely thin region: a shock wave, which we shall study in Sec. 16.5.
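The steepening argument can be quantified. On each C+ of such a simple wave, v and q are constant, and combining q = 2c/(γ − 1) with v = q − q0 gives c = c0 + (γ − 1)v/2, where c0 is the undisturbed sound speed. Each C+ is therefore a straight line $x = x_0 + V(x_0)\,t$ with $V = v + c = c_0 + \frac12(\gamma + 1)v$, and neighboring C+ lines first cross (so |dv/dx| first becomes infinite) at $t_b = 1/\max(-dV/dx_0)$. The sketch below evaluates $t_b$ by finite differences for an assumed sinusoidal initial profile $v = A\sin kx_0$ (our illustrative choice, not from the text), for which $t_b = 2/[(\gamma + 1)Ak]$ exactly.

```python
import math

def breaking_time(v_profile, x0_grid, c0, gamma):
    """Time at which C+ characteristics of a simple wave first cross.

    Each C+ is the straight line x = x0 + V(x0) t with V = c0 + (gamma + 1) v / 2,
    so the profile first becomes vertical at t_b = 1 / max(-dV/dx0).
    dV/dx0 is approximated by forward differences on x0_grid.
    """
    V = [c0 + 0.5 * (gamma + 1.0) * v_profile(x) for x in x0_grid]
    slopes = [(V[i + 1] - V[i]) / (x0_grid[i + 1] - x0_grid[i])
              for i in range(len(V) - 1)]
    steepest = max(-s for s in slopes)
    if steepest <= 0.0:
        return math.inf        # no compressive region: the wave never breaks
    return 1.0 / steepest

if __name__ == "__main__":
    A, k, c0, gamma = 0.05, 2.0 * math.pi, 1.0, 1.4   # illustrative; A in units of c0
    grid = [i / 2000 for i in range(2001)]            # one wavelength, 0 <= x0 <= 1
    t_num = breaking_time(lambda x: A * math.sin(k * x), grid, c0, gamma)
    t_exact = 2.0 / ((gamma + 1.0) * A * k)
    print(t_num, t_exact)
```

The two printed times agree closely. Since $t_b \propto 1/(Ak)$, weaker or longer-wavelength waves take proportionally longer to steepen into shocks.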

16.4.2 Shock Tube

We have shown how one-dimensional isentropic flows can be completely analyzed by propagating the Riemann invariants along characteristics. Let us illustrate this in more detail by analyzing a shock tube, a laboratory device for creating supersonic flows and studying the behavior of shock waves. In a shock tube, high-pressure gas is retained at rest in the left half of a long tube by a thin membrane. At time t = 0, the membrane is ruptured by a laser beam and the gas rushes into the tube's right half, which has usually been evacuated. Diagnostic photographs and velocity and pressure measurements are synchronized with the onset of the flow.

Fig. 16.9: Shock tube. (a) At t ≤ 0 gas is held at rest at high pressure P0 in the left half of the tube; the right half is vacuum. (b) At t > 0 the high-pressure gas moves rightward down the tube at high speed, and a rarefaction wave propagates leftward at the sound speed. (c) Spacetime diagram showing the flow's characteristics (C+: thin dashed lines; C−: thin solid lines) and fluid paths (thick solid lines). To the left of the rarefaction wave, x < −c0t, the fluid is undisturbed. To the right of the gas front, x > [2/(γ − 1)]c0t, there is undisturbed (near) vacuum.

Let us idealize the operation of a shock tube by assuming, once more, that the gas is perfect with constant γ, so that P ∝ ρ^γ. For times t ≤ 0, we suppose that the gas has uniform density ρ0 and pressure P0 (and consequently uniform sound speed c0) at x ≤ 0, and that ρ = P = 0 at x ≥ 0. At time t = 0, the barrier is removed and the gas flows towards positive x. Now, the first Riemann invariant J+ is conserved on each C+, and every C+ originates in the static gas, so it has the value

$$J_+ = v + \frac{2c}{\gamma - 1} = \frac{2c_0}{\gamma - 1}\,. \qquad (16.34)$$

Note that in this case the invariant is the same on all rightward characteristics, i.e. throughout the flow, so

$$ v = \frac{2(c_0 - c)}{\gamma - 1} \quad \text{everywhere.} \tag{16.35} $$

The second invariant is

$$ J_- = v - \frac{2c}{\gamma - 1} . \tag{16.36} $$

Its constant values are not so easy to identify because those characteristics C− that travel through the perturbed flow all emerge from the origin, where v and c are indeterminate; cf. Fig. 16.9. However, Eqs. (16.35) and (16.36) are two independent linear relations between v and c, so combining them we deduce that v and c are separately constant on each characteristic C−. This enables us, trivially, to solve the differential equation dx/dt = v − c for the leftward characteristics C−, obtaining

$$ C_-:\quad x = (v - c)\,t . \tag{16.37} $$

Here we have set the constant of integration equal to zero so as to obtain all the characteristics that propagate through the perturbed fluid. (For those in the unperturbed fluid, v = 0 and c = c0, so x = x0 − c0t, with x0 < 0 the characteristic's initial location.) Now Eq. (16.37) is true on each characteristic in the perturbed fluid. Therefore it is true throughout the perturbed fluid. We can therefore combine Eqs. (16.35) and (16.37) to solve for v(x, t) and c(x, t) throughout the perturbed fluid. That solution, together with the obvious solution (same as the initial data) to the left and right of the perturbed fluid, is:

$$
\begin{aligned}
&v = 0 , \quad c = c_0 && \text{at } x < -c_0 t ,\\
&v = \frac{2}{\gamma+1}\left(c_0 + \frac{x}{t}\right) , \quad c = \frac{2c_0}{\gamma+1} - \frac{\gamma-1}{\gamma+1}\,\frac{x}{t} && \text{at } -c_0 t < x < \frac{2c_0}{\gamma-1}\,t ,\\
&\text{vacuum prevails} && \text{at } x > \frac{2c_0}{\gamma-1}\,t .
\end{aligned}
\tag{16.38}
$$
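Equation (16.38) is easy to evaluate directly. The Python sketch below is a minimal illustration (the function name `free_expansion` is hypothetical). It also uses the adiabatic relation P ∝ ρ^γ, under which c² ∝ ρ^(γ−1), to recover the density ratio ρ/ρ0 = (c/c0)^(2/(γ−1)). In the vacuum region the fluid velocity is undefined, so the sketch returns NaN there.

```python
import math

def free_expansion(x, t, c0, gamma):
    """Return (v, c, rho/rho0) at position x and time t > 0 from Eq. (16.38).

    Gas with sound speed c0, initially at rest, fills x < 0; vacuum fills x > 0.
    """
    if x <= -c0 * t:
        # Not yet reached by the leftward rarefaction wave: undisturbed gas.
        return 0.0, c0, 1.0
    if x >= 2.0 * c0 * t / (gamma - 1.0):
        # Beyond the gas front: (near) vacuum, where v is undefined.
        return math.nan, 0.0, 0.0
    # Inside the rarefaction region, where x/t = v - c [Eq. (16.37)].
    v = 2.0 / (gamma + 1.0) * (c0 + x / t)
    c = 2.0 * c0 / (gamma + 1.0) - (gamma - 1.0) / (gamma + 1.0) * x / t
    # P proportional to rho**gamma implies c**2 proportional to rho**(gamma - 1).
    rho_ratio = (c / c0) ** (2.0 / (gamma - 1.0))
    return v, c, rho_ratio


c0, gamma, t = 1.0, 1.4, 1.0
for x in (-1.5, -0.5, 0.0, 2.0, 4.9):
    v, c, r = free_expansion(x, t, c0, gamma)
    print(f"x = {x:5.2f}   v = {v:.3f}   c = {c:.3f}   rho/rho0 = {r:.3f}")
```

A useful sanity check on such a sketch is that, inside the rarefaction region, v + 2c/(γ − 1) reproduces 2c0/(γ − 1) [Eq. (16.34)] and v − c reproduces x/t [Eq. (16.37)] to rounding error.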

Notice, in this solution, that the gas at x < 0 remains at rest until a rarefaction wave from the origin reaches it. Thereafter it is accelerated rightward by the local pressure gradient, and as it accelerates it expands and cools, so its speed of sound c goes down; asymptotically it reaches zero temperature, as exhibited by c = 0, and an asymptotic speed v = 2c0/(γ − 1) [cf. Eq. (16.34)]; see Fig. 16.9. In the expansion, the internal random velocity of the gas molecules is transformed into an ordered velocity, just as in a rocket's exhaust. However, the total energy per unit mass in the stationary gas is $u = c_0^2/[\gamma(\gamma-1)]$ [Eq. (16.8)], which is less than the asymptotic kinetic energy per unit mass, $\tfrac12 v^2 = 2c_0^2/(\gamma-1)^2$. The extra energy is supplied, through pressure work, by the gas behind, which is still struggling to reach its asymptotic speed.

In the more realistic case where there initially is some low-density gas in the evacuated half of the tube, the expanding driver gas creates a strong shock as it plows into the low-density gas. In the next section we shall explore the structure of this and other shock fronts.

****************************
EXERCISES

Exercise 16.5 Problem: Fluid Paths in Free Expansion
We have computed the velocity field for a freely expanding gas, Eq. (16.38). Use this result to show that the path of an individual fluid element is

$$ x = \frac{2c_0 t}{\gamma - 1} + \frac{\gamma + 1}{\gamma - 1}\, x_0 \left(\frac{-c_0 t}{x_0}\right)^{2/(\gamma+1)} \qquad \text{at } 0 < -x_0/c_0 < t . $$

Exercise 16.6 Problem: Riemann Invariants for Shallow-Water Flow
Consider the one-dimensional flow of shallow water in a straight, narrow channel, neglecting dispersion and boundary layers. The equations governing the flow, as derived and discussed in Chap. 15, are

$$ \frac{\partial h}{\partial t} + \frac{\partial (hv)}{\partial x} = 0 , \qquad \frac{\partial v}{\partial t} + v\frac{\partial v}{\partial x} + g\frac{\partial h}{\partial x} = 0 ; \tag{16.39} $$

cf. Eqs. (15.23a) and (15.23b). Here h(x, t) is the height of the water and v(x, t) is its depth-independent velocity.
(a) Find two Riemann invariants J± for these equations, and find two conservation laws for these J± which are equivalent to the shallow-water equations (16.39).
(b) Use these Riemann invariants to demonstrate that shallow-water waves steepen in the manner depicted in Fig. 15.4, a manner analogous to the peaking of the nonlinear sound wave in Fig. 16.8.
(c) Use these Riemann invariants to solve for the flow of water h(x, t) and v(x, t) after a dam breaks (the problem posed in Ex. 15.8). The initial conditions, at t = 0, are v = 0 everywhere, and h = h0 at x < 0, h = 0 (no water) at x > 0.
****************************

[Figure 16.10: a shock at rest; upstream fluid (P1, u1, V1 = 1/ρ1) enters the shock's front side with speed v1, and downstream fluid (P2, u2, V2 = 1/ρ2) leaves its back side with speed v2.]

Fig. 16.10: Terminology and notation for a shock front and the flow into and out of it.

16.5 Shock Fronts

We have just demonstrated that in an ideal fluid, large perturbations to fluid dynamical variables inevitably evolve to form a divergently large velocity gradient—a shock front or a shock wave or simply a shock. Now, when the gradient becomes large, we can no longer ignore the viscous stress because the viscous terms in the Navier-Stokes equation involve second derivatives in space, whereas the inertial term involves only first derivatives. As in turbulence and in boundary layers, so also in a shock front, the viscous stress converts the fluid’s ordered, bulk kinetic energy into microscopic kinetic energy, i.e. thermal energy. The ordered fluid velocity v thereby is rapidly—almost discontinuously—reduced from supersonic to subsonic, and the fluid is heated. The cooler, supersonic region of incoming fluid is said to be ahead of or upstream from the shock, and it hits the shock’s front side; the hotter, subsonic region of outgoing fluid is said to be behind or downstream from the shock, and it emerges from the shock’s back side; see Fig. 16.10.


16.5.1 Junction Conditions Across a Shock; Rankine-Hugoniot Relations

The viscosity is crucial to the internal structure of the shock, but it is just as negligible in the downstream flow behind the shock as in the upstream flow ahead of the shock, since there velocity gradients are modest again. Remarkably, if (as is usually the case) the shock front is very thin compared to the length scales in the upstream and downstream flows, and the time for the fluid to pass through the shock is short compared to the upstream and downstream timescales, then we can deduce the net influence of the shock on the flow without any reference to the viscous processes that operate within the shock, and without reference to the shock's detailed internal structure. We do so by treating the shock as a discontinuity across which certain junction conditions must be satisfied. This is similar to electromagnetic theory, where the junction conditions for the electric and magnetic fields across a material interface are independent of the detailed structure of the interface. The keys to the shock's junction conditions are the conservation laws for mass, momentum and energy: the fluxes of mass, momentum, and energy must usually be the same in the downstream flow, emerging from the shock, as in the upstream flow, entering the shock. To understand this, we first note that, because the time to pass through the shock is so short, mass, momentum and energy cannot accumulate in the shock, so the flow can be regarded as stationary. In a stationary flow the mass flux is always constant, as there is no way to create new mass; its continuity across the shock can be written as

$$ [\rho\,\mathbf{v}\cdot\mathbf{n}] = 0 , \tag{16.40a} $$

where n is the unit normal to the shock front and the square bracket means the difference in the values on the downstream and upstream sides of the shock. Similarly, the total momentum flux, T · n, must be conserved in the absence of external forces. Now T has both a mechanical component, Pg + ρv ⊗ v, and a viscous component, −ζθg − 2ησ. However, the viscous component is negligible in the upstream and downstream flows, which are being matched to each other, so the mechanical component by itself must be conserved across the shock front:

$$ [(P\,\mathbf{g} + \rho\,\mathbf{v}\otimes\mathbf{v})\cdot\mathbf{n}] = 0 . \tag{16.40b} $$

Similar remarks apply to the energy flux, though here we must be slightly more restrictive. There are three ways that a change in the energy flux could occur. First, energy may be added to the flow by chemical or nuclear reactions that occur in the shock front. Second, the gas may be heated to such a high temperature that it will lose energy through the emission of radiation. Third, energy may be conducted far upstream by suprathermal particles so as to preheat the incoming gas. This will thicken the shock front and may make it so thick that it can no longer be sensibly approximated as a discontinuity. If any of these processes is occurring, we must check whether it is strong enough to significantly influence energy conservation across the shock. Such a check often reveals that preheating is negligible, and that the length scales over which the chemical and nuclear reactions and radiation emission operate are much greater than the length over which viscosity acts. In this case we can conserve energy flux across the viscous shock and then follow the evolutionary effects of reactions and radiation (if significant) in the downstream flow.

A shock with negligible preheating, and with negligible radiation emission and chemical and nuclear reactions inside the shock, will have the same energy flux in the departing, downstream flow as in the entering, upstream flow, i.e., it will satisfy

$$ \left[\left(\tfrac12 v^2 + h\right)\rho\,\mathbf{v}\cdot\mathbf{n}\right] = 0 . \tag{16.40c} $$

Shocks which satisfy the conservation laws of mass, momentum and energy, Eqs. (16.40), are said to be adiabatic. In contrast to mass, momentum and energy, the flux of entropy will not be conserved across a shock front, since viscosity and other dissipative processes increase the entropy as the fluid flows through the shock. So far, the only type of dissipation which we have discussed is viscosity, and this is sufficient by itself to produce a shock front and keep it thin. However, heat conduction, which we shall analyze in the following chapter, and electrical resistivity, which is important in magnetic shocks (Chap. 18), can also contribute to the dissipation and can influence the detailed structure of the shock front.

For an adiabatic shock, the three requirements of mass, momentum and energy conservation, known collectively as the Rankine-Hugoniot relations, enable us to relate the downstream flow and its thermodynamic variables to their upstream counterparts.¹ Let us work in a reference frame where the incoming flow is normal to the shock front and the shock is at rest, so the flow is stationary. Then the conservation of tangential momentum (the tangential component of Eq. (16.40b)) tells us that the outgoing flow is also normal to the shock in our chosen reference frame. We say that the shock is normal, not oblique. We use the subscripts 1, 2 to denote quantities measured ahead of and behind the shock respectively; i.e., 1 is the incoming flow and 2 is the outgoing flow (cf. Fig. 16.10 above).
The Rankine-Hugoniot relations (16.40) then take the forms

$$ \rho_2 v_2 = \rho_1 v_1 = j , \tag{16.41a} $$
$$ P_2 + \rho_2 v_2^2 = P_1 + \rho_1 v_1^2 , \tag{16.41b} $$
$$ h_2 + \tfrac12 v_2^2 = h_1 + \tfrac12 v_1^2 , \tag{16.41c} $$

where j is the mass flux, which is determined by the upstream flow. These equations can be brought into a more useful form by replacing the density with the specific volume V ≡ 1/ρ, replacing the enthalpy by its value in terms of P and V, h = u + P/ρ = u + PV, and performing some algebra; the result is

$$ u_2 - u_1 = \tfrac12 (P_1 + P_2)(V_1 - V_2) , \tag{16.42a} $$
$$ j^2 = \frac{P_2 - P_1}{V_1 - V_2} , \tag{16.42b} $$
$$ v_1 - v_2 = [(P_2 - P_1)(V_1 - V_2)]^{1/2} . \tag{16.42c} $$

This is the most widely used form of the Rankine-Hugoniot relations. It must be augmented by the equation of state in the form

$$ u = u(P, V) . \tag{16.43} $$

¹ The existence of shocks was actually understood quite early on, more or less in this way, by Stokes. However, he was persuaded by his former student Rayleigh that such discontinuities were impossible because they would violate energy conservation. With a deference that professors traditionally show their students, Stokes believed him.

[Figure 16.11: pressure P versus specific volume V, showing the upstream state 1 at (V1, P1), the shock adiabat through it, a possible downstream state 2 at (V2, P2) up and to the left, dashed adiabats s = constant through both states, the direction of increasing entropy, and the dotted chord of slope −j².]

Fig. 16.11: Shock Adiabat. The pressure and specific volume V = 1/ρ in the upstream flow are P1 and V1, and in the downstream flow P2 and V2. The dashed curves are ordinary adiabats (curves of constant entropy per unit mass s). The thick curve is the shock adiabat, the curve of allowed downstream states (V2, P2) for a given upstream state (V1, P1). The actual location of the downstream state on this adiabat is determined by the mass flux j flowing through the shock: the slope of the dotted line connecting the upstream and downstream states is −j².

Some of the physical content of these Rankine-Hugoniot relations is depicted in Fig. 16.11. The thermodynamic state of the upstream (incoming) fluid is the point (V1, P1) in this volume-pressure diagram. The thick solid curve, called the shock adiabat, is the set of all possible downstream (outgoing) fluid states. This shock adiabat can be computed by combining Eq. (16.42a) with the equation of state (16.43). Those equations will actually give a curve that extends away from (V1, P1) in both directions, up-leftward and down-rightward. Only the up-leftward portion is compatible with an increase of entropy across the shock; the down-rightward portion requires an entropy decrease, which is forbidden by the second law of thermodynamics, and therefore is not drawn in Fig. 16.11. The actual location (V2, P2) of the downstream state along the shock adiabat is determined by Eq. (16.42b) in a simple way: the slope of the dotted line connecting the upstream and downstream states is −j², where j is the mass flux passing through the shock. Once (V2, P2) is known, one can compute the downstream speed v2 from Eq. (16.42c). It can be shown that the pressure and density always increase across a shock (as is the case in Fig. 16.11), and the fluid always decelerates:

$$ P_2 > P_1 , \qquad V_2 < V_1 , \qquad v_2 < v_1 ; \tag{16.44} $$

see Ex. 16.10. It can also be demonstrated in general, and will be verified in a particular case below, that the Rankine-Hugoniot relations require the flow to be supersonic with respect to the shock front upstream, v1 > c1, and subsonic downstream, v2 < c2. Physically, this is sensible (as we have seen above): when the fluid approaches the shock supersonically, it is not possible to communicate a pressure pulse upstream from the shock (via a Riemann invariant moving at the speed of sound) and thereby cause the flow to decelerate; therefore, to slow the flow a shock must develop.² By contrast, the shock front can and does respond to changes in the downstream conditions, since it is in causal contact with the downstream flow; sound waves and a Riemann invariant can propagate upstream, through the downstream flow, to the shock.
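For a perfect gas the shock adiabat can be written down explicitly, which makes the statements above easy to check. The Python sketch below assumes u = PV/(γ − 1), the equation of state used in Sec. 16.5.3, and the standard perfect-gas entropy s = c_V ln(PV^γ) + const; the helper name `perfect_gas_shock` is hypothetical. Solving Eq. (16.42a) for V2 gives the adiabat, Eq. (16.42b) gives j, and v = jV on each side by Eq. (16.41a).

```python
import math

def perfect_gas_shock(P1, V1, P2, gamma):
    """Downstream state on the shock adiabat of a perfect gas, u = P V / (gamma - 1).

    Given the upstream state (P1, V1) and a downstream pressure P2, returns
    (V2, j, v1, v2, ds_over_cV), where ds_over_cV is the jump in specific entropy
    divided by the specific heat at constant volume.
    """
    if P2 == P1:
        raise ValueError("P2 must differ from P1")
    # Eq. (16.42a) with u = PV/(gamma - 1) is linear in V2; solving it gives
    # the shock adiabat of Fig. 16.11 for this equation of state.
    V2 = V1 * ((gamma + 1.0) * P1 + (gamma - 1.0) * P2) / (
        (gamma - 1.0) * P1 + (gamma + 1.0) * P2)
    j = math.sqrt((P2 - P1) / (V1 - V2))   # Eq. (16.42b): chord slope is -j**2
    v1, v2 = j * V1, j * V2                # Eq. (16.41a) with rho = 1/V
    # Perfect-gas entropy per unit mass: s = c_V ln(P V**gamma) + constant.
    ds_over_cV = math.log((P2 * V2**gamma) / (P1 * V1**gamma))
    return V2, j, v1, v2, ds_over_cV


gamma = 1.4
for P2 in (10.0, 0.5):  # compressive branch vs. the excluded down-rightward branch
    V2, j, v1, v2, ds = perfect_gas_shock(1.0, 1.0, P2, gamma)
    M1 = v1 / math.sqrt(gamma * 1.0 * 1.0)   # sound speed c = sqrt(gamma P V)
    M2 = v2 / math.sqrt(gamma * P2 * V2)
    print(f"P2/P1 = {P2}: V2/V1 = {V2:.4f}, M1 = {M1:.3f}, M2 = {M2:.3f}, ds/cV = {ds:+.4f}")
```

With P2/P1 = 10 the sketch gives a supersonic inflow (M1 > 1), a subsonic outflow (M2 < 1) and ds > 0. With P2/P1 = 0.5, on the down-rightward branch, the inflow is subsonic and ds < 0, which is why that branch is excluded from Fig. 16.11.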

16.5.2 Internal Structure of Shock

Although they are often regarded as discontinuities, shocks, like boundary layers, do have structure. The simplest case is that of a gas in which the shear viscosity coefficient is molecular in origin and is given by $\eta = \rho\nu \sim \rho\, l_{\rm mfp} \bar v_{\rm th}/3$, where $l_{\rm mfp}$ is the molecular mean free path and $\bar v_{\rm th} \sim c$ is the mean thermal speed of the molecules. In this case the viscous stress $T_{xx} = -\zeta\theta - 2\eta\sigma_{xx}$ is $-(\zeta + 4\eta/3)\,dv/dx$, where ζ is the coefficient of bulk viscosity, which can be of the same order as the coefficient of shear viscosity. In the shock, this must roughly balance the total kinetic momentum flux ∼ ρv². If we estimate the velocity gradient dv/dx by v1/δS, where δS is a measure of the shock thickness, and we estimate the sound speed in the shock front by c ∼ v1, then the balance $(\zeta + 4\eta/3)\,v_1/\delta_S \sim \rho v_1^2$ with $\zeta \sim \eta \sim \rho\, l_{\rm mfp} v_1/3$ tells us that the shock thickness is $\delta_S \sim l_{\rm mfp}$, the collision mean free path in the gas. For air at standard temperature and pressure, the mean free path is $l_{\rm mfp} \sim (\sqrt{2}\, n\pi\sigma^2)^{-1} \sim 70$ nm, where n is the molecular density and σ is the molecular diameter. This is very small! Microscopically, it makes sense that $\delta_S \sim l_{\rm mfp}$, as an individual molecule needs only a few collisions to randomize its ordered motion perpendicular to the shock front. However, this estimate raises a problem, as it brings into question our use of the continuum approximation (cf. Sec. 12.1). It turns out that, when a more careful calculation of the shock structure is carried out incorporating heat conduction, the shock thickness is several mean free paths, fluid dynamics is acceptable for an approximate theory, and the results are in rough accord with measurements of the velocity profiles of shocks with modest Mach numbers. Despite this, a kinetic treatment is usually necessary for an accurate description of the shock structure. So far we have assumed that the shocked fluid is made of uncharged molecules. A more complicated type of shock can arise in an ionized gas, i.e. a plasma (Part V). Shocks in the solar wind are examples.
In this case, the collision mean free paths are enormous, in fact comparable with the transverse size of the shock, and therefore one might expect the shocks to be so thick that the Rankine-Hugoniot relations would fail. However, spacecraft measurements reveal solar-wind shocks that are relatively thin, far thinner than the collisional mean free paths of the plasma's electrons and ions. In this case, it turns out that collisionless, collective interactions in the plasma are responsible for the viscosity and dissipation. (The particles create plasma waves, which in turn deflect the particles.) These processes are so efficient that thin shock fronts can occur without individual particles having to hit one another. Since the shocks are thin, they must satisfy the Rankine-Hugoniot relations. We shall discuss collisionless shocks further in Chap. 20.

² Of course, if there is some faster means of communication, for example photons or, in an astrophysical context, cosmic rays, then there may be causal contact between the shock and the inflowing gas, and this can either prevent shock formation or lead to a more complex shock structure.

To summarize, shocks are machines that decelerate a normally incident upstream flow to a subsonic speed, so it can be in causal contact with conditions downstream. In the process, bulk momentum flux, ρv², is converted into pressure, bulk kinetic energy is converted into internal energy, and entropy is manufactured by the dissipative processes at work in the shock front. For a given shock Mach number, the downstream conditions are fixed by the conservation laws of mass, momentum and energy and are independent of the detailed dissipation mechanism.
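The collisional estimate $l_{\rm mfp} \sim 70$ nm for air, used above to size the shock thickness, is easy to reproduce. This Python sketch evaluates the hard-sphere formula from the text with n = P/(k_B T); the effective molecular diameter d ≈ 3.7 × 10⁻¹⁰ m and the choice T = 293 K are our assumptions, not values from the text, so only the order of magnitude is meaningful.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K (exact in the 2019 SI)

def mean_free_path(P, T, d):
    """Hard-sphere mean free path l = 1 / (sqrt(2) n pi d**2) with n = P / (k_B T).

    P in Pa, T in K, d (effective molecular diameter) in m; result in m.
    """
    n = P / (K_B * T)                         # number density of molecules
    return 1.0 / (math.sqrt(2.0) * n * math.pi * d**2)


# Air at about 1 atm and room temperature; d = 3.7e-10 m is an assumed
# effective diameter for N2/O2 molecules, not a value taken from the text.
l_air = mean_free_path(101325.0, 293.0, 3.7e-10)
print(f"l_mfp ~ {l_air * 1e9:.0f} nm")
```

The result is several tens of nanometres; since δS is a few mean free paths, a shock in ordinary air is well under a micrometre thick.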

16.5.3 Shock jump conditions in a perfect gas with constant γ

Let us again specialize to a perfect gas with constant specific-heat ratio γ, so the equation of state is u = PV/(γ − 1) and the sound speed is $c = \sqrt{\gamma P/\rho} = \sqrt{\gamma P V}$ [Eqs. (16.5)–(16.8)]. We measure the strength of the shock using the shock Mach number M, which is defined to be the Mach number in the upstream flow, $M \equiv M_1 = v_1/c_1 = \sqrt{v_1^2/(\gamma P_1 V_1)}$. With the aid of this equation of state and Mach number, we can bring the Rankine-Hugoniot relations (16.42) into the form

$$ \frac{\rho_1}{\rho_2} = \frac{V_2}{V_1} = \frac{v_2}{v_1} = \frac{\gamma-1}{\gamma+1} + \frac{2}{(\gamma+1)M^2} , \tag{16.45a} $$
$$ \frac{P_2}{P_1} = \frac{2\gamma M^2}{\gamma+1} - \frac{\gamma-1}{\gamma+1} , \tag{16.45b} $$
$$ M_2^2 = \frac{2 + (\gamma-1)M^2}{2\gamma M^2 - (\gamma-1)} . \tag{16.45c} $$

Here M2 ≡ v2/c2 is the downstream Mach number. The results for this equation of state illustrate a number of general features of shocks: the density and pressure increase across the shock, the flow speed decreases, and the downstream flow is subsonic (all discussed above). They also reveal one important new feature: a shock weakens as its Mach number M decreases. In the limit M → 1, the jumps in pressure, density, and speed vanish and the shock disappears. In the strong-shock limit, M ≫ 1, the jumps are

$$ \frac{\rho_1}{\rho_2} = \frac{V_2}{V_1} = \frac{v_2}{v_1} \simeq \frac{\gamma-1}{\gamma+1} , \tag{16.46a} $$
$$ \frac{P_2}{P_1} \simeq \frac{2\gamma M^2}{\gamma+1} . \tag{16.46b} $$

Thus, the density jump is always of order unity, but the pressure jump grows ever larger as M increases. Air has γ ≃ 1.4 (Ex. 16.1), so the density compression ratio for a strong shock in air is (γ + 1)/(γ − 1) = 6 and the pressure ratio is P2/P1 ≃ 1.2M².
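Equations (16.45) are straightforward to tabulate. The Python sketch below (the name `shock_jumps` is ours) lets one watch both limits just described: at M = 1 every ratio is unity, while for M ≫ 1 and γ = 1.4 the compression approaches (γ + 1)/(γ − 1) = 6 and P2/(P1M²) approaches 2γ/(γ + 1) ≈ 1.17.

```python
def shock_jumps(M, gamma):
    """Normal-shock jumps for a perfect gas with constant gamma, Eqs. (16.45).

    Returns (rho1/rho2, P2/P1, M2) for upstream Mach number M >= 1;
    rho1/rho2 also equals V2/V1 and v2/v1.
    """
    if M < 1.0:
        raise ValueError("a shock requires supersonic inflow, M >= 1")
    M_sq = M * M
    density_ratio = (gamma - 1.0) / (gamma + 1.0) + 2.0 / ((gamma + 1.0) * M_sq)   # (16.45a)
    pressure_ratio = 2.0 * gamma * M_sq / (gamma + 1.0) - (gamma - 1.0) / (gamma + 1.0)  # (16.45b)
    M2 = ((2.0 + (gamma - 1.0) * M_sq) / (2.0 * gamma * M_sq - (gamma - 1.0))) ** 0.5   # (16.45c)
    return density_ratio, pressure_ratio, M2


for M in (1.0, 2.0, 10.0, 100.0):
    r, p, M2 = shock_jumps(M, 1.4)
    print(f"M = {M:6.1f}: rho2/rho1 = {1 / r:.3f}, P2/(P1 M^2) = {p / M**2:.3f}, M2 = {M2:.3f}")
```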

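[Figure 16.12: a projectile at its present position, moving at speed vp; sound fronts of radius c0t1 and c0t2 emitted when the projectile was distances vpt1 and vpt2 behind its present position, with the cone tangent to them.]

Fig. 16.12: Construction for Mach cone formed by a supersonic projectile. The cone angle is α = sin⁻¹(1/M), where M = vp/c0 is the Mach number of the projectile.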

16.5.4 Mach Cone

The shock waves formed by a supersonically moving body are quite complex close to the body and depend on its detailed shape, Reynolds number, etc. However, far from the body, the leading shock has the form of the Mach cone shown in Fig. 16.12. We can understand this cone by the construction shown in the figure. The shock is the boundary between that fluid which is in sound-based causal contact with the projectile and that which is not. This boundary is mapped out by (conceptual) sound waves that propagate into the fluid from the projectile at the ambient sound speed c0. A wavefront emitted a time t earlier has radius c0t, while the projectile has since moved a distance vpt. When the projectile is at the indicated position, the envelope of these spherical wavefronts (circles in the figure) is the shock front and has the shape of the Mach cone, with opening angle (the Mach angle)

$$ \alpha = \sin^{-1}\left(\frac{1}{M}\right) . \tag{16.47} $$
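The envelope construction of Fig. 16.12 can be verified numerically. In the Python sketch below (function names hypothetical), every sound front emitted at an earlier time turns out to be tangent to the cone of half-angle α = sin⁻¹(1/M), so the cone is indeed their envelope; for M = 2, α = 30°.

```python
import math

def mach_angle(M):
    """Mach angle alpha = arcsin(1/M) of Eq. (16.47), in radians, for M > 1."""
    return math.asin(1.0 / M)


def envelope_mismatch(M, c0=1.0, taus=(0.5, 1.0, 2.0, 4.0)):
    """Largest |distance - radius| between the cone and earlier sound fronts.

    The projectile is at the origin moving in the +x direction at vp = M c0.
    A front emitted a time tau ago is a circle of radius c0 * tau centred at
    (-vp * tau, 0). The cone's edge is the line through the origin with unit
    direction (-cos alpha, sin alpha); tangency means distance == radius.
    """
    vp, alpha = M * c0, mach_angle(M)
    worst = 0.0
    for tau in taus:
        cx = -vp * tau
        # Perpendicular distance from (cx, 0) to that line is |cx * sin(alpha)|.
        distance = abs(cx * math.sin(alpha))
        worst = max(worst, abs(distance - c0 * tau))
    return worst


print(math.degrees(mach_angle(2.0)), envelope_mismatch(2.0))
```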

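[Figure 16.13: double shock cone behind a supersonic projectile and the associated pressure P profile.]

Fig. 16.13: Double shock created by supersonic projectile and associated “N wave” pressure distribution.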

Usually, there will be two such shock cones, one attached to the projectile's bow shock and the other formed out of the complex shock structure in its tail region. The pressure must jump twice, once across each of these shocks, and will therefore form an N wave which propagates cylindrically away from the projectile as shown in Fig. 16.13. Behind the first shock, the density and pressure drop off gradually by more than the first shock's compression. As a result, the fluid flowing into the second shock has a lower pressure, density, and sound speed than that flowing into the first (cf. Fig. 16.13). This causes the Mach number of the second shock to be higher than that of the first, and its Mach angle thus to be smaller. As a result, the separation between the shocks increases as they travel, ∝ $r^{1/2}$ it turns out, where r is the perpendicular distance of the point of observation from the projectile's trajectory. In order to conserve energy flux, the wave amplitude will then decrease ∝ $r^{-3/4}$ (the wave energy per unit length of the cylindrical front scales as amplitude² × shock separation, and this energy is spread over a circumference ∝ r), rather than ∝ $r^{-1/2}$ as would be true of a cylindrical sound pulse [?]. Often a double boom can be heard on the ground.

****************************
EXERCISES

Exercise 16.7 *** Problem: Hydraulic Jumps and Breaking Ocean Waves
Run water at a high flow rate from a kitchen tap onto a dinner plate (Fig. 16.14). What you see is called a hydraulic jump. It is the kitchen analog of a breaking ocean wave, and the shallow-water-wave analog of a shock front in a compressible gas. In this exercise you will develop the theory of hydraulic jumps (and breaking ocean waves) using the same tools as for shock fronts.

Fig. 16.14: Hydraulic jump on a dinner plate under a kitchen tap.

(a) Recall that for shallow-water waves, the water motion, below the water's surface, is nearly horizontal with speed independent of depth z (Ex. 15.1). The same is true of the water in front of and behind a hydraulic jump. Apply the conservation of mass and momentum to a hydraulic jump, in the jump's rest frame, to obtain equations for the height of the water h2 and water speed v2 behind the jump (emerging from it) in terms of those in front of the jump, h1, v1. These are the analog of the Rankine-Hugoniot relations for a shock front. [Hint: In momentum conservation you will need to use the pressure P as a function of height in front of and behind the jump.]
(b) You did not use energy conservation across the jump in your derivation, but it was needed in the analysis of a shock front. Why?
(c) Show that the upstream speed v1 is greater than the speed $\sqrt{g h_1}$ of small-amplitude gravity waves [shallow-water waves; Eq. (15.10) and associated discussion]; i.e. the upstream flow is “supersonic”. Similarly show that the downstream flow speed v2 is slower than the speed $\sqrt{g h_2}$ of small-amplitude gravity waves; i.e., the downstream flow is “subsonic”.
(d) We normally view a breaking ocean wave in the rest frame of the quiescent upstream water. Use your hydraulic-jump equations to show that the speed of the breaking wave as seen in this frame is related to the depths h1 and h2 in front of and behind the breaking wave by

$$ v_{\rm break} = \left[\frac{g(h_1 + h_2)\,h_2}{2h_1}\right]^{1/2} $$

(Fig. 16.15).

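[Figure 16.15: a breaking wave moving at speed vbreak into still water (v = 0) of depth h1, with water of depth h2 behind it.]

Fig. 16.15: Ocean wave breaking on a slowly sloping beach. The depth of water ahead of the wave is h1 and the depth behind the wave is h2.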

Exercise 16.8 Problem: Shock Tube
Consider a shock tube as discussed in Sec. 16.4 and Fig. 16.9. Suppose that there is a small density of gas at rest in the evacuated half of the tube, with specific heat ratio γ1, which might differ from that of the driver gas, and with initial sound speed c1. After the membrane is ruptured, the driver gas will expand into the evacuated half of the tube, forming a shock front. Show that, in the limit of very large pressure ratio across the shock, the shock Mach number is

$$ M_1 \simeq \frac{\gamma_1 + 1}{\gamma - 1}\,\frac{c_0}{c_1} , $$

where c0 is the driver gas's initial sound speed.

Exercise 16.9 Problem: Sonic Boom from the Space Shuttle
Use the quoted scaling of N wave amplitude with cylindrical radius r to make an order-of-magnitude estimate of the flux of acoustic energy produced by the space shuttle flying at Mach 2 at an altitude of 20 km. Quote your answer in dB [cf. Eq. (15.56)].

Exercise 16.10 Derivation and Challenge: Signs of Change Across a Shock
(a) Almost all equations of state satisfy the condition $(\partial^2 V/\partial P^2)_s > 0$. Show that, when this is satisfied, the Rankine-Hugoniot relations and the law of entropy increase imply that the pressure and density must increase across a shock and the fluid must decelerate; i.e., P2 > P1, V2 < V1, and v2 < v1.
(b) Show that in a fluid that violates $(\partial^2 V/\partial P^2)_s > 0$, the pressure and density must still increase and the fluid decelerate across a shock, as otherwise the shock would be unstable.
For a solution to this exercise, see Sec. 84 of Landau and Lifshitz (1959).

Exercise 16.11 Problem: Relativistic Shock
In astrophysics (e.g. in supernova explosions and in jets emerging from the vicinities of black holes), one sometimes encounters shock fronts for which the flow speeds relative to the shock approach the speed of light, and the internal energy density is comparable to the rest-mass density.
(a) Show that the relativistic Rankine-Hugoniot equations for such a shock take the following form:

$$ \eta_2^2 - \eta_1^2 = (P_2 - P_1)(\eta_1 V_1 + \eta_2 V_2) , \tag{16.48a} $$
$$ j^2 = \frac{P_2 - P_1}{\eta_1 V_1 - \eta_2 V_2} , \tag{16.48b} $$
$$ v_2\gamma_2 = jV_2 , \qquad v_1\gamma_1 = jV_1 . \tag{16.48c} $$

Here, (i) we use units in which the speed of light is one (as in Chap. 1); (ii) V ≡ 1/ρo is the volume per unit rest mass and ρo is the rest-mass density (equal to some standard rest mass per baryon times the number density of baryons; cf. Sec. 1.11.4); (iii) we denote the total density of mass-energy including rest mass by ρR (it was denoted ρ in Chap. 1) and the internal energy per unit rest mass by u, so ρR = ρo(1 + u); in terms of these, the quantity η ≡ (ρR + P)/ρo = 1 + u + P/ρo = 1 + h is the relativistic enthalpy per unit rest mass (i.e. the enthalpy per unit rest mass including the rest-mass contribution to the energy) as measured in the fluid rest frame; (iv) P is the pressure as measured in the fluid rest frame; (v) v is the flow velocity in the shock's rest frame and $\gamma \equiv 1/\sqrt{1 - v^2}$, so vγ is the spatial part of the flow 4-velocity; and (vi) j is the rest-mass flux (rest mass per unit area per unit time) entering and leaving the shock.
(b) Use a pressure-volume diagram to discuss these relativistic Rankine-Hugoniot equations in a manner analogous to Fig. 16.11.

(c) Show that in the nonrelativistic limit, the relativistic Rankine-Hugoniot equations (16.48) reduce to the nonrelativistic ones (16.41).

****************************

16.6 Similarity Solutions — Sedov-Taylor Blast Wave

Strong explosions can generate shock waves. Examples include atmospheric nuclear explosions, supernova explosions, and depth charges. The debris from a strong explosion will be at much higher pressure than the surrounding gas and will therefore drive a strong spherical shock into the surroundings. Initially, this shock wave will travel at roughly the radial speed of the expanding debris. However, the mass of fluid swept up by the shock will eventually exceed that of the explosion debris. The shock will then decelerate and the energy of the explosion will be transferred to the swept-up fluid. It is of obvious importance to be able to calculate how fast and how far the shock front will travel.

First make an order-of-magnitude estimate. Let the total energy of the explosion be E and the density of the surrounding fluid (assumed uniform) be ρ0. Then after time t, when the shock radius is R(t), the mass of swept-up fluid will be ∼ ρ0R³. The fluid velocity behind the shock will be roughly the radial velocity of the shock front, v ∼ Ṙ ∼ R/t, and so the kinetic energy of the swept-up gas will be ∼ ρ0R⁵/t². There will also be internal energy in the post-shock flow, with energy density roughly equal to the post-shock pressure, ρu ∼ P ∼ ρ0Ṙ² [cf. the strong-shock jump condition (16.46b) with P1 ∼ ρ0c0², so P1M² ∼ ρ0v² ∼ ρ0Ṙ²]. The total internal energy within the expanding shock will then be ∼ ρ0Ṙ²R³, equal in order of magnitude to the kinetic energy. Equating either term to the total energy E of the explosion, we obtain the rough estimate

$$ E = \kappa \rho_0 R^5 t^{-2} , \tag{16.49} $$

which implies that at time t the shock front has reached the radius

$$ R = \left(\frac{E}{\kappa\rho_0}\right)^{1/5} t^{2/5} . \tag{16.50} $$
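Equation (16.50), in which κ is an order-unity constant, and its time derivative are simple to evaluate. The Python sketch below is illustrative only: the explosion energy, the ambient density and the choice κ = 1 are assumptions (κ is to be fixed by the similarity solution). Differentiating (16.50) gives the shock speed Ṙ = 2R/(5t), which keeps falling; once Ṙ approaches the ambient sound speed the strong-shock assumption, and with it the t^(2/5) law, breaks down.

```python
def blast_radius(t, E, rho0, kappa=1.0):
    """Shock radius R(t) = (E / (kappa * rho0))**(1/5) * t**(2/5), Eq. (16.50)."""
    return (E / (kappa * rho0)) ** 0.2 * t ** 0.4


def blast_speed(t, E, rho0, kappa=1.0):
    """Shock speed dR/dt = 2R / (5t), from differentiating Eq. (16.50)."""
    return 0.4 * blast_radius(t, E, rho0, kappa) / t


# Illustrative inputs only: E = 1e14 J released in air of density 1.2 kg/m^3,
# with the order-unity constant kappa set to 1 (its true value must come
# from the similarity solution).
E, rho0 = 1.0e14, 1.2
for t in (1e-3, 1e-2, 1e-1, 1.0):
    R, Rdot = blast_radius(t, E, rho0), blast_speed(t, E, rho0)
    print(f"t = {t:7.3f} s   R = {R:8.1f} m   dR/dt = {Rdot:8.1f} m/s")
```

Because R ∝ E^(1/5), the radius at a given time is insensitive to the energy: a factor of 32 in E changes R by only a factor of 2.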

Here κ is a numerical constant of order unity. This scaling should hold roughly from the time that the swept-up mass exceeds the mass of the debris to the time that the shock weakens to a Mach number of order unity, when we can no longer use the strong-shock value ∼ ρ0Ṙ² for the post-shock pressure. Note that we could have obtained Eq. (16.50) by a purely dimensional argument, as E and ρ0 are the only significant controlling parameters in the problem. However, it is usually possible and always desirable to justify any such dimensional argument on the basis of the governing equations.

If, as we shall assume, the motion remains radial and the gas is perfect with constant specific-heat ratio γ, then we can solve for the details of the flow behind the shock front by integrating the radial flow equations

$$ \frac{\partial \rho}{\partial t} + \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2 \rho v\right) = 0 , \tag{16.51a} $$
$$ \frac{\partial v}{\partial t} + v\frac{\partial v}{\partial r} + \frac{1}{\rho}\frac{\partial P}{\partial r} = 0 , \tag{16.51b} $$
$$ \frac{\partial}{\partial t}\left(\frac{P}{\rho^\gamma}\right) + v\frac{\partial}{\partial r}\left(\frac{P}{\rho^\gamma}\right) = 0 . \tag{16.51c} $$

The first two equations are the familiar continuity equation and Euler equation written for a spherical flow. The third equation is energy conservation expressed as the adiabatic-expansion relation, P ∝ ρ^γ moving with a fluid element. Although P/ρ^γ is time-independent for each fluid element, its value will change from element to element. Gas that has passed through the shock more recently will have been given a smaller entropy than gas which was swept up when the shock was stronger, and thus will have a smaller value of P/ρ^γ. Given suitable initial conditions, the partial differential equations (16.51) can be integrated numerically. However, there is a practical problem in that it is not easy to determine the initial conditions in an explosion! Fortunately, at late times, when most of the mass has been swept up, the fluid evolution is independent of the details of the initial expansion and in fact can be understood analytically as a similarity solution. By this, we mean that the shapes of the radial profiles of pressure, density and velocity are independent of time.

We have already seen some examples of similarity solutions. The first was the Blasius solution for the structure of a laminar boundary layer (Sec. 13.4). In this case we argued on the basis of mass and momentum conservation that the thickness of the boundary layer as a function of distance x downstream would be ∼ δ = (νx/V)^{1/2}, where V was the speed of the flow above the boundary layer. This motivated us to introduce the dimensionless variable ξ = y/δ and argue that the boundary layer's speed vx(x, y) would be equal to the free-stream velocity V times some universal function f′(ξ). This ansatz converted the fluid's partial differential equations into an ordinary differential equation for f(ξ), which we solved numerically. Our explosion problem is somewhat similar.
The characteristic scaling length in the explosion is the radius R(t) of the shock, so the fluid and thermodynamic variables should be expressible as some characteristic values multiplying universal functions of

$$\xi \equiv \frac{r}{R(t)} . \tag{16.52}$$

Our thermodynamic variables are P, ρ, u, and a natural choice for their characteristic values is the values immediately behind the shock. If we assume that the shock is strong, then we can use the strong-shock jump conditions (16.46a), (16.46b) to determine those values, and then write

$$P = \frac{2}{\gamma+1}\,\rho_0 \dot R^2\, \tilde P(\xi) , \tag{16.53a}$$

$$\rho = \frac{\gamma+1}{\gamma-1}\,\rho_0\, \tilde\rho(\xi) , \tag{16.53b}$$

$$v = \frac{2}{\gamma+1}\,\dot R\, \tilde v(\xi) , \tag{16.53c}$$

with P̃(1) = ρ̃(1) = ṽ(1) = 1, since ξ = 1 is the shock's location.

Fig. 16.16: Scaled pressure P̃(ξ), density ρ̃(ξ) and velocity ṽ(ξ) as functions of scaled radius ξ behind a Sedov-Taylor blast wave in air with γ = 1.4.

Note that the velocity v is scaled to the post-shock velocity measured in the inertial frame in which the upstream fluid is at rest, rather than in the non-inertial frame in which the decelerating shock is at rest. The self-similarity ansatz (16.53) and the resulting self-similar solution for the flow are called the Sedov-Taylor blast-wave solution, since L. I. Sedov and G. I. Taylor developed it independently. We need one more piece of information before we can solve for the flow: the variation of the shock radius R with time. However, all that is necessary is the scaling R = (E/κρ0)^{1/5} t^{2/5} ∝ t^{2/5} [Eq. (16.50)], with the constant κ left undetermined for the moment. The partial differential equations (16.51) can then be transformed into ordinary differential equations by inserting the ansatz (16.53), changing the independent variables from r, t to R, ξ, and using

$$\left(\frac{\partial}{\partial t}\right)_r = -\frac{\xi\dot R}{R}\left(\frac{\partial}{\partial \xi}\right)_R + \dot R\left(\frac{\partial}{\partial R}\right)_\xi = -\frac{2\xi}{5t}\left(\frac{\partial}{\partial \xi}\right)_R + \frac{2R}{5t}\left(\frac{\partial}{\partial R}\right)_\xi , \tag{16.54}$$

where the second equality uses Ṙ = 2R/5t, which follows from R ∝ t^{2/5}, and

$$\left(\frac{\partial}{\partial r}\right)_t = \frac{1}{R}\left(\frac{\partial}{\partial \xi}\right)_R . \tag{16.55}$$
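The change of variables (16.54) is just the chain rule applied to ξ = r/R(t) with Ṙ = 2R/5t. A quick numerical sanity check, sketched under the assumption of a hypothetical smooth test function G(ξ, R) and a hypothetical constant C in R = C t^{2/5}, compares a finite-difference time derivative at fixed r with the right-hand side of Eq. (16.54):

```python
import math

C = 2.0  # hypothetical constant in R(t) = C t^(2/5)

def R_of_t(t):
    return C * t**0.4

# Hypothetical smooth test function G(xi, R) and its partial derivatives
def G(xi, R):
    return xi**3 * R**2 + math.sin(xi) * R

def dG_dxi(xi, R):
    return 3 * xi**2 * R**2 + math.cos(xi) * R

def dG_dR(xi, R):
    return 2 * xi**3 * R + math.sin(xi)

def f(r, t):
    """The same field written in the original variables (r, t)."""
    return G(r / R_of_t(t), R_of_t(t))

def dfdt_numeric(r, t, h=1e-6):
    """Central-difference estimate of (df/dt) at fixed r."""
    return (f(r, t + h) - f(r, t - h)) / (2 * h)

def dfdt_similarity(r, t):
    """Right-hand side of Eq. (16.54), with dR/dt = 2R/(5t)."""
    xi, R = r / R_of_t(t), R_of_t(t)
    return -(2 * xi / (5 * t)) * dG_dxi(xi, R) + (2 * R / (5 * t)) * dG_dR(xi, R)

r0, t0 = 1.3, 0.7
print(dfdt_numeric(r0, t0), dfdt_similarity(r0, t0))  # should agree to ~1e-9
```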

The three resulting first-order differential equations are rather complex but can in fact be solved analytically (e.g. Landau and Lifshitz 1959). The results for an explosion in air are exhibited in Fig. 16.16. Armed with these solutions for P̃(ξ), ρ̃(ξ), ṽ(ξ), we can evaluate the flow's energy E, which is equal to the explosion's total energy during the time interval when this similarity solution is accurate. The energy E is given by the integral

$$E = \int_0^R 4\pi r^2\, dr\, \rho\left(\frac{1}{2}v^2 + u\right) = \frac{4\pi\rho_0 R^3\dot R^2(\gamma+1)}{\gamma-1}\int_0^1 d\xi\,\xi^2\tilde\rho\left[\frac{2\tilde v^2}{(\gamma+1)^2} + \frac{2\tilde P}{(\gamma+1)^2\tilde\rho}\right] . \tag{16.56}$$

Here we have used Eqs. (16.53) and substituted u = P/[ρ(γ − 1)] for the internal energy [Eq. (16.6)]. The energy E appears not only on the left side of this equation, but also on the right, in the terms ρ0R³Ṙ² = (4/25)E/κ [which follow from R⁵ = Et²/(κρ0) and Ṙ = 2R/5t]. Thus, E cancels out, and Eq. (16.56) becomes an equation for the unknown constant κ. Evaluating that equation numerically, we find that κ varies from 2.5 to 1.4 as γ increases from 1.4 (air) to 1.67 (monatomic gas or fully ionised plasma).

It is enlightening to see how the fluid behaves in this blast-wave solution. The fluid that passes through the shock is compressed so that it mostly occupies a fairly thin spherical shell immediately behind the shock [see the spike in ρ̃(ξ) in Fig. 16.16]. This shell moves somewhat slower than the shock [v = 2Ṙ/(γ + 1); Eq. (16.53) and Fig. 16.16]. As the post-shock flow is subsonic, the pressure within the blast wave is fairly uniform [see the curve P̃(ξ) in Fig. 16.16]; in fact the central pressure is typically about half the maximum pressure immediately behind the shock. This pressure pushes on the spherical shell, thereby accelerating the freshly swept-up fluid.
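Because E cancels, solving Eq. (16.56) for κ reduces to κ = (16π/25)[(γ + 1)/(γ − 1)] I, where I is the dimensionless ξ integral in Eq. (16.56). The minimal sketch below evaluates I by Simpson's rule for any supplied profiles; the function names are illustrative. The air profiles of Fig. 16.16 are not tabulated in the text, so as a check it uses the γ = 7 case of Exercise 16.12, where ṽ = ξ and P̃ = ξ³, and inserting ṽ = ξ into the continuity equation gives ρ̃ = ξ. The result should reproduce E = 2πR⁵ρ0/(225t²), i.e. κ = 2π/225.

```python
import math

def energy_integral(gamma, v_t, rho_t, P_t, n=2000):
    """Dimensionless xi integral of Eq. (16.56), by composite Simpson's rule on [0, 1].

    v_t, rho_t, P_t are the scaled profiles v~(xi), rho~(xi), P~(xi); n must be even.
    """
    def integrand(xi):
        # xi^2 rho~ [2 v~^2 + 2 P~/rho~] / (gamma+1)^2, written without dividing by rho~
        return xi**2 * (2.0 * rho_t(xi) * v_t(xi)**2 + 2.0 * P_t(xi)) / (gamma + 1.0)**2

    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0

def kappa_from_profiles(gamma, v_t, rho_t, P_t):
    """Solve Eq. (16.56) for kappa, using rho0 R^3 Rdot^2 = (4/25) E / kappa."""
    I = energy_integral(gamma, v_t, rho_t, P_t)
    return (16.0 * math.pi / 25.0) * (gamma + 1.0) / (gamma - 1.0) * I

# gamma = 7 profiles (Exercise 16.12): v~ = xi, P~ = xi^3, and continuity gives rho~ = xi
k7 = kappa_from_profiles(7.0, lambda x: x, lambda x: x, lambda x: x**3)
print(k7, 2 * math.pi / 225)  # the two numbers should agree to many digits
```

For other values of γ, the same function applies once the profiles are supplied, for example as interpolants of the Landau and Lifshitz solution.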

16.6.1

Atomic Bomb

The first atomic bomb was exploded in New Mexico in 1945, and photographs released later that year (Fig. 16.17) showed the radius of the blast wave as a function of time. The pictures were well fit by R ∼ 60 (t/1 ms)^{0.4} m up to about t = 100 ms, when the shock Mach number fell to about unity (Fig. 16.17). Combining this information with the Sedov-Taylor similarity solution, the Russian physicist L. I. Sedov and others were able to infer the total energy released, which was an official American secret at the time. If we adopt an intermediate specific heat ratio of γ = 1.5, as the air will be partially ionised by the shock front, we can use Eq. (16.56) to obtain the estimate E ∼ 1.5 × 10¹⁴ J, or about the same energy release as 30 kilotons of TNT. (Hydrogen bombs have been manufactured over a thousand times more energetic than the first atomic bombs. However, contemporary arsenals contain bombs that are typically only one megaton!)

We can use the Sedov-Taylor solution to infer some further features of the explosion. The post-shock gas is at density ∼ (γ + 1)/(γ − 1) ∼ 5 times the ambient density ρ0 ∼ 1 kg m⁻³. Similarly, using the perfect gas law with a mean molecular weight µ ∼ 10 and the strong-shock jump conditions, the post-shock temperature can be computed; since P2/ρ2 ∝ Ṙ² ∝ t^{−6/5},

$$T_2 = \frac{\mu m_p P_2}{k\,\rho_2} \sim 4\times 10^4\left(\frac{t}{1\,\mathrm{ms}}\right)^{-1.2}\ \mathrm{K} . \tag{16.57}$$

This is enough to ionise the gas at early times.
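Extracting the scaling from photographs like those of Fig. 16.17 is a straight-line fit in log-log variables, log R = log A + n log t; the Sedov-Taylor solution predicts n = 2/5. A minimal least-squares sketch follows. The radii it uses are synthetic, generated from the fit R = 60 (t/1 ms)^{0.4} m quoted above, not the 1945 measurements, which are not reproduced here.

```python
import math

def fit_power_law(ts, Rs):
    """Ordinary least-squares fit of log R = log A + n log t; returns (A, n)."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(R) for R in Rs]
    m = len(xs)
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    n = sxy / sxx
    A = math.exp(ybar - n * xbar)
    return A, n

# Synthetic data (illustration only): t in ms, R in m
ts = [0.5, 1, 2, 5, 10, 20, 50, 100]
Rs = [60 * t**0.4 for t in ts]
A, n = fit_power_law(ts, Rs)
print(A, n)  # recovers A = 60 m and n = 0.4 for these noise-free data
```

With real, noisy radii the fitted n tests the t^{2/5} law, and A fixes the combination E/(κρ0) through R = (E/κρ0)^{1/5} t^{2/5}.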

Fig. 16.17: Exploding atomic bomb.

Fig. 16.18: Cassiopeia A, a supernova remnant left behind by an exploding star in our galaxy approximately 300 years ago. The image on the left was made using the Very Large Array Radio Telescope; that on the right by the Chandra X-ray Observatory.

16.6.2

Supernovae

The evolution of most massive stars ends in a supernova explosion (like that which was observed in 1987 in the Large Magellanic Cloud), in which a neutron star of mass m ∼ 3 × 10³⁰ kg is formed. This neutron star has a gravitational binding energy of about 0.1mc² ∼ 3 × 10⁴⁶ J. Most of this binding energy is released in the form of neutrinos in the collapse that forms the neutron star, but an energy E ∼ 10⁴⁴ J drives off the outer envelope of the pre-supernova star, a mass M0 ∼ 10³¹ kg. This stellar material escapes with an rms speed V0 ∼ (2E/M0)^{1/2} ∼ 5000 km s⁻¹. The expanding debris eventually drives a blast wave into the surrounding interstellar medium of density ρ0 ∼ 10⁻²¹ kg m⁻³. The expansion of the blast wave can be modeled using the Sedov-Taylor solution once the mass of swept-up interstellar gas has become large enough to dominate the blast wave, so that the star-dominated initial conditions are no longer important, i.e. after a time ∼ (3M0/4πρ0)^{1/3}/V0 ∼ 1000 yr. The blast wave then decelerates in a Sedov-Taylor self-similar way until the shock speed nears the sound speed in the surrounding gas; this takes about 100,000 yr. Supernova remnants of this sort are efficient emitters of radio waves, and several hundred have been observed in the Galaxy. In some of the younger examples, like Cassiopeia A (Fig. 16.18), it is possible to determine the expansion speed, and the effects of deceleration can be measured. The observations are consistent with the prediction of the Sedov-Taylor solution, namely that the radius varies as R ∝ t^{2/5}, or R̈ = −3Ṙ²/2R.

****************************
EXERCISES

Exercise 16.12 Problem: Underwater explosions
A simple analytical solution to the Sedov-Taylor similarity equations can be found for the particular case γ = 7. This is a fair approximation to the behavior of water under explosive conditions, as it will be almost incompressible.
(a) Make the ansatz (whose self-consistency we'll check later) that the velocity in the post-shock flow varies linearly with radius from the origin to the shock, i.e. ṽ(ξ) = ξ. Use Eq. (16.54) to transform the equation of continuity into an ordinary differential equation and hence solve for the density function ρ̃(ξ).
(b) Next use the equation of motion to discover that P̃(ξ) = ξ³.
(c) Verify that your solutions for the functions P̃, ρ̃, ṽ satisfy the remaining entropy equation, thereby vindicating the original ansatz.
(d) Finally, substitute into Eq. (16.56) to show that

$$E = \frac{2\pi R^5\rho_0}{225\,t^2} .$$

(e) An explosive charge weighing 100 kg with an energy release of 10⁸ J kg⁻¹ is detonated underwater. For what range of shock radius do you expect that the Sedov-Taylor similarity solution will be valid?

Exercise 16.13 Problem: Stellar Winds
Many stars possess powerful stellar winds which drive strong spherical shock waves into the surrounding interstellar medium. If the strength of the wind remains constant, the kinetic and internal energy of the swept-up interstellar medium will increase linearly with time.
(a) Modify the analysis of the point explosion to show that the speed of the shock wave at time t is 3R(t)/5t, where R is the associated shock radius. What is the speed of the post-shock gas?
(b) Now suppose that the star explodes as a supernova and the blast wave expands into the relatively slowly moving stellar wind. Suppose that the rate at which mass has left the star and the speed of the wind have been constant for a long time. How do you expect the density of gas in the wind to vary with radius? Modify the Sedov-Taylor analysis again to show that the expected speed of the shock wave at time t is now 2R(t)/3t.

Exercise 16.14 Problem: Similarity Solution for Shock Tube
Use a self-similarity analysis to derive the solution (16.38) for the shock-tube flow depicted in Fig. 16.9.
****************************
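Returning to Sec. 16.6.2, its order-of-magnitude supernova estimates can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted there, plus an approximate seconds-per-year constant (an outside value), and also checks by finite differences that R ∝ t^{2/5} implies R̈ = −3Ṙ²/2R. The helper name `radius` is illustrative.

```python
import math

YEAR = 3.156e7   # seconds per year (approximate)

E = 1e44         # J, energy that drives off the envelope
M0 = 1e31        # kg, ejected envelope mass
rho0 = 1e-21     # kg m^-3, interstellar density

V0 = math.sqrt(2 * E / M0)                             # rms ejecta speed, m/s
R_sweep = (3 * M0 / (4 * math.pi * rho0)) ** (1 / 3)  # radius where swept-up mass ~ M0, m
t_sedov = R_sweep / V0                                 # onset of the Sedov-Taylor phase, s

print(f"V0 ~ {V0 / 1e3:.0f} km/s, Sedov-Taylor onset ~ {t_sedov / YEAR:.0f} yr")

def radius(t, A=1.0):
    """Sedov-Taylor scaling R = A t^(2/5); A is arbitrary for this check."""
    return A * t**0.4

# Finite-difference check of R'' = -3 R'^2 / (2 R)
t1, h = 2.0, 1e-4
Rd = (radius(t1 + h) - radius(t1 - h)) / (2 * h)
Rdd = (radius(t1 + h) - 2 * radius(t1) + radius(t1 - h)) / h**2
print(Rdd, -1.5 * Rd**2 / radius(t1))
```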

Box 16.3
Important Concepts in Chapter 16
• γ-law equation of state, P = K(s)ρ^γ, Sec. 16.2
  – Values of γ for various situations, Sec. 16.2, Ex. 16.1
• Mach number, subsonic flow, supersonic flow, Sec. 16.3.1
• Quasi-one-dimensional transonic flow, Sec. 16.3
  – Opposite signs of dv/dA in supersonic vs. subsonic flow; and of dρ/dA, Sec. 16.3.1
  – Sonic point and critical point of flow, Secs. 16.3.1 and 16.3.2
  – How a rocket engine works, and its De Laval nozzle, Sec. 16.3.3
• Transonic accretion of gas onto a neutron star or black hole, Ex. 16.4
• Riemann invariants for one-dimensional, time-dependent compressible flow, Sec. 16.4.1
  – Their use to compute the details of the flow, Secs. 16.4.1, 16.4.2
• Steepening of a nonlinear sound wave to form a shock, Sec. 16.4.1, Fig. 16.8
• Shock tube, Sec. 16.4.2
• Shock waves, Sec. 16.5
  – Upstream and downstream sides of the shock, Sec. 16.5
  – Continuity of normal fluxes of mass, momentum and energy across a shock, Sec. 16.5.1
  – Rankine-Hugoniot relations for shock; shock adiabat, Secs. 16.5.1 and 16.5.3; and Ex. 16.11 in relativistic regime
  – Internal structure and thickness of a shock and role of viscosity, Sec. 16.5.2
  – Mach cone, N-wave and sonic boom, Sec. 16.5.4 and Ex. 16.9
  – Hydraulic jump and breaking ocean waves, Ex. 16.7
• Sedov-Taylor similarity solution for the flow behind a shock, Sec. 16.6
  – Application to bombs and supernovae, Secs. 16.6.1, 16.6.2

Bibliographic Note
For textbook treatments of compressible flows and shock waves, we recommend Liepmann and Roshko (1968), Thompson (1984), and the relevant sections of Landau and Lifshitz (1959). The two-volume treatise by Zel'dovich and Raizer (1979) is a compendium of insights into shock waves and high-temperature hydrodynamics by an author (Yakov Borisovich Zel'dovich) who had a huge influence on the design of nuclear and thermonuclear weapons in the USSR and later on astrophysics and cosmology. Sedov (1959) is a classic and insightful treatise on similarity methods in physics. The movie by Coles (1965) of the flow of air down a channel with throats gives good physical insight into subsonic, supersonic, and transonic flows, and shocks.

Bibliography
Bondi, H. 1952. Monthly Notices of the Royal Astronomical Society, 112, 195.
Coles, D. 1965. Channel Flow of a Compressible Fluid, a movie (National Committee for Fluid Mechanics Films); available at http://web.mit.edu/fluids/www/Shapiro/ncfmf.html .
Landau, L. D. and Lifshitz, E. M. 1959. Fluid Mechanics, Reading, Massachusetts: Addison Wesley.
Liepmann, H. and Roshko, A. 1968. Compressible Gas Dynamics.
Sedov, L. I. 1959. Similarity and Dimensional Methods in Mechanics, New York: Academic Press.
Thompson, P. A. 1984. Compressible Fluid Dynamics, Maple Press Co.
Van Dyke, M. 1982. An Album of Fluid Flow, Stanford: Parabolic Press.
Whitham, G. B. 1974. Linear and Nonlinear Waves, New York: Wiley.
Zel'dovich, Ya. B. and Raizer, Yu. P. 1979. Physics of Shock Waves and High Temperature Hydrodynamic Phenomena.

Contents
17 Magnetohydrodynamics
17.1 Overview
17.2 Basic Equations of MHD
  17.2.1 Maxwell's Equations in MHD Approximation
  17.2.2 Momentum and Energy Conservation
  17.2.3 Boundary Conditions
  17.2.4 Magnetic field and vorticity
17.3 Magnetostatic Equilibria
  17.3.1 Controlled thermonuclear fusion
  17.3.2 Z-Pinch
  17.3.3 Θ Pinch
  17.3.4 Tokamak
17.4 Hydromagnetic Flows
17.5 Stability of Hydromagnetic Equilibria
  17.5.1 Linear Perturbation Theory
  17.5.2 Z-Pinch; Sausage and Kink Instabilities
  17.5.3 Energy Principle
17.6 Dynamos and Reconnection of Magnetic Field Lines
  17.6.1 Cowling's theorem
  17.6.2 Kinematic dynamos
  17.6.3 Magnetic Reconnection
17.7 Magnetosonic Waves and the Scattering of Cosmic Rays
  17.7.1 Cosmic Rays
  17.7.2 Magnetosonic Dispersion Relation
  17.7.3 Scattering of Cosmic Rays

Chapter 17
Magnetohydrodynamics
Version 0817.1.K.pdf, 4 March 2009.
Please send comments, suggestions, and errata via email to [email protected] or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 17.1
Reader's Guide
• This chapter relies heavily on Chap. 12 and somewhat on the treatment of vorticity transport in Sec. 13.2.
• Part V, Plasma Physics (Chaps. 18-21), relies heavily on this chapter.

17.1

Overview

In preceding chapters, we have described the consequences of incorporating viscosity and thermal conductivity into the description of a fluid. We now turn to our final embellishment of fluid mechanics, in which the fluid is electrically conducting and moves in a magnetic field. The study of flows of this type is known as Magnetohydrodynamics, or MHD for short. In our discussion, we eschew full generality and, with one exception, just use the basic Euler equation augmented by magnetic terms. This suffices to highlight peculiarly magnetic effects and is adequate for many applications. The simplest example of an electrically conducting fluid is a liquid metal, for example, mercury or liquid sodium. However, the major use of MHD is in plasma physics. (A plasma is a hot, ionized gas containing free electrons and ions.) It is by no means obvious that plasmas can be regarded as fluids, since the mean free paths for collisions between the electrons and ions are macroscopically long. However, as we shall learn in Part V (Sec. 19.5 and Chap. 21), collective interactions between large numbers of plasma particles can isotropize the particles' velocity distributions in some local mean reference frame, thereby making it sensible to describe the plasma macroscopically by a mean density, velocity, and pressure. These mean quantities can then be shown to obey the same conservation laws of mass, momentum and energy as we derived for fluids in Chap. 12. As a result, a fluid description of a plasma is often reasonably accurate. We defer to Part V further discussion of this point, asking the reader to take this on trust for the moment. We are also, implicitly, assuming that the average velocity of the ions is nearly the same as the average velocity of the electrons. This is usually a good approximation; if it were not so, then the plasma would carry an unreasonably large current density. There are two serious technological applications of MHD that may become very important in the future. 
In the first, strong magnetic fields are used to confine rings or columns of hot plasma that (it is hoped) will be held in place long enough for thermonuclear fusion to occur and for net power to be generated. In the second, which is directed toward a similar goal, liquid metals are driven through a magnetic field in order to generate electricity. The study of magnetohydrodynamics is also motivated by its widespread application to the description of space (within the solar system) and astrophysical plasmas (beyond the solar system). We shall illustrate the principles of MHD using examples drawn from each of these areas. After deriving the basic equations of MHD (Sec. 17.2), we shall elucidate hydromagnetic equilibria by describing a Tokamak (Sec. 17.3). This is currently the most popular scheme for magnetic confinement of hot plasma. In our second application (Sec. 17.4) we shall describe the flow of conducting liquid metals or plasma along magnetized ducts and outline its potential as a practical means of electrical power generation and spacecraft propulsion. We shall then return to the question of hydromagnetic confinement of hot plasma and focus on the stability of equilibria (Sec. 17.5).
This issue of stability has occupied a central place in our development of fluid mechanics and it will not come as a surprise to learn that it has dominated research into plasma fusion. When a magnetic field plays a role in the equilibrium (e.g. for magnetic confinement of a plasma), the field also makes possible new modes of oscillation and some of these modes can be unstable to exponential growth. Many magnetic confinement geometries are unstable to MHD modes. We shall demonstrate this qualitatively by considering the physical action of the magnetic field, and also formally using variational methods. In Sec. 17.6 we turn to a geophysical problem, the origin of the earth’s magnetic field. It is generally believed that complex fluid motions within the earth’s liquid core are responsible for regenerating the field through dynamo action. We shall use a simple model to illustrate this process. When magnetic forces are added to fluid mechanics, a new class of waves, called magnetosonic waves, can propagate. We conclude our discussion of MHD in Sec. 17.7 by deriving the properties of these wave modes in a homogeneous plasma and showing how they control the propagation of cosmic rays in the interplanetary and interstellar media.

17.2

Basic Equations of MHD

The equations of MHD describe the motion of a conducting fluid in a magnetic field. This fluid is usually either a liquid metal or a plasma. In both cases, the conductivity ought to be regarded as a tensor if the gyro frequency exceeds the collision frequency. (If there are several collisions per gyro orbit, then the influence of the magnetic field on the transport coefficients will be minimal.) However, in order to keep the mathematics simple, we shall treat the conductivity as a constant scalar, κe. In fact, it turns out that, for many of our applications, it is adequate to take the conductivity as infinite.

Fig. 17.1: The two key physical effects occurring in MHD. (a) A moving conductor modifies the magnetic field by appearing to drag the field lines with it. When the conductivity is infinite, the field lines appear to be frozen into the moving conductor. (b) When electric current, flowing in the conductor, crosses magnetic field lines, there will be a Lorentz force, F = j × B, which will accelerate the fluid.

There are two key physical effects that occur in MHD (Fig. 17.1), and understanding them well is the key to developing physical intuition in this subject. The first effect arises when a good conductor moves into a magnetic field. Electric current is induced in the conductor which, by Lenz's law, creates its own magnetic field. This induced magnetic field tends to cancel the original, externally supported field, thereby, in effect, excluding the magnetic field lines from the conductor. Conversely, when the magnetic field penetrates the conductor and the conductor is moved out of the field, the induced field reinforces the applied field. The net result is that the lines of force appear to be dragged along with the conductor: they "go with the flow". Naturally, if the conductor is a fluid with complex motions, the ensuing magnetic field distribution can become quite complex, and the current will build up until its growth is balanced by Ohmic dissipation.

The second key effect is dynamical. When currents are induced by a motion of a conducting fluid through a magnetic field, a Lorentz (or j × B) force will act on the fluid and modify its motion. In MHD, the motion modifies the field and the field, in turn, reacts back and modifies the motion. This makes the theory highly nonlinear.

Before deriving the governing equations of MHD, we should consider the choice of primary variables. In electromagnetic theory, we specify the spatial and temporal variation of either the electromagnetic field or its source, the electric charge density and current density. One choice is computable (at least in principle) from the other using Maxwell's equations, augmented by suitable boundary conditions.
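So it is with MHD, and the choice depends on convenience. It turns out that for the majority of applications, it is most instructive to deal with the magnetic field as primary, and use Maxwell's equations

$$\nabla\cdot\mathbf E = \frac{\rho_e}{\epsilon_0} , \qquad \nabla\cdot\mathbf B = 0 , \qquad \nabla\times\mathbf E = -\frac{\partial\mathbf B}{\partial t} , \qquad \nabla\times\mathbf B = \mu_0\mathbf j + \mu_0\epsilon_0\frac{\partial\mathbf E}{\partial t} \tag{17.1}$$

to express the current density and the electric field in terms of the magnetic field.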

17.2.1

Maxwell’s Equations in MHD Approximation

Ohm's law, as normally formulated, is valid only in the rest frame of the conductor. In particular, for a conducting fluid, Ohm's law relates the current density j′ measured in the fluid's local rest frame to the electric field E′ measured there:

$$\mathbf j' = \kappa_e\mathbf E' , \tag{17.2}$$

where κe is the electric conductivity. Because the fluid is generally accelerated, dv/dt ≠ 0, its local rest frame is generally not inertial. Since it would produce a terrible headache to have to transform time and again from some inertial frame to the continually changing local rest frame when applying Ohm's law, it is preferable to reformulate Ohm's law in terms of the fields E, B and j measured in the inertial frame. To facilitate this (and for completeness), we shall explore the frame dependence of all our electromagnetic quantities E, B, j and ρe.

We shall assume, throughout our development of magnetohydrodynamics, that the fluid moves with a nonrelativistic speed v ≪ c relative to our chosen reference frame. We can then express the rest-frame electric field in terms of the inertial-frame electric and magnetic fields as

$$\mathbf E' = \mathbf E + \mathbf v\times\mathbf B ; \qquad E' = |\mathbf E'| \ll E . \tag{17.3a}$$

In the first equation we have set the Lorentz factor γ ≡ 1/√(1 − v²/c²) to unity, consistent with our nonrelativistic approximation. The second equation follows from the high conductivity of the fluid, which guarantees that current will quickly flow in whatever manner it must to annihilate any electric field E′ that might be formed in the fluid's rest frame. By contrast with the extreme frame dependence (17.3a) of the electric field, the magnetic field is essentially the same in the fluid's local rest frame as in the laboratory. More specifically, the analog of Eq. (17.3a) is B′ = B − (v/c²) × E, and since E ∼ vB, the second term is of magnitude (v/c)²B, which is negligible, giving

$$\mathbf B' = \mathbf B . \tag{17.3b}$$

Because E is very frame dependent, so is its divergence, the electric charge density ρe. In the laboratory frame, where E ∼ vB, Gauss's and Ampère's laws [the first and fourth of Eqs. (17.1)] imply that ρe ∼ ǫ0vB/L ∼ (v/c²)j, where L is the lengthscale on which E and B vary; and the relation E′ ≪ E together with Gauss's law implies |ρ′e| ≪ |ρe|:

$$\rho_e \sim \frac{v j}{c^2} , \qquad |\rho'_e| \ll |\rho_e| . \tag{17.3c}$$

By transforming the current density between frames and approximating γ ≃ 1, we obtain j′ = j − ρe v = j + O[(v/c)²] j; so in the nonrelativistic limit (first order in v/c) we can ignore the charge density and write

$$\mathbf j' = \mathbf j . \tag{17.3d}$$

To recapitulate, in nonrelativistic magnetohydrodynamic flows, the magnetic field and current density are frame independent up to fractional corrections of order (v/c)², while the electric field and charge density are very frame dependent and are generally small in the sense that E/c ∼ (v/c)B ≪ B [in Gaussian cgs units E ∼ (v/c)B ≪ B] and ρe c ∼ (v/c)j ≪ j.
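The orders of magnitude in this recap follow from E ∼ vB, Ampère's law j ∼ B/(µ0L), and ρe ∼ ǫ0vB/L. The sketch below evaluates them for illustrative, hypothetical laboratory values (v = 100 m s⁻¹, B = 1 T, L = 1 m); the function name is an assumption for illustration. Note that ρe v/j equals µ0ǫ0v² = v²/c² exactly, which is why the charge-density term can be dropped in Eq. (17.3d).

```python
import math

mu0 = 4e-7 * math.pi        # vacuum permeability, H/m (classical SI value)
c = 2.998e8                 # speed of light, m/s
eps0 = 1.0 / (mu0 * c**2)   # vacuum permittivity, from mu0 eps0 = 1/c^2

def mhd_scalings(v, B, L):
    """Ratios implied by E ~ vB, j ~ B/(mu0 L), rho_e ~ eps0 v B / L."""
    E = v * B
    j = B / (mu0 * L)
    rho_e = eps0 * v * B / L
    return {
        "E/c over B": (E / c) / B,         # ~ v/c
        "rho_e c over j": rho_e * c / j,   # ~ v/c
        "rho_e v over j": rho_e * v / j,   # ~ (v/c)^2, the correction dropped in Eq. (17.3d)
    }

# Hypothetical laboratory-plasma numbers
s = mhd_scalings(v=100.0, B=1.0, L=1.0)
for name, value in s.items():
    print(f"{name}: {value:.2e}")
```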

Combining Eqs. (17.2), (17.3a) and (17.3d), we obtain the nonrelativistic form of Ohm's law in terms of quantities measured in our chosen inertial, laboratory frame:

$$\mathbf j = \kappa_e(\mathbf E + \mathbf v\times\mathbf B) . \tag{17.4}$$

We are now ready to derive explicit equations for the (inertial-frame) electric field and current density in terms of the (inertial-frame) magnetic field. We begin with Ampère's law written as ∇ × B − µ0 j = µ0ǫ0 ∂E/∂t = (1/c²)∂E/∂t, and we notice that the time derivative of E is of order Ev/L ∼ Bv²/L (since E ∼ vB), so the right-hand side is O(Bv²/c²L) and thus can be neglected compared to the O(B/L) term on the left, yielding

$$\mathbf j = \frac{1}{\mu_0}\nabla\times\mathbf B . \tag{17.5a}$$

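We next insert this expression for j into the inertial-frame Ohm's law (17.4), thereby obtaining

$$\mathbf E = -\mathbf v\times\mathbf B + \frac{1}{\kappa_e\mu_0}\nabla\times\mathbf B . \tag{17.5b}$$

If we happen to be interested in the charge density (which is rare in MHD), we can compute it by taking the divergence of this electric field; since the divergence of a curl vanishes, only the first term contributes:

$$\rho_e = -\epsilon_0\nabla\cdot(\mathbf v\times\mathbf B) . \tag{17.5c}$$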

Equations (17.5) express all the secondary electromagnetic variables in terms of our primary one, B. This has been possible because of the high electric conductivity and our choice to confine ourselves to nonrelativistic (low-velocity) situations; it would not be possible otherwise. We next derive an evolution law for the magnetic field by taking the curl of Eq. (17.5b) (with κe treated as constant, as assumed above), using Maxwell's equation ∇ × E = −∂B/∂t and the vector identity ∇ × (∇ × B) = ∇(∇ · B) − ∇²B, and using ∇ · B = 0; the result is

$$\frac{\partial\mathbf B}{\partial t} = \nabla\times(\mathbf v\times\mathbf B) + \frac{1}{\mu_0\kappa_e}\nabla^2\mathbf B . \tag{17.6}$$

This equation is called the induction equation and describes the temporal evolution of the magnetic field. It is similar in form to the propagation law for vorticity in a flow with ∇P × ∇ρ = 0 [Eq. (13.3)], which says ∂ω/∂t = ∇ × (v × ω) + ν∇²ω. The ∇ × (v × B) term in Eq. (17.6) dominates when the conductivity is large, and can be regarded as describing the freezing of magnetic field lines into the fluid in the same way as the ∇ × (v × ω) term describes the freezing of vortex lines into a fluid with small viscosity ν; cf. Fig. 17.2. By analogy with Eq. (13.9), when flux freezing dominates, the fluid derivative of B/ρ can be written as

$$\frac{D}{Dt}\left(\frac{\mathbf B}{\rho}\right) \equiv \frac{d}{dt}\left(\frac{\mathbf B}{\rho}\right) - \left(\frac{\mathbf B}{\rho}\cdot\nabla\right)\mathbf v = 0 \tag{17.7}$$

(where ρ is mass density, not to be confused with charge density ρe). This says that B/ρ evolves in the same manner as the separation ∆x between two points in the fluid; cf. Fig. 13.3 and associated discussion.

Fig. 17.2: Pictorial representation of the evolution of magnetic field in a fluid endowed with infinite electrical conductivity. (a) A uniform magnetic field at time t = 0 in a vortex. (b) At a later time, when the fluid has rotated through ∼ 30°, the circulation has stretched and distorted the magnetic field.

The term (1/µ0κe)∇²B in the B-field evolution equation (17.6) is analogous to the vorticity diffusion term ν∇²ω in the vorticity evolution equation (13.3); therefore, when κe is not too large, magnetic field lines will diffuse through the fluid. The effective diffusion coefficient (analogous to ν) is

$$D_M = \frac{1}{\mu_0\kappa_e} . \tag{17.8}$$

The earth's magnetic field provides an example of field diffusion. That field is believed to be supported by electric currents flowing in the earth's iron core. We can estimate the electric conductivity of iron under these conditions and from it deduce a value for the diffusivity, DM ∼ 1 m² s⁻¹. The size of the earth's core is L ∼ 10⁴ km, so if there were no fluid motions, we would expect the magnetic field to diffuse out of the core and escape from the earth in a time τM ∼ L²/DM ∼ 10¹⁴ s ∼ three million years, which is much shorter than the age of the earth, ∼ 5 billion years. The reason for this discrepancy, as we shall discuss, is that there are internal circulatory motions in the liquid core which are capable of regenerating the magnetic field through dynamo action.

Although Eq. (17.6) describes a genuine diffusion of the magnetic field, the resulting magnetic decay time must be computed by solving the complete boundary value problem. To give a simple illustration, suppose that a poor conductor (e.g. a weakly ionized column of plasma) is surrounded by an excellent conductor (e.g. the metal walls of the container in which the plasma is contained), and that magnetic field lines supported by wall currents thread the plasma. The magnetic field will only diminish after the wall currents undergo Ohmic dissipation, and this can take much longer than the diffusion time for the plasma column alone.
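As a simple illustration of how the boundary value problem sets the decay time, consider a hypothetical slab of width L at rest (v = 0), in which a field component B(x) parallel to the walls obeys the diffusive part of Eq. (17.6), ∂B/∂t = DM ∂²B/∂x², with B pinned to zero at both walls (an assumed boundary condition, chosen for simplicity). The slowest mode, sin(πx/L), decays with e-folding time L²/(π²DM), about ten times shorter than the crude estimate L²/DM. The explicit finite-difference sketch below recovers that value; the function name and grid parameters are illustrative.

```python
import math

def slab_decay_time(L=1.0, D=1.0, nx=101, t_end=0.05):
    """Explicit (FTCS) solution of dB/dt = D d^2B/dx^2 on 0 < x < L, B = 0 at both walls.

    Starts from the slowest-decaying mode sin(pi x / L) and returns the e-folding
    time inferred from the decay of the peak field.
    """
    dx = L / (nx - 1)
    dt = 0.4 * dx**2 / D        # explicit scheme is stable only for D dt / dx^2 <= 1/2
    a = D * dt / dx**2
    B = [math.sin(math.pi * i * dx / L) for i in range(nx)]
    B[0] = B[-1] = 0.0          # assumed boundary condition: field pinned to zero at the walls
    B_start = max(B)
    steps = int(t_end / dt)
    for _ in range(steps):
        B = [0.0] + [B[i] + a * (B[i + 1] - 2 * B[i] + B[i - 1])
                     for i in range(1, nx - 1)] + [0.0]
    return steps * dt / math.log(B_start / max(B))

tau_num = slab_decay_time()
tau_mode = 1.0 / math.pi**2     # L^2 / (pi^2 D) for L = D = 1
print(tau_num, tau_mode)        # close to each other, and ~10x shorter than L^2 / D = 1
```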
It is customary to introduce a dimensionless number called the Magnetic Reynolds number, RM, directly analogous to the normal Reynolds number, to describe the relative importance of flux freezing and diffusion. The normal Reynolds number can be regarded as the ratio of the magnitude of the vorticity-freezing term ∇ × (v × ω) ∼ (V/L)ω in the vorticity evolution equation ∂ω/∂t = ∇ × (v × ω) + ν∇²ω to the magnitude of the diffusion term ν∇²ω ∼ (ν/L²)ω: R = (V/L)(ν/L²)⁻¹ = VL/ν. Here V is a characteristic speed and L a characteristic lengthscale of the flow. Similarly, the Magnetic Reynolds number is the ratio of the magnitude of the magnetic-flux-freezing term ∇ × (v × B) ∼ (V/L)B to the magnitude of the magnetic-flux-diffusion term DM∇²B = (1/µ0κe)∇²B ∼ B/(µ0κeL²) in the induction equation (magnetic-field evolution equation) (17.6):

$$R_M = \frac{V/L}{D_M/L^2} = \frac{VL}{D_M} = \mu_0\kappa_e V L . \tag{17.9}$$

Substance           L (m)   V (m s⁻¹)   DM (m² s⁻¹)   τM (s)   RM
Mercury             0.1     0.1         1             0.01     0.01
Liquid Sodium       0.1     0.1         0.1           0.1      0.1
Laboratory Plasma   1       100         10            0.1      10
Earth's Core        10⁷     0.1         1             10¹⁴     10⁶
Interstellar Gas    10¹⁷    10³         10³           10³¹     10¹⁷

Table 17.1: Characteristic magnetic diffusivities DM, decay times τM and Magnetic Reynolds numbers RM for some common MHD flows with characteristic length scales L and velocities V.

When RM ≫ 1, the field lines are effectively frozen into the fluid; when RM ≪ 1, Ohmic dissipation is dominant. Magnetic Reynolds numbers and diffusion times for some typical MHD flows are given in Table 17.1. For most laboratory conditions, RM is modest, which means that the electric resistivity 1/κe is significant and the magnetic diffusivity DM is rarely negligible. By contrast, in space physics and astrophysics, RM ≫ 1, so the resistivity can be ignored almost always and everywhere. This limiting case, when the electric conductivity is treated as infinite, is often called perfect MHD.

The phrase almost always and everywhere needs clarification. Just as for large-Reynolds-number fluid flows, so also here, boundary layers and discontinuities can be formed, in which the gradients of physical quantities are automatically large enough to make RM ∼ 1 locally. A new and important example discussed below is magnetic reconnection. This occurs when regions magnetized along different directions are juxtaposed, for example when the solar wind encounters the earth's magnetosphere. In such discontinuities and boundary layers, magnetic diffusion and Ohmic dissipation are important; and, as in ordinary fluid mechanics, these dissipative layers and discontinuities can control the character of the overall flow despite occupying a negligible fraction of the total volume.
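The entries of Table 17.1 follow directly from τM = L²/DM and RM = VL/DM [Eq. (17.9)]. A short sketch reproduces them, and also evaluates DM = 1/(µ0κe) from Eq. (17.8) for mercury using an approximate conductivity κe ∼ 10⁶ S m⁻¹ (an outside value, not taken from the text). The helper names are illustrative.

```python
import math

YEAR = 3.156e7  # seconds per year (approximate)

# (L [m], V [m/s], D_M [m^2/s]) as listed in Table 17.1
flows = {
    "Mercury": (0.1, 0.1, 1.0),
    "Liquid Sodium": (0.1, 0.1, 0.1),
    "Laboratory Plasma": (1.0, 100.0, 10.0),
    "Earth's Core": (1e7, 0.1, 1.0),
    "Interstellar Gas": (1e17, 1e3, 1e3),
}

def tau_M(L, D_M):
    """Magnetic diffusion time L^2 / D_M."""
    return L**2 / D_M

def R_M(L, V, D_M):
    """Magnetic Reynolds number V L / D_M, Eq. (17.9)."""
    return V * L / D_M

def D_M_from_conductivity(kappa_e):
    """Magnetic diffusivity D_M = 1 / (mu0 kappa_e), Eq. (17.8)."""
    return 1.0 / (4e-7 * math.pi * kappa_e)

for name, (L, V, D) in flows.items():
    print(f"{name:18s} tau_M = {tau_M(L, D):.0e} s   R_M = {R_M(L, V, D):.0e}")

print(tau_M(1e7, 1.0) / YEAR)          # Earth's core: about three million years
print(D_M_from_conductivity(1.0e6))    # mercury: D_M ~ 0.8 m^2/s, i.e. of order 1
```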

17.2.2 Momentum and Energy Conservation

The fluid dynamical aspects of MHD are handled by adding an electromagnetic force term to the Euler or Navier-Stokes equation. The magnetic force density $\mathbf j\times\mathbf B$ is the sum of the Lorentz forces acting on all the fluid’s charged particles in a unit volume. There is also an electric force density $\rho_e\mathbf E$. By virtue of Eqs. (17.5a)–(17.5c), however, it is smaller than $\mathbf j\times\mathbf B$ by a factor $O(v^2/c^2)$, so we shall ignore it. We add $\mathbf j\times\mathbf B$ to the Euler equation (12.40), or equivalently to the Navier-Stokes equation with the viscosity neglected as unimportant in the situations we shall study. The result is

$$\rho\frac{d\mathbf v}{dt} = \rho\mathbf g - \nabla P + \mathbf j\times\mathbf B = \rho\mathbf g - \nabla P + \frac{(\nabla\times\mathbf B)\times\mathbf B}{\mu_0}\,. \qquad (17.10)$$

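Here we have used expression (17.5a) for the current density in terms of the magnetic field. This is our basic MHD force equation. Like all other force densities in this equation, the magnetic one $\mathbf j\times\mathbf B$ can be expressed as minus the divergence of a stress tensor. That tensor is the magnetic portion of the Maxwell stress tensor,

$$\mathbf T_M = \frac{B^2\,\mathsf g}{2\mu_0} - \frac{\mathbf B\otimes\mathbf B}{\mu_0}\,, \qquad (17.11)$$

where $\mathsf g$ is the metric tensor; see Ex. 17.1. We use $\mathbf j\times\mathbf B = -\nabla\cdot\mathbf T_M$ together with other relations explored in Sec. 12.5 and Box 12.3. With them, we can convert the force-balance equation (17.10) into the conservation law for momentum [generalization of Eq. (12.38)]:

$$\frac{\partial(\rho\mathbf v)}{\partial t} + \nabla\cdot\left(P\,\mathsf g + \rho\mathbf v\otimes\mathbf v + \mathbf T_g + \mathbf T_M\right) = 0\,. \qquad (17.12)$$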

Here $\mathbf T_g$ is the gravitational stress tensor [Eq. (1) of Box 12.3], which resembles the magnetic one:

$$\mathbf T_g = -\frac{g^2\,\mathsf g}{8\pi G} + \frac{\mathbf g\otimes\mathbf g}{4\pi G}\,. \qquad (17.13)$$

It is generally unimportant in laboratory plasmas but can be quite important in some astrophysical plasmas, e.g., near black holes and neutron stars.

The two terms in the magnetic Maxwell stress tensor, Eq. (17.11), have simple interpretations. The first is the “push” of an isotropic magnetic pressure $B^2/2\mu_0$, which acts just like the gas pressure $P$. The second is the “pull” of a tension $B^2/\mu_0$ that acts parallel to the magnetic field. Together, the tension and the isotropic pressure give a net tension $B^2/2\mu_0$ along the field and a net pressure $B^2/2\mu_0$ perpendicular to the field lines. If we expand the divergence of the magnetic stress tensor, we obtain for the magnetic force density

$$\mathbf f_m = -\nabla\cdot\mathbf T_M = \mathbf j\times\mathbf B = \frac{(\nabla\times\mathbf B)\times\mathbf B}{\mu_0}\,, \qquad (17.14)$$

as expected. Using standard vector identities, we can rewrite this as

$$\mathbf f_m = -\nabla\left(\frac{B^2}{2\mu_0}\right) + \frac{(\mathbf B\cdot\nabla)\mathbf B}{\mu_0} = -\left[\nabla\left(\frac{B^2}{2\mu_0}\right)\right]_\perp + \left[\frac{(\mathbf B\cdot\nabla)\mathbf B}{\mu_0}\right]_\perp\,. \qquad (17.15)$$

Here “⊥” means keep only the components perpendicular to the magnetic field. Because $\mathbf f_m = \mathbf j\times\mathbf B$, the net force parallel to $\mathbf B$ must vanish, so we can throw away the component along $\mathbf B$ in each term. This transversality of $\mathbf f_m$ means that the magnetic force neither inhibits nor promotes motion of the fluid along the magnetic field. Instead, fluid elements are free to slide along the field like beads that slide without friction along a magnetic “wire”.

The “⊥” expressions in Eq. (17.15) say that the magnetic force density has two parts. The first is the negative of the two-dimensional gradient of the magnetic pressure $B^2/2\mu_0$ orthogonal to $\mathbf B$ (Fig. 17.3a). The second is an orthogonal curvature force $(\mathbf B\cdot\nabla)\mathbf B/\mu_0$, which has magnitude $B^2/\mu_0 R$, where $R$ is the radius of curvature of a field line. This curvature force acts toward the field line’s center of curvature (Fig. 17.3b). It is the magnetic-field-line analog of the force that acts on a curved wire or string under tension.

Fig. 17.3: Contributions to the electromagnetic force density acting on a conducting fluid in a nonuniform field. There is a magnetic-pressure force density $-[\nabla(B^2/2\mu_0)]_\perp$ acting perpendicular to the magnetic field. There is also a magnetic curvature force density $[(\mathbf B\cdot\nabla)\mathbf B/\mu_0]_\perp$, which is likewise perpendicular to the magnetic field, lies in the plane of the field’s bend, and points toward its center of curvature. The magnitude of this curvature force density is $B^2/\mu_0 R$, where $R$ is the radius of curvature.

In our nonrelativistic situation the magnetic force density dominates and the electric force is negligible [$O(v^2/c^2)$]. Likewise, the electromagnetic contribution to the energy density is predominantly the magnetic term $U_M = B^2/2\mu_0$, with a negligible electric contribution. The electromagnetic energy flux is just the Poynting flux $\mathbf F_M = \mathbf E\times\mathbf B/\mu_0$. We insert these into the law of energy conservation (12.53), continuing to neglect viscosity, and obtain

$$\frac{\partial}{\partial t}\left[\left(\frac12 v^2 + u + \Phi\right)\rho + \frac{B^2}{2\mu_0}\right] + \nabla\cdot\left[\left(\frac12 v^2 + h + \Phi\right)\rho\mathbf v + \frac{\mathbf E\times\mathbf B}{\mu_0}\right] = 0\,. \qquad (17.16)$$

When gravitational energy is important, we must augment this equation with the gravitational energy density and flux, as discussed in Box 12.2. As in Secs. 12.7.3 and 15.6.1, we can combine this energy conservation law with mass conservation and the first law of thermodynamics. This gives an equation for the evolution of entropy, in which Eqs. (12.70) and (12.71) are modified to read

$$\frac{\partial(\rho s)}{\partial t} + \nabla\cdot(\rho s\mathbf v) = \rho\frac{ds}{dt} = \frac{j^2}{\kappa_e T}\,. \qquad (17.17)$$

Viscosity increases entropy through viscous dissipation, and thermal conductivity increases it through diffusive heat flow [Eqs. (12.70), (12.71), and (15.86)]. Just so, electrical conductivity increases entropy through Ohmic dissipation. From Eq. (17.17) we see that our fourth transport coefficient, $\kappa_e$, is constrained to be positive by the second law of thermodynamics. The same holds for our previous three: the two coefficients of viscosity, $\eta \equiv \rho\nu$ and $\zeta$, and the thermal conductivity $\kappa$.
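The equivalence of Eqs. (17.14) and (17.15), and the transversality $\mathbf f_m\cdot\mathbf B = 0$, can be checked numerically. Below is a minimal, dependency-free Python sketch that uses central finite differences. The test field $\mathbf B = (-y,\, x,\, 1 + x^2)$ (in tesla) is a hypothetical choice, made only because it is divergence-free and has both field-line curvature and a nonuniform magnitude. The sketch compares $(\nabla\times\mathbf B)\times\mathbf B/\mu_0$ with $-\nabla(B^2/2\mu_0) + (\mathbf B\cdot\nabla)\mathbf B/\mu_0$. The helper names are our own.

```python
import math

MU0 = 4e-7 * math.pi   # H/m
STEP = 1e-5            # finite-difference step (m)


def B(p):
    """Hypothetical divergence-free test field: B = (-y, x, 1 + x^2)."""
    x, y, _ = p
    return (-y, x, 1.0 + x * x)


def jacobian(f, p):
    """d[i][j] = d f_j / d x_i at point p, by central differences."""
    d = []
    for i in range(3):
        pp, pm = list(p), list(p)
        pp[i] += STEP
        pm[i] -= STEP
        d.append([(a - b) / (2 * STEP) for a, b in zip(f(pp), f(pm))])
    return d


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def force_curl_form(p):
    """f_m = (curl B) x B / mu0, as in Eq. (17.14)."""
    d = jacobian(B, p)
    curl = (d[1][2] - d[2][1], d[2][0] - d[0][2], d[0][1] - d[1][0])
    return tuple(c / MU0 for c in cross(curl, B(p)))


def force_pressure_tension_form(p):
    """f_m = -grad(B^2/2mu0) + (B.grad)B/mu0, first form of Eq. (17.15)."""
    b, d = B(p), jacobian(B, p)
    grad_pressure = [dot(b, d[i]) / MU0 for i in range(3)]  # B_j dB_j/dx_i / mu0
    tension = [sum(b[i] * d[i][j] for i in range(3)) / MU0 for j in range(3)]
    return tuple(t - g for t, g in zip(tension, grad_pressure))


if __name__ == "__main__":
    p = (0.3, -0.2, 0.5)
    f1, f2 = force_curl_form(p), force_pressure_tension_form(p)
    print("curl form:            ", f1)
    print("pressure+tension form:", f2)
    print("cos(angle f_m, B):    ",
          dot(f1, B(p)) / math.sqrt(dot(f1, f1) * dot(B(p), B(p))))
```

Because the test field is at most quadratic, central differences are exact up to rounding. Both forms therefore agree closely, and the cosine of the angle between $\mathbf f_m$ and $\mathbf B$ is essentially zero, as the transversality argument requires.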

17.2.3 Boundary Conditions

The equations of MHD must be supplemented by boundary conditions at two different types of interfaces. The first is a contact discontinuity, i.e., the interface between two distinct media that do not mix; for example, the surface of a liquid metal or a rigid wall of a plasma containment device. The second is a shock front that the fluid is crossing; here the boundary is between shocked and unshocked fluid.

We derive the boundary conditions in two steps. First we transform into a primed frame in which the interface is instantaneously at rest (not to be confused with the fluid’s local rest frame). Then we transform back into our original unprimed inertial frame. In the primed frame, we resolve the velocity, magnetic-field, and electric-field vectors into components normal and tangential to the surface. If $\mathbf n$ is a unit vector normal to the surface, then the normal and tangential components of velocity in either frame are

$$v_n = \mathbf n\cdot\mathbf v\,, \qquad \mathbf v_t = \mathbf v - (\mathbf n\cdot\mathbf v)\,\mathbf n\,, \qquad (17.18)$$

and similarly for $\mathbf E$ and $\mathbf B$. At a contact discontinuity,

$$v_n' = v_n - v_{sn} = 0 \qquad (17.19)$$

on both sides of the interface surface; here $v_{sn}$ is the normal velocity of the surface. At a shock front, mass flux across the surface is conserved [cf. Eq. (16.40a)]:

$$[\rho v_n'] = [\rho(v_n - v_{sn})] = 0\,. \qquad (17.20)$$

Here, as in Chap. 16, we use the notation $[X]$ to signify the difference in some quantity $X$ across the interface. For the magnetic field, it does not matter which frame we use, since $\mathbf B$ is unchanged to the Galilean order at which we are working. Construct a thin “pill box” $V$ (Fig. 17.4) and integrate the equation $\nabla\cdot\mathbf B = 0$ over its volume. Invoking the divergence theorem and letting the box thickness diminish to zero, we see that

$$[B_n] = 0\,. \qquad (17.21)$$

By contrast, the tangential component of the magnetic field will generally be discontinuous across an interface because of the presence of surface currents. To get the boundary condition on the electric field, we integrate Maxwell’s equation $\nabla\times\mathbf E = -\partial\mathbf B/\partial t$ over the area bounded by the circuit $C$ in Fig. 17.4. Using Stokes’ theorem and letting the two short legs of the circuit vanish, we obtain

$$[\mathbf E'_t] = [\mathbf E_t] + [(\mathbf v_s\times\mathbf B)_t] = 0\,, \qquad (17.22)$$


Fig. 17.4: (a) Elementary pill box V and (b) elementary circuit C used in deriving the MHD junction conditions at a surface S.

where $\mathbf v_s$ is the velocity of a frame that moves with the surface. Only the normal component of this velocity contributes to the expression, so we can replace $\mathbf v_s$ by $v_{sn}\mathbf n$. The normal component of the electric field, like the tangential component of the magnetic field, will generally be discontinuous, since there may be surface charge at the interface.

There are also dynamical boundary conditions. We obtain them by integrating the laws of momentum conservation (17.12) and energy conservation (17.16) over the pill box, using Gauss’s theorem to convert the volume integral of a divergence to a surface integral. The results, naturally, require that the normal fluxes of momentum, $\mathbf T\cdot\mathbf n$, and of energy, $\mathbf F\cdot\mathbf n$, be continuous across the surface. Here $\mathbf T$ is the total stress (the quantity inside the divergence in Eq. 17.12), and $\mathbf F$ is the total energy flux (the quantity inside the divergence in Eq. 17.16); see Eqs. (16.41) and (16.42) and the associated discussion. The normal and tangential components of $[\mathbf T\cdot\mathbf n] = 0$ read

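$$\left[P + \rho v_n^2 + \frac{B_t^2}{2\mu_0}\right] = 0\,, \qquad (17.23)$$

$$\left[\rho v_n\mathbf v_t - \frac{B_n\mathbf B_t}{\mu_0}\right] = 0\,, \qquad (17.24)$$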

where we have omitted the gravitational stress, since it will always be continuous in the situations studied in this chapter (no surface layers of mass). Similarly, continuity of the energy flux, $[\mathbf F\cdot\mathbf n] = 0$, reads

$$\left[\left(\frac12 v^2 + h + \Phi\right)\rho(v_n - v_{sn}) + \frac{[(\mathbf E + \mathbf v_s\times\mathbf B)\times\mathbf B]_n}{\mu_0}\right] = 0\,. \qquad (17.25)$$

17.2.4 Magnetic field and vorticity

We have already remarked that the magnetic field and the vorticity are both axial vectors that can be written as the curl of a polar vector, and that they satisfy similar transport equations. It is not surprising that they are physically intimately related. Exploring this relationship in full detail would take us beyond the scope of this book. However, we can illustrate their interaction by showing how they can create and destroy each other.

First, consider a simple vortex threaded at time $t = 0$ with a uniform magnetic field. If the magnetic Reynolds number is large enough, the magnetic field will be carried along with the flow and wound up like spaghetti on the end of a fork (Fig. 17.5a). This will increase the magnetic energy in the vortex, though not the mean flux of magnetic field. The amplification will continue until one of two things happens: the field gradient becomes large enough that the field decays through Ohmic dissipation, or the field strength becomes large enough to react back on the flow and stop it spinning.

Fig. 17.5: (a) Amplification of the strength of a magnetic field by vortical motion. When $R_M \gg 1$, the magnetic field will be frozen into the rotating fluid and will be wound up so as to increase its strength. (b) When a tangled magnetic field is frozen into an irrotational flow, it will generally create vorticity.

Second, consider an irrotational flow containing a tangled magnetic field. Provided that the magnetic Reynolds number is again sufficiently large, the magnetic stress will act on the flow and induce vorticity. We can describe this formally by taking the curl of the equation of motion, Eq. (17.10). For simplicity, we assume that the density $\rho$ is constant and the electric conductivity is infinite. We then obtain

$$\frac{\partial\boldsymbol\omega}{\partial t} - \nabla\times(\mathbf v\times\boldsymbol\omega) = \frac{\nabla\times[(\nabla\times\mathbf B)\times\mathbf B]}{\mu_0\rho}\,. \qquad (17.26)$$

The term on the right-hand side of this equation changes the number of vortex lines threading the fluid, just like the $\nabla P\times\nabla\rho/\rho^2$ term on the right-hand side of Eq. (13.3). Note, though, that because the divergence of the vorticity is zero, any fresh vortex lines must be created as continuous curves that grow out of points or lines where the vorticity vanishes.

****************************
EXERCISES

Exercise 17.1 Derivation: Basic Equations of MHD

(a) Verify that $-\nabla\cdot\mathbf T_M = \mathbf j\times\mathbf B$, where $\mathbf T_M$ is the magnetic stress tensor (17.11).
(b) Take the scalar product of the fluid velocity $\mathbf v$ with the equation of motion (17.10) and combine with mass conservation to obtain the energy conservation equation (17.16).
(c) Combine energy conservation (17.16) with the first law of thermodynamics and mass conservation to obtain Eq. (17.17) for the evolution of the entropy.

Exercise 17.2 Problem: Diffusion of Magnetic Field
Consider an infinite cylinder of plasma with constant electric conductivity, surrounded by vacuum. Assume that the cylinder initially is magnetized uniformly parallel to its length. Assume also that the field decays quickly enough that the plasma’s inertia keeps it from moving much during the decay (so $\mathbf v \simeq 0$).
(a) Show that the reduction of magnetic energy as the field decays is compensated by the Ohmic heating of the plasma plus energy lost to outgoing electromagnetic waves (which will be negligible if the decay is slow).
(b) Compute the approximate magnetic profile after the field has decayed to a small fraction of its original value. Your answer should be expressible in terms of a Bessel function.

Exercise 17.3 Problem: The Earth’s Bow Shock
The solar wind is a supersonic, hydromagnetic flow of plasma originating in the solar corona. At the radius of the earth’s orbit, the density is $\rho \sim 6\times10^{-21}$ kg m$^{-3}$, the velocity is $v \sim 400$ km s$^{-1}$, the temperature is $T \sim 10^5$ K, and the magnetic field strength is $B \sim 1$ nT.
(a) By balancing the momentum flux with the magnetic pressure exerted by the earth’s dipole magnetic field, estimate the radius above the earth at which the solar wind passes through a bow shock.
(b) Consider a strong perpendicular shock at which the magnetic field is parallel to the shock front. Show that the magnetic field strength will increase by the same ratio as the density on crossing the shock front. Do you expect the compression to increase or decrease as the strength of the field is increased, keeping all of the other flow variables constant?
****************************

17.3 Magnetostatic Equilibria

17.3.1 Controlled thermonuclear fusion

For a half century, plasma physicists have striven to release nuclear energy in a controlled manner. The approach is to use strong magnetic fields to confine plasma at a temperature in excess of a hundred million degrees. In the most widely studied scheme, deuterium and tritium combine according to the reaction

$$d + t \rightarrow \alpha + n + 17.6\ \text{MeV}\,. \qquad (17.27)$$

The fast neutrons can be absorbed in a surrounding blanket of lithium, and the heat can then be used to drive a generator.

At first this task seemed quite simple. However, it eventually became clear that confining hot plasma with a magnetic field is very difficult, because most confinement geometries are unstable. In this book we shall restrict our attention to a few simple confinement devices. We emphasize the one that is the basis of most modern efforts, the Tokamak. (Tokamaks were originally developed in the Soviet Union; the word is a Russian acronym for “toroidal chamber with magnetic coils”.) In this section we shall only treat equilibrium configurations; in Sec. 17.5, we shall consider their stability.

In our discussions of both equilibrium and stability, we shall treat the plasma in the MHD approximation. At first this might seem rather unrealistic, because we are dealing with a dilute gas of ions and electrons that undergo infrequent Coulomb collisions. However, as we shall discuss in detail in Part V, collective effects produce a sufficiently high effective collision frequency to make the plasma behave like a fluid. MHD is therefore usually a good approximation for describing these equilibria and their rather slow temporal evolution.

Let us examine some numbers that characterize the regime in which a successful controlled-fusion device must operate. The ratio of plasma pressure to magnetic pressure,

$$\beta \equiv \frac{P}{B^2/2\mu_0}\,, \qquad (17.28)$$

plays a key role. For the magnetic field to have any chance of confining the plasma, its pressure must exceed that of the plasma; i.e., $\beta$ must be less than one. In fact the most successful designs achieve $\beta \sim 0.2$. The largest field strengths that can be safely sustained in the laboratory are $B \sim 10$ T (1 T = 10 kG). So $\beta \lesssim 0.2$ limits the gas pressure to $P \lesssim 10^7$ Pa $\sim 100$ atmospheres.

Plasma fusion can only be economically feasible if more power is released by nuclear reactions than is lost to radiative cooling. Both heating and cooling are $\propto n^2$. However, while the radiative cooling rate increases comparatively slowly with temperature, the nuclear reaction rate increases very rapidly. As the mean energy of the ions increases, the number of ions in the Maxwellian tail of the distribution function that are energetic enough to penetrate the Coulomb barrier increases exponentially. Consequently, if the rate of heat production exceeds the cooling rate by a modest factor, the temperature has a value essentially fixed by atomic and nuclear physics. For a d-t plasma this is $T \sim 10^8$ K. The maximum hydrogen density that can be confined is therefore $n = P/2k_BT \sim 3\times10^{21}$ m$^{-3}$.

Now suppose a volume $V$ of plasma is confined at a given density $n$ and temperature $T_{\rm min}$ for a time $\tau$. The amount of nuclear energy generated will be proportional to $n^2V\tau$, while the energy needed to heat the plasma up to $T$ is $\propto nV$. Therefore, the product $n\tau$ must reach a minimum value before there will be net energy production.

This condition is known as the Lawson criterion. Numerically, the plasma must be confined for $\sim (n/10^{20}\ \text{m}^{-3})^{-1}$ s, typically $\sim 30$ ms. The sound speed at these temperatures is $\sim 3\times10^5$ m s$^{-1}$. An unconfined plasma would therefore hit the few-meter-sized walls of its vessel in a few µs. The magnetic confinement must thus be effective for typically $10^4$–$10^5$ dynamical timescales (sound-crossing times). If we want to build a viable reactor, the plasma must be confined, and confined well.
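The chain of estimates in the last two paragraphs runs from field strength to allowed pressure, then to density, then to required confinement time. It can be reproduced in a few lines. This Python sketch takes $B = 10$ T, $\beta = 0.2$, $T = 10^8$ K, the Lawson scaling $\tau \sim (n/10^{20}\,\text{m}^{-3})^{-1}$ s, and the sound speed $3\times10^5$ m s$^{-1}$ from the text. The 1 m distance from plasma to wall is an assumed illustrative value; the number of sound-crossing times scales inversely with it.

```python
import math

MU0 = 4e-7 * math.pi      # vacuum permeability, H/m
K_B = 1.380649e-23        # Boltzmann constant, J/K
ATM = 1.01325e5           # one atmosphere, Pa


def max_pressure(B, beta):
    """Largest plasma pressure allowed by Eq. (17.28): P = beta * B^2 / (2 mu0)."""
    return beta * B * B / (2 * MU0)


def hydrogen_density(P, T):
    """n = P / (2 k_B T): electrons and ions each contribute n k_B T."""
    return P / (2 * K_B * T)


def lawson_time(n):
    """Rough required confinement time, (n / 1e20 m^-3)^-1 seconds."""
    return 1e20 / n


if __name__ == "__main__":
    P = max_pressure(B=10.0, beta=0.2)
    n = hydrogen_density(P, T=1e8)
    tau = lawson_time(n)
    t_cross = 1.0 / 3e5          # assumed 1 m to the wall at 3e5 m/s
    print(f"P ~ {P:.1e} Pa ~ {P / ATM:.0f} atm")
    print(f"n ~ {n:.1e} m^-3")
    print(f"tau ~ {tau * 1e3:.0f} ms, ~{tau / t_cross:.0e} sound-crossing times")
```

To within the rounding used in the text, this gives a pressure of order $10^7$ Pa, a density near $3\times10^{21}$ m$^{-3}$, a confinement time of a few tens of ms, and about $10^4$ sound-crossing times.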

17.3.2 Z-Pinch

Before discussing Tokamaks, let us describe a simpler confinement geometry known as the Z-pinch [Fig. 17.6(a)]. In a Z-pinch, electric current is induced to flow along a cylinder of plasma. This creates a toroidal magnetic field whose tension prevents the plasma from expanding radially, much as hoops on a barrel prevent it from bursting. Let us assume that the cylinder has a radius $R$ and is surrounded by vacuum. In static equilibrium we must balance the plasma pressure gradient by a Lorentz force:

$$\nabla P = \mathbf j\times\mathbf B\,. \qquad (17.29)$$

(Gravitational forces can safely be ignored.) Equation (17.29) implies immediately that $\mathbf B\cdot\nabla P = \mathbf j\cdot\nabla P = 0$. Thus both the magnetic field and the current density lie on constant-pressure (isobaric) surfaces. Using Eq. (17.15), we obtain an equivalent version of the force-balance equation (17.29):

$$\frac{d}{d\varpi}\left(P + \frac{B^2}{2\mu_0}\right) = -\frac{B^2}{\mu_0\varpi}\,, \qquad (17.30)$$

where $\varpi$ is the radial cylindrical coordinate. This exhibits the balance between the gradient of plasma and magnetic pressure on the left and the magnetic tension on the right. We treat this as a differential equation for $B^2$ and integrate it, assuming that $P$ falls to zero at the surface of the column. For the surface magnetic field we obtain

$$B^2(R) = \frac{4\mu_0}{R^2}\int_0^R P\,\varpi\,d\varpi\,. \qquad (17.31)$$

Ampère’s law lets us re-express the surface toroidal field in terms of the total current flowing along the plasma: $B(R) = \mu_0 I/2\pi R$. If the plasma is primarily hydrogen, its ion density $n$ and electron density are equal, and we can write the pressure as $P = 2nk_BT$. Inserting these into Eq. (17.31), integrating, and solving for the current, we obtain

$$I = \left(\frac{16\pi N k_B T}{\mu_0}\right)^{1/2}, \qquad (17.32)$$

where $N$ is the number of ions per unit length. Consider a 1 m column of plasma with hydrogen density $n \sim 10^{20}$ m$^{-3}$ and temperature $T \sim 10^8$ K. For it, Eq. (17.32) says that currents of several MA are required for confinement.
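Equations (17.31) and (17.32) are easy to evaluate numerically. The Python sketch below (function names our own) computes the pinch current for the text's $n \sim 10^{20}$ m$^{-3}$ and $T \sim 10^8$ K. It assumes that the “1 m column” means a column of radius 1 m with uniform density and temperature. As a consistency check, it integrates Eq. (17.31) for a uniform pressure $P = 2nk_BT$ and compares the result with Ampère’s law, $B(R) = \mu_0 I/2\pi R$. With uniform pressure, Eq. (17.29) forces all the current into a thin surface layer. That sharp-boundary case is chosen only because the comparison is then exact; any profile $P(\varpi)$ that vanishes at the surface could be passed instead.

```python
import math

MU0 = 4e-7 * math.pi   # H/m
K_B = 1.380649e-23     # J/K


def pinch_current(N, T):
    """Eq. (17.32): I = (16 pi N k_B T / mu0)^(1/2), N = ions per unit length."""
    return math.sqrt(16 * math.pi * N * K_B * T / MU0)


def surface_field(P, R, steps=10_000):
    """Eq. (17.31): B(R) = [(4 mu0 / R^2) * integral_0^R P(w) w dw]^(1/2),
    with the integral done by the midpoint rule; P is a callable P(w)."""
    dw = R / steps
    integral = sum(P((k + 0.5) * dw) * (k + 0.5) * dw for k in range(steps)) * dw
    return math.sqrt(4 * MU0 * integral / R**2)


if __name__ == "__main__":
    R, n, T = 1.0, 1e20, 1e8            # assumed radius (m), density (m^-3), temperature (K)
    N = n * math.pi * R**2              # ions per metre for a uniform column
    I = pinch_current(N, T)
    B_profile = surface_field(lambda w: 2 * n * K_B * T, R)
    B_ampere = MU0 * I / (2 * math.pi * R)
    print(f"I ~ {I / 1e6:.1f} MA")
    print(f"B(R): {B_profile:.4f} T from Eq. (17.31), {B_ampere:.4f} T from Ampere's law")
```

Under these assumptions the current comes out at roughly 4 MA, consistent with the text's “several MA”.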


Fig. 17.6: (a) The Z-pinch. (b) The θ-pinch. (c) The Tokamak.

17.3.3 θ-Pinch

There is a complementary equilibrium for a cylindrical plasma in which the magnetic field lies parallel to the axis and the current density encircles the cylinder [Fig. 17.6(b)]. This is called the θ-pinch. The configuration is usually established by making a cylindrical metal tube with a small gap, so that current can flow around it as shown in the figure. The tube is filled with cold plasma, and then the current is turned on quickly. This produces a rapidly growing longitudinal field inside the tube (as inside a solenoid). Since the plasma is highly conducting, the field lines cannot penetrate the plasma column. Instead they exert a stress on its surface, causing it to shrink radially and rapidly. The plasma heats up due to both the radial work done on it and Ohmic heating. Equilibrium is established when the magnetic pressure $B^2/2\mu_0$ at the plasma’s surface balances its internal pressure.

17.3.4 Tokamak

One of the problems with these pinches (and we shall find others below) is that they have ends through which plasma can escape. This is readily cured by replacing the cylinder with a torus. It turns out that the most stable geometry, called the Tokamak, combines features of both Z- and θ-pinches; see Fig. 17.6(c).

Introduce spherical coordinates $(r, \theta, \phi)$. Magnetic field lines and currents that lie in an $(r, \theta)$ plane (orthogonal to $\mathbf e_\phi$) are called poloidal, whereas $\phi$ components are called toroidal. In a Tokamak, the toroidal field is created by external poloidal current windings. However, the poloidal field is mostly created by toroidal current induced to flow within the plasma torus. The resulting net field lines wrap around the plasma torus in a helical manner, defining a magnetic surface on which the pressure is constant. The average value of $2\pi\,d\theta/d\phi$ along the trajectory of a field line is called the rotational transform, $\iota$. It is a property of the magnetic surface on which the field line resides. If $\iota/2\pi$ is a rational number, then the field line will close after a finite number of circuits. In general, however, $\iota/2\pi$ will not be rational, so a single field line will cover the whole magnetic surface ergodically. This allows the plasma to spread over the whole surface rapidly. The rotational transform is a measure of the toroidal current flowing within the magnetic surface. It increases as we move outward from the innermost magnetic surface, while the pressure decreases.

The best performance to date was registered by the Tokamak Fusion Test Reactor (TFTR) in Princeton in 1994 (see Strachan et al. 1994). The radius of the torus was $\sim 2.5$ m and the magnetic field strength $B \sim 5$ T. A nuclear power of $\sim 10$ MW was produced with $\sim 40$ MW of heating. The actual confinement time approached $\tau \sim 1$ s.
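The difference between rational and irrational $\iota/2\pi$ can be illustrated with a toy model. In it, a field line advances in poloidal angle by exactly $\iota$ on each toroidal circuit; this uniform advance is a simplifying assumption, not a property of any particular device. The Python sketch below records where the line crosses a fixed poloidal cross-section, measured in units of a full poloidal turn, and reports the largest arc not yet visited. For $\iota/2\pi = 2/5$ the line closes after five circuits and the gap never shrinks. For an irrational value (here the golden-ratio conjugate, chosen arbitrarily) the gap keeps shrinking as the line covers the surface.

```python
from fractions import Fraction
import math


def crossings(iota_over_2pi, circuits):
    """Poloidal positions (fractions of a full turn, in [0, 1)) where the field
    line crosses a fixed poloidal plane on successive toroidal circuits,
    assuming a uniform poloidal advance of iota per circuit (toy model)."""
    return [(k * iota_over_2pi) % 1 for k in range(circuits)]


def largest_gap(points):
    """Largest arc of the poloidal circle not yet visited by the field line."""
    s = sorted(set(points))
    gaps = [b - a for a, b in zip(s, s[1:])]
    gaps.append(1 - s[-1] + s[0])  # wrap-around gap
    return max(gaps)


if __name__ == "__main__":
    rational = Fraction(2, 5)
    irrational = (math.sqrt(5) - 1) / 2
    for n in (10, 100, 1000):
        print(n,
              len(set(crossings(rational, n))),
              largest_gap(crossings(rational, n)),
              f"{largest_gap(crossings(irrational, n)):.4f}")
```

Exact `Fraction` arithmetic is used for the rational case so that the closure after five circuits is detected exactly rather than up to rounding.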
The next major step is ITER (whose name means “the way” in Latin): a tokamak-based experimental fusion reactor being developed by a large international consortium; see http://www.iter.org/ . Its tokamak will be about twice as large in linear dimensions as TFTR. Its goal is a fusion power output of 410 MW, roughly forty times the $\sim 10$ MW produced by TFTR. Even when “break-even” with large power output can be attained routinely, major engineering problems will remain before controlled fusion is fully practical.
****************************


Fig. 17.7: Hartmann flow with speed v along a duct of thickness 2a, perpendicular to an applied magnetic field of strength B0 . The short side walls are conducting and the two long horizontal walls are electrically insulating.

EXERCISES

Exercise 17.4 Problem: Strength of Magnetic Field in a Magnetic Confinement Device
The currents that are sources for strong magnetic fields have to be held in place by solid conductors. Estimate the limiting field that can be sustained using normal construction materials.

Exercise 17.5 Problem: Force-free Equilibria
In an equilibrium state of a very low-$\beta$ plasma, the pressure forces are ignorably small, so the Lorentz force $\mathbf j\times\mathbf B$ must vanish; such a plasma is said to be “force-free”. This in turn implies that the current density is parallel to the magnetic field, so $\nabla\times\mathbf B = \alpha\mathbf B$. Show that $\alpha$ must be constant along a field line. Show also that if the field lines eventually travel everywhere, then $\alpha$ must be constant everywhere.
****************************

17.4 Hydromagnetic Flows

Now let us consider a simple stationary flow: an electrically conducting fluid flowing along a duct of constant cross-section, perpendicular to a uniform magnetic field $\mathbf B_0$ (see Fig. 17.7). This is sometimes known as Hartmann flow. The duct has two insulating walls (top and bottom as shown in the figure), separated by a distance $2a$. This distance is much smaller than the separation of the short side walls, which are electrically conducting. To relate Hartmann flow to magnetic-free Poiseuille flow (viscous, laminar flow between plates), we shall reinstate the viscous force in the equation of motion.

Fig. 17.8: Four variations on Hartmann flow. (a) Electromagnetic brake. (b) MHD power generator. (c) Flow meter. (d) Electromagnetic pump.

For simplicity we assume that the time-independent flow ($\partial\mathbf v/\partial t = 0$) has travelled far enough down the duct ($x$ direction) to have reached an $x$-independent form. Then $\mathbf v\cdot\nabla\mathbf v = 0$ and $\mathbf v = \mathbf v(y, z)$. We also assume that gravitational forces are unimportant. The flow’s equation of motion then takes the form

$$\nabla P = \mathbf j\times\mathbf B + \eta\nabla^2\mathbf v\,, \qquad (17.33)$$

where $\eta = \rho\nu$ is the coefficient of dynamic viscosity. The magnetic (Lorentz) force $\mathbf j\times\mathbf B$ alters the balance between the Poiseuille flow’s viscous force $\eta\nabla^2\mathbf v$ and the pressure gradient $\nabla P$. The details of that altered balance, and of the resulting magnetically influenced flow, depend on how the walls are connected electrically. Let us consider four possibilities that bring out the essential physics:

Electromagnetic Brake; Fig. 17.8a
We short-circuit the electrodes so a current $\mathbf j$ can flow. The fluid partially drags the magnetic field lines and bends them (as embodied in $\nabla\times\mathbf B = \mu_0\mathbf j$). The bent lines exert a decelerating tension force $(\mathbf B\cdot\nabla)\mathbf B/\mu_0$ on the flow (Fig. 17.3b); this is the streamwise part of $\mathbf j\times\mathbf B = (\nabla\times\mathbf B)\times\mathbf B/\mu_0$. This is an electromagnetic brake. The pressure gradient, which is trying to accelerate the fluid, is balanced by the magnetic tension. The work done per unit volume by the pressure gradient, $\mathbf v\cdot(-\nabla P)$, is converted into heat through viscous and Ohmic dissipation.

MHD Power Generator; Fig. 17.8b
This is similar to the electromagnetic brake, except that an external load is added to the circuit. Useful power can then be extracted from the flow. This may ultimately be practical in power stations, where a flowing, conducting fluid could generate electricity directly without having to drive a turbine.

Flow Meter; Fig. 17.8c
When the electrodes are on open circuit, the induced electric field produces a measurable potential difference across the duct. This voltage increases monotonically with the rate of flow of fluid through the duct and therefore provides a measurement of the flow.

Electromagnetic Pump; Figs. 17.7 and 17.8d
Finally, we can attach a battery to the electrodes and allow a current to flow. This produces a Lorentz force that either accelerates or decelerates the flow, depending on the direction of the magnetic field. This method is used to pump liquid sodium coolant around a nuclear reactor. It has also been proposed as a means of spacecraft propulsion in interplanetary space.

We consider in some detail two limiting cases of the electromagnetic pump. Suppose first that there is a constant pressure gradient $Q = -dP/dx$ but no magnetic field. A flow with modest Reynolds number will then be approximately laminar, with velocity profile

$$v_x(z) = \frac{Qa^2}{2\eta}\left[1 - \left(\frac{z}{a}\right)^2\right], \qquad (17.34)$$

where $a$ is the half width of the channel. This is the one-dimensional version of the Poiseuille flow in a pipe such as a blood vessel, which we studied in Sec. 12.7.6; cf. Eq. (12.76).

Now suppose that uniform electric and magnetic fields $E_0$, $B_0$ are applied along the $\mathbf e_y$ and $\mathbf e_z$ directions, respectively (Fig. 17.7). The resulting magnetic force $\mathbf j\times\mathbf B$ can either reinforce or oppose the fluid’s motion. When the applied magnetic field is small, $B_0 \ll E_0/v_x$, the magnetic force acts very much like the pressure gradient. Equation (17.34) must then be modified by replacing $Q \equiv -dP/dx$ by $-dP/dx + j_yB_z = -dP/dx + \kappa_eE_0B_0$. [Here $j_y = \kappa_e(E_y - v_xB_z) \simeq \kappa_eE_0$.]

If the strength of the magnetic field is increased sufficiently, the magnetic force will dominate the viscous force, except in thin boundary layers near the walls. Outside the boundary layers, in the bulk of the flow, the velocity adjusts so that the electric field vanishes in the rest frame of the fluid, i.e., $v_x = E_0/B_0$.
In the boundary layers, $v_x$ drops sharply from $E_0/B_0$ to zero at the walls, and correspondingly there is a strong viscous force, $\eta\nabla^2\mathbf v$. The pressure gradient $\nabla P$ must be essentially the same in the boundary layer as in the adjacent bulk flow, so it cannot balance this large viscous force. Instead the magnetic force must balance it: $\mathbf j\times\mathbf B + \eta\nabla^2\mathbf v = 0$ [Eq. (17.33)], with $\mathbf j = \kappa_e(\mathbf E + \mathbf v\times\mathbf B) \sim \kappa_e v_xB_0\mathbf e_y$. We thereby see that the thickness of the boundary layer will be

$$\delta_H \sim \left(\frac{\eta}{\kappa_eB_0^2}\right)^{1/2}. \qquad (17.35)$$

This suggests a new dimensionless number to characterize the flow,

$$H = \frac{a}{\delta_H} = B_0a\left(\frac{\kappa_e}{\eta}\right)^{1/2}, \qquad (17.36)$$

called the Hartmann number. $H^2$ is essentially the ratio of the magnetic force, $|\mathbf j\times\mathbf B| \sim \kappa_ev_xB_0^2$, to the viscous force, $\sim \eta v_x/a^2$, when the velocity is taken to vary on lengthscale $a$ rather than $\delta_H$.
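To get a feel for the sizes involved, the Python sketch below evaluates Eqs. (17.35) and (17.36) and the force ratio $H^2$. The inputs are assumptions for illustration, not values from the text. We take $\kappa_e \approx 10^6$ S m$^{-1}$ and $\eta \approx 1.5\times10^{-3}$ Pa s, roughly those of a liquid metal such as mercury, together with $B_0 = 1$ T and a half-width $a = 1$ cm.

```python
import math


def hartmann_layer_thickness(eta, kappa_e, B0):
    """Eq. (17.35): delta_H ~ (eta / (kappa_e B0^2))^(1/2), in metres."""
    return math.sqrt(eta / (kappa_e * B0**2))


def hartmann_number(a, B0, kappa_e, eta):
    """Eq. (17.36): H = a / delta_H = B0 a (kappa_e / eta)^(1/2)."""
    return B0 * a * math.sqrt(kappa_e / eta)


def magnetic_to_viscous_ratio(a, B0, kappa_e, eta, v=1.0):
    """|j x B| ~ kappa_e v B0^2 divided by the viscous force ~ eta v / a^2 (equals H^2)."""
    return (kappa_e * v * B0**2) / (eta * v / a**2)


if __name__ == "__main__":
    a, B0, kappa_e, eta = 0.01, 1.0, 1e6, 1.5e-3   # assumed illustrative values (SI)
    delta = hartmann_layer_thickness(eta, kappa_e, B0)
    H = hartmann_number(a, B0, kappa_e, eta)
    print(f"delta_H ~ {delta * 1e6:.0f} micrometres, H ~ {H:.0f}")
    print(f"a/delta_H = {a / delta:.0f}, H^2 = {H**2:.3g}, "
          f"force ratio = {magnetic_to_viscous_ratio(a, B0, kappa_e, eta):.3g}")
```

With these inputs the boundary layers are a few tens of micrometres thick and $H$ is a few hundred. Laboratory liquid-metal flows in tesla-strength fields are thus typically deep in the flat-profile, high-$H$ regime.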


Fig. 17.9: Velocity profiles [Eq. (17.38)] for flow in an electromagnetic pump of width $2a$ with small and large Hartmann number, plotted as $v_x(z)/v_x(0)$ against $z/a$, i.e., scaled to the velocity at the center of the channel. Dashed curve: the almost parabolic profile for $H = 0.1$ [Eq. (17.34)]. Solid curve: the almost flat-topped profile for $H = 10$.

The detailed velocity profile $v_x(z)$ away from the vertical side walls is computed in Exercise 17.6 and is shown for low and high Hartmann numbers in Fig. 17.9. At low $H$ the plotted profile is nearly parabolic, as expected. At high $H$ it consists of boundary layers at $z \sim -a$ and $z \sim a$, with a uniform flow in between.

****************************
EXERCISES

Exercise 17.6 Example: Hartmann Flow
Compute the velocity profile of a conducting fluid in a duct of thickness $2a$ perpendicular to externally generated, uniform electric and magnetic fields ($E_0\mathbf e_y$ and $B_0\mathbf e_z$), as shown in Fig. 17.7. Away from the vertical sides of the duct, the velocity $v_x$ is just a function of $z$, and the pressure can be written in the form $P = -Qx + p(z)$, where $Q$ is the longitudinal pressure gradient.
(a) Show that the velocity field satisfies the differential equation

$$\frac{d^2v_x}{dz^2} - \frac{\kappa_eB_0^2}{\eta}v_x = -\frac{Q + \kappa_eB_0E_0}{\eta}\,. \qquad (17.37)$$

(b) Impose suitable boundary conditions at the bottom and top walls of the channel and solve this differential equation to obtain the following velocity field:

$$v_x = \frac{Q + \kappa_eB_0E_0}{\kappa_eB_0^2}\left[1 - \frac{\cosh(Hz/a)}{\cosh H}\right], \qquad (17.38)$$

where $H$ is the Hartmann number; cf. Fig. 17.9.

****************************
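The solution (17.38) can be evaluated directly to reproduce the behavior plotted in Fig. 17.9. The Python sketch below (function names our own) computes both the dimensional velocity and the profile normalized to its central value. It checks three things. For $H = 0.1$ the profile differs from the parabola $1 - (z/a)^2$ of Eq. (17.34) by less than $10^{-3}$. For $H = 10$ it is flat-topped and reaches about $1 - e^{-1}$ of its central value at a distance $a/H = \delta_H$ from the wall. For $Q = 0$ and large $H$ the bulk velocity approaches $E_0/B_0$. The dimensional inputs in the last check are assumed illustrative values.

```python
import math


def hartmann_velocity(z, a, Q, kappa_e, B0, E0, eta):
    """Eq. (17.38): v_x(z) = (Q + kappa_e B0 E0)/(kappa_e B0^2) * [1 - cosh(Hz/a)/cosh H].
    Note: math.cosh overflows for H above about 710; fine for the values used here."""
    H = B0 * a * math.sqrt(kappa_e / eta)          # Hartmann number, Eq. (17.36)
    prefactor = (Q + kappa_e * B0 * E0) / (kappa_e * B0**2)
    return prefactor * (1.0 - math.cosh(H * z / a) / math.cosh(H))


def normalized_profile(z_over_a, H):
    """Eq. (17.38) divided by its value at the channel centre, v_x(z)/v_x(0)."""
    return (1.0 - math.cosh(H * z_over_a) / math.cosh(H)) / (1.0 - 1.0 / math.cosh(H))


def parabola(z_over_a):
    """Normalized Poiseuille profile of Eq. (17.34)."""
    return 1.0 - z_over_a**2


if __name__ == "__main__":
    grid = [i / 10 for i in range(-10, 11)]
    for H in (0.1, 10.0):
        worst = max(abs(normalized_profile(x, H) - parabola(x)) for x in grid)
        print(f"H = {H}: max |profile - parabola| = {worst:.2e}")
    print("H = 10, distance a/H from wall:", round(normalized_profile(0.9, 10.0), 4),
          "vs 1 - 1/e =", round(1 - math.exp(-1), 4))
    # Assumed illustrative SI inputs: no pressure gradient, E0 = 0.5 V/m, B0 = 1 T.
    v_mid = hartmann_velocity(0.0, a=0.01, Q=0.0, kappa_e=1e6, B0=1.0, E0=0.5, eta=1.5e-3)
    print("bulk velocity:", v_mid, "m/s; E0/B0 =", 0.5 / 1.0)
```

Normalizing by the central value removes the prefactor $(Q + \kappa_eB_0E_0)/\kappa_eB_0^2$. The comparison with the parabola then depends on $H$ alone, which is exactly how Fig. 17.9 is scaled.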

17.5 Stability of Hydromagnetic Equilibria

Having used the MHD equation of motion to analyze some simple flows, let us return to the question of magnetic confinement and demonstrate a procedure to analyze the stability of hydromagnetic equilibria. We first perform a straightforward linear perturbation analysis about equilibrium, obtaining an eigenequation for the perturbation’s oscillation frequencies ω. For sufficiently simple equilibria, this eigenequation can be solved analytically, but most equilibria are too complex for this so the eigenequation must be solved numerically or by other approximation techniques. This is rather similar to the task we face in attempting to solve the Schrödinger equation for multi-electron atoms. It will not be a surprise to learn that variational methods are especially practical and useful, and we shall develop a suitable formalism. We shall develop the perturbation theory, eigenequation, and variational formalism in some detail not only because of their importance for the stability of hydromagnetic equilibria, but also because essentially the same techniques (with different equations) are used in studying the stability of other equilibria. One example is the oscillations and stability of stars, in which the magnetic field is unimportant while self gravity is crucial [see, e.g., Chap. 6 of Shapiro and Teukolsky (1983), and Sec. 15.2.4 of this book, on helioseismology]. Another example is the oscillations and stability of elastostatic equilibria, in which B is absent but shear stresses are important (see, e.g., Secs. 11.3 and 11.4).

17.5.1 Linear Perturbation Theory

Consider a perfectly conducting, isentropic fluid at rest in equilibrium, with pressure gradients that balance magnetic forces. For simplicity, we shall ignore gravity. (This is usually justified in laboratory situations.) The equation of equilibrium then reduces to

$$\nabla P = \mathbf{j}\times\mathbf{B}. \tag{17.39}$$

We now perturb slightly about this equilibrium and ignore the (usually negligible) effects of viscosity and magnetic-field diffusion, so η = ρν ≃ 0 and κe ≃ ∞. It is useful and conventional to describe the perturbations in terms of two different types of quantities: (i) The change in a quantity (e.g., the fluid density) moving with the fluid, which is called a Lagrangian perturbation and denoted by the symbol ∆ (e.g., the Lagrangian density perturbation ∆ρ). (ii) The change at a fixed location in space, which is called an Eulerian perturbation and denoted by the symbol δ (e.g., the Eulerian density perturbation δρ). The fundamental variable used in the theory is the fluid’s Lagrangian displacement ∆x ≡ ξ(x, t); i.e., the change in location of a fluid element, moving with the fluid. A fluid element whose location is x in the unperturbed equilibrium is moved to location x + ξ(x, t) by the perturbations. From their definitions, one can see that the Lagrangian and Eulerian perturbations are related by

$$\Delta = \delta + \boldsymbol{\xi}\cdot\nabla, \quad\text{e.g.,}\quad \Delta\rho = \delta\rho + \boldsymbol{\xi}\cdot\nabla\rho. \tag{17.40}$$

Now, consider the transport law for the magnetic field, ∂B/∂t = ∇ × (v × B) [Eq. (17.6)]. To linear order, the velocity is v = ∂ξ/∂t. Inserting this into the transport law, and setting the full magnetic field at fixed x, t equal to the equilibrium field plus its Eulerian perturbation, B → B + δB, we obtain ∂δB/∂t = ∇ × [(∂ξ/∂t) × (B + δB)]. Linearizing in the perturbation, and integrating in time, we obtain for the Eulerian perturbation of the magnetic field:

$$\delta\mathbf{B} = \nabla\times(\boldsymbol{\xi}\times\mathbf{B}). \tag{17.41}$$

Since the current and the field are related, in general, by the linear equation j = ∇ × B/µ0, their Eulerian perturbations are related in this same way:

$$\delta\mathbf{j} = \nabla\times\delta\mathbf{B}/\mu_0. \tag{17.42}$$

In the equation of mass conservation, ∂ρ/∂t + ∇ · (ρv) = 0, we replace the density by its equilibrium value plus its Eulerian perturbation, ρ → ρ + δρ, replace v by ∂ξ/∂t, linearize in the perturbation, and integrate in time to obtain

$$\delta\rho + \rho\,\nabla\cdot\boldsymbol{\xi} + \boldsymbol{\xi}\cdot\nabla\rho = 0. \tag{17.43}$$

The Lagrangian density perturbation, obtained from this via Eq. (17.40), is

$$\Delta\rho = -\rho\,\nabla\cdot\boldsymbol{\xi}. \tag{17.44}$$
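The linearized relations (17.40), (17.43), and (17.44) can be checked against an exact one-dimensional calculation. The sketch below is a toy, not an MHD computation: it assumes a made-up density profile ρ(x) = 1 + 0.3 sin x and a small made-up displacement ξ(x) = ε cos 2x, so that ∇ · ξ reduces to dξ/dx. It moves each fluid element exactly, conserving its mass, and compares the resulting Lagrangian and Eulerian density changes with −ρ∇ · ξ and −∇ · (ρξ); the residuals should be of second order in ε.

```python
import math

EPS = 1e-5  # amplitude of the small, hypothetical displacement

def rho0(x): return 1.0 + 0.3 * math.sin(x)      # hypothetical equilibrium density profile
def drho0(x): return 0.3 * math.cos(x)
def xi(x): return EPS * math.cos(2.0 * x)         # hypothetical displacement field xi(x)
def dxi(x): return -2.0 * EPS * math.sin(2.0 * x)

def displaced_density(x):
    """Exact density of the element that started at x (1-D mass conservation).

    The element's width dx is stretched to dx (1 + xi'), so rho_new = rho0 / (1 + xi')."""
    return rho0(x) / (1.0 + dxi(x))

def exact_lagrangian(x):
    """Delta rho: change following the element from x to x + xi(x)."""
    return displaced_density(x) - rho0(x)

def exact_eulerian(y, iterations=50):
    """delta rho: change at the fixed point y."""
    x = y
    for _ in range(iterations):   # solve x + xi(x) = y by fixed-point iteration
        x = y - xi(x)
    return displaced_density(x) - rho0(y)

def linear_lagrangian(x): return -rho0(x) * dxi(x)                     # Eq. (17.44)
def linear_eulerian(x): return -(rho0(x) * dxi(x) + xi(x) * drho0(x))   # Eq. (17.43)

for point in (0.3, 1.2, 2.5, 4.0):
    print(f"x = {point}: Eulerian residual {exact_eulerian(point) - linear_eulerian(point):.2e}, "
          f"Lagrangian residual {exact_lagrangian(point) - linear_lagrangian(point):.2e}")
```

Halving `EPS` should shrink the residuals roughly fourfold, which is the signature of a correct first-order expansion.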

We assume that, as it moves, the fluid gets compressed or expanded adiabatically (no Ohmic or viscous heating, and no radiative cooling). Then the Lagrangian change of pressure ∆P in each fluid element (moving with the fluid) is related to the Lagrangian change of density by

$$\Delta P = \left(\frac{\partial P}{\partial\rho}\right)_s \Delta\rho = \frac{\gamma P}{\rho}\,\Delta\rho = -\gamma P\,\nabla\cdot\boldsymbol{\xi}, \tag{17.45}$$

where γ is the fluid’s adiabatic index (ratio of specific heats), which might or might not be independent of position in the equilibrium configuration. Correspondingly, the Eulerian perturbation of the pressure (perturbation at fixed location) is

$$\delta P = \Delta P - (\boldsymbol{\xi}\cdot\nabla)P = -\gamma P\,(\nabla\cdot\boldsymbol{\xi}) - (\boldsymbol{\xi}\cdot\nabla)P. \tag{17.46}$$

This is the pressure perturbation that appears in the fluid’s equation of motion. By replacing v → ∂ξ/∂t, P → P + δP, B → B + δB, and j → j + δj in the fluid’s equation of motion (17.10), neglecting gravity, and then linearizing in the perturbation, we obtain

$$\rho\,\frac{\partial^2\boldsymbol{\xi}}{\partial t^2} = \mathbf{j}\times\delta\mathbf{B} + \delta\mathbf{j}\times\mathbf{B} - \nabla\delta P = \hat{\mathbf{F}}[\boldsymbol{\xi}]. \tag{17.47}$$

Here F̂[ξ] is a real, linear differential operator, whose form one can deduce by substituting expressions (17.41), (17.42), (17.46) for δB, δj, and δP, and ∇ × B/µ0 for j. By performing those substitutions and carefully rearranging the terms (using the equilibrium condition (17.39) along the way), we eventually convert the operator F̂ into the following form, expressed in slot-naming index notation:

$$\hat F_i[\boldsymbol{\xi}] = \left[\left((\gamma-1)P + \frac{B^2}{2\mu_0}\right)\xi_{k;k} - \frac{B_jB_k}{\mu_0}\,\xi_{j;k}\right]_{;i} + \left[\left(P + \frac{B^2}{2\mu_0}\right)\xi_{j;i} - \frac{B_iB_j}{\mu_0}\,\xi_{k;k} + \frac{B_jB_k}{\mu_0}\,\xi_{i;k}\right]_{;j}. \tag{17.48}$$

Honestly! Here the semicolons denote gradients (partial derivatives in Cartesian coordinates; connection coefficients are required in curvilinear coordinates). We write the operator F̂_i in the explicit form (17.48) because of its power for demonstrating that F̂ is self-adjoint (Hermitian, with real variables rather than complex): by introducing the Kronecker-delta components of the metric, g_ij = δ_ij, we can rewrite Eq. (17.48) in the form

$$\hat F_i[\boldsymbol{\xi}] = \left(T_{ijkl}\,\xi_{k;l}\right)_{;j}, \tag{17.49}$$

where T_ijkl are the components of a fourth-rank tensor that is symmetric under interchange of its first and second pairs of indices, T_ijkl = T_klij. It then should be evident that, when we integrate over the volume V of our hydromagnetic configuration, we obtain

$$\int_V \boldsymbol{\zeta}\cdot\hat{\mathbf F}[\boldsymbol{\xi}]\,dV = \int_V \zeta_i\left(T_{ijkl}\,\xi_{k;l}\right)_{;j}dV = -\int_V T_{ijkl}\,\zeta_{i;j}\,\xi_{k;l}\,dV = \int_V \xi_i\left(T_{ijkl}\,\zeta_{k;l}\right)_{;j}dV = \int_V \boldsymbol{\xi}\cdot\hat{\mathbf F}[\boldsymbol{\zeta}]\,dV. \tag{17.50}$$

Here we have used Gauss’s theorem (integration by parts), and to make the surface terms vanish we have required that ξ and ζ be any two functions that vanish on the boundary of the configuration, ∂V [or, more generally, for which T_ijkl ξ_{k;l} ζ_i n_j and T_ijkl ζ_{k;l} ξ_i n_j vanish there, with n_j the normal to the boundary]. Equation (17.50) demonstrates the self-adjointness (Hermiticity) of F̂. We shall use this below.

Returning to our perturbed MHD system, we seek its normal modes by assuming a harmonic time dependence, ξ ∝ e^{−iωt}. The first-order equation of motion then becomes

$$\hat{\mathbf F}[\boldsymbol{\xi}] + \rho\,\omega^2\,\boldsymbol{\xi} = 0. \tag{17.51}$$

This is an eigenequation for the fluid’s Lagrangian displacement ξ, with eigenvalue ω². It must be augmented by boundary conditions at the edge of the fluid; see below. By virtue of the elegant, self-adjoint mathematical form (17.49) of the differential operator F̂, our eigenequation (17.51) is of a very special and powerful type, called Sturm-Liouville; see, e.g., Mathews and Walker (1970). From the general (rather simple) theory of Sturm-Liouville equations, we can infer that all the eigenvalues ω² are real, so the normal modes are either purely oscillatory (ω² > 0, ξ ∝ e^{±i|ω|t}) or purely exponentially growing or decaying (ω² < 0, ξ ∝ e^{±|ω|t}). Exponentially growing modes represent instability. Sturm-Liouville theory also implies that all eigenfunctions [labeled by indices “(n)”] with different eigenfrequencies are orthogonal to each other, in the sense that ∫_V ρ ξ^(n) · ξ^(m) dV = 0.

The boundary conditions, as always, are crucial. In the simplest case, the conducting fluid is supposed to extend far beyond the region where the disturbances are of appreciable amplitude. In this case we merely require that |ξ| → 0 as |x| → ∞. More realistically, the fluid might be enclosed within rigid walls, where the normal component of ξ vanishes. The most commonly encountered case, however, involves a conducting fluid surrounded by vacuum. No current will flow in the vacuum region, and so ∇ × δB = 0 there. In this case, a suitable magnetic field perturbation in the vacuum region must be matched to the magnetic field derived from Eq. (17.41) for the perfect MHD region, using the junction conditions discussed in Sec. 17.2.3.
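The Sturm-Liouville properties invoked above (a self-adjoint operator, real ω², ρ-weighted orthogonality of the modes) can be seen in a one-dimensional toy problem. The sketch below is not MHD: it assumes the hypothetical operator F[ξ] = d/dx(T dξ/dx) on 0 < x < 1 with ξ = 0 at both ends (a rigid-wall condition), where T(x) > 0 and ρ(x) are made-up profiles, and discretizes it by centered finite differences. Because the discrete operator is a symmetric matrix, `numpy.linalg.eigh` returns real eigenvalues and orthonormal eigenvectors, which after rescaling by ρ^(−1/2) become ρ-orthogonal modes.

```python
import numpy as np

def stiffness_matrix(T_half: np.ndarray, h: float) -> np.ndarray:
    """Matrix K with (K xi)_i ~ d/dx(T dxi/dx) at interior node i, for xi = 0 at both ends.

    T_half holds T at the n + 1 cell faces; K acts on the n interior nodal values.
    """
    n = len(T_half) - 1
    K = np.zeros((n, n))
    for i in range(n):
        K[i, i] = -(T_half[i] + T_half[i + 1]) / h**2
        if i + 1 < n:
            K[i, i + 1] = K[i + 1, i] = T_half[i + 1] / h**2
    return K

def normal_modes(K: np.ndarray, rho: np.ndarray):
    """Solve K xi + rho omega^2 xi = 0, the discrete analogue of Eq. (17.51)."""
    r = 1.0 / np.sqrt(rho)
    S = -(r[:, None] * K * r[None, :])   # symmetric, since K is symmetric
    omega2, y = np.linalg.eigh(S)        # real eigenvalues, orthonormal columns y
    return omega2, r[:, None] * y        # back to the displacement: xi = rho^(-1/2) y

# Hypothetical 1-D configuration on 0 < x < 1 (not an MHD equilibrium).
n = 200
h = 1.0 / (n + 1)
x_face = (np.arange(n + 1) + 0.5) * h
x_node = (np.arange(n) + 1.0) * h
T_half = 1.0 + 0.5 * np.sin(np.pi * x_face)   # assumed positive "stiffness"
rho = 1.0 + x_node                             # assumed density profile

K = stiffness_matrix(T_half, h)
omega2, xi_modes = normal_modes(K, rho)

rng = np.random.default_rng(0)
zeta_v, xi_v = rng.standard_normal(n), rng.standard_normal(n)
print("self-adjoint:", np.isclose(zeta_v @ K @ xi_v, xi_v @ K @ zeta_v))
print("rho-orthonormal modes:", np.allclose(xi_modes.T @ (rho[:, None] * xi_modes), np.eye(n)))
print("lowest omega^2:", omega2[:3])
```

With T > 0 everywhere every ω² comes out positive (a stable toy configuration); making T negative somewhere is one way to watch a negative ω², that is, an instability, appear.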


17.5.2 Z-Pinch; Sausage and Kink Instabilities

We illustrate MHD stability theory using a simple, analytically tractable example. We consider a long cylindrical column of a conducting, incompressible liquid such as mercury, with column radius R and fluid density ρ. The column carries a current I longitudinally along its surface, so j = (I/2πR) δ(ϖ − R) ez, and it is confined by the resulting external toroidal magnetic field Bφ ≡ B. The interior of the column is field free and at constant pressure P0. From ∇ × B = µ0 j, we deduce that the exterior magnetic field is

$$B_\phi \equiv B = \frac{\mu_0 I}{2\pi\varpi} \quad\text{at}\quad \varpi \ge R. \tag{17.52}$$

Here (ϖ, φ, z) are the usual cylindrical coordinates. This hydromagnetic equilibrium configuration is called the Z-pinch because the z-directed current on the column’s surface creates the external toroidal field B, which pinches the column until its internal pressure is balanced by the field’s tension,

$$P_0 = \left.\frac{B^2}{2\mu_0}\right|_{\varpi=R}; \tag{17.53}$$

see Sec. 17.3.2 and Fig. 17.6a.

It is quicker and more illuminating to analyze the stability of this Z-pinch equilibrium directly instead of by evaluating F̂, and the outcome is the same. Treating only the most elementary case, we consider small, axisymmetric perturbations with an assumed variation ξ ∝ e^{i(kz−ωt)} f(ϖ) for some function f. As the magnetic field interior to the column vanishes, the equation of motion ρ dv/dt = −∇(P + δP) becomes

$$-\omega^2\rho\,\xi_\varpi = -\delta P', \qquad -\omega^2\rho\,\xi_z = -ik\,\delta P, \tag{17.54}$$

where the prime denotes differentiation with respect to radius ϖ. Combining these two equations, we obtain

$$\xi_z' = ik\,\xi_\varpi. \tag{17.55}$$

Because the fluid is incompressible, it satisfies ∇ · ξ = 0; i.e.,

$$\varpi^{-1}(\varpi\,\xi_\varpi)' + ik\,\xi_z = 0, \tag{17.56}$$

which, with Eq. (17.55), leads to

$$\xi_z'' + \frac{\xi_z'}{\varpi} - k^2\xi_z = 0. \tag{17.57}$$

The solution of this equation that is regular at ϖ = 0 is

$$\xi_z = A\,I_0(k\varpi) \quad\text{at}\quad \varpi \le R, \tag{17.58}$$

where A is a constant and I_n(x) is the modified Bessel function I_n(x) = i^{−n} J_n(ix). From Eq. (17.55) and dI_0(x)/dx = I_1(x), we obtain

$$\xi_\varpi = -iA\,I_1(k\varpi). \tag{17.59}$$

Next, we consider the region exterior to the fluid column. As this region is vacuum, it must be current-free; and as we are dealing with a purely axisymmetric perturbation, the ϖ component of ∇ × δB = µ0 δj reads

$$-\frac{\partial\,\delta B_\phi}{\partial z} = -ik\,\delta B_\phi = \mu_0\,\delta j_\varpi = 0. \tag{17.60}$$

The φ component of the magnetic perturbation therefore vanishes outside the column. The interior and exterior solutions must be connected by the law of force balance, i.e., by the boundary condition (17.23) at the fluid surface. Allowing for the displacement of the surface and retaining only linear terms, this becomes

$$P_0 + \Delta P = P_0 + (\boldsymbol{\xi}\cdot\nabla)P_0 + \delta P = \frac{(B + \Delta B_\phi)^2}{2\mu_0} = \frac{B^2}{2\mu_0} + \frac{B}{\mu_0}(\boldsymbol{\xi}\cdot\nabla)B + \frac{B\,\delta B_\phi}{\mu_0}, \tag{17.61}$$

where all quantities are evaluated at ϖ = R. Now, the equilibrium force-balance condition gives us P0 = B²/2µ0 [Eq. (17.53)] and ∇P0 = 0. In addition, we have shown that δBφ = 0. Therefore Eq. (17.61) becomes simply

$$\delta P = \frac{BB'}{\mu_0}\,\xi_\varpi. \tag{17.62}$$

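Substituting δP from Eqs. (17.54) and (17.58), B from Eq. (17.52), and ξϖ from Eq. (17.59), we obtain the dispersion relation

$$\omega^2 = -\frac{\mu_0 I^2}{4\pi^2 R^4\rho}\,\frac{kR\,I_1(kR)}{I_0(kR)} \;\simeq\; \begin{cases} -\dfrac{\mu_0 I^2}{8\pi^2 R^2\rho}\,k^2, & k \ll R^{-1},\\[2ex] -\dfrac{\mu_0 I^2}{4\pi^2 R^3\rho}\,k, & k \gg R^{-1},\end{cases} \tag{17.63}$$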

where we have used I0(x) ∼ 1, I1(x) ∼ x/2 as x → 0 and I1(x)/I0(x) → 1 as x → ∞. Because I0 and I1 are positive for all kR > 0, this dispersion relation says that ω² is negative for every wave number k. Therefore, ω is imaginary, the perturbation grows exponentially with time, and the Z-pinch configuration is dynamically unstable. If we define a characteristic Alfvén speed by a = B(R)/(µ0ρ)^{1/2} [Eq. (17.74) below], then Eq. (17.63) reads ω² = −(a/R)² kR I1(kR)/I0(kR), so the e-folding time for modes with wavelength comparable to the column diameter is of order (in fact somewhat less than) one Alfvén crossing time, 2R/a. This is fast!

This is sometimes called a sausage instability, because its eigenfunction ξϖ ∝ e^{ikz} consists of oscillatory pinches of the column’s radius that resemble the pinches between sausages in a link. The sausage instability has a simple physical interpretation (Fig. 17.10a), one that illustrates the power of the concepts of flux freezing and magnetic tension for developing intuition. If we imagine an inward radial motion of the fluid, then the toroidal loops of magnetic field will be carried inward too and will therefore shrink. As the fluid is incompressible, the strength of the field will increase, leading to a larger “hoop” stress or, equivalently, a larger j × B Lorentz force. This cannot be resisted by any increase in pressure, and so the perturbation will continue to grow.

Fig. 17.10: Physical interpretation of (a) sausage and (b) kink instabilities.

So far, we have only considered axisymmetric perturbations. We can generalize our analysis by allowing the perturbations to vary as ξ ∝ exp(imφ). (Our sausage instability corresponds to m = 0.) Modes with m ≥ 1, like m = 0, are also generically unstable. For example, m = 1 modes are known as kink modes. In this case, there is a bending of the column so that the field strength will be intensified along the inner face of the bend and reduced along the outer face, thereby amplifying the instability (Fig. 17.10b). In addition, the incorporation of compressibility, as is appropriate for plasma instead of mercury, introduces only algebraic complexity; the conclusions are unchanged. The column is still highly unstable. We can also add magnetic field to the column’s interior.

These MHD instabilities have bedeviled attempts to confine plasma for long enough to bring about nuclear fusion. Indeed, considerations of MHD stability were one of the primary motivations for the Tokamak, the most consistently successful of trial fusion devices. The Θ-pinch (Sec. 17.3.3 and Fig. 17.6b) turns out to be quite MHD stable but, naturally, it cannot confine plasma without closing its ends. This can be done through the formation of a pair of magnetic mirrors or by bending the column into a closed torus. However, magnetic mirror machines have problems with losses, and toroidal Θ-pinches exhibit new MHD instabilities involving the interchange of bundles of curving magnetic field lines. The best compromise appears to be a Tokamak, with its combination of toroidal and poloidal magnetic field. The component of magnetic field along the plasma torus acts to stabilize, through its pressure, against sausage-type instabilities and, through its tension, against kink-type instabilities. In addition, formation of image currents in the conducting walls of a Tokamak vessel can also have a stabilizing influence.
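Since a² = B(R)²/(µ0ρ) = µ0I²/(4π²R²ρ), the dispersion relation (17.63) can be written as ω² = −(a/R)² kR I1(kR)/I0(kR), which makes the sausage-mode growth rate easy to tabulate in units of a/R. The sketch below does this; to stay within the Python standard library it evaluates I0 and I1 from their power series (`scipy.special.i0` and `scipy.special.i1` would serve equally well). The sample values of kR are arbitrary, and the function names are illustrative.

```python
import math

def bessel_i(n: int, x: float, terms: int = 80) -> float:
    """Modified Bessel function I_n(x) from its power series sum_m (x/2)^(2m+n) / (m! (m+n)!)."""
    half = 0.5 * x
    term = half**n / math.factorial(n)   # m = 0 term
    total = 0.0
    for m in range(terms):
        total += term
        term *= half * half / ((m + 1) * (m + 1 + n))
    return total

def minus_omega2(kR: float) -> float:
    """-omega^2 from Eq. (17.63), in units of a^2 / R^2 with a = B(R) / sqrt(mu0 rho)."""
    return kR * bessel_i(1, kR) / bessel_i(0, kR)

for kR in (0.1, 1.0, math.pi, 10.0):
    rate = math.sqrt(minus_omega2(kR))   # growth rate in units of a / R
    efold = 1.0 / rate                   # e-folding time in units of R / a
    print(f"kR = {kR:6.3f}: -omega^2 R^2/a^2 = {minus_omega2(kR):8.4f}, "
          f"e-folding time = {efold / 2.0:6.3f} Alfven crossing times (2R/a)")
```

The printed e-folding times, expressed in Alfvén crossing times 2R/a, make the statement about the speed of the sausage instability quantitative. The series as written is adequate for kR up to a few tens; well beyond that, the large-k asymptotic form in (17.63) is the better tool.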

17.5.3 Energy Principle

Analytical, or indeed numerical, solutions to the perturbation equations are only readily obtained in the simplest geometries and for the simplest fluids. However, as the equation of motion is expressible in self-adjoint form, it is possible to write down a variational principle and use it to derive approximate stability criteria. To do this, multiply the equation of motion ρ∂²ξ/∂t² = F̂[ξ] by ξ̇, integrate over the whole volume V, and use Gauss’s theorem to integrate by parts. The result is

$$\frac{dE}{dt} = 0, \quad\text{where}\quad E = T + W, \tag{17.64}$$

$$T = \frac{1}{2}\int_V dV\,\rho\,\dot{\boldsymbol{\xi}}^2, \qquad W = -\frac{1}{2}\int_V dV\,\boldsymbol{\xi}\cdot\hat{\mathbf F}[\boldsymbol{\xi}]. \tag{17.65}$$

The integrals T and W are the perturbation’s kinetic and potential energy, and E = T + W is the conserved total energy.

Any solution of the equation of motion ρ∂²ξ/∂t² = F̂[ξ] can be expanded in terms of a complete set of normal modes ξ^(n)(x) with eigenfrequencies ωn, ξ = Σn An ξ^(n) e^{−iωn t}. As F̂ is a real, self-adjoint operator, these normal modes can all be chosen to be real and orthogonal, even when some of their frequencies are degenerate. As the perturbation evolves, its energy sloshes back and forth between kinetic T and potential W, so time averages of T and W are equal, T̄ = W̄. This implies, for each normal mode, that

$$\omega_n^2 = \frac{W[\boldsymbol{\xi}^{(n)}]}{\int_V dV\,\frac{1}{2}\rho\,\boldsymbol{\xi}^{(n)2}}. \tag{17.66}$$

As the denominator is positive definite, we conclude that a hydromagnetic equilibrium is stable against small perturbations if and only if the potential energy W[ξ] is a positive definite functional of the perturbation ξ. This is sometimes called the Rayleigh Principle in dynamics; in the MHD context, it is known as the Energy Principle.

It is straightforward to verify, by virtue of the self-adjointness of F̂[ξ], that expression (17.66) serves as an action principle for the eigenfrequencies: If one inserts into (17.66) a trial function ξtrial in place of ξ^(n), then the resulting value of (17.66) will be stationary under small variations of ξtrial if and only if ξtrial is equal to some eigenfunction ξ^(n); and the stationary value of (17.66) is that eigenfunction’s squared eigenfrequency ωn². This action principle is most useful for estimating the lowest few squared frequencies ωn². Relatively crude trial eigenfunctions can furnish surprisingly accurate eigenvalues. Whatever trial function ξtrial we choose, the computed value of the action (17.66) will always be greater than or equal to ω0², the squared eigenfrequency of the most unstable (or, if all modes are stable, the lowest-frequency) mode. Therefore, if we compute a negative value of (17.66) using some trial eigenfunction, we know that the equilibrium is unstable, and that its most unstable mode grows at least as fast as this negative value implies. This energy principle and action principle are special cases of the general conservation law and action principle for Sturm-Liouville differential equations; see, e.g., Mathews and Walker (1970).

****************************
EXERCISES

Exercise 17.7 Example: Reformulation of the Energy Principle
The form (17.48) of the potential energy functional derived in the text is necessary to demonstrate that the operator F̂ is self-adjoint. However, there are several simpler, equivalent forms which are more convenient for practical use.

Fig. 17.11: Impossibility of an axisymmetric dynamo.

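(a) Use Eq. (17.47) to show that

$$\boldsymbol{\xi}\cdot\hat{\mathbf F}[\boldsymbol{\xi}] = \mathbf{j}\cdot(\mathbf{b}\times\boldsymbol{\xi}) - \frac{b^2}{\mu_0} - \gamma P(\nabla\cdot\boldsymbol{\xi})^2 - (\nabla\cdot\boldsymbol{\xi})(\boldsymbol{\xi}\cdot\nabla)P + \nabla\cdot\left[\frac{(\boldsymbol{\xi}\times\mathbf{B})\times\mathbf{b}}{\mu_0} + \gamma P\,\boldsymbol{\xi}(\nabla\cdot\boldsymbol{\xi}) + \boldsymbol{\xi}(\boldsymbol{\xi}\cdot\nabla)P\right], \tag{17.67}$$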

where b ≡ δB is the Eulerian perturbation of the magnetic field.
(b) Transform the potential energy W[ξ] into a sum over volume and surface integrals.
(c) Consider axisymmetric perturbations of the cylindrical Z-pinch of an incompressible fluid, as discussed in Sec. 17.5.2, and argue that the surface integral vanishes.
(d) Hence adopt a simple trial eigenfunction and obtain a variational estimate of the growth rate of the fastest growing mode.
****************************
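The action principle (17.66), and the bound on values computed from trial functions, can be illustrated on the simplest self-adjoint problem of this kind, a uniform string. This is an assumed toy model, not an MHD configuration: F[ξ] = T ξ″ on 0 < x < L with ξ = 0 at both ends, and T, ρ, and L all set to 1. Integrating W by parts gives W = ½∫T ξ′² dx, so the crude, hypothetical trial function ξ = x(1 − x) can be compared with the exact lowest mode sin(πx), whose ω² is π².

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def rayleigh_omega2(trial, dtrial, T=1.0, rho=1.0, L=1.0):
    """Value of the action (17.66) for a toy string-like medium with F[xi] = T xi''.

    With xi(0) = xi(L) = 0, integration by parts gives W = (1/2) * integral of T xi'^2,
    so omega^2 = W / ((1/2) * integral of rho xi^2).
    """
    W = 0.5 * simpson(lambda x: T * dtrial(x) ** 2, 0.0, L)
    norm = 0.5 * simpson(lambda x: rho * trial(x) ** 2, 0.0, L)
    return W / norm

exact = math.pi**2   # lowest eigenvalue for T = rho = L = 1, mode sin(pi x)
crude = rayleigh_omega2(lambda x: x * (1 - x), lambda x: 1 - 2 * x)
print(f"trial x(1-x): {crude:.6f}  exact: {exact:.6f}  relative error: {crude / exact - 1:.3%}")
```

The trial value, 10 in these units, lies about 1.3% above π², in line with the statements above that crude trial functions give accurate eigenvalues and that the computed value never falls below the lowest ω².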

17.6 Dynamos and Reconnection of Magnetic Field Lines

As we have already remarked, the time scale for the earth’s magnetic field to decay is estimated to be roughly a million years. Since the field has persisted for far longer than this, some process within the earth must be regenerating it. This process is known as a dynamo process. In general, what happens in a dynamo process is that motion of the fluid stretches the magnetic field lines, increasing the magnetic energy density and thereby compensating for the decrease in magnetic energy associated with Ohmic decay. In fact, the details of how this happens inside the earth are not well understood. However, some general principles of dynamo action have been formulated.

17.6.1 Cowling’s theorem

It is simple to demonstrate that a stationary magnetic field maintained by dynamo action in a fluid with finite electric conductivity κe cannot be axisymmetric. Suppose that there were such a dynamo and the poloidal (meridional) field had the form sketched in Fig. 17.11. Then there must be at least one neutral point marked P (actually a circle about the symmetry axis), where the poloidal field vanishes. However, the curl of the magnetic field does not vanish at P, so there must be a toroidal current jφ there. Now, in the presence of finite resistivity, there must also be a toroidal electric field at P, since

$$j_\phi = \kappa_e\left[E_\phi + (\mathbf{v}_P\times\mathbf{B}_P)_\phi\right] = \kappa_e E_\phi, \tag{17.68}$$

where the second equality holds because only the poloidal components vP and BP contribute to the toroidal part of v × B, and BP vanishes at P.
The nonzero Eφ in turn implies, via ∇×E = −∂B/∂t, that the amount of poloidal magnetic flux threading the circle at P must change with time, violating our original supposition that the magnetic field distribution is stationary. We therefore conclude that any self-perpetuating dynamo must be more complicated than a purely axisymmetric magnetic field. This is known as Cowling’s theorem.

17.6.2 Kinematic dynamos

The simplest types of dynamo to consider are those in which we specify a particular velocity field and allow the magnetic field to evolve according to the transport law (17.6). Under certain circumstances, this can produce dynamo action. Note that, in this discussion, we do not consider the dynamical effect of the magnetic field on the velocity field. The simplest type of motion is one in which a dynamo cycle occurs. In this cycle, there is one mechanism for creating toroidal magnetic field from poloidal field and a separate mechanism for regenerating the poloidal field. The first mechanism is usually differential rotation. The second is plausibly magnetic buoyancy, in which a toroidal magnetized loop is lighter than its surroundings and therefore rises in the gravitational field. As the loop rises, Coriolis forces twist the flow, causing poloidal magnetic field to appear. This completes the dynamo cycle. Small-scale, turbulent velocity fields may also be responsible for dynamo action. In this case, it can be shown on general symmetry grounds that the velocity field must contain helicity, a nonzero mean value of v · ω, where ω = ∇ × v is the vorticity. If the magnetic field strength grows, then its dynamical effect will eventually react back on the flow and modify the velocity field. A full description of a dynamo must include this back reaction. Dynamos are a prime target for numerical simulations of MHD, and significant progress has been made in recent years in understanding specialized problems, like the terrestrial dynamo.

17.6.3 Magnetic Reconnection

Our discussion so far of the evolution of the magnetic field has centered on the induction equation (magnetic transport law), Eq. (17.6); we have characterized our magnetized fluid by a magnetic Reynolds number RM, using some characteristic length L associated with the flow, and have found that Ohmic dissipation is unimportant when RM ≫ 1. This is reminiscent of the procedure we followed when discussing vorticity. However, for vorticity we discovered a very important exception to an uncritical neglect of viscosity and dissipation at large Reynolds number, namely boundary layers. In particular, we found that flow near solid surfaces will develop very large velocity gradients on account of the no-slip boundary condition, and that the local Reynolds number can thereby decrease to near unity, allowing viscous stress to change the character of the flow completely.

Something very similar, called magnetic reconnection, can happen in hydromagnetic flows with large RM, even without the presence of solid surfaces. Consider two oppositely magnetized regions of conducting fluid moving toward each other (the upper and lower regions in Fig. 17.12). There will be a mutual magnetic attraction of the two regions, as magnetic energy would be reduced if the two sets of field lines were superposed. However, strict flux freezing prevents superposition. Something has to give. What happens is a compromise. The attraction causes large magnetic gradients to develop, accompanied by a buildup of large current densities, until Ohmic diffusion ultimately allows the magnetic field lines to slip sideways through the fluid and to reconnect with field in the other region (the sharply curved field lines in Fig. 17.12).

Fig. 17.12: Illustration of magnetic reconnection. A continuous flow can develop through the shaded reconnection region where Ohmic diffusion is important. Magnetic field lines “exchange partners,” changing the overall field topology. Magnetic field components perpendicular to the plane of the illustration do not develop large gradients and so do not inhibit the reconnection process.

This reconnection mechanism can be clearly observed at work within Tokamaks and at the earth’s magnetopause, where the solar wind’s magnetic field meets the earth’s magnetosphere. However, the details of the reconnection mechanism are quite complex, involving plasma instabilities and shock fronts. Large, inductive electric fields can also develop when the magnetic geometry undergoes rapid change. This can happen in the reversing magnetic field in the earth’s magnetotail, leading to the acceleration of charged particles which impact the earth during a magnetic substorm. Like dynamo action, reconnection has a major role in determining how magnetic fields actually behave in both laboratory and space plasmas.

****************************

EXERCISES

Exercise 17.8 Problem: Differential rotation in the solar dynamo
This problem shows how differential rotation leads to the production of toroidal magnetic field from poloidal field.
(a) Verify that for a fluid undergoing differential rotation around a symmetry axis with angular velocity Ω(r, θ), the φ component of the induction equation reads

$$\frac{\partial B_\phi}{\partial t} = \sin\theta\left(B_\theta\,\frac{\partial\Omega}{\partial\theta} + B_r\,r\,\frac{\partial\Omega}{\partial r}\right), \tag{17.69}$$

where θ is the colatitude. (The resistive term can be ignored.)
(b) It is observed that the angular velocity on the solar surface is largest at the equator and decreases monotonically towards the poles. There is evidence (though less direct) that ∂Ω/∂r < 0 in the outer parts of the sun where the dynamo operates. Suppose that the field of the sun is roughly poloidal. Sketch the appearance of the toroidal field generated by the poloidal field.

Exercise 17.9 Problem: Buoyancy in the solar dynamo
Consider a slender flux tube in hydrostatic equilibrium in a conducting fluid. Assume that the diameter of the flux tube is much less than its length, than its radius of curvature R, and than the external pressure scale height H; and assume that the magnetic field is directed along the tube, so there is negligible current flowing along the tube.
(a) Show that the requirement of static equilibrium implies that

$$\nabla\left(P + \frac{B^2}{2\mu_0}\right) = 0. \tag{17.70}$$

(b) Assume that the tube makes a complete circular loop of radius R in the equatorial plane of a spherical star. Also assume that the fluid is isothermal with temperature T, so that the pressure scale height is H = kB T/µg, where µ is the mean molecular mass (mean mass per particle) and g is the gravitational acceleration. Prove that magnetostatic equilibrium is possible only if R = 2H.
(c) In the solar convection zone, H ≪ R/2. What happens to the toroidal field produced by differential rotation? Suppose the toroidal field breaks through the solar surface. What direction must the field lines have to be consistent with the previous exercise?

****************************
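To see what Eq. (17.69) implies, the sketch below evaluates ∂Bφ/∂t for a poloidal dipole field threading a differentially rotating star. Everything here is an assumption made for illustration: the rotation law Ω = Ω0(1 − α cos²θ) with Ω0 = 1 and α = 0.2 (fastest at the equator and with no radial shear, so only the Bθ ∂Ω/∂θ term contributes), the dipole’s orientation, and the suppressed units. It is not a model of the sun, and it leaves the radial-shear part of Exercise 17.8(b) untouched.

```python
import math

ALPHA = 0.2  # hypothetical equator-to-pole fractional decrease of Omega

def omega_rot(r: float, theta: float) -> float:
    """Hypothetical rotation law Omega = 1 - ALPHA cos^2(theta): fastest at the equator, no radial shear."""
    return 1.0 - ALPHA * math.cos(theta) ** 2

def dipole(r: float, theta: float):
    """Poloidal dipole field (B_r, B_theta) with its moment along +z (units suppressed)."""
    return 2.0 * math.cos(theta) / r**3, math.sin(theta) / r**3

def dBphi_dt(r: float, theta: float, omega=omega_rot, field=dipole, h: float = 1e-6) -> float:
    """Right-hand side of Eq. (17.69); derivatives of Omega by centered differences."""
    Br, Btheta = field(r, theta)
    dO_dtheta = (omega(r, theta + h) - omega(r, theta - h)) / (2.0 * h)
    dO_dr = (omega(r + h, theta) - omega(r - h, theta)) / (2.0 * h)
    return math.sin(theta) * (Btheta * dO_dtheta + Br * r * dO_dr)

for colat_deg in (30, 60, 90, 120, 150):
    theta = math.radians(colat_deg)
    print(f"colatitude {colat_deg:3d} deg: dB_phi/dt = {dBphi_dt(1.0, theta):+.4f}")
```

In this sketch the toroidal field grows with opposite signs in the two hemispheres and not at all at the equator; passing in an Ω(r, θ) with ∂Ω/∂r < 0 lets one explore the radial-shear term as well.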


17.7 Magnetosonic Waves and the Scattering of Cosmic Rays

We have discussed global wave modes in a non-uniform magnetostatic plasma and described how they may be unstable. We now consider a particularly simple example: planar, monochromatic, propagating wave modes in a uniform, magnetized, conducting medium. These waves are called magnetosonic modes. They can be thought of as sound waves that are driven not just by gas pressure but also by magnetic pressure and tension. Although magnetosonic waves have been studied under laboratory conditions, there the magnetic Reynolds numbers are generally quite small and they damp quickly. No such problem arises in space plasmas, where magnetosonic modes are routinely studied by the many spacecraft that monitor the solar wind and its interaction with planetary magnetospheres. It appears that these modes perform an important function in space plasmas; they control the transport of cosmic rays. Let us describe some of the properties of cosmic rays before giving a formal derivation of the magnetosonic-wave dispersion relation.

17.7.1 Cosmic Rays

Cosmic rays are the high-energy particles, primarily protons, that bombard the earth’s magnetosphere from outer space. They range in energy from ∼1 MeV to ∼3 × 10¹¹ GeV = 0.3 ZeV. (The highest cosmic-ray energy measured is 50 J. Thus, naturally occurring particle accelerators are far more impressive than their terrestrial counterparts, which can only reach ∼10 TeV = 10⁴ GeV!) Most sub-relativistic particles originate within the solar system; their relativistic counterparts, up to energies ∼100 TeV, are believed to come mostly from interstellar space, where they are accelerated by the expanding shock waves formed by supernova explosions (cf. Sec. 16.6.2). The origin of the highest-energy particles, above ∼100 TeV, is an intriguing mystery. The distribution of cosmic-ray arrival directions at earth is inferred to be quite isotropic (to better than one part in 10⁴ at an energy of 10 GeV). This is somewhat surprising because their sources, both within and beyond the solar system, are believed to be distributed anisotropically, so the isotropy needs to be explained. Part of the reason for the isotropization is that the interplanetary and interstellar media are magnetized and the particles gyrate around the magnetic field with the gyro frequency ωG = eBc²/ε, where ε is the particle energy and B is the magnetic field strength. The Larmor radii of the non-relativistic particles are typically small compared with the size of the solar system, and those of the relativistic particles are typically small compared with the typical scales in the interstellar medium. Therefore, this gyrational motion can effectively erase any azimuthal asymmetry with respect to the field direction.
However, this does not stop the particles from streaming away from their sources along the direction of the magnetic field, thereby producing anisotropy at earth; so something else must be impeding this flow and scattering the particles, causing them to effectively diffuse along and across the field through interplanetary and interstellar space. As we shall verify in Chap. 18 below, Coulomb collisions are quite ineffective, and if they were effective, they would cause huge energy losses in violation of observations. We therefore seek some means of changing a cosmic ray’s momentum without altering its energy significantly. This is reminiscent of the scattering of electrons in metals, where it is phonons (elastic waves in the crystal lattice) that are responsible for much of the scattering. It turns out that in the interstellar medium magnetosonic waves can play a role analogous to phonons and scatter the cosmic rays. As an aid to understanding this, we now derive the waves’ dispersion relation.
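The energies and gyration scales quoted in this subsection are easy to check numerically. The sketch below converts 0.3 ZeV to joules and evaluates ωG = eBc²/ε and the Larmor radius c/ωG for an ultra-relativistic proton moving perpendicular to B. The field strength, 3 × 10⁻¹⁰ T (about 3 µG), and the 10 GeV energy are assumed, representative values chosen for illustration, not numbers taken from the text.

```python
E_CHARGE = 1.602176634e-19      # elementary charge, C (exact SI value)
C_LIGHT = 2.99792458e8          # speed of light, m/s (exact SI value)
AU = 1.495978707e11             # astronomical unit, m
PARSEC = 3.0856775814913673e16  # parsec, m

def ev_to_joule(energy_ev: float) -> float:
    return energy_ev * E_CHARGE

def gyro_frequency(energy_ev: float, B_tesla: float) -> float:
    """omega_G = e B c^2 / epsilon (rad/s) for a singly charged particle of total energy epsilon."""
    return E_CHARGE * B_tesla * C_LIGHT**2 / ev_to_joule(energy_ev)

def larmor_radius(energy_ev: float, B_tesla: float, speed: float = C_LIGHT) -> float:
    """Gyration radius v_perp / omega_G; v_perp -> c for an ultra-relativistic particle at 90 deg pitch angle."""
    return speed / gyro_frequency(energy_ev, B_tesla)

print(f"0.3 ZeV = {ev_to_joule(3e20):.1f} J")
B_ism = 3e-10                       # assumed interstellar field (~3 microgauss), in tesla
r_L = larmor_radius(1e10, B_ism)    # assumed 10 GeV proton
print(f"10 GeV proton: omega_G = {gyro_frequency(1e10, B_ism):.3e} rad/s, "
      f"r_L = {r_L:.3e} m = {r_L / AU:.2f} AU = {r_L / PARSEC:.1e} pc")
```

With these assumed values the Larmor radius comes out near 10¹¹ m, just under an astronomical unit and many orders of magnitude below a parsec, consistent with the statement that relativistic particles’ Larmor radii are small compared with interstellar scales.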

17.7.2 Magnetosonic Dispersion Relation

Our procedure by now should be familiar. We consider a uniform, isentropic, magnetized fluid at rest, perform a linear perturbation, and seek monochromatic, plane-wave solutions varying ∝ e^{i(k·x−ωt)}. We ignore gravity and dissipative processes (specifically viscosity, thermal conductivity, and electrical resistivity), as well as gradients in the equilibrium, all of which can be important in one circumstance or another. It is convenient to use the velocity perturbation as the independent variable. The perturbed and linearized equation of motion (17.10) then takes the form

$$-i\rho\omega\,\delta\mathbf{v} = -ic_s^2\,\mathbf{k}\,\delta\rho + \delta\mathbf{j}\times\mathbf{B}, \tag{17.71}$$

where δv is the velocity perturbation, cs is the sound speed [cs² = (∂P/∂ρ)s = γP/ρ], and δP = cs² δρ is the Eulerian pressure perturbation for our homogeneous equilibrium [note that ∇P = ∇ρ = 0, so Eulerian and Lagrangian perturbations are the same]. We use the notation cs to avoid confusion with the speed of light. The perturbed equation of mass conservation, ∂ρ/∂t + ∇ · (ρv) = 0, becomes

$$\omega\,\delta\rho = \rho\,\mathbf{k}\cdot\delta\mathbf{v}, \tag{17.72}$$

and Faraday’s law ∂B/∂t = −∇ × E and the MHD law of magnetic-field transport with dissipation ignored, ∂B/∂t = ∇ × (v × B), become

$$\omega\,\delta\mathbf{B} = \mathbf{k}\times\mathbf{E} = -\mathbf{k}\times(\delta\mathbf{v}\times\mathbf{B}). \tag{17.73}$$

We introduce the Alfvén velocity

$$\mathbf{a} \equiv \frac{\mathbf{B}}{(\mu_0\rho)^{1/2}} \tag{17.74}$$

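and insert δρ [Eq. (17.72)] and δB [Eq. (17.73)] into Eq. (17.71), using Ampère’s law in the form δj = ik × δB/µ0, to obtain

$$[\mathbf{k}\times\{\mathbf{k}\times(\delta\mathbf{v}\times\mathbf{a})\}]\times\mathbf{a} + c_s^2(\mathbf{k}\cdot\delta\mathbf{v})\,\mathbf{k} = \omega^2\,\delta\mathbf{v}. \tag{17.75}$$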

This is an eigenequation for the wave’s squared frequency ω² and eigendirection δv. The straightforward way to solve it is to rewrite it in the standard matrix form Mij δvj = ω² δvi and then use standard matrix (determinant) methods. It is quicker, however, to seek the three eigendirections δv and eigenfrequencies ω one by one, by projection along preferred directions. We first seek a solution to Eq. (17.75) for which δv is orthogonal to the plane formed by the unperturbed magnetic field and the wave vector, δv = a × k (up to a multiplicative constant). Inserting this δv into Eq. (17.75), we obtain the dispersion relation

$$\omega = \pm\,\mathbf{a}\cdot\mathbf{k}; \qquad \frac{\omega}{k} = \pm\,a\cos\theta, \tag{17.76}$$

where θ is the angle between k and the unperturbed field.

Fig. 17.13: Phase-velocity surfaces for the three types of magnetosonic modes, fast (f), intermediate (i), and slow (s). The three curves are polar plots of the wave phase velocity ω/k in units of the Alfvén speed a = B/√(µ0ρ). In the particular example shown, the sound speed cs is half the Alfvén speed.

This type of wave is known as the intermediate mode and also as the Alfvén mode. Its phase speed ω/k = a cos θ is plotted as the larger figure-8 curve in Fig. 17.13. The velocity and magnetic perturbations δv and δB are both along the direction a × k, so the wave is fully transverse; and there is no compression (δρ = 0), which accounts for the absence of the sound speed cs in the dispersion relation. This Alfvén mode has a simple physical interpretation in the limiting case when k is parallel to B. We can think of the magnetic field lines as strings with tension B²/µ0 and inertia ρ, which are plucked transversely. Their transverse oscillations then propagate with speed √(tension/inertia) = B/√(µ0ρ) = a.

The dispersion relations for the other two modes can be deduced by projecting the eigenequation (17.75) successively along k and along a to obtain the two scalar equations

$$(\mathbf{k}\cdot\mathbf{a})(\mathbf{a}\cdot\delta\mathbf{v})\,k^2 = \{(a^2 + c_s^2)k^2 - \omega^2\}(\mathbf{k}\cdot\delta\mathbf{v}), \qquad (\mathbf{k}\cdot\mathbf{a})(\mathbf{k}\cdot\delta\mathbf{v})\,c_s^2 = \omega^2(\mathbf{a}\cdot\delta\mathbf{v}). \tag{17.77}$$

Combining these equations, we obtain the dispersion relation for the remaining two magnetosonic modes

$$\frac{\omega}{k} = \pm\left\{\frac{1}{2}\left(a^2+c_s^2\right)\left[1 \pm \left(1-\frac{4c_s^2a^2\cos^2\theta}{(a^2+c_s^2)^2}\right)^{1/2}\right]\right\}^{1/2}. \qquad (17.78)$$

(By inserting this dispersion relation, with the upper or lower sign, back into Eqs. (17.77), we can deduce the mode’s eigendirection δv.) This dispersion relation tells us that ω² is positive, so there are no unstable modes, which seems reasonable as there is no source of free energy. (The same is true, of course, for the Alfvén mode.) These waves are compressive, with the gas being moved by a combination of gas pressure and magnetic pressure and tension. The modes are non-dispersive, which is also to be expected, as we have introduced neither a characteristic timescale nor a characteristic length into the problem.

The mode with the plus sign inside the square brackets of Eq. (17.78) is called the fast magnetosonic mode; its phase speed is depicted by the outer, quasi-circular curve in Fig. 17.13. A good approximation to its phase speed when a ≫ cs or a ≪ cs is ω/k ≃ ±(a² + cs²)^{1/2}. When propagating perpendicular to B, the fast mode can be regarded as simply a longitudinal sound wave in which the gas pressure is augmented by the magnetic pressure B²/2µ0 (adopting a specific heat ratio γ = 2 for the magnetic field, since B ∝ ρ and so P_mag ∝ ρ² under perpendicular compression). The mode with the minus sign inside the square brackets is called the slow magnetosonic mode. Its phase speed (depicted by the inner figure-8 curve in Fig. 17.13) can be approximated by ω/k = ±a cs cos θ/(a² + cs²)^{1/2} when a ≫ cs or a ≪ cs. Note that slow modes, like the intermediate modes but unlike the fast modes, are incapable of propagating perpendicular to the unperturbed magnetic field; see Fig. 17.13. In the limit of vanishing Alfvén speed or sound speed, the slow modes cease to exist for all directions of propagation.

In Part V, we will discover that MHD is a good approximation to the behavior of plasmas only at frequencies below the “ion gyro frequency”, which is a rather low frequency. For this reason, magnetosonic modes are usually regarded as low-frequency modes.
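As a numerical cross-check of Eqs. (17.76)–(17.78), the sketch below builds the 3×3 matrix form M_ij δv_j = ω² δv_i mentioned above and compares its eigenvalues with the closed-form phase speeds. Because Eq. (17.75) is not reproduced in this section, the matrix assumes the eigenequation reads ω²δv = (a² + cs²)(k·δv)k + (k·a)[(k·a)δv − (a·δv)k − (k·δv)a]; this assumed form reproduces both projections (17.77) and the Alfvén-mode result for δv ∝ a × k. The function names are illustrative, speeds are in arbitrary consistent units, and |k| = 1.

```python
import numpy as np


def phase_speeds(a, cs, theta):
    """Fast, intermediate (Alfven) and slow phase speeds omega/k from Eqs. (17.76) and (17.78)."""
    s = a**2 + cs**2
    # Clip guards against tiny negative round-off, e.g. when a == cs and theta == 0
    disc = np.clip(1.0 - 4.0 * cs**2 * a**2 * np.cos(theta)**2 / s**2, 0.0, None)
    fast = np.sqrt(0.5 * s * (1.0 + np.sqrt(disc)))
    slow = np.sqrt(0.5 * s * (1.0 - np.sqrt(disc)))
    intermediate = a * abs(np.cos(theta))
    return fast, intermediate, slow


def matrix_speeds(a, cs, theta):
    """Phase speeds from the eigenvalues of M, where M @ dv = (omega/k)**2 * dv and |k| = 1."""
    k = np.array([np.sin(theta), 0.0, np.cos(theta)])  # unit wave vector in the x-z plane
    av = np.array([0.0, 0.0, a])                        # Alfven vector along the unperturbed B (z axis)
    ka = k @ av
    # Assumed form of Eq. (17.75), written as a symmetric 3x3 matrix
    M = ((a**2 + cs**2) * np.outer(k, k)
         + ka**2 * np.eye(3)
         - ka * (np.outer(k, av) + np.outer(av, k)))
    omega2 = np.linalg.eigvalsh(M)                      # symmetric: real eigenvalues, ascending
    return np.sqrt(np.clip(omega2, 0.0, None))


# The case plotted in Fig. 17.13: sound speed equal to half the Alfven speed
for theta in np.linspace(0.0, np.pi / 2, 4):
    print(f"theta={theta:.2f}", np.round(phase_speeds(1.0, 0.5, theta), 4),
          np.round(matrix_speeds(1.0, 0.5, theta), 4))
```

Since M is real and symmetric, `numpy.linalg.eigvalsh` returns real eigenvalues in ascending order, which here correspond to the slow, intermediate and fast modes.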

17.7.3

Scattering of Cosmic Rays

Now let us return to the issue of cosmic-ray propagation, which motivated our investigation of magnetosonic modes. Let us consider 100 GeV particles in the interstellar medium. The electron (and ion, mostly proton) density and magnetic field strength in the interstellar medium are typically n ∼ 10⁴ m⁻³, B ∼ 100 pT. The Alfvén speed is then a ∼ 30 km s⁻¹, much slower than the speeds of the cosmic rays. In analyzing the cosmic-ray propagation, a magnetosonic wave can therefore be treated as essentially a magnetostatic perturbation. A relativistic cosmic ray of energy ε has a gyro radius r_G = ε/eBc, in this case ∼ 3 × 10¹² m. Cosmic rays will be unaffected by waves with wavelength either much greater than or much less than r_G. However, waves, especially Alfvén waves, with wavelength matched to the gyro radius will be able to change the particle’s pitch angle α (the angle its momentum makes with the mean magnetic field direction). If the Alfvén waves in this wavelength range have rms dimensionless amplitude δB/B ≪ 1, then the particle’s pitch angle will change by an amount δα ∼ δB/B every wavelength. Now, if the wave spectrum is broadband, individual waves can be treated as uncorrelated, so the particle’s pitch angle changes stochastically. In other words, the particle diffuses in pitch angle. The effective diffusion coefficient is

$$D_\alpha \sim \left(\frac{\delta B}{B}\right)^2\omega_G\,, \qquad (17.79)$$

where ω_G = c/r_G is the gyro frequency. The particle will therefore be scattered by roughly a radian in pitch angle every time it traverses a distance ℓ ∼ (B/δB)² r_G. This is effectively the particle’s collisional mean free path. Associated with this mean free path is a spatial diffusion coefficient

$$D_x \sim \frac{\ell c}{3}\,. \qquad (17.80)$$

It is thought that δB/B ∼ 10⁻¹ in the relevant wavelength range in the interstellar medium. An estimate of the collisional mean free path is then ℓ(100 GeV) ∼ 3 × 10¹⁴ m. Now, the thickness of our galaxy’s interstellar disk of gas is roughly L ∼ 3 × 10¹⁸ m ∼ 10⁴ ℓ. Therefore an estimate of the cosmic-ray anisotropy is ∼ ℓ/L ∼ 10⁻⁴, roughly compatible with the measurements.

Although this discussion is an oversimplification, it does demonstrate that the cosmic rays in both the interplanetary medium and the interstellar medium can be scattered and confined by magnetosonic waves. This allows their escape to be impeded without much loss of energy, so that their number density and energy density can be maintained at the observed level at earth. A good question to ask at this point is “Where do the Alfvén waves come from?” The answer turns out to be that they are almost certainly created by the cosmic rays themselves. In order to proceed further and give a more quantitative description of this interaction, we must go beyond a purely fluid description and start to specify the motions of individual particles. This is where we shall turn next, in Chap. 18.
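The order-of-magnitude chain above can be reproduced numerically. The sketch below assumes that the mass density is carried by protons (ρ = n m_p), uses rounded SI constants, and models pitch-angle diffusion as a random walk with one kick of ±δB/B per wavelength; the function names and the random-walk model are illustrative choices, not a method taken from the text.

```python
import numpy as np

MU0 = 4e-7 * np.pi      # vacuum permeability [T m / A]
M_P = 1.6726e-27        # proton mass [kg]
E_CHARGE = 1.602e-19    # elementary charge [C]
C = 2.998e8             # speed of light [m/s]


def cosmic_ray_estimates(n=1e4, B=1e-10, energy_eV=100e9, dB_over_B=0.1, L_disk=3e18):
    """Order-of-magnitude estimates for cosmic-ray scattering (SI units)."""
    rho = n * M_P                                  # assumption: protons carry the mass
    a = B / np.sqrt(MU0 * rho)                     # Alfven speed
    r_G = energy_eV * E_CHARGE / (E_CHARGE * B * C)  # relativistic gyro radius eps/(eBc)
    ell = r_G / dB_over_B**2                       # mean free path ~ (B/dB)^2 r_G
    D_x = ell * C / 3.0                            # spatial diffusion coefficient, Eq. (17.80)
    return {"a": a, "r_G": r_G, "ell": ell, "D_x": D_x, "anisotropy": ell / L_disk}


def pitch_angle_rms(dB_over_B=0.1, n_particles=20000, seed=0):
    """RMS pitch-angle change after (B/dB)^2 uncorrelated kicks of size +/- dB/B."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(dB_over_B**-2))
    kicks = rng.choice([-dB_over_B, dB_over_B], size=(n_particles, n_steps))
    return np.sqrt(np.mean(kicks.sum(axis=1) ** 2))


est = cosmic_ray_estimates()
for name, value in est.items():
    print(f"{name:10s} {value:.2e}")
print("rms pitch-angle change after (B/dB)^2 kicks:", pitch_angle_rms())
```

With the default inputs the sketch gives a ≈ 22 km s⁻¹ and r_G ≈ 3 × 10¹² m, consistent in order of magnitude with the values quoted above; the random walk shows why roughly (B/δB)² wavelengths are needed to change the pitch angle by about a radian.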

****************************
EXERCISES

Exercise 17.10 Example: Rotating Magnetospheres

Many self-gravitating cosmic bodies are both spinning and magnetized. Examples are the earth, the sun, black holes surrounded by highly conducting accretion disks (which hold a magnetic field on the hole), neutron stars (pulsars), and magnetic white dwarfs. As a consequence of the body’s spin, large, exterior electric fields are induced, whose divergence must be balanced by free electrical charge. This implies that the region around the body cannot be vacuum. It is usually filled with plasma and is called a magnetosphere. MHD provides a convenient formalism for describing the structure of this magnetosphere. Magnetospheres are found around most planets and stars. Magnetospheres surrounding neutron stars and black holes are believed to be responsible for the emission from pulsars and quasars.

As a model of a rotating magnetosphere, consider a magnetized and infinitely conducting star, spinning with angular frequency Ω∗. Suppose that the magnetic field is stationary and axisymmetric with respect to the spin axis and that the magnetosphere is perfectly conducting.

(a) Show that the azimuthal component of the magnetospheric electric field, E_φ, must vanish if the magnetic field is to be stationary. Hence show that there exists a function Ω(r) which must be parallel to Ω∗ and satisfy

$$\mathbf E = -(\boldsymbol\Omega\times\mathbf r)\times\mathbf B\,. \qquad (17.81)$$

Show that if the motion of the magnetosphere’s conducting fluid is simply a rotation then its angular velocity must be Ω.

(b) Use the induction equation (magnetic-field transport law) to show that

$$(\mathbf B\cdot\nabla)\boldsymbol\Omega = 0\,. \qquad (17.82)$$

(c) Use the boundary condition at the surface of the star to show that the magnetosphere corotates with the star, i.e. Ω = Ω∗. This is known as Ferraro’s law of isorotation.

Exercise 17.11 Example: Solar Wind

The solar wind is a magnetized outflow of plasma away from the solar corona. We will make a simple model of it generalizing the results from the last exercise. In this case, the fluid not only rotates with the sun but also moves away from it. We just consider stationary, axisymmetric motion in the equatorial plane and idealize the magnetic field as having the form B_r(r), B_φ(r). (If this were true at all latitudes, the sun would have to contain magnetic monopoles!)

(a) Use the results from the previous exercise plus the perfect MHD relation E = −v × B to argue that the velocity field can be written in the form

$$\mathbf v = \frac{\kappa\mathbf B}{\rho} + \boldsymbol\Omega\times\mathbf r\,, \qquad (17.83)$$

where κ and Ω are constant along a field line. Interpret this relation kinematically.

(b) Resolve the velocity and the magnetic field into radial and azimuthal components, v_r, v_φ, B_r, B_φ, and show that ρ v_r r² and B_r r² are constant.

(c) Use the induction equation to show that

$$\frac{v_r}{v_\phi - \Omega r} = \frac{B_r}{B_\phi}\,. \qquad (17.84)$$

(d) Use the equation of motion to show that the specific angular momentum, including both the mechanical and the magnetic contributions,

$$\Lambda = r v_\phi - \frac{r B_r B_\phi}{\mu_0\rho v_r} \qquad (17.85)$$

is constant.

(e) Combine these two relations to argue that

$$v_\phi = \frac{\Omega r\left[M_A^2\Lambda/(\Omega r^2) - 1\right]}{M_A^2 - 1}\,, \qquad (17.86)$$

where M_A is the Alfvén Mach number, the ratio of v_r to the radial Alfvén speed B_r/√(µ0ρ). Show that the solar wind must pass through a critical point where its radial speed equals the Alfvén speed.

(f) In the solar wind, this critical point is located at about 20 solar radii. Explain why this implies that, through the action of the solar wind, the sun loses its spin faster than it loses its mass.

(g) At earth, the radial velocity in the solar wind is about 400 km s⁻¹ and the mean proton density is about 4 × 10⁶ m⁻³. Estimate how long it will take the sun to slow down, and comment on your answer. (The mass of the sun is 2 × 10³⁰ kg, its radius is 7 × 10⁸ m, and its rotation period is about 25 days.)

****************************

Bibliographic Note For textbook introductions to plasma physics we recommend the relevant chapters of Schmidt (1966) and Boyd and Sanderson (1969). For the theory of MHD instabilities and applications to magnetic confinement, see Bateman (1978) and Jeffrey and Taniuti (1966). For applications to astrophysics and space physics, see Parker (1979) and Parks (1991).

Bibliography

Bateman. 1978. MHD Instabilities, Cambridge, Mass.: MIT Press.
Boyd, T. J. M. & Sanderson. 1969. Plasma Dynamics, London: Nelson.
Jeffrey, A. & Taniuti, T. 1966. Magnetohydrodynamic Stability and Thermonuclear Confinement, New York: Academic Press.
Mathews, J. & Walker, R. L. 1970. Mathematical Methods of Physics, New York: W. A. Benjamin.
Moffatt, H. K. 1978. Magnetic Field Generation in Electrically Conducting Fluids, Cambridge: Cambridge University Press.
Parker, E. N. 1979. Cosmical Magnetic Fields, Oxford: Clarendon Press.
Parks, G. K. 1991. Physics of Space Plasmas, Redwood City, California: Addison Wesley.
Schmidt, G. 1966. Physics of High Temperature Plasmas, New York: Academic Press.
Shapiro, S. L. & Teukolsky, S. A. 1983. Black Holes, White Dwarfs, and Neutron Stars, New York: John Wiley and Sons.
Strachen, J. D. et al. 1994. Phys. Rev. Lett., 72, 3526; see also http://www.pppl.gov/projects/pages/tftr.html


Box 17.2 Important Concepts in Chapter 17

• Fundamental MHD concepts and laws
  – Magnetic field B as primary electromagnetic variable; E, ρe, j expressible in terms of it, Sec. 17.2.1
  – B, j frame independent in nonrelativistic MHD; E, ρe frame dependent, Sec. 17.2.1
  – Magnetic Reynolds number and magnetic diffusion coefficient, Sec. 17.2.1
  – Evolution law for B: freezing into fluid at high magnetic Reynolds number; diffusion through fluid at lower magnetic Reynolds number, Sec. 17.2.1
  – Magnetic force on fluid expressed in various ways: j × B; minus divergence of magnetic stress tensor; curvature force orthogonal to B minus gradient of magnetic pressure orthogonal to B, Sec. 17.2.2
  – Ohmic dissipation and evolution of entropy, Sec. 17.2.2
  – Boundary conditions at a contact discontinuity and at a shock, Sec. 17.2.3
  – Pressure ratio β ≡ P/(B²/2µ0), Sec. 17.3.1
• Interaction of vorticity and magnetic fields: tangled B lines can create vorticity, vorticity can amplify B, dynamos, Secs. 17.2.4 and 17.6
• Hartmann flow: electromagnetic brake, power generator, flow meter, electromagnetic pump, Sec. 17.4
• Controlled fusion via magnetic confinement; confinement geometries (Z-pinch, θ-pinch, Tokamak) and magnetostatic equilibrium, Sec. 17.3
• Stability of magnetostatic (hydromagnetic) equilibria, Sec. 17.5
  – Lagrangian and Eulerian perturbations, linearized dynamical equation for the fluid displacement ξ, self-adjointness, Sturm-Liouville eigenequation, Sec. 17.5.1
  – Energy principle (action principle) for eigenfrequencies, Sec. 17.5.3
  – Sausage and kink instabilities for Z-pinch configuration, Sec. 17.5.2
• Reconnection of magnetic field lines, Sec. 17.6.3
• Magnetosonic waves and dispersion relations, Sec. 17.7.2
  – Alfvén mode, fast magnetosonic mode, slow magnetosonic mode, Sec. 17.7.2
  – Scattering of cosmic rays by Alfvén waves, Sec. 17.7.3

Contents

17 Magnetohydrodynamics
  17.1 Overview
  17.2 Basic Equations of MHD
    17.2.1 Maxwell’s Equations in MHD Approximation
    17.2.2 Momentum and Energy Conservation
    17.2.3 Boundary Conditions
    17.2.4 Magnetic field and vorticity
  17.3 Magnetostatic Equilibria
    17.3.1 Controlled thermonuclear fusion
    17.3.2 Z-Pinch
    17.3.3 Θ Pinch
    17.3.4 Tokamak
  17.4 Hydromagnetic Flows
  17.5 Stability of Hydromagnetic Equilibria
    17.5.1 Linear Perturbation Theory
    17.5.2 Z-Pinch; Sausage and Kink Instabilities
    17.5.3 Energy Principle
  17.6 Dynamos and Reconnection of Magnetic Field Lines
    17.6.1 Cowling’s theorem
    17.6.2 Kinematic dynamos
    17.6.3 Magnetic Reconnection
  17.7 Magnetosonic Waves and the Scattering of Cosmic Rays
    17.7.1 Cosmic Rays
    17.7.2 Magnetosonic Dispersion Relation
    17.7.3 Scattering of Cosmic Rays

Chapter 17 Magnetohydrodynamics

Version 0817.2.K.pdf, 11 March 2009.

Box 17.1 Reader’s Guide

• This chapter relies heavily on Chap. 12 and somewhat on the treatment of vorticity transport in Sec. 13.2.
• Part V, Plasma Physics (Chaps. 18–21), relies heavily on this chapter.

17.1

Overview

In preceding chapters, we have described the consequences of incorporating viscosity and thermal conductivity into the description of a fluid. We now turn to our final embellishment of fluid mechanics, in which the fluid is electrically conducting and moves in a magnetic field. The study of flows of this type is known as Magnetohydrodynamics, or MHD for short. In our discussion, we eschew full generality and, with one exception, just use the basic Euler equation augmented by magnetic terms. This suffices to highlight peculiarly magnetic effects and is adequate for many applications.

The simplest example of an electrically conducting fluid is a liquid metal, for example mercury or liquid sodium. However, the major use of MHD is in plasma physics. (A plasma is a hot, ionized gas containing free electrons and ions.) It is by no means obvious that plasmas can be regarded as fluids, since the mean free paths for collisions between the electrons and ions are macroscopically long. However, as we shall learn in Part V (Sec. 19.5 and Chap. 21), collective interactions between large numbers of plasma particles can isotropize the particles’ velocity distributions in some local mean reference frame, thereby making it sensible to describe the plasma macroscopically by a mean density, velocity, and pressure. These mean quantities can then be shown to obey the same conservation laws of mass, momentum and energy as we derived for fluids in Chap. 12. As a result, a fluid description of a plasma is often reasonably accurate. We defer to Part V further discussion of this point, asking the reader to take it on trust for the moment. We are also, implicitly, assuming that the average velocity of the ions is nearly the same as the average velocity of the electrons. This is usually a good approximation; if it were not so, then the plasma would carry an unreasonably large current density.

There are two serious technological applications of MHD that may become very important in the future. In the first, strong magnetic fields are used to confine rings or columns of hot plasma that (it is hoped) will be held in place long enough for thermonuclear fusion to occur and for net power to be generated. In the second, which is directed toward a similar goal, liquid metals are driven through a magnetic field in order to generate electricity. The study of magnetohydrodynamics is also motivated by its widespread application to the description of space plasmas (within the solar system) and astrophysical plasmas (beyond the solar system). We shall illustrate the principles of MHD using examples drawn from each of these areas.

After deriving the basic equations of MHD (Sec. 17.2), we shall elucidate hydromagnetic equilibria by describing a Tokamak (Sec. 17.3). This is currently the most popular scheme for magnetic confinement of hot plasma. In our second application (Sec. 17.4) we shall describe the flow of conducting liquid metals or plasma along magnetized ducts and outline its potential as a practical means of electrical power generation and spacecraft propulsion. We shall then return to the question of hydromagnetic confinement of hot plasma and focus on the stability of equilibria (Sec. 17.5).
This issue of stability has occupied a central place in our development of fluid mechanics and it will not come as a surprise to learn that it has dominated research into plasma fusion. When a magnetic field plays a role in the equilibrium (e.g. for magnetic confinement of a plasma), the field also makes possible new modes of oscillation and some of these modes can be unstable to exponential growth. Many magnetic confinement geometries are unstable to MHD modes. We shall demonstrate this qualitatively by considering the physical action of the magnetic field, and also formally using variational methods. In Sec. 17.6 we turn to a geophysical problem, the origin of the earth’s magnetic field. It is generally believed that complex fluid motions within the earth’s liquid core are responsible for regenerating the field through dynamo action. We shall use a simple model to illustrate this process. When magnetic forces are added to fluid mechanics, a new class of waves, called magnetosonic waves, can propagate. We conclude our discussion of MHD in Sec. 17.7 by deriving the properties of these wave modes in a homogeneous plasma and showing how they control the propagation of cosmic rays in the interplanetary and interstellar media.

17.2

Basic Equations of MHD

The equations of MHD describe the motion of a conducting fluid in a magnetic field. This fluid is usually either a liquid metal or a plasma. In both cases, the conductivity ought to be regarded as a tensor if the gyro frequency exceeds the collision frequency. (If there are several collisions per gyro orbit, then the influence of the magnetic field on the transport coefficients will be minimal.) However, in order to keep the mathematics simple, we shall treat the conductivity as a constant scalar, κe. In fact, it turns out that, for many of our applications, it is adequate to take the conductivity as infinite.

[Figure 17.1: two panels, (a) and (b), each showing a conductor in the field B between magnet poles N and S; labels include the velocity v, the current j and the force F = j × B.]

Fig. 17.1: The two key physical effects occurring in MHD. (a) A moving conductor modifies the magnetic field by appearing to drag the field lines with it. When the conductivity is infinite, the field lines appear to be frozen into the moving conductor. (b) When electric current, flowing in the conductor, crosses magnetic field lines there will be a Lorentz force, which will accelerate the fluid.

There are two key physical effects that occur in MHD (Fig. 17.1), and understanding them well is the key to developing physical intuition in this subject. The first effect arises when a good conductor moves into a magnetic field. Electric current is induced in the conductor which, by Lenz’s law, creates its own magnetic field. This induced magnetic field tends to cancel the original, externally supported field, thereby, in effect, excluding the magnetic field lines from the conductor. Conversely, when the magnetic field penetrates the conductor and the conductor is moved out of the field, the induced field reinforces the applied field. The net result is that the lines of force appear to be dragged along with the conductor: they “go with the flow”. Naturally, if the conductor is a fluid with complex motions, the ensuing magnetic field distribution can become quite complex, and the current will build up until its growth is balanced by Ohmic dissipation.

The second key effect is dynamical. When currents are induced by a motion of a conducting fluid through a magnetic field, a Lorentz (or j × B) force will act on the fluid and modify its motion. In MHD, the motion modifies the field and the field, in turn, reacts back and modifies the motion. This makes the theory highly nonlinear.

Before deriving the governing equations of MHD, we should consider the choice of primary variables. In electromagnetic theory, we specify the spatial and temporal variation of either the electromagnetic field or its source, the electric charge density and current density. One choice is computable (at least in principle) from the other using Maxwell’s equations, augmented by suitable boundary conditions.
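So it is with MHD, and the choice depends on convenience. It turns out that for the majority of applications, it is most instructive to deal with the magnetic field as primary, and use Maxwell’s equations

$$\nabla\cdot\mathbf E = \frac{\rho_e}{\epsilon_0}\,,\qquad \nabla\cdot\mathbf B = 0\,,\qquad \nabla\times\mathbf E = -\frac{\partial\mathbf B}{\partial t}\,,\qquad \nabla\times\mathbf B = \mu_0\mathbf j + \mu_0\epsilon_0\frac{\partial\mathbf E}{\partial t} \qquad (17.1)$$

to express the current density and the electric field in terms of the magnetic field.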

17.2.1

Maxwell’s Equations in MHD Approximation

Ohm’s law, as normally formulated, is valid only in the rest frame of the conductor. In particular, for a conducting fluid, Ohm’s law relates the current density j′ measured in the fluid’s local rest frame to the electric field E′ measured there:

$$\mathbf j' = \kappa_e\mathbf E'\,, \qquad (17.2)$$

where κe is the electric conductivity. Because the fluid is generally accelerated, dv/dt ≠ 0, its local rest frame is generally not inertial. Since it would produce a terrible headache to have to transform time and again from some inertial frame to the continually changing local rest frame when applying Ohm’s law, it is preferable to reformulate Ohm’s law in terms of the fields E, B and j measured in the inertial frame. To facilitate this (and for completeness), we shall explore the frame dependence of all our electromagnetic quantities E, B, j and ρe.

We shall assume, throughout our development of magnetohydrodynamics, that the fluid moves with a nonrelativistic speed v ≪ c relative to our chosen reference frame. We can then express the rest-frame electric field in terms of the inertial-frame electric and magnetic fields as

$$\mathbf E' = \mathbf E + \mathbf v\times\mathbf B\,; \qquad E' \equiv |\mathbf E'| \ll E\,. \qquad (17.3a)$$

In the first equation we have set the Lorentz factor γ ≡ 1/√(1 − v²/c²) to unity, consistent with our nonrelativistic approximation. The second equation follows from the high conductivity of the fluid, which guarantees that current will quickly flow in whatever manner it must to annihilate any electric field E′ that might be formed in the fluid’s rest frame. By contrast with the extreme frame dependence (17.3a) of the electric field, the magnetic field is essentially the same in the fluid’s local rest frame as in the laboratory. More specifically, the analog of Eq. (17.3a) is B′ = B − (v/c²) × E, and since E ∼ vB the second term is of magnitude (v/c)²B, which is negligible, giving

$$\mathbf B' = \mathbf B\,. \qquad (17.3b)$$

Because E is very frame dependent, so is its divergence, the electric charge density ρe. In the laboratory frame, where E ∼ vB, Gauss’s and Ampère’s laws [the first and fourth of Eqs. (17.1)] imply that ρe ∼ ε0vB/L ∼ (v/c²)j, where L is the lengthscale on which E and B vary; and the relation E′ ≪ E with Gauss’s law implies |ρ′e| ≪ |ρe|:

$$\rho_e \sim \frac{vj}{c^2}\,, \qquad |\rho_e'| \ll |\rho_e|\,. \qquad (17.3c)$$

By transforming the current density between frames and approximating γ ≃ 1, we obtain j′ = j − ρe v = j + O[(v/c)²] j; so in the nonrelativistic limit (first order in v/c) we can ignore the charge density and write

$$\mathbf j' = \mathbf j\,. \qquad (17.3d)$$

To recapitulate, in nonrelativistic magnetohydrodynamic flows, the magnetic field and current density are frame independent up to fractional corrections of order (v/c)², while the electric field and charge density are very frame dependent and are generally small in the sense that E/c ∼ (v/c)B ≪ B [in Gaussian cgs units E ∼ (v/c)B ≪ B] and ρe c ∼ (v/c)j ≪ j.

Combining Eqs. (17.2), (17.3a) and (17.3d), we obtain the nonrelativistic form of Ohm’s law in terms of quantities measured in our chosen inertial, laboratory frame:

$$\mathbf j = \kappa_e(\mathbf E + \mathbf v\times\mathbf B)\,. \qquad (17.4)$$

We are now ready to derive explicit equations for the (inertial-frame) electric field and current density in terms of the (inertial-frame) magnetic field. We begin with Ampère’s law written as ∇ × B − µ0 j = µ0ε0 ∂E/∂t = (1/c²)∂E/∂t, and we notice that the time derivative of E is of order Ev/L ∼ Bv²/L (since E ∼ vB), so the right-hand side is O(Bv²/c²L) and thus can be neglected compared to the O(B/L) term on the left, yielding

$$\mathbf j = \frac{1}{\mu_0}\nabla\times\mathbf B\,. \qquad (17.5a)$$

We next insert this expression for j into the inertial-frame Ohm’s law (17.4), thereby obtaining

$$\mathbf E = -\mathbf v\times\mathbf B + \frac{1}{\kappa_e\mu_0}\nabla\times\mathbf B\,. \qquad (17.5b)$$

If we happen to be interested in the charge density (which is rare in MHD), we can compute it by taking the divergence of this electric field:

$$\rho_e = -\epsilon_0\nabla\cdot(\mathbf v\times\mathbf B)\,. \qquad (17.5c)$$

Equations (17.5) express all the secondary electromagnetic variables in terms of our primary one, B. This has been possible because of the high electric conductivity and our choice to confine ourselves to nonrelativistic (low-velocity) situations; it would not be possible otherwise.

We next derive an evolution law for the magnetic field by taking the curl of Eq. (17.5b), using Maxwell’s equation ∇ × E = −∂B/∂t and the vector identity ∇ × (∇ × B) = ∇(∇ · B) − ∇²B, and using ∇ · B = 0; the result is

$$\frac{\partial\mathbf B}{\partial t} = \nabla\times(\mathbf v\times\mathbf B) + \frac{1}{\mu_0\kappa_e}\nabla^2\mathbf B\,. \qquad (17.6)$$
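Written out step by step, the curl operation described above reads as follows (a sketch assuming, as above, that κe is a spatially constant scalar, so that it passes through the curl):

$$
\begin{aligned}
-\frac{\partial\mathbf B}{\partial t} = \nabla\times\mathbf E
&= -\nabla\times(\mathbf v\times\mathbf B) + \frac{1}{\kappa_e\mu_0}\nabla\times(\nabla\times\mathbf B)\\
&= -\nabla\times(\mathbf v\times\mathbf B) + \frac{1}{\kappa_e\mu_0}\left[\nabla(\nabla\cdot\mathbf B) - \nabla^2\mathbf B\right]\\
&= -\nabla\times(\mathbf v\times\mathbf B) - \frac{1}{\kappa_e\mu_0}\nabla^2\mathbf B\,,
\end{aligned}
$$

and multiplying through by −1 gives Eq. (17.6).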

This equation is called the induction equation and describes the temporal evolution of the magnetic field. It is similar in form to the propagation law for vorticity in a flow with ∇P × ∇ρ = 0 [Eq. (13.3)], which says ∂ω/∂t = ∇ × (v × ω) + ν∇²ω. The ∇ × (v × B) term in Eq. (17.6) dominates when the conductivity is large, and can be regarded as describing the freezing of magnetic field lines into the fluid in the same way as the ∇ × (v × ω) term describes the freezing of vortex lines into a fluid with small viscosity ν; cf. Fig. 17.2. By analogy with Eq. (13.9), when flux freezing dominates, the fluid derivative of B/ρ (where ρ is mass density, not to be confused with charge density ρe) can be written as

$$\frac{D}{Dt}\left(\frac{\mathbf B}{\rho}\right) \equiv \frac{d}{dt}\left(\frac{\mathbf B}{\rho}\right) - \left(\frac{\mathbf B}{\rho}\cdot\nabla\right)\mathbf v = 0\,. \qquad (17.7)$$

This says that B/ρ evolves in the same manner as the separation Δx between two points in the fluid; cf. Fig. 13.3 and associated discussion.

[Figure 17.2: panels (a) and (b) showing magnetic field lines B in a vortex of vorticity ω.]

Fig. 17.2: Pictorial representation of the evolution of magnetic field in a fluid endowed with infinite electrical conductivity. (a) A uniform magnetic field at time t = 0 in a vortex. (b) At a later time, when the fluid has rotated through ∼30°, the circulation has stretched and distorted the magnetic field.

The term (1/µ0κe)∇²B in the B-field evolution equation (17.6) is analogous to the vorticity diffusion term ν∇²ω in the vorticity evolution equation (13.3); therefore, when κe is not too large, magnetic field lines will diffuse through the fluid. The effective diffusion coefficient (analogous to ν) is

$$D_M = \frac{1}{\mu_0\kappa_e}\,. \qquad (17.8)$$

The earth’s magnetic field provides an example of field diffusion. That field is believed to be supported by electric currents flowing in the earth’s iron core. Now, we can estimate the electric conductivity of iron under these conditions and from it deduce a value for the diffusivity, D_M ∼ 1 m² s⁻¹. The size of the earth’s core is L ∼ 10⁴ km, so if there were no fluid motions, we would expect the magnetic field to diffuse out of the core and escape from the earth in a time τ_M ∼ L²/D_M ∼ 10¹⁴ s ∼ three million years, which is much shorter than the age of the earth, ∼5 billion years. The reason for this discrepancy, as we shall discuss, is that there are internal circulatory motions in the liquid core which are capable of regenerating the magnetic field through dynamo action.

Although Eq. (17.6) describes a genuine diffusion of the magnetic field, the resulting magnetic decay time must be computed by solving the complete boundary value problem. To give a simple illustration, suppose that a poor conductor (e.g. a weakly ionized column of plasma) is surrounded by an excellent conductor (e.g. the metal walls of the container in which the plasma is contained), and that magnetic field lines supported by wall currents thread the plasma. The magnetic field will only diminish after the wall currents undergo Ohmic dissipation, and this can take much longer than the diffusion time for the plasma column alone.
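To illustrate the diffusive term on its own (v = 0, so Eq. (17.6) reduces to ∂B/∂t = D_M∇²B), the sketch below evolves a field B = B_z(x) ẑ, which is automatically divergence-free, on a periodic grid with the explicit FTCS finite-difference scheme, and compares the decay of a sinusoidal mode with the exact factor exp(−D_M k² t). The grid, the dimensionless units and the function name are hypothetical choices for illustration. For wavelength L the e-folding time is 1/(D_M k²) = L²/(4π² D_M), which shows how the estimate τ_M ∼ L²/D_M arises, up to the factor (2π)².

```python
import numpy as np


def diffuse_field(B0, D_M, dx, t_end, safety=0.2):
    """Integrate dB/dt = D_M d^2B/dx^2 on a periodic 1D grid with the explicit FTCS scheme.

    Returns the final profile and the time actually reached (a whole number of steps).
    """
    dt = safety * dx**2 / D_M                # FTCS stability requires D_M*dt/dx**2 <= 1/2
    n_steps = int(round(t_end / dt))
    B = np.array(B0, dtype=float)
    for _ in range(n_steps):
        laplacian = (np.roll(B, -1) - 2.0 * B + np.roll(B, 1)) / dx**2
        B = B + D_M * dt * laplacian
    return B, n_steps * dt


N, length, D_M = 128, 1.0, 1.0               # illustrative dimensionless values
x = np.arange(N) * (length / N)
k = 2.0 * np.pi / length
profile = np.sin(k * x)
B_final, t = diffuse_field(profile, D_M, length / N, t_end=0.01)

# Project the final field onto the initial mode and compare with the exact decay factor
amp_numeric = (B_final @ profile) / (profile @ profile)
amp_exact = np.exp(-D_M * k**2 * t)
print(amp_numeric, amp_exact)
```

The explicit scheme is chosen only for transparency; its time step must shrink as dx², so implicit or spectral methods are the usual choice for production diffusion calculations.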
It is customary to introduce a dimensionless number called the Magnetic Reynolds number, R_M, directly analogous to the normal Reynolds number, to describe the relative importance of flux freezing and diffusion. The normal Reynolds number can be regarded as the ratio of the magnitude of the vorticity-freezing term ∇ × (v × ω) ∼ (V/L)ω in the vorticity evolution equation ∂ω/∂t = ∇ × (v × ω) + ν∇²ω to the magnitude of the diffusion term ν∇²ω ∼ (ν/L²)ω: R = (V/L)(ν/L²)⁻¹ = VL/ν. Here V is a characteristic speed and L a characteristic lengthscale of the flow. Similarly, the Magnetic Reynolds number is the ratio of the magnitude of the magnetic-flux-freezing term ∇ × (v × B) ∼ (V/L)B to the magnitude of the magnetic-flux-diffusion term D_M∇²B = (1/µ0κe)∇²B ∼ B/(µ0κe L²) in the induction equation (magnetic-field evolution equation) (17.6):

$$R_M = \frac{V/L}{D_M/L^2} = \frac{VL}{D_M} = \mu_0\kappa_e VL\,. \qquad (17.9)$$

Substance            L (m)    V (m s⁻¹)   D_M (m² s⁻¹)   τ_M (s)   R_M
Mercury              0.1      0.1         1              0.01      0.01
Liquid Sodium        0.1      0.1         0.1            0.1       0.1
Laboratory Plasma    1        100         10             0.1       10
Earth’s Core         10⁷      0.1         1              10¹⁴      10⁶
Interstellar Gas     10¹⁷     10³         10³            10³¹      10¹⁷

Table 17.1: Characteristic magnetic diffusivities D_M, decay times τ_M and Magnetic Reynolds numbers R_M for some common MHD flows with characteristic length scales L and velocities V.
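The τ_M and R_M columns of Table 17.1 follow directly from the L, V and D_M columns via τ_M ∼ L²/D_M and Eq. (17.9). The sketch below recomputes them. The mercury conductivity κe ≈ 1.0 × 10⁶ S m⁻¹ used in the last line is an assumed room-temperature value, included only to show that Eq. (17.8) then gives D_M of order 1 m² s⁻¹, as in the table.

```python
import math

MU0 = 4e-7 * math.pi   # vacuum permeability [T m / A]

# Characteristic (L [m], V [m/s], D_M [m^2/s]) taken from Table 17.1
FLOWS = {
    "Mercury": (0.1, 0.1, 1.0),
    "Liquid sodium": (0.1, 0.1, 0.1),
    "Laboratory plasma": (1.0, 100.0, 10.0),
    "Earth's core": (1e7, 0.1, 1.0),
    "Interstellar gas": (1e17, 1e3, 1e3),
}


def diffusivity(kappa_e):
    """Magnetic diffusivity D_M = 1/(mu0 kappa_e), Eq. (17.8)."""
    return 1.0 / (MU0 * kappa_e)


def decay_time(L, D_M):
    """Diffusive decay time tau_M ~ L^2 / D_M."""
    return L**2 / D_M


def magnetic_reynolds(L, V, D_M):
    """Magnetic Reynolds number R_M = V L / D_M, Eq. (17.9)."""
    return V * L / D_M


for name, (L, V, D_M) in FLOWS.items():
    print(f"{name:18s} tau_M = {decay_time(L, D_M):.0e} s   R_M = {magnetic_reynolds(L, V, D_M):.0e}")

# Assumed room-temperature conductivity of mercury, about 1.0e6 S/m
print(f"mercury D_M ~ {diffusivity(1.0e6):.2f} m^2/s")
```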

When R_M ≫ 1, the field lines are effectively frozen into the fluid; when R_M ≪ 1, Ohmic dissipation is dominant. Magnetic Reynolds numbers and diffusion times for some typical MHD flows are given in Table 17.1. For most laboratory conditions, R_M is modest, which means that electric resistivity 1/κe is significant and the magnetic diffusivity D_M is rarely negligible. By contrast, in space physics and astrophysics, R_M ≫ 1, so the resistivity can be ignored almost always and everywhere. This limiting case, when the electric conductivity is treated as infinite, is often called perfect MHD.

The phrase “almost always and everywhere” needs clarification. Just as for large-Reynolds-number fluid flows, so also here, boundary layers and discontinuities can be formed, in which the gradients of physical quantities are automatically large enough to make R_M ∼ 1 locally. A new and important example discussed below is magnetic reconnection. This occurs when regions magnetized along different directions are juxtaposed, for example when the solar wind encounters the earth’s magnetosphere. In such discontinuities and boundary layers, magnetic diffusion and Ohmic dissipation are important; and, as in ordinary fluid mechanics, these dissipative layers and discontinuities can control the character of the overall flow despite occupying a negligible fraction of the total volume.

17.2.2 Momentum and Energy Conservation

The fluid dynamical aspects of MHD are handled by adding an electromagnetic force term to the Euler or Navier-Stokes equation. The magnetic force density j × B is the sum of the Lorentz forces acting on all the fluid’s charged particles in a unit volume. There is also an electric force density ρeE, but this is smaller than j × B by a factor O(v²/c²) by virtue of Eqs. (17.5a)–(17.5c), so we shall ignore it. When j × B is added to the Euler equation (12.40) [or equivalently to the Navier-Stokes equation with the viscosity neglected as unimportant in the situations we shall study], it takes the following form:

$$\rho\frac{d\mathbf v}{dt} = \rho\mathbf g - \nabla P + \mathbf j\times\mathbf B = \rho\mathbf g - \nabla P + \frac{(\nabla\times\mathbf B)\times\mathbf B}{\mu_0}. \tag{17.10}$$

Here we have used expression (17.5a) for the current density in terms of the magnetic field. This is our basic MHD force equation. Like all other force densities in this equation, the magnetic one, j × B, can be expressed as minus the divergence of a stress tensor, the magnetic portion of the Maxwell stress tensor,

$$\mathbf T_M = \frac{B^2\,\mathbf g}{2\mu_0} - \frac{\mathbf B\otimes\mathbf B}{\mu_0}, \tag{17.11}$$

where g is the metric tensor; see Ex. 17.1. By virtue of j × B = −∇ · TM and other relations explored in Sec. 12.5 and Box 12.3, we can convert the force-balance equation (17.10) into the conservation law for momentum [generalization of Eq. (12.38)]

$$\frac{\partial(\rho\mathbf v)}{\partial t} + \nabla\cdot\left(P\,\mathbf g + \rho\,\mathbf v\otimes\mathbf v + \mathbf T_g + \mathbf T_M\right) = 0. \tag{17.12}$$

Here Tg is the gravitational stress tensor [Eq. (1) of Box 12.3], which resembles the magnetic one:

$$\mathbf T_g = -\frac{g^2\,\mathbf g}{8\pi G} + \frac{\mathbf g\otimes\mathbf g}{4\pi G}; \tag{17.13}$$

it is generally unimportant in laboratory plasmas but can be quite important in some astrophysical plasmas, e.g., near black holes and neutron stars. The two terms in the magnetic Maxwell stress tensor, Eq. (17.11), can be identified as the “push” of an isotropic magnetic pressure B²/2µ0 that acts just like the gas pressure P, and the “pull” of a tension B²/µ0 that acts parallel to the magnetic field. The combination of the tension and the isotropic pressure gives a net tension B²/2µ0 along the field and a net pressure B²/2µ0 perpendicular to the field lines. If we expand the divergence of the magnetic stress tensor, we obtain for the magnetic force density

$$\mathbf f_m = -\nabla\cdot\mathbf T_M = \mathbf j\times\mathbf B = \frac{(\nabla\times\mathbf B)\times\mathbf B}{\mu_0}, \tag{17.14}$$

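as expected. Using standard vector identities, we can rewrite this as

$$\mathbf f_m = -\nabla\frac{B^2}{2\mu_0} + \frac{(\mathbf B\cdot\nabla)\mathbf B}{\mu_0} = -\left[\nabla\frac{B^2}{2\mu_0}\right]_\perp + \left[\frac{(\mathbf B\cdot\nabla)\mathbf B}{\mu_0}\right]_\perp. \tag{17.15}$$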

Here “⊥” means keep only the components perpendicular to the magnetic field; the fact that fm = j × B guarantees that the net force parallel to B must vanish, so we can throw away the component along B in each term. This transversality of fm means that the magnetic force neither inhibits nor promotes motion of the fluid along the magnetic field. Instead, fluid elements are free to slide along the field like beads that slide without friction along a magnetic “wire”. The “⊥” expressions in Eq. (17.15) say that the magnetic force density has two parts: first, the negative of the two-dimensional gradient of the magnetic pressure B²/2µ0 orthogonal to B (Fig. 17.3a), and second, an orthogonal curvature force (B · ∇)B/µ0, which has magnitude B²/µ0R, where R is the radius of curvature of a field line. This curvature force acts toward the field line’s center of curvature (Fig. 17.3b) and is the magnetic-field-line analog of the force that acts on a curved wire or string under tension.

Fig. 17.3: Contributions to the electromagnetic force density acting on a conducting fluid in a nonuniform field. (a) A magnetic-pressure force density −(∇B²/2µ0)⊥ acting perpendicular to the magnetic field. (b) A magnetic curvature force density [(B · ∇)B/µ0]⊥, which is also perpendicular to the magnetic field and lies in the plane of the field’s bend, pointing toward its center of curvature; the magnitude of this curvature force density is B²/µ0R, where R is the radius of curvature.

Just as the magnetic force density dominates and the electric force is negligible [O(v²/c²)] in our nonrelativistic situation, so also the electromagnetic contribution to the energy density is predominantly due to the magnetic term UM = B²/2µ0, with negligible electric contribution. The electromagnetic energy flux is just the Poynting flux FM = E × B/µ0. Inserting these into the law of energy conservation (12.53) [and continuing to neglect viscosity], we obtain

$$\frac{\partial}{\partial t}\left[\left(\tfrac{1}{2}v^2 + u + \Phi\right)\rho + \frac{B^2}{2\mu_0}\right] + \nabla\cdot\left[\left(\tfrac{1}{2}v^2 + h + \Phi\right)\rho\mathbf v + \frac{\mathbf E\times\mathbf B}{\mu_0}\right] = 0. \tag{17.16}$$

When gravitational energy is important, we must augment this equation with the gravitational energy density and flux, as discussed in Box 12.2. As in Secs. 12.7.3 and 15.6.1, we can combine this energy conservation law with mass conservation and the first law of thermodynamics to obtain an equation for the evolution of entropy: Eqs. (12.70) and (12.71) are modified to read

$$\frac{\partial(\rho s)}{\partial t} + \nabla\cdot(\rho s\mathbf v) = \rho\frac{ds}{dt} = \frac{j^2}{\kappa_e T}. \tag{17.17}$$

Thus, just as viscosity increases entropy through viscous dissipation and thermal conductivity increases entropy through diffusive heat flow [Eqs. (12.70) and (12.71) and (15.86)], so also electrical conductivity increases entropy through Ohmic dissipation. From Eq. (17.17) we see that our fourth transport coefficient κe, like our previous three (the two coefficients of viscosity η ≡ ρν and ζ, and the thermal conductivity κ), is constrained to be positive by the second law of thermodynamics.
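Exercise 17.1(a) and the transversality of fm can be spot-checked numerically (a check at one point, not a proof). The sketch below assumes a hypothetical divergence-free test field, units with µ0 = 1, and central finite differences; all names are our own. It compares −∇ · TM from Eq. (17.11) with j × B = (∇ × B) × B/µ0 from Eq. (17.14), and it splits fm into the pressure and curvature terms of Eq. (17.15), whose components along B come out equal and opposite, so that fm · B = 0. The agreement of −∇ · TM with j × B relies on ∇ · B = 0; for a field with nonzero divergence the two differ by B(∇ · B)/µ0.

```python
import math

MU0 = 1.0  # units with mu0 = 1; every term below scales as 1/mu0

def B(p):
    """Hypothetical test field; each component is independent of its own
    coordinate, so div B = 0 exactly (the stress identity relies on this)."""
    x, y, z = p
    return (math.sin(y) + z**2, math.cos(z) + x * z, x**2 * y)

def shifted(p, i, h):
    q = list(p)
    q[i] += h
    return q

def jacobian(f, p, h=1e-5):
    """jac[i][j] = d f_j / d x_i by central differences."""
    return [[(a - b) / (2 * h) for a, b in zip(f(shifted(p, i, h)), f(shifted(p, i, -h)))]
            for i in range(3)]

def stress(p, j, k):
    """Cartesian components of Eq. (17.11): B^2 delta_jk / 2mu0 - B_j B_k / mu0."""
    b = B(p)
    b2 = sum(c * c for c in b)
    return (0.5 * b2 * (1.0 if j == k else 0.0) - b[j] * b[k]) / MU0

def div_stress(p, h=1e-5):
    """(div T_M)_k = sum_j d T_jk / d x_j."""
    return [sum((stress(shifted(p, j, h), j, k) - stress(shifted(p, j, -h), j, k)) / (2 * h)
                for j in range(3))
            for k in range(3)]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

p = (0.3, -0.7, 1.1)
b = B(p)
jac = jacobian(B, p)
curl_B = (jac[1][2] - jac[2][1], jac[2][0] - jac[0][2], jac[0][1] - jac[1][0])

j_cross_B = [c / MU0 for c in cross(curl_B, b)]                 # Eq. (17.14)
minus_div_T = [-c for c in div_stress(p)]                        # -div T_M
pressure = [-dot(b, jac[i]) / MU0 for i in range(3)]             # -grad(B^2 / 2mu0)
curvature = [dot(b, [jac[i][j] for i in range(3)]) / MU0         # (B . grad) B / mu0
             for j in range(3)]

print("max |-div T_M - j x B| :", max(abs(u - v) for u, v in zip(minus_div_T, j_cross_B)))
print("parallel parts         :", dot(pressure, b), dot(curvature, b))
print("(pressure+curvature).B :", dot([u + v for u, v in zip(pressure, curvature)], b))
```

Because the pressure and curvature terms are built from the same finite-difference Jacobian as j × B, their sum reproduces j × B to round-off, whereas the comparison with −∇ · TM is limited by the finite-difference step.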

17.2.3 Boundary Conditions

The equations of MHD must be supplemented by boundary conditions at two different types of interfaces. The first is a contact discontinuity, i.e., the interface between two distinct media that do not mix; for example, the surface of a liquid metal or a rigid wall of a plasma containment device. The second is a shock front that is being crossed by the fluid; here the boundary is between shocked and unshocked fluid. We can derive the boundary conditions by transforming into a primed frame in which the interface is instantaneously at rest (not to be confused with the fluid’s local rest frame) and then transforming back into our original unprimed inertial frame. In the primed frame, we resolve the velocity, magnetic-field and electric-field vectors into components normal and tangential to the surface. If n is a unit vector normal to the surface, then the normal and tangential components of velocity in either frame are

$$v_n = \mathbf n\cdot\mathbf v, \qquad \mathbf v_t = \mathbf v - (\mathbf n\cdot\mathbf v)\,\mathbf n, \tag{17.18}$$

and similarly for E and B. At a contact discontinuity,

$$v_n' = v_n - v_{sn} = 0 \tag{17.19}$$

on both sides of the interface surface; here vsn is the normal velocity of the surface. At a shock front, mass flux across the surface is conserved [cf. Eq. (16.40a)]:

$$[\rho v_n'] = [\rho(v_n - v_{sn})] = 0. \tag{17.20}$$

Here, as in Chap. 16, we use the notation [X] to signify the difference in some quantity X across the interface. When we consider the magnetic field, it does not matter which frame we use, since B is unchanged to the Galilean order at which we are working. Let us construct a thin “pill box” V (Fig. 17.4), integrate the equation ∇ · B = 0 over its volume, invoke the divergence theorem, and let the box thickness diminish to zero; thereby we see that

$$[B_n] = 0. \tag{17.21}$$

Fig. 17.4: (a) Elementary pill box V and (b) elementary circuit C used in deriving the MHD junction conditions at a surface S.

By contrast, the tangential component of the magnetic field will generally be discontinuous across an interface because of the presence of surface currents. We deduce the boundary condition on the electric field by integrating Maxwell’s equation ∇ × E = −∂B/∂t over the area bounded by the circuit C in Fig. 17.4, using Stokes’ theorem, and letting the two short legs of the circuit vanish. We thereby obtain

$$[\mathbf E'_t] = [\mathbf E_t] + [(\mathbf v_s\times\mathbf B)_t] = 0, \tag{17.22}$$

where vs is the velocity of a frame that moves with the surface. Note that only the normal component of the velocity contributes to this expression, so we can replace vs by vsn n. The normal component of the electric field, like the tangential component of the magnetic field, will generally be discontinuous as there may be surface charge at the interface. There are also dynamical boundary conditions that can be deduced by integrating the laws of momentum conservation (17.12) and energy conservation (17.16) over the pill box and using Gauss’s theorem to convert the volume integral of a divergence to a surface integral. The results, naturally, are the requirements that the normal fluxes of momentum T · n and energy F · n be continuous across the surface [T being the total stress, i.e., the quantity inside the divergence in Eq. (17.12) and F the total energy flux, i.e., the quantity inside the divergence in Eq. (17.16)]; see Eqs. (16.41) and (16.42) and associated discussion. The normal and tangential components of [T · n] = 0 read

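$$\left[P + \rho v_n^2 + \frac{B_t^2}{2\mu_0}\right] = 0, \tag{17.23}$$

$$\left[\rho v_n\mathbf v_t - \frac{B_n\mathbf B_t}{\mu_0}\right] = 0, \tag{17.24}$$

where we have omitted the gravitational stress, since it will always be continuous in situations studied in this chapter (no surface layers of mass). Similarly, continuity of the energy flux [F · n] = 0 reads

$$\left[\left(\tfrac{1}{2}v^2 + h + \Phi\right)\rho\,(v_n - v_{sn}) + \frac{\mathbf n\cdot\{(\mathbf E + \mathbf v_s\times\mathbf B)\times\mathbf B\}}{\mu_0}\right] = 0. \tag{17.25}$$

17.2.4 Magnetic field and vorticity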

We have already remarked on how the magnetic field and the vorticity are both axial vectors that can be written as the curl of a polar vector, and that they satisfy similar transport equations. It is not surprising that they are physically intimately related. To explore this relationship in full detail would take us beyond the scope of this book. However, we can illustrate their interaction by showing how they can create and destroy each other.

First, consider a simple vortex threaded at time t = 0 with a uniform magnetic field. If the magnetic Reynolds number is large enough, then the magnetic field will be carried along with the flow and wound up like spaghetti on the end of a fork (Fig. 17.5a). This will increase the magnetic energy in the vortex, though not the mean flux of magnetic field. This amplification will continue until either the field gradient is large enough that the field decays through Ohmic dissipation, or the field strength is large enough to react back on the flow and stop it spinning.

Second, consider an irrotational flow containing a tangled magnetic field. Provided that the magnetic Reynolds number is again sufficiently large, the magnetic stress will act on the flow and induce vorticity (Fig. 17.5b).

Fig. 17.5: (a) Amplification of the strength of a magnetic field by vortical motion. When RM ≫ 1, the magnetic field will be frozen into the rotating fluid and will be wound up so as to increase its strength. (b) When a tangled magnetic field is frozen into an irrotational flow, it will generally create vorticity.

We can describe this formally by taking the curl of the equation of motion, Eq. (17.10). (For simplicity, we assume that the density ρ is constant and the electric conductivity is infinite.) We then obtain

$$\frac{\partial\boldsymbol\omega}{\partial t} - \nabla\times(\mathbf v\times\boldsymbol\omega) = \frac{\nabla\times[(\nabla\times\mathbf B)\times\mathbf B]}{\mu_0\rho}. \tag{17.26}$$

The term on the right-hand side of this equation changes the number of vortex lines threading the fluid, just like the ∇P × ∇ρ/ρ² term on the right-hand side of Eq. (13.3). Note, though, that since the divergence of the vorticity is zero, any fresh vortex lines that are made must be created as continuous curves that grow out of points or lines where the vorticity vanishes.

****************************
EXERCISES

Exercise 17.1 Derivation: Basic Equations of MHD
(a) Verify that −∇ · TM = j × B, where TM is the magnetic stress tensor (17.11).
(b) Take the scalar product of the fluid velocity v with the equation of motion (17.10) and combine with mass conservation to obtain the energy conservation equation (17.16).
(c) Combine energy conservation (17.16) with the first law of thermodynamics and mass conservation to obtain Eq. (17.17) for the evolution of the entropy.

Exercise 17.2 Problem: Diffusion of Magnetic Field
Consider an infinite cylinder of plasma with constant electric conductivity, surrounded by vacuum. Assume that the cylinder initially is magnetized uniformly parallel to its length, and assume that the field decays quickly enough that the plasma’s inertia keeps it from moving much during the decay (so v ≃ 0).
(a) Show that the reduction of magnetic energy as the field decays is compensated by the Ohmic heating of the plasma plus energy lost to outgoing electromagnetic waves (which will be negligible if the decay is slow).
(b) Compute the approximate magnetic profile after the field has decayed to a small fraction of its original value. Your answer should be expressible in terms of a Bessel function.

Exercise 17.3 Problem: The Earth’s Bow Shock
The solar wind is a supersonic, hydromagnetic flow of plasma originating in the solar corona. At the radius of the earth’s orbit, the density is ρ ∼ 6 × 10⁻²¹ kg m⁻³, the velocity is v ∼ 400 km s⁻¹, the temperature is T ∼ 10⁵ K and the magnetic field strength is B ∼ 1 nT.
(a) By balancing the momentum flux with the magnetic pressure exerted by the earth’s dipole magnetic field, estimate the radius above the earth at which the solar wind passes through a bow shock.
(b) Consider a strong perpendicular shock at which the magnetic field is parallel to the shock front. Show that the magnetic field strength will increase by the same ratio as the density on crossing the shock front. Do you expect the compression to increase or decrease as the strength of the field is increased, keeping all of the other flow variables constant?
****************************
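The junction conditions of Sec. 17.2.3 lend themselves to a direct numerical check. The sketch below (Python; the class and function names are our own, hypothetical choices) works in the rest frame of the surface, so vsn = 0 and the vs × B terms in Eqs. (17.22) and (17.25) drop out. It also assumes a two-dimensional geometry in the (n, t) plane, an ideal gas with enthalpy h = γP/[(γ − 1)ρ], no gravity, and units with µ0 = 1. It returns the jump [X] of each conserved normal flux; for a physically admissible pair of states, every entry should vanish to round-off.

```python
from dataclasses import dataclass

@dataclass
class Side:
    """State on one side of the surface, in the surface rest frame (v_sn = 0).
    n = normal, t = one tangential direction; geometry assumed 2-D in (n, t)."""
    rho: float
    P: float
    vn: float
    vt: float
    Bn: float
    Bt: float

def normal_fluxes(s: Side, gamma: float, mu0: float) -> dict:
    """Quantities whose jumps vanish by Eqs. (17.20)-(17.25), assuming an ideal
    gas, h = gamma P / ((gamma - 1) rho), and no gravity (Phi dropped)."""
    h = gamma * s.P / ((gamma - 1.0) * s.rho)
    v2 = s.vn**2 + s.vt**2
    B2 = s.Bn**2 + s.Bt**2
    v_dot_B = s.vn * s.Bn + s.vt * s.Bt
    return {
        "mass (17.20)": s.rho * s.vn,
        "B_n (17.21)": s.Bn,
        # E = -v x B points out of the (n, t) plane, i.e. it is tangential.
        "E_t (17.22)": s.vn * s.Bt - s.vt * s.Bn,
        "momentum_n (17.23)": s.P + s.rho * s.vn**2 + s.Bt**2 / (2.0 * mu0),
        "momentum_t (17.24)": s.rho * s.vn * s.vt - s.Bn * s.Bt / mu0,
        # n . (E x B) = v_n B^2 - B_n (v . B) when E = -v x B.
        "energy (17.25)": (0.5 * v2 + h) * s.rho * s.vn + (s.vn * B2 - s.Bn * v_dot_B) / mu0,
    }

def jumps(up: Side, down: Side, gamma: float = 5.0 / 3.0, mu0: float = 1.0) -> dict:
    """[X] = X_down - X_up for every conserved normal flux."""
    fu, fd = normal_fluxes(up, gamma, mu0), normal_fluxes(down, gamma, mu0)
    return {key: fd[key] - fu[key] for key in fu}

# Usage: an unmagnetized normal shock of Mach number M, built from the standard
# Rankine-Hugoniot density and pressure ratios of ordinary gas dynamics.
gamma, M = 5.0 / 3.0, 2.0
up = Side(rho=1.0, P=1.0, vn=M * gamma**0.5, vt=0.0, Bn=0.0, Bt=0.0)
r = (gamma + 1) * M**2 / ((gamma - 1) * M**2 + 2)
P2 = (2 * gamma * M**2 - (gamma - 1)) / (gamma + 1)
down = Side(rho=r, P=P2, vn=up.vn / r, vt=0.0, Bn=0.0, Bt=0.0)
for key, value in jumps(up, down, gamma).items():
    print(f"{key:20s} {value: .2e}")
```

A magnetized candidate downstream state can be tested in exactly the same way, for instance while working Exercise 17.3(b); any nonzero entry identifies which conservation law the candidate violates.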

17.3 Magnetostatic Equilibria

17.3.1 Controlled thermonuclear fusion

For a half century, plasma physicists have striven to release nuclear energy in a controlled manner by confining plasma at a temperature in excess of a hundred million degrees using strong magnetic fields. In the most widely studied scheme, deuterium and tritium combine according to the reaction

$$d + t \rightarrow \alpha + n + 17.6\ \mathrm{MeV}. \tag{17.27}$$

The fast neutrons can be absorbed in a surrounding blanket of lithium, and the heat can then be used to drive a generator. At first this task seemed quite simple. However, it eventually became clear that it is very difficult to confine hot plasma with a magnetic field, because most confinement geometries are unstable. In this book we shall restrict our attention to a few simple confinement devices, emphasizing the one that is the basis of most modern efforts, the Tokamak. (Tokamaks were originally developed in the Soviet Union and the word is derived from a Russian abbreviation for toroidal magnetic field.) In this section we shall only treat equilibrium configurations; in Sec. 17.5, we shall consider their stability. In our discussions of both equilibrium and stability, we shall treat the plasma in the MHD approximation. At first this might seem rather unrealistic, because we are dealing with a dilute gas of ions and electrons that undergo infrequent Coulomb collisions. However, as we shall discuss in detail in Part V, collective effects produce a sufficiently high effective collision frequency to make the plasma behave like a fluid, so MHD is usually a good approximation for describing these equilibria and their rather slow temporal evolution.

Let us examine some numbers that characterize the regime in which a successful controlled-fusion device must operate. The ratio of plasma pressure to magnetic pressure

$$\beta \equiv \frac{P}{B^2/2\mu_0} \tag{17.28}$$

plays a key role. For the magnetic field to have any chance of confining the plasma, its pressure must exceed that of the plasma; i.e., β must be less than one. In fact, the most successful designs achieve β ∼ 0.2. The largest field strengths that can be safely sustained in the laboratory are B ∼ 10 T (1 T = 10 kG), and so β ≲ 0.2 limits the gas pressure to P ≲ 10⁷ Pa ∼ 100 atmospheres. Plasma fusion can only be economically feasible if more power is released by nuclear reactions than is lost to radiative cooling. Both heating and cooling are ∝ n². However, while the radiative cooling rate increases comparatively slowly with temperature, the nuclear reaction rate increases very rapidly. As the mean energy of the ions increases, the number of ions in the Maxwellian tail of the distribution function that are energetic enough to penetrate the Coulomb barrier will increase exponentially. This means that, if the rate of heat production exceeds the cooling rate by a modest factor, then the temperature has a value essentially fixed by atomic and nuclear physics. In the case of a d-t plasma this is T ∼ 10⁸ K. The maximum hydrogen density that can be confined is therefore n = P/2kT ∼ 3 × 10²¹ m⁻³. Now, if a volume V of plasma is confined at a given density n and temperature T for a time τ, then the amount of nuclear energy generated will be proportional to n²Vτ, while the energy needed to heat the plasma up to T is ∝ nV. Therefore, there is a minimum value of the product nτ that must be attained before there will be net energy production.

This condition is known as the Lawson criterion. Numerically, the plasma must be confined for ∼ (n/10²⁰ m⁻³)⁻¹ s, typically ∼ 30 ms. Now, the sound speed at these temperatures is ∼ 3 × 10⁵ m s⁻¹, and so an unconfined plasma would hit the few-meter-sized walls of the vessel in which it is held in a few µs. Therefore, the magnetic confinement must be effective for typically 10⁴–10⁵ dynamical timescales (sound-crossing times). The plasma must be confined, and confined well, if we are to build a viable reactor.
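The chain of estimates in this subsection can be reproduced in a few lines. This is a sketch in SI units: B = 10 T, β = 0.2, T = 10⁸ K, the sound speed 3 × 10⁵ m s⁻¹ and the rule of thumb τ ∼ (n/10²⁰ m⁻³)⁻¹ s are taken from the text, while the 1 m plasma-to-wall distance is our own assumption.

```python
import math

MU0 = 4e-7 * math.pi   # vacuum permeability, H/m
K_B = 1.380649e-23     # Boltzmann constant, J/K

# Inputs quoted in the text
B = 10.0      # T, largest safely sustained laboratory field
beta = 0.2    # plasma-to-magnetic pressure ratio, Eq. (17.28)
T = 1e8       # K, d-t operating temperature
c_s = 3e5     # m/s, sound speed at that temperature
# Assumption (ours): distance from the plasma to the wall
L_wall = 1.0  # m

P_mag = B**2 / (2 * MU0)   # magnetic pressure, Pa
P = beta * P_mag           # largest confinable gas pressure, Pa
n = P / (2 * K_B * T)      # ion density, from P = 2 n k T (ions plus electrons)
tau = (n / 1e20) ** -1     # s, the text's rule of thumb for the Lawson time
t_cross = L_wall / c_s     # sound-crossing (dynamical) time, s

print(f"P_mag = {P_mag:.1e} Pa, P = {P:.1e} Pa = {P / 101325:.0f} atm")
print(f"n = {n:.1e} m^-3, tau ~ {tau * 1e3:.0f} ms")
print(f"tau / t_cross ~ {tau / t_cross:.0e}")
```

With these inputs the confinement time comes out at a few tens of milliseconds, about 10⁴ sound-crossing times, in line with the estimates above.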

17.3.2 Z-Pinch

Before discussing Tokamaks, let us begin by describing a simpler confinement geometry known as the Z-pinch [Fig. 17.6(a)]. In a Z-pinch, electric current is induced to flow along a cylinder of plasma. This creates a toroidal magnetic field whose tension prevents the plasma from expanding radially, much like hoops on a barrel prevent it from exploding. Let us assume that the cylinder has a radius R and is surrounded by vacuum. Now, in static equilibrium we must balance the plasma pressure gradient by a Lorentz force:

$$\nabla P = \mathbf j\times\mathbf B. \tag{17.29}$$

(Gravitational forces can safely be ignored.) Equation (17.29) implies immediately that B · ∇P = j · ∇P = 0: both the magnetic field and the current density lie on constant-pressure (or isobaric) surfaces. An equivalent version of the force-balance equation (17.29), obtained using Eq. (17.15), says

$$\frac{d}{d\varpi}\left(P + \frac{B^2}{2\mu_0}\right) = -\frac{B^2}{\mu_0\varpi}, \tag{17.30}$$

where ϖ is the radial cylindrical coordinate. This exhibits the balance between the gradient of plasma and magnetic pressure on the left, and the magnetic tension on the right. Treating this as a differential equation for B² and integrating it, assuming that P falls to zero at the surface of the column, we obtain for the surface magnetic field

$$B^2(R) = \frac{4\mu_0}{R^2}\int_0^R P\,\varpi\,d\varpi. \tag{17.31}$$

We can re-express the surface toroidal field in terms of the total current flowing along the plasma as B(R) = µ0I/2πR (Ampère’s law); and, assuming that the plasma is primarily hydrogen, so its ion density n and electron density are equal, we can write the pressure as P = 2nkBT. Inserting these into Eq. (17.31), taking T to be uniform across the column so that the integral reduces to kBT N/π with N = ∫₀ᴿ 2πϖ n dϖ, and solving for the current, we obtain

$$I = \left(\frac{16\pi N k_B T}{\mu_0}\right)^{1/2}, \tag{17.32}$$

where N is the number of ions per unit length. For a 1 m column of plasma with hydrogen density n ∼ 10²⁰ m⁻³ and temperature T ∼ 10⁸ K, this says that currents of several MA are required for confinement.
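Eq. (17.32), commonly known as the Bennett relation, can be evaluated directly, and it can also be cross-checked by integrating Eq. (17.31) numerically for a nonuniform column. The sketch below does both. It reads “a 1 m column” as a column of radius R = 1 m and uses a hypothetical parabolic density profile at uniform temperature; both choices are our assumptions. For uniform T, the required current depends on the column only through its line density N.

```python
import math

MU0 = 4e-7 * math.pi   # vacuum permeability, H/m
K_B = 1.380649e-23     # Boltzmann constant, J/K

def current_for_line_density(N: float, T: float) -> float:
    """Eq. (17.32): I = (16 pi N k_B T / mu0)^(1/2), N = ions per unit length."""
    return math.sqrt(16.0 * math.pi * N * K_B * T / MU0)

def current_from_profile(n0: float, T: float, R: float, steps: int = 2000) -> float:
    """Integrate Eq. (17.31) by the trapezoidal rule for a hypothetical
    parabolic profile n = n0 (1 - w^2/R^2) at uniform T (P = 2 n k_B T),
    then convert B(R) to a current with Ampere's law, I = 2 pi R B(R) / mu0."""
    h = R / steps
    ws = [i * h for i in range(steps + 1)]
    integrand = [2.0 * n0 * (1.0 - w**2 / R**2) * K_B * T * w for w in ws]
    integral = h * (sum(integrand) - 0.5 * (integrand[0] + integrand[-1]))
    B_R = math.sqrt(4.0 * MU0 * integral / R**2)
    return 2.0 * math.pi * R * B_R / MU0

R, n0, T = 1.0, 1e20, 1e8   # assumed radius, density and temperature
I_uniform = current_for_line_density(n0 * math.pi * R**2, T)
I_parabolic = current_from_profile(n0, T, R)
# The parabolic profile has line density N = pi n0 R^2 / 2.
I_check = current_for_line_density(0.5 * math.pi * n0 * R**2, T)

print(f"uniform column: I ~ {I_uniform / 1e6:.1f} MA")
print(f"parabolic profile: {I_parabolic / 1e6:.3f} MA vs Eq. (17.32): {I_check / 1e6:.3f} MA")
```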

Fig. 17.6: (a) The Z-pinch. (b) The θ-pinch. (c) The Tokamak.

17.3.3 θ-Pinch

There is a complementary equilibrium for a cylindrical plasma in which the magnetic field lies parallel to the axis and the current density encircles the cylinder [Fig. 17.6(b)]. This is called the θ-pinch. This configuration is usually established by making a cylindrical metal tube with a small gap so that current can flow around it, as shown in the figure. The tube is filled with cold plasma and then the current is turned on quickly, producing a quickly growing longitudinal field inside the tube (as inside a solenoid). Since the plasma is highly conducting, the field lines cannot penetrate the plasma column but instead exert a stress on its surface, causing it to shrink radially and rapidly. The plasma heats up due to both the radial work done on it and Ohmic heating. Equilibrium is established when the magnetic pressure B²/2µ0 at the plasma’s surface balances its internal pressure.

17.3.4 Tokamak

One of the problems with these pinches (and we shall find others below) is that they have ends through which plasma can escape. This is readily cured by replacing the cylinder with a torus. It turns out that the most stable geometry, called the Tokamak, combines features of both Z- and θ-pinches; see Fig. 17.6(c). If we introduce spherical coordinates (r, θ, φ), then magnetic field lines and currents that lie in an r, θ plane (orthogonal to eφ) are called poloidal, whereas φ components are called toroidal. In a Tokamak, the toroidal field is created by external poloidal current windings. However, the poloidal field is mostly created as a consequence of toroidal current induced to flow within the plasma torus. The resulting net field lines wrap around the plasma torus in a helical manner, defining a magnetic surface on which the pressure is constant. The average value of 2π dθ/dφ along the trajectory of a field line is called the rotational transform, i, and is a property of the magnetic surface on which the field line resides. If i/2π is a rational number, then the field line will close after a finite number of circuits. However, in general, i/2π will not be rational, so a single field line will cover the whole magnetic surface ergodically. This allows the plasma to spread over the whole surface rapidly. The rotational transform is a measure of the toroidal current flowing within the magnetic surface and of course increases as we move outwards from the innermost magnetic surface, while the pressure decreases.

The best performance to date was registered by the Tokamak Fusion Test Reactor (TFTR) in Princeton in 1994 (see Strachan et al. 1994). The radius of the torus was ∼ 2.5 m and the magnetic field strength B ∼ 5 T. A nuclear power of ∼ 10 MW was produced with ∼ 40 MW of heating. The actual confinement time approached τ ∼ 1 s.
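The contrast between rational and irrational rotational transform can be made concrete by following the punctures of a field line through one poloidal cross-section. The sketch below assumes angle coordinates in which field lines are straight, so that each toroidal circuit advances the poloidal angle by exactly i (our simplification); the function names and the golden-ratio test value are illustrative. A rational i/2π = p/q closes after q circuits, while for an irrational value the largest gap between punctures keeps shrinking, i.e., the line fills the surface.

```python
import math
from fractions import Fraction

def circuits_to_close(iota_over_2pi: Fraction, max_circuits: int = 10_000):
    """Smallest number of toroidal circuits after which a field line with
    rational iota/2pi returns to its starting poloidal angle (None if none found)."""
    for k in range(1, max_circuits + 1):
        if (k * iota_over_2pi).denominator == 1:  # k * iota/2pi is an integer
            return k
    return None

def largest_gap(iota_over_2pi: float, circuits: int) -> float:
    """Largest angular gap, as a fraction of 2*pi, between successive punctures
    of one poloidal cross-section after `circuits` toroidal turns."""
    angles = sorted((k * iota_over_2pi) % 1.0 for k in range(circuits))
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(1.0 - angles[-1] + angles[0])  # wrap-around gap
    return max(gaps)

print(circuits_to_close(Fraction(2, 5)))  # 5: the line closes on itself
golden = (math.sqrt(5.0) - 1.0) / 2.0     # a strongly irrational value (illustrative)
for n_turns in (10, 100, 1000):
    print(n_turns, f"{largest_gap(golden, n_turns):.4f}")  # shrinks as turns accumulate
```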
The next major step is ITER (whose name means “the way” in Latin): a tokamak-based experimental fusion reactor being developed by a large international consortium; see http://www.iter.org/ . Its tokamak will be about twice as large in linear dimensions as TFTR, and its goal is a fusion power output of 410 MW, roughly forty times the ∼ 10 MW produced by TFTR. Even when “break-even” with large power output can be attained routinely, there will remain major engineering problems before controlled fusion will be fully practical.
****************************

Fig. 17.7: Hartmann flow with speed v along a duct of thickness 2a, perpendicular to an applied magnetic field of strength B0. The short side walls are conducting and the two long horizontal walls are electrically insulating. The axes x, y, z and the applied electric field E0 are also indicated.

EXERCISES

Exercise 17.4 Problem: Strength of Magnetic Field in a Magnetic Confinement Device
The currents that are sources for strong magnetic fields have to be held in place by solid conductors. Estimate the limiting field that can be sustained using normal construction materials.

Exercise 17.5 Problem: Force-free Equilibria
In an equilibrium state of a very low-β plasma, the pressure forces are ignorably small and so the Lorentz force j × B must vanish; such a plasma is said to be “force-free”. This, in turn, implies that the current density is parallel to the magnetic field, so ∇ × B = αB. Show that α must be constant along a field line, and that if the field lines eventually travel everywhere, then α must be constant everywhere.
****************************

17.4 Hydromagnetic Flows

Now let us consider a simple stationary flow: the flow of an electrically conducting fluid along a duct of constant cross-section perpendicular to a uniform magnetic field B0 (see Fig. 17.7). This is sometimes known as Hartmann flow. The duct has two insulating walls (top and bottom as shown in the figure), separated by a distance 2a that is much smaller than the separation of the short side walls, which are electrically conducting. In order to relate Hartmann flow to magnetic-free Poiseuille flow (viscous, laminar flow between plates), we shall reinstate the viscous force in the equation of motion.

Fig. 17.8: Four variations on Hartmann flow. (a) Electromagnetic brake. (b) MHD power generator. (c) Flow meter. (d) Electromagnetic pump.

For simplicity we shall assume that the time-independent flow (∂v/∂t = 0) has travelled sufficiently far down the duct (x direction) to have reached an x-independent form, so v · ∇v = 0 and v = v(y, z); and we assume that gravitational forces are unimportant. Then the flow’s equation of motion takes the form

$$\nabla P = \mathbf j\times\mathbf B + \eta\nabla^2\mathbf v, \tag{17.33}$$

where η = ρν is the coefficient of dynamical viscosity. The magnetic (Lorentz) force j × B will alter the balance between the Poiseuille flow’s viscous force η∇²v and the pressure gradient ∇P. The details of that altered balance and the resulting magnetically influenced flow will depend on how the walls are connected electrically. Let us consider four possibilities that bring out the essential physics:

Electromagnetic Brake; Fig. 17.8a. We short-circuit the electrodes so a current j can flow. The magnetic field lines are partially dragged by the fluid, bending them (as embodied in ∇ × B = µ0j) so they can exert a decelerating tension force on the flow (Fig. 17.3b): along the flow direction x, j × B = (∇ × B) × B/µ0 reduces to (B · ∇)B/µ0, because the magnetic pressure B²/2µ0 does not vary with x. This is an electromagnetic brake. The pressure gradient, which is trying to accelerate the fluid, is balanced by the magnetic tension. The work being done (per unit volume) by the pressure gradient, v · (−∇P), is converted into heat through viscous and Ohmic dissipation.

MHD Power Generator; Fig. 17.8b. This is similar to the electromagnetic brake except that an external load is added to the circuit. Useful power can be extracted from the flow. This may ultimately be practical in power stations, where a flowing, conducting fluid can generate electricity directly without having to drive a turbine.

Flow Meter; Fig. 17.8c. When the electrodes are on open circuit, the induced electric field will produce a measurable potential difference across the duct. This voltage will increase monotonically with the rate of flow of fluid through the duct and therefore can provide a measurement of the flow.

Electromagnetic Pump; Figs. 17.7 and 17.8d. Finally, we can attach a battery to the electrodes and allow a current to flow. This produces a Lorentz force which either accelerates or decelerates the flow, depending on the direction of the magnetic field. This method is used to pump liquid sodium coolant around a nuclear reactor. It has also been proposed as a means of spacecraft propulsion in interplanetary space.

We consider in some detail two limiting cases of the electromagnetic pump. When there is a constant pressure gradient Q = −dP/dx but no magnetic field, a flow with modest Reynolds number will be approximately laminar with velocity profile

$$v_x(z) = \frac{Q a^2}{2\eta}\left[1 - \left(\frac{z}{a}\right)^2\right], \tag{17.34}$$

where a is the half width of the channel. This is the one-dimensional version of the “Poiseuille flow” in a pipe such as a blood vessel, which we studied in Sec. 12.7.6; cf. Eq. (12.76). Now suppose that uniform electric and magnetic fields E0, B0 are applied along the ey and ez directions respectively (Fig. 17.7). The resulting magnetic force j × B can either reinforce or oppose the fluid’s motion. When the applied magnetic field is small, B0 ≪ E0/vx, the effect of the magnetic force will be very similar to that of the pressure gradient, and Eq. (17.34) must be modified by replacing Q ≡ −dP/dx by −dP/dx + jyBz = −dP/dx + κeE0B0. [Here jy = κe(Ey − vxBz) ≃ κeE0.] If the strength of the magnetic field is increased sufficiently, then the magnetic force will dominate the viscous force, except in thin boundary layers near the walls. Outside the boundary layers, in the bulk of the flow, the velocity will adjust so that the electric field vanishes in the rest frame of the fluid, i.e., vx = E0/B0.
In the boundary layers there will be a sharp drop of vx from E0/B0 to zero at the walls, and correspondingly a strong viscous force, η∇²v. Since the pressure gradient ∇P must be essentially the same in the boundary layer as in the adjacent bulk flow, and thus cannot balance this large viscous force, it must be balanced instead by the magnetic force, j × B + η∇²v = 0 [Eq. (17.33)], with j = κe(E + v × B) ∼ κe vx B0 ey. We thereby see that the thickness of the boundary layer will be given by

$$\delta_H \sim \left(\frac{\eta}{\kappa_e B_0^2}\right)^{1/2}. \tag{17.35}$$

This suggests a new dimensionless number to characterize the flow,

$$H = \frac{a}{\delta_H} = B_0 a\left(\frac{\kappa_e}{\eta}\right)^{1/2}, \tag{17.36}$$

called the Hartmann number. H² is essentially the ratio of the magnetic force |j × B| ∼ κe vx B0² to the viscous force ∼ ηvx/a², assuming a lengthscale a rather than δH for variations of the velocity.
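To attach numbers to Eqs. (17.35) and (17.36), the sketch below evaluates the Hartmann number and the boundary-layer thickness. The inputs are our illustrative assumptions, not values from the text: rough, mercury-like properties (κe ≈ 10⁶ S m⁻¹, η ≈ 1.5 × 10⁻³ Pa s) in a duct of half-width a = 1 cm threaded by a 1 T field. For such a duct H is a few hundred, so the Hartmann layers are only tens of micrometres thick.

```python
import math

def hartmann_number(B0: float, a: float, kappa_e: float, eta: float) -> float:
    """H = B0 a (kappa_e / eta)^(1/2), Eq. (17.36)."""
    return B0 * a * math.sqrt(kappa_e / eta)

def hartmann_layer(B0: float, kappa_e: float, eta: float) -> float:
    """Boundary-layer thickness delta_H ~ (eta / (kappa_e B0^2))^(1/2), Eq. (17.35)."""
    return math.sqrt(eta / (kappa_e * B0**2))

# Illustrative, roughly mercury-like inputs (assumed):
kappa_e = 1.0e6   # electric conductivity, S/m
eta = 1.5e-3      # dynamical viscosity, Pa s
B0 = 1.0          # applied field, T
a = 0.01          # duct half-width, m

H = hartmann_number(B0, a, kappa_e, eta)
delta_H = hartmann_layer(B0, kappa_e, eta)
print(f"H ~ {H:.0f}; delta_H ~ {delta_H * 1e6:.0f} micrometres; a/delta_H = {a / delta_H:.0f}")
```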

Fig. 17.9: Velocity profiles vx(z)/vx(0), plotted against z/a from −1 to 1 [Eq. (17.38)], for flow in an electromagnetic pump of width 2a with small and large Hartmann number, scaled to the velocity at the center of the channel. Dashed curve: the almost parabolic profile for H = 0.1 [Eq. (17.34)]. Solid curve: the almost flat-topped profile for H = 10.

The detailed velocity profile vx(z) away from the vertical side walls is computed in Exercise 17.6 and is shown for low and high Hartmann numbers in Fig. 17.9. Notice that at low H the plotted profile is nearly parabolic, as expected, and at high H it consists of boundary layers at z ∼ −a and z ∼ a, and a uniform flow in between.

****************************
EXERCISES

Exercise 17.6 Example: Hartmann Flow
Compute the velocity profile of a conducting fluid in a duct of thickness 2a perpendicular to externally generated, uniform electric and magnetic fields (E0 ey and B0 ez) as shown in Fig. 17.7. Away from the vertical sides of the duct, the velocity vx is just a function of z and the pressure can be written in the form P = −Qx + p(z), where Q is the longitudinal pressure gradient.
(a) Show that the velocity field satisfies the differential equation

$$\frac{d^2 v_x}{dz^2} - \frac{\kappa_e B_0^2}{\eta}\,v_x = -\frac{Q + \kappa_e B_0 E_0}{\eta}. \tag{17.37}$$

(b) Impose suitable boundary conditions at the bottom and top walls of the channel and solve this differential equation to obtain the following velocity field:

$$v_x = \frac{Q + \kappa_e B_0 E_0}{\kappa_e B_0^2}\left[1 - \frac{\cosh(Hz/a)}{\cosh H}\right], \tag{17.38}$$

where H is the Hartmann number; cf. Fig. 17.9.

****************************
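The profile (17.38) and its two limits can be explored numerically. In the scaled variables ζ = z/a and u = 1 − cosh(Hζ)/cosh H (so that vx = [(Q + κeB0E0)/κeB0²] u), Eq. (17.37) becomes u″ − H²u = −H². The sketch below evaluates the normalized profile plotted in Fig. 17.9 for H = 0.1 and H = 10, and checks by a finite-difference residual that u satisfies the scaled equation; the function names and step size are our own choices.

```python
import math

def profile(zeta: float, H: float) -> float:
    """Scaled Eq. (17.38): u = 1 - cosh(H zeta)/cosh(H), with zeta = z/a.
    The dimensional velocity is v_x = (Q + kappa_e B0 E0)/(kappa_e B0^2) * u."""
    return 1.0 - math.cosh(H * zeta) / math.cosh(H)

def normalized(zeta: float, H: float) -> float:
    """v_x(z)/v_x(0), the quantity plotted in Fig. 17.9."""
    return profile(zeta, H) / profile(0.0, H)

def ode_residual(zeta: float, H: float, step: float = 1e-4) -> float:
    """Residual of the scaled Eq. (17.37), u'' - H^2 u + H^2, using a
    second-order central difference for u''; it should be close to zero."""
    def u(s: float) -> float:
        return profile(s, H)
    u_pp = (u(zeta + step) - 2.0 * u(zeta) + u(zeta - step)) / step**2
    return u_pp - H**2 * u(zeta) + H**2

for H in (0.1, 10.0):
    samples = [round(normalized(z, H), 3) for z in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    print(f"H = {H:>4}: {samples}")
```

At H = 0.1 the sampled values follow 1 − (z/a)², the Poiseuille shape of Eq. (17.34), while at H = 10 they stay close to 1 everywhere except right at the walls.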

17.5 Stability of Hydromagnetic Equilibria

Having used the MHD equation of motion to analyze some simple flows, let us return to the question of magnetic confinement and demonstrate a procedure to analyze the stability of hydromagnetic equilibria. We first perform a straightforward linear perturbation analysis about equilibrium, obtaining an eigenequation for the perturbation’s oscillation frequencies ω. For sufficiently simple equilibria, this eigenequation can be solved analytically, but most equilibria are too complex for this so the eigenequation must be solved numerically or by other approximation techniques. This is rather similar to the task we face in attempting to solve the Schrödinger equation for multi-electron atoms. It will not be a surprise to learn that variational methods are especially practical and useful, and we shall develop a suitable formalism. We shall develop the perturbation theory, eigenequation, and variational formalism in some detail not only because of their importance for the stability of hydromagnetic equilibria, but also because essentially the same techniques (with different equations) are used in studying the stability of other equilibria. One example is the oscillations and stability of stars, in which the magnetic field is unimportant while self gravity is crucial [see, e.g., Chap. 6 of Shapiro and Teukolsky (1983), and Sec. 15.2.4 of this book, on helioseismology]. Another example is the oscillations and stability of elastostatic equilibria, in which B is absent but shear stresses are important (see, e.g., Secs. 11.3 and 11.4).

17.5.1

Linear Perturbation Theory

Consider a perfectly conducting isentropic fluid at rest in equilibrium with pressure gradients that balance magnetic forces. For simplicity, we shall ignore gravity. (This is usually justified in laboratory situations.) The equation of equilibrium then reduces to
$$\nabla P = \mathbf j \times \mathbf B. \tag{17.39}$$

We now perturb slightly about this equilibrium and ignore the (usually negligible) effects of viscosity and magnetic-field diffusion, so $\eta = \rho\nu \simeq 0$, $\kappa_e \simeq \infty$. It is useful and conventional to describe the perturbations in terms of two different types of quantities: (i) The change in a quantity (e.g., the fluid density) moving with the fluid, which is called a Lagrangian perturbation and denoted by the symbol $\Delta$ (e.g., the Lagrangian density perturbation $\Delta\rho$). (ii) The change at fixed location in space, which is called an Eulerian perturbation and denoted by the symbol $\delta$ (e.g., the Eulerian density perturbation $\delta\rho$). The fundamental variable used in the theory is the fluid's Lagrangian displacement $\Delta\mathbf x \equiv \boldsymbol\xi(\mathbf x, t)$, i.e., the change in location of a fluid element, moving with the fluid. A fluid element whose location is $\mathbf x$ in the unperturbed equilibrium is moved to location $\mathbf x + \boldsymbol\xi(\mathbf x, t)$ by the perturbations. From their definitions, one can see that the Lagrangian and Eulerian perturbations are related by
$$\Delta = \delta + \boldsymbol\xi\cdot\nabla, \quad\text{e.g.,}\quad \Delta\rho = \delta\rho + \boldsymbol\xi\cdot\nabla\rho. \tag{17.40}$$

Now consider the transport law for the magnetic field, $\partial\mathbf B/\partial t = \nabla\times(\mathbf v\times\mathbf B)$ [Eq. (17.6)]. To linear order, the velocity is $\mathbf v = \partial\boldsymbol\xi/\partial t$. Inserting this into the transport law, and setting the full magnetic field at fixed $\mathbf x, t$ equal to the equilibrium field plus its Eulerian perturbation, $\mathbf B \to \mathbf B + \delta\mathbf B$, we obtain $\partial\delta\mathbf B/\partial t = \nabla\times[(\partial\boldsymbol\xi/\partial t)\times(\mathbf B + \delta\mathbf B)]$. Linearizing in the perturbation, and integrating in time, we obtain for the Eulerian perturbation of the magnetic field
$$\delta\mathbf B = \nabla\times(\boldsymbol\xi\times\mathbf B). \tag{17.41}$$

Since the current and the field are related, in general, by the linear equation $\mathbf j = \nabla\times\mathbf B/\mu_0$, their Eulerian perturbations are related in this same way:
$$\delta\mathbf j = \nabla\times\delta\mathbf B/\mu_0. \tag{17.42}$$
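A quick symbolic consistency check on Eqs. (17.41) and (17.42): because each is a curl, the perturbations $\delta\mathbf B$ and $\delta\mathbf j$ are automatically divergence-free for any displacement $\boldsymbol\xi$ and any equilibrium field $\mathbf B$, so the perturbed field continues to satisfy $\nabla\cdot\mathbf B = 0$. The sketch below verifies that both divergences vanish identically, using SymPy's `sympy.vector` module; the component functions `xi_x`, ..., `B_z` are hypothetical, unspecified functions of Cartesian coordinates, and $\mu_0$ is set to 1.

```python
import sympy as sp
from sympy.vector import CoordSys3D, curl, divergence

N = CoordSys3D("N")
coords = (N.x, N.y, N.z)


def arbitrary_field(name):
    """Vector field whose Cartesian components are unspecified functions of (x, y, z)."""
    fx, fy, fz = (sp.Function(f"{name}_{c}")(*coords) for c in "xyz")
    return fx * N.i + fy * N.j + fz * N.k


xi = arbitrary_field("xi")   # Lagrangian displacement
B = arbitrary_field("B")     # equilibrium magnetic field (not even assumed divergence-free)

dB = curl(xi.cross(B))       # Eq. (17.41)
dj = curl(dB)                # Eq. (17.42), with mu_0 set to 1

print(sp.simplify(divergence(dB)))
print(sp.simplify(divergence(dj)))
```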

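In the equation of mass conservation, $\partial\rho/\partial t + \nabla\cdot(\rho\mathbf v) = 0$, we replace the density by its equilibrium value plus its Eulerian perturbation, $\rho \to \rho + \delta\rho$, replace $\mathbf v$ by $\partial\boldsymbol\xi/\partial t$, linearize in the perturbation, and integrate in time to obtain
$$\delta\rho + \rho\nabla\cdot\boldsymbol\xi + \boldsymbol\xi\cdot\nabla\rho = 0. \tag{17.43}$$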

The Lagrangian density perturbation, obtained from this via Eq. (17.40), is
$$\Delta\rho = -\rho\nabla\cdot\boldsymbol\xi. \tag{17.44}$$

We assume that, as it moves, the fluid gets compressed or expanded adiabatically (no Ohmic or viscous heating, and no radiative cooling). Then the Lagrangian change of pressure $\Delta P$ in each fluid element (moving with the fluid) is related to the Lagrangian change of density by
$$\Delta P = \left(\frac{\partial P}{\partial\rho}\right)_s \Delta\rho = \frac{\gamma P}{\rho}\,\Delta\rho = -\gamma P\,\nabla\cdot\boldsymbol\xi, \tag{17.45}$$
where $\gamma$ is the fluid's adiabatic index (ratio of specific heats), which might or might not be independent of position in the equilibrium configuration. Correspondingly, the Eulerian perturbation of the pressure (perturbation at fixed location) is
$$\delta P = \Delta P - (\boldsymbol\xi\cdot\nabla)P = -\gamma P(\nabla\cdot\boldsymbol\xi) - (\boldsymbol\xi\cdot\nabla)P. \tag{17.46}$$

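This is the pressure perturbation that appears in the fluid's equation of motion. By replacing $\mathbf v \to \partial\boldsymbol\xi/\partial t$, $P \to P + \delta P$, $\mathbf B \to \mathbf B + \delta\mathbf B$, and $\mathbf j \to \mathbf j + \delta\mathbf j$ in the fluid's equation of motion (17.10), neglecting gravity, and then linearizing in the perturbation, we obtain
$$\rho\frac{\partial^2\boldsymbol\xi}{\partial t^2} = \mathbf j\times\delta\mathbf B + \delta\mathbf j\times\mathbf B - \nabla\delta P \equiv \hat{\mathbf F}[\boldsymbol\xi]. \tag{17.47}$$
Here $\hat{\mathbf F}[\boldsymbol\xi]$ is a real, linear differential operator, whose form one can deduce by substituting expressions (17.41), (17.42), and (17.46) for $\delta\mathbf B$, $\delta\mathbf j$, and $\delta P$, and $\nabla\times\mathbf B/\mu_0$ for $\mathbf j$. By performing those substitutions, using the equilibrium condition (17.39) and $\nabla\cdot\mathbf B = 0$, and carefully rearranging the terms, we eventually convert the operator $\hat{\mathbf F}$ into the following form, expressed in slot-naming index notation:
$$\hat F_i[\boldsymbol\xi] = \left[\left((\gamma-1)P + \frac{B^2}{2\mu_0}\right)\xi_{k;k} - \frac{B_jB_k}{\mu_0}\,\xi_{j;k}\right]_{;i} + \left[\left(P + \frac{B^2}{2\mu_0}\right)\xi_{j;i} - \frac{B_iB_j}{\mu_0}\,\xi_{k;k} + \frac{B_jB_k}{\mu_0}\,\xi_{i;k}\right]_{;j}. \tag{17.48}$$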

Honestly! Here the semicolons denote gradients (partial derivatives in Cartesian coordinates; connection coefficients are required in curvilinear coordinates). We write the operator $\hat F_i$ in the explicit form (17.48) because of its power for demonstrating that $\hat F_i$ is self-adjoint (Hermitian, with real variables rather than complex): By introducing the Kronecker-delta components of the metric, $g_{ij} = \delta_{ij}$, we can rewrite Eq. (17.48) in the form
$$\hat F_i[\boldsymbol\xi] = (T_{ijkl}\,\xi_{k;l})_{;j}, \tag{17.49}$$
where $T_{ijkl}$ are the components of a fourth-rank tensor that is symmetric under interchange of its first and second pairs of indices, $T_{ijkl} = T_{klij}$. It then should be evident that, when we integrate over the volume $\mathcal V$ of our hydromagnetic configuration, we obtain
$$\int_{\mathcal V}\boldsymbol\zeta\cdot\hat{\mathbf F}[\boldsymbol\xi]\,dV = \int_{\mathcal V}\zeta_i(T_{ijkl}\xi_{k;l})_{;j}\,dV = -\int_{\mathcal V}T_{ijkl}\,\zeta_{i;j}\,\xi_{k;l}\,dV = \int_{\mathcal V}\xi_i(T_{ijkl}\zeta_{k;l})_{;j}\,dV = \int_{\mathcal V}\boldsymbol\xi\cdot\hat{\mathbf F}[\boldsymbol\zeta]\,dV. \tag{17.50}$$
Here we have used Gauss's theorem (integration by parts), and to make the surface terms vanish we have required that $\boldsymbol\xi$ and $\boldsymbol\zeta$ be any two functions that vanish on the boundary of the configuration, $\partial\mathcal V$ [or, more generally, for which $T_{ijkl}\xi_{k;l}\zeta_i n_j$ and $T_{ijkl}\zeta_{k;l}\xi_i n_j$ vanish there, with $n_j$ the normal to the boundary]. Equation (17.50) demonstrates the self-adjointness (Hermiticity) of $\hat{\mathbf F}$. We shall use this below.

Returning to our perturbed MHD system, we seek its normal modes by assuming a harmonic time dependence, $\boldsymbol\xi \propto e^{-i\omega t}$. The first-order equation of motion then becomes
$$\hat{\mathbf F}[\boldsymbol\xi] + \rho\omega^2\boldsymbol\xi = 0. \tag{17.51}$$

This is an eigenequation for the fluid's Lagrangian displacement $\boldsymbol\xi$, with eigenvalue $\omega^2$. It must be augmented by boundary conditions at the edge of the fluid; see below. By virtue of the elegant, self-adjoint mathematical form (17.49) of the differential operator $\hat{\mathbf F}$, our eigenequation (17.51) is of a very special and powerful type, called Sturm-Liouville; see, e.g., Mathews and Walker (1970). From the general (rather simple) theory of Sturm-Liouville equations, we can infer that all the eigenvalues $\omega^2$ are real, so the normal modes are either purely oscillatory ($\omega^2 > 0$, $\boldsymbol\xi \propto e^{\pm i|\omega|t}$) or purely exponentially growing or decaying ($\omega^2 < 0$, $\boldsymbol\xi \propto e^{\pm|\omega|t}$). Exponentially growing modes represent instability. Sturm-Liouville theory also implies that all eigenfunctions [labeled by indices "$(n)$"] with different eigenfrequencies are orthogonal to each other, in the sense that $\int_{\mathcal V}\rho\,\boldsymbol\xi^{(n)}\cdot\boldsymbol\xi^{(m)}\,dV = 0$.

The boundary conditions, as always, are crucial. In the simplest case, the conducting fluid is supposed to extend far beyond the region where the disturbances are of appreciable amplitude. In this case we merely require that $|\boldsymbol\xi| \to 0$ as $|\mathbf x| \to \infty$. More realistically, the fluid might be enclosed within rigid walls, where the normal component of $\boldsymbol\xi$ vanishes. The most commonly encountered case, however, involves a conducting fluid surrounded by vacuum. No current will flow in the vacuum region, so $\nabla\times\delta\mathbf B = 0$ there. In this case, a suitable magnetic field perturbation in the vacuum region must be matched to the magnetic field derived from Eq. (17.41) for the perfect-MHD region using the junction conditions discussed in Sec. 17.2.3.
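The Sturm-Liouville properties can be seen in a discretized toy problem. The sketch below does not implement the MHD operator (17.48); it assumes a hypothetical one-dimensional analogue $\hat F[\xi] = (T(x)\,\xi')'$ with $\xi = 0$ at both ends, which has the same divergence-of-a-symmetric-tensor structure as Eq. (17.49). The profiles `T(x) > 0` and `rho(x) > 0` are arbitrary illustrative choices. A conservative finite-difference stencil makes the matrix for $\hat F$ symmetric (the discrete counterpart of Eq. (17.50)), and `scipy.linalg.eigh` then solves the generalized problem $-\hat F\xi = \omega^2\rho\,\xi$, which is Eq. (17.51) rearranged.

```python
import numpy as np
from scipy.linalg import eigh


def toy_operator(n=300, length=1.0):
    """Hypothetical 1D analogue of Eq. (17.49): F[xi] = d/dx (T dxi/dx), xi(0) = xi(L) = 0."""
    x = np.linspace(0.0, length, n + 2)
    h = x[1] - x[0]
    x_half = 0.5 * (x[:-1] + x[1:])                        # T is sampled midway between nodes
    T = 1.0 + 0.5 * np.sin(2.0 * np.pi * x_half / length)  # illustrative positive "stiffness"
    rho = 1.0 + x[1:-1] ** 2                                # illustrative positive density
    # Conservative stencil: [T_{i+1/2}(xi_{i+1} - xi_i) - T_{i-1/2}(xi_i - xi_{i-1})] / h**2
    F = (np.diag(-(T[:-1] + T[1:])) + np.diag(T[1:-1], k=1) + np.diag(T[1:-1], k=-1)) / h**2
    return x[1:-1], F, rho


x, F, rho = toy_operator()

# Discrete self-adjointness: zeta . F[xi] == xi . F[zeta] for arbitrary vectors
rng = np.random.default_rng(0)
xi, zeta = rng.standard_normal((2, x.size))
print(np.isclose(zeta @ F @ xi, xi @ F @ zeta))

# Eq. (17.51) as a generalized symmetric eigenproblem: (-F) xi = omega^2 diag(rho) xi
omega2, modes = eigh(-F, np.diag(rho))
print(omega2[:3])   # real; positive here because T > 0 makes -F positive definite
print(np.allclose(modes.T @ np.diag(rho) @ modes, np.eye(x.size)))  # rho-weighted orthogonality
```

Because the eigensolver normalizes the modes so that $\sum\rho\,\xi^{(m)}\xi^{(n)} = \delta_{mn}$, the last check is the discrete form of the orthogonality relation stated above.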

17.5.2

Z-Pinch; Sausage and Kink Instabilities

We illustrate MHD stability theory using a simple, analytically tractable example. We consider a long cylindrical column of a conducting, incompressible liquid such as mercury, with column radius $R$ and fluid density $\rho$. The column carries a current $I$ longitudinally along its surface, so $\mathbf j = (I/2\pi R)\,\delta(\varpi - R)\,\mathbf e_z$, and it is confined by the resulting external toroidal magnetic field $B_\phi \equiv B$. The interior of the column is field-free and at constant pressure $P_0$. From $\nabla\times\mathbf B = \mu_0\mathbf j$, we deduce that the exterior magnetic field is
$$B_\phi \equiv B = \frac{\mu_0 I}{2\pi\varpi} \quad\text{at}\quad \varpi \ge R. \tag{17.52}$$

Here $(\varpi, \phi, z)$ are the usual cylindrical coordinates. This hydromagnetic equilibrium configuration is called the Z-pinch because the $z$-directed current on the column's surface creates the external toroidal field $B$, which pinches the column until its internal pressure is balanced by the field's tension,
$$P_0 = \left.\frac{B^2}{2\mu_0}\right|_{\varpi = R}; \tag{17.53}$$
see Sec. 17.3.2 and Fig. 17.6a.

It is quicker and more illuminating to analyze the stability of this Z-pinch equilibrium directly instead of by evaluating $\hat{\mathbf F}$, and the outcome is the same. Treating only the most elementary case, we consider small, axisymmetric perturbations with an assumed variation $\boldsymbol\xi \propto e^{i(kz - \omega t)}\mathbf f(\varpi)$ for some function $\mathbf f$. As the magnetic field interior to the column vanishes, the equation of motion $\rho\,d\mathbf v/dt = -\nabla(P + \delta P)$ becomes
$$-\omega^2\rho\,\xi_\varpi = -\delta P', \qquad -\omega^2\rho\,\xi_z = -ik\,\delta P, \tag{17.54}$$

where the prime denotes differentiation with respect to radius $\varpi$. Combining these two equations, we obtain
$$\xi_z' = ik\,\xi_\varpi. \tag{17.55}$$
Because the fluid is incompressible, it satisfies $\nabla\cdot\boldsymbol\xi = 0$, i.e.,
$$\varpi^{-1}(\varpi\xi_\varpi)' + ik\,\xi_z = 0, \tag{17.56}$$
which, with Eq. (17.55), leads to
$$\xi_z'' + \frac{\xi_z'}{\varpi} - k^2\xi_z = 0. \tag{17.57}$$
The solution of this equation that is regular at $\varpi = 0$ is
$$\xi_z = A\,I_0(k\varpi) \quad\text{at}\quad \varpi \le R, \tag{17.58}$$
where $A$ is a constant and $I_n(x)$ is the modified Bessel function $I_n(x) = i^{-n}J_n(ix)$. From Eq. (17.55) and $dI_0(x)/dx = I_1(x)$, we obtain
$$\xi_\varpi = -iA\,I_1(k\varpi). \tag{17.59}$$

Next, we consider the region exterior to the fluid column. As this is vacuum, it must be current-free; and as we are dealing with a purely axisymmetric perturbation, the $\varpi$ component of $\nabla\times\delta\mathbf B = \mu_0\delta\mathbf j$ reads
$$-\frac{\partial\,\delta B_\phi}{\partial z} = -ik\,\delta B_\phi = \mu_0\,\delta j_\varpi = 0. \tag{17.60}$$

The $\phi$ component of the magnetic perturbation therefore vanishes outside the column. The interior and exterior solutions must be connected by the law of force balance, i.e., by the boundary condition (17.23) at the fluid surface. Allowing for the displacement of the surface and retaining only linear terms, this becomes
$$P_0 + \Delta P = P_0 + (\boldsymbol\xi\cdot\nabla)P_0 + \delta P = \frac{(B + \Delta B_\phi)^2}{2\mu_0} = \frac{B^2}{2\mu_0} + \frac{B}{\mu_0}(\boldsymbol\xi\cdot\nabla)B + \frac{B\,\delta B_\phi}{\mu_0}, \tag{17.61}$$

where all quantities are evaluated at $\varpi = R$. Now, the equilibrium force-balance condition gives us $P_0 = B^2/2\mu_0$ [Eq. (17.53)] and $\nabla P_0 = 0$. In addition, we have shown that $\delta B_\phi = 0$. Therefore Eq. (17.61) becomes simply
$$\delta P = \frac{BB'}{\mu_0}\,\xi_\varpi. \tag{17.62}$$

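Substituting $\delta P$ from Eq. (17.54) with $\xi_z$ from Eq. (17.58), $B$ from Eq. (17.52), and $\xi_\varpi$ from Eq. (17.59), we obtain the dispersion relation
$$\omega^2 = -\frac{\mu_0 I^2}{4\pi^2R^4\rho}\,\frac{kR\,I_1(kR)}{I_0(kR)} \sim \begin{cases} -\dfrac{\mu_0 I^2}{8\pi^2R^2\rho}\,k^2, & k \ll R^{-1},\\[2ex] -\dfrac{\mu_0 I^2}{4\pi^2R^3\rho}\,k, & k \gg R^{-1},\end{cases} \tag{17.63}$$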

where we have used $I_0(x) \sim 1$, $I_1(x) \sim x/2$ as $x \to 0$ and $I_1(x)/I_0(x) \to 1$ as $x \to \infty$. Because $I_0$ and $I_1$ are positive for all $kR > 0$, this dispersion relation says that $\omega^2$ is negative for every wave number $k$. Therefore, $\omega$ is imaginary, the perturbation grows exponentially with time, and the Z-pinch configuration is dynamically unstable. If we define a characteristic Alfvén speed by $a = B(R)/(\mu_0\rho)^{1/2}$ [Eq. (17.74) below], then Eq. (17.63) reads $\omega^2 = -(a/R)^2\,kR\,I_1(kR)/I_0(kR)$, so the e-folding time for modes with wavelength comparable to the column diameter ($kR \sim \pi$) is about $0.6R/a$, shorter than the Alfvén crossing time $2R/a$. This is fast! This is sometimes called a sausage instability, because its eigenfunction $\xi_\varpi \propto e^{ikz}$ consists of oscillatory pinches of the column's radius that resemble the pinches between sausages in a link.

This sausage instability has a simple physical interpretation (Fig. 17.10a), one that illustrates the power of the concepts of flux freezing and magnetic tension for developing intuition. If we imagine an inward radial motion of the fluid, then the toroidal loops of magnetic field will be carried inward too and will therefore shrink. As the fluid is incompressible, the strength of the field will increase, leading to a larger "hoop" stress or, equivalently, a larger $\mathbf j\times\mathbf B$ Lorentz force. This cannot be resisted by any increase in pressure, and so the perturbation will continue to grow.

Fig. 17.10: Physical interpretation of (a) sausage and (b) kink instabilities.

So far, we have only considered axisymmetric perturbations. We can generalize our analysis by allowing the perturbations to vary as $\boldsymbol\xi \propto \exp(im\phi)$. (Our sausage instability corresponds to $m = 0$.) Modes with $m \ge 1$, like $m = 0$, are also generically unstable. For example, $m = 1$ modes are known as kink modes. In this case, there is a bending of the column so that the field strength is intensified along the inner face of the bend and reduced along the outer face, thereby amplifying the instability (Fig. 17.10b). In addition, the incorporation of compressibility, as is appropriate for plasma instead of mercury, introduces only algebraic complexity; the conclusions are unchanged. The column is still highly unstable. We can also add magnetic field to the column's interior.

These MHD instabilities have bedevilled attempts to confine plasma for long enough to bring about nuclear fusion. Indeed, considerations of MHD stability were one of the primary motivations for the Tokamak, the most consistently successful of trial fusion devices. The $\Theta$-pinch (Sec. 17.3.3 and Fig. 17.6b) turns out to be quite MHD stable but, naturally, cannot confine plasma without closing its ends. This can be done through the formation of a pair of magnetic mirrors or by bending the column into a closed torus. However, magnetic mirror machines have problems with losses, and toroidal $\Theta$-pinches exhibit new MHD instabilities involving the interchange of bundles of curving magnetic field lines. The best compromise appears to be a Tokamak with its combination of toroidal and poloidal magnetic field. The component of magnetic field along the plasma torus acts to stabilize, through its pressure against sausage-type instabilities and through its tension against kink-type instabilities.
In addition, image currents induced in the conducting walls of a Tokamak vessel can have a stabilizing influence.
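The sausage-mode dispersion relation (17.63) is easy to explore numerically. With $a^2 = B(R)^2/\mu_0\rho = \mu_0 I^2/(4\pi^2R^2\rho)$ it becomes $\omega^2 = -(a/R)^2\,kR\,I_1(kR)/I_0(kR)$. The sketch below evaluates this with SciPy's modified Bessel functions `scipy.special.i0` and `i1`, checks the two limits quoted in Eq. (17.63), and estimates the e-folding time for a wavelength equal to the column diameter ($kR = \pi$). Working in units of $R$ and $R/a$ is our own choice of normalization.

```python
import numpy as np
from scipy.special import i0, i1


def zpinch_omega2(kR):
    """Eq. (17.63) for the m = 0 (sausage) mode, in units of (a/R)**2."""
    kR = np.asarray(kR, dtype=float)
    return -kR * i1(kR) / i0(kR)


kR = np.logspace(-2, 2, 9)
w2 = zpinch_omega2(kR)
print(np.all(w2 < 0))                  # unstable at every sampled wave number

# Limits quoted in Eq. (17.63): omega^2 -> -(kR)^2/2 for kR << 1 and -> -kR for kR >> 1
print(w2[0] / (-0.5 * kR[0] ** 2))     # ratio to the small-kR limit
print(w2[-1] / (-kR[-1]))              # ratio to the large-kR limit

# e-folding time 1/|omega| for kR = pi, in units of R/a, and in units of the crossing time 2R/a
tau = 1.0 / np.sqrt(-zpinch_omega2(np.pi))
print(tau, tau / 2.0)
```

Both limit ratios approach 1, and the growth rate keeps increasing with $k$ (as $k^{1/2}$ at large $kR$), so short-wavelength pinches grow fastest in this idealized model.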

17.5.3

Energy Principle

Analytical, or indeed numerical, solutions to the perturbation equations are readily obtained only in the simplest geometries and for the simplest fluids. However, as the equation of motion is expressible in self-adjoint form, it is possible to write down a variational principle and use it to derive approximate stability criteria. To do this, begin by multiplying the equation of motion $\rho\,\partial^2\boldsymbol\xi/\partial t^2 = \hat{\mathbf F}[\boldsymbol\xi]$ by $\dot{\boldsymbol\xi}$, integrate over the whole volume $\mathcal V$, and use Gauss's theorem to integrate by parts. The result is
$$\frac{dE}{dt} = 0, \quad\text{where}\quad E = T + W, \tag{17.64}$$
$$T = \frac12\int_{\mathcal V} dV\,\rho\,\dot{\boldsymbol\xi}^2, \qquad W = -\frac12\int_{\mathcal V} dV\,\boldsymbol\xi\cdot\hat{\mathbf F}[\boldsymbol\xi]. \tag{17.65}$$

The integrals $T$ and $W$ are the perturbation's kinetic and potential energy, and $E = T + W$ is the conserved total energy. Any solution of the equation of motion $\rho\,\partial^2\boldsymbol\xi/\partial t^2 = \hat{\mathbf F}[\boldsymbol\xi]$ can be expanded in terms of a complete set of normal modes $\boldsymbol\xi^{(n)}(\mathbf x)$ with eigenfrequencies $\omega_n$, $\boldsymbol\xi = \sum_n A_n\boldsymbol\xi^{(n)}e^{-i\omega_n t}$. As $\hat{\mathbf F}$ is a real, self-adjoint operator, these normal modes can all be chosen to be real and orthogonal, even when some of their frequencies are degenerate. As the perturbation evolves, its energy sloshes back and forth between kinetic $T$ and potential $W$, so time averages of $T$ and $W$ are equal, $\bar T = \bar W$. This implies, for each normal mode, that
$$\omega_n^2 = \frac{W[\boldsymbol\xi^{(n)}]}{\int_{\mathcal V} dV\,\frac12\rho\,\boldsymbol\xi^{(n)2}}. \tag{17.66}$$

As the denominator is positive definite, we conclude that a hydromagnetic equilibrium is stable against small perturbations if and only if the potential energy $W[\boldsymbol\xi]$ is a positive definite functional of the perturbation $\boldsymbol\xi$. This is sometimes called the Rayleigh Principle in dynamics; in the MHD context, it is known as the Energy Principle.

It is straightforward to verify, by virtue of the self-adjointness of $\hat{\mathbf F}[\boldsymbol\xi]$, that expression (17.66) serves as an action principle for the eigenfrequencies: If one inserts into (17.66) a trial function $\boldsymbol\xi_{\rm trial}$ in place of $\boldsymbol\xi^{(n)}$, then the resulting value of (17.66) will be stationary under small variations of $\boldsymbol\xi_{\rm trial}$ if and only if $\boldsymbol\xi_{\rm trial}$ is equal to some eigenfunction $\boldsymbol\xi^{(n)}$; and the stationary value of (17.66) is that eigenfunction's squared eigenfrequency $\omega_n^2$. This action principle is most useful for estimating the lowest few squared frequencies $\omega_n^2$. Relatively crude trial eigenfunctions can furnish surprisingly accurate eigenvalues. Whatever may be our chosen trial function $\boldsymbol\xi_{\rm trial}$, the computed value of the action (17.66) will never be smaller than $\omega_0^2$, the lowest squared eigenfrequency, i.e., that of the most unstable mode. Therefore, if we compute a negative value of (17.66) using some trial eigenfunction, we know that the equilibrium is unstable, and that its most unstable mode grows at least as fast as the trial value indicates. This energy principle and action principle are special cases of the general conservation law and action principle for Sturm-Liouville differential equations; see, e.g., Mathews and Walker (1970).

****************************
EXERCISES

Exercise 17.7 Derivation: Properties of Eigenmodes
Derive the properties of the eigenvalues and eigenfunctions for perturbations of an MHD equilibrium that are asserted in the next to the last paragraph of Sec. 17.5.1, namely:

(a) For each normal mode the eigenfrequency $\omega_n$ is either real or imaginary.
(b) Eigenfunctions $\boldsymbol\xi^{(m)}$ and $\boldsymbol\xi^{(n)}$ that have different eigenvalues $\omega_m^2 \ne \omega_n^2$ are orthogonal to each other, $\int\rho\,\boldsymbol\xi^{(m)}\cdot\boldsymbol\xi^{(n)}\,dV = 0$.

Exercise 17.8 Example: Reformulation of the Energy Principle
The form (17.48) of the operator $\hat{\mathbf F}$, which enters the potential energy functional $W$, is needed to demonstrate that $\hat{\mathbf F}$ is self-adjoint. However, there are several simpler, equivalent forms which are more convenient for practical use.
(a) Use Eq. (17.47) to show that
$$\boldsymbol\xi\cdot\hat{\mathbf F}[\boldsymbol\xi] = \mathbf j\cdot(\mathbf b\times\boldsymbol\xi) - b^2/\mu_0 - \gamma P(\nabla\cdot\boldsymbol\xi)^2 - (\nabla\cdot\boldsymbol\xi)(\boldsymbol\xi\cdot\nabla)P + \nabla\cdot\left[(\boldsymbol\xi\times\mathbf B)\times\mathbf b/\mu_0 + \gamma P\boldsymbol\xi(\nabla\cdot\boldsymbol\xi) + \boldsymbol\xi(\boldsymbol\xi\cdot\nabla)P\right], \tag{17.67}$$
where $\mathbf b \equiv \delta\mathbf B$ is the Eulerian perturbation of the magnetic field.
(b) Transform the potential energy $W[\boldsymbol\xi]$ into a sum over volume and surface integrals.
(c) Consider axisymmetric perturbations of the cylindrical Z-pinch of an incompressible fluid, as discussed in Sec. 17.5.2, and argue that the surface integral vanishes.
(d) Hence adopt a simple trial eigenfunction and obtain a variational estimate of the growth rate of the fastest growing mode.

****************************
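The variational bound can be seen at work in a toy problem. The sketch below is not the Z-pinch of Exercise 17.8; it assumes a hypothetical one-dimensional operator $\hat F[\xi] = \xi'' + V\xi$ on $0 < x < 1$ with $\xi(0) = \xi(1) = 0$, $\rho = 1$, and a constant destabilizing coefficient $V = 15$. Its exact squared frequencies are $\omega^2 = (n\pi)^2 - V$ for $n = 1, 2, \dots$, so the lowest, $\pi^2 - 15 \approx -5.13$, is unstable. The crude trial function $\xi_{\rm trial} = x(1-x)$ gives, from Eq. (17.66), $\int\xi'^2dx/\int\xi^2dx - V = 10 - 15 = -5$: negative, hence a proof of instability, and not below the true lowest value.

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical toy operator F[xi] = xi'' + V xi on 0 < x < 1, xi = 0 at both ends, rho = 1
n = 400
x = np.linspace(0.0, 1.0, n + 2)[1:-1]        # interior nodes
h = x[1] - x[0]
V = 15.0                                      # destabilizing coefficient (illustrative), V > pi**2
rho = np.ones(n)
F = (np.diag(np.full(n, -2.0)) + np.diag(np.ones(n - 1), k=1)
     + np.diag(np.ones(n - 1), k=-1)) / h**2 + V * np.eye(n)


def rayleigh_quotient(xi):
    """Discrete Eq. (17.66): W / (1/2 int rho xi^2), with W = -(1/2) int xi . F[xi].
    The grid spacing multiplies numerator and denominator alike, so it cancels."""
    return -(xi @ F @ xi) / (xi @ (rho * xi))


omega2_lowest = eigh(-F, np.diag(rho), eigvals_only=True)[0]   # "exact" lowest omega^2
omega2_trial = rayleigh_quotient(x * (1.0 - x))                 # crude trial function

print(omega2_lowest, np.pi**2 - V)   # discrete vs continuum lowest eigenvalue
print(omega2_trial)                  # negative, and not below omega2_lowest
```

The trial value lies within a few percent of the true lowest $\omega^2$ even though $x(1-x)$ is only a rough approximation to the exact eigenfunction $\sin\pi x$, illustrating why crude trial functions work well in Eq. (17.66).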

17.6

Dynamos and Reconnection of Magnetic Field Lines

As we have already remarked, the time scale for the earth's magnetic field to decay by Ohmic diffusion is estimated to be roughly a million years. Since the earth has had a magnetic field for far longer than this, some process within the earth must be regenerating the field. This process is known as a dynamo process. In general, what happens in a dynamo process is that motion of the fluid stretches the magnetic field lines and thereby increases the magnetic energy density, compensating the decrease in magnetic energy associated with Ohmic decay. In fact, the details of how this happens inside the earth are not well understood. However, some general principles of dynamo action have been formulated.

17.6.1

Cowling’s theorem

It is simple to demonstrate that a dynamo cannot maintain a stationary, axisymmetric magnetic field in a fluid with finite electrical conductivity $\kappa_e$. Suppose that there were such a dynamo and the poloidal (meridional) field had the form sketched in Fig. 17.11. Then there must be at least one neutral point, marked P (actually a circle about the symmetry axis), where the poloidal field vanishes. However, the curl of the magnetic field does not vanish at P, so there must be a toroidal current $j_\phi$ there. Now, in the presence of finite resistivity, there must also be a toroidal electric field at P, since
$$j_\phi = \kappa_e\left[E_\phi + (\mathbf v_P\times\mathbf B_P)_\phi\right] = \kappa_e E_\phi; \tag{17.68}$$
here only the poloidal components $\mathbf v_P$ and $\mathbf B_P$ contribute to the toroidal component of $\mathbf v\times\mathbf B$, and $\mathbf B_P$ vanishes at P. The nonzero $E_\phi$ in turn implies, via $\nabla\times\mathbf E = -\partial\mathbf B/\partial t$, that the amount of poloidal magnetic flux threading the circle at P must change with time, violating our original supposition that the magnetic field distribution is stationary. We therefore conclude that any self-perpetuating dynamo must be more complicated than a purely axisymmetric magnetic field. This is known as Cowling's theorem.

Fig. 17.11: Impossibility of an axisymmetric dynamo.

17.6.2

Kinematic dynamos

The simplest types of dynamo to consider are those in which we specify a particular velocity field and allow the magnetic field to evolve according to the transport law (17.6). Under certain circumstances, this can produce dynamo action. Note that we do not consider, in our discussion, the dynamical effect of the magnetic field on the velocity field. The simplest type of motion is one in which a dynamo cycle occurs. In this cycle, there is one mechanism for creating toroidal magnetic field from poloidal field and a separate mechanism for regenerating the poloidal field. The first mechanism is usually differential rotation. The second is plausibly magnetic buoyancy, in which a toroidal magnetized loop is lighter than its surroundings and therefore rises in the gravitational field. As the loop rises, Coriolis forces twist the flow, causing poloidal magnetic field to appear. This completes the dynamo cycle. Small-scale, turbulent velocity fields may also be responsible for dynamo action. In this case, it can be shown on general symmetry grounds that the velocity field must contain helicity, a nonzero mean value of $\mathbf v\cdot\boldsymbol\omega$, where $\boldsymbol\omega = \nabla\times\mathbf v$ is the vorticity. If the magnetic field strength grows, then its dynamical effect will eventually react back on the flow and modify the velocity field. A full description of a dynamo must include this back reaction. Dynamos are a prime target for numerical simulations of MHD, and significant progress has been made in recent years in understanding specialized problems, like the terrestrial dynamo.

Fig. 17.12: Illustration of magnetic reconnection. A continuous flow can develop through the shaded reconnection region where Ohmic diffusion is important. Magnetic field lines "exchange partners", changing the overall field topology. Magnetic field components perpendicular to the plane of the illustration do not develop large gradients and so do not inhibit the reconnection process.

17.6.3

Magnetic Reconnection

Our discussion so far of the evolution of the magnetic field has centered on the induction equation (magnetic transport law), Eq. (17.6); we have characterized our magnetized fluid by a magnetic Reynolds number, using some characteristic length $L$ associated with the flow, and have found that Ohmic dissipation is unimportant when $R_M \gg 1$. This is reminiscent of the procedure we followed when discussing vorticity. However, for vorticity we discovered a very important exception to an uncritical neglect of viscosity and dissipation at large Reynolds number, namely boundary layers. In particular, we found that flow near solid surfaces will develop very large velocity gradients on account of the no-slip boundary condition, and that the local Reynolds number can thereby decrease to near unity, allowing viscous stress to change the character of the flow completely. Something very similar, called magnetic reconnection, can happen in hydromagnetic flows with large $R_M$, even without the presence of solid surfaces. Consider two oppositely magnetized regions of conducting fluid moving toward each other (the upper and lower regions in Fig. 17.12). There will be a mutual magnetic attraction of the two regions, as magnetic energy would be reduced if the two sets of field lines were superposed. However, strict flux freezing prevents superposition. Something has to give. What happens is a compromise. The attraction causes large magnetic gradients to develop, accompanied by a buildup of large current densities, until Ohmic diffusion ultimately allows the magnetic field lines to slip sideways through the fluid and to reconnect with field in the other region (the sharply curved field lines in Fig. 17.12). This reconnection mechanism can be clearly observed at work within Tokamaks and at the earth's magnetopause, where the solar wind's magnetic field meets the earth's magnetosphere.

However, the details of the reconnection mechanism are quite complex, involving plasma instabilities and shock fronts. Large, inductive electric fields can also develop when the magnetic geometry undergoes rapid change. This can happen in the reversing magnetic field in the earth's magnetotail, leading to the acceleration of charged particles that impact the earth during a magnetic substorm. Like dynamo action, reconnection has a major role in determining how magnetic fields actually behave in both laboratory and space plasmas.

****************************
EXERCISES

Exercise 17.9 Problem: Differential rotation in the solar dynamo
This problem shows how differential rotation leads to the production of toroidal magnetic field from poloidal field.
(a) Verify that for a fluid undergoing differential rotation around a symmetry axis with angular velocity $\Omega(r, \theta)$, the $\phi$ component of the induction equation reads
$$\frac{\partial B_\phi}{\partial t} = \sin\theta\left(B_\theta\frac{\partial\Omega}{\partial\theta} + B_r\,r\frac{\partial\Omega}{\partial r}\right), \tag{17.69}$$
where $\theta$ is the colatitude. (The resistive term can be ignored.)
(b) It is observed that the angular velocity on the solar surface is largest at the equator and decreases monotonically towards the poles. There is evidence (though less direct) that $\partial\Omega/\partial r < 0$ in the outer parts of the sun, where the dynamo operates. Suppose that the field of the sun is roughly poloidal. Sketch the appearance of the toroidal field generated by the poloidal field.

Exercise 17.10 Problem: Buoyancy in the solar dynamo
Consider a slender flux tube in hydrostatic equilibrium in a conducting fluid. Assume that the diameter of the flux tube is much less than its length, than its radius of curvature $R$, and than the external pressure scale height $H$; and assume that the magnetic field is directed along the tube, so there is negligible current flowing along the tube.
(a) Show that the requirement of static equilibrium implies that
$$\nabla\left(P + \frac{B^2}{2\mu_0}\right) = 0. \tag{17.70}$$
(b) Assume that the tube makes a complete circular loop of radius $R$ in the equatorial plane of a spherical star. Also assume that the fluid is isothermal with temperature $T$, so that the pressure scale height is $H = k_BT/\mu g$, where $\mu$ is the mean molecular weight and $g$ is the gravity. Prove that magnetostatic equilibrium is possible only if $R = 2H$.

(c) In the solar convection zone, $H \ll R/2$. What happens to the toroidal field produced by differential rotation? Suppose the toroidal field breaks through the solar surface. What direction must the field lines have to be consistent with the previous exercise?

****************************
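Part (a) of Exercise 17.9 can be checked symbolically. The sketch below assumes an axisymmetric poloidal field written in terms of a flux function $\psi(r,\theta)$, $B_r = \partial_\theta\psi/(r^2\sin\theta)$ and $B_\theta = -\partial_r\psi/(r\sin\theta)$, which makes $\nabla\cdot\mathbf B = 0$ automatic, together with the rotational velocity $v_\phi = \Omega r\sin\theta$. It applies the standard spherical-coordinate expression $(\nabla\times\mathbf A)_\phi = r^{-1}[\partial_r(rA_\theta) - \partial_\theta A_r]$ to $\mathbf A = \mathbf v\times\mathbf B$, with the resistive term dropped, and compares the result with the right side of Eq. (17.69).

```python
import sympy as sp

r, theta = sp.symbols("r theta", positive=True)
Omega = sp.Function("Omega")(r, theta)     # angular velocity Omega(r, theta)
psi = sp.Function("psi")(r, theta)         # poloidal flux function (assumed representation)

# Divergence-free poloidal field; any toroidal B_phi drops out of v x B since v is along e_phi
B_r = sp.diff(psi, theta) / (r**2 * sp.sin(theta))
B_theta = -sp.diff(psi, r) / (r * sp.sin(theta))
v_phi = Omega * r * sp.sin(theta)

# A = v x B, using e_phi x e_r = e_theta and e_phi x e_theta = -e_r
A_r = -v_phi * B_theta
A_theta = v_phi * B_r

# phi component of curl(v x B), i.e. dB_phi/dt in ideal MHD
dBphi_dt = (sp.diff(r * A_theta, r) - sp.diff(A_r, theta)) / r

# Right-hand side of Eq. (17.69)
rhs = sp.sin(theta) * (B_theta * sp.diff(Omega, theta) + B_r * r * sp.diff(Omega, r))

print(sp.simplify(dBphi_dt - rhs))
```

The terms proportional to $\Omega$ itself cancel only because $\nabla\cdot\mathbf B = 0$; that is why the field is built from $\psi$ rather than from arbitrary functions $B_r$ and $B_\theta$.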

17.7

Magnetosonic Waves and the Scattering of Cosmic Rays

We have discussed global wave modes in a non-uniform magnetostatic plasma and described how they may be unstable. We now consider a particularly simple example: planar, monochromatic, propagating wave modes in a uniform, magnetized, conducting medium. These waves are called magnetosonic modes. They can be thought of as sound waves that are driven not just by gas pressure but also by magnetic pressure and tension. Although magnetosonic waves have been studied under laboratory conditions, there the magnetic Reynolds numbers are generally quite small and they damp quickly. No such problem arises in space plasmas, where magnetosonic modes are routinely studied by the many spacecraft that monitor the solar wind and its interaction with planetary magnetospheres. It appears that these modes perform an important function in space plasmas; they control the transport of cosmic rays. Let us describe some of the properties of cosmic rays before giving a formal derivation of the magnetosonic-wave dispersion relation.

17.7.1

Cosmic Rays

Cosmic rays are the high-energy particles, primarily protons, that bombard the earth's magnetosphere from outer space. They range in energy from $\sim 1$ MeV to $\sim 3\times10^{11}$ GeV $= 0.3$ ZeV. (The highest cosmic-ray energy measured is about 50 J. Thus, naturally occurring particle accelerators are far more impressive than their terrestrial counterparts, which can only reach $\sim 10$ TeV $= 10^4$ GeV!) Most sub-relativistic particles originate within the solar system; their relativistic counterparts, up to energies $\sim 100$ TeV, are believed to come mostly from interstellar space, where they are accelerated by the expanding shock waves formed by supernova explosions (cf. Sec. 16.6.2). The origin of the highest-energy particles, above $\sim 100$ TeV, is an intriguing mystery. The distribution of cosmic-ray arrival directions at earth is inferred to be quite isotropic (to better than one part in $10^4$ at an energy of 10 GeV). This is somewhat surprising because their sources, both within and beyond the solar system, are believed to be distributed anisotropically, so the isotropy needs to be explained. Part of the reason for the isotropization is that the interplanetary and interstellar media are magnetized and the particles gyrate around the magnetic field with the gyro frequency $\omega_G = eBc^2/\epsilon$, where $\epsilon$ is the particle energy and $B$ is the magnetic field strength. The Larmor radii of the non-relativistic particles are typically small compared with the size of the solar system, and those of the relativistic particles are typically small compared with the typical scales in the interstellar medium.
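To put rough numbers on the Larmor-radius statements above: a particle of charge $Ze$ and momentum $p$ perpendicular to the field gyrates with Larmor radius $r_L = p/(ZeB)$, which for $v \approx c$ reduces to $c/\omega_G = \epsilon/(ZeBc)$. The sketch below evaluates $r_L$ for protons. The field strengths, about 5 nT in the interplanetary medium near the earth and about 0.3 nT (3 µG) in the interstellar medium, are assumed representative values, not figures taken from the text.

```python
import numpy as np

C_LIGHT = 2.99792458e8        # speed of light, m/s
PROTON_REST_EV = 938.272e6    # proton rest energy m_p c^2, eV
AU = 1.495978707e11           # astronomical unit, m
PARSEC = 3.0857e16            # parsec, m


def larmor_radius(kinetic_eV, B_tesla, Z=1, rest_eV=PROTON_REST_EV):
    """Larmor radius r_L = p/(Z e B) in metres, for motion perpendicular to B.

    With p c expressed in eV, the charge e cancels: r_L = (p c)[eV] / (Z B c)."""
    pc_eV = np.sqrt(kinetic_eV * (kinetic_eV + 2.0 * rest_eV))
    return pc_eV / (Z * B_tesla * C_LIGHT)


B_interplanetary = 5e-9     # T, assumed typical value near 1 AU
B_interstellar = 3e-10      # T (3 microgauss), assumed typical value

r_solar = larmor_radius(1e6, B_interplanetary)   # 1 MeV proton
r_ism = larmor_radius(1e10, B_interstellar)      # 10 GeV proton
print(f"1 MeV proton, 5 nT:    r_L = {r_solar / AU:.1e} AU")
print(f"10 GeV proton, 0.3 nT: r_L = {r_ism / AU:.2f} AU = {r_ism / PARSEC:.1e} pc")
```

Both radii are tiny compared with the size of the solar system (tens of AU) and with interstellar scales of order parsecs, respectively.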

Therefore, this gyrational motion can effectively erase any azimuthal asymmetry with respect to the field direction. However, this does not stop the particles from streaming away from their sources along the direction of the magnetic field, thereby producing anisotropy at earth; so something else must be impeding this flow and scattering the particles, causing them to effectively diffuse along and across the field through interplanetary and interstellar space. As we shall verify in Chap. 18 below, Coulomb collisions are quite ineffective; and if they were effective, they would cause huge energy losses, in violation of observations. We therefore seek some means of changing a cosmic ray's momentum without altering its energy significantly. This is reminiscent of the scattering of electrons in metals, where phonons (elastic waves in the crystal lattice) are responsible for much of the scattering. It turns out that in the interstellar medium magnetosonic waves can play a role analogous to phonons, and scatter the cosmic rays. As an aid to understanding this, we now derive the waves' dispersion relation.

17.7.2

Magnetosonic Dispersion Relation

Our procedure by now should be familiar. We consider a uniform, isentropic, magnetized fluid at rest, perform a linear perturbation, and seek monochromatic, plane-wave solutions varying $\propto e^{i(\mathbf k\cdot\mathbf x - \omega t)}$. We ignore gravity and dissipative processes (specifically viscosity, thermal conductivity, and electrical resistivity), as well as gradients in the equilibrium, all of which can be important in one circumstance or another. It is convenient to use the velocity perturbation as the independent variable. The perturbed and linearized equation of motion (17.10) then takes the form
$$-i\rho\omega\,\delta\mathbf v = -ic_s^2\mathbf k\,\delta\rho + \delta\mathbf j\times\mathbf B, \tag{17.71}$$

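where $\delta\mathbf v$ is the velocity perturbation, $c_s$ is the sound speed [$c_s^2 = (\partial P/\partial\rho)_s = \gamma P/\rho$], and $\delta P = c_s^2\delta\rho$ is the Eulerian pressure perturbation for our homogeneous equilibrium [note that $\nabla P = \nabla\rho = 0$, so Eulerian and Lagrangian perturbations are the same]. We use the notation $c_s$ to avoid confusion with the speed of light. The perturbed equation of mass conservation, $\partial\rho/\partial t + \nabla\cdot(\rho\mathbf v) = 0$, becomes
$$\omega\,\delta\rho = \rho\,\mathbf k\cdot\delta\mathbf v, \tag{17.72}$$
and Faraday's law $\partial\mathbf B/\partial t = -\nabla\times\mathbf E$ and the MHD law of magnetic-field transport with dissipation ignored, $\partial\mathbf B/\partial t = \nabla\times(\mathbf v\times\mathbf B)$, become
$$\omega\,\delta\mathbf B = \mathbf k\times\delta\mathbf E = -\mathbf k\times(\delta\mathbf v\times\mathbf B). \tag{17.73}$$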

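We introduce the Alfvén velocity
$$\mathbf a \equiv \frac{\mathbf B}{(\mu_0\rho)^{1/2}} \tag{17.74}$$
and insert $\delta\rho$ [Eq. (17.72)] and $\delta\mathbf B$ [Eq. (17.73)], together with $\delta\mathbf j = i\mathbf k\times\delta\mathbf B/\mu_0$, into Eq. (17.71) to obtain
$$[\mathbf k\times\{\mathbf k\times(\delta\mathbf v\times\mathbf a)\}]\times\mathbf a + c_s^2(\mathbf k\cdot\delta\mathbf v)\,\mathbf k = \omega^2\,\delta\mathbf v. \tag{17.75}$$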

Fig. 17.13: Phase velocity surfaces for the three types of magnetosonic modes, fast (f), intermediate (i), and slow (s). The three curves are polar plots of the wave phase velocity $\omega/k$, in units of the Alfvén speed $a = B/\sqrt{\mu_0\rho}$, as a function of the angle $\theta$ between $\mathbf k$ and $\mathbf B$. In the particular example shown, the sound speed $c_s$ is half the Alfvén speed.

This is an eigenequation for the wave's frequency $\omega^2$ and eigendirection $\delta\mathbf v$. The straightforward way to solve it is to rewrite it in the standard matrix form $M_{ij}\,\delta v_j = \omega^2\,\delta v_i$ and then use standard matrix (determinant) methods. It is quicker, however, to seek the three eigendirections $\delta\mathbf v$ and eigenfrequencies $\omega$ one by one, by projection along preferred directions: We first seek a solution to Eq. (17.75) for which $\delta\mathbf v$ is orthogonal to the plane formed by the unperturbed magnetic field and the wave vector, $\delta\mathbf v = \mathbf a\times\mathbf k$ (up to a multiplicative constant). Inserting this $\delta\mathbf v$ into Eq. (17.75), we obtain the dispersion relation
$$\omega = \pm\,\mathbf a\cdot\mathbf k; \qquad \frac{\omega}{k} = \pm\,a\cos\theta, \tag{17.76}$$

where θ is the angle between k and the unperturbed field. This type of wave is known as the intermediate mode and also as the Alfvén mode. Its phase speed ω/k = a cos θ is plotted as the larger figure-8 curve in Fig. 17.13. The velocity and magnetic perturbations δv and δB are both along the direction a × k, so the wave is fully transverse; and there is no compression (δρ = 0), which accounts for the absence of the sound speed $c_s$ in the dispersion relation. This Alfvén mode has a simple physical interpretation in the limiting case when k is parallel to B. We can think of the magnetic field lines as strings with tension $B^2/\mu_0$ and inertia ρ, which are plucked transversely. Their transverse oscillations then propagate with speed $\sqrt{\text{tension}/\text{inertia}} = B/\sqrt{\mu_0\rho} = a$.

The dispersion relations for the other two modes can be deduced by projecting the eigenequation (17.75) successively along k and along a to obtain the two scalar equations

$$(\mathbf{k}\cdot\mathbf{a})(\mathbf{a}\cdot\delta\mathbf{v})\,k^2 = \{(a^2+c_s^2)k^2-\omega^2\}(\mathbf{k}\cdot\delta\mathbf{v})\,, \qquad (\mathbf{k}\cdot\mathbf{a})(\mathbf{k}\cdot\delta\mathbf{v})\,c_s^2 = \omega^2(\mathbf{a}\cdot\delta\mathbf{v})\,. \tag{17.77}$$

Combining these equations, we obtain the dispersion relation for the remaining two magnetosonic modes,

$$\frac{\omega}{k} = \pm\left[\frac{1}{2}(a^2+c_s^2)\left\{1 \pm \left(1-\frac{4c_s^2a^2\cos^2\theta}{(a^2+c_s^2)^2}\right)^{1/2}\right\}\right]^{1/2}. \tag{17.78}$$

(By inserting this dispersion relation, with the upper or lower sign, back into Eqs. (17.77), we can deduce the mode's eigendirection δv.) This dispersion relation tells us that ω² is positive, so there are no unstable modes, which seems reasonable as there is no source of free energy. (The same is true, of course, for the Alfvén mode.) These waves are compressive, with the gas being moved by a combination of gas pressure and magnetic pressure and tension. The modes are nondispersive, which is also to be expected, as we have introduced neither a characteristic timescale nor a characteristic length into the problem.

The mode with the plus signs in Eq. (17.78) is called the fast magnetosonic mode; its phase speed is depicted by the outer, quasi-circular curve in Fig. 17.13. A good approximation to its phase speed when $a \gg c_s$ or $a \ll c_s$ is $\omega/k \simeq \pm(a^2+c_s^2)^{1/2}$. When propagating perpendicular to B, the fast mode can be regarded as simply a longitudinal sound wave in which the gas pressure is augmented by the magnetic pressure $B^2/2\mu_0$ (adopting a specific heat ratio γ = 2 for the magnetic field, since B ∝ ρ and so $P_{\rm mag} \propto \rho^2$ under perpendicular compression). The mode with the minus signs in Eq. (17.78) is called the slow magnetosonic mode. Its phase speed (depicted by the inner figure-8 curve in Fig. 17.13) can be approximated by $\omega/k \simeq \pm a c_s\cos\theta/(a^2+c_s^2)^{1/2}$ when $a \gg c_s$ or $a \ll c_s$. Note that slow modes, like the intermediate modes but unlike the fast modes, are incapable of propagating perpendicular to the unperturbed magnetic field; see Fig. 17.13. In the limit of vanishing Alfvén speed or sound speed, the slow modes cease to exist for all directions of propagation.
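As a cross-check on the projection argument, the "straightforward" matrix route mentioned above can be carried out numerically. The sketch below (Python with NumPy; the function names are ours, not the text's) assembles the 3 × 3 matrix $M_{ij}$ of Eq. (17.75) by applying the operator to the Cartesian basis vectors, takes its eigenvalues ω², and compares the resulting phase speeds with Eqs. (17.76) and (17.78). It assumes B along z and k in the x-z plane at angle θ to B, with a and $c_s$ in arbitrary but common units.

```python
import numpy as np


def mhd_operator(k, a, cs):
    """Return the 3x3 matrix M with M @ dv equal to the left side of Eq. (17.75).

    k  : wave vector, shape (3,)
    a  : Alfven velocity vector B / sqrt(mu0 * rho), shape (3,)
    cs : adiabatic sound speed
    """
    def apply(dv):
        # [k x {k x (dv x a)}] x a : magnetic tension and pressure term
        magnetic = np.cross(np.cross(k, np.cross(k, np.cross(dv, a))), a)
        # c_s^2 (k . dv) k : gas pressure term
        gas = cs**2 * np.dot(k, dv) * k
        return magnetic + gas

    # The operator is linear, so its columns are its action on the basis vectors.
    return np.column_stack([apply(e) for e in np.eye(3)])


def phase_speeds_numeric(a_mag, cs, theta, k_mag=1.0):
    """Sorted phase speeds omega/k from the eigenvalues omega^2 of M."""
    a = np.array([0.0, 0.0, a_mag])                            # field along z
    k = k_mag * np.array([np.sin(theta), 0.0, np.cos(theta)])  # angle theta to B
    omega2 = np.linalg.eigvals(mhd_operator(k, a, cs)).real
    # Clip tiny negative round-off (e.g. the two zero modes at theta = pi/2).
    return np.sort(np.sqrt(np.clip(omega2, 0.0, None))) / k_mag


def phase_speeds_analytic(a_mag, cs, theta):
    """Sorted phase speeds (slow, intermediate, fast) from Eqs. (17.76) and (17.78)."""
    s = a_mag**2 + cs**2
    disc = max(0.0, 1.0 - 4.0 * cs**2 * a_mag**2 * np.cos(theta)**2 / s**2)
    slow = np.sqrt(0.5 * s * (1.0 - np.sqrt(disc)))
    fast = np.sqrt(0.5 * s * (1.0 + np.sqrt(disc)))
    alfven = a_mag * abs(np.cos(theta))
    return np.sort(np.array([slow, alfven, fast]))


# Example matching Fig. 17.13: c_s = a/2, propagation at 0.6 rad to B.
print(phase_speeds_numeric(1.0, 0.5, 0.6))
print(phase_speeds_analytic(1.0, 0.5, 0.6))
```

Because only the direction of k enters the phase speeds, changing `k_mag` leaves the output unchanged, which is the nondispersive behavior noted above; at θ = π/2 the slow and intermediate speeds both drop to zero while the fast speed becomes $(a^2+c_s^2)^{1/2}$.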
In Part V, we will discover that MHD is a good approximation to the behavior of plasmas only at frequencies below the “ion gyro frequency”, which is a rather low frequency. For this reason, magnetosonic modes are usually regarded as low-frequency modes.

17.7.3

Scattering of Cosmic Rays

Now let us return to the issue of cosmic-ray propagation, which motivated our investigation of magnetosonic modes. Let us consider 100 GeV particles in the interstellar medium. The electron (and ion, mostly proton) density and magnetic field strength in the interstellar medium are typically $n \sim 10^4\ {\rm m^{-3}}$, $B \sim 100$ pT. The Alfvén speed is then $a \sim 30\ {\rm km\,s^{-1}}$, much slower than the speeds of the cosmic rays. In analyzing the cosmic-ray propagation, a magnetosonic wave can therefore be treated as essentially a magnetostatic perturbation. A relativistic cosmic ray of energy ε has a gyro radius $r_G = \varepsilon/eBc$, in this case $\sim 3\times10^{12}$ m. Cosmic rays will be unaffected by waves with wavelength either much greater than or much less than $r_G$. However, waves, especially Alfvén waves, with wavelength matched to the gyro radius will be able to change the particle's pitch angle α (the angle its momentum makes with the mean magnetic field direction). If the Alfvén waves in this wavelength range have rms dimensionless amplitude δB/B ≪ 1, then the particle's pitch angle will change by an amount δα ∼ δB/B every wavelength. Now, if the wave spectrum is broadband, individual waves can be treated as uncorrelated, so the particle pitch angle changes stochastically. In other words, the particle diffuses in pitch angle. The effective diffusion coefficient is

$$D_\alpha \sim \left(\frac{\delta B}{B}\right)^2\omega_G\,, \tag{17.79}$$

where $\omega_G = c/r_G$ is the gyro frequency. The particle will therefore be scattered by roughly a radian in pitch angle every time it traverses a distance $\ell \sim (B/\delta B)^2 r_G$. This is effectively the particle's collisional mean free path. Associated with this mean free path is a spatial diffusion coefficient

$$D_x \sim \frac{\ell c}{3}\,. \tag{17.80}$$

It is thought that $\delta B/B \sim 10^{-1}$ in the relevant wavelength range in the interstellar medium. An estimate of the collisional mean free path is then $\ell(100\ {\rm GeV}) \sim 3\times10^{14}$ m. Now, the thickness of our galaxy's interstellar disk of gas is roughly $L \sim 3\times10^{18}\ {\rm m} \sim 10^4\,\ell$. Therefore an estimate of the cosmic-ray anisotropy is $\sim \ell/L \sim 10^{-4}$, roughly compatible with the measurements. Although this discussion is an oversimplification, it does demonstrate that the cosmic rays in both the interplanetary medium and the interstellar medium can be scattered and confined by magnetosonic waves. This allows their escape to be impeded without much loss of energy, so that their number density and energy density can be maintained at the observed level at earth.

A good question to ask at this point is "Where do the Alfvén waves come from?" The answer turns out to be that they are almost certainly created by the cosmic rays themselves. In order to proceed further and give a more quantitative description of this interaction, we must go beyond a purely fluid description and start to specify the motions of individual particles. This is where we shall turn next, in Chap. 18.

****************************

EXERCISES

Exercise 17.11 Example: Rotating Magnetospheres
Many self-gravitating cosmic bodies are both spinning and magnetized.
Examples are the earth, the sun, black holes surrounded by highly conducting accretion disks (which hold a magnetic field on the hole), neutron stars (pulsars), and magnetic white dwarfs. As a consequence of the body’s spin, large, exterior electric fields are induced, whose divergence must be balanced by free electrical charge. This implies that the region around the body cannot be vacuum. It is usually filled with plasma and is called a magnetosphere. MHD provides a convenient formalism for describing the structure of this magnetosphere. Magnetospheres are found around most planets and stars. Magnetospheres surrounding neutron stars and black holes are believed to be responsible for the emission from pulsars and quasars. As a model of a rotating magnetosphere, consider a magnetized and infinitely conducting star, spinning with angular frequency Ω∗ . Suppose that the magnetic field is stationary and axisymmetric with respect to the spin axis and that the magnetosphere is perfectly conducting.

(a) Show that the azimuthal component of the magnetospheric electric field $E_\phi$ must vanish if the magnetic field is to be stationary. Hence show that there exists a function Ω(r) which must be parallel to Ω∗ and satisfy

$$\mathbf{E} = -(\boldsymbol{\Omega}\times\mathbf{r})\times\mathbf{B}\,. \tag{17.81}$$

Show that if the motion of the magnetosphere's conducting fluid is simply a rotation, then its angular velocity must be Ω. (b) Use the induction equation (magnetic-field transport law) to show that

$$(\mathbf{B}\cdot\nabla)\boldsymbol{\Omega} = 0\,. \tag{17.82}$$

(c) Use the boundary condition at the surface of the star to show that the magnetosphere corotates with the star, i.e. Ω = Ω∗. This is known as Ferraro's law of isorotation.

Exercise 17.12 Example: Solar Wind
The solar wind is a magnetized outflow of plasma away from the solar corona. We will make a simple model of it generalizing the results from the last exercise. In this case, the fluid not only rotates with the sun but also moves away from it. We just consider stationary, axisymmetric motion in the equatorial plane and idealize the magnetic field as having the form $B_r(r)$, $B_\phi(r)$. (If this were true at all latitudes, the sun would have to contain magnetic monopoles!) (a) Use the results from the previous exercise plus the perfect MHD relation E = −v × B to argue that the velocity field can be written in the form

$$\mathbf{v} = \frac{\kappa\mathbf{B}}{\rho} + \boldsymbol{\Omega}\times\mathbf{r}\,, \tag{17.83}$$

where κ and Ω are constant along a field line. Interpret this relation kinematically. (b) Resolve the velocity and the magnetic field into radial and azimuthal components, $v_r$, $v_\phi$, $B_r$, $B_\phi$, and show that $\rho v_r r^2$ and $B_r r^2$ are constant. (c) Use the induction equation to show that

$$\frac{v_r}{v_\phi - \Omega r} = \frac{B_r}{B_\phi}\,. \tag{17.84}$$

(d) Use the equation of motion to show that the specific angular momentum, including both the mechanical and the magnetic contributions,

$$\Lambda = r v_\phi - \frac{r B_r B_\phi}{\mu_0\rho v_r}\,, \tag{17.85}$$

is constant.

(e) Combine these two relations to argue that

$$v_\phi = \frac{\Omega r\,[M_A^2\Lambda/(\Omega r^2) - 1]}{M_A^2 - 1}\,, \tag{17.86}$$

where $M_A$ is the Alfvén Mach number. Show that the solar wind must pass through a critical point where its radial speed equals the Alfvén speed. (f) In the solar wind, this critical point is located at about 20 solar radii. Explain why this implies that, through the action of the solar wind, the sun loses its spin faster than it loses its mass. (g) At earth, the radial velocity in the solar wind is about 400 km s⁻¹ and the mean proton density is about $4\times10^6\ {\rm m^{-3}}$. Estimate how long it will take the sun to slow down, and comment on your answer. (The mass of the sun is $2\times10^{30}$ kg, its radius is $7\times10^8$ m and its rotation period is about 25 days.)

****************************
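The order-of-magnitude chain of Sec. 17.7.3 (Alfvén speed, gyro radius, pitch-angle diffusion, mean free path, spatial diffusion coefficient, anisotropy) can be reproduced in a few lines. This is a rough sketch, not a model of cosmic-ray transport: it assumes a pure hydrogen plasma (ρ ≈ n m_p), an ultrarelativistic particle (so $r_G = \varepsilon/eBc$ and $\omega_G = c/r_G$), and the fiducial values quoted in that section; the helper names are hypothetical.

```python
import math

MU0 = 4e-7 * math.pi        # vacuum permeability, T m / A
M_P = 1.67262192e-27        # proton mass, kg
C = 2.99792458e8            # speed of light, m/s


def alfven_speed(B, n):
    """a = B / sqrt(mu0 rho), assuming rho = n m_p (pure hydrogen plasma)."""
    return B / math.sqrt(MU0 * n * M_P)


def gyro_radius(energy_eV, B):
    """r_G = epsilon / (e B c); with epsilon given in eV, the charge e cancels."""
    return energy_eV / (B * C)


def cosmic_ray_estimates(energy_eV, B, n, dB_over_B, L):
    rG = gyro_radius(energy_eV, B)
    ell = rG / dB_over_B**2                      # mean free path l ~ (B/dB)^2 r_G
    return {
        "a [m/s]": alfven_speed(B, n),
        "r_G [m]": rG,
        "omega_G [1/s]": C / rG,                 # gyro frequency of the cosmic ray
        "D_alpha [1/s]": dB_over_B**2 * C / rG,  # Eq. (17.79)
        "ell [m]": ell,
        "D_x [m^2/s]": ell * C / 3.0,            # Eq. (17.80)
        "anisotropy": ell / L,
    }


est = cosmic_ray_estimates(energy_eV=100e9, B=100e-12, n=1e4, dB_over_B=0.1, L=3e18)
for name, value in est.items():
    print(f"{name:>15s} = {value:.2e}")
```

With these inputs the Alfvén speed comes out near 2 × 10⁴ m s⁻¹, the gyro radius near 3 × 10¹² m, and the anisotropy near 10⁻⁴, in line with the estimates in the text at the precision such estimates carry.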

Bibliographic Note For textbook introductions to plasma physics we recommend the relevant chapters of Schmidt (1966) and Boyd and Sanderson (1969). For the theory of MHD instabilities and applications to magnetic confinement, see Bateman (1978) and Jeffrey and Taniuti (1966). For applications to astrophysics and space physics, see Parker (1979) and Parks (1991).

Bibliography

Bateman. 1978. MHD Instabilities, Cambridge, Mass.: MIT Press.
Boyd, T. J. M. & Sanderson. 1969. Plasma Dynamics, London: Nelson.
Jeffrey, A. & Taniuti, T. 1966. Magnetohydrodynamic Stability and Thermonuclear Confinement, New York: Academic Press.
Mathews, J. & Walker, R. L. 1970. Mathematical Methods of Physics, New York: W. A. Benjamin.
Moffatt, H. K. 1978. Magnetic Field Generation in Electrically Conducting Fluids, Cambridge: Cambridge University Press.
Parker, E. N. 1979. Cosmical Magnetic Fields, Oxford: Clarendon Press.
Parks, G. K. 1991. Physics of Space Plasmas, Redwood City, California: Addison Wesley.
Schmidt, G. 1966. Physics of High Temperature Plasmas, New York: Academic Press.
Shapiro, S. L. & Teukolsky, S. A. 1983. Black Holes, White Dwarfs, and Neutron Stars, New York: John Wiley and Sons.
Strachan, J. D. et al. 1994. Phys. Rev. Lett., 72, 3526; see also http://www.pppl.gov/projects/pages/tftr.html.

Box 17.2 Important Concepts in Chapter 17
• Fundamental MHD concepts and laws
  – Magnetic field B as primary electromagnetic variable; E, ρe, j expressible in terms of it, Sec. 17.2.1
  – B, j frame independent in nonrelativistic MHD; E, ρe frame dependent, Sec. 17.2.1
  – Magnetic Reynolds number and magnetic diffusion coefficient, Sec. 17.2.1
  – Evolution law for B: freezing into fluid at high magnetic Reynolds number; diffusion through fluid at lower magnetic Reynolds number, Sec. 17.2.1
  – Magnetic force on fluid expressed in various ways: j × B, minus divergence of magnetic stress tensor, curvature force orthogonal to B minus gradient of magnetic pressure orthogonal to B, Sec. 17.2.2
  – Ohmic dissipation and evolution of entropy, Sec. 17.2.2
  – Boundary conditions at a contact discontinuity and at a shock, Sec. 17.2.3
  – Pressure ratio $\beta \equiv P/(B^2/2\mu_0)$, Sec. 17.3.1
• Interaction of vorticity and magnetic fields: tangled B lines can create vorticity, vorticity can amplify B, dynamos, Secs. 17.2.4 and 17.6
• Hartmann flow: electromagnetic brake, power generator, flow meter, electromagnetic pump, Sec. 17.4
• Controlled fusion via magnetic confinement; confinement geometries (Z-pinch, θ-pinch, Tokamak) and magnetostatic equilibrium, Sec. 17.3
• Stability of magnetostatic (hydromagnetic) equilibria, Sec. 17.5
  – Lagrangian and Eulerian perturbations, linearized dynamical equation for the fluid displacement ξ, self-adjointness, Sturm-Liouville eigenequation, Sec. 17.5.1
  – Energy principle (action principle) for eigenfrequencies, Sec. 17.5.3
  – Sausage and kink instabilities for Z-pinch configuration, Sec. 17.5.2
• Reconnection of magnetic field lines, Sec. 17.6.3
• Magnetosonic waves and dispersion relations, Sec. 17.7.2
  – Alfvén mode, fast magnetosonic mode, slow magnetosonic mode, Sec. 17.7.2
  – Scattering of cosmic rays by Alfvén waves, Sec. 17.7.3

Contents

V PLASMA PHYSICS

18 The Particle Kinetics of Plasma
18.1 Overview
18.2 Examples of Plasmas and their Density-Temperature Regimes
18.2.1 Ionization boundary
18.2.2 Degeneracy boundary
18.2.3 Relativistic boundary
18.2.4 Pair-production boundary
18.2.5 Examples of natural and man-made plasmas
18.3 Collective Effects in Plasmas – Debye Shielding and Plasma Oscillations
18.3.1 Debye Shielding
18.3.2 Collective behavior
18.3.3 Plasma Oscillations and Plasma Frequency
18.4 Coulomb Collisions
18.4.1 Collision frequency
18.4.2 The Coulomb logarithm
18.4.3 Thermal Equilibration Times in a Plasma
18.4.4 Discussion
18.5 Transport Coefficients
18.5.1 Anomalous Resistivity and Anomalous Equilibration
18.6 Magnetic Field
18.6.1 Cyclotron frequency and Larmor radius
18.6.2 Validity of the Fluid Approximation
18.6.3 Conductivity Tensor
18.7 Adiabatic Invariants
18.7.1 Homogeneous, time-independent magnetic field
18.7.2 Homogeneous time-independent electric and magnetic fields
18.7.3 Inhomogeneous, time-independent magnetic field
18.7.4 A slowly time varying magnetic field

Part V PLASMA PHYSICS


Plasma Physics
Version 0818.1.K.pdf, April 1, 2009.

A plasma is a gas that is significantly ionized (through heating or photoionization) and thus is composed of electrons and ions, and that has a low enough density to behave classically, i.e. to obey Maxwell-Boltzmann statistics rather than Fermi-Dirac or Bose-Einstein. Plasma physics originated in the nineteenth century, in the study of gas discharges (Crookes 1879). However, it was soon realised that plasma is also the key to understanding the propagation of radio waves across the Atlantic (Heaviside 1902). The subject received a further boost in the early 1950s, with the start of the controlled (and the uncontrolled) thermonuclear fusion program. The various confinement devices described in the preceding chapter are intended to hold plasma at temperatures as high as $\sim 10^8$ K; the difficulty of this task has turned out to be an issue of plasma physics as much as MHD. After fusion, the next new venue for plasma research was extraterrestrial. Although it was already understood that the Earth was immersed in a tenuous outflow of ionized hydrogen known as the solar wind, the dawn of the space age in 1957 also initiated experimental space plasma physics. More recently, the interstellar and intergalactic media beyond the solar system as well as exotic astronomical objects like quasars and pulsars have allowed us to observe plasmas under quite extreme conditions, unreproducible in any laboratory experiment. The dynamical behavior of a plasma is more complex than the dynamics of the gases and fluids we have met so far. This dynamical complexity has two main origins: (i) The dominant form of interparticle interaction in a plasma, Coulomb scattering, is so weak that the mean free paths of the electrons and ions are often larger than the plasma's macroscopic length scales.
This allows the particles' momentum distribution functions to deviate seriously from their equilibrium Maxwellian forms and, in particular, to be highly anisotropic. (ii) The electromagnetic fields in a plasma are of long range. This allows charged particles to couple to each other electromagnetically and act in concert as modes of excitation (plasma waves or plasmons) that behave like single dynamical entities. Much of plasma physics consists of the study of the properties and interactions of these modes.

The dynamical behavior of a plasma depends markedly on frequency. At the lowest frequencies the ions and electrons are locked together by electrostatic forces and behave like an electrically conducting fluid; this is the regime of magnetohydrodynamics (MHD; Chap. 17). At somewhat higher frequencies the electrons and the ions can move relative to each other, behaving like two separate, interpenetrating fluids; we shall study this two-fluid regime in Chap. 19. At still higher frequencies, complex dynamics is supported by momentum-space anisotropies and can be analyzed using a variant of the kinetic-theory collisionless Boltzmann equation that we introduced in Chap. 2. We shall study such dynamics in Chap. 20. In the two-fluid and collisionless-Boltzmann analyses of Chaps. 19 and 20 we focus on phenomena that can be treated as linear perturbations of an equilibrium state. However, the complexities and long mean free paths of plasmas also produce rich nonlinear phenomena; we shall study some of these in Chap. 21. As a foundation for the dynamical studies in Chaps. 19, 20, and 21, we develop in Chap. 18 detailed insights into the microscopic structure of a plasma.

Chapter 18 The Particle Kinetics of Plasma

Box 18.1 Reader's Guide
• This chapter relies significantly on portions of nonrelativistic kinetic theory as developed in Chap. 2.
• It also relies a bit, but not greatly, on portions of magnetohydrodynamics as developed in Chap. 17.
• The remaining chapters 19–21 of Part V, Plasma Physics, rely heavily on this chapter.

18.1

Overview

The preceding chapter, Chap. 17, can be regarded as a transition from fluid mechanics toward plasma physics: it described equilibrium and low-frequency dynamical phenomena in a magnetized plasma using fluid-mechanics techniques. In this chapter, we prepare for more sophisticated descriptions of plasma by introducing a number of elementary foundational concepts peculiar to plasma, and by exploring a plasma's structure on the scale of individual particles using elementary techniques from kinetic theory. Specifically, in Sec. 18.2 we identify the region of densities and temperatures in which matter, in statistical equilibrium, takes the form of a plasma, and we meet a number of specific examples of plasmas that occur in Nature and in the laboratory. Then in Sec. 18.3 we study two phenomena that are important for plasmas: the collective manner in which large numbers of electrons and ions shield out the electric field of a charge in a plasma (Debye shielding), and oscillations of a plasma's electrons relative to its ions (plasma oscillations). In Sec. 18.4, we study the Coulomb scattering by which a plasma's electrons and ions deflect an individual charged particle from straight-line motion and exchange energy with it. We then examine the statistical properties of large numbers of such Coulomb scatterings, most importantly the rates (inverse timescales) for the velocities of a plasma's electrons and ions to isotropize, and the rates for them to thermalize. Our calculations reveal that Coulomb scattering is so weak that, in most plasmas encountered in Nature, it is unlikely to produce isotropized or thermalized velocity distributions. In Sec. 18.5 we use the statistical properties of Coulomb scatterings to derive a plasma's transport coefficients, specifically its electrical and thermal conductivities, for situations where Coulomb scattering dominates over particle-plasmon scattering. In Sec. 18.5.1 we give a brief preview of the fact that in real plasmas the scattering of electrons and ions off collective plasma excitations (plasmons) will often isotropize and thermalize their velocities far faster than would Coulomb scattering, and will cause many real plasmas to be far more isotropic and thermalized than our Coulomb-scattering analyses suggest. We shall explore this "anomalous" behavior in Chaps. 21 and 22. Most plasmas are significantly magnetized. This introduces important new features into their dynamics, which we describe in Sec. 18.6: cyclotron motion (the spiraling of particles around magnetic field lines), a resulting anisotropy of the plasma's pressure (different pressure along and orthogonal to the field lines), and the split of a plasma's adiabatic index into four different adiabatic indices for four different types of compression. Finally, in Sec. 18.7, we examine the motion of an individual charged particle in a slightly inhomogeneous and slowly time varying magnetic field, and we describe adiabatic invariants which control that motion in easily understood ways.

18.2

Examples of Plasmas and their Density-Temperature Regimes

The density-temperature regime in which matter behaves as a nonrelativistic plasma is shown in Fig. 18.1. In this figure, and in most of Part V, we shall confine our attention to pure hydrogen plasma comprising protons and electrons. Many plasmas contain large fractions of other ions, which can have larger charges and do have greater masses than protons. This generalization introduces few new issues of principle so, for simplicity, we shall eschew it. The boundaries of the plasma regime in Fig. 18.1 are dictated by the following considerations:

18.2.1

Ionization boundary

We shall be mostly concerned with fully ionized plasmas, even though partially ionized plasmas such as the ionosphere are often encountered in physics, astronomy, and engineering. The plasma regime's ionization boundary is the bottom curve in Fig. 18.1, at a temperature of a few thousand degrees. This boundary is dictated by chemical equilibrium for the reaction

$$\mathrm{H} \leftrightarrow p + e \tag{18.1}$$

as described by the Saha equation (Ex. 4.6):

$$\frac{n_e n_p}{n_H} = \frac{(2\pi m_e k_B T)^{3/2}}{h^3}\,e^{-I_P/k_B T}\,. \tag{18.2}$$

Here $n_e$, $n_p$, $n_H$ are the number densities of electrons, protons, and neutral hydrogen atoms (at the relevant temperatures hydrogen molecules have dissociated into individual atoms); T is temperature; $m_e$ is the electron rest mass; h is Planck's constant; $k_B$ is Boltzmann's constant; and $I_P = 13.6$ eV is the ionization potential of hydrogen, i.e., the binding energy of its ground state. The boundary plotted in Fig. 18.1 is that of 50 percent ionization, i.e., $n_e = n_p = n_H = \rho/2m_H$ (with $m_H$ the mass of a hydrogen atom); but because of the exponential factor in Eq. (18.2), the line of 90 percent ionization is virtually indistinguishable from that of 50 percent ionization on the scale of the figure. Using the rough equivalence 1 eV ≃ $10^4$ K, we might have expected that the ionization boundary would correspond to a temperature $T \sim I_P/k_B \sim 10^5$ K. However, this is true only near the degeneracy boundary (see below). When the plasma is strongly nondegenerate, ionization occurs at a significantly lower temperature, due to the vastly greater number of states available to an electron when free than when bound in a hydrogen atom. Equivalently, at low densities, once a hydrogen atom has been broken up into an electron plus a proton, the electron (or proton) must travel a large distance before encountering another proton (or electron) with which to recombine, making a new hydrogen atom; as a result, equilibrium is reached at a lower temperature, where the correspondingly reduced ionization rate matches the smaller recombination rate.

[Figure 18.1: log T (K, left axis) and log $k_BT$ (keV, right axis) versus log n (m⁻³, top axis) and log ρ (kg m⁻³, bottom axis). Labeled boundaries: ionized/recombined, nondegenerate/degenerate, relativistic/nonrelativistic, pairs, and collective behavior/independent particles. Labeled plasmas: intergalactic medium, interstellar medium, solar wind, magnetosphere, ionosphere, gas discharge, fusion experiments, Sun's core.]

Fig. 18.1: The density-temperature regime in which matter, made largely of hydrogen, behaves as a nonrelativistic plasma. The densities and temperatures of specific examples of plasmas are indicated by dashed lines. The number density of electrons n is shown horizontally at the top, and the corresponding mass density ρ is shown at the bottom. The temperature T is shown at the left in kelvins, and at the right $k_BT$ is shown in keV (thousands of electron volts).
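The 50 percent ionization boundary can be traced by solving Eq. (18.2) for T at fixed density. The sketch below assumes, as in the text, pure hydrogen with $n_e = n_p = n_H$, uses CODATA values for the constants, and works with logarithms so that the Boltzmann factor never underflows; the function names are illustrative, not from the text.

```python
import math

M_E = 9.1093837e-31             # electron mass, kg
K_B = 1.380649e-23              # Boltzmann constant, J/K
H = 6.62607015e-34              # Planck constant, J s
I_P = 13.6 * 1.602176634e-19    # hydrogen ionization potential, J


def log_saha_rhs(T):
    """Natural log of the right side of Eq. (18.2)."""
    return 1.5 * math.log(2.0 * math.pi * M_E * K_B * T / H**2) - I_P / (K_B * T)


def half_ionization_temperature(n_e, T_lo=1.0e3, T_hi=1.0e6, iters=100):
    """Temperature at which hydrogen is 50 percent ionized at electron density n_e.

    With n_p = n_H, Eq. (18.2) reduces to n_e = RHS(T). Since log RHS(T) increases
    monotonically with T, bisection in log T on log RHS(T) - log n_e converges.
    """
    f = lambda T: log_saha_rhs(T) - math.log(n_e)
    if not (f(T_lo) < 0.0 < f(T_hi)):
        raise ValueError("root not bracketed; widen [T_lo, T_hi]")
    for _ in range(iters):
        T_mid = math.sqrt(T_lo * T_hi)   # geometric midpoint
        if f(T_mid) > 0.0:
            T_hi = T_mid
        else:
            T_lo = T_mid
    return math.sqrt(T_lo * T_hi)


for n in (1e6, 1e12, 1e20, 1e26):
    print(f"n_e = {n:.0e} m^-3  ->  T_50% ~ {half_ionization_temperature(n):.3g} K")
```

For $n_e = 10^6\ {\rm m^{-3}}$ the root is roughly 3 × 10³ K, and for $n_e = 10^{20}\ {\rm m^{-3}}$ it rises to roughly 9 × 10³ K, both well below $I_P/k_B \approx 1.6\times10^5$ K, as argued above.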

18.2.2

Degeneracy boundary

The electrons, with their small rest masses, become degenerate more easily than the protons or hydrogen atoms. The slanting line on the right side of Fig. 18.1 is the plasma's boundary of electron degeneracy. This boundary is determined by the demand that the mean occupation number of the electrons' single-particle quantum states not be ≪ 1. In other words, the volume of phase space per electron, i.e. the product of the volume of real space $\sim n_e^{-1}$ and the volume of momentum space $\sim (m_e k_B T)^{3/2}$ occupied by each electron, should be comparable with the elementary quantum mechanical phase-space volume given by the uncertainty principle, $h^3$. Inserting the appropriate factors of order unity [cf. Eq. (2.39)], this relation becomes the boundary equation

$$n_e \simeq 2\,\frac{(2\pi m_e k_B T)^{3/2}}{h^3}\,. \tag{18.3}$$

When the electrons become degenerate (rightward of the degeneracy line in Fig. 18.1), as they do in a metal or a white dwarf star, the electron de Broglie wavelength becomes large compared with the mean interparticle spacing, and quantum mechanical considerations are of paramount importance.
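Equation (18.3) gives the degeneracy boundary directly as a density at each temperature. A minimal sketch, with illustrative function names and an optional safety margin of our own choosing:

```python
import math

M_E = 9.1093837e-31      # electron mass, kg
K_B = 1.380649e-23       # Boltzmann constant, J/K
H = 6.62607015e-34       # Planck constant, J s


def degeneracy_density(T):
    """Electron density on the degeneracy boundary, Eq. (18.3)."""
    return 2.0 * (2.0 * math.pi * M_E * K_B * T) ** 1.5 / H**3


def is_nondegenerate(n_e, T, margin=1.0):
    """True if n_e lies below the boundary (leftward in Fig. 18.1) by more than `margin`."""
    return n_e * margin < degeneracy_density(T)


# Boundary density at the solar-core temperature of 1e7 K.
print(f"{degeneracy_density(1e7):.2e} m^-3")
```

At T = 10⁷ K the boundary density is about 1.5 × 10³² m⁻³, so the solar core ($n_e \sim 10^{32}\ {\rm m^{-3}}$ in Table 18.1) sits essentially on the degeneracy line; the $T^{3/2}$ scaling is what makes the boundary a straight slanting line in the log-log plot.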

18.2.3

Relativistic boundary

Another important limit arises when the electron thermal speeds become relativistic. This occurs when

$$T \sim m_e c^2/k_B \sim 6\times10^9\ {\rm K} \tag{18.4}$$

(top horizontal line in Fig. 18.1). Although we shall not consider them much further, the properties of relativistic plasmas (above this line) are mostly analogous to those of nonrelativistic plasmas (below this line).
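The numerical value in Eq. (18.4) follows from the electron rest energy; a short arithmetic check using CODATA constants:

```python
M_E = 9.1093837e-31      # electron mass, kg
C = 2.99792458e8         # speed of light, m/s
K_B = 1.380649e-23       # Boltzmann constant, J/K
EV = 1.602176634e-19     # joules per electron volt

rest_energy = M_E * C**2          # m_e c^2 in J
T_rel = rest_energy / K_B         # temperature of Eq. (18.4), K
print(f"m_e c^2 = {rest_energy / EV / 1e3:.1f} keV, T_rel = {T_rel:.2e} K")
```

This gives $m_ec^2 \approx 511$ keV and T ≈ 5.9 × 10⁹ K, i.e. log($k_BT$/keV) ≈ 2.7 on the right-hand axis of Fig. 18.1.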

18.2.4

Pair-production boundary

Finally, for plasmas in statistical equilibrium, electron-positron pairs are created in profusion at high enough temperatures. In Ex. 4.5 we showed that, for $k_B T \ll m_e c^2$ but T high enough that pairs begin to form, the density of positrons divided by that of protons is

$$\frac{n_+}{n_p} = \frac{1}{2y\,[y + (1+y^2)^{1/2}]}\,, \qquad\text{where}\qquad y \equiv \frac{1}{4}\,n_p\left(\frac{h}{\sqrt{2\pi m_e k_B T}}\right)^3 e^{m_e c^2/k_B T}\,. \tag{18.5}$$

Setting this expression to unity gives the pair-production boundary. This boundary curve, labeled "Pairs" in Fig. 18.1, is similar in shape to the ionization boundary but shifted in temperature by $\sim 2\times10^4 \sim \alpha_F^{-2}$, where $\alpha_F$ is the fine structure constant. This is because we are now effectively "ionizing the vacuum" rather than a hydrogen atom, and the "ionization potential of the vacuum" is $\sim 2m_e c^2 = 4I_P/\alpha_F^2$. We shall encounter a plasma above the pair-production boundary, and thus with a profusion of electron-positron pairs, in our discussion of the early universe in Chap. 25.
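The pair-production boundary can be located numerically by setting $n_+/n_p = 1$ in Eq. (18.5), which happens at $y = 1/(2\sqrt2)$ (at that value $y + (1+y^2)^{1/2} = \sqrt2$, so $2y[y + (1+y^2)^{1/2}] = 1$). The sketch below evaluates y in logarithms, because the factor $e^{m_e c^2/k_BT}$ overflows at low temperatures, and bisects in log T. It inherits the restriction $k_BT \ll m_e c^2$ of Eq. (18.5), and the function names are ours.

```python
import math

M_E = 9.1093837e-31      # electron mass, kg
K_B = 1.380649e-23       # Boltzmann constant, J/K
H = 6.62607015e-34       # Planck constant, J s
C = 2.99792458e8         # speed of light, m/s


def log_y(n_p, T):
    """Natural log of y in Eq. (18.5); computed in logs to avoid overflow."""
    return (math.log(n_p / 4.0)
            - 1.5 * math.log(2.0 * math.pi * M_E * K_B * T / H**2)
            + M_E * C**2 / (K_B * T))


def positron_fraction(n_p, T):
    """n_+/n_p from Eq. (18.5); meaningful only for k_B T << m_e c^2."""
    ly = log_y(n_p, T)
    if ly > 300.0:
        # y is huge: 1/(2y[y + sqrt(1+y^2)]) -> 1/(4 y^2), evaluated in logs
        return math.exp(-math.log(4.0) - 2.0 * ly)
    y = math.exp(ly)
    return 1.0 / (2.0 * y * (y + math.sqrt(1.0 + y * y)))


def pair_boundary_temperature(n_p, T_lo=1.0e6, T_hi=3.0e9, iters=100):
    """Temperature at which n_+ = n_p, i.e. y = 1/(2 sqrt 2); log y falls with T."""
    target = -math.log(2.0 * math.sqrt(2.0))
    for _ in range(iters):
        T_mid = math.sqrt(T_lo * T_hi)
        if log_y(n_p, T_mid) > target:
            T_lo = T_mid     # too cold: too few pairs
        else:
            T_hi = T_mid
    return math.sqrt(T_lo * T_hi)


T_pair = pair_boundary_temperature(1e20)
print(f"T_pair ~ {T_pair:.2e} K, n+/n_p there = {positron_fraction(1e20, T_pair):.3f}")
```

For $n_p = 10^{20}\ {\rm m^{-3}}$ this gives a boundary temperature of about 1.8 × 10⁸ K, where $k_BT/m_ec^2 \approx 0.03$, so the approximation behind Eq. (18.5) still holds there.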

18.2.5

Examples of natural and man-made plasmas

Figure 18.1 and Table 18.1 show the temperature-density regions for the following plasmas:

• Laboratory gas discharge. The plasmas created in the laboratory by electric currents flowing through hot gas, e.g., in vacuum tubes, spark gaps, welding arcs, and neon and fluorescent lights.
• Controlled thermonuclear fusion experiments. The plasmas in which experiments for controlled thermonuclear fusion are carried out, e.g., in tokamaks.
• Ionosphere. The part of the earth's upper atmosphere (at heights of ∼50–300 km) that is partially photoionized by solar ultraviolet radiation.
• Magnetosphere. The plasma of high-speed electrons and ions that are locked onto the earth's dipolar magnetic field and slide around on its field lines at several earth radii.
• Sun's core. The plasma at the center of the sun, where fusion of hydrogen to form helium generates the sun's heat.
• Solar wind. The wind of plasma that blows off the sun and outward through the region between the planets.
• Interstellar medium. The plasma, in our Galaxy, that fills the region between the stars; this plasma exhibits a fairly wide range of density and temperature as a result of such processes as heating by photons from stars, heating and compression by shock waves from supernovae, and cooling by thermal emission of radiation.
• Intergalactic medium. The plasma that fills the space outside galaxies and clusters of galaxies; we shall meet the properties and evolution of this intergalactic plasma in our study of cosmology, in the last chapter of this book.

Characteristic plasma properties in these various environments are collected in Table 18.1. In the next three chapters we shall study applications from all these environments.

Plasma                 n_e (m^-3)  T (K)   B (T)    λ_D (m)  N_D     ω_p (s^-1)  ν_ee (s^-1)  ω_c (s^-1)  r_L (m)
Gas discharge          10^16       10^4    n/a      10^-4    10^4    10^10       10^5         n/a         n/a
Tokamak                10^20       10^8    10       10^-4    10^8    10^12       10^4         10^12       10^-5
Ionosphere             10^12       10^3    10^-5    10^-3    10^5    10^8        10^3         10^6        10^-1
Magnetosphere          10^7        10^7    10^-8    10^2     10^10   10^5        10^-8        10^3        10^4
Solar core             10^32       10^7    n/a      10^-11   1       10^18       10^16        n/a         n/a
Solar wind             10^6        10^5    10^-9    10       10^11   10^5        10^-6        10^2        10^4
Interstellar medium    10^5        10^4    10^-10   10       10^10   10^4        10^-5        10          10^4
Intergalactic medium   1           10^6    n/a      10^5     10^15   10^2        10^-13       n/a         n/a

Table 18.1: Representative densities, temperatures and magnetic field strengths together with derived plasma parameters in a variety of environments. For definitions, see text. Values are given to order of magnitude as all of these environments are quite inhomogeneous.

****************************

EXERCISES

Exercise 18.1 Derivation: Boundary of Degeneracy
Show that the condition $n_e \ll (m_e k_B T)^{3/2}/h^3$ [cf. Eq. (18.3)] that electrons be nondegenerate is equivalent to the following statements:

(a) The mean separation between electrons, $l \equiv n_e^{-1/3}$, is large compared to the de Broglie wavelength, $\bar\lambda_{dB} = \hbar/(\text{momentum})$, of an electron whose kinetic energy is $k_B T$.
(b) The uncertainty in the location of an electron drawn at random from the thermal distribution is small compared to the average inter-electron spacing.
(c) The quantum mechanical zero-point energy associated with squeezing each electron into a region of size $l = n_e^{-1/3}$ is small compared to the electron's mean thermal energy $k_B T$.

****************************
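Several entries in Table 18.1 can be regenerated from $n_e$, T and B. Their definitions come later in the chapter, so the sketch below assumes the standard ones: $\lambda_D = (\epsilon_0 k_BT/n_e e^2)^{1/2}$ (consistent with the exponent $\sqrt2\,r/\lambda_D$ of Eq. (18.9)), $N_D = (4\pi/3)\,n_e\lambda_D^3$, $\omega_p = (n_e e^2/\epsilon_0 m_e)^{1/2}$, and $\omega_c = eB/m_e$ for electrons. The $\nu_{ee}$ and $r_L$ columns are left out because they depend on a Coulomb logarithm and a thermal-speed convention not stated here; the function names are hypothetical.

```python
import math

E = 1.602176634e-19      # elementary charge, C
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
M_E = 9.1093837e-31      # electron mass, kg
K_B = 1.380649e-23       # Boltzmann constant, J/K


def debye_length(n_e, T):
    """Assumed definition: lambda_D = sqrt(eps0 k_B T / (n_e e^2))."""
    return math.sqrt(EPS0 * K_B * T / (n_e * E**2))


def debye_number(n_e, T):
    """Assumed definition: N_D = (4 pi / 3) n_e lambda_D^3."""
    return 4.0 * math.pi / 3.0 * n_e * debye_length(n_e, T) ** 3


def plasma_frequency(n_e):
    """Electron plasma frequency omega_p = sqrt(n_e e^2 / (eps0 m_e))."""
    return math.sqrt(n_e * E**2 / (EPS0 * M_E))


def cyclotron_frequency(B):
    """Electron cyclotron frequency omega_c = e B / m_e."""
    return E * B / M_E


def orders_of_magnitude(n_e, T, B):
    """Rounded log10 of each derived quantity, in the style of Table 18.1."""
    values = {
        "lambda_D": debye_length(n_e, T),
        "N_D": debye_number(n_e, T),
        "omega_p": plasma_frequency(n_e),
        "omega_c": cyclotron_frequency(B),
    }
    return {name: round(math.log10(v)) for name, v in values.items()}


print("tokamak   ", orders_of_magnitude(1e20, 1e8, 10.0))
print("ionosphere", orders_of_magnitude(1e12, 1e3, 1e-5))
```

Rounding the base-10 logarithms reproduces these columns of the tokamak and ionosphere rows; other rows may differ by one in the exponent, since the tabulated values are themselves order-of-magnitude estimates.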


18.3

Collective Effects in Plasmas – Debye Shielding and Plasma Oscillations

In this section we introduce two key ideas that are associated with most of the collective effects in plasma dynamics: Debye shielding and plasma oscillation.

18.3.1

Debye Shielding

Any charged particle inside a plasma attracts other particles with opposite charge and repels those with the same charge, thereby creating a net cloud of opposite charges around itself. This cloud shields the particle's own charge from external view; i.e., it causes the particle's Coulomb field to fall off exponentially at large radii, rather than falling off as 1/r².[1]

This Debye shielding of a particle's charge can be demonstrated and quantified as follows: Consider a single fixed test charge Q surrounded by a plasma of protons and electrons. Let us define average densities for electrons and protons, n_e(r) and n_p(r), as smooth functions of radius r from the test charge, and let the mean densities of electrons and protons (which must be equal because there must be overall charge neutrality) be n̄. Then the electrostatic potential Φ(r) outside the particle satisfies Poisson's equation, which we write in SI units:[2]

$$\nabla^2\Phi = -\frac{(n_p - n_e)\,e}{\epsilon_0} - \frac{Q}{\epsilon_0}\,\delta(\mathbf{r}) . \tag{18.6}$$

(We denote the positive charge of a proton by +e and the negative charge of an electron by −e.) A proton at radius r from the particle has an electrostatic potential energy eΦ(r). Correspondingly, the number density of protons at radius r is altered from n̄ by the Boltzmann factor exp(−eΦ/k_B T); and, similarly, the density of electrons is altered by exp(+eΦ/k_B T):

$$n_p = \bar n\,e^{-e\Phi/k_B T} \simeq \bar n\left(1 - \frac{e\Phi}{k_B T}\right) , \qquad n_e = \bar n\,e^{+e\Phi/k_B T} \simeq \bar n\left(1 + \frac{e\Phi}{k_B T}\right) , \tag{18.7}$$

where we have made a Taylor expansion of the Boltzmann factor valid for eΦ ≪ k_B T. By inserting the linearized versions of Eq. (18.7) into Eq. (18.6), we obtain

$$\nabla^2\Phi = \frac{2e^2\bar n}{\epsilon_0 k_B T}\,\Phi - \frac{Q}{\epsilon_0}\,\delta(\mathbf{r}) . \tag{18.8}$$

The spherically symmetric solution to this equation,

$$\Phi = \frac{Q}{4\pi\epsilon_0 r}\,e^{-\sqrt{2}\,r/\lambda_D} , \tag{18.9}$$

has the form of a Coulomb field with an exponential cutoff.[3] The characteristic lengthscale of the exponential cutoff,

$$\lambda_D \equiv \left(\frac{\epsilon_0 k_B T}{\bar n e^2}\right)^{1/2} = 69\left(\frac{T/1\,\mathrm{K}}{\bar n/1\,\mathrm{m^{-3}}}\right)^{1/2}\ \mathrm{m} , \tag{18.10}$$

is called the Debye length. It is a rough measure of the size of the Debye shielding cloud that the charged particle carries with itself. The charged particle could be some foreign charged object (not a plasma electron or proton), or equally well, it could be one of the plasma's own electrons or protons. Thus, we can think of each electron in the plasma as carrying with itself a positively charged Debye shielding cloud of size λ_D, and each proton as carrying a negatively charged cloud. Each electron and proton not only carries its own cloud; it also plays a role as one of the contributors to the clouds around other electrons and protons.

[1] Analogous effects are encountered in condensed matter physics and quantum electrodynamics.

[2] For those who prefer Gaussian units, the translation is most easily effected by the transformations 4πε0 → 1 and μ0/4π → 1, and inserting factors of c by inspection using dimensional analysis. It is also useful to recall that 1 T ≡ 10^4 Gauss and that the charge on an electron is −1.6 × 10^{-19} C ≡ −4.8 × 10^{-10} esu.
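As a quick consistency check (a sketch added here, not part of the original derivation), Eq. (18.9) can be substituted back into Eq. (18.8). For a spherically symmetric potential the Laplacian is (1/r) d²(rΦ)/dr², and rΦ is a pure exponential, so for r > 0

$$r\Phi = \frac{Q}{4\pi\epsilon_0}\,e^{-\sqrt{2}\,r/\lambda_D} \;\Longrightarrow\; \nabla^2\Phi = \frac{1}{r}\frac{d^2(r\Phi)}{dr^2} = \frac{2}{\lambda_D^2}\,\Phi = \frac{2e^2\bar n}{\epsilon_0 k_B T}\,\Phi ,$$

where the last step uses λ_D² = ε0 k_B T/(n̄ e²) from Eq. (18.10). As r → 0 the exponential tends to unity, Φ → Q/(4πε0 r), and the identity ∇²(1/r) = −4πδ(r) supplies the source term −(Q/ε0)δ(r).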

18.3.2

Collective behavior

A charged particle's Debye cloud is almost always made of a huge number of electrons, and very nearly the same number of protons. It is only a tiny, time-averaged excess of electrons over protons (or protons over electrons) that produces the cloud's net charge and the resulting exponential decay of the electrostatic potential. Ignoring this tiny excess, the mean number of electrons in the cloud and the mean number of protons are roughly

$$N_D \equiv \bar n\,\frac{4\pi}{3}\lambda_D^3 = 1.4\times10^{6}\,\frac{(T/1\,\mathrm{K})^{3/2}}{(\bar n/1\,\mathrm{m^{-3}})^{1/2}} . \tag{18.11}$$

This Debye number is large compared to unity throughout the density-temperature regime of plasmas, except for the tiny lower right-hand corner of Fig. 18.1. The boundary of that corner region (labeled “Collective behavior / Independent Particles”) is given by ND = 1. The upper left-hand side of that boundary has ND ≫ 1 and is called the “regime of collective behavior” because a huge number of particles are collectively responsible for the Debye cloud, and this leads to a variety of collective dynamical phenomena in the plasma. The lower right-hand side has ND < 1 and is called the “regime of independent particles” because in it collective phenomena are of small importance. In this book we shall restrict ourselves to the huge regime of collective behavior and ignore the tiny regime of independent particles. Characteristic values for the Debye length in a variety of environments are collected in Table 18.1.
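The two numerical prefactors in Eqs. (18.10) and (18.11) can be reproduced with a short Python sketch. The SI constant values below are standard CODATA numbers supplied here (they are not given in the text), and the densities and temperatures are the order-of-magnitude entries of Table 18.1, so the outputs should be compared with the table only to within a factor of a few.

```python
import math

# SI constants (CODATA values, supplied for this sketch)
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
KB = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C


def debye_length(n, T):
    """Debye length lambda_D = (eps0 kB T / (n e^2))^(1/2), Eq. (18.10), in metres."""
    return math.sqrt(EPS0 * KB * T / (n * E_CHARGE**2))


def debye_number(n, T):
    """Debye number N_D = n (4 pi / 3) lambda_D^3, Eq. (18.11)."""
    return n * (4.0 * math.pi / 3.0) * debye_length(n, T) ** 3


# At n = 1 m^-3 and T = 1 K these reduce to the numerical prefactors of the text.
print(f"lambda_D prefactor: {debye_length(1.0, 1.0):.1f} m")
print(f"N_D prefactor:      {debye_number(1.0, 1.0):.2e}")

# Order-of-magnitude (n_e [m^-3], T [K]) pairs copied from Table 18.1
for name, n, T in [("Tokamak", 1e20, 1e8),
                   ("Solar wind", 1e6, 1e5),
                   ("Interstellar medium", 1e5, 1e4)]:
    print(f"{name:20s} lambda_D = {debye_length(n, T):.1e} m, "
          f"N_D = {debye_number(n, T):.1e}")
```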

18.3.3

Plasma Oscillations and Plasma Frequency

Of all the dynamical phenomena that can occur in a plasma, perhaps the most important is a relative oscillation of the plasma's electrons and protons. The simplest version of this plasma oscillation is depicted in Fig. 18.2. Suppose for the moment that the protons are all fixed and displace the electrons rightward (in the x-direction) with respect to the protons by an amount ξ, thereby producing a net negative charge per unit area −e n̄ ξ at the right end of the plasma, a net positive charge per unit area +e n̄ ξ at the left end, and a corresponding electric field E = e n̄ ξ/ε0 in the x-direction throughout the plasma. The electric field pulls on the plasma's electrons and protons, giving the electrons an acceleration d²ξ/dt² = −eE/m_e and the protons an acceleration smaller by m_e/m_p = 1/1836, which we shall neglect. The result is an equation of motion for the electrons' collective displacement:

$$\frac{d^2\xi}{dt^2} = -\frac{e}{m_e}\,E = -\frac{e^2\bar n}{\epsilon_0 m_e}\,\xi . \tag{18.12}$$

Fig. 18.2: Idealized depiction of the displacement of electrons relative to protons, which occurs during plasma oscillations. [Figure: a layer of protons (+) and a layer of electrons (−) displaced from it by ξ; the interior is neutral, and the field there is E = e n̄ ξ/ε0.]

[3] In nuclear physics this potential is known as a Yukawa potential.

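Since Eq. (18.12) is a harmonic-oscillator equation, the electrons oscillate sinusoidally, ξ = ξ_o cos(ω_p t), at the plasma frequency

$$\omega_p \equiv \left(\frac{\bar n e^2}{\epsilon_0 m_e}\right)^{1/2} = 56.4\left(\frac{\bar n}{1\,\mathrm{m^{-3}}}\right)^{1/2}\ \mathrm{s^{-1}} . \tag{18.13}$$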

Notice that this frequency of plasma oscillations depends only on the plasma density n̄ and not on its temperature or on the strength of any magnetic field that might be present. If we define the electron thermal speed to be v_e ≡ (k_B T_e/m_e)^{1/2}, then ω_p = v_e/λ_D. In other words, a thermal electron travels about a Debye length in a time 1/ω_p (a plasma period divided by 2π). Just as the Debye length functions as the electrostatic correlation length, so the plasma period plays the role of the electrostatic correlation time. Characteristic values for the plasma frequency in a variety of environments are collected in Table 18.1.
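A minimal Python sketch of Eq. (18.13), assuming the same CODATA constants as above, also confirms numerically that ω_p λ_D = v_e for the thermal speed v_e ≡ (k_B T_e/m_e)^{1/2}; the temperature cancels, so the identity holds for any n̄ and T. The tokamak-like values are taken from Table 18.1 purely for illustration.

```python
import math

# SI constants (CODATA values, supplied for this sketch)
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
KB = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C
M_E = 9.1093837015e-31       # electron mass, kg


def plasma_frequency(n):
    """omega_p = (n e^2 / (eps0 m_e))^(1/2), Eq. (18.13), in s^-1."""
    return math.sqrt(n * E_CHARGE**2 / (EPS0 * M_E))


def debye_length(n, T):
    """lambda_D from Eq. (18.10), in metres."""
    return math.sqrt(EPS0 * KB * T / (n * E_CHARGE**2))


def electron_thermal_speed(T):
    """v_e = (kB T / m_e)^(1/2), the thermal speed defined in the text."""
    return math.sqrt(KB * T / M_E)


print(f"omega_p prefactor: {plasma_frequency(1.0):.1f} s^-1")

# omega_p * lambda_D / v_e should equal 1 for any density and temperature
n, T = 1e20, 1e8   # tokamak-like values from Table 18.1
ratio = plasma_frequency(n) * debye_length(n, T) / electron_thermal_speed(T)
print(f"omega_p * lambda_D / v_e = {ratio:.6f}")
```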

18.4

Coulomb Collisions

In this section we will study transport coefficients (electrical and thermal conductivities) and the establishment of local thermodynamic equilibrium in a plasma under the hypothesis that Coulomb collisions provide the dominant source of scattering for both electrons and protons. In fact, as we shall see later, Coulomb scattering is usually a less effective scattering mechanism than collisionless processes mediated by fluctuating electromagnetic fields.

Fig. 18.3: The geometry of a Coulomb collision. [Figure: an electron e moving with velocity v passes a proton p at impact parameter b and is deflected through the angle θ_D.]

18.4.1

Collision frequency

Consider first, as we did in our discussion of Debye screening, a single test particle (let it be an electron) interacting with background field particles (let these be protons for the moment). The test electron moves with speed v_e. The field protons will move much more slowly if they are near thermodynamic equilibrium (as their masses are much greater than those of the electrons), so they can be treated, for the moment, as at rest. When the electron flies by a single proton, we can characterize the encounter using an impact parameter b, which is what the distance of closest approach would have been if the electron were not deflected; see Fig. 18.3. The electron will be scattered by the Coulomb field of the proton, a process sometimes called Rutherford scattering. If the deflection angle is small, θ_D ≪ 1, we can approximate its value by computing the perpendicular impulse exerted by the Coulomb field of the proton, integrating along the unperturbed straight-line trajectory:

$$m_e v_e\,\theta_D = \int_{-\infty}^{+\infty} \frac{e^2 b}{4\pi\epsilon_0\,(b^2 + v_e^2 t^2)^{3/2}}\,dt = \frac{e^2}{2\pi\epsilon_0 v_e b} . \tag{18.14}$$

This implies that

$$\theta_D = \frac{b_o}{b} \quad \text{for } b \gg b_o , \tag{18.15}$$

where

$$b_o \equiv \frac{e^2}{2\pi\epsilon_0 m_e v_e^2} . \tag{18.16}$$

When b ≲ b_o, this approximation breaks down and the deflection angle is of order a radian.[4]

Below we shall need to know how much energy the electron loses, for the large-impact-parameter case. That energy loss, −ΔE, is equal to the energy gained by the proton. Since the proton is initially at rest, and since momentum conservation implies it gains a momentum Δp = m_e v_e θ_D, ΔE must be

$$\Delta E = -\frac{(\Delta p)^2}{2m_p} = -\frac{m_e}{m_p}\left(\frac{b_o}{b}\right)^2 E \quad \text{for } b \gg b_o . \tag{18.17}$$

[4] A more careful calculation gives 2 tan(θ_D/2) = b_o/b; see, e.g., Leighton (1959).

Here E = ½ m_e v_e² is the electron's initial energy.

We turn, next, from an individual Coulomb collision to the net, statistically averaged effect of many collisions. The first thing we shall compute is the mean time t_D required for the orbit of the test electron to be deflected by an angle of order a radian from its initial direction, and the inverse of t_D: the "deflection rate" or "deflection frequency" ν_D = 1/t_D. If the dominant source of this deflection were a single large-angle scattering event, then the relevant cross section would be σ = πb_o² (since all impact parameters ≲ b_o produce large-angle scatterings), and the mean deflection time and frequency would be

$$\nu_D \equiv \frac{1}{t_D} = n\sigma v_e = n\pi b_o^2 v_e \qquad \text{(single large-angle scattering of an electron by a proton)} . \tag{18.18}$$

Here n is the proton number density, which is the same as the electron number density in our hydrogen plasma.

The cumulative, random-walk effects of many small-angle scatterings off field protons actually produce a net deflection of order a radian in a time shorter than this. As the directions of the individual scatterings are random, the mean deflection angle after many scatterings vanishes. However, the mean square deflection angle, ⟨Θ²⟩ = Σ_{all encounters} θ_D², will not vanish. That mean square deflection angle, during a time t, accumulates up to

$$\langle\Theta^2\rangle = \int_{b_{\min}}^{b_{\max}} \left(\frac{b_o}{b}\right)^2 n v_e t\,2\pi b\,db = n\,2\pi b_o^2 v_e t \ln\frac{b_{\max}}{b_{\min}} . \tag{18.19}$$

Here the factor (b_o/b)² in the integrand is the squared deflection angle θ_D² for impact parameter b, and the remaining factor n v_e t 2πb db is the number of encounters that occur with impact parameters between b and b + db during time t. The integral diverges logarithmically at both its lower limit b_min and its upper limit b_max. Below we shall discuss the physical origins and values of the cutoffs b_min and b_max. The value of t that makes the mean square deflection angle ⟨Θ²⟩ equal to unity is, to within factors of order unity, the deflection time t_D (and inverse deflection frequency ν_D^{-1}):

$$\nu_D^{ep} = \frac{1}{t_D^{ep}} = n\,2\pi b_o^2 v_e \ln\Lambda = \frac{n e^4 \ln\Lambda}{2\pi\epsilon_0^2 m_e^2 v_e^3} , \qquad \text{where } \Lambda = b_{\max}/b_{\min} . \tag{18.20}$$

Here the superscript ep indicates that the test particle is an electron and the field particles are protons. Notice that this deflection frequency is larger, by a factor 2 ln Λ, than the frequency (18.18) for a single large-angle scattering. We must also consider the repulsive collisions of our test electron with field electrons. Although we are no longer justified in treating the field electrons as being at rest, the impact parameter for a large-angle deflection is still ∼ b_o, so Eq. (18.20) is also appropriate to this case, in order of magnitude:

$$\nu_D^{ee} = \frac{1}{t_D^{ee}} \sim \nu_D^{ep} \sim n\,2\pi b_o^2 v_e \ln\Lambda = \frac{n e^4 \ln\Lambda}{2\pi\epsilon_0^2 m_e^2 v_e^3} . \tag{18.21}$$
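The random-walk argument behind Eq. (18.19) can be checked with a small Monte Carlo sketch. It assumes units with b_o = 1, draws impact parameters uniformly in area between b_min and b_max, adds each kick θ_D = b_o/b as a two-dimensional vector with a random azimuth, and compares the average of Θ² over many trials with Eq. (18.19). Using the small-angle result for every encounter mirrors the integrand of Eq. (18.19); the cutoffs, encounter count, and trial count are illustrative choices, not values from the text. For N encounters in the annulus, n v_e t = N/[π(b_max² − b_min²)], so Eq. (18.19) becomes ⟨Θ²⟩ = 2N ln(b_max/b_min)/(b_max² − b_min²).

```python
import math
import random


def mean_square_deflection(n_encounters, b_min, b_max, n_trials, seed=0):
    """Monte Carlo estimate of <Theta^2> after n_encounters small-angle kicks.

    Units are chosen so that b_o = 1, hence theta_D = 1/b [Eq. (18.15)].
    Each kick is added as a 2-D vector with a random azimuth, so the cross
    terms average away and only the squared magnitudes survive.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        sx = sy = 0.0
        for _ in range(n_encounters):
            # Impact parameters uniform in area: probability density proportional to b
            b = math.sqrt(b_min**2 + rng.random() * (b_max**2 - b_min**2))
            theta = 1.0 / b                          # theta_D = b_o / b with b_o = 1
            phi = rng.uniform(0.0, 2.0 * math.pi)    # random direction of the kick
            sx += theta * math.cos(phi)
            sy += theta * math.sin(phi)
        total += sx * sx + sy * sy
    return total / n_trials


def predicted_mean_square(n_encounters, b_min, b_max):
    """Eq. (18.19) with b_o = 1, using n v_e t = N / [pi (b_max^2 - b_min^2)]."""
    return 2.0 * n_encounters * math.log(b_max / b_min) / (b_max**2 - b_min**2)


n_enc, b_min, b_max = 300, 1.0, 100.0
mc = mean_square_deflection(n_enc, b_min, b_max, n_trials=2000)
print(f"Monte Carlo <Theta^2> = {mc:.3f}; "
      f"Eq. (18.19) gives {predicted_mean_square(n_enc, b_min, b_max):.3f}")
```

The statistical scatter of the Monte Carlo average is a few percent for these settings, so agreement to that level is what one should expect.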

Finally, and in the same spirit, we can compute the collision frequency for the protons. Because electrons are so much lighter than protons, proton-proton collisions will be more effective in deflecting protons than proton-electron collisions. Therefore, the proton collision frequency is given by Eq. (18.21) with the electron subscripts replaced by proton subscripts:

$$\nu_D^{pp} = \frac{1}{t_D^{pp}} \sim \frac{n e^4 \ln\Lambda}{2\pi\epsilon_0^2 m_p^2 v_p^3} . \tag{18.22}$$

18.4.2

The Coulomb logarithm

The maximum impact parameter b_max, which appears in Λ ≡ b_max/b_min, is the Debye length λ_D, since at impact parameters b ≫ λ_D the Debye shielding screens out the field particle's Coulomb field, while at b ≪ λ_D Debye shielding is unimportant. The minimum impact parameter b_min has different values depending on whether quantum mechanical wave-packet spreading is important or not for the test particle during the collision. Because of wave-function spreading, the nearest the test particle can come to a field particle is the test particle's de Broglie wavelength, i.e., b_min = ħ/mv. However, if the de Broglie wavelength is smaller than b_o, then the effective value of b_min will be simply b_o. In summary,

$$b_{\min} = \max\left[b_o = \frac{e^2}{2\pi\epsilon_0 m_e v_e^2},\ \frac{\hbar}{m_e v_e}\right] \quad\text{and}\quad b_{\max} = \lambda_D \qquad \text{for test electrons};$$

$$b_{\min} = \max\left[\frac{e^2}{2\pi\epsilon_0 m_p v_p^2},\ \frac{\hbar}{m_p v_p}\right] \quad\text{and}\quad b_{\max} = \lambda_D \qquad \text{for test protons}. \tag{18.23}$$

Over most of the accessible range of density and temperature for a plasma, 3 ≲ ln Λ ≲ 30. Therefore, if we set

$$\ln\Lambda \simeq 10 , \tag{18.24}$$

our estimate is good to a factor ∼ 3. For tables of ln Λ, see Spitzer (1962).
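The cutoffs of Eq. (18.23) can be turned into a small calculator for ln Λ. This Python sketch assumes a test-electron speed v_e = (3k_B T/m_e)^{1/2} (the text leaves the precise choice of v_e open), uses the nonrelativistic formulae throughout, and takes (n, T) pairs from Table 18.1; the function name coulomb_log_electron is hypothetical.

```python
import math

# SI constants (CODATA values, supplied for this sketch)
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
KB = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C
M_E = 9.1093837015e-31       # electron mass, kg
HBAR = 1.054571817e-34       # reduced Planck constant, J s


def coulomb_log_electron(n, T):
    """ln(Lambda) = ln(b_max / b_min) for a test electron, following Eq. (18.23).

    Assumption: the test-electron speed is v_e = (3 k_B T / m_e)^(1/2).
    """
    v = math.sqrt(3.0 * KB * T / M_E)
    b_o = E_CHARGE**2 / (2.0 * math.pi * EPS0 * M_E * v**2)   # classical limit, Eq. (18.16)
    b_quantum = HBAR / (M_E * v)                              # de Broglie limit
    b_min = max(b_o, b_quantum)
    b_max = math.sqrt(EPS0 * KB * T / (n * E_CHARGE**2))      # Debye length, Eq. (18.10)
    return math.log(b_max / b_min)


for name, n, T in [("Tokamak", 1e20, 1e8), ("Interstellar medium", 1e5, 1e4)]:
    print(f"{name:20s} ln Lambda = {coulomb_log_electron(n, T):.1f}")
```

For the tokamak entry the quantum cutoff ħ/(m_e v_e) is the larger of the two, while for the interstellar entry the classical b_o wins, which illustrates why Eq. (18.23) takes the maximum.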

18.4.3

Thermal Equilibration Times in a Plasma

Suppose that a hydrogen plasma is heated in some violent way (e.g., by a shock wave). Such heating will typically give the plasma's electrons and protons a non-Maxwellian velocity distribution. Coulomb collisions will then, as time passes (and in the absence of more violent disruptions), force the particles to exchange energy in random ways, and will gradually drive them into thermal equilibrium. This thermal equilibration is achieved at different rates for the electrons and the protons, and correspondingly the following three timescales are all different:

$$\begin{aligned}
t_{ee}^{eq} &\equiv (\text{time required for electrons to equilibrate with each other, achieving a near-Maxwellian velocity distribution}) ,\\
t_{pp}^{eq} &\equiv (\text{time for protons to equilibrate with each other}) ,\\
t_{ep}^{eq} &\equiv (\text{time for electrons to equilibrate with protons}) .
\end{aligned} \tag{18.25}$$

In this section we shall compute these three equilibration times.

Electron-electron equilibration. In evaluating t_ee^eq, we shall assume that the electrons begin with typical individual energies of order k_B T_e, where T_e is the temperature to which they are going to equilibrate, but their initial velocity distribution is rather non-Maxwellian. Then we can choose a typical electron as the "test particle". We have argued that Coulomb interactions with electrons and protons are comparably effective in deflecting test electrons. However, they are not comparably effective in transferring energy. When the electron collides with a stationary proton, the energy transfer is

$$\frac{\Delta E}{E} \simeq -\frac{m_e}{m_p}\,\theta_D^2 \tag{18.26}$$

[Eq. (18.17)]. This is smaller than the typical energy transfer in an electron-electron collision by the ratio m_e/m_p. Therefore it is the collisions between the electrons that are responsible for establishing an electron Maxwellian distribution function.

The alert reader may spot a problem at this point. According to Eq. (18.26), electrons always lose energy to protons and never gain it. This would cause the electron temperature to continue to fall below the proton temperature, in clear violation of the second law of thermodynamics. What actually happens is that, once we allow for the finite proton velocities, the electrons can gain energy in some electron-proton collisions. This is also the case for the electron-electron collisions of immediate concern. The correct formalism for dealing with this situation is the Fokker-Planck formalism, discussed in Sec. 5.7. Fokker-Planck is appropriate because, as we have shown, many weak scatterings dominate the few strong scatterings. If we use the Fokker-Planck approach to define an energy equilibration time for a nearly Maxwellian distribution of electrons with temperature T, then it turns out that a simple estimate, based on combining the deflection time given by Eq. (18.18) with the typical energy transfer estimated in relation (18.26) and assuming a typical velocity v = (3k_B T/m_e)^{1/2}, gives an answer good to a factor 2. It is convenient to express the energy equilibration timescale through its reciprocal, the electron-electron equilibration rate ν_ee, since this facilitates comparison with the other frequencies characterizing the plasma. The true Fokker-Planck estimate for electrons near equilibrium is then

$$\nu_{ee} = \frac{n\sigma_T c \ln\Lambda}{2\pi^{1/2}}\left(\frac{k_B T_e}{m_e c^2}\right)^{-3/2} = 2.5\times10^{-5}\left(\frac{n}{1\,\mathrm{m^{-3}}}\right)\left(\frac{T_e}{1\,\mathrm{K}}\right)^{-3/2}\left(\frac{\ln\Lambda}{10}\right)\ \mathrm{s^{-1}} , \tag{18.27}$$

where we have used the Thomson cross section

$$\sigma_T = \frac{8\pi}{3}\left(\frac{e^2}{4\pi\epsilon_0 m_e c^2}\right)^2 = 6.65\times10^{-29}\ \mathrm{m^2} . \tag{18.28}$$

As for proton deflections [Eq. (18.22)], so also for proton energy equilibration, the light electrons are far less effective at influencing the protons than are other protons. Therefore, the protons achieve a thermal distribution by equilibrating with each other, and their proton-proton equilibration rate can be written down immediately from Eq. (18.27) by replacing the electron masses and temperatures with the protonic values:

$$\nu_{pp} = \frac{n\sigma_T c \ln\Lambda}{2\pi^{1/2}}\left(\frac{m_e}{m_p}\right)^{1/2}\left(\frac{k_B T_p}{m_e c^2}\right)^{-3/2} = 5.8\times10^{-7}\left(\frac{n}{1\,\mathrm{m^{-3}}}\right)\left(\frac{T_p}{1\,\mathrm{K}}\right)^{-3/2}\left(\frac{\ln\Lambda}{10}\right)\ \mathrm{s^{-1}} . \tag{18.29}$$
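The numerical coefficients in Eqs. (18.27) and (18.29), and the ratio (m_p/m_e)^{1/2} ≈ 43 between the two rates, can be checked with the following Python sketch. It evaluates only the closed-form expressions quoted above, with CODATA constants supplied here; because of rounding, the computed coefficients agree with the quoted ones to a few percent rather than exactly.

```python
import math

# SI constants (CODATA values, supplied for this sketch)
SIGMA_T = 6.6524587321e-29   # Thomson cross section, m^2 [Eq. (18.28)]
C = 2.99792458e8             # speed of light, m/s
KB = 1.380649e-23            # Boltzmann constant, J/K
M_E = 9.1093837015e-31       # electron mass, kg
M_P = 1.67262192369e-27      # proton mass, kg


def nu_ee(n, T_e, ln_lambda=10.0):
    """Electron-electron equilibration rate of Eq. (18.27), in s^-1."""
    theta = KB * T_e / (M_E * C**2)          # k_B T_e / (m_e c^2)
    return n * SIGMA_T * C * ln_lambda / (2.0 * math.sqrt(math.pi)) * theta**-1.5


def nu_pp(n, T_p, ln_lambda=10.0):
    """Proton-proton equilibration rate of Eq. (18.29), in s^-1.

    Eq. (18.29) is Eq. (18.27) evaluated at T_p and multiplied by (m_e/m_p)^(1/2).
    """
    return math.sqrt(M_E / M_P) * nu_ee(n, T_p, ln_lambda)


# Coefficients at n = 1 m^-3, T = 1 K, ln(Lambda) = 10
print(f"nu_ee coefficient: {nu_ee(1.0, 1.0):.2e} s^-1")
print(f"nu_pp coefficient: {nu_pp(1.0, 1.0):.2e} s^-1")
print(f"nu_ee / nu_pp at equal n and T: {nu_ee(1.0, 1.0) / nu_pp(1.0, 1.0):.1f}")
```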

Finally, if the electrons and protons have different temperatures, we should compute the timescale for the two species to equilibrate with each other. This again is easy to estimate using the energy transfer equation (18.26): t_ep^eq ≃ (m_p/m_e) t_ee^eq. The more accurate Fokker-Planck result for the electron-proton equilibration rate is again very close and is given by

$$\nu_{ep} = \frac{2n\sigma_T c \ln\Lambda}{\pi^{1/2}}\,\frac{m_e}{m_p}\left(\frac{k_B T_e}{m_e c^2}\right)^{-3/2} = 4.0\times10^{-8}\left(\frac{n}{1\,\mathrm{m^{-3}}}\right)\left(\frac{T_e}{1\,\mathrm{K}}\right)^{-3/2}\left(\frac{\ln\Lambda}{10}\right)\ \mathrm{s^{-1}} . \tag{18.30}$$

Thus, at the same density and temperature, protons require ∼ (m_p/m_e)^{1/2} = 43 times longer to reach thermal equilibrium among themselves than do the electrons, and proton-electron equilibration takes a time ∼ m_p/m_e = 1836 times longer than electron-electron equilibration.

18.4.4

Discussion

In Table 18.1, we show the electron-electron equilibration rates for a variety of plasma environments. Generically, they are very small compared with the plasma frequencies. For example, if we take parameters appropriate to a Tokamak, we find that ν_ee ∼ 10^{-8} ω_p and ν_ep ∼ 10^{-11} ω_p. In fact, the electron-proton equilibration time is comparable, to order of magnitude, with the total plasma confinement time ∼ 0.1 s (cf. Sec. 17.3). The disparity between ν_ee and ω_p is even greater in the interstellar medium. For this reason most plasmas are well described as collisionless, and we must anticipate that the particle distribution functions will depart significantly from Maxwellian form.

****************************
EXERCISES

Exercise 18.2 Derivation: Coulomb logarithm
(a) Express the Coulomb logarithm in terms of the Debye number, N_D, in the classical regime, when b_min ∼ b_o.
(b) Use the representative parameters from Table 18.1 to evaluate Coulomb logarithms for the Sun's core, a Tokamak, and the interstellar medium, and verify that they lie in the range 3 ≲ ln Λ ≲ 30.

Exercise 18.3 Derivation: Electron-electron collision rate
Using the non-Fokker-Planck arguments outlined in the text, compute an estimate of the electron-electron equilibration rate and show that it agrees with the Fokker-Planck result, Eq. (18.27), to within a factor 2.

Exercise 18.4 Problem: Dependence of thermal equilibration on charge and mass
Compute the ion equilibration rate for a pure ³He plasma with electron density 10^{20} m^{-3} and temperature 10^8 K.

Exercise 18.5 Example: Stopping of α-particles
A 100 MeV α-particle is incident upon a plastic containing electrons with density n_e = 2 × 10^{29} m^{-3}. Estimate the distance that it will travel before coming to rest. This is known as the range. (Ignore relativistic corrections and refinements such as the density effect. However, do consider the appropriate values of b_max and b_min.)

Exercise 18.6 Example: Parameters for Various Plasmas
Estimate the Debye length λ_D, the Debye number N_D, the plasma frequency f_p ≡ ω_p/2π, and the electron deflection timescale t_D^{ee} ∼ t_D^{ep}, for the following plasmas.
(a) An atomic bomb explosion in the Earth's atmosphere one millisecond after the explosion. [Use the Sedov-Taylor similarity solution for conditions behind the bomb's shock wave; Sec. 16.6.]
(b) The ionized gas that envelopes the Space Shuttle [cf. Box 16.2] as it re-enters the Earth's atmosphere.
(c) The expanding universe during its early evolution, just before it became cool enough for electrons and protons to combine to form neutral hydrogen (i.e., just before ionization "turned off"). [As we shall discover in Chap. 25, the universe today is filled with black-body radiation, produced in the big bang, that has a temperature T = 2.7 K, and the universe today has a mean density of hydrogen ρ ∼ 1 × 10^{-29} g/cm³. Extrapolate backward in time to infer the density and temperature at the epoch just before ionization turned off.]

Exercise 18.7 Problem: Equilibration Time for a Globular Star Cluster
Stars have many similarities to electrons and ions in a plasma. These similarities arise from the fact that in both cases the interaction between the individual particles (stars, or ions and electrons) is a radial, 1/r² force. The principal difference is the fact that the force between stars is always attractive, so there is no analog of Debye shielding. One consequence of this difference is the fact that a plasma can be spatially homogeneous and static, when one averages over length scales large compared to the interparticle separation; but a collection of stars cannot be: the stars congregate into clusters that are held together by the stars' mutual gravity. A globular star cluster is an example. A typical globular cluster is a nearly spherical swarm of stars with the following parameters: (cluster radius) ≡ R = 10 light years; (total number of stars in the cluster) ≡ N = 10^6; and (mass of a typical star) ≡ m = (0.4 solar masses) = 8 × 10^{32} grams. Each star moves on an orbit in the average, "smeared out" gravitational field of the entire cluster; and since that smeared-out gravitational field is independent of time, each star conserves its total energy (kinetic plus gravitational) as it moves. Actually, the total energy is only approximately conserved. Just as in a plasma, so also here, gravitational "Coulomb collisions" of the star with other stars produce changes of the star's energy.
(a) What is the mean time t_E for a typical star to change its energy substantially? Express your answer, accurate to within a factor ∼ 3, in terms of N, R, and m; evaluate it numerically and compare it with the age of the Universe.
(b) The cluster evolves substantially on the timescale t_E. What type of evolution would you expect to occur? What type of stellar energy distribution would you expect to result from this evolution?[5]

****************************

18.5

Transport Coefficients

Because electrons have far lower masses than ions, they have far higher typical speeds at fixed temperature and are much more easily accelerated; i.e., they are much more mobile. As a result, it is the motion of the electrons, not the ions, that is responsible for the transport of heat and charge through a plasma. In the spirit of the discussion above, we can now compute transport properties such as the electric conductivity and thermal conductivity on the presumption that it is Coulomb collisions that determine the electron mean free paths and that magnetic fields are unimportant. (Later we will see that collisionless effects usually provide a more serious impediment to charge and heat flow than Coulomb collisions and thus dominate the conductivities.)

Consider, first, an electron exposed to a constant, accelerating electric field E. The electron's typical drift velocity along the direction of the electric field is −eE/(m_e ν_D), where ν_D is the deflection frequency (rate) evaluated in Eqs. (18.20) and (18.21). The associated current density is therefore j ∼ n e² E/(m_e ν_D), and the electrical conductivity is κ_e ∼ n e²/(m_e ν_D). (Note that electron-electron collisions conserve momentum and do not impede the flow of current, so electron-proton collisions, which happen about as frequently, produce all the electrical resistance and are thus responsible for this κ_e.) The thermal conductivity can likewise be estimated by noting that a typical electron travels a mean free path ℓ ∼ v_e/ν_D from a location where the average temperature is different by an amount ΔT ∼ ℓ|∇T|. The heat flux transported by the electrons is therefore ∼ n v_e k_B ΔT, which should be equated to −κ∇T. We therefore obtain the electron contribution to the thermal conductivity as κ ∼ n k_B² T/(m_e ν_D). Computations based on the Fokker-Planck approach[6] produce equations for the electrical and thermal conductivities that agree with the above estimates to within factors of order unity:

$$\kappa_e = 4.9\,\frac{e^2}{\sigma_T c \ln\Lambda\, m_e}\left(\frac{k_B T_e}{m_e c^2}\right)^{3/2} = 1.5\times10^{-3}\left(\frac{T_e}{1\,\mathrm{K}}\right)^{3/2}\left(\frac{\ln\Lambda}{10}\right)^{-1}\ \Omega^{-1}\,\mathrm{m^{-1}} , \tag{18.31a}$$

$$\kappa = 19.1\,\frac{k_B c}{\sigma_T \ln\Lambda}\left(\frac{k_B T_e}{m_e c^2}\right)^{5/2} = 4.4\times10^{-11}\left(\frac{T_e}{1\,\mathrm{K}}\right)^{5/2}\left(\frac{\ln\Lambda}{10}\right)^{-1}\ \mathrm{W\,m^{-1}\,K^{-1}} . \tag{18.31b}$$

Here σ_T is the Thomson cross section, Eq. (18.28). Note that neither transport coefficient depends explicitly upon the density; increasing the number of charge or heat carriers is compensated by the reduction in their mean free paths.

[5] For a detailed discussion, see, e.g., Binney & Tremaine (1987).

[6] Spitzer (1962).
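The following Python sketch evaluates Eqs. (18.31a) and (18.31b) directly from their closed forms, with CODATA constants supplied here, to confirm the quoted numerical coefficients and to make explicit that neither conductivity takes the density as an argument. The function names are illustrative.

```python
import math

# SI constants (CODATA values, supplied for this sketch)
SIGMA_T = 6.6524587321e-29   # Thomson cross section, m^2 [Eq. (18.28)]
C = 2.99792458e8             # speed of light, m/s
KB = 1.380649e-23            # Boltzmann constant, J/K
M_E = 9.1093837015e-31       # electron mass, kg
E_CHARGE = 1.602176634e-19   # elementary charge, C


def electrical_conductivity(T_e, ln_lambda=10.0):
    """kappa_e of Eq. (18.31a) in Ohm^-1 m^-1; note there is no density argument."""
    theta = KB * T_e / (M_E * C**2)          # k_B T_e / (m_e c^2)
    return 4.9 * E_CHARGE**2 / (SIGMA_T * C * ln_lambda * M_E) * theta**1.5


def thermal_conductivity(T_e, ln_lambda=10.0):
    """kappa of Eq. (18.31b) in W m^-1 K^-1; again independent of density."""
    theta = KB * T_e / (M_E * C**2)
    return 19.1 * KB * C / (SIGMA_T * ln_lambda) * theta**2.5


print(f"kappa_e prefactor: {electrical_conductivity(1.0):.2e} Ohm^-1 m^-1")
print(f"kappa prefactor:   {thermal_conductivity(1.0):.2e} W m^-1 K^-1")
# Tokamak-like electron temperature from Table 18.1
print(f"kappa_e at 1e8 K:  {electrical_conductivity(1e8):.1e} Ohm^-1 m^-1")
```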

18.5.1

Anomalous Resistivity and Anomalous Equilibration

We have demonstrated that the theoretical Coulomb interaction between charged particles gives very long mean free paths and, consequently, very large electrical and thermal conductivities. Is this the way that real plasmas behave? The answer is invariably "no". As we shall show in the next three chapters, a plasma can support a variety of modes of "collective excitation" in which large numbers of electrons and/or ions move in collective, correlated fashions that are mediated by electromagnetic fields which they create. When the modes of a plasma are sufficiently strongly excited, the electromagnetic fields carried by the excitations can be much more effective than Coulomb scattering at deflecting the orbits of individual electrons and ions, and at feeding energy into or removing it from electrons and ions. Correspondingly, the electrical and thermal conductivities will be reduced. The reduced transport coefficients are termed anomalous, and, as we shall start to discuss in Chap. 21, it is one of the principal tasks of nonlinear plasma physics to provide quantitative calculations of these coefficients.

****************************
EXERCISES

Exercise 18.8 Challenge: Thermoelectric Transport Coefficients
(a) Consider a plasma in which the magnetic field is so weak that it presents little impediment to the flow of heat and electric current. Suppose that the plasma has a gradient ∇T_e of its electron temperature and also has an electric field E. It is a familiar fact that the temperature gradient will cause heat to flow and the electric field will create an electric current. Not so familiar, but somewhat obvious if one stops to think about it, is the fact that the temperature gradient also creates an electric current, and the electric field also causes heat flow. Explain in physical terms why this is so.
(b) So long as the mean free path of an electron between substantial deflections, ℓ_mfp = (3k_B T_e/m_e)^{1/2} t_{D,e}, is short compared to the lengthscale for substantial temperature change, T_e/|∇T_e|, and short compared to the lengthscale for the electrons to be accelerated to near the speed of light by the electric field, m_e c²/(eE), the fluxes of heat q and of electric charge J will be governed by electron diffusion and will be linear in ∇T and E:

$$\mathbf{q} = -\kappa\nabla T - \beta\mathbf{E} , \qquad \mathbf{J} = \kappa_e\mathbf{E} + \alpha\nabla T . \tag{18.32}$$

The coefficients κ (heat conductivity), κ_e (electrical conductivity), β, and α are called thermoelectric transport coefficients. Use kinetic theory, in a situation where ∇T = 0, to derive the conductivity equations J = κ_e E and q = −βE, and the following approximate formulae for the transport coefficients:

$$\kappa_e \sim \frac{n e^2 t_{D,e}}{m_e} , \qquad \beta \sim \frac{k_B T}{e}\,\kappa_e . \tag{18.33}$$

Show that, aside from a coefficient of order unity, this κ_e, when expressed in terms of the plasma's temperature and density, reduces to Eq. (18.31). The specific coefficients in Eq. (18.31) come from a Fokker-Planck analysis.[7]
(c) Use kinetic theory, in a situation where E = 0, to derive the conductivity equations q = −κ∇T and J = α∇T, and the approximate formulae

$$\kappa \sim k_B n\,\frac{k_B T_e\, t_{D,e}}{m_e} , \qquad \alpha \sim \frac{e}{k_B T_e}\,\kappa . \tag{18.34}$$

Show that, aside from a coefficient of order unity, this κ reduces to Eq. (18.31).[8]
(d) It can be shown[9] that for a hydrogen plasma it must be true that

$$\frac{\alpha\beta}{\kappa_e\kappa} = 0.581 . \tag{18.35}$$

By studying the entropy-governed probability distributions for fluctuations away from statistical equilibrium, one can derive another relation among the thermoelectric transport coefficients, the Onsager relation[10]

$$\beta = \alpha T + \frac{5}{2}\,\frac{k_B T_e}{e}\,\kappa_e ; \tag{18.36}$$

Equations (18.35) and (18.36) determine α and β in terms of κ_e and κ. Show that your approximate values of the transport coefficients, Eqs. (18.33) and (18.34), are in rough accord with Eqs. (18.35) and (18.36).
(e) If a temperature gradient persists for sufficiently long, it will give rise to sufficient charge separation in the plasma to build up an electric field (called a "secondary field") that prevents further charge flow. Show that this suppresses the heat flow: the total heat flux is then q = −κ_{T,eff} ∇T, where

$$\kappa_{T,\text{eff}} = \left(1 - \frac{\alpha\beta}{\kappa_e\kappa}\right)\kappa = 0.419\,\kappa . \tag{18.37}$$

****************************

[7] Spitzer (1962).
[8] Ibid.
[9] Ibid.
[10] Kittel (1958), Secs. 33, 34; Reif (1965), Sec. 15.8.

18.6

Magnetic Field

18.6.1

Cyclotron frequency and Larmor radius.

Many of the plasmas that we will encounter are endowed with a strong magnetic field. This causes the charged particles to travel along helical orbits about the field direction rather than move rectilinearly between collisions. If we denote the magnetic field by B, the equation of motion for an electron becomes

$$m_e\frac{d\mathbf{v}}{dt} = -e\,\mathbf{v}\times\mathbf{B} , \tag{18.38}$$

which gives rise to a constant speed v_∥ parallel to the magnetic field and a circular motion perpendicular to the field with angular velocity

$$\omega_c = \frac{eB}{m_e} = 1.76\times10^{11}\left(\frac{B}{1\,\mathrm{T}}\right)\ \mathrm{s^{-1}} . \tag{18.39}$$

This angular velocity is called the electron cyclotron frequency, or simply the cyclotron frequency. Notice that this cyclotron frequency depends only on the magnetic field strength B and not on the plasma's density n̄ or the electron velocity (i.e., the plasma temperature T). Neither does it depend upon the angle between v and B (called the pitch angle, α). We also define a Larmor radius, r_L, which is the radius of the gyrational orbit projected perpendicular to the direction of the magnetic field. This is given by

$$r_L = \frac{v\sin\alpha}{\omega_c} = \frac{v_\perp}{\omega_c} = 5.7\times10^{-9}\left(\frac{v_\perp}{1\,\mathrm{km\,s^{-1}}}\right)\left(\frac{B}{1\,\mathrm{T}}\right)^{-1}\ \mathrm{m} . \tag{18.40}$$

Protons (and other ions) in a plasma also undergo cyclotron motion. Because their masses are larger by m_p/m_e = 1836 than the electron mass, their angular velocity

$$\omega_{cp} = \frac{eB}{m_p} = 0.96\times10^{8}\left(\frac{B}{1\,\mathrm{T}}\right)\ \mathrm{s^{-1}} \tag{18.41}$$

is 1836 times lower. This quantity is called the proton cyclotron frequency or ion cyclotron frequency. The sense of gyration is, of course, opposite to that of the electrons. If the protons have similar temperatures to the electrons, their speeds are typically ∼ 43 times smaller than those of the electrons, and their typical Larmor radii are ∼ 43 times larger than those of the electrons. We demonstrated above that all the electrons in a plasma can oscillate in phase at the plasma frequency. The electrons' cyclotron motion can also become coherent. Such coherent motions are called cyclotron resonances or cyclotron oscillations. Ion cyclotron resonances can also occur. Characteristic cyclotron frequencies and Larmor radii are tabulated in Table 18.1. It can be seen that the cyclotron frequency, like the plasma frequency, is typically far larger than the energy equilibration rates.
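A short Python sketch of Eqs. (18.39) to (18.41), again with CODATA constants supplied here. For the comparison of electron and proton Larmor radii it assumes, as an illustration, that each species has v_⊥ ∼ (k_B T/m)^{1/2} at a common temperature; the ratio of radii is then (m_p/m_e)^{1/2} ≈ 43, as stated in the text.

```python
import math

# SI constants (CODATA values, supplied for this sketch)
E_CHARGE = 1.602176634e-19   # elementary charge, C
M_E = 9.1093837015e-31       # electron mass, kg
M_P = 1.67262192369e-27      # proton mass, kg
KB = 1.380649e-23            # Boltzmann constant, J/K


def cyclotron_frequency(B, m):
    """Angular gyration frequency e B / m, Eqs. (18.39) and (18.41), in s^-1."""
    return E_CHARGE * B / m


def larmor_radius(v_perp, B, m):
    """Larmor radius r_L = v_perp / omega_c, Eq. (18.40), in metres."""
    return v_perp / cyclotron_frequency(B, m)


print(f"electron omega_c (B = 1 T): {cyclotron_frequency(1.0, M_E):.3e} s^-1")
print(f"proton omega_c (B = 1 T):   {cyclotron_frequency(1.0, M_P):.3e} s^-1")
print(f"electron r_L (v_perp = 1 km/s, B = 1 T): {larmor_radius(1e3, 1.0, M_E):.2e} m")

# Illustrative assumption: v_perp ~ (k_B T / m)^(1/2) for each species at a common T
T, B = 1e8, 10.0   # tokamak-like values from Table 18.1
r_e = larmor_radius(math.sqrt(KB * T / M_E), B, M_E)
r_p = larmor_radius(math.sqrt(KB * T / M_P), B, M_P)
print(f"r_L,e = {r_e:.1e} m, r_L,p = {r_p:.1e} m, ratio = {r_p / r_e:.1f}")
```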


18.6.2

Validity of the Fluid Approximation.

In Chap. 17, we developed the magnetohydrodynamic (MHD) description of a magnetized plasma. We described the plasma by its density and temperature (or equivalently its pressure). Under what circumstances is this an accurate description? The answer to this question turns out to be quite complex, and a full discussion would go well beyond this book. Some aspects are, however, easy to describe. One circumstance when a fluid description ought to be acceptable is when the timescales τ that characterize the macroscopic flow are long compared with the time required to establish Maxwellian equilibrium (i.e., $\tau \gg \nu_{ei}^{-1}$), and the excitation level of collective wave modes is so small that these do not interfere seriously with the influence of Coulomb collisions. Unfortunately, this is rarely the case. (One type of plasma where this might be a quite good approximation is that in the interior of the sun.) Magnetohydrodynamics can still provide a moderately accurate description of a plasma even if the electrons and ions are not fully equilibrated, when the electrical conductivity can be treated as very large and the thermal conductivity as very small. This means that we can treat the magnetic Reynolds number as effectively infinite, and the plasma as a perfect fluid with the corresponding equation of state (as we assumed in much of Chap. 17). It is not so essential that the actual particle distribution functions be Maxwellian, merely that they have second moments that can be associated with a common temperature. Quite often in plasma physics almost all of the dissipation is localized, for example to the vicinity of a shock front, and the remainder of the flow can be treated using MHD. The MHD description then provides a boundary condition for a fuller plasma-physical analysis of the dissipative region. This simplifies the analysis of such situations.
The great advantage of fluid descriptions, and the reason why physicists abandon them with such reluctance, is that they are much simpler than other descriptions of a plasma. One only has to cope with the pressure, density, and velocity, and does not have to deal with an elaborate statistical description of the positions and velocities of all the particles. Generalizations of the simple fluid approximation have therefore been devised which can extend the domain of validity of simple MHD ideas. One extension, which we develop in the following chapter, is to treat the protons and the electrons as two separate fluids and derive dynamical equations to describe their (coupled) evolution. The other approach, which we describe now, is to acknowledge that, in most plasmas, the cyclotron period is very short compared with the Coulomb collision time, and that the timescale on which energy is transferred back and forth between the electrons, the protons, and the electromagnetic field is intermediate between $\omega_c^{-1}$ and $\nu_{ee}^{-1}$. Intuitively, this allows the electron and proton velocity distributions to become axisymmetric with respect to the magnetic field direction, though not fully isotropic. In other words, we can characterize the plasma using a density and two separate components of pressure, one associated with motion along the direction of the magnetic field, the other with gyration about the field. For simplicity, let us just consider the electrons and their stress tensor, which we can write as
$$\mathsf T_e^{jk} = \int \mathcal N_e\,\frac{p^jp^k}{m_e}\,d^3p, \tag{18.42}$$
where $\mathcal N_e$ is the electron number density in phase space; cf. Sec. 2.5. If we orient Cartesian axes so that $\mathbf e_z$ is parallel to the local magnetic field, then
$$\|\mathsf T_e^{jk}\| = \begin{pmatrix} P_{e\perp} & 0 & 0\\ 0 & P_{e\perp} & 0\\ 0 & 0 & P_{e\|}\end{pmatrix}, \tag{18.43}$$

where $P_{e\perp}$ is the electron pressure perpendicular to $\mathbf B$, and $P_{e\|}$ is the electron pressure parallel to $\mathbf B$. Now suppose that there is a compression or expansion on a timescale that is long compared with the cyclotron period but short compared with the Coulomb collision timescale. We should then not expect $P_{e\perp}$ to equal $P_{e\|}$, and we anticipate that they will evolve with density according to different laws. The adiabatic indices governing $P_\perp$ and $P_\|$ in such a situation are easily derived from kinetic-theory arguments (Exercise 18.11). For compression perpendicular to $\mathbf B$ and no change of length along $\mathbf B$,
$$\gamma_\perp \equiv \left(\frac{\partial\ln P_\perp}{\partial\ln\rho}\right)_s = 2, \qquad \gamma_\| \equiv \left(\frac{\partial\ln P_\|}{\partial\ln\rho}\right)_s = 1, \tag{18.44}$$
and for compression parallel to $\mathbf B$ and no change of transverse area,
$$\gamma_\perp \equiv \left(\frac{\partial\ln P_\perp}{\partial\ln\rho}\right)_s = 1, \qquad \gamma_\| \equiv \left(\frac{\partial\ln P_\|}{\partial\ln\rho}\right)_s = 3. \tag{18.45}$$

By contrast, if the expansion is sufficiently slow that Coulomb collisions are effective (though not so slow that heat conduction can operate), then we expect the velocity distribution to maintain isotropy and both components of pressure to evolve according to the law appropriate to a monatomic gas,
$$\gamma = \left(\frac{\partial\ln P_\|}{\partial\ln\rho}\right)_s = \left(\frac{\partial\ln P_\perp}{\partial\ln\rho}\right)_s = \frac53. \tag{18.46}$$
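The indices in Eqs. (18.44) and (18.45) can be cross-checked against the double-adiabatic laws $P_\perp\propto nB$ and $P_\|\propto n^3/B^2$, which are equivalent to Eq. (18.55) of Exercise 18.11. The Python sketch below also assumes particle conservation, $n\propto 1/(lA)$ for a fluid element of length $l$ along $\mathbf B$ and cross-sectional area $A$, and flux freezing, $B\propto 1/A$. It computes $\partial\ln P/\partial\ln\rho$ by a finite difference for the two compression geometries. The helper names are hypothetical.

```python
import math


def cgl_pressures(l, A):
    """P_perp and P_par of a fluid element (normalized to 1 at l = A = 1), assuming
    n ∝ 1/(l A), B ∝ 1/A, P_perp ∝ n B and P_par ∝ n**3 / B**2."""
    n = 1.0 / (l * A)
    B = 1.0 / A
    return n * B, n**3 / B**2


def adiabatic_indices(perpendicular, eps=1e-3):
    """Return (gamma_perp, gamma_par) = d ln P / d ln rho for a small compression
    either perpendicular to B (A shrinks, l fixed) or along B (l shrinks, A fixed)."""
    l, A = (1.0, 1.0 - eps) if perpendicular else (1.0 - eps, 1.0)
    P_perp0, P_par0 = cgl_pressures(1.0, 1.0)
    P_perp1, P_par1 = cgl_pressures(l, A)
    dln_rho = math.log(1.0 / (l * A))   # rho ∝ n ∝ 1/(l A)
    return (math.log(P_perp1 / P_perp0) / dln_rho,
            math.log(P_par1 / P_par0) / dln_rho)


if __name__ == "__main__":
    print("perpendicular compression:", adiabatic_indices(True))   # approximately (2, 1)
    print("parallel compression:     ", adiabatic_indices(False))  # approximately (1, 3)
```

Because the assumed laws are pure power laws, the finite difference is exact up to rounding. A smaller `eps` changes nothing.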

18.6.3

Conductivity Tensor

As is evident from the foregoing remarks, if we are in a regime where Coulomb scattering really does determine the particle mean free path, then an extremely small magnetic field strength suffices to ensure that individual particles complete gyrational orbits before they collide. Specifically, for electrons, the deflection time $t_D$, given by Eq. (18.20), exceeds $\omega_c^{-1}$ if
$$B \gtrsim 10^{-12}\left(\frac{n}{1\,\mathrm{cm^{-3}}}\right)\left(\frac{T_e}{1\,\mathrm{K}}\right)^{-3/2}\ \mathrm{T}. \tag{18.47}$$

This is almost always the case. It is also almost always true for the ions. When inequality (18.47) is satisfied, the transport coefficients must be generalized to form tensors. Let us compute the electrical conductivity tensor for a plasma in which a steady electric field $\mathbf E$ is applied. Once again orienting our coordinate system so that the magnetic field is parallel to $\mathbf e_z$, we can write down an equation of motion for the electrons by balancing the electromagnetic acceleration with the average rate of loss of momentum due to collisions:
$$-e(\mathbf E + \mathbf v\times\mathbf B) - m_e\nu_D\mathbf v = 0. \tag{18.48}$$
Solving for the velocity, we obtain
$$\begin{pmatrix} v_x\\ v_y\\ v_z\end{pmatrix} = -\frac{e}{m_e\nu_D(1+\omega_c^2/\nu_D^2)}\begin{pmatrix} 1 & -\omega_c/\nu_D & 0\\ \omega_c/\nu_D & 1 & 0\\ 0 & 0 & 1+\omega_c^2/\nu_D^2\end{pmatrix}\begin{pmatrix} E_x\\ E_y\\ E_z\end{pmatrix}. \tag{18.49}$$
As the current density is $\mathbf j_e = -ne\mathbf v = \boldsymbol\kappa_e\cdot\mathbf E$, the electrical conductivity tensor is given by
$$\boldsymbol\kappa_e = \frac{ne^2}{m_e\nu_D(1+\omega_c^2/\nu_D^2)}\begin{pmatrix} 1 & -\omega_c/\nu_D & 0\\ \omega_c/\nu_D & 1 & 0\\ 0 & 0 & 1+\omega_c^2/\nu_D^2\end{pmatrix}. \tag{18.50}$$
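The closed form can be checked by solving the force balance (18.48) directly as a 3×3 linear system. The Python/NumPy sketch below works in units where $e = m_e = 1$ (an assumption for illustration), and the parameter values are arbitrary. It compares the direct solve with a hand-derived tensor for $\mathbf j_e = -ne\mathbf v$. It also confirms that in the nearly collisionless limit $\nu_D\to 0$ the perpendicular velocity approaches $\mathbf E\times\mathbf B/B^2$.

```python
import numpy as np


def v_cross_B_matrix(B):
    """Matrix C with C @ v == np.cross(v, B)."""
    Bx, By, Bz = B
    return np.array([[0.0, Bz, -By],
                     [-Bz, 0.0, Bx],
                     [By, -Bx, 0.0]])


def electron_velocity(E, B, nu_D, e=1.0, m_e=1.0):
    """Solve -e (E + v x B) - m_e nu_D v = 0 (Eq. 18.48) for v by linear algebra."""
    M = m_e * nu_D * np.eye(3) + e * v_cross_B_matrix(B)
    return np.linalg.solve(M, -e * np.asarray(E, dtype=float))


def conductivity_closed_form(n, B_z, nu_D, e=1.0, m_e=1.0):
    """Hand-derived kappa_e for B = B_z e_z, defined by j_e = -n e v = kappa_e @ E."""
    r = (e * B_z / m_e) / nu_D                      # omega_c / nu_D
    prefactor = n * e**2 / (m_e * nu_D * (1.0 + r**2))
    return prefactor * np.array([[1.0, -r, 0.0],
                                 [r, 1.0, 0.0],
                                 [0.0, 0.0, 1.0 + r**2]])


if __name__ == "__main__":
    n, nu_D = 3.0, 0.5
    B = np.array([0.0, 0.0, 2.0])
    E = np.array([0.3, -0.7, 1.1])
    j_direct = -n * electron_velocity(E, B, nu_D)
    j_closed = conductivity_closed_form(n, B[2], nu_D) @ E
    print(np.allclose(j_direct, j_closed))   # True
    # Nearly collisionless limit: perpendicular velocity -> E x B / B^2
    E_perp = np.array([0.3, -0.7, 0.0])
    print(electron_velocity(E_perp, B, 1e-6), np.cross(E_perp, B) / (B @ B))
```

The $zz$ element reduces to $ne^2/m_e\nu_D$ whatever the value of $B$, which is the statement that conduction along the field is unaffected.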

It is apparent from the form of this conductivity tensor that when $\omega_c \gg \nu_D$, the conductivity perpendicular to the magnetic field is greatly inhibited, whereas that along the magnetic field is unaffected. Similar remarks apply to the flow of heat. It is therefore often assumed that only transport parallel to the field is effective. However, as is made clear in the next section, if the plasma is inhomogeneous, cross-field transport can be quite rapid in practice.

****************************
EXERCISES

Exercise 18.9 Example: Relativistic Larmor radius
Use the relativistic equation of motion to show that the relativistic electron cyclotron frequency is $\omega_c = eB/\gamma m_e$, where γ is the electron Lorentz factor. What is the relativistic Larmor radius?

Exercise 18.10 Example: Ultra-High-Energy Cosmic Rays
The most energetic cosmic ray reported in recent years is believed to be a proton and to have an energy $\sim 3\times10^{20}$ eV. In order to arrive at earth, it must have passed through a perpendicular Galactic magnetic field of strength 0.3 nT for a distance of ∼ 1000 light years. Through what angle will it have been deflected?

Exercise 18.11 Problem: Adiabatic Indices for Rapid Compression of a Magnetized Plasma
Consider a plasma in which, in the local rest frame of the electrons, the electron stress tensor has the form (18.43) with $\mathbf e_z \equiv \mathbf b$ the direction of the magnetic field. The following analysis for the electrons can be carried out independently for the ions, with the same resulting formulae.
(a) Show that
$$P_{e\|} = nm_e\langle v_\|^2\rangle, \qquad P_{e\perp} = \frac12 nm_e\langle|\mathbf v_\perp|^2\rangle, \tag{18.51}$$

where $\langle v_\|^2\rangle$ is the mean square electron velocity parallel to $\mathbf B$ and $\langle|\mathbf v_\perp|^2\rangle$ is the mean square velocity orthogonal to $\mathbf B$. (The velocity distributions are not assumed to be Maxwellian.)
(b) Consider a fluid element with length $l$ along the magnetic field and cross-sectional area $A$ orthogonal to the field. Let $\bar{\mathbf v}$ be the mean velocity of the electrons ($\bar{\mathbf v} = 0$ in the mean electron rest frame), and let θ and $\sigma_{jk}$ be the expansion and shear of the mean electron motion as computed from $\bar{\mathbf v}$. Show that
$$\frac{dl/dt}{l} = \frac13\theta + \sigma^{jk}b_jb_k, \qquad \frac{dA/dt}{A} = \frac23\theta - \sigma^{jk}b_jb_k, \tag{18.52}$$
where $\mathbf b = \mathbf B/|\mathbf B| = \mathbf e_z$ is a unit vector in the direction of the magnetic field.
(c) Assume that the timescales for compression and shearing are short compared to the Coulomb-scattering electron-deflection timescale, $\tau \ll t_{D,e}$. Show, using the laws of energy and particle conservation, that
$$\frac{1}{\langle v_\|^2\rangle}\frac{d\langle v_\|^2\rangle}{dt} = -\frac{2}{l}\frac{dl}{dt}, \qquad \frac{1}{\langle v_\perp^2\rangle}\frac{d\langle v_\perp^2\rangle}{dt} = -\frac{1}{A}\frac{dA}{dt}, \qquad \frac{1}{n}\frac{dn}{dt} = -\frac{1}{l}\frac{dl}{dt} - \frac{1}{A}\frac{dA}{dt}. \tag{18.53}$$
(d) Show that
$$\frac{1}{P_{e\|}}\frac{dP_{e\|}}{dt} = -3\frac{dl/dt}{l} - \frac{dA/dt}{A} = -\frac53\theta - 2\sigma^{jk}b_jb_k, \qquad \frac{1}{P_{e\perp}}\frac{dP_{e\perp}}{dt} = -\frac{dl/dt}{l} - 2\frac{dA/dt}{A} = -\frac53\theta + \sigma^{jk}b_jb_k. \tag{18.54}$$

(e) Show that when the fluid is expanding or compressing entirely perpendicular to $\mathbf B$, with no expansion or compression along $\mathbf B$, the pressures change in accord with the adiabatic indices of Eq. (18.44). Show, similarly, that when the fluid expands or compresses along $\mathbf B$, with no expansion or compression in the perpendicular direction, the pressures change in accord with the adiabatic indices of Eq. (18.45).
(f) Hence derive the so-called double adiabatic equations of state
$$P_\perp^2P_\| \propto n^5, \qquad P_\perp \propto nB, \tag{18.55}$$
valid for changes on timescales long compared with the cyclotron period but short compared with all Coulomb collision times.¹¹

****************************

¹¹ See Chew, Goldberger & Low (1956).

18.7

Adiabatic Invariants

In the next three chapters we shall meet a variety of plasma phenomena that can be understood in terms of the orbital motions of individual electrons and ions. These phenomena typically entail motions in an electromagnetic field that is nearly, but not quite, spatially homogeneous on the scale of the Larmor radius $r_L$, and that is nearly, but not quite, constant during a cyclotron period $2\pi/\omega_c$. In this section, in preparation for the next three chapters, we shall review charged-particle motion in such nearly homogeneous, nearly time-independent fields. Since the motions of electrons are usually of greater interest than those of ions, we shall presume throughout this section that the charged particle is an electron, and we shall denote its charge by $-e$ and its mass by $m_e$.

18.7.1

Homogeneous, time-independent magnetic field

From the nonrelativistic version of the Lorentz force equation, dv/dt = −(e/me )v × B, one readily deduces that an electron in a homogeneous, time-independent magnetic field B moves with uniform velocity v|| parallel to the field, and moves perpendicular to the field in a circular orbit with the cyclotron frequency ωc = eB/me and Larmor radius rL = me v⊥ /eB. Here v⊥ is the electron’s time-independent transverse speed (speed perpendicular to B).
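The helical orbit is easy to reproduce numerically. The Python sketch below uses the Boris scheme. This is a standard particle pusher chosen here for illustration (the text does not discuss numerical methods), because its magnetic rotation preserves the speed exactly. Units are normalized so that $e/m_e = 1$ and $B = 1$, giving $\omega_c = 1$. The starting point is chosen so that the guiding center lies on the $z$-axis. After one cyclotron period the electron should return to its starting $(x, y)$ on a circle of radius $r_L = v_\perp/\omega_c$, having advanced $2\pi v_\|/\omega_c$ along $\mathbf B$.

```python
import numpy as np


def boris_push(x, v, E, B, q_over_m, dt):
    """Advance position x and velocity v by one Boris step (nonrelativistic)."""
    v_minus = v + 0.5 * q_over_m * dt * E          # half electric kick
    t = 0.5 * q_over_m * dt * B                    # rotation vector
    s = 2.0 * t / (1.0 + t @ t)
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)        # magnetic rotation
    v_new = v_plus + 0.5 * q_over_m * dt * E       # second half electric kick
    return x + dt * v_new, v_new


def gyrate_one_period(v_perp=0.1, v_par=0.05, steps=400):
    """Electron (q/m = -1) in B = e_z (omega_c = 1) followed for one cyclotron period."""
    q_over_m = -1.0
    B = np.array([0.0, 0.0, 1.0])
    E = np.zeros(3)
    dt = 2.0 * np.pi / steps
    x = np.array([v_perp, 0.0, 0.0])   # guiding center on the z-axis
    v = np.array([0.0, v_perp, v_par])
    path = [x]
    for _ in range(steps):
        x, v = boris_push(x, v, E, B, q_over_m, dt)
        path.append(x)
    return np.array(path), v


if __name__ == "__main__":
    path, v = gyrate_one_period()
    print("mean orbit radius:", np.hypot(path[:, 0], path[:, 1]).mean())  # close to r_L = 0.1
    print("z advance:", path[-1, 2], "expected:", 0.05 * 2 * np.pi)
```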

18.7.2

Homogeneous time-independent electric and magnetic fields

Suppose that the homogeneous magnetic field $\mathbf B$ is augmented by a homogeneous electric field $\mathbf E$, and assume, initially, that $|\mathbf E\times\mathbf B| < B^2c$. Then examine the electric and magnetic fields in a new reference frame, one that moves with the velocity
$$\mathbf v_D = \frac{\mathbf E\times\mathbf B}{B^2} \tag{18.56}$$
relative to the original frame. Note that the moving frame's velocity $\mathbf v_D$ is perpendicular to both the magnetic field and the electric field. Since $\mathbf v_D\times\mathbf B = -\mathbf E_\perp$, where $\mathbf E_\perp$ is the component of $\mathbf E$ orthogonal to $\mathbf B$, the Lorentz transformation law for the electric field, $\mathbf E' = \gamma(\mathbf E + \mathbf v_D\times\mathbf B)$, tells us that in the moving frame $\mathbf E' = \gamma\mathbf E_\|$: the electric field and the magnetic field are parallel to each other. As a result, in the moving frame the electron's motion perpendicular to the magnetic field is purely circular. Correspondingly, in the original frame its perpendicular motion consists of a drift with velocity $\mathbf v_D$ with a circular motion superimposed on that drift (Fig. 18.4). In other words, the electron moves in a circle whose center (the electron's guiding center) drifts with velocity $\mathbf v_D$. Notice that the drift velocity (18.56) is independent of the electron's charge and mass, and thus is the same for ions as for electrons. This drift is called the $\mathbf E\times\mathbf B$ drift.

When the component of the electric field orthogonal to $\mathbf B$ is so large that the drift velocity computed from (18.56) exceeds the speed of light, the electron's guiding center, of course, cannot move with that velocity. Instead, the electric field drives the electron up to higher and higher velocities as time passes, but in a sinusoidally modulated manner. Ultimately the electron velocity gets arbitrarily close to the speed of light.
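The charge and mass independence of the $\mathbf E\times\mathbf B$ drift can be seen numerically. The Python sketch below releases an electron and a positive ion from rest in crossed fields. It uses the Boris pusher, which is our choice of integrator and not the text's. Units are normalized with $e/m_e = 1$ and $B = 1$, and a hypothetical mass ratio $m_i/m_e = 25$ keeps the run short (the real 1836 would only need more steps). Averaged over whole gyroperiods, both particles should move at $\mathbf v_D = \mathbf E\times\mathbf B/B^2$.

```python
import numpy as np


def boris_push(x, v, E, B, q_over_m, dt):
    """One nonrelativistic Boris step in fields E and B."""
    v_minus = v + 0.5 * q_over_m * dt * E
    t = 0.5 * q_over_m * dt * B
    s = 2.0 * t / (1.0 + t @ t)
    v_plus = v_minus + np.cross(v_minus + np.cross(v_minus, t), s)
    v_new = v_plus + 0.5 * q_over_m * dt * E
    return x + dt * v_new, v_new


def mean_velocity(q_over_m, E, B, total_time, dt):
    """Average velocity of a particle released from rest at the origin."""
    x, v = np.zeros(3), np.zeros(3)
    steps = int(round(total_time / dt))
    for _ in range(steps):
        x, v = boris_push(x, v, E, B, q_over_m, dt)
    return x / (steps * dt)


if __name__ == "__main__":
    B = np.array([0.0, 0.0, 1.0])
    E = np.array([0.01, 0.0, 0.0])
    v_D = np.cross(E, B) / (B @ B)               # Eq. (18.56)
    T = 100 * 2 * np.pi                          # 100 electron periods = 4 ion periods
    dt = 2 * np.pi / 200
    print("E x B / B^2 :", v_D)
    print("electron    :", mean_velocity(-1.0, E, B, T, dt))
    print("ion (m=25)  :", mean_velocity(1.0 / 25.0, E, B, T, dt))
```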


Fig. 18.4: The electron motion (upper diagram) and proton motion (lower diagram) orthogonal to the magnetic field, when there are constant electric and magnetic fields with $|\mathbf E\times\mathbf B| < B^2c$. Each electron and proton moves in a circle with a superposed drift velocity $\mathbf v_D$ given by Eq. (18.56).

When a uniform, time-independent gravitational field $\mathbf g$ accompanies a uniform, time-independent magnetic field $\mathbf B$, its effect on an electron will be the same as that of an electric field $\mathbf E_{\rm equivalent} = -(m_e/e)\mathbf g$. The electron's guiding center will therefore acquire a drift velocity
$$\mathbf v_D = -\frac{m_e}{e}\,\frac{\mathbf g\times\mathbf B}{B^2}, \tag{18.57}$$
and similarly for a proton, with $m_e$ replaced by $m_p$ and the sign reversed. This gravitational drift velocity is typically very small.
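To put "very small" into numbers, the Python sketch below evaluates $\mathbf v_D = (m/q)\,\mathbf g\times\mathbf B/B^2$. This form reduces to Eq. (18.57) for $q = -e$ and covers the proton with $q = +e$. The inputs are assumptions chosen for illustration: Earth-surface gravity of 9.81 m s⁻² and a horizontal field of $3\times10^{-5}$ T, roughly the geomagnetic field strength. With these values the electron drifts at about $2\times10^{-6}$ m/s. The proton drifts $m_p/m_e$ times faster, at a few millimetres per second, and in the opposite direction.

```python
import numpy as np

E_CHARGE = 1.602176634e-19   # elementary charge [C]
M_E = 9.1093837015e-31       # electron mass [kg]
M_P = 1.67262192369e-27      # proton mass [kg]


def gravitational_drift(g, B, mass, charge):
    """Guiding-center drift v_D = (m/q) g x B / B^2 in m/s (SI inputs)."""
    g, B = np.asarray(g, dtype=float), np.asarray(B, dtype=float)
    return (mass / charge) * np.cross(g, B) / (B @ B)


if __name__ == "__main__":
    g = np.array([0.0, 0.0, -9.81])    # assumed: Earth-surface gravity, pointing down
    B = np.array([3e-5, 0.0, 0.0])     # assumed: ~30 microtesla, horizontal
    print("electron:", gravitational_drift(g, B, M_E, -E_CHARGE), "m/s")
    print("proton:  ", gravitational_drift(g, B, M_P, +E_CHARGE), "m/s")
```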

18.7.3

Inhomogeneous, time-independent magnetic field

When the electric field vanishes, the magnetic field is spatially inhomogeneous and time-independent, and the inhomogeneity scale is large compared to the Larmor radius $r_L$ of the electron's orbit, the electron motion is nicely described in terms of a guiding center. Consider, first, the effects of a curvature of the field lines (Fig. 18.5a). Suppose that the speed of the electron along the field lines is $v_\|$. We can think of this as a guiding-center motion. As the field lines bend in, say, the direction of the unit vector $\mathbf n$ with radius of curvature $R$, this longitudinal guiding-center motion experiences the centripetal acceleration $\mathbf a = v_\|^2\mathbf n/R$. In a frame that follows this longitudinal motion, the electron feels the corresponding inertial (centrifugal) force $-m_e\mathbf a$. This is equivalent to the effect of an electric field $\mathbf E_{\rm effective} = (m_e/e)v_\|^2\mathbf n/R$ (the gravitational case with $\mathbf g\to-\mathbf a$), and it therefore produces a drift of the guiding center with $\mathbf v_D = (\mathbf E_{\rm effective}\times\mathbf B)/B^2$. The radius of curvature $R$ of the field line and the direction $\mathbf n$ of its bend are given by $B^{-2}(\mathbf B\cdot\nabla)\mathbf B = \mathbf n/R$, up to a component along $\mathbf B$ that does not contribute to the drift. The curvature drift velocity is therefore
$$\mathbf v_D = -\frac{m_ev_\|^2}{e}\,\frac{\mathbf B\times(\mathbf B\cdot\nabla)\mathbf B}{B^4}. \tag{18.58}$$
Notice that the magnitude of this drift is
$$v_D = \frac{r_L}{R}\frac{v_\|}{v_\perp}\,v_\|. \tag{18.59}$$


Fig. 18.5: An electron’s motion in a time-independent, inhomogeneous magnetic field. (a) The drift induced by the curvature of the field lines. (b) The drift induced by a transverse gradient of the magnitude of the magnetic field. (c) The change in electron pitch angle induced by a longitudinal gradient of the magnitude of the magnetic field.

A second kind of inhomogeneity is a transverse spatial gradient of the magnitude of $\mathbf B$. As is shown in Fig. 18.5b, such a gradient causes the electron's circular motion to be tighter (smaller radius of curvature of the circle) in the region of larger B than in the region of smaller B, and this difference in radii of curvature clearly induces a drift. It is straightforward to show that the resulting gradient drift velocity is
$$\mathbf v_D = -\frac{m_ev_\perp^2}{2e}\,\frac{\mathbf B\times\nabla B}{B^3}. \tag{18.60}$$
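Equation (18.60) can be tested against a direct orbit integration. The Python sketch below uses a hypothetical test field $\mathbf B = B_0(1 + x/L)\,\mathbf e_z$, so that $\nabla B = (B_0/L)\,\mathbf e_x$. It pushes an electron with the Boris scheme (our choice of integrator) in units where $e/m_e = B_0 = 1$, with $r_L/L = 10^{-3}$. It tracks the guiding center $\mathbf R = \mathbf x + (\mathbf v\times\mathbf B)/[(q/m)B^2]$ and fits its $y$-velocity. The fit should agree with the $y$-component of Eq. (18.60), $-v_\perp^2/(2L)$ in these units, to within a few per cent.

```python
import numpy as np


def boris_push(x, v, E, B, q_over_m, dt):
    """One nonrelativistic Boris step in fields E and B."""
    v_minus = v + 0.5 * q_over_m * dt * E
    t = 0.5 * q_over_m * dt * B
    s = 2.0 * t / (1.0 + t @ t)
    v_plus = v_minus + np.cross(v_minus + np.cross(v_minus, t), s)
    v_new = v_plus + 0.5 * q_over_m * dt * E
    return x + dt * v_new, v_new


def field(x, B0=1.0, L=100.0):
    """Hypothetical test field B = B0 (1 + x/L) e_z, so grad|B| = (B0/L) e_x."""
    return np.array([0.0, 0.0, B0 * (1.0 + x[0] / L)])


def measured_drift_y(v_perp=0.1, L=100.0, periods=50, steps_per_period=200):
    """Least-squares y-velocity of the electron's guiding center (e/m_e = 1, B0 = 1)."""
    q_over_m = -1.0
    dt = 2.0 * np.pi / steps_per_period
    x = np.array([v_perp, 0.0, 0.0])    # guiding center starts near x = 0
    v = np.array([0.0, v_perp, 0.0])
    E = np.zeros(3)
    times, gc_y = [], []
    for n in range(periods * steps_per_period):
        B = field(x, L=L)
        R = x + np.cross(v, B) / (q_over_m * (B @ B))   # guiding-center position
        times.append(n * dt)
        gc_y.append(R[1])
        x, v = boris_push(x, v, E, B, q_over_m, dt)
    return np.polyfit(times, gc_y, 1)[0]


def predicted_drift_y(v_perp=0.1, L=100.0, B0=1.0):
    """y-component of Eq. (18.60) at x = 0 with m_e = e = 1."""
    B = np.array([0.0, 0.0, B0])
    grad_B = np.array([B0 / L, 0.0, 0.0])
    return (-(v_perp**2 / 2.0) * np.cross(B, grad_B) / B0**3)[1]


if __name__ == "__main__":
    print("measured :", measured_drift_y())
    print("Eq. 18.60:", predicted_drift_y())
```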

A third, and last, kind of inhomogeneity is a longitudinal gradient of the magnitude of B (Fig. 18.5c). Such a gradient results from the magnetic field lines converging toward each other (or diverging away from each other). The effect of this convergence is most easily inferred in a frame that moves longitudinally with the electron. In such a frame the magnetic field changes with time, $\partial\mathbf B'/\partial t \neq 0$, and correspondingly there is an electric field that satisfies $\nabla\times\mathbf E' = -\partial\mathbf B'/\partial t$. The kinetic energy of the electron as measured in this longitudinally moving frame is the same as the transverse energy $\frac12 m_ev_\perp^2$ in the original frame. This kinetic energy is forced to change by the electric field $\mathbf E'$. The change in energy during one circuit around the magnetic field is
$$\Delta\!\left(\frac12 m_ev_\perp^2\right) = -e\oint\mathbf E'\cdot d\mathbf l = e\int\frac{\partial\mathbf B'}{\partial t}\cdot d\mathbf A = e\,\frac{\omega_c}{2\pi}\,\Delta B\,\pi r_L^2 = \frac{m_ev_\perp^2}{2}\,\frac{\Delta B}{B}. \tag{18.61}$$
Here the second expression involves a line integral once around the electron's circular orbit. The third involves a surface integral over the interior of the orbit, with $\partial\mathbf B'/\partial t$ parallel to $d\mathbf A$. In the fourth, the time derivative of the magnetic field has been expressed as $(\omega_c/2\pi)\Delta B$, where $\Delta B$ is the change in magnetic field strength along the electron's guiding center during one circular orbit. Equation (18.61) can be rewritten as a conservation law along the world line of the electron's guiding center:
$$\frac{m_ev_\perp^2}{2B} = \text{constant}. \tag{18.62}$$

Notice that the conserved quantity $m_ev_\perp^2/2B$ is proportional to the total magnetic flux $\pi r_L^2B$ threading the electron's circular orbit: $m_ev_\perp^2/2B = (e^2/2\pi m_e)\,\pi r_L^2B$. Thus the electron moves along the field lines in such a manner as to keep the magnetic flux enclosed within its orbit always constant; see Fig. 18.5c. A second interpretation of (18.62) is in terms of the magnetic moment created by the electron's circulatory motion. That moment is $\boldsymbol\mu = (-m_ev_\perp^2/2B^2)\mathbf B$, and its magnitude is the conserved quantity
$$\mu = \frac{m_ev_\perp^2}{2B} = \text{constant}. \tag{18.63}$$

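An important consequence of the conservation law (18.62) is a gradual change in the electron's pitch angle
$$\alpha \equiv \tan^{-1}(v_\|/v_\perp) \tag{18.64}$$
as it spirals along the converging field lines. (Note that α defined in this way is the complement of the angle between $\mathbf v$ and $\mathbf B$ used in Sec. 18.6.1, so it vanishes when the motion is purely perpendicular to $\mathbf B$.) Because there is no electric field in the original frame, the electron's total kinetic energy is conserved in that frame,
$$E_{\rm kin} = \frac12 m_e(v_\|^2 + v_\perp^2) = \text{constant}. \tag{18.65}$$
This, together with the constancy of $\mu = m_ev_\perp^2/2B$ and the definition (18.64) of the electron pitch angle, implies that the pitch angle varies with magnetic field strength as
$$\tan^2\alpha = \frac{E_{\rm kin}}{\mu B} - 1. \tag{18.66}$$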

Notice that as the field lines converge, B increases in magnitude, and α decreases. Ultimately, when B reaches a critical value Bcrit = Ekin /µ, the pitch angle α goes to zero. The electron then “reflects” off the strong-field region and starts moving back toward weak fields, with increasing pitch angle. The location at which the electron reflects is called the electron’s mirror point. Figure 18.6 shows two examples of this mirroring. The first example is a “magnetic bottle.” Electrons whose pitch angles at the center of the bottle are sufficiently small have mirror points within the bottle and thus cannot leak out. The second example is the van Allen belts of the earth. Electrons (and also ions) travel up and down the magnetic field lines of the van Allen belts, reflecting at mirror points. It is not hard to show that the gradient of B can be split up into the three pieces we have studied: a curvature with no change of B = |B| (Fig. 18.5a), a change of B orthogonal to the magnetic field (Fig. 18.5b), and a change of B along the magnetic field (Fig. 18.5c). When (as we have assumed) the lengthscales of these changes are far greater than the electron’s Larmor radius, their effects on the electron’s motion superpose linearly.
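Equation (18.66) and the mirror condition $B_{\rm crit} = E_{\rm kin}/\mu$ translate directly into a few lines of Python. The sketch below uses the pitch-angle convention of Eq. (18.64), $\alpha = \tan^{-1}(v_\|/v_\perp)$, and units with $m_e = 1$ and unit speed. The function names are hypothetical. An electron that starts with $\alpha = 45^\circ$ at field strength $B_0$ should mirror at $B = 2B_0$.

```python
import math


def invariants(alpha0, B0, v=1.0, m=1.0):
    """E_kin and mu = m v_perp^2 / (2B) for pitch angle alpha0 = atan(v_par/v_perp) at B0."""
    v_perp = v * math.cos(alpha0)
    return 0.5 * m * v**2, 0.5 * m * v_perp**2 / B0


def pitch_angle(B, E_kin, mu):
    """Pitch angle from Eq. (18.66); None where B exceeds the mirror field E_kin / mu."""
    ratio = E_kin / (mu * B)
    if ratio < 1.0 - 1e-12:          # the electron cannot reach this field strength
        return None
    return math.atan(math.sqrt(max(ratio - 1.0, 0.0)))


def mirror_field(E_kin, mu):
    """B_crit = E_kin / mu, where alpha -> 0 and the electron reflects."""
    return E_kin / mu


if __name__ == "__main__":
    E_kin, mu = invariants(math.radians(45.0), B0=1.0)
    print("mirror field:", mirror_field(E_kin, mu))
    for B in (1.0, 1.5, 2.0, 2.5):
        a = pitch_angle(B, E_kin, mu)
        print(B, None if a is None else math.degrees(a))
```

The small tolerance in `pitch_angle` prevents rounding error exactly at the mirror point from being misread as "beyond the mirror".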

18.7.4

A slowly time varying magnetic field

When the magnetic field changes on timescales long compared to the cyclotron period $2\pi/\omega_c$, its changes induce alterations of the electron's orbit. These can be deduced with the aid of adiabatic invariants, i.e., quantities that are invariant when the field changes adiabatically (slowly).¹² The conserved magnetic moment $\mu = m_ev_\perp^2/2B$ associated with an electron's transverse, circular motion is an example of an adiabatic invariant. We proved its invariance in Eqs. (18.61) and (18.62) above. There we were computing in a reference frame in which the magnetic field changed slowly, with a weak electric field associated with that change. This adiabatic invariant can be shown to be, aside from a constant multiplicative factor $2\pi m_e/e$, the action associated with the electron's circular motion, $J_\phi = \oint p_\phi\,d\phi$. Here φ is the angle around the circular orbit and $p_\phi = m_ev_\perp r_L - eA_\phi$ is the φ component of the electron's canonical momentum. The action $J_\phi$ is a well-known adiabatic invariant.

Fig. 18.6: Two examples of the mirroring of particles in an inhomogeneous magnetic field: (a) A magnetic bottle. (b) The earth's van Allen belts.

Whenever a slightly inhomogeneous magnetic field varies slowly in time, not only is $\mu = m_ev_\perp^2/2B$ adiabatically invariant (conserved); so also are two other actions. One is the action associated with motion from one mirror point of the magnetic field to another and back,
$$J_\| = \oint p_\|\,dl. \tag{18.67}$$
Here $p_\| = m_ev_\| - eA_\| = m_ev_\|$ is the generalized (canonical) momentum along the field line, and $dl$ is distance along the field line. Thus $J_\| = 2m_e\langle v_\|\rangle\Delta l$, and, since $m_e$ is constant, the adiabatic invariant is the spatial average $\langle v_\|\rangle$ of the longitudinal speed of the electron multiplied by twice the distance $\Delta l$ between mirror points. The other (third) adiabatic invariant is the action associated with the drift of the guiding center. An electron mirroring back and forth along the field lines drifts sideways, and by its drift it traces out a 2-dimensional surface to which the magnetic field is parallel, e.g., the surface of the magnetic bottle in Fig. 18.6a. The action of the electron's drift around this magnetic surface turns out to be proportional to the total magnetic flux enclosed within the surface.
Thus, if the field geometry changes very slowly, the magnetic flux enclosed by the magnetic surface on which the electron moves is adiabatically conserved.

¹² See, e.g., Landau & Lifshitz (1960), Northrop (1963).
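Adiabatic invariance of $\mu$ can be illustrated numerically. The Python sketch below doubles a uniform field $B_z(t)$ along a smooth tanh ramp; this ramp shape is a hypothetical choice. The ramp takes place on a timescale $\tau = 30$, about five initial gyroperiods, in units with $e/m_e = 1$ and initial $B = 1$. The induced electric field $\mathbf E = \tfrac12\dot B_z\,(y, -x, 0)$ satisfies $\nabla\times\mathbf E = -\dot B_z\mathbf e_z$ (Faraday's law). The orbit is advanced with a Boris pusher, which is our choice of integrator. $\mu = v_\perp^2/2B$ should be conserved to far better than 1% while $v_\perp^2$ itself doubles. Here the residual is set by integration error, not by the exponentially small non-adiabatic correction discussed below.

```python
import numpy as np


def boris_push(x, v, E, B, q_over_m, dt):
    """One nonrelativistic Boris step in fields E and B."""
    v_minus = v + 0.5 * q_over_m * dt * E
    t = 0.5 * q_over_m * dt * B
    s = 2.0 * t / (1.0 + t @ t)
    v_plus = v_minus + np.cross(v_minus + np.cross(v_minus, t), s)
    v_new = v_plus + 0.5 * q_over_m * dt * E
    return x + dt * v_new, v_new


def ramp(t, t_mid, tau):
    """Hypothetical smooth ramp: B_z rises from ~1 to ~2; returns (B_z, dB_z/dt)."""
    B_z = 1.0 + 0.5 * (1.0 + np.tanh((t - t_mid) / tau))
    dB_dt = 0.5 / (tau * np.cosh((t - t_mid) / tau) ** 2)
    return B_z, dB_dt


def mu_before_after(tau=30.0, steps_per_period=200):
    """Return (mu_initial, mu_final, v_perp^2 ratio) for an electron (e/m_e = 1)."""
    q_over_m = -1.0
    dt = 2.0 * np.pi / steps_per_period       # 200 steps per initial gyroperiod
    t_mid, t_end = 8.0 * tau, 16.0 * tau
    x = np.array([0.1, 0.0, 0.0])             # guiding center on the axis
    v = np.array([0.0, 0.1, 0.0])
    B_start, _ = ramp(0.0, t_mid, tau)
    vperp2_start = v[0]**2 + v[1]**2
    t = 0.0
    while t < t_end:
        B_z, dB_dt = ramp(t, t_mid, tau)
        B = np.array([0.0, 0.0, B_z])
        E = 0.5 * dB_dt * np.array([x[1], -x[0], 0.0])   # curl E = -dB/dt e_z
        x, v = boris_push(x, v, E, B, q_over_m, dt)
        t += dt
    B_end, _ = ramp(t, t_mid, tau)
    vperp2_end = v[0]**2 + v[1]**2
    return (vperp2_start / (2.0 * B_start), vperp2_end / (2.0 * B_end),
            vperp2_end / vperp2_start)


if __name__ == "__main__":
    mu0, mu1, ratio = mu_before_after()
    print("mu ratio:", mu1 / mu0, " v_perp^2 ratio:", ratio)
```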

How nearly constant are the adiabatic invariants? The general theory of adiabatic invariants shows that, so long as the temporal changes of the magnetic field structure are smooth enough to be described by analytic functions of time, the fractional failures of the adiabatic invariants to be conserved are of order $e^{-\tau/P}$. Here τ is the timescale on which the field changes and P is the period of the motion associated with the adiabatic invariant: $2\pi/\omega_c$ for the invariant µ, the mirroring period for the longitudinal action, and the drift period for the magnetic flux enclosed in the electron's magnetic surface. Because the exponential $e^{-\tau/P}$ dies out so quickly with increasing timescale τ, the adiabatic invariants are conserved to very high accuracy whenever $\tau \gg P$.

****************************
EXERCISES

Exercise 18.12 Example: Mirror Machine
One method for confining hot plasma is to arrange electric coils so as to make a mirror machine in which the magnetic field has the geometry sketched in Fig. 18.6a. Suppose that the magnetic field in the center is 1 T and the field strength at the two necks is 10 T, and that plasma is introduced with an isotropic velocity distribution near the center of the bottle.
(a) What fraction of the plasma particles will escape?
(b) Sketch the pitch-angle distribution function for the particles that remain.
(c) Suppose that Coulomb collisions cause particles to diffuse in pitch angle α with a diffusion coefficient
$$D_{\alpha\alpha} \equiv \frac{\langle\Delta\alpha^2\rangle}{\Delta t} = t_D^{-1}. \tag{18.68}$$
Estimate how long it will take most of the plasma to escape the mirror machine.
(d) What do you suspect will happen in practice?

Exercise 18.13 Challenge: Penning Traps
A clever technique for studying the behavior of individual electrons or ions is to entrap them using a combination of electric and magnetic fields. One of the simplest and most useful devices is the Penning trap.¹³ Basically, this comprises a uniform magnetic field $\mathbf B$ combined with a hyperboloidal electrostatic field that is maintained between hyperboloidal electrodes, as shown in Fig. 18.7. The electrostatic potential has the form $\Phi(\mathbf x) = \Phi_0(z^2 - x^2/2 - y^2/2)/2d^2$. Here $\Phi_0$ is the potential difference maintained across the electrodes, and $d$ is the minimum axial distance from the origin to the hyperboloidal cap as well as $1/\sqrt2$ times the minimum radius of the ring electrode.
(a) Show that the potential satisfies Laplace's equation, as it must.

¹³ Brown & Gabrielse (1986).


Fig. 18.7: Penning Trap for localizing individual charged particles. The magnetic field is uniform and parallel to the axis of symmetry e3 . The electric field is maintained between a pair of hyperboloidal caps and a hyperboloidal ring.

(b) Now consider an individual charged particle in the trap. Show that there can be three separate oscillations excited: (i) cyclotron orbits in the magnetic field with angular frequency $\omega_c$; (ii) “magnetron” orbits produced by $\mathbf E\times\mathbf B$ drift around the axis of symmetry with angular frequency $\omega_m$; (iii) axial oscillations parallel to the magnetic field with angular frequency $\omega_z$. Assume that $\omega_m \ll \omega_z \ll \omega_c$ and show that $\omega_z^2 \simeq 2\omega_m\omega_c$.
(c) Typically the potential difference across the electrodes is ∼ 10 V, the magnetic field strength is B ∼ 6 T, and the radius of the ring and the height of the caps above the center of the trap are ∼ 3 mm. Estimate the three independent angular frequencies for electrons and ions, verifying the ordering $\omega_m \ll \omega_z \ll \omega_c$. Also estimate the maximum velocities associated with each of these oscillations if the particle is to be retained by the trap.
(d) Solve the classical equation of motion exactly and demonstrate that the magnetron motion is formally unstable.
Penning traps have been used to perform measurements of the electron-proton mass ratio and the magnetic moment of the electron with unprecedented precision.

****************************

Bibliographic Note
For a very thorough treatment of the particle kinetics of plasmas, see Shkarofsky, Johnston and Bachynski (1966). For less detailed treatments see the relevant portions of Boyd and Sanderson (1969), Krall and Trivelpiece (1973), Jackson (1999), Schmidt (1979), and especially Spitzer (1962). For particle motion in inhomogeneous and time-varying magnetic fields, see Northrop (1963) and the relevant portions of Jackson (1999).

Box 18.2 Important Concepts in Chapter 18
• Density-temperature regime for plasmas, Sec. 18.2 and Fig. 18.1
• Examples of environments where plasmas occur, Sec. 18.2, Fig. 18.1, and Table 18.1
• Debye shielding, Debye length and Debye number, Secs. 18.3.1 and 18.3.2
• Plasma oscillations and plasma frequency, Sec. 18.3.3
• Coulomb logarithm, its role in quantifying the cumulative effects of small-angle Coulomb scatterings, and its typical values, Secs. 18.4.1 and 18.4.2
• Deflection times $t_D$ and rates $\nu_D$ for Coulomb collisions (ee, pp, and ep), Sec. 18.4.1
• Thermal equilibration times $t_E$ and rates $\nu_E = 1/t_E$ for Coulomb collisions (ee, pp, and ep, and their ratios), Sec. 18.4.3
• Electric and thermal conductivity for an unmagnetized plasma when the principal impediment is Coulomb scattering, Sec. 18.5
• Anomalous resistivity and equilibration, Sec. 18.5
• Cyclotron frequency for electrons and protons; Larmor radius, Sec. 18.6.1
• Anisotropic pressures, adiabatic indices, and electrical conductivity in a magnetized plasma, Secs. 18.6.2 and 18.6.3
• Drift velocities: E × B drift, curvature drift, and gradient drift, Secs. 18.7.2 and 18.7.3
• Adiabatic invariants for charged-particle motion in an inhomogeneous, time-varying magnetic field, and the character of the particle motion, Secs. 18.7.3 and 18.7.4, and Fig. 18.6

Bibliography
Binney, J.J. and Tremaine, S.D. 1987. Galactic Dynamics. Princeton, NJ: Princeton University Press.
Boyd, T.J.M. and Sanderson, J.J. 1969. Plasma Dynamics. London: Nelson.
Brown, L.S. and Gabrielse, G. 1986. “Geonium theory: Physics of a single electron or ion in a Penning trap.” Reviews of Modern Physics, 58, 233.
Chew, G.F., Goldberger, M.L. and Low, F.E. 1956. Proceedings of the Royal Society of London, A236, 112.
Crookes, W. 1879. Phil. Trans., 1, 135.
Heaviside, O. 1902. ????
Jackson, J.D. 1999. Classical Electrodynamics, third edition. New York: Wiley.
Kittel, C. 1958. Elementary Statistical Physics. New York: Wiley.
Krall, N.A. and Trivelpiece, A.W. 1973. Principles of Plasma Physics. New York: McGraw Hill.
Landau, L.D. and Lifshitz, E.M. 1960. Mechanics. Oxford: Pergamon Press.
Leighton, R.B. 1959. Principles of Modern Physics. New York: McGraw Hill.
Northrop, T.G. 1963. Adiabatic Motion of Charged Particles. New York: Interscience.
Reif, F. 1965. Fundamentals of Statistical and Thermal Physics. New York: McGraw Hill.
Rosenbluth, M.N., Macdonald, M., and Judd, D.L. 1957. Physical Review, ???.
Schmidt, G. 1979. Physics of High Temperature Plasmas. New York: Academic Press.
Shkarofsky, I.P., Johnston, T.W., and Bachynski, M.P. 1966. The Particle Kinetics of Plasmas. Reading, Mass.: Addison-Wesley.
Spitzer, Jr., L. 1962. Physics of Fully Ionized Gases, second edition. New York: Interscience.

Contents
19 Waves in Cold Plasmas: Two-Fluid Formalism
19.1 Overview
19.2 Dielectric Tensor, Wave Equation, and General Dispersion Relation
19.3 Wave Modes in an Unmagnetized Plasma
19.3.1 Two-Fluid Formalism
19.3.2 Dielectric Tensor and Dispersion Relation for a Cold Plasma
19.3.3 Electromagnetic Plasma Waves
19.3.4 Langmuir Waves and Ion Acoustic Waves in Warm Plasmas
19.3.5 Cutoffs and Resonances
19.4 Wave Modes in a Cold, Magnetized Plasma
19.4.1 Dielectric Tensor and Dispersion Relation
19.4.2 Parallel Propagation
19.4.3 Perpendicular Propagation
19.5 Propagation of Radio Waves in the Ionosphere
19.6 CMA Diagram for Wave Modes in Cold, Magnetized Plasma
19.7 Two Stream Instability

Chapter 19
Waves in Cold Plasmas: Two-Fluid Formalism
Version 0819.1.K.pdf, 08 April 2009.
Please send comments, suggestions, and errata via email to [email protected] or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 19.1 Reader's Guide
• This chapter relies significantly on:
  – Chapter 18 on the particle kinetics of plasmas.
  – The basic concepts of fluid mechanics, Secs. 12.4 and 12.5.
  – Magnetosonic waves, Sec. 17.7.
  – The basic concepts of geometric optics, Secs. 6.2 and 6.3.
• The remaining chapters 20 and 21 of Part V, Plasma Physics, rely heavily on this chapter.

19.1

Overview

The growth of plasma physics came about historically through the studies of oscillations in electric discharges and the contemporaneous development of the means to broadcast radio waves over increasing distances by reflecting them off a layer of partially ionized gas in the upper atmosphere known as the ionosphere. It is therefore not surprising that most early research was devoted to describing the different modes of wave propagation. Even in the simplest, linear approximation, we will see that the variety of possible modes is immense.

In the previous chapter, we introduced several length and time scales, e.g., the gyro radius and Debye length, the plasma period, the gyro period, and the collision frequency. To these must now be added the wavelength and period of the wave under study. The relative ordering of these different scales controls the characteristics of the waves, and it is already apparent that there are a bewildering number of possibilities. If we further recognize that plasmas are often effectively collisionless, and that there is no guarantee that the particle distribution functions can be characterized by a single temperature, then the possibilities multiply.

Fortunately, the techniques needed to describe the propagation of linear wave perturbations in a particular equilibrium configuration of a plasma are straightforward and can be amply illustrated by studying a few simple cases. In this chapter, we shall follow this course by restricting our attention to one class of modes: those where we can either ignore completely the thermal motions of the ions and electrons that comprise the plasma (in other words, treat these species as cold) or include them using just a velocity dispersion or temperature. We can then apply our knowledge of fluid dynamics by treating the ions and electrons separately as fluids, upon which electromagnetic forces act. This is called the two-fluid formalism for plasmas. In the next chapter, we shall show when and how waves are sensitive to the actual distribution of particle speeds by developing the more sophisticated kinetic-theory formalism and using it to study waves in warm plasmas.

We begin our two-fluid study of plasma waves in Sec. 19.2 by deriving a very general wave equation. It governs weak waves in a homogeneous plasma that may or may not have a magnetic field, and also governs electromagnetic waves in any other dielectric medium.
That wave equation and the associated dispersion relation for the wave modes depend on a dielectric tensor, which must be derived from an examination of the motion of the electrons and protons (or other charge carriers) inside the wave. In Sec. 19.3 we specialize to wave modes in a uniform, unmagnetized plasma. Using a two-fluid (electron-fluid and proton-fluid) description of the charge carriers' motions, we derive the dielectric tensor and thence the dispersion relation for the wave modes. The modes fall into two classes: (i) Transverse or electromagnetic modes. These are modified versions of electromagnetic waves in vacuum. As we shall see, they can propagate only at frequencies above the plasma frequency; at lower frequencies they become evanescent. (ii) Longitudinal waves, which come in two species: Langmuir waves and ion acoustic waves. Longitudinal waves are a melded combination of sound waves in a fluid and electrostatic plasma oscillations; their restoring force is a mixture of thermal pressure and electrostatic forces.

In Sec. 19.4, we explore how a uniform magnetic field changes the character of these waves. The B field makes the plasma anisotropic but axially symmetric. As a result, the dielectric tensor, dispersion relation, and wave modes have much in common with those in an anisotropic but axially symmetric dielectric crystal, which we studied in the context of nonlinear optics in Chap. 9. A plasma, however, has a much richer set of characteristic frequencies than does a crystal (electron plasma frequency, electron cyclotron frequency, ion cyclotron frequency, ...). As a result, even in the regime of weak linear waves and a cold plasma (no thermal pressure), the plasma has a far greater richness of modes than does a crystal. In Sec. 19.4 we derive the general dispersion relation that encompasses all of these cold-magnetized-plasma modes, and explore the special cases of modes that propagate parallel to and perpendicular to the magnetic field. Then in Sec. 19.5 we examine a practical example: the propagation of radio waves in the Earth's ionosphere, where it is a good approximation to ignore the ion motion and work with a one-fluid (i.e. electron-fluid) theory. Having gained insight into simple cases (parallel modes, perpendicular modes, and one-fluid modes), we return in Sec. 19.6 to the full class of linear modes in a cold, magnetized two-fluid plasma and briefly describe some tools by which one can make sense of them all.

Finally, in Sec. 19.7 we turn to the question of plasma stability. In Part IV we saw that fluid flows that have sufficient shear are unstable; perturbations can feed off the relative kinetic energy of adjacent regions of the fluid, and use that energy to power an exponential growth. In plasmas, with their long mean free paths, there can similarly exist kinetic energies of relative, ordered motion in velocity space; and perturbations, feeding off those energies, can grow exponentially. To study this in full requires the kinetic-theory description of a plasma, which we develop in Chap. 21; but in Sec. 19.7 we get insight into a prominent example of such a velocity-space instability by analyzing two cold plasma streams moving through each other. We illustrate the resulting two-stream instability by a short discussion of particle beams that are created in disturbances on the surface of the sun and propagate out through the solar wind.

19.2 Dielectric Tensor, Wave Equation, and General Dispersion Relation

We begin our study of waves in plasmas by deriving a very general wave equation which applies equally well to electromagnetic waves in unmagnetized plasmas, in magnetized plasmas, and in any other kind of dielectric medium such as an anisotropic crystal. This wave equation is the same one as we used in our study of nonlinear optics in Chap. 9 [Eqs. (9.38) and (9.39a)], and the derivation is essentially the same as that sketched in Exercise 9.6. When a wave propagates through a plasma (or other dielectric), it entails a relative motion of electrons and protons (or other charge carriers). Assuming the wave has small enough amplitude to be linear, those charge motions can be embodied in an oscillating polarization (electric dipole moment per unit volume) $\mathbf{P}(\mathbf{x}, t)$, which is related to the plasma's (or dielectric's) varying charge density $\rho_e$ and current density $\mathbf{j}$ in the usual way:

$$
\rho_e = -\nabla\cdot\mathbf{P}, \qquad \mathbf{j} = \frac{\partial\mathbf{P}}{\partial t}. \tag{19.1}
$$

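(These relations enforce charge conservation, $\partial\rho_e/\partial t + \nabla\cdot\mathbf{j} = 0$.) When these $\rho_e$ and $\mathbf{j}$ are inserted into the standard Maxwell equations for $\mathbf{E}$ and $\mathbf{B}$, one obtains

$$
\nabla\cdot\mathbf{E} = -\frac{\nabla\cdot\mathbf{P}}{\epsilon_0}, \qquad
\nabla\cdot\mathbf{B} = 0, \qquad
\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}, \qquad
\nabla\times\mathbf{B} = \mu_0\frac{\partial\mathbf{P}}{\partial t} + \frac{1}{c^2}\frac{\partial\mathbf{E}}{\partial t}. \tag{19.2}
$$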

If the plasma is endowed with a uniform magnetic field $\mathbf{B}_0$, that field can be left out of these equations, as its divergence and curl are guaranteed to vanish. Thus, we can regard $\mathbf{E}$, $\mathbf{B}$ and $\mathbf{P}$ in these Maxwell equations as the perturbed quantities associated with the waves.

From a detailed analysis of the response of the charge carriers to the oscillating $\mathbf{E}$ and $\mathbf{B}$ fields, one can deduce a linear relationship between the waves' electric field $\mathbf{E}$ and the polarization $\mathbf{P}$:

$$
P_j = \epsilon_0\chi_{jk}E_k. \tag{19.3}
$$

Here $\epsilon_0$ is the vacuum permittivity and $\chi_{jk}$ is a dimensionless, tensorial electric susceptibility [cf. Eq. (9.21)]. A different, but equivalent, viewpoint on the relationship between $\mathbf{P}$ and $\mathbf{E}$ can be deduced by taking the time derivative of Eq. (19.3), setting $\partial\mathbf{P}/\partial t = \mathbf{j}$, assuming a sinusoidal time variation $e^{-i\omega t}$ so $\partial\mathbf{E}/\partial t = -i\omega\mathbf{E}$, and then reinterpreting the result as Ohm's law with a tensorial electric conductivity $\kappa_{e\,jk}$:

$$
j_j = \kappa_{e\,jk}E_k, \qquad \kappa_{e\,jk} = -i\omega\epsilon_0\chi_{jk}. \tag{19.4}
$$

Evidently, for sinusoidal waves the electric susceptibility $\chi_{jk}$ and the electric conductivity $\kappa_{e\,jk}$ embody the same information about the wave-particle interactions. That information is also embodied in a third object: the dimensionless dielectric tensor $\epsilon_{jk}$, which relates the electric displacement $\mathbf{D}$ to the electric field $\mathbf{E}$:

$$
D_j \equiv \epsilon_0E_j + P_j = \epsilon_0\epsilon_{jk}E_k, \qquad
\epsilon_{jk} = \delta_{jk} + \chi_{jk} = \delta_{jk} + \frac{i}{\epsilon_0\omega}\kappa_{e\,jk}. \tag{19.5}
$$
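For $e^{-i\omega t}$ waves, Eqs. (19.4) and (19.5) are simple linear relabelings of one another. As a small consistency sketch (Python with NumPy; the function names and the sample tensor are our own, hypothetical choices), the following converts a susceptibility tensor into a conductivity tensor and then into a dielectric tensor, and checks that this route agrees with $\epsilon_{jk} = \delta_{jk} + \chi_{jk}$ directly.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m (CODATA 2018)

def conductivity_from_susceptibility(chi, omega):
    """Eq. (19.4): kappa_e,jk = -i omega eps0 chi_jk, for fields varying as exp(-i omega t)."""
    return -1j * omega * EPS0 * np.asarray(chi, dtype=complex)

def dielectric_from_conductivity(kappa, omega):
    """Eq. (19.5): eps_jk = delta_jk + i kappa_e,jk / (eps0 omega)."""
    kappa = np.asarray(kappa, dtype=complex)
    return np.eye(kappa.shape[0]) + 1j * kappa / (EPS0 * omega)

def dielectric_from_susceptibility(chi):
    """Eq. (19.5): eps_jk = delta_jk + chi_jk."""
    chi = np.asarray(chi, dtype=complex)
    return np.eye(chi.shape[0]) + chi

# An arbitrary (hypothetical) anisotropic susceptibility and angular frequency.
chi = np.array([[-0.5, 0.2j, 0.0], [-0.2j, -0.5, 0.0], [0.0, 0.0, -1.3]])
omega = 2 * np.pi * 1.0e7
eps_via_kappa = dielectric_from_conductivity(conductivity_from_susceptibility(chi, omega), omega)
print(np.allclose(eps_via_kappa, dielectric_from_susceptibility(chi)))  # True
```

The round trip works because $i(-i\omega\epsilon_0\chi)/(\epsilon_0\omega) = \chi$ identically; the factors of $\omega$ and $\epsilon_0$ only matter when one works with $\kappa_e$ directly.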

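In the next section we shall derive the value of the dielectric tensor $\epsilon_{jk}$ for waves in an unmagnetized plasma, and in Sec. 19.4 we shall derive it for a magnetized plasma. Using the definition $\mathbf{D} = \epsilon_0\mathbf{E} + \mathbf{P}$, we can eliminate $\mathbf{P}$ from Eqs. (19.2), thereby obtaining the familiar form of Maxwell's equations for dielectric media with no non-polarization-based charges or currents:

$$
\nabla\cdot\mathbf{D} = 0, \qquad
\nabla\cdot\mathbf{B} = 0, \qquad
\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}, \qquad
\nabla\times\mathbf{B} = \mu_0\frac{\partial\mathbf{D}}{\partial t}. \tag{19.6}
$$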

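By taking the curl of the third of these equations and combining it with the fourth and with $D_j = \epsilon_0\epsilon_{jk}E_k$, we obtain the wave equation that governs the perturbations:

$$
\nabla^2\mathbf{E} - \nabla(\nabla\cdot\mathbf{E}) - \frac{1}{c^2}\,\boldsymbol{\epsilon}\cdot\frac{\partial^2\mathbf{E}}{\partial t^2} = 0, \tag{19.7}
$$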

where $\boldsymbol{\epsilon}$ is our index-free notation for $\epsilon_{jk}$. Specializing to a plane-wave mode with wave vector $\mathbf{k}$ and angular frequency $\omega$, so $\mathbf{E} \propto e^{i\mathbf{k}\cdot\mathbf{x}}e^{-i\omega t}$, we convert this wave equation into a homogeneous algebraic equation for the Cartesian components of the electric vector $E_j$ (cf. Box 11.2):

$$
L_{ij}E_j = 0, \tag{19.8}
$$

where

$$
L_{ij} = k_ik_j - k^2\delta_{ij} + \frac{\omega^2}{c^2}\epsilon_{ij}. \tag{19.9}
$$

The algebratized wave equation (19.8) can have a solution only if the determinant of the three-dimensional matrix $L_{ij}$ vanishes:

$$
\det\|L_{ij}\| \equiv \det\left\|k_ik_j - k^2\delta_{ij} + \frac{\omega^2}{c^2}\epsilon_{ij}\right\| = 0. \tag{19.10}
$$

For a plasma this will be a polynomial equation for the angular frequency as a function of the wave vector (with $\omega$ and $\mathbf{k}$ appearing not only explicitly in $L_{ij}$ but also implicitly in the functional form of $\epsilon_{jk}$). Each solution, $\omega(\mathbf{k})$, of this equation will be the dispersion relation for a particular wave mode. We therefore can regard Eq. (19.10) as the general dispersion relation for plasma waves, and for linear electromagnetic waves in any other kind of dielectric medium. To obtain an explicit form of the dispersion relation (19.10), we must give a prescription for calculating the dielectric tensor $\epsilon_{ij}$, or equivalently the conductivity tensor $\kappa_{e\,ij}$ or the susceptibility tensor $\chi_{ij}$. The simplest prescription involves treating the electrons and ions as independent fluids.
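Once a dielectric tensor is supplied, the general dispersion relation (19.10) is straightforward to evaluate numerically. The following is a minimal sketch in Python/NumPy (the function names are our own, hypothetical choices). As a test case it uses the isotropic cold-plasma tensor $\epsilon_{ij} = (1 - \omega_p^2/\omega^2)\delta_{ij}$ derived in Sec. 19.3.2 [Eq. (19.18)], in units with $c = \omega_p = 1$.

```python
import numpy as np

def wave_operator(k_vec, omega, eps, c=1.0):
    """L_ij = k_i k_j - k^2 delta_ij + (omega^2/c^2) eps_ij, Eq. (19.9)."""
    k_vec = np.asarray(k_vec, dtype=float)
    return (np.outer(k_vec, k_vec) - (k_vec @ k_vec) * np.eye(3)
            + (omega / c) ** 2 * np.asarray(eps, dtype=complex))

def dispersion_determinant(k_vec, omega, eps_func, c=1.0):
    """det||L_ij|| of Eq. (19.10). eps_func(omega, k_vec) returns the 3x3 dielectric
    tensor; passing k_vec allows for spatial dispersion (e.g. warm plasmas)."""
    return np.linalg.det(wave_operator(k_vec, omega, eps_func(omega, k_vec), c))

def cold_unmagnetized_eps(omega, k_vec, omega_p=1.0):
    """Test case: isotropic cold-plasma tensor (1 - omega_p^2/omega^2) I, Eq. (19.18)."""
    return (1.0 - omega_p**2 / omega**2) * np.eye(3)

# Units with c = omega_p = 1; wave vector along z.
k = np.array([0.0, 0.0, 0.7])
# The first two frequencies are roots, Eqs. (19.22) and (19.23); the third is not.
for omega in (np.sqrt(1.0 + 0.7**2), 1.0, 2.0):
    print(omega, abs(dispersion_determinant(k, omega, cold_unmagnetized_eps)))
```

In practice one would hand `dispersion_determinant` to a root finder in $\omega$ at fixed $\mathbf{k}$; the determinant form is convenient because it does not require knowing the polarization of the mode in advance.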

19.3 Wave Modes in an Unmagnetized Plasma

19.3.1 Two-Fluid Formalism

We now specialize to waves in a homogeneous, unmagnetized electron-proton plasma. The plasma necessarily contains rapidly moving electrons and ions, and their individual responses to an applied electromagnetic field depend on their velocities. In the simplest model of this response, we average over all the particles in a species (electrons or protons in this case) and treat them collectively as a fluid. Now, the fact that all the electrons are treated as one fluid does not mean that they have to collide with one another many times in each wave period. In fact, as we have already emphasized in Chap. 18, electron-electron collisions are usually quite rare and we usually ignore them. Nevertheless, we can still define a mean fluid velocity for both the electrons and the protons by averaging over their total velocity distribution functions just as we would for a gas:

$$
\mathbf{u}_s = \langle\mathbf{v}\rangle_s, \qquad s = p, e, \tag{19.11}
$$

where the subscripts p and e refer to protons and electrons. Similarly, for each fluid we define a pressure tensor using the fluid's dispersion of particle velocities,

$$
\mathbf{P}_s = n_sm_s\langle(\mathbf{v} - \mathbf{u}_s)\otimes(\mathbf{v} - \mathbf{u}_s)\rangle \tag{19.12}
$$

[cf. Eqs. (18.42) and (18.43)].

Consider, first, the unperturbed plasma in the absence of a wave, and work in a frame in which the proton fluid velocity $\mathbf{u}_p$ vanishes. By assumption the equilibrium is spatially uniform. Therefore there can be no spatial variation in the electric field and no net charge density. (If there were an electric field, then charges would quickly flow to neutralize it.)

Therefore, the electron density must equal the proton density. Furthermore, there can be no net current, as this would lead to a variation in the magnetic field; so, since the proton current vanishes, the electron current $-en_e\mathbf{u}_e$ must also vanish, and the electron fluid velocity $\mathbf{u}_e$ must also vanish in our chosen frame.

Now apply an electromagnetic perturbation. This will induce a small, oscillating fluid velocity $\mathbf{u}_s$ in both the proton and electron fluids. It should not worry us that the fluid velocity is small compared with the random speeds of the constituent particles; the same is true in any subsonic gas dynamical flow, but the fluid description remains good there and so also here. The oscillating density $n_s$ and oscillating mean velocity $\mathbf{u}_s$ of each species $s$ must satisfy the equation of continuity (particle conservation)

$$
\frac{\partial n_s}{\partial t} + \nabla\cdot(n_s\mathbf{u}_s) = 0 \tag{19.13}
$$

and the equation of motion (momentum conservation: the Euler equation with the Lorentz force added to the right side)

$$
n_sm_s\left[\frac{\partial\mathbf{u}_s}{\partial t} + (\mathbf{u}_s\cdot\nabla)\mathbf{u}_s\right] = -\nabla\cdot\mathbf{P}_s + n_sq_s(\mathbf{E} + \mathbf{u}_s\times\mathbf{B}). \tag{19.14}
$$

In these equations and below, $q_s = \pm e$ is the particles' charge (positive for protons and negative for electrons). Note that, as collisions are ineffectual, we cannot assume that the pressure tensor is isotropic.

19.3.2 Dielectric Tensor and Dispersion Relation for a Cold Plasma

Continuing to keep the plasma unmagnetized, let us further simplify matters (temporarily) by restricting ourselves to a cold plasma, so the tensorial pressures vanish, $\mathbf{P}_s = 0$. As we are only interested in linear wave modes, we rewrite Eqs. (19.13) and (19.14) retaining just the terms that are first order in perturbed quantities, i.e. dropping the $(\mathbf{u}_s\cdot\nabla)\mathbf{u}_s$ and $\mathbf{u}_s\times\mathbf{B}$ terms. Then, focusing on a plane-wave mode $\propto \exp[i(\mathbf{k}\cdot\mathbf{x} - \omega t)]$, we bring the equation of motion (19.14) into the form

$$
-i\omega n_sm_s\mathbf{u}_s = q_sn_s\mathbf{E} \tag{19.15}
$$

for each species, $s = p, e$. From this, we can immediately deduce the linearized current density

$$
\mathbf{j} = \sum_s n_sq_s\mathbf{u}_s = \sum_s\frac{i n_sq_s^2}{m_s\omega}\mathbf{E}, \tag{19.16}
$$

from which we infer that the conductivity tensor $\boldsymbol{\kappa}_e$ has Cartesian components

$$
\kappa_{e\,ij} = \sum_s\frac{i n_sq_s^2}{m_s\omega}\delta_{ij}, \tag{19.17}
$$

where $\delta_{ij}$ is the Kronecker delta. Note that the conductivity is purely imaginary, which means that the current oscillates 90° out of phase with the applied electric field, which in turn implies that there is no time-averaged ohmic energy dissipation, $\langle\mathbf{j}\cdot\mathbf{E}\rangle = 0$. Inserting the conductivity tensor (19.17) into the general equation (19.5) for the dielectric tensor, we obtain

$$
\epsilon_{ij} = \delta_{ij} + \frac{i}{\epsilon_0\omega}\kappa_{e\,ij} = \left(1 - \frac{\omega_p^2}{\omega^2}\right)\delta_{ij}. \tag{19.18}
$$

Here and throughout this chapter, the plasma frequency $\omega_p$ is very slightly different from that used in Chap. 18: it includes a tiny ($m_e/m_p \simeq 1/1836$) correction due to the motion of the protons, which was neglected in the analysis of plasma oscillations in Sec. 18.3.3:

$$
\omega_p^2 = \sum_s\frac{n_sq_s^2}{\epsilon_0m_s} = \frac{ne^2}{\epsilon_0m_e}\left(1 + \frac{m_e}{m_p}\right). \tag{19.19}
$$
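To get a feel for the size of the proton correction in Eq. (19.19), the sketch below (Python; our own helper, with CODATA 2018 constants and a hypothetical electron density) evaluates $\omega_p$ with and without the $m_e/m_p$ term. Since the correction multiplies $\omega_p^2$ by $1 + m_e/m_p$, it shifts $\omega_p$ itself by only about $m_e/2m_p$.

```python
import math

EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m (CODATA 2018)
E_CHARGE = 1.602176634e-19  # elementary charge, C
M_E = 9.1093837015e-31      # electron mass, kg
M_P = 1.67262192369e-27     # proton mass, kg

def plasma_frequency(n, include_protons=True):
    """Angular plasma frequency (rad/s) of an electron-proton plasma, Eq. (19.19).
    n is the electron (= proton) number density in m^-3. With include_protons=False
    this reduces to the electron plasma frequency omega_pe."""
    omega_pe_sq = n * E_CHARGE**2 / (EPS0 * M_E)
    factor = 1.0 + M_E / M_P if include_protons else 1.0
    return math.sqrt(omega_pe_sq * factor)

n = 1e12  # hypothetical electron density, m^-3
w_p = plasma_frequency(n)
w_pe = plasma_frequency(n, include_protons=False)
print(f"f_pe = {w_pe / (2 * math.pi) / 1e6:.3f} MHz")
print(f"fractional shift of omega_p from proton motion: {w_p / w_pe - 1:.2e}")  # ~ m_e/(2 m_p)
```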

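Note that because there is no physical source of a preferred direction in the plasma, the dielectric tensor (19.18) is isotropic. Now let the waves propagate in the z direction, without loss of generality, so $\mathbf{k} = k\mathbf{e}_z$. Then the algebratized wave operator (19.9), with $\boldsymbol{\epsilon}$ given by (19.18), takes the following form:

$$
\|L_{ij}\| = \frac{\omega^2}{c^2}
\begin{pmatrix}
1 - \dfrac{c^2k^2}{\omega^2} - \dfrac{\omega_p^2}{\omega^2} & 0 & 0\\
0 & 1 - \dfrac{c^2k^2}{\omega^2} - \dfrac{\omega_p^2}{\omega^2} & 0\\
0 & 0 & 1 - \dfrac{\omega_p^2}{\omega^2}
\end{pmatrix}. \tag{19.20}
$$

The corresponding dispersion relation $\det\|L_{jk}\| = 0$ becomes

$$
\left(1 - \frac{c^2k^2}{\omega^2} - \frac{\omega_p^2}{\omega^2}\right)^2\left(1 - \frac{\omega_p^2}{\omega^2}\right) = 0. \tag{19.21}
$$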

This is a polynomial equation of order 6 for $\omega$ as a function of $k$, so formally there are six solutions corresponding to three pairs of modes propagating in opposite directions. Two of the pairs of modes are degenerate with frequency

$$
\omega = \sqrt{\omega_p^2 + c^2k^2}. \tag{19.22}
$$

We shall study them in the next subsection. The remaining pair of modes exist at a single frequency,

$$
\omega = \omega_p. \tag{19.23}
$$

These must be the electrostatic plasma oscillations that we studied in Sec. 18.3.3 (though now with an arbitrary wave number $k$, whereas in Sec. 18.3.3 the wave number was assumed zero). In Sec. 19.3.4 we shall show that this is so and shall explore how these plasma oscillations get modified by finite-temperature effects.

19.3.3 Electromagnetic Plasma Waves

To learn the physical nature of the modes with dispersion relation $\omega = \sqrt{\omega_p^2 + c^2k^2}$ [Eq. (19.22)], we must examine the details of their electric-field oscillations, magnetic-field oscillations, and electron and proton motions. The first key to this is the algebratized wave equation $L_{ij}E_j = 0$, with $L_{ij}$ specialized to the dispersion relation (19.22): $\|L_{ij}\| = \mathrm{diag}[0, 0, (\omega^2 - \omega_p^2)/c^2]$. In this case the general solution to $L_{ij}E_j = 0$ is an electric field that lies in the x–y plane (transverse plane), i.e. that is orthogonal to the waves' propagation vector $\mathbf{k} = k\mathbf{e}_z$. The third of the Maxwell equations (19.2) implies that the magnetic field is

$$
\mathbf{B} = \frac{\mathbf{k}}{\omega}\times\mathbf{E}, \tag{19.24}
$$

which also lies in the transverse plane and is orthogonal to $\mathbf{E}$. Evidently, these modes are close analogs of electromagnetic waves in vacuum; correspondingly, they are known as the plasma's electromagnetic modes. The electron and proton motions in these modes, as given by Eq. (19.15), are oscillatory displacements in the direction of $\mathbf{E}$ but 90° out of phase with $\mathbf{E}$. The fluid velocity amplitudes vary as $1/\omega$; as one decreases $\omega$, the fluid amplitudes grow.

The dispersion relation for these modes, Eq. (19.22), implies that they can only propagate (i.e. have real angular frequency when the wave vector is real) if $\omega$ exceeds the plasma frequency. As $\omega$ is decreased toward $\omega_p$, $k \to 0$, so these modes become electrostatic plasma oscillations with arbitrarily long wavelength orthogonal to the oscillation direction, i.e., they become a spatially homogeneous variant of the plasma oscillations studied in Sec. 18.3.3. At $\omega < \omega_p$ these modes become evanescent. In their regime of propagation, $\omega > \omega_p$, these cold-plasma electromagnetic waves have a phase velocity given by

$$
\mathbf{V}_{\rm ph} = \frac{\omega}{k}\hat{\mathbf{k}} = c\left(1 - \frac{\omega_p^2}{\omega^2}\right)^{-1/2}\hat{\mathbf{k}}, \tag{19.25}
$$

where $\hat{\mathbf{k}} \equiv \mathbf{k}/k$ is a unit vector in the propagation direction. Although this phase velocity exceeds the speed of light, causality is not violated because information (and energy) propagate at the group velocity, not the phase velocity. The group velocity is readily shown to be

$$
\mathbf{V}_g = \frac{\partial\omega}{\partial\mathbf{k}} = \frac{c^2\mathbf{k}}{\omega} = c\left(1 - \frac{\omega_p^2}{\omega^2}\right)^{1/2}\hat{\mathbf{k}}, \tag{19.26}
$$

which is less than $c$.

These cold-plasma electromagnetic modes transport energy and momentum just like wave modes in a fluid. There are three contributions to the waves' mean (time-averaged) energy density: the electric, the magnetic and the kinetic energy densities. (If we had retained the pressure, there would have been an additional contribution from the internal energy.) In order to compute these mean energy densities, we must form the time average of products of physical quantities. Now, we have used the complex representation to denote each of our oscillating quantities (e.g. $E_x$), so we must be careful to remember that $Ae^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)}$ is an abbreviation for the real part of this quantity, which is the physical $A$. It is easy to show

that the time-averaged value of the physical $A$ times the physical $B$ (which we shall denote by $\langle AB\rangle$) is given in terms of their complex amplitudes by

$$
\langle AB\rangle = \frac{AB^* + A^*B}{4}. \tag{19.27}
$$
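Eq. (19.27), which Exercise 19.1 asks you to derive, can also be checked numerically by brute-force averaging the product of two real oscillations over one period. The sketch below (Python/NumPy; our own construction, with arbitrary complex amplitudes) does exactly that. It is a numerical check, not a substitute for the derivation.

```python
import numpy as np

def brute_force_average(A, B, omega=1.0, n_samples=64):
    """Average Re(A e^{-i omega t}) * Re(B e^{-i omega t}) over one period.
    Uniform samples with endpoint=False average the exp(-+2 i omega t) terms to zero
    (up to rounding) whenever n_samples > 2."""
    t = np.linspace(0.0, 2.0 * np.pi / omega, n_samples, endpoint=False)
    phase = np.exp(-1j * omega * t)
    return np.mean((A * phase).real * (B * phase).real)

def complex_amplitude_average(A, B):
    """Eq. (19.27): <AB> = (A B* + A* B)/4, a real number."""
    return ((A * np.conj(B) + np.conj(A) * B) / 4).real

A, B = 1.3 + 0.4j, -0.2 + 2.1j
print(brute_force_average(A, B), complex_amplitude_average(A, B))
```

Both numbers equal $\tfrac{1}{2}\mathrm{Re}(AB^*)$; the rapidly oscillating $e^{\mp 2i\omega t}$ pieces of the product are what the time average removes.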

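Using Eqs. (19.24) and (19.25), we can write the magnetic energy density in the form $\langle B^2\rangle/2\mu_0 = (1 - \omega_p^2/\omega^2)\epsilon_0\langle E^2\rangle/2$. Using Eq. (19.16), the electron kinetic energy is $n_em_e\langle u_e^2\rangle/2 = (\omega_{pe}^2/\omega^2)\epsilon_0\langle E^2\rangle/2$, and likewise for the proton kinetic energy. Summing these contributions and using Eq. (19.27), we obtain

$$
U = \frac{\epsilon_0\mathbf{E}\cdot\mathbf{E}^*}{4} + \frac{\mathbf{B}\cdot\mathbf{B}^*}{4\mu_0} + \sum_s\frac{n_sm_s\mathbf{u}_s\cdot\mathbf{u}_s^*}{4} = \frac{\epsilon_0\mathbf{E}\cdot\mathbf{E}^*}{2}. \tag{19.28}
$$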

The mean energy flux in the wave is carried (to quadratic order) by the electromagnetic field and is given by the Poynting flux. (The kinetic-energy flux vanishes to this order.) A straightforward calculation gives

$$
\mathbf{S} = \left\langle\frac{\mathbf{E}\times\mathbf{B}}{\mu_0}\right\rangle = \frac{\mathbf{E}\times\mathbf{B}^* + \mathbf{E}^*\times\mathbf{B}}{4\mu_0} = \frac{\mathbf{E}\cdot\mathbf{E}^*}{2\mu_0\omega}\mathbf{k} = U\mathbf{V}_g, \tag{19.29}
$$

where we have used $\mu_0 = c^{-2}\epsilon_0^{-1}$. We therefore find that the energy flux is the product of the energy density and the group velocity, as is true quite generally; cf. Sec. 6.3. (If it were not true, then a localized wave packet, which propagates at the group velocity, would move along a different trajectory from its energy, and we would wind up with energy in regions with vanishing amplitude!)
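The energy bookkeeping of Eqs. (19.28) and (19.29) can be checked by building the complex amplitudes explicitly. The sketch below (Python/NumPy; our own construction) takes a transverse field $\mathbf{E} = E_0\mathbf{e}_x$, obtains $\mathbf{B}$ from Eq. (19.24) and the fluid velocities from Eq. (19.15), evaluates $U$ and $\mathbf{S}$ with the averaging rule (19.27), and compares $S/U$ with $V_g$. The density and wave number are hypothetical illustration values; the constants are CODATA 2018, and $\mu_0$ is defined as $1/(\epsilon_0c^2)$ to match the text.

```python
import numpy as np

C = 2.99792458e8              # speed of light, m/s
EPS0 = 8.8541878128e-12       # F/m
MU0 = 1.0 / (EPS0 * C**2)     # consistent with mu0 = 1/(eps0 c^2)
E_CHARGE = 1.602176634e-19    # C
M_E = 9.1093837015e-31        # kg
M_P = 1.67262192369e-27       # kg

def em_wave_energetics(n, k, E0=1.0 + 0.0j):
    """Cold-plasma EM wave along z with E = E0 e_x; n is the electron (= proton) density."""
    omega_p2 = n * E_CHARGE**2 / EPS0 * (1.0 / M_E + 1.0 / M_P)   # Eq. (19.19)
    omega = np.sqrt(omega_p2 + (C * k) ** 2)                      # Eq. (19.22)
    E = np.array([E0, 0.0, 0.0], dtype=complex)
    B = np.cross([0.0, 0.0, k], E) / omega                        # Eq. (19.24)
    # Fluid velocities from Eq. (19.15): u_s = i q_s E / (m_s omega).
    velocities = [(m, 1j * q * E / (m * omega)) for q, m in ((-E_CHARGE, M_E), (E_CHARGE, M_P))]
    kinetic = sum(n * m * np.vdot(u, u).real / 4 for m, u in velocities)
    U = EPS0 * np.vdot(E, E).real / 4 + np.vdot(B, B).real / (4 * MU0) + kinetic  # Eq. (19.28)
    S = (np.cross(E, B.conj()) + np.cross(E.conj(), B)).real / (4 * MU0)        # Eq. (19.29)
    return {"omega": omega, "U": U, "S_z": S[2],
            "V_ph": omega / k, "V_g": C**2 * k / omega}                        # Eqs. (19.25), (19.26)

r = em_wave_energetics(n=1e12, k=2.0)   # hypothetical: n in m^-3, k in rad/m
print(r["S_z"] / r["U"], r["V_g"])      # energy flux / energy density vs group velocity
print(r["V_ph"] * r["V_g"] / C**2)      # V_ph V_g = c^2 for this mode
```

Because each piece of $U$ is computed from the fields rather than from the closed form $\epsilon_0\mathbf{E}\cdot\mathbf{E}^*/2$, agreement of $S/U$ with $V_g$ is a genuine check of the algebra leading to Eq. (19.29).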

19.3.4 Langmuir Waves and Ion Acoustic Waves in Warm Plasmas

For our case of a cold, unmagnetized plasma, the third pair of modes embodied in the dispersion relation (19.21) exists only at a single frequency, the plasma frequency $\omega = \omega_p$. These modes' wave equation $L_{ij}E_j = 0$ with $\|L_{ij}\| = \mathrm{diag}(-k^2, -k^2, 0)$ [Eq. (19.20) with $\omega^2 = \omega_p^2$] implies that $\mathbf{E}$ points in the z direction, i.e., along $\mathbf{k}$; the Maxwell equations then imply $\mathbf{B} = 0$, and the equations of motion (19.15) imply that the fluid displacements are in the direction of $\mathbf{E}$. Clearly, these modes, like electromagnetic modes in the limit $k = 0$ and $\omega = \omega_p$, are electrostatic plasma oscillations. However, in this case, where the spatial variations of $\mathbf{E}$ and $\mathbf{u}_s$ are along the direction of oscillation instead of perpendicular to it, $k$ is not constrained to vanish; rather, all wave numbers are allowed. This means that the plasma can undergo plane-parallel oscillations at $\omega = \omega_p$ with displacements in some Cartesian z direction, and any arbitrary z-dependent amplitude that one might wish. But these oscillations cannot transport energy: because $\omega$ is independent of $k$, their group velocity $\mathbf{V}_g = \nabla_{\mathbf{k}}\omega$ vanishes.

So far we have confined ourselves to wave modes in cold plasmas and have ignored thermal motions of the particles. When thermal motions are turned on, the resulting thermal pressure gradients convert longitudinal plasma oscillations, at finite wave number $k$, into propagating, energy-transporting modes called Langmuir waves. As we have already intimated, because the plasma is collisionless, to understand the thermal effects fully we must turn to kinetic theory (Chap. 21). However, within the present chapter's two-fluid formalism, we can deduce the leading-order effects of finite temperature with the guidance of physical arguments.

The key to the Langmuir waves' propagation is the warm electrons' thermal pressure. (The proton pressure is unimportant because the protons oscillate electrostatically with an amplitude that is tiny compared to that of the electrons.) Now, in an adiabatic sound wave in a fluid (where the particle mean free paths are small compared to the wavelength), we relate the pressure perturbation to the density perturbation by assuming that the entropy is constant. In other words, we write $\nabla P = c_s^2m\nabla n$, where $c_s = (\gamma P/nm)^{1/2}$ is the adiabatic sound speed, $n$ is the particle density, $m$ the particle mass, and $\gamma$ is the specific heat ratio, which is 5/3 for a monatomic gas. However, the electron gas is collisionless, and we are only interested in the tensorial pressure gradient parallel to $\mathbf{k}$ (which we take to point in the z direction), $\delta P_{e\,zz,z}$. We can therefore ignore all electron motion perpendicular to the wave vector, as this is not coupled to the parallel motion. (This would not be a valid assumption if a strong magnetic field were present.) The electron motion is then effectively one-dimensional, as there is only one (translational) degree of freedom. The relevant specific heat at constant volume is therefore just $k_B/2$ per electron, while that at constant pressure is $3k_B/2$, giving $\gamma = 3$. The effective sound speed for the electron gas is then $c_s = (3k_BT_e/m_e)^{1/2}$, and correspondingly the perturbations of longitudinal electron pressure and electron density are related by

$$
\frac{\delta P_{e\,zz}}{m_e\,\delta n_e} = c_s^2 = \frac{3k_BT_e}{m_e}. \tag{19.30}
$$

This is one of the equations governing Langmuir waves. The others are the linearized equation of continuity (19.13), which relates the electrons' density perturbation to the longitudinal component of their fluid velocity perturbation,

$$
\delta n_e = n_e\frac{k}{\omega}u_{e\,z}, \tag{19.31}
$$

the linearized equation of motion (19.14), which relates $u_{e\,z}$ and $\delta P_{e\,zz}$ to the longitudinal component of the oscillating electric field,

$$
-i\omega n_em_eu_{e\,z} = -ik\,\delta P_{e\,zz} - n_eeE_z, \tag{19.32}
$$

and the linearized form of Poisson's equation $\nabla\cdot\mathbf{E} = \rho_e/\epsilon_0$, which relates $E_z$ to $\delta n_e$,

$$
ikE_z = -\frac{e\,\delta n_e}{\epsilon_0}. \tag{19.33}
$$

Equations (19.30)–(19.33) are four equations for three ratios of the perturbed quantities. By combining these equations, we obtain a condition that must be satisfied in order for them to have a solution:

$$
\omega^2 = \omega_{pe}^2 + \frac{3k_BT_e}{m_e}k^2 = \omega_{pe}^2\left(1 + 3k^2\lambda_D^2\right); \tag{19.34}
$$

here $\lambda_D = \sqrt{\epsilon_0k_BT_e/n_ee^2}$ is the Debye length [Eq. (18.10)]. Equation (19.34) is the Bohm-Gross dispersion relation for Langmuir waves. From this dispersion relation we deduce the phase speed of a Langmuir wave:

$$
V_{\rm ph} = \left(\frac{k_BT_e}{m_e}\right)^{1/2}\left(3 + \frac{1}{k^2\lambda_D^2}\right)^{1/2}. \tag{19.35}
$$

[Figure 19.1: $\omega$ versus $k$. Curves labeled EM ($V_{\rm ph} \to c$), Langmuir ($V_{\rm ph} \to \sqrt{3k_BT_e/m_e}$), and Ion Acoustic ($V_{\rm ph} = \sqrt{k_BT_e/m_p}$); marked frequencies $\omega_p$ and $\omega_{pp}$ (resonance); marked wave numbers $1/\lambda_D$ and $\sqrt{T_e/T_p}/\lambda_D$.]

Fig. 19.1: Dispersion relations for electromagnetic waves, Langmuir waves, and ion-acoustic waves in an unmagnetized plasma. In the dotted regions the waves are strongly damped, according to kinetic-theory analyses in Chap. 21.
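In the dimensionless variables $k\lambda_D$ and $\omega/\omega_{pe}$, Eqs. (19.34) and (19.35) contain no free parameters, because $\omega_{pe}\lambda_D = (k_BT_e/m_e)^{1/2}$. The sketch below (Python/NumPy; the function names are our own) tabulates both, and makes the damping caveat concrete: at $k\lambda_D = 1$ the phase speed is only twice the electron thermal speed $(k_BT_e/m_e)^{1/2}$.

```python
import numpy as np

def bohm_gross_omega(k_lambda_D):
    """Eq. (19.34) in units of omega_pe: omega/omega_pe = sqrt(1 + 3 (k lambda_D)^2)."""
    x = np.asarray(k_lambda_D, dtype=float)
    return np.sqrt(1.0 + 3.0 * x**2)

def langmuir_phase_speed(k_lambda_D):
    """Eq. (19.35) in units of the electron thermal speed (k_B T_e / m_e)^{1/2}."""
    x = np.asarray(k_lambda_D, dtype=float)
    return np.sqrt(3.0 + 1.0 / x**2)

for x in (0.03, 0.1, 0.3, 1.0):
    print(f"k*lambda_D = {x:4.2f}: omega/omega_pe = {bohm_gross_omega(x):.4f}, "
          f"V_ph/v_e = {langmuir_phase_speed(x):.2f}")
```

For $k\lambda_D \ll 1$ the frequency stays close to $\omega_{pe}$ while the phase speed is many times the thermal speed, which is the regime where the fluid treatment is trustworthy.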

Evidently, when the wavelength is less than or of order the Debye length ($k\lambda_D \gtrsim 1$), the phase speed becomes comparable with the electron thermal speed. It is then possible for individual electrons to transfer energy between adjacent compressions and rarefactions in the wave. As we shall see in the next chapter, when we recover Eq. (19.34) from a kinetic treatment, the resulting energy transfers damp the wave. Therefore, the Bohm-Gross dispersion relation is only valid for wavelengths much longer than the Debye length, i.e. $k\lambda_D \ll 1$; cf. Fig. 19.1.

In our analysis of Langmuir waves, we have ignored the proton motion. This is justified as long as the proton thermal speeds are small compared to the electron thermal speeds, i.e. $T_p \ll m_pT_e/m_e$, which will almost always be the case. Proton motion is, however, not ignorable in a second type of plasma wave that owes its existence to finite temperature: ion acoustic waves. These are waves that propagate with frequencies far below the electron plasma frequency, frequencies so low that the electrons remain locked electrostatically to the protons, keeping the plasma charge neutral and preventing electromagnetic fields from participating in the oscillations. As for Langmuir waves, we can derive the ion-acoustic dispersion relation using fluid theory combined with physical arguments. In the next chapter, using kinetic theory, we shall see that ion acoustic waves can propagate only when the proton temperature is very small compared with the electron temperature, $T_p \ll T_e$; otherwise they are damped. (Such a temperature disparity is produced, e.g., when a plasma passes through a shock wave, and it can be maintained for a long time because Coulomb collisions are so ineffective at restoring $T_p \sim T_e$; cf. Sec. 18.4.3.) Because $T_p \ll T_e$,

the proton pressure can be ignored and the waves' restoring force is provided by electron pressure. Now, in an ion acoustic wave, by contrast with a Langmuir wave, the individual thermal electrons can travel over many wavelengths during a single wave period, so the electrons remain isothermal as their mean motion oscillates in lock-step with the protons' mean motion. Correspondingly, the electrons' effective (one-dimensional) specific heat ratio is $\gamma_{\rm eff} = 1$. Although the electrons provide the ion-acoustic waves' restoring force, the inertia of the electrostatically locked electrons and protons is almost entirely that of the heavy protons. Correspondingly, the waves' phase velocity is

$$
\mathbf{V}_{ia} = \left(\frac{\gamma_{\rm eff}P_e}{n_pm_p}\right)^{1/2}\hat{\mathbf{k}} = \left(\frac{k_BT_e}{m_p}\right)^{1/2}\hat{\mathbf{k}} \tag{19.36}
$$

[cf. Ex. 19.4], and the dispersion relation is $\omega = V_{ia}k = (k_BT_e/m_p)^{1/2}k$. From this phase velocity and our physical description of these ion-acoustic waves, it should be evident that they are the magnetosonic waves of MHD theory (Sec. 17.7.2), in the limit that the plasma's magnetic field is turned off. In Ex. 19.4, we show that the character of these waves gets modified when their wavelength becomes of order the Debye length, i.e. when $k\lambda_D \sim 1$. The dispersion relation then becomes

$$
\omega = \left(\frac{k_BT_e/m_p}{1 + \lambda_D^2k^2}\right)^{1/2}k, \tag{19.37}
$$

which means that for $k\lambda_D \gg 1$, the waves' frequency approaches the proton plasma frequency $\omega_{pp} \equiv \sqrt{ne^2/\epsilon_0m_p} \simeq \sqrt{m_e/m_p}\,\omega_p$. A kinetic-theory treatment reveals that these waves are strongly damped when $k\lambda_D \gtrsim \sqrt{T_e/T_p}$. These features of the ion-acoustic dispersion relation are illustrated in Fig. 19.1.

[Figure 19.2: vertical axis is height; shading indicates electron density (low to high); rays of an EM wave approaching a cutoff and rays of an ion acoustic wave approaching a resonance.]

Fig. 19.2: Cutoff and resonance illustrated by wave propagation in the Earth's ionosphere. The thick, arrowed curves are rays and the thin, dashed curves are phase fronts. The electron density is proportional to the darkness of the shading.
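Eq. (19.37) interpolates between a sound-like branch at small $k$ and the resonance at $\omega_{pp}$ at large $k$. The sketch below (Python/NumPy; function names and parameters are our own, hypothetical choices: $n = 10^{12}\ {\rm m^{-3}}$ and $k_BT_e = 1$ eV) evaluates it and makes both limits explicit. The identity $(k_BT_e/m_p)^{1/2}/\lambda_D = \omega_{pp}$ is why the large-$k$ limit is the proton plasma frequency.

```python
import numpy as np

EPS0 = 8.8541878128e-12       # F/m
E_CHARGE = 1.602176634e-19    # C
M_P = 1.67262192369e-27       # kg

def ion_acoustic_speed(kTe):
    """(k_B T_e / m_p)^{1/2}, Eq. (19.36); kTe = k_B T_e in joules."""
    return np.sqrt(kTe / M_P)

def debye_length(n, kTe):
    """lambda_D = (eps0 k_B T_e / n e^2)^{1/2}."""
    return np.sqrt(EPS0 * kTe / (n * E_CHARGE**2))

def proton_plasma_frequency(n):
    """omega_pp = (n e^2 / eps0 m_p)^{1/2}."""
    return np.sqrt(n * E_CHARGE**2 / (EPS0 * M_P))

def ion_acoustic_omega(k, n, kTe):
    """Eq. (19.37): omega = (k_B T_e/m_p)^{1/2} k / (1 + lambda_D^2 k^2)^{1/2}."""
    return ion_acoustic_speed(kTe) * k / np.sqrt(1.0 + (debye_length(n, kTe) * k) ** 2)

# Hypothetical parameters: n = 1e12 m^-3, k_B T_e = 1 eV.
n, kTe = 1e12, 1.0 * E_CHARGE
lam = debye_length(n, kTe)
for k_lam in (0.01, 0.1, 1.0, 10.0, 100.0):
    w = ion_acoustic_omega(k_lam / lam, n, kTe)
    print(f"k*lambda_D = {k_lam:6.2f}: omega/omega_pp = {w / proton_plasma_frequency(n):.4f}")
```

This fluid result says nothing about damping; per the kinetic-theory statement above, the large-$k\lambda_D$ part of the curve is only physical when $T_e/T_p$ is large enough.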

19.3.5 Cutoffs and Resonances

Electromagnetic waves, Langmuir waves and ion-acoustic waves in an unmagnetized plasma provide examples of cutoffs and resonances. A cutoff is a frequency at which a wave mode ceases to propagate, because its wave number $k$ there becomes zero. Langmuir and electromagnetic waves at $\omega \to \omega_p$ are examples; see Fig. 19.1. Consider, for concreteness, a monochromatic radio-frequency electromagnetic wave propagating upward into the Earth's ionosphere at some nonzero angle to the vertical (left side of Fig. 19.2), and neglect the effects of the Earth's magnetic field. As the wave moves deeper (higher) into the ionosphere, it encounters a rising electron density $n$ and correspondingly a rising plasma frequency $\omega_p$. The wave's wavelength will typically be small compared to the inhomogeneity scale for $\omega_p$, so the wave propagation can be analyzed using geometric optics (Sec. 6.3). Across a phase front, the portion that is higher in the ionosphere will have a smaller $k$ and thus a larger wavelength and phase speed, and thus a greater distance between phase fronts (dashed lines). Therefore, the rays, which are orthogonal to the phase fronts, will bend away from the vertical (left side of Fig. 19.2); i.e., the wave will be reflected away from the cutoff at which $\omega_p \to \omega$ and $k \to 0$. Clearly, this behavior is quite general: wave modes are generally reflected from regions in which slowly changing plasma conditions give rise to cutoffs.

A resonance is a frequency at which a wave mode ceases to propagate because its wave number $k$ there becomes infinite, i.e. its wavelength goes to zero. Ion-acoustic waves provide an example; see Fig. 19.1. Consider, for concreteness, an ion-acoustic wave deep within the ionosphere, where $\omega_{pp}$ is larger than the wave's frequency $\omega$.
As the wave propagates toward the upper edge of the ionosphere, at some nonzero angle to the vertical, the portion of a phase front that is higher sees a smaller electron density and thus a smaller $\omega_{pp}$, and thence has a larger $k$ and shorter wavelength, and thus a shorter distance between phase fronts (dashed lines). This causes the rays to bend toward the vertical (right side of Fig. 19.2). The wave is "attracted" into the region of the resonance, $\omega \to \omega_{pp}$, $k \to \infty$, where it gets "Landau damped" (Chap. 21) and dies. This behavior is quite general: wave modes are generally attracted toward regions in which slowly changing plasma conditions give rise to resonances, and upon reaching a resonance, they die. We shall study wave propagation in the ionosphere in greater detail in Sec. 19.5 below.

****************************

EXERCISES

Exercise 19.1 Derivation: Time-averaged energy density in a wave mode
Verify Eq. (19.27).

Exercise 19.2 Example: Effect of Collisional Damping
Consider a transverse electromagnetic wave mode propagating in an unmagnetized, partially ionized gas in which the electron-neutral collision frequency is $\nu_e$. Include the effects of collisions in the electron equation of motion, Eq. (19.15), by introducing a term $-n_em_e\nu_e\mathbf{u}_e$ on the right-hand side. Ignore ion motion and electron-ion and electron-electron collisions. Derive the dispersion relation when $\omega \gg \nu_e$, and show by explicit calculation that the rate of loss of energy per unit volume ($-\nabla\cdot\mathbf{S}$, where $\mathbf{S}$ is the Poynting flux) is balanced by the Ohmic heating of the plasma. (Hint: It may be easiest to regard $\omega$ as real and $k$ as complex.)

Exercise 19.3 Problem: Fluid Drifts in a Time-Independent Plasma
Consider a hydrogen plasma described by the two-fluid formalism. Suppose that Coulomb collisions have had time to isotropize the electrons and protons and to equalize their temperatures so that their partial pressures $P_e = n_ek_BT$ and $P_p = n_pk_BT$ are equal. An electric field $\mathbf{E}$ created by external charges is applied.

(a) Using the law of force balance for fluid $s$, show that its drift velocity perpendicular to the magnetic field is

$$
\mathbf{v}_{s\perp} = \frac{\mathbf{E}\times\mathbf{B}}{B^2} - \frac{\nabla P_s\times\mathbf{B}}{q_sn_sB^2} - \frac{m_s}{q_sB^2}\left[(\mathbf{v}_s\cdot\nabla)\mathbf{v}_s\right]_\perp\times\mathbf{B}. \tag{19.38}
$$

The first term is the $\mathbf{E}\times\mathbf{B}$ drift discussed in Chap. 18.

(b) The second term, called the "diamagnetic drift," is different for the electrons and the protons. Show that this drift produces a current density perpendicular to $\mathbf{B}$ given by

$$
\mathbf{j}_\perp = \frac{\mathbf{B}\times\nabla P}{B^2}, \tag{19.39}
$$

where $P$ is the total pressure.

(c) The third term can be called the "drift-induced drift". Show that if the electrons are nearly locked to the ion motion, then the associated current density is well approximated by

$$
\mathbf{j}_\perp = -\frac{\rho}{B^2}\left[(\mathbf{v}\cdot\nabla)\mathbf{v}\right]\times\mathbf{B}, \tag{19.40}
$$

where $\rho$ is the density and $\mathbf{v}$ is the average fluid velocity. These results are equivalent to those obtained using a pure MHD treatment.

Exercise 19.4 Derivation: Ion Acoustic Waves
Ion acoustic waves can propagate in an unmagnetized plasma when the electron temperature $T_e$ greatly exceeds the ion temperature. In this limit the electron density $n_e$ can be approximated by $n_e = n_0\exp(e\Phi/k_BT_e)$, where $n_0$ is the mean electron density and $\Phi$ is the electrostatic potential.

(a) Show that the nonlinear equations of continuity and motion for the ion (proton) fluid and Poisson's equation for the potential take the form

$$
\frac{\partial n}{\partial t} + \frac{\partial(nu)}{\partial z} = 0, \qquad
\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial z} = -\frac{e}{m_p}\frac{\partial\Phi}{\partial z}, \qquad
\frac{\partial^2\Phi}{\partial z^2} = -\frac{e}{\epsilon_0}\left(n - n_0e^{e\Phi/k_BT_e}\right), \tag{19.41}
$$

where $n$ is the proton density, $u$ is the proton fluid velocity, and the waves propagate in the z direction.

(b) Linearize these equations and show that the dispersion relation for small-amplitude ion acoustic modes is

$$
\omega = \omega_{pp}\left(1 + \frac{1}{\lambda_D^2k^2}\right)^{-1/2} = \left(\frac{k_BT_e/m_p}{1 + \lambda_D^2k^2}\right)^{1/2}k, \tag{19.42}
$$

where $\lambda_D$ is the Debye length. Verify that in the long-wavelength limit, this agrees with Eq. (19.36).

Exercise 19.5 Challenge: Ion Acoustic Solitons
In this exercise we shall explore nonlinear effects in ion acoustic waves (Ex. 19.4), and shall show that they give rise to solitons that obey the same Korteweg-de Vries equation as governs solitonic water waves (Sec. 15.3).

(a) Introduce an expansion parameter $\varepsilon \ll 1$ and expand the ion density, ion velocity and potential in the form

$$
n = n_0\left(1 + \varepsilon n_1 + \varepsilon^2n_2 + \dots\right), \qquad
u = \left(\frac{k_BT_e}{m_p}\right)^{1/2}\left(\varepsilon u_1 + \varepsilon^2u_2 + \dots\right), \qquad
\Phi = \frac{k_BT_e}{e}\left(\varepsilon\Phi_1 + \varepsilon^2\Phi_2 + \dots\right). \tag{19.43}
$$

Change independent variables from $(t, z)$ to $(\tau, \eta)$, where

$$
\eta = \sqrt{2}\,\varepsilon^{1/2}\lambda_D^{-1}\left[z - \left(\frac{k_BT_e}{m_p}\right)^{1/2}t\right], \qquad
\tau = \sqrt{2}\,\varepsilon^{3/2}\omega_{pp}t. \tag{19.44}
$$

By substituting Eqs. (19.43) and (19.44) into the nonlinear equations (19.41) and equating terms of the same order, show that $n_1$, $u_1$, $\Phi_1$ each satisfy the Korteweg-de Vries equation (15.32):

$$\frac{\partial\zeta}{\partial\tau} + \zeta\frac{\partial\zeta}{\partial\eta} + \frac{\partial^3\zeta}{\partial\eta^3} = 0. \qquad (19.45)$$

(b) In Sec. 15.3 we discussed the exact, single-soliton solution (15.33) to this KdV equation. Show that for an ion-acoustic soliton, this solution propagates with the physical speed $(1 + \varepsilon)(k_BT_e/m_p)^{1/2}$, which increases with the wave's amplitude $\varepsilon$.
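As a numerical sanity check on Exercises 19.4 and 19.5, the Python sketch below evaluates the linear dispersion relation (19.42) in its two limits and verifies by finite differences that the profile $\zeta = 3c\,\mathrm{sech}^2[\sqrt{c}(\eta - c\tau)/2]$ satisfies the KdV equation (19.45). That profile is the standard single-soliton solution of this normalization of KdV; it is assumed here rather than copied from Eq. (15.33), and the function names are illustrative.

```python
import math


def ion_acoustic_omega(k, lambda_d, c_s):
    """Linear ion acoustic dispersion relation, Eq. (19.42), with c_s = (k_B T_e / m_p)^(1/2)."""
    return c_s * k / math.sqrt(1.0 + (lambda_d * k) ** 2)


def kdv_soliton(eta, tau, c):
    """Assumed single-soliton solution of zeta_tau + zeta zeta_eta + zeta_etaetaeta = 0."""
    x = math.sqrt(c) * (eta - c * tau) / 2.0
    return 3.0 * c / math.cosh(x) ** 2


def kdv_residual(eta, tau, c, h=1e-2):
    """Left-hand side of Eq. (19.45) evaluated with central finite differences."""
    z = lambda e, t: kdv_soliton(e, t, c)
    z_tau = (z(eta, tau + h) - z(eta, tau - h)) / (2.0 * h)
    z_eta = (z(eta + h, tau) - z(eta - h, tau)) / (2.0 * h)
    # five-point stencil for the third derivative
    z_eta3 = (z(eta + 2 * h, tau) - 2.0 * z(eta + h, tau)
              + 2.0 * z(eta - h, tau) - z(eta - 2 * h, tau)) / (2.0 * h ** 3)
    return z_tau + z(eta, tau) * z_eta + z_eta3


# Long wavelengths: omega/k -> c_s.  Short wavelengths: omega -> c_s/lambda_D = omega_pp.
print(ion_acoustic_omega(1e-3, 1.0, 2.0) / 1e-3, ion_acoustic_omega(1e4, 1.0, 2.0))
print(kdv_residual(0.7, 0.2, 1.0))  # close to zero
```

With $c = 1$ the soliton moves at $d\eta/d\tau = 1$; mapping back through Eq. (19.44), and using $\lambda_D\omega_{pp} = (k_BT_e/m_p)^{1/2}$, gives $dz/dt = (1 + \varepsilon)(k_BT_e/m_p)^{1/2}$, the speed quoted in part (b).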

****************************

19.4 Wave Modes in a Cold, Magnetized Plasma

19.4.1 Dielectric Tensor and Dispersion Relation

We now complicate matters somewhat by introducing a uniform magnetic field $\mathbf{B}_0$ into the unperturbed plasma. To avoid further complications, we make the plasma cold, i.e. we omit thermal effects. The linearized equation of motion for each species $s$ then becomes

$$-i\omega\mathbf{u}_s = \frac{q_s}{m_s}\mathbf{E} + \frac{q_s}{m_s}\mathbf{u}_s\times\mathbf{B}_0. \qquad (19.46)$$

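It is convenient to multiply this equation of motion by $n_sq_s/\epsilon_0$ and introduce a scalar plasma frequency and scalar and vectorial cyclotron frequencies for each species,

$$\omega_{ps} = \left(\frac{n_sq_s^2}{\epsilon_0m_s}\right)^{1/2}, \qquad \omega_{cs} = \frac{q_sB_0}{m_s}, \qquad \boldsymbol{\omega}_{cs} = \omega_{cs}\hat{\mathbf{B}}_0 = \frac{q_s\mathbf{B}_0}{m_s} \qquad (19.47)$$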

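[so $\omega_{pp} = (m_e/m_p)^{1/2}\omega_{pe} \simeq \omega_{pe}/43$, $\omega_{ce} < 0$, $\omega_{cp} > 0$, and $\omega_{cp} = (m_e/m_p)|\omega_{ce}| \simeq |\omega_{ce}|/1836$]. Thereby we bring the equation of motion into the form

$$\frac{n_sq_s}{\epsilon_0}\left(-i\omega\mathbf{u}_s + \boldsymbol{\omega}_{cs}\times\mathbf{u}_s\right) = \omega_{ps}^2\mathbf{E}. \qquad (19.48)$$

By combining this equation with $\boldsymbol{\omega}_{cs}\times$(this equation), we can solve for the fluid velocity of species $s$ as a linear function of the electric field $\mathbf{E}$:

$$\frac{n_sq_s}{\epsilon_0}\mathbf{u}_s = -\frac{i\omega\,\omega_{ps}^2}{\omega_{cs}^2 - \omega^2}\mathbf{E} - \frac{\omega_{ps}^2}{\omega_{cs}^2 - \omega^2}\,\boldsymbol{\omega}_{cs}\times\mathbf{E} + \frac{i\omega_{ps}^2}{(\omega_{cs}^2 - \omega^2)\,\omega}\,\boldsymbol{\omega}_{cs}(\boldsymbol{\omega}_{cs}\cdot\mathbf{E}). \qquad (19.49)$$

(This relation is useful when one tries to deduce the physical properties of a wave mode.) From this fluid velocity we can read off the current $\mathbf{j} = \sum_sn_sq_s\mathbf{u}_s$ as a linear function of $\mathbf{E}$; by comparing with Ohm's law $\mathbf{j} = \boldsymbol{\kappa}_e\cdot\mathbf{E}$, we then obtain the tensorial conductivity $\boldsymbol{\kappa}_e$, which we insert into Eq. (19.18) to get the following expression for the dielectric tensor (in which $\mathbf{B}_0$, and hence $\boldsymbol{\omega}_{cs}$, is taken to be along the $z$ axis):

$$\boldsymbol{\epsilon} = \begin{pmatrix} \epsilon_1 & -i\epsilon_2 & 0 \\ i\epsilon_2 & \epsilon_1 & 0 \\ 0 & 0 & \epsilon_3 \end{pmatrix}, \qquad (19.50)$$

where

$$\epsilon_1 = 1 - \sum_s\frac{\omega_{ps}^2}{\omega^2 - \omega_{cs}^2}, \qquad \epsilon_2 = \sum_s\frac{\omega_{ps}^2\,\omega_{cs}}{\omega(\omega^2 - \omega_{cs}^2)}, \qquad \epsilon_3 = 1 - \sum_s\frac{\omega_{ps}^2}{\omega^2}. \qquad (19.51)$$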
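To make Eqs. (19.47), (19.51) and (19.55) concrete, here is a minimal Python sketch that builds the cold-plasma dielectric components for an electron-proton plasma and confirms that the single-pole forms of $\epsilon_L$ and $\epsilon_R$ agree with $\epsilon_1 \mp \epsilon_2$. The density, field and frequency are illustrative values chosen by us, not taken from the text; as in the text, each cyclotron frequency carries the sign of its species' charge.

```python
import math

# SI constants (CODATA 2018 values)
E_CHARGE = 1.602176634e-19
M_E = 9.1093837015e-31
M_P = 1.67262192369e-27
EPS0 = 8.8541878128e-12


def species_freqs(n, q, m, b0):
    """(omega_ps, omega_cs) of Eq. (19.47); omega_cs carries the sign of the charge q."""
    return math.sqrt(n * q**2 / (EPS0 * m)), q * b0 / m


def dielectric_components(omega, species):
    """eps1, eps2, eps3 of Eq. (19.51) for a list of (omega_ps, omega_cs) pairs."""
    eps1 = 1.0 - sum(wp**2 / (omega**2 - wc**2) for wp, wc in species)
    eps2 = sum(wp**2 * wc / (omega * (omega**2 - wc**2)) for wp, wc in species)
    eps3 = 1.0 - sum(wp**2 / omega**2 for wp, _ in species)
    return eps1, eps2, eps3


def eps_left_right(omega, species):
    """eps_L and eps_R written directly in the single-pole form of Eq. (19.55)."""
    eps_l = 1.0 - sum(wp**2 / (omega * (omega - wc)) for wp, wc in species)
    eps_r = 1.0 - sum(wp**2 / (omega * (omega + wc)) for wp, wc in species)
    return eps_l, eps_r


def dielectric_tensor(omega, species):
    """The 3x3 tensor of Eq. (19.50), B0 along z, as nested lists."""
    e1, e2, e3 = dielectric_components(omega, species)
    return [[e1, -1j * e2, 0.0], [1j * e2, e1, 0.0], [0.0, 0.0, e3]]


# Illustrative electron-proton plasma: n = 1e12 m^-3, B0 = 5e-5 T, f = 10 MHz
plasma = [species_freqs(1e12, -E_CHARGE, M_E, 5e-5),
          species_freqs(1e12, E_CHARGE, M_P, 5e-5)]
omega = 2 * math.pi * 10e6
e1, e2, e3 = dielectric_components(omega, plasma)
e_l, e_r = eps_left_right(omega, plasma)
print(e1, e2, e3)
print(abs(e_l - (e1 - e2)), abs(e_r - (e1 + e2)))  # both at round-off level
```

The ratio of the two plasma frequencies in `plasma` is $(m_p/m_e)^{1/2} \approx 43$, matching the bracketed remark below Eq. (19.47).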

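Let the wave propagate in the $x$-$z$ plane, at an angle $\theta$ to the $z$ axis (i.e. to the magnetic field). Then the algebratized wave operator (19.8) takes the form

$$\|L_{ij}\| = \frac{\omega^2}{c^2}\begin{pmatrix} \epsilon_1 - \tilde{n}^2\cos^2\theta & -i\epsilon_2 & \tilde{n}^2\sin\theta\cos\theta \\ i\epsilon_2 & \epsilon_1 - \tilde{n}^2 & 0 \\ \tilde{n}^2\sin\theta\cos\theta & 0 & \epsilon_3 - \tilde{n}^2\sin^2\theta \end{pmatrix}, \qquad (19.52)$$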

where

$$\tilde{n} = \frac{ck}{\omega} \qquad (19.53)$$

is the wave's index of refraction, i.e., the wave's phase velocity is $V_{\rm ph} = \omega/k = c/\tilde{n}$. (We introduce the tilde in this chapter to distinguish $\tilde{n}$ from the particle density $n$.) The algebratized wave operator (19.52) will be needed when we explore the physical nature of modes, in particular the directions of their electric fields, which satisfy $L_{ij}E_j = 0$.

From the wave operator (19.52) we deduce the waves' dispersion relation $\det\|L_{ij}\| = 0$. Some algebra brings this into the form

$$\tan^2\theta = \frac{-\epsilon_3(\tilde{n}^2 - \epsilon_R)(\tilde{n}^2 - \epsilon_L)}{\epsilon_1(\tilde{n}^2 - \epsilon_3)\left(\tilde{n}^2 - \epsilon_R\epsilon_L/\epsilon_1\right)}, \qquad (19.54)$$

where

$$\epsilon_L = \epsilon_1 - \epsilon_2 = 1 - \sum_s\frac{\omega_{ps}^2}{\omega(\omega - \omega_{cs})}, \qquad \epsilon_R = \epsilon_1 + \epsilon_2 = 1 - \sum_s\frac{\omega_{ps}^2}{\omega(\omega + \omega_{cs})}. \qquad (19.55)$$

19.4.2 Parallel Propagation

As a first step in making sense of the general dispersion relation (19.54) for waves in a cold, magnetized plasma, let us consider wave propagation along the magnetic field, so $\theta = 0$. The dispersion relation (19.54) then factorizes to give three pairs of solutions:

$$\frac{c^2k^2}{\omega^2} = \epsilon_L, \qquad \frac{c^2k^2}{\omega^2} = \epsilon_R, \qquad \epsilon_3 = 0. \qquad (19.56)$$
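The step from $\det\|L_{ij}\| = 0$ to Eq. (19.54), and its factorization (19.56) at $\theta = 0$, can be spot-checked numerically. The Python sketch below relies on the fact that the $\tilde n^6$ terms of the determinant of (19.52) cancel, so the determinant is a quadratic in $\tilde n^2$; it fits that quadratic from three evaluations of the matrix, solves it, and checks each root against Eq. (19.54). The dielectric components are arbitrary illustrative numbers, not a particular plasma, and the helper names are our own.

```python
import cmath
import math


def wave_operator(n2, theta, eps1, eps2, eps3):
    """Bracketed matrix of Eq. (19.52); the overall factor omega^2/c^2 is dropped."""
    s, c = math.sin(theta), math.cos(theta)
    return [[eps1 - n2 * c**2, -1j * eps2, n2 * s * c],
            [1j * eps2, eps1 - n2, 0.0],
            [n2 * s * c, 0.0, eps3 - n2 * s**2]]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def refractive_index_roots(theta, eps1, eps2, eps3):
    """Roots n~^2 of det||L|| = 0: fit the quadratic in n~^2 from samples at 0, 1, 2, then solve."""
    d0, d1, d2 = (det3(wave_operator(x, theta, eps1, eps2, eps3)) for x in (0.0, 1.0, 2.0))
    a = (d2 - 2 * d1 + d0) / 2
    b = d1 - d0 - a
    disc = cmath.sqrt(b * b - 4 * a * d0)
    return [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]


def tan2_from_dispersion(n2, eps1, eps2, eps3):
    """Right-hand side of Eq. (19.54)."""
    eps_r, eps_l = eps1 + eps2, eps1 - eps2
    return (-eps3 * (n2 - eps_r) * (n2 - eps_l)
            / (eps1 * (n2 - eps3) * (n2 - eps_r * eps_l / eps1)))


# Arbitrary illustrative dielectric components and propagation angle
eps1, eps2, eps3, theta = 0.6, 0.25, -0.8, 0.5
for n2 in refractive_index_roots(theta, eps1, eps2, eps3):
    print(n2.real, tan2_from_dispersion(n2, eps1, eps2, eps3).real, math.tan(theta) ** 2)
```

Calling `refractive_index_roots(0.0, 0.6, 0.25, -0.8)` returns (up to round-off) $\epsilon_L = 0.35$ and $\epsilon_R = 0.85$, the two electromagnetic roots of Eq. (19.56).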

Consider the first solution (19.56), $\tilde{n}^2 = \epsilon_L$. The algebratized wave equation $L_{ij}E_j = 0$ in this case requires that the electric field direction be $\mathbf{E} \propto (\mathbf{e}_x - i\mathbf{e}_y)e^{-i\omega t}$, which is a left-hand circularly polarized wave propagating along the static magnetic field ($z$ direction). The second solution (19.56), $\tilde{n}^2 = \epsilon_R$, is the corresponding right-hand circularly polarized mode. From Eqs. (19.55) we see that these two modes propagate with different phase velocities (but only slightly different, if $\omega$ is far from the electron cyclotron frequency and far from the proton cyclotron frequency). The third solution (19.56), $\epsilon_3 = 0$, is just the electrostatic plasma oscillation, in which the electrons and protons oscillate parallel to, and are unaffected by, the static magnetic field.

As an aid to exploring the frequency dependence of the left and right modes, we plot in Fig. 19.3 the square of the refractive index, $\tilde{n}^2 = (ck/\omega)^2$, as a function of $\omega/|\omega_{ce}|$. In the high-frequency limit, the refractive index for both modes is slightly less than unity and approaches that for an unmagnetized plasma, $\tilde{n} = ck/\omega \simeq 1 - \tfrac{1}{2}\omega_p^2/\omega^2$ [cf. Eq. (19.25)],

[Figure 19.3 (plot): $\tilde{n}^2 = (ck/\omega)^2$ versus $\omega/|\omega_{ce}|$ for the RH and LH modes, marking the RH and LH cutoffs ($\omega_{coR}$, $\omega_{coL}$), the resonances, the whistler branch, the Alfvén limit $c^2/a^2$ at low frequency, and the evanescent bands.]

Fig. 19.3: Square of the wave refractive index for circularly polarized waves propagating along the static magnetic field in a proton-electron plasma with $\omega_{pe} > |\omega_{ce}|$. (Remember that we will regard both the electron and the proton cyclotron frequencies as positive numbers.) The angular frequency is plotted logarithmically in units of the modulus of the electron gyro frequency.

but with a small difference between the modes given to leading order by

$$\tilde{n}_L - \tilde{n}_R \simeq \frac{\omega_{pe}^2|\omega_{ce}|}{\omega^3}. \qquad (19.57)$$

This difference is responsible for an important effect known as Faraday rotation. Suppose that a linearly polarized wave is incident upon a magnetized plasma and propagates parallel to the magnetic field. We can deduce the behavior of the polarization by expanding the mode as a linear superposition of the two circularly polarized eigenmodes, left and right. These two modes propagate with slightly different phase velocities, so after propagating some distance through the plasma, they acquire a relative phase shift $\Delta\phi$. When one then reconstitutes the linearly polarized mode from the circular eigenmodes, this phase shift is manifest in a rotation of the plane of polarization through an angle $\Delta\phi/2$ (for small $\Delta\phi$). This, together with the difference in refractive indices (19.57) (which determines $\Delta\phi$), implies a rotation rate for the plane of polarization given by

$$\frac{d\chi}{dz} = \frac{\omega_{pe}^2|\omega_{ce}|}{2\omega^2c}. \qquad (19.58)$$
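A short check of Eqs. (19.57) and (19.58), written as a Python sketch for a pure electron plasma (ions neglected, which is reasonable at these high frequencies): it computes the exact $\tilde n_L$ and $\tilde n_R$ from Eq. (19.55), converts their difference into a rotation rate via $d\chi/dz = (\omega/2c)(\tilde n_L - \tilde n_R)$, and compares with the leading-order formula. The parameter values are illustrative assumptions.

```python
import math

C_LIGHT = 299792458.0


def n_left_right(omega, omega_pe, omega_ce_abs):
    """Exact n~_L and n~_R from Eq. (19.55) for electrons only (omega_ce = -|omega_ce|)."""
    n2_l = 1.0 - omega_pe**2 / (omega * (omega + omega_ce_abs))
    n2_r = 1.0 - omega_pe**2 / (omega * (omega - omega_ce_abs))
    return math.sqrt(n2_l), math.sqrt(n2_r)


def faraday_rate_exact(omega, omega_pe, omega_ce_abs):
    """d(chi)/dz: half the rate at which the L-R phase difference (omega/c)(n~_L - n~_R) accumulates."""
    n_l, n_r = n_left_right(omega, omega_pe, omega_ce_abs)
    return omega * (n_l - n_r) / (2.0 * C_LIGHT)


def faraday_rate_approx(omega, omega_pe, omega_ce_abs):
    """Leading-order rotation rate, Eq. (19.58)."""
    return omega_pe**2 * omega_ce_abs / (2.0 * omega**2 * C_LIGHT)


# Illustrative values with omega >> omega_pe >> |omega_ce| (rad/s)
w, wpe, wce = 2 * math.pi * 100e6, 1.0e7, 1.0e6
ratio = faraday_rate_exact(w, wpe, wce) / faraday_rate_approx(w, wpe, wce)
print(faraday_rate_approx(w, wpe, wce), ratio)  # ratio close to 1
```

The residual departure of `ratio` from unity is of order $\omega_{pe}^2/\omega^2$ and $\omega_{ce}^2/\omega^2$, the terms dropped in deriving Eq. (19.57).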

As the wave frequency is reduced, the refractive index decreases to zero, first for the right circular wave, then for the left circular wave; cf. Fig. 19.3. Vanishing at a finite frequency corresponds to vanishing of $k$ and infinite wavelength, i.e., it signals a cutoff; cf. Fig. 19.2 and associated discussion. When the frequency is lowered further, the squared refractive index $\tilde{n}^2$ becomes negative and the wave mode becomes evanescent. In other words, if we imagine a wave of constant frequency propagating into an inhomogeneous plasma parallel to its density gradient, then solving the dispersion relation (19.22) for $k$, we find that purely imaginary values for $k$ are required if $\omega < \omega_{\rm cutoff}$. The cutoff frequencies are different for the two modes and are given by

$$\omega_{coR,L} = \frac{1}{2}\left\{\left[(|\omega_{ce}| + \omega_{cp})^2 + 4(\omega_{pe}^2 + \omega_{pp}^2)\right]^{1/2} \pm (|\omega_{ce}| - \omega_{cp})\right\} \simeq \omega_{pe} \pm \frac{|\omega_{ce}|}{2}, \qquad (19.59)$$

assuming (as is usually the case) that $\omega_{pe} \gg |\omega_{ce}|$.

As we lower the frequency further (Fig. 19.3), first the right mode and then the left regain the ability to propagate. When the wave frequency lies between the proton and electron gyro frequencies, $\omega_{cp} < \omega < |\omega_{ce}|$, only the right mode propagates. This mode is sometimes called a whistler. As its frequency increases toward the electron gyro frequency $|\omega_{ce}|$ (where it first recovered the ability to propagate), its refractive index and wave vector become infinite, a signal that $\omega = |\omega_{ce}|$ is a resonance for the whistler; cf. Fig. 19.2. The physical origin of this resonance is that the wave frequency becomes resonant with the gyrational frequency of the electrons that are orbiting the magnetic field in the same sense as the wave's electric vector rotates. To quantify the strong wave absorption that occurs at this resonance, one must carry out a kinetic-theory analysis that takes account of the electrons' thermal motions (Chap. 21).

Another feature of the whistler is that it is highly dispersive close to resonance; its dispersion relation there is given approximately by

$$\omega \simeq \frac{|\omega_{ce}|}{1 + \omega_{pe}^2/c^2k^2}. \qquad (19.60)$$

The group velocity, obtained by differentiating Eq. (19.60), is given approximately by

$$\mathbf{V}_g = \frac{\partial\omega}{\partial\mathbf{k}} \simeq \frac{2|\omega_{ce}|c}{\omega_{pe}}\left(1 - \frac{\omega}{|\omega_{ce}|}\right)^{3/2}\hat{\mathbf{B}}_0. \qquad (19.61)$$

This velocity varies extremely rapidly close to resonance, so waves of different frequency propagate at very different speeds. This is the physical origin of the phenomenon by which whistlers were discovered historically. They were first encountered by radio operators who heard strange tones with rapidly changing pitch in their earphones. These turned out to be whistler modes excited by lightning in the southern hemisphere that propagated along the earth's magnetic field through the magnetosphere to the northern hemisphere. Only modes below the lowest electron gyro frequency on the path could propagate, and these were highly dispersed, with the lower frequencies arriving first. There is also a resonance associated with the left-hand polarized wave, which propagates below the proton cyclotron frequency; see Fig. 19.3.
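The cutoff and whistler formulas can be checked numerically. The Python sketch below works in units where $|\omega_{ce}| = c = 1$, with $\omega_{pe} = 10|\omega_{ce}|$ and the wavenumber chosen by us for illustration. It evaluates the exact cutoffs (first form of Eq. 19.59), confirms that $\epsilon_R$ and $\epsilon_L$ of Eq. (19.55) vanish there, and compares a finite-difference group velocity of Eq. (19.60) with the approximation (19.61).

```python
import math

MASS_RATIO = 1.0 / 1836.15267343  # m_e / m_p


def plasma_params(omega_pe, omega_ce_abs=1.0, mass_ratio=MASS_RATIO):
    """(omega_pe, omega_pp, |omega_ce|, omega_cp) for an electron-proton plasma."""
    return (omega_pe, omega_pe * math.sqrt(mass_ratio),
            omega_ce_abs, omega_ce_abs * mass_ratio)


def eps_r(omega, wpe, wpp, wce, wcp):
    """eps_R of Eq. (19.55); the electron term uses omega_ce = -wce."""
    return 1.0 - wpe**2 / (omega * (omega - wce)) - wpp**2 / (omega * (omega + wcp))


def eps_l(omega, wpe, wpp, wce, wcp):
    """eps_L of Eq. (19.55); the electron term uses omega_ce = -wce."""
    return 1.0 - wpe**2 / (omega * (omega + wce)) - wpp**2 / (omega * (omega - wcp))


def cutoffs(wpe, wpp, wce, wcp):
    """Exact (omega_coR, omega_coL): the first form of Eq. (19.59)."""
    root = math.sqrt((wce + wcp) ** 2 + 4.0 * (wpe**2 + wpp**2))
    return 0.5 * (root + (wce - wcp)), 0.5 * (root - (wce - wcp))


def whistler_omega(k, wpe, wce, c=1.0):
    """Near-resonance whistler dispersion relation, Eq. (19.60)."""
    return wce / (1.0 + wpe**2 / (c * k) ** 2)


def whistler_vg_approx(omega, wpe, wce, c=1.0):
    """Magnitude of the approximate group velocity, Eq. (19.61)."""
    return 2.0 * wce * c / wpe * (1.0 - omega / wce) ** 1.5


# Units with |omega_ce| = c = 1; omega_pe = 10 |omega_ce| is an illustrative choice
params = plasma_params(10.0)
w_r, w_l = cutoffs(*params)
print(w_r, w_l, eps_r(w_r, *params), eps_l(w_l, *params))  # eps values at round-off level

wpe, wce = params[0], params[2]
k = 40.0                                   # chosen so that omega is close to |omega_ce|
h = 1e-6 * k
vg_numeric = (whistler_omega(k + h, wpe, wce) - whistler_omega(k - h, wpe, wce)) / (2 * h)
omega = whistler_omega(k, wpe, wce)
print(omega, vg_numeric / whistler_vg_approx(omega, wpe, wce))
```

Here the two group velocities differ by a few percent; by our own differentiation of Eq. (19.60), the exact result carries an extra factor $(\omega/|\omega_{ce}|)^{1/2}$ that Eq. (19.61) sets to unity near resonance.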

[Figure 19.4 (sketch): electron and ion gyro-orbits in crossed $\mathbf{E}$ and $\mathbf{B}$ fields, each with its $\mathbf{E}\times\mathbf{B}$ drift.]

Fig. 19.4: Gyration of electrons and ions in a low-frequency Alfvén wave. Although the electrons and ions gyrate in opposite senses about the magnetic field, their $\mathbf{E}\times\mathbf{B}$ drifts are similar. It is only at the next order of approximation that a net ion current is produced parallel to the applied electric field.

Finally, let us examine the low-frequency limit of these waves (Fig. 19.3). We find that for both modes,

$$\omega = ak\left(1 + \frac{a^2}{c^2}\right)^{-1/2}, \qquad (19.62)$$

where $a = B_0\{\mu_0n_e(m_p + m_e)\}^{-1/2}$ is the Alfvén speed that arose in our discussion of magnetohydrodynamics [Eq. (17.76)]. As this chapter's two-fluid physical description is more detailed than the MHD treatment, it is not surprising that the degeneracy between the two propagating modes that exists in MHD is broken, and we find that there are two circularly polarized eigenmodes traveling with slightly different speeds. The phase speed $a/\sqrt{1 + a^2/c^2}$ to which both modes asymptote as $\omega \to 0$ is slightly lower than the Alfvén speed. This fact could not be deduced using nonrelativistic MHD, because it neglects the displacement current.

It is illuminating to consider what is happening to the particles in a very-low-frequency Alfvén wave; see Fig. 19.4. As the wave frequency is below both the electron and the proton cyclotron frequencies, both types of particle will orbit the $\mathbf{B}_0$ field many times in a wave period. When the wave electric field is applied, the guiding centers of both types of orbits undergo the same $\mathbf{E}\times\mathbf{B}$ drift, so the two fluid velocities also drift at this rate and the currents associated with the proton and electron drifts cancel. However, when we consider higher-order corrections to the guiding-center response, we find that the ions drift slightly faster than the electrons, which produces a net current that modifies the magnetic field and gives rise to the modified propagation speed.
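To see Eq. (19.62) emerge from the full cold-plasma expressions, the Python sketch below evaluates the exact $\epsilon_L$ and $\epsilon_R$ of Eq. (19.55) at a frequency far below $\omega_{cp}$ and compares the resulting phase speeds $c/\sqrt{\epsilon}$ with $a/\sqrt{1 + a^2/c^2}$. The large electron and proton terms cancel (charge neutrality), leaving $\epsilon \simeq 1 + c^2/a^2$. The density and field are illustrative, solar-wind-like values chosen by us.

```python
import math

# SI constants (CODATA 2018 values); mu0 is fixed by eps0 and c for consistency
E_CHARGE = 1.602176634e-19
M_E = 9.1093837015e-31
M_P = 1.67262192369e-27
EPS0 = 8.8541878128e-12
C_LIGHT = 299792458.0
MU0 = 1.0 / (EPS0 * C_LIGHT**2)


def eps_lr(omega, n, b0):
    """eps_L, eps_R of Eq. (19.55) for an electron-proton plasma (signed cyclotron frequencies)."""
    eps_l, eps_r = 1.0, 1.0
    for q, m in ((-E_CHARGE, M_E), (E_CHARGE, M_P)):
        wp2 = n * q**2 / (EPS0 * m)
        wc = q * b0 / m
        eps_l -= wp2 / (omega * (omega - wc))
        eps_r -= wp2 / (omega * (omega + wc))
    return eps_l, eps_r


n, b0 = 1e7, 5e-9                                # illustrative values (m^-3, T)
a = b0 / math.sqrt(MU0 * n * (M_P + M_E))        # Alfven speed
v_limit = a / math.sqrt(1.0 + a**2 / C_LIGHT**2)  # Eq. (19.62)
omega = 1e-4 * E_CHARGE * b0 / M_P                # far below the proton cyclotron frequency
for eps in eps_lr(omega, n, b0):
    print(C_LIGHT / math.sqrt(eps) / v_limit)     # both close to 1
```

The leftover differences between the two modes are of relative order $\omega/\omega_{cp}$, which is how the two-fluid description lifts the MHD degeneracy described above.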

19.4.3 Perpendicular Propagation

Turn, next, to waves that propagate perpendicular to the static magnetic field ($\mathbf{k} = k\mathbf{e}_x$; $\theta = \pi/2$). In this case our general dispersion relation (19.54) again has three solutions

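[Figure 19.5 (plot): $\tilde{n}^2 = (ck/\omega)^2$ versus $\omega$ for the ordinary (O) and extraordinary (E) modes, marking the cutoffs $\omega_{co1}$ and $\omega_{co2}$, the upper-hybrid (UH) and lower-hybrid (LH) resonances, the low-frequency limit $c^2/a^2$, and the evanescent bands.]

Fig. 19.5: Variation of the refractive index $\tilde{n}$ for wave propagation perpendicular to the magnetic field in an electron-ion plasma with $\omega_{pe} > |\omega_{ce}|$. The ordinary mode is designated by O, the extraordinary mode by E.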

corresponding to three modes:

$$\frac{c^2k^2}{\omega^2} = \epsilon_3, \qquad \frac{c^2k^2}{\omega^2} = \frac{\epsilon_R\epsilon_L}{\epsilon_1}, \qquad \epsilon_1 = 0. \qquad (19.63)$$

The first solution,

$$\tilde{n}^2 = \epsilon_3 = 1 - \frac{\omega_p^2}{\omega^2}, \qquad (19.64)$$

has the same index of refraction as for electromagnetic waves in an unmagnetized plasma [cf. Eq. (19.22)], so this is called the ordinary mode. In this mode, the electric vector and velocity perturbation are parallel to the static magnetic field, so the current response is uninfluenced by it. The second solution (19.63),

$$\tilde{n}^2 = \frac{\epsilon_R\epsilon_L}{\epsilon_1} = \frac{\epsilon_1^2 - \epsilon_2^2}{\epsilon_1}, \qquad (19.65)$$

is known as the extraordinary mode and has an electric field that is orthogonal to $\mathbf{B}_0$ but not to $\mathbf{k}$. The refractive indices for the ordinary and extraordinary modes are plotted as functions of frequency in Fig. 19.5. The ordinary-mode curve is dull; it is just like that in an unmagnetized plasma. The extraordinary-mode curve is more interesting. It has two cutoffs, with frequencies

$$\omega_{co1,2} \simeq \left(\omega_{pe}^2 + \tfrac{1}{4}\omega_{ce}^2\right)^{1/2} \pm \tfrac{1}{2}|\omega_{ce}|, \qquad (19.66)$$

and two resonances with strong absorption, at frequencies known as the upper and lower hybrid frequencies. These frequencies are given approximately by

$$\omega_{UH} \simeq \left(\omega_{pe}^2 + \omega_{ce}^2\right)^{1/2}, \qquad \omega_{LH} \simeq \left[\frac{(\omega_{pe}^2 + |\omega_{ce}|\omega_{cp})\,|\omega_{ce}|\omega_{cp}}{\omega_{pe}^2 + \omega_{ce}^2}\right]^{1/2}. \qquad (19.67)$$
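The hybrid resonances are the roots of $\epsilon_1 = 0$, and for an electron-proton plasma that condition is a quadratic in $u = \omega^2$, so the approximations (19.67) are easy to test. A Python sketch, in units with $|\omega_{ce}| = 1$ and an illustrative $\omega_{pe} = 3$ (both our choices):

```python
import math

MASS_RATIO = 1.0 / 1836.15267343  # m_e / m_p


def eps1(omega, omega_pe, omega_ce_abs, mass_ratio=MASS_RATIO):
    """eps1 of Eq. (19.51) for an electron-proton plasma."""
    omega_pp2 = omega_pe**2 * mass_ratio
    omega_cp = omega_ce_abs * mass_ratio
    return (1.0 - omega_pe**2 / (omega**2 - omega_ce_abs**2)
            - omega_pp2 / (omega**2 - omega_cp**2))


def hybrid_resonances_exact(omega_pe, omega_ce_abs, mass_ratio=MASS_RATIO):
    """Roots of eps1 = 0, which reduces to u^2 - s u + p = 0 with u = omega^2."""
    omega_pp2 = omega_pe**2 * mass_ratio
    omega_cp = omega_ce_abs * mass_ratio
    s = omega_ce_abs**2 + omega_cp**2 + omega_pe**2 + omega_pp2
    p = (omega_ce_abs * omega_cp) ** 2 + omega_pe**2 * omega_cp**2 + omega_pp2 * omega_ce_abs**2
    upper = 0.5 * (s + math.sqrt(s * s - 4.0 * p))
    lower = p / upper  # product of the roots; avoids cancellation in (s - sqrt)/2
    return math.sqrt(upper), math.sqrt(lower)


def hybrid_resonances_approx(omega_pe, omega_ce_abs, mass_ratio=MASS_RATIO):
    """Approximate upper- and lower-hybrid frequencies, Eq. (19.67)."""
    omega_cp = omega_ce_abs * mass_ratio
    w_uh = math.sqrt(omega_pe**2 + omega_ce_abs**2)
    w_lh = math.sqrt((omega_pe**2 + omega_ce_abs * omega_cp) * omega_ce_abs * omega_cp
                     / (omega_pe**2 + omega_ce_abs**2))
    return w_uh, w_lh


# Units with |omega_ce| = 1 and omega_pe = 3 (illustrative)
exact = hybrid_resonances_exact(3.0, 1.0)
approx = hybrid_resonances_approx(3.0, 1.0)
print(exact, approx, [eps1(w, 3.0, 1.0) for w in exact])
```

The exact and approximate frequencies agree to better than a part in a thousand here, because the neglected terms are suppressed by powers of $m_e/m_p$.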

In the limit of very low frequency, the extraordinary, perpendicularly propagating mode has the same dispersion relation $\omega = ak/\sqrt{1 + a^2/c^2}$ as the parallel propagating modes [Eq. (19.62)]. It has become the fast magnetosonic wave, propagating perpendicular to the static magnetic field [Sec. 17.7.2], while the parallel waves became the Alfvén modes.

19.5 Propagation of Radio Waves in the Ionosphere

The discovery that radio waves could be reflected off the ionosphere and thereby transmitted over long distances revolutionized communications and stimulated intensive research on radio-wave propagation in a magnetoactive plasma. The ionosphere is a dense layer of partially ionized gas between 50 and 300 km above the surface of the earth. The ionization is due to incident solar UV radiation, and although the ionization fraction increases with height, the actual density of free electrons passes through a maximum whose height rises and falls with the sun. The electron gyro frequency varies from ∼0.5 to ∼1 MHz over the ionosphere, and the plasma frequency increases from effectively zero to a maximum of order 10 MHz, so typically $\omega_{pe} \gg |\omega_{ce}|$. We are interested in wave propagation above the electron plasma frequency, which in turn is well in excess of the ion plasma frequency and the ion gyro frequency. It is therefore a good approximation to ignore ion motions altogether. In addition, at the altitudes of most interest for radio-wave propagation, the temperature is very low, $T_e \sim 200$ to $600$ K, and the cold-plasma approximation is well motivated. A complication that one must sometimes face in the ionosphere is the influence of collisions (Ex. 19.2 above), but in this section we shall ignore it.

It is conventional in magnetoionic theory to introduce two dimensionless parameters,

$$X = \frac{\omega_{pe}^2}{\omega^2}, \qquad Y = \frac{|\omega_{ce}|}{\omega}, \qquad (19.68)$$

in terms of which (ignoring ion motions) the components (19.51) of the dielectric tensor are

$$\epsilon_1 = 1 + \frac{X}{Y^2 - 1}, \qquad \epsilon_2 = \frac{XY}{Y^2 - 1}, \qquad \epsilon_3 = 1 - X. \qquad (19.69)$$

It is convenient, in this case, to rewrite the dispersion relation $\det\|L_{ij}\| = 0$ in a form different from Eq. (19.54), a form derivable, e.g., by computing explicitly the determinant of the matrix (19.52), setting

$$x = \frac{X - 1 + \tilde{n}^2}{1 - \tilde{n}^2}, \qquad (19.70)$$

solving the resulting quadratic in $x$, then solving for $\tilde{n}^2$. The result is the Appleton-Hartree dispersion relation

$$\tilde{n}^2 = 1 - \frac{X}{1 - \dfrac{Y^2\sin^2\theta}{2(1 - X)} \pm \left[\dfrac{Y^4\sin^4\theta}{4(1 - X)^2} + Y^2\cos^2\theta\right]^{1/2}}. \qquad (19.71)$$
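A direct way to confirm the Appleton-Hartree relation is to substitute both of its roots into the determinant of the wave operator (19.52), using the magnetoionic dielectric components (19.69). The Python sketch below does exactly that; $X$, $Y$ and $\theta$ are illustrative values, and the determinant is expanded by hand for this particular matrix pattern.

```python
import math


def appleton_hartree(X, Y, theta):
    """Both roots n~^2 of Eq. (19.71) for a cold, collisionless electron plasma."""
    t = (Y * math.sin(theta)) ** 2 / (2.0 * (1.0 - X))
    root = math.sqrt(t**2 + (Y * math.cos(theta)) ** 2)  # t^2 = Y^4 sin^4 / (4 (1-X)^2)
    return [1.0 - X / (1.0 - t + root), 1.0 - X / (1.0 - t - root)]


def magnetoionic_eps(X, Y):
    """Dielectric components of Eq. (19.69), ion motions ignored."""
    return 1.0 + X / (Y**2 - 1.0), X * Y / (Y**2 - 1.0), 1.0 - X


def wave_operator_det(n2, theta, eps1, eps2, eps3):
    """Determinant of the bracketed matrix in Eq. (19.52)."""
    s, c = math.sin(theta), math.cos(theta)
    a = eps1 - n2 * c**2
    b = eps1 - n2
    d = eps3 - n2 * s**2
    off = n2 * s * c
    # det [[a, -i eps2, off], [i eps2, b, 0], [off, 0, d]] = a b d - eps2^2 d - b off^2
    return a * b * d - eps2**2 * d - b * off**2


X, Y, theta = 0.5, 0.3, 0.7  # illustrative values
eps = magnetoionic_eps(X, Y)
for n2 in appleton_hartree(X, Y, theta):
    print(n2, wave_operator_det(n2, theta, *eps))  # determinant ~ 0
```

The factor 4 in the $Y^4\sin^4\theta$ term matters: at $\theta = \pi/2$ it is what makes the two roots collapse to the ordinary mode $1 - X$ and the extraordinary mode $\epsilon_R\epsilon_L/\epsilon_1$.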

There are two commonly used approximations to this dispersion relation. The first is the quasi-longitudinal approximation, which is used when $\mathbf{k}$ is approximately parallel to the static magnetic field, i.e. when $\theta$ is small. In this case, retaining just the dominant terms in the dispersion relation, we obtain

$$\tilde{n}^2 \simeq 1 - \frac{X}{1 \pm Y\cos\theta}. \qquad (19.72)$$

This is just the dispersion relation (19.56) for the left and right modes in strictly parallel propagation, with the substitution $B_0 \to B_0\cos\theta$. By comparing the magnitude of the terms that we dropped from the full dispersion relation in deriving (19.72), namely $Y^2\sin^2\theta/[2(1 - X)]$, with the term $Y\cos\theta$ that we retained, one can show that the quasi-longitudinal approximation is valid when

$$Y\sin^2\theta \ll 2|1 - X|\cos\theta. \qquad (19.73)$$

The second approximation is the quasi-transverse approximation; it is appropriate when inequality (19.73) is reversed. In this case the two modes are generalizations of the precisely perpendicular ordinary and extraordinary modes, and their approximate dispersion relations are

$$\tilde{n}_O^2 = 1 - X, \qquad \tilde{n}_E^2 = 1 - \frac{X(1 - X)}{1 - X - Y^2\sin^2\theta}. \qquad (19.74)$$
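The two limiting forms can be compared against the full Appleton-Hartree relation (19.71). The Python sketch below measures the ratio of the dropped term $Y^2\sin^2\theta/[2|1 - X|]$ to the retained term $Y|\cos\theta|$, then checks the quasi-longitudinal forms (19.72) at a small angle and the quasi-transverse forms (19.74) near $\theta = \pi/2$. The parameter values are illustrative choices of ours.

```python
import math


def appleton_hartree(X, Y, theta):
    """Both roots n~^2 of Eq. (19.71), sorted in ascending order."""
    t = (Y * math.sin(theta)) ** 2 / (2.0 * (1.0 - X))
    root = math.sqrt(t**2 + (Y * math.cos(theta)) ** 2)
    return sorted([1.0 - X / (1.0 - t + root), 1.0 - X / (1.0 - t - root)])


def quasi_longitudinal(X, Y, theta):
    """Eq. (19.72), both signs, sorted."""
    c = Y * math.cos(theta)
    return sorted([1.0 - X / (1.0 + c), 1.0 - X / (1.0 - c)])


def quasi_transverse(X, Y, theta):
    """Eq. (19.74): ordinary and extraordinary modes, sorted."""
    n2_o = 1.0 - X
    n2_e = 1.0 - X * (1.0 - X) / (1.0 - X - (Y * math.sin(theta)) ** 2)
    return sorted([n2_o, n2_e])


def ql_ratio(X, Y, theta):
    """Dropped term Y^2 sin^2/(2|1-X|) over retained term Y|cos|: small means quasi-longitudinal."""
    return (Y * math.sin(theta)) ** 2 / (2.0 * abs(1.0 - X)) / (Y * abs(math.cos(theta)))


for X, Y, theta in [(0.3, 0.1, 0.1), (0.3, 0.5, 1.55)]:
    print(ql_ratio(X, Y, theta), appleton_hartree(X, Y, theta),
          quasi_longitudinal(X, Y, theta), quasi_transverse(X, Y, theta))
```

For the first case the ratio is well below unity and (19.72) tracks (19.71) closely; for the second it is well above unity and (19.74) is the better description.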

The ordinary-mode dispersion relation is unchanged from the strictly perpendicular one, (19.64); the extraordinary dispersion relation is obtained from the strictly perpendicular one, (19.65), by the substitution $B_0 \to B_0\sin\theta$. The quasi-longitudinal and quasi-transverse approximations simplify the problem of tracing rays through the ionosphere.

Commercial radio stations operate in the AM (amplitude modulated, 0.5 to 1.6 MHz), SW (short wave, 2.3 to 18 MHz) and FM (frequency modulated, 88 to 108 MHz) bands. Waves in the first two bands are reflected by the ionosphere and can therefore be transmitted over large surface areas. FM waves, with their higher frequencies, are not reflected and must therefore be received as ground waves. However, they have the advantage of a larger bandwidth and consequently a higher-fidelity audio output. As the altitude of the reflecting layer rises at night, short-wave communication over long distances becomes easier.

****************************

EXERCISES

Exercise 19.6 Example: Dispersion and Faraday rotation of Pulsar pulses

A radio pulsar emits regular pulses at 1 s intervals, which propagate to Earth through the ionized interstellar plasma, with electron density $n_e \simeq 3\times10^4$ m$^{-3}$. The pulses observed at $f = 100$ MHz are believed to be emitted at the same time as those at much higher frequency, but arrive with a delay of 100 ms.

(a) Explain briefly why pulses travel at the group velocity instead of the phase velocity, and show that the expected time delay of the $f = 100$ MHz pulses relative to the high-frequency pulses is given by

$$\Delta t = \frac{e^2}{8\pi^2m_e\epsilon_0f^2c}\int n_e\,dx, \qquad (19.75)$$

where the integral is along the waves' propagation path. Hence compute the distance to the pulsar.

(b) Now suppose that the pulses are linearly polarized and that the propagation is accurately described by the quasi-longitudinal approximation. Show that the plane of polarization will be Faraday rotated through an angle

$$\Delta\chi = \frac{e\,\Delta t}{m_e}\langle B_\parallel\rangle, \qquad (19.76)$$

where $\langle B_\parallel\rangle = \int n_e\mathbf{B}\cdot d\mathbf{x}\big/\int n_e\,dx$. The plane of polarization of the pulses emitted at 100 MHz is believed to be the same as that at high frequency, but is observed to be rotated through 3 radians. Calculate the mean parallel component of the interstellar magnetic field.

Exercise 19.7 Derivation: Appleton-Hartree Dispersion Relation

Derive Eq. (19.71).

Exercise 19.8 Example: Reflection of Short Waves by the Ionosphere

The free-electron density in the night-time ionosphere increases exponentially from $10^9$ m$^{-3}$ to $10^{11}$ m$^{-3}$ as the altitude increases from 100 to 200 km, and diminishes above this height. Use Snell's law [Eq. (6.43)] to calculate the maximum range of 10 MHz transmission, assuming a single ionospheric reflection.

****************************
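The arithmetic of Exercise 19.6 is a direct inversion of Eqs. (19.75) and (19.76). The Python sketch below assumes, as the exercise implies, a uniform electron density along the path; the physical constants are CODATA SI values supplied by us.

```python
import math

E_CHARGE = 1.602176634e-19      # C
M_E = 9.1093837015e-31          # kg
EPS0 = 8.8541878128e-12         # F/m
C_LIGHT = 299792458.0           # m/s
PARSEC = 3.0856775814913673e16  # m


def column_density(delta_t, f):
    """Invert Eq. (19.75): integral of n_e dx = 8 pi^2 m_e eps0 f^2 c delta_t / e^2 (m^-2)."""
    return 8.0 * math.pi**2 * M_E * EPS0 * f**2 * C_LIGHT * delta_t / E_CHARGE**2


def mean_parallel_field(delta_chi, delta_t):
    """Invert Eq. (19.76): <B_par> = m_e delta_chi / (e delta_t) (tesla)."""
    return M_E * delta_chi / (E_CHARGE * delta_t)


delta_t, f, n_e = 0.1, 100e6, 3e4            # values quoted in Exercise 19.6
distance = column_density(delta_t, f) / n_e  # assumes n_e uniform along the path
b_par = mean_parallel_field(3.0, delta_t)
print(f"distance ~ {distance / PARSEC:.1f} pc, <B_par> ~ {b_par:.1e} T")
```

By our own arithmetic this gives a distance of roughly 8 pc and $\langle B_\parallel\rangle$ of roughly $1.7\times10^{-10}$ T; these are not values stated in the text.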

[Figure 19.6 (two panels): (a) figure-8 wave-normal curve traced by $\mathbf{V}_{\rm ph}$ at fixed $\omega$, with $\mathbf{B}$ vertical; (b) refractive-index surface traced by $c\mathbf{k}/\omega$ at fixed $\omega$, with $\mathbf{V}_g$ normal to it.]

Fig. 19.6: (a) Wave-normal surface for a whistler mode propagating at an angle $\theta$ with respect to the magnetic field direction. In this diagram we plot the phase velocity $\mathbf{V}_{\rm ph} = (\omega/k)\hat{\mathbf{k}}$ as a vector from the origin, with the direction of the magnetic field chosen upward. When we fix the frequency $\omega$ of the wave, the tip of the phase velocity vector sweeps out the figure-8 curve as its angle $\theta$ to the magnetic field changes. This curve should be thought of as rotated around the vertical (magnetic-field) direction to form a figure-8 “wave-normal” surface. Note that there are some directions where no mode can propagate. (b) Refractive-index surface for the same whistler mode. Here we plot $c\mathbf{k}/\omega$ as a vector from the origin, and as its direction changes with fixed $\omega$, this vector sweeps out the two hyperboloid-like surfaces. Since the length of the vector is $ck/\omega = \tilde{n}$, this figure can be thought of as a polar plot of the refractive index $\tilde{n}$ as a function of wave propagation direction $\theta$ for fixed $\omega$; hence the name “refractive-index surface”. The group velocity $\mathbf{V}_g$ is orthogonal to the refractive-index surface (Ex. 19.10). Note that for this whistler mode, the energy flow (along $\mathbf{V}_g$) is focused toward the direction of the magnetic field.

19.6 CMA Diagram for Wave Modes in Cold, Magnetized Plasma

Magnetoactive plasmas are anisotropic, just like optically active crystals. This implies that the phase speed of a propagating wave mode depends upon the angle between the direction of propagation and the magnetic field. There are two convenient ways to exhibit this anisotropy diagrammatically. The first method, due originally to Fresnel, is to construct phase-velocity surfaces (also called wave-normal surfaces), which are polar plots of the wave phase velocity $V_{\rm ph} = \omega/k$ as a function of the angle $\theta$ that the wave vector $\mathbf{k}$ makes with the magnetic field; see Fig. 19.6(a). The second type of surface, used originally by Hamilton, is the refractive-index surface. This is a polar plot of the refractive index $\tilde{n} = ck/\omega$ for a given frequency, again as a function of the wave vector's angle $\theta$ to $\mathbf{B}$; see Fig. 19.6(b). This has the important property that the group velocity is perpendicular to the surface. As discussed above, the energy flow is along the direction of the group velocity, and in a magnetoactive plasma this can make a large angle with the wave vector.

A particularly useful application of these ideas is to a graphical representation of the various types of wave modes that can propagate in a cold, magnetoactive plasma. This is known as the Clemmow-Mullaly-Allis or CMA diagram. The character of waves of a given frequency $\omega$ depends on the ratio of this frequency to the plasma frequency and the cyclotron frequency. This allows us to define two dimensionless numbers, $\omega_{pe}\omega_{pp}/\omega^2$ and $|\omega_{ce}|\omega_{cp}/\omega^2$. [Recall that $\omega_{pp} = \omega_{pe}(m_e/m_p)^{1/2}$ and $\omega_{cp} = |\omega_{ce}|(m_e/m_p)$.] The space defined by these two dimensionless parameters can be subdivided into sixteen regions, within each of which the propagating modes have a distinctive character. The mode properties are indicated by sketching the topological form of the wave-normal surfaces associated with each region. The form of each wave-normal surface in each region can be deduced from the general dispersion relation (19.54). To deduce it, one must solve the dispersion relation for $1/\tilde{n} = \omega/kc = V_{\rm ph}/c$ as a function of $\theta$ and $\omega$, and then generate the polar plot of $V_{\rm ph}(\theta)$.

On the CMA diagram's wave-normal curves, the characters of the parallel and perpendicular modes are indicated by labels: R and L for right and left parallel modes ($\theta = 0$), and O and X for ordinary and extraordinary perpendicular modes ($\theta = \pi/2$). As one moves across a boundary from one region to another, there is often a change of which parallel mode gets deformed continuously, with increasing $\theta$, into which perpendicular mode. In some regions a wave-normal surface has a figure-eight shape, indicating that the waves can propagate only over a limited range of angles, $\theta < \theta_{\max}$.
In some regions there are two wave-normal surfaces, indicating that, at least in some directions $\theta$, two modes can propagate; in other regions there is just one wave-normal surface, so only one mode can propagate; and in the bottom-right two regions there are no wave-normal surfaces, since no waves can propagate at these high densities and low magnetic-field strengths.

****************************

EXERCISES

Exercise 19.9 Problem: Exploration of Modes in CMA Diagram

For each of the following modes studied earlier in this chapter, identify in the CMA diagram the trajectory, as a function of frequency $\omega$, and verify that the turning on and cutting off of the modes, and the relative speeds of the modes, are in accord with the CMA diagram's wave-normal curves.

a. EM modes in an unmagnetized plasma.
b. Left and right modes for parallel propagation in a magnetized plasma.
c. Ordinary and extraordinary modes for perpendicular propagation in a magnetized plasma.

Exercise 19.10 Derivation: Refractive Index Surface

Verify that the group velocity of a wave mode is perpendicular to the refractive-index surface.

****************************

[Figure 19.7 (CMA diagram): the plane is divided into sixteen regions by the curves $\epsilon_3 = 0$, $\epsilon_1 = 0$, $\epsilon_R = 0$, $\epsilon_L = 0$, $\epsilon_R\epsilon_L = \epsilon_1\epsilon_3$, and the resonance lines at ordinate values $m_p/m_e$ ($\omega = \omega_{cp}$) and $m_e/m_p$ ($\omega = |\omega_{ce}|$); each region contains a sketched wave-normal surface labeled R, L, O, X.]

Fig. 19.7: Clemmow-Mullaly-Allis (CMA) diagram for wave modes of frequency $\omega$ propagating in a plasma with plasma frequencies $\omega_{pe}$, $\omega_{pp}$ and gyro frequencies $\omega_{ce}$, $\omega_{cp}$. Plotted upward is the dimensionless quantity $|\omega_{ce}|\omega_{cp}/\omega^2$, which is proportional to $B^2$, so magnetic-field strength also increases upward. Plotted rightward is the dimensionless quantity $\omega_{pe}\omega_{pp}/\omega^2$, which is proportional to $n$, so the plasma number density also increases rightward. Since both the ordinate and the abscissa scale as $1/\omega^2$, $\omega$ increases in the left-down direction. This plane is split into sixteen regions by a set of curves on which various dielectric components have special values. In each of the sixteen regions is shown a wave-normal surface (phase-velocity surface at fixed $\omega$; Fig. 19.6(a)). It depicts the types of wave modes that can propagate for that region's values of frequency $\omega$, magnetic-field strength $B$, and electron number density $n$. In each wave-normal diagram the dashed circle indicates the speed of light; a point outside that circle has phase velocity greater than $c$; inside the circle, $V_{\rm ph} < c$. The topologies of the wave-normal surfaces and speeds relative to $c$ are constant throughout each of the sixteen regions, and change as one moves between regions.
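The two CMA coordinates are simple functions of $n$, $B$ and $\omega$. The Python sketch below computes them and confirms the scalings stated in the caption (abscissa $\propto n$, ordinate $\propto B^2$, both $\propto 1/\omega^2$). The ionospheric numbers used are illustrative assumptions, and the function name is our own.

```python
import math

E_CHARGE = 1.602176634e-19
M_E = 9.1093837015e-31
M_P = 1.67262192369e-27
EPS0 = 8.8541878128e-12


def cma_coordinates(n, b, omega):
    """(abscissa, ordinate) = (omega_pe*omega_pp/omega^2, |omega_ce|*omega_cp/omega^2)."""
    omega_pe = math.sqrt(n * E_CHARGE**2 / (EPS0 * M_E))
    omega_pp = math.sqrt(n * E_CHARGE**2 / (EPS0 * M_P))
    omega_ce = E_CHARGE * b / M_E  # magnitude of the electron gyro frequency
    omega_cp = E_CHARGE * b / M_P
    return omega_pe * omega_pp / omega**2, omega_ce * omega_cp / omega**2


# Illustrative ionospheric values: n = 1e11 m^-3, B = 5e-5 T, f = 10 MHz
w = 2 * math.pi * 10e6
x1, y1 = cma_coordinates(1e11, 5e-5, w)
x2, y2 = cma_coordinates(2e11, 5e-5, w)      # doubling n doubles the abscissa
x3, y3 = cma_coordinates(1e11, 1e-4, w)      # doubling B quadruples the ordinate
x4, y4 = cma_coordinates(1e11, 5e-5, 2 * w)  # doubling omega quarters both
print(x1, y1, x2 / x1, y3 / y1, x4 / x1, y4 / y1)
```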

19.7 Two Stream Instability

When considered on large enough scales, plasmas behave like fluids and are subject to a wide variety of fluid-dynamical instabilities. However, as we are discovering, plasmas have internal degrees of freedom associated with their velocity distribution, and this offers additional opportunities for unstable wave modes to grow and for free energy to be released. A full description of velocity-space instabilities is necessarily kinetic and must await the following chapter. However, it is instructive to consider a particularly simple example, the two-stream instability, using cold-plasma theory, as this brings out several features of the more general theory in a particularly simple manner.

We will apply this theory in a slightly unusual way to the propagation of fast electron beams through the slowly outflowing solar wind. These electron beams are created by coronal disturbances generated on the surface of the sun (specifically those associated with “Type III” radio bursts). The observation of these fast electron beams was initially a puzzle because plasma physicists knew that they should be unstable to the exponential growth of electrostatic waves. What we will do in this section is demonstrate the problem. What we will not do is explain what is thought to be its resolution, since that involves nonlinear plasma physical considerations beyond the scope of this book.¹

Consider a simple, cold (i.e. with negligible thermal motions) electron-proton plasma at rest. Ignore the protons for the moment. We can write the dispersion relation for electron plasma oscillations in the form

$$\frac{\omega_{pe}^2}{\omega^2} = 1. \qquad (19.77)$$

Now allow the ions also to oscillate about their mean positions. The dispersion relation is slightly modified to

$$\frac{\omega_p^2}{\omega^2} = \frac{\omega_{pe}^2}{\omega^2} + \frac{\omega_{pp}^2}{\omega^2} = 1 \qquad (19.78)$$

[cf. Eq. (19.21)]. If we added other components (for example helium ions), that would simply add extra terms to the sum in Eq. (19.78). Next, return to Eq. (19.77) and look at it in a reference frame through which the electrons are moving with speed $u$. The observed wave frequency must be Doppler-shifted, and so the dispersion relation becomes

$$\frac{\omega_{pe}^2}{(\omega - ku)^2} = 1, \qquad (19.79)$$

where $\omega$ is now the angular frequency measured in this new frame. It should be evident from this how to generalize Eq. (19.78) to the case of several cold streams moving with different speeds $u_i$. We simply add the terms associated with each component, using angular frequencies that have been Doppler-shifted into the rest frames of the streams:

$$\frac{\omega_{p1}^2}{(\omega - ku_1)^2} + \frac{\omega_{p2}^2}{(\omega - ku_2)^2} + \cdots = 1. \qquad (19.80)$$

(This procedure will be justified via kinetic theory in the next chapter.)

¹ e.g. Melrose, D. B. 1980.

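[Figure 19.8 (plot): left-hand side of Eq. (19.80) versus $\omega$, with poles at $\omega = kV_1, kV_2$ for small $k$ and at $k'V_1, k'V_2$ for large $k'$, compared with the level 1.]

Fig. 19.8: Left-hand side of the dispersion relation (19.80) for two cold plasma streams and two different choices of wave vector $k$. There are only two real roots for $\omega$ for small enough $k$.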

The left-hand side of the dispersion relation (19.80) is plotted in Fig. 19.8 for the case of two cold plasma streams. The dispersion relation (19.80) is a quartic in $\omega$, and so it should have four roots. However, for small enough $k$ only two of these roots will be real; cf. Fig. 19.8. The remaining two roots must be a complex-conjugate pair, and the root with the positive imaginary part corresponds to a growing mode. We have therefore shown that for small enough $k$ the two-stream plasma will be unstable, and small electrostatic disturbances will grow exponentially to large amplitude and ultimately react back upon the plasma. As we add more cold streams to the plasma, so we add more modes, some of which will be unstable. This simple example demonstrates how easy it is for a plasma to tap the free energy residing in anisotropic particle distribution functions.

Let us return to our original problem and work in the frame of the solar wind ($u_1 = 0$), where the plasma frequency is $\omega_p$. If the beam density is a fraction $\alpha$ of the solar-wind density, so $\omega_{p2}^2 = \alpha\omega_{p1}^2$, and the beam velocity is $u_2 = V$, then by differentiating Eq. (19.80) we find that the local minimum of the left-hand side, located at $\omega = kV/(1 + \alpha^{1/3})$, is $\omega_p^2(1 + \alpha^{1/3})^3/(kV)^2$. This minimum exceeds unity for

$$\frac{kV}{\omega_p} < (1 + \alpha^{1/3})^{3/2}. \qquad (19.81)$$

This is therefore the condition for there to be a growing mode. The maximum value of the growth rate can be found simply by varying $k$. It is

$$\omega_i = \frac{3^{1/2}\alpha^{1/3}}{2^{4/3}}\,\omega_p. \qquad (19.82)$$
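Eqs. (19.81) and (19.82) can be checked numerically. The Python sketch below works in units $\omega_p = V = 1$. It (i) locates the minimum of the left-hand side of Eq. (19.80) by ternary search and compares it with $(1 + \alpha^{1/3})^3/(kV)^2$, and (ii) clears the denominators of Eq. (19.80) to obtain a quartic in $\omega$, finds its roots by Durand-Kerner iteration (a generic root finder of our choosing, not something the text prescribes), and scans $k$ for the largest imaginary part. A very weak beam, $\alpha = 10^{-6}$, is used so that the leading-order result (19.82) should hold to about a percent.

```python
import math


def lhs(w, k, alpha):
    """Left-hand side of Eq. (19.80) for two streams, units omega_p = V = 1 (u1 = 0, u2 = V)."""
    return 1.0 / w**2 + alpha / (w - k) ** 2


def lhs_min_numeric(k, alpha, iters=200):
    """Minimum of the convex left-hand side between its poles at w = 0 and w = k (ternary search)."""
    lo, hi = 1e-9 * k, (1.0 - 1e-9) * k
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if lhs(m1, k, alpha) < lhs(m2, k, alpha):
            hi = m2
        else:
            lo = m1
    return lhs(0.5 * (lo + hi), k, alpha)


def quartic_roots(k, alpha, iters=400):
    """Roots of w^4 - 2k w^3 + (k^2 - 1 - alpha) w^2 + 2k w - k^2 = 0 (Eq. 19.80 cleared
    of denominators) by Durand-Kerner iteration."""
    coeffs = [1.0, -2.0 * k, k * k - 1.0 - alpha, 2.0 * k, -k * k]

    def p(z):
        acc = 0j
        for c in coeffs:  # Horner evaluation
            acc = acc * z + c
        return acc

    roots = [(0.4 + 0.9j) ** i for i in range(4)]
    for _ in range(iters):
        roots = [r - p(r) / math.prod(r - s for j, s in enumerate(roots) if j != i)
                 for i, r in enumerate(roots)]
    return roots


# (i) minimum of the left-hand side versus (1 + alpha^(1/3))^3 / (kV)^2
k, alpha = 0.8, 1e-3
print(lhs_min_numeric(k, alpha), (1 + alpha ** (1 / 3)) ** 3 / k**2)

# (ii) maximum growth rate for a very weak beam versus Eq. (19.82)
alpha_weak = 1e-6
ks = [1.0 + 0.001 * j for j in range(-30, 31)]
growth = max(max(r.imag for r in quartic_roots(kk, alpha_weak)) for kk in ks)
predicted = math.sqrt(3.0) / 2 ** (4.0 / 3.0) * alpha_weak ** (1.0 / 3.0)
print(growth / predicted)

# e-folding length V / omega_i for omega_p ~ 1e5 rad/s, alpha ~ 1e-3, V ~ 1e4 km/s
omega_p, alpha_sw, v_beam = 1e5, 1e-3, 1e7
efold_km = v_beam / (math.sqrt(3.0) / 2 ** (4.0 / 3.0) * alpha_sw ** (1.0 / 3.0) * omega_p) / 1e3
print(efold_km)
```

For the solar-wind parameters used in the text, the last line gives an e-folding length $V/\omega_i$ of roughly 1.5 km. On our reading, the 30 km growth length quoted in the text then corresponds to some twenty e-foldings.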

For the solar wind near earth, we have $\omega_p \sim 10^5$ rad s$^{-1}$, $\alpha \sim 10^{-3}$, $V \sim 10^4$ km s$^{-1}$. We therefore find that the instability should grow in a length of 30 km, much less than the distance from the sun, which is $1.5\times10^8$ km! This describes the problem that we will not resolve.

****************************

EXERCISES

Exercise 19.11 Derivation: Two stream instability

Verify Eq. (19.82).

Exercise 19.12 Example: Relativistic Two Stream Instability

In a very strong magnetic field, we can consider electrons as constrained to move in one dimension along the direction of the magnetic field. Consider a beam of relativistic protons propagating with density $n_b$ and speed $u_b \sim c$ through a cold electron-proton plasma along $\mathbf{B}$. Generalize the dispersion relation (19.80) for modes with $\mathbf{k}\parallel\mathbf{B}$.

Exercise 19.13 Problem: Drift Waves

Another type of wave mode that can be found from a fluid description of a plasma (but which requires a kinetic treatment to understand completely) is a drift wave. The limiting case that we consider here is a modification of an ion acoustic mode in a strongly magnetized plasma with a density gradient. Suppose that the magnetic field is uniform and parallel to the direction $\mathbf{e}_z$. Let there be a gradient in the equilibrium density of both the electrons and the protons, $n_0 = n_0(x)$. In the spirit of our description of ion acoustic modes in an unmagnetized, homogeneous plasma [cf. Eq. (19.36)], treat the ion fluid as cold but allow the electrons to be warm and isothermal with temperature $T_e$. We seek modes of frequency $\omega$ propagating perpendicular to the density gradient, i.e. with $\mathbf{k} = (0, k_y, k_z)$.

(i) Consider the equilibrium of the warm electron fluid and show that there must be a fluid drift velocity along the direction $\mathbf{e}_y$ of magnitude

$$V_{de} = -\frac{V_{ia}^2}{\omega_{cp}}\frac{1}{n_0}\frac{dn_0}{dx}, \qquad (19.83)$$

where $V_{ia} = (k_BT_e/m_p)^{1/2}$ is the ion acoustic speed. Explain in physical terms the origin of this drift and why we can ignore the equilibrium drift motion for the ions.

(ii) We will limit our attention to low-frequency electrostatic modes that have phase velocities below the Alfvén speed. Under these circumstances, perturbations to the magnetic field can be ignored and the electric field can be written as $\mathbf{E} = -\nabla\Phi$. Write down the three components of the linearized ion equation of motion in terms of the perturbation to the ion density $n$, the ion fluid velocity $\mathbf{u}$ and the electrostatic potential $\Phi$.

(iii) Write down the linearized equation of ion continuity, including the gradient in $n_0$, and combine it with the equation of motion to obtain an equation for the fractional ion density perturbation:

$$\frac{\delta n}{n_0} = \frac{(\omega_{cp}^2k_z^2 - \omega^2k^2)V_{ia}^2 + \omega_{cp}^2\,\omega k_yV_{de}}{\omega^2(\omega_{cp}^2 - \omega^2)}\,\frac{e\Phi}{k_BT_e}. \qquad (19.84)$$

(iv) Argue that the fractional electron density perturbation follows a linearized Boltzmann distribution, so that

$$\frac{\delta n_e}{n_0} = \frac{e\Phi}{k_BT_e}. \qquad (19.85)$$

31 (v) Use both the ion and the electron density perturbations in Poisson’s equation to obtain the electrostatic drift wave dispersion relation in the low frequency (ω ≪ ωcp ), long wavelength (kλD ≪ 1) limit: ω=

ky Vde 1 2 2 ± [ky Vde + 4kz2 Via2 ]1/2 . 2 2

(19.86)

Describe the physical character of the mode in the additional limit $k_z \to 0$. A proper justification of this procedure requires a kinetic treatment, which also shows that, under some circumstances, drift waves can be unstable and grow exponentially. Just as the two-stream instability provides a mechanism for plasmas to erase non-uniformity in velocity space, so drift waves can rapidly remove spatial irregularities.
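One way to gauge the quality of the approximation (19.86) is to equate the ion response (19.84) to the Boltzmann electron response (19.85), i.e. to impose quasi-neutrality (the $k\lambda_D \ll 1$ limit of Poisson's equation), and solve the resulting quartic in ω exactly. The sketch below does this with NumPy; the dimensionless parameter values ($\omega_{cp} = V_{ia} = 1$ and the chosen $V_{de}$, $k_y$, $k_z$) are hypothetical, and they are picked so that $kV_{ia} \ll \omega_{cp}$ as well, because dropping the $\omega^2 k^2 V_{ia}^2$ term in (19.84) also requires that.

```python
import numpy as np

def drift_wave_approx(ky, kz, Vde, Via):
    """Eq. (19.86): the two low-frequency drift-wave branches (plus, minus)."""
    disc = np.sqrt(ky**2 * Vde**2 + 4 * kz**2 * Via**2)
    return 0.5 * ky * Vde + 0.5 * disc, 0.5 * ky * Vde - 0.5 * disc

def drift_wave_quasineutral(ky, kz, Vde, Via, wcp):
    """All roots of delta n / n0 = delta n_e / n0, using Eqs. (19.84) and (19.85):
    w^4 - (wcp^2 + k^2 Via^2) w^2 + wcp^2 ky Vde w + wcp^2 kz^2 Via^2 = 0."""
    k2 = ky**2 + kz**2
    coeffs = [1.0, 0.0, -(wcp**2 + k2 * Via**2), wcp**2 * ky * Vde, wcp**2 * kz**2 * Via**2]
    return np.roots(coeffs)

# Hypothetical dimensionless parameters: wcp = Via = 1, so k is in units of wcp / Via.
wcp, Via, Vde = 1.0, 1.0, 0.1
ky, kz = 0.05, 0.01

approx = drift_wave_approx(ky, kz, Vde, Via)
roots = drift_wave_quasineutral(ky, kz, Vde, Via, wcp)
# Keep the two low-frequency roots; the other two lie near +/- wcp (ion cyclotron branch).
low = sorted(r.real for r in roots if abs(r) < 0.5 * wcp)

print("Eq. (19.86):              ", sorted(approx))
print("quasi-neutral, low roots: ", low)
print("kz -> 0 limit of (19.86): ", drift_wave_approx(ky, 0.0, Vde, Via))
```

In the limit $k_z \to 0$ one branch of (19.86) tends to $\omega = k_y V_{de}$ and the other to zero, which the last line displays.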

****************************

Bibliographic Note The definitive monograph on waves in plasmas is Stix (1992). For the most elementary textbook treatment see portions of Chen (1974). For more sophisticated and detailed textbook treatments see Boyd and Sanderson (1969), Clemmow and Dougherty (1969), Krall and Trivelpiece (1973), Landau and Lifshitz (1981), and Schmidt (1966). For treatments that focus on astrophysical plasmas, see Melrose (1980) and Parks (1991).

Bibliography

Boyd, T. J. M. & Sanderson, J. J. 1969. Plasma Dynamics, London: Nelson.
Chen, F. F. 1974. Introduction to Plasma Physics, New York: Plenum.
Clemmow, P. C. & Dougherty, J. P. 1969. Electrodynamics of Particles and Plasmas, Reading: Addison-Wesley.
Krall, N. A. & Trivelpiece, A. W. 1973. Principles of Plasma Physics, New York: McGraw-Hill.
Landau, L. D. & Lifshitz, E. M. 1981. Plasma Kinetics, Oxford: Pergamon.
Melrose, D. B. 1980. Plasma Astrophysics, New York: Gordon and Breach.
Parks, G. K. 1991. Physics of Space Plasmas, Redwood City: Addison-Wesley.
Schmidt, G. 1966. Physics of High Temperature Plasmas, New York: Academic.
Stix, T. H. 1992. Waves in Plasmas, New York: American Institute of Physics.


Box 19.2
Important Concepts in Chapter 19

• For a linear dielectric medium: polarization vector P, electrical susceptibility tensor $\chi_{ij}$, dielectric tensor $\epsilon_{ij}$, electrical conductivity tensor $\kappa_{e\,ij}$, wave operator in Fourier domain $L_{ij}$, and dispersion relation $\det\|L_{ij}\| = 0$, Sec. 19.2
• Two-fluid formalism for a plasma: for each species, fluid velocity $\mathbf u_s = \langle\mathbf v_s\rangle$, pressure tensor $\mathbf P_s$, particle conservation, and equation of motion (Euler equation plus Lorentz force), Sec. 19.3.1
• Waves in cold, unmagnetized plasma, Secs. 19.3.2, 19.3.3
  – How to deduce the electric field and particle motions in a wave mode, Secs. 19.3.2, 19.3.4
  – Electromagnetic waves and their cutoff at the plasma frequency, Secs. 19.3.3, 19.3.5
  – Nonpropagating electrostatic oscillations, Sec. 19.3.2
• Waves in warm, unmagnetized plasma:
  – Electrostatic oscillations become Langmuir waves, Sec. 19.3.4
  – Ion acoustic waves, Sec. 19.3.4
• Cutoff: form of dispersion relation near a cutoff; electromagnetic waves and Langmuir waves as examples; for inhomogeneous plasma: deflection of wave away from cutoff region in space, Secs. 19.3.5, 19.5
• Resonance: form of dispersion relation near a resonance; ion acoustic waves as example; for inhomogeneous plasma: attraction of wave into resonance region and dissipation there, Secs. 19.3.5, 19.5
• Waves in cold, magnetized plasma, Sec. 19.4
  – Waves propagating parallel to the magnetic field: Alfvén waves, whistlers, right-circularly-polarized EM waves, and left-circularly-polarized EM waves, Sec. 19.4.2
  – Waves propagating perpendicular to the magnetic field: magnetosonic waves, upper hybrid waves, lower hybrid waves, ordinary EM waves, extraordinary EM waves, Sec. 19.4.3
  – Ways to depict dependence of phase velocity on direction: phase-velocity (or wave-normal) surface and CMA diagram based on it; refractive index surface, Sec. 19.6
• Two-stream instability, Sec. 19.7

Contents

20 Kinetic Theory of Warm Plasmas
  20.1 Overview
  20.2 Basic Concepts of Kinetic Theory and its Relationship to Two-Fluid Theory
    20.2.1 Distribution Function and Vlasov Equation
    20.2.2 Relation of Kinetic Theory to Two-Fluid Theory
    20.2.3 Jeans' Theorem
  20.3 Electrostatic Waves in an Unmagnetized Plasma: Landau Damping
    20.3.1 Formal Dispersion Relation
    20.3.2 Two-Stream Instability
    20.3.3 The Landau Contour
    20.3.4 Dispersion Relation For Weakly Damped or Growing Waves
    20.3.5 Langmuir Waves and their Landau Damping
    20.3.6 Ion Acoustic Waves and Conditions for their Landau Damping to be Weak
  20.4 Stability of Electrostatic Waves in Unmagnetized Plasmas
  20.5 Particle Trapping
  20.6 N Particle Distribution Function

Chapter 20

Kinetic Theory of Warm Plasmas

Version 0820.1.K.pdf, 8 April 2009.

Box 20.1
Reader's Guide
• This chapter relies significantly on:
  – Portions of Chap. 2 on kinetic theory: Secs. 2.2.1 and 2.2.2 on the distribution function, and Sec. 2.6 on Liouville's theorem and the collisionless Boltzmann equation.
  – Section 18.3 on Debye shielding, collective behavior of plasmas, and plasma oscillations.
  – Portions of Chap. 19: wave equation and dispersion relation for dielectrics (Sec. 19.2), and two-fluid formalism and its application to Langmuir and ion acoustic waves (Secs. 19.3 and 19.7).
• Chapter 22 on nonlinear dynamics of plasmas relies heavily on this chapter.

20.1

Overview

At the end of Chap. 19, we showed how to generalize cold-plasma two-fluid theory so as to accommodate several distinct plasma beams, and thereby we discovered an instability. If the beams are not individually monoenergetic (i.e. cold), as we assumed they were in Chap. 19, but instead have broad velocity dispersions that overlap in velocity space (i.e. if the beams are warm), then the approach of Chap. 19 cannot be used, and a more powerful description of the plasma is required. Chapter 19's approach entailed specifying the positions and velocities of specific groups of particles (the "fluids"); this is an example of a Lagrangian description. It turns out that the most robust and powerful method for developing the kinetic theory of warm plasmas is an Eulerian one, in which we specify how many particles are to be found in a fixed volume of one-particle phase space.

In this chapter, using this Eulerian approach, we develop the kinetic theory of plasmas. We begin in Sec. 20.2 by introducing kinetic theory's one-particle distribution function, f(x, v, t), and recovering its evolution equation (the collisionless Boltzmann equation, also called the Vlasov equation), which we have met previously in Chap. 2. We then use this Vlasov equation to derive the two-fluid formalism used in Chap. 19 and to deduce some physical approximations that underlie the two-fluid description of plasmas.

In Sec. 20.3 we explore the application of the Vlasov equation to Langmuir waves, the one-dimensional electrostatic modes in an unmagnetized plasma that we met in Chap. 19. Using kinetic theory, we rederive the Bohm-Gross dispersion relation for Langmuir waves, and as a bonus we uncover a physical damping mechanism, called Landau damping, that did not and cannot emerge from the two-fluid analysis of Chap. 19. This subtle process leads to the transfer of energy from a wave to those particles that can "surf" or "phase-ride" the wave (i.e. those whose velocity component parallel to the wave vector is just slightly less than the wave's phase speed). We show that Landau damping works because there are usually fewer particles traveling slightly faster than the wave, which give energy to it, than particles traveling slightly slower, which extract energy from it. However, in a collisionless plasma, the particle distributions need not be Maxwellian.
In particular, it is possible for a plasma to possess an "inverted" particle distribution, with more fast particles than slow ones; then there is a net injection of particle energy into the waves, which creates an instability. In Sec. 20.4 we use kinetic theory to derive a necessary and sufficient criterion for instability due to this cause. In Sec. 20.5 we examine in greater detail the physics of Landau damping and show that it is an intrinsically nonlinear phenomenon; and we give a semi-quantitative discussion of nonlinear Landau damping, prefatory to a more detailed treatment of some other nonlinear plasma effects in the following chapter.

Although the kinetic-theory, Vlasov description of a plasma that is developed and used in this chapter is a great improvement on the two-fluid description of Chap. 19, it is still an approximation, and some situations require more accurate descriptions. We conclude this chapter by introducing greater accuracy via N-particle distribution functions, and as applications we use them (i) to explore the approximations underlying the Vlasov description, and (ii) to explore two-particle correlations that are induced in a plasma by Coulomb interactions, and the influence of those correlations on a plasma's equation of state.

20.2

Basic Concepts of Kinetic Theory and its Relationship to Two-Fluid Theory

20.2.1

Distribution Function and Vlasov Equation

In Chap. 2 we introduced the number density of particles in phase space, called the distribution function $\mathcal N(\mathcal P, \vec p)$. We showed that this quantity is Lorentz invariant and that it satisfies the collisionless Boltzmann equation (2.61) and (2.62); and we interpreted this equation as saying that $\mathcal N$ is constant along the phase-space trajectory of any freely moving particle. In order to comply with the conventions of the plasma-physics community, we shall use the name Vlasov equation in place of collisionless Boltzmann equation,¹ and we shall change notation in the manner described in Sec. 2.2.2: we use velocity v rather than momentum as an independent variable, we denote the distribution function by $f(\mathbf v, \mathbf x, t)$, and we normalize it so that

$$ \int f(\mathbf v, \mathbf x, t)\, dV_v = n(\mathbf x, t) , \qquad (20.1) $$

where $n(\mathbf x, t)$ is the particle space density at that point in spacetime and $dV_v \equiv dv_x\, dv_y\, dv_z$ is the three-dimensional volume element of velocity space. (For simplicity, we shall also restrict ourselves to nonrelativistic speeds; the generalization to relativistic plasma theory is straightforward, though seldom used.)

This one-particle distribution function $f(\mathbf v, \mathbf x, t)$ and its resulting kinetic theory give a good description of a plasma in the regime of large Debye number, $N_D \gg 1$, which includes almost all plasmas that occur in the universe; cf. Sec. 18.3.2 and Fig. 18.1. The reason is that, when $N_D \gg 1$, we can define $f(\mathbf v, \mathbf x, t)$ by averaging over a physical-space volume that is large compared to the interparticle spacing, and that thus contains many particles, but is still small compared to the Debye length. By such an average (the starting point of kinetic theory), the electric fields of individual particles are made unimportant, and so are the Coulomb-interaction-induced correlations between pairs of particles. In the last section of this chapter we shall use a two-particle distribution function to explore this issue in detail.

In Chap. 2 we showed that in the absence of collisions (a good assumption for plasmas!), the distribution function evolves in accord with the Vlasov equation (2.61), (2.62). We shall now rederive the Vlasov equation, beginning with the law of conservation of particles for each species s = e (electrons) and p (protons):

$$ \frac{\partial f_s}{\partial t} + \nabla\cdot(f_s\mathbf v) + \nabla_v\cdot(f_s\mathbf a) \equiv \frac{\partial f_s}{\partial t} + \frac{\partial (f_s v_j)}{\partial x_j} + \frac{\partial (f_s a_j)}{\partial v_j} = 0 . \qquad (20.2) $$

Here

$$ \mathbf a = \frac{d\mathbf v}{dt} = \frac{q_s}{m_s}(\mathbf E + \mathbf v\times\mathbf B) \qquad (20.3) $$

is the electromagnetic acceleration of a particle of species s, which has mass $m_s$ and charge $q_s$, and E and B are the electric and magnetic fields averaged over the same volume as is used in constructing f. Equation (20.2) has the standard form for a conservation law: the time derivative of a density (in this case the density of particles in phase space, not just physical space), plus the divergence of a flux, is equal to zero. Here the divergence of the flux has two parts: the spatial divergence of the particle flux $f\mathbf v = f\, d\mathbf x/dt$ in the physical part of phase space, plus the velocity divergence of the particle flux $f\mathbf a = f\, d\mathbf v/dt$ in velocity space.

¹ This equation was introduced and explored in 1913 by James Jeans in the context of stellar dynamics, and then rediscovered and explored by Anatoly Alexandrovich Vlasov in 1938 in the context of plasma physics. Plasma physicists have honored Vlasov by naming the equation after him. For details of this history, see Hénon (1982).

Now x and v are independent variables, so that $\nabla\cdot\mathbf v = 0$ and $\nabla_v\mathbf x = 0$. In addition, E and B are functions of x and t, not v, and the term $\mathbf v\times\mathbf B$ is perpendicular to v. Therefore,

$$ \nabla_v\cdot(\mathbf E + \mathbf v\times\mathbf B) = 0 . \qquad (20.4) $$

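These facts permit us to pull v and a out of the derivatives in Eq. (20.2), thereby obtaining

$$ \frac{\partial f_s}{\partial t} + (\mathbf v\cdot\nabla) f_s + (\mathbf a\cdot\nabla_v) f_s \equiv \frac{\partial f_s}{\partial t} + \frac{dx_j}{dt}\frac{\partial f_s}{\partial x_j} + \frac{dv_j}{dt}\frac{\partial f_s}{\partial v_j} = 0 . \qquad (20.5) $$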

We recognize this as the statement that $f_s$ is constant along the trajectory of a particle in phase space,

$$ \frac{df_s}{dt} = 0 , \qquad (20.6) $$

which is the Vlasov equation for species s. Equation (20.6) tells us that, when the space density near a given particle increases, the velocity-space density must decrease, and vice versa. Of course, if we find that other forces or collisions are important in some situation, we can represent them by extra terms added to the right-hand side of the Vlasov equation (20.6) in the manner of the Boltzmann transport equation (2.63); cf. Sec. 2.6.

So far, we have treated the electromagnetic field as being somehow externally imposed. However, it is actually produced by the net charge and current densities associated with the two particle species. These are expressed in terms of the distribution function by

$$ \rho_e = \sum_s q_s\int f_s\, dV_v , \qquad \mathbf j = \sum_s q_s\int f_s\,\mathbf v\, dV_v . \qquad (20.7) $$

These equations, together with Maxwell’s equations and the Vlasov equation (20.5) with a = dv/dt given by the Lorentz force law (20.3), form a complete set of equations for the structure and dynamics of a plasma. They constitute the kinetic theory of plasmas.
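The content of Eq. (20.6), that $f_s$ is constant along phase-space trajectories, can be checked on a case whose characteristics are known in closed form. In the sketch below (our own illustrative choice, not taken from the text) particles move in one dimension under a uniform, constant acceleration A, standing in for $q_sE/m_s$ with uniform E and no B, so the trajectories are $x = x_0 + v_0t + At^2/2$, $v = v_0 + At$. Carrying a hypothetical initial distribution $f_0$ along them gives $f(x, v, t) = f_0(x - vt + At^2/2,\ v - At)$, and the finite-difference residual of the one-dimensional form of Eq. (20.5) should then vanish to truncation error, while free streaming that ignores the force should not.

```python
import numpy as np

A = 0.7  # hypothetical uniform acceleration q_s E / m_s (arbitrary units)

def f0(x, v):
    """Hypothetical initial distribution: Maxwellian in v, spatially modulated."""
    return (1.0 + 0.3 * np.cos(x)) * np.exp(-0.5 * (v - 0.2) ** 2)

def f_exact(x, v, t):
    """Carry f0 along the characteristics x = x0 + v0 t + A t^2/2, v = v0 + A t.
    Inverting them gives x0 = x - v t + A t^2/2 and v0 = v - A t."""
    return f0(x - v * t + 0.5 * A * t**2, v - A * t)

def f_no_force(x, v, t):
    """Free streaming that ignores the acceleration: not a solution when A != 0."""
    return f0(x - v * t, v)

def vlasov_residual(g, x, v, t, h=1e-4):
    """Central-difference estimate of dg/dt + v dg/dx + A dg/dv (1D form of Eq. 20.5)."""
    dgdt = (g(x, v, t + h) - g(x, v, t - h)) / (2 * h)
    dgdx = (g(x + h, v, t) - g(x - h, v, t)) / (2 * h)
    dgdv = (g(x, v + h, t) - g(x, v - h, t)) / (2 * h)
    return dgdt + v * dgdx + A * dgdv

rng = np.random.default_rng(0)
xs = rng.uniform(-3.0, 3.0, 50)
vs = rng.uniform(-3.0, 3.0, 50)
ts = rng.uniform(0.0, 2.0, 50)

print(np.max(np.abs(vlasov_residual(f_exact, xs, vs, ts))))     # should be tiny
print(np.max(np.abs(vlasov_residual(f_no_force, xs, vs, ts))))  # should be clearly nonzero
```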

20.2.2

Relation of Kinetic Theory to Two-Fluid Theory

Before developing techniques to solve the Vlasov equation, we shall first relate it to the two-fluid approach used in the previous chapter. We begin doing so by constructing the moments of the distribution function $f_s$, defined by

$$ n_s = \int f_s\, dV_v , \qquad \mathbf u_s = \frac{1}{n_s}\int f_s\,\mathbf v\, dV_v , \qquad \mathbf P_s = m_s\int f_s\,(\mathbf v - \mathbf u_s)\otimes(\mathbf v - \mathbf u_s)\, dV_v . \qquad (20.8) $$

These are the density, the mean fluid velocity, and the pressure tensor for species s. (Of course, $\mathbf P_s$ is just the three-dimensional stress tensor $\mathbf T_s$ [Eq. (2.30c)] evaluated in the rest frame of the fluid.) By integrating the Vlasov equation (20.5) over velocity space and using

$$ \int (\mathbf v\cdot\nabla) f_s\, dV_v = \int \nabla\cdot(f_s\mathbf v)\, dV_v = \nabla\cdot\int f_s\,\mathbf v\, dV_v , \qquad \int (\mathbf a\cdot\nabla_v) f_s\, dV_v = -\int (\nabla_v\cdot\mathbf a)\, f_s\, dV_v = 0 , \qquad (20.9) $$

together with Eq. (20.8), we obtain the continuity equation

$$ \frac{\partial n_s}{\partial t} + \nabla\cdot(n_s\mathbf u_s) = 0 \qquad (20.10) $$

for each particle species s. [It should not be surprising that the Vlasov equation implies the continuity equation, since the Vlasov equation is equivalent to the conservation of particles in phase space (20.2), while the continuity equation is just the conservation of particles in physical space.]

The continuity equation is the first of the two fundamental equations of two-fluid theory. The second is the equation of motion, i.e. the evolution equation for the fluid velocity $\mathbf u_s$. To derive this, we multiply the Vlasov equation (20.5) by the particle velocity v and then integrate over velocity space, i.e. we compute the Vlasov equation's first moment. The details are a useful exercise for the reader (Ex. 20.1); the result is

$$ n_s m_s\left[\frac{\partial\mathbf u_s}{\partial t} + (\mathbf u_s\cdot\nabla)\mathbf u_s\right] = -\nabla\cdot\mathbf P_s + n_s q_s(\mathbf E + \mathbf u_s\times\mathbf B) , \qquad (20.11) $$

which is identical with Eq. (19.14).

A difficulty now presents itself. We can compute the charge and current densities from $n_s$ and $\mathbf u_s$ using Eqs. (20.8). However, we do not yet know how to compute the pressure tensor $\mathbf P_s$. We could derive an equation for the evolution of $\mathbf P_s$ by taking the second moment of the Vlasov equation (i.e. multiplying it by $\mathbf v\otimes\mathbf v$ and integrating over velocity space), but that evolution equation would involve an unknown third moment of $f_s$ on the right-hand side, $\mathbf M_3 = \int f_s\,\mathbf v\otimes\mathbf v\otimes\mathbf v\, dV_v$, which is related to the heat-flux tensor.
In order to determine the evolution of this $\mathbf M_3$, we would have to construct the third moment of the Vlasov equation, which would involve the fourth moment of $f_s$ as a driving term, and so on. Clearly, this procedure will never terminate unless we introduce some additional relationship between the moments. Such a relationship, called a closure relation, permits us to build a self-contained theory involving only a finite number of moments. For the two-fluid theory of Chap. 19, the closure relation that we implicitly used was the same idealization that one makes when regarding a fluid as perfect, namely that the heat-flux tensor vanishes. This idealization is less well justified in a collisionless plasma, with its long mean free paths, than in a normal gas or liquid with its short mean free paths. An example of an alternative closure relation is one that is appropriate if radiative processes thermostat the plasma to a particular temperature, so $T_s$ = constant; then we can set $\mathbf P_s = n_s k_B T_s\,\mathbf g \propto n_s$, where $\mathbf g$ is the metric tensor. Clearly, a fluid theory of plasmas can be no more accurate than its closure relation.
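The moments (20.8) are easy to evaluate numerically. The sketch below assumes a hypothetical drifting, anisotropic Maxwellian (temperature $T_\parallel$ along z and $T_\perp$ across it, in units with $k_B = 1$; all values are our own choices), samples it on a uniform three-dimensional velocity grid, and recovers $n_s$, $\mathbf u_s$ and $\mathbf P_s$ by simple Riemann sums. For this $f_s$ the pressure tensor should come out diagonal, with entries $n_s k_B T_\perp$, $n_s k_B T_\perp$, $n_s k_B T_\parallel$; setting $T_\parallel = T_\perp$ would give the isotropic form $n_s k_B T_s\,\mathbf g$ used in the thermostatted closure above.

```python
import numpy as np

def velocity_moments(f, vx, vy, vz, m):
    """Density, mean velocity and pressure tensor of Eq. (20.8), evaluated by a
    Riemann sum over a uniform velocity grid (accurate for smooth, rapidly
    decaying f that is negligible at the grid edges)."""
    dV = (vx[1] - vx[0]) * (vy[1] - vy[0]) * (vz[1] - vz[0])
    VX, VY, VZ = np.meshgrid(vx, vy, vz, indexing="ij")
    V = np.stack([VX, VY, VZ])                     # shape (3, Nx, Ny, Nz)
    n = f.sum() * dV
    u = (V * f).sum(axis=(1, 2, 3)) * dV / n
    W = V - u[:, None, None, None]                 # peculiar velocity v - u
    P = m * np.einsum("ixyz,jxyz,xyz->ij", W, W, f) * dV
    return n, u, P

# Hypothetical parameters (units with k_B = 1).
m, n0, T_perp, T_par, drift = 1.0, 2.0, 0.5, 2.0, 1.5
N = 61
s_perp, s_par = np.sqrt(T_perp / m), np.sqrt(T_par / m)
vx = np.linspace(-6 * s_perp, 6 * s_perp, N)
vy = vx.copy()
vz = np.linspace(drift - 6 * s_par, drift + 6 * s_par, N)
VX, VY, VZ = np.meshgrid(vx, vy, vz, indexing="ij")

# Drifting bi-Maxwellian, normalized so that its velocity integral is n0 (Eq. 20.1).
f = (n0 * (m / (2 * np.pi * T_perp)) * np.sqrt(m / (2 * np.pi * T_par))
     * np.exp(-m * (VX**2 + VY**2) / (2 * T_perp) - m * (VZ - drift) ** 2 / (2 * T_par)))

n, u, P = velocity_moments(f, vx, vy, vz, m)
print(n, u)
print(P)  # approximately diag(n0*T_perp, n0*T_perp, n0*T_par)
```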


20.2.3

Jeans' Theorem

Let us now turn to the difficult task of finding solutions to the Vlasov equation. There is an elementary (and, after the fact, obvious) method to write down a class of solutions that are often useful. This is based on Jeans' theorem (named after the astronomer who first drew attention to it in the context of stellar dynamics²). Suppose that we know the particle acceleration a as a function of v, x, and t. (We assume this for pedagogical purposes; it is not necessary for our final conclusion.) Then, for any particle with phase-space coordinates $(\mathbf x_0, \mathbf v_0)$ specified at time $t_0$, we can (at least in principle) compute the particle's future motion, $\mathbf x = \mathbf x(\mathbf x_0, \mathbf v_0, t)$, $\mathbf v = \mathbf v(\mathbf x_0, \mathbf v_0, t)$. These particle trajectories are the characteristics of the Vlasov equation, analogous to the characteristics of the equations for one-dimensional supersonic fluid flow which we studied in Sec. 16.4 (see Fig. 16.7). Now, for many choices of the acceleration $\mathbf a(\mathbf v, \mathbf x, t)$, there are constants of the motion, also known as integrals of the motion, that are preserved along the particle trajectories. Simple examples, familiar from elementary mechanics, include the energy (for a time-independent plasma) and the angular momentum (for a spherically symmetric plasma). These integrals can be expressed in terms of the initial coordinates $(\mathbf x_0, \mathbf v_0)$. If we know n constants of the motion, then only 6 − n additional variables need be chosen from $(\mathbf x_0, \mathbf v_0)$ to specify the motion completely. Now, the Vlasov equation tells us that $f_s$ is constant along a trajectory in x–v space. Therefore, $f_s$ must, in general, be expressible as a function of $(\mathbf x_0, \mathbf v_0)$. Equivalently, it can be rewritten as a function of the n constants of the motion and the remaining 6 − n initial phase-space coordinates. However, there is no requirement that it actually depend on all of these variables.
In particular, any function of the integrals of motion alone that is independent of the remaining initial coordinates will satisfy the Vlasov equation (20.5). This is Jeans' theorem. In words, functions of constants of the motion take constant values along actual dynamical trajectories in phase space and therefore satisfy the Vlasov equation. Of course, a situation may be so complex that no integrals of the particles' equation of motion can be found, in which case Jeans' theorem is useless. Alternatively, there may be integrals, but the initial conditions may be sufficiently complex that extra variables are required to determine $f_s$. However, it turns out that in a wide variety of applications, particularly those with symmetries such as time independence, $\partial f_s/\partial t = 0$, simple functions of simple integrals suffice to describe a plasma's distribution functions.

² Jeans (1926).

We have already met and used a variant of Jeans' theorem in our analysis of statistical equilibrium in Sec. 3.4. There the statistical mechanics distribution function ρ was found to depend only on the integrals of the motion. We have also, unknowingly, used Jeans' theorem in our discussion of Debye screening in a plasma (Sec. 18.3.1). To understand this, let us suppose that we have a single isolated positive charge at rest in a stationary plasma ($\partial f_s/\partial t = 0$), and we want to know the electron distribution function in its vicinity. Let us further suppose that the electron distribution at large distances from the charge is known to be Maxwellian with temperature T, i.e. $f_e(\mathbf v, \mathbf x, t) \propto \exp(-\tfrac12 m_e v^2/k_BT)$. Now, the electrons have an energy integral, $\mathcal E = \tfrac12 m_e v^2 - e\Phi$, where Φ is the electrostatic potential. As Φ becomes constant at large distance from the charge, we can therefore write $f_e \propto \exp(-\mathcal E/k_BT)$ at large distance. However, the particles near the charge must have traveled there from large distance and so must have this same distribution function. Therefore, close to the charge,

$$ f_e \propto e^{-\mathcal E/k_BT} = e^{-(m_e v^2/2 - e\Phi)/k_BT} , \qquad (20.12) $$

and the electron density is obtained by integration over velocity:

$$ n_e = \int f_e\, dV_v \propto e^{e\Phi/k_BT} . \qquad (20.13) $$
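Both steps of this Jeans-theorem argument can be checked numerically. The sketch below assumes a hypothetical one-dimensional, Debye-screened potential $\Phi(x) = \Phi_0 e^{-|x|}$ (our choice; any smooth Φ would do away from the charge), works in units $m_e = e = k_BT = 1$, verifies that $f_e = \exp[-(m_e v^2/2 - e\Phi)/k_BT]$ satisfies the stationary Vlasov equation $v\,\partial f/\partial x + (e/m_e)(d\Phi/dx)\,\partial f/\partial v = 0$ (the electron acceleration is $-e\mathbf E/m_e = (e/m_e)\nabla\Phi$), and then integrates over velocity.

```python
import numpy as np

# Units: m_e = e = k_B T = 1, Debye length = 1.
PHI0 = 0.8  # hypothetical potential amplitude near the positive charge

def phi(x):
    """Hypothetical screened potential in 1D, Phi0 * exp(-|x|)."""
    return PHI0 * np.exp(-np.abs(x))

def dphi_dx(x):
    return -np.sign(x) * PHI0 * np.exp(-np.abs(x))

def f_e(x, v):
    """Jeans-theorem solution: a function of the energy integral
    E = v^2/2 - Phi(x) alone (Eq. 20.12, unnormalized)."""
    return np.exp(-(0.5 * v**2 - phi(x)))

def stationary_vlasov_residual(x, v, h=1e-5):
    """v df/dx + a df/dv, with electron acceleration a = +dPhi/dx."""
    dfdx = (f_e(x + h, v) - f_e(x - h, v)) / (2 * h)
    dfdv = (f_e(x, v + h) - f_e(x, v - h)) / (2 * h)
    return v * dfdx + dphi_dx(x) * dfdv

xs = np.array([-2.0, -0.7, 0.4, 1.5, 3.0])   # stay away from the kink at x = 0
v_grid = np.linspace(-10.0, 10.0, 2001)
dv = v_grid[1] - v_grid[0]

residual = max(np.max(np.abs(stationary_vlasov_residual(x, v_grid))) for x in xs)

# Density n_e(x) = integral of f_e over v, divided by exp(Phi(x)) (cf. Eq. 20.13).
n_e = np.array([f_e(x, v_grid).sum() * dv for x in xs])
ratio = n_e / np.exp(phi(xs))
print(residual, ratio)  # ratio should equal sqrt(2*pi) at every x
```

In the sketch the ratio $n_e(x)/e^{e\Phi(x)/k_BT}$ comes out the same at every x, which is the proportionality stated in Eq. (20.13).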

This is just the Boltzmann distribution that we asserted to be appropriate in Sec. 18.3.1.

****************************

EXERCISES

Exercise 20.1 Derivation: Two-fluid Equation of Motion
Derive the two-fluid equation of motion (20.11) by multiplying the Vlasov equation (20.5) by v and integrating over velocity space.

Exercise 20.2 Example: Positivity of Distribution Function
The one-particle distribution function f(x, v, t) ought not to become negative if it is to remain physical. Show that this is guaranteed if it initially is everywhere nonnegative and it evolves by the collisionless Vlasov equation.

****************************

20.3

Electrostatic Waves in an Unmagnetized Plasma: Landau Damping

20.3.1

Formal Dispersion Relation

As our principal application of the kinetic theory of plasmas, we shall explore its predictions for the dispersion relations, stability, and damping of longitudinal, electrostatic waves in an unmagnetized plasma: Langmuir waves and ion acoustic waves. When studying these waves in Sec. 19.3 using two-fluid theory, we alluded time and again to properties of the waves that could not be derived by fluid techniques. Our goal, now, is to elucidate those properties using kinetic theory. As we shall see, their origin lies in the plasma's velocity-space dynamics.

Consider an electrostatic wave propagating in the z direction. Such a wave is one-dimensional in that the electric field points in the z direction, $\mathbf E = E\,\mathbf e_z$, and varies as $e^{i(kz - \omega t)}$, so it depends only on z and not on x or y; the distribution function similarly varies as $e^{i(kz - \omega t)}$ and is independent of x, y; and the Vlasov, Maxwell, and Lorentz force equations produce no coupling of particle velocities $v_x$, $v_y$ into the z direction. This suggests the introduction of one-dimensional distribution functions, obtained by integration over $v_x$ and $v_y$:

$$ F_s(v, z, t) \equiv \int f_s(v_x, v_y, v = v_z, z, t)\, dv_x\, dv_y . \qquad (20.14) $$

Here and throughout we suppress the subscript z on $v_z$. Restricting ourselves to weak waves, so nonlinearities can be neglected, we linearize the one-dimensional distribution functions:

$$ F_s(v, z, t) \simeq F_{s0}(v) + F_{s1}(v, z, t) . \qquad (20.15) $$

Here $F_{s0}(v)$ is the distribution function of the unperturbed particles (s = e for electrons and s = p for protons) in the absence of the wave, and $F_{s1}$ is the perturbation induced by, and linearly proportional to, the electric field E. The evolution of $F_{s1}$ is governed by the linear approximation to the Vlasov equation (20.5):

$$ \frac{\partial F_{s1}}{\partial t} + v\,\frac{\partial F_{s1}}{\partial z} + \frac{q_s E}{m_s}\,\frac{dF_{s0}}{dv} = 0 . \qquad (20.16) $$

We seek a monochromatic, plane-wave solution to this Vlasov equation, so $\partial/\partial t \to -i\omega$ and $\partial/\partial z \to ik$ in Eq. (20.16). Solving the resulting equation for $F_{s1}$, we obtain an equation for $F_{s1}$ in terms of E:

$$ F_{s1} = \frac{-i q_s}{m_s(\omega - kv)}\,\frac{dF_{s0}}{dv}\,E . \qquad (20.17) $$

This equation implies that the charge density associated with the wave is related to the electric field by

$$ \rho_e = \sum_s q_s\int_{-\infty}^{+\infty} F_{s1}\, dv = \left(\sum_s \frac{-i q_s^2}{m_s}\int_{-\infty}^{+\infty}\frac{F'_{s0}\, dv}{\omega - kv}\right) E , \qquad (20.18) $$

where the prime denotes a derivative with respect to v: $F'_{s0} = dF_{s0}/dv$. A quick route from here to the waves' dispersion relation is to insert this charge density into Poisson's equation $\nabla\cdot\mathbf E = ikE = \rho_e/\epsilon_0$ and note that both sides are proportional to E, so a solution is possible only if

$$ 1 + \sum_s \frac{q_s^2}{m_s\epsilon_0 k}\int_{-\infty}^{+\infty}\frac{F'_{s0}\, dv}{\omega - kv} = 0 . \qquad (20.19) $$

An alternative route, which makes contact with the general analysis of waves in a dielectric medium (Sec. 19.2), is developed in Ex. 20.3 below. This route reveals that the dispersion relation is given by the vanishing of the zz component of the dielectric tensor, which we denoted $\epsilon_3$ in Chap. 19 [cf. Eq. (19.50)], and it shows that $\epsilon_3$ is given by the left-hand side of expression (20.19):

$$ \epsilon_3(\omega, k) = 1 + \sum_s \frac{q_s^2}{m_s\epsilon_0 k}\int_{-\infty}^{+\infty}\frac{F'_{s0}\, dv}{\omega - kv} = 0 . \qquad (20.20) $$

Since $\epsilon_3 = \epsilon_{zz}$ is the only component of the dielectric tensor that we shall meet in this chapter, we shall simplify notation henceforth by omitting the subscript 3. The form of the dispersion relation (20.20) suggests that we combine the unperturbed electron and proton distribution functions $F_{e0}(v)$ and $F_{p0}(v)$ to produce a single, unified distribution function

$$ F(v) = F_{e0}(v) + \frac{m_e}{m_p}\,F_{p0}(v) , \qquad (20.21) $$

in terms of which Eq. (20.20) takes the form

$$ \epsilon(\omega, k) = 1 + \frac{e^2}{m_e\epsilon_0 k}\int_{-\infty}^{+\infty}\frac{F'\, dv}{\omega - kv} = 0 . \qquad (20.22) $$

Note that each proton is weighted less heavily than each electron by a factor $m_e/m_p \simeq 1/1836$ in the unified distribution function (20.21) and the dispersion relation (20.22). This is due to the protons' greater inertia and correspondingly weaker response to an applied electric field, and it causes the protons to be of no importance in Langmuir waves (Sec. 20.3.5 below). However, in ion acoustic waves (Sec. 20.3.6), the protons can play an important role, because large numbers of them may move with thermal speeds that are close to the waves' phase velocity and thereby can interact resonantly with the waves.
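When $\Im(\omega) > 0$ the integrand of Eq. (20.22) has no singularity on the real v axis, so ε(ω, k) can be evaluated by straightforward quadrature. The sketch below does this for a hypothetical electron Maxwellian (protons neglected, as is appropriate for Langmuir waves) in units $\omega_p = 1$ and $v_{th} = (k_BT/m_e)^{1/2} = 1$, and compares the result with the large-phase-velocity expansion $\epsilon \simeq 1 - (\omega_p^2/\omega^2)(1 + 3k^2v_{th}^2/\omega^2 + 15k^4v_{th}^4/\omega^4)$. That expansion follows from integrating (20.22) by parts, $\epsilon = 1 - (e^2/m_e\epsilon_0)\int F\,dv/(\omega - kv)^2$, and expanding $1/(\omega - kv)^2$ in powers of $kv/\omega$; its leading correction is the one that produces the Bohm-Gross relation. The grid and the sample (ω, k) are assumptions.

```python
import numpy as np

# Units: omega_p = 1 and v_th = sqrt(k_B T / m_e) = 1, so lambda_D = 1.
v = np.linspace(-15.0, 15.0, 6001)
dv = v[1] - v[0]
F = np.exp(-0.5 * v**2) / np.sqrt(2 * np.pi)  # hypothetical Maxwellian, integral = 1
dF = -v * F                                   # F'(v)

def epsilon(omega, k):
    """Eq. (20.22) by quadrature along real v; valid only for Im(omega) > 0."""
    if omega.imag <= 0:
        raise ValueError("real-axis quadrature requires Im(omega) > 0")
    return 1.0 + np.sum(dF / (omega - k * v)) * dv / k

def epsilon_asymptotic(omega, k):
    """Expansion for |omega| >> k v_th (upper half omega plane)."""
    x = (k / omega) ** 2
    return 1.0 - (1.0 + 3 * x + 15 * x**2) / omega**2

omega, k = 1.2 + 0.05j, 0.1
print(epsilon(omega, k), epsilon_asymptotic(omega, k))
```

The guard on $\Im(\omega)$ reflects the point developed in Sec. 20.3.3: for $\Im(\omega) \le 0$ the real-axis integral is no longer the right prescription.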

20.3.2

Two-Stream Instability

As a first application of the general dispersion relation (20.22), we use it to rederive the dispersion relation (19.77) associated with the cold-plasma two-stream instability of Sec. 19.7. We begin by performing an integration by parts on the general dispersion relation (20.22) (the boundary terms vanish because F → 0 as |v| → ∞), obtaining

$$ \frac{e^2}{m_e\epsilon_0}\int_{-\infty}^{+\infty}\frac{F\, dv}{(\omega - kv)^2} = 1 . \qquad (20.23) $$

We then presume, as in Sec. 19.7, that the plasma consists of two or more streams of cold particles (protons or electrons) moving in the z direction with different fluid speeds $u_1, u_2, \dots$, so $F(v) = n_1\delta(v - u_1) + n_2\delta(v - u_2) + \dots$. Here $n_j$ is the number density of particles in stream j if the particles are electrons, and $m_e/m_p$ times the number density if they are protons. Inserting this F(v) into Eq. (20.23) and noting that $n_j e^2/m_e\epsilon_0$ is the squared plasma frequency $\omega_{pj}^2$ of stream j, we obtain the dispersion relation

$$ \frac{\omega_{p1}^2}{(\omega - ku_1)^2} + \frac{\omega_{p2}^2}{(\omega - ku_2)^2} + \dots = 1 , \qquad (20.24) $$

which is identical to the dispersion relation Eq. (19.80) used in our analysis of the two-stream instability. Clearly the general dispersion relation (20.23) [or equally well (20.22)] provides us with a tool for exploring how the two-stream instability is influenced by a warming of the plasma, i.e. by a spread of particle velocities around the mean, fluid velocity of each stream. We shall explore this in Sec. 20.4 below.
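The integration by parts that takes (20.22) to (20.23), and the cold-stream limit (20.24), can both be checked numerically for $\Im(\omega) > 0$, where the velocity integrals are nonsingular. The sketch below uses a hypothetical F(v) made of two Gaussian streams with a common small thermal spread Δ, in units where $e^2/m_e\epsilon_0 = 1$ so that each stream's weight is its $\omega_{pj}^2$; all parameter values are our own assumptions. It compares $-(1/k)\int F'\,dv/(\omega - kv)$ with $\int F\,dv/(\omega - kv)^2$, and the latter with the cold sum on the left of (20.24); the difference from the cold sum should shrink as Δ → 0.

```python
import numpy as np

def gaussian(v, u, width):
    return np.exp(-0.5 * ((v - u) / width) ** 2) / (np.sqrt(2 * np.pi) * width)

def stream_integrals(omega, k, streams, width, v):
    """Return (S22, S23): S22 = -(1/k) int F'/(omega - k v) dv and
    S23 = int F/(omega - k v)^2 dv for F = sum of warm Gaussian streams.
    Units: e^2/(m_e eps0) = 1, so each stream's weight is omega_pj^2."""
    dv = v[1] - v[0]
    F = sum(w * gaussian(v, u, width) for w, u in streams)
    dF = sum(-w * (v - u) / width**2 * gaussian(v, u, width) for w, u in streams)
    S22 = -np.sum(dF / (omega - k * v)) * dv / k
    S23 = np.sum(F / (omega - k * v) ** 2) * dv
    return S22, S23

def cold_sum(omega, k, streams):
    """Left-hand side of Eq. (20.24)."""
    return sum(w / (omega - k * u) ** 2 for w, u in streams)

streams = [(1.0, 0.0), (0.5, 1.2)]  # hypothetical (omega_pj^2, u_j) pairs
omega, k = 0.6 + 0.3j, 1.0          # Im(omega) > 0 keeps the integrands finite
v = np.linspace(-1.0, 2.2, 160001)

for width in (0.05, 1e-3):
    S22, S23 = stream_integrals(omega, k, streams, width, v)
    print(width, abs(S22 - S23), abs(S23 - cold_sum(omega, k, streams)))
```

With nonzero Δ the same code provides a starting point for the warm two-stream question raised above, although locating the modes themselves requires the contour prescription of Sec. 20.3.3.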


20.3.3

The Landau Contour

The general dispersion relation (20.22) has a troubling feature: for real ω and k its integrand becomes singular at v = ω/k (the waves' phase velocity) unless dF/dv vanishes there, which is generically unlikely. This tells us that if, as we shall assume, k is real, then ω cannot be real, except perhaps for a non-generic mode whose phase velocity happens to coincide with a velocity for which dF/dv = 0. With ω/k complex, we must face the possibility of some subtlety in how the integral over v in the dispersion relation (20.22) is performed: the possibility that we may have to make v complex in the integral and follow some special route in the complex velocity plane from v = −∞ to v = +∞. Indeed, there is such a subtlety, as Lev Landau has shown.³ Our simple derivation of the dispersion relation, above, cannot reveal this subtlety and, indeed, is suspicious, since in going from Eq. (20.16) to (20.17) our derivation entailed dividing by ω − kv, which vanishes when v = ω/k, and dividing by zero is always a suspicious practice.

Faced by this conundrum, Landau developed a more sophisticated derivation of the dispersion relation, one based on posing generic initial data for electrostatic waves, then evolving those data forward in time and identifying the plasma's electrostatic modes by their late-time sinusoidal behaviors, and finally reading off the dispersion relation for the modes from the equations for the late-time evolution. We shall sketch a variant of this analysis. For simplicity, from the outset we restrict ourselves to plane waves propagating in the z direction with some fixed, real wave number k, so the linearized one-dimensional distribution function and the electric field have the forms

$$ F_s(v, z, t) = F_{s0}(v) + F_{s1}(v, t)\,e^{ikz} , \qquad E(z, t) = E(t)\,e^{ikz} . \qquad (20.25) $$

At t = 0 we pose initial data $F_{s1}(v, 0)$ for the electron and proton velocity distributions; these data determine the initial electric field E(0) via Poisson's equation. We presume that these initial distributions [and also the unperturbed plasma's velocity distribution $F_{s0}(v)$] are analytic functions of velocity v, but aside from this constraint the $F_{s1}(v, 0)$ are generic. (A Maxwellian distribution is analytic, and almost any physically reasonable initial distribution can be well approximated by an analytic function.) We then evolve these initial data forward in time. The ideal tool for such evolution is the Laplace transform, not the Fourier transform. The power of the Laplace transform is much appreciated by engineers, and under-appreciated by many physicists. Those readers who are not intimately familiar with evolution via Laplace transforms should work carefully through Ex. 20.4. That exercise uses Laplace transforms, followed by conversion of the final answer into Fourier language, to derive the following formula for the time-evolving electric field in terms of the initial velocity distributions $F_{s1}(v, 0)$:

$$ E(t) = \int_{i\sigma-\infty}^{i\sigma+\infty}\frac{e^{-i\omega t}}{2\pi\epsilon_0 k\,\epsilon(\omega, k)}\left[\sum_s q_s\int_{-\infty}^{+\infty}\frac{F_{s1}(v, 0)}{\omega - kv}\, dv\right] d\omega . \qquad (20.26) $$

Here the integral in frequency space is along the solid horizontal line at the top of Fig. 20.1, with the imaginary part of ω held fixed at $\omega_i = \sigma$ and the real part $\omega_r$ varying from −∞ to +∞.

³ Landau, L. D. 1946. J. Phys. USSR, 10, 25.

[Figure 20.1: complex-frequency plane with axes ω_r (horizontal) and ω_i (vertical); the integration line at height σ and the mode frequencies ω_1, ω_2, ω_3, ω_4 are marked.]
Fig. 20.1: Contour of integration for evaluating E(t) [Eq. (20.26)] as a sum over residues of the integrand's poles, i.e. the modes of the plasma.

The Laplace techniques used to derive this formula are carefully designed to avoid any divergences and any division by zero. This careful design leads to the requirement that the height σ of the integration line above the real frequency axis be larger than the e-folding rate ℑ(ω) of the plasma's most rapidly growing mode (or, if none grow, still larger than zero and thus larger than the e-folding rate of the most slowly decaying mode):

$$ \sigma > \max_n \Im(\omega_n) \quad\text{and}\quad \sigma > 0 , \qquad (20.27) $$

where n = 1, 2, ... labels the modes and ω_n is the complex frequency of mode n.

Equation (20.26) also entails a velocity integral. In the Laplace-based analysis that leads to this formula, there is never any question about the nature of the velocity v: it is always real, so the integral is over real v running from −∞ to +∞. Moreover, because all the frequencies ω appearing in Eq. (20.26) have ω_i = σ > 0, the integrand of the velocity integral cannot diverge.

In Eq. (20.26) for the evolving field, ε(ω, k) is the same dielectric function as we deduced in our previous analysis (Sec. 20.3.1):

$$\epsilon(\omega,k) = 1 + \frac{e^2}{m_e\epsilon_0 k}\int_{-\infty}^{+\infty}\frac{F'\,dv}{\omega-kv}. \qquad (20.28)$$

Here, however, in contrast with that previous analysis, our derivation has dictated unequivocally how to handle the v integration: in the same way as in Eq. (20.26). The velocity v is strictly real, and the only frequencies appearing in the evolution equations have ω_i = σ > 0, so the v integral, running along the real velocity axis, passes under the integrand's pole at v = ω/k, as shown in Fig. 20.2(a).

To read off the modal frequencies from the evolving field E(t) at times t > 0, we use techniques from complex-variable theory. It can be shown that, because (by hypothesis) F_s1(v, 0) and F_s0(v) are analytic functions of v, the integrand of the ω integral in Eq. (20.26) is meromorphic: when the integrand is analytically continued throughout the complex frequency plane, its only singularities are poles. This permits us to evaluate the frequency integral, at times t > 0, by closing the integration contour in the lower half of the frequency plane, as shown by the dashed curve in Fig. 20.1. Because the exponential factor e^{−iωt} decays as ω_i → −∞ when t > 0, the contribution from the dashed part of the contour vanishes, so the integral around the closed contour equals E(t) (the contribution from the solid horizontal part). Since this closed contour is traversed clockwise, complex-variable theory tells us that the integral is −2πi times the sum of the residues R_n of the integrand at the poles (labeled n = 1, 2, ...):

$$E(t) = -2\pi i\sum_n R_n = \sum_n A_n\,e^{-i\omega_n t}. \qquad (20.29)$$

Here ω_n is the frequency at pole n, and A_n is −2πiR_n with its time dependence e^{−iω_n t} factored out. Evidently, then, each pole of the analytically continued integrand of Eq. (20.26) corresponds to a mode of the plasma, and the pole's complex frequency is the mode's frequency. For very special choices of the initial data F_s1(v, 0), there may be poles in the square-bracket term in Eq. (20.26), but for generic initial data there will be none, and the only poles will be the zeroes of ε(ω, k). Therefore, generically, the modes' frequencies are the zeroes of ε(ω, k), once that function (which was originally defined only along the line ω_i = σ) has been analytically extended throughout the complex frequency plane.

So how do we compute the analytically extended dielectric function ε(ω, k)? Imagine holding k fixed and real, and exploring the (complex) value of ε, thought of as a function of ω/k, by moving around the complex ω/k plane (the same as the complex velocity plane). In particular, imagine computing ε from Eq. (20.28) at one point after another along the arrowed path shown in Fig. 20.2. This path begins at an initial location ω/k where ω_i/k = σ/k > 0 and travels downward to some other location below the real axis. At the starting point, the discussion following Eq. (20.28) tells us how to handle the velocity integral: just integrate v along the real axis. As ω/k is moved continuously (with k held fixed), ε(ω, k), being analytic, must vary continuously. If, when ω/k crosses the real velocity axis, the integration contour in Eq. (20.28) were to remain on the velocity axis, then the contour would jump over the integral's moving pole v = ω/k, and ε(ω, k) would jump discontinuously at the moment of crossing, which is not possible. To avoid such a discontinuous jump, the contour of integration must be kept below the pole v = ω/k as that pole moves into the lower half of the velocity plane; cf. Fig. 20.2(b,c).
The rule that the integration contour must always pass beneath the pole v = ω/k, as shown in Fig. 20.2, is called the Landau prescription; the contour is called the Landau contour and is denoted L; and our final formula for the dielectric function (and for its vanishing at the modal frequencies, i.e., the dispersion relation) is

$$\epsilon(\omega,k) = 1 + \frac{e^2}{m_e\epsilon_0 k}\int_L\frac{F'\,dv}{\omega-kv} = 0. \qquad (20.30)$$
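The half-residue rule behind the Landau prescription can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original analysis: it takes a unit-normalized Maxwellian f(v) = e^{−v²/2}/√(2π) (velocities in units of the thermal speed), evaluates ∫ f'(v)/(v − ζ) dv along the real v axis for ζ = x + iη slightly above that axis (where the real axis does pass beneath the pole), and compares the result with the Landau-contour value "principal value + iπ f'(x)" that appears in Eq. (20.32). The helper names (`dfdv`, `real_axis_integral`, `landau_value`) and the grid parameters are hypothetical choices made for this sketch.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def dfdv(v):
    """f'(v) for the unit-normalized 1D Maxwellian f(v) = exp(-v**2/2)/sqrt(2*pi),
    with v in units of the thermal speed sqrt(kB*T/me) (an illustrative choice)."""
    return -v * math.exp(-0.5 * v * v) / SQRT_2PI


def real_axis_integral(zeta, vmax=12.0, h=1e-3):
    """Midpoint-rule value of the integral of f'(v)/(v - zeta) over real v.
    Meaningful for Im(zeta) > 0, where the real axis passes below the pole;
    h should be much smaller than Im(zeta) to resolve the near-pole peak."""
    n = int(round(2.0 * vmax / h))
    total = 0j
    for j in range(n):
        v = -vmax + (j + 0.5) * h
        total += dfdv(v) / (v - zeta)
    return total * h


def landau_value(x, smax=12.0, h=1e-3):
    """Landau-contour value for real zeta = x: principal value plus i*pi*f'(x).
    The principal value is folded about the pole,
    P-integral = integral_0^inf [f'(x+s) - f'(x-s)] / s ds,
    whose integrand is finite at s = 0."""
    n = int(round(smax / h))
    pv = 0.0
    for j in range(n):
        s = (j + 0.5) * h
        pv += (dfdv(x + s) - dfdv(x - s)) / s
    return pv * h + 1j * math.pi * dfdv(x)


if __name__ == "__main__":
    x = 1.5
    target = landau_value(x)
    for eta, h in ((0.1, 1e-3), (0.01, 1e-3), (0.001, 1e-4)):
        approx = real_axis_integral(x + 1j * eta, h=h)
        print(f"eta={eta:<6} real-axis={approx:.5f}  |difference|={abs(approx - target):.2e}")
    print(f"Landau-contour value: {target:.5f}")
```

As η shrinks, the real-axis integral approaches the Landau-contour value roughly in proportion to η, and its imaginary part tends to +π f'(x) rather than −π f'(x): that sign is fixed by letting the contour pass beneath the pole.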

20.3.4

Dispersion Relation For Weakly Damped or Growing Waves

In most practical situations, electrostatic waves are weakly damped or weakly unstable, i.e., |ω_i| ≪ ω_r, so the amplitude changes little in one wave period.

Fig. 20.2: Derivation of the Landau contour L in the complex velocity plane (axes v_r and v_i). The dielectric function ε(ω, k) is originally defined, in Eqs. (20.26) and (20.28), solely for ω_i/k = σ/k > 0, the point ω/k in diagram (a). Since ε(ω, k) must be an analytic function of ω at fixed k, and thus must vary continuously as ω is continuously changed, the dashed contour of integration in Eq. (20.28) must always be kept below the pole v = ω/k, as shown in (b) and (c).

In this case, the dielectric function (20.30) can be evaluated at ω = ω_r + iω_i using the first term in a Taylor series expansion away from the real axis:

$$\begin{aligned}
\epsilon(k,\omega_r+i\omega_i) &\simeq \epsilon(k,\omega_r) + \omega_i\left(\frac{\partial\epsilon_r}{\partial\omega_i} + i\frac{\partial\epsilon_i}{\partial\omega_i}\right)_{\omega_i=0}\\
&= \epsilon(k,\omega_r) + \omega_i\left(-\frac{\partial\epsilon_i}{\partial\omega_r} + i\frac{\partial\epsilon_r}{\partial\omega_r}\right)_{\omega_i=0}\\
&\simeq \epsilon(k,\omega_r) + i\omega_i\left(\frac{\partial\epsilon_r}{\partial\omega_r}\right)_{\omega_i=0}.
\end{aligned}\qquad (20.31)$$

Here ε_r and ε_i are the real and imaginary parts of ε. In going from the first line to the second we have assumed that ε(k, ω) is an analytic function of ω near the real axis and have then used the Cauchy-Riemann equations for the derivatives; in going from the second line to the third we have used the fact that ε_i → 0 when the velocity distribution is one that produces ω_i → 0 [cf. Eq. (20.33) below], so the middle term on the second line is second order in ω_i and can be neglected.

Equation (20.31) expresses the dielectric function slightly away from the real axis in terms of its value and derivative on and along the real axis. The on-axis value can be computed by breaking the Landau contour depicted in Fig. 20.2(b) into three pieces (two lines from ±∞ to a small distance δ from the pole, plus a semicircle of radius δ around the pole) and then taking the limit δ → 0. The two straight lines together produce the Cauchy principal value of the integral (denoted P∫ below), and the semicircle produces 2πi times half the residue of the pole at v = ω_r/k:

$$\epsilon(k,\omega_r) = 1 - \frac{e^2}{m_e\epsilon_0 k^2}\left[\mathrm{P}\!\!\int\frac{F'\,dv}{v-\omega_r/k} + i\pi F'(v=\omega_r/k)\right]. \qquad (20.32)$$

Inserting this equation and its derivative with respect to ω_r into Eq. (20.31), and setting the result to zero, we obtain

$$\epsilon(k,\omega_r+i\omega_i) \simeq 1 - \frac{e^2}{m_e\epsilon_0 k^2}\left[\mathrm{P}\!\!\int\frac{F'\,dv}{v-\omega_r/k} + i\pi F'(\omega_r/k) + i\omega_i\frac{\partial}{\partial\omega_r}\mathrm{P}\!\!\int\frac{F'\,dv}{v-\omega_r/k}\right] = 0. \qquad (20.33)$$

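This is the dispersion relation in the limit |ω_i| ≪ ω_r. Notice that the vanishing of ε_r determines the real part of the frequency,

$$1 - \frac{e^2}{m_e\epsilon_0 k^2}\,\mathrm{P}\!\!\int\frac{F'\,dv}{v-\omega_r/k} = 0, \qquad (20.34)$$

and the vanishing of ε_i determines the imaginary part,

$$\omega_i = \frac{\pi F'(\omega_r/k)}{-\dfrac{\partial}{\partial\omega_r}\,\mathrm{P}\!\!\displaystyle\int\frac{F'\,dv}{v-\omega_r/k}}. \qquad (20.35)$$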

Notice, further, that the sign of ωi is influenced by the sign of F ′ = dF/dv at v = ωr /k = Vφ = (the waves’ phase velocity). As we shall see, this has a simple physical origin and important physical consequences. Usually, but not always, the denominator of Eq. (20.35) is positive, so the sign of ωi is the same as the sign of F ′ (ωr /k).

20.3.5

Langmuir Waves and their Landau Damping

We shall now apply the dispersion relation (20.33) to Langmuir waves in a thermalized plasma. Langmuir waves typically move so fast that the slow ions cannot interact with them, so their dispersion relation is influenced significantly only by the electrons. We therefore ignore the ions and include only the electrons in F(v). We obtain F(v) by integrating out v_y and v_z in the three-dimensional Boltzmann distribution [Eq. (2.22d) with E = ½m_e(v_x² + v_y² + v_z²)]; the result, normalized so that ∫F(v) dv = n, is

$$F = n\left(\frac{m_e}{2\pi k_B T}\right)^{1/2} e^{-m_e v^2/2k_B T}, \qquad (20.36)$$

where T is the electron temperature.

Now, as we saw in Eq. (20.35), ω_i ∝ F'(v = ω_r/k), with a proportionality constant that is usually positive. Physically, this proportionality arises from the manner in which electrons surf on the waves: those electrons moving slightly faster than the waves' phase velocity V_φ = ω_r/k (usually) lose energy to the waves on average, while those moving slightly slower (usually) extract energy from the waves on average. Therefore, (i) if there are more slightly slower particles than slightly faster ones [F'(v = ω_r/k) < 0], then the particles on average gain energy from the waves and the waves are damped [ω_i < 0]; (ii) if there are more slightly faster particles than slightly slower ones [F'(v = ω_r/k) > 0], then the particles on average lose energy to the waves and the waves are amplified [ω_i > 0]; and (iii) the bigger the disparity between the number of slightly faster electrons and the number of slightly slower electrons, i.e., the bigger |F'(ω_r/k)|, the larger will be the damping or growth of wave energy, i.e., the larger will be |ω_i|.

It will turn out, quantitatively, that if the waves' phase velocity ω_r/k is anywhere near the steepest point on the side of the electron velocity distribution, i.e., if ω_r/k is of order the electron thermal velocity √(k_B T/m_e), then the waves will be strongly damped, ω_i ∼ −ω_r. Since our dispersion relation (20.33) is valid only when the waves are weakly damped, we must restrict ourselves to waves with ω_r/k ≫ √(k_B T/m_e) (a physically allowed regime) or to ω_r/k ≪ √(k_B T/m_e) (a regime that does not occur for Langmuir waves; cf. Fig. 19.1). Requiring, then, that ω_r/k ≫ √(k_B T/m_e), and noting that the integral in Eq. (20.33) gets its dominant contribution from velocities v ≲ √(k_B T/m_e), we can expand 1/(v − ω_r/k) in the integrand as a power series in vk/ω_r and integrate term by term by parts, obtaining

$$\begin{aligned}
\mathrm{P}\!\!\int\frac{F'\,dv}{v-\omega_r/k} &= -\int_{-\infty}^{\infty}dv\,F'\left(\frac{k}{\omega_r} + \frac{k^2 v}{\omega_r^2} + \frac{k^3 v^2}{\omega_r^3} + \frac{k^4 v^3}{\omega_r^4} + \cdots\right)\\
&= \frac{nk^2}{\omega_r^2} + \frac{3n\langle v^2\rangle k^4}{\omega_r^4} + \cdots\\
&= \frac{nk^2}{\omega_r^2}\left(1 + \frac{3k_B T k^2}{m_e\omega_r^2} + \cdots\right)\\
&\simeq \frac{nk^2}{\omega_r^2}\left(1 + 3k^2\lambda_D^2\frac{\omega_p^2}{\omega_r^2}\right).
\end{aligned}\qquad (20.37)$$

Substituting Eqs. (20.36) and (20.37) into Eqs. (20.34) and (20.35), and noting that ω_r/k ≫ √(k_B T/m_e) ≡ ω_p λ_D implies kλ_D ≪ 1 and ω_r ≃ ω_p, we obtain

$$\omega_r = \omega_p\left(1 + 3k^2\lambda_D^2\right)^{1/2}, \qquad (20.38a)$$

$$\omega_i = -\left(\frac{\pi}{8}\right)^{1/2}\frac{\omega_p}{k^3\lambda_D^3}\exp\left(-\frac{1}{2k^2\lambda_D^2} - \frac{3}{2}\right). \qquad (20.38b)$$

The real part of this dispersion relation, ω_r = ω_p(1 + 3k²λ_D²)^{1/2}, reproduces the Bohm-Gross result that we derived from fluid theory in Sec. 19.3.4 and plotted in Fig. 19.1. The imaginary part reveals the damping of these Langmuir waves by surfing electrons: so-called Landau damping. The fluid theory could not predict this Landau damping, because it is a result of the internal dynamics in velocity space, of which the fluid theory is oblivious. Notice that, as the waves' wavelength is decreased, i.e., as k increases, the waves' phase velocity decreases toward the electron thermal velocity and the damping becomes stronger, as is expected from our discussion of the number of electrons that can surf on the waves. In the limit k → 1/λ_D (where our dispersion relation has broken down and so is only an order-of-magnitude guide), the dispersion relation predicts that ω_r/k ∼ √(k_B T/m_e) and ω_i/ω_r ∼ 1/10. In the opposite, long-wavelength regime kλ_D ≪ 1 (where our dispersion relation should be quite accurate), the Landau damping is very weak: so weak that ω_i goes to zero as k → 0 faster than any power of k.
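The weak-damping relations (20.34) and (20.35) can also be solved numerically for the Maxwellian (20.36) and compared with the asymptotic results (20.38). The sketch below is an illustration, not the authors' method: it works in units with ω_p = 1 and λ_D = 1 (velocities in units of √(k_B T/m_e), F = n f with f a unit Gaussian, so n cancels from both equations), finds a = ω_r/k by bisection on Eq. (20.34), which in these units reads P(a) = k², and then evaluates Eq. (20.35) with a finite-difference derivative. The function names, the bracket [1/k, 2/k], and the step sizes are assumptions chosen for kλ_D ≲ 0.3.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def dfdv(v):
    # f'(v) for the unit Gaussian f(v) = exp(-v**2/2)/sqrt(2*pi);
    # v is in units of sqrt(kB*T/me), and F = n*f as in Eq. (20.36)
    return -v * math.exp(-0.5 * v * v) / SQRT_2PI


def pv_integral(a, h=5e-3, tail=10.0):
    # Principal value of the integral of f'(v)/(v - a) dv, folded about the pole
    # as integral_0^inf [f'(a+s) - f'(a-s)]/s ds (an even, finite integrand),
    # evaluated with the midpoint rule out to s = |a| + tail
    n = int((abs(a) + tail) / h)
    total = 0.0
    for j in range(n):
        s = (j + 0.5) * h
        total += (dfdv(a + s) - dfdv(a - s)) / s
    return total * h


def langmuir_kinetic(k, iters=40):
    """Solve Eqs. (20.34)-(20.35) for a Maxwellian, units omega_p = lambda_D = 1.
    Eq. (20.34) becomes P(a) = k**2 with a = omega_r/k; P decreases on the
    assumed bracket [1/k, 2/k], so bisection applies."""
    lo, hi = 1.0 / k, 2.0 / k
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pv_integral(mid) > k * k:
            lo = mid  # P still too large: the root lies at larger a
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    d = 1e-3
    dP_da = (pv_integral(a + d) - pv_integral(a - d)) / (2.0 * d)
    # Eq. (20.35): omega_i = pi f'(a) / (-dP/d omega_r), with dP/d omega_r = (dP/da)/k
    omega_i = -math.pi * k * dfdv(a) / dP_da
    return a * k, omega_i


def langmuir_asymptotic(k):
    # Eqs. (20.38a,b) in the same units
    omega_r = math.sqrt(1.0 + 3.0 * k * k)
    omega_i = -math.sqrt(math.pi / 8.0) / k**3 * math.exp(-1.0 / (2.0 * k * k) - 1.5)
    return omega_r, omega_i


if __name__ == "__main__":
    for k in (0.1, 0.2, 0.3):
        wr, wi = langmuir_kinetic(k)
        wr0, wi0 = langmuir_asymptotic(k)
        print(f"k*lambda_D={k}: omega_r={wr:.5f} (20.38a: {wr0:.5f}), "
              f"omega_i={wi:.3e} (20.38b: {wi0:.3e})")
```

At kλ_D = 0.1 the two approaches agree closely; as kλ_D grows toward 0.3 the asymptotic formula (20.38b) drifts away from the numerical solution, consistent with its derivation for kλ_D ≪ 1.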

Fig. 20.3: Electron and ion contributions to the net distribution function F(v) in a thermalized plasma (curves F_e, F_p, and their sum F; tick marks on the velocity axis at √(k_B T_p/m_p) and √(k_B T_e/m_p)). When T_e ∼ T_p, the phase speed of ion acoustic waves is near the left tick mark on the horizontal axis, a speed at which surfing protons have a maximal ability to Landau-damp the waves, and the waves are strongly damped. When T_e ≫ T_p, the phase speed is near the right tick mark: far out on the tail of the proton velocity distribution, so few protons can surf and damp the waves, and near the peak of the electron distribution, so the number of electrons moving slightly slower than the waves is nearly the same as the number moving slightly faster and there is little net damping by the electrons. In this case the waves can propagate.

20.3.6

Ion Acoustic Waves and Conditions for their Landau Damping to be Weak

As we saw in Sec. 19.3.4, ion acoustic waves are the analog of ordinary sound waves in a fluid: they occur at low frequencies where the mean (fluid) electron velocity is very nearly locked to the mean (fluid) proton velocity, so the electric polarization is small; the restoring force is due to thermal pressure and not to the electrostatic field; and the inertia is provided by the heavy protons. It was asserted in Sec. 19.3.4 that, to keep these waves from being strongly damped, the electron temperature must be much higher than the proton temperature, T_e ≫ T_p. We can now understand this in terms of particle surfing and Landau damping.

Suppose that the electrons and protons have Maxwellian velocity distributions, but possibly with different temperatures. Because of their far greater inertia, the protons will have a far smaller mean thermal speed than the electrons, so the net one-dimensional distribution function F(v) = F_e(v) + (m_e/m_p)F_p(v) [Eq. (20.21)] that appears in the kinetic-theory dispersion relation has the form shown in Fig. 20.3. Now, if T_e ∼ T_p, then the contributions of the electron pressure and proton pressure to the waves' restoring force will be comparable, and the waves' phase velocity will therefore be ω_r/k ∼ √(k_B(T_e + T_p)/m_p) ∼ √(k_B T_p/m_p) = v_th,p, which is the thermal proton velocity and also the speed at which the proton contribution to F(v) has its steepest slope (see the left tick mark on the horizontal axis in Fig. 20.3); so |F'(v = ω_r/k)| is large. This means there will be large numbers of protons that can surf on the waves, and a large disparity between the number moving slightly slower than the waves (which extract energy from the waves) and the number moving slightly faster (which give energy to the waves). The result will be strong Landau damping by the protons.

This strong Landau damping is avoided if T_e ≫ T_p. Then the waves' phase velocity will be ω_r/k ∼ √(k_B T_e/m_p), which is large compared to the proton thermal velocity v_th,p = √(k_B T_p/m_p) and so is far out on the tail of the proton velocity distribution, where there are very few protons that can surf and damp the waves; see the right tick mark on the horizontal axis in Fig. 20.3. Thus, Landau damping by protons has been shut down by raising the electron temperature.

What about Landau damping by electrons? The phase velocity ω_r/k ∼ √(k_B T_e/m_p) is small compared to the electron thermal velocity v_th,e = √(k_B T_e/m_e), so the waves reside near the peak of the electron velocity distribution, where F_e(v) is large, so many electrons can surf with the waves, but F_e'(v) is small, so there are nearly equal numbers of faster and slower electrons and the surfing produces little net Landau damping. Thus, T_e/T_p ≫ 1 leads to successful propagation of ion acoustic waves.

A detailed computation based on the |ω_i| ≪ ω_r version of our kinetic-theory dispersion relation, Eq. (20.33), makes this physical argument quantitative. The details are carried out in Ex. 20.5 under the assumptions that T_e ≫ T_p and √(k_B T_p/m_p) ≪ ω_r/k ≪ √(k_B T_e/m_e) (corresponding to the above discussion); the result is

$$\omega_r = k\left(\frac{k_B T_e/m_p}{1 + k^2\lambda_D^2}\right)^{1/2}, \qquad (20.39a)$$

$$\frac{\omega_i}{\omega_r} = -\frac{(\pi/8)^{1/2}}{(1 + k^2\lambda_D^2)^{3/2}}\left[\left(\frac{m_e}{m_p}\right)^{1/2} + \left(\frac{T_e}{T_p}\right)^{3/2}\exp\left(-\frac{T_e/T_p}{2(1 + k^2\lambda_D^2)}\right)\right]. \qquad (20.39b)$$
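To see how Eq. (20.39b) encodes the dependence on T_e/T_p, the short sketch below simply evaluates it, separating the electron term from the proton (ion) term. The mass ratio m_p/m_e = 1836.15 is the standard value; the function name `ion_acoustic` and the sample temperature ratios are illustrative choices. Keep in mind that Eq. (20.39) was derived for T_e ≫ T_p, so values at T_e/T_p of order unity are at best rough indicators.

```python
import math

MP_OVER_ME = 1836.15  # proton-to-electron mass ratio


def ion_acoustic(k_lambda_D, Te_over_Tp):
    """Evaluate Eq. (20.39). Returns (omega_r/k in units of sqrt(kB*Te/mp),
    omega_i/omega_r, electron part of omega_i/omega_r, proton part)."""
    x = 1.0 + k_lambda_D ** 2
    phase_speed = 1.0 / math.sqrt(x)  # Eq. (20.39a) divided by k
    prefactor = -math.sqrt(math.pi / 8.0) / x ** 1.5
    electron = prefactor * math.sqrt(1.0 / MP_OVER_ME)
    proton = prefactor * Te_over_Tp ** 1.5 * math.exp(-Te_over_Tp / (2.0 * x))
    return phase_speed, electron + proton, electron, proton


if __name__ == "__main__":
    for ratio in (3.0, 10.0, 20.0, 30.0):
        _, total, e_part, p_part = ion_acoustic(0.1, ratio)
        print(f"Te/Tp={ratio:5.1f}: omega_i/omega_r={total:.4f} "
              f"(electrons {e_part:.4f}, protons {p_part:.2e})")
```

For kλ_D = 0.1 the proton term dominates the damping when T_e/T_p is of order a few to ten, and by T_e/T_p ≈ 30 it has fallen below a percent of the electron term, leaving ω_i/ω_r ≈ −(π/8)^{1/2}(m_e/m_p)^{1/2} ≈ −0.015 for kλ_D ≪ 1.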

The real part of this dispersion relation was plotted in Fig. 19.1. As is shown there and in Eq. (20.39a), for kλ_D ≪ 1 the waves' phase speed is √(k_B T_e/m_p), and the waves are only weakly damped: they can propagate for roughly √(m_p/m_e) ∼ 43 periods before damping has a strong effect. This damping remains present when T_e/T_p → ∞, so it must be due to surfing electrons. When the wavelength is decreased (k is increased) into the regime kλ_D ≳ 1, the waves' frequency asymptotes toward ω_r = ω_pp, the proton plasma frequency, and the phase velocity decreases, so more protons can surf the waves and the Landau damping increases. Equation (20.39) shows us that the damping becomes very strong when kλ_D ∼ √(T_e/T_p), and that this is also the point at which ω_r/k has decreased to the proton thermal velocity √(k_B T_p/m_p), in accord with our physical arguments about proton surfing. When T_e/T_p is decreased from ≫ 1 toward unity, the ion damping becomes strong regardless of how small k may be [cf. the second term of ω_i/ω_r in Eq. (20.39b)]. This too is in accord with our physical reasoning.

Ion acoustic waves are readily excited at the bow shock where Earth's magnetosphere impacts the solar wind. It is observed that they are not able to propagate very far away from the shock, by contrast with Alfvén waves, which are much less rapidly damped.

****************************
EXERCISES

Exercise 20.3 Example: Derivation of Dielectric Tensor for Longitudinal, Electrostatic Waves
Derive expression (20.20) for the zz component of the dielectric tensor in a plasma excited by a weak electrostatic wave, and show that the wave's dispersion relation is ε_3 = 0. Hints: Notice that the z component of the plasma's electric polarization P_z is related to the charge density by ∇·P = ikP_z = −ρ_e [Eq. (19.1)]; combine this with Eq. (20.18) to get a linear relationship between P_z and E_z = E; argue that the only nonzero component of the plasma's electric susceptibility is χ_zz, and deduce its value by comparing the above result with Eq. (19.3); then construct the dielectric tensor ε_ij from Eq. (19.5) and the algebratized wave operator L_ij from Eq. (19.9), and deduce that the dispersion relation det||L_ij|| = 0 takes the form ε_zz ≡ ε_3 = 0, where ε_3 is given by Eq. (20.20).

Exercise 20.4 Example: Landau Contour Deduced Using Laplace Transforms
Use Laplace-transform techniques to derive Eqs. (20.26)–(20.28) for the time-evolving electric field of electrostatic waves with fixed wave number k and initial velocity perturbations F_s1(v, 0). A sketch of the solution follows.

(a) When the initial data are evolved forward in time, they produce F_1(v, t) and E(t). Construct the Laplace transforms of these evolving quantities [4]:

$$\tilde F_1(v,p) = \int_0^\infty dt\,e^{-pt}F_1(v,t), \qquad \tilde E(p) = \int_0^\infty dt\,e^{-pt}E(t). \qquad (20.40)$$

To ensure that the time integral is convergent, insist that ℜ(p) be greater than p_0 ≡ max_n ℑ(ω_n) ≡ (the e-folding rate of the most strongly growing mode or, if none grow, of the most weakly damped mode). This is an essential step in the argument leading to the Landau contour. Also, for simplicity of the subsequent analysis, insist that ℜ(p) > 0.

(b) By constructing the Laplace transform of the one-dimensional Vlasov equation (20.16) and integrating by parts the term involving ∂F_s1/∂t, obtain an equation for a linear combination of F̃_s1(v, p) and Ẽ(p) in terms of the initial data F_s1(v, t = 0). By then combining with the Laplace transform of Poisson's equation, show that

$$\tilde E(p) = \frac{1}{\epsilon(ip,k)}\sum_s\frac{q_s}{k\epsilon_0}\int_{-\infty}^{\infty}\frac{F_{s1}(v,0)}{ip-kv}\,dv. \qquad (20.41)$$

Here ε(ip, k) is the dielectric function (20.22) evaluated for frequency ω = ip, with the integral running along the real v axis, and [as we noted in part (a)] ℜ(p) must be greater than p_0, the largest ω_i of any mode, and greater than 0. This situation for the dielectric function is the one depicted in Fig. 20.2(a).

(c) Laplace-transform theory tells us that the time-evolving electric field (with wave number k) can be expressed in terms of its Laplace transform (20.41) by

$$E(t) = \int_{\sigma-i\infty}^{\sigma+i\infty}\tilde E(p)\,e^{pt}\,\frac{dp}{2\pi i}, \qquad (20.42)$$

where σ is any real number larger than p_0 and larger than 0. Combine this equation with expression (20.41) for Ẽ(p), and set p = −iω. Thereby arrive at the desired result, Eq. (20.26).

[4] For useful insights into the Laplace transform, see, e.g., Sec. 4-2 of Mathews, J. and Walker, R. L., 1964, Mathematical Methods of Physics, New York: Benjamin.

Exercise 20.5 Derivation: Ion Acoustic Dispersion Relation
Consider a plasma in which the electrons have a Maxwellian velocity distribution with temperature T_e, the protons are Maxwellian with temperature T_p, and T_p ≪ T_e; and consider a mode in this plasma for which √(k_B T_p/m_p) ≪ ω_r/k ≪ √(k_B T_e/m_e) (right tick mark in Fig. 20.3). As was argued in the text, for such a mode it is reasonable to expect weak damping, |ω_i| ≪ ω_r. Making approximations based on these "≪" inequalities, show that the dispersion relation (20.33) reduces to Eq. (20.39).

Exercise 20.6 Problem: Dispersion relations for a non-Maxwellian distribution function
Consider a plasma with cold protons [whose velocity distribution can be ignored in F(v)] and hot electrons with one-dimensional distribution function of the form

$$F(v) = \frac{nv_0}{\pi(v_0^2 + v^2)}. \qquad (20.43)$$

(a) Derive the dielectric function ε(ω, k) for this plasma and use it to show that the dispersion relation for Langmuir waves is

$$\omega = \omega_{pe} - ikv_0. \qquad (20.44)$$

(b) Compute the dispersion relation for ion acoustic waves, assuming that their phase speeds are much less than v_0 but large compared to the cold protons' thermal velocities (so the contribution from proton surfing can be ignored). Your result should be

$$\omega = \frac{kv_0(m_e/m_p)^{1/2}}{[1 + (kv_0/\omega_{pe})^2]^{1/2}} - \frac{ikv_0(m_e/m_p)}{[1 + (kv_0/\omega_{pe})^2]^2}. \qquad (20.45)$$

****************************

20.4

Stability of Electrostatic Waves in Unmagnetized Plasmas

Equation (20.35) implies that the sign of F ′ at resonance dictates the sign of the imaginary part of ω. This raises the interesting possibility that distribution functions that increase with velocity over some range of positive v might be unstable to the exponential growth of electrostatic waves. In fact, the criterion for instability turns out to be a little more complex than this [as one might suspect from the fact that the sign of the denominator of Eq. (20.35) is non-obvious], and deriving it is an elegant exercise in complex variable theory.

Fig. 20.4: Mapping of the real axis in the complex ζ plane (axes ζ_r and ζ_i, with the point ζ_P = v_min marked on the real axis) onto the Z plane (axes Z_r and Z_i, where the image curve C crosses the positive real axis at P); cf. Eq. (20.47). This is an example of a Nyquist diagram.

Let us rewrite our dispersion relation (20.30) (again restricting our attention to real values of k) in the form

$$Z(\omega/k) = k^2 > 0, \qquad (20.46)$$

where the complex function Z(ζ) is given by

$$Z(\zeta) = \frac{e^2}{m_e\epsilon_0}\int_L\frac{F'\,dv}{v-\zeta}. \qquad (20.47)$$

Now let us map the real axis in the complex ζ plane (wave phase-velocity plane) onto the Z plane (k² plane; Fig. 20.4). The points at (ζ_r = ±∞, ζ_i = 0) clearly map onto the origin, so the image of the real ζ axis must be a closed curve. Furthermore, by inspection, the region enclosed by the curve is the image of the upper half ζ plane. (The curve may cross itself, if Z(ζ) is multi-valued, but this will not affect our argument.) Now, there will be a growing mode (ζ_i > 0) if and only if the curve crosses the positive real Z axis. Let us consider an upward crossing at some point P, as shown in Fig. 20.4, and denote by ζ_P the corresponding point on the real ζ axis. At this point the imaginary part of Z vanishes, which implies immediately that F'(v = ζ_P) = 0; cf. Eq. (20.35). Furthermore, as Z_i is increasing at P (cf. Fig. 20.4), F''(v = ζ_P) > 0. Thus, for instability to the growth of electrostatic waves, it is necessary that the one-dimensional distribution function have a minimum at some v = v_min ≡ ζ_P.

This is not sufficient, as the real part of Z(ζ_P) = Z(v_min) must also be positive. Since v_min ≡ ζ_P is real, Z_r(v_min) is e²/(m_e ε_0) times the Cauchy principal value of the integral in (20.47), which we can rewrite as follows using an integration by parts:

$$\begin{aligned}
\frac{m_e\epsilon_0}{e^2}Z_r(v_{\min}) &= \mathrm{P}\!\!\int\frac{F'\,dv}{v-v_{\min}} = \mathrm{P}\!\!\int\frac{d[F(v)-F(v_{\min})]/dv}{v-v_{\min}}\,dv\\
&= \mathrm{P}\!\!\int\frac{F(v)-F(v_{\min})}{(v-v_{\min})^2}\,dv + \lim_{\delta\to0}\left[\frac{F(v_{\min}-\delta)-F(v_{\min})}{-\delta} - \frac{F(v_{\min}+\delta)-F(v_{\min})}{\delta}\right].
\end{aligned}\qquad (20.48)$$

The second, limit term vanishes since F'(v_min) = 0, and in the first term we do not need the Cauchy principal value, because F is a minimum at v_min (so the integrand stays finite there). Our requirement is therefore that

$$Z_r(v_{\min}) = \frac{e^2}{m_e\epsilon_0}\int_{-\infty}^{+\infty}\frac{F(v)-F(v_{\min})}{(v-v_{\min})^2}\,dv > 0. \qquad (20.49)$$

Thus, a sufficient condition for instability is that there exist some velocity v_min at which the distribution function F(v) has a minimum, and that in addition the minimum be deep enough that the integral (20.49) is positive. This is called the Penrose criterion for instability [5].

****************************
EXERCISES

Exercise 20.7 Example: Penrose Criterion
Consider an unmagnetized electron plasma with a one-dimensional distribution function

$$F(v) \propto \left[(v-v_0)^2 + u^2\right]^{-1} + \left[(v+v_0)^2 + u^2\right]^{-1}, \qquad (20.50)$$

where v_0 and u are constants. Show that this distribution function possesses a minimum if v_0 > 3^{−1/2}u, but that the minimum is not deep enough to cause instability unless v_0 > u.

Exercise 20.8 Problem: Range of Unstable Wave Numbers
Consider a plasma with distribution function F(v) that has precisely two peaks, at v = v_1 and v = v_2, and a minimum between them at v = v_min. For what range of wave numbers k will there be at least one unstable mode? Express your answer in terms of integrals over the distribution function analogous to the integral (20.49) that appears in the Penrose criterion for instability.

****************************
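The Penrose integral (20.49) is straightforward to evaluate numerically for a given F(v), which gives a quick check on Exercise 20.7. The sketch below is an illustration, not part of the original text: it maps the real line onto a finite interval with v − v_min = tan θ, applies the midpoint rule, and then applies it to the double-humped distribution (20.50), for which symmetry puts the minimum (when one exists) at v_min = 0. The constant prefactor e²/(m_e ε_0) and the normalization of F are dropped, since only the sign of the integral matters; the function names and grid size are hypothetical choices.

```python
import math


def two_lorentzians(v, v0, u):
    """Distribution (20.50) of Exercise 20.7, up to a constant positive factor."""
    return 1.0 / ((v - v0) ** 2 + u ** 2) + 1.0 / ((v + v0) ** 2 + u ** 2)


def penrose_integral(F, v_min, n=20000):
    """Midpoint-rule estimate of the integral in Eq. (20.49),
    integral over all v of [F(v) - F(v_min)] / (v - v_min)**2.
    With v - v_min = tan(theta) the integrand becomes
    [F(v) - F(v_min)] / sin(theta)**2 on (-pi/2, pi/2), which is bounded.
    n is kept even so that no midpoint lands exactly on theta = 0."""
    if n % 2:
        n += 1
    h = math.pi / n
    F_min = F(v_min)
    total = 0.0
    for j in range(n):
        theta = -0.5 * math.pi + (j + 0.5) * h
        s = math.sin(theta)
        total += (F(v_min + math.tan(theta)) - F_min) / (s * s)
    return total * h


if __name__ == "__main__":
    u = 1.0
    for v0 in (0.7, 0.9, 1.0, 1.1, 1.5):
        value = penrose_integral(lambda v: two_lorentzians(v, v0, u), 0.0)
        print(f"v0/u = {v0}: Penrose integral = {value:+.5f}")
```

All the sampled values of v_0 exceed 3^{−1/2}u, so each distribution has a minimum at v = 0; the printed integral changes sign at v_0 = u, consistent with the statement of Exercise 20.7 that the minimum is deep enough for instability only when v_0 > u.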

20.5

Particle Trapping

We now return to the description of Landau damping. Our treatment so far has been essentially linear in the wave amplitude, or equivalently in the perturbation to the distribution function. What happens when the wave amplitude is not infinitesimally small?

Consider a single Langmuir wave mode as observed in a frame moving with the mode's phase velocity. In this frame the electrostatic field oscillates spatially, E = E_0 sin kz, but has no time dependence. Figure 20.5 shows some phase-space orbits of electrons in this oscillatory potential. The solid curves are orbits of particles that move fast enough to avoid being trapped in the potential wells at kz = 0, 2π, 4π, .... The dashed curves are orbits of trapped particles. As seen in another frame, these trapped particles are surfing on the wave, with their velocities performing low-amplitude oscillations around the wave's phase velocity ω/k.

Fig. 20.5: Phase-space orbits (axes z and v) for trapped (dashed lines) and untrapped (solid lines) electrons.

The equation of motion for an electron trapped in the minimum at z = 0 has the form

$$\ddot z = \frac{-eE_0\sin kz}{m_e} \simeq -\omega_b^2 z, \qquad (20.51)$$

where we have assumed small-amplitude oscillations and approximated sin kz ≃ kz, and where

$$\omega_b = \left(\frac{eE_0 k}{m_e}\right)^{1/2} \qquad (20.52)$$

is known as the bounce frequency. Since the potential well is actually anharmonic, the trapped particles will mix in phase quite rapidly. The growth or damping of the wave is characterized by a growth or damping of E_0, and correspondingly by a net acceleration or deceleration of untrapped particles when averaged over a wave cycle. It is this net feeding of energy into and out of the untrapped particles that causes the wave's Landau damping or growth.

Now suppose that the amplitude E_0 of this particular wave mode is changing on a timescale τ due to interactions with the electrons or, possibly (as we shall see in Chap. 22), due to interactions with other waves propagating through the plasma. The potential well will then change on this same timescale, and we expect that τ will also be a measure of the maximum length of time a particle can be trapped in the potential well. These nonlinear, wave-trapping effects should only be important when the bounce period ∼ ω_b^{−1} is short compared with τ, i.e., when E_0 ≫ m_e/(ekτ²).

Electron trapping can cause particles to be bunched together at certain preferred phases of a growing wave. This can have important consequences for the radiative properties of the plasma. Suppose, for example, that the plasma is magnetized. Then the electrons gyrate around the magnetic field and emit cyclotron radiation. If their gyrational phases are random, then the total power that they radiate will be the sum of their individual particle powers. However, if N electrons are localized at the same gyrational phase due to being trapped in a potential well of a wave, then they will radiate like one giant electron with a charge Ne. As the radiated power is proportional to the square of the charge carried by the radiating particle, the total power radiated by the bunched electrons will be N times the power radiated by the same number of unbunched electrons. Very large amplification factors are thereby possible, both in the laboratory and in Nature, for example in the Jovian magnetosphere.

[5] Penrose, O. 1960, Phys. Fluids 3, 258.

Fig. 20.6: BGK waves (figure labels: eΦ, z, W_p, W_+, W_e, W_−). The ordinate is eΦ, where Φ(z) is the one-dimensional electrostatic potential. The proton total energies, W_p, are displayed increasing upward; the electron energies, W_e, increase downward. In this example, corresponding to Challenge 20.9, there are monoenergetic proton (solid line) and electron (dashed line) beams plus a bound distribution of electrons (shaded region) trapped in the potential wells formed by the electrostatic potential.

This brief discussion suggests that there may be much more richness in plasma waves than is embodied in our dispersion relations with their simple linear growth and decay, even when the wave amplitude is small enough that the particle motion is only slightly perturbed by its interaction with the wave. This motivates a more systematic discussion of nonlinear plasma physics, which is the topic of our next chapter.

****************************
EXERCISES

Exercise 20.9 Challenge: BGK Waves
Consider a steady, one-dimensional, large-amplitude electrostatic wave in an unmagnetized, proton-electron plasma. Write down the Vlasov equation for each component in a frame moving with the wave, i.e., in which the electrostatic potential is a time-independent function of z, Φ = Φ(z), not necessarily precisely sinusoidal.

(a) Use Jeans' theorem to argue that proton and electron distribution functions that are just functions of the energy W_{p,e} = m_{p,e}v²/2 ± eΦ satisfy the Vlasov equation. Then show that Poisson's equation for the potential Φ can be rewritten in the form

$$\frac{1}{2}\left(\frac{d\Phi}{dz}\right)^2 + V(\Phi) = \text{const}, \qquad (20.53)$$

where the potential V is −2/ε_0 times the kinetic energy density of the particles (which depends on Φ).

(b) It is possible to find many self-consistent potential profiles and distribution functions in this manner. These are called BGK waves [6]. Explain how, in principle, one can solve for the self-consistent distribution functions in a large-amplitude wave of given potential profile.

(c) Carry out this procedure, assuming that the potential profile is of the form Φ(z) = Φ_0 cos kz with Φ_0 > 0. Assume also that the protons are monoenergetic with W_p = W_+ > eΦ_0 and move along the positive z direction, and that there are both monoenergetic (with W_e = W_−) untrapped electrons (also moving along the positive z direction) and trapped electrons with distribution F_e(W_e), −eΦ_0 ≤ W_e < eΦ_0. Show that the density of trapped electrons must vanish at the wave troughs [at z = (2n + 1)π/k; n = 0, 1, 2, 3, ...]. Let the proton density at the troughs be n_p0, and assume that there is no net current as well as no net charge density. Show that the total electron density can be written as

$$n_e(z) = n_{p0}\left[\frac{m_e(W_+ - e\Phi_0)}{m_p(W_- + e\Phi)}\right]^{1/2} + \int_{-e\Phi_0}^{e\Phi_0}\frac{dW_e\,F_e(W_e)}{[2m_e(W_e + e\Phi)]^{1/2}}. \qquad (20.54)$$

(d) Use Poisson's equation to show that

$$\int_{-\xi_0}^{\xi_0}\frac{dW_e\,F_e(W_e)}{[2m_e(W_e+\xi)]^{1/2}} = \frac{\epsilon_0 k^2\Phi}{e^2} + n_{p0}\left[\left(\frac{W_+-\xi_0}{W_+-\xi}\right)^{1/2} - \left(\frac{m_e(W_+-\xi_0)}{m_p(W_-+\xi)}\right)^{1/2}\right], \qquad (20.55)$$

where ξ_0 = eΦ_0 and ξ = eΦ.

(e) Solve this integral equation for F_e(W_e). (Hint: it is of Abel type.)

(f) Exhibit some solutions graphically.

****************************
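The coherent-bunching argument at the end of Sec. 20.5 can be illustrated with a short Monte Carlo sketch. Radiation from N electrons with random phases adds in power, giving a total ∝ N. N electrons bunched at one phase add in amplitude, giving a total ∝ N². The sketch rests on an idealization assumed here, not taken from the text: each electron contributes a unit phasor e^{iφ} to the radiated field, propagation and polarization are ignored, and power is taken as the squared magnitude of the summed phasors.

```python
import cmath
import math
import random


def radiated_power(phases):
    """Relative power ~ |sum of unit phasors|^2 (idealized coherent-sum model)."""
    field = sum(cmath.exp(1j * phi) for phi in phases)
    return abs(field) ** 2


def mean_random_phase_power(n_electrons, n_trials, rng):
    """Average power when each electron's gyrational phase is independent and uniform."""
    total = 0.0
    for _ in range(n_trials):
        phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_electrons)]
        total += radiated_power(phases)
    return total / n_trials


rng = random.Random(12345)
N = 200
incoherent = mean_random_phase_power(N, n_trials=2000, rng=rng)  # expected ~ N
bunched = radiated_power([0.7] * N)  # all electrons share one phase: N**2

print(f"random phases: {incoherent:.1f}  (compare N = {N})")
print(f"bunched:       {bunched:.1f}  (compare N^2 = {N**2})")
print(f"ratio bunched/random ~ {bunched / incoherent:.1f}  (compare N = {N})")
```

The ratio of bunched to random-phase power comes out close to N, which is the amplification factor quoted in the text.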

20.6

N Particle Distribution Function

Before turning to nonlinear phenomena in plasmas (the next chapter), let us digress briefly and explore ways to study correlations between particles, of which our Vlasov formalism is oblivious. The Vlasov formalism treats each particle as independent. Each particle migrates through phase space in response to the local mean electromagnetic field and is somehow unaffected by individual electrostatic interactions with neighboring particles. As we discussed in chapter 19, this is likely to be a good approximation in a collisionless plasma because of the influence of screening, except in the tiny "independent-particle" region of Fig. 18.1. However, we would like to have some means of quantifying this, and of deriving an improved formalism that takes into account the correlations between individual particles.

One environment where this may be particularly relevant is the interior of the sun. There the gas is fully ionized, but the Debye number is not particularly large (i.e., one is near the independent-particle region of Fig. 18.1). As a result, Coulomb corrections to the perfect-gas equation of state may be responsible for measurable changes in the internal structure, as derived, for example, using helioseismological analysis (cf. Sec. 15.2). In this application our task is simpler than in the general case, because the gas will locally be in thermodynamic equilibrium at some temperature T. The general case, in which the plasma departs significantly from thermal equilibrium, turns out to be extremely hard to treat.

The one-particle distribution function that we use in the Vlasov formalism is the first member of a hierarchy of k-particle distribution functions, $f^{(k)}(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k, t)$. These give the probability that particle 1 will be found in a volume of its phase space $d\mathbf{x}_1 d\mathbf{v}_1$ (the product $dx\,dy\,dz\,dv_x\,dv_y\,dv_z$ of particle 1's position and velocity differentials), that particle 2 will be found in volume $d\mathbf{x}_2 d\mathbf{v}_2$ of its phase space, etc. Our probability interpretation of these distribution functions dictates a different normalization for $f^{(1)}$ than we use in the Vlasov formalism, $f^{(1)} = f/n$, where n is the number density of particles, and it dictates that

$$\int f^{(k)}\, d\mathbf{x}_1\, d\mathbf{v}_1 \cdots d\mathbf{x}_k\, d\mathbf{v}_k = 1. \qquad (20.56)$$

It is useful to relate the distribution functions $f^{(k)}$ to the concepts of statistical mechanics, which we developed in Chap. 3.

⁶ Bernstein, I. B., Greene, J. M. & Kruskal, M. D. 1957, Phys. Rev. 108, 546.
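Suppose we have an ensemble of N-electron plasmas, and let the probability that a member of this ensemble is in a volume $d^N\mathbf{x}\, d^N\mathbf{v}$ of 6N-dimensional phase space be $\tilde F\, d^N\mathbf{x}\, d^N\mathbf{v}$. (N is a very large number!) $\tilde F$ satisfies the Liouville equation

$$\frac{\partial \tilde F}{\partial t} + \sum_{i=1}^{N}\left[(\mathbf{v}_i\cdot\nabla_i)\tilde F + (\mathbf{a}_i\cdot\nabla_{v_i})\tilde F\right] = 0, \qquad (20.57)$$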

where $\mathbf{a}_i$ is the electromagnetic acceleration of the i'th particle, and $\nabla_i$ and $\nabla_{v_i}$ are gradients with respect to the position and velocity of particle i. We can construct the k-particle "reduced" distribution function from the statistical-mechanics distribution function $\tilde F$ by integrating over all but k of the particles:

$$f^{(k)}(\mathbf{x}_1,\ldots,\mathbf{x}_k,\mathbf{v}_1,\ldots,\mathbf{v}_k,t) = N^k \int d\mathbf{x}_{k+1}\cdots d\mathbf{x}_N\, d\mathbf{v}_{k+1}\cdots d\mathbf{v}_N\, \tilde F(\mathbf{x}_1,\ldots,\mathbf{x}_N,\mathbf{v}_1,\ldots,\mathbf{v}_N,t). \qquad (20.58)$$

(Note: k is typically a very small number, by contrast with N; below we shall only be concerned with k = 1, 2, 3.) The reason for the prefactor $N^k$ in Eq. (20.58) is a difference in what is being counted. $\tilde F$ refers to the probability of finding particle 1 in $d\mathbf{x}_1 d\mathbf{v}_1$, particle 2 in $d\mathbf{x}_2 d\mathbf{v}_2$, etc. The reduced distribution function instead describes the probability of finding any of the N identical, though distinguishable, particles in $d\mathbf{x}_1 d\mathbf{v}_1$, and so on. (As long as we are dealing with non-degenerate plasmas, we can count the electrons classically.) As N ≫ k, the number of possible ways we can choose k particles for k locations in phase space, N!/(N − k)!, is approximately $N^k$.

For simplicity, suppose that the protons are immobile and form a uniform, neutralizing background of charge, so that we need only consider the electron distribution and its correlations. Let us further suppose that the forces associated with mean electromagnetic fields produced by external charges and currents can be ignored. We can then restrict our attention to direct electron-electron electrostatic interactions. The acceleration is then

$$\mathbf{a}_i = \frac{e}{m_e}\sum_j \nabla_i \Phi_{ij}, \qquad (20.59)$$

where $\Phi_{ij}(x_{ij}) = -e/(4\pi\epsilon_0 x_{ij})$ is the electrostatic potential between two electrons i, j and $x_{ij} = |\mathbf{x}_i - \mathbf{x}_j|$.

We can now derive the so-called BBGKY⁷ hierarchy of kinetic equations, which relate the k-particle distribution function to integrals over the (k + 1)-particle distribution function. The first equation in this hierarchy is obtained by integrating Liouville's Eq. (20.57) over $d\mathbf{x}_2 \ldots d\mathbf{x}_N\, d\mathbf{v}_2 \ldots d\mathbf{v}_N$. If we assume that the distribution function decreases to zero at large distances, then integrals of the type $\int d\mathbf{x}_i \nabla_i \tilde F$ vanish, and the one-particle distribution function evolves according to

$$\begin{aligned}
\frac{\partial f^{(1)}}{\partial t} + (\mathbf{v}_1\cdot\nabla) f^{(1)} &= \frac{-eN}{m_e}\int d\mathbf{x}_2\ldots d\mathbf{x}_N\, d\mathbf{v}_2\ldots d\mathbf{v}_N \sum_j \nabla_1\Phi_{1j}\cdot\nabla_{v_1}\tilde F \\
&= \frac{-eN^2}{m_e}\int d\mathbf{x}_2\ldots d\mathbf{x}_N\, d\mathbf{v}_2\ldots d\mathbf{v}_N\, \nabla_1\Phi_{12}\cdot\nabla_{v_1}\tilde F \\
&= \frac{-e}{m_e}\int d\mathbf{x}_2\, d\mathbf{v}_2\, \nabla_{v_1} f^{(2)}\cdot\nabla_1\Phi_{12},
\end{aligned} \qquad (20.60)$$

where we have again replaced the probability of having any particle at a location in phase space by N times the probability of having one specific particle there. The left-hand side of Eq. (20.60) describes the evolution of independent particles, and the right-hand side takes account of their pairwise mutual correlation. The evolution equation for $f^{(2)}$ can similarly be derived by integrating the Liouville equation (20.57) over $d\mathbf{x}_3\ldots d\mathbf{x}_N\, d\mathbf{v}_3\ldots d\mathbf{v}_N$:

$$\frac{\partial f^{(2)}}{\partial t} + (\mathbf{v}_1\cdot\nabla_1) f^{(2)} + (\mathbf{v}_2\cdot\nabla_2) f^{(2)} + \frac{e}{m_e}\left[\nabla_{v_1} f^{(2)}\cdot\nabla_1\Phi_{12} + \nabla_{v_2} f^{(2)}\cdot\nabla_2\Phi_{12}\right] = \frac{-e}{m_e}\int d\mathbf{x}_3\, d\mathbf{v}_3 \left[\nabla_{v_1} f^{(3)}\cdot\nabla_1\Phi_{13} + \nabla_{v_2} f^{(3)}\cdot\nabla_2\Phi_{23}\right]. \qquad (20.61)$$

More generally, we can allow for a mean electromagnetic field, in addition to the inter-electron electrostatic field, which causes an acceleration $\mathbf{a}^{\rm ext} = -(e/m_e)(\mathbf{E} + \mathbf{v}\times\mathbf{B})$. We then obtain the BBGKY hierarchy of kinetic equations

$$\frac{\partial f^{(k)}}{\partial t} + \sum_{i=1}^{k}\left[(\mathbf{v}_i\cdot\nabla_i) f^{(k)} + (\mathbf{a}^{\rm ext}_i\cdot\nabla_{v_i}) f^{(k)} + \frac{e}{m_e}\left(\nabla_{v_i} f^{(k)}\cdot\nabla_i\right)\sum_{j\neq i}^{k}\Phi_{ij}\right] = \frac{-e}{m_e}\int d\mathbf{x}_{k+1}\, d\mathbf{v}_{k+1} \sum_{i=1}^{k} \nabla_{v_i} f^{(k+1)}\cdot\nabla_i\Phi_{i,k+1}. \qquad (20.62)$$

⁷ Bogolyubov, N. N. 1962, Studies in Statistical Mechanics Vol. 1, ed. J. de Boer & G. E. Uhlenbeck, Amsterdam: North Holland; Born, M. & Green, H. S. 1949, A General Kinetic Theory of Liquids, Cambridge: Cambridge University Press; Kirkwood, J. G. 1946, J. Chem. Phys. 14, 18; Yvon, J. 1935, La Théorie des Fluides et l'Équation d'État, Actualités Scientifiques et Industrielles, Paris: Hermann.
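The $N^k$ prefactor in Eq. (20.58) rests on a counting argument. The number of ordered ways to assign k of N distinguishable particles to k phase-space cells is N!/(N − k)!, and this approaches $N^k$ when N ≫ k. A quick numerical check follows; the particle numbers are illustrative only. The fractional error behaves like k(k − 1)/(2N).

```python
import math


def ordered_choices(n, k):
    """Number of ordered ways to place k of n distinguishable particles: n!/(n-k)!."""
    return math.perm(n, k)


def relative_error_vs_power(n, k):
    """Fractional deviation of n!/(n-k)! from the approximation n**k."""
    return 1.0 - ordered_choices(n, k) / n**k


for n in (10, 1_000, 1_000_000):
    errs = ", ".join(f"k={k}: {relative_error_vs_power(n, k):.2e}" for k in (1, 2, 3))
    print(f"N={n:>9}: {errs}")
```

For a real plasma, N is astronomically large while k ≤ 3, so the replacement by $N^k$ is harmless.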

We see explicitly that we require knowledge of the (k + 1)-particle distribution function in order to determine the evolution of the k-particle distribution function. The pairwise mutual correlation that appeared in Eq. (20.60) implicitly contains information about triple and higher correlations.

It is convenient to define the two-point correlation function $\xi_{12}(\mathbf{x}_1,\mathbf{v}_1,\mathbf{x}_2,\mathbf{v}_2,t)$ for particles 1, 2 by

$$f^{(2)}(1,2) = f_1 f_2 (1 + \xi_{12}), \qquad (20.63)$$

where we introduce the notation $f_1 = f^{(1)}(\mathbf{x}_1,\mathbf{v}_1,t)$ and $f_2 = f^{(1)}(\mathbf{x}_2,\mathbf{v}_2,t)$. We now restrict attention to a plasma in thermal equilibrium at temperature T. In this case, f1 and f2 will be Maxwellian distribution functions, independent of x and t. Now let us make an ansatz: ξ12 is just a function of the electrostatic interaction energy between the two electrons, and therefore it does not involve the electron velocities. (It is actually possible to justify this directly for an equilibrium distribution of particles interacting electrostatically, but we shall make do with showing that our final answer for ξ12 is just a function of x12.) As screening should be effective at large distances, we anticipate that ξ12 → 0 as x12 → ∞.

Now turn to Eq. (20.60), and introduce the simplest imaginable closure relation, ξ12 = 0; in other words, completely ignore all correlations. We can then replace $f^{(2)}$ by f1 f2 and perform the integral over x2, v2 to recover the collisionless Vlasov equation (20.5). We therefore see explicitly that particle-particle correlations are indeed ignored in the simple Vlasov approach.

For the 3-particle distribution function, we expect that when electron 1 is distant from both electrons 2 and 3, then $f^{(3)} \sim f_1 f_2 f_3 (1 + \xi_{23})$, etc. Summing over all three pairs, we write

$$f^{(3)} = f_1 f_2 f_3 (1 + \xi_{23} + \xi_{31} + \xi_{12} + \chi_{123}), \qquad (20.64)$$

where χ123 is the three-point correlation function, which ought to be significant when all three particles are close together. χ123 is, of course, determined by the next equation in the BBGKY hierarchy. We next make the closure relation χ123 = 0; that is to say, we ignore the influence of third bodies on pair interactions. This is reasonable because close three-body encounters are even less frequent than close two-body encounters.

We can now derive an equation for ξ12 by seeking a steady-state solution to Eq. (20.61), i.e. a solution with $\partial f^{(2)}/\partial t = 0$. We substitute Eqs. (20.63) and (20.64) into (20.61) (with χ123 = 0) to obtain

$$f_1 f_2\left[(\mathbf{v}_1\cdot\nabla_1)\xi_{12} + (\mathbf{v}_2\cdot\nabla_2)\xi_{12} - \frac{e(1+\xi_{12})}{k_B T}\left\{(\mathbf{v}_1\cdot\nabla_1)\Phi_{12} + (\mathbf{v}_2\cdot\nabla_2)\Phi_{12}\right\}\right] = \frac{e f_1 f_2}{k_B T}\int d\mathbf{x}_3\, d\mathbf{v}_3\, f_3 (1 + \xi_{23} + \xi_{31} + \xi_{12})\left[(\mathbf{v}_1\cdot\nabla_1)\Phi_{13} + (\mathbf{v}_2\cdot\nabla_2)\Phi_{23}\right], \qquad (20.65)$$

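where we have used the relation

$$\nabla_{v_1} f_1 = -\frac{m_e \mathbf{v}_1}{k_B T} f_1, \qquad (20.66)$$

valid for an unperturbed Maxwellian distribution function. We can rewrite this equation using the relations

$$\nabla_1\Phi_{12} = -\nabla_2\Phi_{12}, \quad \nabla_1\xi_{12} = -\nabla_2\xi_{12}, \quad \xi_{12} \ll 1, \quad \int d\mathbf{v}_3\, f_3 = n, \qquad (20.67)$$

to obtain

$$(\mathbf{v}_1 - \mathbf{v}_2)\cdot\nabla_1\left(\xi_{12} - \frac{e\Phi_{12}}{k_B T}\right) = \frac{ne}{k_B T}\int d\mathbf{x}_3\,(1 + \xi_{23} + \xi_{31} + \xi_{12})\left[(\mathbf{v}_1\cdot\nabla_1)\Phi_{13} + (\mathbf{v}_2\cdot\nabla_2)\Phi_{23}\right]. \qquad (20.68)$$

Now, symmetry considerations tell us that

$$\int d\mathbf{x}_3\,(1 + \xi_{31})\nabla_1\Phi_{13} = 0, \qquad \int d\mathbf{x}_3\,(1 + \xi_{23})\nabla_2\Phi_{23} = 0, \qquad (20.69)$$

and, in addition,

$$\int d\mathbf{x}_3\,\xi_{12}\nabla_1\Phi_{13} = -\xi_{12}\int d\mathbf{x}_3\,\nabla_3\Phi_{13} = 0, \qquad \int d\mathbf{x}_3\,\xi_{12}\nabla_2\Phi_{23} = -\xi_{12}\int d\mathbf{x}_3\,\nabla_3\Phi_{23} = 0. \qquad (20.70)$$

Therefore, we end up with

$$(\mathbf{v}_1 - \mathbf{v}_2)\cdot\nabla_1\left(\xi_{12} - \frac{e\Phi_{12}}{k_B T}\right) = \frac{ne}{k_B T}\int d\mathbf{x}_3\left[\xi_{23}(\mathbf{v}_1\cdot\nabla_1)\Phi_{31} + \xi_{31}(\mathbf{v}_2\cdot\nabla_2)\Phi_{23}\right]. \qquad (20.71)$$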

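As this equation must be true for arbitrary velocities, we can set v2 = 0 and obtain

$$\nabla_1(k_B T\,\xi_{12} - e\Phi_{12}) = ne\int d\mathbf{x}_3\,\xi_{23}\nabla_1\Phi_{31}. \qquad (20.72)$$

We take the divergence of Eq. (20.72) and use Poisson's equation, $\nabla_1^2\Phi_{12} = e\,\delta(\mathbf{x}_{12})/\epsilon_0$, to obtain

$$\nabla_1^2\xi_{12} - \frac{\xi_{12}}{\lambda_D^2} = \frac{e^2}{\epsilon_0 k_B T}\,\delta(\mathbf{x}_{12}), \qquad (20.73)$$

where $\lambda_D = (k_B T\epsilon_0/ne^2)^{1/2}$ is the Debye length [Eq. (18.10)]. The solution of Eq. (20.73) is

$$\xi_{12} = \frac{-e^2}{4\pi\epsilon_0 k_B T}\,\frac{e^{-x_{12}/\lambda_D}}{x_{12}}. \qquad (20.74)$$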
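As a check on Eq. (20.74), the sketch below evaluates the radial form of the left side of Eq. (20.73), (1/r²) d/dr(r² dξ/dr) − ξ/λD², by central finite differences at several r > 0. The delta function vanishes there, so the residual should be zero up to discretization error. The units are an assumption chosen for illustration: λD = 1 and the prefactor e²/(4πε0kBT) = 0.1 in the same length units.

```python
import math

LAMBDA_D = 1.0    # Debye length (illustrative units)
PREFACTOR = 0.1   # e^2 / (4 pi eps0 kB T) in the same units (illustrative)


def xi(r):
    """Two-point correlation function of Eq. (20.74)."""
    return -PREFACTOR * math.exp(-r / LAMBDA_D) / r


def radial_laplacian(f, r, h=1e-4):
    """(1/r^2) d/dr (r^2 df/dr) = f'' + (2/r) f', via central differences."""
    d1 = (f(r + h) - f(r - h)) / (2.0 * h)
    d2 = (f(r + h) - 2.0 * f(r) + f(r - h)) / h**2
    return d2 + 2.0 * d1 / r


def residual(r):
    """Left side of Eq. (20.73) away from the origin; should be ~0."""
    return radial_laplacian(xi, r) - xi(r) / LAMBDA_D**2


for r in (0.5, 1.0, 2.0, 4.0):
    print(f"r = {r:3.1f} lambda_D: xi = {xi(r):+.4e}, residual = {residual(r):+.1e}")
```

The printed residuals are tiny compared with ξ/λD² itself, consistent with the screened-Coulomb form being the Green's function of Eq. (20.73).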

Note that the sign is negative because the electrons repel one another. Note also that, to order of magnitude, $\xi_{12}(\lambda_D) \sim N_D^{-1}$, which is small if the Debye number is much greater than unity. At the mean interparticle spacing, $\xi_{12}(n^{-1/3}) \sim N_D^{-2/3}$. Only for distances $x_{12} \lesssim e^2/\epsilon_0 k_B T$ will the correlation effects become large, and only there will our expansion procedure and truncation (χ123 = 0) become invalid. This analysis justifies the use of the Vlasov equation when N_D ≫ 1.

Let us now return to the problem of computing the Coulomb correction to the pressure of an ionized gas. It is easiest to begin by computing the Coulomb correction to the internal energy density. For a one-component plasma this is simply given by

$$U_c = \frac{-e}{2}\int d\mathbf{x}_1\, n_1 n_2\, \xi_{12}\Phi_{12}, \qquad (20.75)$$

where the factor 1/2 compensates for double counting the interactions. Substituting Eq. (20.74) and performing the integral, we obtain

$$U_c = \frac{-ne^2}{8\pi\epsilon_0\lambda_D}. \qquad (20.76)$$
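Eq. (20.76) can be checked by doing the integral of Eq. (20.75) numerically. The sketch below makes several choices for illustration only. It works in units with e = ε0 = kBT = 1 and picks n = 25, so that λD = (kBTε0/ne²)^{1/2} = 0.2. It integrates over the separation x = x12 in spherical shells with a midpoint rule and compares the result with −ne²/(8πε0λD). The x² from the shell volume cancels the 1/x² from ξ12Φ12, so the integrand is smooth.

```python
import math

# Illustrative units (assumption): e = eps0 = kB*T = 1.
E_CHARGE = 1.0
EPS0 = 1.0
KT = 1.0
N_DENSITY = 25.0  # electron number density in these units (illustrative)

LAMBDA_D = math.sqrt(KT * EPS0 / (N_DENSITY * E_CHARGE**2))


def xi(x):
    """Debye-screened two-point correlation, Eq. (20.74)."""
    return -E_CHARGE**2 * math.exp(-x / LAMBDA_D) / (4 * math.pi * EPS0 * KT * x)


def phi(x):
    """Potential of one electron at distance x: Phi_12 = -e / (4 pi eps0 x)."""
    return -E_CHARGE / (4 * math.pi * EPS0 * x)


def coulomb_energy_density(x_max_in_debye=40.0, n_steps=200_000):
    """U_c = -(e/2) n^2 * integral of 4 pi x^2 xi(x) Phi(x) dx (midpoint rule), Eq. (20.75)."""
    x_max = x_max_in_debye * LAMBDA_D
    dx = x_max / n_steps
    total = 0.0
    for i in range(n_steps):
        x = (i + 0.5) * dx
        total += 4 * math.pi * x**2 * xi(x) * phi(x) * dx
    return -0.5 * E_CHARGE * N_DENSITY**2 * total


numeric = coulomb_energy_density()
analytic = -N_DENSITY * E_CHARGE**2 / (8 * math.pi * EPS0 * LAMBDA_D)  # Eq. (20.76)
print(f"numerical U_c = {numeric:.6f}, Eq. (20.76) gives {analytic:.6f}")
```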

The pressure can be obtained from this energy density using elementary thermodynamics. From the definition of the free energy, Eq. (4.17b), the volume density of Coulomb free energy, Fc, is given by integrating

$$U_c = -T^2\left(\frac{\partial(F_c/T)}{\partial T}\right)_n. \qquad (20.77)$$

From this, we obtain $F_c = -ne^2/(12\pi\epsilon_0\lambda_D)$. The Coulomb contribution to the pressure is then given by Eq. (4.18):

$$P_c = n^2\left(\frac{\partial(F_c/n)}{\partial n}\right)_T = \frac{-ne^2}{24\pi\epsilon_0\lambda_D} = \frac{1}{3}U_c. \qquad (20.78)$$

Therefore, including the Coulomb interaction decreases the pressure at a given density and temperature.

We have kept the neutralizing protons fixed so far. In a real plasma they are mobile, and so the Debye length must be reduced by a factor $2^{-1/2}$ [cf. Eq. (18.9)]. In addition, Eq. (20.75) must be multiplied by a factor 4 to take account of the proton-proton and proton-electron interactions. The end result is

$$P_c = \frac{-n^{3/2}e^3}{2^{3/2}\,3\pi\,\epsilon_0^{3/2}(k_B T)^{1/2}}, \qquad (20.79)$$

where n is still the number density of electrons. Numerically, the gas pressure for a perfect electron-proton gas is

$$P = 1.6\times10^{13}\left(\frac{\rho}{1000\ {\rm kg\,m^{-3}}}\right)\left(\frac{T}{10^6\ {\rm K}}\right)\ {\rm N\,m^{-2}}, \qquad (20.80)$$

and the Coulomb correction to this pressure is

$$P_c = -7.3\times10^{11}\left(\frac{\rho}{1000\ {\rm kg\,m^{-3}}}\right)^{3/2}\left(\frac{T}{10^6\ {\rm K}}\right)^{-1/2}\ {\rm N\,m^{-2}}. \qquad (20.81)$$
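The coefficients in Eqs. (20.80) and (20.81) follow from P = 2nkBT and Eq. (20.79) with n = ρ/mp, assuming pure hydrogen. The sketch below recomputes them in SI units using rounded CODATA constants. It also evaluates Pc/P at rough solar-core conditions. The values ρ ≈ 1.5×10⁵ kg m⁻³ and T ≈ 1.5×10⁷ K are assumptions for illustration, not taken from the text.

```python
import math

# SI constants (rounded CODATA values)
E_CHARGE = 1.602176634e-19   # C
EPS0 = 8.8541878128e-12      # F m^-1
K_B = 1.380649e-23           # J K^-1
M_P = 1.67262192e-27         # kg


def perfect_gas_pressure(rho, temp):
    """P = 2 n kB T for fully ionized hydrogen, with electron density n = rho / m_p."""
    n = rho / M_P
    return 2.0 * n * K_B * temp


def coulomb_pressure(rho, temp):
    """Eq. (20.79): P_c = -n^{3/2} e^3 / (2^{3/2} 3 pi eps0^{3/2} (kB T)^{1/2})."""
    n = rho / M_P
    return -(n**1.5) * E_CHARGE**3 / (
        2**1.5 * 3 * math.pi * EPS0**1.5 * math.sqrt(K_B * temp)
    )


# Coefficients of Eqs. (20.80) and (20.81): rho = 1000 kg m^-3, T = 1e6 K
print(f"P   coefficient: {perfect_gas_pressure(1e3, 1e6):.2e} N m^-2")
print(f"P_c coefficient: {coulomb_pressure(1e3, 1e6):.2e} N m^-2")

# Rough solar-core conditions (illustrative assumption)
rho_core, t_core = 1.5e5, 1.5e7
ratio = coulomb_pressure(rho_core, t_core) / perfect_gas_pressure(rho_core, t_core)
print(f"P_c / P at solar-core conditions: {ratio:.3f}")
```

Because Pc/P ∝ ρ^{1/2}T^{−3/2}, the correction grows in denser, cooler stars, as the next paragraph states.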

In the interior of the sun this is about one percent of the total pressure; in denser, cooler stars it is significantly larger. For most of the plasmas that one encounters, however, this analysis suffices to show that the two-point correlation function ξ12 is of order $N_D^{-1}$ across a Debye sphere and only of order $N_D^{-2/3}$ at the mean interparticle spacing. Only those particles that are undergoing large deflections, through angles ∼ 1 radian, are close enough for ξ = O(1). This is the ultimate justification for treating plasmas as collisionless and using the mean electromagnetic fields in the Vlasov description.

****************************

EXERCISES

Exercise 20.10 Derivation: BBGKY Hierarchy
Complete the derivation of Eq. (20.71) from Eq. (20.61).

Exercise 20.11 Problem: Correlations in a Tokamak Plasma
For a Tokamak plasma compute, in order of magnitude, the two-point correlation function for two electrons separated by (a) a Debye length, (b) the mean interparticle spacing.

Exercise 20.12 Derivation: Thermodynamic identities
Verify Eqs. (20.77) and (20.78).

Exercise 20.13 Example: Thermodynamics of Coulomb Plasma
Compute the entropy of a proton-electron plasma in thermal equilibrium at temperature T, including the Coulomb correction.

****************************

Bibliographic Note

The references for this chapter are similar to those for Chap. 20. The most useful are probably Chap. 8 of Stix (1992), Chaps. 3 and 7 of Schmidt (1966), Chaps. 7, 8 and 9 of Krall and Trivelpiece (1973), Chap. 3 of Lifshitz and Pitaevski (1981), and Chaps. XXXX of Boyd and Sanderson (1969).

Box 20.2
Important Concepts in Chapter 20

• Kinetic theory concepts: distribution function fs(v, x, t) and Vlasov equation for it, Sec. 20.2.1
• Relation of kinetic theory to two-fluid formalism, Sec. 20.2.2
• Jeans' theorem for solutions of Vlasov equation, Sec. 20.2.3
• Electrostatic waves treated via kinetic theory, Secs. 20.3–20.5
  – Distribution function reduced to one dimension (that of the wave propagation), Fs(v, z, t); its split into an equilibrium part Fs0(v) and perturbation Fs1; and the unified equilibrium distribution for electrons and protons, F0 = Fe0 + (me/mp)Fp0, Sec. 20.3.1
  – Dispersion relation as vanishing of the plasma's dielectric permittivity, ε(ω, k) = 0, Sec. 20.3.1
  – Landau contour and its derivation via Laplace transforms, Sec. 20.3.3 and Ex. 20.4
  – Dispersion relation ε(ω, k) = 0 as an integral along the Landau contour, Sec. 20.3.3
  – Dispersion relation specialized to the case of weak damping or growth, |ωi| ≪ ωr: ωi proportional to dF0/dv evaluated at the waves' phase velocity, and interpretation of this in terms of surfing particles, Secs. 20.3.4, 20.3.5, and 20.5
  – Landau damping of Langmuir waves, Sec. 20.3.5
  – Landau damping of ion acoustic waves, Sec. 20.3.6
  – Two-stream instability, Sec. 20.3.2
  – Criteria for instability of waves in terms of paths in the complex phase-velocity plane, and Nyquist diagram, Sec. 20.4
  – Penrose criterion for instability of a double-humped velocity distribution, Sec. 20.4
  – Particle trapping in a wave, Sec. 20.5
• N-particle distribution functions, Sec. 20.6
  – BBGKY hierarchy of equations for their evolution, Sec. 20.6
  – Two-point and three-point correlation functions, Sec. 20.6

Bibliography

Boyd, T. J. M. and Sanderson, J. J. 1969. Plasma Dynamics, London: Nelson.
Hénon, M. 1982. “Vlasov Equation?” Astronomy and Astrophysics, 114, 211.
Krall, N. A. and Trivelpiece, A. W. 1973. Principles of Plasma Physics, New York: McGraw-Hill.
Lifshitz, E. M. and Pitaevski, L. P. 1981. Physical Kinetics, Oxford: Pergamon.
Schmidt, G. 1966. Physics of High Temperature Plasmas, New York: Academic Press.
Stix, T. H. 1992. Waves in Plasmas, New York: American Institute of Physics.

Contents

21 Nonlinear Dynamics of Plasmas
21.1 Overview
21.2 Quasilinear Theory in Classical Language
  21.2.1 Classical Derivation of the Theory
  21.2.2 Summary of Quasilinear Theory
  21.2.3 Conservation Laws
  21.2.4 Generalization to Three Dimensions
21.3 Quasilinear Theory in Quantum Mechanical Language
  21.3.1 Wave-Particle Interactions
  21.3.2 The relationship between classical and quantum mechanical formalisms in plasma physics
  21.3.3 Three-Wave Mixing
21.4 Quasilinear Evolution of Unstable Distribution Functions: The Bump in Tail
  21.4.1 Instability of Streaming Cosmic Rays
21.5 Parametric Instabilities
21.6 Solitons and Collisionless Shock Waves

Chapter 21
Nonlinear Dynamics of Plasmas

Version 0821.2.K.pdf, 18 April 2009. Changes from 0821.2.K.pdf consist only of corrections of a few minor typographical errors.

Box 21.1
Reader's Guide

• This chapter relies significantly on:
  – Portions of Chap. 2 on kinetic theory: Secs. 2.2.1 and 2.2.2 on the distribution function, Sec. 2.2.5 on the mean occupation number, and Sec. 2.6 on Liouville's theorem and the collisionless Boltzmann equation.
  – Section 18.3 on Debye shielding, collective behavior of plasmas, and plasma oscillations.
  – Sections 20.1–20.5 on kinetic theory of warm plasmas.
• This chapter also relies to some extent, but not greatly, on:
  – The concept of spectral density as developed in Sec. 5.3.
  – Section 5.7 on the Fokker-Planck equation.
• No subsequent material in this book relies significantly on this chapter.


21.1

Overview

In Chap. 20 we met our first example of a velocity-space instability, the two-stream instability. It illustrated the general principle that departures from Maxwellian equilibrium in velocity space in a collisionless plasma might be unstable and lead to the exponential growth of small-amplitude waves, just as we found can happen for departures from spatial uniformity in a fluid. In the same chapter, where we analyzed warm plasmas, we did three things. We derived the dispersion relation for electrostatic waves in an unmagnetized plasma. We showed how Landau damping can damp the waves when the phase-space density of the resonant particles diminishes with increasing speed. And we showed that in the opposite case, an increasing phase-space density, the waves can grow at the expense of the energies of near-resonant particles (provided the Penrose criterion is satisfied).

In this chapter, we shall explore the back-reaction of the waves on the near-resonant particles. This back-reaction is a (weakly) nonlinear process, so we shall have to extend our analysis of the wave-particle interactions to include the leading nonlinearity. This extension is called quasilinear theory or weak turbulence theory, and it allows us to follow the time development of the waves and the near-resonant particles simultaneously. We develop this formalism in Sec. 21.2 and verify that it enforces the laws of particle conservation, energy conservation, and momentum conservation. Our original development of the formalism is entirely in classical language and meshes nicely with the theory of electrostatic waves as presented in Chap. 20. In Sec. 21.3, we reformulate the theory in terms of the emission, absorption and scattering of wave quanta. Waves in plasmas almost always entail large quantum occupation numbers and thus are highly classical. Even so, this quantum formulation of the classical theory has great computational and heuristic power. As one would expect, although Planck's constant ℏ is present in the formalism, ℏ nowhere appears in the final answers to problems.
Our initial derivation and development of the formalism is restricted to the interaction of electrons with electrostatic waves, but we also describe how the formalism can be generalized to describe a multitude of wave modes and particle species interacting with each other. We also describe circumstances in which this formalism can fail. In such circumstances the resonant particles couple strongly, not to a broad-band distribution of incoherent waves (as the formalism presumes), but instead to one or a few individual, coherent modes. In Sec. 21.6 we explore an example.

In Sec. 21.4 we turn to our first illustrative application of quasilinear theory: a warm electron beam propagating through a stable plasma. We show how the particle distribution function evolves so as to shut down the growth of the waves, and we illustrate this by describing the problem of the isotropization of Galactic cosmic rays. Next, in Sec. 21.5, we consider parametric instabilities. These are very important in the absorption of laser light in experimental studies of the compression of small deuterium-tritium pellets, a possible forerunner of a commercial nuclear fusion reactor. Finally, in Sec. 21.6 we return to ion-acoustic solitons and explain how the introduction of dissipation can create a collisionless shock, similar to that found where the earth's bow shock meets the solar wind.

21.2

Quasilinear Theory in Classical Language

21.2.1

Classical Derivation of the Theory

In Chap. 20 we discovered that a distribution of hot electrons or ions can Landau damp a wave mode. We also showed that some distributions lead to exponential growth of the waves in time. Either way there is energy transfer between the waves and the particles. We now turn to the back-reaction of the waves on the near-resonant particles that damp or amplify them. For simplicity, we shall derive the back-reaction equations ("quasilinear theory") in the special case of electrons interacting with electrostatic Langmuir waves. We shall then assert the (rather obvious) generalization to protons or other ions and to interaction with other types of wave modes.

We begin with the electrons' one-dimensional distribution function Fe(v, z, t) [Eq. (20.14)]. As in Chap. 20, we split Fe into two parts, but we must do so more carefully here than there. The foundation for our split is a two-lengthscale expansion of the same sort as we used in developing geometric optics (Sec. 6.3). We introduce two disparate lengthscales: the short one is the typical reduced wavelength of a Langmuir wave, $\bar\lambda \sim 1/k$, and the long one is a scale $L \gg \bar\lambda$ over which we perform spatial averages. Later, when applying our formalism to an inhomogeneous plasma, L must be somewhat shorter than the spatial inhomogeneity scale but still $\gg \bar\lambda$. Even in our present situation of a homogeneous background plasma, there can be large-scale spatial inhomogeneities caused by the growth or damping of the wave modes through their interaction with the electrons. We must therefore choose L somewhat smaller than the growth or damping length but still large compared to $\bar\lambda$. Our split of Fe is into the spatial average of Fe over the length L (denoted F0) plus a rapidly varying part that averages to zero (denoted F1):

$$F_0 \equiv \langle F_e\rangle, \qquad F_1 \equiv F_e - F_0, \qquad F_e = F_0 + F_1. \qquad (21.1)$$

(For simplicity, we omit the subscript e from F0 and F1.) The time evolution of Fe, and thence of F0 and F1, is governed by the one-dimensional Vlasov equation, in which we assume a uniform neutralizing ion background, no background magnetic field, and interaction with electrostatic waves. We cannot use the linearized Vlasov equation (20.16), which formed the foundation for all of Chap. 20, because the processes we wish to study are nonlinear. Rather, we must use the fully nonlinear Vlasov equation [Eq. (20.5) integrated over the irrelevant components vx and vy of velocity as in Eq. (20.14)]:

$$\frac{\partial F_e}{\partial t} + v\frac{\partial F_e}{\partial z} - \frac{eE}{m_e}\frac{\partial F_e}{\partial v} = 0. \qquad (21.2)$$

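Here E is the rapidly varying electric field associated with the waves. Inserting Fe = F0 + F1 into this Vlasov equation, we obtain

$$\frac{\partial F_0}{\partial t} + v\frac{\partial F_0}{\partial z} - \frac{e}{m_e}\frac{\partial F_1}{\partial v}E + \frac{\partial F_1}{\partial t} + v\frac{\partial F_1}{\partial z} - \frac{e}{m_e}\frac{\partial F_0}{\partial v}E = 0. \qquad (21.3)$$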

We then split this equation into two parts: its average over the large lengthscale L, and its remaining, rapidly varying part. The averaged part gets contributions only from the first three terms, since the last three are linear in F1 and E, which have vanishing averages:

$$\frac{\partial F_0}{\partial t} + v\frac{\partial F_0}{\partial z} - \frac{e}{m_e}\left\langle\frac{\partial F_1}{\partial v}E\right\rangle = 0. \qquad (21.4)$$

This is an evolution equation for the averaged distribution F0; the third, nonlinear term drives the evolution. This driving term is the only nonlinearity that we shall keep in the quasilinear Vlasov equation.

The rapidly varying part of the Vlasov equation (21.3) is just the last three terms,

$$\frac{\partial F_1}{\partial t} + v\frac{\partial F_1}{\partial z} - \frac{e}{m_e}\frac{\partial F_0}{\partial v}E = 0, \qquad (21.5)$$

plus a nonlinear term

$$-\frac{e}{m_e}\left[\frac{\partial F_1}{\partial v}E - \left\langle\frac{\partial F_1}{\partial v}E\right\rangle\right],$$

which we discard as being far smaller than the linear ones. If we were to keep this term, we would find that it can produce "three-wave mixing": two electrostatic waves with different wave numbers k1 and k2 interact weakly to try to generate a third electrostatic wave with wave number k3 = k1 ± k2. We shall discuss such three-wave mixing in Sec. 21.3.3 below; for the moment we shall ignore it, and correspondingly shall discard this nonlinear term.

Equation (21.5) is the same linear evolution equation for F1 as we developed and studied in Chap. 20. Here, as there, we bring its physics to the fore by decomposing into plane-wave, monochromatic modes. Here, however, we shall be dealing with many modes and with sums over the effects of many modes, so we must do the decomposition a little more carefully. The foundation for the decomposition is a spatial Fourier transform inside our averaging "box" of length L:

$$\tilde F_1(v,k,t) = \int_0^L e^{-ikz} F_1(v,z,t)\,dz, \qquad \tilde E(k,t) = \int_0^L e^{-ikz}E(z,t)\,dz. \qquad (21.6)$$

We take F1 and E to represent the physical quantities and thus to be real; this implies that $\tilde F_1(-k) = \tilde F_1^*(k)$, and similarly for $\tilde E$. The inverse Fourier transforms are therefore

$$F_1(v,z,t) = \int_{-\infty}^{\infty} e^{ikz}\,\tilde F_1(v,k,t)\,\frac{dk}{2\pi}, \qquad E(z,t) = \int_{-\infty}^{\infty} e^{ikz}\,\tilde E(k,t)\,\frac{dk}{2\pi} \qquad \text{for } 0 < z < L. \qquad (21.7)$$

(This choice of how to do the mathematics corresponds to idealizing F1 and E as vanishing outside the box. Alternatively, we could treat them as though they were periodic with period L and replace Eq. (21.7) by a sum over discrete values of k, namely multiples of 2π/L.) From our study of linearized waves in Chap. 20, we know that a mode with wave number k will oscillate in time with some frequency ω(k), so

$$\tilde F_1 \propto e^{-i\omega(k)t} \quad\text{and}\quad \tilde E \propto e^{-i\omega(k)t}. \qquad (21.8)$$
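The reality condition $\tilde F_1(-k) = \tilde F_1^*(k)$ stated after Eq. (21.6) can be checked on a discretized box. The sketch approximates the box integral of Eq. (21.6) by a Riemann sum over M grid points. It applies that sum to an arbitrary real test field built from a few Fourier components with random amplitudes and phases. The box length, grid size, and test field are illustrative choices, not taken from the text.

```python
import cmath
import math
import random

# Discretized averaging box 0 < z < L (box length and grid size are illustrative)
L = 50.0
M = 64
dz = L / M
z = [j * dz for j in range(M)]

# An arbitrary real field built from a few Fourier components with random phases
rng = random.Random(0)
modes = [(m, rng.gauss(0.0, 1.0), rng.uniform(0.0, 2.0 * math.pi)) for m in (3, 7, 20)]
E = [sum(a * math.cos(2 * math.pi * m * zj / L + ph) for m, a, ph in modes) for zj in z]


def box_transform(f, k):
    """Riemann-sum version of Eq. (21.6): integral_0^L exp(-i k z) f(z) dz."""
    return sum(cmath.exp(-1j * k * zj) * fj for zj, fj in zip(z, f)) * dz


k7 = 2 * math.pi * 7 / L
E_plus = box_transform(E, k7)
E_minus = box_transform(E, -k7)
print("E~(+k)      =", E_plus)
print("conj E~(-k) =", E_minus.conjugate())
print("reality condition holds:", abs(E_plus - E_minus.conjugate()) < 1e-9)
```

For a real field, flipping the sign of k simply conjugates every term e^{−ikz}E(z) in the sum. The agreement therefore holds to rounding error, independent of the grid.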

For simplicity, we assume that the mode propagates in the +z direction; when studying modes traveling in the opposite direction, we just turn our z axis around. In Sec. 21.2.4 we will generalize to three-dimensional situations and include all directions of propagation simultaneously. We also assume that for each wave number k there is at most one mode type present (i.e., only a Langmuir wave or only an ion acoustic wave). With these simplifications, ω(k) is a unique function with ωr > 0 when k > 0. Notice that the reality of E(z, t) implies [from the second of Eqs. (21.6)] $\tilde E(-k,t) = \tilde E^*(k,t)$ for all t. By Eq. (21.8), this means $\tilde E(-k,0)e^{-i\omega(-k)t} = \tilde E^*(k,0)e^{+i\omega^*(k)t}$ for all t, which in turn implies

$$\omega(-k) = -\omega^*(k); \quad\text{i.e.}\quad \omega_r(-k) = -\omega_r(k), \qquad \omega_i(-k) = \omega_i(k). \qquad (21.9)$$

This should be obvious physically. It says that, for our chosen conventions, both the negative-k and positive-k contributions to Eq. (21.7) propagate in the +z direction, and both grow or are damped in time at the same rate. In general, ω(k) is determined by the Landau-contour dispersion relation (20.30). However, throughout Secs. 21.2–21.4 we shall specialize to weakly damped or growing Langmuir waves with phase velocities ωr/k large compared to the rms electron speed:

$$v_{\rm ph} = \frac{\omega_r}{k} \gg v_{\rm rms} = \left(\int v^2 F_0(v)\,dv\right)^{1/2}. \qquad (21.10)$$

For such waves, from Eqs. (20.34), (20.35), and the first two lines of (20.37), we deduce the following explicit forms for the real and imaginary parts of ω:

$$\omega_r^2 = \omega_p^2\left(1 + \frac{3 v_{\rm rms}^2}{(\omega_p/k)^2}\right) \quad\text{for } k > 0, \qquad (21.11)$$

$$\omega_i = \frac{\pi e^2 \omega_r}{2\epsilon_0 m_e k^2}\,F_0'(\omega_r/k) \quad\text{for } k > 0. \qquad (21.12)$$
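For a concrete case, the sketch below evaluates Eqs. (21.11) and (21.12) for a Maxwellian F0(v) = n(2πσ²)^{−1/2} exp(−v²/2σ²). Here σ plays the role of vrms. The sketch assumes that F0 is normalized so that ∫F0 dv = n, in which case e²/(ε0 me) · F0′ = (ωp²/n) F0′. It uses dimensionless units with ωp = σ = n = 1, so kλD = kσ/ωp. The formulas are trustworthy only in the regime of Eq. (21.10), kλD ≪ 1, so the larger kλD values are shown only to display the trend.

```python
import math

OMEGA_P = 1.0   # plasma frequency (dimensionless units; assumption)
SIGMA = 1.0     # 1D thermal speed of the Maxwellian, playing the role of v_rms
N_E = 1.0       # electron density; F0 is normalized so that integral F0 dv = N_E


def f0_prime(v):
    """dF0/dv for the 1D Maxwellian F0 = n (2 pi sigma^2)^{-1/2} exp(-v^2 / 2 sigma^2)."""
    f0 = N_E / math.sqrt(2 * math.pi * SIGMA**2) * math.exp(-v**2 / (2 * SIGMA**2))
    return -v / SIGMA**2 * f0


def langmuir_omega(k):
    """(omega_r, omega_i) from Eqs. (21.11)-(21.12), using e^2 n / (eps0 m_e) = omega_p^2."""
    omega_r = OMEGA_P * math.sqrt(1 + 3 * SIGMA**2 / (OMEGA_P / k) ** 2)
    omega_i = math.pi * OMEGA_P**2 * omega_r / (2 * N_E * k**2) * f0_prime(omega_r / k)
    return omega_r, omega_i


for k_lambda in (0.1, 0.2, 0.3, 0.4):
    k = k_lambda * OMEGA_P / SIGMA  # lambda_D = sigma / omega_p
    w_r, w_i = langmuir_omega(k)
    print(f"k lambda_D = {k_lambda:.1f}: omega_r/omega_p = {w_r:.4f}, "
          f"omega_i/omega_p = {w_i:.3e}")
```

The damping rate is negative (F0′ < 0 at vph for a Maxwellian) and rises steeply as kλD grows and vph falls toward the thermal bulk. This is the behavior expected of Landau damping.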

The linearized Vlasov equation (21.5) implies that the modes' amplitudes $\tilde F_1(v,k,t)$ and $\tilde E(k,t)$ are related by

$$\tilde F_1 = \frac{ie}{m_e}\,\frac{\partial F_0/\partial v}{\omega - kv}\,\tilde E. \qquad (21.13)$$

This is just Eq. (20.17) with d/dv replaced by ∂/∂v, because F0 now varies slowly in space and time as well as varying with v.

Turn, now, from the rapidly varying quantities F1 and E and their Vlasov equation, dispersion relation, and damping rate, to the spatially averaged distribution function F0 and its spatially averaged Vlasov equation (21.4). We shall bring this Vlasov equation's nonlinear term into a more useful form. The key quantity in this nonlinear term is the average of the product of the rapidly varying quantities F1 and E. Parseval's theorem permits us to rewrite this as

$$\langle EF_1\rangle = \frac{1}{L}\int_0^L EF_1\,dz = \frac{1}{L}\int_{-\infty}^{\infty} EF_1\,dz = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\frac{\tilde E^*\tilde F_1}{L}, \qquad (21.14)$$

where in the second step we have used our mathematical idealization that F1 and E vanish outside our averaging box, and the third equality is Parseval's theorem. Inserting Eq. (21.13), we bring this into the form

$$\langle EF_1\rangle = \frac{e}{m_e}\int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\frac{\tilde E^*\tilde E}{L}\,\frac{i}{\omega - kv}\,\frac{\partial F_0}{\partial v}. \qquad (21.15)$$
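The Parseval step in Eq. (21.14) can be checked on a discretized box, where it becomes exact. The sketch uses the periodic-box alternative mentioned after Eq. (21.7), not the vanish-outside-the-box idealization. The k integral then becomes a sum over k_m = 2πm/L with dk/2π → 1/L, taken over all M distinct grid wave numbers. The two real profiles standing in for E(z) and F1(v, z) at fixed v are arbitrary illustrative choices.

```python
import cmath
import math

# Periodic box of length L sampled at M points (values are illustrative)
L = 20.0
M = 128
dz = L / M
z = [j * dz for j in range(M)]

# Two arbitrary real, zero-mean profiles standing in for E(z) and F1(v, z) at fixed v
E = [math.sin(2 * math.pi * 3 * zj / L) + 0.5 * math.cos(2 * math.pi * 5 * zj / L) for zj in z]
F1 = [0.8 * math.sin(2 * math.pi * 3 * zj / L + 0.4) - 0.3 * math.cos(2 * math.pi * 5 * zj / L)
      for zj in z]


def box_transform(f, k):
    """Riemann-sum version of Eq. (21.6): integral_0^L exp(-i k z) f(z) dz."""
    return sum(cmath.exp(-1j * k * zj) * fj for zj, fj in zip(z, f)) * dz


# Left side of Eq. (21.14): the box average <E F1>
lhs = sum(e * f for e, f in zip(E, F1)) * dz / L

# Right side: sum over k_m of (1/L) * conj(E~) F1~ / L; imaginary parts cancel in the sum
rhs = 0.0
for m in range(-M // 2, M // 2):
    k = 2 * math.pi * m / L
    rhs += (box_transform(E, k).conjugate() * box_transform(F1, k)).real / L**2

print(f"<E F1> directly: {lhs:.10f}")
print(f"via Parseval:    {rhs:.10f}")
```

Only the matching Fourier components (m = ±3 and m = ±5) contribute. This mirrors why, under the random-phase approximation discussed next, ⟨EF1⟩ depends only on the spectral content of the waves.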

The quantity $\tilde E^*\tilde E/L$ is a function of wave number k, time t, and also the location and size L of the averaging box. For Eq. (21.15) to be physically and computationally useful, this quantity must not fluctuate wildly as k, t, L, and the box location are varied. In most circumstances, if the box is chosen to be far larger than $\bar\lambda = 1/k$, then $\tilde E^*\tilde E/L$ indeed will not fluctuate wildly. One can develop quasilinear theory with greater care and rigor than we can in so short a treatment. One then discovers that this non-fluctuation is a consequence of the Random Phase Approximation, or RPA for short, which says that the phase of $\tilde E$ varies randomly with k, t, L, and the box location on suitably short lengthscales.¹ Like ergodicity (Secs. 3.5 and 5.3), the RPA is often valid but can sometimes fail. Sometimes there is an organized bunching of the particles in phase space that induces nonrandom phases on the plasma waves. Quasilinear theory requires that the RPA be valid, and for the moment we shall assume it so. In Sec. 21.6, however, we shall meet an example for which it fails: strong ion-acoustic solitons.

¹ For detailed discussion see Pines, D. and Schrieffer, J. R. 1962, Phys. Rev. 125, 804; also Davidson, R. C. 1972, Methods in Nonlinear Plasma Theory, New York: Academic Press.

The RPA implies that, as we increase the length L of our averaging box, $\tilde E^*\tilde E/L$ will approach a well-defined limit. This limit is 1/2 the spectral density S_E(k) of the random process E(z, t) at fixed time t; cf. Eq. (5.21). Correspondingly, it is natural to express quasilinear theory in the language of spectral densities. We shall do so, but with a normalization of the spectral density that is tied to the physical energy density and differs slightly from that used in Chap. 5. In place of S_E(k), we use the Langmuir-wave spectral energy density, $\mathcal{E}_k$. We follow plasma physicists' conventions by defining this quantity to include the oscillatory kinetic energy in the electrons, as well as the electrical energy to which it is, on average, equal. As in the theory of random processes (Chap. 5), we shall add the energy at −k to that at +k, so that all the energy is regarded as residing at positive wave number. Then $\int_0^\infty dk\,\mathcal{E}_k$ is the total wave energy per unit volume in the plasma, averaged over length L. Invoking the RPA, we can use Parseval's theorem to compute the electrical energy density

$$\left\langle\frac{\epsilon_0 E^2}{2}\right\rangle = \epsilon_0\left\langle\int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\frac{\tilde E\tilde E^*}{2L}\right\rangle = \epsilon_0\left\langle\int_0^{\infty}\frac{dk}{2\pi}\,\frac{\tilde E\tilde E^*}{L}\right\rangle, \qquad (21.16)$$

where we have used $\tilde E(k)\tilde E^*(k) = \tilde E(-k)\tilde E^*(-k)$. We double this to account for the wave energy in the oscillating electrons and then read off the spectral energy density as the integrand:

$$\mathcal{E}_k = \frac{\epsilon_0\langle\tilde E\tilde E^*\rangle}{\pi L}. \qquad (21.17)$$

This wave energy density can be regarded as a function either of wave number k or of wave phase velocity vph = ωr/k. It is useful to plot $\mathcal{E}_k(v_{\rm ph})$ on the same graph as the averaged electron velocity distribution F0(v). Figure 21.1 is such a plot. It shows the physical situation we are considering: approximately thermalized electrons with (possibly) a weak beam (bump) of additional electrons at velocities v ≫ vrms, and a distribution of Langmuir waves with phase velocities vph ≫ vrms.

Fig. 21.1: The spatially averaged electron velocity distribution F0(v) (solid curve) and wave energy distribution $\mathcal{E}_k(v_{\rm ph})$ (dotted curve), plotted against v and vph = ωr/k with vrms marked, for the situation treated in Secs. 21.2–21.4.

There is an implicit time dependence associated with the growth or decay of the waves, so that $\mathcal{E}_k \propto e^{2\omega_i t}$. Moreover, the waves' energy density travels through phase space (physical space and wave-vector space) on the same trajectories as a wave packet, i.e. along the geometric-optics rays discussed in Chap. 6,

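$$\left(\frac{dx_j}{dt}\right)_{wp} = \frac{\partial\omega_r}{\partial k_j} = V_{g\,j}, \qquad \left(\frac{dk_j}{dt}\right)_{wp} = -\frac{\partial\omega_r}{\partial x_j} \tag{21.18}$$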

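[Eqs. (6.25a) and (6.25b)], this growth or decay actually occurs along a wave-packet trajectory (with the averaging box also having to be carried along that trajectory). Thus, the equation of motion for the waves' energy density is

$$\frac{d\mathcal{E}_k}{dt} \equiv \frac{\partial\mathcal{E}_k}{\partial t} + \left(\frac{dz}{dt}\right)_{wp}\frac{\partial\mathcal{E}_k}{\partial z} + \left(\frac{dk}{dt}\right)_{wp}\frac{\partial\mathcal{E}_k}{\partial k} = 2\omega_i\mathcal{E}_k. \tag{21.19}$$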

Here we have used the fact that our electrostatic waves are presumed to propagate in the $z$ direction, so the only nonzero component of $\mathbf k$ is $k_z \equiv k$ and the only nonzero component of the group velocity is $V_{g\,z} = (dz/dt)_{wp} = \partial\omega_r/\partial k$. For weakly damped, high-phase-speed Langmuir waves, $\omega_r(x,k)$ is given by Eq. (21.11), with the $x$-dependence arising from the slowly spatially varying electron density $n(x)$, which induces a slow spatial variation in the plasma frequency $\omega_p = \sqrt{e^2 n/\epsilon_0 m_e}$.

Now, the context in which the quantity $\langle\tilde E^*(k)\tilde E(k)\rangle/L = \pi\mathcal{E}_k/\epsilon_0$ arose was our evaluation of the nonlinear term in the Vlasov equation (21.4) for the electrons' averaged distribution function $F_0$. By inserting (21.15) and (21.17) into (21.4), we bring that nonlinear Vlasov equation into the form

$$\frac{\partial F_0}{\partial t} + v\frac{\partial F_0}{\partial z} = \frac{\partial}{\partial v}\left(D\frac{\partial F_0}{\partial v}\right), \tag{21.20}$$

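where

$$D(v) = \frac{e^2}{2\epsilon_0 m_e^2}\int_{-\infty}^{\infty}dk\,\mathcal{E}_k\,\frac{i}{\omega - kv} = \frac{e^2}{\epsilon_0 m_e^2}\int_0^{\infty}dk\,\mathcal{E}_k\,\frac{\omega_i}{(\omega_r - kv)^2 + \omega_i^2}. \tag{21.21}$$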

Here, in the second step, we have used Eq. (21.9). Equation (21.20) says that $F_0(v,z,t)$ is transported in physical space with the electron speed $v$ and diffuses in velocity space with the velocity diffusion coefficient $D(v)$. Notice that $D(v)$ is manifestly real, and that a major contribution to it comes from waves whose phase speeds $\omega_r/k$ nearly match the particle speed $v$, i.e. from resonant waves.

The two-lengthscale approximation that underlies quasilinear theory requires that the waves grow or damp on a lengthscale long compared to a wavelength, or equivalently that $|\omega_i|$ be much smaller than $\omega_r$. This allows us, for each $v$, to split the integral in Eq. (21.21) into a piece due to modes that can resonate with electrons of that speed because $\omega_r/k \simeq v$, plus a piece due to modes that cannot so resonate. We consider these two pieces in turn.

The resonant piece can be written down, in the limit $|\omega_i| \ll \omega_r$, by approximating the resonance in Eq. (21.21) as a delta function:

$$D^{res} \simeq \frac{\pi e^2}{\epsilon_0 m_e^2}\int_0^{\infty}dk\,\mathcal{E}_k\,\delta(\omega_r - kv). \tag{21.22}$$
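The delta-function approximation (21.22) can be checked numerically by evaluating the second, Lorentzian form of Eq. (21.21) for shrinking $\omega_i$. The sketch below is illustrative only. It assumes normalized units ($e = \epsilon_0 = m_e = 1$ and $\omega_r = 1$, independent of $k$), a hypothetical Gaussian spectrum $\mathcal{E}_k$ peaked at $k = 0.5$, and $\omega_i > 0$ (growing waves); the function names are hypothetical. For $v > 0$ the delta function reduces to $\delta(1 - kv) = \delta(k - 1/v)/v$.

```python
import math

# Normalized units (assumption): e = eps0 = m_e = 1 and omega_r = 1 for every k,
# so the prefactor e^2 / (eps0 m_e^2) in Eqs. (21.21)-(21.22) equals 1.


def spectrum(k, k0=0.5, width=0.1):
    """Hypothetical smooth wave spectrum E_k: a Gaussian bump in k."""
    return math.exp(-0.5 * ((k - k0) / width) ** 2)


def d_lorentzian(v, omega_i, k_max=1.2, n=120_000):
    """Second form of Eq. (21.21), trapezoid rule on [0, k_max] (spectrum ~ 0 beyond)."""
    dk = k_max / n
    total = 0.0
    for j in range(n + 1):
        k = j * dk
        term = spectrum(k) * omega_i / ((1.0 - k * v) ** 2 + omega_i ** 2)
        total += 0.5 * term if j in (0, n) else term
    return total * dk


def d_resonant(v):
    """Eq. (21.22) with delta(1 - k v) = delta(k - 1/v) / v, valid for v > 0."""
    return math.pi * spectrum(1.0 / v) / v


v = 2.0
for omega_i in (4e-3, 1e-3):
    approx, limit = d_lorentzian(v, omega_i), d_resonant(v)
    print(f"omega_i = {omega_i:g}: relative difference {abs(approx - limit) / limit:.2e}")
```

The relative difference shrinks roughly in proportion to $\omega_i$, which is the sense in which the resonance becomes a delta function when $|\omega_i| \ll \omega_r$.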

In the diffusion equation (21.20), the resonant piece $D^{res}$ influences $F_0(v)$ only at velocities $v$ where there reside resonating waves with substantial wave energy, i.e. out on the tail of the electron velocity distribution, under the dotted $\mathcal{E}_k$ curve of Fig. 21.1. We shall refer to electrons in this region as the resonant electrons. In Sec. 21.4 below, we will explore the dynamical influence of this resonant diffusion coefficient on the velocity distribution $F_0(v)$ of the resonant electrons.

The vast majority of the electrons reside at velocities $|v| \lesssim v_{rms}$, where there are no waves (because waves there get damped very quickly). For these nonresonant electrons the denominator in Eq. (21.21) for the diffusion coefficient is approximately equal to $\omega_r^2 \simeq \omega_p^2 = e^2 n/\epsilon_0 m_e$, and correspondingly the diffusion coefficient has the form

$$D^{non\text{-}res} \simeq \frac{1}{n m_e}\int_0^{\infty}\omega_i\,\mathcal{E}_k\,dk. \tag{21.23}$$

The nonresonant electrons at $|v| \lesssim v_{rms}$ are the ones that participate in the wave motions and account for the waves' oscillating charge density. The time-averaged kinetic energy of these nonresonant electrons must therefore include a conserved piece not associated with the waves, plus a wave piece that is equal to the waves' electrical energy, and thus to half the waves' total energy, $\frac{1}{2}\int_0^\infty dk\,\mathcal{E}_k$, and which therefore must change at a rate $2\omega_i$ times that energy. Correspondingly, we expect the nonresonant piece of $D(v)$ to produce a change of the time-averaged electron energy given by

$$\frac{\partial U_e}{\partial t} + \frac{\partial F_{ez}}{\partial z} = \frac{1}{2}\int_0^{\infty}dk\,2\omega_i\,\mathcal{E}_k, \tag{21.24}$$

where $F_{ez}$ is the electron energy flux. Indeed, this is the case; see Ex. 21.1. Because we have already accounted for this electron contribution to the wave energy in our definition of $\mathcal{E}_k$, we shall ignore it henceforth in the evolution of $F_0(v)$; correspondingly, for weakly damped or growing waves we shall focus solely on the resonant part of the diffusion coefficient, Eq. (21.22).
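The normalization of the spectral energy density in Eqs. (21.16) and (21.17) can be checked numerically. The following is a minimal sketch under stated assumptions: a single realization of a real field $E(z)$ built from box-periodic Fourier modes with random amplitudes and random phases (mimicking the RPA), the SI value of $\epsilon_0$, and the discrete approximation $\tilde E(k) \approx \Delta z\sum_j E(z_j)e^{-ikz_j}$. The helper names `random_phase_field` and `spectral_energy_density` are hypothetical. Because $\mathcal{E}_k$ also counts the electrons' oscillatory kinetic energy, $\int_0^\infty dk\,\mathcal{E}_k$ should come out as twice the electrical energy density $\epsilon_0\langle E^2\rangle/2$.

```python
import cmath
import math
import random

EPS0 = 8.8541878128e-12  # vacuum permittivity [F/m]


def random_phase_field(L, N, n_modes, seed=0):
    """Sample E(z) on N grid points: box-periodic modes with random amplitudes and phases."""
    rng = random.Random(seed)
    modes = [(rng.uniform(0.5, 1.5), rng.uniform(0.0, 2 * math.pi)) for _ in range(n_modes)]
    z = [j * L / N for j in range(N)]
    E = [sum(a * math.cos(2 * math.pi * (m + 1) * zj / L + phi)
             for m, (a, phi) in enumerate(modes)) for zj in z]
    return z, E


def spectral_energy_density(z, E, L):
    """Return [(k, calE_k)] for 0 < k < k_Nyquist, using Eq. (21.17) for one realization."""
    N, dz = len(E), L / len(E)
    out = []
    for m in range(1, N // 2):
        k = 2 * math.pi * m / L
        E_tilde = dz * sum(Ej * cmath.exp(-1j * k * zj) for zj, Ej in zip(z, E))
        out.append((k, EPS0 * abs(E_tilde) ** 2 / (math.pi * L)))
    return out


L, N = 0.01, 128                               # a 1 cm averaging box (illustrative)
z, E = random_phase_field(L, N, n_modes=20)
dk = 2 * math.pi / L                           # spacing of the box-periodic wave numbers
wave_energy = sum(calE for _, calE in spectral_energy_density(z, E, L)) * dk
electric_energy = 0.5 * EPS0 * sum(e * e for e in E) / N
print(wave_energy / electric_energy)           # 2, up to round-off
```

Because every mode is an exact harmonic of the box, the discrete Parseval identity makes the ratio exact for each realization. The RPA matters for a different property: that $\mathcal{E}_k$ at an individual $k$ becomes a smooth, well-defined function once $L$ is large.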

21.2.2 Summary of Quasilinear Theory

All the fundamental equations of quasilinear theory are now in hand. They are:

(i) the general dispersion relation (20.34), (20.35) for the waves' frequency $\omega_r(k)$ and growth rate $\omega_i(k)$ [which, for the high-speed Langmuir waves on which we are focusing, reduces to Eqs. (21.11), (21.12)]; this dispersion relation depends on the electrons' slowly evolving, time-averaged velocity distribution $F_0(v,z,t)$;
(ii) the equation of motion (21.19) for the waves' slowly evolving spectral energy density $\mathcal{E}_k(k,z,t)$, in which $\omega_r(k)$ and $\omega_i$ appear;
(iii) Eq. (21.21) or (21.22) for the diffusion coefficient $D(v)$ in terms of $\mathcal{E}_k$;
(iv) the diffusive evolution equation (21.20) for the slow evolution of $F_0(v,z,t)$.

The fundamental functions in this theory are $\mathcal{E}_k(k,z,t)$ for the waves and $F_0(v,z,t)$ for the electrons. Quasilinear theory sweeps under the rug the details of the oscillating electric field $E(z,t)$ and of the oscillating part of the distribution function, $F_1(v,z,t)$. Those quantities were needed in deriving the quasilinear equations, but they are needed no longer, except sometimes as an aid to physical understanding.
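The four ingredients listed above can be wired together into a toy solver. This is a minimal sketch, not the authors' method, and it rests on several assumptions: a spatially homogeneous plasma (so the $\partial/\partial z$ terms drop out), normalized units $e = \epsilon_0 = m_e = n = 1$ (hence $\omega_p = 1$), $\omega_r = \omega_p$ for the high-phase-speed waves, only the resonant diffusion coefficient (21.22), and the growth rate $\omega_i = (\pi e^2\omega_r/2\epsilon_0 m_e k^2)\,F_0'(\omega_r/k)$, which is the one-dimensional reduction of Eq. (21.43) below. Because each wave number resonates with exactly one velocity, $k = \omega_p/v$, the spectrum is stored on the velocity grid. The bump-on-tail initial state and all numerical parameters are illustrative, and `quasilinear_relaxation` is a hypothetical name. The conservation laws of Sec. 21.2.3 supply a built-in test.

```python
import math


def quasilinear_relaxation(t_end=25.0, n_cells=60, v_lo=3.0, v_hi=9.0):
    """Toy homogeneous 1-D quasilinear evolution of a bump-on-tail distribution.

    Normalized units e = eps0 = m_e = n = 1 (so omega_p = 1); omega_r = omega_p.
    Returns (start, end) tuples of (number, electron momentum, electron energy,
    wave momentum, wave energy) for the simulated velocity window.
    """
    dv = (v_hi - v_lo) / n_cells
    vc = [v_lo + (j + 0.5) * dv for j in range(n_cells)]   # cell centres: F0 lives here
    vf = [v_lo + j * dv for j in range(1, n_cells)]        # interior faces: resonant waves
    beam = 0.01 / (0.5 * math.sqrt(2 * math.pi))           # 1% beam at v = 6, spread 0.5
    F = [math.exp(-v * v / 2) / math.sqrt(2 * math.pi) + beam * math.exp(-(v - 6.0) ** 2 / 0.5)
         for v in vc]
    w = [1e-6] * len(vf)    # E_k at the resonant wave number k = omega_p / v (seed level)

    def totals():
        return (sum(F) * dv,
                sum(v * f for v, f in zip(vc, F)) * dv,
                sum(0.5 * v * v * f for v, f in zip(vc, F)) * dv,
                sum(x / v ** 3 for v, x in zip(vf, w)) * dv,   # int dk E_k k / omega_r
                sum(x / v ** 2 for v, x in zip(vf, w)) * dv)   # int dk E_k, dk = dv / v^2

    start, t = totals(), 0.0
    while t < t_end:
        slope = [(F[j + 1] - F[j]) / dv for j in range(len(vf))]    # dF0/dv on faces
        D = [math.pi * x / v for v, x in zip(vf, w)]                # Eq. (21.22)
        dt = min(2e-3, 0.25 * dv * dv / max(D), t_end - t)          # explicit stability
        flux = [0.0] + [d * s for d, s in zip(D, slope)] + [0.0]    # zero-flux ends
        F = [f + dt * (flux[j + 1] - flux[j]) / dv for j, f in enumerate(F)]  # Eq. (21.20)
        # Eq. (21.19), homogeneous: E_k grows as exp(2 omega_i t), 2 omega_i = pi v^2 dF0/dv
        w = [x * math.exp(math.pi * v * v * s * dt) for v, x, s in zip(vf, w, slope)]
        t += dt
    return start, totals()


(n0, p0, e0, pw0, ew0), (n1, p1, e1, pw1, ew1) = quasilinear_relaxation()
print(f"wave energy grew by a factor {ew1 / ew0:.3g}")
print(f"energy:   electrons {e1 - e0:+.4e}, waves {ew1 - ew0:+.4e}")
print(f"momentum: electrons {p1 - p0:+.4e}, waves {pw1 - pw0:+.4e}")
```

The flux form keeps the electron number exact to round-off. Pairing face-centred waves with cell-centred $F_0$ makes the energy and momentum exchanges cancel exactly before time discretization, so the small residual imbalance comes only from the finite step in the exponential wave update.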

21.2.3 Conservation Laws

It is instructive to verify that the quasilinear equations enforce the conservation of particles, momentum, and energy. We begin with particles (electrons). The number density of electrons is $n = \int F_0\,dv$ and the $z$-component of particle flux is $S_z = \int vF_0\,dv$ (where here and below all velocity integrals go from $-\infty$ to $+\infty$). Therefore, by integrating the diffusive evolution equation (21.20) for $F_0$ over velocity, we obtain

$$\frac{\partial n}{\partial t} + \frac{\partial S_z}{\partial z} = \int\left(\frac{\partial F_0}{\partial t} + v\frac{\partial F_0}{\partial z}\right)dv = \int\frac{\partial}{\partial v}\left(D\frac{\partial F_0}{\partial v}\right)dv = 0, \tag{21.25}$$

which is the law of particle conservation for our one-dimensional situation, where nothing depends on $x$ or $y$.

The $z$ component of electron momentum density is $G^e_z \equiv \int m_e vF_0\,dv$ and the $zz$ component of electron momentum flux (stress) is $T^e_{zz} = \int m_e v^2F_0\,dv$; so, evaluating the first moment of the evolution equation (21.20) for $F_0$, we obtain

$$\frac{\partial G^e_z}{\partial t} + \frac{\partial T^e_{zz}}{\partial z} = m_e\int v\left(\frac{\partial F_0}{\partial t} + v\frac{\partial F_0}{\partial z}\right)dv = m_e\int v\frac{\partial}{\partial v}\left(D\frac{\partial F_0}{\partial v}\right)dv = -m_e\int D\frac{\partial F_0}{\partial v}\,dv, \tag{21.26}$$

where we have used integration by parts in the last step. The waves influence the momentum of the resonant electrons through the delta-function part of the diffusion coefficient $D$ [Eq. (21.22)], and the momentum of the nonresonant electrons through the remainder of the diffusion coefficient [the difference between Eqs. (21.21) and (21.22); cf. Ex. 21.1]. Because we have included the evolving part of the nonresonant electrons' momentum and energy as part of the waves' momentum and energy, we must restrict attention in Eq. (21.26) to the resonant electrons; cf. the last sentence in Sec. 21.2.1. We therefore insert the delta-function part of $D$ [Eq. (21.22)] into Eq. (21.26), thereby obtaining

$$\frac{\partial G^e_z}{\partial t} + \frac{\partial T^e_{zz}}{\partial z} = -\frac{\pi e^2}{\epsilon_0 m_e}\int dv\,\frac{\partial F_0}{\partial v}\int dk\,\mathcal{E}_k\,\delta(\omega_r - kv) = -\frac{\pi e^2}{\epsilon_0 m_e}\int\frac{dk}{k}\,\mathcal{E}_k\,F_0'(\omega_r/k). \tag{21.27}$$

Here we have exchanged the order of integration, integrated out $v$, and set $F_0'(v) \equiv \partial F_0/\partial v$. Assuming, for definiteness, high-speed Langmuir waves, we can rewrite the last expression in terms of $\omega_i$ with the aid of Eq. (21.12):

$$\frac{\partial G^e_z}{\partial t} + \frac{\partial T^e_{zz}}{\partial z} = -2\int dk\,\omega_i\,\frac{\mathcal{E}_k k}{\omega_r} = -\frac{\partial}{\partial t}\int dk\,\frac{\mathcal{E}_k k}{\omega_r} - \frac{\partial}{\partial z}\int dk\,\frac{\mathcal{E}_k k}{\omega_r}\frac{\partial\omega_r}{\partial k} = -\frac{\partial G^w_z}{\partial t} - \frac{\partial T^w_{zz}}{\partial z}. \tag{21.28}$$

The second equality follows from Eq. (21.19). In the two terms after the second equals sign, $\int dk\,\mathcal{E}_k k/\omega_r = G^w_z$ is the waves' density of $z$-component of momentum (as one can see from the fact that each plasmon carries a momentum $p_z = \hbar k$ and an energy $\hbar\omega_r$; cf. Sec. 21.3 below). Similarly, since the waves' momentum and energy travel with the group velocity $d\omega_r/dk$, $\int dk\,\mathcal{E}_k(k/\omega_r)(\partial\omega_r/\partial k) = T^w_{zz}$ is the waves' flux of momentum. Evidently, Eq. (21.28) represents the conservation of total momentum: that of the resonant electrons plus that of the waves (which includes the evolving part of the nonresonant electron momentum). Energy conservation can be handled in a similar fashion; see Ex. 21.2.

21.2.4 Generalization to Three Dimensions

We have so far restricted our attention to Langmuir waves propagating in one direction, $+\mathbf e_z$. The generalization to three dimensions is straightforward: the waves' wave number $k$ gets replaced by the wave vector $\mathbf k = k\hat{\mathbf k}$, where $\hat{\mathbf k}$ is a unit vector in the direction of the phase velocity. The waves' spectral energy density becomes $\mathcal{E}_{\mathbf k}$, which depends on $\mathbf k$, varies slowly in space $\mathbf x$ and time $t$, and is related to the waves' total energy density by

$$U_w = \int\mathcal{E}_{\mathbf k}\,dV_k, \tag{21.29}$$

where $dV_k \equiv dk_x\,dk_y\,dk_z$ is the volume element in wave-vector space.

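Because the plasma is isotropic, the dispersion relation $\omega(\mathbf k) = \omega(|\mathbf k|)$ has the same form as in the one-dimensional case, and the group and phase velocities point in the same direction $\hat{\mathbf k}$: $\mathbf V_{ph} = (\omega/k)\hat{\mathbf k}$, $\mathbf V_g = (d\omega_r/dk)\hat{\mathbf k}$. The evolution equation (21.19) for the waves' spectral energy density moving along a ray (a wave-packet trajectory) becomes

$$\frac{d\mathcal{E}_{\mathbf k}}{dt} \equiv \frac{\partial\mathcal{E}_{\mathbf k}}{\partial t} + \left(\frac{dx_j}{dt}\right)_{wp}\frac{\partial\mathcal{E}_{\mathbf k}}{\partial x_j} + \left(\frac{dk_j}{dt}\right)_{wp}\frac{\partial\mathcal{E}_{\mathbf k}}{\partial k_j} = 2\omega_i\mathcal{E}_{\mathbf k}. \tag{21.30}$$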

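The equation for the wave-packet trajectory remains (21.18), unchanged. The diffusion coefficient acts only along the direction of the waves; i.e., its $\hat{\mathbf k}\otimes\hat{\mathbf k}$ component has the same form (21.22) as in the one-dimensional case and components orthogonal to $\hat{\mathbf k}$ vanish, so

$$\mathsf D = \frac{\pi e^2}{\epsilon_0 m_e^2}\int\mathcal{E}_{\mathbf k}\,\hat{\mathbf k}\otimes\hat{\mathbf k}\,\delta(\omega_r - \mathbf k\cdot\mathbf v)\,dV_k. \tag{21.31}$$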
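To see concretely why the net $\mathsf D$ of Eq. (21.31) need not be unidirectional, the sketch below sums its integrand over a few discrete wave-vector cells. Assumptions: normalized units ($e = \epsilon_0 = m_e = 1$, $\omega_r = 1$), each cell represented by a pair (wave vector, $\mathcal{E}_{\mathbf k}\Delta V_k$), and a narrow normalized Gaussian standing in for the Dirac delta. The function names are hypothetical.

```python
import math


def delta_approx(x, width):
    """Narrow normalized Gaussian used as a stand-in for the Dirac delta (assumption)."""
    return math.exp(-0.5 * (x / width) ** 2) / (width * math.sqrt(2.0 * math.pi))


def diffusion_tensor(v, cells, omega_r=1.0, width=0.05):
    """Discretized Eq. (21.31) in units e = eps0 = m_e = 1.

    cells: list of (k_vector, E_k * dV_k) pairs, one per wave-vector cell.
    """
    D = [[0.0] * 3 for _ in range(3)]
    for k_vec, energy in cells:
        k = math.sqrt(sum(c * c for c in k_vec))
        khat = [c / k for c in k_vec]
        k_dot_v = sum(a * b for a, b in zip(k_vec, v))
        weight = math.pi * energy * delta_approx(omega_r - k_dot_v, width)
        for i in range(3):
            for j in range(3):
                D[i][j] += weight * (khat[i] * khat[j])   # khat (x) khat projector
    return D


v = (2.0, 0.0, 0.0)
along_x = [((0.5, 0.0, 0.0), 1e-3)]              # k . v = 1: resonant
both = along_x + [((0.5, 0.3, 0.0), 1e-3)]       # also k . v = 1, different direction
for label, cells in (("one direction", along_x), ("two directions", both)):
    print(label, [[round(x, 6) for x in row] for row in diffusion_tensor(v, cells)])
```

A single resonant mode gives a rank-one tensor along $\hat{\mathbf k}$; adding a second resonant mode in another direction fills in the off-diagonal and $yy$ components, while directions orthogonal to every resonant $\hat{\mathbf k}$ (here $z$) still see no diffusion.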

Because the waves will generally propagate in a variety of directions, the net $\mathsf D$ is not unidirectional. This diffusion coefficient enters into the obvious generalization of the evolution equation (21.20) for the averaged distribution function $f_0(\mathbf v)$, on which we shall suppress the subscript 0 for ease of notation:

$$\frac{\partial f}{\partial t} + \mathbf v\cdot\nabla f = \nabla_v\cdot(\mathsf D\cdot\nabla_v f). \tag{21.32}$$

Here $\nabla_v$ is the gradient in velocity space, i.e. in index notation $\partial/\partial v_j$.

****************************

EXERCISES

Exercise 21.1 Problem: Non-resonant Particle Energy in Wave
Show that the nonresonant part of the diffusion coefficient in velocity space, Eq. (21.23), produces a rate of change of electron kinetic energy given by Eq. (21.24).

Exercise 21.2 Problem: Energy Conservation
Show that the quasilinear evolution equations guarantee conservation of total energy, that of the resonant electrons plus that of the waves. Pattern your analysis after that for momentum, Eqs. (21.26)–(21.28).

****************************


21.3 Quasilinear Theory in Quantum Mechanical Language

21.3.1 Wave-Particle Interactions

The attentive reader will have noticed a familiar structure to our quasilinear theory. It is reminiscent of the geometric-optics formalism that we introduced in Chap. 6. Here, as there, we can reinterpret the formalism in terms of quanta carried by the waves. At the most fundamental level, we could (second) quantize the field of plasma waves into quanta (usually called plasmons) and describe their creation and annihilation using quantum mechanical transition probabilities. However, there is no need to go through the rigors of the quantization procedure, since the basic concepts of creation, annihilation, and transition probabilities should already be familiar to most readers in the context of photons coupled to atomic systems. Those concepts can be carried over essentially unchanged to plasmons, and by doing so we shall recover our quasilinear theory rewritten in quantum language.

A major role in the quantum theory is played by the occupation number for electrostatic wave modes, which are just the single-particle states of the quantum theory. Our electrostatic waves have spin zero, since there is no polarization freedom (the direction of $\mathbf E$ is unique: it must point along $\mathbf k$). In other words, there is only one polarization state for each $\mathbf k$, so the number of modes (i.e., of quantum states) in a volume $dV_x dV_k = dx\,dy\,dz\,dk_x\,dk_y\,dk_z$ of phase space is $dN_{states} = dV_x dV_k/(2\pi)^3$, and correspondingly the number density of states in phase space is $dN_{states}/dV_x dV_k = 1/(2\pi)^3$; cf. Sec. 2.3. The density of energy in phase space is $\mathcal{E}_{\mathbf k}$, and the energy of an individual plasmon is $\hbar\omega_r$, so the number density of plasmons in phase space is $dN/dV_x dV_k = \mathcal{E}_{\mathbf k}/\hbar\omega_r$. Therefore, the states' occupation number is given by

$$\eta(\mathbf k,\mathbf x,t) = \frac{dN/dV_x dV_k}{dN_{states}/dV_x dV_k} = \frac{dN}{dV_x dV_k/(2\pi)^3} = \frac{(2\pi)^3\mathcal{E}_{\mathbf k}}{\hbar\omega_r}. \tag{21.33}$$
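Equation (21.33) is a pure change of normalization, so it is easy to evaluate. The sketch below uses SI (CODATA) constants and an illustrative plasma with $n_e = 10^{18}\,\mathrm{m^{-3}}$ and $T_e = 10^4$ K. It converts a spectral energy density to an occupation number and then inverts the Bose–Einstein form $\eta = (e^{\hbar\omega/k_B T_B} - 1)^{-1}$ that is used for the brightness temperature later in this section. The thermal level $\mathcal{E}_{\mathbf k} = k_B T_e/(2\pi)^3$ (energy $k_B T_e$ per mode, with $1/(2\pi)^3$ modes per unit $\mathbf x$, $\mathbf k$ phase-space volume) is an equipartition assumption made only for illustration; the function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34       # reduced Planck constant [J s]
K_B = 1.380649e-23           # Boltzmann constant [J/K]
E_CHARGE = 1.602176634e-19   # elementary charge [C]
M_E = 9.1093837015e-31       # electron mass [kg]
EPS0 = 8.8541878128e-12      # vacuum permittivity [F/m]


def occupation_number(calE_k, omega_r):
    """Eq. (21.33): eta = (2 pi)^3 E_k / (hbar omega_r)."""
    return (2 * math.pi) ** 3 * calE_k / (HBAR * omega_r)


def brightness_temperature(eta, omega_r):
    """Invert eta = 1 / (exp(hbar omega / k_B T_B) - 1) for T_B."""
    return HBAR * omega_r / (K_B * math.log1p(1.0 / eta))


n_e, T_e = 1e18, 1e4                                   # illustrative plasma [m^-3], [K]
omega_p = math.sqrt(n_e * E_CHARGE ** 2 / (EPS0 * M_E))
calE_k = K_B * T_e / (2 * math.pi) ** 3                # assumed thermal level
eta = occupation_number(calE_k, omega_p)
print(f"eta = {eta:.3g}, T_B = {brightness_temperature(eta, omega_p):.6g} K")
```

The occupation number comes out of order $10^4$, so the classical limit $\eta \gg 1$ applies and $k_B T_B \simeq \eta\hbar\omega_r = (2\pi)^3\mathcal{E}_{\mathbf k}$, in which $\hbar$ cancels.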

The quantity $\eta$ in Eq. (21.33) is actually the mean occupation number; the occupation numbers of individual states will fluctuate statistically around this mean. In this chapter (as in most of our treatment of statistical physics in Part 1 of this book), we will not deal with the individual occupation numbers, since quasilinear theory is oblivious to them and deals only with the mean. Thus, without any danger of ambiguity, we shall simplify our terminology by suppressing the word "mean". Equation (21.33) says that $\eta(\mathbf k,\mathbf x,t)$ and $\mathcal{E}_{\mathbf k}$ are the same quantity, aside from normalization. In the classical formulation of quasilinear theory we use $\mathcal{E}_{\mathbf k}$; in the equivalent quantum formulation we use $\eta$. We can think of $\eta$ equally well as a function of the state's wave vector $\mathbf k$ or of the momentum $\mathbf p = \hbar\mathbf k$ of the individual plasmons that reside in the state. The motion of these individual plasmons is governed by Hamilton's equations with the Hamiltonian determined by their dispersion relation via $H(\mathbf p,\mathbf x,t) = \hbar\omega_r(\mathbf k = \mathbf p/\hbar,\mathbf x,t)$; see Sec. 6.3.2. The plasmon trajectories in phase space are, of course, identical to the wave-packet trajectories (the rays) of the classical wave theory.

The third expression in Eq. (21.33) allows us to think of $\eta$ as the number density in $\mathbf x$, $\mathbf k$ phase space, with the relevant phase-space volume renormalized from $dV_k$ to $dV_k/(2\pi)^3 = (dk_x/2\pi)(dk_y/2\pi)(dk_z/2\pi)$. The factors of $2\pi$ appearing here are the same ones as appear in the relationship between a spatial Fourier transform and its inverse [e.g., Eq. (21.7)]. We shall see the quantity $dV_k/(2\pi)^3$ appearing over and over again in the quantum mechanical theory, and it can generally be traced to that Fourier transform relationship.

[Figure 21.2: two Feynman diagrams, with momentum labels $\mathbf p$, $\mathbf p - \hbar\mathbf k$, and $\hbar\mathbf k$.]

Fig. 21.2: Feynman diagrams showing creation and annihilation of a plasmon (with momentum $\hbar\mathbf k$) by an electron (with momentum $\mathbf p$).

Resonant interactions between the waves and resonant electrons cause $\mathcal{E}_{\mathbf k}$ to evolve as $d\mathcal{E}_{\mathbf k}/dt = 2\omega_i\mathcal{E}_{\mathbf k}$ [Eq. (21.30)]. Therefore, the plasmon occupation number will also vary as

$$\frac{d\eta}{dt} \equiv \frac{\partial\eta}{\partial t} + \frac{dx_j}{dt}\frac{\partial\eta}{\partial x_j} + \frac{dp_j}{dt}\frac{\partial\eta}{\partial p_j} = 2\omega_i\eta. \tag{21.34}$$

Here the plasmon velocity $dx_j/dt$, as deduced from Hamilton's equations, is equal (of course) to a wave packet's group velocity $V_{g\,j}$, and the force acting on a plasmon, $dp_j/dt$, as deduced from Hamilton's equations, is $\hbar$ times the ray equations' $dk_j/dt = -\partial\omega_r/\partial x_j$.

The fundamental process that we are dealing with in Eq. (21.34) is the creation (or annihilation) of a plasmon by an electron; cf. Fig. 21.2. The kinematics of this process is simple: energy and momentum must be conserved in the interaction. In plasmon creation (left diagram), the plasma gains one quantum of energy $\hbar\omega_r$, so the electron must lose this same energy: $\hbar\omega_r = -\Delta(m_e v^2/2) \simeq -\Delta\mathbf p\cdot\mathbf v$, where $\Delta\mathbf p$ is the electron's change of momentum and $\mathbf v$ is its velocity, and in "$\simeq$" we assume that the electron's fractional change of energy is small (an assumption inherent in quasilinear theory). Since the plasma momentum change, $\hbar\mathbf k$, is minus the electron momentum change, we conclude that

$$\hbar\omega_r = -\Delta(m_e v^2/2) \simeq -\Delta\mathbf p\cdot\mathbf v = \hbar\mathbf k\cdot\mathbf v. \tag{21.35}$$
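The "$\simeq$" in Eq. (21.35) neglects electron recoil. Keeping it, exact energy bookkeeping with $\mathbf p \to \mathbf p - \hbar\mathbf k$ gives $\omega_r = \mathbf k\cdot\mathbf v - \hbar k^2/2m_e$, so the fractional correction for emission along $\mathbf v$ is $\hbar k/2m_e v$. The sketch below (SI constants; illustrative $v = 10^7$ m/s and $k = 5.6\times10^3\,\mathrm{m^{-1}}$, roughly $\omega_p/v$ for $n_e = 10^{18}\,\mathrm{m^{-3}}$) shows how small that correction is. The helper name `emitted_frequency` is hypothetical.

```python
HBAR = 1.054571817e-34    # reduced Planck constant [J s]
M_E = 9.1093837015e-31    # electron mass [kg]


def emitted_frequency(k, v, cos_alpha=1.0, include_recoil=True):
    """Plasmon frequency allowed by energy-momentum conservation.

    With p -> p - hbar*k exactly, omega = k.v - hbar k^2 / (2 m_e);
    Eq. (21.35) keeps only the first term.
    """
    omega = k * v * cos_alpha
    if include_recoil:
        omega -= HBAR * k * k / (2.0 * M_E)
    return omega


k, v = 5.6e3, 1.0e7       # illustrative: k ~ omega_p / v for n_e ~ 1e18 m^-3
rel = 1.0 - emitted_frequency(k, v) / emitted_frequency(k, v, include_recoil=False)
print(f"fractional recoil correction: {rel:.2e}")   # analytically hbar k / (2 m_e v)
```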

Equation (21.35) is just the resonance condition contained in the delta function (21.31). Thus, energy and momentum conservation in the fundamental plasmon creation process imply that the electron producing the plasmon must resonate with the plasmon's mode; i.e., the component of the electron's velocity along $\mathbf k$ must be the same as the mode's phase speed, so the electron must “surf” with the wave mode in which it is creating the plasmon, always remaining in the same trough or crest of the mode.

A fundamental quantity in the quantum description is the probability per unit time for an electron with velocity $\mathbf v$ to spontaneously emit a Langmuir plasmon into a volume $\Delta V_k$ in $\mathbf k$-space; i.e., the number of $\mathbf k$ plasmons emitted by a single $\mathbf v$ electron per unit time into the volume $\Delta V_k$ centered on some wave vector $\mathbf k$. This probability is expressed in the following form:

$$\frac{dN_{plasmons}}{dt} \equiv W(\mathbf v,\mathbf k)\,\frac{\Delta V_k}{(2\pi)^3}. \tag{21.36}$$

In a volume $\Delta V_x$ of physical space and $\Delta V_v$ of electron velocity space, there are $f(\mathbf v)\Delta V_x\Delta V_v$ electrons; and Eq. (21.36) tells us that these electrons increase the number of plasmons in $\Delta V_x$ and $\Delta V_k$ by

$$\frac{dN_{plasmons}}{dt} = W(\mathbf v,\mathbf k)\,f(\mathbf v)\,\Delta V_x\Delta V_v\,\frac{\Delta V_k}{(2\pi)^3}. \tag{21.37}$$

Dividing by $\Delta V_x$ and by $\Delta V_k/(2\pi)^3$, we obtain for the rate of change of the plasmon occupation number [cf. the third expression in Eq. (21.33)]

$$\frac{d\eta(\mathbf k)}{dt} = W(\mathbf v,\mathbf k)\,f(\mathbf v)\,\Delta V_v. \tag{21.38}$$

Integrating over all of velocity space, we obtain a final expression for the influence of spontaneous plasmon emission on the plasmon occupation number:

$$\frac{d\eta(\mathbf k)}{dt} = \int W(\mathbf v,\mathbf k)\,f(\mathbf v)\,dV_v. \tag{21.39}$$

Our introduction of the factor $(2\pi)^3$ in the definition (21.36) of $W(\mathbf v,\mathbf k)$ was designed to avoid a factor $(2\pi)^3$ in this equation for the evolution of the occupation number. Below, we shall deduce the fundamental emission rate $W$ for high-speed Langmuir plasmons by comparison with our classical formulation of quasilinear theory.

Because the plasmons have spin zero, they obey Bose–Einstein statistics, which means that the rate for induced emission of plasmons is larger than that for spontaneous emission by the occupation number $\eta$ of the state that receives the plasmons. Furthermore, the principle of detailed balance (unitarity in quantum mechanical language) tells us that $W$ is also the relevant transition probability for the inverse process of absorption of a plasmon in a transition between the same two electron momentum states (right diagram in Fig. 21.2). This permits us to write down a master equation for the evolution of the plasmon occupation number in a homogeneous plasma:

$$\frac{d\eta}{dt} = \int dV_v\,W(\mathbf v,\mathbf k)\left\{f(\mathbf v)[1 + \eta(\mathbf k)] - f(\mathbf v - \hbar\mathbf k/m_e)\,\eta(\mathbf k)\right\}. \tag{21.40}$$

The first term in the square brackets in the integrand is the contribution from spontaneous emission, the second term is induced emission, and the final term (after the square brackets) is absorption. The master equation (21.40) is actually the evolution law (21.34) for $\eta$ in disguise, with $\omega_i$ written in a fundamental quantum mechanical form. To make contact with Eq. (21.34), we first notice that in our classical development of quasilinear theory we neglected spontaneous emission, so we drop it from Eq. (21.40).

[Figure 21.3: Cerenkov emission geometry; an electron at point P moves with velocity $\mathbf v$, and the emitted wave vector $\mathbf k$ (phase velocity $V_{ph}$) makes an angle $\alpha$ with $\mathbf v$.]

Fig. 21.3: Emission geometry for Cerenkov emission. An electron at P moving with speed $v$ emits waves with phase speed $V_{ph} < v$ along a direction that makes an angle $\alpha = \cos^{-1}(V_{ph}/v)$ with the direction of the electron's motion.

In the absorption term, the momentum of the plasmon is so much smaller than the electron momentum that we can make a Taylor expansion

$$f(\mathbf v - \hbar\mathbf k/m_e) \simeq f(\mathbf v) - (\hbar/m_e)(\mathbf k\cdot\nabla_v)f. \tag{21.41}$$

Inserting this into Eq. (21.40) and removing the spontaneous-emission term, we obtain

$$\frac{d\eta}{dt} \simeq \eta\,\frac{\hbar}{m_e}\int W\,(\mathbf k\cdot\nabla_v)f\,dV_v. \tag{21.42}$$

For comparison, Eq. (21.34), with $\omega_i$ given by the classical high-speed Langmuir relation (21.12) and converted to three-dimensional notation, says that

$$\frac{d\eta}{dt} = \eta\,\frac{\pi e^2\omega_r}{\epsilon_0 k^2 m_e}\int\delta(\omega_r - \mathbf k\cdot\mathbf v)\,\mathbf k\cdot\nabla_v f\,dV_v. \tag{21.43}$$

Comparing Eqs. (21.42) and (21.43), we see that the fundamental quantum emission rate for plasmons is

$$W = \frac{\pi e^2\omega_r}{\epsilon_0 k^2\hbar}\,\delta(\omega_r - \mathbf k\cdot\mathbf v). \tag{21.44}$$

Note that this emission rate is inversely proportional to $\hbar$ and is therefore a very large number under the classical conditions of our quasilinear theory.

This computation has shown that the classical absorption rate $-\omega_i$ is the difference between the quantum mechanical absorption rate and induced emission rate. Under normal conditions, when $\mathbf k\cdot\nabla_v f < 0$ ($\partial F_0/\partial v < 0$ in one-dimensional language), absorption dominates over emission, so the absorption rate $-\omega_i$ is positive, describing Landau damping. However (as we saw in Chap. 20), when this inequality is reversed, there can be wave growth (subject, of course, to there being a suitable mode into which the plasmons can be emitted, as guaranteed when the Penrose criterion is fulfilled).

Although spontaneous emission was absent from our classical development of quasilinear theory, it nevertheless can be a classical process and therefore must be added to the quasilinear formalism. Classically or quantum mechanically, the spontaneous emission is a form of Cerenkov radiation, since (as for Cerenkov light emitted by electrons moving through a dielectric medium) the plasmons are produced when an electron moves through the plasma faster than the waves' phase velocity. More specifically, only when $v > V_{ph} = \omega_r/k$ can there be an angle $\alpha$ of $\mathbf k$ relative to $\mathbf v$ along which the resonance condition is satisfied, $\mathbf v\cdot\hat{\mathbf k} = v\cos\alpha = \omega_r/k$. The plasmons are emitted at this angle $\alpha$ to the electron's direction of motion; cf. Fig. 21.3. The spontaneous Cerenkov emission rate

$$\left(\frac{d\eta}{dt}\right)_s = \int W(\mathbf v,\mathbf k)\,f(\mathbf v)\,dV_v \tag{21.45}$$

[Eq. (21.39) integrated over electron velocity] takes the following form when we use the Langmuir expression (21.44) for $W$:

$$\left(\frac{d\eta}{dt}\right)_s = \frac{\pi e^2\omega_r}{\epsilon_0\hbar k^2}\int f(\mathbf v)\,\delta(\omega_r - \mathbf k\cdot\mathbf v)\,dV_v. \tag{21.46}$$

Translated into classical language via Eq. (21.33), this Cerenkov emission rate is

$$\left(\frac{d\mathcal{E}_{\mathbf k}}{dt}\right)_s = \frac{e^2}{8\pi^2\epsilon_0}\int\frac{\omega_r^2}{k^2}\,f(\mathbf v)\,\delta(\omega_r - \mathbf k\cdot\mathbf v)\,dV_v. \tag{21.47}$$

Note that Planck's constant is absent from the classical expression but present in the quantum one.

In the above analysis we computed the fundamental emission rate $W$ by comparing the quantum induced-emission rate minus absorption rate with the classical growth rate for plasma energy. An alternative route to Eq. (21.44) for $W$ would have been to use classical plasma considerations to compute the classical Cerenkov emission rate (21.47), then convert to quantum language using $\eta = (2\pi)^3\mathcal{E}_{\mathbf k}/\hbar\omega_r$, thereby obtaining Eq. (21.46), and then compare with the fundamental formula (21.45).

By comparing Eqs. (21.43) and (21.46) and assuming a thermal (Maxwell) distribution for the electron velocities, we see that spontaneous Cerenkov emission is ignorable in comparison with Landau damping when the electron temperature is smaller than $\eta\hbar\omega/k_B$. Sometimes it is convenient to define a classical brightness temperature $T_B(\mathbf k)$ for the plasma waves, given implicitly by $\eta(\mathbf k) = (e^{\hbar\omega/k_B T_B(\mathbf k)} - 1)^{-1} \simeq k_B T_B(\mathbf k)/\hbar\omega$ (for $\eta \gg 1$). In this language, spontaneous emission of plasmons with wave vector $\mathbf k$ is generally ignorable when the wave brightness temperature exceeds the electron kinetic temperature, as one might expect on thermodynamic grounds. In a plasma in strict thermal equilibrium, we expect Cerenkov emission to be balanced by Landau damping, so as to maintain a thermal distribution of Langmuir waves with a temperature equal to that of the electrons, $T_B(\mathbf k) = T_e$ for all $\mathbf k$.

Turn, now, from the evolution of the plasmon distribution $\eta(\mathbf k,\mathbf x,t)$ to that of the particle distribution $f(\mathbf v,\mathbf x,t)$. Classically, $f$ evolves via the velocity-space diffusion equation (21.32). We shall write down a fundamental quantum mechanical evolution equation (the "kinetic equation") which at first sight appears to differ remarkably from (21.32), but then we shall recover (21.32) in the classical limit.

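[Figure 21.4: three electron velocity levels, $\mathbf v + \hbar\mathbf k/m_e$, $\mathbf v$, and $\mathbf v - \hbar\mathbf k/m_e$; downward transitions are labeled $(1+\eta)$ and upward transitions $\eta$.]

Fig. 21.4: Three-level system for understanding the electron kinetic equation.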

To derive the electron kinetic equation, we must consider three electron velocity states, $\mathbf v$ and $\mathbf v \pm \hbar\mathbf k/m_e$ (Fig. 21.4). Momentum conservation says that an electron can move between these states by emission or absorption of a plasmon with wave vector $\mathbf k$. The fundamental probability for these transitions is the same one, $W$, as for plasmon emission, since these transitions are merely plasmon emission as seen from the electron viewpoint. Therefore, the electron kinetic equation must take the form

$$\frac{df(\mathbf v)}{dt} = \int\frac{dV_k}{(2\pi)^3}\Big\{(1+\eta)\big[W(\mathbf v + \hbar\mathbf k/m_e,\mathbf k)f(\mathbf v + \hbar\mathbf k/m_e) - W(\mathbf v,\mathbf k)f(\mathbf v)\big] - \eta\big[W(\mathbf v + \hbar\mathbf k/m_e,\mathbf k)f(\mathbf v) - W(\mathbf v,\mathbf k)f(\mathbf v - \hbar\mathbf k/m_e)\big]\Big\}. \tag{21.48}$$

The four terms can be understood by inspection of Fig. 21.4. The two downward transitions in that diagram entail plasmon emission and thus are weighted by $(1+\eta)$, where the 1 is the spontaneous contribution and the $\eta$ the induced emission. In the first of these $(1+\eta)$ terms in Eq. (21.48), the $\mathbf v$ electron state gets augmented, so the sign is positive; in the second it gets depleted, so the sign is negative. The two upward transitions entail plasmon absorption and thus are weighted by $\eta$; the sign in Eq. (21.48) is plus when the final electron state has velocity $\mathbf v$, and minus when the initial state is $\mathbf v$.

In the domain of classical quasilinear theory, the momentum of each emitted or absorbed plasmon must be small compared to that of the electron, so we can expand the terms in Eq. (21.48) in powers of $\hbar\mathbf k/m_e$. Carrying out that expansion to second order and retaining those terms that are independent of $\hbar$ and therefore classical, we obtain the quasilinear electron kinetic equation:

$$\frac{df}{dt} = \nabla_v\cdot\left[\mathbf A(\mathbf v)f + \mathsf D(\mathbf v)\cdot\nabla_v f\right], \tag{21.49}$$

where $\nabla_v$ is the gradient in velocity space (and not $\mathbf v\cdot\nabla$), and where

$$\mathbf A(\mathbf v) = \int\frac{dV_k}{(2\pi)^3}\,W(\mathbf v,\mathbf k)\,\frac{\hbar\mathbf k}{m_e}, \qquad \mathsf D(\mathbf v) = \int\frac{dV_k}{(2\pi)^3}\,\eta(\mathbf k)\,W(\mathbf v,\mathbf k)\,\frac{\hbar\mathbf k\otimes\hbar\mathbf k}{m_e^2}. \tag{21.50}$$

The kinetic equation (21.49) is of Fokker–Planck form [Sec. 5.7; Eq. (5.75)].
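The claim that expanding Eq. (21.48) in $\hbar\mathbf k/m_e$ leaves the $\hbar$-independent Fokker–Planck form (21.49)–(21.50) can be checked numerically in one dimension for a single wave number, dropping the common factor $\int dV_k/(2\pi)^3$. This sketch assumes hypothetical smooth functions, $W = (1 + 0.3v)/\hbar$ and $f = e^{-v^2/2}$, together with $\eta = 2/\hbar$ and recoil $q = \hbar k/m_e = \hbar$ (units with $k/m_e = 1$); these mimic the scalings in the text ($W \propto 1/\hbar$, $\eta \propto 1/\hbar$, $q \propto \hbar$), under which $A = Wq$ and $D = \eta W q^2$ are independent of $\hbar$. The function name is hypothetical. The difference between the exact four-term expression and the classical drift-plus-diffusion terms should shrink linearly with $\hbar$.

```python
import math


def f(v):
    """Hypothetical electron distribution: an unnormalized Gaussian."""
    return math.exp(-0.5 * v * v)


def kinetic_rhs_vs_fokker_planck(hbar, v=0.7, slope=0.3, eta0=2.0):
    """Return (four-term RHS of Eq. (21.48) in 1-D for one k, its classical limit)."""
    def W(u):                       # hypothetical emission rate, scales as 1/hbar
        return (1.0 + slope * u) / hbar

    q, eta = hbar, eta0 / hbar      # recoil hbar*k/m_e (k/m_e = 1) and occupation number
    exact = ((1.0 + eta) * (W(v + q) * f(v + q) - W(v) * f(v))
             - eta * (W(v + q) * f(v) - W(v) * f(v - q)))
    # Classical limit d/dv[A f + D df/dv] with A = W q and D = eta W q^2 (hbar-free here).
    W0, dW0 = 1.0 + slope * v, slope
    df, d2f = -v * f(v), (v * v - 1.0) * f(v)
    drift = dW0 * f(v) + W0 * df                 # d(A f)/dv with A = W0
    diffusion = eta0 * (dW0 * df + W0 * d2f)     # d(D f')/dv with D = eta0 * W0
    return exact, drift + diffusion


for hbar in (1e-2, 1e-3, 1e-4):
    exact, classical = kinetic_rhs_vs_fokker_planck(hbar)
    print(f"hbar = {hbar:g}: exact = {exact:.6f}, classical = {classical:.6f}, "
          f"difference = {exact - classical:.2e}")
```

The first-order terms proportional to $\eta$ cancel identically, which is why only the drift $A$ and the diffusion $D$ survive in the classical limit.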
The quantity $-\mathbf A$ is a resistive Fokker–Planck coefficient associated with spontaneous emission; and $\mathsf D$, which we can re-express in the notation of Eq. (5.83c) as

$$\mathsf D = \frac{\Delta\mathbf v\otimes\Delta\mathbf v}{\Delta t} \tag{21.51}$$

with $\Delta\mathbf v = -\hbar\mathbf k/m_e$, is the combined resistive-diffusive coefficient that arises when electron recoil can be ignored; it is associated with plasmon absorption and induced emission. In our classical quasilinear analysis we ignored spontaneous emission and thus had no resistive term in the evolution equation (21.32) for $f$. We can recover that evolution equation and its associated $\mathsf D$ by dropping the resistive term from the quantum kinetic equation (21.49) and inserting expression (21.44) for $W$ into Eq. (21.50) for the quantum $\mathsf D$. The results agree with the classical equations (21.31) and (21.32).

Let us return briefly to Cerenkov emission by an electron or ion. In the presence of a background magnetic field $\mathbf B$, the resonance condition for Cerenkov emission must be modified. Only the momentum parallel to the magnetostatic field need be conserved, not the total vectorial momentum. The unbalanced components of momentum perpendicular to $\mathbf B$ are compensated by a reaction force from $\mathbf B$ itself and thence by the steady currents that produce $\mathbf B$; correspondingly, the unbalanced perpendicular momentum ultimately does work on those currents. In this situation, one can show that the Cerenkov resonance condition $\omega_r - \mathbf k\cdot\mathbf v = 0$ is modified to

$$\omega_r - \mathbf k_\parallel\cdot\mathbf v_\parallel = 0, \tag{21.52}$$

where $\parallel$ means the component parallel to $\mathbf B$. If we allow for the electron gyrational motion as well, then some number of gyrational quanta can be fed into the emitted plasmons, so Eq. (21.52) gets modified to read

$$\omega_r - \mathbf k_\parallel\cdot\mathbf v_\parallel = n\,\omega_{ce}, \tag{21.53}$$

where $n$ is an integer. For nonrelativistic electrons, the strongest resonance is for $n = 1$.

21.3.2 The relationship between classical and quantum mechanical formalisms in plasma physics

We have demonstrated how the structure of the classical quasilinear equations is mandated by quantum mechanics. In developing the quantum equations, we had to rely on one classical calculation, the one that gave us the emission rate $W$. However, even this was not strictly necessary, since with significant additional effort we could have calculated the relevant quantum mechanical matrix elements and then computed $W$ directly from Fermi's golden rule. This has to be the case, because quantum mechanics is the fundamental physical theory and thus must be applicable to plasma excitations just as it is to atoms. Of course, if we are only interested in classical processes, as is usually the case in plasma physics, then we end up taking the limit $\hbar \to 0$ in all observable quantities, and the classical rate is all we need.

This raises an important point of principle. Should we perform our calculations classically or quantum mechanically? The best answer is to be pragmatic. Many calculations in nonlinear plasma theory are so long and arduous that we need all the help we can get to complete them. We therefore combine classical and quantum considerations (confident that both must be correct throughout their overlapping domain of applicability), in whatever proportion minimizes our computational effort.

[Figure 21.5: panel (a) shows plasmons A and B merging into plasmon C; panel (b) shows plasmon C splitting into plasmons B and A.]

Fig. 21.5: (a) A three-wave process in which two plasmons A and B interact nonlinearly to create a third plasmon C. Conserving energy and linear momentum, we obtain $\omega_C = \omega_A + \omega_B$ and $\mathbf k_C = \mathbf k_A + \mathbf k_B$. For example, A and C might be transverse electromagnetic plasmons satisfying the dispersion relation (19.22) and B might be a longitudinal plasmon (Langmuir or ion acoustic); or A and C might be Langmuir plasmons and B might be an ion acoustic plasmon, which is the case treated in the text and in Exs. 21.5 and 21.6. (b) The time-reversed three-wave process in which plasmon C generates plasmon B by an analog of Cerenkov emission, and while doing so recoils into plasmon state A.

21.3.3 Three-Wave Mixing

We have discussed plasmon emission and absorption both classically and quantum mechanically. Our classical and quantum formalisms can be generalized straightforwardly to encompass other nonlinear processes. Among the most important other processes are three-wave interactions (in which two waves coalesce to form a third wave or one wave splits up into two) and scattering processes, in which waves are scattered off particles without creating or destroying plasmons. In this section we shall focus on three-wave mixing. We shall present the main ideas in the text but shall leave most of the details to Exs. 21.5 and 21.6.

In three-wave mixing, where waves A and B combine to create a wave C [Fig. 21.5(a)], the equation for the growth of the amplitude of wave C will contain nonlinear driving terms that combine the harmonic oscillations of waves A and B, i.e. driving terms proportional to $\exp[i(\mathbf k_A\cdot\mathbf x - \omega_A t)]\exp[i(\mathbf k_B\cdot\mathbf x - \omega_B t)]$. In order for wave C to build up coherently over many oscillation periods, it is necessary that the spacetime dependence of these driving terms be the same as that of wave C, $\exp[i(\mathbf k_C\cdot\mathbf x - \omega_C t)]$; i.e., it is necessary that

$$\mathbf k_C = \mathbf k_A + \mathbf k_B, \qquad \omega_C = \omega_A + \omega_B. \tag{21.54}$$

Quantum mechanically, this can be recognized as momentum and energy conservation for the waves’ plasmons. We have met three-wave mixing previously, for electromagnetic waves in a nonlinear dielectric crystal (Sec. 9.5). There, as here, the conservation law (21.54) was necessary in order for the mixing to proceed; see Eq. (9.24) and associated discussion.
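As a minimal sketch of how the matching conditions (21.54) constrain the Langmuir/ion-acoustic case discussed below, the following Python solves the one-dimensional decay C → A + B of a Langmuir plasmon into a second Langmuir plasmon A and an ion-acoustic plasmon B. It assumes the Bohm-Gross relation $\omega_L^2 = \omega_p^2 + 3k^2\sigma_e^2$, a dispersionless ion-acoustic relation $\omega_B = c_s|k_B|$ with $c_s = (k_BT_e/m_p)^{1/2}$, and units with $\omega_p = \sigma_e = 1$; the function names are illustrative only. Restricting to one dimension forces A to be backscattered; in three dimensions emission of a small-$k_B$ plasmon at the angle set by $\omega_{ia}(\mathbf{k}) = \mathbf{k}\cdot\mathbf{V}_{gL}$, as in Eq. (21.60), is also possible.

```python
import math


def langmuir_omega(k, omega_p=1.0, sigma_e=1.0):
    """Bohm-Gross dispersion relation, omega^2 = omega_p^2 + 3 k^2 sigma_e^2."""
    return math.sqrt(omega_p**2 + 3.0 * (k * sigma_e)**2)


def decay_partner(k_C, c_s, omega_p=1.0, sigma_e=1.0, tol=1e-14):
    """1-D decay C -> A + B of a Langmuir plasmon C (k_C > 0) into a backscattered
    Langmuir plasmon A (k_A < 0) and an ion-acoustic plasmon B, omega_B = c_s |k_B|.
    Momentum conservation sets k_B = k_C - k_A; energy conservation is solved for
    k_A by bisection. Returns (k_A, k_B), or None if the decay is forbidden."""
    def mismatch(k_A):
        # omega_C - omega_A - omega_B; increases monotonically on [-k_C, 0]
        return (langmuir_omega(k_C, omega_p, sigma_e)
                - langmuir_omega(k_A, omega_p, sigma_e)
                - c_s * abs(k_C - k_A))

    lo, hi = -k_C, 0.0          # mismatch(lo) = -2 c_s k_C < 0
    if mismatch(hi) <= 0.0:
        return None             # not enough energy to emit an ion-acoustic plasmon
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mismatch(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k_A = 0.5 * (lo + hi)
    return k_A, k_C - k_A


if __name__ == "__main__":
    c_s = math.sqrt(9.1093837015e-31 / 1.67262192369e-27)  # (m_e/m_p)^{1/2} sigma_e
    print(decay_partner(0.1, c_s))
```

With these units $c_s = (m_e/m_p)^{1/2} \approx 0.023$. For $k_C = 0.1$ the partner $k_A$ comes out negative with $|k_A|$ a little below $k_C$, so $k_B$ is a little under $2k_C$; for very small $k_C$ the function returns None because energy and momentum cannot both be conserved.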

When the three waves are all electrostatic, the three-wave mixing arises from the nonlinear term (21.6) in the rapidly varying part (wave part) of the Vlasov equation, which we discarded in our quasilinear analysis. Generalized to three dimensions with this term treated as a driver, that Vlasov equation takes the form

$$\frac{\partial f_1}{\partial t} + \mathbf{v}\cdot\nabla f_1 - \frac{e}{m_e}\mathbf{E}\cdot\nabla_v f_0 = \frac{e}{m_e}\left(\mathbf{E}\cdot\nabla_v f_1 - \langle\mathbf{E}\cdot\nabla_v f_1\rangle\right). \tag{21.55}$$

In the driving term (right-hand side of this equation), $\mathbf{E}$ could be the electric field of wave A and $f_1$ could be the perturbed velocity distribution of wave B, or vice versa, and the $\mathbf{E}$ and $f_1$ terms on the left side could be those of wave C. If the wave vectors and frequencies are related by (21.54), then via this equation waves A and B will coherently generate wave C. The dispersion relations for Langmuir and ion acoustic waves permit the conservation law (21.54) to be satisfied if A is Langmuir so $\omega_A \sim \omega_{pe}$, B is ion acoustic so $\omega_B \lesssim \omega_{pp} \ll \omega_A$, and C is Langmuir. By working out the detailed consequences of the driving term (21.55) in the quasilinear formalism and comparing with the quantum equations for three-wave mixing (Ex. 21.5), one can deduce the fundamental rate for the process A + B → C [Fig. 21.5(a)]. Detailed balance (unitarity) guarantees that the time-reversed process C → A + B [Fig. 21.5(b)] will have identically the same fundamental rate. This time-reversed process has a physical interpretation analogous to the emission of a Cerenkov plasmon by a high-speed, resonant electron: C is a “high-energy” Langmuir plasmon ($\omega_C \sim \omega_{pe}$) that can be thought of as Cerenkov-emitting a “low-energy” ion acoustic plasmon ($\omega_B \lesssim \omega_{pp} \ll \omega_A$) and in the process recoiling slightly into Langmuir state A. The fundamental rate that one obtains for this wave-wave Cerenkov process and its time reversal, when the plasma's electrons are thermalized at temperature $T_e$, is [Ex. 21.5]²

$$W_{AB\leftrightarrow C} = R_{AB\leftrightarrow C}(\mathbf{k}_A, \mathbf{k}_{ia}, \mathbf{k}_C)\,\delta(\mathbf{k}_A + \mathbf{k}_{ia} - \mathbf{k}_C)\,\delta(\omega_A + \omega_{ia} - \omega_C)\,, \tag{21.56}$$

where

$$R_{AB\leftrightarrow C}(\mathbf{k}_A, \mathbf{k}_{ia}, \mathbf{k}_C) = \frac{8\pi^5\hbar e^2 (m_p/m_e)\,\omega_B^3}{(k_B T_e)^2\,k_{ia}^2}\,(\hat{\mathbf{k}}_A\cdot\hat{\mathbf{k}}_C)^2\,. \tag{21.57}$$

[Here we use the subscript “ia” for the ion acoustic plasmon (plasmon B) to avoid confusion with Boltzmann's constant $k_B$.] This is the analog of the rate (21.44) for Cerenkov emission by an electron: the ion-acoustic occupation number will evolve via an evolution law analogous to (21.40), with this rate replacing W on the right-hand side, η replaced by the ion acoustic occupation number $\eta_{ia}$, and the electron distribution replaced by a product of A-mode and C-mode Langmuir occupation numbers; see Ex. 21.5. Moreover, there will be a similar evolution law for the Langmuir occupation number, involving the same fundamental rate (21.56); see Ex. 21.6.

² See also Eq. (A.3.12) of Tsytovich, V. N. 1970. Nonlinear Effects in Plasma, New York: Plenum. The rates for many other wave-wave mixing processes are worked out in this book, but beware: it contains a large number of typographical errors.

****************************
EXERCISES

Exercise 21.3 Problem: Cerenkov power in electrostatic waves
Show that the Langmuir wave power radiated by an electron moving with speed $v$ in a plasma with plasma frequency $\omega_p$ is given by

$$P = \frac{e^2\omega_p^2}{4\pi\epsilon_0 v}\ln\left(\frac{k_{\max}v}{\omega_p}\right), \tag{21.58}$$

where $k_{\max}$ is the largest wave number at which the waves can propagate. (For larger $k$ the waves are strongly Landau damped.)

Exercise 21.4 Derivation: Electron Fokker-Planck Equation
Fill in the missing details in the derivation of the electron Fokker-Planck equation (21.49).

Exercise 21.5 Example and Challenge: Three-Wave Mixing, Ion-Acoustic Evolution
Consider the three-wave processes shown in Fig. 21.5, with A and C being Langmuir plasmons and B an ion acoustic plasmon, and with the fundamental rate given by Eqs. (21.56) and (21.57).

(a) By summing the rates of forward and backward reactions [diagrams (a) and (b)], show that the occupation number for the ion acoustic plasmons satisfies the kinetic equation

$$\frac{d\eta_B}{dt} = \int\frac{dV_{k_A}}{(2\pi)^3}\frac{dV_{k_C}}{(2\pi)^3}\,W_{AB\leftrightarrow C}\left[(1 + \eta_A + \eta_B)\eta_C - \eta_A\eta_B\right]. \tag{21.59}$$

[Hints: (i) The rate for A + B → C [Fig. 21.5(a)] will be proportional to $(\eta_C + 1)\eta_A\eta_B$; why? (ii) When you sum the rates for the two diagrams, (a) and (b), the terms involving $\eta_A\eta_B\eta_C$ should cancel.]

(b) The ion acoustic plasmons have far lower frequencies than the Langmuir plasmons, so $\omega_B \ll \omega_A \simeq \omega_C$. Assume that they also have far lower wave numbers, $|\mathbf{k}_B| \ll |\mathbf{k}_A| \simeq |\mathbf{k}_C|$. Assume further (as will typically be the case) that the ion acoustic plasmons, because of their tiny individual energies, have far larger occupation numbers than the Langmuir plasmons, so $\eta_B \gg \eta_A \sim \eta_C$. Using these approximations, show that the evolution law (21.59) for the ion acoustic waves reduces to the form

$$\frac{d\eta_{ia}(\mathbf{k})}{dt} = \eta_{ia}(\mathbf{k})\int\frac{dV_{k'}}{(2\pi)^6}\,R_{AB\leftrightarrow C}(\mathbf{k}'-\mathbf{k}, \mathbf{k}, \mathbf{k}')\,\delta[\omega_{ia}(\mathbf{k}) - \mathbf{k}\cdot\mathbf{V}_{gL}(\mathbf{k}')]\,\mathbf{k}\cdot\nabla_{k'}\eta_L(\mathbf{k}')\,, \tag{21.60}$$

where $\eta_L$ is the Langmuir (waves A and C) occupation number, $\mathbf{V}_{gL}$ is the Langmuir group velocity, and $R_{AB\leftrightarrow C}$ is the fundamental rate (21.57).
(c) Notice the strong similarities between the evolution equation (21.60) for the ion acoustic plasmons that are Cerenkov-emitted and absorbed by Langmuir plasmons, and the evolution equation (21.43) for Langmuir plasmons Cerenkov-emitted and absorbed by fast electrons! Discuss the similarities and the physical reasons for them.

(d) Carry out an explicit classical calculation of the nonlinear interaction between Langmuir waves with wave vectors $\mathbf{k}_A$ and $\mathbf{k}_C$ to produce ion-acoustic waves with wave vector $\mathbf{k}_B \equiv \mathbf{k}_{ia} = \mathbf{k}_C - \mathbf{k}_A$. Base your calculation on the nonlinear Vlasov equation (21.55) and [for use in relating $\mathbf{E}$ and $f_1$ in the nonlinear term] the three-dimensional analog of Eq. (21.13). Assume a spatially independent Maxwellian averaged electron velocity distribution $f_0$ with temperature $T_e$ (so $\nabla f_0 = 0$). From your result compute, in the random phase approximation, the evolution of the ion-acoustic energy density $\mathcal{E}_k$ and thence the evolution of the occupation number $\eta(\mathbf{k})$. Bring that evolution equation into the functional form (21.60). By comparing quantitatively with Eq. (21.60), read off the fundamental rate $R_{AB\leftrightarrow C}$. Your result should be the rate in Eq. (21.57).

Exercise 21.6 Example and Challenge: Three-Wave Mixing, Langmuir Evolution
Continuing the analysis of the preceding exercise:

(a) Derive the kinetic equation for the Langmuir occupation number. [Hint: You will have to sum over four Feynman diagrams, corresponding to the mode of interest playing the role of A and then the role of C in each of the two diagrams in Fig. 21.5.]

(b) Using the approximations outlined in part (b) of Ex. 21.5, show that the Langmuir occupation number evolves in accord with the diffusion equation

$$\frac{d\eta_L(\mathbf{k}')}{dt} = \nabla_{k'}\cdot\left[\mathbf{D}(\mathbf{k}')\cdot\nabla_{k'}\eta_L(\mathbf{k}')\right], \tag{21.61}$$

where the diffusion coefficient is given by the following integral over the ion acoustic wave distribution:

$$\mathbf{D}(\mathbf{k}') = \int\frac{dV_{k}}{(2\pi)^6}\,\eta_{ia}(\mathbf{k})\,\mathbf{k}\otimes\mathbf{k}\,R_{AB\leftrightarrow C}(\mathbf{k}'-\mathbf{k}, \mathbf{k}, \mathbf{k}')\,\delta[\omega_{ia}(\mathbf{k}) - \mathbf{k}\cdot\mathbf{V}_{gL}(\mathbf{k}')]\,. \tag{21.62}$$

(c) Discuss the strong similarity between this evolution law for resonant Langmuir plasmons interacting with ion acoustic waves, and the one (21.31), (21.32) for resonant electrons interacting with Langmuir waves. Why are they so similar?

****************************
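Eq. (21.58) of Ex. 21.3 is straightforward to evaluate numerically. The sketch below does so in SI units; taking $k_{\max}$ equal to the inverse Debye length, $1/\lambda_D = \omega_p/\sigma_e$ with $\sigma_e = (k_BT_e/m_e)^{1/2}$, is an assumption made here only for illustration (the exercise leaves $k_{\max}$ general), and the function name is hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # C
EPS0 = 8.8541878128e-12      # F m^-1
M_E = 9.1093837015e-31       # kg
K_B = 1.380649e-23           # J K^-1


def cerenkov_langmuir_power(v, n_e, T_e):
    """Langmuir-wave power (W) radiated by an electron of speed v (m/s), Eq. (21.58),
    assuming k_max = 1/lambda_D so that ln(k_max v / omega_p) = ln(v / sigma_e)."""
    omega_p = math.sqrt(n_e * E_CHARGE**2 / (EPS0 * M_E))
    sigma_e = math.sqrt(K_B * T_e / M_E)
    k_max = omega_p / sigma_e
    if k_max * v <= omega_p:
        return 0.0   # resonance v = omega_p / k would need k > k_max: no emission
    prefactor = E_CHARGE**2 * omega_p**2 / (4.0 * math.pi * EPS0 * v)
    return prefactor * math.log(k_max * v / omega_p)


if __name__ == "__main__":
    sigma_e = math.sqrt(K_B * 1e5 / M_E)
    print(cerenkov_langmuir_power(10 * sigma_e, n_e=1e18, T_e=1e5))
```

The cutoff test encodes the resonance condition $v = \omega_p/k$: an electron slower than $\omega_p/k_{\max}$ has no propagating Langmuir waves to resonate with.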

21.4 Quasilinear Evolution of Unstable Distribution Functions: The Bump in Tail

A common occurrence in plasmas arises when a weak beam of electrons passes through a stable Maxwellian plasma with speed $v_b$ large compared with the thermal width of the background plasma $\sigma_e$. When the velocity width of the beam $\sigma_b$ is small compared with $v_b$, the distribution is known as a bump-in-tail distribution; see Fig. 21.6 below. In this section we shall explore the stability and nonlinear evolution of such a distribution. Let us deal with the simple case of a one-dimensional electron distribution function $F_0(v)$ and approximate the beam by a Maxwellian

$$F_b(v) = \frac{n_b}{(2\pi)^{1/2}\sigma_b}\,e^{-(v - v_b)^2/2\sigma_b^2}\,, \tag{21.63}$$

where $n_b$ is the beam electron density. For simplicity we shall treat the protons as a uniform neutralizing background. Now let us suppose that at time $t = 0$ the beam is established and the Langmuir wave energy density $\mathcal{E}_k$ is very small. The waves will grow fastest when the waves' phase velocity $V_{ph} = \omega_r/k$ resides where the slope of the distribution function is most positive, i.e. when $V_{ph} = v_b - \sigma_b$. The associated maximum growth rate, as computed from Eq. (21.12), is

$$\omega_i^{\max} = \left(\frac{\pi}{8e}\right)^{1/2}\left(\frac{v_b}{\sigma_b}\right)^2\frac{n_b}{n_e}\,\omega_p\,, \tag{21.64}$$

where $e = 2.718\ldots$ here denotes the base of natural logarithms, not the electron charge. Modes will grow over a range of wave phase velocities $\Delta V_{ph} \sim \sigma_b$. By using the Bohm-Gross dispersion relation (19.34) in the form

$$\omega = \omega_p\left(1 - \frac{3\sigma_e^2}{V_{ph}^2}\right)^{-1/2}, \tag{21.65}$$

we find that the bandwidth of the growing modes is given roughly by

$$\Delta\omega = K\omega_p\frac{\sigma_b}{v_b}\,, \tag{21.66}$$

where $K = 3(\sigma_e/v_b)^2[1 - 3(\sigma_e/v_b)^2]^{-3/2}$ is a constant, typically $\gtrsim 0.1$. Combining Eqs. (21.64) and (21.66), we obtain

$$\frac{\omega_i^{\max}}{\Delta\omega} \sim \left(\frac{\pi}{8eK^2}\right)^{1/2}\left(\frac{v_b}{\sigma_b}\right)^3\frac{n_b}{n_e}\,. \tag{21.67}$$

Dropping constants of order unity, we conclude that the growth time for the waves, $\sim(\omega_i^{\max})^{-1}$, is long compared with the coherence time, $\sim(\Delta\omega)^{-1}$, provided that

$$\sigma_b \gtrsim \left(\frac{n_b}{n_e}\right)^{1/3} v_b\,. \tag{21.68}$$
When inequality (21.68) is satisfied, the waves will take several coherence times to grow, and so we expect that no permanent phase relations will be established in the electric field and that quasilinear theory is an appropriate tool. However, when this inequality is reversed, the instability more closely resembles the two-stream instability of Chap. 20, and the growth is so rapid as to imprint special phase relations on the waves, so the random phase approximation fails and quasilinear theory is invalid. Restricting ourselves to slow growth, we shall use quasilinear theory to explore the evolution of the wave and particle distributions. We can associate the wave energy density $\mathcal{E}_k$ not just with a given value of $k$ but with a corresponding value of $V_{ph} = \omega_r/k$, and thence with the velocities $v = V_{ph}$ of electrons that resonate with the waves. Using Eq. (21.22) for the velocity diffusion coefficient and Eq. (21.12) for the associated wave growth rate, we can then write the temporal evolution equations for the electron distribution function $F_0(v, t)$ and the wave energy density $\mathcal{E}_k(v, t)$ as

$$\frac{\partial F_0}{\partial t} = \frac{\pi e^2}{m_e^2\epsilon_0}\frac{\partial}{\partial v}\left(\frac{\mathcal{E}_k}{v}\frac{\partial F_0}{\partial v}\right), \qquad \frac{\partial\mathcal{E}_k}{\partial t} = \frac{\pi e^2}{m_e\epsilon_0\omega_p}\,v^2\,\mathcal{E}_k\,\frac{\partial F_0}{\partial v}\,. \tag{21.69}$$

Here for simplicity we have assumed a spatially homogeneous distribution of particles and waves, so $d/dt \to \partial/\partial t$. This pair of nonlinear equations must be solved numerically, but their qualitative behavior can be understood analytically without much effort; see Fig. 21.6. Waves resonant with the rising part of the electron distribution function will at first grow exponentially, causing the particles to diffuse and flatten the slope of $F_0$ and thereby reduce the wave growth rate. Ultimately, the slope $\partial F_0/\partial v$ will diminish to zero and the wave energy density will become constant, with its integral, by energy conservation [Ex. 21.2], equal to the total kinetic energy lost by the beam. In this way we see that a velocity-space irregularity in the distribution function leads to the growth of electrostatic waves, which can react back on the particles in such a way as to saturate the instability. The net result is a beam of particles with a much broadened width propagating through the plasma. The waves will ultimately damp through three-wave processes or other damping mechanisms, sending their energy into heat.

Fig. 21.6: Evolution of the one-dimensional electron distribution function from a “bump on tail” shape to a flat distribution function, due to the growth and scattering of electrostatic waves. [Sketch of $F(v)$ versus $v$, marking the background width $\sigma_e$, the beam width $2\sigma_b$ about $v_b$, and the direction of increasing time $t$.]
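Since Eqs. (21.69) must be solved numerically, here is a minimal explicit finite-difference sketch using NumPy. It assumes dimensionless units with $m_e = \omega_p = n_e = 1$ (so both prefactors in (21.69) equal π), a Maxwellian background of width $\sigma_e$ plus the beam (21.63), and illustrative defaults chosen to satisfy the quasilinear criterion (21.68); the function names and default values are hypothetical. The flux-conservative differencing and the forward-Euler step, which updates both equations from the same old state, are chosen so that the particle number and the total energy $\int\tfrac12 v^2F_0\,dv + \int\mathcal{E}_k\,dk$, with $|dk| = \omega_p\,dv/v^2$, are conserved to round-off, a useful check on the energy argument above.

```python
import numpy as np


def gaussian(x, width):
    """Unit-normalized one-dimensional Maxwellian with dispersion `width`."""
    return np.exp(-x**2 / (2.0 * width**2)) / (np.sqrt(2.0 * np.pi) * width)


def bump_on_tail_quasilinear(n_b=1e-4, v_b=1.0, sigma_b=0.1, sigma_e=0.1,
                             v_min=0.5, v_max=1.5, n_grid=101,
                             E_init=1e-9, t_end=5000.0):
    coeff = np.pi                    # both prefactors of (21.69) when m_e = omega_p = n_e = 1
    v = np.linspace(v_min, v_max, n_grid)
    h = v[1] - v[0]
    v_half = 0.5 * (v[1:] + v[:-1])
    F = gaussian(v, sigma_e) + n_b * gaussian(v - v_b, sigma_b)  # tail + beam (21.63)
    E = np.full_like(v, E_init)                                  # tiny, flat wave spectrum

    def slopes(F):
        dF = np.diff(F) / h          # dF0/dv at the half-grid points
        s = np.zeros_like(F)         # nodal slope: mean of the neighbouring half-point
        s[:-1] += 0.5 * dF           # slopes (one neighbour, halved, at each end; this
        s[1:] += 0.5 * dF            # pairs exactly with the fluxes below)
        return dF, s

    def totals(F, E):
        number = np.sum(F) * h
        kinetic = np.sum(0.5 * v**2 * F) * h
        waves = np.sum(E / v**2) * h  # integral of E_k dk with |dk| = omega_p dv / v^2
        return number, kinetic, waves

    start = totals(F, E)
    slope_start = slopes(F)[1]
    t = 0.0
    while t < t_end:
        dF, s = slopes(F)
        growth = coeff * v**2 * s                       # dE_k/dt = growth * E_k
        D = coeff * 0.5 * (E[1:] + E[:-1]) / v_half     # velocity diffusion coefficient
        dt = min(0.4 * h**2 / D.max(),                  # explicit-diffusion stability
                 0.1 / max(np.abs(growth).max(), 1e-30),  # keeps E_k positive
                 t_end - t)
        flux = D * dF                                   # zero flux through both ends
        dFdt = np.zeros_like(F)
        dFdt[:-1] += flux / h
        dFdt[1:] -= flux / h
        F = F + dt * dFdt                               # forward Euler on both equations,
        E = E * (1.0 + dt * growth)                     # using the same old state
        t += dt
    return v, slope_start, slopes(F)[1], start, totals(F, E), E
```

With the defaults, the waves near $v \approx v_b - \sigma_b$ grow exponentially from $\mathcal{E}_k = 10^{-9}$ until quasilinear diffusion flattens the rising side of the beam; comparing the returned initial and final slopes near that velocity shows the flattening, while the wave energy grows by orders of magnitude at the expense of the beam's kinetic energy.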

21.4.1 Instability of Streaming Cosmic Rays

For a simple illustration of this general type of instability, we return to the issue of the isotropization of Galactic cosmic rays, which we introduced in Sec. 17.7. We argued there that cosmic rays propagating through the interstellar medium are effectively scattered by hydromagnetic Alfvén waves. We did not explain where these Alfvén waves originated. It now seems likely that much of the time these waves are generated by the cosmic rays themselves. Suppose that we have a beam of cosmic rays propagating through the interstellar gas at high speed. The interstellar gas is magnetized, which allows many more wave modes to propagate than in the unmagnetized case. It turns out that the particle distribution is unstable to the growth of Alfvén waves satisfying the resonance condition (21.53), modified to take account of the fact that we are dealing with mildly relativistic protons rather than nonrelativistic electrons:

$$\omega - k_\parallel v_\parallel = \frac{\omega_{cp}}{\gamma}\,. \tag{21.70}$$

Here $\gamma$ is the Lorentz factor of the protons, and we assume that $n = 1$. As the cosmic rays travel much faster than the waves, the effective resonance condition is that the wavelength of the Alfvén wave match the particle gyro radius. The growth rate of these waves can be studied using a kinetic theory analogous to that which we have just developed for Langmuir waves.³ Dropping factors of order unity, it is given approximately by

$$\omega_i \simeq \frac{n_{cr}}{n_p}\,\omega_{cp}\left(\frac{u_{cr}}{a} - 1\right), \tag{21.71}$$

where $n_{cr}$ is the number density of cosmic rays, $n_p$ is the number density of thermal protons in the background plasma, $u_{cr}$ is the mean speed of the cosmic ray protons through the background plasma and $a$ is the Alfvén speed. So if the particles have a mean speed in excess of the Alfvén speed, the waves will grow, exponentially at first. It is observed that the energy density of cosmic rays builds up until it is roughly comparable with that of the thermal plasma. As more cosmic rays are produced, they will escape from the Galaxy at a sufficient rate to maintain this balance. Therefore, in a steady state, the ratio of the number density of cosmic rays to the thermal proton density is roughly the inverse of their mean-energy ratio.
Adopting a mean cosmic ray energy of $\sim 1$ GeV and an ambient temperature in the interstellar medium of $T \sim 10^4$ K, this ratio of number densities is $\sim 10^{-9}$. The inverse ion gyrofrequency, $\omega_{cp}^{-1}$, in the interstellar medium is $\sim 100$ s for a typical field strength of $\sim 100$ pT. Cosmic rays streaming at a few times the Alfvén speed will create Alfvén waves in $\sim 10^{10}$ s, of order a few hundred years, long before they escape from the Galaxy. The waves will then react back on the cosmic rays, scattering them in momentum space [Eq. (21.49)]. Now each time a particle is scattered by an Alfvén wave quantum, the ratio of its energy change to the magnitude of its momentum change must be the same as that in the waves and equal to the Alfvén speed, which is far smaller than the original energy-to-momentum ratio of the particle, $\sim c$ for a mildly relativistic proton. Therefore the effect of the Alfvén waves is to scatter the particle directions without changing their energies significantly. As the particles are already gyrating around the magnetic field, the effect of the waves is principally to change the angle between their momenta and the field (known as the pitch angle), so as to reduce their mean speed along the magnetic field.

³ Melrose 1984.
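The order-of-magnitude chain in the preceding paragraph can be reproduced in a few lines of Python. This sketch takes $n_{cr}/n_p$ to be the inverse mean-energy ratio $k_BT/E_{cr}$, evaluates the proton cyclotron frequency $\omega_{cp} = eB/m_p$, and inserts both into Eq. (21.71) with the assumed choice $u_{cr} = 3a$; the function name and defaults are illustrative.

```python
K_B = 1.380649e-23            # J K^-1
E_CHARGE = 1.602176634e-19    # C
M_P = 1.67262192369e-27       # kg
YEAR = 3.156e7                # s


def streaming_growth_time(E_cr_GeV=1.0, T_ism=1e4, B=1e-10, u_over_a=3.0):
    """Order-of-magnitude Alfven-wave growth time (s) from Eq. (21.71)."""
    n_ratio = K_B * T_ism / (E_cr_GeV * 1e9 * E_CHARGE)  # n_cr/n_p ~ k_B T / E_cr
    omega_cp = E_CHARGE * B / M_P                        # proton cyclotron frequency, rad/s
    omega_i = n_ratio * omega_cp * (u_over_a - 1.0)      # Eq. (21.71), O(1) factors dropped
    return n_ratio, omega_cp, 1.0 / omega_i


if __name__ == "__main__":
    ratio, omega_cp, t_grow = streaming_growth_time()
    print(ratio, 1.0 / omega_cp, t_grow, t_grow / YEAR)
```

With these inputs $\omega_{cp}^{-1} \approx 10^2$ s and $n_{cr}/n_p \approx 9\times10^{-10}$, and the growth time comes out at a few $\times10^{10}$ s (of order $10^3$ yr). Given the factors of order unity dropped in Eq. (21.71), this is consistent with the $\sim10^{10}$ s estimate and remains far shorter than the cosmic-ray confinement time.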

Now when this mean speed is reduced to a value of order the Alfvén speed, the growth rate diminishes, just as the growth rate of Langmuir waves diminishes after the electron distribution function is flattened. Under a wide variety of conditions, cosmic rays are believed to maintain the requisite energy density in Alfvén wave turbulence to prevent them from streaming along the magnetic field with a mean speed much faster than the Alfvén speed (which varies between $\sim 3$ and $\sim 30$ km s$^{-1}$). This is a different model of their transport from spatial diffusion, which we assumed in Sec. 18.7, but the end result is similar and cosmic rays are confined to our Galaxy for more than $\sim 10$ million years. These processes can be observed directly using spacecraft in the interplanetary medium.

****************************
EXERCISES

Exercise 21.7 Problem: Stability of isotropic distribution function
Consider an arbitrary isotropic distribution function and examine its stability to the growth of Langmuir waves. Show that the linear growth rates of all such waves are negative and so the plasma is stable to these modes.

Exercise 21.8 Challenge: Alfvén wave emission by streaming cosmic rays
Consider a beam of high energy cosmic ray protons streaming along a background magnetostatic field in a collisionless plasma. Let the cosmic rays have an isotropic distribution function in a frame that moves along the magnetic field with speed $u$, and assume that $u$ is large compared with the Alfvén speed but small compared with the speeds of the individual cosmic rays. Using the resonance condition (21.53), argue that there will be strong emission and absorption of Alfvén modes by the cosmic rays when their Larmor radii roughly match the wavelengths of the Alfvén waves. Adapt the discussion of the emission of Langmuir waves by a bump-on-tail distribution to show that the growth rate is given to order of magnitude by Eq. (21.71).

****************************

21.5 Parametric Instabilities

One of the approaches currently being pursued toward commercial nuclear fusion is to compress pellets of a mixture of deuterium and tritium using powerful lasers, so that the gas densities and temperatures are large enough for the nuclear energy release to exceed the energy expended in bringing about the compression (Box 21.1). At these densities the incident laser radiation behaves like a large-amplitude plasma wave and is subject to a new type of instability that may already be familiar from dynamics, namely a parametric instability. Consider how the incident light is absorbed by the relatively tenuous ionized plasma around the pellet. The critical density, at which the incident wave frequency equals the plasma frequency, is $\rho \sim 5\lambda_{\mu m}^{-2}$ kg m$^{-3}$, where $\lambda_{\mu m}$ is the wavelength measured in µm. For an incident wave energy flux $F \sim 10^{18}$ W m$^{-2}$, the amplitude of the wave electric field will be $E \sim (F/\epsilon_0 c)^{1/2} \sim 2\times10^{10}$ V m$^{-1}$. The velocity of a free electron oscillating in a wave this strong will be $v \sim eE/m_e\omega \sim 2000$ km s$^{-1}$, which is almost one per cent of the speed of light.

It is therefore not surprising that nonlinear wave processes are important. One of the most important such processes is called stimulated Raman scattering. In this process, the coherent electromagnetic wave with frequency ω convects a small pre-existing density fluctuation associated with a relatively low-frequency Langmuir wave with frequency $\omega_{pe}$ and converts it into a current that varies at the beat frequency $\omega - \omega_{pe}$. This creates a new electromagnetic mode with this frequency. The vector sum of the $\mathbf{k}$ vectors of the Langmuir and scattered modes must also equal the incident $\mathbf{k}$ vector. When this can first happen, the new $\mathbf{k}$ is almost antiparallel to that of the incident mode, and so the radiation is backscattered. The new mode can combine nonlinearly with the original electromagnetic wave to produce a force $\propto \nabla E^2$, which amplifies the original density fluctuation. Provided the growth rate of the wave exceeds the natural damping rates, e.g. that of Landau damping, there can be strong backscattering of the incident wave at a density well below the critical density of the incident radiation. (A further condition which must be satisfied is that the bandwidth of the incident wave must be less than the growth rate. This will generally be true for a laser.)

Stimulated Raman scattering is an example of a parametric instability. The incident wave frequency is called the pump frequency. One difference between parametric instabilities involving waves, as opposed to just oscillations, is that it is necessary to match spatial as well as temporal frequencies. Reflection of the incident radiation by this mechanism reduces the ablation of the pellet and also creates a population of suprathermal electrons, which conduct heat into the interior of the pellet and inhibit compression. Various strategies, including increasing the wave frequency, have been devised to circumvent Raman backscattering (and also a related process called Brillouin backscattering, in which the Langmuir mode is replaced by an ion acoustic mode).

Box 21.1 Laser Fusion
In the simplest scheme for laser fusion, it is proposed that solid pellets of deuterium and tritium be compressed and heated to allow the reaction

$$d + t \to \alpha + n + 17.6\ \text{MeV} \tag{1}$$

to proceed. An individual pellet would have mass $m \sim 5$ mg and initial radius $r_i \sim 2$ mm. If $\sim 1/3$ of the fuel burns, then the energy released by a single pellet would be $\sim 500$ MJ. As we described in Sec. 9.2, among the most powerful lasers available are the Q-switched, neodymium-glass pulsed lasers, which are capable of delivering an energy flux of $\sim 10^{18}$ W m$^{-2}$ at the pellet surface for several ns. These lasers operate at an infrared wavelength of 1.06 µm; however, nonlinear optical methods (cf. Sec. 9.6) can be used to double or even triple the operating frequency. In a working reactor, ten lasers might be used, each one illuminating about 1 steradian with an initial radiation pressure of $P_{rad} \sim 3\times10^9$ N m$^{-2}$, which is less than a pellet's bulk modulus (cf. Table 10.1 in Sec. 10.3). Radiation pressure alone is therefore inadequate to compress the pellets. However, at the high energy density involved, the incident radiation can be absorbed by plasma processes (see text), and the energy can be devoted to evaporating the surface layers of the pellet. The escaping hydrogen carries away far more radial momentum than the radiation delivers, which causes the pellet to implode under the reaction force. This process is known as ablation. The maximum ablation pressure is $\sim 2P_{rad}c/v_e \sim 10^4 P_{rad}$, where $v_e \sim 30$ km s$^{-1}$ is the speed of the escaping gases. Maximum compression will be achieved if the pellet remains cool. In this case, the dominant pressure will be the degeneracy pressure associated with the electrons. Compression factors of $\sim 10^4$ are contemplated, which are believed to be sufficient to initiate nuclear reactions.
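The laser-plasma numbers quoted for a 1.06 µm laser with $F \sim 10^{18}$ W m$^{-2}$, together with the frequency and wave-vector matching for Raman backscatter, can be checked with the sketch below. It assumes a fully ionized deuterium-tritium plasma (mass $\approx 2.5m_p$ per electron) and, for the matching, the cold-plasma relations $\omega^2 = \omega_{pe}^2 + c^2k^2$ for light and $\omega = \omega_{pe}$ for the Langmuir wave (thermal corrections neglected); the function names are hypothetical.

```python
import math

C = 2.99792458e8              # m s^-1
EPS0 = 8.8541878128e-12       # F m^-1
E_CHARGE = 1.602176634e-19    # C
M_E = 9.1093837015e-31        # kg
M_P = 1.67262192369e-27       # kg


def laser_plasma_numbers(wavelength=1.06e-6, flux=1e18, mass_per_electron=2.5 * M_P):
    """Critical mass density (kg m^-3), field amplitude (V m^-1) and electron quiver
    speed (m s^-1) for a laser of vacuum wavelength (m) and energy flux (W m^-2)."""
    omega = 2.0 * math.pi * C / wavelength
    n_crit = EPS0 * M_E * omega**2 / E_CHARGE**2   # electron density where omega_pe = omega
    rho_crit = n_crit * mass_per_electron           # assumes fully ionized 50/50 D-T
    E_field = math.sqrt(flux / (EPS0 * C))          # E ~ (F / eps0 c)^{1/2}
    v_quiver = E_CHARGE * E_field / (M_E * omega)   # v ~ e E / m_e omega
    return rho_crit, E_field, v_quiver


def raman_backscatter(n_over_ncrit):
    """Cold-plasma matching for Raman backscatter in units omega = c = 1:
    pump (1, k0) -> scattered light (w_s, -k_s) + Langmuir wave (w_pe, k_L),
    with w_s = 1 - w_pe and k_L = k0 + k_s. Returns (w_s, k_L) or None."""
    w_pe = math.sqrt(n_over_ncrit)      # omega_pe / omega = (n / n_crit)^{1/2}
    w_s = 1.0 - w_pe
    if w_s <= w_pe:
        return None                     # scattered light cannot propagate
    k0 = math.sqrt(1.0 - w_pe**2)       # pump wave number, c k0 / omega
    k_s = math.sqrt(w_s**2 - w_pe**2)   # backscattered wave number, c |k_s| / omega
    return w_s, k0 + k_s


if __name__ == "__main__":
    print(laser_plasma_numbers())
    print(raman_backscatter(0.1), raman_backscatter(0.3))
```

Because the scattered light must itself propagate ($\omega_s > \omega_{pe}$) while $\omega_s = \omega - \omega_{pe}$, backscatter in this cold-plasma picture requires $\omega > 2\omega_{pe}$, i.e. densities below a quarter of the critical density, consistent with the statement that Raman backscatter occurs well below the critical density.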

21.6 Solitons and Collisionless Shock Waves

In Sec. 19.3, we introduced ion-acoustic waves that have a phase speed $V_{ph} \sim (k_BT_e/m_p)^{1/2}$, determined by a combination of the electron temperature $T_e$ and the proton mass $m_p$. In Sec. 20.3.6, we argued that these waves would be strongly Landau damped unless the electron temperature greatly exceeded the proton temperature. However, this formalism was only valid for waves of small amplitude, so that the linear approximation could be used. In Ex. 19.5, we considered the profile of a nonlinear wave and found a solution for a single ion-acoustic soliton, valid when the waves are weakly nonlinear. We will now consider this problem in a slightly different way that is valid for strong nonlinearity in the wave amplitude. However, we will restrict our attention to waves that propagate without change of form, and so will not generalize the Korteweg-de Vries equation. Once again, we use the fluid model and introduce an ion fluid velocity $u$. The electrons are supposed to be highly mobile and to assume a local density $\propto \exp(e\phi/k_BT_e)$, where $\phi$ is the electrostatic potential. The ions must satisfy the equations of continuity and motion

$$\frac{\partial n}{\partial t} + \frac{\partial}{\partial z}(nu) = 0\,, \qquad \frac{\partial u}{\partial t} + u\frac{\partial u}{\partial z} = -\frac{e}{m_p}\frac{\partial\phi}{\partial z}\,. \tag{21.72}$$

We now seek a solution for a wave moving with constant speed $V$ through the ambient plasma. In this case, all physical quantities must be functions of a single independent variable $\xi = z - Vt$. If we allow a prime to denote differentiation with respect to $\xi$, then Eqs. (21.72) become

$$(u - V)n' = -nu'\,, \qquad (u - V)u' = -\frac{e}{m_p}\phi'\,. \tag{21.73}$$

Fig. 21.7: Potential function $\Phi(\phi)$ used for exhibiting the properties of an ion-acoustic soliton for four different values of the ion acoustic Mach number $M$. [Plot of $\epsilon_0\Phi/(n_0k_BT_e)$ versus $\phi$, with curves labelled 1.0, 1.3, 1.58 and 1.65.]

These two equations can be integrated and combined to obtain an expression for the ion density $n$ in terms of the electrostatic potential,

$$n = n_0\left(1 - \frac{2e\phi}{m_pV^2}\right)^{-1/2}, \tag{21.74}$$

where $n_0$ is the ion density, presumed uniform, long before the wave arrives. The next step is to combine this ion density with the electron density and substitute into Poisson's equation to obtain a nonlinear ordinary differential equation for the potential:

$$\phi'' = -\frac{n_0e}{\epsilon_0}\left[\left(1 - \frac{2e\phi}{m_pV^2}\right)^{-1/2} - e^{e\phi/k_BT_e}\right]. \tag{21.75}$$

Now the best way to think about this problem is to formulate the equivalent dynamical problem of a particle moving in a one-dimensional potential well, where $\phi$ measures the position coordinate and $\xi$ is the time coordinate. As the right-hand side of Eq. (21.75) varies only with $\phi$, we can treat it as minus the gradient of a scalar potential, $\Phi(\phi)$. Integrating the right-hand side of Eq. (21.75) and assuming that $\Phi \to 0$ as $\phi \to 0$ (i.e. as $\xi \to \infty$, long before the arrival of the pulse), we obtain

$$\Phi(\phi) = \frac{n_0k_BT_e}{\epsilon_0}\left\{\frac{m_pV^2}{k_BT_e}\left[1 - \left(1 - \frac{2e\phi}{m_pV^2}\right)^{1/2}\right] - \left(e^{e\phi/k_BT_e} - 1\right)\right\}. \tag{21.76}$$

We have assumed that $0 < \phi < m_pV^2/2e$. The shape of this potential well is sketched in Fig. 21.7 and is determined by the parameter $M = (m_pV^2/k_BT_e)^{1/2}$, which is immediately recognizable as the ion-acoustic Mach number, i.e. the ratio of the speed of the soliton to the ion-acoustic speed in the undisturbed medium. A solution for the potential profile $\phi(\xi)$ in the wave corresponds to the trajectory of a particle with zero total energy in this potential well. The particle starts at $\phi = 0$ with zero kinetic energy (i.e. $\phi' = 0$) and then accelerates to a maximum speed near the minimum in the potential before decelerating. If there is a turning point, the particle will come to rest, $\phi(\xi)$ will attain a maximum, and then the particle will return to the origin. The particle trajectory corresponds to a symmetrical soliton, propagating with uniform speed.

Two conditions must be satisfied for a soliton solution. First, the potential well must be attractive. This will only happen when $d^2\Phi/d\phi^2(0) < 0$, which implies that $M > 1$. Second, there must be a turning point. This happens if $\Phi(m_pV^2/2e) > 0$. The maximum value of $M$ for which this is the case is a solution of the equation

$$e^{M^2/2} - 1 - M^2 = 0\,, \tag{21.77}$$

or $M = 1.58$. Hence, ion acoustic soliton solutions only exist for

$$1 < M < 1.58\,. \tag{21.78}$$

Fig. 21.8: Ion-acoustic shock waves. (a) Solution in terms of the equivalent potential $\Phi(\phi)$. (b) Electrostatic potential profile $\phi(\xi)$ in the shock.

The wave must travel sufficiently fast with respect to the ions that the particle can pass through the potential barrier. However, the wave must not be so fast with respect to the electron thermal speed that the electrons are able to short out the potential near its maximum. This analogy with particle dynamics is generally helpful. It also assists us in understanding a deep connection between solitons and laminar shock fronts. The equations that we have been solving so far contain the two key ingredients for a soliton: nonlinearity to steepen the wave profile and dispersion to spread it. However, they make no provision for any form of dissipation, which is a necessary condition for a shock front, where the entropy must increase. In a real collisionless plasma, this dissipation can take on many forms. It may be associated with anomalous resistivity or perhaps some viscosity associated with the ions. In many circumstances, some of the ions are reflected by the potential barrier and counterstream against the incoming ions, which they eventually heat. Whatever its origin, the net effect of this dissipation will be to cause the equivalent particle to lose its total energy so that it can never return to its starting point. Given an attractive and bounded potential well, we find that the particle has no alternative except to sink toward the bottom of the well. Depending upon the strength of the dissipation, the particle may undergo several oscillations before coming to rest. The structure to which this type of solution corresponds is a laminar shock front. Unlike the soliton profile, the wave profile in a shock wave is not symmetric; instead it describes a permanent change in the electrostatic potential $\phi$ (Fig. 21.8). Repeating the arguments above, we find that a shock wave can only exist when $M > 1$, that is to say, it must be supersonic with respect to the ion-acoustic sound speed. In addition, there is a maximum critical Mach number close to $M = 1.6$ above which a laminar shock becomes impossible.
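The soliton window (21.78) can be checked numerically from Eqs. (21.76) and (21.77). The sketch below works with the dimensionless potential $\tilde\Phi = \epsilon_0\Phi/(n_0k_BT_e)$ as a function of $x = e\phi/k_BT_e$, rewriting $M^2[1 - (1 - 2x/M^2)^{1/2}]$ as $2x/[1 + (1 - 2x/M^2)^{1/2}]$ to avoid round-off near $x = 0$, and uses plain bisection both for the critical Mach number and for the soliton amplitude (the turning point where $\tilde\Phi$ returns to zero). The function names are hypothetical.

```python
import math


def phi_tilde(x, M):
    """Dimensionless Eq. (21.76): eps0 Phi / (n0 k_B T_e) at x = e phi / (k_B T_e),
    valid for 0 <= x <= M^2 / 2."""
    root = math.sqrt(max(0.0, 1.0 - 2.0 * x / M**2))
    return 2.0 * x / (1.0 + root) - math.expm1(x)


def bisect(f, lo, hi, tol=1e-13):
    """Plain bisection; f(lo) and f(hi) must have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_mach():
    """Root above M = 1 of Eq. (21.77): exp(M^2/2) - 1 - M^2 = 0."""
    return bisect(lambda M: math.exp(0.5 * M**2) - 1.0 - M**2, 1.0, 3.0)


def soliton_amplitude(M):
    """Turning point x_max = e phi_max / k_B T_e where phi_tilde returns to zero.
    Meaningful only for 1 < M < critical_mach()."""
    x_top = 0.5 * M**2
    return bisect(lambda x: phi_tilde(x, M), 1e-3 * x_top, x_top)


if __name__ == "__main__":
    print(critical_mach(), soliton_amplitude(1.3))
```

The root comes out at $M \approx 1.585$, consistent with the value 1.58 in (21.78). For $M$ between 1 and this value `soliton_amplitude` returns the peak potential $e\phi_{\max}/k_BT_e$; above it $\tilde\Phi(M^2/2) < 0$, so there is no turning point and no soliton.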


Fig. 21.9: Illustration of the form of the collisionless bow shock formed around the earth’s magnetosphere. The earth’s bow shock has been extensively studied using spacecraft. Alfvén, ion-acoustic, whistler and Langmuir waves are all generated with large amplitude in the vicinity of shock fronts by the highly non-thermal particle distributions. (Adapted from Parks 1991.)

What happens when the critical Mach number is exceeded? Here there are several possibilities, which include relying upon a more rapidly moving wave to form the shock front or appealing to turbulent conditions downstream from the shock front to enhance the dissipation rate. This ion-acoustic shock front is the simplest example of a collisionless shock. Essentially every wave mode can be responsible for the formation of a shock. The dissipation in these shocks is still not very well understood, but we can observe them in several different environments within the heliosphere and also in the laboratory. The best studied of these shock waves are those based on hydromagnetic waves, which were introduced briefly in Chap. 10. The solar wind moves with a speed that is typically 5 times the Alfvén speed. It should therefore form a bow shock (one based upon the fast magnetosonic mode) whenever it encounters a planetary magnetosphere. This happens despite the fact that the mean free path of the ions in the solar wind is typically much larger than the size of the shock front. The thickness of the shock front turns out to be a few ion Larmor radii. This is a dramatic illustration of the importance of collective effects in controlling the behavior of essentially collisionless plasmas; see Fig. 21.9 and also Sagdeev and Kennel (1991).

****************************
EXERCISES

Exercise 21.9 Derivation: Critical Mach number for an Ion-acoustic shock wave
Verify Eq. (21.76) and show numerically that the critical Mach number for a laminar shock front is $M = 1.56$.

Exercise 21.10 Problem: Solar-wind termination shock
The solar wind is a quasi-spherical outflow of plasma from the sun. At the radius of the earth's orbit, the mean proton and electron densities are $n_p \sim n_e \sim 4\times10^6$ m$^{-3}$, their temperatures are $T_p \sim T_e \sim 10^5$ K, and their common radial fluid speed is $\sim 400$ km s$^{-1}$. The mean magnetic field strength is $\sim 1$ nT. Eventually, the radial momentum flux in the solar wind falls to the value of the mean interstellar pressure, $\sim 10^{-13}$ N m$^{-2}$, and a shock will develop.
(a) Estimate the radius where this will occur.
(b) The solar system moves through the interstellar medium with a speed $\sim 30$ km s$^{-1}$. Sketch the likely flow pattern near this radius.
(c) How do you expect the magnetic field to vary with radius in the outflowing solar wind? Estimate its value at the termination shock.
(d) Estimate the electron plasma frequency, the ion acoustic Mach number and the proton Larmor radius just ahead of the termination shock front, and comment upon the implications of these values for the shock structure.
(e) The Voyager 1 spacecraft was launched in 1977 and is now moving radially away from the sun with a speed $\sim 15$ km s$^{-1}$. When do you think it will pass through the termination shock?

****************************
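As a rough numerical companion to parts (a) and (d) of Ex. 21.10, the Python sketch below balances the wind’s ram pressure $n m_p v^2$ against the interstellar pressure. It is a back-of-envelope illustration, not a full solution: it assumes the wind speed stays constant (so the density, and hence the ram pressure, falls as $r^{-2}$), neglects the electrons’ and the magnetic field’s contributions to the momentum flux, and uses approximate SI constants typed in by hand; the function names are hypothetical. With the numbers quoted in the exercise it places the shock at a radius of order 100 AU.

```python
import math

# Approximate SI constants (typed in by hand for this sketch).
M_P = 1.6726e-27        # proton mass [kg]
M_E = 9.1094e-31        # electron mass [kg]
E_CHARGE = 1.6022e-19   # elementary charge [C]
EPS0 = 8.8542e-12       # vacuum permittivity [F m^-1]


def termination_radius_au(n_1au, v, p_ism):
    """Radius in AU at which the ram pressure n m_p v^2 drops to p_ism.

    Assumes a constant wind speed, so n (and the ram pressure) scale as r^-2,
    and ignores the electron and magnetic contributions to the momentum flux.
    """
    ram_pressure_1au = n_1au * M_P * v**2       # [N m^-2] at r = 1 AU
    return math.sqrt(ram_pressure_1au / p_ism)


def electron_plasma_frequency(n):
    """Electron plasma (angular) frequency sqrt(n e^2 / (eps0 m_e)) in rad/s."""
    return math.sqrt(n * E_CHARGE**2 / (EPS0 * M_E))


n_1au = 4e6                                      # [m^-3], from the exercise
r_ts = termination_radius_au(n_1au, v=4e5, p_ism=1e-13)
n_ts = n_1au / r_ts**2                           # density just upstream of the shock
omega_p = electron_plasma_frequency(n_ts)
print(f"r ~ {r_ts:.0f} AU, n ~ {n_ts:.0f} m^-3, omega_p ~ {omega_p:.0f} rad/s")
```

The remaining quantities in part (d) (ion-acoustic Mach number, proton Larmor radius) follow in the same way once the temperature and field are scaled out to this radius.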

Bibliographic Note For a concise treatment of the classical quasilinear theory of wave-particle interactions as in Sec. 21.2, see Sec. 49 of Lifshitz and Pitaevskii (1981). Section 51 of this book extends these techniques, concisely, to the study of fluctuations and correlations in plasmas.

Box 21.2 Important Concepts in Chapter 21
• Quasilinear theory of wave-particle interactions in classical language, Sec. 21.2
  – Averaging over several wavelengths to get unperturbed quantities, Sec. 21.2.1
  – Spectral energy density for plasma waves, Sec. 21.2.1
  – Time derivatives moving with plasma waves or particles, Sec. 21.2.1
  – Diffusion of particles due to interaction with waves; diffusion coefficient; resonant (surfing) particles and nonresonant particles, Sec. 21.2.1
• Quasilinear theory of wave-particle interactions in quantum language, Secs. 21.3.1, 21.3.2
  – Plasma modes or states, and plasmons, Sec. 21.3.1
  – Mean occupation number for plasmon modes and its relationship to spectral energy density, Sec. 21.3.1
  – Feynman diagrams and fundamental rate $W(\mathbf v, \mathbf k)$, Sec. 21.3.1
  – Time-reversal invariance of fundamental rate (unitarity; detailed balance), Sec. 21.3.1
  – Spontaneous emission, stimulated emission and absorption, and the master equation that includes all of them, Sec. 21.3.1
  – Spontaneous emission viewed as Cerenkov radiation, Sec. 21.3.1
  – Comparison of quantum and classical analyses to get value of fundamental rate, Secs. 21.3.1, 21.3.2
  – Particle evolution equation as a Fokker-Planck equation, Sec. 21.3.1
• Quasilinear theory of three-wave mixing, Sec. 21.3.3, Exs. 21.5, 21.6
• Quasilinear evolution of bump-on-tail instability and its implications for cosmic rays, Sec. 21.4.1
• Parametric instability in a plasma and stimulated Raman scattering, Sec. 21.5
• Solitons and collisionless shock waves in a plasma, Sec. 21.6

For a richer, more detailed pedagogical classical treatment of nonlinear effects in plasmas, see Chaps. 10 and 11 of Krall and Trivelpiece (1973). For a classical treatment that is extended to include excitations of magnetized plasmas, see Chaps. 16–18 of Stix (1992). For an encyclopedic treatment of nonlinear wave-particle and wave-wave interactions formulated using the techniques of quantum theory as in our Sec. 21.3, and extended to a huge variety of wave modes in both unmagnetized and magnetized plasmas, see Tsytovich (1970). However, beware that this book is bursting with typographical errors.

For applications to astrophysical plasmas, see Melrose (1984) and Parks (1991), and for applications to laser-plasma interactions, see Kruer (1988).

Bibliography

Clemmow, P. C. and Dougherty, J. P. 1969. Electrodynamics of Particles and Plasmas, Reading: Addison Wesley.

Krall, N. and Trivelpiece, A. W. 1973. Principles of Plasma Physics, New York: McGraw Hill.

Kruer, W. L. 1988. The Physics of Laser-Plasma Interactions, Redwood City: Addison Wesley.

Lifshitz, E. M. and Pitaevskii, L. P. 1981. Physical Kinetics, Oxford: Pergamon.

Melrose, D. B. 1984. Instabilities in Space and Laboratory Plasmas, Cambridge: Cambridge University Press.

Parks, G. K. 1991. Physics of Space Plasmas: An Introduction, Redwood City: Addison Wesley.

Sagdeev, R. Z. and Kennel, C. F. 1991. “Collisionless Shock Waves,” Scientific American 264, 106.

Stix, T. H. 1992. Waves in Plasmas, New York: American Institute of Physics.

Tsytovich, V. N. 1970. Nonlinear Effects in Plasma, Publisher????

Contents VI

GENERAL RELATIVITY

22 From Special to General Relativity
  22.1 Overview
  22.2 Special Relativity Once Again
    22.2.1 Geometric, Frame-Independent Formulation
    22.2.2 Inertial Frames and Components of Vectors, Tensors and Physical Laws
    22.2.3 Light Speed, the Interval, and Spacetime Diagrams
  22.3 Differential Geometry in General Bases and in Curved Manifolds
    22.3.1 Non-Orthonormal Bases
    22.3.2 Vectors as Differential Operators; Tangent Space; Commutators
    22.3.3 Differentiation of Vectors and Tensors; Connection Coefficients
    22.3.4 Integration
  22.4 The Stress-Energy Tensor Revisited
  22.5 The Proper Reference Frame of an Accelerated Observer [MTW pp. 163–176, 327–332]

Part VI GENERAL RELATIVITY


General Relativity Version 0822.1.K.pdf, 22 April 2009

We have reached the final Part of this book, in which we present an introduction to the basic concepts of general relativity and its most important applications. This subject, although a little more challenging than the material that we have covered so far, is nowhere near as formidable as its reputation. Indeed, if you have mastered the techniques developed in the first five Parts, the path to the Einstein Field Equations should be short and direct.

The General Theory of Relativity is the crowning achievement of classical physics, the last great fundamental theory created prior to the discovery of quantum mechanics. Its formulation by Albert Einstein in 1915 marks the culmination of the great intellectual adventure undertaken by Newton 250 years earlier. Einstein created it after many wrong turns and with little experimental guidance, almost by pure thought. Unlike the special theory, whose physical foundations and logical consequences were clearly appreciated by physicists soon after Einstein’s 1905 formulation, the unique and distinctive character of the general theory came to be widely appreciated only long after its creation. Ultimately, in hindsight, rival classical theories of gravitation came to seem unnatural, inelegant and arbitrary by comparison.¹ Experimental tests of Einstein’s theory also were slow to come; only since 1970 have there been striking tests of high enough precision to convince most empiricists that, in all probability, and in its domain of applicability, general relativity is essentially correct. Despite this, it is still very poorly tested compared with, for example, quantum electrodynamics.

We begin our discussion of general relativity in Chap. 22 with a review and elaboration of special relativity as developed in Chap. 1, focusing on those aspects that are crucial for the transition to general relativity. Our elaboration includes: (i) an extension of differential geometry to curvilinear coordinate systems and general bases, both in the flat spacetime of special relativity and in the curved spacetime that is the venue for general relativity; (ii) an in-depth exploration of the stress-energy tensor, which in general relativity generates the curvature of spacetime; and (iii) construction and exploration of the reference frames of accelerated observers, e.g. physicists who reside on the Earth’s surface.

In Chap. 23, we turn to the basic concepts of general relativity, including spacetime curvature, the Einstein Field Equation that governs the generation of spacetime curvature, the laws of physics in curved spacetime, and weak-gravity limits of general relativity.

¹ For a readable account at a popular level, see Will (1993); for a more detailed, scholarly account see, e.g., Pais (1982).


In the remaining chapters, we explore applications of general relativity to stars, black holes, gravitational waves, experimental tests of the theory, and cosmology. We begin in Chap. 24 by studying the spacetime curvature around and inside highly compact stars (such as neutron stars). We then discuss the implosion of massive stars and describe the circumstances under which the implosion inevitably produces a black hole; we explore the surprising and, initially, counter-intuitive properties of black holes (both nonspinning holes and spinning holes); and we learn about the many-fingered nature of time in general relativity. In Chap. 25 we study experimental tests of general relativity, and then turn our attention to gravitational waves, i.e. ripples in the curvature of spacetime that propagate with the speed of light. We explore the properties of these waves, their close analogy with electromagnetic waves, their production by binary stars and merging black holes, projects to detect them, both on earth and in space, and the prospects for using them to explore observationally the dark side of the universe and the nature of ultrastrong spacetime curvature. Finally, in Chap. 26 we draw upon all the previous Parts of this book, combining them with general relativity to describe the universe on the largest of scales and longest of times: cosmology. It is here, more than anywhere else in classical physics, that we are conscious of reaching a frontier where the still-promised land of quantum gravity beckons.

Chapter 22 From Special to General Relativity

Box 22.1 Reader’s Guide
• This chapter relies significantly on
  – The special relativity portions of Chap. 1.
  – The discussion of connection coefficients in Sec. 10.5.
• This chapter is a foundation for the presentation of general relativity theory in Chaps. 23–26.

22.1

Overview

We begin our discussion of general relativity in this chapter with a review and elaboration of relevant material already covered in earlier chapters. In Sec. 22.2, we give a brief encapsulation of the special theory drawn largely from Chap. 1, emphasizing those aspects that underpin the transition to general relativity. Then in Sec. 22.3 we collect, review and extend the fundamental ideas of differential geometry that have been scattered throughout the book and that we shall need as foundations for the mathematics of spacetime curvature (Chap. 23); most importantly, we generalize differential geometry to encompass coordinate systems whose coordinate lines are not orthogonal and bases that are not orthonormal.

Einstein’s field equations are a relationship between the curvature of spacetime and the matter that generates it, akin to the Maxwell equations’ relationship between the electromagnetic field and electric currents and charges. The matter is described using the stress-energy tensor that we introduced in Sec. 1.12. We revisit the stress-energy tensor in Sec. 22.4 and develop a deeper understanding of its properties.

In general relativity one often wishes to describe the outcome of measurements made by observers who refuse to fall freely, e.g., an observer who hovers in a spaceship just above the horizon of a black hole, or a gravitational-wave experimenter in an earth-bound laboratory. As a foundation for treating such observers, in Sec. 22.5 we examine measurements made by accelerated observers in the flat spacetime of special relativity.

22.2

Special Relativity Once Again

A prerequisite to learning the theory of general relativity is to understand special relativity in geometric language. In Chap. 1, we discussed the foundations of special relativity with this in mind, and it is now time to remind ourselves of what we learned.

22.2.1

Geometric, Frame-Independent Formulation

In Chap. 1 we learned that every law of physics must be expressible as a geometric, frame-independent relationship between geometric, frame-independent objects. This is equally true in Newtonian physics, in special relativity and in general relativity. The key difference between the three is the geometric arena: in Newtonian physics the arena is 3-dimensional Euclidean space; in special relativity it is 4-dimensional Minkowski spacetime; in general relativity (Chap. 23) it is 4-dimensional curved spacetime; see Fig. 1.1 and associated discussion.

In special relativity, the demand that the laws be geometric relationships between geometric objects in Minkowski spacetime is called the Principle of Relativity; see Sec. 1.2. Examples of the geometric objects are: (i) a point $\mathcal P$ in spacetime (which represents an event); (ii) a parametrized curve in spacetime such as the world line $\mathcal P(\tau)$ of a particle, for which the parameter $\tau$ is the particle’s proper time, i.e. the time measured by an ideal clock¹ that the particle carries (Fig. 22.1); (iii) vectors such as the particle’s 4-velocity $\vec u = d\mathcal P/d\tau$ [the tangent vector to the curve $\mathcal P(\tau)$] and the particle’s 4-momentum $\vec p = m\vec u$ (with $m$ the particle’s rest mass); and (iv) tensors such as the electromagnetic field tensor $\mathbf F(\;,\;)$.

A tensor, as we recall, is a linear real-valued function of vectors; when one puts vectors $\vec A$ and $\vec B$ into the slots of $\mathbf F$, one obtains a real number (a scalar) $\mathbf F(\vec A,\vec B)$ that is linear in $\vec A$ and in $\vec B$, so for example $\mathbf F(\vec A, b\vec B + c\vec C) = b\,\mathbf F(\vec A,\vec B) + c\,\mathbf F(\vec A,\vec C)$. When one puts a vector $\vec B$ into just one of the slots of $\mathbf F$ and leaves the other empty, one obtains a tensor with one empty slot, $\mathbf F(\;,\vec B)$, i.e. a vector. The result of putting a vector into the slot of a vector is the scalar product, $\vec D(\vec B) = \vec D\cdot\vec B = \mathbf g(\vec D,\vec B)$, where $\mathbf g(\;,\;)$ is the metric.

¹ Recall that an ideal clock is one that ticks uniformly when compared, e.g., to the period of the light emitted by some standard type of atom or molecule, and that has been made impervious to accelerations, so two ideal clocks momentarily at rest with respect to each other tick at the same rate independent of their relative acceleration; cf. Secs. 1.2 and 1.4, and for greater detail, pp. 23–29 and 395–399 of MTW.

[Figure 22.1: spacetime diagram (axes $t$, $x$, $y$) showing the world line $\mathcal P(\tau)$ marked at $\tau = 0, 1, \ldots, 7$, its tangent vectors $\vec u$, and a light cone]
Fig. 22.1: The world line $\mathcal P(\tau)$ of a particle in Minkowski spacetime and the tangent vector $\vec u = d\mathcal P/d\tau$ to this world line; $\vec u$ is the particle’s 4-velocity. The bending of the world line is produced by some force that acts on the particle, e.g. by the Lorentz force embodied in Eq. (22.3). Also shown is the light cone emitted from the event $\mathcal P(\tau = 1)$. Although the axes of an (arbitrary) inertial reference frame are shown, no reference frame is needed for the definition of the world line or its tangent vector $\vec u$ or the light cone, or for the formulation of the Lorentz force law.

In Secs. 1.2 and 1.3 we tied our definitions of the inner product and the metric to the ticking of ideal clocks: if $\Delta\vec x$ is the vector separation of two neighboring events $\mathcal P(\tau)$ and $\mathcal P(\tau+\Delta\tau)$ along a particle’s world line, then

$$\mathbf g(\Delta\vec x,\Delta\vec x) \equiv \Delta\vec x\cdot\Delta\vec x \equiv -(\Delta\tau)^2 . \qquad (22.1)$$

This relation for any particle with any timelike world line, together with the linearity of $\mathbf g(\;,\;)$ in its two slots, is enough to determine $\mathbf g$ completely and to guarantee that it is symmetric, $\mathbf g(\vec A,\vec B) = \mathbf g(\vec B,\vec A)$ for all $\vec A$ and $\vec B$. Since the particle’s 4-velocity $\vec u$ is

$$\vec u = \frac{d\mathcal P}{d\tau} = \lim_{\Delta\tau\to0}\frac{\mathcal P(\tau+\Delta\tau)-\mathcal P(\tau)}{\Delta\tau} \equiv \lim_{\Delta\tau\to0}\frac{\Delta\vec x}{\Delta\tau}, \qquad (22.2)$$

Eq. (22.1) implies that $\vec u\cdot\vec u = \mathbf g(\vec u,\vec u) = -1$. The 4-velocity $\vec u$ is an example of a timelike vector; it has a negative inner product with itself (negative “squared length”). This shows up pictorially in the fact that $\vec u$ lies inside the light cone (the cone swept out by the trajectories of photons emitted from the tail of $\vec u$; see Fig. 22.1). Vectors $\vec k$ on the light cone (the tangents to the world lines of the photons) are null and so have vanishing squared lengths, $\vec k\cdot\vec k = \mathbf g(\vec k,\vec k) = 0$; and vectors $\vec A$ that lie outside the light cone are spacelike and have positive squared lengths, $\vec A\cdot\vec A > 0$.

An example of a physical law in 4-dimensional geometric language is the Lorentz force law

$$\frac{d\vec p}{d\tau} = q\,\mathbf F(\;,\vec u), \qquad (22.3)$$

where $q$ is the particle’s charge and both sides of this equation are vectors, i.e. first-rank tensors, i.e. tensors with just one slot. As we learned in Sec. 1.5, it is convenient to give names to slots. When we do so, we can rewrite the Lorentz force law as

$$\frac{dp^\alpha}{d\tau} = qF^{\alpha\beta}u_\beta . \qquad (22.4)$$

Here α is the name of the slot of the vector $d\vec p/d\tau$; α and β are the names of the slots of $\mathbf F$; β is the name of the slot of $\vec u$; and the double use of β, with one up and one down, on the right side of the equation represents the insertion of $\vec u$ into the β slot of $\mathbf F$, whereby the two β slots disappear and we wind up with a vector whose slot is named α. As we learned in Sec. 1.5, this slot-naming index notation is isomorphic to the notation for components of vectors, tensors, and physical laws in some reference frame. However, no reference frames are needed or involved when one formulates the laws of physics in geometric, frame-independent language as above. Those readers who do not feel completely comfortable with these concepts, statements and notation should reread the relevant portions of Chap. 1.
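To make the slot-naming notation concrete, here is a minimal Python sketch (an illustration, not from the text) that evaluates Eq. (22.4) component by component in one Lorentz frame, with $c = 1$ and the sum over β written out explicitly. The sign convention $F^{0j} = E_j$, $F^{ij} = \epsilon_{ijk}B_k$ is an assumption of this sketch, chosen so that a particle at rest feels $dp^j/d\tau = qE_j$; the velocity, fields and charge are arbitrary illustrative numbers. The result can be checked against the familiar 3+1 form $dp^0/d\tau = \gamma q\,\mathbf E\cdot\mathbf v$, $d\mathbf p/d\tau = \gamma q(\mathbf E + \mathbf v\times\mathbf B)$, and against the normalization $\vec u\cdot\vec u = -1$ implied by Eq. (22.1).

```python
import math

# Minkowski metric components eta_{alpha beta} in a Lorentz frame, c = 1.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def four_velocity(v):
    """Contravariant components u^alpha = (gamma, gamma v^j) for 3-velocity v (|v| < 1)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(vj * vj for vj in v))
    return [gamma] + [gamma * vj for vj in v]


def lower(vec):
    """u_alpha = eta_{alpha beta} u^beta."""
    return [sum(ETA[a][b] * vec[b] for b in range(4)) for a in range(4)]


def inner(a, b):
    """g(a, b) = a^alpha b_alpha."""
    return sum(x * y for x, y in zip(a, lower(b)))


def field_tensor(E, B):
    """F^{alpha beta} under the ASSUMED convention F^{0j} = E_j, F^{ij} = eps_{ijk} B_k."""
    F = [[0.0] * 4 for _ in range(4)]
    for j in range(3):
        F[0][j + 1], F[j + 1][0] = E[j], -E[j]
    Bx, By, Bz = B
    F[1][2], F[2][1] = Bz, -Bz
    F[2][3], F[3][2] = Bx, -Bx
    F[3][1], F[1][3] = By, -By
    return F


def lorentz_force(q, F, u):
    """dp^alpha/dtau = q F^{alpha beta} u_beta, Eq. (22.4), summing over beta."""
    u_low = lower(u)
    return [q * sum(F[a][b] * u_low[b] for b in range(4)) for a in range(4)]


v, E, B, q = [0.3, -0.2, 0.1], [0.5, 0.0, -1.0], [0.2, 0.7, 0.4], 2.0
u = four_velocity(v)
print(inner(u, u))                          # -1 up to rounding: u is a unit timelike vector
print(lorentz_force(q, field_tensor(E, B), u))
```

Nothing in the geometric law (22.3) depends on the frame used here; the frame only enters when the components are tabulated.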

22.2.2

Inertial Frames and Components of Vectors, Tensors and Physical Laws

In special relativity a key role is played by inertial reference frames. An inertial frame is an (imaginary) latticework of rods and clocks that moves through spacetime freely (inertially, without any force acting on it). The rods are orthogonal to each other and attached to inertial-guidance gyroscopes so they do not rotate. These rods are used to identify the spatial, Cartesian coordinates $(x^1, x^2, x^3) = (x, y, z)$ of an event $\mathcal P$ [which we also denote by lower case Latin indices, $x^j(\mathcal P)$, with $j$ running over 1, 2, 3]. The latticework’s clocks are ideal and are synchronized with each other via the Einstein light-pulse process (Sec. 1.2). They are used to identify the temporal coordinate $x^0 = t$ of an event $\mathcal P$; i.e. $x^0(\mathcal P)$ is the time measured by that latticework clock whose world line passes through $\mathcal P$, at the moment of passage. The spacetime coordinates of $\mathcal P$ are denoted by lower case Greek indices $x^\alpha$, with $\alpha$ running over 0, 1, 2, 3. An inertial frame’s spacetime coordinates $x^\alpha(\mathcal P)$ are called Lorentz coordinates or inertial coordinates.

In the real universe, spacetime curvature is very small in regions well removed from concentrations of matter, e.g. in intergalactic space, so special relativity is highly accurate there. In such a region, frames of reference (rod-clock latticeworks) that are non-accelerating and non-rotating with respect to cosmologically distant galaxies (and thence with respect to a local frame in which the cosmic microwave radiation looks isotropic) constitute good approximations to inertial reference frames.

Associated with an inertial frame’s Lorentz coordinates are basis vectors $\vec e_\alpha$ that point along the frame’s coordinate axes (and thus are orthogonal to each other) and have unit length (making them orthonormal). This orthonormality is embodied in the inner products

$$\vec e_\alpha\cdot\vec e_\beta = \eta_{\alpha\beta}, \qquad (22.5)$$

where by definition

$$\eta_{00} = -1, \qquad \eta_{11} = \eta_{22} = \eta_{33} = +1, \qquad \eta_{\alpha\beta} = 0 \ \text{if}\ \alpha\neq\beta. \qquad (22.6)$$

Here and throughout Part VI (as in Chap. 1), we set the speed of light to unity [i.e. we use the geometrized units discussed in Eqs. (1.3a) and (1.3b)], so spatial lengths (e.g. along the x axis) and time intervals (e.g. along the t axis) are measured in the same units, seconds or meters, with 1 s = 2.99792458 × 10^8 m.

In Sec. 1.5 we used the basis vectors of an inertial frame to build a component representation of tensor analysis. The fact that the inner products of timelike vectors with each other are negative, e.g. $\vec e_0\cdot\vec e_0 = -1$, while those of spacelike vectors are positive, e.g. $\vec e_1\cdot\vec e_1 = +1$, forced us to introduce two types of components: covariant (indices down) and contravariant (indices up). The covariant components of a tensor were computable by inserting the basis vectors into the tensor’s slots, $u_\alpha = \vec u(\vec e_\alpha) \equiv \vec u\cdot\vec e_\alpha$; $F_{\alpha\beta} = \mathbf F(\vec e_\alpha,\vec e_\beta)$. For example, in our Lorentz basis the covariant components of the metric are $g_{\alpha\beta} = \mathbf g(\vec e_\alpha,\vec e_\beta) = \vec e_\alpha\cdot\vec e_\beta = \eta_{\alpha\beta}$. The contravariant components of a tensor were related to the covariant components via “index lowering” with the aid of the metric, $F_{\alpha\beta} = g_{\alpha\mu}g_{\beta\nu}F^{\mu\nu}$, which simply said that one reverses the sign when lowering a time index and makes no change of sign when lowering a space index. This lowering rule implied that the contravariant components of the metric in a Lorentz basis are the same numerically as the covariant components, $g^{\alpha\beta} = \eta_{\alpha\beta}$, and that they can be used to raise indices (i.e. to perform the trivial sign flip for temporal indices), $F^{\mu\nu} = g^{\mu\alpha}g^{\nu\beta}F_{\alpha\beta}$. As we saw in Sec. 1.5, tensors can be expressed in terms of their contravariant components as $\vec p = p^\alpha\vec e_\alpha$ and $\mathbf F = F^{\alpha\beta}\vec e_\alpha\otimes\vec e_\beta$, where ⊗ represents the tensor product [Eq. (1.10a)].

We also learned in Chap. 1 that any frame-independent geometric relation between tensors can be rewritten as a relation between those tensors’ components in any chosen Lorentz frame. When one does so, the resulting component equation takes precisely the same form as the slot-naming-index-notation version of the geometric relation. For example, the component version of the Lorentz force law says $dp^\alpha/d\tau = qF^{\alpha\beta}u_\beta$, which is identical to Eq. (22.4). The only difference is the interpretation of the symbols. In the component equation $F^{\alpha\beta}$ are the components of $\mathbf F$, and the repeated β in $F^{\alpha\beta}u_\beta$ is to be summed from 0 to 3. In the geometric relation $F^{\alpha\beta}$ means $\mathbf F(\;,\;)$ with the first slot named α and the second β, and the repeated β in $F^{\alpha\beta}u_\beta$ implies the insertion of $\vec u$ into the second slot of $\mathbf F$ to produce a single-slotted tensor, i.e. a vector whose slot is named α.

As we saw in Sec. 1.6, a particle’s 4-velocity $\vec u$ (defined originally without the aid of any reference frame; Fig. 22.1) has components, in any inertial frame, given by $u^0 = \gamma$, $u^j = \gamma v^j$, where $v^j = dx^j/dt$ is the particle’s ordinary velocity and $\gamma \equiv 1/\sqrt{1-\delta_{ij}v^iv^j}$. Similarly, the particle’s energy $E \equiv p^0$ is $m\gamma$ and its spatial momentum is $p^j = m\gamma v^j$, i.e. in 3-dimensional geometric notation, $\mathbf p = m\gamma\mathbf v$. This is an example of the manner in which a choice of Lorentz frame produces a “3+1” split of the physics: a split of 4-dimensional spacetime into 3-dimensional space (with Cartesian coordinates $x^j$) plus 1-dimensional time $t = x^0$; a split of the particle’s 4-momentum $\vec p$ into its 3-dimensional spatial momentum $\mathbf p$ and its 1-dimensional energy $E = p^0$; and similarly a split of the electromagnetic field tensor $\mathbf F$ into the 3-dimensional electric field $\mathbf E$ and 3-dimensional magnetic field $\mathbf B$; cf. Secs. 1.6 and 1.10.

The principle of relativity (all laws expressible as geometric relations between geometric objects in Minkowski spacetime), when translated into 3+1 language, says that, when the laws of physics are expressed in terms of components in a specific Lorentz frame, the form of those laws must be independent of one’s choice of frame. The components of tensors in one Lorentz frame are related to those in another by a Lorentz transformation (Sec. 1.7), so the principle of relativity can be restated as saying that, when expressed in terms of Lorentz-frame components, the laws of physics must be Lorentz-invariant (unchanged by Lorentz transformations). This is the version of the principle of relativity that one meets in most elementary treatments of special relativity. However, as the above discussion shows, it is a mere shadow of the true principle of relativity: the shadow cast onto Lorentz frames when one performs a 3+1 split. The ultimate, fundamental version of the principle of relativity is the one that needs no frames at all for its expression: all the laws of physics are expressible as geometric relations between geometric objects that reside in Minkowski spacetime. If the above discussion is not completely clear, the reader should study the relevant portions of Chap. 1.
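For readers who like to see the sign-flip rule in action, here is a small Python sketch (illustrative only) that lowers both indices of a set of antisymmetric components $F^{\mu\nu}$ with $g_{\alpha\beta} = \eta_{\alpha\beta}$, and raises them back with $g^{\alpha\beta} = \eta_{\alpha\beta}$. The numbers in `F_up` are arbitrary placeholders, not a physical field. Components with exactly one time index change sign, purely spatial ones do not, and raising recovers the original components.

```python
# Minkowski metric components in a Lorentz basis, Eq. (22.6).
# Numerically g^{alpha beta} = g_{alpha beta} = eta_{alpha beta}.
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]


def lower_both(F_up):
    """F_{alpha beta} = g_{alpha mu} g_{beta nu} F^{mu nu}, summed explicitly."""
    return [[sum(ETA[a][m] * ETA[b][n] * F_up[m][n]
                 for m in range(4) for n in range(4))
             for b in range(4)]
            for a in range(4)]


def raise_both(F_low):
    """F^{mu nu} = g^{mu alpha} g^{nu beta} F_{alpha beta}.

    In a Lorentz basis g^{..} is numerically eta as well, so this is the
    same arithmetic as lowering.
    """
    return lower_both(F_low)


# Arbitrary antisymmetric contravariant components (illustrative numbers).
F_up = [[0, 1, 2, 3],
        [-1, 0, 4, 5],
        [-2, -4, 0, 6],
        [-3, -5, -6, 0]]
F_low = lower_both(F_up)
print(F_low[0])   # [0, -1, -2, -3]: time-space entries flip sign
print(F_low[1])   # [1, 0, 4, 5]: only the entry with a time index flips
```

In a non-orthonormal basis the same two functions would need the full matrices $g_{\alpha\beta}$ and $g^{\alpha\beta}$ in place of `ETA`; that generalization is the subject of Sec. 22.3.1.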

22.2.3

Light Speed, the Interval, and Spacetime Diagrams

One set of physical laws that must be the same in all inertial frames is Maxwell’s equations. Let us discuss the implications of Maxwell’s equations for the speed of light c, momentarily abandoning geometrized units and returning to mks/SI units. According to Maxwell, c can be determined by performing non-radiative laboratory experiments; it is not necessary to measure the time it takes light to travel along some path. For example, measure the electrostatic force between two charges; that force is ∝ $\epsilon_0^{-1}$, where $\epsilon_0$ is the electric permittivity of free space. Then allow one of these charges to travel down a wire and by its motion generate a magnetic field. Let the other charge move through this field and measure the magnetic force on it; that force is ∝ $\mu_0$, the magnetic permeability of free space. The ratio of these two forces can be computed and is ∝ $1/(\mu_0\epsilon_0) = c^2$. By combining the results of the two experiments, we therefore can deduce the speed of light c; this is completely analogous to deducing the speed of seismic waves through rock from a knowledge of the rock’s density and elastic moduli, using elasticity theory (Chap. 11). The principle of relativity, in operational form, dictates that the results of the electric and magnetic experiments must be independent of the Lorentz frame in which one chooses to perform them; therefore, the speed of light is frame-independent, as we argued by a different route in Sec. 1.2. It is this frame independence that enables us to introduce geometrized units with c = 1.

Another example of frame independence (Lorentz invariance) is provided by the interval between two events. The components $g_{\alpha\beta} = \eta_{\alpha\beta}$ of the metric imply that, if $\Delta\vec x$ is the vector separating the two events and $\Delta x^\alpha$ are its components in some Lorentz coordinate system, then the squared length of $\Delta\vec x$ [also called the interval and denoted $(\Delta s)^2$] is given by

$$(\Delta s)^2 \equiv \Delta\vec x\cdot\Delta\vec x = \mathbf g(\Delta\vec x,\Delta\vec x) = g_{\alpha\beta}\,\Delta x^\alpha\Delta x^\beta = -(\Delta t)^2 + (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2 . \qquad (22.7)$$

Since $\Delta\vec x$ is a geometric, frame-independent object, so must be the interval. This implies that the equation $(\Delta s)^2 = -(\Delta t)^2 + (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2$, by which one computes the interval between the two chosen events in one Lorentz frame, must give the same numerical result when used in any other frame; i.e., this expression must be Lorentz invariant. This invariance of the interval is the starting point for most introductions to special relativity, and, indeed, we used it as a starting point in Sec. 1.2.

Spacetime diagrams will play a major role in our development of general relativity. Accordingly, it is important that the reader feel very comfortable with them. We recommend reviewing Fig. 1.13 and Ex. 1.14.

****************************
EXERCISES

Exercise 22.1 Example: Invariance of a Null Interval
You have measured the intervals between a number of adjacent events in spacetime and thereby have deduced the metric $\mathbf g$. Your friend claims that the metric is some other frame-independent tensor $\tilde{\mathbf g}$ that differs from $\mathbf g$. Suppose that your correct metric $\mathbf g$ and his wrong one $\tilde{\mathbf g}$ agree on the forms of the light cones in spacetime, i.e. they agree as to which intervals are null, which are spacelike and which are timelike; but they give different answers for the value of the interval in the spacelike and timelike cases, i.e. $\mathbf g(\Delta\vec x,\Delta\vec x) \neq \tilde{\mathbf g}(\Delta\vec x,\Delta\vec x)$. Prove that $\tilde{\mathbf g}$ and $\mathbf g$ differ solely by a scalar multiplicative factor. [Hint: pick some Lorentz frame and perform computations there, then lift yourself back up to a frame-independent viewpoint.]

Exercise 22.2 Problem: Causality
If two events occur at the same spatial point but not simultaneously in one inertial frame, prove that the temporal order of these events is the same in all inertial frames. Prove also that in all other frames the temporal interval Δt between the two events is larger than in the first frame, and that there are no limits on the events’ spatial or temporal separation in the other frames. Give two proofs of these results, one algebraic and the other via spacetime diagrams.

****************************
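Both invariance statements in Sec. 22.2.3 are easy to spot-check numerically. The Python sketch below (an illustration, not part of the original text) first recovers c from $c^2 = 1/(\mu_0\epsilon_0)$ using approximate SI values typed in by hand, then applies a pure boost along x (one assumed special case of a Lorentz transformation) to the components of a single separation vector and confirms that Eq. (22.7) gives the same interval in every frame. The last line illustrates the setting of Ex. 22.2 for two events at one spatial point; it is a numerical check for a few speeds, not the proof the exercise asks for.

```python
import math

# Approximate SI values: mu0 = 4 pi x 1e-7 N A^-2 (exact before the 2019 SI
# redefinition) and the CODATA 2018 value of eps0.
MU0 = 4e-7 * math.pi
EPS0 = 8.8541878128e-12
c = 1.0 / math.sqrt(MU0 * EPS0)        # speed of light from c^2 = 1/(mu0 eps0)


def boost_x(beta, sep):
    """Components (dt, dx, dy, dz) of a separation as seen from a frame moving
    with speed beta along x relative to the original frame (c = 1, |beta| < 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    dt, dx, dy, dz = sep
    return (gamma * (dt - beta * dx), gamma * (dx - beta * dt), dy, dz)


def interval(sep):
    """(Delta s)^2 = -(Delta t)^2 + (Delta x)^2 + (Delta y)^2 + (Delta z)^2, Eq. (22.7)."""
    dt, dx, dy, dz = sep
    return -dt**2 + dx**2 + dy**2 + dz**2


sep = (3.0, 1.0, -2.0, 0.5)            # arbitrary separation in the first frame
betas = (0.0, 0.5, -0.9)
print(f"c = {c:.6e} m/s")
print([interval(boost_x(b, sep)) for b in betas])    # equal up to rounding

same_place = (2.0, 0.0, 0.0, 0.0)      # two events at one spatial point
print([boost_x(b, same_place)[0] for b in betas])    # dt' = gamma * dt >= dt > 0
```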

22.3

Differential Geometry in General Bases and in Curved Manifolds

The tensor-analysis formalism reviewed in the last section is inadequate for general relativity in several ways. First, in general relativity we shall need to use bases $\vec e_\alpha$ that are not orthonormal, i.e. for which $\vec e_\alpha\cdot\vec e_\beta \neq \eta_{\alpha\beta}$. For example, near a spinning black hole there is much power in using a time basis vector $\vec e_t$ that is tied in a simple way to the metric’s time-translation symmetry and a spatial basis vector $\vec e_\phi$ that is tied to its rotational symmetry. This time basis vector has an inner product with itself, $\vec e_t\cdot\vec e_t = g_{tt}$, that is influenced by the slowing of time near the hole, so $g_{tt} \neq -1$; and $\vec e_\phi$ is not orthogonal to $\vec e_t$, $\vec e_t\cdot\vec e_\phi = g_{t\phi} \neq 0$, as a result of the dragging of inertial frames by the hole’s spin. In this section we shall generalize our formalism to treat such non-orthonormal bases.

Second, in the curved spacetime of general relativity (and in any other curved manifold, e.g. the two-dimensional surface of the earth) the definition of a vector as an arrow connecting two points is suspect, as it is not obvious on what route the arrow should travel nor that the linear algebra of tensor analysis should be valid for such arrows. In this section we shall refine the concept of a vector to deal with this problem, and in the process we shall find ourselves introducing the concept of a tangent space in which the linear algebra of tensors takes place, with a different tangent space for tensors that live at different points in the manifold.

Third, once we have been forced to think of a tensor as residing in a specific tangent space at a specific point in the manifold, there arises the question of how one can transport tensors from the tangent space at one point to the tangent space at an adjacent point. Since the notion of a gradient of a vector depends on comparing the vector at two different points and thus depends on the details of transport, we will have to rework the notion of a gradient and the gradient’s connection coefficients; and since, in doing an integral, one must add contributions that live at different points in the manifold, we must also rework the notion of integration. We shall tackle each of these three issues in turn in the following four subsections.

22.3.1

Non-Orthonormal Bases

Consider an n-dimensional manifold, e.g. 4-dimensional spacetime or 3-dimensional Euclidean space or the 2-dimensional surface of a sphere. At some point $\mathcal P$ in the manifold, introduce a set of basis vectors $\{\vec e_1, \vec e_2, \ldots, \vec e_n\}$ and denote them generally as $\vec e_\alpha$. We seek to generalize the formalism of Sec. 22.2 in such a way that the index manipulation rules for components of tensors are unchanged. For example, we still want it to be true that covariant components of any tensor are computable by inserting the basis vectors into the tensor’s slots, $F_{\alpha\beta} = \mathbf F(\vec e_\alpha,\vec e_\beta)$, that the tensor itself can be reconstructed from its contravariant components as $\mathbf F = F^{\mu\nu}\vec e_\mu\otimes\vec e_\nu$, and that the two sets of components are computable from each other via raising and lowering with the metric components, $F_{\alpha\beta} = g_{\alpha\mu}g_{\beta\nu}F^{\mu\nu}$. The only thing we do not want to preserve is the orthonormal values of the metric components; i.e. we must allow the basis to be nonorthonormal and thus $\vec e_\alpha\cdot\vec e_\beta = g_{\alpha\beta}$ to have arbitrary values (except that the metric should be nondegenerate, so no linear combination of the $\vec e_\alpha$’s vanishes, which means that the matrix $\|g_{\alpha\beta}\|$ should have nonzero determinant). We can easily achieve our goal by introducing a second set of basis vectors, denoted $\{\vec e^{\,1}, \vec e^{\,2}, \ldots, \vec e^{\,n}\}$, which is dual to our first set in the sense that

$$\vec e^{\,\mu}\cdot\vec e_\beta \equiv \mathbf g(\vec e^{\,\mu},\vec e_\beta) = \delta^\mu{}_\beta, \qquad (22.8)$$

where $\delta^\mu{}_\beta$ is the Kronecker delta. This duality relation actually constitutes a definition of the $\vec e^{\,\mu}$ once the $\vec e_\alpha$ have been chosen. To see this, regard $\vec e^{\,\mu}$ as a tensor of rank one. This tensor is defined as soon as its value on each and every vector has been determined. Expression (22.8) gives the value $\vec e^{\,\mu}(\vec e_\beta) = \vec e^{\,\mu}\cdot\vec e_\beta$ of $\vec e^{\,\mu}$ on each of the basis vectors $\vec e_\beta$; and since every other vector can be expanded in terms of the $\vec e_\beta$’s and $\vec e^{\,\mu}(\;)$ is a linear function, Eq. (22.8) thereby determines the value of $\vec e^{\,\mu}$ on every other vector.

The duality relation (22.8) says that $\vec e^{\,1}$ is always perpendicular to all the $\vec e_\alpha$ except $\vec e_1$, and its scalar product with $\vec e_1$ is unity; similarly for the other basis vectors. This interpretation is illustrated for 3-dimensional Euclidean space in Fig. 22.2. In Minkowski spacetime, if $\vec e_\alpha$ are an orthonormal Lorentz basis, then duality dictates that $\vec e^{\,0} = -\vec e_0$ and $\vec e^{\,j} = +\vec e_j$.

[Figure 22.2: basis vectors $\vec e_1$, $\vec e_2$, $\vec e_3$ and dual vectors $\vec e^{\,1}$, $\vec e^{\,3}$ in Euclidean 3-space]
Fig. 22.2: Non-orthonormal basis vectors $\vec e_j$ in Euclidean 3-space and two members $\vec e^{\,1}$ and $\vec e^{\,3}$ of the dual basis. The vectors $\vec e_1$ and $\vec e_2$ lie in the horizontal plane, so $\vec e^{\,3}$ is orthogonal to that plane, i.e. it points vertically upward, and its inner product with $\vec e_3$ is unity. Similarly, the vectors $\vec e_2$ and $\vec e_3$ span a vertical plane, so $\vec e^{\,1}$ is orthogonal to that plane, i.e. it points horizontally, and its inner product with $\vec e_1$ is unity.

The duality relation (22.8) leads immediately to the same index-manipulation formalism as we have been using, if one defines the contravariant, covariant and mixed components of tensors in the obvious manner

$$F^{\mu\nu} = \mathbf F(\vec e^{\,\mu},\vec e^{\,\nu}), \qquad F_{\alpha\beta} = \mathbf F(\vec e_\alpha,\vec e_\beta), \qquad F^\mu{}_\beta = \mathbf F(\vec e^{\,\mu},\vec e_\beta); \qquad (22.9)$$

see Ex. 22.4. Among the consequences of this duality are the following:

(i)
$$g^{\mu\beta}g_{\nu\beta} = \delta^\mu{}_\nu, \tag{22.10}$$
i.e., the matrix of contravariant components of the metric is inverse to that of the covariant components, $||g^{\mu\nu}|| = ||g_{\alpha\beta}||^{-1}$; this relation guarantees that when one raises indices on a tensor $F_{\alpha\beta}$ with $g^{\mu\alpha}$ and then lowers them back down with $g_{\nu\beta}$, one recovers one's original covariant components $F_{\alpha\beta}$ unaltered.

(ii)
$$F = F^{\mu\nu}\vec e_\mu\otimes\vec e_\nu = F_{\alpha\beta}\vec e^\alpha\otimes\vec e^\beta = F^\mu{}_\beta\vec e_\mu\otimes\vec e^\beta, \tag{22.11}$$
i.e., one can reconstruct a tensor from its components by lining up the indices in a manner that accords with the rules of index manipulation.

(iii)
$$F(\vec p, \vec q) = F^{\alpha\beta}p_\alpha q_\beta, \tag{22.12}$$
i.e., the component versions of tensorial equations are identical in mathematical symbology to the slot-naming index-notation versions.
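As a numerical illustration of the duality relation (22.8) and of consequences (22.10) and (22.12), the following Python sketch (standard library only) builds the dual basis of a non-orthonormal basis in Euclidean 3-space. The particular basis vectors, the tensor (encoded by a hypothetical matrix `M`), and the vectors `p`, `q` are arbitrary choices made for illustration; the dual basis is formed as $\vec e^\mu = g^{\mu\alpha}\vec e_\alpha$, the relation derived in Ex. 22.4.

```python
# Sketch: dual basis of an (arbitrarily chosen) non-orthonormal basis in
# Euclidean 3-space. The basis vectors, the matrix M and the vectors p, q
# are illustrative assumptions, not taken from the text.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[entry / det for entry in row] for row in adj]


N = range(3)

# Basis vectors e_1, e_2, e_3 (Cartesian components); not orthonormal.
e_low = [[1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0],
         [0.0, 0.5, 2.0]]

g_low = [[dot(ea, eb) for eb in e_low] for ea in e_low]  # g_{ab} = e_a . e_b
g_up = inverse3(g_low)                                   # ||g^{ab}||, Eq. (22.10)

# Dual basis e^m = g^{ma} e_a (the relation asked for in Ex. 22.4)
e_up = [[sum(g_up[m][a] * e_low[a][k] for a in N) for k in N] for m in N]

# Eq. (22.8): e^m . e_b should be the Kronecker delta
duality = [[dot(e_up[m], e_low[b]) for b in N] for m in N]

# A rank-2 tensor realized as a bilinear function of Cartesian vectors
M = [[2.0, -1.0, 0.0],
     [0.5, 1.0, 3.0],
     [0.0, 0.0, -1.0]]


def F(u, v):
    return sum(u[i] * M[i][j] * v[j] for i in N for j in N)


F_up = [[F(e_up[a], e_up[b]) for b in N] for a in N]  # F^{ab}, Eq. (22.9)

p, q = [1.0, 2.0, -1.0], [0.0, 3.0, 1.0]
p_low = [dot(p, ea) for ea in e_low]  # covariant components p_a = p . e_a
q_low = [dot(q, ea) for ea in e_low]

lhs = F(p, q)
rhs = sum(F_up[a][b] * p_low[a] * q_low[b] for a in N for b in N)  # Eq. (22.12)
```

Because $g^{\mu\alpha}$ is computed as the matrix inverse of $g_{\alpha\beta}$, the check of `duality` against the Kronecker delta is Eq. (22.10) in disguise; the comparison of `lhs` with `rhs` exercises Eq. (22.12).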

Box 23.1 Dual Bases in Other Contexts

Vector spaces appear in a wide variety of contexts in mathematics and physics, and wherever they appear it can be useful to introduce dual bases. When a vector space does not possess a metric, the basis $\{\vec e^\mu\}$ lives in a different space from $\{\vec e_\alpha\}$, and the two spaces are said to be dual to each other. An important example occurs in manifolds that do not have metrics. There the vectors in the space spanned by $\{\vec e^\mu\}$ are often called one-forms and are represented pictorially as families of parallel surfaces; the vectors in the space spanned by $\{\vec e_\alpha\}$ are called tangent vectors and are represented pictorially as arrows; the one-forms are linear functions of tangent vectors, and the result $\tilde\beta(\vec a)$ that a one-form $\tilde\beta$ gives when a tangent vector $\vec a$ is inserted into its slot is the number of surfaces of $\tilde\beta$ pierced by the arrow $\vec a$; see, e.g., MTW. A metric produces a one-to-one mapping between the one-forms and the tangent vectors. In this book we regard this mapping as equating each one-form to a tangent vector, and thereby as making the space of one-forms and the space of tangent vectors identical. This permits us to avoid ever speaking about one-forms, except here in this box.

Quantum mechanics provides another example of dual spaces. The kets $|\psi\rangle$ are the tangent vectors and the bras $\langle\phi|$ are the one-forms: linear, complex-valued functions of kets, with the value that $\langle\phi|$ gives when $|\psi\rangle$ is inserted into its slot being the inner product $\langle\phi|\psi\rangle$.

Associated with any coordinate system $x^\alpha(P)$ there is a coordinate basis whose basis vectors are defined by

$$\vec e_\alpha \equiv \frac{\partial P}{\partial x^\alpha}. \tag{22.13}$$

Since the derivative is taken holding the other coordinates fixed, the basis vector $\vec e_\alpha$ points along the α coordinate axis (the axis on which $x^\alpha$ changes and all the other coordinates are held fixed). In an orthogonal curvilinear coordinate system, e.g. circular polar coordinates $(\varpi, \phi)$ in Euclidean 2-space, this coordinate basis is quite different from the coordinate system's orthonormal basis. For example, $\vec e_\phi = (\partial P/\partial\phi)_\varpi$ is a very long vector at large radii and a very short vector at small radii [cf. Fig. 22.3]; the corresponding unit-length vector is $\vec e_{\hat\phi} = (1/\varpi)\vec e_\phi$. By contrast, $\vec e_\varpi = (\partial P/\partial\varpi)_\phi$ already has unit length, so the corresponding orthonormal basis vector is simply $\vec e_{\hat\varpi} = \vec e_\varpi$. The metric components in the coordinate basis are readily seen to be $g_{\phi\phi} = \varpi^2$, $g_{\varpi\varpi} = 1$, $g_{\varpi\phi} = g_{\phi\varpi} = 0$, which is in accord with the equation for the squared distance (interval) between adjacent points, $ds^2 = g_{ij}dx^i dx^j = d\varpi^2 + \varpi^2 d\phi^2$. The metric components in the orthonormal basis, of course, are $g_{\hat i\hat j} = \delta_{ij}$. Henceforth, we shall use hats to identify orthonormal bases; bases whose indices do not have hats will typically (though not always) be coordinate bases.

In general, we can construct the basis $\{\vec e^\mu\}$ that is dual to the coordinate basis $\{\vec e_\alpha\} = \{\partial P/\partial x^\alpha\}$ by taking the gradients of the coordinates, viewed as scalar fields $x^\alpha(P)$:

$$\vec e^\mu = \vec\nabla x^\mu. \tag{22.14}$$

Fig. 22.3: A circular coordinate system $\{\varpi, \phi\}$ and its coordinate basis vectors $\vec e_\varpi = \partial P/\partial\varpi$, $\vec e_\phi = \partial P/\partial\phi$ at several locations in the coordinate system. Also shown is the orthonormal basis vector $\vec e_{\hat\phi}$.

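It is straightforward to verify the duality relation (22.8) for these two bases:

$$\vec e^\mu\cdot\vec e_\alpha = \vec e_\alpha\cdot\vec\nabla x^\mu = \nabla_{\vec e_\alpha}x^\mu = \nabla_{\partial P/\partial x^\alpha}x^\mu = \frac{\partial x^\mu}{\partial x^\alpha} = \delta^\mu{}_\alpha. \tag{22.15}$$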
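The gradient construction (22.14) and the duality check (22.15) can be tried out for the circular polar coordinates of Fig. 22.3. In this Python sketch (standard library only), partial derivatives are approximated by central finite differences with an assumed step `H` at an arbitrarily chosen point; it should reproduce $g_{\varpi\varpi} = 1$, $g_{\phi\phi} = \varpi^2$, $g_{\varpi\phi} = 0$, and $\vec e^\mu\cdot\vec e_\alpha = \delta^\mu{}_\alpha$ to within the finite-difference error.

```python
import math

# Sketch (assumptions: sample point, step size H, finite differences):
# coordinate basis e_alpha = dP/dx^alpha, Eq. (22.13), and dual basis
# e^mu = grad x^mu, Eq. (22.14), for circular polar coordinates (varpi, phi).

H = 1e-6
VARPI0, PHI0 = 2.0, 0.7


def position(varpi, phi):
    """Cartesian components of the point P with polar coordinates (varpi, phi)."""
    return (varpi * math.cos(phi), varpi * math.sin(phi))


def polar_coords(x, y):
    """The coordinates viewed as scalar fields on the plane: (varpi, phi)(P)."""
    return (math.hypot(x, y), math.atan2(y, x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def coordinate_basis_vector(alpha):
    """Central-difference estimate of dP/dx^alpha at (VARPI0, PHI0)."""
    step = [0.0, 0.0]
    step[alpha] = H
    plus = position(VARPI0 + step[0], PHI0 + step[1])
    minus = position(VARPI0 - step[0], PHI0 - step[1])
    return [(p - m) / (2 * H) for p, m in zip(plus, minus)]


def gradient_of_coordinate(mu):
    """Central-difference estimate of grad x^mu (Cartesian components)."""
    x0, y0 = position(VARPI0, PHI0)
    gx = (polar_coords(x0 + H, y0)[mu] - polar_coords(x0 - H, y0)[mu]) / (2 * H)
    gy = (polar_coords(x0, y0 + H)[mu] - polar_coords(x0, y0 - H)[mu]) / (2 * H)
    return [gx, gy]


e_low = [coordinate_basis_vector(a) for a in range(2)]  # e_varpi, e_phi
e_up = [gradient_of_coordinate(m) for m in range(2)]    # e^varpi, e^phi

metric = [[dot(a, b) for b in e_low] for a in e_low]    # expect diag(1, varpi^2)
duality = [[dot(m, a) for a in e_low] for m in e_up]    # expect delta, Eq. (22.15)
```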

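In any coordinate system, the expansion of the metric in terms of the dual basis, $g = g_{\alpha\beta}\vec e^\alpha\otimes\vec e^\beta = g_{\alpha\beta}\vec\nabla x^\alpha\otimes\vec\nabla x^\beta$, is intimately related to the line element $ds^2 = g_{\alpha\beta}dx^\alpha dx^\beta$. Consider an infinitesimal vectorial displacement $d\vec x = dx^\alpha(\partial/\partial x^\alpha)$. Insert this displacement into the metric's two slots to obtain the interval $ds^2$ along $d\vec x$. The result is $ds^2 = g_{\alpha\beta}\vec\nabla x^\alpha\otimes\vec\nabla x^\beta(d\vec x, d\vec x) = g_{\alpha\beta}(d\vec x\cdot\vec\nabla x^\alpha)(d\vec x\cdot\vec\nabla x^\beta) = g_{\alpha\beta}dx^\alpha dx^\beta$; i.e.

$$ds^2 = g_{\alpha\beta}dx^\alpha dx^\beta. \tag{22.16}$$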

Here the second equality follows from the definition of the tensor product ⊗, and the third from the fact that for any scalar field ψ, $d\vec x\cdot\vec\nabla\psi$ is the change dψ along $d\vec x$.

Any two bases $\{\vec e_\alpha\}$ and $\{\vec e_{\bar\mu}\}$ can be expanded in terms of each other:

$$\vec e_\alpha = \vec e_{\bar\mu}L^{\bar\mu}{}_\alpha, \qquad \vec e_{\bar\mu} = \vec e_\alpha L^\alpha{}_{\bar\mu}. \tag{22.17}$$

(Note: by convention the first index on L is always placed up and the second is always placed down.) The quantities $||L^{\bar\mu}{}_\alpha||$ and $||L^\alpha{}_{\bar\mu}||$ are transformation matrices, and since they operate in opposite directions, they must be the inverse of each other:

$$L^{\bar\mu}{}_\alpha L^\alpha{}_{\bar\nu} = \delta^{\bar\mu}{}_{\bar\nu}, \qquad L^\alpha{}_{\bar\mu}L^{\bar\mu}{}_\beta = \delta^\alpha{}_\beta. \tag{22.18}$$

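These $||L^{\bar\mu}{}_\alpha||$ are the generalizations of Lorentz transformations to arbitrary bases; cf. Eqs. (1.47a), (1.47b). As in the Lorentz-transformation case, the transformation laws (22.17) for the basis vectors imply corresponding transformation laws for components of vectors and tensors, laws that entail lining up indices in the obvious manner; e.g.

$$A_{\bar\mu} = L^\alpha{}_{\bar\mu}A_\alpha, \qquad T^{\bar\mu\bar\nu}{}_{\bar\rho} = L^{\bar\mu}{}_\alpha L^{\bar\nu}{}_\beta L^\gamma{}_{\bar\rho}T^{\alpha\beta}{}_\gamma, \tag{22.19}$$

and similarly in the opposite direction. For coordinate bases, these $L^{\bar\mu}{}_\alpha$ are simply the partial derivatives of one set of coordinates with respect to the other,

$$L^{\bar\mu}{}_\alpha = \frac{\partial x^{\bar\mu}}{\partial x^\alpha}, \qquad L^\alpha{}_{\bar\mu} = \frac{\partial x^\alpha}{\partial x^{\bar\mu}}, \tag{22.20}$$

as one can easily deduce via

$$\vec e_\alpha = \frac{\partial P}{\partial x^\alpha} = \frac{\partial x^{\bar\mu}}{\partial x^\alpha}\frac{\partial P}{\partial x^{\bar\mu}} = \vec e_{\bar\mu}\frac{\partial x^{\bar\mu}}{\partial x^\alpha}. \tag{22.21}$$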
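For the Cartesian and circular polar coordinate bases, Eq. (22.20) gives the two transformation matrices explicitly. The Python sketch below (the sample point and the components `A_cart` are illustrative assumptions) checks that the matrices are inverse to each other, Eq. (22.18), transforms contravariant components by lining up indices, Eq. (22.19), and then rebuilds the original Cartesian components from the polar basis vectors of Eq. (22.17).

```python
import math

# Sketch: transformation matrices between the Cartesian coordinate basis
# {e_x, e_y} and the polar coordinate basis {e_varpi, e_phi}, Eq. (22.20).
# The sample point and the vector components A_cart are assumptions.

varpi, phi = 1.5, 0.4
x, y = varpi * math.cos(phi), varpi * math.sin(phi)

# L^alpha_{mubar} = dx^alpha/dx^{mubar}: rows (x, y), columns (varpi, phi)
L_cart_from_polar = [[math.cos(phi), -varpi * math.sin(phi)],
                     [math.sin(phi), varpi * math.cos(phi)]]

# L^{mubar}_alpha = dx^{mubar}/dx^alpha: rows (varpi, phi), columns (x, y)
L_polar_from_cart = [[x / varpi, y / varpi],
                     [-y / varpi ** 2, x / varpi ** 2]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Eq. (22.18): the product should be the identity matrix
identity = matmul(L_polar_from_cart, L_cart_from_polar)

# Contravariant components, Eq. (22.19): A^{mubar} = L^{mubar}_alpha A^alpha
A_cart = [3.0, -1.0]
A_polar = [sum(L_polar_from_cart[m][a] * A_cart[a] for a in range(2))
           for m in range(2)]

# Rebuild the arrow from e_{mubar} = e_alpha L^alpha_{mubar}, Eq. (22.17):
# the Cartesian components of e_{mubar} are the columns of L_cart_from_polar.
A_rebuilt = [sum(L_cart_from_polar[a][m] * A_polar[m] for m in range(2))
             for a in range(2)]
```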

In many physics textbooks a tensor is defined as a set of components $F_{\alpha\beta}$ that obey the transformation law

$$F_{\bar\alpha\bar\beta} = F_{\mu\nu}\frac{\partial x^\mu}{\partial x^{\bar\alpha}}\frac{\partial x^\nu}{\partial x^{\bar\beta}}. \tag{22.22}$$

This definition is in accord with Eqs. (22.19) and (22.20), though it hides the true and very simple nature of a tensor as a linear function of frame-independent vectors.

22.3.2

Vectors as Differential Operators; Tangent Space; Commutators

As was discussed above, the notion of a vector as an arrow connecting two points is problematic in a curved manifold, and must be refined. As a first step in the refinement, let us consider the tangent vector $\vec A$ to a curve P(ζ) at some point $P_o \equiv P(\zeta = 0)$. We have defined that tangent vector by the limiting process

$$\vec A \equiv \frac{dP}{d\zeta} \equiv \lim_{\Delta\zeta\to 0}\frac{P(\Delta\zeta) - P(0)}{\Delta\zeta}; \tag{22.23}$$

cf. Eq. (22.2). In this definition the difference $P(\Delta\zeta) - P(0)$ means the tiny arrow reaching from $P(0) \equiv P_o$ to $P(\Delta\zeta)$. In the limit as Δζ becomes vanishingly small, these two points get arbitrarily close together; and in such an arbitrarily small region of the manifold, the effects of the manifold's curvature become arbitrarily small and negligible (just think of an arbitrarily tiny region on the surface of a sphere), so the notion of the arrow should become sensible. However, before the limit is completed, we are required to divide by Δζ, which makes our arbitrarily tiny arrow big again. What meaning can we give to this? One way to think about it is to imagine embedding the curved manifold in a higher-dimensional flat space (e.g., embed the surface of a sphere in a flat 3-dimensional Euclidean space as shown in Fig. 22.4). Then the tiny arrow $P(\Delta\zeta) - P(0)$ can be thought of equally well as lying on the sphere, or as lying in a surface that is tangent to the sphere and is flat, as measured in the flat embedding space. We can give meaning to $[P(\Delta\zeta) - P(0)]/\Delta\zeta$ if we regard this as a formula for lengthening an arrow-type vector in the flat tangent surface; correspondingly, we must regard the resulting tangent vector $\vec A$ as an arrow living in the tangent surface.

Fig. 22.4: A curve P(ζ) on the surface of a sphere, marked at ζ = −0.5, 0, and 0.5, and the curve's tangent vector $\vec A = dP/d\zeta$ at $P(\zeta = 0) \equiv P_o$. The tangent vector lives in the tangent space at $P_o$, i.e. in the flat plane that is tangent to the sphere there, as seen in the flat Euclidean 3-space in which the sphere's surface is embedded.

The (conceptual) flat tangent surface at the point $P_o$ is called the tangent space to the curved manifold at that point. It has the same number of dimensions n as the manifold itself (two in the case of Fig. 22.4). Vectors at $P_o$ are arrows residing in that point's tangent space, tensors at $P_o$ are linear functions of these vectors, and all the linear algebra of vectors and tensors that reside at $P_o$ occurs in this tangent space. For example, the inner product of two vectors $\vec A$ and $\vec B$ at $P_o$ (two arrows living in the tangent space there) is computed via the standard relation $\vec A\cdot\vec B = g(\vec A, \vec B)$ using the metric g that also resides in the tangent space.

This pictorial way of thinking about the tangent space, and the vectors and tensors that reside in it, is far too heuristic to satisfy most mathematicians. Therefore, mathematicians have insisted on making it much more precise at the price of greater abstraction: they define the tangent vector to the curve P(ζ) to be the derivative d/dζ which differentiates scalar fields along the curve. This derivative operator is very well defined by the rules of ordinary differentiation; if ψ(P) is a scalar field in the manifold, then ψ[P(ζ)] is a function of the real variable ζ, and its derivative (d/dζ)ψ[P(ζ)] evaluated at ζ = 0 is the ordinary derivative of elementary calculus. Since the derivative operator d/dζ differentiates in the manifold along the direction in which the curve is moving, it is often called the directional derivative along P(ζ). Mathematicians notice that all the directional derivatives at a point $P_o$ of the manifold form a vector space (they can be multiplied by scalars and added and subtracted to get new vectors), and so they define this vector space to be the tangent space at $P_o$. This mathematical procedure turns out to be isomorphic to the physicists' more heuristic way of thinking about the tangent space. In physicists' language, if one introduces a coordinate system in a region of the manifold containing $P_o$ and constructs the corresponding coordinate basis $\vec e_\alpha = \partial P/\partial x^\alpha$, then one can expand any vector in the tangent space as $\vec A = A^\alpha\,\partial P/\partial x^\alpha$. One can also construct, in physicists' language, the directional derivative along $\vec A$; it is $\partial_{\vec A} \equiv A^\alpha\,\partial/\partial x^\alpha$. Evidently, the components $A^\alpha$ of the physicist's vector $\vec A$ (an arrow) are identical to the coefficients $A^\alpha$ in the coordinate expansion of the directional derivative $\partial_{\vec A}$. There therefore is a one-to-one correspondence between the directional derivatives $\partial_{\vec A}$ at $P_o$ and the vectors $\vec A$ there, and a complete isomorphism between the tangent-space manipulations that a mathematician will perform treating the directional derivatives as vectors, and those that a physicist will perform treating the arrows as vectors.

"Why not abandon the fuzzy concept of a vector as an arrow, and redefine the vector $\vec A$ to be the same as the directional derivative $\partial_{\vec A}$?" mathematicians have demanded of physicists. Slowly, over the past century, physicists have come to see the merit in this approach: (i) It does, indeed, make the concept of a vector more rigorous than before. (ii) It simplifies a number of other concepts in mathematical physics, e.g., the commutator of two vector fields; see below. (iii) It facilitates communication with mathematicians.
With these motivations in mind, and because one always gains conceptual and computational power by having multiple viewpoints at one's fingertips (see, e.g., Feynman, 1966), we shall regard vectors henceforth both as arrows living in a tangent space and as directional derivatives. Correspondingly, we shall assert the equalities

$$\vec A = \partial_{\vec A}, \qquad \frac{\partial P}{\partial x^\alpha} = \frac{\partial}{\partial x^\alpha}, \tag{22.24}$$

and shall often expand vectors in a coordinate basis using the notation

$$\vec A = A^\alpha\frac{\partial}{\partial x^\alpha}. \tag{22.25}$$
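The identification (22.24) of a vector with a directional derivative can be checked on a concrete curve: the ordinary derivative $(d/d\zeta)\psi[P(\zeta)]$ at ζ = 0 should equal $A^\alpha\,\partial\psi/\partial x^\alpha$ with $A^\alpha = dx^\alpha/d\zeta$. The curve, the scalar field ψ, and the finite-difference step in this Python sketch are hypothetical choices made only for illustration.

```python
import math

# Sketch: a vector viewed as a directional derivative, Eqs. (22.24)-(22.25).
# The curve, the scalar field psi and the step h are illustrative assumptions.

h = 1e-6


def curve(zeta):
    """Polar coordinates (varpi, phi) of the point P(zeta) on a sample curve."""
    return (1.0 + 0.5 * zeta, 0.3 + zeta)


def psi(varpi, phi):
    """A sample scalar field on the plane, written in polar coordinates."""
    return varpi ** 2 * math.cos(phi)


# Mathematician's view: ordinary derivative d/dzeta of psi[P(zeta)] at zeta = 0
d_dzeta = (psi(*curve(h)) - psi(*curve(-h))) / (2 * h)

# Physicist's view: components A^alpha = dx^alpha/dzeta of the arrow ...
x0 = curve(0.0)
A = [(cp - cm) / (2 * h) for cp, cm in zip(curve(h), curve(-h))]


def partial(alpha):
    """Central-difference estimate of dpsi/dx^alpha at x0."""
    dx = [0.0, 0.0]
    dx[alpha] = h
    plus = psi(x0[0] + dx[0], x0[1] + dx[1])
    minus = psi(x0[0] - dx[0], x0[1] - dx[1])
    return (plus - minus) / (2 * h)


# ... contracted with the coordinate partial derivatives, A^alpha dpsi/dx^alpha
directional = sum(A[a] * partial(a) for a in range(2))
```

For this choice, $A^\alpha = (0.5, 1)$ and the exact value is $\cos 0.3 - \sin 0.3$; both estimates should agree with it to within finite-difference error.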

This directional-derivative viewpoint on vectors makes natural the concept of the commutator of two vector fields $\vec A$ and $\vec B$: $[\vec A, \vec B]$ is the vector which, when viewed as a differential operator, is given by $[\partial_{\vec A}, \partial_{\vec B}]$, where the latter quantity is the same commutator as one meets elsewhere in physics, e.g. in quantum mechanics. Using this definition, we can compute the components of the commutator in a coordinate basis:

$$[\vec A, \vec B] \equiv \left[A^\alpha\frac{\partial}{\partial x^\alpha}, B^\beta\frac{\partial}{\partial x^\beta}\right] = \left(A^\alpha\frac{\partial B^\beta}{\partial x^\alpha} - B^\alpha\frac{\partial A^\beta}{\partial x^\alpha}\right)\frac{\partial}{\partial x^\beta}. \tag{22.26}$$

This is an operator equation where the final derivative is presumed to operate on a scalar field, just as in quantum mechanics. From this equation we can read off the components of the commutator in any coordinate basis; they are $A^\alpha B^\beta{}_{,\alpha} - B^\alpha A^\beta{}_{,\alpha}$, where the comma denotes partial differentiation. Figure 22.5 uses this equation to deduce the geometric meaning of the commutator: it is the fifth leg needed to close a quadrilateral whose other four legs are constructed from the vector fields $\vec A$ and $\vec B$.

Fig. 22.5: The commutator $[\vec A, \vec B]$ of two vector fields. In this diagram the vectors are assumed to be so small that the curvature of the manifold is negligible in the region of the diagram, so all the vectors can be drawn lying in the surface itself rather than in their respective tangent spaces. In evaluating the two terms in the commutator (22.26), a locally orthonormal coordinate basis is used, so $A^\alpha\partial B^\beta/\partial x^\alpha$ is the amount by which the vector $\vec B$ changes when one travels along $\vec A$ (i.e. it is the short dashed curve in the upper right), and $B^\alpha\partial A^\beta/\partial x^\alpha$ is the amount by which $\vec A$ changes when one travels along $\vec B$ (i.e. it is the other short dashed curve). According to Eq. (22.26), the difference of these two short-dashed curves is the commutator $[\vec A, \vec B]$. As the diagram shows, this commutator closes the quadrilateral whose legs are $\vec A$ and $\vec B$. If the commutator vanishes, then there is no gap in the quadrilateral, which means that in the region covered by this diagram one can construct a coordinate system in which $\vec A$ and $\vec B$ are coordinate basis vectors.

The commutator is useful as a tool for distinguishing between coordinate bases and non-coordinate bases (also called non-holonomic bases): In a coordinate basis, the basis vectors are just the coordinate system's partial derivatives, $\vec e_\alpha = \partial/\partial x^\alpha$, and since partial derivatives commute, it must be that $[\vec e_\alpha, \vec e_\beta] = 0$. Conversely (as Fig. 22.5 explains), if one has a basis with vanishing commutators $[\vec e_\alpha, \vec e_\beta] = 0$, then it is possible to construct a coordinate system for which this is the coordinate basis. In a non-coordinate basis, at least one of the commutators $[\vec e_\alpha, \vec e_\beta]$ will be nonzero.
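As an example of a non-coordinate basis, consider the orthonormal polar basis $\vec e_{\hat\varpi} = \partial/\partial\varpi$, $\vec e_{\hat\phi} = (1/\varpi)\,\partial/\partial\phi$. The Python sketch below evaluates Eq. (22.26) by central finite differences (the step size and sample point are assumptions) and should find $[\vec e_{\hat\varpi}, \vec e_{\hat\phi}] = -(1/\varpi^2)\,\partial/\partial\phi = -(1/\varpi)\vec e_{\hat\phi} \neq 0$, whereas the coordinate basis vectors $\partial/\partial\varpi$ and $\partial/\partial\phi$ commute.

```python
# Sketch: commutator components from Eq. (22.26),
#   [A, B]^beta = A^alpha dB^beta/dx^alpha - B^alpha dA^beta/dx^alpha,
# evaluated by central finite differences in polar coordinates (varpi, phi).
# The sample point and step size are illustrative assumptions.

H = 1e-6


def partials(field, x):
    """d[alpha][beta] = d(field^beta)/dx^alpha at coordinates x."""
    out = []
    for alpha in range(len(x)):
        xp, xm = list(x), list(x)
        xp[alpha] += H
        xm[alpha] -= H
        out.append([(p - m) / (2 * H) for p, m in zip(field(xp), field(xm))])
    return out


def commutator(A, B, x):
    """Coordinate components of [A, B] at x; A and B map x to components."""
    Ax, Bx = A(x), B(x)
    dA, dB = partials(A, x), partials(B, x)
    n = len(x)
    return [sum(Ax[a] * dB[a][b] - Bx[a] * dA[a][b] for a in range(n))
            for b in range(n)]


def e_varpi_hat(x):
    return [1.0, 0.0]           # d/dvarpi


def e_phi_hat(x):
    return [0.0, 1.0 / x[0]]    # (1/varpi) d/dphi


def e_phi(x):
    return [0.0, 1.0]           # d/dphi (coordinate basis)


x0 = (2.0, 0.5)
c_ortho = commutator(e_varpi_hat, e_phi_hat, x0)  # expect (0, -1/varpi^2)
c_coord = commutator(e_varpi_hat, e_phi, x0)      # expect (0, 0)

# Re-express along e_phihat: -(1/varpi^2) d/dphi = -(1/varpi) e_phihat
c_hat = c_ortho[1] * x0[0]                        # expect -1/varpi
```

In the language of Eq. (22.36) below, `c_hat` is the commutation coefficient $c_{\hat\varpi\hat\phi}{}^{\hat\phi}$ of the orthonormal basis.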

22.3.3

Differentiation of Vectors and Tensors; Connection Coefficients

In a curved manifold, the differentiation of vectors and tensors is rather subtle. To elucidate the problem, let us recall how we defined such differentiation in Minkowski spacetime or Euclidean space (Sec. 1.9). Converting to the above notation, we began by defining the directional derivative of a tensor field F(P) along the tangent vector $\vec A = d/d\zeta$ to a curve P(ζ):

$$\nabla_{\vec A}F \equiv \lim_{\Delta\zeta\to 0}\frac{F[P(\Delta\zeta)] - F[P(0)]}{\Delta\zeta}. \tag{22.27}$$

This definition is problematic because F[P(Δζ)] lives in a different tangent space than F[P(0)]. To make the definition meaningful, we must identify some connection between the two tangent spaces when their points P(Δζ) and P(0) are arbitrarily close together. That connection is equivalent to identifying a rule for transporting F from one tangent space to the other. In flat space or flat spacetime, and when F is a vector $\vec F$, that transport rule is obvious: keep $\vec F$ parallel to itself and keep its length fixed during the transport; in other words, keep constant its components in an orthonormal coordinate system (Cartesian coordinates in Euclidean space, Lorentz coordinates in Minkowski spacetime). This is called the law of parallel transport. For a tensor F the parallel transport law is the same: keep its components fixed in an orthonormal coordinate basis.

In curved spacetime there is no such thing as an orthonormal coordinate basis. Just as the curvature of the earth's surface prevents one from placing a Cartesian coordinate system on it, so the spacetime curvature prevents one from introducing Lorentz coordinates; see Chap. 23. However, in an arbitrarily small region on the earth's surface one can introduce coordinates that are arbitrarily close to Cartesian (as surveyors well know); the fractional deviations from Cartesian need be no larger than $O(L^2/R^2)$, where L is the size of the region and R is the earth's radius (see Sec. 24.3). Correspondingly, in curved spacetime, in an arbitrarily small region one can introduce coordinates that are arbitrarily close to Lorentz, differing only by amounts quadratic in the size of the region. Such coordinates are sufficiently like their flat-space counterparts that they can be used to define parallel transport in curved manifolds: in Eq. (22.27) one must transport F from P(Δζ) to P(0), holding its components fixed in a locally orthonormal coordinate basis (parallel transport), and then take the difference in the tangent space at $P_o = P(0)$, divide by Δζ, and let Δζ → 0. The result is a tensor at $P_o$: the directional derivative $\nabla_{\vec A}F$ of F.

Having made the directional derivative meaningful, one can proceed as in Sec. 1.9 and define the gradient of F by $\nabla_{\vec A}F = \vec\nabla F(\_, \_, \vec A)$ [i.e., put $\vec A$ in the last, differentiation, slot of $\vec\nabla F$; Eq. (1.54b)]. As in Chap. 1, in any basis we denote the components of $\vec\nabla F$ by $F_{\alpha\beta;\gamma}$; and as in Sec. 10.5 (elasticity theory), we can compute these components in any basis with the aid of that basis's connection coefficients (also called Christoffel symbols). In Sec. 10.5 we restricted ourselves to an orthonormal basis in Euclidean space and thus had no need to distinguish between covariant and contravariant indices; all indices were written as subscripts. Now, with non-orthonormal bases and in spacetime, we must distinguish covariant and contravariant indices. Accordingly, by analogy with Eq. (10.38), we define the connection coefficients $\Gamma^\mu{}_{\alpha\beta}$ as

$$\nabla_\beta\vec e_\alpha = \Gamma^\mu{}_{\alpha\beta}\vec e_\mu, \tag{22.28}$$

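where $\nabla_\beta \equiv \nabla_{\vec e_\beta}$. The duality between bases, $\vec e^\nu\cdot\vec e_\alpha = \delta^\nu{}_\alpha$, then implies

$$\nabla_\beta\vec e^\mu = -\Gamma^\mu{}_{\alpha\beta}\vec e^\alpha. \tag{22.29}$$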

Note the sign flip, which is required to keep $\nabla_\beta(\vec e^\mu\cdot\vec e_\alpha) = 0$, and note that the differentiation index always goes last. Duality also implies that Eqs. (22.28) and (22.29) can be rewritten as

$$\Gamma^\mu{}_{\alpha\beta} = \vec e^\mu\cdot\nabla_\beta\vec e_\alpha = -\vec e_\alpha\cdot\nabla_\beta\vec e^\mu. \tag{22.30}$$

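With the aid of these connection coefficients, we can evaluate the components $A^\alpha{}_{;\beta}$ of the gradient of a vector field in any basis. We just compute

$$\begin{aligned} A^\mu{}_{;\beta}\vec e_\mu &= \nabla_\beta\vec A = \nabla_\beta(A^\mu\vec e_\mu) = (\nabla_\beta A^\mu)\vec e_\mu + A^\mu\nabla_\beta\vec e_\mu \\ &= A^\mu{}_{,\beta}\vec e_\mu + A^\mu\Gamma^\alpha{}_{\mu\beta}\vec e_\alpha \\ &= (A^\mu{}_{,\beta} + A^\alpha\Gamma^\mu{}_{\alpha\beta})\vec e_\mu. \end{aligned} \tag{22.31}$$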

In going from the first line to the second, we have used the notation

$$A^\mu{}_{,\beta} \equiv \partial_{\vec e_\beta}A^\mu; \tag{22.32}$$

i.e. the comma denotes the result of letting a basis vector act as a differential operator on the component of the vector. In going from the second line of (22.31) to the third, we have renamed the summed-over index α → μ and renamed μ → α. By comparing the first and last expressions in Eq. (22.31), we conclude that

$$A^\mu{}_{;\beta} = A^\mu{}_{,\beta} + A^\alpha\Gamma^\mu{}_{\alpha\beta}. \tag{22.33}$$
The first term in this equation describes the changes in $\vec A$ associated with changes of its components; the second term corrects for artificial changes of components that are induced by turning and length changes of the basis vectors. By a similar computation, we conclude that in any basis the covariant components of the gradient are

$$A_{\alpha;\beta} = A_{\alpha,\beta} - \Gamma^\mu{}_{\alpha\beta}A_\mu, \tag{22.34}$$

where again $A_{\alpha,\beta} \equiv \partial_{\vec e_\beta}A_\alpha$. Notice that when the index being "corrected" is down [Eq. (22.34)], the connection coefficient has a minus sign; when it is up [Eq. (22.33)], the connection coefficient has a plus sign. This is in accord with the signs in Eqs. (22.29) and (22.30). These considerations should make obvious the following equations for the components of the gradient of a tensor:

$$F^{\alpha\beta}{}_{;\gamma} = F^{\alpha\beta}{}_{,\gamma} + \Gamma^\alpha{}_{\mu\gamma}F^{\mu\beta} + \Gamma^\beta{}_{\mu\gamma}F^{\alpha\mu}, \qquad F_{\alpha\beta;\gamma} = F_{\alpha\beta,\gamma} - \Gamma^\mu{}_{\alpha\gamma}F_{\mu\beta} - \Gamma^\mu{}_{\beta\gamma}F_{\alpha\mu}. \tag{22.35}$$
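A quick consistency check of Eq. (22.33): a uniform Cartesian vector field such as $\vec e_x$ should have vanishing gradient, even though its components in the polar coordinate basis, $A^\varpi = \cos\phi$ and $A^\phi = -\sin\phi/\varpi$, vary from point to point. The Python sketch below assumes the standard nonzero connection coefficients of that basis, $\Gamma^\varpi{}_{\phi\phi} = -\varpi$ and $\Gamma^\phi{}_{\varpi\phi} = \Gamma^\phi{}_{\phi\varpi} = 1/\varpi$ (the result sought in Ex. 22.6), and approximates the comma derivatives by central finite differences at an arbitrarily chosen point.

```python
import math

# Sketch: Eq. (22.33), A^mu_{;beta} = A^mu_{,beta} + A^alpha Gamma^mu_{alpha beta},
# in the polar coordinate basis (varpi, phi). The field is the uniform Cartesian
# vector e_x, whose gradient must vanish. Assumed connection coefficients:
#   Gamma^varpi_{phi phi} = -varpi,  Gamma^phi_{varpi phi} = Gamma^phi_{phi varpi} = 1/varpi.

h = 1e-6


def Gamma(x):
    """G[mu][alpha][beta] = Gamma^mu_{alpha beta} at coordinates x."""
    varpi = x[0]
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -varpi
    G[1][0][1] = G[1][1][0] = 1.0 / varpi
    return G


def A(x):
    """Polar coordinate-basis components of e_x: (cos phi, -sin phi / varpi)."""
    varpi, phi = x
    return [math.cos(phi), -math.sin(phi) / varpi]


def grad_A(x):
    """Matrix of A^mu_{;beta} at x, with A^mu_{,beta} by central differences."""
    G, Ax = Gamma(x), A(x)
    out = [[0.0] * 2 for _ in range(2)]
    for beta in range(2):
        xp, xm = list(x), list(x)
        xp[beta] += h
        xm[beta] -= h
        comma = [(p - m) / (2 * h) for p, m in zip(A(xp), A(xm))]
        for mu in range(2):
            out[mu][beta] = comma[mu] + sum(Ax[a] * G[mu][a][beta] for a in range(2))
    return out


result = grad_A((1.7, 2.2))  # every entry should vanish up to rounding error
```

Each comma term is individually nonzero here; it is the connection-coefficient correction that cancels it, which is exactly the "artificial change of components" described above.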

Notice that each index of F must be corrected, the correction has a sign dictated by whether the index is up or down, the differentiation index always goes last on the Γ, and all other indices can be deduced by requiring that the free indices in each term be the same and all other indices be summed. If we have been given a basis, then how can we compute the connection coefficients? We can try to do so by drawing pictures and examining how the basis vectors change from point to point—a method that is fruitful in spherical and cylindrical coordinates in Euclidean space (Sec. 10.5). However, in other situations this method is fraught with peril, so we need a firm mathematical prescription. It turns out that the following prescription works; see below for a proof:

(i) Evaluate the commutation coefficients $c_{\alpha\beta}{}^\rho$ of the basis, which are defined by the two equivalent relations

$$[\vec e_\alpha, \vec e_\beta] \equiv c_{\alpha\beta}{}^\rho\vec e_\rho, \qquad c_{\alpha\beta}{}^\rho \equiv \vec e^\rho\cdot[\vec e_\alpha, \vec e_\beta]. \tag{22.36}$$

[Note that in a coordinate basis the commutation coefficients will vanish. Warning: commutation coefficients also appear in the theory of Lie groups; there it is conventional to use a different ordering of indices than here, $c_{\alpha\beta}{}^\rho\,(\text{here}) = c^\rho{}_{\alpha\beta}\,(\text{Lie groups})$.]

(ii) Lower the last index on the commutation coefficients using the metric components in the basis:

$$c_{\alpha\beta\gamma} \equiv c_{\alpha\beta}{}^\rho g_{\rho\gamma}. \tag{22.37}$$

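(iii) Compute the covariant Christoffel symbols

$$\Gamma_{\alpha\beta\gamma} \equiv \frac{1}{2}\left(g_{\alpha\beta,\gamma} + g_{\alpha\gamma,\beta} - g_{\beta\gamma,\alpha} + c_{\alpha\beta\gamma} + c_{\alpha\gamma\beta} - c_{\beta\gamma\alpha}\right). \tag{22.38}$$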

Here the commas denote differentiation with respect to the basis vectors, as though the metric components were scalar fields [Eq. (22.32)]. Notice that the pattern of indices is the same on the g's and on the c's. It is a peculiar pattern, one of the few aspects of index gymnastics that cannot be reconstructed by merely lining up indices. In a coordinate basis the c's will vanish and $\Gamma_{\alpha\beta\gamma}$ will be symmetric in its last two indices; in an orthonormal basis the $g_{\mu\nu}$ are constant, so the g terms will vanish and $\Gamma_{\alpha\beta\gamma}$ will be antisymmetric in its first two indices; and in a Cartesian or Lorentz coordinate basis, which is both coordinate and orthonormal, both the c's and the g's will vanish, so $\Gamma_{\alpha\beta\gamma}$ will vanish.

(iv) Raise the first index on the covariant Christoffel symbols to obtain the connection coefficients, which are also sometimes called the mixed Christoffel symbols:

$$\Gamma^\mu{}_{\beta\gamma} = g^{\mu\alpha}\Gamma_{\alpha\beta\gamma}. \tag{22.39}$$
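Steps (i) through (iv) translate directly into a short numerical routine. The sketch below is written in Python with hypothetical function names (`connection`, `inverse`); it evaluates the prescription at a single point, using central finite differences for the comma derivatives of the metric, and applies it to the coordinate and orthonormal bases of circular polar coordinates. For the orthonormal basis it assumes the commutation coefficients $c_{\hat\varpi\hat\phi}{}^{\hat\phi} = -c_{\hat\phi\hat\varpi}{}^{\hat\phi} = -1/\varpi$, which follow from $[\vec e_{\hat\varpi}, \vec e_{\hat\phi}] = -(1/\varpi)\vec e_{\hat\phi}$. This is only an illustration; realistic computations are done with symbolic software, as noted in the next paragraph.

```python
# Sketch of the prescription (22.36)-(22.39) at a single point. Names and the
# finite-difference derivative are illustrative assumptions.


def inverse(m):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def connection(metric, comm, basis, x, h=1e-6):
    """Gamma^mu_{beta gamma} at coordinates x, returned as G[mu][beta][gamma].

    metric(x): g_{ab} in the chosen basis
    comm(x):   c_{ab}^r of Eq. (22.36), last index up
    basis(x):  E[c][i], coordinate components of e_c, so that the comma
               derivative is g_{ab,c} = E[c][i] dg_{ab}/dx^i, cf. Eq. (22.32)
    """
    n = len(x)
    g, c, E = metric(x), comm(x), basis(x)
    dgdx = []
    for i in range(n):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        gp, gm = metric(xp), metric(xm)
        dgdx.append([[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)]
                     for a in range(n)])
    dg = [[[sum(E[k][i] * dgdx[i][a][b] for i in range(n)) for k in range(n)]
           for b in range(n)] for a in range(n)]                 # g_{ab,k}
    cl = [[[sum(c[a][b][r] * g[r][k] for r in range(n)) for k in range(n)]
           for b in range(n)] for a in range(n)]                 # (ii), Eq. (22.37)
    G_low = [[[0.5 * (dg[a][b][k] + dg[a][k][b] - dg[b][k][a]
                      + cl[a][b][k] + cl[a][k][b] - cl[b][k][a])
               for k in range(n)] for b in range(n)] for a in range(n)]  # (iii)
    ginv = inverse(g)
    return [[[sum(ginv[m][a] * G_low[a][b][k] for a in range(n)) for k in range(n)]
             for b in range(n)] for m in range(n)]               # (iv), Eq. (22.39)


def no_commutators(x):
    n = len(x)
    return [[[0.0] * n for _ in range(n)] for _ in range(n)]


def polar_coordinate_metric(x):
    return [[1.0, 0.0], [0.0, x[0] ** 2]]      # g_{varpi varpi}=1, g_{phi phi}=varpi^2


def coordinate_basis(x):
    return [[1.0, 0.0], [0.0, 1.0]]            # e_varpi = d/dvarpi, e_phi = d/dphi


def orthonormal_metric(x):
    return [[1.0, 0.0], [0.0, 1.0]]


def orthonormal_basis(x):
    return [[1.0, 0.0], [0.0, 1.0 / x[0]]]     # e_phihat = (1/varpi) d/dphi


def orthonormal_commutators(x):
    c = no_commutators(x)
    c[0][1][1] = -1.0 / x[0]                   # c_{varpihat phihat}^{phihat}
    c[1][0][1] = 1.0 / x[0]                    # c_{phihat varpihat}^{phihat}
    return c


point = (2.0, 0.3)
coord = connection(polar_coordinate_metric, no_commutators, coordinate_basis, point)
ortho = connection(orthonormal_metric, orthonormal_commutators, orthonormal_basis, point)
```

At ϖ = 2 the coordinate-basis result should reproduce $\Gamma^\varpi{}_{\phi\phi} = -\varpi$ and $\Gamma^\phi{}_{\varpi\phi} = \Gamma^\phi{}_{\phi\varpi} = 1/\varpi$, and the orthonormal-basis result $\Gamma^{\hat\varpi}{}_{\hat\phi\hat\phi} = -1/\varpi$, $\Gamma^{\hat\phi}{}_{\hat\varpi\hat\phi} = 1/\varpi$, with $\Gamma^{\hat\phi}{}_{\hat\phi\hat\varpi} = 0$.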

" is an example of a geometric object that is not a tensor. The The gradient operator ∇ " and because ∇ " is not a connection coefficients can be regarded as the components of ∇; α tensor, these components Γ βγ do not obey the tensorial transformation law (22.19) when switching from one basis to another. Their transformation law is far more complicated and is very rarely used. Normally one computes them from scratch in the new basis, using the above prescription or some other, equivalent prescription (cf. Chap. 14 of MTW). For most curved spacetimes that one meets in general relativity, these computations are long and tedious and therefore are normally carried out on computers using symbolic manipulations software such as Macsyma, or GRTensor (running under Maple or Mathematica), or Mathtensor (under Mathematica). Such software is easily found on the Internet using a search engine. The above prescription for computing the connection coefficients follows from two key " First, The gradient of the metric tensor vanishes, properties of the gradient ∇: " =0. ∇g

(22.40)

This can be seen by introducing a locally orthonormal coordinate basis at the arbitrary point P where the gradient is to be evaluated. In such a basis, the effects of curvature show up only at quadratic order in distance away from P, which means that the coordinate basis vectors $\vec e_\alpha \equiv \partial/\partial x^\alpha$ behave, at first order in distance, just like those of an orthonormal coordinate system in flat space. Since $\nabla_\beta\vec e_\alpha$ involves only first derivatives, and it vanishes in an orthonormal coordinate system in flat space, it must also vanish here; this means that the connection coefficients vanish at P in this basis. Therefore, the components of $\vec\nabla g$ at P are $g_{\alpha\beta;\gamma} = g_{\alpha\beta,\gamma} = \partial g_{\alpha\beta}/\partial x^\gamma$, which vanish since the components of g in this basis are all 0 or ±1 plus corrections second order in distance from P. This vanishing of the components of $\vec\nabla g$ in our special basis guarantees that $\vec\nabla g$ itself vanishes at P; and since P was an arbitrary point, $\vec\nabla g$ must vanish everywhere and always.

Second, for any two vector fields $\vec A$ and $\vec B$, the gradient is related to the commutator by

$$\nabla_{\vec A}\vec B - \nabla_{\vec B}\vec A = [\vec A, \vec B]. \tag{22.41}$$
This relation, like $\vec\nabla g = 0$, is most easily derived by introducing a locally orthonormal coordinate basis at the point P where one wishes to check its validity. Since $\Gamma^\mu{}_{\alpha\beta} = 0$ at P in that basis, the components of $\nabla_{\vec A}\vec B - \nabla_{\vec B}\vec A$ are $B^\alpha{}_{;\beta}A^\beta - A^\alpha{}_{;\beta}B^\beta = B^\alpha{}_{,\beta}A^\beta - A^\alpha{}_{,\beta}B^\beta$ [cf. Eq. (22.33)]. But these components are identical to those of the commutator $[\vec A, \vec B]$ [Eq. (22.26)]. Since the components of these two vectors [the left and right sides of (22.41)] are identical at P in this special basis, the vectors must be identical, and since the point P was arbitrary, they must always be identical.

Turn, now, to the derivation of our prescription for computing the connection coefficients in an arbitrary basis. By virtue of the relation $\Gamma^\mu{}_{\beta\gamma} = g^{\mu\alpha}\Gamma_{\alpha\beta\gamma}$ [Eq. (22.39)] and its inverse

$$\Gamma_{\alpha\beta\gamma} = g_{\alpha\mu}\Gamma^\mu{}_{\beta\gamma}, \tag{22.42}$$

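a knowledge of $\Gamma_{\alpha\beta\gamma}$ is equivalent to a knowledge of $\Gamma^\mu{}_{\beta\gamma}$. Thus, our task reduces to deriving expression (22.38) for $\Gamma_{\alpha\beta\gamma}$, in which the $c_{\alpha\beta\gamma}$ are defined by Eqs. (22.36) and (22.37). As a first step in the derivation, notice that the constancy of the metric tensor, $\vec\nabla g = 0$, when expressed in component notation using Eq. (22.35) and combined with Eq. (22.42), becomes $0 = g_{\alpha\beta;\gamma} = g_{\alpha\beta,\gamma} - \Gamma_{\beta\alpha\gamma} - \Gamma_{\alpha\beta\gamma}$; i.e.,

$$\Gamma_{\alpha\beta\gamma} + \Gamma_{\beta\alpha\gamma} = g_{\alpha\beta,\gamma}. \tag{22.43}$$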

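This determines the part of $\Gamma_{\alpha\beta\gamma}$ that is symmetric in the first two indices. The commutator of the basis vectors determines the part antisymmetric in the last two indices: from

$$c_{\alpha\beta}{}^\mu\vec e_\mu = [\vec e_\alpha, \vec e_\beta] = \nabla_\alpha\vec e_\beta - \nabla_\beta\vec e_\alpha = (\Gamma^\mu{}_{\beta\alpha} - \Gamma^\mu{}_{\alpha\beta})\vec e_\mu \tag{22.44}$$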

(where the first equality is the definition (22.36) of the commutation coefficients, the second is expression (22.41) for the commutator in terms of the gradient, and the third follows from the definition (22.28) of the connection coefficients), we infer, by equating the components and lowering the μ index, that

$$\Gamma_{\gamma\beta\alpha} - \Gamma_{\gamma\alpha\beta} = c_{\alpha\beta\gamma}. \tag{22.45}$$

By combining Eqs. (22.43) and (22.45) and performing some rather tricky algebra (cf. Ex. 8.15 of MTW), we obtain the computational rule (22.38).
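The "tricky algebra" can at least be spot-checked numerically: for any $g_{\alpha\beta,\gamma}$ symmetric in its first two indices and any $c_{\alpha\beta\gamma}$ antisymmetric in its first two, the rule (22.38) should satisfy both Eq. (22.43) and Eq. (22.45). The Python sketch below fills these arrays with seeded random numbers in n = 4 dimensions (purely as test data, not values for any particular basis) and reports the largest violation of each condition. It checks the algebra; it is not a derivation.

```python
import random

# Sketch: spot check that rule (22.38) reproduces Eqs. (22.43) and (22.45).
# The random arrays are arbitrary test data with the required symmetries.

n = 4
rng = random.Random(1)

dg = [[[0.0] * n for _ in range(n)] for _ in range(n)]  # g_{ab,k}, symmetric in a, b
c = [[[0.0] * n for _ in range(n)] for _ in range(n)]   # c_{abk}, antisymmetric in a, b
for a in range(n):
    for b in range(a, n):
        for k in range(n):
            v = rng.uniform(-1.0, 1.0)
            dg[a][b][k] = dg[b][a][k] = v
            w = rng.uniform(-1.0, 1.0) if a != b else 0.0
            c[a][b][k], c[b][a][k] = w, -w


def Gamma_low(a, b, k):
    """Covariant Christoffel symbol Gamma_{abk} from Eq. (22.38)."""
    return 0.5 * (dg[a][b][k] + dg[a][k][b] - dg[b][k][a]
                  + c[a][b][k] + c[a][k][b] - c[b][k][a])


idx = [(a, b, k) for a in range(n) for b in range(n) for k in range(n)]

# Eq. (22.43): Gamma_{abk} + Gamma_{bak} = g_{ab,k}
err_sym = max(abs(Gamma_low(a, b, k) + Gamma_low(b, a, k) - dg[a][b][k])
              for a, b, k in idx)

# Eq. (22.45): Gamma_{kba} - Gamma_{kab} = c_{abk}
err_comm = max(abs(Gamma_low(k, b, a) - Gamma_low(k, a, b) - c[a][b][k])
               for a, b, k in idx)
```

Both errors should be at the level of floating-point rounding, for any seed.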


22.3.4

Integration

Our desire to use general bases and work in curved space gives rise to two new issues in the definition of integrals. First, the volume elements used in integration involve the Levi-Civita tensor [Eqs. (1.59b), (1.74), (1.77)], so we need to know the components of the Levi-Civita tensor in a general basis. It turns out [see, e.g., Ex. 8.3 of MTW] that the covariant components differ from those in an orthonormal basis by a factor $\sqrt{|g|}$ and the contravariant components by $1/\sqrt{|g|}$, where

$$g \equiv \det||g_{\alpha\beta}|| \tag{22.46}$$

is the determinant of the matrix whose entries are the covariant components of the metric. More specifically, let us denote by $[\alpha\beta\ldots\nu]$ the value of $\epsilon_{\alpha\beta\ldots\nu}$ in an orthonormal basis of our n-dimensional space [Eq. (1.59b)]:

$$[12\ldots n] = +1, \qquad [\alpha\beta\ldots\nu] = \begin{cases} +1 & \text{if } \alpha,\beta,\ldots,\nu \text{ is an even permutation of } 1,2,\ldots,n,\\ -1 & \text{if } \alpha,\beta,\ldots,\nu \text{ is an odd permutation of } 1,2,\ldots,n,\\ 0 & \text{if } \alpha,\beta,\ldots,\nu \text{ are not all different.} \end{cases} \tag{22.47}$$
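The permutation symbol (22.47) can be computed by counting inversions of the index list, and the same symbol gives the determinant (22.46) through the Leibniz expansion. The Python sketch below uses 0-based indices (a programming convention, not the text's 1, ..., n) and, anticipating the spherical-coordinate example below, forms $\sqrt{|g|}\,[r\theta\phi]$ from Eq. (22.48) at an arbitrarily chosen point; the result should equal $r^2\sin\theta$.

```python
import math
from itertools import permutations

# Sketch: the permutation symbol of Eq. (22.47) and the covariant Levi-Civita
# components sqrt|g| [alpha beta ... nu] of Eq. (22.48) for spherical polar
# coordinates (r, theta, phi). Indices run 0..n-1; the sample point is assumed.


def perm_symbol(indices):
    """+1 for an even permutation of 0..n-1, -1 for odd, 0 if any index repeats."""
    if len(set(indices)) != len(indices):
        return 0
    inversions = sum(1 for i in range(len(indices))
                     for j in range(i + 1, len(indices))
                     if indices[i] > indices[j])
    return -1 if inversions % 2 else 1


def det(m):
    """Determinant by the Leibniz expansion, using the permutation symbol."""
    n = len(m)
    return sum(perm_symbol(p) * math.prod(m[i][p[i]] for i in range(n))
               for p in permutations(range(n)))


r, theta = 2.0, 0.6
g = [[1.0, 0.0, 0.0],
     [0.0, r ** 2, 0.0],
     [0.0, 0.0, (r * math.sin(theta)) ** 2]]   # g_rr, g_thetatheta, g_phiphi

eps_r_theta_phi = math.sqrt(abs(det(g))) * perm_symbol((0, 1, 2))
eps_theta_r_phi = math.sqrt(abs(det(g))) * perm_symbol((1, 0, 2))
```

Swapping two slots flips the sign, and a repeated index gives zero, as Eq. (22.47) requires.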

(In spacetime the indices must run from 0 to 3 rather than 1 to n = 4.) Then in a general right-handed basis the components of the Levi-Civita tensor are

$$\epsilon_{\alpha\beta\ldots\nu} = \sqrt{|g|}\,[\alpha\beta\ldots\nu], \qquad \epsilon^{\alpha\beta\ldots\nu} = \pm\frac{1}{\sqrt{|g|}}\,[\alpha\beta\ldots\nu], \tag{22.48}$$

where the ± is plus in Euclidean space and minus in spacetime. In a left-handed basis the sign is reversed. As an example of these formulas, consider a spherical polar coordinate system (r, θ, φ) in three-dimensional Euclidean space, and use the three infinitesimal vectors $dx^j(\partial/\partial x^j)$ to construct the volume element dΣ [cf. Eq. (1.70b)]:

$$d\Sigma = \epsilon\left(dr\frac{\partial}{\partial r}, d\theta\frac{\partial}{\partial\theta}, d\phi\frac{\partial}{\partial\phi}\right) = \epsilon_{r\theta\phi}\,dr\,d\theta\,d\phi = \sqrt{g}\,dr\,d\theta\,d\phi = r^2\sin\theta\,dr\,d\theta\,d\phi. \tag{22.49}$$

Here the second equality follows from linearity of ε and the formula for computing its components by inserting basis vectors into its slots; the third equality follows from our formula (22.48) for the components; and the fourth equality entails the determinant of the metric coefficients, which in spherical coordinates are $g_{rr} = 1$, $g_{\theta\theta} = r^2$, $g_{\phi\phi} = r^2\sin^2\theta$, with all other $g_{jk}$ vanishing, so $g = r^4\sin^2\theta$. The resulting volume element $r^2\sin\theta\,dr\,d\theta\,d\phi$ should be familiar and obvious.

The second new integration issue that we must face is the fact that integrals such as

$$\int_{\partial V}T^{\alpha\beta}d\Sigma_\beta \tag{22.50}$$
[cf. Eqs. (1.77), (1.78)] involve constructing a vector $T^{\alpha\beta}d\Sigma_\beta$ in each infinitesimal region $d\Sigma_\beta$ of the surface of integration, and then adding up the contributions from all the infinitesimal regions. A major difficulty arises from the fact that each contribution lives in a different tangent space. To add them together, we must first transport them all to the same tangent space at some single location in the manifold. How is that transport to be performed? The obvious answer is "by the same parallel transport technique that we used in defining the gradient." However, when defining the gradient we only needed to perform the parallel transport over an infinitesimal distance, and now we must perform it over long distances. As we shall see in Chap. 23, when the manifold is curved, long-distance parallel transport gives a result that depends on the route of the transport, and in general there is no way to identify any preferred route. As a result, integrals such as (22.50) are ill-defined in a curved manifold. The only integrals that are well defined in a curved manifold are those such as $\int_{\partial V}S^\alpha d\Sigma_\alpha$ whose infinitesimal contributions $S^\alpha d\Sigma_\alpha$ are scalars, i.e. integrals whose value is a scalar. This fact will have profound consequences in curved spacetime for the laws of energy, momentum, and angular momentum conservation.

****************************

EXERCISES

Exercise 22.3 Problem: Practice with Frame-Independent Tensors

Let A, B be second-rank tensors.
(a) Show that A + B is also a second-rank tensor.
(b) Show that A ⊗ B is a fourth-rank tensor.
(c) Show that the contraction of A ⊗ B on its first and fourth slots is a second-rank tensor. (If necessary, consult Chap. 1 for a discussion of contraction.)
(d) Write the following quantities in slot-naming index notation: the tensor A ⊗ B; the simultaneous contraction of this tensor on its first and fourth slots and on its second and third slots.
Exercise 22.4 Derivation: Index Manipulation Rules from Duality

For an arbitrary basis $\{\vec e_\alpha\}$ and its dual basis $\{\vec e^\mu\}$, use (i) the duality relation (22.8), (ii) the definition (22.9) of components of a tensor, and (iii) the relation $\vec A\cdot\vec B = g(\vec A, \vec B)$ between the metric and the inner product to deduce the following results:
(a) The relations

$$\vec e^\mu = g^{\mu\alpha}\vec e_\alpha, \qquad \vec e_\alpha = g_{\alpha\mu}\vec e^\mu. \tag{22.51}$$

(b) The fact that indices on the components of tensors can be raised and lowered using the components of the metric, e.g. F µν = g µα Fα ν ,

pα = gαβ pβ .

(22.52)

(c) The fact that a tensor can be reconstructed from its components in the manner of Eq. (22.11).

Exercise 22.5 Practice: Transformation Matrices for Circular Polar Bases
Consider the circular coordinate system $\{\varpi,\phi\}$ and its coordinate bases and orthonormal bases as discussed in Fig. 22.3 and the associated text. These coordinates are related to Cartesian coordinates $\{x,y\}$ by the usual relations $x = \varpi\cos\phi$, $y = \varpi\sin\phi$. (a) Evaluate the components ($L^x{}_\varpi$, etc.) of the transformation matrix that links the two coordinate bases $\{\vec e_x,\vec e_y\}$ and $\{\vec e_\varpi,\vec e_\phi\}$. Also evaluate the components ($L^\varpi{}_x$, etc.) of the inverse transformation matrix. (b) Evaluate, similarly, the components of the transformation matrix and its inverse linking the bases $\{\vec e_x,\vec e_y\}$ and $\{\vec e_{\hat\varpi},\vec e_{\hat\phi}\}$. (c) Consider the vector $\vec A\equiv\vec e_x + 2\vec e_y$. What are its components in the other two bases?

Exercise 22.6 Practice: Commutation and Connection Coefficients for Circular Polar Bases
As in the previous exercise, consider the circular coordinates $\{\varpi,\phi\}$ of Fig. 22.3 and their associated bases. (a) Evaluate the commutation coefficients $c_{\alpha\beta}{}^\rho$ for the coordinate basis $\{\vec e_\varpi,\vec e_\phi\}$, and also for the orthonormal basis $\{\vec e_{\hat\varpi},\vec e_{\hat\phi}\}$. (b) Compute by hand the connection coefficients for the coordinate basis and also for the orthonormal basis, using Eqs. (22.36)–(22.39). [Note: the answer for the orthonormal basis was worked out by a different method in our study of elasticity theory; Eq. (10.40).] (c) Repeat this computation using symbolic manipulation software on a computer.

Exercise 22.7 Practice: Connection Coefficients for Spherical Polar Coordinates
(a) Consider spherical polar coordinates in 3-dimensional space and verify that the nonzero connection coefficients, assuming an orthonormal basis, are given by Eq. (10.41). (b) Repeat the exercise assuming a coordinate basis with
$$\vec e_r\equiv\frac{\partial}{\partial r}\,,\qquad \vec e_\theta\equiv\frac{\partial}{\partial\theta}\,,\qquad \vec e_\phi\equiv\frac{\partial}{\partial\phi}\,. \qquad (22.53)$$
(c) Repeat both computations using symbolic manipulation software on a computer.

Exercise 22.8 Practice: Index Gymnastics — Geometric Optics
In the geometric optics approximation (Chap. 6), for electromagnetic waves in Lorenz gauge, one can write the 4-vector potential in the form $\vec A = \vec{\mathcal A}\,e^{i\varphi}$, where $\vec{\mathcal A}$ is a slowly varying amplitude and $\varphi$ is a rapidly varying phase. By the techniques of Chap. 6, one can deduce that the wave vector, defined by $\vec k\equiv\vec\nabla\varphi$, is null: $\vec k\cdot\vec k = 0$.

(a) Rewrite all of the equations in the above paragraph in slot-naming index notation. (b) Using index manipulations, show that the wave vector $\vec k$ (which is a vector field because the wave's phase $\varphi$ is a scalar field) satisfies the geodesic equation, $\nabla_{\vec k}\vec k = 0$. The geodesics, to which $\vec k$ is the tangent vector, are the rays discussed in Chap. 6, along which the waves propagate.

Exercise 22.9 Practice: Index Gymnastics — Irreducible Tensorial Parts of the Gradient of a 4-Velocity Field
In our study of elasticity theory, we introduced the concept of the irreducible tensorial parts of a second-rank tensor in Euclidean space (Box 10.2). Consider a fluid flowing through spacetime, with a 4-velocity $\vec u(\mathcal P)$. The fluid's gradient $\vec\nabla\vec u$ ($u_{\alpha;\beta}$ in slot-naming index notation) is a second-rank tensor in spacetime. With the aid of the 4-velocity itself, we can break it down into irreducible tensorial parts as follows:
$$u_{\alpha;\beta} = -a_\alpha u_\beta + \tfrac13\theta P_{\alpha\beta} + \sigma_{\alpha\beta} + \omega_{\alpha\beta}\,. \qquad (22.54)$$
Here: (i) $P_{\alpha\beta}$ is defined by
$$P_{\alpha\beta}\equiv g_{\alpha\beta} + u_\alpha u_\beta\,, \qquad (22.55)$$
(ii) $\sigma_{\alpha\beta}$ is symmetric and trace-free and is orthogonal to the 4-velocity, and (iii) $\omega_{\alpha\beta}$ is antisymmetric and is orthogonal to the 4-velocity. (a) In quantum mechanics one deals with “projection operators” $\hat P$, which satisfy the equation $\hat P^2 = \hat P$. Show that $P_{\alpha\beta}$ is a projection tensor, in the sense that $P_{\alpha\beta}P^\beta{}_\gamma = P_{\alpha\gamma}$. (b) This suggests that $P_{\alpha\beta}$ may project vectors into some subspace of 4-dimensional spacetime. Indeed it does: Show that for any vector $A^\alpha$, $P_{\alpha\beta}A^\beta$ is orthogonal to $\vec u$; and if $A^\alpha$ is already perpendicular to $\vec u$, then $P_{\alpha\beta}A^\beta = A_\alpha$, i.e., the projection leaves the vector unchanged. Thus, $P_{\alpha\beta}$ projects vectors into the 3-space orthogonal to $\vec u$. (c) What are the components of $P_{\alpha\beta}$ in the fluid's local rest frame, i.e., in an orthonormal basis where $\vec u = \vec e_{\hat 0}$? (d) Show that the rate of change of $\vec u$ along itself, $\nabla_{\vec u}\vec u$ (i.e., the fluid 4-acceleration), is equal to the vector $\vec a$ that appears in the decomposition (22.54). Show, further, that $\vec a\cdot\vec u = 0$. (e) Show that the divergence of the 4-velocity, $\vec\nabla\cdot\vec u$, is equal to the scalar field $\theta$ that appears in the decomposition (22.54). (f) The quantities $\sigma_{\alpha\beta}$ and $\omega_{\alpha\beta}$ are the relativistic versions of the fluid's shear and rotation tensors. Derive equations for these tensors in terms of $u_{\alpha;\beta}$ and $P_{\mu\nu}$.

(g) Show that, as viewed in a Lorentz reference frame where the fluid is moving with speed small compared to the speed of light, to first order in the fluid's ordinary velocity $v^j = dx^j/dt$, the following are true: (i) $u^0 = 1$, $u^j = v^j$; (ii) $\theta$ is the nonrelativistic expansion of the fluid, $\theta = \nabla\cdot\mathbf v\equiv v^j{}_{,j}$ [Eq. (12.63)]; (iii) $\sigma_{jk}$ is the fluid's nonrelativistic shear [Eq. (12.63)]; (iv) $\omega_{jk}$ is the fluid's nonrelativistic rotation tensor [denoted $r_{jk}$ in Eq. (12.63)].

Exercise 22.10 Practice: Integration — Gauss's Theorem
In 3-dimensional Euclidean space the Maxwell equation $\nabla\cdot\mathbf E = \rho_e/\epsilon_0$ can be combined with Gauss's theorem to show that the electric flux through the surface $\partial V$ of a sphere is equal to the charge in the sphere's interior $V$ divided by $\epsilon_0$:
$$\int_{\partial V}\mathbf E\cdot d\mathbf\Sigma = \int_V(\rho_e/\epsilon_0)\,d\Sigma\,. \qquad (22.56)$$
Introduce spherical polar coordinates so the sphere's surface is at some radius $r = R$. Consider a surface element on the sphere's surface with vectorial legs $d\phi\,\partial/\partial\phi$ and $d\theta\,\partial/\partial\theta$. Evaluate the components $d\Sigma_j$ of the surface integration element $d\mathbf\Sigma = \boldsymbol\epsilon(\ldots, d\theta\,\partial/\partial\theta, d\phi\,\partial/\partial\phi)$. Similarly, evaluate $d\Sigma$ in terms of vectorial legs in the sphere's interior. Then use these results for $d\Sigma_j$ and $d\Sigma$ to convert Eq. (22.56) into an explicit form in terms of integrals over $r$, $\theta$, $\phi$. The final answer should be obvious, but the above steps in deriving it are informative.
****************************
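Exercises 22.6(c) and 22.7(c) suggest repeating the connection-coefficient calculations by computer. As a lighter-weight cross-check than full symbolic algebra, here is a minimal sketch (plain Python with central finite differences; all function names are hypothetical) that evaluates the coordinate-basis connection coefficients $\Gamma^i{}_{jk} = \frac12 g^{il}(g_{lj,k} + g_{lk,j} - g_{jk,l})$ at a chosen point from the metric components alone. It covers only coordinate bases, where the commutation coefficients vanish; the orthonormal-basis coefficients of Eqs. (10.40) and (10.41) are not handled here.

```python
import math


def metric_polar(x):
    """g_{ij} for circular polar coordinates (varpi, phi) in the coordinate basis."""
    varpi, _phi = x
    return [[1.0, 0.0], [0.0, varpi ** 2]]


def metric_spherical(x):
    """g_{ij} for spherical polar coordinates (r, theta, phi) in the coordinate basis."""
    r, theta, _phi = x
    return [[1.0, 0.0, 0.0],
            [0.0, r ** 2, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]


def invert(m):
    """Gauss-Jordan inverse of a small nonsingular matrix (with partial pivoting)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def christoffel(metric, x, h=1e-5):
    """Gamma[i][j][k] = Γ^i_{jk} = ½ g^{il}(g_{lj,k} + g_{lk,j} - g_{jk,l}), by central differences."""
    n = len(x)
    ginv = invert(metric(x))
    dg = []  # dg[k][i][j] = ∂g_{ij}/∂x^k
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2.0 * h) for j in range(n)] for i in range(n)])
    return [[[0.5 * sum(ginv[i][l] * (dg[k][l][j] + dg[j][l][k] - dg[l][j][k]) for l in range(n))
              for k in range(n)] for j in range(n)] for i in range(n)]


Gp = christoffel(metric_polar, [2.0, 0.3])
print("Gamma^varpi_{phi phi} =", Gp[0][1][1], "(analytic: -varpi = -2)")
print("Gamma^phi_{varpi phi} =", Gp[1][0][1], "(analytic: 1/varpi = 0.5)")
```

The same `christoffel` call with `metric_spherical` should reproduce the coordinate-basis results asked for in Ex. 22.7(b), e.g. $\Gamma^r{}_{\theta\theta} = -r$ and $\Gamma^\phi{}_{\theta\phi} = \cot\theta$, to roughly the finite-difference accuracy.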

22.4 The Stress-Energy Tensor Revisited

In Sec. 1.12 we defined the stress-energy tensor $\mathbf T$ of any matter or field as a symmetric, second-rank tensor that describes the flow of 4-momentum through spacetime. More specifically, the total 4-momentum $\vec P$ that flows through some small 3-volume $\vec\Sigma$, going from the negative side of $\vec\Sigma$ to its positive side, is
$$\mathbf T(\ldots,\vec\Sigma) = (\text{total 4-momentum } \vec P \text{ that flows through } \vec\Sigma)\,;\quad\text{i.e., } T^{\alpha\beta}\Sigma_\beta = P^\alpha \qquad (22.57)$$

[Eq. (1.92)]. Of course, this stress-energy tensor depends on the location $\mathcal P$ of the 3-volume in spacetime; i.e., it is a tensor field $\mathbf T(\mathcal P)$. From this geometric, frame-independent definition of the stress-energy tensor, we were able to read off the physical meaning of its components in any inertial reference frame [Eqs. (1.93)]: $T^{00}$ is the total energy density, including rest mass-energy; $T^{j0} = T^{0j}$ is the $j$-component of momentum density, or equivalently the $j$-component of energy flux; and $T^{jk}$ are the components of the stress tensor, or equivalently of the momentum flux.

We gained some insight into the stress-energy tensor in the context of kinetic theory in Secs. 2.4.2 and 2.5.3, and we briefly introduced the stress-energy tensor for a perfect fluid in Eq. (1.100b). Because perfect fluids will play a very important role in this book's applications of general relativity to relativistic stars (Chap. 24) and cosmology (Chap. 26), we shall now explore the perfect-fluid stress-energy tensor in some depth, and shall see how it is related to the Newtonian description of perfect fluids, which we studied in Part IV.

Recall [Eq. (1.100a)] that in the local rest frame of a perfect fluid, there is no energy flux or momentum density, $T^{j0} = T^{0j} = 0$, but there is a total energy density (including rest mass) $\rho$ and an isotropic pressure $P$:
$$T^{00} = \rho\,,\qquad T^{jk} = P\,\delta^{jk}\,. \qquad (22.58)$$

From this special form of $T^{\alpha\beta}$ in the local rest frame, one can derive Eq. (1.100b) for the stress-energy tensor in terms of the 4-velocity $\vec u$ of the local rest frame (i.e., of the fluid itself), the metric tensor of spacetime $\mathbf g$, and the rest-frame energy density $\rho$ and pressure $P$:
$$T^{\alpha\beta} = (\rho+P)u^\alpha u^\beta + P g^{\alpha\beta}\,;\quad\text{i.e., } \mathbf T = (\rho+P)\,\vec u\otimes\vec u + P\,\mathbf g\,; \qquad (22.59)$$
see Ex. 22.11, below. This expression for the stress-energy tensor of a perfect fluid is an example of a geometric, frame-independent description of physics.

It is instructive to evaluate the nonrelativistic limit of this perfect-fluid stress-energy tensor and verify that it has the form we used in our study of nonrelativistic, inviscid fluid mechanics (Table 12.1 on page 24 of Chap. 12, with vanishing gravitational potential $\Phi = 0$). In the nonrelativistic limit the fluid is nearly at rest in the chosen Lorentz reference frame. It moves with an ordinary velocity $\mathbf v = d\mathbf x/dt$ that is small compared to the speed of light, so the temporal part of its 4-velocity, $u^0 = 1/\sqrt{1-v^2}$, and its spatial part, $\mathbf u = u^0\mathbf v$, can be approximated as
$$u^0\simeq 1+\tfrac12 v^2\,,\qquad \mathbf u\simeq\left(1+\tfrac12 v^2\right)\mathbf v\,. \qquad (22.60)$$
In the fluid's rest frame, in special relativity, it has a rest mass density $\rho_o$ [defined in Eq. (1.84)], an internal energy per unit rest mass $u$ (not to be confused with the 4-velocity), and a total density of mass-energy
$$\rho = \rho_o(1+u)\,. \qquad (22.61)$$
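The orders of accuracy claimed for the nonrelativistic expansion developed in the next few paragraphs can be checked numerically. The sketch below (plain Python, $c = 1$ as in the text; function names are illustrative) evaluates the exact Lorentz-frame components of (22.59), with $\rho = \rho_o(1+u)$ from (22.61), and compares them with the truncated forms (22.64) obtained further on in this section, using $\rho_N = \rho_o u^0$ from (22.62). The scaling $v\sim\epsilon$, $u\sim P/\rho_o\sim\epsilon^2$ is an assumption chosen to mimic the ordering (22.63).

```python
import math


def perfect_fluid_T(rho_o, u_int, P, v):
    """Exact Lorentz-frame components T^{ab} of Eq. (22.59), c = 1.

    rho_o: rest-mass density; u_int: internal energy per unit rest mass;
    P: pressure; v: ordinary 3-velocity of the fluid.
    """
    v2 = sum(c * c for c in v)
    u0 = 1.0 / math.sqrt(1.0 - v2)
    u = [u0] + [u0 * c for c in v]      # 4-velocity components
    rho = rho_o * (1.0 + u_int)         # Eq. (22.61)
    eta = [-1.0, 1.0, 1.0, 1.0]         # diagonal of g^{ab} in Lorentz coordinates
    return [[(rho + P) * u[a] * u[b] + (P * eta[a] if a == b else 0.0)
             for b in range(4)] for a in range(4)]


def nonrelativistic_T(rho_o, u_int, P, v):
    """Truncated forms (22.64), with rho_N = rho_o u^0 as in Eq. (22.62)."""
    v2 = sum(c * c for c in v)
    rho_N = rho_o / math.sqrt(1.0 - v2)
    T00 = rho_N + 0.5 * rho_N * v2 + rho_N * u_int
    Tj0 = [rho_N * c for c in v]
    T0j = [rho_N * c + (0.5 * v2 + u_int + P / rho_N) * rho_N * c for c in v]
    Tjk = [[(P if j == k else 0.0) + rho_N * v[j] * v[k] for k in range(3)] for j in range(3)]
    return T00, T0j, Tj0, Tjk


def truncation_errors(eps):
    """Max |exact - truncated| for each group, with v ~ eps and u ~ P/rho_o ~ eps**2."""
    rho_o, u_int, P = 1.0, eps ** 2, eps ** 2
    v = [0.6 * eps, 0.8 * eps, 0.0]
    T = perfect_fluid_T(rho_o, u_int, P, v)
    T00, T0j, Tj0, Tjk = nonrelativistic_T(rho_o, u_int, P, v)
    return {
        "T00": abs(T[0][0] - T00),
        "T0j": max(abs(T[0][j + 1] - T0j[j]) for j in range(3)),
        "Tj0": max(abs(T[j + 1][0] - Tj0[j]) for j in range(3)),
        "Tjk": max(abs(T[j + 1][k + 1] - Tjk[j][k]) for j in range(3) for k in range(3)),
    }


for eps in (1e-1, 1e-2):
    print(eps, truncation_errors(eps))
```

With this scaling the residuals should fall roughly as $\epsilon^4$ for $T^{00}$ and $T^{jk}$, as $\epsilon^3$ for $T^{j0}$, and as $\epsilon^5$ for $T^{0j}$, consistent with each component being kept only to the order stated in the text.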

Now, in our chosen Lorentz frame the volume of each fluid element is Lorentz contracted by the factor $\sqrt{1-v^2}$, and therefore the rest mass density is increased from $\rho_o$ to $\rho_o/\sqrt{1-v^2} = \rho_o u^0$; correspondingly the rest-mass flux is $\rho_o u^0\mathbf v = \rho_o\mathbf u$ [Eq. (1.84)], and the law of rest-mass conservation is $\partial(\rho_o u^0)/\partial t + \partial(\rho_o u^j)/\partial x^j = 0$, i.e., $\vec\nabla\cdot(\rho_o\vec u) = 0$. When taking the Newtonian limit, we should identify the Newtonian mass density $\rho_N$ with the low-velocity limit of this rest mass density:
$$\rho_N = \rho_o u^0\simeq\rho_o\left(1+\tfrac12 v^2\right)\,. \qquad (22.62)$$
The nonrelativistic limit regards the specific internal energy $u$, the kinetic energy per unit mass $\frac12 v^2$, and the ratio of pressure to rest mass density $P/\rho_o$ as of the same order of smallness,
$$u\sim\tfrac12 v^2\sim\frac{P}{\rho_o}\ll 1\,, \qquad (22.63)$$

and it expresses the momentum density $T^{j0}$ accurate to first order in $v\equiv|\mathbf v|$, the momentum flux (stress) $T^{jk}$ accurate to second order in $v$, the energy density $T^{00}$ accurate to second order in $v$, and the energy flux $T^{0j}$ accurate to third order in $v$. To these accuracies, the perfect-fluid stress-energy tensor (22.59) takes the following form:
$$T^{00} = \rho_N + \tfrac12\rho_N v^2 + \rho_N u\,,\qquad T^{j0} = \rho_N v^j\,,$$
$$T^{0j} = \rho_N v^j + \left(\tfrac12 v^2 + u + \frac{P}{\rho_N}\right)\rho_N v^j\,,\qquad T^{jk} = P g^{jk} + \rho_N v^j v^k\,; \qquad (22.64)$$

see Ex. 22.11(c). These are precisely the same as the momentum density, momentum flux, energy density, and energy flux that we used in our study of nonrelativistic, inviscid fluid mechanics (Chap. 12), aside from the notational change from there to here $\rho\to\rho_N$, aside from including the rest mass-energy $\rho_N = \rho_N c^2$ in $T^{00}$ here but not there, and aside from including the rest-mass-energy flux $\rho_N v^j$ in $T^{0j}$ here but not there.

Just as the nonrelativistic equations of fluid mechanics (Euler equation and energy conservation) are derivable by combining the nonrelativistic $T^{\alpha\beta}$ of Eq. (22.64) with the nonrelativistic laws of momentum and energy conservation, so also the relativistic equations of fluid mechanics are derivable by combining the relativistic version (22.59) of $T^{\alpha\beta}$ with the equation of 4-momentum conservation $\vec\nabla\cdot\mathbf T = 0$. (We shall give such a derivation and shall examine the resulting fluid mechanics equations in the context of general relativity in Chap. 23.) This, together with the fact that the relativistic $T^{\alpha\beta}$ reduces to the nonrelativistic $T^{\alpha\beta}$ in the nonrelativistic limit, guarantees that the special relativistic equations of inviscid fluid mechanics will reduce to the nonrelativistic equations in the nonrelativistic limit.

A second important example of a stress-energy tensor is that for the electromagnetic field. We shall explore it in Ex. 22.13 below.

For a point particle which moves through spacetime along a world line $\mathcal P(\zeta)$ (where $\zeta$ is the affine parameter such that the particle's 4-momentum is $\vec p = d/d\zeta$), the stress-energy tensor will vanish everywhere except on the world line itself. Correspondingly, $\mathbf T$ must be expressed in terms of a Dirac delta function. The relevant delta function is a scalar function of two points in spacetime, $\delta(\mathcal Q,\mathcal P)$, with the property that when one integrates over the point $\mathcal P$, using the 4-dimensional volume element $d\Sigma$ (which in any inertial frame just reduces to $d\Sigma = dt\,dx\,dy\,dz$), one obtains
$$\int_V f(\mathcal P)\,\delta(\mathcal Q,\mathcal P)\,d\Sigma = f(\mathcal Q)\,. \qquad (22.65)$$

Here $f(\mathcal P)$ is an arbitrary scalar field and the region $V$ of 4-dimensional integration must include the point $\mathcal Q$. One can verify that in terms of Lorentz coordinates this delta function can be expressed as
$$\delta(\mathcal Q,\mathcal P) = \delta(t_{\mathcal Q}-t_{\mathcal P})\,\delta(x_{\mathcal Q}-x_{\mathcal P})\,\delta(y_{\mathcal Q}-y_{\mathcal P})\,\delta(z_{\mathcal Q}-z_{\mathcal P})\,, \qquad (22.66)$$
where the deltas on the right-hand side are ordinary one-dimensional Dirac delta functions. In terms of the spacetime delta function $\delta(\mathcal Q,\mathcal P)$ the stress-energy tensor of a point particle takes the form
$$\mathbf T(\mathcal Q) = \int_{-\infty}^{+\infty}\vec p(\zeta)\otimes\vec p(\zeta)\,\delta\big(\mathcal Q,\mathcal P(\zeta)\big)\,d\zeta\,, \qquad (22.67)$$
where the integral is along the world line $\mathcal P(\zeta)$ of the particle. It is a straightforward but sophisticated exercise [Ex. 22.14] to verify that the integral of this stress-energy tensor over any 3-surface $\mathcal S$ that slices through the particle's world line just once, at an event $\mathcal P(\zeta_o)$, is equal to the particle's 4-momentum at the intersection point:
$$\int_{\mathcal S}T^{\alpha\beta}(\mathcal Q)\,d\Sigma_\beta = p^\alpha(\zeta_o)\,. \qquad (22.68)$$
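Equation (22.68) can be illustrated numerically in the simplest case. The sketch below assumes a free particle in a Lorentz frame, with world line $x^\alpha(\zeta) = p^\alpha\zeta$, and takes for $\mathcal S$ a slice $t = $ const, oriented so that (22.68) reads $\int T^{\alpha 0}\,dx\,dy\,dz = p^\alpha$. After the three spatial delta functions of (22.66) are integrated away, what remains is $p^\alpha p^0\int\delta(t - p^0\zeta)\,d\zeta$; the code replaces that last delta function by a narrow Gaussian (an approximation introduced here, not part of the text) and does the $\zeta$ integral with the trapezoidal rule. Function names are illustrative.

```python
import math


def gaussian(s, width):
    """Narrow normalized Gaussian used as a stand-in for the 1-D Dirac delta."""
    return math.exp(-0.5 * (s / width) ** 2) / (width * math.sqrt(2.0 * math.pi))


def momentum_on_slice(p, t_slice=0.0, width=1e-3, n=4001, span=10.0):
    """Approximate ∫ T^{a0} dx dy dz on the slice t = t_slice for a free particle.

    World line: x^a(zeta) = p^a zeta, so dt/dzeta = p^0. The spatial deltas of (22.66)
    integrate to 1 over the slice; the remaining time delta is replaced by a Gaussian,
    and the zeta-integral of (22.67) is done with the trapezoidal rule.
    """
    p0 = p[0]
    zeta_c = t_slice / p0              # where the world line pierces the slice
    half = span * width / p0           # integrate over +/- span Gaussian widths
    dz = 2.0 * half / (n - 1)
    integral = 0.0
    for i in range(n):
        zeta = zeta_c - half + i * dz
        weight = 0.5 if i in (0, n - 1) else 1.0
        integral += weight * gaussian(t_slice - p0 * zeta, width) * dz
    # ∫ T^{a0} d^3x = p^a p^0 ∫ delta(t_slice - p^0 zeta) dzeta, and the integral ≈ 1/p^0
    return [pa * p0 * integral for pa in p]


m, v = 1.0, 0.6
gamma = 1.0 / math.sqrt(1.0 - v * v)
p = [m * gamma, m * gamma * v, 0.0, 0.0]
print(momentum_on_slice(p, t_slice=2.5))   # ≈ [1.25, 0.75, 0.0, 0.0]
```

The factor $p^0$ from $T^{\alpha 0}$ is cancelled by the Jacobian $1/p^0$ of the time delta function, which is why the result is independent of the particle's speed and of which slice is chosen.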

****************************
EXERCISES

Exercise 22.11 Derivation: Stress-Energy Tensor for a Perfect Fluid
(a) Derive the frame-independent expression (22.59) for the perfect fluid stress-energy tensor from its rest-frame components (22.58). (b) Read Ex. 1.29, and work part (b); i.e., show that for a perfect fluid the inertial mass per unit volume is isotropic and is equal to $(\rho+P)\delta^{ij}$ when thought of as a tensor, or simply $\rho+P$ when thought of as a scalar. (c) Show that in the nonrelativistic limit the components of the perfect fluid stress-energy tensor (22.59) take on the forms (22.64), and verify that these agree with the densities and fluxes of energy and momentum that are used in nonrelativistic fluid mechanics (e.g., Table 12.1 on page 24 of Chap. 12). (d) Show that it is the contribution of the pressure $P$ to the relativistic density of inertial mass that causes the term $(P/\rho_N)\rho_N\mathbf v = P\mathbf v$ to appear in the nonrelativistic energy flux.

Exercise 22.12 Problem: Electromagnetic Field Tensor
As we saw in Sec. 1.10, in 4-dimensional spacetime the electromagnetic field is described by a second-rank tensor $\mathbf F(\ldots,\ldots)$ which is antisymmetric on its two slots, $F^{\alpha\beta} = -F^{\beta\alpha}$; and the 4-force (rate of change of 4-momentum) that it exerts on a particle with rest mass $m$, charge $q$, proper time $\tau$, 4-velocity $\vec u = d/d\tau$, and 4-momentum $\vec p$ is
$$\frac{d\vec p}{d\tau} = \nabla_{\vec u}\vec p = q\,\mathbf F(\ldots,\vec u)\,;\quad\text{i.e., } \frac{dp^\alpha}{d\tau} = qF^{\alpha\beta}u_\beta\,. \qquad (22.69)$$
Here the second form of the equation, valid in a Lorentz frame, follows from the component form of $\nabla_{\vec u}\vec p$: $p^\alpha{}_{;\mu}u^\mu = p^\alpha{}_{,\mu}u^\mu = dp^\alpha/d\tau$. (a) By comparing this with the Lorentz force law for a low-velocity particle, $d\mathbf p/dt = q(\mathbf E + \mathbf v\times\mathbf B)$, show that the components of the electromagnetic field tensor in a Lorentz reference frame are
$$\|F^{\alpha\beta}\| = \begin{pmatrix} 0 & E^x & E^y & E^z\\ -E^x & 0 & B^z & -B^y\\ -E^y & -B^z & 0 & B^x\\ -E^z & B^y & -B^x & 0\end{pmatrix}\,; \qquad (22.70)$$

i.e.,
$$F^{0i} = -F^{i0} = E^i\,,\qquad F^{ij} = -F^{ji} = \epsilon^{ijk}B^k\,, \qquad (22.71)$$
where $E^j$ and $B^j$ are the components of the 3-vector electric and magnetic fields that reside in the 3-space of the Lorentz frame. (b) Define ${}^*\mathbf F\equiv$ (“dual” of $\mathbf F$) by
$${}^*F_{\mu\nu} = \tfrac12\epsilon_{\mu\nu\alpha\beta}F^{\alpha\beta}\,, \qquad (22.72)$$
where $\boldsymbol\epsilon$ is the Levi-Civita tensor (Sec. 22.3.4). What are the components of ${}^*\mathbf F$ in a Lorentz frame in terms of that frame's electric and magnetic fields [analog of Eq. (22.70)]? (c) Let $\vec u$ be the 4-velocity of some observer. Show that the 4-vectors $\mathbf F(\ldots,\vec u)\equiv\vec E_{\vec u}$ and $-{}^*\mathbf F(\ldots,\vec u)\equiv\vec B_{\vec u}$ lie in the 3-space of that observer's local rest frame (i.e., they are orthogonal to the observer's 4-velocity), and are equal to the electric and magnetic fields of that 3-space, i.e., the electric and magnetic fields measured by that observer. (d) There are only two independent scalars constructable from the electromagnetic field tensor: $F^{\mu\nu}F_{\mu\nu}$ and ${}^*F^{\mu\nu}F_{\mu\nu}$. Show that, when expressed in terms of the electric and magnetic fields measured by any observer (i.e., in any Lorentz reference frame), these take the form
$$F^{\mu\nu}F_{\mu\nu} = 2(\mathbf B^2 - \mathbf E^2)\,,\qquad {}^*F^{\mu\nu}F_{\mu\nu} = 4\,\mathbf B\cdot\mathbf E\,. \qquad (22.73)$$

Exercise 22.13 Problem: Electromagnetic Stress-energy Tensor
Expressed in geometric, frame-independent language, the Maxwell equations take the form
$$F^{\alpha\beta}{}_{;\beta} = 4\pi J^\alpha\,;\qquad {}^*F^{\alpha\beta}{}_{;\beta} = 0\quad\text{or equivalently}\quad F_{\alpha\beta;\gamma} + F_{\beta\gamma;\alpha} + F_{\gamma\alpha;\beta} = 0\,. \qquad (22.74)$$

Here $J^\alpha$ is the density-current 4-vector, whose components in a specific Lorentz frame have the physical meanings
$$J^0 = (\text{charge density})\,,\qquad J^i = (i\text{-component of current density})\,. \qquad (22.75)$$
The stress-energy tensor for the electromagnetic field has the form
$$T^{\mu\nu} = \frac{1}{4\pi}\left(F^{\mu\alpha}F^\nu{}_\alpha - \tfrac14 g^{\mu\nu}F_{\alpha\beta}F^{\alpha\beta}\right)\,. \qquad (22.76)$$

(a) Show that in any Lorentz reference frame the electromagnetic energy density $T^{00}$, energy flux $T^{0j}$, momentum density $T^{j0}$, and stress $T^{jk}$ have the following forms when expressed in terms of the electric and magnetic fields measured in that frame:
$$T^{00} = \frac{\mathbf E^2 + \mathbf B^2}{8\pi}\,,\qquad T^{i0} = T^{0i} = \frac{\epsilon^{ijk}E^jB^k}{4\pi}\,,\qquad T^{jk} = \frac{1}{8\pi}\left[(\mathbf E^2 + \mathbf B^2)g^{jk} - 2(E^jE^k + B^jB^k)\right]\,. \qquad (22.77)$$

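Show that, expressed in index-free notation, the energy flux has the standard Poynting-vector form $\mathbf E\times\mathbf B/4\pi$, and the stress tensor consists of a pressure $P_\perp = \mathbf E^2/8\pi$ orthogonal to $\mathbf E$, a pressure $P_\perp = \mathbf B^2/8\pi$ orthogonal to $\mathbf B$, a tension $-P_\parallel = \mathbf E^2/8\pi$ along $\mathbf E$, and a tension $-P_\parallel = \mathbf B^2/8\pi$ along $\mathbf B$. (b) Show that the divergence of the stress-energy tensor (22.76) is given by
$$T^{\mu\nu}{}_{;\nu} = \frac{1}{4\pi}\left(F^{\mu\alpha}{}_{;\nu}F^\nu{}_\alpha + F^{\mu\alpha}F^\nu{}_{\alpha;\nu} - \tfrac12 F_{\alpha\beta}{}^{;\mu}F^{\alpha\beta}\right)\,. \qquad (22.78)$$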

(c) Combine this with the Maxwell equations to show that
$$\vec\nabla\cdot\mathbf T = -\mathbf F(\ldots,\vec J)\,;\quad\text{i.e., } T^{\alpha\beta}{}_{;\beta} = -F^{\alpha\beta}J_\beta\,. \qquad (22.79)$$
(d) Show that in a Lorentz reference frame the time and space components of this equation reduce to
$$\frac{\partial T^{00}}{\partial t} + \frac{\partial T^{0j}}{\partial x^j} = -E^jJ_j\equiv-(\text{rate of Joule heating})\,, \qquad (22.80)$$
$$\left(\frac{\partial T^{k0}}{\partial t} + \frac{\partial T^{kj}}{\partial x^j}\right)\mathbf e_k = -(J^0\mathbf E + \mathbf J\times\mathbf B) = -\left(\text{Lorentz force per unit volume}\right)\,. \qquad (22.81)$$
Explain why these relations guarantee that, although the electromagnetic stress-energy tensor is not divergence-free, the total stress-energy tensor (electromagnetic plus that of the medium or fields that produce the charge-current 4-vector $\vec J$) is divergence-free; i.e., the total 4-momentum is conserved.

Exercise 22.14 Derivation: Stress-Energy Tensor for a Point Particle Derive Eq. (22.68). ****************************
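Several parts of Exercises 22.12 and 22.13 can be spot-checked numerically. The sketch below (plain Python, Gaussian units with $c = 1$ as in the text; function names are hypothetical) builds $F^{\alpha\beta}$ from (22.71), evaluates the invariant $F^{\mu\nu}F_{\mu\nu}$ of (22.73), the stress-energy tensor (22.76), and the 4-force (22.69) in a Lorentz frame. It checks particular numbers only and is not a substitute for the derivations the exercises ask for.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal Minkowski metric (eta_{mn} = eta^{mn} numerically)


def eps3(i, j, k):
    """Spatial Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def field_tensor(E, B):
    """F^{ab} from Eq. (22.71): F^{0i} = E^i, F^{ij} = eps^{ijk} B^k."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1], F[i + 1][0] = E[i], -E[i]
        for j in range(3):
            F[i + 1][j + 1] = sum(eps3(i, j, k) * B[k] for k in range(3))
    return F


def invariant(F):
    """F^{mn} F_{mn}, lowering both indices with eta."""
    return sum(F[m][n] ** 2 * ETA[m] * ETA[n] for m in range(4) for n in range(4))


def em_stress_energy(F):
    """T^{mn} = (1/4pi)(F^{ma} F^n_a - 1/4 eta^{mn} F_{ab} F^{ab}), Eq. (22.76)."""
    I = invariant(F)
    T = [[0.0] * 4 for _ in range(4)]
    for m in range(4):
        for n in range(4):
            s = sum(F[m][a] * F[n][a] * ETA[a] for a in range(4))  # F^{ma} F^{nb} eta_{ba}
            T[m][n] = (s - 0.25 * (ETA[m] if m == n else 0.0) * I) / (4.0 * math.pi)
    return T


def four_force(q, F, u):
    """dp^a/dtau = q F^{ab} u_b, Eq. (22.69), with u_b = eta_{bb} u^b."""
    return [q * sum(F[a][b] * ETA[b] * u[b] for b in range(4)) for a in range(4)]


E = [1.0, 2.0, 3.0]
B = [-0.5, 0.4, 2.0]
F = field_tensor(E, B)
print("F^{mn} F_{mn} =", invariant(F))
print("T^{00} =", em_stress_energy(F)[0][0])
```

For any $\mathbf E$, $\mathbf B$ one should find $F^{\mu\nu}F_{\mu\nu} = 2(\mathbf B^2 - \mathbf E^2)$, components of $T^{\mu\nu}$ matching (22.77), and $(dp^j/d\tau)/u^0 = q(\mathbf E + \mathbf v\times\mathbf B)^j$ for a particle with $\vec u = u^0(1,\mathbf v)$.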

22.5 The Proper Reference Frame of an Accelerated Observer [MTW pp. 163–176, 327–332]

Physics experiments and astronomical measurements almost always use apparatus that accelerates and rotates. For example, if the apparatus is in an earth-bound laboratory and is attached to the laboratory floor and walls, then it accelerates upward (relative to freely falling particles) with the negative of the “acceleration of gravity”, and it rotates (relative to inertial gyroscopes) because of the rotation of the earth. It is useful in studying such apparatus to regard it as attached to an accelerating, rotating reference frame. As preparation for studying such reference frames in the presence of gravity, we here shall study them in flat spacetime. Consider an observer who moves along an accelerated world line through flat spacetime (Fig. 22.6) so she has a nonzero 4-acceleration
$$\vec a = \nabla_{\vec U}\vec U\,. \qquad (22.82)$$

[Figure 22.6 (image not reproduced): spacetime diagram with axes $t$, $x$, $y$, showing the observer's world line, the basis vectors $\vec e_{\hat 0}$, $\vec e_{\hat x}$, $\vec e_{\hat y}$ at two events on it, and the slices $x^{\hat 0} = $ const through those events.]
Fig. 22.6: The proper reference frame of an accelerated observer. The spatial basis vectors $\vec e_{\hat x}$, $\vec e_{\hat y}$, and $\vec e_{\hat z}$ are orthogonal to the observer's world line and rotate, relative to local gyroscopes, as they move along the world line. The flat 3-planes spanned by these basis vectors are surfaces of constant coordinate time $x^{\hat 0}\equiv$ (proper time as measured by the observer's clock at the event where the 3-plane intersects the observer's world line); in other words, they are the observer's “3-space”. In each of these flat 3-planes the spatial coordinates $\hat x$, $\hat y$, $\hat z$ are Cartesian, with $\partial/\partial\hat x = \vec e_{\hat x}$, $\partial/\partial\hat y = \vec e_{\hat y}$, $\partial/\partial\hat z = \vec e_{\hat z}$.

Have that observer construct, in the vicinity of her world line, a coordinate system $\{x^{\hat\alpha}\}$ (called her proper reference frame) with these properties: (i) The spatial origin is centered on her world line at all times, i.e., her world line is given by $x^{\hat j} = 0$. (ii) Along her world line the time coordinate $x^{\hat 0}$ is the same as the proper time ticked by an ideal clock that she carries. (iii) In the immediate vicinity of her world line the spatial coordinates $x^{\hat j}$ measure physical distance along the axes of a little Cartesian latticework that she carries. These properties dictate that in the immediate vicinity of her world line the metric has the form $ds^2 = \eta_{\hat\alpha\hat\beta}dx^{\hat\alpha}dx^{\hat\beta}$; in other words, all along her world line the coordinate basis vectors are orthonormal:
$$g_{\hat\alpha\hat\beta} = \frac{\partial}{\partial x^{\hat\alpha}}\cdot\frac{\partial}{\partial x^{\hat\beta}} = \eta_{\hat\alpha\hat\beta}\quad\text{at } x^{\hat j} = 0\,. \qquad (22.83)$$
Properties (i) and (ii) dictate, moreover, that along the observer's world line the basis vector $\vec e_{\hat 0}\equiv\partial/\partial x^{\hat 0}$ differentiates with respect to her proper time, and thus is identically equal to her 4-velocity $\vec U$,
$$\vec e_{\hat 0} = \frac{\partial}{\partial x^{\hat 0}} = \vec U\,. \qquad (22.84)$$

There remains freedom as to how the observer's latticework is oriented, spatially: The observer can lock it to a gyroscopic inertial-guidance system, in which case we shall say that it is “nonrotating”, or she can rotate it relative to such gyroscopes. We shall assume that the latticework rotates. Its angular velocity as measured by the observer (by comparing the latticework's orientation with inertial-guidance gyroscopes) is a 3-dimensional, spatial vector $\boldsymbol\Omega$; and as viewed geometrically, it is a 4-vector $\vec\Omega$ whose components in the observer's reference frame are $\Omega^{\hat j}\neq 0$ and $\Omega^{\hat 0} = 0$. That is, $\vec\Omega$ is orthogonal to the observer's 4-velocity, $\vec\Omega\cdot\vec U = 0$, and so lies in the observer's 3-space. Similarly, the latticework's acceleration as measured by an accelerometer attached to it is a 3-dimensional spatial vector $\mathbf a$ which can be thought of as a 4-vector with components in the observer's frame $a^{\hat 0} = 0$, $a^{\hat j} = $ ($\hat j$-component of the measured $\mathbf a$). This 4-vector, in fact, is the observer's 4-acceleration, as one can verify by computing the 4-acceleration in an inertial frame in which the observer is momentarily at rest.

Geometrically the coordinates of the proper reference frame are constructed as follows: (i) Begin with the basis vectors $\vec e_{\hat\alpha}$ along the observer's world line (Fig. 22.6), basis vectors that satisfy equations (22.83) and (22.84), and that rotate with angular velocity $\vec\Omega$ relative to gyroscopes. (ii) Through the observer's world line at time $x^{\hat 0}$ construct the flat 3-plane spanned by the spatial basis vectors $\vec e_{\hat j}$. Because $\vec e_{\hat j}\cdot\vec e_{\hat 0} = 0$, this 3-plane is orthogonal to the world line. All events in this 3-plane are given the same value of coordinate time $x^{\hat 0}$ as the event where it intersects the world line; thus the 3-plane is a surface of constant coordinate time $x^{\hat 0}$. The spatial coordinates in this flat 3-plane are ordinary, Cartesian coordinates $x^{\hat j}$ with $\vec e_{\hat j} = \partial/\partial x^{\hat j}$.
It is instructive to examine the coordinate transformation between these proper-reference-frame coordinates $x^{\hat\alpha}$ and the coordinates $x^\mu$ of an inertial reference frame. We shall pick a very special inertial frame for this purpose: Choose an event on the observer's world line, near which the coordinate transformation is to be constructed; adjust the origin of her proper time so this event is $x^{\hat 0} = 0$ (and of course $x^{\hat j} = 0$); and choose the inertial frame to be one which, arbitrarily near this event, coincides with the observer's proper reference frame. Then, if we were doing Newtonian physics, the coordinate transformation from the proper reference frame to the inertial frame would have the form (accurate through terms quadratic in $x^{\hat\alpha}$)
$$x^i = x^{\hat i} + \tfrac12 a^{\hat i}(x^{\hat 0})^2 + \epsilon^{\hat i}{}_{\hat j\hat k}\Omega^{\hat j}x^{\hat k}x^{\hat 0}\,,\qquad x^0 = x^{\hat 0}\,. \qquad (22.85)$$
Here the term $\frac12 a^{\hat j}(x^{\hat 0})^2$ is the standard expression for the vectorial displacement produced, after time $x^{\hat 0}$, by the acceleration $a^{\hat j}$; and the term $\epsilon^{\hat i}{}_{\hat j\hat k}\Omega^{\hat j}x^{\hat k}x^{\hat 0}$ is the standard expression for the displacement produced by the rotation $\Omega^{\hat j}$ during a short time $x^{\hat 0}$. In relativity theory there is only one departure from these familiar expressions (up through quadratic order): after time $x^{\hat 0}$ the acceleration has produced a velocity $v^{\hat j} = a^{\hat j}x^{\hat 0}$ of the proper reference frame relative to the inertial frame; and correspondingly there is a Lorentz-boost correction to the transformation of time: $x^0 = x^{\hat 0} + v^{\hat j}x^{\hat j} = x^{\hat 0}(1 + a_{\hat j}x^{\hat j})$ [cf. Eq. (1.49c)], accurate only to quadratic order. Thus, the full transformation to quadratic order is
$$x^i = x^{\hat i} + \tfrac12 a^{\hat i}(x^{\hat 0})^2 + \epsilon^{\hat i}{}_{\hat j\hat k}\Omega^{\hat j}x^{\hat k}x^{\hat 0}\,,\qquad x^0 = x^{\hat 0}(1 + a_{\hat j}x^{\hat j})\,. \qquad (22.86)$$

From this transformation and the form of the metric, $ds^2 = -(dx^0)^2 + \delta_{ij}dx^i dx^j$ in the inertial frame, we easily can evaluate the form of the metric, accurate to linear order in $\mathbf x$, in the proper reference frame:
$$ds^2 = -(1 + 2\mathbf a\cdot\mathbf x)(dx^{\hat 0})^2 + 2(\boldsymbol\Omega\times\mathbf x)\cdot d\mathbf x\,dx^{\hat 0} + \delta_{jk}dx^{\hat j}dx^{\hat k} \qquad (22.87)$$

[Ex. 22.15(a)]. Here the notation is that of 3-dimensional vector analysis, with $\mathbf x$ the 3-vector whose components are $x^{\hat j}$, $d\mathbf x$ that with components $dx^{\hat j}$, $\mathbf a$ that with components $a^{\hat j}$, and $\boldsymbol\Omega$ that with components $\Omega^{\hat j}$. Because the transformation (22.86) was constructed near an arbitrary event on the observer's world line, the metric (22.87) is valid near any and every event on her world line; i.e., it is valid all along the world line. It, in fact, is the leading order in an expansion in powers of the spatial separation $x^{\hat j}$ from the world line. For higher order terms in this expansion see, e.g., Ni and Zimmermann (1978).

Notice that precisely on the observer's world line, the metric coefficients $g_{\hat\alpha\hat\beta}$ [the coefficients of $dx^{\hat\alpha}dx^{\hat\beta}$ in Eq. (22.87)] are $g_{\hat\alpha\hat\beta} = \eta_{\hat\alpha\hat\beta}$, in accord with equation (22.83). However, as one moves farther and farther away from the observer's world line, the effects of the acceleration $a^{\hat j}$ and rotation $\Omega^{\hat j}$ cause the metric coefficients to deviate more and more strongly from $\eta_{\hat\alpha\hat\beta}$.

From the metric coefficients of (22.87) one can compute the connection coefficients $\Gamma^{\hat\alpha}{}_{\hat\beta\hat\gamma}$ on the observer's world line; and from these connection coefficients one can infer the rates of change of the basis vectors along the world line,
$$\nabla_{\vec U}\vec e_{\hat\alpha} = \nabla_{\hat 0}\vec e_{\hat\alpha} = \Gamma^{\hat\mu}{}_{\hat\alpha\hat 0}\,\vec e_{\hat\mu}\,. \qquad (22.88)$$
The result is (cf. Ex. 22.15)
$$\nabla_{\vec U}\vec e_{\hat 0}\equiv\nabla_{\vec U}\vec U = \vec a\,, \qquad (22.89)$$
$$\nabla_{\vec U}\vec e_{\hat j} = (\vec a\cdot\vec e_{\hat j})\,\vec U + \boldsymbol\epsilon(\vec U,\vec\Omega,\vec e_{\hat j},\ldots)\,. \qquad (22.90)$$

Equation (22.90) is a special case of a general “law of transport” for vectors that are orthogonal to the observer's world line and that the observer thus sees as purely spatial: For the spin vector $\vec S$ of an inertial-guidance gyroscope (one which the observer carries with herself, applying the forces that make it accelerate precisely at its center of mass so they do not also make it precess), the transport law is (22.90) with $\vec e_{\hat j}$ replaced by $\vec S$ and with $\vec\Omega = 0$:
$$\nabla_{\vec U}\vec S = \vec U\,(\vec a\cdot\vec S)\,. \qquad (22.91)$$
The term on the right-hand side of this transport law is required to keep the spin vector always orthogonal to the observer's 4-velocity, $\nabla_{\vec U}(\vec S\cdot\vec U) = 0$. For any other vector $\vec A$, which rotates relative to inertial-guidance gyroscopes, the transport law has, in addition to this “keep-it-orthogonal-to-$\vec U$” term, also a second term which is the 4-vector form of $d\mathbf A/dt = \boldsymbol\Omega\times\mathbf A$:
$$\nabla_{\vec U}\vec A = \vec U\,(\vec a\cdot\vec A) + \boldsymbol\epsilon(\vec U,\vec\Omega,\vec A,\ldots)\,. \qquad (22.92)$$
Equation (22.90) is this general transport law with $\vec A$ replaced by $\vec e_{\hat j}$.

Consider a particle which moves freely through the neighborhood of an accelerated observer. As seen in an inertial reference frame, the particle moves through spacetime on a straight line, also called a geodesic of flat spacetime. Correspondingly, a geometric, frame-independent version of its geodesic law of motion is
$$\nabla_{\vec u}\vec u = 0\,; \qquad (22.93)$$
i.e., it parallel transports its 4-velocity $\vec u$ along itself. It is instructive to examine the component form of this “geodesic equation” in the proper reference frame of the observer. Since the components of $\vec u$ in this frame are $u^{\hat\alpha} = dx^{\hat\alpha}/d\tau$, where $\tau$ is the particle's proper time (not the observer's proper time), the components $u^{\hat\alpha}{}_{;\hat\mu}u^{\hat\mu} = 0$ of the geodesic equation (22.93) are
$$u^{\hat\alpha}{}_{,\hat\mu}u^{\hat\mu} + \Gamma^{\hat\alpha}{}_{\hat\mu\hat\nu}u^{\hat\mu}u^{\hat\nu} = 0\,; \qquad (22.94)$$
or equivalently
$$\frac{d^2x^{\hat\alpha}}{d\tau^2} + \Gamma^{\hat\alpha}{}_{\hat\mu\hat\nu}\frac{dx^{\hat\mu}}{d\tau}\frac{dx^{\hat\nu}}{d\tau} = 0\,. \qquad (22.95)$$
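The connection coefficients that underlie (22.88)–(22.90), and that enter the geodesic equation (22.95), can be checked numerically [cf. Ex. 22.15(b),(c)]. The sketch below (plain Python; names and numerical values are illustrative) differentiates the metric (22.87) by central differences at the origin $x^{\hat\alpha} = 0$, where $g_{\hat\alpha\hat\beta} = \eta_{\hat\alpha\hat\beta}$ so that the inverse metric is simply $\eta$, and returns `G[al][be][ga]` $=\Gamma^{\hat\alpha}{}_{\hat\beta\hat\gamma}$ with the index ordering of (22.88), $\nabla_{\hat\gamma}\vec e_{\hat\beta} = \Gamma^{\hat\alpha}{}_{\hat\beta\hat\gamma}\vec e_{\hat\alpha}$. One expects $\Gamma^{\hat i}{}_{\hat 0\hat 0} = a^{\hat i}$ (i.e., $\nabla_{\vec U}\vec U = \vec a$), $\Gamma^{\hat 0}{}_{\hat j\hat 0} = a_{\hat j}$, and $\Gamma^{\hat i}{}_{\hat j\hat 0} = -\epsilon_{ijk}\Omega^{\hat k}$; the last two are the $(\vec a\cdot\vec e_{\hat j})\vec U$ piece of (22.90) and the spatial rotation $\boldsymbol\Omega\times\mathbf e_{\hat j}$.

```python
def eps3(i, j, k):
    """Spatial Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def cross(u, w):
    return [u[1] * w[2] - u[2] * w[1], u[2] * w[0] - u[0] * w[2], u[0] * w[1] - u[1] * w[0]]


def proper_frame_metric(x, a, Om):
    """g_{ab} of Eq. (22.87) at x = (x^0, x^1, x^2, x^3); linear in the spatial x."""
    xs = x[1:]
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0] = -(1.0 + 2.0 * sum(ai * xi for ai, xi in zip(a, xs)))
    w = cross(Om, xs)  # (Omega x x) gives the g_{0j} = g_{j0} entries
    for i in range(3):
        g[0][i + 1] = g[i + 1][0] = w[i]
        g[i + 1][i + 1] = 1.0
    return g


def christoffel_on_world_line(a, Om, h=1e-4):
    """G[al][be][ga] = Gamma^al_{be ga} at x = 0, by central differences.

    On the world line g_{ab} = eta_{ab}, so the inverse metric is diag(-1, 1, 1, 1).
    """
    eta = [-1.0, 1.0, 1.0, 1.0]
    dg = []  # dg[k][i][j] = d g_{ij} / d x^k
    for k in range(4):
        xp, xm = [0.0] * 4, [0.0] * 4
        xp[k], xm[k] = h, -h
        gp, gm = proper_frame_metric(xp, a, Om), proper_frame_metric(xm, a, Om)
        dg.append([[(gp[i][j] - gm[i][j]) / (2.0 * h) for j in range(4)] for i in range(4)])
    return [[[0.5 * eta[al] * (dg[ga][al][be] + dg[be][al][ga] - dg[al][be][ga])
              for ga in range(4)] for be in range(4)] for al in range(4)]


a = [0.1, -0.2, 0.3]
Om = [0.05, 0.02, -0.04]
G = christoffel_on_world_line(a, Om)
# nabla_U e_j = Gamma^mu_{j0} e_mu: the e_0 part should be a_j, the e_i parts -eps_{ijk} Omega^k.
for j in range(3):
    print(j, G[0][j + 1][0], [G[i + 1][j + 1][0] for i in range(3)])
```

Because (22.87) is linear in $x^{\hat j}$, the central differences are exact up to rounding, so agreement to many digits is expected.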

Suppose for simplicity that the particle is moving slowly relative to the observer, so its ordinary velocity $v^{\hat j} = dx^{\hat j}/dx^{\hat 0}$ is very nearly equal to $u^{\hat j} = dx^{\hat j}/d\tau$ and is very small compared to unity (the speed of light), and $u^{\hat 0} = dx^{\hat 0}/d\tau$ is very nearly unity. Then to first order in the ordinary velocity $v^{\hat j}$, the spatial part of the geodesic equation (22.95) becomes
$$\frac{d^2x^{\hat i}}{(dx^{\hat 0})^2} = -\Gamma^{\hat i}{}_{\hat 0\hat 0} - \left(\Gamma^{\hat i}{}_{\hat j\hat 0} + \Gamma^{\hat i}{}_{\hat 0\hat j}\right)v^{\hat j}\,. \qquad (22.96)$$

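By computing the connection coefficients from the metric coefficients of (22.87) [Ex. 22.15], we bring this low-velocity geodesic law of motion into the form
$$\frac{d^2x^{\hat i}}{(dx^{\hat 0})^2} = -a^{\hat i} - 2\epsilon^{\hat i}{}_{\hat j\hat k}\Omega^{\hat j}v^{\hat k}\,,\quad\text{i.e.,}\quad \frac{d^2\mathbf x}{(dx^{\hat 0})^2} = -\mathbf a - 2\boldsymbol\Omega\times\mathbf v\,. \qquad (22.97)$$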

This is the standard nonrelativistic form of the law of motion for a free particle as seen in a rotating, accelerating reference frame: the first term on the right-hand side is the inertial acceleration due to the failure of the frame to fall freely, and the second term is the Coriolis acceleration due to the frame’s rotation. There would also be a centrifugal acceleration if we had kept terms higher order in distance away from the observer’s world line, but it has been lost due to our linearizing the metric (22.87) in that distance. This analysis shows how the elegant formalism of tensor analysis gives rise to familiar physics. In the next few chapters we will see it give rise to less familiar, general relativistic phenomena.
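The low-velocity law of motion (22.97) can also be checked without any connection coefficients, straight from the coordinate transformation (22.86). The sketch below (plain Python, $c = 1$; names and numbers are illustrative) takes a free particle on the straight inertial world line $\mathbf x = \mathbf V t$, which passes the observer's origin at $t = x^{\hat 0} = 0$, inverts (22.86) by fixed-point iteration to find the particle's proper-frame position $x^{\hat j}$ at small proper times $x^{\hat 0}$, and forms a central second difference. Because (22.86) is itself accurate only through quadratic order, agreement is expected only near the origin and only to first order in $V$; by expanding the inversion, the leftover discrepancy should be of order $2\mathbf V(\mathbf a\cdot\mathbf V)$.

```python
def dot(u, w):
    return sum(x * y for x, y in zip(u, w))


def cross(u, w):
    return [u[1] * w[2] - u[2] * w[1], u[2] * w[0] - u[0] * w[2], u[0] * w[1] - u[1] * w[0]]


def proper_position(t_hat, V, a, Om, iters=50):
    """Proper-frame position x_hat at proper time t_hat of a particle with x = V t.

    Inverts (22.86), x = x_hat + a t_hat^2 / 2 + (Om x x_hat) t_hat and
    t = t_hat (1 + a . x_hat), by iterating
    x_hat = V t_hat (1 + a . x_hat) - a t_hat^2 / 2 - (Om x x_hat) t_hat,
    which converges quickly for small t_hat.
    """
    xh = [0.0, 0.0, 0.0]
    for _ in range(iters):
        t = t_hat * (1.0 + dot(a, xh))
        w = cross(Om, xh)
        xh = [V[i] * t - 0.5 * a[i] * t_hat ** 2 - w[i] * t_hat for i in range(3)]
    return xh


def proper_frame_acceleration(V, a, Om, h=1e-3):
    """d^2 x_hat / (d x^0_hat)^2 at x^0_hat = 0 by a central second difference."""
    xp = proper_position(h, V, a, Om)
    x0 = proper_position(0.0, V, a, Om)
    xm = proper_position(-h, V, a, Om)
    return [(xp[i] - 2.0 * x0[i] + xm[i]) / h ** 2 for i in range(3)]


a = [0.0, 0.0, 0.5]            # frame acceleration (illustrative)
Om = [0.0, 0.3, 0.1]           # frame angular velocity (illustrative)
V = [1e-3, -2e-3, 5e-4]        # slow particle velocity in the inertial frame
numeric = proper_frame_acceleration(V, a, Om)
predicted = [-ai - 2.0 * ci for ai, ci in zip(a, cross(Om, V))]  # Eq. (22.97)
print("numeric:  ", numeric)
print("predicted:", predicted)
```

Dropping the Coriolis term from `predicted` makes the mismatch grow to the size of $2\boldsymbol\Omega\times\mathbf V$, so the comparison genuinely probes the rotational term and not just the $-\mathbf a$ term.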


****************************
EXERCISES

Exercise 22.15 Example: Proper Reference Frame
(a) Show that the coordinate transformation (22.86) brings the metric $ds^2 = \eta_{\alpha\beta}dx^\alpha dx^\beta$ into the form (22.87), accurate to linear order in separation $x^{\hat j}$ from the origin of coordinates. (b) Compute the connection coefficients for the coordinate basis of (22.87) at an arbitrary event on the observer's world line. Do so first by hand calculations, and then verify your results using symbolic-manipulation software on a computer. (c) From those connection coefficients show that the rate of change of the basis vectors $\vec e_{\hat\alpha}$ along the observer's world line is given by (22.89), (22.90). (d) From the connection coefficients show that the low-velocity limit (22.96) of the geodesic equation is given by (22.97).

Exercise 22.16 Problem: Uniformly Accelerated Observer
As a special example of an accelerated observer, consider one whose world line, expressed in terms of a Lorentz coordinate system $(t,x,y,z)$, is
$$t = \frac1a\sinh(a\tau)\,,\qquad x = \frac1a\cosh(a\tau)\,,\qquad y = 0\,,\qquad z = 0\,. \qquad (22.98)$$

Here $a$ is a constant with dimensions of 1/(length) or equivalently (length)/(time)$^2$, and $\tau$ is a parameter that varies along the accelerated world line. (a) Show that $\tau$ is the observer's proper time, evaluate the observer's 4-acceleration $\vec a$, and show that $|\vec a| = a$, where $a$ is the constant in (22.98), so the observer feels constant, time-independent acceleration in his proper reference frame. (b) The basis vectors $\vec e_{\hat 0}$ and $\vec e_{\hat 1}$ of the observer's proper reference frame lie in the $t,x$ plane in spacetime, $\vec e_{\hat 2}$ points along the $y$-axis, and $\vec e_{\hat 3}$ points along the $z$-axis. Draw a spacetime diagram, on it draw the observer's world line, and at several points along it draw the basis vectors $\vec e_{\hat\mu}$. What are $\vec e_{\hat\mu}$ in terms of the Lorentz coordinate basis vectors $\partial/\partial x^\alpha$? (c) What is the angular velocity $\vec\Omega$ of the proper reference frame? (d) Express the coordinates $x^{\hat\mu}$ of the observer's proper reference frame in terms of the Lorentz coordinates $(t,x,y,z)$ accurate to first order in distance away from the observer's world line and accurate for all proper times $\tau$. Show that under this coordinate transformation the Lorentz-frame components of the metric, $g_{\alpha\beta} = \eta_{\alpha\beta}$, are transformed into the components given by Eq. (22.87).

Exercise 22.17 Challenge: Thomas Precession As is well known in quantum mechanics, the spin-orbit contribution to the Hamiltonian for an electron in an atom is

$$H_{SO} = \frac{-e}{2m_e^2c^2\,r}\,\frac{d\phi}{dr}\,\mathbf{L}\cdot\mathbf{S}\,, \qquad (22.99)$$

where φ is the electrostatic potential and L, S are the electron’s orbital angular momentum and spin, respectively. This is one half the naive value; the difference, known as the Thomas precession, is a purely special relativistic, kinematic effect. Using the language of this chapter, explain from first principles how the Thomas precession arises. ****************************
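For Ex. 22.16(a), a quick numerical sanity check may help before doing the algebra. The sketch below is not a solution: it differentiates the world line (22.98) by central finite differences (the step sizes h are arbitrary choices), keeps only the t, x components since y = z = 0, works in units with c = 1, and checks that $\vec u\cdot\vec u = -1$ (so τ is proper time), $|\vec a| = a$, and $\vec u\cdot\vec a = 0$. The helper names are hypothetical.

```python
import math

def position(a: float, tau: float) -> tuple[float, float]:
    """Lorentz coordinates (t, x) on the world line (22.98); y = z = 0, c = 1."""
    return math.sinh(a * tau) / a, math.cosh(a * tau) / a

def four_velocity(a: float, tau: float, h: float = 1e-5) -> tuple[float, float]:
    """Central-difference estimate of u = d(t, x)/d(tau)."""
    (tp, xp), (tm, xm) = position(a, tau + h), position(a, tau - h)
    return (tp - tm) / (2 * h), (xp - xm) / (2 * h)

def four_acceleration(a: float, tau: float, h: float = 1e-3) -> tuple[float, float]:
    """Central-difference estimate of d^2(t, x)/d(tau)^2."""
    (tp, xp), (t0, x0), (tm, xm) = position(a, tau + h), position(a, tau), position(a, tau - h)
    return (tp - 2 * t0 + tm) / h**2, (xp - 2 * x0 + xm) / h**2

def eta_dot(v: tuple[float, float], w: tuple[float, float]) -> float:
    """Minkowski inner product restricted to the t, x plane."""
    return -v[0] * w[0] + v[1] * w[1]

a = 2.0  # arbitrary test value of the constant in (22.98)
for tau in (-1.0, 0.0, 0.5, 1.5):
    u, acc = four_velocity(a, tau), four_acceleration(a, tau)
    print(f"tau = {tau:+.1f}:  u.u = {eta_dot(u, u):+.8f},  "
          f"|a| = {math.sqrt(eta_dot(acc, acc)):.6f},  u.a = {eta_dot(u, acc):+.1e}")
```

The check holds at every sampled τ, which is what "constant, time-independent acceleration" means in the observer’s own frame.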

Bibliographic Note For a very readable presentation of most of this chapter’s material, from much the same point of view, see Chap. 20 of Hartle (2003). For an equally elementary introduction from a somewhat different viewpoint, see Chaps. 1–4 of Schutz (1980). A far more detailed and somewhat more sophisticated introduction, largely but not entirely from our viewpoint, will be found in Chaps. 1–6 of Misner, Thorne and Wheeler (1973). More sophisticated treatments from rather different viewpoints than ours are given in Chaps. 1 and 2 and Sec. 3.1 of Wald (1984), and in Chaps. 1 and 2 of Carroll (2004). A treasure trove of exercises on this material, with solutions, will be found in Chaps. 6, 7, and 8 of Lightman, Press, Price and Teukolsky (1975).

Bibliography

Carroll, S. M. 2004. Spacetime and Geometry: An Introduction to General Relativity, San Francisco: Addison-Wesley.

Feynman, R. P. 1966. The Character of Physical Law, Cambridge MA: MIT Press.

Hartle, J. B. 2003. Gravity: An Introduction to Einstein’s General Relativity, San Francisco: Addison-Wesley.

Lightman, A. P., Press, W. H., Price, R. H., and Teukolsky, S. A. 1975. Problem Book in Relativity and Gravitation, Princeton: Princeton University Press.

MTW: Misner, C. W., Thorne, K. S., and Wheeler, J. A. 1973. Gravitation, San Francisco: W. H. Freeman.

Ni, W.-T., and Zimmermann, M. 1978. “Inertial and gravitational effects in the proper reference frame of an accelerated, rotating observer,” Physical Review D, 17, 1473.

Pais, A. 1982. ‘Subtle is the Lord...’ The Science and Life of Albert Einstein, Oxford, U.K.: Oxford University Press.

Schutz, B. 1980. Geometrical Methods of Mathematical Physics, Cambridge: Cambridge University Press.

Wald, R. M. 1984. General Relativity, Chicago: University of Chicago Press.

Will, C. M. 1993. Was Einstein Right?, New York: Basic Books.

Box 22.2 Important Concepts in Chapter 22

• Most important concepts from Chap. 1
  – Principle of Relativity, Sec. 22.2.1
  – Metric defined in terms of interval, Sec. 22.2.1
  – Inertial frames, Sec. 22.2.2
  – Interval and spacetime diagrams, Sec. 22.2.3
• Differential geometry in general bases, Sec. 22.3
  – Dual bases, $\{\vec e_\alpha\}$, $\{\vec e^{\,\mu}\}$ with $\vec e_\alpha\cdot\vec e^{\,\mu} = \delta_\alpha{}^\mu$, Sec. 22.3.1
  – Covariant, contravariant and mixed components of a tensor, Sec. 22.3.1
  – Changes of bases and corresponding transformation of components of tensors, Sec. 22.3.1
  – Coordinate bases, Sec. 22.3.1
  – Orthonormal bases, Sec. 22.2.2
  – Vector as a differential operator (directional derivative), Sec. 22.3.2
  – Tangent space, Sec. 22.3.2
  – Commutator of vector fields, Sec. 22.3.2
  – Parallel transport of vectors, Sec. 22.3.3
  – Connection coefficients, how to compute them, and how to use them in computing components of the gradients of tensor fields, Sec. 22.3.3
  – Christoffel symbols (connection coefficients in a coordinate basis), Sec. 22.3.3
  – Levi-Civita tensor and its components, Sec. 22.3.4
  – Volume elements for integration, Sec. 22.3.4
• Stress-energy tensor, Sec. 22.4
  – Definition, Sec. 22.4
  – For perfect fluid, Sec. 22.4
  – For point particle, Sec. 22.4
  – For electromagnetic field, Ex. 22.13
• Proper reference frame of an accelerated observer and metric in it, Sec. 22.5
  – Transport law for inertial-guidance gyroscope, Sec. 22.5
  – Geodesic law of motion, Sec. 22.5

Contents

23 Fundamental Concepts of General Relativity
  23.1 Overview
  23.2 Local Lorentz Frames, the Principle of Relativity, and Einstein’s Equivalence Principle
  23.3 The Spacetime Metric, and Gravity as a Curvature of Spacetime
  23.4 Free-fall Motion and Geodesics of Spacetime
  23.5 Relative Acceleration, Tidal Gravity, and Spacetime Curvature
    23.5.1 Newtonian Description of Tidal Gravity
    23.5.2 Relativistic Description of Tidal Gravity
    23.5.3 Comparison of Newtonian and Relativistic Descriptions
  23.6 Properties of the Riemann Curvature Tensor
  23.7 Curvature Coupling Delicacies in the Equivalence Principle, and Some Nongravitational Laws of Physics in Curved Spacetime¹
  23.8 The Einstein Field Equation²
  23.9 Weak Gravitational Fields
    23.9.1 Newtonian Limit of General Relativity
    23.9.2 Linearized Theory
    23.9.3 Gravitational Field Outside a Stationary, Linearized Source
    23.9.4 Conservation Laws for Mass, Momentum and Angular Momentum

¹ See MTW Chap. 16.
² See MTW Chap. 17.

Chapter 23 Fundamental Concepts of General Relativity

Version 0823.1.K.pdf, 29 April 2009
Please send comments, suggestions, and errata via email to [email protected] or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 23.1 Reader’s Guide
• This chapter relies significantly on
  – The special relativity portions of Chap. 1.
  – Chapter 22, on the transition from special relativity to general relativity.
• This chapter is a foundation for the applications of general relativity theory in Chaps. 24–26.

23.1 Overview

Newton’s theory of gravity is logically incompatible with the special theory of relativity: Newtonian gravity presumes the existence of a universal, frame-independent 3-dimensional space in which lives the Newtonian potential Φ, and a universal, frame-independent time t with respect to which the propagation of Φ is instantaneous. Special relativity, by contrast, insists that the concepts of time and of 3-dimensional space are frame-dependent, so that instantaneous propagation of Φ in one frame would mean non-instantaneous propagation in another. The most straightforward way to remedy this incompatibility is to retain the assumption that gravity is described by a scalar field Φ, but modify Newton’s instantaneous, action-at-a-distance field equation

$$\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right)\Phi = 4\pi G\rho \qquad (23.1)$$

(where G is Newton’s gravitation constant and ρ is the mass density) to read

$$\vec\nabla^2\Phi \equiv g^{\alpha\beta}\,\Phi_{;\alpha\beta} = -4\pi G\,T^\mu{}_\mu\,, \qquad (23.2)$$

where $\vec\nabla^2$ is the squared gradient, or d’Alembertian, in Minkowski spacetime and $T^\mu{}_\mu$ is the trace (contraction of its two slots) of the stress-energy tensor. At first sight this modified field equation is attractive and satisfactory (but see Ex. 23.1, below): (i) it satisfies Einstein’s Principle of Relativity in that it is expressed as a geometric, frame-independent relationship between geometric objects; and (ii) since $-T^\mu{}_\mu = T^{00} - T^{xx} - T^{yy} - T^{zz}$ in a Lorentz frame, in any Lorentz frame it takes the form [with factors of c = (speed of light) restored]

$$\left(-\frac{1}{c^2}\frac{\partial^2}{\partial t^2} + \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right)\Phi = \frac{4\pi G}{c^2}\left(T^{00} - T^{xx} - T^{yy} - T^{zz}\right)\,, \qquad (23.3)$$

which, in the kinds of situation contemplated by Newton [energy density predominantly due to rest mass, $T^{00} \cong \rho c^2$; stress negligible compared to rest mass-energy, $|T^{jk}| \ll \rho c^2$; and 1/c × (time rate of change of Φ) negligible compared to the spatial gradient of Φ], reduces to the Newtonian field equation (23.1).

Not surprisingly, most theoretical physicists in the decade following Einstein’s formulation of special relativity (1905–1915) presumed that gravity would be correctly describable, within the framework of special relativity, by this type of modification of Newton’s theory, or something resembling it. For a brief historical account see Chap. 13 of Pais (1982). To Einstein, by contrast, it seemed clear as early as 1907 that the correct description of gravity should involve a generalization of special relativity rather than an incorporation into special relativity: since an observer in a local, freely falling reference frame near the earth should not feel any gravitational acceleration at all, local freely falling frames (local inertial frames) should in some sense be the domain of special relativity, and gravity should somehow be described by the relative acceleration of such frames.
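To make concrete what the source of (23.3) is, here is a small sketch in SI units. It assumes the perfect-fluid stress-energy tensor of Chap. 22 with the fluid at rest, so $T^{00} = \rho c^2$ and $T^{xx} = T^{yy} = T^{zz} = P$; the constants and function names are illustrative choices, not the book’s. For ordinary matter the source agrees with Newton’s 4πGρ to many digits, while for radiation-like matter with $P = \rho c^2/3$ the combination $\rho c^2 - 3P$ (the trace) vanishes.

```python
import math

G = 6.674e-11   # Newton's gravitation constant [m^3 kg^-1 s^-2]
C = 2.998e8     # speed of light [m/s]

def scalar_theory_source(rho: float, pressure: float) -> float:
    """Right-hand side of Eq. (23.3) for a perfect fluid at rest.

    Assumes T^00 = rho c^2 and T^xx = T^yy = T^zz = pressure, so that
    T^00 - T^xx - T^yy - T^zz = rho c^2 - 3 P  (which equals -T^mu_mu).
    """
    minus_trace = rho * C**2 - 3.0 * pressure
    return 4.0 * math.pi * G * minus_trace / C**2

def newtonian_source(rho: float) -> float:
    """Right-hand side of Newton's field equation (23.1)."""
    return 4.0 * math.pi * G * rho

# Ordinary matter (water-like density, atmospheric-scale pressure): Newtonian limit.
print(scalar_theory_source(1.0e3, 1.0e5) / newtonian_source(1.0e3))
# Radiation-like matter, P = rho c^2 / 3: the trace, and hence the source, vanishes.
print(scalar_theory_source(1.0, C**2 / 3.0))
```

The second case shows a structural feature of this scalar theory: radiation, being traceless, does not generate Φ at all.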
Although the seeds of this idea were in Einstein’s mind as early as 1907 (see the discussion of the equivalence principle in Einstein, 1907), it required eight years for him to bring them to fruition. A first crucial step, which took half the eight years, was for Einstein to conquer his initial aversion to Minkowski’s (1908) geometric formulation of special relativity, and to realize that a curvature of Minkowski’s 4-dimensional spacetime is the key to understanding the relative acceleration of freely falling frames. The second crucial step was to master the mathematics of differential geometry, which describes spacetime curvature, and to use that mathematics to formulate a logically self-consistent theory of gravity. This second step took an additional four years and culminated in Einstein’s (1915, 1916) general theory of relativity. For a historical account of Einstein’s eight-year struggle toward general relativity see, e.g., Part IV of Pais (1982); and for selected quotations from Einstein’s technical papers during this eight-year period, which tell the story of his struggle in his own words, see Sec. 17.7 of MTW.

It is remarkable that Einstein was led, not by experiment, but by philosophical and aesthetic arguments, to reject the incorporation of gravity into special relativity [Eqs. (23.2) and (23.3) above], and to insist instead on describing gravity by curved spacetime. Only after the full formulation of his general relativity did experiments begin to confirm that he was right and that the advocates of special-relativistic gravity were wrong, and only more than 50 years after general relativity was formulated did the experimental evidence become extensive and strong. For detailed discussions see, e.g., Will (1981, 1986), and Part 9 of MTW. The mathematical tools, the diagrams, and the phrases by which we describe general relativity have changed somewhat in the decades since Einstein formulated his theory; and, indeed, we can even assert that we understand the theory more deeply than did Einstein. However, the basic ideas are unchanged; and general relativity’s claim to be the most elegant and aesthetic of physical theories has been reinforced and strengthened by our growing insights.

General relativity is not merely a theory of gravity. Like special relativity before it, the general theory is a framework within which to formulate all the laws of physics, classical and quantum, but now with gravity included. However, there is one remaining, crucial, gaping hole in this framework: It is incapable of functioning, indeed it fails completely, when conditions become so extreme that space and time themselves must be quantized. In those extreme conditions general relativity must be married, in some deep, as-yet-ill-understood way, to quantum theory, to produce an all-inclusive quantum theory of gravity: a theory which, one may hope, will be a “theory of everything.” To this we shall return, briefly, in Chaps. 24 and 26.
In this chapter we present, in modern language, the foundations of general relativity. Our presentation will rely heavily on the concepts, viewpoint, and formalism developed in Chaps. 1 and 22. We shall begin in Sec. 23.2 with a discussion of three concepts that are crucial to Einstein’s viewpoint on gravity: a local Lorentz frame (the closest thing there is, in the presence of gravity, to special relativity’s “global” Lorentz frame), the extension of the principle of relativity to deal with gravitational situations, and Einstein’s equivalence principle, by which one can “lift” laws of physics out of the flat spacetime of special relativity and into the curved spacetime of general relativity. In Sec. 23.3 we shall see how gravity prevents the meshing of local Lorentz frames to form global Lorentz frames, and shall infer from this that spacetime must be curved. In Sec. 23.4 we shall lift into curved spacetime the law of motion for free test particles, and in Sec. 23.5 we shall see how spacetime curvature pushes two freely moving test particles apart and shall use this phenomenon to make contact between spacetime curvature and the Newtonian “tidal gravitational field” (gradient of the Newtonian gravitational acceleration). In Sec. 23.6 we shall study a number of mathematical and geometric properties of the tensor field that embodies spacetime curvature: the Riemann tensor. In Sec. 23.7 we shall examine “curvature coupling delicacies” which plague the lifting of laws of physics from flat spacetime to curved spacetime. In Sec. 23.8 we shall meet the Einstein field equation, which describes the manner in which spacetime curvature is produced by the total stress-energy tensor of all matter and nongravitational fields. In Ex. 23.12 we shall examine in some detail how Newton’s laws of gravity arise as a weak-gravity limit of general relativity. Finally, in Sec. 23.9 we shall examine the conservation laws for energy, momentum, and angular momentum of gravitating bodies that live in “asymptotically flat” regions of spacetime.

**************************** EXERCISES Exercise 23.1 Example: A Special Relativistic, Scalar-Field Theory of Gravity Equation (23.2) is the field equation for a special relativistic theory of gravity with gravitational potential Φ. To complete the theory one must describe the forces that the field Φ produces on matter.

(a) One conceivable choice for the force on a test particle of rest mass m is the following generalization of the familiar Newtonian expression:

$$\nabla_{\vec u}\,\vec p = -m\vec\nabla\Phi\,; \qquad \text{i.e.,} \quad \frac{dp^\alpha}{d\tau} = -m\,\Phi^{,\alpha} \ \text{ in a Lorentz frame,} \qquad (23.4)$$

where τ is proper time along the particle’s world line, $\vec p$ is the particle’s 4-momentum, $\vec u$ is its 4-velocity, and $\vec\nabla\Phi$ is the spacetime gradient of the gravitational potential. Show that this equation of motion reduces, in a Lorentz frame and for low particle velocities, to the standard Newtonian equation of motion. Show, however, that this equation of motion is flawed in that the gravitational field will alter the particle’s rest mass, in violation of extensive experimental evidence that the rest mass of an elementary particle is unique and conserved.

(b) Show that the above equation of motion, when modified to read

$$\nabla_{\vec u}\,\vec p = -(\mathbf g + \vec u\otimes\vec u)\cdot m\vec\nabla\Phi\,; \qquad \text{i.e.,} \quad \frac{dp^\alpha}{d\tau} = -(g^{\alpha\beta} + u^\alpha u^\beta)\,m\,\Phi_{,\beta} \ \text{ in a Lorentz frame,} \qquad (23.5)$$

preserves the particle’s rest mass. In this equation of motion $\vec u$ is the particle’s 4-velocity, and $\mathbf g + \vec u\otimes\vec u$ projects $\vec\nabla\Phi$ into the “3-space” orthogonal to the particle’s world line; cf. Fig. 22.6.

(c) Show, by treating a zero-rest-mass particle as the limit of a particle of finite rest mass ($\vec p = m\vec u$ and ζ = τ/m finite as τ and m go to zero), that the above theory predicts that in any Lorentz reference frame $p^\alpha e^\Phi$ (with α = 0, 1, 2, 3) are constant along the zero-rest-mass particle’s world line.
Explain why this prediction implies that there will be no deflection of light around the limb of the sun, which conflicts severely with experiments that were done after Einstein formulated his general theory of relativity. (There was no way, experimentally, to rule out the above theory in the epoch, ca. 1914, when Einstein was doing battle with his colleagues over whether gravity should be treated within the framework of special relativity or should be treated as a geometric extension of special relativity.) ****************************
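Parts (a) and (b) of Ex. 23.1 can be previewed numerically. The sketch below is an illustration under stated assumptions, not a proof: it works in a 1+1-dimensional Lorentz frame with c = 1, assumes a uniform potential Φ = g x with g = 0.1 (an arbitrary choice), starts the particle at rest, takes $u^\alpha = p^\alpha/m$ with m the fixed parameter appearing in the equations, and integrates with a classical fourth-order Runge–Kutta step. It then reports the rest mass $\sqrt{-\vec p\cdot\vec p}$ at the end. All function names are hypothetical.

```python
import math

def rhs(state, m, grad_phi, projected):
    """d/dtau of (t, x, p^t, p^x) in a 1+1 Lorentz frame with c = 1.

    grad_phi(t, x) returns covariant components (Phi_{,t}, Phi_{,x}).
    projected=False: Eq. (23.4), dp^a/dtau = -m Phi^{,a}.
    projected=True:  Eq. (23.5), dp^a/dtau = -(eta^{ab} + u^a u^b) m Phi_{,b}.
    """
    t, x, pt, px = state
    ut, ux = pt / m, px / m                    # u^a = p^a / m
    dphi_t, dphi_x = grad_phi(t, x)
    force_t = m * dphi_t                       # -m eta^{tt} Phi_{,t}, eta^{tt} = -1
    force_x = -m * dphi_x                      # -m eta^{xx} Phi_{,x}, eta^{xx} = +1
    if projected:
        u_grad = ut * dphi_t + ux * dphi_x     # u^b Phi_{,b}
        force_t -= m * ut * u_grad
        force_x -= m * ux * u_grad
    return (ut, ux, force_t, force_x)

def rk4_step(state, h, deriv):
    """One classical fourth-order Runge-Kutta step."""
    k1 = deriv(state)
    k2 = deriv(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = deriv(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = deriv(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6.0 * (q1 + 2 * q2 + 2 * q3 + q4)
                 for s, q1, q2, q3, q4 in zip(state, k1, k2, k3, k4))

def final_rest_mass(projected, m=1.0, g=0.1, tau_max=2.0, steps=2000):
    """Integrate from rest in the assumed uniform field Phi = g x; return sqrt(-p.p)."""
    grad_phi = lambda t, x: (0.0, g)
    state = (0.0, 0.0, m, 0.0)                 # p = (m, 0): particle starts at rest
    h = tau_max / steps
    for _ in range(steps):
        state = rk4_step(state, h, lambda s: rhs(s, m, grad_phi, projected))
    _, _, pt, px = state
    return math.sqrt(pt * pt - px * px)

print("Eq. (23.4):", final_rest_mass(projected=False))  # sqrt(0.96), about 0.9798
print("Eq. (23.5):", final_rest_mass(projected=True))   # stays at 1 to integrator accuracy
```

Under (23.4) the time component $p^t$ never changes while $p^x$ grows, so $-\vec p\cdot\vec p$ shrinks; the projection in (23.5) removes exactly the part of the force along $\vec u$ that would change the rest mass.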

Fig. 23.1: A local inertial frame (local Lorentz frame) inside a space shuttle that is orbiting the earth. (Figure: the frame’s x, y, z axes, with the earth labeled.)

23.2 Local Lorentz Frames, the Principle of Relativity, and Einstein’s Equivalence Principle

One of Einstein’s greatest insights was to recognize that special relativity is valid not globally, but only locally, inside locally freely falling (inertial) reference frames. Figure 23.1 shows a specific example of a local inertial frame: The interior of a space shuttle in earth orbit, where an astronaut has set up a freely falling (from his viewpoint “freely floating”) latticework of rods and clocks. This latticework is constructed by all the rules appropriate to a special relativistic, inertial (Lorentz) reference frame (Sec. 1.2): (i) the latticework moves freely through spacetime so no forces act on it, and its rods are attached to gyroscopes so they do not rotate; (ii) the measuring rods are orthogonal to each other, with their intervals of length uniform compared, e.g., to the wavelength of light (orthonormal lattice); (iii) the clocks are densely packed in the lattice, they tick uniformly relative to ideal atomic standards (they are ideal clocks), and they are synchronized by the Einstein light-pulse process. However, there is one crucial change from special relativity: The latticework must be small enough that one can neglect the effects of inhomogeneities of gravity (which general relativity will associate with spacetime curvature, and which, for example, would cause two freely floating particles, one nearer the earth than the other, to gradually move apart even though initially they are at rest with respect to each other). The necessity for smallness is embodied in the word “local” of “local inertial frame”, and we shall quantify it with ever greater precision as we move on through this chapter. We shall use the phrases local Lorentz frame and local inertial frame interchangeably to describe the above type of synchronized, orthonormal latticework; and the spacetime coordinates t, x, y, z that the latticework provides (in the manner of Sec. 1.2) we shall call, interchangeably, local Lorentz coordinates and local inertial coordinates.
Since, in the presence of gravity, inertial reference frames must be restricted to be local, the inertial-frame version of the principle of relativity must similarly be restricted to say: All the local, nongravitational laws of physics are the same in every local inertial frame, everywhere and everywhen in the universe. Here, by “local” laws we mean those laws, classical or quantum, which can be expressed entirely in terms of quantities confined to (measurable within) a local inertial frame; and the exclusion of gravitational laws from this version of the principle of relativity is necessary because gravity is to be described by a curvature of spacetime which (by definition, see below) cannot show up in a local inertial frame. This version of the principle of relativity can be described in operational terms using precisely the same language as for the special relativistic version (Secs. 22.2.1 and 22.2.2): If two different observers, in two different local Lorentz frames, in different (or the same) regions of the universe and epochs of the universe, are given identical written instructions for a self-contained physics experiment (an experiment that can be performed within the confines of the local Lorentz frame), then their two experiments must yield the same results, to within their experimental accuracies.

It is worth emphasizing that the principle of relativity is asserted to hold everywhere and everywhen in the universe: the local laws of physics must have the same form in the early universe, a fraction of a second after the big bang, as they have on earth today, and as they have at the center of the sun or inside a black hole. It is reasonable to expect that the specific forms that the local, nongravitational laws of physics take in general relativistic local Lorentz frames are the same as they take in the (global) Lorentz frames of special relativity. The assertion that this is so is a modern version of Einstein’s equivalence principle. In the next section we will use this principle to deduce some properties of the general relativistic spacetime metric; and in Sec. 23.7 we will use it to deduce the explicit forms of some of the nongravitational laws of physics in curved spacetime.

23.3 The Spacetime Metric, and Gravity as a Curvature of Spacetime

The Einstein equivalence principle guarantees that nongravitational physics within a local Lorentz frame can be described using a spacetime metric g, which gives for the invariant interval between neighboring events with separation vector $\vec\xi = \Delta x^\alpha\,\partial/\partial x^\alpha$ the standard special relativistic expression

$$\vec\xi^{\,2} = g_{\alpha\beta}\,\xi^\alpha\xi^\beta = (\Delta s)^2 = -(\Delta t)^2 + (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2\,. \qquad (23.6)$$

Correspondingly, in a local Lorentz frame the components of the spacetime metric take on their standard special-relativity values

$$g_{\alpha\beta} = \eta_{\alpha\beta} \equiv \begin{cases} -1 & \text{if } \alpha = \beta = 0\,, \\ +1 & \text{if } \alpha = \beta = x, \text{ or } y, \text{ or } z\,, \\ 0 & \text{otherwise}\,. \end{cases} \qquad (23.7)$$
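Equations (23.6) and (23.7) translate directly into a few lines of code. This minimal sketch uses c = 1 and the index order (t, x, y, z); the timelike/spacelike/null labels are the standard classification of intervals from the special-relativity chapters, and the helper names are hypothetical.

```python
# Minkowski components eta_{alpha beta} of Eq. (23.7); index order (t, x, y, z), c = 1.
ETA = [[(-1.0 if a == 0 else 1.0) if a == b else 0.0 for b in range(4)] for a in range(4)]

def interval_squared(xi):
    """(Delta s)^2 = eta_{alpha beta} xi^alpha xi^beta, Eq. (23.6)."""
    return sum(ETA[a][b] * xi[a] * xi[b] for a in range(4) for b in range(4))

def interval_type(xi):
    """Classify a separation vector by the sign of (23.6)."""
    s2 = interval_squared(xi)
    if s2 < 0:
        return "timelike"
    if s2 > 0:
        return "spacelike"
    return "null"

print(interval_squared((2.0, 1.0, 0.0, 0.0)))   # -3.0
print(interval_type((1.0, 0.0, 1.0, 0.0)))      # null
```

In a local Lorentz frame of curved spacetime the same function applies only to neighboring events, which is the point of the word “local” above.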

Turn, now, to a first look at the gravity-induced constraints on the size of a local Lorentz frame: Above the earth set up, initially, a family of local Lorentz frames scattered over the entire region from two earth radii out to four earth radii, with all the frames initially at rest with respect to the earth [Fig. 23.2(a)]. From experience—or, if you prefer, from Newton’s theory of gravity which after all is quite accurate near earth—we know that as time passes these frames will all fall toward the earth. If (as a pedagogical aid) we drill holes through the earth to let the frames continue falling after reaching the earth’s surface, the frames will all pass through the earth’s center and fly out the earth’s opposite side.

Fig. 23.2: (a) A family of local Lorentz frames, all momentarily at rest above the earth’s surface. (b) A family of local, 2-dimensional Euclidean coordinate systems on the earth’s surface. The nonmeshing of Lorentz frames in (a) is analogous to the nonmeshing of Euclidean coordinates in (b) and motivates attributing gravity to a curvature of spacetime.

Obviously, two adjacent frames, which initially were at rest with respect to each other, acquire a relative velocity during their fall, which causes them to interpenetrate and pass through each other as they cross the earth’s center. Gravity is the cause of this relative velocity. If these two adjacent frames could be meshed to form a larger Lorentz frame, then as time passes they would always remain at rest relative to each other. Thus, a meshing to form a larger Lorentz frame is impossible. The gravity-induced relative velocity prevents it. In brief: Gravity prevents the meshing of local Lorentz frames to form global Lorentz frames. This situation is closely analogous to the nonmeshing of local, 2-dimensional, Euclidean coordinate systems on the surface of the earth [Figure 23.2(b)]: The curvature of the earth prevents a Euclidean mesh, thereby giving grief to map makers and surveyors. This analogy suggested to Einstein, in 1912, a powerful new viewpoint on gravity: Just as the curvature of space prevents the meshing of local Euclidean coordinates on the earth’s surface, so it must be that a curvature of spacetime prevents the meshing of local Lorentz frames in the spacetime above the earth, or anywhere else in spacetime, for that matter. And since it is already known that gravity is the cause of the nonmeshing of Lorentz frames, it must be that gravity is a manifestation of spacetime curvature.

To make this idea more quantitative, consider, as a pedagogical tool, the 2-dimensional metric of the earth’s surface expressed in terms of a spherical polar coordinate system and in “line-element” form:

$$ds^2 = R^2\,d\theta^2 + R^2\sin^2\theta\,d\phi^2\,. \qquad (23.8)$$

Here R is the radius of the earth, or equivalently the “radius of curvature” of the earth’s surface. This line element, rewritten in terms of the alternative coordinates

$$x \equiv R\phi\,, \qquad y \equiv R\left(\frac{\pi}{2} - \theta\right)\,, \qquad (23.9)$$

has the form

$$ds^2 = \cos^2(y/R)\,dx^2 + dy^2 = dx^2 + dy^2 + O(y^2/R^2)\,dx^2\,, \qquad (23.10)$$

where $O(y^2/R^2)$ means “terms of order $y^2/R^2$ or smaller.”
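A numerical illustration of (23.10): the sketch below, assuming a mean earth radius of 6371 km, evaluates how far $g_{xx} = \cos^2(y/R)$ departs from its Euclidean value at a few distances y from the equator and compares that with $(y/R)^2$. The function name is hypothetical.

```python
import math

R_EARTH_KM = 6371.0   # mean radius of the earth in km (approximate)

def gxx_deviation(y_km: float, radius_km: float = R_EARTH_KM) -> float:
    """|g_xx - 1| for the line element (23.10), where g_xx = cos^2(y/R).

    1 - cos^2 is evaluated as sin^2 to avoid cancellation at small y.
    """
    return math.sin(y_km / radius_km) ** 2

for y in (10.0, 100.0, 1000.0):
    print(f"y = {y:6.0f} km:  |g_xx - 1| = {gxx_deviation(y):.3e},"
          f"  (y/R)^2 = {(y / R_EARTH_KM) ** 2:.3e}")
```

At 10 km from the equator the deviation is about 2.5 × 10⁻⁶, and it tracks $(y/R)^2$ closely until y becomes a sizable fraction of R, as the $O(y^2/R^2)$ term in (23.10) says.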
Notice that the metric coefficients have the standard Euclidean form $g_{jk} = \delta_{jk}$ all along the equator (y = 0); but as one moves away from the equator, they begin to differ from Euclidean by fractional amounts of $O(y^2/R^2) = O[y^2/(\text{radius of curvature of earth})^2]$. Thus, local Euclidean coordinates can be meshed and remain Euclidean all along the equator (or along any other great circle), but the earth’s curvature forces the coordinates to cease being Euclidean when one moves off the chosen great circle, thereby causing the metric coefficients to differ from $\delta_{jk}$ by amounts $\Delta g_{jk} = O[(\text{distance from great circle})^2/(\text{radius of curvature})^2]$.

Turn next to a specific example of curved spacetime: that of a “k = 0 Friedmann model” for our expanding universe (to be studied in depth in Chap. 26 below). In spherical coordinates (η, χ, θ, φ), the 4-dimensional metric of this curved spacetime, described as a line element, takes the form

$$ds^2 = a^2(\eta)\left[-d\eta^2 + d\chi^2 + \chi^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right]\,, \qquad (23.11)$$

where a, the “expansion factor of the universe,” is a monotonically increasing function of the “time” coordinate η. This line element, rewritten near χ = 0 in terms of the alternative coordinates

$$t = \int_0^\eta a\,d\eta + \frac{1}{2}\chi^2\frac{da}{d\eta}\,, \quad x = a\chi\sin\theta\cos\phi\,, \quad y = a\chi\sin\theta\sin\phi\,, \quad z = a\chi\cos\theta\,, \qquad (23.12)$$

takes the form [cf. Ex. 23.2]

$$ds^2 = \eta_{\alpha\beta}\,dx^\alpha dx^\beta + O\!\left(\frac{x^2 + y^2 + z^2}{R^2}\right)dx^\alpha dx^\beta\,, \qquad (23.13)$$

where R is a quantity which, by analogy with the radius of curvature R of the earth’s surface, can be identified as a “radius of curvature” of spacetime:

$$\frac{1}{R^2} = O\!\left(\frac{\dot a^2}{a^2} + \frac{\ddot a}{a}\right)\,, \qquad \text{where} \quad \dot a \equiv \left(\frac{da}{dt}\right)_{x=y=z=0}\,, \quad \ddot a \equiv \left(\frac{d^2a}{dt^2}\right)_{x=y=z=0}\,. \qquad (23.14)$$

From the form of the metric coefficients in Eq. (23.13) we see that all along the world line x = y = z = 0 the coordinates are precisely Lorentz, but as one moves away from that world line they cease to be Lorentz, and the metric coefficients begin to differ from $\eta_{\alpha\beta}$ by amounts $\Delta g_{\alpha\beta} = O[(\text{distance from the chosen world line})^2/(\text{radius of curvature of spacetime})^2]$. This is completely analogous to our equatorial Euclidean coordinates on the earth’s surface. The curvature of the earth’s surface prevented our local Euclidean coordinates from remaining Euclidean as we moved away from the equator; here the curvature of spacetime prevents our local Lorentz coordinates from remaining Lorentz as we move away from our chosen world line. Notice that our chosen world line is that of the spatial origin of our local Lorentz coordinates. Thus, we can think of those coordinates as provided by a spatially tiny latticework of rods and clocks, like that of Figure 23.1; and the latticework remains locally Lorentz for all time (as measured by its own clocks), but it ceases to be locally Lorentz when one moves a finite spatial distance (in its own frame) away from its spatial origin. (This is analogous to the local Euclidean coordinates on the earth’s equator: they remain Euclidean all along the equator [Eq. (23.10)], going all around the world, but they deviate from Euclidean when one moves away from the equator.)

This behavior is generic. One can show that, if any freely falling observer, anywhere in spacetime, sets up a little latticework of rods and clocks in accord with our standard rules and keeps the latticework’s spatial origin on his or her free-fall world line, then the coordinates provided by the latticework will be locally Lorentz, with metric coefficients

$$g_{\alpha\beta} = \begin{cases} \eta_{\alpha\beta} + O\!\left(\dfrac{\delta_{jk}\,x^j x^k}{R^2}\right) & \text{in a local Lorentz frame,} \\[1ex] \eta_{\alpha\beta} & \text{at the spatial origin,} \end{cases} \qquad (23.15)$$

where R is the radius of curvature of spacetime. Notice that because the deviations of the metric from $\eta_{\alpha\beta}$ are second order in the distance from the spatial origin, the first derivatives of the metric coefficients are of first order, $g_{\alpha\beta,\mu} = O(x^j/R^2)$. This, plus the vanishing of the commutation coefficients in our coordinate basis [so that $\Gamma_{\alpha\beta\gamma} = \tfrac{1}{2}(g_{\alpha\beta,\gamma} + g_{\alpha\gamma,\beta} - g_{\beta\gamma,\alpha})$], implies that the connection coefficients of the local Lorentz frame’s coordinate basis are

$$\Gamma^\alpha{}_{\beta\gamma} = \begin{cases} O\!\left(\dfrac{\sqrt{\delta_{jk}\,x^j x^k}}{R^2}\right) & \text{in a local Lorentz frame,} \\[1ex] 0 & \text{at the spatial origin.} \end{cases} \qquad (23.16)$$

It is instructive to compare Eq. (23.15) for the metric in the local Lorentz frame of a freely falling observer in curved spacetime with Eq. (22.87) for the metric in the proper reference frame of an accelerated observer in flat spacetime. Whereas the spacetime curvature in (23.15) produces corrections to $g_{\alpha\beta} = \eta_{\alpha\beta}$ of second order in distance from the world line, the acceleration and spatial rotation of the reference frame in (22.87) produce corrections of first order. This remains true when one studies accelerated observers in curved spacetime (Chap. 24). In their proper reference frames the metric coefficients $g_{\alpha\beta}$ will contain both the first-order terms of (22.87) due to acceleration and rotation, and the second-order terms of (23.15) due to spacetime curvature.
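The pattern in (23.16), with connection coefficients vanishing on the chosen world line and growing linearly away from it, can be checked on the 2-dimensional analog (23.10). The sketch below works in units with R = 1, takes metric derivatives by central differences (the step h is an arbitrary choice), and assembles the coordinate-basis Christoffel symbols. On the equator they vanish; nearby, the nonzero ones are $\Gamma^y{}_{xx} = \sin(y/R)\cos(y/R)/R$ and $\Gamma^x{}_{xy} = -\tan(y/R)/R$, both of magnitude ≈ y/R². Function names are hypothetical.

```python
import math

R = 1.0  # radius of curvature; lengths measured in units of R

def metric(x: float, y: float) -> list[list[float]]:
    """g_jk for the 2-D line element (23.10): cos^2(y/R) dx^2 + dy^2."""
    return [[math.cos(y / R) ** 2, 0.0], [0.0, 1.0]]

def christoffel(x: float, y: float, h: float = 1e-5):
    """Coordinate-basis Gamma^a_{bc}, returned as gam[a][b][c], from
    Gamma^a_{bc} = (1/2) g^{ad} (g_{db,c} + g_{dc,b} - g_{bc,d}),
    with metric derivatives taken by central differences of step h."""
    point = (x, y)
    dg = []                                   # dg[k][i][j] = d g_ij / d x^k
    for k in range(2):
        plus, minus = list(point), list(point)
        plus[k] += h
        minus[k] -= h
        gp, gm = metric(*plus), metric(*minus)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)])
    g = metric(x, y)
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    ginv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    return [[[0.5 * sum(ginv[a][d] * (dg[c][d][b] + dg[b][d][c] - dg[d][b][c])
                        for d in range(2))
              for c in range(2)]
             for b in range(2)]
            for a in range(2)]

for y in (0.0, 0.01, 0.1):
    gam = christoffel(0.0, y)
    # index 0 = x, index 1 = y
    print(f"y/R = {y:4.2f}:  Gamma^y_xx = {gam[1][0][0]:+.6f},  Gamma^x_xy = {gam[0][0][1]:+.6f}")
```

The same structure, with time added, is what (23.16) asserts for a local Lorentz frame in curved spacetime.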
**************************** EXERCISES Exercise 23.2 Derivation: Local Lorentz Frame in Friedmann Universe By inserting the coordinate transformation (23.12) into the Friedmann metric (23.11), derive the metric (23.13), (23.14) for a local Lorentz frame. ****************************
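Equation (23.14) fixes R only in order of magnitude, but it already shows how large R is for a given expansion law. The sketch below estimates R from a(t) along x = y = z = 0, where by (23.12) the local Lorentz time advances as dt = a dη. It assumes unit weights for the two terms of (23.14) (the equation itself fixes neither weights nor signs), uses central differences with an arbitrary step h, and takes as a test case the standard matter-dominated k = 0 model a ∝ t^{2/3} in units with c = 1; Chap. 26 treats the models properly. Function names are hypothetical.

```python
import math

def curvature_radius_estimate(a_of_t, t: float, h: float = 1e-4) -> float:
    """Order-of-magnitude radius of curvature R from Eq. (23.14).

    Derivatives of a(t) along x = y = z = 0 are central differences with step h.
    (23.14) fixes 1/R^2 only up to order unity; adding (adot/a)^2 and |addot/a|
    with unit weights, as done here, is an illustrative assumption.
    """
    a0 = a_of_t(t)
    adot = (a_of_t(t + h) - a_of_t(t - h)) / (2 * h)
    addot = (a_of_t(t + h) - 2 * a0 + a_of_t(t - h)) / h**2
    return 1.0 / math.sqrt((adot / a0) ** 2 + abs(addot / a0))

def matter_dominated(t: float) -> float:
    """k = 0, pressure-free model, a proportional to t^(2/3); a test case only."""
    return t ** (2.0 / 3.0)

for t in (0.5, 1.0, 2.0):
    print(t, curvature_radius_estimate(matter_dominated, t))   # close to 1.2247 * t
```

For this model $\dot a/a = 2/(3t)$ and $|\ddot a/a| = 2/(9t^2)$, so the estimate is $R \approx \sqrt{3/2}\,t$: the radius of curvature of spacetime is comparable to the age of the universe (times c).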

23.4 Free-fall Motion and Geodesics of Spacetime

In order to make more precise the concept of spacetime curvature, we will need to study quantitatively the relative acceleration of neighboring, freely falling particles.¹ Before we can carry out such a study, however, we must understand quantitatively the motion of a single freely falling particle in curved spacetime. That is the objective of this section.

¹ See MTW pp. 244–247, 312–324.

In a global Lorentz frame of flat, special relativistic spacetime a free particle moves along a straight world line, i.e., a world line with the form

$$(t, x, y, z) = (t_o, x_o, y_o, z_o) + (p^0, p^x, p^y, p^z)\,\zeta\,; \qquad \text{i.e.,} \quad x^\alpha = x^\alpha_o + p^\alpha\zeta\,. \qquad (23.17)$$

Here $p^\alpha$ are the Lorentz-frame components of the particle’s 4-momentum; ζ is the affine parameter such that $\vec p = d/d\zeta$, i.e., $p^\alpha = dx^\alpha/d\zeta$ [Eq. (1.18) ff.]; and $x^\alpha_o$ are the coordinates of the particle when its affine parameter is ζ = 0. The straight-line motion (23.17) can be described equally well by the statement that the Lorentz-frame components $p^\alpha$ of the particle’s 4-momentum are constant, i.e., are independent of ζ:

$$\frac{dp^\alpha}{d\zeta} = 0\,. \qquad (23.18)$$

Even nicer is the frame-independent description, which says that as the particle moves it parallel-transports its tangent vector $\vec p$ along its world line:

$$\nabla_{\vec p}\,\vec p = 0\,, \qquad \text{or, equivalently,} \quad p^\alpha{}_{;\beta}\,p^\beta = 0\,. \qquad (23.19)$$

For a particle of nonzero rest mass $m$, which has $\vec p = m\vec u$ and $\zeta = \tau/m$, with $\vec u = d/d\tau$ its 4-velocity and $\tau$ its proper time, Eq. (23.19) is equivalent to $\nabla_{\vec u}\vec u = 0$. This is the form of the particle’s law of motion discussed in Eq. (22.93).

This description of the motion is readily carried over into curved spacetime using the equivalence principle: Let $\mathcal{P}(\zeta)$ be the world line of a freely moving particle in curved spacetime. At a specific event $\mathcal{P}_o = \mathcal{P}(\zeta_o)$ on that world line introduce a local Lorentz frame (so the frame’s spatial origin, like the particle, passes through $\mathcal{P}_o$ as time progresses). Then the equivalence principle tells us that the particle’s law of motion must be the same in this local Lorentz frame as it is in the global Lorentz frame of special relativity:

$$\left(\frac{dp^\alpha}{d\zeta}\right)_{\zeta=\zeta_o} = 0 . \qquad (23.20)$$

More powerful than this local-Lorentz-frame description of the motion is a description that is frame-independent. We can easily deduce such a description from Eq. (23.20). Since the connection coefficients vanish at the origin of the local Lorentz frame where (23.20) is being evaluated [cf. Eq. (23.16)], (23.20) can be written equally well, in our local Lorentz frame, as

$$0 = \left(\frac{dp^\alpha}{d\zeta} + \Gamma^\alpha{}_{\beta\gamma}\,p^\beta\,\frac{dx^\gamma}{d\zeta}\right)_{\zeta=\zeta_o} = \left[\left(p^\alpha{}_{,\gamma} + \Gamma^\alpha{}_{\beta\gamma}\,p^\beta\right)\frac{dx^\gamma}{d\zeta}\right]_{\zeta=\zeta_o} = \left(p^\alpha{}_{;\gamma}\,p^\gamma\right)_{\zeta=\zeta_o} . \qquad (23.21)$$

Thus, as the particle passes through the spatial origin of our local Lorentz coordinate system, the components of the directional derivative of its 4-momentum along itself vanish. Now, if two 4-vectors have components that are equal in one basis, their components are guaranteed [by the tensorial transformation law (22.19)] to be equal in all bases, and correspondingly the two vectors, viewed as frame-independent, geometric objects, must be equal. Thus, since Eq. (23.21) says that the components of the 4-vector $\nabla_{\vec p}\vec p$ and the zero vector are equal in our chosen local Lorentz frame, it must be true that

$$\nabla_{\vec p}\,\vec p = 0 \qquad (23.22)$$

at the moment when the particle passes through the point $\mathcal{P}_o = \mathcal{P}(\zeta_o)$. Moreover, since $\mathcal{P}_o$ is an arbitrary point (event) along the particle’s world line, (23.22) must be a geometric, frame-independent equation of motion for the particle, valid everywhere along its world line. Notice that this geometric, frame-independent equation of motion $\nabla_{\vec p}\vec p = 0$ in curved spacetime is precisely the same as that [Eq. (23.19)] for flat spacetime. We shall generalize this conclusion to other laws of physics in Sec. 23.7 below.

Our equation of motion (23.22) for a freely moving point particle says, in words, that the particle parallel transports its 4-momentum along its world line. In any curved manifold, not just in spacetime, the relation $\nabla_{\vec p}\vec p = 0$ is called the geodesic equation, and the curve to which $\vec p$ is the tangent vector is called a geodesic. On the surface of a sphere such as the earth, the geodesics are the great circles; they are the unique curves along which local Euclidean coordinates can be meshed, keeping one of the two Euclidean coordinates constant along the curve [cf. Eq. (23.10)], and they are the trajectories generated by an airplane’s inertial guidance system, which tries to fly the plane along the straightest trajectory it can. Similarly, in spacetime the trajectories of freely falling particles are geodesics; they are the unique curves along which local Lorentz coordinates can be meshed, keeping the three spatial coordinates constant along the curve and letting the time vary, thereby producing a local Lorentz reference frame [Eqs. (23.15) and (23.16)], and they are also the spacetime trajectories along which inertial guidance systems will guide a spacecraft.

The geodesic equation guarantees that the square of the 4-momentum will be conserved along the particle’s world line; in slot-naming index notation,

$$\left(g_{\alpha\beta}\,p^\alpha p^\beta\right)_{;\gamma}\,p^\gamma = 2\,g_{\alpha\beta}\,p^\alpha\,p^\beta{}_{;\gamma}\,p^\gamma = 0 . \qquad (23.23)$$

(Here the standard rule for differentiating products has been used; this rule follows from the definition (22.27) of the frame-independent directional derivative of a tensor; it also can be deduced in a local Lorentz frame where $\Gamma^\alpha{}_{\mu\nu} = 0$, so each gradient with a “;” reduces to a partial derivative with a “,”.) Also in Eq. (23.23) the term involving the gradient of the metric has been discarded since it vanishes [Eq. (22.40)], and the two terms involving derivatives of $p^\alpha$ and $p^\beta$, being equal, have been combined. In index-free notation the frame-independent relation (23.23) says

$$\nabla_{\vec p}\,(\vec p\cdot\vec p) = 2\,\vec p\cdot\nabla_{\vec p}\,\vec p = 0 . \qquad (23.24)$$

This is a pleasing result, since the square of the 4-momentum is the negative of the particle’s squared rest mass, $\vec p\cdot\vec p = -m^2$, which surely should be conserved along the particle’s free-fall world line! Note that, as in flat spacetime, so also in curved, for a particle of finite rest mass the free-fall trajectory (the geodesic world line) is timelike, $\vec p\cdot\vec p = -m^2 < 0$, while for a zero-rest-mass particle it is null, $\vec p\cdot\vec p = 0$. Spacetime also supports spacelike geodesics, i.e., curves with tangent vectors $\vec p$ that satisfy the geodesic equation (23.22) and are spacelike, $\vec p\cdot\vec p > 0$. Such curves can be thought of as the world lines of freely falling “tachyons,” i.e., faster-than-light particles, though it seems unlikely that such particles really exist in Nature. Note that the constancy of $\vec p\cdot\vec p$ along a geodesic implies that a geodesic can never change its character: if initially timelike, it will always remain timelike; if initially null, it will remain null; if initially spacelike, it will remain spacelike.

When studying the motion of a particle with finite rest mass, one often uses as the tangent vector to the geodesic the particle’s 4-velocity $\vec u = \vec p/m$ rather than the 4-momentum, and correspondingly one uses as the parameter along the geodesic the particle’s proper time $\tau = m\zeta$ rather than $\zeta$ (recall: $\vec u = d/d\tau$; $\vec p = d/d\zeta$). In this case the geodesic equation becomes

$$\nabla_{\vec u}\,\vec u = 0 ; \qquad (23.25)$$

cf. Eq. (22.93). Similarly, for spacelike geodesics, one often uses as the tangent vector $\vec u = d/ds$, where $s$ is proper distance (square root of the invariant interval) along the geodesic; and the geodesic equation then assumes the same form (23.25) as for a timelike geodesic.

The geodesic world line of a freely moving particle has three very important properties:

(i) When written in a coordinate basis, the geodesic equation $\nabla_{\vec p}\vec p = 0$ becomes the following differential equation for the particle’s world line $x^\alpha(\zeta)$ in the coordinate system [Ex. 23.3]:

$$\frac{d^2x^\alpha}{d\zeta^2} = -\Gamma^\alpha{}_{\mu\nu}\,\frac{dx^\mu}{d\zeta}\,\frac{dx^\nu}{d\zeta} . \qquad (23.26)$$

Here $\Gamma^\alpha{}_{\mu\nu}$ are the connection coefficients of the coordinate system’s coordinate basis. [Equation (22.95) was a special case of this.] Note that these are four coupled equations ($\alpha = 0, 1, 2, 3$) for the four coordinates $x^\alpha$ as functions of affine parameter $\zeta$ along the geodesic.
If the initial position, $x^\alpha$ at $\zeta = 0$, and initial tangent vector (particle momentum), $p^\alpha = dx^\alpha/d\zeta$ at $\zeta = 0$, are specified, then these four equations will determine uniquely the coordinates $x^\alpha(\zeta)$ as a function of $\zeta$ along the geodesic.

(ii) Consider a spacetime that possesses a symmetry, which is embodied in the fact that the metric coefficients in some coordinate system are independent of one of the coordinates $x^A$. Associated with that symmetry there will be a conserved quantity $p_A \equiv \vec p\cdot\partial/\partial x^A$ associated with free-particle motion. Exercise 23.4 derives this result and develops a familiar example.

(iii) Among all timelike curves linking two events $\mathcal{P}_0$ and $\mathcal{P}_1$ in spacetime, those whose proper time lapse (timelike length) is stationary under small variations of the curve are timelike geodesics; see Ex. 23.5. In other words, timelike geodesics are the curves that satisfy the action principle (23.30) below. Now, one can always send a photon from $\mathcal{P}_0$ to $\mathcal{P}_1$ by bouncing it off a set of strategically located mirrors, and that photon path is the limit of a timelike curve as the curve becomes null. Therefore, there exist timelike curves from $\mathcal{P}_0$ to $\mathcal{P}_1$ with vanishingly small length, so the geodesics cannot minimize the proper time lapse. This means that the curve of maximal proper time lapse (length) is a geodesic, and that any other geodesics will have a length that is a “saddle point” (stationary under variations of the path but not a maximum or a minimum).
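Properties (i) and (ii), together with the conservation law (23.24), can be seen at work numerically. The sketch below assumes the simplest possible setting, the flat Euclidean plane in polar coordinates $(r, \phi)$ with $ds^2 = dr^2 + r^2 d\phi^2$, whose standard connection coefficients $\Gamma^r{}_{\phi\phi} = -r$ and $\Gamma^\phi{}_{r\phi} = \Gamma^\phi{}_{\phi r} = 1/r$ are supplied here rather than taken from the text. It integrates the coordinate geodesic equation (23.26) with a fourth-order Runge–Kutta step (a choice of convenience, not anything prescribed by the text). Although the connection coefficients are nonzero, the resulting curve should be a straight line, and both $g_{ab}p^ap^b$ and the covariant component $p_\phi = r^2\,d\phi/d\zeta$ (conserved because the metric does not depend on $\phi$) should stay constant to integration accuracy.

```python
import math

def rhs(state):
    """Coordinate geodesic equation (23.26) for the plane in polar coordinates.

    Assumed connection coefficients for ds^2 = dr^2 + r^2 dphi^2:
    Gamma^r_phiphi = -r, Gamma^phi_rphi = Gamma^phi_phir = 1/r, all others zero.
    state = (r, phi, dr/dzeta, dphi/dzeta).
    """
    r, phi, dr, dphi = state
    d2r = r * dphi ** 2              # -Gamma^r_phiphi (dphi/dzeta)^2
    d2phi = -2.0 * dr * dphi / r     # -2 Gamma^phi_rphi (dr/dzeta)(dphi/dzeta)
    return (dr, dphi, d2r, d2phi)

def rk4(state, zeta_max, n=1000):
    """Fourth-order Runge-Kutta integration from zeta = 0 to zeta = zeta_max."""
    h = zeta_max / n
    for _ in range(n):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
        state = tuple(s + h * (a + 2 * b + 2 * c + d) / 6
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state

def p_dot_p(state):
    """g_ab p^a p^b with g = diag(1, r^2); constant along a geodesic, Eq. (23.24)."""
    r, _, dr, dphi = state
    return dr ** 2 + (r * dphi) ** 2

def p_phi(state):
    """Covariant p_phi = g_phiphi p^phi = r^2 dphi/dzeta; conserved since g is phi-independent."""
    r, _, _, dphi = state
    return r * r * dphi

# Start at Cartesian (x, y) = (1, 0), moving in the +y direction with unit speed.
start = (1.0, 0.0, 0.0, 1.0)
end = rk4(start, 2.0)
x, y = end[0] * math.cos(end[1]), end[0] * math.sin(end[1])
print(f"(x, y) = ({x:.6f}, {y:.6f})")   # a straight line predicts (1, 2)
print(p_dot_p(end), p_phi(end))        # both should remain equal to 1
```

Swapping in another metric only requires a new `rhs` built from that metric’s connection coefficients.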

**************************** EXERCISES

Exercise 23.3 Derivation: Geodesic equation in an arbitrary coordinate system
Show that in an arbitrary coordinate system $x^\alpha(\mathcal{P})$ the geodesic equation (23.22) takes the form (23.26).

Exercise 23.4 Derivation: Constant of Geodesic Motion in a Spacetime with Symmetry
(a) Suppose that in some coordinate system the metric coefficients are independent of some specific coordinate $x^A$: $g_{\alpha\beta,A} = 0$. [Example: in spherical polar coordinates $t, r, \theta, \phi$ in flat spacetime $g_{\alpha\beta,\phi} = 0$, so we could set $x^A = \phi$.] Show that

$$p_A \equiv \vec p\cdot\frac{\partial}{\partial x^A} \qquad (23.27)$$

is a constant of the motion for a freely moving particle [$p_\phi$ = (conserved z-component of angular momentum) in the above, spherically symmetric example]. [Hint: Show that the geodesic equation can be written in the form

$$\frac{dp_\alpha}{d\zeta} - \Gamma_{\mu\alpha\nu}\,p^\mu p^\nu = 0 , \qquad (23.28)$$

where $\Gamma_{\mu\alpha\nu}$ is the covariant Christoffel symbol of Eqs. (22.38), (22.39).] Note the analogy of the constant of the motion $p_A$ with Hamiltonian mechanics: there, if the Hamiltonian is independent of $x^A$ then the generalized momentum $p_A$ is conserved; here, if the metric coefficients are independent of $x^A$, then the covariant component $p_A$ of the momentum is conserved. For an elucidation of the connection between these two conservation laws, see the Hamiltonian formulation of geodesic motion in Exercise 25.2 of MTW.
(b) As an example, consider a particle moving freely through a time-independent, Newtonian gravitational field. In Ex. 23.12 below we shall learn that such a gravitational field can be described in the language of general relativity by the spacetime metric

$$ds^2 = -(1 + 2\Phi)\,dt^2 + (\delta_{jk} + h_{jk})\,dx^j dx^k , \qquad (23.29)$$

where $\Phi(x, y, z)$ is the time-independent Newtonian potential and $h_{jk}$ are contributions to the metric that are independent of the time coordinate $t$ and have magnitude of order $|\Phi|$. That the gravitational field is weak means $|\Phi| \ll 1$ (or, in cgs units, $|\Phi/c^2| \ll 1$). The coordinates being used are Lorentz, aside from tiny corrections of order $|\Phi|$; and, as this exercise and Ex. 23.12 show, they coincide with the coordinates of the Newtonian theory of gravity. Suppose that the particle has a velocity $v^j \equiv dx^j/dt$ through this coordinate system that is less than or of order $|\Phi|^{1/2}$ and thus small compared to the speed of light. Because the metric is independent of the time coordinate $t$, the component $p_t$ of the particle’s 4-momentum must be conserved along its world line. Since, throughout physics, the conserved quantity associated with time-translation invariance is always the energy, we expect that $p_t$, when evaluated accurate to first order in $|\Phi|$, must be equal to the particle’s conserved Newtonian energy, $E = m\Phi + \frac{1}{2}\,m\,v^j v^k\delta_{jk}$, aside from some multiplicative and additive constants. Show that this, indeed, is true, and evaluate the constants.

Exercise 23.5 Problem: Action principle for geodesic motion
Show, by introducing a specific but arbitrary coordinate system, that among all timelike world lines that a particle could take to get from event $\mathcal{P}_0$ to $\mathcal{P}_1$, the one or ones whose proper time lapse is stationary under small variations of path are the free-fall geodesics. In other words, an action principle for a timelike geodesic $\mathcal{P}(\lambda)$ [i.e., $x^\alpha(\lambda)$ in any coordinate system $x^\alpha$] is

$$\delta\int_{\mathcal{P}_0}^{\mathcal{P}_1} d\tau = \delta\int_0^1\left(-g_{\alpha\beta}\,\frac{dx^\alpha}{d\lambda}\,\frac{dx^\beta}{d\lambda}\right)^{1/2} d\lambda = 0 , \qquad (23.30)$$

where $\lambda$ is an arbitrary parameter which, by definition, ranges from 0 at $\mathcal{P}_0$ to 1 at $\mathcal{P}_1$. [Note: unless, after the variation, you choose the arbitrary parameter $\lambda$ to be “affine” ($\lambda = a\tau + b$ where $a$ and $b$ are constants), your equation for $d^2x^\alpha/d\lambda^2$ will not look quite like (23.26).]
****************************
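The first-order expansion asked for in Ex. 23.4(b) can be sanity-checked numerically. The sketch below assumes $h_{jk} = 0$ (a simplification made only here), works in units with $c = 1$, and computes $p_t = g_{tt}\,m\,dt/d\tau$ exactly from the normalization $\vec u\cdot\vec u = -1$. It then compares $-p_t$ with $m + E$, where $E = m\Phi + \frac12 m v^2$; if the two agree to first order, the residual should shrink roughly a hundredfold each time $\Phi$ and $v^2$ are both reduced tenfold. The function names are illustrative.

```python
import math

def p_t_exact(m, Phi, v):
    """Covariant p_t for the metric (23.29), assuming h_jk = 0 and c = 1.

    From u.u = -1:  -(1 + 2 Phi)(u^t)^2 + v^2 (u^t)^2 = -1,
    so u^t = dt/dtau = 1 / sqrt(1 + 2 Phi - v^2), and p_t = g_tt m u^t.
    """
    g_tt = -(1.0 + 2.0 * Phi)
    dt_dtau = 1.0 / math.sqrt(1.0 + 2.0 * Phi - v * v)
    return g_tt * m * dt_dtau

def newtonian_energy(m, Phi, v):
    """Newtonian energy E = m Phi + (1/2) m v^2."""
    return m * Phi + 0.5 * m * v * v

def residual(m, Phi, v):
    """-p_t - (m + E): vanishes to first order in Phi and v^2."""
    return -p_t_exact(m, Phi, v) - (m + newtonian_energy(m, Phi, v))

# Shrink Phi and v^2 together; the residual should fall quadratically.
for s in (1e-4, 1e-5, 1e-6):
    print(s, residual(1.0, -s, math.sqrt(s)))
```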

23.5

Relative Acceleration, Tidal Gravity, and Spacetime Curvature

Now that we understand the motion of an individual freely falling particle in curved spacetime, we are ready to study the effects of gravity on the relative motions of such particles.² Before doing so in general relativity, let us recall the Newtonian discussion of the same problem:

23.5.1

Newtonian Description of Tidal Gravity

Consider, as shown in Fig. 23.3(a), two point particles, A and B, falling freely through 3-dimensional Euclidean space under the action of an external Newtonian potential $\Phi$ (i.e., a potential generated by other masses, not by the particles themselves). At Newtonian time $t = 0$ the particles are separated by only a small distance and are moving with the same velocity, $\mathbf{v}_A = \mathbf{v}_B$. As time passes, however, the two particles, being at slightly different locations in space, experience slightly different gravitational potentials $\Phi$ and gravitational accelerations $\mathbf{g} = -\nabla\Phi$, and thence develop slightly different velocities, $\mathbf{v}_A \neq \mathbf{v}_B$. To quantify this, denote by $\boldsymbol{\xi}$ the vector separation of the two particles in Euclidean 3-space. The components of $\boldsymbol{\xi}$ on any Euclidean basis [e.g., that of Fig. 23.3(a)] are $\xi^j = x^j_B - x^j_A$, where $x^j_I$ is the coordinate location of particle I. Correspondingly, the rate of change of $\xi^j$ with respect to Newtonian time is $d\xi^j/dt = v^j_B - v^j_A$; i.e., the relative velocity of the two particles is the difference of their two velocities. The second time derivative of the relative separation, i.e., the relative acceleration of the two particles, is thus given by

$$\frac{d^2\xi^j}{dt^2} = \frac{d^2x^j_B}{dt^2} - \frac{d^2x^j_A}{dt^2} = -\left(\frac{\partial\Phi}{\partial x^j}\right)_B + \left(\frac{\partial\Phi}{\partial x^j}\right)_A = -\frac{\partial^2\Phi}{\partial x^j\,\partial x^k}\,\xi^k , \qquad (23.31)$$

accurate to first order in the separation $\xi^k$. This equation gives the components of the relative acceleration in an arbitrary Euclidean basis. Rewritten in geometric, basis-independent language, it says

$$\frac{d^2\boldsymbol{\xi}}{dt^2} = -\boldsymbol{\mathcal{E}}(\ldots, \boldsymbol{\xi}) ; \quad\text{i.e.,}\quad \frac{d^2\xi^j}{dt^2} = -\mathcal{E}^j{}_k\,\xi^k , \qquad (23.32)$$

where $\boldsymbol{\mathcal{E}}$ is a symmetric, second-rank tensor, called the Newtonian tidal gravitational field:

$$\boldsymbol{\mathcal{E}} = \nabla\nabla\Phi = -\nabla\mathbf{g} ; \quad\text{i.e.,}\quad \mathcal{E}_{jk} = \frac{\partial^2\Phi}{\partial x^j\,\partial x^k} \ \text{in Euclidean coordinates.} \qquad (23.33)$$

Fig. 23.3: The effects of tidal gravity on the relative motions of two freely falling particles. Diagram (a) depicts this in a Euclidean 3-space diagram using Newton’s theory of gravity. Diagram (b) depicts it in a spacetime diagram using Einstein’s theory of gravity, general relativity.

² See MTW pp. 29–37, 218–224, 265–275.

The name “tidal gravitational field” comes from the fact that this is the field which, generated by the moon and the sun, produces the tides on the earth’s oceans. Note that, since this field is the gradient of the Newtonian gravitational acceleration g, it is a quantitative measure of the inhomogeneities of Newtonian gravity. Equation (23.31) shows quantitatively how the tidal gravitational field produces the relative acceleration of our two particles. As a specific application, one can use it to compute, in Newtonian theory, the relative accelerations and thence relative motions of two neighboring local Lorentz frames as they fall toward and through the center of the earth [Fig. 23.2(a) and associated discussion].
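Equations (23.31)–(23.33) can be checked numerically. The sketch below estimates $\mathcal{E}_{jk}$ by central finite differences for an arbitrary potential and compares the exact difference of the two particles’ accelerations, $\mathbf{g}(\mathbf{x}_B) - \mathbf{g}(\mathbf{x}_A)$, with the linearized prediction $-\mathcal{E}^j{}_k\xi^k$. The point-mass potential with $GM = 1$, the sample positions, and the step `h` are illustrative assumptions, not values from the text.

```python
import math

def point_mass_potential(x, GM=1.0):
    """Newtonian potential Phi = -GM/r of a point mass at the origin."""
    return -GM / math.sqrt(x[0] ** 2 + x[1] ** 2 + x[2] ** 2)

def gravity(Phi, x, h=1e-4):
    """Gravitational acceleration g = -grad Phi, by central differences."""
    g = []
    for j in range(3):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append(-(Phi(xp) - Phi(xm)) / (2 * h))
    return g

def tidal_tensor(Phi, x, h=1e-4):
    """Tidal field E_jk = d^2 Phi / dx^j dx^k, Eq. (23.33), by central differences."""
    def shifted(j, k, sj, sk):
        y = list(x)
        y[j] += sj * h
        y[k] += sk * h
        return Phi(y)
    return [[(shifted(j, k, 1, 1) - shifted(j, k, 1, -1)
              - shifted(j, k, -1, 1) + shifted(j, k, -1, -1)) / (4 * h * h)
             for k in range(3)] for j in range(3)]

# Particle A on the z-axis, and particle B displaced from it by a small xi.
xA = [0.0, 0.0, 10.0]
xi = [1e-3, 0.0, 2e-3]
xB = [a + d for a, d in zip(xA, xi)]

E = tidal_tensor(point_mass_potential, xA)
exact = [gb - ga for ga, gb in zip(gravity(point_mass_potential, xA),
                                   gravity(point_mass_potential, xB))]
linear = [-sum(E[j][k] * xi[k] for k in range(3)) for j in range(3)]   # Eq. (23.32)
print("exact :", exact)
print("linear:", linear)
```

The agreement degrades as $|\boldsymbol{\xi}|$ grows, since (23.31) keeps only the first-order term in the separation.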


23.5.2

Relativistic Description of Tidal Gravity

Turn attention, now, to the general relativistic description of the relative motions of two free particles. As shown in Fig. 23.3(b), the particles, labeled A and B, move along geodesic world lines with affine parameters $\zeta$ and 4-momentum tangent vectors $\vec p = d/d\zeta$. The origins of $\zeta$ along the two world lines can be chosen however we wish, so long as events with the same $\zeta$ on the two world lines, $\mathcal{P}_A(\zeta)$ and $\mathcal{P}_B(\zeta)$, are close enough to each other that we can perform power-series expansions in their separation, $\vec\xi(\zeta) = \mathcal{P}_B(\zeta) - \mathcal{P}_A(\zeta)$, and keep only the leading terms. As in our Newtonian analysis, we require that the two particles initially have vanishing relative velocity, $\nabla_{\vec p}\vec\xi = 0$, and we shall compute the tidal-gravity-induced relative acceleration $\nabla_{\vec p}\nabla_{\vec p}\vec\xi$.

As a tool in our calculation, we shall introduce into spacetime a two-dimensional surface which contains our two geodesics A and B, and also contains an infinity of other geodesics in between and alongside them; and on that surface we shall introduce two coordinates, $\zeta$ = (affine parameter along each geodesic) and $\lambda$ = (a parameter that labels the geodesics); see Fig. 23.3(b). Geodesic A will carry the label $\lambda = 0$; geodesic B will be $\lambda = 1$; $\vec\xi \equiv (\partial/\partial\lambda)_{\zeta={\rm const}}$ will be a vector field which, evaluated on geodesic A (i.e., at $\lambda = 0$), is equal to the separation vector we wish to study; and $\vec p = (\partial/\partial\zeta)_{\lambda={\rm const}}$ will be a vector field which, evaluated on any geodesic (A, B, or other curve of constant $\lambda$), is equal to the 4-momentum of the particle which moves along that geodesic. Our identification of $(\partial/\partial\lambda)_{\zeta={\rm const}}$ (at $\lambda = 0$) with the separation vector $\vec\xi$ between geodesics A and B is the leading term in a power series expansion; it is here that we require, for good accuracy, that the geodesics be close together and be so parametrized that $\mathcal{P}_A(\zeta)$ is close to $\mathcal{P}_B(\zeta)$.
Our objective is to compute the relative acceleration of particles B and A, $\nabla_{\vec p}\nabla_{\vec p}\vec\xi$, evaluated at $\lambda = 0$. The quantity $\nabla_{\vec p}\vec\xi$, which we wish to differentiate a second time in that computation, is one of the terms in the following expression for the commutator of the vector fields $\vec p$ and $\vec\xi$ [Eq. (22.41)]:

$$[\vec p, \vec\xi\,] = \nabla_{\vec p}\,\vec\xi - \nabla_{\vec\xi}\,\vec p . \qquad (23.34)$$

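Because $\vec p = (\partial/\partial\zeta)_\lambda$ and $\vec\xi = (\partial/\partial\lambda)_\zeta$, these two vector fields commute, and Eq. (23.34) tells us that $\nabla_{\vec p}\vec\xi = \nabla_{\vec\xi}\vec p$. Correspondingly, the relative acceleration of our two particles can be expressed as

$$\nabla_{\vec p}\nabla_{\vec p}\,\vec\xi = \nabla_{\vec p}\nabla_{\vec\xi}\,\vec p = \left(\nabla_{\vec p}\nabla_{\vec\xi} - \nabla_{\vec\xi}\nabla_{\vec p}\right)\vec p . \qquad (23.35)$$

Here the second equality results from adding on, for use below, a term that vanishes because $\nabla_{\vec p}\vec p = 0$ (geodesic equation). This first part of our calculation was performed efficiently using index-free notation. The next step will be easier if we introduce indices as names for slots. Then expression (23.35) takes the form

$$(\xi^\alpha{}_{;\beta}\,p^\beta)_{;\gamma}\,p^\gamma = (p^\alpha{}_{;\gamma}\,\xi^\gamma)_{;\delta}\,p^\delta - (p^\alpha{}_{;\gamma}\,p^\gamma)_{;\delta}\,\xi^\delta , \qquad (23.36)$$

which can be evaluated by using the rule for differentiating products and then renaming indices and collecting terms; the result is

$$(\xi^\alpha{}_{;\beta}\,p^\beta)_{;\gamma}\,p^\gamma = (p^\alpha{}_{;\gamma\delta} - p^\alpha{}_{;\delta\gamma})\,\xi^\gamma p^\delta + p^\alpha{}_{;\gamma}\,(\xi^\gamma{}_{;\delta}\,p^\delta - p^\gamma{}_{;\delta}\,\xi^\delta) . \qquad (23.37)$$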

The second term in this expression vanishes, since it is just the commutator of $\vec\xi$ and $\vec p$ [Eq. (23.34)] written in slot-naming index notation, and, as we noted above, $\vec\xi$ and $\vec p$ commute. The remaining first term,

$$(\xi^\alpha{}_{;\beta}\,p^\beta)_{;\gamma}\,p^\gamma = (p^\alpha{}_{;\gamma\delta} - p^\alpha{}_{;\delta\gamma})\,\xi^\gamma p^\delta , \qquad (23.38)$$

reveals that the relative acceleration of the two particles is caused by noncommutation of the two slots of a double gradient (slots here named $\gamma$ and $\delta$). In the flat spacetime of special relativity the two slots would commute and there would be no relative acceleration. Spacetime curvature prevents them from commuting and thereby causes the relative acceleration. Now, one can show that $p^\alpha{}_{;\gamma\delta} - p^\alpha{}_{;\delta\gamma}$ is linear in $p^\alpha$; see Ex. 23.6. Therefore, there must exist a fourth-rank tensor field $\mathbf{R}(\_,\_,\_,\_)$ such that

$$p^\alpha{}_{;\gamma\delta} - p^\alpha{}_{;\delta\gamma} = -R^\alpha{}_{\beta\gamma\delta}\,p^\beta \qquad (23.39)$$

for any vector field $\vec p(\mathcal{P})$. The tensor $\mathbf{R}$ can be regarded as responsible for the failure of gradients to commute, so it must be some aspect of spacetime curvature. It is called the Riemann curvature tensor. Inserting Eq. (23.39) into Eq. (23.38) and writing the result in both slot-naming index notation and abstract notation, we obtain

$$(\xi^\alpha{}_{;\beta}\,p^\beta)_{;\gamma}\,p^\gamma = -R^\alpha{}_{\beta\gamma\delta}\,p^\beta\,\xi^\gamma\,p^\delta , \qquad \nabla_{\vec p}\nabla_{\vec p}\,\vec\xi = -\mathbf{R}(\ldots, \vec p, \vec\xi, \vec p) . \qquad (23.40)$$

This is the equation of relative acceleration for freely moving test particles. It is also called the equation of geodesic deviation, because it describes the manner in which spacetime curvature R forces geodesics that are initially parallel (the world lines of freely moving particles with zero initial relative velocity) to deviate from each other; cf. Fig. 23.3(b).

23.5.3

Comparison of Newtonian and Relativistic Descriptions

It is instructive to compare this relativistic description of the relative acceleration of freely moving particles with the Newtonian description. For this purpose we shall consider a region of spacetime, such as our solar system, in which the Newtonian description of gravity is highly accurate; and there we shall study the relative acceleration of two free particles from the viewpoint of a local Lorentz frame in which the particles are both initially at rest. In the Newtonian description, the transformation from a Newtonian universal reference frame (e.g., that of the center of mass of the solar system) to the chosen local Lorentz frame is achieved by introducing new Euclidean coordinates that are uniformly accelerated relative to the old ones, with just the right uniform acceleration to annul the gravitational acceleration at the center of the local Lorentz frame. This transformation adds a spatially homogeneous constant to the Newtonian acceleration $\mathbf{g} = -\nabla\Phi$ but leaves unchanged the tidal field $\boldsymbol{\mathcal{E}} = \nabla\nabla\Phi$. Correspondingly, the Newtonian equation of relative acceleration in the local Lorentz frame retains its standard Newtonian form, $d^2\xi^j/dt^2 = -\mathcal{E}^j{}_k\,\xi^k$ [Eq. (23.32)], with the components of the tidal field computable equally well in the original universal reference frame or in the local Lorentz frame, from the standard relation $\mathcal{E}^j{}_k = \mathcal{E}_{jk} = \partial^2\Phi/\partial x^j\partial x^k$.

As an aid in making contact between the relativistic and the Newtonian descriptions, we shall convert over from using the 4-momentum $\vec p$ as the tangent vector and $\zeta$ as the parameter along the particles’ world lines to using the 4-velocity $\vec u = \vec p/m$ and the proper time $\tau = m\zeta$; this conversion brings the relativistic equation of relative acceleration (23.40) into the form

$$\nabla_{\vec u}\nabla_{\vec u}\,\vec\xi = -\mathbf{R}(\ldots, \vec u, \vec\xi, \vec u) . \qquad (23.41)$$

Because the particles are (momentarily) at rest near the origin of the local Lorentz frame, their 4-velocities are $\vec u \equiv d/d\tau = \partial/\partial t$, which implies that the components of their 4-velocities are $u^0 = 1$, $u^j = 0$, and their proper times $\tau$ are equal to coordinate time $t$, which in turn coincides with the time $t$ of the Newtonian analysis: $\tau = t$. In the relativistic analysis, as in the Newtonian, the separation vector $\vec\xi$ will have only spatial components, $\xi^0 = 0$ and $\xi^j \neq 0$. [If this were not so, we could make it so by a readjustment of the origin of proper time for particle B; cf. Fig. 23.3(b).] These facts, together with the vanishing of all the connection coefficients and derivatives of them ($\Gamma^j{}_{k0,0}$) that appear in $(\xi^j{}_{;\beta}\,u^\beta)_{;\gamma}\,u^\gamma$ at the origin of the local Lorentz frame [cf. Eqs. (23.15) and (23.16)], imply that the local Lorentz components of the equation of relative acceleration (23.41) take the form

$$\frac{d^2\xi^j}{dt^2} = -R^j{}_{0k0}\,\xi^k . \qquad (23.42)$$

By comparing this with the Newtonian equation of relative acceleration (23.32) we infer that, in the Newtonian limit, in the local rest frame of the two particles,

$$R^j{}_{0k0} = \mathcal{E}_{jk} = \frac{\partial^2\Phi}{\partial x^j\,\partial x^k} . \qquad (23.43)$$

Thus, the Riemann curvature tensor is the relativistic generalization of the Newtonian tidal field. This conclusion and the above equations make quantitative the statement that gravity is a manifestation of spacetime curvature.

Outside a spherical body with weak (Newtonian) gravity, such as the Earth, the Newtonian potential is $\Phi = -GM/r$, where $G$ is Newton’s gravitation constant, $M$ is the body’s mass and $r$ is the distance from its center. If we introduce Cartesian coordinates with origin at the body’s center and with z-axis through the point at which the Riemann tensor is to be measured, then $\Phi$ in these coordinates is $\Phi = -GM/(z^2 + x^2 + y^2)^{1/2}$, and on the z-axis the only nonzero $R^j{}_{0k0}$, as computed from Eq. (23.43), are

$$R^z{}_{0z0} = -\frac{2GM}{r^3} , \qquad R^x{}_{0x0} = R^y{}_{0y0} = +\frac{GM}{r^3} . \qquad (23.44)$$

Correspondingly, for two particles separated from each other in the radial (z) direction, the relative acceleration (23.42) is $d^2\xi^j/dt^2 = +(2GM/r^3)\,\xi^j$; i.e., the particles are pulled apart by the body’s tidal gravitational field. Similarly, for two particles separated from each other in a horizontal direction (in the x–y plane), $d^2\xi^j/dt^2 = -(GM/r^3)\,\xi^j$; i.e., the particles are pushed together by the body’s tidal gravitational field. There thus is a radial tidal stretch and a lateral tidal squeeze; and the lateral squeeze has half the strength of the radial stretch but occurs in two lateral dimensions compared to the one radial dimension. The stretch and squeeze, produced by the sun and moon, are responsible for the tides on the earth’s oceans.

**************************** EXERCISES

Exercise 23.6 Derivation: Linearity of Commutator of Double Gradient
(a) Let $a$ and $b$ be scalar fields with arbitrary but smooth dependence on location in spacetime, and $\vec A$ and $\vec B$ be vector fields. Show that

$$(aA^\alpha + bB^\alpha)_{;\gamma\delta} - (aA^\alpha + bB^\alpha)_{;\delta\gamma} = a\,(A^\alpha{}_{;\gamma\delta} - A^\alpha{}_{;\delta\gamma}) + b\,(B^\alpha{}_{;\gamma\delta} - B^\alpha{}_{;\delta\gamma}) . \qquad (23.45)$$

[Hint: The double gradient of a scalar field commutes, as one can easily see in a local Lorentz frame.]
(b) Use Eq. (23.45) to show that (i) the commutator of the double gradient is independent of how the differentiated vector field varies from point to point, and depends only on the value of the field at the location where the commutator is evaluated, and (ii) the commutator is linear in that value. Thereby conclude that there must exist a fourth-rank tensor field $\mathbf{R}$ such that Eq. (23.39) is true for any vector field $\vec p$.
****************************
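The stretch-and-squeeze pattern of Eq. (23.44) is easy to tabulate. As an illustration, the sketch below evaluates the Moon’s tidal field at the Earth’s center and the relative accelerations (23.42) of two particles separated by one Earth radius, first radially and then laterally. The rounded constants (the Moon’s $GM$, the mean Earth–Moon distance, the Earth’s radius) are assumptions supplied for the example, not numbers from this chapter, and the Sun’s contribution is ignored.

```python
GM_MOON = 4.905e12     # Moon's GM in m^3 s^-2 (rounded, assumed)
D_MOON = 3.844e8       # mean Earth-Moon distance in m (rounded, assumed)
R_EARTH = 6.371e6      # Earth's mean radius in m (rounded, assumed)

def tidal_components(GM, r):
    """Diagonal R^j_0k0 = E_jk (s^-2) on the z-axis at distance r from a
    point mass, Eq. (23.44); returned as (E_xx, E_yy, E_zz)."""
    e = GM / r ** 3
    return (e, e, -2.0 * e)

def relative_acceleration(E_diag, xi):
    """d^2 xi^j / dt^2 = -R^j_0k0 xi^k, Eq. (23.42), for a diagonal tidal tensor."""
    return tuple(-E * x for E, x in zip(E_diag, xi))

E_diag = tidal_components(GM_MOON, D_MOON)
radial = relative_acceleration(E_diag, (0.0, 0.0, R_EARTH))[2]    # stretch, > 0
lateral = relative_acceleration(E_diag, (R_EARTH, 0.0, 0.0))[0]   # squeeze, < 0
print(f"radial stretch : {radial:+.2e} m/s^2")
print(f"lateral squeeze: {lateral:+.2e} m/s^2")
```

The tidal tensor is traceless, as it must be in vacuum, which is why the lateral squeeze is exactly half the radial stretch.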

23.6

Properties of the Riemann Curvature Tensor

We now pause, in our study of the foundations of general relativity, to examine a few properties of the Riemann curvature tensor $\mathbf{R}$.³ We begin, as a tool for deriving other things, by evaluating the components of the Riemann tensor at the spatial origin of a local Lorentz frame; i.e., at a point where $\Gamma^\alpha{}_{\beta\gamma}$ vanishes but its derivatives do not. For any vector field $\vec p$ a straightforward computation reveals

$$p^\alpha{}_{;\gamma\delta} - p^\alpha{}_{;\delta\gamma} = (\Gamma^\alpha{}_{\beta\gamma,\delta} - \Gamma^\alpha{}_{\beta\delta,\gamma})\,p^\beta . \qquad (23.46)$$

By comparing with Eq. (23.39), we can read off the local-Lorentz components of Riemann:

$$R^\alpha{}_{\beta\gamma\delta} = \Gamma^\alpha{}_{\beta\delta,\gamma} - \Gamma^\alpha{}_{\beta\gamma,\delta} \quad \text{at the spatial origin of a local Lorentz frame.} \qquad (23.47)$$

From this expression we infer that, at a spatial distance $\sqrt{\delta_{ij}x^i x^j}$ from the origin of a local Lorentz frame, the connection coefficients and the metric have magnitudes

$$\Gamma^\alpha{}_{\beta\gamma} = O\!\left(R^\alpha{}_{\mu\nu\lambda}\,\sqrt{\delta_{ij}x^i x^j}\right) , \qquad g_{\alpha\beta} - \eta_{\alpha\beta} = O\!\left(R^\mu{}_{\nu\lambda\rho}\,\delta_{ij}x^i x^j\right) \quad \text{in a local Lorentz frame.} \qquad (23.48)$$

³ See MTW pp. 273–288, 324–327.

Comparison with Eqs. (23.15) and (23.16) shows that the radius of curvature of spacetime (a concept defined only semiquantitatively) is of order the inverse square root of the components of the Riemann tensor in a local Lorentz frame:

$$\mathcal{R} = O\!\left(\frac{1}{|R^\alpha{}_{\beta\gamma\delta}|^{1/2}}\right) \quad \text{in a local Lorentz frame.} \qquad (23.49)$$

By comparison with Eq. (23.44), we see that at radius $r$ outside a weakly gravitating body of mass $M$, the radius of curvature of spacetime is

$$\mathcal{R} \sim \left(\frac{r^3}{GM}\right)^{1/2} = \left(\frac{c^2 r^3}{GM}\right)^{1/2} , \qquad (23.50)$$

where the factor $c$ in the second expression makes the formula valid in conventional units. For further discussion see Ex. 23.7. From the components (23.47) of the Riemann tensor in a local Lorentz frame, together with the vanishing of the connection coefficients at the origin and the standard expressions (22.38), (22.39) for the connection coefficients in terms of the metric components, one easily can show that

$$R_{\alpha\beta\gamma\delta} = \frac{1}{2}\left(g_{\alpha\delta,\beta\gamma} + g_{\beta\gamma,\alpha\delta} - g_{\alpha\gamma,\beta\delta} - g_{\beta\delta,\alpha\gamma}\right) \quad \text{in a local Lorentz frame.} \qquad (23.51)$$

From Eq. (23.51), plus the commutation of partial derivatives $g_{\alpha\gamma,\beta\delta} = g_{\alpha\gamma,\delta\beta}$ and the symmetry of the metric, one easily can show that in a local Lorentz frame the components of the Riemann tensor have the following symmetries:

$$R_{\alpha\beta\gamma\delta} = -R_{\beta\alpha\gamma\delta} , \qquad R_{\alpha\beta\gamma\delta} = -R_{\alpha\beta\delta\gamma} , \qquad R_{\alpha\beta\gamma\delta} = +R_{\gamma\delta\alpha\beta} \qquad (23.52)$$

(antisymmetry in the first pair of indices, antisymmetry in the second pair of indices, and symmetry under interchange of the pairs). When one computes the value of the tensor on four vectors, $\mathbf{R}(\vec A, \vec B, \vec C, \vec D)$, using component calculations in this frame, one trivially sees that these symmetries produce corresponding symmetries under interchange of the vectors inserted into the slots, and thence under interchange of the slots themselves. This is always the case: any symmetry that the components of a tensor exhibit in a special basis will induce the same symmetry on the slots of the geometric, frame-independent tensor. The resulting symmetries for $\mathbf{R}$ are given by Eq. (23.52) with the “Escher mind-flip” [Sec. 1.5.3] in which the indices switch from naming components in a special frame to naming slots: The Riemann tensor is antisymmetric under interchange of its first two slots, antisymmetric under interchange of the last two, and symmetric under interchange of the two pairs. One additional symmetry can be verified by calculation in the local Lorentz frame [i.e., from Eq. (23.51)]:

$$R_{\alpha\beta\gamma\delta} + R_{\alpha\gamma\delta\beta} + R_{\alpha\delta\beta\gamma} = 0 . \qquad (23.53)$$

(Note that this cyclic symmetry is the same as occurs in the Maxwell equations (22.74) or (1.66), and also the same as occurs in the commutator identities $[\vec B, [\vec C, \vec D\,]] + [\vec C, [\vec D, \vec B\,]] + [\vec D, [\vec B, \vec C\,]] = 0$.) One can show that the full set of symmetries (23.52) and (23.53) reduces the number of independent components of the Riemann tensor, in 4-dimensional spacetime, from $4^4 = 256$ to “just” 20. Of these 20 independent components, 10 are contained in the Ricci curvature tensor, which is the contraction of the Riemann tensor on its first and third slots,

$$R_{\alpha\beta} \equiv R^\mu{}_{\alpha\mu\beta} , \qquad (23.54)$$

and which by the symmetries (23.52) and (23.53) of Riemann is itself symmetric:

$$R_{\alpha\beta} = R_{\beta\alpha} . \qquad (23.55)$$

The other 10 independent components of Riemann are contained in the Weyl curvature tensor, which we will not study here; see, e.g., pp. 325 and 327 of MTW. The contraction of the Ricci tensor on its two slots,

$$R \equiv R^\alpha{}_\alpha , \qquad (23.56)$$

is called the curvature scalar.

One often needs to know the components of the Riemann curvature tensor in some basis that is not local Lorentz. Exercise 23.8 derives the following equation for them in an arbitrary basis:

$$R^\alpha{}_{\beta\gamma\delta} = \Gamma^\alpha{}_{\beta\delta,\gamma} - \Gamma^\alpha{}_{\beta\gamma,\delta} + \Gamma^\alpha{}_{\mu\gamma}\Gamma^\mu{}_{\beta\delta} - \Gamma^\alpha{}_{\mu\delta}\Gamma^\mu{}_{\beta\gamma} - \Gamma^\alpha{}_{\beta\mu}\,c_{\gamma\delta}{}^\mu . \qquad (23.57)$$

Here $\Gamma^\alpha{}_{\beta\gamma}$ are the connection coefficients in the chosen basis, $\Gamma^\alpha{}_{\beta\gamma,\delta}$ is the result of letting the basis vector $\vec e_\delta$ act as a differential operator on $\Gamma^\alpha{}_{\beta\gamma}$, as though $\Gamma^\alpha{}_{\beta\gamma}$ were a scalar, and $c_{\gamma\delta}{}^\mu$ are the basis vectors’ commutation coefficients. Calculations with this equation are usually very long and tedious, and so are carried out using symbolic-manipulation software on a computer.

**************************** EXERCISES

Exercise 23.7 Example: Orders of magnitude of the radius of curvature of spacetime
With the help of the Newtonian limit (23.43) of the Riemann curvature tensor, show that near the earth’s surface the radius of curvature of spacetime has a magnitude $\mathcal{R} \sim$ (1 astronomical unit) $\equiv$ (distance from sun to earth). What is the radius of curvature of spacetime near the sun’s surface? Near the surface of a white-dwarf star? Near the surface of a neutron star? Near the surface of a one-solar-mass black hole? In intergalactic space?

Exercise 23.8 Derivation: Components of Riemann tensor in an arbitrary basis
By evaluating expression (23.39) in an arbitrary basis (which might not even be a coordinate basis), derive Eq. (23.57) for the components of the Riemann tensor. In your derivation keep in mind that commas denote partial derivatives only in a coordinate basis; in an arbitrary basis they denote the result of letting a basis vector act as a differential operator; cf. Eq. (22.32).

Exercise 23.9 Problem: Curvature of the surface of a sphere
On the surface of a sphere such as the earth introduce spherical polar coordinates in which the metric, written as a line element, takes the form

$$ds^2 = a^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right) , \qquad (23.58)$$

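where $a$ is the sphere’s radius.
(a) Show (first by hand and then by computer) that the connection coefficients for the coordinate basis $\{\partial/\partial\theta, \partial/\partial\phi\}$ are

$$\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta , \qquad \Gamma^\phi{}_{\theta\phi} = \Gamma^\phi{}_{\phi\theta} = \cot\theta , \qquad \text{all others vanish.} \qquad (23.59)$$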

(b) Show that the symmetries (23.52) and (23.53) of the Riemann tensor guarantee that its only nonzero components in the above coordinate basis are

$$R_{\theta\phi\theta\phi} = R_{\phi\theta\phi\theta} = -R_{\theta\phi\phi\theta} = -R_{\phi\theta\theta\phi} . \qquad (23.60)$$

(c) Show, first by hand and then by computer, that

$$R_{\theta\phi\theta\phi} = a^2\sin^2\theta . \qquad (23.61)$$

(d) Show that in the basis

$$\{\vec e_{\hat\theta}, \vec e_{\hat\phi}\} = \left\{\frac{1}{a}\frac{\partial}{\partial\theta}, \ \frac{1}{a\sin\theta}\frac{\partial}{\partial\phi}\right\} , \qquad (23.62)$$

the components of the metric, the Riemann tensor, the Ricci tensor, and the curvature scalar are

$$g_{\hat j\hat k} = \delta_{jk} , \qquad R_{\hat\theta\hat\phi\hat\theta\hat\phi} = \frac{1}{a^2} , \qquad R_{\hat j\hat k} = \frac{1}{a^2}\,g_{\hat j\hat k} , \qquad R = \frac{2}{a^2} . \qquad (23.63)$$

The first of these implies that the basis is orthonormal; the rest imply that the curvature is independent of location on the sphere, as it should be by spherical symmetry. [The $\theta$-dependence in the coordinate components of Riemann, Eq. (23.61), like the $\theta$-dependence in the metric component $g_{\phi\phi}$, is a result of the $\theta$-dependence in the length of the coordinate basis vector $\vec e_\phi$: $|\vec e_\phi| = a\sin\theta$.]

Exercise 23.10 Problem: Geodesic Deviation on a Sphere
Consider two neighboring geodesics (great circles) on a sphere of radius $a$, one the equator and the other a geodesic slightly displaced from the equator (by $\Delta\theta = b$) and parallel to it at $\phi = 0$. Let $\vec\xi$ be the separation vector between the two geodesics, and note that at $\phi = 0$, $\vec\xi = b\,\partial/\partial\theta$. Let $l$ be proper distance along the equatorial geodesic, so $d/dl = \vec u$ is its tangent vector.

(a) Show that $l = a\phi$ along the equatorial geodesic.
(b) Show that the equation of geodesic deviation (23.40) reduces to

$$\frac{d^2\xi^\theta}{d\phi^2} = -\xi^\theta , \qquad \frac{d^2\xi^\phi}{d\phi^2} = 0 . \qquad (23.64)$$

(c) Solve this, subject to the above initial conditions, to obtain

$$\xi^\theta = b\cos\phi , \qquad \xi^\phi = 0 . \qquad (23.65)$$

Verify, by drawing a picture, that this is precisely what one would expect for the separation vector between two great circles.

****************************

23.7

Curvature Coupling Delicacies in the Equivalence Principle, and Some Nongravitational Laws of Physics in Curved Spacetime⁴

If one knows a local, special relativistic, nongravitational law of physics in geometric, frame-independent form [for example, the expression for the stress-energy tensor of a perfect fluid in terms of its 4-velocity $\vec u$ and its rest-frame mass-energy density $\rho$ and pressure $P$,

$$T = (\rho + P)\,\vec u\otimes\vec u + P\,g \tag{23.66}$$

Eq. (22.59)], then the equivalence principle guarantees that in general relativity the law will assume the same geometric, frame-independent form. One can see that this is so by the same method as we used to derive the general relativistic equation of motion $\nabla_{\vec p}\,\vec p = 0$ for free particles [Eq. (23.22) and associated discussion]: (i) rewrite the special relativistic law in terms of components in a global Lorentz frame [$T^{\alpha\beta} = (\rho + P)u^\alpha u^\beta + P g^{\alpha\beta}$], (ii) then infer from the equivalence principle that this same component form of the law will hold, unchanged, in a local Lorentz frame in general relativity, and (iii) then deduce that this component law is the local Lorentz frame version of the original geometric law [$T = (\rho + P)\vec u\otimes\vec u + P g$], now lifted into general relativity. Thus, when the local, nongravitational laws of physics are known in frame-independent form, one need not distinguish whether they are special relativistic or general relativistic. In this conclusion the word local is crucial: The equivalence principle is strictly valid only at the spatial origin of a local Lorentz frame; and, correspondingly, it is in danger of failure for any law of physics that cannot be formulated solely in terms of quantities which reside at the spatial origin, i.e., along a timelike geodesic.

⁴ See MTW Chap. 16.

For the above example, $T = (\rho + P)\vec u\otimes\vec u + P g$, there is no problem; and for the local law of conservation of 4-momentum, $\nabla\cdot T = 0$, there is no problem. However, for the global law of conservation of 4-momentum

$$\int_{\partial V} T^{\alpha\beta}\, d\Sigma_\beta = 0 \tag{23.67}$$

[Eq. (1.97) and Fig. 1.17], there is serious trouble: This law is severely nonlocal, since it involves integration over a finite, closed 3-surface $\partial V$ in spacetime. Thus, the equivalence principle fails for it. The failure shows up especially clearly when one notices (as we discussed in Sec. 22.3.4) that the quantity $T^{\alpha\beta}d\Sigma_\beta$ which the integral is trying to add up over $\partial V$ has one empty slot, named α; i.e., it is a vector. This means that to compute the integral (23.67) we must transport the contributions $T^{\alpha\beta}d\Sigma_\beta$ from the various tangent spaces in which they normally live, to the tangent space of some single, agreed-upon location, where they are to be added. By what rule should the transport be done? In special relativity one uses parallel transport, so the components of the vector are held fixed in any global Lorentz frame. However, it turns out that spacetime curvature makes parallel transport dependent on the path of the transport (and correspondingly, a vector is changed by parallel transport around a closed curve). As a result, the integral $\int_{\partial V} T^{\alpha\beta}d\Sigma_\beta$ depends not only on the common location to which one transports each surface element's contribution in order to add them, but also on the path of the transport, which in general is quite arbitrary. This dependence makes the integral ill defined and correspondingly causes a breakdown, in general relativity, of the global law of 4-momentum conservation.

Another instructive example is the law by which a freely moving particle transports its spin angular momentum. The spin angular momentum is readily defined in the momentary local Lorentz rest frame of the particle's center of mass; there it is a 4-vector with vanishing time component, and with space components given by the familiar integral

$$S_i = \int_{\text{interior of body}} \epsilon_{ijk}\, x^j T^{k0}\, dx\,dy\,dz, \tag{23.68}$$

where $T^{k0}$ are the components of the momentum density. In special relativity the law of angular momentum conservation (e.g., MTW Sec. 5.11) guarantees that the Lorentz-frame components $S^\alpha$ of this spin angular momentum remain constant, so long as no external torques act on the particle. This conservation law can be written in special relativistic, frame-independent notation as Eq. (22.91), specialized to a non-accelerated particle:

$$\nabla_{\vec u}\vec S = 0; \tag{23.69}$$

i.e., the spin vector $\vec S$ is parallel transported along the world line of the particle (which has 4-velocity $\vec u$). If this were a local law of physics, it would take this same form, unchanged, in general relativity, i.e., in curved spacetime. Whether the law is local or not depends, clearly, on the size of the particle. If the particle is vanishingly small in its own rest frame, then the law is local and (23.69) will be valid in general relativity. However, if the particle has finite size, the law (23.69) is in danger of failing, and indeed it does fail if the particle's finite size is accompanied by a finite quadrupole moment. In that case, the coupling of the quadrupole moment $I_{\alpha\beta}$ to the curvature of spacetime $R^\alpha{}_{\beta\gamma\delta}$ produces a torque on the "particle", causing a breakdown in (23.69):

$$S^\alpha{}_{;\mu}u^\mu = \epsilon^{\alpha\beta\gamma\delta} I_{\beta\mu} R^\mu{}_{\nu\gamma\zeta}\, u_\delta u^\nu u^\zeta. \tag{23.70}$$

The earth is a good example: the Riemann tensor $R^\alpha{}_{\beta\gamma\delta}$ produced at earth by the moon and sun couples to the earth's centrifugal-flattening-induced quadrupole moment $I_{\mu\nu}$; and the resulting torque (23.70) causes the earth's spin axis to precess relative to the distant stars, with a precession period of 26,000 years, which is sufficiently fast to show up clearly in historical records as well as in modern astronomical measurements. For details see, e.g., Ex. 16.4 of MTW. This example illustrates the fact that, if a small amount of nonlocality is present in a physical law, then when lifted from special relativity into general relativity, the law will acquire a small curvature-coupling modification.

What is the minimum amount of nonlocality that can produce curvature-coupling modifications in physical laws? As a rough rule of thumb, the minimum amount is double gradients: Because the connection coefficients vanish at the origin of a local Lorentz frame, the local Lorentz components of a single gradient are the same as the components in a global Lorentz frame, e.g., $A^\alpha{}_{;\beta} = \partial A^\alpha/\partial x^\beta$. However, because spacetime curvature prevents the spatial derivatives of the connection coefficients from vanishing at the origin of a local Lorentz frame, any law that involves double gradients is in danger of acquiring curvature-coupling corrections when lifted into general relativity. As an example, it turns out that the wave equation for the electromagnetic vector 4-potential, which in Lorenz gauge takes the form $A^{\alpha;\mu}{}_{\mu} = 0$ in flat spacetime, becomes in curved spacetime

$$A^{\alpha;\mu}{}_{\mu} = R^\alpha{}_\mu A^\mu, \tag{23.71}$$

where $R^\alpha{}_\mu$ is the Ricci curvature tensor; see Ex. 23.11 below. [Note: in Eq. (23.71), and always, all indices that follow the semicolon represent differentiation slots; i.e., $A^{\alpha;\mu}{}_{\mu} \equiv A^{\alpha;\mu}{}_{;\mu}$.] The curvature-coupling ambiguities that occur when one lifts slightly nonlocal laws from special relativity into general relativity using the equivalence principle are very similar to "factor-ordering ambiguities" that occur when one lifts a Hamiltonian into quantum mechanics from classical mechanics using the correspondence principle. In the equivalence principle the curvature coupling can be regarded as due to the fact that double gradients, which commute in special relativity, do not commute in general relativity. In the correspondence principle the factor-ordering difficulties result from the fact that quantities that commute classically [e.g., position $x$ and momentum $p$] do not commute quantum mechanically [$\hat x\hat p \ne \hat p\hat x$], so when the products of such quantities appear in a classical Hamiltonian one does not know their correct order in the quantum Hamiltonian [does $xp$ become $\hat x\hat p$, or $\hat p\hat x$, or $\tfrac12(\hat x\hat p + \hat p\hat x)$?].

****************************

EXERCISES

Exercise 23.11 Example: Curvature coupling in electromagnetic wave equation

Since the Maxwell equations, written in terms of the classically measurable electromagnetic field tensor $F$ [Eqs. (22.74) or (1.66)], involve only single gradients, it is reasonable to expect them to be lifted into curved spacetime without curvature-coupling additions. Assume that this is true. It can be shown that: (i) if one writes the electromagnetic field tensor $F$ in terms of a 4-vector potential $\vec A$ as

$$F_{\alpha\beta} = A_{\beta;\alpha} - A_{\alpha;\beta}, \tag{23.72}$$

then half of the curved-spacetime Maxwell equations, $F_{\alpha\beta;\gamma} + F_{\beta\gamma;\alpha} + F_{\gamma\alpha;\beta} = 0$ [Eqs. (22.74)], are automatically satisfied; (ii) $F$ is unchanged by gauge transformations in which a gradient is added to the vector potential, $\vec A \to \vec A + \nabla\psi$; and (iii) by such a gauge transformation one can impose the Lorenz-gauge condition $\nabla\cdot\vec A = 0$ on the vector potential.

Show that, when the charge-current 4-vector vanishes, $\vec J = 0$, the other half of the Maxwell equations, $F^{\alpha\beta}{}_{;\beta} = 0$ [Eqs. (22.74)], become, in Lorenz gauge and in curved spacetime, the wave equation with curvature coupling, Eq. (23.71).

****************************
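The path dependence of parallel transport invoked in Sec. 23.7, and the statement that a vector is changed by parallel transport around a closed curve, can be seen explicitly on the sphere of Ex. 23.9. The sketch below assumes a unit sphere and a loop along a circle of latitude $\theta = \theta_0$ (both illustrative choices; the function names are hypothetical). It integrates the parallel-transport equation with the connection coefficients (23.59) and compares the rotation of the transported vector with the solid angle $2\pi(1 - \cos\theta_0)$ enclosed by the loop, i.e., the integral of the Gaussian curvature over the enclosed area (a standard Gauss–Bonnet result, quoted here rather than derived in this text).

```python
import math

def transport_around_latitude(theta0, steps=4000):
    """Parallel-transport the unit vector e_θ once around θ = θ0 on a unit sphere.

    Along the curve (θ = θ0, φ = λ) the law dA^i/dλ + Γ^i_jk A^j dx^k/dλ = 0
    reduces, with the connection coefficients (23.59), to
        dA^θ/dφ = sinθ0 cosθ0 A^φ,    dA^φ/dφ = −cotθ0 A^θ.
    Returns the final components (A^θ, A^φ), integrated with RK4.
    """
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(at, ap):
        return s * c * ap, -(c / s) * at

    h = 2.0 * math.pi / steps
    at, ap = 1.0, 0.0
    for _ in range(steps):
        k1 = rhs(at, ap)
        k2 = rhs(at + 0.5 * h * k1[0], ap + 0.5 * h * k1[1])
        k3 = rhs(at + 0.5 * h * k2[0], ap + 0.5 * h * k2[1])
        k4 = rhs(at + h * k3[0], ap + h * k3[1])
        at += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        ap += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return at, ap

def holonomy_angle(theta0):
    """Angle between the transported vector and the original e_θ, measured in the
    orthonormal frame (e_θ̂, e_φ̂), in which the components are (A^θ, sinθ0 A^φ)."""
    at, ap = transport_around_latitude(theta0)
    return math.atan2(math.sin(theta0) * ap, at)

def wrap(angle):
    """Reduce an angle to the interval [−π, π)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

for theta0 in (0.3, 1.0, 1.4):
    solid_angle = 2.0 * math.pi * (1.0 - math.cos(theta0))  # area enclosed by the loop (K = 1)
    print(theta0, holonomy_angle(theta0), wrap(solid_angle))
```

The vector returns to its starting point rotated by an angle that depends on which loop was followed, which is exactly why adding up vectors living in different tangent spaces is ill defined in curved spacetime.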

23.8

The Einstein Field Equation⁵

One crucial issue remains to be studied in this overview of the foundations of general relativity: What is the physical law that determines the curvature of spacetime? Einstein's search for that law, his Einstein field equation, occupied a large fraction of his efforts during the years 1913, 1914, and 1915. Several times he thought he had found it, but each time his proposed law turned out to be fatally flawed; for some flavor of his struggle see the excerpts from his writings in Sec. 17.7 of MTW.

In this section we shall briefly examine one segment of Einstein's route toward his field equation: the segment motivated by contact with Newtonian gravity. The Newtonian potential $\Phi$ is a close analog of the general relativistic spacetime metric $g$: From $\Phi$ we can deduce everything about Newtonian gravity, and from $g$ we can deduce everything about spacetime curvature. In particular, by differentiating $\Phi$ twice we can obtain the Newtonian tidal field $\mathcal E$ [Eq. (23.33)], and by differentiating the components of $g$ twice we can obtain the components of the relativistic generalization of $\mathcal E$: the components of the Riemann curvature tensor $R^\alpha{}_{\beta\gamma\delta}$ [Eq. (23.51) in a local Lorentz frame; Eq. (23.57) in an arbitrary basis]. In Newtonian gravity $\Phi$ is determined by Newton's field equation

$$\nabla^2\Phi = 4\pi G\rho, \tag{23.73}$$

which can be rewritten in terms of the tidal field $\mathcal E_{jk} = \partial^2\Phi/\partial x^j\partial x^k$ as

$$\mathcal E^j{}_j = 4\pi G\rho. \tag{23.74}$$

⁵ See MTW Chap. 17.

Note that this equates a piece of the tidal field, its trace, to the density of mass. By analogy we can expect the Einstein field equation to equate a piece of the Riemann curvature tensor (the analog of the Newtonian tidal field) to some tensor analog of the Newtonian mass density. Further guidance comes from the demand that in nearly Newtonian situations, e.g., in the solar system, the Einstein field equation should reduce to Newton's field equation. To exploit that guidance, we can (i) write the Newtonian tidal field for nearly Newtonian situations in terms of general relativity's Riemann tensor, $\mathcal E_{jk} = R_{j0k0}$ [Eq. (23.43); valid in a local Lorentz frame], (ii) then take the trace and note that by its symmetries $R^0{}_{000} = 0$, so that $\mathcal E^j{}_j = R^\alpha{}_{0\alpha0} = R_{00}$, and (iii) thereby infer that the Newtonian limit of the Einstein equation should read, in a local Lorentz frame,

$$R_{00} = 4\pi G\rho. \tag{23.75}$$

Here $R_{00}$ is the time-time component of the Ricci curvature tensor, which can be regarded as a piece of the Riemann tensor. An attractive proposal for the Einstein field equation should now be obvious: Since the equation should be geometric and frame-independent, and since it must have the Newtonian limit (23.75), it presumably should say $R_{\alpha\beta} = 4\pi G\,\times$ (a second-rank symmetric tensor that generalizes the Newtonian mass density $\rho$). The obvious required generalization of $\rho$ is the stress-energy tensor $T_{\alpha\beta}$, so

$$R_{\alpha\beta} = 4\pi G\, T_{\alpha\beta}. \tag{23.76}$$

Einstein flirted extensively with this proposal for the field equation during 1913–1915. However, it, like several others he studied, was fatally flawed. When expressed in a coordinate system in terms of derivatives of the metric components $g_{\mu\nu}$, it becomes (because $R_{\alpha\beta}$ and $T_{\alpha\beta}$ both have ten independent components) ten independent differential equations for the ten $g_{\mu\nu}$. This is too many equations: By an arbitrary change of coordinates, $x^\alpha_{\rm new} = F^\alpha(x^0_{\rm old}, x^1_{\rm old}, x^2_{\rm old}, x^3_{\rm old})$, involving four arbitrary functions $F^0, F^1, F^2, F^3$, one should be able to impose on the metric components four arbitrary conditions, analogous to gauge conditions in electromagnetism (for example, one should be able to set $g_{00} = -1$ and $g_{0j} = 0$ everywhere); and correspondingly, the field equations should constrain only six, not ten, of the components of the metric (the six $g_{ij}$ in our example).

In November 1915 Einstein (1915), and independently Hilbert (1915) [who was familiar with Einstein's struggle as a result of private conversations and correspondence], discovered the resolution of this dilemma: Because the local law of 4-momentum conservation guarantees $T^{\alpha\beta}{}_{;\beta} = 0$ independent of the field equation, if we replace the Ricci tensor in (23.76) by a constant (to be determined) times some new curvature tensor $G^{\alpha\beta}$ that is also automatically divergence free independent of the field equation ($G^{\alpha\beta}{}_{;\beta} \equiv 0$), then the new field equation $G^{\alpha\beta} = \kappa T^{\alpha\beta}$ (with $\kappa$ = constant) will not constrain all ten components of the metric. Rather, in a coordinate system the four equations $[G^{\alpha\beta} - \kappa T^{\alpha\beta}]_{;\beta} = 0$ with $\alpha = 0, 1, 2, 3$ will automatically be satisfied; they will not constrain the metric components in any way, and there will remain in the field equation only six independent constraints on the metric components, precisely the desired number.

It turns out, in fact, that from the Ricci tensor and the scalar curvature one can construct a curvature tensor $G^{\alpha\beta}$ with the desired property:

$$G^{\alpha\beta} \equiv R^{\alpha\beta} - \tfrac12 R\, g^{\alpha\beta}. \tag{23.77}$$

Today we call this the Einstein curvature tensor. That it has vanishing divergence, independently of how one chooses the metric,

$$\nabla\cdot G \equiv 0, \tag{23.78}$$

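is called the contracted Bianchi identity, since it can be obtained by contracting the following Bianchi identity on the tensor $\epsilon_\alpha{}^{\beta\mu\nu}\epsilon_\nu{}^{\gamma\delta\epsilon}$ (Sec. 13.5 of MTW):

$$R^\alpha{}_{\beta\gamma\delta;\epsilon} + R^\alpha{}_{\beta\delta\epsilon;\gamma} + R^\alpha{}_{\beta\epsilon\gamma;\delta} = 0. \tag{23.79}$$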

[This Bianchi identity holds true for the Riemann curvature tensor of any and every "manifold", i.e., of any and every smooth space. It is derived most easily by introducing a local Lorentz frame, by showing from (23.57) that in such a frame the components $R_{\alpha\beta\gamma\delta}$ of Riemann have the form (23.51) plus corrections that are quadratic in the distance from the origin, by then computing the left side of (23.79), with index α down, at the origin of that frame and showing it is zero, and by then arguing that because the origin of the frame was an arbitrary event in spacetime, and because the left side of (23.79) is the component of a tensor, the left side viewed as a frame-independent geometric object must vanish at all events in the manifold. For an extensive discussion of the Bianchi identities (23.79) and (23.78) see, e.g., Chap. 15 of MTW.]

The Einstein field equation, then, should equate a multiple of $T^{\alpha\beta}$ to the Einstein tensor $G^{\alpha\beta}$:

$$G^{\alpha\beta} = \kappa T^{\alpha\beta}. \tag{23.80}$$

The proportionality factor $\kappa$ is determined from the Newtonian limit: By rewriting the field equation (23.80) in terms of the Ricci tensor,

$$R^{\alpha\beta} - \tfrac12 g^{\alpha\beta}R = \kappa T^{\alpha\beta}, \tag{23.81}$$

then taking the trace to obtain $R = -\kappa g_{\mu\nu}T^{\mu\nu}$, then inserting this back into (23.81), we obtain

$$R^{\alpha\beta} = \kappa\left(T^{\alpha\beta} - \tfrac12 g^{\alpha\beta} g_{\mu\nu}T^{\mu\nu}\right). \tag{23.82}$$

In nearly Newtonian situations and in a local Lorentz frame, the mass-energy density $T^{00} \cong \rho$ is far greater than the momentum density $T^{j0}$ and also far greater than the stress $T^{jk}$; and correspondingly, the time-time component of the field equation (23.82) becomes

$$R^{00} = \kappa\left(T^{00} - \tfrac12\eta^{00}\eta_{00}T^{00}\right) = \tfrac12\kappa T^{00} = \tfrac12\kappa\rho. \tag{23.83}$$

By comparing with the correct Newtonian limit (23.75) and noting that in a local Lorentz frame $R_{00} = R^{00}$, we see that

$$\kappa = 8\pi G. \tag{23.84}$$
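The algebra of Eqs. (23.81)–(23.84) can be checked numerically. The sketch below assumes a perfect fluid at rest in a local Lorentz frame, so every tensor involved is diagonal, and uses illustrative values of $\rho$ and $P$ in geometrized units. It trace-reverses $T^{\alpha\beta}$ into the Ricci tensor via (23.82), rebuilds the Einstein tensor from (23.77), and confirms both $G^{\alpha\beta} = \kappa T^{\alpha\beta}$ and the Newtonian limit $R^{00} = 4\pi G\rho$ when $\kappa = 8\pi G$. The function names are hypothetical.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of η in a local Lorentz frame; η^{αα} = η_{αα}

def ricci_from_stress(T, kappa):
    """Eq. (23.82) for diagonal T^{αβ}: R^{αα} = κ (T^{αα} − ½ η^{αα} η_{μν}T^{μν})."""
    trace = sum(ETA[a] * T[a] for a in range(4))
    return [kappa * (T[a] - 0.5 * ETA[a] * trace) for a in range(4)]

def einstein_from_ricci(R):
    """Eq. (23.77) for diagonal components: G^{αα} = R^{αα} − ½ η^{αα} R, with R = η_{αβ}R^{αβ}."""
    scalar = sum(ETA[a] * R[a] for a in range(4))
    return [R[a] - 0.5 * ETA[a] * scalar for a in range(4)]

G_NEWTON = 1.0                      # geometrized units (an assumption of this sketch)
kappa = 8.0 * math.pi * G_NEWTON
rho, P = 2.0, 0.3                   # illustrative perfect fluid at rest: T = diag(ρ, P, P, P)
R = ricci_from_stress([rho, P, P, P], kappa)
print(einstein_from_ricci(R))       # equals κ T^{αβ}, i.e. Eq. (23.80)
dust = ricci_from_stress([rho, 0.0, 0.0, 0.0], kappa)
print(dust[0], 4.0 * math.pi * G_NEWTON * rho)   # R^{00} = ½κρ = 4πGρ, Eq. (23.75)
```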

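| Quantity | Conventional Units | Geometrized Units |
|---|---|---|
| speed of light, $c$ | $2.998\times10^{8}\ \mathrm{m\,sec^{-1}}$ | one |
| Newton's gravitation constant, $G$ | $6.673\times10^{-11}\ \mathrm{m^3\,kg^{-1}\,sec^{-2}}$ | one |
| $G/c^2$ | $7.425\times10^{-28}\ \mathrm{m\,kg^{-1}}$ | one |
| $c^5/G$ | $3.629\times10^{52}\ \mathrm{W}$ | one |
| $c^2/\sqrt{G}$ | $3.479\times10^{24}\ \mathrm{gauss\,cm} = 1.160\times10^{24}\ \mathrm{volts}$ | one |
| Planck's reduced constant, $\hbar$ | $1.055\times10^{-34}\ \mathrm{kg\,m^2\,s^{-1}}$ | $(1.616\times10^{-35}\ \mathrm{m})^2$ |
| sun's mass, $M_\odot$ | $1.989\times10^{30}\ \mathrm{kg}$ | $1.477\ \mathrm{km}$ |
| sun's radius, $R_\odot$ | $6.960\times10^{8}\ \mathrm{m}$ | $6.960\times10^{8}\ \mathrm{m}$ |
| earth's mass, $M_\oplus$ | $5.977\times10^{24}\ \mathrm{kg}$ | $4.438\ \mathrm{mm}$ |
| earth's radius, $R_\oplus$ | $6.371\times10^{6}\ \mathrm{m}$ | $6.371\times10^{6}\ \mathrm{m}$ |
| Hubble constant, $H_o$ | $65\pm25\ \mathrm{km\,sec^{-1}\,Mpc^{-1}}$ | $[(12\pm5)\times10^{9}\ \mathrm{lt\,yr}]^{-1}$ |
| density to close universe, $\rho_{\rm crit}$ | $9^{+11}_{-5}\times10^{-27}\ \mathrm{kg\,m^{-3}}$ | $7^{+8}_{-3}\times10^{-54}\ \mathrm{m^{-2}}$ |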

Table 23.1: Some useful quantities in conventional and geometrized units. Note: 1 Mpc = $10^6$ parsecs (pc), 1 pc = 3.26 light years ("lt yr"), 1 lt yr = $0.946\times10^{16}$ m, 1 AU = $1.49\times10^{11}$ m. For other useful astronomical constants see C. W. Allen, Astrophysical Quantities.

By now the reader must be accustomed to our use of geometrized units in which the speed of light is unity. Just as that has simplified greatly the mathematical notation in Chapters 1, 22 and 23, so also future notation will be greatly simplified if we set Newton's gravitation constant to unity. This further geometrization of our units corresponds to equating mass units to length units via the relation

$$1 = \frac{G}{c^2} = 7.42\times10^{-28}\ \frac{\mathrm m}{\mathrm{kg}}; \quad \text{i.e.,}\quad 1\ \mathrm{kg} = 7.42\times10^{-28}\ \mathrm m. \tag{23.85}$$
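The geometrized entries of Table 23.1 follow from Eq. (23.85) and its analogs. A short sketch, using only the conventional-unit constants quoted in the table (the function names are illustrative):

```python
import math

# Conventional-unit constants as listed in Table 23.1
C = 2.998e8        # speed of light, m s^-1
G = 6.673e-11      # Newton's gravitation constant, m^3 kg^-1 s^-2
HBAR = 1.055e-34   # Planck's reduced constant, kg m^2 s^-1

def mass_to_length(mass_kg):
    """Geometrized length of a mass, using 1 = G/c^2 (Eq. 23.85); result in meters."""
    return mass_kg * G / C**2

def planck_length():
    """ħ becomes an area in geometrized units: ħ G / c^3 = (Planck length)^2; returns meters."""
    return math.sqrt(HBAR * G / C**3)

print(G / C**2)                  # ≈ 7.42e-28 m per kg
print(mass_to_length(1.989e30))  # sun: ≈ 1.477e3 m
print(mass_to_length(5.977e24))  # earth: ≈ 4.438e-3 m
print(C**5 / G)                  # ≈ 3.629e52 W, the power that equals one
print(planck_length())           # ≈ 1.616e-35 m
```

Converting back to conventional units is the inverse operation: insert whatever powers of $c$ and $G$ restore dimensional consistency.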

Any equation can readily be converted from conventional units to geometrized units by removing all factors of $c$ and $G$; and it can readily be converted back by inserting whatever factors of $c$ and $G$ one needs in order to make both sides of the equation dimensionally correct. Table 23.1 lists a few important numerical quantities in both conventional units and geometrized units. (SI units are badly suited to dealing with relativistic electrodynamics; for this reason J. D. Jackson has insisted on switching from SI to Gaussian units in the last 1/3 of the 1999 edition of his classic textbook, and we do the same in the relativity portions of this book and in Table 23.1.)

In geometrized units the Einstein field equation (23.80), with $\kappa = 8\pi G = 8\pi$ [Eq. (23.84)], assumes the following standard form, to which we shall appeal extensively in coming chapters:

$$G^{\mu\nu} = 8\pi T^{\mu\nu}; \quad \text{i.e.,}\quad G = 8\pi T. \tag{23.86}$$

23.9

Weak Gravitational Fields

The foundations of general relativity are all now in our hands. In this concluding section of the chapter, we shall explore their predictions for the properties of weak gravitational fields, beginning with the Newtonian limit of general relativity and then moving on to more general situations.

23.9.1

Newtonian Limit of General Relativity

A general relativistic gravitational field (spacetime curvature) is said to be weak if there exist "nearly globally Lorentz" coordinate systems in which the metric coefficients differ only slightly from their flat-spacetime (Minkowski) values:

$$g_{\alpha\beta} = \eta_{\alpha\beta} + h_{\alpha\beta}, \qquad \text{with } |h_{\alpha\beta}| \ll 1. \tag{23.87}$$

The Newtonian limit requires that gravity be weak in this sense throughout the system being studied. It further requires a slow-motion constraint, which has three aspects: (i) The sources of the gravity must have slow enough motions that, with some specific choice of the nearly globally Lorentz coordinates,

$$|h_{\alpha\beta,t}| \ll |h_{\alpha\beta,j}|; \tag{23.88}$$

(ii) the sources' motions must be slow enough that in this frame the momentum density is very small compared to the energy density,

$$|T^{j0}| \ll T^{00} \equiv \rho; \tag{23.89}$$

and (iii) any particles on which the action of gravity is to be studied must move with low velocities; i.e., must have 4-velocities satisfying

$$|u^j| \ll u^0. \tag{23.90}$$

Finally, the Newtonian limit requires that the stresses in the gravitating bodies be very small compared to their mass densities,

$$|T^{jk}| \ll T^{00} \equiv \rho. \tag{23.91}$$

When conditions (23.87)–(23.91) are all satisfied, then at leading nontrivial order in the small dimensionless quantities $|h_{\alpha\beta}|$, $|h_{\alpha\beta,t}|/|h_{\alpha\beta,j}|$, $|T^{j0}|/T^{00}$, $|u^j|/u^0$, and $|T^{jk}|/T^{00}$ the laws of general relativity reduce to those of Newtonian theory. The details of this reduction are an exercise for the reader [Ex. 23.12]; here we give an outline: The low-velocity constraint $|u^j|/u^0 \ll 1$ on the 4-velocity of a particle, together with its normalization $u^\alpha u^\beta g_{\alpha\beta} = -1$ and the near flatness of the metric (23.87), implies that

$$u^0 \cong 1, \qquad u^j \cong v^j \equiv \frac{dx^j}{dt}. \tag{23.92}$$

Since $u^0 = dt/d\tau$, the first of these relations implies that in our nearly globally Lorentz coordinate system the coordinate time is very nearly equal to the proper time of our slow-speed particle. In this way, we recover the "universal time" of Newtonian theory. The universal, Euclidean space is that of our nearly Lorentz frame, with $h_{\mu\nu}$ completely ignored because of its smallness. This universal time and universal Euclidean space become the arena in which Newtonian physics is formulated.

Equation (23.92) for the components of a particle's 4-velocity, together with $|v^j| \ll 1$ and $|h_{\mu\nu}| \ll 1$, implies that the geodesic equation for a freely moving particle at leading nontrivial order is

$$\frac{dv^j}{dt} \cong \frac12 h_{00,j}, \qquad \text{where}\quad \frac{d}{dt} \equiv \frac{\partial}{\partial t} + \mathbf v\cdot\nabla. \tag{23.93}$$

(Because our spatial coordinates are Cartesian, we can put the spatial index $j$ up on one side of the equation and down on the other without creating any danger of error.) By comparing Eq. (23.93) with Newton's equation of motion for the particle, $dv^j/dt = -\Phi_{,j}$, we deduce that $h_{00}$ must be related to the Newtonian gravitational potential by

$$h_{00} = -2\Phi, \tag{23.94}$$

so the spacetime metric in our nearly globally Lorentz coordinate system must be

$$ds^2 = -(1 + 2\Phi)\,dt^2 + (\delta_{jk} + h_{jk})\,dx^j dx^k + 2h_{0j}\,dt\,dx^j. \tag{23.95}$$
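To see Eqs. (23.93)–(23.94) reproduce Newtonian orbits, one can integrate the weak-field geodesic equation for a particle around a point mass with $\Phi = -M/r$ in geometrized units and check that a circular orbit closes after the Kepler period $2\pi\sqrt{r^3/M}$. This is a minimal sketch: the velocity-Verlet integrator, the step count, and the parameter values are illustrative choices, not taken from the text.

```python
import math

def accel(x, y, M):
    """Right side of Eq. (23.93), dv^j/dt ≅ ½ ∂_j h_00, with h_00 = −2Φ = 2M/r (G = c = 1)."""
    r3 = (x * x + y * y) ** 1.5
    return -M * x / r3, -M * y / r3

def kepler_return_error(r0=10.0, M=1.0, steps=20000):
    """Start a circular orbit of radius r0, integrate for one Kepler period
    2π sqrt(r0^3/M) with velocity Verlet, and return the distance from the start point."""
    period = 2.0 * math.pi * math.sqrt(r0**3 / M)
    dt = period / steps
    x, y, vx, vy = r0, 0.0, 0.0, math.sqrt(M / r0)   # circular-orbit speed
    ax, ay = accel(x, y, M)
    for _ in range(steps):
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        x += dt * vx
        y += dt * vy
        ax, ay = accel(x, y, M)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
    return math.hypot(x - r0, y)

print(kepler_return_error())            # tiny compared with r0 = 10
print(kepler_return_error(5.0, 4.0))    # a different radius and mass
```

Here the coordinate time $t$ plays the role of Newton's universal time, as argued after Eq. (23.92).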

Because gravity is weak, only those parts of the Einstein tensor that are linear in $h_{\alpha\beta}$ are significant; quadratic and higher-order contributions can be ignored. Now, by the same mathematical steps as led us to Eq. (23.51) for the components of the Riemann tensor in a local Lorentz frame, one can show that the components of the linearized Riemann tensor in our nearly global Lorentz frame have that same form, i.e. (setting $g_{\alpha\beta} = \eta_{\alpha\beta} + h_{\alpha\beta}$),

$$R_{\alpha\beta\gamma\delta} = \frac12\left(h_{\alpha\delta,\beta\gamma} + h_{\beta\gamma,\alpha\delta} - h_{\alpha\gamma,\beta\delta} - h_{\beta\delta,\alpha\gamma}\right). \tag{23.96}$$

From this equation and the slow-motion constraint $|h_{\alpha\beta,t}| \ll |h_{\alpha\beta,j}|$, we infer that the space-time-space-time components of Riemann are

$$R_{j0k0} = -\frac12 h_{00,jk} = \Phi_{,jk} = \mathcal E_{jk}. \tag{23.97}$$

In the last step we have used Eq. (23.94). We have thereby recovered the relation between the Newtonian tidal field $\mathcal E_{jk} \equiv \Phi_{,jk}$ and the relativistic tidal field $R_{j0k0}$. That relation can now be used, via the train of arguments in the preceding section, to show that the Einstein field equation $G^{\mu\nu} = 8\pi T^{\mu\nu}$ reduces to the Newtonian field equation $\nabla^2\Phi = 4\pi T^{00} \equiv 4\pi\rho$. This analysis leaves the details of $h_{0j}$ and $h_{jk}$ unknown, because the Newtonian limit is insensitive to them.
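Equation (23.97) and the vacuum form of (23.74) can be checked for a point mass. Outside the source, $\mathcal E_{jk} = -\tfrac12 h_{00,jk}$ should equal $(M/r^3)(\delta_{jk} - 3x_jx_k/r^2)$ and should be traceless. The sketch below works in geometrized units with $M = 1$ and an arbitrary field point (both illustrative choices), and uses finite differences in place of analytic derivatives.

```python
import math

def h00_point_mass(p, M=1.0):
    """h_00 = −2Φ with Φ = −M/r (Eq. 23.94, geometrized units)."""
    return 2.0 * M / math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)

def tidal_field(h00, p, eps=1e-4):
    """E_jk = R_j0k0 = −½ h_00,jk (Eq. 23.97), second derivatives by central differences."""
    def f(dj, dk, j, k):
        q = list(p)
        q[j] += dj * eps
        q[k] += dk * eps
        return h00(q)
    return [[-0.5 * (f(1, 1, j, k) - f(1, -1, j, k) - f(-1, 1, j, k) + f(-1, -1, j, k))
             / (4.0 * eps * eps) for k in range(3)] for j in range(3)]

p = (1.0, 0.5, -0.7)
E = tidal_field(h00_point_mass, p)
r = math.sqrt(sum(c * c for c in p))
exact = [[((j == k) - 3.0 * p[j] * p[k] / r**2) / r**3 for k in range(3)] for j in range(3)]
print(sum(E[j][j] for j in range(3)))   # trace ≈ 0: E^j_j = 4πρ vanishes outside the mass
print(max(abs(E[j][k] - exact[j][k]) for j in range(3) for k in range(3)))
```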

23.9.2

Linearized Theory

There are many systems in the universe that have weak gravity, but for which the slow-motion approximations (23.88)–(23.90) and/or the weak-stress approximation (23.91) fail. Examples are electromagnetic fields and high-speed particles. For such systems we need a generalization of Newtonian theory that drops the slow-motion and weak-stress constraints, but keeps the weak-gravity constraint

$$g_{\alpha\beta} = \eta_{\alpha\beta} + h_{\alpha\beta}, \qquad \text{with } |h_{\alpha\beta}| \ll 1. \tag{23.98}$$

The obvious generalization is a linearization of general relativity in $h_{\alpha\beta}$, with no other approximations being made: the so-called linearized theory of gravity. In this subsection we shall develop it.

In formulating linearized theory we can regard the metric perturbation $h_{\mu\nu}$ as a gravitational field that lives in flat spacetime, and correspondingly we can carry out our mathematics as though we were in special relativity. In other words, linearized theory can be regarded as a field theory of gravity in flat spacetime, the type of theory that Einstein toyed with and then rejected (Sec. 23.1 above). In linearized theory, the Riemann tensor takes the form (23.96), but we have no right to simplify it further into the form (23.97), so we must follow a different route to the Einstein field equation: Contracting the first and third indices in (23.96), we obtain the linearized Ricci tensor $R_{\mu\nu}$; contracting once again we obtain the scalar curvature $R$; and then from Eq. (23.77) we obtain for the Einstein tensor and the Einstein field equation

$$2G_{\mu\nu} = h_{\mu\alpha,\nu}{}^{\alpha} + h_{\nu\alpha,\mu}{}^{\alpha} - h_{\mu\nu,\alpha}{}^{\alpha} - h_{,\mu\nu} - \eta_{\mu\nu}\left(h_{\alpha\beta}{}^{,\alpha\beta} - h_{,\beta}{}^{\beta}\right) = 16\pi T_{\mu\nu}. \tag{23.99}$$

Here all indices that follow the comma are partial-derivative indices, and

$$h \equiv \eta^{\alpha\beta}h_{\alpha\beta} \tag{23.100}$$

is the "trace" of the metric perturbation. We can simplify the field equation (23.99) by reexpressing it in terms of the quantity

$$\bar h_{\mu\nu} \equiv h_{\mu\nu} - \tfrac12 h\,\eta_{\mu\nu}. \tag{23.101}$$

One can easily check that this quantity has the opposite trace to that of $h_{\mu\nu}$ ($\bar h \equiv \bar h_{\alpha\beta}\eta^{\alpha\beta} = -h$), so it is called the trace-reversed metric perturbation. In terms of it, the field equation (23.99) becomes

$$-\bar h_{\mu\nu,\alpha}{}^{\alpha} - \eta_{\mu\nu}\bar h_{\alpha\beta}{}^{,\alpha\beta} + \bar h_{\mu\alpha,\nu}{}^{\alpha} + \bar h_{\nu\alpha,\mu}{}^{\alpha} = 16\pi T_{\mu\nu}. \tag{23.102}$$

We can simplify this field equation further by specializing our coordinates. We introduce a new nearly globally Lorentz coordinate system that is related to the old one by

$$x^\alpha_{\rm new}(\mathcal P) = x^\alpha_{\rm old}(\mathcal P) + \xi^\alpha(\mathcal P), \tag{23.103}$$

where $\xi^\alpha$ is a very small vectorial displacement of the coordinate grid. This change of coordinates via four arbitrary functions ($\alpha = 0, 1, 2, 3$) produces a change of the functional form of the metric perturbation $h_{\alpha\beta}$ to

$$h^{\rm new}_{\mu\nu} = h^{\rm old}_{\mu\nu} - \xi_{\mu,\nu} - \xi_{\nu,\mu}, \tag{23.104}$$

[Ex. 23.13] and a corresponding change of the trace-reversed metric perturbation. This is linearized theory's analog of a gauge transformation in electromagnetic theory. Just as an electromagnetic gauge change alters the vector potential, $A^{\rm new}_\mu = A^{\rm old}_\mu - \psi_{,\mu}$, so the linearized-theory gauge change alters $h_{\mu\nu}$ and $\bar h_{\mu\nu}$; and just as the force-producing electromagnetic field tensor $F_{\mu\nu}$ is unaffected by an electromagnetic gauge change, so the tidal-force-producing linearized Riemann tensor is left unaffected by the gravitational gauge change.

By a special choice of the four functions $\xi^\alpha$, we can impose the following four gauge conditions on $\bar h_{\mu\nu}$:

$$\bar h_{\mu\nu}{}^{,\nu} = 0. \tag{23.105}$$

These, obviously, are linearized theory's analog of the electromagnetic Lorenz gauge condition $A^\mu{}_{,\mu} = 0$, so they are called the gravitational Lorenz gauge. Just as the flat-spacetime Maxwell equations take the remarkably simple wave-equation form $-A_{\mu,\alpha}{}^{\alpha} = 4\pi J_\mu$ in Lorenz gauge, so also the linearized Einstein equation (23.102) takes the corresponding simple wave-equation form in gravitational Lorenz gauge:

$$-\bar h_{\mu\nu,\alpha}{}^{\alpha} = 16\pi T_{\mu\nu}. \tag{23.106}$$
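The claim that the linearized Riemann tensor (23.96) is unaffected by the gauge change (23.104) can be tested numerically: build an arbitrary small perturbation $h_{\mu\nu}$ and gauge vector $\xi_\mu$, apply (23.104), and compare Riemann before and after. This is a sketch under stated assumptions: the sample fields are arbitrary illustrative functions, coordinates are ordered $(t, x, y, z)$, and all derivatives are central finite differences, so the two tensors agree only to finite-difference accuracy rather than exactly. All function names are hypothetical.

```python
import math

def second_derivatives(field, x, eps=1e-3):
    """d2[b][c][m][n] ≈ ∂_b ∂_c field(x)[m][n] for a 4x4-matrix-valued field (central differences)."""
    def at(sb, sc, b, c):
        y = list(x)
        y[b] += sb * eps
        y[c] += sc * eps
        return field(y)
    d2 = [[None] * 4 for _ in range(4)]
    for b in range(4):
        for c in range(4):
            pp, pm, mp, mm = at(1, 1, b, c), at(1, -1, b, c), at(-1, 1, b, c), at(-1, -1, b, c)
            d2[b][c] = [[(pp[m][n] - pm[m][n] - mp[m][n] + mm[m][n]) / (4 * eps * eps)
                         for n in range(4)] for m in range(4)]
    return d2

def linearized_riemann(h, x):
    """R_αβγδ = ½(h_αδ,βγ + h_βγ,αδ − h_αγ,βδ − h_βδ,αγ), Eq. (23.96); result[α][β][γ][δ]."""
    d = second_derivatives(h, x)
    r = range(4)
    return [[[[0.5 * (d[b][g][a][e] + d[a][e][b][g] - d[b][e][a][g] - d[a][g][b][e])
               for e in r] for g in r] for b in r] for a in r]

def gauge_transform(h, xi, eps=1e-4):
    """Return the field h'_μν = h_μν − ξ_μ,ν − ξ_ν,μ of Eq. (23.104)."""
    def h_new(x):
        dxi = [[0.0] * 4 for _ in range(4)]          # dxi[m][n] = ∂_n ξ_m
        for n in range(4):
            yp, ym = list(x), list(x)
            yp[n] += eps
            ym[n] -= eps
            xp, xm = xi(yp), xi(ym)
            for m in range(4):
                dxi[m][n] = (xp[m] - xm[m]) / (2 * eps)
        old = h(x)
        return [[old[m][n] - dxi[m][n] - dxi[n][m] for n in range(4)] for m in range(4)]
    return h_new

def sample_h(x):
    """An arbitrary smooth, small, symmetric perturbation (illustrative only)."""
    t, X, Y, Z = x
    h = [[0.0] * 4 for _ in range(4)]
    h[0][0] = 0.02 * math.cos(X + 0.5 * Y)
    h[1][2] = h[2][1] = 0.01 * math.sin(t - Z)
    h[3][3] = 0.01 * X * Y
    return h

def sample_xi(x):
    """An arbitrary smooth, small gauge vector ξ_μ (illustrative only)."""
    t, X, Y, Z = x
    return [0.01 * math.sin(X * Y), 0.01 * math.cos(t + Z), 0.01 * t * X * X, 0.01 * math.sin(Y - t)]

x0 = [0.3, 0.2, -0.4, 0.1]
R_old = linearized_riemann(sample_h, x0)
R_new = linearized_riemann(gauge_transform(sample_h, sample_xi), x0)
diff = max(abs(R_old[a][b][c][d] - R_new[a][b][c][d])
           for a in range(4) for b in range(4) for c in range(4) for d in range(4))
print(R_old[0][1][0][1], diff)   # R_0101 = 0.01 cos(X + Y/2) = 0.01 here; diff is finite-difference noise
```

The metric perturbation itself changes at the percent level under this gauge vector, while every Riemann component stays put, mirroring the invariance of $F_{\mu\nu}$ under electromagnetic gauge changes.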

By the same method as one uses in electromagnetic theory, one can solve this gravitational field equation for the field $\bar h_{\mu\nu}$ produced by an arbitrary stress-energy-tensor source:

$$\bar h_{\mu\nu}(t, \mathbf x) = \int \frac{4T_{\mu\nu}(t - |\mathbf x - \mathbf x'|, \mathbf x')}{|\mathbf x - \mathbf x'|}\, dV_{x'}. \tag{23.107}$$

The quantity in the numerator is the stress-energy source evaluated at the "retarded time" $t' = t - |\mathbf x - \mathbf x'|$. This equation for the field, and the wave equation (23.106) that underlies it, show explicitly that dynamically changing distributions of stress-energy must generate gravitational waves, which propagate outward from their source at the speed of light (Einstein, 1918). We shall study these gravitational waves in Chap. 25.
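For a time-independent source the retarded time in Eq. (23.107) drops out, and the integral can be evaluated directly. As a sketch, assuming a uniform-density sphere at rest (so $T_{00} = T^{00} = \rho$; the density profile, grid size, and function name are illustrative), a midpoint-rule quadrature reproduces the exterior field $\bar h_{00} = 4M/r$ that the next subsection derives for general stationary sources.

```python
import math

def hbar00_uniform_sphere(r, M=1.0, radius=1.0, n=200):
    """Static case of Eq. (23.107): h̄_00(r) = ∫ 4 T_00 / |x − x'| dV' for a uniform-density
    sphere (T_00 = ρ), by the midpoint rule in r' and μ = cos θ' (the φ' integral gives 2π)."""
    rho = M / (4.0 / 3.0 * math.pi * radius**3)
    dr, dmu = radius / n, 2.0 / n
    total = 0.0
    for i in range(n):
        rp = (i + 0.5) * dr
        for j in range(n):
            mu = -1.0 + (j + 0.5) * dmu
            dist = math.sqrt(r * r + rp * rp - 2.0 * r * rp * mu)
            total += 2.0 * math.pi * rp * rp * dr * dmu / dist
    return 4.0 * rho * total

for r in (1.5, 2.0, 5.0):
    print(r, hbar00_uniform_sphere(r), 4.0 / r)   # exterior field compared with 4M/r
```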

23.9.3

Gravitational Field Outside a Stationary, Linearized Source

Let us specialize to a time-independent source (so $T_{\mu\nu,t} = 0$ in our chosen nearly globally Lorentz frame), and compute its external gravitational field as a power series in 1/(distance to source). We place our origin of coordinates at the source's center of mass, so

$$\int x^j T^{00}\, dV_x = 0, \tag{23.108}$$

and in the same manner as in electromagnetic theory, we expand

$$\frac{1}{|\mathbf x - \mathbf x'|} = \frac1r + \frac{x^j x'^j}{r^3} + \dots, \tag{23.109}$$

where $r \equiv |\mathbf x|$ is the distance of the field point from the source's center of mass. Inserting Eq. (23.109) into the general solution (23.107) of the Einstein equation and taking note of the conservation laws $T^{\alpha j}{}_{,j} = 0$, we obtain for the source's external field

$$\bar h_{00} = \frac{4M}{r} + O\!\left(\frac{1}{r^3}\right), \qquad \bar h_{0j} = -\frac{2\epsilon_{jkm}S^k x^m}{r^3} + O\!\left(\frac{1}{r^3}\right), \qquad \bar h_{ij} = O\!\left(\frac{1}{r^3}\right). \tag{23.110}$$

Here $M$ and $S^k$ are the source's mass and angular momentum:

$$M \equiv \int T^{00}\, dV_x, \qquad S_k \equiv \int \epsilon_{kab}\, x^a T^{0b}\, dV_x; \tag{23.111}$$

see Ex. 23.14. This expansion in $1/r$, as in the electromagnetic case, is a multipolar expansion. At order $1/r$ the field is spherically symmetric and the monopole moment is the source's mass $M$. At order $1/r^2$ there is a "magnetic-type dipole moment", the source's spin angular momentum $S_k$. These are the leading-order moments in two infinite sets: the "mass multipole" moments (analog of electric moments), and the "mass-current multipole" moments (analog of magnetic moments). For details on all the higher order moments, see, e.g., Thorne (1980).

The metric perturbation can be computed by reversing the trace reversal, $h_{\alpha\beta} = \bar h_{\alpha\beta} - \tfrac12\eta_{\alpha\beta}\bar h$. Thereby we obtain for the spacetime metric $g_{\alpha\beta} = \eta_{\alpha\beta} + h_{\alpha\beta}$ at linear order, outside the source,

$$ds^2 = -\left(1 - \frac{2M}{r}\right)dt^2 - \frac{4\epsilon_{jkm}S^k x^m}{r^3}\,dt\,dx^j + \left(1 + \frac{2M}{r}\right)\delta_{jk}\,dx^j dx^k + O\!\left(\frac{1}{r^3}\right)dx^\alpha dx^\beta. \tag{23.112}$$

In spherical polar coordinates, with the polar axis along the direction of the source's angular momentum, the leading order terms take the form

$$ds^2 = -\left(1 - \frac{2M}{r}\right)dt^2 - \frac{4S}{r}\sin^2\theta\,dt\,d\phi + \left(1 + \frac{2M}{r}\right)\left(dr^2 + r^2 d\theta^2 + r^2\sin^2\theta\,d\phi^2\right), \tag{23.113}$$

where $S \equiv |\mathbf S|$ is the magnitude of the source's angular momentum.

This is a very important result. It tells us that we can "read off" the mass $M$ and angular momentum $S^k$ from the asymptotic form of the source's metric. In the next chapter we shall devise, from the metric (23.113), physical measurements that one can make outside the source to determine its mass and angular momentum. As one would expect from Newtonian theory, the mass $M$ will show up as the source of a "gravitational acceleration" that can be measured via Kepler's laws for an orbiting particle. It will turn out that the angular-momentum term in the metric shows up physically via a dragging of inertial frames that causes inertial-guidance gyroscopes near the body to precess relative to the "distant stars". For a time-independent body with strong internal gravity (e.g., a black hole), the distant gravitational field will have the same general form (23.112), (23.113) as for a weakly gravitating body, but the constants $M$ and $S^k$ that appear in the metric will not be expressible as the integrals (23.111) over the body's interior. Nevertheless, they will be measurable by the same techniques as for a weakly gravitating body (Kepler's laws and frame dragging), and they can be interpreted as the body's total mass and angular momentum.
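Reversing the trace reversal and changing to spherical coordinates are the two steps that take (23.110) to (23.112) and (23.113). A minimal numerical sketch, assuming the spin lies along the $z$ axis and using illustrative values of $M$, $S$ and the field point (the function names are hypothetical):

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)

def levi_civita(i, j, k):
    """ε_ijk for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def hbar_asymptotic(pos, M, S):
    """Leading terms of Eq. (23.110) at pos = (x, y, z), with the spin S along the z axis."""
    r = math.sqrt(sum(c * c for c in pos))
    spin = (0.0, 0.0, S)
    hb = [[0.0] * 4 for _ in range(4)]
    hb[0][0] = 4.0 * M / r
    for j in range(3):
        hb[0][j + 1] = hb[j + 1][0] = -2.0 * sum(
            levi_civita(j, k, m) * spin[k] * pos[m] for k in range(3) for m in range(3)) / r**3
    return hb

def reverse_trace(hb):
    """h_αβ = h̄_αβ − ½ η_αβ h̄, with h̄ = η^αβ h̄_αβ (η diagonal)."""
    tr = sum(ETA[a] * hb[a][a] for a in range(4))
    return [[hb[a][b] - (0.5 * ETA[a] * tr if a == b else 0.0) for b in range(4)] for a in range(4)]

M, S = 1.0, 0.7
r, th, ph = 50.0, 1.1, 0.4
pos = (r * math.sin(th) * math.cos(ph), r * math.sin(th) * math.sin(ph), r * math.cos(th))
h = reverse_trace(hbar_asymptotic(pos, M, S))
# g_tφ = h_0j ∂x^j/∂φ, with ∂x/∂φ = −y, ∂y/∂φ = x, ∂z/∂φ = 0
g_tphi = -pos[1] * h[0][1] + pos[0] * h[0][2]
print(h[0][0], 2 * M / r)                         # g_00 = −(1 − 2M/r)
print(h[1][1], 2 * M / r)                         # g_jk = (1 + 2M/r) δ_jk
print(2 * g_tphi, -4 * S * math.sin(th)**2 / r)   # dt dφ coefficient in Eq. (23.113)
```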

23.9.4

Conservation Laws for Mass, Momentum and Angular Momentum

Consider a static (unmoving) sphere $\mathcal S$ surrounding our time-independent source of gravity, with such a large radius $r$ that the $O(1/r^3)$ corrections in $\bar h_{\mu\nu}$ and in the metric [Eqs. (23.110)–(23.113)] can be ignored. Suppose that a small amount of mass-energy $E$ (as measured in the sphere's and source's rest frame) is injected through the sphere, into the source. Then the special relativistic law of mass-energy conservation tells us that the source's mass $M = \int T^{00}dV_x$ will increase by $\Delta M = E$. Similarly, if an energy flux $T^{0j}$ flows through the sphere, the source's mass will change by

$$\frac{dM}{dt} = -\int_{\mathcal S} T^{0j}\, d\Sigma_j, \tag{23.114}$$

where dΣj is the sphere’s outward-pointing surface-area element, and the minus sign is due to the fact that dΣj points outward, not inward. Since M is the mass that appears in ¯ µν and metric gαβ , this conservation law can be the source’s asymptotic gravitational field h regarded as describing how the source’s gravitating mass changes when energy is injected into it. From the special relativistic law for angular momentum conservation, we deduce a similar result: A flux ǫijk xj T km of angular momentum through the sphere produces the following ¯ µν and change in the angular momentum Sk that appears in the source’s asymptotic field h metric: Z dSi = − ǫijk xj T km dΣm . (23.115) dt S There is also a conservation law for a gravitationally measured linear momentum. That linear momentum does not show up in the asymptotic field and metric that we wrote down above [Eqs. (23.111)–(23.113)] because our coordinates were chosen to be attached to the source’s center of mass—i.e., they are the Lorentz coordinates of the source’s rest frame. However, if linear momentum Pj is injected through our sphere S and becomes part of the source, then the source’s center of mass will start moving, and the asymptotic metric will acquire a new term δg0j = −4Pj /r , (23.116) where (after the injection) j

Pj = P =

Z

T 0j dVx

(23.117)

¯ 0j = −h ¯ 0j = −h0j = −δg0j ; also see Ex 23.14b]. More generally, the [see Eq. (23.107) with h rate of change of the source’s total linear momentum (the Pj term in the asymptotic g0j ) is the integral of the inward flux of momentum (inward component of the stress tensor) across the sphere: Z dPj (23.118) = − T jk dΣj . dt S
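The sign convention in Eq. (23.114) is easy to get backwards. The following minimal sketch, in Python with geometrized units (G = c = 1), evaluates $-\int_S T^{0j}\, d\Sigma_j$ numerically by the midpoint rule on a sphere of radius r. The two flux fields and all function names are hypothetical illustrations, not taken from the text: an isotropic inward energy flux carrying power L should give dM/dt = +L, while a uniform flux that merely streams through the sphere should give zero.

```python
import math


def mass_change_rate(flux_0j, radius, n_theta=200, n_phi=400):
    """Approximate dM/dt = -(surface integral of T^{0j} dSigma_j), Eq. (23.114).

    flux_0j(x, y, z) returns the Cartesian 3-vector T^{0j} at a point on S.
    dSigma_j = n_j r^2 sin(theta) dtheta dphi, with n_j the outward unit normal.
    The integral uses the midpoint rule in theta and phi.
    """
    dtheta = math.pi / n_theta
    dphi = 2.0 * math.pi / n_phi
    outward_flux = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * dtheta
        st, ct = math.sin(theta), math.cos(theta)
        for k in range(n_phi):
            phi = (k + 0.5) * dphi
            n = (st * math.cos(phi), st * math.sin(phi), ct)  # outward unit normal
            f = flux_0j(radius * n[0], radius * n[1], radius * n[2])
            f_dot_n = f[0] * n[0] + f[1] * n[1] + f[2] * n[2]
            outward_flux += f_dot_n * radius**2 * st * dtheta * dphi
    return -outward_flux  # minus sign: dSigma_j points outward, not inward


def inward_energy_flux(power):
    """Hypothetical isotropic inflow: T^{0j} = -power / (4 pi r^2) * n^j."""
    def flux(x, y, z):
        r = math.sqrt(x * x + y * y + z * z)
        scale = -power / (4.0 * math.pi * r * r)
        return (scale * x / r, scale * y / r, scale * z / r)
    return flux


def uniform_flux(x, y, z):
    """Hypothetical energy stream along x that passes straight through S."""
    return (1.0, 0.0, 0.0)


print(mass_change_rate(inward_energy_flux(3.0), radius=50.0))  # close to +3.0
print(mass_change_rate(uniform_flux, radius=10.0))             # close to 0.0
```

The radius drops out for the isotropic inflow because the flux falls off as 1/r²; in the setting of the text the sphere must, in addition, lie far enough out that linearized theory holds on it.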

For a time-independent source with strong internal gravity, not only does the asymptotic metric, far from the source, have the same form (23.112), (23.113), (23.116) as for a weakly gravitating source; the conservation laws (23.114), (23.115), (23.118) for its gravitationally measured mass, angular momentum and linear momentum also continue to hold true. The sphere S, of course, must be placed far from the source, in a region where gravity is very weak, so linearized theory will be valid in the vicinity of S. When this is done, the special relativistic description of inflowing mass-energy, linear momentum and angular momentum is valid at S, and the linearized Einstein equations, applied in the vicinity of S (and not extended into the strong-gravity region), turn out to guarantee that the M, $S_j$ and $P_j$ appearing in the asymptotic metric evolve in accord with the conservation laws (23.114), (23.115), (23.118).

For strongly gravitating sources, these conservation laws owe their existence to the spacetime’s asymptotic time-translation, rotation, and space-translation symmetries. In generic, strong-gravity regions of spacetime there are no such symmetries, and correspondingly no integral conservation laws for energy, angular momentum, or linear momentum.

If a strongly gravitating source is dynamical rather than static, it will emit gravitational waves (Chap. 25). The amplitudes of those waves, like the influence of the source’s mass, die out as 1/r far from the source, so spacetime retains its asymptotic time-translation, rotation and space-translation symmetries. These symmetries continue to enforce integral conservation laws on the gravitationally measured mass, angular momentum and linear momentum [Eqs. (23.114), (23.115), (23.118)], but with the new requirement that one include, in the fluxes through S, contributions from the gravitational waves’ energy, angular momentum and linear momentum; see Chap. 25.
For a more detailed and rigorous derivation and discussion of these asymptotic conservation laws, see Chaps. 18 and 19 of MTW.

****************************
EXERCISES

Exercise 23.12 Derivation: Newtonian limit of general relativity
Consider a system that can be covered by a nearly globally Lorentz coordinate system in which the Newtonian-limit constraints (23.87)–(23.91) are satisfied. For such a system, flesh out the details of the text’s derivation of the Newtonian limit. More specifically:
(a) Derive Eq. (23.92) for the components of the 4-velocity of a particle.
(b) Show that the geodesic equation reduces to Eq. (23.93).
(c) Show that to linear order in the metric perturbation $h_{\alpha\beta}$ the components of the Riemann tensor take the form (23.96).
(d) Show that in the slow-motion limit the space-time-space-time components of Riemann take the form (23.97).

Exercise 23.13 Derivation: Gauge Transformations in Linearized Theory

(a) Show that the “infinitesimal” coordinate transformation (23.103) produces the change (23.104) of the linearized metric perturbation.
(b) Exhibit a differential equation for the $\xi^\alpha$ that brings the metric perturbation into gravitational Lorenz gauge, i.e. that makes $h^{\rm new}_{\mu\nu}$ obey the Lorenz gauge condition (23.105).
(c) Show that in gravitational Lorenz gauge, the Einstein field equation (23.102) reduces to (23.106).

Exercise 23.14 Derivation: External Field of Stationary, Linearized Source
Derive Eqs. (23.110) for the trace-reversed metric perturbation outside a stationary (time-independent), linearized source of gravity. More specifically:
(a) First derive $\bar h^{00}$. In your derivation identify a dipolar term of the form $4D_j x^j/r^3$, and show that by placing the origin of coordinates on the center of mass, Eq. (23.108), one causes the dipole moment $D_j$ to vanish.
(b) Next derive $\bar h^{0j}$. The two terms in (23.109) should give rise to two terms. The first of these is $4P_j/r$, where $P_j$ is the source’s linear momentum. Show, using the gauge condition $\bar h^{0\mu}{}_{,\mu} = 0$ [Eq. (23.105)], that if the momentum is nonzero, then the mass dipole term of part (a) must have a nonzero time derivative, which violates our assumption of stationarity. Therefore, for this source the linear momentum must vanish. Show that the second term gives rise to the $\bar h^{0j}$ of Eq. (23.110). [Hint: you will have to add a perfect divergence, $(T^{0a'} x^{j'} x^{m'})_{,a'}$, to the integrand.]
(c) Finally derive $\bar h^{ij}$. [Hint: Show that $T^{ij} = (T^{ia} x^j)_{,a}$ and thence that the volume integral of $T^{ij}$ vanishes; and similarly for $T^{ij} x^k$.]

****************************

Bibliographic Note
For a superb, detailed historical account of Einstein’s intellectual struggle to formulate the laws of general relativity, see Pais (1982). For Einstein’s papers of that era, in the original German and in English translation, with detailed annotations and explanations by editors with strong backgrounds in both physics and history of science, see Einstein (1989–2002). For some key papers of that era by other major contributors besides Einstein, in English translation, see Einstein, Lorentz, Minkowski and Weyl (1923).

This chapter’s pedagogical approach to presenting the fundamental concepts of general relativity is strongly influenced by MTW (Misner, Thorne and Wheeler 1973), where readers will find much greater detail. See, especially, Chap. 8 for the mathematics (differential geometry) of curved spacetime, or Chaps. 9–14 for far greater detail; Chap. 16 for the Einstein equivalence principle and how to lift laws of physics into curved spacetime; Chap. 17 for the Einstein field equations and many different ways to derive them; Chap. 18 for weak gravitational fields (the Newtonian limit and Linearized Theory); and Chaps. 19 and 20 for the metric outside a stationary, linearized source and for the source’s conservation laws for mass, momentum, and angular momentum.

For a superb, elementary introduction to the fundamental concepts of general relativity from a viewpoint that is somewhat less mathematical than this chapter or MTW, see Hartle (2003). We also recommend, at a somewhat elementary level, Schutz (1985), at a more advanced level, Carroll (2004), and at a very advanced and mathematical level, Wald (1984).

Box 23.2 Important Concepts in Chapter 23
• Local Lorentz frame, Sec. 23.2
  – Nonmeshing of local Lorentz frames due to spacetime curvature, Sec. 23.3
  – Metric and connection coefficients in local Lorentz frame, Eqs. (23.15) and (23.16)
• Principle of relativity, Sec. 23.2
• Motion of a freely falling particle: geodesic with $\nabla_{\vec p}\,\vec p = 0$, Sec. 23.4
  – Geodesic equation in any coordinate system, Eq. (23.26)
  – Conserved quantity associated with symmetry of the spacetime, Ex. 23.4
  – Action principle for geodesic, Ex. 23.5
• Tidal gravity and spacetime curvature
  – Newtonian tidal field $\mathcal E_{ij} = \partial^2\Phi/\partial x^i \partial x^j$, Sec. 23.5.1
  – Riemann curvature as tidal field; equation of geodesic deviation, Sec. 23.5.1
  – Connection of relativistic and Newtonian tidal fields, $R_{j0k0} = \mathcal E_{jk}$, Sec. 23.5.2
  – Tidal field outside the Earth or other spherical, gravitating body, Eq. (23.44)
• Properties of Riemann tensor (symmetries, Ricci tensor, curvature scalar, how to compute its components, radius of curvature of spacetime), Sec. 23.6
• Einstein’s equivalence principle: lifting the laws of physics into curved spacetime, Secs. 23.2 and 23.7
  – Curvature coupling effects, Sec. 23.7
  – Breakdown of global conservation laws for energy and momentum, Sec. 23.7
• The Einstein field equation, Sec. 23.8
  – Its connection to Newton’s field equation, Sec. 23.8
  – Einstein tensor and its vanishing divergence, Sec. 23.8
• Geometrized units, Eq. (23.85) and Table 23.1
• Newtonian limit of general relativity, Sec. 23.9.1
  – Conditions for validity: weak gravity, slow motion, small stresses, Sec. 23.9.1
• Linearized theory, Sec. 23.9.2
  – Gravitational Lorenz gauge, Eq. (23.105)
  – Wave equation for metric perturbation, with stress-energy tensor as source, Eqs. (23.106), (23.107)
• Metric outside a stationary, linearized source, Sec. 23.9.3
  – Roles of mass and angular momentum in metric, Sec. 23.9.3
  – Integral conservation laws for source’s mass, linear momentum and angular momentum, Sec. 23.9.4

Bibliography
Carroll, S. M., 2004. Spacetime and Geometry: An Introduction to General Relativity, San Francisco: Addison Wesley.
Einstein, Albert, 1907. “Über das Relativitätsprinzip und die aus demselben gezogenen Folgerungen,” Jahrbuch der Radioaktivität und Elektronik, 4, 411–462; English translation: paper 47 in The Collected Papers of Albert Einstein, Volume 2, Princeton University Press, Princeton, NJ.
Einstein, Albert, 1915. “Die Feldgleichungen der Gravitation,” Preuss. Akad. Wiss. Berlin, Sitzungsber., 1915 volume, 844–847.
Einstein, Albert, 1916. “Die Grundlage der allgemeinen Relativitätstheorie,” Annalen der Physik, 49, 769–822. English translation in Einstein et al. (1923).
Einstein, Albert, 1918. “Über Gravitationswellen,” Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften, 1918 volume, 154–167.
Einstein, Albert, Lorentz, Hendrik A., Minkowski, Hermann, and Weyl, Hermann, 1923. The Principle of Relativity, Dover, New York.
Einstein, Albert, 1989–2002. The Collected Papers of Albert Einstein, Volumes 2–7, Princeton University Press, Princeton, NJ; and http://www.einstein.caltech.edu/
Hartle, J. B., 2003. Gravity: An Introduction to Einstein’s General Relativity, San Francisco: Addison-Wesley.
Hilbert, David, 1915. “Die Grundlagen der Physik,” Königl. Gesell. d. Wiss. Göttingen, Nachr., Math.-Phys. Kl., 1917 volume, 53–76.
MTW: Misner, Charles W., Thorne, Kip S., and Wheeler, John A., 1973. Gravitation, W. H. Freeman & Co., San Francisco.
Minkowski, Hermann, 1908. “Space and Time,” Address at the 80th Assembly of German Natural Scientists and Physicians, at Cologne, 21 September 1908; text published posthumously in Annalen der Physik, 47, 927 (1915); English translation in Einstein et al. (1923).

Pais, Abraham, 1982. Subtle is the Lord . . . : The Science and Life of Albert Einstein, Oxford University Press: Oxford.
Schutz, B. 1980. Geometrical Methods of Mathematical Physics, Cambridge: Cambridge University Press.
Thorne, Kip S., 1980. “Multipole expansions of gravitational radiation,” Reviews of Modern Physics, 52, 299; especially Secs. VIII and X.
Wald, R. M. 1984. General Relativity, Chicago: University of Chicago Press.
Will, Clifford M., 1981. Theory and Experiment in Gravitational Physics, Cambridge University Press: Cambridge.
Will, Clifford M., 1986. Was Einstein Right? Basic Books, New York.
Will, Clifford M., 2006. “The Confrontation between General Relativity and Experiment,” Living Reviews in Relativity, 9, 3, (2006). URL (cited in April 2007): http://www.livingreviews.org/lrr-2006-3

Contents
24 Relativistic Stars and Black Holes
24.1 Overview
24.2 Schwarzschild’s Spacetime Geometry
24.3 Static Stars
24.3.1 Birkhoff’s Theorem
24.3.2 Stellar Interior
24.3.3 Local Energy and Momentum Conservation
24.3.4 Einstein Field Equation
24.3.5 Stellar Models and Their Properties
24.4 Gravitational Implosion of a Star to Form a Black Hole
24.5 Spinning Black Holes: The Kerr Spacetime
24.5.1 The Kerr Metric for a Spinning Black Hole
24.5.2 Dragging of Inertial Frames
24.5.3 The Light-Cone Structure, and the Horizon
24.5.4 Evolution of Black Holes: Rotational Energy and Its Extraction
24.6 The Many-Fingered Nature of Time

Chapter 24
Relativistic Stars and Black Holes
Version 0824.1.K.pdf, 6 May 2009
Please send comments, suggestions, and errata via email to [email protected] or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 24.1 Reader’s Guide
• This chapter relies significantly on
  – The special relativity portions of Chap. 1.
  – Chapter 22, on the transition from special relativity to general relativity.
  – Chapter 23, on the fundamental concepts of general relativity.
• Portions of this chapter are a foundation for the applications of general relativity theory to gravitational waves (Chap. 25) and to cosmology (Chap. 26).

24.1 Overview

Having sketched the fundamentals of Einstein’s theory of gravity, general relativity, we shall now illustrate his theory by means of several concrete applications: stars and black holes in this chapter, gravitational waves in Chap. 25, and the large-scale structure and evolution of the universe in Chap. 26. While stars and black holes are the central thread of this chapter, we study them less for their own intrinsic interest than for their roles as vehicles by which to understand general relativity. Using them we shall elucidate a number of issues that we have already met: the physical and geometric interpretations of spacetime metrics and of coordinate systems, the Newtonian limit of general relativity, the geodesic motion of freely falling particles and photons, local Lorentz frames and the tidal forces measured therein, proper reference frames, the Einstein field equations, the local law of conservation of 4-momentum, and the asymptotic structure of spacetime far from gravitating sources. Stars and black holes will also serve to introduce several new physical phenomena that did not show up in our study of the foundations of general relativity: the gravitational redshift, the “many-fingered” nature of time, event horizons, and spacetime singularities.

We begin this chapter, in Sec. 24.2, by studying the geometry of the curved spacetime outside any static star, as predicted by the Einstein field equation. In Sec. 24.3 we study general relativity’s description of the interiors of static stars. In Sec. 24.4 we turn attention to the spherically symmetric gravitational implosion by which a nonrotating star is transformed into a black hole, and to the “Schwarzschild” spacetime geometry outside and inside the resulting static, spherical hole. In Sec. 24.5 we study the “Kerr” spacetime geometry of a spinning black hole. Finally, in Sec. 24.6 we elucidate the nature of “time” in the curved spacetimes of general relativity.

24.2 Schwarzschild’s Spacetime Geometry

On January 13, 1916, just seven weeks after formulating the final version of his field equation, G = 8πT, Albert Einstein read to a meeting of the Prussian Academy of Sciences in Berlin a letter from the eminent German astrophysicist Karl Schwarzschild. Schwarzschild, as a member of the German army, had written from the World-War-One Russian front to tell Einstein of a mathematical discovery he had made: he had found the world’s first exact solution to the Einstein field equation. Written as a line element in a special coordinate system (coordinates named t, r, θ, φ) that Schwarzschild invented for the purpose, Schwarzschild’s solution takes the form (Schwarzschild 1916a)

$$ds^2 = -\left(1 - \frac{2M}{r}\right)dt^2 + \frac{dr^2}{1 - 2M/r} + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) , \qquad (24.1)$$

where M is a constant of integration. The connection coefficients, Riemann tensor, and Ricci and Einstein tensors for this metric can be computed by the methods of Chaps. 22 and 23; see Ex. 24.1. The results are tabulated in Box 24.2. The key bottom line is that the Einstein tensor vanishes. Therefore, the Schwarzschild metric (24.1) is a solution of the Einstein field equations with vanishing stress-energy tensor.

Many readers already know the lore of this subject: The Schwarzschild spacetime is reputed to represent the vacuum exterior of a nonrotating, spherical star; and also the exterior of a spherical star as it implodes to form a black hole; and also the exterior and interior of a nonrotating, spherical black hole; and also a wormhole that connects two different universes or two widely separated regions of our own universe. How does one discover these physical interpretations of the Schwarzschild metric (24.1)? The tools for discovering them (and, more generally, the tools for interpreting physically any spacetime metric that one encounters) are a central concern of this chapter.

When presented with a line element such as (24.1), one of the first questions one is tempted to ask is “What is the nature of the coordinate system?” Since the metric coefficients

will be different in some other coordinate system, surely one must know something about the coordinates in order to interpret the line element. Remarkably, one need not go to the inventor of the coordinates to find out their nature. Instead one can turn to the line element itself: the line element (or set of metric coefficients) contains full information not only about the details of the spacetime geometry, but also about the nature of the coordinates. The line element (24.1) is a good example.

Look first at the 2-dimensional surfaces in spacetime that have constant values of t and r. We can regard {θ, φ} as a coordinate system on each such 2-surface; and the spacetime line element (24.1) tells us that the geometry of the 2-surface is given in terms of those coordinates by

$${}^{(2)}ds^2 = r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) \qquad (24.2)$$

(where the prefix (2) refers to the dimensionality of the surface). This is the line element (metric) of an ordinary, everyday 2-dimensional sphere expressed in standard spherical polar coordinates. Thus, we have learned that the Schwarzschild spacetime is spherically symmetric, and moreover that θ and φ are standard spherical polar coordinates. Here is an example of extracting from a metric information about both the coordinate-independent spacetime geometry and the coordinate system being used.

Note, further, from Eq. (24.2) that the circumferences and surface areas of the spheres (t, r) = const in Schwarzschild spacetime are given by

$$\text{circumference} = 2\pi r , \qquad \text{area} = 4\pi r^2 . \qquad (24.3)$$

This tells us one aspect of the geometric interpretation of the r coordinate: r is a radial coordinate in the sense that the circumferences and surface areas of the spheres in Schwarzschild spacetime are expressed in terms of r in the standard manner (24.3). We must not go further, however, and assert that r is radius in the sense of being the distance from the center of one of the spheres to its surface. The center, and the line from center to surface, do not lie on the sphere itself, and they thus are not described by the spherical line element (24.2). Moreover, since we know that spacetime is curved, we have no right to expect that the distance from the center of a sphere to its surface will be given by distance = circumference/2π = r as in flat spacetime.

Returning to the Schwarzschild line element (24.1), let us examine several specific regions of spacetime: At “radii” r large compared to the integration constant M, the line element (24.1) takes the form

$$ds^2 = -dt^2 + dr^2 + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) . \qquad (24.4)$$

This is the line element of flat spacetime, $ds^2 = -dt^2 + dx^2 + dy^2 + dz^2$, written in spherical polar coordinates [$x = r\sin\theta\cos\phi$, $y = r\sin\theta\sin\phi$, $z = r\cos\theta$]. Thus, Schwarzschild spacetime is asymptotically flat in the region of large radii, r/M → ∞. This is just what one might expect physically when one gets far away from all sources of gravity. Thus, it is reasonable to presume that the Schwarzschild spacetime geometry is that of some sort of isolated, gravitating body which is located in the region r ∼ M.

The large-r line element (24.4) not only reveals that Schwarzschild spacetime is asymptotically flat; it also shows that in the asymptotically flat region the Schwarzschild t is the time coordinate of a Lorentz reference frame. Notice that the region of strong spacetime curvature has a boundary (say, r ∼ 100M) that remains forever fixed relative to the asymptotically Lorentz spatial coordinates x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ. This means that the asymptotic Lorentz frame can be regarded as the body’s asymptotic rest frame. We conclude, then, that far from the body the Schwarzschild t coordinate becomes the Lorentz time of the body’s asymptotic rest frame, and the Schwarzschild r, θ, φ coordinates become spherical polar coordinates in the body’s asymptotic rest frame.

Box 24.2 Connection Coefficients and Curvature Tensors for Schwarzschild
The coordinate basis vectors for the Schwarzschild solution are

$$\vec e_t = \frac{\partial}{\partial t},\quad \vec e_r = \frac{\partial}{\partial r},\quad \vec e_\theta = \frac{\partial}{\partial \theta},\quad \vec e_\phi = \frac{\partial}{\partial \phi};\qquad \vec e^{\,t} = \vec\nabla t,\quad \vec e^{\,r} = \vec\nabla r,\quad \vec e^{\,\theta} = \vec\nabla \theta,\quad \vec e^{\,\phi} = \vec\nabla \phi . \qquad (1)$$

The covariant metric coefficients in this coordinate basis are [cf. Eq. (24.1)]

$$g_{tt} = -\left(1 - \frac{2M}{r}\right),\quad g_{rr} = \frac{1}{1 - 2M/r},\quad g_{\theta\theta} = r^2,\quad g_{\phi\phi} = r^2\sin^2\theta ; \qquad (2a)$$

and the contravariant metric coefficients are the inverse of these:

$$g^{tt} = -\frac{1}{1 - 2M/r},\quad g^{rr} = 1 - \frac{2M}{r},\quad g^{\theta\theta} = \frac{1}{r^2},\quad g^{\phi\phi} = \frac{1}{r^2\sin^2\theta} . \qquad (2b)$$

The nonzero connection coefficients in this coordinate basis are

$$\Gamma^t{}_{rt} = \Gamma^t{}_{tr} = \frac{M}{r^2}\,\frac{1}{1 - 2M/r},\quad \Gamma^r{}_{tt} = \frac{M}{r^2}\left(1 - \frac{2M}{r}\right),\quad \Gamma^r{}_{rr} = -\frac{M}{r^2}\,\frac{1}{1 - 2M/r},$$
$$\Gamma^r{}_{\theta\theta} = -r\left(1 - \frac{2M}{r}\right),\quad \Gamma^r{}_{\phi\phi} = -r\sin^2\theta\left(1 - \frac{2M}{r}\right),\quad \Gamma^\theta{}_{r\theta} = \Gamma^\theta{}_{\theta r} = \Gamma^\phi{}_{r\phi} = \Gamma^\phi{}_{\phi r} = \frac{1}{r},$$
$$\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta,\quad \Gamma^\phi{}_{\theta\phi} = \Gamma^\phi{}_{\phi\theta} = \cot\theta . \qquad (3)$$

The orthonormal basis associated with the above coordinate basis is

$$\vec e_{\hat 0} = \frac{\partial/\partial t}{\sqrt{1 - 2M/r}},\quad \vec e_{\hat r} = \sqrt{1 - \frac{2M}{r}}\,\frac{\partial}{\partial r},\quad \vec e_{\hat\theta} = \frac{1}{r}\frac{\partial}{\partial\theta},\quad \vec e_{\hat\phi} = \frac{1}{r\sin\theta}\frac{\partial}{\partial\phi} . \qquad (4)$$

The nonzero connection coefficients in this orthonormal basis are

$$\Gamma^{\hat r}{}_{\hat t\hat t} = \Gamma^{\hat t}{}_{\hat r\hat t} = \frac{M}{r^2\sqrt{1 - 2M/r}},\qquad \Gamma^{\hat\phi}{}_{\hat\theta\hat\phi} = -\Gamma^{\hat\theta}{}_{\hat\phi\hat\phi} = \frac{\cot\theta}{r},$$
$$\Gamma^{\hat\theta}{}_{\hat r\hat\theta} = \Gamma^{\hat\phi}{}_{\hat r\hat\phi} = -\Gamma^{\hat r}{}_{\hat\theta\hat\theta} = -\Gamma^{\hat r}{}_{\hat\phi\hat\phi} = \frac{\sqrt{1 - 2M/r}}{r} . \qquad (5)$$

The nonzero components of the Riemann tensor in this orthonormal basis are

$$R_{\hat r\hat t\hat r\hat t} = -R_{\hat\theta\hat\phi\hat\theta\hat\phi} = -\frac{2M}{r^3},\qquad R_{\hat\theta\hat t\hat\theta\hat t} = R_{\hat\phi\hat t\hat\phi\hat t} = -R_{\hat r\hat\phi\hat r\hat\phi} = -R_{\hat r\hat\theta\hat r\hat\theta} = \frac{M}{r^3}, \qquad (6)$$

and those obtainable from these via the symmetries (23.52) of Riemann. The Ricci tensor, curvature scalar, and Einstein tensor all vanish, which implies that the Schwarzschild metric is a solution of the vacuum Einstein field equations.

As we move inward from r = ∞, we gradually begin to see spacetime curvature. That curvature shows up, at r ≫ M, in slight deviations of the Schwarzschild metric coefficients from those of a Lorentz frame: to first order in M/r the line element (24.1) becomes

$$ds^2 = -\left(1 - \frac{2M}{r}\right)dt^2 + \left(1 + \frac{2M}{r}\right)dr^2 + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) , \qquad (24.5)$$

or, equivalently, in Cartesian spatial coordinates,

$$ds^2 = -\left(1 - \frac{2M}{\sqrt{x^2 + y^2 + z^2}}\right)dt^2 + dx^2 + dy^2 + dz^2 + \frac{2M}{r}\left(\frac{x}{r}\,dx + \frac{y}{r}\,dy + \frac{z}{r}\,dz\right)^2 . \qquad (24.6)$$

It is reasonable to expect that, at these large radii where the curvature is weak, Newtonian gravity will be a good approximation to Einsteinian gravity. In Sec. 23.9.1 of the last chapter we studied in detail the transition from general relativity to Newtonian gravity, and found that, in nearly Newtonian situations, if one uses a nearly globally Lorentz coordinate system (as we are doing), the line element should take the form [Eq. (23.95)]

$$ds^2 = -(1 + 2\Phi)\,dt^2 + (\delta_{jk} + h_{jk})\,dx^j dx^k + 2h_{tj}\,dt\,dx^j , \qquad (24.7)$$

where $h_{\mu\nu}$ are metric corrections that are very small compared to unity and Φ (which shows up in the time-time part of the metric) is the Newtonian potential. Direct comparison of (24.7) with (24.6) shows that a Newtonian description of the body’s distant gravitational field will entail a Newtonian potential given by

$$\Phi = -\frac{M}{r} \qquad (24.8)$$

(Φ = −GM/r in cgs units). This, of course, is the external Newtonian field of a body with mass M. Thus, the integration constant M in the Schwarzschild line element is the mass which characterizes the body’s distant, nearly Newtonian gravitational field. This is an example of reading the mass of a body off the asymptotic form of the metric (Sec. 23.9.3). Notice that the asymptotic metric here [Eq. (24.5)] differs in its spatial part from that in Sec. 23.9.3 [Eq. (23.113)]. This difference arises from the use of different radial coordinates here and there: If we define $\bar r$ by $r = \bar r + M$ at radii r ≫ M, then to linear order in M/r, the asymptotic Schwarzschild metric (24.5) becomes

$$ds^2 = -\left(1 - \frac{2M}{\bar r}\right)dt^2 + \left(1 + \frac{2M}{\bar r}\right)\left[d\bar r^2 + \bar r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)\right] , \qquad (24.9)$$

which is the same as Eq. (23.113) with vanishing angular momentum, $S_j = 0$. This easy change of the spatial part of the metric reinforces the fact that one reads the asymptotic Newtonian potential and the source’s mass M off the time-time components of the metric, and not the spatial part of the metric.

We can describe the physical interpretation of M as the body’s mass in operational terms as follows: Suppose that a test particle (e.g., a small planet) moves around our central body in a circular orbit with radius r ≫ M. A Newtonian analysis of the orbit predicts that, as measured using Newtonian time, the period of the orbit will be $P = 2\pi(r^3/M)^{1/2}$. Moreover, since Newtonian time is very nearly equal to the time t of the nearly Lorentz coordinates used in (24.5) [cf. Sec. 23.9.1], and since that t is Lorentz time in the body’s relativistic, asymptotic rest frame, the orbital period as measured by observers at rest in the asymptotic rest frame must be $P = 2\pi(r^3/M)^{1/2}$. Thus, M is the mass that appears in Kepler’s laws for the orbits of test particles far from the central body. This quantity is often called the body’s “active gravitational mass,” since it is the mass that characterizes the body’s gravitational pull. It is also called the body’s “total mass-energy” because it turns out to include all forms of mass and energy that the body possesses (rest mass, internal kinetic energy, and all forms of internal binding energy including gravitational).

We note, in passing, that one can use general relativity to deduce the Keplerian role of M without invoking the Newtonian limit: We place a test particle in the body’s equatorial plane θ = π/2 at a radius r ≫ M, and we give it an initial velocity that lies in the equatorial plane. Then symmetry guarantees the particle will remain in the equatorial plane: there is no way to prefer going toward north, θ < π/2, or toward south, θ > π/2.
We, further, adjust the initial velocity so the particle remains always at a fixed radius. Then the only nonvanishing components $u^\alpha = dx^\alpha/d\tau$ of the particle’s 4-velocity will be $u^t = dt/d\tau$ and $u^\phi = d\phi/d\tau$. The particle’s orbit will be governed by the geodesic equation $\nabla_{\vec u}\,\vec u = 0$, where $\vec u$ is its 4-velocity. The radial component of this geodesic equation, computed in Schwarzschild coordinates, is [cf. Eq. (23.26) with a switch from affine parameter ζ to proper time τ = mζ]

$$\frac{d^2 r}{d\tau^2} = -\Gamma^r{}_{\mu\nu}\frac{dx^\mu}{d\tau}\frac{dx^\nu}{d\tau} = -\Gamma^r{}_{tt}\frac{dt}{d\tau}\frac{dt}{d\tau} - \Gamma^r{}_{\phi\phi}\frac{d\phi}{d\tau}\frac{d\phi}{d\tau} . \qquad (24.10)$$

(Here we have used the vanishing of all $dx^\alpha/d\tau$ except the t and φ components, and have used the vanishing of $\Gamma^r{}_{t\phi} = \Gamma^r{}_{\phi t}$ [Eq. (3) of Box 24.2].) Since the orbit is circular, with fixed r, the left side of (24.10) must vanish; and correspondingly the right side gives

$$\frac{d\phi}{dt} = \frac{d\phi/d\tau}{dt/d\tau} = \left(-\frac{\Gamma^r{}_{tt}}{\Gamma^r{}_{\phi\phi}}\right)^{1/2} = \left(\frac{M}{r^3}\right)^{1/2} , \qquad (24.11)$$

where we have used the values of the connection coefficients from Eq. (3) of Box 24.2, specialized to the equatorial plane θ = π/2. Equation (24.11) tells us that the amount of coordinate time t required for the particle to circle the central body once, 0 ≤ φ ≤ 2π, is $\Delta t = 2\pi(r^3/M)^{1/2}$. Since t is the Lorentz time of the body’s asymptotic rest frame, this means that observers in the asymptotic rest frame will measure for the particle an orbital period $P = \Delta t = 2\pi(r^3/M)^{1/2}$. This, of course, is the same result as we obtained from the Newtonian limit, but our relativistic analysis shows it to be true for circular orbits of arbitrary radius r, not just for r ≫ M.
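The step from Eq. (24.10) to Eq. (24.11) can be checked numerically. The sketch below is a minimal Python illustration, assuming geometrized units with M = 1 and helper names of our own choosing. It builds $\Gamma^r{}_{tt} = -\tfrac12 g^{rr}\partial_r g_{tt}$ and $\Gamma^r{}_{\phi\phi} = -\tfrac12 g^{rr}\partial_r g_{\phi\phi}$ (valid for this diagonal, static metric in the equatorial plane) by central finite differences of the metric coefficients of Eq. (24.1), then compares the resulting period 2π/(dφ/dt) with $2\pi(r^3/M)^{1/2}$. The common factor $g^{rr} = 1 - 2M/r$ cancels in the ratio, which is why the Keplerian result survives at every radius and not only at r ≫ M.

```python
import math

M = 1.0  # geometrized units (G = c = 1); radii are measured in units of M


def g_tt(r):
    return -(1.0 - 2.0 * M / r)


def g_rr(r):
    return 1.0 / (1.0 - 2.0 * M / r)


def g_phiphi(r):
    return r * r  # equatorial plane, so sin^2(theta) = 1


def radial_christoffels(r):
    """Gamma^r_tt and Gamma^r_phiphi from central differences of the metric."""
    h = 1e-4 * r  # step scaled with r to balance truncation and round-off error

    def deriv(f):
        return (f(r + h) - f(r - h)) / (2.0 * h)

    grr_up = 1.0 / g_rr(r)  # g^rr for this diagonal metric
    return -0.5 * grr_up * deriv(g_tt), -0.5 * grr_up * deriv(g_phiphi)


def orbital_period(r):
    """Coordinate-time period 2*pi / (dphi/dt), with dphi/dt from Eq. (24.11)."""
    gamma_r_tt, gamma_r_phiphi = radial_christoffels(r)
    omega = math.sqrt(-gamma_r_tt / gamma_r_phiphi)
    return 2.0 * math.pi / omega


def kepler_period(r):
    """Kepler's third law in geometrized units: P = 2*pi*(r^3/M)^(1/2)."""
    return 2.0 * math.pi * math.sqrt(r**3 / M)


for r in (6.0, 10.0, 1000.0):
    print(r, orbital_period(r), kepler_period(r))
```

The two columns agree to finite-difference accuracy at r = 6M just as well as at r = 1000M.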

Next we shall move inward, from the asymptotically flat region of Schwarzschild spacetime, toward smaller and smaller radii. As we do so, the spacetime geometry becomes more and more strongly curved, and the Schwarzschild coordinate system becomes less and less Lorentz. As an indication of extreme deviations from Lorentz, notice that the signs of the metric coefficients

$$\frac{\partial}{\partial t}\cdot\frac{\partial}{\partial t} = g_{tt} = -\left(1 - \frac{2M}{r}\right) , \qquad \frac{\partial}{\partial r}\cdot\frac{\partial}{\partial r} = g_{rr} = \frac{1}{1 - 2M/r} \qquad (24.12)$$

get reversed as one moves from r > 2M through r = 2M and into the region r < 2M. Correspondingly, outside r = 2M world lines of changing t but constant r, θ, φ are timelike, while inside r = 2M those world lines are spacelike; and similarly outside r = 2M world lines of changing r but constant t, θ, φ are spacelike, while inside they are timelike. In this sense, outside r = 2M, t plays the role of a time coordinate and r the role of a space coordinate; while inside r = 2M, t plays the role of a space coordinate and r the role of a time coordinate. Moreover, this role reversal occurs without any change in the role of r as 1/2π times the circumference of circles around the center [Eq. (24.3)].

Historically this role reversal presented for many decades severe conceptual problems, even to the best experts on general relativity. We will return to it in Sec. 24.4 below. Henceforth we shall refer to the location of role reversal, r = 2M, as the gravitational radius of the Schwarzschild spacetime. Throughout the rest of this section and all of Sec. 24.3, we shall confine attention to the region r > 2M, outside the gravitational radius. In Sec. 24.4 we shall seek a clear understanding of the “interior” region, r < 2M.

Notice that the metric coefficients in the Schwarzschild line element (24.1) are all independent of the coordinate t. This means that the geometry of spacetime itself is invariant under the translation t → t + constant.
At radii r > 2M, where t plays the role of a time coordinate, t → t + constant is a time translation; and, correspondingly, the Schwarzschild spacetime geometry is time-translation-invariant, i.e., “static,” outside the gravitational radius.

****************************
EXERCISES

Exercise 24.1 Practice: Connection Coefficients and Riemann Tensor in the Schwarzschild Metric
(a) Explain why, for the Schwarzschild metric (24.1), the metric coefficients in the coordinate basis have the values given in Eqs. (2a,b) of Box 24.2.
(b) Using tensor-analysis software on a computer, derive the connection coefficients given in Eq. (3) of Box 24.2.
(c) Show that the basis vectors in Eqs. (4) of Box 24.2 are orthonormal.
(d) Using tensor-analysis software on a computer, derive the connection coefficients (5) and Riemann components (6) of Box 24.2 in the orthonormal basis.

Exercise 24.2 Example: The Bertotti-Robinson solution of the Einstein field equation
Bruno Bertotti (1959) and Ivor Robinson (1959) have independently solved the Einstein field equation to obtain the following metric for a universe endowed with a uniform magnetic field:

$$ds^2 = Q^2\left[-dt^2 + \sin^2 t\, dz^2 + d\theta^2 + \sin^2\theta\, d\phi^2\right] . \qquad (24.13)$$

Here

$$Q = \text{const},\quad 0 \le t \le \pi,\quad -\infty < z < +\infty,\quad 0 \le \theta \le \pi,\quad 0 \le \phi \le 2\pi . \qquad (24.14)$$

If one computes the Einstein tensor from the metric coefficients of the line element (24.13) and equates it to 8π times a stress-energy tensor, one finds a stress-energy tensor which is precisely that of an electromagnetic field [Eqs. (22.76) and (22.77)] lifted, unchanged, into general relativity. The electromagnetic field is one which, as measured in the local Lorentz frame of an observer with fixed z, θ, φ (a “static” observer), has vanishing electric field and has a magnetic field with magnitude independent of where the observer is located in spacetime and with direction along ∂/∂z. In this sense, the spacetime (24.13) is that of a homogeneous magnetic universe. Discuss the geometry of this universe and the nature of the coordinates t, z, θ, φ. More specifically:
(a) Which coordinate increases in a timelike direction and which coordinates in spacelike directions?
(b) Is this universe spherically symmetric?
(c) Is this universe cylindrically symmetric?
(d) Is this universe asymptotically flat?
(e) How does the geometry of this universe change as t ranges from 0 to π? [Hint: show that the curves {(z, θ, φ) = const, t = τ/Q} are timelike geodesics, namely the world lines of the observers referred to above. Then argue from symmetry, or use the result of Ex. 23.4.]
(f) Give as complete a characterization as you can of the coordinates t, z, θ, φ.

****************************

24.3

Static Stars

24.3.1

Birkhoff’s Theorem

In 1923, George Birkhoff, a professor of mathematics at Harvard, proved a remarkable theorem:¹ The Schwarzschild spacetime geometry is the unique spherically symmetric solution of the vacuum Einstein field equation G = 0. This Birkhoff theorem can be restated in more operational terms as follows. Suppose that you find a solution of the vacuum Einstein field equation, written as a set of metric coefficients $g_{\bar\alpha\bar\beta}$ in some coordinate system $\{x^{\bar\mu}\}$. Suppose, further, that these $g_{\bar\alpha\bar\beta}(x^{\bar\mu})$ exhibit spherical symmetry but do not coincide with the Schwarzschild expressions [Eqs. (2a) of Box 24.2]. Then Birkhoff guarantees the existence of a coordinate transformation from your coordinates $x^{\bar\mu}$ to Schwarzschild’s coordinates $x^\nu$ such that, when that transformation is performed, the resulting new metric components $g_{\alpha\beta}(x^\nu)$ have precisely the Schwarzschild form [Eq. (2a) of Box 24.2]. For an example see Ex. 24.3. This implies that, thought of as a coordinate-independent spacetime geometry, the Schwarzschild solution is completely unique.

¹ For a textbook proof see Sec. 32.2 of MTW.

Consider, now, a static, spherically symmetric star (e.g., the sun) residing alone in an otherwise empty universe (or, more realistically, residing in our own universe but so far from all other gravitating matter that we can ignore all other sources of gravity when studying it). Since the star’s interior is spherical, it is reasonable to presume that the exterior will be spherical. Since the exterior is also vacuum (T = 0), Birkhoff’s theorem tells us that its spacetime geometry must be that of Schwarzschild. If the circumference of the star’s surface is 2πR and its surface area is 4πR², then that surface must reside at the location r = R in the Schwarzschild coordinates of the exterior. In other words, the spacetime geometry will be described by the Schwarzschild line element (24.1) at radii r > R, but by something else inside the star, at r < R.

Since real atoms with finite rest masses reside on the star’s surface, and since such atoms move along timelike world lines, the world lines {r = R, θ = const, φ = const, t varying} must be timelike.
From the Schwarzschild invariant interval (24.1) we read off the squared proper time dτ² = −ds² = (1 − 2M/R)dt² along those world lines. This dτ² is positive (timelike world line) if and only if R > 2M. Thus, a static star with total mass-energy (active gravitational mass) M can never have a circumference smaller than 2πR = 4πM. Restated in conventional units:

$$\frac{\text{circumference}}{2\pi} \equiv R \equiv \text{radius of star} > 2M = \frac{2GM}{c^2} \equiv \text{gravitational radius} = 3.0\ \text{km}\,\frac{M}{M_\odot}. \tag{24.15}$$

Here M⊙ is the mass of the sun. The sun satisfies this constraint by a huge margin: R = 7 × 10⁵ km. A one-solar-mass white-dwarf star satisfies it by a smaller margin: R ≃ 6 × 10³ km. A one-solar-mass neutron star satisfies it by only a modest margin: R ≃ 10 km. For a pedagogical and detailed discussion see, e.g., Shapiro and Teukolsky (1983).
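The sketch below is a quick numerical check of (24.15) and of the margins quoted above. It assumes rounded cgs values of G, c and M⊙, which the text does not give.

```python
G_CGS = 6.674e-8      # Newton's constant, cm^3 g^-1 s^-2 (assumed value)
C_CGS = 2.998e10      # speed of light, cm/s (assumed value)
M_SUN = 1.989e33      # solar mass, g (assumed value)

def gravitational_radius_km(mass_g):
    """2GM/c^2 of Eq. (24.15), converted from cm to km."""
    return 2.0 * G_CGS * mass_g / C_CGS**2 / 1.0e5

r_grav = gravitational_radius_km(M_SUN)
print(f"2GM_sun/c^2 = {r_grav:.2f} km")

# Radii quoted in the text for one-solar-mass objects, in km.
for name, radius_km in [("sun", 7.0e5), ("white dwarf", 6.0e3), ("neutron star", 10.0)]:
    print(f"{name:12s}  R / 2M = {radius_km / r_grav:.3g}")
```

The printed ratio R/2M is the margin by which each object satisfies the bound. It is roughly 2 × 10⁵ for the sun, 2 × 10³ for the white dwarf, and only about 3 for the neutron star.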

24.3.2

Stellar Interior

We shall now take a temporary detour away from our study of the Schwarzschild geometry in order to discuss the interior of a static, spherical star. We do so less because of an interest in stars than because the detour will illustrate the process of solving the Einstein field equation and the role of the contracted Bianchi identity in the solution process. Since the star’s spacetime geometry is to be static and spherically symmetric, we can introduce as coordinates in its interior (i) spherical polar angular coordinates θ and φ, (ii) a radial coordinate r such that the circumferences of the spheres are 2πr, and (iii) a time coordinate $\bar t$ such that the metric coefficients are independent of $\bar t$. By their geometrical definitions, these coordinates will produce a spacetime line element of the form

$$ds^2 = g_{\bar t\bar t}\,d\bar t^2 + 2g_{\bar t r}\,d\bar t\,dr + g_{rr}\,dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right), \tag{24.16}$$

with $g_{\alpha\beta}$ independent of $\bar t$, θ, and φ. Metric coefficients $g_{\bar t\theta}$, $g_{r\theta}$, $g_{\bar t\phi}$, $g_{r\phi}$ are absent from (24.16) because they would break the spherical symmetry. They would distinguish the +φ direction from −φ, or +θ from −θ, since they would give nonzero values for the scalar products of ∂/∂φ or ∂/∂θ with $\partial/\partial\bar t$ or ∂/∂r. [Recall: the metric coefficients in a coordinate basis are $g_{\alpha\beta} = \mathbf{g}(\partial/\partial x^\alpha, \partial/\partial x^\beta) = (\partial/\partial x^\alpha)\cdot(\partial/\partial x^\beta)$.] We can get rid of the off-diagonal $g_{\bar t r}$ term in the line element (24.16) by specializing the time coordinate. The coordinate transformation

$$\bar t = t - \int \frac{g_{\bar t r}}{g_{\bar t\bar t}}\,dr \tag{24.17}$$

brings the line element into the form

$$ds^2 = -e^{2\Phi}dt^2 + e^{2\Lambda}dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right). \tag{24.18}$$
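The passage from (24.16) to (24.18) is a completing-the-square step. Write A ≡ $g_{\bar t\bar t}$, B ≡ $g_{\bar t r}$, C ≡ $g_{rr}$ (all functions of r alone); this shorthand is introduced here only for the derivation. The $(\bar t, r)$ part of (24.16) can then be rearranged as follows. This is a sketch of the algebra, not a quotation of the text's own derivation:

```latex
\begin{aligned}
A\,d\bar t^{2} + 2B\,d\bar t\,dr + C\,dr^{2}
  &= A\left(d\bar t + \frac{B}{A}\,dr\right)^{2} + \left(C - \frac{B^{2}}{A}\right)dr^{2} \\
  &= A\,dt^{2} + \left(C - \frac{B^{2}}{A}\right)dr^{2},
  \qquad dt \equiv d\bar t + \frac{B}{A}\,dr .
\end{aligned}
```

Because B/A depends on r only, dt is an exact differential, $t = \bar t + \int (B/A)\,dr$. Identifying $-e^{2\Phi} \equiv A$ and $e^{2\Lambda} \equiv C - B^2/A$ then gives the diagonal form (24.18).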

Here we have introduced the names $e^{2\Phi}$ and $e^{2\Lambda}$ for the time-time and radial-radial metric coefficients. The signs of these coefficients (negative for $g_{tt}$ and positive for $g_{rr}$) are dictated by the fact that inside the star, as on its surface, real atoms move along world lines of constant r, θ, φ and changing t, and thus those world lines must be timelike. The name $e^{2\Phi}$ ties in with the fact that, when gravity is nearly Newtonian, the time-time metric coefficient $-e^{2\Phi}$ must reduce to −(1 + 2Φ), with Φ the Newtonian potential [Eq. (23.95)]. Thus, the Φ used in (24.18) is a generalization of the Newtonian potential to relativistic, spherical, static gravitational situations.

In order to solve the Einstein field equation for the star’s interior, we must specify the stress-energy tensor. Stellar material is excellently approximated by a perfect fluid. Since our star is static, at any point inside the star the fluid’s rest frame has constant r, θ, φ. Correspondingly, the 4-velocity of the fluid is

$$\vec u = e^{-\Phi}\frac{\partial}{\partial t}. \tag{24.19}$$

Here the factor $e^{-\Phi}$ guarantees that the 4-velocity will have unit length, as it must. This fluid, of course, is not freely falling. Rather, in order for a fluid element to remain always at fixed r, θ, φ, it must accelerate relative to local freely falling observers with a 4-acceleration $\vec a \equiv \nabla_{\vec u}\vec u \neq 0$; i.e., $a^\alpha = u^\alpha{}_{;\mu}u^\mu \neq 0$. Symmetry tells us that this 4-acceleration cannot have any θ or φ components, and orthogonality of the 4-acceleration to the 4-velocity tells us that it cannot have any t component. The r component, computed from $a^r = u^r{}_{;\mu}u^\mu = \Gamma^r{}_{00}u^0u^0$, is $a^r = e^{-2\Lambda}\Phi_{,r}$. Thus,

$$\vec a = e^{-2\Lambda}\Phi_{,r}\,\frac{\partial}{\partial r}. \tag{24.20}$$
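The radial component quoted above follows from a single connection coefficient. For the diagonal, static metric (24.18), with $\Gamma^\alpha{}_{\beta\gamma} = \tfrac12 g^{\alpha\delta}(g_{\delta\beta,\gamma} + g_{\delta\gamma,\beta} - g_{\beta\gamma,\delta})$, a short worked version is:

```latex
\begin{aligned}
\Gamma^{r}{}_{tt} &= \tfrac12\, g^{rr}\left(2\,g_{rt,t} - g_{tt,r}\right)
  = \tfrac12\, e^{-2\Lambda}\,\frac{d}{dr}\!\left(e^{2\Phi}\right)
  = e^{2\Phi-2\Lambda}\,\Phi_{,r}, \\
a^{r} &= \Gamma^{r}{}_{tt}\,u^{t}u^{t}
  = e^{2\Phi-2\Lambda}\,\Phi_{,r}\; e^{-2\Phi}
  = e^{-2\Lambda}\,\Phi_{,r},
\end{aligned}
```

The derivation uses $g_{rt} = 0$ and $u^t = e^{-\Phi}$ from (24.19).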

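Each fluid element can be thought of as carrying with itself an orthonormal set of basis vectors

$$\vec e_{\hat 0} = \vec u = e^{-\Phi}\frac{\partial}{\partial t}, \quad \vec e_{\hat r} = e^{-\Lambda}\frac{\partial}{\partial r}, \quad \vec e_{\hat\theta} = \frac{1}{r}\frac{\partial}{\partial\theta}, \quad \vec e_{\hat\phi} = \frac{1}{r\sin\theta}\frac{\partial}{\partial\phi}; \tag{24.21a}$$

$$\vec e^{\,\hat 0} = e^{\Phi}\vec\nabla t, \quad \vec e^{\,\hat r} = e^{\Lambda}\vec\nabla r, \quad \vec e^{\,\hat\theta} = r\,\vec\nabla\theta, \quad \vec e^{\,\hat\phi} = r\sin\theta\,\vec\nabla\phi. \tag{24.21b}$$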

These basis vectors play two independent roles. (i) One can regard the tangent space of each event in spacetime as being spanned by the basis (24.21), specialized to that event. From this viewpoint, (24.21) constitutes an orthonormal, non-coordinate basis that covers every tangent space of the star’s spacetime. This basis is called the fluid’s orthonormal, local-rest-frame basis. (ii) One can focus attention on a specific fluid element, which moves along the world line $r = r_o$, $\theta = \theta_o$, $\phi = \phi_o$. One can then construct the proper reference frame of that fluid element in the same manner as we constructed the proper reference frame of an accelerated observer in flat spacetime in Sec. 22.5. That proper reference frame is a coordinate system $\{x^{\hat\alpha}\}$ whose basis vectors on the fluid element’s world line are equal to the basis vectors (24.21):

$$\frac{\partial}{\partial x^{\hat\mu}} = \vec e_{\hat\mu}, \qquad \vec\nabla x^{\hat\mu} = \vec e^{\,\hat\mu} \qquad \text{at } x^{\hat j} = 0. \tag{24.22}$$

More specifically, the coordinates $x^{\hat\mu}$ are given, to second order in spatial distance from the fluid element’s world line, by

$$\begin{aligned} x^{\hat 0} &= e^{\Phi_o}t, \qquad x^{\hat 1} = \int_{r_o}^{r} e^{\Lambda}\,dr - \tfrac12 e^{-\Lambda_o} r_o\left[(\theta - \theta_o)^2 + \sin^2\theta_o\,(\phi - \phi_o)^2\right], \\ x^{\hat 2} &= r(\theta - \theta_o) - \tfrac12 r_o\sin\theta_o\cos\theta_o\,(\phi - \phi_o)^2, \qquad x^{\hat 3} = r\sin\theta\,(\phi - \phi_o), \end{aligned} \tag{24.23}$$

from which one can verify relation (24.22) with $\vec e_{\hat\mu}$ and $\vec e^{\,\hat\mu}$ given by (24.21). [In Eqs. (24.23) and throughout this discussion, all quantities with subscripts o are evaluated on the fluid’s world line.] In terms of the proper-reference-frame coordinates (24.23), the line element (24.18) takes the following form, accurate to first order in distance from the fluid element’s world line:

$$ds^2 = -\left[1 + 2\Phi_{,r}(r - r_o)\right](dx^{\hat 0})^2 + \delta_{ij}\,dx^{\hat i}dx^{\hat j}. \tag{24.24}$$

Notice that the quantity $\Phi_{,r}(r - r_o)$ is equal to the scalar product of two vectors. The first is (i) the spatial separation $\mathbf{x} \equiv (r - r_o)\,\partial/\partial r + (\theta - \theta_o)\,\partial/\partial\theta + (\phi - \phi_o)\,\partial/\partial\phi$ of the “field point” (r, θ, φ) from the fluid element’s world line. The second is (ii) the fluid’s 4-acceleration (24.20), viewed as a spatial 3-vector $\mathbf{a} = e^{-2\Lambda_o}\Phi_{,r}\,\partial/\partial r$. Correspondingly, the spacetime line element (24.24) in the fluid element’s proper reference frame takes the standard proper-reference-frame form (22.87)

$$ds^2 = -(1 + 2\mathbf{a}\cdot\mathbf{x})(dx^{\hat 0})^2 + \delta_{jk}\,dx^{\hat j}dx^{\hat k}, \tag{24.25}$$

accurate to first order in distance from the fluid element’s world line. At second order, as was discussed at the end of Sec. 23.3, there are corrections proportional to the spacetime curvature.

In the local rest frame of the fluid, i.e., when expanded on the fluid’s orthonormal rest-frame basis vectors (24.21) or equally well (24.22), the components $T^{\hat\alpha\hat\beta} = (\rho + P)u^{\hat\alpha}u^{\hat\beta} + P g^{\hat\alpha\hat\beta}$ of the fluid’s stress-energy tensor take on the standard form [Eq. (22.58)]

$$T^{\hat 0\hat 0} = \rho, \qquad T^{\hat r\hat r} = T^{\hat\theta\hat\theta} = T^{\hat\phi\hat\phi} = P, \tag{24.26}$$

corresponding to a rest-frame mass-energy density ρ and isotropic pressure P. By contrast with the simplicity of these local-rest-frame components, the contravariant components $T^{\alpha\beta} = (\rho + P)u^\alpha u^\beta + P g^{\alpha\beta}$ in the (t, r, θ, φ) coordinate basis look considerably more complicated:

$$T^{tt} = e^{-2\Phi}\rho, \qquad T^{rr} = e^{-2\Lambda}P, \qquad T^{\theta\theta} = r^{-2}P, \qquad T^{\phi\phi} = (r\sin\theta)^{-2}P. \tag{24.27}$$

This shows one advantage of using orthonormal bases: the components of vectors and tensors are generally simpler in an orthonormal basis than in a coordinate basis. A second advantage appears when one seeks the physical interpretation of formulae. Every orthonormal basis is the proper-reference-frame basis of some local observer (the observer with 4-velocity $\vec u = \vec e_{\hat 0}$), so components measured in such a basis have an immediate physical interpretation. For example, $T^{\hat 0\hat 0}$ is the total density of mass-energy measured by the local observer. By contrast, components in a coordinate basis typically do not have a simple physical interpretation.

24.3.3

Local Energy and Momentum Conservation

Before inserting the perfect-fluid stress-energy tensor (24.26) into the Einstein field equation, we shall impose on it the local law of conservation of 4-momentum, $\vec\nabla\cdot\mathbf{T} = 0$. In doing so, we shall require from the outset that, since the star is to be static and spherical, its density ρ and pressure P must be independent of t, θ, and φ. In other words, like the metric coefficients Φ and Λ, they must be functions of radius r only.

The most straightforward way to impose 4-momentum conservation is to equate to zero the quantities

$$T^{\alpha\beta}{}_{;\beta} = \frac{\partial T^{\alpha\beta}}{\partial x^\beta} + \Gamma^\beta{}_{\mu\beta}T^{\alpha\mu} + \Gamma^\alpha{}_{\mu\beta}T^{\mu\beta} = 0 \tag{24.28}$$

in our coordinate basis. This uses expressions (24.27) for the contravariant components of the stress-energy tensor, together with the connection coefficients and metric components given in Box 24.2. This straightforward calculation requires a lot of work. Much better is an analysis based on the local proper reference frame of the fluid. The temporal component of $\vec\nabla\cdot\mathbf{T} = 0$ in that reference frame is the projection of this conservation law onto the time basis vector $\vec e_{\hat 0} = e^{-\Phi}\partial/\partial t = \vec u$. It represents energy conservation as seen by the fluid. But the fluid sees and feels no changes: its density and pressure remain always constant along a fluid element’s world line, and energy conservation is therefore guaranteed to be satisfied already. An evaluation of $\vec u\cdot(\vec\nabla\cdot\mathbf{T}) = 0$ must therefore give the identity 0 = 0, so why bother computing it? If one does bother, just to make sure of this argument, one does indeed get 0 = 0.

The spatial components of $\vec\nabla\cdot\mathbf{T} = 0$ in the fluid’s local rest frame, by contrast, will be nontrivial. The easiest way to compute them is to introduce the tensor $\mathbf{P} \equiv \mathbf{g} + \vec u\otimes\vec u$, which projects all vectors into the 3-surface orthogonal to $\vec u$, i.e., into the fluid’s local 3-surface of simultaneity. One can readily show that, in the fluid’s local proper reference frame, the components of this projection tensor are $P_{\hat 0\hat\alpha} = 0$ and $P_{\hat i\hat j} = \delta_{ij}$. This means that $\mathbf{P}$ can be thought of as the spatial 3-metric of the fluid’s local rest frame, viewed however as a spacetime tensor. The spatial part of $\vec\nabla\cdot\mathbf{T} = 0$ is obtained by contraction with $\mathbf{P}$. Computed using index notation, this contraction gives

$$0 = \left[(\rho + P)u^\alpha u^\beta + P g^{\alpha\beta}\right]_{;\beta}P_{\alpha\mu} = \left[(\rho + P)u^\beta\right]_{;\beta}u^\alpha P_{\alpha\mu} + (\rho + P)\left[u^\alpha{}_{;\beta}u^\beta\right]P_{\alpha\mu} + P_{;\beta}\,g^{\alpha\beta}P_{\alpha\mu}. \tag{24.29}$$

Here we have used the fact that the gradient of the metric vanishes. The first term in Eq. (24.29) vanishes, since $u^\alpha P_{\alpha\mu} = 0$ (the projection of $\vec u$ orthogonal to $\vec u$ is zero). The quantity in square brackets in the second term is the fluid’s 4-acceleration $\vec a$ [Eq. (24.20)]. The third term is the projection of the pressure gradient orthogonal to $\vec u$. Because the star is static, the pressure gradient has no time component to begin with, so the projection accomplishes nothing and is not needed. Therefore, Eq. (24.29) reduces to

$$(\rho + P)\,\vec a = -\vec\nabla P. \tag{24.30}$$

Recall from Exs. 1.29 and 22.11(b) that for a perfect fluid ρ + P is the inertial mass per unit volume. Therefore, Eq. (24.30) says that the fluid’s inertial mass per unit volume times its 4-acceleration is equal to the negative of its pressure gradient. Since both sides of Eq. (24.30) are purely spatially directed as seen in the fluid’s local proper reference frame, we can rewrite this equation in 3-dimensional language as (ρ + P )a = −∇P .

(24.31)

A Newtonian physicist, in the proper reference frame, would identify −a as the local gravitational acceleration, g [cf. Eq. (22.97)], and correspondingly would rewrite (24.30) as ∇P = (ρ + P )g .

(24.32)

Notice that this is the standard equation of hydrostatic equilibrium for a fluid in an earthbound laboratory (or swimming pool or lake or ocean), except for the presence of the pressure P in the inertial mass per unit volume. On earth the typical pressures of fluids, even deep in the ocean, are only P ≲ 10⁹ dyne/cm² ≃ 10⁻¹² g/cm³ ≲ 10⁻¹² ρ. Thus, to extremely good accuracy, one can ignore the contribution of pressure to the inertial mass density. However, deep inside a neutron star P may be within a factor 2 of ρ, so the contribution of P cannot be ignored.

We can convert the law of force balance (24.30) into an ordinary differential equation for the pressure P by evaluating its components in the fluid’s proper reference frame. The 4-acceleration (24.20) is purely radial, with radial component $a^{\hat r} = e^{-\Lambda}\Phi_{,r} = \Phi_{,\hat r}$. The gradient of the pressure is also purely radial, with radial component $P_{;\hat r} = P_{,\hat r} = e^{-\Lambda}P_{,r}$. Therefore, the law of force balance reduces to

$$\frac{dP}{dr} = -(\rho + P)\frac{d\Phi}{dr}. \tag{24.33}$$
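The estimate of terrestrial fluid pressures above expresses a pressure as a mass density by dividing by c², which the text sets to 1. The quick check below assumes c = 2.998 × 10¹⁰ cm/s. It takes liquid water, ρ ≈ 1 g/cm³, as the representative terrestrial fluid; that choice is ours, not the text's.

```python
C_CGS = 2.998e10          # speed of light, cm/s (assumed value)
P_DEEP_OCEAN = 1.0e9      # dyne/cm^2, the text's upper bound for terrestrial fluids
RHO_WATER = 1.0           # g/cm^3, assumed representative fluid density

# A pressure P adds P/c^2 to the inertial mass per unit volume, rho + P.
p_as_density = P_DEEP_OCEAN / C_CGS**2      # g/cm^3
fractional = p_as_density / RHO_WATER

print(f"P/c^2 = {p_as_density:.2e} g/cm^3, (P/c^2)/rho = {fractional:.1e}")
```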


24.3.4

Einstein Field Equation

Turn, now, to the Einstein field equation. In order to impose it, we must first compute, in our {t, r, θ, φ} coordinate system, the components of the Einstein tensor $G_{\alpha\beta}$. In general, the Einstein tensor has 10 independent components. However, the symmetries of the line element (24.18) impose identical symmetries on the Einstein tensor computed from it. The only nonzero components will be $G_{\hat t\hat t}$, $G_{\hat r\hat r}$, and $G_{\hat\theta\hat\theta} = G_{\hat\phi\hat\phi}$, and these three independent components will be functions of radius r only. Correspondingly, the Einstein equation will produce three independent differential equations for our four unknowns: the metric coefficients (“gravitational potentials”) Φ and Λ, and the radial distributions of density ρ and pressure P.

These three independent components of the Einstein equation will actually be redundant with the law of hydrostatic equilibrium (24.33). One can see this as follows. Suppose we had not yet imposed the law of 4-momentum conservation. Then the Einstein equation G = 8πT, together with the Bianchi identity $\vec\nabla\cdot\mathbf{G} \equiv 0$ [Eq. (23.78)], would enforce $\vec\nabla\cdot\mathbf{T} = 0$. More explicitly, our three independent components of the Einstein equation together would imply the law of radial force balance, i.e., of hydrostatic equilibrium (24.33). Since we have already imposed (24.33), we need evaluate only two of the three independent components of the Einstein equation; they will give us full information. A long and rather tedious calculation (best done on a computer) is based on the metric coefficients of (24.18) and on Eqs. (22.36)–(22.39), (23.57), (23.54), (23.56), and (23.77). It produces, for the time-time and radial-radial components of the Einstein tensor, and thence of the Einstein field equation,

$$G^{\hat 0\hat 0} = \frac{1}{r^2}\frac{d}{dr}\left[r\left(1 - e^{-2\Lambda}\right)\right] = 8\pi T^{\hat 0\hat 0} = 8\pi\rho, \tag{24.34}$$

$$G^{\hat r\hat r} = -\frac{1}{r^2}\left(1 - e^{-2\Lambda}\right) + \frac{2}{r}\,e^{-2\Lambda}\frac{d\Phi}{dr} = 8\pi T^{\hat r\hat r} = 8\pi P. \tag{24.35}$$

We can bring these components of the field equation into simpler form by defining a new metric coefficient m(r) by

$$e^{2\Lambda} \equiv \frac{1}{1 - 2m/r}. \tag{24.36}$$

Note [cf. Eqs. (24.1), (24.18), and (24.36)] that outside the star m is equal to the star’s total mass-energy M. In terms of m, the time-time component of the field equation (24.34) takes the form

$$\frac{dm}{dr} = 4\pi r^2\rho. \tag{24.37}$$

This form, plus the fact that m equals M outside the star, motivates the name mass inside radius r for the quantity m(r). In terms of m, the radial-radial component (24.35) of the field equation becomes

$$\frac{d\Phi}{dr} = \frac{m + 4\pi r^3 P}{r(r - 2m)}; \tag{24.38}$$

and combining this with (24.33), we obtain an alternative form of the equation of hydrostatic equilibrium:

$$\frac{dP}{dr} = -\frac{(\rho + P)(m + 4\pi r^3 P)}{r(r - 2m)}. \tag{24.39}$$

[This form is called the Tolman-Oppenheimer-Volkoff or TOV equation because it was first derived by Tolman (1939) and first used in a practical calculation by Oppenheimer and Volkoff (1939).] Equations (24.37), (24.38), (24.39), plus an equation of state for the pressure P of the stellar material in terms of its density of total mass-energy ρ,

$$P = P(\rho), \tag{24.40}$$

determine the four quantities Φ, m, ρ, and P as functions of radius.

Actually, for full determination, one also needs boundary conditions. Just as the surface of a sphere is everywhere locally Euclidean (i.e., is arbitrarily close to Euclidean in arbitrarily small regions), so also spacetime must be everywhere locally Lorentz; cf. Eqs. (23.15) and (23.16). For spacetime to be locally Lorentz at the star’s center (in particular, for circumferences of tiny circles around the center to equal 2π times their radii), m must vanish at the center: m = 0 at r = 0. Thus

$$m(r) = \int_0^r 4\pi r^2\rho\,dr; \tag{24.41}$$

cf. Eqs. (24.18) and (24.36). At the star’s surface the interior spacetime geometry (24.18) must join smoothly to the exterior Schwarzschild geometry (24.1), and hence

$$m = M \quad\text{and}\quad e^{2\Phi} = 1 - 2M/r \qquad \text{at } r = R. \tag{24.42}$$

24.3.5

Stellar Models and Their Properties

A little thought now reveals a straightforward method of producing a relativistic stellar model. (i) Specify an equation of state for the stellar material, P = P(ρ), and specify a central density $\rho_c$ or central pressure $P_c$ for the star. (ii) Integrate the coupled hydrostatic-equilibrium equation (24.39) and “mass equation” (24.37) outward from the center, beginning with the initial conditions m = 0 and P = $P_c$ at the center. (iii) Terminate the integration when the pressure falls to zero; this is the surface of the star. (iv) At the surface read off the value of m; it is the star’s total mass-energy M, which appears in the star’s external, Schwarzschild line element (24.1). (v) From this M and the radius r ≡ R of the star’s surface, read off the value of the gravitational potential Φ at the surface [Eq. (24.42)]. (vi) Integrate the Einstein field equation (24.38) inward from the surface toward the center to determine Φ as a function of radius inside the star.

Just six weeks after reading to the Prussian Academy of Science the letter in which Karl Schwarzschild derived his vacuum solution (24.1) of the field equation, Albert Einstein again presented the Academy with the results of Schwarzschild’s fertile mind: an exact solution for the structure of the interior of a star that has constant density ρ. [And just four months after that, on June 29, 1916, Einstein had the sad task of announcing to the Academy that Schwarzschild had died of an illness contracted on the Russian front.] In our notation, Schwarzschild’s solution for the interior of a star is characterized by its uniform density ρ, its total mass M, and its radius R. The radius is given in terms of ρ and M by

$$M = \frac{4\pi}{3}\rho R^3 \tag{24.43}$$

[Eq. (24.41)]. In terms of these quantities, the mass inside radius r, the pressure P, and the gravitational potential Φ are (Schwarzschild 1916b)

$$m = \frac{4\pi}{3}\rho r^3, \qquad P = \rho\,\frac{\left(1 - 2Mr^2/R^3\right)^{1/2} - \left(1 - 2M/R\right)^{1/2}}{3\left(1 - 2M/R\right)^{1/2} - \left(1 - 2Mr^2/R^3\right)^{1/2}}, \tag{24.44}$$

$$e^{\Phi} = \frac{3}{2}\left(1 - \frac{2M}{R}\right)^{1/2} - \frac{1}{2}\left(1 - \frac{2Mr^2}{R^3}\right)^{1/2}. \tag{24.45}$$
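Steps (i) through (iv) above translate directly into a short numerical integration. The sketch below works in geometrized units (G = c = 1) and integrates (24.37) and (24.39) outward with a fourth-order Runge-Kutta scheme. Several details are our choices, not the text's: the function names, the step size `dr`, the starting radius r = dr (which sidesteps the 0/0 at r = 0), and the linear interpolation to the surface. As a check, it uses a constant density, for which the result can be compared with Schwarzschild's central pressure, (24.44) at r = 0. Step (vi), the inward integration of (24.38) for Φ, is omitted.

```python
import math

def tov_rhs(r, m, p, rho_of_p):
    """dm/dr from Eq. (24.37) and dP/dr from the TOV equation (24.39); G = c = 1."""
    rho = rho_of_p(max(p, 0.0))          # equation of state (24.40), inverted to rho(P)
    dm_dr = 4.0 * math.pi * r**2 * rho
    dp_dr = -(rho + p) * (m + 4.0 * math.pi * r**3 * p) / (r * (r - 2.0 * m))
    return dm_dr, dp_dr

def integrate_star(p_c, rho_of_p, dr=1e-3, p_surface=0.0, r_max=1e6):
    """Steps (ii)-(iv): RK4 outward from the center until P drops to p_surface.

    Returns (R, M). Starts at r = dr with m = (4 pi / 3) rho_c r^3 to avoid 0/0 at r = 0.
    """
    r = dr
    m = 4.0 / 3.0 * math.pi * rho_of_p(p_c) * r**3
    p = p_c
    while r < r_max:
        k1 = tov_rhs(r, m, p, rho_of_p)
        k2 = tov_rhs(r + dr / 2, m + dr / 2 * k1[0], p + dr / 2 * k1[1], rho_of_p)
        k3 = tov_rhs(r + dr / 2, m + dr / 2 * k2[0], p + dr / 2 * k2[1], rho_of_p)
        k4 = tov_rhs(r + dr, m + dr * k3[0], p + dr * k3[1], rho_of_p)
        m_new = m + dr / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p_new = p + dr / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if p_new <= p_surface:
            # The surface lies inside this step: interpolate linearly to P = p_surface.
            frac = (p - p_surface) / (p - p_new)
            return r + frac * dr, m + frac * (m_new - m)
        r, m, p = r + dr, m_new, p_new
    raise RuntimeError("pressure did not reach p_surface before r_max")

def schwarzschild_interior_pc(M, R, rho):
    """Central pressure of Schwarzschild's uniform-density star, Eq. (24.44) at r = 0."""
    s = math.sqrt(1.0 - 2.0 * M / R)
    return rho * (1.0 - s) / (3.0 * s - 1.0)

rho0 = 1.0e-3                      # uniform density, in units of (length)^-2
R, M = integrate_star(0.2 * rho0, lambda p: rho0)
print("R =", R, " M =", M, " P_c(analytic)/P_c(input) =",
      schwarzschild_interior_pc(M, R, rho0) / (0.2 * rho0))
```

For another equation of state, pass its inverse ρ(P) as `rho_of_p`. For a polytrope P = Kρ^Γ (K and Γ are hypothetical parameters here), that would be `lambda p: (p / K) ** (1 / Gamma)`. A small positive `p_surface` may be needed in that case, because ρ → 0 at the surface and the pressure approaches zero only gradually. Repeating the integration for several central pressures yields a mass-radius sequence of the kind Ex. 24.6 asks for.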

We present these details less for their specific physical content than to illustrate the solution of the Einstein field equation in a realistic, astrophysically interesting situation. For discussions of the application of this formalism to neutron stars, where relativistic deviations from Newtonian theory can be rather strong, see, e.g., Shapiro and Teukolsky (1983). For the seminal work on the theory of neutron-star structure see Oppenheimer and Volkoff (1939). Among the remarkable consequences of the TOV equation of hydrostatic equilibrium (24.39) for neutron-star structure are these:

(i) If the mass m inside radius r ever gets close to r/2, the “gravitational pull” [right-hand side of (24.39)] becomes divergently large. This forces the pressure gradient that counterbalances it to be divergently large, and thereby drives the pressure quickly to zero as one integrates outward. This protects the static star from having M greater than R/2, i.e., from having its surface inside its gravitational radius.

(ii) The density of matter near the center of a neutron star is above that of an atomic nucleus, where the equation of state is ill-understood. Even so, we can be confident that there is an upper limit on the masses of neutron stars, a limit in the range $1.6M_\odot \lesssim M_{\max} \lesssim 3M_\odot$. This mass limit cannot be avoided by postulating that a more massive neutron star develops an arbitrarily large central pressure and thereby supports itself against gravitational implosion. The reason is that an arbitrarily large central pressure is self-defeating. The “gravitational pull” on the right-hand side of (24.39) is quadratic in the pressure at very high pressures, whereas it would be independent of pressure in Newtonian theory. This purely relativistic feature guarantees that a star with too high a central pressure will be unable to support itself against the resulting “quadratically too high” gravitational pull.
We conclude this section by introducing a useful technique for visualizing spacetime curvature: the embedding of the curved spacetime, or a piece of it, in a flat space of higher dimensionality. The geometry of a curved, n-dimensional manifold is characterized by ½n(n + 1) metric components, since those components form a symmetric n × n matrix. Of these, only ½n(n + 1) − n = ½n(n − 1) are of coordinate-independent significance, because we are free to choose the n coordinates of our coordinate system arbitrarily and can thereby force n of the metric components to take on any desired values, e.g., zero. If this n-dimensional manifold is embedded in a flat N-dimensional manifold, that embedding will be described by expressing N − n of the embedding manifold’s Euclidean (or Lorentz) coordinates in terms of the other n. Thus, the embedding will be characterized by N − n functions of n variables. For the embedding to be possible in general, this number of choosable functions must be at least as large as the number of significant metric coefficients, ½n(n − 1). From this argument we conclude that the dimensionality of the embedding space must be N ≥ ½n(n + 1). Actually, this argument analyzes only the local features of the embedding. If one also wants to preserve the global topology of the n-dimensional manifold, one must in general go to an embedding space of even higher dimensionality.

Curved spacetime has n = 4 dimensions and thus requires for its local embedding a flat space with N = 10 dimensions. This is a bit much for 3-dimensional beings like us to visualize. As a sop to our visual limitations, we might reduce our ambitions and seek only to extract a 3-surface from curved spacetime and visualize it by embedding it in a flat space; that requires a flat space of N = 6 dimensions. This is still a bit much. In frustration, we are driven to extract from spacetime n = 2 dimensional surfaces and visualize them by embedding them in flat spaces with N = 3 dimensions. This is doable and, indeed, instructive.

As a nice example, consider the equatorial “plane” through the spacetime of a static spherical star at a specific “moment” of coordinate time t, i.e., the 2-surface t = const, θ = π/2 in the spacetime of Eqs. (24.18) and (24.36). The line element on this equatorial 2-surface is

$${}^{(2)}ds^2 = \frac{dr^2}{1 - 2m/r} + r^2\,d\phi^2, \qquad \text{where}\quad m = m(r) = \int_0^r 4\pi r^2\rho\,dr; \tag{24.46}$$

cf. Eq. (24.41). We seek to construct, in a 3-dimensional Euclidean space, a 2-dimensional surface with precisely this same 2-geometry.
As an aid, introduce in the Euclidean embedding space a cylindrical coordinate system r, z, φ, in terms of which the space’s 3-dimensional line element is

$${}^{(3)}ds^2 = dr^2 + dz^2 + r^2\,d\phi^2. \tag{24.47}$$

The surface we seek to embed is axially symmetric, so we can describe its embedding by the value of z on it as a function of radius r: z = z(r). Inserting this (unknown) embedding function into (24.47), we obtain for the surface’s 2-geometry

$${}^{(2)}ds^2 = \left[1 + (dz/dr)^2\right]dr^2 + r^2\,d\phi^2. \tag{24.48}$$

Comparing with our original expression (24.46) for the 2-geometry, we obtain a differential equation for the embedding function:

$$\frac{dz}{dr} = \left(\frac{1}{1 - 2m/r} - 1\right)^{1/2}. \tag{24.49}$$

If we set z = 0 at the star’s center, then the solution of this differential equation is

$$z = \int_0^r \frac{dr}{\left[(r/2m) - 1\right]^{1/2}}. \tag{24.50}$$
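Equation (24.50) is easy to evaluate numerically. The sketch below assumes a uniform-density star, so that (24.41) gives m = M r³/R³ inside and m = M outside. It uses R = 5M, i.e., 2.5 times the gravitational radius 2M, which is the case of Fig. 24.1. The midpoint-rule quadrature and the function names are our choices. Outside the star, the computed differences z(r₂) − z(r₁) should agree with the exterior form √(8M(r − 2M)) + constant given in Eq. (24.51) below.

```python
import math

def dz_dr(r, M, R):
    """Slope of the embedded surface, Eq. (24.49), for a star of uniform density."""
    m = M * (r / R) ** 3 if r < R else M      # mass inside radius r, Eq. (24.41)
    return math.sqrt(2.0 * m / (r - 2.0 * m))

def embed_z(r, M, R, n=20000):
    """Height z(r) of Eq. (24.50), with z = 0 at the center, by the composite midpoint rule."""
    h = r / n
    return h * sum(dz_dr((i + 0.5) * h, M, R) for i in range(n))

M, R = 1.0, 5.0                    # R = 2.5 x 2M, as in Fig. 24.1 (lengths in units of M)
for r in (1.0, 2.5, 5.0, 10.0, 15.0):
    print(f"r = {r:5.1f} M   z = {embed_z(r, M, R):.4f} M")
```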

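[Figure 24.1 (image not reproduced): the embedded surface z(r), drawn in Euclidean axes x, y, z, with its interior-of-star and exterior-of-star portions labeled.]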

Fig. 24.1: Embedding diagram depicting an equatorial, 2-dimensional slice t = const, θ = π/2 through the spacetime of a spherical star with uniform density ρ and with radius R equal to 2.5 times the gravitational radius 2M . See Ex. 24.4 for details.

Near the star’s center, m(r) is given by m = (4π/3)$\rho_c$r³, where $\rho_c$ is the star’s central density. Outside the star, m(r) is equal to the star’s r-independent total mass M. Correspondingly, in these two regions Eq. (24.50) reduces to

$$z = \sqrt{(2\pi/3)\rho_c}\;r^2 \quad \text{at } r \text{ very near zero}; \qquad z = \sqrt{8M(r - 2M)} + \text{constant} \quad \text{at } r > R,\ \text{i.e., outside the star}. \tag{24.51}$$

Figure 24.1 shows the embedded 2-surface z(r) for a star of uniform density ρ = const; cf. Ex. 24.4. For any other star the embedding diagram will be qualitatively similar, though quantitatively different. The most important feature of this embedding diagram is what it shows about circumference and radial distance; this is also clear in the original line element (24.46). As one moves outward from the star’s center, its circumference 2πr increases less rapidly than the proper radial distance travelled, $l = \int_0^r (1 - 2m/r)^{-1/2}\,dr$. As a specific example, the distance from the center of the earth to a perfect circle near the earth’s surface exceeds circumference/2π by about 1.5 millimeters. The smallness of this number compared with the actual radius, 6.4 × 10⁸ cm, is a measure of the weakness of the curvature of spacetime near earth. As a more extreme example, the distance from the center of a massive neutron star to its surface is about one kilometer greater than circumference/2π, i.e., greater by roughly 10 percent of the ∼10 km circumference/2π. Correspondingly, in an embedding diagram for the earth analogous to Fig. 24.1, the embedded surface would be so nearly flat that its downward dip at the center would be noticeable only with great effort. The embedding diagram for a neutron star, by contrast, would show a downward dip about like that of Fig. 24.1.

****************************

EXERCISES

Exercise 24.3 Example: Schwarzschild Geometry in Isotropic Coordinates

(a) It turns out that the following line element is a solution of the vacuum Einstein field equation G = 0:

$$ds^2 = -\left(\frac{1 - M/2\bar r}{1 + M/2\bar r}\right)^2 dt^2 + \left(1 + \frac{M}{2\bar r}\right)^4\left[d\bar r^2 + \bar r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right]. \tag{24.52}$$

Since this solution is spherically symmetric, Birkhoff’s theorem guarantees that it represents the standard Schwarzschild spacetime geometry, in a coordinate system that differs from Schwarzschild’s. Show that this is so by exhibiting a coordinate transformation that converts this line element into (24.1). Note: the t, r̄, θ, φ coordinates are called isotropic because in them the spatial part of the line element is a function of r̄ times the 3-dimensional Euclidean line element. Euclidean geometry picks out no preferred spatial directions at any point in space, i.e., it is isotropic. (b) Show that at large radii r ≫ M, the line element (24.52) takes the form (23.112) discussed in Chap. 23, but with vanishing spin angular momentum S = 0.

Exercise 24.4 Example: Star of Uniform Density

(a) Show that the embedding surface of Eq. (24.50) is a paraboloid of revolution everywhere outside the star. (b) Show that in the interior of a uniform-density star, the embedding surface is a segment of a sphere. (c) Show that the interior is matched to the exterior in such a way that, in the embedding space, the embedded surface shows no kink (no bend) at r = R. (d) Show that circumference/2π for a star is less than the distance from the center to the surface by an amount of order the star’s Schwarzschild radius 2M. Evaluate this amount analytically for a star of uniform density, and numerically (approximately) for the earth and for a neutron star.

Exercise 24.5 Example: Gravitational Redshift

Consider a photon emitted by an atom at rest on the surface of a static star with mass M and radius R.
Analyze the photon’s motion in the Schwarzschild coordinate system of the star’s exterior, r ≥ R > 2M. In particular, compute the “gravitational redshift” of the photon by the following steps. (a) Since the emitting atom is very nearly an “ideal clock,” it gives the emitted photon very nearly the same frequency $\nu_{\rm em}$, as measured in the atom’s proper reference frame, as it would give were it in an earth laboratory or floating in free space. Thus, the proper reference frame of the emitting atom is central to a discussion of the photon’s properties and behavior. Show that the basis vectors of that proper reference frame are

$$\vec e_{\hat 0} = \frac{1}{\sqrt{1 - 2M/r}}\,\frac{\partial}{\partial t}, \quad \vec e_{\hat r} = \sqrt{1 - 2M/r}\,\frac{\partial}{\partial r}, \quad \vec e_{\hat\theta} = \frac{1}{r}\frac{\partial}{\partial\theta}, \quad \vec e_{\hat\phi} = \frac{1}{r\sin\theta}\frac{\partial}{\partial\phi}. \tag{24.53}$$

As part of your proof, show that these basis vectors are orthonormal.

(b) Explain why $h\nu_{\rm em} = -p_{\hat 0} = -\vec p\cdot\vec e_{\hat 0}$ at the moment of photon emission. (Here and below, h is Planck’s constant and $\vec p$ is the photon’s 4-momentum.) (c) Show that the time component of the photon 4-momentum in the Schwarzschild coordinate basis is $p_t = -\sqrt{1 - 2M/R}\;h\nu_{\rm em}$ at the moment of emission.

(d) Show that as the photon flies out (radially or nonradially) toward r = ∞, the coordinate-time component of its 4-momentum, $p_t$, is conserved. [Hint: recall the result of Ex. 23.4(a).] (e) Show that when the photon is received by an observer at rest relative to the star and very far away from it, that observer measures it to have frequency $\nu_{\rm rec} = -p_t/h$. (f) Show that the photon is redshifted by an amount

$$\frac{\lambda_{\rm rec} - \lambda_{\rm em}}{\lambda_{\rm em}} = \frac{1}{\sqrt{1 - 2M/R}} - 1, \tag{24.54}$$

where $\lambda_{\rm rec}$ is the wavelength that the photon’s spectral line exhibits at the receiver and $\lambda_{\rm em}$ is the wavelength that the emitting kind of atom would produce in an earth laboratory. Note that for a nearly Newtonian star, i.e., one with R ≫ M, this redshift becomes ≃ M/R = GM/Rc². (g) Evaluate this redshift for the earth, for the sun, and for a 1.4-solar-mass, 10-kilometer-radius neutron star.

Exercise 24.6 Challenge: Mass-Radius Relation for Real Neutron Stars

Choose a physical equation of state from the alternatives presented in Shapiro & Teukolsky (1983) and represent it numerically. Then integrate the TOV equation, starting with several suitable central pressures, and deduce a mass-radius relation. You should find that as the central pressure is increased, the mass passes through a maximum while the radius continues to decrease. (Solutions with radii smaller than that associated with the maximum mass are unstable to radial perturbations.)
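For orientation on Exs. 24.4(d) and 24.5(g), the sketch below evaluates two quantities. The first is the redshift (24.54). The second is, for a uniform-density model, the excess of center-to-surface proper distance over circumference/2π, using $l = \int_0^R (1 - 2Mr^2/R^3)^{-1/2}dr = (R^3/2M)^{1/2}\arcsin\sqrt{2M/R}$. The masses and radii (in SI units) are assumptions of this sketch, as is the uniform-density idealization of the earth and the neutron star. Its numbers are therefore estimates, not the exercises' intended answers.

```python
import math

G_SI = 6.674e-11       # m^3 kg^-1 s^-2 (assumed value)
C_SI = 2.998e8         # m/s (assumed value)

def geometrized_mass(m_kg):
    """GM/c^2 in meters: the mass M of the text's G = c = 1 units."""
    return G_SI * m_kg / C_SI**2

def redshift(M, R):
    """(lambda_rec - lambda_em)/lambda_em from Eq. (24.54); M and R in the same length unit."""
    return 1.0 / math.sqrt(1.0 - 2.0 * M / R) - 1.0

def radial_excess_uniform(M, R):
    """Proper center-to-surface distance minus circumference/2pi, uniform-density star."""
    x = math.sqrt(2.0 * M / R)
    return math.sqrt(R**3 / (2.0 * M)) * math.asin(x) - R

bodies = {                                  # (mass in kg, radius in m), assumed values
    "earth": (5.972e24, 6.371e6),
    "sun": (1.989e30, 6.96e8),
    "neutron star": (1.4 * 1.989e30, 1.0e4),
}
for name, (mass_kg, radius_m) in bodies.items():
    M = geometrized_mass(mass_kg)
    print(f"{name:13s} redshift = {redshift(M, radius_m):.3e}, "
          f"excess = {radial_excess_uniform(M, radius_m):.3e} m")
```

With these inputs, the earth's excess comes out near 1.5 mm and the neutron star's near 0.9 km, consistent in order of magnitude with the figures quoted in the text.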

****************************

24.4

Gravitational Implosion of a Star to Form a Black Hole

J. Robert Oppenheimer (then a professor jointly at the University of California at Berkeley and at Caltech), upon discovering with his student George Volkoff that there is a maximum mass limit for neutron stars (Oppenheimer and Volkoff 1939), was forced to consider the possibility that a more massive star, when it exhausts its nuclear fuel, will implode to radii R ≤ 2M. With his graduate student Hartland Snyder, Oppenheimer, just before the outbreak of World War II, investigated the details of such an implosion for the idealized case of a perfectly spherical star in which all the internal pressure is suddenly extinguished; see Oppenheimer and Snyder (1939). In this section we shall repeat their analysis, though from a more modern viewpoint and using somewhat different arguments.²

² For further details, see MTW Chapters 31 and 32.

By Birkhoff’s theorem, the spacetime geometry outside an imploding, spherical star must be that of Schwarzschild. This means, in particular, that an imploding, spherical star cannot produce any gravitational waves; such waves would break the spherical symmetry. By contrast, a star that implodes nonspherically can produce a strong burst of gravitational waves; see Chap. 25.

Since the spacetime geometry outside an imploding, spherical star is that of Schwarzschild, we can depict the motion of the star’s surface by a world line in a 2-dimensional spacetime diagram with Schwarzschild coordinate time t plotted upward and Schwarzschild coordinate radius r plotted rightward (Fig. 24.2). The world line of the star’s surface is an ingoing curve. The region to the left of the world line must be discarded and replaced by the spacetime of the star’s interior, while the region to the right, r > R(t), is correctly described by Schwarzschild.

Fig. 24.2 [axes: t/M upward, r/M rightward]: Spacetime diagram depicting in Schwarzschild coordinates the gravitationally induced implosion of a star. The thick solid curve is the world line of the star’s surface, r = R(t) in the external Schwarzschild coordinates. The stippled region to the left of that world line is not correctly described by the Schwarzschild line element (24.1); it requires for its description the spacetime metric of the star’s interior.

As for a static star, so also for an imploding one: because real atoms with finite rest masses live on the star’s surface, the world line of that surface, {r = R(t), θ and φ constant}, must be timelike. Consequently, at each point along the world line it must lie within the local light cone. Let us examine those light cones.

The radial edges of the light cones are generated by the world lines of radially traveling photons, i.e., photons with world lines of constant θ, φ and varying t, r. (Spherical symmetry dictates that if a photon starts out traveling radially, it will always continue to travel radially.) We could, if we wished, compute the world lines of these photons from their geodesic equation. However, knowing already that they have constant θ, φ, we can compute them more simply from the fact that they must be null world lines. Setting to zero the ds² of Eq. (24.1), we see that along these null, radial world lines

$$0 = ds^2 = -\left(1-\frac{2M}{r}\right)dt^2 + \frac{dr^2}{1-2M/r}\,;\quad\text{i.e.,}\quad \frac{dt}{dr} = \pm\frac{1}{1-2M/r}\,. \tag{24.55}$$

Integrating this differential equation, we obtain

$$r + 2M\ln\left|\frac{r}{2M}-1\right| = \pm t + \text{const}\,. \tag{24.56}$$

Several of these photon world lines (24.56) are depicted in Fig. 24.3, along with some of the local light cones (24.55). The light cones are drawn with one of the two θ, φ angular coordinates restored. The extreme rightward and extreme leftward edges of each cone are short segments of the radial photon trajectories, as given by Eq. (24.55), while the in-between parts are segments of trajectories of photons with nonzero dθ and/or dφ. Notice that the light cones do not have 45-degree opening angles as they do in a Lorentz frame of flat spacetime. This is a peculiarity due not to spacetime curvature, but rather to the nature of the Schwarzschild coordinates: if, at any chosen event of the Schwarzschild spacetime, we were to introduce a local Lorentz frame, then in that frame the light cones would have 45-degree opening angles. Thus, the “squeezing down” of the light cones as one approaches r = 2M from r > 2M, in Fig. 24.3, signals not a peculiarity of the frame-independent spacetime geometry at r = 2M, but rather a peculiarity of the Schwarzschild coordinates there.

Fig. 24.3 [axes: t/M upward, r/M rightward; the line r = 2M is marked]: Some radial photon world lines [Eq. (24.56)] and some light cones [Eq. (24.55)] in the Schwarzschild spacetime, depicted in Schwarzschild coordinates. We draw the light cones inside r = 2M opening leftward rather than rightward for reasons explained below [cf. Fig. 24.5].

Since the world line of the star’s surface is confined to the interiors of the local light cones, the squeezing down of the light cones near r = 2M [the fact that dt/dr → ±∞ in Eq. (24.55)] prevents the star’s world line r = R(t) from ever, in any finite coordinate time t, reaching the gravitational radius, r = 2M.
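The divergence can be made concrete numerically. The Python sketch below is our own illustration in geometrized units (G = c = 1) with M = 1, and its function names are hypothetical. It evaluates the left side of Eq. (24.56) for an ingoing radial photon and uses a finite difference to check that this expression integrates the slope (24.55). It then prints the Schwarzschild coordinate time the photon needs to travel from r = 10M to r = 2M(1 + ε). That time grows like 2M ln(1/ε), without bound, as ε → 0.

```python
import math


def tortoise(r, M=1.0):
    """Left side of Eq. (24.56): r + 2M ln|r/2M - 1| (geometrized units)."""
    return r + 2.0 * M * math.log(abs(r / (2.0 * M) - 1.0))


def light_cone_slope(r, M=1.0):
    """|dt/dr| on the radial edges of the light cone, Eq. (24.55)."""
    return 1.0 / (1.0 - 2.0 * M / r)


def ingoing_time(r_start, r_end, M=1.0):
    """Coordinate time for an ingoing radial photon to go from r_start to r_end (> 2M).

    On ingoing rays Eq. (24.56) reads tortoise(r) = -t + const, so t increases
    by tortoise(r_start) - tortoise(r_end).
    """
    return tortoise(r_start, M) - tortoise(r_end, M)


# Finite-difference check that (24.56) integrates (24.55) at r = 5M.
r0, h = 5.0, 1e-5
numeric_slope = (tortoise(r0 + h) - tortoise(r0 - h)) / (2.0 * h)
print(f"d/dr of (24.56): {numeric_slope:.8f}   (24.55): {light_cone_slope(r0):.8f}")

# Coordinate time from r = 10M to r = 2M(1 + eps) grows like 2M ln(1/eps).
for eps in (1e-1, 1e-3, 1e-6, 1e-9):
    print(f"eps = {eps:.0e}:  t = {ingoing_time(10.0, 2.0 * (1.0 + eps)):7.2f} M")
```

Each factor of 1000 in ε adds about 2M ln 1000 ≈ 13.8M to t. No finite t ever reaches r = 2M.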

This conclusion is completely general; it relies in no way whatsoever on the details of what is going on inside the star or at its surface. It is just as valid for completely realistic stellar implosion (with finite pressure and shock waves) as for the idealized, Oppenheimer-Snyder case of zero-pressure implosion.

In the special case of zero pressure, one can explore the details further. Because no pressure forces act on the atoms at the star’s surface, those atoms must move inward along radial geodesic world lines. Correspondingly, the world line of the star’s surface in the external Schwarzschild spacetime must be a timelike geodesic of constant (θ, φ). In Ex. 24.7 the geodesic equation is solved to determine that world line R(t), with a conclusion that agrees with the above argument: only after a lapse of infinite coordinate time t does the star’s surface reach the gravitational radius r = 2M. A byproduct of that calculation is equally remarkable: although the implosion to R = 2M requires infinite Schwarzschild coordinate time t, it requires only a finite proper time τ as measured by an observer who rides inward on the star’s surface. In fact, the proper time is

$$\tau \simeq \frac{\pi}{2}\left(\frac{R_o^3}{2M}\right)^{1/2} = 15\ \text{microseconds}\,\left(\frac{R_o}{2M}\right)^{3/2}\frac{M}{M_\odot} \qquad\text{if } R_o \gg 2M\,, \tag{24.57}$$

where R_o is the star’s initial radius when it first begins to implode freely, M⊙ denotes the mass of the sun, and proper time τ is measured from the start of implosion. Note that this implosion time is equal to 1/(4√2) times the orbital period of a test particle at the radius of the star’s initial surface. For a star with mass and initial radius equal to those of the sun, τ is about 30 minutes; for a neutron star that has been pushed over the maximum mass limit by accretion of matter from its surroundings, τ is about 0.1 milliseconds.

What happens to the star’s surface, and to an observer on it, when, after infinite coordinate time but tiny proper time, it reaches the gravitational radius? There are two possibilities: (i) the tidal gravitational forces there might be so strong that they destroy the star’s surface and any observers on it; or (ii) if the tidal forces are not that strong, then the star and observers must continue to exist, moving into a region of spacetime (presumably r < 2M) that is not smoothly joined onto r > 2M in the Schwarzschild coordinate system. In the latter case the pathology is due entirely to poor properties of Schwarzschild’s coordinates. In the former case it is due to an intrinsic, coordinate-independent singularity of the tide-producing Riemann curvature.

To see which is the case, we must evaluate the tidal forces felt by observers on the surface of the imploding star. Those tidal forces are produced by the Riemann curvature tensor. More specifically, if an observer’s feet and head have a vector separation ξ at time τ as measured by the observer’s clock, then the curvature of spacetime will exert on them a relative gravitational acceleration given by the equation of geodesic deviation, in the form appropriate to a local Lorentz frame [Eq. (23.42)]:

$$\frac{d^2\xi^{\bar j}}{d\tau^2} = -R^{\bar j}{}_{\bar 0\bar k\bar 0}\,\xi^{\bar k}\,. \tag{24.58}$$

Here the barred indices denote components in the observer’s local Lorentz frame. The tidal forces will become infinite, and will thereby destroy the observer and all forms of matter on the star’s surface, if and only if the local Lorentz Riemann components $R_{\bar j\bar 0\bar k\bar 0}$ diverge as the star’s surface approaches the gravitational radius. Thus, to test whether the observer and star survive, we must compute the components of the Riemann curvature tensor in the local Lorentz frame of the star’s imploding surface.

The easiest way to compute those components is by a transformation from components as measured in the proper reference frames of observers who are “at rest” (fixed r, θ, φ) in the Schwarzschild spacetime. At each event on the world tube of the star’s surface, then, we have two orthonormal frames: one (barred indices), a local Lorentz frame imploding with the star; the other (hatted indices), a proper reference frame at rest. Since the metric coefficients in these two bases have the standard flat-space form $g_{\bar\alpha\bar\beta} = \eta_{\alpha\beta}$, $g_{\hat\alpha\hat\beta} = \eta_{\alpha\beta}$, the bases must be related by a Lorentz transformation [cf. Eq. (1.47b) and associated discussion]. A little thought makes it clear that the required transformation matrix is that for a pure boost [Eq. (1.49a)]:

$$L^{\hat 0}{}_{\bar 0} = L^{\hat r}{}_{\bar r} = \gamma\,,\quad L^{\hat 0}{}_{\bar r} = L^{\hat r}{}_{\bar 0} = -\beta\gamma\,,\quad L^{\hat\theta}{}_{\bar\theta} = L^{\hat\phi}{}_{\bar\phi} = 1\,;\qquad \gamma = \frac{1}{\sqrt{1-\beta^2}}\,, \tag{24.59}$$

with β the speed of implosion of the star’s surface, as measured in the proper reference frame of the static observer when the surface flies by. The transformation law for the components of the Riemann tensor has, of course, the standard form for any fourth-rank tensor:

$$R_{\bar\alpha\bar\beta\bar\gamma\bar\delta} = L^{\hat\mu}{}_{\bar\alpha}\,L^{\hat\nu}{}_{\bar\beta}\,L^{\hat\lambda}{}_{\bar\gamma}\,L^{\hat\sigma}{}_{\bar\delta}\,R_{\hat\mu\hat\nu\hat\lambda\hat\sigma}\,. \tag{24.60}$$

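The basis vectors of the proper reference frame are given by Eq. (24.21), specialized to the star’s Schwarzschild exterior and with r set equal to the momentary radius R of the star’s surface [cf. also Ex. 24.5(a)]:

$$\vec e_{\hat 0} = \frac{1}{\sqrt{1-2M/R}}\,\frac{\partial}{\partial t}\,,\quad \vec e_{\hat r} = \sqrt{1-2M/R}\,\frac{\partial}{\partial r}\,,\quad \vec e_{\hat\theta} = \frac{1}{R}\,\frac{\partial}{\partial\theta}\,,\quad \vec e_{\hat\phi} = \frac{1}{R\sin\theta}\,\frac{\partial}{\partial\phi}\,. \tag{24.61}$$

This is the Schwarzschild orthonormal basis used in Box 24.2; and from that Box we learn that the components of Riemann in this basis are

$$R_{\hat 0\hat r\hat 0\hat r} = -\frac{2M}{R^3}\,,\quad R_{\hat 0\hat\theta\hat 0\hat\theta} = R_{\hat 0\hat\phi\hat 0\hat\phi} = +\frac{M}{R^3}\,,\quad R_{\hat\theta\hat\phi\hat\theta\hat\phi} = \frac{2M}{R^3}\,,\quad R_{\hat r\hat\theta\hat r\hat\theta} = R_{\hat r\hat\phi\hat r\hat\phi} = -\frac{M}{R^3}\,. \tag{24.62}$$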

These are the components measured by static observers. By inserting these static-observer components and the Lorentz-transformation matrix (24.59) into the transformation law (24.60), we reach our goal, the following components of Riemann in the local Lorentz frame of the star’s freely imploding surface:

$$R_{\bar 0\bar r\bar 0\bar r} = -\frac{2M}{R^3}\,,\quad R_{\bar 0\bar\theta\bar 0\bar\theta} = R_{\bar 0\bar\phi\bar 0\bar\phi} = +\frac{M}{R^3}\,,\quad R_{\bar\theta\bar\phi\bar\theta\bar\phi} = \frac{2M}{R^3}\,,\quad R_{\bar r\bar\theta\bar r\bar\theta} = R_{\bar r\bar\phi\bar r\bar\phi} = -\frac{M}{R^3}\,. \tag{24.63}$$
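The statement that (24.63) coincides with (24.62) can be checked numerically by carrying out the transformation (24.60) component by component. The Python sketch below builds the full static-frame Riemann tensor from the independent components (24.62), applies the boost (24.59) for several speeds β, and reports the largest change in any component. It then converts the radial tidal acceleration (2M/R³)ξ at R = 2M into SI units. Several inputs are illustrative choices of ours, not values from the text: the boost speeds, the radius R = 2.0001M, the 2-meter head-to-foot separation, and the rounded SI constants. The 5 × 10⁹ M⊙ mass is the one used in Ex. 24.8.

```python
import itertools
import math

IDX = range(4)   # orthonormal indices: 0 = time, 1 = r, 2 = theta, 3 = phi


def zeros4():
    return [[[[0.0] * 4 for _ in IDX] for _ in IDX] for _ in IDX]


def riemann_static(M, R):
    """All components R_{abcd} in the static frame from Eq. (24.62), geometrized units.

    Only the independent components are listed; the rest follow from
    R_abcd = -R_bacd = -R_abdc = R_cdab.
    """
    k = M / R**3
    independent = {(0, 1): -2.0 * k, (0, 2): k, (0, 3): k,
                   (2, 3): 2.0 * k, (1, 2): -k, (1, 3): -k}
    riem = zeros4()
    for (a, b), val in independent.items():
        riem[a][b][a][b] = riem[b][a][b][a] = val
        riem[a][b][b][a] = riem[b][a][a][b] = -val
    return riem


def boost(beta):
    """L[mu_hat][alpha_bar] of Eq. (24.59): radial boost, surface moving inward at speed beta."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    L = [[0.0] * 4 for _ in IDX]
    L[0][0] = L[1][1] = gamma
    L[0][1] = L[1][0] = -beta * gamma
    L[2][2] = L[3][3] = 1.0
    return L


def transform(riem, L):
    """Eq. (24.60): R_bar_{abcd} = L^m_a L^n_b L^l_c L^s_d R_hat_{mnls}."""
    out = zeros4()
    for a, b, c, d in itertools.product(IDX, repeat=4):
        out[a][b][c][d] = sum(
            L[m][a] * L[n][b] * L[l][c] * L[s][d] * riem[m][n][l][s]
            for m, n, l, s in itertools.product(IDX, repeat=4))
    return out


def max_difference(A, B):
    return max(abs(A[a][b][c][d] - B[a][b][c][d])
               for a, b, c, d in itertools.product(IDX, repeat=4))


static = riemann_static(1.0, 2.0001)       # M = 1, surface just outside R = 2M
for beta in (0.3, 0.9, 0.999):
    diff = max_difference(transform(static, boost(beta)), static)
    print(f"beta = {beta}:  max |R_bar - R_hat| = {diff:.1e}")

# Radial tidal stretching at R = 2M from (24.58) and (24.63): d^2 xi/dtau^2 = (2M/R^3) xi.
G, C, M_SUN = 6.674e-11, 2.998e8, 1.989e30   # SI, rounded (our assumption)


def horizon_tidal_acceleration(mass_kg, xi_m):
    """(2GM/R^3) * xi evaluated at R = 2GM/c^2, in m/s^2."""
    r_h = 2.0 * G * mass_kg / C**2
    return 2.0 * G * mass_kg / r_h**3 * xi_m


for solar_masses in (10.0, 5e9):
    acc = horizon_tidal_acceleration(solar_masses * M_SUN, 2.0)
    print(f"M = {solar_masses:.0e} Msun: tidal acceleration over 2 m at R = 2M: {acc:.2e} m/s^2")
```

The boosted components match the static ones to round-off for every β, so the invariance holds for all components, not only the four listed in (24.63). The horizon tidal acceleration is finite and scales as 1/M². It is enormous for a stellar-mass hole and tiny for a very massive one.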

These components are remarkable in two ways. First, they remain perfectly finite as the star’s surface approaches the gravitational radius; correspondingly, tidal gravity cannot destroy the star or the observers on its surface. Second, the components of Riemann are identically the same in the two orthonormal frames, hatted and barred, which move radially at finite speed β with respect to each other [expressions (24.63) are independent of β and are the same as (24.62)]. This is a result of the very special algebraic structure that Riemann’s components have for the Schwarzschild spacetime; it will not be true in typical spacetimes.

From the finiteness of the components of Riemann in the surface’s local Lorentz frame, we conclude that something must be wrong with Schwarzschild’s t, r, θ, φ coordinate system in the vicinity of the gravitational radius r = 2M: although nothing catastrophic happens to the star’s surface as it approaches 2M, those coordinates refuse to describe passage through r = 2M in a reasonable, smooth, finite way. Thus, in order to study the implosion as it passes through the gravitational radius and beyond, we shall need a new, improved coordinate system.

Several coordinate systems have been devised for this purpose. For a study and comparison of them see, e.g., Chap. 31 of MTW. In this chapter we shall confine ourselves to one: a coordinate system devised for other purposes by Arthur Eddington (1922), then long forgotten, and only rediscovered independently and used for this purpose by David Finkelstein (1958). Yevgeny Lifshitz, of Landau-Lifshitz fame, told one of the authors many years later what an enormous impact Finkelstein’s coordinate system had on people’s understanding of the implosion of stars. “You cannot appreciate how difficult it was for the human mind before Finkelstein to understand [the Oppenheimer-Snyder analysis of stellar implosion],” Lifshitz said.
When, nineteen years after Oppenheimer and Snyder, the issue of the Physical Review containing Finkelstein’s paper arrived in Moscow, suddenly everything was clear. Finkelstein, a postdoctoral fellow at the Stevens Institute of Technology in Hoboken, New Jersey, found the following simple transformation, which moves the region t = ∞, r = 2M of Schwarzschild coordinates in to a finite location. His transformation involves introducing a new time coordinate

$$\tilde t = t + 2M\ln\left|\frac{r}{2M}-1\right|\,, \tag{24.64}$$

but leaving unchanged the radial and angular coordinates. Figure 24.4 shows the surfaces of constant Eddington-Finkelstein time t̃ in Schwarzschild coordinates, and the surfaces of constant Schwarzschild time t in Eddington-Finkelstein coordinates. Notice, as advertised, that t = ∞, r = 2M is moved to a finite Eddington-Finkelstein location.

Fig. 24.4 [panel (a): t/M upward, r/M rightward; panel (b): t̃/M upward, r/M rightward; surfaces labeled by coordinate-time values 0, 2, 4, 6 in units of M]: (a) The 3-surfaces of constant Eddington-Finkelstein time coordinate t̃ drawn in a Schwarzschild spacetime diagram, with the angular coordinates θ, φ suppressed. (b) The 3-surfaces of constant Schwarzschild time coordinate t drawn in an Eddington-Finkelstein spacetime diagram, with angular coordinates suppressed.

By inserting the coordinate transformation (24.64) into the Schwarzschild line element (24.1), we obtain the following line element for Schwarzschild spacetime written in Eddington-Finkelstein coordinates:

$$ds^2 = -\left(1-\frac{2M}{r}\right)d\tilde t^{\,2} + \frac{4M}{r}\,d\tilde t\,dr + \left(1+\frac{2M}{r}\right)dr^2 + r^2\left(d\theta^2+\sin^2\theta\,d\phi^2\right). \tag{24.65}$$

Notice that, by contrast with the line element in Schwarzschild coordinates, none of the metric coefficients diverge as r approaches 2M. Moreover, in an Eddington-Finkelstein spacetime diagram, by contrast with Schwarzschild, the light cones do not pinch down to slivers at r = 2M [compare Figs. 24.5(a) and 24.5(b)]. The world lines of radial light rays are computable in Eddington-Finkelstein, as in Schwarzschild, by setting ds² = 0 (null world lines) and dθ = dφ = 0 (radial world lines) in the line element. The result, depicted in Fig. 24.5(a), is

$$\frac{d\tilde t}{dr} = -1\ \ \text{for ingoing rays}\,;\qquad \frac{d\tilde t}{dr} = \frac{1+2M/r}{1-2M/r}\ \ \text{for outgoing rays}\,. \tag{24.66}$$

Note that the ingoing light rays plunge unimpeded through r = 2M and in to r = 0 along 45-degree lines in the Eddington-Finkelstein coordinate system. The outgoing light rays, by contrast, are never able to escape outward through r = 2M: because of the inward tilt of the outer edge of the light cone, all light rays that begin inside r = 2M are forced forever to remain inside, and in fact are drawn inexorably into r = 0, whereas light rays initially outside r = 2M can escape to r = ∞.

Return, now, to the implosion of a star. The world line of the star’s surface, which became asymptotically frozen at the gravitational radius when studied in Schwarzschild coordinates, plunges unimpeded through r = 2M and into r = 0 when studied in Eddington-Finkelstein coordinates; see Ex. 24.7 and compare Figs. 24.6(b) and 24.6(a). Thus, in order to understand the star’s ultimate fate, we must study the region r = 0. As with r = 2M, there are two possibilities: either the tidal forces as measured on the star’s surface remain finite there, in which case something must be going wrong with the coordinate system; or else the tidal forces diverge, destroying the star. The tidal forces are computed in Ex. 24.8, with a remarkable result: they diverge. Thus, the region r = 0 is a spacetime singularity: a region where tidal gravity becomes infinitely large, destroying everything that falls into it.

This, of course, is a very unsatisfying conclusion. It is hard to believe that the correct laws of physics will predict such total destruction. In fact, they probably do not. As we shall discuss in Chap.
26, when the radius of curvature of spacetime becomes as small as $l_{\rm PW} \equiv (G\hbar/c^3)^{1/2} \simeq 10^{-33}$ centimeters, space and time must cease to exist as classical entities; they, and the spacetime geometry, must then become quantized; and, correspondingly, general relativity must then break down and be replaced by a quantum theory of the structure of spacetime, i.e., a quantum theory of gravity. That quantum theory will describe and govern the classically singular region at the center of a black hole. Since, however, only rough hints of the structure of that quantum theory are in hand at this time, it is not known what that theory will say about the endpoint of stellar implosion.

Fig. 24.5 [two panels; vertical axis time/M, horizontal axis r/M]: (a) Radial light rays, and light cones, for the Schwarzschild spacetime as depicted in Eddington-Finkelstein coordinates [Eq. (24.66)]. (b) These same light rays and light cones as depicted in Schwarzschild coordinates [cf. Fig. 24.3].

Fig. 24.6 [two panels; vertical axis time/M, horizontal axis r/M]: World line of an observer on the surface of an imploding star, as depicted (a) in an Eddington-Finkelstein spacetime diagram, and (b) in a Schwarzschild spacetime diagram; see Ex. 24.7.

Fig. 24.7 [axes: t/M upward, r/M rightward; labeled: horizon, singularity, stellar surface, nonsingular stellar matter, r = 0]: Spacetime diagram depicting the formation and evolution of the horizon of a black hole. The coordinates outside the surface of the imploding star are those of Eddington and Finkelstein; those inside are a smooth continuation of Eddington and Finkelstein. Note that the horizon is the boundary of the region that is unable to send outgoing null geodesics to radial infinity.

Unfortunately, the singularity and its quantum mechanical structure are totally invisible to observers in the external universe: the only way the singularity can possibly be seen is by means of light rays, or other signals, that emerge from its vicinity. However, because the future light cones are all directed into it (Fig. 24.6), no light-speed or sub-light-speed signals can ever emerge from it. In fact, because the outer edge of the light cone is tilted inward at every event inside the gravitational radius (Figs. 24.5 and 24.6), no signal can emerge from inside the gravitational radius to tell external observers what is going on there. In effect, the gravitational radius is an absolute event horizon for our universe, a horizon beyond which we cannot see, except by plunging through it and paying the ultimate price for our momentary exploration of the hole’s interior.

As most readers are aware, the region of strong, vacuum gravity left behind by the implosion of the star is called a black hole. The horizon, r = 2M, is the surface of the hole, and the region r < 2M is its interior. The spacetime geometry of the black hole, outside and at the surface of the star which creates it by implosion, is that of Schwarzschild, though, of course, Schwarzschild had no way of knowing this in the few brief months left to him after his discovery of the Schwarzschild line element.
The horizon (defined as the boundary between spacetime regions that can and cannot communicate with the external universe) actually forms initially at the star’s center, and then expands to encompass the surface at the precise moment when the surface penetrates the gravitational radius. This evolution of the horizon is depicted in an Eddington-Finkelstein-type spacetime diagram in Fig. 24.7.
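The trapping of outgoing light that defines the horizon can be read directly off the Eddington-Finkelstein metric (24.65). The Python sketch below is an illustration of ours in geometrized units with M = 1, and its helper names are hypothetical. It first checks that substituting the transformation (24.64) into the Schwarzschild (t, r) metric reproduces the coefficients of (24.65). It then solves ds² = 0 for the two radial edges of the light cone, expressed as dr/dt̃. Working with dr/dt̃ (the reciprocal of Eq. (24.66)) keeps everything finite at r = 2M, where the outgoing edge becomes vertical.

```python
import math


def ef_metric(r, M=1.0):
    """(g_tt, g_tr, g_rr) of Eq. (24.65): ds^2 = g_tt dt~^2 + 2 g_tr dt~ dr + g_rr dr^2 + ..."""
    return -(1.0 - 2.0 * M / r), 2.0 * M / r, 1.0 + 2.0 * M / r


def ef_from_schwarzschild(r, M=1.0):
    """Same coefficients, obtained by substituting t = t~ - 2M ln|r/2M - 1| into (24.1); r != 2M."""
    f = 1.0 - 2.0 * M / r
    dt_dr = -2.0 * M / (r - 2.0 * M)      # dt = dt~ + dt_dr * dr
    # Expand -f (dt~ + dt_dr dr)^2 + dr^2 / f and read off the coefficients.
    return -f, -f * dt_dr, -f * dt_dr**2 + 1.0 / f


def radial_null_speeds(r, M=1.0):
    """(outgoing, ingoing) values of dr/dt~ with ds^2 = 0 and dtheta = dphi = 0."""
    g_tt, g_tr, g_rr = ef_metric(r, M)
    # g_rr v^2 + 2 g_tr v + g_tt = 0 for v = dr/dt~; g_rr > 0 for all r > 0.
    root = math.sqrt(g_tr**2 - g_tt * g_rr)
    return (-g_tr + root) / g_rr, (-g_tr - root) / g_rr


for r in (0.5, 1.0, 1.5, 2.0, 3.0, 6.0):
    out_v, in_v = radial_null_speeds(r)
    print(f"r = {r:3.1f}M:  outgoing dr/dt~ = {out_v:+.3f},  ingoing dr/dt~ = {in_v:+.3f}")
```

The ingoing edge always has dr/dt̃ = −1. The outgoing edge has dr/dt̃ = (1 − 2M/r)/(1 + 2M/r), which is positive outside r = 2M, zero at r = 2M (the horizon generators), and negative inside, so both edges of every light cone inside r = 2M point inward.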

Our discussion here has been confined to spherically symmetric, nonrotating black holes created by the gravitational implosion of a spherically symmetric star. Real stars, of course, are not spherical; and it was widely believed (perhaps we should say hoped) in the 1950s and 1960s that black-hole horizons and singularities would be so unstable that small nonsphericities or small rotation of the imploding star would save it from the black-hole fate. However, elegant and very general analyses carried out in the 1960s, largely by the British physicists Roger Penrose and Stephen Hawking, showed otherwise; and more recent numerical simulations on supercomputers have confirmed those analyses: singularities are a generic outcome of stellar implosion, as are the horizons that clothe them.

****************************

EXERCISES

Exercise 24.7 Example: Implosion of the Surface of a Zero-Pressure Star
Consider the surface of a zero-pressure star, which implodes along a timelike geodesic r = R(t) in the Schwarzschild spacetime of its exterior. Analyze that implosion using Schwarzschild coordinates t, r, θ, φ, and the exterior metric (24.1) in those coordinates.
(a) Show, using the result of Ex. 23.4(a), that the covariant time component u_t of the 4-velocity $\vec u$ of a particle on the star’s surface is conserved along its world line. Evaluate this conserved quantity in terms of the star’s mass M and the radius R_o at which it begins to implode.
(b) Use the normalization of the 4-velocity to show that the star’s radius R as a function of the proper time τ since implosion began (proper time as measured on its surface) satisfies the differential equation

$$\frac{dR}{d\tau} = -\left[\text{const} + \frac{2M}{R}\right]^{1/2}; \tag{24.67}$$

and evaluate the constant. Compare this with the equation of motion for the surface as predicted by Newtonian gravity, with proper time τ replaced by Newtonian time. (It is a coincidence that the two equations are identical.)
(c) Show from the equation of motion (24.67) that the star implodes through the horizon R = 2M in a finite proper time of order (24.57). Show that this proper time has the magnitudes cited in Eq. (24.57) and the sentences following it.
(d) Show, further, that when studied in Eddington-Finkelstein coordinates, the surface’s implosion to r = 2M requires only finite coordinate time t̃; in fact, a time of the same order of magnitude as the proper time (24.57). [Hint: from the Eddington-Finkelstein line element (24.65) and Eq. (24.67), derive a differential equation for dt̃/dτ along the world line of the star’s surface, and use it to examine the behavior of dt̃/dτ near R = 2M.] Show, further, that expression (24.67) remains valid all the way through the gravitational radius and in to r = 0. From this conclude that the proper time and Eddington-Finkelstein time to reach r = 0 are also of order (24.57).

(e) Show that the world line of the star’s surface as depicted in an Eddington-Finkelstein spacetime diagram has the form shown in Fig. 24.6(a), and that in a Schwarzschild spacetime diagram it has the form shown in Fig. 24.6(b).

Exercise 24.8 Example: Gore at the Singularity
(a) Knowing the world line of the surface of an imploding star in Eddington-Finkelstein coordinates, draw that world line in a Schwarzschild spacetime diagram. Note that as the world line approaches r = 0, it asymptotes to the curve {(t, θ, φ) = const, r variable}. Explain why this is required by the light-cone structure near r = 0.
(b) Show that the curve to which it asymptotes, {(t, θ, φ) = const, r variable}, is a timelike geodesic for r < 2M. [Hint: use the result of Ex. 23.4(a).]
(c) Show that the basis vectors of the infalling observer’s local Lorentz frame near r = 0 are related to the Schwarzschild coordinate basis by

$$\vec e_{\hat 0} = -\left(\frac{2M}{r}-1\right)^{1/2}\frac{\partial}{\partial r}\,,\quad \vec e_{\hat 1} = \left(\frac{2M}{r}-1\right)^{-1/2}\frac{\partial}{\partial t}\,,\quad \vec e_{\hat 2} = \frac{1}{r}\,\frac{\partial}{\partial\theta}\,,\quad \vec e_{\hat 3} = \frac{1}{r\sin\theta}\,\frac{\partial}{\partial\phi}\,. \tag{24.68}$$

What are the components of the Riemann tensor in that local Lorentz frame?
(d) Show that the tidal forces produced by the Riemann tensor stretch an infalling observer in the radial, $\vec e_{\hat 1}$, direction and squeeze the observer in the tangential, $\vec e_{\hat 2}$ and $\vec e_{\hat 3}$, directions; and show that the stretching and squeezing forces become infinitely strong as the observer approaches r = 0.
(e) Idealize the body of an infalling observer to consist of a head of mass µ ≃ 20 kg and feet of mass µ ≃ 20 kg separated by a distance h ≃ 2 meters, as measured in the observer’s local Lorentz frame, and with the separation direction radial. Compute the stretching force between head and feet, as a function of proper time τ, as the observer falls into the singularity. Assume that the hole has the mass M = 5 × 10⁹ M⊙, which is suggested by astronomical observations for a possible black hole at the center of the nearest giant elliptical galaxy to our own, the galaxy M87 (Sargent et al. 1978).
How long before hitting the singularity (at what proper time τ) does the observer die, if he or she is a human being made of flesh, bone, and blood?

Exercise 24.9 Example: Wormholes
Our study of the Schwarzschild solution of Einstein’s equations in this chapter has been confined to situations where, at small radii, the Schwarzschild geometry joins onto that of a star: either a static star, or a star that implodes to form a black hole. Suppose, by contrast, that there is no matter anywhere in the Schwarzschild spacetime. To get insight into this situation, construct an embedding diagram for the equatorial 2-surfaces t = const, θ = π/2 of the vacuum Schwarzschild spacetime, using as the starting point the line element of such a 2-surface written in isotropic coordinates [Ex. 24.3]:

$${}^{(2)}ds^2 = \left(1+\frac{M}{2\bar r}\right)^4\left(d\bar r^2 + \bar r^2\,d\phi^2\right). \tag{24.69}$$

Show that the region 0 < r̄ ≪ M/2 is an asymptotically flat space, that the region r̄ ≫ M/2 is another asymptotically flat space, and that these two spaces are connected by a wormhole (“bridge,” “tunnel”) through the embedding space. This exercise, first carried out by Ludwig Flamm (1916) in Vienna just a few months after the discovery of the Schwarzschild solution, reveals that the pure vacuum Schwarzschild spacetime represents a wormhole that connects two different universes (or, with a change of topology, a wormhole that connects two widely separated regions of one universe). For further discussion see, e.g., Chap. 31 of MTW. For a discourse on why such wormholes almost certainly do not occur naturally in the real universe, and for analyses of whether the laws of physics as we know them allow advanced civilizations to construct wormholes, maintain them as interstellar travel devices, and convert them into “time machines,” see Morris and Thorne (1988); Morris, Thorne, and Yurtsever (1988); Friedman et al. (1990); Kim and Thorne (1991).
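For Exercise 24.9, a quick numerical look at the 2-metric (24.69) already shows the wormhole structure. In the Python sketch below (geometrized units, M = 1; all function names are hypothetical), the circumference of the circle r̄ = const, divided by 2π, reaches its minimum value 2M at the throat r̄ = M/2. The map r̄ → M²/(4r̄) leaves (24.69) unchanged, so the region r̄ < M/2 is a mirror copy of the region r̄ > M/2. This circumferential radius, r̄(1 + M/2r̄)², is the Schwarzschild r; we take that identification from the isotropic-coordinate relation of Ex. 24.3 as an assumption.

```python
import math

M = 1.0  # hole mass in geometrized units (G = c = 1); any positive value works


def conformal_factor(rbar):
    """(1 + M/(2 rbar))^4, the prefactor in Eq. (24.69)."""
    return (1.0 + M / (2.0 * rbar)) ** 4


def metric_2d(rbar):
    """Coefficients (g_rbar_rbar, g_phi_phi) of the equatorial 2-metric (24.69)."""
    psi4 = conformal_factor(rbar)
    return psi4, psi4 * rbar**2


def circumferential_radius(rbar):
    """sqrt(g_phi_phi): circumference / (2 pi) of the circle rbar = const."""
    return math.sqrt(metric_2d(rbar)[1])


def inversion(rbar):
    """Map rbar -> M^2 / (4 rbar), which swaps the regions rbar < M/2 and rbar > M/2."""
    return M * M / (4.0 * rbar)


def pulled_back_metric(rbar):
    """Metric (24.69) evaluated at inversion(rbar), rewritten in terms of d rbar."""
    rp = inversion(rbar)
    drp_drbar = -M * M / (4.0 * rbar**2)
    g_rr, g_pp = metric_2d(rp)
    return g_rr * drp_drbar**2, g_pp


for rbar in (0.05, 0.2, 0.5, 1.25, 5.0):
    print(f"rbar = {rbar:5.2f}  circumference/2pi = {circumferential_radius(rbar):.6f}  "
          f"mirror point rbar' = {inversion(rbar):.4f}")
```

Far from the throat in either direction the circumferential radius grows without bound, as expected for two asymptotically flat sheets joined at r = 2M.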

****************************
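The implosion time (24.57), and the magnitudes quoted after it, can be checked by integrating the equation of motion (24.67) of Exercise 24.7. The Python sketch below assumes const = −2M/R_o, which follows from part (b) of that exercise for a surface that starts at rest at R_o. It substitutes R = R_o sin²η to remove the square-root singularity at the starting radius and then integrates with the midpoint rule. The rounded SI constants, the solar radius, and the 2.5 M⊙, 12 km neutron-star example are our own illustrative assumptions, not values from the text.

```python
import math

G, C = 6.674e-11, 2.998e8            # SI, rounded (our assumption)
M_SUN, R_SUN = 1.989e30, 6.957e8     # kg, m (rounded reference values)


def proper_time_to(r_final, r0, m):
    """Proper time for dR/dtau = -sqrt(2m/R - 2m/r0) to take R from r0 down to r_final.

    Geometrized units: r0, m, r_final and the result are all lengths.
    With R = r0 sin^2(eta) the integrand becomes 2 r0^1.5 (2m)^-0.5 sin^2(eta) d(eta),
    which is smooth, so a plain midpoint rule is accurate.
    """
    n = 10_000
    eta_lo = math.asin(math.sqrt(r_final / r0))
    h = (math.pi / 2 - eta_lo) / n
    integral = h * sum(math.sin(eta_lo + (i + 0.5) * h) ** 2 for i in range(n))
    return 2.0 * r0**1.5 / math.sqrt(2.0 * m) * integral


def estimate_24_57(r0, m):
    """Closed form (pi/2) sqrt(r0^3 / 2m): the time to reach R = 0 from rest at r0."""
    return 0.5 * math.pi * math.sqrt(r0**3 / (2.0 * m))


def seconds(length_m):
    """Convert a geometrized time (in meters) to seconds."""
    return length_m / C


# Coefficient in Eq. (24.57): (pi/2) * 2 G Msun / c^3, in microseconds.
coeff_us = 0.5 * math.pi * 2.0 * G * M_SUN / C**3 * 1e6
print(f"coefficient in (24.57): {coeff_us:.2f} microseconds")

m_sun_geo = G * M_SUN / C**2                 # solar mass as a length (m)
tau_sun = proper_time_to(0.0, R_SUN, m_sun_geo)
print(f"sun-like star, to R = 0: {seconds(tau_sun) / 60:.1f} minutes")

# Hypothetical over-massive neutron star: 2.5 Msun starting from rest at 12 km.
m_ns = 2.5 * m_sun_geo
for r_end, label in ((2.0 * m_ns, "R = 2M"), (0.0, "R = 0")):
    tau = proper_time_to(r_end, 1.2e4, m_ns)
    print(f"neutron star, to {label}: {seconds(tau) * 1e3:.3f} ms")

# Ratio to the Keplerian orbital period 2 pi sqrt(r0^3/m) at the starting radius.
ratio = tau_sun / (2.0 * math.pi * math.sqrt(R_SUN**3 / m_sun_geo))
print(f"tau / P_orbit = {ratio:.6f},  1/(4 sqrt 2) = {1 / (4 * math.sqrt(2)):.6f}")
```

The closed form in (24.57) is exact for the fall all the way to R = 0. The time to reach R = 2M is somewhat shorter, which is why (24.57) carries the qualifier R_o ≫ 2M.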

24.5

Spinning Black Holes: The Kerr Spacetime

24.5.1

The Kerr Metric for a Spinning Black Hole

Consider a star that collapses to form a black hole, and assume for pedagogical simplicity that during the collapse no energy, momentum, or angular momentum flows through a large sphere surrounding the system. Then the asymptotic conservation laws discussed in Sec. 23.9.4 guarantee that the mass M, linear momentum P_j, and angular momentum S_j of the newborn hole, as encoded in its asymptotic metric, will be identical to those of its parent star. If (as we shall assume) our asymptotic coordinates are those of the star’s rest frame, so P_j = 0, then the hole will also be at rest in those coordinates, i.e. it will also have P_j = 0. If the star was non-spinning, so S_j = 0, then the hole will also have S_j = 0, and a powerful theorem due to Werner Israel guarantees that, after it has settled down into a quiescent state, the hole’s spacetime geometry will be that of Schwarzschild. If, instead, the star was spinning, so S_j ≠ 0, then the final, quiescent hole cannot be that of Schwarzschild. Instead, according to a powerful theorem due to Hawking, Carter, Robinson, and others, its spacetime geometry will be the following exact, vacuum solution to the Einstein field equation (which is called the Kerr solution because it was discovered by the New Zealand mathematician Roy Kerr³):

$$ds^2 = -\alpha^2dt^2 + \frac{\rho^2}{\Delta}\,dr^2 + \rho^2d\theta^2 + \varpi^2(d\phi-\omega\,dt)^2\,,$$

where

$$\Delta = r^2+a^2-2Mr\,,\qquad \rho^2 = r^2+a^2\cos^2\theta\,,\qquad \Sigma^2 = (r^2+a^2)^2 - a^2\Delta\sin^2\theta\,, \tag{24.70a}$$

$$\alpha^2 = \frac{\rho^2}{\Sigma^2}\,\Delta\,,\qquad \varpi^2 = \frac{\Sigma^2}{\rho^2}\,\sin^2\theta\,,\qquad \omega = \frac{2aMr}{\Sigma^2}\,. \tag{24.70b}$$

³ Kerr, R. P. 1963, Phys. Rev. Lett. 11, 237.

In this line element {t, r, θ, φ} are the coordinates, and there are two constants, M and a. The physical meanings of M and a can be deduced from the asymptotic form of this Kerr metric at large radii:

$$ds^2 = -\left(1-\frac{2M}{r}\right)dt^2 - \frac{4Ma}{r}\sin^2\theta\,d\phi\,dt + \left[1+O\!\left(\frac{M}{r}\right)\right]\left[dr^2 + r^2\left(d\theta^2+\sin^2\theta\,d\phi^2\right)\right]. \tag{24.71}$$

By comparing with the standard asymptotic metric in spherical coordinates, Eq. (23.113), we see that M is the mass of the black hole, Ma ≡ J_H is the magnitude of its spin angular momentum, and its spin points along the polar axis, θ = 0. Evidently, then, the constant a is the hole’s angular momentum per unit mass; it has the same dimensions as M: length (in geometrized units). It is easy to verify that, in the limit a → 0, the Kerr metric (24.70) reduces to the Schwarzschild metric (24.1), and the coordinates {t, r, θ, φ} in which we have written it (called “Boyer-Lindquist coordinates”) reduce to Schwarzschild’s coordinates.

Just as it is convenient to read the covariant metric components g_{αβ} off the line element (24.70a) via ds² = g_{αβ}dx^α dx^β, so also it is convenient to read the contravariant metric components g^{αβ} off an expression for the wave operator $\Box \equiv \vec\nabla\cdot\vec\nabla = g^{\alpha\beta}\nabla_\alpha\nabla_\beta$. (Here $\nabla_\alpha \equiv \nabla_{\vec e_\alpha}$ is the directional derivative along the basis vector $\vec e_\alpha$.) For the Kerr metric (24.70a), a straightforward inversion of the matrix ||g_{αβ}|| gives the ||g^{αβ}|| embodied in the following equation:

$$\Box = -\frac{1}{\alpha^2}\left(\nabla_t + \omega\nabla_\phi\right)^2 + \frac{\Delta}{\rho^2}\,\nabla_r^2 + \frac{1}{\rho^2}\,\nabla_\theta^2 + \frac{1}{\varpi^2}\,\nabla_\phi^2\,. \tag{24.72}$$

24.5.2

Dragging of Inertial Frames

As we shall see in Chap. 25, the spin of a black hole (or any other system in asymptotically flat spacetime) can be measured by its influence on the orientation of gyroscopes in the asymptotic region: the spin drags inertial frames into rotational motion around the black hole, thereby causing gyroscopes to precess. This frame dragging also shows up in the geodesic trajectories of freely falling particles. Consider, for concreteness, a particle dropped from rest far outside the black hole. Its initial 4-velocity will be $\vec u = \partial/\partial t$, and correspondingly, in the distant, flat region of spacetime, the covariant components of $\vec u$ will be u_t = −1, u_r = u_θ = u_φ = 0. Now, the Kerr metric coefficients g_{αβ}, like those of Schwarzschild, are independent of t and φ; i.e., the Kerr metric is symmetric under time translation (it is “stationary”) and under rotation about the hole’s spin axis (it is “axially symmetric”). These symmetries impose corresponding conservation laws on the infalling particle [Ex. 23.4(a)]: u_t and u_φ are conserved, i.e., they retain their initial values u_t = −1 and u_φ = 0 as the particle falls. By raising indices, u^α = g^{αβ}u_β, using the metric coefficients embodied in Eq. (24.72), we learn the evolution of the contravariant 4-velocity components, u^t = −g^{tt} = 1/α², u^φ = −g^{tφ} = ω/α². These in turn imply that as the particle falls, it acquires an angular velocity around the hole’s spin axis given by

$$\Omega \equiv \frac{d\phi}{dt} = \frac{d\phi/d\tau}{dt/d\tau} = \frac{u^\phi}{u^t} = \omega\,. \tag{24.73}$$

(The coordinates φ and t are tied to the rotational and time-translation symmetries of the spacetime, so they are very special; that is why we can use them to define a physically meaningful angular velocity.) At large radii, ω ≃ 2aM/r³ → 0 as r → ∞. Therefore, when first dropped, the particle falls radially inward. However, as the particle nears the hole and picks up speed, it acquires a significant angular velocity around the hole’s spin axis. The physical cause of this is frame dragging: the hole’s spin drags inertial frames into rotation around the spin axis, and that inertial rotation drags the inertially falling particle into a circulatory orbital motion.
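Both the inverse metric (24.72) and the frame-dragging result (24.73) can be checked numerically. The Python sketch below uses geometrized units and illustrative values chosen by us (M = 1, a = 0.9, θ = π/3). It assembles g_{αβ} from the line element (24.70a) and g^{αβ} from the wave operator (24.72), and confirms that their product is the identity. It then raises the indices of u_t = −1, u_φ = 0 and compares u^φ/u^t with ω. The last printed column shows ω approaching its large-r form 2aM/r³.

```python
import math


def kerr_functions(r, theta, M, a):
    """Delta, rho^2, Sigma^2, alpha^2, varpi^2, omega from Eqs. (24.70)."""
    s2 = math.sin(theta) ** 2
    delta = r * r + a * a - 2.0 * M * r
    rho2 = r * r + (a * math.cos(theta)) ** 2
    sigma2 = (r * r + a * a) ** 2 - a * a * delta * s2
    return (delta, rho2, sigma2, rho2 * delta / sigma2,
            sigma2 * s2 / rho2, 2.0 * a * M * r / sigma2)


def metrics(r, theta, M, a):
    """g_{ab} read off (24.70a) and g^{ab} read off (24.72); index order (t, r, theta, phi)."""
    delta, rho2, _, alpha2, varpi2, omega = kerr_functions(r, theta, M, a)
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0] = -alpha2 + varpi2 * omega**2      # from -alpha^2 dt^2 + varpi^2 (dphi - omega dt)^2
    g[0][3] = g[3][0] = -varpi2 * omega
    g[1][1], g[2][2], g[3][3] = rho2 / delta, rho2, varpi2
    ginv = [[0.0] * 4 for _ in range(4)]
    ginv[0][0] = -1.0 / alpha2                 # from -(1/alpha^2)(grad_t + omega grad_phi)^2
    ginv[0][3] = ginv[3][0] = -omega / alpha2
    ginv[3][3] = 1.0 / varpi2 - omega**2 / alpha2
    ginv[1][1], ginv[2][2] = delta / rho2, 1.0 / rho2
    return g, ginv


def identity_error(g, ginv):
    """Largest deviation of g_{ab} g^{bc} from the Kronecker delta."""
    return max(abs(sum(g[i][k] * ginv[k][j] for k in range(4)) - (1.0 if i == j else 0.0))
               for i in range(4) for j in range(4))


def infall_angular_velocity(r, theta, M, a):
    """Omega = u^phi / u^t for a particle with u_t = -1, u_phi = 0, Eq. (24.73)."""
    _, ginv = metrics(r, theta, M, a)
    # u_r and u_theta need not vanish during the fall, but g^{tr} = g^{t theta} = 0
    # and g^{phi r} = g^{phi theta} = 0, so they do not affect u^t or u^phi.
    u_low = (-1.0, 0.0, 0.0, 0.0)
    u_up = [sum(ginv[i][j] * u_low[j] for j in range(4)) for i in range(4)]
    return u_up[3] / u_up[0]


M, a, theta = 1.0, 0.9, math.pi / 3
for r in (3.0, 10.0, 100.0, 1000.0):
    g, ginv = metrics(r, theta, M, a)
    Omega = infall_angular_velocity(r, theta, M, a)
    print(f"r = {r:7.1f}M: |g.ginv - 1| = {identity_error(g, ginv):.1e}, "
          f"Omega = {Omega:.3e}, Omega r^3/(2aM) = {Omega * r**3 / (2 * a * M):.4f}")
```
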

24.5.3

The Light-Cone Structure, and the Horizon

Just as for a Schwarzschild hole, so also for Kerr, the light-cone structure is a powerful tool for identifying the horizon and exploring the spacetime geometry near it. At any event in spacetime, the tangents to the light cone are those displacements $\{dt, dr, d\theta, d\phi\}$ along which $ds^2 = 0$. The outermost and innermost edges of the cone are those for which $(dr/dt)^2$ is maximal. By setting expression (24.70a) to zero we see that $dr^2$ has its maximum value, for a given $dt^2$, when $d\phi = \omega\, dt$ and $d\theta = 0$. In other words, the photons that move radially outward or inward at the fastest possible rate are those whose angular motion is that of frame dragging, Eq. (24.73). For these extremal photons, the radial motion (along the outer and inner edges of the light cone) is

$$\frac{dr}{dt} = \pm\frac{\alpha\sqrt{\Delta}}{\rho} = \pm\frac{\Delta}{\Sigma} . \tag{24.74}$$

Now, $\Sigma$ is positive definite, but $\Delta$ is not; it decreases monotonically, with decreasing radius, reaching zero at

$$r = r_H \equiv M + \sqrt{M^2 - a^2} \tag{24.75}$$

[Eq. (24.70b)]. (We shall assume that $|a| < M$ so $r_H$ is real, and shall justify this assumption below.) Correspondingly, the light cone closes up to a sliver then pinches off as $r \to r_H$; and it pinches onto a null curve (actually, a null geodesic) given by

$$r = r_H , \qquad \theta = \text{constant} , \qquad \phi = \Omega_H t + \text{constant} , \tag{24.76}$$

where

$$\Omega_H = \omega(r = r_H) = \frac{a}{2Mr_H} . \tag{24.77}$$
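Eq. (24.74) and the closing of the light cone can be checked with a few lines of arithmetic. The sketch below assumes the standard Boyer-Lindquist definitions $\rho^2 = r^2 + a^2\cos^2\theta$, $\Delta = r^2 - 2Mr + a^2$, $\Sigma^2 = (r^2+a^2)^2 - a^2\Delta\sin^2\theta$, $\alpha^2 = \rho^2\Delta/\Sigma^2$ (our reading of Eq. (24.70b), not reproduced here). For a photon with $d\phi = \omega\,dt$ and $d\theta = 0$ the $\varpi^2(d\phi - \omega\,dt)^2$ term of the line element vanishes, so only the $dt^2$ and $dr^2$ terms survive; the code confirms that $|dr/dt| = \Delta/\Sigma$ makes $ds^2 = 0$ and shows the coordinate opening of the cone shrinking toward zero as $r \to r_H$. Names are illustrative.

```python
import math


def kerr_parts(r, theta, M, a):
    """Assumed Boyer-Lindquist auxiliary quantities (geometrized units)."""
    rho2 = r * r + (a * math.cos(theta)) ** 2
    delta = r * r - 2.0 * M * r + a * a
    sigma2 = (r * r + a * a) ** 2 - a * a * delta * math.sin(theta) ** 2
    alpha2 = rho2 * delta / sigma2                 # lapse squared
    return rho2, delta, sigma2, alpha2


def light_cone_edge(r, theta, M, a):
    """|dr/dt| for the extremal photons of Eq. (24.74): Delta / Sigma."""
    _, delta, sigma2, _ = kerr_parts(r, theta, M, a)
    return delta / math.sqrt(sigma2)


def null_residual(r, theta, M, a):
    """ds^2 / dt^2 for that photon: -alpha^2 + (rho^2 / Delta) (dr/dt)^2."""
    rho2, delta, _, alpha2 = kerr_parts(r, theta, M, a)
    v = light_cone_edge(r, theta, M, a)
    return -alpha2 + rho2 / delta * v * v


M, a, theta = 1.0, 0.9, math.pi / 3
r_H = M + math.sqrt(M * M - a * a)
factors = (10.0, 3.0, 1.5, 1.1, 1.01, 1.001)
speeds = [light_cone_edge(f * r_H, theta, M, a) for f in factors]
for f, v in zip(factors, speeds):
    print(f"r/r_H = {f:6.3f}: |dr/dt| = {v:.3e}, "
          f"ds^2 residual = {null_residual(f * r_H, theta, M, a):+.1e}")
```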

This light-cone structure is depicted in Fig. 24.8(a,b). The light-cone pinch-off shown there is the same as that for Schwarzschild spacetime (Fig. 24.2) except for the frame-dragging-induced angular tilt $d\phi/dt = \omega$ of the light cones. In the Schwarzschild case, as $r \to 2M$, the light cones pinch onto the geodesic world lines $\{r = 2M,\ \theta = \text{const},\ \phi = \text{const}\}$ of photons that travel along the horizon. These null world lines are called the horizon’s generators. In the Kerr case the light-cone pinch-off reveals that the horizon is at $r = r_H$, and the horizon generators are null geodesics that travel around and around the horizon with angular velocity $\Omega_H$. This motivates us to regard the horizon itself as having the rotational angular velocity $\Omega_H$.

Fig. 24.8: (a) and (b): Light-cone structure of Kerr spacetime depicted in Boyer-Lindquist coordinates. Drawing (b) is a spacetime diagram; drawing (a) is the same diagram as viewed from above. (c) and (d): The same light-cone structure in Kerr coordinates. [Each panel marks the horizon and a horizon generator.]

Whenever a finite-rest-mass particle falls into a spinning black hole, its world line, as it nears the horizon, is constrained always to lie inside the light cone. The light-cone pinch-off then constrains its motion to approach, asymptotically, the horizon generators. Therefore, as seen in Boyer-Lindquist coordinates, the particle is dragged into an orbital motion, just above the horizon, with asymptotic angular velocity $d\phi/dt = \Omega_H$, and it travels around and around the horizon “forever” (for infinite Boyer-Lindquist coordinate time $t$), and never (as $t \to \infty$) manages to cross through the horizon. As in the Schwarzschild case, so also in Kerr, this infall to $r = r_H$ requires only finite proper time $\tau$ as measured by the particle, and the particle feels only finite tidal forces (only finite values of the components of Riemann in its proper reference frame). Therefore, as for Schwarzschild spacetime, the “barrier” to infall through $r = r_H$ must be an illusion produced by a pathology of the Boyer-Lindquist coordinates at $r = r_H$. This coordinate pathology can be removed by a variety of different coordinate transformations. One is the following change of the time and angular coordinates:

$$\tilde t = t + \int \frac{2Mr}{\Delta}\, dr , \qquad \tilde\phi = \phi + \int \frac{a}{\Delta}\, dr . \tag{24.78}$$
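Because $\Delta = (r - r_+)(r - r_-)$ with $r_\pm = M \pm \sqrt{M^2 - a^2}$, the integrals in Eq. (24.78) can be done by partial fractions: $\int 2Mr\,dr/\Delta = [2M/(r_+ - r_-)]\,[r_+\ln|r - r_+| - r_-\ln|r - r_-|]$ and $\int a\,dr/\Delta = [a/(r_+ - r_-)]\ln|(r - r_+)/(r - r_-)|$. The sketch below (hypothetical function names; integration constants arbitrarily set to zero; $|a| < M$ assumed) checks these antiderivatives by central finite differences, and checks that for $a = 0$ the time shift becomes $2M\ln|r/2M - 1|$ up to an additive constant, the familiar ingoing Eddington-Finkelstein shift (which we assume is the content of Eq. (24.64)). The logarithms diverge at $r = r_+ = r_H$; that divergence is precisely the Boyer-Lindquist pathology the transformation absorbs.

```python
import math


def kerr_coordinate_shifts(r, M, a):
    """Return (t_tilde - t, phi_tilde - phi) of Eq. (24.78) as functions of r,
    with integration constants set to zero (an arbitrary choice). |a| < M."""
    root = math.sqrt(M * M - a * a)
    r_plus, r_minus = M + root, M - root
    width = r_plus - r_minus
    dt = 2.0 * M / width * (r_plus * math.log(abs(r - r_plus))
                            - r_minus * math.log(abs(r - r_minus)))
    dphi = a / width * math.log(abs((r - r_plus) / (r - r_minus)))
    return dt, dphi


def derivative_errors(r, M, a, h=1e-5):
    """Central-difference derivatives minus the integrands 2Mr/Delta and a/Delta."""
    delta = r * r - 2.0 * M * r + a * a
    t_hi, p_hi = kerr_coordinate_shifts(r + h, M, a)
    t_lo, p_lo = kerr_coordinate_shifts(r - h, M, a)
    return ((t_hi - t_lo) / (2 * h) - 2.0 * M * r / delta,
            (p_hi - p_lo) / (2 * h) - a / delta)


for r in (2.5, 4.0, 10.0):
    print(r, derivative_errors(r, 1.0, 0.8))

# a = 0: t_tilde - t should equal 2M ln|r/2M - 1| plus a constant (here 2M ln 2M)
M = 1.0
for r in (3.0, 5.0, 20.0):
    print(kerr_coordinate_shifts(r, M, 0.0)[0] - 2 * M * math.log(abs(r / (2 * M) - 1)))
```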

The new (tilded) coordinates are a variant of a coordinate system originally introduced by Kerr, so we shall call them “Kerr coordinates”. By inserting the coordinate transformation (24.78) into the line element (24.70a), we obtain the following form of the Kerr metric, in Kerr coordinates:

$$ds^2 = -\alpha^2 d\tilde t^2 + \frac{4Mr\rho^2}{\Sigma^2}\, dr\, d\tilde t + \frac{\rho^2(\rho^2 + 2Mr)}{\Sigma^2}\, dr^2 + \rho^2 d\theta^2 + \varpi^2\left[d\tilde\phi - \omega\, d\tilde t - \frac{a(\rho^2 + 2Mr)}{\Sigma^2}\, dr\right]^2 . \tag{24.79}$$

It is easy to verify that when $a \to 0$ (so Kerr spacetime becomes Schwarzschild), the Kerr coordinates (24.78) become those of Eddington and Finkelstein [Eq. (24.64)], and the Kerr line element (24.79) becomes the Eddington-Finkelstein one [Eq. (24.65)]. Similarly, when one explores the light-cone structure for a spinning black hole in the Kerr coordinates [Fig. 24.8(c,d)], one finds a structure like that of Eddington-Finkelstein [Fig. 24.5(a)]: At large radii, $r \gg M$, the light cones have their usual 45-degree form, but as one moves inward toward the horizon, they begin to tilt inward. In addition to the inward tilt, there is a frame-dragging-induced tilt in the direction of the hole’s rotation, $+\phi$. At the horizon the outermost edge of the light cone is tangent to the horizon generators; and in Kerr coordinates, as in Boyer-Lindquist, these generators rotate around the horizon with angular velocity $d\tilde\phi/d\tilde t = \Omega_H$ [cf. Eq. (24.78), which says that at fixed $r$, $\tilde t = t + \text{constant}$ and $\tilde\phi = \phi + \text{constant}$]. This light-cone structure demonstrates graphically that the horizon is at the radius $r = r_H$. Outside there, the outer edge of the light cone tilts toward increasing $r$, so it is possible to escape to infinity. Inside there the outer edge of the light cone tilts inward, and all forms of matter and energy are forced to move inward, toward a singularity whose structure, presumably, is governed by the laws of quantum gravity.⁴
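Eq. (24.79) can be checked without symbolic algebra: pick an event outside the horizon and a random displacement $(d\tilde t, dr, d\theta, d\tilde\phi)$, convert it to Boyer-Lindquist form using $dt = d\tilde t - (2Mr/\Delta)\,dr$ and $d\phi = d\tilde\phi - (a/\Delta)\,dr$ from Eq. (24.78), and compare the two expressions for $ds^2$. The sketch assumes the Boyer-Lindquist line element $ds^2 = -\alpha^2 dt^2 + (\rho^2/\Delta)dr^2 + \rho^2 d\theta^2 + \varpi^2(d\phi - \omega\,dt)^2$ with the standard definitions of $\alpha$, $\varpi$, $\omega$, $\Sigma$ written in the code; we take these to be the content of Eqs. (24.70), which are not reproduced in this section. Function names are hypothetical.

```python
import math
import random


def bl_parts(r, th, M, a):
    """Assumed Boyer-Lindquist quantities: rho^2, Delta, Sigma^2, alpha^2, varpi^2, omega."""
    rho2 = r * r + (a * math.cos(th)) ** 2
    delta = r * r - 2.0 * M * r + a * a
    sigma2 = (r * r + a * a) ** 2 - a * a * delta * math.sin(th) ** 2
    alpha2 = rho2 * delta / sigma2
    varpi2 = sigma2 * math.sin(th) ** 2 / rho2
    omega = 2.0 * a * M * r / sigma2
    return rho2, delta, sigma2, alpha2, varpi2, omega


def ds2_boyer_lindquist(r, th, M, a, dt, dr, dth, dph):
    rho2, delta, _, alpha2, varpi2, omega = bl_parts(r, th, M, a)
    return (-alpha2 * dt ** 2 + rho2 / delta * dr ** 2 + rho2 * dth ** 2
            + varpi2 * (dph - omega * dt) ** 2)


def ds2_kerr(r, th, M, a, dtt, dr, dth, dpht):
    """Line element of Eq. (24.79) in Kerr coordinates (t_tilde, r, theta, phi_tilde)."""
    rho2, _, sigma2, alpha2, varpi2, omega = bl_parts(r, th, M, a)
    shift = a * (rho2 + 2.0 * M * r) / sigma2
    return (-alpha2 * dtt ** 2 + 4.0 * M * r * rho2 / sigma2 * dr * dtt
            + rho2 * (rho2 + 2.0 * M * r) / sigma2 * dr ** 2 + rho2 * dth ** 2
            + varpi2 * (dpht - omega * dtt - shift * dr) ** 2)


def worst_mismatch(M=1.0, a=0.6, trials=200, seed=1):
    """Largest |ds2_BL - ds2_Kerr| over random events and displacements."""
    rng = random.Random(seed)
    r_H = M + math.sqrt(M * M - a * a)
    worst = 0.0
    for _ in range(trials):
        r = r_H * (1.05 + 5.0 * rng.random())          # outside the horizon
        th = 0.1 + (math.pi - 0.2) * rng.random()       # away from the poles
        dtt, dr, dth, dpht = (rng.uniform(-1.0, 1.0) for _ in range(4))
        delta = r * r - 2.0 * M * r + a * a
        dt = dtt - 2.0 * M * r / delta * dr             # differential of Eq. (24.78)
        dph = dpht - a / delta * dr
        diff = (ds2_boyer_lindquist(r, th, M, a, dt, dr, dth, dph)
                - ds2_kerr(r, th, M, a, dtt, dr, dth, dpht))
        worst = max(worst, abs(diff))
    return worst


print("largest |ds2_BL - ds2_Kerr| over random samples:", worst_mismatch())
```

Sampling random displacements checks every metric component at once, since two quadratic forms that agree on many random vectors must have the same coefficients.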

24.5.4


Evolution of Black Holes: Rotational Energy and Its Extraction

When a spinning star collapses to form a black hole, its centrifugal forces will flatten it, and the dynamical growth of flattening will produce gravitational radiation (Chap. 25). The newborn hole will also be flattened and will not have the Kerr shape; but rather quickly, within a time $\Delta t$ of order tens or hundreds of $M$ (where, in geometrized units, $M \simeq 4.9\,\mu{\rm s}\,(M/M_\odot)$), the deformed hole will shake off its deformations as gravitational waves and settle down into the Kerr shape. This is the conclusion of extensive analyses, both analytic and numerical. Many black holes are in binary orbits with stellar companions, and pull gas off their companions and swallow it. Other black holes accrete gas from interstellar space. Any such accretion causes the hole’s mass and spin to evolve in accord with the conservation laws (23.114) and (23.115). One might have thought that by accreting a large amount of angular momentum, a hole’s angular momentum per unit mass $a$ could grow larger than its mass $M$. If this were to happen, then $r_H = M + \sqrt{M^2 - a^2}$ would cease to be a real radius, a fact that signals the destruction of the hole’s horizon: As $a$ grows to exceed $M$, the inward light-cone tilt gets reduced so that everywhere the outer edge of the cone points toward increasing $r$, which means that light, particles, and information are no longer trapped. Remarkably, however, it appears that the laws of general relativity forbid $a$ ever to grow larger than $M$. As accretion pushes $a/M$ upward toward unity, the increasing drag of inertial frames causes a big increase of the hole’s cross section to capture material with negative angular momentum (which will spin the hole down) and a growing resistance to capturing any further material with large positive angular momentum. Infalling particles that might try to push $a/M$ over the limit get flung back out by huge centrifugal forces before they can reach the horizon. A black hole, it appears, is totally resistant against having its horizon destroyed.

⁴ Much hoopla has been made of the fact that in the Kerr spacetime it is possible to travel inward, through a “Cauchy horizon”, and then into another universe. However, the Cauchy horizon, located at $r = M - \sqrt{M^2 - a^2}$, is highly unstable against perturbations, which convert it into a singularity with infinite spacetime curvature. For details of this instability and the singularity, see, e.g., Brady, Droz and Morsink (1998) and references therein.

In 1969, Roger Penrose discovered that a large fraction of the mass of a spinning black hole is in the form of rotational energy, stored in the whirling spacetime curvature outside the hole’s horizon; and this rotational energy can be extracted. Penrose discovered this by the following thought experiment: From far outside the hole, you throw a massive particle into the vicinity of the hole’s horizon. Assuming you are at rest with respect to the hole, your 4-velocity is $\vec U = \partial/\partial t$. Denote by $E^{\rm in} = -\vec p^{\,\rm in}\cdot\vec U = -\vec p^{\,\rm in}\cdot(\partial/\partial t) = -p^{\rm in}_t$ the energy of the particle (rest mass plus kinetic), as measured by you; cf. Eq. (1.38). As the particle falls, $p^{\rm in}_t$ is conserved because of the Kerr metric’s time-translation symmetry.
Arrange that, as the particle nears the horizon, it splits into two particles: one (labeled “plunge”) plunges through the horizon and the other (labeled “out”) flies back out to large radii, where you catch it. Denote by $E^{\rm plunge} \equiv -p^{\rm plunge}_t$ the conserved energy of the plunging particle and by $E^{\rm out} \equiv -p^{\rm out}_t$ that of the out-flying particle. Four-momentum conservation at the event of the split dictates that $\vec p^{\,\rm in} = \vec p^{\,\rm plunge} + \vec p^{\,\rm out}$, which implies this same conservation law for all the components of the 4-momenta, in particular

$$E^{\rm out} = E^{\rm in} - E^{\rm plunge} . \tag{24.80}$$

Now, it is a remarkable fact that the Boyer-Lindquist time basis vector $\partial/\partial t$ has a squared length $\partial/\partial t \cdot \partial/\partial t = g_{tt} = -\alpha^2 + \varpi^2\omega^2$ that becomes positive at radii

$$r < r_{\rm ergo} = M + \sqrt{M^2 - a^2\cos^2\theta} , \tag{24.81}$$

which is larger than $r_H$ everywhere except on the hole’s spin axis, $\theta = 0, \pi$. The region $r_H < r < r_{\rm ergo}$ is called the hole’s ergosphere. If the split into two particles occurs in the ergosphere, then it is possible to arrange the split such that the scalar product of the timelike vector $\vec p^{\,\rm plunge}$ with the spacelike vector $\partial/\partial t$ is positive, which means that the plunging particle’s conserved energy $E^{\rm plunge} = -\vec p^{\,\rm plunge}\cdot(\partial/\partial t)$ is negative; whence [by Eq. (24.80)]

$$E^{\rm out} > E^{\rm in} . \tag{24.82}$$

See Ex. 24.11(a). When the outflying particle reaches your location, $r \gg M$, its conserved energy is equal to its physically measured total energy (rest mass plus kinetic); and the fact that $E^{\rm out} > E^{\rm in}$ means that you get back more energy (rest mass plus kinetic) than you put in. The hole’s asymptotic energy-conservation law (23.114) implies that the hole’s mass has decreased by precisely the amount of energy that you have extracted,

$$\Delta M = -(E^{\rm out} - E^{\rm in}) = E^{\rm plunge} < 0 . \tag{24.83}$$
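The sign change of $g_{tt}$ that defines the ergosphere, Eq. (24.81), is easy to confirm numerically. This sketch assumes the standard Boyer-Lindquist definitions of $\alpha$, $\varpi$, and $\omega$ written in the code (our assumption about Eq. (24.70b)); it also compares $-\alpha^2 + \varpi^2\omega^2$ with the equivalent closed form $-(1 - 2Mr/\rho^2)$, an algebraic identity that follows from those definitions. Names are hypothetical.

```python
import math


def g_tt(r, th, M, a):
    """g_tt = -alpha^2 + varpi^2 omega^2 in Boyer-Lindquist coordinates (assumed definitions)."""
    rho2 = r * r + (a * math.cos(th)) ** 2
    delta = r * r - 2.0 * M * r + a * a
    sigma2 = (r * r + a * a) ** 2 - a * a * delta * math.sin(th) ** 2
    alpha2 = rho2 * delta / sigma2
    varpi2 = sigma2 * math.sin(th) ** 2 / rho2
    omega = 2.0 * a * M * r / sigma2
    return -alpha2 + varpi2 * omega ** 2


def r_ergo(th, M, a):
    """Outer boundary of the ergosphere, Eq. (24.81)."""
    return M + math.sqrt(M * M - (a * math.cos(th)) ** 2)


M, a = 1.0, 0.9
r_H = M + math.sqrt(M * M - a * a)
for th in (0.2, math.pi / 4, math.pi / 2):
    re = r_ergo(th, M, a)
    inside, outside = g_tt(re - 1e-3, th, M, a), g_tt(re + 1e-3, th, M, a)
    print(f"theta = {th:.3f}: r_H = {r_H:.4f}, r_ergo = {re:.4f}, "
          f"g_tt just inside = {inside:+.2e}, just outside = {outside:+.2e}")
```

Inside the ergosphere $g_{tt} > 0$, so $\partial/\partial t$ is spacelike there, which is what permits $E^{\rm plunge} < 0$.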

A closer scrutiny of this process [Ex. 24.11(e)] reveals that the plunging particle must have had negative angular momentum, so it has spun the hole down a bit. The energy you extracted, in fact, came from the hole’s enormous store of rotational energy, which makes up part of its mass M; and your extraction of energy has reduced that rotational energy. Stephen Hawking has used sophisticated mathematical techniques to prove that, independently of how you carry out this thought experiment, and, indeed, independently of what is done to a black hole, general relativity requires that the horizon’s surface area AH never decrease. This is called the second law of black-hole mechanics, and it actually turns out to be a variant of the second law of thermodynamics, in disguise. A straightforward calculation (Ex. 24.10) reveals that the horizon surface area is given by

$$A_H = 4\pi(r_H^2 + a^2) = 8\pi M r_H \qquad \text{for a spinning hole}, \tag{24.84a}$$

$$A_H = 16\pi M^2 \qquad \text{for a nonspinning hole, } a = 0 . \tag{24.84b}$$
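Eq. (24.84a) can be cross-checked by integrating the area element of the surface $\{r = r_H, t = \text{const}\}$ directly. On that surface $dt = dr = 0$, so the induced 2-metric from the Boyer-Lindquist line element is $\rho^2 d\theta^2 + \varpi^2 d\phi^2$ and the area element is $\rho\varpi\,d\theta\,d\phi$. The sketch assumes the standard definitions $\varpi^2 = \Sigma^2\sin^2\theta/\rho^2$ and $\Sigma^2 = (r^2+a^2)^2 - a^2\Delta\sin^2\theta$ (our reading of Eq. (24.70b)); it is a numerical check in the spirit of Ex. 24.10, not a substitute for the derivation. Names are hypothetical.

```python
import math


def horizon_area_numeric(M, a, n=2000):
    """Midpoint-rule integral of rho * varpi over theta at r = r_H, times 2 pi for phi."""
    r_H = M + math.sqrt(M * M - a * a)
    h = math.pi / n
    area = 0.0
    for i in range(n):
        th = (i + 0.5) * h
        rho2 = r_H * r_H + (a * math.cos(th)) ** 2
        delta = r_H * r_H - 2.0 * M * r_H + a * a          # zero up to rounding
        sigma2 = (r_H * r_H + a * a) ** 2 - a * a * delta * math.sin(th) ** 2
        varpi2 = sigma2 * math.sin(th) ** 2 / rho2
        area += math.sqrt(rho2 * varpi2) * h
    return 2.0 * math.pi * area


M = 1.0
for a in (0.0, 0.5, 0.9, 0.99):
    r_H = M + math.sqrt(M * M - a * a)
    print(f"a/M = {a:4.2f}: numeric {horizon_area_numeric(M, a):.6f}, "
          f"4 pi (r_H^2 + a^2) = {4 * math.pi * (r_H ** 2 + a ** 2):.6f}, "
          f"8 pi M r_H = {8 * math.pi * M * r_H:.6f}")
```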

Demetrios Christodoulou has shown (cf. Ex. 24.11) that, in the Penrose process, the nondecrease of $A_H$ is the only constraint on how much energy one can extract, so by a sequence of optimally designed particle injections and splits that keep $A_H$ unchanged, one can reduce the mass of the hole to

$$M_{\rm irr} = \sqrt{\frac{A_H}{16\pi}} = \sqrt{\frac{M\left(M + \sqrt{M^2 - a^2}\right)}{2}} , \tag{24.85}$$

but no smaller. This is called the hole’s irreducible mass. The hole’s total mass is the sum of its irreducible mass and its rotational energy $M_{\rm rot}$; so the rotational energy is

$$M_{\rm rot} = M - M_{\rm irr} = M\left[1 - \sqrt{\frac{1 + \sqrt{1 - a^2/M^2}}{2}}\right] . \tag{24.86}$$
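Eqs. (24.85) and (24.86) are simple enough to tabulate. The sketch below (hypothetical helper names; geometrized units) evaluates the irreducible mass both from the horizon area, using $A_H = 8\pi M r_H$ of Eq. (24.84a), and from the closed form, and prints the extractable fraction $M_{\rm rot}/M$ for a few spins.

```python
import math


def horizon_area(M, a):
    """A_H = 8 pi M r_H, Eq. (24.84a); requires |a| <= M."""
    return 8.0 * math.pi * M * (M + math.sqrt(M * M - a * a))


def irreducible_mass(M, a):
    """Closed form of Eq. (24.85)."""
    return math.sqrt(M * (M + math.sqrt(M * M - a * a)) / 2.0)


def rotational_energy(M, a):
    """M_rot = M - M_irr, Eq. (24.86)."""
    return M - irreducible_mass(M, a)


M = 1.0
for a in (0.0, 0.5, 0.9, 0.998, 1.0):
    from_area = math.sqrt(horizon_area(M, a) / (16.0 * math.pi))
    print(f"a/M = {a:5.3f}: M_irr = {irreducible_mass(M, a):.4f} "
          f"(from area {from_area:.4f}), M_rot/M = {rotational_energy(M, a) / M:.4f}")
```

The last row reproduces the maximal extractable fraction $1 - 1/\sqrt{2} \simeq 0.2929$ quoted below for $a = M$.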

For the fastest possible spin, $a = M$, this gives $M_{\rm rot} = M(1 - 1/\sqrt{2}) \simeq 0.2929M$. This is the maximum amount of energy that can be extracted, and it is enormous compared to the energy $\sim 0.005M$ that can be released by thermonuclear burning in a star with mass $M$.

The Penrose process of throwing in particles and splitting them in two is highly idealized, and of little or no importance in Nature. However, Nature seems to have found a very effective method for extracting rotational energy from spinning black holes: the Blandford-Znajek process, in which magnetic fields, threading through a black hole and held on the hole by a surrounding disk of hot plasma, extract energy electromagnetically. This process is thought to power the gigantic jets that shoot out of the nuclei of some galaxies, and might also be the engine for some powerful gamma-ray bursts.

****************************

EXERCISES

Exercise 24.10 Derivation: Surface Area of a Spinning Black Hole
From the Kerr metric (24.71) derive Eq. (24.84) for the surface area of a spinning black hole’s horizon, i.e., the surface area of the two-dimensional surface $\{r = r_H,\ t = \text{constant}\}$.

Exercise 24.11 Example: Penrose Process, Hawking Radiation, and Thermodynamics of Black Holes

(a) Consider the Penrose process, described in the text, in which a particle flying inward toward a spinning hole’s horizon splits in two inside the ergosphere, and one piece plunges into the hole while the other flies back out. Show that it is always possible to arrange this process so the plunging particle has negative energy, $E^{\rm plunge} = -\vec p^{\,\rm plunge}\cdot\partial/\partial t < 0$. [Hint: Perform a calculation in a local Lorentz frame in which $\partial/\partial t$ points along a spatial basis vector, $\vec e_{\hat 1}$. Why is it possible to find such a local Lorentz frame?]

(b) Around a spinning black hole consider the vector field

$$\vec\xi_H \equiv \partial/\partial t + \Omega_H\, \partial/\partial\phi , \tag{24.87}$$

where $\Omega_H$ is the horizon’s angular velocity. Show that in the horizon (at radius $r = r_H$) this vector field is null and is tangent to the horizon generators. Show that all other vectors in the horizon (those not parallel to $\vec\xi_H$) are spacelike.

(c) In the Penrose process, the plunging particle changes the hole’s mass by an amount $\Delta M$ and its spin angular momentum by an amount $\Delta J_H$. Show that

$$\Delta M - \Omega_H \Delta J_H = -\vec p^{\,\rm plunge}\cdot\vec\xi_H . \tag{24.88}$$

Here $\vec p^{\,\rm plunge}$ and $\vec\xi_H$ are to be evaluated at the event where the particle plunges through the horizon, so they both reside in the same tangent space. [Hint: the angular momentum carried into the horizon is the quantity $p^{\rm plunge}_\phi = \vec p^{\,\rm plunge}\cdot\partial/\partial\phi$. Why? This quantity is conserved along the plunging particle’s world line. Why?]

(d) Show that if $\vec A$ is any future-directed timelike vector and $\vec K$ is any future-directed null vector, both living in the tangent space at the same event in spacetime, then $\vec A\cdot\vec K < 0$. [Hint: Perform a calculation in a specially chosen local Lorentz frame.] Thereby conclude that $-\vec p^{\,\rm plunge}\cdot\vec\xi_H$ is positive, whatever may be the world line and rest mass of the plunging particle.

(e) Show that, in order for the plunging particle to decrease the hole’s mass, it must also decrease the hole’s angular momentum; i.e., it must spin the hole down a bit.

(f) The second law of black-hole mechanics says that, whatever may be the particle’s world line and rest mass, when the particle plunges through the horizon it causes the horizon’s surface area $A_H$ to increase. This suggests that the always positive quantity $\Delta M - \Omega_H\Delta J_H = -\vec p^{\,\rm plunge}\cdot\vec\xi_H$ might be a multiple of the increase $\Delta A_H$ of the horizon area. Show that this is, indeed, the case:

$$\Delta M = \Omega_H \Delta J_H + \frac{g_H}{8\pi}\,\Delta A_H , \tag{24.89}$$

where $g_H$ is given in terms of the hole’s mass $M$ and the radius $r_H$ of its horizon by

$$g_H = \frac{r_H - M}{2Mr_H} . \tag{24.90}$$

[You might want to do the algebra, based on Kerr-metric formulae, on a computer.] The quantity $g_H$ is called the hole’s “surface gravity” for a variety of reasons, including the fact that an observer who hovers just above a horizon generator, blasting his or her rocket engines to avoid falling into the hole, has a 4-acceleration with magnitude $g_H/\alpha$ and thus feels a “gravitational acceleration” of this magnitude; here $\alpha = (-g^{tt})^{-1/2}$ is built from a component of the Kerr metric, Eqs. (24.70a) and (24.72). This gravitational acceleration is arbitrarily large for an observer arbitrarily close to the horizon (where $\Delta$ and hence $\alpha$ is arbitrarily close to zero); when renormalized by $\alpha$ to make it finite, the acceleration is $g_H$.

(g) Stephen Hawking has shown, using quantum field theory, that a black hole’s horizon emits thermal (black-body) radiation. The temperature of this “Hawking radiation”, as measured by the observer who hovers just above the horizon, is proportional to the gravitational acceleration $g_H/\alpha$ that the observer measures, with a proportionality constant $\hbar/2\pi k_B$, where $\hbar$ is Planck’s reduced constant and $k_B$ is Boltzmann’s constant. As this thermal radiation climbs out of the horizon’s vicinity and flies off to large radii, its frequencies and temperature get redshifted by the factor $\alpha$, so as measured by distant observers the temperature is

$$T_H = \frac{\hbar}{2\pi k_B}\, g_H . \tag{24.91}$$

This suggests a reinterpretation of Eq. (24.89) as the first law of thermodynamics for a black hole:

$$\Delta M = \Omega_H \Delta J_H + T_H \Delta S_H , \tag{24.92}$$

where $S_H$ is the hole’s entropy; cf. Eq. (3.71). Show that this entropy is related to the horizon’s surface area by

$$S_H = k_B \frac{A_H}{4\ell_p^2} , \tag{24.93}$$

where $\ell_p = \sqrt{\hbar G/c^3} = 1.616 \times 10^{-33}$ cm is the Planck length (with $G$ Newton’s gravitation constant and $c$ the speed of light). Because $S_H \propto A_H$, the second law of black-hole mechanics is actually the second law of thermodynamics in disguise. A black hole’s entropy always increases.
[Actually, the emission of the Hawking radiation will decrease the hole’s entropy and surface area; but general relativity doesn’t know about this because general relativity is a classical theory, and Hawking’s prediction of the thermal radiation is based on quantum theory. Thus, the Hawking radiation violates the second law of black hole mechanics. It does not, however, violate the second law of thermodynamics, because the entropy carried into the surrounding universe by the Hawking radiation exceeds the magnitude of the decrease of the hole’s entropy. The total entropy of hole plus universe increases.] (h) For a ten solar mass, nonspinning black hole, what is the temperature of the Hawking radiation in degrees Kelvin, and what is the hole’s entropy in units of the Boltzmann constant? (i) Reread the discussions of black-hole thermodynamics and entropy in the expanding universe in Secs. 3.11.2 and 3.11.3, which rely on the results of this exercise.

Fig. 24.9: Spacetime diagrams showing the slices of simultaneity as defined by various families of observers. Diagram (a) is in flat spacetime, and the three families (those with solid slices, those with dashed, and those with dotted) are inertial, so their slices of constant time are those of global Lorentz frames. Diagram (b) is in curved spacetime, and the two families’ slices of simultaneity illustrate the “many-fingered” nature of time. Diagram (c) illustrates the selection of an arbitrary foliation of spacelike hypersurfaces of simultaneity, and the subsequent construction of the world lines of observers who move orthogonal to those hypersurfaces, i.e., for whom light-ray synchronization will define those hypersurfaces as simultaneities. [Figure labels: axes t and x; slices t = 0, 1, 2, 3; world lines marked “observer”.]

****************************
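As Ex. 24.11(f) suggests, the algebra behind Eq. (24.89) can be delegated to a computer. Rather than reproduce the symbolic derivation, the sketch below checks the relation numerically: it writes $A_H = 8\pi M r_H$ [Eq. (24.84a)] with $a = J_H/M$, takes small symmetric steps $(\Delta M, \Delta J_H)$, and confirms that $\Delta M - \Omega_H\Delta J_H - (g_H/8\pi)\Delta A_H$ vanishes to third order in the step size, using $\Omega_H$ from Eq. (24.77) and $g_H$ from Eq. (24.90). Geometrized units; the function names are hypothetical.

```python
import math


def horizon_quantities(M, J):
    """Return (A_H, Omega_H, g_H) for a Kerr hole of mass M and spin J (|J| < M^2)."""
    a = J / M
    r_H = M + math.sqrt(M * M - a * a)
    area = 8.0 * math.pi * M * r_H              # Eq. (24.84a)
    omega_h = a / (2.0 * M * r_H)               # Eq. (24.77)
    g_h = (r_H - M) / (2.0 * M * r_H)           # Eq. (24.90)
    return area, omega_h, g_h


def first_law_residual(M, J, dM, dJ):
    """dM - Omega_H dJ - (g_H / 8 pi) dA, with dA from a centred difference."""
    a_plus, _, _ = horizon_quantities(M + dM / 2, J + dJ / 2)
    a_minus, _, _ = horizon_quantities(M - dM / 2, J - dJ / 2)
    _, omega_h, g_h = horizon_quantities(M, J)
    return dM - omega_h * dJ - g_h * (a_plus - a_minus) / (8.0 * math.pi)


for step in (1e-2, 1e-3, 1e-4):
    res = first_law_residual(1.0, 0.6, step, -2.0 * step)
    print(f"step {step:.0e}: residual {res:+.3e}")
```

Centring the difference on $(M, J_H)$ makes the residual third order in the step, so each tenfold reduction of the step should shrink it by roughly a factor of a thousand until rounding error takes over.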

24.6

The Many-Fingered Nature of Time

We conclude this chapter with a discussion of a concept which John Archibald Wheeler (the person who has most clarified the conceptual underpinnings of general relativity) calls the many-fingered nature of time. In the flat spacetime of special relativity there are preferred families of observers: Each such family lives in a global Lorentz reference frame and uses that frame to split spacetime into space plus time. The hypersurfaces of constant time (“slices of simultaneity”) which result from that split are flat hypersurfaces which slice through all of spacetime [Fig. 24.9(a)]. Of course, different preferred families live in different global Lorentz frames and thus split up spacetime into space plus time in different manners [e.g., the dotted slices of constant time in Fig. 24.9(a) as contrasted to the dashed ones]. As a result, there is no universal concept of time in special relativity; but, at least, there are some strong restrictions on time: Each family of observers will agree that another family’s slices of simultaneity are flat slices.

In general relativity, i.e., in curved spacetime, even this restriction is gone: In a generic curved spacetime there are no flat hypersurfaces, and hence no candidates for flat slices of simultaneity. Hand in hand with this goes the fact that, in a generic curved spacetime there are no global Lorentz frames, and thus no preferred families of observers. A family of observers who are all initially at rest with respect to each other, and each of whom moves freely (inertially), will soon acquire relative motion because of tidal forces. As a result, their slices of simultaneity (defined locally by Einstein light-ray synchronization, and then defined globally by patching together the little local bits of slices) may soon become rather contorted. Correspondingly, as is shown in Fig. 24.9(b), different families of observers will slice spacetime up into space plus time in manners that can be quite distorted, relative to each other, with “fingers” of one family’s time slices pushing forward, ahead of the other family’s here, and lagging behind there, and pushing ahead in some other place.

In curved spacetime it is best not even to restrict oneself to inertial (freely falling) observers. For example, in the spacetime of a static star, or of the exterior of a Schwarzschild black hole, the family of static observers [observers whose world lines are $\{(r, \theta, \phi) = \text{const},\ t \text{ varying}\}$] are particularly simple; their world lines mold themselves to the static structure of spacetime in a simple, static manner. However, these observers are not inertial; they do not fall freely. This need not prevent us from using them to split up spacetime into space plus time, however. Their proper reference frames produce a perfectly good split; and when one uses that split, in the case of a black hole, one obtains a 3-dimensional-space version of the laws of black-hole physics which is a useful tool in astrophysical research; see Thorne, Price, and Macdonald (1986).

For any family of observers, accelerated or inertial, the slices of simultaneity as defined by Einstein light-ray synchronization (or equivalently by the space slices of the observer’s proper reference frames) are the 3-surfaces orthogonal to the observers’ world lines; cf. Fig. 24.9(c). To see this most easily, pick a specific event along a specific observer’s world line, and study the slice of simultaneity there from the viewpoint of a local Lorentz frame in which the observer is momentarily at rest.
Light-ray synchronization guarantees that, locally, the observer’s slice of simultaneity will be the same as that of this local Lorentz frame; and, since the frame’s slice is orthogonal to its own time direction and that time direction is the same as the direction of the observer’s world line, the slice is orthogonal to the observer’s world line. By the discussion in Sec. 22.5, the slice is also the same, locally (to first order in distance away from the world line), as a slice of constant time in the observer’s proper reference frame.

If the observers’ relative motions are sufficiently contorted (in curved spacetime or in flat), it may not be possible to mesh their local slices of simultaneity, defined in this manner, into global slices of simultaneity; i.e., there may not be any global 3-dimensional hypersurfaces orthogonal to their world lines. We can protect against this eventuality, however, by choosing the slices first: Select any foliation of spacelike slices through the curved spacetime [Fig. 24.9(c)]. Then there will be a family of timelike world lines that are everywhere orthogonal to these hypersurfaces. A family of observers who move along those world lines and who define their 3-spaces of simultaneity by local light-ray synchronization will thereby identify the orthogonal hypersurfaces as their simultaneities. Ex. 24.12 illustrates these ideas using Schwarzschild spacetime.

****************************

EXERCISES

Exercise 24.12 Practice: Slices of Simultaneity in Schwarzschild Spacetime

(a) One possible choice of slices of simultaneity for Schwarzschild spacetime is the set of 3-surfaces $t = \text{const}$, where $t$ is the Schwarzschild time coordinate. Show that the unique family of observers for whom these are the simultaneities are the static observers, with world lines $\{(r, \theta, \phi) = \text{const},\ t \text{ varying}\}$. Explain why these slices of simultaneity and families of observers exist only outside the horizon of a black hole, and cannot be extended into the interior. Draw a picture of the world lines of these observers and their slices of simultaneity in an Eddington-Finkelstein spacetime diagram.

(b) A second possible choice of simultaneities is the set of 3-surfaces $\tilde t = \text{const}$, where $\tilde t$ is the Eddington-Finkelstein time coordinate. What are the world lines of the observers for whom these are the simultaneities? Draw a picture of those world lines in an Eddington-Finkelstein spacetime diagram. Note that they and their simultaneities cover the interior of the hole as well as its exterior.

****************************
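The criterion behind Ex. 24.12 can be phrased computationally: the slices $T = \text{const}$ are spacelike exactly where $g^{TT} < 0$, and the orthogonal observers then have $u_\alpha = -N\,\partial_\alpha T$ with $N = (-g^{TT})^{-1/2}$, so $u^\alpha = -N g^{\alpha T}$. The sketch below applies this to Schwarzschild time, with $g^{tt} = -1/(1 - 2M/r)$, and to Eddington-Finkelstein time, with $g^{\tilde t\tilde t} = -(1 + 2M/r)$ and $g^{\tilde t r} = 2M/r$; these inverse components follow from the $a \to 0$ limit of Eq. (24.79), $ds^2 = -(1 - 2M/r)\,d\tilde t^2 + (4M/r)\,d\tilde t\,dr + (1 + 2M/r)\,dr^2 + r^2 d\Omega^2$. It is a numerical illustration with hypothetical names, and it leaves the pictures the exercise asks for to the reader.

```python
import math


def orthogonal_observer(g_TT, g_Tr):
    """(u^T, u^r) of observers orthogonal to the slices T = const, given the
    inverse-metric components g^{TT}, g^{Tr}; None if the slices are not spacelike."""
    if g_TT >= 0.0:
        return None
    N = 1.0 / math.sqrt(-g_TT)             # normalization: u . u = -1
    return -N * g_TT, -N * g_Tr            # u^alpha = -N g^{alpha T}


def ef_norm(u_t, u_r, M, r):
    """u . u using the Eddington-Finkelstein (t_tilde, r) metric components."""
    x = 2.0 * M / r
    return -(1.0 - x) * u_t ** 2 + 2.0 * x * u_t * u_r + (1.0 + x) * u_r ** 2


M = 1.0
for r in (4.0, 2.5, 1.5, 0.5):
    static = orthogonal_observer(-1.0 / (1.0 - 2.0 * M / r), 0.0)    # Schwarzschild t
    ef = orthogonal_observer(-(1.0 + 2.0 * M / r), 2.0 * M / r)      # Eddington-Finkelstein t_tilde
    print(f"r/M = {r}: Schwarzschild-t observers {static}, "
          f"EF observers (u^t, u^r) = ({ef[0]:.3f}, {ef[1]:.3f}), u.u = {ef_norm(*ef, M, r):.3f}")
```

Inside $r = 2M$ the Schwarzschild-time slices fail the test (the function returns `None`), while the Eddington-Finkelstein slices pass at every radius and their observers have $u^r < 0$.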

Bibliographic Note

In our opinion, the best elementary textbook treatment of black holes and relativistic stars is that in Chaps. 12, 13, 15, and 24 of Hartle (2003); this treatment is also remarkably complete. For nonrotating, relativistic stars at an elementary level we also recommend Chap. 10 of Schutz (1980), and at a more advanced level (including stellar pulsations), Chaps. 23, 24, and 26 of MTW. For black holes at an intermediate level see Chaps. 5 and 6 of Carroll (2004), and at a more advanced level, Chap. 12 of Wald (1984), which is brief and highly mathematical, and Chaps. 31–34 of MTW, which is long and less mathematical. The above are all portions of general relativity textbooks. There are a number of books and monographs devoted solely to the theory of black holes and/or relativistic stars. Among these, we particularly recommend the following: Shapiro and Teukolsky (1983) is an astrophysically oriented book at much the same level as this chapter, but with much greater detail and extensive applications; it deals with black holes, neutron stars and white dwarf stars in astrophysical settings. Frolov and Novikov (1998) is a very thorough monograph on black holes, including their fundamental theory, and their interactions with the rest of the universe; it includes extensive references to the original literature and readable summaries of all the important issues that had been treated by black-hole researchers as of 1997. Chandrasekhar (1983) is an idiosyncratic but elegant and complete monograph on the theory of black holes and especially small perturbations of them.

Box 24.3 Important Concepts in Chapter 24
• Schwarzschild spacetime geometry
  – Metric in Schwarzschild coordinates, Eq. (24.1); in isotropic coordinates, Ex. 24.3; in Eddington-Finkelstein coordinates, Eqs. (24.64), (24.65)
  – Connection coefficients and curvature tensors in Schwarzschild coordinates and in their orthonormal basis, Box 24.2
• Deducing the properties of a spacetime and the nature of the coordinates from a metric, Sec. sec:25Schwartzschild and Ex. 24.2
• Birkhoff’s theorem, Sec. 24.3.1
• Relativistic stars, Sec. 24.3
  – Radius $R$ always larger than gravitational radius $2M$, Eq. (24.15), Sec. 24.3.5
  – Metric (24.16) and stress-energy tensor (24.26)
  – Proper reference frame of fluid, Sec. 24.3.2
  – Deducing equations of structure from local energy-momentum conservation and the Einstein field equations, Secs. 24.3.3 and 24.3.4
  – Momentum conservation implies $(\rho + P)\vec a = -\nabla P$, Sec. 24.3.3
  – Embedding diagram, Sec. 24.3.5, Fig. 24.1, Ex. 24.4
  – Gravitational redshift, Ex. 24.5
• Implosion of a star to form a Schwarzschild black hole, Sec. 24.4
  – To reach $R = 2M$ (horizon radius): infinite Schwarzschild coordinate time $t$ but finite proper time $\tau$ and Eddington-Finkelstein time $\tilde t$, Sec. 24.4, Ex. 24.7
  – Finite tidal forces at horizon radius, Eq. (24.63)
  – Black-hole horizon: its formation and evolution, Fig. 24.7
  – Infinite tidal forces at singularity, Ex. 24.8
• Wormholes, Ex. 24.9
• Spinning black holes: the Kerr spacetime, Sec. 24.5
  – Kerr metric: in Boyer-Lindquist coordinates, Eqs. (24.70); in Kerr coordinates, Eq. (24.79)
  – Dragging of inertial frames, Sec. 24.5.2
  – Horizon generators, Sec. 24.5.3, Fig. 24.8
  – Horizon radius $r_H$, Eq. (24.75); horizon angular velocity $\Omega_H$, Eq. (24.77); horizon surface area $A_H$, Eqs. (24.84); horizon surface gravity $g_H$, Eq. (24.90)
  – Second law of black-hole mechanics and thermodynamics, Ex. 24.11
  – Hawking radiation, and black-hole temperature and entropy, Ex. 24.11
  – Rotational energy, energy extraction, ergosphere, irreducible mass, Sec. 24.5.4

Bibliography

Bertotti, Bruno, 1959. “Uniform Electromagnetic Field in the Theory of General Relativity,” Physical Review, 116, 1331–1333.
Birkhoff, George, 1923. Relativity and Modern Physics, Harvard University Press, Cambridge, Massachusetts.
Brady, Patrick R., Droz, S., and Morsink, Sharon M., 1998. “The late-time singularity inside non-spherical black holes,” Physical Review D, 58, 084034.
Carroll, S. M., 2004. Spacetime and Geometry: An Introduction to General Relativity, Addison Wesley, San Francisco.
Chandrasekhar, Subrahmanyan, 1983. The Mathematical Theory of Black Holes, Oxford University Press, Oxford.
Eddington, Arthur S., 22. The Mathematical Theory of Relativity, Cambridge University Press, Cambridge.
Finkelstein, David, 1958. “Past-future asymmetry of the gravitational field of a point particle,” Physical Review, 110, 965–967.
Flamm, Ludwig, 1916. “Beiträge zur Einsteinschen Gravitationstheorie,” Physik Z., 17, 448–454.
Friedman, John, Morris, Michael S., Novikov, Igor D., Echeverria, Fernando, Klinkhammer, Gunnar, Thorne, Kip S., and Yurtsever, Ulvi, 1990. “Cauchy problem in spacetimes with closed timelike curves,” Physical Review D, 42, 1915–1930.
Frolov, Valery Pavlovich and Novikov, Igor Dmitrievich, 1998. The Physics of Black Holes, second edition, Kluwer Academic Publishers, Berlin.
Hartle, J. B., 2003. Gravity: An Introduction to Einstein’s General Relativity, Addison-Wesley, San Francisco.
Kim, Sung-Won and Thorne, Kip S., 1991. “Do vacuum fluctuations prevent the creation of closed timelike curves?” Physical Review D, 44, 1077–1099.
MTW: Misner, Charles W., Thorne, Kip S. and Wheeler, John A., 1973. Gravitation, W. H. Freeman & Co., San Francisco.
Morris, Mike and Thorne, Kip S., 1988. “Wormholes in spacetime and their use for interstellar travel: a tool for teaching general relativity,” American Journal of Physics, 56, 395–416.
Morris, Mike, Thorne, Kip S., and Yurtsever, Ulvi, 1988. “Wormholes, time machines, and the weak energy condition,” Physical Review Letters, 61, 1446–1449.
Oppenheimer, J. Robert, and Snyder, Hartland, 1939. “On continued gravitational contraction,” Physical Review, 56, 455–459.
Oppenheimer, J. Robert, and Volkoff, George, 1939. “On massive neutron cores,” Physical Review, 55, 374–381.
Robinson, Ivor, 1959. “A Solution of the Maxwell-Einstein Equations,” Bull. Acad. Polon. Sci., 7, 351–352.
Sargent, Wallace L. W., Young, Peter J., Boksenberg, A., Shortridge, Keith, Lynds, C. R., and Hartwick, F. D. A., 1978. “Dynamical evidence for a central mass condensation in the galaxy M87,” Astrophysical Journal, 221, 731–744.
Schutz, B., 1980. Geometrical Methods of Mathematical Physics, Cambridge University Press, Cambridge.
Schwarzschild, Karl, 1916a. “Über das Gravitationsfeld eines Massenpunktes nach der Einsteinschen Theorie,” Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften, 1916 vol. I, 189–196.
Schwarzschild, Karl, 1916b. “Über das Gravitationsfeld einer Kugel aus inkompressibler Flüssigkeit nach der Einsteinschen Theorie,” Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften, 1916 vol. I, 424–434.
Shapiro, Stuart L. and Teukolsky, Saul A., 1983. Black Holes, White Dwarfs, and Neutron Stars, Wiley, New York.
Thorne, Kip S., Price, Richard H. and Macdonald, Douglas A., 1986. Black Holes: The Membrane Paradigm, Yale University Press, New Haven, Conn.
Tolman, Richard Chace, 1939. “Static solutions of Einstein’s field equations for spheres of fluid,” Physical Review, 55, 364–373.
Wald, R. M., 1984. General Relativity, University of Chicago Press, Chicago.

Contents
25 Gravitational Waves and Experimental Tests of General Relativity
25.1 Overview
25.2 Experimental Tests of General Relativity
25.2.1 Equivalence Principle, Gravitational redshift, and Global Positioning System
25.2.2 Perihelion advance of Mercury
25.2.3 Gravitational deflection of light, Fermat’s principle and Gravitational Lenses
25.2.4 Shapiro time delay
25.2.5 Frame dragging and Gravity Probe B
25.2.6 Binary Pulsar
25.3 Gravitational Waves and their Propagation
25.3.1 The gravitational wave equation
25.3.2 The waves’ two polarizations: + and ×
25.3.3 Gravitons and their spin
25.3.4 Energy and Momentum in Gravitational Waves
25.3.5 Wave propagation in a source’s local asymptotic rest frame
25.3.6 Wave propagation via geometric optics
25.3.7 Metric perturbation; TT gauge
25.4 The Generation of Gravitational Waves
25.4.1 Multipole-moment expansion
25.4.2 Quadrupole-moment formalism
25.4.3 Gravitational waves from a binary star system
25.5 The Detection of Gravitational Waves
25.5.1 Interferometer analyzed in TT gauge
25.5.2 Interferometer analyzed in proper reference frame of beam splitter
25.5.3 Realistic Interferometers

Chapter 25
Gravitational Waves and Experimental Tests of General Relativity

Version 0825.1.K.pdf, 13 May 2009. Please send comments, suggestions, and errata via email to [email protected] or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125

Box 25.1
Reader’s Guide
• This chapter relies significantly on
  – The special relativity portions of Chap. 1.
  – Chapter 22, on the transition from special relativity to general relativity.
  – Chapter 23, on the fundamental concepts of general relativity, especially Sec. 23.9 on weak, relativistic gravitational fields.
  – Chapter 24, on relativistic stars and black holes.
  – Sec. 6.3 on geometric optics.
• In addition, Sec. 25.2.3 on Fermat’s principle and gravitational lenses is closely linked to Sec. 6.5 on gravitational lenses and Sec. 7.6 on diffraction at a caustic.
• Portions of this chapter are a foundation for Chap. 27, Cosmology.

25.1 Overview

In 1915, when Einstein formulated general relativity, human technology was incapable of providing definitive experimental tests of his theory. Only a half century later did technology begin to catch up. In the remaining 35 years of the century, experiments improved from accuracies of a few tens of per cent to a part in 1000 or even 10,000, and general relativity passed the tests with flying colors. In Sec. 25.2 we shall describe some of these tests, derive general relativity’s predictions for them, and discuss the experimental results.

In the early twenty-first century, observations of gravitational waves will radically change the character of research on general relativity. They will produce, for the first time, tests of general relativity in strong-gravity situations. They will permit us to study relativistic effects in neutron-star and black-hole binaries with exquisite accuracies. They will enable us to map the spacetime geometries of black holes with high precision, and to study observationally the large-amplitude, highly nonlinear vibrations of curved spacetime that occur when two black holes collide and merge. And (as we shall see in Chap. 27) they may enable us to probe the singularity in which the universe was born and the universe’s evolution in its first tiny fraction of a second.

In this chapter we shall develop the theory of gravitational waves in detail, and shall describe the efforts to detect the waves and the sources that may be seen. More specifically, in Sec. 25.3 we shall develop the mathematical description of gravitational waves, both classically and quantum mechanically (in the language of gravitons), and shall study their propagation through flat spacetime and also, via the tools of geometric optics, through curved spacetime. Then in Sec. 25.4 we shall develop the simplest approximate method for computing the generation of gravitational waves, the “quadrupole-moment formalism”; and we shall describe, and present a few details of, other more sophisticated and accurate methods based on multipolar expansions, post-Newtonian techniques, and numerical simulations on supercomputers (“numerical relativity”). In Sec. 25.5, we shall turn to gravitational-wave detection, focusing especially on detectors such as LIGO and LISA that rely on laser interferometry.

25.2 Experimental Tests of General Relativity

In this section we shall describe briefly some of the most important experimental tests of general relativity. For greater detail, see Will (1993, 2001, 2005).

25.2.1 Equivalence Principle, Gravitational redshift, and Global Positioning System

A key aspect of the equivalence principle is the prediction that all objects whose size is extremely small compared to the radius of curvature of spacetime, and on which no nongravitational forces act, should move on geodesics. This means, in particular, that their trajectories through spacetime should be independent of their chemical composition. This is called the weak equivalence principle or the universality of free fall. Efforts to test the universality of free fall date back to Galileo’s (perhaps apocryphal) experiment of dropping objects from the leaning tower of Pisa. In the twentieth century a sequence of ever-improving experiments led by Roland von Eötvös (1920), Robert Dicke (1964), Vladimir Braginsky (1972), and Eric Adelberger (1994) achieved an accuracy $\Delta a/a < 5\times10^{-13}$ for the difference in gravitational acceleration toward the Sun of Earth-bound bodies with very different chemical composition. A proposed space experiment called STEP has the prospect of increasing this accuracy to the phenomenal level of $\Delta a/a \lesssim 1\times10^{-18}$.

General relativity predicts that bodies with significant self-gravity (even black holes) should also fall, in a nearly homogeneous external gravitational field, with the same acceleration as a body with negligible self-gravity. This prediction has been tested by comparing the gravitational accelerations of the Earth and Moon toward the Sun. Their fractional difference of acceleration (determined by tracking the relative motions of the Moon and Earth using laser beams fired from Earth, reflected off mirrors that astronauts and cosmonauts have placed on the Moon, and received back at Earth) has been measured by the LURE Project to be $\Delta a/a \lesssim 3\times10^{-13}$. Since the Earth and Moon have (gravitational potential energy)/(rest-mass energy) $\simeq -5\times10^{-10}$ and $\simeq -2\times10^{-11}$ respectively, this verifies that gravitational energy falls with the same acceleration as other forms of energy to within about a part in 1000. For references and for discussions of a variety of other tests of the equivalence principle, see Will (1993, 2001, 2005).

From the equivalence principle, one can deduce that, for an emitter and absorber at rest in a Newtonian gravitational field Φ, light (or other electromagnetic waves) must be gravitationally redshifted by an amount $\Delta\lambda/\lambda = \Delta\Phi$, where $\Delta\Phi$ is the difference in Newtonian potential between the locations of the emitter and receiver. (See Ex. 24.5 for a general relativistic derivation when the field is that of a nonspinning, spherical central body with the emitter on the body’s surface and the receiver far from the body.) Relativistic effects will produce a correction to this shift with magnitude $\sim(\Delta\Phi)^2$ [cf. Eq. (24.54)], but for experiments performed in the solar system the currently available precision is too poor to see this correction; so such experiments test the equivalence principle and not the details of general relativity.

The highest-precision test of this gravitational redshift thus far was NASA’s 1976 Gravity Probe A project (led by Robert Vessot), in which several atomic clocks were flown to a height of about 10,000 km above the Earth and were compared with atomic clocks on the Earth via radio signals transmitted downward. After correcting for special relativistic effects due to the relative motions of the rocket’s clocks and the Earth clocks, the measured gravitational redshift agreed with the prediction to within the experimental accuracy of about 2 parts in 10,000.

The Global Positioning System (GPS), by which one can determine one’s location on Earth to within an accuracy of about 10 meters, is based on signals transmitted from a set of Earth-orbiting satellites. Each satellite’s position is encoded on its transmitted signals, together with the time of transmission as measured by atomic clocks onboard the satellite. A person’s GPS receiver contains a high-accuracy clock and a computer. It measures the signal arrival time and compares it with the encoded transmission time to determine the distance from satellite to receiver; and it uses that distance, for several satellites, together with the encoded satellite positions, to determine (by triangulation) the receiver’s location on Earth. The transmission times encoded on the signals are corrected for the gravitational redshift before transmission. Without this redshift correction, the satellite clocks would quickly get out of synchronization with all the clocks on the ground, thereby eroding the GPS accuracy; see Ex. 25.1. Thus, a good understanding of general relativity was crucial to the design of the GPS!¹
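To see how quickly an uncorrected satellite clock would drift, the short sketch below estimates the fractional rate offset $\Delta\Phi/c^2$ between a clock at the 18,000 km height used in Ex. 25.1 and a clock on the ground. It is a rough estimate under stated assumptions: the values of the Earth’s GM and mean radius are standard constants assumed here, only the gravitational redshift is included, and the special relativistic time dilation from the satellites’ orbital speed (and the Earth’s rotation) is deliberately ignored.

```python
# Rough estimate: drift of an uncorrected GPS satellite clock, counting only
# the gravitational redshift (Delta lambda / lambda = Delta Phi / c^2).
# Assumed constants (SI units); orbital-speed time dilation is ignored.
GM_EARTH = 3.986004e14       # m^3 s^-2, Earth's gravitational parameter
R_EARTH = 6.371e6            # m, mean Earth radius
C = 2.99792458e8             # m/s
ALTITUDE = 1.8e7             # m, the 18,000 km height used in Ex. 25.1


def fractional_rate_offset(altitude_m: float) -> float:
    """Delta Phi / c^2 between a clock at altitude_m and one on the ground."""
    r_sat = R_EARTH + altitude_m
    return GM_EARTH / C**2 * (1.0 / R_EARTH - 1.0 / r_sat)


def time_to_drift(position_error_m: float, altitude_m: float = ALTITUDE) -> float:
    """Seconds until the accumulated clock offset equals position_error_m / c."""
    return (position_error_m / C) / fractional_rate_offset(altitude_m)


offset = fractional_rate_offset(ALTITUDE)
print(f"fractional rate offset: {offset:.2e}")
print(f"10 m error after ~{time_to_drift(10.0):.0f} s")
print(f"1 km error after ~{time_to_drift(1000.0) / 3600:.1f} h")
```

With these assumed constants the rate offset is about $5\times10^{-10}$, so a 10 m error (about 33 ns of clock offset) builds up in roughly a minute, and a 1 km error in under two hours.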

25.2.2 Perihelion advance of Mercury

It was known at the end of the 19th century that the point in Mercury’s orbit closest to the Sun, known as its perihelion, advances at a rate of about 575″ per century with respect to the fixed stars, of which about 532″ can be accounted for by Newtonian perturbations of the other planets. The remaining ∼43″ per century was a mystery until Einstein showed that it can be accounted for quantitatively by the general theory of relativity. More specifically (as is demonstrated in Ex. 25.2), if we idealize the Sun as nonrotating and spherical so its external gravitational field is Schwarzschild, ignore the presence of the other planets, and note that the radius of Mercury’s orbit is very large compared to the Sun’s mass (in geometrized units), then Mercury’s orbit will be very nearly an ellipse, and the ellipse’s perihelion will advance, from one orbit to the next, by an angle

$$\Delta\phi = \frac{6\pi M}{p} + O\!\left(\frac{M^2}{p^2}\right)\ \text{radians}. \qquad (25.1)$$

Here M is the Sun’s mass and p is the ellipse’s semi-latus rectum, which is related to its semimajor axis a (half its major diameter) and its eccentricity e by $p = a(1-e^2)$. For the parameters of Mercury’s orbit ($M = M_\odot \simeq 1.4766$ km, $a = 5.79089\times10^7$ km, $e = 0.205628$), this advance is 0.10352″ per orbit. Since the orbital period is 0.24085 Earth years, this shift corresponds to 42.98 arcseconds per century.

Although the Sun is not precisely spherical, its tiny gravitational oblateness (as inferred from measurements of its spectrum of pulsations; Fig. 15.2) has been shown to contribute negligibly to this perihelion shift; and the frame dragging due to the Sun’s rotational angular momentum is also (sadly!) negligible compared to the experimental accuracy. So 42.98″ per century would be the shift if the Sun and Mercury were the only objects in the solar system. The gravitational fields of the other planets, however, tug on Mercury’s orbit, producing (according to Newtonian theory) the large additional shift of about 532″ per century. The weakness of gravity in the solar system guarantees that relativistic corrections to this additional shift are negligible, and that this shift can be added linearly to the Schwarzschild prediction of 42.98″, to within the accuracy of the measurements. When this is done and comparison is made with experiment, the 42.98″ prediction agrees with the observations to within the data’s accuracy of about 1 part in 1000.
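The arithmetic behind the 0.10352″ and 42.98″ figures is easy to reproduce. The minimal Python sketch below simply evaluates Eq. (25.1) with the orbital elements quoted above; the conversion to arcseconds per century assumes the quoted 0.24085-year period. It is a check of the numbers, not a new calculation.

```python
import math

# Perihelion advance per orbit, Eq. (25.1): Delta phi = 6 pi M / p, p = a (1 - e^2).
# Mercury's orbital elements and the Sun's mass (in km) are the values quoted above.
M_SUN_KM = 1.4766          # G M_sun / c^2 in km
A_KM = 5.79089e7           # semimajor axis in km
ECC = 0.205628             # eccentricity
PERIOD_YR = 0.24085        # orbital period in Earth years
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0


def perihelion_advance_per_orbit(mass_km, a_km, ecc):
    """Leading-order Schwarzschild advance per orbit, in radians."""
    p = a_km * (1.0 - ecc**2)          # semi-latus rectum
    return 6.0 * math.pi * mass_km / p


per_orbit = perihelion_advance_per_orbit(M_SUN_KM, A_KM, ECC) * RAD_TO_ARCSEC
per_century = per_orbit * 100.0 / PERIOD_YR
print(f"{per_orbit:.5f} arcsec per orbit, {per_century:.2f} arcsec per century")
```

Running it prints 0.10352 arcsec per orbit and 42.98 arcsec per century, matching the values in the text.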

25.2.3 Gravitational deflection of light, Fermat’s principle and Gravitational Lenses

Einstein not only explained the anomalous perihelion shift of Mercury. He also predicted [Ex. 25.3] that the null rays along which starlight propagates will be deflected, when passing through the curved spacetime near the Sun, by an angle

$$\Delta\phi = \frac{4M}{b} + O\!\left(\frac{M^2}{b^2}\right) \qquad (25.2)$$

relative to their trajectories if spacetime were flat. Here M is the Sun’s mass and b is the ray’s impact parameter (distance of closest approach to the Sun’s center). For comparison, theories that incorporated a Newtonian-like gravitational field into special relativity (Sec. 23.1) predicted half this deflection. The deflection was measured to an accuracy of ∼20 per cent during the 1919 solar eclipse and agreed with general relativity rather than the competing theories, a triumph that helped make Einstein world famous. Modern experiments, based on the deflection of radio waves from distant quasars as measured using Very Long Baseline Interferometry (interfering the waves arriving at radio telescopes with transcontinental or transworld separations; Sec. 8.3), have achieved accuracies of about 1 part in 10,000, and they agree completely with general relativity. Similar accuracies are now achievable using optical interferometers in space, and may soon be achievable via optical interferometry on the ground. These accuracies are so great that, when astronomers make maps of the sky using either radio interferometers or optical interferometers, they must now correct for gravitational deflection of the rays not only when the rays pass near the Sun, but for rays coming in from nearly all directions. This correction is not quite as easy as Eq. (25.2) suggests, since that equation is valid only when the telescope is much farther from the Sun than the impact parameter. In the more general case, the correction is more complicated, and must include aberration due to the telescope motion as well as the effects of spacetime curvature.

¹ For further details of the GPS, see http://www.BeyondDiscovery.org

As we discussed in Sec. 6.6, the gravitational deflection of light rays (or radio rays) passing through or near a cluster of galaxies can produce a spectacular array of distorted images of the light source. In Chap. 6 we deduced the details of this gravitational-lens effect using a model in which we treated spacetime as flat, but endowed with a refractive index $n(\mathbf x) = 1 - 2\Phi(\mathbf x)$, where $\Phi(\mathbf x)$ is the Newtonian gravitational potential of the lensing system. This model can also be used to compute light deflection in the solar system. We shall now derive this model from general relativity.

The foundation for this model is the following general relativistic version of Fermat’s principle [see Eq. (6.42) for the Newtonian version]. Consider any static spacetime geometry, i.e. one for which one can introduce a coordinate system in which $\partial g_{\alpha\beta}/\partial t = 0$ and $g_{jt} = 0$, so the only nonzero metric coefficients are $g_{00}(x^k)$ and $g_{ij}(x^k)$. In such a spacetime the time coordinate t is very special, since it is tied to the spacetime’s temporal symmetry. An example is Schwarzschild spacetime and the Schwarzschild time coordinate t. Now, consider a light ray emitted from a spatial point $x^j = a^j$ in the static spacetime and received at a spatial point $x^j = b^j$. Assuming the spatial path along which the ray travels is $x^j(\eta)$, where η is any parameter with $x^j(0) = a^j$ and $x^j(1) = b^j$, the total coordinate time Δt required for the light’s trip from $a^j$ to $b^j$ (as computed from the fact that the ray must be null, so $ds^2 = g_{00}\,dt^2 + g_{ij}\,dx^i dx^j = 0$) is

$$\Delta t = \int_0^1 \sqrt{\gamma_{jk}\,\frac{dx^j}{d\eta}\frac{dx^k}{d\eta}}\;d\eta, \qquad \text{where}\quad \gamma_{jk} \equiv \frac{g_{jk}}{-g_{00}}. \qquad (25.3)$$

Fermat’s principle says that the actual spatial trajectory of the light path is the one that extremizes this coordinate time lapse. To prove this version of Fermat’s principle, notice that the action (25.3) is the same as that [Eq. (23.30)] for a geodesic in a 3-dimensional space with metric $\gamma_{jk}$ and with t playing the role of proper distance traveled. Therefore, the Euler-Lagrange equation for Fermat’s action principle $\delta\Delta t = 0$ is the geodesic equation in that space [Eq. (23.26)] with t the affine parameter, which [using Eq. (22.38) for the connection coefficients] can be written in the form

$$\gamma_{jk}\frac{d^2x^k}{dt^2} + \frac12\left(\gamma_{jk,l} + \gamma_{jl,k} - \gamma_{kl,j}\right)\frac{dx^k}{dt}\frac{dx^l}{dt} = 0. \qquad (25.4)$$

Next, take the geodesic equation (23.26) for the light ray in the real spacetime, with spacetime affine parameter ζ, and change parameters to t, thereby obtaining

$$g_{jk}\frac{d^2x^k}{dt^2} + \Gamma_{jkl}\frac{dx^k}{dt}\frac{dx^l}{dt} - \Gamma_{j00}\frac{g_{kl}}{g_{00}}\frac{dx^k}{dt}\frac{dx^l}{dt} + g_{jk}\frac{d^2t/d\zeta^2}{(dt/d\zeta)^2}\frac{dx^k}{dt} = 0,$$
$$\frac{d^2t/d\zeta^2}{(dt/d\zeta)^2} + 2\Gamma_{0k0}\frac{dx^k/dt}{g_{00}} = 0. \qquad (25.5)$$

Insert the second of these equations into the first and write the connection coefficients in terms of derivatives of the spacetime metric. Then with a little algebra you can bring the result into the form (25.4) of the Fermat-principle Euler equation. Therefore, the null geodesics of spacetime, when viewed as trajectories through the 3-space of constant t, are precisely the Fermat-principle paths, i.e. geodesics in a 3-space with metric $\gamma_{jk}$ and proper-distance affine parameter t. QED

The index-of-refraction formalism used to study gravitational lenses in Chap. 6 is easily deduced as a special case of this Fermat principle. In a nearly Newtonian situation, the linearized-theory, Lorentz-gauge, trace-reversed metric perturbation has the form (23.107), with only the time-time component being significantly large:

$$\bar h_{00} = -4\Phi, \qquad \bar h_{0j} \simeq 0, \qquad \bar h_{jk} \simeq 0.$$

Correspondingly, the metric perturbation [obtained by inverting Eq. (23.101)] is $h_{00} = -2\Phi$, $h_{jk} = -2\Phi\,\delta_{jk}$, and the full spacetime metric $g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}$ is

$$ds^2 = -(1+2\Phi)\,dt^2 + (1-2\Phi)\,\delta_{jk}\,dx^j dx^k. \qquad (25.6)$$

This is the standard spacetime metric (23.95) in the Newtonian limit, with a special choice of spatial coordinates, those of linearized-theory Lorentz gauge. The Newtonian limit includes the slow-motion constraint that time derivatives of the metric are small compared to spatial derivatives [Eq. (23.88)], so on the timescale for light to travel through a lensing system, the Newtonian potential can be regarded as static, $\Phi = \Phi(x^j)$. Therefore the Newtonian-limit metric (25.6) is static, and the coordinate time lapse along a trajectory between two spatial points, Eq. (25.3), reduces to

$$\Delta t = \int_0^1 (1 - 2\Phi)\,d\ell, \qquad (25.7)$$

where $d\ell = \sqrt{\delta_{jk}\,dx^j dx^k}$ is the distance traveled, treating the coordinates as though they were Cartesian, in flat space. [To first order in Φ, $\gamma_{jk} = \delta_{jk}(1-2\Phi)/(1+2\Phi) \simeq (1-4\Phi)\,\delta_{jk}$, whose square root supplies the factor $1-2\Phi$.] This is precisely the action for the Newtonian version of Fermat’s principle, Eq. (6.42), with index of refraction

$$n(x^j) = 1 - 2\Phi(x^j). \qquad (25.8)$$

Therefore, the spatial trajectories of the light rays can be computed via the Newtonian Fermat principle, with the index of refraction (25.8). QED

Although this index-of-refraction model involves treating a special (Lorentz-gauge) coordinate system as though the spatial coordinates were Cartesian and space were flat (so $d\ell^2 = \delta_{jk}\,dx^j dx^k$), which does not correspond to reality, the model nevertheless predicts the correct gravitational-lens images. The reason is that it predicts the correct rays through the Lorentz-gauge coordinates, and by the time the light reaches Earth, the cumulative lensing has become so great that the slight difference between these coordinates and truly Cartesian ones has negligible influence on the images one sees.
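As a numerical consistency check of the index-of-refraction model, one can bend a ray with n = 1 − 2Φ and compare with Eq. (25.2). The sketch below assumes a point mass, Φ = −M/r, and uses the small-angle (Born) approximation familiar from ray optics: the bending angle is the integral of the transverse gradient ∂n/∂b along the unperturbed straight line. The substitution z = b tan θ makes the integral finite, and a midpoint rule evaluates it. The solar radius used for a grazing ray is an assumed value, not one quoted in the text.

```python
import math

M_SUN_KM = 1.4766        # G M_sun / c^2 in km (value used in Sec. 25.2.2)
R_SUN_KM = 6.957e5       # assumed solar radius in km, for a grazing ray
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0


def deflection_from_index(mass, b, n_steps=2000):
    """Born-approximation bending angle for n = 1 - 2*Phi with Phi = -mass/r.

    alpha = |integral over z of dn/db| along the straight line at impact
    parameter b, where dn/db = -2 mass b / r^3.  With z = b tan(theta) the
    integrand becomes (2 mass / b) cos(theta) on (-pi/2, pi/2).
    """
    h = math.pi / n_steps
    total = 0.0
    for i in range(n_steps):
        theta = -math.pi / 2 + (i + 0.5) * h   # midpoint of each sub-interval
        z = b * math.tan(theta)
        r = math.hypot(b, z)
        dn_db = -2.0 * mass * b / r**3
        dz_dtheta = b / math.cos(theta) ** 2
        total += dn_db * dz_dtheta * h
    return abs(total)


alpha_num = deflection_from_index(M_SUN_KM, R_SUN_KM)
alpha_gr = 4.0 * M_SUN_KM / R_SUN_KM
print(f"numerical: {alpha_num * RAD_TO_ARCSEC:.4f} arcsec, "
      f"4M/b: {alpha_gr * RAD_TO_ARCSEC:.4f} arcsec")
```

Both numbers come out close to 1.75 arcsec for a ray grazing the Sun, the classic eclipse value.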

25.2.4 Shapiro time delay

In 1964 Irwin Shapiro proposed a new experiment to test general relativity: monitor the round-trip travel time for radio waves transmitted from Earth and bounced off Venus or some other planet, or transponded by a spacecraft. As the line of sight between the Earth and the planet or spacecraft gradually moves nearer to, then farther from, the Sun, the waves’ rays will pass through regions of greater or smaller spacetime curvature, and this will influence the round-trip travel time by greater or smaller amounts. From the time evolution of the round-trip time, one can deduce the changing influence of the Sun’s spacetime curvature.

One can compute the round-trip travel time with the aid of Fermat’s principle. The round-trip proper time, as measured on Earth (neglecting, for simplicity, the Earth’s orbital motion, i.e. pretending the Earth is at rest relative to the Sun while the light goes out and back), is $\Delta\tau_\oplus = \sqrt{1-2M/r_\oplus}\,\Delta t \simeq (1 - M/r_\oplus)\,\Delta t$, where M is the Sun’s mass, $r_\oplus$ is the Earth’s distance from the Sun’s center, Δt is the round-trip coordinate time in the static solar-system coordinates, and we have used $-g_{00} = 1 - 2M/r_\oplus$. Because Δt obeys Fermat’s principle, it is stationary under small perturbations of the light’s spatial trajectory. This allows us to compute it using a straight-line trajectory through the spatial coordinate system. Letting b be the impact parameter (the ray’s closest coordinate distance to the Sun) and x be coordinate distance along the straight-line trajectory, and neglecting the gravitational fields of the planets, we have $\Phi = -M/\sqrt{x^2+b^2}$, so the coordinate time lapse out and back is

$$\Delta t = 2\int_{-\sqrt{r_\oplus^2-b^2}}^{\sqrt{r_{\rm refl}^2-b^2}} \left(1 + \frac{2M}{\sqrt{x^2+b^2}}\right) dx. \qquad (25.9)$$

Here $r_{\rm refl}$ is the radius of the location at which the light gets reflected (or transponded) back to Earth. Performing the integral and multiplying by $\sqrt{-g_{00}} \simeq 1 - M/r_\oplus$, we obtain for the round-trip travel time measured on Earth

$$\Delta\tau_\oplus = 2\left(a_\oplus + a_{\rm refl}\right)\left(1 - \frac{M}{r_\oplus}\right) + 4M\ln\left[\frac{(a_\oplus + r_\oplus)(a_{\rm refl} + r_{\rm refl})}{b^2}\right], \qquad (25.10)$$

where $a_\oplus = \sqrt{r_\oplus^2 - b^2}$ and $a_{\rm refl} = \sqrt{r_{\rm refl}^2 - b^2}$. As the Earth and the reflecting planet or transponding spacecraft move along their orbits, only one term in this round-trip time varies sharply: the term $4M\ln(1/b^2)$, i.e.

$$\Delta\tau_\oplus = -8M\ln(b/R_\odot) + \text{const} \simeq -40\,\mu\text{s}\,\ln(b/R_\odot) + \text{const}. \qquad (25.11)$$

When the planet or spacecraft passes nearly behind the Sun, as seen from Earth, b plunges to a minimum (on a timescale of hours or days) and then rises back up, and correspondingly the time delay shows a sharp blip. By comparing the observed blip with theory in a measurement with the Cassini spacecraft, this Shapiro time delay has been verified to the remarkable precision of about 1 part in 100,000 (Bertotti, Iess and Tortora 2003).
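The size of the logarithmic term in Eq. (25.10), and the −8M ln b behavior of Eq. (25.11), can be checked numerically. The sketch below works in units with c = 1 (times in seconds). The solar mass in seconds, the astronomical unit in light-seconds, and the solar radius are standard values assumed here, and the reflector distance of 1.5 AU is a hypothetical choice for illustration only.

```python
import math

# Round-trip delay from Eq. (25.10), in time units (c = 1, seconds).
# Assumed constants; the reflector distance below is hypothetical.
M_SUN_S = 4.925491e-6                 # G M_sun / c^3 in seconds
AU_S = 499.004784                     # 1 AU in light-seconds
R_SUN_S = 6.957e8 / 2.99792458e8      # assumed solar radius in light-seconds


def round_trip_delay(b, r_earth, r_refl, mass=M_SUN_S):
    """Return (flat-space part, logarithmic Shapiro part) of Eq. (25.10)."""
    a_earth = math.sqrt(r_earth**2 - b**2)
    a_refl = math.sqrt(r_refl**2 - b**2)
    flat = 2.0 * (a_earth + a_refl) * (1.0 - mass / r_earth)
    shapiro = 4.0 * mass * math.log((a_earth + r_earth) * (a_refl + r_refl) / b**2)
    return flat, shapiro


r_earth, r_refl = AU_S, 1.5 * AU_S        # hypothetical reflector at 1.5 AU
for k in (1, 2, 5, 10):
    _, s = round_trip_delay(k * R_SUN_S, r_earth, r_refl)
    print(f"b = {k:2d} R_sun: log term = {s * 1e6:6.1f} microseconds")

# Eq. (25.11): doubling b should lower the log term by about 8 M ln 2.
_, s1 = round_trip_delay(R_SUN_S, r_earth, r_refl)
_, s2 = round_trip_delay(2 * R_SUN_S, r_earth, r_refl)
print(f"drop from b = R_sun to 2 R_sun: {(s1 - s2) * 1e6:.2f} us; "
      f"8 M ln 2 = {8 * M_SUN_S * math.log(2) * 1e6:.2f} us")
```

With these assumed distances the logarithmic term is of order 250 microseconds for a ray grazing the Sun, and doubling b lowers it by about 27 µs, as the −8M ln b form of Eq. (25.11) requires.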

25.2.5 Frame dragging and Gravity Probe B

As we have discussed in Secs. 23.9.3 and 24.5, the rotational angular momentum J of a gravitating body places its imprint on the body’s asymptotic spacetime metric:

$$ds^2 = -\left(1 - \frac{2M}{r}\right)dt^2 - \frac{4\epsilon_{jkm}J^k x^m}{r^3}\,dt\,dx^j + \left(1 + \frac{2M}{r}\right)\delta_{jk}\,dx^j dx^k. \qquad (25.12)$$

Here, for definiteness, we are using Lorentz gauge, and M is the body’s mass; cf. Eq. (23.112). The angular-momentum term drags inertial frames into rotation about the body (Sec. 24.2). One manifestation of this frame dragging is a precession of inertial-guidance gyroscopes near the body. Far from the body, a gyroscope’s spin axis will remain fixed relative to distant galaxies and quasars, but near the body it will precess.

It is easy to deduce the precession in the simple case of a gyroscope whose center of mass is at rest in the coordinate system of Eq. (25.12), i.e. at rest relative to the body. The transport law for the gyroscope’s spin is $\nabla_{\vec u}\vec S = \vec u\,(\vec a\cdot\vec S)$ [Eq. (22.91), boosted from special relativity to general relativity via the equivalence principle]. Here $\vec u$ is the gyroscope’s 4-velocity (so $u^j = 0$, $u^0 = 1/\sqrt{1-2M/r} \simeq 1 + M/r \simeq 1$) and $\vec a$ its 4-acceleration. The spatial components of this transport law are

$$S^j{}_{,t}\,u^0 \simeq -\Gamma^j{}_{k0}S^k u^0, \quad\text{so}\quad S^j{}_{,t} \simeq -\Gamma^j{}_{k0}S^k \simeq -\Gamma_{jk0}S^k = \frac12\left(g_{0k,j} - g_{0j,k}\right)S^k. \qquad (25.13)$$

Here each ≃ means “is equal, up to fractional corrections of order M/r”. By inserting $g_{j0}$ from the line element (25.12) and performing some manipulations with Levi-Civita tensors, we can bring Eq. (25.13) into the form

$$\frac{\partial\mathbf S}{\partial t} = \boldsymbol\Omega_{\rm prec}\times\mathbf S, \qquad \text{where}\quad \boldsymbol\Omega_{\rm prec} = \frac{1}{r^3}\left[-\mathbf J + 3(\mathbf J\cdot\mathbf n)\,\mathbf n\right]. \qquad (25.14)$$

Here $\mathbf n = \mathbf e_{\hat r}$ is the unit radial vector pointing away from the gravitating body. Equation (25.14) says that the gyroscope’s spin angular momentum rotates (precesses) with angular velocity $\boldsymbol\Omega_{\rm prec}$ in the coordinate system (which is attached to distant inertial frames, i.e. to the distant galaxies and quasars). This is sometimes called a “gravitomagnetic precession” because the off-diagonal metric coefficients $g_{0j}$, regarded as a 3-vector, are $-2\,\mathbf J\times\mathbf n/r^2$, which has the same form as the vector potential of a magnetic dipole; and the gyroscopic precession is similar to that of a magnetized spinning body interacting with that magnetic dipole. In magnitude, the precessional angular velocity (25.14) in the vicinity of the Earth is only of order a few arcseconds per century, so measuring it is a tough experimental challenge. A team led by Francis Everitt has designed and constructed a set of superconducting gyroscopes that are currently (2005) flying in an Earth-orbiting satellite called Gravity Probe B, with the goal of measuring this precession to a precision of about 1 part in 100 (see http://einstein.stanford.edu/).
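A rough numerical sketch of Eq. (25.14) near the Earth follows. Several inputs are assumptions, not values from the text: the Earth’s spin angular momentum is estimated as J = Iω with moment of inertia I ≈ 0.33 M R², and the orbital radius of 7,000 km is a hypothetical near-Earth value. The code converts J to geometrized units (m²), evaluates $\boldsymbol\Omega_{\rm prec}$ in those units, multiplies by c to get rad/s, and averages around a circular polar orbit, where the average reduces analytically to $\mathbf J/(2r^3)$.

```python
import math

G, C = 6.674e-11, 2.99792458e8
M_EARTH, R_EARTH = 5.972e24, 6.371e6
OMEGA_EARTH = 7.292e-5                      # rad/s, sidereal rotation rate
I_EARTH = 0.33 * M_EARTH * R_EARTH**2       # assumed moment of inertia
J_GEOM = G * I_EARTH * OMEGA_EARTH / C**3   # angular momentum in geometrized units (m^2)
RAD_PER_S_TO_ARCSEC_PER_CENTURY = (180 / math.pi * 3600) * (100 * 365.25 * 86400)


def omega_prec(J, n, r):
    """Eq. (25.14), [-J + 3 (J.n) n] / r^3, multiplied by c to give rad/s."""
    jn = sum(Ji * ni for Ji, ni in zip(J, n))
    return [C * (-Ji + 3 * jn * ni) / r**3 for Ji, ni in zip(J, n)]


J = (0.0, 0.0, J_GEOM)          # Earth's spin along the z axis
r_orbit = 7.0e6                 # hypothetical orbital radius, m

# Average Omega_prec over a circular polar orbit lying in the x-z plane.
n_pts = 3600
avg = [0.0, 0.0, 0.0]
for k in range(n_pts):
    psi = 2 * math.pi * k / n_pts
    n = (math.cos(psi), 0.0, math.sin(psi))
    om = omega_prec(J, n, r_orbit)
    avg = [a + w / n_pts for a, w in zip(avg, om)]

print(f"orbit-averaged precession: {avg[2] * RAD_PER_S_TO_ARCSEC_PER_CENTURY:.2f} arcsec/century")
print(f"J/(2 r^3) check:           "
      f"{C * J_GEOM / (2 * r_orbit**3) * RAD_PER_S_TO_ARCSEC_PER_CENTURY:.2f} arcsec/century")
```

With these assumptions the orbit-averaged precession comes out near 4 arcsec per century, directed along J. For a gyroscope at rest just above a pole, the same formula gives $2\mathbf J/r^3$, four times larger than this orbit average.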

25.2.6 Binary Pulsar

Gravity in the solar system is very weak. Even at Mercury’s orbit, the gravitational potential of the Sun is only $|\Phi| \sim 3\times10^{-8}$. Therefore, when one expands the spacetime metric in powers of Φ, current experiments, with their fractional accuracies of $\sim10^{-4}$ or worse, are able to see only the first-order terms beyond Newtonian theory, i.e. terms of first post-Newtonian order. To move on to second post-Newtonian order, $O(\Phi^2)$ beyond Newton, will require major advances in technology, or observations of astronomical systems in which Φ is far larger than $3\times10^{-8}$. Radio observations of binary pulsars (this subsection) provide one opportunity for such observations; gravitational-wave observations of neutron-star and black-hole binaries (Sec. 25.5) provide another.

The best binary pulsar for tests of general relativity is PSR1913+16, discovered by Russell Hulse and Joseph Taylor in 1974. This system consists of two neutron stars in a mutual elliptical orbit with period P ∼ 8 hr and eccentricity e ∼ 0.6. One of the stars emits pulses at a regular rate. These are received at Earth with time delays due to crossing the binary orbit and other relativistic effects. We do not know a priori the orbital inclination or the two neutron-star masses. However, we obtain one relation among these three quantities by analyzing the Newtonian orbit. A second relation comes from measuring the consequences of the combined second-order Doppler shift and gravitational redshift as the pulsar moves in and out of its companion’s gravitational field. A third relation comes from measuring the relativistic precession of the orbit’s periastron (the analog of the perihelion shift of Mercury). (The precession rate is far larger than for Mercury: about 4° per year!) From these three relations one can solve for the stars’ masses and the orbital inclination, and as a check can verify that the Shapiro time delay comes out correctly.
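An order-of-magnitude check of the “about 4° per year” periastron precession is straightforward. The sketch assumes the standard first post-Newtonian result that Eq. (25.1) carries over to a binary when M is replaced by the total mass and a is the semimajor axis of the relative orbit, with a fixed by Kepler’s third law, $a^3 = MP^2/(2\pi)^2$ (c = G = 1). The orbital period, eccentricity, and total mass below are assumed approximate values for PSR1913+16; none of them is taken from this text.

```python
import math

M_SUN_S = 4.925491e-6          # G M_sun / c^3 in seconds
P_ORB = 0.322997 * 86400.0     # assumed orbital period (s), about 7.75 h
ECC = 0.6171                   # assumed eccentricity
M_TOT = 2.83 * M_SUN_S         # assumed total mass, in seconds (c = G = 1)
YEAR = 365.25 * 86400.0


def periastron_advance_deg_per_year(m_tot, period, ecc):
    """First post-Newtonian advance, 6 pi M / [a (1 - e^2)] per orbit, in deg/yr."""
    a = (m_tot * period**2 / (4 * math.pi**2)) ** (1.0 / 3.0)   # Kepler's third law
    per_orbit = 6 * math.pi * m_tot / (a * (1 - ecc**2))
    return math.degrees(per_orbit) * YEAR / period


rate = periastron_advance_deg_per_year(M_TOT, P_ORB, ECC)
print(f"periastron advance ~ {rate:.2f} deg/yr")
```

With these assumed parameters the result is about 4.2 degrees per year, consistent with the value quoted above and roughly 35,000 times Mercury’s rate.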
One can then use the system’s parameters to predict the rate of orbital inspiral due to gravitational-radiation reaction, a phenomenon with magnitude $\sim|\Phi|^{2.5}$ beyond Newton, i.e. of 2.5 post-Newtonian order (Sec. 25.4.2 below). The prediction agrees with the measurements to an accuracy of ∼0.1 per cent (Weisberg and Taylor 2004), a major triumph for general relativity!

****************************
EXERCISES

Exercise 25.1 Practice: Gravitational Redshift for Global Positioning System
The GPS satellites are in circular orbits at a height of 18,000 km above the Earth’s surface. If the ticking rates of the clocks on the satellites were not corrected for the gravitational redshift, roughly how long would it take them to accumulate a time shift, relative to clocks on the Earth, large enough to degrade the GPS position accuracy by 10 meters? By 1 kilometer?

Exercise 25.2 Example: Perihelion Shift
Consider a small satellite in a non-circular orbit about a spherical body with much larger mass M, for which the external gravitational field is Schwarzschild. The satellite will follow a timelike geodesic. Orient the Schwarzschild coordinates so the satellite’s orbit is in the equatorial plane, θ = π/2.

(a) Because the metric coefficients are independent of t and φ, the quantities $\tilde E = -p_t$ and $\tilde L = p_\phi$ must be constants of the satellite’s motion [cf. Ex. 23.4]. Show that

$$\tilde E = \left(1 - \frac{2M}{r}\right)\frac{dt}{d\tau}, \qquad \tilde L = r^2\frac{d\phi}{d\tau}. \qquad (25.15)$$

Explain why $\tilde E$ has the physical interpretation of the satellite’s orbital energy per unit mass (including rest-mass energy) and why $\tilde L$ is its angular momentum per unit mass.

(b) Introduce the coordinate $u = r^{-1}$ and use the normalization of the 4-velocity to derive the following differential equation for the orbit:

$$\left(\frac{du}{d\phi}\right)^2 = \frac{\tilde E^2}{\tilde L^2} - \left(u^2 + \frac{1}{\tilde L^2}\right)(1 - 2Mu). \qquad (25.16)$$

(c) Differentiate this equation with respect to φ to obtain a second-order differential equation

$$\frac{d^2u}{d\phi^2} + u - \frac{M}{\tilde L^2} = 3Mu^2. \qquad (25.17)$$

By reinstating the constants G and c, and comparing with the Newtonian orbital equation, argue that the right-hand side represents a relativistic perturbation to the Newtonian equation of motion.

(d) Assume, henceforth in this exercise, that r ≫ M (i.e. u ≪ 1/M), and solve the orbital equation (25.17) by perturbation theory. More specifically: at zero order (i.e., setting the right side to zero), show that the Kepler ellipse

$$u_K = \frac{M}{\tilde L^2}\,(1 + e\cos\phi) \qquad (25.18)$$

is a solution. Here e (a constant of integration) is the ellipse’s eccentricity and $\tilde L^2/M$ is the ellipse’s semi-latus rectum. The orbit has its minimum radius at φ = 0.

(e) By substituting $u_K$ into the right-hand side of the relativistic equation of motion (25.17), show (at first order in the relativistic perturbation) that in one orbit the angle φ at which the satellite is closest to the mass advances by $\Delta\phi \simeq 6\pi M^2/\tilde L^2$. (Hint: Try to write the differential equation in the form $d^2u/d\phi^2 + (1+\epsilon)^2 u \simeq \ldots$, where ε ≪ 1.)

(f) For the planet Mercury, the orbital period is P = 0.241 yr and the eccentricity is e = 0.206. Deduce that the relativistic contribution to the rate of advance of the perihelion (point of closest approach to the Sun) is 43″ per century.

Exercise 25.3 Example: Gravitational Deflection of Light
Repeat the previous exercise for a photon following a null geodesic.

(a) Show that the trajectory obeys the differential equation

$$\frac{d^2u}{d\phi^2} + u = 3Mu^2. \qquad (25.19)$$

(b) Obtain the zeroth-order solution by ignoring the right-hand side,

$$u = \frac{\sin\phi}{b}, \qquad (25.20)$$

where b is an integration constant. Show that, in the asymptotically flat region far from the body, this is just a straight line and b is the impact parameter (distance of closest approach to the body).

(c) Substitute this solution into the right-hand side and show that the perturbed trajectory satisfies

$$u = \frac{\sin\phi}{b} + \frac{M}{b^2}\,(1 - \cos\phi)^2. \qquad (25.21)$$

(d) Hence show that a ray with impact parameter b ≫ M will be deflected through an angle

$$\alpha = \frac{4M}{b}; \qquad (25.22)$$

cf. Eq. (6.77) and the associated discussion.

****************************
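The perturbative result of Ex. 25.2(e) can be checked by integrating the orbit equation (25.17) directly. The sketch below uses units with M = 1, a hypothetical orbit with semi-latus rectum $\tilde L^2/M = 1000$ and eccentricity e = 0.2, and a classical fourth-order Runge-Kutta step in φ. It starts at perihelion (du/dφ = 0 with u maximal) and locates the next perihelion as the point where du/dφ changes sign from positive to negative, refining the crossing by linear interpolation.

```python
import math

M = 1.0
P_SEMI = 1000.0            # hypothetical semi-latus rectum L^2 / M
L2 = M * P_SEMI            # L-tilde squared
ECC = 0.2                  # hypothetical eccentricity


def rhs(u, v):
    """Eq. (25.17) as a first-order system: u' = v, v' = M/L^2 - u + 3 M u^2."""
    return v, M / L2 - u + 3.0 * M * u * u


def rk4_step(u, v, h):
    k1u, k1v = rhs(u, v)
    k2u, k2v = rhs(u + 0.5 * h * k1u, v + 0.5 * h * k1v)
    k3u, k3v = rhs(u + 0.5 * h * k2u, v + 0.5 * h * k2v)
    k4u, k4v = rhs(u + h * k3u, v + h * k3v)
    return (u + h * (k1u + 2 * k2u + 2 * k3u + k4u) / 6.0,
            v + h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0)


def perihelion_shift(h=1e-3):
    """Start at perihelion and return (angle of next perihelion) - 2 pi."""
    u, v, phi = (M / L2) * (1.0 + ECC), 0.0, 0.0
    while True:
        u_new, v_new = rk4_step(u, v, h)
        # Next perihelion: du/dphi goes from + to - after the aphelion near phi = pi.
        if phi > math.pi and v > 0.0 >= v_new:
            phi_cross = phi + h * v / (v - v_new)   # linear interpolation
            return phi_cross - 2.0 * math.pi
        u, v, phi = u_new, v_new, phi + h


dphi_num = perihelion_shift()
dphi_pert = 6.0 * math.pi * M**2 / L2
print(f"numerical {dphi_num:.6f} rad, perturbative {dphi_pert:.6f} rad")
```

The two values agree to within about one per cent; the residual difference is of relative order M/p, as expected for a result that is accurate only to first order in the relativistic perturbation.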

25.3 Gravitational Waves and their Propagation

25.3.1 The gravitational wave equation

Gravitational waves are ripples in the curvature of spacetime that are emitted by violent astrophysical events, and that propagate out from their sources with the speed of light. It was clear to Einstein and others, even before general relativity was fully formulated, that his theory would have to predict gravitational waves; and within months after completing the theory, Einstein (1916, 1918) worked out the basic properties of those waves. It turns out that, after they have been emitted, gravitational waves propagate through matter with near impunity, i.e., they propagate as though in vacuum, even when other matter and fields are present. (For a proof and discussion see, e.g., Sec. 2.4.3 of Thorne 1983.) This justifies simplifying our analysis to vacuum propagation.

By contrast with most texts on gravitational waves, we shall not further simplify to propagation through a spacetime that is flat, aside from the waves, because it is almost as easy to analyze propagation through a curved background spacetime as through a flat one. The key to the analysis is the same two-lengthscale expansion that underlies geometric optics for any kind of wave propagating through any kind of medium (Sec. 6.3): We presume that the waves’ reduced wavelength $\bar\lambda$ (wavelength/2π), as measured in some relevant local Lorentz frame, is very short compared to the radius of curvature of spacetime $\mathcal R \sim 1/\sqrt{R_{\hat\alpha\hat\beta\hat\gamma\hat\delta}}$ and the lengthscale $\mathcal L$ on which the background curvature changes (e.g., the radius of the Earth when the waves are near Earth):

$$\bar\lambda \ll \{\mathcal R, \mathcal L\}; \qquad (25.23)$$

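cf. Eq. (6.16). Then the Riemann curvature tensor can be split into two pieces: the background curvature $R^{\rm B}_{\alpha\beta\gamma\delta}$, which is the average of Riemann over a few wavelengths, plus the waves’ curvature $R^{\rm GW}_{\alpha\beta\gamma\delta}$, which is the remaining, oscillatory piece:
$$R_{\alpha\beta\gamma\delta} = R^{\rm B}_{\alpha\beta\gamma\delta} + R^{\rm GW}_{\alpha\beta\gamma\delta} , \qquad R^{\rm B}_{\alpha\beta\gamma\delta} \equiv \langle R_{\alpha\beta\gamma\delta}\rangle . \tag{25.24}$$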

This is the same kind of split as we used in developing the quasilinear theory of plasma waves (Sec. 21.2.1). Similarly, we can split the spacetime metric into a sum of a smooth background part plus a gravitational-wave perturbation, denoted $h_{\alpha\beta}$:
$$g_{\alpha\beta} = g^{\rm B}_{\alpha\beta} + h_{\alpha\beta} , \qquad \text{where } g^{\rm B}_{\alpha\beta} = \langle g_{\alpha\beta}\rangle . \tag{25.25}$$
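The averaging $\langle\cdots\rangle$ in Eqs. (25.24) and (25.25) can be made concrete with a one-dimensional toy. This is only an analogy, built on our own assumptions: a hypothetical scalar "metric component" consisting of a smooth background that varies on a scale $\mathcal L$, plus a weak ripple with $\bar\lambda \ll \mathcal L$. Averaging over one wavelength removes the ripple and leaves the background intact to high accuracy.

```python
import math


def centered_average(samples, half):
    """Average each interior sample over the 2*half + 1 samples centered on it."""
    w = 2 * half + 1
    return [sum(samples[i - half:i + half + 1]) / w for i in range(half, len(samples) - half)]


# Hypothetical one-dimensional "metric component": a smooth background varying on
# scale L_BG plus a weak ripple of reduced wavelength LAMBDA_BAR << L_BG.
L_BG, LAMBDA_BAR, AMP, DX = 50.0, 0.1, 1e-3, 0.01
xs = [i * DX for i in range(20000)]
g_background = [1.0 + 0.2 * math.sin(x / L_BG) for x in xs]
g_full = [gb + AMP * math.sin(x / LAMBDA_BAR) for gb, x in zip(g_background, xs)]

# Average over one wavelength, 2*pi*LAMBDA_BAR (about 63 samples).
half = round(math.pi * LAMBDA_BAR / DX)
g_avg = centered_average(g_full, half)
residual = max(abs(a - b) for a, b in zip(g_avg, g_background[half:len(xs) - half]))
print(f"max |<g> - g_B| = {residual:.1e}, ripple amplitude = {AMP:.0e}")
```

A window much shorter than a wavelength would leave the ripple in place. A window comparable to $\mathcal L$ would smear out the background itself. The two-lengthscale condition (25.23) is what leaves room between these two extremes.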

Obviously, the smooth background Riemann tensor $R^{\rm B}_{\alpha\beta\gamma\delta}$ can be computed in the usual manner from the smooth background metric $g^{\rm B}_{\alpha\beta}$. Because the waves are generally very weak, we can regard their metric perturbation $h_{\alpha\beta}$ and Riemann tensor $R^{\rm GW}_{\alpha\beta\gamma\delta}$ as linearized fields that live in the smooth, curved background spacetime. When we do so, we can replace gradients (subscript “;”) based on the full physical metric $g_{\alpha\beta}$ by gradients (subscript “|”) based on the background metric so, e.g., $R^{\rm GW}_{\alpha\beta\gamma\delta;\mu} = R^{\rm GW}_{\alpha\beta\gamma\delta|\mu}$. This linearization implies that the waves’ Riemann tensor can be computed from their metric perturbation via

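$$R^{\rm GW}_{\alpha\beta\gamma\delta} = \frac{1}{2}\left(h_{\alpha\delta|\beta\gamma} + h_{\beta\gamma|\alpha\delta} - h_{\alpha\gamma|\beta\delta} - h_{\beta\delta|\alpha\gamma}\right) , \tag{25.26}$$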

as one can see from the fact that this formula reduces to the right result, Eq. (23.96), in a local Lorentz frame of the background metric. We shall use the waves’ Riemann tensor $R^{\rm GW}_{\alpha\beta\gamma\delta}$ as our primary entity for describing the waves, and shall use the metric perturbation only as a computational tool, mostly when analyzing wave generation. Notice that the combination of indices that appears on the right side of Eq. (25.26) is carefully designed to produce an entity with the symmetries of Riemann,
$$R_{\alpha\beta\gamma\delta} = -R_{\beta\alpha\gamma\delta} , \qquad R_{\alpha\beta\gamma\delta} = -R_{\alpha\beta\delta\gamma} , \qquad R_{\alpha\beta\gamma\delta} = R_{\gamma\delta\alpha\beta} \tag{25.27}$$

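[Eq. (23.52)]. This combination of indices is encountered frequently in gravitational-wave theory, so it is useful to introduce the following shorthand notation for it:
$$S_{\{\alpha\beta\gamma\delta\}} \equiv S_{\alpha\delta\beta\gamma} + S_{\beta\gamma\alpha\delta} - S_{\alpha\gamma\beta\delta} - S_{\beta\delta\alpha\gamma} . \tag{25.28}$$
In terms of this notation, expression (25.26) reads
$$R^{\rm GW}_{\alpha\beta\gamma\delta} = \frac{1}{2} h_{\{\alpha\beta|\gamma\delta\}} . \tag{25.29}$$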
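The claim that the index combination in Eq. (25.26) reproduces Riemann's symmetries can be checked by brute force. The sketch below assumes only two things about $h_{\alpha\beta|\gamma\delta}$: it is symmetric in $\alpha\beta$, and, by Eq. (25.31), symmetric in $\gamma\delta$. It models this object as a sum of products of random symmetric matrices, builds $\frac12 h_{\{\alpha\beta|\gamma\delta\}}$ with the shorthand (25.28), and tests Eq. (25.27). As an extra check of our own, it also tests the cyclic identity $R_{\alpha\beta\gamma\delta} + R_{\alpha\gamma\delta\beta} + R_{\alpha\delta\beta\gamma} = 0$.

```python
import itertools
import random

N = 4  # spacetime dimension; indices 0..3


def random_symmetric(rng):
    m = [[0.0] * N for _ in range(N)]
    for a in range(N):
        for b in range(a, N):
            m[a][b] = m[b][a] = rng.uniform(-1, 1)
    return m


def curly(S, a, b, c, d):
    """S_{abcd} -> S_{adbc} + S_{bcad} - S_{acbd} - S_{bdac}, Eq. (25.28)."""
    return S[a][d][b][c] + S[b][c][a][d] - S[a][c][b][d] - S[b][d][a][c]


rng = random.Random(1)
# Model h_{ab|cd} as sum_n A_ab B_cd: symmetric in (a,b) and in (c,d), otherwise generic.
pairs = [(random_symmetric(rng), random_symmetric(rng)) for _ in range(3)]
H = [[[[sum(A[a][b] * B[c][d] for A, B in pairs) for d in range(N)]
       for c in range(N)] for b in range(N)] for a in range(N)]

R = {idx: 0.5 * curly(H, *idx) for idx in itertools.product(range(N), repeat=4)}


def max_violation():
    worst = 0.0
    for a, b, c, d in itertools.product(range(N), repeat=4):
        checks = (R[a, b, c, d] + R[b, a, c, d],                    # antisymmetric, first pair
                  R[a, b, c, d] + R[a, b, d, c],                    # antisymmetric, second pair
                  R[a, b, c, d] - R[c, d, a, b],                    # pair exchange
                  R[a, b, c, d] + R[a, c, d, b] + R[a, d, b, c])    # cyclic identity
        worst = max(worst, *map(abs, checks))
    return worst


print(f"largest symmetry violation: {max_violation():.1e}")
```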

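One benefit of the two-lengthscale condition $\bar\lambda \ll \mathcal R$ is that the double gradient of the gravitational waves’ Riemann tensor is far larger than the product of the waves’ Riemann with the background Riemann:
$$R^{\rm GW}_{\alpha\beta\gamma\delta|\mu\nu} \sim \frac{R^{\rm GW}_{\alpha\beta\gamma\delta}}{\bar\lambda^2} \gg \frac{R^{\rm GW}_{\alpha\beta\gamma\delta}}{\mathcal R^2} \sim R^{\rm GW}_{\alpha\beta\gamma\delta}\, R^{\rm B}_{\mu\nu\rho\sigma} . \tag{25.30}$$
Since the commutator of the double gradient is a sum of products of the wave Riemann with the background Riemann [generalization of Eq. (23.39) with $p^\alpha$ replaced by $R^{\rm GW}_{\alpha\beta\gamma\delta}$], gradients of $R^{\rm GW}_{\alpha\beta\gamma\delta}$ commute to high accuracy:
$$R^{\rm GW}_{\alpha\beta\gamma\delta|\mu\nu} = R^{\rm GW}_{\alpha\beta\gamma\delta|\nu\mu} . \tag{25.31}$$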

We shall use this fact in deriving the wave equation for $R^{\rm GW}_{\alpha\beta\gamma\delta}$. Our derivation of the wave equation will be based on a combination of the Riemann curvature’s Bianchi identity
$$R^\alpha{}_{\beta\gamma\delta;\epsilon} + R^\alpha{}_{\beta\delta\epsilon;\gamma} + R^\alpha{}_{\beta\epsilon\gamma;\delta} = 0 \tag{25.32}$$
and the vacuum Einstein field equation $G_{\alpha\beta} \equiv R_{\alpha\beta} - \frac{1}{2}R\,g_{\alpha\beta} = 0$. By contracting the vacuum field equation on its two slots, we find that the scalar curvature $R$ vanishes, and by inserting this back into the vacuum field equation we find that the Ricci tensor vanishes:
$$R_{\alpha\beta} \equiv R^\mu{}_{\alpha\mu\beta} = 0 \quad \text{in vacuum.} \tag{25.33}$$

By then contracting the Bianchi identity (25.32) on its first and fifth slots and invoking (25.33), we find that the Riemann tensor is divergence-free:
$$R^\mu{}_{\beta\gamma\delta;\mu} = 0 \quad \text{in vacuum.} \tag{25.34}$$

The symmetries (25.27) guarantee that Riemann is divergence-free not only on its first slot but, in fact, on each of its four slots. By next taking the divergence of the Bianchi identity (25.32) on its last slot, we obtain
$$R_{\alpha\beta\gamma\delta;\mu}{}^{\mu} = -R_{\alpha\beta\delta\mu;\gamma}{}^{\mu} - R_{\alpha\beta\mu\gamma;\delta}{}^{\mu} . \tag{25.35}$$

We now split this equation into its rapidly oscillating (wave) piece and its background piece. For the wave piece we approximate the full-spacetime gradients “;” by background-spacetime gradients “|”, we commute the gradient indices on the right-hand side [Eq. (25.31)], so that each term there becomes a gradient of a divergence, and we use the vanishing of the divergence [Eq. (25.34)] to obtain
$$R^{\rm GW}_{\alpha\beta\gamma\delta|\mu}{}^{\mu} = 0 . \tag{25.36}$$

This is the wave equation for gravitational waves propagating through the curved background spacetime. It is a perfect analog of the vacuum wave equation $A_{\alpha;\mu}{}^{\mu} = 0$ [Eq. (23.71)] for electromagnetic waves. Both wave equations dictate that their waves propagate at the speed of light ($c = 1$ in our geometrized units).

To get insight into the waves, we pick a region of spacetime far from the source, where the wavefronts are nearly flat, and in that region we introduce a local Lorentz frame of the background spacetime. This frame must be small compared to the background radius of curvature $\mathcal R$; but since $\bar\lambda \ll \mathcal R$, the frame can still be big compared to $\bar\lambda$. For example, for waves passing near and through Earth, in the frequency band $f \sim 100$ Hz of Earth-based detectors, $\mathcal R$ is about $10^9$ km and $\bar\lambda$ is about 500 km, so the local Lorentz frame could be given a size $\sim 10^5$ km ($\sim 10$ times larger than the Earth), which is huge compared to $\bar\lambda$ but small compared to $\mathcal R$. In this local Lorentz frame, by virtue of Eqs. (23.15), (23.16) and (25.30), the wave equation (25.36) becomes
$$\left(-\frac{\partial^2}{\partial t^2} + \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right) R^{\rm GW}_{\alpha\beta\gamma\delta} = 0 . \tag{25.37}$$
For simplicity, we orient the spatial axes of the local Lorentz frame so the waves propagate in the $z$-direction, and we neglect the curvature of the phase fronts (i.e., we treat the waves as planar). Then the solution to (25.37) is an arbitrary function of $t - z$:
$$R^{\rm GW}_{\alpha\beta\gamma\delta} = R^{\rm GW}_{\alpha\beta\gamma\delta}(t - z) . \tag{25.38}$$

This shows explicitly that the waves propagate with the speed of light.
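A short script can check the numbers quoted above, as well as the claim that any function of $t - z$ solves Eq. (25.37). The constants are rounded values we supply. Estimating $\mathcal R$ from the Newtonian tidal field $GM/(c^2 r^3)$ at Earth's surface is our own rough proxy, not the book's calculation. It gives a few times $10^8$ km, within an order of magnitude of the quoted $10^9$ km. The wave-equation check uses an arbitrary profile and central differences.

```python
import math

# Rounded constants in km and s (our assumption).
C_KM_S = 2.998e5      # speed of light
GM_EARTH = 3.986e5    # km^3 s^-2
R_EARTH = 6.371e3     # km

f = 100.0                                    # Hz
lambda_bar = C_KM_S / (2 * math.pi * f)      # reduced wavelength in km
# Crude curvature radius at Earth's surface from the Newtonian tidal field:
# R_{j0k0} ~ GM/(c^2 r^3), so script-R ~ (c^2 r^3 / GM)^(1/2).
curv_radius = math.sqrt(C_KM_S**2 * R_EARTH**3 / GM_EARTH)
frame_size = 1e5                             # km, as quoted in the text
print(f"lambda_bar ~ {lambda_bar:.0f} km, curvature radius ~ {curv_radius:.1e} km")


def plane_wave_operator(F, t, z, h=1e-3):
    """Central-difference (-d^2/dt^2 + d^2/dz^2) F; x and y derivatives vanish for a plane wave."""
    d2t = (F(t + h, z) - 2 * F(t, z) + F(t - h, z)) / h**2
    d2z = (F(t, z + h) - 2 * F(t, z) + F(t, z - h)) / h**2
    return -d2t + d2z


def profile(u):
    # Arbitrary smooth profile standing in for one component of R^GW.
    return math.exp(-u * u) * math.cos(5 * u)


points = [(t, z) for t in (0.3, 1.1, 2.0) for z in (0.0, 0.7)]
residual = max(abs(plane_wave_operator(lambda t, z: profile(t - z), t, z)) for t, z in points)
# A profile moving at speed 1/2 instead of 1 should fail the test.
bad_residual = max(abs(plane_wave_operator(lambda t, z: profile(t - 2 * z), t, z)) for t, z in points)
print(f"profile(t - z): residual {residual:.1e};  profile(t - 2z): residual {bad_residual:.1e}")
```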

25.3.2 The waves’ two polarizations: + and ×

In this subsection we shall explore the properties of gravitational waves. Throughout the discussion we shall confine attention to the background’s local Lorentz frame, far from the source, in which the waves are nearly planar and have the form (25.38). Only two of the 20 independent components of $R^{\rm GW}_{\alpha\beta\gamma\delta}$ are independent functions of $t - z$; the other eighteen are determined in terms of those two by the following considerations: (i) The Bianchi identity (25.32), when applied to the specific functional form (25.38) and then integrated in time (with the integration constant dropped because we are studying waves that fluctuate in time), implies
$$R^{\rm GW}_{\alpha\beta xy} = 0 , \qquad R^{\rm GW}_{\alpha\beta xz} = -R^{\rm GW}_{\alpha\beta x0} , \qquad R^{\rm GW}_{\alpha\beta yz} = -R^{\rm GW}_{\alpha\beta y0} . \tag{25.39}$$

(ii) This, together with Riemann’s symmetries (25.27), implies that all components can be expressed in terms of $R^{\rm GW}_{j0k0}$ (which is general relativity’s analog of the Newtonian tidal tensor $\mathcal E_{jk}$). (iii) The vacuum Einstein equation (25.33) then implies that
$$R^{\rm GW}_{z0z0} = R^{\rm GW}_{z0x0} = R^{\rm GW}_{x0z0} = R^{\rm GW}_{z0y0} = R^{\rm GW}_{y0z0} = 0 , \tag{25.40}$$
and
$$R^{\rm GW}_{x0x0} = -R^{\rm GW}_{y0y0} \equiv -\frac{1}{2}\ddot h_+(t - z) , \qquad R^{\rm GW}_{x0y0} = R^{\rm GW}_{y0x0} \equiv -\frac{1}{2}\ddot h_\times(t - z) . \tag{25.41}$$

Here the two independent components have been expressed in terms of dimensionless functions $h_+(t - z)$ and $h_\times(t - z)$. The double time derivatives, denoted by double dots ($\ddot h_+ \equiv \partial^2 h_+/\partial t^2$), are required by dimensionality: Riemann has dimensions of 1/length² or, equivalently, 1/time²; so if $h_+$ and $h_\times$ are to be dimensionless, they must be differentiated twice in (25.41). The factors of $\frac{1}{2}$ are relics of the past history of general relativity research.

Equation (25.40) says that for a gravitational wave the space-time-space-time part of Riemann is transverse; i.e., it has no spatial components along the propagation direction ($z$-direction). This is completely analogous to the fact that the electric and magnetic fields of an electromagnetic wave are transverse to the propagation direction. The first of Eqs. (25.41) says that the nonvanishing, transverse-transverse part of Riemann is traceless. These two properties are often summarized by saying that gravitational waves are “transverse and traceless,” or “TT.” The two independent functions $h_+$ and $h_\times$ are called the “gravitational-wave fields” for the “+ (plus) polarization state” and for the “× (cross) polarization state.” We can reconstruct all the components of the waves’ Riemann tensor from these two gravitational-wave fields as follows: First define the polarization tensors
$$\mathbf e^+ \equiv \vec e_x \otimes \vec e_x - \vec e_y \otimes \vec e_y , \qquad \mathbf e^\times \equiv \vec e_x \otimes \vec e_y + \vec e_y \otimes \vec e_x , \tag{25.42}$$

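and a second-rank gravitational-wave field
$$h^{\rm TT}_{\alpha\beta} = h_+ e^+_{\alpha\beta} + h_\times e^\times_{\alpha\beta} ; \tag{25.43a}$$
or equivalently
$$h^{\rm TT}_{xx} = -h^{\rm TT}_{yy} = h_+ , \qquad h^{\rm TT}_{xy} = h^{\rm TT}_{yx} = h_\times , \qquad \text{all other } h^{\rm TT}_{\alpha\beta} \text{ vanish.} \tag{25.43b}$$
[The notation “TT” indicates that this field is transverse to the propagation direction ($z$-direction) and traceless. The relationship between this $h^{\rm TT}_{\alpha\beta}$ and the metric perturbation $h_{\alpha\beta}$ will be explained in Sec. 25.3.7 below.] Then the waves’ Riemann tensor is
$$R^{\rm GW}_{\alpha\beta\gamma\delta} = \frac{1}{2} h^{\rm TT}_{\{\alpha\beta|\gamma\delta\}} , \qquad \text{and in particular} \qquad R^{\rm GW}_{0j0k} = -\frac{1}{2}\ddot h^{\rm TT}_{jk} . \tag{25.44}$$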
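To watch Eqs. (25.39)–(25.41) emerge from Eq. (25.44), the sketch below evaluates $\frac12 h^{\rm TT}_{\{\alpha\beta|\gamma\delta\}}$ for a plane wave at a single instant. The values of $\ddot h_+$ and $\ddot h_\times$ are arbitrary. We assume index order $(t, x, y, z)$ and $\eta_{\alpha\beta} = {\rm diag}(-1, 1, 1, 1)$. For a function of $t - z$, each gradient brings down a factor $l_\gamma = \partial(t - z)/\partial x^\gamma$, and the code uses that fact directly.

```python
import itertools

N = 4
T, X, Y, Z = 0, 1, 2, 3   # index order (t, x, y, z); eta = diag(-1, 1, 1, 1)


def plane_wave_riemann(hddot_plus, hddot_cross):
    """R^GW_{abcd} = (1/2) h^TT_{{ab|cd}} (Eqs. 25.28, 25.44) for a plane wave h^TT(t - z).

    For a function of t - z, h_{ab,cd} = hddot_ab * l_c * l_d with l_c = (1, 0, 0, -1).
    """
    hdd = [[0.0] * N for _ in range(N)]
    hdd[X][X], hdd[Y][Y] = hddot_plus, -hddot_plus     # + polarization, Eq. (25.43b)
    hdd[X][Y] = hdd[Y][X] = hddot_cross                 # x polarization
    l = (1.0, 0.0, 0.0, -1.0)

    def H(a, b, c, d):
        return hdd[a][b] * l[c] * l[d]

    return {(a, b, c, d): 0.5 * (H(a, d, b, c) + H(b, c, a, d) - H(a, c, b, d) - H(b, d, a, c))
            for a, b, c, d in itertools.product(range(N), repeat=4)}


R = plane_wave_riemann(hddot_plus=0.7, hddot_cross=-0.3)
eta = (-1.0, 1.0, 1.0, 1.0)
# Ricci tensor R_bd = eta^{ac} R_{abcd}; it must vanish in vacuum, Eq. (25.33).
ricci_max = max(abs(sum(eta[a] * R[a, b, a, d] for a in range(N)))
                for b in range(N) for d in range(N))
print("R_x0x0 =", R[X, T, X, T], " R_y0y0 =", R[Y, T, Y, T], " R_x0y0 =", R[X, T, Y, T])
print("max |Ricci| =", ricci_max)
```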

We shall seek physical insight into $h_+$ and $h_\times$ by studying the following idealized problem: Consider a cloud of test particles that floats freely in space and is static and spherical before the waves pass. We shall study the wave-induced deformations of the cloud as viewed in the nearest thing there is to a rigid, orthonormal coordinate system: the local Lorentz frame (in the physical spacetime) of a “fiducial particle” that sits at the cloud’s center. In that frame the displacement vector $\xi^j$ between the fiducial particle and some other particle has components $\xi^j = x^j + \delta x^j$, where $x^j$ is the other particle’s spatial coordinate before the waves pass, and $\delta x^j$ is its coordinate displacement, as produced by the waves. By inserting this into the local-Lorentz-frame variant of the equation of geodesic deviation, Eq. (23.42), and neglecting the tiny $\delta x^k$ compared to $x^k$ on the right side, we obtain
$$\frac{d^2\delta x^j}{dt^2} = -R^{\rm GW}_{j0k0}\, x^k = \frac{1}{2}\ddot h^{\rm TT}_{jk}\, x^k , \tag{25.45}$$
which can be integrated twice to give
$$\delta x^j = \frac{1}{2} h^{\rm TT}_{jk}\, x^k . \tag{25.46}$$

The middle expression in Eq. (25.45) is the gravitational-wave tidal acceleration that moves the particles back and forth relative to each other. It is completely analogous to the Newtonian tidal acceleration $-R_{j0k0}x^k = -(\partial^2\Phi/\partial x^j\partial x^k)x^k$ by which the moon raises tides on the earth’s oceans [Sec. 23.5.1].

Specialize, now, to a wave with + polarization (for which $h_\times = 0$). By inserting expression (25.43) into (25.46), we obtain
$$\delta x = \frac{1}{2}h_+ x , \qquad \delta y = -\frac{1}{2}h_+ y , \qquad \delta z = 0 . \tag{25.47}$$

This displacement is shown in Fig. 25.1(a,b). Notice that, as the gravitational-wave field $h_+$ oscillates at the cloud’s location, the cloud is left undisturbed in the $z$-direction (propagation direction), and in transverse planes it gets deformed into an ellipse elongated first along the $x$-axis (when $h_+ > 0$), then along the $y$-axis (when $h_+ < 0$). Because $R_{x0x0} = -R_{y0y0}$, i.e., because $R_{j0k0}$ is traceless, the ellipse is squashed along one axis by the same amount as it is stretched along the other, i.e., the area of the ellipse is preserved (to first order in $h_+$) during the oscillations. The effects of the $h_+$ polarization state can also be described in terms of the tidal acceleration field that it produces in the central particle’s local Lorentz frame:
$$\frac{d^2\delta\vec x}{dt^2} = \frac{1}{2}\ddot h_+ \left(x\,\vec e_x - y\,\vec e_y\right) , \tag{25.48}$$

where $\ddot h_+ \equiv \partial^2 h_+/\partial t^2$. Notice that this acceleration field is divergence-free. Because it is divergence-free, it can be represented by lines of force, analogous to electric field lines, which point along the field and have a density of lines proportional to the magnitude of the field; and when this is done, the field lines will never end. Figure 25.1(c,d) shows this acceleration field at the phases of oscillation when $\ddot h_+$ is positive and when it is negative. Notice that the field is quadrupolar in shape, with a field strength (density of lines) that increases linearly with distance from the origin of the local Lorentz frame. The elliptical deformations of the sphere of test particles shown in Fig. 25.1(a,b) are the responses of that sphere to this quadrupolar acceleration field. The polarization state which produces these accelerations and deformations is called the + state because of the orientation of the axes of the quadrupolar acceleration field [Fig. 25.1(c,d)].

[Figure 25.1: panels (a)–(d), each drawn in the x–y plane]
Fig. 25.1: Physical manifestations, in a particle’s local Lorentz frame, of $h_+$ gravitational waves. (a) Transverse deformation of an initially spherical cloud of test particles at a phase of the wave when $h_+ > 0$. (b) Deformation of the cloud when $h_+ < 0$. (c) Field lines representing the acceleration field which produces the cloud’s deformation, at a phase when $\ddot h_+ > 0$. (d) Acceleration field lines when $\ddot h_+ < 0$.

[Figure 25.2: panels (a)–(d), each drawn in the x–y plane]
Fig. 25.2: Physical manifestations, in a particle’s local Lorentz frame, of $h_\times$ gravitational waves. (a) Deformation of an initially spherical cloud of test particles at a phase of the wave when $h_\times > 0$. (b) Deformation of the cloud when $h_\times < 0$. (c) Field lines representing the acceleration field which produces the cloud’s deformation, at a phase of the wave when $\ddot h_\times > 0$. (d) Acceleration field lines when $\ddot h_\times < 0$.

Turn, next, to the × polarization state. In this state the deformations of the initially circular ring are described by
$$\delta x = \frac{1}{2}h_\times y , \qquad \delta y = \frac{1}{2}h_\times x , \qquad \delta z = 0 . \tag{25.49}$$

These deformations, like those for the + state, are purely transverse; they are depicted in Fig. 25.2(a,b). The acceleration field that produces these deformations is
$$\frac{d^2\delta\vec x}{dt^2} = \frac{1}{2}\ddot h_\times \left(y\,\vec e_x + x\,\vec e_y\right) . \tag{25.50}$$

This acceleration field, like the one for the + polarization state, is divergence-free and quadrupolar; the field lines describing it are depicted in Fig. 25.2(c,d). The name “× polarization state” comes from the orientation of the axes of this quadrupolar acceleration field.

In defining the gravitational-wave fields $h_+$ and $h_\times$, we have relied on a choice of (local Lorentz) reference frame, i.e., a choice of local Lorentz basis vectors $\vec e_\alpha$. Exercise 25.4 explores how these fields change when the basis is changed. The conclusions are simple: (i) When one rotates the transverse basis vectors $\vec e_x$ and $\vec e_y$ through an angle $\psi$, then $h_+$ and $h_\times$ “rotate” through $2\psi$ in the sense that
$$(h_+ + i h_\times)_{\rm new} = (h_+ + i h_\times)_{\rm old}\, e^{2i\psi} \qquad \text{when} \qquad (\vec e_x + i\vec e_y)_{\rm new} = (\vec e_x + i\vec e_y)_{\rm old}\, e^{i\psi} . \tag{25.51}$$

(ii) When one boosts from an old frame to a new one moving at some other speed, but chooses the old and new spatial bases such that (a) the waves propagate in the $z$ direction in both frames and (b) the plane spanned by $\vec e_x$ and $\vec\kappa \equiv \vec e_0 + \vec e_z$ (the propagation direction in spacetime) is the same in both frames, then $h_+$ and $h_\times$ are the same in the two frames; i.e., they are scalars under such a boost!
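The deformation law (25.46) and the rotation rule (25.51) lend themselves to a quick numerical check. In this sketch the amplitudes and the rotation angle are arbitrary illustrative choices. A ring of test particles is displaced according to Eq. (25.46), and its area changes only at second order, because the map $\delta_{jk} + \frac12 h^{\rm TT}_{jk}$ has determinant $1 - (h_+^2 + h_\times^2)/4$. The transverse basis is then rotated to confirm Eq. (25.51), including the fact that a $45°$ rotation turns a pure + wave into a pure × wave.

```python
import cmath
import math


def h_tt(h_plus, h_cross):
    """Transverse 2x2 block of h^TT in the (x, y) basis, Eq. (25.43b)."""
    return [[h_plus, h_cross], [h_cross, -h_plus]]


def ring_area(h, n=2000):
    """Shoelace area of a unit ring of n test particles displaced by Eq. (25.46)."""
    pts = []
    for i in range(n):
        th = 2 * math.pi * i / n
        x, y = math.cos(th), math.sin(th)
        pts.append((x + 0.5 * (h[0][0] * x + h[0][1] * y),
                    y + 0.5 * (h[1][0] * x + h[1][1] * y)))
    return 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1])))


def rotate_basis(h, psi):
    """(h_plus, h_cross) in the basis with (e_x + i e_y)_new = e^{i psi} (e_x + i e_y)_old."""
    ex = (math.cos(psi), -math.sin(psi))
    ey = (math.sin(psi), math.cos(psi))

    def comp(u, v):
        return sum(u[j] * h[j][k] * v[k] for j in range(2) for k in range(2))

    return comp(ex, ex), comp(ex, ey)


h_small = 1e-3
area_ratio = ring_area(h_tt(h_small, 0.0)) / ring_area(h_tt(0.0, 0.0))
print(f"area ratio - 1 = {area_ratio - 1:.2e}  (expected {-h_small**2 / 4:.2e})")

hp, hx, psi = 0.8, -0.5, 0.37
hp_new, hx_new = rotate_basis(h_tt(hp, hx), psi)
predicted = (hp + 1j * hx) * cmath.exp(2j * psi)
plus_to_cross = rotate_basis(h_tt(1.0, 0.0), math.pi / 4)   # 45 degrees turns + into x
print(f"rotated: {complex(hp_new, hx_new):.4f}, Eq. (25.51): {predicted:.4f}, 45 deg: {plus_to_cross}")
```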

25.3.3 Gravitons and their spin

Most of the above features of gravitational waves (though not expressed in this language) were clear to Einstein in 1918. Two decades later, as part of the effort to understand quantum fields, M. Fierz and Wolfgang Pauli (1939) at the Eidgenössische Technische Hochschule (ETH) in Zurich, Switzerland, formulated a classical theory of linear fields of arbitrary spin so designed that the fields would be quantizable by canonical methods. Remarkably, their canonical theory for a field of spin two is identical to general relativity with nonlinear effects removed, and the plane waves of that spin-two theory are identical to the waves described above. When quantized by canonical techniques, these waves are carried by zero-rest-mass, spin-two gravitons.

One can see by the following simple argument that the gravitons which carry gravitational waves must have spin two: Consider any plane-wave field (neutrino, electromagnetic, gravitational, ...) that propagates at the speed of light in the $z$-direction of a (local) Lorentz frame. At any moment of time examine any physical manifestation of that field, e.g., the acceleration field it produces on test particles. Rotate that manifestation of the field around the $z$ axis, and ask what is the minimum angle of rotation required to bring the field back to its original configuration. Call that minimum angle, $\theta_{\rm ret}$, the waves’ return angle. The spin $S$ of the particles that carry the wave will necessarily be related to that return angle by
$$S = \frac{360\ \text{degrees}}{\theta_{\rm ret}} . \tag{25.52}$$

This simple formula corresponds to the elegant mathematical statement that “the waves generate an irreducible representation of order $S = 360\ \text{degrees}/\theta_{\rm ret}$ of that subgroup of the Lorentz group which leaves their propagation vector unchanged (the ‘Little group’ of the propagation vector).” For electromagnetic waves a physical manifestation is the electric field, which is described by a vector lying in the $x$–$y$ plane; if one rotates that vector about the $z$-axis (propagation axis), it returns to its original orientation after a return angle $\theta_{\rm ret} = 360$ degrees. Correspondingly, the spin of the particle which carries the electromagnetic wave (the photon) is one. For neutrinos the return angle is $\theta_{\rm ret} = 720$ degrees; and correspondingly the spin of a neutrino is $\frac{1}{2}$. For gravitational waves the physical manifestations include the deformation of a sphere of test particles [Figs. 25.1(a,b) and 25.2(a,b)] and the acceleration fields [Figs. 25.1(c,d) and 25.2(c,d)]. Both the deformed, ellipsoidal spheres and the quadrupolar lines of force return to their original orientations after rotation through $\theta_{\rm ret} = 180$ degrees; and correspondingly, the graviton must have spin two. Although Fierz and Pauli (1939) showed us how to quantize linearized general relativity, the quantization of full, nonlinear general relativity remains a difficult subject of current research, to which we shall return briefly in the next chapter.
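The return-angle argument can be mimicked numerically. The idea is to rotate a transverse field pattern about the $z$ axis in one-degree steps and record the smallest angle that reproduces the original pattern. This sketch relies on our own choices of sample points and tolerance. It uses the field shapes of Eqs. (25.48) and (25.50) with the prefactor $\frac12\ddot h$ set to one. Spinor fields such as the neutrino's cannot be represented this way, so that case is not modeled.

```python
import math


def rotate(v, theta):
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def field_after_rotation(field, theta, x, y):
    """Field pattern rotated by theta about z: R F(R^{-1} x)."""
    xr, yr = rotate((x, y), -theta)
    return rotate(field(xr, yr), theta)


def return_angle_degrees(field, samples=((1.0, 0.3), (-0.4, 0.9), (0.2, -1.3))):
    """Smallest whole-degree rotation (1..360) that maps the field pattern onto itself."""
    for deg in range(1, 361):
        theta = math.radians(deg)
        if all(max(abs(p - q) for p, q in zip(field_after_rotation(field, theta, x, y), field(x, y))) < 1e-9
               for x, y in samples):
            return deg
    return None


def plus_accel(x, y):        # shape of Eq. (25.48)
    return (x, -y)


def cross_accel(x, y):       # shape of Eq. (25.50)
    return (y, x)


def em_field(x, y):          # transverse electric field of a plane EM wave at one instant
    return (1.0, 0.0)


for name, fld in [("+ polarization", plus_accel), ("x polarization", cross_accel), ("EM wave", em_field)]:
    ret = return_angle_degrees(fld)
    print(f"{name}: return angle {ret} deg -> spin {360 // ret}")
```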

25.3.4 Energy and Momentum in Gravitational Waves

In 1968 Richard Isaacson discovered a beautiful and powerful method to define a stress-energy tensor for a gravitational wave. This method is similar to the one by which we analyzed the back-action of a plasma wave on the plasma’s background particle distribution [Eq. (21.4)]. Here, as there, we take our exact dynamical equation (the Einstein field equation here, the Vlasov equation there) and expand it to quadratic order in the wave:
$$G_{\alpha\beta} = G^{\rm B}_{\alpha\beta} + G^{(1)}_{\alpha\beta} + G^{(2)}_{\alpha\beta} = 0 . \tag{25.53}$$

In this equation $G_{\alpha\beta}$ is the Einstein tensor for the full spacetime metric $g_{\mu\nu} = g^{\rm B}_{\mu\nu} + h_{\mu\nu}$, $G^{\rm B}_{\alpha\beta}$ is the Einstein tensor for the background metric $g^{\rm B}_{\alpha\beta}$, $G^{(1)}_{\alpha\beta}$ is the part linear in $h_{\mu\nu}$, and $G^{(2)}_{\alpha\beta}$ is the part quadratic in $h_{\mu\nu}$. This is the analog of the quadratically expanded Vlasov equation (21.3). Here, as in the plasma case, we next split our dynamical equation into two parts: its spatial average (which is smooth on the scale $\bar\lambda$) and its remaining, fluctuating piece. In the plasma case the fluctuating piece is the linear wave equation for the plasma waves; in the gravitational case it is a variant of the gravitational wave equation $R^{\rm GW}_{\alpha\beta\gamma\delta|\mu}{}^{\mu} = 0$. In the plasma case the averaged piece is Eq. (21.4), by which the waves at quadratic order in their amplitude act back on the unperturbed particle distribution. In the gravitational case, it is the equation
$$G^{\rm B}_{\alpha\beta} = -\langle G^{(2)}_{\alpha\beta}\rangle , \tag{25.54}$$
by which the waves at quadratic order produce background spacetime curvature. Equation (25.54) can be brought into the standard form for Einstein’s equation in the background spacetime,
$$G^{\rm B}_{\alpha\beta} = 8\pi T^{\rm GW}_{\alpha\beta} , \tag{25.55}$$
by attributing to the waves a stress-energy tensor defined by
$$T^{\rm GW}_{\alpha\beta} = -\frac{1}{8\pi}\langle G^{(2)}_{\alpha\beta}\rangle . \tag{25.56}$$

Because this stress-energy tensor involves an average over a few wavelengths, its energy density, momentum density, energy flux, and momentum flux are not defined on lengthscales shorter than a wavelength. One cannot say how much energy or momentum resides in the troughs of the waves and how much in the crests. One can only say how much total energy there is in a region containing a few wavelengths. However, once one has reconciled oneself to this amount of nonlocality, one finds that $T^{\rm GW}_{\alpha\beta}$ has all the other properties that one expects of any good stress-energy tensor. Most especially, in the absence of coupling of the waves to matter (the situation we are treating), it obeys the standard conservation law
$$T^{{\rm GW}\,\alpha\beta}{}_{|\beta} = 0 . \tag{25.57}$$

This law is a direct consequence of the averaged field equation (25.55) and the contracted Bianchi identity for the background spacetime, $G^{{\rm B}\,\alpha\beta}{}_{|\beta} = 0$. By grinding out the second-order perturbation of the Einstein tensor and inserting it into Eq. (25.56), performing several integrations by parts in the average $\langle\dots\rangle$, and invoking results to be derived in Sec. 25.3.7, one arrives at the following simple expression for $T^{\rm GW}_{\alpha\beta}$ in terms of the wave fields $h_+$ and $h_\times$:
$$T^{\rm GW}_{\alpha\beta} = \frac{1}{16\pi}\langle h_{+,\alpha}h_{+,\beta} + h_{\times,\alpha}h_{\times,\beta}\rangle . \tag{25.58}$$

[For details of the derivation, see Isaacson (1968) or Secs. 35.13 and 35.15 of MTW.] Let us examine this stress-energy tensor in the background spacetime’s local Lorentz frame, which we used above when exploring the properties of gravitational waves. Because $h_+ = h_+(t - z)$ and $h_\times = h_\times(t - z)$, the only nonzero components of Eq. (25.58) are
$$T^{{\rm GW}\,00} = T^{{\rm GW}\,0z} = T^{{\rm GW}\,z0} = T^{{\rm GW}\,zz} = \frac{1}{16\pi}\langle \dot h_+^2 + \dot h_\times^2\rangle . \tag{25.59}$$

This has the same form as the stress-energy tensor for a plane electromagnetic wave propagating in the $z$ direction, and the same form as the stress-energy tensor for any collection of zero-rest-mass particles moving in the $z$-direction [cf. Eq. (2.30c)], as it must since the gravitational waves are carried by zero-rest-mass gravitons just as electromagnetic waves are carried by zero-rest-mass photons.

Suppose that the waves have frequency $\sim f$ and that the amplitudes of oscillation of $h_+$ and $h_\times$ are $\sim h_{\rm amp}$. Then by inserting factors of $G$ and $c$ into Eq. (25.59) [i.e., by switching from geometrized units to conventional units] and by setting $\langle(\partial h_+/\partial t)^2\rangle \simeq \frac{1}{2}(2\pi f h_{\rm amp})^2$ and similarly for $h_\times$, we obtain the following approximate expression for the energy flux in the waves:
$$T^{{\rm GW}\,0z} \simeq \frac{\pi c^3}{4G} f^2 h_{\rm amp}^2 \simeq 300\ \frac{\rm ergs}{\rm cm^2\,sec}\left(\frac{f}{1\ {\rm kHz}}\right)^2\left(\frac{h_{\rm amp}}{10^{-21}}\right)^2 . \tag{25.60}$$
The numbers in this equation correspond to a strongly emitting, highly asymmetric supernova in the Virgo cluster of galaxies. Contrast this huge gravity-wave energy flux with the peak electromagnetic flux at the height of the supernova, $\sim 10^{-9}$ erg cm$^{-2}$ sec$^{-1}$; but note that the gravity waves should last for only a few milliseconds, while the strong electromagnetic output lasts for weeks.

Corresponding to the huge energy flux (25.60) in an astrophysically interesting gravitational wave is a huge mean occupation number for the quantum states of the gravitational-wave field, i.e., a huge value for the number of spin-2, zero-rest-mass gravitons in each quantum state. To compute that occupation number, we shall evaluate the volume in phase space occupied by the waves from a supernova and then divide by the volume occupied by each quantum state [cf. Sec. 2.3]. At a time when the waves have reached a distance $r$ from the source, they occupy a spherical shell of area $4\pi r^2$ and thickness of order $10\bar\lambda$, where $\bar\lambda = 1/(2\pi f)$ is their reduced wavelength, so their volume in physical space is $V_x \sim 100 r^2\bar\lambda$. As seen by observers whom the waves are passing, they come from a solid angle $\Delta\Omega \sim (2\bar\lambda/r)^2$ centered on the source, and they have a spread of angular frequencies ranging from $\omega \sim \frac{1}{2}c/\bar\lambda$ to $\omega \sim 2c/\bar\lambda$. Since each graviton carries an energy $\hbar\omega = \hbar c/\bar\lambda$ and a momentum $\hbar\omega/c = \hbar/\bar\lambda$, the volume that they occupy in momentum space is $V_p \sim (2\hbar/\bar\lambda)^3\Delta\Omega$, i.e., $V_p \sim 10\hbar^3/(\bar\lambda r^2)$. The gravitons’ volume in phase space, then, is
$$V_x V_p \sim 1000\hbar^3 \sim 4(2\pi\hbar)^3 . \tag{25.61}$$

Since each quantum state for a zero-rest-mass particle occupies a volume $(2\pi\hbar)^3$ in phase space [Eq. (2.16) with $g_s = 1$], this means that the total number of quantum states occupied by the gravitons is of order unity! Correspondingly, the mean occupation number of each occupied state is of order the total number of gravitons emitted, which (since the total energy radiated in an extremely strong supernova is of order $10^{-2}M_\odot c^2 \sim 10^{52}$ ergs, and each graviton carries an energy $\hbar c/\bar\lambda \sim 10^{-23}$ erg) is
$$\bar\eta \sim 10^{75} . \tag{25.62}$$

This enormous occupation number means that the waves behave exceedingly classically; quantum-mechanical corrections to the classical theory have fractional magnitude $1/\sqrt{\bar\eta} \sim 10^{-37}$.
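The order-of-magnitude estimates (25.60)–(25.62) can be reproduced with rounded cgs constants, which are our assumption. The sketch below evaluates the flux formula. It then counts the gravitons carried by $10^{-2}M_\odot c^2$ radiated at $f = 1$ kHz and divides by the roughly four quantum states implied by Eq. (25.61).

```python
import math

# Rounded cgs constants (our assumption; the text quotes only the final numbers).
G = 6.674e-8        # cm^3 g^-1 s^-2
C = 2.998e10        # cm s^-1
HBAR = 1.055e-27    # erg s
M_SUN = 1.989e33    # g


def gw_energy_flux(f_hz, h_amp):
    """Eq. (25.60): T^{0z} ~ (pi c^3 / 4G) f^2 h_amp^2, in erg cm^-2 s^-1."""
    return math.pi * C**3 / (4 * G) * f_hz**2 * h_amp**2


flux = gw_energy_flux(f_hz=1e3, h_amp=1e-21)

f = 1e3                                      # Hz
lambda_bar = C / (2 * math.pi * f)           # reduced wavelength, cm
graviton_energy = HBAR * C / lambda_bar      # hbar * omega, erg
n_gravitons = 1e-2 * M_SUN * C**2 / graviton_energy
# Eq. (25.61): phase-space volume ~ 1000 hbar^3, i.e. only a few states of size (2 pi hbar)^3.
n_states = 1000 * HBAR**3 / (2 * math.pi * HBAR) ** 3
eta_bar = n_gravitons / n_states
print(f"flux ~ {flux:.0f} erg/cm^2/s")
print(f"states ~ {n_states:.1f}, eta_bar ~ {eta_bar:.1e}, 1/sqrt(eta_bar) ~ {eta_bar ** -0.5:.1e}")
```

The printed values should come out as a few hundred erg cm$^{-2}$ s$^{-1}$, $\bar\eta$ of order $10^{75}$, and quantum corrections of a few times $10^{-38}$. These agree with the text's estimates to within their stated order-of-magnitude precision.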

25.3.5 Wave propagation in a source’s local asymptotic rest frame

Consider a source of gravitational waves somewhere far out in the universe. In the vicinity of the source but some wavelengths away from it (so the waves are well defined), introduce a local Lorentz reference frame in which the source is at rest: the source’s local asymptotic rest frame. In that frame construct spherical polar coordinates $(t, r, \theta, \phi)$ centered on the source so the background metric is
$$ds^2 = -dt^2 + dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\phi^2) . \tag{25.63}$$

The gravitational wave equation $R^{\rm GW}_{\alpha\beta\gamma\delta|\mu}{}^{\mu} = 0$ can be solved fairly easily in this coordinate system. The solution has the form that one would expect from experience with scalar waves and electromagnetic waves in spherical coordinates, plus the description of plane gravitational waves in Sec. 25.3.2: The waves propagate radially with the speed of light, so their wave fields $h_+$ and $h_\times$ are rapidly varying functions of retarded time
$$\tau_r \equiv t - r , \tag{25.64}$$
and slowly varying functions of angle $(\theta, \phi)$, and they die out as $1/r$:
$$h_+ = \frac{Q_+(\tau_r; \theta, \phi)}{r} , \qquad h_\times = \frac{Q_\times(\tau_r; \theta, \phi)}{r} . \tag{25.65}$$

These propagation equations can be thought of as saying that $Q_+$ and $Q_\times$ are constant along radial null rays, i.e., curves of constant $\tau_r = t - r$, $\theta$ and $\phi$; and $h_+$ and $h_\times$ are equal to these constantly propagated $Q$’s, modified by a $1/r$ falloff. Notice that the null tangent vector to the radial rays is
$$\vec k = \vec e_t + \vec e_r = -\vec\nabla\tau_r . \tag{25.66}$$
By a computation in the coordinate basis, one can show that the radius factor $r$, which appears in the $1/r$ falloff law, evolves along the rays in accord with the equation
$$\nabla_{\vec k}\, r = r_{,\alpha}k^\alpha = \frac{1}{2}(\vec\nabla\cdot\vec k)\, r . \tag{25.67}$$
[In these flat coordinates $\vec\nabla\cdot\vec k = r^{-2}\partial_r(r^2) = 2/r$, so both sides equal one.]

This may look like a complicated way to describe $r$, but its virtue is that, when the waves have left the source’s vicinity and are traveling through the real, lumpy universe, the wave fields will continue to have the form (25.65), with $r$ evolving in accord with (25.67) [Sec. 25.3.6 below]!

The wave fields are not fully meaningful until we have specified their associated polarization tensors. Those tensors can be defined along each ray using two transverse, orthonormal polarization vectors, $\vec a$ [the analog of $\vec e_x$ in Eq. (25.42)] and $\vec b$ [the analog of $\vec e_y$]:
$$\mathbf e^+ = \vec a\otimes\vec a - \vec b\otimes\vec b , \qquad \mathbf e^\times = \vec a\otimes\vec b + \vec b\otimes\vec a . \tag{25.68}$$

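The vectors $\vec a$ and $\vec b$ must be held constant along each ray, or equivalently must be parallel transported along the rays:
$$\nabla_{\vec k}\vec a = \nabla_{\vec k}\vec b = 0 . \tag{25.69}$$
It is conventional, in the source’s local asymptotic rest frame, to choose
$$\vec a = \vec e_{\hat\theta} , \qquad \vec b = \vec e_{\hat\phi} , \tag{25.70}$$
so the axes for the + polarization are in the $\theta$ and $\phi$ directions, and those for the × polarization are rotated 45 degrees to $\vec e_{\hat\theta}$ and $\vec e_{\hat\phi}$. Once the polarization tensors have been constructed, and the wave fields (25.65) are known, the waves’ TT gravitational-wave field can be computed from the standard equation
$$h^{\rm TT}_{\alpha\beta} = h_+ e^+_{\alpha\beta} + h_\times e^\times_{\alpha\beta} \tag{25.71}$$
[Eq. (25.43)], and the waves’ Riemann tensor can be computed from the obvious generalization of Eq. (25.44):
$$R^{\rm GW}_{\alpha\beta\gamma\delta} = \frac{1}{2}h^{\rm TT}_{\{\alpha\beta|\gamma\delta\}} \simeq \frac{1}{2}\frac{\partial^2 h^{\rm TT}_{\{\alpha\beta}}{\partial\tau_r^2}\frac{\partial\tau_r}{\partial x^\gamma}\frac{\partial\tau_r}{\partial x^{\delta\}}} = \frac{1}{2}\frac{\partial^2 h^{\rm TT}_{\{\alpha\beta}}{\partial\tau_r^2}\, k_\gamma k_{\delta\}} . \tag{25.72}$$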

Here the derivative with respect to retarded time $\tau_r$ is taken holding $(\theta, \phi, r)$ fixed, and the $\{\dots\}$ on the indices has the meaning of Eq. (25.28). As an important special case, if the basis vectors are chosen in the $\theta$ and $\phi$ directions [Eq. (25.70)], then the tide-producing space-time-space-time part of Riemann [Eq. (25.72)] takes the form
$$R^{\rm GW}_{\hat\theta\hat0\hat\theta\hat0} = -R^{\rm GW}_{\hat\phi\hat0\hat\phi\hat0} = -\frac{1}{2}h_{+,\tau_r\tau_r} , \qquad R^{\rm GW}_{\hat\theta\hat0\hat\phi\hat0} = R^{\rm GW}_{\hat\phi\hat0\hat\theta\hat0} = -\frac{1}{2}h_{\times,\tau_r\tau_r} , \tag{25.73}$$

which is the obvious generalization of Eq. (25.41) to radially propagating waves. We shall demonstrate at the end of the next section that the Riemann tensor (25.72) constructed by the above procedure is, indeed, a solution of the gravitational wave equation.
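A scalar stand-in gives a quick check of the radially propagating waves of this subsection. Replacing the tensor problem by a scalar field, and the choice of profile $Q(\tau_r)$, are our own simplifications. The sketch confirms the propagation law (25.67) at one point. It also confirms that $\psi = Q(t - r)/r$ satisfies the flat-space wave equation to finite-difference accuracy.

```python
import math


def box(F, t, x, y, z, h=1e-3):
    """Central-difference flat-space wave operator (-d_t^2 + d_x^2 + d_y^2 + d_z^2) F."""
    f0 = F(t, x, y, z)

    def d2(fp, fm):
        return (fp - 2 * f0 + fm) / h**2

    return (-d2(F(t + h, x, y, z), F(t - h, x, y, z))
            + d2(F(t, x + h, y, z), F(t, x - h, y, z))
            + d2(F(t, x, y + h, z), F(t, x, y - h, z))
            + d2(F(t, x, y, z + h), F(t, x, y, z - h)))


def Q(tau):
    # Hypothetical profile Q(tau_r) with no angular dependence.
    return math.exp(-(tau - 1.0) ** 2) * math.sin(3.0 * tau)


def psi(t, x, y, z):
    r = math.sqrt(x * x + y * y + z * z)
    return Q(t - r) / r      # scalar version of the Eq. (25.65) form


def div_k_spatial(x, y, z, h=1e-5):
    """Divergence of k = e_t + e_r; only the spatial part x^i/r varies."""
    def k(X, Y, Z):
        r = math.sqrt(X * X + Y * Y + Z * Z)
        return (X / r, Y / r, Z / r)
    return (k(x + h, y, z)[0] - k(x - h, y, z)[0]
            + k(x, y + h, z)[1] - k(x, y - h, z)[1]
            + k(x, y, z + h)[2] - k(x, y, z - h)[2]) / (2 * h)


pt = (3.0, 1.0, 4.0)
r = math.sqrt(sum(c * c for c in pt))
lhs = 1.0                                  # grad_k r = k^r = 1 along a radial ray
rhs = 0.5 * div_k_spatial(*pt) * r         # (1/2)(div k) r, Eq. (25.67)
wave_residual = box(psi, 6.0, *pt)
print(f"Eq. (25.67): lhs = {lhs}, rhs = {rhs:.8f};  box[Q(t - r)/r] = {wave_residual:.1e}")
```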

25.3.6 Wave propagation via geometric optics

The two-lengthscale conditions (25.23), which underlie the definition of gravitational waves, permit us to solve the gravitational wave equation $R^{\rm GW}_{\alpha\beta\gamma\delta|\mu}{}^{\mu} = 0$ by means of geometric optics. We developed the concepts of geometric optics for rather general types of waves in Sec. 6.3. When those techniques are applied to the gravitational wave equation, they reveal that, as the waves travel through our lumpy, bumpy universe, with its stars, galaxies, and black holes, they continue to propagate along null geodesics, just as they did in the local asymptotic rest frame where they originated. Classically the null geodesics are the waves’ rays; quantum mechanically they are the world lines of the waves’ gravitons. Because electromagnetic waves also propagate along null-geodesic rays (photon world lines), gravitational waves must exhibit all the same null-ray-induced phenomena as electromagnetic waves: Doppler shifts, cosmological redshifts, gravitational redshifts, gravitational deflection of rays, and gravitational lensing.

Each ray starts out traveling radially through the local asymptotic rest frame, so it can be identified by three parameters: the direction $(\theta, \phi)$ in which it was emitted, and the retarded time $\tau_r$ of its emission. The rays carry these three labels out through spacetime with themselves, and in particular they lay down the scalar field $\tau_r(\mathcal P)$. As in the source’s local asymptotic rest frame, so also throughout spacetime, the vector
$$\vec k \equiv -\vec\nabla\tau_r \tag{25.74}$$
continues to be tangent to the null rays (so $\vec k\cdot\vec k = 0$), and continues to satisfy the null geodesic equation
$$\nabla_{\vec k}\vec k = 0 , \tag{25.75}$$
as one can see by the following index manipulations:
$$k_{\alpha|\mu}k^\mu = -\tau_{r|\alpha\mu}k^\mu = -\tau_{r|\mu\alpha}k^\mu = k_{\mu|\alpha}k^\mu = \frac{1}{2}(\vec k\cdot\vec k)_{|\alpha} = 0 . \tag{25.76}$$

[Here the second expression follows from the definition (25.74) of $\vec k$, the third follows from the fact that double gradients of scalars (by contrast with vectors) commute, the fourth follows from (25.74) again, the fifth from the rule for differentiating products (and the fact that the gradient of the metric vanishes), and the sixth from the fact that $\vec k\cdot\vec k = 0$.] Thus, $\vec k \equiv -\vec\nabla\tau_r$ continues to be the null-geodesic tangent vector.

As in the source’s local asymptotic rest frame, so throughout spacetime: (i) the functions $Q_+$ and $Q_\times$ continue to be constant along each ray; (ii) the radius function $r$ continues to evolve via the propagation law
$$\nabla_{\vec k}\, r = r_{,\alpha}k^\alpha = \frac{1}{2}(\vec\nabla\cdot\vec k)\, r \tag{25.77}$$
[Eq. (25.67)]; (iii) the polarization vectors continue to be parallel transported along the rays,
$$\nabla_{\vec k}\vec a = \nabla_{\vec k}\vec b = 0 \tag{25.78}$$
[Eq. (25.69)], and continue to be used to build the polarization tensors via
$$\mathbf e^+ = \vec a\otimes\vec a - \vec b\otimes\vec b , \qquad \mathbf e^\times = \vec a\otimes\vec b + \vec b\otimes\vec a \tag{25.79}$$
[Eq. (25.68)]; (iv) the gravitational-wave fields continue to be constructed via the equations
$$h_+ = \frac{Q_+(\tau_r; \theta, \phi)}{r} , \qquad h_\times = \frac{Q_\times(\tau_r; \theta, \phi)}{r} , \tag{25.80}$$
$$h^{\rm TT}_{\alpha\beta} = h_+ e^+_{\alpha\beta} + h_\times e^\times_{\alpha\beta} \tag{25.81}$$
[Eqs. (25.65) and (25.71)], and the Riemann tensor continues to be constructed via
$$R^{\rm GW}_{\alpha\beta\gamma\delta} = \frac{1}{2}h^{\rm TT}_{\{\alpha\beta|\gamma\delta\}} \simeq \frac{1}{2}\frac{\partial^2 h^{\rm TT}_{\{\alpha\beta}}{\partial\tau_r^2}\frac{\partial\tau_r}{\partial x^\gamma}\frac{\partial\tau_r}{\partial x^{\delta\}}} = \frac{1}{2}\frac{\partial^2 h^{\rm TT}_{\{\alpha\beta}}{\partial\tau_r^2}\, k_\gamma k_{\delta\}} \tag{25.82}$$

[Eq. (25.72)]. We shall now sketch a proof that this geometric-optics Riemann tensor does, indeed, satisfy the gravitational wave equation. The foundation for the proof is the geometric-optics condition that $h_+$, $h_\times$, and thence $R^{\rm GW}_{\alpha\beta\gamma\delta}$ are rapidly varying functions of retarded time $\tau_r$ and slowly varying functions of $(\theta, \phi, r)$. To take advantage of this, we shall use a prime to denote derivatives at fixed $\tau_r$, so

$$R^{\rm GW}_{\alpha\beta\gamma\delta|\mu} = R^{\rm GW}_{\alpha\beta\gamma\delta,\tau_r}\frac{\partial\tau_r}{\partial x^\mu} + R^{\rm GW}_{\alpha\beta\gamma\delta|\mu'} = -R^{\rm GW}_{\alpha\beta\gamma\delta,\tau_r}k_\mu + R^{\rm GW}_{\alpha\beta\gamma\delta|\mu'}. \tag{25.83}$$

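Taking the divergence of this on the $\mu$ index, we obtain

$$R^{\rm GW}_{\alpha\beta\gamma\delta|\mu}{}^{|\mu} = R^{\rm GW}_{\alpha\beta\gamma\delta,\tau_r\tau_r}k_\mu k^\mu - 2R^{\rm GW}_{\alpha\beta\gamma\delta,\tau_r|\mu'}k^\mu - R^{\rm GW}_{\alpha\beta\gamma\delta,\tau_r}k^\mu{}_{|\mu} + R^{\rm GW}_{\alpha\beta\gamma\delta|\mu'}{}^{|\mu'}. \tag{25.84}$$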

In the limit as the waves’ reduced wavelength $\bar\lambda$ goes to zero, the first term a priori scales as $1/\bar\lambda^2$, the second and third as $1/\bar\lambda$, and the fourth as $1/\bar\lambda^0 = 1$. In the spirit of geometric optics (Sec. 6.3), we neglect the tiny fourth term. The leading-order, first term vanishes because $\vec k$ is null, so Eq. (25.84) reduces to

$$R^{\rm GW}_{\alpha\beta\gamma\delta|\mu}{}^{|\mu} = -2R^{\rm GW}_{\alpha\beta\gamma\delta,\tau_r|\mu'}k^\mu - R^{\rm GW}_{\alpha\beta\gamma\delta,\tau_r}k^\mu{}_{|\mu}. \tag{25.85}$$

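The first term on the right-hand side is the directional derivative of $-2R^{\rm GW}_{\alpha\beta\gamma\delta,\tau_r}$ along $\vec k$, i.e. along a ray. Each ray has constant $(\theta, \phi, \tau_r)$, and the vectors $\vec a$, $\vec b$, and $\vec k$ that appear in Eqs. (25.80)–(25.82) for $R^{\rm GW}_{\alpha\beta\gamma\delta}$ all are parallel transported along $\vec k$. Therefore the only piece of $R^{\rm GW}_{\alpha\beta\gamma\delta}$ that can vary along $\vec k$ is the factor $1/r$. Correspondingly,

$$R^{\rm GW}_{\alpha\beta\gamma\delta,\tau_r|\mu'}k^\mu = r\,R^{\rm GW}_{\alpha\beta\gamma\delta,\tau_r}\,\nabla_{\vec k}\frac1r. \tag{25.86}$$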

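Inserting this into Eq. (25.85) and invoking the propagation law (25.77) for $r$, which gives $r\nabla_{\vec k}(1/r) = -(\nabla_{\vec k}r)/r = -\tfrac12\vec\nabla\cdot\vec k$, we see that the two terms on the right side of Eq. (25.85) cancel, so

$$R^{\rm GW}_{\alpha\beta\gamma\delta|\mu}{}^{|\mu} = 0. \tag{25.87}$$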

Thus, our geometric-optics formulae for $R^{\rm GW}_{\alpha\beta\gamma\delta}$ do, indeed, produce a solution to the gravitational wave equation. Moreover, in the source’s local asymptotic rest frame this solution reduces to the one developed in Sec. 25.3.5. It therefore describes the source’s own waves as they propagate outward through the background spacetime.
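A quick check of the propagation law (25.77) is possible in its simplest setting. This is a sketch, not part of the original argument. It assumes flat spacetime with metric diag(−1, 1, 1, 1) and outgoing spherical waves, where $\tau_r = t - r$ and therefore $k^\mu = (1, x/r, y/r, z/r)$. The helper names are illustrative. The code uses central finite differences to confirm that $\vec k$ is null and that $\nabla_{\vec k}r$ and $\tfrac12(\vec\nabla\cdot\vec k)\,r$ both equal 1.

```python
import math

def radius(t, x, y, z):
    """Spatial radius r (independent of t)."""
    return math.sqrt(x * x + y * y + z * z)

def k_vector(t, x, y, z):
    """k^mu = -eta^{mu nu} d_nu tau_r for tau_r = t - r in flat spacetime: (1, x/r, y/r, z/r)."""
    r = radius(t, x, y, z)
    return [1.0, x / r, y / r, z / r]

def shifted(point, mu, delta):
    """Copy of the event `point` with coordinate mu displaced by delta."""
    p = list(point)
    p[mu] += delta
    return p

def divergence_of_k(point, eps=1e-5):
    """d_mu k^mu by central differences (Cartesian coordinates, so no Christoffel terms)."""
    return sum((k_vector(*shifted(point, mu, eps))[mu] - k_vector(*shifted(point, mu, -eps))[mu])
               / (2 * eps) for mu in range(4))

def derivative_of_r_along_k(point, eps=1e-5):
    """k^mu d_mu r by central differences."""
    k = k_vector(*point)
    return sum(k[mu] * (radius(*shifted(point, mu, eps)) - radius(*shifted(point, mu, -eps)))
               / (2 * eps) for mu in range(4))

event = [0.0, 3.0, -4.0, 12.0]                           # an event at r = 13
k = k_vector(*event)
null_norm = -k[0] ** 2 + k[1] ** 2 + k[2] ** 2 + k[3] ** 2
lhs = derivative_of_r_along_k(event)                     # nabla_k r
rhs = 0.5 * divergence_of_k(event) * radius(*event)      # (1/2)(nabla . k) r
print(null_norm, lhs, rhs)
```

This only exercises the flat-space limit, where $\vec\nabla\cdot\vec k = 2/r$. It does not test the curved-background argument itself.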

25.3.7

Metric perturbation; TT gauge

Although the properties of gravitational waves and their propagation are best described in terms of the waves’ Riemann tensor $R^{\rm GW}_{\alpha\beta\gamma\delta}$, their generation is best described in terms of the waves’ metric perturbation $h_{\mu\nu}$ [cf. the linearized-theory analysis in Sec. 23.9.2]. As in linearized theory, so also here, there is gauge freedom in the waves’ metric perturbation, which results from introducing a tiny, rippled displacement $\vec\xi$ of the coordinate lines. In a local Lorentz frame of the smooth background, the gauge change has the linearized-theory form $\delta h_{\mu\nu} = -\xi_{\mu,\nu} - \xi_{\nu,\mu}$ [Eq. (23.104)], so in an arbitrary coordinate system of the background spacetime it must be

$$h^{\rm new}_{\mu\nu} = h^{\rm old}_{\mu\nu} - \xi_{\mu|\nu} - \xi_{\nu|\mu}. \tag{25.88}$$

By choosing the background coordinates to be local Lorentz and carefully adjusting the waves’ gauge, we can ensure that the waves’ metric perturbation is equal to the transverse-traceless gravitational-wave field (25.43), which we originally defined in terms of the Riemann tensor. That is, we can ensure that $h_{\alpha\beta} = h^{\rm TT}_{\alpha\beta}$, or equivalently,

$$h_{xx} = -h_{yy} = h_+(t-z), \qquad h_{xy} = h_{yx} = h_\times(t-z), \qquad \text{all other } h_{\alpha\beta} \text{ vanish}. \tag{25.89}$$

To see that this is possible, we need only verify that this metric perturbation produces the correct components of Riemann, Eq. (25.41). Indeed it does, as we can see by inserting Eqs. (25.89) into expression (25.26) for the Riemann tensor. For an alternative proof see Ex. 25.6. The gauge in which the waves’ metric perturbation takes the simple TT form (25.89) is called transverse-traceless gauge, or TT gauge. The coordinates in which the metric perturbation takes this form are called TT coordinates. TT gauge is not the only one in which the waves’ metric perturbation has the plane-wave form $h_{\alpha\beta} = h_{\alpha\beta}(t-z)$; there are many other such gauges (cf. Ex. 25.6). In any local Lorentz frame of the background spacetime and any gauge for which $h_{\alpha\beta} = h_{\alpha\beta}(t-z)$, the transverse components of the waves’ Riemann curvature tensor take the form [derivable from Eq. (23.51) or (23.96)]

$$R^{\rm GW}_{j0k0} = -\frac12\frac{\partial^2h_{jk}}{\partial t^2} \quad \text{for } j = x, y \text{ and } k = x, y. \tag{25.90}$$

By comparing with Eq. (25.41) we see that in such a gauge, the transverse part of the waves’ metric perturbation must be equal to the TT gravitational-wave field:

$$h^{\rm TT}_{jk} = (h_{jk})^{\rm T}. \tag{25.91}$$

Here the T on the right-hand side means “throw away all components except those which are spatial and are transverse to the waves’ propagation direction”. Since $h^{\rm TT}_{jk}$ is trace-free as well as transverse, we are guaranteed that the transverse part of the metric perturbation $h_{jk}$ will be trace-free; i.e. $h_{xx} + h_{yy} = 0$. To emphasize this trace-free property it is conventional to write Eq. (25.91) in the form

$$h^{\rm TT}_{jk} = (h_{jk})^{\rm TT}, \tag{25.92}$$

where the second T on the right-hand side means “remove the trace, if the trace is not already zero”. To repeat, Eq. (25.92) is true in any gauge where the waves’ contribution to the metric has the “speed-of-light-propagation” form $h_{\alpha\beta} = h_{\alpha\beta}(t-z)$. If we rotate the spatial axes so the waves propagate along the unit spatial vector $\mathbf n$ instead of along $\vec e_z$, then the “speed-of-light-propagation” form of the metric becomes

$$h_{\alpha\beta} = h_{\alpha\beta}(t - \mathbf n\cdot\mathbf x), \tag{25.93}$$

and the extraction of the spatial, transverse-traceless part of this metric perturbation can be achieved with the aid of the projection tensor

$$P^{jk} \equiv \delta^{jk} - n^jn^k. \tag{25.94}$$

Specifically,

$$h^{\rm TT}_{jk} = (h_{jk})^{\rm TT} = P_j{}^lP_k{}^mh_{lm} - \tfrac12P_{jk}P^{lm}h_{lm}. \tag{25.95}$$
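The projection (25.95) is easy to evaluate numerically. The following Python sketch uses plain lists; the function names are illustrative, not from the text. It builds $P_{jk}$ from Eq. (25.94), applies Eq. (25.95), and checks three properties stated in this section:

- For $\mathbf n = \vec e_z$ it keeps only $h_{xx}$, $h_{yy}$, $h_{xy}$, with $h_{xx} = -h_{yy}$ as in Eq. (25.89).
- For any unit $\mathbf n$ the result is transverse and trace-free.
- It annihilates $\delta_{jk}$. This is why $\bar h_{jk}$ and $h_{jk}$, whose spatial parts differ only by a multiple of $\delta_{jk}$, have the same TT projection.

```python
def projector(n):
    """Transverse projector P_jk = delta_jk - n_j n_k, Eq. (25.94); n must be a unit vector."""
    return [[(1.0 if j == k else 0.0) - n[j] * n[k] for k in range(3)] for j in range(3)]

def tt_project(h, n):
    """(h_jk)^TT = P_j^l P_k^m h_lm - (1/2) P_jk P^lm h_lm, Eq. (25.95), Cartesian components."""
    P = projector(n)
    transverse = [[sum(P[j][l] * h[l][m] * P[m][k] for l in range(3) for m in range(3))
                   for k in range(3)] for j in range(3)]
    trace = sum(P[l][m] * h[l][m] for l in range(3) for m in range(3))
    return [[transverse[j][k] - 0.5 * P[j][k] * trace for k in range(3)] for j in range(3)]

# An arbitrary symmetric spatial perturbation (illustrative numbers)
h = [[1.0, 0.3, -0.2],
     [0.3, -0.4, 0.7],
     [-0.2, 0.7, 2.0]]

h_z = tt_project(h, [0.0, 0.0, 1.0])      # waves propagating along e_z
n = [1 / 3, 2 / 3, 2 / 3]                 # an oblique unit propagation direction
h_n = tt_project(h, n)
identity = [[1.0 if j == k else 0.0 for k in range(3)] for j in range(3)]
h_delta = tt_project(identity, n)         # a pure-trace piece projects to zero

for row in h_z:
    print(["%+.3f" % x for x in row])
```

For $\mathbf n = \vec e_z$ the surviving components are $h_{xx} = -h_{yy} = \tfrac12(h_{xx} - h_{yy})$ and $h_{xy}$, which is the "throw away non-transverse parts, then remove the trace" recipe of Eq. (25.92).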

Here the notation is that of Cartesian coordinates, with $P_j{}^k = P^{jk} = P_{jk}$. When analyzing gravitational-wave generation, the quantity most easily computed is often the trace-reversed metric perturbation $\bar h_{\alpha\beta}$, in a gauge with speed-of-light propagation, so $\bar h_{\alpha\beta} = \bar h_{\alpha\beta}(t - \mathbf n\cdot\mathbf x)$. The projection process (25.95) removes the trace (i.e., the result is insensitive to the trace), and $\bar h_{jk}$ and $h_{jk}$ differ only in their trace. We can therefore compute the gravitational-wave field by direct TT projection of $\bar h_{jk}$, without bothering to evaluate $h_{jk}$ first:

$$h^{\rm TT}_{jk} = (\bar h_{jk})^{\rm TT} = P_j{}^lP_k{}^m\bar h_{lm} - \tfrac12P_{jk}P^{lm}\bar h_{lm}. \tag{25.96}$$

****************************
EXERCISES

Exercise 25.4 Derivation: Behavior of h+ and h× under rotations and boosts
(a) Derive the behavior (25.51) of $h_+$ and $h_\times$ under rotations in the transverse plane. [Hint: show that $\mathbf e^+ + i\mathbf e^\times$ rotates through $2\psi$, and then write $h^{\rm GW}_{\alpha\beta}$ [Eq. (25.43)] in terms of $h_+ + ih_\times$ and $\mathbf e^+ - i\mathbf e^\times$.]
(b) Show that, with the orientations of spatial basis vectors described after Eq. (25.51), $h_+$ and $h_\times$ are unchanged by boosts.

Exercise 25.5 Problem: Energy-Momentum Conservation in Geometric Optics Limit
Near the end of Sec. 25.3.6, we proved that our geometric-optics formulae for $R^{\rm GW}_{\alpha\beta\gamma\delta}$ satisfy the gravitational wave equation. Use these same techniques to show that the gravitational stress-energy tensor (25.58), with $h_+$ and $h_\times$ given by the geometric-optics formulae (25.80), has vanishing divergence, $\vec\nabla\cdot\mathbf T^{\rm GW} = 0$.

Exercise 25.6 Example: Transformation to TT Gauge
Consider a plane gravitational wave propagating in the z-direction through a local Lorentz frame of the smooth background spacetime. Such a wave can be described by Linearized Theory. In Sec. 23.9.2 and Ex. 23.13 we showed that, by a careful choice of the four gauge-generating functions $\xi^\alpha(\mathcal P)$, one can bring the trace-reversed metric perturbation into Lorentz gauge. It then satisfies the gauge condition $\bar h_{\alpha\beta,}{}^\beta = 0$ and the wave equation $\bar h_{\alpha\beta,\mu}{}^\mu = 0$, and thence has the form $\bar h_{\alpha\beta} = \bar h_{\alpha\beta}(t-z)$.
In general there are 10 independent components of $\bar h_{\alpha\beta}$, since it is a symmetric second-rank tensor, but the 4 gauge conditions reduce this from 10 to 6. Thus, in general, the Lorentz-gauge metric perturbation for a plane gravitational wave contains six independent functions of $t-z$. Only two of these six can represent physical degrees of freedom of the wave. The other four must be pure-gauge functions and must be removable by a further specialization of the gauge. This exercise explores that further gauge freedom.

(a) Consider any trace-reversed metric perturbation $\bar h_{\alpha\beta}$ that is in Lorentz gauge. Show that a further gauge change whose generators satisfy the wave equation $\xi_{\alpha,\mu}{}^\mu = 0$ leaves $\bar h_{\alpha\beta}$ still in Lorentz gauge. Show that such a gauge change, in general, involves four free functions of three of the spacetime coordinates. By contrast, general gauge transformations entail four free functions of all four spacetime coordinates.
(b) Consider the plane gravitational wave described in the first paragraph of this problem. Exhibit gauge-change generators $\xi_\alpha$ that satisfy the wave equation and that remove four of the six independent functions from $\bar h_{\alpha\beta}$. This brings it into TT gauge, so the components of $h_{\alpha\beta}$ are given by Eqs. (25.89).
(c) Show by an explicit calculation that the gauge change of part (b) can be achieved by the transverse-traceless projection of Eq. (25.92). That is, throw away all pieces of $h_{\alpha\beta}$ except the transverse ones (those that lie in the x-y plane), and then remove the trace.

****************************

25.4

The Generation of Gravitational Waves

25.4.1

Multipole-moment expansion

The electromagnetic waves emitted by a dynamical charge distribution are usually expressed as a sum over the source’s multipole moments. There are two families of moments: the electric moments (moments of the electric charge distribution) and the magnetic moments (moments of the electric current distribution). Similarly, the gravitational waves emitted by a dynamical distribution of mass-energy and momentum can be expressed as a sum over multipole moments. Again there are two families of moments: the mass moments (moments of the mass-energy distribution) and the current moments (moments of the mass-current distribution, i.e. the momentum distribution). The multipolar expansion of gravitational waves is presented in great detail in Thorne (1980). In this section we shall sketch and explain its qualitative and order-of-magnitude features. In the source’s weak-gravity near zone (if it has one), the mass moments show up in the time-time part of the metric in a form familiar from Newtonian theory:

$$g_{00} = -(1 + 2\Phi) = -1 \;\&\; \frac{I_0}{r} \;\&\; \frac{I_1}{r^2} \;\&\; \frac{I_2}{r^3} \;\&\; \dots \tag{25.97}$$

[cf. Eq. (23.95)]. Here $r$ is radius, $I_\ell$ is the moment of order $\ell$, and “&” means “plus terms with the form”. The mass monopole moment $I_0$ is the source’s mass, and the mass dipole moment $I_1$ can be made to vanish by placing the origin of coordinates at the center of mass. Similarly, in the source’s weak-gravity near zone, its current moments $S_\ell$ show up in the space-time part of the metric:

$$g_{0j} = \frac{S_1}{r^2} \;\&\; \frac{S_2}{r^3} \;\&\; \dots \tag{25.98}$$

Just as there is no magnetic monopole moment in classical electromagnetic theory, so there is no current monopole moment in general relativity. The current dipole moment $S_1$ is the source’s angular momentum, so the leading-order term in the expansion (25.98) has the form (23.112), which we have used to deduce the angular momenta of gravitating bodies. If the source has mass $M$, size $L$ and internal velocities $\sim v$, then the magnitudes of its moments are

$$I_\ell \sim ML^\ell, \qquad S_\ell \sim MvL^\ell. \tag{25.99}$$

These formulae guarantee that the near-zone fields $g_{00}$ and $g_{0j}$, as given by Eqs. (25.97) and (25.98), are dimensionless. As the source’s moments oscillate dynamically, they produce gravitational waves. Conservation laws keep the lowest moments from oscillating:

- Mass-energy conservation [Eq. (23.114)] prevents the mass monopole moment $I_0 = M$ from oscillating.
- Angular-momentum conservation [Eq. (23.115)] prevents the current dipole moment $S_1$ = (angular momentum) from oscillating.
- Because the time derivative of the mass dipole moment $I_1$ is the source’s linear momentum, momentum conservation [Eq. (23.118)] prevents the mass dipole moment from oscillating.

Therefore, the lowest-order moments that can contribute to the waves are the quadrupolar ones. The wave fields $h_+$ and $h_\times$ in the source’s local asymptotic rest frame must (i) be dimensionless, (ii) die out as $1/r$, and (iii) be expressed as a sum over derivatives of the multipole moments. These conditions guarantee that the waves will have the following form:

$$h_+ \sim h_\times \sim \frac{\partial^2I_2/\partial t^2}{r} \;\&\; \frac{\partial^3I_3/\partial t^3}{r} \;\&\; \dots \;\&\; \frac{\partial^2S_2/\partial t^2}{r} \;\&\; \frac{\partial^3S_3/\partial t^3}{r} \;\&\; \dots \tag{25.100}$$

The timescale on which the moments oscillate is $T \sim L/v$, so each time derivative produces a factor $v/L$. Correspondingly, the $\ell$-pole contributions to the waves have magnitudes

$$\frac{\partial^\ell I_\ell/\partial t^\ell}{r} \sim \frac Mr\,v^\ell, \qquad \frac{\partial^\ell S_\ell/\partial t^\ell}{r} \sim \frac Mr\,v^{\ell+1}. \tag{25.101}$$

This means that, for a “slow-motion source”, the mass quadrupole moment $I_2$ will produce the strongest waves. (A slow-motion source is one with internal velocities $v$ small compared to light, so the reduced wavelength $\bar\lambda \sim L/v$ is large compared to the source size $L$.) The mass octupole waves and current quadrupole waves will be weaker by $\sim v \sim L/\bar\lambda$. The mass hexadecapole ($\ell = 4$) and current octupole waves will be weaker by $\sim v^2 \sim L^2/\bar\lambda^2$, and so on. This is analogous to the electromagnetic case, where the electric dipole waves are the strongest, and the electric quadrupole and magnetic dipole waves are smaller by $\sim L/\bar\lambda$, etc. In the next section we shall develop the theory of mass-quadrupole gravitational waves. For the corresponding theory of higher-order multipoles, see, e.g., Sec. VIII of Thorne (1980).
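The hierarchy implied by the scalings (25.101) can be tabulated directly. In the illustrative Python sketch below, the values $M/r = 10^{-20}$ and $v = 0.1$ are arbitrary assumptions chosen only to display the ordering. The mass quadrupole comes out on top. Each successive tier is one power of $v$ weaker: mass octupole with current quadrupole, then mass hexadecapole with current octupole.

```python
def multipole_amplitudes(m_over_r, v, l_max=4):
    """Order-of-magnitude wave amplitudes from Eq. (25.101), G = c = 1.

    Mass l-pole:    (M/r) v^l
    Current l-pole: (M/r) v^(l+1)
    Only l >= 2 radiates (the monopole and dipole moments are conserved).
    """
    amps = {}
    for l in range(2, l_max + 1):
        amps[("mass", l)] = m_over_r * v ** l
        amps[("current", l)] = m_over_r * v ** (l + 1)
    return amps

amps = multipole_amplitudes(m_over_r=1e-20, v=0.1)
for (kind, l), h in sorted(amps.items(), key=lambda item: -item[1]):
    print(f"{kind:7s} l={l}: h ~ {h:.0e}")
```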

25.4.2

Quadrupole-moment formalism

Consider a weakly gravitating, nearly Newtonian system, e.g. a binary star system, and write its Newtonian potential in the usual way:

$$\Phi(\mathbf x) = -\int\frac{\rho(\mathbf x')}{|\mathbf x - \mathbf x'|}\,dV_{x'}. \tag{25.102}$$

By using Cartesian coordinates, placing the origin of coordinates at the center of mass, and expanding

$$\frac{1}{|\mathbf x - \mathbf x'|} = \frac1r + \frac{x^jx'^j}{r^3} + \frac{x^jx^k(3x'^jx'^k - r'^2\delta_{jk})}{2r^5} + \dots, \tag{25.103}$$

we obtain the multipolar expansion of the Newtonian potential

$$\Phi(\mathbf x) = -\frac Mr - \frac{3\mathcal I_{jk}x^jx^k}{2r^5} + \dots \tag{25.104}$$

Here

$$M = \int\rho\,dV_x, \qquad \mathcal I_{jk} = \int\rho\left(x^jx^k - \tfrac13r^2\delta_{jk}\right)dV_x \tag{25.105}$$

are the system’s mass and mass quadrupole moment. Note that the mass quadrupole moment is equal to the second moment of the mass distribution, with its trace removed. As we have discussed, dynamical oscillations of the quadrupole moment produce gravitational waves. Those waves must be describable by an outgoing-wave solution to the Lorentz-gauge, linearized Einstein equations

$$\bar h_{\mu\nu,}{}^\nu = 0, \qquad \bar h_{\mu\nu,\alpha}{}^\alpha = 0 \tag{25.106}$$

[Eqs. (23.105) and (23.106)] that has the near-zone Newtonian limit

$$\tfrac12\left(\bar h_{00} + \bar h_{xx} + \bar h_{yy} + \bar h_{zz}\right) = h_{00} = \frac{3\mathcal I_{jk}x^jx^k}{r^5} \tag{25.107}$$

[cf. Eq. (23.101)]. The desired solution can be written in the form

$$\bar h_{00} = 2\left[\frac{\mathcal I_{jk}(t-r)}{r}\right]_{,jk}, \qquad \bar h_{0j} = 2\left[\frac{\dot{\mathcal I}_{jk}(t-r)}{r}\right]_{,k}, \qquad \bar h_{jk} = 2\,\frac{\ddot{\mathcal I}_{jk}(t-r)}{r}, \tag{25.108}$$

where the coordinates are Cartesian, $r \equiv \sqrt{\delta_{jk}x^jx^k}$, and the dots denote time derivatives. To verify that this is the desired solution:

(i) Compute its divergence $\bar h_{\alpha\beta,}{}^\beta$ and obtain zero almost trivially.
(ii) Notice that each Lorentz-frame component of $\bar h_{\alpha\beta}$ has the form $f(t-r)/r$ aside from some derivatives that commute with the wave operator, which implies that it satisfies the wave equation.
(iii) Notice that in the near zone, the slow-motion assumption inherent in the Newtonian limit makes the time derivatives negligible, so $\bar h_{jk} \simeq 0$ and $\bar h_{00}$ is twice the right-hand side of Eq. (25.107), as desired.

In the wave zone, the trace-reversed metric perturbation (25.108) has the speed-of-light-propagation form, aside from its very slow decay as $1/r$. We can therefore compute the gravitational-wave field $h^{\rm TT}_{jk}$ from it by transverse-traceless projection, Eq. (25.96) with $\mathbf n = \vec e_r$:

$$h^{\rm TT}_{jk} = 2\left[\frac{\ddot{\mathcal I}_{jk}(t-r)}{r}\right]^{\rm TT}. \tag{25.109}$$
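The Newtonian expansion (25.103)–(25.105) that feeds this derivation can be checked numerically. The Python sketch below uses a handful of point masses; the masses and positions are arbitrary illustrative choices. It places the origin at their center of mass so the dipole term vanishes, and compares the exact potential (25.102) with the monopole-plus-quadrupole form (25.104). The remaining error should be set by the next (octupole) term. It should therefore fall off roughly as $1/r^4$, dropping by about 16 when $r$ doubles.

```python
import math

def center_on_mass(masses, positions):
    """Shift positions so the center of mass is at the origin (kills the dipole term)."""
    total = sum(masses)
    com = [sum(m * p[i] for m, p in zip(masses, positions)) / total for i in range(3)]
    return total, [[p[i] - com[i] for i in range(3)] for p in positions]

def quadrupole_moment(masses, positions):
    """Trace-free I_jk = sum_A m_A (x^j x^k - r^2 delta_jk / 3): Eq. (25.105) for point masses."""
    I = [[0.0] * 3 for _ in range(3)]
    for m, p in zip(masses, positions):
        r2 = sum(c * c for c in p)
        for j in range(3):
            for k in range(3):
                I[j][k] += m * (p[j] * p[k] - (r2 / 3.0 if j == k else 0.0))
    return I

def potential_exact(masses, positions, x):
    """Eq. (25.102) for point masses, G = 1."""
    return -sum(m / math.dist(x, p) for m, p in zip(masses, positions))

def potential_quadrupole(M, I, x):
    """Eq. (25.104) truncated after the quadrupole term."""
    r = math.sqrt(sum(c * c for c in x))
    quad = sum(I[j][k] * x[j] * x[k] for j in range(3) for k in range(3))
    return -M / r - 1.5 * quad / r ** 5

masses = [1.0, 2.0, 1.5]
M, positions = center_on_mass(masses, [[1.0, 0.0, 0.0], [-0.5, 0.4, 0.2], [0.1, -0.8, 0.3]])
I = quadrupole_moment(masses, positions)
direction = [1 / 3, 2 / 3, 2 / 3]

errors = {}
for R in (50.0, 100.0):
    x = [R * c for c in direction]
    exact = potential_exact(masses, positions, x)
    monopole_error = abs(exact + M / R)
    quadrupole_error = abs(exact - potential_quadrupole(M, I, x))
    errors[R] = (monopole_error, quadrupole_error)
    print(f"r = {R:5.0f}: monopole-only error {monopole_error:.2e}, with quadrupole {quadrupole_error:.2e}")
```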

This is called the quadrupole-moment formula for gravitational-wave generation. Our derivation shows that it is valid for any nearly Newtonian source. Looking back more carefully at the derivation, one can see that, in fact, it relied only on the linearized Einstein equations and the Newtonian potential in the source’s local asymptotic rest frame. Therefore, this quadrupole formula is also valid for slow-motion sources that have strong internal gravity (e.g., slowly spinning neutron stars). The only requirement is that we read the quadrupole moment $\mathcal I_{jk}(t-r)$ off the source’s near-zone Newtonian potential (25.104) rather than computing it via the Newtonian volume integral (25.105). When the source is nearly Newtonian, the volume integral (25.105) can be used to compute the quadrupole moment. The computation of the waves is then simplified by computing instead the second moment of the mass distribution

$$I_{jk} = \int\rho\,x^jx^k\,dV_x, \tag{25.110}$$

which differs from the quadrupole moment solely in its trace. Then, because the TT projection is insensitive to the trace, the wave field (25.109) can be computed as

$$h^{\rm TT}_{jk} = 2\left[\frac{\ddot I_{jk}(t-r)}{r}\right]^{\rm TT}. \tag{25.111}$$

To get an order-of-magnitude feeling for the strength of the gravitational waves, notice that the second time derivative of the quadrupole moment is, in order of magnitude, the nonspherical part of the source’s internal kinetic energy, $E^{\rm ns}_{\rm kin}$, so

$$h_+ \sim h_\times \sim \frac{E^{\rm ns}_{\rm kin}}{r} = G\frac{E^{\rm ns}_{\rm kin}}{c^4r}, \tag{25.112}$$

where the second expression is written in conventional units. This estimate is based on the slow-motion assumption of source size small compared to reduced wavelength, $L \ll \bar\lambda$. Even so, it remains valid in order of magnitude when extrapolated to the strongest of all realistic astrophysical sources, which have $L \sim \bar\lambda$. For sources in the “high-frequency” band of ground-based detectors (as we shall see below), the largest value of $E^{\rm ns}_{\rm kin}$ that is likely to occur is roughly $E^{\rm ns}_{\rm kin} \sim M_\odot \sim 1$ km, where $M_\odot$ is the mass of the Sun. The collision of two smallish black holes (masses of several solar masses) will have such an $E^{\rm ns}_{\rm kin}$. Such a source would produce:

- $h_+ \sim 10^{-17}$ at the center of our galaxy;
- $h_+ \sim 10^{-20}$ at the center of the Virgo cluster of galaxies;
- $h_+ \sim 10^{-23}$ at the Hubble distance (edge of the observable universe).

This sets the sensitivity goals of ground-based detectors, Sec. 25.5.

The gravitational stress-energy tensor $T^{\rm GW}_{\mu\nu}$ produces background curvature via the Einstein equation $G^{\rm B}_{\mu\nu} = 8\pi T^{\rm GW}_{\mu\nu}$, just like nongravitational stress-energy tensors. It must therefore contribute to the rate of change of the source’s mass $M$, linear momentum $P_j$ and angular momentum $S_j$ [Eqs. (23.114)–(23.118)], just like other stress-energies. When one inserts the quadrupolar $T^{\rm GW}_{\mu\nu}$ into Eqs. (23.114)–(23.118) and integrates over a sphere in the wave zone

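of the source’s local asymptotic rest frame, one finds that

$$\frac{dM}{dt} = -\frac15\left\langle\frac{\partial^3\mathcal I_{jk}}{\partial t^3}\frac{\partial^3\mathcal I_{jk}}{\partial t^3}\right\rangle, \tag{25.113}$$

$$\frac{dS_i}{dt} = -\frac25\,\epsilon_{ijk}\left\langle\frac{\partial^2\mathcal I_{jm}}{\partial t^2}\frac{\partial^3\mathcal I_{km}}{\partial t^3}\right\rangle, \tag{25.114}$$

where $\langle\;\rangle$ denotes an average over several gravitational-wave periods,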

and $dP_j/dt = 0$. It turns out [cf. Sec. IV of Thorne (1980)] that the dominant linear-momentum change (i.e., the dominant radiation-reaction “kick”) arises from a beating of the mass quadrupole moment against the mass octupole moment, and of the mass quadrupole against the current quadrupole. The back reaction of the emitted waves on their source shows up not only in changes of the source’s mass, momentum, and angular momentum, but also in accompanying changes of the source’s internal structure. In many cases these structure changes can be deduced fully from $dM/dt$, $dS_j/dt$ and $dP_j/dt$; a nearly Newtonian binary system is an example (Sec. 25.4.3 below). However, in other cases (e.g., a compact body orbiting near the horizon of a black hole), the only way to compute the structure changes is via a gravitational-radiation-reaction force that acts back on the system. The simplest example of such a force is one derived by William Burke (1971) for quadrupole waves emitted by a nearly Newtonian system. Burke’s quadrupolar radiation-reaction force can be incorporated into Newtonian gravitation theory by simply augmenting the system’s near-zone Newtonian potential by a radiation-reaction term, computed from the fifth time derivative of the system’s quadrupole moment:

$$\Phi^{\rm react} = \frac15\frac{\partial^5\mathcal I_{jk}}{\partial t^5}x^jx^k. \tag{25.115}$$

This potential satisfies the vacuum Newtonian field equation $\nabla^2\Phi^{\rm react} \equiv \delta_{jk}\Phi^{\rm react}_{,jk} = 0$ because $\mathcal I_{jk}$ is trace free. This augmentation of the Newtonian potential arises as a result of general relativity’s outgoing-wave condition. If one were to switch to an ingoing-wave condition, $\Phi^{\rm react}$ would change sign. If the system’s oscillating quadrupole moment were joined onto standing gravitational waves, $\Phi^{\rm react}$ would go away. In Ex. 25.9 you will show that the radiation-reaction force density $-\rho\nabla\Phi^{\rm react}$ saps energy from the system at the same rate as the gravitational waves carry it away. Burke’s gravitational radiation-reaction potential $\Phi^{\rm react}$ and force density $-\rho\nabla\Phi^{\rm react}$ are close analogs of two quantities that act on an oscillating ball which emits sound waves into a surrounding fluid: the radiation-reaction potential [last term in Eq. (15.74)] and acceleration [right side of Eq. (15.77)]. Moreover, Burke’s derivation of his gravitational radiation-reaction potential is conceptually the same as the derivation, in Chap. 15, of the sound-wave reaction potential.
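The order-of-magnitude strains quoted after Eq. (25.112) follow from a simple unit conversion. The Python sketch below makes these assumptions, none of which are figures from the text:

- $E^{\rm ns}_{\rm kin} \sim M_\odot$ is taken as $M_\odot c^2$ in SI units.
- The Galactic center is about 8.5 kpc away.
- The Virgo cluster is about 16.5 Mpc away.
- The Hubble distance is $c/H_0$ with $H_0 \approx 70$ km s$^{-1}$ Mpc$^{-1}$.

The results land within an order of magnitude of the quoted $10^{-17}$, $10^{-20}$ and $10^{-23}$. The Virgo value comes out nearer $3\times10^{-21}$.

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
KPC = 3.086e19       # kiloparsec, m
MPC = 1.0e3 * KPC    # megaparsec, m

def strain_estimate(e_kin_ns, distance):
    """h ~ G E_kin^ns / (c^4 r), Eq. (25.112) in SI units (order of magnitude only)."""
    return G * e_kin_ns / (C ** 4 * distance)

e_kin_ns = M_SUN * C ** 2               # E_kin^ns ~ M_sun (about 1.5 km in G = c = 1 units)
hubble_distance = C / (70.0e3 / MPC)    # c / H0, assuming H0 = 70 km/s/Mpc
distances = {
    "galactic_center": 8.5 * KPC,
    "virgo_cluster": 16.5 * MPC,
    "hubble_distance": hubble_distance,
}
strains = {name: strain_estimate(e_kin_ns, d) for name, d in distances.items()}
for name, h in strains.items():
    print(f"{name:16s} h ~ {h:.1e}")
```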


25.4.3

Gravitational waves from a binary star system

A very important application of the quadrupole formalism is to wave emission by a nearly Newtonian binary star system. Denote the stars by indices A and B and their masses by $M_A$ and $M_B$, so their total and reduced masses are (as usual)

$$M = M_A + M_B, \qquad \mu = \frac{M_AM_B}{M}; \tag{25.116}$$

and let the binary’s orbit be circular, for simplicity, with separation $a$ between the stars’ centers of mass. Then Newtonian force balance dictates that the orbital angular velocity $\Omega$ is given by Kepler’s law,

$$\Omega = \sqrt{M/a^3}, \tag{25.117}$$

and the orbits of the two stars are

$$x_A = \frac{M_B}{M}a\cos\Omega t, \quad y_A = \frac{M_B}{M}a\sin\Omega t, \quad x_B = -\frac{M_A}{M}a\cos\Omega t, \quad y_B = -\frac{M_A}{M}a\sin\Omega t. \tag{25.118}$$

The second moment of the mass distribution, Eq. (25.110), is $I_{jk} = M_Ax_A^jx_A^k + M_Bx_B^jx_B^k$. Inserting the stars’ time-dependent positions (25.118), we obtain as the only nonzero components

$$I_{xx} = \mu a^2\cos^2\Omega t, \qquad I_{yy} = \mu a^2\sin^2\Omega t, \qquad I_{xy} = I_{yx} = \mu a^2\cos\Omega t\sin\Omega t. \tag{25.119}$$

We now note that $\cos^2\Omega t = \frac12(1 + \cos2\Omega t)$, $\sin^2\Omega t = \frac12(1 - \cos2\Omega t)$ and $\cos\Omega t\sin\Omega t = \frac12\sin2\Omega t$. We evaluate the double time derivative and use Kepler’s law (25.117) in the form $a^2\Omega^2 = M/a = (M\Omega)^{2/3}$. This gives

$$\ddot I_{xx} = -2\mu(M\Omega)^{2/3}\cos2\Omega t, \qquad \ddot I_{yy} = 2\mu(M\Omega)^{2/3}\cos2\Omega t, \qquad \ddot I_{xy} = \ddot I_{yx} = -2\mu(M\Omega)^{2/3}\sin2\Omega t. \tag{25.120}$$
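Equation (25.120) can be checked by differentiating the second moment numerically. The Python sketch below assumes illustrative values $M_A = 1$, $M_B = 0.5$, $a = 10$ in $G = c = 1$ units. It builds $I_{jk}$ from the positions (25.118), takes a central-difference second time derivative, and compares with the closed form. It also confirms that $\ddot I_{jk}$ is trace free.

```python
import math

M_A, M_B, a = 1.0, 0.5, 10.0            # illustrative masses and separation, G = c = 1
M = M_A + M_B
mu = M_A * M_B / M
Omega = math.sqrt(M / a ** 3)           # Kepler's law, Eq. (25.117)

def second_moment(t):
    """I_jk = M_A x_A^j x_A^k + M_B x_B^j x_B^k in the orbital plane, using Eq. (25.118)."""
    c, s = math.cos(Omega * t), math.sin(Omega * t)
    xa, ya = (M_B / M) * a * c, (M_B / M) * a * s
    xb, yb = -(M_A / M) * a * c, -(M_A / M) * a * s
    return {"xx": M_A * xa * xa + M_B * xb * xb,
            "yy": M_A * ya * ya + M_B * yb * yb,
            "xy": M_A * xa * ya + M_B * xb * yb}

def second_time_derivative(f, t, dt=1e-2):
    """Central-difference d^2/dt^2 applied component by component."""
    plus, mid, minus = f(t + dt), f(t), f(t - dt)
    return {key: (plus[key] - 2.0 * mid[key] + minus[key]) / dt ** 2 for key in mid}

def closed_form(t):
    """Eq. (25.120)."""
    amp = 2.0 * mu * (M * Omega) ** (2.0 / 3.0)
    return {"xx": -amp * math.cos(2 * Omega * t),
            "yy": amp * math.cos(2 * Omega * t),
            "xy": -amp * math.sin(2 * Omega * t)}

t = 37.0
numeric = second_time_derivative(second_moment, t)
exact = closed_form(t)
for key in ("xx", "yy", "xy"):
    print(key, numeric[key], exact[key])
```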

We express this in terms of $\Omega$ rather than $a$ because $\Omega$ is a direct gravitational-wave observable: the waves’ angular frequency is $2\Omega$. To compute the gravitational-wave field (25.109), we must project out the transverse part of this. The projection is best performed in an orthonormal spherical basis. There the transverse part is just the projection into the plane spanned by $\vec e_{\hat\theta}$ and $\vec e_{\hat\phi}$, and the transverse-traceless part just has components

$$(\ddot I_{\hat\theta\hat\theta})^{\rm TT} = -(\ddot I_{\hat\phi\hat\phi})^{\rm TT} = \tfrac12(\ddot I_{\hat\theta\hat\theta} - \ddot I_{\hat\phi\hat\phi}), \qquad (\ddot I_{\hat\theta\hat\phi})^{\rm TT} = \ddot I_{\hat\theta\hat\phi}. \tag{25.121}$$

Now, a little thought will save us much work. We need only compute these quantities at $\phi = 0$ (i.e., in the x-z plane), since the binary’s circular motion guarantees that their dependence on $t$ and $\phi$ must be solely through the quantity $\Omega t - \phi$. At $\phi = 0$, $\vec e_{\hat\theta} = \vec e_x\cos\theta - \vec e_z\sin\theta$ and $\vec e_{\hat\phi} = \vec e_y$. The only nonzero components of the transformation matrix from the Cartesian basis to the transverse part of the spherical basis are therefore $L^x{}_{\hat\theta} = \cos\theta$, $L^z{}_{\hat\theta} = -\sin\theta$, $L^y{}_{\hat\phi} = 1$. Using this transformation matrix, we obtain, at $\phi = 0$, $\ddot I_{\hat\theta\hat\theta} = \ddot I_{xx}\cos^2\theta$, $\ddot I_{\hat\phi\hat\phi} = \ddot I_{yy}$, $\ddot I_{\hat\theta\hat\phi} = \ddot I_{xy}\cos\theta$.

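Inserting these and expressions (25.120) into Eq. (25.121), and setting $\Omega t \to \Omega t - \phi$ to make the formulae valid away from $\phi = 0$, we obtain

$$(\ddot I_{\hat\theta\hat\theta})^{\rm TT} = -(\ddot I_{\hat\phi\hat\phi})^{\rm TT} = -(1 + \cos^2\theta)\,\mu(M\Omega)^{2/3}\cos[2(\Omega t - \phi)],$$
$$(\ddot I_{\hat\theta\hat\phi})^{\rm TT} = +(\ddot I_{\hat\phi\hat\theta})^{\rm TT} = -2\cos\theta\,\mu(M\Omega)^{2/3}\sin[2(\Omega t - \phi)]. \tag{25.122}$$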

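The gravitational-wave field (25.109) is $2/r$ times this quantity evaluated at the retarded time $t - r$. We shall make the conventional choice for the polarization tensors:

$$\mathbf e^+ = \vec e_{\hat\theta}\otimes\vec e_{\hat\theta} - \vec e_{\hat\phi}\otimes\vec e_{\hat\phi}, \qquad \mathbf e^\times = \vec e_{\hat\theta}\otimes\vec e_{\hat\phi} + \vec e_{\hat\phi}\otimes\vec e_{\hat\theta}. \tag{25.123}$$

Then the two scalar gravitational-wave fields are

$$h_+ = h^{\rm TT}_{\hat\theta\hat\theta} = \frac2r\left[\ddot I^{\rm TT}_{\hat\theta\hat\theta}(t-r)\right] = -2(1 + \cos^2\theta)\frac{\mu(M\Omega)^{2/3}}{r}\cos[2(\Omega t - \Omega r - \phi)], \tag{25.124a}$$

$$h_\times = h^{\rm TT}_{\hat\theta\hat\phi} = \frac2r\left[\ddot I^{\rm TT}_{\hat\theta\hat\phi}(t-r)\right] = -4\cos\theta\,\frac{\mu(M\Omega)^{2/3}}{r}\sin[2(\Omega t - \Omega r - \phi)]. \tag{25.124b}$$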

We have expressed the amplitudes of these waves in terms of the dimensionless quantity $(M\Omega)^{2/3} = M/a = v^2$, where $v$ is the relative velocity of the two stars. Notice that, as viewed from the polar axis $\theta = 0$, $h_+$ and $h_\times$ are identical except for a $\pi/2$ phase delay. This means that the net stretch-squeeze ellipse (the combination of those in Figs. 25.1 and 25.2) rotates with angular velocity $\Omega$. This is the gravitational-wave variant of circular polarization, and it arises because the binary motion as viewed from the polar axis looks circular. By contrast, as viewed by an observer in the equatorial plane $\theta = \pi/2$, $h_\times$ vanishes, so the net stretch-squeeze ellipse just oscillates along the + axes and the waves have linear polarization. This is natural, since the orbital motion as viewed by an equatorial observer is just a linear, horizontal, back-and-forth oscillation. Notice also that the gravitational-wave frequency is twice the orbital frequency, i.e.

$$f = 2\,\frac{\Omega}{2\pi} = \frac\Omega\pi. \tag{25.125}$$
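The shortcut used above (evaluate at $\phi = 0$, then let $\Omega t \to \Omega t - \phi$) can be checked by projecting the Cartesian $\ddot I_{jk}$ of Eq. (25.120) directly onto the spherical basis at arbitrary $(\theta, \phi)$. The Python sketch below uses $\vec e_{\hat\theta} = (\cos\theta\cos\phi, \cos\theta\sin\phi, -\sin\theta)$ and $\vec e_{\hat\phi} = (-\sin\phi, \cos\phi, 0)$. It applies Eqs. (25.121) and (25.109) and compares with Eqs. (25.124). It then checks both polarization statements. At $\theta = 0$, $h_+^2 + h_\times^2$ stays constant (circular polarization). At $\theta = \pi/2$, $h_\times$ vanishes (linear polarization). The parameter values are illustrative assumptions.

```python
import math

mu, M, Omega, r = 1.0 / 3.0, 1.5, math.sqrt(1.5 / 1000.0), 1.0e4   # illustrative, G = c = 1

def i_ddot(t):
    """Cartesian second derivative of the second moment, Eq. (25.120)."""
    amp = 2.0 * mu * (M * Omega) ** (2.0 / 3.0)
    c, s = math.cos(2 * Omega * t), math.sin(2 * Omega * t)
    return [[-amp * c, -amp * s, 0.0], [-amp * s, amp * c, 0.0], [0.0, 0.0, 0.0]]

def waveforms_by_projection(t, theta, phi):
    """h_+ and h_x from Eqs. (25.109) and (25.121), projecting onto e_theta and e_phi."""
    e_th = [math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta)]
    e_ph = [-math.sin(phi), math.cos(phi), 0.0]
    I = i_ddot(t - r)                                    # evaluate at retarded time t - r
    def component(u, w):
        return sum(u[j] * I[j][k] * w[k] for j in range(3) for k in range(3))
    tt_thth = 0.5 * (component(e_th, e_th) - component(e_ph, e_ph))
    tt_thph = component(e_th, e_ph)
    return 2.0 * tt_thth / r, 2.0 * tt_thph / r

def waveforms_closed_form(t, theta, phi):
    """Eqs. (25.124a,b)."""
    amp = mu * (M * Omega) ** (2.0 / 3.0) / r
    arg = 2.0 * (Omega * t - Omega * r - phi)
    return (-2.0 * (1.0 + math.cos(theta) ** 2) * amp * math.cos(arg),
            -4.0 * math.cos(theta) * amp * math.sin(arg))

scale = mu * (M * Omega) ** (2.0 / 3.0) / r
projected = waveforms_by_projection(1234.5, 0.7, 1.9)
closed = waveforms_closed_form(1234.5, 0.7, 1.9)

times = [k * 10.0 for k in range(20)]
polar = [waveforms_by_projection(t, 0.0, 0.0) for t in times]
equatorial = [waveforms_by_projection(t, math.pi / 2, 0.0) for t in times]
print(projected, closed)
```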

To compute, via Eqs. (25.113) and (25.114), the rate at which energy and angular momentum are lost from the binary, we need to know the double and triple time derivatives of its quadrupole moment $\mathcal I_{jk}$. The double time derivative $\ddot{\mathcal I}_{jk}$ is just $\ddot I_{jk}$ with its trace removed. But Eq. (25.120) shows that $\ddot I_{jk}$ is already trace free, so $\ddot{\mathcal I}_{jk} = \ddot I_{jk}$. Inserting Eq. (25.120) for this quantity into Eqs. (25.113) and (25.114) and performing the average over a gravitational-wave period, we find that

$$\frac{dM}{dt} = -\frac{32}{5}\frac{\mu^2}{M^2}(M\Omega)^{10/3}, \qquad \frac{dS_z}{dt} = \frac1\Omega\frac{dM}{dt}, \qquad \frac{dS_x}{dt} = \frac{dS_y}{dt} = 0. \tag{25.126}$$
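Equations (25.126) can be checked by evaluating the right sides of Eqs. (25.113) and (25.114) numerically. The Python sketch below uses illustrative parameters in $G = c = 1$ units; the helper names are hypothetical. It works in four steps:

1. Build the trace-free quadrupole moment of the circular binary from the stellar positions (25.118).
2. Take second and third time derivatives by central differences.
3. Average over one gravitational-wave period.
4. Compare $dM/dt$ with the closed form, and check that $\Omega\,dS_z/dt = dM/dt$.

```python
import math

M_A, M_B, a = 1.0, 0.5, 10.0                 # illustrative values, G = c = 1
M = M_A + M_B
mu = M_A * M_B / M
Omega = math.sqrt(M / a ** 3)                # Kepler's law, Eq. (25.117)

def quadrupole(t):
    """Trace-free quadrupole moment of the binary, Eq. (25.105) with positions (25.118)."""
    c, s = math.cos(Omega * t), math.sin(Omega * t)
    stars = [(M_A, ((M_B / M) * a * c, (M_B / M) * a * s, 0.0)),
             (M_B, (-(M_A / M) * a * c, -(M_A / M) * a * s, 0.0))]
    I = [[0.0] * 3 for _ in range(3)]
    for m, p in stars:
        r2 = sum(x * x for x in p)
        for j in range(3):
            for k in range(3):
                I[j][k] += m * (p[j] * p[k] - (r2 / 3.0 if j == k else 0.0))
    return I

# Central-difference stencils: {offset in units of dt: weight}, and the power of dt to divide by
STENCILS = {2: ({-1: 1.0, 0: -2.0, 1: 1.0}, 2),
            3: ({-2: -1.0, -1: 2.0, 1: -2.0, 2: 1.0}, 3)}

def time_derivative(order, t, dt=0.05):
    weights, power = STENCILS[order]
    norm = dt ** power * (2.0 if order == 3 else 1.0)
    values = {s: quadrupole(t + s * dt) for s in weights}
    return [[sum(w * values[s][j][k] for s, w in weights.items()) / norm
             for k in range(3)] for j in range(3)]

def levi_civita(i, j, k):
    return (i - j) * (j - k) * (k - i) / 2

samples = 64
wave_period = math.pi / Omega                # half the orbital period
dM_dt = dSz_dt = 0.0
for n in range(samples):
    t = n * wave_period / samples
    I2, I3 = time_derivative(2, t), time_derivative(3, t)
    dM_dt += -0.2 * sum(I3[j][k] ** 2 for j in range(3) for k in range(3)) / samples   # Eq. (25.113)
    dSz_dt += -0.4 * sum(levi_civita(2, j, k) * I2[j][m] * I3[k][m]                    # Eq. (25.114), i = z
                         for j in range(3) for k in range(3) for m in range(3)) / samples

predicted = -(32.0 / 5.0) * (mu / M) ** 2 * (M * Omega) ** (10.0 / 3.0)               # Eq. (25.126)
print(dM_dt, predicted, dSz_dt * Omega)
```

Both losses are negative, as they must be: the binary gives up energy and angular momentum to the waves.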

This loss of energy and angular momentum causes the binary to spiral inward, decreasing the stars’ separation $a$ and increasing the orbital angular velocity $\Omega$. Compare Eqs. (25.126) with the standard equations for the binary’s orbital energy and angular momentum:

$$M - (\text{sum of rest masses of stars}) = E = -\tfrac12\mu M/a = -\tfrac12\mu(M\Omega)^{2/3}, \qquad S_z = \mu a^2\Omega = \mu(M\Omega)^{2/3}/\Omega.$$

This gives an equation for $d\Omega/dt$, namely $d\Omega/dt = \frac{96}{5}\mu M^{2/3}\Omega^{11/3}$, which we can integrate to give

$$\Omega = \pi f = \left(\frac{5}{256}\,\frac{1}{\mu M^{2/3}}\,\frac{1}{t_o - t}\right)^{3/8}. \tag{25.127}$$
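Equation (25.127) and its inverse, $t_o - t = 5/(256\,\mu M^{2/3}\Omega^{8/3})$, are easy to check numerically. The Python sketch below integrates $d\Omega/dt = \frac{96}{5}\mu M^{2/3}\Omega^{11/3}$ with a classical fourth-order Runge–Kutta step and compares the result with the closed form. It then converts to SI units for an assumed pair of $1.4\,M_\odot$ neutron stars observed at $f = 10$ Hz. In this Newtonian, point-mass approximation the time to merger comes out at roughly 17 minutes. All numerical inputs are illustrative assumptions.

```python
import math

def omega_dot(omega, mu, M):
    """dOmega/dt = (96/5) mu M^(2/3) Omega^(11/3), G = c = 1 (energy balance with Eq. 25.126)."""
    return (96.0 / 5.0) * mu * M ** (2.0 / 3.0) * omega ** (11.0 / 3.0)

def omega_closed_form(t, t_o, mu, M):
    """Eq. (25.127)."""
    return (5.0 / (256.0 * mu * M ** (2.0 / 3.0) * (t_o - t))) ** (3.0 / 8.0)

def time_to_merger(omega, mu, M):
    """Eq. (25.127) inverted: t_o - t = 5 / (256 mu M^(2/3) Omega^(8/3))."""
    return 5.0 / (256.0 * mu * M ** (2.0 / 3.0) * omega ** (8.0 / 3.0))

def rk4_inspiral(omega0, mu, M, duration, steps):
    """Integrate dOmega/dt with classical fourth-order Runge-Kutta steps."""
    omega, dt = omega0, duration / steps
    for _ in range(steps):
        k1 = omega_dot(omega, mu, M)
        k2 = omega_dot(omega + 0.5 * dt * k1, mu, M)
        k3 = omega_dot(omega + 0.5 * dt * k2, mu, M)
        k4 = omega_dot(omega + dt * k3, mu, M)
        omega += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return omega

# Geometrized units: start at Omega0 (at t = 0) and integrate through half the remaining time.
mu, M, omega0 = 1.0 / 3.0, 1.5, 0.01
t_o = time_to_merger(omega0, mu, M)                      # merger time, with t = 0 at the start
omega_numeric = rk4_inspiral(omega0, mu, M, 0.5 * t_o, 10000)
omega_exact = omega_closed_form(0.5 * t_o, t_o, mu, M)   # equals 2**(3/8) * omega0

# SI example: two 1.4 solar-mass stars at gravitational-wave frequency f = 10 Hz (assumed values).
G, C, M_SUN = 6.674e-11, 2.998e8, 1.989e30
SOLAR_MASS_IN_SECONDS = G * M_SUN / C ** 3
m1 = m2 = 1.4
M_sec = (m1 + m2) * SOLAR_MASS_IN_SECONDS
mu_sec = m1 * m2 / (m1 + m2) * SOLAR_MASS_IN_SECONDS
seconds_left = time_to_merger(math.pi * 10.0, mu_sec, M_sec)   # Omega = pi f, Eq. (25.125)
print(omega_numeric, omega_exact, seconds_left / 60.0)
```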

Here $t_o$ (an integration constant) is the time at which the two stars merge, so $t_o - t$ is the time remaining until merger. This treats the stars as point masses whose surfaces do not collide sooner. The equation can be inverted to read off the time until merger as a function of gravitational-wave frequency. These results for a binary’s waves and radiation-reaction-induced inspiral are of great importance for gravitational-wave detection; see, e.g., Cutler and Thorne (2002). As the stars spiral inward, $(M\Omega)^{2/3} = M/a = v^2$ grows larger, $h_+$ and $h_\times$ grow larger, and relativistic corrections to our Newtonian, quadrupole analysis grow larger. Those relativistic corrections (including current-quadrupole waves, mass-octupole waves, etc.) can be computed using a post-Newtonian expansion of the Einstein field equations, i.e. an expansion in $M/a \sim v^2$. The expected accuracies of the LIGO/VIRGO network require that, for neutron-star binaries, the expansion be carried to order $v^6$ beyond our Newtonian, quadrupole analysis! At the end of the inspiral, the binary’s stars (or black holes) come crashing together. To compute the waves from this final merger, with an accuracy comparable to the expected observations, it is necessary to solve the Einstein field equation on a computer. The techniques for this are called numerical relativity. Numerical relativity is currently in its infancy, but has great promise for producing new insights into general relativity.

****************************
EXERCISES

Exercise 25.7 Example: Quadrupolar wave generation in linearized theory
Derive the quadrupolar wave-generation formula (25.111) for a slow-motion, weak-gravity source in linearized theory, in Lorenz gauge, beginning with the retarded-integral formula

$$\bar h_{\mu\nu}(t, \mathbf x) = \int\frac{4T_{\mu\nu}(t - |\mathbf x - \mathbf x'|, \mathbf x')}{|\mathbf x - \mathbf x'|}\,dV_{x'} \tag{25.128}$$

[Eq. (23.107)].
Your derivation might proceed as follows:
(a) Show that for a slow-motion source, the retarded integral gives for the $1/r \equiv 1/|\mathbf x|$ (radiative) part of $\bar h_{jk}$

$$\bar h_{jk}(t, \mathbf x) = \frac4r\int T_{jk}(t - r, \mathbf x')\,dV_{x'}. \tag{25.129}$$

(b) Show that in linearized theory in Lorenz gauge, the linearized Einstein equations $-\bar h_{\mu\nu,\alpha}{}^\alpha = 16\pi T_{\mu\nu}$ [Eq. (23.106)] and the Lorenz gauge condition $\bar h_{\mu\nu,}{}^\nu = 0$ [Eq. (23.105)] together imply that the stress-energy tensor that generates the waves must have vanishing coordinate divergence, $T^{\mu\nu}{}_{,\nu} = 0$. This means that linearized theory is ignorant of the influence of self gravity on the gravitating $T^{\mu\nu}$!
(c) Show that this vanishing divergence implies $[T^{00}x^jx^k]_{,00} = [T^{lm}x^jx^k]_{,ml} - 2[T^{lj}x^k + T^{lk}x^j]_{,l} + 2T^{jk}$.
(d) By combining the results of (a) and (c), deduce that

$$\bar h_{jk}(t, \mathbf x) = \frac2r\frac{d^2I_{jk}(t-r)}{dt^2}, \tag{25.130}$$

where $I_{jk}$ is the second moment of the source’s (Newtonian) mass-energy distribution $T^{00} = \rho$ [Eq. (25.110)].
(e) Noticing that the trace-reversed metric perturbation (25.130) has the “speed-of-light-propagation” form, deduce that the gravitational-wave field $h^{\rm TT}_{jk}$ can be computed from (25.130) by a transverse-traceless projection, Eq. (25.96).
Comment: Part (b) shows that this linearized-theory analysis is incapable of deducing the gravitational waves emitted by a source whose dynamics is controlled by its self gravity, e.g., a nearly Newtonian binary star system. By contrast, the derivation of the quadrupole formula given in Sec. 25.4.2 is valid for any slow-motion source, regardless of the strength and roles of its internal gravity; see the discussion following Eq. (25.109).

Exercise 25.8 Problem: Energy carried by gravitational waves
Compute the net rate at which the quadrupolar waves (25.109) carry energy away from their source, by carrying out the surface integral (23.114) with $T^{0j}$ being Isaacson’s gravitational-wave energy flux (25.58). Your answer should be Eq. (25.113). [Hint: perform the TT projection in Cartesian coordinates using the projection tensor (25.94), and make use of the following integrals over solid angle on the unit sphere:

$$\frac1{4\pi}\int n_i\,d\Omega = 0, \quad \frac1{4\pi}\int n_in_j\,d\Omega = \frac13\delta_{ij}, \quad \frac1{4\pi}\int n_in_jn_k\,d\Omega = 0,$$
$$\frac1{4\pi}\int n_in_jn_kn_l\,d\Omega = \frac1{15}(\delta_{ij}\delta_{kl} + \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}). \tag{25.131}$$

These relations should be obvious by symmetry, aside from the numerical factors out in front. Those factors are most easily deduced by computing the z components, i.e., by setting $i = j = k = l = z$ and using $n_z = \cos\theta$.]

Exercise 25.9 Problem: Energy removed by gravitational radiation reaction
Burke’s radiation-reaction potential (25.115) produces a force per unit volume $-\rho\nabla\Phi^{\rm react}$ on its nearly Newtonian source. If we multiply this force per unit volume by the velocity

$\mathbf v = d\mathbf x/dt$ of the source’s material, we obtain thereby a rate of change of energy per unit volume. Correspondingly, the net rate of change of the system’s mass-energy must be

$$\frac{dM}{dt} = -\int\rho\,\mathbf v\cdot\nabla\Phi^{\rm react}\,dV_x. \tag{25.132}$$

Show that, when averaged over a few gravitational-wave periods, this formula agrees with the rate of change of mass (25.113) that we derived by integrating the outgoing waves’ energy flux.

Exercise 25.10 Problem: Propagation of waves through an expanding universe
As we shall see in Chap. 27, the following line element is a possible model for the large-scale structure of our universe:

$$ds^2 = b^2[-d\eta^2 + d\chi^2 + \chi^2(d\theta^2 + \sin^2\theta\,d\phi^2)], \qquad \text{where } b = b_o\eta^2 \tag{25.133}$$

and $b_o$ is a constant with dimensions of length. This is an expanding universe with flat spatial slices $\eta$ = constant. Notice that the proper time measured by observers at rest in the spatial coordinate system is $t = \int b_o\eta^2\,d\eta = (b_o/3)\eta^3$. A nearly Newtonian, circular binary is at rest at $\chi = 0$ in an epoch when $\eta \sim \eta_o$. The coordinates of the binary’s local asymptotic rest frame are $(t, r, \theta, \phi)$, where $r = b\chi$ and the coordinates cover only a tiny region of the universe, $\chi \lesssim \chi_o \ll \eta_o$. The gravitational waves in this local asymptotic rest frame are described by Eqs. (25.123) and (25.124); see also Sec. 25.3.5. Use geometric optics (Sec. 25.3.6) to propagate these waves out through the expanding universe. In particular:
(a) Show that the null rays are the curves of constant $\theta$, $\phi$, and $\eta - \chi$.
(b) Show that the orthonormal basis vectors $\vec e_{\hat\theta}$, $\vec e_{\hat\phi}$ associated with the $(\eta, \chi, \theta, \phi)$ coordinates are parallel transported along the rays. (This should be fairly obvious from symmetry.)
(c) Show that the wave fields have the form (25.124) with $t - r$ replaced by the retarded time $\tau_r = \frac13b_o(\eta - \chi)^3$, and with $1/r$ being some function of $\chi$ and $\eta$ (what is that function?).

Exercise 25.11 Problem: Gravitational waves emitted by a linear oscillator
Consider a mass $m$ attached to a spring so it oscillates along the z axis of a Cartesian coordinate system, moving along the world line $z = a\cos\Omega t$, $y = x = 0$. Use the quadrupole moment formalism to compute the gravitational waves $h_+(t, r, \theta, \phi)$ and $h_\times(t, r, \theta, \phi)$ emitted by this oscillator, with the polarization tensors chosen as in Eqs. (25.123). Pattern your analysis after the computation of waves from a binary in Sec. 25.4.3.

Exercise 25.12 Problem: Gravitational waves from waving arms
Wave your arms rapidly, and thereby try to generate gravitational waves.
(a) Compute in order of magnitude, using classical general relativity, the wavelength of the waves you generate and their dimensionless amplitude at a distance of one wavelength away from you.

(b) How many gravitons do you produce per second? Discuss the implications of your result.
****************************

25.5 The Detection of Gravitational Waves

Physicists and astronomers are searching for gravitational waves in four different frequency bands using four different techniques:
• In the extremely low frequency (ELF) band, $\sim 10^{-18}$ to $\sim 10^{-15}$ Hz, gravitational waves are sought via their imprint on the polarization of the cosmic microwave background (CMB) radiation. There is only one expected ELF source of gravitational waves, but it is a very interesting one: quantum fluctuations in the gravitational field (spacetime curvature) that emerge from the big bang’s quantum-gravity regime, the Planck era, and that are subsequently amplified to classical, detectable sizes by the universe’s early inflationary expansion. We shall study this amplification and the resulting ELF gravitational waves in Chap. 27 and shall see these waves’ great potential for probing the physics of inflation.
• In the very low frequency (VLF) band, $\sim 10^{-9}$ to $\sim 10^{-7}$ Hz, gravitational waves are sought via their influence on the propagation of radio waves emitted by pulsars (spinning neutron stars), and via the resulting fluctuations in the arrival times of the pulsars’ radio-wave pulses at earth. The expected VLF sources are violent processes in the first fraction of a second of the universe’s life (Chap. 27), and the orbital motion of extremely massive pairs of black holes in the distant universe.
• In the low frequency (LF) band, $\sim 10^{-4}$ to $\sim 0.1$ Hz, gravitational waves are currently sought via their influence on the radio signals by which NASA tracks interplanetary spacecraft. In ∼2012 this technique will be supplanted by LISA, the Laser Interferometer Space Antenna: three “drag-free” spacecraft in a triangular configuration with 5-million-kilometer-long arms, which track each other via laser beams.
LISA is likely to see waves from massive black-hole binaries (hole masses $\sim 10^5$ to $10^7 M_\odot$) out to cosmological distances; from small holes, neutron stars, and white dwarfs spiraling into massive black holes out to cosmological distances; from the orbital motion of white-dwarf binaries, neutron-star binaries, and stellar-mass black-hole binaries in our own galaxy; and possibly from violent processes in the very early universe.
• The high frequency (HF) band, $\sim 10$ to $\sim 10^3$ Hz, is where earth-based detectors operate: laser interferometer gravitational wave detectors such as LIGO, and resonant-mass detectors in which a gravitational wave alters the amplitude and phase of vibrations of a normal mode of a large, cylindrical bar. These detectors are likely to see waves from spinning, slightly deformed neutron stars in our own galaxy, and from a variety of sources in the distant universe: the final inspiral and collisions of binaries made from neutron stars and/or stellar-mass black holes (up to hundreds of solar masses); the tearing apart of a neutron star by the spacetime curvature of a companion black hole; supernovae and the triggers of gamma-ray bursts; and possibly waves from violent processes in the very early universe.

For detailed discussions of these gravitational-wave sources in all four frequency bands, and of prospects for their detection, see, e.g., Cutler and Thorne (2002) and references therein. It is likely that waves will be seen in all four bands within the next 20 years, and the first detection is likely to occur in the HF band using gravitational-wave interferometers such as LIGO.

Fig. 25.3: An idealized gravitational-wave interferometer (laser, beam splitter at the origin, end mirrors at distance L along the x and y axes, photodetector).

We briefly discussed such interferometers in Sec. 8.5, focusing on optical interferometry issues. In this chapter we shall analyze the interaction of a gravitational wave with such an interferometer. That analysis will not only teach us much about gravitational waves, but will also illustrate some central issues in the physical interpretation of general relativity theory. To get quickly to the essentials, we shall examine initially a rather idealized detector: a Michelson interferometer (one without the input mirrors of Fig. 8.11) that floats freely in space, so there is no need to hang its mirrors by wires; see Fig. 25.3. At the end of this chapter, we shall briefly discuss more realistic interferometers and their analysis.

We shall use linearized theory to analyze the interaction of our idealized interferometer with a gravitational wave. We shall perform our analysis twice, using two different coordinate systems (two different gauges). Our two analyses will predict the same results for the interferometer output, but they will appear to attribute those results to two different mechanisms. In our first analysis (performed in TT gauge; Sec. 25.5.1) the interferometer’s test masses will remain always at rest in our chosen coordinate system, and the gravitational waves $h_+(t - z)$ will interact with the interferometer’s light. The imprint that $h_+(t - z)$ leaves on the light will cause a fluctuating light intensity $I_{\rm out}(t) \propto h_+(t)$ to emerge from the interferometer’s output port and be measured by the photodetector. In our second analysis (performed in the proper reference frame of the interferometer’s beam splitter; Sec. 25.5.2) the gravitational waves will interact hardly at all with the light. Instead, they will push the end mirrors back and forth relative to the coordinate system, thereby lengthening one arm while shortening the other. These changing arm lengths will cause a changing interference of the light returning to the beam splitter from the two arms, and that changing interference will produce the fluctuating light intensity $I_{\rm out}(t) \propto h_+(t)$ measured by the photodetector.

These differences of viewpoint are somewhat like the differences between the Heisenberg Picture and the Schrödinger Picture in quantum mechanics. The intuitive pictures associated with the two viewpoints appear to be very different (Schrödinger’s wave function vs. Heisenberg’s matrices; gravitational waves interacting with light vs. gravitational waves pushing on mirrors). But whenever one computes the same physical observable from the two different viewpoints (probability for a quantum measurement outcome; light intensity measured by photodetector), the two viewpoints give the same answer.

25.5.1 Interferometer analyzed in TT gauge

For our first analysis, we place the interferometer at rest in the x-y plane of a TT coordinate system, with its arms along the x and y axes and its beam splitter at the origin as shown in Fig. 25.3. For simplicity, we assume that the gravitational wave propagates in the z direction and has + polarization, so the linearized spacetime metric has the TT-gauge form
$$ds^2 = -dt^2 + \left[1 + h_+(t - z)\right]dx^2 + \left[1 - h_+(t - z)\right]dy^2 + dz^2 \tag{25.134}$$

[Eq. (25.89)]. For ease of notation, we shall omit the subscript + from $h_+$ in the remainder of this section. The beam splitter and end mirrors move freely and thus travel along geodesics of the metric (25.134). The splitter and mirrors are at rest in the TT coordinate system before the wave arrives, so initially the spatial components of their 4-velocities vanish, $u^j = 0$. Because the metric coefficients $g_{\alpha\beta}$ are all independent of $x$ and $y$, the geodesic equation dictates that the covariant components $u_x$ and $u_y$ are conserved and thus remain zero as the wave passes, which implies (since the metric is diagonal) $u^x = dx/d\tau = 0$ and $u^y = dy/d\tau = 0$. One can also show (see Ex. 25.13) that $u^z = dz/d\tau = 0$ throughout the wave’s passage. Thus, in terms of motion relative to the TT coordinate system, the gravitational wave has no influence at all on the beam splitter and mirrors; they all remain at rest (constant $x$, $y$ and $z$) as the waves pass. (Despite this lack of motion, the proper distances between the mirrors and the beam splitter, i.e. the interferometer’s physically measured arm lengths, do change. If the unchanging coordinate lengths of the two arms are $\Delta x = \ell_x$ and $\Delta y = \ell_y$, then the metric (25.134) says that the physically measured arm lengths are
$$L_x = \left[1 + \tfrac12 h(t)\right]\ell_x, \qquad L_y = \left[1 - \tfrac12 h(t)\right]\ell_y . \tag{25.135}$$

When $h$ is positive, the x arm is lengthened and the y arm is shortened; when negative, $L_x$ is shortened and $L_y$ is lengthened.)

Turn, next, to the propagation of light in the interferometer. We assume, for simplicity, that the light beams have large enough transverse sizes that we can idealize them, on their optic axes, as plane electromagnetic waves. (In reality, they will be Gaussian beams, of the sort studied in Sec. 7.5.5.) The light’s vector potential satisfies the curved-spacetime vacuum wave equation $A^{\alpha;\mu}{}_{\mu} = 0$ [Eq. (23.71) with vanishing Ricci tensor]. We write the vector potential in geometric-optics (eikonal-approximation) form as
$$A^\alpha = \Re\left(\mathcal A^\alpha e^{i\phi}\right), \tag{25.136}$$

where $\mathcal A^\alpha$ is a slowly varying amplitude and $\phi$ is a rapidly varying phase; cf. Eq. (6.20). Because the wavefronts are (nearly) planar and the spacetime metric is nearly flat, the light’s amplitude $\mathcal A^\alpha$ will be very nearly constant as it propagates down the arms, and we can ignore its variations. Not so the phase. It oscillates at the laser frequency, $\omega_o \sim 3\times10^{14}$ Hz; i.e., $\phi^{\rm out}_{x\,\rm arm} \simeq \omega_o(x - t)$ for light propagating outward from the beam splitter along the x arm, and similarly for the returning light and the light in the y arm. The gravitational wave imprints onto the phase tiny deviations from this $\omega_o(x - t)$; we must compute those imprints. In the spirit of geometric optics, we introduce the light’s spacetime wave vector
$$\vec k \equiv \vec\nabla\phi , \tag{25.137}$$

and we assume that $\vec k$ varies extremely slowly compared to the variations of $\phi$. Then the wave equation $A^{\alpha;\mu}{}_{\mu} = 0$ reduces to the statement that the wave vector is null, $\vec k\cdot\vec k = \phi_{,\alpha}\phi_{,\beta}\,g^{\alpha\beta} = 0$. For light in the x arm the phase depends only on $x$ and $t$; for that in the y arm it depends only on $y$ and $t$. Combining this with the TT metric (25.134) and noting that the interferometer lies in the $z = 0$ plane, we obtain
$$-\left(\frac{\partial\phi_{x\,\rm arm}}{\partial t}\right)^2 + \left[1 - h(t)\right]\left(\frac{\partial\phi_{x\,\rm arm}}{\partial x}\right)^2 = 0, \qquad -\left(\frac{\partial\phi_{y\,\rm arm}}{\partial t}\right)^2 + \left[1 + h(t)\right]\left(\frac{\partial\phi_{y\,\rm arm}}{\partial y}\right)^2 = 0. \tag{25.138}$$
We idealize the laser as perfectly monochromatic and we place it at rest in our TT coordinates, arbitrarily close to the beam splitter. Then the outgoing light frequency, as measured by the beam splitter, must be precisely $\omega_o$ and cannot vary with time. Since proper time, as measured by the beam splitter, is equal to coordinate time $t$ [cf. the metric (25.134)], the frequency that the laser and beam splitter measure must be $\omega = -\partial\phi/\partial t = -k_t$. This dictates the following boundary conditions (initial conditions) on the phase of the light that travels outward from the beam splitter:
$$\frac{\partial\phi^{\rm out}_{x\,\rm arm}}{\partial t} = -\omega_o \ \text{ at } x = 0, \qquad \frac{\partial\phi^{\rm out}_{y\,\rm arm}}{\partial t} = -\omega_o \ \text{ at } y = 0. \tag{25.139}$$
It is straightforward to verify that the solutions to Eq. (25.138) [and thence to the wave equation and thence to Maxwell’s equations] that satisfy the boundary conditions (25.139) are
$$\phi^{\rm out}_{x\,\rm arm} = -\omega_o\left[t - x + \tfrac12 H(t - x) - \tfrac12 H(t)\right], \qquad \phi^{\rm out}_{y\,\rm arm} = -\omega_o\left[t - y - \tfrac12 H(t - y) + \tfrac12 H(t)\right], \tag{25.140}$$
where $H(t)$ is the first time integral of the gravitational waveform,
$$H(t) \equiv \int_0^t h(t')\,dt' ; \tag{25.141}$$

cf. Ex. 25.14. The outgoing light reflects off the mirrors, which are at rest in the TT coordinates at locations $x = \ell_x$ and $y = \ell_y$. As measured by observers at rest in these coordinates, there is no Doppler shift of the light because the mirrors are not moving. Correspondingly, the phases of the reflected light, returning back along the two arms, have the following forms:
$$\phi^{\rm back}_{x\,\rm arm} = -\omega_o\left[t + x - 2\ell_x + \tfrac12 H(t + x - 2\ell_x) - \tfrac12 H(t)\right], \qquad \phi^{\rm back}_{y\,\rm arm} = -\omega_o\left[t + y - 2\ell_y - \tfrac12 H(t + y - 2\ell_y) + \tfrac12 H(t)\right]. \tag{25.142}$$
The difference of the phases of the returning light, at the beam splitter ($x = y = 0$), is
$$\begin{aligned} \Delta\phi \equiv \phi^{\rm back}_{x\,\rm arm} - \phi^{\rm back}_{y\,\rm arm} &= -\omega_o\left[-2(\ell_x - \ell_y) + \tfrac12 H(t - 2\ell_x) + \tfrac12 H(t - 2\ell_y) - H(t)\right] \\ &\simeq +2\omega_o\left[\ell_x - \ell_y + \ell h(t)\right] \quad \text{for earth-based interferometers.} \end{aligned} \tag{25.143}$$
In the second line we have used the fact that, for earth-based interferometers operating in the high-frequency band, the gravitational wavelength $\lambda_{\rm GW} \sim c/(100\,{\rm Hz}) \sim 3000$ km is long compared to the interferometers’ ∼4 km arms, so $H(t - 2\ell) \simeq H(t) - 2\ell h(t)$, and that the arms have nearly the same length, $\ell_y \simeq \ell_x \equiv \ell$.

The beam splitter sends a light field $\propto e^{i\phi^{\rm back}_{x\,\rm arm}} + e^{i\phi^{\rm back}_{y\,\rm arm}}$ back toward the laser, and a field $\propto e^{i\phi^{\rm back}_{x\,\rm arm}} - e^{i\phi^{\rm back}_{y\,\rm arm}} = e^{i\phi^{\rm back}_{y\,\rm arm}}\left(e^{i\Delta\phi} - 1\right)$ toward the photodetector. The intensity of the light entering the photodetector is proportional to the squared amplitude of the field, $I_{\rm PD} \propto |e^{i\Delta\phi} - 1|^2$. We adjust the interferometer’s arm lengths so their difference $\ell_x - \ell_y$ is small compared to the light’s reduced wavelength $\bar\lambda = 1/\omega_o$ ($= c/\omega_o$ in conventional units) but large compared to $|\ell h(t)|$. Correspondingly, $|\Delta\phi| \ll 1$, so only a tiny fraction of the light goes toward the photodetector (it is the interferometer’s “dark port”), and that dark-port light intensity is
$$I_{\rm PD} \propto |e^{i\Delta\phi} - 1|^2 \simeq |\Delta\phi|^2 \simeq 4\omega_o^2(\ell_x - \ell_y)^2 + 8\omega_o^2(\ell_x - \ell_y)\,\ell h(t). \tag{25.144}$$

The time varying part of this intensity is proportional to the gravitational waveform h(t), and it is this time varying part that the photodetector reports as the interferometer output.
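As a numerical sanity check on the long-wavelength step in Eq. (25.143), the following Python sketch evaluates the exact phase difference (first line of that equation) and its approximation (second line) for a hypothetical sinusoidal waveform $h(t) = h_0\sin\Omega t$, for which $H(t) = h_0(1 - \cos\Omega t)/\Omega$. The laser angular frequency ($1.8\times10^{15}$ rad/s, i.e. roughly 1.06 μm light), the 4 km arms, the 1 nm static offset $\ell_x - \ell_y$, $h_0 = 10^{-21}$ and the 100 Hz wave frequency are illustrative assumptions, not values taken from the text, and the helper names are hypothetical. It works in units with $c = 1$, so lengths are expressed as light-travel times. Under these assumptions the mismatch should be of order $\ell\Omega$, the retardation effect that the second line of (25.143) drops, and the same relative mismatch should show up in the time-varying part of the dark-port intensity (25.144).

```python
import math

# Units with c = 1: every length below is converted to light-travel time in seconds.
# All numerical values are illustrative assumptions, not taken from the text.
C = 2.998e8                    # speed of light (m/s), used only for unit conversion
OMEGA_O = 1.8e15               # assumed laser angular frequency (rad/s), ~1.06 micron light
ELL = 4.0e3 / C                # arm length ell = 4 km
DELTA_ELL = 1.0e-9 / C         # assumed static offset ell_x - ell_y = 1 nm
H0 = 1.0e-21                   # assumed wave amplitude h_0
OMEGA_GW = 2 * math.pi * 100   # assumed 100 Hz wave


def h(t):
    """Hypothetical + polarized waveform h(t) = h_0 sin(Omega t)."""
    return H0 * math.sin(OMEGA_GW * t)


def H(t):
    """First time integral of h, Eq. (25.141), evaluated analytically."""
    return H0 * (1.0 - math.cos(OMEGA_GW * t)) / OMEGA_GW


def delta_phi_exact(t, ell_x, ell_y):
    """First line of Eq. (25.143): phase difference at the beam splitter."""
    return -OMEGA_O * (-2 * (ell_x - ell_y)
                       + 0.5 * H(t - 2 * ell_x) + 0.5 * H(t - 2 * ell_y) - H(t))


def delta_phi_approx(t, ell_x, ell_y):
    """Second line of Eq. (25.143), valid for arms much shorter than the GW wavelength."""
    ell = 0.5 * (ell_x + ell_y)
    return 2 * OMEGA_O * (ell_x - ell_y + ell * h(t))


def dark_port(dphi):
    """|exp(i dphi) - 1|^2, written as 4 sin^2(dphi/2) to avoid round-off."""
    return 4.0 * math.sin(0.5 * dphi) ** 2


ell_x, ell_y = ELL + 0.5 * DELTA_ELL, ELL - 0.5 * DELTA_ELL
times = [k * 1.0e-5 for k in range(2001)]          # two wave periods

signal_amp = 2 * OMEGA_O * ELL * H0                 # size of the h-dependent phase
phase_err = max(abs(delta_phi_exact(t, ell_x, ell_y) - delta_phi_approx(t, ell_x, ell_y))
                for t in times) / signal_amp

# Time-varying part of Eq. (25.144): remove the static (h = 0) dark-port intensity.
static = dark_port(2 * OMEGA_O * (ell_x - ell_y))
predicted = [8 * OMEGA_O**2 * (ell_x - ell_y) * ELL * h(t) for t in times]
varying = [dark_port(delta_phi_exact(t, ell_x, ell_y)) - static for t in times]
intensity_err = (max(abs(v - p) for v, p in zip(varying, predicted))
                 / max(abs(p) for p in predicted))

print(f"phase mismatch / signal     = {phase_err:.2e}")
print(f"intensity mismatch / signal = {intensity_err:.2e}")
print(f"ell * Omega_GW              = {ELL * OMEGA_GW:.2e}")
```

For these assumed values both mismatches come out close to $\ell\Omega \approx 8\times10^{-3}$, consistent with treating the second line of (25.143) as a leading-order result in $\ell/\bar\lambda_{\rm GW}$.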


25.5.2 Interferometer analyzed in proper reference frame of beam splitter

We shall now reanalyze our idealized interferometer in the proper reference frame of its beam splitter, denoting that frame’s coordinates by $x^{\hat\alpha}$. Because the beam splitter is freely falling (moving along a geodesic through the gravitational-wave spacetime), its proper reference frame is locally Lorentz (“LL”), and its metric coefficients have the form $g_{\hat\alpha\hat\beta} = \eta_{\alpha\beta} + O(\delta_{jk}x^{\hat j}x^{\hat k}/\mathcal R^2)$ [Eq. (23.15)]. Here $\mathcal R$ is the radius of curvature of spacetime, and $1/\mathcal R^2$ is of order the components of the Riemann tensor, which have magnitude $\ddot h(\hat t - \hat z)$ [Eq. (25.41), with $t$ and $z$ equal to $\hat t$ and $\hat z$ aside from fractional corrections of order $h$]. Thus,
$$g_{\hat\alpha\hat\beta} = \eta_{\alpha\beta} + O\left[\ddot h(\hat t - \hat z)\,\delta_{jk}x^{\hat j}x^{\hat k}\right]. \tag{25.145}$$
The following coordinate transformation takes us from the TT coordinates $x^\alpha$ used in the previous section to the beam splitter’s LL coordinates:
$$x = \left[1 - \tfrac12 h(\hat t - \hat z)\right]\hat x, \quad y = \left[1 + \tfrac12 h(\hat t - \hat z)\right]\hat y, \quad t = \hat t - \tfrac14\dot h(\hat t - \hat z)\left(\hat x^2 - \hat y^2\right), \quad z = \hat z - \tfrac14\dot h(\hat t - \hat z)\left(\hat x^2 - \hat y^2\right). \tag{25.146}$$
It is straightforward to insert this coordinate transformation into the TT-gauge metric (25.134) and thereby obtain, to linear order in $h$,
$$ds^2 = -d\hat t^2 + d\hat x^2 + d\hat y^2 + d\hat z^2 + \tfrac12\left(\hat x^2 - \hat y^2\right)\ddot h(\hat t - \hat z)\left(d\hat t - d\hat z\right)^2 . \tag{25.147}$$

This has the expected LL form (25.145) and, remarkably, it turns out not only to be a solution of the vacuum Einstein equations in linearized theory but also an exact solution to the full vacuum Einstein equations [cf. Ex. 35.8 of MTW]. Throughout our idealized interferometer, the magnitude of the metric perturbation in these LL coordinates is $|h_{\hat\alpha\hat\beta}| \lesssim (\ell/\bar\lambda_{\rm GW})^2 h$, where $\bar\lambda_{\rm GW} = \lambda_{\rm GW}/2\pi$ is the waves’ reduced wavelength and $h$ is the magnitude of $h(\hat t - \hat z)$. For earth-based interferometers operating in the HF band (∼10 to ∼1000 Hz), $\bar\lambda_{\rm GW}$ is of order 50 to 5000 km, and the arm lengths are $\ell \le 4$ km, so $(\ell/\bar\lambda_{\rm GW})^2 \lesssim 10^{-2}$ to $10^{-6}$. Thus, the metric coefficients $h_{\hat\alpha\hat\beta}$ are no larger than $h/100$. This has a valuable consequence for the analysis of the interferometer: up to fractional accuracy $\sim(\ell/\bar\lambda_{\rm GW})^2 h \lesssim h/100$, the LL coordinates are globally Lorentz throughout the interferometer; i.e., $\hat t$ measures proper time, and $x^{\hat j}$ are Cartesian and measure proper distance. In the rest of this section, we shall restrict attention to such earth-based interferometers, but shall continue to idealize them as freely falling.

The beam splitter, being initially at rest at the origin of these LL coordinates, remains always at rest, but the mirrors move. Not surprisingly, the geodesic equation for the mirrors in the metric (25.147) dictates that their coordinate positions are, up to fractional errors of order $(\ell/\bar\lambda_{\rm GW})^2 h$,
$$\hat x = L_x = \left[1 + \tfrac12 h(\hat t)\right]\ell_x, \ \ \hat y = \hat z = 0 \ \text{ for the mirror in the x arm}; \qquad \hat y = L_y = \left[1 - \tfrac12 h(\hat t)\right]\ell_y, \ \ \hat x = \hat z = 0 \ \text{ for the mirror in the y arm}. \tag{25.148}$$

(This can also be deduced from the gravitational-wave tidal acceleration $-R^{\rm GW}_{\hat j\hat 0\hat k\hat 0}\,x^{\hat k}$, as in Eq. (25.45), and from the fact that to good accuracy $\hat x$ and $\hat y$ measure proper distance from the beam splitter.) Thus, although the mirrors do not move in TT coordinates, they do move in LL coordinates. The two coordinate systems predict the same time-varying physical arm lengths (the same proper distances from beam splitter to mirrors), $L_x$ and $L_y$ [Eqs. (25.135) and (25.148)].

As in TT coordinates, so also in LL coordinates, we can analyze the light propagation in the geometric-optics approximation, with $A^{\hat\alpha} = \Re(\mathcal A^{\hat\alpha}e^{i\phi})$. Just as the wave equation for the vector potential dictates, in TT coordinates, that the rapidly varying phase of the outward light in the x arm has the form $\phi^{\rm out}_{x\,\rm arm} = -\omega_o(t - x) + O(\omega_o\ell h_{\mu\nu})$ [Eq. (25.140) with $x \sim \ell \ll \bar\lambda_{\rm GW}$, so $H(t - x) - H(t) \simeq -\dot H(t)x = -h(t)x$, of magnitude $h\ell \sim h_{\mu\nu}\ell$], so similarly the wave equation in LL coordinates turns out to dictate that
$$\phi^{\rm out}_{x\,\rm arm} = -\omega_o(\hat t - \hat x) + O(\omega_o\ell h_{\hat\mu\hat\nu}) = -\omega_o(\hat t - \hat x) + O\!\left(\omega_o\ell h\,\frac{\ell^2}{\bar\lambda^2_{\rm GW}}\right), \tag{25.149}$$
and similarly for the returning light and the light in the y arm. The term $O(\omega_o\ell h\,\ell^2/\bar\lambda^2_{\rm GW})$ is the influence of the direct interaction between the gravitational wave and the light. Aside from this term, the analysis of the interferometer proceeds in exactly the same way as in flat space (because $\hat t$ measures proper time and $\hat x$ and $\hat y$ proper distance): the light travels a round-trip distance $2L_x$ in one arm and $2L_y$ in the other, and therefore acquires a phase difference, upon arriving back at the beam splitter, given by
$$\Delta\phi = -\omega_o\left[-2(L_x - L_y)\right] + O\!\left(\omega_o\ell h\,\frac{\ell^2}{\bar\lambda^2_{\rm GW}}\right) \simeq +2\omega_o\left[\ell_x - \ell_y + \ell h(\hat t)\right] + O\!\left(\omega_o\ell h\,\frac{\ell^2}{\bar\lambda^2_{\rm GW}}\right). \tag{25.150}$$
This net phase difference for the light returning from the two arms is the same as we deduced in TT coordinates [Eq. (25.143)], up to the negligible correction $O(\omega_o\ell h\,\ell^2/\bar\lambda^2_{\rm GW})$, and therefore the time-varying intensity of the light into the photodetector will be the same [Eq. (25.144)].
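To put numbers on how small the neglected direct-coupling term is compared with the mirror-motion signal, the following Python sketch evaluates $\bar\lambda_{\rm GW} = c/(2\pi f)$ and the ratio $(\ell/\bar\lambda_{\rm GW})^2$ across the HF band for $\ell = 4$ km, together with the arm-length change $\tfrac12 h\ell$ implied by Eqs. (25.135) and (25.148). The amplitude $h = 10^{-21}$ is an assumed, representative value chosen for illustration; it is not quoted in the text.

```python
import math

C = 2.998e8       # speed of light, m/s
ELL = 4.0e3       # arm length ell = 4 km (upper bound quoted in the text)
H_WAVE = 1.0e-21  # hypothetical wave amplitude, for illustration only


def reduced_wavelength(f_hz):
    """lambda-bar_GW = lambda_GW / (2 pi) = c / (2 pi f), in meters."""
    return C / (2 * math.pi * f_hz)


ratios = {}
for f in (10.0, 100.0, 1000.0):
    lam_bar = reduced_wavelength(f)
    # Ratio of the direct light-wave coupling phase to the mirror-motion phase 2*omega_o*ell*h.
    ratios[f] = (ELL / lam_bar) ** 2
    print(f"f = {f:6.0f} Hz   lambda-bar = {lam_bar / 1e3:7.1f} km   "
          f"(ell/lambda-bar)^2 = {ratios[f]:.1e}")

# Each arm's proper length changes by (1/2) h ell, with opposite signs in the two arms.
delta_L = 0.5 * H_WAVE * ELL
print(f"arm-length change (1/2) h ell = {delta_L:.1e} m")
```

The ratios land at roughly $7\times10^{-7}$ (10 Hz) to $7\times10^{-3}$ (1000 Hz), inside the $10^{-6}$ to $10^{-2}$ bounds stated above.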
In our TT analysis the phase shift $2\omega_o\ell h(t)$ arose from the interaction of the light with the gravitational waves. In the LL analysis, it is due to the displacements of the mirrors in the LL coordinates (i.e., the displacements as measured in terms of proper distance), which cause the light to travel different distances in the two arms. The direct LL interaction of the waves with the light produces only the tiny correction $O(\omega_o\ell h\,\ell^2/\bar\lambda^2_{\rm GW})$ to the phase shift. It should be evident that the LL description is much closer to elementary physics than the TT description. This is always the case when one’s apparatus is sufficiently small that one can regard $\hat t$ as measuring proper time and $x^{\hat j}$ as Cartesian coordinates that measure proper distance throughout the apparatus. But for a large apparatus (e.g. LISA, with its arm lengths $\ell \gtrsim \bar\lambda_{\rm GW}$) the LL analysis becomes quite complicated, as one must pay close attention to the $O(\omega_o\ell h\,\ell^2/\bar\lambda^2_{\rm GW})$ corrections. In such a case, the TT analysis is much simpler.
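The statement that inserting the transformation (25.146) into the TT-gauge metric (25.134) yields the LL metric (25.147) to linear order in $h$ can be spot-checked numerically. The plain-Python sketch below pulls the TT metric back through a central-difference Jacobian of (25.146) at one event and compares the result with (25.147), in units with $c = 1$. The waveform $h(u) = h_0\sin\Omega u$, the deliberately large amplitude $h_0 = 10^{-4}$ (so the linear term is easy to see), $\Omega = 1$, the sample event and all function names are hypothetical choices made for this check. If the transformation is right, the residual should be of order $h_0^2$, far below the linear-in-$h$ term itself.

```python
import math

H0 = 1.0e-4    # hypothetical amplitude, large enough that the linear term is visible
OMEGA = 1.0    # hypothetical angular frequency (units with c = 1)


def h(u):
    """Hypothetical waveform h(u) = h_0 sin(Omega u), with u = t - z."""
    return H0 * math.sin(OMEGA * u)


def h_dot(u):
    return H0 * OMEGA * math.cos(OMEGA * u)


def h_ddot(u):
    return -H0 * OMEGA**2 * math.sin(OMEGA * u)


def tt_from_ll(p):
    """Eq. (25.146): LL coordinates (t^, x^, y^, z^) -> TT coordinates (t, x, y, z)."""
    th, xh, yh, zh = p
    u, q = th - zh, xh**2 - yh**2
    return [th - 0.25 * h_dot(u) * q,
            (1 - 0.5 * h(u)) * xh,
            (1 + 0.5 * h(u)) * yh,
            zh - 0.25 * h_dot(u) * q]


def tt_metric(x):
    """TT-gauge metric (25.134) at the TT event x = (t, x, y, z)."""
    hu = h(x[0] - x[3])
    return [[-1.0, 0, 0, 0], [0, 1 + hu, 0, 0], [0, 0, 1 - hu, 0], [0, 0, 0, 1.0]]


def jacobian(f, p, step=1e-5):
    """Central-difference Jacobian J[mu][a] = d f^mu / d p^a."""
    J = [[0.0] * 4 for _ in range(4)]
    for a in range(4):
        plus, minus = list(p), list(p)
        plus[a] += step
        minus[a] -= step
        fp, fm = f(plus), f(minus)
        for mu in range(4):
            J[mu][a] = (fp[mu] - fm[mu]) / (2 * step)
    return J


def pulled_back_metric(p):
    """g^_ab = g_{mu nu} J^mu_a J^nu_b: the TT metric re-expressed in LL coordinates."""
    J, g = jacobian(tt_from_ll, p), tt_metric(tt_from_ll(p))
    return [[sum(g[m][n] * J[m][a] * J[n][b] for m in range(4) for n in range(4))
             for b in range(4)] for a in range(4)]


def ll_metric(p):
    """Eq. (25.147): eta_ab plus (1/2)(x^2 - y^2) h''(t - z) (dt - dz)^2."""
    th, xh, yh, zh = p
    c = 0.5 * (xh**2 - yh**2) * h_ddot(th - zh)
    g = [[-1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
    g[0][0] += c
    g[3][3] += c
    g[0][3] -= c   # the cross term -2c dt dz is split symmetrically
    g[3][0] -= c
    return g


ETA = [[-1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
event = [1.2, 0.7, 0.3, 0.3]             # hypothetical event (t^, x^, y^, z^)
g_hat = pulled_back_metric(event)
g_pred = ll_metric(event)
dev_pred = max(abs(g_hat[a][b] - g_pred[a][b]) for a in range(4) for b in range(4))
dev_flat = max(abs(g_hat[a][b] - ETA[a][b]) for a in range(4) for b in range(4))
print(f"max |g_hat - Eq. (25.147)| = {dev_pred:.1e}   (h_0^2 = {H0**2:.0e})")
print(f"max |g_hat - eta|          = {dev_flat:.1e}   (linear-in-h term)")
```

A finite-difference Jacobian is used here only to avoid hand-deriving the partial derivatives; with the step size shown its error is far below the $h_0^2$ residual being tested.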


25.5.3 Realistic Interferometers

For realistic, earth-based interferometers, one must take account of the acceleration of gravity. Experimenters do this by hanging their beam splitters and test masses on wires or fibers. The simplest way to analyze such an interferometer is in the proper reference frame of the beam splitter, where the metric must now include the influence of the acceleration of gravity by adding a term $-2g_e\hat z$ to the metric coefficient $h_{\hat 0\hat 0}$ [cf. Eq. (22.87)]. The resulting analysis, like that in the LL frame of our freely falling interferometer, will be identical to what one would do in flat spacetime, so long as one takes account of the motion of the test masses as dictated by the gravitational-wave tidal acceleration $-R_{\hat i\hat 0\hat j\hat 0}\,x^{\hat j}$, and so long as one is willing to ignore the tiny effects of $O(\omega_o\ell h\,\ell^2/\bar\lambda^2_{\rm GW})$. To make the realistic interferometer achieve high sensitivity, the experimenters introduce a lot of clever complications, such as the input mirrors of Fig. 8.11, which turn the arms into Fabry-Perot cavities. All these complications can be analyzed, in the beam splitter’s proper reference frame, using standard flat-spacetime techniques, so long as one makes sure to take account of the end-mirror motion as dictated by the gravitational-wave tidal acceleration. The direct coupling of the light to the gravitational waves can be neglected, as in our idealized interferometer.

****************************
EXERCISES

Exercise 25.13 Derivation and Practice: Geodesic motion in TT coordinates
Consider a particle that is at rest in the TT coordinate system of the gravitational-wave metric (25.134) before the gravitational wave arrives. In the text it is shown that the particle’s 4-velocity has $u^x = u^y = 0$ as the wave passes. Show that $u^z = 0$ and $u^t = 1$ as the wave passes, so the components of the particle’s 4-velocity are unaffected by the passing gravitational wave.
Exercise 25.14 Example: Light in an interferometric gravitational wave detector in TT gauge
Consider the light propagating outward from the beam splitter, along the x arm of an interferometric gravitational wave detector, as analyzed in TT gauge, so (suppressing the subscript “x arm” and superscript “out”) the electromagnetic vector potential is $A^\alpha = \Re(\mathcal A^\alpha e^{i\phi(x,t)})$ with $\mathcal A^\alpha$ constant and with $\phi = -\omega_o\left[t - x + \tfrac12 H(t - x) - \tfrac12 H(t)\right]$ [Eqs. (25.140) and (25.141)].
(a) Show that this $\phi$ satisfies the nullness equation (25.138), as claimed in the text, which implies that $A^\alpha = \Re(\mathcal A^\alpha e^{i\phi(x,t)})$ satisfies Maxwell’s equations in the geometric optics limit.
(b) Show that this $\phi$ satisfies the initial condition (25.139), as claimed in the text.
(c) Show, by an argument analogous to Eq. (25.76), that $\nabla_{\vec k}\vec k = 0$. Thus, the wave vector must be the tangent vector to geometric optics rays that are null geodesics in the gravitational-wave metric. Photons travel along these null geodesics and have 4-momenta $\vec p = \hbar\vec k$.
(d) Because the gravitational-wave metric (25.134) is independent of $x$, the $p_x$ component of a photon’s 4-momentum must be conserved along its geodesic world line. Compute $k_x = \partial\phi/\partial x$ (so $p_x = \hbar k_x$), and thereby verify this conservation law.
(e) Explain why the photon’s frequency, as measured by observers at rest in our TT coordinate system, is $\omega = -k_t = -\partial\phi/\partial t$. Explain why the rate of change of this frequency, as computed moving with the photon, is $d\omega/dt \simeq (\partial/\partial t + \partial/\partial x)\omega$, and show that $d\omega/dt \simeq -\tfrac12\omega_o\,dh/dt$.
****************************

Bibliographic Note
For an up-to-date, elementary introduction to experimental tests of general relativity in the solar system, we recommend Chap. 10 of Hartle (2003). For an enjoyable, popular-level book on experimental tests, see Will (1993a). For a very complete monograph on the theory underlying experimental tests, see Will (1993b), and for an up-to-date review of experimental tests, see Will (2006). For an elementary, up-to-date, and fairly complete introduction to gravitational waves, we recommend Chaps. 16 and 23 of Hartle (2003). Also good at the elementary level, but less complete and less up to date, is Chap. 9 of Schutz (1980). For a more advanced treatment of the properties of gravitational waves, their propagation and their generation, we suggest MTW Sec. 18.2 and Chaps. 35 and 36; but Chap. 37 on gravitational wave detection is terribly out of date and not recommended. For a more advanced presentation of gravitational wave theory, see Thorne (1983), and for a fairly complete and nearly up-to-date review of gravitational wave sources for ground-based detectors (LIGO etc.) and space-based detectors (LISA etc.), see Cutler and Thorne (2002). For a lovely monograph on the physics of interferometric gravitational-wave detectors, see Saulson (1994). For a two-term course on gravitational waves (theory and experiment), including videos of lectures plus large numbers of readings, problems and solutions, see Thorne, Bondarescu and Chen (2002). For a compendium of educational materials about gravitational waves, see the website of the LIGO Academic Advisory Committee, http://www.ligo.caltech.edu/laac/ .

Bibliography
Bertotti, B., Iess, L. and Tortora, T., 2003. Nature, 425, 374.
Burke, William L., 1971. “Gravitational radiation damping of slowly moving systems calculated using matched asymptotic expansions,” Journal of Mathematical Physics, 12, 402–418.

Box 25.2 Important Concepts in Chapter 25
• Experimental Tests of general relativity, Sec. 25.2
  – Weak equivalence principle (universality of free fall), Sec. 25.2.1
  – Gravitational redshift, Sec. 25.2.1
  – Perihelion shift, Sec. 25.2.2 and Ex. 25.2
  – Fermat’s principle, gravitational lenses, and deflection of light, Sec. 25.2.3 and Ex. 25.3
  – Shapiro time delay, Sec. 25.2.4
  – Frame dragging, Sec. 25.2.5
• Gravitational Wave Properties
  – Definition via short-wavelength condition $\bar\lambda \ll \{\mathcal R, \mathcal L\}$, Sec. 25.3.1
  – Background metric $g^B_{\alpha\beta}$, waves’ metric perturbation $h_{\alpha\beta}$, and Riemann tensor $R^{\rm GW}_{\alpha\beta\gamma\delta}$, Sec. 25.3.1
  – Gravitational-wave fields $h_+$ and $h_\times$ and their stretch–squeeze physical manifestations, Sec. 25.3.2
  – Gravitons, Sec. 25.3.3 and end of Sec. 25.3.4
  – Gravitational-wave stress-energy tensor $T^{\rm GW}_{\alpha\beta}$ and its conservation laws, Sec. 25.3.4 and paragraph containing Eq. (25.113)
• Gravitational wave propagation via geometric optics, Sec. 25.3.6
• TT gauge and computation of $h^{\rm TT}_{jk}$, and correspondingly $h_+$ and $h_\times$, via TT projection, Sec. 25.3.7
• Gravitational-wave generation, Sec. 25.4
  – Multipolar expansion of $h_+$ and $h_\times$, Sec. 25.4.1
  – Slow-motion sources: quadrupole-moment formalism for computing wave generation, Sec. 25.4.2
  – Burke’s radiation-reaction potential, Eq. (25.115)
  – Application: waves from a binary system, and binary inspiral due to radiation reaction, Sec. 25.4.3
• Gravitational-wave detection, Sec. 25.5
  – Frequency bands: ELF, VLF, LF and HF, Sec. 25.5
  – Interferometric gravitational-wave detector (“interferometer”), Sec. 25.5.1
  – How to analyze an interferometer in TT gauge (Sec. 25.5.1) and in the proper reference frame of the beam splitter (Sec. 25.5.2)

Cutler, Curt and Thorne, Kip S., 2002. “An overview of gravitational wave sources,” in Proceedings of the GR16 Conference on General Relativity and Gravitation, ed. N. Bishop and S. D. Maharaj (World Scientific, 2002), 72–111; also available at http://xxx.lanl.gov/abs/gr-qc/0204090
Einstein, Albert, 1916. “Die Grundlage der allgemeinen Relativitätstheorie,” Annalen der Physik, 49, 769–822. English translation in Einstein et al. (1923).
Einstein, Albert, 1918. “Über Gravitationswellen,” Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften, 1918 volume, 154–167.
Fierz, M. and Pauli, Wolfgang, 1939. “On relativistic wave equations for particles of arbitrary spin in an electromagnetic field,” Proceedings of the Royal Society A, 173, 211–232.
Hartle, J. B., 2003. Gravity: An Introduction to Einstein’s General Relativity, San Francisco: Addison-Wesley.
Isaacson, R. A., 1968. Physical Review, 166, 1272.
MTW: Misner, Charles W., Thorne, Kip S. and Wheeler, John A., 1973. Gravitation, W. H. Freeman & Co., San Francisco.
Saulson, Peter, 1994. Fundamentals of Interferometric Gravitational Wave Detectors, World Scientific, Singapore.
Schutz, B., 1980. Geometrical Methods of Mathematical Physics, Cambridge: Cambridge University Press.
Thorne, Kip S., 1980. Reviews of Modern Physics, 52, 299.
Thorne, Kip S., 1983. “The Theory of Gravitational Radiation: An Introductory Review,” in Gravitational Radiation, eds. N. Deruelle and T. Piran, North Holland, Amsterdam, pp. 1–57.
Thorne, Kip S., Bondarescu, Mihai and Chen, Yanbei, 2002. Gravitational Waves: A Web-Based Course, http://elmer.tapir.caltech.edu/ph237/
Weisberg, J. M. and Taylor, J. H., in Proceedings of the Aspen Conference on Binary Radio Pulsars, eds. F. A. Rasio and I. H. Stairs (Astronomical Society of the Pacific Conference Series, in press); http://xxx.lanl.gov/abs/gr-qc/0407149 .
Will, Clifford M., 1993a. Was Einstein Right?: Putting General Relativity to the Test, New York: Basic Books.
Will, Clifford M., 1993b. Theory and Experiment in Gravitational Physics, Revised Edition, Cambridge University Press, Cambridge, UK.
Will, Clifford M., 2006. “The Confrontation between General Relativity and Experiment,” Living Reviews in Relativity 9, URL (cited on 16 May 2007): http://www.livingreviews.org/lrr-2006-3 .

Contents
26 Cosmology
26.1 Overview
26.2 Homogeneity and Isotropy of the Universe; Robertson-Walker Line Element
26.3 The Stress-energy Tensor and the Einstein Field Equation
26.4 Evolution of the Universe
26.4.1 Constituents of the universe: Cold matter, radiation, and dark energy
26.4.2 The vacuum stress-energy tensor
26.4.3 Evolution of the densities
26.4.4 Evolution in time and redshift
26.4.5 Physical processes in the expanding universe
26.5 Observational Cosmology
26.5.1 Parameters characterizing the universe
26.5.2 Local Lorentz frame of homogeneous observers near Earth
26.5.3 Hubble expansion rate
26.5.4 Primordial nucleosynthesis
26.5.5 Density of Cold Dark Matter
26.5.6 Radiation Temperature and Density
26.5.7 Anisotropy of the CMB: Measurements of the Doppler Peaks
26.5.8 Age of the universe: Constraint on the dark energy
26.5.9 Magnitude-Redshift relation for type Ia supernovae: Confirmation that the universe is accelerating
26.6 The Big-Bang Singularity, Quantum Gravity, and the Initial Conditions of the Universe
26.7 Inflationary Cosmology
26.7.1 Amplification of Primordial Gravitational Waves by Inflation
26.7.2 Search for Primordial Gravitational Waves by their Influence on the CMB; Probing the Inflationary Expansion Rate


Chapter 26 Cosmology
Version 0826.1.K.pdf, 20 May 2009. Please send comments, suggestions, and errata via email to [email protected] or on paper to Kip Thorne, 130-33 Caltech, Pasadena CA 91125
[NOTE from Kip: I have done a quick and incomplete revision of this chapter in light of the observational data from the WMAP satellite (Bennett et al. 2003, Hinshaw et al. 2000). I have not yet had time to do this carefully.]

Box 26.1 Reader’s Guide
• This chapter relies significantly on
  – The special relativity portions of Chap. 1.
  – Chapter 23, on the transition from special relativity to general relativity.
  – Chapter 24, on the fundamental concepts of general relativity.
  – Sec. 24.3.3 on local energy-momentum conservation for a perfect fluid and Sec. 24.6 on the many-fingered nature of time.
• In addition, Box 26.3 and Ex. 26.7 of this chapter rely on the Planckian distribution function for thermalized photons and its evolution (Liouville’s theorem or collisionless Boltzmann equation), as presented in Secs. 2.2.4, 2.3, and sec:02EvolutionLaws of Chap. 2.

26.1 Overview

General relativity is an indispensable foundation for understanding the large-scale structure and evolution of the universe (cosmology), but it is only one foundation out of many. The crudest of understandings can be achieved with general relativity and little else; but more detailed and deeper understandings require combining general relativity with quantum field theory, nuclear and atomic physics, thermodynamics, fluid mechanics, and large bodies of astrophysical lore.

In this chapter we shall explore aspects of cosmology that are sufficiently crude that general relativity, augmented by only bits and pieces of other physics, can provide an adequate foundation. Our exploration will simultaneously illustrate key aspects of general relativity and give the reader an overview of modern cosmology.

We shall begin in Sec. 26.2 by discussing the observational data that suggest our universe is homogeneous and isotropic when averaged over regions of space huge compared to clusters of galaxies, and we then shall construct a spacetime metric for an idealized homogeneous, isotropic model of the universe. In Sec. 26.3 we shall construct a stress-energy tensor that describes, approximately, the total, averaged energy and pressure of the universe's matter and radiation; and we shall insert that stress-energy tensor and the metric of Sec. 26.2 into the Einstein field equation, thereby deducing a set of equations that govern the evolution of the universe. In Sec. 26.4 we shall study the predictions that those evolution equations make for the rate of expansion of the universe and the manner in which the expansion changes with time, and we shall describe the most important physical processes that have occurred in the universe during its evolution into its present state. As we shall see, the details of the expansion are determined by the values of seven parameters that can be measured today, with the caveat that there may be some big surprises associated with the so-called dark energy. In Sec. 26.5 we shall describe the astronomical observations by which the universe's seven parameters are being measured, and the multifaceted evidence for dark energy. In Sec. 26.6 we shall discuss the big-bang singularity in which the universe probably began, and the fact that this singularity, like singularities inside black holes, is a signal that general relativity breaks down and must be replaced by a quantum theory of gravity which (hopefully) will not predict singular behavior. We shall also examine a few features that the quantum theory of gravity is likely to exhibit. Finally, in Sec. 26.7 we shall discuss the "inflationary" epoch that the universe appears to have undergone immediately after the quantum-gravity, big-bang epoch.

26.2 Homogeneity and Isotropy of the Universe; Robertson-Walker Line Element

The universe obviously is not homogeneous or isotropic in our neighborhood. In our solar system (size $\sim 10^{14}$ cm) almost all the mass is concentrated in the sun and planets, with a great void in between. Looking beyond the solar system, one sees the Milky Way Galaxy (size $\sim 10^{23}$ cm $\sim 10^5$ light years, or equivalently $3 \times 10^4$ parsecs),¹ with its mass concentrated toward the center and its density falling off roughly as $1/(\text{distance})^2$ as one moves out past the sun and into the Galaxy's outer reaches. Beyond the Galaxy is the emptiness of intergalactic space; then other galaxies congregated into our own "local group" (size $\sim 10^6$ parsecs). The local group is in the outer reaches of a cluster of several thousand galaxies called the Virgo cluster (size $\sim 10^7$ parsecs), beyond which is the void of intercluster space, and then other clusters at distances $\gtrsim 10^8$ parsecs. Despite all this structure, the universe appears to be nearly homogeneous and isotropic on scales $\gtrsim 10^8$ parsecs, i.e., $\gtrsim 3 \times 10^8$ light years: on such scales one can regard galaxies and clusters of galaxies as "atoms" of a homogeneous, isotropic "gas" that fills the universe.

On scales far larger than clusters of galaxies, our best information about homogeneity and isotropy comes from the cosmic microwave background radiation ("CMB"). As we shall see in Secs. 26.4 and 26.5 below, this radiation, emitted by hot, primordial gas long before galaxies formed, comes to us from distances of order $3 \times 10^9$ parsecs ($1 \times 10^{10}$ light years), a scale 100 times larger than a rich cluster of galaxies (i.e., than a "supercluster"), and the largest scale on which observations can be made. Remarkably, this microwave radiation has a black-body spectrum with a temperature that is the same in all directions on the sky to within about three parts in $10^5$. This means that the temperature of the primordial gas was homogeneous on large scales to within this impressive accuracy.

These observational data justify a procedure in modeling the universe that was adopted by Einstein (1917) and others in the early days of relativistic cosmology, with little more than philosophical justification: like Einstein, we shall assume, as a zero-order approximation, that the universe is precisely homogeneous and isotropic. Later we shall briefly discuss galaxies and clusters of galaxies as first-order corrections to the homogeneous and isotropic structure.

[Figure: slices $t = 0, 1, 2$ threaded by the world line of a homogeneous observer at fixed spatial coordinates $x^j$.]
Fig. 26.1: The synchronous coordinate system for a homogeneous, isotropic model of the universe.

¹ One parsec is 3.262 light years, i.e., $3.086 \times 10^{18}$ cm. It is defined as the distance of a star whose apparent motion on the sky, induced by the Earth's orbital motion, is a circle with radius one arc second.
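The footnote's definition of the parsec can be checked with a few lines of arithmetic. This is a minimal sketch that assumes two constants not given in the text: the astronomical unit (the Earth's orbital radius, which sets the size of the parallax circle), taken as $1.495978707 \times 10^{13}$ cm, and a Julian light year of $9.4607304725808 \times 10^{17}$ cm.

```python
import math

# Assumed constants (not stated in the text), in cgs units.
AU_CM = 1.495978707e13               # astronomical unit: radius of the Earth's orbit
LIGHT_YEAR_CM = 9.4607304725808e17   # Julian light year

def parsec_cm() -> float:
    """Distance at which the Earth's orbital radius subtends one arc second."""
    one_arcsec = math.radians(1.0 / 3600.0)
    return AU_CM / math.tan(one_arcsec)

pc = parsec_cm()
print(f"1 pc = {pc:.4e} cm = {pc / LIGHT_YEAR_CM:.3f} light years")
# → 1 pc = 3.0857e+18 cm = 3.262 light years
```

Dividing the Galaxy's size of $\sim 10^{23}$ cm by this value gives $\sim 3 \times 10^4$ parsecs, consistent with the figure quoted above.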
Our assumption of homogeneity and isotropy can be stated more precisely as follows: there exists a family of slices of simultaneity (3-dimensional spacelike hypersurfaces), which completely covers spacetime, with the special property that on a given slice of simultaneity (i) no two points are distinguishable from each other ("homogeneity"), and (ii) at a given point no one spatial direction is distinguishable from any other ("isotropy").

Whenever, as here, the physical (geometrical) structure of a system has special symmetries, it is useful to introduce a coordinate system that shares and exhibits those symmetries. In the case of a spherical black hole, we introduced spherical coordinate systems. Here we shall introduce a coordinate system in which the homogeneous and isotropic hypersurfaces are slices of constant coordinate time $t$ [Fig. 26.1].

Recall (Sec. 24.6) the special role of observers whose world lines are orthogonal to the homogeneous and isotropic hypersurfaces: if they define simultaneity locally (on small scales) by the Einstein light-ray synchronization process, they will regard the homogeneous hypersurfaces as their own slices of simultaneity. Correspondingly, we shall call them the "homogeneous observers." We shall define our time coordinate $t$ to be equal to the proper time $\tau$ as measured by these homogeneous observers, with the arbitrary additive constant in $t$ so adjusted that one of the homogeneous hypersurfaces (the "initial" hypersurface) has $t = 0$ everywhere on it. Stated differently, but equivalently, we select arbitrarily the initial hypersurface and set $t = 0$ throughout it; and we then define $t$ along the world line of a homogeneous observer to be the proper time that the observer's clock has ticked since the observer passed through the initial hypersurface.

This definition of $t$ has an important consequence: since the points at which the observers pass through the initial hypersurface are all equivalent (all indistinguishable; "homogeneity"), the observers' subsequent explorations of the homogeneous universe must be indistinguishable; and, correspondingly, they must all reach any specific homogeneous hypersurface at the same proper time $\tau$, and thence at the same coordinate time $t = \tau$. Thus, the hypersurfaces of constant coordinate time $t$ are the same as the homogeneous hypersurfaces.

Turn, next, to the three spatial coordinates $x^j$. We shall define them in an arbitrary manner on the initial hypersurface, but shall insist that the homogeneous observers carry them forward (and backward) in time along their world lines, so that each homogeneous observer's world line is a curve of constant $x^1$, $x^2$, and $x^3$; cf. Fig. 26.1. In this $\{t, x^j\}$ coordinate system the spacetime metric, described as a line element, will take the generic form

$$ ds^2 = g_{tt}\,dt^2 + 2 g_{tj}\,dt\,dx^j + g_{jk}\,dx^j dx^k . \qquad (26.1) $$

Since $x^j$ are constant along a homogeneous observer's world line, the basis vector $(\partial/\partial t)_{x^j}$ is tangent to the world line; and since $t$ is constant in a homogeneous hypersurface, the basis vector $(\partial/\partial x^j)_t$ lies in the hypersurface. These facts, plus the orthogonality of the homogeneous observer's world line to the homogeneous hypersurface, imply that

$$ g_{tj} \equiv \mathbf{g}\!\left(\frac{\partial}{\partial t}, \frac{\partial}{\partial x^j}\right) = \frac{\partial}{\partial t}\cdot\frac{\partial}{\partial x^j} = 0 . \qquad (26.2) $$

Moreover, since the proper time along a homogeneous observer's world line (a line of constant $x^j$) is $d\tau = \sqrt{-g_{tt}\,dt^2}$, and since by construction $dt$ is equal to that proper time, it must be that

$$ g_{tt} = -1 . \qquad (26.3) $$

By combining Eqs. (26.1)–(26.3) we obtain for the line element in our very special coordinate system

$$ ds^2 = -dt^2 + g_{jk}\,dx^j dx^k . \qquad (26.4) $$
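The geodesic property of the homogeneous observers, which Exercise 26.1 asks the reader to establish, follows in a few lines from Eqs. (26.2)–(26.4). The sketch below is one way to organize that calculation (not necessarily the route the authors intend), using coordinate-basis Christoffel symbols and the observers' 4-velocity $u^\alpha = \delta^\alpha{}_t$:

```latex
u^\alpha{}_{;\beta}\,u^\beta
  = \left(u^\alpha{}_{,\beta} + \Gamma^\alpha{}_{\gamma\beta}\,u^\gamma\right) u^\beta
  = \Gamma^\alpha{}_{tt}
  = \tfrac12\, g^{\alpha\beta}\left(g_{\beta t,t} + g_{\beta t,t} - g_{tt,\beta}\right)
  = 0 .
```

The components $u^\alpha = \delta^\alpha{}_t$ are constants, so $u^\alpha{}_{,\beta} = 0$; and because $g_{tt} = -1$ and $g_{jt} = 0$ hold everywhere, every metric derivative in the bracket vanishes.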

Because our spatial coordinates, thus far, are arbitrary (i.e., they do not yet mold themselves in any special way to the homogeneous hypersurfaces), the spatial metric coefficients $g_{jk}$ must be functions of the spatial coordinates $x^i$ as well as of time $t$.

[Side Remark: Any coordinate system in which the line element takes the form (26.4) is called a synchronous coordinate system. This is true whether the hypersurfaces $t = \text{const}$ are homogeneous and isotropic or not. The key features of synchronous coordinates are that they mold themselves to the world lines of a special family of observers in such a way that $t$ is proper time along the family's world lines, and the slices of constant $t$ are orthogonal to those world lines (and thus are light-ray-synchronization-defined simultaneities for those observers). Since the introduction of synchronous coordinates involves a specialization of precisely four metric coefficients ($g_{tt} = -1$, $g_{tj} = 0$), by a careful specialization of the four coordinates one can construct a synchronous coordinate system in any and every spacetime. On the other hand, one cannot pick an arbitrary family of observers and use them as the basis of synchronous coordinates: the observers must move freely; i.e., their world lines must be geodesics. One can see this by computing $u^\alpha{}_{;\beta}u^\beta$ for the vector field $\vec u \equiv \partial/\partial t$, which represents the 4-velocities of the synchronous coordinate system's special observers; a straightforward calculation [Exercise 26.1] gives $u^\alpha{}_{;\beta}u^\beta = 0$, in accord with geodesic motion. Thus it is that the static observers (observers with constant $r, \theta, \phi$) outside a black hole cannot be used as the foundation for synchronous coordinates. Those observers must accelerate to prevent themselves from falling into the hole; and correspondingly, the closest thing to a synchronous coordinate system that one can achieve, using for $x^j = \text{const}$ the world lines of the static observers, is the Schwarzschild coordinate system, which has $g_{tj} = 0$ (the slices of constant $t$ are simultaneities as measured by the static observers), but $g_{tt} = -(1 - 2M/r) \neq -1$ (the proper time between two adjacent simultaneities depends on the radius at which the static observer resides).]

Returning to cosmology, we shall now specialize our spatial coordinates so they mold themselves nicely to the homogeneity and isotropy of the slices of constant $t$.
One might have hoped this could be done in such a way that the metric coefficients are independent of all three coordinates $x^j$. Not so. The surface of a sphere is a good example in one lower dimension: it is homogeneous and isotropic, but the most symmetric coordinates one can find for it, spherical polar coordinates, produce a line element ${}^{(2)}ds^2 = a^2 d\theta^2 + a^2\sin^2\theta\, d\phi^2$ with a metric coefficient $g_{\phi\phi} = a^2\sin^2\theta$ that depends on $\theta$. The deep, underlying reason is that the vector field $(\partial/\partial\phi)_\theta$ that "generates" rotations about the polar axis ($z$-axis) does not commute with the vector field that generates rotations about any other axis; correspondingly, those two vector fields cannot simultaneously be made the basis vectors of any coordinate system, and the metric coefficients cannot be made independent of two angular coordinates simultaneously. (For further detail see Secs. 25.2 and 25.3 of MTW, and especially Exercise 25.8.) Similarly, on our cosmological homogeneous hypersurfaces the most symmetric coordinate system possible entails metric coefficients that are independent of only one coordinate, not all three.

In order to construct that most-symmetric coordinate system, we choose arbitrarily on the hypersurface $t = \text{const}$ an origin of coordinates. Isotropy about that origin (all directions indistinguishable) is equivalent to spherical symmetry, which motivates our introducing spherical polar coordinates $\theta$, $\phi$, and a radial coordinate that we shall call $\chi$. In this coordinate system the line element of the hypersurface will take the form

$$ {}^{(3)}ds^2 = a^2\left[d\chi^2 + \Sigma^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)\right] , \qquad (26.5) $$

where a multiplicative constant (scale factor) $a$ has been factored out for future convenience (it could equally well have been absorbed into $\chi$ and $\Sigma$), and where $\Sigma$ is an unknown function of the radial coordinate $\chi$. Correspondingly, the 4-dimensional line element of spacetime (26.4) will take the form

$$ ds^2 = -dt^2 + a^2\left[d\chi^2 + \Sigma^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)\right] , \qquad (26.6) $$

where $a$ is now a function of time $t$ (i.e., it varies from hypersurface to hypersurface). Our next task is to figure out what functions $\Sigma(\chi)$ are compatible with homogeneity and isotropy of the hypersurfaces. There are elegant, group-theoretic ways to figure this out; see, e.g., Ryan and Shepley (1975). A more straightforward but tedious way is to note that, because the 3-dimensional Riemann curvature tensor of the hypersurface must be homogeneous and isotropic, it must be algebraically expressible in terms of (i) constants, and the only tensors that pick out no preferred locations or directions: (ii) the metric tensor $g_{jk}$ and (iii) the Levi-Civita tensor $\epsilon_{ijk}$. Trial and error shows that the only combination of these three quantities which has the correct number of slots and the correct symmetries is

$$ R_{ijkl} = K\left(g_{ik}g_{jl} - g_{il}g_{jk}\right) , \qquad (26.7) $$

where $K$ is a constant. By computing, for the 3-dimensional metric (26.5), the components of the 3-dimensional Riemann tensor and comparing with (26.7), one can show that there are three possibilities for the function $\Sigma(\chi)$ in the metric, and three corresponding possibilities for the constant $K$ in the three-dimensional Riemann tensor. These three possibilities are nicely parametrized by a quantity $k$ which takes on the values $+1$, $0$, and $-1$:

$$ k = +1: \quad \Sigma = \sin\chi , \quad K = \frac{k}{a^2} = +\frac{1}{a^2} , \qquad (26.8a) $$

$$ k = 0: \quad \Sigma = \chi , \quad K = \frac{k}{a^2} = 0 , \qquad (26.8b) $$

$$ k = -1: \quad \Sigma = \sinh\chi , \quad K = \frac{k}{a^2} = -\frac{1}{a^2} . \qquad (26.8c) $$
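The three cases of Eq. (26.8) can be tabulated side by side. This minimal sketch (the function names `sigma`, `curvature_K`, and `circumference` are illustrative, not from the text) evaluates $\Sigma(\chi)$ for each $k$ and the equatorial circumference $2\pi a\Sigma(\chi)$ at proper radius $a\chi$; relative to the Euclidean value $2\pi a\chi$ it comes out smaller for $k = +1$ and larger for $k = -1$.

```python
import math

def sigma(chi: float, k: int) -> float:
    """Radial function Sigma(chi) of Eq. (26.8) for curvature index k."""
    if k == 1:
        return math.sin(chi)   # closed universe: 3-sphere
    if k == 0:
        return chi             # flat universe: Euclidean 3-space
    if k == -1:
        return math.sinh(chi)  # open universe: 3-hyperboloid
    raise ValueError("k must be +1, 0, or -1")

def curvature_K(k: int, a: float) -> float:
    """Constant K = k / a^2 in the 3-dimensional Riemann tensor, Eq. (26.7)."""
    return k / a**2

def circumference(chi: float, k: int, a: float = 1.0) -> float:
    """Circumference 2*pi*a*Sigma(chi) of the circle at proper radius a*chi."""
    return 2.0 * math.pi * a * sigma(chi, k)

for chi in (0.1, 1.0, 2.0):
    ratios = [circumference(chi, k) / (2.0 * math.pi * chi) for k in (1, 0, -1)]
    print(f"chi = {chi}: C/(2 pi a chi) for k = +1, 0, -1:",
          ", ".join(f"{r:.3f}" for r in ratios))
```

Near the origin ($\chi \ll 1$) all three ratios approach 1, as expected on scales small compared to the curvature radius $a$; for $k = +1$ the circumference returns to zero at $\chi = \pi$.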

We shall discuss each of these three possibilities in turn.

Closed universe [k = +1]: For $k = +1$ the geometry of the homogeneous hypersurfaces,

$$ {}^{(3)}ds^2 = a^2\left[d\chi^2 + \sin^2\chi\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)\right] , \qquad (26.9) $$

is that of a 3-sphere, i.e., an ordinary sphere generalized to one higher dimension. One can verify this, for example, by showing (Ex. 26.2) that in a 4-dimensional Euclidean space with Cartesian coordinates $(w, x, y, z)$ and line element

$$ {}^{(4)}ds^2 = dw^2 + dx^2 + dy^2 + dz^2 , \qquad (26.10) $$

the 3-sphere

$$ w^2 + x^2 + y^2 + z^2 = a^2 \qquad (26.11) $$

has the same metric (26.9) as our cosmological, homogeneous hypersurface.

Figure 26.2(a) is an embedding diagram for an equatorial slice, $\theta = \pi/2$, through the homogeneous hypersurface [2-geometry ${}^{(2)}ds^2 = a^2(d\chi^2 + \sin^2\chi\, d\phi^2)$; cf. Eq. (26.9)]. Of course, the embedded surface is a 2-sphere. As the radius $\chi$ increases, the circumference $2\pi a\sin\chi$ around the spatial origin at first increases, then reaches a maximum at $\chi = \pi/2$, then decreases again to zero at $\chi = \pi$. Clearly, the homogeneous hypersurface is topologically "closed" and has a finite volume. For this reason a $k = +1$ cosmological model is often called a "closed universe". The universe's 3-volume is $V = 2\pi^2 a^3$ (Ex. 26.2).

[Figure: three embedding diagrams, panels (a), (b), (c); panel (c) uses the embedding-space axes $T$ and $r$.]
Fig. 26.2: Embedding diagrams for the homogeneous hypersurfaces of (a) a closed, k = +1, cosmological model; (b) a flat, k = 0, model; and (c) an open, k = −1 model.

Flat universe [k = 0]: For $k = 0$ the geometry of the homogeneous hypersurfaces,

$$ {}^{(3)}ds^2 = a^2\left[d\chi^2 + \chi^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)\right] , \qquad (26.12) $$

is that of a flat, 3-dimensional Euclidean space, as one can easily see by setting $r = a\chi$ and thereby converting (26.12) into the standard spherical-polar line element for Euclidean space. Correspondingly, this cosmological model is said to represent a "flat universe." Note, however, that this universe is only spatially flat: the Riemann curvature tensor of its 3-dimensional homogeneous hypersurfaces vanishes; but, as we shall discuss below, because of the time evolution of the expansion factor $a$, the Riemann curvature of the full 4-dimensional spacetime does not vanish. The volumes of the homogeneous hypersurfaces are infinite, so one cannot talk of the universe's total volume changing with time. However, the volume $\Delta V$ of a box in which resides a specific set of homogeneous observers will change as the expansion factor $a$ evolves. For example, the volume could be a box with sides $\Delta\chi$, $\Delta\theta$, $\Delta\phi$, so

$$ \Delta V = \epsilon_{\chi\theta\phi}\,\Delta\chi\,\Delta\theta\,\Delta\phi = a^3\chi^2\sin\theta\,\Delta\chi\,\Delta\theta\,\Delta\phi , \qquad (26.13) $$

where $\epsilon_{\chi\theta\phi}$ is a component of the Levi-Civita tensor.

Open universe [k = −1]: For $k = -1$ the geometry of the homogeneous hypersurfaces,

$$ {}^{(3)}ds^2 = a^2\left[d\chi^2 + \sinh^2\chi\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)\right] , \qquad (26.14) $$

is different from geometries with which we have ordinary experience: the equatorial plane $\theta = \pi/2$ is a 2-surface whose circumference $2\pi a\sinh\chi$ increases with growing radius $a\chi$ faster than is permitted for any 2-surface that can ever reside in a 3-dimensional Euclidean space:

$$ \frac{d(\text{circumference})}{d(\text{radius})} = 2\pi\cosh\left[(\text{radius})/a\right] > 2\pi . \qquad (26.15) $$

Correspondingly, any attempt to embed that equatorial plane in a Euclidean 3-space is doomed to failure. As an alternative, we can embed it in a flat, Minkowski 3-space with line element

$$ {}^{(3)}ds^2 = -dT^2 + dr^2 + r^2 d\phi^2 . \qquad (26.16) $$

The result is the hyperboloid of revolution,

$$ T^2 - r^2 = a^2 , \qquad (26.17) $$

which is shown pictorially in Fig. 26.2(c). By analogy it is reasonable to expect, and one easily can verify, that the full homogeneous hypersurface [metric (26.14)] has the same 3-geometry as the 3-dimensional hyperboloid

$$ T^2 - r^2 = a^2 \qquad (26.18) $$

in the 4-dimensional Minkowski space

$$ {}^{(4)}ds^2 = -dT^2 + dr^2 + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) . \qquad (26.19) $$
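The claim that this hyperboloid reproduces the open-universe geometry can be checked directly. A convenient parametrization (our choice; the text does not specify one) is $T = a\cosh\chi$, $r = a\sinh\chi$, which satisfies $T^2 - r^2 = a^2$ identically and gives $dT = a\sinh\chi\,d\chi$, $dr = a\cosh\chi\,d\chi$. Substituting into the Minkowski line element (26.19):

```latex
-dT^2 + dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)
  = a^2\left(\cosh^2\chi - \sinh^2\chi\right) d\chi^2
    + a^2\sinh^2\chi\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)
  = a^2\left[d\chi^2 + \sinh^2\chi\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right] ,
```

which is the metric (26.14); dropping $\theta$ (setting $\theta = \pi/2$) gives the 2-dimensional case (26.16)–(26.17).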

That this hyperboloid is, indeed, homogeneous and isotropic one can show by verifying that Lorentz transformations in the $T, r, \theta, \phi$ 4-space can move any desired point on the hyperboloid into the origin, and can rotate the hyperboloid about the origin by an arbitrary angle. Note that the $T, r, \theta, \phi$ space has no relationship whatsoever to the physical spacetime of our homogeneous, isotropic universe. It merely happens that both spaces possess 3-dimensional hypersurfaces with the same 3-geometry (26.14). Because these hypersurfaces are topologically open, with infinite volume, the $k = -1$ cosmological model is often called an "open universe."

[Side remark: Although homogeneity and isotropy force the cosmological model's hypersurfaces to have one of the three metrics (26.9), (26.12), (26.14), the topologies of those hypersurfaces need not be the obvious ones adopted and described above. For example, a flat model could have a closed topology with finite volume rather than an open topology with infinite volume. This could come about if, for example, in a Cartesian coordinate system $\{x = \chi\sin\theta\cos\phi,\; y = \chi\sin\theta\sin\phi,\; z = \chi\cos\theta\}$ the 2-surface $x = -L/2$ were identical to $x = +L/2$ (so $x$, like $\phi$ in spherical polar coordinates, is periodic), and if similarly $y = -L/2$ were identical to $y = +L/2$ and $z = -L/2$ were identical to $z = +L/2$. The resulting universe would have volume $a^3L^3$; and if one were to travel outward far enough, one would find oneself (as on the surface of the Earth) returning to where one started. This and other unconventional choices for the topology of the standard cosmological models are kept in mind by cosmologists, just in case observational data someday should give evidence for them; but in the absence of such evidence, cosmologists assume the simplest choices of topology: those made above.]

Historically, the three possible choices for the geometry of a homogeneous, isotropic cosmological model were discovered by Alexander Alexandrovich Friedmann (1922), a Russian mathematician in Saint Petersburg; and, correspondingly, the specific solutions to the Einstein field equations that Friedmann constructed using those geometries are called Friedmann cosmological models. The first proof that these three choices are the only possibilities for the geometry of a homogeneous, isotropic cosmological model was given independently by Howard Percy Robertson (1935), who was a professor at Caltech, and by Arthur Geoffrey Walker (1936), who was a young researcher at the Royal College of Science in London; and, correspondingly, the general line element (26.6) with $\Sigma = \sin\chi$, $\chi$, or $\sinh\chi$ is called the Robertson-Walker line element.

****************************
EXERCISES

Exercise 26.1 Example: The Observers of a Synchronous Coordinate System
Show that any observer who is at rest in a synchronous coordinate system [Eq. (26.4)] is freely falling, i.e., moves along a geodesic of spacetime.

Exercise 26.2 Example: The 3-Sphere Geometry of a Closed Universe
(a) Show, by construction, that there exist coordinates $\chi, \theta, \phi$ on the 3-sphere (26.11) [which resides in the 4-dimensional Euclidean space of Eq. (26.10)] such that the 3-sphere's line element assumes the same form (26.9) as that of a homogeneous hypersurface in a closed, $k = +1$, universe.
(b) Show that the total 3-volume of this 3-sphere is $V = 2\pi^2 a^3$.

****************************
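Part (b) of Exercise 26.2 can be cross-checked numerically. The proper volume element of the metric (26.9) is $a^3\sin^2\chi\sin\theta\,d\chi\,d\theta\,d\phi$; this minimal sketch integrates it with the midpoint rule (a choice made here for simplicity, not a method from the text) over $0 \le \chi \le \pi$, $0 \le \theta \le \pi$, $0 \le \phi < 2\pi$ and compares the result with $2\pi^2 a^3$. It is a check, not a substitute for the analytic calculation.

```python
import math

def closed_universe_volume(a: float, n: int = 2000) -> float:
    """Midpoint-rule estimate of the 3-volume of the k = +1 hypersurface (26.9).

    The integrand a^3 sin^2(chi) sin(theta) is separable, so the triple
    integral is the product of three one-dimensional integrals.
    """
    h = math.pi / n  # same step for chi and theta, both ranging over [0, pi]
    i_chi = sum(math.sin((i + 0.5) * h) ** 2 for i in range(n)) * h
    i_theta = sum(math.sin((i + 0.5) * h) for i in range(n)) * h
    i_phi = 2.0 * math.pi  # the integrand does not depend on phi
    return a**3 * i_chi * i_theta * i_phi

a = 1.5
print(closed_universe_volume(a), 2.0 * math.pi**2 * a**3)
```

The two printed numbers agree to better than one part in $10^5$ for this grid.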

26.3 The Stress-energy Tensor and the Einstein Field Equation

The expansion factor, $a(t)$, of our zero-order, homogeneous, isotropic cosmological model is governed by the Einstein field equation $\mathbf{G} = 8\pi\mathbf{T}$. In order to evaluate that equation we shall need a mathematical expression for the stress-energy tensor, $\mathbf{T}$. We shall deduce an expression for $\mathbf{T}$ in two different ways: by mathematical arguments, and by physical considerations; and the two ways will give the same answer.

Mathematically, we note that because the spacetime geometry is homogeneous and isotropic, the Einstein curvature tensor must be homogeneous and isotropic, and thence the Einstein equation forces the stress-energy tensor to be homogeneous and isotropic. In the local Lorentz frame of a homogeneous observer, which has basis vectors

$$ \vec e_{\hat 0} = \frac{\partial}{\partial t} , \quad \vec e_{\hat\chi} = \frac{1}{a}\frac{\partial}{\partial\chi} , \quad \vec e_{\hat\theta} = \frac{1}{a\Sigma}\frac{\partial}{\partial\theta} , \quad \vec e_{\hat\phi} = \frac{1}{a\Sigma\sin\theta}\frac{\partial}{\partial\phi} , \qquad (26.20) $$

the components of the stress-energy tensor are $T^{\hat 0\hat 0}$ = (energy density measured by homogeneous observer), $T^{\hat 0\hat j}$ = (momentum density), $T^{\hat j\hat k}$ = (stress). Isotropy requires that the momentum density (a 3-dimensional vector in the homogeneous hypersurface) vanish; if it did not vanish, its direction would break the isotropy. Isotropy also requires that the stress, a symmetric, second-rank 3-tensor residing in the homogeneous hypersurface, not pick out any preferred directions; and this is possible if and only if the stress is proportional to the metric tensor of the hypersurface. Thus, the components of the stress-energy tensor in the observer's local Lorentz frame must have the form

$$ T^{\hat 0\hat 0} \equiv \rho , \quad T^{\hat 0\hat j} = 0 , \quad T^{\hat j\hat k} = P\,\delta^{jk} , \qquad (26.21) $$

where $\rho$ is just a new notation for the energy density, and $P$ is the isotropic pressure. This is precisely the stress-energy tensor of a perfect fluid which is at rest with respect to the homogeneous observer. Reexpressed in geometric, frame-independent form, this stress-energy tensor is

$$ \mathbf{T} = (\rho + P)\,\vec u \otimes \vec u + P\,\mathbf{g} , \qquad (26.22) $$

where $\vec u$ is the common 4-velocity of the fluid and of the homogeneous observers,

$$ \vec u = \vec e_{\hat 0} = \frac{\partial}{\partial t} . \qquad (26.23) $$
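The frame-independent form (26.22) can be checked against the frame components (26.21). This minimal sketch works in the homogeneous observer's local Lorentz frame, where $g^{\alpha\beta} = \eta^{\alpha\beta} = \mathrm{diag}(-1, 1, 1, 1)$ (units with $c = 1$, as in the text), and builds $T^{\alpha\beta} = (\rho + P)u^\alpha u^\beta + P\eta^{\alpha\beta}$ for $u^\alpha = (1, 0, 0, 0)$; the values of $\rho$ and $P$ are arbitrary illustrative numbers.

```python
# Local Lorentz metric eta^{ab} = diag(-1, 1, 1, 1), with c = 1.
ETA = [[-1.0 if i == j == 0 else (1.0 if i == j else 0.0) for j in range(4)]
       for i in range(4)]

def perfect_fluid_T(rho, P, u):
    """Contravariant components T^{ab} = (rho + P) u^a u^b + P eta^{ab}, Eq. (26.22)."""
    return [[(rho + P) * u[a] * u[b] + P * ETA[a][b] for b in range(4)]
            for a in range(4)]

u_hom = [1.0, 0.0, 0.0, 0.0]  # homogeneous observer's 4-velocity, Eq. (26.23)
T = perfect_fluid_T(rho=2.0, P=0.5, u=u_hom)
for row in T:
    print(row)
```

The time-time entry is $(\rho + P) - P = \rho$, the mixed entries vanish, and the spatial block is $P\delta^{jk}$, exactly the pattern of (26.21).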

Physical considerations lead to this same stress-energy tensor: the desired stress-energy tensor must be that of our own universe, coarse-grain-averaged over scales large compared to a cluster of galaxies, i.e., averaged over scales $\sim 10^8$ parsecs. The contributors to that stress-energy tensor will be (i) the galaxy clusters themselves, which like the atoms of a gas will produce a perfect-fluid stress-energy with $\rho$ equal to their smeared-out mass density and $P$ equal to 1/3 times $\rho$ times their mean square velocity relative to the homogeneous observers; (ii) the intercluster gas, which (one can convince oneself by astrophysical and observational arguments) is a perfect fluid, nearly at rest in the frame of the homogeneous observers; (iii) the cosmic microwave radiation, which, being highly isotropic, has the stress-energy tensor of a perfect fluid with rest frame the same as that of the homogeneous observers; (iv) as-yet undetected cosmological backgrounds of other fundamental particles such as neutrinos, gravitons, axions, neutralinos, . . . , which are expected on theoretical grounds to be homogeneous and isotropic when coarse-grain averaged, with the same rest frame as the homogeneous observers; and (v) a possibly nonzero stress-energy tensor of the vacuum, which we shall discuss in Sec. 26.4 below, and which also has the perfect-fluid form. Thus, all the contributors are perfect fluids, and their energy densities and pressures add up to give a stress-energy tensor of the form (26.22).

As in our analysis of relativistic stars (Sec. 24.3), so also here, before evaluating the Einstein field equation we shall study the local law of conservation of 4-momentum, $\vec\nabla\cdot\mathbf{T} = 0$. (That conservation law is always easier to evaluate than the field equation, and by virtue of the contracted Bianchi identity it is equivalent to some combination of components of the field equation.) The quantity $\vec\nabla\cdot\mathbf{T}$, which appears in the law of 4-momentum conservation, is a vector. Since $\mathbf{T}$ has already been forced to be spatially isotropic, the spatial, 3-vector part of $\vec\nabla\cdot\mathbf{T}$, i.e., the projection of this quantity into a homogeneous hypersurface, is guaranteed already to vanish. Thus, only the projection orthogonal to the hypersurface, i.e., along $\vec e_{\hat 0} = \vec u = \partial/\partial t$, will give us any information. This projection is viewed by a homogeneous observer, or equivalently by the perfect fluid, as the law of energy conservation. Evaluation of it, i.e., computation of $T^{\hat 0\hat\mu}{}_{;\hat\mu} = 0$ with $\mathbf{T}$ given by (26.21) and the metric given by (26.6), yields (Exercise 26.3)

$$ \frac{d(\rho a^3)}{dt} = -P\,\frac{da^3}{dt} . \qquad (26.24) $$

This is precisely the first law of thermodynamics for a perfect fluid, as one can see by the following calculation. Imagine a rectangular parallelepiped of fluid contained in the spatial region between $\chi$ and $\chi + \Delta\chi$, between $\theta$ and $\theta + \Delta\theta$, and between $\phi$ and $\phi + \Delta\phi$. As time passes the "walls" of this parallelepiped remain fixed relative to the homogeneous observers (since the walls and the observers both keep $x^j$ fixed as $t$ passes), and correspondingly the walls remain fixed in the fluid's rest frame. The volume of this fluid element is $\Delta V = a^3\Sigma^2\sin\theta\,\Delta\chi\,\Delta\theta\,\Delta\phi$, and the total mass-energy contained in it is $E = \rho\,\Delta V$. Correspondingly, the first law of thermodynamics for the fluid element, $dE/dt = -P\,d(\Delta V)/dt$, says

$$ \frac{\partial(\rho a^3\Sigma^2\sin\theta\,\Delta\chi\,\Delta\theta\,\Delta\phi)}{\partial t} = -P\,\frac{\partial(a^3\Sigma^2\sin\theta\,\Delta\chi\,\Delta\theta\,\Delta\phi)}{\partial t} . \qquad (26.25) $$

By dividing out the coordinate volume $\Sigma^2\sin\theta\,\Delta\chi\,\Delta\theta\,\Delta\phi$ (which is time independent), and then replacing the partial derivative by an ordinary derivative (because $\rho$ and $a$ depend only on $t$), we obtain the local law of energy conservation (26.24).

The fact that the local law of energy conservation, $T^{\hat 0\hat\mu}{}_{;\hat\mu} = 0$, is identical to the first law of thermodynamics should not be surprising. Into our stress-energy tensor we put only the contribution of a perfect fluid, so energy conservation for it, in the fluid's local rest frame, must reduce to energy conservation for a perfect fluid, which is the first law of thermodynamics. If we had put other contributions into the stress-energy tensor, we would have obtained from energy conservation corresponding contributions to the first law; for example (as we saw in Part IV, when we studied fluid mechanics), if we had put viscous stresses into the stress-energy tensor, we would have obtained the first law in the form $d(\rho V) = -P\,dV + T\,dS$, including an explicit expression for the entropy increase $dS$ due to viscous heating.

Turn, next, to the components of the Einstein equation $\mathbf{G} = 8\pi\mathbf{T}$. Because the metric has already been forced to be homogeneous and isotropic, the Einstein tensor is guaranteed already to have the homogeneous, isotropic form $G^{\hat 0\hat 0} \neq 0$, $G^{\hat 0\hat j} = 0$, $G^{\hat j\hat k} \propto \delta^{jk}$, i.e., the same form as the stress-energy tensor (26.21). Correspondingly, there are only two nontrivial components of the Einstein field equation: the time-time component and the isotropic (proportional to $\delta^{jk}$) space-space component. Moreover, the contracted Bianchi identity guarantees that some combination of these two components will be equivalent to our nontrivial law of energy conservation, thereby leaving only one new piece of information to be extracted from the Einstein equation. We shall extract that information from the time-time component, $G^{\hat 0\hat 0} = 8\pi T^{\hat 0\hat 0}$. A straightforward but tedious evaluation of $G^{\hat 0\hat 0} = G^{tt}$ for the Robertson-Walker line element (26.6), and insertion into the field equation along with $T^{\hat 0\hat 0} = T^{tt} = \rho$, gives

$$ \left(\frac{\dot a}{a}\right)^2 + \frac{k}{a^2} = \frac{8\pi}{3}\rho , \qquad (26.26) $$

where the dot represents a derivative with respect to the homogeneous observers' proper time $t$. To verify that no errors have been made, one can evaluate the remaining nontrivial component of the field equation, $G^{\hat\chi\hat\chi} = 8\pi T^{\hat\chi\hat\chi}$ (or the $\hat\theta\hat\theta$ or $\hat\phi\hat\phi$ component; they are all equivalent since $G^{\hat j\hat k}$ and $T^{\hat j\hat k}$ are both proportional to $\delta^{jk}$). The result,

$$ 2\frac{\ddot a}{a} + \left(\frac{\dot a}{a}\right)^2 + \frac{k}{a^2} = -8\pi P , \qquad (26.27) $$

is, as expected, a consequence of the first of the Einstein components (26.26) together with the law of energy conservation (26.24): by differentiating (26.26) and then using (26.24) to eliminate $\dot\rho$, one obtains (26.27).

The task of computing the time evolution of our zero-order cosmological model now takes the following form: (i) Specify an equation of state

$$ P = P(\rho) \qquad (26.28) $$

for the cosmological perfect fluid; (ii) integrate the first law of thermodynamics

$$ \frac{d\rho}{da} = -3\,\frac{(\rho + P)}{a} \qquad (26.29) $$

[Eq. (26.24), rearranged] to obtain the density $\rho$ and [via Eq. (26.28)] the pressure $P$ as functions of the expansion factor $a$; (iii) evolve the expansion factor forward in time using the field equation

$$ \left(\frac{\dot a}{a}\right)^2 + \frac{k}{a^2} = \frac{8\pi}{3}\rho \qquad (26.30) $$

[Eq. (26.26)].

****************************
EXERCISES

Exercise 26.3 Practice: Energy Conservation for a Perfect Fluid
Consider a perfect fluid, with the standard stress-energy tensor $T^{\alpha\beta} = (\rho + P)u^\alpha u^\beta + P g^{\alpha\beta}$. Assume that the fluid resides in an arbitrary spacetime, not necessarily our homogeneous, isotropic cosmological model.
(a) Explain why the law of energy conservation, as seen by the fluid, is given by $u_\alpha T^{\alpha\beta}{}_{;\beta} = 0$.
(b) Show that this law of energy conservation reduces to

$$ \frac{d\rho}{d\tau} = -(\rho + P)\,\vec\nabla\cdot\vec u , \qquad (26.31) $$

where $\tau$ is proper time as measured by the fluid.
(c) Show that for a fluid at rest with respect to the homogeneous observers in our homogeneous, isotropic cosmological model, (26.31) reduces to the first law of thermodynamics (26.24).
Note: as a tool in this calculation, you might want to derive and use the following formulas, which are valid in any coordinate basis:

$$ \Gamma^\alpha{}_{\mu\alpha} = \frac{1}{\sqrt{-g}}\left(\sqrt{-g}\right)_{,\mu} , \qquad A^\alpha{}_{;\alpha} = \frac{1}{\sqrt{-g}}\left(\sqrt{-g}\,A^\alpha\right)_{,\alpha} . \qquad (26.32) $$

Here, $g$ denotes the determinant of the covariant components of the metric,

$$ g \equiv \det\|g_{\alpha\beta}\| . \qquad (26.33) $$

****************************
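The statement in Sec. 26.3 that (26.27) follows from (26.26) and (26.24) can be spelled out. Equation (26.24) gives $\dot\rho = -3(\dot a/a)(\rho + P)$. The first line below differentiates (26.26) and inserts this $\dot\rho$; the second divides by $\dot a/a$ (assumed nonzero) and re-uses (26.26) to write $8\pi\rho = 3(\dot a^2/a^2 + k/a^2)$; the third collects terms.

```latex
\begin{aligned}
\frac{d}{dt}\left[\left(\frac{\dot a}{a}\right)^2 + \frac{k}{a^2}\right]
  &= 2\,\frac{\dot a}{a}\left(\frac{\ddot a}{a} - \frac{\dot a^2}{a^2} - \frac{k}{a^2}\right)
   = \frac{8\pi}{3}\,\dot\rho
   = -8\pi\,\frac{\dot a}{a}\,(\rho + P) , \\
2\frac{\ddot a}{a} - 2\frac{\dot a^2}{a^2} - 2\frac{k}{a^2}
  &= -8\pi\rho - 8\pi P
   = -3\left(\frac{\dot a^2}{a^2} + \frac{k}{a^2}\right) - 8\pi P , \\
2\frac{\ddot a}{a} + \left(\frac{\dot a}{a}\right)^2 + \frac{k}{a^2} &= -8\pi P .
\end{aligned}
```

The last line is Eq. (26.27).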

26.4 Evolution of the Universe

26.4.1 Constituents of the universe: Cold matter, radiation, and dark energy

The evolution of our zero-order cosmological model is highly dependent on the equation of state $P(\rho)$; and that equation of state, in turn, depends on the types of matter and fields that fill the universe, i.e., the universe's constituents. The constituents can be divided into three classes:

(i) Cold matter, i.e., material whose pressure is negligible compared to its total density of mass-energy, so the equation of state can be idealized as $P_M = 0$ (subscript M for "matter"). The cold matter includes the baryonic matter of which people, planets, stars, galaxies, and intergalactic gas are made, as well as so-called cold dark matter, which is known to exist in profusion and might be predominantly fundamental particles (e.g., axions or neutralinos).

(ii) Radiation, i.e., material with equation of state $P_R = \rho_R/3$. This includes the CMB (primordial photons), primordial gravitons, primordial neutrinos when their temperatures exceed their rest masses, and other finite-rest-mass particles when the temperature is sufficiently high (i.e., very early in the universe).

(iii) Dark energy (denoted by a subscript $\Lambda$ for historical reasons described in Box 26.2), with very negative pressure, $P_\Lambda \lesssim -\tfrac12\rho_\Lambda$. As we shall see in Sec. 26.5, observations give strong evidence that such matter is present today in profusion. We do not yet know for sure its nature or its equation of state, but the most likely candidate is a nonzero stress-energy tensor associated with the vacuum, for which the equation of state is $P_\Lambda = -\rho_\Lambda$.
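Step (ii) of the recipe in Sec. 26.3, integrating the first law (26.29), can be sketched numerically for the three idealized equations of state just listed. This minimal sketch assumes a constant ratio $w = P/\rho$ for each constituent ($w = 0$ for cold matter, $1/3$ for radiation, $-1$ for vacuum) and uses classical fourth-order Runge-Kutta in $a$, an illustrative choice. For constant $w$ the exact solution of (26.29) is $\rho \propto a^{-3(1+w)}$, which the code prints alongside the numerical result.

```python
def integrate_density(w, rho0=1.0, a0=1.0, a1=10.0, steps=10_000):
    """Integrate d rho / da = -3 (rho + P) / a, Eq. (26.29), with P = w * rho.

    Uses classical fourth-order Runge-Kutta in the expansion factor a and
    returns rho(a1), starting from rho(a0) = rho0.
    """
    def f(a, rho):
        return -3.0 * (1.0 + w) * rho / a

    h = (a1 - a0) / steps
    a, rho = a0, rho0
    for _ in range(steps):
        k1 = f(a, rho)
        k2 = f(a + h / 2, rho + h * k1 / 2)
        k3 = f(a + h / 2, rho + h * k2 / 2)
        k4 = f(a + h, rho + h * k3)
        rho += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        a += h
    return rho

for name, w in [("cold matter", 0.0), ("radiation", 1 / 3), ("vacuum", -1.0)]:
    numeric = integrate_density(w)
    exact = 10.0 ** (-3 * (1 + w))  # a^{-3(1+w)} evaluated at a = 10
    print(f"{name:12s} w = {w:+.3f}: rho(10)/rho(1) = {numeric:.6e} (analytic {exact:.6e})")
```

Cold matter dilutes as $a^{-3}$, radiation as $a^{-4}$, and the vacuum density does not change at all, which anticipates the result of Sec. 26.4.2.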


26.4.2 The vacuum stress-energy tensor

Let us digress, briefly, to discuss the vacuum. The stress-energy tensors of quantum fields are formally divergent and must be renormalized to make them finite. In the early decades of quantum field theory, it was assumed that (in the absence of boundaries such as those of highly electrically conducting plates) the renormalized vacuum stress-energy tensor $T_{\rm vac}$ would vanish. In 1968 Yakov Borisovich Zel’dovich initiated speculations that $T_{\rm vac}$ might, in fact, be nonzero, and those speculations became fashionable in the 1980s in connection with inflationary models for the very early universe (Sec. 26.7). It was presumed in the 1980s and 90s that a phase transition in the early universe had driven quantum fields into a new vacuum state, for which $T_{\rm vac}$ vanishes; but in the late 1990s, much to physicists’ amazement, observational evidence began to mount that our universe today is filled with a profusion of “dark energy”, perhaps in the form of a nonzero $T_{\rm vac}$; and by 2000 that evidence was compellingly strong. If $T_{\rm vac}$ is nonzero, what form can it take? It must be a second-rank symmetric tensor, and it would be very surprising if that tensor broke the local homogeneity and isotropy of spacetime or picked out any preferred reference frames. In order not to break those local symmetries, $T_{\rm vac}$ must be proportional to the metric tensor, with its proportionality factor independent of location in spacetime:

$$ T_{\rm vac} = -\rho_\Lambda\, g, \quad\text{i.e.,}\quad T_{\rm vac}^{\hat 0\hat 0} = \rho_\Lambda, \qquad T_{\rm vac}^{\hat j\hat k} = -\rho_\Lambda\,\delta^{jk}. \tag{26.34} $$

This is a perfect-fluid equation of state with $P_\Lambda = -\rho_\Lambda$. If there is no significant transfer of energy or momentum between the vacuum and other constituents of the universe, then energy-momentum conservation requires that $T_{\rm vac}$ be divergence free. Because the metric tensor also has vanishing divergence, $\nabla\cdot T_{\rm vac} = -\nabla\rho_\Lambda$, so $\rho_\Lambda$ must be constant, despite the expansion of the universe! This constancy can be understood in terms of the first law of thermodynamics (26.24): as the universe expands, the expansion does work against the vacuum’s tension $-P_\Lambda = \rho_\Lambda$ at just the right rate to replenish the vacuum’s otherwise-decreasing energy density. For further insight into $T_{\rm vac}$, see Box 26.2.

26.4.3 Evolution of the densities

In order to integrate the Einstein equation backward in time and thereby deduce the universe’s past evolution, we need to know how much radiation, cold matter, and dark energy the universe contains today. Those amounts are generally expressed as fractions of the critical energy density that marks the dividing line between a closed universe and an open one. By asking that $k/a^2$ be zero, we find from the Einstein equation (26.26) that

$$ \rho_{\rm crit} = \frac{3}{8\pi}\left(\frac{\dot a_o}{a_o}\right)^2 \simeq 9\times10^{-30}\ {\rm g/cm^3}. \tag{26.35} $$

Here we have used a numerical value of $\dot a_o/a_o$ (the value of $\dot a/a$ today) that is discussed in Sec. 26.5 below. The energy density today in units of the critical density is denoted

$$ \Omega \equiv \frac{\rho_o}{\rho_{\rm crit}}, \tag{26.36} $$

and observations give values

$$ \Omega_R \sim 10^{-4}, \qquad \Omega_M \simeq 0.27, \qquad \Omega_\Lambda \simeq 0.73, \qquad \Omega \equiv \Omega_R + \Omega_M + \Omega_\Lambda \simeq 1.00. \tag{26.37} $$

We shall discuss these numbers and the observational error bars on them in Sec. 26.5.

Box 26.2 The Cosmological Constant

Soon after formulating general relativity, Einstein discovered that his field equation, together with then plausible equations of state $P(\rho)$, demanded that the universe be either expanding or contracting; it could not be static. Firmly gripped by the mindset of his era, Einstein regarded a nonstatic universe as implausible, and thus thought his field equation incompatible with the way the universe ought to behave; and so he modified his field equation. There were very few possibilities for the modification, since (i) it seemed clear that the source of curvature should still be the stress-energy tensor, and accordingly the field equation should say $E = 8\pi T$, where $E$ is a tensor (evidently not the Einstein tensor) which characterizes gravity; and (ii) in order that the field equation leave four of the metric coefficients arbitrary (so they could be adjusted by coordinate freedom), the tensor $E$ should have an automatically vanishing divergence. Of the various possibilities for $E$, one stood out as far simpler than all the rest: $E = G + \Lambda g$, where $\Lambda$ is a “cosmological constant.” To Einstein’s great satisfaction, by choosing $\Lambda$ positive he was able, from his modified field equation

$$ G + \Lambda g = 8\pi T, \tag{1} $$

to obtain a forever-static, homogeneous and isotropic cosmological model; see Ex. 26.5. In 1929 Edwin Powell Hubble (1929), at the Mount Wilson Observatory, discovered that the universe was expanding. What a shock this was to Einstein! After visiting Mount Wilson and discussing Hubble’s observations with him, Einstein (1931) formally renounced the cosmological constant and returned to his original, 1915, field equation $G = 8\pi T$. In his later years, Einstein described the cosmological constant as the greatest mistake of his life. Had he stuck to his original field equation, the expansion of the universe might have been regarded as the greatest of all the predictions made by his general relativity.

Remarkably, the cosmological-constant term $\Lambda g$ in Einstein’s modified field equation is identical to the modern vacuum contribution to the stress-energy tensor. More specifically, if we define $\rho_\Lambda \equiv \Lambda/8\pi$ so $T_{\rm vac} = -\rho_\Lambda g = -(\Lambda/8\pi)g$, then $G + \Lambda g = 8\pi T$ becomes $G = 8\pi(T + T_{\rm vac})$. Thus, the modern conclusion that there might be a nonzero vacuum stress-energy tensor is actually a return to Einstein’s modified field equation. It is not at all clear whether the universe’s dark energy has the equation of state $P_\Lambda = -\rho_\Lambda$ and thus is the vacuum stress-energy. Cosmologists’ prejudice that it may be vacuum is built into their adoption of Einstein’s cosmological-constant notation $\Lambda$ to denote the dark energy.
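As a rough numerical check of Eq. (26.35), the sketch below restores Newton’s constant and writes $\rho_{\rm crit} = 3H_o^2/8\pi G$ in cgs units, with $H_o = \dot a_o/a_o$. The Hubble-constant value $H_o = 70$ km s$^{-1}$ Mpc$^{-1}$ is an illustrative assumption (the text defers the measured value to Sec. 26.5), and the function name `critical_density` is hypothetical.

```python
import math

G_CGS = 6.674e-8       # Newton's constant, cm^3 g^-1 s^-2
MPC_CM = 3.0857e24     # one megaparsec in cm
KM_CM = 1.0e5          # one kilometre in cm

def critical_density(hubble_km_s_mpc: float) -> float:
    """rho_crit = 3 H_o^2 / (8 pi G) in g/cm^3: Eq. (26.35) restored to cgs units."""
    h_per_s = hubble_km_s_mpc * KM_CM / MPC_CM   # convert H_o to s^-1
    return 3.0 * h_per_s**2 / (8.0 * math.pi * G_CGS)

if __name__ == "__main__":
    # H_o = 70 km/s/Mpc is an assumed, illustrative value.
    print(f"rho_crit ~ {critical_density(70.0):.2e} g/cm^3")
```

With the assumed 70 km s$^{-1}$ Mpc$^{-1}$ this gives about $9\times10^{-30}$ g/cm$^3$, in line with (26.35); because $\rho_{\rm crit}\propto H_o^2$, a 10 per cent change in $H_o$ shifts it by about 20 per cent.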

The evolution of the universe could be influenced by energy transfer among its three constituents. However, that transfer was small during the epoch from $a/a_o \sim 10^{-9}$ to today; see Box 26.3. This means that the first law of thermodynamics (26.24) must hold true for each of the three constituents individually: $d(\rho a^3) = -P\,da^3$. By combining this law with the constituents’ equations of state, $P_M = 0$, $P_R = \rho_R/3$, and (assuming the dark energy is vacuum) $P_\Lambda = -\rho_\Lambda$, we obtain

$$ \rho_R = \rho_{Ro}\frac{a_o^4}{a^4}, \qquad \rho_M = \rho_{Mo}\frac{a_o^3}{a^3}, \qquad \rho_\Lambda = {\rm const}. \tag{26.38} $$
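The scalings (26.38) can be recovered numerically from the first law alone. Writing $P = w\rho$ and expanding $d(\rho a^3) = -P\,d(a^3)$ gives $d\rho/d\ln a = -3(1+w)\rho$. The sketch below integrates that equation backward in time with a classic fourth-order Runge-Kutta step (an arbitrary choice of integrator, not one the text prescribes) and compares the result with the power law $(a_o/a)^{3(1+w)}$ for $w = 0$, $1/3$, and $-1$. The function name `evolve_density` is hypothetical.

```python
import math

def evolve_density(rho_today, w, x_final, steps=10_000):
    """Integrate d(rho a^3) = -P d(a^3) with P = w * rho, starting today (a/a_o = 1).

    Uses the equivalent form d rho / d ln a = -3 (1 + w) rho and RK4 steps in ln a.
    x_final is a/a_o at the end of the integration.
    """
    def rhs(rho):
        return -3.0 * (1.0 + w) * rho

    h = math.log(x_final) / steps   # negative when integrating into the past
    rho = rho_today
    for _ in range(steps):
        k1 = rhs(rho)
        k2 = rhs(rho + 0.5 * h * k1)
        k3 = rhs(rho + 0.5 * h * k2)
        k4 = rhs(rho + h * k3)
        rho += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return rho

if __name__ == "__main__":
    x = 1e-3   # a/a_o, roughly back to z ~ 1000
    for label, w in (("matter", 0.0), ("radiation", 1.0 / 3.0), ("vacuum", -1.0)):
        numeric = evolve_density(1.0, w, x)
        exact = x ** (-3.0 * (1.0 + w))
        print(f"{label:9s} rho/rho_o = {numeric:.6e}   power law = {exact:.6e}")
```

For $w = -1$ the right-hand side vanishes identically, so the density stays exactly at its initial value: the constancy of $\rho_\Lambda$ discussed in Sec. 26.4.2.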

These relations are plotted in Fig. 26.4 below, which we shall discuss later. The qualitative evolution of our zero-order cosmological model is easily deduced by inserting Eqs. (26.38) into Einstein’s equation (26.26) and rewriting the result in the standard form for the motion of a particle in a potential well:

$$ \frac12\dot a^2 + V(a) = -\frac{k}{2}, \tag{26.39} $$

where

$$ V(a) = -\frac{4\pi}{3}a^2\rho = -\frac{4\pi}{3}\rho_{\rm crit}\,a_o^2\left(\Omega_R\frac{a_o^2}{a^2} + \Omega_M\frac{a_o}{a} + \Omega_\Lambda\frac{a^2}{a_o^2}\right). \tag{26.40} $$
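To see where the effective potential (26.40) peaks, the sketch below evaluates $V(a)/(\rho_{\rm crit}a_o^2)$ with the $\Omega$ values of Eq. (26.37) and locates its maximum by bisection on $dV/da$. The bracketed sum in (26.40) is convex in $a$, so $dV/da$ has a single zero for $a > 0$; that zero is also where $\ddot a = -dV/da$ changes sign, i.e., where deceleration gives way to acceleration. The function names are illustrative.

```python
import math

OMEGA_R, OMEGA_M, OMEGA_L = 1e-4, 0.27, 0.73   # values quoted in Eq. (26.37)

def potential(x):
    """V(a) / (rho_crit a_o^2) from Eq. (26.40), with x = a/a_o."""
    return -(4.0 * math.pi / 3.0) * (OMEGA_R / x**2 + OMEGA_M / x + OMEGA_L * x**2)

def dpotential_dx(x):
    """Analytic derivative of potential(x) with respect to x."""
    return -(4.0 * math.pi / 3.0) * (-2.0 * OMEGA_R / x**3 - OMEGA_M / x**2 + 2.0 * OMEGA_L * x)

def locate_maximum(lo=0.05, hi=2.0, iters=200):
    """Bisect on dV/dx, which is positive below the maximum and negative above it."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dpotential_dx(mid) > 0.0:
            lo = mid   # still climbing: the maximum lies to the right
        else:
            hi = mid
    x_max = 0.5 * (lo + hi)
    return x_max, potential(x_max)

if __name__ == "__main__":
    x_max, v_max = locate_maximum()
    print(f"V peaks at a/a_o = {x_max:.3f} with V = {v_max:.2f} rho_crit a_o^2")
```

With these inputs the peak falls near $a/a_o \simeq 0.57$, where $V \simeq -3\rho_{\rm crit}a_o^2$; neglecting $\Omega_R$, the location follows analytically from $(a/a_o)^3 = \Omega_M/2\Omega_\Lambda$.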

Note that $a/a_o$ is the ratio of the linear size of the universe at some time in the past to the size of the universe today. Each volume element, comoving with the homogeneous observers, expands in length by $a_o/a$ from then until now, and expands in volume by $(a_o/a)^3$. The shape of the effective potential $V(a)$ is shown in Fig. 26.3: it increases monotonically from $-\infty$ at $a = 0$ to a maximum of about $-3\rho_{\rm crit}a_o^2$ at $a/a_o \simeq 0.57$ (where $a^3/a_o^3 \simeq \Omega_M/2\Omega_\Lambda$), and then, as the universe nears our own era, it begins decreasing. The universe is radiation dominated at $a/a_o \lesssim 10^{-4}$ (Fig. 26.4); it is cold-matter dominated between $a/a_o \sim 10^{-4}$ and $a/a_o \sim 1$; and the maxing-out of the effective potential and reversal to plunge is triggered by a modern-era ($a/a_o \sim 1$) transition to dark-energy dominance. The implications of this effective potential for the past evolution of our universe should be clear from one’s experience with particle-in-potential problems: the universe must have expanded at an ever-decreasing rate $\dot a$ from an age of a small fraction of a second, when our equations of state became valid, until nearly the present epoch, $a/a_o \sim 0.6$, and then the universe’s vacuum tension must have triggered an acceleration of the expansion. It seems strange that the universe should switch over to acceleration in just the epoch in which we are alive, rather than far earlier or far later or not at all. The reasons for this are unknown; it is a big surprise, revealed by recent observations. If $P_\Lambda/\rho_\Lambda$ is independent of time, then the universe’s past evolution is not very sensitive to the precise value of $P_\Lambda/\rho_\Lambda$. For $P_\Lambda/\rho_\Lambda = -\frac12$ (the least negative pressure allowed by current observations), as for $P_\Lambda/\rho_\Lambda = -1$ (vacuum), the dark energy begins to influence $V(a)$ significantly only in the modern era, and its influence is to accelerate the expansion in accord with observation. It is of no dynamical importance earlier. However, nothing requires that $P_\Lambda/\rho_\Lambda$ be constant.
It is possible, in principle, for $P_\Lambda/\rho_\Lambda$ to evolve in a wide variety of ways that could have had a strong influence on the universe’s early evolution. That this probably did not occur we know from observational data which show that the dark energy cannot have had a very significant influence on the universal expansion at several key epochs in the past: (i) during the nucleosynthesis of light elements, when the universe was about 1 minute old; (ii) during recombination of the primordial plasma (conversion from ionization to neutrality), when the universe was a few hundred thousand years old; and (iii) during early stages of galaxy formation, when the universe was about 1 billion years old. Nevertheless, we are so ignorant, today, of the precise nature of the dark energy that we must be prepared for new surprises. By contrast, the evolution of the dark energy in the future and the resulting evolution of the universe are unconstrained by observation and are unknown. Until we learn for sure the nature and dynamics of the dark energy, we cannot predict the universe’s future evolution.

[Figure 26.3: the potential V(a) plotted against the expansion factor a, on a vertical axis labeled “Energy”, with total-energy levels for k = −1, 0, +1.]

Fig. 26.3: The “particle-in-a-potential” depiction of the evolutionary equation (26.39) for the expansion factor $a$ of the universe. Plotted horizontally is the expansion factor, which plays the role of the position of a particle. Plotted vertically is the “particle’s” potential energy $V(a)$ (thick curve) and its total energy $-k/2$ (thin dotted line). The difference in height between the dotted line and the thick curve is the particle’s kinetic energy $\frac12\dot a^2$, which must be positive. The form of $V(a)$ in the past ($a \le a_o$) is shown solid. The form in the future is unknown because we do not know the nature of the dark energy. If the form is that of the upper thick dashed curve, the universe may reach a maximum size and then recontract. If the form is that of the lower dashed curve, the universal expansion will continue to accelerate forever.

26.4.4 Evolution in time and redshift

Since the dark energy cannot have had a very important dynamical role in the past, we shall ignore it in the remainder of this section and shall idealize the universe as containing only cold matter, with density $\rho_M = \rho_{Mo}(a_o/a)^3$, and radiation, with density $\rho_R = \rho_{Ro}(a_o/a)^4$. The radiation includes the cosmic background photons (which today are in the microwave frequency band), plus gravitons, plus those neutrinos whose rest masses are much less than their thermal energies. In order for the observed abundances of the light elements to agree with the theory of their nucleosynthesis, it is necessary that the neutrino and graviton contributions to $\rho_R$ be less than or of order the photon contributions. The photons were in thermal equilibrium with other forms of matter in the early universe and thus had a Planckian spectrum. Since black-body radiation has energy density $\rho_R \propto T_R^4$ and the density decreases with expansion as $\rho_R \propto 1/a^4$, the photon temperature must have been redshifted during this early era as $T_R \propto 1/a$.

Box 26.3 Interaction of Radiation and Matter

In the present epoch there is negligible radiation/matter interaction: the radiation propagates freely, with negligible absorption by interstellar or intergalactic gas. However, much earlier, when the matter was much denser and the radiation much hotter than today and galaxies had not yet formed, the interaction must have been so strong as to keep the photons and matter in thermodynamic equilibrium with each other. In this early epoch the matter temperature, left to its own devices, would have liked to drop as $1/{\rm Volume}^{2/3}$, i.e., as $1/a^2$, while the radiation temperature, left to its own devices, would have dropped as $1/{\rm Volume}^{1/3}$, i.e., as $1/a$. To keep their temperatures equal, the photons had to feed energy into the matter. This feeding was not at all a serious drain on the photons’ energy supply, however: today the ratio of the number density of background photons to the number density of baryons (i.e., protons and neutrons) is

$$ \frac{n_{Ro}}{n_{Mo}} = \frac{a_R T_{Ro}^4/(2.8kT_{Ro})}{\rho_{Mo}/m_p} \sim 10^8, \tag{1} $$

where $m_p$ is the proton mass, $k$ is Boltzmann’s constant, and $2.8kT_{Ro}$ is the average energy of a black-body photon at the CMB temperature $T_{Ro} = 2.728$ K. Because, aside from a factor of order unity, this ratio is the entropy per baryon in the universe (Chaps. 3 and 4), and because the entropy per baryon was (nearly) conserved during the (nearly) adiabatic expansion of the universe, this ratio was about the same in the early era of thermal equilibrium as it is today. Since the specific heat per photon and that per baryon are both of order Boltzmann’s constant $k$, the specific heat of a unit volume of background radiation in the early era exceeded that of a unit volume of matter by eight orders of magnitude. Consequently, the radiation could keep the matter’s temperature equal to its own with little effort; and accordingly, despite their interaction, the radiation by itself satisfied energy conservation to high accuracy. This remained true, going backward in time, until the temperature reached $T \sim \frac15 m_e/k \sim 10^9$ K, at which point electron-positron pairs formed in profusion and sucked roughly half the photon energy density out of the photons. This pair formation, going backward in time, or pair annihilation going forward, occurred when the universe was several tens of seconds old (Fig. 26.4) and can be regarded as converting one form of radiation (photons) into another (relativistic pairs). Going further backward in time, at $T \sim m_p/k \sim 10^{13}$ K, the neutrons and protons (baryons) became relativistic, so cold matter ceased to exist; equivalently, going forward in time, cold matter formed at $T \sim m_p/k \sim 10^{13}$ K. As is shown in the text, the dark energy only became significant in the modern era; so its interaction with cold matter and radiation (if any, and there presumably is very little) cannot have been important during the universe’s past evolution.

When the temperature dropped below $\sim 10^4$ K, the electrons dropped into bound states around atomic nuclei, making the matter neutral and its opacity negligible, so the photons were liberated from interaction with the matter and began to propagate freely. Kinetic theory (Box 26.4) tells us that during this free propagation, the photons retained their Planckian spectrum, and their temperature continued to be redshifted as $T_R \propto 1/a$. In accord with this prediction, the spectrum of the photons today is measured to be Planckian to very high accuracy; its temperature is $T_{Ro} = 2.728$ K, corresponding to a photon energy density today $\rho_{\gamma o} \sim 5\times10^{-34}$ g cm$^{-3}$. Adding to this the neutrino and graviton energy densities, we conclude that $\rho_{Ro} \sim 10^{-33}$ g cm$^{-3}$. By contrast, the matter density today is $\rho_{Mo} \simeq 3\times10^{-30}$ g cm$^{-3}$. To recapitulate: the matter and radiation densities and temperatures must have evolved as

$$ \rho_M = \rho_{Mo}\frac{a_o^3}{a^3}, \qquad \rho_R = \rho_{Ro}\frac{a_o^4}{a^4}, \qquad T_R = T_{Ro}\frac{a_o}{a}, \tag{26.41} $$

throughout the entire epoch from $a/a_o \sim 3\times10^{-13}$ until today. The density and temperature evolutions (26.41) are depicted as functions of the universe’s expansion factor $a/a_o$ in Fig. 26.4. A second way to express the evolution is in terms of cosmological redshift: Imagine photons emitted in some chosen spectral line of some chosen type of atom (e.g., the Lyman alpha line of atomic hydrogen), at some chosen epoch during the universe’s evolution. Let the atoms be at rest in the mean rest frame of the matter and radiation, i.e., in the rest frame of a homogeneous observer, so they move orthogonally to the homogeneous hypersurfaces. Focus attention on specific photons that manage to propagate to Earth without any interaction whatsoever with matter.
Then they will arrive with a wavelength, as measured on Earth today, which is much larger than that with which they were emitted: the expansion of the universe has increased their wavelength, i.e., has redshifted them. As is shown in Exercise 26.6 below, if the expansion factor was $a$ at the time of their emission, and if their wavelength at emission as measured by the emitter was $\lambda$, then at reception on Earth as measured by an astronomer, they will have wavelength $\lambda_o$ given by

$$ \frac{\lambda_o}{\lambda} = \frac{a_o}{a}; \tag{26.42} $$

i.e., the photons’ wavelength is redshifted in direct proportion to the expansion factor of the universe. It is conventional to define the redshift $z$ not as the ratio of the wavelength today to that when emitted, but rather as the fractional change in wavelength, so

$$ z \equiv \frac{\lambda_o - \lambda}{\lambda} = \frac{a_o}{a} - 1. \tag{26.43} $$
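Relations (26.41) and (26.43) make it easy to tabulate physical conditions at any redshift. The sketch below uses the present-day values quoted in this section ($T_{Ro} = 2.728$ K, $\rho_{Ro} \sim 10^{-33}$ g/cm$^3$, $\rho_{Mo} \simeq 3\times10^{-30}$ g/cm$^3$), which are order-of-magnitude inputs; the function names are hypothetical.

```python
T_RO = 2.728      # K, CMB temperature today
RHO_RO = 1e-33    # g/cm^3, radiation density today (order of magnitude)
RHO_MO = 3e-30    # g/cm^3, cold-matter density today

def expansion_ratio(z):
    """a/a_o from Eq. (26.43): z = a_o/a - 1."""
    return 1.0 / (1.0 + z)

def conditions_at(z):
    """rho_M, rho_R (g/cm^3) and T_R (K) at redshift z, from Eq. (26.41)."""
    x = expansion_ratio(z)
    return {"rho_M": RHO_MO / x**3, "rho_R": RHO_RO / x**4, "T_R": T_RO / x}

def equality_redshift():
    """Redshift where rho_R = rho_M, i.e. where a/a_o = rho_Ro / rho_Mo."""
    return RHO_MO / RHO_RO - 1.0

if __name__ == "__main__":
    for z in (1090.0, equality_redshift(), 3e8):
        c = conditions_at(z)
        print(f"z = {z:9.3g}: T_R = {c['T_R']:.3g} K, "
              f"rho_M = {c['rho_M']:.2g}, rho_R = {c['rho_R']:.2g} g/cm^3")
```

At $z = 1090$ this gives $T_R \approx 3000$ K; with these inputs matter-radiation equality falls at $z \approx 3000$, where both densities are about $8\times10^{-20}$ g/cm$^3$.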

Box 26.4 Kinetic Theory of Photons in General Relativity

The kinetic theory of photons and other particles (Chap. 2) can be lifted from special relativity into general relativity using the equivalence principle: In any local Lorentz frame in curved spacetime, the number density in phase space is given by the special relativity expression (2.3): $\mathcal N(\mathcal P, \vec p) = dN/d\mathcal V_x\, d\mathcal V_p$. Here $\mathcal P$ is the location of the observer in spacetime, $\vec p$ is the momentum of some chosen “fiducial” photon, $d\mathcal V_x$ is a small 3-volume at $\mathcal P$ in the physical space of the observer’s local Lorentz frame, $d\mathcal V_p$ is a small 3-volume in the momentum space of the observer’s local Lorentz frame, centered on $\vec p$, and $dN$ is the number of photons in $d\mathcal V_x$ and $d\mathcal V_p$. For a homogeneous observer, we can choose $d\mathcal V_x = a^3\Sigma^2\sin\theta\,d\chi\,d\theta\,d\phi$ and $d\mathcal V_p = dp_{\hat\chi}\,dp_{\hat\theta}\,dp_{\hat\phi}$, where the hats denote components on the unit vectors $\vec e_{\hat\chi}$, $\vec e_{\hat\theta}$, $\vec e_{\hat\phi}$. The equivalence principle guarantees that, just as in flat spacetime, so also in curved spacetime, (i) the number density in phase space, $\mathcal N$, is independent of the velocity of the local Lorentz frame in which it is measured (with all the frames presumed to be passing through the event $\mathcal P$); and (ii) if the photons do not interact with matter, then $\mathcal N$ is constant along the world line of any chosen (fiducial) photon as it moves through spacetime and as its 4-momentum $\vec p$ evolves in the free-particle (geodesic) manner. [In asserting this constancy of $\mathcal N$, one must examine carefully the issue of curvature coupling; Sec. 23.7. Because the volume element $d\mathcal V_x$ involved in the definition of $\mathcal N$ has some finite, though tiny, size, spacetime curvature will produce geodesic deviation between photons on opposite sides of $d\mathcal V_x$. One can show fairly easily, however, that this geodesic deviation merely skews the phase-space volume element along its momentum directions in a manner analogous to Fig. 2.6(b), while leaving the product $d\mathcal V_x\,d\mathcal V_p$ fixed and thereby leaving $\mathcal N$ unchanged; cf. Sec. 2.7.]

The equivalence principle also guarantees that in curved spacetime, as in flat, the number density in phase space can be expressed in terms of the specific intensity $I_\nu$ and the frequency $\nu$ of the chosen photon (as measured in any local Lorentz frame): $\mathcal N = h^{-4}I_\nu/\nu^3$ [Eq. (2.18)]. If the spectrum is Planckian with temperature $T$ as measured in this Lorentz frame, then $\mathcal N$ will have the form

$$ \mathcal N = \frac{2}{h^3}\,\frac{1}{e^{h\nu/kT} - 1}. \tag{1} $$

The Lorentz invariance and conservation of $\mathcal N$, together with the fact that this $\mathcal N$ depends only on the ratio $\nu/T$, imply that (i) a spectrum that is Planckian in one reference frame will be Planckian in all reference frames, with the temperature $T$ getting Doppler shifted in precisely the same manner as the photon frequency $\nu$; and (ii) an initially Planckian spectrum will remain always Planckian (under free propagation), with its temperature experiencing the same cosmological redshift, gravitational redshift, or other redshift as the frequencies of its individual photons. For the CMB as measured by homogeneous observers, the frequencies of individual photons get redshifted by the expansion as $\nu \propto 1/a$, so the photon temperature also gets redshifted as $T \propto 1/a$.

[Figure 26.4: log-log plot of $\rho_M$ and $\rho_R$ (g/cm$^3$) and $T_R$ (K) against $a/a_o$ and $z$ along the bottom, with time $t$ in seconds along the top axis.]

Fig. 26.4: The evolution of the total mass-energy densities $\rho_M$ and $\rho_R$ in matter and in radiation and the radiation’s photon temperature $T_R$, as functions of the expansion factor $a$ of the universe, the cosmological redshift $z$, and the proper time $t$ (in the mean rest frame of the matter and radiation) since the “big bang.”

In Fig. 26.4’s depiction of the density evolution of the universe, the horizontal axis at the bottom is marked off in units of $z$. It is also instructive to examine the density evolution in terms of proper time $t$, as measured in the mean rest frame of the matter and radiation, i.e., as measured by clocks carried by the homogeneous observers. At redshifts $z > \rho_{Mo}/\rho_{Ro} \sim 3000$, when the energy in radiation dominated over that in matter, $a$ as a function of time was governed by the Einstein field equation

$$ \left(\frac{\dot a}{a}\right)^2 + \frac{k}{a^2} = \frac{8\pi}{3}\rho_R = \frac{8\pi}{3}\rho_{Ro}\left(\frac{a_o}{a}\right)^4 \tag{26.44} $$

[Eq. (26.30)]. As we shall see in Sec. 26.5 below, the present epoch of the universe’s expansion is an early enough one that, if the universe is closed or open, its evolution has only recently begun to depart significantly from that of a flat, $k = 0$ model. Correspondingly, in the early, radiation-dominated era, the evolution was independent of $k$ to high precision, i.e., the term $k/a^2$ in the evolution equation was negligible. Ignoring that term and integrating Eq. (26.30) with $\rho \simeq \rho_R = \rho_{Ro}(a_o/a)^4$, we obtain

$$ \rho \simeq \rho_R = \frac{3}{32\pi t^2}, \qquad a = a_o\left(\frac{32\pi}{3}\rho_{Ro}\,t^2\right)^{1/4} \quad\text{when}\quad \frac{a}{a_o} \lesssim \frac{\rho_{Ro}}{\rho_{Mo}} \sim 3\times10^{-4}. \tag{26.45} $$

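Here the origin of time, $t = 0$, is taken to be at the moment when the expansion of the universe began: the “big bang.” This early, radiation-dominated era ended at a cross-over time

$$ t_c = \left[\frac{3}{32\pi\rho_{Ro}}\left(\frac{\rho_{Ro}}{\rho_{Mo}}\right)^4\right]^{1/2} \sim \left[(3\times10^{-4})^4\times\frac{3}{32\pi\times10^{-33}\ {\rm g/cm^3}}\right]^{1/2} \times\left(\frac{1\ {\rm g}}{0.742\times10^{-28}\ {\rm cm}}\right)^{1/2} \times\frac{1\ {\rm year}}{0.946\times10^{18}\ {\rm cm}} \sim 70{,}000\ {\rm years}. \tag{26.46} $$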

In this calculation the last two factors are introduced to convert from geometrized units to conventional units. After the cross-over time the solution to the Einstein equation is that for pressure-free matter. The precise details of the time evolution will depend on whether the universe is open, closed, or flat; but the three possibilities will agree up to the present epoch to within a few tens of per cent (see Sec. 26.5). Ignoring the differences between open, closed, and flat, we can adopt the $k = 0$, pressure-free evolution as given by Friedmann’s flat model [Eqs. (26.48) and (26.50) of Exercise 26.4], but with the origin of time adjusted to match onto the radiation-dominated solution (26.45) at the cross-over time:

$$ \rho \simeq \rho_M = \frac{1}{6\pi(t + t_c/3)^2}, \qquad \frac{a}{a_o} = \left[6\pi\rho_{Mo}\left(t + \frac{t_c}{3}\right)^2\right]^{1/3} \quad\text{when}\quad \frac{a}{a_o} \gtrsim \frac{\rho_{Ro}}{\rho_{Mo}} \sim 3\times10^{-4}. \tag{26.47} $$

The present age of the universe, as evaluated by setting $\rho_M = \rho_{Mo}$ in this formula and converting to cgs units, is of order $10^{10}$ years. We shall evaluate the age with higher precision in Sec. 26.5.8 below. In Fig. 26.4’s depiction of the evolution, the time $t$ since the big bang, as computed from Eqs. (26.45), (26.46), and (26.47), is marked along the top axis.
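Equations (26.45) through (26.47) can be evaluated directly. The sketch below works in geometrized units, converting densities with $G/c^2 = 0.742\times10^{-28}$ cm/g and times to years with 1 light-year $= 0.946\times10^{18}$ cm (the conversion factors used in (26.46)); the present-day densities are this section’s order-of-magnitude values, and the function names are hypothetical. It also checks that the matched matter-era density (26.47) equals the radiation-era density (26.45) at $t = t_c$.

```python
import math

G_OVER_C2 = 0.742e-28          # cm per gram: rho [g/cm^3] -> rho [cm^-2]
CM_PER_YEAR = 0.946e18         # light-year in cm: t [cm] -> t [years]
RHO_RO, RHO_MO = 1e-33, 3e-30  # g/cm^3, present radiation and matter densities

def rho_radiation_era(t):
    """Eq. (26.45) in geometrized units: rho = 3 / (32 pi t^2)."""
    return 3.0 / (32.0 * math.pi * t**2)

def rho_matter_era(t, t_c):
    """Eq. (26.47) in geometrized units: rho = 1 / (6 pi (t + t_c/3)^2)."""
    return 1.0 / (6.0 * math.pi * (t + t_c / 3.0) ** 2)

def crossover_time_cm(rho_ro=RHO_RO, rho_mo=RHO_MO):
    """Eq. (26.46): t_c = [3/(32 pi rho_Ro) (rho_Ro/rho_Mo)^4]^(1/2), returned in cm."""
    r = rho_ro * G_OVER_C2
    return math.sqrt(3.0 / (32.0 * math.pi * r) * (rho_ro / rho_mo) ** 4)

def present_age_cm(rho_ro=RHO_RO, rho_mo=RHO_MO):
    """Set rho_M = rho_Mo in Eq. (26.47) and solve for t (in cm)."""
    t_c = crossover_time_cm(rho_ro, rho_mo)
    return 1.0 / math.sqrt(6.0 * math.pi * rho_mo * G_OVER_C2) - t_c / 3.0

if __name__ == "__main__":
    t_c = crossover_time_cm()
    print(f"t_c ~ {t_c / CM_PER_YEAR:.2g} years")
    print(f"age ~ {present_age_cm() / CM_PER_YEAR:.2g} years")
    print("density ratio at t_c:", rho_radiation_era(t_c) / rho_matter_era(t_c, t_c))
```

With these inputs $t_c$ comes out near $7\times10^4$ years and the age near $1.6\times10^{10}$ years, i.e., of order $10^{10}$ years. The density ratio at $t_c$ is 1 up to rounding, because $1/[6\pi(4t_c/3)^2] = 3/(32\pi t_c^2)$ exactly; that identity is what fixes the shift $t_c/3$ in (26.47).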

26.4.5 Physical processes in the expanding universe

The evolution laws (26.41) and (26.45), (26.46), and (26.47) for $\rho_M$ and $\rho_R$ are a powerful foundation for deducing the main features of the physical evolution of the universe from the very early epoch, $a/a_o \sim 3\times10^{-13}$, i.e., $z \sim 3\times10^{12}$ and $t \sim 10^{-5}$ sec, up to the present. For detailed, pedagogical presentations of those features see, e.g., Peebles (1971) and Zel’dovich and Novikov (1983). Here we shall just present a very short summary. Some key physical events that one deduces during the evolution from $z = 3\times10^{12}$ to the present, $z = 0$, are these:

(i) At redshift $z \sim 3\times10^{12}$, baryon-antibaryon pairs annihilated and the thermal energies of neutrons and protons became much smaller than their rest-mass energies. This was the epoch of formation of baryonic cold matter.

(ii) At redshifts $z \sim 10^9$, when the universe was of order a second old, the photons ceased being energetic enough to make electron-positron pairs; the pairs, which earlier had been plentiful, annihilated, feeding their energy into photons; and with their annihilation the primordial gas suddenly became transparent to neutrinos. Since then the neutrinos, born in thermodynamic equilibrium at $z > 10^9$, should have propagated freely.

(iii) At redshifts $z \sim 3\times10^8$, when the universe was a few minutes old, $\rho_R$ was roughly 1 g/cm$^3$, and the temperature was $T_R = T_M \sim 10^9$ K, nuclear burning took place. Going into this epoch of primordial nucleosynthesis, the matter consisted of protons, neutrons, and electrons, with roughly one neutron for every seven protons. Coming out, according to evolutionary calculations for the relevant nuclear reactions, it consisted of about 75 per cent protons (by mass), 25 per cent alpha particles ($^4$He nuclei), and tiny ($\lesssim 10^{-4}$) but observationally important amounts of deuterium, $^3$He, lithium, beryllium, and boron. [The agreement of these predictions with observation constitutes strong evidence that cosmologists are on the right track in their deductions about the early universe.] All the elements heavier than boron were almost certainly made in stars when the universe was billions of years old.

(iv) At the redshift $z \sim 3000$, when the universe was about 70,000 years old and $T_R \sim T_M$ was about $10^4$ K, came the cross-over from radiation dominance to matter dominance; i.e., $\rho_R = \rho_M \sim 10^{-19}$ g/cm$^3$.

(v) At the redshift $z \simeq 1090$, when the universe was $\simeq 380{,}000$ years old and its temperature had dropped to roughly 3000 K and its density to $\rho_M \sim 10^{-20}$ g/cm$^3$, the electrons in the primordial plasma were captured by the protons and alpha particles to form neutral hydrogen and helium. Before this epoch of recombination the matter was highly ionized and opaque to radiation; afterward it was highly neutral and transparent.

(vi) Before recombination, if any matter tried to condense into stars or galaxies, it would get adiabatically heated as it condensed, and the rising pressure of radiation trapped inside it would prevent the condensation from proceeding. After recombination, radiation was no longer trapped inside a growing condensation. Now, for the first time, stars, galaxies, and clusters of galaxies could begin to form. Measured anisotropies of the CMB, however, tell us that the size of the density fluctuations at recombination was $\Delta\rho_M/\rho_M \sim 10^{-4}$, which is just the right size to grow, by gravitational condensation, to $\Delta\rho/\rho \sim 1$ at $z \sim 10$. Thus it is that the epoch of galaxy formation probably began around a redshift $z \sim 10$, when the universe was already about half a billion years old, compared to its present age of roughly 14 billion years.

(vii) In galaxies such as ours there has been, since formation, a continuing history of stellar births, nucleosynthesis, and deaths. The unravelling of our Galaxy’s nucleosynthesis history, and the resulting understanding of the origin of all the elements heavier than boron, was achieved in large measure in the 1950s, 60s and 70s by nuclear astrophysicists under the leadership of Caltech’s William A. Fowler.
Our own sun is of the second generation or later: by measuring in meteorites the relative abundances of unstable atomic nuclei and their decay products, Caltech’s Gerald J. Wasserburg and his colleagues have deduced that the solar system is only 4.58 billion years old, i.e., it formed when our Milky Way Galaxy was already $\sim 5$ billion years old. Before recombination the radiation was kept thermalized by interactions with matter. However, since recombination the radiation has propagated freely, interacting only with the gravitation (spacetime curvature) of our universe.

****************************

EXERCISES

Exercise 26.4 Example: Friedmann’s Cosmological Models

Consider a model universe of the type envisioned by Alexander Friedmann (1922), one with zero pressure (i.e., containing only cold matter), so its density is $\rho_M \propto 1/a^3$; cf. Eq. (26.38). Write this density in the form

$$ \rho = \frac{3}{8\pi}\frac{a_m}{a^3}, \tag{26.48} $$

where $a_m$ is a constant whose normalization (the factor $3/8\pi$) is chosen for later convenience.

(a) Draw the effective potential $V(a)$ for this model universe, and from it discuss qualitatively the evolution in the three cases $k = 0, \pm1$.

[Figure 26.5: the expansion factor a plotted against t for k = −1, 0, +1.]

Fig. 26.5: Time evolution of the expansion factor $a(t)$ for the zero-pressure Friedmann cosmological models (Exercise 26.4). The three curves correspond to the closed, $k = +1$ model; the flat, $k = 0$ model; and the open, $k = -1$ model.

(b) Show that for a closed, $k = +1$ universe with zero pressure, the expansion factor $a$ evolves as follows:

$$ a = \frac{a_m}{2}(1 - \cos\eta), \qquad t = \frac{a_m}{2}(\eta - \sin\eta). \tag{26.49} $$

Here $\eta$ is a parameter which we shall use as a time coordinate in Sec. 26.5 [Eq. (26.80)] below. This $a(t)$, depicted in Fig. 26.5, is a cycloid.

(c) Show that for a flat, $k = 0$ universe, the evolution is given by

$$ a = \left(\frac{9a_m}{4}\right)^{1/3} t^{2/3}, \qquad \rho = \frac{1}{6\pi t^2}. \tag{26.50} $$

(d) Show that for an open, $k = -1$ universe, the evolution is given by

$$ a = \frac{a_m}{2}(\cosh\eta - 1), \qquad t = \frac{a_m}{2}(\sinh\eta - \eta). \tag{26.51} $$
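A quick numerical check of parts (b) through (d): with the density (26.48), the evolution equation (26.39) becomes $\dot a^2 = a_m/a - k$, so each solution should make $\dot a^2 + k - a_m/a$ vanish, where for the parametric cases $\dot a = (da/d\eta)/(dt/d\eta)$. The sketch below evaluates that residual and compares the closed and flat models at small $a$. It is a verification aid under those assumptions, not a substitute for the derivations the exercise asks for; the function names are hypothetical.

```python
import math

def closed_model(eta, a_m=1.0):
    """Eq. (26.49): returns (t, a, da/dt) for k = +1."""
    a = 0.5 * a_m * (1.0 - math.cos(eta))
    t = 0.5 * a_m * (eta - math.sin(eta))
    adot = math.sin(eta) / (1.0 - math.cos(eta))   # (da/deta) / (dt/deta)
    return t, a, adot

def open_model(eta, a_m=1.0):
    """Eq. (26.51): returns (t, a, da/dt) for k = -1."""
    a = 0.5 * a_m * (math.cosh(eta) - 1.0)
    t = 0.5 * a_m * (math.sinh(eta) - eta)
    adot = math.sinh(eta) / (math.cosh(eta) - 1.0)
    return t, a, adot

def flat_model(t, a_m=1.0):
    """Eq. (26.50): returns (t, a, da/dt) for k = 0."""
    c = (9.0 * a_m / 4.0) ** (1.0 / 3.0)
    return t, c * t ** (2.0 / 3.0), (2.0 / 3.0) * c * t ** (-1.0 / 3.0)

def residual(a, adot, k, a_m=1.0):
    """adot^2 + k - a_m/a, which should vanish for every model."""
    return adot**2 + k - a_m / a

def flat_time_at(a, a_m=1.0):
    """Invert Eq. (26.50): t = (2/3) a^(3/2) / sqrt(a_m)."""
    return (2.0 / 3.0) * a**1.5 / math.sqrt(a_m)

if __name__ == "__main__":
    for eta in (0.5, 1.5, 3.0):
        _, a, adot = closed_model(eta)
        print("closed", eta, residual(a, adot, +1))
        _, a, adot = open_model(eta)
        print("open  ", eta, residual(a, adot, -1))
    t_closed, a_small, _ = closed_model(0.01)
    print("small-a times (closed, flat):", t_closed, flat_time_at(a_small))
```

At $\eta = 0.01$ the closed-model time and the flat-model time for the same $a \ll a_m$ agree to a few parts in $10^6$, which is the near-identity of the three models noted below.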

Note, as shown in Fig. 26.5, that for small expansion factors, $a \ll a_m$, the evolutions of the three models are almost identical.

Exercise 26.5 Problem: Einstein’s Static Universe

Consider a model universe of the sort that Einstein (1917) envisioned: one with a nonzero, positive cosmological constant and containing matter with negligible pressure, $P = 0$. Reinterpret this in modern language as a universe with cold matter and a nonzero vacuum stress-energy. Einstein believed that (when averaged over the motions of all the stars) the universe must be static, i.e., neither expanding nor contracting: $a =$ constant, independent of time.

(a) Show that Einstein’s equations do admit a solution of this form, and deduce from it (i) the spatial geometry of the universe (spherical, flat, or hyperboloidal), and (ii) relationships between the universe’s “radius” $a$, its matter density $\rho_M$, and its vacuum energy density $\rho_\Lambda$.

(b) Show that Einstein’s static cosmological model is unstable against small perturbations of its “radius”: if $a$ is reduced slightly from its static, equilibrium value, the universe will begin to collapse; if $a$ is increased slightly, the universe will begin to expand. Einstein seems not to have noticed this instability. For a historical discussion of Einstein’s ideas about cosmology, see Sec. 15e of Pais (1982).

Exercise 26.6 Example: Cosmological Redshift

Consider a particle, with finite rest mass or zero, that travels freely through a homogeneous, isotropic universe. Let the particle have energy $E$ as measured by a homogeneous observer side by side with it when it starts its travels, at some early epoch in the universe’s evolution; and denote by $E_o$ its energy as measured by a homogeneous observer at its location, near Earth, today. Denote by

$$ p = \sqrt{E^2 - m^2}, \qquad p_o = \sqrt{E_o^2 - m^2} \tag{26.52} $$

the momentum of the particle as measured in the early epoch and today. In this problem you will evaluate the ratio of the momentum today to the momentum in the early epoch, $p_o/p$, and will deduce some consequences of that ratio.

(a) Place the spatial origin, $\chi = 0$, of the spatial coordinates of a Robertson-Walker coordinate system [Eq. (26.6)] at the point where the particle started its travel. (Homogeneity guarantees we can put the spatial origin anywhere we wish.) Orient the coordinates so the particle starts out moving along the “equatorial plane” of the coordinate system, $\theta = \pi/2$, and along $\phi = 0$. (Isotropy guarantees we can orient our spherical coordinates about their origin in any way we wish.) Spherical symmetry about the origin then guarantees the particle will always continue to travel radially, with $\theta = \pi/2$ and $\phi = 0$ all along its world line. In other words, the only nonvanishing contravariant components of its 4-momentum are $p^t = dt/d\zeta$ and $p^\chi = d\chi/d\zeta$. Because the metric is diagonal, lowering the indices shows that the only nonvanishing covariant components are $p_t$ and $p_\chi$. Show that the quantity $p_\chi$ is conserved along the particle’s world line. (b) Express the momentum $p$ measured by the homogeneous observer at the starting point and the momentum $p_o$ measured near Earth today in terms of $p_\chi$. [Hint: The local Lorentz frame of a homogeneous observer has the basis vectors (26.20).] Show that

$$\frac{p_o}{p} = \frac{a}{a_o} = \frac{1}{1+z}, \qquad (26.53)$$

where $z$ is the cosmological redshift at the starting point.
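The Python sketch below gives a quick numerical check of Eq. (26.53) and of the slow-particle limit (26.54) in part (d). The function names are hypothetical and units have $c = 1$. The code redshifts the momentum per unit rest mass and then converts back to a speed exactly, so it also shows that $v_o = v/(1+z)$ is only the nonrelativistic limit.

```python
import math


def redshift_momentum(p: float, z: float) -> float:
    """Eq. (26.53): momentum seen by homogeneous observers falls as 1/(1+z)."""
    return p / (1.0 + z)


def speed_today(v: float, z: float) -> float:
    """Exact speed today (c = 1) of a free particle launched with speed v.

    Redshift the momentum per unit rest mass, p/m = v*gamma, with Eq. (26.53),
    then convert back using v_o = (p_o/m) / sqrt(1 + (p_o/m)**2).
    """
    p_per_m = v / math.sqrt(1.0 - v * v)
    po_per_m = redshift_momentum(p_per_m, z)
    return po_per_m / math.sqrt(1.0 + po_per_m**2)


# Photon (E = p): energy, and hence frequency, drops by 1 + z.
print(redshift_momentum(3.0, 2.0))        # 1.0
# Slow particle: agrees with v_o = v/(1+z), Eq. (26.54).
print(speed_today(1e-3, 1.0), 1e-3 / 2.0)
# Ultrarelativistic particle: stays close to the speed of light.
print(speed_today(0.999999, 1.0))
```

For the slow particle the two printed speeds agree to better than one part in $10^6$. The ultrarelativistic particle is still moving at about $0.999996$ after the universe has doubled in size.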

(c) Show that if the particle is a photon, then its wavelength is redshifted in accord with Eqs. (26.42) and (26.43). (d) Suppose the particle has finite rest mass and has speed $v \ll 1$ at its starting point, as measured by a homogeneous observer there. Show that its velocity today, as measured by the near-Earth homogeneous observer, will be

$$v_o = \frac{a}{a_o}\,v = \frac{v}{1+z}. \qquad (26.54)$$

Exercise 26.7 Practice: Cosmic Microwave Radiation in an Anisotropic Cosmological Model
Consider a cosmological model with the spacetime metric

$$ds^2 = -dt^2 + a^2(t)\,dx^2 + b^2(t)\,dy^2 + c^2(t)\,dz^2. \qquad (26.55)$$

The quantities $a$, $b$, and $c$ (not to be confused with the speed of light, which is unity in this chapter) are expansion factors for the evolution of the universe along its $x$, $y$, and $z$ axes. The Einstein field equation governs the time evolution of these expansion factors, but the details of that evolution will not be important to us in this problem.

(a) Show that the space slices $t = $ const in this model have Euclidean geometry, so the model is spatially flat and homogeneous. Show that the observers who see these slices as hypersurfaces of simultaneity, i.e., the homogeneous observers, have world lines of constant $x$, $y$, and $z$, and that their proper time is equal to the coordinate time $t$.

(b) At time $t_e$, when the expansion factors were $a_e$, $b_e$, and $c_e$, the universe was filled with isotropic black-body photons with temperature $T_e$, as measured by homogeneous observers. For each photon, define $p_x \equiv \vec p \cdot \partial/\partial x$, $p_y \equiv \vec p \cdot \partial/\partial y$, and $p_z \equiv \vec p \cdot \partial/\partial z$. Show that in terms of these quantities the photon distribution function at time $t_e$ is

$$N = \frac{2}{h^3}\,\frac{1}{e^{E/kT_e} - 1}, \qquad \text{where} \quad E = \left[\left(\frac{p_x}{a_e}\right)^2 + \left(\frac{p_y}{b_e}\right)^2 + \left(\frac{p_z}{c_e}\right)^2\right]^{1/2}. \qquad (26.56)$$

(c) After time $t_e$ each photon moves freely through spacetime (no emission, absorption, or scattering). Explain why $p_x$, $p_y$, and $p_z$ are constants of the motion along the phase-space trajectory of each photon.

(d) Explain why $N$, expressed in terms of $p_x$, $p_y$, $p_z$, retains precisely the form (26.56) for all times $t > t_e$.

(e) At time $t_o > t_e$, when the expansion factors are $a_o$, $b_o$, $c_o$, what are the basis vectors $\vec e_{\hat 0}$, $\vec e_{\hat x}$, $\vec e_{\hat y}$, $\vec e_{\hat z}$ of the local Lorentz frame of a homogeneous observer?

(f) Suppose that such an observer looks at the photons coming in from a direction $\mathbf n = n_{\hat x}\vec e_{\hat x} + n_{\hat y}\vec e_{\hat y} + n_{\hat z}\vec e_{\hat z}$ on the sky. Show that she sees a precisely Planck frequency distribution, with a temperature $T_o$ that depends on the direction $\mathbf n$ in which she looks:

$$T_o = T_e\left[\left(\frac{a_o}{a_e}\,n_{\hat x}\right)^2 + \left(\frac{b_o}{b_e}\,n_{\hat y}\right)^2 + \left(\frac{c_o}{c_e}\,n_{\hat z}\right)^2\right]^{-1/2}. \qquad (26.57)$$

(g) In the case of isotropic expansion, $a = b = c$, show that $T_o$ is isotropic and is redshifted by the same factor, $1 + z$, as the frequency of each photon [Eqs. (26.42) and (26.43)]:

$$\frac{T_o}{T_e} = \frac{a_e}{a_o} = \frac{1}{1+z}. \qquad (26.58)$$

[The redshift z must not be confused with the coordinate z of Eq. (26.55).]
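Eq. (26.57) is easy to explore numerically. The Python sketch below evaluates the temperature seen in a unit direction $\mathbf n$ and confirms that the isotropic case reduces to Eq. (26.58). The function and argument names are illustrative only.

```python
import math


def observed_temperature(T_e, factors_e, factors_o, n):
    """Eq. (26.57): CMB temperature seen today along the unit direction n.

    factors_e = (a_e, b_e, c_e) and factors_o = (a_o, b_o, c_o) are the
    expansion factors at emission and today; n = (n_x, n_y, n_z).
    """
    s = sum((f_o / f_e * n_i) ** 2
            for f_e, f_o, n_i in zip(factors_e, factors_o, n))
    return T_e / math.sqrt(s)


# Isotropic expansion by a factor 2 (z = 1): T_o = T_e / 2 in every direction.
print(observed_temperature(3000.0, (1, 1, 1), (2, 2, 2), (0, 0, 1)))   # 1500.0
# Expansion only along x: cooler looking along x than along y.
print(observed_temperature(3000.0, (1, 1, 1), (2, 1, 1), (1, 0, 0)))   # 1500.0
print(observed_temperature(3000.0, (1, 1, 1), (2, 1, 1), (0, 1, 0)))   # 3000.0
```

An observer in the anisotropic model would therefore see a quadrupole-like pattern on the sky, with the coldest CMB along the axis that has expanded most.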

****************************

26.5 Observational Cosmology

26.5.1 Parameters characterizing the universe

Our zero-order (homogeneous and isotropic) model of the universe is characterized, today, by the following parameters.

(i) The quantity

$$H_o \equiv \dot a_o/a_o, \qquad (26.59)$$

which is called the Hubble expansion rate, and which determines the critical density $\rho_{\rm crit} = (3/8\pi)H_o^2$ [Eq. (26.35)].

(ii) The density of cold matter measured in units of the critical density, $\Omega_M = \rho_{Mo}/\rho_{\rm crit}$ [Eq. (26.36)].

(iii) The split of $\Omega_M$ into two parts, $\Omega_M = \Omega_B + \Omega_{CDM}$. Here $\Omega_B$ is the portion due to “baryonic matter”: the type of matter (protons, neutrons, electrons, and atoms and molecules made from them) of which stars, galaxies, and interstellar gas are made. $\Omega_{CDM}$ is the portion due to non-baryonic “cold dark matter” (probably axions and/or neutralinos and/or other types of weakly interacting massive particles produced in the big bang).

(iv) The temperature $T_{Ro}$ of the CMB.

(v) The density of radiation in units of the critical density, $\Omega_R = \rho_{Ro}/\rho_{\rm crit}$.

(vi) $\Omega_\Lambda$, the density of dark energy in units of the critical density.

(vii) $P_\Lambda/\rho_\Lambda$, the ratio of the dark energy’s pressure to its density (equal to $-1$ if the dark energy is a nonzero stress-energy of the vacuum).

The time-time component of the Einstein field equation, Eq. (26.26), translated into the notation of our seven parameters, says

$$\frac{k}{a_o^2} = H_o^2(\Omega - 1), \quad \text{where} \quad \Omega = \Omega_M + \Omega_R + \Omega_\Lambda \simeq \Omega_M + \Omega_\Lambda \qquad (26.60)$$

is the total density in units of the critical density. Most of the older literature (before ∼1995) pays much attention to the dimensionless deceleration parameter of the universe, defined as

$$q_o \equiv \frac{-\ddot a_o/a_o}{H_o^2} = \frac{\Omega_M}{2} + \left(1 + 3\frac{P_\Lambda}{\rho_\Lambda}\right)\frac{\Omega_\Lambda}{2}. \qquad (26.61)$$

Here the second equality follows from the space-space component of the Einstein field equation, Eq. (26.27), translated into the language of our parameters, together with the fact that

today the only significant pressure is that of dark energy, $P_\Lambda$. We shall not use $q_o$ in this book. Remarkably, the values of our seven independent parameters $H_o$, $\Omega_B$, $\Omega_M$, $T_{Ro}$, $\Omega_R$, $\Omega_\Lambda$, and $P_\Lambda/\rho_\Lambda$ are all fairly well known today (spring 2003). This is largely thanks to major observational progress in the past several years. In this section we shall discuss the observations that have been most effective in determining these parameters. For greater detail see, e.g., the review article by Turner (1999) and the WMAP results presented by Bennett et al. (2003) and Hinshaw et al. (2009).
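As a sketch, Eqs. (26.60) and (26.61) can be evaluated for the parameter values quoted later in this section. The Python below neglects radiation in $q_o$ and assumes $w_\Lambda \equiv P_\Lambda/\rho_\Lambda$ is constant. The $q_o$ formula follows from $\ddot a/a = -(4\pi/3)(\rho + 3P)$ together with $\rho_{\rm crit} = (3/8\pi)H_o^2$. The function names are hypothetical.

```python
def curvature_over_a2(Omega_M, Omega_R, Omega_L, H_o=1.0):
    """Eq. (26.60): k/a_o^2 = H_o^2 (Omega - 1); H_o = 1 gives units of H_o^2."""
    Omega = Omega_M + Omega_R + Omega_L
    return H_o**2 * (Omega - 1.0)


def deceleration_parameter(Omega_M, Omega_L, w_L=-1.0):
    """q_o = -(a''/a)/H_o^2 = Omega_M/2 + (1 + 3 w_L) Omega_L/2, radiation neglected."""
    return 0.5 * Omega_M + 0.5 * (1.0 + 3.0 * w_L) * Omega_L


print(deceleration_parameter(1.0, 0.0))     # 0.5: matter only, decelerating
print(deceleration_parameter(0.27, 0.73))   # about -0.6: accelerating
print(curvature_over_a2(0.27, 1e-4, 0.73))  # about 1e-4 H_o^2: nearly flat
```

The sign of $q_o$ flips from positive to negative once vacuum-like dark energy dominates. This is the acceleration discussed in Sec. 26.5.9.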

26.5.2 Local Lorentz frame of homogeneous observers near Earth

As a foundation for discussing some of the observations, we shall construct the local Lorentz frame of a homogeneous observer near Earth. (The Earth moves relative to this frame with a speed $v = 630 \pm 20$ km s$^{-1}$, as revealed by a dipole anisotropy in the temperature distribution of the CMB on the Earth’s sky.) Homogeneous observers can have local Lorentz frames because they move freely (along timelike geodesics) through spacetime; cf. Ex. 26.1. For ease of analysis, we place the spatial origin of the Robertson-Walker $\{t, \chi, \theta, \phi\}$ coordinate system at the location of a near-Earth homogeneous observer. That observer’s Lorentz coordinates are then

$$\hat t = t + \tfrac12 \chi^2 a\dot a, \quad \hat x \equiv a\chi\sin\theta\cos\phi, \quad \hat y \equiv a\chi\sin\theta\sin\phi, \quad \hat z \equiv a\chi\cos\theta; \qquad (26.62)$$

cf. Eq. (23.12) and associated discussion. The Lorentz time $\hat t$ differs from the Robertson-Walker time coordinate $t$ only at second order in the distance from the origin of the local Lorentz frame. This second-order difference will never be important for anything we compute, so henceforth we will ignore it and set $\hat t = t$. The near-Earth local Lorentz frame, like any local Lorentz frame, must be kept spatially small compared to the radius of curvature of spacetime. That radius of curvature is related to the Lorentz-frame components of spacetime’s Riemann curvature tensor by

$$\mathcal R \sim \frac{1}{|R_{\hat\alpha\hat\beta\hat\gamma\hat\delta}|^{1/2}}. \qquad (26.63)$$

More precisely, since it is the largest components of Riemann that have the biggest physical effects, we should use the largest components of Riemann when evaluating $\mathcal R$. These largest components of Riemann today turn out to be $\sim \dot a_o^2/a_o^2 = H_o^2$ and $\sim k/a_o^2$. Observations discussed below reveal that $\Omega \sim 1$, so $|k|/a_o^2 \lesssim H_o^2$ [Eq. (26.60)]. Therefore, the universe’s coarse-grain-averaged radius of spacetime curvature today is

$$\mathcal R \sim \frac{1}{H_o}; \qquad (26.64)$$

and the demand that the local Lorentz frame be small compared to this radius of curvature is equivalent to the demand that we confine the local-Lorentz spatial coordinates to the region

$$H_o r \ll 1, \quad \text{where} \quad r \equiv \sqrt{\hat x^2 + \hat y^2 + \hat z^2} = a\chi + O(a\chi^3). \qquad (26.65)$$

(Below we shall neglect the tiny $a\chi^3$ correction.)

26.5.3 Hubble expansion rate

Consider a homogeneous observer near enough to Earth to be in the near-Earth local Lorentz frame, but not at its origin. Such an observer has fixed Robertson-Walker radius $\chi$ and local-Lorentz radius $r = a\chi$. It therefore moves away from the origin of the local Lorentz frame with a velocity, as measured by the frame’s rods and clocks, of $v = dr/dt = \dot a\chi$. Evaluated today, that velocity is $v = \dot a_o\chi$. Correspondingly, special relativity insists that light emitted by this homogeneous observer at local Lorentz radius $r$, and received today by the homogeneous observer at $r = 0$, should be Doppler shifted by an amount $\Delta\lambda/\lambda \simeq v = \dot a_o\chi$. Note that this Doppler shift is proportional to the distance between the homogeneous observers, with the proportionality factor equal to the Hubble constant:

$$z \equiv \frac{\Delta\lambda}{\lambda} = v = H_o r. \qquad (26.66)$$
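The Python sketch below shows why peculiar velocities matter for nearby galaxies. It evaluates Eq. (26.66) in ordinary units, $cz = H_o r$. It also gives the fractional error in $H_o$ that an uncorrected peculiar velocity of a few hundred km/s would introduce for a single galaxy at distance $r$. The constant names, function names and default values are illustrative assumptions.

```python
C_KM_S = 299_792.458   # speed of light in km/s


def hubble_redshift(r_mpc, H_o=70.5):
    """Eq. (26.66), z = H_o r, with r in Mpc and H_o in km/s/Mpc (valid for z << 1)."""
    return H_o * r_mpc / C_KM_S


def peculiar_velocity_bias(r_mpc, v_pec=300.0, H_o=70.5):
    """Fractional error in H_o from one galaxy whose peculiar velocity
    v_pec (km/s) is not corrected: v_pec / (H_o r)."""
    return v_pec / (H_o * r_mpc)


for r in (10.0, 30.0, 100.0):
    print(f"r = {r:5.0f} Mpc   z = {hubble_redshift(r):.4f}   "
          f"error in H_o ~ {peculiar_velocity_bias(r):.1%}")
```

At 10 Mpc a 300 km/s peculiar velocity corrupts $H_o$ by roughly 40 per cent; at 100 Mpc the error drops to roughly 4 per cent. This is why distant galaxies are preferred, even though their distances are harder to measure.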

This Doppler shift is actually nothing but the cosmological redshift, looked at from a new viewpoint. Specialize to emitters and receivers that are near each other, so they can be covered by a single local Lorentz frame. The cosmological redshift formula (26.43) then reduces to

$$z = \frac{a_o}{a} - 1 = \frac{1}{a}(a_o - a) \simeq \frac{1}{a_o}\dot a_o\Delta t = H_o\Delta t, \qquad (26.67)$$

where $\Delta t$ is the time required for the light to travel from emitter to receiver. The light travels at unit speed as measured in the local Lorentz frame, so $\Delta t$ is equal to the distance $r$ between emitter and receiver. The cosmological redshift therefore becomes $z = H_o r$, in agreement with the Doppler shift (26.66).

To the extent that the galaxies astronomers study are at rest with respect to homogeneous observers, they should exhibit the distance-redshift relation (26.66). In reality, typical galaxies are not at rest relative to homogeneous observers, i.e., not at rest relative to the “Hubble flow,” because of the gravitational attractions of other, nearby galaxies. Their velocities relative to local homogeneous observers are called peculiar velocities. These typically have magnitudes $v_{\rm pec} \sim 300$ km/sec $\sim 10^{-3}$, and can be as large as $v_{\rm pec} \sim 1000$ km/sec. In order to extract the Hubble constant from measurements of galactic distances and redshifts, astronomers must determine and correct for these peculiar motions. That correction task is rather difficult for fairly nearby galaxies, say with $z \lesssim 0.003$ so $v \lesssim 1000$ km/sec. For more distant galaxies, on the other hand, the task of determining the distance $r$ is difficult. As a result, the measurement of the Hubble constant has been a long, arduous task, involving hundreds of astronomers working for 2/3 of a century. Today this effort has finally paid off, with a number of somewhat independent measurements that give

$$H_o = (70.5 \pm 1.3)\ {\rm km\,s^{-1}\,Mpc^{-1}}, \qquad (26.68)$$

where the unit of velocity is km s$^{-1}$ and the unit of distance is 1 Mpc = 1 megaparsec = $10^6$ pc. Converting into units of length and time (using $c = 1$), the inverse Hubble constant is

$$\frac{1}{H_o} = (4.3 \pm 0.1)\ {\rm Gpc} = (13.9 \pm 0.3)\times 10^9\ {\rm years}. \qquad (26.69)$$

Correspondingly, the critical density to close the universe, Eq. (26.35), is

$$\rho_{\rm crit} = (9.1 \pm 0.4)\times 10^{-30}\ {\rm g/cm^3}. \qquad (26.70)$$

In the cosmology literature one often meets the “Hubble parameter,” whose definition and measured value are

$$h \equiv \frac{H_o}{100\ {\rm km\,s^{-1}\,Mpc^{-1}}} = 0.705 \pm 0.013. \qquad (26.71)$$
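The conversions in Eqs. (26.69)–(26.71) can be reproduced with a few lines of Python. This is a sketch. The numerical constants below (kilometres per megaparsec, seconds per Julian gigayear, $G$ in cgs units) are standard values we supply, not values from the text. The critical density uses $\rho_{\rm crit} = 3H_o^2/(8\pi G)$, which is Eq. (26.35) with $G$ restored.

```python
import math

MPC_KM = 3.0857e19         # kilometres per megaparsec
GYR_S = 3.15576e16         # seconds per gigayear (Julian years)
C_KM_S = 299_792.458       # speed of light, km/s
G_CGS = 6.674e-8           # Newton's constant, cm^3 g^-1 s^-2


def hubble_scales(H_o):
    """Return (1/H_o in Gyr, c/H_o in Gpc, rho_crit in g/cm^3, h) for H_o in km/s/Mpc."""
    H_per_s = H_o / MPC_KM                                   # H_o in s^-1
    hubble_time_gyr = 1.0 / H_per_s / GYR_S                  # Eq. (26.69), as a time
    hubble_length_gpc = C_KM_S / H_o / 1.0e3                 # Eq. (26.69), as a length
    rho_crit = 3.0 * H_per_s**2 / (8.0 * math.pi * G_CGS)    # Eq. (26.70)
    h = H_o / 100.0                                          # Eq. (26.71)
    return hubble_time_gyr, hubble_length_gpc, rho_crit, h


t_H, L_H, rho_c, h = hubble_scales(70.5)
print(f"1/H_o = {t_H:.1f} Gyr = {L_H:.2f} Gpc, rho_crit = {rho_c:.2e} g/cm^3, h = {h}")
```

For $H_o = 70.5$ km s$^{-1}$ Mpc$^{-1}$ this gives about 13.9 Gyr and 4.25 Gpc, and a critical density of about $9.3\times10^{-30}$ g/cm³. These values are consistent with Eqs. (26.69) and (26.70) within their quoted uncertainties.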

26.5.4 Primordial nucleosynthesis

When the universe was about a minute old and had temperature $T_R \sim 10^9$ K, nuclear burning converted free protons and neutrons into the light isotopes deuterium ($\equiv {}^2$H), $^3$He, $^4$He, and $^7$Li. Over the past four decades astronomers have worked hard to achieve precision measurements of the primordial abundances of these isotopes. Those measurements have been compared with nucleosynthesis calculations based on models for the universal expansion. The agreement is remarkably good, but only when:

(i) the number of species of neutrinos (which contribute to the radiation density and, via the Einstein equation, to the expansion rate during the burning) is no greater than three (electron, muon, and tau neutrinos);

(ii) dark energy has negligible influence on the universe’s expansion except in and near the modern era, and possibly before nucleosynthesis; and

(iii) the normalized baryon density is

$$\Omega_B = (0.040 \pm 0.006)\left(\frac{0.70}{h}\right)^2. \qquad (26.72)$$

Here 0.006 is the 95 per cent confidence limit. This is a remarkably accurate measurement of the density of baryonic matter, and it shows that $\rho_B$ is only about 5 per cent of the critical density. An even more accurate value comes, today, from a combination of WMAP and other measurements (Hinshaw et al. 2009):

$$\Omega_B = 0.046 \pm 0.002. \qquad (26.73)$$

26.5.5 Density of Cold Dark Matter

The only kind of matter that can condense gravitationally is matter with pressure $P \ll \rho$, i.e., cold matter. The pressures of the universe’s other constituents (radiation and dark energy) prevent them from condensing significantly; they must be spread rather smoothly throughout the universe. The total density of cold matter, $\Omega_M$, can be inferred from the gravitationally measured masses $M$ of large clusters of galaxies. Those masses are measured in four ways:

(i) by applying the virial theorem to the motions of their individual galaxies;

(ii) by applying the equation of hydrostatic equilibrium to the distributions of hot, X-ray-emitting gas in the clusters;

(iii) by studying the influence of the clusters as gravitational lenses for more distant objects; and

(iv) from the positions and shapes of the Doppler peaks in the CMB (Sec. 26.5.7).

The results of the four methods agree reasonably well and yield a total density of cold matter

$$\Omega_M = 0.27 \pm 0.01; \qquad (26.74)$$

see Turner (1999) for details and references, and Hinshaw et al. (2009) for the Doppler-peak measurements. This $\Omega_M \simeq 0.27$ is much larger than the density of baryonic matter, $\Omega_B \simeq 0.04$. Their difference,

$$\Omega_{CDM} = \Omega_M - \Omega_B = 0.23 \pm 0.01, \qquad (26.75)$$

is the density of cold dark matter.

26.5.6 Radiation Temperature and Density

The temperature of the CMB has been measured, from its Planckian spectrum, to be

$$T_R = 2.728 \pm 0.002\ {\rm K}. \qquad (26.76)$$

This temperature tells us, with excellent accuracy, the contribution of photons to the radiation density:

$$\Omega_\gamma = (0.5040 \pm 0.005)\times 10^{-4}\left(\frac{0.70}{h}\right)^2. \qquad (26.77)$$

The radiation also includes primordial gravitational waves (gravitons). Inflationary arguments predict that their energy density is small compared to $\Omega_\gamma$, though this prediction could be wrong. It can be no larger than $\Omega_g \sim \Omega_\gamma$. Otherwise the gravitons would have exerted an unacceptably large influence on the expansion rate of the universe during primordial nucleosynthesis, and thereby would have measurably distorted the nuclear abundances. The same is true of other hypothetical forms of radiation.

Primordial neutrinos must have been in statistical equilibrium with photons and other forms of matter and radiation in the very early universe. Statistical arguments about that equilibrium predict an energy density for each neutrino species of $\Omega_\nu = (7/8)(4/11)^{4/3}\Omega_\gamma$, so long as $kT_R \gg m_\nu c^2$ (the neutrino rest mass-energy). Recent measurements of neutrino oscillations tell us that the neutrinos have rest masses $\gtrsim 0.01$ eV. This implies that they behaved like radiation until some transition temperature $T_R \gtrsim 100$ K (at a redshift $\gtrsim 30$), and then became nonrelativistic, with negligible pressure. Combining all these considerations, we see that the total radiation density must be

$$\Omega_R \sim 1\times 10^{-4} \qquad (26.78)$$

to within a factor of order 2.
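The photon and neutrino contributions quoted above can be checked roughly with the Python sketch below, which uses the blackbody energy density $a_{\rm rad}T_R^4$. The following are our assumptions, not values from the text: the physical constants, the value $\rho_{\rm crit}/h^2 = 1.8785\times10^{-29}$ g/cm³, the choice $h = 0.705$, and the crude criterion $kT_R \sim m_\nu c^2$ for the neutrino transition. Following the text, the transition criterion uses the photon temperature.

```python
A_RAD = 7.5657e-15              # radiation constant, erg cm^-3 K^-4
C_CM_S = 2.99792458e10          # speed of light, cm/s
RHO_CRIT_OVER_H2 = 1.8785e-29   # critical density divided by h^2, g/cm^3
K_B_EV = 8.617e-5               # Boltzmann constant, eV/K


def omega_gamma(T_R=2.728, h=0.705):
    """Photon density parameter: rho_gamma = a_rad T^4 / c^2, divided by rho_crit."""
    rho_gamma = A_RAD * T_R**4 / C_CM_S**2
    return rho_gamma / (RHO_CRIT_OVER_H2 * h**2)


def omega_nu_per_species(om_gamma):
    """Relativistic neutrinos: (7/8)(4/11)^(4/3) times the photon value, per species."""
    return (7.0 / 8.0) * (4.0 / 11.0) ** (4.0 / 3.0) * om_gamma


def neutrino_transition(m_nu_ev, T_R_today=2.728):
    """Rough temperature where k T_R ~ m_nu c^2, and its redshift (T_R scales as 1 + z)."""
    T = m_nu_ev / K_B_EV
    return T, T / T_R_today - 1.0


og = omega_gamma()
orad = og + 3 * omega_nu_per_species(og)   # photons plus three neutrino species
T_nu, z_nu = neutrino_transition(0.01)
print(f"Omega_gamma = {og:.2e}, Omega_R (relativistic) = {orad:.2e}")
print(f"m_nu = 0.01 eV: transition near T_R ~ {T_nu:.0f} K, z ~ {z_nu:.0f}")
```

The sketch gives $\Omega_\gamma \approx 5\times10^{-5}$ and a total relativistic density of roughly $8\times10^{-5}$. A 0.01 eV neutrino becomes nonrelativistic near 120 K ($z \approx 40$). These numbers are in line with Eqs. (26.77) and (26.78) and with the estimates in the paragraph above.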

26.5.7 Anisotropy of the CMB: Measurements of the Doppler Peaks

Consider an object with physical diameter D that resides at a distance r from Earth, and neglect the Earth’s motion and object’s motion relative to homogeneous observers. Then

the object’s angular diameter $\Theta$ as observed from Earth will be $\Theta = D/r$ if $r \ll 1/H_o$, so that the effects of spacetime curvature are negligible. For greater distances, $r \sim 1/H_o$, the ratio

$$r_{AD} \equiv \frac{D}{\Theta} \qquad (26.79)$$

(called the object’s angular-diameter distance) will be strongly influenced by the spacetime curvature. It is therefore also influenced by the cosmological parameters $H_o$, $\Omega_M$, $\Omega_\Lambda$, $P_\Lambda/\rho_\Lambda$ that significantly affect the curvature. In Ex. 26.8 formulas are derived for $r_{AD}$ as a function of these parameters and the object’s cosmological redshift $z$ (a measure of its distance).

For many decades astronomers searched for objects on the sky whose physical diameter $D$ could be known with high confidence (standard yardsticks). Combining the known $D$’s with the yardsticks’ measured angular diameters $\Theta$ would give their $r_{AD} = D/\Theta$. Measuring the redshifts $z$ of their spectral lines as well, the astronomers hoped to infer the cosmological parameters from the theoretical relation $r_{AD}(z, \text{cosmological parameters})$. This effort produced little of value in the era ∼1930 to ∼1990, when astronomers were focusing on familiar astronomical objects such as galaxies: no way could be found to determine, reliably, the physical diameter $D$ of any such object.

Finally, in 1994, Marc Kamionkowski, David Spergel and Naoshi Sugiyama (1994) identified an object of a very different sort whose physical diameter $D$ could be known with high confidence. This object is the cosmological horizon in the era when the primordial plasma was recombining and matter and radiation were decoupling from each other. This was the long-sought standard yardstick. The cosmological horizon is not the same thing as the horizon of a black hole, but it is analogous. It is the distance between objects that are just barely able to communicate with each other via light signals. To discuss this concept quantitatively, it is useful to introduce a new time coordinate $\eta$ for the Robertson-Walker line element (26.6):

$$\eta = \int \frac{dt}{a}, \quad \text{so} \quad d\eta = \frac{dt}{a}. \qquad (26.80)$$

Then the line element becomes

$$ds^2 = a^2\left[-d\eta^2 + d\chi^2 + \Sigma^2(d\theta^2 + \sin^2\theta\,d\phi^2)\right]. \qquad (26.81)$$

Set $\eta = 0$ at the beginning of the expansion and $\eta = \eta_{\rm rec}$ at the era of recombination, and note that light travels in the $\chi$ direction with coordinate speed $d\chi/d\eta = 1$. We then see that the diameter of the horizon at recombination is

$$D_{\rm rec} = \eta_{\rm rec}\,a_{\rm rec}, \quad \text{where} \quad \eta_{\rm rec} = \int_0^{t_{\rm rec}} \frac{dt}{a} \qquad (26.82)$$

and $a_{\rm rec}$ is the value of $a$ at recombination. Two objects separated by a distance greater than $D_{\rm rec}$ were unable to communicate with each other at recombination. There had not been sufficient time since the birth of the universe for light to travel from one to the other. In this sense, they were outside each other’s cosmological horizon. Objects with

separations less than $D_{\rm rec}$ could communicate at recombination; i.e., they were inside each other’s cosmological horizon. As the universe expands, the cosmological horizon expands. Objects that are outside each other’s horizons in the early universe come inside those horizons at some later time, and can then begin to communicate.

Kamionkowski, Spergel and Sugiyama realized that the universe provides us with markers on the sky that delineate the horizon diameter at recombination, $D_{\rm rec}$. These markers are anisotropies of the CMB, produced by the same density and temperature inhomogeneities that would later grow to form galaxies. Observations show that these inhomogeneities were perturbations of the density with fixed, homogeneous entropy per baryon, i.e., with fixed $T_R^3/\rho_M$. As they came inside the horizon, their amplitudes were

$$3\frac{\Delta T_R}{T_R} = \frac{\Delta\rho_M}{\rho_M} \sim 1\times 10^{-4}. \qquad (26.83)$$

We can resolve the perturbations $\Delta T_R/T_R$ at recombination into spatial Fourier components characterized by wave number $k$, or equivalently by reduced wavelength $\bar\lambda = 1/k$. Observers on Earth find it more convenient to resolve the perturbations into spherical harmonics on the sky. Order $\ell = 1$ corresponds to a perturbation with angular wavelength 360 degrees $= 2\pi$ radians. Order $\ell$ must therefore be a perturbation with angular wavelength $2\pi/\ell$, and hence angular reduced wavelength $\Theta = 1/\ell$. The ratio $\bar\lambda/\Theta$ of physical reduced wavelength to angular reduced wavelength is the angular-diameter distance over which the CMB photons have traveled since recombination:

$$r_{AD}^{\rm rec} = \frac{\bar\lambda}{\Theta} = \frac{\ell}{k}. \qquad (26.84)$$
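Here is a rough numerical sketch of the standard yardstick for a flat universe with vacuum dark energy. With $a_o = 1$, Eqs. (26.80) and (26.82) give $H_o\eta = \int da/\sqrt{\Omega_R + \Omega_M a + \Omega_\Lambda a^4}$. The same integrand taken from $a_{\rm rec}$ to 1 gives the coordinate distance $\Delta\chi$ traveled by the CMB photons. In a flat universe $D_{\rm rec} = a_{\rm rec}\eta_{\rm rec}$ and $r_{AD} = a_{\rm rec}\Delta\chi$ [Eqs. (26.95), (26.96)], so the horizon subtends $\Theta \approx \eta_{\rm rec}/\Delta\chi$. The following are our assumptions, not values from the text: the recombination redshift $z_{\rm rec} \approx 1090$, the parameter values, and the Simpson's-rule integration.

```python
import math


def conformal_integral(a1, a2, Om=0.27, Or=1.0e-4, n=20_000):
    """H_o times the integral of dt/a between scale factors a1 and a2 (a_o = 1).

    Flat universe with Omega_Lambda = 1 - Om - Or and w = -1, for which
    dt/a = da / (H_o sqrt(Or + Om a + OL a^4)).  Composite Simpson's rule (n even).
    """
    OL = 1.0 - Om - Or
    f = lambda a: 1.0 / math.sqrt(Or + Om * a + OL * a**4)
    step = (a2 - a1) / n
    total = f(a1) + f(a2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a1 + i * step)
    return total * step / 3.0


z_rec = 1090.0                              # assumed recombination redshift
a_rec = 1.0 / (1.0 + z_rec)
eta_rec = conformal_integral(0.0, a_rec)    # H_o * eta_rec, Eq. (26.82)
dchi = conformal_integral(a_rec, 1.0)       # H_o * a_o * Delta chi to last scattering
theta = eta_rec / dchi                      # D_rec / r_AD for a flat universe
print(f"eta_rec = {eta_rec:.3f}/H_o, Delta chi = {dchi:.2f}/H_o, "
      f"Theta ~ {theta:.3f} rad = {math.degrees(theta):.1f} deg")
```

With these assumptions the horizon at recombination subtends roughly a degree on today's sky. The precise peak positions require the fuller acoustic physics described in the text.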

Now consider perturbations on a spatial scale small enough that a reduced wavelength $\bar\lambda$ came inside the horizon (“crossed the horizon”) somewhat earlier than recombination. Before $\bar\lambda$ crossed the horizon, each high-temperature region was unaware of a neighboring low-temperature region. The two therefore evolved independently, in such a way that their fractional temperature difference grew as

$$\frac{\Delta T_R}{T_R} \propto a \propto t^{2/3} \qquad (26.85)$$

[Ex. 26.10]. When $\bar\lambda$ crossed the horizon (i.e., when the horizon expanded to become larger than $\bar\lambda$), the neighboring regions began to communicate. The high-$T_R$, high-$\rho_M$ region pushed out against its low-$T_R$, low-$\rho_M$ neighbor, generating sound waves. Correspondingly, the growth of $\Delta T_R/T_R$ changed into acoustic oscillations.

For perturbations with some specific physical size $\bar\lambda_{1/4}$ (angular size $\Theta_{1/4}$), the acoustic oscillations had completed one quarter cycle at the time of recombination, so their temperature contrast had been reduced to zero at recombination. For perturbations a little smaller, $\Theta_{1/2}$, the oscillations had completed a half cycle at recombination. The hot and cold regions were therefore reversed, and the temperature contrast was roughly as large as at horizon crossing. Perturbations still smaller, $\Theta_{3/4}$, had completed 3/4 of a cycle at recombination, so their

[Figure 26.6 plot: mean square CMB temperature fluctuation versus angular scale $\Theta$, with the angular scales $\Theta_0$, $\Theta_{1/4}$, $\Theta_{1/2}$, $\Theta_{3/4}$, $\Theta_1$ marked along the horizontal axis.]

Fig. 26.6: Anisotropy of the CMB as measured by WMAP (the first two Doppler peaks; most error bars smaller than the dots; Hinshaw et al. 2009) and by CBI and ACBAR (the last three peaks; Pearson et al. 2002, and Kuo et al. 2002). Plotted vertically is the mean square temperature fluctuation; plotted horizontally is the angular scale $\Theta$. The solid curve is the theoretical prediction when one inserts the best-fit values of the cosmological parameters. The grey shading is the range of statistical fluctuations one would expect in an ensemble of universes all with these same cosmological parameter values. This figure is adapted from Hinshaw et al. (2009).

density contrast was momentarily zero. Perturbations smaller still, $\Theta_1$, had completed a full cycle of oscillation at recombination and so had a large density contrast; and so forth. The result is the pattern of temperature anisotropy, as a function of $\Theta$ or equivalently $\ell = 1/\Theta$, shown in Fig. 26.6.

The first peak in the pattern is for perturbations whose reduced wavelength $\bar\lambda_0$ had only recently come inside the horizon at recombination. Thus $\bar\lambda_0 = r_{AD}\Theta_0$ is equal to the diameter $D_{\rm rec}$ of the horizon at recombination, aside from a small difference that can be computed with confidence. This is the standard yardstick that astronomers had sought for decades.

The basic structure of the pattern of anisotropy oscillations shown in Fig. 26.6 is in accord with the above description of acoustic oscillations. The precise details are modestly different, however, because the initial distribution of inhomogeneities is statistical (i.e., is a random process in the sense of Chap. 5), and the physics of the oscillations is somewhat complex. Examples of the complexities are: (i) $\Delta T_R$ does not go to zero at the minima $\Theta_{1/4}$ and $\Theta_{3/4}$ because the emitting matter has acquired inhomogeneous velocities relative to Earth

by falling into the oscillations’ gravitational potential wells, and these velocities produce Doppler shifts that smear out the minima. (ii) This same infall makes the density and temperature contrasts smaller at the half-cycle point $\Theta_{1/2}$ than at the full-cycle points $\Theta_0$ and $\Theta_1$.

Despite these and other complexities and statistical effects, the shapes of the acoustic oscillations can be computed with high confidence once one has chosen values for the cosmological parameters. The reason for the confidence is that the amplitude of the oscillations is very small, so nonlinear effects are negligible. The pattern of the temperature oscillations at recombination is computed as a function of physical length $\bar\lambda$, with results that depend modestly on some of the cosmological parameters. The physical pattern is then converted into a pattern as seen on Earth’s sky, using the angular-diameter distance $r_{AD} = \bar\lambda/\Theta$ that the CMB photons have traveled since recombination; this conversion depends very strongly on the cosmological parameters. Remarkably, the positions $\Theta_0$, $\Theta_{1/2}$, $\Theta_1$, … of the oscillation peaks (called Doppler peaks for no good reason²) depend more strongly on the total density $\Omega \simeq \Omega_M + \Omega_\Lambda$ than on other parameters.

The first quantitative studies of the Doppler peaks were made by the Boomerang project’s balloon-borne instruments (Lange et al. 2000) and soon thereafter by MAXIMA (Balbi et al. 2000). They revealed that $\Omega = 1.0 \pm 0.2$. This was a great triumph: the universe’s total density is approximately critical, and therefore its spatial geometry is approximately flat. A variety of other balloon-based and ground-based measurements in 2000–2003 led to increasing confidence in this conclusion and in a variety of other Boomerang/MAXIMA cosmological discoveries. More recently the WMAP satellite-borne instruments have produced a great leap in accuracy (Hinshaw et al. 2008):

$$\Omega = 1.00 \pm 0.02; \qquad (26.86)$$

see Fig. 26.6.
This near-unity value of $\Omega$ implies that the universe is very close to being spatially flat; see Eq. (26.60). The WMAP measurements also reveal the spectral density of the temperature perturbations before the sound waves began producing the oscillations. That spectral density decayed as $S_{T_R}(k) \propto k^{-0.96 \pm 0.01}$. The rms amplitude of the fluctuations, $\Delta T_R^{\rm rms} = \sqrt{(k/2\pi)S_{T_R}(k)}$ [Eq. (5.64)], was therefore nearly independent of wave number $k$, i.e., independent of $\Theta$. This is in accord with predictions from “inflationary” models for the production of the perturbations (Sec. 26.7 below).

² Yakov B. Zel’dovich and Rashid A. Sunyaev (1970), who first predicted the existence of these peaks, later gave them a name that has a little more justification. They called them Sakharov oscillations, because Andrei D. Sakharov (1965) was the first to predict the sound waves that give rise to the peaks. Zel’dovich and Sunyaev introduced this name at a time when their close friend Andrei Sakharov was being attacked by the Soviet government. They hoped that this would call attention to Sakharov’s international eminence and help protect him. It seemed not to help.

26.5.8 Age of the universe: Constraint on the dark energy

The total mass density $\Omega = 1.00 \pm 0.02$ from CMB anisotropy and the cold-matter mass density $\Omega_M = 0.27 \pm 0.01$ leave a missing mass density

$$\Omega_\Lambda = 0.73 \pm 0.02, \qquad (26.87)$$

which must be in some exotic form (dark energy). This dark energy does not condense gravitationally along with the cold matter, and therefore must have a pressure $|P_\Lambda| \sim \rho_\Lambda$. It must have had a negligible density at the time of recombination and at the time of nucleosynthesis; otherwise, it would have disturbed the shapes of the Doppler peaks and distorted the nuclear abundances. For it to be significant now but small earlier, compared to cold matter, it must have a negative pressure, $P_\Lambda < 0$.

One handle on how negative comes from the age of the universe. Assume that $P_\Lambda/\rho_\Lambda \equiv w_\Lambda$ was constant, or approximately so, during most of the age of the universe (i.e., back to redshift $z \sim 10$). Assume also that the dark energy did not exchange significant energy with other constituents of the universe during that recent epoch. The first law of thermodynamics then implies that $\rho_\Lambda = \rho_{\rm crit}\Omega_\Lambda(a_o/a)^{3(1+w_\Lambda)}$. Insert this, $\rho_M = \rho_{\rm crit}\Omega_M(a_o/a)^3$, and $\rho = \rho_M + \rho_\Lambda$ into the Einstein equation (26.26), solve for $dt = da/\dot a$, and integrate. The result, with $v \equiv a/a_o$, is the product of the current age of the universe $t_o$ and the Hubble expansion rate $H_o$:

$$H_o t_o = \int_0^1 \frac{dv}{\sqrt{1 - \Omega_M - \Omega_\Lambda + \frac{1}{v}\left(\Omega_M + \Omega_\Lambda v^{-3w_\Lambda}\right)}}. \qquad (26.88)$$

The more negative $P_\Lambda/\rho_\Lambda = w_\Lambda$ is, the larger the integral, and thus the larger $H_o t_o$. Astronomers compare the observed properties of the oldest stars in our galaxy with the theory of stellar evolution, and estimate the age of the universe at galaxy formation. In this way they arrive at an estimate

$$t_o = (14 \pm 1.5)\times 10^9\ {\rm yr} \qquad (26.89)$$

for the age of the universe. A more accurate age from WMAP and other measurements is

$$t_o = (13.7 \pm 0.1)\times 10^9\ {\rm yr}. \qquad (26.90)$$
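Eq. (26.88) is straightforward to evaluate numerically. In the Python sketch below, the change of variable $v = s^2$ removes the square-root behaviour of the integrand at $v = 0$; the names and the substitution are our own choices. The result is checked against two cases. One is the Einstein-de Sitter value $H_o t_o = 2/3$. The other is the closed form $H_o t_o = \frac{2}{3\sqrt{\Omega_\Lambda}}\,{\rm arcsinh}\sqrt{\Omega_\Lambda/\Omega_M}$, which holds for a flat universe with $w_\Lambda = -1$. The sketch assumes $w_\Lambda \le 0$. The value of $1/H_o$ in Gyr is taken from Eq. (26.69).

```python
import math


def hubble_age(Om=0.27, OL=0.73, w=-1.0, n=2000):
    """H_o t_o from Eq. (26.88), radiation neglected, for w <= 0.

    With v = s^2 the integrand dv / sqrt(Ok + (Om + OL v^(-3w)) / v)
    becomes 2 s^2 ds / sqrt(Ok s^2 + Om + OL s^(-6w)), smooth on [0, 1].
    Composite Simpson's rule with n (even) intervals.
    """
    Ok = 1.0 - Om - OL
    f = lambda s: 2.0 * s * s / math.sqrt(Ok * s * s + Om + OL * s ** (-6.0 * w))
    step = 1.0 / n
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * step)
    return total * step / 3.0


HUBBLE_TIME_GYR = 13.87   # 1/H_o for H_o = 70.5 km/s/Mpc, cf. Eq. (26.69)
for w in (-1.2, -1.0, -0.8):
    x = hubble_age(w=w)
    print(f"w = {w:+.1f}: H_o t_o = {x:.3f}, t_o = {x * HUBBLE_TIME_GYR:.1f} Gyr")
```

For $w_\Lambda = -1$ this gives $H_o t_o \approx 0.99$, i.e., $t_o \approx 13.8$ Gyr, close to Eq. (26.90). The printed ages also show the trend stated above: the more negative $w_\Lambda$, the older the universe.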

WMAP also places a moderately tight constraint on the dark energy’s ratio $w_\Lambda \equiv P_\Lambda/\rho_\Lambda$ of pressure to energy density (Hinshaw et al. 2009):

$$-1.14 < w_\Lambda < -0.88. \qquad (26.91)$$

26.5.9 Magnitude-Redshift relation for type Ia supernovae: Confirmation that the universe is accelerating

The constraint PΛ /ρΛ < −0.78 on the dark energy has a profound consequence for the expansion rate of the universe. In the “particle-in-a-potential” analysis [Eqs. (26.39), (26.40)

[Figure 26.7 plot: effective $m_B$ (vertical axis, 14 to 24) versus redshift $z$ (horizontal axis, 0 to 1). Solid curves: $\Omega_\Lambda = 0$, with $(\Omega_M, \Omega_\Lambda) = (0, 0), (1, 0), (2, 0)$. Dashed curves: flat models, with $(\Omega_M, \Omega_\Lambda) = (0, 1), (0.5, 0.5), (1, 0), (1.5, -0.5)$.]

Fig. 26.7: Magnitude-redshift diagram for type Ia supernovae based on observations by Perlmutter et al. (1999) and others. [Adapted from Perlmutter et al. (1999).]

and Fig. 26.3] the contribution of the dark energy to the potential is $V_\Lambda(a) = -(4\pi/3)a^2\rho_\Lambda \propto a^n$ with $n > 1.3$, which grows stronger with increasing $a$. Correspondingly, in the present era the “potential energy” is becoming more negative. This means that the universe’s “kinetic energy” $\frac12\dot a^2$ must be increasing: the universe has recently made the transition from a decelerating expansion to an accelerating expansion.

In 1998 two independent groups of astronomers reported the first direct observational evidence of this acceleration (Riess et al. 1998, Perlmutter et al. 1999). Their evidence was based on systematic observations of the apparent brightness of type Ia supernovae as a function of the supernovae’s redshift-measured distances. If the universal expansion is indeed accelerating, consider distant objects (including supernovae), which we see as they were when the universe was much younger than today. They would have experienced a slower universal expansion than we experience today. Their observed redshifts $z$ should therefore be lower than in a universe with a constant or decelerating expansion rate. These lowered redshifts should show up as a leftward displacement of the supernovae’s locations in a so-called magnitude-redshift diagram. Such a diagram plots the supernovae’s redshift horizontally and their brightness (a measure of their distance) vertically; one is shown in Fig. 26.7. The measure of brightness used in this diagram is the supernova’s apparent magnitude

$$m \equiv -2.5\log_{10}\left(\frac{F}{2.5\times 10^{-8}\ {\rm W\,m^{-2}}}\right), \qquad (26.92)$$

where F is the flux received at Earth. The sign is chosen so that the dimmer the supernova, the larger the magnitude. A series of theoretical curves is plotted in the diagram, based on assumed values for ΩM and ΩΛ , and on the presumption that the dark energy is vacuum

stress-energy, so $P_\Lambda/\rho_\Lambda = -1$. The formulae for these curves are derived in Ex. 26.9. The solid curves are for no dark energy, $\Omega_\Lambda = 0$. The dark energy converts the universal deceleration $\ddot a < 0$ into acceleration $\ddot a > 0$. As described above, it therefore pushes the curves leftward for distant supernovae (the upper right-hand region). The dashed curves are for a mixture of dark energy and cold matter that sums to the critical density, $\Omega_\Lambda + \Omega_M = 1$. A detailed analysis of the data by Perlmutter et al. (1999) gives (assuming $P_\Lambda/\rho_\Lambda = -1$)

$$\Omega_\Lambda = \frac13(4\Omega_M + 1) \pm \frac16. \qquad (26.93)$$
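The small Python sketch below evaluates Eq. (26.93) at the measured cold-matter density. It combines the quoted $\pm 1/6$ band and the propagated $\Omega_M$ uncertainty in quadrature. That combination rule is our assumption, since the text does not say how the two were combined; in any case the $\Omega_M$ term turns out to be negligible.

```python
import math


def omega_lambda_from_supernovae(Om, Om_err=0.0):
    """Eq. (26.93) with P/rho = -1: Omega_Lambda = (4 Omega_M + 1)/3 +/- 1/6.

    Returns (central value, uncertainty). The +/- 1/6 band and the propagated
    Omega_M error (dOmega_Lambda/dOmega_M = 4/3) are added in quadrature.
    """
    central = (4.0 * Om + 1.0) / 3.0
    err = math.hypot(1.0 / 6.0, (4.0 / 3.0) * Om_err)
    return central, err


val, err = omega_lambda_from_supernovae(0.27, 0.01)
print(f"Omega_Lambda = {val:.2f} +/- {err:.2f}")   # Omega_Lambda = 0.69 +/- 0.17
```

The result is consistent with the value quoted just below, and the uncertainty is dominated entirely by the supernova band.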

Combining with $\Omega_M = 0.27 \pm 0.01$, this implies

$$\Omega_\Lambda = 0.70 \pm 0.17, \qquad (26.94)$$

in good agreement with the CMB measurements and with deductions from the ages of the oldest stars. To recapitulate: a variety of observations all point in the same direction. They agree that our universe is close to spatially flat, with $\Omega_\Lambda \simeq 0.73$, $\Omega_M \simeq 0.27$, and $\Omega_R \sim 10^{-4}$.

****************************

EXERCISES

Exercise 26.8 Example: Angular-Diameter Distance
Consider an electromagnetic emitter at rest in the cosmological fluid (i.e., at rest relative to homogeneous observers), and let the emitter’s radiation be observed at Earth. Neglect the Earth’s motion relative to homogeneous observers. Let the cosmological redshift of the emitted radiation be $z = \Delta\lambda/\lambda$. (a) Show that the emitter’s angular-diameter distance is

$$r_{AD} = \frac{R}{1+z}, \qquad (26.95)$$

where

$$R = a_o\Sigma(\Delta\chi) = \frac{\Sigma(\Delta\chi)}{H_o\sqrt{|1 - \Omega|}}. \qquad (26.96)$$

Here $a_o$ is the universe’s expansion factor today and $\Sigma$ is the function defined in Eq. (26.8). $\Delta\chi$ is the coordinate distance that light must travel in going from emitter to Earth if its path has constant $\theta$ and $\phi$. For simplicity we have assumed $\Omega \neq 1$. [Hint: Place the Earth at $\chi = 0$ of the Robertson-Walker coordinate system and the emitter at $\chi = \Delta\chi$, and use the line element (26.81). Also assume $\Omega \neq 1$ throughout; the final formula (26.98) for $r_{AD}$ when $\Omega = 1$ can be obtained by letting $\Omega \to 1$ at the end of the calculation.]

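(b) Assuming that the dark energy is vacuum stress-energy, so $P_\Lambda = -\rho_\Lambda$, show that in the limit $\Omega \to 1$ (so $k = 0$), the quantity $\Sigma(\Delta\chi)/\sqrt{|1-\Omega|}$ appearing in Eq. (26.96) becomes

$$\frac{\Sigma(\Delta\chi)}{\sqrt{|1-\Omega|}} = \frac{\Delta\chi}{\sqrt{|1-\Omega|}} = \int_1^{1+z} \frac{du}{\sqrt{\Omega_R u^4 + \Omega_M u^3 + (1-\Omega)u^2 + \Omega_\Lambda}} . \qquad (26.97)$$

[Hint: Use Eqs. (26.80) and (26.81) to deduce that

$$\Delta\chi = \int_{t_e}^{t_o} \frac{dt}{a} = \int_1^{1+z} \frac{a}{a_o}\,\frac{dt}{da}\, d\!\left(\frac{a_o}{a}\right) ,$$

and use the Einstein equation for $da/dt$.] Note that for a spatially flat universe [$k = 0$, $\Omega = 1$, $\Sigma(\Delta\chi) = \Delta\chi$], Eqs. (26.95), (26.96) and (26.97) imply

$$r_{AD} = \frac{1}{H_o(1+z)} \int_1^{1+z} \frac{du}{\sqrt{\Omega_R u^4 + \Omega_M u^3 + \Omega_\Lambda}} . \qquad (26.98)$$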
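Part (c) below asks for a plot of $r_{AD}(z)$. As a starting point, here is a minimal sketch that evaluates Eq. (26.98) for a spatially flat universe with composite Simpson's rule, using the parameter values quoted in part (c) with $\Omega_R$ set to zero. Eq. (26.98) is written with $c = 1$, so the sketch multiplies by $c/H_o$ to express the result in Mpc. The function names and the quadrature choice are ours, and plotting is omitted to keep the block free of dependencies.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for the integral of f over [a, b]."""
    if n % 2:        # Simpson's rule needs an even number of subintervals
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def r_ad_flat(z, omega_m=0.27, omega_l=0.73, omega_r=0.0, h0=70.0):
    """Angular-diameter distance in Mpc from Eq. (26.98), spatially flat case.

    h0 is in km/s/Mpc; the factor c/H_o restores conventional units."""
    def integrand(u):
        return 1.0 / math.sqrt(omega_r * u**4 + omega_m * u**3 + omega_l)
    hubble_length_mpc = C_KM_S / h0
    return hubble_length_mpc * simpson(integrand, 1.0, 1.0 + z) / (1.0 + z)

for z in (0.1, 0.5, 1.0, 1.6, 3.0):
    print(f"z = {z:3.1f}:  r_AD = {r_ad_flat(z):7.1f} Mpc")
```

At small $z$ the result approaches $cz/H_o$, and for these parameters $r_{AD}$ rises to a maximum near $z \approx 1.6$ and then decreases: very distant objects appear larger on the sky. Varying `omega_m` and `omega_l` (keeping their sum at 1) is one way to start the exploration in part (c).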

(c) Plot $r_{AD}(z)$ for the measured values $\Omega_\Lambda = 0.73$, $\Omega_M = 0.27$, $\Omega_R \simeq 0$, $H_o = 70\ {\rm km\,s^{-1}\,Mpc^{-1}}$. Explore graphically how $r_{AD}(z)$ changes as $\Omega$ and $\Omega_\Lambda/\Omega_M$ change.

Exercise 26.9 Example: Magnitude-Redshift Relation and Luminosity Distance
Consider a supernova which emits a total luminosity $L$ as measured in its own local Lorentz frame in an epoch when the expansion factor is $a$ and the cosmological redshift is $z = a_o/a - 1$.
(a) Assume that the supernova and the Earth are both at rest relative to homogeneous observers at their locations. Place the origin $\chi = 0$ of a Robertson-Walker coordinate system at the supernova’s location, orient the coordinate axes so the Earth lies at $\theta = \pi/2$ and $\phi = 0$, and denote by $\Delta\chi$ the Earth’s radial coordinate location. Show that the flux of energy received from the supernova at Earth today is given by

$$F = \frac{L}{4\pi R^2 (1+z)^2} , \qquad (26.99)$$

where $R$ is the same function as appears in the angular-diameter distance, Eqs. (26.96) and (26.97).
(b) It is conventional to define the source’s luminosity distance $r_L$ in such a manner that the flux is $F = L/4\pi r_L^2$. Eq. (26.99) then implies that

$$r_L = (1+z)R = (1+z)^2\, r_{AD} . \qquad (26.100)$$

Plot $r_L(z)$ for the measured values $\Omega_\Lambda = 0.73$, $\Omega_M = 0.27$, $\Omega_R \simeq 0$, $H_o = 70\ {\rm km\,s^{-1}\,Mpc^{-1}}$. Explore graphically how $r_L(z)$ changes as $\Omega$ and $\Omega_\Lambda/\Omega_M$ change.
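For Ex. 26.9, the same integral gives the luminosity distance through Eq. (26.100). The sketch below assumes a flat universe with $\Omega_R$ neglected and compares $\Omega_\Lambda = 0.73$, $\Omega_M = 0.27$ with a matter-only model ($\Omega_M = 1$, $\Omega_\Lambda = 0$). It converts the ratio of distances into a magnitude difference using the standard astronomical distance modulus $m - M = 5\log_{10}(r_L/10\ {\rm pc})$; that convention, the comparison model, and the function names are our additions, not part of the exercise.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def flat_distance_integral(z, omega_m, omega_l, omega_r=0.0, n=2000):
    """Right-hand integral of Eq. (26.97) with Omega = 1, via composite Simpson's rule."""
    n += n % 2  # Simpson's rule needs an even number of subintervals
    h = z / n
    def f(u):
        return 1.0 / math.sqrt(omega_r * u**4 + omega_m * u**3 + omega_l)
    total = f(1.0) + f(1.0 + z)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(1.0 + i * h)
    return total * h / 3.0

def luminosity_distance(z, omega_m=0.27, omega_l=0.73, h0=70.0):
    """r_L = (1 + z) R from Eq. (26.100), in Mpc; flat universe, radiation neglected."""
    big_r = (C_KM_S / h0) * flat_distance_integral(z, omega_m, omega_l)
    return (1.0 + z) * big_r

def distance_modulus(r_l_mpc):
    """Standard astronomical convention m - M = 5 log10(r_L / 10 pc)."""
    return 5.0 * math.log10(r_l_mpc * 1.0e6 / 10.0)

for z in (0.1, 0.5, 1.0):
    r_vac = luminosity_distance(z)                            # Omega_L = 0.73
    r_eds = luminosity_distance(z, omega_m=1.0, omega_l=0.0)  # no dark energy
    dm = distance_modulus(r_vac) - distance_modulus(r_eds)
    print(f"z = {z}: r_L = {r_vac:6.0f} Mpc vs {r_eds:6.0f} Mpc; dimmer by {dm:.2f} mag")
```

A positive magnitude difference means that, at fixed redshift, a supernova is dimmer in the dark-energy model than in the matter-only model, which is the effect that shifts the theoretical curves in the magnitude-redshift diagram discussed above. For the matter-only case the integral has the closed form $2(1 - 1/\sqrt{1+z})$, a convenient check on the quadrature.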

Exercise 26.10 Challenge: Growth of Perturbations Before They Cross the Horizon
Show that in a matter-dominated universe, $\rho_M \gg \rho_R$ and $\rho_M \gg \rho_\Lambda$, the fractional density difference of two neighboring regions that are outside each other’s cosmological horizons grows as $\Delta\rho/\rho \propto a \propto t^{2/3}$. [Hint: the difference of spatial curvature between the two regions is of importance. For a solution, see, e.g., pp. 108–110 of Peebles (1993).]
****************************

26.6 The Big-Bang Singularity, Quantum Gravity, and the Initial Conditions of the Universe

Although we do not know for sure the correct equation of state at redshifts $z \gtrsim 3\times 10^{12}$, where the thermal energy of each baryon exceeds its rest mass-energy, the “particle-in-a-potential” form (26.39) of the evolution equation tells us that, so long as $dV/da \ge 0$ at small $a$, the universe must have begun its expansion in a state of vanishing expansion factor $a = 0$, nonzero $\dot a$, and infinite $\dot a/a$. Since some of the components of the Riemann curvature tensor in the local Lorentz frame of a homogeneous observer are of order $\dot a^2/a^2$, this means the expansion began in a state of infinite spacetime curvature, i.e., infinite tidal gravity, i.e., in a “big-bang” singularity. From the form $V = -(4\pi/3)a^2\rho$ for the effective potential [Eq. (26.39)] and the first law of thermodynamics (26.29), we see that the sufficient condition $dV/da \ge 0$ for the universe to have begun with a singularity is

$$\rho + 3P > 0 . \qquad (26.101)$$
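The step from the effective potential to condition (26.101) is short enough to write out. The derivation below assumes that the first law (26.29) takes its standard adiabatic form $d(\rho a^3) = -P\,d(a^3)$ for the homogeneous cosmological fluid:

```latex
\frac{d\rho}{da} = -\frac{3(\rho + P)}{a}
\quad\Longrightarrow\quad
\frac{dV}{da} = -\frac{4\pi}{3}\left(2a\rho + a^2\frac{d\rho}{da}\right)
             = -\frac{4\pi}{3}\,a\,\bigl(2\rho - 3\rho - 3P\bigr)
             = \frac{4\pi}{3}\,a\,(\rho + 3P) .
```

Since $a > 0$, $dV/da$ has the sign of $\rho + 3P$; for a single component with $P = w\rho$ and $\rho > 0$, the condition is $w > -1/3$.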

Cold matter and radiation satisfy this condition, but dark energy violates it. As we have seen, dark energy seems to have become important in our universe only recently, so it might not have been important at the universe’s beginning. In this section we shall assume that it was not, and that the energy condition ρ + 3P > 0 was satisfied in the early universe. In the next section we shall discuss some consequences of a possible early-universe violation of ρ + 3P > 0. The conclusion that the universe, if homogeneous and isotropic (and if ρ + 3P > 0), must have begun in a big-bang singularity drove Yevgeny Lifshitz and Isaak Khalatnikov, students of Lev Landau in Moscow, to begin pondering in the late 1930s the issue of whether deviations from homogeneity and isotropy might have permitted the universe to avoid the singularity. A few events (the imprisonment of Landau for a year during Stalin’s purges, then World War II, then the effort to rebuild Moscow, a nuclear weapons race with the United States, and other more urgent physics research) intervened, preventing the Lifshitz-Khalatnikov studies from reaching fruition until the early 1960s. However, after a great push in 1959–1961, Lifshitz and Khalatnikov reached the preliminary conclusion that early anisotropy and inhomogeneity could have saved the universe from an initial singularity: Perhaps the universe contracted from an earlier, large-scale state, then rebounded at finite size and finite curvature as a result of inhomogeneities and anisotropies. For a pedagogical presentation of the analysis which produced this conclusion, see Landau and Lifshitz (1962).

The Lifshitz-Khalatnikov analysis was based on the mathematics of tensor analysis (differential geometry). In 1964 Roger Penrose (1965), a young faculty member at Kings College in London, introduced into general relativity an entirely new body of mathematical techniques, those of differential topology, and used them to prove a remarkable theorem: that no matter how inhomogeneous and anisotropic an imploding star may be, if it implodes so far as to form a horizon, then it necessarily will produce a singularity of infinite spacetime curvature inside that horizon. Stephen Hawking and George Ellis (1968), at first graduate students and then research fellows at Cambridge University, by picking up Penrose’s techniques and applying them to the early universe, proved that Lifshitz and Khalatnikov had to be wrong: The presently observed state of the universe plus reasonable constraints on its early equation of state imply that, regardless of any inhomogeneity or anisotropy, there must have been a singularity of infinite curvature. In response to these differential-topology analyses, Lifshitz, Khalatnikov, and their student Vladimir Belinsky reexamined their differential-geometry analyses, found an error, and discovered a possible structure for generic spacetime singularities. In this so-called mixmaster structure, as a freely moving observer approaches the singularity, inhomogeneities and anisotropies drive the tidal gravity (spacetime curvature) to oscillate in such a way that the observer feels an infinite, chaotic sequence of oscillations with ever growing amplitude, ever shortening period, and finite total proper-time duration. This is an example of the chaotic behavior which occurs frequently in nonlinear physics.
John Archibald Wheeler, a professor at Princeton University, realized in the mid-1950s that the singularities which began the big bang and terminate the implosion of a star cannot be classical: as one nears them, one must encounter a breakdown in classical general relativity, and new physics governed by the laws of quantum gravity (Wheeler 1957). Wheeler devised a simple argument to show that this is so, and to determine the critical radius of curvature of spacetime at which the transition to quantum gravity occurs. Quantum theory insists that every field possess a half quantum of fluctuational zero-point energy in each of its modes. Moreover, if one wishes to measure the average value of the field in a spacetime region with 4-volume $L^4$ (a region with side $L$ along each of its 4 dimensions), one’s measurements will be sensitive to the zero-point fluctuations of the modes that have wavelengths $\sim L$, but not to any others. Now, so long as gravity is weak over the scale $L$, one can introduce a nearly Lorentz frame in the region $L^4$ and regard the deviations $\delta g_{\mu\nu} \equiv g_{\mu\nu} - \eta_{\mu\nu}$ of the metric coefficients $g_{\mu\nu}$ from the flat metric $\eta_{\mu\nu}$ as a nearly linear field that lives in nearly flat spacetime. This field must be just as subject to the laws of quantum mechanics as any other field. Its gravitational-wave modes with wavelength $L$ have an energy density of order the square of the gradient of $\delta g_{\mu\nu}$, i.e., $\sim (\delta g_{\mu\nu}/L)^2$, and thus for these modes to contain a half quantum of unpredictable, fluctuational energy, they must have unpredictable fluctuations $\delta g_{\mu\nu}$ of the metric given by

$$\left(\frac{\delta g_{\mu\nu}}{L}\right)^2 L^3 \sim \frac{\hbar}{2L} . \qquad (26.102)$$

Here the first factor is the fluctuational energy density, $L^3$ is the 3-dimensional volume of the mode, and $\hbar/2L$ is its total fluctuational energy. Correspondingly, the mode’s metric fluctuations are

$$\delta g_{\mu\nu} \sim \frac{\sqrt{\hbar}}{L} . \qquad (26.103)$$

These fluctuations (which we have evaluated in the closest thing there is to a local Lorentz frame in our region $L^4$) guarantee that, whenever we try to measure a length $L$, we will make unavoidable errors with magnitude

$$\frac{\delta L}{L} \sim \delta g_{\mu\nu} \sim \frac{\sqrt{\hbar}}{L} . \qquad (26.104)$$

The smaller $L$ is, the larger these fractional errors are. When $L$ is made smaller than $\sqrt{\hbar}$, the fractional errors exceed unity, and there is no hope of measuring $L$ at all (and our analysis also breaks down because we cannot introduce a nearly Lorentz frame throughout the region $L^4$). Thus, for a lengthscale $L$ to be measurable, it must lie in the regime $L \gtrsim L_{PW}$, where

$$L_{PW} \equiv \sqrt{\hbar} = \left(\frac{G\hbar}{c^3}\right)^{1/2} = 1.616\times 10^{-33}\ {\rm cm} . \qquad (26.105)$$
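To put numbers to Eqs. (26.104) and (26.105), the sketch below evaluates $L_{PW}$ in cgs units from rounded CODATA constants, together with the order-of-magnitude estimate $\delta L/L \sim L_{PW}/L$ for a few illustrative lengths. The helper names are ours, and the error estimate inherits the order-of-unity looseness of Wheeler’s argument.

```python
import math

# CGS constants (CODATA values, rounded)
G = 6.674e-8        # cm^3 g^-1 s^-2
HBAR = 1.0546e-27   # erg s
C = 2.9979e10       # cm/s

def planck_wheeler_length():
    """L_PW = (G hbar / c^3)^(1/2), Eq. (26.105), in cm."""
    return math.sqrt(G * HBAR / C**3)

def fractional_length_error(length_cm):
    """Order-of-magnitude estimate delta L / L ~ L_PW / L from Eq. (26.104)."""
    return planck_wheeler_length() / length_cm

print(f"L_PW = {planck_wheeler_length():.3e} cm")
for length in (1.0, 1e-13, 1e-33):  # 1 cm, a nuclear size, a length near L_PW
    print(f"L = {length:.0e} cm -> dL/L ~ {fractional_length_error(length):.1e}")
```

Only when $L$ comes within a factor of order unity of $L_{PW}$ do the fractional errors reach unity; at laboratory and even nuclear scales they are utterly negligible.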

The critical lengthscale $L_{PW}$ is called the Planck-Wheeler length. It is the shortest length that can possibly be measured with any accuracy. Thus, it is the smallest length that can be subjected to the classical laws of physics. Since gravity is characterized, classically, by the geometry of spacetime, classical gravity (i.e., general relativity) must break down on lengthscales shorter than $L_{PW}$. This should be true on small scales in ordinary, nearly flat spacetime; and it also should be true near singularities: Near a singularity, when the radius of curvature of spacetime as predicted by classical general relativity becomes shorter than $L_{PW}$, general relativity must break down and be replaced by the correct quantum theory of gravity. And when quantum gravity comes into play, it may very well convert the singularity into something nonsingular. Thus, to understand the true outcome of the gravitational implosion of a star, deep inside the horizon, one must understand quantum gravity; and to understand the initial conditions of the universe, one must understand quantum gravity. The attempt to construct a quantum theory of gravity which unifies gravity with the strong, electromagnetic, and weak forces in an elegant and mutually consistent way is one of the “holy grails” of current theoretical physics research.

26.7 Inflationary Cosmology

If $\rho + 3P > 0$, then the universe is guaranteed to have cosmological horizons of the sort that we met when discussing acoustic oscillations in the era of recombination (Sec. 26.5.7). The background radiation received at Earth today last interacted with matter (at a redshift $z \sim 10^3$) quite near our cosmological horizon. Two observers at the locations of that last interaction, one on our north celestial pole (i.e., directly above the north pole of the Earth) and the other on our south celestial pole (i.e., directly above the south pole of the Earth), are today far outside each other’s cosmological horizons; and at the moment of that last interaction, they were enormously far outside each other’s horizons. It is a great mystery how two regions of the universe, so far outside each other’s horizons (i.e., with no possibility for causal contact since the big bang), could have the same temperatures at the time of that last interaction, to within the measured accuracy of $\Delta T/T \sim 10^{-4}$.

One solution to this mystery is to assume that the universe emerged from the Planck-Wheeler era of quantum gravity in a very nearly homogeneous and isotropic state (but one with enough inhomogeneities to seed galaxy formation). This “solution” leaves to a future theory of quantum gravity the task of explaining why this state was nearly homogeneous and isotropic. An alternative solution, proposed by Alan Guth (1981), then a postdoctoral fellow at Stanford University, is inflation. Suppose, Guth suggests, that the universe emerged from the Planck-Wheeler, quantum-gravity era, with its fields in a vacuum state for which $T_{\rm vac} = -\rho_{\rm vac}\,g$ was nonzero and perhaps was even as large in magnitude as $\rho_{\rm vac} \sim L_{PW}^{-2} \sim 10^{93}\ {\rm g/cm^3}$. The expansion factor $a$ presumably will have been of order $L_{PW}$ when the universe emerged from the Planck-Wheeler era; and the evolution equation (26.30) predicts that it subsequently will expand classically in accord with the law

$$a = L_{PW}\exp\left[\left(\frac{8\pi}{3}\rho_{\rm vac}\right)^{1/2} t\right] = L_{PW}\exp\left(\frac{t}{\mu L_{PW}}\right) , \qquad (26.106)$$

where $\mu$ is a dimensionless constant that might be of order unity. This exponential expansion under the action of vacuum stress-energy is called “inflation”; if it lasted long enough, our entire universe was so small in the early stages of inflation that it could easily communicate with itself, producing homogeneity and isotropy. Of course, inflation at this enormous rate could not have lasted forever; it surely is not continuing today. If it occurred at all, it must have turned off at some point as a result of the fields undergoing a phase transition from the original vacuum state (sometimes called the “false vacuum”) to a new vacuum state in which $\rho_{\rm vac}$ is zero, or perhaps equal to the tiny $\rho_\Lambda$ that we observe today.
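The orders of magnitude in this paragraph can be checked with a few lines of arithmetic. The sketch below converts a geometrized vacuum density $\rho_{\rm vac} \sim L_{PW}^{-2}$ into cgs units (multiplying by $c^2/G$, which yields the Planck density $c^5/\hbar G^2$) and evaluates the expansion factor and elapsed time after $N$ e-foldings of Eq. (26.106). Setting $\mu = 1$ is purely an illustrative assumption, and the helper name is ours.

```python
import math

G = 6.674e-8        # cm^3 g^-1 s^-2
HBAR = 1.0546e-27   # erg s
C = 2.9979e10       # cm/s

L_PW = math.sqrt(G * HBAR / C**3)   # Planck-Wheeler length, cm

# A geometrized density of 1/L_PW^2 (units cm^-2) becomes a mass density
# after multiplying by c^2/G; the result equals c^5 / (hbar G^2).
rho_vac = C**2 / (G * L_PW**2)      # g/cm^3

def inflation_scale_factor(n_efolds, mu=1.0):
    """Eq. (26.106) after N e-foldings: returns (a in cm, elapsed time in s).
    mu = 1 is an illustrative assumption, not a value from the text."""
    t_seconds = n_efolds * mu * L_PW / C
    return L_PW * math.exp(n_efolds), t_seconds

print(f"rho_vac ~ {rho_vac:.2e} g/cm^3")
a70, t70 = inflation_scale_factor(70)
print(f"after 70 e-folds: a ~ {a70:.1e} cm, t ~ {t70:.1e} s")
```

The density comes out near $5\times10^{93}\ {\rm g/cm^3}$, consistent with the “$\sim 10^{93}$” above, and 70 e-foldings of Planck-scale inflation take only about $4\times10^{-42}$ s.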
Although these ideas seem speculative, they have been made quite plausible by two factors: (i) they fit naturally into present ideas about the physics of the grand unification of all forces; and (ii) they successfully explain a number of mysterious features of the universe in which we live, including its spatial flatness, the high degree of isotropy of the background radiation (Ex. 26.11), and the flat (wavelength-independent) spectrum of rms density fluctuations that ultimately condensed into galaxies. For details see, e.g., Kolb and Turner (1994).

26.7.1 Amplification of Primordial Gravitational Waves by Inflation

This section is not yet written.

26.7.2 Search for Primordial Gravitational Waves by their Influence on the CMB; Probing the Inflationary Expansion Rate

This section is not yet written.


****************************
EXERCISES

Exercise 26.11 Practice: Inflationary explanation for the isotropy of the cosmological background radiation
Consider an inflationary cosmological model in which (i) the expansion factor inflates as $a = L_{PW}\exp(t/\mu L_{PW})$ until it has e-folded $N \gg 1$ times, i.e., from time $t = 0$ (when it emerges from the Planck-Wheeler era) to time $t = N\mu L_{PW}$, and then (ii) a phase transition drives $a$ into the standard expansion produced by radiation with $P = \rho/3$: an expansion with $a \propto t^{1/2}$ [Eq. (26.45)]. Show that in this model, if the number of e-folding times during inflation is $N \gtrsim 70$, then the north-celestial-pole and south-celestial-pole regions which emit the background radiation we see today are actually inside each other’s cosmological horizons: They were able to communicate with each other (and thereby, the inflationary scenario suggests, were able to homogenize to the same temperature) during the inflationary era. Hint: the number of e-foldings required is given analytically by

$$N \gtrsim \ln\left[\frac{H_o^{-1}}{L_{PW}}\left(\frac{\rho_o}{\Lambda}\right)^{1/4}\right] \simeq 70 . \qquad (26.107)$$

****************************
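Eq. (26.107) does not spell out $\rho_o$ and $\Lambda$, so the numerical sketch below adopts one reading, which is our assumption: $\rho_o$ is today’s photon mass density $a_{\rm rad}T^4/c^2$ with $T = 2.725$ K, and $\Lambda$ stands for the inflationary vacuum density, taken to be the Planck density $c^5/(\hbar G^2) \approx 5\times10^{93}\ {\rm g/cm^3}$. The geometrized length $H_o^{-1}$ is converted to $c/H_o$ with $H_o = 70\ {\rm km\,s^{-1}\,Mpc^{-1}}$. The function name and defaults are illustrative.

```python
import math

G = 6.674e-8          # cm^3 g^-1 s^-2
HBAR = 1.0546e-27     # erg s
C = 2.9979e10         # cm/s
A_RAD = 7.5657e-15    # radiation constant, erg cm^-3 K^-4
MPC_CM = 3.0857e24    # 1 Mpc in cm

def min_efolds(h0_km_s_mpc=70.0, t_cmb=2.725):
    """Evaluate Eq. (26.107) under our reading of its symbols:
    rho_o = today's photon mass density a T^4 / c^2 (assumption), and
    Lambda = inflationary vacuum density = Planck density c^5/(hbar G^2) (assumption)."""
    l_pw = math.sqrt(G * HBAR / C**3)
    hubble_length = C / (h0_km_s_mpc * 1.0e5 / MPC_CM)   # c / H_o in cm
    rho_o = A_RAD * t_cmb**4 / C**2                      # g/cm^3
    rho_vac = C**5 / (HBAR * G**2)                       # g/cm^3
    return math.log(hubble_length / l_pw * (rho_o / rho_vac) ** 0.25)

print(f"N_min ~ {min_efolds():.1f}")
```

Under these assumptions the estimate comes out close to 67, in line with the rough “$\simeq 70$” of Eq. (26.107); adding the neutrino background to $\rho_o$ would raise $N$ by only about 0.1.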

Bibliographic Note
For an elementary introduction to cosmology, we recommend Chaps. 17, 18, 19 of Hartle (2003); and at an intermediate level, similar to this Chap. 26, we recommend Chap. 8 of Carroll (2004). Textbook treatments of cosmology written before about 1995 are rather out of date, so one should consult the standard old relativity texts such as MTW (1973) and Weinberg (1972) only for the most basic ideas. For physical processes in the early universe such as dark matter, inflation and phase transitions, we recommend Kolb and Turner (1994). Peebles (1993) is an excellent, but a bit out-of-date, treatise on all aspects of cosmology. More up-to-date treatises include Dodelson (2003) and Ryden (2002).

Bibliography
Balbi, A., et al., 2000. “Constraints on cosmological parameters from MAXIMA-1,” Astrophysical Journal Letters, submitted. astro-ph/0005124.
Bennett, C. L., et al., 2003. “First Year Wilkinson Microwave Anisotropy Probe (WMAP) Observations: Preliminary Maps and Basic Results,” Astrophysical Journal Letters, submitted. astro-ph/0302207.


Box 26.5 Important Concepts in Chapter 26

• Homogeneity and isotropy of universe, and its mathematical description via hypersurfaces, synchronous coordinates, Robertson-Walker line element, and three spatial geometries (closed, flat and open), Sec. 26.2
  – Homogeneous observers and their local Lorentz frame, Secs. 26.2, 26.3, 26.5.2
• Functions describing evolution of universe: expansion factor a(t) and total density of mass-energy ρ(t), Secs. 26.2, 26.3
  – Evolution laws for ρ(t) and a(t): first law of thermodynamics, and Einstein equation for expansion rate, Sec. 26.3
  – Critical density to close the universe, ρ_crit, Eq. (26.35)
  – Effective potential for expansion of universe and qualitative and quantitative forms of a(t), Secs. 26.4.3 and 26.4.4
• Constituents of the universe: baryonic matter, cold dark matter, radiation, and dark energy; and their evolution as functions of the universe’s expansion factor a, Secs. 26.4.1, 26.4.3, 26.5.8
  – Ω ≡ ρ/ρ_crit and its measured values for constituents, Secs. 26.4.3, 26.5.4–26.5.7
  – Stress-energy tensor for the vacuum, cosmological constant, and their possible role as the dark energy, Sec. 26.4.1, Box 26.2
  – Radiation temperature and cosmological redshift as functions of a, Sec. 26.4.4
  – Preservation of Planckian spectrum during evolution, Box 26.4
• Physical processes during expansion: baryon-antibaryon annihilation, electron-positron annihilation, primordial nucleosynthesis, radiation-matter equality, plasma recombination, galaxy formation, Secs. 26.4.5, 26.5.4
• Observational parameters: Hubble expansion rate H_o, Ω for constituents, spatial curvature k/a_o², deceleration parameter q_o, age of universe t_o, Secs. 26.5.1, 26.5.3
  – Measured values and methods of measurement, Secs. 26.5.3–26.5.9
  – Distance-redshift relation, Sec. 26.5.3
  – Angular-diameter distance as function of redshift, Eq. (26.79) and Ex. 26.8
  – Anisotropy of the CMB; Doppler peaks, and their use to measure the spatial geometry of the universe and thence Ω, Sec. 26.5.7, Fig. 26.6
  – Ages of the universe constrain equation of state of dark energy, Sec. 26.5.8
  – Luminosity distance; magnitude-redshift relation, Sec. 26.5.9, Ex. 26.9
• Big-bang singularity, Planck-Wheeler length and quantum gravity, Sec. 26.6
• Inflation, Sec. 26.7

Carroll, S. M., 2004. Spacetime and Geometry: An Introduction to General Relativity, San Francisco: Addison-Wesley.
Dicke, Robert H., Peebles, P. James E., Roll, Peter G., and Wilkinson, David T., 1965. “Cosmic black-body radiation,” Astrophysical Journal, 142, 414–419.
Dodelson, S., 2003. Modern Cosmology, Academic Press.
Einstein, Albert, 1917. “Kosmologische Betrachtungen zur allgemeinen Relativitätstheorie,” Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften, 1917 volume, 142–152.
Einstein, Albert, 1931. “Zum kosmologischen Problem der allgemeinen Relativitätstheorie,” Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften, phys.-math. Kl., 1931 volume, 235–237.
Friedmann, Alexander Alexandrovich, 1922. “Über die Krümmung des Raumes,” Zeitschrift für Physik, 10, 377–386.
Guth, Alan H., 1981. “Inflationary universe: A possible solution to the horizon and flatness problems,” Physical Review D, 23, 347–356.
Hartle, J. B., 2003. Gravity: An Introduction to Einstein’s General Relativity, San Francisco: Addison-Wesley.
Hawking, Stephen W., and Ellis, George F. R., 1968. “The cosmic black body radiation and the existence of singularities in our universe,” Astrophysical Journal, 152, 25–36.
Hinshaw, G., et al., 2009. “Five-year Wilkinson Microwave Anisotropy Probe (WMAP) observations: Data processing, sky maps, and basic results,” Astrophysical Journal Supplement Series, 180, 225–245.
Hubble, Edwin Powell, 1929. “A relation between distance and radial velocity among extragalactic nebulae,” Proceedings of the National Academy of Sciences, 15, 169–173.
Kolb, Edward W., and Turner, Michael S., 1994. The Early Universe, Reading: Addison-Wesley.
Kuo, C. L., et al., 2002. Astrophysical Journal, in press. astro-ph/0212289.
Landau, Lev Davidovich, and Lifshitz, Yevgeny Michailovich, 1962. The Classical Theory of Fields, Reading, MA: Addison-Wesley.
Lange, Andrew E., et al., 2000. “First estimations of cosmological parameters from Boomerang,” Physical Review Letters, submitted. astro-ph/000504.
MTW: Misner, Charles W., Thorne, Kip S., and Wheeler, John A., 1973. Gravitation, San Francisco: W. H. Freeman & Co.

Pais, Abraham, 1982. ‘Subtle is the Lord...’ The Life and Science of Albert Einstein, New York: Oxford University Press.
Pearson, T. J., et al., 2002. Astrophysical Journal, submitted. astro-ph/0205288.
Peebles, P. J. E., 1993. Principles of Physical Cosmology, Princeton: Princeton University Press.
Penrose, Roger, 1965. “Gravitational collapse and space-time singularities,” Physical Review Letters, 14, 57–59.
Penzias, Arno A., and Wilson, Robert W., 1965. “A measurement of excess antenna temperature at 4080 Mc/s,” Astrophysical Journal, 142, 419–421.
Perlmutter, S., et al., 1999. “Measurements of Ω and Λ from 42 high-redshift supernovae,” Astrophysical Journal, 517, 565–586.
Riess, A. G., et al., 1998. “Observational evidence from supernovae for an accelerating universe and a cosmological constant,” Astronomical Journal, 116, 1009.
Robertson, Howard Percy, 1935. “Kinematics and World Structure,” Astrophysical Journal, 82, 284–301; and 83, 187–201 & 257–271.
Ryan, Michael, and Shepley, Lawrence, 1975. Homogeneous, Relativistic Cosmology, Princeton, NJ: Princeton University Press.
Ryden, B. S., 2002. Introduction to Cosmology, San Francisco: Addison-Wesley.
Sakharov, Andrei D., 1965. Zhurnal Eksperimentalnoi i Teoreticheskoi Fiziki, 49, 345.
Sunyaev, Rashid A., and Zel’dovich, Yakov B., 1970. “Small-scale fluctuations of relic radiation,” Astrophysics and Space Science, 7, 3–19.
Turner, Michael S., 1999. In Proceedings of Particle Physics and the Universe (Cosmo98), ed. D. O. Caldwell, Woodbury, NY: AIP. astro-ph/9904051.
Walker, Arthur Geoffrey, 1936. “On Milne’s theory of world-structure,” Proceedings of the London Mathematical Society, 42, 90–127.
Weinberg, S., 1972. Gravitation and Cosmology, New York: John Wiley.
Wheeler, John Archibald, 1957. “On the nature of quantum geometrodynamics,” Annals of Physics, 2, 604–614.
Zel’dovich, Yakov Borisovich, 1968. “The cosmological constant and the theory of elementary particles,” Soviet Physics—Uspekhi, 11, 381–393.
Zel’dovich, Yakov Borisovich, and Novikov, Igor Dmitrievich, 1983. Relativistic Astrophysics Volume 2: The Structure and Evolution of the Universe, Chicago: University of Chicago Press.

04.AppA.11.1K.ooutline

9/26/04 9:00:25 PMPM

Appendix A
Concept-Based Outline of This Book
Version 04.AppA.11.1K [including Chapters 1 through 11] by Kip
[This appendix is in the very early stages of development]

I. Frameworks for physical laws and their relationships to each other
  A. Newtonian Physics as Geometry
    1. Flat space as the arena: Sec. 1.1
    2. Coordinate invariance of physical laws
      a. Idea introduced: Sec. 1.2
      b. Newtonian particle kinetics as an example: Sec. 1.4
      c. Newtonian mass conservation and force balance: Sec. 11.2
    3. Elasticity in Geometric Language
      a. Irreducible tensorial parts of strain tensor: expansion, shear and rotation: Sec. 10.2, Box 10.1
      b. Elastic stress tensor and force balance: Secs. 10.4, 11.2
  B. Special Relativistic Physics as Geometry
    1. Flat spacetime as the arena: Sec. 1.1
    2. Frame-invariance of physical laws
      a. Idea introduced: Sec. 1.2
      b. Relativistic particle kinetics: Sec. 1.4
      c. 4-momentum conservation: Secs. 1.4 & 1.12
        i. Stress-energy tensor: Sec. 1.12
      d. Electromagnetic theory: Sec. 1.10
        i. Lorentz force law: Sec. 1.4
      e. Kinetic theory: Chap. 2
        i. Derivation of equations for macroscopic quantities as integrals over momentum space [Sec. 2.5]
        ii. Distribution function is frame-invariant and constant along fiducial trajectories [Secs. 2.2 & 2.7]
    3. 3+1 Splits of spacetime into space plus time, and resulting relationship between frame-invariant and frame-dependent laws of physics
      a. Particle kinetics: Sec. 1.6
      b. Electromagnetic theory: Sec. 1.10
      c. Continuum mechanics; stress-energy tensor: Sec. 1.12
      d. Kinetic theory: Secs. 2.2, 2.5 & 2.7
        i. Cosmic microwave radiation viewed in moving frame: Ex. 2.3
    4. Spacetime diagrams
      a. Introduced: Sec. 1.7
      b. Simultaneity breakdown, Lorentz contraction, time dilation: Exercise 1.11
      c. The nature of time; twins paradox, time travel: Sec. 1.8
      d. Global conservation of 4-momentum: Secs. 1.6 & 1.12
      e. Kinetic theory -- Momentum space: Sec. 2.2
  C. General Relativistic Physics as Geometry


    1. Curved spacetime as the arena: Sec. 1.1
  D. Kinetic Theory in Geometric Language
    1. Phase space for particles as the arena: Chap. 2
  E. Statistical Mechanics in Geometric Language
    1. Phase space for ensembles as the arena: Chap. 3
    2. Invariance under canonical transformations (change of generalized coordinates and momenta in phase space): Sec. 3.2, Ex. 3.1
  F. Relationship of Classical Theory to Quantum Theory
    1. Mean occupation number as classical distribution function: Sec. 2.3
    2. Mean occupation number determines whether particles behave like a classical wave, like classical particles, or quantum mechanically: Secs. 2.3 & 2.4; Ex. 2.1; Fig. 2.5
    3. Geometric optics of a classical wave is particle mechanics of the wave's quanta: Sec. 6.3
    4. Geometric optics limit of Schrödinger equation is classical particle mechanics: Ex. 6.6

II. Statistical physics concepts
  A. Systems and ensembles: Sec. 3.2
  B. Distribution function
    1. For particles: Sec. 2.2
    2. For photons, and its relationship to specific intensity: Sec. 2.2
    3. For systems in statistical mechanics: Sec. 3.2
    4. Evolution via Vlasov or Boltzmann transport equation: Sec. 2.7
      a. Kinetic Theory: Sec. 2.7
      b. Statistical mechanics: Sec. 3.3
    5. For random processes: hierarchy of probability distributions: Sec. 5.2
  C. Thermal equilibrium
    1. Kinetic-theory distribution functions: Sec. 2.4
    2. In statistical mechanics; general form of distribution function in terms of quantities exchanged with environment: Sec. 3.4
    3. Evolution into statistical equilibrium--phase mixing and coarse graining: Secs. 3.6 and 3.7
  D. Representations of Thermodynamics
    1. Summary: Table 4.1
    2. Energy representation: Sec. 4.2
    3. Free-energy representation: Sec. 4.3
    4. Enthalpy representation: Ex. 4.3
    5. Gibbs representation: Sec. 4.4
  E. Specific statistical-equilibrium ensembles and their uses
    1. Summary: Table 4.1
    2. Canonical, Gibbs, grand canonical and microcanonical defined: Sec. 3.4
    3. Microcanonical: Secs. 3.5 and 4.2
    4. Canonical: Sec. 4.3
    5. Gibbs: Sec. 4.4
    6. Grand canonical: Sec. 3.7 and Ex. 3.6 and 3.7
  F. Fluctuations in statistical equilibrium
    1. Summary: Table 4.2


    2. Particle number in a box: Ex. 3.7
    3. Distribution of particles and energy inside a closed box: Sec. 4.5
    4. Temperature and volume fluctuations of system interacting with a heat and volume bath: Sec. 4.5
    5. Fluctuation-dissipation theorem: Sec. 5.6.1
    6. Fokker-Planck equation: Sec. 5.6.2
    7. Brownian motion: Sec. 5.6.3
  G. Entropy
    1. Defined: Sec. 3.6
    2. Second law (entropy increase): Secs. 3.6, 3.7
    3. Entropy per particle: Secs. 3.7, 3.8, Fig. 3.4, Exs. 3.5, 3.9
    4. Of systems in contact with thermalized baths:
      a. Summary: Table 4.1
      b. Heat & volume bath (Gibbs): Sec. 4.4
        i. Phase transitions: Secs. 4.4 & 4.6, Ex. 4.4 & 4.7
        ii. Chemical reactions: Sec. 4.4, Ex. 4.5 & 4.6
    5. Of black holes and the expanding universe: Sec. 3.8
    6. Connection to Information: Sec. 3.9
  H. Macroscopic properties as integrals over momentum space:
    1. In kinetic theory
      a. Number-flux vector, stress-energy tensor: Sec. 2.5
      b. Equations of state: Sec. 2.6
      c. Transport coefficients: Sec. 2.8
    2. In statistical mechanics: Extensive thermodynamic variables
      a. Grand partition function: Ex. 3.6
    3. In theory of random processes: Ensemble averages: Sec. 5.2
  I. Random Processes: Chap. 5 [extended to complex random processes in multiple dimensions: Ex. 8.7]
    1. Properties of random processes
      a. Stationarity: Sec. 5.2
      b. Markov: Sec. 5.2
      c. Gaussian: Sec. 5.2
      d. Ergodicity: Sec. 5.3
    2. Characterization of random processes
      a. Probability distributions: Sec. 5.2
      b. Correlation functions: Sec. 5.3
      c. Spectral densities: Sec. 5.3
        i. white, flicker, random-walk: Sec. 5.4
        ii. shot noise: Sec. 5.5
    3. Theorems
      a. Central limit theorem [many influences -> Gaussian]: Sec. 5.2
        i. and shot noise: Sec. 5.5


      b. Wiener-Khintchine [correlation <-> spectral density]: Sec. 5.3
        i. van Cittert-Zernike theorem in optics as a special case: Ex. 8.7
      c. Doob's theorem [Gaussian & Markoff -> fully characterized by mean, variance, and relaxation time]: Sec. 5.3
      d. Effect of filter on spectral density: Sec. 5.5
      e. Fluctuation-dissipation theorem: Sec. 5.6.1, Ex. 5.7, 5.8, 5.10
        i. example of thermoelastic noise: Secs. 10.5, 5.10, Ex. 10.6
      f. Fokker-Planck equation: Sec. 5.6.2
        i. and Brownian motion: Sec. 5.6.3, Ex. 5.6, 5.9
    4. Filtering
      a. Band-pass filter: Sec. 5.5, Ex. 5.2
      b. Wiener's optimal filter: Ex. 5.3

III. Optics (wave propagation) concepts
  A. Plane waves & wave packets in homogeneous media
    1. Dispersion relation, phase velocity, group velocity: Sec. 6.2
    2. Longitudinal wave packet spreading due to dispersion: Ex. 6.2
    3. Transverse wave packet spreading due to finite-wavelength effects: Sec. 7.2; Fig. 7.2
  B. Geometric optics approximation: Sec. 6.3, Box 11.1
    1. Derivation via 2-lengthscale expansion: Sec. 6.3
    2. Propagation laws and their relation to Hamiltonian mechanics and quantum mechanics: Secs. 6.3 and 6.5
    3. Fermat's principle: Sec. 6.3
      a. Justified by Fresnel theory of diffraction: Sec. 7.4
    4. Paraxial optics: Sec. 6.4
      a. Paraxial ray optics: Sec. 6.4
      b. Paraxial Fourier (wave) optics: Sec. 7.5
    5. Breakdown of geometric optics
      a. General discussion (wave packet spreading, parametric wave amplification, ...): Sec. 6.3, Ex. 11.3
      b. Caustics: Secs. 6.6, 7.6
    6. Application to seismic waves in Earth: Sec. 11.5.1, Fig. 11.5
  C. Finite-Wavelength Effects in Homogeneous, Dispersion-Free Media
    1. Helmholtz-Kirchhoff Integrals
      a. Precise version: field at point as integral over surrounding closed surface: Eq. (7.4)
      b. As integral over an aperture: Eq. (7.6)
    2. Fraunhofer diffraction (far from diffracting object): Sec. 7.3
      a. As Fourier transform of field leaving aperture: Eq. (7.11)
      b. Use of Convolution Theorem to compute diffraction patterns from complicated objects: Fig. 7.4
      c. Babinet's principle: Sec. 7.3.2
      d. Airy pattern for circular aperture: Fig. 7.6
      e. Caustics: Sec. 7.6, Fig. 7.13
    3. Fresnel diffraction (near diffracting object): Sec. 7.4
      a. Fresnel integrals and Cornu spiral: Fig. 7.8


      b. Diffraction pattern from straight edge: Fig. 7.9
    4. Fourier optics [Paraxial Optics with finite wavelengths]: Sec. 7.5
      a. Propagators (Point Spread Functions): Sec. 7.5
      b. Gaussian beams: Sec. 7.5.5
  D. Finite-Wavelength Effects in the Mixing of a Few Wave Beams: Chap. 8
    1. Coherence: Sec. 8.2
      a. degree of coherence:
        i. degree of spatial coherence = degree of lateral coherence = complex fringe visibility, gamma_perp: Secs. 8.2.2 - 8.2.4
        ii. fringe visibility, V = | gamma_perp |: Sec. 8.2.2
        iii. degree of temporal coherence = degree of longitudinal coherence, gamma_||: Sec. 8.2.6
      b. coherence time (Sec. 8.2.3), coherence length (Sec. 8.2.6), volume of coherence (Sec. 8.2.8)
      c. interferogram and spectrum: Sec. 8.2.7
      d. intensity coherence and correlations: Sec. 8.6
    2. Van Cittert-Zernike Theorem (coherence as Fourier transform of angular intensity distribution and spectrum): Sec. 8.2.2
      a. as special case of Wiener-Khintchine Theorem: Ex. 8.7
  E. Optical Instruments
    1. Lens: Fig. 6.3, Fig. 6.5
      a. geometric-optics analysis: Fig. 6.3, Fig. 6.5
      b. Fourier-optics analysis: Sec. 7.5, Fig. 7.11
    2. Refracting telescope: Ex. 6.9
    3. Optical cavity:
      a. geometric-optics analysis: Ex. 6.10
      b. as Fabry-Perot interferometer: Sec. 8.4.2
      c. in interferometric gravitational-wave detector: Sec. 8.5
    4. Optical fiber:
      a. geometric-optics analysis: Ex. 6.5
      b. Fourier-optics analysis - Gaussian beam: Ex. 7.8
    5. Diffraction grating: Sec. 7.2, Fig. 7.4
    6. Zone Plate, Fresnel Lens: Sec. 7.4
    7. Phase Contrast Microscope: Sec. 7.5, Fig. 7.12
    8. Young's slits: Sec. 8.2.1
    9. Michelson interferometer: Sec. 8.2.7
    10. Michelson stellar interferometer: Sec. 8.2.5
    11. Fourier transform spectrometer: Sec. 8.2.7
    12. Radio interferometer: Sec. 8.3
      a. earth-rotation aperture synthesis: Sec. 8.3.1
      b. closure phase: Sec. 8.3.3
    13. Interfaces, mirrors, beam splitters:
      a. Reciprocity relations for transmission and reflection: Sec. 8.4.1, Ex. 8.9

b. Antireflection coating: Ex. 8.10 14. Etalon: Sec. 8.4.1 a. finesse: Sec. 8.4.1 15. Fabry-Perot interferometer: Sec. 8.4.2 16. Fabry-Perot spectrometer: Sec. 8.4.2 a. chromatic resolving power: Sec. 8.4.2 17. Sagnac interferometer: Ex. 8.11 18. Interferometric gravitational-wave detector: Sec. 8.5 19. Hanbury Brown-Twiss intensity interferometer: Sec. 8.6 20. Hologram and holography: Sec. 9.3 a. Compact disks: Ex. 9.3 21. Frequency doubling crystals: Secs. 9.5.4, 9.6.1; Ex. 9.8 22. Phase Conjugating mirrors: Secs. 9.4, 9.6.2; Fig. 9.10, Ex. 9.9 23. Light Squeezing device: Ex. 9.10 IV. Continuum Mechanics in General A. Fundamental Concepts 1. Mass conservation: Sec. 11.2 2. Momentum conservation: Sec. 11.2 3. Wave equations: Box 11.1 V. Elasticity A. Fundamental Concepts 1. Strain; expansion, rotation and shear: Sec. 10.2 a. Irreducible tensorial parts of strain: Box 10.1 2. Cylindrical and spherical coordinates and bases; connection coefficients: Sec. 10.3, Box 10.2 3. Bulk and shear moduli; elastic stress tensor: Secs. 10.4.1, 10.4.2 a. Atomic-scale origin of moduli: Sec. 10.4.4, Ex. 10.15

b. Numerical values: Table 10.1 4. Elastic force on a unit volume; elastic stress balance: Sec. 10.4.2 a. Biharmonic equation for displacement; harmonic equation for expansion: Ex. 10.13 5. Elastic energy (energy of deformation): Sec. 10.4.3 6. Boundary conditions on displacements and stresses: Sec. 11.5.1, Ex. 11.6 7. Green's functions: Sec. 11.5.3 B. Elastostatic Equilibrium 1. Rods, Beams, Fibers a. Longitudinal compression; Young's modulus, Poisson ratio: Sec. 10.4.5 b. Torsion pendulum: Ex. 10.5 c. Linear pendulum - bending of support wire: Ex. 10.10 d. Bending of cantilever beam under own weight: Sec. 10.6, Ex. 10.8 e. Elastica (large deformations away from straightness): Ex. 10.9 f. Bifurcation of equilibria for stressed beam: Sec. 10.8 2. Plates

a. Deformation of plate under applied forces: Sec. 10.7 b. Stress polishing of mirrors: Sec. 10.6, Exs. 10.11, 10.12 C. Bifurcation of Equilibria 1. Example of compressed beam or playing card: Secs. 10.8, 11.4.5 2. List of other examples: Sec. 10.8 3. Relationship to onset of instabilities: Sec. 11.4.5 D. Elastodynamics 1. Wave equation for displacement: Sec. 11.3.1 2. Waves in a homogeneous, isotropic medium a. Longitudinal waves (P-waves) and transverse waves (S-waves): Secs. 11.3.2, 11.3.3, 11.3.4 b. Wave energy: Sec. 11.3.5 c. Scalar and vector potentials: Ex. 11.1

d. Junction conditions at a boundary: Sec. 11.5.1, Ex. 11.6 e. Wave-wave mixing at a boundary: Sec. 11.5.1, Fig. 11.4 f. Edge waves: Rayleigh waves and Love waves: Sec. 11.5.2 3. Waves in rods, strings and beams: Sec. 11.4 a. Compression waves, torsion waves, and flexural waves: Secs. 11.4.1-11.4.4, Exs. 11.3, 11.4 b. Relationship to buckling and bifurcation of equilibria: Sec. 11.4.5 4. Seismic waves in the earth: Sec. 11.5

a. Earth's elastostatic structure: Table 11.1, Fig. 11.5 5. Normal modes of a solid: Sec. 11.5.4 a. xylophone: Ex. 11.5 VI. Nonlinear Physics A. Resonant Wave-Wave mixing: Chap 9 1. Via nonlinear dielectric susceptibilities: Sec. 9.5 2. Use of anisotropy to counteract dispersion: Sec. 9.5.4, Ex. 9.6 3. Holography: Sec. 9.3 a. Use in compact disks: Ex. 9.3 4. Phase Conjugation: Secs. 9.4, 9.6.2; Fig. 9.10, Ex. 9.9 5. Frequency doubling: Secs. 9.5.4, 9.6.1; Ex. 9.8 6. Light Squeezing: Ex. 9.10 VII. Computational techniques A. Tensor analysis 1. Without a coordinate system, abstract notation: Secs. 1.3 and 1.9 2. Index manipulations in Euclidean 3-space and in spacetime a. Tools introduced; slot-naming index notation: Secs. 1.5, 1.7 & 1.9 b. Used to derive standard 3-vector identities: Exercise 1.15 3. In orthogonal curvilinear coordinates with orthonormal bases: Sec. 10.3 a. Connection coefficients: Sec. 10.3, Ex. 10.1, Box 10.2 B. Two-lengthscale expansions: Box 2.2

1. Solution of Boltzmann transport equation in diffusion approximation: Sec. 2.8 2. Semiclosed systems in statistical mechanics: Sec. 3.2 3. Statistical independence of subsystems: Sec. 3.4 4. As foundation for geometric optics: Sec. 6.3 C. Matrix and propagator techniques for linear systems 1. Paraxial geometric optics: Matrix methods: Sec. 6.4 2. Paraxial Fourier optics (finite wavelengths): Propagator methods: Sec. 7.5 D. Statistical physics: 1. Computation of fundamental potentials (or partition functions) via sum over states: Secs. 3.7, 4.3; Exercise 3.6 2. Renormalization group: Sec. 4.6 3. Monte Carlo: Sec. 4.7

E. Green's functions: 1. In elasticity theory: physicists' and Heaviside: Sec. 11.5.3

04.AppB.11.1K.ooutline

9/26/04 8:59:43 PM

Appendix B Some Unifying Concepts Version 04.AppB.11.1K [including mostly Chapters 1 through 11] by Kip [This appendix is in the very early stages of development]

I Physics as Geometry A Newtonian Physics as Geometry 1 Flat space as the arena: Sec. 1.1 2 Coordinate invariance of physical laws a Idea Introduced: Sec. 1.2 b Newtonian particle kinetics as an example: Sec. 1.4 c Newtonian mass conservation and force balance: Sec. 11.2 3 Elasticity in Geometric Language a Irreducible tensorial parts of strain tensor: expansion, shear and rotation: Sec. 10.2, Box 10.1 b Elastic stress tensor and force balance: Secs. 10.4, 11.2 B Special Relativistic Physics as Geometry 1 Flat spacetime as the arena: Sec. 1.1

2 Frame-invariance of physical laws a Idea introduced: Sec. 1.2 b Relativistic particle kinetics: Sec. 1.4 c Stress-energy and 4-momentum conservation: Secs. 1.4 & 1.12 d Electromagnetic theory: Secs. 1.4, 1.10 & 1.12.3 e Kinetic theory: Chap. 2 - Secs. 2.2, 2.5 & 2.7 3 3+1 Splits of spacetime into space plus time, and resulting relationship between frame-invariant and frame-dependent laws of physics: a Particle kinetics: Sec. 1.6

b Electromagnetic theory: Sec. 1.10 c Continuum mechanics; stress-energy tensor: Sec. 1.12 d Kinetic theory: Secs. 2.2, 2.5 & 2.7; Ex. 2.3 4 Spacetime diagrams: Secs. 1.6, 1.7, 1.8, 1.12, 2.2; Ex. 1.11 C General Relativistic Physics as Geometry 1 Curved spacetime as the arena: Sec. 1.1 2 Details: Part VI D Kinetic Theory in Geometric Language 1 Phase space for particles as the arena: Chap 2 2 Details: Chap 2 E Statistical Mechanics in Geometric Language 1 Phase space for ensembles as the arena: Chap 3 2 invariance under canonical transformations (change of generalized coordinates and momenta in phase space): Sec. 3.2, Ex. 3.1 II Relationship of Classical Theory to Quantum Theory A Quantum mean occupation number

1 As classical distribution function: Sec. 2.3

2 Determines whether particles behave like a classical wave, like classical particles, or quantum mechanically: Secs. 2.3 & 2.4; Ex. 2.1; Fig. 2.5 B Geometric optics (Eikonal or WKB approximation) 1 Geometric optics of a classical wave is particle mechanics of the wave's quanta: Sec. 6.3

2 Geometric optics limit of Schrödinger equation is classical particle mechanics: Ex. 6.6 III Conservation Laws A In relativity 1 Charge conservation: Sec. 1.11.1 2 particle & rest-mass conservation: Sec. 1.11.2 3 4-momentum conservation: Secs. 1.4, 1.6, 1.12 B In Newtonian physics 1 particle conservation 2 rest-mass conservation 3 momentum conservation 4 energy conservation C You get out what you put in! [use of conservation laws to deduce equations of motion] IV Statistical Physics Concepts A Systems: Sec. 3.2

1 [Give Examples] B Distribution functions & their evolution 1 For particles: Sec. 2.2 2 For photons, and its relationship to specific intensity: Sec. 2.2 3 For systems in statistical mechanics: Sec. 3.2 4 Evolution via Vlasov or Boltzmann transport equation: Sec. 2.7 a Kinetic Theory: Sec. 2.7 b Statistical mechanics: Sec. 3.3 5 For plasmons: Chap. 22 6 For random processes: hierarchy of probability distributions: Sec. 5.2 C Thermal [statistical] equilibrium; equilibrium ensembles; representations of thermodynamics 1 Evolution into statistical equilibrium: phase mixing and coarse graining: Secs. 3.6 and 3.7

2 In kinetic theory: Sec. 2.4 3 In statistical mechanics: a general form of distribution function in terms of quantities exchanged with environment: Sec. 3.4 b summary of ensembles and representations: Table 4.1, Sec. 3.4 c Microcanonical ensemble; Energy representation: Secs. 3.5 and 4.2

d Canonical ensemble; Free-energy representation: Sec. 4.3 e Gibbs ensemble; Gibbs representation: Sec. 4.4 f Grand canonical ensemble & representation: Sec. 3.7 and Exs. 3.6 and 3.7 D Fluctuations in statistical equilibrium 1 Summary: Table 4.2

2 Particle number in a box: Ex. 3.7 3 Distribution of particles and energy inside a closed box: Sec. 4.5 4 Temperature and volume fluctuations of system interacting with a heat and volume bath: Sec. 4.5 5 Fluctuation-dissipation theorem: Sec. 5.6.1 6 Fokker-Planck equation: Sec. 5.6.2

7 Brownian motion: Sec. 5.6.3 E Entropy 1 Defined: Sec. 3.6 2 Second law (entropy increase): Secs. 3.6, 3.7 3 Entropy per particle: Secs. 3.7, 3.8, Fig. 3.4, Exs. 3.5, 3.9 4 Of systems in contact with thermalized baths: Table 4.1; Secs. 4.4 & 4.6; Exs. 4.4 - 4.7 5 Phase transitions: Secs. 4.4 & 4.6, Exs. 4.4 & 4.7 6 Chemical reactions: Sec. 4.4, Exs. 4.5 & 4.6 7 Of black holes and the expanding universe: Sec. 3.8 8 Connection to Information: Sec. 3.9 F Macroscopic properties of matter as integrals over momentum space: 1 In kinetic theory: Secs. 2.5, 2.6, 2.8 2 In statistical mechanics: Ex. 3.6 3 In theory of random processes: Ensemble averages: Sec. 5.2 G Random Processes: Chap 5 [extended to complex random processes in multiple dimensions: Ex. 8.7] 1 Theory of Real-valued random processes: Chap 5

2 Theory of Complex-valued random processes: Ex. 8.7 3 Some unifying tools: a Gaussian processes & central limit theorem: Secs. 5.2, 5.3, 5.5 b Correlation functions, spectral densities, and Wiener-Khintchine [van Cittert-Zernike] theorem relating them: Secs. 5.3-5.5, 8.7 c Filtering: Sec. 5.5, Exs. 5.2 & 5.3 d Fluctuation-dissipation theorem: Secs. 5.6.1, 5.10, 10.5, Exs. 5.7, 5.8, 5.10, 10.6

e Fokker-Planck equation: Secs. 5.6.2, 5.6.3, Exs. 5.6, 5.9 V Optics [wave propagation] Concepts A Geometric Optics [eikonal or WKB approximation] 1 General theory: Sec. 6.3, Box 11.1 2 Dispersion relations & their role as Hamiltonians: Secs. 6.3, 6.5 3 Fermat's principle: Secs. 6.3, 7.4 4 Paraxial ray optics: Secs. 6.4, 7.5 a use in analyzing optical instruments: Exs. 6.10, 6.11, Figs. 6.3, 6.5 5 Breakdown of geometric optics: Secs. 6.3, 6.6, 7.6, Ex. 11.3 6 Caustics & Catastrophes: Secs. 6.6, 7.6 7 Applications: a waves in solids: Chap. 11 b seismic waves in Earth: Sec. 11.5.1, Fig. 11.5

c gravitational lensing B Linear, Finite-wavelength phenomena 1 Wave packets: their motion, energy, and spreading 2 Diffraction a General theory (Helmholtz-Kirchhoff integral): Sec. 7.2 b Fraunhofer (distant) regime: Sec. 7.3 i Diffraction patterns: Figs. 7.4, 7.6 ii Babinet's principle: Sec. 7.3.2 iii Field near caustics: Sec. 7.6, Fig. 7.13 c Fresnel (near) regime: Sec. 7.4 3 Fourier optics (paraxial optics with finite wavelengths): Sec. 7.5 a use in analyzing optical instruments: Sec. 7.5, Figs. 7.11, 7.12, Ex. 7.8 4 Coherence and its applications: Secs. 8.2, 8.6; Ex. 8.7 5 Interference: Chap 8 a Etalons and optical (Fabry-Perot) cavities: Sec. 8.4 b Radio interferometers: Sec. 8.3 c Gravitational-wave interferometers: Sec. 8.5 d Intensity interferometry: Sec. 8.6 6 Edge waves & wave-wave mixing at the boundary of a medium a Rayleigh waves and Love waves in a solid: Sec. 11.5.2 b Wave-wave mixing in solids: Sec. 11.5.1, Fig. 11.4 c Water waves: C Nonlinear wave-wave mixing 1 General theory: Chap. 9 2 Applications: a Holography: Sec. 9.3, Ex. 9.3 b Phase Conjugation: Secs. 9.4, 9.6.2; Fig. 9.10, Ex. 9.9 c Frequency doubling: Secs. 9.5.4, 9.6.1; Ex. 9.8 d Light Squeezing: Ex. 9.10 3 Venues: a In nonlinear crystals b In fluids i solitary waves ii onset of turbulence c In plasmas VI Equilibria and their stability A Bifurcations of equilibria and the onset of instabilities 1 General discussion and examples: Secs. 10.8, 11.4.5 2 Compressed beam or playing card: Secs. 10.8, 11.4.5 VII Computational Techniques A Differential Geometry; Vectors and Tensors: Chap. 1, Sec. 10.3, Part VI B Two-lengthscale expansions: Box 2.2

1 Solution of Boltzmann transport equation in diffusion approximation: Sec. 2.8 2 Semiclosed systems in statistical mechanics: Sec. 3.2 3 Statistical independence of subsystems: Sec. 3.4 4 As foundation for geometric optics: Sec. 6.3 5 Boundary layers in fluid mechanics C Matrix and propagator techniques for linear systems 1 Paraxial geometric optics: Matrix methods: Sec. 6.4 2 Paraxial Fourier optics (finite wavelengths): Propagator methods: Sec. 7.5 D Green's functions 1 In elasticity theory: physicists' and Heaviside: Sec. 11.5.3 E Reciprocity relations and Wronskian conservation 1 For partially transmitting mirrors and beam splitters in optics: F Junction conditions at a boundary 1 In elastodynamics: Sec. 11.5.1, Ex. 11.6 G Normal modes 1 In a solid: Sec. 11.5.4, Ex. 11.5

Ph 136a: General Relativity

1 October 2008

CHAPTER 1: PHYSICS IN EUCLIDEAN SPACE AND FLAT SPACETIME: GEOMETRIC VIEWPOINT Reading: Course Description; available at http://www.pma.caltech.edu/Courses/ph136/2008/ Preface and version 0801.1.K.pdf of Chapter 1 of Blandford and Thorne: Available at http://www.pma.caltech.edu/Courses/ph136/2008/ Note: Chapter 1 is much longer than the other chapters of the book. However, it should be fairly easy reading since the material it covers will largely be familiar from previous courses; only the viewpoint will be new. Problems — from version 0801.1.K.pdf of Chapter 1 of Blandford and Thorne. Please turn your solution in at the beginning of class (9AM) on Wednesday October 8. Note: I give you many options as to which problems to work. Please choose the ones that will teach you the most. Avoid those that are trivial for you, and abandon those that you find so difficult that you get badly hung up. If, in any problem, the choices I give are all trivial or all terribly difficult, then say so and pick some other problem from the chapter. A. Work one of the following: 1.1 Geometrized units 1.6 Numerics of component manipulations and 1.7 Meaning of slot-naming index notation 1.8 Index gymnastics 1.18 Vectorial identities (I especially recommend this one). B. Work one of the following: 1.9 Frame-independent expressions for energy, momentum and velocity 1.20 3+1 split of charge-current 4-vector C. Work one of the following: 1.11 Doppler shift without Lorentz transformation. 1.19 Reconstruction of F. D. If it is not trivial for you, do work 1.14 Spacetime diagrams. Otherwise work one of: 1.16 Twins paradox 1.17 Around the World on TWA E. Work one of 1.27 Global conservation of 4-momentum in a Lorentz frame. 1.30 Stress-energy tensor and energy-momentum conservation for a perfect fluid

Ph 136a

8 October 2008 CHAPTER 2: KINETIC THEORY

Reading: Chapter 2 of Blandford and Thorne. Most of the applications of kinetic theory are in Example Exercises. These Examples are designed to give you some feel for how kinetic theory is used in practice. Some of them are also rather interesting. I urge you to read all Example Exercises as though they were part of the text, even though you will be working only a few of them. Feedback on Problems Please provide feedback on the feedback website, https://courses.caltech.edu/login/ . The password for Ph136 is my coauthor’s name. Problems from Version 0802.1 of Chapter 2 A. Do either: 1. Exercise 2.6: Observations of cosmic microwave radiation from Earth; or 2. Both Exercise 2.2: Distribution function for particles with a range of rest masses, and Exercise 2.3: Regimes of particulate and wave-like behavior. B. Exercise 2.13: Collisionless Boltzmann implies conservation of particles and 4-momentum. C. Do either: 1. Exercise 2.11: Specific heat for phonons in an isotropic solid; or 2. Exercise 2.9: Equation of state for relativistic, electron-degenerate hydrogen. Don’t hesitate to use Mathematica or Macsyma or Maple to do the integrals you encounter, or to do any other part of this or any problem. This is not a course on mathematical methods. Our goal is to teach you physics. If a problem turns out to involve heavy computations from which you learn little (if the grunge-to-learning ratio is outrageously high), find a way around the grunge, or skip the grungy part of the problem (but write a note in your solution saying you are doing that and why), or even abandon the problem entirely and focus your learning efforts elsewhere; e.g., pick some other problem to work instead, and say so. D. Do either: 1. Exercise 2.14: Solar heating of the Earth: The greenhouse effect. This one has the virtue that we lead you through it step by step. OR 2. Exercise 2.15: Olbers’ paradox and solar furnace.
This one requires more independent thinking, and may be shorter than 2.9 if you think creatively. E. Exercise 2.19: Diffusion coefficient computed in the collision-time approximation. This is a rather long exercise, designed to give you experience doing a Boltzmann-transport analysis in the diffusion approximation. If you think you fully understood the Boltzmann-transport computation of thermal conductivity in the text, then I suggest that, in place of 2.19, you do Exercise 2.20: Neutron diffusion in a nuclear reactor.

Ph 136a

15 October 2008 CHAPTER 3: STATISTICAL MECHANICS

Reading: Chapter 3 of version 0803.1.K of Blandford and Thorne — focusing on Sections 3.1 through 3.10 (pages 1 through 43). The remainder of the chapter should be of “cultural” interest, and you are encouraged to read and think about it. Problems from version 0803.1.K of Blandford and Thorne A. 1. If you are rusty on Hamiltonian mechanics or have never studied it, do Exercise 3.1 Canonical Transformations - Page 9 ; 2. Otherwise, do Exercise 3.14 Bose-Einstein Condensation in a Box - Page 43 B. Do three of the five parts (you choose which three) of Exercise 3.2 Estimating Entropy - Page 25 C. Do one of the following two exercises: 1. 3.3 Additivity of Entropy for Statistically Independent Systems - Page 25 2. 3.4 Entropy of a Thermalized Mode of a Field - Page 25 D. 1. If you have never done an exercise similar to this, do 3.6 Grand Canonical Ensemble for a Monatomic Gas - Page 32; otherwise 2. If you have never done an exercise similar to this, do 3.7 Probability Distribution for the Number of Particles in a Cell - Page 33; otherwise 3. Do Exercise 3.12 Discontinuous Change of Specific Heat (in Bose-Einstein Condensation) - Page 42 E. Do Exercise 3.10 Primordial Element Formation - Page 35

Ph 136a

22 October 2008 CHAPTER 4: STATISTICAL THERMODYNAMICS

Reading: Chapter 4 of Blandford and Thorne. Problems - based on version 0804.1.K.pdf of the chapter A. Exercise 4.1: Pressure-Measuring Device B. Do Exercise 4.3: Enthalpy Representation for Thermodynamics; if you find it overly difficult and you are rusty on thermodynamics, instead do Exercise 4.2: Energy Representation for a Nonrelativistic Monatomic Gas. C. Do Exercise 4.5: Electron-Positron Equilibrium at Low Temperatures. D. Do either Exercise 4.4: Latent Heat and Clausius-Clapeyron Equation, or Exercise 4.9: One-Dimensional Ising Lattice E. Do either Exercise 4.7: Fluctuations and Phase Transitions in a Van der Waals Gas (don’t hesitate to use Mathematica or Maple if appropriate), or Exercise 4.8: Fluctuations of Systems in Contact with a Volume Bath.

Ph 136a

29 October 2008 CHAPTER 5: RANDOM PROCESSES

Reading: Chapter 5 of Blandford and Thorne. Note: If all of the problems in one of the following sets are trivial for you, or are so difficult that the ratio of effort to learning in working on them is exorbitant, then you can select some other problem in this chapter from which you will learn a lot, and work it. However, if you do this, you must give a clear explanation as to why you are switching to the other problem and why you selected that one; and the TAs can remove points if your explanation is not reasonable. Problems - based on version 0805.1.K.pdf of the chapter A. Do either: 1. Exercise 5.2 parts (a) and (b): finite-Fourier-transform filter, or 2. Exercise 5.2 parts (c) and (d): averaging filter. B. Do one of: 1. Exercise 5.3: Wiener’s optimal filter, or 2. Exercise 5.4: Allan Variance of Clocks, or 3. Exercise 5.5: Cosmological density fluctuations C. Do either: 1. Exercise 5.6: Noise in an L-C-R circuit, or 2. Exercise 5.10: Equations for A and B D. Do either: 1. Exercise 5.7: Thermal noise in a resonant-mass gravitational-wave detector, or 2. Exercise 5.8: Fluctuations of mirror position as measured by a laser E. Do either: 1. Exercise 5.11: Solution of Fokker-Planck equation for Brownian motion of a dust particle, or 2. Exercise 5.12: Solution of Fokker-Planck equation for an oscillator. [Note: you are likely to learn more from this problem than from 5.11; but if you are having trouble understanding Fokker-Planck theory, then 5.11 may be more useful.]

Ph 136a

5 November 2008 CHAPTER 6: GEOMETRIC OPTICS

Reading: Chapter 6 of Blandford and Thorne. Problems from version 0806.1.K.pdf or 0806.2.K.pdf A. If you have never explored wave-packet spreading, do: 1. Exercise 6.2 Gaussian wave packet and its spreading. Otherwise, do: 2. Exercise 6.4 Gravitational waves from a spinning neutron star. B. Do: Exercise 6.8 Geometric optics for the Schroedinger equation C. Do: 1. Exercise 6.10 Matrix optics for a refracting telescope, or 2. Exercise 6.11 Rays bouncing between two mirrors D. If you did not do Exercise 6.4, then do 1. Exercise 6.3 Quasi-spherical solution to vacuum scalar wave equation. Otherwise do: 2. Exercise 6.6 Sound waves in a wind, or 3. Exercise 6.12 Stellar gravitational lens

Ph 136a

12 November 2008 CHAPTER 7: DIFFRACTION

Reading: Chapter 7 of Blandford and Thorne. Problems A. Do: 1. Exercise 7.2 Pointillist painting, or 2. Exercise 7.3 Thickness of a human hair, or 3. Exercise 7.7 Light scattering by particles. B. Do: 1. Exercise 7.4 Diffraction grating, or 2. Exercise 7.10 Seeing in the atmosphere. C. Do: 1. Exercise 7.8 Diffraction pattern from a slit, or 2. Exercise 7.9 Zone plate. D. Do: 1. Exercise 7.13 Convolution via Fourier optics, or 2. Exercise 7.15 Noise due to scattered light in LIGO.

Ph 136a

19 November 2008 CHAPTER 8: INTERFERENCE

Reading: Chapter 8 of Blandford and Thorne. Problems A. Do: 1. Exercise 8.2 Lateral coherence of solar radiation, or 2. Exercise 8.4 Longitudinal coherence of heavy metal rock music. B. Do: 1. Exercise 8.7 Complex random processes, or 2. Exercise 8.10 Reciprocity relations for locally planar optical device. C. Do: 1. Exercise 8.5 Microwave background radiation, or 2. Exercise 8.8 Interferometry from space. D. Do: 1. Exercise 8.12 Antireflection coating, or 2. Exercise 8.14 Phase shift in LIGO arm cavity. NOTE: I made an error in my lecture on Wednesday. I claimed that the phase ϕ_r1 of the light emerging from a LIGO cavity (one with an infinitely reflecting end mirror) on resonance varies with changing cavity length d_1 in the same manner as for the light transmitted by a cavity with identical mirrors. This is not so. Compare the phase of ψ_r1 in Eq. (8.44a) near resonance with that of ψ_t in Eq. (8.33a) near resonance.

E. Do: 1. Exercise 8.16 Electron intensity interferometer, or 2. Exercise 8.15 Photon shot noise in LIGO

Ph 136a

26 November 2008 CHAPTER 9: NONLINEAR OPTICS

Reading: Chapter 9 of Blandford and Thorne

Problems from Version 0809.1.K A. Do: 1. Ex. 9.2, Holographically reconstructed wave, OR 2. Ex. 9.6, Dispersion relation for an anisotropic medium B. Ex. 9.3, Compact disks, DVDs and Blu-ray Disks. C. Do: 1. Ex. 9.4, Nonlinear susceptibilities for an isotropic medium, OR 2. Ex. 9.8, Efficiency of frequency doubling. D. Do: 1. Ex. 9.5, Growth equation in idealized three-wave mixing, OR 2. Ex. 9.7, Growth equation in realistic wave-wave mixing. E. Do: 1. Ex. 9.9, Phase conjugation by four-wave mixing, OR 2. Ex. 9.10, Squeezed light produced by phase conjugation.

Ph 136b

7 January 2009 CHAPTER 10: ELASTOSTATICS

Reading: Chapter 10 of Blandford and Thorne.

Problems - to be turned in at the beginning of class on Wednesday 14 January A. Do one of the following: 1. Exercise 10.2, The displacement vectors associated with expansion, rotation and shear; OR 2. Exercise 10.3, Elastic force density, AND 10.5 Elastic energy. B. Do Exercise 10.7, Order of magnitude estimates. C. If you have never before worked with connection coefficients, then: 1. Do Exercise 10.9, Connection in Spherical Coordinates, AND the Spherical part of Exercise 10.10, Expansion in Cylindrical and Spherical Coordinates. Otherwise: 2. Do Exercise 10.12 Torsion pendulum D. Do one of the following: 1. Exercise 10.19, Dimensionally reduced shape equation for a stressed plate; OR 2. Exercise 10.17, Elastica

Ph 136a

14 January 2007 CHAPTER 11: ELASTODYNAMICS

Reading: Chapter 11 of Blandford and Thorne. Problems: solutions due in class on Wednesday, 21 January Note: Monday January 19 is Martin Luther King’s birthday, so no class. However, I will be available in my office, 154 West Bridge, at my new office-hour time, 4PM, to discuss the homework and coursework, for anyone who wants to do so on a holiday. A. Do one of the following two exercises: 1. Exercise 11.2 Influence of gravity wave on speed 2. Exercise 11.7 Xylophones B. Do one of the following two exercises: 1. Exercise 11.3 Solving the algebraic wave equation by matrix techniques 2. Exercise 11.6 Speeds of elastic waves C. Do Exercise 11.8, Free-energy analysis of buckling instability D. Do one of the following two exercises: 1. Exercise 11.4 Lagrangian and energy density for elastodynamic waves 2. Exercise 11.10 Reflection and transmission of normal, longitudinal waves at a boundary E. Do Exercise 11.11 Earthquakes

Ph 136a

21 January 2009

CHAPTER 12: FOUNDATIONS OF FLUID DYNAMICS - First half Reading: Blandford and Thorne, Sections 12.1 – 12.6 of Chapter 12 (through page 31 but excluding Box 12.3 on Self Gravity) Problems: solutions due in class on Wednesday, 28 January A. Do one of the following: 1. Ex. 12.1, Weight in Vacuum, AND Ex. 12.4, Stability of Boats. OR: 2. Ex. 12.8, A hole in my bucket. B. Do one of the following: 1. Ex. 12.7, Shapes of stars in a tidally locked binary system. OR: 2. Ex. 12.9, Rotating planets, stars and disks. C. Do: Ex. 12.10, Crocco’s theorem. D. Do one of the following: 1. Ex. 12.12, Cavitation. OR: 2. Ex. 12.13, Collapse of a bubble.

Ph 136a

28 January 2008

CHAPTER 12: FOUNDATIONS OF FLUID MECHANICS – second half, and CHAPTER 13: VORTICITY – first half Reading: Chapter 12 of Blandford and Thorne: Box 12.3 and Section 12.7. Chapter 13 of Blandford and Thorne: Sections 13.1, 13.2, 13.4 [skip 13.3] Movies To Be Viewed via Streaming: [The Encyclopedia Britannica evidently has bought the copyright to these movies, has digitized them, and permits them to be streamed by MIT but not to be downloaded.] Vorticity, Part 1 and Vorticity, Part 2, at http://web.mit.edu/hml/ncfmf.html Note: the film clips that I showed in class on Monday were from these movies. Viewing movies such as these is a much better way to build up your physical intuition about fluid mechanics than reading any textbook. Problems: solutions due in class on Wednesday, 4 February A. Do: 1. Ex. 12.16 Mean Free Path, OR 2. Ex. 12.17 Kinematic Interpretation of Vorticity. B. Do: 1. Ex. 13.1 Vorticity and Incompressibility, OR 2. Ex. 13.2 Joukowski’s Theorem. C. Do: Ex. 13.3 Rotating Superfluid. D. Do: 1. Ex. 13.6 Potential Flow Around a Cylinder, OR 2. Ex. 13.9 Laminar Flow Down a Long Pipe. E. Do: 1. Ex. 13.7 Reynolds’ Numbers, OR 2. Ex. 13.8 Fluid Dynamical Scaling.

Ph 136a

4 February 2009 CHAPTER 13: VORTICITY – second half, and CHAPTER 14: TURBULENCE – first half

Reading: Chapter 13 of Blandford and Thorne: Sections 13.3, 13.5, 13.6 Chapter 14 of Blandford and Thorne: Sections 14.1, 14.2, 14.3 [Note: I will lecture on 14.3 on Monday] Movies To Be Viewed: These movies can be streamed from http://web.mit.edu/hml/ncfmf.html but they cannot be downloaded: Rotating Flows by Dave Fultz; Low Reynolds Number Flows by G. I. Taylor. The film clips that I showed in class on Monday and Wednesday were from these movies. Viewing movies such as these is a much better way to build up your physical intuition about fluid mechanics than reading any textbook. Problems: solutions due in class on Wednesday, 4 February A. DO: Exercise 13.10 Winds and ocean currents in the north Atlantic B. DO: Exercise 13.11 Circulation in a tea cup C. In the movie Low Reynolds Number Flows, 26 minutes 58 seconds into the movie, G. I. Taylor shows two small swimming devices, one whose driving force is produced by a paddle rotated back and forth about a central axis, and the other whose driving force is produced by a turning wire corkscrew. a. Explain physically why the paddle works at high Reynolds number but not low, and derive an approximate formula for the force that it exerts to propel the device forward at high Reynolds number. b. Explain physically why the turning wire corkscrew works at low Reynolds number but not high, and derive an approximate formula for the force that it exerts to propel the device forward at low Reynolds number. D. Do: 1. Exercise 14.1 part (b): Spreading of a Laminar Wake Behind a Sphere, AND Exercise 14.4 part (b): Turbulent Wake Behind a Sphere; OR 2. Exercise 14.2: Spreading of a 2-Dimensional Laminar Jet, AND Exercise 14.5 part (a): Spreading of a 2-Dimensional Turbulent Jet. [Note: This is more interesting and challenging than option 1, but will require more work.] E. Do: Exercise 14.3 Reynolds Stress and Weak Turbulence Theory

Ph 136b

11 February 2009 CHAPTER 14: TURBULENCE — second half CHAPTER 15: WAVES AND CONVECTION — first part

Reading: Chapter 14 of Blandford and Thorne: Sections 14.4 and 14.5 Chapter 15 of Blandford and Thorne: Sections 15.1 and 15.2 Suggested movies for download and viewing: 1. Turbulence by Robert W. Stewart. 2. The water-wave portions of Waves in Fluids by A.E. Bryson. These movies are available at http://web.mit.edu/hml/ncfmf.html Problems A. DO ONE OF: 1. Exercise 14.7 Excitation of Earth’s Normal Modes by Atmospheric Turbulence, OR 2. Exercise 14.8 Effect of Drag B. DO ONE OF: 1. Exercise 14.10 Feigenbaum Sequence, OR 2. Exercise 14.12 Strange Attractors C, D. DO TWO OF THE FOLLOWING THREE EXERCISES: 1. Exercise 15.1 Fluid motions in gravity waves, OR 2. Exercise 15.6 Boat waves, OR 3. Exercise 15.7 Shallow water waves with variable depth; tsunamis E. DO ONE OF: 1. Exercise 15.2 Maximum size of water droplets, OR 2. Force balance for an interface between two fluids


Ph 136b

18 February 2009
CHAPTER 15: WAVES AND CONVECTION - middle part

Reading: Chapter 15 of Blandford and Thorne version 0815.3.K.pdf: Sections 15.3, 15.4, 15.5
Suggested movies for download and viewing: 1. Waves in Fluids by A.E. Bryson. 2. The short Rossby wave portion of Rotating Fluids by D. Fultz [beginning about 20 minutes, 30 seconds into the movie]. I showed this in class on Wednesday. These movies are available at http://web.mit.edu/hml/ncfmf.html
Problems
Notes: * Please make sure you do the problems from version 0815.3.K.pdf of the chapter, not an earlier version. * This problem set is worth 30 points rather than the usual 50.
A. DO ONE OF: 1. Exercise 15.8 Breaking of a Dam, OR 2. Exercise 15.10 Two Soliton Solution, parts (b) and (c).
B. DO: Exercise 15.11 Rossby Waves in a Cylindrical Tank with Sloping Bottom
C. DO ONE OF: 1. Exercise 15.14 Radiation Reaction without the Slow Motion Approximation, OR 2. Exercise 15.15 Sound Waves from a Ball Undergoing Quadrupolar Oscillations


Ph 136b

25 February 2009

CHAPTER 15: WAVES AND CONVECTION - last, convection part
CHAPTER 16: COMPRESSIBLE AND SUPERSONIC FLOW - first part
Reading: Chapter 15 of Blandford and Thorne version 0815.4.K [NEW VERSION SINCE LAST WEEK]: Section 15.6, particularly 15.6.1, 15.6.2, 15.6.3 (the rest is for your cultural edification); Chapter 16 of Blandford and Thorne version 0816.1.K: Secs. 16.1–16.4.
Suggested movie for download and viewing: Channel Flow of a Compressible Fluid by Donald Coles. This movie is available at http://web.mit.edu/hml/ncfmf.html
Problems
Notes: * Please make sure you do the problems from version 0815.4.K.pdf of chapter 15, not an earlier version. * This problem set is worth the usual 50 points.
A. DO ONE OF: 1. Exercise 15.17 Poiseuille flow with a uniform temperature gradient, OR 2. Exercise 15.18 Thermal boundary layers
B. DO: Exercise 15.20 Width of a thermal plume
C. DO: Exercise 16.1 Values of gamma
D. DO: Exercise 16.4 Adiabatic, spherical accretion onto a black hole
E. DO: Exercise 16.6 Riemann invariants for shallow-water flow


Ph 136a

4 March 2009

CHAPTER 16: COMPRESSIBLE AND SUPERSONIC FLOW - last part
CHAPTER 17: MAGNETOHYDRODYNAMICS - first part
Reading: Secs. 16.5 and 16.6 of Chapter 16, Compressible & Supersonic Flow, and also Secs. 17.1, 17.2 and 17.4 [not 17.3] of Chapter 17, Magnetohydrodynamics
Problems
A. Shocks: DO one of 1. Exercise 16.7 Hydraulic jumps and breaking ocean waves, OR 2. Exercise 16.11 Relativistic shock
B. Similarity solutions: DO one of 1. Exercise 16.12 Underwater Explosions, OR 2. Exercise 16.14 Similarity Solution for Shock Tube
C. MHD: DO TWO OF the following three problems: 1. Exercise 17.2, Diffusion of Magnetic Field 2. Exercise 17.3, The Earth’s Bow Shock 3. Exercise 17.6, Hartmann Flow


Ph 136a

11 March 2009
CHAPTER 17: Magnetohydrodynamics - second part

Reading: Secs. 17.3, 17.5, 17.6 and 17.7 of Chapter 17, Magnetohydrodynamics
Problems: from version 0817.2.K.pdf, not 0817.1.K.pdf. Note: 0817.2.K.pdf differs from last week’s 0817.1.K.pdf only by the addition of Exercise 17.8 and the bumping up of the numbers of subsequent exercises.
NOTE: this problem set is worth 30 points instead of the usual 50.
Turn in your solutions to Kip’s mailbox in the upper left corner of the set of mailboxes at the west end of the corridor between West and East Bridge, no later than 1PM Wednesday 18 March. If you are concerned about passing this course on the basis of homework, then turn your solutions in earlier and ask the TAs to grade them as soon as possible (email the TAs at [email protected] and [email protected]). Kip will send email to anyone who must take a final exam in order to pass the course, as soon as this is known. He will also send emails to people who have passed the course on the basis of homework, so informing them.
DO THREE OF the following five problems: 1. Exercise 17.7, Properties of Eigenmodes 2. Exercise 17.8, Reformulation of the Energy Principle 3. Exercise 17.9, Differential Rotation in the Solar Dynamo 4. Exercise 17.11, Rotating Magnetospheres 5. Exercise 17.12, Solar Wind


Ph 136c

1 April 2009
CHAPTER 18: PARTICLE KINETICS OF PLASMA

Reading: Chapter 18, The Particle Kinetics of Plasma
Problems
A. DO one of: 1. Exercise 18.1 Boundary of degeneracy; OR 2. Exercise 18.6 Parameters for various plasmas
B. DO one of: 1. Exercise 18.5 Stopping of α-particles, OR 2. Exercise 18.7 Equilibration time for a globular star cluster
C. DO one of: 1. Exercise 18.4 Dependence of thermal equilibration on charge and mass; OR 2. Exercise 18.11 Adiabatic indices for rapid compression of a magnetized plasma
D. DO one of: 1. Exercise 18.8 Thermoelectric transport coefficients; OR 2. Exercise 18.12 Mirror machine


Ph 136c

8 April 2009

CHAPTER 19 - WAVES IN COLD PLASMAS: TWO-FLUID FORMALISM and CHAPTER 20 - KINETIC THEORY OF WARM PLASMAS
Reading:
1. Secs. 19.1, 19.2, 19.3 and 19.7
2. OPTIONAL: Secs. 19.4, 19.5, 19.6
3. Secs. 20.1, 20.2, 20.3 and 20.4
4. OPTIONAL: Secs. 20.5, 20.6

Problems
A. DO one of: 1. Exercise 19.4 Ion acoustic waves, OR 2. Exercise 19.6 Dispersion and Faraday rotation of pulsar pulses
B. DO one of: 1. Exercise 19.5 Ion acoustic solitons; OR 2. Exercise 19.9 Exploration of modes in CMA diagram (do any two of the three parts)
C. DO one of: 1. Exercise 19.8 Reflection of short waves by the ionosphere, OR 2. Exercise 20.1, Two-fluid Equation of Motion
D. DO Exercise 20.4, Landau Contour Deduced Using Laplace Transforms
E. DO one of: 1. Exercise 20.5, Ion Acoustic Dispersion Relation, OR 2. Exercise 20.8, Range of Unstable Wave Numbers


Ph 136c

15 April 2009
CHAPTER 21: NONLINEAR DYNAMICS OF PLASMAS

Reading: Chapter 21, Nonlinear Dynamics of Plasmas
Problems
A. DO two of: 1. Exercise 21.1 Non-resonant Particle Energy in Wave 2. Exercise 21.2 Energy Conservation 3. Exercise 21.3 Power in Electrostatic Waves
B. DO one of: 1. Exercise 21.4 Electron Fokker-Planck Equation, OR 2. Exercise 21.5 Three-Wave Mixing - Ion-Acoustic Evolution. Note: the last part of this exercise is technically difficult, but the first three parts should be relatively easy.
C. DO Exercise 21.6 Three-Wave Mixing - Langmuir Evolution. If you did not do Exercise 21.5, then read it and think about it before tackling 21.6.


Ph 136a

22 April 2009 CHAPTER 22

Reading: Chapter 22, From Special to General Relativity. Students who were not in this class first term may need to read some sections of Chapter 1, which are cross-referenced in Chapter 22. Students who were not in this class second term should read the introduction to Connection Coefficients in Sec. 10.3.
Problems
NOTE: The students in this class have a wide variety of backgrounds in relativity theory, so problems that are appropriate for some students are inappropriate (too sophisticated or too elementary) for others. Choose four problems appropriate for you from the following selection.
A. Ex. 22.1: Invariance of a Null Interval AND Ex. 22.2: Causality
B. Ex. 22.4: Index manipulation rules from duality
C. Parts (a) and (b) of Ex. 22.6: Commutation and connection coefficients for circular polar bases. Also part (b) of Ex. 22.5: Transformation matrices for circular polar bases
D. Ex. 22.9: Index gymnastics - irreducible tensorial parts of the gradient of a 4-velocity field
E. Ex. 22.10: Integration - Gauss’s theorem
F. Ex. 22.11: Stress-energy tensor for a perfect fluid
G. Ex. 22.14: Stress-energy tensor for a point particle
H. Ex. 22.15: Proper reference frame
I. Ex. 22.16: Uniformly accelerated observer


Ph 136a

29 April 2009 CHAPTER 23

Reading: Chapter 23, Fundamental Concepts of General Relativity.
Problems
Work five of the following ten problems. Pick problems that are appropriate for you: not too easy; not too hard; different from anything you have done in previous courses. Note: If you are really eager to learn general relativity, you may want to work more than five problems, but please indicate which ones you want to be graded.
Exercise 24.3 Geodesic equation in an arbitrary coordinate system
Exercise 24.4 Constant of geodesic motion in a spacetime with symmetry
Exercise 24.5 Action principle for geodesic motion
Exercise 24.7 Orders of magnitude of the radius of curvature
Exercise 24.8 Components of Riemann in an arbitrary basis
Exercise 24.9 Curvature of the surface of a sphere
Exercise 24.10 Geodesic deviation on a sphere
Exercise 24.12 Newtonian limit of general relativity
Exercise 24.13 Gauge transformations in linearized theory
Exercise 24.14 External field of a stationary, linearized source


Ph 136a

6 May 2009 CHAPTER 24

Reading: Chapter 24, Relativistic Stars and Black Holes.
Problems
Work five of the following ten problems. Pick problems that are appropriate for you: not too easy; not too hard. I especially recommend the exercises in bold face.
• Exercise 24.1 Connection Coefficients and Riemann in Schwarzschild. Note: If you know Mathematica or Maple fairly well, you can fairly easily write your own programs to compute connection coefficients and curvature tensors. You will find programs written by others at the following web sites:
  * The site associated with James Hartle’s textbook on general relativity: http://wps.aw.com/aw hartle gravity 1/0,6533,512494-,00.html
  * The GRTensor web site: http://grtensor.phy.queensu.ca/
• Exercise 24.2 The Bertotti-Robinson Solution
• Exercise 24.3 Schwarzschild Geometry in Isotropic Coordinates
• Exercise 24.4 Star of Uniform Density
• Exercise 24.5 Gravitational Redshift
• Exercise 24.7 Implosion of the Surface of a Zero-Pressure Star
• Exercise 24.8 Gore at the Singularity
• Exercise 24.9 Wormholes
• Exercise 24.11 Penrose Process, Hawking Radiation, & Black-Hole Thermodynamics
• Exercise 24.12 Slices of Simultaneity in Schwarzschild Spacetime


Ph 136a

13 May 2009 CHAPTER 25

Reading: Chapter 25, Gravitational Waves and Experimental Tests of General Relativity.
Problems
Work four of the following eight problems. Pick problems that are appropriate for you: not too easy; not too hard.
A. Exercise 25.4 Behavior of h+ and h× under Rotations and Boosts
B. Exercise 25.5 Energy-momentum conservation in geometric optics limit
C. Exercise 25.6 Transformation to TT gauge
D. Exercise 25.9 Energy removed by gravitational radiation reaction
E. Exercise 25.10 Propagation of waves through an expanding universe
F. Exercise 25.11 Gravitational waves emitted by a linear oscillator
G. Exercise 25.12 Gravitational waves from waving arms
H. Exercise 25.14 Light in an interferometric gravitational wave detector in TT gauge


Ph 136a

20 May 2009 CHAPTER 26

Reading: Chapter 26, Cosmology.
Problems
NOTE: This is the last problem set of the term. Work any four of the following problems:
Exercise 26.2 The 3-Sphere Geometry of a Closed Universe
Exercise 26.3 Energy Conservation for a Perfect Fluid
Exercise 26.5 Einstein’s Static Universe
Exercise 26.6 Cosmological Redshift
Exercise 26.7 Cosmic Microwave Radiation in an Anisotropic Cosmological Model
Exercise 26.8 Angular-Diameter Distance
Exercise 26.9 Magnitude-Redshift Relation
Exercise 27.11 Inflationary Explanation for the Isotropy of the CMB


Ph 136: Solution for Chapter 1 (compiled by Nate Bode and Dan Grin, revised by Kip Thorne)

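A 1.1 Geometrized Units [by Alexei Dvoretskii]
(a) We have three parts to answer: (i) in cgs, $t_P = \sqrt{G\hbar/c^5}$; (ii) $t_P = 5.39 \times 10^{-44}$ s; (iii) in geometrized units, $t_P = 1.61 \times 10^{-35}$ m.
(b) In cgs we have $m\,d\mathbf v/dt = e\left(\mathbf E + \frac{\mathbf v}{c}\times\mathbf B\right)$.
(c) Again in cgs, $\mathbf p = (\hbar\omega/c)\,\mathbf n$.
Final part: my height: $5.9 \times 10^{-9}$ s; my age: $2.5 \times 10^{19}$ cm.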
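A quick numerical check of parts (a)(ii) and (iii), and of the geometrized-unit conversions in the final part, can be sketched in Python. The cgs constant values below are assumed standard rounded values, not taken from the exercise, and the helper names are illustrative only; with them the Planck time comes out near 5.4 × 10⁻⁴⁴ s and c·t_P near 1.6 × 10⁻³⁵ m.

```python
import math

# Assumed cgs constants (standard rounded values, not from the text)
G = 6.674e-8        # gravitational constant, cm^3 g^-1 s^-2
HBAR = 1.0546e-27   # reduced Planck constant, erg s
C = 2.998e10        # speed of light, cm/s


def planck_time():
    """Planck time t_P = sqrt(G*hbar/c^5), in seconds."""
    return math.sqrt(G * HBAR / C**5)


def seconds_to_cm(t):
    """Geometrized units: a time t (s) corresponds to the length c*t (cm)."""
    return C * t


def cm_to_seconds(x):
    """Inverse conversion: a length x (cm) corresponds to the time x/c (s)."""
    return x / C


t_p = planck_time()
print(f"t_P = {t_p:.3g} s = {seconds_to_cm(t_p) / 100:.3g} m")

# Check of the 'height' and 'age' answers quoted above (3.156e7 s per year)
print(f"5.9e-9 s  -> {seconds_to_cm(5.9e-9):.0f} cm")
print(f"2.5e19 cm -> {cm_to_seconds(2.5e19) / 3.156e7:.1f} yr")
```

The height converts to roughly 1.8 m and the age to roughly 26 years, so both answers are ordinary human values expressed in geometrized units.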

1.6 Numerics of Component Manipulations [by Xinkai Wu, edited by Daniel Grin and Nate Bode]
(a) $S_{ijk}A_jB_k = (0, 0, 15)$, $S_{ijk}A_iB_k = (0, 0, 0)$. If $Y_{ij} \equiv A_iB_j$, then $Y_{11} = 12$, $Y_{12} = 15$, and all other components are zero.
(b) $\mathbf T(\vec A, \vec A) = T^{\alpha\beta}A_\alpha A_\beta = -9$. Denoting $\mathbf T(\vec A, \_)$ as $\vec B$, we have $B^\beta = T^{\alpha\beta}A_\alpha \Rightarrow B^0 = 1$, $B^1 = -4$, $B^2 = B^3 = 0$. Denoting $\vec A \otimes \mathbf T$ as $\mathbf S$, we have $S^{\alpha\beta\gamma} = A^\alpha T^{\beta\gamma} \Rightarrow S^{000} = 3$, $S^{001} = S^{010} = 2$, $S^{011} = -1$, $S^{100} = 6$, $S^{101} = S^{110} = 4$, $S^{111} = -2$; all other components vanish.
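The part (b) arithmetic can be checked with `numpy.einsum`. The solution does not restate the exercise's input components, so the A and T below are an assumption: they are the contravariant values (A⁰ = 1, A¹ = 2, T⁰⁰ = 3, T⁰¹ = T¹⁰ = 2, T¹¹ = −1) implied by the quoted components of S = A ⊗ T, in a Lorentz frame with metric diag(−1, 1, 1, 1).

```python
import numpy as np

# Minkowski metric, signature (-,+,+,+)
ETA = np.diag([-1.0, 1.0, 1.0, 1.0])

# Assumed inputs, inferred from the S^{abc} values quoted in the solution
A_up = np.array([1.0, 2.0, 0.0, 0.0])
T_up = np.zeros((4, 4))
T_up[0, 0] = 3.0
T_up[0, 1] = T_up[1, 0] = 2.0
T_up[1, 1] = -1.0

A_down = ETA @ A_up                                  # A_a = g_{ab} A^b
T_AA = np.einsum("ab,a,b->", T_up, A_down, A_down)   # T(A, A) = T^{ab} A_a A_b
B_up = np.einsum("ab,a->b", T_up, A_down)            # B^b = T^{ab} A_a
S_up = np.einsum("a,bc->abc", A_up, T_up)            # S^{abc} = A^a T^{bc}

print(T_AA)
print(B_up)
print(S_up[1, 0, 1])
```

Lowering A with the metric before contracting is the step that produces the minus signs; contracting the raw contravariant components would give the wrong answer.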

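1.7 Meaning of Slot Naming Index Notation [by Alexei Dvoretskii]
$A_\alpha B^{\beta\gamma}$ means $\vec A \otimes \mathbf B$;
$A_\alpha B^{\beta\alpha}$ means $\mathbf B(\_, \vec A)$;
$S_{\alpha\beta\gamma} = T_{\gamma\beta\alpha}$ means $\mathbf S = \mathbf T$ with slots 1 and 3 interchanged;
$A_\alpha B^\alpha = g_{\mu\nu}A^\mu B^\nu$ means $\vec A\cdot\vec B = \mathbf g(\vec A, \vec B)$.

1.8 Index Gymnastics [by Nate Bode and Dan Grin]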

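(a) $A^{\alpha\beta\gamma}g_{\beta\rho}S_{\gamma\lambda}g^{\rho\delta}g^\lambda{}_\alpha = A^{\alpha\beta\gamma}g_{\beta\rho}g^{\rho\delta}g^\lambda{}_\alpha S_{\gamma\lambda} = A^{\lambda\delta\gamma}S_{\gamma\lambda}$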

(b) $g_{\alpha\beta}g^{\alpha\beta} = (-1)^2 + 1^2 + 1^2 + 1^2 = 4$.
(c) The expression $A_\alpha{}^{\beta\gamma}S_{\alpha\gamma}$ is a sum over γ, since γ is a repeated index that appears both upstairs and downstairs. In contrast, α appears only downstairs, so we cannot apply the Einstein summation convention. In this case, there is no consistent interpretation that obeys our index-manipulation laws in spacetime. (If we were in Euclidean space, with metric coefficients equal to the Kronecker delta, we would not care whether an index is up or down, and this expression would then be acceptable.)
In the equation $A_\alpha{}^{\beta\gamma}S_\beta T_\gamma = R_{\alpha\beta\delta}S^\beta$ we have one free index (an index not summed over) on the left-hand side (α), but two free indices on the right-hand side (α and δ). This would imply that a rank-1 tensor is equal to a rank-2 tensor, which is impossible.

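1.18 Vectorial Identities for the Cross Product and Curl [by Alexei Dvoretskii]
(a)
$$(\nabla\times(\nabla\times\mathbf A))_i = \epsilon_{ijk}(\nabla\times\mathbf A)_{k;j} = \epsilon_{ijk}\epsilon_{klm}A_{m;lj} = (\delta_{il}\delta_{jm} - \delta_{im}\delta_{jl})A_{m;lj} = A_{j;ij} - A_{i;jj} = A_{j;ji} - A_{i;jj} = (\nabla(\nabla\cdot\mathbf A) - \nabla^2\mathbf A)_i \;\Longrightarrow\; \nabla\times(\nabla\times\mathbf A) = \nabla(\nabla\cdot\mathbf A) - \nabla^2\mathbf A$$
In the above equations all indices (slots) that follow the semicolon are gradient indices.
(b)
$$(\mathbf A\times\mathbf B)\cdot(\mathbf C\times\mathbf D) = \epsilon_{ijk}A_jB_k\,\epsilon_{ilm}C_lD_m = \epsilon_{jki}\epsilon_{lmi}A_jB_kC_lD_m = (\delta_{jl}\delta_{km} - \delta_{jm}\delta_{kl})A_jB_kC_lD_m = A_jB_kC_jD_k - A_jB_kC_kD_j = (\mathbf A\cdot\mathbf C)(\mathbf B\cdot\mathbf D) - (\mathbf A\cdot\mathbf D)(\mathbf B\cdot\mathbf C)$$
(c) Using the identity demonstrated at the beginning of the problem, $\mathbf E\times(\mathbf F\times\mathbf G) = (\mathbf E\cdot\mathbf G)\mathbf F - (\mathbf E\cdot\mathbf F)\mathbf G$, we easily get
$$(\mathbf A\times\mathbf B)\times(\mathbf C\times\mathbf D) = [(\mathbf A\times\mathbf B)\cdot\mathbf D]\,\mathbf C - [(\mathbf A\times\mathbf B)\cdot\mathbf C]\,\mathbf D$$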
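The index identities above can be spot-checked numerically. This minimal sketch builds the 3-dimensional Levi-Civita symbol, confirms the epsilon-delta contraction used in part (b), and tests identities (b) and (c) on random vectors; a check of this kind is a sanity test, not a proof.

```python
import numpy as np

# Levi-Civita symbol: (i-j)(j-k)(k-i)/2 equals +1, -1 or 0 for indices 0..2
eps = np.zeros((3, 3, 3))
for i in range(3):
    for j in range(3):
        for k in range(3):
            eps[i, j, k] = (i - j) * (j - k) * (k - i) / 2

delta = np.eye(3)
# Contraction identity: eps_ijk eps_ilm = d_jl d_km - d_jm d_kl
lhs_eps = np.einsum("ijk,ilm->jklm", eps, eps)
rhs_eps = (np.einsum("jl,km->jklm", delta, delta)
           - np.einsum("jm,kl->jklm", delta, delta))

rng = np.random.default_rng(0)
A, B, C, D = rng.standard_normal((4, 3))   # four random 3-vectors

# Identity (b)
lhs_b = np.dot(np.cross(A, B), np.cross(C, D))
rhs_b = np.dot(A, C) * np.dot(B, D) - np.dot(A, D) * np.dot(B, C)

# Identity (c)
AxB = np.cross(A, B)
lhs_c = np.cross(AxB, np.cross(C, D))
rhs_c = np.dot(AxB, D) * C - np.dot(AxB, C) * D

print(np.allclose(lhs_eps, rhs_eps), np.isclose(lhs_b, rhs_b), np.allclose(lhs_c, rhs_c))
```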

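B 1.9 Frame-Independent Expressions for Energy, Momentum, and Velocity [by Alexei Dvoretskii]
(a) $\vec p^{\,2} = (m\vec u)^2 = m^2\vec u^{\,2}$; using $\vec u^{\,2} = -1$ one gets the desired result, $m^2 = -\vec p\cdot\vec p$.
(b) In part (a) we showed (in the observer's frame) $\vec p\cdot\vec U = -p^0$; and by definition $\vec p\cdot\vec p = -(p^0)^2 + \mathbf p^2$. Thus we have $|(\vec p\cdot\vec U)^2 + \vec p\cdot\vec p|^{1/2} = |(p^0)^2 - (p^0)^2 + \mathbf p^2|^{1/2} = |\mathbf p|$.
(c) $\mathbf p = m\gamma\mathbf v$, $E = m\gamma$, where $\gamma = 1/\sqrt{1 - v^2}$; thus we have $|\mathbf v| = |\mathbf p|/E$.
(d) From the previous parts we already know $\vec p\cdot\vec U = -p^0$ and $\mathbf v = \mathbf p/E = \mathbf p/p^0$. So we have
$$\frac{\vec p + (\vec p\cdot\vec U)\vec U}{-\vec p\cdot\vec U} = \frac{(p^0, \mathbf p) - p^0(1, \mathbf 0)}{p^0} = \frac{(0, \mathbf p)}{p^0} = (0, \mathbf v) = \vec v$$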
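The frame-independent formulas of parts (a), (b) and (d) can be exercised numerically with an observer that is not at rest in the chosen Lorentz frame. In this sketch the observer and particle velocities and the mass are arbitrary hypothetical values, with c = 1 and metric diag(−1, 1, 1, 1); the checks confirm m² = −p⃗·p⃗, that v⃗ is orthogonal to U⃗, and that |v⃗| = |p|/E agrees with the relative speed obtained from γ = −u⃗·U⃗.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])   # Minkowski metric, signature (-,+,+,+), c = 1


def dot(a, b):
    """Spacetime inner product a . b = eta_{mu nu} a^mu b^nu."""
    return a @ ETA @ b


def four_velocity(v):
    """4-velocity (gamma, gamma*v) for an ordinary 3-velocity v."""
    v = np.asarray(v, dtype=float)
    gamma = 1.0 / np.sqrt(1.0 - v @ v)
    return gamma * np.concatenate(([1.0], v))


# Hypothetical data: observer and particle velocities in some Lorentz frame
U = four_velocity([0.5, 0.2, 0.0])
u = four_velocity([0.3, -0.4, 0.1])
m = 2.0
p = m * u

mass_sq = -dot(p, p)                          # part (a): m^2 = -p.p
E = -dot(p, U)                                # energy measured by the observer
p_mag = np.sqrt(dot(p, U) ** 2 + dot(p, p))   # part (b): |p| measured by the observer
v_vec = (p + dot(p, U) * U) / (-dot(p, U))   # part (d): velocity relative to the observer
speed = np.sqrt(dot(v_vec, v_vec))

print(mass_sq, E, p_mag, speed)
```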

1.20 3+1 Split of Charge-Current 4-Vector [by Xinkai Wu]
In the rest frame of an observer with 4-velocity $\vec w$, the charge-current 4-vector is $\vec J = (\rho_{\vec w}, \mathbf j)$, $\vec w = (1, \mathbf 0)$, and the 4-vector $\vec j_{\vec w} = (0, \mathbf j)$. As can easily be verified, in this frame $\rho_{\vec w} = -\vec J\cdot\vec w$ and $\vec j_{\vec w} = \vec J + (\vec J\cdot\vec w)\vec w$. Inverting these two expressions gives $\vec J = \vec j_{\vec w} + \rho_{\vec w}\vec w$. These relations are written in a frame-independent way and are thus valid in any Lorentz frame.
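As a sanity check of the 3+1 split, the sketch below (hypothetical observer velocity and current components, c = 1) verifies that ρ_w = −J⃗·w⃗ reproduces the familiar boost formula ρ′ = γ(ρ − βj_x), that j⃗_w is orthogonal to w⃗, and that J⃗ = j⃗_w + ρ_w w⃗.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])   # signature (-,+,+,+), c = 1


def dot(a, b):
    """Spacetime inner product."""
    return a @ ETA @ b


# Hypothetical observer moving along x with speed 0.6 in the lab frame
beta = 0.6
gamma = 1.0 / np.sqrt(1.0 - beta**2)
w = gamma * np.array([1.0, beta, 0.0, 0.0])

# Hypothetical lab-frame charge-current 4-vector J = (rho, j)
J = np.array([2.0, 0.5, -1.0, 0.3])

rho_w = -dot(J, w)            # charge density measured by the observer
j_w = J + dot(J, w) * w       # current 4-vector, orthogonal to w
J_rebuilt = j_w + rho_w * w   # the inverse relation

# Cross-check against an explicit boost along x: rho' = gamma (rho - beta j_x)
rho_boost = gamma * (J[0] - beta * J[1])
print(rho_w, rho_boost)
```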

C 1.11 Doppler Shift Derived without Lorentz Transformations [by Alexei Dvoretskii]
(a) The case of a photon: In frame F (in which the emitting atom appears to be moving with 3-velocity $\mathbf v$), $\vec U = (\gamma, \gamma\mathbf v)$ with $\gamma = 1/\sqrt{1 - v^2}$, and $\vec p = (E_F, E_F\mathbf n)$. Then, using Eq. (1.38), we find the photon energy as measured by the emitting atom to be $E = -\vec p\cdot\vec U = E_F\gamma(1 - \mathbf v\cdot\mathbf n)$, i.e. $E_F/E = 1/[\gamma(1 - \mathbf v\cdot\mathbf n)]$.
(b) The case of a particle with finite rest mass m: $\vec U$ is the same as in the photon case, but $\vec p = (E_F, |\mathbf p|\mathbf n)$, where $|\mathbf p| = \sqrt{E_F^2 - m^2}$. Now we find $E = -\vec p\cdot\vec U = \gamma\left(E_F - \sqrt{E_F^2 - m^2}\,\mathbf v\cdot\mathbf n\right)$.

1.19 Reconstruction of F [by Alexei Dvoretskii]
Just as in the derivation of (1.65a), we only need to show that (1.65b) holds in the rest frame of the observer $\vec w$ (since it is written in a frame-independent way, it is true in any Lorentz frame if it is true in the observer's rest frame). In this frame, $w^0 = 1$, $w^j = 0$, and $E^0_{\vec w} = 0$, $E^j_{\vec w} = E_j$, $B^0_{\vec w} = 0$, $B^j_{\vec w} = B_j$. Both sides of (1.65b) are manifestly antisymmetric in $(\alpha, \beta)$, so we only need to check the $(0j)$ and $(ij)$ components.
$F^{0j} = E_j$, while the right-hand side of (1.65b) is $w^0E^j_{\vec w} - w^jE^0_{\vec w} + \epsilon^{0j}{}_{\gamma\delta}w^\gamma B^\delta_{\vec w}$; using the component forms of $\vec w$, $\vec E_{\vec w}$ and $\vec B_{\vec w}$ given above, one easily finds r.h.s. $= E_j$.
$F^{ij} = \epsilon_{ijk}B_k$, while the right-hand side of (1.65b) is $w^iE^j_{\vec w} - w^jE^i_{\vec w} + \epsilon^{ij}{}_{\gamma\delta}w^\gamma B^\delta_{\vec w}$. Again, using the component forms of $\vec w$, $\vec E_{\vec w}$ and $\vec B_{\vec w}$, one easily finds r.h.s. $= \epsilon_{ijk}B_k$.

D 1.14 Spacetime Diagrams [by Alexei Dvoretskii]
The spacetime diagrams are shown in Figs. 1 through 3. In these figures, we use $t'$, $x'$ to denote $\bar t$, $\bar x$, and $\theta = \tan^{-1}\beta$.
(a) (See Fig. 1(a).) Events A and B are simultaneous in $\bar F$. However, because of the slope that a $\bar t$ = const line has in frame F, A occurs before B in frame F (A is the event that is "farther back").
(b) (See Fig. 1(b).) Events A and B occur at the same spatial location in $\bar F$ but not in F.

Figure 1: Diagrams for Problem 1.14 (a) and (b). [(a) Diagram for problem 1.14(a); (b) Diagram for problem 1.14(b).]

(c) (See Fig. 2(a).) If $P_1$ and $P_2$ have a timelike separation, then $P_2$ lies inside the light cone and θ (the angle between $\overrightarrow{P_1P_2}$ and the t-axis) is less than 45°. Hence in a boosted frame with $\beta = \tan\theta < 1$ the two events occur at the same spatial location. In $\bar F$, $\sqrt{-\Delta s^2} = \Delta\tau = \Delta\bar t$.
(d) (See Fig. 2(b).) Analogously, $P_2$ lies outside the light cone, and hence the angle θ (between $\overrightarrow{P_1P_2}$ and the x-axis) is less than 45°. By boosting into $\bar F$ with $\tan\theta = \beta < 1$, we see that $\overrightarrow{P_1P_2}$ is parallel to the $\bar x$ axis, i.e. in $\bar F$ these two events are simultaneous, and we have $\sqrt{\Delta s^2} = |\Delta\bar x|$.
(e) (See Fig. 3(a). In the figure, the hyperbola is given by $t^2 - x^2 = \bar t^2$.)

Figure 2: Diagrams for Problem 1.14 (c) and (d). [(a) Diagram for problem 1.14(c); (b) Diagram for problem 1.14(d).]

Let's consider how much time elapses between O and P as measured by observers in F and $\bar F$: $(\Delta\bar t)^2 = (\Delta t)^2 - (\Delta t)^2\tan^2\theta = (\Delta t)^2(1 - \beta^2)$, and thus $\Delta\bar t = \Delta t/\gamma$, i.e. a clock at rest in $\bar F$ (moving relative to F) runs slow by the factor γ.
(f) (See Fig. 3(b). In the figure, the hyperbola is given by $x^2 - t^2 = \bar x^2$.) By analogous reasoning, $(\Delta\bar x)^2 = (\Delta x)^2 - (\Delta x)^2\tan^2\theta = (\Delta x)^2(1 - \beta^2)$, thus $\Delta\bar x = \Delta x/\gamma$, i.e. objects in a boosted frame are contracted along the boost. Since there are no boosts along y and z, lengths along those axes are unchanged.

Figure 3: Diagrams for Problem 1.14 (e) and (f). [(a) Diagram for problem 1.14(e); (b) Diagram for problem 1.14(f).]
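The boosts used in parts (c) through (e) can be made concrete with a two-dimensional (t, x) Lorentz matrix. In this sketch, with c = 1 and example separations chosen purely for illustration, a boost with β = Δx/Δt removes the spatial separation of a timelike pair, a boost with β = Δt/Δx removes the time separation of a spacelike pair, and a clock at rest in F̄ is seen from F to have Δt = γΔt̄.

```python
import numpy as np


def boost_x(beta):
    """Lorentz boost along x acting on (t, x) components, c = 1."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    return np.array([[gamma, -gamma * beta],
                     [-gamma * beta, gamma]])


def interval(d):
    """Squared interval Delta s^2 = -dt^2 + dx^2 for d = (dt, dx)."""
    return -d[0] ** 2 + d[1] ** 2


# (c) timelike separation: beta = dx/dt puts both events at the same place
d_time = np.array([5.0, 3.0])
d_time_bar = boost_x(d_time[1] / d_time[0]) @ d_time

# (d) spacelike separation: beta = dt/dx makes the events simultaneous
d_space = np.array([3.0, 5.0])
d_space_bar = boost_x(d_space[0] / d_space[1]) @ d_space

# (e) clock at rest in the barred frame, followed for barred time 1
beta = 0.6
d_clock_F = np.linalg.solve(boost_x(beta), np.array([1.0, 0.0]))   # its (dt, dx) in F

print(d_time_bar, d_space_bar, d_clock_F)
```

In each case the squared interval is unchanged by the boost, which is why the barred separations equal √(−Δs²) and √(Δs²) respectively.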

1.16 Twins Paradox [by Xinkai Wu, modified by Nate Bode and Dan Grin]
(a) Since $\vec u^{\,2} = -1$, we have $0 = d(\vec u\cdot\vec u)/d\tau = 2\vec u\cdot d\vec u/d\tau = 2\vec u\cdot\vec a$. In the object's momentary rest frame $\vec u = (1, \mathbf 0)$, thus $0 = \vec u\cdot\vec a \Rightarrow \vec a = (0, \mathbf a)$. So we get $\vec a\cdot\vec a = \mathbf a^2$, namely $|\mathbf a| = \sqrt{\vec a\cdot\vec a}$. This tells us that the 3-acceleration measured in the rest frame of the accelerated object is equal to the magnitude of the 4-acceleration, which can be evaluated in any frame.
(b) Denote the $x^0$, $x^1$ coordinates in Methuselah's reference frame as t, x, and Florence's proper time as τ. We have
$$\frac{dt}{d\tau} = u^0,\qquad \frac{dx}{d\tau} = u^1,\qquad \frac{du^0}{d\tau} = a^0,\qquad \frac{du^1}{d\tau} = a^1.$$
In part (a) we showed that, even in Methuselah's frame, $g^2 = \vec a\cdot\vec a = -(a^0)^2 + (a^1)^2$. Moreover, $0 = \vec a\cdot\vec u = -a^0u^0 + a^1u^1$. The symmetry of the problem tells us that, in each quarter of her trip, Florence spends the same amount of time moving at any given speed as in any other quarter. Therefore, we only need to determine the elapsed time in the first quarter and multiply by 4. Solving the two equations above gives $a^0 = g\,u^1$, and using the normalization of the 4-velocity, $(u^1)^2 = (u^0)^2 - 1$, we can rewrite this as
$$\left(\frac{du^0}{d\tau}\right)^2 = g^2\left[(u^0)^2 - 1\right].$$
Solving for $u^0$, we get $u^0 = 1 + 2\sinh^2(g\tau/2) = \cosh g\tau$, where we have used the initial condition $u^0 = 1$ at $\tau = 0$. We integrate $dt/d\tau = u^0$ from $\tau = 0$ to $\tau = T_{\rm Florence}/4$ and finally find
$$T_{\rm Methuselah} = \frac{4c}{g}\sinh\left(\frac{g\,T_{\rm Florence}}{4c}\right),$$

where we have multiplied by the factor of 4 and restored the appropriate factors of c.
(c) In geometrized units, the acceleration at the surface of the Earth is $g = (980\ {\rm cm\,s^{-2}})/c \approx 1.03\ {\rm yr^{-1}}$. Note that as $T_{\rm Florence}$ increases, $T_{\rm Methuselah}$ grows exponentially (see Fig. 4). A few numerical values are:
$T_{\rm Florence}$ = 10 years gives $T_{\rm Methuselah}$ = 25 years;
$T_{\rm Florence}$ = 50 years gives $T_{\rm Methuselah} = 7.6 \times 10^5$ years;
$T_{\rm Florence}$ = 80 years gives $T_{\rm Methuselah} = 1.7 \times 10^9$ years.

Figure 4: Ex. 1.16

1.17 Around the World on TWA [by Xinkai Wu]
The 1972 Science papers of Hafele and Keating explain the experiment well. We summarize it as follows. We analyze this problem in the non-rotating inertial frame whose origin coincides with the earth's center. Denote the proper times measured by the eastward clock, the westward clock, and the clock in the ground laboratory as $\tau_e$, $\tau_w$, $\tau_g$, respectively. For a clock moving with speed v in this frame, to leading order in relativistic corrections, its proper time is related to the coordinate time by
$$d\tau = \left[1 - \frac{g(R - h)}{c^2} - \frac{v^2}{2c^2}\right]dt,$$
where R is the earth's radius, h is the clock's altitude, and g is the surface value of the acceleration of gravity. The second term in this expression is a general relativistic effect, while the third term is a special relativistic one. For the clock in the ground lab, $\mathbf v_g = \Omega R\cos\lambda_g\,\mathbf e_\phi$, with Ω the angular velocity of the earth's rotation and $\lambda_g$ the clock's latitude. For the eastward clock, $\mathbf v_e = (\Omega R\cos\lambda_e + \nu\cos\theta_e)\,\mathbf e_\phi + \nu\sin\theta_e\,\mathbf e_\theta$, with $\lambda_e$ the eastward clock's latitude, ν its speed with respect to the ground, and $\theta_e$ the angle between its velocity and the eastward direction. For the westward clock, $\mathbf v_w = (\Omega R\cos\lambda_w + \nu\cos\theta_w)\,\mathbf e_\phi + \nu\sin\theta_w\,\mathbf e_\theta$, with $\lambda_w$ the westward clock's latitude and $\theta_w$ the angle between its velocity and the eastward direction. Note that $\cos\theta_e > 0$ while $\cos\theta_w < 0$. Using the above facts and eliminating dt, we find
$$\frac{d\tau_e}{d\tau_g} = 1 + \frac{gh_e}{c^2} - \frac{\Omega^2R^2(\cos^2\lambda_e - \cos^2\lambda_g) + \nu^2}{2c^2} - \frac{\Omega R\nu\cos\lambda_e\cos\theta_e}{c^2},$$
and $d\tau_w/d\tau_g$ is given by the same formula with the subscript e replaced by w. Integrating the above expressions gives the relation between $\tau_e$ and $\tau_g$, and between $\tau_w$ and $\tau_g$. In the real experiment ν, $h_e$, $\lambda_e$, $\theta_e$ (and $h_w$, $\lambda_w$, $\theta_w$) change with time, so one must perform the integral numerically. For pedagogical purposes, here

we consider the simplified case where $\lambda_g = \lambda_e = \lambda_w = 0$, $\theta_e = 0$, $\theta_w = \pi$, and $h_e$, $h_w$ are constants. We find
$$\tau_e - \tau_g = \frac{2\pi R}{\nu}\,\frac{1}{c^2}\left(gh_e - \frac{\nu^2}{2} - \Omega R\nu\right),\qquad \tau_w - \tau_g = \frac{2\pi R}{\nu}\,\frac{1}{c^2}\left(gh_w - \frac{\nu^2}{2} + \Omega R\nu\right).$$
Taking ν = 893 km/hour (based on the fact that it took about 45 hours to fly around the earth) and $h_e = h_w = h = 10$ km, we find
$$\frac{gh}{c^2}\,\frac{2\pi R}{\nu} = 178\ {\rm ns},\qquad \frac{\nu^2}{2c^2}\,\frac{2\pi R}{\nu} = 55\ {\rm ns},\qquad \frac{\Omega R\nu}{c^2}\,\frac{2\pi R}{\nu} = 208\ {\rm ns}$$
(so we see that the general relativistic effect is comparable to the special relativistic effect). So we get $\tau_e - \tau_g = -85$ ns and $\tau_w - \tau_g = 331$ ns (the experimental data give $\tau_e - \tau_g = -59 \pm 10$ ns, $\tau_w - \tau_g = 273 \pm 7$ ns). One remark: the difference between the aging of the two flying clocks is given by
$$\tau_w - \tau_e = 2\,\frac{\Omega R}{c^2}\,2\pi R = 416\ {\rm ns}$$
(the experimental result is 332 ± 17 ns). Note that the velocity-independent general relativistic effect (and also all ν-dependence) cancels out in $\tau_w - \tau_e$.
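The simplified-case estimate can be reproduced in a few lines. The constants below (g = 9.8 m s⁻², R = 6371 km, Ω = 7.292 × 10⁻⁵ s⁻¹) are assumed values, since the solution does not state which ones it used; with them the three terms come out within a few ns of the 178, 55 and 208 ns quoted above, and τ_w − τ_e equals twice the rotation term, independent of ν.

```python
import math

# Assumed constants (SI); the solution does not state which values it used.
G_SURF = 9.8              # surface gravity, m/s^2
C = 2.998e8               # speed of light, m/s
R = 6.371e6               # mean Earth radius, m
OMEGA = 7.292e-5          # Earth's rotation rate, rad/s
NU = 893e3 / 3600.0       # aircraft ground speed, m/s (893 km/hour)
H = 10e3                  # flight altitude, m

T_TRIP = 2 * math.pi * R / NU          # duration of one equatorial circumnavigation, s

grav = G_SURF * H / C**2 * T_TRIP      # general-relativistic altitude term
kin = NU**2 / (2 * C**2) * T_TRIP      # special-relativistic nu^2 term
rot = OMEGA * R * NU / C**2 * T_TRIP   # cross term with the Earth's rotation

east = grav - kin - rot   # tau_e - tau_g  (theta_e = 0)
west = grav - kin + rot   # tau_w - tau_g  (theta_w = pi)

NS = 1e9
print(f"terms: {grav * NS:.0f}, {kin * NS:.0f}, {rot * NS:.0f} ns")
print(f"tau_e - tau_g = {east * NS:.0f} ns, tau_w - tau_g = {west * NS:.0f} ns")
print(f"tau_w - tau_e = {(west - east) * NS:.0f} ns")
```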

E 1.27 Global Conservation of 4-Momentum in a Lorentz Frame [by Alexei Dvoretskii]
The parallelepiped has eight faces, two perpendicular to each of the axes. Hence
$$\int_{\partial\mathcal V} T^{0\beta}\,d\Sigma_\beta = \Delta x\Delta y\Delta z\,[T^{00}(t + \Delta t) - T^{00}(t)] + \Delta x\Delta y\Delta t\,[T^{0z}(z + \Delta z) - T^{0z}(z)] + \Delta x\Delta z\Delta t\,[T^{0y}(y + \Delta y) - T^{0y}(y)] + \Delta y\Delta z\Delta t\,[T^{0x}(x + \Delta x) - T^{0x}(x)]$$
The first term gives the change of the energy in the 3-volume ΔxΔyΔz during time Δt. The other three terms give the flow of energy out of the 3-volume through its faces during time Δt. The conservation law states that if the energy contained in a 3-volume increases (decreases), then that energy flowed into (out of) the volume; it is neither created nor destroyed in the volume itself, i.e. it is conserved.

1.30 Stress-Energy Tensor for a Perfect Fluid [by Alexei Dvoretskii (part a) and Geoffrey Lovelace (parts b-c)]
(a) The stress-energy tensor should be a symmetric tensor made from $\vec u$, $\mathbf g$, ρ, and P, so it must be of the form $T^{\alpha\beta} = Au^\alpha u^\beta + Bg^{\alpha\beta}$, where A and B are scalars to be determined. In the local rest frame, $T^{jk} = P\delta^{jk}$ tells us $B = P$; and then $T^{00} = \rho$ tells us $A = \rho + B = \rho + P$; note that $T^{0j} = 0$ is satisfied automatically. Thus we have derived the stress-energy tensor given in (1.100b).
(b) The law of energy-momentum conservation can be written as $\vec\nabla\cdot\mathbf T = 0$, or, in component form, $T^{\alpha\beta}{}_{;\beta} = 0$. There are four conservation laws, one for each component (α = 0, 1, 2, 3). In the rest frame of the fluid, energy conservation is given by $T^{00}{}_{,0} + T^{0j}{}_{,j} = 0$, or $dE/d\tau + \nabla\cdot\mathbf p = 0$, since $T^{00} = E$ and $T^{0j} = p^j$ are the energy and momentum densities of the fluid, respectively, and τ is the fluid's proper time. Also in this frame, $\vec u = (1, \mathbf 0)$, so $u_\alpha = (-1, \mathbf 0)$. Therefore, in the rest frame of the fluid, energy conservation is given by $0 = T^{00}{}_{,0} + T^{0j}{}_{,j} = -u_\alpha T^{\alpha\beta}{}_{,\beta} = -\vec u\cdot(\vec\nabla\cdot\mathbf T)$. The last two expressions in this equation are geometric objects; therefore, the equation holds in any reference frame. We conclude that the law of energy conservation, as seen by the fluid, is $\vec u\cdot(\vec\nabla\cdot\mathbf T) = 0$ in any reference frame.
To evaluate this equation explicitly, we begin by finding the divergence of the stress-energy tensor. The stress-energy tensor of a perfect fluid is
$$\mathbf T = (\rho + P)\,\vec u\otimes\vec u + P\,\mathbf g = P\,\mathbf P + \rho(\mathbf P - \mathbf g), \tag{1}$$

where $\vec u$ is the four-velocity of the fluid, ρ and P are the density and pressure of the fluid, $\mathbf g$ is the flat-space metric, and $\mathbf P$ is the projection tensor defined in Eq. 1.40(a):
$$\mathbf P = \mathbf g + \vec u\otimes\vec u. \tag{2}$$
Later, we will use Eq. 1.74b:
$$\mathbf P(\_, \vec A) = \vec A + (\vec A\cdot\vec u)\,\vec u. \tag{3}$$

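Taking the divergence, we find
$$T^{\alpha\beta}{}_{;\beta} = P_{;\beta}\mathbf P^{\alpha\beta} + P\,\mathbf P^{\alpha\beta}{}_{;\beta} + \rho_{;\beta}u^\alpha u^\beta + \rho\,\mathbf P^{\alpha\beta}{}_{;\beta}. \tag{4}$$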

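Here, we have used the fact that the gradient of the metric vanishes. Now recall that $d/d\tau = u^\alpha\nabla_\alpha$, and evaluate the divergence of the projection tensor:
$$\mathbf P^{\alpha\beta}{}_{;\beta} = u^\alpha{}_{;\beta}u^\beta + u^\alpha u^\beta{}_{;\beta} = \frac{du^\alpha}{d\tau} + u^\alpha u^\beta{}_{;\beta} = a^\alpha + u^\alpha u^\beta{}_{;\beta}. \tag{5}$$
Combining the above equations in geometric notation, we find
$$\vec\nabla\cdot\mathbf T = \mathbf P(\_, \vec\nabla P) + \frac{d\rho}{d\tau}\,\vec u + (P + \rho)\left[\vec a + \vec u\,(\vec\nabla\cdot\vec u)\right]. \tag{6}$$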

We will need the following inner products: $\vec u\cdot\vec u = -1$ and $\vec u\cdot\vec a = 0$. The second inner product follows immediately from the first (see the solution for problem 1.13 for the explicit derivation).

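Now, we can write $\vec u\cdot(\vec\nabla\cdot\mathbf T) = 0$ as
$$\vec u\cdot(\vec\nabla\cdot\mathbf T) = \mathbf P(\vec u, \vec\nabla P) - \frac{d\rho}{d\tau} - (P + \rho)(\vec\nabla\cdot\vec u) = 0. \tag{7}$$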

The first term on the right-hand side vanishes because the projection tensor is symmetric and $\mathbf P(\_, \vec u) = \vec u + (\vec u\cdot\vec u)\vec u = 0$. Introducing Eq. 1.87,
$$\vec\nabla\cdot\vec u = \frac{1}{V}\frac{dV}{d\tau}, \tag{8}$$
we have
$$\vec u\cdot(\vec\nabla\cdot\mathbf T) = -\frac{d\rho}{d\tau} - (P + \rho)(\vec\nabla\cdot\vec u) = -\frac{d\rho}{d\tau} - \frac{P + \rho}{V}\frac{dV}{d\tau} = 0, \tag{9}$$
which implies
$$\frac{d(\rho V)}{d\tau} = \rho\frac{dV}{d\tau} + V\frac{d\rho}{d\tau} = -P\frac{dV}{d\tau}. \tag{10}$$
The left-hand side is the rate of change of the mass-energy ρV inside the volume V, and the right-hand side is the work of compression per unit proper time τ. To see the connection to the first law of thermodynamics, multiply both sides by dτ:
$$dE = d(\rho V) = \frac{d(\rho V)}{d\tau}\,d\tau = -P\frac{dV}{d\tau}\,d\tau = -P\,dV. \tag{11}$$

This is the first law of thermodynamics for constant entropy (i.e., no heat transfer, dQ = T dS = 0) and constant number of particles. Note that ρV includes both the rest-mass energies of the particles of the fluid and the energy of compression.
(c) In the rest frame of the fluid, momentum conservation is given by $T^{i\nu}{}_{,\nu} = T^{i0}{}_{,0} + T^{ij}{}_{,j} = 0$, since $T^{i0}$ and $T^{ij}$ are the momentum density and momentum flux, respectively. These are the spatial components of $\vec\nabla\cdot\mathbf T = 0$. We know that in the rest frame of the fluid, the projection tensor $\mathbf P = \mathbf g + \vec u\otimes\vec u$ has components $\mathbf P_{\mu\nu} = \eta_{\mu\nu} + \delta_\mu^0\delta_\nu^0$. Acting on a vector in the fluid rest frame, the projection tensor yields $\mathbf P_{\mu\nu}A^\nu = (-A^0, A^i) + (A^0, 0) = (0, A^i)$. Acting with the projection tensor on the particular vector $T^{\mu\nu}{}_{,\nu} = (T^{0\nu}{}_{,\nu}, T^{i\nu}{}_{,\nu})$ gives $\mathbf P_{\mu\gamma}T^{\gamma\nu}{}_{,\nu} = (0, T^{i\nu}{}_{,\nu})$. Therefore, in the rest frame of the fluid, the law of momentum conservation is given by $\mathbf P_{\gamma\alpha}T^{\alpha\beta}{}_{,\beta} = 0$. But this is a geometric equation, so it holds in all reference frames. The momentum conservation law is thus $\mathbf P(\_, \vec\nabla\cdot\mathbf T) = 0$.
In part (b), we found
$$\vec\nabla\cdot\mathbf T = \mathbf P(\_, \vec\nabla P) + \frac{d\rho}{d\tau}\,\vec u + (P + \rho)\left[\vec a + \vec u\,(\vec\nabla\cdot\vec u)\right] \tag{12}$$
and
$$\vec u\cdot(\vec\nabla\cdot\mathbf T) = -\frac{d\rho}{d\tau} - (P + \rho)(\vec\nabla\cdot\vec u). \tag{13}$$
Therefore, the law of momentum conservation is
$$\mathbf P(\_, \vec\nabla\cdot\mathbf T) = \vec\nabla\cdot\mathbf T + \vec u\,[\vec u\cdot(\vec\nabla\cdot\mathbf T)] = \mathbf P(\_, \vec\nabla P) + (P + \rho)\,\vec a = 0, \tag{14}$$
i.e.,
$$(P + \rho)\,\vec a = -\mathbf P(\_, \vec\nabla P). \tag{15}$$

This is the analog of "F = ma" for a perfect fluid, where the inertial mass per unit volume is P + ρ. The left-hand side is the inertial mass density of the fluid times its four-acceleration (note that in the fluid's rest frame its four-acceleration is (0, a)), while the right-hand side is the "four-force" density, which is the negative gradient of the pressure projected into the fluid's 3-space.
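Part (a)'s form of T and the projection-tensor identities used in parts (b) and (c) can be checked numerically. This sketch (hypothetical ρ, P and fluid velocity; c = 1) builds T^{αβ} = (ρ + P)u^α u^β + P g^{αβ} in a lab frame, boosts it into the fluid rest frame with a general pure-boost matrix, and confirms that it becomes diag(ρ, P, P, P), that u_α u_β T^{αβ} = ρ, that P^{αβ}u_β = 0, and that T = P𝐏 + ρ u⃗⊗u⃗ as in Eq. (1).

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])   # g_{ab} = g^{ab} in a Lorentz frame, c = 1


def four_velocity(v):
    """4-velocity (gamma, gamma*v) for a 3-velocity v."""
    v = np.asarray(v, dtype=float)
    gamma = 1.0 / np.sqrt(1.0 - v @ v)
    return gamma * np.concatenate(([1.0], v))


def perfect_fluid_T(rho, P, u):
    """Contravariant T^{ab} = (rho + P) u^a u^b + P g^{ab}."""
    return (rho + P) * np.outer(u, u) + P * ETA


def boost_matrix(v):
    """Pure boost Lambda^a_b into the frame moving with 3-velocity v (v nonzero)."""
    v = np.asarray(v, dtype=float)
    v2 = v @ v
    gamma = 1.0 / np.sqrt(1.0 - v2)
    L = np.eye(4)
    L[0, 0] = gamma
    L[0, 1:] = -gamma * v
    L[1:, 0] = -gamma * v
    L[1:, 1:] += (gamma - 1.0) * np.outer(v, v) / v2
    return L


rho, P = 2.0, 0.5                 # hypothetical density and pressure
v = np.array([0.3, 0.4, -0.2])    # hypothetical fluid 3-velocity in the lab
u = four_velocity(v)
T_lab = perfect_fluid_T(rho, P, u)

L = boost_matrix(v)
T_rest = L @ T_lab @ L.T          # T'^{ab} = Lambda^a_c Lambda^b_d T^{cd}

u_low = ETA @ u                   # u_a
P_proj = ETA + np.outer(u, u)     # projection tensor P^{ab} = g^{ab} + u^a u^b

print(np.round(T_rest, 12))
```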

Ph 136: Solution for Chapter 2 (compiled by Nate Bode and Dan Grin, revised by Kip Thorne)

A 2.6 Observations of Cosmic Microwave Radiation from a Moving Earth [by Alexander Putilin]
(a)
$$I_\nu = \frac{h^4\nu^3}{c^2}\,\mathcal N,\qquad \mathcal N = \frac{g_s}{h^3}\,\eta = \frac{2}{h^3}\,\eta\ \ (\text{for photons}) \;\Longrightarrow\; I_\nu = \frac{2h\nu^3}{c^2}\,\eta = \frac{2h}{c^2}\,\frac{\nu^3}{e^{h\nu/kT_0} - 1}$$
in its mean rest frame. Letting $x = h\nu/kT_0$,
$$I_\nu = \frac{2(kT_0)^3}{h^2c^2}\,\frac{x^3}{e^x - 1} = (2.7 \times 10^{-15}\ {\rm erg/cm^2})\,\frac{x^3}{e^x - 1}$$
for $T_0 = 2.725$ K. From Fig. 1, we see that the intensity peaks at $x_m = 2.82$, which corresponds to $\nu_m = 1.6 \times 10^{11}\ {\rm s^{-1}}$, $\lambda_m = 0.19$ cm.
(b) From Chapter 1, we already know that the photon's energy as measured in the mean rest frame is $h\nu = -\vec p\cdot\vec u_0$; then (2.28) follows immediately.
(c) Let $\mathbf n$ be the direction in which the receiver points, and $\mathbf v$ be the earth's velocity relative to the microwave background; then in the earth's frame, $\vec u_0 = (1/\sqrt{1 - v^2}, -\mathbf v/\sqrt{1 - v^2})$ and $\vec p = (h\nu, -h\nu\mathbf n)$. Plugging these expressions into (2.28), we find (with θ the angle between $\mathbf v$ and $\mathbf n$)
$$I_\nu = \frac{2h\nu^3}{c^2}\,\eta = \frac{2h}{c^2}\,\frac{\nu^3}{e^{h\nu/kT} - 1},\qquad T = T_0\,\frac{\sqrt{1 - v^2}}{1 - v\cos\theta}.$$
For small v, we can keep only terms linear in v and find $T \approx T_0(1 + v\cos\theta)$, which exhibits a dipolar anisotropy. The maximal relative variation is $\Delta T/T \approx [T(\theta = 0) - T(\theta = \pi)]/T_0 = 2v/c = 4 \times 10^{-3}$.

Figure 1: Ex. 2.6 [$I_\nu$ (in units of $10^{-15}$ erg/cm²) versus $x = h\nu/kT_0$, for 0 ≤ x ≤ 8.]
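The numbers in parts (a) and (c) can be reproduced with the standard library alone. In this sketch T₀ = 2.725 K and the cgs constants are assumed values; the peak of x³/(eˣ − 1) is found by bisection on 3(1 − e^{−x}) = x (the condition for a vanishing derivative), and the dipole amplitude uses the exact T(θ) from part (c) with v = 2 × 10⁻³ (c = 1), the speed implied by 2v/c = 4 × 10⁻³.

```python
import math

H = 6.626e-27   # Planck constant, erg s
K = 1.381e-16   # Boltzmann constant, erg/K
C = 2.998e10    # speed of light, cm/s
T0 = 2.725      # assumed CMB temperature in its mean rest frame, K


def peak_x(lo=1.0, hi=5.0, tol=1e-12):
    """Locate the maximum of x^3/(e^x - 1) by bisecting 3(1 - e^-x) - x = 0."""
    def f(x):
        return 3.0 * (1.0 - math.exp(-x)) - x

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def dipole_amplitude(v):
    """Exact [T(0) - T(pi)]/T0 for T = T0 sqrt(1 - v^2)/(1 - v cos(theta)), c = 1."""
    return math.sqrt(1.0 - v * v) * (1.0 / (1.0 - v) - 1.0 / (1.0 + v))


prefactor = 2.0 * (K * T0) ** 3 / (H * C) ** 2   # erg/cm^2 (per s, Hz, sr)
x_m = peak_x()
nu_m = x_m * K * T0 / H                          # peak frequency, Hz
lam_m = C / nu_m                                 # corresponding wavelength, cm

print(f"prefactor = {prefactor:.2e}, x_m = {x_m:.3f}")
print(f"nu_m = {nu_m:.2e} Hz, lambda_m = {lam_m:.3f} cm")
print(f"dipole amplitude for v = 2e-3: {dipole_amplitude(2e-3):.6f}")
```

Bisection is used only because it needs nothing beyond the standard library; any scalar root finder would serve equally well here.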

2.2 Distribution Function for Particles with a Range of Rest Masses [by Jeff Atwell]
(a) $d^2V = dV_x\,dV_p\,dm$ is the natural phase-space volume to consider when the particles have a range of masses. $dV_x\,dV_p$ was shown to be frame invariant in the text. $dm$ is also frame invariant, because it is the range of rest masses among the particles in the set. So $d^2V$ is also frame invariant.
(b) Change from m as the fourth coordinate in momentum space (in addition to $p_1$, $p_2$, and $p_3$) to E, with $E^2 = m^2 + \mathbf p^2$, as the fourth coordinate. Holding the other three coordinates fixed, we find the differential relationship between the old fourth coordinate and the new one: $E\,dE = m\,dm$. Plugging this into the phase-space volume gives $d^2V = (dV_x\,E/m)(dV_p\,dE)$. $dV_x\,E/m$ is invariant because m is invariant and $dV_x\,E$ is also invariant (as explained in the text). $dV_p\,dE = dp^0\,dp_x\,dp_y\,dp_z$ is the 4-volume element in momentum space. From the discussion in Chapter 1, it may be rewritten $dp^0\,dp_x\,dp_y\,dp_z = \epsilon_{\alpha\beta\gamma\delta}A^\alpha B^\beta C^\gamma D^\delta$, where $A^\alpha = (dp^0, 0, 0, 0)$, $B^\alpha = (0, dp_x, 0, 0)$, $C^\alpha = (0, 0, dp_y, 0)$, and $D^\alpha = (0, 0, 0, dp_z)$. Rewriting it this way is nice, because now it is fair to say that it is "manifestly" invariant.

2.3 Regimes of Particulate and Wave-like Behavior [by Jeff Atwell]
(a) The equations in the text can be used to relate the occupation number to the specific intensity:
$$\eta = \frac{h^3}{2}\,\mathcal N = \frac{c^2}{2h\nu^3}\,I_\nu .$$
We also have the equation
$$I_\nu = \frac{dE}{dA\,dt\,d\nu\,d\Omega}.$$
We are told that when the radiation reaches earth it has $dE/dA \sim 10^{-6}\ {\rm erg/cm^2}$, and $dt = 1$ s. We should take $d\Omega = (1000\ {\rm km})^2/(10^{10}\ {\rm light\ years})^2$, and might take $d\nu \sim \nu$ (one finds ν from $E \sim 100$ keV). With this I find $\eta \sim 2000$. This is not as small as expected, and implies that the photons behave more like waves in this case. The occupation number stays the same as the photons propagate, because the energy flux dE/dA falls off as $1/r^2$ and the solid angle subtended by the source falls off as $d\Omega \propto 1/r^2$, so $dE/dA\,d\Omega$ is independent of radius, as are all other quantities that enter into $I_\nu$ and then into η.
(b) We are given the total dE in terms of the mass of the sun. An observer at distance r from the source sees an energy flux $dE/dA = dE/(4\pi r^2)$, coming from a solid angle $d\Omega \sim \lambda^2/r^2$. Therefore, $dE/dA\,d\Omega \sim dE/(4\pi\lambda^2)$. Here $\lambda = c/\nu$, and $\nu \sim 1000$ Hz. We are also given $d\nu \sim 1500$ Hz and $dt \sim 10^{-2}$ s. Also, $g_s = 2$ for gravitons. With this I calculate $\eta \sim 10^{72}$, so the gravitons behave like a classical gravitational wave.
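The part (a) estimate can be sketched in Python. The inputs mirror the numbers quoted above (dE/dA ~ 10⁻⁶ erg cm⁻², dt = 1 s, a 1000 km source at 10¹⁰ light years, dν ~ ν, photon energy ~100 keV); the unit conversions and constants are assumed standard values, and the helper name is illustrative. The result is η of order 2000, consistent with the estimate in the text.

```python
H = 6.626e-27          # Planck constant, erg s
C = 2.998e10           # speed of light, cm/s
ERG_PER_KEV = 1.602e-9
CM_PER_LY = 9.461e17


def occupation_number(I_nu, nu):
    """Mean occupation number eta = c^2 I_nu / (2 h nu^3) for photons (g_s = 2)."""
    return C**2 * I_nu / (2.0 * H * nu**3)


nu = 100 * ERG_PER_KEV / H                     # frequency of ~100 keV photons, Hz
dnu = nu                                       # bandwidth taken to be ~ nu
dE_dA = 1e-6                                   # energy per unit area received, erg/cm^2
dt = 1.0                                       # burst duration, s
dOmega = (1000e5 / (1e10 * CM_PER_LY)) ** 2    # 1000 km source at 1e10 light years, sr

I_nu = dE_dA / (dt * dnu * dOmega)
eta = occupation_number(I_nu, nu)
print(f"nu = {nu:.2e} Hz, eta = {eta:.0f}")
```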

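B 2.13 Collisionless Boltzmann Implies Conservation of Particles and of 4-Momentum [by Alexander Putilin]
$$\frac{d\mathcal{N}}{d\zeta} = \frac{\partial\mathcal{N}}{\partial x^\mu}\frac{dx^\mu}{d\zeta} + \frac{\partial\mathcal{N}}{\partial p_j}\frac{dp_j}{d\zeta} = 0 .$$
Note that $dp_j/d\zeta = 0$ for freely moving particles, and also $dx^\mu/d\zeta = p^\mu$. Thus the collisionless Boltzmann equation can be written as
$$\mathcal{N}_{;\mu}\, p^\mu = \frac{\partial\mathcal{N}}{\partial x^\mu}\, p^\mu = 0 .$$
Thus:

(a)
$$\frac{\partial n}{\partial t} = \int\frac{\partial\mathcal{N}}{\partial t}\, d\mathcal{V}_p = -\int\frac{\partial\mathcal{N}}{\partial x^i}\frac{dx^i}{dt}\, d\mathcal{V}_p \ \ \text{(by the collisionless Boltzmann equation)} = -\nabla\cdot\int\mathcal{N}\,\frac{\mathbf{p}}{m}\, d\mathcal{V}_p = -\nabla\cdot\mathbf{S},$$
where $dx^i/dt = p^i/m$ for nonrelativistic particles, and the spatial derivative can be pulled outside the integral because $x^i$ and $p^i$ are independent phase-space coordinates.

(b)
$$S^\mu = \int\mathcal{N} p^\mu\,\frac{d\mathcal{V}_p}{p^0}, \qquad S^\mu{}_{;\mu} = \int\mathcal{N}_{;\mu}\, p^\mu\,\frac{d\mathcal{V}_p}{p^0} = 0 ;$$
$$T^{\mu\nu} = \int\mathcal{N} p^\mu p^\nu\,\frac{d\mathcal{V}_p}{p^0}, \qquad T^{\mu\nu}{}_{;\nu} = \int\mathcal{N}_{;\nu}\, p^\mu p^\nu\,\frac{d\mathcal{V}_p}{p^0} = \int(\mathcal{N}_{;\nu}\, p^\nu)\, p^\mu\,\frac{d\mathcal{V}_p}{p^0} = 0 .$$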

C 2.11 Specific Heat for Phonons in an Isotropic Solid [by Jeff Atwell] (a) As with blackbody radiation, each mode in a solid is a harmonic oscillator with some well-defined frequency ν. Like all harmonic oscillators, this mode has uniformly spaced energy levels, with energy spacing $h\nu$. When the mode is excited into its N'th energy level, we can regard it as having N quanta, i.e. N phonons; N is its occupation number. Since the mode, like any harmonic oscillator, can have any occupation number $0 \le N < \infty$, it must obey Bose-Einstein statistics rather than Fermi-Dirac statistics, and its quanta (its phonons) must be bosons. (b) Since phonons are bosons with $\mu_R = 0$, and each phonon has energy $E = h\nu$, the distribution functions are the same as for blackbody radiation (Eq. (2.21)): the mean number of quanta in a mode with frequency ν is
$$\eta = \frac{1}{e^{h\nu/kT} - 1},$$
and the number density of phonons in phase space is
$$\mathcal{N} = \frac{g_s}{h^3}\,\frac{1}{e^{h\nu/kT} - 1} .$$
In this expression we must think of ν as the frequency of a phonon whose energy is $E = h\nu$ and whose momentum is $p = E/c_s = h\nu/c_s$. (c) To calculate the total energy in one type of sound wave (longitudinal or transverse), we integrate $\mathcal{N}E$ over phase space, i.e. we multiply by the volume V of our solid and integrate over momentum space using spherical coordinates, $d\mathcal{V}_p = 4\pi p^2\, dp$:
$$E_{\rm total} = \int\mathcal{N} E\, V\, d\mathcal{V}_p = V\int_0^\infty\frac{g_s}{h^3}\,\frac{c_s p}{e^{c_s p/kT} - 1}\, 4\pi p^2\, dp .$$
Changing to the dimensionless variable $x = c_s p/kT$, we find
$$E_{\rm total} = g_s\,\frac{4\pi k^4}{h^3 c_s^3}\, V T^4\int_0^\infty\frac{x^3}{e^x - 1}\, dx .$$
The integral can be evaluated in terms of the Bernoulli number [Eq. (2.48c) of the text, plus Table 2.1]; its value is $\pi^4/15$. The final result is
$$E_{\rm total} = g_s\,\frac{4\pi^5 k^4}{15 h^3 c_s^3}\, V T^4 ,$$
which is what we wanted to show. (d) To get the heat capacity, differentiate $E_{\rm total}$ with respect to T: $C_V = 4a_s T^3 V$, where $a_s \equiv g_s\, 4\pi^5 k^4/(15 h^3 c_s^3)$, so that $E_{\rm total} = a_s V T^4$. (e) The phonon frequency and wavelength are related by $\nu = c_s/\lambda$, so the thermal distribution function is
$$\eta = \frac{1}{e^{h c_s/\lambda kT} - 1} = \frac{1}{e^{\lambda_T/\lambda} - 1}, \qquad \lambda_T \equiv \frac{h c_s}{kT} .$$
From this we see that for $\lambda \ll \lambda_T$, $\eta \ll 1$; for $\lambda \sim \lambda_T$, $\eta \sim 1$; and for $\lambda \gg \lambda_T$, $\eta \gg 1$. The atomic spacing $a_o$ puts a lower limit on the wavelengths of the phonons, $\lambda_{\min} = 2a_o$, corresponding to an upper limit on their energies:
$$E_{\max} = h\nu_{\max} = \frac{h c_s}{\lambda_{\min}} = \frac{h c_s}{2a_o} .$$
This cutoff may be safely ignored at low temperature, but when $kT \sim E_{\max}$ the computation above fails. $T_D \sim E_{\max}/k$ is the Debye temperature. Once we reach the Debye temperature, the total number of modes saturates at some number $N_{\rm modes}$, since adding higher-frequency (shorter-λ) modes would require wavelengths shorter than the lattice spacing allows. When $kT > E_{\max}$, every mode holds an energy $\sim kT$, and so $E_{\rm total} \sim g_s N_{\rm modes} kT$, where $N_{\rm modes}$ is the total number of modes of vibration of the solid. This is the "equipartition theorem" of thermal physics. Then $C_V = dE_{\rm total}/dT = g_s N_{\rm modes} k$ is independent of temperature.
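The Bose integral used in part (c), $\int_0^\infty x^3/(e^x - 1)\,dx = \pi^4/15$, is easy to confirm numerically. The sketch below applies Simpson's rule on a truncated range. The cutoff $x_{\max} = 60$ and the step count are our choices; the neglected tail is of order $60^3 e^{-60}$.

```python
import math

def bose_integral(n_steps=20_000, x_max=60.0):
    """Simpson's-rule estimate of int_0^inf x^3/(e^x - 1) dx (tail beyond x_max dropped)."""
    def integrand(x):
        # x^3/(e^x - 1) behaves like x^2 near x = 0, so it vanishes at the origin
        return 0.0 if x == 0.0 else x**3 / math.expm1(x)
    step = x_max / n_steps
    s = integrand(0.0) + integrand(x_max)
    for i in range(1, n_steps):
        s += (4 if i % 2 else 2) * integrand(i * step)
    return s * step / 3

numeric = bose_integral()
exact = math.pi**4 / 15
print(numeric, exact)
```

Using `math.expm1` avoids the loss of precision in $e^x - 1$ for small $x$.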

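2.9 Equation of State for Electron-Degenerate Hydrogen [by Alexander Putilin] The mean occupation number of the electron gas is
$$\eta = \frac{1}{e^{(\tilde E - \tilde\mu_e)/kT} + 1}, \qquad \tilde E^2 = p^2 + m_e^2 .$$
The gas is degenerate if $\tilde\mu_e - m_e \gg kT$. In this limit $\eta(\tilde E)$ looks like Fig. 2. The width of the "transition" region where $\eta(\tilde E)$ goes from 0 to 1 is $\sim kT$, so in the limit $\tilde\mu_e - m_e \gg kT$ we can approximate $\eta(\tilde E)$ by a step function: $\eta(\tilde E) = 1$ for $\tilde E < \tilde\mu_e$; $\eta(\tilde E) = 0$ for $\tilde E > \tilde\mu_e$.

Figure 2: Ex. 2.9. [Sketch of η versus Ẽ: η ≈ 1 from Ẽ = m_e up to Ẽ ≈ μ̃_e, then falling to 0 over a width ∼kT.]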
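To see how good the step-function approximation is, the following sketch compares the exact Fermi-Dirac number density with the degenerate result $n = 8\pi p_F^3/3h^3$. It works in units $h = c = m_e = 1$ (an assumption for convenience only), and the parameter values $\tilde\mu_e = 2$ and $kT = 0.01$ are arbitrary illustrative choices satisfying $\tilde\mu_e - m_e \gg kT$.

```python
import math

# Units with h = c = m_e = 1 (assumed only for this numerical check).
def number_density_fermi(mu, kT, n_steps=40_000):
    """n = 8 pi * int_0^inf p^2 dp / (exp((E - mu)/kT) + 1), with E = sqrt(p^2 + 1)."""
    p_max = math.sqrt((mu + 40 * kT) ** 2 - 1.0)   # occupation < e^-40 beyond this momentum
    step = p_max / n_steps
    def integrand(p):
        energy = math.sqrt(p * p + 1.0)
        return p * p / (math.exp((energy - mu) / kT) + 1.0)
    s = integrand(0.0) + integrand(p_max)
    for i in range(1, n_steps):
        s += (4 if i % 2 else 2) * integrand(i * step)
    return 8 * math.pi * s * step / 3              # Simpson's rule

def number_density_step(mu):
    """Step-function (degenerate) limit: n = 8 pi p_F^3 / 3 with p_F = sqrt(mu^2 - 1)."""
    p_f = math.sqrt(mu * mu - 1.0)
    return 8 * math.pi * p_f**3 / 3

mu, kT = 2.0, 0.01    # mu - m_e = 1 is much larger than kT, so the gas is degenerate
print(number_density_fermi(mu, kT) / number_density_step(mu))
```

The ratio differs from unity only at order $(kT)^2$, as expected from a Sommerfeld-type expansion. Raising `kT` toward $\tilde\mu_e - m_e$ shows the approximation breaking down.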

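The number density n is given by
$$n = \int_0^\infty 4\pi\mathcal{N} p^2\, dp = 4\pi\,\frac{2}{h^3}\int_0^\infty\eta\, p^2\, dp = \frac{8\pi}{h^3}\int_0^{p_F} p^2\, dp = \frac{8\pi}{3h^3}\, p_F^3 ,$$
where $p_F = \sqrt{\tilde\mu_e^2 - m_e^2}$. The energy density is $\rho = \rho_p + \rho_e$. Protons are nonrelativistic, so $\rho_p = m_p n = 8\pi m_p p_F^3/3h^3$, while
$$\rho_e = 4\pi\int_0^\infty\mathcal{N}\tilde E p^2\, dp = \frac{8\pi}{h^3}\int_0^{p_F}\sqrt{p^2 + m_e^2}\; p^2\, dp ,$$
so that $\rho_e \approx m_e n$ if $p_F \ll m_e$ (nonrelativistic case), and $\rho_e \approx 2\pi p_F^4/h^3$ if $p_F \gg m_e$ (ultrarelativistic case). In both cases $\rho_e$ … $T_c^0$, we get
$$N_+ = \frac{k^3T^3}{\hbar^3\omega_o^3}\,{\rm Li}_3(e^{\mu/kT}), \qquad E_+ = \frac{3k^4T^4}{\hbar^3\omega_o^3}\,{\rm Li}_4(e^{\mu/kT}) = \frac{3\,{\rm Li}_4(e^{\mu/kT})}{\zeta(3)}\left(\frac{T}{T_c^0}\right)^4 N k T_c^0 ,$$
where in the last step we used $N = \zeta(3)\, k^3 (T_c^0)^3/\hbar^3\omega_o^3$.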

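Note that continuity of N implies that $\mu \to 0$ as $T \to T_c^0$. Demanding continuity of the number of atoms, $N_+ = N_- = N$, therefore implies that the total energy is also continuous across the critical temperature. (b) $C \equiv (\partial E_{\rm total}/\partial T)_N$, so all we have to do to evaluate the heat capacity is differentiate the total energies for $T > T_c^0$ and $T < T_c^0$ that we found in part (a). Differentiation of the $T < T_c^0$ case is trivial; differentiation of the $T > T_c^0$ case requires a trick. Holding $N = $ const, differentiating our equation for N, and using $d\,{\rm Li}_s(z)/dz = {\rm Li}_{s-1}(z)/z$, we find
$$0 = 3\,\frac{k^3T^2}{\hbar^3\omega_o^3}\,{\rm Li}_3(e^{\mu/kT}) + \frac{k^3T^3}{\hbar^3\omega_o^3}\,\frac{{\rm Li}_2(e^{\mu/kT})}{e^{\mu/kT}}\,\frac{\partial e^{\mu/kT}}{\partial T} \quad\Rightarrow\quad \frac{\partial e^{\mu/kT}}{\partial T} = \frac{-3\, e^{\mu/kT}\,{\rm Li}_3(e^{\mu/kT})}{T\,{\rm Li}_2(e^{\mu/kT})} .$$
We know how to write ${\rm Li}_3(e^{\mu/kT})$ as a function of T and N only from part (a). Finally, carrying out the differentiations gives the heat capacities as $T \to T_c^0$ (and therefore $\mu \to 0$) from below and above:
$$C_- = \left(\frac{\partial E_-}{\partial T}\right)_N = \frac{12\zeta(4)}{\zeta(3)}\, N k ,$$
$$C_+ = \left(\frac{\partial E_+}{\partial T}\right)_N = \frac{12\zeta(4)}{\zeta(3)}\, N k + 3NkT_c^0\left(\frac{-3\zeta(3)}{T_c^0\,\zeta(2)}\right) = \frac{12\zeta(4)}{\zeta(3)}\, N k - 9\,\frac{\zeta(3)}{\zeta(2)}\, N k .$$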
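For concreteness, the limiting heat capacities $C_-$ and $C_+$ found above can be evaluated numerically. The sketch below uses $\zeta(2) = \pi^2/6$ and $\zeta(4) = \pi^4/90$, and computes ζ(3) from a partial sum plus an integral estimate of the tail. This is our illustration, not part of the original solution.

```python
import math

def zeta3(n_terms=100_000):
    """Apery's constant zeta(3): partial sum plus the integral estimate of the tail."""
    partial = sum(1.0 / k**3 for k in range(1, n_terms))
    return partial + 1.0 / (2.0 * n_terms**2)

Z2 = math.pi**2 / 6      # zeta(2)
Z3 = zeta3()             # zeta(3)
Z4 = math.pi**4 / 90     # zeta(4)

c_minus = 12 * Z4 / Z3                  # C_-/(N k) as T -> T_c^0 from below
c_plus = 12 * Z4 / Z3 - 9 * Z3 / Z2     # C_+/(N k) as T -> T_c^0 from above
print(f"C-/Nk = {c_minus:.4f}, C+/Nk = {c_plus:.4f}, jump = {c_minus - c_plus:.4f}")
```

The two limits differ by $9\zeta(3)/\zeta(2)\,Nk$, so the heat capacity of the trapped gas is discontinuous at $T_c^0$ even though the energy is continuous.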

E 3.10 Primordial Element Formation [by Nate Bode and Dan Grin] In the early universe, the protons and neutrons travelled at nonrelativistic speeds, so they can be described as a monatomic, nonrelativistic gas. The photons travel at the speed of light and make up an ultrarelativistic gas. We solve the problem by considering the gas in two different epochs and supposing that the transition is rapid. Initially we have a gas made up solely of neutrons and protons. The entropy per 2 protons and 2 neutrons is (ignoring the small mass difference between the proton and neutron)
$$\sigma_{\rm init} = 4\left\{\frac52 + \ln\left[\frac{2m_p}{\rho}\left(\frac{2\pi m_p kT_f}{h^2}\right)^{3/2}\right]\right\} = 10 + 4\ln\left[\frac{2m_p}{\rho}\left(\frac{2\pi m_p kT_f}{h^2}\right)^{3/2}\right].$$
In the final state the entropy is the α-particle entropy per α-particle plus the photon entropy per photon. Note that 7 MeV is the binding energy per nucleon. Therefore, the total binding energy of an α-particle is 28 MeV (so the problem incorrectly gives 7 MeV). Therefore
$$\sigma_{\rm final} = \frac52 + \ln\left[\frac{8m_p}{\rho}\left(\frac{8\pi m_p kT_f}{h^2}\right)^{3/2}\right] + \sigma_\gamma ,$$
where $\sigma_\gamma$ is the photon entropy per photon, which can be roughly estimated from the thermodynamic relation
$$\sigma_\gamma \sim \frac{S_\gamma}{k} \sim \frac{U}{kT} = \frac{28\ {\rm MeV}}{kT} \approx \frac{4.49\times10^{-5}\ {\rm erg}}{kT} . \qquad (2)$$
Now we must calculate ρ at the transition point. As nucleosynthesis occurs after inflation, the universe expands adiabatically except when particle species annihilate. At the ∼MeV temperatures under consideration, the last annihilation event ($e^+e^- \to \gamma\gamma$) has occurred, so we may treat the expansion of the universe as adiabatic. This means that the density of baryons is given by
$$\rho_{\rm baryon,init} = \rho_{\rm baryon,now}\left(\frac{T_{\rm init}}{T_{\rm now}}\right)^3 . \qquad (3)$$
Currently the total matter density is $\rho_{\rm total,now} \approx 1.7\times10^{-29}$ g/cm³, and the mass fraction of baryons is about 2%. We then equate $\sigma_{\rm init}$ and $\sigma_{\rm final}$, using Eqs. (2) and (3). Solving with your favorite numerical solver gives $T_{\rm crit} = 1.0\times10^9$ K. The time of nucleosynthesis can be found by plugging this temperature into
$$T(t) = \frac{T_{\rm init}}{\sqrt{t/t_{\rm init}}} \quad\Rightarrow\quad t_{\rm crit} = t_{\rm init}\left(\frac{T_{\rm init}}{T_{\rm crit}}\right)^2 . \qquad (4)$$
Therefore, $t_{\rm crit} \sim 100$ s. In Fig. 3, we can see that at early times (high temperatures) the higher-entropy (preferred) state is 2p + 2n, while at late times (low temperatures, $T < T_{\rm crit}$) the higher-entropy (preferred) state is α + γ, indicating that helium production does not occur until $T \sim T_{\rm crit}$. It might be argued that the graph does not give correct results because the true reaction will not be isentropic, meaning that the method of finding the initial density is incorrect. But because this density lies inside a logarithm, the dependence is very weak. In particular, even if the density were wrong by two orders of magnitude, the entropies would change by less than a factor of 2. Looking at Fig. 3, we see that a factor of 2 does not change the crossover point.
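The two pieces of arithmetic in this solution, the 28 MeV → erg conversion in Eq. (2) and the scaling of Eq. (4), can be checked in a few lines. The normalization $T_{\rm init} \sim 10^{10}$ K at $t_{\rm init} \sim 1$ s is our assumption (a commonly quoted rough figure for the radiation era). The solution itself does not say which $t_{\rm init}$ and $T_{\rm init}$ were used.

```python
MEV_IN_ERG = 1.602176634e-6
binding_energy = 4 * 7.0 * MEV_IN_ERG   # 4 nucleons x 7 MeV per nucleon = 28 MeV
print(f"28 MeV = {binding_energy:.3e} erg")

def t_crit(t_init, T_init, T_crit):
    """Eq. (4): radiation-era cooling with T proportional to t^(-1/2)."""
    return t_init * (T_init / T_crit) ** 2

# ASSUMED normalization: T ~ 1e10 K at t ~ 1 s (rough rule, not from the text).
print(f"t_crit ~ {t_crit(1.0, 1e10, 1.0e9):.0f} s")
```

With this normalization, $T_{\rm crit} = 10^9$ K gives $t_{\rm crit} \sim 100$ s, consistent with the estimate above. Because $t \propto T^{-2}$, a factor-of-2 error in $T_{\rm crit}$ changes $t_{\rm crit}$ by a factor of 4.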

Figure 3: Particle entropy per particle, σ_particle (in units of $k^{-1}$), as a function of temperature T in kelvin, on logarithmic axes (T from $10^7$ to $10^{12}$ K; σ_particle from about $10^2$ to $10^5$). The red curve shows the entropy per particle for the collection of protons and neutrons only (initial state), while the green curve shows the entropy per particle for α-particles and photons in the final state.

Ph 136: Solution for Chapter 4 (compiled and revised by Nate Bode and Dan Grin)

A 4.1 Pressure Measuring Device: Gibbs approach [by Kip Thorne], [modified by Dan Grin] The obvious thing to do is to choose our pressure-measuring device (pressure meter) to exchange only volume and not heat with its reservoir, but this would call on the use of enthalpy as an extensive quantity, and the appropriate ensemble. Unfortunately, this problem precedes Ex. 4.2, which introduces you to the enthalpy. At this point it is better (more straightforward) to let the meter exchange both volume and heat with its reservoir. This is analogous to the meter that Kip discussed in class on Monday, which exchanged particles and heat. (An alternative solution using the enthalpy is attached below, in case you choose to do the problems in a different order or already know about the enthalpy ensemble.) Then the probability for the meter to be in a (quantum) state with volume ΔV and energy $\Delta\tilde E$ (probability measured either using an ensemble of meter-and-reservoir systems or via the fraction of the time the meter spends in a given state) is given by the meter's Gibbs distribution:
$$\rho = K\exp[(-\Delta\tilde E - P\Delta V)/kT] ; \qquad (1)$$
or equivalently
$$\ln\rho = (-\Delta\tilde E - P\Delta V)/kT + \text{constant} . \qquad (2)$$

Here K is the normalization constant and P and T are the reservoir's pressure and temperature. Suppose that the reservoir plus meter has total energy $\tilde E_o$, volume $V_o$, and number of particles $N_o$, and that the meter plus reservoir is a closed system. Then by energy and volume conservation, when the meter has volume ΔV and energy $\Delta\tilde E$, the reservoir has volume $V_r = V_o - \Delta V$ and energy $\tilde E_r = \tilde E_o - \Delta\tilde E$. Denote by $\mathcal{N}_r(\tilde E_r, V_r, N_r)$ the total number of states of the reservoir that have energy $\tilde E_r$, volume $V_r$, and particle number $N_r$, and similarly by $\mathcal{N}_m(\Delta\tilde E, \Delta V)$ the number of states of the meter. (The meter might not be made of the same kind of particles as the reservoir; it might, for example, just be a mass-spring system that can be heated; the only variables we care about for it are energy and volume.) Then the probability ρ is proportional to the following product of numbers of states for the reservoir and meter:
$$\rho \propto \mathcal{N}_r(\tilde E_o - \Delta\tilde E, V_o - \Delta V, N_o)\times\mathcal{N}_m(\Delta\tilde E, \Delta V) . \qquad (3)$$
Because the meter is tiny compared to the reservoir, when we take the logarithm of this ρ, the changes in the meter term, as energy and volume flow back and forth, are negligible compared to the changes in the reservoir term, so
$$\ln\rho = \ln[\mathcal{N}_r(\tilde E_o - \Delta\tilde E, V_o - \Delta V, N_o)] + \text{constant} . \qquad (4)$$
This logarithm of the number of reservoir states is the reservoir entropy divided by k, so
$$\ln\rho = S_r(\tilde E_o - \Delta\tilde E, V_o - \Delta V, N_o)/k + \text{constant} . \qquad (5)$$
Expanding to first order in the meter energy and volume, we obtain
$$\ln\rho = -\frac1k\left(\frac{\partial S_r}{\partial\tilde E_r}\right)_{V_r,N_r}\Delta\tilde E - \frac1k\left(\frac{\partial S_r}{\partial V_r}\right)_{\tilde E_r,N_r}\Delta V + \text{constant} . \qquad (6)$$
Comparing with our previous expression (2) for ln ρ, we see that the pressure and temperature that appear in the meter's Gibbs distribution are related to the reservoir's entropy by the usual thermodynamic relations
$$\frac1T = \left(\frac{\partial S_r}{\partial\tilde E_r}\right)_{V_r,N_r}, \qquad P = T\left(\frac{\partial S_r}{\partial V_r}\right)_{\tilde E_r,N_r} = \left(\frac{\partial\tilde E_r}{\partial S_r}\right)_{N_r,V_r}\left(\frac{\partial S_r}{\partial V_r}\right)_{\tilde E_r,N_r} = -\left(\frac{\partial\tilde E_r}{\partial V_r}\right)_{N_r,S_r} .$$

In the last step we have used an obvious analog of Eqn. (2) of Box 4.2 in the text (permute S and N ).

4.1 Pressure Measuring Device: Enthalpy approach [by Dan Grin] Alternatively, we could let the meter exchange only volume with the system, in which case energy exchange occurs only through mechanical work, and not through heat (entropy) flow. Thus, for the meter $\Delta\tilde E = -P\Delta V$, and the meter is generally in the enthalpy ensemble with
$$\rho = e^{-S/k_B}\Big|_{H = \text{constant}} . \qquad (7)$$
In this case, however, the meter's entropy does not change ($\Delta S = 0$), even after interaction with the system, since there is no heat flow, and so $\ln\rho = -S/k_B = $ constant. After energy and volume are exchanged with the reservoir, the probability is
$$\rho \propto \mathcal{N}_r(\tilde E_0 - \Delta\tilde E, V_0 - \Delta V, N_0, H_0)\,\mathcal{N}_m(\Delta\tilde E, \Delta V) , \qquad (8)$$
where $\mathcal{N}_i$ denotes the number of states of the reservoir ($i = r$) or meter ($i = m$) that have the enumerated extensive properties. Taking the logarithm of Eq. (8), and noting that, because the meter is small, changes in the meter term are overwhelmingly smaller than changes in the reservoir term, we obtain
$$\ln\rho \simeq \ln[\mathcal{N}_r(\tilde E_0 - \Delta\tilde E, V_0 - \Delta V, N_0, H_0)] + \text{const} . \qquad (9)$$
Since $k\ln\mathcal{N}_r$ is the reservoir entropy, to lowest order
$$\ln\rho \simeq -\frac1k\left(\frac{\partial S_r}{\partial\tilde E_r}\right)_{N,V}\Delta\tilde E - \frac1k\left(\frac{\partial S_r}{\partial V_r}\right)_{N,\tilde E}\Delta V + \text{const} , \qquad (10)$$
where the subscript r denotes reservoir properties. However, because, as we saw above, ρ cannot change and $\Delta\tilde E = -P\Delta V$, we must have
$$P = \left(\frac{\partial S_r}{\partial V_r}\right)_{N,\tilde E}\bigg/\left(\frac{\partial S_r}{\partial\tilde E_r}\right)_{N,V} = \left(\frac{\partial\tilde E_r}{\partial S_r}\right)_{N_r,V_r}\left(\frac{\partial S_r}{\partial V_r}\right)_{\tilde E_r,N_r} , \qquad (11)$$
and applying an equation analogous to Eq. (2) of Box 4.2 in the text (switching N and S in the printed equation), we see that
$$P = -\left(\frac{\partial\tilde E_r}{\partial V_r}\right)_{N_r,S_r} .$$

B 4.3 Enthalpy Representation of Thermodynamics [by Xinkai Wu, Dan Grin] (a) Let us begin in the energy representation, in which the fundamental potential is $E(V, N, S) = TS + \tilde\mu N - PV$. The Legendre transformation from V as an independent variable to P as the independent variable is a mapping $(V, E[V,N,S]) \to (P, H[P,N,S])$, where
$$H = E(V(P,N,S), N, S) - \left(\frac{\partial E}{\partial V}\right)_{N,S} V . \qquad (12)$$
Thus $H = E + PV$, the enthalpy, is the desired Legendre transformation of the energy. Combining $H = E + PV$ with the first law $dE = T\,dS - P\,dV + \tilde\mu\,dN$, one immediately finds $dH = V\,dP + T\,dS + \tilde\mu\,dN$. With this expression we can compute V, T, μ̃ by differentiating H with respect to P, S, N, respectively; that is, H serves as the fundamental potential in this case. (b) In the first law derived in part (a), there are dS and dN terms but no dV term. Since S and N are independent variables, there is no exchange of heat and no exchange of particles, but there is exchange of the extensive quantities volume and energy. Thus, the energy exchange is fixed completely by the volume exchange, $dE = -P\,dV$. A physical situation that could produce this ensemble: the system is put in a chamber, with a frictionless piston separating the system and the bath, where the walls and the piston are impermeable and heat-insulating. (c) If H and N are the fixed quantities (constants of the motion) for each system in the ensemble, then, given sufficient time to evolve toward equilibrium, the ensemble's probability distribution can only be a function of H and N (under Hamiltonian evolution). If we consider only the subset of states with some fixed values of H and N, then ρ must be constant over that subset. From the equation
$$S = -k_B\sum_k\rho_k\ln\rho_k = -k_B\,\overline{\ln\rho} = -k_B\ln\rho , \qquad (13)$$
where the last equality follows from the constancy of ρ, we obtain $\rho = \exp[-S/k_B]_{H,N}$.

Unlike the analogous problem for the microcanonical ensemble, we do not have to widen our range of fixed quantities from a δ-function to distributions of finite width ΔH and ΔN. The reason is that only the enthalpy itself is fixed, not the energy or volume separately, so there are many different values of E and V consistent with a given value of H. As a result, even without "manually" widening the distribution to a range δE around E, the region of phase space available to each system is not a set of measure zero, and we can comfortably do statistical mechanics. (d) The equations of state read off from the enthalpy first law, and the associated Maxwell relations, are:
$$V = \left(\frac{\partial H}{\partial P}\right)_{S,N}, \quad T = \left(\frac{\partial H}{\partial S}\right)_{P,N}, \quad \tilde\mu = \left(\frac{\partial H}{\partial N}\right)_{P,S} , \qquad (14)$$
$$\frac{\partial^2 H}{\partial P\,\partial S} = \left(\frac{\partial V}{\partial S}\right)_{P,N} = \left(\frac{\partial T}{\partial P}\right)_{S,N} = \frac{\partial^2 H}{\partial S\,\partial P} , \qquad (15)$$
$$\frac{\partial^2 H}{\partial S\,\partial N} = \left(\frac{\partial T}{\partial N}\right)_{P,S} = \left(\frac{\partial\tilde\mu}{\partial S}\right)_{P,N} = \frac{\partial^2 H}{\partial N\,\partial S} , \qquad (16)$$
$$\frac{\partial^2 H}{\partial N\,\partial P} = \left(\frac{\partial\tilde\mu}{\partial P}\right)_{S,N} = \left(\frac{\partial V}{\partial N}\right)_{P,S} = \frac{\partial^2 H}{\partial P\,\partial N} . \qquad (17)$$
(e) "Adding up small subsystems," we get
$$H = TS + \tilde\mu N .$$
(f) In Exercise 1.30 it is shown that the inertial mass per unit volume is
$$\rho_{\rm inert}^{ji} = T^{00}\delta^{ji} + T^{ji} \qquad (18)$$
$$= \rho\,\delta^{ji} + P\,\delta^{ji} = (\rho + P)\,\delta^{ij} \quad \text{for an isotropic system} . \qquad (19)$$
Thus the total inertial mass is isotropic and is given by $\rho V + PV = E + PV = H$. (g) To create the sample, we need ΔE, the sample's energy. To inject the sample into the system (with no heat exchanged), we need to perform an amount of work PΔV on the system. Thus the total energy required to create the sample and perform the injection is ΔE + PΔV, which is just the sample's enthalpy, because there is no heat exchanged. Thus, enthalpy has the physical interpretation of "energy of injection at fixed volume V".
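The Legendre transformation of part (a) can also be carried out numerically. The sketch below uses a hypothetical toy fundamental potential at fixed S and N, $E(V) = C V^{-2/3}$ (the volume dependence of a monatomic ideal gas, with an arbitrary constant C). It inverts $P = -(\partial E/\partial V)_{S,N}$ by bisection, forms $H = E + PV$, and checks the equation of state $V = (\partial H/\partial P)_{S,N}$ of Eq. (14).

```python
import math

C_TOY = 1.3   # hypothetical constant in the toy potential E(V) = C V^(-2/3) at fixed S, N

def energy(V):
    return C_TOY * V ** (-2.0 / 3.0)

def pressure(V):
    """P = -(dE/dV)_{S,N}, written out analytically for the toy E(V)."""
    return (2.0 / 3.0) * C_TOY * V ** (-5.0 / 3.0)

def volume_at_pressure(P, lo=1e-6, hi=1e6):
    """Invert P(V) by bisection on a log scale; P(V) decreases monotonically."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if pressure(mid) > P:
            lo = mid          # volume too small: pressure still above target
        else:
            hi = mid
    return math.sqrt(lo * hi)

def enthalpy(P):
    """Legendre transform H(P) = E(V(P)) + P V(P), Eq. (12)."""
    V = volume_at_pressure(P)
    return energy(V) + P * V

P0, dP = 0.8, 1e-5
dH_dP = (enthalpy(P0 + dP) - enthalpy(P0 - dP)) / (2 * dP)
print(dH_dP, volume_at_pressure(P0))   # Eq. (14): V = (dH/dP)_{S,N}
```

Because $\partial(E + PV)/\partial V = 0$ at the inverted volume, small errors in $V(P)$ enter H only at second order, which keeps the finite-difference check clean. For this toy potential $H = \tfrac52 PV$.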

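4.2 Energy Representation for a Nonrelativistic Monatomic Gas [by Dan Grin] (a) We begin with an expression for the fundamental potential E of a nonrelativistic gas in the energy representation (see Eq. 4.9c in the text):
$$E(V,S,N) = N\,\frac{3h^2}{4\pi m}\left(\frac{V}{N}\right)^{-2/3}\exp\left[\frac{2S}{3k_BN} - \frac53\right] . \qquad (20)$$
To derive the desired relations, we need only substitute Eq. (20) into the following expressions for the intensive variables in terms of the variables of the fundamental potential (Eqs. 4.10a in the text):
$$T = \left(\frac{\partial E}{\partial S}\right)_{V,N}, \quad \mu = \left(\frac{\partial E}{\partial N}\right)_{V,S}, \quad P = -\left(\frac{\partial E}{\partial V}\right)_{S,N} . \qquad (21)$$
In the case of temperature, this derivative is trivial, as E depends on S only through the exponent, and yields
$$T = \frac{h^2}{2\pi k_B m}\left(\frac{N}{V}\right)^{2/3}\exp\left[\frac{2S}{3k_BN} - \frac53\right] .$$
In the case of pressure, the derivative is also trivial, as E depends on V only through the prefactor, and so
$$P = -\left(\frac{\partial E}{\partial V}\right)_{S,N} = \frac{h^2}{2\pi m}\left(\frac{N}{V}\right)^{5/3}\exp\left[\frac{2S}{3k_BN} - \frac53\right] .$$
The chemical potential is a little trickier, but we can simplify our lives a little by rewriting E in the form
$$E = \left(\frac{N}{V}\right)^{5/3}\frac{3h^2}{4\pi m}\exp\left[\frac{2S}{3k_BN} - \frac53\right]V . \qquad (22)$$
Then, calling on the product (Leibniz) rule, we see that
$$\mu = \left(\frac{\partial E}{\partial N}\right)_{V,S} = \frac53\left(\frac{N}{V}\right)^{2/3}\frac{3h^2}{4\pi m}\exp\left[\frac{2S}{3k_BN} - \frac53\right] - \left(\frac{N}{V}\right)^{5/3}V\,\frac{3h^2}{4\pi m}\,\frac{2S}{3k_BN^2}\exp\left[\frac{2S}{3k_BN} - \frac53\right]$$
$$\Rightarrow\quad \mu = \frac{h^2}{4\pi m}\left(\frac{N}{V}\right)^{2/3}\left(5 - \frac{2S}{k_BN}\right)\exp\left[\frac{2S}{3k_BN} - \frac53\right] . \qquad (23)$$
(b) We verify that the Maxwell relations are satisfied by taking the appropriate derivatives of the equations derived in part (a):
$$\left(\frac{\partial T}{\partial V}\right)_{S,N} = -\frac{h^2}{3\pi k_B N m}\left(\frac{N}{V}\right)^{5/3}\exp\left[\frac{2S}{3k_BN} - \frac53\right] = -\left(\frac{\partial P}{\partial S}\right)_{N,V} , \qquad (24)$$
$$-\left(\frac{\partial\mu}{\partial V}\right)_{N,S} = \frac{h^2}{6\pi m N}\left(\frac{N}{V}\right)^{5/3}\left(5 - \frac{2S}{k_BN}\right)\exp\left[\frac{2S}{3k_BN} - \frac53\right] = \left(\frac{\partial P}{\partial N}\right)_{S,V} , \qquad (25)$$
$$\left(\frac{\partial T}{\partial N}\right)_{S,V} = \frac{h^2}{3\pi m k_B}\,\frac{1}{N^{1/3}V^{2/3}}\left(1 - \frac{S}{k_BN}\right)\exp\left[\frac{2S}{3k_BN} - \frac53\right] = \left(\frac{\partial\mu}{\partial S}\right)_{N,V} . \qquad (26)$$
(c) To derive the ideal gas equation, we solve the temperature equation for the exponential to obtain
$$\exp\left[\frac{2S}{3k_BN} - \frac53\right] = \frac{2\pi k_B m T}{h^2}\left(\frac{V}{N}\right)^{2/3} . \qquad (27)$$
Plugging this into the pressure equation, oodles of factors cancel out to yield the desired ideal gas law: $P = Nk_BT/V$.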
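The relations of parts (a) through (c) lend themselves to a quick numerical cross-check. This sketch differentiates Eq. (20) by central finite differences, in illustrative units $h = m = k_B = 1$ (an assumption for the check only), at an arbitrary state (V, S, N). It verifies the ideal gas law, the Maxwell relations (24) and (25), and the closed-form chemical potential (23).

```python
import math

# Illustrative units h = m = k_B = 1 (assumed for this check only).
H_PL, MASS, KB = 1.0, 1.0, 1.0

def energy(V, S, N):
    """Fundamental potential E(V, S, N) of Eq. (20)."""
    return (N * 3 * H_PL**2 / (4 * math.pi * MASS) * (V / N) ** (-2 / 3)
            * math.exp(2 * S / (3 * KB * N) - 5 / 3))

def partial(func, args, i, step):
    """Central-difference derivative of func with respect to its i-th argument."""
    up, down = list(args), list(args)
    up[i] += step
    down[i] -= step
    return (func(*up) - func(*down)) / (2 * step)

def temperature(V, S, N):       # T = (dE/dS)_{V,N}
    return partial(energy, (V, S, N), 1, 1e-5)

def pressure(V, S, N):          # P = -(dE/dV)_{S,N}
    return -partial(energy, (V, S, N), 0, 1e-5)

def chem_potential(V, S, N):    # mu = (dE/dN)_{V,S}
    return partial(energy, (V, S, N), 2, 1e-5)

state = (2.0, 3.0, 1.5)         # arbitrary (V, S, N)
V, S, N = state
print("P V / (N k T) =", pressure(*state) * V / (N * KB * temperature(*state)))
# Maxwell relation (24): (dT/dV)_{S,N} = -(dP/dS)_{V,N}
print(partial(temperature, state, 0, 1e-3), -partial(pressure, state, 1, 1e-3))
# Maxwell relation (25): (dP/dN)_{S,V} = -(dmu/dV)_{N,S}
print(partial(pressure, state, 2, 1e-3), -partial(chem_potential, state, 0, 1e-3))
```

Nested finite differences (an outer step of $10^{-3}$ around an inner step of $10^{-5}$) are accurate enough here to confirm the relations to several significant figures.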

C 4.5 Electron-Positron Equilibrium at "Low" Temperatures [by Xinkai Wu], [modified by Kip Thorne and Dan Grin] (a) The reaction $e^- + p \to e^- + p + e^- + e^+$ gives $\tilde\mu_{e^-} + \tilde\mu_p = 2\tilde\mu_{e^-} + \tilde\mu_p + \tilde\mu_{e^+}$, which implies $\tilde\mu_{e^-} + \tilde\mu_{e^+} = 0$, i.e. $\tilde\mu_{e^-} = -\tilde\mu_{e^+}$. (b) In what follows we use $\tilde\mu_-$ to denote $\tilde\mu_{e^-}$, and $\tilde\mu_+$ to denote $\tilde\mu_{e^+}$. The distribution function (density in phase space) for positrons and electrons (both having $g_s = 2$) is given by
$$\mathcal{N}_\pm = \frac{2}{h^3}\,\eta_\pm = \frac{2}{h^3}\,\frac{1}{e^{(\tilde E - \tilde\mu_\pm)/kT} + 1} , \qquad (28)$$
$$\tilde E \approx mc^2 + p^2/2m \qquad (29)$$
$$\Rightarrow\quad \mathcal{N}_\pm = \frac{2}{h^3}\,\frac{1}{e^{(p^2/2m + mc^2 - \tilde\mu_\pm)/kT} + 1} .$$
The density in coordinate space is $n_\pm = \int d^3p\,\mathcal{N}_\pm$. We know that $n_- > n_+$ because, while positrons and electrons are created in pairs, we also have ionization electrons from the hydrogen atoms. Thus we must have $\tilde\mu_- > \tilde\mu_+$. This inequality combined with $\tilde\mu_- + \tilde\mu_+ = 0$ gives $\tilde\mu_- > 0$, $\tilde\mu_+ < 0$. (c) In the dilute-gas regime, $\eta_\pm \approx \exp[-(p^2/2m + mc^2 - \tilde\mu_\pm)/kT]$. It is straightforward to perform the momentum-space integral and find
$$n_\pm = \frac{2}{h^3}(2\pi mkT)^{3/2}\exp\left(\frac{\tilde\mu_\pm - mc^2}{kT}\right) , \qquad (30)$$
η … 0, then $\Delta P_{\rm system} < 0$, namely $P_{\rm system}$ will drop below $P_{\rm external}$ and the system will keep collapsing. A similar argument can be made when the volume increases a little by fluctuation. Thus we conclude that $(\partial P/\partial v)_T > 0$ would lead to instability against volume fluctuations.) So we can look at the slopes of the curves in Fig. 4.7(a) and conclude that: when $T > T_{\rm crit}$, the gas is stable all along the curve; when $T = T_{\rm crit}$, the gas is stable except at the one particular point $v = v_{\rm crit}$; when $T < T_{\rm crit}$, the gas is stable in two regions, $b < v < v_{\min}$ and $v > v_{\max}$ (these are the two different phases, both stable against volume fluctuations), where $v_{\min}$ and $v_{\max}$ are the locations of the local minimum and the local maximum, respectively, and the gas is unstable in the region $v_{\min} < v < v_{\max}$. (c) By the principle of minimum Gibbs potential, the chemical potential at point A must equal that at point B (see Exercise 4.4). Recall that, being on the same isotherm, A and B have the same temperature. Thus $\mu(T, P_A) = \mu(T, P_B)$ gives $P_A = P_B$, i.e. the straight line from A to B is horizontal. We have the differential relations
$$dG = d(\mu N) = -S\,dT + V\,dP + \mu\,dN \qquad (55)$$
$$\Rightarrow\quad N\,d\mu = -S\,dT + V\,dP \qquad (56)$$
$$\Rightarrow\quad d\mu = -s\,dT + v\,dP, \qquad s \equiv S/N,\ v \equiv V/N . \qquad (57)$$
Along the isotherm, $dT = 0$, so $d\mu = v\,dP$. Thus
$$\int_A^B d\mu = \mu|_B - \mu|_A = \int_A^B v\,dP . \qquad (58)$$
As is easily seen, $\int_A^B v\,dP$ is just the difference between the areas of the two stippled regions. Thus $\mu|_A = \mu|_B$ tells us that these two areas are equal. (d)
$$\sigma_V = V\sqrt{kT\kappa} = V\sqrt{\frac{-kT}{(\partial P/\partial V)_{T,N}\,V^2}} ; \qquad (59)$$
see Eq. (4.53) and the paragraph that follows it. We see that small values of $-(\partial P/\partial V)_{T,N}$ correspond to large volume fluctuations. Thus near the "critical point," with $T_{\rm crit} = 8a/27kb$, $V_{\rm crit} = 3Nb$, $P_{\rm crit} = a/27b^2$, $-(\partial P/\partial V)_{T,N}$ is very small, and the gas exhibits huge volume fluctuations. (For $T < T_{\rm crit}$, near the local minimum and maximum on the P–V plot, we also have small $-(\partial P/\partial V)_{T,N}$. However, as argued in parts (b) and (c), these regions are not physical and in general do not exist in nature. Also, $-(\partial P/\partial V)_{T,N}$ gets small when V gets large. But $\sigma_V/V$ does not get huge, because $-(\partial P/\partial V)_{T,N} \sim NkT/V^2$ for large V, and $\sigma_V/V$ remains $\sim 1/\sqrt N$.)

4.8 Fluctuations of Systems in Contact with a Volume Bath [by Xinkai Wu] (a) Whether or not the ensemble is in equilibrium, the (very general) energy conservation law always holds. Since there is no heat or particle exchange between the system and the bath, the change in the system's energy is due entirely to the work performed on the system by the bath, namely $d\tilde E = -P_b\,dV$. (Note that since the bath is huge compared to the system, the bath's pressure $P_b$ remains fixed, so $\Delta\tilde E = -P_b\Delta V$ for a finite volume change.) This implies that $\Delta H = \Delta\tilde E + P_b\Delta V = 0$, i.e. the system's enthalpy is conserved. (b) Since the system is very small compared to the bath, the bath is always in equilibrium. Then we can use the first law for the bath, $\Delta\tilde E_b = T_b\Delta S_b - P_b\Delta V_b + \tilde\mu_b\Delta N_b$. Since there is no particle exchange, $\Delta N_b = 0$. Also, by energy and volume conservation, $\Delta\tilde E_b = -\Delta\tilde E = P_b\Delta V = -P_b\Delta V_b$. These, combined with the first law, tell us $T_b\Delta S_b = 0$, i.e. $\Delta S_b = 0$. So we conclude that interaction with a system cannot change the bath's entropy. (c) The bath and the system form a closed supersystem with total entropy $S_{\rm total} = S_b + S$. This supersystem, like any closed system, must evolve toward increasing $S_{\rm total}$. This, combined with the fact that $S_b$ does not change (proved in part (b)), tells us that S always increases. When the supersystem reaches statistical equilibrium, it is in the microcanonical ensemble with $\rho_{\rm total} = e^{-S_{\rm total}/k} = e^{(-S_b - S)/k} = $ const. This, combined with $S_b = $ const, gives us $\rho = e^{-S/k} = $ const for the regions of phase space that have enthalpy H in the small range δH. (d) For fluctuations away from equilibrium, the probability is $\propto e^{S_{\rm total}/k} \propto e^{S/k}$. Note that the enthalpy H is a function of (P, S, N) (see Exercise 4.3). Inverting this relation we get $S = S(H, P, N)$. So we still have Eqs. (4.48a) and (4.48b), but with E replaced by H and V replaced by P.
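Returning to the equal-area construction of part (c) of the van der Waals exercise above, the rule can be turned into a small numerical routine. The sketch assumes the standard van der Waals isotherm written in reduced variables ($P/P_{\rm crit}$, $v/v_{\rm crit}$, $T/T_{\rm crit}$, built from the critical values quoted in part (d)): $P = 8T/(3v - 1) - 3/v^2$. The reduced temperature 0.9 is an arbitrary example. The routine finds the horizontal line $P_A = P_B$ for which $\int_A^B [P(v) - P_A]\,dv = 0$, which is equivalent to $\mu_A = \mu_B$.

```python
import math

def p_vdw(v, t):
    """Reduced van der Waals isotherm P = 8t/(3v - 1) - 3/v^2."""
    return 8.0 * t / (3.0 * v - 1.0) - 3.0 / v**2

def bisect(f, a, b, iters=200):
    """Root of f on [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    for _ in range(iters):
        m = 0.5 * (a + b)
        fm = f(m)
        if (fm > 0) == (fa > 0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)

def spinodals(t):
    """Local minimum and maximum of P(v): dP/dv = 0  <=>  4 t v^3 = (3v - 1)^2."""
    g = lambda v: 4.0 * t * v**3 - (3.0 * v - 1.0) ** 2
    return bisect(g, 1.0 / 3.0 + 1e-9, 1.0), bisect(g, 1.0, 10.0)

def coexistence(t):
    """Maxwell equal-area construction for a subcritical reduced temperature t < 1."""
    v_min, v_max = spinodals(t)

    def volumes(p):
        v_liq = bisect(lambda v: p_vdw(v, t) - p, 1.0 / 3.0 + 1e-12, v_min)
        hi = 2.0 * v_max
        while p_vdw(hi, t) > p:            # walk outward until the gas root is bracketed
            hi *= 2.0
        v_gas = bisect(lambda v: p_vdw(v, t) - p, v_max, hi)
        return v_liq, v_gas

    def area_mismatch(p):
        """int_{v_l}^{v_g} [P(v) - p] dv, using int P dv = (8t/3) ln(3v - 1) + 3/v."""
        v_l, v_g = volumes(p)
        integral = ((8.0 * t / 3.0) * math.log((3.0 * v_g - 1.0) / (3.0 * v_l - 1.0))
                    + 3.0 / v_g - 3.0 / v_l)
        return integral - p * (v_g - v_l)

    p_lo = max(p_vdw(v_min, t), 0.0) + 1e-9
    p_hi = p_vdw(v_max, t) - 1e-9
    p_sat = bisect(area_mismatch, p_lo, p_hi)   # area mismatch decreases with p
    return (p_sat,) + volumes(p_sat)

p_sat, v_liq, v_gas = coexistence(0.9)
print(p_sat, v_liq, v_gas)
```

Bisection is used throughout because every bracket is guaranteed by the shape of the isotherm: P falls from +∞ to the local minimum, rises to the local maximum, then decays to zero.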

Ph 136: Solution for Chapter 5 (revised by Nate Bode, Dan Grin, Geoffrey Lovelace and Kip Thorne)

A 5.2 Filters [by Geoffrey Lovelace (a-e) and Nate Bode (a)] (a) Equation (5.42) explains how to find the filtered spectral density $S_w$ from the unfiltered spectral density $S_y$:
$$S_w(f) = |\tilde K(f)|^2 S_y(f),$$
where $\tilde K(f)$ is the Fourier transform of the kernel of the filter. The filter is given by Eq. (5.47a):
$$w(t) = \int_{t-\Delta t}^{t}\cos[2\pi f_o(t - t')]\,y(t')\,dt' .$$
Note: the algebra gets impossible very quickly if you aren't clever on this problem. If you just go at it by brute force, you may even find that Mathematica chokes on the integrals involved. We could proceed by following the suggestion in the book and drop a Fourier mode into our filter,
$$w(t) = \int_{t-\Delta t}^{t}dt'\,\cos[2\pi f_0(t - t')]\,e^{-2\pi i f t'} ,$$
and square the output to find the squared modulus of the Fourier transform of the kernel. However, this may be one of the few occasions when it is relatively straightforward to calculate the Fourier transform of the kernel directly. The first trick is to rewrite the finite-Fourier-transform filter so that the integral is taken over all times. To do this, just use the Heaviside step function Θ(x), which has a value of 1 for positive x and a value of 0 for negative x:
$$w(t) = \int_{t-\Delta t}^{t}dt'\,\cos[2\pi f_0(t - t')]\,y(t') = \int_{-\infty}^{\infty}dt'\,\Theta(\Delta t - (t - t'))\,\Theta(t - t')\cos[2\pi f_0(t - t')]\,y(t') = \int_{-\infty}^{\infty}dt'\,K(t - t')\,y(t') .$$
Therefore $K(\tau) = \Theta(\Delta t - \tau)\,\Theta(\tau)\cos[2\pi f_0\tau]$. We next find the Fourier transform of K as follows:
$$\tilde K(f) = \int_{-\infty}^{\infty}K(\tau)e^{i2\pi f\tau}d\tau = \int_{-\infty}^{\infty}\Theta(\Delta t - \tau)\Theta(\tau)\cos[2\pi f_0\tau]\,e^{i2\pi f\tau}d\tau = \frac12\int_{-\infty}^{\infty}\Theta(\Delta t - \tau)\Theta(\tau)\left[e^{i2\pi(f - f_0)\tau} + e^{i2\pi(f + f_0)\tau}\right]d\tau = -\frac{i}{2}\left[\frac{e^{i2\pi(f - f_0)\Delta t} - 1}{2\pi(f - f_0)} + \frac{e^{i2\pi(f + f_0)\Delta t} - 1}{2\pi(f + f_0)}\right],$$
where the final step was done using Mathematica. We can now easily calculate $|\tilde K|^2$:
$$|\tilde K|^2 = \frac14\left(\left|\frac{e^{i2\pi(f - f_0)\Delta t} - 1}{2\pi(f - f_0)}\right|^2 + \left|\frac{e^{i2\pi(f + f_0)\Delta t} - 1}{2\pi(f + f_0)}\right|^2\right) + \text{crossterm} = \frac{\Delta t^2}{4}\left\{{\rm sinc}^2[\pi(f - f_0)\Delta t] + {\rm sinc}^2[\pi(f + f_0)\Delta t]\right\} + \text{crossterm} .$$
Noting that we only deal with positive frequencies, and that the second term is significant only near $f = -f_0$, we can discard it. The cross terms are similarly small, being made up of a product of a sinc in $f - f_0$ and a sinc in $f + f_0$; the smallness of the latter sinc gives us the final result:
$$|\tilde K|^2 = \frac{\Delta t^2}{4}\,{\rm sinc}^2[\pi(f - f_0)\Delta t] .$$
Finally, we can relate the spectral density of the input signal y to the spectral density of the output signal w by
$$S_w(f) = |\tilde K(f)|^2 S_y(f) = \frac{\Delta t^2}{4}\,{\rm sinc}^2[\pi(f - f_0)\Delta t]\,S_y(f) .$$
(b) Figure 1 shows the finite-Fourier-transform filter. To evaluate its bandwidth, use Eq. (5.45b):
$$\Delta f = \frac{\int_0^\infty|\tilde K(f)|^2\,df}{|\tilde K(f_0)|^2} .$$
The frequency dependence of $\tilde K(f)$ is entirely contained in the sinc function. Two properties of sinc are useful: (i) ${\rm sinc}(0) = 1$, and (ii) $\int_0^\infty df\,{\rm sinc}^2[\pi(f - f_0)\Delta t] = \frac{1}{\Delta t}\left[1 + O\!\left(\frac{1}{f_0\Delta t}\right)\right]$. These two properties immediately imply that
$$\Delta f = \frac{(\Delta t^2/4)(1/\Delta t)}{\Delta t^2/4} = \frac{1}{\Delta t} .$$

Figure 1: Ex. 5.2 (b). [Plot of |K̃(f)|², normalized to unit peak value, versus 2π(f − f_o)Δt over the range −10 to 10.]

This is the bandwidth you expected to find. (c) Equation (5.42) explains how to find the filtered spectral density $S_w$ from the unfiltered spectral density $S_y$: $S_w(f) = |\tilde K(f)|^2 S_y(f)$, where $\tilde K(f)$ is the Fourier transform of the kernel of the filter. The filter is
$$w(t) = \frac{1}{\Delta t}\int_{t-\Delta t}^{t}y(t')\,dt' .$$
Inserting a sinusoidal function into the filter and squaring gives the square of the transform of the kernel:
$$|\tilde K(f)|^2 = \left|\frac{1}{\Delta t}\int_{t-\Delta t}^{t}e^{2\pi i f t'}\,dt'\right|^2 = \frac{\sin^2(f\pi\Delta t)}{(f\pi\Delta t)^2} = {\rm sinc}^2(f\pi\Delta t) .$$
This is sketched in Fig. 2. From the sketch, it is obvious that the filter is centered on $f_0 = 0$. Of course, this means that $|\tilde K(f_0)|^2 = 1$. We will use that result in the next part. (d) Assuming that the spectral density of the (unfiltered) noise varies negligibly over the width of the filter (which is of order $1/\Delta t$, as is evident from Fig. 2), we can apply Eq. (5.45g):
$$\sigma_w = |\tilde K(f_0)|\sqrt{S_y(f_0)\,\Delta f} = \sqrt{S_y(0)\,\Delta f} .$$

Figure 2: Ex. 5.2 (c). [Plot of |K̃(f)|² versus fπΔt from 0 to 6; the peak value is 1 at f = 0.]

The bandwidth Δf is defined by Eq. 5.49b:
$$\Delta f = \frac{\int_0^\infty|\tilde K(f)|^2\,df}{|\tilde K(0)|^2} = \int_0^\infty{\rm sinc}^2(f\pi\Delta t)\,df = \frac{1}{\pi\Delta t}\int_0^\infty dx\,{\rm sinc}^2x = \frac{1}{\pi\Delta t}\,\frac{\pi}{2} = \frac{1}{2\Delta t} .$$
Notice that the bandwidth is only half the bandwidth of the finite Fourier transform. This is because, here, the filter is centered on f = 0, so the integral in Eq. (5.49b) picks up only half of the sinc² peak, whereas for the finite Fourier transform the peak sits at a large, positive frequency and is included in full.
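The kernel transform and bandwidth of the finite-Fourier-transform filter in parts (a) and (b) can be checked numerically. This sketch evaluates $\tilde K(f) = \int_0^{\Delta t}\cos(2\pi f_0\tau)\,e^{i2\pi f\tau}\,d\tau$ by the trapezoid rule and compares it with the closed form $-\tfrac{i}{2}[\ldots]$ (our own evaluation of that integral). It then integrates $|\tilde K|^2$ over a finite frequency range to estimate Δf. The values $f_0 = 20$, $\Delta t = 1$, the frequency grid, and the upper cutoff are illustrative assumptions.

```python
import cmath
import math

def kernel_ft_numeric(f, f0, dt, n=20_000):
    """Trapezoid estimate of K~(f) = int_0^dt cos(2 pi f0 tau) exp(i 2 pi f tau) dtau."""
    h = dt / n
    total = 0j
    for j in range(n + 1):
        tau = j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * math.cos(2 * math.pi * f0 * tau) * cmath.exp(2j * math.pi * f * tau)
    return total * h

def kernel_ft_closed(f, f0, dt):
    """-(i/2)[(e^{i a_- dt} - 1)/a_- + (e^{i a_+ dt} - 1)/a_+], with a_(-/+) = 2 pi (f -/+ f0)."""
    def piece(a):
        # (e^{i a dt} - 1)/a tends to i*dt as a -> 0
        return 1j * dt if a == 0 else (cmath.exp(1j * a * dt) - 1) / a
    return -0.5j * (piece(2 * math.pi * (f - f0)) + piece(2 * math.pi * (f + f0)))

def bandwidth(f0, dt, f_max, df=0.005):
    """Delta f = int_0^f_max |K~|^2 df / |K~(f0)|^2, via the trapezoid rule."""
    n = int(round(f_max / df))
    vals = [abs(kernel_ft_closed(i * df, f0, dt)) ** 2 for i in range(n + 1)]
    integral = df * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return integral / abs(kernel_ft_closed(f0, f0, dt)) ** 2

f0, dt = 20.0, 1.0
for f in (19.3, 20.0, 20.6):
    print(f, abs(kernel_ft_numeric(f, f0, dt)), abs(kernel_ft_closed(f, f0, dt)))
print("|K(f0)|^2 =", abs(kernel_ft_closed(f0, f0, dt)) ** 2, "vs dt^2/4 =", dt**2 / 4)
print("bandwidth =", bandwidth(f0, dt, f_max=f0 + 200.0), "vs 1/dt =", 1 / dt)
```

The peak value $|\tilde K(f_0)|^2 = \Delta t^2/4$ and the bandwidth $\Delta f \approx 1/\Delta t$ come out as derived. The small residual in Δf reflects the finite cutoff and the neglected $f + f_0$ and cross terms, which are suppressed by roughly $1/(f_0\Delta t)$.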

B 5.3 Wiener's Optimal Filter [by Alexei Dvoretskii (a,b) and Nate Bode (c)] (a) Easy way (by Kip):
$$N(t) = \int_{-\infty}^{+\infty}K(t' - t)\,y(t')\,dt' = \int_{-\infty}^{+\infty}K(t'')\,y(t'' + t)\,dt'' , \qquad N \equiv \int_{-\infty}^{+\infty}K(t'')\,y(t'')\,dt'' , \qquad (1)$$
where we made the change of variable $t'' \equiv t' - t$ in the second equality and noted that y is a stationary random process (its statistical properties are independent of the origin of time: $p_1(y, t) = p_1(y, 0)$), so the statistics of the final expression are independent of t. Thus, the statistical properties of N(t) must be the same as those of N. In particular, $\langle N^2(t)\rangle = \langle N^2\rangle$. Let's show that $\overline{N(t)} = 0$. One can do this with the help of ergodicity: $\overline{N(t)} = \langle N(t)\rangle = \int_{-\infty}^{+\infty}K(t - t')\langle y(t')\rangle\,dt' = \int_{-\infty}^{+\infty}K(t - t')\,\bar y\,dt' = 0$, since the noise has zero mean, $\bar y = 0$. One can also prove this directly by integrating N(t) over time, whose details we omit here. Knowing $\overline{N(t)} = 0$, we readily get
$$\overline{N^2(t)} = \sigma_N^2 = \int_0^{+\infty}S_N(f)\,df = \int_0^{+\infty}|\tilde K(f)|^2 S_y(f)\,df .$$

Hard way

By ergodicity, $\overline{N^2(t)} = \langle N^2(t)\rangle$. Now let's show $\langle N^2(t)\rangle = \langle N^2\rangle$. With all integrals running from −∞ to +∞,
$$\langle N^2(t)\rangle = \left\langle\int K(t - t')y(t')\,dt'\int K(t - t'')y(t'')\,dt''\right\rangle = \int\!\!\int dt'\,dt''\,K(t - t')K(t - t'')\langle y(t')y(t'')\rangle = \int\!\!\int dt'\,dt''\,K(t - t')K(t - t'')f(t' - t'') ;$$
letting $\tilde t' = t - t'$, $\tilde t'' = t - t''$,
$$\langle N^2(t)\rangle = \int\!\!\int d\tilde t'\,d\tilde t''\,K(\tilde t')K(\tilde t'')f(\tilde t'' - \tilde t') = \int\!\!\int dt'\,dt''\,K(t')K(t'')f(t'' - t') .$$
In the above equations we use $f(t' - t'')$ to denote $\langle y(t')y(t'')\rangle$, which depends only on $t' - t''$ because y is a stationary process. On the other hand, we have
$$\langle N^2\rangle = \left\langle\int K(t')y(t')\,dt'\int K(t'')y(t'')\,dt''\right\rangle = \int\!\!\int dt'\,dt''\,K(t')K(t'')f(t' - t'') .$$
Using the trivial fact that $f(t' - t'') = f(t'' - t')$, we see that
$$\langle N^2\rangle = \langle N^2(t)\rangle = \overline{N^2(t)} = \int_0^{+\infty}|\tilde K(f)|^2 S_y(f)\,df .$$
(b) Using Parseval's theorem, and the fact that K(t) and s(t) are both real (thus $\tilde K(-f) = \tilde K^*(f)$, $\tilde s(-f) = \tilde s^*(f)$), we have
$$S = \int_{-\infty}^{+\infty}K(t)s(t)\,dt = \frac12\int_{-\infty}^{+\infty}\left(\tilde K(f)\tilde s^*(f) + {\rm c.c.}\right)df = \int_0^{+\infty}\left(\tilde K(f)\tilde s^*(f) + {\rm c.c.}\right)df .$$
We then have
$$\frac{S}{\langle N^2\rangle^{1/2}} = \frac{\int_0^{+\infty}\left(\tilde K(f)\tilde s^*(f) + {\rm c.c.}\right)df}{\left[\int_0^{+\infty}|\tilde K(f)|^2 S_y(f)\,df\right]^{1/2}} . \qquad (2)$$
Taking a small variation $\tilde K(f) \to \tilde K(f) + \delta\tilde K(f)$, one readily gets
$$\delta\left(\frac{S}{\langle N^2\rangle^{1/2}}\right) = \int_0^{+\infty}\left[\frac{\tilde s^*(f)}{\langle N^2\rangle^{1/2}} - \frac{S}{2\langle N^2\rangle^{3/2}}\,\tilde K^*(f)\,S_y(f)\right]\delta\tilde K(f)\,df + {\rm c.c.}$$
For $\delta\left(S/\langle N^2\rangle^{1/2}\right)$ to vanish for any $\delta\tilde K(f)$, we must have
$$\tilde K(f) = {\rm const}\times\frac{\tilde s(f)}{S_y(f)} . \qquad (3)$$
To show that this choice of $\tilde K(f)$ actually delivers a maximum to the ratio $S/\langle N^2\rangle^{1/2}$, one could do a second-variation calculation. Our physical intuition should make it obvious, though: a filter with such a kernel favors the frequencies at which the signal-to-noise ratio is high and suppresses those at which the opposite is true.

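(c) We begin from Eq. (2), square both sides, plug in Eq. (3) (taking the constant to be real), and note that $S_y(f)$ is a nonnegative real-valued function:
$$\frac{S^2}{\langle N^2\rangle} = \frac{\left[\int_0^{+\infty}\left(\tilde K(f)\tilde s^*(f) + {\rm c.c.}\right)df\right]^2}{\int_0^{+\infty}|\tilde K(f)|^2 S_y(f)\,df} = \frac{\left[\int_0^{+\infty}\left({\rm const}\times\frac{\tilde s(f)}{S_y(f)}\,\tilde s^*(f) + {\rm c.c.}\right)df\right]^2}{\int_0^{+\infty}\left|{\rm const}\times\frac{\tilde s(f)}{S_y(f)}\right|^2 S_y(f)\,df} = \frac{\left[\int_0^{+\infty}\left(\frac{|\tilde s(f)|^2}{S_y(f)} + {\rm c.c.}\right)df\right]^2}{\int_0^{+\infty}\frac{|\tilde s(f)|^2}{S_y(f)}\,df} = 4\int_0^{+\infty}\frac{|\tilde s(f)|^2}{S_y(f)}\,df . \qquad (4)$$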

5.4 Alan Variance of Clocks [by Xinkai Wu]

2

(4)

(a) We just need to find the relation between the Fourier transform of different random processes. ˜ τ (f ) Φ

= =

1 ˜ √ φ(f ) e−i2πf ·2τ − 2e−i2πf τ + 1 2¯ ωτ √ 2˜ φ(f )e−i2πf τ (cos 2πf τ − 1) ω ¯τ

Also, since φ(t) is obtained by integrating ω(t) once, we have ˜ ) = −1 ω ˜ (f ) φ(f i2πf Combining the above two results, and using the fact that the spectral density is basically given by the modulus squared Fourier transform, we find SΦτ (f )

=

2 2 cos 2πf τ − 1 Sω (f ) ω ¯2 2πf τ

∝

f 2 Sω (f ) f or f 1/2πτ

∝

f −2 Sω (f ) f or f 1/2πτ

[An alternative way of finding SΦτ (f ): As discussed in Section 5.5 of the text, Φτ (t) can be regarded as obtained from ω(t0 ) using a filter K(t − t0 ). Then 2 2 ˜ ˜ ) we feed ω(t0 ) = exp[i2πf t0 ] into our SΦ (f ) = K(f ) Sω (f ). To find K(f τ

√

2(cos 2πf τ −1) system. We find, using eqn (5.55a) and (5.55b), Φτ (t) = exp[i2πf (t+τω¯ τ)]i2πf , 2 h i2 ˜ cos 2πf τ −1 2 whose modulus square gives K(f ) = ω¯ 2 . Thus we arrive at eqn 2πf τ

(5.56)] (b) Using the expression for SΦτ (f ) obtained in the previous part, and making the change of variable z ≡ 2πf τ in the integral, one finds 1/2 Sω (1/2τ ) 1 στ = α ω ¯2 2τ 2 Z +∞ 2 cos z − 1 Sω (z/2πτ ) where α = dz π z Sω (1/2τ ) 0 2 As one can verify, π2 cos zz−1 integrates to one, and has a profile shown in Fig. 3, which isn’t too much different from a delta function located at z = 2.4 (which isn’t too much away from z = π). Thus we see α is a dimensionless number of order unifty and has very weak τ -dependence which we can ignore. √ (c) If ω has a white-noise spectrum Sω (1/2τ ) ∼ const, then στ ∝ 1/ τ , and the clock stability is better for long averaging time; if ω has a flicker-noise spectrum Sω (1/2τ ) ∝ τ , then στ is independent of averaging time; if ω has a

0.3 0.25 0.2 0.15 0.1 0.05 2

4

6

8

10

12

Figure 3: Ex. 5.4 (b)

random-walk spectrum Sω (1/2τ ) ∝ τ 2 , then στ ∝ better for short averaging time.

√

τ , and the clock stability is

5.5 Cosmological Density Fluctuations [by Roger Blandford] (a) It’s quite straightforward to show this, Z 1 ξ(r) = hδ(x)δ(x + r)i = lim dxδ(x)δ(x + r) V →∞ V V Z Z Z 0 1 1 1 −ik·x ˜ dx dke δ (k) dk0 e−ik ·(x+r) δ˜V (k0 ) = lim V 3 3 V →∞ V (2π) (2π) V [performing the integral in x gives a delta function in (k+k’)] Z 1 1 dk eik·r δ˜V (k)δ˜V (−k) = lim V →∞ V (2π)3 [δ(x) is real ⇒ δ˜V (−k) = δ˜V∗ (k); also let k → −k in the integral] Z Z dk −ik·r |δ˜V (k)|2 dk −ik·r = e lim = e P (k) V →∞ (2π)3 V (2π)3 The universe is isotropic, namely, δ(x) = δ(|x|) ⇒ δ˜V (k) = δ˜V (k) ⇒ P (k) = P (k). And weRcan perform the momentum integral in spherical coordinates. π Using the fact 0 dθ sin θ exp(−ikr cos θ) = 2 sinc (kr), one finds Z ξ(r) = 0

∞

dk 2 k sinc (kr)P (k) 2π 2

(b) The mass measured within a sphere of radius R is given by Z 3 δR (x) = drδ(x + r) 4πR3 r"/2

! 1) modes have decayed away, and we have ) ) * * 2 2 ξ1 ξ1 ) 2.4) − t − 2.4 t Bz (), t) ≈ c1 e µ0 κe R2 J0 = 1.6B0 e µ0 κe R2 J0 , (51) R R where we have used ξ1 ≈ 2.4 and c1 ≈ 1.6B0 .

17.3: The Earth’s Bow Shock [by Xinkai Wu 2000] (a) The momentum flux ∼ ρv 2 while the magnetic pressure generated by the 2 earth’s dipole field ∼ B 2 /2µ0 ∼ (BE /2µ0 )(rE /r)6 , noticing that for a dipole −3 field B ∼ r . Balancing these two terms and plugging in the numbers: BE ∼ 3 × 10−5 T , rE ∼ 6 × 106 m, we get r ∼ 8.5rE ∼ 5 × 107 m. (b) Eqn. (17.23) in the notes gives: E1 + vs B1 = E2 + vs B2 and Eqn. (17.21) gives: ρ1 (v1 − vs ) = ρ2 (v2 − vs ) In the infinite conductivity limit and applied to both sides of the shock front, eqn (17.5) gives: E1 = −v1 B1 and E2 = −v2 B2 . Combining the above equations we get B2 /B1 = ρ2 /ρ1 = (v1 − vs )/(v2 − vs ) Namely the magnetic field strength will increase by the same ratio as the density on crossing the shock front. Intuitively we expect the compression to decrease as the field is increased, because increasing the field means increasing the magnetic pressure, which will in turn resist compression. To be more rigorous, let’s look at a limiting case of equation (17.24): When B gets very large, the magnetic pressure term dominates and this equation gives B1 ≈ B2 , i.e. ρ1 ≈ ρ2 , which means there’s almost no compression. 17.6: Hartmann Flow [by Guodong Wang 1999] (a) The force balance equation is given by equation (17.34): ∇P = j × B + η∇2 v,

(52)

d2 vx . dz 2

(53)

whose x-component is B = B0 ez and P = −Qx + p(z), −Q = jy B0 + η 9

Now using equation (17.5) j = κe (E + v × B) and E = E0 ey we find jy = κe (E0 − B0 vx ).

(54)

Combining the above two equations, we get κe B02 (Q + κe B0 E0 ) d2 vx − vx = − . 2 dz η η

(55)

(b) A special solution to this equation is v0 =

Q + κe B0 E0 , κe B02

and the general solution is thus given by ) ) * * Hz Hz vx = v0 + C1 cosh + C2 sinh , a a where H = B0 a find

"

κe η

$1/2

(56)

(57)

. Using the boundary condition vx (z = ±a) = 0 we

and thus vx =

−v0 , C2 = 0, cosh(H)

(58)

/ 0 Q + κe B0 E0 cosh(Hz/a) 1 − . κe B02 coshH

(59)

C1 =

10

Solution for Problem Set 17B (compiled by Dan Grin and Nate Bode) March 18, 2009

17.7 Properties of Eigenmodes [by Nate Bode] ˆ with eigenfrequency ωm . By definition (a) Let ξ~m be an eigenfunction of F 2 ~ ~ ˆ F[ξm ] = −ρωm ξm which tells us Z Z ∗ ∗ ~ ~ ˆ ξm · F[ξm ] = (−ρωm 2 )(ξ~m · ξ~m )dV . (1) ˆ ξ~m ∗ ] = −ρ(ωm 2 )∗ ξ~m ∗ . This along with being self adjoint We also know that F[ gives us Z Z Z ∗ ˆ ξ~m ] = ξ~m · F[ ˆ ξ~m ∗ ] = (−ρ(ωm 2 )∗ )(ξ~m ∗ · ξ~m )dV . ξ~m · F[ (2) Together this tells us Z Z (−ρ(ωm 2 )∗ )(ξm )2 dV = (−ρωm 2 )(ξm )2 , or

Z

∗ −ρ ωm 2 − ωm 2 (ξm )2 = 0 ,

(3)

(4)

∗ which is only true when ωm 2 = ωm 2 . BAM. ˆ with eigenfrequencies ωm and ωn (b) Let ξm and ξn be eigenfunctions of F respectively. Then, Z Z Z 2~ ~ ~ ~ ˆ ξn · F[ξm ]dV = ξn · (−ρωm ξm )dV = (−ρωm 2 )ξ~n · ξ~m dV . (5) We can also use self adjointness to interchange the m and n in the above equation to get: Z Z (−ρωm 2 )ξ~n · ξ~m dV = (−ρωn 2 )ξ~m · ξ~n dV , (6) which gives us that: Z

(ρ(ωm 2 − ωn 2 ))ξ~m · ξ~n dV = 0 .

Since ωm 6= ωn we get the desired equation: Z ρξ~m · ξ~n dV = 0 . 1

(7)

(8)

17.8 Reformulation of the Energy Principle [by Daniel Grin 2009] Using the given notation, the equation of motion reads ρ

d2 ξ = j × b + δj × B − ∇δP = Fˆ [ξ] . dt2

(9)

Note that ξ is a vector. Then ξ · Fˆ [ξ] = ξ · (j × b) + ξ · (δj × B) − ξ · ∇δP.

(10)

Applying Eq. (17.46), we see that ξ · Fˆ [ξ] = ξ · (j × b) + ξ · (δj × B) + ξ · ∇ {γP (∇ · ξ) + (ξ · ∇) P } .

(11)

From the equality of triple products (basic vector calculus identity), we see that ξ · Fˆ [ξ] = j · (b × ξ) + ξ · (δj × B) + ξ · ∇ {γP (∇ · ξ) + (ξ · ∇) P } .

(12)

Applying Eq. (17.42) we then obtain ξ · Fˆ [ξ] = j·(b × ξ)+ξ ·({∇ × b/µ0 } × B)+ξ ·∇ {γP (∇ · ξ) + (ξ · ∇) P } . (13) From the properties of cross products (a × c = −c × a), we then see that ξ · Fˆ [ξ] = j·(b × ξ)−{∇ × b/µ0 }·(ξ × B)+ξ ·∇ {γP (∇ · ξ) + (ξ · ∇) P } . (14) Now applying the identity ∇ · (a × c) = c · (∇ × a) − a · (∇ × c) to the second term, we see that − {∇ × b/µ0 } · (ξ × B) = ∇ · ({ξ × B} × b/µ0 ) −

b · (∇ × {ξ × B}) . (15) µ0

Now applying Eq. (17.41), we see that − {∇ × b/µ0 } · (ξ × B) = ∇ · ({ξ × B} × b/µ0 ) −

b2 . µ0

(16)

Applying this to Eq. (14), we thus see that ξ · Fˆ [ξ] = j · (b × ξ) + ∇ · ({ξ × B} × b/µ0 ) −

b2 µ0

+ ξ · ∇ {γP (∇ · ξ) + (ξ · ∇) P } . (17)

Now, since ∇ · (ga) = g∇ · a + a · ∇g, ξ · Fˆ [ξ] = j · (b × ξ) + ∇ · ({ξ × B} × b/µ0 ) − {γP (∇ · ξ) + (ξ · ∇) P } (∇ · ξ) .

b2 µ0

+ ∇ · {ξγP (∇ · ξ) + ξ (ξ · ∇) P } − (18)

2

Grouping terms under the divergence operator and simplifying, we see that ξ · Fˆ [ξ] = j · (b × ξ) + ∇ · ({ξ × B} × b/µ0 + ξγP (∇ · ξ) + ξ (ξ · ∇) P) − 2 γP (∇ · ξ) − {(ξ · ∇) P } (∇ · ξ)

b2 µ0 −

(19)

which (finally!) is the desired expression. R (b) Since the potential energy functional is W [ξ] = − 12 V dV ξˆ˙F [ξ], we have o R n 2 2 W [ξ] = − 21 V j · (b × ξ) − bµ0 − [(ξ · ∇) P ] (∇ · ξ) − γP (∇ · ξ) d3 x− , R 1 ˆ d2 x 2 ∂V ({ξ × B} × b/µ0 + ξγP (∇ · ξ) + ξ (ξ · ∇) P) · n (20) where we have applied Stokes theorem to simplify the expressions and remove one differential operator where possible.

(c) We wish to discuss stability using this energy functional, but first we can drastically simplify our lives by recalling that the fluid is incompressible (∇ · ξ = 0), and so o R n 2 W [ξ] = − 12 V j · (b × ξ) − bµ0 d3 x− . (21) R 1 ˆ d2 x 2 ∂V ({ξ × B} × b/µ0 + ξ (ξ · ∇) P) · n (d) Since the current is in the zˆ direction and the resulting magnetic field proportional to $−1 and points in the φˆ direction (in cylindrical coordinates) by Ampere’s law, it is straightforward (but extremely tedious) to show that (ξ × B) ×

Bφ2 b + ξ · (ξ · ∇) P = ξ (∇ · ξ) = 0, µ0 µ0

(22)

where in the last step we have again invoked incompressibility of the fluid. Thus, Eqn. (21) simplifies to Z 1 b2 W [ξ] = − j · (b × ξ) − d3 x. (23) 2 V µ0 (e) Since j = (I/2πR) δ($ − R)ez , the first term in W [ξ] can be simplified to − 12 LI {b($ = R) × ξ($ = R)}z ,

(24)

where L is the length of the cylinder. This assumes that the perturbation is axisymmetric. Applying the purely φˆ direction of the magnetic field and the azimuthal symmetry of the perturbation and Eq. (17.41), we simplify the preceding to 2 |B| − 21 LI × (ξ$ ξz,z + ξ$ ξ$,$ ) B + ξ$ ,$ = 2 − 12 LI × (ξ$ (25) ξz,z + ξ$ ξ$,$ ) − ξ$ /$2 B = µ0 2 − 4πR LI × (ξ$ ξz,z + ξ$ ξ$,$ ) − ξ$ /R |$=R 3

Figure 1: The general form of Bφ . X’s point into the paper while dots point out of the paper.

17.9 Differential Rotation in the Solar Dynamo [by Nate Bode] (a) We have: ~ ∂B ~ + = ∇ × ~v × B ∂t

1 µ0 κe

~ ∇2 B

(26)

where we are instructed to drop the last term. Looking at the φ component of this equation we have 1 ∂ ∂Bφ ~ φ ) − ∂ (r sin θΩBθ ) = (r(~v × B) (27) ∂t r ∂r ∂θ 1 ∂ 2 ∂ = (r sin θΩBr ) + (r sin θΩBθ ) (28) r ∂r ∂θ 1 ∂ ∂ = r2 sin θBr Ω + sin θΩ r2 Br + r ∂r ∂r ∂ ∂ (29) r sin θBθ Ω + rΩ (sin θBθ ) ∂θ ∂θ ∂ ∂ = sin θ Br r Ω + Bθ Ω + (term with ∇ · B = 0) (30) ∂r ∂θ (b) Examining the equation above we can determine whether the terms are positive or negative. In the upper hemisphere Br > 0 and ∂Ω/∂θ > 0, while in the lower hemisphere these inequalities are reversed. In both hemispheres Bθ < 0 and ∂Ω/∂r < 0. Therefore Bφ < 0 in the northern hemisphere (31) 4

At the south pole Br < 0 which switches the sign from the north pole and we have Bφ > 0 in the southern hemisphere. (32) See figure 1.

17.11 Rotating Magnetospheres [by Guodong Wang 2002, modified by Daniel Grin 2009] We use cylindrical coordinates throughout this problem.

(a) Faraday’s law reads 0 = ∂t B = −∇ × E.

(33)

Since the system is axisymmetric, ∂φ ≡ 0. Integrating Eq. (33) along circle φ ∈ [0, 2π] for any given ρ, z so that r = (ρ, φ, z), we see that Eφ · 2πρ = 0

⇒ Eφ = 0.

(34)

Thus E is poloidal. E = −Aˆ eφ × B.

(35)

∗

For some C||Ω ∝ zˆ, it true that eˆφ = C × r, and so (absorbing A into the definition of C, we see that E = − (C × r) × B.

(36)

The magnetosphere is perfectly conducting, and so (in the pure MHD limit with velocity generated by rotation only) E = −v × B = − (Ω × r) × B,

(37)

where Ω is the angular velocity vector. Thus C = Ω. (b) Again we apply Faraday’s law: 0 = ∂t B = −∇ × E = ∇ × [(Ω × r) × B].

(38)

Note that Ω = (0, 0, Ω), and thus Ω × r = (0, ρΩ, 0). (Let φ = 0 since the field is axisymmetric.) ⇒ −E = (Ω × r) × B = (ρΩBz , 0, −ρΩBρ ).

(39)

⇒ 0 = ∇ × [(Ω × r) × B] = [(∂z (ρΩBz ) + ∂ρ (ρΩBρ )]eφ .

(40)

Combining Eq. (40) with ∇ · B =

1 ρ ∂ρ (ρBρ )

+ ∂z Bz = 0, we obtain

(B · ∇)Ω = 0. 5

(c) On the exterior side of the surface Etout = ρΩ∗ Bnout in the θ = 0 plane, where Et is the component of the electric field tangent to the surface and Bn is the component of the magnetic field normal to the surface. On the interior side of the surface Etin = ρΩBnin . Applying vn = 0, [Et ] = 0, [Bn ] = 0 at the surface, we see that Ω = Ω∗ .

Ex. 17.12 Solar Wind [by Xinkai Wu 2002] (a) Write the velocity as v = vφ eφ + vk , and the magnetic field as B = Bφ eφ + Bk , where the term with subscript k means the part of the vector that lies in the (r, θ) plane. Then we see that the φ-component of E = −v × B comes purely from vk × Bk . The vanishing of this implies that vk ∝ Bk ∝ (B − Bφ eφ ). Absorbing the Bφ part into the Ω × r, we get v=

κB + (Ω × r) ρ

(41)

Multiplying both sides of the above equation by ρ and then taking its divergence we get 0 = ∇ · (ρv) = ∇ · (κB) + ∇ · (ρΩ × r)

(42)

= B · ∇κ,

(43)

where we’ve used the mass conservation equation together with the stationarity, ∇ · B = 0, and ∇ · (ρΩ × r) = 0 (since ρΩ × r only has φ-component and by axisymmetry derivative w.r.t. φ vanishes). Thus we see that κ is constant along a field line. And the proof for the constancy of Ω is the same as that given in part (b) of Exercise 17.10. (b) The divergence of a vector C in the spherical coordinates is given by 1 ∂ 2 r 2 ∂r (r Cr ) upon using axisymmetry. Taking C to be ρv and using ∇ · (ρv) = 0 we find that ρvr r2 is a constant, while taking C to be B and using ∇ · B = 0 we find that Br r2 is a constant. (c) Using the result of part (a), we have vr = Combining these we readily get

κ ρ Br ,

and vφ =

vr Br = . vφ − Ωr Bφ

κ ρ Bφ

+ Ωr.

(44)

(d) The e.o.m. in the stationary case is given by ρ(v · ∇)v = −ρ∇Φ − ∇P + 6

(∇ × B) × B . µ0

(45)

Let’s consider the φ-component of the above equation. Then the gravity and pressure terms have no contribution because of axisymmetry. We thus find that ρvr

Br ∂ 1 ∂ (rvφ ) = (rBφ ). r ∂r µ0 r ∂r

(46)

Multiplying both sides of the above equation by r3 and using the constancy of ρvr r2 and Br r2 we find that Λ ≡ rvφ −

rBr Bφ µ0 ρvr

(47)

is constant. Br , and the (µ0 ρ)1/2 2 a vr2 1 r 2 B2 . µ0 ρ = Br2 = MA r

(e) Define the radial Alfven speed to be ar = Mach number to be MA = have

vr ar .

Then we have

radial Alfven And thus we

rBr Bφ vr2 vr MA2 Br2 rBφ vr = rvφ − Br MA2

Λ = rvφ −

= rvφ −

(48) (49)

rvφ − Ωr2 , MA2

(50)

where to get to the last line we’ve used the result of part (c) to eliminate

vr Br .

Plugging the above expression for Λ into the r.h.s. of (17.86) we see that it indeed reduces to the l.h.s.: vφ . Based on previous parts, we know Br ∝ MA =

1 r2 ,

ar ∝

Br √ ρ

∝

1√ r2 ρ ,

and thus

vr ρvr r2 √ ∝ vr r2 ρ ∝ √ ∝ ρ−1/2 . ar ρ

(51)

As ρ varies when one goes outward radially, MA will eventually become unity at some critical radius rc . (f ) Eqn. (50) tells us that at this critical rc , Λ = Ωrc2 . Then the timescales for the sun to lose its mass and spin are τm =

m m ≈ dm/dt 4πρvr r2

τL =

2 2 m Ωr m Ωr L ≈ = = 2 2 dL/dt 4πρvr r Λ 4πρvr r Ωrc2

(52)

r rc

2 τm .

(53)

Thus we see that the sun loses its spin faster than it loses its mass by a ratio of (rc /r )2 ≈ 400. 7

(g) Plugging in the numbers, we find τc ∼ 3 × 1019 s ∼ 103 billion years, which is much larger than the lifetime of the sun.

8

Solution for Problem Set 19 (Ch 18) (compiled by Nate Bode) April 8, 2009

A

18.1 Boundary of Degeneracy [by Alexander Putilin ’00] We’ll ignore factors of order unity in what follows. √ −1/3 λdB = ~/(momentum) ' ~/ me kT , which immediately gives (a) l = ne ne (me kT )3/2 /h3 . √ −1/3 , which (b) Using the uncertainty principle, ∆x ' h/∆p ' h/ me kT ne 3/2 3 again reduces to ne (me kT ) /h . (c) The quantum mechanical zero-point energy is given by (∆p)2 /me ' −2/3 me ) kT , which reduces to ne (me kT )3/2 /h3 . h2 /(l2 me ) ' h2 /(ne

18.6 Parameters for various plasmas [by Henry Huang ’98] 1/2 1/2 T /1K 0 kT Text eq. (18.10) λD = ne = 69 n/1m m. 2 −3 3/2

6 (T /1K) 3 Text eq. (18.11) ND = n 4π 3 λD = 1.4 × 10 (n/1m−3 )1/2 . 2 1/2 ω 1 ne −3 1/2 = 56.4 Text eq. (18.13) fp = 2πp = 2π Hz. 0 me 2π n/1m 3/2 −1 1 1 ee −3 −1 (T /1K) (ln Λ/10) s. Text eq. (18.21) tD = ν ee = 2.5×10−5 n/1m D

Using the fact that Λ = 92 ND (see Exercise 18.2 part (a)), we get tee D = 4× −1 3/2 9ND 4 −3 −1 10 n/1m (T /1K) ln( 2 )/10 s. And we only need to know T and n to get numerical values of these. (a) Atomic bomb. −1.2 Text eq. (16.57) gives T ∼ 4 × 104 (t/1ms) K; at t = 1ms, T ∼ 4 × 104 K. ρ The discussion above eq. (16.57) gives ρ ∼ 5kg/m3 , which means n ∼ µm ∼ p 5kg/m3 29×1.66×10−27 kg

∼ 1026 m−3 .

1

(b) Space shuttle Box 16.2 gives T ∼ 9000K. And since shuttle moves at ∼ 7000m/s sound speed 280m/s (all given in Box 16.2), we can use eq. (16.45a) which says ρ1 γ−1 ρ2 ' γ+1 and gives ρ2 ∼ 5ρ1 taking γ ∼ 1.5. The density at the altitude of ρ2 70km is ρ1 ∼ ρground exp(−70km/8km) ∼ 10−4 kg/m3 , which gives n ∼ µm ∼ p 5ρ1 µmp

∼ 1022 m−3 .

(c) Expanding universe Text Fig. 16.1 gives: at recombination threshold, log T ∼ 3.5 ⇒ T ∼ 3 103.5 K ∼ 3 × 103 K. Also from chapter 26, ρ ∝ T 3 ⇒ ρthen = ρnow TTthen ∼ now 3 3 K then 10−29 g/cm3 3×10 ∼ 1010 m−3 . ∼ 10−20 g/cm3 . So we get n ∼ ρm 3K p Plugging the above values of T and n into the equations on the top of this page, we get λD (m) 1 × 10−9 7 × 10−8 4 × 10−2

A-bomb Shuttle Universe

ND 1 10 2 × 106

fp (Hz) 9 × 1013 9 × 1011 9 × 105

tD (s) 2 × 10−14 9 × 10−12 4 × 10−1

B

18.5 Stopping of alpha particles [by Alexander Putilin ’00] First calculate the energy loss of an α-particle in a Coulomb collision with an electron (with impact parameter b). Consider the collision in the electron’s rest frame. We can approximate the trajectory of the α-particle by a straight line (take this to be the x-axis). Then the momentum change of the electron is given by an integral of force over time: Fy = F sin θ = Z ∆pe =

Z

2e2 x √ 4π0 (b2 + x2 ) b2 + x2

+∞

Fy dt = −∞

(1)

dx e2 x e2 = . v 2π0 (b2 + x2 )3/2 π0 vb

(2)

Then the energy loss is ∆p2 ∆E = − e = − 2me

e2 π0

2

1 =− 2me v 2 b2 2

e2 π0

2

mα , 4me Eb2

(3)

where mα = 4mp is the mass of an α-particle and E = 12 mα v 2 is the energy of an α-particle. When an α-particle travels a distance d`, it loses energy: Z bmax ∆E · ne · 2πb · db · d`. (4) dE = ∆E · (number of collisions) = bmin

dE =− d`

Z

bmax

bmin

e2 π0

2

mα πne mα · ne · 2πb · db = − 4me Eb2 2me E

e2 π0

2 ln Λ, (5)

where Λ = bmax /bmin . To estimate bmax , notice that electrons in plastic are not free but rather are bounded in atoms. It means that there is no Debye shielding −1/3 and so a reasonable estimate for bmax is the atomic spacing: bmax ∼ ne ∼ −10 2 · 10 m. For bmin , use the usual formula ~ 2(2e2 ) , . (6) bmin = Max bo = 4π0 mα v 2 mα v We see that ln Λ depends on the energy E, but since Λ 1 and ln Λ varies slowly for large Λ, we can assume ln Λ to be constant equal to its initial value √~ at E = E0 = 100 MeV. So, bmin ≈ ≈ 2.5 · 10−16 m, and so finally mα

2E0 /mα

ln Λ ≈ 13. Integrating equation (1), we get πne mα 1 E02 − E(`)2 = 2 2me

e2 π0

2 ln Λ · `.

(7)

The range ` is defined by E(`) = 0, so `=

π 2 0 e2

me E02 . πmα ne ln Λ

(8)

Plugging in the numbers, we find ` ≈ 0.5 cm.

18.7 Equilibration Time for a Globular Cluster [by unknown author] (a) For single deflections when b ≤ b0 , σ = πb20 . While for cumulative deflections, in which each deflection has b b0 , then ∆E = −(b0 /b)2 E for each deflection. Since we are interested in the case where the test star has high kinetic energy compared to the field stars, then we add up ∆E linearly. Z bmax 2 ∆E b0 bmax =− nvt2πbdb = −2πvtnb20 ln (9) E b bmin bmin where bmin is b0 and bmax is R = the radius of the star cluster. So the energy change timescale is dominated by cumulative deflections: tE =

1 2πb20 nv ln Λ 3

(10)

In the gravitational case, b0 = 2Gm/v 2 and we get tE =

v3 8nG2 m2

ln Λ

(11)

To estimate this, use the Virial theorem(e.g. Goldstein) which says that the 2 cluster’s kinetic energy is half the potential energy, so 12 N mv 2 ∼ 12 G(NRm) , m 1/2 ⇒ v ∼ GN . R ln Λ = ln

R R = ln N = ln 106 = 14 = ln b0 2Gm/v 2

and tE =

N 3/2 14n(8GmR3 )1/2

(12)

(13)

3 Also we can put in n = N/( 4π 3 R ) to get

tE =

4π 1/2 3/2 R 3 N 14(8Gm)1/2

= 4 × 1017 s = 1.3 × 1010 yr

(14)

which is about the age of the universe. (b) The cluster will try to develop a distribution function that is a function of the constant of motion (so it satisfies the collisionless Boltzmann equation, i.e. Liouville’s theorem of chapter 2). The velocity distribution will try to become isotropic, so dN 1 N = 3 3 = f (E) = f (mΦ + mv 2 ) (15) d xd p 2 where Φ is the gravitational potential which is less than zero. In true equilibrium, this f (E) should become an exponential, so N = C exp(−E/kT ). However, only stars with E < 0 are gravitationally bound in the cluster; those with E > 0 escape and fly away. This means that N = 0 for E > 0 and N ' C exp(−E/kT ) for E < 0. Stellar encounters then keep kicking stars, occasionally, to energies E > 0, and those stars evaporate from the cluster. Since the evaporated stars have larger energy than average, the rest of the cluster keeps shrinking and becoming more and more tightly bound.

C

4

18.4 Dependence on thermal equilibration on charge and mass [by Alexander Putilin ’00] The ion equilibration rate for a pure He3 plasma is derived by the same method as proton-proton equilibration rate. We start with electron-electron equilibration rate (B.T. eq. (18.27)) νee

−3/2 kT me c2 2 −3/2 ne c ln Λ 8π e2 kT √ = 3 4π0 me c2 me c2 2 π

ne σT c ln Λ √ = 2 π

(16) (17)

Replace electron charge, density, and mass with corresponding values for He3 ions: e → 2e, ne → nHe = 12 ne , me → mHe = 3mp . We get νHe

He

2 −3/2 nHe c ln Λ 8π 4e2 kT √ 3 4π0 mHe c2 mHe c2 2 π −3/2 r kT 16 me nHe σT c ln Λ √ = √ me c2 2 π 3 mp T −3/2 ln Λ n 16 He −7 −1 = √ · 5.8 × 10 s 1m−3 1K 10 3 =

(18) (19) (20)

= 0.5 × 1020 m−3 , T = 108 K. Now estimate ln Λ. 1/2 0 kT D , with λD = nHe Λ = bλmin = 4.9 × 10−5 m, and bmin = Max[b0 = 2 (2e) q 2(2e)2 ~ ~ 3kT −13 m and mHe mHe v 2 , mHe v ], where v ' mHe . We find b0 = 4.4 × 10 v =

and we have nHe =

1 2 ne

−5

4.9×10 2×10−14 m. Thus we take bmin = 4.4×10−13 m, which gives ln Λ = ln 4.4×10 −13 ' 18. Plugging it into the formula for νHe He we get

νHe He ' 500 s−1

(21)

18.11 Adiabatic indices for rapid compression of a magnetized plasma [by unknown author] (a) The amount of momentum that passes through a surface ∆A normal to the z direction per time ∆t is me vz for each electron, and only those electrons (with velocity vz ) which are in the region of volume ∆Avz ∆t pass through, so the

5

total amount is ne me hvz2 i∆A∆t. Since Tzz is this number divided by ∆A∆t, then Pek = Tzz = ne me hvz2 i. Similarly, Pe⊥ = ne me hvx2 i = ne me hvy2 i, and since hvx2 i + hvy2 i = h|v⊥ |2 i, then Pe⊥ =

1 ne me h|v⊥ |2 i. 2

(b) From Box 10.1, we see that Θ = Sxx + Syy + Szz and Σzz =

1 2 Szz − (Sxx + Syy ) . 3 3

Invert to get that Szz = 13 Θ + Σzz and Sxx + Syy = 23 Θ − Σzz , so that one sees that d`/dt dSzz 1 dΘ dΣzz 1 = = + = θ + σ jk bj bk ` dt 3 dt dt 3 and d(Sxx + Syy ) 2 dΘ dΣzz 2 dA/dt = = − = θ − σ jk bj bk . A dt 3 dt dt 3 (c) The amount of kinetic energy corresponding to motion in the z direction in the fluid element is ne A` 12 me hvk2 i. Due to energy conservation, if the element expands, doing work at rate P d(volume)/dt = ne me hvk2 iA(d`/dt), then the energy must drop accordingly, so 2

dhvk i 1 d` ne A` me = −ne me hvk2 iA , 2 dt dt so

2

2 d` 1 dhvk i =− . 2 hvk i dt ` dt

Following the same argument for the perpendicular contribution to the energy gives 2 1 dhv⊥ i 1 dA 2 ne A` me = − ne me hv⊥ i` , 2 dt 2 dt so 2 1 dhv⊥ i 1 dA =− . 2 hv⊥ i dt A dt

6

Due to particle number conservation, ne A` is a constant (equals the number of particles in the fluid element). So setting d(ne A`)/dt to zero, and dividing both sides by ne A`, yields: 1 dne 1 d` 1 dA =− − . ne dt ` dt A dt (d) Using the above results, 2

1 dne 1 dhvk i d`/dt dA/dt 5 1 dPek = + 2 = −3 − = − θ − 2σ jk bj bk , Pek dt ne dt hvk i dt ` A 3 2 1 dPe⊥ 1 dne 1 dhv⊥ d`/dt dA/dt 5 i = + 2 =− −2 = − θ + σ jk bj bk . Pe⊥ dt ne dt hv⊥ i dt ` A 3

(e) When there is no expansion along B, we can set the d` = 0. Using the results from part (d) and mass conservation (which says that d ln A = −d ln ρ) respectively, d(ln Pe⊥ ) = −2d(ln A) = 2d(ln ρ) or

∂(ln Pe⊥ ) = 2. ∂(ln ρ)

And similarly, d(ln Pek ) = −d(ln A) = d(ln ρ) or

∂(ln Pek ) = 1. ∂(ln ρ)

When there is no expansion perpendicular to B, we can set the dA = 0. Using the results from part (d) and mass conservation respectively, d(ln Pe⊥ ) = −d(ln `) = d(ln ρ) or

∂(ln Pe⊥ ) = 1. ∂(ln ρ)

And similarly, d(ln Pek ) = −3d(ln `) = 3d(ln ρ) or

∂(ln Pek ) = 3. ∂(ln ρ)

(f ) Finally, using the results of part (d), d(ln (P⊥2 Pk )) = 2d(ln P⊥ ) + d(ln Pk ) = −5d(ln A) − 5d(ln `) = 5d(ln ne ),

7

so, integrating gives P⊥2 Pk ∝ n5e . Also, d(ln P⊥ ) = −d(ln `) − 2d(ln A) = d(ln ne ) − d(ln A) = d(ln ne ) + d(ln B), because the flux AB through a circle is constant. So, integrating gives P⊥ ∝ ne B.

D

18.8 Thermoelectric transport coefficients [by Jeff Atwell] (a) Basically a temperature gradient creates an electric current and an electric field causes heat flow because the carriers of both currents (electric and heat) are electrons, which always carry both energy and charge. Suppose that initially the electrons on the left side of a room are hotter than those on the right side of the room. This means that initially the electrons on the left side are moving faster, on average. If we then let the room equilibrate, the electrons initially on the left side will penetrate faster to the right side, on average, then the right side electrons penetrate to the left side. Then we would say that both heat and charge have flowed from left to right (because electrons carry both energy and charge). Now, after the room has equilibrated, let’s suppose that we turn on an electric field which causes the electrons to accelerate to the right. This clearly will cause heat to flow to the right. (b) The distribution function f (x, v) is defined by the relation f (x, v)dxdv = Number of particles in dxdv. Recall from Chapter 2 the Boltzmann transport equation: ∂f dv ∂f + v · ∇x f + · ∇v f = . ∂t dt ∂t coll Now recall from exercise 2.13 that it is often valid to use the “collision-time approximation”: f − f0 ∂f =− , ∂t coll tD 8

where f0 is the distribution function in thermal equilibrium. We are interested in the steady state, so that ∂f /∂t = 0, and so now the Boltzmann transport equation reads dv f − f0 v · ∇x f + · ∇v f = − . dt tD For simplicity, suppose there is an electric field E in the x direction and a temperature gradient dT /dx. Then the transport equation becomes ∂f f − f0 eE ∂f + vx =− . m ∂vx ∂x tD Rewriting this, we have f = f0 − tD

eE ∂f ∂f + vx m ∂vx ∂x

.

We now assume weak fields and small temperature gradients, that is, we assume (f − f0 )/f0 1. To this approximation eE ∂f0 ∂f0 . (22) f = f0 − tD + vx m ∂vx ∂x For the Maxwell-Boltzmann distribution, f0 is a function of the energy E and the temperature T , so ∂f0 ∂f0 dT = , ∂x ∂T dx and ∂f0 dE ∂f0 ∂f0 = = mvx . ∂vx ∂E dvx ∂E If we suppose that dT /dx = 0, then equation 22 reduces to f = f0 − etD Evx

∂f0 . ∂E

The electric current density is given by Z Z ∂f0 3 Jx = evx f d3 v = −tD e2 E vx2 d v, ∂E R as vx f0 d3 v = 0 because f0 is an even function of the velocity component vx . Similarly, the heat flux is given by Z Z ∂f0 3 qx = Evx f d3 v = −tD eE Evx2 d v. ∂E We work with the Maxwell-Boltzmann distribution m 3/2 2 f0 = n e−mv /2kT . 2πkT 9

Notice that

∂f0 1 =− f0 , ∂E kT Z tD e2 E vx2 f0 d3 v, Jx = kT

so

and

Z tD eE qx = Evx2 f0 d3 v. kT Doing the integrals on mathematica using this f0 , and dropping coefficients of order unity, I find: ne2 tD E, Jx ∼ m and netD kT qx ∼ E, m which agrees with equations (18.33). Now, from (18.31): 3/2 2 2 3/2 e2 kT mc e2 kT (kT )3/2 κe ∼ , ∼ ∼ 2√ 2 2 2 σT cm ln Λ mc e cm ln Λ mc e m ln Λ where I have used (18.28) for the Thomson cross section. Then from (18.33): κe ∼

ntD e2 ne2 m2 (kT /m)3/2 (kT )3/2 ∼ ∼ 2√ , 4 m m ne ln Λ e m ln Λ

where I have used (18.21) for tD . Notice that these two κe expressions agree. (c) When E = 0, equation 22 reads f = f0 − tD vx Then we have

Z Jx =

and

Z

dT ∂f0 . dx ∂T

evx f d3 v = −tD e

dT dx

Z

vx2

∂f0 3 d v, ∂T

Z dT ∂f0 3 Evx2 d v. dx ∂T Again, doing the integrals on mathematica, I find agreement with equations (18.34). Now, from (18.31): 5/2 2 2 5/2 kc kT mc kc kT k(kT )5/2 κ∼ ∼ ∼ 4√ , 2 2 2 σT ln Λ mc e ln Λ mc e m ln Λ qx =

Evx f d3 v = −tD

where I have again used (18.28) for the Thomson cross section. Then from (18.34): ntD k 2 T nk 2 T m2 (kT /m)3/2 k(kT )5/2 √ κ∼ ∼ ∼ , m m ne4 ln Λ e4 m ln Λ 10

where I have again used (18.21) for tD . Notice that these two κ expressions agree. (d) αβ ∼ κe κ

e kT

κ kT e κe ∼ 1 ≈ 0.581, κe κ

and αT +

e kT e e enkT tD kT κe ∼ κT + β ∼ κ+β ∼ + β ∼ β + β ≈ β. e kT e kT k m

α dT (e) Look at the heat flow when Jx = κe E + α dT dx = 0, or when E = − κe dx . It follows that dT αβ dT qx = −κ − βE = − 1 − κ . dx κe κ dx

18.12 Mirror machine [by Alexander Putilin ’00] 2 (a) Since µ = mv⊥ /(2B) is conserved, then since B increases by a factor of 10 2 from center to end, then v⊥ must increase by a factor of 10 as a particle goes from center to edge. So 2 2 v⊥,final = 10v⊥,initial ,

and by conservation of energy 2 2 2 vk,final = vk,initial − 9v⊥,initial . 2 So, they escape when vk,final > 0, so only those particles released that have, 2 2 initially, vk > 9v⊥ escape. That is, only a fraction given by

Z

π/2

3 cos αdα = 1 − √ = 0.0513 10 tan −1 3

escape. (b) The distribution function dN/d|x| as a function of the pitch angle α, averaging over particles at all locations throughout the bottle, will be zero beyond tan −1 3 (because these particles would have escaped). For small values of α, dN/d|x| will be pretty flat, which corresponds to the fact that at α = 0 (the mirror points), α changes linearly with time for a particle. The distribution function will drop off rapidly as the value |α| = tan −1 3 is approached. If we just look at the middle of the bottle, then assuming the particles are released continuously in time (over the time period of many cycles for the particles to bounce from one mirror point to another), then for all times later, the distribution will remain isotropic there, except for the removal of all particles with sin α > 0.95 = sin tan −1 3. 11

(c) Since the diffusion time is independent of α, and the hole is so large in α space (from tan −1 3 to π/2, about 0.32 radians, which is (0.32)/(π/2) = 1/5 of α space), then one out of five collisions scatter a particle out of the bottle, so it leaks out (e-folds) in approximately time 5tD . (d) Wouldn’t it seem more reasonable that particles shouldn’t diffuse with diffusion time independent of α but rather sin α to account for solid angle properly? In such case, then in part (c), all occurrences of α should be replaced by sin α, and since sin α goes from 0 to 1 particles that scatter into sin α > 0.95 get lost. In which case, 1/20 of all particles scatter out of the bottle in time tD (as opposed to 1/5 as computed in part (c)), so the e-folding time is 20tD .

12

Solution for Problem Set 19-20 (compiled by Dan Grin and Nate Bode) April 16, 2009

A

19.4 Ion Acoustic Waves [by Xinkai Wu 2002] (a) The derivation of these equations is trivial, so we omit it here. (b) Write the proton density as n = n0 + δn and substitute it into the equations of part (a), keeping only terms linear in δn, u, and Φ. We find ∂u ∂δn + n0 =0 ∂t ∂z ∂u e ∂Φ =− ∂t mp ∂z ∂2Φ e n0 e = − δn − Φ . ∂z 2 0 k B Te

(1) (2) (3)

Plugging the plane-wave solution where δn, u, and Φ are all of the form ∝ exp[i(kz − ωt)] into the above linearized equations, we find −iωδn + n0 iku = 0 e −iωu = − ikΦ mp e n0 e 2 Φ . −k Φ = − δn − 0 k B Te

(4) (5) (6)

For the above algebraic equation to possess a solution, the determinant of its coefficient matrix must vanish, which gives n0 e 2 n0 e 2 2 ω 2 −k 2 − + k = 0. (7) 0 k B Te 0 mp Solving this yields the dispersion relation ω = ωpp (1 + 1/k 2 λ2D )−1/2 , with λD =

0 k B Te n0 e2

1/2

and ωpp =

n0 e2 0 mp

1

1/2

.

(8)

For long wavelengths, kλD ≪ 1, Eq. (8) gives ω ≈ ωpp kλD = k(kB Te/mp)^{1/2}, … ∫ωp² dx … ∫n dx …

From here the interstellar magnetic field can be estimated to be ⟨B∥⟩ ≈ 1.7 × 10⁻⁶ G. (11)

B

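19.5 Ion Acoustic Solitons [by Keith Matthews 2005] (a) You'll note that the coefficient (kB Te/mp)^{1/2} of u is the characteristic velocity of the system, so let's name it vc. Also define α ≡ kB Te/e. Then

$$u = v_c(\epsilon u_1 + \epsilon^2u_2 + \dots), \qquad \Phi = \alpha(\epsilon\Phi_1 + \epsilon^2\Phi_2 + \dots). \tag{12}$$

For an arbitrary field ψ,

$$\frac{\partial\psi}{\partial t} = -\frac{\sqrt2\,\epsilon^{1/2}v_c}{\lambda_D}\frac{\partial\psi}{\partial\eta} + \sqrt2\,\omega_{pp}\,\epsilon^{3/2}\frac{\partial\psi}{\partial\tau}, \qquad \frac{\partial\psi}{\partial z} = \frac{\sqrt2\,\epsilon^{1/2}}{\lambda_D}\frac{\partial\psi}{\partial\eta}. \tag{13}$$

i) From the continuity equation at leading order, ε^{3/2}, we find

$$-\frac{\partial n_1}{\partial\eta} + \frac{\partial u_1}{\partial\eta} = 0, \tag{14}$$

and at order ε^{5/2} we have

$$-\frac{\partial n_2}{\partial\eta} + \frac{\partial n_1}{\partial\tau} + \frac{\partial u_2}{\partial\eta} + \frac{\partial(n_1u_1)}{\partial\eta} = 0. \tag{15}$$

ii) To leading order, ε^{3/2}, the equation of motion produces

$$\frac{\partial u_1}{\partial\eta} - \frac{e\alpha}{m_pv_c^2}\frac{\partial\Phi_1}{\partial\eta} = 0.$$

However, eα/(mp vc²) = 1, so we have

$$\frac{\partial\Phi_1}{\partial\eta} - \frac{\partial u_1}{\partial\eta} = 0. \tag{16}$$

We note that λD ωpp = vc, so at order ε^{5/2} we have

$$-\frac{\partial u_2}{\partial\eta} + \frac{\partial u_1}{\partial\tau} + \frac12\frac{\partial(u_1^2)}{\partial\eta} = -\frac{\partial\Phi_2}{\partial\eta}. \tag{17}$$

iii) Since e n0 λD²/(2αε0) = 1/2, at orders ε and ε² Poisson's equation gives

$$n_1 = \Phi_1 \tag{18}$$

and

$$\frac{\partial^2\Phi_1}{\partial\eta^2} = -\frac12\left(n_2 - \Phi_2 - \frac12\Phi_1^2\right). \tag{19}$$

iv) From Eqs. (14), (16), and (18) we find

$$n_1 = u_1 = \Phi_1. \tag{20}$$

Taking ∂/∂η of Eq. (19) and invoking Eq. (20) gives

$$\frac{\partial^3n_1}{\partial\eta^3} = -\frac12\left(\frac{\partial n_2}{\partial\eta} - \frac{\partial\Phi_2}{\partial\eta} - \frac12\frac{\partial(n_1^2)}{\partial\eta}\right). \tag{21}$$

We can take care of the ∂n2/∂η − ∂Φ2/∂η term by adding Eqs. (15) and (17) and again invoking Eq. (20):

$$\frac{\partial n_2}{\partial\eta} - \frac{\partial\Phi_2}{\partial\eta} = 2\frac{\partial n_1}{\partial\tau} + \frac32\frac{\partial(n_1^2)}{\partial\eta}. \tag{22}$$

Substituting Eq. (22) into (21) gives us

$$\frac{\partial n_1}{\partial\tau} + n_1\frac{\partial n_1}{\partial\eta} + \frac{\partial^3n_1}{\partial\eta^3} = 0, \tag{23}$$

which is just what the doctor ordered. Since n1 = u1 = Φ1, any of the three satisfies this KdV equation.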
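As a sanity check on the KdV form ∂n1/∂τ + n1 ∂n1/∂η + ∂³n1/∂η³ = 0, one can verify numerically that it admits a sech² solitary wave. The candidate n1 = 3c sech²[(√c/2)(η − cτ)] is obtained by rescaling the standard soliton of u_t + 6uu_x + u_xxx = 0 (n1 = 6u); this is an illustrative sketch, and the residual below is a finite-difference estimate, not an exact proof.

```python
import math

def soliton(eta, tau, c=1.0):
    """Candidate solitary wave n1 = 3c sech^2[(sqrt(c)/2)(eta - c tau)]."""
    return 3.0 * c / math.cosh(0.5 * math.sqrt(c) * (eta - c * tau)) ** 2

def kdv_residual(eta, tau, c=1.0, h=1e-3):
    """Central-difference estimate of dn/dtau + n dn/deta + d^3n/deta^3 for the soliton."""
    f = lambda x, t: soliton(x, t, c)
    n_tau = (f(eta, tau + h) - f(eta, tau - h)) / (2 * h)
    n_eta = (f(eta + h, tau) - f(eta - h, tau)) / (2 * h)
    n_eta3 = (f(eta + 2 * h, tau) - 2 * f(eta + h, tau)
              + 2 * f(eta - h, tau) - f(eta - 2 * h, tau)) / (2 * h ** 3)
    return n_tau + f(eta, tau) * n_eta + n_eta3

for c in (0.5, 1.0, 2.0):
    worst = max(abs(kdv_residual(0.25 * j, 0.3, c)) for j in range(-16, 17))
    print(f"c = {c}: max |residual| = {worst:.1e}")
```

Note that the soliton speed c and its amplitude 3c are tied together: taller solitons move faster, which is the characteristic KdV behavior.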

19.9 Exploration of Modes in CMA Diagram [by Kip Thorne 2005] (a) EM waves in a cold unmagnetized plasma can propagate only when their frequency exceeds the plasma frequency ωp, as can be seen from their dispersion relation ω = (ωp² + c²k²)^{1/2}. Thus ω = ωp represents a cutoff. There is no turn-on. The unmagnetized condition places us on the x-axis, and the lower limit on the frequency places us to the left of unity. These waves correspond to the diagram in the lower left corner, with the modification that with no magnetic field the distinction between X and O dissolves, and the R and L modes have the same phase velocity, so they are represented by the same circle. (b) Denote REM: right-hand polarized electromagnetic waves; LEM: left-hand polarized electromagnetic waves; RW: right-hand polarized whistler waves; LW: left-hand polarized whistler waves. From Fig. 19.3 we can deduce the following relation between Vφ = c/ñ and ω. As we move from the lower left to the upper right of the CMA diagram, ω decreases with B and n fixed. So we sequentially encounter (LEM, REM) → (LEM) → (RW) → (LW, RW).

Figure 1: phase velocity vs. frequency for parallel propagating modes.

The boundaries corresponding to L = 0 and R = 0 are specified in the diagram and correspond to the mode changes above. (c) Denote O: ordinary mode; XL: extraordinary lower mode; XH: extraordinary hybrid mode; XU: extraordinary upper mode. From Fig. 19.5 we can extract the following relation between the phase velocity and the frequency. As ω decreases with B and n fixed, we observe the pattern (XL, O) → (XH, O) → (XH) → (XL).

The boundaries between these regions are described by ε1 = 0, ε3 = 0, and ε1 = 0, respectively. Combined with the result of part (b), we obtain the pattern shown in the CMA diagram. On the upper right of the CMA diagram one can see that RW becomes XL as we move from parallel to perpendicular propagation, and that LW ceases to exist at some θ < π/2. Similar phenomena can be read off elsewhere in the CMA diagram.

C

Figure 2: phase velocity vs. frequency for perpendicular propagating modes

19.8 Reflection of Short Waves by the Ionosphere [by Keith Matthews 2005] The brute-force approach: for adiabatic spatial variations in the index of refraction, Fermat's principle chooses rays of extremal time. This gives simple differential equations (Eq. (6.42)), which in polar coordinates we express as

$$r\text{ component:}\quad \frac{d^2r}{ds^2} - r\left(\frac{d\theta}{ds}\right)^2 + \frac{d\tilde n/dr}{\tilde n}\left[\left(\frac{dr}{ds}\right)^2 - 1\right] = 0,$$
$$\theta\text{ component:}\quad \tilde n\,r^2\frac{d\theta}{ds} = C, \text{ a constant}. \tag{24}$$

We choose polar coordinates because, as we shall see, the maximum range is of the order of the radius of the earth re; s parametrizes the path length. We are told that the electron density is exponential in altitude, ne = n0 e^{y/y0}, where y = r − re is the altitude. From the values given in the problem, y0 = 21.71 km and n0 = 10⁷ m⁻³. From Eq. (19.74), ñ² = 1 − ωpe²/ω², which we write as ñ² ≡ 1 − η ne, where η ≡ e²/(me ε0 ω²). (ηn0 = 8.04 × 10⁻⁶ and re = 6.4 × 10³ km.) Note that dñ²/dr = (ñ² − 1)/y0.

I define ψ0 at the point of transmission as the angle between the vertical and the outgoing ray. As a result, the initial conditions are cos ψ0 = (dr/ds)0 and, because sin ψ0 = r0(dθ/ds)0, C = ñ0 r0 sin ψ0. Here r0 = re, and I set θ0 = 0.

Numerically integrate with s as the independent variable until r(sf) = re. (I used Mathematica.) The results give r and θ as functions of s, from which one computes the range R = re θ(sf). By searching around I found the maximum range to be 6677 km for ψ0 = 90°. It is vital that this ray not rise above the maximum altitude of ymax = 200 km. I found that ytop = 25.7 km.

20.1 Two-fluid Equation of Motion [by Xinkai Wu 2002] We'll first write things in components and at the end convert back to vector/tensor notation. The Vlasov equation reads

$$\frac{\partial f_s}{\partial t} + v^j\frac{\partial f_s}{\partial x^j} + a^j\frac{\partial f_s}{\partial v^j} = 0. \tag{25}$$

We'll make use of the following definitions:

$$n_s = \int f_s\,dV_v, \qquad n_su_s^i = \int f_sv^i\,dV_v,$$
$$P_s^{ik} = m_s\int f_s(v^i - u_s^i)(v^k - u_s^k)\,dV_v = m_s\int f_sv^iv^k\,dV_v - m_sn_su_s^iu_s^k. \tag{26}$$

Multiplying the Vlasov equation by v^k, integrating over velocity space, integrating by parts at various places, using the fact that ∂a^j/∂v^j = 0, and using the explicit expression Eq. (20.3) for the acceleration due to the external EM field, we get

$$\frac{\partial(n_su_s^k)}{\partial t} + \frac{\partial}{\partial x^j}\left(\frac{P_s^{jk}}{m_s} + n_su_s^ju_s^k\right) - \frac{n_sq_s}{m_s}\left[E^k + (\mathbf u_s\times\mathbf B)^k\right] = 0. \tag{27}$$

Now using the continuity equation,

$$\frac{\partial n_s}{\partial t} = -\frac{\partial}{\partial x^j}(n_su_s^j), \tag{28}$$

one immediately sees that Eq. (27) reduces to Eq. (20.11) after converting to vector notation.

D

20.4 Landau Contour Deduced Using Laplace Transforms [by Xinkai Wu 2000] (a) This part is nothing but the definition of the Laplace transform. (b) The z-dependence is e^{ikz}, giving ∂/∂z → ik. Also, integration by parts gives

$$\int_0^\infty dt\,e^{-pt}\frac{\partial F_{s1}}{\partial t} = -F_{s1}(v,0) + p\int_0^\infty dt\,e^{-pt}F_{s1}(v,t). \tag{29}$$

Using these facts, Laplace transforming the Vlasov equation gives

$$0 = -F_{s1}(v,0) + p\tilde F_{s1}(v,p) + ikv\tilde F_{s1}(v,p) + \frac{q_s}{m_s}F_{s0}'(v)\tilde E(p), \tag{30}$$

where s = p, e. Laplace transforming ∇·E = ρ/ε0 gives us a second equation:

$$ik\tilde E(p) = \sum_s\frac{q_s}{\epsilon_0}\int_{-\infty}^{\infty}dv\left[\frac{F_{s0}(v)}{p} + \tilde F_{s1}(v,p)\right] = \sum_s\frac{q_s}{\epsilon_0}\int_{-\infty}^{\infty}dv\,\tilde F_{s1}(v,p), \tag{31}$$

where to get the last equality we've used the fact that the unperturbed charge density is zero, i.e. the contribution from Fs0(v) vanishes. Combining these two equations we easily get Eq. (20.41). (c) Setting ip = ω and plugging Eq. (20.41) into Eq. (20.42), we immediately get Eq. (20.26) without that overall minus sign.
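The combination step can be spelled out. Solving Eq. (30) for the transformed perturbation and substituting it into Eq. (31) gives the following; this is a sketch of the algebra only, and its arrangement may differ from the way Eq. (20.41) is written in the text.

$$\tilde F_{s1}(v,p) = \frac{F_{s1}(v,0) - (q_s/m_s)F_{s0}'(v)\tilde E(p)}{p + ikv},$$

$$\tilde E(p) = \frac{\displaystyle\sum_s\frac{q_s}{\epsilon_0}\int_{-\infty}^{\infty}dv\,\frac{F_{s1}(v,0)}{p + ikv}}{\displaystyle ik + \sum_s\frac{q_s^2}{\epsilon_0m_s}\int_{-\infty}^{\infty}dv\,\frac{F_{s0}'(v)}{p + ikv}}.$$

The zeros of the denominator, continued analytically in p, are what set the normal-mode frequencies.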

E

20.5 Ion Acoustic Dispersion Relation [by Xinkai Wu 2000] Recall the definitions of the Debye length and plasma frequency for species s:

$$\omega_{ps} = \left(\frac{ne^2}{\epsilon_0m_s}\right)^{1/2}, \qquad \lambda_{Ds} = \left(\frac{\epsilon_0k_BT_s}{ne^2}\right)^{1/2}, \tag{32}$$

and the Maxwellian distribution is

$$F_s(v) = n\left(\frac{m_s}{2\pi k_BT_s}\right)^{1/2}\exp\left(-\frac{m_sv^2}{2k_BT_s}\right). \tag{33}$$

Now consider the integral

$$I_s\!\left(\frac{\omega_r}{k}\right) \equiv P\!\int\frac{F_{s0}'(v)}{v - \omega_r/k}\,dv. \tag{34}$$

For ωr/k ≫ (kB Ts/ms)^{1/2}, Is ≈ nk²/ωr² (see Eq. (21.37)), and this is the formula we are going to use for Ip. For ωr/k …
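The fast-wave limit Is ≈ nk²/ωr² can be checked numerically. This is a sketch assuming a 1D Maxwellian with thermal speed vth = (kB Ts/ms)^{1/2} and a phase speed u = ωr/k = 20 vth, far enough outside the integration range that the principal value is an ordinary integral; the expansion 1/(v − u) = −(1/u)Σ(v/u)^m predicts Is u²/n = 1 + 3vth²/u² + 15vth⁴/u⁴ + ….

```python
import math

def maxwellian_prime(v, n=1.0, vth=1.0):
    """dF/dv for the 1D Maxwellian, with vth = sqrt(kB T / m)."""
    F = n / math.sqrt(2.0 * math.pi * vth ** 2) * math.exp(-v * v / (2.0 * vth ** 2))
    return -v / vth ** 2 * F

def I_fast_wave(u, n=1.0, vth=1.0, cutoff=10.0, N=20001):
    """Trapezoid estimate of P∫ F'(v)/(v - u) dv over |v| < cutoff*vth.

    Valid only when u lies well outside the integration range (u >> vth), so the
    integrand has no pole there and the principal value is an ordinary integral.
    """
    lo, hi = -cutoff * vth, cutoff * vth
    h = (hi - lo) / (N - 1)
    vals = [maxwellian_prime(lo + i * h, n, vth) / (lo + i * h - u) for i in range(N)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

u = 20.0                                     # omega_r / k in units of vth
print(I_fast_wave(u) * u ** 2, 1 + 3 / u ** 2 + 15 / u ** 4)
```

The leading term n/u² is exactly nk²/ωr²; the corrections are small thermal terms.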

20.8 Range of Unstable Wave Numbers [by Jeff Atwell] For instability, we need a ζ with ζi > 0 such that k² = Z(ζ) is real and positive. The corresponding ω = kζ specifies the mode. Values of ζ with ζi > 0 correspond to points in the interior of the closed curve in the Z plane (see Fig. 20.4). So the intersection of the positive Zr axis with the interior of the closed curve gives the range of wave numbers that we are looking for. The shape of this closed curve depends on the distribution function F(v). We are told our distribution function has two maxima, v1 and v2, and one minimum, vmin. This means that the closed curve in the Z plane crosses the Zr axis three times, twice moving downward and once moving upward. (The curve in Fig. 20.4 happens to be an example of this situation.) Suppose v1 < v2. Then, for the closed curve shown in Fig. 20.4, we may write the range of wave numbers that have at least one unstable mode as √Z(v2) < k < √Z(vmin) (where Z is evaluated using Eq. (20.47)). For different shapes of the closed curve (i.e. for different combinations of signs of Z(v1), Z(vmin), and Z(v2)) the wave-number range will be a bit different, but similar. The steps in going from Eq. (20.47) to Eq. (20.49) work for maxima of F(v) as well as for minima. This means that in the case shown in Fig. 20.4 we can instead write the minimum wave number with an unstable mode as, for example,

$$k_{\min}^2 = Z(v_2) = \frac{e^2}{m_e\epsilon_0}\int_{-\infty}^{+\infty}\frac{F(v) - F(v_2)}{(v - v_2)^2}\,dv.$$
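The recipe can be tried on a concrete distribution. This sketch assumes a hypothetical double-humped F(v) (98% of the electrons in a Maxwellian bulk at v = 0 and 2% in a beam at v = 5, velocities in units of the bulk spread, unit normalization), locates its extrema numerically, and evaluates Z(ve) = ∫[F(v) − F(ve)]/(v − ve)² dv in units where e²n/(me ε0) = 1, so k comes out in units of ωpe divided by the velocity spread.

```python
import math

def gaussian(v, mean, sigma):
    return math.exp(-0.5 * ((v - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def F(v):
    """Hypothetical double-humped distribution with unit normalization:
    a bulk population at v = 0 plus a weak beam at v = 5."""
    return 0.98 * gaussian(v, 0.0, 1.0) + 0.02 * gaussian(v, 5.0, 1.0)

def dF(v, d=1e-5):
    return (F(v + d) - F(v - d)) / (2.0 * d)

def extrema(lo=-5.0, hi=10.0, step=0.01):
    """Zeros of F'(v), found by scanning for sign changes and then bisecting."""
    found = []
    for i in range(int(round((hi - lo) / step))):
        a, b = lo + i * step, lo + (i + 1) * step
        if dF(a) * dF(b) < 0.0:
            for _ in range(60):
                m = 0.5 * (a + b)
                if dF(a) * dF(m) <= 0.0:
                    b = m
                else:
                    a = m
            found.append(0.5 * (a + b))
    return found

def Z_at(ve, half_width=30.0, h=0.005):
    """Z(ve) = ∫ [F(v) - F(ve)]/(v - ve)^2 dv, in units where e^2 n/(m_e eps0) = 1.

    Midpoint nodes are placed symmetrically about ve, so the removable singularity
    at v = ve is never sampled; beyond |v - ve| = half_width we assume F ~ 0 and add
    the analytic tail -2 F(ve)/half_width.
    """
    Fe = F(ve)
    total = 0.0
    for j in range(int(round(half_width / h))):
        x = (j + 0.5) * h
        total += (F(ve + x) - Fe + F(ve - x) - Fe) / (x * x)
    return total * h - 2.0 * Fe / half_width

v1, vmin, v2 = extrema()
k_low, k_high = math.sqrt(Z_at(v2)), math.sqrt(Z_at(vmin))
print(f"v1 = {v1:.3f}, vmin = {vmin:.3f}, v2 = {v2:.3f}")
print(f"unstable band: {k_low:.3f} < k < {k_high:.3f}")
```

With a weak beam, Z(v2) is positive and the band √Z(v2) < k < √Z(vmin) has a nonzero lower edge, as in the case of Fig. 20.4; a stronger or narrower beam can make Z(v2) negative, in which case the band extends down to k = 0.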


Solution for Problem Set 21 (compiled by Nate Bode) April 24, 2009

A

21.1 Non-resonant Particle Energy in Wave [by Keith Matthews adapted from Chris Hirata] The rate of change of the non-resonant electron kinetic energy is given by

$$\frac{dU_e}{dt} = \frac{d}{dt}\int\frac12m_ev^2F_0\,dv.$$

Insert Eq. (21.20):

$$\frac{dF_0}{dt} = \frac{\partial F_0}{\partial t} + v\frac{\partial F_0}{\partial z} = \frac{\partial}{\partial v}\left(D\frac{\partial F_0}{\partial v}\right).$$

We note that D_non-res is independent of v, integrate by parts twice, and then simply integrate to get

$$\frac{dU_e}{dt} = m_eD\int F_0\,dv = m_enD.$$

Inserting Eq. (21.23) for D = D_non-res,

$$\frac{dU_e}{dt} = \frac12\int_0^\infty2\omega_i\,\varepsilon_k\,dk,$$

which is the desired Eq. (21.24).

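21.2 Energy Conservation [by Alexander Putilin '99] The electron kinetic energy density and energy flux are given by

$$U^e = \int\frac12m_ev^2F_0\,dv, \tag{1}$$
$$S_z^e = \int\frac12m_ev^3F_0\,dv, \tag{2}$$

and we have

$$\frac{\partial U^e}{\partial t} + \frac{\partial S_z^e}{\partial z} = \int\frac12m_ev^2\left(\frac{\partial F_0}{\partial t} + v\frac{\partial F_0}{\partial z}\right)dv \tag{3}$$
$$= \int\frac12m_ev^2\frac{\partial}{\partial v}\left(D\frac{\partial F_0}{\partial v}\right)dv \tag{4}$$
$$= -\int D\frac{\partial F_0}{\partial v}\,m_ev\,dv. \tag{5}$$

Now using Eq. (21.22) for the resonant electrons,

$$D = \frac{e^2\pi}{\epsilon_0m_e^2}\int dk\,\mathcal E_k\,\delta(\omega_r - kv), \tag{6}$$

we find

$$\frac{\partial U^e}{\partial t} + \frac{\partial S_z^e}{\partial z} = -\frac{e^2\pi}{\epsilon_0m_e}\int dk\,\mathcal E_k\,\frac{\omega_r}{k^2}\,F_0'\!\left(\frac{\omega_r}{k}\right), \tag{7}$$

which becomes, upon using Eq. (21.12), F0′(ωr/k) = (2ε0 me k²/(πe² ωr)) ωi,

$$= -\int dk\,2\omega_i\mathcal E_k \tag{8}$$
$$= -\frac{\partial}{\partial t}\int dk\,\mathcal E_k - \frac{\partial}{\partial z}\int dk\,\frac{\partial\omega_r}{\partial k}\mathcal E_k \qquad\text{using Eq. (21.18)} \tag{9}$$
$$= -\frac{\partial U^w}{\partial t} - \frac{\partial S_z^w}{\partial z}, \tag{10}$$

where we've used the facts that ∫dk 𝓔k is the wave's energy density U^w and ∫dk 𝓔k ∂ωr/∂k is the wave's energy flux S_z^w. Thus we finally have

$$\frac{\partial U^e}{\partial t} + \frac{\partial S_z^e}{\partial z} = -\frac{\partial U^w}{\partial t} - \frac{\partial S_z^w}{\partial z}, \tag{11}$$

which is the energy conservation law.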

21.3 Cerenkov Power in Electrostatic Waves [by Alexander Putilin '99] The emission rate of plasmons is given by (21.43):

$$W = \frac{\pi e^2\omega_r}{\epsilon_0k^2\hbar}\,\delta(\omega_r - \mathbf k\cdot\mathbf v). \tag{12}$$

Each plasmon has energy ħωr, so the radiated power is

$$P = \int\frac{d^3k}{(2\pi)^3}\,W\hbar\omega_r = \int d^3k\,\frac{e^2\omega_r^2}{8\pi^2\epsilon_0k^2}\,\delta(\omega_r - \mathbf k\cdot\mathbf v). \tag{13}$$

The integration is over the region k < kmax (outside this region the waves are strongly Landau damped). A good estimate of kmax is the inverse Debye length, kmax ∼ 1/λD (see the discussion at the end of Sec. 21.3.5). Since kλD < 1, we can approximate ωr(k) by a constant ωp. Choosing v to point along the z-axis, we have

$$P = \frac{e^2\omega_p^2}{8\pi^2\epsilon_0}\int d^3k\,\frac{1}{k^2}\,\delta(\omega_p - k_zv). \tag{14}$$

…

… For r > R we get:

$$z(r) = \int_0^R\frac{dr'}{\sqrt{\dfrac{R^3}{2Mr'^2} - 1}} + \int_R^r\frac{dr'}{\sqrt{\dfrac{r'}{2M} - 1}} = \sqrt{\frac{R^3}{2M}}\left[1 - \sqrt{1 - \frac{2M}{R}}\right] + \sqrt{8M(r - 2M)} - \sqrt{8M(R - 2M)}. \tag{5}$$

So outside the star we get something like z ∼ √r, or r ∼ z², which looks like a paraboloid. (b) Inside the star we have z ∼ √(c − r²), which is a sphere.

(c) We need to check dz/dr at r = R. Inside the star, after simplifying, we have

$$\frac{dz}{dr} = \sqrt{\frac{2M}{R - 2M}}.$$

Outside the star we have

$$\frac{dz}{dr} = \sqrt{\frac{2M}{R - 2M}}.$$

They agree, so there is no kink. (d) Let's compute the proper radial distance from the center to the surface:

$$\ell = \int_0^R\frac{dr}{\sqrt{1 - 2m(r)/r}} = \int_0^R\frac{dr}{\sqrt{1 - 2Mr^2/R^3}} = R\sqrt{\frac{R}{2M}}\,\arcsin\sqrt{\frac{2M}{R}}.$$

If we use R ≫ M to expand the arcsin, we get

$$\ell = R + \frac13M.$$

So R is less than the distance ℓ by an amount of order M. The difference turns out to be about 1.5 millimeters for the Earth and about 1 kilometer for a massive neutron star.
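The size of ℓ − R can be checked directly from the arcsin formula, without the small-M expansion. This sketch assumes standard values for G, c, the Earth's mass and radius, and a hypothetical 1.4-solar-mass, 10 km neutron star.

```python
import math

G, C = 6.674e-11, 2.998e8

def radial_excess(mass_kg, radius_m):
    """Return (ell - R, M/3) in meters for a uniform-density star.

    ell = R sqrt(R/2M) arcsin(sqrt(2M/R)), with M = G m / c^2 the mass in meters.
    """
    M = G * mass_kg / C ** 2
    x = math.sqrt(2.0 * M / radius_m)
    ell = radius_m * math.asin(x) / x
    return ell - radius_m, M / 3.0

earth = radial_excess(5.972e24, 6.371e6)
ns = radial_excess(1.4 * 1.989e30, 1.0e4)
print(f"Earth: ell - R = {earth[0] * 1e3:.2f} mm (M/3 = {earth[1] * 1e3:.2f} mm)")
print(f"neutron star: ell - R = {ns[0]:.0f} m (M/3 = {ns[1]:.0f} m)")
```

For the Earth the exact value agrees with M/3 to high accuracy; for the neutron star 2M/R is not small, so the exact result (several hundred meters, of order a kilometer) exceeds M/3 noticeably.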

24.5 Gravitational redshift [by Alexei Dvoretskii] (a) At the origin of the atom's inertial reference frame the metric is flat: $\vec e_{\hat\alpha}\cdot\vec e_{\hat\beta} = \eta_{\hat\alpha\hat\beta}$. This is automatically satisfied if the basis vectors coincide with those of the orthonormal basis quoted in the problem. See Exercise 24.1 for a discussion. (b) In the inertial frame the atom's 4-velocity is $\vec u = \vec e_{\hat0}$, so the energy of the photon as measured by the atom is

$$E = h\nu_{\rm em} = -\vec p\cdot\vec u = -\vec p\cdot\vec e_{\hat0} = -p_{\hat0} = p^{\hat0}. \tag{6}$$

(c) In the orthonormal basis

$$p_{\hat0} = \vec p\cdot\vec e_{\hat0} = -h\nu_{\rm em}, \tag{7}$$

and since $\vec e_t = \sqrt{1 - 2M/r}\,\vec e_{\hat0}$,

$$p_t = \vec p\cdot\vec e_t = \sqrt{1 - 2M/R}\;\vec p\cdot\vec e_{\hat0} = -\sqrt{1 - 2M/R}\;h\nu_{\rm em}. \tag{8}$$

(d) The metric components do not depend on t and therefore, as was shown in part (a) of Exercise 23.4, p_t is constant. (e) The observer who measures the frequency of the photon projects the photon's 4-momentum onto her timelike basis vector:

$$\nu_{\rm rec} = -\vec p\cdot\vec e_{\hat0}/h. \tag{9}$$

Since the observer is far away, the metric there is flat and the coordinate and orthonormal bases are the same. Therefore

$$\nu_{\rm rec} = -\vec p\cdot\vec e_t/h = -p_t/h. \tag{10}$$

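(f) From here it is straightforward to calculate that the redshift is

$$RS = \frac{\lambda_{\rm rec} - \lambda_{\rm em}}{\lambda_{\rm em}} = \frac{1}{\sqrt{1 - 2M/R}} - 1 \approx \frac{M}{R}. \tag{11}$$

(g) For the earth,

$$RS = \frac{4.4\times10^{-3}\ {\rm m}}{6.4\times10^{6}\ {\rm m}} \approx 6.9\times10^{-10}. \tag{12}$$

For the sun,

$$RS = \frac{1.5\times10^{3}\ {\rm m}}{7.0\times10^{8}\ {\rm m}} \approx 2.1\times10^{-6}. \tag{13}$$

For a neutron star,

$$\frac{2M}{R} = 2\times\frac{1.4\times1.5\times10^{3}\ {\rm m}}{1.0\times10^{4}\ {\rm m}} \approx 0.4. \tag{14}$$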
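The three cases can be compared with the exact expression (11) rather than its weak-field limit. This sketch takes the masses (in meters, G = c = 1) and radii exactly as quoted above.

```python
import math

def redshift(M_geom_m, R_m):
    """Exact gravitational redshift: 1/sqrt(1 - 2M/R) - 1, with M in meters (G = c = 1)."""
    return 1.0 / math.sqrt(1.0 - 2.0 * M_geom_m / R_m) - 1.0

cases = {"earth": (4.4e-3, 6.4e6), "sun": (1.5e3, 7.0e8), "neutron star": (1.4 * 1.5e3, 1.0e4)}
for name, (M, R) in cases.items():
    print(f"{name:13s}: exact RS = {redshift(M, R):.3e}, weak-field M/R = {M / R:.3e}")
```

For the earth and sun the two agree closely, while for the neutron star (2M/R ≈ 0.4) the exact redshift is about 0.31, well above M/R ≈ 0.21, so the weak-field approximation no longer applies.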

25.7 Implosion of the Surface of a Zero-Pressure Star [by Xinkai Wu 2000] (a) There is no t-dependence in the metric components, thus $\vec u\cdot\partial/\partial t = u_t$ is conserved along the particle's world line:

$$u_t = g_{tt}u^t = -\left(1 - \frac{2M}{R}\right)u^t = -\left(1 - \frac{2M}{R_0}\right)u^t(R = R_0). \tag{15}$$

Also we have the normalization of the 4-velocity, −1 = u·u = g_tt(u^t)² + g_rr(u^r)², with u^r(R = R0) = 0, giving us u^t(R = R0) = 1/√(1 − 2M/R0). Thus we find

$$u_t = -\sqrt{1 - \frac{2M}{R_0}} \tag{16}$$

and

$$u^t = \frac{u_t}{g_{tt}} = \frac{\sqrt{1 - 2M/R_0}}{1 - 2M/R}. \tag{17}$$

(b) Plugging the u^t obtained above into the 4-velocity normalization (as written out in part (a)), we easily get dR/dτ = u^r = −[−2M/R0 + 2M/R]^{1/2}. In Newtonian gravity, energy conservation for a freely falling particle (a particle on the star's surface) says

$$\frac12\left(\frac{dR}{dt}\right)^2 - \frac{M}{R} = {\rm const} = -\frac{M}{R_0}, \tag{18}$$

so

$$\frac{dR}{dt} = -\left[-\frac{2M}{R_0} + \frac{2M}{R}\right]^{1/2}, \tag{19}$$

which agrees with the GR result. (c) Integrating Eq. (24.67) with the initial condition τ = 0 at R = R0, one finds

$$\tau = \sqrt{\frac{R_0}{2M}}\left[\sqrt{R(R_0 - R)} + R_0\arctan\sqrt{\frac{R_0}{R} - 1}\right]. \tag{20}$$

Setting R = 2M and expanding this expression to leading order in R0 for R0 ≫ M, we find

$$\tau \approx \frac{\pi}{2}\left(\frac{R_0^3}{2M}\right)^{1/2}. \tag{21}$$

To find the orbital period at leading order for large R0 we can use Newtonian mechanics, which gives

$$\tau_{\rm orbit} = \frac{2\pi R_0}{v} = \frac{2\pi R_0}{\sqrt{M/R_0}} = 2\pi\left(\frac{R_0^3}{M}\right)^{1/2}, \tag{22}$$

and we see that τ/τ_orbit = 1/(4√2).
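Equation (20) and the ratio τ/τ_orbit can be checked numerically. This sketch (G = c = 1, with illustrative values M = 1 and R0 = 10⁶) integrates dτ = −dR/√(2M/R − 2M/R0) after the substitution R = R0 cos²x, which removes the inverse-square-root singularity at R = R0, and compares with the closed form.

```python
import math

def tau_closed_form(R, R0, M):
    """Proper time for the surface to fall from R0 to R (G = c = 1)."""
    return math.sqrt(R0 / (2 * M)) * (math.sqrt(R * (R0 - R)) + R0 * math.atan(math.sqrt(R0 / R - 1)))

def tau_numeric(R, R0, M, n=20000):
    """Midpoint-rule integral of dtau = -dR / sqrt(2M/R - 2M/R0).

    The substitution R = R0 cos^2(x) removes the inverse-square-root singularity at
    R = R0 and turns the integrand into 2 sqrt(R0^3/2M) cos^2(x).
    """
    X = math.acos(math.sqrt(R / R0))
    h = X / n
    total = sum(math.cos((j + 0.5) * h) ** 2 for j in range(n))
    return 2.0 * math.sqrt(R0 ** 3 / (2 * M)) * total * h

M, R0 = 1.0, 1.0e6
tau = tau_closed_form(2 * M, R0, M)
tau_orbit = 2 * math.pi * math.sqrt(R0 ** 3 / M)
print(tau / tau_numeric(2 * M, R0, M), tau / tau_orbit, 1 / (4 * math.sqrt(2)))
```

The first printed ratio should be essentially 1, and the second should match 1/(4√2) ≈ 0.177 for R0 ≫ M.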

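(d) In Eddington-Finkelstein coordinates, by the same argument as given in (a), we have the conserved quantity

$$u_{\tilde t} = g_{\tilde t\tilde t}u^{\tilde t} + g_{\tilde tr}u^r = -\left(1 - \frac{2M}{R}\right)u^{\tilde t} + \frac{2M}{R}u^r. \tag{23}$$

The 4-velocity normalization now reads

$$-1 = \vec u\cdot\vec u = g_{\tilde t\tilde t}(u^{\tilde t})^2 + g_{rr}(u^r)^2 + 2g_{\tilde tr}u^{\tilde t}u^r = -\left(1 - \frac{2M}{R}\right)(u^{\tilde t})^2 + \left(1 + \frac{2M}{R}\right)(u^r)^2 + \frac{4M}{R}u^{\tilde t}u^r. \tag{24}$$

Evaluating the above two equations at R = R0 and noting that u^r(R = R0) = 0, we obtain the value of the conserved quantity, $u_{\tilde t} = -\sqrt{1 - 2M/R_0}$. Eq. (23) then yields

$$-\sqrt{1 - \frac{2M}{R_0}} = -\left(1 - \frac{2M}{R}\right)u^{\tilde t} + \frac{2M}{R}u^r. \tag{25}$$

Using Eq. (25) to express $u^{\tilde t}$ in terms of u^r and substituting the resulting expression into Eq. (24), we readily get

$$\frac{dR}{d\tau} = u^r = -\left[-\frac{2M}{R_0} + \frac{2M}{R}\right]^{1/2}, \qquad \frac{d\tilde t}{d\tau} = u^{\tilde t} = \frac{\sqrt{1 - 2M/R_0} - \dfrac{2M}{R}\sqrt{\dfrac{2M}{R} - \dfrac{2M}{R_0}}}{1 - 2M/R}. \tag{26}$$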

Since the expression for u^r is the same as in part (b), τ(R) is given by the same solution obtained in part (c). [This is expected: R is the same in both coordinate systems (2πR = circumference around the star) and τ is the same (proper time).] In particular, the proper time τ for the surface to go from R0 to 2M is the same as in (c). We could write out $d\tilde t/dR = u^{\tilde t}/u^r$ and then integrate it to find $\tilde t(R)$; the integral is doable, but the answer is long and not very illuminating. So we use the following analysis. The E-F coordinate time $\tilde t$ for the surface to go from R0 to 2M is given by

$$\tilde t = \int_{R=R_0}^{R=2M}\frac{d\tilde t}{d\tau}\,d\tau,$$

with

$$\frac{d\tilde t}{d\tau}(R = R_0) = \frac{1}{\sqrt{1 - 2M/R_0}} \approx 1 \quad\text{when } R_0 \gg 2M,$$
$$\frac{d\tilde t}{d\tau}(R = 2M) = \sqrt{1 - \frac{2M}{R_0}} + \frac{1}{2\sqrt{1 - 2M/R_0}} \approx \frac32 \quad\text{when } R_0 \gg 2M.$$

Thus the E-F coordinate time $\tilde t$ for the surface to go from R0 to 2M is finite and of the same order as the proper time given by (24.57). Note that the expression for $d\tilde t/d\tau$ in (24.67) is valid also inside the gravitational radius because, as seen above, it is finite at R = 2M. By Taylor expansion it's easy to show that as R → 0,

$$R \propto \left[2\left(\frac{2M}{R_0^3}\right)^{1/2}\tau - \pi\right]^{2/3}, \qquad \frac{d\tilde t}{d\tau} \propto R^{-1/2} \propto \left[2\left(\frac{2M}{R_0^3}\right)^{1/2}\tau - \pi\right]^{-1/3}.$$

Thus the integral $\int^{R=0}(d\tilde t/d\tau)\,d\tau$ converges, which means the E-F coordinate time $\tilde t$ to reach R = 0 is finite and of order ½(R0³/2M)^{1/2}. The proper time is, of course, of the same order.

(e) $d\tilde t/dR = u^{\tilde t}/u^r$ and $dt/dR = u^t/u^r$, and we've worked out all these 4-velocity components in the previous parts. It is then straightforward to check that $d\tilde t/dR$ is always negative (in particular, $d\tilde t/dR \to -1$ as R → 0), while

dt/dR < 0 when R > 2M, dt/dR → −∞ when R → 2M, dt/dR > 0 when R < 2M,

which verifies that the world lines in E-F coordinates and Schwarzschild coordinates are given by Fig. 24.6 (a) and (b), respectively.
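The limiting values of the 4-velocity components quoted in parts (d) and (e) can be checked numerically. This sketch (G = c = 1, illustrative M = 1 and R0 = 10⁴) evaluates Eq. (26) and the Schwarzschild u^t of Eq. (17); the 0/0 at exactly R = 2M is avoided by evaluating just outside it.

```python
import math

def u_tilde_t(R, R0, M):
    """dt~/dtau in Eddington-Finkelstein coordinates, G = c = 1 (0/0 at exactly R = 2M is not handled)."""
    q, b = 2 * M / R, 2 * M / R0
    return (math.sqrt(1 - b) - q * math.sqrt(q - b)) / (1 - q)

def u_r(R, R0, M):
    return -math.sqrt(2 * M / R - 2 * M / R0)

def u_t_schw(R, R0, M):
    """Schwarzschild u^t = sqrt(1 - 2M/R0) / (1 - 2M/R)."""
    return math.sqrt(1 - 2 * M / R0) / (1 - 2 * M / R)

M, R0 = 1.0, 1.0e4
a = math.sqrt(1 - 2 * M / R0)
print("dt~/dtau at R0:   ", u_tilde_t(R0, R0, M), "vs", 1 / a)
print("dt~/dtau near 2M: ", u_tilde_t(2 * M * (1 + 1e-6), R0, M), "vs", a + 1 / (2 * a))
print("dt~/dR near R = 0:", u_tilde_t(1e-8, R0, M) / u_r(1e-8, R0, M))
print("dt/dR < 0 at 3M:", u_t_schw(3 * M, R0, M) / u_r(3 * M, R0, M) < 0,
      " dt/dR > 0 at 1.5M:", u_t_schw(1.5 * M, R0, M) / u_r(1.5 * M, R0, M) > 0)
```

The E-F ratio stays finite through R = 2M, while the Schwarzschild dt/dR changes sign there through an infinite value, which is the behavior sketched in Fig. 24.6.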

25.8 Gore at the singularity [by Alexei Dvoretskii 99] (a) See Fig. 24.6 of the text. For r < 2M let's continue to work in the Schwarzschild metric. Its advantage is that it is by now quite familiar and has a simple form. (The angular coordinates are assumed to be fixed.)

$$ds^2 = -\frac{dr^2}{2M/r - 1} + \left(\frac{2M}{r} - 1\right)dt^2. \tag{27}$$

Thus the t and r coordinates have "switched places" and now have a somewhat counterintuitive meaning, r being timelike and t being spacelike. However, the light cones are still given by ds² = 0, and so

$$\frac{dt}{dr} = \pm\frac{1}{2M/r - 1}. \tag{28}$$

The geodesics of the matter molecules must lie inside these light cones, and as they approach the singularity the light cones become narrower and narrower, so that at r = 0, dt/dr = 0. (b) The curve to which the world line asymptotes, (t, θ, φ) = const, is a timelike geodesic since it is (1) timelike, (2) radial, and (3) has p_t = 0 = const. (c) It's straightforward to compute $g_{\hat\mu\hat\nu}$ using Eq. (24.68) and the g_μν in Schwarzschild coordinates. One finds $g_{\hat0\hat0} = -1$, $g_{\hat1\hat1} = g_{\hat2\hat2} = g_{\hat3\hat3} = 1$; thus Eq. (24.68) gives the basis vectors of the infalling observer's local Lorentz frame.

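The components of Riemann in this basis are related to those given in Box 24.1 by a linear transformation and are

$$R_{\hat0\hat1\hat0\hat1} = -R_{\hat2\hat3\hat2\hat3} = -\frac{2M}{r^3}, \qquad R_{\hat2\hat1\hat2\hat1} = R_{\hat3\hat1\hat3\hat1} = -R_{\hat0\hat3\hat0\hat3} = -R_{\hat0\hat2\hat0\hat2} = -\frac{M}{r^3}. \tag{29}$$

The geodesic deviation equation is Eq. (23.42),

$$\frac{d^2\xi^{\hat j}}{d\tau^2} = -R^{\hat j}{}_{\hat0\hat k\hat0}\,\xi^{\hat k}, \tag{30}$$

and in our case

$$\frac{d^2\xi^{\hat1}}{d\tau^2} = -R^{\hat1}{}_{\hat0\hat1\hat0}\,\xi^{\hat1} = \frac{2M}{r^3}\xi^{\hat1}, \qquad \frac{d^2\xi^{\hat2}}{d\tau^2} = -R^{\hat2}{}_{\hat0\hat2\hat0}\,\xi^{\hat2} = -\frac{M}{r^3}\xi^{\hat2}, \qquad \frac{d^2\xi^{\hat3}}{d\tau^2} = -R^{\hat3}{}_{\hat0\hat3\hat0}\,\xi^{\hat3} = -\frac{M}{r^3}\xi^{\hat3}; \tag{31}$$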

thus we see that the radial direction gets stretched and the tangential ones get squeezed. When r → 0 the right-hand sides of the above expressions diverge, giving an infinite stretching/squeezing force. (e) The acceleration, measured in earthly g⊕'s, that the observer experiences can be estimated as

$$\frac{a}{g_\oplus} \approx \frac{M/M_\oplus}{(r/r_\oplus)^3}\,\frac{h}{r_\oplus}. \tag{32}$$

Substituting

$$M = 5\times10^9M_\odot = (5\times10^9)(3\times10^5)M_\oplus = 1.5\times10^{15}M_\oplus, \tag{33}$$
$$r_\oplus = 0.01\,r_\odot = 0.01\cdot2.3\times10^5\,(2M_\odot) = 0.01\cdot2.3\times10^5\,\frac{2M}{5\times10^9} = 5\times10^{-7}\,(2M), \tag{34}$$
$$h = 2\ {\rm meters} = 3\times10^{-7}\,r_\oplus, \tag{35}$$

we get

$$\frac{a}{g_\oplus} = 6\times10^{-11}\left(\frac{2M}{r}\right)^3. \tag{36}$$

A typical observer could withstand only a few g⊕'s, so she would normally die at r† ≈ 2 × 10⁻⁴ (2M) = 10⁶ (2M⊙) ≈ 10 s, in other words deep inside the black hole. Now, in this region the interval is

$$ds^2 = -d\tau^2 = -\frac{dr^2}{2M/r - 1} + \left(\frac{2M}{r} - 1\right)dt^2 \approx -\frac{r}{2M}dr^2. \tag{37}$$

Integrating from r† to 0 gives

$$\tau^\dagger = \frac{2r^\dagger}{3}\sqrt{\frac{r^\dagger}{2M}} = 0.01\,r^\dagger = 0.1\ {\rm s}. \tag{38}$$

(One can also use the expression τ(R) worked out in part (c) of Exercise 24.7 to find this τ†, which is given by τ(R = 0) − τ(R = r†), expanded to O[(R0)⁰] for large R0. This gives the same answer as obtained above.) Thus the observer is torn apart by tidal forces about 0.1 second before hitting the singularity.
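The estimate can be redone without rounding at each step. This sketch assumes standard SI values for G, c, and the solar and Earth masses and Earth radius, takes "a few g⊕" to mean 3 g⊕ (an assumption), solves Eq. (32) for r†, and then evaluates Eq. (38).

```python
import math

G, C = 6.674e-11, 2.998e8
M_SUN, M_EARTH, R_EARTH = 1.989e30, 5.972e24, 6.371e6

def death_radius_and_time(M_bh_suns=5e9, h=2.0, a_max_g=3.0):
    """Radius where the tidal estimate reaches a_max_g earth gravities, and the proper
    time left before r = 0, using ds^2 ~ -(r/2M) dr^2 deep inside the hole."""
    M_ratio = M_bh_suns * M_SUN / M_EARTH
    # a/g = (M/M_earth) (r_earth/r)^3 (h/r_earth), solved for r
    r_dag = (M_ratio * h * R_EARTH ** 2 / a_max_g) ** (1.0 / 3.0)
    two_M = 2 * G * M_bh_suns * M_SUN / C ** 2                       # 2M in meters
    tau_dag = (2.0 * r_dag / 3.0) * math.sqrt(r_dag / two_M) / C     # seconds
    return r_dag, r_dag / two_M, tau_dag

r_dag, frac, tau_dag = death_radius_and_time()
print(f"r_dagger = {r_dag:.2e} m = {frac:.1e} (2M) = {r_dag / C:.1f} light-seconds")
print(f"tau_dagger = {tau_dag:.2f} s")
```

The result (r† of order 2 × 10⁻⁴ (2M), about ten light-seconds, and τ† of order 0.1 s) matches the rounded estimates above; because r† ∝ a_max^{−1/3}, the conclusion is insensitive to exactly how many g's the observer can tolerate.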

24.9 Wormholes [by Alexei Dvoretskii 99] For the line element in isotropic coordinates,

$$ds^2 = \left(1 + \frac{M}{2\bar r}\right)^4(d\bar r^2 + \bar r^2d\phi^2), \tag{39}$$

substitute ρ = (M/2)²/r̄ to get

$$ds^2 = \left(1 + \frac{M}{2\rho}\right)^4(d\rho^2 + \rho^2d\phi^2), \tag{40}$$

which has precisely the same form as the first equation but with r̄ replaced by ρ. Thus there are two asymptotically flat spaces, at r̄ → ∞ and at ρ → ∞, with the intermediate region connecting them together. Topologically, this wormhole could look like Figure 31.5 of MTW. (Both top and bottom halves have the form given in Eq. (24.51).)
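The inversion symmetry can be confirmed numerically by transforming both metric coefficients. This sketch uses ρ = (M/2)²/r̄, so dr̄/dρ = −(M/2)²/ρ², and checks that (1 + M/2r̄)⁴ (dr̄/dρ)² and (1 + M/2r̄)⁴ r̄² equal (1 + M/2ρ)⁴ and (1 + M/2ρ)⁴ ρ²; the function names are illustrative.

```python
def conformal_factor(x, M):
    """(1 + M/2x)^4, the isotropic-coordinate factor."""
    return (1.0 + M / (2.0 * x)) ** 4

def inversion_ratios(rbar, M=1.0):
    """Map rbar -> rho = (M/2)^2 / rbar and compare metric coefficients.

    Returns the coefficients of d rho^2 and d phi^2 obtained by transforming the rbar
    form, divided by the expected (1 + M/2rho)^4 and (1 + M/2rho)^4 rho^2; both should be 1.
    """
    rho = (M / 2.0) ** 2 / rbar
    drbar_drho = -(M / 2.0) ** 2 / rho ** 2
    radial = conformal_factor(rbar, M) * drbar_drho ** 2
    angular = conformal_factor(rbar, M) * rbar ** 2
    return radial / conformal_factor(rho, M), angular / (conformal_factor(rho, M) * rho ** 2)

for rbar in (0.1, 0.5, 2.0, 10.0):
    print(rbar, inversion_ratios(rbar))
```

The map exchanges r̄ < M/2 with r̄ > M/2 and leaves r̄ = M/2 fixed; that fixed circle is the throat joining the two asymptotically flat regions.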

24.11 Penrose process, Hawking radiation, and black-hole thermodynamics [by Kip Thorne and Xinkai Wu 2002] (a) Consider the event P inside the ergosphere at which the plunging particle is created. At P, ∂/∂t is spacelike. Choose a local Lorentz frame in which it lies in the plane spanned by $\vec e_{\hat0}$ and $\vec e_{\hat1}$. Then by performing a boost in the $\vec e_{\hat1}$ direction we can make $\vec e_{\hat1}$ point in the same direction as ∂/∂t, so $\partial/\partial t = K\vec e_{\hat1}$ for some K > 0. We are free to choose the direction of $\vec p_{\rm plunge}$, so long as it is timelike. Any choice for which the plunging particle moves in the positive $\vec e_{\hat1}$ direction, so that $p_{\hat1} = \vec p\cdot\vec e_{\hat1} > 0$, will lead to $p_t = \vec p\cdot\partial/\partial t = \vec p\cdot(K\vec e_{\hat1}) = Kp_{\hat1} > 0$ and hence E = −p_t < 0.

(b) In the Kerr coordinate basis, ξ^t = 1, ξ^r = ξ^θ = 0, ξ^φ = ΩH. The horizon's generators are the world lines given in Eq. (25.77), r = rH, θ = const, φ = ΩH t + const, whose tangent vectors are u^t(1, 0, 0, ΩH). So we see that ξ is tangent to the horizon's generators. On the horizon Δ = 0, which means α² = 0, and the squared norm of any vector field χ is then given by

$$\vec\chi^{\,2} = \frac{\rho^2}{\Delta}(\chi^r)^2 + \rho^2(\chi^\theta)^2 + \varpi^2(\chi^\phi - \Omega_H\chi^t)^2. \tag{41}$$

Each of the three terms on the right-hand side of this equation is ≥ 0, so the norm of χ vanishes (a null vector field) only when χ^r = 0, χ^θ = 0, and χ^φ = ΩH χ^t, which is the case for ξ; for all other vectors this norm is positive, i.e. those vectors are all spacelike. (c) Since the Kerr metric is independent of t and φ (time-translation symmetry and axisymmetry), we have the two conserved quantities E ≡ −p_t and L ≡ p_φ, which are interpreted as the energy and angular momentum of the particle, respectively. We have ΔM = E and ΔJH = L, hence

$$\Delta M - \Omega_H\Delta J_H = E - \Omega_HL = -p_t - \Omega_Hp_\phi. \tag{42}$$

On the other hand, $-\vec p\cdot\vec\xi_H = -p_t - \Omega_Hp_\phi$, using the expression for ξ given in part (b). Thus Eq. (24.89) is true. (d) Choose a Lorentz frame in which the timelike vector A points in the time direction, so $\vec A = A^0\vec e_0$ with A⁰ > 0, and the null vector K points in the $\vec e_1$ direction, so $\vec K = K^0(\vec e_0 + \vec e_1)$ with K⁰ > 0; thus $\vec A\cdot\vec K = -A^0K^0 < 0$. Hence $-\vec p_{\rm plunge}\cdot\vec\xi_H$ is positive ($\vec p_{\rm plunge}$ is timelike and future directed, and $\vec\xi_H$ is null and future directed). (e) Combining the results of parts (c) and (d), we have ΔM − ΩH ΔJH > 0, which implies ΔJH < ΔM/ΩH. Thus ΔM < 0 implies ΔJH < 0. (f) ΩH, JH, gH, and AH are all functions of (M, a), with the explicit expressions given in the text. Thus one can express the right-hand side of Eq. (25.90) in terms of ΔM and Δa in a straightforward manner. The Δa terms cancel, and the ΔM terms sum to give ΔM, which is the left-hand side of Eq. (24.90). (g) $T_H\Delta S_H = \dfrac{\hbar g_H}{2\pi k_B}\Delta S_H = \dfrac{g_H}{8\pi}\Delta A_H$ implies $S_H = \dfrac{k_BA_H}{4\hbar}$. Restoring G and c, this becomes $S_H = \dfrac{k_B}{4}\dfrac{A_H}{l_P^2}$, where $l_P = (G\hbar/c^3)^{1/2}$ is the Planck length.
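The numbers in part (h) can be reproduced with SI constants. This sketch specializes to a nonrotating hole (a = 0, an assumption; the general Kerr expressions are in the text), for which gH = c⁴/(4GM), T_H = ħc³/(8πGMkB), and A_H = 4π(2GM/c²)².

```python
import math

HBAR, C, G, K_B = 1.054571817e-34, 2.99792458e8, 6.67430e-11, 1.380649e-23
M_SUN = 1.989e30

def schwarzschild_thermo(M_kg):
    """Hawking temperature (K) and entropy (in units of k_B) of a nonrotating hole.

    Uses T_H = hbar c^3 / (8 pi G M k_B) and S_H = k_B A c^3 / (4 G hbar),
    with horizon area A = 4 pi (2GM/c^2)^2.
    """
    T_H = HBAR * C ** 3 / (8 * math.pi * G * M_kg * K_B)
    A_H = 4 * math.pi * (2 * G * M_kg / C ** 2) ** 2
    S_over_kB = A_H * C ** 3 / (4 * G * HBAR)
    return T_H, S_over_kB

T, S = schwarzschild_thermo(10 * M_SUN)
print(f"T_H = {T:.1e} K, S_H = {S:.1e} k_B")
```

Since T_H ∝ 1/M and S_H ∝ M², heavier holes are colder and carry far more entropy.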

(h) Recall that M⊙ ≈ 1.5 km ≈ 0.5 × 10⁻⁵ s. Thus we find for a ten-solar-mass black hole T_H ≈ 6 × 10⁻⁹ K, and its entropy S_H ≈ 1 × 10⁷⁹ kB.

25.12 Slices of simultaneity in Schwarzschild spacetime [by Alexei Dvoretskii 99] The Schwarzschild spacetime can be sliced by surfaces t = const. Since the Schwarzschild coordinates are orthogonal to each other, the world lines of observers with constant spatial coordinates and varying t are orthogonal to those surfaces, and therefore for those observers the surfaces are slices of simultaneity. If t is kept constant as the black hole horizon is crossed, the form of the line element changes and t becomes spacelike. Therefore, inside the horizon, slices of t = const can no longer be viewed as slices of simultaneity. For Eddington-Finkelstein coordinates no such problem exists, and the slices of simultaneity cover the interior as well as the exterior of the black hole. (See Fig. 24.4 in the text.)

Solutions for Problem Set for Ch. 25 (compiled by Nate Bode) May 20, 2009

A-D

25.4 Behavior of h+ and h× under rotations and boosts [by Xinkai Wu '02] (a) Quantities with a tilde denote those in the new basis, and those without a tilde those in the old basis. As suggested in Eq. 25.51, we perform a change of basis: $\tilde{\vec e}_x + i\tilde{\vec e}_y = (\vec e_x + i\vec e_y)e^{i\psi}$. Then

$$\tilde{\vec e}_x = \vec e_x\cos\psi - \vec e_y\sin\psi, \qquad \tilde{\vec e}_y = \vec e_y\cos\psi + \vec e_x\sin\psi. \tag{1}$$

Plugging the above transformation matrix into Eq. 25.41, we find the components of Riemann in the new basis:

$$R_{\tilde x0\tilde x0} = \cos^2\psi\,R_{x0x0} + \sin^2\psi\,R_{y0y0} - 2\cos\psi\sin\psi\,R_{x0y0} \tag{2}$$
$$= \cos2\psi\left(-\frac12\ddot h_+\right) - \sin2\psi\left(-\frac12\ddot h_\times\right); \tag{3}$$

on the other hand $R_{\tilde x0\tilde x0} = -\frac12\ddot{\tilde h}_+$, and thus we get

$$\tilde h_+ = (\cos2\psi)h_+ - (\sin2\psi)h_\times. \tag{4}$$

Similarly, by looking at $R_{\tilde x0\tilde y0}$, we find

$$\tilde h_\times = (\cos2\psi)h_\times + (\sin2\psi)h_+. \tag{5}$$

Translated into complex numbers, this is just

$$\tilde h_+ + i\tilde h_\times = (h_+ + ih_\times)e^{2i\psi}. \tag{6}$$
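The spin-2 behavior in Eq. (6) can be checked by projecting the transverse-traceless tensor [[h+, h×], [h×, −h+]] onto the rotated basis vectors of Eq. (1). This is an illustrative sketch; the function name and test values are arbitrary.

```python
import cmath
import math

def rotate_tt(h_plus, h_cross, psi):
    """Return (h~+, h~x) = (h(e~x, e~x), h(e~x, e~y)) for the basis rotation of Eq. (1)."""
    c, s = math.cos(psi), math.sin(psi)
    ex_new = (c, -s)   # e~x = ex cos(psi) - ey sin(psi), components on (ex, ey)
    ey_new = (s, c)    # e~y = ey cos(psi) + ex sin(psi)
    h = [[h_plus, h_cross], [h_cross, -h_plus]]

    def component(a, b):
        return sum(a[i] * h[i][j] * b[j] for i in range(2) for j in range(2))

    return component(ex_new, ex_new), component(ex_new, ey_new)

hp, hx, psi = 0.3, -0.7, 0.4
new = rotate_tt(hp, hx, psi)
print(complex(*new), (hp + 1j * hx) * cmath.exp(2j * psi))
```

Because the phase is e^{2iψ}, a rotation by ψ = π returns the same h+ and h×, the hallmark of a spin-2 field.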

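(b) The desired boost is a boost along the z direction, which gives

$$\tilde{\vec e}_0 = \vec e_0\cosh\beta + \vec e_z\sinh\beta, \qquad \tilde{\vec e}_z = \vec e_0\sinh\beta + \vec e_z\cosh\beta, \tag{7}$$

with $\vec e_x$, $\vec e_y$ unchanged. The corresponding transformation for the coordinates is

$$\tilde t = t\cosh\beta - z\sinh\beta, \qquad \tilde z = -t\sinh\beta + z\cosh\beta, \tag{8}$$

which gives

$$\tilde t - \tilde z = (\cosh\beta + \sinh\beta)(t - z), \tag{9}$$

with x, y unchanged. Looking at the components of Riemann in the new basis using the above transformation matrix, we find

$$R_{x\tilde0x\tilde0} = (\cosh\beta - \sinh\beta)^2R_{x0x0} = -\frac12(\cosh\beta - \sinh\beta)^2\,\ddot h_+(t - z). \tag{10}$$

On the other hand,

$$R_{x\tilde0x\tilde0} = -\frac12\ddot{\tilde h}_+(\tilde t - \tilde z) = -\frac12\,\frac{1}{(\cosh\beta + \sinh\beta)^2}\,\frac{d^2\tilde h_+}{d(t - z)^2}.$$

Since (coshβ − sinhβ)(coshβ + sinhβ) = 1, equating this with Eq. (10) we find

$$\frac{d^2\tilde h_+}{d(t - z)^2} = \frac{d^2h_+}{d(t - z)^2} \;\Rightarrow\; \tilde h_+ = h_+. \tag{11}$$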

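By looking at $R_{x\tilde0y\tilde0}$ one can show the invariance of h× in a similar manner.

25.5 Energy-momentum conservation in geometric optics limit [by Alexander Putilin '00] Starting from Eq. 25.58,

$$T^{GW}_{\alpha\beta} = \frac{1}{16\pi}\langle h_{+,\alpha}h_{+,\beta} + h_{\times,\alpha}h_{\times,\beta}\rangle. \tag{12}$$

In the geometric optics limit,

$$h_+ = \frac{Q_+(\tau_r;\theta,\phi)}{r}, \qquad h_\times = \frac{Q_\times(\tau_r;\theta,\phi)}{r}. \tag{13}$$

The wave vector $\vec k \equiv -\vec\nabla\tau_r$ is null, and we have $\nabla_{\vec k}\vec k = 0$ and $\nabla_{\vec k}r = \frac12(\vec\nabla\cdot\vec k)r$. To show $T^{GW}_{\alpha\beta}{}^{|\beta} = 0$, it's sufficient to prove that $h_{+|\beta}{}^{\beta} = h_{\times|\beta}{}^{\beta} = 0$. We'll follow the pattern used in Sec. 25.3.6:

$$h_{+,\beta} = h_{+,\tau_r}\frac{\partial\tau_r}{\partial x^\beta} + h_{+,\beta'} = -h_{+,\tau_r}k_\beta + h_{+,\beta'}, \tag{14}$$

where a prime denotes derivatives at fixed τr (i.e. $h_{+,\beta'} = h_{+,\theta}\,\partial\theta/\partial x^\beta + h_{+,\phi}\,\partial\phi/\partial x^\beta + h_{+,r}\,\partial r/\partial x^\beta$). Differentiating a second time we get

$$h_{+|\beta}{}^{\beta} = -h_{+,\tau_r,\beta}k^\beta - h_{+,\tau_r}k^\beta{}_{|\beta} + h_{+|\beta'}{}^{\beta} \tag{15}$$
$$= h_{+,\tau_r\tau_r}k^\beta k_\beta - h_{+,\tau_r|\beta'}k^\beta - (\vec\nabla\cdot\vec k)h_{+,\tau_r} - h_{+|\beta',\tau_r}k^\beta + h_{+|\beta'}{}^{\beta'} \tag{16}$$
$$= \vec k^2h_{+,\tau_r\tau_r} - 2k^\beta h_{+,\tau_r|\beta'} - (\vec\nabla\cdot\vec k)h_{+,\tau_r} + h_{+|\beta'}{}^{\beta'}. \tag{17}$$

Since k² = 0, the first term vanishes. Notice that in the geometric optics limit h+ is a rapidly varying function of τr and a slowly varying function of θ, φ, r. This means that derivatives of h+ with respect to τr are much larger than derivatives at fixed τr, and we can neglect the last term:

$$h_{+|\beta}{}^{\beta} \approx -2k^\beta h_{+,\tau_r|\beta'} - (\vec\nabla\cdot\vec k)h_{+,\tau_r}. \tag{18}$$

Here $k^\beta h_{+,\tau_r|\beta'}$ is the directional derivative along k (at fixed τr). Since θ and φ are constant along a ray, only the 1/r factor can vary:

$$k^\beta h_{+,\tau_r|\beta'} = k^\beta\frac{\partial Q_+}{\partial\tau_r}\left(\frac1r\right)_{,\beta'} = -\frac{1}{r^2}\frac{\partial Q_+}{\partial\tau_r}\nabla_{\vec k}r = -\frac1r(\nabla_{\vec k}r)\,h_{+,\tau_r}, \tag{19}$$

and, using $\nabla_{\vec k}r = \frac12(\vec\nabla\cdot\vec k)r$ (20),

$$k^\beta h_{+,\tau_r|\beta'} = -\frac12(\vec\nabla\cdot\vec k)\,h_{+,\tau_r}. \tag{21}$$

Finally,

$$h_{+|\beta}{}^{\beta} = (\vec\nabla\cdot\vec k)h_{+,\tau_r} - (\vec\nabla\cdot\vec k)h_{+,\tau_r} = 0. \tag{22}$$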

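The equality $h_{\times|\beta}{}^{\beta} = 0$ can be derived in exactly the same way.

25.6 Transformation to TT gauge [by Alexander Putilin '99 and Keith Matthews '05] (a) Consider the gauge transformation generated by ξα:

$$h_{\alpha\beta} \to h'_{\alpha\beta} = h_{\alpha\beta} - \xi_{\alpha,\beta} - \xi_{\beta,\alpha}, \tag{23}$$

or

$$\bar h_{\alpha\beta} \to \bar h'_{\alpha\beta} = \bar h_{\alpha\beta} - \xi_{\alpha,\beta} - \xi_{\beta,\alpha} + \eta_{\alpha\beta}\xi^\mu{}_{,\mu}. \tag{24}$$

Then

$$\bar h'_{\alpha\beta}{}^{,\beta} = \bar h_{\alpha\beta}{}^{,\beta} - \xi_{\alpha,\beta}{}^{\beta} - \xi_{\beta,\alpha}{}^{\beta} + \xi^\mu{}_{,\mu\alpha} \tag{25}$$
$$= \bar h_{\alpha\beta}{}^{,\beta} - \xi_{\alpha,\beta}{}^{\beta} \tag{26}$$
$$= -\xi_{\alpha,\beta}{}^{\beta}, \tag{27}$$

where to get the last expression we've used the fact that $\bar h_{\alpha\beta}{}^{,\beta} = 0$, since $\bar h_{\alpha\beta}$ is in Lorentz gauge. If we want $\bar h'_{\alpha\beta}$ to remain in Lorentz gauge, we see that the generators ξα must satisfy the wave equation, $\xi_{\alpha,\beta}{}^{\beta} = 0$. The general solution of this equation can be written as a sum of plane waves:

$$\xi_\alpha(t,\mathbf x) = \int\frac{d^3k}{(2\pi)^3}\left[A_\alpha(\mathbf k)e^{i(\mathbf k\cdot\mathbf x - \omega t)} + B_\alpha(\mathbf k)e^{i(\mathbf k\cdot\mathbf x + \omega t)}\right]. \tag{28}$$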

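The first term describes waves propagating in the k direction and the second one in the −k direction. In our case we need only the first term (since we consider a gravitational wave propagating in one particular direction). So

$$\xi_\alpha(t,\mathbf x) = \int\frac{d^3k}{(2\pi)^3}A_\alpha(\mathbf k)e^{i(\mathbf k\cdot\mathbf x - \omega t)}. \tag{29}$$

At time t = 0, $\xi_\alpha(0,\mathbf x) = \int\frac{d^3k}{(2\pi)^3}A_\alpha(\mathbf k)e^{i\mathbf k\cdot\mathbf x}$, or $A_\alpha(\mathbf k) = \int d^3x\,\xi_\alpha(0,\mathbf x)e^{-i\mathbf k\cdot\mathbf x}$. We see that the ξα(x) are completely determined by four functions of three spatial coordinates, ξα(0, x). These functions give initial conditions for the wave equation at t = 0. (b) Consider a plane gravitational wave propagating in the z-direction:

$$\bar h_{\alpha\beta} = \bar h_{\alpha\beta}(t - z) = \bar h_{\alpha\beta}(\tau), \qquad \tau \equiv t - z. \tag{30}$$

$\bar h_{\alpha\beta}$ is in Lorentz gauge, i.e.

$$\bar h_{\alpha\beta}{}^{,\beta} = \bar h_{\alpha t}{}^{,t} + \bar h_{\alpha z}{}^{,z} = -\bar h_{\alpha t,t} + \bar h_{\alpha z,z} \tag{31}$$
$$= -\bar h'_{\alpha t} - \bar h'_{\alpha z} = 0, \tag{32}$$

where the prime denotes derivatives with respect to τ. Integrating: $\bar h_{\alpha z} = -\bar h_{\alpha t} + {\rm const}$. The constant is irrelevant and we can set it to zero, thus $\bar h_{\alpha z} = -\bar h_{\alpha t}$. These four gauge conditions reduce the number of independent components of $\bar h_{\alpha\beta}$ from 10 to 6: $\bar h_{tt}, \bar h_{tx}, \bar h_{ty}, \bar h_{xx}, \bar h_{xy}, \bar h_{yy}$. Now make an additional gauge transformation with

$$\xi_{\alpha,\beta}{}^{\beta} = 0, \qquad \xi_\alpha = \xi_\alpha(\tau) = \xi_\alpha(t - z), \tag{33}$$
$$\bar h^{\rm new}_{\alpha\beta} = \bar h_{\alpha\beta} - \xi_{\alpha,\beta} - \xi_{\beta,\alpha} + \eta_{\alpha\beta}\xi^\mu{}_{,\mu}. \tag{34}$$

We note that $\xi^\mu{}_{,\mu} = -\xi_{t,t} + \xi_{z,z} = -\xi'_t - \xi'_z$. We want to choose ξα so that $\bar h^{\rm new}_{\alpha\beta}$ satisfies the additional constraints $\bar h^{\rm new}_{tt} = \bar h^{\rm new}_{tx} = \bar h^{\rm new}_{ty} = 0$ and $\bar h^{\rm new}_{xx} + \bar h^{\rm new}_{yy} = 0$:

$$\bar h^{\rm new}_{tt} = \bar h_{tt} - 2\xi_{t,t} + (\xi'_t + \xi'_z) = \bar h_{tt} + \xi'_z - \xi'_t, \tag{35}$$
$$\bar h^{\rm new}_{tx} = \bar h_{tx} - \xi_{t,x} - \xi_{x,t} = \bar h_{tx} - \xi'_x, \tag{36}$$
$$\bar h^{\rm new}_{ty} = \bar h_{ty} - \xi_{t,y} - \xi_{y,t} = \bar h_{ty} - \xi'_y, \tag{37}$$
$$\bar h^{\rm new}_{xx} = \bar h_{xx} - 2\xi_{x,x} - \xi'_t - \xi'_z = \bar h_{xx} - \xi'_t - \xi'_z, \tag{38}$$
$$\bar h^{\rm new}_{yy} = \bar h_{yy} - \xi'_t - \xi'_z, \tag{39}$$
$$\bar h^{\rm new}_{xy} = \bar h_{xy}. \tag{40}$$

This gives the system of equations

$$\xi'_x = \bar h_{tx}, \tag{41}$$
$$\xi'_y = \bar h_{ty}, \tag{42}$$
$$\xi'_t = \frac{\bar h_{tt} + \frac12(\bar h_{xx} + \bar h_{yy})}{2}, \tag{43}$$
$$\xi'_z = \frac{-\bar h_{tt} + \frac12(\bar h_{xx} + \bar h_{yy})}{2}. \tag{44}$$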

These equations have unique solutions (up to additive constants) given by simple integrations. (c) We apply Eqs. 25.94 and 25.95, using $z^k = e_{\hat z}^k$ for n^k. Then $P_{jl}P_{km}h_{lm} = h_{jk} - z_jh_{kz} - z_kh_{jz} + z_jz_kh_{zz}$ and $P_{jk}P^{lm}h_{lm} = (\delta_{jk} - z_jz_k)(h_{xx} + h_{yy})$. Here are the results:

$$h^{TT}_{xx} = h_{xx} - 0 - 0 + 0 - \tfrac12(h_{xx} + h_{yy}) = \tfrac12(h_{xx} - h_{yy}),$$
$$h^{TT}_{yy} = h_{yy} - 0 - 0 + 0 - \tfrac12(h_{xx} + h_{yy}) = -\tfrac12(h_{xx} - h_{yy}),$$
$$h^{TT}_{zz} = h_{zz} - h_{zz} - h_{zz} + h_{zz} - 0 = 0,$$
$$h^{TT}_{xy} = h_{xy} - \tfrac12(0) = h_{xy},$$
$$h^{TT}_{xz} = h_{xz} - 0 - h_{xz} + 0 - \tfrac12(0) = 0,$$
$$h^{TT}_{yz} = h_{yz} - h_{yz} = 0.$$

We still have $h^{TT}_{\mu\nu}{}^{,\nu} = 0$, so $h^{TT}_{tt} = -h^{TT}_{tz} = h^{TT}_{zz} = 0$, $h^{TT}_{xt} = -h^{TT}_{xz} = 0$, and $h^{TT}_{yt} = -h^{TT}_{yz} = 0$. Therefore

$$h^{TT}_{\mu\nu} = \begin{pmatrix}0 & 0 & 0 & 0\\ 0 & \frac12(h_{xx} - h_{yy}) & h_{xy} & 0\\ 0 & h_{xy} & -\frac12(h_{xx} - h_{yy}) & 0\\ 0 & 0 & 0 & 0\end{pmatrix}.$$

So we find that $h^{TT}_{xx} = -h^{TT}_{yy} = h_+$, $h^{TT}_{xy} = h^{TT}_{yx} = h_\times$, and ${\rm tr}(h^{TT}) = 0$, as they should.
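The projection used in part (c), $h^{TT}_{jk} = P_{jl}P_{km}h_{lm} - \frac12P_{jk}P_{lm}h_{lm}$ with $P_{jk} = \delta_{jk} - n_jn_k$, can be sketched in plain Python for an arbitrary unit propagation direction n. The function name and the sample tensor are illustrative.

```python
def tt_project(h, n):
    """Transverse-traceless part of a symmetric 3x3 tensor h for unit direction n:
    h^TT_jk = P_jl P_km h_lm - (1/2) P_jk (P_lm h_lm), with P_jk = delta_jk - n_j n_k."""
    P = [[(1.0 if j == k else 0.0) - n[j] * n[k] for k in range(3)] for j in range(3)]
    PhP = [[sum(P[j][l] * h[l][m] * P[m][k] for l in range(3) for m in range(3))
            for k in range(3)] for j in range(3)]
    trace = sum(P[l][m] * h[l][m] for l in range(3) for m in range(3))
    return [[PhP[j][k] - 0.5 * P[j][k] * trace for k in range(3)] for j in range(3)]

h = [[1.0, 0.4, 0.3], [0.4, -0.2, 0.7], [0.3, 0.7, 2.0]]
for row in tt_project(h, (0.0, 0.0, 1.0)):      # propagation along z
    print(row)
```

For n along z this reproduces the table above, with h^TT_xx = ½(h_xx − h_yy) = 0.6, h^TT_yy = −0.6, h^TT_xy = h_xy = 0.4, and a vanishing z row and column. For any other n the result is still transverse (n_j h^TT_jk = 0) and traceless, because P n = 0 and P² = P.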

25.9 Energy removed by gravitational radiation reaction [by Keith Matthews ’05] 1 dM =− dt 5

Z

dxj ρ dt

∂ 5 Ilm l m xx ∂t5

2 5 d x = − Ijl 5 ,j

5

3

Z ρ

dx(j l) 3 x d x dt

(45)

5

Where I 5 indicates ∂∂tI5 and it comes out of the integral because it is not a function of ~x. To evaluate the integral on the right we consider Z Z d dx(j l) 3 d(r2 ) 3 ∂ 1 1 x d x − δ jl ρ d x Ijk = Ijk = Ijk = 2 ρ (46) ∂t dt dt 3 dt Because δ ij xi xj = r2 and δ ij δ ij = 3, as applied below, we find that the second integral above doesn’t contribute. Z Z 1 1 i j 2 ij ij (47) I I = ρ(~x)ρ(~x0 )(xi xj − δ ij r2 )(x0 x0 − δ ij r0 )d3 x d3 x0 3 3 Z Z 1 2 2 2 i j = ρ(~x)ρ(~x0 )(xi xj x0 x0 − r2 r0 + δ ij δ ij r2 r0 )d3 x d3 x0 (48) 3 9 Z Z 1 i j = ρ(~x)(xi xj − δ ij r2 )d3 x ρ(~x0 )x0 x0 d3 x0 (49) 3 So

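$$\frac{dM}{dt} = -\frac15\, I^{(5)}_{jl}\, I^{(1)}_{jl}. \qquad (50)$$

Averaging over several wave periods and integrating by parts twice (the boundary terms are total time derivatives, whose averages are negligible) gives

$$\frac{d\langle M\rangle}{dt} = -\frac15\left\langle I^{(3)}_{jl}\, I^{(3)}_{jl}\right\rangle, \qquad (51)$$

which is the desired result.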
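Equation (51) is the quadrupole formula in units with $G = c = 1$. As an illustration that is not part of the original solution, the sketch below assumes a circular binary with reduced mass $\mu$, separation $a$, and orbital angular frequency $\Omega$; from $I_{jk} = \mu(x_jx_k - \tfrac13\delta_{jk}r^2)$ with relative position $a(\cos\Omega t, \sin\Omega t, 0)$, its time-varying quadrupole components are $I_{xx} = -I_{yy} = \tfrac12\mu a^2\cos 2\Omega t$ and $I_{xy} = I_{yx} = \tfrac12\mu a^2\sin 2\Omega t$. The code averages both $-\tfrac15 I^{(5)}_{jl}I^{(1)}_{jl}$ (Eq. 50) and $-\tfrac15 I^{(3)}_{jl}I^{(3)}_{jl}$ (Eq. 51) over one wave period and compares them with $-\tfrac{32}{5}\mu^2a^4\Omega^6$, the value Eq. (51) gives for this source. The numerical values of $\mu$, $a$, $\Omega$ are arbitrary.

```python
import math

MU, SEP, OMEGA = 0.3, 2.0, 0.7   # arbitrary reduced mass, separation, orbital frequency
AMP = 0.5 * MU * SEP**2          # amplitude of the oscillating quadrupole components
W = 2.0 * OMEGA                  # the quadrupole oscillates at twice the orbital frequency

# Each nonzero component is amp * cos(W t + phase); sin(x) = cos(x - pi/2).
COMPONENTS = {
    "xx": (AMP, 0.0),
    "yy": (-AMP, 0.0),
    "xy": (AMP, -math.pi / 2),
    "yx": (AMP, -math.pi / 2),
}


def d_n(n, t, amp, phase):
    """n-th time derivative of amp * cos(W t + phase)."""
    return amp * W**n * math.cos(W * t + phase + n * math.pi / 2)


def contraction(n1, n2, t):
    """I^(n1)_jl I^(n2)_jl summed over all components at time t."""
    return sum(d_n(n1, t, amp, ph) * d_n(n2, t, amp, ph) for amp, ph in COMPONENTS.values())


def period_average(f, samples=256):
    """Average of f over one period 2*pi/W (rectangle rule, exact for low harmonics)."""
    period = 2.0 * math.pi / W
    return sum(f(period * i / samples) for i in range(samples)) / samples


power_50 = -0.2 * period_average(lambda t: contraction(5, 1, t))
power_51 = -0.2 * period_average(lambda t: contraction(3, 3, t))
closed_form = -32.0 / 5.0 * MU**2 * SEP**4 * OMEGA**6

print(power_50, power_51, closed_form)
```

Evaluating the derivatives analytically, rather than by finite differences, keeps the comparison free of discretization error, so any disagreement would point to an algebra mistake rather than numerics.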

25.10 Propagation of waves through an expanding universe [by Alexander Putilin ’00]

$$ds^2 = b^2\left[-d\eta^2 + d\chi^2 + \chi^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right], \qquad b = b_0\eta^2. \qquad (52)$$

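(a) We could prove that curves of constant $\theta$, $\phi$, and $\eta - \chi$ satisfy the geodesic equation by explicitly calculating the connection coefficients, but it is easier to use symmetry. Spherical symmetry implies that a radial curve $\eta = \eta(\zeta)$, $\chi = \chi(\zeta)$, $\theta, \phi = {\rm const}$ must be a geodesic for some parameter $\zeta$. Since the geodesic is null, $-(d\eta/d\zeta)^2 + (d\chi/d\zeta)^2 = 0$, so for an outgoing ray $d\eta/d\zeta = d\chi/d\zeta$, and hence $\eta - \chi = {\rm const}$ along the geodesic.

(b) Symmetry also helps here. Because $\vec e_{\hat\theta}\cdot\vec k = 0$ everywhere and $\vec k$ is geodesic, $0 = \nabla_{\vec k}(\vec e_{\hat\theta}\cdot\vec k) = (\nabla_{\vec k}\vec e_{\hat\theta})\cdot\vec k$. The vectors orthogonal to the null vector $\vec k$ are spanned by $\vec k$, $\vec e_{\hat\theta}$, and $\vec e_{\hat\phi}$, and spherical symmetry rules out a component along $\vec e_{\hat\phi}$. So $\nabla_{\vec k}\vec e_{\hat\theta} = a\,\vec e_{\hat\theta} + b\,\vec k$ (the scalars $a$ and $b$ here are unrelated to the scale factor $b$). Moreover $\vec k = k^{\hat\eta}\vec e_{\hat\eta} + k^{\hat\chi}\vec e_{\hat\chi} = k^{\hat\eta}(\vec e_{\hat\eta} + \vec e_{\hat\chi})$, since $\vec k^2 = 0$ implies $k^{\hat\eta} = k^{\hat\chi}$ for an outgoing ray. But $a = \vec e_{\hat\theta}\cdot\nabla_{\vec k}\vec e_{\hat\theta} = \tfrac12\nabla_{\vec k}(\vec e_{\hat\theta}\cdot\vec e_{\hat\theta}) = \tfrac12\nabla_{\vec k}(1) = 0$, so $\nabla_{\vec k}\vec e_{\hat\theta} = b\,\vec k = b\,k^{\hat\eta}(\vec e_{\hat\eta} + \vec e_{\hat\chi})$. In components,

$$k^{\hat\alpha}\,\vec e_{\hat\theta;\hat\alpha} = k^{\hat\alpha}\,\Gamma^{\hat\mu}{}_{\hat\theta\hat\alpha}\,\vec e_{\hat\mu} = b\,k^{\hat\eta}\left(\vec e_{\hat\eta} + \vec e_{\hat\chi}\right). \qquad (53)$$

Taking the dot product of this equation with $\vec e_{\hat\chi}$ gives

$$b\,k^{\hat\eta} = k^{\hat\alpha}\,\Gamma^{\hat\mu}{}_{\hat\theta\hat\alpha}\,\eta_{\hat\chi\hat\mu} = k^{\hat\alpha}\,\Gamma_{\hat\chi\hat\theta\hat\alpha} = k^{\hat\eta}\left(\Gamma_{\hat\chi\hat\theta\hat\eta} + \Gamma_{\hat\chi\hat\theta\hat\chi}\right),$$

so $b = \Gamma_{\hat\chi\hat\theta\hat\eta} + \Gamma_{\hat\chi\hat\theta\hat\chi}$. We now need only calculate two connection coefficients to verify that $\Gamma_{\hat\chi\hat\theta\hat\eta} = \Gamma_{\hat\chi\hat\theta\hat\chi} = 0$, so that $b = 0$ and hence $\nabla_{\vec k}\vec e_{\hat\theta} = 0$. The proof that $\nabla_{\vec k}\vec e_{\hat\phi} = 0$ is very similar.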
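Because the metric (52) is diagonal, each orthonormal-frame coefficient $\Gamma_{\hat\chi\hat\theta\hat\alpha}$ equals the coordinate symbol $\Gamma_{\chi\theta\alpha} = \tfrac12(g_{\chi\theta,\alpha} + g_{\chi\alpha,\theta} - g_{\theta\alpha,\chi})$ multiplied by the positive normalization factors of $\vec e_{\hat\chi}$, $\vec e_{\hat\theta}$, $\vec e_{\hat\alpha}$, so checking the coordinate symbols suffices. The sketch below (illustrative only; the sample point and the value of $b_0$ are arbitrary) evaluates them by central differences. A nonvanishing symbol, $\Gamma_{\chi\theta\theta} = -\tfrac12 g_{\theta\theta,\chi} = -b^2\chi$, is included as a control showing that the differencing itself works.

```python
import math

B0 = 1.0                          # arbitrary value of b_0 in b = b_0 eta^2
ETA, CHI, THETA, PHI = range(4)   # coordinate indices


def g(i, j, x):
    """Metric components of Eq. (52) at x = (eta, chi, theta, phi)."""
    if i != j:
        return 0.0
    eta, chi, theta, _ = x
    b2 = (B0 * eta**2) ** 2
    diag = (-b2, b2, b2 * chi**2, b2 * chi**2 * math.sin(theta) ** 2)
    return diag[i]


def dg(i, j, k, x, h=1e-6):
    """Central-difference estimate of g_ij,k."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    return (g(i, j, xp) - g(i, j, xm)) / (2 * h)


def christoffel_lower(i, j, k, x):
    """Gamma_{ijk} = (g_ij,k + g_ik,j - g_jk,i) / 2, first index lowered."""
    return 0.5 * (dg(i, j, k, x) + dg(i, k, j, x) - dg(j, k, i, x))


x0 = (2.0, 0.5, 1.0, 0.3)  # arbitrary sample point
gamma_chi_theta_eta = christoffel_lower(CHI, THETA, ETA, x0)
gamma_chi_theta_chi = christoffel_lower(CHI, THETA, CHI, x0)
control = christoffel_lower(CHI, THETA, THETA, x0)   # expect -b^2 chi

print(gamma_chi_theta_eta, gamma_chi_theta_chi, control)
```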

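(c) In the geometric-optics limit the general solutions are

$$h_+ = \frac{Q_+(\tau_r,\theta,\phi)}{r}, \qquad h_\times = \frac{Q_\times(\tau_r,\theta,\phi)}{r}, \qquad (54)$$

where $\vec k = -\vec\nabla\tau_r$ and $\nabla_{\vec k} r = \tfrac12(\vec\nabla\cdot\vec k)\,r$.

To fix $\tau_r = \tau_r(\eta - \chi)$, notice that at $\chi = 0$ (or, correspondingly, $r = 0$) $\tau_r = t - r = t$, where $t$ is proper time at $\chi = 0$:

$$dt^2 = -ds^2 = b^2\,d\eta^2 = b_0^2\eta^4\,d\eta^2 \;\Rightarrow\; dt = b_0\eta^2\,d\eta \;\Rightarrow\; t = \tfrac13 b_0\eta^3,$$

so $\tau_r(\eta) = t = \tfrac13 b_0\eta^3$ and

$$\tau_r(\eta - \chi) = \tfrac13\,b_0\,(\eta - \chi)^3. \qquad (55)$$

Then

$$\vec k = -\vec\nabla\tau_r \;\Rightarrow\; k^\eta = k^\chi = \frac{(\eta - \chi)^2}{b_0\eta^4}, \qquad (56)$$

$$\vec\nabla\cdot\vec k = \frac{1}{\sqrt{-g}}\left(\sqrt{-g}\,k^\alpha\right)_{,\alpha} = \frac{1}{\sqrt{-g}}\left[\left(\sqrt{-g}\,k^\eta\right)_{,\eta} + \left(\sqrt{-g}\,k^\chi\right)_{,\chi}\right] = \frac{2(\eta - \chi)^2(\eta + 2\chi)}{b_0\eta^5\chi}, \qquad (57)$$

where the last step follows after some calculation with $\sqrt{-g} = b^4\chi^2\sin\theta$.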
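Equations (55)-(57) can be spot-checked numerically. In the sketch below (illustrative; the value of $b_0$ and the sample point are arbitrary), $k^\alpha = -g^{\alpha\beta}\tau_{r,\beta}$ is obtained by differencing Eq. (55) with $g^{\eta\eta} = -1/b^2$ and $g^{\chi\chi} = 1/b^2$, and the divergence is obtained by differencing $\sqrt{-g}\,k^\alpha$ built from Eq. (56). The common factor $\sin\theta$ in $\sqrt{-g}$ is dropped because $\vec k$ has no $\theta$ component, so it cancels.

```python
B0 = 1.5  # arbitrary value of b_0 in b = b_0 eta^2


def tau_r(eta, chi):
    """Retarded time, Eq. (55)."""
    return B0 * (eta - chi) ** 3 / 3.0


def k_analytic(eta, chi):
    """k^eta = k^chi, Eq. (56)."""
    return (eta - chi) ** 2 / (B0 * eta**4)


def k_numeric(eta, chi, h=1e-6):
    """(k^eta, k^chi) = -g^{alpha beta} tau_r,beta, using central differences."""
    inv_b2 = 1.0 / (B0 * eta**2) ** 2
    dtau_deta = (tau_r(eta + h, chi) - tau_r(eta - h, chi)) / (2 * h)
    dtau_dchi = (tau_r(eta, chi + h) - tau_r(eta, chi - h)) / (2 * h)
    # g^{eta eta} = -1/b^2 and g^{chi chi} = +1/b^2
    return inv_b2 * dtau_deta, -inv_b2 * dtau_dchi


def sqrt_minus_g(eta, chi):
    """b^4 chi^2 (the sin(theta) factor is omitted; it cancels in the divergence)."""
    return (B0 * eta**2) ** 4 * chi**2


def div_k_numeric(eta, chi, h=1e-5):
    """(1/sqrt(-g)) [(sqrt(-g) k^eta),eta + (sqrt(-g) k^chi),chi] by central differences."""
    def flux(e, c):
        return sqrt_minus_g(e, c) * k_analytic(e, c)

    d_eta = (flux(eta + h, chi) - flux(eta - h, chi)) / (2 * h)
    d_chi = (flux(eta, chi + h) - flux(eta, chi - h)) / (2 * h)
    return (d_eta + d_chi) / sqrt_minus_g(eta, chi)


def div_k_analytic(eta, chi):
    """Eq. (57)."""
    return 2 * (eta - chi) ** 2 * (eta + 2 * chi) / (B0 * eta**5 * chi)


eta0, chi0 = 2.0, 0.5
k_eta, k_chi = k_numeric(eta0, chi0)
print(k_eta, k_chi, k_analytic(eta0, chi0))
print(div_k_numeric(eta0, chi0), div_k_analytic(eta0, chi0))
```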

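Then

$$\nabla_{\vec k} r = k^\eta\left(r_{,\eta} + r_{,\chi}\right) = \tfrac12\left(\vec\nabla\cdot\vec k\right) r \qquad (58)$$

reduces to

$$\left(\frac{\partial}{\partial\eta} + \frac{\partial}{\partial\chi}\right) r = \left(\frac1\chi + \frac2\eta\right) r. \qquad (59)$$

Changing variables to $a = \eta - \chi$ and $b = \eta + \chi$ (new coordinates, not the scale factor $b$), for which $\partial_\eta + \partial_\chi = 2\,\partial_b$, $\chi = (b - a)/2$, and $\eta = (b + a)/2$, we get

$$\frac{\partial r}{\partial b} = \left(\frac{1}{b - a} + \frac{2}{b + a}\right) r, \qquad (60)$$

$$r(a,b) = C(a)\,\exp\!\left[\int db\left(\frac{1}{b - a} + \frac{2}{b + a}\right)\right] = C(a)\,(b - a)(b + a)^2 \qquad (61)$$

$$\Rightarrow\; r(\chi,\eta) = C(\eta - \chi)\,\chi\,\eta^2, \qquad (62)$$

where $C(\eta - \chi)$ is an arbitrary function (the constant factor 8 from $b - a = 2\chi$ and $(b + a)^2 = 4\eta^2$ has been absorbed into it). Consider the region $\eta = \eta_0$, $\chi$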
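Because $C$ is arbitrary, a numerical spot check of Eq. (62) has to pick some function; the sketch below uses the hypothetical choice $C(u) = e^{-u} + u^2$ (any smooth function would do) and evaluates the residual of Eq. (59) by central differences at a few arbitrary points.

```python
import math


def C(u):
    """Hypothetical choice for the arbitrary function C(eta - chi)."""
    return math.exp(-u) + u**2


def r(eta, chi):
    """Eq. (62): r = C(eta - chi) chi eta^2."""
    return C(eta - chi) * chi * eta**2


def residual(eta, chi, h=1e-6):
    """Left side minus right side of Eq. (59), derivatives by central differences."""
    dr_deta = (r(eta + h, chi) - r(eta - h, chi)) / (2 * h)
    dr_dchi = (r(eta, chi + h) - r(eta, chi - h)) / (2 * h)
    return dr_deta + dr_dchi - (1.0 / chi + 2.0 / eta) * r(eta, chi)


points = [(2.0, 0.5), (3.0, 1.2), (1.5, 0.1)]
residuals = [residual(eta, chi) for eta, chi in points]
print(residuals)
```

Analytically, $\partial_\eta + \partial_\chi$ annihilates $C(\eta - \chi)$ and acting on $\chi\eta^2$ gives $\eta^2 + 2\eta\chi = (1/\chi + 2/\eta)\,\chi\eta^2$, which is why the residual vanishes, up to differencing error, for any choice of $C$.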
