Stock, J., Watson, M. - Introduction to Econometrics.pdf

April 17, 2017 | Author: k2o9 | Category: N/A
Share Embed Donate


Short Description

Download Stock, J., Watson, M. - Introduction to Econometrics.pdf...

Description

Introduction to Econometrics James H. Stock HARVARD UNIVERSITY

Mark W. Watson PRINCETON UNIVERSITY

...

~

Boston San Francisco N ew York

London Toronco Sydney ToJ..:yo Singapore Madrid

Mexico City Munich Paris Cape Town Hong Kong Montreal

Brief Contents

PART ONE

Introduction and Review

CHAPTE R I

Economic Questions and Data

CHAPTER 2

ReviewofProbability

CHAPTER 3

Review of Statistics

PART TWO

Fundamentals of Regression Analysis

CHAPT ER 4

Linear Regression with One Regressor

CHAPTER 5

Regression with a Singl e Regressor: Hypothesis Tests and

Confidence In tervals 148

CHAPTER 6

Linear Regression with Multiple Regressors

CHAPTER 7

Hypothesis Tests and Confidence Intervals

in Mult ipl e Regression 220

CHAPTER 8

No nlinear Regression Functions

CHAPTER 9

Assessing Studies Based on Multiple Regression

3

17

65

109

1II

186

254

PART THREE Further Topics in Regression Analysis

312

347

CHAPTER 10

Regression with Panel Data

349

CHAPTER II

Regression with a Binary Depe ndent Variable

C HAPTE R 12

Instrumenta l Variables Regression

CHAPTER 13

Ex periments and Quasi -Experiment s

PART FOUR

Regression Analysis of Economic

Time Series Data 523

CHAPTER 14

Introduction to T ime Series Regression and Forecasting

CHAPTER 15

Estimation of Dynamic Causal Effects

C HAPTER 16

Additional Topics in Time Series Regression

PART FIVE

The Econometric Theory of Regression Analysis

CHAPTER 17

The Theory of Linear Regression wit h One Regressor

CHAPTER 18 The T heory of Multiple Regression

383

421

468

525

591

637

675

677

704

v

Contents

Preface

XXVIl

PART ONE

Introduction and Review

CHAPTER I

Economic Questions and Data

1.1

3

Economic Questions We Examine 4

Question # 1: Does Reducing Class Size Improve Elementary School

Education? 4

Question # 2: Is There Racial Discrimination in the Market for Home Loans? Question # 3: H ow Much Do Cigarette Taxes Reduce Smoking? 5

Question #4: W hat Will the Rate of Inflation Be Next Year? 6

Quantitative Questions, Quantitative Answers 7

1.2

Causal Effects and Idealized Experime nts

5

8

Estimation of Causal Effects 8

Forecasting and Causality 9

1.3

Data: Sources and Types

10

Expe rimental versus Observational Data Cross-Sectional Data [ I

Time Series Data II

Panel Data 13

CHAPTER 2 2.1

Review of Probability

10

17

Random Variables and Probability Distributions

18

Probabilities, the Sample Space, and Random Variables 18

Probability Distribution of a Discrete Random Variable 19

Probability Distribution of a Continuous Random Variable 2 1

2.2

Expected Values, Mean, and Variance

23

The Expected Value of a Random Variable 23

The Standard Deviation and Variance 24

Mean and Variance of a Linear Function of a Random Var iable Other Measures of the Shape of a Distribution 26

2.3

Two Random Variables

25

29

Joint and Marginal Distributions Conditional Distributions 30

29

vii

viii

CO NTENTS

Independence 34

Covariance and Correlation 34

The Mean and Variance of Sums of Random Variables 2.4

35

The Normal, Chi-Squared ,Student t, and F Distributions

39

The Normal Distribution 39

The Chi-Squared Di stribution 43

The Student t Di stribution 44

The F Distribution 44

2.5

Random Sampling and the Distribution of the Sample Average Random Sampling 45

The Sampling Distribution of the Sample Average

2.6

APPENDIX 2.1

48

49

Derivation of Results in Key Concept 2.3

CHAPTER 3 Review of Statistics

3.1

46

Large-Sample Approximations to Sampling Distributions The Law of Large Numbers and Consi stency The Central Li mit Theorem 52

45

63

65

Estimation of the Population Mean

66

Esti mators and Their Properties 67

Properties of Y 68

The Importance of Random Sampling 70

3.2

Hypothesis Tests C oncerning the Population Mean

71

Null and Alternative Hypotheses 72

The p- Value 72

Calculating thep-Value When O' y Is Known 74

The Sample Variance, Sample Standard Deviation , and Standard Error Calculating thep-Value When O'y Is Unknown 76

The t-Statistic 77

Hypothesis Testing with a Prespecifled Significance Level 78

One-Sided Alternatives 80

3.3

Confidence Interva ls for the Population Mean

81

3.4

Comparing Means from Different Populations

83

Hypothesis Tests for the Difference Between Two Means 83

Confidence Intervals for the Difference Between Two Population Means 3.5

Differences-of-Me an s Estimat ion of Causa l Effects

Using Experiment al Data 85

The Causal Effect as a Difference of Conditional Expectations 85

Estimation of the Causal Effec t Using Differences of Means 87

75

84

ix

CO NTENTS

3.6 Using the t-St at istic W hen the Sample Size Is Small 88

T he t-Statistic and the Student t Distribution 88

Use of the Student t Distribution in Practice 92

3.7 Scatterplot, the Sample Covariance, and the Sample Correlation 92

Scatterp[ots 93

Sample Covariance and Correlation 94

APPENDIX 3 . 1

The U.S. Current Population Survey

[05

APPENDIX 3.2

Two Proofs That Y [s the Least Squares Estimator of J.Ly

APPENDIX 3.3

A ProofThat the Sample Variance Is Consistent [07

PART TWO

Fundamentals of Regression Analysis

CHAPTER 4

linear Regression w ith One Regressor

4. 1 The Linear Regression Model

4. 2

109

III

112

Estimating the Coefficients of the Linear Regression Model

116

T he Ordinary Least Squares Estimator [18

OLS Estimates of the Re [ationsh ip Between Test Scores and the

Student-Teacher Ratio 120

Why Use the OLS Estimator? 12 1

4.3

Measures of Fit

123

The R2 [23

The Standard Error of the Regression 124

Application to the Test Score Data 125

4.4 T he Least Squares Assumptions 126

Assumption #1: The Conditional Distribution ofu, Given X, Has a Mean

of Zero [26

Assumption #2: (X,. Y,). i n Are Independent[y and Ide ntically

Distributed [28

Assumption #3: Large Outliers Are Unlike[y [29

Use of the Least Squares Assumptions 130

=[.. . ..

4.5 The Sampling Distribution of the OLS Estimators T he Sampling Distribution of the OLS Estimators

4.6 Conclusion

[06

131

132

135

AP P E NDI X 4 . 1

The California Test Score Data Set

[43

A PPEND IX 4. 2

Derivation of the OLS Estimators

143

A PP EN D IX 4 . 3

Sampli ng Di stribution of the O LS Estimator

144

x

CO NTENTS

CHAPTER 5

Regression with a Single Regressor: Hypothesis Tests

and Confidence Intervals 148

5.1 Testing Hypotheses About One of the

Regression Coefficients

II

149

Tw o-Sided H ypotheses Concerning f31 149

O ne-Sided H ypotheses Concerning f31 153

Testing Hypot heses About the Intercept f30 155

5.2

I.

5.3

I

5.4

Confidence Intervals for a Regression Coefficient Regression W hen X Is a BinaryVari able Inte rpretation of the Regression Coefficients

ISS

158

158

Heteroskedasticity and Homoskedasticity

160

W hat Are Heteroskedasticity and H omoskedasticity? 160

Mathematical Implications of Hom oskedasticity 163

W hat Does This Mean in Practice? 164

5.5 The Theoretical Foundations of Ord inary Least Squares 166

Linear Cond itionally Unbiased Estimators and the Ga uss-Markov -r heorem Regression Estimators Other Than OLS 168

5.6 Using the t-Statistic in Regression When the Sample Si z~

Is Small

169

The t-Statistic a nd the Student t Di stribution 170

Use of the Student t Distribution in Practice 170

5.7 Conclusion

171

APPE N DI X 5 .1 Form ulas for OLS Standard Errors

180

The Gauss-Markov Cond itions and a Proof

of the Gauss-Markov Theorem 182

APPEND IX 5.2

CHAPTER 6 6.1

Linear Regression with Multiple Regressors Omi tted Variable Bias

186

186

Definition of Omitted Variable Bias 187

A Form ula for Omitted Va riable Bias 189

Addressing Omitted Variable Bias by Dividing the Data into Groub s 6.2

The Multiple Regression Mode l

193

The Population Regression Line 193

The Popu lation Multiple Regre ssion Model 6.3

194

The OLS Estimator in Multiple Regression

196

The OLS Esti ma tor 197

Appl ication to Test Score s and the Stude nt-Teacher Ratio

198

19 1

167

CONT ENTS

6.4 Measures of Fit in Multiple Regression

200

T he Standard Error of the Regression (SER) 200

The R2 200

The 'I\djusred R2" 20 I

Applicat ion to Test Scores 202

6.5 The Least Squares Assumptions in Multiple Regression

202

u,

Ass umption # 1: The Conditio nal Distributi on of Given X I, ' X 2i , Mean of Zero 203

Assumption # 2: (XI, ' X 2" . . . , Xiu ' Y,) i I, ... ,n Are i.i.d. 203

As sum ption # 3: Large Outliers Are Unlikely 203

Assumpti on # 4: No Perfect Mul ticollinearity 203

. ..

,Xk, H as a

=

6.6 The Distribution of the OLS Estimators

in Multiple Regression

205

6.7 Mul t icollinearity 206

Examples of Perfect Multicollineari ty Impe rfect Multicollinearity 209

6.8 Conclusion APP ENDIX 6 . 1

206

210

Derivation of Eq uation (6 .1)

218

Di stribution of the OLS Estimators W hen There Are Two

Regressors and H omoskedastic Errors 218

APPEN DIX 6 .2

CHAPTER 7 Hypothesis Tests and Confidence Intervals

in Multiple Regression

220

7.1 Hypothesis Tests and Confidence Intervals

fo r a Single Coeffic ie nt

221

Standard Errors for the OLS Estimators 22 1

Hypothesis Tests for a Si ngle Coeffi cient 22 1

C onfidence Intervals for a Single Coefficient 223

Application to Test Scores and the Studen t- Teacher Ratio 7.2 Tests of Joint Hypotheses 225

Testi ng H ypotheses on Tw o or More Coefficients 225

T he F-Statistic 227

Applicatio n to Test Scores and the Student- Teacher Ratio T he Homoskedastici ry-Only F-Statistic 230

7.3 Testing Si ngle Restrictions

Involving Multiple Coefficient s

232

7.4 Confidence Sets for Multiple Coefficients

234

22 3

229

xi

xii

CONTE NTS

7.S

Model Specification for Multipl e Regression

235

Omitted Variable Bias in Multiple Regression 236

Model Specification in Theory and in Practice 236

Interpreting the R2 and the Adjusted R2 in Practice 237

7.6

Analysis of the Test Score Data Set

7.7

Conclusion

239

244

APPE ND IX 7. 1 The Bonferroni Test ofa Joint Hypotheses

CHAPTER 8 Nonlinear Regression Functions 8.1

25 1

254

A General Strategy for Modeling Nonlinear

Regression Functions 256

Test Scores and District Income 256

The Effect on Y of a Change in X in Nonlinear Specifications 260

A General Approach to Modeling Nonlinearities Using Multiple Regression

8 .2

Non linear Functions of a Single Independent Variable

264

Polynomials 265

Logarithms 267

Polynomial and Logarithmic Models of Test Scores and District Income 8.3

Interactions Between Independent Variables

280

Nonli ne ar Effects on Test Scores of the Student-Teacher Ratio Discussion of Regression Results Summary of Findings 295

8. 5

275

277

Interactions Between Two Binary Variable s 277

Interactions Between a Continuous and a Binary Variable Interactions Between Two Continuous Variables 286

8.4

Conclusion

264

290

291

296

Regression Functions That Are Nonlinear

in the Parameters 307

APPENDI X 8.1

CHAPTER 9 Assessing Studies Based on Multiple Regression

9.1

Internal and Exte rna l Validity

312

31 3

Threats to Internal Validity 313

Threats to External Validity 314

9.2

Threats to Internal Va lidity of Multiple Regression Analysis Omitted Variable Bias 316

Misspecification of the Functional Form of the Regression Function Errors-in- Variables 319

Sample Se lection 322

316

319

CONTE NTS

Simulta neous Causa li ty 324

Sources of Inconsistency of OLS Standard Errors

325

9.3 Internal and External Validity When the Regression Is Used

for Forecast ing

327

Using Regression Models for Forecasting 327

Asse ssing the Validity of Regre ss ion Models for Foreca sti ng 9.4 Example: Test Scores and Class Size External Validity 329

Internal Validity 336

Discu ss ion and Implication s 337

9.S Conclusion APPEND IX 9.1

329

338

The Massachu setts Elementary School Testing Data

PART THREE Further Topics in Regression Analysis CHAPTER 10

328

Regression with Panel Data

344

347

349

10. 1 Panel Data 350

Example: Traffic Deaths and A lcohol Taxes

351

10.2 Panel Data with Two T ime Pe riods: "Before and After"

Comparisons

353

10.3 Fixed Effects Regression 356

The Fixed Effects Regression Mode l 356

Estimation and Inference 359

Application to Traffic Death s 360

10.4 Regression with Time Fixed Effects 361

Time Effects Only 36 1

Both En tity and Time Fixed Effects 362

10.S The Fixed Effects Regression Assumpt ions and Standard Errors

for Fixed Effects Regression

364

The Fixed Effects Regression Assumptions 364

Standard Errors for Fi xed Effects Regression 366

10.6 Drunk Driving Laws and Traffic Deaths 10.7 Conclusion

367

371

APPENDIX 10 . 1 The State Traffic Fatality Data Set

378

Standard Errors for Fixed Effects Regression

with Serially Correlated Errors 379

APPENDIX 10 . 2

xiii

xiv

CONTENTS

CHAPTER II

11.1

Regression with a Binary DependentVariable

383

Binary Dependent Variables and the Linear Probability Model Binary Dependent Variables 385

The Linear Probability Model 387

11.2

Probit and Logit Regression

389

Probit Regression 389

Logit Regression 394

Comparing the Linear Probability, Probit, and Logit Models 11 .3

396

Estimation and Inference in the Logit and Probit Models

396

Nonlinear Least Squares Estimation 397

Maximum Likelihood Estimation 398

Measures of Fit 399

11.4

Application to the Boston HMDA Data

11.5

Summary

407

APPENDIX 11 . 1 The Boston HMDA Data Set

CHAPTER 12

12.1

400

415

APPENDIX 11 . 2

Maximum Likelihood Estimati on

APPEND IX 11. 3

Other Limited Dependent Variable Models

Instrumental Variables Regression

418

421

The IV Estimator w ith a Single Regressor

and a Single Instrument 422

The IV Model and Assumptions 422

The Two Stage Least Squares Estimator 423

Why Does IV Regression Work? 424

The Sampling Distribution of the TSLS Estimator Application to the Demand for Cigarettes 430

12.2

415

The General IV Regression Model

428

432

TSLS in the General IV Model 433

Instrument Relevance and Exogeneity in the General IV Model The IV Regression Assumptions and Sampling Distribution

of the TSLS Estimator 434

Inference Using the TSLS Estimator 437

Appl ication to the Demand for Cigarettes 437

12.3

Checki ng Instrument Validity

439

Assumption # 1: Instrument Relevance 439

Assumption # 2: Ins trument Exogeneity 44 3

12.4

Application to the Demand for Cigarettes

445

434

384

CONTENTS

12.5 Where Do VaJid Instruments Come From?

Three Examples 12.6 Conclusion

XV

450

451

455

APPENDIX 12. 1 The Cigarette Consumption Panel Data Set

462

Derivation of the Formula for theTS LS Estimator in

Equation (12.4) 462

APP EN DIX 12 .2

APPEND IX 12 .3

Large-Sample Distribution ofthe TSLS Estimator

463

Large-Sample Di stribu t ion of the TSLS Estimator When the

Instrument Is Not Valid 464

A PPE N DIX 12 .4

APPENDIX 12.5

CHAPTER 13

Instrumental Variables Anal ysis with Weak Instruments

Experiments and Quasi-Experiments

468

13.1 Idealized Experiments and Causal Effects

Ideal Randomized Controlled Experiments The Differen ces Estimator 471

466

4 70

470

13. 2 Potential Problems with Experiments in Practice Threats to Internal Validity 472

Threats to External Validity 47 5

472

13.3 Regression Estimators of Causal Effects

Using Experimental Data

477

The Differences Estimator with Additi onal Regressors 477

The Differences-in-Differences Estimator 480

Estimation of Causal Effects for Different Groups 484

Estimation When There Is Partial Compliance 484

Testing for Randomization 485

13.4 Experimental Estimates of the Effect of Class Size Reductions Experime ntal Design 486

Analys is of the STAR Data 487

Comparison of the Observational and Expe rimental Estimates

of Class Size Effects 492

13.5 Quasi-Experiments 494

Examples 49 5

Econometric Methods for Analyzing Quasi-Experiments 13.6 Potential Problems with Quasi-Experiments

Threats to Internal Validity 500

Threats to External Validity 502

497

500

486

xv

CONTENTS

12.5 Where Do Valid Instruments Corne From? Three Examples 451

12.6 Conclusion

450

455

APPENDIX 12 . 1 The Cigarette Consumption Panel Data Set

462

Derivation of the Formula for theTSLS Estimator in

Equation (12.4) 462

AP PENDIX 12.2

APPENDIX 12.3

Large-Sample Distribution oftheTSLS Estimator

463

Large-Sample Distribution of the TSLS Estimator W hen the

Instrument Is Not Valid 464

APPENDIX 12.4

APPENDIX 12.5

CHAPTER 13

Instrumental Variables Analysis with Weak In struments

Experiments and Quasi-Experiments

466

468

13.1 Idealized Experiments and Causal Effects 470

Ideal Randomized Controlled Experiments 470

The Differen ces Estimator 471

13.2

Potential Problems with Experiments in Practice

472

Threats to Internal Validity 472

Threats to External Validity 475

13.3 Regression Estimators of Causal Effects

Using Experimental Data 477

The Differences Estimator with Additional Regressors 477

The Diffe rences-in-Differences Esti mator 480

Estimation of Causal Effects for Different Groups 484

Estim ation When There Is Partial Compliance 484

Testing for Randomization 485

13.4

Experimental Estimates of the Effect of Class Size Reductions Experime ntal Design 486

Analysis of the STAR Data 487

Comparison of the Observational and Experimental Estimates

of Class Size Effects 492

13.5 Quasi-Experiments 494

Examples 495

Econometric Me thods for Analyzing Quasi-Experiments 13.6 Pot ential Problems with Quasi-Experiments Threats to Internal Val idity 500

Threats to External Validity 502

497

500

486

xvi

CONTENTS

13.7

Experimental and Quasi-Experimental Estimates in Heterogeneous Populations 502 Population H eteroge neity: Whose Causal Effect? 502 OLS w ith Heterogeneous Causal Effe cts 503 IV Regression wi th Heterogeneous Causal Effects 50

13.8

Conclusion A PP EN DIX 13.1

507 The Project STAR DataS et

516

Extension of the Differences-i n- Differences Estimator to Multiple Time Periods 517

A PPEND IX 13. 2

Conditional Mean Independe nce

A PPE N D IX 13.3 APP EN DIX 13.4

Individuals

III I

PART FOUR

518

IV Estimation W hen the Causal Effect Varies Across 520

Regression AnaJysis of Economic Time Series Data

CHAPTER 14 Introduction to Time Series Regression and Forecasting 14.1 14.2

Using Regression Mode ls for Forecasting

Introduction to T ime Series Data and Serial Correlation

Autoregressions

525

527

The Rates of Inflation and Unempl oyme nt in the United States Lags, Fi rst Differences, Logarithm s, and Growth Rates 528 Autocorrelation 532 Other Examples of Economic Time Series 533 14.3

523

528

528

535

The First Ord er A utoregressive Mode l 535 The p th Order Autoregressive Model 538 14.4

Time Series Regression with Addit ional Predictors and the Autoregressive Distributed Lag Model 541 Forecasting Change s in the Inflation Rate Using Past Unemployment Rates 541 Stationarity 544 Time Series Regre ssion w ith Multiple Predictors 545 Forecast Uncertainty and Forecast Intervals 548

14.5

Lag Length Selection Using Informat ion Criteria

549

Determining the Order of an Autoregression 55 1 Lag Le ngth Se lection in Time Series Regression W it h Mult iple Predictors 14.6

Nonstationarity I: Trends What [s a Tre nd? -P ...-.... I ~... .."..~

r-.., ........ .....I

554

555 I..... ~

c:: . __ l.-.. .... ;" ;... T ...,.. .... ...I r

" 1:; 7

553

CONTENTS

xvii

De tec ting Stochas tic Tre nds : Testing for a Unit A R Roo t 560 Avoiding the Pro ble ms Caused by Stochastic Trends 564 14.7 Nonstationarity II: Breaks 565

W hat Is a Break? 565

Testi ng fo r Breaks 566

Pse udo Out-o f-Sample Fo recasting 571

Avo iding the Proble ms Caused by Breaks 576

14.8 Conclusion

CHAPTER 15

577

AP P ENDI X 14 . 1

Time Series Data Used in Chapter 14

A PPE NDI X 14 . 2

Stationarity in the A R(I ) Model

APPENDIX 14 .3

Lag Operator Notation

APPENDIX 14.4

ARM A Models

A P PEN D IX 14.5

Consistency of the BIC Lag Length Estimator 589

586

588

589

Estimation of Dynamic Causal Effects

1 S.l

An Init ial Taste of the O range J uice Data

1 S.2

Dynamic Causal Effects

591 593

595

Causal Effects and Time Series Data Tw o Types ofExogenei ty 598

t 5.3

586

596

Estimation of Dynamic Causal Effects with Exogenous Regressors The Distributed Lag Model Ass umptions 601 Au tocorrelated U t , Standard Errors, and Infe rence 60 I Dynamic Multi pli e rs and Cumulative Dynamic Mul t ipl iers

602

15.4 H et eroskedasticity- and Autocorrelat ion-Consistent Stan dard

Er rors

604

DIstribution of the OLS Estimator w ith Autocorre la ted Errors H AC Standard Errors

604

606

15.5 Estimation of Dynamic Causal Effects with Strict ly

Exogenous Regressors 608 The Distributed Lag Mode l wit h A R(I) Errors 609 OLS Estimation of the ADL Model 6 12 GL S Estimation 613 The Distributed Lag Model w ith Additional Lags a nd A R(p) Errors 15.6 Orange J uice Prices an d Cold Weather 15.7 Is Exogeneity Plausible? Some Examples U.S. Income a nd Aus tra lian Exports 625

Oil Prices and Inflation 626

618 624

6 15

600

xviii

CONTENTS

Monetary Policy and Inflation The Phillips Curve 627

15. 8

Conclusion

626

627

The Orange Juice DataSet

APPENDIX 15.1

634

The ADL Model and Generalized Least Squares

in Lag Operator Notation 634

APPENDIX 15.2

CHAPTER 16

16.1

Additional Topics in Time Series Regression Vector Autoregressions

637

638

The VAR Model 638

A VAR Model of the Rates of Inflation and Unemployment 16.2

Multiperiod Forecasts

641

642

Iterated Muliperiod Forecasts 643

Direct Multiperiod Forecasts 645

Which Method Should You Use? 647

16.3

O rders oflntegration and the DF-GLS Unit Root Test Other Models ofTrends and Orders of Integration 648

The DF-GLS Test for a Unit Root 650

Why Do Unit Root Tests Have Non-normal Distributions?

16.4

Cointegration

653

655

Cointegratlon and Error Correction 655

How Can You Tell Whether Two Variables Are Cointegrated 7 Estimation of Cointegrating Coefficients 660

Extension to Multiple Cointegrated Variables 661

Application to Interest Rates 662

16.S

658

Volatility Cl ustering and Autoregressive Conditional

Heteroskedasticity 664

Volatility Clustering 665

Autoregressive Conditional Heteroskedasticity Application to Stock Price Volatility 667

16.6

648

Conclusion

666

669

AP PENDIX 16 . 1 U.S. Financial Data Used in Chapter 16

674

PART FIVE

The Econometric Theory of Regression Analysis

CHAPTE R 17

The Theory of Linear Regression with One Regressor

17.1

675

677

The Extended Least Squares Assumptions and the OLS Estimator The Extended Least Squares Assumption s The OLS Estimator 680

678

678

xix

CON TENTS

17.2

Fundamentals of Asymptotic Distribution Theory

680

Convergence in Probability and the Law of Large Numbers 681

The Centra l Limit Theorem and Convergence in Distri bution 683

Slutsky's Theore m and the Contin uous Mappi ng Theorem 685

Application to the t-Statistic Based on the Sample Mean 68 5

17.3

Asym ptotic Distribut ion of the OLS Estimator and t-Statist ic

686

Consistency and Asymptotic Normality of the OLS Estimators 686

Consistency of H eteroskedasti city-Ro bust Standard Errors 686

Asymptotic Norma lity of the H eteroskedasticity-Robust t-Statistic 688

17.4

Exact Sampling Dist ributions When the Errors

Are Normally Distributed 688

Distribution of ~ J with No rma l Errors 688

Distribution of the Homoskedasticity-only t-Statistic

17.5

Weigh ted Least Squares

690

691

W LS with Known Heteroske dasti city 69 1

W LS w ith H eteroskedasticity of Know n Functional Form 692

H eteroskedasticity-Robust Standard Errors or W LS? 695

AP P ENDI X 17. 1 The Normal a nd Re lated Distributions and Moments of

Contin uous Random Variables

700

Two Inequa lit ies

702

A PP ENDI X 17. 2

C HAPTER 18 The Theory 18.1

of Multiple Regression

704

The Linear Multipl e Regressi on Model and OLS Estimator

in Matrix Form 706

The Mul tiple Regressio n Model in Matrix Notation The Extended Least Squares Assu mptions 707

The O LS Estimator 708

18.2

706

Asymptot ic Distribution of the OLS Estimator and t-Statistic The Mul tivariate Central Limit Theorem 7 10

Asymptotic Normali ty of ~ 710

Hete roskedastic ity-Robust Standard Errors 711

Co nfidence Intervals for Predicted Effects 712

A symptotic Di stri but ion of the t-Statistic 7 13

18.3

Tests of Join t Hypoth eses

713

Join t H ypotheses in Matrix Notation 713

Asymptotic Di stribution of the F-Statistic 7 14

Confidence Sets for Mult iple Coefficients 71 4

18.4

Distribution of Regression Statistics w ith Normal Errors Matrix Represe ntations ofO LS Regression Statistics Di stribution of ~ w ith Nor mal Errors 7 16

7 15

715

710

xx

CONTEN TS

Distribution of S$ 717

Homoskedasticity-Only Standard Errors Distribution of the t-Statistic 718

Distribution of the F-Statistic 718

18. 5

717

Efficiency of the OLS Estimator with Homoskedastic Errors

719

The Gauss-Markov Conditions for Multiple Regression 719

Linear Conditionally Unbiased Estimators 719

The Gauss-Markov Theorem for Multiple Regression 720

18.6 Generalized Least Squares 721

The GLS Assumptions 722

GLS When n Is Known 724

GLS When n Contains Unknown Parameters 725

The Zero Conditional Mean Assumption and GLS 725

18.7 Instrumental Variables and Generalized Method

of Moments Estimation

727

The IV Estimator in Matrix Form 728

Asymptotic Distribution of the TSLS Estimator 729

Properties ofTSLS When the Errors Are Homoskedastic 730

Generalized Method of Moments Estimation in Linear Models 733

743

APPENDIX 18, I

Summary of Matrix Algebra

APPENDIX 18,2

Multivariate Distributions

APPENDI X 18,3

Derivation of the Asymptotic Distribution of {3

APPE N D IX 18 .4

Derivations of Exact Distributions of OLS Test Statistics with

749

Normal Errors

747

748

Proof of the Gauss-Markov Theorem

for Multiple Regression 751

APPENDIX 18,5

APPENDIX 18.6

Proof of Selected Results for IV and GMM Estimation

Appendix 755

References 763

Answers to "Review the Concepts " Questions Glossary 775

Index 783

767

752

Key Concepts PART ONE 1.1

Introduction and Review

Cross-Sectional, Time Series, and Panel Data

2.1

Expected Value and the Mean

2.2

Variance and Standard Deviation

15

24 25

38

2.3

Means, Variances, and Covariances of Sums of Random Variables

2.4

Computing Probabil ities Involving Normal Random Variables

2. 5

Simple Random Sampling and i.i.d. Random Variables

2.6

Convergence in Probability, Consistency, and the Law of Large Numbers

2 .7

The Central Limit Theorem

3. ,

Estimators and Estimates

3.2

Bias, Consistency, and Efficiency

40

47

55

67

3.3

Efficiency ofY: Y is BLUE

3. 4

The Standard Error ofY

68

70

76

3.5

The Terminology of H ypothesis Testing

3.6

Testing the Hypothesis E(Y) = J.Ly,0 Against the Alternative E(Y)

3. 7

Confidence Intervals for the Population Mean

PART TWO

50

79 1:- ILY,0

80

82

Fundamentals of Regression Analysis

109

4.1

Terminology for the Linear Regression Model with a Single Regressor

4.2

The OLS Estimator, Predicted Values, and Residuals

4.3

The Least Squares Assumptions

I 15

I 19

131

133

4.4

Large-Sample Distributions of.8o and.81

5. 1

General Form of the t-Statistic

5.2

Testing the Hypothe sis f3 1 = f31 ,0 Against the Alternative f31

5.3 5.4

Confidence Interval for f31 157 Heteroskedasticityand Homoskedasticity

5.5 6. 1

T he Gauss-Markov Theorem for .81 168 Omitted Variable Bias in Regression with a Single Regressor

6 .2

The Multiple Regress ion Model

6.3

The OLS Estimators, Predicted Values, and Residuals in the Multiple Regression Model

6.4

The Least Squares Assumptions in the Multiple Regression Model

6.5

Large Sample Distribution of .80' .81' ... , .8k

150 1:-

f3 1,0

152

162 189

196 198

204

206

xxi

xxii

KEY CONC EPTS

7.1 7. 2 7.3

Testi ng the H ypothe sis f3i = {3,.o Against the A lternat ive f3, 7= !3j.o 222

Confidence Intervals for a Single Coefficient in Multiple Regression 223

O mitted Variable Bias in Multiple Regression 237

7.4

R2 and R2: W hat T hey Tell You- and What They Don't 238

8. 1

The Expected Effect on Y of a Change in X, in the Nonlinear Regression Model (8 .3)

8.2

Logarithms in Regression: Three Cases

8 .3

A Method fo r Interpreting Coefficients in Regressions w ith Bi nary Variables

8.4

Interactions Between Binary and Continuous Variables

8.5

Inte racti ons in Mu ltiple Regression

273

282

9.1

Internal and External Validity

Omitted Var'iable Bias: Should I Include More Variables in My Regression?

313

9 .3

Functional Form Misspecification

9 .4

Errors-in-Variables Bias

319

Sam ple Se lection Bias

9 .6

Simultaneous Causal ity Bias

9.7

Threats to the Internal Validity of a Multiple Regression Study

323

325

327

PART THREE Further Topics in Regression Analysis

347

350

10.2

The Fixed Effects Regression Model

10.3

The Fixed Effects Regre ssion Assumptions

11. 1

The Linear Probability Model

11.2

The Probit Model, Predicted Probabilities, and Estimated Effects

359

365

388

392

11. 3

Logit Regression

12.1

The General Instrumenta l Variables Regression Model and Terminology

394

12. 2

Two Stage Least Squares

12 .3

The Tv" 0 Condi t ions for Valid Instruments

12.4

The IV Regression Assumptions

436

437

A Rule ofThumb for Checking for Weak Instruments

44 1

12 .6

The O veridentifying Restrictions Test (the }-Statistic)

444

14 .1

433

435

12 .5

PART FOUR

318

321

9 .5

Notation for Panel Data

279

287

9 .2

10. 1

26 1

Regression Analysis of Economic Time Series Data

Lags , First Differences, Logarithm s, and Growth Rates

14. 2

Autocorre lation (Serial Correlation) and Autocovariance

14. 3

Autoregressions

14.4

The A utoregressive Distributed Lag Model

14.5

Stationarity

539

545

544

530

532

523

xxiii

KE Y CON CEPTS

14. 6 Time Se nes Regression w ith Multiple Predictors 546

14.7 Gra nger Causal ity Tests (Te sts of Predictive Content) 547

14.8 The Augmented Dickey -Fuller Test for a Unit Autoregressive Root 14 .9 The Q LR Test for Coefficient Stabil ity 14. 10 Pse udo Out-of-Sample Forecasts 15.1

569

572

The Distributed Lag Model and Exogeneity

15 .2 The Distributed Lag Model Assumptions

600

602

15 .3 15.4

H AC Standard Errors 609

Estimation of Dynam ic Multipliers Under Strict Exoge nei ty

16. I

Vector Autoregressions

16.2

Iterated Multiperiod Forecasts

16.3 16.4

Direct Multiperiod Forecasts 647

Orders of Integration, Differe ncing, and Stat ionarity

16.5

Cointegration

PART FIVE 17 .1

562

6 17

639

645

650

658

The Econometric Theory of Regression Analysis

675

The Extended Least Squares Assumptions for Regression w ith a Single Regressor

18. 1 The Extended Least Squares Assumption s in the Multiple Regression Model 18.2

The Multivariate Central Li mitTheorem

18. 3

Gauss-Markov Theorem for Multiple Regression

18.4

The GLS Assumption s

723

710

721

707

680

G eneral Interest Boxes

The Distribution of Earnings in the United States in 2004 A Bad Day on Wall Street 42

Landon Wins I 7[

35

The Gender Gap of Earnings of College Graduate s in the United States A Novel Way to Boost Retire ment Savings 90

The "Beta" ofa Stock 122

86

The Economic Value of a Year of Education: Heteroskedasticity or Homoskedasticity? 165

The Mozart Effect: Omitted Variable Bias? 190

The Returns to Education and the Gender Gap The Demand for Economics Journa[s

284

288

Do Stock Mutua[ Funds Outperform the Market?

323

James Heckman and Danie[ McFadden , Nobel Laureate s Who Invented Instrumental Variable Regression? A Scary Regre ssion 426

The Externa[ities of Smoking The Hawthorne Effect

407

425

446

474

What [s the Effect on Employment of the Minimum Wage?

498

Can You Beat the Market? Part I 540

The River of Blood

550

Can You Beat the Market? Part 11

573

N EWS FLASH : Commodity Traders Send Shivers Through Di sney World Robert Engle and Clive Granger, Nobel Laureates 657

625

Preface

conometrics ca n be a fun course for both teacher a nd student. Th e real world of economics, business, and government is a complicated and messy place. full of competing ideas and quest ions tha t dem and answe rs. Is it more effective to tackle drunk driving by passing tough laws or by increasing the tax on alcohol? Can yo u make money in the stock market by buying when prices are historically low, relative to earnings. or should yo u just sit tight as the ran dom walk theory of stock prices suggests? Can we improve elementary education by reducing class sizes. or should we simply have our children listen to Mozart for tcn min utes a day? Econometrics helps us to sort o ut sound ide as from crazy ones and to find qu an­ titative answers to important quantitative questions. E conometrics opens a wi n­ dow on our complicated world that lets us see the relationships on which people. businesses, and governments base their decisions. This textbook is designed for a first course in undergraduate econometrics. It is our experience that to make econometrics relevant in an introductory course, interesting applications must motivate the theory and the theory must match the applications. TIlis simple principle represents a significant departure from the older generation of econ ometrics books, in which theoretical models and ass um ptions do not match the app lications. It is no wonder tha t some students question the rel­ evance of econometrics after they spend much of their time learni ng assumptions that th ey subsequently realize are unrealistic, so that they must then learn "sol u­ tions" to " problems" that arise when the applications do not match the assump­ t ions. We believe that it is far better to motivate the need for tools with a concrete application, and thell to provide a few simple assumptions that match the appli­ catio n. Because the the ory is immedia tely relevant to tbe applica tions, this approach can make econometrics come alive. TIle second edition benefits from the many constructive suggestions of teach­ ers who used the first edition, while maintaining the philosophy tbat applications should drive the theory, not the other way around. The single greatest change in the seco nd edition is a reo rgani za tion and expan sion of the material o n core regression analysis: Part II, wh ich covers regression with cross-sectional data, has been expanded from fo ur chapters to six. We have added new empirical exampl es (as boxes) drawn from economics and finance; some new optional sectio ns on

E

xxviii

PREFACE

classical regression theory; and many new exercises, both paper-and-pencil and computer-based empirical exercises using data sets newly placed on the textbook Web site. A more detailed description of changes to the second edition can be fo und on page xxxii.

Features of This Book This textbook differs from others in three main ways. First, we integrate real-world questions and data into the development of the theory. and we take seriously the substantive findings of the resulting empirical analysis. Second, our choice of top­ ics reflects modern theory and practice. Third, we provide theory and assumptions that match the applications. Our aim is to teach students to become sophisticated consumers of econometrics and to do so at a level of mathematics appropriate for an introductory course.

Real-world Questions and Data We organize each methodological topic around an important real-world question that demands a specific numerical answer. For example, we teach single-variable regression, multiple regression , and functional form analysis in the context of esti­ mating the effect of school inputs on school outputs. (Do sma ller elementary school class sizes produce higher test scores?) We teach panel data methods in the context of analyzing the effect of drunk driving laws on traffic fatalities. We use possible racial discrimination in the market for home loans as th e empirical appli­ cation for teaching regression with a binary dependent variable (logit and probit). We teach instrumental variable estimation in the context of estimating the demand elasticity for cigarettes. Although these examples involve economic reasoning, all can be understood with only a single introductory course in economics, and many can be understood without any previous economics coursework.Thus the instruc­ tor can focus on teaching econometrics, not microeconomics or macroeconomics. We treat all our empirical applications seriously and in a way that shows stu­ dents how they can learn from data but at the same time be self-critical and aware of the limitations of empirical analyses. Through each application, we teach students to explore alternative specifications an d thereby to assess whether their substantive findings are robust. The questions asked in the empirical appli­ cations are important, and we provide serious and, we think. credible answers. We encourage students and instructors to disagree, however, and invite th em to reanalyze the data , which are provided on the textbook's companion Web site (www.aw-bc.comlstock_ watson).

PREFACE

xxix

Contemporary Choice ofTopics Econometrics has come a long way in the past two decades. The topics we cover reflect the best of contemporary appli ed econometrics. O ne can only do so much in an introductory course. so we focus on procedures and tests tha t ar e commo nly used in practice. For example:

• Instrumental variables regression. We present instrumental variables regres­ sion as a general me thod for handJing correlation be tween the error term and a regressor, which can arise for many reasons, including omitted variables and simultaneous causality. The two assumptions for a valid instrum enl - exo­ geneity and relevance-are given equal billing. We follow that presentation with an extended discussion of where instruments come fro m, and with tests of overidentifying restrictions and d iagnostics fo r weak instruments-a nd we explain what to do if these diagnostics suggest problems.

• Program evaluation. An increasing number of econometric studies analyze either randomized controlled experiments or quasi-experiments, also known as natural experiments. We address these topics, often collectively referred to as program evaluation, in Chapter 13. We present this research strategy as an alternative approach to the problems of omitted variables. simultaneous causality, and selection, and we assess both the strengths and the weaknesses of studies using experimental or quasi-experimental data .

• Forecasting. The chapter on forecasting (Chapter 14) considers univariate (autoregressive) and multivariate forecasts using time series regression, not large simultaneous equation structural models. We focus on simple and reli­ able tools, such as autoregressions and model selection via an information cri­ terion, that work well in practice. This chapter also fe atures a practica lly oriented treatment of stochastic trends (unit roo ts), unit root tests, tests for structural breaks (at known and unknown dates), and pseudo o Ul-of-sample forecasting, all in the context of developing stable and reliable time series fore­ casting models.

• Time series regression. We make a clear distinction between two very differ­ ent applications of time series regression: forecasting and estima tion of dynamic causal effects. The chapter on causal inference using time series data (Chapter 15) pays careful attention to when different estima tion methods, including generalized least squares, will o r will not lead to valid causal infer ­ ences, and when it is advisable to estimate dynamic regressions using OLS with heteroskedasticity- and autocorrelation-consistent standard eITors.

xxx

PREFACE

Theory T hat Matches Applications Although econometric tools are best motivated by empirical applications, students need to learn enough econometric theory to und erstand the strengths and limita­ tions of those tools. We provide a modern treatment in which the (it between tbe­ ory and applications is as tight as possible, while keeping tlle mathematics at a level th at requires only algebra . Modern empirical applica tions share some comm on characteristics: the da ta sets typically are large (hundreds of observations, often more); regressors are not fixed over repeated samples but rather are co llected by random sampling (or some other mechanism that makes them random); the data are not normally distributed; and there is no a priori reason to think that the errors are homoskedastic (although often there are reasons to think that they are heteroskedastic). These observations lead to important differences betwee n the theoretical development in this textbook and other textbooks.

• Large-sample approach. Because data sets are large, fro m the outset we use large-sample normal approximations to sampling distribu tions for hypothesis testing and confidence intervals. O ur experience is tha t it takes less time to teach the rudiments of large-sample approximations than to teach the Student l and exact F distributions, degrees-of-freedom corrections, and so forth. This large-sample approach also saves students the frustration of discovering that, because of nonnormal errors, the exact distribution theory they just mastered is irrelevant. Once taught in the context of the sample mean, the large-sample approach to hypothesis testing and confidence inte rvals carr ies directly thro ugh mu ltiple regression analysis, logit and probit, instrumental variables estimation, and time series methods. • Random sampling. Because regressors are rarely fi xed in econometric appli­ cations, from the outset we treat data on all variables (dependent and inde­ pendent) as the result of random sampling. This assumption matches our initial applications to cross-sectional data; it extends readily to panel and time series data; and because of our large-sample approach, it poses no addi tional con­ ceptual or mathematical difficulties.

Ii

• H eterosk edasticity. A pplied econometricians routin ely use het eroskedas­ tieity -robust standard errors to elimina te worries about whe ther hetero ­ skedasticity is presen t or not. In thi s book. we move be yond treating heterosked asticity as an exception or a "problem" to he "solved"; instead, we allow for hete roskedasticity fro m the outset and simply use heteroskedasticity­

PREFACE

xxxi

robust standard errors. We present homoskedasticity as a special case that pro­ vides a theoretical motivation for OLS.

Skilled Producers, Sophisticated Consumers We hope that students using this book will become sophisticated consumers of empirical analysis. To do so, they must learn not only how to use the tools of regression analysis, but also how to assess the validity of empirical analyses pre­ sented to them. Our approach to teaching how to assess an empirical study is threefold. First, immediately after introducing the main tools of regression analysis, we devote Chapter 9 to the threats to internal and external validity of an empirical study. This chapter discusses data problems and issues of generalizing findings to other set­ tings. It also examines the main threats to regression analysis, including omitted variables, functional form misspecification, errors-in-variables, selection, and simultaneity-and ways to recognize these threats in practice. Second, we apply these methods for assessing empirical studies to the empir­ ical analysis of the ongoing examples in the book. We do so by considering alter­ native specifications and by systematically addressing the various threats to validity of the analyses presented in the book. Third, to become sophisticated consumers, students need firsthand experience as producers. Active learning beats passive learning, and econometrics is an ideal course for active learning. For this reason , the textbook Web site features data sets, software, and suggestions for empirical exercises of differing scopes. These web resources have been expanded considerably for the second edition.

Approach to Mathematics and Level of Rigor Our aim is for students to develop a sophisticated understanding of the tools of modern regression analysis, whether the course is taught at a "high" or a "low" level of mathematics. Parts I-IV of the text (which cover the substantive material) are accessible to students with only precalculus mathematics. Parts I-IV have fewer equations, and more applications, than many introductory econometrics books, and far fewe r equations than books aimed at mathematical sections of under­ graduate courses. But more equations do not imply a more sophisticated treat­ ment. In our experience, a more mathematical treatment does not lead to a deeper understanding for most students. This said, different students learn differently, and for the mathematically weU ­ prepared students, le arning can be enhanced by a more explicitly mathematical treatment. Part V therefore contains an introduction to econometric theory that

I,

xxxii

PREfACE

is appropriate lor students with a stronge r mathematical background. We believe that, when the mathema tical chapters in Pa rt V are used in conjuDction with the ma terial in Parts I-iV, this book is suita ble for advanced undergraduate or mas­ ter's level econometrics courses.

Changes to the Second Edition The cb a nges introduced in th c second edition fall into three ca tegories: more empi rica l exa mples: ex pa nded theoretica l ma terial, especially in the treatmen t of the core regression topics; and additional student exercises.

More empirical examples. TIl e second edition retains tbe empirical examples from the first edition and adds a significan t number of new ones. T11cse additio na l examples include estima tion of the returns to education; infe rence abou t the gen­ der gap in earnings: the diffic ulty of for ecasting the stock market; and modeli ng the volatility clustering in stock returns. The data se ts for these empirical exam­ ples are posted on the course Web site. The secoml ed ition also includes more gen­ eral-interest boxes, for example how sample selection bias ("survivorship bias" ) can produce misleading conclusions about whether actively man aged mutual fun ds actually beat the marker. Expanded theoretical materiaL. The phil osophy of tbis and tbe previous edi­ tion is that t he modeling assumptions should be motivated by empirical applica­ tions. For this reason, our th re e basic least sq uares assumpt io ns tha t under pin regression with a single regressor includ e neith er normali ty nor homoskedastic­ ity, bot h o f which are argu ably the exception in eco nometric a pp li cations. This leads directly to large-sample inference using heteroskedasticity-robust standard errors. O ur experie nce is that students do not find th is difficult- in fact, wh at they fin d difficult is the traditiona l approach of introducing tile homoskedasticity and normali ty assum ptions, learn ing how to use {- and F-tahl es, then being told that wh at they just learned is not reliabl e in applications because of the failure of these assumptions and that these " p roblems" mu st be '·fi xed." But n ot all instruc tors sh are th is view, anti some find it useful to introd uce the homos kedastic no rm al regressi on model. Moreover, even if homosk ed asticity is the exception instead of the rule. assuming homoskedastjcity permits djscllssing the G au ss-M arkov theo ­ rem , a key moti vation fo r using ordin ary least squares (OLS). For thcse reasons. the treatm ent of the core regression mate rial has been sig­ nificantly expanded in the second edition. and now includes sections on the theo­ retical motivation for OLS (the Gauss-Markov theorem). small --;ample inference in the homoskedastic normal model. ,md multicolline .

1.3

Data: Sources and Types In econometric.. data come from One of 1\\0 sou rces: experimenls or nonexpcri­ mental observations of the world.This book examines both expe nmental and oon­ e;o.:perimental dahl sets.

Experimental versus Observational Data k: ~p eri l1lenl j) 1

data come fro m experiments t.lesigned to cvahl(lIe a treatment or policy or 10 invcstigute R causal effect. For e." ampil:. the statl;: ofTcnncsscc finance d a I ~rge random it.ed controlled experiment examining class site in the 19SOs. In that expe riment. which we examine in Chllpter 13, th ousand~ of "tut.lcms were ra n­ domly u~signcd \0 d.:l ss~ of d ifferent si7es for sevcral years ami were given ann ual standardil.cd tests. The ' lcnncssee class size ex perimen t cost millions of dollars and required the o ngoing cooperalion of many adm inistrators, pa renl-.... and leachers m er several years. Becaulll! real-world experiments ~it h human llubJects .Ire difficuilio :ldmin­ ister and torontrol.thcy have flaws relative 10 ideal randomi7ed controlled exper­ imcnts. Moreover, 1Il some circumstances experiments arc not on ly expensive and difficulllO adminbter but also unethical. (Would it be eth ical to offer randomly selected tecnag.ers incxpensi\c cig.arcllcs to ~c how ma n) they buy?) Because of these financial. practical . and ethical problems. experi ments in c(;Qnomi(::) are ra re. Instead, mOllt economic data are obtained by observing real-\\ orld behavior. Data obtained by observing actual behavior outside an t"xperimclltai setting O. Thb

y --4

l~

wri llc n as

Ilr.

111C law of large n umbers says t hat if Y I' i - I. ... . 11 a rc ind..::pt:nden tly and idc niitillly dist ri huted with £ ( Vi) = jJ. y and if large o utlie rs Il Te unli kely (ICChlli­ , cally if \'ar( V,) = if y < '".I), the n Y ~ J.l.r.

,

\ariable, where (from Table 2.2) the proba bility Ihal Y, = 1 is O.7R. Beca use thr: expectat ion of a Bernoulli ra nd om variab l~ is its success probability. E( Y;) -"" /-L y ::: 0.78. The sample avcnlge Y is the fract ion of d AYS in her sample in which her commu te was shurt. Figure 2.8 sho\\·s the sam pling distribution oCY for various sample sizes n . When II = 2 (figure 2.8a). Y can take on on ly threc val ues: O,~.
View more...

Comments

Copyright ©2017 KUPDF Inc.
SUPPORT KUPDF