1_LinearRegression


Advanced regression methods
1. Reminders on linear regression

V. Lefieux

Table of contents

- Case study
- Simple linear regression
- Gaussian simple linear regression
- Multiple linear regression
- Gaussian multiple linear regression
- Regression in practice
- References

Case study

Forecast the ozone concentration

112 records from Rennes, summer 2001 (source: Laboratoire de mathématiques appliquées, Agrocampus Ouest), containing:

- MaxO3: daily maximum of the ozone concentration (µg/m³).
- T9, T12, T15: temperature (at 09:00, 12:00 and 15:00).
- Ne9, Ne12, Ne15: cloud cover (at 09:00, 12:00 and 15:00).
- Vx9, Vx12, Vx15: east-west component of the wind (at 09:00, 12:00 and 15:00).
- MaxO3v: daily maximum of the ozone concentration for the previous day.
- wind: wind direction at 12:00.
- rain: rainy or dry day.
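The data can be inspected with a short, hedged sketch; the file name `ozone.txt`, the separator and the exact column names are assumptions, since the slides only describe the variables:

```python
# Minimal sketch: load the ozone dataset described above.
# The file name "ozone.txt", the separator and the column names are assumptions.
import pandas as pd

ozone = pd.read_csv("ozone.txt", sep=";")
print(ozone.shape)                                  # 112 records expected
print(ozone[["MaxO3", "T12", "Ne12", "Vx12", "MaxO3v"]].describe())
```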

A linear link?

[Figure omitted in this text version.]

Simple linear regression

Terminology

The word regression was introduced by Francis Galton in "Regression towards mediocrity in hereditary stature" (Galton, 1886) (http://galton.org/essays/1880-1889/galton-1886-jaigi-regression-stature.pdf). He observed that extreme heights in parents are not completely passed on to their offspring.

The best-known regression is the linear model, simple or multiple (Cornillon and Matzner-Løber, 2010), but there are many other models, including non-linear ones (Antoniadis et al., 1992).

Assumptions

Y: dependent variable, random.
X: explanatory variable, deterministic.

Simple linear regression assumes:

Y = β1 + β2 X + ε

where:

- β1 and β2 are unknown parameters (unobserved),
- ε, the error of the model, is a centered random variable with variance σ²: E(ε) = 0, Var(ε) = σ².

Sample

Let (x_i, y_i)_{i ∈ {1,...,n}} be n realizations of (X, Y):

∀i ∈ {1,...,n}: y_i = β1 + β2 x_i + ε_i.

One assumes that:

- (y_i)_{i ∈ {1,...,n}} is an i.i.d. sample of Y,
- (x_i)_{i ∈ {1,...,n}} are deterministic (and observed),
- (ε_i)_{i ∈ {1,...,n}} is an i.i.d. sample of ε (unobserved),
- (ε_i)_{i ∈ {1,...,n}}, for (i, j) ∈ {1,...,n}², satisfy:
  - E(ε_i) = 0,
  - Var(ε_i) = σ²,
  - Cov(ε_i, ε_j) = 0 if i ≠ j.

Matrix form 1/3

( y1 )   ( 1  x1 )            ( ε1 )
( ⋮  ) = ( ⋮   ⋮ ) ( β1 )  +  ( ⋮  )
( yn )   ( 1  xn ) ( β2 )     ( εn )

Matrix form 2/3

Or:

Y = Xβ + ε

where Y = (y1, ..., yn)', X is the n × 2 matrix whose i-th row is (1, x_i), β = (β1, β2)', and ε = (ε1, ..., εn)'.

Matrix form 3/3

We have:

E(ε) = 0,   Γ_ε = σ² I_n

where Γ_ε is the variance-covariance matrix of ε.

Ordinary least squares estimator 1/5

The ordinary least squares (OLS) estimators of β1 and β2, denoted β̂1 and β̂2, minimize:

S(β1, β2) = Σ_{i=1}^{n} (y_i − β1 − β2 x_i)²

so:

(β̂1, β̂2) = arg min_{(β1, β2)} S(β1, β2) = arg min_{(β1, β2)} Σ_{i=1}^{n} (y_i − β1 − β2 x_i)².

σ² must also be estimated.

Ordinary least squares estimator 2/5

The regression line is:

[Figure omitted in this text version.]

where ŷ_i = β̂1 + β̂2 x_i.

Ordinary least squares estimator 3/5

It isn't the perpendicular distance between the points and the regression line that we minimize, but the vertical one:

[Figure omitted in this text version.]

Ordinary least squares estimator 4/5

It's possible to consider other distances, for example:

S(β1, β2) = Σ_{i=1}^{n} |y_i − β1 − β2 x_i|.

Ordinary least squares estimators are sensitive to outliers, but they are easily obtained (by differentiation) and are unique (if the x_i aren't all equal).

Ordinary least squares estimator 5/5

We assume that the (x_i)_{i ∈ {1,...,n}} aren't all equal (otherwise the columns of X are collinear and the solution isn't unique). The ordinary least squares estimator of (β1, β2) is:

β̂2 = s_{X,Y} / s_X²,
β̂1 = ȳ − β̂2 x̄.
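As an illustration of these closed-form estimators, a minimal numpy sketch on simulated data (not the ozone case study):

```python
# Simple linear regression by OLS, using the closed forms
# beta2_hat = s_XY / s_X^2 and beta1_hat = ybar - beta2_hat * xbar.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 30, size=50)             # illustrative explanatory variable
y = 20 + 2.5 * x + rng.normal(0, 5, 50)     # illustrative response

x_bar, y_bar = x.mean(), y.mean()
s_xy = np.mean((x - x_bar) * (y - y_bar))   # empirical covariance s_XY
s_x2 = np.mean((x - x_bar) ** 2)            # empirical variance s_X^2

beta2_hat = s_xy / s_x2
beta1_hat = y_bar - beta2_hat * x_bar
print(beta1_hat, beta2_hat)                 # should be close to 20 and 2.5
```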

Proof 1/3

We search for β̂1 and β̂2 which minimize:

S(β1, β2) = Σ_{i=1}^{n} (y_i − β1 − β2 x_i)².

The Hessian matrix of the function S: R² → R, (β1, β2) ↦ S(β1, β2), is positive definite. The Hessian matrix, H(S) (or ∇²S), is:

H(S) = [ ∂²S(β1,β2)/∂β1²     ∂²S(β1,β2)/∂β1∂β2 ]
       [ ∂²S(β1,β2)/∂β2∂β1   ∂²S(β1,β2)/∂β2²   ]

Proof 2/3

We search for the values such that:

∇S(β1, β2) = ( ∂S(β1,β2)/∂β1 , ∂S(β1,β2)/∂β2 )' = 0.

We have:

∂S(β1, β2)/∂β1 = −2 Σ_{i=1}^{n} (y_i − β1 − β2 x_i),
∂S(β1, β2)/∂β2 = −2 Σ_{i=1}^{n} x_i (y_i − β1 − β2 x_i).

So:

Σ_{i=1}^{n} (y_i − β̂1 − β̂2 x_i) = 0,
Σ_{i=1}^{n} x_i (y_i − β̂1 − β̂2 x_i) = 0.

Proof 3/3

The first equation gives:

n β̂1 + β̂2 Σ_{i=1}^{n} x_i = Σ_{i=1}^{n} y_i   ⟺   β̂1 + β̂2 x̄ = ȳ.

The second equation gives:

β̂1 Σ_{i=1}^{n} x_i + β̂2 Σ_{i=1}^{n} x_i² = Σ_{i=1}^{n} x_i y_i.

So:

(ȳ − β̂2 x̄) Σ_{i=1}^{n} x_i + β̂2 Σ_{i=1}^{n} x_i² = Σ_{i=1}^{n} x_i y_i
⟺ β̂2 ( (1/n) Σ_{i=1}^{n} x_i² − x̄² ) = (1/n) Σ_{i=1}^{n} x_i y_i − x̄ ȳ
⟺ β̂2 = s_{X,Y} / s_X².

Regression line

The regression line is:

y = β̂1 + β̂2 x.

The barycenter (x̄, ȳ) belongs to the regression line.

Fitted value

The fitted value of the i-th observation is:

ŷ_i = β̂1 + β̂2 x_i.

Residual

The residual of the i-th observation is:

e_i = y_i − ŷ_i.

e_i is an estimate of ε_i.

We have:

S(β̂1, β̂2) = Σ_{i=1}^{n} e_i²

and:

S(β1, β2) = Σ_{i=1}^{n} ε_i².

Properties of fitted values

- Σ_{i=1}^{n} ŷ_i = Σ_{i=1}^{n} y_i.
- Var(ŷ_i) = σ² ( 1/n + (x_i − x̄)² / Σ_{j=1}^{n} (x_j − x̄)² ).

Properties of residuals

- Σ_{i=1}^{n} e_i = 0.

Properties of β̂1 and β̂2

β̂1 and β̂2 are unbiased estimators of β1 and β2:

∀i ∈ {1, 2}: E(β̂_i) = β_i.

Var(β̂1) = σ² Σ_{i=1}^{n} x_i² / ( n Σ_{i=1}^{n} (x_i − x̄)² ).

Var(β̂2) = σ² / Σ_{i=1}^{n} (x_i − x̄)².

Cov(β̂1, β̂2) = −σ² x̄ / Σ_{i=1}^{n} (x_i − x̄)².

Linear estimators

It's possible to find (λ1, λ2) ∈ (Rⁿ)² such that β̂1 = λ1' Y and β̂2 = λ2' Y:

β̂2 = Σ_{i=1}^{n} [ (x_i − x̄) / Σ_{j=1}^{n} (x_j − x̄)² ] y_i,

β̂1 = ȳ − β̂2 x̄ = Σ_{i=1}^{n} [ 1/n − x̄ (x_i − x̄) / Σ_{j=1}^{n} (x_j − x̄)² ] y_i.

Gauss-Markov theorem

The theorem states that the OLS estimators β̂1 and β̂2 have the smallest variance among unbiased linear estimators.

They are BLUE: Best Linear Unbiased Estimators.

Residual variance

We consider the following unbiased estimator of σ², called the residual variance:

σ̂² = (1/(n − 2)) Σ_{i=1}^{n} e_i².

Forecast

For an observation x_{n+1}, we call forecast (or prediction):

ŷᵖ_{n+1} = β̂1 + β̂2 x_{n+1}.

It's a forecast of:

y_{n+1} = β1 + β2 x_{n+1} + ε_{n+1}.

We have:

Var(ŷᵖ_{n+1}) = σ² ( 1/n + (x_{n+1} − x̄)² / Σ_{i=1}^{n} (x_i − x̄)² ).

Forecast error

The forecast error is:

eᵖ_{n+1} = y_{n+1} − ŷᵖ_{n+1}.

We have:

E(eᵖ_{n+1}) = 0

and:

Var(eᵖ_{n+1}) = σ² ( 1 + 1/n + (x_{n+1} − x̄)² / Σ_{i=1}^{n} (x_i − x̄)² ).
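A minimal numpy sketch of the point forecast and its estimated forecast-error variance, on simulated data:

```python
# Forecast at a new point and estimated forecast-error variance for a
# simple linear regression fitted by OLS (illustrative data).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 30, size=50)
y = 20 + 2.5 * x + rng.normal(0, 5, 50)
n = len(x)

beta2_hat = np.mean((x - x.mean()) * (y - y.mean())) / np.mean((x - x.mean()) ** 2)
beta1_hat = y.mean() - beta2_hat * x.mean()

e = y - (beta1_hat + beta2_hat * x)
sigma2_hat = e @ e / (n - 2)                       # unbiased residual variance

x_new = 25.0                                       # new observation x_{n+1}
y_pred = beta1_hat + beta2_hat * x_new             # point forecast
sxx = np.sum((x - x.mean()) ** 2)
var_forecast_error = sigma2_hat * (1 + 1 / n + (x_new - x.mean()) ** 2 / sxx)
print(y_pred, np.sqrt(var_forecast_error))         # forecast and its standard error
```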

Geometric interpretation 1/2

We have:

Y = β1 1_n + β2 X + ε

where X = (x1, ..., xn)'.

We can rewrite:

S(β1, β2) = Σ_{i=1}^{n} (y_i − (β1 + β2 x_i))² = ‖ Y − (β1 1_n + β2 X) ‖²

where ‖x‖ is the Euclidean norm of x ∈ Rⁿ.

Geometric interpretation 2/2

We have:

‖ Y − Ŷ ‖² = min_{(β1, β2)} ‖ Y − (β1 1_n + β2 X) ‖²

with Ŷ = β̂1 1_n + β̂2 X.

Ŷ is the orthogonal projection of Y on the subspace spanned by 1_n and X (sp{1_n, X}).

β̂1 and β̂2 are the coordinates of the projection Ŷ.

Introduction to the coefficient of determination

We want to explain the variations of Y by the variations of X. The variations of Y are the differences between the y_i and their average ȳ: y_i − ȳ.

We can write:

y_i − ȳ = (ŷ_i − ȳ) + (y_i − ŷ_i)

where ŷ_i − ȳ is the variation explained by the model.

Analysis of variance

ANOVA (ANalysis Of VAriance) states:

SST = SSM + SSR

Σ_{i=1}^{n} (y_i − ȳ)² = Σ_{i=1}^{n} (ŷ_i − ȳ)² + Σ_{i=1}^{n} (y_i − ŷ_i)²

‖ Y − ȳ 1_n ‖² = ‖ Ŷ − ȳ 1_n ‖² + ‖ Y − Ŷ ‖²

where SST is the Total Sum of Squares, SSM the Model Sum of Squares and SSR the Residual Sum of Squares.

Coefficient of determination 1/3

The coefficient of determination R² is defined by:

R² = SSM / SST = ‖ Ŷ − ȳ 1_n ‖² / ‖ Y − ȳ 1_n ‖².

R² ∈ [0, 1] because:

0 ≤ SSM ≤ SST.

If R² = 1 then SSM = SST. If R² = 0 then SSR = SST.

Coefficient of determination 2/3

For the simple linear regression:

(1/n) Σ_{i=1}^{n} (ŷ_i − ȳ)² = (1/n) Σ_{i=1}^{n} ( ȳ − (s_{X,Y}/s_X²) x̄ + (s_{X,Y}/s_X²) x_i − ȳ )²
                            = (s_{X,Y}/s_X²)² (1/n) Σ_{i=1}^{n} (x_i − x̄)²
                            = s_{X,Y}² / s_X².

So:

R² = SSM / SST = s_Ŷ² / s_Y² = ( s_{X,Y} / (s_X s_Y) )² = r²_{X,Y}.
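A quick numerical check, on simulated data, that R² equals the squared empirical correlation for a simple regression with constant:

```python
# Check that R^2 = r_{X,Y}^2 for simple linear regression (illustrative data).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1.0 + 0.8 * x + rng.normal(size=100)

beta2 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
beta1 = y.mean() - beta2 * x.mean()
y_hat = beta1 + beta2 * x

ssm = np.sum((y_hat - y.mean()) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = ssm / sst
print(r2, np.corrcoef(x, y)[0, 1] ** 2)    # the two values coincide
```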

Coefficient of determination 3/3

Don't over-interpret the coefficient of determination:

- For a good model, R² is close to 1.
- But an R² close to 1 doesn't indicate that a linear model is adequate.
- An R² close to 0 indicates that a linear model isn't adequate (some non-linear models could be adequate).

Limitations

It's impossible to build confidence intervals for the parameters, or to test their significance: it becomes possible if we add a distributional assumption.

Gaussian simple linear regression assumes that the error is Gaussian.

Gaussian simple linear regression

Assumptions

We add:

ε ∼ N(0, σ²).

So (ε_i)_{i ∈ {1,...,n}} is an i.i.d. sample with distribution N(0, σ²).

Distribution of Y

We have:

Y ∼ N_n( (β1 + β2 x1, ..., β1 + β2 xn)', σ² I_n ).

Maximum likelihood estimation 1/3

The likelihood of (ε1, ..., εn) is:

p(ε1, ..., εn; β1, β2, σ²) = ( 1/(σ√(2π)) )ⁿ exp( −(1/2) Σ_{i=1}^{n} ((y_i − β1 − β2 x_i)/σ)² )
                           = ( 1/(σ√(2π)) )ⁿ exp( −S(β1, β2)/(2σ²) ).

So the log-likelihood is:

ℓ(ε1, ..., εn; β1, β2, σ²) = −n log(√(2π)) − (n/2) log(σ²) − S(β1, β2)/(2σ²).

Maximum likelihood estimation 2/3

We obtain the maximum by solving:

∂ℓ(ε1,...,εn; β1, β2, σ²)/∂β1 = 0   ⟺   ∂S(β̂_{1,ML}, β̂_{2,ML})/∂β1 = 0,
∂ℓ(ε1,...,εn; β1, β2, σ²)/∂β2 = 0   ⟺   ∂S(β̂_{1,ML}, β̂_{2,ML})/∂β2 = 0,
∂ℓ(ε1,...,εn; β1, β2, σ²)/∂σ² = 0   ⟺   −n/(2σ̂²_{ML}) + S(β̂_{1,ML}, β̂_{2,ML})/(2σ̂⁴_{ML}) = 0.

The maximum likelihood and OLS estimators of β1 and β2 are the same.

Maximum likelihood estimation 3/3

We have:

σ̂²_{ML} = S(β̂1, β̂2)/n = (1/n) Σ_{i=1}^{n} (y_i − β̂1 − β̂2 x_i)² = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² = (1/n) Σ_{i=1}^{n} e_i².

The maximum likelihood estimator of σ² is biased (but asymptotically unbiased).

Parameters distribution 1/6

With β = (β1, β2)' and β̂ = (β̂1, β̂2)', we have:

β̂ ∼ N₂(β, σ² V)

where:

V = ( 1 / Σ_{i=1}^{n} (x_i − x̄)² ) · [ (1/n) Σ_{i=1}^{n} x_i²    −x̄ ]
                                       [ −x̄                        1  ]

Parameters distribution 2/6

So:

β̂1 ∼ N( β1, σ² Σ_{i=1}^{n} x_i² / ( n Σ_{i=1}^{n} (x_i − x̄)² ) ),

β̂2 ∼ N( β2, σ² / Σ_{i=1}^{n} (x_i − x̄)² ).

The two estimators aren't independent.

Parameters distribution 3/6

The Cochran theorem gives:

(n − 2) σ̂² / σ² = Σ_{i=1}^{n} e_i² / σ² ∼ χ²_{n−2}

and:

β̂ ⊥ σ̂².

Cochran theorem

Consider X ∼ N_n(µ, σ² I_n).

Let H be the orthogonal projection matrix on a p-dimensional subspace of Rⁿ.

The Cochran theorem states that:

- HX ∼ N(Hµ, σ² H).
- HX ⊥ X − HX.
- ‖ H(X − µ) ‖² / σ² ∼ χ²_p.

Parameters distribution 4/6

With:

σ²_{β̂1} = σ² Σ_{i=1}^{n} x_i² / ( n Σ_{i=1}^{n} (x_i − x̄)² ),

σ²_{β̂2} = σ² / Σ_{i=1}^{n} (x_i − x̄)²,

we have for i ∈ {1, 2}:

β̂_i ∼ N( β_i, σ²_{β̂_i} )

and:

(β̂_i − β_i) / σ_{β̂_i} ∼ N(0, 1).

Parameters distribution 5/6

We can estimate σ² by:

σ̂² = (1/(n − 2)) Σ_{i=1}^{n} e_i².

So:

σ̂²_{β̂1} = σ̂² Σ_{i=1}^{n} x_i² / ( n Σ_{i=1}^{n} (x_i − x̄)² ),

σ̂²_{β̂2} = σ̂² / Σ_{i=1}^{n} (x_i − x̄)².

Parameters distribution 6/6

We have for i ∈ {1, 2}:

(β̂_i − β_i) / σ̂_{β̂_i} ∼ T(n − 2)

and:

(1/(2σ̂²)) (β̂ − β)' V⁻¹ (β̂ − β) ∼ F(2, n − 2).

Test on β1 and β2

For i ∈ {1, 2}, we test:

H0: β_i = c   against   H1: β_i ≠ c

where c ∈ R (the test is called a significance test for c = 0).

We use the test statistic:

T_i = (β̂_i − c) / σ̂_{β̂_i}.

We reject H0 at level α if:

|t_i| > t_{n−2, 1−α/2}

where t_{n−2, 1−α/2} is the (1 − α/2)-quantile of T(n − 2).
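A minimal sketch of this significance test for β2 (c = 0), using scipy for the Student quantile, on simulated data:

```python
# Significance test H0: beta2 = 0 against H1: beta2 != 0 at level alpha,
# for Gaussian simple linear regression (illustrative data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 40)
y = 5 + 0.7 * x + rng.normal(0, 2, 40)
n = len(x)

beta2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta1 = y.mean() - beta2 * x.mean()
e = y - beta1 - beta2 * x
sigma2 = e @ e / (n - 2)

se_beta2 = np.sqrt(sigma2 / np.sum((x - x.mean()) ** 2))   # sigma_hat of beta2_hat
t_stat = (beta2 - 0) / se_beta2
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(t_stat, t_crit, p_value, abs(t_stat) > t_crit)       # reject H0 if True
```

The interval β̂2 ± t_{n−2, 1−α/2} σ̂_{β̂2} then gives the confidence interval of the next slide.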

Confidence intervals for β1 and β2

The confidence interval for β_i at level 1 − α is:

[ β̂_i − t_{n−2, 1−α/2} σ̂_{β̂_i} ; β̂_i + t_{n−2, 1−α/2} σ̂_{β̂_i} ].

It's also possible to build a confidence region for the vector (β1, β2).

Multiple linear regression

Introduction

Multiple linear regression assumes that there is a linear link between the dependent variable Y and p explanatory variables (X1, ..., Xp):

Y = β1 X1 + β2 X2 + ... + βp Xp + ε

where ε is the error of the model.

Sample

Let (x_{i1}, ..., x_{ip}, y_i)_{i ∈ {1,...,n}} be n observations of (X1, ..., Xp, Y):

∀i ∈ {1,...,n}: y_i = β1 x_{i1} + β2 x_{i2} + ... + βp x_{ip} + ε_i.

One assumes that:

- (y_i)_{i ∈ {1,...,n}} is an i.i.d. sample of Y,
- (x_{ij})_{i ∈ {1,...,n}, j ∈ {1,...,p}} are deterministic (and observed),
- (ε_i)_{i ∈ {1,...,n}} is an i.i.d. sample of ε (unobserved),
- (ε_i)_{i ∈ {1,...,n}}, for (i, j) ∈ {1,...,n}², satisfy:
  - E(ε_i) = 0,
  - Var(ε_i) = σ²,
  - Cov(ε_i, ε_j) = 0 if i ≠ j.

Matrix form 1/2

We can write:

Y = Xβ + ε

with Y = (y1, ..., yn)', X the n × p matrix whose i-th row is (x_{i1}, ..., x_{ip}), β = (β1, ..., βp)', and ε = (ε1, ..., εn)'.

Matrix form 2/2

We have:

E(ε) = 0,   Γ_ε = σ² I_n.

So:

E(Y) = Xβ,   Γ_Y = σ² I_n.

Multiple linear regression with or without constant

If the constant is in the model, we take X1 equal to 1:

∀i ∈ {1,...,n}: x_{i1} = 1.

There are then only (p − 1) explanatory variables.

Linearization of problems

It's possible to use as explanatory variables functions of X1, ..., Xp (powers, exponentials, logarithms, ...).

Ordinary least squares estimator 1/4

The ordinary least squares (OLS) estimator of β = (β1, ..., βp)', denoted β̂ = (β̂1, ..., β̂p)', minimizes:

S(β) = Σ_{i=1}^{n} ( y_i − Σ_{j=1}^{p} βj x_{ij} )²
     = ‖ Y − Xβ ‖²
     = (Y − Xβ)' (Y − Xβ)
     = Y'Y − 2 β'X'Y + β'X'Xβ.

Ordinary least squares estimator 2/4

So:

β̂ = arg min_β ‖ Y − Xβ ‖² = arg min_β ( Y'Y − 2 β'X'Y + β'X'Xβ ).

If the rank of the matrix X is p, then X'X is invertible. This is the case if the columns of X aren't collinear.

Ordinary least squares estimator 3/4

We minimize S defined by:

S: Rᵖ → R⁺, β ↦ S(β) = Y'Y − 2 β'X'Y + β'X'Xβ.

The gradient of S is:

∇S(β) = −2 X'Y + 2 X'Xβ.

Note that the gradient of x ↦ a'x is a and that the gradient of x ↦ x'Ax is Ax + A'x (2Ax if the matrix A is symmetric).

Ordinary least squares estimator 4/4

The Hessian matrix of S is 2 X'X, which is positive definite. We need to solve:

−X'Y + X'Xβ = 0   ⟺   X'Xβ = X'Y   ⟺   β̂ = (X'X)⁻¹ X'Y.
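A minimal numpy sketch of β̂ = (X'X)⁻¹X'Y on simulated data; in practice a least-squares solver is preferable to forming the inverse explicitly:

```python
# OLS estimator beta_hat = (X'X)^{-1} X'Y for a multiple regression with constant.
import numpy as np

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])          # first column = constant
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(0, 0.3, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)       # solves X'X beta = X'Y
print(beta_hat)                                    # close to (1, 2, -0.5)

# Equivalent, numerically preferable:
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_lstsq)
```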

Fitted values and residuals

The fitted values are:

Ŷ = Xβ̂.

The residuals are:

e = Y − Ŷ.

Geometric interpretation

Ŷ is the orthogonal projection of Y on the subspace spanned by the columns of X: sp{X1, ..., Xp}.

The projection matrix (the "hat" matrix) is:

H = X (X'X)⁻¹ X'.

We can check that:

Ŷ = Xβ̂ = X (X'X)⁻¹ X'Y = HY.
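A short check, on simulated data, that H is a symmetric projection matrix and that Ŷ = HY:

```python
# Hat matrix H = X (X'X)^{-1} X' and the projection property (illustrative data).
import numpy as np

rng = np.random.default_rng(5)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(0, 0.2, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print(np.allclose(H @ H, H), np.allclose(H, H.T))   # H is a symmetric projection
print(np.allclose(X @ beta_hat, H @ Y))             # Y_hat = H Y
print(np.isclose(np.trace(H), X.shape[1]))          # trace(H) = p
```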

Properties of fitted values

- E(Ŷ) = Xβ.
- Γ_Ŷ = σ² H.

Properties of residuals

- e ∈ sp{X1, ..., Xp}⊥.
- e = Y − Ŷ = Y − HY = H⊥ Y where H⊥ = I_n − H.
- X'e = 0.
- If the constant is in the model then ⟨e, 1_n⟩ = 0, so Σ_{i=1}^{n} e_i = 0 and Σ_{i=1}^{n} ŷ_i = Σ_{i=1}^{n} y_i.
- ‖e‖² = Y' H⊥ Y.
- E(e) = 0.
- Γ_e = σ² H⊥.

Properties of β̂

β̂ is an unbiased estimator of β:

E(β̂) = β.

Γ_β̂ = σ² (X'X)⁻¹.

Gauss-Markov theorem

β̂ is the best linear unbiased estimator (BLUE) of β.

Estimation of σ²

We consider the residual variance as an estimator of σ²:

σ̂² = ‖e‖² / (n − p) = SSR / (n − p).

Thus, for the variance-covariance matrix of β̂:

Γ̂_β̂ = σ̂² (X'X)⁻¹ = ( SSR / (n − p) ) (X'X)⁻¹.

ANOVA

With a constant in the model:

SST = SSM + SSR

Σ_{i=1}^{n} (y_i − ȳ)² = Σ_{i=1}^{n} (ŷ_i − ȳ)² + Σ_{i=1}^{n} (y_i − ŷ_i)²

‖ Y − ȳ 1_n ‖² = ‖ Ŷ − ȳ 1_n ‖² + ‖ Y − Ŷ ‖².

Variance decomposition (ANOVA)

Without a constant in the model:

SST = SSM + SSR

Σ_{i=1}^{n} y_i² = Σ_{i=1}^{n} ŷ_i² + Σ_{i=1}^{n} (y_i − ŷ_i)²

‖ Y ‖² = ‖ Ŷ ‖² + ‖ Y − Ŷ ‖².

Coefficient of determination

The coefficient of determination R² ∈ [0, 1] is defined by:

R² = SSM / SST.

R² is also equal to cos²(θ) where θ is the angle between:

- Regression with constant: (Y − ȳ 1_n) and (Ŷ − ȳ 1_n).
- Regression without constant: Y and Ŷ.

Interpretation is easier in the case of a regression with constant.

Adjusted coefficient of determination

The coefficient of determination increases with p.

The adjusted coefficient of determination is defined by:

- Regression with constant: R²_adjusted = 1 − ((n − 1)/(n − p)) (1 − R²).
- Regression without constant: R²_adjusted = 1 − (n/(n − p)) (1 − R²).
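A small sketch computing R² and the adjusted R² for a regression with constant, on simulated data:

```python
# R^2 and adjusted R^2 for a multiple regression with constant (illustrative data).
import numpy as np

rng = np.random.default_rng(6)
n, p = 80, 4                                        # p includes the constant
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.8, 0.0, -0.3]) + rng.normal(0, 1.0, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat

sst = np.sum((Y - Y.mean()) ** 2)
ssr = np.sum((Y - Y_hat) ** 2)
r2 = 1 - ssr / sst
r2_adj = 1 - (n - 1) / (n - p) * (1 - r2)
print(r2, r2_adj)
```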

Gaussian multiple linear regression

Assumption

We add:

ε ∼ N(0, σ²).

So (ε_i)_{i ∈ {1,...,n}} is an i.i.d. sample with distribution N(0, σ²).

Distribution of Y

We have:

Y ∼ N_n( Xβ, σ² I_n ).

Maximum likelihood estimation 1/3

The likelihood of (ε1, ..., εn) is:

p(ε1, ..., εn; β, σ²) = ( 1/(σ√(2π)) )ⁿ exp( −‖ε‖²/(2σ²) ) = ( 1/(σ√(2π)) )ⁿ exp( −S(β)/(2σ²) ).

Thus the log-likelihood is:

ℓ(ε1, ..., εn; β, σ²) = −n log(√(2π)) − (n/2) log(σ²) − S(β)/(2σ²).

Maximum likelihood estimation 2/3

We obtain the maximum by solving:

∇_β ℓ(ε1,...,εn; β, σ²) = 0   ⟺   ∇S(β̂_ML) = 0,

∂ℓ(ε1,...,εn; β, σ²)/∂σ² = 0   ⟺   −n/(2σ̂²_ML) + S(β̂_ML)/(2σ̂⁴_ML) = 0.

The maximum likelihood and OLS estimators of β are the same.

Maximum likelihood estimation 3/3

We have:

σ̂²_ML = S(β̂)/n = ‖e‖²/n.

The maximum likelihood estimator of σ² is biased (but asymptotically unbiased).

Parameters distribution

The Cochran theorem gives:

β̂ ∼ N( β, σ² (X'X)⁻¹ ),

(n − p) σ̂² / σ² ∼ χ²_{n−p},

β̂ ⊥ σ̂².

Test

We test:

H0: Rβ = r   against   H1: Rβ ≠ r

with dim(R) = (q, p) and rank(R) = q.

The statistic used is:

F = ( (Rβ̂ − r)' ( R (X'X)⁻¹ R' )⁻¹ (Rβ̂ − r) / q ) / ( ‖ Y − Ŷ ‖² / (n − p) ).

Under H0: F ∼ F(q, n − p). We reject H0 at level α if f > f_{(q, n−p), 1−α}.

Overall significance test

For a regression with constant, we consider:

H0: β2 = ... = βp = 0   against   H1: ∃ i ∈ {2,...,p} / β_i ≠ 0.

The statistic used is:

F = ( SSM / (p − 1) ) / ( SSR / (n − p) ) = ( (n − p) / (p − 1) ) · R² / (1 − R²).

Under H0: F ∼ F(p − 1, n − p). We reject H0 at level α if f > f_{(p−1, n−p), 1−α}.
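A minimal sketch of the overall significance test, using scipy for the Fisher quantile, on simulated data:

```python
# Overall significance test H0: beta_2 = ... = beta_p = 0 (illustrative data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p = 60, 3                                         # p includes the constant
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([2.0, 1.0, 0.5]) + rng.normal(0, 1.0, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat
ssm = np.sum((Y_hat - Y.mean()) ** 2)
ssr = np.sum((Y - Y_hat) ** 2)

f_stat = (ssm / (p - 1)) / (ssr / (n - p))
f_crit = stats.f.ppf(0.95, p - 1, n - p)             # level alpha = 5%
p_value = stats.f.sf(f_stat, p - 1, n - p)
print(f_stat, f_crit, p_value, f_stat > f_crit)      # reject H0 if True
```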

ANOVA

We define:

MSM = SSM / (p − 1),
MSR = SSR / (n − p).

Source     df       SS     MS     F          p-value
Model      p − 1    SSM    MSM    MSM/MSR    P( F(p − 1, n − p) > f )
Residual   n − p    SSR    MSR
Total      n − 1    SST

Test on a parameter

Consider:

H0: β_i = c   against   H1: β_i ≠ c

for i ∈ {1,...,p}.

The statistic used is:

T = (β̂_i − c) / ( σ̂ √( (X'X)⁻¹_{ii} ) )

where (X'X)⁻¹_{ii} is the i-th diagonal value of the matrix (X'X)⁻¹.

Under H0: T ∼ T(n − p). We reject H0 at level α if |t| > t_{n−p, 1−α/2}.

Regression in practice

Atypical explanatory variables 1/2

The i-th diagonal value of H is called the leverage:

h_ii = X_i' (X'X)⁻¹ X_i

where X_i = (x_{i1}, ..., x_{ip})'.

We have 0 ≤ h_ii ≤ 1 because H' = H and H² = H, which implies:

h_ii = Σ_{j=1}^{n} h_ij² = h_ii² + Σ_{j=1, j≠i}^{n} h_ij².

Atypical explanatory variables 2/2

If h_ii is close to 1 then the h_ij (j ≠ i) are close to 0, and ŷ_i will be almost entirely explained by y_i.

Moreover Tr(H) = rank(H), so Σ_{i=1}^{n} h_ii = p.

Belsley proposed to consider observation i as atypical if:

h_ii > 2p/n.
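A minimal sketch computing the leverages h_ii and applying the 2p/n rule, on simulated data with one atypical design point:

```python
# Leverages h_ii = diagonal of the hat matrix, and the 2p/n flag (illustrative data).
import numpy as np

rng = np.random.default_rng(8)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
X[0, 1:] = 6.0                                       # make one point atypical in X

H = X @ np.linalg.inv(X.T @ X) @ X.T
leverages = np.diag(H)
print(leverages.sum())                               # equals p
flagged = np.where(leverages > 2 * p / n)[0]
print(flagged)                                       # observation 0 should appear
```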

Atypical dependent variable: principle

We use the residuals: e_i = y_i − ŷ_i.

We have: Γ_e = σ² (I_n − H).

Thus:

Var(e_i) = σ² (1 − h_ii),
Cov(e_i, e_j) = −σ² h_ij (i ≠ j).

We use studentized residuals to evaluate whether an observation is atypical.

Atypical dependent variable: internal studentized residuals

For i ∈ {1,...,n}, the internal studentized residuals are:

r_i = e_i / √( V̂ar(e_i) ) = (y_i − ŷ_i) / ( σ̂ √(1 − h_ii) ).

r_i approximately follows the distribution T(n − p − 1) (approximation used when n − p − 1 > 30).

Atypical dependent variable: external studentized residuals

We use a cross-validation criterion. Consider the leave-one-out quantities:

ŷ_i^{(−i)} = X_i' β̂^{(−i)},

β̂^{(−i)} = ( X^{(−i)'} X^{(−i)} )⁻¹ X^{(−i)'} Y^{(−i)},

σ̂²_{(−i)} = ‖ Y^{(−i)} − X^{(−i)} β̂^{(−i)} ‖² / (n − p − 1)

where X^{(−i)} and Y^{(−i)} are obtained by removing the i-th observation.

The external studentized residuals are:

R_i = ( y_i − ŷ_i^{(−i)} ) / ( σ̂_{(−i)} √( 1 + X_i' ( X^{(−i)'} X^{(−i)} )⁻¹ X_i ) ).

We have: R_i ∼ T(n − p − 1).
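A minimal sketch of the internal studentized residuals on simulated data; the external ones are obtained here through the usual leave-one-out shortcut formula rather than by explicit refits:

```python
# Internal and external studentized residuals (illustrative data).
import numpy as np

rng = np.random.default_rng(9)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(0, 1.0, n)
Y[5] += 6.0                                          # make one response atypical

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = Y - H @ Y
h = np.diag(H)
sigma2 = e @ e / (n - p)

r = e / np.sqrt(sigma2 * (1 - h))                    # internal studentized residuals
# External studentized residuals via the usual leave-one-out shortcut:
R = r * np.sqrt((n - p - 1) / (n - p - r ** 2))
print(np.argmax(np.abs(R)), R[5])                    # observation 5 stands out
```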

Observation influence

The Cook distance measures the difference between β̂ and β̂^{(−i)}. For observation i:

D_i = ( β̂ − β̂^{(−i)} )' X'X ( β̂ − β̂^{(−i)} ) / ( p σ̂² ).

Cook proposed to consider observation i as influential if:

D_i > 4 / (n − p).
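A minimal sketch of Cook's distance, computed here by explicitly refitting without each observation (a shortcut via the leverages also exists), with the 4/(n − p) threshold; simulated data:

```python
# Cook's distance by explicit leave-one-out refits (illustrative data).
import numpy as np

rng = np.random.default_rng(10)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(0, 1.0, n)
Y[7] += 8.0                                          # one influential observation

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
sigma2 = np.sum((Y - X @ beta_hat) ** 2) / (n - p)

D = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ Y[keep])
    d = beta_hat - beta_i
    D[i] = d @ (X.T @ X) @ d / (p * sigma2)

print(np.where(D > 4 / (n - p))[0])                  # observation 7 should be flagged
```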

Collinearity detection: problem

If the columns of X are collinear then the OLS estimator isn't unique.

Collinearity detection: variance inflation factor and tolerance

We compute the regression of X_j, for j ∈ {1,...,p}, on the other explanatory variables (with the constant) and we calculate the coefficient of determination R_j².

The variance inflation factor (VIF) of X_j, j ∈ {1,...,p}, is:

VIF_j = 1 / (1 − R_j²).

The tolerance (TOL) is:

TOL_j = 1 / VIF_j.

In practice, we consider that there is a collinearity problem if VIF_j > 10 (TOL_j < 0.1).
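A minimal sketch of the VIF computation on simulated data with two nearly collinear columns; each variable is regressed on the others plus a constant:

```python
# Variance inflation factors (illustrative data with built-in collinearity).
import numpy as np

rng = np.random.default_rng(11)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.05, n)                     # nearly collinear with x1
x3 = rng.normal(size=n)
Z = np.column_stack([x1, x2, x3])                    # explanatory variables

def vif(Z, j):
    """VIF of column j: regress Z[:, j] on the other columns plus a constant."""
    others = np.column_stack([np.ones(len(Z)), np.delete(Z, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, Z[:, j], rcond=None)
    resid = Z[:, j] - others @ coef
    r2_j = 1 - resid @ resid / np.sum((Z[:, j] - Z[:, j].mean()) ** 2)
    return 1 / (1 - r2_j)

print([round(vif(Z, j), 1) for j in range(Z.shape[1])])   # x1, x2 have large VIFs
```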

Collinearity detection: explanatory variables structure 1/5

In the case of a regression without constant, consider the matrix of centered and normalized explanatory variables:

X̃ = ( (X1 − X̄1 1_n) / ‖X1 − X̄1 1_n‖ , ..., (Xp − X̄p 1_n) / ‖Xp − X̄p 1_n‖ ).

Collinearity detection: explanatory variables structure 2/5

The correlation matrix of the explanatory variables is R = X̃'X̃. It's a nonnegative symmetric matrix, thus diagonalizable. Let (λ1, ..., λp) be the p eigenvalues of R in descending order. The condition numbers, for j ∈ {1,...,p}, are defined by:

CN_j = √(λ1 / λj).

In practice a value CN_j > 30 indicates a possible collinearity problem.

Collinearity detection: explanatory variables structure 3/5

We now seek to determine the groups of variables concerned. Let V be the orthogonal matrix of eigenvectors (V V' = I_p) of R:

R = V diag(λ1, ..., λp) V'.

We have for k ∈ {1,...,p}:

Var(β̂_k) = ( σ² / ‖ X_k − X̄_k 1_n ‖² ) Σ_{j=1}^{p} v_kj² / λj.

Collinearity detection: explanatory variables structure 4/5

We compute the proportion π_lk of the variance of β̂_k associated with the l-th eigenvalue λ_l:

π_lk = ( v_kl² / λl ) / ( Σ_{j=1}^{p} v_kj² / λj ).

We obtain the following table:

Eigenvalue   CN           Var(β̂1)   ...   Var(β̂k)   ...   Var(β̂p)
λ1           1            π_11       ...   π_1k       ...   π_1p
...          ...          ...              ...              ...
λj           √(λ1/λj)     π_j1       ...   π_jk       ...   π_jp
...          ...          ...              ...              ...
λp           √(λ1/λp)     π_p1       ...   π_pk       ...   π_pp

Collinearity detection: explanatory variables structure 5/5

In practice, we study the indices j with a high CN_j. For such a j, if there are at least two explanatory variables X_k and X_k' such that π_jk and π_jk' are high (more than 0.5 in practice), then a collinearity problem is suspected between these variables k and k'.

Complementary analyses

- Homoscedasticity: there are tests, but it's also possible to plot the studentized residuals against the fitted values.
- Normality of the error distribution: it's possible to use tests, for example a Shapiro-Wilk test.

Model selection: principle

Assume that we are searching for k predictors among K possible predictors. There are C(K, k) potential models.

We need:

- a criterion to compare two models,
- a search strategy.

Model selection: possible criteria

- Adjusted coefficient of determination,
- Kullback information criteria (AIC or BIC for example),
- Mallows' coefficient.

Model selection: Kullback-Leibler information criterion

Let f0 be a probability density that we want to estimate within a parameterized family F. The Kullback-Leibler divergence is a measure of distance between the true and estimated probability densities:

I(f0, F) = min_{f ∈ F} ∫ log( f0(x) / f(x) ) f0(x) dx.

This quantity is always nonnegative and is zero if f0 ∈ F.

Kullback-Leibler criteria have the following form:

I(M_k) = n log(σ̂²_{M_k}) + α(n) k

where σ̂²_{M_k} is the residual variance of M_k (one of the models with k predictors).

Model selection: AIC and BIC criteria

There are several choices for α:

- The Akaike criterion (AIC: Akaike Information Criterion), with α(n) = 2:

  AIC = n log(σ̂²_{M_k}) + 2k.

- The Schwarz criterion (BIC: Bayesian Information Criterion, equivalent to SBC: Schwarz Bayesian Criterion), with α(n) = log(n):

  BIC = n log(σ̂²_{M_k}) + k log(n).
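A minimal sketch of these criteria in the form n log σ̂²_ML + α(n)k; here k is taken as the total number of estimated coefficients (constant included), which shifts every model containing the constant by the same amount; simulated data:

```python
# AIC / BIC in the form n*log(sigma2_ML) + alpha(n)*k for candidate models.
import numpy as np

def aic_bic(X, Y):
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sigma2_ml = np.sum((Y - X @ beta) ** 2) / n      # ML (biased) residual variance
    aic = n * np.log(sigma2_ml) + 2 * k
    bic = n * np.log(sigma2_ml) + k * np.log(n)
    return aic, bic

rng = np.random.default_rng(12)
n = 80
x1, x2 = rng.normal(size=n), rng.normal(size=n)
Y = 1 + 2 * x1 + rng.normal(0, 1.0, n)               # x2 is irrelevant

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])
print(aic_bic(X_small, Y), aic_bic(X_big, Y))        # smaller is better
```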

Model selection: Mallows criterion

The Mallows criterion Cp (here p doesn't refer to the number of predictors) for a model M_k with k predictors is defined by:

Cp = SSR(M_k) / σ̂² − n + 2k.

Model selection: search strategy

- Forward procedure:
  - Start with no variables in the model.
  - Add the variable whose inclusion gives the most statistically significant improvement (given the chosen criterion).
  - Repeat until no further inclusion improves the model.
- Backward procedure:
  - Start with all variables in the model.
  - Remove the variable whose exclusion gives the most statistically significant improvement (given the chosen criterion).
  - Repeat until no further exclusion improves the model.
- Stepwise procedure: a combination of the forward and backward procedures.

A sketch of the forward procedure, driven by AIC, is given below.
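A minimal sketch of the forward procedure, with AIC as the comparison criterion (the slides leave the criterion open); simulated data:

```python
# Forward selection driven by AIC (one possible criterion), illustrative data.
import numpy as np

def aic(X, Y):
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sigma2_ml = np.sum((Y - X @ beta) ** 2) / n
    return n * np.log(sigma2_ml) + 2 * k

rng = np.random.default_rng(13)
n, K = 100, 5
Z = rng.normal(size=(n, K))                          # K candidate predictors
Y = 1 + 2 * Z[:, 0] - 1.5 * Z[:, 3] + rng.normal(0, 1.0, n)

selected, remaining = [], list(range(K))
current = aic(np.ones((n, 1)), Y)                    # start from the constant only
improved = True
while improved and remaining:
    scores = {j: aic(np.column_stack([np.ones(n), Z[:, selected + [j]]]), Y)
              for j in remaining}
    best = min(scores, key=scores.get)
    improved = scores[best] < current
    if improved:
        selected.append(best)
        remaining.remove(best)
        current = scores[best]

print(sorted(selected))                              # should recover {0, 3}
```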

References

Antoniadis, A., Berruyer, J., and Carmona, R. (1992). Régression non linéaire et applications. Economica.

Cornillon, P.-A. and Matzner-Løber, E. (2010). Régression avec R. Springer.

Galton, F. (1886). Regression towards mediocrity in hereditary stature. The Journal of the Anthropological Institute of Great Britain and Ireland, 15:246–263.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Monographs on Statistics & Applied Probability (37). Chapman & Hall.
