Advanced regression methods
1. Reminders on linear regression
V. Lefieux
Table of contents

Case study
Simple linear regression
Gaussian simple linear regression
Multiple linear regression
Gaussian multiple linear regression
Regression in practice
References
Case study
Forecast the ozone concentration

112 records from Rennes, summer 2001 (source: Laboratoire de mathématiques appliquées, Agrocampus Ouest), containing:
- MaxO3: daily maximum of the ozone concentration (µg/m³).
- T9, T12, T15: temperature (at 09:00, 12:00 and 15:00).
- Ne9, Ne12, Ne15: cloud cover (at 09:00, 12:00 and 15:00).
- Vx9, Vx12, Vx15: east-west component of the wind (at 09:00, 12:00 and 15:00).
- MaxO3v: daily maximum of the ozone concentration for the day before.
- wind: wind direction at 12:00.
- rain: rainy or dry day.
A linear link?

[Scatter plot of the data.]
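A quick way to look for a linear link is to plot the dependent variable against one explanatory variable. The sketch below is a minimal Python example, not part of the original course material; the file name "ozone.txt", the separator and the exact column spellings are assumptions to adapt to your copy of the data.

```python
# Minimal sketch: eyeball the link between temperature at 12:00 and daily max ozone.
# Assumes the records sit in a local text file with the columns listed above.
import pandas as pd
import matplotlib.pyplot as plt

ozone = pd.read_csv("ozone.txt", sep=";")   # hypothetical path and separator
ozone.plot.scatter(x="T12", y="MaxO3")      # adjust names to match your file
plt.xlabel("Temperature at 12:00 (°C)")
plt.ylabel("Daily maximum ozone (µg/m³)")
plt.title("A linear link?")
plt.show()
```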
Simple linear regression
Terminology

The word regression was introduced by Francis Galton in Regression towards mediocrity in hereditary stature (Galton, 1886), http://galton.org/essays/1880-1889/galton-1886-jaigi-regression-stature.pdf. He observed that extreme heights in parents are not completely passed on to their offspring.

The best-known regression is the linear model, simple or multiple (Cornillon and Matzner-Løber, 2010), but there are many other models, including non-linear ones (Antoniadis et al., 1992).
Assumptions

Y: dependent variable, random. X: explanatory variable, deterministic.

Simple linear regression assumes:
$$Y = \beta_1 + \beta_2 X + \varepsilon$$
where:
- β1 and β2 are unknown (unobserved) parameters,
- ε, the error of the model, is a centered random variable with variance σ²: E(ε) = 0, Var(ε) = σ².
Sample

Let (xi, yi), i ∈ {1,...,n}, be n realizations of (X, Y):
$$\forall i \in \{1,\dots,n\} : \; y_i = \beta_1 + \beta_2 x_i + \varepsilon_i .$$
We assume that:
- (yi), i ∈ {1,...,n}, is an i.i.d. sample of Y,
- (xi), i ∈ {1,...,n}, are deterministic (and observed),
- (εi), i ∈ {1,...,n}, is an i.i.d. sample of ε (unobserved).
The (εi), for (i, j) ∈ {1,...,n}², satisfy:
$$E(\varepsilon_i) = 0, \quad Var(\varepsilon_i) = \sigma^2, \quad Cov(\varepsilon_i, \varepsilon_j) = 0 \;\text{ if } i \neq j .$$
Matrix form 1/3

$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
= \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
+ \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
Matrix form 2/3

Or:
$$Y = X\beta + \varepsilon$$
where:
$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix} .$$
Matrix form 3/3

We have:
$$E(\varepsilon) = 0, \quad \Gamma_\varepsilon = \sigma^2 I_n$$
where Γε is the variance-covariance matrix of ε.
Ordinary least square estimator 1/5

The ordinary least square (OLS) estimators of β1 and β2, written β̂1 and β̂2, minimize:
$$S(\beta_1, \beta_2) = \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i)^2$$
so:
$$(\hat\beta_1, \hat\beta_2) = \arg\min_{(\beta_1, \beta_2)} S(\beta_1, \beta_2)
= \arg\min_{(\beta_1, \beta_2)} \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i)^2 .$$
σ² must also be estimated.
Ordinary least square estimator 2/5

[Figure: the regression line through the scatter plot, where ŷi = β̂1 + β̂2 xi.]
Ordinary least square estimator 3/5

It isn't the orthogonal distance between the points and the regression line that we minimize, but the vertical distances (the squared residuals).
Ordinary least square estimator 4/5

It's possible to consider other distances, for example:
$$S^*(\beta_1, \beta_2) = \sum_{i=1}^{n} \left| y_i - \beta_1 - \beta_2 x_i \right| .$$
Ordinary least square estimators are sensitive to outliers, but they are easily obtained (by differentiation) and are unique (if the xi aren't all equal).
Ordinary least square estimator 5/5

We assume that the (xi), i ∈ {1,...,n}, aren't all equal (otherwise the columns of X are collinear and the solution isn't unique). The ordinary least square estimator of (β1, β2) is:
$$\hat\beta_2 = \frac{s_{XY}}{s_X^2}, \qquad \hat\beta_1 = \bar y - \hat\beta_2 \bar x .$$
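These closed-form estimators are easy to compute directly. The sketch below (plain NumPy, not taken from the course material) implements β̂2 = s_XY/s_X² and β̂1 = ȳ − β̂2 x̄ and checks them on synthetic data.

```python
import numpy as np

def ols_simple(x, y):
    """OLS estimates for y = b1 + b2*x: b2 = s_XY / s_X^2, b1 = ybar - b2*xbar."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    s_xy = np.mean((x - x.mean()) * (y - y.mean()))   # empirical covariance (1/n convention)
    s_xx = np.mean((x - x.mean()) ** 2)               # empirical variance of x
    b2 = s_xy / s_xx
    b1 = y.mean() - b2 * x.mean()
    return b1, b2

# tiny check on synthetic data
rng = np.random.default_rng(0)
x = rng.uniform(10, 30, size=50)
y = 20 + 3 * x + rng.normal(0, 5, size=50)
print(ols_simple(x, y))   # should be close to (20, 3)
```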
Proof 1/3

We search for the β1 and β2 which minimize:
$$S(\beta_1, \beta_2) = \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i)^2 .$$
The Hessian matrix of the function S : R² → R, (β1, β2) ↦ S(β1, β2), is positive definite. The Hessian matrix, H(S) (or ∇²S), is:
$$H(S) = \begin{pmatrix}
\frac{\partial^2 S(\beta_1,\beta_2)}{\partial \beta_1^2} & \frac{\partial^2 S(\beta_1,\beta_2)}{\partial \beta_1 \partial \beta_2} \\[4pt]
\frac{\partial^2 S(\beta_1,\beta_2)}{\partial \beta_2 \partial \beta_1} & \frac{\partial^2 S(\beta_1,\beta_2)}{\partial \beta_2^2}
\end{pmatrix} .$$
Proof 2/3

We search for values such that:
$$\nabla S(\beta_1, \beta_2) = \begin{pmatrix} \frac{\partial S(\beta_1,\beta_2)}{\partial \beta_1} \\[4pt] \frac{\partial S(\beta_1,\beta_2)}{\partial \beta_2} \end{pmatrix} = 0 .$$
We have:
$$\frac{\partial S(\beta_1, \beta_2)}{\partial \beta_1} = -2 \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i), \qquad
\frac{\partial S(\beta_1, \beta_2)}{\partial \beta_2} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_1 - \beta_2 x_i) .$$
So:
$$\begin{cases} \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i) = 0 \\ \sum_{i=1}^{n} x_i (y_i - \beta_1 - \beta_2 x_i) = 0 \end{cases}$$
Proof 3/3

The first equation gives:
$$n \beta_1 + \beta_2 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \;\Leftrightarrow\; \beta_1 + \beta_2 \bar x = \bar y .$$
The second equation gives:
$$\beta_1 \sum_{i=1}^{n} x_i + \beta_2 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i .$$
So, substituting β1 = ȳ − β2 x̄:
$$(\bar y - \beta_2 \bar x) \sum_{i=1}^{n} x_i + \beta_2 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
\;\Leftrightarrow\; \beta_2 \Big( \sum_{i=1}^{n} x_i^2 - n \bar x^2 \Big) = \sum_{i=1}^{n} x_i y_i - n \bar x \bar y
\;\Leftrightarrow\; \hat\beta_2 = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar x \bar y}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar x^2} = \frac{s_{X,Y}}{s_X^2} .$$
Regression line

The regression line is:
$$y = \hat\beta_1 + \hat\beta_2 x .$$
The barycenter (x̄, ȳ) belongs to the regression line.
Fitted value

The fitted value of the i-th observation is:
$$\hat y_i = \hat\beta_1 + \hat\beta_2 x_i .$$
Residual

The residual of the i-th observation is:
$$e_i = y_i - \hat y_i .$$
ei is an estimation of εi.
We have:
$$S(\hat\beta_1, \hat\beta_2) = \sum_{i=1}^{n} e_i^2 \qquad \text{and} \qquad S(\beta_1, \beta_2) = \sum_{i=1}^{n} \varepsilon_i^2 .$$
Properties of fitted values

$$\sum_{i=1}^{n} \hat y_i = \sum_{i=1}^{n} y_i .$$
$$Var(\hat y_i) = \sigma^2 \left( \frac{1}{n} + \frac{(x_i - \bar x)^2}{\sum_{j=1}^{n} (x_j - \bar x)^2} \right) .$$
Properties of residuals

$$\sum_{i=1}^{n} e_i = 0 .$$
Properties of β̂1 and β̂2

β̂1 and β̂2 are unbiased estimators of β1 and β2:
$$\forall i \in \{1, 2\} : \; E(\hat\beta_i) = \beta_i .$$
$$Var(\hat\beta_1) = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad
Var(\hat\beta_2) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad
Cov(\hat\beta_1, \hat\beta_2) = -\frac{\sigma^2 \bar x}{\sum_{i=1}^{n} (x_i - \bar x)^2} .$$
Linear estimators

It's possible to find (λ1, λ2) ∈ (Rⁿ)² such that β̂1 = λ1ᵀY and β̂2 = λ2ᵀY:
$$\hat\beta_2 = \sum_{i=1}^{n} \frac{x_i - \bar x}{\sum_{j=1}^{n} (x_j - \bar x)^2} \, y_i , \qquad
\hat\beta_1 = \bar y - \hat\beta_2 \bar x = \sum_{i=1}^{n} \left( \frac{1}{n} - \frac{\bar x \,(x_i - \bar x)}{\sum_{j=1}^{n} (x_j - \bar x)^2} \right) y_i .$$
Gauss-Markov theorem

The theorem states that the OLS estimators β̂1 and β̂2 have the smallest variances among unbiased linear estimators.
They are BLUE: Best Linear Unbiased Estimators.
Residual variance

We consider the following unbiased estimation of σ², called the residual variance:
$$\hat\sigma^2 = \frac{1}{n - 2} \sum_{i=1}^{n} e_i^2 .$$
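As a small illustration (a sketch, not part of the original slides), the residual variance follows directly from the fitted values:

```python
import numpy as np

def residual_variance(x, y, b1, b2):
    """Unbiased residual variance for simple regression: sum(e_i^2) / (n - 2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    e = y - (b1 + b2 * x)          # residuals e_i = y_i - yhat_i
    return np.sum(e ** 2) / (len(e) - 2)
```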
Forecast

For an observation x_{n+1}, we call forecast (or prediction):
$$\hat y_{n+1}^p = \hat\beta_1 + \hat\beta_2 x_{n+1} .$$
It's a forecast of:
$$y_{n+1} = \beta_1 + \beta_2 x_{n+1} + \varepsilon_{n+1} .$$
We have:
$$Var(\hat y_{n+1}^p) = \sigma^2 \left( \frac{1}{n} + \frac{(x_{n+1} - \bar x)^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} \right) .$$
Forecast error

The forecast error is:
$$e_{n+1}^p = y_{n+1} - \hat y_{n+1}^p .$$
We have:
$$E(e_{n+1}^p) = 0$$
and:
$$Var(e_{n+1}^p) = \sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_{n+1} - \bar x)^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} \right) .$$
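A minimal sketch of these two formulas (assuming the slope, intercept and residual variance have already been estimated as above; the function name is mine):

```python
import numpy as np

def forecast_and_error_variance(x, b1, b2, sigma2, x_new):
    """Point forecast b1 + b2*x_new and the forecast-error variance
    sigma2 * (1 + 1/n + (x_new - xbar)^2 / sum((x_i - xbar)^2))."""
    x = np.asarray(x, float)
    n = len(x)
    y_pred = b1 + b2 * x_new
    var_err = sigma2 * (1 + 1 / n + (x_new - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
    return y_pred, var_err
```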
Geometric interpretation 1/2

We have:
$$Y = \beta_1 \mathbf{1}_n + \beta_2 X + \varepsilon$$
where X = (x1, ..., xn)ᵀ.
We can rewrite:
$$S(\beta_1, \beta_2) = \sum_{i=1}^{n} \big( y_i - (\beta_1 + \beta_2 x_i) \big)^2 = \left\| Y - (\beta_1 \mathbf{1}_n + \beta_2 X) \right\|^2$$
where ‖x‖ is the euclidean norm of x ∈ Rⁿ.
Geometric interpretation 2/2

We have:
$$\| Y - \hat Y \|^2 = \min_{(\beta_1, \beta_2)} \left\| Y - (\beta_1 \mathbf{1}_n + \beta_2 X) \right\|^2$$
with Ŷ = β̂1 1n + β̂2 X.
Ŷ is the orthogonal projection of Y on the subspace spanned by 1n and X (sp{1n, X}).
β̂1 and β̂2 are the coordinates of the projection.
Introduction to the coefficient of determination

We want to explain the variations of Y by the variations of X. The variations of Y are the differences between the yi and their average ȳ: yi − ȳ.
We can write:
$$y_i - \bar y = (\hat y_i - \bar y) + (y_i - \hat y_i)$$
where ŷi − ȳ is the variation explained by the model.
Analysis of variance

ANOVA (ANalysis Of VAriance) states that SST = SSM + SSR, i.e.
$$\sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2$$
$$\| Y - \bar y \mathbf{1}_n \|^2 = \| \hat Y - \bar y \mathbf{1}_n \|^2 + \| Y - \hat Y \|^2$$
where SST is the Sum of Squares Total, SSM is the Sum of Squares Model and SSR is the Sum of Squares Residual.
Coefficient of determination 1/3

The coefficient of determination R² is defined by:
$$R^2 = \frac{SSM}{SST} = \frac{\| \hat Y - \bar y \mathbf{1}_n \|^2}{\| Y - \bar y \mathbf{1}_n \|^2} .$$
R² ∈ [0, 1] because 0 ≤ SSM ≤ SST.
If R² = 1 then SSM = SST. If R² = 0 then SSR = SST.
Coefficient of determination 2/3

For the simple linear regression, since ŷi = ȳ + (s_{X,Y}/s_X²)(xi − x̄):
$$\frac{1}{n} \sum_{i=1}^{n} (\hat y_i - \bar y)^2
= \left( \frac{s_{X,Y}}{s_X^2} \right)^2 \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar x)^2
= \left( \frac{s_{X,Y}}{s_X^2} \right)^2 s_X^2
= \frac{s_{X,Y}^2}{s_X^2} .$$
So:
$$R^2 = \frac{SSM}{SST} = \frac{s_{\hat Y}^2}{s_Y^2} = \left( \frac{s_{X,Y}}{s_X \, s_Y} \right)^2 = r_{X,Y}^2 .$$
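A short numerical check of this identity (a sketch under the assumption that x, y and the fitted values are already available as arrays):

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = SSM / SST for a regression with a constant."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ssm = np.sum((y_hat - y.mean()) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return ssm / sst

# sanity check for simple regression: R^2 equals the squared correlation r_{X,Y}^2
# r2 = r_squared(y, b1 + b2 * x)
# assert np.isclose(r2, np.corrcoef(x, y)[0, 1] ** 2)
```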
Coefficient of determination 3/3

Don't over-interpret the coefficient of determination:
- For a good model, R² is close to 1.
- But an R² close to 1 doesn't indicate that a linear model is adequate.
- An R² close to 0 indicates that a linear model isn't adequate (some non-linear models could be adequate).
Limitations

It's impossible to build confidence intervals on the parameters, or to test their significance: it becomes possible if we add a distribution.
Gaussian simple linear regression assumes that the error is Gaussian.
Gaussian simple linear regression
Assumptions

We add:
$$\varepsilon \sim \mathcal{N}(0, \sigma^2) .$$
(εi), i ∈ {1,...,n}, is then an i.i.d. sample with distribution N(0, σ²).
Distribution of Y

We have:
$$Y \sim \mathcal{N}_n \left( \begin{pmatrix} \beta_1 + \beta_2 x_1 \\ \vdots \\ \beta_1 + \beta_2 x_n \end{pmatrix}, \; \sigma^2 I_n \right) .$$
Maximum likelihood estimation 1/3

The likelihood of (ε1, ..., εn) is:
$$p(\varepsilon_1, \dots, \varepsilon_n ; \beta_1, \beta_2, \sigma^2)
= \left( \frac{1}{\sigma \sqrt{2\pi}} \right)^n \exp \left( -\frac{1}{2} \sum_{i=1}^{n} \left( \frac{y_i - \beta_1 - \beta_2 x_i}{\sigma} \right)^2 \right)
= \left( \frac{1}{\sigma \sqrt{2\pi}} \right)^n \exp \left( -\frac{1}{2\sigma^2} S(\beta_1, \beta_2) \right) .$$
So the log-likelihood is:
$$\ell(\varepsilon_1, \dots, \varepsilon_n ; \beta_1, \beta_2, \sigma^2)
= -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} S(\beta_1, \beta_2) .$$
Maximum likelihood estimation 2/3

We obtain the maximum by solving:
$$\frac{\partial \ell}{\partial \beta_1} = 0 \;\Leftrightarrow\; \frac{\partial S(\hat\beta_{1,ML}, \hat\beta_{2,ML})}{\partial \beta_1} = 0 ,$$
$$\frac{\partial \ell}{\partial \beta_2} = 0 \;\Leftrightarrow\; \frac{\partial S(\hat\beta_{1,ML}, \hat\beta_{2,ML})}{\partial \beta_2} = 0 ,$$
$$\frac{\partial \ell}{\partial \sigma^2} = 0 \;\Leftrightarrow\; -\frac{n}{2 \hat\sigma^2_{ML}} + \frac{1}{2 \hat\sigma^4_{ML}} S(\hat\beta_{1,ML}, \hat\beta_{2,ML}) = 0 .$$
The maximum likelihood and OLS estimators of β1 and β2 are the same.
Maximum likelihood estimation 3/3

We have:
$$\hat\sigma^2_{ML} = \frac{S(\hat\beta_1, \hat\beta_2)}{n}
= \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat\beta_1 - \hat\beta_2 x_i)^2
= \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat y_i)^2
= \frac{1}{n} \sum_{i=1}^{n} e_i^2 .$$
The maximum likelihood estimation of σ² is biased (but asymptotically unbiased).
Parameters distribution 1/6

With β = (β1, β2)ᵀ and β̂ = (β̂1, β̂2)ᵀ, we have:
$$\hat\beta \sim \mathcal{N}_2 (\beta, \sigma^2 V)$$
where:
$$V = \frac{1}{\sum_{i=1}^{n} (x_i - \bar x)^2}
\begin{pmatrix} \frac{1}{n} \sum_{i=1}^{n} x_i^2 & -\bar x \\ -\bar x & 1 \end{pmatrix} .$$
Parameters distribution 2/6

So:
$$\hat\beta_1 \sim \mathcal{N} \left( \beta_1, \; \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar x)^2} \right), \qquad
\hat\beta_2 \sim \mathcal{N} \left( \beta_2, \; \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} \right) .$$
The two estimators aren't independent.
Parameters distribution 3/6

The Cochran theorem gives:
$$(n - 2) \frac{\hat\sigma^2}{\sigma^2} = \frac{\sum_{i=1}^{n} e_i^2}{\sigma^2} \sim \chi^2_{n-2}$$
and:
$$\hat\beta \perp \hat\sigma^2 .$$
Cochran theorem

Consider X ∼ N_n(µ, σ² I_n). Let H be the orthogonal projection matrix onto a p-dimensional subspace of Rⁿ.
The Cochran theorem states that:
$$HX \sim \mathcal{N}(H\mu, \sigma^2 H), \qquad HX \perp X - HX, \qquad \frac{\| H(X - \mu) \|^2}{\sigma^2} \sim \chi^2_p .$$
Parameters distribution 4/6

With:
$$\sigma^2_{\hat\beta_1} = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad
\sigma^2_{\hat\beta_2} = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2},$$
we have for i ∈ {1, 2}:
$$\hat\beta_i \sim \mathcal{N}\big(\beta_i, \sigma^2_{\hat\beta_i}\big) \qquad \text{and} \qquad \frac{\hat\beta_i - \beta_i}{\sigma_{\hat\beta_i}} \sim \mathcal{N}(0, 1) .$$
Parameters distribution 5/6

We can estimate σ² by:
$$\hat\sigma^2 = \frac{1}{n - 2} \sum_{i=1}^{n} e_i^2 .$$
So:
$$\hat\sigma^2_{\hat\beta_1} = \frac{\hat\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad
\hat\sigma^2_{\hat\beta_2} = \frac{\hat\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} .$$
Parameters distribution 6/6

We have for i ∈ {1, 2}:
$$\frac{\hat\beta_i - \beta_i}{\hat\sigma_{\hat\beta_i}} \sim \mathcal{T}_{n-2}$$
and:
$$\frac{1}{2 \hat\sigma^2} (\hat\beta - \beta)^\top V^{-1} (\hat\beta - \beta) \sim \mathcal{F}(2, n - 2) .$$
Test on β1 and β2

For i ∈ {1, 2}, we test:
H0 : βi = c  against  H1 : βi ≠ c
where c ∈ R (the test is called a significance test for c = 0).
We use the test statistic:
$$T_i = \frac{\hat\beta_i - c}{\hat\sigma_{\hat\beta_i}} .$$
We reject H0 at the level α if:
$$| t_i | > t_{n-2, 1-\alpha/2}$$
where t_{n−2, 1−α/2} is the (1 − α/2)-quantile of T(n − 2).
Confidence intervals for β1 and β2

The confidence interval of βi with level 1 − α is:
$$\left[ \hat\beta_i - t_{n-2, 1-\alpha/2} \, \hat\sigma_{\hat\beta_i} \; ; \; \hat\beta_i + t_{n-2, 1-\alpha/2} \, \hat\sigma_{\hat\beta_i} \right] .$$
It's also possible to build a confidence region for the vector (β1, β2).
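The t statistics and confidence intervals above translate directly into code. The sketch below is mine (not from the slides) and assumes SciPy is available for the Student quantile:

```python
import numpy as np
from scipy import stats

def simple_ols_inference(x, y, alpha=0.05):
    """t statistics (H0: beta_i = 0) and (1 - alpha) confidence intervals for beta_1, beta_2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b1 = y.mean() - b2 * x.mean()
    e = y - (b1 + b2 * x)
    sigma2 = np.sum(e ** 2) / (n - 2)                      # residual variance
    se_b1 = np.sqrt(sigma2 * np.sum(x ** 2) / (n * np.sum((x - x.mean()) ** 2)))
    se_b2 = np.sqrt(sigma2 / np.sum((x - x.mean()) ** 2))
    tq = stats.t.ppf(1 - alpha / 2, df=n - 2)              # quantile t_{n-2, 1-alpha/2}
    out = {}
    for name, b, se in [("beta1", b1, se_b1), ("beta2", b2, se_b2)]:
        out[name] = {"estimate": b, "t": b / se, "ci": (b - tq * se, b + tq * se)}
    return out
```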
Multiple linear regression
Introduction

Multiple linear regression assumes that there is a linear link between the dependent variable Y and p explanatory variables (X1, ..., Xp):
$$Y = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$$
where ε is the error of the model.
Sample

Let (x_{i1}, ..., x_{ip}, y_i), i ∈ {1,...,n}, be n observations of (X1, ..., Xp, Y):
$$\forall i \in \{1,\dots,n\} : \; y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i .$$
We assume that:
- (yi), i ∈ {1,...,n}, is an i.i.d. sample of Y,
- (x_{ij}), i ∈ {1,...,n}, j ∈ {1,...,p}, are deterministic (and observed),
- (εi), i ∈ {1,...,n}, is an i.i.d. sample of ε (unobserved).
The (εi), for (i, j) ∈ {1,...,n}², satisfy:
$$E(\varepsilon_i) = 0, \quad Var(\varepsilon_i) = \sigma^2, \quad Cov(\varepsilon_i, \varepsilon_j) = 0 \;\text{ if } i \neq j .$$
Matrix form 1/2

We can write:
$$Y = X\beta + \varepsilon$$
with:
$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_i \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} x_{11} & \dots & x_{1p} \\ \vdots & & \vdots \\ x_{i1} & \dots & x_{ip} \\ \vdots & & \vdots \\ x_{n1} & \dots & x_{np} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_i \\ \vdots \\ \varepsilon_n \end{pmatrix} .$$
Matrix form 2/2

We have:
$$E(\varepsilon) = 0, \quad \Gamma_\varepsilon = \sigma^2 I_n .$$
So:
$$E(Y) = X\beta, \quad \Gamma_Y = \sigma^2 I_n .$$
Multiple linear regression with or without constant

If the constant is in the model, we take X1 equal to 1:
$$\forall i \in \{1,\dots,n\} : \; x_{i1} = 1 .$$
There are then only (p − 1) explanatory variables.
Linearization of problems

It's possible to consider as explanatory variables functions of X1, ..., Xp (power, exponential, logarithm, ...). For example, a power link y = a x^b becomes linear after taking logarithms: log y = log a + b log x.
Ordinary least square estimator 1/4

Ordinary least square (OLS) estimators of β = (β1, ..., βp)ᵀ, written β̂ = (β̂1, ..., β̂p)ᵀ, minimize:
$$S(\beta) = \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2
= \| Y - X\beta \|^2
= (Y - X\beta)^\top (Y - X\beta)
= Y^\top Y - 2 \beta^\top X^\top Y + \beta^\top X^\top X \beta .$$
Ordinary least square estimator 2/4

So:
$$\hat\beta = \arg\min_{\beta} \| Y - X\beta \|^2 = \arg\min_{\beta} \left( Y^\top Y - 2 \beta^\top X^\top Y + \beta^\top X^\top X \beta \right) .$$
If the rank of the matrix X is p, then XᵀX is invertible. This is the case if the columns of X aren't collinear.
Ordinary least square estimator 3/4

We minimize S defined by:
$$S : \mathbb{R}^p \to \mathbb{R}_+, \quad \beta \mapsto S(\beta) = Y^\top Y - 2 \beta^\top X^\top Y + \beta^\top X^\top X \beta .$$
The gradient of S is:
$$\nabla S(\beta) = -2 X^\top Y + 2 X^\top X \beta .$$
Note that the gradient of x ↦ aᵀx is a, and that the gradient of x ↦ xᵀAx is Ax + Aᵀx (2Ax if the matrix A is symmetric).
Ordinary least square estimator 4/4

The Hessian matrix of S is 2XᵀX, which is positive definite. We need to solve:
$$-X^\top Y + X^\top X \beta = 0 \;\Leftrightarrow\; X^\top X \beta = X^\top Y \;\Leftrightarrow\; \hat\beta = (X^\top X)^{-1} X^\top Y .$$
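In code this amounts to solving the normal equations. A minimal sketch (mine, not from the slides), assuming X has full column rank:

```python
import numpy as np

def ols(X, Y):
    """OLS estimator beta_hat = (X'X)^{-1} X'Y, computed by solving the normal equations."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    return np.linalg.solve(X.T @ X, X.T @ Y)   # assumes X has full column rank

# In practice np.linalg.lstsq(X, Y, rcond=None) is numerically preferable
# to forming X'X explicitly.
```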
Fitted values and residuals

Fitted values are:
$$\hat Y = X \hat\beta .$$
Residuals are:
$$e = Y - \hat Y .$$
Geometric interpretation

Ŷ is the orthogonal projection of Y on the subspace spanned by the columns of X: sp{X1, ..., Xp}.
The projection matrix ("hat" matrix) is:
$$H = X (X^\top X)^{-1} X^\top .$$
We can check that:
$$\hat Y = X \hat\beta = X (X^\top X)^{-1} X^\top Y = HY .$$
Properties of fitted values

$$E(\hat Y) = X\beta, \qquad \Gamma_{\hat Y} = \sigma^2 H .$$
Properties of residuals

- e ∈ sp{X1, ..., Xp}⊥.
- e = Y − Ŷ = Y − HY = H⊥Y where H⊥ = In − H.
- Xᵀe = 0.
- If the constant is in the model, then ⟨e, 1n⟩ = 0, so Σ_{i=1}^n e_i = 0 and Σ_{i=1}^n ŷ_i = Σ_{i=1}^n y_i.
- ‖e‖² = Yᵀ H⊥ Y.
- E(e) = 0.
- Γe = σ² H⊥.
Properties of β̂

β̂ is an unbiased estimator of β:
$$E(\hat\beta) = \beta, \qquad \Gamma_{\hat\beta} = \sigma^2 (X^\top X)^{-1} .$$
Gauss-Markov theorem

β̂ is the best linear unbiased estimator (BLUE) of β.
Estimation of σ²

We consider the residual variance as an estimation of σ²:
$$\hat\sigma^2 = \frac{\| e \|^2}{n - p} = \frac{SSR}{n - p} .$$
Thus, for the variance-covariance matrix of β̂:
$$\hat\Gamma_{\hat\beta} = \hat\sigma^2 (X^\top X)^{-1} = \frac{SSR}{n - p} (X^\top X)^{-1} .$$
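A sketch of this estimator and of the resulting coefficient standard errors (the diagonal of σ̂²(XᵀX)⁻¹); the function name is mine:

```python
import numpy as np

def coef_standard_errors(X, Y, beta_hat):
    """sigma2_hat = ||e||^2 / (n - p) and standard errors from sigma2_hat * (X'X)^{-1}."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n, p = X.shape
    e = Y - X @ beta_hat
    sigma2_hat = e @ e / (n - p)
    cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)   # estimated covariance of beta_hat
    return sigma2_hat, np.sqrt(np.diag(cov_beta))
```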
ANOVA

With a constant in the model: SST = SSM + SSR, i.e.
$$\sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2$$
$$\| Y - \bar y \mathbf{1}_n \|^2 = \| \hat Y - \bar y \mathbf{1}_n \|^2 + \| Y - \hat Y \|^2 .$$
Variance decomposition (ANOVA)

Without a constant in the model: SST = SSM + SSR, i.e.
$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \hat y_i^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2$$
$$\| Y \|^2 = \| \hat Y \|^2 + \| Y - \hat Y \|^2 .$$
Coefficient of determination

The coefficient of determination R² ∈ [0, 1] is defined by:
$$R^2 = \frac{SSM}{SST} .$$
R² is also equal to cos²(θ) where θ is the angle between:
- regression with constant: (Y − ȳ1n) and (Ŷ − ȳ1n),
- regression without constant: Y and Ŷ.
Interpretation is easier in the case of a regression with constant.
Adjusted coefficient of determination

The coefficient of determination increases with p.
The adjusted coefficient of determination is defined by:
- regression with constant:
$$R^2_{adjusted} = 1 - \frac{n - 1}{n - p} (1 - R^2) ,$$
- regression without constant:
$$R^2_{adjusted} = 1 - \frac{n}{n - p} (1 - R^2) .$$
Gaussian multiple linear regression
Assumption

We add:
$$\varepsilon \sim \mathcal{N}(0, \sigma^2) .$$
So (εi), i ∈ {1,...,n}, is an i.i.d. sample with distribution N(0, σ²).
Distribution of Y

We have:
$$Y \sim \mathcal{N}_n (X\beta, \; \sigma^2 I_n) .$$
Maximum likelihood estimation 1/3

The likelihood of (ε1, ..., εn) is:
$$p(\varepsilon_1, \dots, \varepsilon_n ; \beta, \sigma^2)
= \left( \frac{1}{\sigma \sqrt{2\pi}} \right)^n \exp \left( -\frac{1}{2} \frac{\| \varepsilon \|^2}{\sigma^2} \right)
= \left( \frac{1}{\sigma \sqrt{2\pi}} \right)^n \exp \left( -\frac{1}{2\sigma^2} S(\beta) \right) .$$
Thus the log-likelihood is:
$$\ell(\varepsilon_1, \dots, \varepsilon_n ; \beta, \sigma^2)
= -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} S(\beta) .$$
Maximum likelihood estimation 2/3

We obtain the maximum by solving:
$$\nabla_\beta \, \ell(\varepsilon_1, \dots, \varepsilon_n ; \beta, \sigma^2) = 0 \;\Leftrightarrow\; \nabla S(\hat\beta_{ML}) = 0 ,$$
$$\frac{\partial \ell(\varepsilon_1, \dots, \varepsilon_n ; \beta, \sigma^2)}{\partial \sigma^2} = 0 \;\Leftrightarrow\; -\frac{n}{2 \hat\sigma^2_{ML}} + \frac{1}{2 \hat\sigma^4_{ML}} S(\hat\beta_{ML}) = 0 .$$
The maximum likelihood and OLS estimators of β are the same.
Maximum likelihood estimation 3/3

We have:
$$\hat\sigma^2_{ML} = \frac{S(\hat\beta)}{n} = \frac{\| e \|^2}{n} .$$
The maximum likelihood estimation of σ² is biased (but asymptotically unbiased).
Parameters distribution

The Cochran theorem gives:
$$\hat\beta \sim \mathcal{N} \left( \beta, \; \sigma^2 (X^\top X)^{-1} \right), \qquad
(n - p) \frac{\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p}, \qquad
\hat\beta \perp \hat\sigma^2 .$$
Test

We test:
H0 : Rβ = r  against  H1 : Rβ ≠ r
with dim(R) = (q, p) and rank(R) = q.
The statistic used is:
$$F = \frac{n - p}{q} \, \frac{(R\hat\beta - r)^\top \left( R (X^\top X)^{-1} R^\top \right)^{-1} (R\hat\beta - r)}{\| Y - X\hat\beta \|^2} .$$
Under H0: F ∼ F(q, n − p). We reject H0 at level α if f > f_{(q, n−p), 1−α}.
Overall significance test

For a regression with constant, we consider:
H0 : β2 = ... = βp = 0  against  H1 : ∃ i ∈ {2,...,p} such that βi ≠ 0.
The statistic used is:
$$F = \frac{n - p}{p - 1} \, \frac{SSM}{SSR} = \frac{n - p}{p - 1} \, \frac{R^2}{1 - R^2} .$$
Under H0: F ∼ F(p − 1, n − p). We reject H0 at level α if f > f_{(p−1, n−p), 1−α}.
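Since the statistic only involves R², n and p, it is one line of code once the fit is done. A sketch (mine), assuming SciPy for the p-value:

```python
from scipy import stats

def overall_f_test(r2, n, p):
    """F = ((n - p) / (p - 1)) * R^2 / (1 - R^2); p-value from F(p - 1, n - p)."""
    f = (n - p) / (p - 1) * r2 / (1 - r2)
    p_value = stats.f.sf(f, p - 1, n - p)   # survival function = P(F > f)
    return f, p_value
```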
ANOVA

We define:
$$MSM = \frac{SSM}{p - 1}, \qquad MSR = \frac{SSR}{n - p} .$$

Source   | df    | SS  | MS  | F       | p-value
Model    | p − 1 | SSM | MSM | MSM/MSR | P(F(p − 1, n − p) > f)
Residual | n − p | SSR | MSR |         |
Total    | n − 1 | SST |     |         |
Test on a parameter

Consider, for i ∈ {1,...,p}:
H0 : βi = c  against  H1 : βi ≠ c.
The statistic used is:
$$T = \frac{\hat\beta_i - c}{\hat\sigma \sqrt{(X^\top X)^{-1}_{ii}}}$$
where (XᵀX)⁻¹_{ii} is the i-th diagonal value of the matrix (XᵀX)⁻¹.
Under H0: T ∼ T(n − p). We reject H0 at level α if |t| > t_{n−p, 1−α/2}.
Regression in practice
Atypical explanatory variables 1/2

The i-th diagonal value of H is called the leverage:
$$h_{ii} = X_i^\top (X^\top X)^{-1} X_i$$
where X_i = (x_{i1}, ..., x_{ip})ᵀ.
We have 0 ≤ h_{ii} ≤ 1 because Hᵀ = H and:
$$H^2 = H \;\Rightarrow\; h_{ii} = \sum_{j=1}^{n} h_{ij}^2 = h_{ii}^2 + \sum_{j=1,\, j \neq i}^{n} h_{ij}^2 .$$
Atypical explanatory variables 2/2

If h_{ii} is close to 1, then the h_{ij} (j ≠ i) are close to 0 and ŷ_i is almost entirely determined by y_i.
Moreover Tr(H) = rank(H), so Σ_{i=1}^n h_{ii} = p.
Belsley proposed to consider observation i as atypical if:
$$h_{ii} > \frac{2p}{n} .$$
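A sketch of the leverage computation and of the 2p/n screening rule (for large n, forming the full hat matrix is wasteful, but it keeps the code close to the formula):

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X' and the 2p/n flag."""
    X = np.asarray(X, float)
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    return h, h > 2 * p / n   # True marks potentially atypical observations
```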
Atypical dependent variable: principle

We use the residuals: e_i = y_i − ŷ_i.
We have:
$$\Gamma_e = \sigma^2 (I_n - H) .$$
Thus:
$$Var(e_i) = \sigma^2 (1 - h_{ii}), \qquad Cov(e_i, e_j) = -\sigma^2 h_{ij} \;\; (i \neq j) .$$
We use studentized residuals to evaluate whether an observation is atypical.
Atypical dependent variable: internal studentized residuals

For i ∈ {1,...,n}, the internal studentized residuals are:
$$r_i = \frac{e_i}{\sqrt{\widehat{Var}(e_i)}} = \frac{y_i - \hat y_i}{\hat\sigma \sqrt{1 - h_{ii}}} .$$
r_i approximately follows the distribution T(n − p − 1) (approximation used when n − p − 1 > 30).
Atypical dependent variable: external studentized residuals

We use a cross-validation criterion. Consider the leave-one-out quantities:
$$\hat y_i^{(-i)} = X_i^\top \hat\beta^{(-i)}, \qquad
\hat\beta^{(-i)} = \left( X^{(-i)\top} X^{(-i)} \right)^{-1} X^{(-i)\top} Y^{(-i)}, \qquad
\hat\sigma^2_{(-i)} = \frac{\left\| Y^{(-i)} - X^{(-i)} \hat\beta^{(-i)} \right\|^2}{n - p - 1} .$$
The external studentized residuals are:
$$R_i = \frac{y_i - \hat y_i^{(-i)}}{\hat\sigma_{(-i)} \sqrt{1 + X_i^\top \left( X^{(-i)\top} X^{(-i)} \right)^{-1} X_i}} .$$
We have: R_i ∼ T(n − p).
Observations influence

The Cook distance measures a difference between β̂ and β̂^{(−i)}. For observation i:
$$D_i = \frac{1}{p \, \hat\sigma^2} \left( \hat\beta^{(-i)} - \hat\beta \right)^\top X^\top X \left( \hat\beta^{(-i)} - \hat\beta \right) .$$
Cook proposed to consider observation i as influential if:
$$D_i > \frac{4}{n - p} .$$
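Internal studentized residuals and Cook distances can be computed without refitting the model n times: the sketch below uses the standard algebraic identity D_i = r_i² h_ii / (p (1 − h_ii)), which is equivalent to the leave-one-out definition above. The code is mine, not from the slides.

```python
import numpy as np

def internal_studentized_and_cook(X, Y, beta_hat, sigma2_hat):
    """Internal studentized residuals r_i = e_i / (sigma_hat * sqrt(1 - h_ii))
    and Cook distances D_i = r_i^2 * h_ii / (p * (1 - h_ii))."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n, p = X.shape
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverages
    e = Y - X @ beta_hat
    r = e / np.sqrt(sigma2_hat * (1 - h))
    d = r ** 2 * h / (p * (1 - h))
    return r, d
```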
Collinearity detection: problem

If the columns of X are collinear, then the OLS estimator isn't unique.
Collinearity detection: variance inflation factor and tolerance

We compute the regression of Xj, for j ∈ {1,...,p}, on the p − 1 other explanatory variables (with the constant) and we calculate the coefficient of determination R_j². The variance inflation factor (VIF) of Xj, j ∈ {1,...,p}, is:
$$VIF_j = \frac{1}{1 - R_j^2} .$$
The tolerance (TOL) is:
$$TOL_j = \frac{1}{VIF_j} .$$
In practice, we consider that there is a collinearity problem if VIF_j > 10 (TOL_j < 0.1).
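A direct implementation of this definition, regressing each explanatory variable on the others (a sketch; it assumes X contains only the explanatory variables, without the constant column):

```python
import numpy as np

def vif(X):
    """Variance inflation factors: regress each column of X on the remaining columns
    (plus a constant) and compute 1 / (1 - R_j^2)."""
    X = np.asarray(X, float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = 1 / (1 - r2)
    return out
```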
Collinearity detection: explanatory variables structure 1/5

In the case of a regression without constant, consider:
$$\tilde X = \left( \frac{X_1 - \bar X_1 \mathbf{1}_n}{\| X_1 - \bar X_1 \mathbf{1}_n \|}, \; \dots, \; \frac{X_p - \bar X_p \mathbf{1}_n}{\| X_p - \bar X_p \mathbf{1}_n \|} \right) .$$
Collinearity detection: explanatory variables structure 2/5

The correlation matrix of the explanatory variables is R = X̃ᵀX̃. It's a nonnegative and symmetric matrix, thus diagonalizable. Let (λ1, ..., λp) be the p eigenvalues of R in decreasing order. The condition numbers, for j ∈ {1,...,p}, are defined by:
$$CN_j = \frac{\lambda_1}{\lambda_j} .$$
In practice, a value CN_j > 30 indicates a possible collinearity problem.
Collinearity detection: explanatory variables structure 3/5

We now seek to determine the groups of variables concerned. Let V be the eigenvector (transfer) matrix (VᵀV = I_p) of R:
$$R = V \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_p \end{pmatrix} V^\top .$$
We have, for k ∈ {1,...,p}:
$$Var(\hat\beta_k) = \frac{\sigma^2}{\| X_k - \bar X_k \mathbf{1}_n \|^2} \sum_{j=1}^{p} \frac{v_{kj}^2}{\lambda_j} .$$
Collinearity detection: explanatory variables structure 4/5

We compute the proportion π_{lk} of the variance of β̂_k explained by X_l:
$$\pi_{lk} = \frac{v_{kl}^2 / \lambda_l}{\sum_{j=1}^{p} v_{kj}^2 / \lambda_j} .$$
We obtain the following variance-decomposition table:

Eigenvalues | CN    | Var(β̂1) | ... | Var(β̂k) | ... | Var(β̂p)
λ1          | 1     | π11      | ... | π1k      | ... | π1p
...         | ...   | ...      |     | ...      |     | ...
λj          | λ1/λj | πj1      | ... | πjk      | ... | πjp
...         | ...   | ...      |     | ...      |     | ...
λp          | λ1/λp | πp1      | ... | πpk      | ... | πpp
Collinearity detection: explanatory variables structure 5/5

In practice, we must study the rows j with a high CN. For such a row j, if there are at least two explanatory variables X_k and X_{k'} such that π_{jk} and π_{jk'} are high (more than 0.5 in practice), then a collinearity problem is suspected between the variables k and k'.
Complementary analysis

Homoscedasticity: there are formal tests, but it's also possible to plot the studentized residuals against the fitted values.
Normality of the error distribution: it's possible to use tests, for example the Shapiro-Wilk test.
Model selection: principle

Assume that we are searching for k predictors among K possible predictors. There are $\binom{K}{k}$ potential models.
We need:
- a criterion to compare two models,
- a search strategy.
Model selection: possible criteria

- Adjusted coefficient of determination,
- Kullback information criteria (AIC or BIC for example),
- Mallows' Cp.
Model selection: Kullback-Leibler information criterion

Let f0 be a probability density that we want to estimate within a parameterized set F. The Kullback-Leibler divergence is a measure of distance between the true and estimated probability densities:
$$I(f_0, \mathcal{F}) = \min_{f \in \mathcal{F}} \int \log \left( \frac{f_0(x)}{f(x)} \right) f_0(x) \, dx .$$
This quantity is always nonnegative and is zero if f0 ∈ F.
Kullback-Leibler criteria have the following form:
$$\tilde I(M_k) = n \log \hat\sigma^2_{M_k} + \alpha(n) \, k$$
where σ̂²_{M_k} is the residual variance of M_k (one of the models with k predictors).
Model selection: AIC and BIC criteria

There are many choices for α:
- The Akaike criterion (AIC: Akaike Information Criterion), with α(n) = 2:
$$AIC = n \log \hat\sigma^2_{M_k} + 2k .$$
- The Schwarz criterion (BIC: Bayesian Information Criterion, equivalent to SBC: Schwarz Bayesian Criterion), with α(n) = log(n):
$$BIC = n \log \hat\sigma^2_{M_k} + k \log(n) .$$
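With σ̂²_{M_k} = RSS/n, both criteria are one-liners. A sketch (mine), keeping the same form as the slides, i.e. dropping the additive constants that do not depend on the model:

```python
import numpy as np

def aic_bic(rss, n, k):
    """AIC = n*log(sigma2_ML) + 2k and BIC = n*log(sigma2_ML) + k*log(n),
    with sigma2_ML = RSS / n (additive constants dropped)."""
    sigma2_ml = rss / n
    aic = n * np.log(sigma2_ml) + 2 * k
    bic = n * np.log(sigma2_ml) + k * np.log(n)
    return aic, bic
```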
Model selection: Mallows criterion

The Mallows criterion Cp (p doesn't refer to the number of predictors) for a model M_k with k predictors is defined by:
$$C_p = \frac{SSR(M_k)}{\hat\sigma^2} - n + 2k .$$
Model selection: search strategy

Forward procedure: start with no variables in the model. Add the variable whose inclusion gives the most statistically significant improvement (given the chosen criterion). Repeat until no further inclusion improves the model.
Backward procedure: start with all variables in the model. Remove the variable whose exclusion gives the most statistically significant improvement (given the chosen criterion). Repeat until no further exclusion improves the model.
Stepwise procedure: a combination of the forward and backward procedures.
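A minimal sketch of the forward procedure (mine, not from the course material), using any "lower is better" criterion such as the AIC helper above; a constant is always kept in the model and k counts the fitted parameters including the intercept:

```python
import numpy as np

def forward_selection(X, Y, criterion):
    """Greedy forward selection: start from the empty model and, at each step, add the
    column whose inclusion most improves criterion(rss, n, k) (lower is better)."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n, p = X.shape
    selected, remaining = [], list(range(p))

    def rss(cols):
        Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        r = Y - Z @ coef
        return r @ r

    best_score = criterion(rss(selected), n, len(selected) + 1)
    improved = True
    while improved and remaining:
        improved = False
        scores = [(criterion(rss(selected + [j]), n, len(selected) + 2), j) for j in remaining]
        score, j = min(scores)
        if score < best_score:
            best_score, improved = score, True
            selected.append(j)
            remaining.remove(j)
    return selected

# usage, e.g. with AIC:
# forward_selection(X, Y, criterion=lambda rss, n, k: n * np.log(rss / n) + 2 * k)
```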
References

Antoniadis, A., Berruyer, J., and Carmona, R. (1992). Régression non linéaire et applications. Economica.

Cornillon, P.-A. and Matzner-Løber, E. (2010). Régression avec R. Springer.

Galton, F. (1886). Regression towards mediocrity in hereditary stature. The Journal of the Anthropological Institute of Great Britain and Ireland, 15:246–263.

McCullagh, P. and Nelder, J. A. (1989). Generalized linear models. Monographs on Statistics & Applied Probability (37). Chapman & Hall.