Business Analytics

June 3, 2016 | Author: MANOJ KUMAR

Business Analytics Linear Regression

4.b.Introduction to Regression Analysis
• Regression analysis is used to:
– Predict the value of a dependent variable based on the value of at least one independent variable
– Explain the impact of changes in an independent variable on the dependent variable

• Dependent variable: the variable we wish to explain, usually denoted by Y.
• Independent variable: the variable used to explain the dependent variable, denoted by X; it is sometimes also referred to as the predictor variable.

4.b.Simple Linear Regression Model (SLRM)
– Only one independent variable, x (when there is only one predictor variable, the prediction method is called simple regression).
– The relationship between x and y is described by a linear function (in other words, in simple linear regression the predictions of Y, plotted as a function of X, form a straight line).
– Changes in y are generally assumed to be caused by changes in x.

4.b.SLRM Example
• Consider a data set containing 25,000 records of human heights and weights (source: http://socr.ucla.edu/docs/resources/SOCR_Data/SOCR_Data_Dinov_020108_HeightsWeights.html)
• A CSV file of the data can be downloaded from https://app.box.com/s/10z9keaeyucqc0nks8uq
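The file is a plain CSV, so it can be read with Python's standard library alone. A minimal sketch, assuming the SOCR column names `Index`, `Height(Inches)`, and `Weight(Pounds)` (check the header of your own download); a small embedded sample stands in for the 25,000-row file:

```python
import csv
import io

# A few rows in the assumed SOCR format; the real file has 25,000 records.
sample = """Index,Height(Inches),Weight(Pounds)
1,65.78,112.99
2,71.52,136.49
3,69.40,153.03
"""

rows = list(csv.DictReader(io.StringIO(sample)))
heights = [float(r["Height(Inches)"]) for r in rows]  # independent variable X
weights = [float(r["Weight(Pounds)"]) for r in rows]  # dependent variable Y
print(len(rows), heights[0], weights[0])  # 3 65.78 112.99
```

To use the real file, replace `io.StringIO(sample)` with `open("SOCR_HeightsWeights.csv")` (or whatever name your download has).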

4.b.Assumptions
1. A linear relationship exists between the dependent and the independent variable.
2. The independent variable is uncorrelated with the residuals.
3. The expected value of the residual term is zero: E(ε_i) = 0.
4. The variance of the residual term is constant for all observations (homoskedasticity): E(ε_i²) = σ².
5. The residual term is independently distributed; that is, the residual for one observation is not correlated with that of another observation: E(ε_i ε_j) = 0 for j ≠ i.
6. The residual term is normally distributed.

4.b.Types of Regression Models
[Scatter-plot panels illustrating: Positive Linear Relationship, Negative Linear Relationship, Relationship NOT Linear, No Relationship]

4.b.Population Linear Regression (continued)
Y = β0 + β1·X + u
[Scatter plot: intercept = β0; slope = β1; u_i is the random error for this x value, the gap between an individual observation (e.g. one person's marks) at x_i and the predicted value of Y for X_i]

4.b.Population Regression Function
Y = β0 + β1·X + u
where Y is the dependent variable, β0 the population y-intercept, β1 the population slope coefficient, X the independent variable, and u the random error term (residual). β0 + β1·X is the linear component and u the random error component.

But can we actually get this equation? If yes, what information will we need?

4.b.Sample Regression Function (continued)
y = b0 + b1·x + e
[Scatter plot: intercept = b0; slope = b1; e_i is the random error for this x value, the gap between the observed value of y for x_i and the predicted value of y for x_i]

4.b.Sample Regression Function
y_i = b0 + b1·x_i + e_i
where b0 is the estimate of the regression intercept, b1 the estimate of the regression slope, x the independent variable, and e the error term.

Notice the similarity with the Population Regression Function. Can we do something about the error term?

4.b.The error term (residual)
– Represents the influence of all the variables which we have not accounted for in the equation.
– Represents the difference between the actual y values and the y values predicted by the sample regression line.
– Wouldn't it be good if we were able to reduce this error term?
– What are we trying to achieve by sample regression?
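Concretely, a residual is just the actual value minus the value predicted by the line, e_i = y_i − ŷ_i. A minimal sketch with made-up data and illustrative coefficients b0 = 10, b1 = 2 (these numbers are not fitted from any real data):

```python
# Illustrative data and coefficients (made up, not estimated from real data).
x = [1.0, 2.0, 3.0, 4.0]
y = [12.5, 13.8, 16.2, 17.9]
b0, b1 = 10.0, 2.0

y_hat = [b0 + b1 * xi for xi in x]                           # predicted values from the line
residuals = [round(yi - yh, 2) for yi, yh in zip(y, y_hat)]  # e_i = y_i - y_hat_i
print(residuals)  # [0.5, -0.2, 0.2, -0.1]
```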

4.b.Our Objective
To predict the PRL (population regression line), Y = β0 + β1·X + u, from the SRL (sample regression line):
ŷ_i = b0 + b1·x_i

4.b.One method to find b0 and b1 – Method of Ordinary Least Squares (OLS)
– b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals:
Σ e_i² = Σ (y_i − ŷ_i)² = Σ (y_i − (b0 + b1·x_i))²
– Are there any advantages of minimizing the squared errors?
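Minimizing the squared residuals gives the closed-form OLS formulas b1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and b0 = ȳ − b1·x̄. A minimal pure-Python sketch on made-up data:

```python
# Made-up sample data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope: sample covariance of x and y divided by sample variance of x.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
# OLS intercept: forces the line through the point of means (x_bar, y_bar).
b0 = y_bar - b1 * x_bar

print(round(b0, 3), round(b1, 3))  # approximately 0.14 and 1.96
```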

4.b.OLS Regression Properties
– The sum of the residuals from the least squares regression line is zero: Σ (y_i − ŷ_i) = 0.
– The sum of the squared residuals, Σ (y_i − ŷ_i)², is a minimum.
– The simple regression line always passes through the mean of the y variable and the mean of the x variable.
– The least squares coefficients are unbiased estimates.
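The first and third properties are easy to verify numerically. The sketch below fits a line by the same OLS formulas on made-up data and checks that the residuals sum to (numerically) zero and that the line passes through (x̄, ȳ):

```python
# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 4.5, 4.9, 6.8, 7.3]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Property: residuals sum to zero (up to floating-point error).
assert abs(sum(residuals)) < 1e-9
# Property: the fitted line passes through the point of means.
assert abs((b0 + b1 * x_bar) - y_bar) < 1e-9
print("OLS properties hold")
```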

4.b.Interpretation of the Slope and the Intercept
– b0 is the estimated average value of y when the value of x is zero. More often than not it does not have a physical interpretation.
– b1 is the estimated change in the average value of y as a result of a one-unit change in x; it is the slope of the line ŷ = b0 + b1·x.

4.b.Limitations of Regression Analysis
– Parameter instability: this happens in situations where correlations change over a period of time. It is very common in financial markets, where economic, tax, regulatory, and political factors change frequently.
– Public knowledge of a specific regression relation may cause a large number of people to react in a similar fashion towards the variables, negating its future usefulness.
– If any regression assumptions are violated, predicted dependent variables and hypothesis tests will not be valid.

4.b.General Multiple Linear Regression Model
– In simple linear regression, the dependent variable was assumed to depend on only one (independent) variable.
– In the general multiple linear regression model, the dependent variable derives its value from two or more variables.
– The general multiple linear regression model takes the following form:
Y_i = b0 + b1·X_1i + b2·X_2i + … + bk·X_ki + ε_i
where X_ki is the i-th observation of the k-th independent variable and ε_i is the error term.

4.b.Estimated Regression Equation
– Just as we calculated the intercept and the slope coefficient in simple linear regression by minimizing the sum of squared errors, we estimate the intercept and slope coefficients in multiple linear regression.
– The sum of squared errors
Σ_{i=1}^{n} ε̂_i² = Σ (Y_i − (b0 + b1·X_1i + b2·X_2i + … + bk·X_ki))²
is minimized and the slope coefficients are estimated.
– The resultant estimated equation becomes:
Ŷ_i = b0 + b1·X_1i + b2·X_2i + … + bk·X_ki

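In practice the minimization is done by a linear least-squares solver. A minimal sketch, assuming NumPy is available, on simulated data with two predictors (the coefficients 1.5, 2.0, and −0.7 are made up for the simulation):

```python
import numpy as np

# Simulated data: y depends linearly on two predictors plus noise.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 + 2.0 * x1 - 0.7 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes the sum of squared errors
print(np.round(coef, 2))  # close to the true values [1.5, 2.0, -0.7]
```

`lstsq` returns the coefficient vector (b0, b1, b2) that minimizes Σ ε̂_i², exactly the OLS criterion above.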

4.b.Interpreting the Estimated Regression Equation
– Intercept term (b0): the value of the dependent variable when the values of all independent variables are zero, i.e. b0 = value of Y when X_1 = X_2 = … = X_k = 0.
– Slope coefficient (bk): the change in the dependent variable resulting from a one-unit change in the corresponding independent variable X_k, keeping all other independent variables constant.
• In reality, when one independent variable changes by one unit, the change in the dependent variable is generally not equal to the slope coefficient, because the other independent variables rarely stay constant.
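The "holding all else constant" reading follows directly from the estimated equation. With illustrative coefficients b0 = 1, b1 = 3, b2 = −2 (made up for this sketch), raising X_1 by one unit while X_2 is held fixed moves the prediction by exactly b1:

```python
def predict(x1, x2, b0=1.0, b1=3.0, b2=-2.0):
    """Prediction from an estimated two-predictor equation (made-up coefficients)."""
    return b0 + b1 * x1 + b2 * x2

# Raise X1 by one unit, keep X2 fixed: the prediction changes by exactly b1 = 3.
change = predict(5.0, 2.0) - predict(4.0, 2.0)
print(change)  # 3.0
```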

4.b.Assumptions of the Multiple Regression Model
– There exists a linear relationship between the dependent and independent variables.
– The expected value of the error term, conditional on the independent variables, is zero.
– The error terms are homoskedastic, i.e. the variance of the error terms is constant for all observations.
– The expected value of the product of any two different error terms is zero, which implies that the error terms are uncorrelated with each other.

Thank you!

Pristine 702, Raaj Chambers, Old Nagardas Road, Andheri (E), Mumbai-400 069. INDIA www.edupristine.com Ph. +91 22 3215 6191

© Pristine – www.edupristine.com
