Correlation and Regression Analysis


Nayyar Raza Kazmi M.B.,B.S, D.H.P.M, M.P.H, M.Sc

Objectives of the Lecture
• To understand the concepts of correlation and regression analysis.
• To understand the areas in which correlation and regression models can be applied.
• To understand how to interpret correlation and regression parameters.

• Most studies done by postgraduate trainees are cross-sectional in nature.
• Analysis of such studies is mostly confined to descriptive univariate statistics.
• The quality of such studies can be enhanced by further data mining using correlation and regression analysis.

Correlation
– Strength of association between two variables.
– Tells us how much the two variables are associated with one another.
– However, it does not imply CAUSATION.
– Simply tells us whether the two variables are positively or negatively correlated.

Regression
• If there is a strong correlation between two variables, regression is used to estimate the value of the dependent variable (Y) from the value of the independent variable (X).
• Types
– Simple Linear Regression
– Multiple Linear Regression
– Logistic Regression

Correlation Analysis is a group of statistical techniques to measure the association between two variables.

The Dependent Variable is the variable being predicted or estimated.

The Independent Variable provides the basis for estimation. It is the predictor variable.

A Scatter Diagram is a chart that portrays the relationship between two variables.

[Figure: scatter diagram "Advertising Minutes and $ Sales"; x-axis: Advertising Minutes, y-axis: Sales ($ thousands)]

The Coefficient of Correlation, r

The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables.
• Also called Pearson’s r and Pearson’s product moment correlation coefficient.
• It requires interval or ratio scaled data.
• It can range from -1.00 to 1.00.
• Values of -1.00 or 1.00 indicate perfect and strong correlation.
• Values close to 0.0 indicate weak correlation.
• Negative values indicate an inverse relationship and positive values indicate a direct relationship.
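The coefficient described above can be computed directly from paired observations. The following is a minimal sketch in Python using the standard product-moment formula; the function name `pearson_r` and the sample data are illustrative assumptions, not taken from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the lecture)
x = [1, 2, 3, 4, 5]
y_direct = [2, 4, 6, 8, 10]    # rises with x: perfect positive correlation
y_inverse = [10, 8, 6, 4, 2]   # falls as x rises: perfect negative correlation

print(pearson_r(x, y_direct))
print(pearson_r(x, y_inverse))
```

The sign of the result gives the direction of the relationship and its magnitude gives the strength, as listed above.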

[Figure: scatter plot of Y against X (both axes 0 to 10): Perfect Negative Correlation]

[Figure: scatter plot of Y against X (both axes 0 to 10): Perfect Positive Correlation]

[Figure: scatter plot of Y against X (both axes 0 to 10): Zero Correlation]

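Phi Coefficient
• Used for two categorical variables, each with two categories

$$\phi = \frac{ad - bc}{\sqrt{(a+b)(a+c)(c+d)(b+d)}}$$

where a, b, c and d are the four cell counts of the 2 × 2 table formed by the two variables.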
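As an illustration of the phi coefficient, the sketch below computes it from the four cells of a 2 × 2 table, with a and b in the first row and c and d in the second. It assumes the standard form of the formula, in which the product of the row and column totals sits under a square root; the function name and the counts are hypothetical.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2 x 2 table with rows (a, b) and (c, d)."""
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denominator = math.sqrt(row1 * row2 * col1 * col2)
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator

# Hypothetical counts (not from the lecture):
#                 outcome present   outcome absent
# exposed               20                10
# not exposed            5                25
print(phi_coefficient(20, 10, 5, 25))
```

Like r, phi ranges from -1 to 1, with values near 0 indicating little association.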

Regression Equation and Regression Line

$$Y_c = a + bX$$

where
• Yc = computed value of the dependent variable
• a = Y-intercept, the value of Yc where X equals zero
• b = slope of the regression line, which is the increase or decrease in Y for each change of one unit of X
• X = a given value of the independent variable
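The lecture does not state how a and b are obtained. A common choice is ordinary least squares, sketched below in Python under that assumption; the function names `fit_line` and `predict` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of the intercept a and slope b in Yc = a + bX."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when X is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx             # change in Y per one-unit change in X
    a = mean_y - b * mean_x   # value of Yc where X equals zero
    return a, b

def predict(a, b, x):
    """Computed value Yc for a given value of the independent variable."""
    return a + b * x

# Hypothetical data lying exactly on the line Y = 1 + 2X
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
print(predict(a, b, 10))
```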

Simple Linear Regression
• Determines the value of a dependent variable based on a single independent variable.
• Simplest form of regression analysis.

Multiple Linear Regression
• Used when the dependent variable is continuous and the independent variables are continuous or categorical.

$$Y = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$

Putting MLR in Practice
• A descriptive study on normal healthy adults aged 14-25 years gathers data about their weight, systolic blood pressure and serum cholesterol levels.

• Is serum cholesterol level associated with weight and systolic blood pressure?
• Can we predict serum cholesterol levels if we know a person’s weight and systolic blood pressure?

$$Y = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$

Y = 18.52 + 3.20(BP) + [-4.06(Weight)]

So what could be the serum cholesterol level for a person who weighs 75 kg and has a systolic blood pressure of 145 mm Hg?

Y = 18.52 + 3.20(145) + [-4.06(75)]
Y = 18.52 + 464 + [-304.5]
Y = 18.52 + 464 - 304.5
Y = 178.02
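The arithmetic of this worked example can be reproduced with a few lines of Python. The coefficients are the ones given in the lecture; the constant and function names are illustrative assumptions.

```python
# Coefficients from the fitted equation in the lecture
INTERCEPT = 18.52
B_SYSTOLIC_BP = 3.20   # change in Y per 1 mm Hg of systolic blood pressure
B_WEIGHT = -4.06       # change in Y per 1 kg of weight

def predict_cholesterol(systolic_bp_mmhg, weight_kg):
    """Predicted serum cholesterol: Y = a + b1*(BP) + b2*(Weight)."""
    return INTERCEPT + B_SYSTOLIC_BP * systolic_bp_mmhg + B_WEIGHT * weight_kg

# Person weighing 75 kg with a systolic blood pressure of 145 mm Hg
print(round(predict_cholesterol(145, 75), 2))  # 178.02
```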

Logistic Regression
• Logistic regression is used when the outcome variable is categorical.
• The independent variables can be either categorical or continuous.
• Logistic regression estimates the odds ratio of each independent variable for the dichotomous dependent variable.

• The dichotomous dependent variable could be the presence or absence of a complication, disease, etc.
• Data for dichotomous variables must be binary coded, e.g. 1 for presence of the complication or disease and 0 for its absence.
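To show how odds ratios arise from a logistic model, the sketch below assumes the usual formulation in which the log-odds of the outcome coded 1 is a linear function of the predictors, so the odds ratio for a one-unit increase in a predictor is the exponential of its coefficient. The intercept, coefficients and predictor names are hypothetical and are not fitted to any data in the lecture.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Probability that the outcome is 1 under a logistic model."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-log_odds))

def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in the corresponding predictor."""
    return math.exp(coefficient)

# Hypothetical model: outcome 1 = complication present, 0 = absent
intercept = -2.0
coefficients = [0.9, 0.05]   # [smoker (0/1), duration of disease in years]

p_smoker = predicted_probability(intercept, coefficients, [1, 10])
p_non_smoker = predicted_probability(intercept, coefficients, [0, 10])
print(p_smoker, p_non_smoker)
print(odds_ratio(coefficients[0]))  # odds ratio for smoker vs non-smoker
```

An odds ratio above 1 means the predictor raises the odds of the outcome, below 1 means it lowers them, and exactly 1 means no association.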

Putting Logistic Regression in Practice
• Risk Factors for Complications of Diabetes Mellitus in patients admitted to a Tertiary Care Hospital

What can I derive from this data?

Risk factors for retinopathy (n = 32)

Risk factor                        No. of patients    Percentage
BMI > 30                           13                 40.63
Smoking                            28                 87.5
Level of prior awareness           14                 43.75
HbA1C > 7                          10                 31.25
Duration of diabetes > 10 years    20                 62.5
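As a quick check on the percentage column, each percentage is the patient count divided by n = 32. The sketch below recomputes them in Python; the counts are those given in the lecture, while the variable and function names are illustrative.

```python
N_PATIENTS = 32

# Number of retinopathy patients with each risk factor, as given in the lecture
counts = {
    "BMI > 30": 13,
    "Smoking": 28,
    "Level of prior awareness": 14,
    "HbA1C > 7": 10,
    "Duration of diabetes > 10 years": 20,
}

def percentages(counts, n):
    """Percentage of the n patients who have each risk factor."""
    return {factor: 100 * count / n for factor, count in counts.items()}

for factor, pct in percentages(counts, N_PATIENTS).items():
    print(f"{factor}: {pct:.3f}%")
```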

Where Correlation and Regression Models can be applied
• Cross-sectional studies
• K.A.P. studies
• Studies aiming to determine relationships between certain factors of interest and their outcomes

Software to use
• MS Excel with the Data Analysis add-in installed
• SPSS
• Epi Info 2002
• MedCalc (recommended because of its ease of use and power to perform all types of statistical calculations)

• Thank you for your patience. (There is a strong negative correlation between the length of a biostats lecture and your moods, evident from the 11 o’clock sign on your foreheads.)
• Questions, queries and suggestions are welcome.
