Objectives of the Lecture
• To understand the concepts of Correlation and Regression Analysis.
• To understand the areas in which Correlation and Regression Models can be applied.
• To understand how to interpret Correlation and Regression parameters.
• Most studies done by postgraduate trainees are cross-sectional in nature.
• Analysis of such studies is mostly confined to descriptive univariate statistics.
• The quality of such studies can be enhanced by exploring the data further with Correlation and Regression Analysis.
Correlation
– Measures the strength of association between two variables.
– Tells us how closely the two variables are associated with one another.
– However, it does not imply CAUSATION.
– Tells us whether the two variables are positively or negatively correlated.
Regression
• If there is a strong correlation between two variables, Regression is used to predict the value of the dependent variable (Y) from the value of the independent variable (X).
• Types
– Simple Linear Regression
– Multiple Linear Regression
– Logistic Regression
Correlation Analysis is a group of statistical techniques to measure the association between two variables.
The Dependent Variable is the variable being predicted or estimated.
The Independent Variable provides the basis for estimation. It is the predictor variable.
A Scatter Diagram is a chart that portrays the relationship between two variables.
[Figure: Advertising Minutes and $ Sales. Scatter diagram with Advertising Minutes (70 to 190) on the x-axis and Sales ($thousands, 0 to 30) on the y-axis.]
The Coefficient of Correlation, r
• The Coefficient of Correlation (r) is a measure of the strength of the linear relationship between two variables.
• Also called Pearson's r or Pearson's product-moment correlation coefficient.
• It requires interval- or ratio-scaled data.
• It can range from -1.00 to 1.00.
• Values of -1.00 or 1.00 indicate perfect correlation; values close to 0.0 indicate weak correlation.
• Negative values indicate an inverse relationship and positive values indicate a direct relationship.
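The slide describes r without giving its formula. A minimal sketch, assuming the standard definition r = Σ(X - X̄)(Y - Ȳ) / √[Σ(X - X̄)² · Σ(Y - Ȳ)²], computed directly from that formula. The function name `pearson_r` and the advertising data are hypothetical, chosen only to echo the scatter-diagram example.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient for paired interval/ratio data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples with at least two observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means (numerator of r)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the lecture): advertising minutes and sales ($ thousands)
minutes = [80, 100, 120, 140, 160, 180]
sales = [8, 11, 13, 17, 19, 24]
r = pearson_r(minutes, sales)
print(f"r = {r:.3f}")
```

Because r divides the co-variation by the spread of each variable, it has no units and is unaffected by changing the scale of X or Y (e.g. minutes vs. seconds).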
[Figures: three scatter plots of Y against X (both axes 0 to 10) illustrating Perfect Negative Correlation, Perfect Positive Correlation and Zero Correlation.]
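Phi Coefficient
• Used to measure the association between two dichotomous (two-category) variables, with counts a, b, c and d in the cells of a 2×2 table (a and b in the first row, c and d in the second):
$$ \phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}} $$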
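The phi formula translates directly into code. A minimal sketch, assuming the 2×2 layout described above (a, b in the first row; c, d in the second); the function name and the counts are hypothetical.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table laid out as:

                  Var2 = yes   Var2 = no
    Var1 = yes        a            b
    Var1 = no         c            d
    """
    # Product of the two row totals and the two column totals
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when any row or column total is zero")
    return (a * d - b * c) / denominator

# Hypothetical counts, for illustration only
print(round(phi_coefficient(10, 5, 5, 10), 3))  # 0.333
```

Like r, phi runs from -1 to +1; a table with all counts on the diagonal (b = c = 0) gives phi = 1.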
Regression Equation and Regression Line
$$ Y_c = a + bX $$
where
• Y_c = computed value of the dependent variable
• a = Y-intercept, the value of Y_c when X equals zero
• b = slope of the regression line, i.e. the increase or decrease in Y_c for each one-unit change in X
• X = a given value of the independent variable
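The slide does not show how a and b are obtained. Assuming the usual ordinary least-squares fit (the default for linear regression in most statistical software), they are given by the expressions below; the second line links the slope back to the correlation coefficient r, where s_X and s_Y are the standard deviations of X and Y.

```latex
b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}

b = r \, \frac{s_Y}{s_X}
```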
Simple Linear Regression
• Predicts the value of a Dependent Variable from a single Independent Variable.
• Simplest form of Regression Analysis.
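A minimal sketch of simple linear regression, assuming ordinary least squares (b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)², a = Ȳ - bX̄). The function names and the advertising data are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least-squares estimates of a (intercept) and b (slope) in Yc = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples with at least two observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    b = sxy / sxx                 # change in Yc per one-unit change in X
    a = mean_y - b * mean_x       # the fitted line passes through (mean_x, mean_y)
    return a, b

def predict(a, b, x_value):
    """Computed value Yc for a given X."""
    return a + b * x_value

# Hypothetical data (not from the lecture): advertising minutes and sales ($ thousands)
minutes = [80, 100, 120, 140, 160, 180]
sales = [8, 11, 13, 17, 19, 24]
a, b = fit_simple_linear_regression(minutes, sales)
print(f"Yc = {a:.2f} + {b:.4f}X")
print(f"Predicted sales at 150 minutes: {predict(a, b, 150):.2f}")
```

On Python 3.10+, `statistics.linear_regression(x, y)` returns the same slope and intercept and can be used to cross-check the hand-written version.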
Multiple Linear Regression
• Used when the Dependent Variable is continuous and there are two or more Independent Variables, which may be continuous or categorical.
$$ Y = a + b_1X_1 + b_2X_2 + \dots + b_kX_k $$
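The slide gives the model but not how a, b1, ..., bk are estimated. A minimal sketch, assuming ordinary least squares, that solves the normal equations (XᵀX)β = Xᵀy by Gaussian elimination in plain Python. The function name and the toy data (constructed so that Y = 2 + 3X1 - X2 exactly) are hypothetical; real analyses would use statistical software such as SPSS, which also reports standard errors and p-values.

```python
def fit_multiple_linear_regression(rows, y):
    """OLS estimates [a, b1, ..., bk] for Y = a + b1*X1 + ... + bk*Xk.

    rows: one list of k predictor values per observation.
    """
    # Design matrix: a leading 1 for the intercept a, then the k predictors
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n, p = len(X), len(X[0])
    if n != len(y) or n < p:
        raise ValueError("need one Y per row and at least k + 1 observations")
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    m = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; coefficients are not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution gives beta = [a, b1, ..., bk]
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        beta[i] = (m[i][p] - sum(m[i][j] * beta[j] for j in range(i + 1, p))) / m[i][i]
    return beta

# Hypothetical toy data constructed so that Y = 2 + 3*X1 - 1*X2 exactly
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [3, 7, 7, 11, 12]
a, b1, b2 = fit_multiple_linear_regression(rows, y)
print(f"Y = {a:.2f} {b1:+.2f}*X1 {b2:+.2f}*X2")
```

Solving the normal equations directly is fine for a teaching example with a few predictors; statistical packages use numerically sturdier decompositions (e.g. QR) for the same least-squares problem.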
Putting MLR in Practice
• A descriptive study of normal healthy adults aged 14-25 years gathers data on their weight, systolic blood pressure and serum cholesterol levels.
Questions
• Is serum cholesterol level associated with weight and systolic blood pressure?
• Can we predict serum cholesterol level if we know a person's weight and systolic blood pressure?
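$$ Y = a + b_1X_1 + b_2X_2 + \dots + b_kX_k $$
Fitted model, with Y = serum cholesterol, BP = systolic blood pressure (mm Hg) and Weight in kg:
$$ Y = 18.52 + 3.20(\text{BP}) - 4.06(\text{Weight}) $$
So what would be the serum cholesterol level of a person who weighs 75 kg and has a systolic blood pressure of 145 mm Hg?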
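Answering the slide's question is a matter of substituting into the fitted equation: 18.52 + 3.20 × 145 - 4.06 × 75 = 18.52 + 464.00 - 304.50 = 178.02 (in the units of the original study, which the slide does not state). The same arithmetic as a small sketch; the function name is hypothetical and the coefficients are taken from the slide.

```python
def predict_cholesterol(bp_mm_hg, weight_kg):
    """Fitted model from the slide: Y = 18.52 + 3.20(BP) - 4.06(Weight)."""
    a, b_bp, b_weight = 18.52, 3.20, -4.06
    return a + b_bp * bp_mm_hg + b_weight * weight_kg

y_hat = predict_cholesterol(bp_mm_hg=145, weight_kg=75)
print(f"Predicted serum cholesterol: {y_hat:.2f}")  # 178.02
```

Such predictions are only trustworthy for values of BP and weight within the range observed in the study sample.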
Logistic Regression
• Logistic Regression is used when the outcome (dependent) variable is categorical, most commonly dichotomous.
• The independent variables can be either categorical or continuous.
• Logistic Regression estimates an Odds Ratio for each independent variable with respect to the dichotomous dependent variable.
• The dichotomous dependent variable could be the presence/absence of a complication, disease, etc.
• Data for dichotomous variables must be binary coded, e.g. 1 for presence of the complication or disease and 0 for its absence.
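In the logistic model the log-odds of the outcome is linear in the predictors, log(p / (1 - p)) = a + bX, so the odds ratio for a one-unit increase in X equals e^b. A minimal sketch showing this for a single binary risk factor coded 1/0; the coefficients a and b are hypothetical, not estimates from any study.

```python
import math

def predicted_probability(a, b, x):
    """Logistic model: P(Y = 1) = 1 / (1 + exp(-(a + b*x)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

# Hypothetical coefficients; risk factor coded 1 = present, 0 = absent,
# outcome coded 1 = complication present, 0 = absent
a, b = -1.5, 0.8

p_exposed = predicted_probability(a, b, 1)
p_unexposed = predicted_probability(a, b, 0)
odds_ratio = odds(p_exposed) / odds(p_unexposed)

print(f"P(complication | factor present) = {p_exposed:.3f}")
print(f"P(complication | factor absent)  = {p_unexposed:.3f}")
print(f"Odds ratio = {odds_ratio:.3f}; exp(b) = {math.exp(b):.3f}")
```

The two printed values on the last line agree, which is why software reports exp(b) as the odds ratio for each independent variable.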
Putting Logistic Regression in Practice • Risk Factors for Complications of Diabetes Mellitus in patients admitted to a Tertiary Care Hospital
What can we derive from these data?
Risk Factors for Retinopathy (n = 32)
Risk factor                        | No. of patients | Percentage (%)
BMI > 30                           | 13              | 40.63
Smoking                            | 28              | 87.50
Level of prior awareness           | 14              | 43.75
HbA1c > 7                          | 10              | 31.25
Duration of diabetes > 10 years    | 20              | 62.50
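The percentage column is each count divided by the 32 patients with retinopathy, times 100; a quick check confirms, for example, that 13 of 32 is 40.625% (about 40.63%). A sketch of that arithmetic, using the counts from the table. Note that these summary counts alone cannot yield odds ratios: a logistic regression would need patient-level records, including patients without retinopathy.

```python
# Counts of patients with retinopathy having each risk factor (from the table)
n = 32
counts = {
    "BMI > 30": 13,
    "Smoking": 28,
    "Level of prior awareness": 14,
    "HbA1c > 7": 10,
    "Duration of diabetes > 10 years": 20,
}

# Percentage of the n patients with retinopathy who have each factor
percentages = {factor: 100 * count / n for factor, count in counts.items()}

for factor, pct in percentages.items():
    print(f"{factor:<32} {counts[factor]:>3} {pct:6.2f}%")
```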
Where Correlation and Regression Models can be applied
• Cross-sectional studies.
• K.A.P. (Knowledge, Attitude and Practice) studies.
• Studies aiming to determine relationships between certain factors of interest and their outcomes.
Software to use
• MS Excel with the Data Analysis add-in installed
• SPSS
• Epi Info 2002
• MedCalc (recommended because of its ease of use and its ability to perform all types of statistical calculations)
• Thank you for your patience. (There is a strong negative correlation between the length of a biostatistics lecture and your mood, as evident from the "11 o'clock" sign on your foreheads.)
• Questions, queries and suggestions are welcome.