Review of Statistical Analysis
[RES] 1.01 REVIEW OF STATISTICAL ANALYSIS Dr. Adversario || June 29, 2015
Transcribers: Campbell, Candare, Caraan, Carasig, Carballo
Editors: Adviento
OUTLINE
I. Descriptive Analysis
A. Qualitative and Quantitative Measures
1. Measures of Frequency
2. Measures of Location
3. Measures of Central Tendency
4. Measures of Dispersion
B. Tabular and Graphical Presentation
II. Inferential Analysis
A. Estimation
B. Hypothesis Testing
III. Factors to be considered in choosing the proper statistical tests
IV. Types of Variables and Levels of Measurement
V. Assumption of Distribution
VI. Test for Difference Between Group Proportions
VII. Test for Difference of Group Means/Medians
VIII. Statistical Tools To Investigate Relationship Between Variables
IX. Analysis of 2x2 tables
X. Measures of Effects/Association
XI. Measuring the Accuracy of the Diagnostic Tests

Legend: Remember (Exams), Lecturer, Book, Previous Trans, Trans Comm

LEARNING OUTCOMES
I. Determine the appropriate descriptive measure for summarizing data
II. Determine the appropriate method for presenting data
III. Determine the appropriate statistical test for analyzing data

Descriptive Statistics
Method to summarize and present data in a form which will make it easier to analyze and interpret
Consists of the collection, organization, summarization, and presentation of data

Inferential Statistics
Method to make generalizations and conclusions about a target population based on results from a sample
INFERENCES from samples to populations
Uses probability

Summarizing Figures
Qualitative Measures: Frequency, Location
Quantitative Measures: Central Tendency, Dispersion
Tabular presentation
Graphical presentation

I. DESCRIPTIVE ANALYSIS
A. Measures of Frequency
1. Count: absolute number of persons/elements with the characteristic
2. Ratio: single number representing the relative size of 2 numbers; a/b (k)
3. Proportion: special type of ratio where the numerator is part of the denominator; a/(a+b) (k)
4. Rate: frequency of occurrence of events in a given interval of time
B. Measures of Location
1. Percentile: one of the 99 values of a variable which divides the distribution into 100 equal parts
2. Decile: one of the 9 values of a variable which divides the distribution into 10 equal parts
3. Quartile: one of the 3 values of a variable which divides the distribution into 4 equal parts
P25 = D2.5 = Q1
P50 = D5 = Q2
P75 = D7.5 = Q3
C. Measures of Central Tendency
Mean: average
Median: middlemost
Mode: most frequent
Interpretation
Mean: The average weight of patients is 14.4 kg
Median: Half of the patients weighed less than 15.85 kg while the other half weighed more than or equal to 15.85 kg
Mode: The usual weight of the patients is 12.6 and 16 kg
Choice of the Measures of Central Tendency
o scale of measurement
o nature of the distribution
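The data set behind the interpretation above is not reproduced in these notes, so the following is only a minimal sketch of how the three measures are computed. The weights in `weights_kg` are hypothetical values chosen for illustration (they do not reproduce the 14.4 kg and 15.85 kg figures), and the Python standard library `statistics` module is an assumed tool, not something used in the lecture.

```python
from statistics import mean, median, multimode

# Hypothetical patient weights in kg (illustration only, not the lecture data).
weights_kg = [12.6, 12.6, 14.0, 16.0, 16.0, 18.0]

avg = mean(weights_kg)         # mean: the arithmetic average
mid = median(weights_kg)       # median: the middlemost value of the sorted data
modes = multimode(weights_kg)  # mode(s): the most frequent value(s); may be more than one

print(f"Mean: {avg:.2f} kg")
print(f"Median: {mid:.2f} kg")
print(f"Mode(s): {modes}")
```

With two values tied for the highest frequency, the data are bimodal, which is why the interpretation above can report two "usual" weights.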
D. Measures of Dispersion
Range
Variance
Standard Deviation
Coefficient of Variation
Interquartile Range

Coefficient of Variation
Measure of relative dispersion which expresses the standard deviation as a percentage of the mean
o Comparing 2 different variables
o Comparing 2 different populations on the same variable
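As a rough illustration of the measures of dispersion listed above, the sketch below computes each one for a small hypothetical data set. The values in `data`, the use of the sample (n - 1) variance, and the default quartile rule of `statistics.quantiles` are all assumptions made here; the lecture does not specify which conventions were used, and other quartile conventions give slightly different interquartile ranges.

```python
from statistics import mean, quantiles, stdev, variance

# Hypothetical measurements (illustration only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)   # range: largest minus smallest value
var = variance(data)                 # sample variance (divides by n - 1)
sd = stdev(data)                     # sample standard deviation
cv = sd / mean(data) * 100           # coefficient of variation, as % of the mean
q1, q2, q3 = quantiles(data, n=4)    # quartiles Q1, Q2 (median), Q3
iqr = q3 - q1                        # interquartile range: spread of the central 50%

print(f"Range: {data_range}")
print(f"Variance: {var:.3f}")
print(f"Standard deviation: {sd:.3f}")
print(f"Coefficient of variation: {cv:.2f}%")
print(f"Interquartile range: {iqr}")
```

Because the coefficient of variation is unit-free, it is the one measure here that can be compared across variables measured on different scales.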
Summary
Proportions and percentages are used to summarize nominal and ordinal data
Percentiles are useful to compare an individual observation with a norm
The median is used for ordinal data or skewed numerical data
The range is used with numerical data when the purpose is to emphasize extreme values
The standard deviation is used when the mean is used
The coefficient of variation is used when the intent is to compare distributions measured on different scales
The interquartile range is used to describe the central 50% of a distribution, regardless of its shape
Variance and standard deviation can be used to directly compare two samples with same units of measure
Tabular and Graphical Presentation
Figures
Visual presentation of results
o Graphs
o Diagram
o Photograph
o Pen and ink drawings
o Flow Charts
o Schematics
o Maps

Type of Graphs
Pie Graph
Bar Graph
o Vertical
o Horizontal
o Component
Frequency Polygon
Histogram
Line Graph
Scatterplot

Choosing the appropriate graph
GRAPH | NATURE OF VARIABLE | PURPOSE
Histogram or frequency polygon | Quantitative continuous | Graphic representation of a frequency distribution
Bar (Horizontal or Vertical) | Qualitative or quantitative discrete | Comparison of absolute or relative counts between categories
Pie or component bar | Qualitative | Breakdown of a group total where the number of categories is not too many
Line graph | Time Series | Shows trend of data or changes with time
Scatterplot | Quantitative | Correlate data between two variables

Figure Checklist
Is the figure necessary?
Are the data plotted accurately?
Is the grid scale correctly proportioned?
Are parallel figures or equally important figures prepared according to the same scale?
Avoid 3D figures for 2D data
Avoid non data ink (ticks, grids, frames) and chart junk
Avoid optical illusions: broken lines, markers, hatching fill patterns, improper aspect ratios
Summarizing Figure
Box Plot

Example:
Reasons for Referral | Area of Practice NCR No. (%) | Area of Practice Non-NCR No. (%) | Total
Co-management | 27 (48.2143) | 29 (51.7857) | 56
CP Clearance | 23 (63.8888) | 13 (36.1111) | 36
Diagnostic Procedures | 22 (56.4103) | 17 (43.5897) | 39
Ventilator Management | 18 (43.9024) | 23 (56.0976) | 41
Peri-operative evaluation for thoracic surgery | 9 (50.0000) | 9 (50.0000) | 18
Weaning from ventilator | 6 (30.0000) | 14 (70.0000) | 20

What is wrong with the table?
Too many decimal places
"Area of Practice" can be merged into one heading
Gridlines should be limited to three to highlight figures

Table 13 Most Common Reasons for Referral to One’s Specialty in Area of Practice
Reasons for Referral | Area of Practice: NCR No. (%) | Area of Practice: Non-NCR No. (%) | Total
Co-management | 27 (48.2) | 29 (51.8) | 56
CP Clearance | 23 (63.9) | 13 (36.1) | 36
Diagnostic Procedures | 22 (56.4) | 17 (43.6) | 39
Ventilator Management | 18 (43.9) | 23 (56.1) | 41
Peri-operative evaluation for thoracic surgery | 9 (50) | 9 (50) | 18
Weaning from ventilator | 6 (30) | 14 (70) | 20

II. INFERENTIAL ANALYSIS
A. Estimation
Is the process by which a statistic computed for a random sample is used to approximate (“estimate”) the corresponding parameter
o Parameter - numerical constant obtained by observing the total population
o Statistic - numerical variable obtained by observing a random sample from the population

Measure | Parameter | Statistic
Mean | μ | x̄
Variance | σ² | s²
Standard deviation | σ | s
Proportion | P | p
Difference between Two Means | μ1-μ2 | x̄1-x̄2
Difference between Two Proportions | P1-P2 | p1-p2

Point estimate – single numerical value used to approximate the population parameter
Interval estimate – consists of 2 numbers, a lower limit and an upper limit, which serve as the bounding values within which the parameter is expected to lie with a certain degree of confidence
Point estimate is more precise but interval estimate is more likely to be correct because it gives you a range of values.

Research Objective | Results
To estimate the prevalence of parasitism among Filipino children 1-5 years old | 70% (61%, 79%)
To determine the average dental caries score among public elementary school children | 15% (14.7%, 15.3%)
Estimate: 70% and 15% are the point estimates for each objective and the values inside the parentheses are the interval estimates.

Example 1: Estimation of Population Mean
A municipal health officer was interested in identifying factors affecting the utilization of health services in his area. Among the factors that he considered was the accessibility of the Rural Health Unit. He interviewed a random sample of 25 patients and asked about the distance travelled in going from their homes to the clinic. His findings showed a mean travel distance of 7 km. What is the point estimate of the mean distance travelled by the population of patients served by the clinic?
The point estimate of the mean distance travelled by the patients from their homes to the clinic is 7 km, which is the result obtained from the sample.
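To make the distinction between a point estimate and an interval estimate concrete, here is a minimal sketch for a proportion. It borrows the counts from Example 2 of this section (123 of 300 adults). The function name `proportion_interval` is hypothetical, and the normal-approximation (Wald) formula p ± z·sqrt(p(1 − p)/n) is an assumption: the notes do not show which interval formula the lecturer used.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Point estimate and normal-approximation interval for a proportion.

    Assumes a simple random sample large enough for the normal
    approximation to the sampling distribution of p to be reasonable.
    """
    p_hat = successes / n                                # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)              # standard error of p
    return p_hat, p_hat - z * se, p_hat + z * se

# Counts taken from Example 2: 123 of 300 adults had regular check-ups.
point, lower, upper = proportion_interval(123, 300)
print(f"Point estimate: {point:.1%}")
print(f"95% interval estimate: ({lower:.1%}, {upper:.1%})")
```

The single value is the more precise statement, while the interval trades precision for a stated degree of confidence, as described above.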
Example 2: Estimation of the Population Proportion
A survey was conducted to study the dental health practices of adults in a certain urban population. Of 300 adults randomly selected and interviewed, 123 indicated that they had regular dental check-up twice a year. What is the point and the 95% interval estimate of the population proportion who had regular dental check-up?
The point estimate of the population proportion who had regular dental check-up is 41% (123/300 × 100).

B. Hypothesis Testing
Process of hypothesis testing
Type I error (α error) | Probability of rejecting a true H0 | Concluding there is a difference when none exists
Type II error (β error) | Probability of not rejecting a false H0 | Concluding that no difference exists when there is

STEPS IN HYPOTHESIS TESTING
1. Stating the null hypothesis, H0 and the alternative hypothesis, H1
Example: A study to compare the performance in Anatomy of 2 groups of students, those using cadavers for demonstration and those using models. If the parameter for evaluating student performance is the proportion who obtain a grade of 2.0 or better, then the H0 and H1 are formulated as follows:
H0: No difference between those using cadavers and models
H1 (two-tailed test): there is a difference
H1 (one-tailed test): those using models have greater performance or vice versa
2. Stating the level of significance, α
When we arbitrarily set the level of significance at α, we are setting the probability that we shall erroneously reject a true H0 to be at most equal to α
e.g. if we set α=0.05, the probability that we are rejecting a true hypothesis is at most only 5%
3. Choosing the test statistic and determining its sampling distribution
Depends on the sampling distribution of the sample statistic
Probability distribution tables of the different test statistics
o normal table: z statistic
o t table: t statistic
o χ² table: χ² statistic
Factors to be considered in choosing the appropriate statistical test
o Objectives of the study
o Type of variable
o Level of measurement
o Whether the samples are related or independent
o Assumption on the distribution
4. Determining the critical region
Critical region – set of values of the test statistic which will lead us to reject the null hypothesis
Critical region for a Two-Tailed z test with α=0.05
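The critical region for a two-tailed z test can also be obtained without a printed normal table. The sketch below is one way to do it with the Python standard library; the function names `two_tailed_critical_value` and `reject_h0` are hypothetical helpers introduced for illustration, not part of the lecture.

```python
from statistics import NormalDist

def two_tailed_critical_value(alpha=0.05):
    """z value that leaves alpha/2 in each tail of the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def reject_h0(z, alpha=0.05):
    """True when the computed z statistic falls in the two-tailed critical region."""
    return abs(z) >= two_tailed_critical_value(alpha)

z_crit = two_tailed_critical_value(0.05)
print(f"Critical region: z >= {z_crit:.2f} or z <= {-z_crit:.2f}")
print(reject_h0(2.30))   # a computed z of 2.30 lies in the critical region
print(reject_h0(-1.20))  # a computed z of -1.20 does not
```

Splitting α equally between the two tails is what distinguishes this from a one-tailed test, where the whole of α sits in a single tail and the critical value is smaller in magnitude.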
Example:
H0: PM = PC
H1: PM ≠ PC
Level of significance (α) = 0.05
What is the critical region (CR)?
Divide the CR into 2 equal parts, α/2 = 0.025, such that one part is located in each tail end of the sampling distribution of the test statistic
From the normal table, the z value corresponding to a probability of 0.025 is 1.96
CR: z ≥ 1.96 or z ≤ -1.96
5. Computing the test statistic
6. Making the statistical decision, i.e. whether or not to reject the null hypothesis
Rejecting or not rejecting the null hypothesis (H0)
1. If the computed value of the test statistic falls in the critical region, then we reject H0
2. If the probability (p-value) of getting the computed test statistic under H0 is low, we can say that the sample data cannot support H0 and thus we can reject H0
7. Drawing conclusions about the population
STATISTICAL DECISION | CONCLUSION
Reject the null hypothesis (H0) | State the alternative hypothesis (H1)
Do not reject the null hypothesis (H0) | “There is no sufficient evidence to say (state the alternative hypothesis)”
NOTE: We don’t accept the null hypothesis

III. FACTORS TO BE CONSIDERED IN CHOOSING THE APPROPRIATE STATISTICAL TEST
Objectives of the study
Type of variable
Level of measurement
Whether the samples are related or independent
Assumption on the distribution

OBJECTIVES | EXAMPLE OF RESEARCH OBJECTIVE
To compare the level of parameter (mean) with a pre-specified value (i.e., standard value, national figure, previous results) | To determine if the average life span of Filipinos has changed over the years since 1995 which was 65 years old
To compare the level of parameter (proportion) with a pre-specified value (i.e., standard value, national figure, previous results) | To determine the prevalence of breast cancer among Filipino women aged 50-54 years old if the prevalence based on previous studies is about 5%
To compare the parameter (mean) between 2 groups | To assess the health effects of vehicular emissions on vulnerable population groups by looking at mean blood lead levels between schoolchildren and street child vendors
To compare the parameter (proportion) between 2 groups | To determine success rate defined as the proportion who had ≤ 1 otitis media episode in the first year of treatment between medically and surgically treated groups
To compare the parameter (mean) between 2 or more groups | To compare the hypoalgesic effect as measured through pain scores of true (distal & proximal to the tourniquet) and sham acupuncture
To compare the parameter (proportion) between 2 or more groups | To compare the prevalence of current smoking among different income groups categorized into quintiles
To determine whether two or more quantitative variables are related | To determine whether systolic blood pressure of patients in the recumbent and standing positions vary with each other
To determine whether two or more qualitative variables are related | To determine if there is an association between gender and smoking status
IV. TYPES OF VARIABLES AND LEVELS OF MEASUREMENT
A. TYPES OF VARIABLES
1. QUANTITATIVE
Variables can be measured and ordered according to quantity or amount, or whose values can be expressed numerically
o age, height, weight, no. of correct answers
Discrete: integers/whole numbers
Continuous: fractions/decimals
2. QUALITATIVE
Categories are simply used as labels to distinguish one group from another
o sex, urban-rural classification, religion, region in the country, occupation, marital status, disease status

LEVELS OF MEASUREMENT
1. NOMINAL
Number or names which represent a set of mutually exclusive and exhaustive classes to which individuals or objects may be assigned
o sex, regions, race, occupation, patient id no.
2. ORDINAL
Classes can be ordered or ranked
o dehydration status: none, some, severe
o socio-economic status: low, middle, high
3. INTERVAL
Exact/equal distance between two categories can be determined
Zero point is arbitrary and does not mean absence of the characteristic
o temperature, calendar time, IQ
4. RATIO
Zero point is fixed
o weight, height, blood pressure, number of seizure recurrence, number of pre-natal visits

Study Variable | Type of Variable | Level of Measurement
Average dental caries score among public elementary school children | Quantitative | Ratio
Prevalence of parasitism among Filipino children 1-5 yrs old | Qualitative | Nominal
Mean blood lead levels between schoolchildren and street child vendors | Quantitative | Ratio
Success rate between medically vs surgically treated groups for otitis media | Qualitative | Nominal

Number of samples and whether they are related or independent
Study Objective | Number of Samples | Type of Sample
To compare the mean blood lead levels between schoolchildren and street child vendors | 2 | Independent
To compare the performance of 10 pairs of students matched by IQ, one subjected to programmed materials and the other subjected to lecture type of learning process | 2 | Related
To compare the prevalence of current smoking among different income groups categorized into quintiles | 5 | Independent
To determine change in the level of knowledge on breast cancer at baseline and after the distribution of the DOH health education material | 2 | Related

V. ASSUMPTION OF DISTRIBUTION
Parametric
Assumptions
▫ Random selection
▫ Normality
▫ Homoscedasticity
Numerical data
▫ Interval
▫ Ratio
Non-Parametric
Few assumptions
Non-numerical data
▫ Nominal
▫ Ordinal
Smaller sample size
VI. TEST FOR DIFFERENCE BETWEEN GROUP PROPORTIONS
No. of Groups | Independent | Related
2 | Chi-square/Fisher's | McNemar
>2 | Chi-square | Cochran Q
VII. TEST FOR DIFFERENCE BETWEEN GROUP MEANS/MEDIANS
Assumption (Level of Measurement) | 2 groups, Independent | 2 groups, Related | >2 groups, Independent | >2 groups, Related
Parametric (Interval, Ratio) | Independent t-test | Paired t-test | One-way ANOVA | Two-way ANOVA
Non-Parametric (Ordinal) | Wilcoxon Mann-Whitney | Wilcoxon signed rank | Kruskal Wallis | Friedman

VIII. STATISTICAL TOOLS TO INVESTIGATE RELATIONSHIP BETWEEN VARIABLES
VARY ACCORDING TO:
Nominal
o Cramer coefficient
o Phi coefficient
o Kappa coefficient of agreement
o Chi-square test of association
Ordinal
o Spearman rank-order correlation
Interval/Ratio
o Pearson product moment correlation (simple and multiple)
o Linear regression (simple and multiple)

EXAMPLES
Study Objective | Level of Measurement | No. of Samples | Test Statistic
To determine if the average dental caries score among public elementary school children has improved since last year’s survey | Ratio | 1 | t-test for 1 mean
To determine if the prevalence of parasitism among 6 yo children entering school differs from the national figure which is 70% | Nominal | 1 | z-test for 1 proportion

Study Objective | Level of Measurement | No. of Samples | Type of Sample | Test Statistic
To assess the health effects of vehicular emissions on vulnerable population groups by looking at mean blood lead levels between schoolchildren and street child vendors | Ratio | 2 | Independent | Independent t-test
To compare the rate of success for the treatment of otitis media defined as the proportion who had ≤ 1 otitis media episode in the first year of treatment between medically and surgically treated groups | Nominal | 2 | Independent | Chi-square test
To compare the performance of 10 pairs of students matched by IQ, one subjected to programmed materials and the other subjected to lecture type of learning process | Ordinal | 2 | Related | Wilcoxon signed ranks test
To compare the hypoalgesic effect as measured through pain scores of true (distal & proximal to the tourniquet) and sham acupuncture | Ratio | 3 | Independent | ANOVA
To compare the prevalence of current smoking among different income groups categorized into quintiles | Nominal | 5 | Independent | Chi-square test
To determine change in the level of knowledge on breast cancer at baseline and after the distribution of DOH health education material | Nominal | 2 | Related | McNemar’s Change Test
To determine whether systolic BP of patients in standing and recumbent position vary with each other | Ratio | 1 | | Pearson correlation
To determine if there is a relationship between gender and smoking status | Nominal | 1 | | Chi-square test of association
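The selection tables in sections VI and VII amount to a lookup on three factors: level of measurement, number of groups, and whether the samples are related. The sketch below encodes that lookup. The function name `choose_difference_test` and the string labels are hypothetical, and the sketch assumes that interval/ratio data meet the parametric assumptions of section V; when they do not, the ordinal (non-parametric) row applies instead.

```python
def choose_difference_test(level, n_groups, related):
    """Suggest a test for a difference between groups, following sections VI and VII.

    level    -- "nominal", "ordinal", "interval" or "ratio"
    n_groups -- number of groups being compared (2 or more)
    related  -- True for related (paired/matched) samples, False for independent
    """
    if n_groups < 2:
        raise ValueError("These tables cover comparisons of 2 or more groups")
    size = "2" if n_groups == 2 else ">2"
    tables = {
        # Section VI: difference between group proportions
        "nominal": {("2", False): "Chi-square/Fisher's", ("2", True): "McNemar",
                    (">2", False): "Chi-square", (">2", True): "Cochran Q"},
        # Section VII, non-parametric row
        "ordinal": {("2", False): "Wilcoxon Mann-Whitney", ("2", True): "Wilcoxon signed rank",
                    (">2", False): "Kruskal Wallis", (">2", True): "Friedman"},
        # Section VII, parametric row (assumes normality and homoscedasticity)
        "numeric": {("2", False): "Independent t-test", ("2", True): "Paired t-test",
                    (">2", False): "One-way ANOVA", (">2", True): "Two-way ANOVA"},
    }
    key = "numeric" if level in ("interval", "ratio") else level
    if key not in tables:
        raise ValueError(f"Unknown level of measurement: {level}")
    return tables[key][(size, related)]

# Mean blood lead levels, schoolchildren vs street child vendors
print(choose_difference_test("ratio", 2, related=False))
# Matched pairs of students, ordinal performance scores
print(choose_difference_test("ordinal", 2, related=True))
# Smoking prevalence across income quintiles
print(choose_difference_test("nominal", 5, related=False))
```

A lookup like this covers only tests of difference; the correlation and association tools of section VIII are chosen by level of measurement alone.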
Example 1
Hypothesis
Among breastfed infants with colic presenting in the 1st 6 wks of life, elimination of multiple, major allergenic food proteins from the maternal diet is associated with a reduction in crying and fussing.
A randomized, controlled trial of a low-allergen maternal diet was conducted among exclusive breastfed infants presenting with colic. The primary endpoint was the duration of crying/fussing measured in minutes within 48 hours taken at baseline (days 1 & 2) and on days 8 & 9.

Variable | Low-allergen diet | Control diet
Mean cry/fuss duration, min/48h | |
Days 1 & 2 | 690 | 631
Days 8 & 9 | 431 | 509

What is the objective of the study? Comparison of mean between groups
What is the level of measurement? Ratio
How many samples/groups are involved? 2
Are they related or independent? Independent
What is the appropriate test statistic?
▫ between groups at baseline: Independent t-test
▫ between groups on days 8 & 9: Independent t-test
▫ within each group between baseline and on days 8 & 9: Paired t-test
What is the proportion of non-significant reduction in cry/fuss duration among those whose mothers were on low-allergenic diet?

IX. ANALYSIS OF 2X2 TABLES
Data Layout for Cohort
Exposure status | With Disease | Without Disease | Total
Exposed | A | B | A+B
Unexposed | C | D | C+D

Data Layout for Case Control
Exposure status | Cases | Controls
Exposed | A | B
Unexposed | C | D
Total | A+C | B+D

X. MEASURES OF EFFECT/ASSOCIATION
Relative Risk (RR)
Ratio of incidence of disease in the exposed to the incidence of disease in the unexposed
Odds Ratio (OR)
Ratio of 2 odds, the odds of exposure among cases and the odds of exposure among the controls

XI. MEASURING THE ACCURACY OF THE DIAGNOSTIC TEST
Data Lay-out
Diagnostic or Screening Test | Gold Standard: Disease Present | Gold Standard: Disease Absent | Total
Positive Test | True Positive (a) | False Positive (b) | TP + FP (a + b)
Negative Test | False Negative (c) | True Negative (d) | FN + TN (c + d)
Total | TP + FN (a + c) | FP + TN (b + d) | TP + FP + FN + TN (a + b + c + d)
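The layouts in sections IX to XI translate directly into arithmetic on the four cells. The sketch below follows the cell labels used above (A, B, C, D for exposure tables; a, b, c, d for the diagnostic table). The function names and the example counts are hypothetical. Sensitivity, specificity, and the predictive values are the usual accuracy measures read off such a table; they are included here as an assumption, since the notes show only the data layout and do not name the measures.

```python
def relative_risk(a, b, c, d):
    """RR from a cohort layout: incidence in exposed / incidence in unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR from a case-control layout: (A/C) / (B/D), which simplifies to AD/BC."""
    return (a * d) / (b * c)

def diagnostic_accuracy(tp, fp, fn, tn):
    """Standard accuracy measures from the diagnostic test layout (a, b, c, d)."""
    return {
        "sensitivity": tp / (tp + fn),  # a / (a + c): diseased who test positive
        "specificity": tn / (fp + tn),  # d / (b + d): non-diseased who test negative
        "ppv": tp / (tp + fp),          # a / (a + b): positives who have the disease
        "npv": tn / (fn + tn),          # d / (c + d): negatives who are disease-free
    }

# Hypothetical counts for illustration only.
rr = relative_risk(30, 70, 10, 90)
oratio = odds_ratio(30, 70, 10, 90)
accuracy = diagnostic_accuracy(tp=90, fp=20, fn=10, tn=180)

print(f"Relative risk: {rr:.2f}")
print(f"Odds ratio: {oratio:.2f}")
for name, value in accuracy.items():
    print(f"{name}: {value:.2f}")
```

Relative risk needs incidence, so it is computed from cohort data; in a case-control study the totals for exposed and unexposed are fixed by design, which is why the odds ratio is used there instead.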