Review of Statistical Analysis

November 19, 2017 | Author: IsabelCupino | Category: Statistical Hypothesis Testing, Level Of Measurement, Null Hypothesis, Statistics, Student's T Test



[RES] 1.01 REVIEW OF STATISTICAL ANALYSIS Dr. Adversario || June 29, 2015

Transcribers: Campbell, Candare, Caraan, Carasig, Carballo
Editors: Adviento

OUTLINE
I. Descriptive Analysis
   A. Qualitative and Quantitative Measures
      1. Measures of Frequency
      2. Measures of Location
      3. Measures of Central Tendency
      4. Measures of Dispersion
   B. Tabular and Graphical Presentation
II. Inferential Analysis
   A. Estimation
   B. Hypothesis Testing
III. Factors to Be Considered in Choosing the Proper Statistical Tests
IV. Types of Variables and Levels of Measurement
V. Assumption of Distribution
VI. Test for Difference Between Group Proportions
VII. Test for Difference of Group Means/Medians
VIII. Statistical Tools to Investigate Relationship Between Variables
IX. Analysis of 2x2 Tables
X. Measures of Effect/Association
XI. Measuring the Accuracy of Diagnostic Tests

Legend: Remember (Exams), Lecturer, Book, Previous Trans, Trans Comm

A. Measures of Frequency
1. Count: absolute number of persons/elements with the characteristic
2. Ratio: single number representing the relative size of 2 numbers; (a/b) × k
3. Proportion: special type of ratio where the numerator is part of the denominator; [a/(a+b)] × k
4. Rate: frequency of occurrence of events in a given interval of time

B. Measures of Location
1. Percentile: one of the 99 values of a variable which divide the distribution into 100 equal parts
2. Decile: one of the 9 values of a variable which divide the distribution into 10 equal parts
3. Quartile: one of the 3 values of a variable which divide the distribution into 4 equal parts
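The cut-point counts in the definitions above (99 percentiles, 9 deciles, 3 quartiles) can be checked numerically. The sketch below is only an illustration: the weights are invented, and it assumes Python's standard `statistics.quantiles` function with its default interpolation method, which is one of several conventions for computing quantiles.

```python
import math
import statistics

# Hypothetical weights in kg, invented for illustration only.
weights = [9.5, 10.2, 11.0, 12.6, 12.6, 13.4, 14.1, 15.0, 15.9, 16.0, 16.0, 17.3]

quartiles = statistics.quantiles(weights, n=4)      # 3 cut points: Q1, Q2, Q3
deciles = statistics.quantiles(weights, n=10)       # 9 cut points: D1 to D9
percentiles = statistics.quantiles(weights, n=100)  # 99 cut points: P1 to P99

q1, q2, q3 = quartiles
d5 = deciles[4]            # 5th decile
p25 = percentiles[24]      # 25th percentile
p50 = percentiles[49]      # 50th percentile
p75 = percentiles[74]      # 75th percentile

# The same position in the distribution gives the same value, whichever name is used.
same_q1 = math.isclose(q1, p25)
same_q2 = math.isclose(q2, p50) and math.isclose(q2, d5)
same_q3 = math.isclose(q3, p75)
```

With any of these conventions, the second quartile, fifth decile, and 50th percentile all coincide with the median.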

LEARNING OUTCOMES
I. Determine the appropriate descriptive measure for summarizing data
II. Determine the appropriate method for presenting data
III. Determine the appropriate statistical test for analyzing data

Descriptive Statistics
• Method to summarize and present data in a form that makes them easier to analyze and interpret
• Consists of the collection, organization, summarization, and presentation of data

Equivalent percentiles, deciles, and quartiles
• P25 = D2.5 = Q1
• P50 = D5 = Q2
• P75 = D7.5 = Q3

Inferential Statistics
• Method to make generalizations and conclusions about a target population based on results from a sample
• Inferences from samples to populations
• Uses probability

Summarizing Figures
• Qualitative measures: Frequency, Location
• Quantitative measures: Central Tendency, Dispersion
• Tabular presentation
• Graphical presentation

C. Measures of Central Tendency
• Mean: average
• Median: middlemost value
• Mode: most frequent value

I. DESCRIPTIVE ANALYSIS


Interpretation
• Mean: The average weight of the patients is 14.4 kg.
• Median: Half of the patients weighed less than 15.85 kg, while the other half weighed 15.85 kg or more.
• Mode: The usual weights of the patients are 12.6 kg and 16 kg.

The choice of the measure of central tendency depends on:
• scale of measurement
• nature of the distribution

Coefficient of Variation
• Measure of relative dispersion that expresses the standard deviation as a percentage of the mean: CV = (s / x̄) × 100%
• Used to compare 2 different variables
• Used to compare 2 different populations on the same variable
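As a rough illustration of why the coefficient of variation helps when two variables are measured on different scales, the sketch below computes it for two invented samples. The function name `coefficient_of_variation` and both data sets are hypothetical; only the definition (standard deviation as a percentage of the mean) comes from the notes.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100

# Hypothetical measurements, invented for illustration only.
weights_kg = [12.0, 14.0, 16.0, 18.0, 20.0]
heights_cm = [90.0, 95.0, 100.0, 105.0, 110.0]

cv_weight = coefficient_of_variation(weights_kg)
cv_height = coefficient_of_variation(heights_cm)

# The heights have the larger standard deviation in raw units,
# but the weights vary more relative to their own mean.
print(f"CV weight: {cv_weight:.1f}%  CV height: {cv_height:.1f}%")
```

Because the CV has no units, the two percentages can be compared directly even though kilograms and centimetres cannot.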

D. Measures of Dispersion
• Range
• Variance
• Standard Deviation
• Coefficient of Variation
• Interquartile Range

Summary
• Proportions and percentages are used to summarize nominal and ordinal data
• Percentiles are useful to compare an individual observation with a norm
• The median is used for ordinal data or skewed numerical data
• The range is used with numerical data when the purpose is to emphasize extreme values
• The standard deviation is used when the mean is used
• The coefficient of variation is used when the intent is to compare distributions measured on different scales
• The interquartile range is used to describe the central 50% of a distribution, regardless of its shape
• Variance and standard deviation can be used to directly compare two samples with the same units of measure
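The measures summarized above can all be computed with Python's standard `statistics` module. This is a minimal sketch on invented data (not the patient weights from the lecture), and the quartiles use the module's default interpolation method, which is an assumption rather than the lecturer's convention.

```python
import statistics

# Hypothetical weights in kg, invented for illustration (not the lecture's data).
weights = [10, 12, 12, 15, 16, 16, 20]

# Measures of central tendency
mean = statistics.mean(weights)
median = statistics.median(weights)
modes = statistics.multimode(weights)   # a list, since data can have more than one mode

# Measures of dispersion
data_range = max(weights) - min(weights)
variance = statistics.variance(weights)  # sample variance (n - 1 in the denominator)
std_dev = statistics.stdev(weights)      # square root of the sample variance
q1, _, q3 = statistics.quantiles(weights, n=4)
iqr = q3 - q1                            # spread of the central 50% of the data

print(mean, median, modes, data_range, variance, std_dev, iqr)
```

The data are bimodal, which is why `multimode` is used instead of `mode`.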

Tabular and Graphical Presentation

Figures
• Visual presentation of results
  o Graphs
  o Diagrams
  o Photographs
  o Pen and ink drawings
  o Flow charts
  o Schematics
  o Maps

Types of Graphs
• Pie Graph
• Bar Graph
  o Vertical
  o Horizontal
  o Component
• Frequency Polygon
• Scatterplot
• Histogram
• Line Graph

Choosing the appropriate graph
GRAPH | NATURE OF VARIABLE | PURPOSE
Histogram or frequency polygon | Quantitative continuous | Graphic representation of a frequency distribution
Bar (horizontal or vertical) | Qualitative or quantitative discrete | Comparison of absolute or relative counts between categories
Pie or component bar | Qualitative | Breakdown of a group total where the number of categories is not too many
Line graph | Time series | Shows trend of data or changes with time
Scatterplot | Quantitative | Correlate data between two variables

Figure Checklist
• Is the figure necessary?
• Are the data plotted accurately?
• Is the grid scale correctly proportioned?
• Are parallel figures or equally important figures prepared according to timescale?
• Avoid 3D figures for 2D data
• Avoid non-data ink (ticks, grids, frames) and chart junk
• Avoid optical illusions: broken lines, markers, hatching fill patterns, improper aspect ratios



Summarizing Figure: Box Plot

Table 13. Most Common Reasons for Referral to One's Specialty in Area of Practice

As originally presented:
Reasons for Referral | Area of Practice: NCR No. (%) | Area of Practice: Non-NCR No. (%) | Total
Co-management | 27 (48.2143) | 29 (51.7857) | 56
CP Clearance | 23 (63.8888) | 13 (36.1111) | 36
Diagnostic Procedures | 22 (56.4103) | 17 (43.5897) | 39
Ventilator Management | 18 (43.9024) | 23 (56.0976) | 41
Peri-operative evaluation for thoracic surgery | 9 (50.0000) | 9 (50.0000) | 18
Weaning from ventilator | 6 (30.0000) | 14 (70.0000) | 20

What is wrong with the table?
• Too many decimal places
• "Area of Practice" can be merged into one heading
• Gridlines should be limited to three to highlight figures

With the problems corrected ("Area of Practice" is a single heading over the NCR and Non-NCR columns):
Reasons for Referral | NCR No. (%) | Non-NCR No. (%) | Total
Co-management | 27 (48.2) | 29 (51.8) | 56
CP Clearance | 23 (63.9) | 13 (36.1) | 36
Diagnostic Procedures | 22 (56.4) | 17 (43.6) | 39
Ventilator Management | 18 (43.9) | 23 (56.1) | 41
Peri-operative evaluation for thoracic surgery | 9 (50) | 9 (50) | 18
Weaning from ventilator | 6 (30) | 14 (70) | 20

II. INFERENTIAL ANALYSIS

A. Estimation
• The process by which a statistic computed for a random sample is used to approximate ("estimate") the corresponding parameter
  o Parameter: numerical constant obtained by observing the total population
  o Statistic: numerical variable obtained by observing a random sample from the population

Measure | Parameter | Statistic
Mean | μ | x̄
Variance | σ² | s²
Standard deviation | σ | s
Proportion | P | p
Difference between Two Means | μ1 - μ2 | x̄1 - x̄2
Difference between Two Proportions | P1 - P2 | p1 - p2

• Point estimate: single numerical value used to approximate the population parameter
• Interval estimate: consists of 2 numbers, a lower limit and an upper limit, which serve as the bounding values within which the parameter is expected to lie with a certain degree of confidence
• A point estimate is more precise, but an interval estimate is more likely to be correct because it gives a range of values.

Example:
Research Objective | Results
To estimate the prevalence of parasitism among Filipino children 1-5 years old | 70% (61%, 79%)
To determine the average dental caries score among public elementary school children | 15% (14.7%, 15.3%)

Estimate: 70% and 15% are the point estimates for each objective, and the values inside the parentheses are the interval estimates.

Example 1: Estimation of Population Mean
A municipal health officer was interested in identifying factors affecting the utilization of health services in his area. Among the factors that he considered was the accessibility of the Rural Health Unit. He interviewed a random sample of 25 patients and asked about the distance travelled in going from their homes to the clinic. His findings showed a mean travel distance of 7 km. What is the point estimate of the mean distance travelled by the population of patients served by the clinic?

The point estimate of the mean distance travelled by the patients from their homes to the clinic is 7 km, which is the result obtained from the sample.

Example 2: Estimation of the Population Proportion
A survey was conducted to study the dental health practices of adults in a certain urban population. Of 300 adults randomly selected and interviewed, 123 indicated that they had a regular dental check-up twice a year. What are the point estimate and the 95% interval estimate of the population proportion who had a regular dental check-up?

The point estimate of the population proportion who had a regular dental check-up is 41% (123/300 × 100).

B. Hypothesis Testing

Process of hypothesis testing

Error | Definition | Meaning
Type I error (α error) | Probability of rejecting a true H0 | Concluding there is a difference when none exists
Type II error (β error) | Probability of not rejecting a false H0 | Concluding that no difference exists when there is one

STEPS IN HYPOTHESIS TESTING

1. Stating the null hypothesis, H0, and the alternative hypothesis, H1
Example: A study to compare the performance in Anatomy of 2 groups of students, those using cadavers for demonstration and those using models. If the parameter for evaluating student performance is the proportion who obtain a grade of 2.0 or better, then H0 and H1 are formulated as follows:
• H0: No difference between those using cadavers and those using models
• H1 (two-tailed test): There is a difference
• H1 (one-tailed test): Those using models have greater performance, or vice versa

2. Stating the level of significance, α
• When we arbitrarily set the level of significance at α, we are setting the probability that we shall erroneously reject a true H0 to be at most equal to α
• e.g., if we set α = 0.05, the probability of rejecting a true hypothesis is at most 5%

3. Choosing the test statistic and determining its sampling distribution
• Depends on the sampling distribution of the sample statistic
• Probability distribution tables of the different test statistics
  o normal table: z statistic
  o t table: t statistic
  o χ² table: χ² statistic
• Factors to be considered in choosing the appropriate statistical test
  o Objectives of the study
  o Type of variable
  o Level of measurement
  o Whether the samples are related or independent
  o Assumption on the distribution

4. Determining the critical region
• Critical region (CR): set of values of the test statistic that lead to rejection of the null hypothesis
• Critical region for a two-tailed z test with α = 0.05

Example:
H0: PM = PC
H1: PM ≠ PC
Level of significance (α) = 0.05
What is the critical region (CR)?
• Divide the CR into 2 equal parts, α/2 = 0.025, such that one part is located in each tail end of the sampling distribution of the test statistic
• From the normal table, the z value corresponding to a probability of 0.025 is 1.96
• CR: z ≥ 1.96 or z ≤ -1.96

5. Computing the test statistic

6. Making the statistical decision, i.e., whether or not to reject the null hypothesis
Rejecting or not rejecting the null hypothesis (H0):
  1. If the computed value of the test statistic falls in the critical region, then we reject H0
  2. If the probability (p-value) of getting the computed test statistic under H0 is low, the sample data do not support H0 and thus we can reject H0

7. Drawing conclusions about the population
STATISTICAL DECISION | CONCLUSION
Reject the null hypothesis (H0) | State the alternative hypothesis (H1)
Do not reject the null hypothesis (H0) | "There is no sufficient evidence to say (state the alternative hypothesis)"
NOTE: We don't accept the null hypothesis.

III. FACTORS TO BE CONSIDERED IN CHOOSING THE APPROPRIATE STATISTICAL TEST
• Objectives of the study
• Type of variable
• Level of measurement
• Whether the samples are related or independent
• Assumption on the distribution

Objectives of the study
OBJECTIVES | EXAMPLE OF RESEARCH OBJECTIVE
To compare the level of parameter (mean) with a pre-specified value (i.e., standard value, national figure, previous results) | To determine if the average life span of Filipinos has changed over the years since 1995, when it was 65 years old
To compare the level of parameter (proportion) with a pre-specified value (i.e., standard value, national figure, previous results) | To determine the prevalence of breast cancer among Filipino women aged 50-54 years old if the prevalence based on previous studies is about 5%
To compare the parameter (mean) between 2 groups | To assess the health effects of vehicular emissions on vulnerable population groups by looking at mean blood lead levels between schoolchildren and street child vendors
To compare the parameter (proportion) between 2 groups | To determine success rate, defined as the proportion who had ≤ 1 otitis media episode in the first year of treatment, between medically and surgically treated groups
To compare the parameter (mean) between 2 or more groups | To compare the hypoalgesic effect as measured through pain scores of true (distal & proximal to the tourniquet) and sham acupuncture
To compare the parameter (proportion) between 2 or more groups | To compare the prevalence of current smoking among different income groups categorized into quintiles
To determine whether two or more quantitative variables are related | To determine whether systolic blood pressure of patients in the recumbent and standing positions vary with each other
To determine whether two or more qualitative variables are related | To determine if there is an association between gender and smoking status
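Two numbers from this part of the lecture can be reproduced with a short calculation: the 95% interval estimate asked for in Example 2 (123 of 300 adults), and the 1.96 cut-off of the two-tailed critical region at α = 0.05. The sketch below is an illustration only. It assumes the normal-approximation (Wald) interval for a proportion, which the notes do not name, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Point estimate and normal-approximation (Wald) interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin

def two_tailed_critical_value(alpha=0.05):
    """z value that leaves alpha/2 in each tail of the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def in_critical_region(z, alpha=0.05):
    """True when the computed z statistic leads to rejecting H0 (two-tailed test)."""
    return abs(z) >= two_tailed_critical_value(alpha)

def two_tailed_p_value(z):
    """Probability, under H0, of a z statistic at least this far from zero."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Example 2: 123 of 300 adults reported a regular dental check-up.
p_hat, lower, upper = proportion_ci(123, 300)
z_crit = two_tailed_critical_value(0.05)

print(f"point estimate: {p_hat:.1%}, 95% interval: ({lower:.1%}, {upper:.1%})")
print(f"critical region: z >= {z_crit:.2f} or z <= -{z_crit:.2f}")
```

Under these assumptions the interval comes out at roughly 35% to 47% around the 41% point estimate. The two decision rules in step 6 agree with each other: a z statistic falls in the critical region exactly when its two-tailed p-value is at most α.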

IV. TYPES OF VARIABLES AND LEVELS OF MEASUREMENT

A. TYPES OF VARIABLES
1. QUANTITATIVE
• Variables that can be measured and ordered according to quantity or amount, or whose values can be expressed numerically
  o age, height, weight, no. of correct answers
• Discrete: integers/whole numbers
• Continuous: fractions/decimals
2. QUALITATIVE
• Categories are simply used as labels to distinguish one group from another
  o sex, urban-rural classification, religion, region in the country, occupation, marital status, disease status

LEVELS OF MEASUREMENT
1. NOMINAL
• Numbers or names which represent a set of mutually exclusive and exhaustive classes to which individuals or objects may be assigned
  o sex, regions, race, occupation, patient ID no.
2. ORDINAL
• Classes can be ordered or ranked
  o dehydration status: none, some, severe
  o socio-economic status: low, middle, high
3. INTERVAL
• Exact/equal distance between two categories can be determined
• Zero point is arbitrary and does not mean absence of the characteristic
  o temperature, calendar time, IQ
4. RATIO
• Zero point is fixed
  o weight, height, blood pressure, number of seizure recurrences, number of pre-natal visits

Study Variable | Type of Variable | Level of Measurement
Average dental caries score among public elementary school children | Quantitative | Ratio
Prevalence of parasitism among Filipino children 1-5 yrs old | Qualitative | Nominal
Mean blood lead levels between schoolchildren and street child vendors | Quantitative | Ratio
Success rate between medically vs surgically treated groups for otitis media | Qualitative | Nominal

Number of samples and whether they are related or independent
Study Objective | Number of Samples | Type of Sample
To compare the mean blood lead levels between schoolchildren and street child vendors | 2 | Independent
To compare the performance of 10 pairs of students matched by IQ, one subjected to programmed materials and the other subjected to lecture type of learning process | 2 | Related
To compare the prevalence of current smoking among different income groups categorized into quintiles | 5 | Independent
To determine change in the level of knowledge on breast cancer at baseline and after the distribution of the DOH health education material | 2 | Related

V. ASSUMPTION OF DISTRIBUTION
Parametric
• Assumptions
  o Random selection
  o Normality
  o Homoscedasticity
• Numerical data
  o Interval
  o Ratio

Non-Parametric
• Few assumptions
• Non-numerical data
  o Nominal
  o Ordinal
• Smaller sample size

VI. TEST FOR DIFFERENCE BETWEEN GROUP PROPORTIONS
Type of Sample | 2 Groups | >2 Groups
Independent | Chi-square/Fisher's | Chi-square
Related | McNemar | Cochran Q

VII. TEST FOR DIFFERENCE BETWEEN GROUP MEANS/MEDIANS
Assumption | Level of Measurement | 2 Independent | 2 Related | >2 Independent | >2 Related
Parametric | Interval, Ratio | Independent t-test | Paired t-test | One-way ANOVA | Two-way ANOVA
Non-Parametric | Ordinal | Wilcoxon-Mann-Whitney | Wilcoxon signed rank | Kruskal-Wallis | Friedman

VIII. STATISTICAL TOOLS TO INVESTIGATE RELATIONSHIP BETWEEN VARIABLES
Vary according to the level of measurement:
• Nominal
  o Cramer coefficient
  o Phi coefficient
  o Kappa coefficient of agreement
  o Chi-square test of association
• Ordinal
  o Spearman rank-order correlation
• Interval/Ratio
  o Pearson product moment correlation (simple and multiple)
  o Linear regression (simple and multiple)

EXAMPLES
Study Objective | Level of Measurement | No. of Samples | Test Statistic
To determine if the average dental caries score among public elementary school children has improved since last year's survey | Ratio | 1 | t-test for 1 mean
To determine if the prevalence of parasitism among 6-year-old children entering school differs from the national figure, which is 70% | Nominal | 1 | z-test for 1 proportion

Study Objective | Level of Measurement | No. of Samples | Type of Sample | Test Statistic
To assess the health effects of vehicular emissions on vulnerable population groups by looking at mean blood lead levels between schoolchildren and street child vendors | Ratio | 2 | Independent | Independent t-test
To compare the rate of success for the treatment of otitis media, defined as the proportion who had ≤ 1 otitis media episode in the first year of treatment, between medically and surgically treated groups | Nominal | 2 | Independent | Chi-square test
To compare the performance of 10 pairs of students matched by IQ, one subjected to programmed materials and the other subjected to lecture type of learning process | Ordinal | 2 | Related | Wilcoxon signed ranks test
To compare the hypoalgesic effect as measured through pain scores of true (distal & proximal to the tourniquet) and sham acupuncture | Ratio | 3 | Independent | ANOVA
To compare the prevalence of current smoking among different income groups categorized into quintiles | Nominal | 5 | Independent | Chi-square test
To determine change in the level of knowledge on breast cancer at baseline and after the distribution of DOH health education material | Nominal | 2 | Related | McNemar's Change Test
To determine whether systolic BP of patients in standing and recumbent position vary with each other | Ratio | 1 | | Pearson correlation
To determine if there is a relationship between gender and smoking status | Nominal | 1 | | Chi-square test of association
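The selection tables in sections VI and VII amount to a lookup keyed on the parameter being compared, the number of groups, whether the samples are related, and whether parametric assumptions hold. The sketch below encodes them that way. The dictionary and function names are hypothetical, and it deliberately returns only the test names listed in these notes rather than running any test.

```python
# Lookup tables transcribed from sections VI and VII of these notes.
# Keys: (number of groups as "2" or ">2", type of sample).
PROPORTION_TESTS = {
    ("2", "independent"): "Chi-square/Fisher's exact test",
    ("2", "related"): "McNemar test",
    (">2", "independent"): "Chi-square test",
    (">2", "related"): "Cochran Q test",
}

# Keys: (assumption family, number of groups, type of sample).
MEAN_MEDIAN_TESTS = {
    ("parametric", "2", "independent"): "Independent t-test",
    ("parametric", "2", "related"): "Paired t-test",
    ("parametric", ">2", "independent"): "One-way ANOVA",
    ("parametric", ">2", "related"): "Two-way ANOVA",
    ("nonparametric", "2", "independent"): "Wilcoxon-Mann-Whitney test",
    ("nonparametric", "2", "related"): "Wilcoxon signed-rank test",
    ("nonparametric", ">2", "independent"): "Kruskal-Wallis test",
    ("nonparametric", ">2", "related"): "Friedman test",
}

def choose_test(parameter, n_groups, sample_type, parametric=True):
    """Return the test named in the notes for comparing groups.

    parameter:   "proportion" or "mean" (use "mean" for means/medians)
    n_groups:    number of groups being compared (2 or more)
    sample_type: "independent" or "related"
    parametric:  whether parametric assumptions are taken to hold
    """
    if n_groups < 2:
        raise ValueError("these tables cover comparisons of 2 or more groups")
    groups = "2" if n_groups == 2 else ">2"
    if parameter == "proportion":
        return PROPORTION_TESTS[(groups, sample_type)]
    if parameter == "mean":
        family = "parametric" if parametric else "nonparametric"
        return MEAN_MEDIAN_TESTS[(family, groups, sample_type)]
    raise ValueError("parameter must be 'proportion' or 'mean'")

# Blood lead levels in schoolchildren vs street child vendors (2 independent groups).
print(choose_test("mean", 2, "independent"))
# Smoking prevalence across income quintiles (5 independent groups).
print(choose_test("proportion", 5, "independent"))
```

The two calls reproduce the corresponding rows of the examples table: an independent t-test and a chi-square test.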

Example 1

Hypothesis
Among breastfed infants with colic presenting in the 1st 6 wks of life, elimination of multiple, major allergenic food proteins from the maternal diet is associated with a reduction in crying and fussing.

A randomized, controlled trial of a low-allergen maternal diet was conducted among exclusively breastfed infants presenting with colic. The primary endpoint was the duration of crying/fussing measured in minutes within 48 hours, taken at baseline (days 1 & 2) and on days 8 & 9.

Mean cry/fuss duration, min/48h | Low-allergen diet | Control diet
Days 1 & 2 | 690 | 631
Days 8 & 9 | 431 | 509

• What is the objective of the study? Comparison of means between groups
• What is the level of measurement? Ratio
• How many samples/groups are involved? 2
• Are they related or independent? Independent
• What is the appropriate test statistic?
  o between groups at baseline: Independent t-test
  o between groups on days 8 & 9: Independent t-test
  o within each group between baseline and days 8 & 9: Paired t-test
• What is the proportion of non-significant reduction in cry/fuss duration among those whose mothers were on the low-allergen diet?

IX. ANALYSIS OF 2X2 TABLES

Data Layout for Cohort
Exposure Status | With Disease | Without Disease | Total
Exposed | A | B | A+B
Unexposed | C | D | C+D

Data Layout for Case Control
Exposure Status | Cases | Controls
Exposed | A | B
Unexposed | C | D
Total | A+C | B+D

X. MEASURES OF EFFECT/ASSOCIATION
Relative Risk (RR)
• Ratio of the incidence of disease in the exposed to the incidence of disease in the unexposed

Odds Ratio (OR)
• Ratio of 2 odds: the odds of exposure among the cases and the odds of exposure among the controls

XI. MEASURING THE ACCURACY OF THE DIAGNOSTIC TEST

Data Layout
Diagnostic or Screening Test | Gold Standard: Disease Present | Gold Standard: Disease Absent | Total
Positive Test | True Positive (a) | False Positive (b) | TP + FP (a + b)
Negative Test | False Negative (c) | True Negative (d) | FN + TN (c + d)
Total | TP + FN (a + c) | FP + TN (b + d) | TP + FP + FN + TN (a + b + c + d)
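The 2x2 layouts above translate directly into arithmetic on the four cells. The sketch below is only an illustration with invented counts. The relative risk and odds ratio follow the definitions in section X; sensitivity, specificity, and the predictive values are the standard accuracy measures for the diagnostic layout, and they are an addition here because the notes give only the table. All function names are hypothetical.

```python
def relative_risk(a, b, c, d):
    """Cohort layout: incidence in the exposed divided by incidence in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Case-control layout: odds of exposure in cases (a/c) over odds in controls (b/d)."""
    return (a * d) / (b * c)

def diagnostic_accuracy(tp, fp, fn, tn):
    """Standard accuracy measures from the diagnostic test layout."""
    return {
        "sensitivity": tp / (tp + fn),  # a / (a + c): diseased who test positive
        "specificity": tn / (fp + tn),  # d / (b + d): non-diseased who test negative
        "ppv": tp / (tp + fp),          # a / (a + b): positives who have the disease
        "npv": tn / (fn + tn),          # d / (c + d): negatives who are disease-free
    }

# Hypothetical counts, invented for illustration only.
rr = relative_risk(a=30, b=70, c=10, d=90)    # exposed: 30/100, unexposed: 10/100
or_ = odds_ratio(a=30, b=70, c=10, d=90)
accuracy = diagnostic_accuracy(tp=80, fp=30, fn=20, tn=170)

print(f"RR = {rr:.2f}, OR = {or_:.2f}")
print({name: round(value, 3) for name, value in accuracy.items()})
```

With these invented counts the exposed group has three times the incidence of the unexposed group, while the odds ratio is somewhat larger, as is typical when the outcome is not rare.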
