Research Design and Statistical Analysis - 4th Ed 2006
Research Design and Statistical Analysis in Christian Ministry
© 4th ed. 2006 Dr. Rick Yount

Table of Contents

Preface

Unit I: Research Fundamentals

1  Scientific Knowing
   Ways of Knowing
      Common Sense
      Authority
      Intuition/Revelation
      Experience
      Deductive Reasoning
      Inductive Reasoning
   Science as a Way of Knowing
      Objectivity
      Precision
      Verification
      Empiricism
      Goal: Theories
   The Scientific Method
   Types of Research
      Historical Research
         Primary sources
         Secondary sources
         Criticism
         Examples
      Descriptive Research
      Correlational Research
      Experimental Research
      Ex Post Facto Research
      Evaluation
      Research and Development
      Qualitative Research
   Faith and Science
      Suspicion of Science by the Faithful
      Suspicion of Religion by the Scientific
      There Need Be No Conflict
   Summary
   Vocabulary
   Study Questions
   Sample Test Questions

2  Proposal Organization
   Introduction
   Front Matter
      Title Page
      Table of Contents
      List of Tables
      List of Illustrations
   The Introductory Statement
      The Statement of the Problem
      Purpose of the Study
      Synthesis of Related Literature
      Significance of the Study
      The Hypothesis
   Method
      Population
      Sampling
      Instrument
      Limitations
      Assumptions
      Definitions
      Design
      Procedure for Collecting Data
   Analysis
      Procedure for Analyzing Data
      Testing the Hypotheses
      Reporting the Data
   Reference Material
      Appendices
      Bibliography, or Cited Sources
   Practical Suggestions
      Personal Anxiety
      Professionalism in Writing
      Clear Thinking
      Unified Flow
      Quality Library Research
      Efficient Design
      Accepted Format
   Summary
   Vocabulary
   Study Questions
   Sample Test Questions

3  Empirical Measurement
   Variables and Constants
      Independent Variables
      Dependent Variables
   Measurement Types
      Nominal Measurement
      Ordinal Measurement
      Interval Measurement
      Ratio Measurement
      Data Type Summary
   Operationalization
      Definitions
      An Example
      Another Example
      Operationalization Questions
   Summary
   Vocabulary
   Study Questions
   Sample Test Questions

4  Getting On Target
   The Problem Statement
      Characteristics of a Problem
         Limit scope of your study
         Current theory and/or latest research
         Meaningfulness
         Clearly written
      Examples of Problem Statements
         Association Between Two Variables
         Association of Several Variables
         Difference Between Two Groups
         Differences Between More Than Two Groups
   The Hypothesis Statement
      The Research Hypothesis
         Association Between Two Variables
         Association of Several Variables
         Difference Between Two Groups
         Differences Between More Than Two Groups
      The Directional Hypothesis
      The Non-directional Hypothesis
      The Null Hypothesis
   Revision Examples
      Example 1 (Comments, Suggested revision)
      Example 2 (Comments, Suggested revision)
      Example 3 (Comments, Suggested revision)
      Example 4 (Comments, Suggested revision)
      Example 5 (Comments)
   Dissertation Examples
      Regression Analysis
      Correlation of Competency Rankings
      Factorial Analysis of Variance
      Chi-Square Analysis of Independence
   Summary
   Vocabulary
   Study Questions
   Sample Test Questions

5  Introduction to Statistical Analysis
   Statistics, Mathematics, and Measurement
      Descriptive Statistics
      Inferential Statistics
      Statistics and Mathematics
      Statistics and Measurement
   A Statistical Flow Chart
      Question One: Similarity or Difference?
      Question Two: Data Types in Similarity Studies
      Question Two: Data Types in Difference Studies
      Interval or Ratio Correlation
      Ordinal Correlation
      Nominal Correlation
      Interval/Ratio Differences
      Ordinal Differences
   Summary
   Vocabulary
   Study Questions
   Sample Test Questions

6  Synthesis of Related Literature
   A Definition
      Synthetic Narrative
      Recent Research
      Related to Your Study
   The Procedure for Writing the Related Literature
      Choose One or More Databases
      Choose Preliminary Sources
         E.R.I.C. (RIE, CIJE)
         Psychological Abstracts
         Dissertation Abstracts
         Thesaurus of ERIC Descriptors
         Education Index
         Citation Indexes
         Smithsonian Science Information Exchange
         Mental Measurements Yearbook
         Measures for Psychological Measurement
      Select Key Words
      Searching the Literature
         Searching manually
         Searching by computer
      Select Articles
      Analyze the Research Articles
         An Organizational Notebook
         Prioritizing Articles
         Selecting Notes and Quotes with References
      Reorganize Material by Key Words
      Write a Synthesis of Related Literature
      Revise the Synthesis
   Summary
   Vocabulary
   Study Questions
   Sample Test Questions

7  Populations and Sampling
   The Rationale of Sampling
      The Population
      Sampling
      Biased Samples
      Randomization
      Inferential Statistics
   Steps in Sampling
      Identify the Target Population
      Identify the Accessible Population
      Determine the Size of the Sample
         Accuracy
         Cost
         The Homogeneity of the Population
         Other Considerations
         Sample Size Rule of Thumb
      Select the Sample
   Types of Sampling
      Simple Random Sampling
      Systematic Sampling
      Stratified Sampling
      Cluster Sampling
   A Quick Look Ahead
   The Case Study Approach
      Historical Case Studies of Organizations
      Observational Case Studies
      Oral Histories
      Situational Analysis
      Clinical Case Study
   Summary
   Vocabulary
   Study Questions
   Sample Test Questions

8  Collecting Dependable Data
   Validity
      Content Validity
      Predictive Validity
      Concurrent Validity
      Construct Validity
   Reliability
      Coefficient of Stability
      Coefficient of Internal Consistency
      Coefficient of Equivalence
   Reliability and Validity
      Answer 1: A Test Must Be Reliable in Order to Be Valid
      Answer 2: A Test Can Be Valid Even If It Isn't Reliable
   Objectivity
   Summary
   Vocabulary
   Study Questions
   Sample Test Questions

Unit II: Research Methods

9  Observation
   The Problem of the Observation Method
   Obstacles to Objectivity in Observation
      Personal Interest
      Early Decision
      Personal Characteristics
   Practical Suggestions for Avoiding These Problems
      Definition
      Familiar Groups
      Unfamiliar Groups
      Observational Limits
      Manual versus Mechanical Recording
      Interviewer Effect
      Debrief Immediately
      Participant Observation
      Undercover Observation?
      Observational Checklist
   Summary
   Example
   Vocabulary
   Study Questions
   Sample Test Questions

10 Survey Research
   The Questionnaire
      Advantages
         Remote subjects
         Researcher influence
         Cost
         Reliability
         Subjects' convenience
      Disadvantages
         Rate of return
         Inflexibility
         Subject motivation
         Verbal behavior only
         Loss of control
      Types of Questionnaires
      Guidelines
         Asking questions
         Understandable format
         Clear instructions
         Demographics at the end
   The Interview
      Advantages
         Flexibility
         Motivation
         Observation
         Broader application
         Freedom from mailings
      Disadvantages
         Time
         Cost
         Interviewer effect
         Interviewer variables
      Types of Interviews
      Guidelines
         Recording responses
         Interview skills
         Demographics
         Alternative modes
   Developing the Survey Instrument
      Specify Survey Objective
      Write Good Questions
      Evaluate and Select the Best Items
      Format the Survey
      Write Clear Instructions
      Pilot Study
   Summary
   Examples
   Vocabulary
   Study Questions
   Sample Test Questions

11 Developing Tests
   Preliminary Considerations
      The Emphases in the Material
      Nature of Group Being Tested
      The Purpose of the Test
   Writing Items
   Objective Tests
      The True-False Item
         Advantages
         Disadvantages
         Writing True-False Items
            Avoid specific determiners
            Absolute answer
            Avoid double negatives
            Use precise language
            Avoid direct quotes
            Watch item length
            Avoid complex sentences
            Use more false items
      Multiple Choice Items
         Advantages
         Disadvantages
         Writing Multiple Choice Items
            Pose a singular problem
            Avoid repeating phrases in responses
            Minimize negative stems
            Make responses similar
            Make responses mutually exclusive
            Make responses equally plausible
            Randomly order responses
            Avoid sources of irrelevant difficulty
            Eliminate extraneous material
            Avoid "None of the Above"
      Supply Items
         Advantages
         Disadvantages
         Writing Supply Items
            When to use supply items
            Limit blanks
            Only one correct answer
            Blank important terms
            Place blank at the end
            Avoid irrelevant clues
            Avoid text quotes
      Matching Items
         Advantages
         Disadvantages
         Writing Matching Items
            Limit number of pairs
            Make option list longer
            Only one correct match
            Maintain a central theme
            Keep responses simple
            Make the response option list systematic
            Specific instructions
   Essay Tests
      Open-Ended Items
         Advantages
         Disadvantages
         Writing Essay Items
            Use short-answer essays
            Write clear questions
            Develop a grading key
   Item Analysis
      Rank Order Subjects by Grade
      Categorize Subjects into Top and Bottom Groups
      Compute Discrimination Index
      Revise Test Items
   Summary
   Examples
   Vocabulary
   Study Questions
   Sample Test Questions
   Sample Test

12 Developing Scales
   The Likert Scale
      Define the attitude
      Determine related areas
      Create an item pool
         Write statements
         Positive examples
         Negative examples
      Validating the items
         Rank
      Formatting the Scale
      Write instructions
      Scoring the Likert scale
   The Thurstone Scale
      Develop item pool
      Compute item weights
      Rank the items by weight
      Choose Equidistant Items
      Formatting the Scale
      Administering the Scale
      Scoring
   Q-Methodology
   Semantic Differential
   Delphi Technique
   Summary
   Vocabulary
   Study Questions
   Sample Test Questions
   Sample Thurstone Scale
   Sample Thurstone Scale (with weights)

13 Experimental Designs
   What Is Experimental Research?
   Internal Invalidity
      History
      Maturation
      Testing
      Instrumentation
      Statistical Regression
      Differential Selection
      Experimental Mortality
      Selection-Maturation Interaction of Subjects
      The John Henry Effect
      Treatment Diffusion
   External Invalidity
      Reactive Effects of Testing
      Treatment and Subject Interaction
      Testing and Subject Interaction
      Multiple Treatment Effect
   Types of Designs
      True Experimental Designs
         Pretest-Posttest Control Group
         Posttest Only Control Group
         Solomon Four-Group
      Quasi-experimental Designs
         Time Series
         Nonequivalent Control Group Design
         Counterbalanced Design
      Pre-experimental Designs
         The One Shot Case Study
         One-Group Pretest/Posttest
         Static-Group Comparison
   Summary
   Vocabulary
   Study Questions
   Sample Test Questions

Unit III: Statistical Fundamentals

14 Basic Math Skills
   Mathematical Symbols
      Arithmetic Operators
      Square (²)
      Square Root (√)
      The Sum Symbol (Σ)
      Parentheses and Brackets
      Using Letters as Numbers
   Mathematical Concepts
      Fractions
      Negative Numbers
      Percents and Proportions
      Exponents
      Simple Algebra
   Summary
   Vocabulary
   Study Questions

15 Distributions and Graphs
   Creating an Ungrouped Frequency Distribution
   Creating a Grouped Frequency Distribution
      Calculate the Range
      Compute the Class Width
      Determine the Lowest Class Limit
      Determine the Limits of Each Class
      Group the Scores in Classes
   Graphing Grouped Frequency Distributions
      X- and Y-axes
      Scaled Axes
      Histogram
      Frequency Polygon
   Distribution Shapes
   Distribution-Free Measures
   Summary
   Vocabulary
   Study Question
   Sample Test Questions

16 Central Tendency and Variation
   Measuring Central Tendency
      The Mode
      The Median
      The Arithmetic Mean
      Central Tendency and Skew
   Measures of Variability
      Range
      Average Deviation
      Standard Deviation
         Deviation Method
         Raw Score Method
      Equal Means, Unequal Standard Deviations
   Parameters and Statistics
      Population Parameters
      Sample Statistics
      Estimated Parameters
   Standard (z-) Scores
   Summary
   Example
   Vocabulary
   Study Questions
   Sample Test Questions

17 The Normal Curve and Hypothesis Testing
   The Normal Curve
      The Normal Curve Table
      The Normal Curve Table in Action
   Level of Significance
      Critical Values
      One- and Two-Tailed Tests
   Sampling Distributions
      The Distinction Illustrated
      Using the z-Formula for Testing Group Means
      Computing Probabilities of Means
   Summary
   Example
   Vocabulary
   Study Questions
   Sample Test Questions

18 The Normal Curve: Error Rates and Power
   Type I and Type II Error Rates
      Decision Table Probabilities
      Normal Curve Areas
   Increasing Statistical Power
      Increase α
      Increase µ₁ − µ₂
      Decrease the Standard Error of the Mean
         Decrease s
         Increase n
      Like Fishing for Minnows
   Statistical Significance and Practical Importance
   Summary
   Vocabulary
   Study Questions
   Sample Test Questions

Unit IV: Statistical Procedures

19 One Sample Parametric Tests
   The One-Sample z-Test
   The One-Sample t-Test
      The t-Distribution Table
      Computing t
   Confidence Intervals
      A z-Score Confidence Interval
      A t-Score Confidence Interval
   Summary
   Vocabulary
   Study Questions
   Sample Test Questions

20 Two Sample t-Tests
   Descriptive or Experimental?
   t-Test for Independent Samples
      The Standard Error of Difference
      Example Problem
   t-Test for Correlated Samples
      Effect of Correlated Samples
      The Standard Error of Difference
      Example Problem
   The Two Sample Confidence Interval
   Summary
   Examples
   Vocabulary
   Study Questions
   Sample Test Questions

21 One-Way Analysis of Variance
   Why Not Multiple t-Tests?
   Computing the F-Ratio
      Sums of Squares
      Degrees of Freedom
      Variance Estimates
      The F-Ratio
      The F-Distribution Table
      The ANOVA Table
      An Example
   Multiple Comparison Procedures
      Procedures Defined
         The Least Significant Difference
         The Honestly Significant Difference
         Multiple Range Tests
         Fisher-Protected Least Significant Difference
      Procedures Computed
         (F)LSD
         HSD
         SNK
   Summary
   Examples
   Vocabulary
   Study Questions
   Sample Test Questions

22 Correlation Coefficients
   The Meaning of Correlation
   Correlation and Data Types
   Pearson's Product Moment Correlation Coefficient (rxy)
   Spearman's rho Correlation Coefficient (rs)
   Other Important Correlation Coefficients
      Point Biserial Coefficient
      Rank Biserial Coefficient
      Phi Coefficient (rφ)
      Kendall's Coefficient of Concordance (W)
      The Coefficient of Determination (r²)
   Summary
   Vocabulary
   Study Questions
   Sample Test Question

23 Chi-Square Procedures
   The Chi-Square Formula
   The Goodness of Fit Test
      Equal Expected Frequencies
         The Example of a Die
         Computing the Chi-Square
         Testing the Chi-Square Value
         Translating into English
      Proportional Expected Frequencies
         The Example of Political Party Preference
         Computing the Chi-Square Value
         Testing the Chi-Square
         Translate into English
         Eyeball the Data
   Chi-Square Test of Independence
      The Contingency Table
      Expected Cell Frequencies
      Degrees of Freedom
      Application to a Problem
      Party Preference Revisited
      Strength of Association
         Contingency Coefficient
         Cramer's Phi
   Cautions in Using Chi-Square
      Small Expected Frequencies
      Assumption of Independence
      Inclusion of Non-Occurrences
   Summary
   Example
   Vocabulary
   Study Questions
   Sample Test Questions

Unit V: Advanced Statistical Procedures

24 Non-Parametric Statistics for Ordinal Differences
   The Rationale of Testing Ordinal Differences
   Wilcoxon Rank-Sum Test (Ws)
      Computing the Wilcoxon W
      The Wilcoxon W Table
   The Mann-Whitney U Test
      Computing the Mann-Whitney U
      The Mann-Whitney U Table
   Wilcoxon Matched-Pairs Test (T)
      Computing the Wilcoxon T
      The Wilcoxon T Table
   Kruskal-Wallis H Test
      Computing the Kruskal-Wallis H
      Using the Chi-Square Table with Kruskal-Wallis H
   Summary
   Example
   Vocabulary
   Study Questions
   Sample Test Questions

25 Factorial and Multivariate Analysis of Variance
   Two-Way ANOVA
      The Meaning of Interaction
      Types of Interaction
         No Interaction
         Ordinal Interaction
         Disordinal Interaction
      Sums of Squares in Two-Way ANOVA
      The Two-Way ANOVA Table
   Three-Way ANOVA
   Analysis of Covariance
      Adjusting the SS Terms
      Uses of ANCOVA
      Example Problem
   Multivariate Analysis of Variance
   Summary
   Example
   Vocabulary
   Study Questions
   Sample Test Questions

26 Regression Analysis
   The Equation of a Line
   Linear Regression
      The Linear Regression Equation
      Computing a and b
      Drawing the Regression Line on the Scatterplot
      Errors of Prediction (e)
      Standard Error of Estimate
   Multiple Linear Regression
      Raw Score Regression Equation
      Standardized Score Regression Equation
      Multiple Correlation Coefficient
   Multiple Regression Example
      The Data
      The Correlation Matrix
      The Multiple Regression Equation
      The Essential Questions
   Multiple Regression Printout
      Section One
      Section Two
      Section Three
      Focus on the Significant Predictors
      Multiple Regression Equations
   Summary
   Example
   Vocabulary
   Study Questions
   Sample Test Questions

Unit VI: Evaluating Research Proposals

27 Guidelines for Evaluating Research Proposals
   Research Proposal Checklist
      Front Matter
      Introduction
      The Method
      The Analysis
      General

Appendices
   Answer Key to Sample Test Questions
   Word List
   Critical Value Tables
   Dissertations and a Thesis
   Bibliography
Unit I: Research Fundamentals

Chapter 1
Scientific Knowing
Have you considered how you know what you know? As you sit in classes or talk with friends, have you noticed that people differ in the way they know things? Look at six students who are discussing the issue of "modern translations" of the Bible.

Student 1: "I use the King James Version because that's the translation I grew up using. Everybody in our church back home uses it."

Student 2: "I use the New King James because my pastor says it offers the best of beauty and modern scholarship."

Student 3: "I've prayed about what version to use. I like the Amplified Version because it is so clear in its language. It just feels right."

Student 4: "I've tried five or six different translations for devotional reading and for preparation for teaching in Sunday School. After evaluating each one, I've come back again and again to the New International Version. It's the best translation for me."

Student 5: "The essence of Bible study is understanding the message, whatever translation we may use. Therefore, I use different translations depending on my study goals."

Student 6: "I use the New King James because most of my congregation is familiar with it. In a recent survey, I found that 84% of our members use the KJV or NKJV."

Each of these students reflects a different basis for knowing which translation to use. Which student most closely reflects your view? How did you come to know what you know?
Ways of Knowing
As we begin our study of research design and statistical analysis, we need to understand the characteristics of scientific knowing, and how this kind of knowing differs from other ways we learn about our world. We will first look at five non-scientific ways of knowing: common sense, authority, intuition/revelation, experience, and deductive reasoning. Then we'll analyze the scientific method, which is based on inductive reasoning.
Common Sense

Common sense refers to knowledge we take for granted. We learn by absorbing the customs and traditions that surround us—from family, church, community, and nation. We assume this knowledge is correct because it is familiar to us. We seldom question, or even think to question, its correctness because it just is. Unless we move to another region, or go to school and study the views of others, we have nothing to challenge our way of thinking. It's just common sense! But common sense told us that "the earth is flat" until Columbus discovered otherwise. Common sense told us that "dunce caps and caning are effective student motivators" until educational research discovered the negative aspects of punishment. Common sense may well be wrong.
Authority

Authoritative knowledge is an uncritical acceptance of another's knowledge. When we are sick, we go to the doctor to find out what to do. When we need legal help, we go to a lawyer and follow his advice. Since we cannot verify the knowledge on our own, we must simply choose to accept or reject the expert's advice. It would be foolish to argue with a doctor's diagnosis, or a lawyer's perception of a case. This is the meaning of "uncritical acceptance" in the definition above. The only recourse to accepting the expert's knowledge is to get a second opinion—from another expert. As Christians, we believe that God's Word is the authority for our life and work. The Living Word—the Lord Himself—within us confirms the Truth of the Written Word. The Written Word confirms our experiences with the Living Word. Scripture is a valid source of authoritative knowledge. However, we spend a lot of time discussing Scriptural interpretations. Our discussions often deteriorate into conflicts about "my pastor's" interpretations. We use our own pastor's interpretation as authoritative because of the influence he has had in our own life. (We can substitute any authoritative person here, such as a father or mother, Sunday School teacher, or respected colleague.) But is the authority correct? Authoritative knowing does not question the source of knowledge. Yet differing authorities cannot be correct simultaneously. How do we test the validity of an authority's testimony?
Intuition/Revelation

Intuitive knowledge refers to truths which the mind grasps immediately, without need for proof or testing or experimentation. The properly trained mind "intuits" the truth naturally. The field of geometry provides a good example of this kind of knowing. Let's say I know that line segment A is the same length as line segment B. I also know that line segment B is the same length as line segment C. From these two truths, I immediately recognize that line segments A and C are equal. Or, in shorthand: IF A = B and B = C, THEN A = C.
I do not need to draw the three lines and measure them. My mind immediately grasps the truth of the statement. Revelation is knowledge that God reveals about Himself. I do not need to test this knowledge, or subject it to experimentation. When Christ reveals Himself to us, we know Him in a personal way. We did not achieve this knowledge by our own efforts, but merely received the revelation of the Lord. We cannot prove this knowledge to others, but it is bedrock truth to those who've experienced it. Problems arise, however, when we apply intuitive knowing to ministry programs. "Well, it's obvious that regular attendance in Sunday School helps people grow in the Lord." Is it? We work hard at promoting Sunday School attendance. Does it actually change the lives of the attenders? Is it enough for people to think it does, whether or not real change takes place? Answers to these questions come from clear-headed analysis, not from intuition.
Experience

Experiential knowledge comes from "trial and error" learning. We develop it when we try something and analyze the consequences. You've probably heard comments like these: "We've already tried that and it failed." Or another: "We've found that holding Vacation Bible School during the third week of August, in the evening, is best for our church." The first is negative. The speaker is saying there's no need to try that ministry or program again, because it was already tried. The second is positive. This church has tried several approaches to offering Vacation Bible School and found the best time for them. Their "truth" may not apply to any other church in the association, but it is true for them. They've tried it and it worked . . . or it didn't. Much of the promotion of new church programs comes out of this framework. We say, "This program is being used in other churches with great success" (which means our church can have the same experience if we use this program). How do we evaluate program effectiveness? What is success? How do we measure it?
Deductive Reasoning

Deductive reasoning moves thinking from stated general principles to specific elements. We develop general, over-arching statements of intent and purpose, and then deduce from these principles specific actions we should take. We determine our "world view" first, then make daily decisions which logically derive from this perspective. When we take the Great Commission as our primary mandate, we have framed a world view for ministry: "Whatever we do, we will connect it to reaching out and baptizing (missions and evangelism) and teaching (discipleship and ministry)." Now, how do we do it? We deduce specific programs, plans, and procedures for carrying out the mandate. We eliminate programs that conflict with this mandate. But how do we arrive at this "world view"? Are our over-arching principles correct? Have we interpreted them correctly? Correct action rises or falls on the basis of two things. First, correct action depends on the correctness of our world view. Second, correct action depends on our ability to translate that view into practical ministry steps.
Inductive Reasoning Inductive reasoning moves thinking from specific elements to general principles. Inductive Bible study analyzes several passages and then synthesizes key concepts into the central truth. Science is inductive in its study of a number of specifics and its use of these results to formulate a theory. The truths derived in this way are temporary and open to adjustment when new elements are discovered. Knowledge gained in this way is usually related to probabilities of happenings. We have a high degree of confidence that combining “X” and “Y” will produce effect “Z.” Or, we learn that “B” and “C” are seldom found in combination with “D.” I can demonstrate probability by using matches. Picture yourself at the kitchen table with 100 matches. You pick up the first one. What is the probability it will light when you strike it? Well, you have two possibilities: either it will or it won’t. So the probability is 50% (1 event out of 2 possibilities). You strike it and it lights. Pick up the © 4th ed. 2006 Dr. Rick Yount
second match. Taken in isolation, with no evidence, you would again guess an even chance: p = 0.50 (read "probability equals point-five-oh"). But you now have evidence. Taking the first two matches together, one of the two has already lit, so your inductive estimate remains 1 out of 2, or 50%. You strike it and it lights. Pick up the third match. Taking all three matches together, two of the three are now known to have lit, so the inductive estimate rises to 2/3 (p = 0.66). It lights. Pick up the fourth match: three of four have lit, so the estimate is 3/4 (p = 0.75). What about the 100th match, given that the previous 99 matches have all lit? Taking all 100 matches together, 99 of 100 have lit, so p = 0.99. The probability is very high! Yet we cannot absolutely guarantee it will light. This is the nature of inductive logic, and inductive logic is the basis of scientific knowledge. By definition, science does not deal with absolute Truth. Science seeks knowledge about processes in our world. Researchers gather information through observation and then mold that information into theories. The scientific community tests these theories under differing conditions to establish the degree to which they can be generalized. The result is temporary, open-ended truth (I call it little-t truth to distinguish it from absolute Truth). This kind of truth is open to inquiry, further testing, and probable modification. While this kind of knowing can add nothing to our faith, it is very helpful in solving ministry problems.
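The cumulative updating described above can be sketched in a short simulation. This is a hypothetical illustration: the "true" lighting probability is an assumption the observer never sees, and the observer's estimate is simply the relative frequency of matches that have lit so far.

```python
import random

# Hypothetical simulation of inductive estimation: strike 100 matches and
# update the estimated probability of lighting from the observed frequency.
random.seed(42)
true_p = 0.99          # assumed "real" chance a match lights (unknown to us)
lit = 0
for n in range(1, 101):
    if random.random() < true_p:
        lit += 1
    estimate = lit / n  # inductive estimate after n strikes

print(f"{lit} of 100 lit; estimated p = {estimate:.2f}")
```

However many matches light, the estimate never becomes a guarantee; induction yields confidence, not certainty.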
Science as a Way of Knowing
Scientific knowing is based on precise data gathered from the natural world we live in. It builds a knowledge base in a neutral, unbiased manner. It seeks to measure the world precisely. It reports findings clearly so that others can duplicate the studies. It forms its conclusions on empirical data. Let’s look at these ideals more closely.
Objectivity
Human beings are complex. Personal experiences, values, backgrounds, and beliefs make objective analysis difficult unless effort is made to remain neutral. Optimists tend to see the positive in situations; pessimists see the negative. But scientists look for objective reality — the world as it is — uncolored by personal opinion or feelings. Scientific knowing attempts to eliminate personal bias in data collection and analysis. Honest researchers take a neutral position in their studies. That is, they do not try to prove their own beliefs. They are willing to accept empirical results contrary to their own opinions or values.
Precision
Reliable scientific knowing requires precise measurement. Researchers carry out experiments under controlled, narrowly defined conditions. They carefully design instruments to be as accurate as possible. They evaluate tests for reliability and validity. They use pilot projects (trial runs of procedures) to identify sources of extraneous error in measurements. Why? Because inaccurate measurement, undefined conditions, unreliable instruments, and extraneous errors produce data that is worthless. Every score has two parts: the true measure of the subject, and an unknown amount of error. We can represent this as
© 4th ed. 2006 Dr. Rick Yount
Score = True Measure + Error
Think of two students who are equally prepared for an exam. When they arrive in class, one is completely healthy and the other has the flu. They will likely score differently on the exam. In this case, illness introduces an error term into the second student's score. When we gather data in a haphazard, disorderly way, error interferes with the true measure of the variable. Like static on a television screen, the error masks the true picture of the data. Analysis of this noisy data will provide a numerical answer which is suspect. Accurate measurement is a vital ingredient in the research process.
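The Score = True Measure + Error model above can be illustrated with a small simulation. The numbers are hypothetical: the true measure and the two error sizes are assumptions chosen purely for illustration.

```python
import random
import statistics

# Sketch of Score = True Measure + Error: the same true ability observed
# through a precise instrument (small error) and a sloppy one (large error).
random.seed(1)
true_measure = 80.0
precise = [true_measure + random.gauss(0, 2) for _ in range(500)]
sloppy = [true_measure + random.gauss(0, 15) for _ in range(500)]

print(f"precise: mean={statistics.mean(precise):.1f}, sd={statistics.stdev(precise):.1f}")
print(f"sloppy:  mean={statistics.mean(sloppy):.1f}, sd={statistics.stdev(sloppy):.1f}")
```

Both instruments center on the same true measure, but the larger error term spreads the scores so widely that any single score is suspect — the "static" the text describes.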
Verification
Science analyzes world processes which are systematic and recurring. Researchers report their findings in a way that allows others to replicate their studies — to check the facts in the real world. These replications either confirm or refute the original findings. When researchers confirm earlier results, they verify the earlier findings. Research reports provide readers with the background, specific problem(s), and hypotheses of studies. Also included are the populations, definitions, limitations, and assumptions, as well as the procedures for collecting and analyzing data. Writers do this intentionally so that others can evaluate the degree to which findings can be generalized and, perhaps, replicate the study.
Empiricism
The root of "empiricism" (Greek, empeirikos) refers to the "employment of empirical methods, as in science," or to what is "derived from observation or experiment; verifiable or provable by means of observation or experiment."1 Science uses the term to underscore the fact that it bases its knowledge on observations of specific events, not on abstract philosophizing or theologizing. These carefully devised observations of the real world form the basis of scientific knowledge. Therefore, the kinds of problems which science can deal with are testable problems. Empirical data is gathered by observation. Basic observations can be made with the naked eye and an objective checklist (see Chapter 9). But observations are also made with instruments such as an interview or questionnaire (Chapter 10), a test (Chapter 11), an attitude scale (Chapter 12), or a controlled experiment (Chapter 13). Scientific knowing cares less about philosophical reasoning than about the rational collection and analysis of factual data relevant to the problem to be solved.
Goal: Theories
The goal of scientific research is theory construction, the development of theories which explain the phenomena under study, not the mere cataloging of empirical data. The inductive process of scientific knowing begins with the specifics (collected data) and leads to the general (theories). What causes cancer? What makes it rain? How does man learn? What is the best way to relieve anxiety? What effect do children have on marital satisfaction? Most ministerial students want pragmatic answers to pragmatic problems in the ministry. In the past ten years [during the 1980's] there has been a rash of studies
1 "Empiricism," "empirical," The American Heritage Dictionary, 3rd ed., Version 3.0A, WordStar International, 1993.
relating some variable to church growth. The pragmatic question is "How do I make my church grow?" But Christian research goes deeper. It looks beyond the surface of ministry programming to the social, educational, psychological, and administrative dynamics of church life and work. Each of these areas has many theories and theorists giving advice and explanation. Are these views valid for Christian ministry? Can you modify these theories for effective use in church ministry? Seek a solid theoretical base for your proposal.
The Scientific Method
The scientific method is a step-by-step procedure for solving problems on the basis of empirical observations. Here are the major elements:

1. Begin with a "felt difficulty." What is your interest? What questions do you want answered? How might a theory be applied in a specific ministry situation? What conflicting theories have you found? The felt difficulty is the beginning point for any study (but it has no place in the proposal).
2. Write a formal "Problem Statement." The Problem establishes the focus of the study by stating the necessary variables in the study and what you plan to do with them (see Chapter 4).
3. Gather literature information. What is known? Before you plan a study of your own, you must learn all you can about what is already known. This is done through a literature search and results in a synthesis of recent findings on the topic (see Chapter 6).
4. State the hypothesis. On the basis of the literature search, write a hypothesis statement that reflects your best tentative solution to the Problem (see Chapter 4).
5. Select a target group (population). Who will provide your data? How will you find subjects for your study? Are they accessible to you? (see Chapter 7)
6. Draw one or more samples, as needed. How many samples will you need? What kind of sampling will you use? (see Chapter 7)
7. Collect data. What procedure will you use to actually collect data from the subjects? Develop a step-by-step plan to obtain all the data you need to answer your questions (see Chapters 9-13).
8. Analyze data. What statistics will you use to analyze the data? Develop a step-by-step plan to analyze the data and interpret the results (see Chapters 14-25).
9. Test the null, or statistical, hypothesis. On the basis of the statistical results, what decision do you make concerning your hypothesis? (see Chapters 16-26)
10. Interpret the results. What does the statistical decision mean in terms of your study? Translate the findings from "statistics" to English (see Chapters 16-26).
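As a concrete, entirely hypothetical illustration of steps 7 through 9, the sketch below "collects" invented scores from two groups, analyzes the difference in means, and tests the null hypothesis of no difference using a simple permutation test (one of many possible analysis choices):

```python
import random
import statistics

# Hypothetical walk through steps 7-9: collect data from two fabricated
# sample groups, analyze the mean difference, and test the null hypothesis
# of "no difference" with a permutation test.
random.seed(7)
group_a = [72, 75, 70, 78, 74, 77]   # invented scores, e.g. method A
group_b = [81, 79, 85, 80, 83, 78]   # invented scores, e.g. method B

observed = statistics.mean(group_b) - statistics.mean(group_a)

pooled = group_a + group_b
count = 0
trials = 5000
for _ in range(trials):
    random.shuffle(pooled)           # reassign scores to groups at random
    diff = statistics.mean(pooled[6:]) - statistics.mean(pooled[:6])
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / trials
print(f"observed difference = {observed:.2f}, p = {p_value:.4f}")
# a small p leads us to reject the null hypothesis of no difference
```

The data and group labels are placeholders; the point is the shape of the procedure, not its content.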
The scientific method provides a clear procedure for empirically solving problems. In chapter 2 we introduce you to the structure of a research proposal. As you read the chapter, notice how the elements of the proposal follow the steps of the scientific method. Refer back to this outline in order to understand the links between the scientific method and the research proposal.
Types of Research
Under the umbrella of scientific research, there are several types of studies you can do. These types differ in procedure — what they entail — and in outcome — what they accomplish. Here are four major and three minor types of research from which you may choose.
Historical Research
Historical research analyzes the question "what was?" It studies documents and relics in order to determine the relationship of historic events and trends to present-day practice.
Primary sources
A source of information is primary when it is produced by the researcher. Reports written by researchers who conduct studies are "eye witness" accounts, and are primary sources of information on the results. Other examples of primary sources are autobiographies and textbooks written by authors who conduct their own research. Use primary sources as the major source of information in the Related Literature section of your proposal. Primary sources take two forms: documents and relics.

Documents. Society creates documents expressly to record events. They are objective and direct. Documents provide straightforward information. Average Bible Study attendance listed on the Annual Church Letters on file in the state convention office is more likely to be accurate than numbers given from memory by ministers of education in local churches. However, information contained in documents may be incorrect. The documents may have been falsified, or word meanings in the documents may have changed.

Relics. Society creates relics simply by living. Relics are artifacts left by communities and cultures of the past. People did not create these objects to record information, as is the case with documents. Therefore, information conveyed by relics requires interpretation. The historical researcher reconstructs the meaning of relics in the context of their time and place.
Secondary sources
A source of information is secondary when it is a second-hand account of research. Secondary sources may take the form of summaries, news stories, encyclopedias, or textbooks written by synthesizers of research reports. While secondary sources provide the bulk of materials used in term papers, you should use them only to provide a broad view of your chosen topic. As already stated, emphasize the use of primary sources in your Synthesis of Related Literature.
Criticism
The term "criticism" has a decidedly negative connotation for most of us. A critical person is one who finds fault, depreciates, or puts down someone or something. The term comes from the Greek krino, to judge. Webster defines "criticism" as the "art, skill, or profession of making discriminating judgments and evaluations, especially of literary or other artistic works."2 Criticism can therefore refer to praise as well as depreciation. A
2 "Criticism," The American Heritage Dictionary, 3rd ed., Version 3.0A, WordStar International, 1993.
Christian may cringe when he hears someone speak of using "higher criticism" to study Scripture. It sounds as if the scholar is criticizing -- berating, slandering, putting down -- the Bible. The term actually means that scholars objectively analyze language, culture, and comparative writings to determine the authenticity of the work. Who wrote Hebrews? Paul? Apollos? Peter? Scholars apply the systematic tools of content analysis and "literary criticism" to determine the answer. Criticism takes two major forms: external criticism and internal criticism.

External criticism. External criticism answers the question of the genuineness of the object. Is the document or relic actually what it seems to be? What evidence can we gather to affirm the authenticity of the object itself? For example, is this painting really a Rembrandt? Was this letter really written by Thomas Jefferson? External criticism focuses on the object itself.

Internal criticism. Internal criticism answers the question of the trustworthiness of the object. Can we believe what the document says? What ideas are being conveyed? What does the writer mean by his words, given the culture and time period in which he wrote? Internal criticism focuses on the object's meaning.
Examples
Historical research is not merely the collection of facts from secondary sources about an historic event or process. It is the objective interpretation of facts, in line with parallel events in history. The goal of historical research is to explain the underlying causes of present practices. Most of the historical dissertations written by our students have focused on former deans and faculty members. Dr. Phillip H. Briggs studied the contributions of Dr. J. M. Price, Founder and Dean of the School of Religious Education.3 Dr. Robert Mathis analyzed the contributions of Dr. Joe Davis Heacock, Dean of the School of Religious Education, 1950-1973.4 Dr. Carl Burns evaluated the contributions of Dr. Leon Marsh, Professor of Foundations of Education, School of Religious Education, Southwestern Seminary, 1956-1987.5 Dr. Sophia Steibel analyzed the life and contributions of Dr. Leroy Ford, Professor of Foundations of Education, 1956-1984.6 Dr. Douglas Bryan evaluated the contributions of Dr. John W. Drakeford, Professor of Psychology and Counseling.7
Descriptive Research
Descriptive research analyzes the question "what is?" A descriptive study collects data from one or more groups, and then analyzes it in order to describe present conditions. Much of this textbook underscores the tools of descriptive research: survey by questionnaire or interview, attitude measurement, and testing. A popular use of descriptive research is to determine whether two or more groups differ on some variable of interest.
3 Phillip H. Briggs, "The Religious Education Philosophy of J. M. Price," (D.R.E. diss., Southwestern Baptist Theological Seminary, 1964).
4 Robert Mathis, "A Descriptive Study of Joe Davis Heacock: Educator, Administrator, Churchman," (Ed.D. diss., Southwestern Baptist Theological Seminary, 1984).
5 Carl Burns, "A Descriptive Study of the Life and Work of James Leon Marsh," (Ed.D. diss., Southwestern Baptist Theological Seminary, 1991).
6 Sophia Steibel, "An Analysis of the Works and Contributions of Leroy Ford to Current Practice in Southern Baptist Curriculum Design and in Higher Education of Selected Schools in Mexico," (Ed.D. diss., Southwestern Baptist Theological Seminary, 1988).
7 Douglas Bryan, "A Descriptive Study of the Life and Work of John William Drakeford," (Ed.D. diss., Southwestern Baptist Theological Seminary, 1986).
Another application of descriptive research is to determine whether two or more variables are related within a group. This latter type of study, while descriptive in nature, is often referred to specifically as correlational research (see the next section).
An Example
The goal of descriptive research is to accurately and empirically describe differences between selected groups on one or more variables. Dr. Dan Southerland studied differences in ministerial roles and allocation of time between growing and plateaued or declining Southern Baptist churches in Florida.6 Specified roles were pastor, worship leader, organizer, administrator, preacher, and teacher.7 The only role which showed a significant difference between growing and non-growing churches was the amount of time spent serving as "organizer," which included "vision casting, setting goals, leading and supervising change, motivating others to work toward a vision, and building groupness."8
Correlational Research
Correlational research is often presented as part of the descriptive family of methods. This makes sense, since correlational research describes association between the variables of interest in a study. It answers the question "what is" in terms of the relationship among two or more variables. What is the relationship between learning style and gender? What is the relationship between counseling approach and client anxiety level? What is the relationship between social skill level and job satisfaction and effectiveness for pastors? In each of these questions we have asked about an association between two or more variables. Correlational research also includes the topics of linear and multiple regression, which use the strength of associations to make predictions. Finally, correlational analysis includes advanced procedures like Factor Analysis, Canonical Analysis, Discriminant Analysis, and Path Analysis — all of which are beyond the scope of this course.
An Example
The goal of correlational research is to establish whether relationships exist between selected variables. Dr. Robert Welch studied selected factors relating to job satisfaction in the staff organizations of large Southern Baptist churches.9 He found that the most important intrinsic factors affecting job satisfaction were praise and recognition for work, performing creative work, and growth in skill. The most important extrinsic factors were salary, job security, relationship with supervisor, and meeting family needs.10 Findings were drawn from 579 Southern Baptist ministers in 153 churches.11
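The strength of association that correlational studies report is usually Pearson's r. A minimal sketch, computed by hand over an invented set of paired scores (the variable names are hypothetical, not Welch's data):

```python
import math

# Minimal sketch of Pearson's correlation coefficient r over invented data.
x = [2, 4, 5, 7, 9]        # e.g., years of experience (hypothetical)
y = [50, 57, 63, 68, 80]   # e.g., job-satisfaction score (hypothetical)

n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
sx = math.sqrt(sum((a - mx) ** 2 for a in x))
sy = math.sqrt(sum((b - my) ** 2 for b in y))
r = cov / (sx * sy)
print(round(r, 3))  # r near +1 indicates a strong positive association
```

An r near zero would indicate no linear association; an r near -1, a strong inverse one.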
Experimental Research
Experimental research analyzes the question "what if?" Experimental studies use carefully controlled procedures to manipulate one (independent) variable, such as
6 Dan Southerland, "A Study of the Priorities in Ministerial Roles of Pastors in Growing Florida Baptist Churches and Pastors in Plateaued or Declining Florida Baptist Churches," (Ed.D. diss., Southwestern Baptist Theological Seminary, 1993).
7 Ibid., 1.
8 Ibid., 2.
9 Robert Horton Welch, "A Study of Selected Factors Related to Job Satisfaction in the Staff Organizations of Large Southern Baptist Churches," (Ed.D. diss., Southwestern Baptist Theological Seminary, 1990).
10 Ibid., 2.
11 Ibid., 61.
Teaching Approach, and measure its effect on other (dependent) variables, such as Student Attitude and Achievement. Manipulation is the distinguishing element in experimental research. Experimental researchers don’t simply observe what is. They manipulate variables and set conditions in order to design the framework for their observations. What would be the difference in test anxiety across three different types of tests? Which of three language training programs is most effective in teaching foreign languages to mission volunteers? What is the difference between Counseling Approach I and Counseling Approach II in reducing marital conflict? In each of these questions we find a researcher introducing a treatment (type of test, training program, counseling approach) and measuring an effect. Experimental Research is the only type which can establish cause-and-effect relationships between independent and dependent variables. See Chapter 13 for examples of experimental designs.
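The manipulation described above is interpretable because subjects are assigned to conditions by chance rather than by choice. A minimal sketch of random assignment, with invented subject labels:

```python
import random

# Sketch of random assignment: subjects are placed in treatment or control
# groups by chance, so pre-existing differences tend to spread evenly
# across groups. The subject labels are invented for illustration.
random.seed(3)
subjects = [f"S{i:02d}" for i in range(1, 21)]   # 20 hypothetical subjects
random.shuffle(subjects)
treatment, control = subjects[:10], subjects[10:]
print(f"treatment: {len(treatment)} subjects, control: {len(control)} subjects")
```

Only after such an assignment is the treatment applied to one group, so that any later difference on the dependent variable can be attributed to the manipulation.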
An Example
The goal of experimental research is to establish cause-effect relationships between independent and dependent variables. Dr. Daryl Eldridge analyzed the effect of knowledge of course objectives on student achievement in, and attitude toward, the course.12 He found that knowledge of instructional objectives produced significantly higher scores on the Unit I exam (mid-range cognitive outcomes) but not on the Unit III exam (knowledge outcomes). Knowledge of objectives did produce significantly higher scores on the post-course attitude inventory.13
Ex Post Facto Research
Ex Post Facto research (the Latin phrase means "after the fact") is similar to experimental research in that it answers the question "what if?" But in ex post facto designs, nature — not the researcher — manipulates the independent variable. In studying the effects of brain damage on the attitudes of children toward God, it would be immoral and unethical to randomly select two groups of children, brain-damage one of them, and then test for differences! But in an ex post facto approach the researcher defines two populations: normal children and brain-damaged children. Nature has applied the treatment of brain damage. The experiment is done "after the fact" of the brain-damaged condition. Studies involving juvenile delinquency, AIDS, cancer, criminal or immoral behavior, and the like all require an ex post facto approach.
An Example
The goal of ex post facto research is to establish cause-and-effect relationships between independent and dependent variables "after the fact" of the manipulation. An example of ex post facto research would be "An Analysis of the Difference in Social Skills and Interpersonal Relationships Between Congenitally Deaf and Hearing College Students." Congenital deafness in this case is the treatment already applied by nature.
Evaluation
Evaluation is the systematic appraisal of a program or product to determine if it is accomplishing what it proposes to do. It is the application of the scientific method to the
12 Daryl Roger Eldridge, "The Effect of Student Knowledge of Behavioral Objectives on Achievement and Attitude Toward the Course," (Ed.D. diss., Southwestern Baptist Theological Seminary, 1985).
13 Ibid., 2.
practical worlds of educational and administrative programming. Specialists commend to us a variety of programs designed to solve problems. Depending upon the degree of personal involvement of these specialists with the programs, these commendations may contain more word magic than substance. Does a program do what it is supposed to do? The danger in choosing an evaluation-type study for dissertation research is the political ramifications which come if the evaluation proves embarrassing to the church or agency conducting the program. Program leaders may not appreciate negative evaluations and may apply pressure to modify results. This distorts the research process. Suppose you choose to evaluate a new counselor orientation program at a highly visible counseling network — and you find the program substandard. Will this impact your ability to work with this agency as a counselor? Or suppose you want to compare Continuous Witness Training (CWT) with Evangelism Explosion (EE) as witness training programs. What are the implications of your finding one program much better than the other?
An Example
The goal of evaluation research is to objectively measure the performance of an existing program in accordance with its stated purpose. An example of this type of study would be "A Critical Analysis of Spiritual Formation Groups of First Year Students at Southwestern Baptist Theological Seminary." Program outcomes are measured against program objectives to determine if Spiritual Formation Groups accomplish their purpose.
Research and Development
Research and Development ("R&D") is the application of the scientific method to creating a new product: a standardized test, program, or technique. R&D is a cyclical process in which developers (1) state the objectives and performance levels of the product, (2) develop the product, (3) measure the results of the product's performance, and (4) if the results do not meet the stated levels, revise the materials for further testing. "Cyclical process" means that the materials are revised and tested until they perform according to the standards set at the beginning of the product's development.
An Example
The goal of research and development is the production of a new product which performs according to specified standards. Dr. Brad Waggoner developed an instrument to measure "the degree to which a given church member manifests the functional characteristics of a disciple."14 Two pilot tests using this original instrument produced Cronbach's alpha reliability coefficients of 0.9745 and 0.9618, demonstrating that it could reliably measure a church member's functional characteristics of a disciple.15 In 1998, the instrument was incorporated into MasterLife materials produced by LifeWay Christian Resources (SBC).16
14 Brad J. Waggoner, “The Development of an Instrument for Measuring and Evaluating the Discipleship Base of Southern Baptist Churches,” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1991) 15 Ibid., 118 16 Report of joint development between Lifeway and the International Mission Board (SBC) at the 1998 Meeting of the Southern Baptist Research Fellowship.
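The Cronbach's alpha statistic cited above can be computed from a matrix of item scores. The sketch below uses an invented 6-respondent, 5-item data set purely for illustration (not Waggoner's data):

```python
# Minimal sketch of Cronbach's alpha: reliability rises as items vary
# together across respondents. The score matrix is invented.
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# rows = respondents, columns = instrument items (hypothetical data)
scores = [
    [4, 5, 4, 4, 5],
    [3, 3, 4, 3, 3],
    [5, 5, 5, 4, 5],
    [2, 3, 2, 3, 2],
    [4, 4, 5, 4, 4],
    [3, 2, 3, 3, 3],
]

k = len(scores[0])
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 3))  # high alpha for this toy data, near 0.95
```

Values near 1.0, like the 0.97 and 0.96 reported above, indicate that the items measure a single underlying trait consistently.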
Qualitative Research
In 1979 the faculty of the School of Religious Education at Southwestern created a teaching position for research and statistics. Their desire was for this position to emphasize helping students understand research methods and procedures for statistical analysis. It was further their desire that doctoral research become more objective and scientific, less philosophical and historical. In 1981, after two years of interviews and discussions, the Religious Education faculty voted, and the president approved, my election to their faculty to provide this emphasis. This textbook, and the dissertation examples it contains, are products of 25 years of emphasis on descriptive, correlational, and experimental research -- most of which is quantitative or statistical in nature.

In recent years interest has grown in research methods which focus more on the issue of quality than quantity. A qualitative study is an inquiry process of understanding a social or human problem, based on building a complex, holistic picture, formed with words, reporting detailed views of informants, and conducted in a natural setting.17 Dr. Don Ratcliff, in a 1999 seminar for Southwestern doctoral students, suggested the following as the most common qualitative research designs: ethnography, field study, community study, biographical study, historical study, case study, survey study, observation study, grounded theory, and any combination of the above.18 Grounded theory is a popular choice of qualitative researchers. It originated in the field of sociology and calls for the researcher to live in and interact with the culture or people being studied. The researcher attempts to derive a theory by using multiple stages of data collection, along with the process of refining and inter-relating categories of information.19

Qualitative research is subjective, open-ended, and evolving, and relies on the ability of the researcher to reason and logically explain relationships and differences. Dr. Marcia McQuitty, Professor of Childhood Education in our school, has become our resident expert in qualitative designs. I continue to focus on quantitative research, which is, in comparison, objective, close-ended (once the problem and hypothesis are established), and structured, and relies on the ability of the researcher to gather and statistically analyze valid and reliable data to explain relationships and differences.
Faith and Science
What is the relationship between faith and science? Are faith and science enemies? Can ministers use scientific methodology to study the creation and retain a fervent faith in the Creator? I believe we can -- and should. But care must be taken to consciously mark out the boundaries of each. There is a difference between faith-knowing and scientific knowing, and that difference sometimes explodes into conflict — a conflict fueled by both sides. First we'll look at the suspicion of science by the faithful. Then we'll consider the suspicion of religion by scientists.
Suspicion of Science By the Faithful
Anselm wrote in the 11th century, "I believe so that I may understand." In other words, commitment and faith are essential elements in gaining spiritual understanding.
17 Randy Covington, "An Investigation into the Administrative Structure and Polity Practiced by the Union of Evangelical Christians - Baptists of Russia," (Ph.D. proposal, Southwestern Baptist Theological Seminary, 1999), 20, paraphrasing John W. Creswell, Research Design: Qualitative and Quantitative Approaches (Thousand Oaks, CA: Sage Publications, 1994), 1-2.
18 Ibid., 25, quoting doctoral conference notes from a meeting with Ratcliff, Southwestern, April 24, 1999.
19 Ibid., 26, paraphrasing Creswell, 12.
His words reflect Jesus' teaching that He gives understanding to those who follow Him (Mt. 11:29; 16:24). Blaise Pascal wrote in the 17th century, "The heart has reasons which are unknown to reason.... It is the heart which is aware of God and not reason. That is what faith is: God perceived intuitively by the heart, not by reason." The truth of Christ comes by living it out, by risking our lives on Him, by doing the Word. We grow in our knowledge of God through personal experience as we follow Him and work with Him. We believe in order to understand spiritual realities. This approach to knowing is private and subjective. Such belief-knowing resents the anti-supernatural skepticism of open-minded inquiry. More than that, some scientists consider the scientific method to be their religion. Their "belief in evolution" may be a justification for their unbelief in God. Science is helpful in learning about our world, but it makes a poor religion. So the faithful view science and its adherents with suspicion.

Sometimes, however, the suspicion of science by the religious has less to do with faith than with political power. In the Middle Ages, the accepted view of the universe was geocentric ("earth-centered"). The moon, the planets, the sun (located between Venus and Mars), and the stars were believed to rotate about the earth in perfect circles. This view had three foundations: science, philosophy, and the Church. Greek science (Ptolemy) and Greek philosophy (Aristotle) supported a geocentric view of the universe. The logic was rock solid for centuries: man is the pinnacle of creation; therefore, the earth must be the center of the universe. The Roman Catholic Church taught that the geocentric view was Scriptural, based on Joshua 10:12-13: "Joshua said to the LORD in the presence of Israel: 'O sun, stand still over Gibeon, O moon, over the Valley of Aijalon.'
So the sun stood still, and the moon stopped, till the nation avenged itself on its enemies, as it is written in the Book of Jashar. The sun stopped in the middle of the sky and delayed going down about a full day.” For the sun and moon to stand still, the Church fathers reasoned, they would have to be circling the earth.
Then several scientists began their skeptical work of actually observing the movements of the planets and stars. Copernicus, a Polish astronomer, created a 16th century revolution in astronomy when he published his heliocentric (“sun-center”) theory of the solar system. He theorized, on the basis of his observations and calculations, that the earth and its sister planets revolved around the sun in perfect (Aristotelian) circles. Kepler later demonstrated that the solar system was indeed heliocentric, but that the planets, including earth, orbited the sun in elliptical, not circular, paths. The Roman Catholic Church attacked their views because they displaced earth from its position of privilege, and opened the door to doubt in other areas. But Poland is a long way from Rome (it was especially so in the 16th century!), and so Copernicus and Kepler remained outside the Church's reach.
Galileo, the father of modern physics, did his work in Italy in the 16th and 17th centuries. He studied the work of Copernicus and Kepler, and built a telescope in order to observe the planets more closely. In 1632, he published the book Dialogue Concerning the Two Chief World Systems: Ptolemaic and Copernican, in which he supported a heliocentric view of the solar system. He was immediately attacked by Church authorities who continued to espouse a geocentric world view. Professors at the University of Florence refused to look through Galileo's telescope: they did not believe his theory, so they refused to observe. Very unscientific! Galileo, under threat of being burned at the stake, recanted his findings. It was not until October 1992 that the Roman Catholic Church officially overturned the decision against Galileo's book and agreed that he had indeed been right. Science questions, observes, and seeks to learn how the world works. Sometimes this process collides with the vested interests of dogmatic religious leaders.
Suspicion of Religion By the Scientific Science is meticulous in its focus on the rational structure of the universe. Scientists look with suspicion at the simple faith of believers who glibly say “I don't know how, but I just know God did it.” Such a statement reflects mental laziness. How does the world work? What can we learn of the processes?
There Need Be No Conflict
Many of the European men and women who pioneered science were motivated by the Reformation and their newfound faith to discover all they could about God's creation. Stephen Hales, English founder of the science of plant physiology, wrote (1727), “Since we are assured that the all-wise Creator has observed the most exact proportions of number, weight and measure in the make of all things, the most likely way therefore to get any insight into the nature of those parts of the Creation which come within our observation must in all reason be to number, weigh and measure.”20 Hales’ commitment to scientific methodology in no way compromised his faith in the “all-wise Creator.” Nor did his faith undermine his scientific precision. Still, the skeptical neutrality of science often collides with the perspective of faith, acceptance and obedience.
When I was in the sixth grade, our science class began a unit on the water cycle. I had always believed that “God sent the rain to water the flowers and trees,” because that's what mom told me (authoritative knowing) when I asked her why it rained. Now, before my very eyes was a chart showing a mountain and a river, and an ocean and a cloud. Carefully the teacher explained the diagram. “Water vapor evaporates from the ocean and forms a cloud. The wind blows the cloud to the mountain, where water condenses in the form of rain. The rain collects and forms a river which flows back into the ocean. This is the water cycle.” I can vividly remember my confusion and fear — where was God in the water cycle? My dad helped when he got home that night. “Well, the water cycle certainly explains the mechanical process of evaporation and condensation, but Who do you think designed the water cycle?” My confusion was gone.
My faith was strengthened — though less simplistic and naive than it had been before (“If God sends rain to water the plants, why doesn't He send some to the areas of drought, where people are starving to death?”). And I had learned something about how the world works that I hadn't even thought about before. The faithful should not use “faith” as a cop-out for mental laziness.
And so, faith focuses on the supernatural and subjectively sees with the heart’s eye that which is unseen by the natural eye. Scripture, the Objective Anchor of our subjective experiences, is a record of personal experiences with God through the ages. Faith focuses on the Creator. Science focuses on the natural and objectively gathers data on repeatable phenomena, the machinery, so we may better understand how the world works. Science focuses on the creation. There need be no conflict between giving your heart to the Lord and giving your mind to the logical pursuit of natural truth.
20 “Stephen Hales,” The Columbia Dictionary of Quotations is licensed by Microsoft Bookshelf from Columbia University Press. Copyright © 1993 by Columbia University Press. All rights reserved.
Summary
In this chapter we looked at six ways of knowing. We discussed specifically how scientific knowing differs from the other five. We introduced you to the scientific method, as well as eight types of research. Finally, we made a brief comparison of faith-knowing and science-knowing.
Vocabulary
authority: knowledge based on expert testimony
common sense: cultural or familial knowledge, local
control of bias: maintaining neutrality in gaining knowledge
correlational research: analyzing relationships among variables
deductive reasoning: from principle (general) to particulars (specifics)
descriptive research: analyzing specified variables in select populations
empiricism: basing knowledge on observations
evaluation: analyzing existing programs according to set criteria
ex post facto research: analyzing effects of independent variables “after the fact”
experience: knowledge gained by trial and error
experimental research: determining cause and effect relationships between treatment and outcome
external criticism: determining the authenticity of a document or relic
historical research: analyzing variables and trends from the past
inductive reasoning: from particulars (specific) to principles (general)
internal criticism: determining the meaning of a document or relic
intuition/revelation: knowledge discovered from within
precision: striving for accurate measurement
primary sources: materials written by researchers themselves (e.g. journal articles)
research and development: creating new materials according to set criteria
scientific method: objective procedure for gaining knowledge about the world
secondary sources: materials written by analysts of research (e.g. books about)
theory construction: converting research data into usable principles
verification: replicating (re-doing) studies under varying conditions to test findings
Study Questions
1. Define in your own words six ways we gain knowledge. Give an original example of each.
2. Define “science” as a way of knowing.
3. Compare and contrast “faith” and “science” as ways of knowing for the Christian.
4. Define in your own words five characteristics of the scientific method.
5. Define in your own words eight types of research.
Sample Test Questions Answer Key Provided for Sample Test Questions. See Appendix A1
1. Learning by trial and error is most closely related to A. deductive reasoning B. intuition C. common sense D. experience
Affectionate Warning: Memorizing right answers is not enough to understand research and statistics. Be sure you understand why the right answer is right.
2. Inductive logic is best described by A. particulars drawn from general principles B. general principles derived from a collection of particulars C. particulars established through reasoning D. general principles grounded in authoritative knowledge
3. Match the type of research with the project by writing the letter below in the appropriate numbered blank line.
Historical    Descriptive    Correlational
Experimental    Ex Post Facto    Evaluation
Research & Development    Qualitative
____ An Analysis of Church Staff Job Satisfaction by Selected Pastors and Staff Ministers
____ Differentiating Between the Effects of Testing and Review on Retention
____ The Effect of Seminary Training on Specified Attitudes of Ministers
____ An Analysis of the Differences in Cognitive Achievement Between Two Specified Teaching Approaches
____ Determining the Relationship Between Hours Wives Work Outside the Home and the Couples’ Marital Satisfaction Scores
____ The Church’s Role in Faith Development in Children as Perceived by Pastors and Teachers of Preschoolers
____ The Relationship Between Study Habits and Self Concept in Baptist College Freshmen
____ The Life and Ministry of Joe Davis Heacock, Dean of the School of Religious Education, 1953-1970
____ Church Life Around the Conference Table: An Observational Analysis of Interpersonal Relationships, Communication, and Power in the Staff Meetings of a Large Church
____ An Analysis of the Relationship Between Personality Trait and Level of Group Member Conflict...
____ The Role of Woman’s Missionary Union in Shaping Southern Baptists’ View of Missions
____ The Effectiveness of the CWT Training Program in Developing Witnessing Skills
____ Determining the Effect of Divorce on Men’s Attitudes Toward Church
____ A Learning System for Training Church Council Members in Planning Skills
____ A Multiple Regression Model of Marital Satisfaction of Southwestern Students
____ The Effect of Student Knowledge of Objectives on Academic Achievement
____ A Study of Parent Education Levels as They Relate to Academic Achievement Among Home Schooled Children
____ A Critical Comparison of Three Specified Approaches to Teaching the Cognitive Content of the Doctrine of the Trinity to Volunteer Adult Learners in a Local Church
____ Curriculum Preferences of Selected Latin American Baptist Pastors
____ A Study of Reading Comprehension of Older Children Using Selected Bible Translations
2 Proposal Organization
Front Matter
The Introduction
The Method
The Analysis
Reference Material
The research proposal is a concise, clearly organized plan of attack for analyzing formal research problems. The beginning point in developing a proposal — itself not a part of the final product — is the “felt difficulty.” Hopefully, as you have read textbooks and journal articles, as you have listened to lectures and participated in discussion, you have been attracted to specific issues and concerns in your field. Perhaps there have been questions that remain unanswered, problems which remain unsolved, or conflicts which remain unresolved. These issues, your felt difficulties, hold the beginning point for your research proposal.
The first step toward an objective study of your felt difficulty is the choice of a topic. Consider a topic which has the potential to make a contribution to theory or practice in your chosen field. After all, a dissertation will consume large quantities of your time, your money, and your very self. Worthwhile topics can be discovered by browsing the indexes of information databases such as the Educational Resources Information Center (E.R.I.C.) or Psychological Abstracts (for detailed suggestions, see Chapter 6, “Synthesis of Related Literature”). This search, whether done manually or by computer, can provide useful information for confirming or abandoning a research topic.
Once a topic has been determined, it must be translated, step by step, into a clear statement of a solvable problem and a systematic procedure for collecting and analyzing data. We begin that translation process in this chapter by providing a structural blueprint, as well as definitions of each proposal element, for the proposal you will eventually develop. The following structural overview gives you a framework for organizing your own proposal. Each element listed in the structural overview is defined. Study these elements until you can see the structure of the whole.
Proposal Overview
Front Matter
    Title Page
    Table of Contents
    List of Tables
    List of Illustrations
INTRODUCTION
    Introductory Statement
    Statement of the Problem
    Purpose of the Study
    Synthesis of Related Literature
    Significance of the Study
    Statement of the Hypothesis
METHOD
    Population
    Sampling
    Instrument
    Limitations
    Assumptions
    Definitions
    Design
    Procedure for Collecting Data
ANALYSIS
    Procedure for Analyzing Data
    Testing the Hypotheses
    Reporting the Data
Reference Materials
    Appendices
    Bibliography

Front Matter
Title Page The coversheet for the proposal contains basic information for the reader. You will list on this page your school name, the proposal title, your major department, your name and the date the proposal is submitted. The title of your proposal should provide sufficient information to permit your readers to make an intelligent judgment about the topic and type of study you’re proposing to do. Your doctoral dissertation will be cataloged in Dissertation Abstracts upon graduation, so a clear title will attract more readers to your work.
Table of Contents The Table of Contents lists the major headings and subheadings and their respective page numbers within the proposal. Suggestion: organize your proposal (and simplify the writing of the Table of Contents) using a three-ring binder with dividers for each section and element of the proposal. As you work on each section, file your materials in proper order in the binder.
List of Tables
As you write your dissertation, you will want to augment your written explanations with visual representations of the data. One form of presentation is the “table,” which displays the data in tabular form — rows and columns of figures — enhancing, clarifying, and reinforcing the verbal narrative. The List of Tables lists each table by name and page number. Let me suggest that you consider carefully the tables you will need to use to display your data and include a sample of each planned table in your proposal. Doing this shows that you have given adequate consideration to the forms your data will take.
List of Illustrations An illustration is a graph, chart, or picture that enhances visually the meaning of what you write. The List of Illustrations lists each illustration by caption and page number.
Introduction The introduction section includes the introductory statement, the statement of the problem, the purpose of the study, the synthesis of related literature, the significance of the study, and the hypothesis. The purpose of the introduction is to demonstrate the thoroughness of your preparation for doing the study. This section explains to others, like the Advanced Studies Committee for instance, why you want to do this study. It further demonstrates how well you understand your specific field.
Introductory Statement
Problem
Purpose
Synthesis
Significance
Hypothesis
The Introductory Statement
The proposal begins with an introductory statement, usually several pages in length, which leads like a funnel from a broad view of your topic to the specific Statement of the Problem. It provides readers of the proposal your rationale, based on published sources, for doing the study. For example, if I wanted to study priority research needs in religious education in Southern Baptist churches, I might organize my introductory statement in nine paragraphs as follows:
Teaching in Jesus’ ministry
Teaching in the early church
The Sunday School movement of the past century
Seminaries and Religious Education
Southwestern Baptist Theological Seminary
The School of Religious Education
Doctoral degrees in the School of Religious Education
Sources of problems for dissertation research
The need to establish research priorities in a given field
It is not necessary to begin with the Bible as I have done in my example. A study of cognitive counseling theories might begin with Gestalt psychology in the 1920s. Behavioral approaches to therapy might begin with B. F. Skinner in the 1950s. The point is to begin with a broad view of the field you’re studying, and then narrow the focus to the point of the Problem Statement. Notice that my sample introductory statement outline begins with a broad overview of the field of “the teaching of Jesus” and ends with the specific point of “research needs in religious education.”
Use objective language in writing the introductory statement. Document every statement. Do not include the personal feelings, experiences, or opinions which inspired your proposal. It simply isn't appropriate to say “I had a bad experience with XYZ one time and wonder what might happen if...”.
The Statement of the Problem
The Problem Statement, usually no more than a single sentence, is the most important part of the whole proposal. It identifies the variables you plan to study as well as the type of study you intend to do. All other parts of the proposal grow out of the Problem Statement. Just as an instructional objective provides the framework for lesson planning, so the Problem reflects the very heart of the study. For example, look at the following Problem Statements from the dissertations of Drs. Marcia McQuitty and Norma Hedin:

The problem of this study [will be] to determine the relationship between the dominant management style and selected variables of full-time ministers of preschool and childhood education in Southern Baptist churches in Texas. The selected variables [are] level of education, years of service on church staffs, task preference, gender, and age.1

The problem of this study [will be] to determine the differences in measured self-concept of children in selected Texas churches across three variables: school type (home school, Christian school, and public school), grade (fourth, fifth, and sixth), and gender.2
See Chapter Four for more information on writing a Problem Statement.
Purpose of the Study
The Purpose of the Study section expands the Problem Statement and describes in more detail the intention of the study. Use verbs like “to determine,” “to ascertain,” “to evaluate,” “to discover.” A listing of purposes for Dr. McQuitty's Problem Statement above reads this way:

“The purposes of this study [will be] to determine:
1. the dominant management style of full-time preschool and children's ministers in Southern Baptist churches in Texas
2. the relationship between the dominant management style and selected variables of level of education, years of service on church staffs, task preference, gender, and age
3. areas of strengths and weaknesses in management style which could be addressed by additional printed material, professional development seminars, and the addition or restructuring of seminary class content for preschool and children's ministers.”3
Notice that the list of Purpose statements comes directly out of the Problem Statement, and yet expands each component of it.
Synthesis of Related Literature
Part of the proposal-writing process involves library research. Preliminary sources such as literature indexes (“Dissertation Abstracts”) and key word thesauri (the “E.R.I.C. Thesaurus”) provide a doorway into millions of research articles. Use these resources to locate recent journal reports and dissertations related to your subject. Analyze these sources and condense the information into a clearly organized narrative. The purpose of the literature search is to establish a solid foundation for your study as well as prepare you to conduct the study. The Synthesis provides a backdrop for your study. It details what others are doing in the field, what methods are being used, and what results have been obtained in recent years.
A synthesis is different from a summary. In a summary, articles relating to a subject are outlined and then written up one after another. Let's say we have three articles. Article 1 contains discoveries A, B, and D. Article 2 contains discoveries A, B, and C. Article 3 contains discoveries A and C. A summary would look like this:

Article 1 found A, B, and D. Article 2 found A, B, and C. Article 3 found A and C.

1 Marcia G. McQuitty, “A Study of the Relationship Between Dominant Management Style and Selected Variables of Preschool and Children's Ministers in Texas Southern Baptist Churches,” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1992), 5. Tenses changed from dissertation past tense to proposal future.
2 Norma Sanders Hedin, “A Study of the Self-Concept of Older Children in Selected Texas Churches Who Attend Home Schools as Compared to Older Children Who Attend Christian Schools and Public Schools,” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1990), 6. Tenses changed from dissertation past tense to proposal future.
3 McQuitty, 5-6
This makes for lifeless writing and boring reading. It also fails to uncover the groupings of discoveries across all the articles. A synthesis, however, focuses on key words and discoveries across many articles and combines the various research articles' findings. The focus is on the research discovery-clusters, not on individual articles. Look at the following rewrite:

Three researchers found A (1,2,3). Two researchers found B (1,2), and two researchers found C (2,3).
This approach helps you discover linkages among researchers and makes for much more interesting reading. I've used three articles as an example, but a dissertation study will involve scores of them! When I was doing library research on my last doctorate, I found over a hundred research reports relating to my subject. In these reports, statisticians argued about “proper procedures” on the basis of a particular kind of error rate. As I analyzed the articles, I found that the researchers could be put into three camps. These camps, and the comparison of their views of various statistical issues, formed the organizational structure for my Related Literature section. I condensed ninety-two journal articles into fifteen pages of synthesis using over 30 key words.
I remember my grandfather gathering the sap from maple trees to boil down into syrup. It frequently required over 100 gallons of sap to produce a gallon of syrup. This same process applies to the preparation of the Synthesis of Related Literature. Dr. Rollie Gill provides an example of synthetic writing in his dissertation on leadership styles:4

Outside research on Situational Leadership has questioned the validity and reliability of the “theory.”127

See Chapter Six for more information on synthesizing literature.
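The clustering step behind a synthesis is mechanical enough to sketch in code. The following Python snippet is an illustration only; the article labels and the findings A, B, C, and D are the hypothetical ones from the example above. It inverts a mapping of articles to discoveries into discovery-clusters, which is exactly the reorganization a synthesis performs:

```python
from collections import defaultdict

# Hypothetical data from the example: each article's list of discoveries
articles = {
    "Article 1": ["A", "B", "D"],
    "Article 2": ["A", "B", "C"],
    "Article 3": ["A", "C"],
}

# Invert the mapping: for each discovery, which articles reported it?
clusters = defaultdict(list)
for article, findings in articles.items():
    for finding in findings:
        clusters[finding].append(article)

# Report discovery-clusters, most widely replicated first
for finding, sources in sorted(clusters.items(), key=lambda kv: -len(kv[1])):
    print(f"Finding {finding}: reported by {len(sources)} article(s): {sources}")
```

With a real bibliography, the dictionary would hold your key words per source, and the inverted index shows at a glance which findings replicate across studies, the discovery-clusters rather than the individual articles.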
Significance of the Study
The Significance of the Study section explains why, on the basis of the research literature, your study is worth doing. What makes your study important to your field? What tangible contribution will it make? In short, it answers the so-what question. “You want to study something. You find what you expect. So what?!” The personal interest of the student or his/her major professor is not sufficient rationale for approving a proposal. The best rationale is a reference to one or more research studies stating the need for what you propose to do. Dr. Dean Paret wrote an effective statement of significance for his study on healthy family functioning:5

This study [will be] significant in that:
1. It provides empirical data for the relationship between family of origin in terms of autonomy and intimacy roles that were adapted and the current family healthy functioning patterns. Empirical validation has been called for by Hoverstadt et al.118 to support the theoretical assumptions upon which family therapy techniques are based.
2. It provides empirical data for breaking the recurrent cycle perpetuating the adult child syndrome.119
3. It provides a basis for the development of specific parenting training for the ministry of the church.
4. It provides helpful information for the seminary to aide [sic] the students who are having a difficult time juggling married life and student life, by providing indicators of stress areas related to autonomy and intimacy. According to Dr. David McQuitty, Director of Student Aid, the seminary through his office sees an increase in problems encountered by students as their seminary journey increases, both in financial stress, and student stresses, that could possibly be related to issues brought forward from the family of origin.120 It is therefore necessary to provide empirical data to help in breaking down the dysfunctional patterns of interaction.

118 Hoverstadt, et al., 287 and 296
119 Fine and Jennings, 14
120 Conversation with Dr. McQuitty on August 18, 1990
127 Blank et al., “A Test of the Situational Leadership Theory,” 579-96; Goodson et al., “Situational Leadership Theory,” 446-60; Norris and Vecchio, “Situational Leadership Theory,” 331-41; Vecchio, “Situational Leadership Theory,” 444-50; and Harold Ellwood Wiggin, Jr., “A Meta-Analysis of Hersey and Blanchard's Situational Leadership Theory,” (Ph.D. diss., Florida Atlantic University, 1991), in Dissertation Abstracts International, 52 (June 1992): 4488-A.
4 Rollie Gill, “A Study of Leadership Styles of Pastors and Ministers of Education in Large Southern Baptist Churches,” (Ph.D. diss., Southwestern Baptist Theological Seminary, 1997), 27-28
Just before my Proposal Defense, I made one last trip to the North Texas Science library. On that trip, I found a reference to a speech made two years earlier. Looking up the speech, I found a gold mine! The writers had analyzed many of the procedures I was studying. Their conclusion was to call for a computer analysis of several of the most popular procedures. It was the focus of my study! I added this recommendation to my “significance” section. It provided a solid rationale for my study when I defended it before my Proposal Committee.
The Hypothesis
The Statement of the Problem describes the heart of your study in one or two succinct sentences. The Statement of the (research) Hypothesis describes the expected outcome of your study. Base the thrust of your hypothesis on the synthesis of literature. Use the Problem Statement as the basis for the format of the hypothesis. Look at this Problem-Hypothesis pair from the dissertation of Dr. Joan Havens:

“The problem of this study [is] to determine the difference in level of academic achievement across four populations of Christian home schooled children in Texas: those whose parents possessed (1) teacher certification, (2) a college degree, but no certification, (3) two or more years of college, or (4) a high school diploma or less.”6

5 Dean Kevin Paret, “A Study of the Perceived Family of Origin Health as It Relates to the Current Nuclear Family in Selected Married Couples,” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1991), 36-37
[One of the hypotheses of this study is that there will] “be no significant difference in levels of academic achievement in home schooled children across the four populations surveyed.”7
Or another, from the dissertation of Dr. Don Clark, who did an analysis of the statistical power levels of “dissertations hypothesizing differences” written here in the School of Educational Ministries at Southwestern since 1981.8 The problem of this study [will be] to determine the difference in power of the statistical test between selected dissertations' hypotheses proven statistically significant and those selected dissertations' hypotheses not proven statistically significant in the School of Religious Education at Southwestern Baptist Theological Seminary.9 The hypothesis of this study [is] that power of the statistical test will be significantly higher in those dissertations' hypotheses finding statistically significant results than those. . .not finding statistically significant results.10
The Problem poses the question to be answered; the hypothesis presents the expected answer. The research hypothesis must be stated in measurable terms and should indicate, at least generally, the kind of statistic you'll use to test it. See Chapter Four for more information on writing the Hypothesis Statement.
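To make “measurable terms” concrete, here is a minimal Python sketch of the independent-samples t statistic, the quantity behind the “t-Test for Difference Between Means” named in Dr. Clark's criteria. The scores below are invented for illustration, not data from any dissertation cited here:

```python
import statistics

# Hypothetical achievement scores for two groups (illustration only)
group_x = [78, 85, 92, 88, 75, 81, 90, 84]
group_y = [70, 74, 69, 80, 72, 77, 68, 73]

def t_statistic(a, b):
    """Independent-samples t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5

t = t_statistic(group_x, group_y)
df = len(group_x) + len(group_y) - 2
print(f"t = {t:.2f} with {df} degrees of freedom")
# Compare |t| against the critical value for the chosen alpha level
# to decide whether the hypothesized difference is statistically significant.
```

A hypothesis stated in measurable terms points directly at a computation like this: the groups, the measure, and the test are all specified in advance.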
Method The METHOD section contains a detailed blueprint of your planned procedures. It specifically explains how you will collect the necessary data to analyze the variables you’ve chosen in a clear step-by-step fashion. This section includes the following components: population, sampling, instrument, limitations, assumptions, definitions, design, and collecting data.
Population
Sampling
Instrument
Limitations
Assumptions
Definitions
Design
Collecting Data

Population
The Population section of the proposal specifies the largest group to which your study's results can be applied. Any samples used in the study (see below) must be drawn from one or more defined populations. Here is Dr. Da Silva's population:

The population for this study [will consist] of social work administrators in Texas who [are] members of the National Association of Social Workers. According to the mailing list of May 21, 1992, there [are] five hundred and seventy-eight administrators from the state of Texas.11
Here is Dr. Clark's population:

The population of this study [will consist] of all hypotheses from Ed.D. and Ph.D. dissertations completed within the School of Religious Education at Southwestern Baptist Theological Seminary which met four criteria:
1. The hypothesis was included within a dissertation completed between May 1978 and May 1996.
2. The hypothesis tested differences between groups as opposed to relationships between variables.
3. The hypothesis was tested statistically by means of t-Test for Difference Between Means, One-way ANOVA, Two Factor ANOVA, or Three Factor ANOVA.
4. Statistical significance was determined solely upon meeting a singular criteria, that being a single statistical test.12

6 Joan Ellen Havens, “A Study of Parent Education Levels as They Relate to Academic Achievement Among Home Schooled Children,” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1991), 7. Tenses changed from dissertation past tense to proposal future.
7 Ibid., 10
8 Don Clark, “Statistical Power as a Contributing Factor Affecting Significance Among Dissertations in the School of Religious Education at Southwestern Baptist Theological Seminary,” (Ph.D. diss., Southwestern Baptist Theological Seminary, 1996)
9 Ibid., 5
10 Ibid., 30
11 Maria Bernadete Da Silva, “A Study of the Relationship Between Leadership Styles and Selected Social Work Values of Social Work Administrators in Texas,” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1993), 7. Tenses changed from dissertation past tense to proposal future.
See Chapter Seven for more information.
Sampling
The Sampling section describes how you will draw one or more samples from the population or populations defined above. It also explains how many subjects you intend to study in these samples. Here are examples of sampling statements based on the populations we defined above.

A twenty-five percent random sample [will be] obtained from the mailing list of the National Association of Social Workers in the State of Texas. The sample [is] estimated to consist of 144 subjects.13

A simple random sample of hypotheses [will be] conducted to produce two equal groups of fifty hypotheses: hypotheses proven statistically significant (Group X) and hypotheses not proven significant (Group Y). . . .14
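A simple random sample like Dr. Da Silva's can be drawn in a few lines of code. The sketch below is illustrative only: the mailing-list names and the `draw_sample` helper are hypothetical stand-ins, not anything from the dissertation. It draws 25 percent of a 578-name list without replacement, yielding 144 subjects.

```python
import random

def draw_sample(mailing_list, fraction=0.25, seed=None):
    """Draw a simple random sample of `fraction` of the list, without replacement."""
    rng = random.Random(seed)                 # seeded for a reproducible draw
    n = int(len(mailing_list) * fraction)     # 25% of 578 names -> 144 subjects
    return rng.sample(mailing_list, n)

# Hypothetical stand-in for the NASW Texas mailing list of 578 administrators
mailing_list = [f"administrator_{i}" for i in range(1, 579)]
sample = draw_sample(mailing_list, fraction=0.25, seed=1)   # 144 unique names
```

Because `random.sample` draws without replacement, no administrator can be selected twice, which is what "simple random sample" means here.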
See Chapter Seven for more information.
12 Clark, 30-31.
13 Da Silva, 7.
14 Clark, 31.

Instrument
The Instrument section describes the tools you plan to use in measuring subjects. “Instruments” include tests, scales, questionnaires, interview guides, observation checklists, and the like. If you choose an existing instrument appropriate for your study, then describe its development, use, reliability, and validity. If you cannot find a suitable instrument, you will need to develop your own. Provide a step-by-step explanation of the procedure you will use to develop, evaluate, and validate the instrument. Here is a portion of Dr. Hedin's “instrument” section:

The instrument selected for this study [is] the Piers-Harris Children's Self-Concept Scale (The Way I Feel About Myself), developed by Ellen V. Piers and Dale B. Harris in 1969. . . . Answers are keyed to high self-concept; thus, a higher total score [indicates] a positive concept of self. . . . Reliability coefficients ranging from .88 to .93, based on Kuder-Richardson and Spearman-Brown formulas, were reported for various samples29 . . . Content validity was built into the scale by using children's statements about themselves as the universe to be measured as self-concept. By writing items pertaining to that universe of statements, the authors defined self-concept for their scale31 . . . An attempt was made to establish construct validity during the initial standardization study. The PHCSCS scale was administered to eighty-eight adolescent institutionalized retarded females. As predicted by Piers and Harris, these girls scored significantly lower than normals of the same chronological or mental age. This was interpreted as meaning that the PHCSCS did measure self-concept and discriminated between high and low self-concept.32
Dr. Wes Black developed his own instrument:

No standardized instrument was found to be applicable to this study. It [is] therefore necessary to devise such an instrument . . . thirteen experts received the questionnaire for their evaluation. The learning objectives from the “Youth Discipleship Taxonomy” were arranged in random order under each of the five areas of Church Training task assignment. . . . The experts were asked to select ten items most appropriate for inclusion in a questionnaire on learning objectives for youth discipleship training from each of the five task areas and rank order their choices from one (highest) to ten (lowest) in each area. Responses from the experts were checked for completeness and correctness. The rankings were reverse scored (a ranking of one received ten points; a ranking of two received nine points; and so forth) and scores totalled for each item on the taxonomy. Ten items in each of the five areas resulted in clear choices of the experts to be included in the instrument for this study. Table 1 [will provide] a summary of the experts. Appendix B lists the experts. The results of the content validity study [will be located] . . . in appendix C.15
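Dr. Black's reverse-scoring step (a rank of one earns ten points, a rank of two earns nine, and so on, totalled across experts) is easy to mis-specify, so a small sketch may help. The items and rankings below are invented for illustration, not his actual data.

```python
def reverse_score(expert_rankings):
    """Total reverse-scored points per item: a rank of r earns (11 - r) points,
    so rank 1 (highest) earns 10 points and rank 10 (lowest) earns 1 point."""
    totals = {}
    for expert in expert_rankings:          # one {item: rank} dict per expert
        for item, rank in expert.items():
            totals[item] = totals.get(item, 0) + (11 - rank)
    return totals

# Two hypothetical experts ranking three items
rankings = [{"A": 1, "B": 2, "C": 3},
            {"A": 2, "B": 1, "C": 3}]
totals = reverse_score(rankings)    # A: 10 + 9, B: 9 + 10, C: 8 + 8
```

Totalling the reversed ranks means the items the experts ranked highest accumulate the most points, which is what makes the top ten per task area "clear choices."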
See Chapters Nine, Ten and Eleven for more information on developing instruments.
Limitations
The Limitations section describes external restrictions that reduce your ability to generalize your findings. An external restriction is one that is beyond your control. Let's say you plan to randomly assign students in a local high school to one of three experimental teaching groups. When you check with the principal, he allows you to do the experiment, but only if you use the regular classes of students — he does not want you disrupting classes through random assignment. Since random assignment is an important part of experimental design, this is a limitation to your study and must be stated in this section. Limitations differ from delimitations. Delimitations are restrictions you set on your study. The fact that you decide to study single adults ages 20-50 is a delimitation of your study, not a limitation. Choosing to study only 6 of the 16 scales of the 16PF Test is a delimitation, because you make that decision on your own. Limitations are external restrictions and belong in this section. Delimitations are personal restrictions and belong in the “Procedures for Collecting Data” section of the proposal — there is no “Delimitations” section. One of Dr. Matt Crain's limitations was:
Due to the lack of a central organizational headquarters, no directory of Churches of Christ exists whereby a true random sample of all congregations may be obtained.16
Here's one from the dissertation of Dr. Charles Bass:
15 Wesley Black, “A Comparison of Responses to Learning Objectives for Youth Discipleship Training from Minister of Youth in Southern Baptist Churches and Students Enrolled in Youth Education Courses at Southwestern Baptist Theological Seminary” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1985), 30-31.
16 Matthew Kent Crain, “Transfer of Training and Self-Directed Learning in Adult Sunday School Classes in Six Churches of Christ” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1987), 8.
This study [will be] subject to the limitations recognized in collecting data by mail, such as difficulty in assessing respondent motivation, inability to control the number of responses, and bias of sample if a 100 percent response is not secured.17
Assumptions
Every study is built on assumptions. The purpose of this section is to insure that the researcher has considered his assumptions in doing the study. In doing a mailed questionnaire, the researcher must assume that the subjects will complete the questionnaire honestly. In testing which of two counseling approaches is best, one assumes that the approaches are appropriate for the subjects involved. Provide a rationale for the assumptions you state. It is not enough to copy assumptions out of previous dissertations. Explain the why of your assumptions. Here are several assumptions made by Dr. Darlene Perez:

1. All [112 Puerto Rican Southern and American Baptist] churches will have a youth Sunday School enrollment.
2. The pastors and youth leaders will cooperate with the study and will insure completion of the questionnaires.
3. Since [all] 112 Southern Baptist and American Baptist churches were used in the study, it is assumed that the findings are important in that they represent the general opinion of Baptist youth groups in Puerto Rico. . . .18
Here are several assumptions made by Dr. Gail Linam:

2. The in-depth training provided to researchers who administrated and/or scored the Iowa Tests of Basic Skills, the cloze reading comprehension test, and the retelling comprehension analysis insured consistency in test administration and objectivity in scoring.
3. The Iowa Tests of Basic Skills, as a norm-based test, provided an accurate assessment of the reading level of boys and girls in Arlington, Texas, and thus offers a meaningful base of reference for religious educators around the nation who seek to make application of the study's findings to their particular group of boys and girls.19
Definitions
If you are using words in your study that are operationally defined — that is, defined by how they are measured — or have an unusual or restricted meaning in your study, you must define them for the reader. You do not need to define obvious or commonly used terms. For example, Dr. Kaywin LaNoue studied differences in “spiritual maturity” in high school seniors across two variables: active versus non-active in Sunday School, and Christian school versus public school. But what did she mean by “active” in Sunday School? What is “spiritual maturity” and how did she measure it? Here are her definitions for these two terms:

17 Charles S. Bass, “A Study to Determine the Difference in Professional Competencies of Ministers of Education as Ranked by Southern Baptist Pastors and Ministers of Education” (Ph.D. diss., Southwestern Baptist Theological Seminary, 1998), 45.
18 Darlene J. Perez, “A Correlational Study of Baptist Youth Groups in Puerto Rico and Youth Curriculum Variables” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1991), 12.
19 Gail Linam, “A Study of the Reading Comprehension of Older Children Using Selected Bible Translations” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1993), 85.
Active. Active means those students attending their Sunday School at least three Sundays a month.20

Spiritual maturity. Peter gives the steps in a Christian's growth toward maturity when he lists the attributes of the Christian life in the order by which they should be sought. He does this in 2 Peter 1:5-8. . . . In this study, spiritual maturity [is] the extent to which the students have assimilated (internalized) the virtues of goodness, knowledge, self-control, perseverance, godliness, brotherly kindness, and love.21
Dr. LaNoue used an adaptation of the Spiritual Maturity Test, developed and published by Dr. James Mahoney, to convert the virtues listed above into a test score.22 Sometimes special terms are used to communicate complex concepts quickly. These terms need to be defined. For example, the term “k,J combination” makes no sense until it is clearly defined:

k,J combination. — This term refers to two major variables in this study: the number of groups in an experiment, k, and the sample size category, J. There [will be] four levels of k representing three, four, five, and six groups. There [will be] seven levels of J. J(1) through J(5) [will represent] equal n sample sizes of 5, 10, 15, 20, and 25 respectively. J(6) [will represent] an unequal set of nj's in the ratio of 1:2:3:4:5:6 with n1=10. That is, when k=3, the sample n's [will be] 10, 20, and 30. J(7) [will represent] a set of nj's in the ratio of 4:1:1:1:1:1 with n1=80. That is, when k=3, the sample n's [will be] 80, 20, and 20. This provides twenty-eight combinations of k,J.23
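An operational definition this precise can be checked mechanically. The sketch below is my own reading of the quoted definition, not code from the dissertation; it enumerates the twenty-eight k,J combinations and the group sizes each implies.

```python
def sample_sizes(k, J):
    """Group sizes n_j for one (k, J) combination, per the quoted definition."""
    if 1 <= J <= 5:                    # J(1)..J(5): equal n of 5, 10, 15, 20, 25
        return (5 * J,) * k
    if J == 6:                         # ratio 1:2:3:4:5:6 with n1 = 10
        return tuple(10 * j for j in range(1, k + 1))
    if J == 7:                         # ratio 4:1:1:1:1:1 with n1 = 80
        return (80,) + (20,) * (k - 1)
    raise ValueError("J must be 1 through 7")

# Four levels of k times seven levels of J = 28 combinations
combos = {(k, J): sample_sizes(k, J) for k in (3, 4, 5, 6) for J in range(1, 8)}
```

For example, `combos[(3, 6)]` is `(10, 20, 30)` and `combos[(3, 7)]` is `(80, 20, 20)`, matching the worked examples inside the definition.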
See Chapter Three for more information on operationalizing variables.
Design
The Design section describes the research type of your study. It is here you declare your research to be correlational, or historical, or experimental. See the overview of Research Types in Chapter One for a description of eight major design types. Describe key factors that make your study of the stated type. If you are using an experimental design, explain which design you are using and why. Dr. Brad Waggoner explained his design this way:24

The method of research [which will be] employed in this study [is] “Research and Development”. . . This type of research [is] accomplished in two phases. The first phase [will involve] the development of the product. The second phase [will consist] of evaluating the use or effects of the product.xx Although the exact number of specific stages of Research and Development vary from author to author, the following five steps [will be] applied:xy
1. The identification of a need, interest, or problem
2. The gathering of information and resources concerning the problem or need
3. The preliminary product or process [is] developed
4. The product or process [is] field-tested
5. The product or process [is] refined based on the information obtained from the field-testing.

xx C. M. Charles, Introduction to Educational Research (New York: Longman, 1988), 13.
xy Ibid., 13-14.
20 Kaywin Baldwin LaNoue, “A Comparative Study of the Spiritual Maturity Levels of the Christian School Senior and the Public School Senior in Texas Southern Baptist Churches With a Christian School” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1987), 25.
21 Ibid., 26.
22 Ibid., 93-97.
23 William R. Yount, “A Monte Carlo Analysis of Experimentwise and Comparisonwise Type I Error Rate of Six Specified Multiple Comparison Procedures When Applied to Small k's and Equal and Unequal Sample Sizes” (Ph.D. diss., University of North Texas, 1985), 8.
24 Waggoner, 7-8.
Dr. Martha Bergen described the design of her study this way:25

The design of this study [is] descriptive in nature. [A] questionnaire [will be] designed to determine the attitudes of Southwestern Seminary's full-time faculty toward computers for seminary education. Further, certain variables [will be] examined to determine their possible predictions of these attitudes.
See Chapter Thirteen for more information on experimental designs.
Procedure for Collecting Data
The Procedure for Collecting Data section explains step by step how you plan to prepare instruments and gather data. Anticipate problems you may encounter and make contingency plans as needed. Avoid fuzzy over-generalized statements such as, “Prepare and mail out survey forms.” This phrase requires many specific actions: development, evaluation, rough draft, pilot testing, revision, final draft, printing, packaging, and mailing. Consult related dissertations and primary sources to discover the best procedures to use when collecting the particular type of data you need. At the end of this section, you should picture yourself with data sheets filled with numbers linked to each subject and every variable in the study. If the METHODS section is properly planned and executed, the result will be valid and reliable data ready for analysis. See Chapters Nine, Ten, Eleven, and Twelve for more information on collecting data.
The third and final major section of the proposal is the analysis section. The ANALYSIS section describes how you plan to process the numbers on the data sheets. This section moves step by step through the application of selected statistical procedures, the testing of hypotheses, and the reporting of the data in a systematic, coherent way.
Procedure for Analyzing Data
The Procedure for Analyzing Data explains step by step how you plan to statistically analyze your data. What statistical procedure(s) will you use? Procedures must agree with the stated Problem and Hypothesis.

I was impressed by the importance of this section during my very first semester as the faculty member responsible for research and statistics. A doctoral student came into my office with a box full of inventory sheets. He had spent nearly $1,000 on printing and postage. He sat down, looked painfully at the box and asked, “Now, what do I do with this?” “What do you want to find out?” I asked. “I dunno ... uh, I'm not sure.” He had paid $300 for advice from a statistician across town, and had been led down a dead-end alley. The student left too much for others to decide. He did not own his own research. I gave him some suggestions, and, with a great deal of effort on his part and some additional help from his statistician, he was able to produce an acceptable dissertation. But he paid for it in many sleepless nights! The truth of the matter is that we really cannot correctly collect data until we know how we're going to analyze it. The two parts — design and analysis — work together.

25 Martha S. Bergen, “A Study of the Relationship Between Attitudes Concerning Computer-Enhanced Learning and Selected Individual and Institutional Variables of Full-Time Faculty Members at Southwestern Baptist Theological Seminary” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1989), 52.
Testing the Hypotheses
The Testing the Hypotheses section describes how you will test the statistical result obtained in the previous section to determine whether it is a “significant” finding or not. It is here you state the null form of your hypothesis, state your significance level (α), and explain the hypothesis testing procedure appropriate for the selected statistic. See Chapters Sixteen through Twenty-Six for procedures for analyzing data and testing the hypothesis.
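As one concrete illustration of this procedure (not taken from any dissertation quoted here), the sketch below tests a null hypothesis of no difference between two group means with a pooled-variance t statistic. The data are invented, and the critical value 2.306 is the two-tailed t value for α = .05 with 8 degrees of freedom.

```python
from statistics import mean, variance

def pooled_t(group1, group2):
    """t statistic for the difference between two independent means,
    using the pooled estimate of the common variance."""
    n1, n2 = len(group1), len(group2)
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

a = [10, 11, 12, 13, 14]          # invented scores, mean 12
b = [20, 21, 22, 23, 24]          # invented scores, mean 22
t = pooled_t(a, b)                # t = -10.0, df = n1 + n2 - 2 = 8
critical = 2.306                  # two-tailed critical t, alpha = .05, df = 8
reject_null = abs(t) > critical   # True: reject the null hypothesis
```

Stating the null hypothesis, the significance level, and the decision rule (reject when |t| exceeds the critical value) before collecting data is exactly what this section of the proposal commits you to.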
Reporting the Data
The Reporting the Data section shows the charts, graphs, tables, or figures you plan to use to report the data you've collected and the findings of your analysis. Dr. Daryl Eldridge developed thirty-nine tables for his “Effect of Student Knowledge of Behavioral Objectives” dissertation.26 If you include actual examples of labelled charts or graphs (without data) in your proposal, then transferring the actual data to the chart is “all that's left to do” after the study. By deciding how to handle your data during the proposal stage, you clarify in your own mind exactly what you will need in order to finish your study. Putting off these decisions may cause you to overlook important areas in your study. Not only will this increase the difficulty of getting your proposal approved by the Ph.D. Committee, it will create unnecessary problems in writing your dissertation.
Reference Material
The Reference Material section contains supporting materials for the proposal. These materials include appendices and bibliography.
Appendices
An appendix contains supporting materials which relate directly to your study. Most proposals require several appendices to include cover letters, a sample of the instrument, results of a pilot study, the data summary sheets, complex tables, illustrations of statistical analysis, and so forth. Dr. Daryl Eldridge developed twenty-three appendices to house all the “supplemental materials” generated by his 188-page dissertation. What could possibly take up twenty-three appendices? Here's the list:27

1 - Course Objectives for Building a Church Curriculum Plan 332-435 [3 pages]
2 - Sample of Class / Session Objectives [1]
3 - First Draft of Unit 1 Exam [5]
26 Eldridge, 79.
27 Ibid., 96-183.
Research Design and Statistical Analysis in Christian Ministry
I: Research Fundamentals
4 - Cornell Inventory for Student Appraisal of Teaching and Courses [7]
5 - Letter to Research Associates for Validation of Cognitive Tests [2]
6 - Test Item Analysis - Unit 1 Exam [2]
7 - Letter to Research Associates for Validation of Precourse Attitude Inventory [2]
8 - Report Form For Student Test Scores [1]
9 - Session Goals and Indicators [4]
10 - Unit 1 Exam, Final Form [8]
11 - Unit 3 Exam, Final Form [5]
12 - Cognitive PreTest, Final Form [4]
13 - Postcourse Student Inventory [8]
14 - Precourse Student Inventory [3]
15 - Tentative Class Schedule [4]
16 - Course Syllabus, Fall Semester [3]
17 - Course Syllabus, Spring Semester [5]
18 - Quizzes Over SBC Curriculum [6]
19 - Letter to Cornell University [1]
20 - Selected Comments From the Postcourse Inventory and Student Evaluations [3]
21 - Raw Scores For All of the Instruments [4]
22 - A Comparison of Scores Across Semesters for the Various Instruments [2]
23 - Statistical Analysis for Each of the Instruments Across Semesters [5]
You provide a clear, categorized filing system for supportive information by packaging materials in appendices. Small parcels of this information can be drawn from these appendices for explanation and illustration in the body of the dissertation. Such a design permits you to provide complete information, through references to the appendices, without bogging down the flow of thought in the dissertation itself. In the proposal development stage, think ahead concerning what appendices you will need and include an empty copy of each as an appendix to the proposal. This demonstrates to the Committee forethought and critical thinking.
Bibliography, or Cited Sources
The bibliography lists all primary and secondary references footnoted in the body of your proposal. List books first, then published articles and periodicals, then dissertations, then unpublished sources, interviews and, finally, other. Format bibliographical references according to the current style manual.
Practical Suggestions
Here are some practical suggestions to help you write a solid proposal.
Personal Anxiety
This assignment is complex. Some students experience a frightening sense of anxiety as they consider the daunting task of writing a research proposal. A research proposal taxes the thinking skills of the best students. You are confronted with learning new definitions (knowledge), understanding new concepts (comprehension), discovering conceptual links among numerous articles (analysis), writing an integrative narrative (synthesis), choosing the correct design and statistical procedures (evaluation), and putting all of this together in a single-focused, comprehensive document. Your educational experiences in high school and college may have emphasized rote memory, recall, and simple concepts rather than clear thinking. Therefore, writing an original research proposal is “a strange new thing” for some. Many paths to choose. Many decisions to make. What topic will I choose? What kind of research will I select? Where do I begin? For some, too many “neat ideas” compete for attention. For others, “neat ideas” are nowhere to be found. Don't panic. Take each section, each step of the process, one at a time.
Professionalism in Writing
A research proposal should be written in a clear, professional manner or it will not be understood. Here are some suggestions.
Clear Thinking
Your proposal should show clear thinking. Write and revise. Squeeze out fuzzy phrases, word magic28 and awkward grammar. Write simply and clearly. Use professional jargon only when simple English can’t convey the thought.
Unified Flow
There should be a unified flow through the proposal. Take care not to ramble or lose focus in the details. March step by step in a single direction from the first page to the last.
Quality Library Research
The proposal should demonstrate extensive yet focused library research. Use primary sources less than five years old to establish current trends. Use secondary sources less than ten years old to establish the scope of your study. Use sources older than ten years only to establish historical trends.
Efficient Design
Your proposal should demonstrate your understanding of research design and statistical analysis, and how they work together. The proposal should present a narrative that is all-of-one-piece rather than a disjointed collection of pieces. Problem, Hypothesis, and Statistic should form its backbone.
Accepted Format
Finally, write in the accepted professional format of your school. Content is more important than format, but a professional format is required.
Summary
This chapter lays out the complete skeletal organization, with examples from actual dissertations, for the proposal you are developing. Study each component individually, as well as its relationship to the whole. Refer to this chapter and to the Evaluation Guidelines in Chapter 27 throughout the writing process to insure that you are on course. You will add to your understanding of each of these components as the semester progresses. Use this overview to anchor “the big picture” in your mind.

28 I use the term “word magic” to refer to high-sounding, emotive words that have little substantive meaning. The majestic purpose of the American school is to instill in the hearts and minds of our youth the requisite essentials which will allow them to take their rightful place in society and fulfill their destiny. Huh? We hear word magic in sermons and classrooms as well. It “gets the amens” but communicates little.
Vocabulary
Analysis: describes step-by-step the analysis of collected data
Appendix: an addendum to a proposal which contains supporting examples
Assumptions: stated presuppositions upon which a proposed study is based
Bibliography: a list of references used in developing the proposal
Definitions: a list of meanings of terms which are unique to the study, operationalized
Delimitations: restrictions placed on a study by the researcher
Design: an explanation of the specific experimental approach to be used
felt difficulty: the beginning point of a study but not included in proposal
Front Matter: preliminary materials such as Table of Contents and Lists
Hypothesis: the anticipated outcome of the study or solution to the Problem
Instrument: the means by which data is gathered
Introduction: the first major section of the proposal (includes the Problem)
Introductory Statement: the opening statement of the proposal which leads to the problem
Limitations: restrictions placed on a study outside the researcher's control
List of Tables: a listing of tables used in the proposal (Front Matter)
List of Illustrations: a listing of illustrations used in the proposal (Front Matter)
Method: the second major section of a proposal (includes sampling and instrument)
Population: the largest group to which the proposed study can be generalized
Procedure for Collecting Data: step-by-step procedure for sampling, instrumentation, and gathering data
Procedure for Analyzing Data: step-by-step procedure for statistically reducing data to meaningful results
Purpose of the Study: explanation of the rationale for doing the study
Reporting the Data: explanation of how data analysis will be presented (charts, tables)
research proposal: a step-by-step blueprint for conducting scientific inquiry
Sampling: the process of identifying a representative group from a population
Significance of the Study: stated reasons why a study is necessary (answers “so what?”)
Statement of the Problem: simple focused statement of the relationship among variables in the study
Synthesis of Related Literature: a clear narrative which fuses research materials related to the study
Table of Contents: an outline of proposal organization (Front Matter)
Testing the Hypotheses: an explanation of how stated hypotheses will be tested statistically
Title Page: the cover page of the proposal
Study Questions
1. Differentiate between the “Introduction” and the “introductory statement.”
2. Differentiate between a synthesis and a summary of related literature.
3. Differentiate between a limitation and a delimitation.
4. What are the three essential elements that make up the backbone of a proposal?
Sample Test Questions
1. Which of the following proposal elements do not belong in the Introduction Section?
a. The Problem
b. The Hypothesis
c. The Definitions
d. The Synthesis of Related Literature
2. The introductory statement should
a. move from a broad focus of the field to the narrow focus of the study
b. express the subjective interest and intent of the researcher
c. take care not to use information from research articles
d. lead directly to the statement of the hypothesis

3. Which of the following is not recommended as a way to organize the synthesis of literature?
a. research article publication dates
b. research article author names
c. concepts addressed by research articles
d. hypotheses of the study

4. Which of the following sections may be omitted from a proposal — with appropriate caution?
a. The Problem
b. The Hypothesis
c. The Significance of the Study
d. The Limitations
3
Empirical Measurement

Scientific knowing stands or falls on the precision of its empirical observations. Whether these observations are made by a microscope, or a telescope, or a stop watch, or a pencil and paper test, the scientist strives for an accurate, numerical representation of the phenomena he is studying. The first step is to define the phenomenon under study in terms of the way you intend to measure it. This process is called operationalization. In order to understand the process, you will need to understand the terms “variable” and “measurement.” If you do not determine a clear way to measure what you intend to study, you will eventually bog down in the confusion of instrument design and statistical procedures. Now, not “sometime later in your studies,” is the time to decide specifically how you will measure the variables you intend to study.
Variables and Constants
A “constant” is a specific number which remains the same under all conditions. For example, water freezes (at sea level) at 32 degrees Fahrenheit or 0 degrees centigrade. The number of seconds in a minute is 60. An object dropped from an airplane will accelerate at 32 feet per second per second (32 ft/sec²). These are constants.

A “variable” is an entity which can take different values. If we were to weigh each member of this class, we would find the numbers (“weight”) varying from subject to subject. “Weight” is a variable. So is eye color, IQ, gender, education level, level of anxiety, marital satisfaction, and so on. Research design and statistical analysis focus on the study of variables.

As you think about subjects you would like to study, you will determine what “variables” you will study. Some of you are interested in counseling technique. Others are interested in learning style. Still others in administrative effectiveness. But what will you measure to determine which counseling technique is “best”? What will you measure to determine the effects of differing learning styles? What will you measure in order to gain a better picture of productive administration? Answers to these questions involve an understanding of independent and dependent variables.
Independent Variables
An independent variable is one that you control or manipulate. You decide to study three different teaching methods. “Teaching Method” is an independent variable. Or you want to compare four approaches to counseling abused children. “Counseling Approach” is the independent variable.
Dependent Variables
A dependent variable is the variable you measure to demonstrate the effects of the independent variable. If you are studying “Teaching Method” you might measure “achievement” or “attitude toward the class.” If you are studying “counseling approach” you might measure “anxiety level” or “overt aggression.”
Measurement Types Nominal Ordinal Interval Ratio
Before a dependent variable can be analyzed statistically, it must be measured or classified in some manner. There are four major ways we measure variables. These measurement types are called nominal, ordinal, interval and ratio.
Nominal Measurement
“Nominal” data refers to variables which are categorized into discrete groups. Subjects are grouped or classified into categories on the basis of some particular characteristic. Examples of nominal variables include all of the following: gender, college major, religious denomination, hair color, residence in a certain geographic region, staff position.
Ordinal Measurement
“Ordinal” data refers to variables which are rank ordered. Notice that nominal variables have no “order” to them. “Males” and “Females” imply nothing more than two different groups of subjects. But ordinal data orders subjects from high to low on some variable. An example of this data type would be the rank ordering of ten priorities for Christian education in the local church.
Interval Measurement

An ordinal scale only reports 1st, 2nd, 3rd places in a set of data. It cannot tell us whether the distance between 1st and 2nd is greater than or less than the distance between 2nd and 3rd. In order to measure distances between data points, we need a scale of equal, fixed gradations. This is precisely what an interval scale is. Numbers are associated with these fixed gradations, or intervals. One of the most common examples of an interval scale is temperature. The difference between 50 and 60 degrees F. is the same as the difference between 100 and 110 degrees F. Another example is an attitude scale which has 20 items. Each item can have a value of 1, 2, 3, or 4. That means a subject can make a score between 20 and 80. The scores on this scale fall at regular one-point intervals from 20 to 80.
Chapter 3
Empirical Measurement
Ratio Measurement

Interval data does not, however, lend itself to ratios. We cannot say, for example, that 100 degrees is twice as hot as 50 degrees. The zero point on an interval scale is arbitrary; that is, it does not represent the total absence of the measured characteristic. A temperature reading of 0 degrees F. does not mean there is “no heat.” (The Kelvin scale was invented for this. A temperature of 0 kelvin, about -460 degrees F., is absolute zero temperature.) Ratio measurement differs from interval measurement only in the fact that the ratio scale contains a meaningful zero point. Zero weight means that the object weighs nothing. Zero elapsed time means that no time has passed since the beginning of the experiment (it has yet to begin!). A true zero point means that observations can be compared as ratios or percentages. It is meaningful to say that a 60-year-old is twice the age of a 30-year-old. Or that a 90-pound weakling weighs half as much as a 180-pound bully. In most types of studies, interval and ratio data are treated the same for purposes of selecting the proper statistical procedure.
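The interval-versus-ratio distinction can be demonstrated with a short computation. The sketch below (plain Python, illustrative values only) converts Fahrenheit readings to the Kelvin scale, whose true zero makes ratio statements meaningful:

```python
# Illustration: ratio statements are only meaningful on scales with a true
# zero. Fahrenheit has an arbitrary zero; Kelvin does not.

def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvin (true zero = absolute zero)."""
    return (f - 32) * 5 / 9 + 273.15

hot, warm = 100.0, 50.0
# On the interval (Fahrenheit) scale the raw numbers suggest "twice as hot"...
print(hot / warm)                                              # 2.0
# ...but on the ratio (Kelvin) scale the actual ratio is far smaller.
print(fahrenheit_to_kelvin(hot) / fahrenheit_to_kelvin(warm))  # ~1.098
```

Dividing the Fahrenheit numbers suggests “twice as hot,” but the Kelvin ratio shows only about a ten percent difference: ratios mislead on scales with an arbitrary zero.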
Data Type Summary

Perhaps one final example will help you see how these four data types differ from each other. Study the table below. The students listed have class attitude scores (interval: 20-80), test scores (ratio: 0-100), test rankings (ordinal: 1-11), grade classifications (nominal: A-F), and gender classifications (nominal: M-F).

Student   Attitude   Test      Test     Test     Gender
          Scores     Scores    Rank     Grade
          (20-80)    (0-100)   (1-11)   (A-F)    (M-F)
Barb        80        100       1        A         F
Chris       48         96       2.5      A         M
Bonnie      74         96       2.5      A         F
Robert      35         93       4        A         M
Jim         79         92       5        A         M
Tina        60         89       7        B         F
Ron         55         89       7        B         M
Jeff        56         89       7        B         M
Brenda      74         88       9        B         F
Mark        56         82      10        B         M
Mike        65         75      11        C         M
Type:     interval   ratio    ordinal  nominal  nominal

The ratio scale (“test scores”) has a true zero and equally spaced intervals (the points between 0 and 100). The interval scale (“attitude scores”) has equally spaced intervals, but no true zero point (points range from 20 to 80). The ordinal scale (“test rank”) ranks the students by test scores. Notice that equal test scores received the same average rank: 96 and 96 both ranked 2.5, the average of ranks 2 and 3. These are called “tied ranks.” The three scores of 89 are each ranked 7, the average of ranks 6, 7, and 8. The first nominal scale (“test grade”) identifies students by grade categories and the second (“gender”) identifies them “M” or “F.”
These four measurement types require three different sets of statistical procedures: one set for interval/ratio, another for ordinal, and still another for nominal. We’ll look at some of the major procedures in a statistical flow chart in Chapter 5.
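The tied ranks in the table can be computed mechanically. The following sketch (plain Python, using the test scores from the table) assigns each tied score the average of the rank positions it occupies:

```python
# A minimal sketch of how the 'tied ranks' in the table are computed:
# tied scores share the average of the rank positions they occupy.

def average_ranks(scores):
    """Rank scores from high to low, giving tied scores their average rank."""
    ordered = sorted(scores, reverse=True)
    ranks = {}
    for value in set(ordered):
        # All positions (1-based) this value occupies in the sorted order.
        positions = [i + 1 for i, v in enumerate(ordered) if v == value]
        ranks[value] = sum(positions) / len(positions)
    return [ranks[s] for s in scores]

test_scores = [100, 96, 96, 93, 92, 89, 89, 89, 88, 82, 75]
print(average_ranks(test_scores))
# → [1.0, 2.5, 2.5, 4.0, 5.0, 7.0, 7.0, 7.0, 9.0, 10.0, 11.0]
```

The two scores of 96 occupy positions 2 and 3, so each receives rank 2.5; the three scores of 89 occupy positions 6, 7, and 8, so each receives rank 7, exactly as in the table.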
Operationalization
Our research design describes how we plan to measure selected variables. Statistical analysis describes how we plan to reduce these measurements to a meaningful (numerical) form. In both cases, the variables in the study must be defined in terms of measurement.
Definitions

An operational definition indicates the operations1 or activities that are performed to measure or manipulate a variable.2 The purpose of an operational definition is to help scientists speak the same language when reporting research. Since one of the primary characteristics of science is precision, we must begin with precise definitions of the variables we plan to study.
First, operational definitions force us to think concretely and specifically about the terms we use. Some of my students struggle with this. In one of my Principles of Teaching classes, a student was attempting to describe the fruit of the Spirit (Gal. 5:22-23). He defined “love” as “God’s kind of love.” But what kind of love is that? “Joy” was defined as “joy that you feel deeply, the joy we’ll experience in heaven.” But what is joy? These are non-definitions. They are empty. They are useless in teaching because they convey nothing but semantic fluff. I call this kind of definition “word magic,” for it deceives teachers into thinking they are explaining words and phrases when in fact the definitions are little more than puffs of smoke in the air. Defining terms in precise terms of measurement avoids this kind of imprecision in research.
Second, operational definitions provide a common base for communication of terms with others. When terms are operationally defined, readers know exactly how we are using them. For example, what does hunger mean? In one research study the operational definition for hunger was “the state of animals kept at 90% of their normal body weight.” This is certainly not the definition people use when they reach for their third chocolate-covered doughnut, saying, “I’m really hungry!” The goal is to precisely understand the terms we use in research, and to convey that meaning clearly to others.
An Example

Years ago, General Motors used the slogan “We Build Excitement — PONTIAC!” Suppose we wanted to study that. What does General Motors mean by “excitement”? We need to operationalize the term. There are several ways to do it.
Have trained raters follow selected owners of Pontiacs, Fords, and Chryslers and count the number of times they behave in an excited, agitated, or exuberant manner. “Excitement” means the number of such behaviors per day. Is there a significant difference among the owners of these three makes of cars?
Or, tally the number of dates selected car owners have per week. “Excitement” means the number of dates per week. This definition assumes that dates are exciting.
Or, ask the owners: “How excited does your car make you?” Have them respond by marking a scale from 0 (no excitement) to 10 (excited all the time because of the car). Here “excitement” is a self-reported feeling, measured by a number on a scale.
Or, ask two acquaintances of each selected subject to rate them on a “car excitement” scale. With this definition, “excitement” is the average scale score of impressions of the two acquaintances.
Each of these definitions provides a different measure of the general term “excitement.” In fact, we actually have four concepts of the term. But each definition is clear in its meaning.3

1 Meriam Lewin, Understanding Psychological Research (New York: John Wiley & Sons, 1979), 75.
2 Walter R. Borg and Meredith D. Gall, Educational Research: An Introduction, 4th ed. (New York: Longman, 1983), 22.
Another Example

Let’s illustrate the operationalization process with a practical example. Read this example carefully, noting each step in the process.
John is considering several topics for his research proposal. He is drawn toward the problem of adolescents “bailing out” of church attendance when they leave home. Putting his first thoughts down on paper, he writes:
“Church attendance decreases when young people leave home.”
Writing out your thoughts is important! Almost anything can “sound logical” as you play with ideas in your mind. Putting these thoughts down on paper is a first real step toward constructing a workable topic. I’ve heard students complain, “I know what I want to study, but I just can’t put it down on paper!” Well, they “feel” like they know what they want to study, but their ideas are only wisps of fantasy. To put your idea down on paper is to grasp it, refine it, put shape to it, and bring it into the real world where the rest of us live. Do you have an idea for your study? Write it down. Then work on it, as a sculptor on granite, and bring out the essence of your creation. Nothing of value comes easy.
As John reflects on his statement, he asks as many questions about it as he can. He steps away from his idea and objectively critiques it. You must separate your ego from your statement. Otherwise you will find yourself defending your work rather than refining it. Here are some of his questions:
Whose church attendance decreases? This statement could refer to parents or friends. It is not specific on this point.
The statement seeks to measure a change in behavior. This requires before-and-after measurements. Is this possible to do?
What is church attendance? What does this term imply? Worship? Bible study? Church softball league?
What is “home”? What does it mean to “leave home”?
After writing down these questions and considering alternative ways to express what he wants to study, he rewrites his statement like this:
“Young people living away from home will have a lower rate of attendance at worship services than young people living at home.”
First, this statement is better because it clarifies “attendance” as the young people’s attendance at worship services. Second, this statement is better because it indicates measuring attendance of two groups and comparing them, rather than a before-and-after measurement of a selected group of subjects.
The term “young people” is still fuzzy, however. How young is “young”? What does “living away from home” mean? Or “living at home”? Answers to these questions would be placed in the Definitions section of the proposal. In John’s case, he defined these terms as follows:
“Young people” is defined as persons aged 18-25.
“Home” is defined as the residence of the subject’s parents and where he or she lived as a child.
“Living at home” is defined as the continued full-time residence of the subject at “home.”
“Leaving home” is defined as the subject taking up residence away from “home” for at least three months.
In order to do this study, John needs to define two populations: “young people living at home” and “young people living away from home.” He will need to sample two study groups from these populations. He will need to gather four pieces of data from each subject: (1) age, (2) residence, (3) attendance at worship services, and (4) how long away from home.
You have just walked through a process of operationalization. It is a process essential for clear problem-solving. Begin now to operationalize the variables you are considering for your study.

3 See Earl Babbie, The Practice of Social Research, 3rd ed. (Belmont, CA: Wadsworth Publishing Company, 1983), 130-131.
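John’s operational definitions are concrete enough to be expressed as explicit rules. The sketch below (Python; the subject records and the eight-Sunday attendance window are hypothetical illustrations, not part of John’s study) shows how such definitions classify subjects and yield a measurable attendance rate:

```python
# A sketch of John's operational definitions turned into concrete rules.
# The sample records and the eight-Sunday window are hypothetical.

def is_young_person(age):
    """'Young people' is operationally defined as persons aged 18-25."""
    return 18 <= age <= 25

def group(months_away_from_home):
    """'Leaving home' means residence away from 'home' for >= 3 months."""
    return "away" if months_away_from_home >= 3 else "at_home"

# Each record: (age, months living away from parents' home,
#               worship services attended out of the last 8 Sundays).
subjects = [(19, 0, 7), (22, 12, 2), (24, 6, 4), (18, 0, 8), (25, 1, 6)]

rates = {"away": [], "at_home": []}
for age, months, attended in subjects:
    if is_young_person(age):                  # population screen
        rates[group(months)].append(attended / 8)

for g, vals in rates.items():
    print(g, sum(vals) / len(vals))
# → away 0.375
#   at_home 0.875
```

Note how every fuzzy term in John’s draft statement has become a rule a computer (or a research assistant) could apply without judgment calls. That is the test of a good operational definition.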
Operationalization Questions

As you consider the measurement of variables for your study, there are two basic questions you must answer. The first is “Are my variables measurable?” If they are not, you cannot study them -- not statistically, that is. Some students have difficulty answering this question because they have too limited an understanding of what “measurement” entails. We will be looking at several approaches to measurement in the chapters ahead: direct observation, survey, testing, attitude measurement, and experimentation. Once you have settled on what kind of data you need for your study, begin looking in research texts and journal articles for ways to gather that data. Don’t overlook the guidelines in later chapters of this text!
The second question is “How will I measure these variables?” Define each of your variables in terms of how you will measure them (“operational definitions”). I suggest you work on the statement for a while and then put it aside for several hours. When you come back to it, you’ll be able to look at it more objectively. It is difficult to avoid rationalization and self-defense of your work. But you will excel in writing your proposal only if you can critique yourself clearly and objectively. It is better if you find the weaknesses before others do!
Once you have operationalized your draft statement, you will be ready to write the Statement of the Problem and the Research Hypothesis. We will get into these two sections of the proposal in the next chapter.
Summary

This chapter has introduced you to the concept of variables, four data types (nominal, ordinal, interval, and ratio), as well as the process of operationalization: defining selected variables in terms of measurement.
Vocabulary

arbitrary zero -- an arbitrary ‘zero value’ which does not mean absence of the variable (e.g. 0°F)
category -- a class or group of subjects (e.g. male/female on variable GENDER)
constant -- a numerical value which does not change (e.g. the freezing point of water: 32°F)
dependent variable -- a variable which is MEASURED by the researcher
independent variable -- a variable which is MANIPULATED by the researcher
interval -- equi-distant markings on a scale (e.g. degrees on a thermometer)
interval data -- a measurement which reflects a position on an interval scale (e.g. 54°F)
measurement type -- a specific kind of measurement (nominal, ordinal, interval, ratio)
measurement -- the process of assigning a number value to a variable
nominal data -- a measurement which reflects counts in a group (e.g. 15 males in Research class)
operational definition -- describing a variable by its measurement (e.g. ‘adult’ means 18+ years old)
operationalization -- the process of defining variables by their measurement
ordinal data -- a measurement which reflects rank order within a group
rank -- the relative position in a group (e.g. 1st, 2nd, 3rd)
ratio data -- a measurement which reflects a position on a ratio scale (e.g. 93 on Exam 1)
true zero -- the complete absence of a variable (e.g. 0 pounds = no weight)
variable -- an element that can have many values (e.g. ‘weight’ can be 120 or 210 or 5)
Study Questions

1. List and define four kinds of measurement. Give an example of each kind.
2. Define “constant” and “variable.” Give two examples of each.
3. Operationalize the “fuzzies” below.
A. “Staff members who work with autocratic pastors are less happy than those who work with democratic pastors.”
B. “Teaching Sunday School with discussion will result in better feelings than teaching with lecture.”
C. “Group counseling is better than individual counseling.”
Sample Test Questions

1. Identify the following kinds of data by writing Nominal, Ordinal, Interval, or Ratio in the blank provided.
___ Birth Year
___ Test score
___ °C or °F
___ Nationality
___ Class rank
___ Body Weight
2. Which of the following is not a characteristic of an operational definition?
A. helps researchers communicate clearly
B. uses global, abstract terminology
C. specifies activities used to measure a variable
D. addresses science’s desire for precision
3. Which of the following is the best operational definition?
A. An attitude of forgiveness
B. Aggressive facial expressions
C. Immoral behavior
D. Anxiety test score

4. Identify the type of data expressed in the statements below by writing the appropriate letter in the blank provided: N (Nominal), O (Ordinal), I (Interval), or R (Ratio).
____ Statistical Aptitude will be measured by scores obtained on the STAT2 (0-20)1
____ My current feelings toward my father could be characterized as:2
     Very Warm and Tender (1)   Good (2)   Unsure (3)   Unfavorable (4)   Very Distant and Cold (5)
____ Employment Status: Full-Time / Part-Time / Not Employed3
____ Study Habits: Sum of Delay Avoidance (DA) and Work Methods (WM) Scores on the Survey of Study Habits and Attitudes Inventory (Max: 100)4
____ Critical Thinking Ability: score on the Watson-Glaser Critical Thinking Appraisal5
____ Leadership Style: “9,9” “5,5” “9,1” “1,9” “1,1”6
____ Reasons for Dropping Out of a Christian College: Ranking of 50 Attrition Factors7
____ Child Density: Computed by dividing the number of children in a family by the number of years married8
1 "0" means no aptitude for statistics.
2 Ibid., 86
3 James Scott Floyd, “The Interaction Between Employment Status and Life Stage on Marital Adjustment of Southern Baptist Women in Tarrant County, Texas,” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1990), 45
4 Steven Keith Mullen, “A Study of the Difference in Study Habits and Study Attitudes Between College Students Participating in an Experiential Learning Program Using the Portfolio Assessment Method of Evaluation and Students Not Participating in Experiential Learning,” (Ph.D. diss., Southwestern Baptist Theological Seminary, 1995), 51
5 Bradley Dale Williamson, “An Examination of the Critical Thinking Abilities of Students Enrolled in a Masters Degree Program at Selected Theological Seminaries,” (Ph.D. diss., Southwestern Baptist Theological Seminary, 1995), 23
6 Helen C. Ang, “An Analytical Study of the Leadership Style of Selected Academic Administrators in Christian Colleges and Universities as Related to Their Educational Philosophy Profile,” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1984), 28-29
7 Judith N. Doyle, “A Critical Analysis of Factors Influencing Student Attrition at Four Selected Christian Colleges,” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1984), 98
8 Martha Sue Bessac, “The Relationship of Marital Satisfaction to Selected Individual, Relational, and Institutional Variables of Student Couples at Southwestern Baptist Theological Seminary,” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1986), 23
Problem and Hypothesis
Chapter 4
4
Getting On Target

The Problem of the Study
The Hypothesis of the Study
From Raw to Refined
I lay in the cold, damp sand of Fort Dix, New Jersey, with my M-16 pointing down range. Getting my weapon “on target” was not as easy as my instructors had made it sound in class. I felt as if I were all thumbs as I wrestled with sight alignment, breathing, placement of the front sight on the target, correction for wind, and correction for distance. I had one thing going for me, however, despite my awkward confusion. My “problem” was clear: put the 5.56mm round in the center of the target standing 100 yards away. The anticipated result was clear as well: put it all together and the round will hit the bull’s eye. Practice translated the problem into the anticipated result. I qualified for the Sharpshooter’s Badge.
Writing a proposal is more complex than target practice. The need to “get on target” with your proposal, however, is just as important. The “Problem” and “Hypothesis” statements focus every other element of the proposal. They form the proposal’s heart -- its “bull’s eye.” Confusion here will generate confusion throughout the proposal.
The Problem Statement

The problem statement defines the essence of your study and identifies the variables you will study.
Characteristics of a Problem

The following characteristics are important to keep in mind as you develop the formal statement of the problem of your study.
Limit scope of your study

Novice researchers tend to include too many variables or too much material in their studies. The problem statement helps limit your study by focusing your attention on the particular variables you want to investigate.
Current theory and/or latest research

The problem statement should reflect the most recent discoveries in your field of interest. You will refine your problem as you conduct the literature review (Chapter Six). A clear understanding of your specific problem will help you gather pertinent
data from your field and discover if you are proposing a redundant study.
Meaningfulness

Is your problem statement meaningful? Is it important to your field? The problem may focus on something you personally want to know, but this is not enough to establish the need for the study. The inexperienced tend to focus on the obvious, surface issues related to ministry. The problem statement should have a theoretical basis beyond the pragmatic concern of “what works?” Research seeks to know the “whys” as well as the “hows” of the way the world works.
Clearly written

The problem statement is usually a single sentence which isolates the variables of the study and indicates how these variables will be studied. The statement is terse, brief, concise. It is objectively written so that another can read the statement and understand the focus of the study.
Examples of Problem Statements

Let’s focus on several practical formats that Problems can take. We can study the relationship between variables or the differences between groups.
Association Between Two Variables

A study can focus on the relationship between two variables. The general format of this type of Problem Statement is this:
The problem of this study is to determine the relationship between (Variable 1) and (Variable 2) in (a specific group).
Dr. Helen Ang wrote her problem statement in this format: The problem of this study [is] to determine the relationship between the leadership style of academic administrators in selected Christian colleges and universities and their educational philosophy profile.1
This study proposes to measure the administrative leadership style and the particular philosophy of education of selected Christian college administrators and determine whether there is any relationship between these two variables. Since “style” and “philosophy” are nominal variables, this problem statement infers the use of the chi-square2 Test of Independence -- the relationship between two nominal variables. (See Chapters 5 and 23 for further information on chi-square.)
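For readers who want to see the mechanics, the chi-square statistic for a contingency table of two nominal variables can be computed as follows. This is a minimal sketch in plain Python with hypothetical counts, not Dr. Ang’s data:

```python
# Chi-square Test of Independence for two nominal variables (sketch).
# The counts are hypothetical: rows = two leadership styles,
# columns = two philosophy profiles.

def chi_square(table):
    """Return the chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

observed = [[20, 10],   # style A: 20 of profile X, 10 of profile Y
            [10, 20]]   # style B: 10 of profile X, 20 of profile Y
print(round(chi_square(observed), 3))  # → 6.667
```

With (2-1)(2-1) = 1 degree of freedom, a statistic of 6.667 exceeds the .05 critical value of 3.84, so in this hypothetical table the two variables would be judged not independent.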
Association of several variables

A study may focus on how a selected group of variables may predict another. The general format is this:
The problem of this study is to determine the relationship between (variable 1) and a specified set of predictor variables.

1 Helen C. Ang, “An Analytical Study of the Leadership Style of Selected Academic Administrators in Christian Colleges and Universities as Related to Their Educational Philosophy Profile,” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1984), 3
2 “Chi” is pronounced “ki” as in “kite.”
Dr. Bob Welch wrote his problem statement like this:
The problem of this study [is] to determine the relationship between ministerial job satisfaction and a specific set of predictor variables. These variables [are] Principle Ministry Classification, Gender, Age, Marital Status, Education, Tenure, and presence in the workplace of a Performance Evaluation.2
This statement identifies variables which the researcher believed influence the degree of job satisfaction in ministerial staff members of Southern Baptist Churches. Problem statements of this type refer to multiple regression analysis. (See Chapter 26 for further information on multiple regression.)
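The idea behind multiple regression -- predicting one variable from a set of predictors by least squares -- can be sketched briefly. The data below are hypothetical (satisfaction predicted from tenure and age), not Dr. Welch’s, and the code solves the normal equations directly:

```python
# Multiple regression sketch: least-squares prediction of one variable
# from a set of predictors. All data values below are hypothetical.

def solve(a, b):
    """Solve the linear system a·x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def regress(xs, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via X'X·b = X'y."""
    rows = [[1.0] + list(x) for x in xs]               # prepend intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical records: (tenure in years, age), generated from the model
# satisfaction = 40 + 2*tenure + 0.5*age, so the fit recovers it exactly.
tenure_age = [(2, 30), (5, 45), (8, 40), (3, 28), (10, 50)]
satisfaction = [59, 72.5, 76, 60, 85]
print([round(c, 6) for c in regress(tenure_age, satisfaction)])
# → [40.0, 2.0, 0.5]
```

A real study would report, for each predictor, whether its coefficient differs significantly from zero; the sketch shows only how the prediction equation itself is obtained.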
Difference Between Two Groups

A study may focus on how two groups differ on a variable. The general format of this type of Problem Statement is this:
The problem of this study is to determine the difference in (variable) between (group 1) and (group 2).
Dr. Mark Cook wrote his Problem statement this way: The problem of this study [is] to determine the difference in learning outcomes between classes taught with active student participation and classes taught with no active participation in adult Sunday School classes in a Southern Baptist Church.3
This study will measure the variable “learning outcomes” -- defined later as “the achievement score of the student on the multiple-choice post test measuring the lesson objectives at three cognitive levels: knowledge, comprehension, and application”4 -- in two groups of adult Sunday School members. One group experienced a Bible study which intentionally integrated active participation methods. The second group experienced the same Bible study without active participation. Would intentional active participation make a difference in their learning? The statistic inferred by this statement is the t-Test for Independent Samples. (See Chapter 20 for further information on the two sample independent t-test).
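The t-Test for Independent Samples that this statement points to compares two group means against the variability within the groups. A minimal sketch in plain Python, with hypothetical posttest scores (not Dr. Cook’s data):

```python
# Two-sample t statistic (pooled-variance form) for comparing the means
# of two independent groups. Scores below are hypothetical posttest scores.

def t_independent(a, b):
    """Return the t statistic for two independent samples."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)           # sum of squares, group 1
    ss2 = sum((x - m2) ** 2 for x in b)           # sum of squares, group 2
    pooled = (ss1 + ss2) / (n1 + n2 - 2)          # pooled variance
    se = (pooled * (1 / n1 + 1 / n2)) ** 0.5      # std. error of difference
    return (m1 - m2) / se

active = [88, 92, 85, 90, 95]     # taught with active participation
passive = [78, 83, 80, 75, 84]    # taught without active participation
print(round(t_independent(active, passive), 3))   # → 4.226
```

The resulting t is then compared with a critical value for n1 + n2 - 2 degrees of freedom to decide whether the two teaching conditions differ significantly.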
Differences Between More Than Two Groups

A study may focus on how more than two groups differ on a variable. The general format of this type of Problem Statement is this:
The problem of this study is to determine the difference in (variable) across (more than two groups).
Dr. Scott Floyd wrote his second problem statement this way: It [is] also the problem of this study to determine the difference in marital adjustment of Southern Baptist women. . . who were not employed outside the home, employed part-time, and employed on a fulltime basis.5
This study will measure “marital adjustment,” a ratio score, in Southern Baptist 2 Robert Horton Welch, “A Study of Selected Factors Related to Job Satisfaction in the Staff Organizations of Large Southern Baptist Churches,” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1990), 4 3 Marcus Weldon Cook, “A Study of the Relationship Between Activie Participation as a Teaching Strategy and Student Learning in a Southern Baptist Church,” (Ph.D. diss., Southwestern Baptist Theological Seminary, 1994), 3 4 5 Ibid., 24 Floyd, 5 © 4th ed. 2006 Dr. Rick Yount 4-3
women divided into three employment groups. Do the mean scores of these three groups differ significantly? The Problem Statement infers the use of one-way Analysis of Variance (ANOVA). (See Chapter 21 for further information on ANOVA.)
Dr. Floyd tested one independent variable above. His primary problem, however, involved two. In addition to “employment status” he also divided women into three levels of “life cycle” -- ages 18-31, 32-46, and 47-65. The Problem statement for this design read this way:
The problem of this study [is] to determine the interaction between life cycle stage and employment status of Southern Baptist women in Tarrant County, Texas, on a measure of marital adjustment.
This problem statement infers the use of two-way ANOVA, because it identifies two independent variables, employment and life cycle, and one dependent variable, marital adjustment. (See Chapter 25 for information on Factorial ANOVA.)
The Problem statement delineates the question of the study. It is the climax of the Introductory Statement and opens the door to the Synthesis of Related Literature. In doing your literature search, you will learn a great deal from others who have studied the variables you are interested in studying. At the end of the Related Literature section (see Chapter 6) you will be ready to write a confident statement of your expected findings. This statement of expectation is called a hypothesis.
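The one-way ANOVA that the first of these problem statements points to compares the variance between group means to the variance within groups. A minimal sketch in plain Python, with hypothetical marital adjustment scores for the three employment groups (not Dr. Floyd’s data):

```python
# One-way ANOVA sketch: does a score differ across three or more groups?
# F = (between-group variance) / (within-group variance).
# All scores below are hypothetical.

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of score groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

not_employed = [110, 115, 108, 112]
part_time = [105, 102, 109, 104]
full_time = [98, 95, 101, 97]
print(round(one_way_f([not_employed, part_time, full_time]), 2))  # → 22.98
```

A large F means the group means differ more than within-group variability would lead us to expect; the statistic is compared with a critical F value for the two degrees-of-freedom figures. The two-way design adds a second grouping factor and an interaction term, but the same between-versus-within logic applies.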
The Hypothesis Statement
As explained in Chapter 2, an hypothesis states the anticipated answer to the problem you’ve stated. The two major types of hypotheses are the research, or alternative, hypothesis, and the null, or statistical, hypothesis. The research hypothesis can either be directional or non-directional.
The Research Hypothesis

The research hypothesis flows directly out of the problem statement and declares in clear, objective, measurable terms what you expect the result of your study to be. The research hypothesis is located in the proposal under the section title “The Statement of the Hypothesis.” We’ll consider examples of hypotheses, with their corresponding problem statements, under the same four divisions as before.
Association Between Two Variables

Dr. Helen Ang wrote her problem statement in this format:
The problem of this study [is] to determine the relationship between the leadership style of academic administrators in selected Christian colleges and universities and their educational philosophy profile.6
Her corresponding hypothesis was:
[It is the hypothesis of this study that there will be] a significant relationship between the leadership style of the academic administrator and his/her educational philosophy profile.7
Another way this “relationship between nominal variables” could be stated is this:
It is the hypothesis of this study that the leadership style of the academic administrator and his/her educational philosophy profile are not independent.
The phrase “not independent” indicates more clearly that the study will use the chi-square statistic. Categories of leadership style and educational philosophy are the nominal measurements.

6 Ang, 3
7 Ibid., 19
Association of several variables

Dr. Dean Paret wrote his problem statement like this:
The problem of this study [is] to determine the relationship between perceived current nuclear family health and a set of predictor variables: perceived autonomy and perceived intimacy in the family of origin of randomly selected married graduate students...8
His corresponding hypothesis was:
It [is] the hypothesis of this study that autonomy and intimacy as perceived in the couple's family of origin are significant positive predictors of current nuclear family health.9
The above is a multiple regression example where one variable is being predicted by two others. Association among several variables can also involve several pairings of variables. Dr. Maria Bernadete Da Silva wrote her problem statement to analyze the relationships among several pairs of variables. The problem of this study [is] to determine the relationship between leadership style and the levels of agreement on selected social work values of social work administrators in social service agencies in Texas.10
Her corresponding hypothesis was: The hypothesis of this study [is] that leadership styles of social work administrators and the levels of agreement on four selected social work values [will not be] independent.11
The four social work values were respect for basic rights, social responsibility, individual freedom, and self-determination. “Level of agreement” of these values consisted of the number of social workers selecting one of four options: strongly agree, agree, disagree, or strongly disagree. This design required four chi-square tests of independence, matching leadership style and each of the four values.
Difference Between Two Groups

Dr. Joan Havens wrote her problem statement like this:
The problem of this study [is] to determine the difference in level of academic achievement across four populations of Christian home schooled children in Texas: those whose parents possessed (1) teaching certification, (2) a college degree but no certification, (3) two or more years of college, or (4) a high school diploma or less.12
One of her four hypotheses was stated this way:
[The third hypothesis of this study is that] there would be no significant difference in levels of academic achievement in home schooled children whose parents possessed a teaching certificate and those whose parents did not.13

8 Paret, 5
9 Ibid., 37
10 Da Silva, 4
11 Ibid., 7
12 Havens, 7
13 Ibid., 10
Scores were divided into two groups for purposes of testing this hypothesis: one group of children had parent-teachers with teacher certification and the second group did not. Did academic achievement -- defined as “improved grade level scores in vocabulary, reading, writing, spelling, mathematics, science and social studies skills, as measured by the subtests of the Stanford Achievement Test”14 -- significantly differ between these two groups? This hypothesis suggests the use of the t-Test for Independent Samples (Chapter 20).

Dr. Daryl Eldridge, conducting an experimental study, wrote his problem statement this way: The problem of this study will be to investigate the effect of student knowledge of behaviorally stated course objectives upon the performance and attitudes of seminary students in a church curriculum planning course.15
Dr. Eldridge wrote two hypotheses out of this problem: To carry out the purposes of this study, the following hypotheses will be tested:
1. It is the hypothesis of this study that the test scores of students who have knowledge of course objectives will be significantly greater than the test scores of students who have no knowledge of objectives.
2. It is the hypothesis of this study that students with knowledge of course objectives will score significantly higher on an inventory of Student Appraisal of Teaching and Course than those who have no knowledge of objectives.16
Both of these hypotheses imply the t-Test for Independent Samples.
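As a sketch of what such a t-Test computes, here is the pooled-variance formula in Python. The scores below are invented for illustration; they are not Dr. Eldridge's data:

```python
import math

def t_independent(a, b):
    """t-Test for Independent Samples using pooled variance."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)          # sum of squared deviations, group 1
    ss2 = sum((x - m2) ** 2 for x in b)          # sum of squared deviations, group 2
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)            # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))      # standard error of the difference
    return (m1 - m2) / se, n1 + n2 - 2           # t statistic, degrees of freedom

with_objectives = [85, 90, 88, 75, 92]           # hypothetical test scores
without_objectives = [78, 80, 74, 82, 76]
t, df = t_independent(with_objectives, without_objectives)
print(round(t, 2), df)   # 2.42 8
```

The sign of t is positive here because the first group's mean is higher, which is the direction Dr. Eldridge's directional hypothesis anticipated.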
Differences Between More Than Two Groups
Dr. John Babler wrote his problem statement this way: The problem of this study [is] to determine the differences between hospice social workers, nurses, and spiritual care professionals in their provision of spiritual care to hospice patients and families.17
His corresponding hypothesis was: The hypothesis of this study [is] that there [will] be significant differences in scores on the instrument adapted for this study to assess the provision of spiritual care to hospice patients and families between social workers, nurses, and spiritual care professionals.18
The instrument adapted for his study produced interval data. The hypothesis implies the use of the one-way Analysis of Variance statistic (Chapter 21).

Research hypotheses can be directional or non-directional. The distinction between these two types of research hypotheses lies in whether the hypothesis simply states a difference or states a difference in a specific direction.
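The one-way ANOVA mentioned just above partitions variability into between-group and within-group parts and reports their ratio, F. A minimal Python sketch with invented scores for three hypothetical groups (not Dr. Babler's data):

```python
def one_way_anova(groups):
    """One-way ANOVA: F = MS-between / MS-within."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical spiritual-care scores for three professional groups
social_workers = [4, 5, 6]
nurses = [6, 7, 8]
chaplains = [8, 9, 10]
f, df1, df2 = one_way_anova([social_workers, nurses, chaplains])
print(round(f, 1), df1, df2)   # 12.0 2 6
```

A large F says the group means differ more than within-group variability alone would explain.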
14 Ibid., 21
15 Eldridge, 3
16 Ibid., 29
17 Babler, 7
18 Ibid., 32

The Directional Hypothesis
Several of the previous examples of research hypotheses are directional. That is, they include a specific “direction” of result. For example, look at the following again:
Problem and Hypothesis
Chapter 4
It [is] the hypothesis of this study that autonomy and intimacy as perceived in the couple's family of origin are significant positive predictors of current nuclear family health. (Paret)

1. It is the hypothesis of this study that the test scores of students who have knowledge of course objectives will be significantly greater than the test scores of students who have no knowledge of objectives. (Eldridge)
When you state your research hypothesis in a directional form, you show more confidence in the anticipated result of your study. This confidence grows out of your literature review and expertise in the field. You should state your research hypotheses in a directional format if possible.
The Non-directional Hypothesis
A non-directional hypothesis states that a “difference” or “relationship” exists between variables, but does not specify what kind of difference or relationship it is. For example, the hypotheses above can be re-written as non-directional hypotheses as follows:

It [is] the hypothesis of this study that autonomy and intimacy as perceived in the couple's family of origin are significant predictors of current nuclear family health.

1. It is the hypothesis of this study that the test scores of students who have knowledge of course objectives will be significantly different from the test scores of students who have no knowledge of objectives.
The first hypothesizes prediction, but does not specify the direction, positive or negative. The second hypothesizes difference, but does not specify greater than or smaller than. These non-directional statements are weaker than the directional statements actually written by the researchers. Use a non-directional research hypothesis in your proposal only if you cannot develop a reasonable basis for stating a direction for your anticipated results.
The Null Hypothesis
Research design emphasizes the research hypothesis. Statistical analysis, on the other hand, emphasizes the null hypothesis, since statistical procedures can only test null hypotheses. The null hypothesis is stated to reflect “no difference” between groups or “no relationship” between variables. If the null hypothesis of “no difference” is shown statistically to be unlikely, we can “reject the null hypothesis” and “accept the alternative (research) hypothesis.” The null hypothesis is located in the proposal section entitled “Testing the Hypothesis,” which is found in the ANALYSIS section of your proposal. (Review this section in chapter 2.) Let’s restate the hypotheses listed above in their null form.

It [is] the hypothesis of this study that autonomy and intimacy as perceived in the couple's family of origin are not significant predictors of current nuclear family health.

1. It is the hypothesis of this study that the test scores of students who have knowledge of course objectives will not be significantly different from the test scores of students who have no knowledge of objectives.
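The reject-or-retain logic described here can be sketched with a one-sample z-test, a procedure introduced later in the text. All numbers below are invented for illustration:

```python
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """One-sample z-test: decide whether to reject the null hypothesis."""
    z = (sample_mean - pop_mean) / (pop_sd / n ** 0.5)
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value
    decision = "reject null" if p < alpha else "retain null"
    return z, p, decision

# Hypothetical data: does a sample of 36 scores differ from a known population?
z, p, decision = z_test(sample_mean=106, pop_mean=100, pop_sd=15, n=36)
print(round(z, 1), round(p, 3), decision)   # 2.4 0.016 reject null
```

Because the p-value falls below the .05 level, the “no difference” null hypothesis is rejected in favor of the alternative (research) hypothesis.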
Notice that the “null” form of the hypothesis declares no relationship among variables, and no difference between groups.

NOTE: There are times, though rare, when the "null hypothesis" is the "research hypothesis" of the study. For example, suppose you are creating a new treatment that you believe will require half the time, but will produce the same results, as a more costly, time-intensive procedure. Your intent is to show "no difference" between the approaches. On these rare occasions, the null is the research hypothesis as well as the statistical hypothesis. The point: the null is not always the opposite of the research hypothesis.
Revision Examples
It is relatively easy to read a statement of problem or hypothesis and agree that it is focused and meaningful. It is quite another to write such statements. The following examples are problem and hypothesis statements written by students in class. I will comment on each statement as written, and then suggest a revised version.

Example 1
“The problem of this study is to determine the effect of adequate premarital counseling on the success rate of teenage marriages.”

Comments
The term “effect” calls for an experimental or ex post facto approach to the study. If you are thinking in this direction, move to Chapter 13 soon. I encourage you to pursue an experimental design, but students sometimes use the term “effect” when they are actually thinking of correlation. You cannot infer a cause-and-effect relationship from a correlation. There are other questions raised by this problem. What is “adequate” counseling? What kind of “premarital counseling”? How will you measure “success rate”? Success over what period of time? How do you define “teenage marriage”? Is this study focusing only on teenagers who are married, or on all marriages which began in the teenage years?

Suggested revision
“The problem of this study is to determine the difference in attitude toward married life between married teenagers who undergo a specified course of premarital counseling and those who do not.”
Here you are studying teenagers who are married. You will have two groups: one group undergoes a specified counseling treatment (which you will define under Procedure for Collecting Data) and the other doesn’t. You measure differences in attitude toward married life between the two groups.

Example 2
“The problem of this study is to determine whether those who complete MasterLife Discipleship Training will have a more positive attitude toward discipleship and will become actively involved in discipleship.”
Comments
“More positive” than what? There is nothing to compare MasterLife against. What is meant by “actively involved”? “Discipleship” is a global term. What does it mean in the framework of this study? What is the theoretical basis for this study? How will it contribute to the field of Christian education? Is this really an evaluation of the MasterLife program?

Suggested revision
“The problem of this study is to determine the difference in discipleship skills and attitudes developed in median adults between the MasterLife Discipleship Training program and the (Alternative) Discipleship Training program.”
This study will evaluate MasterLife against another discipleship training program. The basis for comparison will be measured skills and attitudes in the area of discipleship.

Example 3
“It is the hypothesis of this study that the level of social extroversion expressed by a child will differ significantly in relationship to the type of before and after school care environment he or she receives.”

Comments
This statement targets the variables rather well. Level of social extroversion and type of “care” environment are clearly stated. But the wording is awkward. How many types of “before and after school care” will be studied? Two? Three? What does “type of care” mean? How will it be measured?

Suggested revision
“It is the hypothesis of this study that children receiving Type I care will score significantly higher on the social extroversion scale than children receiving Type II care.”
Two types of child care are specified. These two types are directly compared on the basis of a social extroversion measurement of the children. If one were interested in comparing several types of child care, the hypothesis could read: “It is the hypothesis of this study that children’s scores on the social extroversion scale will significantly differ across (number) specified types of before and after school care.”

Example 4
“It is the hypothesis of this study that staff longevity of ministers is significantly increased in churches using a salary administration plan than churches who do not use such a plan.”
Comments
The term “increased” indicates a “before and after” study. This may be difficult to do in churches. How do you get churches to agree to install a different plan for purposes of a research study? It is easier to focus on “difference.” What is “staff longevity”? How long a staff member stays in a position? How is it measured? Months? Years? What is a “salary plan”? This is a fuzzy concept. How will you determine whether a church qualifies as “having a plan” or “not having a plan”? Is a bad plan better than no plan? Is salary the major factor in staff longevity? Are there other variables that need to be considered in studying why staff members remain in a given church? How will the researcher deal with ineffective staff members who are not invited to consider other churches -- those who remain because they have nowhere else to go?
Suggested revision
“It is the hypothesis of this study that the length of service of ministers is significantly higher in churches that qualify as having a specified salary administration plan than in churches that do not.”
The researcher maintains his focus on salary. However, there is a procedure which will be used to categorize churches on the basis of their salary plans. Rather than measure “increase,” the researcher will look at the difference between length of service of ministers in two categories of churches.

Example 5
“The hypothesis of this study is that men who remain in the pastorate are significantly different than those who leave the pastorate to enter denominational work.”

Comments
This statement uses some of the words we’ve discussed, but misses the mark as a hypothesis statement. It is an excellent example of a hypothesis written by someone who “knows the words” but does not understand their meaning (“But I used the words ‘significantly different’!”). What is the variable being studied? These two groups of men will be “different” on what variable(s)? What is the theoretical foundation of this study? Is there justification for considering “pastoral ministry” or “denominational ministry” better than the other? Besides, what is being measured? How will the researcher obtain his data? There is really no study here. We need to head back to the drawing board on this one.
Dissertation Examples
The Problem-Hypothesis-Statistic set forms the backbone, the framework, for both the proposal and the dissertation itself. While you are certainly not expected to understand the statistical procedures referenced here, I include them for future reference and for a sense of completeness. We will introduce you to these and other statistical procedures in Chapter 5, and focus on them in chapters 16 to 26. The following statement-sets are drawn from dissertations of our graduates. They are written in the past tense since they are taken from the dissertations.

Regression Analysis
The problem of this study was to determine the relationship between attitudes concerning computer-enhanced learning and selected individual and institutional variables of full-time
faculty members at Southwestern Baptist Theological Seminary. [The hypothesis] of this study was that the following variables would prove to be significant predictors of attitudes toward computer-enhanced learning for theological education among the full-time faculty of Southwestern Baptist Theological Seminary: age, gender, school division where teaching, discipline teaching, degree(s) held, number of years teaching at Southwestern, last enrolled in a course, whether or not own a computer, believe students should own a computer, and taken any computer courses/instruction.19
The statistic for this study was Multiple Regression (see Chapter 26). There were two significant predictors found in this study: whether the professor owned a computer or not, and whether they believed students should own a computer. A positive attitude toward computer-enhanced learning in theological education was predicted by "yes" answers to these two questions.
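To illustrate what Multiple Regression produces (a coefficient for each predictor), here is a sketch using NumPy's least-squares routine rather than the SYSTAT program this text uses. The yes/no (1/0) data are invented, and the variable names are hypothetical stand-ins for two of the predictors above:

```python
import numpy as np

# Hypothetical data: attitude score predicted from two yes/no (1/0) variables.
# The attitude values are constructed as exactly 1 + 2*x1 + 3*x2.
owns_computer   = np.array([1, 1, 0, 0, 1, 0])
students_should = np.array([1, 0, 1, 0, 1, 0])
attitude        = np.array([6, 3, 4, 1, 6, 1])

# Design matrix: a column of ones (intercept) plus the two predictors
X = np.column_stack([np.ones(6), owns_computer, students_should])
coef, *_ = np.linalg.lstsq(X, attitude, rcond=None)
b0, b1, b2 = coef.round(2)
print(b0, b1, b2)   # 1.0 2.0 3.0
```

Because the invented data fit the model perfectly, the regression recovers the constructed intercept and slopes exactly; real data would yield coefficients with sampling error attached.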
Correlation of Competency Rankings
The problem of this study was to determine the relationship between rankings of competencies for effective ministers of education. These rankings were produced by two groups of Southern Baptist ministers. Group one consisted of Southern Baptist pastors currently serving with ministers of education. Group two consisted of ministers of education currently serving in Southern Baptist churches. The hypothesis for this study was that there is a significant positive relationship between the two rankings of competencies for an effective minister of education as identified by Southern Baptist pastors and ministers of education.20

The statistic for this study was the Spearman rho correlation coefficient (see Chapter 22). Competencies for ministers of education were divided into five areas: minister, administrator, educator, growth agent, and personal [relational skills]. Higher coefficients reflect higher agreement between pastors and educators on ranked competencies; lower coefficients reflect lesser agreement. The coefficients were minister (.94), administrator (.64), educator (.83), growth agent (.54), and personal (.70).
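The Spearman rho used here can be computed from the rank-difference formula rho = 1 - 6*Σd²/(n(n² - 1)). A sketch with invented rankings, not the study's data:

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank-order correlation: rho = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical rankings of five competency areas by two groups
pastors   = [1, 2, 3, 4, 5]
educators = [2, 1, 4, 3, 5]
rho = spearman_rho(pastors, educators)
print(rho)   # 0.8
```

A rho of .8 would indicate strong (but not perfect) agreement between the two sets of rankings, in the range of the coefficients reported above.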
Factorial Analysis of Variance
The problem of this study was to determine the difference in the spiritual maturity levels of the Christian school senior and the public school senior in the Texas Southern Baptist churches sponsoring a Christian school with twelve grades. The hypotheses of this study are (1) there will be insignificant interaction between the variables "school" [public, Christian] and "activity" ["active"/"inactive" in Sunday School], (2) there will be significant . . . difference in spiritual maturity across the variable "school," and (3) there will be a significant . . . difference in spiritual maturity across the variable "activity."

The statistic for this study was Factorial ANOVA (see Chapter 25). There was no interaction between the two variables, so the two "main effects" (school, activity) could be interpreted directly. There was no significant difference in spiritual maturity between seniors in Christian vs. public schools, but spiritual maturity in active Sunday School attenders was significantly higher than in inactive attenders.
19 Bergen, 7, 46
20 Bass, 3, 37
21 LaNoue, 2, 22
22 Marcia McQuitty, 5, 27

Chi-Square Analysis of Independence
The problem of this study was to determine the relationship between the dominant management style and selected variables of full-time ministers of preschool and childhood education in Southern Baptist churches in Texas. The selected variables were level of education, years
of service on church staffs, task preference, gender, and age. The hypothesis of this study was that dominant management style and selected variables were not independent.22
The statistic for this study was the chi-square test of independence (see Chapter 23). Dr. McQuitty queried all full-time preschool and children's ministers serving in Texas Baptist churches (N=132), and actually gathered data from eighty-one (81). Only nineteen (19) ministers produced a “dominant” management style, and thirteen (13) of these were categorized as “comforter.” This discovery required a change in the hypothesis: rather than one of five management styles, Dr. McQuitty tested her specified variables against “dominant” vs. “multiple” management styles. None of the specified variables produced a significant chi-square value.23 Still, the data collection provided important insights into the strengths and needs of preschool and childhood education ministers -- insights which Dr. McQuitty uses in her seminary classes.
Analysis of Variance
The problem of this study was to determine the difference in achievement, both cognitive and affective, among students who learned through interactive instruction, simulation games, and presentational instruction in the Hong Kong Baptist Theological Seminary, Hong Kong.24 The following were the hypotheses of the study:
1. H1: was the hypothesis that there was significant difference among the means across [testing] occasions. . .
2. H2: was the hypothesis that there was significant difference among the means across all groups. . .
3. [interaction]
4. [post-test 1: cognitive]
5. [post-test 1: affective]
6. [post-test 2: cognitive]
7. [post-test 2: affective]25
The statistic for this study was one-way analysis of variance (see Chapter 21). The analysis revealed no significant differences in cognitive learning across teaching methods used in the three groups. All three groups learned. The greatest change in attitude toward learning and interpersonal relationships occurred in the "Simulation Games" group.26
Summary
The material of this chapter is crucial to your research proposal. It is important that you understand the concepts discussed here and be able to use them with your own topic. Read the examples of good statements several times until the pattern of each kind of study begins to become clear. Work step-by-step through the evaluations of the “real-life” examples.
23 Ibid., 43
24 Stephen Tam, “A Comparative Study of Three Teaching Methods in the Hong Kong Baptist Theological Seminary” (Ed.D. diss., Southwestern Baptist Theological Seminary, 1989), 2
25 Ibid., 14-17
26 Ibid., 76-77
Vocabulary
research hypothesis: anticipated outcome of study, stated in terms of difference (groups) or relationship (variables)
null hypothesis: anticipated outcome of study, stated in terms of NO difference or NO relationship
statistical hypothesis: same as null hypothesis
directional hypothesis: states a direction of difference (larger, smaller) or relationship (positive, negative)
non-directional hypothesis: states no direction -- simply states 'difference' or 'relationship'
Study Questions
1. Explain the purpose of the problem and hypothesis statements.
2. Describe the four characteristics of a good problem statement.
3. Describe four types of hypothesis statements.
Sample Test Questions
1. A good problem statement should
A. broaden the focus of the proposed study
B. give primary attention to practical “how-to” matters
C. include necessary definitions and procedures for clarity
D. focus on the theoretical foundation of your field
2. Choose the best problem statement below: “It is the problem of this study...
A. to determine the relationship between pastors and youth ministers on their attitude toward the Bible.”
B. to see how well churches treat their staff members.”
C. to answer the question, “Why do so many staff members leave the ministry?”
D. to determine the difference between SBC pastors and denominational employees’ attitudes concerning Cooperative Program giving.”
3. Identify the following hypothesis statements as directional research (D), non-directional research (N), or statistical/null (S) hypotheses by writing the appropriate letter in the blank provided.
____ Therapy A will result in significantly less marital anxiety than therapy B
____ There will be no significant difference between Teaching Approaches 1 and 2.
____ There will be a relationship between Number of Hours Studied and GPA
____ Number of Hours Worked Outside the Home and Marital Satisfaction are independent
____ Bible Knowledge Score will be significantly different across the three groups
____ Senior Adults' Preference Score toward the King James Version will be significantly higher than for Young Adults
____ There will be no difference in ministerial commitment scores across three staff categories
____ Men and women will score differently on the “nurturing scale” of the BA12 Test
Introduction to Statistics
Chapter 5
5 Statistical Analysis

Chapter outline: Statistics, Mathematics, and Measurement; A Statistical Flow Chart
In the first four chapters of the text, we have focused on concerns of research design: the scientific method, types of research, proposal elements, measurement types, defining variables, and problem and hypothesis statements. But designing a plan to gather research data is only half the picture. When we complete the gathering portion of a study, we have nothing more than a group of numbers. The information is meaningless until the numbers are reduced, condensed, summarized, analyzed, and interpreted. Statistical analysis converts numbers into meaningful conclusions in accordance with the purposes of a study.

We will spend chapters 15-26 mastering the most popular statistical tools. But you must understand something of statistics now in order to properly plan how you should collect your data. That is, the proper development of a research proposal is dependent on what kind of data you will collect and what statistical procedures exist to analyze that data.

The fields of research design and statistical analysis are distinct and separate disciplines. In fact, in most graduate schools, you would take one or more courses in research design and other courses in statistics. My experience with four different graduate programs has been that little effort is made to bridge the two disciplines. Yet the fields of research and statistics have a symbiotic relationship. They depend on each other. One cannot have a good research design with a bad plan for analysis. And the best statistical computer program is powerless to derive real meaning from badly collected data. So before we get too far into the proposal writing process, some time must be given to establishing a sense of direction in the far-ranging field of statistics.
Statistics, Mathematics, and Measurement
There are two major divisions of statistical analysis. The first emphasizes reducing the data that has been collected in a research study. The second emphasizes making inferences from the collected data.
Descriptive Statistics
Descriptive statistical procedures are used to describe a group of numbers. These tools reduce raw data to a more meaningful form. You’ve used descriptive statistics when averaging test grades during the semester to determine what grade you’ll get. The single average, say, a 94, represents all the grades you’ve earned in the course throughout the entire semester. (Whether this 94 translates to an “A” or a “C” depends on factors outside of statistics!) Descriptive statistics are covered in chapters 15 (mean and standard deviation) and 22 (correlation).
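The grade-averaging example can be reproduced with Python's standard statistics module; the grades below are invented:

```python
import statistics

grades = [92, 88, 96, 94, 100]   # hypothetical semester test grades
mean = statistics.mean(grades)           # the single number that represents them all
sd = round(statistics.stdev(grades), 2)  # how spread out the grades are
print(mean, sd)   # 94 4.47
```

The mean reduces five numbers to one; the standard deviation adds a second summary number describing spread, the pairing covered in chapter 15.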
Inferential Statistics
Inferential statistics are used to infer findings from a smaller group to a larger one. You will recall the brief discussion of “population” and “sample” in chapter 2. When the group we want to study is too large to study as a whole, we can draw a sample of subjects from the group and study them. But descriptive statistics about the sample are not our real interest: we want to develop conclusions about the large group as a whole. Procedures that allow us to make inferences from samples to populations are called inferential statistics. For example, there are over 36,000 pastors in the Southern Baptist Convention. It is impossible to interview or survey or test all 36,000 subjects. Round-trip postage alone would cost over $21,000. But we could randomly select, say, one percent (1%), or 360 pastors, for the study, analyze the data of the 360, and infer the characteristics of the 36,000. Inferential procedures are covered in chapters 16, 17, 18, 19, 20, and 21.
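The 1% sampling idea can be simulated: draw a random sample from a large synthetic “population” of scores and compare the two means. Everything below is invented for illustration:

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# A synthetic "population" of 36,000 scores (mean 50, sd 10)
population = [random.gauss(50, 10) for _ in range(36000)]

# A 1% simple random sample, as in the pastors example
sample = random.sample(population, 360)

pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(round(pop_mean, 1), round(sample_mean, 1))  # the two means are very close
```

The sample mean lands close to the population mean, which is exactly why a well-drawn 360-subject sample can stand in for 36,000 pastors.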
Statistics and Mathematics
Statistics is a branch of applied mathematics. Depending on how much emphasis a teacher or text places on the term applied (“practical”), the study of statistics can range from helpful to hurtful for the mathematical novice. The more mathematical the emphasis, the more one finds derivations of formulas and higher-order math symbolism. I’ve seen some statistics texts that had more Greek than English in them! The emphasis of this textbook is practical use of statistical procedures and their interpretations. You will be doing some math, but mathematical skill is not required to do well in the course. Most of the procedures we will study require simple arithmetic computations (+, -, x, ÷). We will also make use of the square (²) and square root (√) keys on your calculator. If you don’t already own a calculator, buy a statistical calculator. You can recognize one by keys such as Σ+, Σ-, and σ. You can buy a calculator like this for less than $25.00, and it will be money well spent!
Statistics and Measurement
In chapter 3 we introduced you to four kinds of data: nominal, ordinal, interval, and ratio. Because interval and ratio data use the same statistical procedures, we need learn only three different sets of tests. Nominal data requires one kind of statistic (we’ll focus on chi-square), ordinal data another (we’ll focus on the Spearman rho and the Mann-Whitney U), and interval/ratio data a third. The interval/ratio procedures -- z-test, t-test, ANOVA, Pearson’s r correlation coefficient, and multiple regression -- make up the bulk of our study in statistics.
Statistical Flowchart
Accompanies the explanation in the following section.

First question: Are you studying ASSOCIATION (similarities among variables) or DIFFERENCE (between groups)?

-1- Association: Relationships Between Variables. Interval/Ratio data: go to -3-. Ordinal: go to -4-. Nominal: go to -5-.

-3- Interval/Ratio
-3a- 2 variables: Pearson’s r; Linear Regression
-3b- 3+ variables: Multiple Regression

-4- Ordinal
-4a- 2 sets of ranks: Spearman rho (ρ); Kendall’s tau (τ)
-4b- 3+ sets of ranks: Kendall’s W

-5- Nominal
-5a- 2 dichotomous* variables: Phi Coefficient (rφ); Rank Biserial (1 dichotomous and 1 ordinal); Point Biserial (1 dichotomous and 1 interval/ratio)
-5b- 1 variable: Chi-Square (χ²) Goodness of Fit (Equal E; Proportional E)
-5c- 2 variables: Chi-Square (χ²) Test of Independence (Contingency Coefficient; Cramer’s Phi)

-2- Difference: Differences Between Groups. Interval/Ratio data: go to -6-. Ordinal: go to -7-.

-6- Interval/Ratio
-6a- 1 group: One-sample z-test (sample mean vs. population mean; σ known, or n>30); One-sample t-test (σ unknown)
-6b- 2 groups: t-test for Independent Samples (2 independent groups); t-test for Matched Samples (2 matched groups)
-6c- 3+ groups: One-Way ANOVA (1 independent variable, 1 dependent variable, independent groups); Repeated Measures ANOVA (1 independent variable, 1 dependent variable, matched groups); Factorial ANOVA (2+ independent variables, 1 dependent variable, independent groups); MANOVA (1 independent variable, 2+ dependent variables, independent groups)

-7- Ordinal
-7a- 2 matched groups: Wilcoxon Matched Pairs T test
-7b- 2 independent groups: Mann-Whitney U test; Wilcoxon Rank-Sum test
-7c- 3+ independent groups: Kruskal-Wallis H test

*Dichotomous: two and only two categories
A Statistical Flow Chart
The flow chart above is offered as a visual “road map” into the world of statistics. It is designed to lead you, step by step, through a series of questions to the specific procedure you should use for a particular type of study and data. The following is a verbal roadmap, given to clarify the diagram. An additional purpose of this section is to introduce you to the names of the major procedures we’ll study later in the semester. Boldfaced names indicate procedures we’ll discuss extensively. Follow the directions to the proper numbered section below.
Question One: Similarity or Difference?
In your study, are you looking for similarities between variables, or differences between groups? A “similarity” study would explore, for example, the “relationship between self-esteem and marital harmony” (two variables) in selected couples (one group). A “difference” study might examine the “difference in social skills” (one variable) between autocratic and democratic ministers (two groups). If you are contemplating a “similarity” study, go to -1- below. If you are contemplating a “difference” study, go to -2-.
-1- Question Two: Data Types in Similarity Studies
You have chosen a “similarity study.” Statistical procedures that compute coefficients of similarity or association or correlation (synonymous terms) come in four basic types. The first type computes correlation coefficients between interval or ratio variables. The second type computes correlation coefficients between ordinal variables. The third type computes correlation coefficients between nominal variables (or when at least one of the two variables is nominal). The fourth type is a special category which computes a coefficient of independence between nominal variables. If your data is interval or ratio, go to -3- below. If your data is ordinal, go to -4- below. If your data is nominal, go to -5- below.
-2- Question Two: Data Types in Difference Studies
You have chosen a “difference study.” Statistical procedures that compute measures of significant difference come in two major types. The first computes measures of difference for interval or ratio variables. The second computes differences between ordinal variables. If your data is interval or ratio, go to -6- below. If ordinal, go to -7-.
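The two questions above amount to a small lookup table. As a toy encoding (the branch labels are my own shorthand for the flowchart sections, not the author's wording):

```python
def choose_branch(study, data):
    """Map the two flowchart questions to a numbered section."""
    branches = {
        ("similarity", "interval/ratio"): "-3- interval/ratio correlation",
        ("similarity", "ordinal"):        "-4- ordinal correlation",
        ("similarity", "nominal"):        "-5- nominal association / chi-square",
        ("difference", "interval/ratio"): "-6- z-test, t-test, ANOVA",
        ("difference", "ordinal"):        "-7- rank-based difference tests",
    }
    return branches[(study, data)]

print(choose_branch("difference", "interval/ratio"))   # -6- z-test, t-test, ANOVA
```

Answering the two questions fully determines the family of procedures; the remaining choices within each section depend only on how many variables or groups you have.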
-3- Interval or Ratio Correlation
Interval/ratio correlation procedures come in two types. The first type examines two and only two variables at a time (go to -3a-). The second type examines three or more variables simultaneously (go to -3b-).
-3a- Interval/Ratio Correlation with 2 Variables
The two procedures we will study are Pearson’s Product Moment Correlation Coefficient (rxy or simply r) and simple linear regression. Pearson’s r directly measures the degree of association between two interval/ratio variables. See Chapter 22. Simple linear regression computes the equation of a line which allows researchers to predict one interval/ratio variable from another. See Chapter 26.
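Both procedures can be computed directly from their definitions. The following Python sketch is illustrative only (the scores are invented, not from the text):

```python
import math

def pearson_r(x, y):
    # r = sum of cross-products / sqrt(product of the sums of squares)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_regression(x, y):
    # least-squares prediction line: y' = intercept + slope * x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

# invented scores: self-esteem (x) and marital harmony (y) for four couples
x, y = [1, 2, 3, 4], [2, 4, 6, 8]
print(round(pearson_r(x, y), 2))   # 1.0
print(simple_regression(x, y))     # (0.0, 2.0)
```

A perfect linear pattern yields r = 1.0, and the regression line (intercept 0, slope 2) predicts y exactly from x.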
Chapter 5
Introduction to Statistics
-3b- Interval/Ratio Correlation with 3+ Variables
The procedure we will study which analyzes three or more interval/ratio variables simultaneously is multiple linear regression. This procedure is quickly becoming the dominant statistical procedure in the social sciences. With this procedure, you develop “models” which relate two or more “predictor variables” to a single predicted variable. We will confine our study to understanding the printouts of a statistical computer program called SYSTAT. See Chapter 26.
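To see what a multiple regression “model” computes, here is a minimal Python sketch (not the SYSTAT procedure the text uses; the data are invented and generated from a known equation) that fits the coefficients by solving the normal equations:

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting (minimal, no error handling)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def multiple_regression(predictor_rows, y):
    # Fit y' = b0 + b1*x1 + b2*x2 + ... by solving the normal equations X'Xb = X'y
    X = [[1.0] + list(row) for row in predictor_rows]
    k, n = len(X[0]), len(X)
    XtX = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    Xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    return solve(XtX, Xty)

# invented data generated from y = 1 + 2*x1 + 3*x2
rows = [(1, 1), (2, 1), (1, 2), (3, 5), (2, 4)]
y = [6, 8, 9, 22, 17]
print([round(b) for b in multiple_regression(rows, y)])  # [1, 2, 3]
```

Because the data were built from y = 1 + 2x1 + 3x2, the fitted model recovers those coefficients exactly.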
-4- Ordinal Correlation
Just like the interval/ratio procedures above, ordinal correlation procedures come in two types.
-4a- Ordinal Correlation with 2 Variables
The two procedures which compute a correlation coefficient between two ordinal variables are Spearman’s rho (rs) and Kendall’s tau (τ). Spearman’s rho should be used when you have ten or more pairs of rankings; Kendall’s tau when you have fewer than ten. Both measures give you the same information. If you had pastors and ministers of education rank order seven statements of “characteristics of Christian leadership,” you would compute the degree of agreement between the rankings of the two groups with Kendall’s tau. See Chapter 22.
-4b- Ordinal Correlation with 3+ Variables
Kendall’s Coefficient of Concordance (W) measures the degree of agreement in rankings from more than two groups. Using our example above, you could compute the degree of agreement in rankings of pastors, ministers of education and seminary professors using Kendall’s W. See Chapter 22.
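To make the two coefficients concrete, here is a hypothetical Python sketch (the rankings are invented; both coefficients are shown for illustration, using the no-ties formulas):

```python
def spearman_rho(ranks1, ranks2):
    # rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), assuming no tied ranks
    n = len(ranks1)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks1, ranks2))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def kendall_tau(ranks1, ranks2):
    # tau = (concordant pairs - discordant pairs) / total pairs, no ties assumed
    n = len(ranks1)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            agree = (ranks1[i] - ranks1[j]) * (ranks2[i] - ranks2[j])
            s += 1 if agree > 0 else -1
    return s / (n * (n - 1) / 2)

# invented rankings of seven leadership statements by two groups
pastors   = [1, 2, 3, 4, 5, 6, 7]
ministers = [2, 1, 3, 4, 5, 7, 6]
print(round(spearman_rho(pastors, ministers), 2))  # 0.93
print(round(kendall_tau(pastors, ministers), 2))   # 0.81
```

The two groups agree closely, disagreeing only on adjacent pairs, so both coefficients are high.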
-5- Nominal Correlation
Nominal correlation procedures come in two types, differentiated by the number of categories in the nominal variables. If you have two and only two categories, the variables are called dichotomous (go to -5a-). If the nominal variables have more than two categories, go to -5b- below.
-5a- Nominal Correlation with Dichotomous Variables
When you have two variables which can take two and only two values (“dichotomous variables”), use the Phi Coefficient. When you have one dichotomous and one rank variable, use Rank Biserial. When you have one dichotomous and one interval/ratio variable, use Point Biserial. See Chapter 22.
-5b- Nominal Correlation with 3+ Categories
Procedures which determine whether two nominal variables are independent (not related) or not independent (related) are called Chi-square (χ²) tests. The word “chi” is pronounced “ki” as in “kite.” The Chi-square Goodness of Fit test compares observed category counts (30 males [75%], 10 females [25%]) with expected counts based on school enrollment (85% male, 15% female) to determine if class enrollment “fits well” the expected enrollment. The Chi-square Test of Independence compares two nominal variables to determine if they are independent. Are “educational philosophy” (5 categories) and “leadership style” (5 categories) independent of each other? When you want to determine the strength of the relationship between the two nominal variables, use Cramer’s Phi (φc). This procedure computes a Pearson’s r type coefficient from the computed χ² value. See Chapter 23.
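These chi-square procedures follow directly from their formulas. A minimal Python sketch (the 2x2 table is invented; the goodness-of-fit counts follow the enrollment example above):

```python
import math

def chi_square_gof(observed, expected_props):
    # Goodness of Fit: chi2 = sum((O - E)^2 / E), E from expected proportions
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props))

def chi_square_independence(table):
    # Test of Independence: E[i][j] = row total * column total / grand total
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return sum((table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
               / (row_tot[i] * col_tot[j] / n)
               for i in range(len(table)) for j in range(len(table[0])))

def cramers_phi(table):
    # phi_c = sqrt(chi2 / (n * (min(rows, cols) - 1)))
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square_independence(table) / (n * (k - 1)))

# the enrollment example: observed 30 males, 10 females vs. expected 85%/15%
print(round(chi_square_gof([30, 10], [0.85, 0.15]), 2))  # 3.14
# invented 2x2 table crossing two dichotomous nominal variables
print(round(cramers_phi([[10, 20], [20, 10]]), 2))       # 0.33
```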
-6- Interval/Ratio Differences
This section of statistical procedures is the most important, and will consume the greater part of our study of statistics. If you are testing one sample against a given population, go to -6a-. If you are testing the difference between two groups, go to -6b-. If you are testing differences between three or more groups, go to -6c-.
-6a- 1-Sample Parametric Tests of Difference
The first type of interval/ratio difference procedures computes whether data from one sample is significantly different from the population from which it was drawn. If you have more than 30 subjects in the sample, use the one-sample z-test. If you have fewer than 30 subjects, use the one-sample t-test. Here’s an example: You know the average income of all Southern Baptist pastors in Texas. You collect information on income of a sample of Southern Baptist pastors who are also seminary graduates. Is there a significant difference in average income between the sample and the population? See Chapter 19.
-6b- 2-Sample Parametric Tests of Difference
The second type computes whether data from two samples is significantly different. There are two different procedures which are used. The first is used when the two samples are randomly selected independently of each other: a sample of Texas pastors and a second sample of Texas ministers of youth. For this situation, use the Independent Samples t-test. See Chapter 20. The second procedure is used when pairs are sampled. Examples of sampling pairs include husbands and their wives, pastors and their deacon chairmen, fathers and their sons, counselors and their clients, and so forth. If you have two groups of paired subjects (husbands and their wives), use the Matched Samples t-test. See Chapter 20.
-6c- n-Sample Parametric Tests of Difference
The third type computes “significant difference” across three or more groups. Again, procedures depend on whether the groups are matched (correlated, related) or independent.
If the groups are independent, and you are examining one independent (“grouping”) variable, use One-Way Analysis of Variance. For example, is there a significant difference in Integration of Faith with Life between Southern Baptists, Episcopalians, and members of the Assemblies of God? See Chapter 21. If you are studying two or more independent variables simultaneously, use n-Way Analysis of Variance (also called Factorial ANOVA), where n is the number of independent variables. The importance of Factorial ANOVA lies in its ability to study interaction among the independent variables. See Chapter 25. If the groups are related, use Repeated Measures Analysis of Variance. (Not discussed in this text.)
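Two of the parametric tests of difference in -6- can be sketched directly from their formulas. This is an illustrative Python fragment, not from the text; all scores are invented:

```python
import math
import statistics

def one_sample_t(sample, pop_mean):
    # t = (sample mean - population mean) / (s / sqrt(n))
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - pop_mean) / se

def one_way_f(groups):
    # F = (between-groups mean square) / (within-groups mean square)
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# invented sample of five incomes (in $1000s) against a known population mean of 47
print(round(one_sample_t([48, 50, 52, 49, 51], 47), 2))        # 4.24
# invented scores for three groups (one-way ANOVA)
print(round(one_way_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]), 2))  # 3.0
```

The computed t or F is then compared against a critical value for the appropriate degrees of freedom to decide significance, as the later chapters explain.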
-7- Ordinal Differences
The measurement of significant difference between (small) groups of data is closely related to the interval/ratio procedures we just mentioned. To conserve space, let me simply give you the ordinal equivalents of the procedures we discussed in -6- above. Use these procedures when your group sizes are too small for the procedures under -6-. See Chapter 21 for all these procedures.
-7a- The Wilcoxon Matched Pairs Test (Wilcoxon T) is analogous to the Matched Samples t-test.
-7b- The Wilcoxon Rank Sum Test and the Mann-Whitney U Test are analogous to the Independent Samples t-test.
-7c- The Kruskal-Wallis H Test is analogous to the One-Way ANOVA.
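The rank-based logic behind these tests can be illustrated with the Mann-Whitney U statistic. A hypothetical Python sketch (the scores are invented):

```python
def mann_whitney_u(sample1, sample2):
    # Rank all scores together (average rank for ties), then
    # U1 = n1*n2 + n1(n1+1)/2 - R1, U2 = n1*n2 - U1; report the smaller U.
    combined = sorted(sample1 + sample2)
    rank_of = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        rank_of[combined[i]] = (i + 1 + j) / 2   # average of positions i+1 .. j
        i = j
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(rank_of[x] for x in sample1)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return min(u1, n1 * n2 - u1)

# invented anxiety-reduction scores for two small treatment groups
print(mann_whitney_u([12, 15, 18], [20, 25, 30]))  # 0.0 (complete separation)
```

A U of zero means every score in one group ranks below every score in the other, the strongest possible separation for groups of this size.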
Summary
In this chapter we introduced you to statistical analysis. We linked statistics to the process of research design. We looked at two major divisions of statistics. We separated the practical application of statistical procedures from the need for higher level mathematics skills. We differentiated statistical procedures by measurement type. And finally, we laid out a mental map of the statistical procedures we will be studying so that you can determine which procedures might be of use to you in your own proposal.
Vocabulary
correlation coefficient: a number which reflects the degree of association between two variables
Cramer’s Phi: measures strength of correlation between two nominal variables
descriptive statistics: measures population or sample variables
Factorial ANOVA: two-way, three-way ANOVA
Goodness of Fit: compares observed counts with expected counts on 1 nominal variable
Indep’t Samples t-test: tests whether the average scores of two groups are statistically different
Inferential statistics: INFERS population measures from the analysis of samples
Kendall’s tau: correlation coefficient between two sets of ranks (n < 10)
Kendall’s W: correlation coefficient among three or more sets of ranks
Kruskal-Wallis H Test: non-parametric equivalent of ANOVA
Linear Regression: establishes the relationship between one variable and one predictor variable
Mann-Whitney U Test: non-parametric equivalent of the independent t-test
Matched Samples t-test: tests whether the paired scores of two groups are statistically different
Multiple Regression: establishes the relationship between one variable and multiple predictor variables
one-sample z-test: tests whether a sample mean is different from its population mean (n > 30)
one-sample t-test: tests whether a sample average is different from its population average
One-Way ANOVA: tests whether average scores of three or more groups are statistically different
Pearson’s r: correlation coefficient between two interval/ratio variables
Phi Coefficient: correlation coefficient between two dichotomous variables
Point Biserial: correlation coefficient between interval/ratio variable and dichotomous variable
Rank Biserial: correlation coefficient between ordinal variable and dichotomous variable
Spearman’s rho: correlation coefficient between two sets of ranks (n > 10)
Test of Independence: chi-square test of association between two nominal variables
Two Sample Wilcoxon: non-parametric equivalent of independent t-test
Wilcoxon Matched Pairs: non-parametric equivalent of matched samples t-test
Study Questions 1. Differentiate between descriptive and inferential statistics. 2. Consider your own proposal. Review the types of data (Chapter 3). List several statistical procedures you might consider for your proposal. Scan the chapters in this text which deal with the procedures you’ve selected. 3. Give one example of each data type (Review Chapter 3). Identify one statistical procedure for each example you give.
Sample Test Questions
Identify which statistical procedure should be used for the following kinds of studies. Write the letter of the procedure in the blank.
Goodness of Fit - χ²
Pearson r
Regression (Multiple)
T-test (Ind’t)
Mann Whitney U
Phi Coefficient
Spearman rho
Test of Independence - χ²
One-Way ANOVA
Regression (Linear)
T-test (Matched)
Wilcoxon T (Pairs)
____ 1. Difference between fathers and their adult sons on a Business Ethics test. ____ 2. Whether learning style and gender are independent. ____ 3. Analysis of six predictor variables for job satisfaction in the ministry. ____ 4. Difference in Bible Knowledge test scores across three groups of youth ministers. ____ 5. Prediction of marital satisfaction by self-esteem of husband. ____ 6. Relationship between number of years in the ministry and job satisfaction score. ____ 7. Difference in anxiety reduction between treatment group I and treatment group II. ____ 8. Correlation between rankings of objectives of the School of Religious Education by pastors and ministers of education.
6
Synthesis of Related Literature

A Definition
The Procedure

In this chapter we look at the process of finding, collecting, analyzing and synthesizing research articles which relate to the topic of our study. Before we can add to the knowledge base of our field of study, we must learn what is already known. The literature search provides a factual base for the proposed study.
A Definition The related literature section of your proposal, entitled the “Synthesis of Related Literature,” is a synthetic narrative of recent research which is related to your study.
Synthetic Narrative The related literature section is a synthetic narrative. It is a narrative in the sense that it should flow from the beginning to the end with a single, coordinated theme. It should not contain a series of disjointed summaries of research articles. Such unrelated and disconnected summaries generate confusion rather than understanding. It is synthetic in that it has been born out of the synthesis of many research studies. You will analyze research reports by key words. There may be twenty articles that provide information for a given key word. As you write your findings for each of your key words, you will draw from all of the articles addressing that key word simultaneously. The final product will be a synthesis — a smooth blending — of selected articles built around the key words of your study. This is the reason for the name of this section: “The Synthesis of Related Literature.” Not a summary, but a synthesis.
Recent Research
The synthesis of related literature focuses on recent research. The rule of thumb in defining “recent” is ten years. You will want to select and include research articles which are less than 10 years old. Major emphasis should be placed on research conducted in the past 5 years. Articles older than this are out of date and misleading. Consider an opinion survey conducted in 1955 on the attitudes of Americans on “family.” Such information has little relevance to family attitudes today. Its only value would be to show the change in attitude since 1955.
Gather your information from research journal articles rather than books. Books are, by necessity, more out of date than the research they’re based upon. Research reports are primary sources of information, because they are written by those who conducted the study. Books are usually secondary sources; that is, sources written by authors not directly associated with the reported research: they merely compile research results from many sources. Focus your synthesis on primary sources of information.
Related to Your Study Your Problem Statement and its associated operationalized variables define the boundaries of your literature search. Each and every footnote in the synthesis should directly relate to your subject. The purpose of the synthesis is not to provide filler for the proposal. The purpose is to convey in a clear, focused way the present body of knowledge which relates to your intended study.
The Procedure for Writing the Related Literature
Choose database
Choose sources
Determine keys
Search
Select articles
Analyze articles
Reorganize Material
Write Synthesis
Revise Synthesis

Choose One or More Databases
A “database” is a high-tech term which refers to a collection of information in a particular field of study. The information stored in a database includes research reports, formal speeches, journal articles, minutes of professional meetings, and the like. These databases can be searched manually, by book-type indexes, or electronically, by computer. Manual searching costs little or no money, but consumes large amounts of time. Computer searches are fast and efficient, but can become expensive.
E.R.I.C. The Educational Resources Information Center (ERIC) was initiated in 1965 by the U.S. Office of Education to transmit findings of current educational research to researchers, teachers, administrators and graduate students.1 Information is housed in 16 “clearinghouses” around the nation.2
RIE The ERIC system consists of two major parts. The first is the Resources in Education (RIE) which provides abstracts of unpublished papers presented at educational conferences, speeches, progress reports of on-going research studies, and final reports of projects conducted by local agencies such as school districts.3
CIJE
The second major part of the ERIC system is the Current Index of Journals in Education (CIJE). The CIJE indexes articles published in over 300 educational journals and articles about educational concerns in other professional journals.4 In general, ERIC listings have less lag time than the Education Index or Psychological Abstracts. This means it will provide you with more recent research findings. Altogether, the ERIC system indexes and abstracts research projects, theses, conference proceedings, project reports, speeches, bibliographies, curriculum-related materials, books and more than 750 educational journals.

1 Walter R. Borg and Meredith D. Gall, Educational Research: An Introduction, 4th ed. (New York: Longman Publishing Co., 1983), 153.
2 See Borg and Gall, pp. 901-2 for addresses of clearinghouses.
3 Ibid., p. 153.
4 Charles D. Hopkins, Educational Research: A Structure for Inquiry (Columbus, Ohio: Charles E. Merrill Publishing Company, 1976), 221.
Psychological Abstracts Published by the American Psychological Association, this publication lists articles from over 850 journals and other sources in psychology and related fields.2 It gives summaries of studies, books, and articles on all fields of psychology and many educational articles.3
Dissertation Abstracts The Dissertation Abstracts database contains all dissertations written and registered since 1860. This is a rich resource not only of graduate level research findings, but also of research design and statistical analysis methods.
Choose Preliminary Sources Primary sources are research reports written by the researchers involved in the study. Secondary sources are compilations of research reports by authors not associated with the reported research. Preliminary sources are reference books and indexes which lead to specific research articles within a given database. Here is a brief list of some of the major preliminary sources.4
Thesaurus of ERIC Descriptors The pathway into the enormous ERIC database is a periodical called the Thesaurus of ERIC Descriptors. This publication contains a listing of all the key words used to categorize research articles and unpublished papers in the ERIC system. Similar indexes exist for Dissertation Abstracts and Psychological Abstracts databases.
Education Index
The Education Index provides an up-to-date listing of articles published in hundreds of education journals, books about education and publications in related fields since 1929. For an index to educational articles for the years 1900 to 1929, check the Readers’ Guide to Periodical Literature.5
Citation Indexes
The Citation Indexes list published articles which reference (“cite”) a given article. My statistics professor at the University of North Texas gave me a copy of a 1973 article on multiple comparisons one evening before class. He thought the questionable findings in the article would make a good dissertation study. By using citation indexes, I was able to quickly track down references to over fifty articles published since 1973 which cited the article he’d given me. The Science Citation Index (SCI) provides citations in the fields of science, medicine, agriculture, technology, and the behavioral sciences.
1 Sharon B. Merriam and Edwin L. Simpson, A Guide to Research for Educators and Trainers of Adults (Malabar, FL: Robert E. Krieger Publishing Company, 1984), 35.
2 Borg and Gall, p. 150.
3 Hopkins, p. 224.
4 See Borg and Gall, pp. 148-166 for detailed information on these and many other sources.
5 Hopkins, p. 221.
The Social Science Citation Index (SSCI) does the same for the social, behavioral and related sciences.1
Smithsonian Science Information Exchange The Smithsonian Science Information Exchange (SSIE) is the best preliminary source for recently completed and ongoing research in all fields. It has the least lag time between publication and indexing. Several services are offered, including research information packages on major topics, custom searches, and computer searches.2
Mental Measurements Yearbook
The Mental Measurements Yearbook, edited by Oscar K. Buros, indexes articles and findings related to published tests. It gives “information regarding forms, manuals, grade levels, publishers, and prices of educational, psychological, and vocational tests, plus reviews of the tests by testing experts.”3 It is published at six-year intervals and is an excellent source for finding a validated testing instrument for your proposal.
Measures for Psychological Measurement
This source provides information on over 3,000 psychological measures that have been described in the research literature. These are tests not published by regular test developers, so there is little overlap with the Mental Measurements Yearbooks.4
These are some of the major sources. Check Borg and Gall for many more reference sources available to you. Also ask your professors to suggest major research journals in your field. University libraries are also a good resource for information on specialty databases.
Select Key Words
Most databases are accessed through the use of key words, or descriptors. As we have previously noted, the key words for ERIC documents are published in the Thesaurus of ERIC Descriptors. Key words for Psychological Abstracts are published in the Thesaurus of Psychological Index Terms. Each database has its own set of key words.
Borg and Gall provide an example of a study and how one would go about doing a literature search. The study is “the academic self-concept of handicapped children in the elementary school.”5 Key phrases in this study are “academic self-concept,” “handicapped school children,” and “elementary school students.” There is no descriptor for “academic self-concept” in the Thesaurus of ERIC Descriptors. There are the descriptors “self-concept” and “self-esteem,” both of which appear to fit this study. Since there are no specific definitions of how reviewers used the terms, it would be wise to use both of these descriptors in the data search. There are two descriptors for handicapped school children: “handicapped children” and “handicapped students.” The final descriptor is “elementary school students.” Using these ERIC descriptors, a search can be made manually or electronically through every document in the ERIC system. (We’ll follow this study in later steps.)

1 Borg and Gall, pp. 156-7.
2 Borg and Gall, 158.
3 Hopkins, 225.
4 Borg and Gall, 168-9.
5 Ibid., 171.
The bridge that connects your study to the documents in the databases you’ve selected is made up of the descriptors, or key words, that grow out of your Problem Statement and operationalized variables. Only key words that are known by the database will work. In the example above, we found that the descriptor “academic self-concept” does not exist in the ERIC system. Other key words had to be substituted. When I wrote a research proposal on “Research Priorities in Religious Education,” the descriptor “Religious Education” led me to over thirty research articles. But none of the articles used the term the way Southern Baptists use it. If you consider a study which has a solid theoretical base, you will find it easier to find descriptors. Ultimately, you will secure reports that provide a good foundation for your study. If your study is theoretically shallow, you will have difficulty finding descriptors. You will be barred from the world of scientific knowledge.
Searching the literature
Having determined your key words, the next step is to locate the research articles which are associated with them. We can do this manually, by thumbing through the printed database index, or electronically, by doing a computer search.
Searching manually
To do a manual search for the key words listed above in the ERIC system, follow these steps: 1. Look in the ERIC index published in the most recent month of the current year. (Indexes for ERIC documents are published monthly; semi-annual volumes are published twice each year.) 2. Look up each of your descriptors in the “Subject Index” section. 3. You will notice that descriptors are organized in hierarchies. The higher up the hierarchy you find a descriptor, the broader it is (that is, the greater the number of articles it references). The farther down the hierarchy you find a descriptor, the narrower it is (the fewer the articles it references). Articles are referenced under the descriptors by “ED” numbers, such as ED 654 321. 4. Look up the ED number in the “Document Resumes” section of the ERIC index. Here you will find a brief description (an abstract) of the referenced article. You can usually tell from the abstract whether the article will be of help to you in your own study. 5. When you have found all the abstracts for all your descriptors in this index, move to the next earlier month and repeat the process. 6. When you have completed the current year, use the semi-annual volumes to search back through previous years. 7. Continue the process until you have located every ERIC document related to every descriptor back as far as you want the search to extend.
Searching by Computer
A manual search requires a great deal of time because you must manually thumb through multiple volumes of database indexes. Just think about looking up each of four descriptors, along with their associated articles, in monthly and then semi-annual indexes for up to ten years! How much time do you have to sit in the Reference Section of your university library? But more important than wasted time is the limitation of doing only simple searches. This rules out searches such as “self-esteem” AND “elementary school children.” Such a search would select only those articles which relate to BOTH descriptors.
With a computerized database, you can search through literally millions of articles in seconds, and combine key words in complex ways. We can combine all our selected descriptors into a single search command for the computer. With one pass through all the ERIC documents, every article meeting the specifications of the command line will be selected from that database. Let’s use our example to illustrate the process.
1. The library assistant responsible for doing computer searches dials up the database.
2. Descriptors are entered one at a time.
3. With each entry, there is a pause for a few seconds while the computer scans all of its material. It responds with the number of articles relating to that descriptor. The following numbers of articles were found by Borg and Gall for the example problem:
1. handicapped children — 277
2. handicapped students — 450
3. self-concept — 4,433
4. self-esteem — 894
5. elementary school students — 5,031
Total Number: 11,085
4. Descriptors can be combined to select only those articles that fit a specific combination. Borg and Gall’s example is interested in (1) “self-concept” OR (2) “self-esteem” AND (3) “handicapped children” OR (4) “handicapped students” AND (5) “elementary school students”. This combination is entered with the command (1 or 2) and (3 or 4) and (5). The “OR” increases the number of selected articles by including additional descriptors. Any article relating to either “self-esteem” OR “self-concept” and any article relating to either “handicapped children” OR “handicapped students” will be selected. The “AND” narrows the number of selected articles by requiring articles to match all the descriptors connected by it. An article must have either (1) or (2), AND either (3) or (4), AND (5) “elementary school students” to be selected in this search. The search above produced only one article reference out of the 11,085 articles identified by single descriptors. The Related Literature section requires more than a single article! The researchers broadened the search by dropping (5) “elementary school students.” Entering the command (1 or 2) and (3 or 4) produced 41 articles in ERIC documents.
5. Print out abstracts. You can have the computer print out the selected abstracts immediately (“on-line”) or you can have them printed out later (“off-line”). The difference is COST! Printing out abstracts while “on-line” means paying the connect fee between the computer and the database while the printer cranks out the abstracts. Printing “off-line” gives you the abstracts in a few days, but costs only a few cents each. This lower cost is possible because the database computer can call the library in the evening when phone rates are low, download all of the articles to the library’s computer, and hang up. The library computer then prints out the listing. “On-line” printing is expensive, but quick. You get your listing of articles immediately.
“Off-line” printing is much cheaper, but you may have to wait 3-4 days before you can get your printouts.
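The command (1 or 2) and (3 or 4) behaves exactly like set union and intersection. A toy Python sketch (the document IDs and the sets they fall into are invented, not Borg and Gall’s actual counts):

```python
# Each descriptor maps to the set of article IDs indexed under it (IDs invented).
self_concept         = {"ED101", "ED102", "ED103"}
self_esteem          = {"ED103", "ED104"}
handicapped_children = {"ED102", "ED105"}
handicapped_students = {"ED103", "ED106"}
elementary_students  = {"ED107"}

# OR widens the pool (set union); AND narrows it (set intersection)
hits = (self_concept | self_esteem) & (handicapped_children | handicapped_students)
print(sorted(hits))                        # ['ED102', 'ED103']
print(sorted(hits & elementary_students))  # [] -- adding AND (5) narrows further
```

In this toy example, adding the fifth descriptor narrows the result to nothing, which mirrors why the researchers in the example above broadened their search by dropping it.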
Borg and Gall suggest the most productive results for educational topics would be to search RIE and CIJE from 1969 to date, RIE and Education Index from 1966 to 1968, and Education Index from 1965 back as far as the student plans to extend his review.1 Note: This provides a good historical context. Use sources less than 10 years old for the bulk of your study.
1 Borg and Gall, 29.
Select Articles
You now have either citations or abstracts of the selected articles. Citations give the author, title, and date of selected articles; an abstract gives a 50-100 word summary of the study. You want to get abstracts if the database provides them. You now must find the article. Your library can help you do a computer search and provide you with citations. However, the articles cited may not be on your campus. You may need to go to a larger university or state school to find the original article. In our area, for example, North Texas State University has over 5 million journal articles on microfiche and adds thousands of articles each year.
Make a list of the publications cited in your search. The first step is to find out which libraries in the area carry these publications. The reference desk at area university libraries can provide you with a catalog of publications collected by a particular library. Locate the publications on your list in the directory. Some libraries have articles bound in annual volumes and stored on shelves. Others record articles on microfilm or microfiche and store them in filing cabinets. Using the library’s indexing system, you can locate the full article selected by your key word search.
There are two major ways to process the articles when you find them. The first is to read through the article in the library and take notes on it immediately. Copy down what you think is relevant on 5x8 cards. Be sure to get all the bibliographical information you need for footnotes and references. The second way is to merely scan the article to determine whether it really pertains to your study or not. If it does, make a xerox copy of it. Both bound journals and microfilm/fiche materials can be xeroxed. The cost is about ten cents per page. You may spend twenty or thirty dollars in dimes this way, but you have a real advantage over the first approach. You have the articles.
You can analyze them at home: write on them, categorize them, cut and paste them — the copies belong to you. I heartily recommend this approach -- especially if you have a family who would like to see you from time to time. Check the bibliographies of the research articles for further references to related literature. This provides you another path to important studies done in your area of interest. Now you must analyze and organize all of this material.
Analyze the Research Articles There are several ways to organize the mass of information you have collected. One way is to form a chronology of events or developments related to your subject. Place the articles chronologically and look for trends over time. Another approach is to organize the information conceptually. In this approach you organize your information in major and minor concepts relating to your subject. A third way is to organize the literature around your stated hypotheses.
An Organizational Notebook In my last dissertation, I organized my literature conceptually. I began by scanning the 167 selected articles, looking for key concepts and terms used by the authors that related to the key words of my study. I then placed each term at the top of a blank sheet of paper in a notebook. I began with about thirty concepts which were organized alphabetically.
© 4th ed. 2006 Dr. Rick Yount
Research Design and Statistical Analysis for Christian Ministry
I: Research Fundamentals
Prioritizing Articles While I scanned the articles, I categorized them into three levels of importance: high, medium, and low. High priority articles were identified as those which dealt directly either with my subject or methods. Medium priority articles were identified as those which provided either relevant background information or important implications of my subject. Low priority articles were identified as those which only tangentially referred to my subject or methodology. After my “key word” notebook was organized, I began reading the high priority articles in detail. New concepts were added to the organizational notebook.
Selecting Notes and Quotes with References Each time I read something related to one of my key concepts, I recorded it on the appropriate page in my organizational notebook. I was careful to include reference information for each quote. This saved hours of retracing the source of a good quote later. When a key word page was filled with quotes, a second page was added to the notebook. This was done for all the high priority articles. I then analyzed the medium priority articles and scanned the low priority articles. Information drawn from these was added to the concept pages.
Reorganize Material by Key Words I now possessed everything in the selected articles related to the major terms in my notebook. As information was added under each key word, the differing viewpoints, definitions, and explanations of the authors leapt off the page! Conflicting opinions were obvious. Schools of thought, formed by groups of writers sharing opinions and collectively opposing others, became apparent to me as I studied the articles in this dissected form. Further, introductory statements to the articles provided me with quotes and paraphrases for my introductory statement. Historical perspectives, both explicitly stated in articles and implicitly discovered by matching findings with dates of publication, provided background information. Arguments and counter-arguments over definitions and viewpoints gave me insight into the significance of my study. The very process of pulling information out of articles piece by piece (analysis) and placing it under particular key words transformed 167 research reports into thirty “key” conceptual groupings (synthesis).
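The notebook procedure described above (one page per key concept, every quote filed with its reference) can be modeled in a few lines of Python. This is a hypothetical sketch for illustration only; the concept names, quotes, and references below are invented, not drawn from the author's actual notebook.

```python
# A minimal model of the "organizational notebook": each key concept maps
# to a list of (quote, reference) pairs pulled from individual articles.
from collections import defaultdict

notebook = defaultdict(list)  # key word -> list of (quote, reference)

def record(concept, quote, reference):
    """File a quote under its key concept, keeping the citation with it."""
    notebook[concept].append((quote, reference))

# Invented entries standing in for quotes pulled from articles:
record("Type I error", "The nominal alpha understates the familywise rate.", "Smith (1981), p. 44")
record("Type I error", "Per-comparison rates mislead the reader.", "Jones (1979), p. 102")
record("power", "Small samples rarely detect medium effects.", "Smith (1981), p. 51")

# "Synthesis": every quote on one concept, each still tied to its source
for quote, ref in notebook["Type I error"]:
    print(f"{ref}: {quote}")
```

Pulling quotes out article by article (analysis) and printing them back grouped by concept (synthesis) is exactly the dissect-and-regroup move the notebook performs on paper.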
Write a Synthesis of Related Literature
The next step in the process is to refine each of the key word groupings into a narrative. The Related Literature section is not a list of article summaries. It should be a flowing, well-structured narrative that begins with the variables you established in your Problem and ends with a question begging to be answered. Study each key word grouping. How do the various authors define and use the concept? Do they speak for or against the concept? Can you group the authors by differing opinions concerning the concept? Write out, in narrative form, a clear description of how these authors use this particular concept. Once each of the key word groupings has been analyzed and refined into a narrative, determine what order the key word narratives should take in the Related Literature section. There are three major approaches to ordering the key word clusters: chronologically, conceptually, or by stated hypotheses. Chronologically: If the key word clusters
form a natural timeline of development, a chronological ordering is best. In this case, clusters will be time-sensitive, showing a change in thinking over time. Conceptually: If your study is anchored in clear, inter-related concepts, a conceptual ordering is suggested. My last dissertation had sections on the development of ANOVA and multiple comparisons tests, Type I error rate, Type II error rate, power, and research design. Stated hypotheses: If you have several hypotheses in your study, these form a natural way to order key word clusters.
Revise the Synthesis
As in any specialized writing, revision is necessary. We think we know what we want to say. We feel that we have said it clearly. Our thoughts easily flow out of our minds, so we assume they flow smoothly on paper. But this is rarely the case. All good writing takes time — building up, tearing down, and building up again. Lay the first draft of the Synthesis aside for several days. Come back to it and read it with an objective eye. You will always find sections that are too brief, or too wordy, or awkward in structure. You will find redundancies, blind spots, and grammatical mistakes. Revise the material and set it aside for a week. Repeat the process until the material reads smoothly, clearly and tersely. It should go without saying that you must plan ahead in order to do this. Waiting until just before the deadline is a sure way to produce inferior work. This procedure applies to your entire proposal — but is critical for the Related Literature section.
Summary
As you can see, the process of developing the Related Literature section of your paper involves a great deal more than checking ten or twelve books out of the library and writing a term paper. The process takes time. You have most of the semester to complete this — but don't wait! Searching the literature will provide you necessary insight into how to mold your entire proposal. Begin now to search the literature. You should do at least one computer search, both for the practice and because it will save you weeks of library time.
Vocabulary
CIJE: abbreviation for Current Index to Journals in Education (published articles)
Citation Indexes: resources that list articles which cite a given research article
computer search: locating research articles by computer
databases: collections of research information by subject matter (e.g. ERIC)
descriptors: key words by which research articles are indexed (e.g. cognitive or children)
Dissertation Abstracts: a resource that catalogs abstracts of all dissertations back to 1860
ERIC: abbreviation for Educational Resources Information Center (CIJE and RIE)
Education Index: a resource that catalogs education information back to 1929
manual search: locating research articles using printed indexes
Measures for Psy. Measurement: catalogs psychological tests used in research
Mental Measurements Yearbook: catalogs published educational, psychological and vocational tests
organizational notebook: tool to aid in dissecting articles and synthesizing related ideas
preliminary sources: resources used to locate articles (e.g. indexes)
primary sources: materials produced by those who conduct research (e.g. journal articles)
Psychological Abstracts: index to over 850 psychological journals
RIE: abbreviation for Resources in Education, an index to unpublished materials
secondary sources: materials produced by writers who study research reports (e.g. books)
SSIE: abbreviation for Smithsonian Science Information Exchange, best for ongoing research
synthetic narrative: multiple articles broken down and reordered by concept in clear, concise writing
X AND Y = Z: both X and Y must be true (1) for Z to be true (1); otherwise Z is false (0)
X OR Y = Z: either X or Y must be true (1) for Z to be true (1); if both are false (0), Z is false (0)
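The AND/OR descriptor logic defined in the vocabulary above is exactly set intersection and union, which a short sketch can demonstrate. The article ID numbers here are invented for illustration.

```python
# Descriptor logic modeled with Python sets: each descriptor indexes a
# set of article IDs (invented numbers).
cognitive = {101, 102, 103, 104}   # articles indexed under "cognitive"
children  = {103, 104, 105, 106}   # articles indexed under "children"

and_hits = cognitive & children    # X AND Y: both descriptors must apply
or_hits  = cognitive | children    # X OR Y: either descriptor suffices

print(sorted(and_hits))  # [103, 104] -- the narrower search
print(sorted(or_hits))   # [101, 102, 103, 104, 105, 106] -- the broader search
```

Note that OR-ing descriptors always returns at least as many articles as AND-ing them, which is why an all-OR search yields the greatest number of hits.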
Study Questions
1. Differentiate among preliminary, primary and secondary sources of information.
2. Define the following terms: ERIC, SSIE, RIE, CIJE, descriptor, SCI, SSCI, database, synthesis.
3. Differentiate between a summary of literature and a synthesis of literature.
4. What is the major difference between printing abstracts "on-line" and "off-line"?
5. Discuss the importance of "revision" in writing your proposal. How are you planning to incorporate revision into your proposal development schedule?
Sample Test Questions
1. John is interested in analyzing recent unpublished conference proceedings in the field of educational psychology. His best resource is
   a. SCI
   b. CIJE
   c. RIE
   d. Education Index
2. Two advantages to using the computerized databases to search the literature are
   a. simple searches and expense
   b. complex searches and expense
   c. simple searches and time
   d. complex searches and time
3. Given 4 descriptors, which of the following will provide the greatest number of articles?
   a. (1 or 2) and (3 or 4)
   b. (1 and 2) or (3 and 4)
   c. 1 and 2 and 3 and 4
   d. 1 or 2 or 3 or 4
4. Which of the following is not a good way to organize your Synthesis of Related Literature section?
   a. alphabetically by author's last name
   b. chronologically by article date
   c. according to major and minor concepts
   d. according to stated hypotheses
7 Populations and Sampling
The Rationale of Sampling
Steps in Sampling
Types of Sampling
Inferential Statistics: A Look Ahead
The Case Study Approach
The Rationale of Sampling In Chapter One, we established the fact that inductive reasoning is an essential part of the scientific process. Recall that inductive reasoning moves from individual observations to general principles. If a researcher can observe a characteristic of interest in all members of a population, he can with confidence base conclusions about the population on these observations. This is perfect induction. If he, on the other hand, observes the characteristic of interest in some members of the population, he can do no more than infer that these observations will be true of the whole. This is imperfect induction, and is the basis for sampling.1 The population of interest is usually too large or too scattered geographically to study directly. By correctly drawing a sample from a specific population, a researcher can analyze the sample and make inferences about population characteristics.
The Population A “population” consists of all the subjects you want to study. “Southern Baptist missionaries” is a population. So is “ministers of youth in SBC churches in Texas.” So is “Christian school children in grades 3 and 4.” A population comprises all the possible cases (persons, objects, events) that constitute a known whole.2
Sampling Sampling is the process of selecting a group of subjects for a study in such a way that the individuals represent the larger group from which they were selected.3 This representative portion of a population is called a sample.4
1 Donald Ary, Lucy Cheser Jacobs, and Asghar Razavieh, Introduction to Research in Education (New York: Holt, Rinehart and Winston, 1972), 160. 2 Ibid., 125. 3 L. R. Gay, Educational Research: Competencies for Analysis and Application, 3rd ed. (Columbus, Ohio: Merrill Publishing Company, 1987), 101. 4 Ary et al., 125.
Biased Samples It is important that samples provide a representative cross-section of the population they supposedly represent. The sample should be a “microcosm” — a miniature model — of the population from which it was drawn. Otherwise, the results from the sample will be misleading when applied to the population as a whole. If I select “Southern Baptist ministers” as the population for my study and select Southern Baptist pastors in Fort Worth as my sample, I will have a biased sample. “Fort Worth pastors” may not reflect the same characteristics as ministers (including staff members) across the nation. Selecting people for a study because they are within convenient reach —members of my church, students in a nearby school, co-workers in the surrounding region — yields biased samples. Biased samples do not represent the populations from which they are drawn.
Randomization The key to building representative samples is randomization. “Randomization” is the process of randomly selecting population members for a given sample, or randomly assigning subjects to one of several experimental groups, or randomly assigning experimental treatments to groups. In the context of this chapter, it is selecting subjects for a sample in such a way that every member of the population has an equal chance at being selected. By randomly selecting subjects from a population, you statistically equalize all variables simultaneously.
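As a concrete illustration of two of the randomization tasks just described (random selection of a sample, and random assignment of subjects to groups), here is a short sketch using Python's standard library. The population is an invented list of ID numbers.

```python
# Randomization with the standard library:
# (1) randomly select a sample so every member has an equal chance;
# (2) randomly assign the sampled subjects to two experimental groups.
import random

random.seed(7)  # fixed seed only so the sketch is reproducible

population = list(range(1, 201))          # 200 population members (invented)
sample = random.sample(population, 20)    # each member has an equal chance

subjects = sample[:]                      # copy, then shuffle for assignment
random.shuffle(subjects)
group_a, group_b = subjects[:10], subjects[10:]
print(len(group_a), len(group_b))         # 10 10
```

`random.sample` draws without replacement, so no subject can be chosen twice, and the shuffle-then-split step gives every subject an equal chance of landing in either group.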
Steps in Sampling
Regardless of the specific type of sampling used, the steps in sampling are essentially the same: identify the target population, identify the accessible population, determine the size of the sample, and select the sample.
Identify the Target Population The first step is the identification of the target population. In a study concerning professors in Southern Baptist seminaries, the target population would be all professors in all Southern Baptist seminaries. In a study of job satisfaction of local church staff ministers, the target population is all staff ministers in all churches.
Identify the Accessible Population Since it is usually not possible to reach all the members of a target population, one must identify that portion of the population which is accessible. The nature of the accessible population depends on the time and resources of the researcher. Given the target population of “Southern Baptist professors,” the accessible population might be “Southwestern Seminary professors.” Given the target population of “local church staff ministers,” the accessible population might be “Southern Baptist ministers of education in Texas.” Notice that specifying the accessible populations reduces the scope of the two examples in the preceding paragraph. In most cases this is helpful because beginning researchers tend to include too much in their study.
Determine the Size of the Sample Student researchers often ask “How big should my sample be?” The first answer is “use as large a sample as possible.”5 The reason is obvious: the larger the sample, the better it represents the population. But if the sample size is too large, then the value of sampling — reducing time and cost of the study — is negligible. The more common problem, however, is having too few subjects, not too many.6 So the more important question is, “What’s the minimum number of subjects I need?” The question is still difficult to answer. Here are some of the factors which relate to proper sample size.
Accuracy In every measurement, there are two components: the true measure of the variable and error. The error comes from incidental extraneous sources within each subject: degree of motivation, interest, mood, recent events, future expectations. All of these cause variations in test results. In all statistical analysis, the objective is to minimize error and maximize the true measure. As the sample size increases, the random extraneous errors tend to cancel each other out, leaving a better picture of the true measure of the population.
Cost An increasing sample size translates directly into increasing costs: not only of money, but time as well. Just think of the difference in printing, mailing, receiving, processing, tabulating, and analyzing questionnaires for 100 subjects, and then for 1000 subjects. The dilemma of realistically balancing “accuracy” (increase sample size) with “cost” (decrease sample size) confronts every researcher. Inaccurate data is useless, but a study which cannot be completed due to lack of funds is not any better. “Cost per subject” is directly related to the kind of study being done. Interviews are expensive in time, effort and money. Mailing out questionnaires is much less expensive per subject. Therefore, one can plan to have a larger sample with questionnaires than with interviews for the same cost.
The Homogeneity of the Population
"Homogeneous" [from homo-genos, "like-kind"] means "of the same kind or nature; consisting of similar parts, or of elements of the like nature" (Webster, s.v. "homogeneous"). Homogeneity in a population means that the members of the population are similar on the characteristic under study. We can take a sample of two drops of water from a 10 gallon drum, and have a good representative sample of the ten gallons. This is because the water in a 10 gallon drum is an homogeneous solution (if we mix it up well before sampling). But if we take two people out of a group of 500, we will not have a good representative sample of the 500. "People" are much less homogeneous than a water solution! But even populations of people vary in homogeneity. The population "Texas Baptists" would have less variability on the issue of gambling than the more general population of "Texans." The greater the variability in the population, the larger the sample needs to be.
5 Ary et al., 167. 6 Gay, 114.
Other Considerations
Borg and Gall list several additional factors which influence the decision to increase the sample size (see pp. 257-261):
1. When uncontrolled variables are present.
2. When you plan to break samples into subgroups.
3. When you expect high attrition of subjects.
4. When you require a high level of statistical power (see Chapter 17).
So, what is a good rule of thumb for setting sample size in a research proposal? Here are two suggestions.
Sample Size Rule of Thumb
Dr. John Curry, Professor of Educational Research, North Texas State University (now retired), provided his research students (fall 1984) with the "rule of thumb" on sample size shown in the table below. Using this rule, an adequate sample of Southern Baptists' 36,000 pastors would be a random sample of 1%, or 360 pastors. L. R. Gay suggests 10% of large populations and 20% of small populations as minimums.7 Using Gay's suggestion, our sample of pastors would include 3,600. It is left to the student to weigh the factors of accuracy, cost, homogeneity of the accessible population, type of sampling and kind of study, and determine the best sample size for his study.
Size of Population    Sampling Percent
0-100                 100%
101-1,000             10%
1,001-5,000           5%
5,001-10,000          3%
10,000+               1%
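Curry's rule of thumb can be written as a small function. This is a sketch of the rule as tabled above; treating each bracket's upper bound as inclusive and rounding fractional results up are my assumptions, not the author's.

```python
import math

def curry_sample_size(population: int) -> int:
    """Dr. Curry's sample-size rule of thumb (see table above).
    Bracket upper bounds inclusive; fractional sizes rounded up
    (both assumptions made for this sketch)."""
    if population <= 100:
        return population          # 100%: take everyone
    if population <= 1_000:
        percent = 10
    elif population <= 5_000:
        percent = 5
    elif population <= 10_000:
        percent = 3
    else:
        percent = 1
    return math.ceil(population * percent / 100)

print(curry_sample_size(36_000))   # the 36,000-pastor example: 360
```

Applied to the text's example, 1% of 36,000 Southern Baptist pastors gives the 360-pastor sample quoted above.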
Select the Sample The final step is to actually select a sample of predetermined size from the accessible population.
Types of Sampling
There are several ways of doing this. We will look at four major types here: simple random, systematic, stratified, and cluster sampling. The basic characteristic of random sampling is that all members of the population have an equal and independent chance of being included in the sample.8
Simple Random Sampling
The most common method of sampling is known as simple random sampling: "Pick a number out of a hat!" Gay provides a good example of this type of sampling.9 A superintendent of schools wants to select a sample of teachers so that their attitudes toward unions can be determined. Here is how he did it:
1. The population is 5,000 teachers in the system.
2. The desired sample size is 10%, or 500 teachers.
3. The superintendent has a directory which lists all 5,000 teachers alphabetically. He assigns numbers from 0000 to 4999 to the teachers.
4. A table of random numbers is entered at an arbitrarily selected number such as the one underlined below: 59058 11859 53634 48708 71710
5. Since his population has only 5000 members, he is interested only in the last 4 digits of the number, 3634.
6. The teacher assigned #3634 is selected for the sample.
7. The next number in the column is 48708. The last four digits are 8708. No teacher is assigned #8708 since there are only 5000. Skip this number.
8. Applying these steps to the remaining numbers shown in the column, teachers 1710, 3942, and 3278 would be added to the sample.
9. This procedure continues down this column and succeeding columns until 500 teachers have been selected.

7 Gay, 114-115. 8 Ary, 162. 9 Gay, 105-107.
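The superintendent's procedure can be sketched with Python's random module standing in for the printed table of random numbers. The specific numbers drawn will of course differ from the book's example.

```python
# Simple random sampling, mirroring the table-reading steps above:
# draw 5-digit "table entries", keep the last four digits, skip values
# with no matching teacher, and ignore duplicates until 500 are chosen.
import random

random.seed(1984)  # arbitrary seed so the sketch is reproducible

sample = set()
while len(sample) < 500:
    n = random.randint(0, 99999)     # a 5-digit "table" entry
    last4 = n % 10000                # keep only the last four digits
    if last4 < 5000:                 # teachers are numbered 0000-4999
        sample.add(last4)            # a set ignores repeated selections

print(len(sample))                   # 500
```

In practice, `random.sample(range(5000), 500)` accomplishes the same thing in one call; the loop above simply follows the steps a researcher takes with a printed random number table.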
This random sample could well be expected to represent the population from which it was drawn. But it is not guaranteed. The probable does not always happen. For example, if 55% of the 5000 teachers were female and 45% male, we would expect about the same percentages in our random sample of 500. Just by chance, however, the sample might contain 30% females and 70% males. If the superintendent believed teaching level (elementary, junior high, senior high) might be a significant variable in attitude toward unions, he would not want to leave representation of these three sub-groups to chance. He would probably choose to do a stratified random sample.
Systematic Sampling
A systematic sample is one in which every Kth subject on a list is selected for inclusion in the sample.10 The "K" refers to the sampling interval, and may be every 3rd (K=3) or 10th (K=10) subject. The value of K is determined by dividing the population size by the sample size. Let's say that you have a list of 10,000 persons. You decide to use a sample of size 1000. K = 10000/1000 = 10. If you choose every 10th name, you will get a sample of size 1000. The superintendent in our example would employ systematic sampling as follows:
1. The population is 5,000 teachers.
2. The sample size is 10%, or 500 teachers.
3. The superintendent has a directory which lists all 5,000 teachers in alphabetical order.
4. The sampling interval (K) is determined by dividing the population (5000) by the desired sample size (500). K = 5000/500 = 10.
5. A random number between 0 and 9 is selected as a starting point. Suppose the number selected is "3".
6. Beginning with the 3rd name, every 10th name is selected throughout the population of 5000 names. Thus, teachers 3, 13, 23, 33 ... 4993 would be chosen for the sample (Gay, 113-114).
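The six steps above can be sketched in a few lines; the teacher list is an invented stand-in for the alphabetical directory.

```python
# Systematic sampling: compute K, pick a random start, take every Kth name.
import random

population = list(range(5000))        # 5,000 teachers in list order
sample_size = 500
K = len(population) // sample_size    # sampling interval: 5000/500 = 10

random.seed(0)
start = random.randint(0, K - 1)      # random starting point between 0 and 9
sample = population[start::K]         # every Kth teacher thereafter

print(len(sample), sample[:4])
```

Note that once `start` is drawn, every other selection is fully determined, which is exactly the dependence among selections discussed below.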
Writers disagree on the usefulness of systematic sampling. Ary and Gay discount systematic sampling as "not as good as" random sampling because each selection is not independent of the others.11 Once the beginning point is established, all other choices are determined. Both writers give as an example a population which includes various nationalities. Since certain nationalities have distinctive last names that tend to group together under certain letters of the alphabet, systematic sampling can skip over whole nationalities at a time. Babbie, on the other hand, states that "systematic sampling is virtually identical to simple random sampling" when one chooses a random starting point.12 Sax reports that systematic sampling "usually leads to the same results as simple random sampling."13 There is a module on your tutorial disk that directly compares systematic sampling with simple random sampling. Use it to compare the results of sampling for yourself. There is one major danger with systematic sampling on which all authors agree. If there is some natural periodicity — repetition — within the list, the systematic sample will produce estimates which are seriously in error.14 If this condition exists, the researcher can do one of two things. He can use simple random sampling on the list as it exists, or he can randomly order the list and then use systematic sampling.

10 Gay, 112. 11 Ary, 116, and Gay, 114.
Stratified Sampling
Stratified sampling permits the researcher to identify sub-groups within a population and create a sample which mirrors these sub-groups by randomly choosing subjects from each stratum. Such a sample is more representative of the population across these sub-groups than a simple random sample would be.15 Subgroups in the sample can either be of equal size or proportional to the population in size. Equal size sample subgroups are formed by randomly selecting the same number of subjects from each population subgroup. Proportional subgroups are formed by selecting subjects so that the subgroup percentages in the population are reflected in the sample. The following example is a proportionally stratified sample. The superintendent would follow these steps to create a stratified sample of his 5,000 teachers.16
1. The population is 5,000 teachers.
2. The desired sample size is 10%, or 500 teachers.
3. The variable of interest is teaching level. There are three subgroups: elementary, junior high, and senior high.
4. Classify the 5,000 teachers into the subgroups. In this case, 65% or 3,250 are elementary teachers, 20% or 1,000 are junior high teachers, and 15% or 750 are senior high teachers.
5. The superintendent wants 500 teachers in the sample. So 65% of the sample (325 teachers) should be elementary, 20% (100) should be junior high teachers, and 15% (75) should be senior high teachers. This is a proportionally stratified sample. (A non-proportionally stratified sample would randomly select 167 subjects from each of the three groups.)
6. The superintendent now has a sample of 500 (325+100+75) teachers, which is representative of the 5,000 and which reflects proportionally each teaching level.
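Proportional stratification can be sketched as follows, using invented rosters sized to the percentages in step 4 above.

```python
# Proportionally stratified sampling: draw from each stratum a share of
# the sample equal to that stratum's share of the population.
import random

random.seed(3)
strata = {
    "elementary":  list(range(3250)),   # 65% of the 5,000 teachers
    "junior high": list(range(1000)),   # 20%
    "senior high": list(range(750)),    # 15%
}
total = sum(len(v) for v in strata.values())
sample_size = 500

sample = {}
for level, teachers in strata.items():
    n = round(sample_size * len(teachers) / total)   # proportional share
    sample[level] = random.sample(teachers, n)

print({level: len(s) for level, s in sample.items()})
# {'elementary': 325, 'junior high': 100, 'senior high': 75}
```

Within each stratum the draw is still simple random sampling; stratification only fixes how many subjects come from each subgroup, so subgroup representation is no longer left to chance.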
Cluster Sampling
Cluster sampling involves randomly selecting groups, not individuals. It is often impossible to obtain a list of individuals which make up a target population. Suppose a researcher is interested in surveying the residents of Fort Worth. Through cluster sampling, he would randomly select a number of city blocks and then survey every person in the selected blocks. Or, another researcher wants to study social skills of Southern Baptist church staff members. No list exists which contains the names of all church staff members. But he could randomly select churches in the Convention, and use all the staff members of the selected churches. Any intact group with similar characteristics is a cluster. Other examples of clusters include classrooms, schools, hospitals, and counseling centers. Let's apply this approach to the superintendent's study.
1. The population is 5,000 teachers.
2. The sample size is 10%, or 500 teachers.
3. The logical cluster is the school.
4. The superintendent has a list of 100 schools in the district.
5. Although the clusters vary in size, there are an average of 50 teachers per school.
6. The required number of clusters is obtained by dividing the sample size (500) by the average size of cluster (50). Thus, the number of clusters needed is 500/50 = 10 schools.
7. The superintendent randomly selects 10 schools out of the 100.
8. Every teacher in the selected schools is included in the sample.

12 Earl Babbie, The Practice of Social Research, 3rd ed. (Belmont, CA: Wadsworth Publishing Company, 1983), 163. 13 Gilbert Sax, Foundations of Educational Research (Englewood Cliffs, NJ: Prentice-Hall, 1979), 191. 14 Gilbert Churchill, Marketing Research: Methodological Foundations, 2nd ed. (Hinsdale, IL: The Dryden Press, 1979), 328. 15 Ary and others, 164; Babbie, 164-165; Borg and Gall, 248-249; Sax, 185-190. 16 Gay, 107-109.
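The cluster steps above can be sketched as follows. The school rosters are invented, and for simplicity every school is set to exactly the average size of 50 teachers.

```python
# Cluster sampling: randomly select whole schools, then take every
# teacher in each selected school.
import random

random.seed(11)
schools = {f"school_{i:03d}": [f"s{i:03d}_t{j:02d}" for j in range(50)]
           for i in range(100)}          # 100 invented rosters of 50 each

clusters_needed = 500 // 50              # sample size / average cluster size
chosen = random.sample(sorted(schools), clusters_needed)

sample = [t for school in chosen for t in schools[school]]
print(len(chosen), len(sample))          # 10 500
```

Only the schools are chosen at random; the individuals inside each chosen cluster are taken wholesale, which is why a cluster sample can be less representative than 500 individually randomized teachers.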
In this way, the interviewer can conduct interviews with all the teachers in ten locations, and save traveling to as many as 100 schools in the district.17 There are drawbacks to cluster sampling. First, a sample made up of clusters may be less representative than one selected through random sampling.18 Only ten schools out of 100 are used in our example. These ten may well be different from the other ninety. Using a larger sample size, say, 25 schools rather than 10, reduces this problem. A second drawback is that commonly used inferential statistics are not appropriate for analyzing data from a study using cluster sampling.19 The statistical procedures we will be studying require random sampling.20
Inferential Statistics: A Quick Look Ahead
The field of inferential statistics allows researchers to study samples and infer the characteristics of populations. We have already noted the two basic components of every piece of collected data: the true measurement and error. Suppose you have a population of 1000 test scores. The average of the entire 1000 scores is 75. You would expect the average of a random sample of 100 scores to also be 75. So you draw your first sample of 100 and compute the average. You get 73.8. You draw another 100 and find the average to be 76.2. Another hundred: 77.7. Yet another: 71.5. The central tendency in 1000 scores is not exactly duplicated in a sample of 100. The differences among sample averages are due to sampling error. Inferential statistics provides a way to estimate true population parameters from sample statistics through the use of the laws of probability. Each of the sample means is different from the population mean. But is the difference great enough to be considered significant? We will master some of the most popular techniques for inferring population characteristics from sample measurements a little later in the course.

17 Gay, 110-112. 18 Ibid., 111. 19 Ibid., 112. 20 See Babbie, 167-171, for his discussion of statistical analysis and cluster sampling.
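The thought experiment above is easy to reproduce. In this sketch, the population of 1,000 scores is generated from a normal distribution with mean 75 (an assumption made purely for illustration); repeated samples of 100 then yield means that hover near, but rarely equal, the population mean.

```python
# Sampling error in miniature: several random samples of 100 from the
# same 1,000-score population give slightly different sample means.
import random
import statistics

random.seed(42)
population = [random.gauss(75, 10) for _ in range(1000)]  # assumed scores
pop_mean = statistics.mean(population)

sample_means = [statistics.mean(random.sample(population, 100))
                for _ in range(4)]
print(round(pop_mean, 1), [round(m, 1) for m in sample_means])
```

The scatter of the four sample means around the population mean is sampling error; inferential statistics quantifies how much scatter is expected by chance alone.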
The Case Study Approach Not all research is geared to sampling subjects out of large populations. The case study is a kind of descriptive research in which an in-depth investigation of an individual, group, event, community or institution is conducted. The strength of the case study approach is its depth, rather than its breadth. The investigator tries to discover all the variables that are important in the history or development of his subject.1 The weakness of the case study is its lack of breadth. “The dynamics of one individual or social unit may bear little relationship to the dynamics of others. Most case studies arise out of counseling or remedial efforts and therefore provide information on exceptional rather than representative individuals.”2 Because of this, it is more difficult to write an acceptable dissertation which employs a case study approach. The objective of graduate research is to concentrate on areas which have high generalizability. In most cases, this involves sampling from specified populations. The case study approach involves finding atypical subjects that exemplify some relevant trait. Random sampling methods are therefore inappropriate.3 Borg and Gall cite several areas where the case study approach is used:4
Historical Case Studies of Organizations An historical case study of an organization involves the analytical observation of an organization, by way of records, documents and personal interviews of members and leaders, from its inception to the present. An example of this kind of case study would be “The Development of the School of Educational Ministries, Southwestern Baptist Theological Seminary, Fort Worth, Texas.”
Observational Case Studies An observational case study involves the in-depth observation of a specific individual or group over a period of time. An example of this type of study would be "Living with the Children of God: New Testament Community or New Age Cult?"
Oral Histories

Oral histories involve extensive first-person interviewing of a single individual. Dissertations have been written on the lives of J. M. Price and Joe Davis Heacock, former deans of the School of Religious Education, using this approach.
Situational Analysis

An event is studied from the perspective of the participants involved. For example, suppose a staff member is summarily fired from a church staff by the pastor. Interviews with the staff member and family, staff colleagues, the pastor, church leaders, and selected church members would be conducted. When all the views are synthesized, an in-depth understanding of the event can be produced.
Clinical Case Studies

A particular problem is studied through in-depth analysis of a single individual suffering from the problem. An example of this type of study would be "Depression in the Ministry: A Case Study of Twenty Ministers of Education."

1 Ary, p. 286. See the following for guidance in proposing a case study approach: Borg and Gall, 488-490; Gay, 207; Sax, 106.
2 Ibid., 287.
3 Sax, 106.
4 Headings from Borg and Gall, 489.

© 4th ed. 2006 Dr. Rick Yount
Summary

In this chapter you have learned about sampling techniques that allow you to select and study a small representative group of subjects (the sample) and infer findings to the larger group (the population). You have been given a rationale for sampling, the place of randomization in sampling, the steps of sampling, four types of sampling, and a look at the case study approach.
Vocabulary

accessible population: subjects available for sampling (e.g., mailing list)
attrition: loss of subjects during a study
biased sample: subjects selected in a non-random manner (e.g., 3rd grade classes at a school)
case study approach: in-depth study of an individual subject or institution
cluster sampling: selecting subjects by randomly choosing groups (e.g., city blocks or churches)
error: difference between the measurement of a variable and its true value
estimated parameters: mean and standard deviation of a population computed from sample statistics
population parameters: mean and standard deviation of a population measured directly
randomization: selecting subjects so that each population member has an equal chance of being selected
sample: a (smaller) group of subjects which represents a (larger) population
sample size: the number of subjects in a sample (symbolized by N or n)
sample statistics: mean and standard deviation of a sample (not useful in themselves)
sampling error: source of the discrepancy between sample statistics and population parameters
sampling: process of selecting a representative sample from a population
simple random sampling: drawing subjects by random number (e.g., names out of a hat)
statistical power: the probability that a statistic will declare a difference "significant"
stratified sampling: selecting subjects at random from population strata (e.g., male, female)
systematic sampling: selecting every kth subject from a list (e.g., every 10th person in 1000 = 100 subjects)
target population: population of interest to your study (e.g., single adults)
true measure: the true value of a variable (no error)
Study Questions

1. Define "target population," "accessible population," and "sample."
2. Explain why sampling is an important part of research.
3. List and describe four types of sampling.
4. Explain why randomization is important in sampling.
5. You want to study "Youth ministers' attitudes toward small group Bible study." You have identified 4,573 youth ministers. Using the "rules of thumb" estimate for sampling, how many youth ministers should you select for your study?
Sample Test Questions

1. Sampling is based on the principles of
a. intuition
b. trial and error
c. inductive reasoning
d. deductive reasoning

2. The key to producing a good representative sample is
a. random selection of subjects
b. using volunteers for the sample
c. narrowly defining the target population
d. using the minimum number of subjects in the sample

3. "Southern Baptist single adults" would be considered a(n)
a. accessible population
b. target population
c. stratified sample
d. cluster sample

4. One would increase sample size if he expected
a. high attrition of subjects during the study
b. a high cost per subject
c. high homogeneity of the population
d. few uncontrolled variables in the population
8 Collecting Dependable Data
Validity, Reliability, Objectivity

We have discussed variables and problems, hypotheses and purposes, populations and samples. The theoretical foundation of your study must sooner or later yield to concrete action: the collection of real pieces of data.

The tools used to collect data are called instruments. An instrument may be an observation checklist, a questionnaire, an interview guide, a test, or an attitude scale. It may be a video camera or cassette recorder. An instrument is any device used to observe and record the characteristics of a variable.

Before you can accurately measure the stated variables of your study, you must translate those variables into measurable forms. This is done by operationally defining the variables of your study (Chapter 3). Data collection is meaningless without a clearly operationalized set of variables.

The second step is to insure that the selected instrument accurately measures the variables you've selected. The naive researcher rushes past the instrument selection or development phase in order to collect data. The result is faulty, error-filled data -- which yields faulty conclusions. The accuracy of the instrument used in your study is an important factor in the usefulness of your results. If the data is incomplete or inadequate, the study is destined for failure. A wonderful design and precise analysis yield useless results if the data quality is poor. So carefully design or select the instrument you will use to collect data.

Three characteristics -- "the Great Triad" -- determine the precision with which an instrument collects data. The Great Triad consists of (1) validity, "Does the instrument measure what it says it measures?"; (2) reliability, "Does the instrument measure accurately and consistently?"; and (3) objectivity, "Is the instrument immune to the personal attitudes and opinions of the researcher?"
Validity
Content, Predictive, Concurrent, Construct

The term validity refers to the ability of research instruments to measure what they say they measure. A valid instrument measures what it purports to measure. A 12-inch ruler is a valid instrument for measuring length. It is not a valid instrument for measuring I.Q., a quantity of liquid, or an amount of steam pressure. These require, respectively, an I.Q. test, a measuring cup, and a pressure gauge.

Let's say a student wants to measure the variable "spiritual maturity," and operationally defines it as "the number of times a subject attended Sunday School out of the past 52 Sundays." The question we should ask is whether "attendance count" in Sunday School is a valid measure of spiritual maturity: does "count" really measure "spiritual maturity"? Can one attend Sunday School and be spiritually immature? (Yes: for coffee, fellowship, and business contacts.) Can one be spiritually mature and not attend Sunday School? (Yes: pastors usually use this time for pastoral work.) If either of these questions can be answered yes (and they can be), then the measure is not a valid one.

There are four kinds of instrument validity: content, concurrent, predictive, and construct. Each has a specific meaning, and each helps establish the nature of valid instruments.
Content Validity

The content validity of a research instrument represents the extent to which the items in the instrument match the behavior, skill, or effect the researcher intends them to measure.1 In other words, a test has content validity if the items actually measure mastery of the content for which the test was developed. Tests which ask questions over material not covered by objectives or study guidelines, or which draw from fields other than the one being tested, violate this kind of validity. Content validity is different from face validity, which is a subjective judgement that a test appears to be valid.

Researchers establish content validity for their instruments by submitting a long list of items (such as statements or questions) to a "validation panel." Such a validation panel consists of six to ten persons who are considered experts in the field of study for which the instrument is being developed. The panel judges the clarity and meaningfulness of each of the items by means of a 4- or 6-point rating scale. Compute the means and standard deviations (see Chapter 16) for each of the items. Select the items with the highest mean and lowest standard deviation on "meaningfulness" and "clarity" to be included in your instrument.

In summary, content validity asks the question, "How closely does the instrument reflect the material over which it gathers data?" Content validity is especially important in achievement testing.
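The panel-rating step just described lends itself to a short computation. The following sketch (the item statements, ratings, and the 5.0/1.0 cutoffs are invented for illustration, not fixed rules) computes each item's mean and standard deviation of panel ratings on a 6-point clarity scale and keeps the items the panel rated both clear and consistently so:

```python
from statistics import mean, stdev

# Hypothetical clarity ratings (1-6 scale) from a six-member validation panel.
ratings = {
    "I read Scripture daily":        [6, 5, 6, 6, 5, 6],
    "I feel spiritual":              [3, 2, 4, 1, 5, 2],
    "I attend a weekly Bible study": [5, 6, 5, 6, 6, 5],
}

# Mean and standard deviation of the panel's ratings for each item.
stats = {item: (mean(r), stdev(r)) for item, r in ratings.items()}

# Keep items rated clear (high mean) and consistently so (low spread).
selected = [item for item, (m, s) in stats.items() if m >= 5.0 and s <= 1.0]

for item in selected:
    print(item)
```

The same two-number summary (high mean, low standard deviation) applies whether the panel rates "clarity" or "meaningfulness"; run the selection once per rated quality.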
Predictive Validity

The predictive validity of a research instrument represents the extent to which the test's results predict such things as later achievement or job success. It is the degree to which the predictions made by a test are confirmed by the later success of the subjects.

Suppose I developed a "Research and Statistics Aptitude Test" to be given to students at the beginning of the semester. If I correlated these test scores of incoming students with their final grades in the course, I could use the test as a predictor of success in the course. In this example, the Research and Statistics Aptitude Test provides the predictor measures, and the final course grade is the criterion by which the aptitude test is analyzed for validity. In predictive validity, the criterion scores are gathered some time after the predictor scores. The Graduate Record Examination (GRE) is taken by college students and supposedly predicts which of its takers will succeed in (future) doctoral-level studies.

Predictive validity asks the question, "How closely does the instrument reflect the later performance it seeks to predict?"

1 Merriam, p. 140.
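Establishing predictive validity comes down to correlating predictor scores with later criterion scores. A minimal sketch of that computation follows; the aptitude scores and final grades are invented, and the Pearson product-moment formula itself is the standard one treated in Chapter 22:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between paired scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical predictor scores (aptitude pretest) and criterion
# scores (final course grades) for seven students.
aptitude = [55, 62, 70, 71, 80, 88, 93]
grades   = [68, 74, 77, 81, 85, 90, 96]

r = pearson_r(aptitude, grades)
print(f"predictive validity coefficient: r = {r:.2f}")
```

A coefficient near +1.00 would indicate the pretest predicts course success well; a coefficient near 0.00 would indicate it has no predictive value.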
Concurrent Validity

Concurrent validity represents the extent to which a (usually smaller, easier, newer) test reflects the same results as a (usually larger, more difficult, established) test. The established test is the criterion, the benchmark, for the newer, more efficient test. Strong concurrent validity means that the smaller, easier test provides data as well as the larger, more difficult one.

A popular personality test, the Minnesota Multiphasic Personality Inventory (MMPI), once had only one form, consisting of about 550 questions. The test required several hours to administer. In order to reduce client frustration, a short-form version was developed which contained about 350 questions. Analysis revealed that the shorter form had high concurrent validity with the longer form. That is, psychologists found the same results with the shorter form as with the long form, while reducing patient frustration and administration time.

A researcher wanted to determine whether anxious college students showed more preference for female role behaviors than less anxious students. To identify contrasting groups of anxious and non-anxious students, she could have had a large number of students evaluated for clinical signs of anxiety by experienced clinical psychologists. However, she was able to locate a quick, objective test, the Taylor Manifest Anxiety Scale, which has been demonstrated to have high concurrent validity with clinical ratings of anxiety in a college population. She saved considerable time by substituting this quick, objective measure for a procedure that is time-consuming and subject to personal error.1

Concurrent validity asks the question, "How closely does this instrument reflect the criterion established by another (usually more complex or costly) validated instrument?"

Construct Validity

Construct validity reflects the extent to which a research instrument measures some abstract or hypothetical construct.2 Psychological concepts such as intelligence, anxiety, and creativity are considered hypothetical constructs because they are not directly observable; they are inferred on the basis of their observable effects on behavior.3

In order to gather evidence on construct validity, the test developer often starts by setting up hypotheses about the differentiating characteristics of persons who obtain high and low scores on the measure. Suppose, for example, that a test developer publishes a test that he claims is a measure of anxiety. How can one determine whether the test does in fact measure the construct of anxiety? One approach might be to determine whether the test differentiates between psychiatric and normal groups, since theorists have hypothesized that anxiety plays a substantial role in psychopathology. If the test does in fact differentiate the two groups, then we have some evidence that it measures the construct of anxiety.4

Construct validity asks the question, "How closely does this instrument reflect the abstract hypothetical construct it seeks to measure?"

1 Borg and Gall, 279.
2 A construct is a theoretical explanation of an attribute or characteristic created by scholars for purposes of study. Merriam, p. 141.
3 Borg and Gall, 280.
4 Ibid.
5 Sax, 206.
6 Ary et al., 200.
7 David Payne, The Assessment of Learning: Cognitive and Affective (Lexington, Mass.: D.C. Heath and Company, 1974), 259.
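The known-groups approach just described can be sketched numerically. The anxiety-test scores below are invented; the t statistic (Welch's form, which does not assume equal group variances) simply quantifies how far apart the two group means lie relative to their spread:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic comparing the means of two independent groups."""
    na, nb = len(a), len(b)
    return (mean(a) - mean(b)) / sqrt(variance(a) / na + variance(b) / nb)

# Hypothetical anxiety-test scores for a psychiatric group and a normal group.
psychiatric = [34, 38, 41, 36, 40, 39, 37]
normal      = [22, 25, 28, 24, 27, 23, 26]

t = welch_t(psychiatric, normal)
print(f"t = {t:.2f}")  # a large absolute t suggests the test separates the groups
```

If the resulting t is large enough to be statistically significant, the test differentiates the groups as the anxiety theory predicts, and we have one piece of evidence for its construct validity.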
Reliability
Stability, Internal Consistency, Equivalence
Reliability is the "extent to which measurements reflect true individual differences among examinees."5 It is the "degree of consistency with which [an instrument] measures what it is measuring."6 The higher the reliability of an instrument, the less it is influenced by random, unsystematic factors.7 In other words, is an instrument confounded by the "smoke" and "noise" of human characteristics, or can it measure the true substance of those variables? Does the instrument measure accurately, or is there extraneous error in the measurements? Do the scores produced by a test remain stable over time, or do we get a different score every time we administer the test to the same sample?

There are three important measures of reliability: the coefficients of stability, internal consistency, and equivalence. All three use a correlation coefficient to express the strength of the measure. We will study the correlation coefficient in detail in Chapter 22. For the time being, we will merely state that a reliability coefficient can vary from 0.00 (no reliability) to +1.00 (perfect reliability, which is never attained). A coefficient of 0.80 or higher is considered very good.
Coefficient of Stability

The coefficient of stability, also called test-retest reliability,9 measures how consistent scores remain over time. The test is given once, and then given to the same group at a later time, usually several weeks later. A correlation coefficient is computed between the two sets of scores to produce the stability coefficient.

The greatest problem with this measure of reliability is determining how much delay to use between the tests. If the delay is too short, subjects will remember their previous answers, and the reliability coefficient will be higher than it should be. If the delay is too long, subjects may actually change in the interval. They will answer differently, but the difference is due to a change in the subjects, not in the test. This will yield a coefficient lower than it should be.10 Still, science does best with consistent, stable, repeatable phenomena, and the stability of responses to a test is a good indicator of the stability of the variable being measured.

Coefficient of Internal Consistency

The purpose of a test is to measure, honestly and precisely, variables resident in subjects. The structure of the test itself can sometimes reduce the reliability of the scores it produces. The coefficient of internal consistency11 measures consistency within a given test. It has two major forms.

The first is the split-half test. After a test has been administered to a single group of subjects, the items are divided into two parts. Odd items (1, 3, 5, ...) are placed in one part and even items (2, 4, 6, ...) in the other. Total scores of the two parts are correlated to produce a measure of item consistency. Since reliability is related to the length of the test, and the split-half coefficient reduces test length by half, a correction factor is required in order to obtain the reliability of the entire test. The Spearman-Brown prophecy formula, r′ = 2r / (1 + r), is used to make this correction.12 Here r′ is the corrected reliability coefficient and r equals the computed correlation between the two halves. If r = 0.60, then the formula yields r′ = 0.75.

Another measure of internal consistency can be obtained by the use of the Kuder-Richardson formulas. The most popular of these are known as K-R 20 and K-R 21. The K-R 20 formula is considered by many specialists in education and psychology to be the most satisfactory method for determining test reliability. The K-R 21 formula is a simplified approximation of the K-R 20, and provides an easy method for determining a reliability coefficient. It requires much less time to apply than K-R 20 and is appropriate for the analysis of teacher-made tests and experimental tests written by a researcher which are scored dichotomously.13 (A dichotomous variable is one which has two and only two responses: yes-no, true-false, on-off.)

Cronbach's Coefficient Alpha is a general form of the K-R 20 and can be applied to multiple-choice and essay exams. Coefficient Alpha compares the sum of the variances for each item with the total variance for all items taken together. If there is high internal consistency, coefficient alpha produces a strong positive correlation coefficient.

9 Borg and Gall, 283.
10 Ibid., 284.
11 Ibid., 284-5.
12 Ibid., 285.
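Both the Spearman-Brown correction and coefficient alpha are short computations. The sketch below reproduces the split-half example above (r = 0.60 corrects to 0.75) and computes alpha for an invented subjects-by-items matrix of dichotomous scores:

```python
from statistics import pvariance

def spearman_brown(r):
    """Correct a split-half correlation to full-test reliability: r' = 2r/(1+r)."""
    return (2 * r) / (1 + r)

def cronbach_alpha(scores):
    """Cronbach's coefficient alpha for a subjects-by-items score matrix."""
    k = len(scores[0])                                        # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])       # variance of totals
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# The chapter's split-half example: r = 0.60 corrects to r' = 0.75.
print(f"{spearman_brown(0.60):.2f}")

# Hypothetical scores: 5 subjects x 4 items (1 = correct, 0 = incorrect).
matrix = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(f"alpha = {cronbach_alpha(matrix):.2f}")
```

Note that the alpha ratio is unchanged whether item and total variances are computed as population or sample variances, as long as the same choice is used for both.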
Coefficient of Equivalence

A third type of reliability is the coefficient of equivalence, sometimes called parallel-forms or alternate-form reliability. It can be applied any time one has two or more parallel forms (different versions) of the same test.14 One can administer both forms to the same group at one sitting, or with a short delay between sittings. A correlation coefficient is then computed on the two sets of parallel scores.

A common use of this type of reliability is in a pretest-posttest research setting. If the researcher uses the same test for both testing occasions, he cannot know how much of the gain in scores is due to the treatment and how much is due to subjects remembering their answers from the first test. If one has two parallel forms of the same exam, and the coefficient of equivalence is high, one can use one form as the pretest and the other as the posttest.
Reliability and Validity

A test can be reliable and not valid for a given testing situation. But can a test be unreliable and still be valid for a given testing situation?
Answer 1: A Test Must be Reliable in Order to be Valid

You will read in some texts that an unreliable instrument is not valid. For example, Bell states, "If an item is unreliable, then it must also lack validity, but a reliable item is not necessarily also valid."15 Sax writes, "a perfectly valid examination must measure some trait without random error...in a reliable and consistent manner."16 Both of these statements subsume the concept of reliability under validity rather than depicting them as interdependent concepts. Nunnally agrees in the sense that "reliability does place a limit on the extent to which a test is valid for any purpose." Further, "high reliability is a necessary, but not sufficient, condition for high validity. If the reliability is zero, or not much above zero, then the test is invalid."17 Dr. Earl McCallon of North Texas University put it more directly in class: the maximum validity of a test is equal to the square root of its reliability.18 Therefore, test validity is dependent upon test reliability.

13 Borg and Gall, 285-6.
14 Ibid., 283.
15 Judith Bell, Doing Your Research Project (Philadelphia: Open University Press, 1987), 51.
16 Sax, 220.
Answer 2: A Test Can be Valid Even If It Isn't Reliable

Both Bell and Sax reflect what Payne calls the "cliche of measurement that a test must be reliable before it can be valid."19 Payne explains validity in terms of a theoretical inference. Validity is not strictly a characteristic of the instrument but of the "inference that is to be made from the test scores derived from the instrument." Payne differentiates between validity and reliability as interdependent concepts: validity deals with systematic errors (clarity of instructions, time limits, room comfort) and reliability with unsystematic errors (various levels of subject motivation, guessing, forgetting, fatigue, and growth) in measurement.20

Babbie uses marksmanship to demonstrate the inter-relationship of validity and reliability.21 High reliability is pictured as a tight shot pattern and low reliability as a loose shot pattern. This is a measure of the shooter's ability to consistently aim and fire the weapon. High validity is pictured as a shot cluster on target and low validity as a cluster off to the side. This is a measure of the trueness or accuracy of the sights. Using this analogy we can define four separate conditions:

1. High reliability and high validity: a tight cluster in the bull's-eye.
2. Low reliability with high validity: a loose cluster centered around the bull's-eye. (One could certainly question the "validity" of such data.)
3. High reliability with low validity: a tight cluster off the target.
4. Low reliability with low validity: a loose cluster off the target.
Payne and Babbie would hold that an instrument can be unreliable and still be valid. A yardstick made out of rubber or a measuring tape made out of yarn are valid instruments for measuring length, even though their measurements would not be accurate. Bell, Sax, and Nunnally would say a tape measure made of yarn is not valid if it cannot produce reliable measurements. McCallon demonstrates the boundary condition of Vmax = √R.

In the final analysis, whether we are aiming a rifle or designing a research instrument, our goal should be to get a "tight cluster in the bull's-eye." Use instruments which demonstrate the ability to collect data with high validity and high reliability.
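McCallon's boundary condition is easy to illustrate numerically. The short sketch below simply evaluates Vmax = √R for a few reliability values, showing how low reliability caps the validity an instrument can possibly attain:

```python
from math import sqrt

def max_validity(reliability):
    """Upper bound on a test's validity given its reliability: Vmax = sqrt(R)."""
    return sqrt(reliability)

for R in (0.25, 0.49, 0.81, 1.00):
    print(f"reliability R = {R:.2f}  ->  maximum validity = {max_validity(R):.2f}")
```

For example, a test with reliability 0.49 can have a validity of at most 0.70, no matter how well its content matches what it purports to measure.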
Objectivity

The third characteristic of good instruments is objectivity. Objectivity is the extent to which equally competent scorers get the same results. If interviewers A and B interview the same subject and produce different data sets for him, then it is clear that the measurement is subjective.22 Something about the subject is "hooking" the interviewers differently. The difference is not in the subject, but in the interviewers.

A pilot study which uses the researcher's instrument with subjects similar to those targeted for the study will demonstrate whether it is objective or not. This is particularly important in interview or observation studies, in which human subjectivity can distort the data being gathered.

The validation panel described under "Validity" also helps the researcher create an objective test. All items in an item bank should be as clear and meaningful as the researcher can make them. After the validation panel has evaluated and rated them, the best of the items can be selected for the instrument. This will filter out much of the researcher's own bias.

An illustration of the objective-subjective tension in instruments is the difference between essay and objective tests. The difference in grades produced on essay tests can be more related to the mood of the grader than to the knowledge of the student. A well-written objective test avoids this problem because the answer to every question is definitively right or wrong.

Whether you are planning to use an interview guide, an observation checklist, an attitude scale, or a test, you must work carefully to ensure that the data you gather reflects the real world as it is, and not as you want it to be.

17 Jum Nunnally, Educational Measurement and Evaluation, 2nd ed. (New York: McGraw-Hill Book Company, 1972), 98-99.
18 Class notes, Research Seminar, Spring 1983.
19 Payne, 259.
20 Ibid., 254.
Summary

The first element of the Great Triad is validity. The four types of validity (content, predictive, concurrent, and construct) focus on how well an instrument measures what it purports to measure. A fifth label, "face" validity, is nothing more than a subjective judgement on the part of the researcher and should not be used as a basis for validating instruments.

The second element of the Great Triad is reliability. The three approaches to reliability (stability, internal consistency, and equivalence) focus on how accurate the gathered data is.

The third element of the Great Triad is objectivity, which concerns the extent to which data is free from the subjective characteristics of the researchers.
Authentic scientific knowing is based on data that is...

VALID: it says what it purports to say,
RELIABLE: it says what it says accurately and consistently, and
OBJECTIVE: it says what it says without subjective distortion or personal bias.
Vocabulary

coefficient of stability: measure of steadiness, or sameness, of scores over time
concurrent validity: degree a new (easier?) test produces the same results as an older (harder?) test
construct validity: degree to which a test actually measures a specified variable (e.g., "intelligence")
content validity: degree to which a test measures course content
Cronbach's coefficient α: measure of the internal consistency of a test
coefficient of equivalence: measure of the sameness of two forms of a test
face validity: degree a test looks as if it measures stated content
coefficient of internal consistency: degree each item in a test contributes to the total score
Kuder-Richardson formulas: measures of internal consistency
objectivity: the degree that data is not influenced by subjective factors in researchers
parallel forms: tests used to establish equivalence
predictive validity: degree a test measures some future behavior
reliability: degree a test measures variables accurately and consistently
Spearman-Brown prophecy formula: used to adjust the r value computed in a split-half test
split-half test: procedure used to establish internal consistency
test-retest: test given twice over time to establish stability of measures
validity: degree a test measures what it purports to measure

21 Babbie, 118.
22 Sax, 238.
Study Questions

1. Define the terms "instrument," "validity," "reliability," and "objectivity."
2. Discuss the relationship between an operational definition and the procedures for collecting data.
3. Of these three essentials of research (clear research design, accurate measurement, precise statistical analysis), which is most important? Why?
Sample Test Questions

1. Which of the following is not part of the Great Triad?
a. predictive validity
b. internal consistency
c. instrument objectivity
d. empirical measurement

2. Content validity is most concerned with how well the instrument
a. predicts some future behavior
b. defines an hypothetical concept
c. matches the results of another instrument
d. measures a specific universe of knowledge

3. The coefficient of stability is more commonly known as the
a. split-half test
b. test-retest
c. K-R 20
d. parallel forms

4. Using Babbie's analogy of shots on a target, a tight cluster off to the side of the target would represent
a. high reliability with high validity
b. low reliability with high validity
c. high reliability with low validity
d. low reliability with low validity
Unit II: Research Methods

9 Observation
The Problem, The Obstacles, Practical Suggestions

In a sense, all scientific research involves observation of one kind or another. This is what empiricism means (review Chapter 1 if needed). But in this chapter we focus on observation as one specific research technique among many. In this sense, the term "observation" means "looking at something without influencing it and simultaneously recording it for later analysis."1

In observational research, we do not deal with what people want us to know (self-report measures) or with what some test writer believes he knows (tests and scales). Rather, we deal with actual people in real situations. People are seen in action. As such, observation is the most basic of techniques. The researcher, pad in hand, carefully observes the subjects he has selected in order to quantify the variables he is interested in. Deciding what to observe and whom to observe has already been discussed in more general terms. Here we will look at how to record what is seen and what mode of observation to use.

Before we move to practical steps in doing observational research, we must first consider the biggest problem in observational research. That problem is, quite simply, the human being who does the observing.
The Problem of the Observation Method

Observation is a natural process. We do it all the time. We look at and listen to people. We infer meanings, characteristics, motivations, feelings, and intentions. We "know" when someone is sincere or not. We can "feel" whether or not someone is telling the truth. And this is the problem. When an observer moves from the actions he sees to an inference of the motivation behind those actions, his observational data is as much related to who he is as to what his subjects do.

The major problem with observation is the fact that the observer is human! Observers have feelings, aspirations, fears, biases, and prejudices. Any one of these can influence and distort that which is being observed. Here are two examples:

An observer watches a group of children at play. One child turns to another and strikes him on the arm. The observer jots down "hostility." The event was "one child strikes another." The observer interpreted the act to be one of hostility, which is a complex construct.

1 June True, Finding Out: Conducting and Evaluating Social Research (Belmont, CA: Wadsworth, 1983), 159.
Two people watch a prominent television evangelist preach for ten minutes. One responds, “What courageous leadership! What a man of God!” The other responds “What a con man! He sure can manipulate people!” The difference in the data is in the observers, not in the evangelist. More data is needed to determine which of these two pictures is more correct.
These two examples illustrate inference, an enemy of valid and reliable data. When an observer infers motive from observed action, he adds something of himself to the data. Such data is distorted, invalid, and unreliable.

A second enemy is interference. The very presence of the observer can affect the behavior of the people being observed. Tell a Sunday School teacher you'll be visiting his class next Sunday, and you can expect a marked improvement in the preparation of the lesson. This factor is also the rationale for using "undercover agents" to infiltrate and observe criminal behavior as it really is. The presence of a uniformed police officer would certainly interfere with the criminal behavior.
Obstacles to Objectivity in Observation
Obstacles to objectivity in collecting data in observation research include personal interest, early decision, and personal characteristics.
Personal Interest “I see what I want to see.” I once had a lady church member who insisted that we never elect a divorced person as a Sunday School teacher. She quoted scripture and produced one reason after another why divorced persons would be the ruin of the church —until her own daughter got a divorce. It was not three weeks until this same lady was in my office, quoting scripture and complaining of how “the church does not care about divorced people” -- that we needed to give them opportunities for service — after all, “they’re people too!!!” The scripture had not changed, but she certainly had, because of her personal experience. We always have a personal interest in any study we conduct. If we did not, the process of giving birth to a research plan might be unbearable. But our personal interest should be directed toward collecting objective facts, not proving preconceived notions. If the study is intended from its inception to substantiate what you already believe, you will have difficulty seeing anything that contradicts this perspective. This is called selective observation, or, as we have noted, “I see what I want to see.”
Early decision It is part of the reality of human perception that we naturally and automatically “fill in the gaps” of what we know to be true. We add elements from our own imagination to make situations “reasonable.” The problem with this is that we can be deceived by our own imagination into creating a situation that does not exist in reality. When we have too few factual observations, we tend to fill in too much. This is the psychological basis of gossip: filling in the gaps between known data points with what we subjectively feel. The researcher needs a large number of objective data points from which to develop a theoretical pattern. By ending the observation phase prematurely, the researcher may interpret the data incorrectly. “I’ve seen enough. I can see the trend.” The trend may be an incorrect extrapolation from the facts.
Chapter 9
Observation
Personal characteristics Many of the things that characterize us as being “human” pose difficulties in the observation process: emotions, prejudices, values, physical condition. We can unknowingly make a faulty inference because of the subjective influence of one or more of these personal characteristics. They may be difficult to identify.2 Whatever we study, we must make every effort to insure that our data reflects that which we study and not ourselves. Objective observation checklists can help remove our personal biases and lack of neutrality concerning the chosen subject.
Practical Suggestions for Avoiding these Problems Here are some key guidelines to use if you plan to do an observational study.3
Definition Observation is the act of looking at something —without influencing it — and recording the scene or action for later analysis.
Familiar Groups Positively, studying a familiar group permits the use of previous experience with the group and established understanding of the subjects. Negatively, this very previous experience reduces the objectivity of the study. Further, revelation of discoveries within a familiar group can be perceived by group members as a betrayal of a trust. For example, a minister on a large church staff decides to study "interpersonal conflict in local church ministry," using his position as a platform for observation of staff meeting discussions. While his existing relationship with the staff (and further, the level of trust he enjoys with staff colleagues) will encourage more realistic behaviors, revelation of those behaviors through his study may well end his relationships!
Unfamiliar Groups Positively, studying an unfamiliar group reduces the effects of group identification and bias. In addition, observers notice things that insiders overlook. Unfamiliarity with the group improves objectivity in the data. Negatively, observers face problems in gaining access to unfamiliar groups, and, once involved, may have difficulty in understanding member actions within the group.
Observational Limits Observation is an intensely human process. It is a fact that observers simply cannot study some people. Factors such as gender, age, race, appearance, religious denomination, or political affiliation of observers may prevent access to some groups of subjects. These are just six of many possible barriers to observation.
Manual versus Mechanical Recording Manual recording refers to taking notes by hand during an observational session. Mechanical recording refers to recording the observations with tape recorders or video equipment. Manual recording of data is more difficult than mechanical, but can be simplified by using shorthand or tallies on observation checklists. Mechanical recording makes an exact record of all the data, but does nothing to simplify or reduce the bulk of the observations. Observational episodes must be analyzed at a later time.

2 Hopkins, 81
3 True, 175-176
Interviewer Effect Observation is an intensely human process! If subjects see observers taking notes, they may well change their behavior. (Interviewer effect is increased). Recording data surreptitiously decreases interviewer effect, but can be an invasion of privacy!
Debrief Immediately Write-ups of observation sessions have to be made promptly because observers -- being human! -- may selectively forget details, or unintentionally distort observations. Waiting until after the observational session is over to record responses greatly increases the likelihood that observer subjectivity will influence the data.
Participant Observation (Compare "Familiar Groups"). Positively, participant observers (i.e., observers who are members of the groups they observe), have easier access, and gain a truer picture of group behavior. Negatively, participant observers are restricted to one role within the group, and are more partial in their observations than a non-participant observer.
Non-participant Observation (Compare "Unfamiliar Groups"). Positively, non-participant observers have a clearer, less biased perspective on group behavior. Negatively, the presence of a known (non-member) observer alters the behavior of subjects, especially at the beginning of the study. Failure to announce the purpose for an observer being present in the group may be unethical.
Observational Checklist An observational checklist is a “structure for observation,” and allows observers to record behaviors during sessions quickly, accurately, and with minimal interviewer effect on behaviors. Dr. Mark Cook developed an “observer consistency checklist” for use in his study on active participation as a teaching strategy in adult Sunday School classes.4 He described his instrument this way: The observer consistency checklist was developed to be used by trained observers in examining each teaching situation for consistency across treatments. It was imperative in this study that all other elements in the lesson plan and teaching environment be held constant while allowing active participation to be the independent variable. This evaluation form included (a) a checklist of teacher factors (such as any unusual enthusiasm or behaviors), student factors (such as unusual interruptions or group behaviors), and unusual external factors (outside interruptions, weather, or equipment problems); (b) frequency counts of the number of external interruptions, disruptions by students, departures from the lesson, and active participation; (c) a five-point rating of teacher enthusiasm; and (d) a record of the time span of the lesson.5
A copy of the checklist is located at the end of the chapter.
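Cook's checklist combines per-factor frequency counts with a rating and timings. As an illustration only (the structure, factor names, and values below are hypothetical sketches, not drawn from Cook's actual instrument), one observation session's tallies might be kept like this:

```python
from collections import Counter

# Hypothetical sketch of one observation session's checklist data.
# Factor names and values are illustrative assumptions, not Cook's form.
session = {
    "lesson_type": "active",      # "active" or "nonactive" treatment
    "tallies": Counter(),         # frequency count per observed factor
    "teacher_enthusiasm": 4,      # five-point rating (1-5)
    "lesson_minutes": 52,         # time span of the lesson
}

def record(sess, factor):
    """Tally one episode of an observed factor during the session."""
    sess["tallies"][factor] += 1

record(session, "external interruption")
record(session, "student disruption")
record(session, "external interruption")

print(session["tallies"]["external interruption"])  # 2
```

Keeping counts rather than free-form notes is what lets a checklist reduce observer inference: the observer records that an episode occurred, not what it meant.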
4 Cook, 21
5 Ibid., 22
Summary The fundamental data gathering technique in science is observation. In this chapter we looked at the obstacles facing one who plans to do an observational study, as well as practical suggestions to help you plan an effective study.
Vocabulary
inference: researcher infers motivation behind observed behavior
interference: researcher changes observed behavior by his/her presence
interviewer effect: potential bias in data due to subjective factors in interviewers
observation: gathering data by way of objective observation of behavior
Study Questions
1. Define “observation research.”
2. Define in your own words the terms “inference” and “interference” as they relate to enemies of valid data. Give an original example of each term.
3. Explain how our “humanness” is a liability in observational research.
Sample Test Questions
1. The most basic approach of science to acquiring data is through
   a. statistical analysis
   b. standardized testing
   c. direct observation
   d. controlled experimentation
2. “I see what I want to see” is most closely related to which of the following obstacles?
   a. personal interest
   b. early decision
   c. personal characteristics
   d. subjective projection
3. By observing unfamiliar groups, researchers
   a. reduce the objectivity of the studies
   b. increase the introduction of their own personal bias into the data
   c. notice things insiders easily overlook
   d. employ their own personal experiences with the group
APPENDIX A6

OBSERVER CONSISTENCY CHECKLIST

Date: _______________________     Time: ________________________
Observer: ___________________     Teacher: ______________________

Observer Instructions: Place a checkmark for each episode of the following factors. Memo the significant events or factors under the comment section at the bottom of the form.

OBSERVED FACTORS                           ACTIVE LESSON    NONACTIVE LESSON

EXTERNAL FACTORS
Interruptions from outside class               _____             _____
Unusual weather                                _____             _____
Equipment problems                             _____             _____
Any other external factors                     _____             _____

STUDENT FACTORS
Students' experiences affect lesson            _____             _____
Student interruptions                          _____             _____
Hostile environment                            _____             _____
Unusual group behavior                         _____             _____

TEACHER FACTORS
Teacher experience affects lesson              _____             _____
Unusual teacher enthusiasm                     _____             _____
Unusual teacher behavior                       _____             _____
Different teaching style                       _____             _____
Variation from lesson plan                     _____             _____
Gave test answers                              _____             _____
Use of active participation                    _____             _____
Level of teacher enthusiasm (Scale: 1-5)       _____             _____

Time of lesson (record in minutes)             _____             _____
Attendance in the class                        _____             _____

COMMENTS: _________________________________________________
___________________________________________________________
___________________________________________________________
___________________________________________________________
6 Cook, 61
10
Survey Research

The Questionnaire
The Interview
Developing a Survey Instrument

Survey research uses questioning as a strategy to elicit information from subjects in order to determine characteristics of selected populations on one or more variables.1 A written survey is called a questionnaire; an oral survey is called an interview. Although they serve similar purposes in gaining information, each provides unique advantages and disadvantages to the researcher.
The Questionnaire The mailed questionnaire has been heavily criticized in recent times and has fallen into disfavor as a device for gathering data. But it has been the abuse and misuse of this technique that has drawn the criticism, not the nature of the questionnaire itself.2 Hastily constructed questionnaires, consisting of poorly worded questions, produce unreliable information at best and invalid results at worst. A planned, well-constructed questionnaire can obtain information that is obtainable in no other way.
Advantages A questionnaire provides researchers several advantages over the interview.
Remote Subjects
A questionnaire allows researchers to gather data from any part of the world. Through the use of existing postal systems, or, more recently, the internet, contact can be made with almost any literate population of interest. As a result, subjects can be randomly selected from wide-ranging populations, such as “Southern Baptists in America.”
Researcher influence The standardized wording of a printed questionnaire reduces researcher interference in subject responses. The researcher’s gender, appearance, mannerisms, social skills and the like have no effect on how subjects respond to the questions.
Cost
Even with the high cost of postage, the mailed questionnaire is still the most economical means, per subject, for gathering data. The economy of process allows researchers to increase the number of subjects in the study. Increased sample size provides more accurate estimates of population characteristics. Not only does the questionnaire save money directly, it also saves time. Consider the difference in processing time between mailing out 1000 questionnaires and interviewing 1000 subjects.
Dr. Jay Sedgwick of Dallas Theological Seminary (Southwestern Ph.D. graduate, 2003) analyzed differences in costs and data quality among three data collection techniques. He investigated direct collection from conference participants, e-mail responses to a website, and a traditional mailed survey. Conventional wisdom suggested that e-mail would provide quality data at greatly reduced costs. He found this not to be the case. Direct collection can be frustrated by restrictions imposed by conference leaders. Return rate was lowest among e-mail recipients -- and responses provided the least reliability. The mailed survey was the most expensive, but provided the best return rate and quality of data.

1 Gay, 191
2 Hopkins, 145
Reliability The standardized wording and structured questions of the questionnaire provide higher reliability in the data than can practically be obtained by interview.
Subjects’ convenience The questionnaire is completed at the subjects’ convenience. They can consider each question, check necessary records, and reflect on their answers. Data is more valid under these conditions than when answers are given "on the spot" in an interview.
Disadvantages
There are disadvantages in using a mailed questionnaire that are overcome by the interview. These include the questionnaire's rate of return, its inflexible structure, the level of subject motivation, the limitation of not observing the subject as questions are answered, and the loss of control over the questioning process.
Rate of return The biggest drawback in using questionnaires is the rate of return of the completed forms. Let me illustrate. You have drawn a representative sample from which to collect data. But when the questionnaires stop coming in, you find that only 35% of the sample responded. Why did 65% not respond? Are they different in some systematic way from the 35% who did? Does this have a bearing on your variables? You have no way of knowing. And this is a confounding variable (a source of error) in your study. Therefore, valid mail surveys have extensive follow-up procedures to produce the largest possible rate of return. How large? Some texts say 50%, some 60%. We suggest that doctoral students gathering data for their dissertation aim to get a 70% response rate or better. The return rate is computed as a percentage as follows:4
rate = (R / (S - ND)) x 100

where R equals the number of questionnaires that were returned, S the number sent out, and ND the number unable to be delivered (“return to sender”). For example, if you send out 180 questionnaires, and have 10 undelivered and 150 returned, your return rate is

rate = (150 / (180 - 10)) x 100
     = (150 / 170) x 100
     = (.882353) x 100
     = 88.24%

4 Hopkins, 148
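The return-rate formula is simple enough to sketch in a few lines of code; the function name below is ours, chosen only for illustration:

```python
def return_rate(returned, sent, undeliverable):
    """Return rate as a percentage: rate = (R / (S - ND)) x 100."""
    return returned / (sent - undeliverable) * 100

# The chapter's example: 180 sent, 10 undeliverable, 150 returned.
print(round(return_rate(150, 180, 10), 2))  # 88.24
```

Undeliverable forms are subtracted from the denominator because they never reached a potential respondent; they are not refusals.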
The major problem with a low rate of return is that the data may not reflect the true measure of the sample you chose to study. Part of the sample volunteered to comply with the research request, and returned the completed form. Others ignored the questionnaires. The difference in willingness to comply may relate to some aspect of your study. So, a low return rate (i.e., less than 50%) of survey forms may well give a distorted view of the target population. Higher return rates (60% - 80%) increase confidence that the returned data correctly reflects the sample, which, in turn, reflects characteristics in the population from which the sample was drawn.
Inflexibility The structure of a written questionnaire (which increases reliability of subject responses) also limits the researcher’s ability to probe subject responses or clarify misunderstandings. To write a questionnaire which directs subjects through a series of probes (follow-on questions which move the subjects deeper) and branches (skips to following sections) usually results in a complex, perhaps confusing, instrument. The written questionnaire is much more inflexible than the interview as a device for gathering data.
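The probes and branches described above amount to simple skip logic. Here is a minimal sketch; the questions and branch rules are invented for illustration:

```python
# Hypothetical skip-logic table: each answer may name the next question.
questions = {
    "q1": {"text": "Did you attend college?",
           "branch": {"yes": "q2", "no": "end"}},
    "q2": {"text": "What kind of college did you attend?",
           "branch": {}},
}

def next_question(current, answer):
    """Return the id of the next question, or "end" when the path stops."""
    return questions[current]["branch"].get(answer, "end")

print(next_question("q1", "yes"))  # q2
print(next_question("q1", "no"))   # end
```

A trained interviewer follows such a table effortlessly; printing it as written instructions for an untrained respondent is exactly what makes the paper form complex and confusing.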
Subject motivation There is no way to determine the motivation level of the subjects when they fill out the form. What is the subject's mental state: overworked, busy, contemplative, focused? The questionnaire cannot measure this as an interviewer would.
Verbal behavior only Questionnaire data is limited to the responses subjects choose to make. Researchers can’t know the mental or emotional state subjects are in when they complete the questionnaire. Researchers cannot observe how subjects behave while completing the questionnaire. They merely get subjects’ verbal responses about their behavior.
Loss of control Researchers give up control over the administration of the questions on the survey form. There is no control over the subject’s environment, time, or attention to the task. There is no control over the order in which the questions are answered. There is no control over leaving answers blank. This loss of control creates “missing data” or “distorted data,” which can pose problems in statistical analysis.
Types of questionnaires
Questionnaires consist of questions of two basic types: structured and unstructured. A structured question, sometimes called “close-ended,” provides a predetermined set of answers from which the subject chooses. Here is an example of a structured, or close-ended, question:

“What kind of college did you attend?
____ Evangelical college
____ Catholic college
____ Private secular college
____ State college
____ Other: ______________________________ (please describe)”
The advantage of this type of question over the unstructured (“open-ended,” see below) question is its greater reliability. It is a more reliable (consistent, stable) question because subjects are given specific responses from which to choose. The data from this type of question are more easily analyzed than data from open-ended items. The second type of question is the unstructured, or “open-ended,” question. This question asks the subject for information without providing choices. Here’s how the structured question above might be restated as an unstructured item. “Describe the kind of college you attended.”
This type of question allows subjects to respond in their own way, using their own terms and language. It is less restrictive, so it might uncover subject characteristics that would be missed by the close-ended type. The open-ended item, however, increases the likelihood that subjects will respond incorrectly (that is, in a way not planned by the researcher). One subject might answer the above question like this: “It was an expensive nightmare!” This tells the researcher how he felt about his college, but it does not answer the question he had in mind. Close-ended questions may miss important data points because they are restrictive. Open-ended questions may provide so many data points that the researcher cannot reduce them meaningfully. The answer? Use a survey form of open-ended questions in a pilot project to gather as many answers as possible. Then design a close-ended questionnaire for the actual study. This provides a valid base for the structured items, yet yields a reliable set of data for the study.
Guidelines Here are some specific guidelines for developing a questionnaire.
Asking questions The key to designing an effective questionnaire is asking good questions. A good question is specific, clearly presented, and generates an answer that is definite and
quantifiable. Asking unambiguous, meaningful questions is difficult. Researchers write questions according to standard guidelines (see Chapter 11). They then evaluate and revise questions as needed. Finally, questions are validated for “clarity” and “meaningfulness” by objective judges. The quality of the questionnaire is built directly on the quality of each question in it.
Clear instructions Questionnaire designers know how to fill out their questionnaires because they created them. It is easy to assume that anyone would know how to complete the form. Such assumptions can doom a survey study. Subjects need clear instructions for completing the survey. If there are several sections in the form, specific instructions should be given for each section.
Understandable format The order of questions in the questionnaire should not confuse subjects. Answers should be easy to select. Eliminate complex structures as much as possible (i.e., avoid probes into telescoped questions, or jumps to different sections in the form). A simple structure will produce more reliable data.
Demographics at the end “Demographic questions” describe the subject who is answering the questionnaire in general categories: age, gender, economic status, education level, and other such personal information. Place these questions at the end of the questionnaire. First, by placing content questions first, you lead subjects to make thoughtful responses quickly. After investing time to answer content questions, subjects are more likely to fill out the demographic questions, and -- most importantly! -- return the form to the researcher (increasing return rate!). When you place demographic questions first, subjects can get a feeling of invasion, and simply throw the document away. “Demographics last” increases the validity of answers and the return rate.
The Interview In its most basic form, the interview is an “oral questionnaire” where subjects answer questions “live,” in the presence of researchers or their assistants.
Advantages There are several key advantages to using an interview approach over the mailed questionnaire.
Flexibility
A face-to-face interview affords greater flexibility than the more rigid written questionnaire. Interviewers can branch from one set of questions to another without confusing the subject. The interviewer can clarify misunderstandings of questions or instructions. If a subject makes an unexpected comment, the interviewer can investigate with follow-up questions. The survey instrument can be more complex. This is because a trained interviewer is better able to handle branching and probing than the
untrained subject.
Motivation When interviewers and subjects are facing each other, the motivation level of subjects can be directly observed and noted. Rapport between the interviewer and subject can create a more cooperative atmosphere, which increases the validity of the subjects’ responses.
Observation Researchers can record the manner, as well as the content, of subjects’ answers. Mood, attitude, bias, emotional state, body language, facial expression —these are excellent clues to the quality of answers being received.
Broader Application Interviewers can gather information from people who cannot read. Young children, senior adults with poor eyesight, and groups for whom English is a second language can give better information through an interview than they can with a written questionnaire.
Freedom from mailings Interviewing subjects precludes all of the problems associated with mailing out (and getting back!) surveys: postage and materials’ costs, bad addresses, return mail, return rates, and the like.
Disadvantages Likewise, there are some major disadvantages with the interview.
Time Questioning scores of subjects one by one, in person, requires far more time than sending out survey forms by mail. In order to acquire a sufficient sample size of subjects, researchers may need to enlist and train a group of assistants to help in the interviewing. The training of interviewers is a monumental task and requires a great deal of time to insure that all the interviewers administer the survey the same way.
Cost While the cost of postage is avoided by interviewing subjects, interviewing involves other expenses. Payment of assistants is more expensive than stamps, but is necessary if you plan to do a professional study. The printed interview guide will cost about the same to print as a comparable questionnaire. Additionally, interviewing may require travel costs or long distance phone costs. This means that, given a set research budget, the number of subjects you can interview will be less than the number you can survey by mail. This results in a loss of statistical power in your study.
Interviewer effect
Do you remember the problems of inference and interference associated with observation research (Chapter 9)? All of the “human” problems we discussed regarding observational research apply to interviewers as well. Personal characteristics, social skills, competence, gender, appearance — all of these factors will produce variance in subject responses to questions, unless they can be controlled by homogeneous enlistment and adequate training.
Interviewer variables Differences among interviewers — their values, beliefs, and biases — may introduce distortion in the way interviewers interpret and record responses by subjects.
Types of Interviews Earlier in the chapter we defined questions which are “structured” (close-ended) and “unstructured” (open-ended). A “structured interview” is simply an oral questionnaire. Researchers ask the questions in the order they appear on the form. An unstructured, or “free response,” interview presents the subject with open-ended questions. Researchers can follow up answers with probes and skips without confusing subjects. Just as the structured question increases reliability and decreases the range of answers, so does the structured interview. Just as the unstructured question increases answer variance and decreases the ability to quantify research data, so does the unstructured interview.
Guidelines Here are some specific guidelines to consider if you plan to use the interview.
Recording responses Subject responses need to be accurately recorded during the interview. Recording the responses after the interview invites problems with subjective interpretation, selective memory, or unconscious bias.
Interviewer skills Before the study begins, interviewers should be given adequate practice in asking questions, fielding responses, probing, clarifying instructions, and recording answers. If skill levels differ among the interviewers, extraneous variability will be introduced into the data, making findings ambiguous.
Demographics first
Ask demographic questions first in the interview. By asking non-threatening demographic questions at the beginning of an interview session, researchers establish rapport between themselves and subjects. Such rapport improves the level of trust between researchers and subjects, which, in turn, increases the validity of answers received. Demographics come FIRST in the interview, LAST in the questionnaire.
Alternative modes
The face-to-face interview is only one mode of interviewing. Researchers can conduct interviews by telephone. This extends the range of the interview far beyond that possible with face-to-face meetings. Researchers can also mail cassette tapes to subjects. The subject listens to the question on tape and records his answer. This is less expensive than interviewing by phone, and extends the interview beyond that possible with face-to-face meetings. These modes provide more subject information than the written questionnaire. Voice characteristics, subject hesitation, and tone of voice provide clues to subject motivation. Still, none of these alternatives permit direct observation of the subject as in the face-to-face meeting.
Developing the Survey Instrument The following steps should be taken in developing a questionnaire or interview guide (See Borg and Gall, Chapter 11, for details).
Specify Survey Objective Determine the objective of the survey. What is the focus of the research? What exactly do you need to know? What are the related areas? Include in the instrument only what is needed for the study.
Write Good Questions Develop an “item pool” of good questions (i.e., more questions than are needed) which relate to the study. Each question should be clear and definite. Each should generate an answer that is clear and quantifiable.
Evaluate and Select the Best Items Submit the item pool to a panel of evaluators. This panel should consist of 5 to 8 experts in either the content area of the survey, or research design, or both. Have them rate each item in the pool on the basis of relevance to the study, and clarity of composition. Combine the ratings of all the judges for all the items to determine which items are best suited for your study.
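Combining the judges' ratings can be as simple as averaging each item's scores and keeping the items that clear a cutoff. The 1-5 scale, the cutoff value, and the item labels below are illustrative assumptions, not prescribed by the text:

```python
# Hypothetical panel ratings: item -> one relevance/clarity score per judge.
ratings = {
    "item 1": [5, 4, 5, 4, 5],
    "item 2": [3, 2, 3, 2, 2],
    "item 3": [4, 5, 4, 4, 4],
}

CUTOFF = 4.0  # assumed minimum mean rating for inclusion

# Keep items whose mean rating across all judges meets the cutoff.
selected = [item for item, scores in ratings.items()
            if sum(scores) / len(scores) >= CUTOFF]

print(selected)  # ['item 1', 'item 3']
```

However the ratings are combined, the point is the same: items survive because several independent judges rated them relevant and clear, not because the researcher likes them.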
Format the Survey Place the questions in an attractive format that enhances transition from question to question. If the instrument is to be used as a written questionnaire, provide an easy way to record responses.
Write Clear Instructions Write clear, concise instructions to insure that the survey is done correctly. Let several people who are unfamiliar with the study read the instructions, and explain what they would do to complete the instrument. Revise the instructions as needed.
Pilot Study Select a group of people similar to those who will be involved in the actual study. Use the instrument to gather data from them. Check for any problems the pilot group encountered while completing the form. Ask the group for suggestions. Revise the instrument as needed.
Summary Survey research gathers specific data from a large group of people who possess that data. We have outlined advantages, disadvantages, and guidelines for using the mailed questionnaire and the personal interview.
Examples
Dr. Margaret Lawson designed her own questionnaire to gather data for her study of selected variables and their relationship to whether or not Life Launch pilot churches (1987-88, n=120) continued offering LIFE courses (MasterLife, Experiencing God, Parenting by Grace, and the like, 1992-93).5 She collected data on what courses were offered, who led the courses (pastor, staff or lay), how the materials were paid for (participants paid full, part, or none), as well as attendance in Sunday School and Discipleship Training, church membership, number of baptisms, gifts and initiated ministries. Her survey instrument is located at the end of the chapter. Her procedure for developing the survey form was as follows:6
The steps in developing the survey instrument were as follows:
1. Questions were designed for subjects' responses to reflect information on the factors present in those churches that did, and those that did not, continue to offer LIFE courses. The same two-page questionnaire was sent to all the churches. Drew and Hardman suggest that respondents are more likely to complete a one- or two-page questionnaire.91
2. A validation panel of experts drawn from the areas of adult discipleship training, research design, and the field of religious education were asked to rate the relevance and clarity of each question. . . . Following the panel's critique and evaluation eight surveys were returned. Suggestions were offered by Avery Willis and Clifford Tharpe and the appropriate revisions and modifications were incorporated.93
Dr. Darlene Perez developed her Spanish-language survey to gather information from youth and youth leaders in Puerto Rico concerning youth curriculum materials. Here was her procedure:7

The Youth Sunday School Curriculum Questionnaire was designed to obtain data related to the youth curriculum variables identified in the problem statement. The procedures for designing the instrument followed guidelines in . . . Research Design and Statistical Analysis for Christian Ministry. . . . The first step . . . consisted of stating the purpose of the study with clear instructions on how to complete the questionnaire. Second, an item pool of questions was developed. The questions were written in an objective, structured, and close-ended form. They were designed to obtain information about the curriculum being used by participants, the degree of curriculum satisfaction, the disposition to change curriculum, the preference for a Bible study approach, and the preference for a teaching/learning method. Third, the questionnaire included a section at the end for demographic information. A copy of this questionnaire is provided as appendix H. . . .

The questionnaire was submitted to a validation panel of seven experts in the areas of education, curriculum development, or youth knowledge. Each panel member considered points of clarification and the validity of each item. The best, most clear, and most valid questions were selected for the survey. . . . A proposed pilot study with youth and youth leaders not included in the research was to be completed in Puerto Rico. The validation procedures with the pilot groups included the following steps:

1. The Sunday School Board provided a list of Baptist and non-Baptist churches in Puerto Rico currently using the Spanish Convention Uniform Series. A non-Baptist, evangelical church (Alianza Cristiana y Misionera, Río Piedras, Puerto Rico) was selected for the pilot study. The questionnaire was submitted during a youth Sunday School class to a group of thirteen youth and three youth leaders. Corrections were made to clarify the instructions on how to complete the questionnaire. Also, the term "youth" (joven) was changed to Intermedios y Pre-jóvenes, along with a parenthesis stating the ages twelve to seventeen.

2. After making corrections, it was felt that the instrument needed further validation. A second validation pilot study was performed with a group of thirty youth and youth leaders from the Baptist Convention of Puerto Rico who were meeting at a youth camp during July 1990. After this validation process, the following changes were made. . . [six changes listed].

3. In order to make the validation process more consistent, a third pilot study was performed with a group of thirty youth and youth leaders from the Puerto Rico Southern Baptist Association at a youth camp in July 1990. Only a few corrections were made in the section of demographics. . . [two changes listed].

A copy of the validated questionnaire appears as appendix I. [The English-language version is included at the end of the chapter.]

5 Margaret P. Lawson, "A Study of the Relationship Between Continuance of LIFE Courses in the LIFE Launch Pilot Churches and Selected Descriptive Factors" (Ph.D. dissertation, Southwestern Baptist Theological Seminary, 1994).
6 Ibid., 25-26.
7 Perez, 55-58.
Vocabulary
close-ended question: type of question which provides a set of answers to choose from (a b c d)
demographics: personal data on subjects (gender, ed level, years in ministry)
item pool: a collection of test items from which a subset is drawn for creating an instrument
open-ended question: question which allows subject to answer in his/her own words
rate of return: percentage of mailed questionnaires which are completed and returned
structured question: synonym for close-ended question
unstructured question: synonym for open-ended question
validation panel: judges who analyze the clarity and relevance of questions in an item pool
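The "rate of return" entry above can be made concrete with a small sketch (the numbers and the function name are hypothetical): questionnaires returned as undeliverable never reached a subject, so they are subtracted from the number mailed before the percentage is computed.

```python
def rate_of_return(mailed, undeliverable, completed):
    """Percentage of deliverable questionnaires completed and returned."""
    deliverable = mailed - undeliverable
    return 100 * completed / deliverable

# Hypothetical mailing: 500 sent, 50 came back undeliverable, 180 completed.
print(rate_of_return(500, 50, 180))  # 40.0
```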
Study Questions

1. Compare and contrast the advantages and disadvantages of the interview and questionnaire.
2. Define "structured" or "close-ended" questions. Give an example.
3. Define "unstructured" or "open-ended" questions. Give an example.
4. Discuss the pros and cons of using structured or unstructured questions.
5. Differentiate the handling of demographic questions in the questionnaire and interview.
Sample Test Questions

1. The criticism of survey research is based primarily on the
   A. lack of depth of information gained by the survey approach
   B. availability of better data gathering instruments
   C. absence of good statistical tools to analyze survey data
   D. abundance of poorly constructed survey instruments

2. One major advantage of the questionnaire is that it
   A. generally produces a high return rate
   B. possesses a high degree of flexibility
   C. eliminates the researcher’s influence on subjects
   D. focuses only on the verbal behavior of subjects

3. You send out 1000 questionnaires. 200 are returned marked "Addressee unknown — Return to Sender." 400 are completed and mailed back. Your rate of return is
   A. 50%   B. 400   C. 40%   D. 600

4. The best advantage of close-ended questions is the ____ of the answer.
   A. reliability   B. flexibility   C. range and depth   D. correctness

5. An open-ended question
   A. decreases the validity of the answer
   B. increases the reliability of the answer
   C. increases the variability of the answer
   D. increases the objectivity of the answer

6. A major disadvantage of the interview is
   A. its broad application
   B. its inflexibility
   C. the higher cost of the data
   D. the limitation of measuring verbal behavior only
LIFE LAUNCH SURVEY
Please complete the information requested concerning LIFE courses in your church at the time of the LIFE Launch project and the present time. FIRST YEAR refers to the reporting year following the LIFE LAUNCH, October 1987 to September 1988. LAST YEAR refers to the latest reporting year, October 1992 to September 1993.
1. What LIFE courses did you offer in the first year of the LIFE Launch?
   MasterLife    MasterBuilder    MasterDesign    Parenting by Grace
   None    Other (please specify) __________

2. What LIFE courses have you offered during the last year?
   MasterLife    MasterBuilder    MasterDesign    DecisionTime
   Parenting by Grace I    Parenting by Grace II    Covenant Marriage
   WiseCounsel    Disciple's Prayer Life    Experiencing God
   Step by Step Through the Old Testament
   Step by Step Through the New Testament
   LifeGuide to Discipleship and Doctrine
   None    Other (please specify) ______________________________

3. Which staff member began the initial LIFE courses?
   Pastor    Associate Pastor    Minister of Education    Other (please specify) __________

4. Did any lay person have a leadership position from the beginning?    Yes    No

5. Has a staff person led LIFE courses in the past year?    Yes    No

6. Has a lay person led LIFE courses in the past year?    Yes    No
(OVER)

7 Lawson, 65-66.
7. Indicate how participants paid for their study materials in the first year:
   Participants paid full price    Participants paid some of the cost
   Materials were provided free of charge    Other (please specify) __________

8. Indicate how participants paid for their study materials in the past year:
   Participants paid full price    Participants paid some of the cost
   Materials were provided free of charge    Other (please specify) __________

9. Indicate the total number of participants in all LIFE groups:
_______ FIRST YEAR
________ LAST YEAR
10. Indicate the average number of participants in individual LIFE groups: _______ FIRST YEAR
________ LAST YEAR
11. Complete the following information about your church during the LIFE Launch year:
_____ Resident Church Membership    _____ Average Sunday School Attendance
_____ Total Baptisms    _____ Average Discipleship Training Attendance
_____ Total Gifts

12. Complete the following information about your church during the past year:
_____ Resident Church Membership    _____ Average Sunday School Attendance
_____ Total Baptisms    _____ Average Discipleship Training Attendance
_____ Total Gifts

13. What specific ministries have been initiated by LIFE course participants?
Please return the completed survey to:
Margaret Lawson
address
address
city, state

Would you like to receive a summary of the results of the survey? ______________
APPENDIX I

VALIDATED YOUTH SUNDAY SCHOOL MATERIALS QUESTIONNAIRE (ENGLISH TRANSLATION)8

The purpose of this questionnaire is to obtain basic information about the Sunday School youth materials being used in your church and to identify the curriculum preferences of youth and youth leaders.

Instructions: Select with a check mark (✓) the best alternative. Choose only one response for each question.
1) Which Sunday School materials are currently being used in your church?
__ 1. El Intérprete (Convention Uniform Series of the Sunday School Board)
__ 2. Enseñanza Bíblica Para Jóvenes (Diálogo y Acción Program of The Spanish Publishing House)
__ 3. Materials designed in your own church.
__ 4. Exploradores y Embajadores (Editorial Vida, Miami, Florida)
__ 5. Other, specify: ____________________________________________
2) How satisfied are you with the Youth Sunday School materials used in your church?
__ 1. Very satisfied (I like it very much)
__ 2. Satisfied (I like it)
__ 3. Dissatisfied (I do not like it)
__ 4. Very dissatisfied (I do not like it at all)

3) Are you interested in changing Youth Sunday School materials?
__ 1. Yes    __ 2. No    __ 3. Indifferent

4) If you were going to change Youth Sunday School materials, which Bible study approach would you prefer?
__ 1. I would like to study the Bible systematically, book by book, covering the whole Bible within a certain period of time.
__ 2. I would like to study the Bible by themes that relate to daily life, such as the family, friendships, the community, and others.
__ 3. I would like to study the Bible by doctrinal themes, such as the doctrine of God, Jesus, the Holy Spirit, Church, Bible, prayer, and others.
__ 4. I would like to have Bible studies about discipleship, Christian growth and formation.
8 Perez, 108-109.
5) If you were going to change the Youth Sunday School materials, which teaching/learning methods would you prefer?
__ 1. Conference -- The teacher would present and explain the Bible passage.
__ 2. Questions and answers -- The teacher would use questions to promote group participation.
__ 3. Small group work -- The class would be divided into small groups. Each group is assigned a task and will report its findings to the whole class.
__ 4. Individual tasks -- The teacher would assign questions or tasks to each student, and he/she would work independently.
__ 5. Other, specify: _________________________________________________
Please complete the following information:

Position:
___ Youth Pastor
___ Youth Minister
___ Minister of Christian Education
___ Sunday School Director
___ Youth teacher
___ Other

Sex:  ___ Male  ___ Female        Age: ___

Denomination:  ___ Southern Baptist  ___ American Baptist
___ Other, specify: ___________________________________

Church name: ____________________________________________

Have you completed this questionnaire before?  ____ yes  _____ no

Comments/suggestions:
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
Writing Tests
Chapter 11
11
Developing Tests

Preliminary Considerations
Objective Test Items
Essay Test Items
Item Analysis
A test is an instrument which measures a subject’s knowledge, understanding, or skill in a given content area, and produces a ratio score reflecting that measure. If the focus of a study is "testing subjects" on some variable (Bible knowledge, comprehension of various translations, current events), an appropriate test must be found, or one must be developed. This chapter introduces you to principles of developing tests.
Preliminary Considerations
You may be able to use existing tests for your study. Let’s say the nature of your study is to identify a relationship between “job satisfaction” and “interpersonal dynamics among staff members.” You may be able to find an existing test which will measure “job satisfaction.” Check the Mental Measurements Yearbook, Tests in Print, or other such resources for published tests in your area of interest. Tests can also be found in the research articles gathered for the Related Literature section of your proposal. Study the validity and reliability scores on the test, the population(s) the test was designed for, and the conditions of test administration. If these factors fit your study, you’re in business! Describe these characteristics in the “Instrument” section of your proposal.

You may need, however, to develop your own test, since there are many areas in the field of Christian education that do not yet have tests. This chapter focuses on the procedure to use in developing such a test for use in a larger dissertation context. Good tests gather good data. Good tests build good attitudes. Good tests can even produce a positive learning experience. The principles discussed here will help you in this task.

The Emphases in the Material
A test should measure important areas of instruction, knowledge, understanding, or skill. The emphases of the test should parallel emphases in the material which subjects have learned. Avoid writing trivial, ambiguous, or simplistic questions.

Nature of the Group Being Tested
Study the group you intend to test. The level of difficulty of the test, the language you use, the length of the test, and other such variables depend a great deal on who your subjects are.
The Purpose of the Test
What is the purpose of the test? What do you really want to know? Are you measuring knowledge, or comprehension, or the ability to solve problems, or to analyze new situations? Are you measuring simple recall, or mental reasoning? The purpose of the test provides a “North Star” to guide your developmental process.
Writing Items
Avoid ambiguous or meaningless test items. Use good grammar. Avoid rambling or confusing sentence structure. Use items that have a “definitely correct” answer. Avoid obscure language and “big words,” unless you are specifically testing for language usage. Be careful not to give the subject irrelevant clues to the right response; using “a(n)” rather than “a” or “an” is one way to avoid such clues. In short, a test should not present any barrier to subjects apart from demonstrating mastery of the test content. Otherwise, scores reflect more “noise” than “true measure.”

Objective Tests
An “objective” test is a test made up of close-ended questions. Objective tests have several advantages over essay tests. Asking 100 objective questions over a given content field provides a much better sampling of examinee knowledge and understanding than asking three or four essay questions. With objective tests, grading is easier and the scores are a more reliable measure of what the examinee knows. There are four common types of objective questions. These are the constant alternative (true-false) question; the changing alternative (multiple choice) question; the supply (or fill-in-the-blank) question; and the matching question.1
The True-False Item
The true-false, or constant alternative, item presents the subject or student with a factual statement. The statement is judged to be either true or false.

Advantages
The advantages of the true-false test item are efficiency and potency. It is efficient in that a large number of items can be answered in a short period of time. Scoring is fast and easy. It is potent because it can, in a direct way, reveal common misconceptions and fallacies.
1 The material in this chapter is a synthesis of principles gleaned from Nunnally, “Chapter 6: Test Items,” 153-196; and Payne, “Chapter 5: Constructing Short Answer Achievement Test Items,” 95-136. These are excellent resources for those wanting to improve their test-writing ability. Another excellent (more recent) source is Tom Kubiszyn and Gary Borich, Educational Testing and Measurement: Classroom Application and Practice, 2nd ed. (Glenview, IL: Scott, Foresman and Company, 1987). More recent material can also be found in my own Created to Learn (1996), Chapter 14, and Called to Teach (1999), Chapter 9, both from Broadman and Holman.
Disadvantages
Good true-false items are hard to write. An item that makes sense to the writer may confuse even well-informed subjects. Statements require careful wording, evaluation, and revision.

Second, true-false items encourage guessing. With only two alternatives, an examinee who knows absolutely nothing about the subject can still earn around 50% of the test score over the long run by pure chance. This is a lot of “noise” in the test scores.

Third, constant alternative items tend toward response sets. A response set is a repetitious pattern of answers, like the following 18-item answer key:

T T F   T T F   T T F   T T F   T T F   T T F

Notice that the pattern “T T F” repeats itself through the test. Test writers can produce these response sets without being aware of it. Subjects pick up these irrelevant clues and score higher than their knowledge allows. The objective is not to ensure high scores, but to actually measure what subjects know and understand.
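A test writer can check an answer key for an unintended response set with a few lines of code. The sketch below (the function name and period limit are illustrative, not from the text) looks for a short prefix whose repetition reproduces the whole key.

```python
def response_set_period(key, max_period=4):
    """Return the shortest period p (up to max_period) such that the
    answer key is just its first p answers repeated; None otherwise."""
    for p in range(1, max_period + 1):
        if all(key[i] == key[i % p] for i in range(len(key))):
            return p
    return None

print(response_set_period("TTFTTFTTFTTFTTFTTF"))  # 3 -- the "T T F" pattern above
print(response_set_period("TTFTFFTTT"))           # None -- no short repeating pattern
```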
Writing True-False Items
The following guidelines will help you avoid major pitfalls in writing true-false test items.

Avoid specific determiners
Specific determiners, such as “only,” “all,” “always,” “none,” “no,” or “never,” give irrelevant clues to the correct answer. When you find these terms in a true-false item, the answer is usually FALSE. Terms like “might,” “can,” “may,” or “generally” are usually true. Write items without using these terms.

Use absolute statements
Base true-false items on statements that are absolutely true or absolutely false. Avoid statements that are true under some conditions but not others, unless the conditions are specifically stated. Well-informed subjects have greater difficulty answering ambiguous questions correctly, because they have more information to process in trying to understand the item.

Avoid double negatives
A double negative is confusing. “T F It is not infrequently observed that three-year-olds play in groups.” State the item positively: “T F Three-year-olds play in groups.” The latter item tests knowledge of three-year-olds and social development. The former requires knowledge plus practice in “mental gymnastics.”
Use precise language Avoid using terms like “few,” “many,” “long,” “short,” “large,” “small,” or “important” in test items. These terms are ambiguous. How much is enough to determine the truth or falseness of a T-F question? How big is “big”? How many is “many”?
Avoid direct quotes
If the treatment has been a classroom situation or a series of directed readings, do not test over direct quotes from class notes or the readings. These, taken out of context, are usually too ambiguous to use as test items.

Watch item length
Avoid making true statements longer than false items. This is easy to do because true statements often need qualifications to “make sure they are absolutely true.” The additional length is an irrelevant clue to the answer.

Avoid complex sentences
Complex grammatical constructions and obscure language infuse questions with an irrelevant level of difficulty. Take a central idea and write two simple statements, one true and one false. Place these in your item pool.

Use more false items
When developing a True-False test, make about 60% of the items false. False items discriminate better between examinees than true items.

Yountian modification: Improve the reliability of the test scores by having subjects correct false items to make them read correctly. Underline the most important concept in the statement, and have them change it to make the statement true. Score one point for the correct answer and one point for correcting the statement. This reduces guessing and increases the reliability of scores.
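The two-point rubric just described might be scored per item as in the following sketch (the function and argument names are my own, for illustration only):

```python
def score_item(is_false, marked_false, correction_ok=False):
    """One point for the correct true/false judgment; on false items,
    a second point for correctly fixing the underlined concept."""
    points = int(marked_false == is_false)
    if is_false and marked_false:
        points += int(correction_ok)
    return points

print(score_item(is_false=False, marked_false=False))                     # 1: true item judged true
print(score_item(is_false=True, marked_false=True, correction_ok=True))  # 2: false item caught and corrected
print(score_item(is_false=True, marked_false=True, correction_ok=False)) # 1: caught but not corrected
```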
Multiple Choice Items
The multiple choice, or changing alternative, item consists of a sentence stem and several responses. One and only one of the responses is correct. All other responses are incorrect, but plausible. The most common form presents a stem and four or five responses.
Advantages
The multiple choice question, with its multiple responses, can be written with less ambiguity and greater structure than the true-false question. Guessing is reduced, since the probability of guessing the correct answer is 1 in 4 (25%) instead of 1 in 2 (50%) for true-false items. Multiple choice items can demand more subtle discrimination than other forms of objective questions. Lastly, one can write multiple choice items which test at higher levels of learning, such as application and analysis, than other question types.
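The chance-score figures cited here (50% for true-false, 25% for four-option multiple choice) are easy to verify with a short simulation; this is a sketch with arbitrary trial counts, not anything from the text.

```python
import random

def expected_guess_score(n_items, n_options, trials=10_000, seed=1):
    """Average percent correct when every answer is a blind guess."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        for _ in range(n_items):
            # A guess is correct whenever it happens to match the keyed answer.
            correct += rng.randrange(n_options) == rng.randrange(n_options)
    return 100 * correct / (trials * n_items)

print(round(expected_guess_score(50, 2)))  # about 50: true-false
print(round(expected_guess_score(50, 4)))  # about 25: four-option multiple choice
```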
Disadvantages
Good multiple choice questions are difficult to write. Effective distractors — plausible wrong answers — are hard to create, particularly if you are providing a fifth or sixth alternative response. Second, multiple choice tests are less efficient, because a subject can process fewer multiple choice items in a given time than other types.
Writing Multiple Choice Items
The following guidelines will help you avoid major pitfalls in writing changing alternative items.
Pose a singular problem
The stem of the question should pose a clear, definite, singular problem. A common mistake in multiple choice questions is the use of an incomplete stem. “In the continent of Africa...” could be followed with any number of responses that “fit.” Even better (though not a requirement) is to make the stem a complete sentence or a direct question, rather than a sentence fragment.
Avoid repeating phrases in responses
Rather than putting the same phrase in every response, include the phrase in the stem. Keep the alternative responses as simple as possible.
Minimize negative stems
Avoid negative stems if possible. “Which of the following is NOT a characteristic of....” This construction can confuse some subjects who might otherwise know the material.

Make responses similar
Avoid making the correct response systematically different from the others (grammar, length, construction). Responses should be written in parallel form so that the form of the response is not a clue to the correct answer.

Make responses mutually exclusive
Each response should be mutually exclusive of all others. Avoid overlapping responses.

Make responses equally plausible
All responses in an item set should be equally plausible and attractive to the less knowledgeable subject.

Randomly order responses
Responses (ABCD) should be randomly ordered for each question. Some test writers hesitate to place the proper answer first (A) because “subjects won’t read the others” or last (D) because “that’s an obvious place for the right answer.” That leaves “B” or “C” for the majority of correct responses. Use a random number table, a computer, or even a die to assign the order of responses and avoid unintentional response sets.
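Using a computer for this random ordering takes only a few lines; here is a sketch using Python's random module (the helper name and sample item are my own, not from the text).

```python
import random

def randomize_responses(stem, correct, distractors, rng=random):
    """Shuffle the correct answer in with its distractors and return the
    formatted item plus the letter of the correct response."""
    options = [correct] + list(distractors)
    rng.shuffle(options)
    letters = "ABCD"[:len(options)]
    lines = [stem] + [f"  {letter}. {text}" for letter, text in zip(letters, options)]
    return "\n".join(lines), letters[options.index(correct)]

item, answer = randomize_responses(
    "A 'response set' is",
    "a repetitious pattern of answers",
    ["a panel of expert judges",
     "a pool of test items",
     "a set of demographic questions"])
print(item)
print("Correct response:", answer)
```

Because the position of the keyed answer is drawn fresh for every item, no unintentional pattern of As, Bs, or Cs can creep into the answer key.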
Avoid sources of irrelevant difficulty
Avoid irrelevant sources of difficulty in the statement of the problem or in the responses. Some test writers confuse subjects by using complex vocabulary, for instance. Do you want to know what the subject knows, or do you want to test his vocabulary?
Eliminate extraneous material
Do not include extraneous material in a question. That is, do not attempt to mislead examinees by including information not necessary for answering the question.
Avoid “None of the Above”
Alternative responses such as “none of the above,” “all of the above,” and “both b and d” should be eliminated if possible. The uninformed thinker can use simple reasoning skills to eliminate several alternatives. But the informed non-thinker may not be able to correctly manipulate the problem. Unless the test is supposed to measure reasoning ability, do not make “ability to reason” part of the score.
Supply Items
Supply items, sometimes called “recall” or “fill in the blank” items, present a statement with one or more blanks. The task of the subject is to fill in the blank(s) with the most appropriate terms in order to correctly complete the statement.

Advantages
First, supply items are relatively easy to construct. Second, they are efficient, in that a large number of statements can be processed in a given length of time. Third, remembering a term or phrase is more difficult than recognizing it in a list or response set; therefore, supply items discriminate better between subjects’ knowledge of important definitions and concepts.

Disadvantages
Supply items are notorious for being ambiguous. It is difficult to write a supply item that is clear and plainly stated. Grading is also ambiguous, because usually more than one word will adequately fill the blank; depending on how synonyms are handled, grading can be arbitrary and unfair.

Writing Supply Items
The following guidelines will help you avoid major pitfalls in writing supply items.
When to use supply items
In general, only use supply items when the correct response is a single word or brief phrase.

Limit blanks
Use only one or two blanks in a supply item. The greater the number of blanks, the greater the item ambiguity and the more difficult the grading.

Only one correct answer
Write the item in such a way that only one term or word will correctly complete the statement. If there are equally acceptable terms for a given concept (i.e., “null” and “statistical” hypothesis), then credit should be given for either answer.

Blank important terms
Leave only the most important word or term blank. Blanking out minor words makes the item trivial.
Place blank at the end
In most cases, it is preferable to place the blank at the end of the sentence. This gives subjects the entire sentence from which to construct the basis for supplying the proper term. Placing the blank at the beginning reverses this natural process and causes confusion.

Avoid irrelevant clues
An irrelevant clue is an element of the statement, unrelated to the conceptual focus of the question, which hints at the correct answer. An example is making the length of the blank equal to the number of characters in the word to be supplied. Another irrelevant clue is the use of “a” or “an” before a blank. Avoid these two irrelevant clues by making all blanks the same length and using the more general “a(n)” before the blank.

Avoid text quotes
Do not use directly quoted sentences from required reading as supply items. This is an easy temptation to fall into and seems to make sense: “If subjects have read the material, they should be able to supply the appropriate term.” But sentences taken out of context are usually ambiguous. Write the supply item based on a clear concept, not a specific quote.
Matching Items
A matching item presents subjects with two or three columns of entries which relate to each other. An example of a matching question is one which provides a numbered list of authors with a parallel lettered list of the books they wrote. Subjects match the books to their authors by writing the letter of each book in the space next to the numbered author. The list of authors is the “item list” and the list of books is the “response option list.”

Advantages
The matching item can test a large amount of material simply and efficiently. Response pairs can be drawn from various texts, class notes, and additional readings to form a summary of facts. Grading is easy.

Disadvantages
A good matching item is difficult to construct. As the number of response pairs in a given item increases, more mental gymnastics are required to answer it. Matching items can present little more than a confusing array of trivial terms and sentence fragments.

Writing Matching Items
The following guidelines will help you avoid major pitfalls in writing matching items.
Limit number of pairs
Do not include too many pairs to be matched in a given item. The list should contain no more than 8-10 pairs.
Make option list longer
If each response should be used once and only once, then the option list should be longer than the item list. That is, it should contain more responses than are needed to match all the items. Then subjects cannot answer the last item by the simple process of elimination. However, if responses can be used more than once, then both lists can be the same length.

Note: There are times when a matching question can be used in place of several multiple choice questions. The response option list consists of only a few responses. Responses from the option list are used several times to match each of the items. This is not a true matching question, which consists of matched item-response pairs, but it is a common practice and eliminates the need for several repetitive multiple choice items. You can see an example of this kind of “matching” question at the end of the chapter.

Only one correct match
It is important to ensure that each term in the item list matches only one term in the option list. This becomes more difficult as the list grows larger. Response options may be used more than once, however.

Maintain a central theme
A matching item should contain matched pairs that all relate to one central theme. Avoid mixing names, dates, events, and definitions in a single matching item. If this is not possible, construct several matching items, each with a central theme: dates-events, terms-definitions, and so forth.

Keep responses simple
It is better to place longer statements in the item list and shorter answers in the response option list. This helps subjects scan the response list for the correct match more efficiently.

Make the response option list systematic
Arrange the answers in the response option column in some systematic way. This might be alphabetical or chronological order. This makes the task of searching through the list less taxing and allows subjects to concentrate on answering correctly.

Specific instructions
Be sure to clearly instruct subjects on how the matching is to be done. Show an example, if necessary. This eliminates “test-wiseness” as an extraneous variable in the scoring.
Essay Tests Essay tests are constructed from unstructured or “open-ended” questions which require subjects to write out a response.
© 4th ed. 2006 Dr. Rick Yount
Writing Tests
Chapter 11
Advantages Essay test items allow much greater flexibility and freedom in answering. Grammar, structure, and content of the answer are left to subjects. Essay items permit testing at higher levels of learning than most types of objective questions. Finally, essay questions permit a greater range of answers than objective items.
Disadvantages The greatest disadvantage of essay items is that they are difficult to score consistently. The answers are more ambiguous and subjective than objective responses. Because of this variability of response, the reliability of essay scores is lower than that of objective tests over the same content. Essay items also test a smaller sample of material because of the amount of time required to analyze and understand the question, develop the answer, and write it out in complete sentences. They are less efficient than objective types.
Writing essay items The following guidelines will help you avoid major pitfalls in writing essay items.
Use short-answer essays It is much better to use several short-answer essay items than one or two long ones. If the testing period is one hour, ask six ten-minute essays rather than two 30-minute essays. This improves sampling of material, focuses the essays sufficiently to increase reliability of grading, and produces a better measure of what subjects know.
Write clear questions Be sure that the question you ask gives sufficient guidance to examinees. The question, “Discuss sampling processes,” is much too vague. Better essay questions structure the thinking of subjects: “Define four types of sampling and explain specifically how each is used in research.”
Develop a grading key Outline a specific grading key for each essay item. Points may be awarded for each element in the key. Major elements should receive more points than minor ones. A point or two should be awarded for grammar, punctuation, organization, and the like. This grading key provides a systematic guide for objectively grading an essay answer. Without such a key, the score is as much a result of the perception of the grader as it is a measure of the knowledge of the subject.
Item analysis Item analysis is a procedure for determining which items in an objective test discriminate between informed and uninformed subjects. If a test’s purpose is to separate subjects along a scale of content mastery (and most tests have this purpose!), then it is important that this separation be done fairly. Every item in a test should contribute to this separation process. Those that do not should be revised or eliminated.
A popular method of item analysis is a procedure called the Discrimination Index. After administering and grading the exam, the procedure is applied as follows:
Rank Order Subjects By Grade Rank order subjects high to low by their grade on the exam. The rank position of each student is a reflection of their overall preparation for the examination.
Categorize Subjects into Top and Bottom Groups Identify top and bottom proportions of students to compare. You can choose a percentage ranging from 10 to 40 percent. Twenty-five percent is common, and gives you the top and bottom quarters of the class.
Compute Discrimination Index An example will illustrate this step better than a definition. Let’s say you have a class of 40 students. You select top and bottom quarters (25%) for computation of the discrimination index. This means you have 10 students in the top group and 10 in the bottom, as identified by test score rank. Count how many students in each group answered question one correctly. Let’s say that in our case, 8 of the top 10 subjects answered question #1 right, and 3 of the bottom 10 answered it right. The discrimination index equals (8 - 3) / 10, or +0.500.
Revise Test Items A discrimination index ranges from -1.00 to +1.00. A negative index indicates a faulty question: more “bottom” students answered it right than “top” students. This question should at least be rewritten and may need to be eliminated from the test. Questions you expect everyone in the class to know, a so-called barrier question, will appear with a low discrimination index — often a “0.000” index. Questions you design as discriminating questions — questions designed to separate students according to their mastery of the material — should have moderate to high index values (+0.500 and above). A reasonable test should contain 60% barrier questions and 40% discriminating questions. A test can be made more difficult (while remaining completely fair and unbiased) by including higher percentages of discriminating questions (say, 50-50), or by including questions with discrimination indexes of +0.750 or higher (or both!). The use of the discrimination index by test designers solves one of the most frustrating aspects of education and research: arbitrary testing. The discrimination index provides a way to develop tests which contain questions that actually separate the prepared (knowledgeable) from the unprepared, and yield test data which is more valid and reliable.
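The computation just described is simple enough to automate. The sketch below is an illustration only — the function name and data layout are mine, not from the text. It ranks students by total exam score, takes the top and bottom proportions, and computes the index for a single item.

```python
def discrimination_index(scores, correct_on_item, proportion=0.25):
    """Discrimination index for one test item.

    scores: total exam score for each student
    correct_on_item: True/False for each student on the item in question
    proportion: fraction of the class in each comparison group (10%-40%)
    """
    # Rank students high to low by total exam grade
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = int(len(scores) * proportion)       # size of top and bottom groups
    top, bottom = ranked[:n], ranked[-n:]
    top_right = sum(correct_on_item[i] for i in top)
    bottom_right = sum(correct_on_item[i] for i in bottom)
    # Index = (top correct - bottom correct) / group size; range -1.00 to +1.00
    return (top_right - bottom_right) / n
```

With the chapter’s example — 40 students, top and bottom quarters of 10 each, 8 of the top group and 3 of the bottom group answering item #1 correctly — this returns +0.500.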
Summary In this chapter we have looked at procedures for developing various types of tests. We have considered four kinds of objective items: true-false, multiple choice, supply and matching. We have discussed the use of essay questions. Finally, we described item analysis, which allows test developers to determine whether objective test items are valid.
Examples In addition to the checklist in Chapter Nine, Dr. Mark Cook also developed an objective test
. . .to measure the lesson objectives at three cognitive levels: knowledge, comprehension, and application. The process of development began by creating a thirty-item multiple-choice test to be used in the field test of the study (appendix D). The test was examined by three selected specialists. The specialists that were asked for validation of the test were as follows: [specialists listed]. These professors were provided complete lesson plans to use in evaluation.3
A copy of the test is located at the end of the chapter.
Dr. Brad Waggoner focused his entire 1991 dissertation on developing a standardized test to measure the “discipleship base” -- defined as 'that portion of a given church's membership that meets the criteria of a disciple'4 -- of local Southern Baptist churches. He worked in conjunction with the International Mission Board of the Southern Baptist Convention to produce a valid and reliable instrument. A final instrument of 136 items5 produced a Cronbach's alpha reliability coefficient of 0.9618.6 While we certainly cannot replicate the fifty-eight pages7 of his development procedure here, we will outline the procedure and focus on key aspects of test development.
Phase One: Identification of Functional Characteristics8
Attitudes: A disciple is one who:
Possesses a desire and willingness to learn
Has conviction regarding the necessity of living in accordance with biblical principles and guidelines
Evidences a repentant attitude when a violation of Scripture occurs
Possesses a willingness to forfeit personal desires and conveniences, if necessary, in order to seek the interests of others
Possesses and demonstrates the character trait of humility
Possesses and demonstrates the character trait of integrity
Is willing to be accountable to others
Conduct/Behavior: A disciple is one who:
Manifests a lifestyle of utilizing time and talents for God's purposes
Possesses a lifestyle depicted by intentional compliance with the moral teachings of the Bible. . .
Maintains appropriate behavior toward those of the opposite sex
Actively seeks to promote social justice and righteousness in society as well as to individuals
Relational/Social: A disciple is one who:
Values and accepts himself as created in the image of God
Has an awareness of the reality and presence of God through the ministry of the Holy Spirit
Experiences trust in God in times of adversity as well as in times of prosperity
Seeks to commune with and learn about God through the means of meditation upon Scripture and prayer
Is consistently involved in fellowship with other believers in the context of a local church
Applies oneself to building meaningful relationships with other believers
Maintains a forgiving spirit when wronged
Confesses or seeks forgiveness when guilty of an offense
3 Cook, 22-23.
4 Waggoner, 9.
5 Ibid., 209.
6 Ibid., 118.
7 Pages 65-118 of 233 pages.
8 Ibid., Headings from 68-80.
Ministry/Skills: A disciple is one who:
Publicly identifies with Christ and the Church when provided an opportunity
Seeks and takes advantage of opportunities to share the Gospel with others
Is involved in ministering to other believers
Seeks the good of all men with a willingness to meet practical social needs such as food, clothing, and the like
Doctrine/Beliefs:
Eternal security
Salvation
The Holy Spirit (the nature and role of)
The Eternal State (the literal existence of heaven and hell)
Scripture (the authority and reliability of)
Phase Two: Testing of Content Validity9
The functional characteristics, categorized according to the five domains described above, were placed on a 9-point Likert rating scale, a value of "1" being "not valid," and a value of "9" being "very valid," with gradations of validity in between (appendix B). The purpose of the rating scale was for a panel of experts to determine the degree to which each characteristic was a valid and measurable function of a disciple. A list of names was compiled. . . the panel was to consist of five experts and two alternates representing the academic, denominational, and local church levels (appendix C). A letter was constructed that explained the nature and purpose of the research and requested their participation on the panel (appendix D). . . . When the rating scales were returned, the mean scores were calculated for the characteristics (appendix F).
Phase Three: Revision of Characteristics10
Revisions to the list of characteristics were made based on the panel's scores, comments, and additions. It was predetermined that any item receiving a mean score of less than 7.0 would be considered for deletion.
Phase Four: Item Writing11
Review Related Measures
Construction of Questions
The Size of the Item Pool
The Issue of Relevance
The Issue of Clarity
The Issue of Simplicity
The Issue of Single Meaning
The Issue of Double Negatives
The Issue of Question Length
The Issue of Question Variety
The Issue of Response Categories
The Issue of Assuming
The Issue of "Leading" or "Loaded" Questions
The Issue of Grammar and Tone
Phase Five: Testing Content Validity of Questions12
Selection of a Panel of Experts
Development of a Validation Instrument
Follow-Up of Validation Panel

9 Ibid., 81-82
10 Ibid., 82
11 Ibid., 83-91
Calculation of Validity: ". . . questions receiving mean scores of less than 6.0 would be considered for deletion."
Phase Six: Questionnaire Design13
Question Order and Flow
Questionnaire Length
Questionnaire Design and Layout
Size and Color of Paper
Layout
Instructions
Expression of Gratitude
Expression of Confidentiality
Identification of the Sponsor
Phase Seven: Refining the Pilot Test14
The process of refining the pilot test consisted of a small number of individuals evaluating the clarity of questions, word meanings, instructions, and procedure for completing the instrument. . . . Revisions were made to the instrument based upon the results. Subsequently, over 100 questionnaires were printed and put into booklet form (appendix M).
Phase Eight: Pilot Test #115
Selection of Sample Group [n=50 church members in two groups]
Establish Time and Place of Pilot Test
Letter of Invitation Constructed and Mailed
Administering the Instrument
Phase Nine: Data Analysis16 [This is part of Chapter Four of the dissertation.]
Phase Ten: Revision of the Instrument17
Phase Eleven: Second Pilot Test18
Selection of [Three] Churches
Procedure for Administering the Pilot Test
Follow-Up Procedure
Phase Twelve: Data Analysis of the Second Pilot Test19 [This is part of Chapter Four of the dissertation.]
As mentioned in Chapter One, this instrument -- with further revisions by Dr. Waggoner in conjunction with the IMB and LifeWay Christian Resources (SBC) -- is being integrated into revised MasterLife materials produced by LifeWay.
Vocabulary
changing alternative -- synonym for a multiple choice test item
constant alternative -- synonym for a true-false test item
discrimination index -- procedure used to determine quality of test items
distractors -- multiple choice options which appear plausible but are incorrect
multiple choice question -- test item with one stem and 4 or 5 plausible options
response set -- predictable pattern in objective answers (e.g. TTTF TTTF TTTF)
specific determiners -- terms like “never” or “sometimes” that give clues to the correct answer
supply question -- synonym for fill-in-the-blank questions

12 Ibid., 91-92
13 Ibid., 92-98
14 Ibid., 98-99
15 Ibid., 99-103
16 Ibid., 103
17 Ibid., 103-104
18 Ibid., 105-106
19 Ibid., 106
Study Questions
1. Explain the four preliminary guidelines given for writing tests in your own words.
2. Explain why objective test items produce more reliable scores than essay test items.
3. Write out 3 TF, 3 MC, 2 supply and 2 essay questions relating to this material. Set them aside for a few days. Then go back and evaluate each of your questions according to the criteria given for each kind of question.
Sample Test Questions
1. T F A true-false question which uses terms such as “only,” “none,” or “always” is usually true.
2. Choose the best true-false question below.
A. Disuse of double negatives does not impair item validity.
B. Payne writes, “Don’t use direct quotes in t-f items.”
C. Direct quotes, “fuzzy” language, double negatives, specific determiners and complex sentences should be avoided in t-f items; rather, focus on central concepts, precise language, and simple sentences.
D. Constant alternative items consist of a stem and several parallel responses.
3. Which of the following is an advantage of multiple choice items over true-false items?
A. Easy to write
B. Guessing is reduced
C. Less efficient
D. More open ended
4. Which of the following is a problem of matching questions?
A. The question contains less than 10 pairs
B. The response options list is systematically ordered
C. Each item matches one and only one response option
D. Response options cover multiple themes
Sample Test
APPENDIX B3
PRE-SESSION TEST
Student Number _________ (see your name tag)
Circle the letter of the phrase that best completes the sentence.
1. The phrase "priesthood of believers" is found in the Bible (a) in the New Testament, (b) in the Old Testament, (c) in both testaments, (d) in neither testament.
2. The doctrine of the priesthood of the believer teaches that priests should (a) be representative of all people, (b) represent God to other persons, (c) be ordained by a church, (d) remain completely separated from the world.
3. During the Reformation, the priesthood of all believers particularly emphasized (a) infant baptism, (b) personal witnessing, (c) direct access to God, (d) wrongs of the Catholic church.
4. The concept of priest in the Old Testament is most often associated with the priesthood of (a) all Israelites, (b) some Israelites, (c) no Israelites, (d) the special prophets of Israel.
5. The Old Testament covenant was designed by God (a) to bless Israel as His people only, (b) to assure that Israel worshipped only God, (c) to help Israel conquer their world, (d) to make Israel a blessing to all other nations.
6. Christians are referred to as a holy priesthood. This holiness is best reflected by Christians when they are (a) motivated by love, (b) pure in their thoughts, (c) serving God at church, (d) separated from the world. . . .

3 Cook, 62-63. The entire test runs 15 items.
Developing Scales
Chapter 12
12 Developing Scales
The Likert Scale
The Thurstone Scale
The Q-Sort Scale
The Semantic Differential

Our emphasis from the beginning of the text has been on the objective measurement of research variables. Sometimes we are most interested in studying subjective variables: attitudes, feelings, personal opinions, or word usage. How can we measure subjective variables objectively? The answer is an instrument called a scale.1
Dr. Martha Bergen used an adaptation of an existing scale2 to measure the attitude of seminary professors toward using computers in seminary education.
Respondents [110 seminary professors serving at Southwestern Baptist Theological Seminary in 1988] were asked to read each question and decide to what extent they agreed or disagreed with each question. They were instructed to circle the appropriate number after each of the items. The rating scale was set up in a logical pattern using the numbers "1," "2," "3," and "4" to correspond with "strongly disagree," "disagree," "agree," and "strongly agree," respectively. Responses [from the 53 items] were totaled and evaluated to reveal which attitude/s was/were most prominent. . . .
A validation panel consisting of five experts in the areas of education, religious education, and computers was asked to rate the relevance and clarity of each question. Proper revisions and modifications were made as deemed necessary from the panel's critique and evaluation. For the purpose of establishing reliability, a stratified random sample of ten seminary professors -- representative of the intended population -- was selected to respond to the questionnaire. The method of split-half correlation was used to determine the coefficient of internal consistency. . . .3
The result of the modifications was an instrument which measured the strength of support (an attitude) of seminary professors for the use of computers in seminary education in 1988. The internal consistency coefficient, after applying the Spearman-Brown Prophecy Formula, was +0.75, a strong positive value (see Chapter 22).
A scale is an instrument which measures subjective variables. In this chapter we look at four major types of scales: the Likert (LIE-kurt), the Thurstone, the Q-sort, and the Semantic Differential. Each of these important scale types provides the means to gather subjective data objectively.

1 See Babbie, "Chapter 15: Indexes, Scales and Typologies," pp. 366-389; Nunnally, "Chapter 15: Attitudes and Interests," pp. 441-467; and Payne, "Chapter 8: The Development of Self-Report Affective Items and Inventories," pp. 164-200. An excellent paperback dealing with this subject is Daniel J. Mueller, Measuring Social Attitudes: A Handbook for Researchers and Practitioners (New York: Teachers College Press, 1986).
2 Bergen describes her instrument as “an adaptation of a 1986 dissertation [instrument] from North Texas State University. See Mitchell Drake Weir, 'Attitudes and Perceptions of Community College Educators toward the Implementation of Computers for Administrative and Instructional Purposes' (Ph.D. dissertation, North Texas State University, 1987), pp. 129-35. In May 1988 North Texas State University became the University of North Texas,” 48.
3 Ibid., 48-49. See also 57-62 for more detail.
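The split-half method mentioned above can be sketched in a few lines of Python. This is an illustration only — the function names and data layout are mine, not Bergen's actual procedure. It correlates the odd-item totals with the even-item totals, then steps the half-test correlation up to full-test length with the Spearman-Brown formula, r_full = 2r / (1 + r). (Note, for instance, that a full-test value of +0.75 corresponds to a half-test correlation of +0.60.)

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def split_half_reliability(item_scores):
    """item_scores: one row per respondent, one column per item.

    Correlate odd-item totals with even-item totals, then apply the
    Spearman-Brown Prophecy Formula: r_full = 2r / (1 + r).
    """
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)
```

An odd-even split is only one way to halve an instrument; any split that yields two comparable halves works on the same principle.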
The Likert Scale The Likert scale is by far the most popular attitude scale type. A statement is followed by several levels of agreement: strongly agree, agree, no opinion, disagree, strongly disagree. This five-point scale is commonly used, but other scales, from four to ten points, can be used as well.2 Follow these steps to develop a Likert Scale for use in research. We will use the attitude “desire to learn” in college students to illustrate the process.
Define the attitude The first step in designing an attitude scale is to define the attitude you want to measure. What does the attitude mean? What does “desire to learn” mean? If students do not have a desire to learn, what do they have? Perhaps, “desire to get a degree.” With these two end points we can begin to build a scale to differentiate between those who desire to learn, and those who merely want a credential. In defining the attitude, we must choose which end of the scale will be positive, and which will be negative. The simplest way to do this is to assign the positive end of the scale to your attitude. For our example, we'll make “desire to learn” positive, and “desire to get a degree” negative.
Determine related areas Having defined the end points of the scale, we next determine what attitudes, opinions, behaviors, or feelings might be related to each end of the scale. What kinds of things would reflect the positive side? The negative side? These related areas provide the raw material from which we’ll develop attitudinal statements. In what areas would “learn” and “degree” students differ? Here’s my suggested list: doing homework, using the library, extra reading, free time discussion, meetings with professors, opinions concerning the meaning of a degree, and views on grades.
Write statements Next, we will write statements that reflect positive and negative aspects of these areas. We’ve defined “positive” to mean “that which agrees with my position,” and “negative” means “that which disagrees with my position.” The statements, even though reflecting subjective variables, should be objective. That is, statements must not be systematically biased toward one position or the other. Students who really want merely to get a degree should have no trouble scoring low on the scale. They should tend to agree with statements reflecting “degree” and tend to disagree with statements reflecting “learning.” In the same way, students who really want to learn should tend to agree with “learning” statements, and tend to disagree with “degree” statements.

2 Mueller, “Chapter 2: Likert Attitude Scaling” and “Chapter 3: Likert Scale Construction: A Case Study,” 8-33.
Positive examples Positive statements should be objective statements which are acceptable by those having the attitude, and just as unacceptable to those not having it. The following reflect these characteristics in regard to our attitude scale: • I generally enjoy homework assignments and sometimes do more than the assignment requires. • I frequently use library resources to go beyond the required reading. • I believe a degree is empty unless it reflects my best efforts of scholarship. • A late assignment, thoughtfully done, is more important than the loss in grade average.
Negative examples Negative statements should be objective statements which are acceptable to those not having the attitude, and just as unacceptable to those having it. These statements coincide with the positive examples above. • Homework assignments are designed to meet course requirements. It is impractical in time and energy to do more than is required. • It is better to master the required reading than to dilute one’s thinking with other authors. • A degree is a credential for ministry and reflects, in itself, none of the extremes of scholarship some try to ascribe to it. • It is better to turn in an assignment on time than to be docked for lateness to make it better.
Create an item pool Continue writing items, both positive and negative, until you have an item pool at least twice the size of your intended instrument. If you plan to have 20 statements in your final scale, then create an item pool of 40 items.
Validating the items Enlist a validation panel of 6-8 persons to evaluate each item. It is suggested that you have persons on the panel who represent both extremes of the scale. Have the panel rate each item on its clarity and potency in defining the attitude in question.
Rank Rank order the evaluated items on clarity and potency. Choose an equal number of positive and negative items from the best statements.

3 Mueller states, “Five categories are fairly standard.... Some scale constructors use seven categories, and some prefer four or six response categories (with no middle category). All of these options seem to work satisfactorily. It should be noted in this regard that reducing the number of response categories reduces the spreading out of scores (reduces variance) and thus tends to reduce reliability. Increasing the number of response categories adds variance. As the number of categories is increased, a point is reached at which respondents can no longer reliably distinguish psychologically between adjacent categories [i.e., what’s the difference between a 10 and an 11 on a 12-point scale? WRY]. Increasing the number of categories beyond this point simply adds random (error) variance to the score distribution” (pp. 12-13).
Formatting the Scale Randomly order the selected statements. Use letters to indicate choices, such as “SD”, “D”, “A”, and “SA” rather than numbers. I recommend that you use four or six levels of response. Using an even number of responses forces respondents to mark the direction of their attitudinal tendencies — positive or negative. Mean scores for groups filling out the scale have more meaning in this less stable construction. Many Likert scales have 5 levels, with a “no opinion” center. This neutral middle option allows subjects an easy way to avoid considering the statement.
Write instructions Write instructions which clearly explain how to select responses on the form. (See the finished example at the end of the chapter.)
There are other ways to indicate the intensity of response. Dr. Don Mattingly (Ed.D., 1984) developed a scale for his dissertation which used the categories Yes! Yes No No! to indicate how strongly his subjects agreed or disagreed with statements concerning recreation ministry.
Scoring the Likert scale The points given for each response depend on whether the statement is positive or negative. The person who “strongly agrees” with a “positive statement” gets the maximum points. One who “strongly disagrees” with a “positive statement” gets the minimum points. For a four-point scale, the scoring would be as follows for positive statements: SD=1, D=2, A=3, SA=4. The person who “strongly agrees” with a negative statement gets the minimum number of points (1), while the one who “strongly disagrees” with a negative statement gets the maximum points (4). In our four-point example, the scoring for negative statements would be as follows: SD=4, D=3, A=2, and SA=1. In this short 8-item example attitude scale (see end of chapter), subject attitude scores will range from a low of “8” (8 x 1 = 8) to a high of “32” (8 x 4 = 32). For a twenty-five item scale, this procedure yields scores ranging from 25 to 100. These scores can then be used to compare groups on the defined attitude.
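The scoring rule just described translates directly into code. Here is a minimal sketch, assuming a four-point SD/D/A/SA scale as in the text (the function and variable names are mine):

```python
# Point values for a four-point Likert scale (SD, D, A, SA).
POSITIVE = {"SD": 1, "D": 2, "A": 3, "SA": 4}   # positive statements
NEGATIVE = {"SD": 4, "D": 3, "A": 2, "SA": 1}   # negative statements (reversed)

def likert_score(responses, keys):
    """Total attitude score for one subject.

    responses: the letter circled for each item, e.g. "SA"
    keys: '+' or '-' for each item, marking it positive or negative
    """
    return sum(
        (POSITIVE if key == "+" else NEGATIVE)[resp]
        for resp, key in zip(responses, keys)
    )
```

For an 8-item scale this yields totals from 8 (all minimum) to 32 (all maximum), matching the range given above; the completed example form at the end of the chapter scores 23 under this rule.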
The Thurstone Scale The Likert Scale, which we just discussed, consists of statements that are all of equal weight. The subject’s score results from adding together all of the scaled responses for all the statements. The Thurstone attitude scale, however, consists of statements which have a range of weights from high (usually 11) to low (usually 1). Subjects select the attitudinal statements they agree with most. Their scores result from computing the average of the weights of the items selected.4 Use the following steps to develop weighted items for a Thurstone scale.

4 See Mueller, “Chapter 4: Thurstone Scale Construction,” 34-46.
Attitude Toward Seminary Learning
INSTRUCTIONS: Read each statement below. Circle the letter which best describes your response to the statement. If you STRONGLY DISAGREE with the statement, circle SD. If you DISAGREE, circle D; AGREE, A; or STRONGLY AGREE, SA.
1. Homework assignments are designed to meet course requirements. It is impractical in time and energy to do more than is required. (-)  SD (D) A SA = 3 pts
2. A late assignment — thoughtfully done — is more important to me than the loss in grade average. (+)  SD D (A) SA = 3 pts
3. A degree is a credential for ministry and reflects, in itself, none of the extremes of scholarship some try to ascribe to it. (-)  (SD) D A SA = 4 pts
4. I generally enjoy homework assignments and sometimes do more than the assignment requires. (+)  SD D (A) SA = 3 pts
5. It is better to turn in an assignment on time, as it is, than to be docked for lateness to make it better. (-)  SD D (A) SA = 2 pts
6. I frequently use library resources to go beyond the required reading. (+)  SD (D) A SA = 2 pts
7. I believe a degree is empty unless it reflects my best efforts of scholarship. (+)  SD D A (SA) = 4 pts
8. It is better to master the required reading than to dilute one’s thinking with other authors. (-)  SD D (A) SA = 2 pts
Red notations (the circled choices and point values) are not included on the form, but are included here to demonstrate the scoring of a completed form. This subject selects items as marked, which are scored according to statement type. This subject scored 23 points on this scale (32 possible). Very positive attitude!
Develop item pool As in the development of Likert scale items, develop an item pool of attitudinal statements. Include statements that range from extremely unfavorable to extremely favorable, as well as neutral statements. An item pool of about 50 attitudinal statements is adequate.
Compute item weights Compute a scale value (or “weight”) for each statement by having a panel of 10 or more judges rank each statement. Each judge reads through all statements and chooses the most positive (given 1 point) and the most negative (given 11 points); these statements are set aside. The judge then chooses the two next most positive (2 points each) and the two next most negative (10 points each), then the four next most positive (3 points) and the four next most negative (9 points), and so forth. After all judges have rank ordered the statements, average weights are computed by adding up all the points from all the judges for each item and dividing by the number of judges. This average is the item weight. The item with the lowest weight is the most positive according to the panel of judges; the item with the highest weight is the most negative.
Rank the items by weight Rank the items by item weight, low (positive) to high (negative).
Choose Items by Equidistant Weights Compose the final scale by selecting 20 to 25 statements whose weights are approximately equidistant from each other throughout the entire scale. If a 9-category favorableness scale was used by judges and if 22 items are to be selected for the final scale, the items will need to be picked at scale intervals of approximately .36. (There are eight units between 1.00 and 9.00; 8/22 = 0.36). In fact, since no items will have median values as low as 1.00 or as high as 9.00, a slightly smaller interval size, perhaps around .33, should be used to select 22 equidistant items.5 If two items have the same weight, choose the item with the smaller standard deviation (see Chapter 16 for how to calculate standard deviation). In this way, the list of statements forms a range of weights, as determined by the panel of judges.
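The weighting and selection steps above can be sketched in code. This is a simplified illustration under stated assumptions — it averages judges' points (as in the weighting step) rather than using medians, picks each item nearest an equally spaced target weight, and the function names are mine, not a standard procedure:

```python
def item_weights(ratings):
    """ratings[j][i] = points judge j assigned to item i (1 to 11).

    Returns the average weight of each item across all judges.
    """
    n_judges = len(ratings)
    n_items = len(ratings[0])
    return [sum(ratings[j][i] for j in range(n_judges)) / n_judges
            for i in range(n_items)]

def select_equidistant(weights, k):
    """Pick k item indices whose weights are roughly equally spaced
    between the lowest and highest observed weights (assumes k >= 2)."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (k - 1)
    chosen = []
    for t in range(k):
        target = lo + t * step
        # nearest not-yet-chosen item to this target weight
        idx = min((i for i in range(len(weights)) if i not in chosen),
                  key=lambda i: abs(weights[i] - target))
        chosen.append(idx)
    return chosen
```

Scoring a completed Thurstone form is then just the median (or mean) of the weights of the statements the subject marked, as described in the Scoring step below the formatting and administration steps.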
Formatting the Scale Place the selected statements in random order. Do not include item weights on the instrument.
Administering the Scale Direct the subjects to read all statements in the instrument and mark those with which they agree. They may choose as many as they like. See the example at the end of the chapter.
Scoring Compute the median (or mean) of the weights of the statements marked by the subject. This is the subject's score, which reflects the subject's attitude on the theme.

5 Mueller, p. 37

© 4th ed. 2006 Dr. Rick Yount

Developing Scales
Chapter 12
Q-Methodology It is difficult to rank order more than ten statements, but rank ordering attitudinal statements is a good way to gather subjective data on a given sample. The "Q-sort" is a procedure for rank ordering a large number of statements. Rankings of statements by two or more groups can then be compared.

One version of the Q-sort uses a physical set of boxes, numbered 1 through 11 (the same arrangement as that described for weighting Thurstone items). The procedure is usually applied when the number of statements to be ranked is greater than 40. The subject looks through a number of statements written on cards, each card containing one statement. The first time through, the subject selects the statement he agrees with the most; that card goes into box 1. The subject then goes through the cards a second time and selects the statement he agrees with the least; this card is placed in box 11. The next time through the cards, the subject selects the two cards he agrees with the most and places them into box 2, then the two cards he agrees with least into box 10. Then 4 cards go into box 3 and 4 cards into box 9, and so forth, until only the middle box (#6) remains. All the remaining statements are placed in it.

The researcher then assigns point values, 1-11, to each statement based upon the box into which it was placed. After all subjects have placed the statements, averages are computed, and the statements are rank ordered for the group on the basis of their average values.
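The scoring step can be sketched as follows. The three subjects and their box placements are hypothetical; the logic (a statement's point value is the box number it landed in, and the group ranking orders statements by average box value) is the procedure described above.

```python
from statistics import mean

# Hypothetical Q-sort results: box number (1 = agree most, 11 = agree least)
# assigned to each of three statements by each of three subjects.
placements = [
    {"A": 1, "B": 6, "C": 11},
    {"A": 2, "B": 6, "C": 10},
    {"A": 1, "B": 7, "C": 11},
]

# Average box value per statement across subjects.
averages = {s: mean(p[s] for p in placements) for s in placements[0]}

# Rank statements for the group: lowest average = most agreed with.
group_ranking = sorted(averages, key=averages.get)
print(group_ranking)
```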
Semantic Differential The semantic differential provides information on differences ("differential") in the meanings subjects attach to words ("semantics"). Osgood, Suci, and Tannenbaum wrote the classic work on the semantic differential, The Measurement of Meaning.1 The book is a detailed analysis of this powerful technique. We simply introduce the procedure here.

Osgood and his colleagues isolated three major dimensions of word meaning through the use of factor analysis. These dimensions are evaluative (good or bad), potency (strong or weak), and activity (fast or slow). Their book contains hundreds of adjective pairs relating to these three dimensions.

A subject is presented a sheet of paper with a single word or term at the top. Below this word are a number of adjectival pairs, separated by seven blanks. For example, the meanings associated with the term "my church" might be formatted like this:

My Church
valuable  __ : __ : __ : __ : __ : __ : __  worthless
clean     __ : __ : __ : __ : __ : __ : __  dirty
bad       __ : __ : __ : __ : __ : __ : __  good
unfair    __ : __ : __ : __ : __ : __ : __  fair
large     __ : __ : __ : __ : __ : __ : __  small
strong    __ : __ : __ : __ : __ : __ : __  weak
deep      __ : __ : __ : __ : __ : __ : __  shallow
fast      __ : __ : __ : __ : __ : __ : __  slow
active    __ : __ : __ : __ : __ : __ : __  passive
hot       __ : __ : __ : __ : __ : __ : __  cold
          (1)  (2)  (3)  (4)  (5)  (6)  (7)

The first four adjective pairs measure the evaluative dimension; the next three measure potency; and the last three measure activity. The numbers shown above are not printed on the instrument, but are shown here to help clarify the scoring procedure. Pairs which are reversed should be scored in reverse, so that positive is always (1) and negative (7) regardless of which side of the scale they appear on. Subjects check one blank between each pair indicating their opinion of the term on this scale. Blanks are scored 1-7, providing a numerical score for the meaning of the term in each dimension. Groups of subjects can then be compared on the three dimensions of meaning for any commonly used word. (Note: the numbering scale 1-7 applies only if the positive term is on the left; otherwise the scale is labelled 7-1.) Results can be plotted in three dimensions to provide a picture of semantic differences between two or more groups of subjects.

1 Charles E. Osgood, George J. Suci, and Percy H. Tannenbaum, The Measurement of Meaning (Urbana: University of Illinois Press, 1957)

Research Design and Statistical Analysis in Christian Ministry
II: Research Methods
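The reverse-scoring rule can be sketched in code. The subject's checkmarks below are hypothetical; the pairs and dimensions follow the "My Church" example, where bad-good and unfair-fair print the positive pole on the right.

```python
# Pairs from the "My Church" example: (pair, dimension, positive pole on right?)
pairs = [
    ("valuable-worthless", "evaluative", False),
    ("clean-dirty",        "evaluative", False),
    ("bad-good",           "evaluative", True),
    ("unfair-fair",        "evaluative", True),
    ("large-small",        "potency",    False),
    ("strong-weak",        "potency",    False),
    ("deep-shallow",       "potency",    False),
    ("fast-slow",          "activity",   False),
    ("active-passive",     "activity",   False),
    ("hot-cold",           "activity",   False),
]

# One hypothetical subject's checkmarks, position 1-7 from left to right.
raw = [2, 1, 6, 7, 3, 2, 4, 5, 3, 4]

# Reverse-score right-positive pairs so 1 is always most positive, 7 most negative.
by_dim = {}
for (pair, dim, right_positive), r in zip(pairs, raw):
    by_dim.setdefault(dim, []).append(8 - r if right_positive else r)

# Average score on each of the three dimensions of meaning.
dimension_scores = {dim: sum(v) / len(v) for dim, v in by_dim.items()}
print(dimension_scores)
```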
The Delphi Technique The Delphi Technique, while having alternative forms and procedures, is essentially used to determine consensus in a group of subjects. The items around which this consensus is formed are constructed from comments from the group itself, thereby eliminating researcher bias in item creation.

Suppose a researcher is interested in defining the most important concerns of Sunday School teachers of youth in Tarrant Baptist Association churches. A letter would be sent to all youth teachers in Tarrant Association churches asking them to list their "major concerns" in teaching young people. Responses would most likely range from literature, to facilities, to youth attitudes, to parental problems, to . . . well, the list would be long. The "major concerns" from all responders would be analyzed for commonalities, and a list of key "major concerns" would be produced. From this list the researcher would create pairs of attitudinal statements, one positive and the other negative. For example, for the major concern of "youth literature," one might create the following pair of attitudinal statements:

(+) The literature we use for teaching youth demonstrates an understanding of youth needs, and how the Bible addresses those needs.
(-) The literature we use for teaching youth demonstrates a lack of understanding of youth needs, and provides little help in addressing those needs with the Bible.
Pairs of statements are created for each major concern. Randomly select an equal number of positive and negative statements for inclusion in the Delphi instrument. Construct an instrument in which the statements are randomly listed, and associate each with a Likert-type response: Strongly Agree . . . Strongly Disagree. Duplicate the instrument and send it to all youth teachers in Tarrant Association. Each teacher will read the statements and mark his or her degree of agreement (or disagreement) with each statement. Completed forms will be returned to the researcher by means of self-addressed and stamped envelopes. Score the forms just like a Likert scale. The scores for each statement produce a mean for the entire group. Means (and their associated statements) will then be ranked. From this ranking, the researcher can determine how the group responded to the "major concerns" submitted by individuals earlier. These will either be reinforced by agreement by the entire group (major concerns, indeed!), or they will be identified as
1 Procedure described by Dr. John Curry, University of North Texas, EDER 601, Fall 1983
isolated concerns not shared by the group. The Delphi Technique is a powerful way to allow a group of subjects to create their own attitude statements, and then to measure the strength (or lack) of support by the whole group for the statements generated by the process.1
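The scoring-and-ranking round can be sketched as follows. The statements and the five teachers' 5-point Likert ratings are hypothetical; the procedure (mean agreement per statement, ranked for the whole group) is the one described above.

```python
from statistics import mean

# Hypothetical Delphi returns: each statement rated on a 5-point Likert
# scale (5 = Strongly Agree, 1 = Strongly Disagree) by five teachers.
responses = {
    "Literature shows understanding of youth needs (+)": [4, 5, 4, 3, 5],
    "Facilities are adequate for youth classes (+)":     [2, 3, 2, 2, 3],
    "Parents support the youth Sunday School (+)":       [3, 4, 3, 4, 4],
}

means = {stmt: mean(r) for stmt, r in responses.items()}

# Rank statements by mean agreement, highest first: concerns reinforced by
# the whole group rise to the top; isolated concerns sink.
for stmt, m in sorted(means.items(), key=lambda kv: -kv[1]):
    print(f"{m:.1f}  {stmt}")
```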
Summary In this chapter we have introduced ways researchers measure attitudes. We have emphasized the Likert and Thurstone scales, the Q-Sort, and the Semantic Differential. These are but a sampling of procedures available to you to measure the subjective characteristics of groups.
Vocabulary
Evaluative: A scale in the semantic differential which measures good-bad
Likert scale: Attitude scale which uses + and - equally weighted statements
Potency: A scale in the semantic differential which measures strong-weak
Q-sort: Method for rank ordering a large number of attitudinal statements
Activity: A scale in the semantic differential which measures fast-slow
Semantic Differential: An attitude scale which measures differences in word meanings
Thurstone scale: Attitude scale which uses weighted statements
Study Questions
1. Define "attitude scale."
2. Compare and contrast the Likert and Thurstone attitude scales.
3. What applications would be appropriate for the semantic differential in Christian research? Likert scale? Thurstone scale? Delphi Technique?
Sample Test Questions
1. The attitude scaling technique which uses equally weighted items is the
   A. Likert Scale  B. Thurstone Scale  C. Q-Sort  D. Semantic Differential
2. The best approach to rank ordering a large number of statements is the
   A. Likert Scale  B. Thurstone Scale  C. Q-Sort  D. Semantic Differential
3. The method to use in measuring the differences between selected groups in the way they use specified terms is the
   A. Likert Scale  B. Thurstone Scale  C. Q-Sort  D. Semantic Differential
Sample Thurstone Scale
Instructions: Read the statements below, and check off any that reflect your attitude toward education. You may check off as many as you like. (weights on next page)

__x__ I am intensely interested in education.
_____ I go to school only because I am compelled to do so.
_____ I am interested in education but one shouldn't get too concerned about it.
_____ I like reading thrillers and playing games better than studying.
_____ Education is of first rate importance in the life of man.
_____ Sometimes I feel education is necessary and sometimes I doubt it.
_____ I wouldn't work at studying so hard if I didn't have to pass exams.
_____ Education tends to make people snobs.
_____ I think time spent studying is wasted.
_____ It is better to start a career at age 18 than to go to college.
_____ It is doubtful that education has helped the world.
_____ I have no desire to have anything to do with education.
__x__ We cannot become good citizens unless we are educated.
_____ More money should be spent on education.
_____ I think my education will be of use to me after I leave school.
_____ I always read newspaper articles on education.
_____ Education does more harm than good.
_____ I see no value in education.
_____ Education allows us to live a less monotonous life.
_____ I dislike education because time has to be spent on homework.
_____ I like the subjects taught in school but do not like attending school.
_____ Education is doing more harm than good.
_____ Lack of education is the source of all evil.
_____ Education enables us to make the best possible use of our lives.
_____ Only educated people can enjoy life to the full.
__x__ Education does more good than harm.
_____ I do not like school teachers so I somewhat dislike education.
_____ Education is alright in moderation.
_____ It is enough that we should be taught to read, write and do sums.
_____ I do not care about education so long as I can live comfortably.
_____ Education makes people forget God and despise Christianity.
__x__ Education is an excellent character builder.
_____ Too much money is spent on education.
_____ If anything, I must admit to a slight dislike of education.
Attitude score = (1.0 + 1.3 + 2.7 + 1.8) / 4 = 1.7 Very positive!
Sample Thurstone Scale (with weights) Subject score equals average of weights of statements selected.
 1.0  I am intensely interested in education.
10.0  I go to school only because I am compelled to do so.
 4.2  I am interested in education but one shouldn't get too concerned about it.
 6.4  I like reading thrillers and playing games better than studying.
 0.5  Education is of first rate importance in the life of man.
 5.4  Sometimes I feel education is necessary and sometimes I doubt it.
 6.9  I wouldn't work at studying so hard if I didn't have to pass exams.
 8.4  Education tends to make people snobs.
10.1  I think time spent studying is wasted.
 7.9  It is better to start a career at age 18 than to go to college.
 5.7  It is doubtful that education has helped the world.
10.9  I have no desire to have anything to do with education.
 1.3  We cannot become good citizens unless we are educated.
 2.2  More money should be spent on education.
 3.7  I think my education will be of use to me after I leave school.
 3.0  I always read newspaper articles on education.
 9.3  Education does more harm than good.
11.4  I see no value in education.
 3.3  Education allows us to live a less monotonous life.
 7.4  I dislike education because time has to be spent on homework.
 4.5  I like the subjects taught in school but do not like attending school.
10.5  Education is doing more harm than good.
 2.3  Lack of education is the source of all evil.
 0.3  Education enables us to make the best possible use of our lives.
 1.2  Only educated people can enjoy life to the full.
 2.7  Education does more good than harm.
 7.1  I do not like school teachers so I somewhat dislike education.
 4.9  Education is alright in moderation.
 5.8  It is enough that we should be taught to read, write and do sums.
 8.9  I do not care about education so long as I can live comfortably.
 9.9  Education makes people forget God and despise Christianity.
 1.8  Education is an excellent character builder.
 8.6  Too much money is spent on education.
 6.7  If anything, I must admit to a slight dislike of education.
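Scoring the checked sample reduces to one line of arithmetic. A quick check of the subject's score with Python's statistics module, using the weights of the four statements checked on the sample scale:

```python
from statistics import mean, median

# Weights of the four statements the subject checked on the sample scale.
checked = [1.0, 1.3, 2.7, 1.8]

print(mean(checked))    # (1.0 + 1.3 + 2.7 + 1.8) / 4 = 1.7, the attitude score
print(median(checked))  # 1.55, if the median is preferred
```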
13 Experimental Designs

What is Experimental Research?
Internal Invalidity
External Invalidity
Types of Designs

We've previously discussed aspects of three dissertations which embraced an experimental design. My Southwestern dissertation compared three approaches to teaching adults in a local Southern Baptist church (Skinnerian behaviorism, Brunerian cognitivism, and an eclectic approach combining the two) in 1978. Dr. Stephen Tam compared three approaches to teaching (interactivity, gaming, and lecture) with Chinese students in a Hong Kong seminary in 1989. Dr. Mark Cook studied the role of active participation in adult learning in a local church in 1994.1
What Is Experimental Research? The research methods we have examined in the past few chapters are generally considered descriptive studies. A descriptive study analyzes a present condition in order to describe it completely. It answers the question "What is?" Experimental research, on the other hand, answers the question "What if?" The researcher manipulates independent variables (e.g., type of treatment, teaching method, communication strategy) and measures dependent variables (anxiety level, Bible comprehension, marital satisfaction) in order to establish cause-and-effect relationships between them. Notice, the independent variable is controlled or set by the researcher. The dependent variable is measured by the researcher. An "experiment" is a prescribed set of conditions which permits measurement of the effects of a particular treatment.2

In our varied curricula (education, administration, age-group ministry, counseling, and social work) there is a need to discover the "If p then q" links in the world of local church ministry. In this chapter we will explain threats to internal and external experimental validity as well as illustrate both true- and quasi-experimental research designs.

There are numerous hindrances to planning a good experiment. A "good" experiment is one that confines the variation of measurement scores to variation caused by the treatment itself. The hindrances to good research design are called sources of experimental invalidity. These sources fall under two major subdivisions: internal invalidity and external invalidity. Let's define these sources of experimental invalidity further.

1 See Yount, "A Critical Analysis..."; Tam, "A Comparative Study..."; and Cook, "A Study of..."
2 See Babbie, "Chapter 8: Experiments," pp. 186-207; Borg and Gall, "Chapters 15-16: Experimental Designs, Parts I and II," pp. 631-731; Clifford J. Drew and Michael L. Hardman, "Chapter 5: Designing Experimental Research," Designing and Conducting Behavioral Research (New York: Pergamon Press, 1985), pp. 77-105; Sax, "Chapter 6: Research Design: Factors Affecting Internal and External Validity," pp. 116-151 and "Chapter 7: Research Design, Types of Experimental Designs," pp. 152-178; and True, "Chapter 8: Experiments and Quasi-experiments," pp. 233-258.
Internal Invalidity
Internal invalidity asks the question, "Are the measurements I make on my dependent variable (i.e., the variable I measure) influenced only by the treatment, or are there other influences which change it?" An experimental design suffers from internal invalidity when these other influences, called extraneous sources of variation, have not been controlled by the researcher. When extraneous variables have been controlled, researchers can be reasonably sure that post-treatment measurements are influenced by the experimental treatment, and not by extraneous variables. Donald Campbell and Julian Stanley wrote a chapter of a text on research designs that has become a classic in the field.3 In this chapter they list eight extraneous variables: history, maturation, testing, instrumentation, statistical regression, differential selection, experimental mortality, and selection-maturation interaction. Borg and Gall list two more: the John Henry effect and experimental treatment diffusion.4
History History refers to events other than the treatment that occur during the course of an experiment which may influence the post-treatment measure of treatment effect. If the explosion of the nuclear reactor in Chernobyl, Ukraine had occurred in the middle of a six-month treatment to help people reduce their "anxiety of nuclear power," it is likely that post-test anxiety scores would be higher than they would have been without the disaster. History does not refer to the background of the subject. Since history is an internal source of invalidity, its influence must occur during the experiment. If you study two groups, one which receives the treatment and a similar one which does not, you "control" for history (which is why this second group is called a "control group"), since both groups are statistically5 affected the same way by events outside the experiment. Any differences between the two groups at the end of the experiment could reasonably be linked to the treatment.
Maturation Subjects change over the course of an experiment. These changes can be physical, mental, emotional, or spiritual. Perspective can change. The natural process of human growth can result in changes in post-test scores quite apart from the treatment. Question: How would a “control group” control this source of internal invalidity?6
2 I use the term "invalidity" to differentiate this concept from "test validity" discussed in Chapter 8. Be careful, however. Many texts use the terms "experimental validity" and "test validity."
3 Donald T. Campbell and Julian C. Stanley, "Experimental and Quasi-experimental Designs for Research on Teaching," in Handbook of Research on Teaching, ed. N. L. Gage (Chicago: Rand McNally, 1963)
4 Borg and Gall, 635-637
5 Individuals might be affected, but the groups will not significantly differ from each other.
6 Subjects in both groups will mature, on average, the same.
Testing A common research design is to give a group a pre-test, a treatment, and then a post-test (see p. 13-6). If you use the same test both times, the group may show an improvement simply because of their experience with the test. This is especially true when the treatment period is short and the tests are given within a short time. Unless you must specifically measure changes during the experiment -- requiring testing before and after the treatment -- it is better to only give a post-test. Randomly assign subjects to groups to render the dependent variable (as well as all others!) statistically equal at the beginning of the study.
Instrumentation In the previous section we discussed the problem of using the same test twice in pre- and post-measurements. But if you use different tests for pre- and post-measurements, then the change in pre- and post-scores may be due to differences between the tests rather than the treatment. The best remedy, as we have already discussed, is to use randomization and a post-test only design. But if you must have pre-test scores — you must use intact groups and need to know if the groups are “equivalent”, or you want to study changes over time — then you must develop “equivalent tests” using the parallel forms techniques discussed in Chapter Eight. How does use of a control group relate to instrumentation?7
Statistical regression Set a glass of cold milk and a hot cup of coffee on a table. Over time, the cold milk will get warmer and the hot coffee colder. They both regress toward the room temperature. Statistical regression refers to the tendency of extreme scores, whether low or high, to move toward the average on a second testing. Subjects who score very high or very low on one test will probably score less high or low when they take the test again. That is, they regress toward the mean. Let’s say you are analyzing how much a particular reading enrichment program enhances the reading skills of 3rd grade children. You give a reading skills test and select for your experiment every child who scores in the bottom third of the group. You provide a three-month treatment of reading enrichment, and then measure the reading ability of the group. On the basis of the scores on the children’s first and second tests, you find that reading skills improved significantly. What, in your opinion, is wrong with this study?8 Do not study groups formed from extreme scores. Study the full range of scores. The question we need to answer is: Does the reading enrichment program significantly improve reading skills of randomly selected subjects over a control group?
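The flaw in the reading-study design can be demonstrated with a small simulation (all numbers here are hypothetical): each score is true skill plus independent measurement error, there is no treatment at all, and yet the bottom third still "improves" on retesting.

```python
import random
from statistics import mean

random.seed(1)

# True skill plus independent measurement error on each testing; NO treatment.
n = 300
skill = [random.gauss(100, 10) for _ in range(n)]
test1 = [s + random.gauss(0, 10) for s in skill]
test2 = [s + random.gauss(0, 10) for s in skill]

# Select the bottom third on the first test, as the flawed design does.
cutoff = sorted(test1)[n // 3]
bottom = [i for i in range(n) if test1[i] < cutoff]

# With no treatment, the low group's mean still rises: regression to the mean.
gain = mean(test2[i] for i in bottom) - mean(test1[i] for i in bottom)
print(f"Mean 'gain' with no treatment at all: {gain:.1f} points")
```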
Differential selection If we select groups for "treatment" and "control" differently, then the results may be due to the differences between groups before treatment. Say you select high school seniors who volunteer for a special Bible study program as your treatment group, and compare their scores with a control group of high school seniors who did not volunteer. Do your post-test scores measure the effect of the Bible study treatment, or the differences between volunteers and non-volunteers? You cannot say. Randomization solves this problem by statistically equating groups.

7 Even if tests are not "equivalent," both experimental and control groups answer the same test. This controls for the effects of instrumentation on the treatment group. It isolates treatment group changes to the given treatment.
8 The group would have scored, on average, better on the second testing regardless of the treatment, simply due to statistical regression. In addition, there is no control by which to measure the treatment.
Experimental mortality Experimental mortality, also called "attrition," refers to the loss of subjects from the experiment. If there is a systematic bias in the subjects who drop out, then post-test scores will be biased. For example, if subjects drop out because they are aware that they're not improving as they should, then the post-test scores of all those who complete the treatment will be positively biased. Your results will appear more favorable than they really are. How does use of a control group solve the problem of attrition?9
Selection-Maturation Interaction of Subjects Interaction means the mixing or combining of separate elements. If you draw a group of subjects from one church to serve as the treatment group, and a second group from a different church to serve as a control, you could well find -- beyond the simple problem of selection differences (“Are the two groups equivalent?”) -- a mixing of selection and maturation factors to compound the extraneous influence on your measurements. For example, if the two churches differ in the average age of their members, they may well respond to the treatment differently due to inherent maturational factors. Randomly selecting all subjects from a defined population solves this problem.
The John Henry Effect John Henry, the legendary "steel drivin' man," set himself to prove he could drive railroad spikes faster and better than the newly invented steam-powered machine driver. He exerted himself so much in trying to outdo the "experimental" condition that he died of a ruptured heart. If subjects in a control group find out they are in competition with those in an experimental treatment, they tend to work harder. When this occurs, the differences between control and treatment groups are decreased, minimizing the perceived treatment effect.
Treatment diffusion Similar to the John Henry effect is treatment diffusion. If subjects in the control group perceive the treatment as very desirable, they may try to find out what's being done. For example, a sample of church members are selected to use an innovative program of discipleship training, while the control group uses a traditional approach. Over the course of the experiment, some of the materials of the treatment group may be borrowed by the control group members. Over time, the treatment "diffuses" to the control group, minimizing the treatment effect. This often happens when the groups are in close proximity (members of the same church, for example). Both the John Henry Effect and Treatment Diffusion can be controlled if experimental and control groups are isolated.

9 Subjects will tend to drop out of both treatment and control groups equally. Those who remain in both groups provide a better picture of "difference" than before-and-after type designs.
External Invalidity
External invalidity asks, “How confidently can I generalize my experimental findings to the world?” Sources of external invalidity cause changes in the experimental groups so that they no longer reflect the population from which they were drawn. The whole point of inferential research is to secure representative samples to study so that inferences can be made back to the population from which the samples were drawn (Chapter Seven). External invalidity hinders the ability to infer back. Campbell and Stanley list four sources of external invalidity: the reactive effects of testing, the interaction of treatment and subject, the interaction of testing and subject, and multiple treatment interference.
Reactive effects of testing Subjects in your samples may respond differently to experimental treatments merely because they are being tested. Since the population at large is not tested, experimental effects may be due to the testing procedures rather than the treatment itself. This reduces generalizability. One type of reactive effect is pretest sensitization. Subjects who take a pre-test are sensitized to the treatment which is to follow (educators sometimes use a pre-test as an advance organizer to prepare students for learning). This preparation changes the research subjects from the population from which they were drawn, and therefore reduces the ability to generalize findings back to the (untested) population. The best experimental designs do not use pretests. Another type of reactive effect is post-test sensitization. The posttest can be, in itself, a learning experience that helps subjects to "put all the pieces together." Different results would be obtained if the treatment were given without a posttest. While researchers must make measurements, care must be taken to measure treatment effect, not add to it, with a post-test.
Treatment and Subject Interaction Subjects in a sample may react to the experimental treatment in ways that are hard to predict. This limits the ability of the researcher to generalize findings outside the experiment itself. If there is a systematic bias in a sample, then treatment effects may be different when applied to a different sample.
Testing and Subject Interaction Subjects in a sample may react to the process of testing in ways that are hard to predict. This limits the ability of the researcher to generalize findings outside the experiment itself. If there is a systematic bias of test anxiety or “test-wiseness” in a sample, then treatment effects will be different when applied to a different sample.
Multiple Treatment Effect Normally we find a single treatment in an experiment. If, however, an experiment exposes subjects to, say, three treatments (A, B, and C) and test scores show that treatment C produced the best results, one cannot declare treatment C the best. It may have been the combination of the treatments that led to the results. Treatment C, given alone, may produce different results.
Summary Designing an experiment that produces reliable, valid, and objective data is not easy. But experimental research is the only direct way to measure cause and effect relationships among variables. What a help it would be to Kingdom service if we could develop effective experimental researchers who are also committed ministers of the Gospel -- learning from direct research how to teach and counsel and manage and serve in ways that directly enhance our ministry.
Types of Designs The following is a summary of some of the more important designs of Campbell and Stanley. I will briefly describe the design, give an example of how the design would be used in a research study, and indicate possible sources of internal and external invalidity. In the design diagrams which follow, a test is designated by “O,” a treatment by “X,” and randomization by an “R.”
True Experimental Designs Experimental designs are considered true experiments when they employ randomization in the selection of their samples and control for extraneous influences of variation on the dependent variable. The three designs we will consider in this section are the best choices for an experimental dissertation. These are the Pretest-Posttest Control Group design, the Posttest Only Control Group design, and the Solomon Four-Group design.
Pretest-Posttest Control Group Two randomly selected groups are measured before (O1 and O3) and after (O2 and O4) one of the groups receives a treatment (X).
R    O1    X    O2
R    O3         O4
Example. Third graders are randomly assigned to two groups and tested for knowledge of Paul. Then one group gets a special Bible study on Paul. Both are then tested again.
Analysis. The t-test for independent samples (Chapter 20) can be used to determine if there is a significant difference between the average scores of the groups (O2 and O4). You can also compute gain scores (O2 - O1 and O4 - O3) and test the significance of the average gain scores with the matched samples t-test.
Comments. This design's only weakness is pre-test sensitization: the possible interaction between pretest and treatment.
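The independent-samples comparison of O2 and O4 can be sketched with the standard library alone. The posttest scores below are hypothetical; the formula is the usual pooled-variance t statistic (the text's Chapter 20 procedure is assumed to be equivalent).

```python
from math import sqrt
from statistics import mean, stdev

def t_independent(a, b):
    """Pooled-variance independent-samples t statistic."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

# Hypothetical posttest scores for the two randomly assigned groups.
treatment_post = [85, 88, 91, 84, 90, 87, 92, 86]   # O2
control_post   = [78, 80, 75, 82, 79, 77, 81, 76]   # O4

t = t_independent(treatment_post, control_post)
df = len(treatment_post) + len(control_post) - 2
print(f"t = {t:.2f}, df = {df}")
```

The resulting t would then be compared against the critical value for the chosen alpha level and degrees of freedom.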
Posttest Only Control Group Subjects are randomly selected and assigned to two groups. Due to randomization, the two groups are statistically equal. No pretest is given. One group receives the treatment.
R   X   O1
R        O2
Example. Third graders are randomly assigned to two groups. Then one group receives a special study on the life of Paul (no pretest). Both groups are tested on their knowledge of Paul at the conclusion of the study.
Analysis. The difference between group means (O1 and O2) can be tested with an independent-groups t-test. Other procedures that can be used include one-way ANOVA (though usually used with three or more groups -- see Chapter 24) and the ordinal procedures, the Wilcoxon rank-sum test and the Mann-Whitney U (see Chapter 21). We'll discuss these later.
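As a minimal sketch of the Mann-Whitney U computation: rank all scores from both groups together, sum the ranks of one group, and convert the rank sum to U. The scores are hypothetical, and this sketch assumes no tied scores (tied ranks must be averaged in a full implementation).

```python
def mann_whitney_u(a, b):
    """U statistic for two independent samples, assuming no tied scores."""
    combined = sorted([(v, "a") for v in a] + [(v, "b") for v in b])
    # each score's rank is its 1-based position in the combined ordering
    r_a = sum(i + 1 for i, (v, g) in enumerate(combined) if g == "a")
    u_a = r_a - len(a) * (len(a) + 1) / 2   # U for group a
    u_b = len(a) * len(b) - u_a             # U for group b
    return min(u_a, u_b)                    # report the smaller U

# Hypothetical posttest scores for treatment vs. control
u = mann_whitney_u([85, 90, 78, 92], [70, 74, 81, 69])
```

A small U (relative to the tabled critical value for the two sample sizes) indicates a significant difference in ranks.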
Solomon Four-Group
Subjects are randomly selected and assigned to one of four groups. Group 1 is tested before and after receiving the treatment; Group 2 is tested before and after receiving no treatment; Group 3 is tested only after receiving the treatment; and Group 4 is tested only after receiving no treatment.
1   R   O1   X   O2
2   R   O3       O4
3   R        X   O5
4   R            O6
The Solomon design is actually a combination of the Pretest-Posttest design (groups 1 and 2) and the Posttest-Only design (groups 3 and 4). Look!

1   R   O1   X   O2
2   R   O3       O4

3   R        X   O5
4   R            O6
Example. Third graders are randomly assigned to one of four groups. The "knowledge of Paul" is measured in groups 1 and 2. Groups 1 and 3 are given a special study on the life of Paul. When the special study is over, all four groups are tested.
Analysis. One-way ANOVA can be used to test the differences in the four posttest mean scores (O2, O4, O5, O6). The effect of the pretest can be analyzed by applying a t-test to the means of O4 (pretest but no treatment) and O6 (neither pretest nor treatment). The effect of the treatment can be analyzed by applying a t-test to the means of O5 (treatment but no pretest) and O6 (neither pretest nor treatment). Subject maturation can be analyzed by comparing the combined means of O1 and O3 against O6.
Comments. The Solomon Four-Group design provides several ways to analyze data and control sources of extraneous variability. Its major drawback is the large number of subjects required: since each group needs to contain at least 30 subjects, one experiment would require 120 subjects.
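The one-way ANOVA on the four posttest means can be sketched from the definitional formulas (between-groups mean square over within-groups mean square). The scores below are invented; Chapter 24 presents the full procedure.

```python
from statistics import mean

def one_way_f(*groups):
    """F ratio: between-groups mean square over within-groups mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(x for g in groups for x in g) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical posttests O2, O4, O5, O6 of the four Solomon groups
f = one_way_f([88, 90, 85], [79, 81, 80], [87, 89, 86], [78, 82, 77])
```

The F ratio is compared against a critical F with (k - 1) and (n - k) degrees of freedom.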
Quasi-experimental Designs
The term quasi- (pronounced KWAHZ-eye) means almost, near, partial, pseudo, or somewhat. Quasi-experimental designs are used when true experiments cannot be done. A common problem in educational research is the unwillingness of educational administrators to allow the random selection of students out of classes for experimental samples. Without randomization, there are no true experiments. So several designs have been developed for these situations that are "almost true experiments," or quasi-experimental designs. We'll look at three: the time series design, the nonequivalent control group design, and the counterbalanced design.
Time Series
Establish a baseline measure of subjects by administering a series of tests over time (O1 through O4 in this case). Expose the group to the treatment and then measure the subjects with another series of tests (e.g., O5 through O8).

O1   O2   O3   O4   X   O5   O6   O7   O8
Example. A class of third graders is given several tests on Paul before having a special study on him. Several tests are given after the special study is finished.
Analysis. I could say something like "data are analyzed by trend analysis for correlated data on n subjects under k conditions (linear and polynomial), or the monotonic trend test for correlated samples," but let me simply say that data analysis is much more complex with a time series design. An effective visual analysis can be made by graphing the group's mean scores on each test over time; important changes in the group can easily be attributed to the treatment by the shape of the line. One could also average the pre-treatment scores and the post-treatment scores, and apply a t-test for matched samples to the averages!
Comments. Since there is no control group, one cannot determine the effects of history on the test scores. Instrumentation may also be a problem (are the tests equivalent?). Beyond these internal validity problems, the reactive effect of repeatedly testing subjects is a source of external invalidity.
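The "average the pre-treatment and post-treatment scores, then apply a matched-samples t-test" suggestion can be sketched as follows. Each row holds one subject's eight hypothetical test scores -- O1-O4 before the treatment and O5-O8 after; the data are invented for illustration.

```python
from statistics import mean, stdev
from math import sqrt

# One row per subject: four baseline scores (O1-O4), then four post-treatment scores (O5-O8)
subjects = [
    [70, 72, 71, 73, 80, 82, 81, 83],
    [65, 66, 64, 67, 75, 74, 76, 73],
    [80, 79, 81, 80, 85, 86, 84, 87],
    [60, 62, 61, 63, 70, 69, 71, 68],
]
pre = [mean(row[:4]) for row in subjects]    # each subject's baseline average
post = [mean(row[4:]) for row in subjects]   # each subject's post-treatment average

d = [b - a for a, b in zip(pre, post)]       # each subject's average gain
t = mean(d) / (stdev(d) / sqrt(len(d)))      # matched-samples t statistic on the gains
```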
Nonequivalent Control Group Design
Subjects are tested in existing or "intact" groups rather than being randomly selected. The dotted line in the diagram represents "nonequivalent" groups. Both groups are measured before and after treatment. Only one group receives the treatment.
O1   X   O2
------------
O3       O4

Example. Two intact third-grade classes (no random selection) are tested on their knowledge of Paul before and after one of them receives a special study on the life of Paul.
Analysis. One approach to measuring the significance of the difference between the two groups is to compute gain scores. This is done by subtracting the pretest score from the posttest score for each subject. Use the gain scores to compute the average gain for each group, then test whether the average gains differ significantly with the t-test for independent samples. Another approach is to use the pretest scores as a covariate measure to adjust the posttest means; analysis of covariance (see Chapter 25) is the procedure to use.
Comments. This design should be used only when random assignment is impossible. It does not control for selection-maturation interaction and may present problems with statistical regression. Beyond these internal sources of invalidity, this design suffers from pretest sensitization.
Counterbalanced Design
Subjects are not randomly selected, but are used in intact groups. Group 1 receives treatment 1 and a test; then, at a later time, it receives treatment 2 and a second test. Group 2 receives treatment 2 first and then treatment 1.
          Time 1     Time 2
Group 1   X1  O      X2  O
Group 2   X2  O      X1  O
Example. Two third-grade classes receive two special studies on Paul: one in the classroom and the other on a computer. Class 1 does the classroom work first, followed by the computer; class 2 does the computer work first. Both groups are tested after each treatment.
Analysis. Use the Latin squares analysis (beyond the scope of this text).
Comments. Since randomization is not used in this design, selection-maturation interaction may be a problem. The multiple-treatment effect is a possible source of external invalidity.
Pre-experimental Designs
Pre-experimental designs should not be considered true experiments, and are not appropriate for formal research. I include them so that you can contrast them with the better designs. Data collected with these designs is highly suspect. We will consider the One-Shot Case Study design, the One-Group Pretest-Posttest design, and the Static-Group Comparison design.
The One-Shot Case Study
A single group is given a treatment and then tested.

X   O
Example. A third grade class is provided a special Bible study course on Paul, after which their knowledge of Paul is tested.
Analysis. Very little analysis can be done because there is nothing to compare the posttest against and no basis for determining what influence the treatment had.
Comments. None of the sources of internal or external invalidity is controlled by this design. It suffers most in the areas of history, maturation, regression, and differential selection, and also from the external source of "treatment and subject." The design is useless for most practical purposes because of its numerous uncontrolled sources of difference.
One-Group Pretest-Posttest
A single intact group is tested before and after a treatment.

O1   X   O2
Example. A group of third graders is tested on knowledge of Paul before and after a special study on the life of Paul.
Analysis. Test the difference between the pretest and posttest means using the matched-sample t-test (see Chapter 20) or the Wilcoxon matched-pairs signed-rank test (see Chapter 21).
Comments. Problems abound with history, maturation, testing, instrumentation, and selection-maturation interaction. The reactive effects of pre- and post-tests and of treatment and subject are external sources of invalidity.
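A bare-bones sketch of the Wilcoxon matched-pairs signed-rank statistic: rank the nonzero differences by absolute size, then sum the ranks of the positive and negative differences separately. The pairs are hypothetical, and the sketch assumes no zero and no tied differences (a full implementation must handle both).

```python
def wilcoxon_w(pre, post):
    """Smaller of the positive and negative signed-rank sums (no ties assumed)."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]   # drop zero differences
    ranked = sorted(diffs, key=abs)                        # rank by absolute size
    w_plus = sum(r + 1 for r, d in enumerate(ranked) if d > 0)
    w_minus = sum(r + 1 for r, d in enumerate(ranked) if d < 0)
    return min(w_plus, w_minus)

# Hypothetical pretest/posttest pairs for one intact group
w = wilcoxon_w([60, 55, 70, 65, 58], [68, 62, 69, 75, 70])
```

The resulting W is compared against the tabled critical value for n pairs; a small W indicates a significant pre/post difference.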
Static-Group Comparison
Two intact groups are tested after one has received the treatment.

X   O
------
    O

Example. Two classes of third graders are tested on their knowledge of Paul after one of them has had the special Bible study.
Analysis. Determine whether there is a significant difference between posttest means by using the t-test for independent samples (Chapter 20) or the Mann-Whitney U nonparametric test (Chapter 21). While these statistics will work, their results are meaningless since there is no assurance that the groups were the same at the beginning of the treatment.
Comments. This design suffers most from selection, attrition, and selection-maturation interaction problems. It also fails to control the external invalidity source of treatment and subject.
Summary
This chapter introduced you to the world of experimental research design. The concepts of internal and external validity, randomization, and control are essential to constructing experiments which provide valid data. Experimental research is the only type which can establish cause-and-effect relationships between variables.
Vocabulary
control group -- representative sample which does not receive the treatment
differential selection -- subjects selected for samples in a non-random manner, i.e., in "different ways"
experimental mortality -- loss of subjects from the study
external invalidity -- flaw which prevents experimental results from being generalized to the original population
history -- events during the experiment which influence scores on the posttest
instrumentation -- differences in subject scores due to differences in tests used
interaction of testing/subject -- subjects may react to tests unpredictably (generalization?)
interaction of treatment/subject -- subjects may react to treatment unpredictably (generalization?)
internal invalidity -- condition which alters measurements within the experiment
John Henry effect -- control group tries harder (distorting the results)
maturation -- change in subjects over the course of the experiment
posttest sensitization -- posttest changes subjects: they 'put it all together' and score higher than they normally would
pretest sensitization -- pretest changes subjects: an 'advance organizer' that prepares subjects for the treatment
selection-maturation interaction -- samples of subjects may mature differently
statistical regression -- top- and bottom-scoring subjects move toward the average on a second test
testing -- source of internal invalidity: improvement due to repeated tests, not treatment
treatment diffusion -- source of internal invalidity: treatment 'leaked' to the control group
true experimental research -- design which involves random selection and random assignment
Study Questions
1. Define internal and external invalidity.
2. Explain the ten sources of internal invalidity and four sources of external invalidity.
3. What is required for a research design to be "true experimental"? Why?
Sample Test Question
Identify each statement below as "E"xternal or "I"nternal invalidity by writing an "E" or "I" in the first blank. Then match the type of invalidity on the right with the statements on the left by placing the appropriate letter in the second blank.

E/I  Ltr
___  ___  1. Subject familiarity with tests
___  ___  2. Systematic differences in drop-out
___  ___  3. Exp groups chosen differently
___  ___  4. Differences between pre/post test
___  ___  5. Control group "tries harder"
___  ___  6. Subjects react differently to experimental treatment
___  ___  7. Natural changes in subjects
___  ___  8. Pre-test sensitization
___  ___  9. Impact of external events
___  ___  10. Treatment "leaked" to Control
___  ___  11. "Low" group scores higher on second test
___  ___  12. Subjects react differently to testing procedures

A. History
B. Instrumentation
C. John Henry Effect
D. Maturation
E. Mortality
F. Multiple Treatments
G. Regression
H. Reactive Effect of Tests
I. Selection
J. Testing
K. Testing & Subject
L. Treatment & Subject
M. Treatment Diffusion
15 Distributions and Graphs

Creating an Ungrouped Frequency Distribution
Creating a Grouped Frequency Distribution
Visualizing the Distribution: the Histogram
Visualizing the Distribution: the Frequency Polygon
Common Distribution Shapes
Distribution-Free Data

The end of the research part of a study comes after the data has been collected through tests, attitude scales, questionnaires, or other instruments. Raw data presents us with an incomprehensible mass of numbers. The first step in statistical analysis is to reduce this mass into meaningful forms. This is done by using frequency distributions and associated graphs. In this chapter we'll look at several ways to organize data so that you see its meaning. We will look at both ungrouped and grouped frequency distributions.
Creating an Ungrouped Frequency Distribution
Let's say that you have given a Bible knowledge test to 38 high school seniors. The maximum score is 120. Here are the scores:

90   84   99   80   59   70   98   75   75  105   59   68   81
109  93   91   66  104   82   97   95   47   69   75   89   72
71   62   84  100   83   97   78   95   44   51   58   74
As you can see, this collection of numbers makes little sense as it is. But we can organize and summarize the data in such a way as to make it meaningful. Let's start by rank-ordering the numbers from high (109) to low (44):

109  105  104  100   99   98   97   97   95   95   93   91   90
 89   84   84   83   82   81   80   78   75   75   75   74   72
 71   70   69   68   66   62   59   59   58   51   47   44
This ranking helps us to see where any given score falls along the whole range of scores. But the list is still rather long and difficult to manage. Let's now go through the list and count the number of times each score occurs. This is the score's frequency, represented by the letter "f."

Score  f     Score  f     Score  f
109    1     89     1     70     1
105    1     84     2     69     1
104    1     83     1     68     1
100    1     82     1     66     1
99     1     81     1     62     1
98     1     80     1     59     2
97     2     78     1     58     1
95     2     75     3     51     1
93     1     74     1     47     1
91     1     72     1     44     1
90     1     71     1
The ungrouped frequency distribution above removes the redundancy of repeating scores. But the large number of single scores (f=1) still confuses the picture. If we were to group ranges of scores together in classes, we would get a better picture of the data. Grouping scores into classes produces a grouped frequency distribution.
Creating a Grouped Frequency Distribution
The steps in constructing a grouped frequency distribution are as follows: calculate the range of scores, compute the class width (i), determine the lowest class limit, determine the limits of each class, and finally group the scores into the classes.
Calculate the Range
The range of scores is found by subtracting the lowest score from the highest and adding one. Or, in statistical shorthand,

Range = Xmax - Xmin + 1

The "X" represents a score. The term "Xmax" refers to the highest (maximum) score and "Xmin" to the lowest (minimum) score. Putting the formula into English, we read: the range of a group of scores is equal to the difference between the maximum and minimum scores in the group, plus 1. In our case, the range equals (109 - 44 + 1 =) 66.
Compute the Class Width
We approximate the size of each category of scores, called the class width (i), by dividing the range by the number of classes we wish to have. Conventional practice suggests we use 5 to 15 classes. We'll use 10 classes here. The tentative class width (i) is equal to the range of 66, computed above, divided by the number of classes desired, 10.
66 / 10 = 6.6

We need to round up or down to a whole number. Odd class widths are better than even ones because the midpoint of an odd-width class is a whole number. So let's round up to 7. (In this context, we would even round a number like 6.1 up to 7.) The distribution will have a class width (i) of 7.
Determine the Lowest Class Limit
Each class of scores should begin with a multiple of the class width. The lowest class limit should be a multiple of i (in our case, i = 7) AND include the lowest score. Our lowest score is 44. The value 42 is a multiple of 7 whose class includes the score of 44. So our first class begins with 42 and spans 7 score values: all scores with a value of 42, 43, 44, 45, 46, 47, or 48 will be counted in this class. The lowest class is 42-48.
Determine the Limits of Each Class
The next higher class will begin with (42+7 =) 49, the next with (49+7 =) 56, and so on, until we reach the last class, 105-111. All classes are listed below.
Group the Scores into Classes
Move through the data and count how many scores fall into each class. The result looks like this:

Class     Counts     f
105-111   //         2
98-104    ////       4
91-97     //////     6
84-90     ////       4
77-83     /////      5
70-76     ///////    7
63-69     ///        3
56-62     ////       4
49-55     /          1
42-48     //         2
                     n = Σf = 38 scores
This grouped frequency distribution reveals much more about the Bible knowledge of these high school seniors than we could discern in the previous listings. On the down side, by grouping our scores into classes, we actually lost some detail. But "losing detail" is necessary when the aim is to derive meaning from the numbers. We can combine our scores even more by increasing the class width i. Let's look at a frequency distribution of the same data with i = 14.

Class     Tally             f
98-111    ///// /           6
84-97     ///// /////       10
70-83     ///// ///// //    12
56-69     ///// //          7
42-55     ///               3
                            n = 38
This last distribution gives a smoother picture of the data set, though we notice the loss of still more detail because we reduced the number of classes. Frequency distributions certainly simplify data sets, but we can present the data even more clearly by graphing the frequency distributions.
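The whole construction -- range, class width, lowest limit, class limits, and counting -- can be sketched in a few lines of Python using the chapter's 38 scores. This is an illustrative sketch, not part of the text's procedure:

```python
scores = [90, 84, 99, 80, 59, 70, 98, 75, 75, 105, 59, 68, 81,
          109, 93, 91, 66, 104, 82, 97, 95, 47, 69, 75, 89, 72,
          71, 62, 84, 100, 83, 97, 78, 95, 44, 51, 58, 74]

rng = max(scores) - min(scores) + 1       # 109 - 44 + 1 = 66
width = 7                                 # 66 / 10 classes = 6.6, rounded to the odd number 7
low = (min(scores) // width) * width      # 42: a multiple of 7 whose class contains 44
classes = [(lo, lo + width - 1) for lo in range(low, max(scores) + 1, width)]
freq = {c: sum(1 for s in scores if c[0] <= s <= c[1]) for c in classes}

for lo, hi in reversed(classes):          # list from the highest class down, as in the text
    print(f"{lo}-{hi}  {freq[(lo, hi)]}")
```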
Graphing Grouped Frequency Distributions
Graphs display frequencies in a visual form. We can see a bit of this visual form in the "Counts" columns above: the length of each row of tally marks gives a rough visual image of the data distribution. But we can do better with a graph. A frequency distribution graph consists of two axes which frame the frequency of each score interval.
X- and Y-Axes
A graph is composed of a vertical line, called the ordinate or the Y-axis, and a horizontal line, called the abscissa or the X-axis. These two lines intersect to form a right angle. By convention, the Y-axis should be three-fourths the length of the X-axis. (Axis is pronounced AX-is; axes is pronounced AX-ees.)
Scaled Axes
Numbers are placed on the X- and Y-axes at equal intervals to represent the scale values of the variable being graphed. In a graph of a grouped frequency distribution, the X-axis is scaled by the range and class intervals; the Y-axis is scaled by frequency. There are two major graph types used to display information from a grouped frequency distribution: the histogram and the frequency polygon.
Histogram
A histogram (HISS-ta-gram) is a special type of bar graph. The widths of the bars equal the class interval and the heights of the bars equal the class frequencies. Let's use the example data to build a histogram with a range of 44-111 and a class width (i) of 7; the frequencies come from the ten-class distribution above. Class limits are listed along the X-axis. The width of each class equals 7. The height of each bar equals the frequency of scores contained in each category. The shape of the graph provides us a clear and meaningful picture of the entire data set. When we reduce the number of categories from ten to five (increasing i from 7 to 14), irregularities are smoothed out, but some of the more specific (irregular) data is glossed over. Choosing the class width and the number of classes is a trial-and-error process. Our goal is to reflect the shape of the data as clearly as possible while retaining as much precision as possible.
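Without plotting software, a rough text histogram conveys the same shape. In this sketch each "#" stands for one score in a class, using the ten-class frequencies from earlier in the chapter:

```python
# Class labels and frequencies from the chapter's ten-class distribution
freqs = [("42-48", 2), ("49-55", 1), ("56-62", 4), ("63-69", 3), ("70-76", 7),
         ("77-83", 5), ("84-90", 4), ("91-97", 6), ("98-104", 4), ("105-111", 2)]

# Right-align each label, then draw one '#' per score in the class
lines = [f"{label:>7} | {'#' * f}" for label, f in freqs]
print("\n".join(lines))
```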
Frequency Polygon
By connecting the midpoints of the tops of the bars with lines, we produce a frequency polygon. The frequency polygon displays the same information as the histogram, but in a different form: if we remove the bars of the histogram and keep the connecting line, we obtain the frequency polygon graph.
Distribution Shapes
The graphic image of a histogram or frequency polygon tells us at a glance the group profile of the data. The incomprehensibility of a set of numbers is transformed into a meaningful visual portrait. This visual portrait displays two special characteristics: kurtosis and skewness. The kurtosis of a curve describes how flat or peaked it is. The three basic profiles of kurtosis are platykurtic (flat), leptokurtic (peaked), and mesokurtic (balanced).
platykurtic
A flat curve is called platykurtic. Think of the flatness of a plate and you’ll remember “platey-kurtic.” Notice that there are low frequencies for all the categories.
leptokurtic
A peaked curve is called leptokurtic. Think of the central frequencies “leaping” away from the others and you’ll remember “leap-tokurtic.” Notice that outer categories have lower frequencies while the central categories have high frequencies.
mesokurtic
A curve that falls between platykurtic and leptokurtic is called mesokurtic. Think of medium (meso-) and you’ll remember meso-kurtic. The familiar bell shaped curve is mesokurtic.
negative skew
The skewness of a curve describes how horizontally distorted a curve is from the familiar bell-shaped curve. A curve with negative skew has its left tail pulled outward to the left, to the negative end of the scale.
positive skew
A curve with positive skew has its right tail pulled outward to the right, to the positive end of the scale. A common mistake is to focus on the “mound of scores” rather than the distorted tail. Remember: the direction the tail is pulled is the direction of the skew.
rectangular

A distribution where all categories of scores have equal frequency is called a rectangular distribution.
Distribution-Free Measures
Our discussion of distributions applies to ratio or interval data only, called parametric data. Two other types of statistics deal with the non-parametric measures: either ordinal (ranks) or nominal (counts) data. Non-parametric data is often called "distribution-free." We will spend the next few chapters dealing with parametric statistics, and then deal with non-parametric types in Chapters 22, 23, and 24.
Summary
This chapter carried you through the first step in data analysis: reducing a series of chaotic numbers to orderly distributions and graphs. Before engaging in more sophisticated statistical procedures, you should initially analyze your data with these data reduction techniques. All good introductory statistics texts have chapters on data reduction techniques.
Vocabulary
Abscissa -- number along the horizontal (x-) axis of a graph
Class width (i) -- distance between upper and lower limits in a given class
Class -- a subset of scores defined by upper and lower limits in a frequency distribution
Exponential curve -- line on a graph produced by the equation y = x²
Frequency (f) -- the number of scores in a given class
Frequency polygon -- graph that depicts class frequencies: uses class midpoints
Histogram -- graph that depicts class frequencies: uses class limits
Kurtosis -- amount of flatness (or peakedness) in a distribution of scores
Leptokurtic -- highly peaked distribution ("leaps up" in the middle)
Mesokurtic -- moderately peaked distribution (normal curve)
Midpoint -- halfway point between class limits in a given class: x'
Negative skew -- tail pulled left, toward the negative end of the scale
Non-parametric measures -- ranks or counts; ordinal or nominal; distribution-free
Ordinate -- number along the vertical (y-) axis
Parametric measures -- scales or tests; interval or ratio; normal distribution
Platykurtic -- flat distribution ("like a plate")
Positive skew -- tail pulled right, toward the positive end of the scale
Rectangular distribution -- all classes have the same frequency
Skew -- the degree a tail in a frequency distribution is pulled away from the mean
X-axis -- the horizontal axis in a graph
Y-axis -- the vertical axis in a graph
Study Question
Using the following data and the guidelines provided in this chapter:

89, 92, 83, 98, 98, 80, 89, 97, 83, 87, 86, 84, 97, 97, 99, 90, 95, 90, 91, 96, 95, 91, 91, 92, 94, 93, 94, 100

a) construct a grouped frequency distribution with i = 3.
b) construct a histogram of this distribution.
c) construct a frequency polygon of this distribution.
d) How would you describe this distribution? (What type?)
Sample Test Questions
1. Frequency distributions and graphs perform what statistical function?
   A. reduce massive data sets to meaningful forms
   B. infer characteristics of populations from samples
   C. predict future trends or behaviors of subjects
   D. depict significant differences between groups
2. A distribution has a range of 55 points. The best value for "i" is
   A. 55
   B. 11
   C. 7
   D. 2
3. In a positively skewed distribution,
   A. the scores are "piled up" on the right
   B. the right tail curves away from the x-axis
   C. the long tail points to the right
   D. the curve is narrow and pointed
4. Which of the following best describes a negatively skewed distribution?
   A. The test was too easy for the sample of subjects
   B. The test was too difficult for the sample of subjects
   C. Scores on the test were evenly distributed among subjects
   D. Few subjects scored high on the test.
16 Central Tendency and Variation

Measuring the Central Tendency of Data
Measuring the Variability of Data
Statistics and Parameters
The Standard (z-) Score

In the last chapter we considered a way to reduce a mass of numbers by creating a grouped frequency distribution and graphing it. The graph is a visual image of the data, and is an important first step in data analysis. In this chapter we develop basic concepts in reducing data numerically. A group of numbers has two primary numerical characteristics. The first is a central point about which they cluster, called the central tendency. The second is how tightly they cluster about that point, called variability.
Measuring Central Tendency
The central tendency of a group of scores is the numerical focus point of those scores. It refers to the point of greatest concentration of the set of numbers. There are three separate measures of central tendency: the mode, the median, and the mean.
The Mode
The mode is the most frequently occurring score in a set of scores.

82 82 83 83 84 85 86 87 87 87 88 90 95 99 99

The mode of the above set of numbers is 87 because it appears three times -- more than any other number in the set.

82 83 84 86 87 88 88 89 90 91 91 92 94 97 98

There are two modes above: the numbers 88 and 91 both appear twice. This is a bi-modal (two-mode) data set.

82 83 84 86 87 88 89 90 91 92 93 94 95 96 97

There is no mode for this distribution; no score occurs more frequently than any other. The mode is the most frequent score in a set of data.
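The mode is easy to compute by tallying scores. A sketch that also handles the bi-modal and no-mode cases above:

```python
from collections import Counter

def modes(data):
    """Return all most-frequent scores, or an empty list when no score repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []        # no mode: every score occurs exactly once
    return sorted(v for v, c in counts.items() if c == top)

# The chapter's first example set: 87 occurs three times
m = modes([82, 82, 83, 83, 84, 85, 86, 87, 87, 87, 88, 90, 95, 99, 99])   # [87]
```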
The Median
The median is the middlemost score. That is, it is the score that represents the exact halfway point through the data set. The median score divides the set of data into two equal halves: half of the scores fall below the median, and half fall above it.
1   3   4   6   8   9   34   56   67   100   356

In the above set, the median is the 6th score, the number 9. Five scores fall below 9, and five scores fall above 9. We can locate this score with the simple formula (N+1)/2, where N is the number of scores in the set. There are 11 scores (N = 11) in the data set above. Using the formula, we compute (11+1)/2 = 6. The 6th score is the median, which is the number 9.

1   3   4   6   8        9        34   56   67   100   356
  (5 scores below)    median       (5 scores above)
Here's another example:

34   23   67   4   8   17   2   78   99   5   178   3   1678

First, we rank-order the numbers from low to high:

2   3   4   5   8   17   23   34   67   78   99   178   1678

Applying the formula, we compute (13+1)/2 = 7. We are looking for the 7th score, which is the number 23. So 23 is the middle number, the median: six scores fall below and six fall above it. Here's an example with an even number of scores:

1   2   3   4   5   6   7   8   9   10   11   12   13   14

In this case, there are two "middlemost" values. The median for this data set is the average of the two middlemost values: add the two middle values together and divide by 2. In our case, (7+8)/2 = 7.5. Notice that seven numbers fall below 7.5 (1-7) and seven numbers fall above it (8-14). The median is the middlemost score.
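The (N+1)/2 rule, including the even-N averaging step, translates directly into a short sketch:

```python
def median(data):
    """Middlemost score: position (N+1)/2; for even N, average the two middle scores."""
    s = sorted(data)
    n = len(s)
    if n % 2:                                  # odd N: (N+1)/2 is a whole position
        return s[(n + 1) // 2 - 1]
    return (s[n // 2 - 1] + s[n // 2]) / 2     # even N: average the two middlemost values

# The chapter's 13-score example: the 7th ranked score is 23
print(median([34, 23, 67, 4, 8, 17, 2, 78, 99, 5, 178, 3, 1678]))   # 23
```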
The Arithmetic Mean
The mean is the average value of a data set and is the best representation of the set of scores. You have often computed test averages in your courses by adding together several test scores and dividing by the number of tests.
1   2   3   4   5   6   7   8   9   10

The mean of the set above is found by adding these ten numbers together and dividing by ten (N). We can represent the procedure for computing a mean in a shorter form by using symbols. You were introduced to the Σ symbol (capital sigma) in Chapter 14. The symbol X (capital "X" -- or "Y" or "L" or any English letter) refers to scores. The letter N refers to the number of scores. And finally, the Greek letter µ (pronounced myoo) represents the arithmetic mean of the scores. Using these letters to define the formula for the mean, we have the following:

µ = ΣX / N

Read the formula like this: "mu equals the sum of X divided by N." Or, in English, "the average value of a group of scores is the sum of those scores divided by the number of scores in the group." Let's use this formula on the following data set:

10   23   17   5   64   28   3

Here ΣX = 150 and N = 7, so µ = 150/7 = 21.43. The mean score of 21.43 represents the average value of all the individual scores in the group, and is the most important measure of central tendency due to its use in statistical analysis.
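The formula µ = ΣX / N translates directly, using the seven scores above:

```python
scores = [10, 23, 17, 5, 64, 28, 3]
mu = sum(scores) / len(scores)   # ΣX / N = 150 / 7
print(round(mu, 2))              # 21.43, matching the text
```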
Central Tendency and Skew
When a distribution is a normal ("bell-shaped") curve, all three measures of central tendency have the same value. If a distribution is skewed, the three measures differ in a predictable way. The mode will always equal the highest-frequency value. The mean moves away from the mode in the direction of the skew (extreme scores pull the mean toward them). In a positively skewed distribution the mean is greater than the mode; in a negatively skewed distribution the mean is smaller than the mode. The median always falls between mode and mean. We have defined three measures of central tendency -- the mode, median, and mean -- and established the prominence of the mean. Now we turn to the second essential characteristic of scores -- variability.
Measures of Variability
The second essential characteristic of a group of scores is variability. Variability is a measure of how tightly a group of scores clusters about the mean. Scores that tightly cluster about the mean have lower variability; scores that loosely cluster, more spread out from the mean, have higher variability. There are three measures of variability: range, average deviation, and standard deviation.
III: Statistical Fundamentals

Range
As we learned in the last chapter, the range of a group of scores is equal to the highest score minus the lowest score plus 1; that is, Range = Xmax - Xmin + 1. It is a crude measure of variability, but it is a useful first step in understanding a distribution. Let's look at an example. Class A took a midterm examination in research. The highest score in the class was 103 and the lowest was 48. The range was 103 - 48 + 1, or 56 points. Class B is the same size and took the same exam. Their highest and lowest scores were 95 and 67 respectively. Their range was 95 - 67 + 1, or 29 points. Therefore, the scores of Class B have lower variability (are more tightly clustered) than the scores of Class A.
The problem with range is that it tells us nothing of the dispersion of scores between the high and low points. Classes C and D have the same ranges, but have different dispersions of scores. One way of getting at the dispersion of scores throughout the whole distribution is to measure the deviation of each score from the mean -- and then compute the average of all the deviations.
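The two class ranges above can be sketched in Python (an illustration added by the editor, not part of the original text):

```python
# Inclusive range as defined in the text: highest score - lowest score + 1
def score_range(scores):
    return max(scores) - min(scores) + 1

class_a = [103, 48]   # highest and lowest midterm scores in Class A
class_b = [95, 67]    # highest and lowest midterm scores in Class B

print(score_range(class_a))  # 56
print(score_range(class_b))  # 29
```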
Average Deviation
A deviation score, symbolized by a lower case x, is the difference between a score (X) and the mean (µ) of the distribution. When you subtract the mean of a group of scores from a specific score, you compute the deviation of the score from the mean. Or, we can write this relationship more simply as x = X - µ. The average deviation of a group of scores is computed by summing all the deviations in the group and dividing by N. Look at the following scores:

10 20 30 40 50

First, compute the mean of these scores: (150/5 = 30). Then compute the deviation scores (x) by subtracting the mean (30) from each score (X), like this:

raw score (X)   mean   deviation (x)
     10          30        -20
     20          30        -10
     30          30          0
     40          30         10
     50          30         20
                sum of deviations (Σx) = 0
Notice that when we sum the deviations, we get 0 (Σx = 0). Why is the sum of deviations equal to zero? The mean is the balance point in a distribution. When two children of equal weight use a teeter-totter, the balance point is placed half-way between them, as in diagram A below left.
But when children of unequal weight use it, the board must be shifted so that the balance point, or fulcrum, is closer to the heavier child. This is shown in B below right. Heavier weight plus shorter distance on one side of the board balances with the lighter weight and longer distance on the other. Another way of saying this is that, for perfect balance, the moment of force (weight x distance) of one side equals the moment of force of the other. Subtract one from the other and the result is zero. This is what is meant in statistics when we say the mean is the fulcrum of a group of scores. Large deviations are like large distances from the fulcrum, and small deviations like small distances. (All scores "weigh" the same in this example.) The sum of deviations on one side of the mean will always cancel out, or balance, the sum of deviations on the other side of the mean. Therefore, Σx = 0.
In order to compute average deviation, we must take the absolute values of the deviations. An absolute value, symbolized as |x|, equals the value of a number regardless of sign. So, the absolute value of -4 equals 4 (|-4| = 4). By taking the absolute values of deviations, we make them all positive distances from the mean. Summing them, we produce a meaningful measure of "spreadedness" from the mean:
Average Deviation = Σ|x| / N = (20 + 10 + 0 + 10 + 20) / 5 = 60 / 5 = 12

The average deviation equals 12. But average deviation has some mathematical limitations that cause problems in more advanced procedures. A better measure of variability, which also reflects the dispersion of scores throughout a distribution, is the standard deviation.
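The steps above -- deviations summing to zero, then averaging their absolute values -- can be verified in Python (an editorial illustration, not part of the original text):

```python
# Average deviation: mean of the absolute deviations from the mean
scores = [10, 20, 30, 40, 50]
mu = sum(scores) / len(scores)            # 30.0

deviations = [x - mu for x in scores]     # [-20, -10, 0, 10, 20]
print(sum(deviations))                    # 0.0 -- deviations always sum to zero

avg_dev = sum(abs(d) for d in deviations) / len(scores)
print(avg_dev)                            # 12.0
```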
Standard Deviation
The standard deviation has mathematical properties which make it, like the mean, much more useful in higher-order statistics. The procedure for standard deviation involves summing squared deviations, producing a value variously called the sum of squared deviations or the sum of squares (statistically, Σx², a fundamental component of many statistical procedures), in order to eliminate negative values. The pathway to standard deviation moves from deviations to the sum of squares to variance to standard deviation.
We'll look at two ways to compute the sum of squares. The first, called the deviation method, clearly illustrates what standard deviation means. The second, called the raw score method, is easier to use. Both procedures result in the same value for the sum of squares.
Deviation Method
Compute deviations of all scores from the mean. Square all deviations (x²) and sum them (Σx²) as follows:

score   mean   deviation   squared
  10     30       -20        400
  20     30       -10        100
  30     30         0          0
  40     30        10        100
  50     30        20        400
               Σx = 0     Σx² = 1000
Large groups will have a larger sum of squares than small groups, simply because there are more deviations in a large group. Dividing by N eliminates size of group from the result. This gives a truer picture of spreadedness in a group of numbers no matter how many are in the group. Divide the sum of squares by N in order to factor out the variable of group size. The resulting value is called the variance of the scores, and is symbolized by the lower-case Greek letter sigma, squared (σ²):

Variance (σ²) = Σx² / N = 1000/5 = 200.0
Since we squared deviations before adding them, variance measures variability in squared units. It would be better if score variability were in the same unit of measure as the scores themselves. We can "undo" the squaring by taking the square root (/)1 of the variance, like this:

Standard Deviation (σ) = /σ² = /200 = 14.14
The number 14.14 represents the standardized measure of variability for our example. This number represents, in the same unit of measure as our scores, the degree of spread-out-ness of the scores from the mean. The larger the number, the greater the spread. It is useful in comparing the variabilities in different groups of scores, but will become more meaningful in future statistical procedures. This deviation method shows you exactly what a "standard deviation" is, and is fine to use when you have a few scores and a whole number mean. But if you have a large data set, and the mean is a fraction, like 73.031, computing individual deviation scores, squaring them, and then summing them can be painfully tedious. A simpler way to compute the sum of squares -- and get the very same result -- is to use the raw score formula.
Raw Score Method
The raw score method uses the squares of each raw score (rather than squares of deviation scores) to produce the sum of squares. The raw score formula for sum of squares is:

Σx² = ΣX² - (ΣX)²/N

where ΣX² refers to the sum of squared raw scores (square all the scores and sum them) and (ΣX)² refers to the sum of all scores, squared (sum the scores and then square the sum). Let's apply this formula to the same data that we used under the deviation method. We should get the same answer: Σx² = 1000.

  X      X²
 10     100
 20     400
 30     900
 40    1600
 50    2500
ΣX = 150    ΣX² = 5500
(ΣX)² = 22500

Σx² = ΣX² - (ΣX)²/N
    = 5500 - 22500/5
    = 5500 - 4500
Σx² = 1000
Warning: There is a great difference between ΣX² and Σx². Do not confuse the two!
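The two computational routes to the sum of squares can be sketched in Python (added for illustration; not part of the original text):

```python
# Sum of squares two ways: deviation method vs. raw score method
scores = [10, 20, 30, 40, 50]
N = len(scores)
mu = sum(scores) / N

# Deviation method: sum of squared deviations from the mean
ss_dev = sum((x - mu) ** 2 for x in scores)

# Raw score method: sum(X^2) - (sum(X))^2 / N
ss_raw = sum(x ** 2 for x in scores) - sum(scores) ** 2 / N

print(ss_dev, ss_raw)                  # 1000.0 1000.0
print(ss_dev / N)                      # variance: 200.0
print(round((ss_dev / N) ** 0.5, 2))   # standard deviation: 14.14
```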
1 The "square root" symbol actually looks like this: √x, but it is difficult to produce within the text. So I am using the simpler (/x) symbol. Later, with more complicated formulas, I will use graphical characters to indicate square root.
As you can see, both methods give sum of squares values of 1000. The raw score method is easier to do and less prone to arithmetic errors.
Equal Means, Unequal Standard Deviations
Let's say I have two groups of scores. The first group consists of the scores 1, 3, 5, 9, 11 and 13. We'll use the letter "X" to refer to them. The second group consists of the scores 5, 6, 8 and 9. We'll use the letter "Y" to refer to them. I've put them on a scale below like this:
Notice that the means of the two groups are equal. But the degree of scatter (variability) among the scores is not. Let's compute the standard deviations of both groups to compare them. Which group should have the larger standard deviation?2 Using the deviation method, we calculate the sum of squares of X as follows:

  i     Xi     xi     xi²
  1      1     -6     36
  2      3     -4     16
  3      5     -2      4
  4      9      2      4
  5     11      4     16
  6     13      6     36
N = 6   ΣX = 42   Σx = 0   Σx² = 112
The variance of group X equals Σx²/N = 112/6 = 18.66. The standard deviation is the square root of variance, or /18.66 = 4.32.
Using the raw score method, we calculate the sum of squares for Group X as follows:

  i     Xi     Xi²
  1      1       1
  2      3       9
  3      5      25
  4      9      81
  5     11     121
  6     13     169
n = 6   ΣX = 42   ΣX² = 406

Σx² = ΣX² - (ΣX)²/N = 406 - (42)²/6 = 406 - 294 = 112

We get the same result, 112, with either method.

2 Did you say the X's? Good. You can see from the graph that the X's are spread out more than the Y's (another way of saying this is that the range of X is greater than the range of Y). We would expect the X's to have more variability than the Y's and, in turn, the standard deviation of the X's to be greater.
Now let's compute variance and standard deviation for Group Y, which should produce a smaller sum of squares, variance, and standard deviation than Group X did. Here's the deviation method:

  i     Yi     yi     yi²
  1      5     -2      4
  2      6     -1      1
  3      8      1      1
  4      9      2      4
N = 4   ΣY = 28   Σy = 0   Σy² = 10
The variance of group Y equals Σy²/N = 10/4 = 2.5. The standard deviation is the square root of variance, or /2.5 = 1.58.
Using the raw score method, we calculate the sum of squares for Group Y as follows:

  i     Yi     Yi²
  1      5      25
  2      6      36
  3      8      64
  4      9      81
n = 4   ΣY = 28   ΣY² = 206

Σy² = ΣY² - (ΣY)²/N = 206 - (28)²/4 = 206 - 196 = 10

Again, we get the same result, 10, with either method. Since the sum of squares equals 10, variance equals 2.5 and standard deviation 1.58, as calculated above. The deviation method illustrates the meaning of standard deviation; the raw score method gives the same result more simply.
We have computed the standard deviation for both groups of scores. The groups have identical means, but different "spreads." We expected that the scores of Group X would have a larger standard deviation than Group Y because of their larger spread. Calculations demonstrated a standard deviation of 4.32 in Group X and 1.58 in Group Y, confirming our expectation.
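The comparison of the two groups can be reproduced in a few lines of Python (an editorial illustration, not part of the original text):

```python
# Population variance and standard deviation for the two groups in the text
def variance(scores):
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores) / len(scores)

X = [1, 3, 5, 9, 11, 13]
Y = [5, 6, 8, 9]

print(sum(X) / len(X), sum(Y) / len(Y))   # equal means: 7.0 7.0
print(round(variance(X) ** 0.5, 2))       # 4.32 -- more spread
print(round(variance(Y) ** 0.5, 2))       # 1.58 -- less spread
```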
Parameters and Statistics
So far we've used the symbol µ to refer to the mean, σ² to refer to the variance, and σ to refer to the standard deviation of a group of scores. We have treated these groups as populations. You will recall from Chapter 8 that a population of scores includes all the scores or subjects in a specified group (e.g., 7,000 Texas Baptist pastors). A sample is a subset of scores drawn from the population we wish to study (e.g., 700 randomly chosen Texas Baptist pastors).
We can compute mean and standard deviation for the population directly. The resulting values are called population parameters and are defined as µ and σ. We can compute mean and standard deviation for the sample directly. These values are called sample statistics and are defined as X̄ ("X-bar") and σ̂ ("sigma-hat"). We can estimate population parameters from a sample of scores. These values are called estimated parameters and are defined as X̄ and s. Let's illustrate these three sets of values.
Population Parameters
Suppose we have a population of 10,000 ministers. We want to compute the mean and standard deviation of their IQ. In order to compute these population parameters directly, you give all 10,000 ministers an IQ test. Sum the 10,000 IQ scores (ΣX) and divide by 10,000 (N). The result is the population mean, symbolized by µ (pronounced "myoo").
Subtract µ from the 10,000 IQs (x = X - µ), square the 10,000 deviations (x²), sum them (Σx²), divide by 10,000 (N), and finally take the square root (/). This yields the population standard deviation, symbolized by σ.
Sample Statistics
The cost in time and materials to test 10,000 subjects and compute the parameters is not practical. Draw a random sample of 100 ministers (1%) and measure their IQs. Sum the 100 IQs (ΣX) and divide by 100 (N) to produce the sample mean. The sample mean is symbolized by X̄, pronounced "X-bar."
Subtract X̄ from the 100 IQs (x = X - X̄), square the 100 deviations (x²), sum them (Σx²), divide by 100 (N), and finally take the square root (/). This sample standard deviation is symbolized by a sigma with a hat on top (σ̂), pronounced "sigma-hat."3
Estimated Parameters
When we cannot compute population parameters directly, we must estimate them from sample statistics. This is not a problem for the estimate of the mean (µ): the sample mean (X̄) is the best estimate. But due to the smaller number of scores in the sample -- because it is a subset of the population -- the sample standard deviation (σ̂) always underestimates σ. This underestimation requires a small correction factor in the equation for estimated standard deviation (s). While the equations for σ and σ̂ have the sum of squares divided by N or n,4 the equation for estimated standard deviation (s) has the sum of squares divided by n - 1.
Why n - 1? It has to do with the Central Limit Theorem and you really don't want to know. (Okay, for those who do: the selection of a sample of n scores from the population reduces by one the number of n-sized samples that can be drawn from the
3 Some textbooks refer to the sample standard deviation as "sigma-tilde" (σ̃).
4 Often, N and n are used interchangeably to refer to the number of scores in a set. Other times, N refers to the number of scores in a population, and n to the number of scores in a sample.
population. This reduces the number of degrees of freedom of the population by one. We’ll talk more about degrees of freedom in a few chapters). So, we have three sets of formulas. Mean and standard deviation are common concepts across the three versions, but there are important differences to note. Notice the use of “N” for parameters and “n” for samples. Match the formulas with the diagram above.
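The N versus n - 1 distinction is visible in Python's standard library, which provides both computations (an editorial illustration, not part of the original text):

```python
import statistics

# Population standard deviation (sum of squares / n) vs.
# the estimated parameter s (sum of squares / (n - 1))
sample = [5, 6, 8, 9]

sigma_hat = statistics.pstdev(sample)  # divides the sum of squares by n
s = statistics.stdev(sample)           # divides the sum of squares by n - 1

print(round(sigma_hat, 3))  # 1.581
print(round(s, 3))          # 1.826 -- slightly larger, correcting the underestimate
```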
Standard (z-) Scores
We have demonstrated two related but separate characteristics of data sets. The first is central tendency (the mean is the most important measure). The second is variability (the standard deviation is the most important measure). Given two sets of data, there are four possibilities of comparison. Using the chart at left, write out the four possibilities in English.5
Comparing two sets of scores is difficult because the groups usually possess different values of locus and scatter. Where do we begin? What is required is a standard scale which reflects in one value both mean and standard deviation. Translating raw scores from each set into a single standard form would then allow us to compare them directly. Direct comparison is possible because transformed scores from the groups have common "standardized" values.
The standardized score which reflects in one value both the mean and the standard deviation of a set of scores is called a z-score. A raw score (X) from a population that has a mean of µ and a standard deviation of σ is transformed into a standardized scale score (z) with this formula:

z = (X - µ) / σ
The equation is pronounced "z equals X minus mu over sigma." In English, the formula means that a standardized score is equal to a raw score minus the population mean, divided by the population standard deviation. A raw score (X) from a sample that has a mean of X̄ and an estimated standard deviation of s is transformed into a standardized scale score (z) with the following formula:

z = (X - X̄) / s
The equation is pronounced "z equals X minus X-bar over s." In English, the formula means that a standardized score is equal to a raw score minus the sample mean, divided by the estimated standard deviation.
Both formulas reflect the same relationship between a raw score and a standardized score in a distribution of numbers. The distinction is whether the distribution is a sample or a population. Notice that the values for mean and standard deviation are both part of the transformation formula. No matter what these parameters are, the standardized scores are plotted on a z-scale which looks like this:

5 Upper left: Two distributions have the same mean and standard deviation. Upper right: Two distributions have the same standard deviation, but different means. Lower left: Two distributions have the same mean, but different standard deviations. Lower right: Two distributions have different means and standard deviations.
For a standardized scale, the mean is always zero and the standard deviation is always one. The z-score equations transform any group of scores into these standardized values. Let's look at an example of how z-scores facilitate comparison between scores.
John is taking Hebrew and Research. On his midterm exams, he made an 85 in Hebrew and an 80 in Research. On which exam did he do better? It seems obvious that he did better in Hebrew than he did in Research. But the real answer is not so easy. To compare his performance on the two tests, we must take into consideration how well his classmates as a whole did. That is, we need to know the means and standard deviations for the two exams. Here's the information we need:

            µ     σ
Hebrew     80    10
Research   70     5

Now compute z-scores for Hebrew (zh) and Research (zr):

zh = (85 - 80)/10 = 5/10 = 0.50
zr = (80 - 70)/5 = 10/5 = 2.00
Placing these values on a z-scale, we have:
Notice several things about the diagram above. First, since the z-scores from Hebrew and Research now fall on the same standardized scale, we can directly compare them. It is clear from the scale that John did much better in Research, scoring two standard deviations above the mean, than he did in Hebrew, where he scored only one-half standard deviation above the mean. Second, notice that the means of both classes line up on a z-score of 0. In standardized scores, the mean is always 0. Third, notice that a σ of 1 on the z-scale is equivalent to 10 in Hebrew and 5 in Research. Fourth, notice that John's score of 85 in Hebrew falls directly below 0.50 on the z-scale. His score of 80 in Research falls directly below 2.00 on the z-scale.
Standardized scores lie at the heart of inferential statistics. These basic building blocks provide the foundation for procedures we'll study soon.
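John's two transformations can be reproduced in Python (an editorial illustration, not part of the original text):

```python
# z-score: raw score minus the group mean, divided by the standard deviation
def z_score(x, mean, sd):
    return (x - mean) / sd

z_hebrew = z_score(85, 80, 10)    # Hebrew: mu = 80, sigma = 10
z_research = z_score(80, 70, 5)   # Research: mu = 70, sigma = 5

print(z_hebrew)    # 0.5
print(z_research)  # 2.0 -- relative to his classmates, John did better in Research
```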
Summary
The three measures of central tendency are the mode, the median, and the mean. These refer, respectively, to the most frequent score, the middlemost score, and the arithmetic average of a group of scores. In terms of statistical analysis, the mean is by far the most important of the three, and the most affected by skewed distributions.
The three measures of variability are the range, average deviation, and standard deviation. The standard deviation (and its squared cousin, variance) is the most important of the three.
The two characteristics of mean and standard deviation can be combined to transform a raw score (X) into a standard score (z). Z-scores can be directly compared across groups, regardless of differing parameters.
Example
In my Ed.D. dissertation, I analyzed how much Southern Baptist adults learned about the doctrine of the Trinity over a seven-week course. Cognitive tests were given at the beginning (Test 1), at the end (Test 2), at the end plus three months (Test 3), and at the end plus six months (Test 4).6 I was also interested in whether the mental abilities of the three groups were balanced. Here is one of my tables showing the means and standard deviations of these groups.7
You can notice several things immediately from the numbers below. The three groups' average mental abilities, measured by the Otis-Lennon Mental Ability Test (maximum score: 80), were within 0.90 points of each other. All three groups learned a great deal about the doctrine of the Trinity -- on average the groups jumped 50.69 points over the seven weeks (Test #2 Total N minus Test #1 Total N). All three groups forgot some of what they learned, dropping an average of 11.48 points over three months and 17.98 points over six months. Are these means significantly different? We will learn how to answer this question in Chapter 20.
6 William R. Yount, "A Critical Comparison of Three Specified Approaches to Teaching Based on the Principles of B. F. Skinner's Operant Conditioning and Jerome Bruner's Discovery Approach in Teaching the Cognitive Content of a Selected Theological Concept to Volunteer Adult Learners in the Local Church" (Fort Worth: Southwestern Baptist Theological Seminary, 1978), 41-42.
7 Ibid., 168.
APPENDIX XI
Means and Standard Deviation Scores

                 Total N         X              Y              Z
MENTAL ABILITY   59.96* 15.58+   59.71 16.55    59.67 19.13    60.57 11.30
TEST #1          24.70  8.01     25.57  4.79    23.44  8.80    25.43 10.26
TEST #2          75.39 15.40     81.43  8.02    78.44 14.85    65.43 18.41
TEST #3          63.91 13.91     66.00 12.36    67.78 15.97    56.86 11.44
TEST #4          57.41 11.56     61.00  9.81    59.22 11.58    52.29 11.86

*Mean   +Standard Deviation
Vocabulary

X-bar (X̄): the average or mean of a group of scores (sample)
mu (µ): the average or mean of a group of scores (population)
sigma-squared (σ²): the population variance
sigma (σ): the population standard deviation
sigma-hat squared (σ̂²): sample variance
sigma-hat (σ̂): sample standard deviation
average deviation: Σ|x|/n; sum the absolute values of deviation scores, then divide by n
average: sum of scores divided by the number of scores
central tendency: focal point of scores: mean, median, mode
estimated parameters: X̄ and s; computed from a sample, used to infer population parameters
mean: average score
median: middlemost score
mode: most frequent score
n: number of scores (sometimes used to refer to one group within an experiment)
N: number of scores (sometimes used to mean the entire experiment)
parameters: population measurements (µ, σ)
range: distance between highest and lowest scores in a group
standard deviation: standardized measure of variation in scores: s
statistics: sample measurements (X̄ and σ̂)
sum of squares: sum of squared deviation scores
variability: measure of spreadedness in a group of scores
variance: measure of spreadedness in squared units
x: deviation score; the difference between a score (X) and the mean (µ or X̄)
X: raw score (e.g., a test score)
z-score: standardized score which reflects both µ and σ (or X̄ and s)
8 Ibid., 169.
Study Questions
1. What are the modes for the sets of scores below?
a. 1 2 3 4 5 6 6 7 8 9    Mode: ____
b. 1 2 3 4 5 6 6 7 8 8    Mode: ____
c. 1 1 2 2 3 3 4 4 5 5    Mode: ____
2. What are the medians for the following data sets?
a. 10 15 20 22 27 29 33    Md: ____
b. 3 7 78 45 2 56 4 7    Md: ____
3. Compute the mean, sum of squares (use deviation method), variance and standard deviation for the following scores: 65 70 70 75 85 90 95
4. Using the scores in #3, compute the sum of squares with the raw score method.
5. You have taken midterm exams. Your score in New Testament Survey was 75. Your score in Principles of Teaching was 90.

        n      ΣX      Σx²
NTS    100    7020    2500
PT      25    2175     225

a. Compute means for both classes.
b. Compute standard deviations (s) for both classes.
c. Transform your midterm scores into z-scores.
d. Plot your standard scores on a z-scale. Include the appropriate raw score scale values for the two classes.
e. In which class did you do better? Explain how you know this.
Sample Test Questions
1. The most important measure of central tendency is the
   A. mode   B. mean   C. kurtosis   D. range
2. The measure of central tendency which behaves like a balance or a teeter-totter is the
   A. mean   B. mode   C. median   D. range
3. If you add together all the deviations of scores about the mean, the result is
   A. the standard deviation   B. the variance   C. the sum of squares   D. zero
4. You compute the mean and median of a distribution and find that the median is larger. You know from this that the distribution is
   A. normal   B. leptokurtic   C. positively skewed   D. negatively skewed
5. Parameters are to statistics as
   A. mean is to variance   B. population is to sample   C. average deviation is to standard deviation   D. Greek is to English
6. The mean and standard deviation of the z-scale are, respectively,
   A. 1, 1   B. 0, 1   C. 1, 0   D. 0, 0
Chapter 17
The Normal Curve and Hypothesis Testing
The Normal Curve Defined
Level of Significance
Sampling Distributions
Hypothesis Testing

In the last chapter we explained the elementary relationship of means, standard deviations, and z-scores. In this chapter we extend this relationship to include the Normal Curve, which allows us to convert z-score differences into probabilities. On the basis of the laws of probability, we can make inferences from sample statistics to population parameters and make decisions about differences in scores.
The chapter is divided into the following sections:
The Normal Curve Defined. What is the nature of the Normal Curve? How does the Normal Curve, and its associated distribution table, link z-scores with area under the curve? How does area under the curve relate to the concept of probability?
Level of Significance. What do the terms "level of significance" and "region of rejection" mean? What is alpha (α)? What is a critical value?
The Sampling Distribution. What is a sampling distribution? How does it differ from a frequency distribution?
Hypothesis Testing. How do we statistically test a hypothesis?
The Normal Curve
In the last chapter we presented a z-scale with the z-scores for John's Research and Hebrew test scores. It looked like this:
Recall that the mean of the z-scale equals zero and that the scale extends, practically speaking, 3 points in either direction. Each point on the z-scale equals one standard deviation away from the mean. A score of 100 in John's Hebrew class equals 2 standard deviations above the mean (µ=80, σ=10, z=+2.0). A score of 55 in John's Research class equals 3 standard deviations below the mean (µ=70, σ=5, z=-3.0).
The z-scale assumes that the distribution of standardized scores forms a bell-shaped curve, called a Normal Curve. The normal curve is plotted on a set of X-Y axes, where the X-axis represents, in this case, "z-scores" and the Y-axis "frequency of z-scores." It looks like the diagram at left. The area between the "bell" and the baseline is a fixed area, which equals 100 percent of the scores in the distribution. We will use this area to determine the probabilities associated with statistical tests.
There is an exact and unchanging relationship between the z-scores along the x-axis and the area under the curve. The area under the curve between z = ±1 (read "z equals plus or minus 1") standard deviation is about 68% of the scores (p = 0.68). The area between ±2 standard deviations is about 95% (more precisely, 0.9545) of the curve.
The "tails" of the distribution theoretically extend to infinity, but 99.7% of the scores fall between z = ±3.00. Now, let's use the normal curve in a practical way with John's classes. We can use the information in the diagram above to answer questions about John's classes.
Example 1:
How many Hebrew students scored between 70 and 90?
For the Hebrew class, a score of 70 equals a z-score of -1 and a 90 equals a z-score of +1. The area under the normal curve between -1 and +1 is 68%. Therefore, the proportion of students in Hebrew scoring between 70 and 90 is 0.68. How many students is that? Multiply the proportion (p = 0.68) times the number of students in the class (60). The answer is 40.8. Rounding to the nearest whole student, we would say that 41 Hebrew students fall between 70 and 90 on this test.
Example 2:
How many research students scored between 60 and 80?
For the Research class, a score of 60 equals a z-score of -2; an 80 equals a z-score of +2. The area under the curve between -2 and +2 is about 95% (0.9545). Therefore, the proportion of the students in Research scoring between 60 and 80 is about 0.95. How many students is that? (0.9545)(40) = 38.2. Rounding off to the nearest whole student, we would say that 38 Research students fall between 60 and 80 on this test.
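The areas used in these two examples can be computed exactly with the standard normal error function (an editorial illustration using Python's `math.erf`, not part of the original text):

```python
import math

# Area under the normal curve within +/- k standard deviations of the mean:
# P(-k < z < k) = erf(k / sqrt(2))
def area_within(k):
    return math.erf(k / math.sqrt(2))

print(round(area_within(1), 4))  # 0.6827 -- about 68%
print(round(area_within(2), 4))  # 0.9545 -- about 95%
print(round(area_within(3), 4))  # 0.9973 -- about 99.7%

# Example 1: Hebrew students between 70 and 90 (z = -1 to +1), class of 60
print(round(area_within(1) * 60))  # 41
# Example 2: Research students between 60 and 80 (z = -2 to +2), class of 40
print(round(area_within(2) * 40))  # 38
```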
The Normal Curve Table
The Normal Curve distribution table allows us to determine areas under the Normal Curve between a mean and a z-score. Look up this table and use it to follow along with the following description. You will find this table on page 1 in the Tables Appendix at the back of the book (Appendix A3-1).
The left column of the Normal Curve Table is labelled "Standard score z." Under this heading are z-scores in the form "x.x," beginning with 0.0 at the top and ending with 4.0 at the bottom. Across the top of the chart are the hundredths (0.0x) digits of z-scores, the numbers .00 through .09. To find the area under the normal curve between the mean (z = 0) and z = 0.23, look down the left column to 0.2 and then over to the column headed by .03. Where the 0.2 row and the .03 column cross, you'll find the area under the Normal Curve between z1 = 0 and z2 = 0.23. This area (shown in gray) is 0.0910, or 9.1%.

 z     .00    .01    .02    .03    .04 ...
0.0
0.1
0.2    ----------------------> .0910
0.3
What is the area under the curve between the mean and z = +1.96? Look down the left column to the row labelled 1.9 and then across to the column labelled .06. Where these cross in the chart you will find the answer: 0.4750. That means that 47.5% of the scores in the group fall between the mean and 1.96 standard deviations away from the mean.

 z     .03    .04    .05    .06    .07 ...
1.7
1.8
1.9    ----------------------> .4750
2.0
What is the area under the curve between the mean and z = -1.65? The normal curve is symmetrical, which means that the negative half mirrors the positive half. We can find the area under the curve for negative z-scores as easily as we can for positive ones. Look down the left column to the row labelled 1.6 and then across to the column labelled .05. Where these cross you will find the answer: 0.4505. Forty-five percent (45%) of the scores of a group fall between the mean and -1.65 standard deviations from the mean.

 z     .02    .03    .04    .05    .06 ...
1.4
1.5
1.6    ----------------------> .4505
1.7
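The table entries just looked up can be cross-checked numerically; the tabled value is the standard normal CDF minus 0.5 (an editorial illustration, not part of the original text):

```python
import math

# Area between the mean (z = 0) and a given z, as listed in the Normal Curve Table:
# Phi(|z|) - 0.5, where Phi is the standard normal CDF
def area_mean_to_z(z):
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

print(f"{area_mean_to_z(0.23):.4f}")   # 0.0910
print(f"{area_mean_to_z(1.96):.4f}")   # 0.4750
print(f"{area_mean_to_z(-1.65):.4f}")  # 0.4505
```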
The Normal Curve Table in Action
Let's continue to use John's exam scores to further illustrate the use of the Normal Curve. We know John scored 85 in Hebrew. How many students scored higher than John? Our first step is to compute the z-score for the raw score of 85, which we have already done: the standard score for John's Hebrew score of 85 is zh = 0.50. The second step is to draw a picture of a normal curve with the area we're interested in. Notice that I've lightly shaded the area to the right of the line labelled z = 0.5. This is because we want to determine how many students scored higher than John. Since higher scores move to the right, the shaded area, which is equal to the proportion of students, is what I need. But just how much area is this?
Look at the Normal Curve Table for the proportion linked to a z-score of 0.5. Down the left column to "0.5." Over to the first column headed ".00." The area related to z=0.5 is 0.1915. I have shaded this area darker in the diagram below. Our lightly shaded area is on the other side of z=0.5! The area under the entire Normal Curve represents 100% of the scores. Therefore, the area under half the curve, from the mean outward, represents 50% (0.5000) of the scores. So, the lightly shaded area in the diagram is equal to 0.5000 minus 0.1915, or 0.3085.
So we know that 30.85% of the students in John’s Hebrew class scored higher than he did. How many students is that? Multiplying .3085 (proportion) times 60 (students in class) gives us 18.51, or 19 students. Nineteen of 60 students scored higher than John on the Hebrew exam. Here's another. John scored 80 in Research. How many students scored lower than this? We’ve already computed John’s z-score in Research as +2.00. The area under the curve between the mean and z = 2.00 is 0.4772. Find 0.4772 in the Table.
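The Hebrew computation above (area beyond z = 0.5, times 60 students) can be sketched in a few lines of Python, using the z-score and class size given in the text:

```python
from statistics import NormalDist

# John's Hebrew z-score (from the text) and his class size.
z_hebrew = 0.5
class_size = 60

# Area to the right of z = 0.5: the whole upper half (0.5000)
# minus the mean-to-z table area (0.1915).
prop_above = 1 - NormalDist().cdf(z_hebrew)

students_above = prop_above * class_size  # 0.3085 * 60 = 18.51, about 19
```

Note that `cdf` returns the area below z, so `1 - cdf(z)` is the lightly shaded region to the right.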
Chapter 17
The Normal Curve and Hypothesis Testing
Since John also scored higher than all the students in the lower half of the curve, we must add the 0.5000 from the negative half of the curve to the 0.4772 value of the positive half to get our answer. So, 97.72% of the students in John's research class scored lower than he did. How many students is this? It is (40 × .9772 = 39.09) 39 students. Here's an example which takes another perspective. We've used the Normal Curve table to translate z-scores into proportions. We can also translate proportions into z-scores. Take this question: What score did a student in John's Hebrew class have to make in order to be in the top 10% of the class? We start with an area (0.10) and work back to a z-score, then compute the raw score (X) using the mean and standard deviation for the group. Draw a picture of the problem -- like the one below.
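The Research computation (everyone below z = 2.00 in a class of 40, figures from the text) follows the same pattern as the Hebrew example; here is a brief sketch:

```python
from statistics import NormalDist

# John's Research z-score (from the text) and his class size.
z_research = 2.00
class_size = 40

# cdf gives the area below z directly: 0.5000 + 0.4772 = 0.9772.
prop_below = NormalDist().cdf(z_research)

students_below = prop_below * class_size  # 0.9772 * 40 = 39.09, so 39 students
```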
We have “cut off” the top 10% of the curve. What proportion do I use in the Normal Curve table? We know we want the upper 10%. We also know that the table works from the mean out. So, the z-score that cuts off the upper 10% must be the same z-score that cuts off 40% of the scores between itself and the mean (50% - 10% = 40%). The proportion we look for in the table is 0.4000. Search the proportion values in the table and find the one closest to .4000. The closest one in our table is “0.3997.” Look along this row to the left. The z-score value for this row is “1.2.” Look up the column from 0.3997 to the top. The z-score hundredth value is “.08.” The z-score which cuts off the upper 10% of the distribution is 1.28.

    z     .05     .06     .07     .08     .09  ...
   1.0
   1.1
   1.2                           .3997
   1.3
The z-score formula introduced in Chapter 16 yields a z-score from a raw score when we know the mean and standard deviation of a group of scores (left formula below). This z-score formula can be transformed into a formula that computes X from z: multiply both sides of the z-score formula by s and add X̄. This produces the formula below right. Do you see how the two equations below are the same? One solves for z and the other for X.

    z = (X - X̄) / s              X = X̄ + (z × s)
Substituting the values z = 1.28, X̄ = 80, and s = 10 into the equation above right, we get the following:

    X = 80 + (1.28 × 10) = 80 + 12.80 = 92.80
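The whole round trip -- proportion to z-score to raw score -- can be sketched with `NormalDist.inv_cdf`, the inverse of the cumulative table lookup (mean 80 and s = 10 are the Hebrew class figures from the text):

```python
from statistics import NormalDist

# z-score cutting off the top 10% = the 90th percentile of the standard normal.
z_cut = NormalDist().inv_cdf(0.90)        # ~1.2816; the table rounds to 1.28

# Raw score via X = X-bar + z*s, using the Hebrew mean (80) and s (10).
X = 80 + 1.28 * 10                        # 92.8, matching the worked example
```

`inv_cdf(0.90)` answers "below what z do 90% of scores fall?", which is the same z that leaves 10% above.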
A student had to make 92.8 or higher to be in the upper 10% of the Hebrew class. These examples may seem contrived, but they demonstrate basic skills and concepts you'll need whenever you use parametric inferential statistics. Learn them well and become fluent in their use, because you'll soon be using them in more complex, but more meaningful, procedures.
Level of Significance

John's Hebrew score was different from the class mean, but was the difference greater than we might expect by chance? Or, as a statistician would ask it, was the score significantly different? John's research score was different from the class mean, but was it significantly different?
Critical Values

We determine whether a difference is significant by using a criterion, or critical value, for testing z-scores. The critical value cuts off a portion of the area under the normal curve, called the region of rejection. The proportion of the normal curve in the region of rejection is called the level of significance. Level of significance is symbolized by the Greek letter alpha (α).
In this example, the critical value of 1.65 cuts off 5% of the normal curve. The level of significance shown above is α = 0.05. Any z-score greater than 1.65 falls into the region of rejection and is declared "significantly different" from the mean. Convention calls for the level of significance to be set at either 0.05 or 0.01.
One- and Two-Tailed Tests

When all of α is in one tail of the normal curve, the test is called a "one-tailed test." When we statistically test a directional hypothesis, we use a one-tailed statistical test. (Refer back to Chapter 4, if necessary, to review "directional hypothesis.") We can also divide the region of rejection between the tails of the normal curve in order to test non-directional hypotheses. To do this, place half of the level of significance (α/2) in each of the two tails. When statistically testing a non-directional hypothesis, use a two-tailed test.

The chart below summarizes the four conditions. Notice the effect of 1- or 2-tailed tests and α = .01 or .05 on the critical values used to test hypotheses. Memorize the conditions for each of the four conventional critical values: 1.65, 1.96, 2.33, and 2.58. Notice that the one-tail critical values (1.65, 2.33) are smaller than the two-tail values (1.96, 2.58). Having chosen a directional hypothesis (demonstrating greater confidence in your study), you can show "significance" with a smaller z-score (easier to obtain) than is possible with a non-directional study.
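All four conventional critical values can be recovered from the inverse normal function: put all of α in one tail for a one-tailed test, or α/2 in each tail for a two-tailed test. A short sketch:

```python
from statistics import NormalDist

nd = NormalDist()

# Conventional critical values; the chapter's rounded figures are
# 1.65, 2.33 (one-tailed) and 1.96, 2.58 (two-tailed).
crit = {
    ("one-tailed", 0.05): nd.inv_cdf(1 - 0.05),      # ~1.645
    ("one-tailed", 0.01): nd.inv_cdf(1 - 0.01),      # ~2.326
    ("two-tailed", 0.05): nd.inv_cdf(1 - 0.05 / 2),  # ~1.960
    ("two-tailed", 0.01): nd.inv_cdf(1 - 0.01 / 2),  # ~2.576
}
```

The two-tailed values are larger precisely because only half of α sits in each tail, pushing the cutoff farther from the mean.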
So now we return to our question at the beginning of this section: did John score "significantly higher" than his class averages in Research and Hebrew? Since this is a directional hypothesis, we'll use a 1-tail test with α = 0.05. Under these conditions, John had to score 1.65 standard deviations above the mean in order for his score to be considered "significantly different."

In Research, John scored 2.00 standard deviations above the mean. Since 2.00 is greater than 1.65, we can say with 95% confidence that John scored significantly higher in Research than the class average. In Hebrew, John scored 0.5 standard deviations above the mean. Since 0.5 is less than 1.65, we conclude that John did not score significantly higher in Hebrew than the class average.

Our discussion to this point has focused on single scores (e.g., John's exam grades) within a frequency distribution of scores. While this has provided an elementary scenario for building statistical concepts, we seldom have interest in comparing single scores with means. We have much more interest in testing differences between a sample of scores and a given population, or between two or more samples of scores. Among the example Problem Statements in Chapter 4, you saw "Group 1 versus Group 2" types of problems. This requires an emphasis on group means rather than subject scores, on sampling distributions rather than frequency distributions.

--- Warning! This transition from scores to means is the most confusing element of the course ---
Sampling Distributions

A distribution of means is called a sampling distribution, which is necessary for making decisions about differences between group means. Just as naturally occurring scores fall into a normal curve distribution, so do the means of samples of scores drawn from a population. The normal curve of scores forms a frequency distribution; the normal curve of means forms a sampling distribution.

Look at the diagram at right. Here we see three samples drawn from a population. All three sample means are different, since each group of ten scores is a distinct subset of the whole. The variability among these sample means is called sampling error. Even though we are drawing equal-sized groups from the same population, the means differ from one
another and from the population mean. Differences between means must be large enough to overcome this "natural" variation to be declared significant.

If we were to draw 100 samples of 10 scores each from a population of 1000 scores, we would have 100 different mean scores. These 100 sample means would cluster around the population mean in a sampling distribution, just as scores cluster around the sample mean in a frequency distribution. If we were to compute the "mean of the means," we would find it equals the population mean.

The two characteristics which define a normal frequency distribution are the mean and standard deviation. These same characteristics define a sampling distribution. The mean (µ) of a sampling distribution is the population mean, if it is known. If it is unknown, then the best estimate of the mean is the sample mean (X̄). The standard deviation of the sampling distribution, called the standard error of the mean (σX̄), is equal to the standard deviation of the population (σ) divided by the square root of the number of subjects in the sample (√n). Or, as in the formula below left,

    σX̄ = σ / √n              sX̄ = s / √n
If the population standard deviation (σ) is unknown (which is usually the case), we must estimate it. In this case, the formula for the standard error of the mean (sX̄) is based on the estimated standard deviation (s), as in the formula above right.
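The claim that sample means cluster around µ with spread σ/√n can be checked by simulation. This sketch uses a hypothetical normal population (µ = 100, σ = 10, samples of n = 25), so the standard error should land near 10/√25 = 2:

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)  # reproducible illustration
MU, SIGMA, N = 100, 10, 25  # hypothetical population parameters and sample size

# Draw 2000 samples of N scores each and record every sample mean.
sample_means = [
    mean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(2000)
]

# The sample means cluster around mu, and their standard deviation
# (the standard error of the mean) sits near sigma / sqrt(n) = 10 / 5 = 2.
se_observed = stdev(sample_means)
se_theory = SIGMA / sqrt(N)
```

The "mean of the means" comes out very close to µ, and the spread of the means is far smaller than the spread of individual scores, exactly as the text describes.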
The Distinction Illustrated

Let's illustrate these concepts with the following scenario: a church staff believes the education space in the church needs renovating. They want to measure "attitude toward building renovation" among the membership. They develop a "building renovation attitude scale" which has a range of 1 (low) to 7 (high). Because of several meetings already conducted, their hypothesis is that church members have a negative attitude toward the building renovation. They set α = 0.05 and decide to use a 1-tail test, since they are certain the scores will reveal a negative attitude. Here is the seven-point attitude scale used in the study:

      1       2       3       4       5       6       7
   Negative               Neutral                Positive

On a seven-point scale, the value of "4" is neutral. It represents the condition of neutral attitude. The research hypothesis for their study was Ha: µ < 4. Such a test compares µ and X̄ using the standard error of the mean; when σ is unknown, it is estimated by s, with n > 30 (the t-test must be used if n < 30).
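The chapter does not list the members' actual ratings, so the sketch below uses hypothetical data (36 respondents, values invented for illustration) to show the shape of the one-tailed test of Ha: µ < 4 at α = .05, with the critical z of -1.65:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical ratings from 36 members on the 1-7 scale
# (the chapter does not list the actual survey data).
scores = [3] * 12 + [4] * 12 + [2] * 6 + [5] * 6

n = len(scores)
x_bar = mean(scores)          # sample mean of the ratings
s = stdev(scores)             # estimated population standard deviation
se = s / sqrt(n)              # standard error of the mean, s / sqrt(n)
z = (x_bar - 4) / se          # test statistic against the neutral point mu = 4

# One-tailed test at alpha = .05: reject the null if z < -1.65.
significant = z < -1.65
```

With these invented ratings the sample mean of 3.5 sits about three standard errors below the neutral point, so the staff would conclude the membership's attitude is significantly negative; a different data set could just as easily fail to reach the critical value.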