Doane Chapter 02
Chapter 2: Data Collection
• Data Vocabulary
• Level of Measurement
• Time Series and Cross-sectional Data
• Sampling Concepts
• Sampling Methods
• Data Sources
• Survey Research
Data Vocabulary
• Data is the plural form of the Latin datum (a “given” fact).
• In scientific research, data arise from experiments whose results are recorded systematically.
• In business, data usually arise from accounting transactions or management processes.
• Important decisions may depend on data.
Data Vocabulary Subjects, Variables, Data Sets
• We will treat data as plural and use data set for a particular collection of data as a whole.
• Observation – each data value.
• Subject (or individual) – an item for study (e.g., an employee in your company).
• Variable – a characteristic about the subject or individual (e.g., employee’s income).
Data Vocabulary Subjects, Variables, Data Sets
• Three types of data sets:
Data Set       Variables       Typical Tasks
Univariate     One             Histograms, descriptive statistics, frequency tallies
Bivariate      Two             Scatter plots, correlations, simple regression
Multivariate   More than two   Multiple regression, data mining, econometric modeling
Data Vocabulary Subjects, Variables, Data Sets
Consider a multivariate data set with 5 variables and 8 subjects: 5 x 8 = 40 observations.
Data Vocabulary Data Types
• A data set may have a mixture of data types.
Types of Data
Attribute (qualitative)
  Verbal label: X = economics (your major)
  Coded: X = 3 (i.e., economics)
Numerical (quantitative)
  Discrete: X = 2 (your siblings)
  Continuous: X = 3.15 (your GPA)
Data Vocabulary Attribute Data
• Also called categorical, nominal or qualitative data.
• Values are described by words rather than numbers.
• For example,
  - Automobile style (e.g., X = full, midsize, compact, subcompact).
  - Mutual fund (e.g., X = load, no-load).
Data Vocabulary Data Coding
• Coding refers to using numbers to represent categories to facilitate statistical analysis.
• Coding an attribute as a number does not make the data numerical.
• For example, 1 = Bachelor’s, 2 = Master’s, 3 = Doctorate.
• Rankings may exist, for example, 1 = Liberal, 2 = Moderate, 3 = Conservative.
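As a rough illustration of coding, the Python sketch below maps degree labels to the codes given above and then tallies them. The list of responses is made up for the example; only the code values come from the slide.

```python
from collections import Counter

# Codes from the slide: 1 = Bachelor's, 2 = Master's, 3 = Doctorate
DEGREE_CODES = {"Bachelor's": 1, "Master's": 2, "Doctorate": 3}

# Hypothetical survey responses (not from the source)
responses = ["Master's", "Bachelor's", "Doctorate", "Bachelor's"]

# Replace each verbal label with its numeric code
coded = [DEGREE_CODES[label] for label in responses]
print(coded)  # [2, 1, 3, 1]

# Frequency tallies are meaningful for coded attribute data
tallies = Counter(coded)
print(tallies[1])  # 2 respondents hold a Bachelor's degree

# An average of the codes could be computed, but it would have no meaning:
# the numbers are labels, not measurements.
```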
Data Vocabulary Binary Data
• A binary variable has only two values, 1 = presence, 0 = absence of a characteristic of interest (codes themselves are arbitrary).
• For example,
  1 = employed, 0 = not employed
  1 = married, 0 = not married
  1 = male, 0 = female
  1 = female, 0 = male
• The coding itself has no numerical value so binary variables are attribute data.
Data Vocabulary Numerical Data
• Numerical or quantitative data arise from counting or some kind of mathematical operation.
• For example,
  - Number of auto insurance claims filed in March (e.g., X = 114 claims).
  - Ratio of profit to sales for last quarter (e.g., X = 0.0447).
• Can be broken down into two types – discrete or continuous data.
Data Vocabulary Discrete Data
• A numerical variable with a countable number of values that can be represented by an integer (no fractional values).
• For example,
  - Number of Medicaid patients (e.g., X = 2).
  - Number of takeoffs at O’Hare (e.g., X = 37).
Data Vocabulary Continuous Data
• A numerical variable that can have any value within an interval (e.g., length, weight, time, sales, price/earnings ratios).
• Any continuous interval contains infinitely many possible values.
[Figure: population of N items and sample of n items. Here, N/n > 20.]
Sampling Methods Probability Samples
Simple Random Sample: Use random numbers to select items from a list (e.g., VISA cardholders).
Systematic Sample: Select every kth item from a list or sequence (e.g., restaurant customers).
Stratified Sample: Select randomly within defined strata (e.g., by age, occupation, gender).
Cluster Sample: Like stratified sampling except strata are geographical areas (e.g., zip codes).
Sampling Methods Nonprobability Samples
Judgment Sample: Use expert knowledge to choose “typical” items (e.g., which employees to interview).
Convenience Sample: Use a sample that happens to be available (e.g., ask co-worker opinions at lunch).
Sampling Methods Simple Random Sample
• Every item in the population of N items has the same chance of being chosen in the sample of n items.
• We rely on random numbers to select a name, e.g., with the Excel function =RANDBETWEEN(1,48).
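For readers working outside Excel, a minimal Python sketch of the same idea follows. The population size of 48 matches the formula above; the sample size of 5 and the fixed seed are assumptions made only so the example is concrete and repeatable.

```python
import random

N = 48  # population size, as in =RANDBETWEEN(1,48)
n = 5   # sample size (assumed for illustration)

rng = random.Random(2)  # fixed seed is an assumption, used for repeatability

# One random position between 1 and N inclusive, like RANDBETWEEN
pick = rng.randint(1, N)

# A simple random sample of n distinct positions: every position in 1..N
# has the same chance of being chosen
sample = rng.sample(range(1, N + 1), n)

print(pick, sample)
```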
Sampling Methods Random Number Tables
• A table of random digits used to select random numbers between 1 and N.
• Each digit 0 through 9 is equally likely to be chosen.
Setting Up a Rule
• For example, NilCo wants to award cash prizes to 10 of its 875 loyal customers.
• To get 10 three-digit numbers between 001 and 875, we define any consistent rule for moving through the random number table.
Sampling Methods Setting Up a Rule
• Randomly point at the table to choose a starting point.
• Choose the first three digits of the selected five-digit block, move to the right one column, down one row, and repeat.
• When we reach the end of a line, wrap around to the other side of the table and continue.
• Discard any number greater than 875 and any duplicates.
Table of 1,000 Random Digits
Start here: 43939 (row 3, column 2)
82134 14458 66716 54269 31928 46241 03052 00260 32367 25783
07139 16829 76768 11913 42434 91961 92934 18229 15595 02566
45056 43939 31188 43272 11332 99494 19348 97076 95605 28010
10244 19093 78516 63463 85568 70034 82811 23261 48794 63984
12940 84434 50087 20189 58009 66972 05764 10421 36875 64964
84438 45828 40353 28925 11911 53502 24640 96880 93166 68409
98681 67871 71735 64113 90139 33466 65312 90655 75444 30845
43290 96753 18799 49713 39227 15955 46167 63853 03633 19990
96893 85410 88233 22094 30605 79024 01791 39388 85531 94576
75403 41227 00192 16814 47054 16814 81349 92264 01028 29071
78064 92111 51541 76563 69027 67718 06499 71938 17354 12680
26246 71746 94019 93165 96713 03316 75912 86209 12081 57817
98766 67312 96358 21351 86448 31828 86113 78868 67243 06763
37895 51055 11929 44443 15995 72935 99631 18190 85877 31309
27988 81163 52212 25102 61798 28670 01358 60354 74015 18556
19216 53008 44498 19262 12196 93947 90162 76337 12646 26838
28078 86729 69438 24235 35208 48957 53529 76297 41741 54735
34455 61363 93711 68038 75960 16327 95716 66964 28634 65015
53510 90412 70438 45932 57815 75144 52472 61817 41562 42084
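The NilCo rule can be traced with a short Python sketch. Two things here are assumptions inferred from the extracted table rather than stated in the slides: that the blocks listed above are laid out 10 per row, and that the starting point is the block in row 3, column 2. The helper name `draw_sample` is hypothetical.

```python
# Five-digit blocks from the table above, read left to right, top to bottom.
# ASSUMPTION: the table has 10 blocks per row.
TABLE = """
82134 14458 66716 54269 31928 46241 03052 00260 32367 25783
07139 16829 76768 11913 42434 91961 92934 18229 15595 02566
45056 43939 31188 43272 11332 99494 19348 97076 95605 28010
10244 19093 78516 63463 85568 70034 82811 23261 48794 63984
12940 84434 50087 20189 58009 66972 05764 10421 36875 64964
84438 45828 40353 28925 11911 53502 24640 96880 93166 68409
98681 67871 71735 64113 90139 33466 65312 90655 75444 30845
43290 96753 18799 49713 39227 15955 46167 63853 03633 19990
96893 85410 88233 22094 30605 79024 01791 39388 85531 94576
75403 41227 00192 16814 47054 16814 81349 92264 01028 29071
78064 92111 51541 76563 69027 67718 06499 71938 17354 12680
26246 71746 94019 93165 96713 03316 75912 86209 12081 57817
98766 67312 96358 21351 86448 31828 86113 78868 67243 06763
37895 51055 11929 44443 15995 72935 99631 18190 85877 31309
27988 81163 52212 25102 61798 28670 01358 60354 74015 18556
19216 53008 44498 19262 12196 93947 90162 76337 12646 26838
28078 86729 69438 24235 35208 48957 53529 76297 41741 54735
34455 61363 93711 68038 75960 16327 95716 66964 28634 65015
53510 90412 70438 45932 57815 75144 52472 61817 41562 42084
""".split()

N_COLS = 10


def draw_sample(blocks, n_cols, start_row, start_col, n, upper):
    """Move right one column and down one row at each step, wrapping at the
    edges, and keep the first three digits of each block visited."""
    n_rows = len(blocks) // n_cols
    row, col = start_row, start_col  # zero-based positions
    chosen = []
    for _ in range(len(blocks)):  # take at most len(blocks) steps
        if len(chosen) == n:
            break
        value = int(blocks[row * n_cols + col][:3])
        # Discard 000, anything above the upper limit, and duplicates
        if 1 <= value <= upper and value not in chosen:
            chosen.append(value)
        col = (col + 1) % n_cols
        row = (row + 1) % n_rows
    return chosen


# ASSUMPTION: start at row 3, column 2 (zero-based row 2, column 1)
winners = draw_sample(TABLE, N_COLS, start_row=2, start_col=1, n=10, upper=875)
print(winners)  # [439, 785, 201, 119, 334, 461, 393, 10, 126, 262]
```

With this starting point no number exceeds 875 and none repeats, so the first ten blocks visited supply the ten winners (the value 10 corresponds to customer 010).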
Sampling Methods With or Without Replacement
• If we allow duplicates when sampling, then we are sampling with replacement.
• Duplicates are unlikely when n is much smaller than N.
• If we do not allow duplicates when sampling, then we are sampling without replacement.
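The distinction can be sketched in Python as follows. The values N = 875 and n = 10 are borrowed from the NilCo example; the seed and the helper name `prob_no_duplicates` are assumptions for illustration.

```python
import random

rng = random.Random(0)      # seed fixed only so the run is repeatable
population = range(1, 876)  # customer numbers 1..875

# With replacement: the same customer number may be drawn more than once
with_replacement = rng.choices(population, k=10)

# Without replacement: every drawn number is distinct
without_replacement = rng.sample(population, 10)


def prob_no_duplicates(N, n):
    """Chance that n draws with replacement from N items are all distinct."""
    p = 1.0
    for i in range(n):
        p *= (N - i) / N
    return p


# About 0.95 for n = 10 and N = 875, so duplicates are unlikely
p_distinct = prob_no_duplicates(875, 10)
print(round(p_distinct, 3))
```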
Sampling Methods Computer Methods
Excel - Option A: Enter the Excel function =RANDBETWEEN(1,875) into 10 spreadsheet cells. Press F9 to get a new sample.
Excel - Option B: Enter the function =INT(1+875*RAND()) into 10 spreadsheet cells. Press F9 to get a new sample.
Internet: The web site www.random.org will give you many kinds of excellent random numbers (integers, decimals, etc.).
Minitab: Use Minitab’s Random Data menu with the Integer option.
These are pseudo-random generators because even the best algorithms eventually repeat themselves.
[Figure: Using MINITAB to generate random numbers.]
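A small Python sketch can make the "pseudo-random" point concrete: a software generator started from the same seed reproduces exactly the same numbers. The function name `draw` and the seed value are assumptions; the scaling mirrors Excel Option B.

```python
import random


def draw(seed, count=10, upper=875):
    """Return `count` integers in 1..upper from a generator started at `seed`."""
    rng = random.Random(seed)
    # Same idea as =INT(1+875*RAND()): rng.random() is in [0, 1),
    # so the result is an integer between 1 and upper inclusive
    return [int(1 + upper * rng.random()) for _ in range(count)]


first = draw(seed=42)
second = draw(seed=42)

# Identical seeds give identical "random" samples
print(first == second)  # True
```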
Sampling Methods Row–Column Data Arrays
• When the data are arranged in a rectangular array, an item can be chosen at random by selecting a row and column.
• For example, in the 4 x 3 array, select a random column between 1 and 3 and a random row between 1 and 4.
• This way, each item has an equal chance of being selected.
Sampling Methods Row–Column Data Arrays
• Use the =RANDBETWEEN function to choose a row and a column; here, row 3 and column 3 (Target).
Dillard's               K-Mart            Saks
Dollar General          Kohl's            Sears Roebuck
Federated Dept Stores   May Dept Stores   Target
J. C. Penney            Nordstrom         Wal-Mart Stores
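The same row and column selection might look like this in Python. The 4 x 3 arrangement is an assumption based on the order in which the store names appear; `pick_item` is a hypothetical helper.

```python
import random

# ASSUMPTION: the 12 stores fill the 4 x 3 array row by row
stores = [
    ["Dillard's", "K-Mart", "Saks"],
    ["Dollar General", "Kohl's", "Sears Roebuck"],
    ["Federated Dept Stores", "May Dept Stores", "Target"],
    ["J. C. Penney", "Nordstrom", "Wal-Mart Stores"],
]


def pick_item(array, rng):
    """Choose a random row and column (1-based) and return the item there."""
    row = rng.randint(1, len(array))     # like =RANDBETWEEN(1,4)
    col = rng.randint(1, len(array[0]))  # like =RANDBETWEEN(1,3)
    return row, col, array[row - 1][col - 1]


row, col, item = pick_item(stores, random.Random(1))
print(row, col, item)

# Row 3, column 3 is Target, as in the example above
print(stores[3 - 1][3 - 1])  # Target
```

Because the array is rectangular, each of the 12 cells has probability 1/4 x 1/3 = 1/12 of being chosen.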
Sampling Methods Randomizing a List
• In Excel, use function =RAND() beside each row to create a column of random numbers between 0 and 1.
• Copy and paste these numbers into the same column using “Paste Special | Values” (to paste only the values and not the formulas).
• Sort the spreadsheet on the random number column.
Sampling Methods Randomizing a List
• The first n items are a random sample of the entire list (they are as likely as any others).
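The spreadsheet procedure translates almost line for line into Python. This is only a sketch: the company names are placeholders and `randomize_and_take` is a hypothetical helper.

```python
import random


def randomize_and_take(items, n, rng):
    """Attach a random number to each item, sort on it, keep the first n."""
    keyed = [(rng.random(), item) for item in items]  # =RAND() beside each row
    keyed.sort(key=lambda pair: pair[0])              # sort on the random column
    return [item for _, item in keyed[:n]]            # first n items


# Placeholder list (not from the source)
companies = ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"]

sample = randomize_and_take(companies, 3, random.Random(5))
print(sample)
```

Storing the random numbers in a list before sorting plays the role of “Paste Special | Values”: the keys are fixed once and do not change during the sort.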
Sampling Methods Systematic Sampling
• Sample by choosing every kth item from a list, starting from a randomly chosen entry on the list.
• For example, starting at item 2, we sample every 4th item (k = 4) to obtain a sample of n = 20 items from a list of N = 78 items.
• Note that N/n = 78/20 ≈ 4.
Sampling Methods Systematic Sampling
• A systematic sample of n items from a population of N items requires that periodicity k be approximately N/n.
• Systematic sampling should yield acceptable results unless patterns in the population happen to recur at periodicity k.
• Can be used with unlistable or infinite populations.
• Systematic samples are well-suited to linearly organized physical populations.
Sampling Methods Systematic Sampling
• For example, out of 501 companies, we want to obtain a sample of 25. What should the periodicity k be? k = N/n = 501/25 ≈ 20.
• So, we should choose every 20th company from a random starting point.
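Both systematic sampling examples can be checked with a few lines of Python. The function name `systematic_sample` and the seed are assumptions; the numbers N, n, and k come from the slides.

```python
import random


def systematic_sample(N, k, start):
    """Return item numbers start, start + k, start + 2k, ... up to N."""
    return list(range(start, N + 1, k))


# First example: N = 78, k = 4, starting at item 2
sample_78 = systematic_sample(N=78, k=4, start=2)
print(len(sample_78), sample_78[0], sample_78[-1])  # 20 2 78

# Second example: N = 501 companies, desired sample size n = 25
k = round(501 / 25)  # 20.04 rounds to 20
start = random.Random(3).randint(1, k)  # random starting point in 1..k
sample_501 = systematic_sample(N=501, k=k, start=start)
print(k, len(sample_501))
```

Since 501 is not an exact multiple of 20, the second sample contains 25 or 26 companies depending on the starting point.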
Sampling Methods Stratified Sampling
• Utilizes prior information about the population.
• Applicable when the population can be divided into relatively homogeneous subgroups of known size (strata).
• A simple random sample of the desired size is taken within each stratum.
• For example, from a population containing 55% males and 45% females, randomly sample 110 males and 90 females (n = 200).
Sampling Methods Stratified Sampling
• Or, take a random sample of the entire population and then combine individual strata estimates using appropriate weights.
• For a population with L strata, the population size N is the sum of the stratum sizes: N = N_1 + N_2 + ... + N_L
• The weight assigned to stratum j is w_j = N_j / N
• For example, take a random sample of n = 200 and then weight the responses for males by .55 and for females by .45.
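A minimal sketch of the weighting step is shown below. The stratum sizes and stratum means are invented so that the weights come out to .55 and .45; the helper names are hypothetical.

```python
def stratum_weights(stratum_sizes):
    """Weight for stratum j is N_j / N, where N is the sum of stratum sizes."""
    total = sum(stratum_sizes.values())
    return {name: size / total for name, size in stratum_sizes.items()}


def weighted_estimate(stratum_means, weights):
    """Combine the stratum estimates using the stratum weights."""
    return sum(weights[name] * stratum_means[name] for name in weights)


# Hypothetical population: 55% males, 45% females
sizes = {"male": 5500, "female": 4500}

# Hypothetical sample means for each stratum
means = {"male": 40.0, "female": 60.0}

weights = stratum_weights(sizes)
estimate = weighted_estimate(means, weights)

# 0.55 * 40 + 0.45 * 60 = 22 + 27 = 49
print(weights, estimate)
```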
Sampling Methods Cluster Sample
• Strata consist of geographical regions.
• One-stage cluster sampling – sample consists of all elements in each of k randomly chosen subregions (clusters).
• Two-stage cluster sampling – first choose k subregions (clusters), then choose a random sample of elements within each cluster.
Sampling Methods Cluster Sample
• Here is an example of 4 elements sampled from each of 3 randomly chosen clusters (two-stage cluster sampling).
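Two-stage cluster sampling with 3 clusters and 4 elements per cluster can be sketched in Python as follows. The region and household names are placeholders, and `two_stage_cluster_sample` is a hypothetical helper.

```python
import random


def two_stage_cluster_sample(clusters, k, m, rng):
    """Stage 1: choose k clusters at random.
    Stage 2: choose m elements at random within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), k)
    return {name: rng.sample(clusters[name], m) for name in chosen}


# Placeholder population: 6 regions with 10 households each
clusters = {
    f"region_{i}": [f"r{i}_household_{j}" for j in range(1, 11)]
    for i in range(1, 7)
}

sample = two_stage_cluster_sample(clusters, k=3, m=4, rng=random.Random(7))
for region, households in sample.items():
    print(region, households)
```

One-stage cluster sampling would instead keep every element of each chosen cluster.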
Sampling Methods Cluster Sample
• Cluster sampling is useful when
  - Population frame and stratum characteristics are not readily available
  - It is too expensive to obtain a simple or stratified sample
  - The cost of obtaining data increases sharply with distance
  - Some loss of reliability is acceptable
Sampling Methods Judgment Sample
• A nonprobability sampling method that relies on the expertise of the sampler to choose items that are representative of the population.
• Can be affected by subconscious bias (i.e., nonrandomness in the choice).
• Quota sampling is a special kind of judgment sampling, in which the interviewer chooses a certain number of people in each category.
Sampling Methods Convenience Sample
• Take advantage of whatever sample is available at that moment. A quick way to sample.
Sample Size
• Sample size depends on the inherent variability of the quantity being measured and on the desired precision of the estimate.
Data Sources Useful Data Sources
Type of Data         Examples
U.S. general data    Statistical Abstract of the U.S.
U.S. economic data   Economic Report of the President
Almanacs             World Almanac, Time Almanac
Periodicals          Economist, Business Week, Fortune
Indexes              New York Times, Wall Street Journal
Databases            CompuStat, Citibase, U.S. Census
World data           CIA World Factbook
Web                  Google, Yahoo, MSN
Survey Research Basic Steps of Survey Research
• Step 1: State the goals of the research
• Step 2: Develop the budget (time, money, staff)
• Step 3: Create a research design (target population, frame, sample size)
• Step 4: Choose a survey type and method of administration
Survey Research Basic Steps of Survey Research
• Step 5: Design a data collection instrument (questionnaire)
• Step 6: Pretest the survey instrument and revise as needed
• Step 7: Administer the survey (follow up if needed)
Survey Research Survey Types
Type of Survey: Characteristics
Mail: You need a well-targeted and current mailing list (people move a lot). Low response rates are typical and nonresponse bias is expected (nonrespondents differ from those who respond). Zip code lists (often costly) are an attractive option to define strata of similar income, education, and attitudes. To encourage participation, a cover letter should clearly explain the uses to which the data will be put. Plan for follow-up mailings.
Telephone: Random dialing yields very low response and is poorly targeted. Purchased phone lists help reach the target population, though a low response rate still is typical (disconnected phones, caller screening, answering machines, work hours, no-call lists). Other sources of nonresponse bias include the growing number of non-English speakers and distrust caused by scams and spam.
Interviews: Interviewing is expensive and time-consuming, yet trading sample size for high-quality results may still be worth it. Interviews must be carefully handled, so interviewers must be well-trained, which is an added cost. But you can obtain information on complex or sensitive topics (e.g., gender discrimination in companies, birth control practices, diet and exercise habits).
Web: Web surveys are growing in popularity, but are subject to nonresponse bias because those who participate may differ from those who feel too busy, don’t own computers or distrust your motives (scams and spam are again to blame). This type of survey works best when targeted to a well-defined interest group on a question of self-interest (e.g., views of CPAs on new proposed accounting rules, frequent flyer views on airline security).
Direct Observation: This can be done in a controlled setting (e.g., psychology lab) but requires informed consent, which can change behavior. Unobtrusive observation is possible in some nonlab settings (e.g., what percentage of airline passengers carry on more than two bags, what percentage of SUVs carry no passengers, what percentage of drivers wear seat belts).
Survey Research Survey Guidelines
Plan: What is the purpose of the survey? Consider staff expertise, needed skills, degree of precision, budget.
Design: Invest time and money in designing the survey. Use books and references to avoid unnecessary errors.
Quality: Take care in preparing a quality survey so that people will take you seriously.
Pilot Test: Pretest on friends or co-workers to make sure the survey is clear.
Buy-in: Improve response rates by stating the purpose of the survey, offering a token of appreciation or paving the way with endorsements.
Expertise: Work with a consultant early on.
Survey Research Getting Advice
• Consider hiring a consultant in the early stages.
• Many resources are available to help
  - The American Statistical Association
  - The Research Industry Coalition
  - The Council of American Survey Research Organizations
Survey Research Questionnaire Design
• Use a lot of white space in layout.
• Begin with short, clear instructions.
• State the survey purpose.
• Assure anonymity.
• Instruct on how to submit the completed survey.
Survey Research Questionnaire Design
• Break survey into naturally occurring sections.
• Let respondents bypass sections that are not applicable (e.g., “If you answered no to Question 7, skip directly to Question 15”).
• Pretest and revise as needed.
• Keep as short as possible.
Survey Research Questionnaire Design
Type of Question: Example
Open-ended question: Briefly describe your job goals.
Fill-in-the-blank: How many times did you attend formal religious services during the last year? ________ times
Check boxes: Which of these statistics packages have you ever used? [ ] SAS  [ ] Visual Statistics  [ ] SPSS  [ ] MegaStat  [ ] Systat  [ ] Minitab
Survey Research Questionnaire Design
Type of Question: Example
Ranked choices: “Please evaluate your dining experience”
              Excellent   Good   Fair   Poor
Food             [ ]       [ ]    [ ]    [ ]
Service          [ ]       [ ]    [ ]    [ ]
Cleanliness      [ ]       [ ]    [ ]    [ ]
Overall          [ ]       [ ]    [ ]    [ ]
Ambiance         [ ]       [ ]    [ ]    [ ]
Survey Research Questionnaire Design
Type of Question: Example
Pictograms: “What do you think of the President’s economic policies?” (circle one)
Likert scale: Statistics is a difficult subject.
  Strongly Agree | Slightly Agree | Neither Agree Nor Disagree | Slightly Disagree | Strongly Disagree
Survey Research Question Wording
• The way a question is asked has a profound influence on the response. For example,
  1. Shall state taxes be cut?
  2. Shall state taxes be cut, if it means reducing highway maintenance?
  3. Shall state taxes be cut, if it means firing teachers and police?
Survey Research Question Wording
• Make sure you have covered all the possibilities. For example,
  Are you married?  [ ] Yes  [ ] No
• Overlapping classes or unclear categories are a problem. For example,
  How old is your father?  [ ] 35 – 45  [ ] 45 – 55  [ ] 55 – 65  [ ] 65 or older
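The overlap problem in the age question can be checked mechanically. In this Python sketch, classes are (low, high) pairs with both ends included; `find_overlaps` is a hypothetical helper, and the corrected classes are just one possible fix. The open-ended "65 or older" class is left out for simplicity.

```python
def find_overlaps(classes):
    """Return adjacent pairs of classes that share at least one value.
    Each class is a (low, high) pair with both ends included."""
    ordered = sorted(classes)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] <= a[1]]


# Classes as worded in the example: a 45-year-old fits two of them
slide_classes = [(35, 45), (45, 55), (55, 65)]
print(find_overlaps(slide_classes))
# [((35, 45), (45, 55)), ((45, 55), (55, 65))]

# One possible correction (an assumption, not from the source)
fixed_classes = [(35, 44), (45, 54), (55, 64)]
print(find_overlaps(fixed_classes))  # []
```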
Survey Research Coding and Data Screening
• Responses are usually coded numerically (e.g., 1 = male, 2 = female).
• Missing values are typically denoted by special characters (e.g., blank, “.” or “*”).
• Discard questionnaires that are flawed or missing many responses.
• Watch for multiple responses, outrageous or inconsistent replies or range answers.
• Follow up if necessary and always document your data-coding decisions.
Survey Research Sources of Error
Source of Error     Characteristics
Nonresponse bias    Respondents differ from nonrespondents
Selection bias      Self-selected respondents are atypical
Response error      Respondents give false information
Coverage error      Incorrect specification of frame or population
Interviewer error   Responses influenced by interviewer
Measurement error   Survey instrument wording is biased or unclear
Sampling error      Random and unavoidable
Survey Research Data File Format
• Enter data into a spreadsheet or database as a “flat file” (n subjects x m variables matrix).
Survey Research Advice on Copying Data
• Using commas (,), dollar signs ($), or percents (%) as part of the values may result in your data being treated as text values.
• A numerical variable may only contain the digits 0-9, a decimal point, and a minus sign.
• To avoid round-off errors, format the data column as plain numbers with the desired number of decimal places before you copy the data to a statistical package.
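If the data have already been exported with such symbols, they can be stripped before analysis. The Python sketch below is one way to do it; `to_number` is a hypothetical helper, and converting "45%" to the fraction 0.45 is a design choice assumed here, not something the slides prescribe.

```python
def to_number(text):
    """Convert a formatted value such as "$1,234.50" or "45%" to a float."""
    cleaned = text.strip().replace(",", "").replace("$", "")
    if cleaned.endswith("%"):
        # Assumption: percents are stored as fractions (45% becomes 0.45)
        return float(cleaned[:-1]) / 100
    return float(cleaned)


raw_values = ["$1,234.50", "45%", "-12.5", "1,000"]
numbers = [to_number(value) for value in raw_values]
print(numbers)  # [1234.5, 0.45, -12.5, 1000.0]
```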