Doane Chapter 02

June 14, 2018 | Author: ThomasMCarter | Category: Level Of Measurement, Survey Methodology, Sampling (Statistics), Likert Scale, Statistics



Chapter 2: Data Collection

Data Vocabulary
Level of Measurement
Time Series and Cross-sectional Data
Sampling Concepts
Sampling Methods
Data Sources
Survey Research


Data Vocabulary

• Data is the plural form of the Latin datum (a “given” fact).
• In scientific research, data arise from experiments whose results are recorded systematically.
• In business, data usually arise from accounting transactions or management processes.
• Important decisions may depend on data.

Data Vocabulary: Subjects, Variables, Data Sets

• We will treat data as plural and use data set to mean a particular collection of data as a whole.
• Observation – each data value.
• Subject (or individual) – an item for study (e.g., an employee in your company).
• Variable – a characteristic of the subject or individual (e.g., employee’s income).

• Three types of data sets:

Data Set       Variables       Typical Tasks
Univariate     One             Histograms, descriptive statistics, frequency tallies
Bivariate      Two             Scatter plots, correlations, simple regression
Multivariate   More than two   Multiple regression, data mining, econometric modeling

• Consider a multivariate data set with 5 variables and 8 subjects: 5 x 8 = 40 observations.
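The count above can be checked with a small sketch. The variable names and subject labels below are hypothetical placeholders, not taken from the slide; only the shape (8 subjects, 5 variables) comes from the example.

```python
# Hypothetical multivariate data set: 8 subjects (rows) x 5 variables (columns).
# The variable names are illustrative only.
variables = ["age", "income", "position", "gender", "years_of_service"]
subjects = [f"employee_{i}" for i in range(1, 9)]

# One observation per (subject, variable) pair; None stands in for a recorded value.
data_set = {s: {v: None for v in variables} for s in subjects}

# Count every cell of the subjects-by-variables grid.
n_observations = sum(len(row) for row in data_set.values())
print(len(subjects), "subjects x", len(variables), "variables =", n_observations, "observations")
```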

Data Vocabulary: Data Types

• A data set may have a mixture of data types.

Types of Data
- Attribute (qualitative)
  - Verbal label: X = economics (your major)
  - Coded: X = 3 (i.e., economics)
- Numerical (quantitative)
  - Discrete: X = 2 (your siblings)
  - Continuous: X = 3.15 (your GPA)

Data Vocabulary: Attribute Data

• Also called categorical, nominal or qualitative data.
• Values are described by words rather than numbers.
• For example,
- Automobile style (e.g., X = full, midsize, compact, subcompact).
- Mutual fund (e.g., X = load, no-load).

Data Vocabulary: Data Coding

• Coding refers to using numbers to represent categories to facilitate statistical analysis.
• Coding an attribute as a number does not make the data numerical.
• For example, 1 = Bachelor’s, 2 = Master’s, 3 = Doctorate
• Rankings may exist, for example, 1 = Liberal, 2 = Moderate, 3 = Conservative

Data Vocabulary: Binary Data

• A binary variable has only two values, 1 = presence, 0 = absence of a characteristic of interest (codes themselves are arbitrary).
• For example,
- 1 = employed, 0 = not employed
- 1 = married, 0 = not married
- 1 = male, 0 = female
- 1 = female, 0 = male
• The coding itself has no numerical value, so binary variables are attribute data.

Data Vocabulary: Numerical Data

• Numerical or quantitative data arise from counting or some kind of mathematical operation.
• For example,
- Number of auto insurance claims filed in March (e.g., X = 114 claims).
- Ratio of profit to sales for last quarter (e.g., X = 0.0447).
• Can be broken down into two types – discrete or continuous data.

Data Vocabulary: Discrete Data

• A numerical variable with a countable number of values that can be represented by an integer (no fractional values).
• For example,
- Number of Medicaid patients (e.g., X = 2).
- Number of takeoffs at O’Hare (e.g., X = 37).

Data Vocabulary: Continuous Data

• A numerical variable that can have any value within an interval (e.g., length, weight, time, sales, price/earnings ratios).
• Any continuous interval contains infinitely many possible values (e.g., 426 ...

Here, N/n > 20.

Sampling Methods: Probability Samples

Simple Random Sample   Use random numbers to select items from a list (e.g., VISA cardholders).
Systematic Sample      Select every kth item from a list or sequence (e.g., restaurant customers).
Stratified Sample      Select randomly within defined strata (e.g., by age, occupation, gender).
Cluster Sample         Like stratified sampling except strata are geographical areas (e.g., zip codes).

Sampling Methods: Nonprobability Samples

Judgment Sample      Use expert knowledge to choose “typical” items (e.g., which employees to interview).
Convenience Sample   Use a sample that happens to be available (e.g., ask co-worker opinions at lunch).

Sampling Methods: Simple Random Sample

• Every item in the population of N items has the same chance of being chosen in the sample of n items.
• We rely on random numbers to select a name: =RANDBETWEEN(1,48)
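For readers working outside Excel, here is a minimal Python sketch of the same idea. It assumes the 48 names are numbered 1 to 48; the function name `simple_random_sample` and the sample size of 5 are illustrative choices, not part of the slide.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct item numbers from 1..population_size,
    each item having the same chance of being chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(range(1, population_size + 1), sample_size)

# Counterpart of =RANDBETWEEN(1,48): one integer from 1 to 48, both ends included.
one_name = random.randint(1, 48)

# A sample of 5 names (by number) out of the 48.
sample = simple_random_sample(48, 5, seed=1)
print(one_name, sample)
```

`random.sample` never returns the same item twice, so this draws without duplicates.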

Sampling Methods: Random Number Tables

• A table of random digits is used to select random numbers between 1 and N.
• Each digit 0 through 9 is equally likely to be chosen.

Sampling Methods: Setting Up a Rule

• For example, NilCo wants to award cash prizes to 10 of its 875 loyal customers.
• To get 10 three-digit numbers between 001 and 875, we define any consistent rule for moving through the random number table.

• Randomly point at the table to choose a starting point.
• Choose the first three digits of the selected five-digit block, move to the right one column, down one row, and repeat.
• When we reach the end of a line, wrap around to the other side of the table and continue.
• Discard any number greater than 875 and any duplicates.

Table of 1,000 Random Digits

82134  14458  66716  54269  31928  46241  03052  00260  32367  25783
07139  16829  76768  11913  42434  91961  92934  18229  15595  02566
45056  43939  31188  43272  11332  99494  19348  97076  95605  28010
10244  19093  78516  63463  85568  70034  82811  23261  48794  63984
12940  84434  50087  20189  58009  66972  05764  10421  36875  64964
84438  45828  40353  28925  11911  53502  24640  96880  93166  68409
98681  67871  71735  64113  90139  33466  65312  90655  75444  30845
43290  96753  18799  49713  39227  15955  46167  63853  03633  19990
96893  85410  88233  22094  30605  79024  01791  39388  85531  94576
75403  41227  00192  16814  47054  16814  81349  92264  01028  29071
78064  92111  51541  76563  69027  67718  06499  71938  17354  12680
26246  71746  94019  93165  96713  03316  75912  86209  12081  57817
98766  67312  96358  21351  86448  31828  86113  78868  67243  06763
37895  51055  11929  44443  15995  72935  99631  18190  85877  31309
27988  81163  52212  25102  61798  28670  01358  60354  74015  18556
19216  53008  44498  19262  12196  93947  90162  76337  12646  26838
28078  86729  69438  24235  35208  48957  53529  76297  41741  54735
34455  61363  93711  68038  75960  16327  95716  66964  28634  65015
53510  90412  70438  45932  57815  75144  52472  61817  41562  42084

Start here: row 3, column 2 (43939). Following the rule, the selected blocks are 43939, 78516, 20189, 11911, 33466, 46167, 39388, 01028, 12680 and 26246, giving the three-digit numbers 439, 785, 201, 119, 334, 461, 393, 010, 126 and 262.
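The rule for moving through the table can be sketched in Python. This is only an illustration under stated assumptions: the table is read as ten blocks per row, the starting point is taken to be row 3, column 2, and only the first 12 rows are typed in because the walk does not go further. The function name `walk_table` is hypothetical.

```python
# First 12 rows of the random digit table, ten 5-digit blocks per row (assumed layout).
ROWS = [line.split() for line in """\
82134 14458 66716 54269 31928 46241 03052 00260 32367 25783
07139 16829 76768 11913 42434 91961 92934 18229 15595 02566
45056 43939 31188 43272 11332 99494 19348 97076 95605 28010
10244 19093 78516 63463 85568 70034 82811 23261 48794 63984
12940 84434 50087 20189 58009 66972 05764 10421 36875 64964
84438 45828 40353 28925 11911 53502 24640 96880 93166 68409
98681 67871 71735 64113 90139 33466 65312 90655 75444 30845
43290 96753 18799 49713 39227 15955 46167 63853 03633 19990
96893 85410 88233 22094 30605 79024 01791 39388 85531 94576
75403 41227 00192 16814 47054 16814 81349 92264 01028 29071
78064 92111 51541 76563 69027 67718 06499 71938 17354 12680
26246 71746 94019 93165 96713 03316 75912 86209 12081 57817""".splitlines()]

def walk_table(rows, start_row, start_col, how_many, largest):
    """Take the first three digits of each visited block, then move right one
    column and down one row, wrapping around at the edge of the table.
    Numbers above `largest`, zero, and duplicates are discarded."""
    n_rows, n_cols = len(rows), len(rows[0])
    r, c = start_row - 1, start_col - 1  # convert 1-based position to 0-based index
    picked, visited = [], 0
    while len(picked) < how_many and visited < n_rows * n_cols:
        number = int(rows[r][c][:3])
        if 1 <= number <= largest and number not in picked:
            picked.append(number)
        c = (c + 1) % n_cols  # right one column, wrapping to the other side
        r = (r + 1) % n_rows  # down one row
        visited += 1
    return picked

winners = walk_table(ROWS, start_row=3, start_col=2, how_many=10, largest=875)
print(winners)
```

Any other consistent rule (for example, reading straight across each row) would serve equally well; the point is to fix the rule before looking at the digits.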

Sampling Methods: With or Without Replacement

• If we allow duplicates when sampling, then we are sampling with replacement.
• Duplicates are unlikely when n is much smaller than N.
• If we do not allow duplicates when sampling, then we are sampling without replacement.
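The distinction maps onto two functions in Python's standard `random` module. The sketch below reuses the NilCo numbers (N = 875, n = 10) as an assumed setting; the seed is arbitrary.

```python
import random

rng = random.Random(42)           # arbitrary seed so the draw is repeatable
population = list(range(1, 876))  # customer numbers 1..875

# With replacement: the same customer may be drawn more than once.
with_replacement = rng.choices(population, k=10)

# Without replacement: every drawn customer is distinct.
without_replacement = rng.sample(population, k=10)

print(with_replacement)
print(without_replacement)
```

With n = 10 and N = 875, the first list will usually contain no duplicates either, which is the second bullet's point.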

Sampling Methods: Computer Methods

Excel, Option A   Enter the Excel function =RANDBETWEEN(1,875) into 10 spreadsheet cells. Press F9 to get a new sample.
Excel, Option B   Enter the function =INT(1+875*RAND()) into 10 spreadsheet cells. Press F9 to get a new sample.
Internet          The web site www.random.org will give you many kinds of excellent random numbers (integers, decimals, etc.).
Minitab           Use Minitab’s Random Data menu with the Integer option.

These are pseudo-random generators because even the best algorithms eventually repeat themselves.

[Figure: Using MINITAB to generate random numbers.]
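A short Python sketch of Option B's formula, and of what "pseudo-random" means in practice. The helper name `int_one_to_n` and the seed 2018 are illustrative assumptions.

```python
import math
import random

def int_one_to_n(n, rng):
    """Counterpart of =INT(1+n*RAND()): rng.random() lies in [0, 1),
    so 1 + n*u lies in [1, n+1) and rounding down gives an integer 1..n."""
    return math.floor(1 + n * rng.random())

# Two generators started from the same seed produce the same "random" numbers,
# which is why such generators are called pseudo-random.
rng_a = random.Random(2018)
rng_b = random.Random(2018)
sample_a = [int_one_to_n(875, rng_a) for _ in range(10)]
sample_b = [int_one_to_n(875, rng_b) for _ in range(10)]
print(sample_a)
print(sample_a == sample_b)
```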

Sampling Methods: Row–Column Data Arrays

• When the data are arranged in a rectangular array, an item can be chosen at random by selecting a row and column.
• For example, in the 4 x 3 array, select a random column between 1 and 3 and a random row between 1 and 4.
• This way, each item has an equal chance of being selected.

• Use the =RANDBETWEEN function to choose row 3 and column 3 (Target).

Dillard's               K-Mart            Saks
Dollar General          Kohl's            Sears Roebuck
Federated Dept Stores   May Dept Stores   Target
J. C. Penney            Nordstrom         Wal-Mart Stores
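The row and column draw can be sketched in Python as follows, assuming the 12 stores are laid out as 4 rows by 3 columns in the order listed. The function name `pick_cell` is hypothetical.

```python
import random

# The 4 x 3 array of retailers, read row by row (assumed layout).
stores = [
    ["Dillard's", "K-Mart", "Saks"],
    ["Dollar General", "Kohl's", "Sears Roebuck"],
    ["Federated Dept Stores", "May Dept Stores", "Target"],
    ["J. C. Penney", "Nordstrom", "Wal-Mart Stores"],
]

def pick_cell(array, rng):
    """Choose a random row and a random column (both 1-based, as in a spreadsheet)."""
    row = rng.randint(1, len(array))     # like =RANDBETWEEN(1,4)
    col = rng.randint(1, len(array[0]))  # like =RANDBETWEEN(1,3)
    return row, col, array[row - 1][col - 1]

row, col, store = pick_cell(stores, random.Random())
print(f"row {row}, column {col}: {store}")
```

Because the row and column are drawn independently and uniformly, each of the 12 cells has probability 1/4 x 1/3 = 1/12.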

Sampling Methods: Randomizing a List

• In Excel, use the function =RAND() beside each row to create a column of random numbers between 0 and 1.
• Copy and paste these numbers into the same column using “Paste Special | Values” (to paste only the values and not the formulas).
• Sort the spreadsheet on the random number column.
• The first n items are a random sample of the entire list (they are as likely as any others).
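The same sort-on-a-random-column procedure, sketched in Python. The list of names and the sample size are hypothetical placeholders.

```python
import random

def randomize_list(items, rng):
    """Attach a random number in [0, 1) to each item (like =RAND() beside each row),
    sort on that number, and return the items in their new order."""
    keyed = [(rng.random(), item) for item in items]
    keyed.sort(key=lambda pair: pair[0])  # sort on the random number column only
    return [item for _, item in keyed]

rng = random.Random(7)  # arbitrary seed for a repeatable illustration
names = [f"name_{i}" for i in range(1, 13)]  # hypothetical list of 12 items
shuffled = randomize_list(names, rng)
sample = shuffled[:4]  # the first n = 4 items are the random sample
print(sample)
```

In Python, `random.shuffle` achieves the same result directly; the keyed sort is shown because it mirrors the spreadsheet steps.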

Sampling Methods: Systematic Sampling

• Sample by choosing every kth item from a list, starting from a randomly chosen entry on the list.
• For example, starting at item 2, we sample every k = 4 items to obtain a sample of n = 20 items from a list of N = 78 items.
• Note that N/n = 78/20 ≈ 4.

• A systematic sample of n items from a population of N items requires that periodicity k be approximately N/n.
• Systematic sampling should yield acceptable results unless patterns in the population happen to recur at periodicity k.
• Can be used with unlistable or infinite populations.
• Systematic samples are well-suited to linearly organized physical populations.

• For example, out of 501 companies, we want to obtain a sample of 25. What should the periodicity k be? k = N/n = 501/25 ≈ 20.
• So, we should choose every 20th company from a random starting point.
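Both systematic sampling examples can be reproduced with a short Python sketch. It assumes k is N/n rounded to the nearest integer and, when no start is given, that the random start falls among the first k items; both are illustrative choices, and `systematic_sample` is a hypothetical name.

```python
import random

def systematic_sample(N, n, start=None, rng=None):
    """Return (item numbers, k): every kth item of a list numbered 1..N,
    with periodicity k taken as N/n rounded to the nearest integer."""
    k = max(1, round(N / n))
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)  # assumption: random start within the first k items
    items = list(range(start, N + 1, k))[:n]  # keep at most n items
    return items, k

# N = 78, n = 20, starting at item 2: items 2, 6, 10, ..., 78.
items_78, k_78 = systematic_sample(78, 20, start=2)

# N = 501, n = 25: every 20th company from a random starting point.
items_501, k_501 = systematic_sample(501, 25, rng=random.Random(3))
print(k_78, items_78)
print(k_501, items_501)
```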

Sampling Methods: Stratified Sampling

• Utilizes prior information about the population.
• Applicable when the population can be divided into relatively homogeneous subgroups of known size (strata).
• A simple random sample of the desired size is taken within each stratum.
• For example, from a population containing 55% males and 45% females, randomly sample 110 males and 90 females (n = 200), so that the sample matches the population proportions.

• Or, take a random sample of the entire population and then combine individual strata estimates using appropriate weights.
• For a population with L strata, the population size N is the sum of the stratum sizes: N = N_1 + N_2 + ... + N_L
• The weight assigned to stratum j is its share of the population: w_j = N_j / N
• For example, take a random sample of n = 200 and then weight the responses for males by .55 and for females by .45.
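A minimal sketch of combining stratum estimates, assuming the weights are the population shares N_j / N (which is what the .55 and .45 in the example correspond to). The stratum sizes and stratum means below are made-up numbers for illustration only.

```python
def stratum_weights(stratum_sizes):
    """Weight of stratum j is its share of the population: w_j = N_j / N."""
    N = sum(stratum_sizes.values())
    return {j: N_j / N for j, N_j in stratum_sizes.items()}

def weighted_estimate(stratum_means, stratum_sizes):
    """Combine per-stratum means into one population estimate using the weights."""
    weights = stratum_weights(stratum_sizes)
    return sum(weights[j] * stratum_means[j] for j in stratum_means)

# Hypothetical population of 10,000 that is 55% male and 45% female.
sizes = {"male": 5500, "female": 4500}
# Hypothetical average responses observed within each stratum.
means = {"male": 3.0, "female": 4.0}

weights = stratum_weights(sizes)
estimate = weighted_estimate(means, sizes)  # 0.55 * 3.0 + 0.45 * 4.0
print(weights, round(estimate, 2))
```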

Sampling Methods: Cluster Sample

• Strata consist of geographical regions.
• One-stage cluster sampling – sample consists of all elements in each of k randomly chosen subregions (clusters).
• Two-stage cluster sampling – first choose k subregions (clusters), then choose a random sample of elements within each cluster.
• Here is an example of 4 elements sampled from each of 3 randomly chosen clusters (two-stage cluster sampling).
• Cluster sampling is useful when
- Population frame and stratum characteristics are not readily available
- It is too expensive to obtain a simple or stratified sample
- The cost of obtaining data increases sharply with distance
- Some loss of reliability is acceptable
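One-stage and two-stage cluster sampling can be sketched as follows. The cluster labels, cluster sizes and function names are hypothetical; only the "4 elements from each of 3 clusters" setting comes from the example above.

```python
import random

def one_stage_cluster_sample(clusters, k, rng):
    """Choose k clusters at random and keep every element in them."""
    chosen = rng.sample(sorted(clusters), k)
    return {name: list(clusters[name]) for name in chosen}

def two_stage_cluster_sample(clusters, k, m, rng):
    """Choose k clusters at random, then m random elements within each."""
    chosen = rng.sample(sorted(clusters), k)
    return {name: rng.sample(clusters[name], m) for name in chosen}

# Hypothetical population: 6 zip-code clusters with 10 households each.
clusters = {
    f"zip_{z}": [f"zip_{z}_household_{h}" for h in range(1, 11)]
    for z in range(1, 7)
}

rng = random.Random(11)  # arbitrary seed
one_stage = one_stage_cluster_sample(clusters, k=3, rng=rng)
two_stage = two_stage_cluster_sample(clusters, k=3, m=4, rng=rng)
print(sorted(two_stage))
```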

Sampling Methods: Judgment Sample

• A nonprobability sampling method that relies on the expertise of the sampler to choose items that are representative of the population.
• Can be affected by subconscious bias (i.e., nonrandomness in the choice).
• Quota sampling is a special kind of judgment sampling, in which the interviewer chooses a certain number of people in each category.

Sampling Methods: Convenience Sample

• Take advantage of whatever sample is available at that moment. A quick way to sample.

Sampling Methods: Sample Size

• Sample size depends on the inherent variability of the quantity being measured and on the desired precision of the estimate.

Data Sources: Useful Data Sources

Type of Data         Examples
U.S. general data    Statistical Abstract of the U.S.
U.S. economic data   Economic Report of the President
Almanacs             World Almanac, Time Almanac
Periodicals          Economist, Business Week, Fortune
Indexes              New York Times, Wall Street Journal
Databases            CompuStat, Citibase, U.S. Census
World data           CIA World Factbook
Web                  Google, Yahoo, msn

Survey Research: Basic Steps of Survey Research

• Step 1: State the goals of the research
• Step 2: Develop the budget (time, money, staff)
• Step 3: Create a research design (target population, frame, sample size)
• Step 4: Choose a survey type and method of ...
• Step 5: Design a data collection instrument (questionnaire)
• Step 6: Pretest the survey instrument and revise as needed
• Step 7: Administer the survey (follow up if needed)

Survey Research: Survey Types

Type of Survey: Mail
Characteristics: You need a well-targeted and current mailing list (people move a lot). Low response rates are typical and nonresponse bias is expected (nonrespondents differ from those who respond). Zip code lists (often costly) are an attractive option to define strata of similar income, education, and attitudes. To encourage participation, a cover letter should clearly explain the uses to which the data will be put. Plan for follow-up mailings.

Type of Survey: Telephone
Characteristics: Random dialing yields very low response and is poorly targeted. Purchased phone lists help reach the target population, though a low response rate is still typical (disconnected phones, caller screening, answering machines, work hours, no-call lists). Other sources of nonresponse bias include the growing number of non-English speakers and distrust caused by scams and spam.

Type of Survey: Interviews
Characteristics: Interviewing is expensive and time-consuming, yet trading sample size for high-quality results may still be worth it. Interviews must be carefully handled, so interviewers must be well-trained – an added cost. But you can obtain information on complex or sensitive topics (e.g., gender discrimination in companies, birth control practices, diet and exercise habits).

Type of Survey: Web
Characteristics: Web surveys are growing in popularity, but are subject to nonresponse bias because those who participate may differ from those who feel too busy, don’t own computers or distrust your motives (scams and spam are again to blame). This type of survey works best when targeted to a well-defined interest group on a question of self-interest (e.g., views of CPAs on new proposed accounting rules, frequent flyer views on airline security).

Type of Survey: Direct Observation
Characteristics: This can be done in a controlled setting (e.g., psychology lab) but requires informed consent, which can change behavior. Unobtrusive observation is possible in some nonlab settings (e.g., what percentage of airline passengers carry on more than two bags, what percentage of SUVs carry no passengers, what percentage of drivers wear seat belts).

Survey Research: Survey Guidelines

Plan         What is the purpose of the survey? Consider staff expertise, needed skills, degree of precision, budget.
Design       Invest time and money in designing the survey. Use books and references to avoid unnecessary errors.
Quality      Take care in preparing a quality survey so that people will take you seriously.
Pilot Test   Pretest on friends or co-workers to make sure the survey is clear.
Buy-in       Improve response rates by stating the purpose of the survey, offering a token of appreciation or paving the way with endorsements.
Expertise    Work with a consultant early on.

Survey Research: Getting Advice

• Consider hiring a consultant in the early stages.
• Many resources are available to help
- The American Statistical Association
- The Research Industry Coalition
- The Council of American Survey Research Organizations

Survey Research: Questionnaire Design

• Use a lot of white space in layout.
• Begin with short, clear instructions.
• State the survey purpose.
• Assure anonymity.
• Instruct on how to submit the completed survey.
• Break survey into naturally occurring sections.
• Let respondents bypass sections that are not applicable (e.g., “if you answered no to Question 7, skip directly to Question 15”).
• Pretest and revise as needed.
• Keep as short as possible.

Type of Question: Open-ended question
Example: Briefly describe your job goals.

Type of Question: Fill-in-the-blank
Example: How many times did you attend formal religious services during the last year? ________ times

Type of Question: Check boxes
Example: Which of these statistics packages have you ever used?
☐ SAS  ☐ Visual Statistics  ☐ SPSS  ☐ MegaStat  ☐ Systat  ☐ Minitab

Type of Question: Ranked choices
Example: “Please evaluate your dining experience”

              Excellent   Good   Fair   Poor
Food              ☐        ☐      ☐      ☐
Service           ☐        ☐      ☐      ☐
Ambiance          ☐        ☐      ☐      ☐
Cleanliness       ☐        ☐      ☐      ☐
Overall           ☐        ☐      ☐      ☐

Type of Question: Pictograms
Example: “What do you think of the President’s economic policies?” (circle one)

Type of Question: Likert scale
Example: “Statistics is a difficult subject.”
☐ Strongly Agree  ☐ Slightly Agree  ☐ Neither Agree Nor Disagree  ☐ Slightly Disagree  ☐ Strongly Disagree

Survey Research: Question Wording

• The way a question is asked has a profound influence on the response. For example,
1. Shall state taxes be cut?
2. Shall state taxes be cut, if it means reducing highway maintenance?
3. Shall state taxes be cut, if it means firing teachers and police?

• Make sure you have covered all the possibilities. For example,
  Are you married?  ☐ Yes  ☐ No
• Overlapping classes or unclear categories are a problem. For example,
  How old is your father?  ☐ 35 – 45  ☐ 45 – 55  ☐ 55 – 65  ☐ 65 or older
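The problem with the age classes can be made concrete with a small check. The sketch below counts how many answer boxes a given age fits; the repaired set of classes is one hypothetical fix, not taken from the slide.

```python
def matching_classes(age, classes):
    """Return the labels of every class whose range contains the given age."""
    return [label for label, (low, high) in classes.items() if low <= age <= high]

# The classes from the example: boundaries are shared, and ages under 35 are not covered.
overlapping = {
    "35-45": (35, 45),
    "45-55": (45, 55),
    "55-65": (55, 65),
    "65 or older": (65, float("inf")),
}

# One possible repair (an assumption): non-overlapping classes that cover every age.
repaired = {
    "under 35": (0, 34),
    "35-44": (35, 44),
    "45-54": (45, 54),
    "55-64": (55, 64),
    "65 or older": (65, float("inf")),
}

print(matching_classes(45, overlapping))  # a 45-year-old fits two boxes
print(matching_classes(30, overlapping))  # a 30-year-old fits none
print(matching_classes(45, repaired))
```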

Survey Research: Coding and Data Screening

• Responses are usually coded numerically (e.g., 1 = male, 2 = female).
• Missing values are typically denoted by special characters (e.g., blank, “.” or “*”).
• Discard questionnaires that are flawed or missing many responses.
• Watch for multiple responses, outrageous or inconsistent replies or range answers.
• Follow up if necessary and always document your data-coding decisions.
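A minimal sketch of coding and screening, assuming the 1 = male, 2 = female coding and the missing-value marks from the bullets above. The function names and the "at most one missing answer" threshold are hypothetical choices.

```python
GENDER_CODES = {"male": 1, "female": 2}  # coding from the example above
MISSING_MARKS = {"", ".", "*"}           # blank, "." or "*" denote a missing value

def code_gender(raw):
    """Return the numeric code, None for a missing value,
    or raise ValueError for a reply that needs follow-up."""
    text = raw.strip().lower()
    if text in MISSING_MARKS:
        return None
    if text not in GENDER_CODES:
        raise ValueError(f"unexpected response: {raw!r}")
    return GENDER_CODES[text]

def keep_questionnaire(coded_answers, max_missing):
    """Keep a questionnaire only if it has at most max_missing missing answers."""
    missing = sum(answer is None for answer in coded_answers)
    return missing <= max_missing

raw_answers = ["Male", "female", "*", " ", "."]
coded = [code_gender(answer) for answer in raw_answers]
print(coded)
print(keep_questionnaire([1, None, 2, 3], max_missing=1))
```

Raising an error on unexpected replies, rather than silently coding them, leaves a record that a decision was needed, in line with documenting data-coding decisions.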

Survey Research: Sources of Error

Source of Error      Characteristics
Nonresponse bias     Respondents differ from nonrespondents
Selection bias       Self-selected respondents are atypical
Response error       Respondents give false information
Coverage error       Incorrect specification of frame or population
Interviewer error    Responses influenced by interviewer
Measurement error    Survey instrument wording is biased or unclear
Sampling error       Random and unavoidable

Survey Research: Data File Format

• Enter data into a spreadsheet or database as a “flat file” (n subjects x m variables matrix).

Survey Research: Advice on Copying Data

• Using commas (,), dollar signs ($), or percent signs (%) as part of the values may result in your data being treated as text values.
• A numerical variable may only contain the digits 0-9, a decimal point, and a minus sign.
• To avoid round-off errors, format the data column as plain numbers with the desired number of decimal places before you copy the data to a statistical package.
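The cleaning step can be sketched in Python. This assumes that commas, dollar signs and percent signs should simply be stripped (so "12.5%" becomes 12.5 rather than 0.125); the function name `to_number` is hypothetical.

```python
import re

# A plain number: optional minus sign, digits, at most one decimal point.
PLAIN_NUMBER = re.compile(r"-?(\d+\.?\d*|\.\d+)")

def to_number(text):
    """Strip commas, dollar signs and percent signs, then convert to float.
    Raises ValueError if anything other than a plain number remains."""
    cleaned = text.strip()
    for symbol in (",", "$", "%"):
        cleaned = cleaned.replace(symbol, "")
    if not PLAIN_NUMBER.fullmatch(cleaned):
        raise ValueError(f"not a plain number: {text!r}")
    return float(cleaned)

print(to_number("$1,234.50"))
print(to_number("12.5%"))
print(to_number("-42"))
```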
