Survey Sampling Formula Sheet
Prepared by Robert D’Agostino
Formulas and descriptions obtained from: Elementary Survey Sampling, 7th Edition, by Richard L. Scheaffer, William Mendenhall III, R. Lyman Ott, and Kenneth G. Gerow
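Population Parameters / Sample Statistics

$$\mu = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (y_i - \mu)^2$$

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right]$$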
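Simple Random Sampling
If a sample of size $n$ is drawn from a population of size $N$ such that every possible sample of size $n$ has the same chance of being selected, the sampling procedure is called simple random sampling.

Estimators / bound on the error:

Population mean:
$$\hat{\mu} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \hat{V}(\bar{y}) = \frac{s^2}{n}\left(\frac{N-n}{N}\right), \qquad \text{bound: } 2\sqrt{\hat{V}(\bar{y})}$$

Population total:
$$\hat{\tau} = N\bar{y} = \frac{N}{n}\sum_{i=1}^{n} y_i, \qquad \hat{V}(\hat{\tau}) = N^2\,\frac{s^2}{n}\left(\frac{N-n}{N}\right), \qquad \text{bound: } 2\sqrt{\hat{V}(\hat{\tau})}$$

Population proportion ($y_i = 1$ if element $i$ has the characteristic, $0$ otherwise; $\hat{q} = 1 - \hat{p}$):
$$\hat{p} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \hat{V}(\hat{p}) = \frac{\hat{p}\hat{q}}{n-1}\left(\frac{N-n}{N}\right), \qquad \text{bound: } 2\sqrt{\hat{V}(\hat{p})}$$

Sample size required to estimate $\mu$ with a bound on the error of estimation B:
$$n = \frac{N\sigma^2}{(N-1)D + \sigma^2}, \qquad D = \frac{B^2}{4}$$

Sample size required to estimate $\tau$ with a bound on the error of estimation B:
$$n = \frac{N\sigma^2}{(N-1)D + \sigma^2}, \qquad D = \frac{B^2}{4N^2}$$

Sample size required to estimate $p$ with a bound on the error of estimation B:
$$n = \frac{Npq}{(N-1)D + pq}, \qquad q = 1 - p, \qquad D = \frac{B^2}{4}$$

If no estimate of $p$ is available, substitute $p = 0.5$ to obtain a conservative sample size, which will likely be larger than what is required.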
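As a check on the simple random sampling formulas above, here is a minimal Python sketch. The helper names (`srs_mean`, `srs_total`, `srs_proportion`, `srs_sample_size`) are hypothetical, `statistics.variance` supplies $s^2$ with the $n - 1$ divisor, and rounding the sample size up with `math.ceil` is an assumption made so that the bound is not exceeded.

```python
import math
from statistics import mean, variance  # variance() divides by n - 1, like s^2


def srs_mean(sample, N):
    """Return (y_bar, V_hat(y_bar), bound) for an SRS drawn from N units."""
    n = len(sample)
    y_bar = mean(sample)
    v_hat = variance(sample) / n * (N - n) / N
    return y_bar, v_hat, 2 * math.sqrt(v_hat)


def srs_total(sample, N):
    """Return (tau_hat, V_hat(tau_hat), bound) with tau_hat = N * y_bar."""
    y_bar, v_mean, _ = srs_mean(sample, N)
    v_hat = N ** 2 * v_mean
    return N * y_bar, v_hat, 2 * math.sqrt(v_hat)


def srs_proportion(indicators, N):
    """indicators holds 1 (has the characteristic) or 0 for each sampled unit."""
    n = len(indicators)
    p_hat = sum(indicators) / n
    v_hat = p_hat * (1 - p_hat) / (n - 1) * (N - n) / N
    return p_hat, v_hat, 2 * math.sqrt(v_hat)


def srs_sample_size(sigma2, N, B, total=False):
    """n = N*sigma2 / ((N - 1)*D + sigma2), rounded up; use sigma2 = p*q for p."""
    D = B ** 2 / (4 * N ** 2) if total else B ** 2 / 4
    return math.ceil(N * sigma2 / ((N - 1) * D + sigma2))


print(srs_mean([2, 4, 6, 8], N=100))
print(srs_sample_size(0.25, N=1000, B=0.05))  # conservative choice p = 0.5
```

For a proportion, pass `sigma2 = p * q` (0.25 when no estimate of $p$ is available).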
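Stratified Random Sampling
A stratified random sample is one obtained by separating the population elements into non-overlapping groups, called strata, and then selecting a simple random sample from each stratum.

Notation: $L$ = # of strata; $N_i$ = # of sampling units in stratum $i$, $i = 1, \dots, L$; $N = \sum_{i=1}^{L} N_i$; $n_i$ = sample size in stratum $i$; $\bar{y}_i$ and $s_i^2$ = sample mean and sample variance in stratum $i$.

Population mean:
$$\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i\bar{y}_i, \qquad \hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(\frac{N_i - n_i}{N_i}\right)\left(\frac{s_i^2}{n_i}\right)$$

Population total:
$$\hat{\tau}_{st} = N\bar{y}_{st} = \sum_{i=1}^{L} N_i\bar{y}_i, \qquad \hat{V}(N\bar{y}_{st}) = \sum_{i=1}^{L} N_i^2\left(\frac{N_i - n_i}{N_i}\right)\left(\frac{s_i^2}{n_i}\right)$$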
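A minimal sketch of the stratified estimators of $\mu$ and $\tau$, assuming each stratum sample is a plain Python list with at least two observations; `stratified_mean` and `stratified_total` are hypothetical names.

```python
import math
from statistics import mean, variance


def stratified_mean(samples, stratum_sizes):
    """samples[i] is the SRS from stratum i; stratum_sizes[i] is N_i."""
    N = sum(stratum_sizes)
    y_st = sum(Ni * mean(s) for s, Ni in zip(samples, stratum_sizes)) / N
    v_hat = sum(
        Ni ** 2 * ((Ni - len(s)) / Ni) * variance(s) / len(s)
        for s, Ni in zip(samples, stratum_sizes)
    ) / N ** 2
    return y_st, v_hat


def stratified_total(samples, stratum_sizes):
    """tau_hat = N * y_st and V_hat(N * y_st) = N^2 * V_hat(y_st)."""
    N = sum(stratum_sizes)
    y_st, v_hat = stratified_mean(samples, stratum_sizes)
    return N * y_st, N ** 2 * v_hat


y_st, v = stratified_mean([[1, 3], [10, 14]], [10, 30])
print(y_st, v, 2 * math.sqrt(v))  # estimate, variance, bound on the error
```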
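Approximate sample size required to estimate $\mu$ or $\tau$ with a bound B on the error of estimation:
$$n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2 / w_i}{N^2D + \sum_{i=1}^{L} N_i\sigma_i^2}$$
where $w_i = n_i/n$ is the fraction of observations allocated to stratum $i$, $D = B^2/4$ when estimating $\mu$, and $D = B^2/(4N^2)$ when estimating $\tau$.

Approximate allocation that minimizes cost for a fixed value of $V(\bar{y}_{st})$ or minimizes $V(\bar{y}_{st})$ for a fixed cost, where $c_i$ is the cost of one observation in stratum $i$:
$$n_i = n\left(\frac{N_i\sigma_i/\sqrt{c_i}}{\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c_k}}\right)$$

And the total sample size for the optimal allocation that minimizes cost for a fixed value of $V(\bar{y}_{st})$ or minimizes $V(\bar{y}_{st})$ for a fixed cost:
$$n = \frac{\left(\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c_k}\right)\left(\sum_{i=1}^{L} N_i\sigma_i\sqrt{c_i}\right)}{N^2D + \sum_{i=1}^{L} N_i\sigma_i^2}$$

Neyman allocation (the optimal allocation when all $c_i$ are equal):
$$n_i = n\left(\frac{N_i\sigma_i}{\sum_{k=1}^{L} N_k\sigma_k}\right), \qquad n = \frac{\left(\sum_{i=1}^{L} N_i\sigma_i\right)^2}{N^2D + \sum_{i=1}^{L} N_i\sigma_i^2}$$

Special case of Neyman allocation (proportional allocation): if the stratum variances are approximately equal, the Neyman allocation formulas reduce to
$$n_i = n\left(\frac{N_i}{N}\right), \qquad n = \frac{\sum_{i=1}^{L} N_i\sigma_i^2}{ND + \frac{1}{N}\sum_{i=1}^{L} N_i\sigma_i^2}$$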
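The allocation formulas can be sketched as follows. `optimal_allocation` and `proportional_allocation` are hypothetical helpers; the $\sigma_i$ and $c_i$ are assumed to be planning values supplied by the user, and leaving `cost_i` unset treats all costs as equal, which reduces the optimal allocation to Neyman allocation. Results are left unrounded.

```python
import math


def optimal_allocation(N_i, sigma_i, B, cost_i=None, total=False):
    """Return (n, [n_i]) for the cost-optimal allocation; equal costs give Neyman."""
    N = sum(N_i)
    D = B ** 2 / (4 * N ** 2) if total else B ** 2 / 4
    if cost_i is None:
        cost_i = [1.0] * len(N_i)  # equal costs -> Neyman allocation
    a = [Nk * sk / math.sqrt(ck) for Nk, sk, ck in zip(N_i, sigma_i, cost_i)]
    b = [Nk * sk * math.sqrt(ck) for Nk, sk, ck in zip(N_i, sigma_i, cost_i)]
    denom = N ** 2 * D + sum(Nk * sk ** 2 for Nk, sk in zip(N_i, sigma_i))
    n = sum(a) * sum(b) / denom
    return n, [n * ak / sum(a) for ak in a]


def proportional_allocation(N_i, sigma_i, B, total=False):
    """Return (n, [n_i]) with n_i = n * N_i / N."""
    N = sum(N_i)
    D = B ** 2 / (4 * N ** 2) if total else B ** 2 / 4
    s = sum(Nk * sk ** 2 for Nk, sk in zip(N_i, sigma_i))
    n = s / (N * D + s / N)
    return n, [n * Nk / N for Nk in N_i]


print(optimal_allocation([100, 200], [5, 10], B=2))
print(proportional_allocation([100, 200], [5, 10], B=2))
```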
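Estimation of the population proportion:
$$\hat{p}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i\hat{p}_i, \qquad \hat{V}(\hat{p}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(\frac{N_i - n_i}{N_i}\right)\left(\frac{\hat{p}_i\hat{q}_i}{n_i - 1}\right)$$

Approximate sample size required to estimate $p$ with a bound B on the error of estimation:
$$n = \frac{\sum_{i=1}^{L} N_i^2 p_i q_i / w_i}{N^2D + \sum_{i=1}^{L} N_i p_i q_i}, \qquad D = \frac{B^2}{4}$$

Approximate allocation that minimizes cost for a fixed value of $V(\hat{p}_{st})$ or minimizes $V(\hat{p}_{st})$ for a fixed cost:
$$n_i = n\left(\frac{N_i\sqrt{p_i q_i/c_i}}{\sum_{k=1}^{L} N_k\sqrt{p_k q_k/c_k}}\right)$$

Stratification after selection of the sample:
$$\bar{y}_{st} = \sum_{i=1}^{L}\frac{N_i}{N}\bar{y}_i, \qquad \hat{V}(\bar{y}_{st}) = \left(\frac{N-n}{Nn}\right)\sum_{i=1}^{L}\frac{N_i}{N}s_i^2 + \frac{1}{n^2}\sum_{i=1}^{L}\left(1 - \frac{N_i}{N}\right)s_i^2$$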
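Ratio, Regression, and Difference Estimation
Ratio estimation:
$$r = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{\bar{y}}{\bar{x}}, \qquad \hat{\tau}_y = r\tau_x, \qquad \hat{\mu}_y = r\mu_x$$

$$\hat{V}(r) = \left(\frac{N-n}{N}\right)\left(\frac{1}{\mu_x^2}\right)\frac{s_r^2}{n}, \qquad s_r^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - r x_i)^2$$

Note: we can estimate $\mu_x$ by $\bar{x}$ if needed.

$$\hat{V}(\hat{\tau}_y) = \tau_x^2\hat{V}(r) = N^2\left(\frac{N-n}{N}\right)\frac{s_r^2}{n}, \qquad \hat{V}(\hat{\mu}_y) = \mu_x^2\hat{V}(r) = \left(\frac{N-n}{N}\right)\frac{s_r^2}{n}$$

Sample size required to estimate $R$ with a bound on the error of estimation B:
$$n = \frac{N\sigma^2}{ND + \sigma^2}, \qquad D = \frac{B^2\mu_x^2}{4}$$

Sample size required to estimate $\tau_y$ with a bound on the error of estimation B:
$$n = \frac{N\sigma^2}{ND + \sigma^2}, \qquad D = \frac{B^2}{4N^2}$$

Sample size required to estimate $\mu_y$ with a bound on the error of estimation B:
$$n = \frac{N\sigma^2}{ND + \sigma^2}, \qquad D = \frac{B^2}{4}$$

In each case $\sigma^2$ is estimated by $s_r^2$ from a prior sample.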
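A sketch of the ratio estimators, assuming paired lists `y` and `x` from a simple random sample; `ratio_estimates` is a hypothetical name. When `mu_x` is omitted, the code follows the sheet's note and substitutes $\bar{x}$.

```python
from statistics import mean


def ratio_estimates(y, x, N, mu_x=None):
    """Ratio estimators from an SRS of n pairs (y_i, x_i) drawn from N units."""
    n = len(y)
    r = sum(y) / sum(x)
    if mu_x is None:
        mu_x = mean(x)  # estimate mu_x by x_bar if it is not known
    s_r2 = sum((yi - r * xi) ** 2 for yi, xi in zip(y, x)) / (n - 1)
    fpc = (N - n) / N
    v_mu = fpc * s_r2 / n  # V_hat(mu_hat_y) = mu_x^2 * V_hat(r)
    return {
        "r": r,
        "V_r": v_mu / mu_x ** 2,
        "mu_y": r * mu_x,
        "V_mu_y": v_mu,
        "tau_y": N * r * mu_x,  # r * tau_x with tau_x = N * mu_x
        "V_tau_y": N ** 2 * v_mu,
    }


print(ratio_estimates([3, 4, 6], [1, 2, 3], N=30, mu_x=2))
```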
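Ratio Estimation in Stratified Random Sampling
Separate ratio estimator: estimate the ratio within each stratum by $r_i = \bar{y}_i/\bar{x}_i$, and then combine the stratum estimates $r_i\mu_{xi}$ in a weighted average to obtain a single estimate, denoted by $\hat{\mu}_{yRS}$ ($\mu_{xi}$ is the population mean of $x$ in stratum $i$, and $\hat{\rho}_i$ is the sample correlation between $x$ and $y$ in stratum $i$):
$$\hat{\mu}_{yRS} = \sum_{i=1}^{L}\frac{N_i}{N}\,r_i\mu_{xi}, \qquad \hat{V}(\hat{\mu}_{yRS}) = \sum_{i=1}^{L}\left(\frac{N_i}{N}\right)^2\left(\frac{N_i - n_i}{N_i}\right)\frac{1}{n_i}\left(s_{yi}^2 + r_i^2 s_{xi}^2 - 2r_i\hat{\rho}_i s_{xi}s_{yi}\right)$$

$$\hat{\tau}_{yRS} = N\hat{\mu}_{yRS} = \sum_{i=1}^{L} N_i r_i\mu_{xi}, \qquad \hat{V}(\hat{\tau}_{yRS}) = N^2\hat{V}(\hat{\mu}_{yRS})$$

Combined ratio estimator: we estimate $\mu_y$ by the stratified estimate $\bar{y}_{st}$ as usual, and then estimate $\mu_x$ by $\bar{x}_{st}$, leading to the combined ratio estimate denoted by $\hat{\mu}_{yRC}$:
$$\hat{\mu}_{yRC} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\mu_x = r_c\mu_x, \qquad \hat{V}(\hat{\mu}_{yRC}) = \sum_{i=1}^{L}\left(\frac{N_i}{N}\right)^2\left(\frac{N_i - n_i}{N_i}\right)\frac{1}{n_i}\left(s_{yi}^2 + r_c^2 s_{xi}^2 - 2r_c\hat{\rho}_i s_{xi}s_{yi}\right)$$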
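Regression estimation:
$$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x}), \qquad b = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

$$\hat{V}(\hat{\mu}_{yL}) = \left(\frac{N-n}{N}\right)\frac{\text{MSE}}{n}, \qquad \text{MSE} = \frac{1}{n-2}\left[\sum_{i=1}^{n}(y_i - \bar{y})^2 - b^2\sum_{i=1}^{n}(x_i - \bar{x})^2\right]$$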
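A sketch of the regression estimator, assuming $\mu_x$ is known; `regression_estimate` is a hypothetical name and $b$ is the ordinary least-squares slope from the formula above.

```python
from statistics import mean


def regression_estimate(y, x, N, mu_x):
    """Return (mu_hat_yL, V_hat) for an SRS of n pairs from N units."""
    n = len(y)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx                        # least-squares slope
    mse = (syy - b ** 2 * sxx) / (n - 2)  # residual mean square
    mu_hat = y_bar + b * (mu_x - x_bar)
    v_hat = (N - n) / N * mse / n
    return mu_hat, v_hat


print(regression_estimate([2, 3, 7, 8], [1, 2, 3, 4], N=40, mu_x=3))
```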
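Difference estimation ($d_i = y_i - x_i$, $\bar{d} = \bar{y} - \bar{x}$):
$$\hat{\mu}_{yD} = \bar{y} + (\mu_x - \bar{x}) = \mu_x + \bar{d}, \qquad \hat{V}(\hat{\mu}_{yD}) = \left(\frac{N-n}{N}\right)\frac{s_d^2}{n}, \qquad s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2$$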
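A sketch of the difference estimator; `difference_estimate` is a hypothetical name, and $\mu_x$ is assumed known.

```python
from statistics import mean, variance


def difference_estimate(y, x, N, mu_x):
    """Return (mu_hat_yD, V_hat) using the differences d_i = y_i - x_i."""
    n = len(y)
    d = [yi - xi for yi, xi in zip(y, x)]
    mu_hat = mu_x + mean(d)  # equals y_bar + (mu_x - x_bar)
    v_hat = (N - n) / N * variance(d) / n
    return mu_hat, v_hat


print(difference_estimate([5, 7, 9], [3, 4, 6], N=30, mu_x=10))
```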
Systematic Sampling
A sample obtained by randomly selecting one element from the first k elements in the frame and every kth element thereafter is called a 1-in-k systematic sample with a random start. Similarly, one can perform a repeated systematic sampling procedure, also known as an r-in-k systematic sample, in which r random starts are selected.
$$\hat{\mu} = \bar{y}_{sy} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \hat{V}(\bar{y}_{sy}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n}$$
These formulas are exactly the same as for simple random sampling, and the same holds for proportions and totals. Sample sizes are also calculated with the simple random sampling formulas.

Cluster Sampling
A cluster sample is a probability sample in which each sampling unit is a collection, or cluster, of elements. In these formulas: $N$ = # of clusters in the population; $n$ = # of clusters selected in an SRS; $m_i$ = # of elements in cluster $i$, $i = 1, \dots, N$; $\bar{m} = \frac{1}{n}\sum_{i=1}^{n} m_i$ = average cluster size for the sample; $M = \sum_{i=1}^{N} m_i$ = # of elements in the population; $\bar{M} = M/N$ = average cluster size for the population; $y_i$ = total of all observations in cluster $i$.
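Estimator of the population mean $\mu$ ($\bar{M}$ can be estimated by $\bar{m}$ if $M$ is unknown):
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} m_i}, \qquad \hat{V}(\bar{y}) = \left(\frac{N-n}{N}\right)\left(\frac{1}{n\bar{M}^2}\right)s_r^2, \qquad s_r^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y}m_i)^2$$

Estimator of the population total $\tau$:
$$\hat{\tau} = M\bar{y}, \qquad \hat{V}(M\bar{y}) = N^2\left(\frac{N-n}{N}\right)\frac{s_r^2}{n}$$

Estimator of the population total, when $M$ is unknown:
$$\hat{\tau} = N\bar{y}_t, \quad \bar{y}_t = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \hat{V}(N\bar{y}_t) = N^2\left(\frac{N-n}{N}\right)\frac{s_t^2}{n}, \qquad s_t^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y}_t)^2$$

Approximate sample size required to estimate $\mu$ with a bound B on the error of estimation:
$$n = \frac{N\sigma_r^2}{ND + \sigma_r^2}, \qquad D = \frac{B^2\bar{M}^2}{4}$$

Approximate sample size required to estimate $\tau$ using $M\bar{y}$ with a bound B on the error of estimation:
$$n = \frac{N\sigma_r^2}{ND + \sigma_r^2}, \qquad D = \frac{B^2}{4N^2}$$

Approximate sample size required to estimate $\tau$ using $N\bar{y}_t$ with a bound B on the error of estimation:
$$n = \frac{N\sigma_t^2}{ND + \sigma_t^2}, \qquad D = \frac{B^2}{4N^2}$$

Here $\sigma_r^2$ and $\sigma_t^2$ are estimated by $s_r^2$ and $s_t^2$ from a prior sample.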
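A sketch of the one-stage cluster estimators; `cluster_estimates` is a hypothetical name, `cluster_totals[i]` is $y_i$ and `cluster_sizes[i]` is $m_i$ for the $n$ sampled clusters. When `M` is not supplied, $\bar{M}$ is replaced by $\bar{m}$ and only the $N\bar{y}_t$ total is returned.

```python
from statistics import mean, variance


def cluster_estimates(cluster_totals, cluster_sizes, N, M=None):
    """Estimators for an SRS of n clusters drawn from N clusters."""
    n = len(cluster_totals)
    y_bar = sum(cluster_totals) / sum(cluster_sizes)  # ratio estimator of mu
    M_bar = M / N if M is not None else mean(cluster_sizes)  # m_bar if M unknown
    s_r2 = sum(
        (yi - y_bar * mi) ** 2 for yi, mi in zip(cluster_totals, cluster_sizes)
    ) / (n - 1)
    fpc = (N - n) / N
    out = {"mu": y_bar, "V_mu": fpc * s_r2 / (n * M_bar ** 2)}
    if M is not None:
        out["tau_M"] = M * y_bar
        out["V_tau_M"] = N ** 2 * fpc * s_r2 / n
    # N * y_t_bar does not need M
    out["tau_N"] = N * mean(cluster_totals)
    out["V_tau_N"] = N ** 2 * fpc * variance(cluster_totals) / n
    return out


print(cluster_estimates([4, 10, 12], [2, 4, 6], N=20, M=80))
```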
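Estimator of the population proportion $p$: let $a_i$ denote the total number of elements in cluster $i$ that possess the characteristic of interest.
$$\hat{p} = \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} m_i}, \qquad \hat{V}(\hat{p}) = \left(\frac{N-n}{N}\right)\left(\frac{1}{n\bar{M}^2}\right)s_p^2, \qquad s_p^2 = \frac{1}{n-1}\sum_{i=1}^{n}(a_i - \hat{p}m_i)^2$$

Sample size required to estimate $p$ with a bound B on the error of estimation ($\sigma_p^2$ estimated by $s_p^2$):
$$n = \frac{N\sigma_p^2}{ND + \sigma_p^2}, \qquad D = \frac{B^2\bar{M}^2}{4}$$

Cluster sampling combined with stratification (stratum $i$ contains $N_i$ clusters, of which $n_i$ are sampled; $\bar{y}_{ti}$ and $\bar{m}_i$ are the sample mean cluster total and mean cluster size in stratum $i$; $y_{ij}$ and $m_{ij}$ are the total and size of sampled cluster $j$ in stratum $i$):
$$\bar{y} = \frac{\sum_{i=1}^{L} N_i\bar{y}_{ti}}{\sum_{i=1}^{L} N_i\bar{m}_i}, \qquad \hat{V}(\bar{y}) = \frac{1}{M^2}\sum_{i=1}^{L} N_i^2\left(\frac{N_i - n_i}{N_i}\right)\frac{s_{ri}^2}{n_i}, \qquad s_{ri}^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}m_{ij})^2$$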
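Cluster Sampling with Probabilities Proportional to Size
The $n$ clusters are drawn with replacement, cluster $i$ being selected with probability $\pi_i = m_i/M$ on each draw; $\bar{y}_i = y_i/m_i$ is the mean of cluster $i$.
$$\hat{\mu}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i, \qquad \hat{V}(\hat{\mu}_{pps}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}(\bar{y}_i - \hat{\mu}_{pps})^2$$

$$\hat{\tau}_{pps} = \frac{M}{n}\sum_{i=1}^{n}\bar{y}_i, \qquad \hat{V}(\hat{\tau}_{pps}) = \frac{M^2}{n(n-1)}\sum_{i=1}^{n}(\bar{y}_i - \hat{\mu}_{pps})^2$$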
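A sketch of probability-proportional-to-size cluster sampling; `pps_draw` and `pps_estimates` are hypothetical helpers. It assumes clusters are drawn with replacement, which is what `random.choices` with `weights` does, and that the mean $\bar{y}_i$ of every selected cluster is then measured.

```python
import random


def pps_draw(sizes, n, rng=random):
    """Draw n cluster indices with replacement, probability m_i / M per draw."""
    return rng.choices(range(len(sizes)), weights=sizes, k=n)


def pps_estimates(cluster_means, M):
    """Return (mu_hat, V_hat(mu_hat), tau_hat, V_hat(tau_hat))."""
    n = len(cluster_means)
    mu_hat = sum(cluster_means) / n
    ss = sum((yb - mu_hat) ** 2 for yb in cluster_means)
    v_mu = ss / (n * (n - 1))
    return mu_hat, v_mu, M * mu_hat, M ** 2 * v_mu


idx = pps_draw([3, 1, 6], n=5, rng=random.Random(1))
print(idx, pps_estimates([2, 4, 6], M=100))
```

A cluster drawn twice simply contributes its mean twice, as the with-replacement formulas require.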
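Two-Stage Cluster Sampling
A two-stage cluster sample is obtained by first selecting a probability sample of clusters and then selecting a probability sample of elements from each sampled cluster. For this chapter we denote: $N$ = # of clusters in the population; $n$ = # of clusters selected in a simple random sample; $M_i$ = # of elements in cluster $i$; $m_i$ = # of elements selected in a simple random sample from cluster $i$; $M = \sum_{i=1}^{N} M_i$ = # of elements in the population; $\bar{M} = M/N$ = average cluster size for the population; $y_{ij}$ = $j$th observation in the sample from the $i$th cluster; $\bar{y}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}$ = sample mean for the $i$th cluster.

$$\hat{\mu} = \frac{N}{M}\left(\frac{\sum_{i=1}^{n} M_i\bar{y}_i}{n}\right) = \frac{1}{\bar{M}}\left(\frac{\sum_{i=1}^{n} M_i\bar{y}_i}{n}\right)$$

$$\hat{V}(\hat{\mu}) = \left(\frac{N-n}{N}\right)\left(\frac{1}{n\bar{M}^2}\right)s_b^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(\frac{M_i - m_i}{M_i}\right)\left(\frac{s_i^2}{m_i}\right)$$

$$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(M_i\bar{y}_i - \bar{M}\hat{\mu})^2, \qquad s_i^2 = \frac{1}{m_i - 1}\sum_{j=1}^{m_i}(y_{ij} - \bar{y}_i)^2$$

$$\hat{\tau} = M\hat{\mu} = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i, \qquad \hat{V}(\hat{\tau}) = N^2\left(\frac{N-n}{N}\right)\frac{s_b^2}{n} + \frac{N}{n}\sum_{i=1}^{n} M_i^2\left(\frac{M_i - m_i}{M_i}\right)\left(\frac{s_i^2}{m_i}\right)$$

Ratio estimation (use when $M$ is unknown; $\bar{M}$ is then estimated by $\frac{1}{n}\sum_{i=1}^{n} M_i$):
$$\hat{\mu}_r = \frac{\sum_{i=1}^{n} M_i\bar{y}_i}{\sum_{i=1}^{n} M_i}$$

$$\hat{V}(\hat{\mu}_r) = \left(\frac{N-n}{N}\right)\left(\frac{1}{n\bar{M}^2}\right)s_r^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(\frac{M_i - m_i}{M_i}\right)\left(\frac{s_i^2}{m_i}\right), \qquad s_r^2 = \frac{1}{n-1}\sum_{i=1}^{n} M_i^2(\bar{y}_i - \hat{\mu}_r)^2$$

Estimation of a population proportion ($\hat{p}_i$ = sample proportion from cluster $i$, $\hat{q}_i = 1 - \hat{p}_i$):
$$\hat{p} = \frac{\sum_{i=1}^{n} M_i\hat{p}_i}{\sum_{i=1}^{n} M_i}$$

$$\hat{V}(\hat{p}) = \left(\frac{N-n}{N}\right)\left(\frac{1}{n\bar{M}^2}\right)s_r^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(\frac{M_i - m_i}{M_i}\right)\left(\frac{\hat{p}_i\hat{q}_i}{m_i - 1}\right), \qquad s_r^2 = \frac{1}{n-1}\sum_{i=1}^{n} M_i^2(\hat{p}_i - \hat{p})^2$$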
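A sketch of the two-stage estimators of $\mu$; `two_stage_mean` is a hypothetical name. It assumes every second-stage sample has at least two elements, so that $s_i^2$ exists, and uses the unbiased estimator when $M$ is given and the ratio estimator otherwise (with $\bar{M}$ replaced by the mean of the sampled $M_i$).

```python
from statistics import mean, variance


def two_stage_mean(samples, M_i, N, M=None):
    """samples[i]: second-stage SRS from sampled cluster i; M_i[i]: its size."""
    n = len(samples)
    y_bars = [mean(s) for s in samples]
    totals = [Mi * yb for Mi, yb in zip(M_i, y_bars)]  # estimated cluster totals
    # Within-cluster part: sum of M_i^2 * (1 - m_i/M_i) * s_i^2 / m_i
    within = sum(
        Mi ** 2 * ((Mi - len(s)) / Mi) * variance(s) / len(s)
        for s, Mi in zip(samples, M_i)
    )
    if M is not None:  # unbiased estimator
        M_bar = M / N
        mu = sum(totals) / (n * M_bar)
        s2 = sum((t - M_bar * mu) ** 2 for t in totals) / (n - 1)  # s_b^2
    else:  # ratio estimator
        M_bar = mean(M_i)
        mu = sum(totals) / sum(M_i)
        s2 = sum((t - Mi * mu) ** 2 for t, Mi in zip(totals, M_i)) / (n - 1)  # s_r^2
    fpc = (N - n) / N
    v_hat = fpc * s2 / (n * M_bar ** 2) + within / (n * N * M_bar ** 2)
    return mu, v_hat


print(two_stage_mean([[1, 3], [5, 7]], M_i=[4, 4], N=10, M=40))
```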
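Sampling equal-sized clusters: suppose that each cluster contains $\bar{M}$ elements; that is, $M_i = \bar{M}$. In this case, it is common to take samples of equal size $m$ from each cluster, so that $m_i = m$. Under these conditions we can obtain the estimates
$$\hat{\mu} = \bar{\bar{y}} = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij}, \qquad \hat{V}(\hat{\mu}) = \left(\frac{N-n}{N}\right)\frac{\text{MSB}}{nm} + \left(\frac{\bar{M} - m}{\bar{M}}\right)\frac{\text{MSW}}{Nm}$$

$$\text{MSB} = \frac{m}{n-1}\sum_{i=1}^{n}(\bar{y}_i - \bar{\bar{y}})^2, \qquad \text{MSW} = \frac{1}{n(m-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}(y_{ij} - \bar{y}_i)^2$$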
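Two-Stage Cluster Sampling with Probabilities Proportional to Size
Since the number of elements in a cluster may vary from cluster to cluster, it may be advantageous to sample clusters with probabilities proportional to their sizes. These estimates are for two-stage cluster sampling in which the first-stage sampling is carried out with probabilities proportional to size ($\pi_i = M_i/M$), and $\bar{y}_i$ is the mean of the second-stage sample from cluster $i$.
$$\hat{\mu}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i, \qquad \hat{V}(\hat{\mu}_{pps}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}(\bar{y}_i - \hat{\mu}_{pps})^2$$

$$\hat{\tau}_{pps} = \frac{M}{n}\sum_{i=1}^{n}\bar{y}_i, \qquad \hat{V}(\hat{\tau}_{pps}) = \frac{M^2}{n(n-1)}\sum_{i=1}^{n}(\bar{y}_i - \hat{\mu}_{pps})^2$$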
Estimating the Population Size
Direct sampling: first, a random sample of size $t$ is drawn from the population, and these individuals are tagged and released. At a later date, a second sample of size $n$ is drawn. Let $s$ denote the number of tagged individuals observed in the second sample.
$$\hat{N} = \frac{nt}{s}, \qquad \hat{V}(\hat{N}) = \frac{t^2 n(n-s)}{s^3}$$

Inverse sampling: first, an initial sample of $t$ individuals is drawn, tagged, and released. Later, random sampling is conducted until exactly $s$ tagged animals are recaptured. If the sample contains $n$ individuals, the proportion of tagged individuals is estimated by $\hat{p} = s/n$. Note that $s$ is fixed and $n$ is random.
$$\hat{N} = \frac{nt}{s}, \qquad \hat{V}(\hat{N}) = \frac{t^2 n(n-s)}{s^2(s+1)}$$
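A sketch of the direct and inverse capture-recapture estimators; the function names are hypothetical. The two estimators of $N$ coincide, and only the variance differs because $n$ is random under inverse sampling.

```python
def direct_estimate(t, n, s):
    """t tagged; a second sample of fixed size n contains s tagged individuals."""
    N_hat = n * t / s
    v_hat = t ** 2 * n * (n - s) / s ** 3
    return N_hat, v_hat


def inverse_estimate(t, n, s):
    """t tagged; sampling stopped at the s-th tagged individual, after n draws."""
    N_hat = n * t / s
    v_hat = t ** 2 * n * (n - s) / (s ** 2 * (s + 1))
    return N_hat, v_hat


print(direct_estimate(t=100, n=50, s=10))
print(inverse_estimate(t=100, n=50, s=10))
```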
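Choosing sample sizes for direct and inverse sampling: let $p_1 = t/N$ and $p_2 = n/N$. It turns out that
$$V(\hat{N}) \approx \frac{N(1 - p_1)}{p_1 p_2}$$
Thus, given an estimate of $N$ and a target value of the variance, one can determine either of the sampling fractions, given a choice for the other.

Estimating population density and size from quadrat samples: let $m_i$ denote the number of elements in quadrat $i$, $M$ = # of total elements in the population (having combined area $A$), and $\lambda = M/A$ = density of elements. Suppose the element counts $m_1, \dots, m_n$ are obtained from $n$ independently and randomly selected quadrats, each of area $a$.
$$\bar{m} = \frac{1}{n}\sum_{i=1}^{n} m_i, \qquad \hat{\lambda} = \frac{\bar{m}}{a}, \qquad \hat{V}(\hat{\lambda}) = \frac{\hat{\lambda}}{an}$$

$$\hat{M} = \hat{\lambda}A, \qquad \hat{V}(\hat{M}) = A^2\hat{V}(\hat{\lambda}) = A^2\left(\frac{\hat{\lambda}}{an}\right)$$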
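Estimating population density and size from stocked quadrats: a quadrat that contains the species of interest is said to be stocked. For a sample of $n$ quadrats, each of area $a$, from a population of area $A$, let $y$ denote the number of sampled quadrats that are NOT stocked.
$$\hat{\lambda} = -\frac{1}{a}\ln\left(\frac{y}{n}\right), \qquad \hat{V}(\hat{\lambda}) = \frac{1}{na^2}\left(e^{\hat{\lambda}a} - 1\right)$$

$$\hat{M} = \hat{\lambda}A, \qquad \hat{V}(\hat{M}) = A^2\hat{V}(\hat{\lambda})$$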
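Both quadrat-based density estimators can be sketched in a few lines. The helper names are hypothetical; the count-based variance $\hat{\lambda}/(an)$ rests on the assumption that elements are randomly (Poisson) scattered, and the stocked-quadrat estimator needs at least one unstocked quadrat ($y > 0$), since $\ln 0$ is undefined.

```python
import math
from statistics import mean


def quadrat_density(counts, a, A):
    """counts m_i from n random quadrats of area a; population area A."""
    n = len(counts)
    lam = mean(counts) / a
    v_lam = lam / (a * n)  # assumes Poisson (random) spatial pattern
    return lam, v_lam, lam * A, A ** 2 * v_lam


def stocked_density(n, y, a, A):
    """y = number of the n sampled quadrats that are NOT stocked."""
    if not 0 < y <= n:
        raise ValueError("need 0 < y <= n")
    lam = -math.log(y / n) / a
    v_lam = (math.exp(lam * a) - 1) / (n * a ** 2)
    return lam, v_lam, lam * A, A ** 2 * v_lam


print(quadrat_density([2, 4, 6], a=2, A=100))
print(stocked_density(n=100, y=50, a=1, A=1000))
```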
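Adaptive sampling: let $m_i$ denote the number of cells in the network containing the $i$th initially sampled cell, and let $y_i$ denote the total count of points of interest in that network. For an initial simple random sample of $n$ of the $N$ cells, with $\bar{y}_i = y_i/m_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i$:
$$\hat{\tau} = N\bar{y} = \frac{N}{n}\sum_{i=1}^{n}\frac{y_i}{m_i}, \qquad \hat{V}(\hat{\tau}) = N^2\left(\frac{N-n}{N}\right)\frac{1}{n}\cdot\frac{\sum_{i=1}^{n}(\bar{y}_i - \bar{y})^2}{n-1}$$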
Supplemental Topics
Interpenetrating subsamples: an experimenter is interested in obtaining information from a simple random sample of $n$ people selected from a population of size $N$. She has $k$ interviewers available to do the fieldwork, but the interviewers differ in their manner of interviewing and hence obtain slightly different responses from identical subjects. A good estimate of the population mean can be obtained by randomly dividing the $n$ sample elements into $k$ subsamples of $m$ elements each ($n = km$) and assigning one interviewer to each of the subsamples. We consider the first subsample to be a simple random sample of size $m$ selected from the $n$ elements in the total sample. The second subsample is then a simple random sample of size $m$ selected from the remaining $(n - m)$ elements. This process is continued until the $n$ elements have been randomly divided into $k$ subsamples. The subsamples are called interpenetrating samples. Now let $y_{ij}$ denote the $j$th observation in the $i$th subsample, where $i = 1, \dots, k$ and $j = 1, \dots, m$.
$$\bar{y}_i = \frac{1}{m}\sum_{j=1}^{m} y_{ij}, \qquad \hat{\mu} = \bar{y} = \frac{1}{k}\sum_{i=1}^{k}\bar{y}_i, \qquad \hat{V}(\bar{y}) = \left(\frac{N-n}{N}\right)\frac{1}{k}\cdot\frac{\sum_{i=1}^{k}(\bar{y}_i - \bar{y})^2}{k-1}$$
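A sketch of interpenetrating subsamples: a random split of the sample followed by the estimator of $\mu$. Shuffling once and cutting the permutation into $k$ consecutive blocks gives the same distribution as drawing the subsamples one after another by SRS. The function names are hypothetical.

```python
import random
from statistics import mean


def split_interpenetrating(sample, k, rng=random):
    """Randomly divide the n sample elements into k subsamples of m = n // k."""
    n = len(sample)
    if n % k:
        raise ValueError("n must equal k * m")
    shuffled = rng.sample(sample, n)  # a random permutation of the sample
    m = n // k
    return [shuffled[i * m:(i + 1) * m] for i in range(k)]


def interpenetrating_estimate(subsamples, N):
    """Return (y_bar, V_hat(y_bar)) from the k subsample means."""
    k = len(subsamples)
    n = sum(len(s) for s in subsamples)
    means = [mean(s) for s in subsamples]
    y_bar = mean(means)
    v_hat = (N - n) / N * sum((yb - y_bar) ** 2 for yb in means) / (k * (k - 1))
    return y_bar, v_hat


parts = split_interpenetrating(list(range(12)), k=3, rng=random.Random(0))
print(parts, interpenetrating_estimate(parts, N=1000))
```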
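Estimation of means and totals over subpopulations: let $N$ denote the number of elements in the population and $N_1$ the number of elements in the subpopulation. A simple random sample of $n$ elements is selected from the population of $N$ elements. Let $n_1$ denote the number of sampled elements from the subpopulation, and let $y_{1j}$ denote the $j$th sampled observation that falls in the subpopulation. We consider the sample mean of the $n_1$ elements from the subpopulation.
$$\hat{\mu}_1 = \bar{y}_1 = \frac{1}{n_1}\sum_{j=1}^{n_1} y_{1j}, \qquad \hat{V}(\bar{y}_1) = \left(\frac{N_1 - n_1}{N_1}\right)\frac{s_1^2}{n_1}, \qquad s_1^2 = \frac{1}{n_1 - 1}\sum_{j=1}^{n_1}(y_{1j} - \bar{y}_1)^2$$

If $N_1$ is unknown, $n_1/N_1$ can be approximated by $n/N$.

$$\hat{\tau}_1 = N_1\bar{y}_1, \qquad \hat{V}(N_1\bar{y}_1) = N_1^2\left(\frac{N_1 - n_1}{N_1}\right)\frac{s_1^2}{n_1}$$

Estimator of the subpopulation total when $N_1$ is unknown:
$$\hat{\tau}_1 = \frac{N}{n}\sum_{j=1}^{n_1} y_{1j}, \qquad \hat{V}(\hat{\tau}_1) = N^2\left(\frac{N-n}{N}\right)\frac{s_u^2}{n}$$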
Here $s_u^2$ is the sample variance calculated from an adjusted sample in which all the observations NOT from the subpopulation of interest are replaced with zeros; the sample variance is then calculated from all $n$ "observations".

Random-response model: this model allows respondents to answer questions on sensitive issues (such as criminal behavior or sexuality) while maintaining confidentiality. Designate the people in the population with and without the characteristic of interest as groups A and B, respectively. Let $p$ be the proportion of people in the population who belong to group A. We wish to estimate this parameter by starting with a stack of cards that are identical except that a fraction $\theta$ are marked A and the remaining fraction $(1 - \theta)$ are marked B. A simple random sample of size $n$ is selected from the population. Each person is asked to randomly draw a card from the deck and to state "yes" if the letter on the card agrees with the group to which they belong, or "no" if it does not. The card is replaced before the next person draws, and the interviewer does not see the cards, only the "yes" and "no" responses. Let $n_1$ be the number of people in the sample who responded "yes". We have the following unbiased estimator of $p$ (for $\theta \neq 1/2$):
$$\hat{p} = \frac{n_1/n - (1 - \theta)}{2\theta - 1}, \qquad V(\hat{p}) = \frac{p(1-p)}{n} + \frac{\theta(1-\theta)}{n(2\theta - 1)^2}$$
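A sketch of the randomized-response estimator, with a small simulation to sanity-check it. The names are hypothetical, the variance estimate plugs $\hat{p}$ into $V(\hat{p})$, and the simulation draws group membership independently with probability $p$, which mimics sampling with replacement rather than an exact finite-population SRS.

```python
import random


def rr_estimate(n_yes, n, theta):
    """Return (p_hat, plug-in variance) from n_yes "yes" answers out of n."""
    if theta == 0.5:
        raise ValueError("theta = 1/2 makes p unidentifiable")
    p_hat = (n_yes / n - (1 - theta)) / (2 * theta - 1)
    v_hat = p_hat * (1 - p_hat) / n + theta * (1 - theta) / (n * (2 * theta - 1) ** 2)
    return p_hat, v_hat


def simulate_responses(p, theta, n, rng=random):
    """Count "yes" answers: in group A w.p. p, draws an A card w.p. theta."""
    n_yes = 0
    for _ in range(n):
        in_a = rng.random() < p
        card_a = rng.random() < theta
        n_yes += in_a == card_a  # "yes" when the card matches the person's group
    return n_yes


n_yes = simulate_responses(p=0.3, theta=0.8, n=20000, rng=random.Random(42))
print(rr_estimate(n_yes, 20000, 0.8))
```

The factor $(2\theta - 1)^2$ in the variance shows the trade-off: a $\theta$ close to $1/2$ protects respondents better but inflates the variance.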