Estimation
July 13, 2016 | Author: Utkarsh Singh | Category: N/A
INFERENTIAL STATISTICS
Statistical inference may be divided into two major areas:
• Estimation
• Testing of Hypothesis
ESTIMATION
Terms
• In the theory of estimation, a STATISTIC is renamed an ESTIMATOR (for example, the sample mean x̄).
• The value of an estimator is called an ESTIMATE.
• Point Estimate is a SINGLE VALUE of the estimator, obtained from the available sample observations.
Example: the proportion of vegetarians in a random sample of 50 PGP students can be a point estimate of the corresponding proportion in the population of all PGP students.
• Interval Estimate (Confidence Interval) is an INTERVAL that provides a lower and an upper bound for a specific unknown population parameter.
Example: the interval (45%, 52%) may contain the true proportion of vegetarians among all PGP students with 95% confidence.
• For the intervals constructed here (point estimate ± margin of error), the point estimate always lies within the interval estimate.
POINT ESTIMATION
Point Estimators and Their Properties
An estimator of a parameter is a statistic used to estimate the parameter. The most commonly used estimators are:

Population parameter               Estimator (statistic)
Mean (μ)                           Sample mean (x̄)
Variance (σ²)                      Sample variance (s²)
Standard deviation (σ)             Sample standard deviation (s)
Proportion (p)                     Sample proportion (p̂)
Difference of means (μ1 − μ2)      Difference of sample means (x̄1 − x̄2)
Desirable properties of estimators include:
• Unbiasedness
• Efficiency
• Consistency
Unbiased and Biased Estimators
Bias
An unbiased estimator is on target on average.
A biased estimator is off target on average.
Properties of estimator: Unbiasedness
T is said to be an unbiased estimator of θ iff E(T) = θ.
Example: the sample mean is an unbiased estimator of the population mean:
$$E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu$$
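Example of a biased estimator: the sample variance with divisor n.
Given a sample of size n from a population with unknown mean (μ) and variance (σ²), we estimate the mean by x̄ as before and, intuitively, the variance by:
$$T = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$$
T is not an unbiased estimator of the population variance:
$$E(T) = \frac{1}{n}\sum_{i=1}^{n} E(x_i^2) - E(\bar{x}^2)$$
$$= \frac{1}{n}\sum_{i=1}^{n}\left[\mathrm{var}(x_i) + (E(x_i))^2\right] - \left[\mathrm{var}(\bar{x}) + (E(\bar{x}))^2\right]$$
$$= \frac{1}{n}\sum_{i=1}^{n}\left[\sigma^2 + \mu^2\right] - \left[\frac{\sigma^2}{n} + \mu^2\right] = \frac{n-1}{n}\,\sigma^2$$
That is why, when the mean and variance are unknown, the following equation is used for the sample variance:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$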
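The bias of the divisor-n estimator can be checked exactly on a small case. The sketch below is illustrative only: the four-value population and the helper name `expected_variance_estimates` are assumptions, not part of the original slides. It enumerates every equally likely ordered sample of size n drawn with replacement and averages both variance estimates using exact fractions.

```python
from fractions import Fraction
from itertools import product


def expected_variance_estimates(population, n):
    """Average the divisor-n and divisor-(n-1) variance estimates over every
    equally likely ordered sample of size n drawn with replacement."""
    samples = list(product(population, repeat=n))
    total_div_n = Fraction(0)
    total_div_n_minus_1 = Fraction(0)
    for sample in samples:
        mean = Fraction(sum(sample), n)
        sum_sq = sum((Fraction(x) - mean) ** 2 for x in sample)
        total_div_n += sum_sq / n
        total_div_n_minus_1 += sum_sq / (n - 1)
    return total_div_n / len(samples), total_div_n_minus_1 / len(samples)


population = [1, 2, 3, 4]  # hypothetical population, chosen only for illustration
n = 3                      # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)

e_T, e_s2 = expected_variance_estimates(population, n)
print("population variance:", sigma2)
print("E(T), divisor n    :", e_T)
print("E(s^2), divisor n-1:", e_s2)
```

On this toy population, E(T) equals (n − 1)/n times σ², while E(s²) equals σ² exactly, matching the derivation.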
Consistency
• A consistent estimator converges towards the parameter being estimated as the sample size increases. In particular, T is consistent for θ if E(T) → θ and V(T) → 0 as n → ∞.
[Figure: sampling distribution of the estimator for n = 10 and for n = 100.]
McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.
Efficiency An estimator is efficient if it has a relatively small variance (and standard deviation).
An efficient estimator is, on average, closer to the parameter being estimated.
An inefficient estimator is, on average, farther from the parameter being estimated.
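Example: sample mean vs sample median (normal population)
• E(sample mean) = μ and E(sample median) = μ
• V(sample mean) = σ²/n
• V(sample median) ≈ 1.57σ²/n (for large n)
Both estimators are unbiased, but the sample mean has the smaller variance, so it is the more efficient estimator of μ.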
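A small simulation can make the comparison concrete. This is only a sketch under stated assumptions: a standard normal population, n = 25, 4000 replications and a fixed seed are all arbitrary choices, and the function name `compare_mean_and_median` is hypothetical.

```python
import random
import statistics


def compare_mean_and_median(n=25, reps=4000, mu=0.0, sigma=1.0, seed=0):
    """Simulate the variances of the sample mean and the sample median
    for repeated normal samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)


var_mean, var_median = compare_mean_and_median()
print(f"V(sample mean)   ~ {var_mean:.4f}")
print(f"V(sample median) ~ {var_median:.4f}")
print(f"ratio            ~ {var_median / var_mean:.2f}")
```

The simulated variance of the mean should sit near σ²/n = 0.04, and the ratio should be noticeably above 1. The 1.57 figure is a large-sample value, so the ratio for a modest n need not match it exactly.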
Example
Suppose you want to estimate the mean and sd of a batsman's score in one-day cricket. You randomly choose 5 different innings and record the scores below:
20 52 8 63 11
Find unbiased estimates of the mean and variance.
$$\bar{x} = \frac{20 + 52 + 8 + 63 + 11}{5} = 30.8$$
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{2514.8}{4} = 628.7, \qquad s = \sqrt{628.7} \approx 25.07$$
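The arithmetic for this example can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor. Note that 25.07 is the standard deviation; the corresponding variance estimate is its square, about 628.7.

```python
import statistics

scores = [20, 52, 8, 63, 11]  # innings scores from the example

x_bar = statistics.mean(scores)   # sample mean
s2 = statistics.variance(scores)  # sample variance with divisor n - 1
s = statistics.stdev(scores)      # square root of s2

print(f"mean = {x_bar:.1f}, s^2 = {s2:.1f}, s = {s:.2f}")
# prints: mean = 30.8, s^2 = 628.7, s = 25.07
```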
Interval Estimator = Point Estimator ± Margin of Error
Elements of Interval Estimation
• A probability that the population parameter falls somewhere within the interval
• Sample statistic (point estimator), at the centre of the confidence interval
• Confidence limit (lower) and confidence limit (upper), the end points of the confidence interval
Elements of Interval Estimation
• Confidence Coefficient/Level: probability that the confidence interval will contain the true parameter
• Denoted by 100(1 − α)%, e.g. 90%, 95%, 99%
• α: probability that the interval does not contain the parameter
The confidence coefficient (1 − α) is the central area under the curve of the sampling distribution; the remaining area α is split equally between the two tails.
Interval Estimator (Large Sample)
Confidence intervals:
• Mean, σ known
• Mean, σ unknown
• Proportion
Confidence Interval of μ (σ known)
Assumptions
– Population standard deviation, σ, is known
– Sample size is large
Confidence Interval Estimator of μ:
Let x1, x2, ..., xn be an iid random sample of size n, drawn from a population with mean μ and sd σ. The 100(1 − α)% confidence interval of μ is
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
The quantity $z_{\alpha/2}\,\sigma/\sqrt{n}$ is often called the margin of error or the sampling error.
For example, if n = 30, σ = 20 and x̄ = 122, then for a 95% confidence interval α = 0.05, α/2 = 0.025 and $z_{0.025} = 1.96$:
$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 122 \pm 1.96\frac{20}{\sqrt{30}} = 122 \pm (1.96)(3.6515) = 122 \pm 7.16 = [114.84,\ 129.16]$$
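The interval formula translates directly into a few lines of code. The sketch below uses `statistics.NormalDist().inv_cdf` from the Python standard library to obtain the critical value; the function name `z_confidence_interval` is an illustrative choice rather than anything defined in the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Large-sample confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # right-tail area alpha/2
    margin = z * sigma / sqrt(n)             # margin of error
    return x_bar - margin, x_bar + margin


# Worked example from the text: n = 30, sigma = 20, sample mean = 122
low, high = z_confidence_interval(122, 20, 30, confidence=0.95)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

Carrying full precision through the calculation gives a margin of about 7.16, so the end points round to 114.84 and 129.16.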
What is happening?
[Figure: Sampling Distribution of the Mean. About 95% of sample means x̄ fall within the interval μ ± 1.96σ/√n; 2.5% fall below the interval and 2.5% fall above it.]
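Background
We define $z_{\alpha/2}$ as the z value that cuts off a right-tail area of α/2 under the standard normal curve.
$$P\left(z > z_{\alpha/2}\right) = \frac{\alpha}{2}$$
$$P\left(z < -z_{\alpha/2}\right) = \frac{\alpha}{2}$$
$$P\left(-z_{\alpha/2} < z < z_{\alpha/2}\right) = 1 - \alpha$$
[Figure: Standard Normal Distribution, with central area (1 − α) between $-z_{\alpha/2}$ and $z_{\alpha/2}$ and area α/2 in each tail.]
(1 − α)100% Confidence Interval:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$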
Critical Values of z and Levels of Confidence

(1 − α)    α/2      z_{α/2}
0.99       0.005    2.576
0.98       0.010    2.326
0.95       0.025    1.960
0.90       0.050    1.645
0.80       0.100    1.282

[Figure: Standard Normal Distribution, with central area (1 − α) and area α/2 in each tail beyond ±z_{α/2}.]
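The tabulated critical values can be reproduced from the inverse of the standard normal CDF. A minimal sketch, assuming only the Python standard library; `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_critical(confidence):
    """z value that cuts off a right-tail area of alpha/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


print("(1-a)   a/2     z")
for level in (0.99, 0.98, 0.95, 0.90, 0.80):
    print(f"{level:.2f}   {(1 - level) / 2:.3f}   {z_critical(level):.3f}")
```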
Example 10.1
Confidence level and the Width of the Confidence Interval
When sampling from the same population, using a fixed sample size, the higher the confidence level, the wider the confidence interval.
[Figure: two Standard Normal Distribution panels, one for each of the intervals below.]
80% Confidence Interval: $\bar{x} \pm 1.28\,\sigma/\sqrt{n}$
95% Confidence Interval: $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$
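The effect of the confidence level on the width can be tabulated for a fixed sample. In this sketch the values σ = 20 and n = 30 are assumed purely for illustration, and `interval_width` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(sigma, n, confidence):
    """Full width (upper limit minus lower limit) of the z interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)


sigma, n = 20, 30  # assumed values, fixed across confidence levels
levels = (0.80, 0.90, 0.95, 0.99)
widths = [interval_width(sigma, n, level) for level in levels]

for level, width in zip(levels, widths):
    print(f"{level:.0%} confidence: width = {width:.2f}")
```

With σ and n held fixed, each step up in confidence level produces a wider interval.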
Sample Size and the Width of the Confidence Interval
When sampling from the same population, using a fixed confidence level, the larger the sample size, n, the narrower the confidence interval.
[Figure: two Sampling Distribution of the Mean panels showing the 95% Confidence Interval for n = 20 and for n = 40.]
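Because the margin of error is proportional to 1/√n, doubling the sample size shrinks the width by a factor of 1/√2 ≈ 0.707 rather than halving it. The sketch below compares n = 20 and n = 40 at 95% confidence; σ = 20 is an assumed value and `width_95` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def width_95(sigma, n):
    """Full width of the 95% z interval for the mean."""
    z = NormalDist().inv_cdf(0.975)
    return 2 * z * sigma / sqrt(n)


sigma = 20  # assumed population standard deviation
w20 = width_95(sigma, 20)
w40 = width_95(sigma, 40)

print(f"n = 20: width = {w20:.2f}")
print(f"n = 40: width = {w40:.2f}")
print(f"ratio  = {w40 / w20:.3f}")
```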
Confidence Interval of μ (σ unknown)
Assumptions
– Population standard deviation, σ, is unknown
– Sample size is large
Confidence Interval Estimator of μ:
Let x1, x2, ..., xn be an iid random sample of large size n, drawn from a population with mean μ and sd σ. The 100(1 − α)% confidence interval of μ is
$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$
where
$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
Practice Problem 1:
• A manufacturer of light bulbs claims that its light bulbs have a mean life of μ hours with a standard deviation of 85 hours. A random sample of 40 such bulbs is selected for testing. If the sample produces a mean value of 1505 hours, find the 95% confidence interval of μ.
Solution: Given n = 40 (large), σ = 85 (known), 1 − α = 0.95, α = 0.05, x̄ = 1505 and
$$z_{\alpha/2} = z_{0.025} = 1.96$$
Therefore, the 95% CI of μ is given by
$$\left[1505 - 1.96\frac{85}{\sqrt{40}},\ 1505 + 1.96\frac{85}{\sqrt{40}}\right] = [1478.66,\ 1531.34]$$
Practice Problem 2:
• Waiting times (in hours) at a popular restaurant are found to have a mean of 1.52 hours with sd 2.25 hours for a sample of 50 customers. Construct the 99% confidence interval for the population mean.
Solution: Given n = 50 (large), s = 2.25 (estimated), 1 − α = 0.99, α = 0.01, x̄ = 1.52 and
$$z_{\alpha/2} = z_{0.005} = 2.58$$
Therefore, the 99% CI of μ is given by
$$\left[1.52 - 2.58\frac{2.25}{\sqrt{50}},\ 1.52 + 2.58\frac{2.25}{\sqrt{50}}\right] = [1.52 - 0.82,\ 1.52 + 0.82] = [0.70,\ 2.34]$$
Large-Sample Confidence Intervals for the Population Proportion, p
The estimator of the population proportion, p, is the sample proportion, p̂. If the sample size is large, p̂ has an approximately normal distribution, with E(p̂) = p and V(p̂) = pq/n, where q = (1 − p). When the population proportion is unknown, use the estimated value, p̂, to estimate the standard deviation of p̂. For estimating p, a sample is considered large enough when both np and nq are greater than 5.
Large-Sample Confidence Intervals for the Population Proportion, p
• Assumptions
– Two categorical outcomes
– Population follows a binomial distribution
– Large sample
The 100(1 − α)% confidence interval for the population proportion p is given by
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \;\le\; p \;\le\; \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Practice Problem 3: A marketing research firm wants to estimate the share that foreign companies have in the Indian market for certain products. A random sample of 100 consumers is obtained, and it is found that 34 people in the sample are users of foreign-made products; the rest are users of domestic products. Give a 95% confidence interval for the share of foreign products in this market.
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96\sqrt{\frac{(0.34)(0.66)}{100}} = 0.34 \pm (1.96)(0.04737) = 0.34 \pm 0.0928 = [0.2472,\ 0.4328]$$
Thus, the firm may be 95% confident that foreign manufacturers control anywhere from 24.72% to 43.28% of the market.
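The same calculation can be written as a small function. This is a sketch assuming the Python standard library only; the name `proportion_confidence_interval` is hypothetical, and the size check applies the "greater than 5" rule using the estimated proportion.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample (normal approximation) interval for a proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Rule of thumb from the text, applied to the estimated proportion.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Practice Problem 3: 34 users of foreign-made products out of 100
low, high = proportion_confidence_interval(34, 100, confidence=0.95)
print(f"95% CI: [{low:.4f}, {high:.4f}]")
```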
Reducing the Width of Confidence Intervals
The Value of Information
The width of a confidence interval can be reduced only at the price of:
• a lower level of confidence, or
• a larger sample.
Lower Level of Confidence: 90% Confidence Interval (n = 100)
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.645\sqrt{\frac{(0.34)(0.66)}{100}} = 0.34 \pm (1.645)(0.04737) = 0.34 \pm 0.07792 = [0.2621,\ 0.4179]$$
Larger Sample Size: Sample Size, n = 200 (95% Confidence Interval)
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96\sqrt{\frac{(0.34)(0.66)}{200}} = 0.34 \pm (1.96)(0.03350) = 0.34 \pm 0.0657 = [0.2743,\ 0.4057]$$
Interval Estimator (Small Sample)
Confidence intervals:
• Mean, σ known
• Mean, σ unknown
Confidence Interval of μ (σ known)
Assumptions
– Population distribution is normal
– Population standard deviation, σ, is known
Confidence Interval Estimator of μ:
Let x1, x2, ..., xn be an iid random sample of size n, drawn from a normal distribution with mean μ and sd σ. The 100(1 − α)% confidence interval of μ is
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Confidence Interval of μ (σ unknown)
Assumptions
– Population distribution is normal
– Population standard deviation, σ, is unknown
Confidence Interval Estimator of μ:
Let x1, x2, ..., xn be a random sample of size n, drawn from a normal distribution with mean μ and sd σ. The 100(1 − α)% confidence interval of μ is
$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
where $t_{\alpha/2,\,n-1}$ is the value of the t distribution with n − 1 degrees of freedom that cuts off a tail area of α/2 to its right, and where
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
Practice Problem 4:
A stock market analyst wants to estimate the average return on a certain stock. A random sample of 15 days yields an average (annualized) return of x̄ = 10.37% and a standard deviation of s = 3.5%. Assuming a normal population of returns, give a 95% confidence interval for the average return on this stock.
The critical value of t for df = (n − 1) = (15 − 1) = 14 and a right-tail area of 0.025 is:
$$t_{0.025} = 2.145$$
Treating the reported s as computed with divisor n, the divisor-(n − 1) standard deviation s′ is obtained from
$$s'^2 = \frac{15}{14}\,s^2 = 13.125; \qquad s' = 3.623$$
The corresponding confidence interval or interval estimate is:
$$\bar{x} \pm t_{0.025}\frac{s'}{\sqrt{n}} = 10.37 \pm 2.145\,\frac{3.623}{\sqrt{15}} = 10.37 \pm 2.01 = [8.36,\ 12.38]$$
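The t interval can be sketched in code as follows. The Python standard library has no t quantile function, so the critical value 2.145 (df = 14, right-tail area 0.025) is passed in as quoted in the problem; `t_confidence_interval` is a hypothetical helper name. The rescaling step assumes, as the solution's factor 15/14 suggests, that the reported s = 3.5 was computed with divisor n.

```python
from math import sqrt


def t_confidence_interval(x_bar, s, n, t_crit):
    """Small-sample interval for the mean; s uses the n - 1 divisor and
    t_crit is the t value for n - 1 df with right-tail area alpha/2."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


n = 15
s_reported = 3.5                          # assumed to use divisor n
s_prime = sqrt(n / (n - 1)) * s_reported  # rescaled to divisor n - 1

low, high = t_confidence_interval(10.37, s_prime, n, t_crit=2.145)
print(f"s' = {s_prime:.3f}")
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

With s′ ≈ 3.623 the margin of error evaluates to about 2.01, so the interval runs from roughly 8.36% to 12.38%.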
Sample-Size Determination
Before determining the necessary sample size, three questions must be answered:
• How close do you want your sample estimate to be to the unknown parameter? (What is the desired bound, B?)
• What do you want the desired confidence level (1 − α) to be so that the distance between your estimate and the parameter is less than or equal to B?
• What is your estimate of the variance (or standard deviation) of the population in question?
For example, in a (1 − α)100% confidence interval for μ,
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
the margin of error $z_{\alpha/2}\,\sigma/\sqrt{n}$ is the bound, B.
Minimum Sample Size: Mean and Proportion
Minimum required sample size in estimating the population mean, μ:
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2}$$
Bound of estimate: B (known)
Minimum required sample size in estimating the population proportion, p:
$$n = \frac{z_{\alpha/2}^2\,pq}{B^2}$$
Example 1
A marketing research firm wants to conduct a survey to estimate the average amount spent on entertainment by each person visiting a popular resort. The people who plan the survey would like to determine the average amount spent by all people visiting the resort to within $120, with 95% confidence. From past operation of the resort, an estimate of the population standard deviation is s = $400. What is the minimum required sample size?
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2} = \frac{(1.96)^2(400)^2}{120^2} = 42.684$$
Rounding up, the minimum required sample size is n = 43.
Example 2
The manufacturers of a sports car want to estimate the proportion of people in a given income bracket who are interested in the model. The company wants to know the population proportion, p, to within 0.10 with 99% confidence. Current company records indicate that the proportion p may be around 0.25. What is the minimum required sample size for this survey?
$$n = \frac{z_{\alpha/2}^2\,pq}{B^2} = \frac{(2.576)^2(0.25)(0.75)}{(0.10)^2} = 124.42$$
Rounding up, the minimum required sample size is n = 125.
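Both sample-size formulas round up to the next whole observation, which `math.ceil` handles. The sketch below uses the critical values quoted in the examples (1.96 and 2.576) and the bound B = 0.10 that appears in the worked calculation for Example 2; the function names are hypothetical.

```python
from math import ceil


def sample_size_for_mean(z, sigma, bound):
    """Minimum n so that z * sigma / sqrt(n) does not exceed the bound."""
    return ceil(z ** 2 * sigma ** 2 / bound ** 2)


def sample_size_for_proportion(z, p, bound):
    """Minimum n so that z * sqrt(p * q / n) does not exceed the bound."""
    q = 1 - p
    return ceil(z ** 2 * p * q / bound ** 2)


# Example 1: within $120 at 95% confidence, sigma estimated as $400
n_mean = sample_size_for_mean(z=1.96, sigma=400, bound=120)

# Example 2: within 0.10 at 99% confidence, p believed to be about 0.25
n_prop = sample_size_for_proportion(z=2.576, p=0.25, bound=0.10)

print(n_mean, n_prop)  # prints: 43 125
```

Tightening the bound is expensive: since n grows with 1/B², a bound of 0.01 instead of 0.10 would multiply the required sample size by 100.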
Problem
NDTV randomly selected 10,000 final year students across different management schools in India and asked them about their career choices. 4% said they want to take the plunge and start their own companies even if that meant giving up lucrative job offers from established MNCs. Find a 99% confidence interval for the true population proportion of management students in India who want to start their own companies.