The Normal Distribution Estimation Correlation (1)
THE NORMAL DISTRIBUTION
[Figure: the normal curve, with the horizontal axis marked from -3 to 3]
DEFINITION: A continuous random variable X is said to be normally distributed if its density function is given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

for $-\infty < x < \infty$ and for constants µ and σ, where $-\infty < \mu < \infty$ and $\sigma > 0$.

Notation: If X follows the above distribution, we write $X \sim N(\mu, \sigma^2)$. The graph of the normal distribution is called the normal curve.

Properties of the normal curve:
1. The curve is bell-shaped and symmetric about a vertical axis through the mean µ.
2. The normal curve approaches the horizontal axis asymptotically as we proceed in either direction away from the mean.
3. The total area under the curve and above the horizontal axis is equal to 1.
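The density and two of the properties above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names `normal_pdf` and `area_under_curve` and the values µ = 84, σ = 12 are chosen here only for illustration, and the area is approximated with a simple midpoint rule over µ ± 8σ.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def area_under_curve(mu, sigma, width=8.0, steps=20000):
    """Midpoint-rule approximation of the area between mu - width*sigma and mu + width*sigma."""
    a = mu - width * sigma
    b = mu + width * sigma
    h = (b - a) / steps
    return sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(steps)) * h

mu, sigma = 84.0, 12.0  # illustrative values only

# Property 1: the curve is symmetric about the mean.
print(math.isclose(normal_pdf(mu - 5, mu, sigma), normal_pdf(mu + 5, mu, sigma)))

# Property 3: the total area under the curve is (numerically) 1.
print(round(area_under_curve(mu, sigma), 6))
```

The tails beyond 8 standard deviations contribute a negligible amount, which is why a finite integration range is enough for this check.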
DEFINITION: The distribution of a normal random variable with mean zero and standard deviation equal to 1 is called a standard normal distribution.

If $X \sim N(\mu, \sigma^2)$, then X can be transformed into a standard normal random variable Z through the following transformation:

$$Z = \frac{X - \mu}{\sigma}$$

If X is between the values $x_1$ and $x_2$, the random variable Z will fall between the corresponding values:

$$z_1 = \frac{x_1 - \mu}{\sigma} \quad \text{and} \quad z_2 = \frac{x_2 - \mu}{\sigma}$$
Therefore, $P(x_1 < X < x_2) = P(z_1 < Z < z_2)$, where $z_1$ and $z_2$ are the z-values corresponding to $x_1$ and $x_2$.

Examples:
1. Let Z be a standard normal random variable, that is, $Z \sim N(0, 1)$. Find the following probabilities (see the z-table for the probabilities):
A. [...]
B. [...]
C. [...]
D. [...]
2. Let Z be a standard normal random variable. Find the value of a.
A. [...]
B. [...]
C. [...]
3. Let X be a normal random variable with [...]. Find the following probabilities:
A. [...]
B. [...]
C. [...]
4. Given a test with a mean of 84 and a standard deviation of 12.
A. What is the probability of an individual obtaining a score of 100 or above in this test?
B. What score includes 50% of all the individuals who took the test?
C. If 654 students took the examination, then how many students got a score below 60?

Solution: Given: µ = 84, σ = 12

A. $$P(X \ge 100) = P\left(Z \ge \frac{100 - 84}{12}\right) = P(Z \ge 1.33) = 1 - 0.9082 = 0.0918$$

Therefore, the probability of an individual obtaining a score of 100 or above on this test is 0.0918 or 9.18%.

B. In notation form, the statement is equivalent to $P(X \le x) = 0.50$.
Finding the corresponding z-score of the probability 0.50, z = 0.00.
From the transformation formula, $x = \mu + z\sigma = 84 + (0.00)(12) = 84$.
Therefore, the score that includes 50% of those who took the exam is 84.

C. Given: µ = 84, σ = 12, N = 654

$$P(X < 60) = P\left(Z < \frac{60 - 84}{12}\right) = P(Z < -2.00) = 0.0228$$

The number of students who got a score lower than 60 is equal to the product of the probability and the total number of students: (0.0228)(654) ≈ 14.9, or about 15 students.
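The three parts of the test-score example can be reproduced without a z-table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper `z_score` and the variable names are introduced here for illustration and are not part of the original notes.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Transform a value of X ~ N(mu, sigma^2) into a standard normal z-value."""
    return (x - mu) / sigma

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

mu, sigma, n_students = 84, 12, 654

# A. P(X >= 100) = 1 - P(Z < z)
p_100_or_above = 1 - std_normal.cdf(z_score(100, mu, sigma))

# B. The score with 50% of the area below it: x = mu + z * sigma
score_at_50_percent = mu + std_normal.inv_cdf(0.50) * sigma

# C. Expected number of students scoring below 60
p_below_60 = std_normal.cdf(z_score(60, mu, sigma))
expected_below_60 = p_below_60 * n_students

print(round(p_100_or_above, 4))
print(score_at_50_percent)
print(round(expected_below_60))
```

Part A comes out near 0.091 rather than exactly 0.0918, because the code keeps z = 1.333... instead of rounding it to 1.33 before the lookup, as a printed z-table requires.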
Exercise 6.2
1. Let Z be a standard normal variable. Find the following probabilities:
a. [...]
b. [...]
c. [...]
d. [...]
2. Given a normal distribution with µ = 82 and [...], find the probability that X assumes a value
a. Less than 78
b. More than 90
c. Between 75 and 80
3. The mean weight of 500 male students at a certain college is 151 pounds, and the standard deviation is 15 pounds. Assume that the weights are normally distributed.
a. How many students weigh between 120 and 155 pounds?
b. What is the probability that a randomly selected male student weighs less than 128 pounds?
ESTIMATION

Basic Concepts of Estimation

Definition of terms:
Estimator - any statistic whose value is used to estimate an unknown parameter.
Estimate - a realized value of an estimator.
Point Estimate - a single value used to represent the parameter of interest.
Interval Estimator - a rule that tells us how to calculate two numbers based on sample data, forming an interval within which the parameter is expected to lie. The pair of numbers (a, b) is called an interval estimate or confidence interval.
Level of Confidence or confidence coefficient - the degree of certainty attached to an interval estimate for the unknown parameter.
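The difference between an estimator (the rule) and an estimate (the number the rule produces for one sample) can be sketched in Python. The sample below is hypothetical, invented purely for illustration; `mean` and `stdev` come from the standard `statistics` module.

```python
from statistics import mean, stdev

# Hypothetical sample, invented for illustration only.
sample = [12.1, 11.8, 12.4, 12.0, 11.7]

# The functions mean() and stdev() play the role of estimators (rules).
# The numbers they return for this particular sample are point estimates.
x_bar = mean(sample)   # point estimate of the population mean
s = stdev(sample)      # point estimate of the population standard deviation (divides by n - 1)

print(round(x_bar, 3), round(s, 3))
```

A different sample from the same population would give different estimates from the same two estimators.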
Point Estimation of the Mean and the Standard Deviation

A statistic is used to estimate a parameter. The following statistics are used to estimate the parameters given below:

Parameter                            Statistic
Population mean (µ)                  Sample mean ($\bar{x}$)
Population Standard Deviation (σ)    Sample standard deviation (s)

Interval Estimation of the Mean for a Single Population

Confidence Interval for µ, σ is known

If $\bar{x}$ is the mean of a random sample of size n from a population with known variance $\sigma^2$, a $(1-\alpha)100\%$ confidence interval for µ is given by

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}$ is the z value leaving an area of α/2 to its right.

Note: For small samples selected from nonnormal populations, we cannot expect our degree of confidence to be accurate. However, for samples of size $n \ge 30$, regardless of the shape of most populations, sampling theory guarantees good results.

To compute a $(1-\alpha)100\%$ confidence interval for µ, it was assumed that σ is known. Since this is generally not the case, σ shall be estimated by s, provided $n \ge 30$.
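The large-sample interval can be sketched in a few lines of Python. The function name `z_confidence_interval` is an assumption made for this illustration, and the critical value comes from `statistics.NormalDist().inv_cdf` rather than a printed table. The inputs are the figures of the delivery-time example in these notes (mean 55, standard deviation 12, n = 100, 95% confidence).

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sd, n, confidence):
    """Return (lower, upper) limits of a z-based confidence interval for the mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z value leaving alpha/2 to its right
    margin = z * sd / sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_confidence_interval(55, 12, 100, 0.95)
print(f"{low:.3f} < mu < {high:.3f}")
```

To three decimal places the limits agree with the interval 52.648 to 57.352 stated in the example.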
Example: A survey of the delivery time of 100 orders worth P20,000 from WILLIAM’S PIZZA yielded a mean of 55 minutes with a standard deviation of 12 minutes. Assuming that the delivery times follow a normal distribution, construct a 95% confidence interval for the true mean.

Solution: Given: $\bar{x} = 55$ minutes, $s = 12$ minutes, n = 100 orders, α = 5%, $z_{\alpha/2} = z_{0.025} = 1.96$

Substituting the values in the formula

$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$

we obtained:

$$55 - 1.96\,\frac{12}{\sqrt{100}} < \mu < 55 + 1.96\,\frac{12}{\sqrt{100}}$$

$$52.648 < \mu < 57.352$$

Conclusion: WILLIAM’S PIZZA is 95% confident that the true mean delivery time is between 52.648 minutes and 57.352 minutes.

Error in Estimating the Population Mean

If $\bar{x}$ is used as an estimate of µ, we can be $(1-\alpha)100\%$ confident that the error will not exceed

$$e = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Example: The heights of a random sample of 50 college students showed a mean of 174.5 cm and a standard deviation of 6.9 cm. What can we assert with 98% confidence about the possible size of our error if we estimate the mean height of all college students to be 174.5 cm?

Solution: Given: $\bar{x} = 174.5$ cm, $s = 6.9$ cm, n = 50 students, α = 2%, $z_{\alpha/2} = z_{0.01} = 2.33$

The possible size of the error can be obtained by using $e = z_{\alpha/2}\, s/\sqrt{n}$. Substituting the values in the formula:

$$e = 2.33\,\frac{6.9}{\sqrt{50}} \approx 2.27$$

Conclusion: We are 98% confident that the sample mean differs from the true mean height by at most 2.27 cm.

Sample Size for Estimating the Population Mean

If $\bar{x}$ is used as an estimate of µ, we can be $(1-\alpha)100\%$ confident that the error will not exceed a specified amount e when the sample size is

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$$

rounded up to the next whole number.

Example: The monthly wage of new employees at a certain broadcasting company is said to follow a normal distribution with a standard deviation of P1,000. How large a sample would be needed to be 99% confident that the sample mean will be within P300 of the true mean?

Solution: Given: σ = P1,000, e = P300, α = 1%, $z_{\alpha/2} = z_{0.005} = 2.575$

by substitution:

$$n = \left(\frac{(2.575)(1000)}{300}\right)^2 \approx 73.67$$

Conclusion: The sample size should be 74 employees to be 99% confident that the sample mean will be within P300 of the true mean wage.

Small-Sample Confidence Interval for µ, σ is unknown

If $\bar{x}$ and s are the mean and standard deviation, respectively, of a random sample of size $n < 30$ from an approximately normal population with unknown variance $\sigma^2$, a $(1-\alpha)100\%$ confidence interval for µ is given by

$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$

where $t_{\alpha/2}$ is the t value with $v = n - 1$ degrees of freedom.
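A small-sample interval can be sketched the same way as the z-based one, except that the critical value must be supplied by the reader: the Python standard library has no t quantile function, so `t_crit` is passed in as read from a t-table. The function name `t_confidence_interval` is an assumption for this illustration. The inputs are those of the nicotine example in these notes (mean 3.6, s = 0.9, n = 8), with 3.499 taken as the table value of t for an upper-tail area of 0.005 and 7 degrees of freedom.

```python
from math import sqrt

def t_confidence_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) limits of a t-based confidence interval for the mean.

    t_crit is the table value t_{alpha/2} with n - 1 degrees of freedom.
    """
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# 99% confidence, n = 8, so v = 7 degrees of freedom; 3.499 is read from a t-table.
low, high = t_confidence_interval(3.6, 0.9, 8, 3.499)
print(f"{low:.4f} < mu < {high:.4f}")
```

Note that the half-width is the standard error s/√n multiplied by the t value; leaving out the multiplier gives an interval that is far too narrow.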
Note: Values for t are found in the Table of T-values.

Example: A random sample of 8 cigarettes of a certain brand has an average nicotine content of 3.6 milligrams and a standard deviation of 0.9 milligrams. Construct a 99% confidence interval for the true average nicotine content of this particular brand of cigarettes, assuming an approximate normal distribution.

Solution: Given: $\bar{x} = 3.6$ milligrams, $s = 0.9$ milligrams, n = 8 cigarettes, α = 1%, $t_{\alpha/2} = t_{0.005} = 3.499$ with $v = 7$ degrees of freedom

by substitution:

$$3.6 - 3.499\,\frac{0.9}{\sqrt{8}} < \mu < 3.6 + 3.499\,\frac{0.9}{\sqrt{8}}$$

we obtained:

$$2.4866 < \mu < 4.7134$$

Conclusion: We are 99% confident that the true average nicotine content of this brand of cigarettes is between 2.4866 milligrams and 4.7134 milligrams.

Exercise 7.
1. An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed, with a standard deviation of 40 hours. If a random sample of 30 bulbs has an average life of 780 hours, find a 96% confidence interval for the population mean of all bulbs produced by this firm. How large a sample is needed if we wish to be 96% confident that our sample mean will be within 10 hours of the true mean?
2. The contents of 7 similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2 and 9.6 liters. Find a 95% confidence interval for the mean content of all such containers, assuming an approximate normal distribution for container contents.
3. A random sample of 100 PUJs (public utility jeeps) shows that a jeepney is driven on the average 24,500 km per year, with a standard deviation of 3,900 km.
a. Construct a 99% confidence interval for the average number of kilometers a jeepney is driven annually.
b. What can we assert with 99% confidence about the possible size of our error if we estimate the average number of km driven by jeepney drivers to be 23,500 km per year?
4. Suppose that the time allotted for commercials on a primetime TV program is known to have a normal distribution with a standard deviation of 1.5 minutes. A study of 35 showings gave an average commercial time of 10 minutes. Compute the maximum error. Construct a 95% confidence interval for the true mean.
5. A random sample of 12 female students in a certain dorm showed an average weekly expenditure of P750 for snack foods, with a standard deviation of P175. Construct a 90% confidence interval for the average amount spent each week on snack foods by female students living in this dormitory, assuming the expenditures to be approximately normally distributed.
6. The mean and standard deviation for the quality grade point averages of a random sample of 28 college seniors are calculated to be 2.6 and 0.3, respectively. Find the 95% confidence interval for the mean of the entire senior class. How large a sample is required if we want to be 95% confident that our estimate of µ is not off by more than 0.05?
7. To estimate the average serving time at a fast food restaurant, a consultant noted the time taken by 40 counter servers to complete a standard order (consisting of 2 burgers, 2 large fries and 2 drinks). The servers averaged 78.4 seconds with a standard deviation of 13.2 seconds to complete the orders. What can the consultant assert with 95% confidence about the maximum error if he uses $\bar{x} = 78.4$ seconds as an estimate of the true average time required to complete this standard order?
8. A company surveyed 4400 college graduates about the lengths of time required to earn their bachelor’s degrees. The mean is 5.15 years, and the standard deviation is 1.68 years. Based on these sample data, construct the 99% confidence interval for the mean time required by all college graduates.
9. In a time-use study, 20 randomly selected managers were found to spend an average of 2.4 hours each day on paperwork. The standard deviation of the 20 observations is 1.30 hours. Construct a 95% confidence interval for the mean time spent on paperwork by managers.
10. In a study of physical attractiveness and mental disorders, 231 subjects were rated for attractiveness, and the resulting sample mean and standard deviation are 3.94 and 0.75, respectively. Determine the sample size necessary to estimate the population mean, assuming you want 95% confidence and a margin of error of 0.05.
11. The number of incorrect answers on a true-false test for a sample of 15 students was recorded as follows: 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2. Estimate the variance.
12. In a study of the use of hypnosis to relieve pain, sensory ratings were measured for 16 subjects, with the results given below. Use these sample data to estimate the mean.
8.8 6.2 7.7 7.4 6.4 6.1 6.8 9.8 8.3 11.9 8.5 5.2 6.1 11.3 6.0 10.6
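The error bound and the sample-size formula from this section can be sketched together in Python. The function names below are assumptions made for this illustration, and the critical values come from `statistics.NormalDist().inv_cdf` instead of a table, so they carry a few more decimals than the usual 2.33 or 2.575. The inputs are those of the heights example (s = 6.9, n = 50, 98%) and the monthly-wage example (σ = 1000, e = 300, 99%).

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_critical(confidence):
    """z value leaving an area of alpha/2 to its right, where alpha = 1 - confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def max_error(sd, n, confidence):
    """Largest likely error when the sample mean is used to estimate the population mean."""
    return z_critical(confidence) * sd / sqrt(n)

def required_sample_size(sd, e, confidence):
    """Smallest n for which the error should not exceed e (rounded up to a whole number)."""
    return ceil((z_critical(confidence) * sd / e) ** 2)

print(round(max_error(6.9, 50, 0.98), 2))      # heights example
print(required_sample_size(1000, 300, 0.99))   # monthly-wage example
```

Rounding up with `ceil` matters: rounding 73.7 down to 73 would leave the error slightly above the specified amount.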
CORRELATION ANALYSIS
A correlation exists between two variables when one of them is related to the other in some way. Correlation Analysis attempts to measure the strength of relationships between two variables by means of a single number called a correlation coefficient r. The linear correlation coefficient r measures the strength of the linear relationship between the paired x and y values in the sample. This is also referred to as the Pearson product moment correlation coefficient in honor of Karl Pearson who originally developed it. The formula is given below:
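$$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\;\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$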
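The computational formula for r translates directly into code. The following is a minimal Python sketch; the function name `pearson_r` is chosen here for illustration, and the data are the six (X, Y) pairs 2 to 12 and 6 to 36 listed among the examples later in these notes, where every Y is exactly three times its X.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient from the sums n, Σx, Σy, Σxy, Σx², Σy²."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be paired and contain at least two values")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("r is undefined when all x values or all y values are equal")
    return numerator / denominator

xs = [2, 4, 6, 8, 10, 12]
ys = [6, 12, 18, 24, 30, 36]
print(round(pearson_r(xs, ys), 3))
```

Because the points lie on a straight line with positive slope, the result rounds to 1.0, a perfect positive correlation.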
Since r is computed from the sample data, it is a sample statistic.

Interpretation of the values of r:
r = 1           : perfect positive correlation between X and Y
0.5 ≤ r < 1     : strong positive correlation between X and Y
0 < r < 0.5     : positive correlation between X and Y
r = 0           : zero correlation
-0.5 < r < 0    : negative correlation between X and Y
-1 < r ≤ -0.5   : strong negative correlation between X and Y
r = -1          : perfect negative correlation between X and Y
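The interpretation table above can be written as a small lookup function. This is a sketch only: the name `interpret_r` is an assumption, and it assumes that the boundary values 0.5 and -0.5 belong to the "strong" bands.

```python
def interpret_r(r):
    """Return the verbal interpretation of a linear correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 1:
        return "perfect positive correlation"
    if r >= 0.5:
        return "strong positive correlation"
    if r > 0:
        return "positive correlation"
    if r == 0:
        return "zero correlation"
    if r > -0.5:
        return "negative correlation"
    if r > -1:
        return "strong negative correlation"
    return "perfect negative correlation"

print(interpret_r(0.575))
print(interpret_r(-0.3))
```

In practice r is rounded to three decimal places before being interpreted, so exact comparisons with 0, 1 and -1 are applied to the rounded value.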
Zero correlation means lack of linearity and not lack of association. r measures the strength of the linear relationship; it is not designed to measure the strength of a relationship that is not linear. The value of r is always between -1 and 1, that is, -1 ≤ r ≤ 1. (Rounding off should be at least up to 3 decimal places.)

Common errors in interpreting the results:
1. We must be careful to avoid concluding that a significant linear correlation between two variables is proof that there is a cause-effect relationship between them.
2. No significant linear correlation does not mean X and Y are not related in any way.
3. Rounding errors can wreak havoc with the results. Round the linear correlation coefficient to three decimal places.
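The point that zero correlation does not mean lack of association can be seen with a small constructed case. In the sketch below the data are hypothetical: Y is completely determined by X through Y = X², yet the linear correlation coefficient is zero because the relationship is not linear. The helper `pearson_r` is defined here only so that the block is self-contained.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical data: a perfect, but curved, relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

print(pearson_r(xs, ys))  # 0.0
```

A scatter diagram of these points shows a clear U-shaped pattern, which is why plotting the data before interpreting r is worthwhile.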
Examples:
For numbers 1 to 4, identify the error in the stated conclusion and write the correct conclusion.
1. Given: The paired sample data result in a linear correlation coefficient very close to zero.
Conclusion: The two variables are not related in any way.
2. Given: There is a strong positive linear correlation between smoking and cancer.
Conclusion: Smoking causes cancer.
3. Given: x = age, y = test score, r = 0.40
Conclusion: Older people tend to get lower scores.
4. Given: There is a strong positive linear correlation between income and spending.
Conclusion: Increased spending is caused by increased income.
5. Ten students from the College of Business Administration were chosen to become respondents in a study conducted to determine the relationship between the grades of students (X) and their number of hours studying (Y). After computing the degree of relationship, it was found to be 0.575. What would be the conclusion?
6. The data on yearly consumption of cigarettes in the Philippines and the percentage of the country’s population admitted to mental institutions as psychiatric cases were collected for 8 years. The correlation coefficient r = 0.61. What can we conclude about the data?
7. The temperature in a certain locality and number of pregnant women were found to have a strong negative correlation. What would be the right conclusion?

EXAMPLES: Construct a scatter diagram, find r and interpret the results.
1.
X: 2, 3, 7, 12, 16, 20, 22
Y: 14, 20, 9, 14, 5, 1, 15
2.
X: 9, 4, 5, 4, 2, 6, 3, 7, 2, 8
Y: 8, 5, 8, 4, 3, 4, 4, 10, 4, 10
3.
X: 2, 4, 6, 8, 10, 12
Y: 6, 12, 18, 24, 30, 36
4.
X: 25, 64, 75, 35, 86, 15, 19, 66, 37, 9, 12, 9, 47
Y: 90, 3, 85, 70, 67, 45, 22, 12, 85, 66, 54, 16, 24, 83
5.
X: 3, 4, 3, 4, 5, 6, 5, 6, 7, 8, 7, 8, 9, 11, 9, 10
Y: 15, 17, 3, 4, 5, 21, 23, 13, 11, 12, 25, 6, 7, 9, 16, 7
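As an illustration of how the formula for r is applied, here is a worked computation for the second data set above (X: 9, 4, 5, 4, 2, 6, 3, 7, 2, 8 and Y: 8, 5, 8, 4, 3, 4, 4, 10, 4, 10). The sums were computed by hand for this sketch and are not part of the original notes.

```latex
\begin{align*}
n &= 10, \quad \sum x_i = 50, \quad \sum y_i = 60, \quad
\sum x_i y_i = 348, \quad \sum x_i^2 = 304, \quad \sum y_i^2 = 426 \\[4pt]
r &= \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}
          {\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\;
           \sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}} \\[4pt]
  &= \frac{10(348) - (50)(60)}
          {\sqrt{10(304) - 50^2}\;\sqrt{10(426) - 60^2}}
   = \frac{480}{\sqrt{540}\,\sqrt{660}}
   \approx \frac{480}{596.99} \approx 0.804
\end{align*}
```

Since 0.5 ≤ r < 1, this value indicates a strong positive correlation between X and Y for that data set.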
EXERCISES
A. Construct a scatter diagram, find r and interpret the results.
1. Grades of 6 students selected at random
MATH GRADE (X) ENGLISH GRADE (Y)
74
70
92
80
74
65
84
63
87
78
90
2. The data below consist of weights in pounds of discarded paper and size of households.
X (paper): 2.41, 7.57, 9.55, 8.82, 8.72, 6.96, 6.83, 11.42
Y (household size): 2, 3, 3, 6, 4, 2, 1, 5
3. The data below consist of the number of persons in the household and the number of cars they own.
X (household size): 2, 4, 4, 2, 2, 1, 2, 3
Y (cars): 0, 2, 2, 1, 1, 3, 0, 2, 2
4. The data below consists of age and the income in thousands of dollars
5
Age
60
63
51
25
47
Income 43.4 18.8 14.4 29.4 19.4
56
19
24
25
20
66
19
48
52
83 10.4 12.6 36.4 29.6 17.2 17.2
67
33 37.4
5. A teacher is interested in knowing whether or not two IQ tests produce linearly related scores. A sample of 10 students was taken randomly. Five students took Test 1 and 5 students took Test 2 in the morning. In the afternoon, those who took Test 1 took Test 2 and vice versa. The results are shown in the table below:

STUDENT    TEST 1 (X)    TEST 2 (Y)
A          125           114
B          145           127
C          110           126
D          120           116
E          124           108
F          110           100
G          121           129
H          142           131
I          100           96
J          126           113
a. Plot a scatter diagram for these data.
b. Solve for r.
c. How well do the two tests relate linearly? Explain.
6. In a study of factors that affect success in a calculus course, data were collected for 10 different persons. Scores on an Algebra placement test are given, along with Calculus achievement scores.
a. Plot a scatter diagram for these data.
b. Find the value of the linear correlation coefficient r.
c. Test the significance of r at α = 0.05.
ALGEBRA SCORE (X): 17, 21, 11, 16, 15, 11, 24, 27, 19, 8
CALCULUS SCORE (Y): 73, 66, 64, 61, 70, 71, 90, 68, 84, 52
7. One study was conducted to determine the relationship between the age and systolic blood pressure of 12 women.

Age (X)    Systolic Blood Pressure (Y)
56         147
42         125
72         160
36         118
63         149
47         128
55         150
49         145
38         115
42         140
68         152
60         155

a. Plot a scatter diagram for these data.
b. Solve for r and interpret.
c. What can you conclude about the relationship between age and systolic blood pressure of women? Explain statistically.