Statistics and Probability - Solved Assignments - Semester Spring 2010

March 4, 2019 | Author: Muhammad Umair | Category: Bias Of An Estimator, Statistics, Median, Standard Deviation, Confidence Interval


Assignment 1

Question 1: (Marks: 2+2+2+4=10)

(a) Give answers to the following:

• For a series, the mean is 5 and the mode is 2. Find the median of the series.

Given that Mean = 5 and Mode = 2, we find the median using the empirical relationship among the three measures:

$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean} \quad\Rightarrow\quad \text{Median} = \frac{\text{Mode} + 2\,\text{Mean}}{3} = \frac{2 + 2 \times 5}{3} = \frac{12}{3} = 4$$
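The short Python sketch below rearranges the empirical relation as a check on the arithmetic. The function name `median_from_empirical_relation` is hypothetical. The relation itself is an approximation that holds for moderately skewed distributions, not an exact identity.

```python
def median_from_empirical_relation(mean: float, mode: float) -> float:
    """Solve Mode = 3*Median - 2*Mean for the median (approximate relation)."""
    return (mode + 2 * mean) / 3

print(median_from_empirical_relation(mean=5, mode=2))  # 4.0
```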

• What is the aim of collecting numerical data for a statistical study?

The main purpose of a statistical study is to draw inferences about a population on the basis of sample data. To obtain descriptive information from a sample we need data. The collected numerical data provide the basis for the analysis carried out in the later steps of the study.

• Write down the functions of statistics.

1. Statistics helps summarize large sets of data.
2. Statistics helps in the efficient design of laboratory and field experiments as well as surveys.
3. Statistics supports sound and effective planning in any field of inquiry.

(b) A paint retailer has had numerous complaints from customers about under-filled paint cans. As a result, the retailer started to inspect incoming shipments. A recent shipment contained 2,440 gallon-size cans. The retailer sampled 50 cans and weighed each on a scale capable of measuring weight to four decimal places. Properly filled cans weigh 10 pounds. For this problem:

1. Describe the population.
2. Describe the variable of interest.
3. Describe the data type of the variable.
4. Describe the sample.

Sol:

From the question statement:

a) The population is the set of units of interest to the retailer, which is the shipment of 2,440 cans of paint.
b) The variable the retailer wishes to evaluate is the weight of the paint cans.
c) The retailer has to measure the weight, and weight is a continuous quantitative variable.
d) The sample is a subset of the population. Here it is the 50 cans of paint selected by the retailer.

Question 2: (Marks: 2+2+6=10)

(a) How is data collected with the help of enumerators?

Under this method, the information is gathered by employing trained enumerators who help the informants fill in the schedules or questionnaires correctly. This method gives the most reliable information if the enumerators are well-trained, experienced and tactful.

(b) The average height of the students in a school is 5.2 inches. A sample of 12 students showed the following heights in inches: 5.0, 5.3, 5.2, 4.9, 4.11, 5.0, 5.5, 5.4, 5.1, 5.0, 5.2, 4.10. Calculate the sampling error.

Sol:

Here µ = 5.2, and the sample mean of the data is

$$\bar{x} = \frac{\sum x}{n} = \frac{59.81}{12} = 4.98$$
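Here is a minimal Python sketch of the same computation. Like the solution, it treats the recorded heights as plain decimal numbers. The variable names are illustrative only.

```python
heights = [5.0, 5.3, 5.2, 4.9, 4.11, 5.0, 5.5, 5.4, 5.1, 5.0, 5.2, 4.10]
mu = 5.2  # population mean given in the question

sample_mean = sum(heights) / len(heights)   # the sum is 59.81
sampling_error = sample_mean - mu           # statistic minus parameter

print(f"sample mean    = {sample_mean:.2f}")     # 4.98
print(f"sampling error = {sampling_error:.2f}")  # -0.22
```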

$$\text{Sampling error} = \bar{x} - \mu = 4.98 - 5.2 = -0.22$$

(c) Find the missing frequencies and complete the following table.

x     f     c.f    Relative cumulative frequency
2     ?     ?      2/15
4     1     ?      ?
6     ?     7      ?
8     3     ?      ?
10    ?     15     1

Sol: Relative cumulative frequency = cumulative frequency / total. Since the last cumulative frequency is 15, the total is 15.

1. The first relative cumulative frequency is 2/15, so the first cumulative frequency is 2. For the first class the frequency equals the cumulative frequency, so its frequency is also 2.
2. Adding the second class frequency 1 to 2 gives 3, the second cumulative frequency.
3. The third cumulative frequency is 7, so the third class frequency is 7 − 3 = 4.
4. Adding the fourth class frequency 3 to 7 gives 10, the fourth cumulative frequency.
5. The last cumulative frequency is the total of all frequencies, 15, so the last class frequency is 15 − 10 = 5.

Dividing each cumulative frequency by the total (15) gives the relative cumulative frequencies.

x     f     c.f    Relative cumulative frequency
2     2     2      2/15
4     1     3      3/15
6     4     7      7/15
8     3     10     10/15
10    5     15     15/15
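The gap-filling rule used above can be written as a small loop: the c.f. of a class equals the previous c.f. plus its frequency. The helper `complete_table` is a hypothetical name. The sketch assumes each row has at least one of f or c.f. known once the previous c.f. is fixed, which holds for this table.

```python
from fractions import Fraction

def complete_table(freq, cum, total):
    """Fill missing frequencies (f) and cumulative frequencies (c.f.).

    Uses cf[i] = cf[i-1] + f[i] with cf[-1] = 0; None marks a missing entry.
    Returns (f, c.f., relative c.f.), the last as exact fractions.
    """
    freq, cum = list(freq), list(cum)
    prev = 0
    for i in range(len(freq)):
        if freq[i] is None and cum[i] is None:
            raise ValueError(f"row {i} has neither f nor c.f.")
        if cum[i] is None:
            cum[i] = prev + freq[i]      # c.f. = previous c.f. + f
        elif freq[i] is None:
            freq[i] = cum[i] - prev      # f = c.f. - previous c.f.
        prev = cum[i]
    rel = [Fraction(c, total) for c in cum]
    return freq, cum, rel

total = 15
first_cf = int(Fraction(2, 15) * total)  # relative c.f. 2/15 of a total of 15 -> 2
f, cf, rel = complete_table(
    freq=[None, 1, None, 3, None],
    cum=[first_cf, None, 7, None, total],
    total=total,
)
print(f)   # [2, 1, 4, 3, 5]
print(cf)  # [2, 3, 7, 10, 15]
```

`Fraction` reduces automatically, so 3/15 and 10/15 are stored as 1/5 and 2/3. They are the same values as in the table.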

Question 3: (Marks: 2+8=10)

a) Can we find the median from the following data? If yes, give the reason (no need to calculate the median).

Wages of workers in a factory

Monthly income (Rs.)    No. of workers
Less than 2000          100
2000-2999               300
3000-3999               250
4000-4999               50
5000 & above            1200

Sol: Yes, we can find the median from these data. The median is the most appropriate measure of average when the data are in open-ended class intervals, since it is based on the middle position of the ordered data rather than on the extreme values.

(b) Compute the mean, median and mode from the following data.

No. of students (x)    1     2     3     5     6
f                      15    10    5     15    5

Sol:

No. of students (x)    f     fx     c.f
1                      15    15     15
2                      10    20     25
3                      5     15     30
5                      15    75     45
6                      5     30     50
Total                  50    155

Mean:

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{155}{50} = 3.1$$

Median: since n = 50 is even, the median is the average of the (n/2)th and ((n + 2)/2)th values:

$$\left(\frac{n}{2}\right)\text{th value} = \left(\frac{50}{2}\right)\text{th value} = 25\text{th value}, \qquad \left(\frac{n+2}{2}\right)\text{th value} = \left(\frac{52}{2}\right)\text{th value} = 26\text{th value}$$

Now we check the cumulative frequency column. The 25th value is 2, because c.f. reaches 25 at x = 2. The 26th value is 3, because c.f. runs from 26 to 30 at x = 3. So

$$\text{Median} = \frac{2 + 3}{2} = 2.5$$

Mode: as the data are discrete, the mode is the value that occurs the maximum number of times. Here there are two modes, 1 and 5, since both occur 15 times.
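The short Python sketch below reproduces the three averages from the frequency table. It locates the 25th and 26th values by walking the cumulative frequencies, as in the hand method. The helper `value_at` is an illustrative name, not a library function.

```python
x = [1, 2, 3, 5, 6]      # no. of students
f = [15, 10, 5, 15, 5]   # frequencies

n = sum(f)
mean = sum(xi * fi for xi, fi in zip(x, f)) / n

def value_at(position):
    """Return the position-th ordered value (1-based) by walking the c.f. column."""
    cf = 0
    for xi, fi in zip(x, f):
        cf += fi
        if position <= cf:
            return xi
    raise ValueError("position exceeds n")

if n % 2 == 0:
    median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2  # 25th and 26th values
else:
    median = value_at((n + 1) // 2)

max_f = max(f)
modes = [xi for xi, fi in zip(x, f) if fi == max_f]  # every value with the top frequency

print(mean, median, modes)  # 3.1 2.5 [1, 5]
```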

Assignment 2

Question 1: (Marks: 4×2=8)

Answer the following short questions.

a) Why is the quartile deviation better than the range?

The range is only the difference between the maximum and minimum values. It gives no information about how the values are distributed between the two ends of the series. It is also affected by outliers (highly extreme values), so it can give a misleading picture of the observations. The quartile deviation is superior to the range because it is not affected by extremely large or small observations. It is based on the quartiles, which enclose the central 50% of the values. For this reason it is also used when extreme observations are thought to be unrepresentative.

b) Why is the standard deviation better than the mean absolute deviation?

Both are used to measure the dispersion of a data set, and both use every data value in their computation. In the mean deviation, however, we take absolute values and so ignore the fact that some deviations are negative and some are positive. This introduces an artificiality into the mean deviation that makes further theoretical development or application of the concept very difficult. The standard deviation overcomes this problem by squaring the deviations instead of taking their absolute values. That is why the standard deviation is the preferred and more widely used measure of dispersion.

c) What is a limitation of Chebyshev's theorem?

Chebyshev's theorem gives no information at all about the probability of observing a value within one standard deviation of the mean, that is, when the constant k equals 1. For k = 1 the bound 1 − 1/k² equals 0. A large proportion of the data usually falls within µ ± σ, but the theorem cannot account for this.

d) If the coefficient of skewness = 0, what would you say about the skewness of the distribution?

If the coefficient of skewness is 0, the distribution is symmetrical; that is, its mean, median and mode are equal.

Question 2: (Marks: 4+8=12)

a) Show that the range is greatly affected by extreme values, and interpret the result.

996, 999, 9, 997, 995, 1000, 1014, 1002, 1001

Solution: For the given values, with X_m the largest and X_0 the smallest value,

$$\text{Range} = X_m - X_0 = 1014 - 9 = 1005$$

Interpretation: Looking at the values closely, we find that the value 9 is far smaller than the rest of the data. The range depends on this value too, so this single observation has made the range much wider. The result gives a misleading picture of the whole data set. Without the value 9, the smallest value would be 995 and the range only 1014 − 995 = 19.
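The Python sketch below tests the contrast drawn in Question 1(a). It compares the range with the quartile deviation, first with and then without the value 9. It assumes Python 3.8+ for `statistics.quantiles`. Its default "exclusive" method is only one of several quartile conventions, so textbook hand calculations may give slightly different quartiles. The helper names are illustrative.

```python
import statistics

data = [996, 999, 9, 997, 995, 1000, 1014, 1002, 1001]
without_outlier = [v for v in data if v != 9]

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def quartile_deviation(values):
    """Semi-interquartile range (Q3 - Q1) / 2, using the default 'exclusive' quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2

print(data_range(data), data_range(without_outlier))                  # 1005 19
print(quartile_deviation(data), quartile_deviation(without_outlier))  # 3.0 2.75
```

Removing the single value 9 shrinks the range from 1005 to 19. The quartile deviation barely moves.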

b) The mean and the standard deviation of a set of values are 50 and 10 respectively. Compute X̄ ± 2S and X̄ ± 3S. Interpret the results in the light of (i) the empirical rule and (ii) Chebyshev's inequality.

Solution: From the given information,

$$\bar{X} \pm 2S = 50 \pm 2(10) = (30,\ 70), \qquad \bar{X} \pm 3S = 50 \pm 3(10) = (20,\ 80)$$

(i) Empirical rule:

• According to the empirical rule, in a normal distribution the interval X̄ ± 2S contains 95.45% of the values. So we can say that 95.45% of the data lie in the interval (30, 70).

• According to the empirical rule, in a normal distribution the interval X̄ ± 3S contains 99.73% of the values. So we can say that 99.73% of the values lie in the interval (20, 80).

(ii) Chebyshev's inequality:

• According to Chebyshev's inequality, the interval X̄ ± 2S contains at least

$$1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = \frac{3}{4} = 75\%$$

of the observations. So by this rule at least 75% of the given data lie in the interval (30, 70).

• According to Chebyshev's inequality, the interval X̄ ± 3S contains at least

$$1 - \frac{1}{k^2} = 1 - \frac{1}{3^2} = \frac{8}{9} = 88.89\%$$

of the observations. So by this rule at least 88.89% of the given data lie in the interval (20, 80).
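The sketch below computes the intervals and both sets of percentages together. It takes the empirical-rule figures from the standard normal identity P(|Z| < k) = erf(k/√2). The helper name `coverage` is hypothetical.

```python
import math

def coverage(mean, sd, k):
    """Interval mean ± k*sd, its normal-theory share, and Chebyshev's lower bound."""
    interval = (mean - k * sd, mean + k * sd)
    normal_share = math.erf(k / math.sqrt(2))  # P(|Z| < k) for a normal distribution
    chebyshev_min = 1 - 1 / k**2               # any distribution; informative only for k > 1
    return interval, normal_share, chebyshev_min

for k in (2, 3):
    interval, normal_share, chebyshev_min = coverage(50, 10, k)
    print(f"k={k}: {interval}  normal {normal_share:.2%}  Chebyshev at least {chebyshev_min:.2%}")
```

The normal percentages apply only if the data are roughly bell-shaped. The Chebyshev bounds hold for any distribution, which is why they are lower.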

Question 3: (Marks: 5+5=10)

a. Find the first two moments about the mean from the following data.

X = 34, 70, 42, 54, 40, 68, 56, 38, 36, 72

Solution: To find the moments about the mean, we first find the mean of the data.

$$\bar{X} = \frac{\sum X}{n} = \frac{510}{10} = 51$$

X        X − X̄     (X − X̄)²
34       −17       289
36       −15       225
38       −13       169
40       −11       121
42       −9        81
54       3         9
56       5         25
68       17        289
70       19        361
72       21        441
Total    0         2010

The first moment about the mean is

$$m_1 = \frac{\sum (x_i - \bar{x})}{n} = \frac{0}{10} = 0$$

The second moment about the mean is

$$m_2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{2010}{10} = 201$$
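Here is a short sketch of the r-th moment about the mean, m_r = Σ(x_i − x̄)^r / n, applied to the same ten values. The function name `central_moment` is illustrative.

```python
x = [34, 70, 42, 54, 40, 68, 56, 38, 36, 72]

def central_moment(values, r):
    """r-th moment about the mean: sum((x - mean)**r) / n."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** r for v in values) / len(values)

m1 = central_moment(x, 1)
m2 = central_moment(x, 2)
print(m1, m2)  # 0.0 201.0
```

The first moment about the mean is always 0. The second moment about the mean is the (population) variance.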

b) Calculate Bowley's coefficient of skewness from the following information:

Q1 = 34.087156, Q3 = 44.962963, X̃ (median) = 39.606382

Solution: Bowley's coefficient of skewness is

$$S_k = \frac{Q_1 + Q_3 - 2\,\text{Median}}{Q_3 - Q_1} = \frac{34.087156 + 44.962963 - 2(39.606382)}{44.962963 - 34.087156} = \frac{-0.162645}{10.875807} = -0.014954752$$

The value is negative but very close to zero, so the distribution is only slightly negatively skewed.
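The same formula can be written as a one-line Python function. The name `bowley_skewness` is hypothetical.

```python
def bowley_skewness(q1, median, q3):
    """Quartile (Bowley) coefficient of skewness; lies between -1 and +1."""
    return (q1 + q3 - 2 * median) / (q3 - q1)

sk = bowley_skewness(q1=34.087156, median=39.606382, q3=44.962963)
print(f"{sk:.6f}")  # -0.014955
```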

Assignment 3

Question 1: (Marks: 3+3+4=10)

a) For a particular data set with five pairs of values:

$$\sum Y^2 = 26, \qquad \sum Y = 10, \qquad \sum XY = 37$$

The fitted line is y = −1 + 0.5x (so a = −1 and b = 0.5). Find the standard error of estimate (s_yx).

Solution:

$$s_{yx} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}} = \sqrt{\frac{26 - (-1)(10) - (0.5)(37)}{5 - 2}} = \sqrt{\frac{26 + 10 - 18.5}{3}} = \sqrt{\frac{17.5}{3}} = \sqrt{5.833} = 2.415$$
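Here is a minimal sketch of the shortcut formula for s_yx from the summary sums, assuming the fitted line has the form y = a + bx. The helper name is hypothetical.

```python
import math

def standard_error_of_estimate(sum_y2, sum_y, sum_xy, a, b, n):
    """s_yx = sqrt((sum Y^2 - a*sum Y - b*sum XY) / (n - 2)) for the line y = a + b*x."""
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

s_yx = standard_error_of_estimate(sum_y2=26, sum_y=10, sum_xy=37, a=-1, b=0.5, n=5)
print(round(s_yx, 3))  # 2.415
```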

b) The two least-squares regression lines are given by Y = 2.64 + 10.83X and X = −1.91 + 6.18Y. Are these lines possible for any data set? Explain your answer.

Solution: These lines are possible only if the correlation coefficient r lies between −1 and +1. The magnitude of r is the square root of the product of the two regression coefficients (slopes). Here:

$$r = \sqrt{b_{yx} \times b_{xy}} = \sqrt{10.83 \times 6.18} = \sqrt{66.93} = 8.18 > 1$$

So these lines are not possible for any data set.
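The feasibility check can be written as a small function. Besides requiring b_yx × b_xy ≤ 1, the sketch checks that the two slopes have the same sign, since both regression coefficients carry the sign of r. The name `regression_lines_possible` is illustrative.

```python
import math

def regression_lines_possible(b_yx, b_xy):
    """Return (possible, r) for a pair of regression slopes.

    Both slopes must share the sign of r, and r**2 = b_yx * b_xy must not exceed 1.
    """
    if (b_yx == 0) != (b_xy == 0):
        return False, None  # r = 0 would force both slopes to be 0
    product = b_yx * b_xy
    if product < 0:
        return False, None  # slopes of opposite sign cannot come from one data set
    r = math.copysign(math.sqrt(product), b_yx)
    return abs(r) <= 1, r

possible, r = regression_lines_possible(b_yx=10.83, b_xy=6.18)
print(possible, round(r, 2))  # False 8.18
```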

c) Two dice are rolled. Write down the sample space and find the probability that
i. the sum of the outcomes is equal to 10;
ii. the sum of the outcomes is equal to 7;
iii. the sum of the outcomes is equal to 1.

Solution:

S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}

n(S) = 36

Let A be the event that the sum of the outcomes is equal to 10.
A = {(4, 6), (5, 5), (6, 4)}

$$P(A) = \frac{n(A)}{n(S)} = \frac{3}{36} = 0.0833$$

Let B be the event that the sum of the outcomes is equal to 7.
B = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}

$$P(B) = \frac{n(B)}{n(S)} = \frac{6}{36} = 0.167$$

Let C be the event that the sum of the outcomes is equal to 1. No outcome has sum 1, so C = ∅ and

$$P(C) = \frac{n(C)}{n(S)} = \frac{0}{36} = 0$$

Question 2: (Marks: 4+6=10)

a) If S = {1, 2, 3, 4, 5, 6}, A = {1, 2, 3, 4} and B = {3, 4, 5, 6}, verify whether A and B are independent.

Solution: A and B are independent if P(A ∩ B) = P(A) × P(B). We check this condition, taking the six outcomes of S as equally likely.

A ∩ B = {3, 4}, so P(A ∩ B) = 2/6 = 1/3, while P(A) = 4/6 and P(B) = 4/6.

$$P(A) \times P(B) = \frac{4}{6} \times \frac{4}{6} = \frac{4}{9} \neq \frac{1}{3} = P(A \cap B)$$

Hence A and B are not independent.

b) Indicate whether the following statement is true or false for three mutually exclusive events A, B and C. Justify your answer.

$$P(A) = \frac{1}{6}, \qquad \frac{2}{3}P(B) = \frac{1}{6}, \qquad \frac{1}{4}P(C) = \frac{1}{6}$$

Solution: Given P(A) = 1/6,

$$\frac{2}{3}P(B) = \frac{1}{6} \;\Rightarrow\; P(B) = \frac{1}{6} \times \frac{3}{2} = \frac{3}{12}, \qquad \frac{1}{4}P(C) = \frac{1}{6} \;\Rightarrow\; P(C) = \frac{1}{6} \times 4 = \frac{4}{6}$$

For mutually exclusive events, P(A ∪ B ∪ C) = P(A) + P(B) + P(C), and no probability can exceed 1. Here

$$P(A) + P(B) + P(C) = \frac{1}{6} + \frac{3}{12} + \frac{4}{6} = \frac{13}{12} > 1$$

Hence the given statement is not true.

Question 3: (Marks: 2+8=10)

a) A card is drawn from an ordinary deck of 52 playing cards. Can "king" and "diamond" be mutually exclusive events? Give a reason to support your answer.

Solution: The two events cannot be mutually exclusive. A card drawn from an ordinary deck of 52 playing cards can be both a king and a diamond (the king of diamonds).

b) A marble is drawn at random from a box containing 10 red, 30 white, 20 blue and 15 orange marbles. Find the probability that the drawn marble is
i. orange or red;
ii. not "red or blue";
iii. not blue;
iv. red, white or blue.

Solution:

Colour     Red   White   Blue   Orange   Total
Marbles    10    30      20     15       75

Total number of possible ways to draw a marble:

$$\binom{75}{1} = 75$$

$$\text{i. } P(\text{orange or red}) = \frac{15 + 10}{75} = \frac{25}{75} = \frac{1}{3} = 0.33$$

$$\text{ii. } P(\text{not red or blue}) = \frac{30 + 15}{75} = \frac{45}{75} = \frac{3}{5} = 0.60$$

$$\text{iii. } P(\text{not blue}) = \frac{10 + 30 + 15}{75} = \frac{55}{75} = \frac{11}{15} = 0.73$$

$$\text{iv. } P(\text{red, white or blue}) = \frac{10 + 30 + 20}{75} = \frac{60}{75} = \frac{4}{5} = 0.80$$
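The four answers can be checked with exact fractions. For parts ii and iii this sketch uses the complement rule, P(not E) = 1 − P(E). It gives the same results as counting the remaining marbles directly, as the solution does. `prob` is a hypothetical helper.

```python
from fractions import Fraction

counts = {"red": 10, "white": 30, "blue": 20, "orange": 15}
total = sum(counts.values())  # 75 marbles

def prob(colours):
    """Probability that the drawn marble has one of the given colours."""
    return Fraction(sum(counts[c] for c in colours), total)

p_orange_or_red = prob({"orange", "red"})
p_not_red_or_blue = 1 - prob({"red", "blue"})   # complement rule
p_not_blue = 1 - prob({"blue"})                 # complement rule
p_red_white_blue = prob({"red", "white", "blue"})

print(p_orange_or_red, p_not_red_or_blue, p_not_blue, p_red_white_blue)  # 1/3 3/5 11/15 4/5
```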

Assignment 4

Question 1: (Marks: 3+7=10)

a) Find the mean of the following probability distribution.

No. of petals X   3      4      5      6      7      8       9       Total
P(X)              0.05   0.10   0.20   0.30   0.25   0.075   0.025   1

Sol:

No. of petals X   3      4      5      6      7      8       9       Total
P(X)              0.05   0.10   0.20   0.30   0.25   0.075   0.025   1
XP(X)             0.15   0.4    1      1.8    1.75   0.6     0.225   5.925

The mean of this distribution is

$$\mu = E(X) = \sum X P(X) = 5.925 \approx 5.9$$
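Here is a sketch of E(X) = Σ X·P(X) with exact arithmetic. Building the `Fraction` values from decimal strings avoids binary floating-point rounding. The check that the probabilities sum to 1 is an added sanity test, not part of the original solution.

```python
from fractions import Fraction

petals = [3, 4, 5, 6, 7, 8, 9]
# Decimal strings give exact fractions, e.g. Fraction("0.075") == 3/40.
probs = [Fraction(p) for p in ("0.05", "0.10", "0.20", "0.30", "0.25", "0.075", "0.025")]

if sum(probs) != 1:
    raise ValueError("probabilities must sum to 1")

mean = sum(x * p for x, p in zip(petals, probs))  # E(X) = sum of X * P(X)
print(mean, float(mean))  # 237/40 5.925
```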

b) A random variable X has the following probability distribution:

X      −2    −1    0     1     2     3
P(X)   0.1   k     0.2   2k    0.3   3k

Find (i) K (ii) P(X
