Durham Maths Question Set - Stats Concepts II 13-14 All Merged

Question set and solutions for the Durham Stats Concepts II course as set in 2013-2014.

Statistical Concepts 2013/14 – Sheet 0 – Probability Revision

For this course, it is crucial that you have excellent knowledge of the material presented in the first-year Probability course. These exercises are strongly recommended. You may wish to go over the lecture notes and summary handouts of the Probability course if you struggle with some of these exercises. Some of these may be discussed in class, and full solutions will be handed out.

1. I have ten coins in my pocket. Nine of them are ordinary coins with equal chances of coming up head and tail when tossed, and one has two heads.
(a) If I take one of the coins at random from my pocket, what is the probability that it is the coin with two heads?
(b) If I toss the coin and it comes up heads, what is the probability that it is the coin with two heads?
(c) If I toss the coin a further n times and it comes up heads every time, what is the probability that it is the coin with two heads?
(d) If I toss the coin one further time and it comes up tails, what is the probability that it is one of the nine ordinary coins?

2. Let U be a random quantity which has probability density function f(u) = 1 for u ∈ [0, 1], and f(u) = 0 otherwise. Calculate
(a) P[U ≥ 0.3]
(b) E[U]
(c) Var[U]
(d) E[log(1 + U)]
(e) If the random quantity Y has a uniform distribution on the interval [a, b], express Y in terms of U above and hence find E[Y] and Var[Y].

3. Let Y be a random quantity which has a Poisson distribution. Suppose that E[Y] = 3.
(a) What is Var[Y]?
(b) Suppose that I take a large number of independent random quantities, each with the same distribution as Y. Why should I suppose that a normal distribution would be a good approximation to the distribution of their average Ȳ?
(c) Use part (b) to calculate an interval which would have approximately a 98% probability of containing Ȳ based on 100 such random quantities.

4. An entomologist has a large colony of ants which he knows contains just two types, A and B, of similar appearance. By spending some time examining each ant carefully he can discover which type it is, but he decides, on grounds of cost, to use the much quicker method of classifying an ant as type A if its length is less than 8 mm and as type B otherwise. He knows that lengths of each of the two types of ants are normally distributed: type A with expectation 6.5 mm and standard deviation 0.8 mm, and type B with expectation 9.4 mm and standard deviation 0.9 mm. What proportion of (i) type A and (ii) type B ants would he misclassify by this method? If the colony consists of 70% of type A and 30% of type B, what proportion of all ants would he misclassify? It is thought that the number of ants misclassified may be reduced by choosing a critical point other than 8 mm. Discuss!

5. The mad king has captured Anne, Betty and Charles. He would like to kill them all, but, as he is a fan of probability puzzles, he offers them the following challenge. The following morning, each of the three prisoners will be escorted to a cell. They will each enter the cell simultaneously, but each through a different door. Immediately before entering the cell, a hat will be placed on the head of each prisoner. The colour of the hat will be either red or blue, and the choice for each prisoner will be decided by the flip of a fair coin, independently for each prisoner. As they enter the room, each prisoner will be able to see the colours of the hats of the other two prisoners, but not the colour of their own hat. No communication of any kind is allowed between the prisoners. At the moment that all of the prisoners enter the cell, and observe the colours of their comrades' hats, each may choose either to remain silent or, instantly, to guess the colour of the hat on their head. If at least one prisoner guesses correctly the colour of their hat, and nobody guesses incorrectly, all the prisoners will be set free. Otherwise, they will all be executed. The prisoners are allowed a meeting beforehand to discuss their strategy. They immediately realise that one possible strategy would be for Anne to guess that her hat was red, and Betty and Charles to stay silent. This strategy gives a probability of 1/2 that all will go free. Is there a better strategy?

Statistical Concepts 2013/14 – Solutions 0 – Probability Revision

1. (a) Each of the coins is equally likely to be chosen. The ten probabilities must sum to one since exactly one coin must be chosen. Hence, P[coin with two heads] = 1/10.
(b) P[head | coin with two heads] = 1 and P[head | fair coin] = 1/2. Therefore, applying Bayes' theorem,

P[coin with two heads | head] = (1 × 1/10) / (1 × 1/10 + 1/2 × 9/10) = 2/11 ≈ 0.182.

(c) Start from the beginning. A total of n + 1 heads have occurred. But P[n + 1 heads | coin with two heads] = 1 and P[n + 1 heads | fair coin] = 1/2ⁿ⁺¹. Therefore, applying Bayes' theorem as in the previous part,

P[coin with two heads | n + 1 heads] = (1 × 1/10) / (1 × 1/10 + (1/2ⁿ⁺¹) × 9/10) = 2ⁿ⁺¹ / (9 + 2ⁿ⁺¹).

(d) This probability is 1; verify this using Bayes' theorem.

2. Using the basic rules for handling probability density functions, expectations and variances, we have:
(a) P[U ≥ 0.3] = ∫_{0.3}^∞ f(u) du = ∫_{0.3}^1 1 du = 0.7.
(b) E[U] = ∫_{−∞}^∞ u f(u) du = ∫_0^1 u du = 1/2.
(c) E[U²] = ∫_0^1 u² f(u) du = 1/3 → Var[U] = E[U²] − E[U]² = 1/3 − 1/4 = 1/12.
(d) E[log(1 + U)] = ∫_0^1 log(1 + u) f(u) du = [(1 + u)(log(1 + u) − 1)]_0^1 = 2(log 2 − 1) − (−1) = 2 log 2 − 1.
(e) It is straightforward to show that Y = a + (b − a)U has the required uniform distribution. Hence, E[Y] = a + (b − a)/2 = (a + b)/2 and Var[Y] = (b − a)²/12.

3. Recall that if Y has the Poisson distribution with parameter λ then E[Y] = λ and Var[Y] = λ.
(a) Therefore, Var[Y] = 3.
(b) The central limit theorem says that when Y₁, ..., Yₙ are independent and identically distributed with mean µ and variance σ², the distribution of √n(Ȳ − µ)/σ for large n will be approximately N(0, 1).
(c) Applying the central limit theorem, the distribution of Z = 10(Ȳ − 3)/1.732 should be approximately N(0, 1). But from tables of the normal distribution, if Z ∼ N(0, 1) then P[Z ≤ 2.33] = 0.99 and so P[|Z| ≤ 2.33] = 0.98. Therefore, P[|Ȳ − 3| ≤ 0.40] ≈ 0.98; that is, P[Ȳ ∈ [2.60, 3.40]] ≈ 0.98. Thus, [2.60, 3.40] is the required interval.
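The interval in 3(c) is easy to check numerically. A minimal R sketch (λ = 3, n = 100 and the 98% level are taken from the question):

    lambda <- 3; n <- 100
    se <- sqrt(lambda / n)                 # sd of the average of n Poisson(3) values
    lambda + c(-1, 1) * qnorm(0.99) * se   # approx [2.60, 3.40]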

4. Let Y denote the length of an ant. Then, P[misclassify an ant | type A] = P[Y > 8 | A] = P[Z > 1.875] ≈ 0.0304, where Z ∼ N(0, 1). Similarly, P[misclassify an ant | type B] ≈ 0.0599. The overall misclassification rate is given by

P[misclassify an ant] = P[misclassify | type A] P[type A] + P[misclassify | type B] P[type B] = 0.0304 × 0.7 + 0.0599 × 0.3 ≈ 0.0392.

Intuitively, the cutoff point should be made larger to reduce the overall error rate: this reduces the error rate for the type A ants, the more numerous of the two populations. [The following remarks go beyond the intuitive explanation suggested above. First, notice that the original cutoff point of 8 mm is slightly larger than the average of the type A and type B expectations (7.95 mm): this is to account for the difference in standard deviations. In general, for two populations with densities f₁(y) and f₂(y) and associated proportions p₁ and p₂, it can be shown that the classification regions are determined by the solution(s) to the equation p₁f₁(y) = p₂f₂(y), and with this choice the overall misclassification rate is minimised. We can see that this might be the case by noticing that pᵢfᵢ(y) is proportional to the conditional probability that an ant of length y belongs to population i; and an intuitive rule would be to assign such an ant to the population with the largest of these conditional probabilities. This rule generalises to any number of populations, and it turns out that the overall misclassification rate is minimised.]

5. Here is the best strategy. Each prisoner does the following. If the hat colours of the other two prisoners are different, they say nothing. If the colour is the same, they guess the opposite colour. This method will be successful unless all of the hats are of the same colour. The chance that all are of the same colour is 1/4. (There is a 1/8 chance that all are red, and 1/8 chance that all are blue.) Therefore this strategy has probability 3/4 of success.
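The misclassification rates in solution 4 can be reproduced directly with the normal c.d.f. A minimal R sketch, using the means, standard deviations and 8 mm cutoff from the question:

    cutoff <- 8
    pA <- 1 - pnorm(cutoff, mean = 6.5, sd = 0.8)  # misclassified type A, ~0.0304
    pB <- pnorm(cutoff, mean = 9.4, sd = 0.9)      # misclassified type B, ~0.0599
    0.7 * pA + 0.3 * pB                            # overall rate, ~0.0392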

2H Statistical Concepts 2013/14 — Sheet 1 — Sampling

1. [Hand in to be marked] This question illustrates sampling distributions and related ideas in the context of a very small population with known x-values. The population comprises 5 individuals with x-values {1, 3, 3, 7, 9}. A sample of size two (resulting in values Y₁ and Y₂) is drawn at random and without replacement from the population.
(a) What is the value of N? Compute the population mean and the population variance.
(b) What is the value of n? Write down the (sampling) distribution of Y₁. What is the (sampling) distribution of Y₂?
(c) Derive the exact (sampling) distribution of Ȳ and, in this case, check directly the formulae for E[Ȳ] and Var[Ȳ] given in lectures.
(d) Derive the exact (sampling) distribution for the range of the two sample values (the largest minus the smallest) and show that in this case the sample range is not an unbiased estimator of the population range (the largest minus smallest of the population values). Under what general conditions on the population size and values and the sample size will the sample range be an unbiased estimator for the population range?

2. This is a simple numerical (no context) exercise on some basic things you should have learned so far. A simple random sample (without replacement) of size 25 from a population of size 2000 yielded the following values:

    104   86   91  104   79
    109   80  103   98   87
    111  119   99   98   94
    109   88  108   83   92
     87  122   96  107   97

For the above data, Σ₁²⁵ Yⱼ = 2451 and Σ₁²⁵ Yⱼ² = 243505.

(a) Calculate an unbiased estimate of the population mean and of the population total.
(b) Calculate unbiased estimates of the population variance and of Var[Ȳ].
(c) Compute (estimated) standard errors for the population mean and for the population total.

3. Among three boys, Andy has 3 sweets, Bill has 4 sweets and Charles has 5 sweets. Among three girls, Doreen has 4 sweets, Eve has 6 sweets and Florence has 8 sweets. One boy is selected at random, with number of sweets B₁, and, independently, one girl is selected with number of sweets G₁. Let D₁ = G₁ − B₁.
(a) Find the sampling distribution of D₁ and thus find, directly, the expected value and variance of D₁.
(b) Find the expected value and variance of D₁ by first finding the corresponding values for G₁ and B₁, and check that you get the same answers.
(c) A second boy is selected at random from the remaining two boys, with number of sweets B₂, and a second girl is selected with number of sweets G₂. Let D₂ = G₂ − B₂. Find the sampling distribution of D̄ = (D₁ + D₂)/2 and thus find the expected value and variance of D̄.
(d) Find the expected value and variance of D̄, using the formulae for E(D₁ + D₂) and Var(D₁ + D₂), and check that you get the same answers.

4. Show that with simple random sampling without replacement from a finite population the random quantity

s²_Ȳ = (s²/n)(1 − n/N)

is an unbiased estimator of Var[Ȳ], where

s² = (1/(n − 1)) Σᵢ₌₁ⁿ (Yᵢ − Ȳ)².

[Hint: First show that

Σᵢ₌₁ⁿ (Yᵢ − Ȳ)² = Σᵢ₌₁ⁿ Yᵢ² − nȲ²,

and then use the expression for Var[Ȳ] given in lectures in combination with the general result that E[Z²] = Var[Z] + (E[Z])² for any random quantity Z.]

Statistical Concepts 2013/14 — Solutions 1 — Sampling

1. (a) N = 5. µ = 23/5 = 4.6, σ² = (149/5) − (4.6)² = 8.64.
(b) n = 2. P[Yⱼ = 1] = P[Yⱼ = 7] = P[Yⱼ = 9] = 0.2 and P[Yⱼ = 3] = 0.4, for j = 1, 2.
(c) Possible samples and corresponding values for the sample mean and range are

    Sample (y₁, y₂):  (1,3) (1,3) (1,7) (1,9) (3,3) (3,7) (3,9) (3,7) (3,9) (7,9)
    Mean (ȳ):            2     2     4     5     3     5     6     5     6     8
    Range (r):           2     2     6     8     0     4     6     4     6     2

Hence, the sampling distribution of Ȳ is

    ȳ:           2    3    4    5    6    8
    P[Ȳ = ȳ]:  0.2  0.1  0.1  0.3  0.2  0.1

E[Ȳ] = 2 × 0.2 + ··· + 8 × 0.1 = 4.6 = µ, and E[Ȳ²] = 2² × 0.2 + ··· + 8² × 0.1 = 24.4. Thus, Var[Ȳ] = 24.4 − (4.6)² = 3.24. This agrees with

Var[Ȳ] = ((N − n)/(N − 1)) (σ²/n) = ((5 − 2)/(5 − 1)) (8.64/2) = 3.24.

(d) The sampling distribution of R is

    r:          0    2    4    6    8
    P[R = r]: 0.1  0.3  0.2  0.3  0.1

E[R] = 0 × 0.1 + ··· + 8 × 0.1 = 4 < population range = 9 − 1 = 8.

In general, the sample range can never be larger than the population range. Therefore it will be biased if there is a positive probability that the sample range will be smaller than the population range. Therefore there are two situations:
• If the population range is zero (all values in the population are the same), the sample range will always be zero and will be unbiased.
• If the population range is positive: let a denote the maximum value and b the minimum value in the population; the sample range can only be unbiased if every sample must contain at least one a and at least one b, as otherwise there would be positive probability of obtaining a sample with smaller range than the population. The only way to guarantee that both a and b appear in the sample is if n > N − min(N_a, N_b), where N_x denotes the number of times the value x appears in the population.

2. n = 25, N = 2000, Σ₁ⁿ Yⱼ = 2451, Σ₁ⁿ Yⱼ² = 243505, Σ₁ⁿ (Yⱼ − Ȳ)² = 243505 − (2451²/25) = 3208.96.
(a) Ȳ = 98.04 is an unbiased estimate of the population mean µ; and T = 2000 × 98.04 = 196080 is an unbiased estimate of the population total τ.
(b)

((2000 − 1)/2000) × (3208.96/(25 − 1)) = 133.64 is an unbiased estimate of the population variance σ².
s²_Ȳ = (s²/n)(1 − n/N) = (3208.96/(25 × 24))(1 − 25/2000) ≈ 5.28, an unbiased estimate of Var[Ȳ].

(c) Estimated SE of Ȳ as an estimate of the population mean µ is s_Ȳ = 2.298; and the estimated SE of T as an estimate of the population total τ is 2000 times this; namely, s_T = 4596.

3. (a) The possible values of D₁ are −1, 0, 1, 1, 2, 3, 3, 4, 5, each with probability 1/9. Therefore E(D₁) = (1 + 0 − 1 + 3 + 2 + 1 + 5 + 4 + 3)/9 = 2, and Var(D₁) = E(D₁ − 2)² = (1 + 4 + 9 + 1 + 0 + 1 + 9 + 4 + 1)/9 = 10/3.
(b) E(B₁) = (3 + 4 + 5)/3 = 4, E(G₁) = (4 + 6 + 8)/3 = 6, so E(G₁ − B₁) = E(G₁) − E(B₁) = 6 − 4 = 2. Var(B₁) = E(B₁ − 4)² = 2/3, Var(G₁) = E(G₁ − 6)² = 8/3,

so, as G₁, B₁ are independent, Var(G₁ − B₁) = Var(G₁) + Var(B₁) = 10/3.
(c) If we choose two boys and two girls, then we leave one boy and one girl behind. Call their values B₃, G₃, with D₃ = G₃ − B₃. As D₁ + D₂ + D₃ = 6, we have D̄ = (6 − D₃)/2. D₃ has the same distribution as D₁, so the possible values of D̄ are 7/2, 6/2, 5/2, 5/2, 4/2, 3/2, 3/2, 2/2, 1/2, each with probability 1/9. So, we can find E(D̄) and Var(D̄) directly from this distribution, or from E(D̄) = (6 − E(D₃))/2 = (6 − 2)/2 = 2 and Var(D̄) = Var((6 − D₃)/2) = Var(D₃)/4 = 10/12.
(d) E(D̄) = (1/2)(E(D₁) + E(D₂)) = 2, and Var(D̄) = (1/4)Var(D₁ + D₂) = (1/4)(Var(D₁) + Var(D₂) + 2Cov(D₁, D₂)). We have Var(D₁) = Var(D₂) = 10/3 and Cov(D₁, D₂) = Cov(G₁ − B₁, G₂ − B₂) = Cov(G₁, G₂) + Cov(B₁, B₂), as G₁, G₂ are independent of B₁, B₂, so Cov(B₁, G₂) = Cov(G₁, B₂) = 0. From results in lectures, we have that the covariance between any two values sampled without replacement from a population is minus the variance of a single sample, divided by one less than the population size, so that Cov(G₁, G₂) = −Var(G₁)/(3 − 1) = −8/6, Cov(B₁, B₂) = −Var(B₁)/(3 − 1) = −2/6, Cov(D₁, D₂) = −8/6 − 2/6 = −10/6, so Var(D̄) = (1/4)(10/3 + 10/3 − 20/6) = 10/12.

4. (n − 1)s² = Σⱼ₌₁ⁿ (Yⱼ − Ȳ)² = Σⱼ₌₁ⁿ (Yⱼ² − 2ȲYⱼ + Ȳ²) = Σⱼ₌₁ⁿ Yⱼ² − 2Ȳ·nȲ + nȲ² = Σⱼ₌₁ⁿ Yⱼ² − nȲ².

In what follows use (i) E[Y²] = Var[Y] + (E[Y])² for any random quantity Y, and (ii) Var[Ȳ] = ((N − n)/(N − 1))(σ²/n). Then

(n − 1)E[s²] = Σⱼ₌₁ⁿ E[Yⱼ²] − nE[Ȳ²]
             = n(σ² + µ²) − n(µ² + ((N − n)/(N − 1))(σ²/n))
             = nσ²(1 − (N − n)/(n(N − 1))) = ((n − 1)N/(N − 1)) σ².

Therefore, E[s²] = (N/(N − 1))σ². Hence

E[s²_Ȳ] = (1/n)(1 − n/N) E[s²] = (1/n)((N − n)/N)(N/(N − 1))σ² = ((N − n)/(N − 1))(σ²/n) = Var[Ȳ].

Hence, s²_Ȳ is an unbiased estimator of Var[Ȳ].
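The arithmetic in solution 2 can be checked quickly. A minimal R sketch (the sums 2451 and 243505 and sizes n = 25, N = 2000 are from the question):

    n <- 25; N <- 2000
    sumy <- 2451; sumy2 <- 243505
    ybar <- sumy / n                        # 98.04, estimates mu
    s2 <- (sumy2 - sumy^2 / n) / (n - 1)    # 133.71
    s2.ybar <- (s2 / n) * (1 - n / N)       # ~5.28, estimates Var(Ybar)
    c(total = N * ybar,                     # 196080
      se.mean = sqrt(s2.ybar),              # 2.298
      se.total = N * sqrt(s2.ybar))         # 4596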

2H Statistical Concepts 2013/14 – Sheet 2 – Estimators and Confidence Intervals

1. [Hand in to be marked] At the time of a historic potential challenge for the leadership of the Conservative party (the "stalking horse" affair, where Sir Anthony Meyer challenged Mrs Thatcher for the leadership of the Conservative party), the Independent newspaper performed an opinion poll to assess the level of support for Mrs Thatcher. They asked 150 of the 377 Conservative MPs whether or not they felt it was time for a change of leader and used the results to draw conclusions about the level of support for Mrs Thatcher in the whole of the parliamentary party. Supposing the actual level of support to be 40% among the 377 Conservative MPs, (i) calculate the standard deviation of the proportion of the sample supporting Mrs Thatcher, assuming simple random sampling; and (ii) using the Central Limit Theorem, estimate the chance that the level of support in a sample of size 150 will be in error by more than 1%. Suppose that 50 in the sample of 150 said they supported Mrs Thatcher. Compute an approximate 95% confidence interval for the percentage support (without assuming that the actual level of support is 40%). Discuss whether or not this interval is consistent with an actual level of support of 40%.

2. In a private library the books are kept on 130 shelves of similar size. The numbers of books on 15 shelves selected at random were found to have a sum of 381 and a sum of squares of 9947. Estimate the total number of books in the library and provide an estimated standard error for your estimate. Give an approximate 95% confidence interval for the total number of books in the library, and comment on the reliability of the approximation in this instance.

3. In auditing, the following sampling method is sometimes used to estimate the total unknown value α = a₁ + a₂ + ··· + a_N of an inventory of N items, where aᵢ is the (as yet unknown) "audit value" of item i and, as is often the case, a "book value" bᵢ of each item i = 1, ..., N is readily available. [Think of second-hand cars with their published "blue book" values, or stamps with their catalogue values.] A simple random sample without replacement of size n is taken from the inventory, and for each item j in the sample the difference Dⱼ = Aⱼ − Bⱼ between the audited value Aⱼ and the book value Bⱼ is recorded, and the sample average D̄ = Ā − B̄ is formed. The total inventory value α is estimated as V = N D̄ + β, where β = b₁ + b₂ + ··· + b_N is the known sum of the book values of the inventory.
(a) Show that V is an unbiased estimator of the total value α of the inventory.
(b) Find an expression for the variance of the estimator V in terms of the population variances σ_a² and σ_b² of the inventory values and book values and their covariance σ_ab, where you may assume

Cov(Ā, B̄) = ((N − n)/(N − 1)) (σ_ab/n).

[Note σ_ab is defined to be Σᵢ₌₁ᴺ (aᵢ − µ_a)(bᵢ − µ_b)/N, where µ_a = α/N and µ_b = β/N are the inventory and book value population means; when a = b we get the usual variance formula.]
(c) Under what conditions on the variances and covariances of the inventory value and book value populations will the variance of V be smaller than that of the usual estimator N Ā of the total inventory α?
(d) Under what circumstances will the answer to (c) be useful in practice?

Statistical Concepts 2013/14 – Solutions 2 – Estimators and Confidence Intervals

1. (i) Assuming true population proportion p = 0.4, the standard deviation of the estimator p̂ is

σ_p̂ = √( (p(1 − p)/n) × (N − n)/(N − 1) ) = √( (0.4 × 0.6/150) × (377 − 150)/(377 − 1) ) = 0.031 (about 3%).

(ii) P[|p̂ − 0.4| > 0.01] = 2(1 − Φ(0.01/0.031)) = 0.748 (about a 75% chance).
(iii) p̂ = 50/150 = 0.333. The SE of p̂ is

s_p̂ = √( (p̂(1 − p̂)/(n − 1)) (1 − n/N) ) = 0.03.

Hence, approximate 95% limits are 0.333 ± 1.96 × 0.03, leading to the interval [27.5%, 39.2%]. Since 40% is outside this interval, the data is not consistent with this level of support.

2. n = 15, N = 130, Σ₁ⁿ Yⱼ = 381, Σ₁ⁿ Yⱼ² = 9947.
s² = (9947 − 381²/15)/(15 − 1) = 19.257.
T = N Ȳ = 3302, an unbiased estimate of the total number of books in the library.
s²_Ȳ = (s²/n)(1 − n/N) = 1.136. s_T = N s_Ȳ = 138.54, the SE of T. z₀.₀₂₅ = 1.96. Hence, an approximate 95% CI for the total number of books has limits T ± z₀.₀₂₅ s_T, which evaluates to [3030, 3574] (nearest integer). As n is not "large", the accuracy of the CLT-based CI cannot be guaranteed.

3. (a) E[V] = N E[D̄] + β = N(µ_a − µ_b) + β = α − β + β = α.
(b)

Var[V] = N² Var[D̄] = N² Var[Ā − B̄] = N²(Var[Ā] + Var[B̄] − 2Cov[Ā, B̄])
       = N²( ((N − n)/(N − 1))(σ_a²/n) + ((N − n)/(N − 1))(σ_b²/n) − 2((N − n)/(N − 1))(σ_ab/n) )
       = (N²(N − n)/(n(N − 1))) (σ_a² + σ_b² − 2σ_ab).

(c) Var[N Ā] = N²((N − n)/(N − 1))(σ_a²/n) > Var[V] when σ_b² < 2σ_ab.
(d) Useful, provided we have knowledge about the relative magnitudes of σ_b² and σ_ab. We know the value of σ_b² but not σ_ab. The closer the audit and book values are related, as measured by σ_ab, the more useful V would be.
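The confidence interval in solution 1(iii) is straightforward to reproduce. A minimal R sketch, using the finite-population SE formula above:

    n <- 150; N <- 377
    phat <- 50 / n                                          # 0.333
    se <- sqrt(phat * (1 - phat) / (n - 1) * (1 - n / N))   # ~0.03
    100 * (phat + c(-1, 1) * qnorm(0.975) * se)             # approx [27.5, 39.2] percent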

Statistical Concepts 2013/14 – Sheet 3 – Probability Models and Goodness of Fit

1. [Hand in to be marked] The Poisson distribution has been used by traffic engineers as a model for light traffic, based on the rationale that if the rate is approximately constant and the traffic is light, so that cars move independently of each other, the distribution of counts of cars in a given time interval should be nearly Poisson. The following table shows the numbers of right turns during 300 three-minute intervals at a specific road intersection over various hours of the day and various days of the week.

    # right turns:   0   1   2   3   4   5   6   7   8   9  10  11  12  13+
    count:          14  30  36  68  43  43  30  14  10   6   4   1   1    0

Estimate the rate parameter λ in the Poisson distribution. After pooling the last five cells (explain why we do this), assess the fit of the Poisson distribution using Pearson's chi-square statistic. Carefully explain any lack of fit.

2. Are birthrates constant throughout the year? Here are all the births in Sweden, in 1935, grouped by season.

    Spring (Apr-June): 23,385
    Summer (Jul-Aug):  14,978
    Autumn (Sep-Oct):  14,106
    Winter (Nov-Mar):  35,804

(a) Carry out a chi-square test of the constant birthrate hypothesis for these data. Comment on any divergences from constant birth rate.
(b) From the given data, construct a 95% confidence interval for the proportion of spring births. Comment on the relationship of this analysis with part (a).

3. Capture-recapture. How does one estimate the number of fish in a lake? The following technique is actually used for this and other problems concerning sizes of wildlife populations. A net is set up to catch some fish. These are marked and returned to the lake. At a later date another batch of fish are caught. The size of the fish population can then be estimated from seeing how many of the marked fish have been caught in the second sample. Argue that if M marked fish from the first stage are returned to the lake, then the probability distribution P[Y = y | M, n, N] of the number Y of marked fish caught in a second catch of n fish, when the total number of fish in the lake is N, is given by

P[Y = y | M, n, N] = C(M, y) C(N − M, n − y) / C(N, n),

where C(a, b) denotes the binomial coefficient "a choose b". Find E[Y | M, n, N] and hence suggest an estimator for N. Note that you can evaluate the expectation without using P[Y = y | M, n, N]; explain how. Evaluate your estimator for the possible values of y in the case when 6 fish are marked in the first stage and 10 fish are to be caught in the second stage. For an observed value y of Y, discuss how you might use P[Y = y | M, n, N] (when considered as a function of N) as providing an alternative route to estimating N. [To clarify this, you might consider by way of an example plotting P[Y = 3 | M = 6, n = 10, N] as a function of N, corresponding to the specific situation described above in which 3 marked fish are caught at the second stage.] The probability model in this question is known as the hypergeometric distribution; it appears in many contexts.

Statistical Concepts 2013/14 – Solutions 3 – Probability Models and Goodness of Fit

1. Total = 1168, n = 300, λ̂ = 1168/300 = 3.893. Pool cells to ensure E ≥ 5 in each cell.

    x:     0     1     2     3     4     5     6     7     8    9+   Total
    O:    14    30    36    68    43    43    30    14    10    12     300
    E:   6.1  23.8  46.3  60.1  58.5  45.6  29.6  16.4   8.0   5.5     300
    X²: 10.2   1.6   2.3   1.0   4.1   0.1  0.01   0.4   0.5   7.8   27.93

Pearson's chi-square statistic, X² = 27.93. Degrees of freedom, ν = (10 − 1) − 1 = 8. Significance probability, P[X² > 27.93 | Poisson model] = 0.0005, which is very strong evidence against the Poisson model. There are too many zero counts and too many counts of 9 or more. The Poisson model is unlikely to be reasonable, as the traffic rate will tend to vary over different times of the day and on different days of the week. A Poisson model would more likely hold at a specific place during a specific time period on the same day of the week, such as 7:00 am to 8:00 am on a Sunday, when traffic density is more likely to be light and nearly constant.

2. (a) Under the simple model where births are independent and occur with the same probability every day (assume 365 days in a year), where we have seen a random sample of n = 88,273 births from a hypothetical infinite population of such births, here is the layout of the chi-square calculation.

    Season             Obs. freq O   Probability p   Exp. freq E = np   (O − E)   (O − E)²/E
    Spring (Apr-June)       23,385         0.24932             22,008     1,377        86.16
    Summer (Jul-Aug)        14,978         0.16986             14,994       -16         0.02
    Autumn (Sep-Oct)        14,106         0.16712             14,752      -646        28.29
    Winter (Nov-Mar)        35,804         0.41370             36,519      -715        14.00

From the table, the χ² statistic is 128.47, which should be compared to the null distribution of χ² with 3 degrees of freedom. The p-value for this statistic is very small (much smaller than the smallest value in your χ² tables), so we can reject the null hypothesis of equal birthrates on all days with a very low significance probability. Note that with very large data sets we can be sensitive to very small discrepancies from the null distribution. Looking at the table, we see more births than we would expect in spring, compensated by fewer in autumn and winter.
(b) To see how far off equal birthrates the data are, we may assess the probability of birth in each season. For spring, the estimate is p = 23385/88273 = 0.265. Our confidence interval is therefore 0.265 ± 1.96√((0.265)(0.735)/88273) = 0.265 ± 0.0029. If birth rates were constant over days, then this probability would be 0.249. Note that this value does not lie within the confidence interval. The observed value is about 6% above the equal rate value, and our confidence interval suggests that the true ratio is within 5% and 7%. You can check that the confidence intervals for autumn and winter similarly fail to include the equal rate values.
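Part 2(a) can be reproduced directly in R. A minimal sketch; the season probabilities are the day counts 91, 62, 61, 151 divided by 365, matching the table above:

    births <- c(23385, 14978, 14106, 35804)   # spring, summer, autumn, winter
    days   <- c(91, 62, 61, 151)              # days per season, non-leap year
    chisq.test(births, p = days / 365)        # X-squared ~ 128.5, df = 3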

3. Assuming no mixing, no deaths, etc., the expression for the quantity P[Y = y | M, n, N] follows from 1H Probability; compare, for example, to counting the number of aces in a poker hand. Y = Y₁ + ··· + Yₙ, where Yᵢ = 1 if the i-th fish caught is marked and Yᵢ = 0 otherwise. Then

E[Y | M, n, N] = nE[Y₁ | M, n, N] = nM/N.

Equate the number of marked fish y (in the second sample) to nM/N to obtain the estimator N* = nM/y. For M = 6, n = 10, we obtain

    y:    0   1   2   3   4   5   6
    N*:   ∞  60  30  20  15  12  10

For observed Y = y, we can use P[Y = y | M, n, N] as a likelihood function for N, and find the value of N, N̂, which maximises this function. This is the maximum likelihood estimator. For the above example with, say, y = 3, we have

P[Y = 3 | M = 6, n = 10, N] = C(6, 3) C(N − 6, 7) / C(N, 10) ≡ l(N).

l(N) = 0 for N < 13. Plot l(N) for N ≥ 13. Note that

l(N + 1)/l(N) = (N − 5)(N − 9) / ((N − 12)(N + 1)) ≥ 1 for N ≤ 19.

Therefore, the m.l.e. of N is not unique: N̂ = 19 or 20 (cf. N* = 20 when y = 3).
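The suggested likelihood plot takes one line in R. A minimal sketch; dhyper(y, M, N − M, n) is the hypergeometric probability above:

    N <- 13:60
    lik <- dhyper(3, m = 6, n = N - 6, k = 10)  # P(Y = 3 | M = 6, n = 10, N)
    plot(N, lik, type = "h")
    N[which.max(lik)]   # 19; analytically l(19) = l(20), so the m.l.e. is not unique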

Statistical Concepts 2013/14 – Sheet 4 – Probability Models

1. [Hand in to be marked] A r.q. Y has a gamma distribution with parameters α, λ > 0 if its p.d.f. is

f(y | α, λ) = (1/Γ(α)) λ^α y^(α−1) e^(−λy) for y ≥ 0,

and zero otherwise.
(a) Show that Var[Y | α, λ] = α/λ².
(b) Show that the moment generating function of Y is given by

M_Y(t) = (λ/(λ − t))^α for t < λ.

The condition on t is required as the integral does not converge/exist for other values of t. As the moment generating function is defined for t in an open interval including 0, it can be used to find all moments E(Y^r), and hence uniquely specifies the distribution of Y (see your 1H Probability notes, or the textbook by Rice, for more details on mgf's if needed to refresh your knowledge).
(c) Put Y = Y₁ + ··· + Yₙ. Then (1H Probability course), because the Yᵢ are independent,

M_Y(t) = M_{Y₁}(t) M_{Y₂}(t) ··· M_{Yₙ}(t) = (λ/(λ − t))^(nα).

Therefore, Y ∼ Gamma(nα, λ).

2. Observed values are

    Severity               O     A    AB     B
    Moderate-to-advanced   7     5     3    13
    Minimal               27    32     8    18
    Not present           55    50     7    24

Expected values are

    Severity                O      A     AB      B
    Moderate-to-advanced  10.01   9.78   2.02   6.18
    Minimal               30.38  29.70   6.14  18.78
    Not present           48.61  47.52   9.83  30.04

The (O − E)²/E entries are

    Severity                O      A     AB      B
    Moderate-to-advanced  0.904  2.339  0.471  7.510
    Minimal               0.376  0.178  0.560  0.032
    Not present           0.840  0.130  0.815  1.214

giving a total of 15.36957. Under the hypothesis of no association between disease and blood group, the null distribution is a chi-square distribution with (r − 1)(c − 1) = 3 × 2 = 6 degrees of freedom. The observed value corresponds to a significance level of 0.0176. (In R this can be computed as 1-pchisq(15.36957,6).) [Note that one of the cells, [AB, Moderate-to-advanced], has a small expected value. If we had a large observed value in this cell, then our chi-square distribution approximation would not be reliable.] Thus, there is some evidence of association of disease and blood group within the ABO system, which is mainly confined to A and B in "Moderate-to-advanced", especially B.

3. The likelihood l(n) for n is given by

l(n) = P[2 boys | n] = C(n, 2)(1/2)ⁿ = n(n − 1)/2ⁿ⁺¹ for n = 2, 3, 4, 5, ...,

and takes values 1/4, 3/8, 3/8, 5/16, ... Thus the m.l.e. n̂ is not unique, as both n = 3 and n = 4 maximise l(n). [Obviously, l(0) = l(1) = 0.]
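The whole calculation in item 2 can be reproduced from the observed table alone. A minimal R sketch of the matrix above:

    O <- matrix(c( 7,  5, 3, 13,
                  27, 32, 8, 18,
                  55, 50, 7, 24),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Moderate-to-advanced", "Minimal", "Not present"),
                                c("O", "A", "AB", "B")))
    chisq.test(O)   # X-squared ~ 15.37, df = 6, p ~ 0.018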

4. [Hand in to be marked]
(a) Likelihood, l(τ) = Πᵢ₌₁ⁿ τ⁻¹ exp(−yᵢ/τ) = τ⁻ⁿ exp(−nȳ/τ) for τ > 0. Hence, it is sufficient to know the values of (n, ȳ) to compute l(τ).
(b) L(τ) = −n log τ − nȳ/τ. Hence,

L′(τ) = −n/τ + nȳ/τ² = 0 → τ̂ = ȳ, with L′′(τ̂) = −n/ȳ² < 0.

(c) E[Ȳ | τ] = E[Y | τ], where Y ∼ Gamma(α, λ) with α = 1, λ = 1/τ. Thus, E[τ̂ | τ] = α/λ = τ (unbiased).

5. The likelihood is l(p) = Πᵢ₌₁ⁿ (1 − p)p^(yᵢ−1) = (1 − p)ⁿ p^(n(ȳ−1)). Put L(p) = log l(p). Then

L′(p) = −n/(1 − p) + n(ȳ − 1)/p → p̂ = (ȳ − 1)/ȳ.

For the data, Σyᵢ = 1 × 48 + 2 × 31 + ··· + 12 × 1 = 363 and n = 48 + 31 + ··· + 1 = 130. Hence, p̂ = (363 − 130)/363 = 0.642. Noting that E_{k+1} = p̂E_k and E₁ = 130(1 − p̂) = 46.6 (1 dp), we obtain the following table, in which cells 7 to 12+ are pooled:

    Hops:   1     2     3     4     5     6     7    8    9   10   11  12+   Total
    O:     48    31    20     9     6     5     4    2    1    1    2    1     130
    E:   46.6  29.9  19.2  12.3   7.9   5.1   3.3  2.1  1.3  0.9  0.6  1.0     130
    X²:  .045  .042  .035  .891  .458  .001   (cells 7 to 12+ pooled)         1.868

Pooled cells have expectation 9.092 = 130P[Y ≥ 7 | p = 0.642] and a contribution of 0.3967 to X². Degrees of freedom = 7 − 1 − 1 = 5, X² = 1.868 and P[X²₅ > 1.868 | geometric model] = 0.8672. This "large" significance probability suggests that the geometric model fits well, perhaps too well. However, why should the geometric distribution model the number of hops?
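The fit in item 5 can be reproduced numerically. A minimal R sketch; the counts and the pooling of cells 7 to 12+ follow the table above:

    hops <- 1:12
    O    <- c(48, 31, 20, 9, 6, 5, 4, 2, 1, 1, 2, 1)
    n    <- sum(O)                        # 130
    ybar <- sum(hops * O) / n             # 363/130
    phat <- (ybar - 1) / ybar             # 0.642
    E    <- n * (1 - phat) * phat^(hops - 1)
    Op   <- c(O[1:6], sum(O[7:12]))       # pool cells 7 to 12+
    Ep   <- c(E[1:6], n * phat^6)         # pooled expectation ~9.09
    X2   <- sum((Op - Ep)^2 / Ep)         # ~1.87
    1 - pchisq(X2, df = 7 - 1 - 1)        # ~0.87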

Statistical Concepts 2013/14 – Sheet 5 – Likelihood

1. [Hand in to be marked] Let y₁, ..., yₙ be a random sample from a geometric distribution

P[Y = y | p] = (1 − p)p^(y−1), y = 1, 2, ...,

where p ∈ [0, 1]. [Remember from 1H Probability that this is the distribution for the number of trials to the first failure in a sequence of independent trials with success probability p.] Show that, if the model is correct, it is sufficient to know the sample size n and the sample mean ȳ to evaluate the likelihood function. Find the maximum likelihood estimator (m.l.e.) of p in terms of ȳ. In an ecological study of the feeding behaviour of birds, the number of hops between flights was counted for several birds. For the following data, fit a geometric distribution using the m.l.e. of p, and test for goodness of fit using Pearson's chi-square statistic, remembering the "rule of thumb" to "pool" counts in adjacent cells so that all resulting cell expectations are at least 5.

    Number of hops:   1   2   3   4   5   6   7   8   9  10  11  12
    Count:           48  31  20   9   6   5   4   2   1   1   2   1

2. Let y₁, ..., yₙ be a random sample from an exponential distribution with p.d.f.

f(y | τ) = (1/τ) e^(−y/τ), 0 ≤ y < ∞,

where τ > 0.
(a) Show that if the model is correct it is sufficient to know the sample size n and the sample mean ȳ to evaluate the likelihood function for any value of τ.
(b) Find the maximum likelihood estimator τ̂ of τ in terms of ȳ.
(c) Show that τ̂ is an unbiased estimator of τ; that is, E[τ̂ | τ] = τ for all τ > 0.


3. Suppose that a parameter θ can assume one of three possible values θ₁ = 1, θ₂ = 10 and θ₃ = 20. The distribution of a discrete random quantity Y, with possible values y₁, y₂, y₃, y₄, depends on θ as follows:

          θ₁   θ₂   θ₃
    y₁    .1   .2   .4
    y₂    .1   .2   .3
    y₃    .2   .3   .1
    y₄    .6   .3   .2

Thus, each column gives the distribution of Y given the value of θ at the head of the column.
(a) Write down the parameter space Θ.
(b) A single observation of Y is made. Sketch the likelihood function and evaluate the m.l.e. θ̂ of θ for each of the possible values of Y.
(c) Evaluate the sampling distribution of θ̂; that is, for each θ compute the probability distribution of θ̂ based on a single observation of Y. Display your answer in tabular form.
(d) Is θ̂ an unbiased estimator of θ? Prove your result!

4. The random quantity Y has a geometric distribution with probability function

P[Y = y | p] = (1 − p)p^(y−1), y = 1, 2, ..., p ∈ [0, 1].

Show that P[Y > y | p] = p^y. Recall from 1H that Y counts the number of trials to the first 'failure' in a sequence of Bernoulli trials, each with success probability p. As part of a quality control procedure for a certain mass production process, batches containing very large numbers of components from the production are inspected for defectives. We will assume the process is in equilibrium and denote by q the overall proportion of defective components produced. The inspection procedure is as follows. During each shift n batches are selected from the production and for each such batch components are inspected until a defective one is found, and the number of inspected components is recorded. At the end of the shift, there may be some inspected batches which have not yet yielded a defective component; and for such batches the number of inspected components is recorded. Suppose at the end of one such inspection shift, a defective component was detected in each of r of the batches, the recorded numbers of inspected components being y₁, ..., yᵣ. Inspection of the remaining s = n − r batches was incomplete, the recorded numbers of inspected components being c₁, ..., cₛ.
(a) Show that the likelihood function for q based on these data is

l(q) = q^r (1 − q)^(y+c−r), q ∈ [0, 1],

where y = y₁ + ··· + yᵣ and c = c₁ + ··· + cₛ.
(b) Therefore, show that the maximum likelihood estimate of q is q̂ = 1/a, where a = (y + c)/r.

Statistical Concepts 2013/14 – Solutions 5 – Likelihood

1. The likelihood is l(p) = Πᵢ₌₁ⁿ (1 − p)p^(yᵢ−1) = (1 − p)ⁿ p^(n(ȳ−1)). Therefore, by the factorisation criterion, ȳ is sufficient for p, if we know n. Put L(p) = log l(p). Then

L′(p) = −n/(1 − p) + n(ȳ − 1)/p → p̂ = (ȳ − 1)/ȳ.

(This is the maximum, as L′′(p) is negative for p ∈ [0, 1].)

For the data, Σyᵢ = 1 × 48 + 2 × 31 + ··· + 12 × 1 = 363 and n = 48 + 31 + ··· + 1 = 130. Hence, p̂ = (363 − 130)/363 = 0.642. Noting that E_{k+1} = p̂E_k and E₁ = 130(1 − p̂) = 46.6 (1 dp), we obtain the following table, in which cells 7 to 12+ are pooled:

    Hops:   1     2     3     4     5     6     7    8    9   10   11  12+   Total
    O:     48    31    20     9     6     5     4    2    1    1    2    1     130
    E:   46.6  29.9  19.2  12.3   7.9   5.1   3.3  2.1  1.3  0.9  0.6  1.0     130
    X²:  .045  .042  .035  .891  .458  .001   (cells 7 to 12+ pooled)         1.868

Pooled cells have expectation 9.092 = 130P[Y ≥ 7 | p = 0.642] and a contribution of 0.3967 to X². Degrees of freedom = 7 − 1 − 1 = 5, X² = 1.868 and

P[X²₅ > 1.868 | geometric model] = 0.8672.

This "large" significance probability suggests that the geometric model fits well, perhaps too well. However, why should the geometric distribution model the number of hops?

2. (a) Likelihood, l(τ) = Πᵢ₌₁ⁿ τ⁻¹ exp(−yᵢ/τ) = τ⁻ⁿ exp(−nȳ/τ) for τ > 0. Hence, it is sufficient to know the values of (n, ȳ) to compute l(τ) (by the factorisation criterion).
(b) L(τ) = −n log τ − nȳ/τ. Hence, the MLE is given by

L′(τ) = −n/τ + nȳ/τ² = 0 → τ̂ = ȳ, as L′′(τ̂) = −n/ȳ² < 0.

(c) E[Ȳ | τ] = E[Y | τ], where Y ∼ Gamma(α, λ) with α = 1, λ = 1/τ. Thus, E[τ̂ | τ] = α/λ = τ (unbiased).

3. (a) Θ = {1, 10, 20}.
(b) θ̂(y₁) = 20, θ̂(y₂) = 20, θ̂(y₃) = 10, θ̂(y₄) = 1.
(c) There are three different (sampling) distributions (displayed as columns in the table below) for θ̂, one for each θ ∈ Θ.

              θ = 1   θ = 10   θ = 20
    θ̂ = 20      .2       .4       .7
    θ̂ = 10      .2       .3       .1
    θ̂ = 1       .6       .3       .2

E.g., P[θ̂ = 20 | θ = 1] = P[Y = y₁ | θ = 1] + P[Y = y₂ | θ = 1] = 0.1 + 0.1 = 0.2.
(d) E[θ̂ | θ = 1] = 20 × 0.2 + 10 × 0.2 + 1 × 0.6 = 6.6 ≠ 1. Therefore, E[θ̂ | θ] ≠ θ for at least one value of θ; so θ̂ is not an unbiased estimator of θ.

4. (a) P[Y > y] = 1 − P[Y ≤ y] = 1 − Σᵣ₌₁ʸ (1 − p)p^(r−1) = p^y. The likelihood, i.e. the probability of the observed data for a given p, is

P[Y₁ = y₁, ..., Yᵣ = yᵣ, Y_{r+1} > c₁, ..., Yₙ > cₛ | p] = Πᵢ₌₁ʳ (1 − p)p^(yᵢ−1) × Πⱼ₌₁ˢ p^(cⱼ) = (1 − p)^r p^(y−r) p^c.

(b) Therefore the log-likelihood for q = 1 − p is L(q) = r log q + (y + c − r) log(1 − q) and

L′(q) = r/q − (y + c − r)/(1 − q) = 0 → q̂ = 1/a,

where a = (y + c)/r.

Statistical Concepts 2013/14 – Sheet 6 – Likelihood

1. [Hand in to be marked] An independent sample x = (x₁, ..., xₙ) of size n is drawn from a Rayleigh distribution with pdf

f(x | α) = (x/α) e^(−x²/2α) for x > 0, and f(x | α) = 0 for x ≤ 0,

with unknown parameter α > 0.
(a) Show that the maximum likelihood estimator for α is α̂ = Σᵢ₌₁ⁿ xᵢ²/(2n).
(b) If X has a Rayleigh distribution, parameter α, show that E(X²) = 2α. Hence show that Fisher's information for a sample of size one is 1/α². Hence write down the information in a sample of size n.
(c) Calculate an approximate 95% confidence interval for α if n is large.

2. An offspring in a breeding experiment can be of three types with probabilities, independently of other offspring, (2 + p)/4, (1 − p)/2 and p/4.
(a) Show that for n offspring the probability that there are a, b and c of the three types, respectively, is of the form K(2 + p)^a (1 − p)^b p^c, where K does not depend on p.
(b) Show that the maximum likelihood estimate p̂ of p is a root of np² + (2b + c − a)p − 2c = 0.
(c) Suppose that an experiment gives a = 58, b = 33 and c = 9. Find the m.l.e. p̂.
(d) Find Fisher's information, and give an approximate 95% confidence interval for p.
(e) Use p̂ to calculate expected frequencies of the three types of offspring, and test the adequacy of the genetic model using Pearson's chi-square statistic.

3. A random quantity Y has a uniform distribution on the interval (0, θ), so that its p.d.f. is given by f(y | θ) = 1/θ for 0 < y ≤ θ, and zero otherwise; we require θ > 0 for f(y | θ) to be non-negative. The joint pdf of a sample y₁, ..., yₙ is

f(y₁, ..., yₙ | θ) = θ⁻ⁿ for 0 < yᵢ ≤ θ, i = 1, ..., n, and zero otherwise.

But {0 < yᵢ ≤ θ, i = 1, ..., n} ≡ {0 < m ≤ θ}, where m = max{y₁, ..., yₙ}, and the result follows. The likelihood function is

l(θ) = 0 if θ < m, and l(θ) = θ⁻ⁿ if θ ≥ m,

so that θ̂ = m. Also,

P[M ≤ m | θ] = P[Y₁ ≤ m, ..., Yₙ ≤ m | θ] = 0 if m ≤ 0, (m/θ)ⁿ if 0 < m ≤ θ, and 1 if m > θ.

Hence

E[θ̂ | θ] = ∫₀^θ m (n mⁿ⁻¹/θⁿ) dm = (1 − 1/(n + 1)) θ.

We would expect the true value of θ to be larger than the largest observation, so the result is not surprising. An unbiased estimator of θ is [(n + 1)/n]M, which is bigger than M = θ̂.

4. (a) One way of observing y₁, ..., y_k is to see class C₁ for the first y₁ observations, then C₂ for the next y₂ observations, and so on up to C_k for the last y_k observations; this particular ordering has probability

p₁^(y₁) p₂^(y₂) ··· p_k^(y_k).

Thus

P[Y₁ = y₁, Y₂ = y₂, ..., Y_k = y_k | p₁, p₂, ..., p_k] = K p₁^(y₁) p₂^(y₂) ··· p_k^(y_k),

where K is the number of different ways this event can occur.
(b) Let Y be the sum of a proper subset of Y₁, Y₂, ..., Y_k and p the sum of the corresponding subset of p₁, p₂, ..., p_k. Then Y is the number of observations that fall into the corresponding disjoint union of classes C₁, C₂, ..., C_k. Thus, Y ∼ Binomial(n, p).
(c)

P[MM] = P[MM | I] P[I] + P[MM | Iᶜ] P[Iᶜ] = (1/2)θ + (1/2)(1/2)(1 − θ) = (1 + θ)/4 = P[FF],
P[MF] = 1 − P[FF] − P[MM] = (1 − θ)/2.

(d) The likelihood is

l(θ) = C ((1 + θ)/4)^(y₁) ((1 + θ)/4)^(y₂) ((1 − θ)/2)^(y₃).

Therefore, the log-likelihood is

L(θ) = constant + (y₁ + y₂) log(1 + θ) + y₃ log(1 − θ).

Differentiating with respect to θ, we obtain

L′(θ) = (y₁ + y₂)/(1 + θ) − y₃/(1 − θ) = 0,

which has solution θ* = 1 − 2y₃/n. This θ* will be the maximum likelihood estimator provided that θ* ≥ 0, so then θ̂ = θ* = 1 − 2y₃/n; otherwise θ̂ = 0. (Check that θ̂ is the maximum by checking L′′(θ̂) < 0 as in part (g) below.)
(e) Observe that the expectation of θ* is found as

E[θ* | θ] = 1 − (2/n)E[Y₃ | θ] = 1 − (2/n) n((1 − θ)/2) = θ,

so θ* is unbiased. We know that θ̂ ≥ θ*, but actually there is a positive probability that θ* < 0, in which case θ̂ = 0 > θ*; hence E[θ̂ | θ] > θ, so the estimator θ̂ is biased.
(f) For large n, we have θ̂ ≈ θ*. So, we may find the variance of θ̂ approximately as

Var[θ̂ | θ] ≈ Var[θ* | θ] = (2/n)² Var[Y₃ | θ] = (4/n²) n ((1 − θ)/2)((1 + θ)/2) = (1 − θ²)/n.

(g)

L′′(θ) = −(y₁ + y₂)/(1 + θ)² − y₃/(1 − θ)².

For a sample of size 1, given θ, E(y₁) = E(y₂) = (1 + θ)/4 and E(y₃) = (1 − θ)/2. So, Fisher's information is

I(θ) = −E(L′′(θ)) = (E(y₁) + E(y₂))/(1 + θ)² + E(y₃)/(1 − θ)² = 1/(2(1 + θ)) + 1/(2(1 − θ)) = 1/(1 − θ²).

The large sample approximation to the variance of θ̂ is therefore

Var(θ̂) ≈ 1/(nI(θ)) = (1 − θ²)/n,

which, in this case, is the same value as found in (f).
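A tiny simulation illustrates the bias of θ̂ = M and the unbiased correction in question 3 above. A minimal R sketch; θ = 1, n = 10 and the 10,000 replications are arbitrary illustration choices:

    set.seed(1)
    theta <- 1; n <- 10
    M <- replicate(10000, max(runif(n, min = 0, max = theta)))
    mean(M)                  # close to (1 - 1/(n+1)) * theta = 0.909
    mean((n + 1) / n * M)    # close to theta = 1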

Statistical Concepts 2013/14 – Sheet 7 – Sample information

1. [Hand in to be marked] A thousand individuals were classified according to gender and whether or not they were colourblind:

                  Male   Female
    Normal         442      514
    Colourblind     38        6

According to genetic theory, each individual, independently of other individuals, has the following probabilities of belonging to the above categories:

                  Male      Female
    Normal        (1/2)p    pq + (1/2)p²
    Colourblind   (1/2)q    (1/2)q²

where q = 1 − p.
(a) Show that the maximum likelihood estimate q̂ of q is 0.0871, to four decimal places.
(b) Compute the large sample estimated standard error for the maximum likelihood estimate, using the "observed information". Hence, find an approximate 99% confidence interval for q.

2. Evaluate and compare
(i) the estimated sample information, nI(θ̂), and
(ii) the observed information, −L′′(θ̂), for the given sample,
for each of the following situations.
(a) A sample X from a binomial distribution, parameters n (known) and p (unknown).
(b) An iid sample Y₁, ..., Yₙ of size n, from a Poisson distribution, parameter λ.

Statistical Concepts 2013/14 – Solutions 7 – Sample information

1. (a) Putting a = 442, b = 514, c = 38, d = 6, n = 1000, the likelihood is

l(q) = constant × ((1/2)p)^a (pq + (1/2)p²)^b ((1/2)q)^c ((1/2)q²)^d ∝ (1 − q)^(a+b) (1 + q)^b q^(c+2d),

L(q) = log(l(q)) = constant + (a + b) log(1 − q) + b log(1 + q) + (c + 2d) log q,

L′(q) = −(a + b)/(1 − q) + b/(1 + q) + (c + 2d)/q = 0
  → (a + 2b + c + 2d)q² + aq − (c + 2d) = 0
  → 1520q² + 442q − 50 = 0, i.e. 760q² + 221q − 25 = 0 → q̂ = 0.0871.

(Check q̂ is a maximum by checking L′′(q̂) < 0 as below.)
(b) Differentiating again, we have

L′′(q̂) = −[ (a + b)/(1 − q̂)² + b/(1 + q̂)² + (c + 2d)/q̂² ].

Substituting the observed values of a, b, c, d and q̂, the observed information is

−L′′(q̂) = 1147.127 + 434.935 + 6590.733 = 8172.79.

Hence, the estimated standard error of q̂ is s_q̂ = 1/√(−L′′(q̂)) = 0.011. As z₀.₀₀₅ = 2.5758, an approximate 99% confidence interval for q has limits 0.0871 ± 2.5758 × 0.011 → [0.059, 0.116].
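Parts (a) and (b) can be reproduced numerically. A minimal R sketch (cn and dn stand in for the counts c and d to keep the names distinct from R's c() function):

    a <- 442; b <- 514; cn <- 38; dn <- 6
    # q-hat is the positive root of (a + 2b + cn + 2dn) q^2 + a q - (cn + 2dn) = 0
    qhat <- max(Re(polyroot(c(-(cn + 2*dn), a, a + 2*b + cn + 2*dn))))      # 0.0871
    obs.info <- (a + b)/(1 - qhat)^2 + b/(1 + qhat)^2 + (cn + 2*dn)/qhat^2  # ~8172.8
    qhat + c(-1, 1) * qnorm(0.995) / sqrt(obs.info)                         # [0.059, 0.116]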

2. (a) (i) First find Fisher's information for a sample, z say, of size 1. As in lectures, the likelihood is

l(p) = f(z | p) = p^z (1 − p)^(1−z).

So, if L(p) = ln(l(p)), then

L′′(p) = −z/p² − (1 − z)/(1 − p)²,

so that

I(p) = −E(L′′(p)) = p/p² + (1 − p)/(1 − p)² = 1/(p(1 − p)).

From lectures, the maximum likelihood estimate for p given the binomial sample X = x is p̂ = x/n. Therefore, we estimate the sample information as

nI(p̂) = n/(p̂(1 − p̂)) = n/((x/n)(1 − x/n)) = n³/(x(n − x)).

(ii) Alternately, writing out the likelihood for the sample X = x of size n, we have

l(p) = (n!/(x!(n − x)!)) p^x (1 − p)^(n−x),

so that

L′′(p) = −x/p² − (n − x)/(1 − p)².

Therefore the observed information is

−L′′(p̂) = x/p̂² + (n − x)/(1 − p̂)² = x/(x/n)² + (n − x)/(1 − x/n)² = n³/(x(n − x)).

Observe that (i) and (ii) are the same in this case.

(b) (i) First, find Fisher's information for a sample, y, of size 1. As in lectures, the likelihood is

f(y | λ) = e^(−λ) λ^y / y!.

Therefore

L′′(λ) = −y/λ²,

so that

I(λ) = −E(L′′(λ)) = 1/λ.

From lectures, the maximum likelihood estimate for λ given sample values y₁, ..., yₙ is λ̂ = ȳ, the mean of the n observations. Therefore, we estimate the sample information as

nI(λ̂) = n/λ̂ = n/ȳ.

(ii) Alternately, writing out the likelihood for the sample, we have

l(λ) = Πᵢ₌₁ⁿ e^(−λ) λ^(yᵢ) / yᵢ!,

so that

L′′(λ) = −Σᵢ₌₁ⁿ yᵢ / λ².

Therefore the observed information is

−L′′(λ̂) = Σᵢ₌₁ⁿ yᵢ / λ̂² = nȳ/ȳ² = n/ȳ.

Observe that (i) and (ii) are the same in this case. (They are not always the same!)

Statistical Concepts 2013/14 – Sheet 8 – LR Tests

1. [Hand in to be marked] An independent, identically distributed sample, x = (x₁, ..., xₙ), of size n, is drawn from a Poisson distribution with parameter λ. We want to test the null hypothesis H₀: λ = λ₁ against the alternative hypothesis H₁: λ = λ₂, where λ₁ < λ₂.
(a) Write down the likelihood ratio for the data, and show that all likelihood ratio tests of H₀ against H₁ are of the form: Reject H₀ if Σᵢ₌₁ⁿ xᵢ > c, for some c.
(b) Suppose that n = 50, λ₁ = 2, λ₂ = 3. By using the central limit theorem, find, approximately, (i) the value of c for which the significance level of the test is 0.01; (ii) the power of the test for this choice of c.

2. We want to construct a test of hypothesis H₀ against H₁, based on observation of a random quantity Y, which takes possible values 1, 2, 3, 4, 5, with probabilities, given H₀ and H₁, as follows.

          1    2    3    4    5
    H₀   .4   .2   .2   .1   .1
    H₁   .1   .2   .2   .2   .3

(a) Suppose that α₀(δ) is the probability that the test δ accepts H₁, if H₀ is true, and α₁(δ) is the probability that δ accepts H₀, if H₁ is true. Suppose that we are a little more concerned to avoid making the first type of error than we are to avoid making the second type of error. Therefore, we decide to construct the test δ* which minimises the quantity γ(δ) = 1.5α₀(δ) + α₁(δ). Find the test δ*, and find the values of α₀(δ*), α₁(δ*).
(b) In the above example, suppose that we replace γ(δ) = 1.5α₀(δ) + α₁(δ) by γ(δ, c) = cα₀(δ) + α₁(δ). Find the corresponding optimal test δ_c, and find the corresponding values α₀(δ_c), α₁(δ_c) for each value of c > 0.

3. If gene frequencies AA, Aa, aa are in Hardy-Weinberg equilibrium, then the gene frequencies are (1 − θ)², 2θ(1 − θ), θ², for some value of θ. Suppose that we wish to test the null hypothesis H₀: θ = 1/3 against the alternative H₁: θ = 2/3, based on the numbers of individuals x₁, x₂, x₃ with the given genotypes in a sample of n individuals.
(a) Find the general form of the likelihood ratio test.
(b) If n = 36, find, approximately, the test with significance level 0.01, and find the power of this test. [Hint: You will need to find the mean and variance of (x₃ − x₁). First find these for a sample of size n = 1.] Comment on possible improvements to this choice of test procedure.

Statistical Concepts 2013/14 – Solutions 8 – LR tests

1. (a) The Poisson distribution, parameter λ, has frequency function

f(x | λ) = e^(−λ) λ^x / x!, x = 0, 1, ...

Therefore the likelihood, for the data x = (x₁, ..., xₙ), given λ, is

l(λ) = Πᵢ₌₁ⁿ e^(−λ) λ^(xᵢ) / xᵢ! = e^(−nλ) λ^(Σᵢ₌₁ⁿ xᵢ) / Πᵢ₌₁ⁿ xᵢ!.

Therefore the likelihood ratio for the data is

LR(x) = l(λ₂)/l(λ₁) = e^(−n(λ₂−λ₁)) (λ₂/λ₁)^(Σᵢ₌₁ⁿ xᵢ).

Each likelihood ratio test is of the form: Reject H₀ if LR(x) > k, for some k. As LR(x) is a monotone function of Σᵢ₌₁ⁿ xᵢ, this is equivalent to the test: Reject H₀ if Σᵢ₌₁ⁿ xᵢ > c, for some c.
(b) As each Xᵢ has a Poisson distribution, parameter λ, Xᵢ has mean and variance equal to the value of λ. Therefore T = Σᵢ₌₁ⁿ Xᵢ has mean and variance equal to nλ. If n = 50, then, by the central limit theorem, T approximately has a normal distribution, so that approximately T is distributed as N(nλ, nλ).
(i) We want to choose c so that, if n = 50 and λ = 2, then P(T > c) = 0.01. In this case, approximately, T is N(100, 100). Therefore we want

0.01 = P(T > c) = 1 − P((T − 100)/10 ≤ (c − 100)/10) ≈ 1 − Φ((c − 100)/10).

Therefore, from tables, (c − 100)/10 = 2.33, so that c = 123.3.
(ii) The power of the test is the probability of rejecting H₀ if H₁ is true, i.e. we want to calculate P(T > 123.3) if n = 50 and λ = 3. In this case, approximately, T is N(150, 150). Therefore, the power of the test is

P(T > 123.3) = 1 − P((T − 150)/√150 ≤ (123.3 − 150)/√150) ≈ 1 − Φ((123.3 − 150)/√150) = 1 − Φ(−2.18) = 0.985.
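The normal-approximation arithmetic in (b) can be reproduced as follows. A minimal R sketch; n, λ₁ and λ₂ are from the question:

    n <- 50; lambda1 <- 2; lambda2 <- 3
    crit <- n * lambda1 + qnorm(0.99) * sqrt(n * lambda1)        # ~123.3
    1 - pnorm(crit, mean = n * lambda2, sd = sqrt(n * lambda2))  # power ~0.985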

2. (a) As shown in the lectures, the test which minimises aα₀(δ) + bα₁(δ) is to accept H₀ if LR(y) < a/b, to accept H₁ if LR(y) > a/b, and to accept either if LR(y) = a/b, where LR(y) = f₁(y)/f₀(y). The likelihood ratio values are as follows.

             1    2    3    4    5
    H₀      .4   .2   .2   .1   .1
    H₁      .1   .2   .2   .2   .3
    LR(y)  .25    1    1    2    3

As a = 1.5, b = 1, the optimal test δ* accepts H₁ if Y is 4 or 5, and accepts H₀ if Y is 1, 2 or 3. Therefore α₀(δ*), the probability that δ* accepts H₁ if H₀ is true, equals the probability of observing Y to be 4 or 5 given H₀, which is 0.2. Similarly, α₁(δ*) equals the probability of observing Y to be 1, 2 or 3 given H₁, which is 0.5.
(b) The acceptance set for H₀ is empty if c < 0.25, with (α₀, α₁) = (1, 0). For 0.25 < c < 1, add the value y = 1, with (α₀, α₁) = (0.6, 0.1). For 1 < c < 2, also add the values 2 and 3, with (α₀, α₁) = (0.2, 0.5). For 2 < c < 3, also add 4, with (α₀, α₁) = (0.1, 0.7). For c > 3, add y = 5, with (α₀, α₁) = (0, 1).

3. (a) The likelihood, for general θ, is

l(θ) = f(x₁, x₂, x₃ | θ) = (n!/(x₁!x₂!x₃!)) [(1 − θ)²]^(x₁) [2θ(1 − θ)]^(x₂) [θ²]^(x₃).

Therefore, the likelihood ratio is

LR(x₁, x₂, x₃) = l(2/3)/l(1/3) = 4^(x₃−x₁).

The likelihood ratio is monotonic in x₃ − x₁. Therefore, the general form of the LR test is: Reject H₀ if (x₃ − x₁) > c.
(b) As the sample size is reasonably large, approximately, by the central limit theorem, X = X₃ − X₁ has a normal distribution. To find the mean and variance of this distribution, consider a sample of n = 1. For general θ, the possible values of X if n = 1 are −1, 0, +1, with probabilities (1 − θ)², 2θ(1 − θ), θ², respectively. Therefore, if θ = 1/3, X takes values −1, 0, 1 with probabilities 4/9, 4/9, 1/9, so that E(X) = −1/3 and Var(X) = 4/9. Therefore, the distribution of X when n = 36 is approximately normal, with mean µ = −36/3 = −12 and variance σ² = 36 × (4/9) = 4². We want to choose a value for c so that P(X > c) = 0.01 when X ∼ N(−12, 4²). From normal tables, the upper 99% point of the standard normal is 2.33. Therefore c = −12 + 2.33 × 4 = −2.68 gives the critical value for a test at significance level 0.01. From the symmetry of the specification, the distribution of X under H₁ is approximately X ∼ N(12, 4²). So the power of the test, namely 1 − P(X < −2.68) given H₁, is approximately 1 − Φ((−12 − 2.68)/4) = 1 − Φ(−3.67) = 0.9999.

Note that when a test has better power than significance level, we may often be able to change the critical value to reduce the significance level at small cost to the power. For example, choosing c = 0 gives significance level 0.0013, and power 0.9987.

Statistical Concepts 2013/14 – Sheet 9 – LR Tests

1. [Hand in to be marked] We observe a series of n counts, x₁, ..., xₙ. Our null hypothesis H₀ is that each count xᵢ is Poisson, with a common parameter λ, while our alternative hypothesis, H₁, is that each xᵢ is Poisson, but with different parameters λ₁, ..., λₙ.
(a) Given H₀, what is the maximum likelihood estimate for λ? Given H₁, what is the maximum likelihood estimate for each λᵢ? Show that, if Λ is the generalised likelihood ratio, then the corresponding test statistic is

−2 log(Λ) = 2 Σᵢ₌₁ⁿ xᵢ log(xᵢ/x̄),

where x̄ is the sample mean. How many degrees of freedom does the null distribution have?
(b) In a study done at the National Institute of Science and Technology, 1980, asbestos fibres on filters were counted as part of a project to develop measurement standards for asbestos concentration. Assessment of the numbers of fibres in each of 23 grid squares gave the following counts:

    31 34 26 28 29 27 27 24 19 34 27 21 18 30 18 17 31 16 24 24 28 18 22

Carry out the above test as to whether the counts have the same Poisson distribution and report your conclusions.

2. Let y₁, ..., yₙ be a random sample from N(µ, σ²), where the value of σ² is known. Show that the likelihood ratio test of the hypothesis µ = µ₀ for some specified value of µ₀ is equivalent to rejecting the hypothesis when the ratio

|ȳ − µ₀| / (σ/√n)

is "large", where ȳ is the sample average. What is the exact significance level when µ₀ = 0, σ = 1, n = 9, ȳ = 1?

Statistical Concepts 2013/14 – Solutions 9 – LR tests

1. (a) Under H₀, x₁, ..., xₙ are an independent sample from a Poisson distribution, parameter λ. As shown in lectures, the maximum likelihood estimate for λ is therefore λ̂ = x̄, the sample mean. Under H₁, each xᵢ individually is Poisson, parameter λᵢ, so the maximum likelihood estimator for each λᵢ is λ̂ᵢ = xᵢ. The likelihood ratio is therefore

Λ = Πᵢ₌₁ⁿ f(xᵢ | λ̂) / Πᵢ₌₁ⁿ f(xᵢ | λ̂ᵢ) = Πᵢ₌₁ⁿ (x̄^(xᵢ) e^(−x̄)/xᵢ!) / Πᵢ₌₁ⁿ (xᵢ^(xᵢ) e^(−xᵢ)/xᵢ!) = Πᵢ₌₁ⁿ (x̄/xᵢ)^(xᵢ) e^(xᵢ−x̄).

The likelihood ratio test statistic is therefore

−2 ln(Λ) = −2 Σᵢ₌₁ⁿ [xᵢ ln(x̄/xᵢ) + (xᵢ − x̄)] = 2 Σᵢ₌₁ⁿ xᵢ ln(xᵢ/x̄),

using Σᵢ₌₁ⁿ (xᵢ − x̄) = 0. Under H₁ there are n independent parameters, while under H₀ there is only one parameter λ. Therefore, asymptotically, −2 ln(Λ) has a χ² distribution with n − 1 degrees of freedom.
(b) For the given data, 2 Σᵢ₌₁ⁿ xᵢ ln(xᵢ/x̄) = 27.11. With 23 observations, we have 22 degrees of freedom. From the tables of the χ² distribution, we see that the p-value (i.e. the probability of exceeding this value, given the null distribution) is around 0.2. This would only provide very weak evidence against the null hypothesis of a common value of λ. On the other hand, the sample size is fairly small, so the asymptotic approximation is not fully reliable, and we might expect the test to have quite low power.

2. The likelihood function is

l(µ) = Πᵢ₌₁ⁿ f(yᵢ | µ) = Πᵢ₌₁ⁿ (1/(σ√(2π))) e^(−(yᵢ−µ)²/2σ²).

As

Σᵢ₌₁ⁿ (yᵢ − µ)² = Σᵢ₌₁ⁿ (yᵢ − ȳ)² + n(ȳ − µ)²,

the log-likelihood can be written

L(µ) = constant − n(ȳ − µ)²/(2σ²).

The unrestricted m.l.e. of µ is µ̂ = ȳ and the restricted m.l.e. is µ̂ = µ₀. Hence,

2[L(µ̂) − L(µ₀)] = n(ȳ − µ₀)²/σ²,

with dim(Θ) − dim(ω) = 1 − 0 = 1 degree of freedom. Hence, rejecting when 2[L(µ̂) − L(µ₀)] is "large" is equivalent to rejecting when |ȳ − µ₀|/(σ/√n) is "large". When the null hypothesis (µ = µ₀) is "true",

Z = (Ȳ − µ₀)/(σ/√n)

has a N(0, 1) distribution, and its value is 3 when µ₀ = 0, σ = 1, n = 9, ȳ = 1. Thus the p-value of this test (which can also be called the exact significance level) is P[|Z| ≥ 3] = 0.002699796 (computed as 2*(1-pnorm(3)) in R), which is strong evidence against the null hypothesis. Note that in this example, 2[L(µ̂) − L(µ₀)] = Z² ∼ χ²₁, exactly.
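The numbers in solution 1(b) can be checked directly. A minimal R sketch with the 23 asbestos counts from the question:

    x <- c(31, 34, 26, 28, 29, 27, 27, 24, 19, 34, 27, 21,
           18, 30, 18, 17, 31, 16, 24, 24, 28, 18, 22)
    stat <- 2 * sum(x * log(x / mean(x)))   # ~27.11
    1 - pchisq(stat, df = length(x) - 1)    # p-value ~0.2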

Statistical Concepts 2013/14 – Sheet 10 – Small sample statistics and distribution theory

1. [Hand in to be marked] Suppose that a pharmaceutical company must estimate the average increase in blood pressure of patients who take a certain new drug. Suppose that only six patients (randomly selected from the population of all patients) can be used in the initial phase of human testing. Assume that the probability distribution of changes in blood pressure from which our sample was selected is normal, with unknown mean µ and unknown variance σ².
(a) Suppose that we use the sample variance s² to estimate the population variance σ². Find the probability that s² will overestimate σ² by at least 50%.
(b) Suppose that the increase in blood pressure, in points, for each of the sample of six patients is as follows: 1.7, 3.0, 0.8, 3.4, 2.7, 2.1. Evaluate a 95% confidence interval for µ from these data. Compare with the interval that you would obtain using large sample approximations.
(c) Evaluate a 95% confidence interval for σ² from these data.

2. If Z has a normal probability distribution, mean µ, variance σ², and Y = e^Z, then find the probability density function of Y. [Y is said to have a lognormal density as log(Y) is normally distributed.]

3. Let X and Y have the joint density

f(x, y) = (6/7)(x + y)², 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.

(a) By integrating over appropriate regions, find (i) P(X > Y), (ii) P(X + Y ≤ 1), (iii) P(X ≤ 1/2).
(b) Find the marginal density of X.
(c) Write down the marginal density of Y.
(d) Find the conditional density of X given Y.
(e) Write down the conditional density of Y given X.
(f) Are X and Y independent? Explain!

Statistical Concepts 2013/14 – Solutions 10 – Small sample statistics and distribution theory

1. (a) As each Xi ∼ N(µ, σ²), (n − 1)s²/σ² has a chi-square distribution with n − 1 degrees of freedom, where s² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)² is the sample variance, and in this example n = 6. Therefore, if χ²_α is the upper α point of the chi-square distribution with 5 df, then

α = P(Σ_{i=1}^n (Xi − X̄)²/σ² ≥ χ²_α) = P(s² ≥ (χ²_α/5)σ²)

Therefore, P(s² ≥ 1.5σ²) corresponds to the value of α for which χ²_α/5 = 1.5, which, from detailed tables, or from using R, is 0.186.
[The version of the tables distributed in class gives χ²_{0.2} = 7.29 and χ²_{0.15} = 8.12, identifying the probability as being a bit lower than 0.2.]

(b) From the given data, x̄ = 2.283, s = 0.950. The appropriate 95% interval is x̄ ± t_{0.025} s/√n, where t_{0.025} is the upper 0.025 point of the t-distribution with 6 − 1 = 5 degrees of freedom, which is 2.571 from the tables. Therefore the interval is

2.283 ± 2.571 × 0.950/√6 = 2.283 ± 0.997

The large-sample approximation in this problem would be to suppose that s² was a very accurate estimate of σ² (which we saw above is rather optimistic), and therefore to use the interval x̄ ± z_{0.025} s/√n, where z_{0.025} = 1.96 is the upper 0.025 point of the normal distribution, replacing the value 2.571 above (and so giving a narrower interval, 2.283 ± 0.760, based on ignoring the substantial uncertainty arising from estimating the variance from a small sample).

(c) The 95% confidence interval for σ² based on a normal sample of size n is

( Σ_{i=1}^n (Xi − X̄)²/χ²_{(n−1)(0.025)} , Σ_{i=1}^n (Xi − X̄)²/χ²_{(n−1)(0.975)} )

From the given data, n = 6 and Σ_{i=1}^6 (Xi − X̄)² = 4.51. The upper 0.025 and 0.975 points of the chi-square distribution with 5 df are 12.83 and 0.831, so the 95% interval for σ² is (0.35, 5.43).
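All three numerical answers can be reproduced in R from the data given on the sheet:

    x <- c(1.7, 3.0, 0.8, 3.4, 2.7, 2.1)   # blood-pressure increases
    n <- length(x)
    # (b) 95% t-interval for mu: 2.283 +/- 0.997
    mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
    # (c) 95% interval for sigma^2: about (0.35, 5.43)
    (n - 1) * var(x) / qchisq(c(0.975, 0.025), df = n - 1)
    # (a) P(s^2 >= 1.5 sigma^2) = P(chi^2_5 >= 7.5), about 0.186
    pchisq(5 * 1.5, df = 5, lower.tail = FALSE)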

2. Z has density

f_Z(z) = (1/(σ√(2π))) e^{−(z − µ)²/(2σ²)}

With z = s(y) = log y we have ds(y)/dy = 1/y, so

f_Y(y) = f_Z(s(y)) |ds(y)/dy| = (1/(σ√(2π) y)) e^{−(log y − µ)²/(2σ²)}

for y > 0, and zero otherwise.
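As a sanity check, the derived density can be compared with R's built-in lognormal density dlnorm at an arbitrary point; the values of µ, σ and y below are illustrative only.

    mu <- 1; sigma <- 0.5; y <- 2
    f <- exp(-(log(y) - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi) * y)
    all.equal(f, dlnorm(y, meanlog = mu, sdlog = sigma))  # TRUE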

3. (a) (i) P[X > Y] = 1/2 by symmetry, or by computing ∫₀¹ ∫₀^x (6/7)(x + y)² dy dx.
(ii) P[X + Y ≤ 1] = ∫₀¹ [∫₀^{1−x} (6/7)(x + y)² dy] dx = 3/14.
(iii) P[X ≤ 1/2] = ∫₀¹ [∫₀^{1/2} (6/7)(x + y)² dx] dy = 2/7.
(b) f_X(x) = ∫₀¹ (6/7)(x + y)² dy = (2/7)(3x² + 3x + 1) for x ∈ [0, 1], and zero otherwise.
(c) Similarly, by symmetry, f_Y(y) = (2/7)(3y² + 3y + 1) for y ∈ [0, 1], and zero otherwise.
(d) f(x|y) = f(x, y)/f_Y(y) = 3(x + y)²/(3y² + 3y + 1) for x ∈ [0, 1], and 0 otherwise.
(e) Similarly, by symmetry, f(y|x) = f(x, y)/f_X(x) = 3(x + y)²/(3x² + 3x + 1) for y ∈ [0, 1], and 0 otherwise.
(f) X and Y are not independent, because their joint pdf is not the product of the two marginal densities for all x, y. Equivalently, the conditional densities are not equal to the corresponding marginal densities.
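The probabilities in (a) can be checked by Monte Carlo, here via rejection sampling from the joint density (the sample size and seed below are arbitrary; f is bounded by 24/7 at (1, 1)):

    set.seed(1)
    N <- 200000
    x <- runif(N); y <- runif(N)
    keep <- runif(N) < (6/7) * (x + y)^2 / (24/7)   # accept w.p. f(x,y)/max f
    x <- x[keep]; y <- y[keep]
    c(mean(x > y), mean(x + y <= 1), mean(x <= 0.5))  # ~ 0.500, 3/14 = 0.214, 2/7 = 0.286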

Statistical Concepts 2013/14 – Sheet 11 – Distribution theory

1. [Hand in to be marked] Suppose that X and Y are independent random quantities, each with exponential pdf f(z) = λe^{−λz} for z > 0 and 0 otherwise. Let U = X + Y and V = X/Y.
(a) Find the joint pdf of U and V.
(b) Find the marginal pdfs of U and V.
(c) Are U and V independent? Justify your answer.

2. Suppose that Y and Z are independent random quantities, where Y has a chi-square distribution with n df, and Z has a standard normal distribution. Let

X = Z/√(Y/n) and W = Y

(i) Find the joint pdf of W and X.
(ii) Deduce that the pdf of X is

f_X(x) = (Γ[(n + 1)/2]/(√(nπ) Γ(n/2))) (1 + x²/n)^{−(n+1)/2}

[This is the pdf of the t-distribution with n df.]

Statistical Concepts 2013/14 – Solutions 11 – Distribution theory

1. (a) The inverse function to u = r1(x, y) = x + y, v = r2(x, y) = x/y is x = s1(u, v) = uv/(1 + v), y = s2(u, v) = u/(1 + v) over u > 0, v > 0. The Jacobian J, namely the determinant

    | ∂s1/∂u  ∂s1/∂v |
    | ∂s2/∂u  ∂s2/∂v |

has absolute value |J| = u/(1 + v)². Hence, since X and Y are independent,

f_{U,V}(u, v) = f_{X,Y}(x, y)|J| = f_X(x) f_Y(y)|J| = λe^{−λuv/(1+v)} λe^{−λu/(1+v)} u/(1 + v)² = λ² e^{−λu} u/(1 + v)²

for positive u and v, and zero otherwise.

(b) The marginal pdf of U is

f_U(u) = ∫_{−∞}^{∞} f_{U,V}(u, v) dv = ∫₀^{∞} λ² e^{−λu} u/(1 + v)² dv = λ²u e^{−λu}, u > 0

and similarly,

f_V(v) = 1/(1 + v)², v > 0

(c) As f_{U,V}(u, v) = f_U(u) f_V(v), U and V are independent.
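A quick simulation check (the rate and sample size below are arbitrary): U and V should be uncorrelated, and V should have cdf F_V(v) = v/(1 + v).

    set.seed(1)
    lambda <- 2
    x <- rexp(1e5, rate = lambda); y <- rexp(1e5, rate = lambda)
    u <- x + y; v <- x / y
    cor(u, v)                        # near 0 (necessary, not sufficient, for independence)
    c(mean(v <= 1), mean(v <= 3))    # ~ 0.50 and 0.75, matching F_V(v) = v/(1+v)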

2. (a) The inverse function to w = r1(y, z) = y, x = r2(y, z) = z/√(y/n) is y = s1(x, w) = w, z = s2(x, w) = x√(w/n) over w > 0, −∞ < x < +∞. The Jacobian J, namely the determinant

    | ∂s1/∂x  ∂s1/∂w |
    | ∂s2/∂x  ∂s2/∂w |

has absolute value |J| = √(w/n). Hence, since Y and Z are independent with pdfs

f_Y(y) = (1/(2^{n/2} Γ(n/2))) y^{(n/2)−1} e^{−y/2}

f_Z(z) = (1/√(2π)) e^{−z²/2}

we have

f_{W,X}(w, x) = f_{Y,Z}(y, z)|J| = f_Y(w) f_Z(x√(w/n)) √(w/n) = c w^{(n+1)/2 − 1} e^{−(1/2)(1 + x²/n)w}

where

c = 1/(2^{(n+1)/2} √(nπ) Γ(n/2))

(b) The marginal pdf of X is therefore

f_X(x) = ∫_{−∞}^{∞} f_{W,X}(w, x) dw = c ∫₀^{∞} w^{(n+1)/2 − 1} e^{−w h(x)} dw

where h(x) = (1/2)(1 + x²/n). Recalling the gamma integral

∫₀^{∞} x^{a−1} e^{−bx} dx = Γ(a)/b^a

(as the gamma pdf integrates to 1), we have

f_X(x) = c Γ[(n + 1)/2]/h(x)^{(n+1)/2} = (Γ[(n + 1)/2]/(√(nπ) Γ(n/2))) (1 + x²/n)^{−(n+1)/2}
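The final expression can be checked against R's built-in t density dt; the values of n and x below are arbitrary illustrative choices.

    n <- 5; x <- 1.3
    f <- gamma((n + 1) / 2) / (sqrt(n * pi) * gamma(n / 2)) * (1 + x^2 / n)^(-(n + 1) / 2)
    all.equal(f, dt(x, df = n))  # TRUE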

Statistical Concepts 2013/14 – Sheet 12 – Bayesian statistics

1. [Hand in to be marked] Joe is a trainee manager, and his boss decides he should prepare a report on the durability of the light bulbs used in the corporation's offices. His boss wants to know what proportion last longer than the 900 hours claimed by the manufacturer as the time that at least 90% should survive. What should Joe do?
Fred, who looks after light bulb replacement, tells Joe that he has been sceptical about the manufacturer's claims for years, and he reckons it is more like 80%. Joe is a careful type and decides to pin Fred down a bit, offering him a choice between 75%, 80%, 85% and 90% surviving beyond 900 hours, and getting him to say how relatively likely he thinks those percentages are. Fred says he reckons that 80% is about 4 times more likely than 75%, and about twice as likely as 85%, and that 85% is about 4 times as likely as 90%.
Joe knows that since his boss is an ex-engineer he is going to demand some facts to back up the speculation. Joe decides to monitor the lifetimes of the next 30 bulbs installed in offices. Fortunately, since the lights are left permanently on (to show passers-by how well the corporation is doing financially), he simply has to record the time of installation and wait for 900 hours. At the end of the study, Joe is able to write up his report. Of his 30 bulbs, 4 have failed. Assuming that Joe accepts Fred's opinions as the honest opinions of an expert, what should he conclude about the proportion of bulbs which last beyond 900 hours?

2. Suppose that you have a blood test for a rare disease. The proportion of people who currently have this disease is .001. The blood test comes back with two possible results: positive, which is some indication that you may have the disease, or negative. Suppose that the test may give the wrong result: if you have the disease, it will give a negative reading with probability .05; likewise, a false positive result will happen with probability .05. You have three blood tests and they are all positive. What is the probability of you having the disease, assuming blood test results are conditionally independent given disease state?

3. An automatic machine in a small factory produces metal parts. Most of the time (90% from long records) it produces 95% good parts, and the remainder have to be scrapped. At other times, the machine slips into a less productive mode and produces only 70% good parts. The foreman observes the quality of the parts that are produced by the machine and wants to stop and adjust the machine when she believes that it is not working well. Suppose that the first dozen parts produced are given by the sequence

s, u, s, s, s, s, s, s, s, u, s, u

where s = satisfactory and u = unsatisfactory. After observing this sequence, what is the probability that the machine is in its 'good' state, assuming outcomes are conditionally independent given the state of the machine? If the foreman wishes to stop the machine when the probability of 'good state' is under .7, when should she stop it? After observing the above sequence, what is the probability that the next two parts produced are unsatisfactory?

4. Suppose that a parameter θ can assume one of three possible values θ1 = 1, θ2 = 10 and θ3 = 20. The distribution of a discrete random quantity Y, with possible values y1, y2, y3, y4, depends on θ as follows:

          θ1    θ2    θ3
    y1    .1    .2    .4
    y2    .1    .2    .3
    y3    .2    .3    .1
    y4    .6    .3    .2

Thus, each column gives the distribution of Y given the value of θ at the head of the column. Suppose that the parameter θ assumes its possible values 1, 10 and 20 with prior probabilities 0.5, 0.25 and 0.25 respectively. In what follows, assume observations are conditionally independent given θ.
(a) Suppose y2 is observed. What is the posterior distribution of θ? What is the mode of this distribution? Compare it with the mle of θ based on y2.
(b) Suppose a second observation is made and y1 is observed. What does the posterior distribution for θ become?
(c) Suppose a third observation were contemplated. Find the conditional probability distribution of this "future" observation given that y2 and y1 have been observed. How might this conditional distribution help in predicting the outcome of the third observation?

Statistical Concepts 2013/14 – Solutions 12 – Bayesian Statistics

1. Data D: "4 failures out of 30".

    Model   %    P[M]    P[D | M]                       P[M | D]
    M1      75   2/15    C(30,4) (0.75)^26 (0.25)^4     0.055704
    M2      80   8/15    C(30,4) (0.80)^26 (0.20)^4     0.488715
    M3      85   4/15    C(30,4) (0.85)^26 (0.15)^4     0.373958
    M4      90   1/15    C(30,4) (0.90)^26 (0.10)^4     0.081623

The data essentially confirm Fred's belief that the rate is 80% (or at least between 80% and 85%).
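The whole table can be reproduced in R with dbinom:

    p <- c(0.75, 0.80, 0.85, 0.90)               # P(bulb lasts beyond 900 hours)
    prior <- c(2, 8, 4, 1) / 15                  # Fred's elicited prior
    like <- dbinom(4, size = 30, prob = 1 - p)   # 4 failures out of 30
    post <- prior * like / sum(prior * like)
    round(post, 4)                               # 0.0557 0.4887 0.3740 0.0816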

2. Let D mean "has disease" and D̄ mean "does not have disease".
Prior: P[D] = 0.001, P[D̄] = 0.999.
Likelihood: P[+ + + | D] = (0.95)³, P[+ + + | D̄] = (0.05)³.
Posterior: P[D | + + +] ∝ (0.95)³ × 0.001, P[D̄ | + + +] ∝ (0.05)³ × 0.999.
Hence, P[D | + + +] = 0.872868 and P[D̄ | + + +] = 0.127132.

3. P[G] = 0.90, P[B] = 0.10; P[S | G] = 0.95, P[U | G] = 0.05; P[S | B] = 0.70, P[U | B] = 0.30.

P[G | sequence] ∝ P[G] P[sequence | G] = 0.9 × (0.95)⁹(0.05)³ → 0.394217
P[B | sequence] ∝ P[B] P[sequence | B] = 0.1 × (0.70)⁹(0.30)³ → 0.605783

Updating sequentially, after the first two parts (s, u):

P[G | SU] ∝ P[G] P[SU | G] = 0.90 × 0.95 × 0.05 → 0.670588
P[B | SU] ∝ P[B] P[SU | B] = 0.10 × 0.70 × 0.30 → 0.329412

As P[G | SU] < 0.70, she will stop after the second item, which is unsatisfactory. Finally,

P[UU | sequence] = P[UU | G] P[G | sequence] + P[UU | B] P[B | sequence] = (0.05)² × 0.394217 + (0.30)² × 0.605783 = 0.055506
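The sequential version of this calculation, tracking P[G] part by part, can be sketched in R:

    seq_parts <- c("s","u","s","s","s","s","s","s","s","u","s","u")
    pG_trace <- numeric(length(seq_parts))
    pG <- 0.9
    for (i in seq_along(seq_parts)) {
      lG <- if (seq_parts[i] == "s") 0.95 else 0.05   # P(part | good state)
      lB <- if (seq_parts[i] == "s") 0.70 else 0.30   # P(part | bad state)
      pG <- pG * lG / (pG * lG + (1 - pG) * lB)       # Bayes update
      pG_trace[i] <- pG
    }
    round(pG_trace, 3)          # drops to 0.671 after the second part
    which(pG_trace < 0.7)[1]    # she stops after part 2
    pG * 0.05^2 + (1 - pG) * 0.30^2   # P(next two unsatisfactory) = 0.0555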

4. θ1 = 1, θ2 = 10, θ3 = 20 with prior probabilities 0.50, 0.25, 0.25.

(a) f(θ1 | y2) ∝ f(θ1) f(y2 | θ1) = 0.50 × 0.1 ∝ 2 → 2/7
f(θ2 | y2) ∝ f(θ2) f(y2 | θ2) = 0.25 × 0.2 ∝ 2 → 2/7
f(θ3 | y2) ∝ f(θ3) f(y2 | θ3) = 0.25 × 0.3 ∝ 3 → 3/7
The mode is θ3 = 20, and also θ̂(y2) = 20, since f(y2 | θ) is largest at θ3: the posterior mode and the mle agree here.

(b) f(θ | y1, y2) ∝ f(y1 | θ) f(θ | y2):
f(θ1 | y2, y1) ∝ 2 × 0.1 ∝ 2 → 1/9
f(θ2 | y2, y1) ∝ 2 × 0.2 ∝ 4 → 2/9
f(θ3 | y2, y1) ∝ 3 × 0.4 ∝ 12 → 6/9

(c) f(y | y1, y2) = Σ_{i=1}^3 f(y | θi) f(θi | y1, y2):
f(y1 | y1, y2) = 0.1 × 1/9 + 0.2 × 2/9 + 0.4 × 6/9 = 29/90
f(y2 | y1, y2) = 0.1 × 1/9 + 0.2 × 2/9 + 0.3 × 6/9 = 23/90
f(y3 | y1, y2) = 0.2 × 1/9 + 0.3 × 2/9 + 0.1 × 6/9 = 14/90
f(y4 | y1, y2) = 0.6 × 1/9 + 0.3 × 2/9 + 0.2 × 6/9 = 24/90
which, apart from y3, is a fairly "flat" distribution.
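All three parts can be reproduced compactly in R, holding the table of f(y | θ) as a matrix:

    # Rows index y1..y4, columns theta1..theta3 (the table from the sheet)
    L <- matrix(c(.1, .2, .4,
                  .1, .2, .3,
                  .2, .3, .1,
                  .6, .3, .2), nrow = 4, byrow = TRUE)
    prior <- c(0.50, 0.25, 0.25)
    post <- prior * L[2, ]; post <- post / sum(post)   # after y2: 2/7, 2/7, 3/7
    post <- post * L[1, ]; post <- post / sum(post)    # after y2 then y1: 1/9, 2/9, 6/9
    L %*% post                                         # predictive: 29, 23, 14, 24 over 90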

Statistical Concepts 2013/14 – Sheet 13 – Bayesian Statistics

1. [Hand in to be marked] Show that if X ∼ Beta(a, b) then

E[X | a, b] = a/(a + b) and Var[X | a, b] = ab/((a + b)²(a + b + 1))

In question 1, problem sheet 12, Joe elicits Fred's prior beliefs in the form of a discrete distribution. Suppose instead that Joe had managed to elicit from Fred that his mean and standard deviation for the percentage of lightbulbs lasting more than 900 hours are 82% and 4%, respectively. Use a Beta distribution to capture Fred's prior beliefs, and calculate the posterior mean and posterior standard deviation for the percentage of lightbulbs lasting more than 900 hours, given that 4 out of the 30 lightbulbs had failed by 900 hours.

2. Independent observations y1, . . . , yn are such that yi (i = 1, . . . , n) is a realisation from a Poisson distribution with mean θti, where t1, . . . , tn are known positive constants and θ is an unknown positive parameter. [It may be helpful to regard yi as the number of events occurring in an interval of length ti in a Poisson process of constant rate θ, where the n intervals are non-overlapping.] Prior beliefs about θ are represented by a Gamma(a, b) distribution, for specified constants a and b. Show that the posterior distribution for θ is Gamma(a + y, b + t), where y = y1 + . . . + yn and t = t1 + . . . + tn.

In all of what follows, put a = b = 0 in the posterior distribution for θ, corresponding to a limiting form of "vague" prior beliefs. A new extrusion process for the manufacture of artificial fibre is under investigation. It is assumed that the incidence of flaws along the length of the fibre follows a Poisson process with a constant mean number of flaws per metre. The numbers of flaws in five fibres of lengths 10, 15, 25, 30 and 40 metres were found to be 3, 2, 7, 6 and 10, respectively. Find the posterior distribution for the mean number of flaws per metre of fibre, and compute the posterior mean and variance of the mean number of flaws per metre. Show that the probability that a new fibre of length 5 metres will not contain any flaws is exactly (24/25)^28. [Hint: "average" the probability of this event for any θ with respect to the posterior distribution of θ.]

Statistical Concepts 2013/14 – Solutions 13 – Bayesian Statistics

1. E[X] = (Γ(a + b)/(Γ(a)Γ(b))) ∫₀¹ x^{(a+1)−1} (1 − x)^{b−1} dx = (Γ(a + b)/(Γ(a)Γ(b))) × Γ(a + 1)Γ(b)/Γ(a + b + 1) = a/(a + b)

Similarly,

E[X²] = (Γ(a + b)/(Γ(a)Γ(b))) × Γ(a + 2)Γ(b)/Γ(a + b + 2) = a(a + 1)/((a + b)(a + b + 1))

Therefore, Var[X] = E[X²] − (E[X])² = ab/((a + b)²(a + b + 1)).

Equating mean and variance, a/(a + b) = 0.82 and ab/((a + b)²(a + b + 1)) = (0.04)², gives a = 74.825 and b = 16.425. With s = 26 successes and f = 4 failures, posterior beliefs for p ∼ Beta(a + s, b + f) = Beta(100.825, 20.425). Therefore,

E[p | s = 26, f = 4] = 100.825/(100.825 + 20.425) = 100.825/121.25 = 0.831546

Var[p | s = 26, f = 4] = (100.825 × 20.425)/(121.25² × 122.25) = 0.001146 = (0.03385)²

Hence, the posterior expectation for the percentage of lightbulbs lasting more than 900 hours is approximately 83.2% and the posterior standard deviation is approximately 3.4%.
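A short R sketch of the elicitation and update; the line for a + b uses the identity a + b = m(1 − m)/v − 1, which follows from the Beta variance above:

    m <- 0.82; v <- 0.04^2
    ab <- m * (1 - m) / v - 1        # a + b = 91.25
    a <- m * ab; b <- ab - a         # a = 74.825, b = 16.425
    a2 <- a + 26; b2 <- b + 4        # posterior Beta(100.825, 20.425)
    a2 / (a2 + b2)                                   # posterior mean 0.8315
    sqrt(a2 * b2 / ((a2 + b2)^2 * (a2 + b2 + 1)))    # posterior sd 0.0339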

2. The likelihood is

likelihood ∝ ∏_{i=1}^n (θti)^{yi} e^{−θti} ∝ θ^y e^{−θt}

prior ∝ θ^{a−1} e^{−bθ}

posterior ∝ θ^{a−1} e^{−bθ} × θ^y e^{−θt} = θ^{a+y−1} e^{−(b+t)θ}

Hence, the posterior distribution is Gamma(a + y, b + t). When a = b = 0,

f(θ | y, t) = (t^y/Γ(y)) θ^{y−1} e^{−tθ}

In the example, y = 3 + 2 + 7 + 6 + 10 = 28 and t = 10 + 15 + 25 + 30 + 40 = 120. Hence,

f(θ | y, t) = (120^28/Γ(28)) θ^27 e^{−120θ}

E[θ | y = 28, t = 120] = 28/120 = 0.233… and Var[θ | y = 28, t = 120] = 28/120² = 0.00194.

Let Y be the number of flaws in a new fibre of length T = 5. We want

P[Y = 0 | y = 28, t = 120, T = 5] = ∫₀^∞ e^{−5θ} f(θ | y = 28, t = 120) dθ
= ∫₀^∞ (120^28/Γ(28)) θ^27 e^{−(120+5)θ} dθ
= (120/125)^28 ∫₀^∞ (125^28/Γ(28)) θ^27 e^{−125θ} dθ
= (24/25)^28 = 0.318856
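The posterior summaries and the predictive probability can be checked in R, the latter both in closed form and by numerical integration:

    y <- 28; t <- 120
    c(y / t, y / t^2)   # posterior mean 0.2333 and variance 0.00194
    (t / (t + 5))^y     # (24/25)^28 = 0.318856
    integrate(function(th) exp(-5 * th) * dgamma(th, shape = y, rate = t), 0, Inf)$value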

Statistical Concepts 2013/14 – Sheet 14 – Bayesian Statistics

1. [Hand in to be marked] Suppose that the heights of individuals in a certain population have a normal distribution for which the value of the mean height µ is unknown but the standard deviation is known to be 2 inches. Suppose also that prior beliefs about µ can be adequately represented by a normal distribution with a mean of 68 inches and a standard deviation of 1 inch. Suppose 10 people are selected at random from the population and their average height is found to be 69.5 inches.
(a) What is the posterior distribution of µ?
(b) (i) Which interval, 1 inch long, had the highest prior probability of containing µ? What is the value of this probability? (ii) Which interval, 1 inch long, has the highest posterior probability of containing µ? What is the value of this probability?
(c) What is the posterior probability that the next person selected at random from the population will have height greater than 70 inches?
(d) What happens to the posterior distribution in this problem when the number of people n whose heights we measure becomes very large? Investigate this by (i) seeing what happens when n becomes very large in the formulae you used for part (a); (ii) using the general theoretical result on limiting posterior distributions. Check that (i) and (ii) give the same answer in this case.

2. Albert, a geologist, is examining the amount of radiation being emitted by a geological formation in order to assess the risk to health of people whose homes are built on it. He would like to learn about the average amount of radiation λ being absorbed per minute by individual residents. His mean and standard deviation for λ are 100 particles/minute and 10 particles/minute, and he is willing to use a gamma distribution to represent his beliefs about λ. Albert would like to have more precise knowledge about λ. He has an instrument which measures the exposure which would have been received by a human standing at the same location as the instrument for one minute. Since he is dealing with radioactivity, he believes his machine measurements follow a Poisson distribution with mean λ. However, he does not know how many measurements he needs to make to sufficiently reduce his uncertainty about λ. How many measurements would you advise him to make if he wishes his expected posterior variance for λ to be 4 or less? [HINT: first find what his posterior distribution for λ would be for n observations, and then use the first-year probability result E[X] = E[E[X | Y]] to help you compute the expectation of his posterior variance for λ.]

3. When gene frequencies are in equilibrium, the genotypes Hp1-1, Hp1-2 and Hp2-2 of Haptoglobin occur with probabilities (1 − θ)², 2θ(1 − θ) and θ², respectively. In a study of 190 people the corresponding sample numbers were 10, 68 and 112. Assuming a uniform prior distribution for θ over the interval (0, 1), compute the posterior distribution for θ. Compute the posterior expectation and variance for θ. Find a "large sample" 99% Bayesian confidence interval for θ, based on these data.

4. Suppose that y1, . . . , yn is a random sample from a uniform distribution on (0, θ), where θ is an unknown positive parameter. Show that the likelihood function l(θ) is given by

l(θ) = 1/θⁿ for m < θ < ∞, and 0 otherwise,    (1)

where m = max{y1, . . . , yn}. Suppose that the prior distribution for θ is a Pareto distribution

f(θ) = ab^a/θ^{a+1} for b < θ < ∞, and 0 otherwise,    (2)

where a and b are specified positive constants. Show that the posterior distribution for θ is also Pareto, with constants a + n and max{b, m}. Now put a = b = 0 in the posterior distribution (corresponding to "vague" prior beliefs), and with this specification show that (m, mα^{−1/n}) is the 100(1 − α)% highest posterior density (HPD) credibility interval for θ; that is, the posterior density at any point inside the interval is greater than that of any point outside the interval. Is this interval a 100(1 − α)% confidence interval in the frequentist sense? If so, show this to be the case.

Statistical Concepts 2013/14 – Solutions 14 – Bayesian Statistics

1. (a) Height, Y ∼ N(µ, 2²), so σ² = 4. µ ∼ N(68, 1²), so µ0 = 68, σ0² = 1. n = 10 and ȳ = 69.5. Therefore, µ | data ∼ N(µn, σn²), where

µn = (µ0/σ0² + nȳ/σ²)/(1/σ0² + n/σ²) = (68/1² + 10 × 69.5/2²)/(1/1² + 10/2²) = 69.07 inches

and

1/σn² = 1/σ0² + n/σ² = 1/1² + 10/2² = 3.5

so that σn = 0.53 inches. Thus, µ | data ∼ N(69.07, 0.53²).

(b) (i) 68 ± 0.5 → (67.5, 68.5). The probability is P[|Z| ≤ 0.5] = 0.3829.
(ii) 69.07 ± 0.5 → (68.57, 69.57). The probability is P[|Z| ≤ 0.5√3.5] = 0.6505.

(c) If X ∼ N(w, σ²) and w ∼ N(µ, v²), then X ∼ N(µ, σ² + v²). In this problem, the posterior predictive distribution for Y, the height of a further person selected from the population, is therefore N(69.07, 4.281). Therefore,

P(Y > 70) = 1 − P((Y − 69.07)/√4.281 ≤ (70 − 69.07)/√4.281) = 1 − Φ(0.45) = 0.326

(d) (i) As n → ∞,

µn = (µ0/σ0² + nȳ/σ²)/(1/σ0² + n/σ²) → ȳ  and  1/σn² = 1/σ0² + n/σ² ⇒ σn² → σ²/n

(ii) The general large sample result is that the posterior distribution of µ tends to a normal distribution N(µ̂, −1/L″(µ̂)) as n → ∞, where µ̂ is the maximum likelihood estimator for µ and L is the log-likelihood (or equivalently N(µ̂, 1/(nI(µ̂))), where I is Fisher's information, i.e. minus the expected value of L″). In this case, we have posterior normality for all sample sizes. The m.l.e. is the sample mean ȳ as previously found, where the log-likelihood was shown to be

L(µ) = constant − n(ȳ − µ)²/(2σ²)

so that L″(µ) = −n/σ², and the large-sample limit for the posterior variance is σ²/n, agreeing with the values found directly in (i).
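The posterior in (a) and the predictive probability in (c) can be reproduced in R:

    mu0 <- 68; s0sq <- 1; sigsq <- 4; n <- 10; ybar <- 69.5
    prec <- 1 / s0sq + n / sigsq                   # posterior precision 3.5
    mun <- (mu0 / s0sq + n * ybar / sigsq) / prec  # posterior mean 69.07
    snsq <- 1 / prec                               # posterior variance 0.286
    1 - pnorm(70, mean = mun, sd = sqrt(sigsq + snsq))  # P(next height > 70), about 0.326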

2. For a Gamma(a, b) prior, E[λ] = a/b = 100 and Var[λ] = a/b² = 100 give a = 100 and b = 1. For general a and b and with a Poisson likelihood, the posterior for λ is Gamma(a + nȳ, b + n). Thus

Var[λ | y] = (a + nȳ)/(b + n)²

Therefore

E[Var[λ | y]] = (a + nE[Ȳ])/(b + n)² = a/(b(b + n))

because E[Ȳ] = E[E[Ȳ | λ]] = E[λ] = a/b. With a = 100 and b = 1, we require

100/(1 + n) ≤ 4

which gives n = 24.
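A one-line check of the smallest such n:

    a <- 100; b <- 1
    n <- 0:30
    min(n[a / (b * (b + n)) <= 4])   # 24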

3. For sample numbers A = 10, B = 68, C = 112, the likelihood is proportional to

[(1 − θ)²]^A [2θ(1 − θ)]^B [θ²]^C ∝ θ^{2C+B} (1 − θ)^{2A+B}

and with a uniform prior for θ on (0, 1) the posterior is proportional to the likelihood; we recognise that this is a Beta(2C + B + 1, 2A + B + 1) distribution, with 2C + B + 1 = 293 and 2A + B + 1 = 89. Hence, E[θ | data] = 293/(89 + 293) = 0.767 and Var[θ | data] = (293 × 89)/(382² × 383) = 0.0004666; the posterior SD is 0.0216. A 99% confidence interval can be based on these values or, almost exactly the same, on the result that for large samples or "vague" prior information the posterior distribution for θ is approximately normal with mean θ̂ = (2C + B)/(2n) = 292/380 = 0.7684 and variance −1/L″(θ̂) = θ̂(1 − θ̂)/(2n) = 0.0004683. In either case the confidence interval is approximately 0.768 ± 2.575 × 0.02164 → (0.713, 0.824), where z_{0.005} = 2.575.
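Since the posterior here is exactly Beta(293, 89), the large-sample interval can be compared with exact posterior quantiles in R:

    A <- 10; B <- 68; C <- 112
    a <- 2 * C + B + 1; b <- 2 * A + B + 1   # 293 and 89
    a / (a + b)                              # posterior mean 0.767
    qbeta(c(0.005, 0.995), a, b)             # exact 99% interval, close to (0.713, 0.824)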

4. The joint pdf is

f(y1, . . . , yn | θ) = 1/θⁿ for yi < θ < ∞ for i = 1, . . . , n, and 0 otherwise.

But yi < θ < ∞ for i = 1, . . . , n is equivalent to m = max{y1, . . . , yn} < θ < ∞. When f(y | θ) is considered as a function of θ (for given data y1, . . . , yn) the likelihood is

l(θ) = 1/θⁿ for m < θ < ∞, and 0 otherwise.

The prior for θ is

f(θ) = ab^a/θ^{a+1} for b < θ < ∞, and 0 otherwise,

so the posterior is proportional to l(θ)f(θ) ∝ 1/θ^{a+n+1} for θ > max{b, m}; that is, the posterior is also Pareto, with constants a + n and max{b, m}.
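The frequentist coverage of the interval (m, mα^{−1/n}) asked about in the question can be checked by Monte Carlo; θ, n and α below are arbitrary illustrative values:

    set.seed(1)
    theta <- 3; n <- 8; alpha <- 0.05
    m <- replicate(20000, max(runif(n, 0, theta)))   # sampling distribution of the maximum
    mean(m < theta & theta < m * alpha^(-1/n))       # coverage, about 1 - alpha = 0.95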