A First Course in Probability Solutions Chapter 1-12


Chapter 1

Probability Basics

1.1 From Percentages to Probabilities

Basic Exercises

1.1 Let G stand for Graham, B for Baucus, C for Conrad, M for Murkowski, and K for Kyl. a) {G, B, C, M, K}. b) 3/5.

1.2 a) {{G,B}, {G,C}, {G,M}, {G,K}, {B,C}, {B,M}, {B,K}, {C,M}, {C,K}, {M,K}}.
b) 3/10, as there are exactly three possible subcommittees consisting of two Democrats, namely {G,B}, {G,C}, {B,C}, out of a total of ten possible subcommittees of two senators.
c) 1/10, as there is exactly one possible subcommittee consisting of two Republicans, namely {M,K}, out of a total of ten possible choices for the subcommittee.
d) (10 − 4)/10 = 6/10, as out of ten possible committees, exactly four do not consist of one Republican and one Democrat.

1.3 a) {(G,B), (B,G), (G,C), (C,G), (G,M), (M,G), (G,K), (K,G), (B,C), (C,B), (B,M), (M,B), (B,K), (K,B), (C,M), (M,C), (C,K), (K,C), (M,K), (K,M)}, where the first letter in each pair (·, ·) is the name of the chair and the second is the name of the vice-chair.
b) Equally probable. In the first scenario the probability is 1/5; in the second scenario the probability is 4/20 = 1/5, since four outcomes (G,B), (G,C), (G,M), (G,K) out of twenty correspond to the event that Graham is the chair of the committee.
c) 12/20 = 3/5, since 12 ordered pairs out of 20 start with G, B, or C.

1.4 Note that the total number of units (in thousands) is 471 + 1,470 + 11,715 + 23,468 + 24,476 + 21,327 + 13,782 + 15,647 = 112,356. Therefore,
a) 23,468/112,356 ≈ 0.2089;
b) (24,476 + 21,327 + 13,782 + 15,647)/112,356 = 75,232/112,356 ≈ 0.6696;
c) (471 + 1,470)/112,356 = 1,941/112,356 ≈ 0.0173;
d) 0;
e) 1;
f) All U.S. housing units.
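The arithmetic behind Exercise 1.4 is easy to confirm numerically. Below is a minimal Python sketch; the list of counts (in thousands) and the variable names are ours, introduced only for illustration:

```python
# Housing-unit counts (in thousands) from Exercise 1.4, in the order listed there.
units = [471, 1470, 11715, 23468, 24476, 21327, 13782, 15647]
total = sum(units)                       # 112,356 thousand units

p_a = units[3] / total                   # part (a): 23,468 / 112,356
p_b = sum(units[4:8]) / total            # part (b): 75,232 / 112,356
p_c = sum(units[0:2]) / total            # part (c): 1,941 / 112,356

print(total)                                         # 112356
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))   # 0.2089 0.6696 0.0173
```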


1.5 The total number of murder victims between 20 and 59 years old is 2,916 + 2,175 + 1,842 + 1,581 + 1,213 + 888 + 540 + 372 = 11,527. Then
a) 1,213/11,527 ≈ 0.1052;
b) (11,527 − 2,916)/11,527 = 8,611/11,527 ≈ 0.7470;
c) (888 + 540 + 372)/11,527 = 1,800/11,527 ≈ 0.1562;
d) (2,916 + 2,175 + 372)/11,527 = 5,463/11,527 ≈ 0.4739;
e) All murder victims in the U.S. who are between 20 and 59 years old.

1.6 The answers will vary. In general, suppose the number of left-handed female students in the class is ℓ1 and the number of left-handed male students is ℓ2. Moreover, suppose the class consists of n1 girls and n2 boys. Then a) n1/(n1 + n2); b) (ℓ1 + ℓ2)/(n1 + n2); c) ℓ1/(n1 + n2); d) (n2 − ℓ2)/(n1 + n2).

1.7 a) 0.38 + (0.62)(0.2) = 0.504, since 38% of students speak only English, plus another 20% of the bilingual students do not speak Spanish. b) (0.62)(0.8)(0.1) = 0.0496.

1.8 a) The answers will vary since the experiment is random.
b) n5(E)/5, where n5(E) is the number of heads in 5 tosses. The answer is based on the frequentist interpretation of probability.
c) n10(E)/10, where n10(E) is the number of heads in 10 tosses.
d) n20(E)/20, where n20(E) is the number of heads in 20 tosses.
e) In general, the sequence n(E)/n is not monotone in n and may fail to have a limit.

1.9 On each day of the year, the probability that a Republican governor is chosen equals 28/50 = 0.56, so a Republican governor is chosen 56% of the time. Since 2004 is a leap year (with 366 days), the number of days on which a Republican governor is selected to read the invocation is approximately 205, since 366(0.56) = 204.96.

1.10 a) About 31.4 percent of human gestation periods exceed 9 months. b) The favorite in a horse race finishes in the money in about two thirds of the races. c) When a large number of traffic fatalities is considered, about 40% of the cases involve an intoxicated or alcohol-impaired driver or nonoccupant.

1.11 a) 4000 × 0.314 = 1256. b) 500 × (2/3) ≈ 333. c) 389 × 0.4 ≈ 156.
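Exercise 1.8 appeals to the frequentist interpretation: the proportion of heads n(E)/n in n tosses approximates the probability of a head, but fluctuates for small n. A minimal simulation sketch (the fair-coin assumption, the seeds, and the sample sizes are ours, for illustration only):

```python
import random

def head_proportion(n, p=0.5, seed=0):
    """Simulate n tosses of a coin with P(head) = p and return n(E)/n."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p for _ in range(n))
    return heads / n

# Proportions for 5, 10 and 20 tosses, as in parts (b)-(d); results vary run to run,
# but for large n the proportion settles near p, as part (e) cautions it need not do monotonically.
for n in (5, 10, 20, 10_000):
    print(n, head_proportion(n, seed=n))
```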


Advanced Exercises

1.12 For the bet to be fair, the odds that the ball lands on red should be 18 to 20.

1.13 a) If p1 is the probability that Fusaichi Pegasus wins the race, then (1 − p1)/p1 = 3/5. Thus, p1 = 5/8. b) If p2 is the probability that Red Bullet wins the race, then (1 − p2)/p2 = 9/2. Thus, p2 = 2/11. (Note: posted odds at horse races are always “odds against.”)
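The conversion used in Exercise 1.13 — posted odds against of a to b correspond to probability b/(a + b) — can be wrapped in a one-line helper. A sketch; the function name is ours:

```python
from fractions import Fraction

def prob_from_odds_against(a, b):
    """Odds against of a to b  ->  win probability b / (a + b)."""
    return Fraction(b, a + b)

print(prob_from_odds_against(3, 5))   # Fusaichi Pegasus at 3-5: 5/8
print(prob_from_odds_against(9, 2))   # Red Bullet at 9-2: 2/11
```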

1.2 Set Theory

Basic Exercises 1.14 Yes. If the sets are pairwise disjoint, then the intersection of all the sets must be empty, since that intersection must be contained in pairwise intersections of the original sets. 1.15 The answers will vary. One example is given below:

(Venn diagram with sets A, B, C — figure omitted.)

1.16 The answers will vary. One example is given below:

(Venn diagram with sets A, B, C, D — figure omitted.)

1.17 The answers will vary. Say, take {{a,b,e}, {b,c,f}, {c,d,e}, {d,a,f}}, where a, b, c, d, e, f are distinct points; then every pairwise intersection is nonempty, but every three of the sets have an empty intersection.

1.18 a) ⋂_{n=1}^{4} [0, 1/n] = [0, 1/4] and ⋃_{n=1}^{4} [0, 1/n] = [0, 1].
b) ⋂_{n=1}^{∞} [0, 1/n] = {0} and ⋃_{n=1}^{∞} [0, 1/n] = [0, 1].

1.19 In part a) assume the convention that [a, b] is empty for b < a. a) (1, 2); b) [1, 2); c) [1, 2]; d) {3}; e) ∅; f ) ∅; g) [5, 6). 1.20 a) (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (3, 5).


b) (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (3, 5), (3, 1), (3, 2), (4, 1), (4, 2), (4, 3), (5, 1), (5, 2), (5, 3).
c) (1, 1), (1, 2), (2, 1), (2, 2), (4, 4), (4, 5), (5, 4), (5, 5).
d) A = ({1, 2} × {1, 2}) ∪ ({4, 5} × {4, 5}).

1.21 a) (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1).
b) (0, 0, 1), (0, 0, 2), (0, 1, 1), (0, 1, 2), (1, 0, 1), (1, 0, 2), (1, 1, 1), (1, 1, 2).
c) (a, f), (a, g), (a, h), (b, f), (b, g), (b, h), (c, f), (c, g), (c, h), (d, f), (d, g), (d, h), (e, f), (e, g), (e, h).
d) The answer is the same as in part (c).

1.22 [1, 2] × [1, 2].

1.23 The answers will vary. For example, take In = (0, 1/n) for n ∈ N. Then Ij ∩ Ik = (0, 1/max(j, k)) ≠ ∅ for all j and k, but ⋂_{n=1}^{∞} (0, 1/n) = ∅.

1.24 The answers will vary. For example, take In = [0, 1/n] for all n ∈ N; then ⋂_{n=1}^{∞} In = {0}. (Note: the requirement Ij ∩ Ik ≠ ∅ could be omitted from the formulation of the problem.)

1.25 The answers will vary. One possibility is R = ⋃_{k∈Z} [k, k + 1).
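Small finite Cartesian products such as those in Exercises 1.20–1.22 can be enumerated directly. A sketch using itertools; the concrete sets are ours:

```python
from itertools import product

A = {1, 2, 3}
B = {3, 4, 5}

# Ordered pairs with first coordinate in A and second in B (cf. Exercise 1.20(a)).
pairs = sorted(product(A, B))
print(pairs)        # [(1, 3), (1, 4), (1, 5), (2, 3), ..., (3, 5)]

# Binary triples {0,1}^3, as in Exercise 1.21(a): 2**3 = 8 outcomes.
print(len(list(product({0, 1}, repeat=3))))   # 8
```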

Theory Exercises

1.26 a) Verification of (A ∩ B)^c = A^c ∪ B^c using Venn diagrams: (diagrams omitted — the shaded regions obtained for (A ∩ B)^c and for A^c ∪ B^c coincide.)

b) x ∈ (A ∩ B)^c if and only if x ∉ A ∩ B, which is true if and only if x ∉ A or x ∉ B. The latter is equivalent to x ∈ A^c or x ∈ B^c, which is in turn equivalent to x ∈ A^c ∪ B^c.
c) By the first de Morgan law, (A^c ∪ B^c)^c = (A^c)^c ∩ (B^c)^c = A ∩ B; thus, (A ∩ B)^c = A^c ∪ B^c.

1.27 a) Verification of A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) using Venn diagrams:

(Venn diagrams omitted.)

Verification of A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C):

(Venn diagrams omitted.)

b) Proof of distributivity: A ∩ (B ∪ C) = {x : x ∈ A and x belongs to B or C} = {x : x ∈ A ∩ B or x ∈ A ∩ C} = (A ∩ B) ∪ (A ∩ C). Moreover, using de Morgan's laws and the above equation, one also obtains that A ∪ (B ∩ C) = (A^c ∩ (B ∩ C)^c)^c = (A^c ∩ (B^c ∪ C^c))^c = ((A^c ∩ B^c) ∪ (A^c ∩ C^c))^c = (A^c ∩ B^c)^c ∩ (A^c ∩ C^c)^c = (A ∪ B) ∩ (A ∪ C).

1.28 a) Commutativity is obvious from the Venn diagrams below:

(Venn diagrams for A ∩ B = B ∩ A and A ∪ B = B ∪ A omitted.)


Verification of A ∩ (B ∩ C) = (A ∩ B) ∩ C using Venn diagrams:

(Venn diagrams omitted.)

Verification of A ∪ (B ∪ C) = (A ∪ B) ∪ C using Venn diagrams:

(Venn diagrams omitted.)

b) Proof of commutativity: A ∩ B = {x : x ∈ A and x ∈ B} = B ∩ A. A ∪ B = {x : x ∈ A or x ∈ B} = B ∪ A.


Proof of associativity: A ∩ (B ∩ C) = {x : x ∈ A and x ∈ B ∩ C} = {x : x ∈ A and x ∈ B and x ∈ C} = {x : x ∈ A ∩ B and x ∈ C} = (A ∩ B) ∩ C. A ∪ (B ∪ C) = {x : x ∈ A or x ∈ B ∪ C} = {x : x ∈ A or x ∈ B or x ∈ C} = {x : x ∈ A ∪ B or x ∈ C} = (A ∪ B) ∪ C. 1.29 a) A ∪ ∅ consists of all the elements that are in A or in ∅, but ∅ has no elements, thus A ∪ ∅ = A. b) Every element of A must be contained in the union of sets A and B, thus A ⊂ A ∪ B. c) Suppose B ⊂ A, then every element of B is also an element of A, implying that A ∪ B = A. Conversely, suppose A ∪ B = A, then Ac = (A ∪ B)c = B c ∩ Ac , implying that B ∩ Ac = B ∩ (B c ∩ Ac ) = (B ∩ B c ) ∩ Ac = ∅ ∩ Ac = ∅. Thus, there are no elements of B which are not in A. In other words, every element of B must be in A, i.e. B ⊂ A. 1.30 a) A ∩ ∅ consists of all elements that are in both A and ∅. Since ∅ has no elements, A ∩ ∅ must be empty. b) Every element of A ∩ B must be an element of A (and B), thus, A ∩ B ⊂ A. c) Suppose A ⊂ B. If x ∈ A, then x ∈ B, implying that x ∈ A∩B. Therefore, A ⊂ (A∩B) ⊂ A, where the last inclusion holds by part (b), thus, A = A ∩ B. Conversely, suppose A = A ∩ B, then A ∩ B c = (A ∩ B) ∩ B c = A ∩ (B ∩ B c ) = A ∩ ∅ = ∅, thus, there are no elements of A which are not in B, implying that every element of A must be in B, i.e. A ⊂ B. 1.31 a) To see that (A ∩ B) ∪ (A ∩ B c ) = A, first note that A ∩ B ⊂ A and A ∩ B c ⊂ A, implying that (A ∩ B) ∪ (A ∩ B c ) ⊂ A. On the other hand, every element from A is either in B (in which case it belongs to A ∩ B) or not in B (in which case it belongs to A ∩ B c ). Thus, A ⊂ (A ∩ B) ∪ (A ∩ B c ). Therefore, the required equality follows. b) If A ∩ B = ∅, then x ∈ A implies that x ∈ / B (since otherwise we will find x ∈ A ∩ B = ∅, which is impossible). In other words, if x ∈ A then x ∈ B c . Thus, A ⊂ B c . c) If A ⊂ B, then A = A ∩ B, implying that Ac = (A ∩ B)c = Ac ∪ B c . Therefore, B c ⊂ (Ac ∪ B c ) = Ac . 1.32 First let us show that

(⋃_{n=1}^{∞} A_n)^c = ⋂_{n=1}^{∞} A_n^c.

Note that x ∈ (⋃_{n=1}^{∞} A_n)^c if and only if x ∉ ⋃_{n=1}^{∞} A_n, which is true if and only if x ∉ A_n for all n ∈ N. The latter is equivalent to x ∈ A_n^c for all n ∈ N, which, in turn, holds if and only if x ∈ ⋂_{n=1}^{∞} A_n^c. To see the second identity, note (from the above with A_n^c in place of A_n) that

(⋃_{n=1}^{∞} A_n^c)^c = ⋂_{n=1}^{∞} (A_n^c)^c = ⋂_{n=1}^{∞} A_n,

and, upon taking complements, we obtain that (⋂_{n=1}^{∞} A_n)^c = ⋃_{n=1}^{∞} A_n^c.
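De Morgan's laws from Exercise 1.32 are stated for countably many sets, but the finite analogue is easy to sanity-check on concrete sets. A minimal sketch; the universe U and the family are ours:

```python
# Finite sanity check of De Morgan's laws on an arbitrary finite family.
U = set(range(10))                        # universe used for complements
family = [{0, 1, 2}, {2, 3, 4}, {4, 5}]   # A_1, A_2, A_3

union = set().union(*family)
intersection = U.intersection(*family)

# (union of A_n)^c == intersection of A_n^c
assert U - union == U.intersection(*[U - A for A in family])
# (intersection of A_n)^c == union of A_n^c
assert U - intersection == set().union(*[U - A for A in family])
print("De Morgan's laws hold on this example")
```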


1.33 First let us show that

B ∩ (⋃_{n=1}^{∞} A_n) = ⋃_{n=1}^{∞} (B ∩ A_n).

Note that x ∈ B ∩ (⋃_{n=1}^{∞} A_n) if and only if x ∈ B and x ∈ A_n for some n ∈ N, which is equivalent to x ∈ B ∩ A_n for some n ∈ N. The latter statement holds if and only if x ∈ ⋃_{n=1}^{∞} (B ∩ A_n). Thus, the required identity holds. Next note that (from the above with A_n^c in place of A_n and B^c in place of B)

⋃_{n=1}^{∞} (B^c ∩ A_n^c) = B^c ∩ (⋃_{n=1}^{∞} A_n^c),

which, in view of Exercise 1.32, implies that

⋂_{n=1}^{∞} (B ∪ A_n) = (⋃_{n=1}^{∞} (B ∪ A_n)^c)^c = (⋃_{n=1}^{∞} (B^c ∩ A_n^c))^c = (B^c ∩ (⋃_{n=1}^{∞} A_n^c))^c = B ∪ (⋃_{n=1}^{∞} A_n^c)^c = B ∪ (⋂_{n=1}^{∞} A_n).

" 1.34 a) If B ⊂ ∞ n ∈ N , thus, n=1 An , then every given point x"of B belongs to An for some "∞ x ∈ B ∩ An for some n ∈ N , implying that x ∈ ∞ (B ∩ A ). Thus, B ⊂ n n=1 n=1 (B ∩ An ). "∞ Conversely, note that B ∩ An ⊂ B for all n ∈ N , thus, n=1 (B ∩ An ) ⊂ B. Therefore, " (B ∩ An ). B= ∞ n=1 b) Required equality follows at once from part (a) with B = E. ! c) If!A1 , A2 , . . . are pairwise disjoint, then Aj ∩ Ak = ∅ for all j ̸= k, then (Aj ∩ E) (Ak ∩ E) ⊂ (Aj Ak ) = ∅ for all j ̸= k, implying that (A1 ∩ E), (A2 ∩ E), . . . are pairwise disjoint. d) If " A1 , A2 , . . . form a partition of U , then, by part (b), every E ⊂ U can be written as E = ∞ n=1 (An ∩ E), where the latter union is the union of pairwise disjoint sets (A1 ∩ E), (A2 ∩ E), . . . , due to part (c). Advanced Exercises 1.35 a) Take an arbitrary point " x ∈ lim inf n→∞ An . Then there exists an n ∈ N such that x ∈ Ak for all k ≥ n. Therefore, x ∈ ∞ k=m Ak for every m ∈ N , implying that x ∈ lim supn→∞ An . Thus, lim inf n→∞ An ⊂ lim supn→∞ An . b) The set lim inf n→∞ An consists of all the points that belong to all but finitely many An ’s. The set lim supn→∞ An consists of all the points that belong to infinitely many An ’s. Thus, clearly, the limit inferior is a subset of the limit superior. c) On the one hand, lim supn→∞ An = [−1, 1], since every point in [−1, 1] belongs to infinitely many An ’s and no point outside [−1, 1] belongs to infinitely many An ’s. To see the latter, take an arbitrary x ∈ / [−1, 1], then there exists n ∈ N such that 1 + n1 < |x|, then x ∈ / Ak for all k ≥ n. Thus x ∈ / lim supn→∞ An . On the other hand, lim inf n→∞ An = {0}. Indeed, !∞ note that {0} ∈ An for all n ∈ N and A ∩ A = {0} for all n ∈ N , implying that k=n Ak = {0} for all n ∈ N . Thus, {0} = "n∞ !n+1 ∞ n=1 k=n Ak . 1.36 Define a function f : N → Z by f (2k) = k and f (2k − 1) = −k + 1 for all k ∈ N . Clearly,


f is a one-to-one function from N onto Z, whose inverse f^{−1}, for each z ∈ Z, is given by

f^{−1}(z) = 2z if z ∈ {1, 2, 3, . . . }, and f^{−1}(z) = 1 − 2z if z ∈ {0, −1, −2, −3, . . . };

thus Z and N are equivalent sets and Z is a countably infinite set.

1.37 Define a function f : N² → N by f(m, n) = 2^{m−1}(2n − 1). Clearly, for every point z ∈ N there exists a unique pair (m, n) ∈ N² such that z = 2^{m−1}(2n − 1). Thus, f is a one-to-one function from N² onto N, implying that N² is a countably infinite set.

1.38 Take an arbitrary sequence (x_k)_{k=1}^{∞}; then its range R(x_1, x_2, . . . ) = ⋃_{k=1}^{∞} {x_k} is a countable set. Indeed, either the range R(x_1, x_2, . . . ) is finite or R(x_1, x_2, . . . ) can be written as a set {a_j : j ∈ N}, where each a_j = x_m for some m ∈ N and a_1, a_2, . . . are all distinct. In the latter case, define a function f : N → {a_j : j ∈ N} by f(n) = a_n; then f is a one-to-one function from N onto {a_j : j ∈ N} = R(x_1, x_2, . . . ). It follows therefore that the range of an arbitrary sequence is countable. Conversely, let A be an arbitrary nonempty countable set. Then A is either finite or countably infinite. If A is finite, say, A = {a_1, . . . , a_m} for some finite m ∈ N, then, for every k ∈ N, let x_k = a_k if 1 ≤ k ≤ m, and x_k = a_m for k > m. Then (x_k)_{k=1}^{∞} is an infinite sequence whose range equals A. Finally, if A is countably infinite, then there is a one-to-one function f that maps N onto A; thus, A is equal to the range of the infinite sequence (f(n))_{n=1}^{∞}. Therefore, we have shown that every nonempty countable set is the range of some infinite sequence.

1.39 Let A be a countable set and let g be an arbitrary function defined on A. By Exercise 1.38, A is the range of some infinite sequence (x_k)_{k=1}^{∞}; thus,

g(A) = {g(a) : a ∈ A} = {g(a) : a ∈ ⋃_{k=1}^{∞} {x_k}} = ⋃_{k=1}^{∞} {g(x_k)},

implying that g(A) is the range of the infinite sequence (g(x_k))_{k=1}^{∞}. Thus, by Exercise 1.38, g(A) is countable.

1.40 Let A be a countable set and B ⊂ A. If B = ∅, then it is finite and therefore countable. If B ≠ ∅, then there exists some point b ∈ B, and we can define a function g : A → B by g(a) = a if a ∈ B, and g(a) = b if a ∈ A ∩ B^c. Clearly, g(A) = B; thus, by Exercise 1.39 (which is based on Exercise 1.38), we conclude that B is countable.

1.41 Let A and B be countable sets. If at least one of the two sets is empty, then A × B = ∅, in which case the Cartesian product of A and B is a finite set and, therefore, countable. Now suppose that both A and B are nonempty. Then A is the range of some sequence (x_k)_{k=1}^{∞} and B is the range of some sequence (y_j)_{j=1}^{∞}, implying that

A × B = {(a, b) : a ∈ A, b ∈ B} = {(a, b) : a ∈ ⋃_{k=1}^{∞} {x_k}, b ∈ ⋃_{j=1}^{∞} {y_j}} = ⋃_{k=1}^{∞} ⋃_{j=1}^{∞} {(x_k, y_j)} = ⋃_{k∈K} ⋃_{j∈J} {(x_k, y_j)},

where K ⊂ N and J ⊂ N are such that, for k ∈ K, the x_k's are all distinct and {x_k : k ∈ K} coincides with A, and, for j ∈ J, the y_j's are all distinct and {y_j : j ∈ J} coincides with B. Then A × B = {(x_k, y_j) : (k, j) ∈ K × J} = g(K × J), where we define g((k, j)) = (x_k, y_j) for all (k, j) ∈ K × J. By Exercise 1.37, N² is countable; thus, by Exercise 1.40, K × J, as a subset of the countable set N², is itself countable. It then follows that g(K × J) is countable by Exercise 1.39. Hence A × B is countable. Moreover, suppose A_1, A_2, . . . , A_n is a finite collection of countable sets; then by induction one can show that A_1 × A_2 × · · · × A_n = A_1 × (A_2 × · · · × A_n) is a countable set.

1.42 Note that Q = ⋃_{r∈Z} ⋃_{q∈N} {r/q} = g(Z × N), where we define g((r, q)) = r/q for all (r, q) ∈ Z × N. By Exercises 1.36 and 1.41, Z × N is countable. Thus, by Exercise 1.39, Q = g(Z × N) is also countable.

1.43 a) Suppose the converse is true, i.e., that [0, 1) is countable. Then let {x_n}_{n=1}^{∞} be an enumeration of the elements of [0, 1). For each x_n consider its decimal expansion and let k_n equal the nth digit in the decimal expansion of x_n. For each n ∈ N define m_n = (k_n + 5) mod 10. Consider the point

y = 0.m_1 m_2 m_3 · · · = Σ_{n=1}^{∞} m_n 10^{−n}.

Then, for every given n ∈ N, y ≠ x_n since, by construction, m_n ≠ k_n (the points y and x_n differ, at least, in the nth digit of their respective decimal expansions). On the other hand, clearly y ∈ [0, 1). Thus we have a contradiction. Therefore, [0, 1) is uncountable.
b) If (0, 1) were countable, then (0, 1) would be the range of some sequence {x_1, x_2, . . . }, in which case [0, 1) would be the range of the sequence {0, x_1, x_2, . . . }, implying that [0, 1) is a countable set, which contradicts part (a). Thus, (0, 1) is uncountable.
c) Fix arbitrary real a < b. Define f : (0, 1) → (a, b) by f(x) = a + (b − a)x for all x ∈ (0, 1). Since f is a one-to-one function from (0, 1) onto (a, b), the sets (0, 1) and (a, b) are equivalent in the sense of the definition on page 19. Thus, (a, b) is uncountable (since otherwise (0, 1) would have to be countable, contradicting part (b)).
d) Let us show that any non-degenerate interval in R is uncountable. For any a < b we have (a, b) ⊂ (a, b] ⊂ [a, b] ⊂ (−∞, +∞) and (a, b) ⊂ [a, b); hence each of the intervals (a, b], [a, b), [a, b] and R = (−∞, ∞) is uncountable, since otherwise, by Exercise 1.40, we would have a contradiction with the fact that (a, b) is uncountable (shown in part (c)).

1.44 Suppose A_n is a countable set for each n ∈ N. Upon applying Exercise 1.38 for each fixed n ∈ N, let {a_{n,k}}_{k=1}^{∞} be an enumeration of all elements of A_n. Next define sets E_k = {a_{1,k}, a_{2,k−1}, a_{3,k−2}, . . . , a_{k,1}} for all k ∈ N. Then each E_k is a finite set and ⋃_{n=1}^{∞} A_n = ⋃_{k=1}^{∞} E_k. Therefore, in view of Exercise 1.39, it suffices to show that the set J, defined by

J = ⋃_{k=1}^{∞} {(1, k), (2, k − 1), (3, k − 2), . . . , (k, 1)},

is countable. But J ⊂ N², where (by Exercise 1.41) N² is countable. Thus, by Exercise 1.40, J is countable and the required conclusion follows.
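The map f(m, n) = 2^(m−1)(2n − 1) used in Exercise 1.37 can be checked to behave like a bijection on an initial block of N × N. A small sketch (the block size N is ours):

```python
# f(m, n) = 2**(m-1) * (2*n - 1) pairs N x N with N one-to-one (Exercise 1.37).
def f(m, n):
    return 2 ** (m - 1) * (2 * n - 1)

N = 200
values = {f(m, n) for m in range(1, N + 1) for n in range(1, N + 1)}

# All N*N values are distinct (injectivity on the block) ...
assert len(values) == N * N
# ... and every natural number up to N is hit, since z = 2^a * odd equals f(a+1, (odd+1)/2).
assert set(range(1, N + 1)) <= values
print("f is injective on the block and covers 1 ..", N)
```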


1.45 a) To show consistency of the two definitions, take I = {1, . . . , N}. Then the set of functions {x : {1, . . . , N} → ⋃_{j=1}^{N} A_j such that x(i) ∈ A_i for all i ∈ {1, . . . , N}} can be readily identified with the set of vectors {(a_1, . . . , a_N) : a_i ∈ A_i for all i ∈ {1, . . . , N}}, since, given an arbitrary (a_1, . . . , a_N) (with a_i ∈ A_i), we can define a function x on I by putting x(i) = a_i for each i ∈ I, and conversely, given a function x from the first set, the vector (x(1), . . . , x(N)) belongs to the second set.
b) False. To see that the countable Cartesian product of countable sets may be uncountable, note first that every point x ∈ [0, 1) has a decimal expansion x = 0.x_1 x_2 x_3 · · · = Σ_{k=1}^{∞} x_k 10^{−k} for some x_k ∈ {0, 1, . . . , 9}. For any set E, let E^∞ denote the countable Cartesian product of copies of E. Define a function f : [0, 1) → {0, . . . , 9}^∞ by f(x) = (x_1, x_2, x_3, . . . ). Clearly f is a one-to-one map from [0, 1) onto {0, . . . , 9}^∞, where, by Exercise 1.43(a), the set [0, 1) is uncountable. Thus, {0, . . . , 9}^∞ (which is a countable Cartesian product of finite sets) has to be uncountable.
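The diagonal construction of Exercise 1.43(a) — shifting the nth decimal digit by 5 mod 10 — is concrete enough to run against any finite list of decimals. A rough sketch (the example list, the digit depth, and 0-based indexing are ours):

```python
# Diagonalization sketch: build y in [0,1) that differs from x_n in its nth decimal digit.
def diagonal(xs, digits=12):
    out = []
    for n, x in enumerate(xs):                        # n = 0, 1, 2, ...
        kn = int(f"{x:.{digits}f}".split(".")[1][n])  # nth decimal digit of x_n
        out.append((kn + 5) % 10)
    return "0." + "".join(map(str, out))

xs = [0.123456, 0.5, 0.999, 0.3141592]   # any finite list of points in [0, 1)
print(diagonal(xs))   # differs from x_n in the nth decimal place, so it is not in the list
```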

1.3 Review Exercises

Basic Exercises

1.46 a) 11/16 = 0.6875; b) The 16 states in the South.

1.47 a) 4/(190 + 71 + 61 + 25 + 10 + 4 + 87) = 4/448 = 1/112 ≈ 0.00893;
b) (61 + 25)/448 = 86/448 = 43/224 ≈ 0.19196;
c) (448 − 190)/448 = 258/448 = 129/224 ≈ 0.57589.

1.48 a) In a large number of rolls, a die will have three dots facing up 1/6 of the time. b) In a large number of rolls, a die will have three or more dots facing up two thirds of the time. c) 10000/6 ≈ 1667. d) 10000(2/3) ≈ 6667.

1.49 n(Y) ≈ 0.6n, since n(Y)/n ≈ 0.6 for large n.

1.50 a) 0, 1, 4; b) [0, 4).

1.51 Associate 0 with a tail and 1 with a head. Then, for example, (0, 1, 0) denotes the outcome that the 1st and 3rd tosses are tails and the 2nd toss is a head. Then “two heads and one tail in three tosses” corresponds to the three outcomes (1, 1, 0), (1, 0, 1) and (0, 1, 1). There are eight possible outcomes of the experiment: (0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0) and (1, 1, 1). Thus, the required probability is 3/8.

1.52 (A ∩ B) ∪ (A ∩ B^c) = {x ∈ A : x ∈ B} ∪ {x ∈ A : x ∉ B} = {x ∈ A} = A.

1.53 a) ⋃_{n=1}^{∞} (1/(n+1), 1/n] = (0, 1], since ⋃_{n=1}^{k} (1/(n+1), 1/n] = (1/(k+1), 1] for all k ∈ N and 1/(k+1) → 0 as k → ∞.
b) ⋃_{n=1}^{∞} [1/(n+1), 1/n] = (0, 1], since ⋃_{n=1}^{k} [1/(n+1), 1/n] = [1/(k+1), 1] for all k ∈ N, 1/(k+1) → 0 as k → ∞, and 0 ∉ [1/(n+1), 1/n] for all n ∈ N.
c) Part (a) involves a union of pairwise disjoint sets (1/2, 1], (1/3, 1/2], (1/4, 1/3], (1/5, 1/4], (1/6, 1/5], . . . . Part (b) involves a union of sets that are not pairwise disjoint, since [1/(n+2), 1/(n+1)] ∩ [1/(n+1), 1/n] = {1/(n+1)}, but whose overall intersection is empty, since [1/(n+3), 1/(n+2)] ∩ [1/(n+1), 1/n] = ∅.

1.54 a) Finite, since the required set equals {(1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4), (3, 2), (3, 3), (3, 4)}.
b) Uncountable, since the required set is equal to {(1, y) : 2 ≤ y ≤ 4} ∪ {(2, y) : 2 ≤ y ≤ 4} ∪ {(3, y) : 2 ≤ y ≤ 4}, which is the union of three parallel (nonempty) segments.
c) Uncountable, since the required set is equal to {(x, 1) : 2 ≤ x ≤ 4} ∪ {(x, 2) : 2 ≤ x ≤ 4} ∪ {(x, 3) : 2 ≤ x ≤ 4}, which is the union of three parallel (nonempty) segments.
d) Countably infinite, since the set equals ⋃_{n=1}^{∞} {(1, n), (2, n), (3, n)}.
e) Uncountable, since the set equals {(x, y) : 1 ≤ x ≤ 3, 2 ≤ y ≤ 4}.
f) Countably infinite.
g) Finite, since ⋂_{n=1}^{∞} {n, n + 1, n + 2} ⊂ ({1, 2, 3} ∩ {4, 5, 6}) = ∅.

1.55 a) ({3, 4}^c ∩ {4, 5}^c)^c = ({3, 4}^c)^c ∪ ({4, 5}^c)^c = {3, 4} ∪ {4, 5} = {3, 4, 5}.
b) Note that {3, 4}^c = {1, 2, 5, 6, 7, 8, . . . } and {4, 5}^c = {1, 2, 3, 6, 7, 8, . . . }; thus, {3, 4}^c ∩ {4, 5}^c = {1, 2, 6, 7, 8, . . . }, which implies the required equality: ({3, 4}^c ∩ {4, 5}^c)^c = {1, 2, 6, 7, 8, . . . }^c = {3, 4, 5}.

Theory Exercises

1.56 a) Put B_1 = A_1 and, for all n ∈ N, let

B_n = A_n ∩ (⋃_{k=1}^{n−1} A_k)^c = A_1^c ∩ · · · ∩ A_{n−1}^c ∩ A_n.

Then B_n ∩ A_m = ∅ for all m < n. Since B_m ⊂ A_m for all m, it follows that B_n ∩ B_m = ∅ for all m < n. Thus, the sets B_1, B_2, . . . are pairwise disjoint. Now let us show that for every n ∈ N, the equality ⋃_{j=1}^{n} B_j = ⋃_{j=1}^{n} A_j holds. Note that B_1 ∪ B_2 = A_1 ∪ (A_2 ∩ A_1^c) = A_1 ∪ A_2. Moreover,

⋃_{j=1}^{n} B_j = B_n ∪ (⋃_{j=1}^{n−1} B_j) = (A_n ∩ (⋃_{k=1}^{n−1} A_k)^c) ∪ (⋃_{j=1}^{n−1} A_j) = ⋃_{j=1}^{n} A_j,

where the 2nd equality is valid due to the induction assumption that ⋃_{j=1}^{n−1} B_j = ⋃_{j=1}^{n−1} A_j, whereas the last equality holds due to the identity (E ∩ F^c) ∪ F = E ∪ F for arbitrary E, F.
b) Select (pairwise disjoint) sets B_1, B_2, . . . as in part (a). Upon noting that x ∈ ⋃_{j=1}^{∞} A_j if and only if there exists n ∈ N such that x ∈ ⋃_{j=1}^{n} A_j = ⋃_{j=1}^{n} B_j, and the latter is true if and only if x ∈ ⋃_{j=1}^{∞} B_j, we conclude that ⋃_{j=1}^{∞} A_j = ⋃_{j=1}^{∞} B_j.

1.57 a) (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) ⊂ A ∪ (B ∩ C). b) The answers will vary. Any choice of A, B, C such that A ∩ C ≠ A will give a valid example. Say, take A = {a} and B = C = ∅; then (A ∪ B) ∩ C = ∅ ≠ {a} = A ∪ (B ∩ C).

c) By Proposition 1.2, we know that (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C). If A ⊂ C, then A ∩ C = A, implying that (A ∪ B) ∩ C = A ∪ (B ∩ C).
d) Assume that (A ∪ B) ∩ C = A ∪ (B ∩ C). Suppose that A ⊄ C; then there exists x ∈ A such that x ∉ C. Then x ∈ A ∪ (B ∩ C) but x ∉ (A ∪ B) ∩ C, which is impossible since (A ∪ B) ∩ C = A ∪ (B ∩ C). Therefore, A ⊂ C.

1.58 a) Let C = {C ⊂ U : C ⊃ A and C ⊃ B}. Clearly, since A ⊂ A ∪ B and B ⊂ A ∪ B (and A ∪ B ⊂ U), we have (A ∪ B) ∈ C, which implies that (A ∪ B) ⊃ ⋂_{C∈C} C. On the other hand, for every C ∈ C, we must have C ⊃ (A ∪ B). Therefore, (⋂_{C∈C} C) ⊃ (A ∪ B), and the required conclusion follows.
b) If A ⊂ D and B ⊂ D, then (A ∪ B) ⊂ D, since every element of A ∪ B is in A or in B, and thus belongs to D. In view of part (a), the smallest D ∈ C is equal to A ∪ B. (Here the “smallest” set means that every other set in the class contains it.)
c) Let E = {E ⊂ U : E ⊂ A and E ⊂ B}. Clearly, A ∩ B ⊂ A and A ∩ B ⊂ B; thus, A ∩ B ∈ E, implying that A ∩ B ⊂ ⋃_{E∈E} E. On the other hand, for every E ∈ E, we have E ⊂ (A ∩ B), since every point of E has to belong to both A and B. Therefore, (⋃_{E∈E} E) ⊂ (A ∩ B), and the required conclusion follows.
d) If A ⊃ D and B ⊃ D, then A ∩ B ⊃ D, and A ∩ B is the largest set in E, by part (c). (Here the “largest” set is in the sense that every other set in the class is contained in it.)
e) Let A_1, A_2, . . . be a sequence of subsets of U. Let C* = {C ⊂ U : C ⊃ A_n for all n ∈ N} and let E* = {E ⊂ U : E ⊂ A_n for all n ∈ N}. Then if D ∈ C*, then D ⊃ (⋃_{n=1}^{∞} A_n) and ⋂_{C∈C*} C = ⋃_{n=1}^{∞} A_n. On the other hand, if D ∈ E*, then D ⊂ ⋂_{n=1}^{∞} A_n and ⋃_{E∈E*} E = ⋂_{n=1}^{∞} A_n.

1.59 a) True, since A × (B ∪ C) = {(x, y) : x ∈ A and y ∈ (B ∪ C)} = {(x, y) : x ∈ A and y ∈ B, or x ∈ A and y ∈ C} = (A × B) ∪ (A × C).
b) True, since A × (B ∩ C) = {(x, y) : x ∈ A and y ∈ (B ∩ C)} = {(x, y) : x ∈ A and y ∈ B and y ∈ C} = (A × B) ∩ (A × C).
c) False. Any nonempty A, B such that A ≠ B give the required counterexample. For example, take A = {0} and B = {1}; then A × B = {(0, 1)} ≠ {(1, 0)} = B × A.

Advanced Exercises

1.60 a) (71 + 92)/525 ≈ 0.31; b) (71 + 61)/525 ≈ 0.251; c) 71/(71 + 92) ≈ 0.436; d) 71/(71 + 61 + 37) = 71/169 ≈ 0.42; e) 24/525 ≈ 0.046.

1.61 The odds against randomly selecting an adult female Internet user, who believes that


having a “cyber affair” is cheating, are 1:3, since three out of every four women believe that having a “cyber affair” is cheating.

1.62 a) B^c = U \ B;
b) A \ B = {x : x ∈ A and x ∉ B} = {x : x ∈ A and x ∈ B^c} = A ∩ B^c;
c) (A \ B)^c = (A ∩ B^c)^c = A^c ∪ B, by part (b) and de Morgan's law.

1.63 A △ B = (A \ B) ∪ (B \ A).

1.64 a) By Exercises 1.62 and 1.63 and the properties of complement, union and intersection, note first that
(A △ B) △ C = ((A △ B) \ C) ∪ (C \ (A △ B)) = (((A \ B) ∪ (B \ A)) \ C) ∪ (C \ (A △ B)) = ((A \ B) \ C) ∪ ((B \ A) \ C) ∪ (C ∩ ((A^c ∪ B) ∩ (B^c ∪ A))) = (A ∩ B^c ∩ C^c) ∪ (B ∩ A^c ∩ C^c) ∪ (C ∩ A^c ∩ B^c) ∪ (C ∩ B ∩ A),
where the right-hand side of the last equation is symmetric in A, B, C. Therefore, upon interchanging the roles of A and C, one immediately obtains (A △ B) △ C = (C △ B) △ A = A △ (B △ C), with the last equality being true due to the symmetry of the operation △ (since E △ F = F △ E for arbitrary E, F).
b) A △ U = (A \ U) ∪ (U \ A) = ∅ ∪ A^c = A^c.
c) A △ ∅ = (A \ ∅) ∪ (∅ \ A) = A ∪ ∅ = A.
d) A △ A = (A \ A) ∪ (A \ A) = ∅ ∪ ∅ = ∅.

1.65 a) For arbitrary E, F, note that E △ F = (E ∪ F) \ (E ∩ F) = (E ∪ F) ∩ (E ∩ F)^c; therefore,
(A ∩ B) △ (A ∩ C) = ((A ∩ B) ∩ (A ∩ C)^c) ∪ ((A ∩ C) ∩ (A ∩ B)^c) = (A ∩ B ∩ C^c) ∪ (A ∩ C ∩ B^c) = A ∩ ((B ∩ C^c) ∪ (C ∩ B^c)) = A ∩ (B △ C).
b) Note that
(A ∪ B) △ (A ∪ C) = ((A ∪ B) ∩ (A ∪ C)^c) ∪ ((A ∪ C) ∩ (A ∪ B)^c) = ((A ∪ B) ∩ A^c ∩ C^c) ∪ ((A ∪ C) ∩ A^c ∩ B^c) = (B ∩ A^c ∩ C^c) ∪ (C ∩ A^c ∩ B^c) = A^c ∩ ((B ∩ C^c) ∪ (C ∩ B^c)) = A^c ∩ (B △ C).
Since (A^c ∩ (B △ C)) ⊂ (B △ C) ⊂ (A ∪ (B △ C)), it follows that (A ∪ (B △ C)) ⊃ (A ∪ B) △ (A ∪ C).
c) The equality holds if and only if A^c ∩ (B △ C) = A ∪ (B △ C) = B △ C, which is true if and only if A = ∅, since one must have A ⊂ (B △ C) ⊂ A^c, implying that A = A ∩ A^c = ∅.
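The symmetric-difference identities of Exercises 1.64 and 1.65 can be spot-checked with Python's built-in set operator ^. A sketch; the three sets are ours:

```python
# Spot-check of the symmetric-difference identities on concrete finite sets.
A, B, C = {1, 2, 3, 4}, {3, 4, 5}, {1, 4, 6}

assert (A ^ B) ^ C == A ^ (B ^ C)             # associativity, Exercise 1.64(a)
assert A ^ set() == A and A ^ A == set()      # Exercises 1.64(c) and 1.64(d)
assert (A & B) ^ (A & C) == A & (B ^ C)       # Exercise 1.65(a)
assert (A | (B ^ C)) >= ((A | B) ^ (A | C))   # Exercise 1.65(b): containment only
print("symmetric-difference identities verified on this example")
```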


1.66 If B = ∅, then A △ B = A △ ∅ = A, by Exercise 1.64(c). Conversely, suppose A = A △ B. Then B = ∅ △ B = (A △ A) △ B = A △ (A △ B) = A △ A = ∅, where the 1st equality holds by Exercise 1.64(c), the 2nd equality holds by Exercise 1.64(d), the 3rd equality is valid by Exercise 1.64(a), the 4th equality is true due to the assumption that A = A △ B, and the last equality holds by Exercise 1.64(d).


Chapter 2

Mathematical Probability

2.1 Sample Space and Events

Basic Exercises

2.1 The required Venn diagrams (omitted here) depict the following events:
a) E^c = “E does not occur”;
b) A ∪ B = “Either A or B or both occur”;
c) A ∩ B = “Both A and B occur”;
d) A ∩ B ∩ C = “All three events A, B, C occur at once”;
e) A ∪ B ∪ C = “At least one of the events A, B, C occurs”;
f) A^c ∩ B = “B occurs but A does not occur.”

2.2 a) False, unless both A and B are empty. A simple counterexample is obtained by taking C = A ̸= ∅ and A ∩ B = ∅, then A and B are mutually exclusive, but A and C are not mutually exclusive since A ∩ C = A ̸= ∅. b) True, since the condition A ∩ B ̸= ∅ violates definition 2.3 of mutual exclusiveness of the events A, B, C regardless of the choice of C. 2.3 a) The sample space is Ω = [0, 1), since the distance between the center of the petri dish and the center of the first colony of bacteria can equal any number between 0 and 1. (Note that {0} is a possible outcome since the center of the colony can be located exactly at the center of the dish, whereas {1} is excluded from the sample space under the assumption that bacteria can only develop in the interior of the unit circle formed by the petri dish.) b) [1/4, 1/2]= spot is at least 1/4 and no more than 1/2 unit from the center of the dish; c) Distance between the center of the first colony of bacteria and the center of the petri dish does not exceed 1/3 unit. 2.4 a) Ω = {1, 2, 3, 4, 5, 6}. b) A = {2, 4, 6}, B = {4, 5, 6}, C = {1, 2}, D = {3}. c) Ac = die comes up odd={1, 3, 5}; A ∩ B= die comes up either 4 or 6={4, 6}; B ∪ C = die does not come up 3= {1, 2, 4, 5, 6}. d) A ∩ B = {4, 6}, thus A and B are not mutually exclusive; B ∩ C = ∅ implies that B and C are mutually exclusive; A, C, D are not mutually exclusive since A ∩ C = {2} = ̸ ∅.


e) The three events B, C, D are mutually exclusive, whereas the four events A, B, C, D are not mutually exclusive. f ) {5} = die comes up 5; {1, 3, 5}= die comes up odd; {1, 2, 3, 4} = die comes up 4 or less.

2.5 a) Since one of the 39 trees is selected at random, the sample space is Ω = {T1 , · · · , T39 }, where Tn denotes the percentage of seed damage for the nth tree. For any given event E, let N (E) denote the number of outcomes in E. b) B c = selected tree has less than 20% seed damage, N (B c ) = 19 + 2 = 21; c) C ∩ D= selected tree has at least 50% but less than 60% seed damage, N (C ∩ D) = 2; d) A ∪ D = selected tree has either less than 40% or at least 50% seed damage, N (A ∪ D) = 39 − 6 = 33; e) C c = selected tree has either less than 30% or at least 60% seed damage, N (C c ) = 19 + 2 + 5 + 2 = 28; f ) A ∩ D = selected tree has seed damage which, at the same time, is under 40% and at least 50% = ∅, thus, N (A ∩ D) = 0. g) A and D are mutually exclusive; this is the only pair of mutually exclusive events among A, B, C and D, therefore, it is the only collection of mutually exclusive events among A, B, C, D. 2.6 a) The sample space is given by: Ω = {(a, b, c) : a, b, c ∈ {0, 1, · · · , 9} such that a ̸= b and b ̸= c and a ̸= c}. (The three balls are removed from the urn one at a time, which suggests that the order, in which the balls are removed, is observed. However, for the purposes of part (b) of the problem, ˜ = {{a, b, c} : a, b, c ∈ order is irrelevant and it suffices to consider a sample space given by Ω {0, 1, · · · , 9} such that a ̸= b, b ̸= c, c ̸= a}, i.e. each outcome is an unordered set of three distinct integers in [0, 9].) b) Let E denote the event that an even number of odd-numbered balls is removed. Then, on the sample space Ω, E = {(2k, 2ℓ, 2m), (2r + 1, 2n + 1, 2k), (2k, 2r + 1, 2n + 1), (2r + 1, 2k, 2n + 1) : k, ℓ, m, r, n ∈ {0, 1, 2, 3, 4}, where r ̸= n and k ̸= ℓ, ℓ ̸= m and k ̸= m} . ˜ then (If one instead considers the sample space Ω, E = {{2k, 2ℓ, 2m}, {2j, 2r + 1, 2n + 1} : k, ℓ, m, r, n, j ∈ {0, 1, 2, 3, 4}, where k < ℓ < m and r < n}.) 2.7 A2 = {(1, 1)}, A3 = {(1, 2), (2, 1)}, A4 = {(1, 3), (2, 2), (3, 1)}, A5 = {(1, 4), (2, 3), (3, 2), (4, 1)}, A6 = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}, A7 = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, A8 = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}, A9 = {(3, 6), (4, 5), (5, 4), (6, 3)}, A10 = {(4, 6), (5, 5), (6, 4)}, A11 = {(5, 6), (6, 5)}, A12 = {(6, 6)}.

2.8 a) The sample space is equal to Ω = {(1, 0), (1, 1), (2, 0), (2, 1), (2, 2), (3, 0), (3, 1), (3, 2), (3, 3), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4), (5, 0), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (6, 0), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}. In other words, Ω = {(x, y) : x ∈ {1, 2, 3, 4, 5, 6} and y is integer such that 0 ≤ y ≤ x},

since whenever the die turns up x (x ∈ {1, · · · , 6}), the total number of heads that follow is anywhere between 0 and x, inclusive.


b) Total # of heads is even = {(x, 2k) : x ∈ {1, 2, 3, 4, 5, 6} and k ∈ {0, 1, 2, 3}, where 2k ≤ x}. More explicitly, total number of heads is even = {(1, 0), (2, 0), (2, 2), (3, 0), (3, 2), (4, 0), (4, 2), (4, 4), (5, 0), (5, 2), (5, 4), (6, 0), (6, 2), (6, 4), (6, 6)}.

2.9 a) Ω = {T, HT, HHT, HHHT, HHHHT, . . . }.
b) Laura wins = {HT, HHHT, HHHHHT, . . . }. In other words, Laura wins = {H···HT : the run of H's preceding the T has length 2k + 1 for some nonnegative integer k}.

2.10 a) Ω = {H, TH, TTH, TTTH, . . . }.
b) Ω = {HH, THH, HTH, TTHH, THTH, HTTH, TTTHH, TTHTH, THTTH, HTTTH, . . . }. In other words, Ω = {T···T H T···T H : the first run of T's has length k and the second has length ℓ, where k and ℓ are nonnegative integers}.
c) {TTTTTH}.
d) {TTTTHH, TTTHTH, TTHTTH, THTTTH, HTTTTH}.

2.11 a) Ω = {4, 5, 6, 7, 8, 9, 10}. b) Note that A = {6, 7, 8, 9, 10} and B = {4, 5, 6, 7, 8}, since “at least 4 women are on the jury” means that “8 or less men are on the jury.” Then A ∪ B = Ω = {4, 5, 6, 7, 8, 9, 10}, A ∩ B = {6, 7, 8}, A ∩ B c = {9, 10}. c) A and B are not mutually exclusive since A ∩ B ̸= ∅; A and B c are not mutually exclusive since A ∩ B c ̸= ∅. Since Ac ∩ B c = {4, 5} ∩ {9, 10} = ∅, thus Ac and B c are mutually exclusive.

2.12 a) If A and B c are mutually exclusive, then every element of A is not in B c , thus, every element of A must belong to B, therefore, A ⊂ B and B occurs whenever A occurs. b) “B occurs whenever A occurs” implies that every element of A also belongs to B, i.e. A ⊂ B, which implies that A ∩ B = A and A ∩ B c = ∅. 2.13 a) A ∩ B c . b) (A ∩ B c ) ∪ (Ac ∩ B). c) (A ∩ B c ∩ C c) ∪ (Ac ∩ B ∩ C c ) ∪ (Ac ∩ B c ∩ C), which is essentially saying that either A occurs but B and C do not occur, or B occurs but A and C do not occur, or C occurs but A and B do not occur. d) (A ∩ B ∩ C)c , since at most two of A, B, C occur if and only if all three events do not occur at the same time. 2.14 a) Ω = {(nx , my ) : x, y ∈ {hearts, diamonds, clubs, spades}, n, m ∈ {ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, jack, queen, king}, where nx ̸= my }. (nx stands for a card of denomination n of suit x.) b) Since (queenhearts , acehearts ) ∈ (A∩B), thus, A∩B ̸= ∅, therefore, A and B are not mutually exclusive. 2.15 a) Ω = (0, ∞), where point 0 is excluded under the assumption that several patients cannot arrive at an exactly same time (therefore, only the very first patient can arrive at exactly 6:00 P.M.). b) Ω = {0, 1, 2, . . . }, since the number of people arriving to the emergency room during the first half-hour could be any nonnegative integer. c) A ⊂ B, since A= “the number of patients who arrive before 6:30 P.M. is at least five,” and B= “the number of patients who arrive before 6:30 P.M. is at least four.” d) Ω = {(t1 , t2 , t3 , . . . ) : 0 < t1 < t2 < t3 < . . . }, where, for patients arriving after 6:00 P.M., ti denotes the elapsed time (in hours) from 6:00 P.M. until the arrival of the ith patient. e) False. If event A occurs, then Smith has to wait until the admission of the tenth patient or


longer. Since Jones has to stay only until the tenth patient is admitted, occurrence of A implies that Smith cannot leave the emergency room before Jones. Thus, A ∩ B= “Smith and Jones leave at the same time (when the tenth patient is admitted).” In other words, A ∩ B = “tenth patient is the first patient with a sprained ankle.” Clearly, A ∩ B ̸= ∅, thus, the statement given in the problem is false. f ) False. Note that C=“first patient with a sprained ankle is either the tenth patient admitted to the emergency room or he arrives sometime after the tenth patient,” thus, A = C, implying that A ∩ C = A ̸= ∅, thus A and C are not mutually exclusive. g) False. Since, from answers to (e),(f), A ∩ B ̸= ∅ and A = C, then B ∩ C ̸= ∅ and the statement in the problem is false. h) False. Indeed, suppose first patient with a sprained ankle is the 2nd patient admitted to the emergency room, then Smith leaves right after the 2nd patient, whereas Jones has to wait until the 10th patient. Thus, B occurs but A does not occur. i) False. Suppose the first patient with a sprained ankle is the 12th patient admitted. Then A occurs but B does not occur. j) True, since A = C (see (f)). k) True, since A = C. l) False. The answer follows at once from A = C and part (i). m) False. The answer follows at once from A = C and part (h). 2.16 Note that for each n ≥ 1, (An+1 ∩ Acn ) ∩ An = ∅. Then, since for every m ≤ n − 1, (Am+1 ∩ Acm ) ⊂ Am+1 ⊂ An , it follows that ((Am+1 ∩ Acm ) ∩ (An+1 ∩ Acn )) ⊂ (An ∩ (An+1 ∩ Acn )) = ∅.

Therefore for every n ≥ 1 and every m < n, the events (Am+1 ∩ Acm ) and (An+1 ∩ Acn ) are mutually exclusive. Thus, A2 ∩ Ac1 , A3 ∩ Ac2 , A4 ∩ Ac3 , . . . are pairwise mutually exclusive.

2.17 a) The set of all events = {∅, {a}, {b}, {a, b}}.
b) The set of all events = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}.
c) The set of all events = {∅, {a}, {b}, {c}, {d}, {a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}, {a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}, {a, b, c, d}}.
d) If Ω contains n outcomes, then there is 1 event that has 0 outcomes (the empty set), n events that contain only 1 outcome, C(n, 2) events that contain exactly 2 outcomes, and, more generally, C(n, k) events that contain exactly k outcomes from Ω, where k ∈ {1, . . . , n}. Therefore, the total number of events is equal to

1 + n + C(n, 2) + · · · + C(n, n) = Σ_{k=0}^{n} C(n, k) = (1 + 1)^n = 2^n.
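The count 2^n of events in Exercise 2.17(d) can be confirmed by brute-force enumeration of subsets. A sketch; the example sample spaces are ours:

```python
from itertools import combinations

def all_events(omega):
    """All subsets of a finite sample space omega."""
    omega = list(omega)
    return [set(c) for k in range(len(omega) + 1) for c in combinations(omega, k)]

for omega in ({"a", "b"}, {"a", "b", "c"}, {"a", "b", "c", "d"}):
    events = all_events(omega)
    assert len(events) == 2 ** len(omega)
    print(len(omega), "outcomes ->", len(events), "events")
```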

2.18 a) Take an arbitrary outcome ω from the intersection of all the events A_2, A_4, A_6, . . . and A_1^c, A_3^c, A_5^c, . . . . Then for each n ≥ 1, we have that ω ∈ (A_n ∪ A_{n+1}) ⊂ (⋃_{k=n}^{∞} A_k). Since ω ∈ (⋃_{k=n}^{∞} A_k) for all n ≥ 1, it follows that ω ∈ ⋂_{n=1}^{∞} (⋃_{k=n}^{∞} A_k).
b) See (a), since the same argument holds.
c) Take an arbitrary outcome ω in the intersection of all the sets A_10, A_100, A_1000, . . . and A_n^c, for all integers n which are not powers of 10. Then for each n ≥ 1,

ω ∈ ⋃_{k=n}^{10^n} A_k ⊂ (⋃_{k=n}^{∞} A_k),

therefore, ω ∈ A* and the required conclusion follows.
d) Take an arbitrary outcome ω which belongs to the intersection of all A_n^c with n ∉ {86, 2049, 30498541} and which also belongs to the event A_86 ∩ A_2049 ∩ A_30498541. Then ω ∉ (⋃_{k=30498542}^{∞} A_k), and since A* ⊂ (⋃_{k=30498542}^{∞} A_k), it follows that ω ∉ A*.
e) Take an arbitrary outcome ω ∈ A*. Then for each n ≥ 1, ω ∈ (⋃_{k=n}^{∞} A_k), which implies that ω ∈ A_k for some k ≥ n. Then ω must belong to A_j for infinitely many j. (Indeed, suppose that the set J = {j ≥ 1 : ω ∈ A_j} is finite, let j* = max{j : j ∈ J} and take n = j* + 1. Then there exists k ≥ n such that ω ∈ A_k; therefore, k ∈ J. Then j* ≥ k ≥ n = j* + 1, which results in a contradiction.) Conversely, suppose ω belongs to A_j for infinitely many j's. As before, let J = {j ≥ 1 : ω ∈ A_j}. Since J is infinite, for every n there exists j ∈ J such that j ≥ n. Therefore, for each n, there exists j ≥ n such that ω ∈ A_j ⊂ (⋃_{k=n}^{∞} A_k), which implies that ω ∈ ⋃_{k=n}^{∞} A_k for all n; thus, ω ∈ A*. Thus, we have shown that A* = {ω ∈ Ω : ω ∈ A_j for infinitely many j}.

2.19 a) By condition (1) in the definition of a σ-algebra, if A_n ∈ A for n = 1, 2, . . . , then A_n^c ∈ A for n = 1, 2, . . . . Then, by (2), it follows that (⋂_{n=1}^{∞} A_n^c) ∈ A, which, in turn, implies that (⋂_{n=1}^{∞} A_n^c)^c ∈ A, by (1). Thus, by de Morgan's law, ⋃_{n=1}^{∞} A_n = (⋂_{n=1}^{∞} A_n^c)^c ∈ A.
b) Let 2^Ω denote the collection of all possible subsets of Ω. Then:
(1) If A ∈ 2^Ω, then A ⊂ Ω, so A^c = {ω ∈ Ω : ω ∉ A} ⊂ Ω; thus A^c ∈ 2^Ω.
(2) If A_n ∈ 2^Ω for all n ≥ 1, then A_n ⊂ Ω for all n ≥ 1, so ⋂_{n=1}^{∞} A_n = {ω ∈ Ω : ω ∈ A_n for all n ≥ 1} ⊂ Ω; therefore, (⋂_{n=1}^{∞} A_n) ∈ 2^Ω.
(3) If A_n ∈ 2^Ω for all n ≥ 1, then A_n ⊂ Ω for all n ≥ 1, so ⋃_{n=1}^{∞} A_n = {ω ∈ Ω : ω ∈ A_n for some n ≥ 1} ⊂ Ω; therefore, (⋃_{n=1}^{∞} A_n) ∈ 2^Ω.
Thus, 2^Ω is a σ-algebra.
c) Let A = {∅, Ω}; then for each A ∈ A, either A = ∅ or A = Ω. Since ∅^c = Ω and Ω^c = ∅, we have that if A ∈ A then A^c ∈ A. If A_n ∈ A for all n ≥ 1, then either A_n = Ω for all n ≥ 1, in which case ⋂_{n=1}^{∞} A_n = Ω ∈ A, or A_n = ∅ for some n ≥ 1, in which case ⋂_{n=1}^{∞} A_n = ∅ ∈ A. By (a), it follows also that ⋃_{i=1}^{∞} A_i ∈ A; thus, A = {∅, Ω} is a σ-algebra.
d) Let A = {Ω, E, E^c, ∅}. Assuming that E ≠ Ω and E ≠ ∅ (otherwise, the problem reduces to (c)), note that since (E^c)^c = E and ∅^c = Ω, condition (1) in the definition of σ-algebra is satisfied by A at once. Next take an arbitrary sequence of events A_n ∈ A for all n ≥ 1; then either it contains at least one ∅, or both E and E^c, in which case ⋂_{i=1}^{∞} A_i = ∅, or not. In the latter case there are three possibilities: either the sequence {A_i : i ≥ 1} contains only Ω and (at least one) E as its elements, or only Ω and (at least one) E^c, or only Ω's. Then ⋂_{i=1}^{∞} A_i is equal to E or E^c or Ω, respectively. In each case we have that ⋂_{i=1}^{∞} A_i ∈ A, i.e., condition (2) of the definition is satisfied. By (a), condition (3) is then satisfied as well. Thus, A is a σ-algebra.
e) If D = {collection of all unions (including the empty union) of E_1, E_2, . . . }, where E_1, E_2, . . . are mutually exclusive events such that ⋃_{i=1}^{∞} E_i = Ω, then every set A ∈ D has the form A = ⋃_{i∈I} E_i for some subset I of {1, 2, 3, . . . }. Then A^c = ⋃_{i∈I^c} E_i, implying that A^c ∈ D, and condition (1) of the definition of σ-algebra holds. Next, for an arbitrary sequence {A_1, A_2, . . . } of sets in D, there exists a sequence {I_1, I_2, . . . } of subsets of {1, 2, . . . } such that A_j = ⋃_{i∈I_j} E_i for each j. Then ⋃_{j=1}^{∞} A_j = ⋃_{j=1}^{∞} (⋃_{i∈I_j} E_i) = ⋃_{i∈U} E_i, where U = ⋃_{j=1}^{∞} I_j. Thus, (⋃_{j=1}^{∞} A_j) ∈ D, i.e., condition (3) in the definition of σ-algebra is satisfied. Condition (2) then follows at once from conditions (1), (3) and de Morgan's law: ⋂_{j=1}^{∞} A_j = (⋃_{j=1}^{∞} A_j^c)^c.
f) If Ω is an infinite set, then the class D of subsets of Ω, given by D = {E ⊂ Ω : either E or E^c is finite},


is not a σ-algebra. Indeed, since Ω is infinite, one can choose in Ω an infinite sequence {x_1, x_2, x_3, . . . } of distinct points. Let E_i = {x_i} for all i = 1, 2, . . . . Then E_i ∈ D for all i = 1, 2, . . . . Consider the set G = {x_2, x_4, x_6, x_8, . . . } = {x_{2k} : k = 1, 2, . . . } ⊂ Ω. Then G is infinite and G^c ⊃ {x_1, x_3, x_5, . . . }, so G^c is also infinite. Thus, G ∉ D. On the other hand, G = ⋃_{k=1}^{∞} E_{2k}, where E_{2k} ∈ D for all k = 1, 2, . . . . Therefore, condition (3) of the definition of σ-algebra is violated.

2.2 Axioms of Probability

Basic Exercises

2.20 The sample space is Ω = {ω_1, . . . , ω_9}, where ω_1 = “Atlantic Ocean,” ω_2 = “Pacific Ocean,” ω_3 = “Gulf of Mexico,” ω_4 = “Great Lakes,” ω_5 = “Other lakes,” ω_6 = “Rivers and canals,” ω_7 = “Bays and sounds,” ω_8 = “Harbors,” ω_9 = “Other.”
a) The probability assignment on the finite space Ω is legitimate, since P({ω_i}) ≥ 0 for all i ∈ {1, . . . , 9} and

Σ_{i=1}^{9} P({ω_i}) = 0.011 + 0.059 + 0.271 + 0.018 + 0.003 + 0.211 + 0.094 + 0.099 + 0.234 = 1.

b) P({ω_1, ω_2}) = P({ω_1}) + P({ω_2}) = 0.011 + 0.059 = 0.07.
c) P({ω_4, ω_5, ω_8}) = 0.018 + 0.003 + 0.099 = 0.12.
d) P({ω_3, ω_7, ω_8, ω_9}) = 0.271 + 0.094 + 0.099 + 0.234 = 0.698.

23

CHAPTER 2. Mathematical Probability

sum of all distinct outcomes of the experiment equals one. (For #5, note that /∞ of probabilities 1 k =p· p(1 − p) k=0 1−(1−p) = 1, and #1 is a special case of #5 with p = 1/2). Assignment #3 is not legitimate since the sum of probabilities of all possible outcomes is equal / 1 = ∞. to ∞ k=2 k b) Let An denote the event that success occurs by the nth attempt. Then P (A2 ) = P ({s}) + P ({f, s}), implying that: #1: P (A2 ) = 3/4. #2: P (A2 ) = 2/3. #4: P (A2 ) = 0. #5: P (A2 ) = 2p − p2 = 1 − (1 − p)2 . Note also that P (A4 ) = P (A2 ) + P ({f, f, s}) + P ({f, f, f, s}). Then #1: P (A4 ) = 15/16. #2: P (A4 ) = 1. #4: P (A4 ) = 1. 3 ) 1 − (1 − p)4 = 1 − (1 − p)4 . #5: P (A4 ) = p (1 − p)k = p · 1 − (1 − p) k=0 Similarly, P (A6 ) = P (A4 ) + P ({f, f, f, f, s}) + P ({f, f, f, f, f, s}). Thus, #1: P (A6 ) = 63/64. #2: P (A6 ) = 1. #4: P (A6 ) = 1. 5 ) 1 − (1 − p)6 (1 − p)k = p · #5: P (A6 ) = p = 1 − (1 − p)6 . 1 − (1 − p) k=0 c) If p = 0 assignment #5 is not legitimate, since every possible outcome has probability zero, implying that P (Ω) = 0, which is impossible. Note that p = 0 means that success never occurs. Let us add to Ω an outcome {f, f, f, · · · } and assign it probability one. Then the probability assignment, extended in that fashion, becomes legitimate. 2.24 a) Note that B = (B ∩ A) ∪ (B ∩ Ac ), where B ∩ A and B ∩ Ac are mutually exclusive. Then, by the additivity axiom of probability, P (B) = P ((B ∩ A) ∪ (B ∩ Ac )) = P (B ∩ A) + P (B ∩ Ac ). b) Note that A ∪ B = A ∪ (B ∩ Ac ), where the events A and B ∩ Ac are mutually exclusive, then, by the additivity axiom of probability, P (A ∪ B) = P (A ∪ (B ∩ Ac )) = P (A) + P (B ∩ Ac ). c) From part (a), P (A ∩ B) = P (B) − P (B ∩ Ac ), where P (B ∩ Ac ) = P (A ∪ B) − P (A) by part (b). Thus, if A ∪ B = Ω, then P (A ∩ B) = P (B) − (P (A ∪ B) − P (A)) = P (B) − 1 + P (A). 2.25 a) The probability assignment is legitimate because probability of each outcome is nonnegative and probabilities of all outcomes sum to 1, since there are 36 possible outcomes and each has probability 1/36. b) P (A2 ) = P (A12 ) = 1/36, P (A3 ) = P (A11 ) = 2/36, P (A4 ) = P (A10 ) = 3/36, P (A5 ) = P (A9 ) = 4/36, P (A6 ) = P (A8 ) = 5/36, P (A7 ) = 6/36. c) The answers will vary. For example, let us take P ((1, 1)) = 1, while the remaining 35 outcomes are assigned probability zero. Then P (A2 ) = 1, whereas P (An ) = 0 for all n = 3, · · · , 12. d) If the die is balanced, then the probability assignment from part (c) is not reasonable, whereas the assignment from part (a) is reasonable since, due to obvious symmetry, each face

2.2 Axioms of Probability

24

of the die is equally likely to occur. / 2.26 a) Since 0.01 ≥ 0 and 100 k=1 0.01 / = 1, the assignment is legitimate. b) P (A) = 50(0.01) = 0.5; P (B) = 31 k=1 0.01 = 0.31, since 31 is the largest integer that is less than or equal to 10π. Note also that the number of primes between 1 and 100 is equal to 25. Hence P (C) = 25(0.01) = 0.25. 2.27 a)/Ω = {(x, y) : x, y ∈ {1, 2, 3, 4}}. b) 1/16. c) 1 − 4k=1 P ((k, k)) = 1 − (4/16) = 3/4. 2.28 a) 1/N . b) m/N . Theory Exercises 2.29 a) If P (Ai ) = 1/6 for each i and n = 10, then (∗) tells us that P (A1 ∪ · · · ∪ A10 ) ≤

10 , 6

which is trivial, since probability of any event must always be less than or equal to*1. On the other hand, if P (Ai ) = 1/60 for each i and n = 10, then inequality (∗) states that P ( 10 i=1 Ai ) ≤ 10/60 = 1/6, suggesting that the union of the ten events occurs no more frequently than 1/6 of the time, thus (∗) indeed provides some * useful information. Similarly, if P (Ai ) = 1/6 and n = 4, then inequality (∗) states that P ( 4i=1 Ai ) ≤ 4/6 = 2/3, which provides useful upper bound on the probability of occurrence of the union of the four events. b) Upon using Exercise 2.24(b) and then Exercise 2.24(a), one obtains that P (A1 ∪ A2 ) = P (A1 ) + P (A2 ∩ Ac1 ) = P (A1 ) + (P (A2 ) − P (A2 ∩ A1 )) ≤ P (A1 ) + P (A2 ),

where the last inequality follows from the nonnegativity axiom, i.e. since P (A2 ∩ A1 )* ≥ 0. Let us proceed further by induction. Suppose (∗) holds for all 2 ≤ n ≤ k. Let B = kn=1 An . Then k k+1 , ) An ) = P (B ∪ Ak+1 ) ≤ P (B) + P (Ak+1 ) ≤ P (An ) + P (Ak+1 ). P( n=1

n=1
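Boole's inequality (∗) from Exercise 2.29 can also be illustrated on an explicit equally-likely model. A sketch; the sample space size and the events are ours:

```python
from fractions import Fraction

# Equally likely outcomes 0..35; events are arbitrary (overlapping) subsets.
omega = range(36)
P = lambda E: Fraction(len(E), len(omega))

events = [set(range(i, i + 10)) for i in (0, 5, 12, 20)]
union = set().union(*events)

# P(union of A_i) <= sum of P(A_i), with strict inequality here because the events overlap.
assert P(union) <= sum(P(E) for E in events)
print(P(union), "<=", sum(P(E) for E in events))   # 5/6 <= 10/9
```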

Advanced Exercises

2.30 a) Since P((i, j)) ≥ 0 for all i, j ∈ Z+ and

P(Ω) = Σ_{i,j=0}^{∞} P((i, j)) = Σ_{i,j=0}^{∞} e^{−5} 2^i 3^j/(i! j!) = e^{−5} (Σ_{i=0}^{∞} 2^i/i!)(Σ_{j=0}^{∞} 3^j/j!) = e^{−5} e^2 e^3 = 1,

the probability assignment is legitimate.
b) P(point is on the x-axis) = P({(i, 0) : i = 0, 1, 2, . . . }) = Σ_{i=0}^{∞} e^{−5} 2^i 3^0/(i! 0!) = e^{−5} e^2 = e^{−3}.
c) P({(i, j) : i, j ∈ {0, 1, 2, 3}}) = Σ_{i,j=0}^{3} e^{−5} 2^i 3^j/(i! j!) = e^{−5} (2^0/0! + 2^1/1! + 2^2/2! + 2^3/3!)(3^0/0! + 3^1/1! + 3^2/2! + 3^3/3!) = e^{−5} · (19/3) · 13 = 247/(3e^5) ≈ 0.5548.
d) For each i ∈ Z+, P({(i, j) : j ∈ {0, 1, 2, . . . }}) = Σ_{j=0}^{∞} e^{−5} 2^i 3^j/(i! j!) = e^{−5} (2^i/i!) e^3 = e^{−2} 2^i/i!.
e) For each j ∈ Z+, P({(i, j) : i ∈ {0, 1, 2, . . . }}) = Σ_{i=0}^{∞} e^{−5} 2^i 3^j/(i! j!) = e^{−5} (3^j/j!) e^2 = e^{−3} 3^j/j!.
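The series manipulations in Exercise 2.30 can be checked numerically by truncating the double sum. A sketch; the truncation level is ours and the tails are negligible at that level:

```python
from math import exp, factorial

p = lambda i, j: exp(-5) * 2**i * 3**j / (factorial(i) * factorial(j))

N = 60   # truncation level
total = sum(p(i, j) for i in range(N) for j in range(N))
x_axis = sum(p(i, 0) for i in range(N))
square = sum(p(i, j) for i in range(4) for j in range(4))

print(round(total, 6))             # ~1.0                 (part a)
print(round(x_axis, 6), exp(-3))   # ~0.049787 = e^-3     (part b)
print(round(square, 4))            # ~0.5548              (part c)
```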

2.31 a) For each m, n ∈ N, we have that P({(ω_{1m}, ω_{2n})}) = p_{1m} p_{2n} ≥ 0, since both p_{1m} ≥ 0 and p_{2n} ≥ 0. Also Ω = Ω_1 × Ω_2 is a countable space, since both Ω_1 and Ω_2 are countable, and

Σ_{m,n=1}^{∞} P({(ω_{1m}, ω_{2n})}) = Σ_{m,n=1}^{∞} p_{1m} p_{2n} = (Σ_{m=1}^{∞} p_{1m})(Σ_{n=1}^{∞} p_{2n}) = 1 · 1 = 1.

Therefore, by Proposition 2.3, the probability assignment is legitimate.
b) P({(ω_{11}, ω_{21}), (ω_{12}, ω_{24})}) = p_{11} p_{21} + p_{12} p_{24}.
c) P({(ω_{1m}, ω_{2n}) : n ∈ N}) = Σ_{n=1}^{∞} P({(ω_{1m}, ω_{2n})}) = p_{1m} (Σ_{n=1}^{∞} p_{2n}) = p_{1m}.
d) P({(ω_{1m}, ω_{2n}) : m ∈ N}) = Σ_{m=1}^{∞} P({(ω_{1m}, ω_{2n})}) = p_{2n} (Σ_{m=1}^{∞} p_{1m}) = p_{2n}.

2.32 The answers will vary. As an example, suppose Q = {x_1, x_2, x_3, . . . } is some enumeration of the rational numbers. Let P({x_k}) = e^{−1}/(k − 1)!; then P({ω}) > 0 for all ω ∈ Q and Σ_{k=1}^{∞} P({x_k}) = e^{−1} Σ_{k=1}^{∞} 1/(k − 1)! = e^{−1} Σ_{k=0}^{∞} 1/k! = e^{−1} e = 1.

2.33 Suppose that Ω = [0, 1] × [0, 1] is a countable set. Then, by Corollary 2.1, Σ_{ω∈Ω} P({ω}) = 1. On the other hand, since P({ω}) = 0 for every ω ∈ Ω, it follows that Σ_{ω∈Ω} P({ω}) = Σ_{ω∈Ω} 0 = 0,

which contradicts the previous condition. Since legitimacy of the probability assignment in the problem is taken for a fact, one has to conclude that the set [0, 1] × [0, 1] is uncountable. 2.34 Bruno de Finetti’s thought experiment: Suppose you’re willing to pay $x to be promised that, if the Red Sox win next year’s World Series, you will receive $1. a) If x > 1, then your opponent should sell the promise to you, because if the Red Sox win next year’s World Series, then your opponent’s net profit is ($x − $1) > $0, whereas if the Red Sox lose next year’s World Series, then your opponent’s net profit is $x > $1. Thus, in both cases your opponent makes a positive profit. b) If x < 0, your opponent’s optimal strategy is to buy the promise from you. Indeed, if the Red Sox win next year’s World Series, then your opponent’s net profit is 1 + |x| dollars. If the Red Sox lose next year’s World Series, then your opponent’s net profit is |x| dollars. Thus, regardless of the outcome of the World Series, your opponent is making a positive profit. c) If x < 1 but you’re certain that the Red Sox will win, then you fear your opponent’s decision to buy the promise from you, because in the latter case you have to sell the promise, which is worth $1, for a lower price. In other words, you’ll suffer a loss of (1 − x) > 0 dollars. However, if x = 1, you are not losing money on the deal. On the other hand, if x > 0 but you’re certain that the Red Sox will lose, you fear your opponents decision to sell you the promise, as you will have to pay a positive amount of money $x for the promise that’s worth nothing. On the other hand, if x = 0 then you pay nothing for the promise, thus have nothing to lose when the Red Sox lose the World Series. d) Let x denote the price of the promise that corresponds to the Red Sox, let y denote the price of the promise that corresponds to the Diamondbacks, and let z denote the price of the promise that corresponds to the win by either the Red Sox or the Diamondbacks. Then, if x + y > z, your opponent’s optimal strategy is to sell to you both the promise for the Red Sox and the promise for the Diamondbacks, but to buy from you the promise concerning a victory by either the Red Sox or the Diamondbacks. Indeed, if the Red Sox win the World Series, your opponent’s profit will be ($x − $1) + $y + ($1 − $z) > $0, since he sells the promise worth $1 for $x, plus he sells the promise that’s worth nothing for $y, but he pays $z for the promise worth $1. Similarly, one shows that if the Diamondbacks win, your opponent’s profit is $x + ($y − $1) + ($1 − $z) > $0. Finally, if the winner is neither the Red Sox nor the Diamondbacks, then your opponent essentially sells you two promises that are worth nothing for a total of ($x + $y) but buys a worthless promise for $z, thus, his net profit is $x + $y − $z > $0. Thus, irrespective of the outcome of the World Series, your opponent ends up with a positive profit of $x + $y − $z. e) Let x, y, z be as in part (d), but now assume that x + y < z. Your opponent’s optimal strategy is to sell the promise (priced at $z) concerning a victory by either the Red Sox or the Diamondbacks but to buy the promise for the Red Sox (priced at $x) and the promise for the Diamondbacks (priced at $y). Indeed, if the Red Sox win the World Series, your opponent’s net profit is ($z − $1) − $y + ($1 − $x) > $0, since he essentially sold the promise, which is worth $1, for $z, and he paid $y for the worthless promise, plus he paid $x for the promise that’s worth $1. 
Similarly, one shows that if the Diamondbacks win, then your opponent’s profit is ($z − $1) + ($1 − $y) − $x > $0. Finally, if both the Red Sox and the Diamondbacks lose, then your opponent’s profit under this strategy is $z − $x − $y > $0. Thus, irrespective of the outcome of the World Series, your opponent has a positive profit of $z − $x − $y.

2.3 Specifying Probabilities

Basic Exercises
2.35 a) 18/38 = 9/19 ≈ 0.4737; b) 9/19; c) 2/38 = 1/19 ≈ 0.0526; d) 20/38 = 10/19 ≈ 0.5263; e) 10/19; f) 36/38 ≈ 0.9474
2.36 a) All components are identical and each one can fail with probability 0.5. b) Note that Ω = {(x1, ..., x5) : xj ∈ {s, f}, 1 ≤ j ≤ 5}, where P((x1, ..., x5)) = 2⁻⁵ = 1/32; then, for each event E,
P(E) = (number of outcomes in E)/32.
c) 6/32 = 0.1875; d) 1 − (1/32) = 31/32 = 0.96875; e) 1/32 = 0.03125

2.37 a) Equal-likelihood model is appropriate if the three blood-type alleles a, b and o occur with the same frequency (i.e. 1/3 of the time) and are equally likely to be inherited by an offspring. b) 2/6 = 1/3 ≈ 0.3333; c) No, because different genotypes do not appear in the population with the same frequency. d) P ({bb, bo}) = 0.007 + 0.116 = 0.123, which is smaller than in (b). 2.38 Joint events in a contingency table are mutually exclusive since for all (i, j) ̸= (k, ℓ), (Ai ∩ Rj ) ∩ (Ak ∩ Rℓ ) = (Ai ∩ Ak ) ∩ (Rj ∩ Rℓ ) = ∅, which is due to Ai ∩ Ak = ∅ when i ̸= k and Rj ∩ Rℓ = ∅ when j ̸= ℓ. 2.39 a) 12; b) 94; c) 42; d) 54; e) 22; f ) Y3 =“The player has 6–10 years of experience”; W2 =“The player weighs between 200 and 300 lb”; W1 ∩ Y2 =“The player weighs under 200 lb and has 1–5 years of experience.” g) P (Y3 ) = 8/94 ≈ 0.0851, or approximately 8.51% of the players on the New England Patriots roster have 6–10 years of experience. P (W2 ) = 54/94 ≈ 0.5745, or approximately 57.45% of the players on the New England Patriots roster weigh between 200 and 300 lb. P (W1 ∩ Y2 ) = 9/94 ≈ 0.0957, or approximately 9.57% of the players on the New England Patriots roster weigh under 200 lb and have 1–5 years of experience. h),i) The joint probability distribution table is given by:

        W1              W2              W3              P(Yj)
Y1      9/94 ≈ .096     22/94 ≈ .234    11/94 ≈ .117    42/94 ≈ .447
Y2      9/94 ≈ .096     25/94 ≈ .266     7/94 ≈ .074    41/94 ≈ .436
Y3      2/94 ≈ .021      5/94 ≈ .053     1/94 ≈ .011     8/94 ≈ .085
Y4      1/94 ≈ .011      2/94 ≈ .021     0                3/94 ≈ .032
P(Wi)   21/94 ≈ .223    54/94 ≈ .574    19/94 ≈ .202    1

2.40 a) Yes. The classical probability model is appropriate since dice are balanced, thus every pair of faces is equally likely to occur (i.e. each pair occurs with probability 1/36).


b) 3/36 ≈ 0.0833; c) 4/36 ≈ 0.1111; d) 5/36 ≈ 0.1389; e) 7/36 ≈ 0.1944 2.41 a) 1/6 ≈ 0.1667; b) 4/6 ≈ 0.6667; c) 1/6 ≈ 0.1667

2.42 a) length[10, 15]/length[0, 30] = 5/30 ≈ 0.1667; b) length[10, 30]/length[0, 30] = 20/30 ≈ 0.6667
2.43 a) P({(x, y) : x² + y² < (1/2)²}) = π(1/2)²/(π · 1²) = 1/4 = 0.25, since the area of a circle of radius R is equal to πR². b) 1 − 0.25 = 0.75; c) The square in question is given by
C = {(x, y) : −1/√2 ≤ x ≤ 1/√2, −1/√2 ≤ y ≤ 1/√2},
where C is inscribed in the disk B = {(x, y) : x² + y² ≤ 1}. Then, since C ⊂ B, it follows that
P(B ∩ Cᶜ) = (Area(B) − Area(C))/Area(B) = (π − (√2)²)/π = 1 − 2/π ≈ 0.3634.
d)
Area({(x, y) : −1/4 < x < 1/4, x² + y² ≤ 1})/Area({(x, y) : x² + y² ≤ 1}) = (2/π) ∫_{−1/4}^{1/4} √(1 − x²) dx
= (4/π) ∫_{0}^{1/4} √(1 − x²) dx = (4/π) [(1/2)x√(1 − x²) + (1/2)arcsin(x)]_{0}^{1/4}
= √15/(8π) + (2/π) arcsin(1/4) ≈ 0.315
2.44 a) A = "The x coordinate of the selected point in the square exceeds 1/3," and P(A) = 2/3. b) B = "The y coordinate of the selected point in the square does not exceed 0.7," and P(B) = 0.7. c) C = "The sum of the two coordinates of the point selected in the square exceeds 1.2" and, since the area of Ω is 1, it follows that
P(C) = Area of the triangle with vertices at (0.2, 1), (1, 1), (1, 0.2) = 0.8²/2 = 0.32.
d) D = "The difference between the two coordinates of the point selected in Ω is less than 1/10," and
P(D) = P({(x, y) ∈ Ω : x − 0.1 < y < x + 0.1}) = (Area(Ω) − Area(T1) − Area(T2))/Area(Ω) = (1 − 0.9²/2 − 0.9²/2)/1 = 0.19,
where T1 denotes the triangle with vertices at (0, 0.1), (0, 1), (0.9, 1) and T2 denotes the triangle with vertices at (0.1, 0), (1, 0), (1, 0.9). e) E = "The x and y coordinates of the selected point in Ω are equal" and P(E) = 0.
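The geometric probabilities in Exercise 2.43 can also be checked by simulation. The following Python sketch (an optional check, not part of the printed solution) draws uniform points from the unit disk by rejection sampling and estimates the probabilities of parts (a) and (d).

```python
# Sketch (optional check): Monte Carlo estimates of the area ratios in Exercise 2.43.
import random

def uniform_in_unit_disk():
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

n = 200_000
within_half = 0       # part (a): distance from the center less than 1/2
in_strip = 0          # part (d): -1/4 < x < 1/4
for _ in range(n):
    x, y = uniform_in_unit_disk()
    if x * x + y * y < 0.25:
        within_half += 1
    if -0.25 < x < 0.25:
        in_strip += 1

print(within_half / n)   # close to 1/4 = 0.25
print(in_strip / n)      # close to sqrt(15)/(8*pi) + (2/pi)*arcsin(1/4) ≈ 0.315
```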


2.45 a) P({(x, y) : 0 ≤ x ≤ 1, x < y ≤ 1}) = (1/2)P(Ω) = 1/2. b) The desired probability is given by
P({(x, y) ∈ Ω : min(x, y) < 1/2}) = (Area(Ω) − Area({(x, y) : 1/2 ≤ x < 1, 1/2 ≤ y < 1}))/Area(Ω) = 1 − (1/2)² = 3/4.
c) 1/4, since the required event is the complement of the event in (b). d) P({(x, y) : 0 ≤ x < 1/2, 0 ≤ y < 1/2}) = 1/4. e) 3/4, since the required event is the complement of the event in (d).
f) P({(x, y) ∈ Ω : 1 ≤ x + y ≤ 1.5}) = (Area(E1) − Area(E2))/Area(Ω) = 1/2 − 1/2³ = 3/8, where E1 denotes the triangle with vertices at (0, 1), (1, 1), (1, 0) and E2 denotes the triangle with vertices at (0.5, 1), (1, 1), (1, 0.5).
g) P({(x, y) ∈ Ω : 1.5 ≤ x + y ≤ 2}) = Area(E2) = 2⁻³ = 1/8.
2.46 Note that Ω = (0, 1). Then
a) P([0.7, 0.8)) = 0.1;
b) P(∪_{k=0}^{9} [0.1k + 0.07, 0.1k + 0.08)) = 10(0.01) = 0.1;
c) 0.105. First note that the set
{x ∈ (0, 1) : √x ∈ ∪_{k=0}^{9} [0.1k + 0.07, 0.1k + 0.08)} = ∪_{k=0}^{9} [(0.1k + 0.07)², (0.1k + 0.08)²),
implying that the probability that the square root of the selected point has second decimal digit equal to 7 is
Σ_{k=0}^{9} ((0.1k + 0.08)² − (0.1k + 0.07)²) = Σ_{k=0}^{9} (0.01)(0.2k + 0.15) = 0.002 × 45 + 0.015 = 0.105.
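A short simulation confirms the answer to 2.46(c). The Python sketch below (an optional check, not part of the printed solution) estimates the probability that the second decimal digit of √U equals 7 when U is uniform on (0, 1).

```python
# Sketch (optional check): second decimal digit of sqrt(U) for uniform U on (0, 1).
import random

n = 1_000_000
hits = sum(1 for _ in range(n) if int(random.random() ** 0.5 * 100) % 10 == 7)
print(hits / n)   # close to 0.105
```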

2.47 a) First note that Volume({(x, y, z) : 1/16 < x² + y² + z² ≤ 1}) = Volume of the unit ball − Volume of the ball {(x, y, z) : x² + y² + z² ≤ 1/16} = (4/3)π − (4/3)π(1/4)³, since the volume of a ball of radius R equals (4/3)πR³. Therefore, the required probability equals
((4/3)π − (4/3)π(1/4)³)/((4/3)π) = 1 − 1/64 = 63/64 = 0.984375.
b) The cube of side length 1 centered at the origin is a subset of the unit ball, thus the required probability is given by
Volume(Cube)/Volume(Unit ball) = 1³/((4/3)π) = 3/(4π) ≈ 0.2387.
c) 0, since the volume of the surface of the 3-dimensional ball is 0.


2.48 a) Ω = {(x1, x2) : 0 ≤ x1 ≤ b, 0 ≤ x2 ≤ h − hx1/b} is the triangle of base b and height h with vertices at (0, 0), (b, 0), (0, h). Consider the triangle Tx = {(x1, x2) ∈ Ω : x2 > x}. Then Tx has vertices at (0, x), (b(1 − x/h), x), (0, h) and represents the complement of the event of interest. Therefore, the required probability is equal to
(Area(Ω) − Area(Tx))/Area(Ω) = 1 − Area(Tx)/Area(Ω) = 1 − ((1/2)(h − x)b(1 − x/h))/((1/2)hb) = 1 − (1 − x/h)² = (x/h)(2 − x/h).
b)
(Area(Tx) − Area(Ty))/Area(Ω) = ((1/2)(h − x)b(1 − x/h) − (1/2)(h − y)b(1 − y/h))/((1/2)hb) = (1 − x/h)² − (1 − y/h)² = ((y − x)/h)(2 − (x + y)/h), for all h ≥ y ≥ x.

2.49 Let X be the distance from a point, randomly selected on a given side of an equilateral triangle (with side length ℓ), to the opposite vertex. Then
a)
P(X ≤ x) = 0,                  if x < (√3/2)ℓ,
         = √(4x²/ℓ² − 3),       if (√3/2)ℓ ≤ x ≤ ℓ,
         = 1,                  if x > ℓ.
Indeed, note that the range of possible values of X is [(√3/2)ℓ, ℓ], since the longest distance X is achieved when the point is chosen at a vertex, whereas the smallest value of X is achieved when the point is chosen exactly in the middle of the side of the triangle (in which case X equals the height h of the triangle, which is ℓ sin(π/3) = ℓ√3/2). Next, for x ∈ [(√3/2)ℓ, ℓ], note that X is at most x if and only if the point is chosen on the side of the triangle within distance d = √(x² − h²) of the middle point of that side. Therefore, the required probability is given by
|[ℓ/2 − d, ℓ/2 + d]| / |[0, ℓ]| = 2d/ℓ = 2√(x² − h²)/ℓ = 2√(x² − (ℓ√3/2)²)/ℓ = √(4x²/ℓ² − 3),
where | · | denotes the 1-dimensional volume (i.e. length) of the intervals of interest.
b) For all x < y,
P(x ≤ X ≤ y) = 0,                                    if y < (√3/2)ℓ or x > ℓ,
            = √(4y²/ℓ² − 3),                          if x < (√3/2)ℓ and y ∈ [(√3/2)ℓ, ℓ],
            = √(4y²/ℓ² − 3) − √(4x²/ℓ² − 3),          if (x, y) ∈ [(√3/2)ℓ, ℓ]²,
            = 1 − √(4x²/ℓ² − 3),                      if x ∈ [(√3/2)ℓ, ℓ] and y > ℓ,
            = 1,                                     if x < (√3/2)ℓ and y > ℓ.
Clearly P(X = x) = 0 for every point x, since the 1-dimensional volume of any single point is 0. Thus, by the additivity axiom, it follows that P(X ≤ x) = P(X < x) + P(X = x) = P(X < x) + 0 = P(X < x) for all x. By the additivity axiom again,
P(X ≤ y) = P({X < x} ∪ {x ≤ X ≤ y}) = P(X < x) + P(x ≤ X ≤ y),


implying that P(x ≤ X ≤ y) = P(X ≤ y) − P(X < x) = P(X ≤ y) − P(X ≤ x), and the required result follows by plugging in the answer to part (a).
2.50 a) P(white) ≈ 3194005/4058814 ≈ 0.7869, P(black) ≈ 622598/4058814 ≈ 0.1534, and P(other) ≈ (4058814 − 3194005 − 622598)/4058814 ≈ 0.0597; b) The probabilities are empirical, since we use proportions of occurrence of events for assignment of probability to those events.
2.51 The probability specified by the engineer is subjective, as it relies on incomplete information about the company, incomplete knowledge of the job market and limited interviewing experience.
2.52 The type of probability specified by the realtor is subjective (but with a hint of empirical evidence behind it if the realtor has data on how long comparable houses stay on the market).
Advanced Exercises
2.53 0. Let {r1, r2, ...} be an enumeration of all rational points in (0, 1). Then, by the Kolmogorov additivity axiom and Proposition 2.5 in the textbook,
P(Q ∩ (0, 1)) = P(∪_{i=1}^{∞} {ri}) = Σ_{i=1}^{∞} P({ri}) = Σ_{i=1}^{∞} |{ri}|/|(0, 1)| = Σ_{i=1}^{∞} 0/1 = 0,
where | · | stands for the length of the interval in question (i.e. it denotes the one-dimensional volume of the subset).
2.54 Suppose the converse, i.e. that there exists a countably infinite sample space Ω = {ω1, ω2, ...} which permits an equal-likelihood probability model. Then, for some p ≥ 0, P({ωi}) = p for all i ∈ N. Then, by the additivity axiom,
P(Ω) = Σ_{i=1}^{∞} P({ωi}) = Σ_{i=1}^{∞} p = ∞ if p > 0, and = 0 if p = 0.
But this contradicts the Kolmogorov certainty axiom that P(Ω) = 1. Therefore, equal-likelihood probability models are not possible in the case of countably infinite sample spaces.
2.55 1/2. First note that the three line segments can form a triangle whenever the length of each of them is less than the sum of the lengths of the other two. Therefore, the required event of interest is the set T = {(x, y, z) ∈ (0, ℓ)³ : x < y + z, y < x + z, z < x + y} and the required probability is given by
|T|/|(0, ℓ)³| = |{(x, y, z) ∈ (0, ℓ)³ : x < y + z, y < x + z, z < x + y}|/|(0, ℓ)³| = (1/ℓ³)(|(0, ℓ)³| − 3|{(x, y, z) ∈ (0, ℓ)³ : z > x + y}|),


where | · | denotes the three-dimensional volume of a set. Therefore,
P(T) = 1 − (3/ℓ³) ∫∫∫_{z > x + y} dz dy dx = 1 − (3/ℓ³) ∫_{0}^{ℓ} ∫_{0}^{ℓ−x} (ℓ − (x + y)) dy dx = 1 − 3(ℓ³/6)/ℓ³ = 1/2.
4.63 a) If r ≤ 1/2, the extinction probability equals 1; if r > 1/2, then the extinction probability equals (1 − r)/r. Let En be the event that the amoeba population is eventually extinct if the initial size of the population is n, and let pn = P(En). Then
p1 = 1 · P(1st amoeba dies) + P(E1 | 1st amoeba splits) P(1st amoeba splits) = (1 − r) + p2 r = (1 − r) + p1² r.
Note that the above quadratic equation r p1² − p1 + (1 − r) = 0 has the following roots: 1 and (1 − r)/r. Thus, when r ≤ 1/2, (1 − r)/r ≥ 1, implying that the required extinction probability p1 equals 1. On the other hand, if r > 1/2, then the required extinction probability equals (1 − r)/r.
b) For r = 1/3 and r = 1/2, the extinction probability is 1. For r = 3/4, the extinction probability equals 1/3. For r = 9/10, the extinction probability equals 1/9.
c) We use contextual independence to conclude that P(E1 | 1st amoeba splits) = p2 = p1².
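The extinction probabilities in 4.63(b) can be checked by simulating the branching process directly. The Python sketch below (an optional check; the step and population caps are arbitrary truncation choices) estimates the extinction probability for the values of r considered in part (b).

```python
# Sketch (optional check): each amoeba dies with probability 1 - r or splits into
# two with probability r; estimate the probability that the population dies out.
import random

def goes_extinct(r, max_steps=1_000, cap=5_000):
    population = 1
    for _ in range(max_steps):
        if population == 0:
            return True
        if population >= cap:       # a population this large almost surely survives when r > 1/2
            return False
        population = sum(2 for _ in range(population) if random.random() < r)
    return population == 0

for r in (1/3, 1/2, 3/4, 9/10):
    trials = 1_000
    est = sum(goes_extinct(r) for _ in range(trials)) / trials
    print(f"r = {r:.2f}: simulated ≈ {est:.3f}, formula min(1, (1-r)/r) = {min(1.0, (1 - r) / r):.3f}")
```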

4.64 Note that in every repetition of the experiment, either event E or Eᶜ occurs. Then {n(E) = k} = {E occurs in k repetitions and Eᶜ occurs in n − k repetitions}. There are C(n, k) ways to pick a set of those repetitions where E occurs. Therefore, using independence of the repetitions, we obtain that
P(n(E) = k) = C(n, k) (P(E))^k (P(Eᶜ))^(n−k) = C(n, k) p^k (1 − p)^(n−k),
where k = 0, 1, ..., n.
4.65 C(2n, n)/4^n. Let X be the number of heads obtained by Jan in n independent tosses, and let Y be the number of heads obtained by Jean in n independent tosses. Then
P(X = Y) = Σ_{j=0}^{n} P({X = j} ∩ {Y = j}) = Σ_{j=0}^{n} P(X = j) P(Y = j),
where, by Exercise 4.64, P(X = j) = P(Y = j) = C(n, j)/2^n. Therefore,
P(X = Y) = Σ_{j=0}^{n} (C(n, j)/2^n)(C(n, j)/2^n) = (1/4^n) Σ_{j=0}^{n} C(n, j) C(n, n − j) = C(2n, n)/4^n,
since C(n, j) = C(n, n − j) and the last equality holds by Vandermonde's identity (Exercise 3.48).
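The identity P(X = Y) = C(2n, n)/4^n from Exercise 4.65 is easy to verify numerically. The following Python sketch (an optional check, not part of the printed solution) compares the direct sum with the closed form for several values of n.

```python
# Sketch (optional check): P(X = Y) for two independent Binomial(n, 1/2) counts.
from math import comb

def prob_equal_heads(n):
    return sum((comb(n, j) / 2**n) ** 2 for j in range(n + 1))

for n in (1, 5, 10, 20):
    print(n, prob_equal_heads(n), comb(2 * n, n) / 4**n)   # the two columns agree
```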

4.4 Bayes's Rule

Basic Exercises

4.66 a) 0.045. b) (0.045 × 0.514)/(0.045 × 0.514 + 0.174 × 0.486) ≈ 0.2148, by Bayes's rule. c) 4.5% of American females 7 years old or older play golf. Approximately 21.48% of Americans who are 7 years old or older and play golf are female.
4.67 (.54 × .207)/(.83 × .127 + .54 × .207 + .43 × .22 + .37 × .165 + .27 × .109 + .2 × .172) ≈ 0.256, by Bayes's rule. Thus, approximately 25.6% of adult moviegoers are between 25 and 34 years old.
4.68 (0.6 × 0.4)/(0.6 × 0.4 + 0.8 × 0.32 + 0.3 × 0.28) ≈ 0.4138
4.69 0.0989. Let N be the event that a selected policyholder is normal, and let A be the event that a randomly selected policyholder has an accident in a given year. Then, by Bayes's rule,
P(Nᶜ | Aᶜ) = P(Aᶜ | Nᶜ)P(Nᶜ)/(P(Aᶜ | Nᶜ)P(Nᶜ) + P(Aᶜ | N)P(N)) = (0.4 × 0.18)/(0.8 × 0.82 + 0.4 × 0.18) ≈ 0.0989.
4.70 495/8784 ≈ 0.05635. Upon using the notation and result of Exercise 4.54, we obtain that
P(X = 4 | win) = P(win | X = 4)P(X = 4)/P(win) = ((3/36)/(3/36 + 6/36) × (3/36))/(244/495) = 495/(244 × 36) = 495/8784 ≈ 0.05635.

4.71 a) 1/6, 1/3, 1/2. Let I denote the event that the first urn is selected, II the event that the second urn is selected, III the event that the third urn is selected, and R the event that the randomly chosen marble is red. Then, by Bayes's rule,
P(I | R) = ((1/10) × (1/3))/((1/10) × (1/3) + (2/10) × (1/3) + (3/10) × (1/3)) = 1/6,
P(II | R) = ((2/10) × (1/3))/((1/10) × (1/3) + (2/10) × (1/3) + (3/10) × (1/3)) = 2/6 = 1/3,
P(III | R) = ((3/10) × (1/3))/((1/10) × (1/3) + (2/10) × (1/3) + (3/10) × (1/3)) = 3/6 = 1/2.
b) 1/6, 1/3, 1/2; the result is not affected by the addition of one million green marbles to each urn. Indeed,
P(I | R) = ((1/1,000,010) × (1/3))/((1/1,000,010) × (1/3) + (2/1,000,010) × (1/3) + (3/1,000,010) × (1/3)) = 1/6.
Similarly, one shows that P(II | R) and P(III | R) remain unchanged.
c) 3/8, 1/3, 7/24. The result will be affected by the addition of the extra 1,000,000 green marbles, with the new probabilities given by 0.3333337, 1/3, 0.333333; in general, as the number of extra green marbles added to the urns increases, each required probability converges to 1/3. Let G denote the event that the selected marble is green. Then
P(I | G) = ((9/10) × (1/3))/((9/10) × (1/3) + (8/10) × (1/3) + (7/10) × (1/3)) = 9/24 = 3/8,
P(II | G) = 8/24 = 1/3,   P(III | G) = 7/24.
If the extra 1,000,000 green marbles are added, then
P(I | G) = ((1,000,009/1,000,010) × (1/3))/((1,000,009/1,000,010) × (1/3) + (1,000,008/1,000,010) × (1/3) + (1,000,007/1,000,010) × (1/3)) = 1,000,009/3,000,024 ≈ 0.3333337.
Similarly,
P(II | G) = 1,000,008/3,000,024 = 1/3,   P(III | G) = 1,000,007/3,000,024 ≈ 0.333333.
In general, note that if each urn contains n marbles, where the first urn contains exactly one red marble, the second urn contains exactly two red marbles and the third urn contains exactly three red marbles (and the rest of the marbles are green), then
P(I | G) = ((n − 1)/n × (1/3))/((n − 1)/n × (1/3) + (n − 2)/n × (1/3) + (n − 3)/n × (1/3)) = (n − 1)/(3n − 6) = (1 − 1/n)/(3 − 6/n) → 1/3, as n → ∞.
Similarly,
P(II | G) = (n − 2)/(3n − 6) = 1/3,   P(III | G) = (n − 3)/(3n − 6) = (1 − 3/n)/(3 − 6/n) → 1/3, as n → ∞.
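The posterior probabilities in Exercise 4.71 follow from one generic Bayes computation, which the Python sketch below (an optional check, not part of the printed solution) applies to parts (a)–(c).

```python
# Sketch (optional check): Bayes' rule for the three equally likely urns of Exercise 4.71.

def posterior(likelihoods, priors):
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]

priors = [1/3, 1/3, 1/3]
print(posterior([1/10, 2/10, 3/10], priors))                        # part (a): [1/6, 1/3, 1/2]
print(posterior([1/1_000_010, 2/1_000_010, 3/1_000_010], priors))   # part (b): unchanged
print(posterior([9/10, 8/10, 7/10], priors))                        # part (c), green marble: [3/8, 1/3, 7/24]
```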

4.72 1/2. Let S be the event that the first chosen coin is silver, and let A denote the event that chest A is selected. Similarly define events B, C and D (corresponding to chests B, C and D). Then
P(the other drawer has a silver coin | S) = P(C | S) = (1 × (1/4))/((1/2) × (1/4) + (1/2) × (1/4) + 1 × (1/4) + 0 × (1/4)) = 1/2,
by Bayes's rule (where events A, B, C and D form the needed partition of the sample space).

by the Bayes’ rule (where events A, B, C and D form the needed partition of the sample space). 4.73 0.8033; Let An be the event that n eggs are broken in a randomly selected carton (containing 12 eggs), n = 0, 1, 2, 3. Let B be the event that a randomly selected egg from a carton n is broken. Then, by the Bayes’s rule and upon noting that P (B | An ) = 12 , we obtain that P (A1 | B) =

1 12

× 0.192 +

1 12 2 12

× 0.192 × 0.022 +

3 12

× 0.001

≈ 0.8033

4.74 a) 9/17. Let Ci be the event that coin #i is tossed, i = 1, 2. Then
P(C1 | {HT}) = P({HT} | C1)P(C1)/(P({HT} | C1)P(C1) + P({HT} | C2)P(C2)) = ((1/4) × (1/2))/((1/4) × (1/2) + (2/3 × 1/3) × (1/2)) = 9/17.
b) 9/17, same as in (a).
c) 9/17, since
P(C1 | {HT, TH}) = P({HT, TH} | C1)P(C1)/(P({HT, TH} | C1)P(C1) + P({HT, TH} | C2)P(C2)) = ((1/2) × (1/2))/((1/2) × (1/2) + (2 × 2/3 × 1/3) × (1/2)) = 9/17.


4.75 25/42. Let Hi be the event that the ith toss is a head, i = 1, 2. Then
P(H2 | H1) = P(H1 ∩ H2)/P(H1) = (P(H1 ∩ H2 | C1)P(C1) + P(H1 ∩ H2 | C2)P(C2))/(P(H1 | C1)P(C1) + P(H1 | C2)P(C2))
= ((1/2 × 1/2) × (1/2) + (2/3 × 2/3) × (1/2))/((1/2) × (1/2) + (2/3) × (1/2)) = 25/42 ≈ 0.5952.

Advanced Exercises
4.76 a) 96% of people who have the disease will test positive. 98% of people who do not have the disease will test negative.
b) (0.96 × 0.001)/(0.96 × 0.001 + 0.02 × 0.999) = 48/1047 ≈ 0.046
c) Approximately 4.6% of people testing positive actually have the disease.
d) The prior probability of illness is 0.001, whereas the posterior probability of illness, given a positive test, is 0.046, which is 46 times higher than the prior probability.
e) Let D denote the event that the selected person has the disease, and let +/− denote the corresponding positive/negative result of the test. Then the first step of the algorithm maps (P(D), P(Dᶜ)) and (P(+ | D), P(+ | Dᶜ)) to (10³P(D), 10³P(Dᶜ)) and (10²P(+ | D), 10²P(+ | Dᶜ)) (to work with integers), and the latter pair of vectors is further mapped to (10⁵P(D)P(+ | D), 10⁵P(Dᶜ)P(+ | Dᶜ)). Finally that vector is mapped to
(10⁵P(+ | D)P(D)/(10⁵(P(+ | D)P(D) + P(+ | Dᶜ)P(Dᶜ))), 10⁵P(+ | Dᶜ)P(Dᶜ)/(10⁵(P(+ | D)P(D) + P(+ | Dᶜ)P(Dᶜ)))),
which is nothing but Bayes's rule for computing (P(D | +), P(Dᶜ | +)), but with both the numerator and denominator multiplied by 10⁵ in order to work with integers.
4.77 100 pj Nj /(Σ_{k=1}^{m} pk Nk) %. Let Sj be the event that a randomly selected member of the population belongs to stratum j (j = 1, ..., m). Let A be the event that a randomly selected member of the population has the specified attribute. Let N = Σ_{k=1}^{m} Nk denote the population size. Then
P(Sj | A) = P(A | Sj)P(Sj)/Σ_{k=1}^{m} P(A | Sk)P(Sk) = (pj × Nj/N)/Σ_{k=1}^{m} (pk × Nk/N) = pj Nj/Σ_{k=1}^{m} pk Nk,
i.e. 100 pj Nj /(Σ_{k=1}^{m} pk Nk) % of members having the specified attribute belong to stratum j (1 ≤ j ≤ m).
4.78 a) P(p = 0.4 | 6 heads in 10 tosses) = P(6 heads in 10 tosses | p = 0.4)(1/2)/(P(6 heads in 10 tosses | p = 0.4)(1/2) + P(6 heads in 10 tosses | p = 0.55)(1/2)) = C(10, 6)0.4⁶0.6⁴/(C(10, 6)0.4⁶0.6⁴ + C(10, 6)0.55⁶0.45⁴) ≈ 0.3186; P(p = 0.55 | 6 heads in 10 tosses) ≈ 1 − 0.3186 = 0.6814.
b) 0.0611, 0.9389;
P(p = 0.4 | 6 heads in first 10 tosses and 8 heads in the next 10 tosses) = (C(10, 6)0.4⁶0.6⁴ × C(10, 8)0.4⁸0.6² × (1/2))/(C(10, 6)0.4⁶0.6⁴ × C(10, 8)0.4⁸0.6² × (1/2) + C(10, 6)0.55⁶0.45⁴ × C(10, 8)0.55⁸0.45² × (1/2)) = 0.4¹⁴ × 0.6⁶/(0.4¹⁴ × 0.6⁶ + 0.55¹⁴ × 0.45⁶) ≈ 0.0611;
P(p = 0.55 | 6 heads in first 10 tosses and 8 heads in the next 10 tosses) ≈ 1 − 0.0611 = 0.9389.
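The two-hypothesis updates in Exercise 4.78 can be reproduced with a few lines of code. The Python sketch below (an optional check, not part of the printed solution) computes the posterior probability of p = 0.4 after the first 10 tosses and then updates it again with the next 10.

```python
# Sketch (optional check): Bayesian update for p = 0.4 versus p = 0.55, prior 1/2 each.
from math import comb

def posterior_p04(heads, tosses, prior=0.5):
    like_04 = comb(tosses, heads) * 0.4**heads * 0.6**(tosses - heads)
    like_055 = comb(tosses, heads) * 0.55**heads * 0.45**(tosses - heads)
    num = like_04 * prior
    return num / (num + like_055 * (1 - prior))

p = posterior_p04(6, 10)            # part (a): ≈ 0.3186
print(p, 1 - p)
p2 = posterior_p04(8, 10, prior=p)  # part (b): update with the next 10 tosses, ≈ 0.0611
print(p2, 1 - p2)
```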


4.5 Review Exercises

Basic Exercises
4.79 a) 3/8, since there are 3 red marbles among the marbles numbered #8 through #15. b) 3/10, since exactly three of the ten red marbles are numbered in the #8–#15 range. c) P(2nd is red | 1st is red) = 9/14; P(2nd is red | 1st is green) = 10/14 = 5/7.
d) (10/15) × (9/14) = 3/7.
e) (9/14) × (10/15) + (10/14) × (5/15) = 2/3.
f) (3/7)/(2/3) = 9/14, where we used (d) and (e).
g) 1 − 9/14 = 5/14, where we used (f) and the complementation rule for conditional probability.

4.80 26/57 ≈ 0.456, since
P(N has two aces and S has one ace) = C(4; 2, 1, 1) C(48; 11, 12, 25)/C(52; 13, 13, 26),
P(N has two aces) = C(4, 2) C(48, 11)/C(52, 13),
implying that
P(S has one ace | N has two aces) = (C(4; 2, 1, 1) C(48; 11, 12, 25)/C(52; 13, 13, 26))/(C(4, 2) C(48, 11)/C(52, 13)) = 26/57,
where C(n; k1, k2, ...) denotes the multinomial coefficient n!/(k1! k2! ...).
4.81 a) 3263/9269 ≈ 0.352;
b) Not independent, since P(college | private) = 3263/9269 ≠ 14889/68335 = P(college).
c) The conditional distributions of type by level and the marginal distribution of type are given by:

                  Elementary              High school             College                 P(Ti)
Public (T1)       33903/38543 ≈ 0.8796    13537/14903 ≈ 0.9083    11626/14889 ≈ 0.7808    59066/68335 ≈ 0.8644
Private (T2)       4640/38543 ≈ 0.1204     1366/14903 ≈ 0.0917     3263/14889 ≈ 0.2192     9269/68335 ≈ 0.1356
Total                  1                       1                       1                       1


d) The conditional distributions of level by type and the marginal distribution of level are given by:

                     Public                  Private                P(Li)
Elementary (L1)      33903/59066 ≈ 0.574     4640/9269 ≈ 0.5006     38543/68335 ≈ 0.564
High school (L2)     13537/59066 ≈ 0.2292    1366/9269 ≈ 0.1474     14903/68335 ≈ 0.2181
College (L3)         11626/59066 ≈ 0.1968    3263/9269 ≈ 0.352      14889/68335 ≈ 0.2179
Total                    1                       1                      1

4.82 a) (3/50) × (19/49) × (18/48) = 171/19600 ≈ 0.0087
b) C(3, 1) C(19, 2)/C(50, 3) ≈ 0.0262
c) (28 × 27 × 26)/(50 × 49 × 48) ≈ 0.1671
d) (28 × 27 × 26 + 3 × 2 × 1 + 19 × 18 × 17)/(50 × 49 × 48) ≈ 0.2166
e) Let Ai denote the event that the ith student selected received a master of arts, let Di denote the event that the ith student selected received a master of public administration, and let Si denote the event that the ith student selected received a master of science, i = 1, 2, 3. Then the required tree diagram is of the form described below, with the corresponding joint probabilities listed in the table that follows.

In the tree diagram, the first-stage branches A1, D1, S1 have probabilities 3/50, 28/50 and 19/50; from each first-stage node, the second-stage branches to A2, D2, S2 have probabilities of the form (number of degrees of that type remaining)/49; and from each second-stage node, the third-stage branches to A3, D3, S3 have probabilities of the form (number remaining)/48. Multiplying the probabilities along each path gives the joint probabilities below.

Event            Probability    Event            Probability    Event            Probability
A1 ∩ A2 ∩ A3     0.00005        D1 ∩ A2 ∩ A3     0.00143        S1 ∩ A2 ∩ A3     0.00097
A1 ∩ A2 ∩ D3     0.00143        D1 ∩ A2 ∩ D3     0.01929        S1 ∩ A2 ∩ D3     0.01357
A1 ∩ A2 ∩ S3     0.00097        D1 ∩ A2 ∩ S3     0.01357        S1 ∩ A2 ∩ S3     0.00872
A1 ∩ D2 ∩ A3     0.00143        D1 ∩ D2 ∩ A3     0.01929        S1 ∩ D2 ∩ A3     0.01357
A1 ∩ D2 ∩ D3     0.01929        D1 ∩ D2 ∩ D3     0.16714        S1 ∩ D2 ∩ D3     0.12214
A1 ∩ D2 ∩ S3     0.01357        D1 ∩ D2 ∩ S3     0.12214        S1 ∩ D2 ∩ S3     0.08143
A1 ∩ S2 ∩ A3     0.00097        D1 ∩ S2 ∩ A3     0.01357        S1 ∩ S2 ∩ A3     0.00872
A1 ∩ S2 ∩ D3     0.01357        D1 ∩ S2 ∩ D3     0.12214        S1 ∩ S2 ∩ D3     0.08143
A1 ∩ S2 ∩ S3     0.00872        D1 ∩ S2 ∩ S3     0.08143        S1 ∩ S2 ∩ S3     0.04944


4.83 Special multiplication rule. Sampling with replacement makes selection of any given member of the sample independent from selection of all other members of the sample; thus the conditional probabilities present in the general multiplication rule are equal to the corresponding unconditional probabilities in the case of sampling with replacement.
4.84 80%. Let J be the event that a person questioned by Kushel enjoys his/her job, and L be the event that a person enjoys his/her personal life. Then
P(J | L) = P(J ∩ L)/(1 − (P(Jᶜ ∩ Lᶜ) + P(J ∩ Lᶜ))) = 0.04/(1 − (0.15 + 0.8)) = 4/5 = 0.8.

4.85 0; 1/3. Clearly, if the sum of the two dice is 9, the 1st die cannot come up 1; thus the corresponding probability is 0. If the sum of the dice is 4, then the 1st die comes up 1 if and only if the outcome (1, 3) occurs. Since the sum of the dice is 4 whenever one of the three outcomes (1,3), (2,2), (3,1) occurs, the required probability that the 1st die is 1, given the sum of the dice is 4, is equal to 1/3.
4.86 The answers will vary.
4.87 C(k − 1, M − 1)/C(N, M). Let D be the event that the kth item selected is defective, and A be the event that there are M − 1 defective items among the first k − 1 items selected. Then
P(kth item selected is the last defective) = P(D ∩ A) = P(D | A)P(A) = (1/(N − (k − 1))) × C(M, M − 1) C(N − M, k − 1 − (M − 1))/C(N, k − 1) = (1/(N − k + 1)) × (M · (N − M)!/((k − M)!(N − k)!))/(N!/((k − 1)!(N − k + 1)!)) = C(k − 1, M − 1)/C(N, M).

4.88 a) 4/10. Let Ak be the event that there are k women among the first three students selected, k = 0, 1, 2, 3. Then
P(4th selected student is female) = Σ_{k=0}^{3} P(4th selected student is female | Ak) P(Ak) = Σ_{k=0}^{3} ((4 − k)/7) × C(4, k) C(6, 3 − k)/C(10, 3) = 0.4.
b) 4/10. By symmetry, the fourth student selected is as likely to be a woman as the first student selected, and the probability that the first student selected is female is 4/10.
c) 4/10. Let f1, f2, f3, f4 denote the four female students in a class of n students (with the remaining n − 4 students being male), n ≥ 4. The students are selected from the class one after another without replacement. Let Fi,n be the event that fi is the fourth student selected from the class (of n), i = 1, ..., 4. Clearly, P(Fi,n) = P(F1,n) for all i = 2, 3, 4. Let us show that P(F1,n) = 1/n for all n ≥ 4. Clearly the desired claim is true for n = 4, since 4P(F1,4) = P(F1,4) + ... + P(F4,4) = 1, implying that P(F1,4) = 1/4. Next suppose that P(F1,n) = 1/n for n = k. Then, for n = k + 1, we have that
P(F1,k+1) = P(F1,k+1 | "f1 is among first k selected") P("f1 is among first k selected") = P(F1,k) P("f1 is among first k selected") = (1/k) × (k/(k + 1)) = 1/(k + 1),
where the event "f1 is among first k selected" refers to a class of size k + 1; thus it is the complement of the event "f1 is the last student selected from the class of size (k + 1)", implying that
P("f1 is among first k selected") = 1 − k!/(k + 1)! = k/(k + 1).
By the principle of mathematical induction, it follows that P(F1,n) = 1/n for all n ≥ 4. Therefore, for all n ∈ {4, 5, ...},
P(∪_{i=1}^{4} Fi,n) = Σ_{i=1}^{4} P(Fi,n) = 4P(F1,n) = 4/n.
Upon putting n = 10, we conclude that the required probability is equal to 4/10.
4.89 a) 0.5177. By the inclusion-exclusion formula,
P(at least one 5 in four rolls of a die) = 4 × (1/6) − C(4, 2)(1/6)² + C(4, 3)(1/6)³ − C(4, 4)(1/6)⁴ = 671/1296 ≈ 0.5177.
b) 0.5177, since
P(at least one 5 in four rolls of a die) = 1 − P(no 5s in four rolls) = 1 − (5/6)⁴ ≈ 0.5177.
c) In the context of the problem it is natural to assume that the tosses are independent of one another. This contextual independence is used in our solutions to (a) and (b).
4.90 a) 1 − (3/4)⁸ ≈ 0.8999
b) 1 − (3/4)⁸ − 8 × (3/4)⁷ × (1/4) ≈ 0.6329
c) 0.6329/0.8999 ≈ 0.7033, where we used the answers to (a) and (b).
d) (3/4) × (1 − (3/4)⁷ − 7 × (3/4)⁶ × (1/4))/0.6329 ≈ 0.658
4.91 a) 0.91⁴ ≈ 0.6857
b) 0.91³ × 0.09 ≈ 0.0678
c) 4 × 0.91³ × 0.09 ≈ 0.2713
d) C(4, 2) × 0.91² × 0.09² ≈ 0.0402
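The value 671/1296 in Exercise 4.89 can be confirmed by brute-force enumeration of all 6⁴ outcomes, as in the following Python sketch (an optional check, not part of the printed solution).

```python
# Sketch (optional check): at least one 5 in four rolls of a balanced die.
from itertools import product

outcomes = list(product(range(1, 7), repeat=4))
favorable = sum(1 for roll in outcomes if 5 in roll)
print(favorable, len(outcomes), favorable / len(outcomes))   # 671 / 1296 ≈ 0.5177
print(1 - (5 / 6) ** 4)                                      # complementation rule gives the same value
```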

4.92 Not independent. If the gender and activity limitation were independent, one should have had that P (limitation | female) = P (limitation | male), which is not the case here. 4.93 a) Yes, independent, since for all a, b ∈ [0, 1], P ({(x, y) ∈ Ω : x < a, y < b}) = ab = P ({(x, y) ∈ Ω : x < a})P ({(x, y) ∈ Ω : y < b}),

where Ω = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. b) Yes, independent, since for all a, b ∈ [0, 1] (with Ω as in (a)), P ({(x, y) ∈ Ω : x < a, y > b}) = a(1 − b) = P ({(x, y) ∈ Ω : x < a})P ({(x, y) ∈ Ω : y > b}).


4.94 a) No, not independent when a ∈ (0, 1) and b ∈ (0, 1). For Ω = {(x, y) : 0 ≤ y ≤ x ≤ 1}, we have
P({(x, y) ∈ Ω : x < a}) = (a²/2)/(1/2) = a²,   P({(x, y) ∈ Ω : y < b}) = 1 − (1 − b)² = b(2 − b),
and
P({(x, y) ∈ Ω : x < a, y < b}) = a² if a ≤ b, and = a² − (a − b)² if a > b.
Therefore, it is easy to see that P({(x, y) ∈ Ω : x < a, y < b}) ≠ P({(x, y) ∈ Ω : x < a}) P({(x, y) ∈ Ω : y < b}).
b) Yes, the events are independent, since
P({(x, y) ∈ Ω : x < a}) P({(x, y) ∈ Ω : y < bx}) = a²b = (a(ba)/2)/(1/2) = P({(x, y) ∈ Ω : x < a, y < bx}).

4.95 1 − 0.9999748 ≈ 0.0721, i.e. there is about 7.21% chance that at least one “critically 1” item fails. 4.96 a) 0.208 × 0.471 + 0.276 × 0.529 ≈ 0.244;

b) 0.471;
c) (0.208 × 0.471)/0.244 ≈ 0.4015
d) Approximately 24.4% of U.S. engineers and scientists have a masters degree as their highest degree obtained. 47.1% of all U.S. engineers and scientists are engineers. Among the U.S. engineers and scientists with a masters degree as their highest degree obtained, approximately 40.15% are engineers.
4.97 a) 2/3, since the required conditional probability equals
P(both sides are red)/P(first side is red) = (1 × (1/2))/(1 × (1/2) + (1/2) × (1/2)) = 2/3.
b) The red face showing is equally likely to be any one of the three red sides, and the other side can be red in two of those three possibilities.
4.98 a) 2.3%, since P(a randomly selected item is defective) = 0.03 × 0.5 + 0.02 × 0.3 + 0.01 × 0.2 = 0.023
b) 26.1%, since P(item is from Factory II | item is defective) = (0.02 × 0.3)/0.023 = 6/23 ≈ 0.261

4.99 P(2nd side is head | 1st side is head) = P(both sides are heads)/P(1st side is head) = (1/3)/(1/2) = 2/3.
(In the above, P(1st side is head) = 1/2 can be established either directly by symmetry, or by the law of total probability, since P(1st side is head) = (1/3) × 1 + (1/3) × (1/2) + (1/3) × 0 = 1/2.)


4.100 Let A be the event that the student knows the answer, and C be the event that the student gives the correct answer to the question.
a) 500p/(4p + 1) %, since
P(A | C) = p/(p + (1/5)(1 − p)) = 5p/(4p + 1).
b) If p = 10⁻⁶, then
P(A | C) = 5p/(4p + 1) = 5/(4 + 1/p) = 5/(4 + 10⁶) ≈ 5/10⁶,
i.e. the probability is approximately five in one million that a student who answers correctly knows the answer.
c) If p = 1 − 10⁻⁶, then 1 − p = 10⁻⁶ and p ≈ 1, implying that
P(Aᶜ | C) = 1 − 5p/(4p + 1) = (1 − p)/(4p + 1) ≈ 10⁻⁶/5 = 1/(5 × 10⁶),

i.e. the probability is about one in five million that a student who answers correctly is ignorant of the answer.
Theory Exercises
4.101 a) For each N ∈ N, by independence of A1, A2, ...,
P(∩_{n=1}^{N} An) = Π_{n=1}^{N} P(An).
Let BN = ∩_{n=1}^{N} An. Then, in view of BN ⊃ BN+1 for all N ∈ N and the continuity properties of the probability measure (Proposition 2.11), one must have that
P(∩_{N=1}^{∞} BN) = lim_{N→∞} P(BN) = lim_{N→∞} Π_{n=1}^{N} P(An) = Π_{n=1}^{∞} P(An).
The required conclusion then easily follows upon noting that ∩_{N=1}^{∞} BN = ∩_{N=1}^{∞} (∩_{n=1}^{N} An) = ∩_{n=1}^{∞} An.
b) Using the complementation rule, De Morgan's law, independence of A1ᶜ, A2ᶜ, ... (which holds by Proposition 4.5) and the result of (a), one obtains that
P(∪_{n=1}^{∞} An) = 1 − P(∩_{n=1}^{∞} Anᶜ) = 1 − Π_{n=1}^{∞} P(Anᶜ) = 1 − Π_{n=1}^{∞} (1 − P(An)).
4.102 Using the definition of conditional probability and the multiplication rule (Proposition 4.2), one obtains that
P(A | B)/P(Aᶜ | B) = (P(A ∩ B)/P(B))/(P(Aᶜ ∩ B)/P(B)) = P(A ∩ B)/P(Aᶜ ∩ B) = P(B | A)P(A)/(P(B | Aᶜ)P(Aᶜ)) = (P(A)/P(Aᶜ)) × (P(B | A)/P(B | Aᶜ)).

4.102 Using definition of conditional probability and the multiplication rule (Proposition 4.2), one obtains that P (A ∩ B)/P (B) P (A ∩ B) P (B | A)P (A) P (A) P (B | A) P (A | B) = = = = · . c c c c c P (A | B) P (A ∩ B)/P (B) P (A ∩ B) P (B | A )P (A ) P (Ac ) P (B | Ac )

4.5 Review Exercises

98

Advanced Exercises 4.103 a) For i = 0, 1, · · · , min(k, d), #ℓ$#n−ℓ$ #d$#N −d$

min(n+i−k, d)

!

pi =

i

#nk−i $ ·



k

ℓ=max(i, n+d−N )

#Nn−ℓ $ ; n

and pi = 0 for all i ∈ / {0, 1, · · · , min(k, d)}. To obtain the answer, note first that P (there are ℓ defectives in the 1st sample) = 0 unless ℓ ∈ {0, · · · , min(d, n)} and n − ℓ ≤ N − d. On the other hand, for all integers ℓ ∈ {max(0, n + d − N ), · · · , min(n, d)}, #d$#N −d$ P (there are ℓ defectives in the 1st sample) =



#Nn−ℓ $ . n

Similarly, for i ∈ {0, · · · , ℓ} such that n − ℓ ≥ k − i, we have that

P (there are i defectives in the 2nd sample | there are ℓ defectives in the 1st sample) =

#ℓ$#n−ℓ$ i

#nk−i $ , k

and the above conditional probability is zero otherwise. Thus, by the law of total probability, min(n,d)

pi =

!

ℓ=max(0,n+d−N )

P (i defectives in the 2nd sample | ℓ defectives in the 1st sample) · min(n+i−k, d)

!

=

ℓ=max(i, n+d−N )

#ℓ$#n−ℓ$ #d$#N −d$ i

#nk−i $ · k



#Nn−ℓ $ .

#d$#N −d$ ℓ

#Nn−ℓ $ n

n

b) p0 ≈ 0.9035497, p1 ≈ 0.09295779, p2 ≈ 0.003435811, p3 ≈ 5.63 × 10−5 , p4 ≈ 4.03 × 10−7 , p5 ≈ 9.87 × 10−10 . c) #N −d$ " %" % N −d n n #N $ n k n = " %" %" %. min(n−k,d) p0 ! d N −d n−ℓ ℓ n−ℓ k ℓ=max(0,n+d−N )
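The formula in 4.103(a) is straightforward to evaluate numerically. The Python sketch below (an optional check, not part of the printed solution) implements it; the parameter values N, d, n, k used in the example call are illustrative assumptions only and are not the values behind the numbers quoted in part (b).

```python
# Sketch (optional check): the two-stage sampling PMF of Exercise 4.103(a).
from math import comb

def second_sample_pmf(N, d, n, k):
    """pi = P(i defectives in a subsample of k taken from a first sample of n,
    drawn without replacement from a lot of N items containing d defectives)."""
    pmf = []
    for i in range(min(k, d) + 1):
        total = 0.0
        for ell in range(max(i, n + d - N), min(n + i - k, d) + 1):
            inner = comb(ell, i) * comb(n - ell, k - i) / comb(n, k)       # 2nd-stage hypergeometric
            outer = comb(d, ell) * comb(N - d, n - ell) / comb(N, n)       # 1st-stage hypergeometric
            total += inner * outer
        pmf.append(total)
    return pmf

pmf = second_sample_pmf(N=100, d=5, n=20, k=5)   # illustrative parameters only
print(pmf, sum(pmf))                             # the probabilities sum to 1 (up to rounding)
```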

4.104 a) When sampling with replacement, for i = 0, · · · , k, pi =

n " % ! k ℓ=0

i

i

k−i

(ℓ/n) ((n − ℓ)/n)

" % n (d/N )ℓ ((N − d)/N )n−ℓ , ℓ

since, for ℓ = 0, · · · , n, " % n P (ℓ defectives in the 1st sample) = (d/N )ℓ ((N − d)/N )n−ℓ , ℓ


" % k P (i defectives in the 2nd sample | ℓ defectives in the 1st sample) = (ℓ/n)i ((n − ℓ)/n)k−i . i

b) p0 ≈ 0.9122056, p1 ≈ 0.0767048, p2 ≈ 0.01005675, p3 ≈ 0.0009548676, p4 ≈ 7.32 × 10−5 , p5 ≈ 4.6 × 10−6 , p6 ≈ 2.33 × 10−7 , p7 ≈ 9.15 × 10−9 , p8 ≈ 2.63 × 10−10 , p9 ≈ 4.96 × 10−12 , p10 ≈ 4.59 × 10−14 . c) ((N − d)/N )n (N − d)n . = n " % ! p0 k n ℓ n−ℓ ((n − ℓ)/n) d (N − d) ℓ ℓ=0

4.105 275/1296. Let X be the number of tosses until first 6 occurs. Then P (X is even) =

∞ !

P (X = 2k) =

k=1

∞ ∞ ! 5 1 5 ! 5 (25/36)m = · (5/6)2k−1 (1/6) = = , 25 36 m=0 36 1 − 36 11 k=1

implying that P (X = 4 | X is even) =

(5/6)3 (1/6) 275 = ≈ 0.2122 5/11 1296

4.106 Prisoner’s paradox. No, the probability remains equal to 1/3. Let A be the event that the jailer tells that a will be freed, and let B be the event that the jailer tells that b will be freed. Also, let Eb be the event that b is to be executed, and Ec be the event that c is to be executed. Then P (A | Ec )P (Ec ) = P (Ec | A) = P (A | Ec )P (Ec ) + P (A | Eb )P (Eb )

1 2

×

1 2 1 3

× 31 +1×

1 3

1 = . 3

Similarly, P (Ec | B) = 1/3. 4.107 p. By symmetry, the kth member selected is as likely to have the specified attribute as the 1st member selected. 4.108 r/(r+g). Let Rk denote the event that a ball chosen from urn #k is red. Let pk = P (Rk ), k ∈ {1, · · · , n}. Then for all k ∈ {2, · · · , n}, c c pk = P (Rk ) = P (Rk | Rk−1 )P (Rk−1 ) + P (Rk | Rk−1 )P (Rk−1 )

= and p1 =

r r + pk−1 r+1 · pk−1 + · (1 − pk−1 ) = r+g+1 r+g+1 r+g+1

r . The required answer then follows by induction. r+g

4.109 a) The required conditional probability equals to N ! (k/N )n+1 ·

P (all n + 1 children selected are boys) = k=0 N P (first n children selected are boys) ! (k/N )n · k=0

1 N +1

1 N +1

=

N ! (k/N )n+1 k=0 N !

n

(k/N )

k=0

,


by the law of total probability and since P (all n children selected are boys | kth family is chosen) = (k/N )n . b) If N is large, we can use Riemann integral approximations ' 1 N 1 ! 1 (k/N )n+1 ≈ xn+1 dx = , N n+2 0 k=0

' 1 N 1 ! 1 (k/N )n ≈ xn dx = , N n+1 0 k=0

and conclude that the answer to (a) is given by: N ! (k/N )n+1 k=0 N !

(k/N )n

1 n+1 ≈ n+2 = . 1 n+2 n+1

k=0

4.110 The required result follows upon noting that any sequence of outcomes for the n repetitions of the experiment that leads to outcome Ei occurring ki times for i #= 1, · · · $, m, will have n probability pk11 pk22 . . . pkmm , by independence of repetitions. As there are k1 ,...,k = k1 !k2n! !...km ! m such sequences of outcomes (since there are n!/(k1 !k2 ! . . . km !) different permutations of n elements of which k1 are alike, k2 are alike,. . . ,km are alike), it follows that " % n P (n(E1 ) = k1 , . . . , n(Em ) = km ) = pk1 . . . pkmm . k1 , . . . , km 1 4.111 i/10. Let Ei denote the event that you ruin your opponent, given that your starting capital is equal to $i and your opponent’s starting capital is equal to $(10−i), and let pi = P (Ei ), where i = 0, . . . , 10. Then pi = P (Ei | H on 1st toss)P (H) + P (Ei | T on 1st toss)P (T) = pi+1 ·

1 1 + pi−1 · , 2 2

for all i = 1, . . . , 9, and obviously p0 = 0 and p10 = 1. Therefore, pi+1 = 2pi − pi−1 = 2(2pi−1 − pi−2 ) − pi−1 = 3pi−1 − 2pi−2 = · · · = (2 + k)pi−k − (1 + k)pi−k−1 , for all k = 0, · · · , i − 1. Taking k = i − 1 in the above equation yields that pi+1 = (i + 1)p1 − ip0 = (i + 1)p1 , i ∈ {0, · · · , 9}, since p0 = 0. Therefore, p10 = 10p1 = 1, implying that p1 = 1/10 and, thus, pi = ip1 = i/10 for all i ∈ {0, · · · , 10}. 4.112 Randomized Response: Let S denote the event that a sensitive question is chosen, I denote the event that an innocuous question is chosen and Y be the event that the answer is positive. Suppose P (Y | I) is known. Let X be the number of positive responses in the sample, and n be the (known) size of the sample. Then, assuming that the coin is balanced (thus, P (I) = P (S) = 1/2), by the law of total probability, 1 1 P (Y ) = P (Y | S) + P (Y | I) , thus, P (Y | S) = 2P (Y ) − P (Y | I). 2 2


On the other hand, P(Y) is naturally estimated by X/n, which is the proportion of positive responses in the sample. Thus, a natural estimate of P(Y | S) is given by 2X/n − P(Y | I).
4.113 Assume that 0 < P(head) = p < 1 for the coin in question. Consider successive pairs of tosses of the coin. If both of a pair's tosses show the same side, disregard them and toss another pair. When a pair's tosses differ, treat outcome {HT} as a "head" of a balanced coin, and treat {TH} as a "tail" of a balanced coin (since P({HT}) = P({TH})). This is a well-defined procedure for simulating a balanced coin provided that only finitely many pairs of tosses of the original coin are needed to simulate one toss of a balanced coin. The latter indeed is true (with probability one), since
Σ_{k=1}^{∞} P(1st pair showing either HT or TH occurs on the kth pair of tosses) = Σ_{k=1}^{∞} ((1 − p)² + p²)^(k−1) × 2p(1 − p) = 1.
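The construction in Exercise 4.113 is easy to simulate. The Python sketch below (an optional check, with an arbitrarily chosen bias p = 0.3) shows that the derived coin is essentially balanced.

```python
# Sketch (optional check): von Neumann-style unbiasing of a biased coin, Exercise 4.113.
import random

def biased_toss(p=0.3):
    return 'H' if random.random() < p else 'T'

def simulated_fair_toss(p=0.3):
    # keep tossing pairs until the two tosses differ; HT -> head, TH -> tail
    while True:
        first, second = biased_toss(p), biased_toss(p)
        if first != second:
            return 'H' if (first, second) == ('H', 'T') else 'T'

n = 100_000
heads = sum(1 for _ in range(n) if simulated_fair_toss() == 'H')
print(heads / n)   # close to 1/2 even though the underlying coin is biased
```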


Chapter 5

Discrete Random Variables and Their Distributions 5.1 From Variables to Random Variables Basic Exercises 5.1 Answers will vary. 5.2 a) Yes, X is a random variable because it is a real-valued function whose domain is the sample space of a random experiment, namely, the random experiment of tossing a balanced die twice and observing the two faces showing. Yes, X is a discrete random variable because it is a random variable with a finite (and, hence, countable) range, namely, the set {2, 3, . . . , 12}. b)–f) We have the following table: Part

Event

b) c) d) e)

{X = 7} {X > 10} {X = 2 or 12} {4 ≤ X ≤ 6}

f)

{X ∈ A}

The sum of the two faces showing . . .

Subset of the sample space

is seven. exceeds 10 (i.e., is either 11 or 12). is either 2 or 12. is four, five, or six.

{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)} {(5, 6), (6, 5), (6, 6)} {(1, 1), (6, 6)} {(1, 3), (2, 2), (3, 1), (1, 4), (2, 3), (3, 2), (4, 1), (1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} {(1, 1), (1, 3), (1, 5), (2, 2), (2, 4), (2, 6), (3, 1), (3, 3), (3, 5), (4, 2), (4, 4), (4, 6), (5, 1), (5, 3), (5, 5), (6, 2), (6, 4), (6, 6)}

is an even integer.

5.3 a) Yes, X is a random variable because it is a real-valued function whose domain is the sample space of a random experiment, namely, the random experiment of observing which of five components of a unit are working and which have failed. Yes, X is a discrete random variable because it is a random variable with a finite (and, hence, countable) range, namely, the set {0, 1, 2, 3, 4, 5}. b)–e) We have the following table: Part

Event

b)

{X ≥ 4}

c) d) e)

{X = 0} {X = 5} {X ≥ 1}

In words . . .

Subset of the sample space

At least four (i.e., four or five) of the components are working. None of the components are working. All of the components are working. At least one of the components is working.

{(s, s, s, s, f ), (s, s, s, f, s), (s, s, f, s, s), (s, f, s, s, s), (f, s, s, s, s), (s, s, s, s, s)} {(f, f, f, f, f )} {(s, s, s, s, s)} {(f, f, f, f, f )}c

f) {X = 2}; g) {X ≥ 2}; h) {X ≤ 2}; i) {2 ≤ X ≤ 4}

5.4 a) Let m denote male and f denote female. Then we can represent each possible outcome as an ordered triple, where each entry of the triple is either m or f . For instance, (m, f, f ) represents the outcome that the first born is male, the second born is female, and the third born is female. Therefore, a sample space for this random experiment is ! = { (x1 , x2 , x3 ) : xj ∈ {m, f } , 1 ≤ j ≤ 3 }. b) We have the following table: (m, m, m) (m, m, f ) (m, f, m) (m, f, f ) (f, m, m) (f, m, f ) (f, f, m) (f, f, f )

ω Y (ω)

0

1

1

2

1

2

2

3

c) Yes, Y is a random variable because it is a real-valued function whose domain is the sample space of a random experiment. Yes, Y is a discrete random variable because it is a random variable with a finite (and, hence, countable) range, namely, the set {0, 1, 2, 3}. ! 5.5 Take ! = { (x, y) : −3 ≤ x ≤ 3, −3 ≤ y ≤ 3 } and, for brevity, set d(x, y) = x 2 + y 2 . a) We have " 10, if d(x, y) ≤ 1; S(x, y) = 5, if 1 < d(x, y) ≤ 2; 0, otherwise. b) Yes, S is a discrete random variable because it is a random variable with a finite (and, hence, countable) range, namely, the set {0, 5, 10}. c)–h) We have the following table: Part

Event

c)

{S = 5}

d)

{S > 0}

e)

{S ≤ 7}

f)

{5 < S ≤ 15}

g) h)

{S < 15} {S < 0}

The archer’s score is . . .

5 points; that is, she hits the ring with inner radius 1 foot and outer radius 2 feet centered at the origin. positive; that is, she hits the disk of radius 2 feet centered at the origin. at most 7 points; that is, she does not hit the bull’s eye. more than 5 points and at most 15 points; that is, she hits the bull’s eye. less than 15 points, which is certain. negative, which is impossible.

Subset of the sample space

{ (x, y) ∈ ! : 1 < d(x, y) ≤ 2 } { (x, y) ∈ ! : d(x, y) ≤ 2 } { (x, y) ∈ ! : d(x, y) > 1 } { (x, y) ∈ ! : d(x, y) ≤ 1 } ! ∅

5.6 a) Let n denote nondefective and d denote defective. Then we can represent each possible outcome as an ordered five-tuple, where each entry of the five-tuple is either n or d. For instance, (n, d, n, n, d) represents the outcome that the first, third, and fourth parts selected are nondefective and the second and fifth parts selected are defective. Thus, a sample space for this random experiment is # $ ! = (x1 , x2 , x3 , x4 , x5 ) : xj ∈ {n, d} , 1 ≤ j ≤ 5 . b) We have Y = 1 if X ≤ 1, and Y = 0 if X > 1.


5.7 a) Let r denote red and w denote white. Then we can represent each possible outcome as an ordered triple, where the first entry of the triple is the number of the urn chosen, and the second and third entries are the colors of the first and second balls selected, respectively. For instance (II, r, w) represents the outcome that Urn II is chosen, the first ball selected is red, and the second ball selected is white. Hence, a possible sample space is ! = {(I, r, r), (I, r, w), (I, w, r), (II, r, w), (II, w, r), (II, w, w)}. b) We have the following table: ω

(I, r, r) (I, r, w) (I, w, r) (II, r, w) (II, w, r) (II, w, w) 2

X (ω)

1

1

1

1

0

5.8 a) As the interview pool consists of three people, the number of women in the interview pool must be between zero and three, inclusive. Hence, the possible values of the random variable X are 0, 1, 2, and 3. b) The number of ways that the interview pool can contain exactly x women (x = 0, 1, 2, 3) is the number of ways that x women can be chosen from the five women and 3 − x men can% be & chosen from the six men. The number of ways that x women can be chosen from the five women is x5 , and the number % 6 & . Consequently, by the BCR, the number of ways that 3 − x men can be chosen from the six men is 3−x % &% 6 & . We, therefore, have the of ways that the interview pool can contain exactly x women equals x5 3−x following table: x

Number of ways {X = x } can occur

' (' ( 5 6 = 1 · 20 = 20 0 3 ' (' ( 5 6 = 5 · 15 = 75 1 2 ' (' ( 5 6 = 10 · 6 = 60 2 1 ' (' ( 5 6 = 10 · 1 = 10 3 0

0 1 2 3

c) {X ≤ 1} is the event that at most one (i.e., 0 or 1) woman is in the interview pool. From the table in part (b), we see that the number of ways that event can occur is 20 + 75 = 95. 5.9 a) We have the following table: ω

HHH

HHT

HTH

HTT

THH

THT

TTH

TTT

Y (ω)

3

1

1

−1

1

−1

−1

−3

b) The event {Y = 0} is, in words, that the number of heads equals the number of tails, which is impossible. Hence, as a subset of the sample space, we have {Y = 0} = ∅.

5.10 Denote the two salads by s1 and s2 , the three entrees by e1 , e2 , and e3 , and the two desserts by d1 and d2 . Also, let s0 , e0 , and d0 represent a choice of no salad, entree, and dessert, respectively. Thus, each possible choice of a meal is of the form si ej dk , where i, k ∈ {0, 1, 2} and j ∈ {0, 1, 2, 3}. We note that the sample space consists of 3 · 4 · 3 = 36 possible outcomes.


a) We have the following table: ω

X (ω)

ω

X (ω)

ω

X (ω)

ω

X (ω)

s0 e0 d0 s0 e0 d1 s0 e0 d2 s0 e1 d0 s0 e1 d1 s0 e1 d2 s 0 e 2 d0 s0 e2 d1 s0 e2 d2

0.00 1.50 1.50 2.50 4.00 4.00 3.00 4.50 4.50

s0 e3 d0 s0 e3 d1 s0 e3 d2 s1 e0 d0 s1 e0 d1 s1 e0 d2 s1 e1 d0 s1 e1 d1 s1 e1 d2

3.50 5.00 5.00 1.00 2.50 2.50 3.50 5.00 5.00

s1 e2 d0 s1 e2 d1 s1 e2 d2 s1 e3 d0 s1 e3 d1 s1 e3 d2 s2 e0 d0 s2 e0 d1 s2 e0 d2

4.00 5.50 5.50 4.50 6.00 6.00 1.50 3.00 3.00

s2 e1 d0 s2 e1 d1 s2 e1 d2 s2 e2 d0 s2 e2 d1 s2 e2 d2 s2 e3 d0 s2 e3 d1 s2 e3 d2

4.00 5.50 5.50 4.50 6.00 6.00 5.00 6.50 6.50

b) The event {X ≤ 3} is, in words, that the meal costs $3.00 or less. As as subset of the sample space, we have that {X ≤ 3} = {s0 e0 d0 , s0 e0 d1 , s0 e0 d2 , s0 e1 d0 , s0 e2 d0 , s1 e0 d0 , s1 e0 d1 , s1 e0 d2 , s2 e0 d0 , s2 e0 d1 , s2 e0 d2 } . 5.11 We can take the sample space to be ! = {w, ℓw, ℓℓw, . . .}, where ℓ) .*+ . . ℓ, w

n−1 times

represents the outcome that you don’t win a prize the first n − 1 weeks and you do win a prize the nth week; that is, it takes n weeks before you win a prize. a) Yes, W is a random variable because it is a real-valued function whose domain is the sample space !. We have n ∈ N. W (ℓ) .*+ . . ℓ, w ) = n, n−1 times

b) Yes, W is a discrete random variable because, as we see from part (a), the range of W is N . c) {W > 1} is the event that it takes you more than 1 week to win a prize or, equivalently, you don’t win a prize the first week. d) {W ≤ 10} is the event that you win a prize by the 10th week. e) {15 ≤ W < 20} is the event that you first win a prize between weeks 15 and 19, inclusive; that is, it takes you at least 15 weeks but less than 20 weeks to win a prize. 5.12 a) Define g(x) = ⌊m + (n − m + 1)x⌋. For 0 < x < 1, we have

m − 1 < m + (n − m + 1)x − 1 < g(x) ≤ m + (n − m + 1)x < n + 1. % & Because g is integer valued, it follows that g (0, 1) ⊂ {m, m + 1, . . . , n}. However, if k is an integer with m + 1 ≤ k ≤ n, then ' ( k−m k−m 0<

ln(0.05) ln(0.8)

or

n > 13.4.

Therefore, a sequence of decimal digits has probability exceeding 0.95 of containing at least one 6 or 7 if and only if it is of length 14 or more.

5.3 Binomial Random Variables

5.51 a) Consider the experiment of randomly selecting a number from the interval (0, 1), and let the specified event be that the number obtained exceeds 0.7. In three independent repetitions of this random experiment, the median of the three numbers obtained exceeds 0.7 if and only if two or more of them exceed 0.7—that is, if and only if two or more of the three trials result in success. b) In part (a), the specified event is that a number selected at random from the interval (0, 1) exceeds 0.7, which has probability 1 − 0.7 = 0.3. 5.52 Let X denote the total number of red marbles selected. Also, let I = event Urn I is chosen, and II = event Urn II is chosen. The probabilities that a red marble is selected from Urns I and II are 0.5 and 0.55, respectively. Because the marbles are selected with replacement, we see that, if Urn I is chosen, then X ∼ B(6, 0.5), whereas, if Urn II is chosen, then X ∼ B(6, 0.55). Hence, for x = 0, 1, . . . , 6, ' ( ' ( 6 6 x 6−x P (X = x | I ) = (0.5) (0.5) and P (X = x | II ) = (0.55)x (0.45)6−x . x x As each urn is equally likely to be the one chosen, we have P (I ) = P (II ) = 0.5. Applying Bayes’s rule, we now conclude that, for x = 0, 1, . . . , 6, P (II )P (X = x | II ) P (I )P (X = x | I ) + P (II )P (X = x | II ) % & 0.5 · x6 (0.55)x (0.45)6−x = % & % & 0.5 · x6 (0.5)x (0.5)6−x + 0.5 · x6 (0.55)x (0.45)6−x ' (−1 (0.55)x (0.45)6−x 1 = = 1 + . (0.5)x (0.5)6−x + (0.55)x (0.45)6−x (1.1)x (0.9)6−x

P (II | X = x) =

Evaluating this last expression, we get the following table, where the probabilities are displayed to four decimal places: x

0

1

2

3

4

5

6

P (II | X = x )

0.3470

0.3938

0.4425

0.4925

0.5425

0.5917

0.6392

Theory Exercises 5.53 a) Applying the binomial theorem, we get n ' ( &n / n k 1 = 1 = p + (1 − p) = p (1 − p)n−k . k k=0 n

%

b) Let X ∼ B(n, p). Then, as a PMF must sum to 1, we get n ' ( / / n k 1= pX (x) = p (1 − p)n−k . k x k=0

5-28

CHAPTER 5

Discrete Random Variables and Their Distributions

5.54 Suppose that X is symmetric. Then, in particular, we have ' ( ' ( n n n 0 n n−n p = p (1 − p) = pX (n) = pX (n − n) = pX (0) = p (1 − p)n−0 = (1 − p)n . n 0 Therefore, p = 1 − p, which implies that p = 1/2. Conversely, suppose that p = 1/2. Let R = {0, 1, . . . , n}. We note that n − x ∈ R if and only if x ∈ R. Hence, for x ∈ / R, we have pX (x) = 0 = pX (n − x). Moreover, for x ∈ R, ' ( ' ( n −n n pX (x) = 2 = 2−n = pX (n − x). x n−x Thus, we have shown that pX (x) = pX (n − x) for all x; that is, X is symmetric. 5.55 a) We have

and

' ( ' ( n n! n! (n − 1)! n−1 & =n (n − k) = (n − k) = =n % k k! (n − k)! k! (n − k − 1)! k k! (n − 1) − k ! ' ( ' ( n n! n! (n − 1)! n−1 % & =n k =k = =n . k k! (n − k)! (k − 1)! (n − k)! k−1 (k − 1)! (n − 1) − (k − 1) !

b) Using integration by parts with u = (1 − t)k and dv = t n−k−1 dt, we get =

q

t 0

n−k−1

<

t n−k (1 − t) dt = (1 − t) · n−k k

=

k

>q 0



=

k 1 pk q n−k + n−k n−k

q 0

=

q 0

t n−k · k(1 − t)k−1 (−1) dt n−k t n−k (1 − t)k−1 dt.

% & c) Multiplying both sides of the equation in part (b) by (n − k) nk , we get ' (= q ' (= q n n n−k−1 k (n − k) t (1 − t) dt = pX (k) + k t n−k (1 − t)k−1 dt. k 0 k 0 Applying now the identities in part (a) gives ' (= q ' (= q n−1 n−1 n−k−1 k n t (1 − t) dt = pX (k) + n t n−k (1 − t)k−1 dt. k k−1 0 0 Now let

' (= q n−1 Ak = n t n−k−1 (1 − t)k dt. k 0

Then we have Ak = pX (k) + Ak−1 , or pX (k) = Ak − Ak−1 . Therefore, P (X ≤ k) = pX (0) +

k /

j =1

pX (j ) = pX (0) +

k / (Aj − Aj −1 ) = pX (0) + Ak − A0 .

j =1

5.3

Binomial Random Variables

However, A0 = n Consequently,

=

0

q

5-29

t n−1 dt = q n = pX (0).

' (= q ' ( n−1 n−1 n−k−1 k P (X ≤ k) = Ak = n t (1 − t) dt = n Bq (n − k, k + 1). k k 0

d) Let Y = n − X. Then Y is the number of failures in n Bernoulli trials with success probability p. Interchanging the roles of success and failure, we see that Y ∼ B(n, q). Applying part (c) to Y with k replaced by n − k − 1, we get P (X > k) = P (n − Y > k) = P (Y < n − k) = P (Y ≤ n − k − 1) ' (= p n−1 =n t n−(n−k−1)−1 (1 − t)n−k−1 dt n−k−1 0 ' ( ' (= p n−1 n−1 k n−k−1 t (1 − t) dt = n Bp (k + 1, n − k). =n k k 0

Advanced Exercises 5.56 a) Let X denote the total number of red marbles obtained in the five draws. We note that, because the sampling is with replacement, X ∼ B(5, k/12). Therefore, ( ' ( ' (2 ' k 3 10 2 5 k 1− = k (12 − k)3 . P (X = 2) = 12 12 2 (12)5

Consider the function g(x) = x 2 (12 − x)3 on the interval [0, 12]. Now,

g ′ (x) = 2x(12 − x)3 − 3x 2 (12 − x)2 = x(12 − x)2 (24 − 2x − 3x) = x(12 − x)2 (24 − 5x).

Thus, g ′ (x) > 0 for x ∈ (0, 4.8) and g ′ (x) < 0 for x ∈ (4.8, 12). Consequently, g is strictly increasing between 0 and 4.8 and strictly decreasing between 4.8 and 12. As g(4) = 8192 and g(5) = 8575, it follows that P (X = 2) is maximized when k = 5. b) Referring to part (a), we see that the maximum probability is 10 10 g(5) = · 8575 = 0.345. 5 (12)5 (12) 5.57 Let p denote the probability that any particular engine works. Also, let X and Y denote the number of working engines for the first and second rockets, respectively. We know that X ∼ B(2, p) and Y ∼ B(4, p). Rocket 1 achieves its mission if and only if X ≥ 1, and rocket 2 achieves its mission if and only if Y ≥ 2. By assumption, then, P (X ≥ 1) = P (Y ≥ 2). Now, ' ( 2 0 P (X ≥ 1) = 1 − P (X = 0) = 1 − p (1 − p)2 = 1 − (1 − p)2 = 2p − p2 0

and

' ( ' ( 4 1 4 0 4 p (1 − p)3 P (Y ≥ 2) = 1 − P (Y = 0) − P (Y = 1) = 1 − p (1 − p) − 1 0 = 1 − (1 − p)4 − 4p(1 − p)3 = 6p2 − 8p 3 + 3p 4 .

5-30

CHAPTER 5

Discrete Random Variables and Their Distributions

Therefore, we want to find p so that 2p − p 2 = 6p2 − 8p 3 + 3p 4 , or 3p4 − 8p 3 + 7p 2 − 2p = 0. However, 3p4 − 8p 3 + 7p 2 − 2p = p(p − 1)2 (3p − 2). As the common probability of achieving the mission is nonzero, we know that p ̸= 0. Consequently, either p = 2/3 or p = 1.

5.4 Hypergeometric Random Variables Basic Exercises 5.58 When we use %the& word %m&“definition” in this exercise, we are referring%mto& Definition 5.5. If r = 0, m then, by definition, r = 0 = 1; similarly, if r < 0, then, by definition, r = 0. If 1 ≤ r ≤ m, then, by definition, ' ( m m(m − 1) · · · (m − r + 1) m(m − 1) · · · (m − r + 1) (m − r)! = = · r r! r! (m − r)! =

m(m − 1) · · · (m − r + 1)(m − r)! m! = . r! (m − r)! r! (m − r)!

If r > m, then, by definition, ' ( m(m − 1) · · · 1 · 0 · (−1) · · · (m − r + 1) 0 m m(m − 1) · · · (m − r + 1) = = = 0. = r! r! r! r Hence,

⎧ m! ⎪ ⎪ , ⎪ ⎨ r! (m − r)!

' ( m = 1, ⎪ r ⎪ ⎪ ⎩ 0,

if 1 ≤ r ≤ m; if r = 0;

if r > m or

r < 0.

5.59 Referring to Definition 5.5 on page 211 yields the following results. a) Because 3 > 0, ' ( 7 7·6·5 7·6·5 = = = 35. 3 3! 6 b) Because −3 < 0, ' ( 7 = 0. −3 c) Because 3 > 0, we have ' ( −7 (−7) · (−8) · (−9) 7·8·9 = =− = −84. 3 3! 6

d) Because −3 < 0, e) Because 7 > 0,

'

( −7 = 0. −3

' ( 3 3 · 2 · 1 · 0 · (−1) · (−2) · (−3) 0 = = = 0. 7 7! 7!

5.4

Hypergeometric Random Variables

f) Because −7 < 0,

'

5-31

( 3 = 0. −7

g) Because 7 > 0, ' ( 8·9 −3 (−3) · (−4) · (−5) · (−6) · (−7) · (−8) · (−9) 7·6·5·4·3·8·9 =− = −36. = =− 7! 2·1 7 7!

h) Because −7 < 0,

' ( −3 = 0. −7

i) Because 3 > 0, we have ' ( 7 1/4 (1/4) · (1/4 − 1) · (1/4 − 2) (1/4) · (−3/4) · (−7/4) 1·3·7 = = = = 3 . 3 3! 3! 128 4 ·6

j) Because −3 < 0,

' ( 1/4 = 0. −3

k) Because 3 > 0, we have ' ( 15 −1/4 (−1/4) · (−1/4 − 1) · (−1/4 − 2) (−1/4) · (−5/4) · (−9/4) 1·5·9 =− = = =− 3 . 3 3! 3! 128 4 ·6

l) Because −3 < 0,

m) No, because 1/4 is not an integer.

'

( −1/4 = 0. −3

5.60 We refer to Defintion 5.5 on page 211. a) If r = 0, then ' ( ' ( −1 −1 = = 1 = (−1)0 = (−1)r . r 0

If r > 0, then ' ( −1 (−1) · (−2) · · · (−1 − r + 1) (−1)r 1 · 2 · · · r r! = = = (−1)r = (−1)r . r r! r! r! b) If r = 0, then ' ( ' ( ' ( ' ( ' ( −m −m m−1 0 m+0−1 r m+r −1 = =1= = (−1) = (−1) . 0 0 r 0 0 If r > 0, then ' ( −m (−m) · (−m − 1) · · · (−m − r + 1) (−1)r m · (m + 1) · · · (m + r − 1) = = r r! r! ' ( r (m + r − 1) · (m + r − 2) · · · m r m+r −1 = (−1) = (−1) . r! r

5-32

CHAPTER 5

Discrete Random Variables and Their Distributions

5.61 % & a) We first note that there are 52 5 possible equally likely outcomes of the random experiment. Hence, for each event E, N(E) P (E) = ' ( . 52 5

Let A be the event from & the % &that exactly three face cards are obtained. Three face cards can be selected%40 ways, and two non-face cards can be selected from the 40 non-face cards in 12 face cards in 12 3 2 ways. Hence, by the BCR, ' (' ( 12 40 N(A) 3 2 ' ( = 0.0660. P (A) = ' ( = 52 52 5 5

b) Let the specified attribute be “face card.” Here N = 52, M = 12, N − M = 40, and n = 5. Noting that p = 12/52 = 3/13, we see that the number, X, of face cards obtained has the H(52, 5, 3/13) distribution. Consequently, in view of Equation (5.13) on page 210, ' (' ( 12 40 x 5−x ' ( , x = 0, 1, 2, 3, 4, 5, pX (x) = 52 5 and pX (x) = 0 otherwise.

5.62 We first note that X ∼ H(N, n, p). It follows that p̂ is a discrete random variable whose possible values are (contained among) 0, 1/n, 2/n, . . . , (n − 1)/n, 1. Therefore, from Equation (5.12) on page 210, the PMF of p̂ is
\[ p_{\hat p}(x) = P(\hat p = x) = P(X/n = x) = P(X = nx) = p_X(nx) = \frac{\binom{Np}{nx}\binom{N(1-p)}{n(1-x)}}{\binom{N}{n}} \]
if x = 0, 1/n, 2/n, . . . , (n − 1)/n, 1; and p_{\hat p}(x) = 0 otherwise.

5.63
a) Of the five pills chosen, let X denote the number of placebos selected. Here N = 10, M = 5, and N − M = 5. Noting that p = 5/10 = 0.5, we see that X ∼ H(10, 5, 0.5). Hence, from the complementation rule and Equation (5.13) on page 210,
\[ P(X \ge 2) = 1 - P(X = 0) - P(X = 1) = 1 - p_X(0) - p_X(1) = 1 - \frac{\binom{5}{0}\binom{5}{5}}{\binom{10}{5}} - \frac{\binom{5}{1}\binom{5}{4}}{\binom{10}{5}} = 1 - \frac{26}{252} = 0.897. \]

b) We can solve this problem in several ways. One way is to use the general multiplication rule. Let E be the event that the first three pills you select are placebos and, for 1 ≤ j ≤ 3, let Aj be the event that


the j-th pill you select is a placebo. Then E = A1 ∩ A2 ∩ A3 and, by the general multiplication rule for three events,
\[ P(E) = P(A_1 \cap A_2 \cap A_3) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \cap A_2) = \frac{5}{10}\cdot\frac{4}{9}\cdot\frac{3}{8} = \frac{1}{12} = 0.0833. \]
Another method is to use the BCR as follows:
\[ P(E) = \frac{5\cdot 4\cdot 3}{10\cdot 9\cdot 8} = \frac{60}{720} = \frac{1}{12} = 0.0833. \]
Yet another method is to use the H(10, 3, 0.5) distribution:
\[ P(E) = \frac{\binom{5}{3}\binom{5}{0}}{\binom{10}{3}} = \frac{10\cdot 1}{120} = \frac{1}{12} = 0.0833. \]
5.64 Let Y denote the number of households of the six sampled that have a VCR. Because the sampling is without replacement, Y ∼ H(N, 6, 0.842), where N is the number of U.S. households. Because the sample size is small relative to the population size, we can use the binomial distribution with parameters n = 6 and p = 0.842 to approximate probabilities for Y.
a) We have
\[ P(Y = 4) \approx \binom{6}{4}(0.842)^4(0.158)^2 = 0.188. \]

b) We have
\[ P(Y \ge 4) = P(Y=4) + P(Y=5) + P(Y=6) \approx \binom{6}{4}(0.842)^4(0.158)^2 + \binom{6}{5}(0.842)^5(0.158)^1 + \binom{6}{6}(0.842)^6(0.158)^0 = 0.946. \]
c) From the complementation rule,
\[ P(Y \le 4) = 1 - P(Y > 4) = 1 - P(Y=5) - P(Y=6) \approx 1 - \binom{6}{5}(0.842)^5(0.158)^1 - \binom{6}{6}(0.842)^6(0.158)^0 = 0.242. \]
d) We have
\[ P(2 \le Y \le 5) = P(Y=2) + P(Y=3) + P(Y=4) + P(Y=5) \approx \binom{6}{2}(0.842)^2(0.158)^4 + \binom{6}{3}(0.842)^3(0.158)^3 + \binom{6}{4}(0.842)^4(0.158)^2 + \binom{6}{5}(0.842)^5(0.158)^1 = 0.643. \]


e) As we noted, because the sample size is small relative to the population size, Y has approximately the binomial distribution with parameters n = 6 and p = 0.842. Hence,
\[ p_Y(y) \approx \binom{6}{y}(0.842)^y(0.158)^{6-y}, \qquad y = 0, 1, \ldots, 6, \]
and p_Y(y) = 0 otherwise.
f) The PMF obtained in part (e) is only approximately correct because the sampling is without replacement, implying that, strictly speaking, the trials are not independent.
g) As we noted, Y ∼ H(N, 6, 0.842). Hence, from Equation (5.12) on page 210,
\[ p_Y(y) = \frac{\binom{0.842N}{y}\binom{0.158N}{6-y}}{\binom{N}{6}}, \qquad y = 0, 1, \ldots, 6, \]
and p_Y(y) = 0 otherwise.

5.65 Let X denote the number of defective parts of the three selected. Also, let
I = event that Bin I is chosen, and II = event that Bin II is chosen.
Given that Bin I is chosen, X ∼ H(20, 3, 1/4), whereas, given that Bin II is chosen, X ∼ H(15, 3, 4/15). Hence,
\[ P(X = 2 \mid I) = \frac{\binom{5}{2}\binom{15}{1}}{\binom{20}{3}} = \frac{5}{38} \quad\text{and}\quad P(X = 2 \mid II) = \frac{\binom{4}{2}\binom{11}{1}}{\binom{15}{3}} = \frac{66}{455}. \]
Moreover, because the bin chosen is selected randomly, we have P(I) = P(II) = 1/2. Hence, by Bayes's rule,
\[ P(I \mid X = 2) = \frac{P(I)\,P(X = 2 \mid I)}{P(I)\,P(X = 2 \mid I) + P(II)\,P(X = 2 \mid II)} = \frac{\tfrac12\cdot\tfrac{5}{38}}{\tfrac12\cdot\tfrac{5}{38} + \tfrac12\cdot\tfrac{66}{455}} = 0.476. \]
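The Bayes computation above is easy to reproduce numerically. The sketch below uses a small helper for the hypergeometric PMF built on math.comb; it is illustrative only and not code from the text.

from math import comb

def hyper_pmf(k, N, M, n):
    # P(X = k) when n items are drawn without replacement from N items,
    # M of which have the attribute of interest.
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

p_given_I  = hyper_pmf(2, 20, 5, 3)    # Bin I: 5 defective of 20  -> 5/38
p_given_II = hyper_pmf(2, 15, 4, 3)    # Bin II: 4 defective of 15 -> 66/455

prior = 0.5                            # each bin equally likely
posterior_I = prior * p_given_I / (prior * p_given_I + prior * p_given_II)
print(round(posterior_I, 3))           # 0.476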

5.66 We have N = 10, n = 3, and p = 0.6.
a) If the sampling is without replacement, then X ∼ H(10, 3, 0.6). Thus,
\[ p_X(x) = \frac{\binom{6}{x}\binom{4}{3-x}}{\binom{10}{3}}, \qquad x = 0, 1, 2, 3, \]
and p_X(x) = 0 otherwise. Evaluating the nonzero probabilities, we get the following table:

x       | 0      | 1      | 2      | 3
pX(x)   | 0.0333 | 0.3000 | 0.5000 | 0.1667


b) If the sampling is with replacement, then X ∼ B(3, 0.6). Thus,
\[ p_X(x) = \binom{3}{x}(0.6)^x(0.4)^{3-x}, \qquad x = 0, 1, 2, 3, \]
and p_X(x) = 0 otherwise. Evaluating the nonzero probabilities, we get the following table:

x       | 0     | 1     | 2     | 3
pX(x)   | 0.064 | 0.288 | 0.432 | 0.216

c) The results of parts (a) and (b) differ substantially, which is not surprising because the sample size is not small relative to the population size.
5.67 We have N = 100, n = 5, and p = 0.06.
a) We know that X ∼ H(100, 5, 0.06). Thus,
\[ p_X(x) = \frac{\binom{6}{x}\binom{94}{5-x}}{\binom{100}{5}}, \qquad x = 0, 1, \ldots, 5, \]
and p_X(x) = 0 otherwise. Evaluating the nonzero probabilities, we get the following table:

x       | 0          | 1          | 2          | 3          | 4          | 5
pX(x)   | 0.72908522 | 0.24302841 | 0.02670642 | 0.00116115 | 0.00001873 | 0.00000008
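The hypergeometric table in part (a) can be regenerated with a few lines of Python; this is a sketch using math.comb, not part of the original solution.

from math import comb

N, M, n = 100, 6, 5                    # population size, number defective, sample size
pmf = [comb(M, x) * comb(N - M, n - x) / comb(N, n) for x in range(n + 1)]
for x, p in enumerate(pmf):
    print(f"x = {x}: {p:.8f}")          # reproduces the table above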

b) The appropriate approximating binomial distribution is the one with parameters n = 5 and p = 0.06. Thus,
\[ p_X(x) = \binom{5}{x}(0.06)^x(0.94)^{5-x}, \qquad x = 0, 1, \ldots, 5, \]
and p_X(x) = 0 otherwise. Evaluating the nonzero probabilities and juxtaposing the results with those of part (a), we get the following table:

Defectives x | Hypergeometric probability | Binomial probability
0            | 0.72908522                 | 0.73390402
1            | 0.24302841                 | 0.23422469
2            | 0.02670642                 | 0.02990102
3            | 0.00116115                 | 0.00190858
4            | 0.00001873                 | 0.00006091
5            | 0.00000008                 | 0.00000078

c) Referring to part (b), we see that, here, the binomial approximation to the hypergeometric distribution is quite good for the two smallest values of x, moderately good for the next value of x, and then degrades rapidly as x increases. This fact is not surprising because the rule of thumb for the approximation is barely met—the sample size is exactly 5% of the population size.
5.68 In this case, we have N = 80,000, M = 30,000, N − M = 50,000, and n = 100. Noting that p = 30,000/80,000 = 0.375, we conclude that the number, X, of words that the student knows has


the H(80,000, 100, 0.375) distribution. Hence,
\[ p_X(x) = \frac{\binom{30{,}000}{x}\binom{50{,}000}{100-x}}{\binom{80{,}000}{100}}, \qquad x = 0, 1, \ldots, 100, \]
and p_X(x) = 0 otherwise.
a) We have
\[ P(X = 40) = p_X(40) = \frac{\binom{30{,}000}{40}\binom{50{,}000}{60}}{\binom{80{,}000}{100}} = 0.0712. \]
b) From the FPF,
\[ P(35 \le X \le 40) = \sum_{35 \le x \le 40} p_X(x) = \sum_{k=35}^{40} \frac{\binom{30{,}000}{k}\binom{50{,}000}{100-k}}{\binom{80{,}000}{100}} = 0.465. \]
c) Yes, a binomial approximation to the hypergeometric distribution would be justified in this case because the sample size is very small relative to the population size; in fact, the sample size is only 0.125% of the population size.
d) As this other student knows 38 of the 100 words sampled, her sample proportion of words known in this dictionary is 0.38. So, we would estimate that she knows about 0.38 · 80,000 = 30,400 words in this dictionary. Assuming that this student knows very few words that aren't in this dictionary, we would estimate the size of her vocabulary at 30,400 words.
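Because the sample is only 0.125% of the population, the binomial approximation B(100, 0.375) justified in part (c) gives a quick numerical check of parts (a) and (b). The sketch below uses that approximation, so its output matches 0.0712 and 0.465 only up to the (small) approximation error.

from math import comb

n, p = 100, 0.375                      # binomial approximation to H(80,000, 100, 0.375)

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(binom_pmf(40), 4))                              # approx P(X = 40)
print(round(sum(binom_pmf(k) for k in range(35, 41)), 3))   # approx P(35 <= X <= 40)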

Theory Exercises

5.69 The number of possible samples of size n from the population of size N is \(\binom{N}{n}\). As the selection is done randomly, a classical probability model is appropriate. Hence, for each event E,
\[ P(E) = \frac{N(E)}{N(\Omega)} = \frac{N(E)}{\binom{N}{n}}. \]
Now, let k ∈ {0, 1, . . . , n}. We want to determine the number of ways that the event {X = k} can occur. There are Np members of the population with the specified attribute, of which k are to be selected; that can be done in \(\binom{Np}{k}\) ways. And there are N(1 − p) members of the population without the specified attribute, of which n − k are to be selected; that can be done in \(\binom{N(1-p)}{n-k}\) ways. Hence, by the BCR,
\[ P(\{X = k\}) = \frac{N(\{X = k\})}{\binom{N}{n}} = \frac{\binom{Np}{k}\binom{N(1-p)}{n-k}}{\binom{N}{n}}. \]
Consequently,
\[ p_X(x) = \frac{\binom{Np}{x}\binom{N(1-p)}{n-x}}{\binom{N}{n}}, \qquad x = 0, 1, \ldots, n, \]
and p_X(x) = 0 otherwise.

5.70 We know that X has the hypergeometric distribution with parameters N, n, and p. Recalling that q = 1 − p, we have, for x = 0, 1, . . . , n,
\[ p_X(x) = \frac{\binom{Np}{x}\binom{Nq}{n-x}}{\binom{N}{n}} = \frac{\dfrac{(Np)_x}{x!}\cdot\dfrac{(Nq)_{n-x}}{(n-x)!}}{\dfrac{(N)_n}{n!}} = \binom{n}{x}\frac{(Np)_x\,(Nq)_{n-x}}{(N)_n} \]
\[ = \binom{n}{x}\frac{(Np)(Np-1)\cdots(Np-x+1)\cdot(Nq)(Nq-1)\cdots\bigl(Nq-(n-x)+1\bigr)}{N(N-1)\cdots(N-n+1)} = \binom{n}{x}\frac{p\bigl(p-\tfrac1N\bigr)\cdots\bigl(p-\tfrac{x-1}{N}\bigr)\cdot q\bigl(q-\tfrac1N\bigr)\cdots\bigl(q-\tfrac{n-x-1}{N}\bigr)}{1\bigl(1-\tfrac1N\bigr)\cdots\bigl(1-\tfrac{n-1}{N}\bigr)}. \]
Hence,
\[ \lim_{N\to\infty} p_X(x) = \binom{n}{x}\frac{p^x\cdot q^{n-x}}{1} = \binom{n}{x}p^x(1-p)^{n-x}. \]
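The limit in Exercise 5.70 can also be seen numerically. The sketch below fixes n = 5, p = 0.2 (arbitrary illustrative values) and lets N grow; the hypergeometric probabilities approach the binomial value.

from math import comb

def hyper_pmf(x, N, n, p):
    M = round(N * p)                       # number of population members with the attribute
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p, x = 5, 0.2, 2
for N in (20, 100, 1000, 100000):
    print(N, round(hyper_pmf(x, N, n, p), 6), round(binom_pmf(x, n, p), 6))
# The first column of probabilities converges to the binomial value as N -> infinity.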

5.71 a) Here we consider ordered samples with replacement. From Example 3.4(c) on page 90, there are N n such samples. The problem is to find the probability that exactly k of the n members sampled have the specified attribute. Because the %sampling is random, each possible sample is equally likely to be the one & selected. We note that there are nk ways to choose the k positions in the ordered sample for the members that have the specified attribute. Once that option has been determined, there are (Np)k choices for which members with the specified attribute occupy the k positions and (Nq)n−k choices for which members without the specified attribute occupy the remaining n − k positions. Hence, by the BCR, % n& ' ( · (Np)k · (Nq)n−k N({X = k}) n (Np)k (Nq)n−k P ({X = k}) = = k = . N(!) k Nn Nn Consequently,

' ( n (Np)x (Nq)n−x , x = 0, 1, . . . , n, pX (x) = x (N)n and pX (x) = 0 otherwise. b) For x = 0, 1, . . . , n, we have ' ( ' ( ' ( n (Np)x (Nq)n−x n x n−x n (Np)x (Nq)n−x = = p q , n x n−x (N) x (N) (N) x x

which is the PMF of a binomial distribution with parameters n and p. c) Here we consider ordered samples without replacement. From Example 3.8(a) on page 96, there are (N)n such samples. The problem is to find the probability that exactly k of the n members sampled


have the specified attribute. Because the %sampling is random, each possible sample is equally likely to be n& the one selected. We note that there are k ways to choose the k positions in the ordered sample for the members that have the specified attribute. Once that option has been determined, there are (Np)k choices for which members with the specified attribute occupy the k positions and (Nq)n−k choices for which members without the specified attribute occupy the remaining n − k positions. Hence, by the BCR, % n& ' ( N({X = k}) n (Np)k (Nq)n−k k · (Np)k · (Nq)n−k P ({X = k}) = = . = k N(!) (N)n (N)n Consequently,

' ( n (Np)x (Nq)n−x pX (x) = , x (N)n

x = 0, 1, . . . , n,

and pX (x) = 0 otherwise. d) For x = 0, 1, . . . , n, we have, in view of the permutations rule (Proposition 3.2 on page 95) and the combinations rule (Proposition 3.4 on page 99), (Np)! (Nq)! ' ( n (Np)x (Nq)n−x n! (Np − x)! (Nq − (n − x))! = N! x (N)n x! (n − x)! (N − n)! (Np)! (Nq)! x! (Np − x)! (n − x)! (Nq − (n − x))! = = N! n! (N − n)!

'

Np x

('

( Nq n−x ' ( , N n

which is the PMF of a hypergeometric distribution with parameters N, n, and p. 5.72 a) Applying Vandermonde’s identity with k = n, n = M, and m = N − M yields ( ' ( ' ( n ' (' / M N −M M + (N − M) N = = . k n − k n n k=0

% & The required result now follows upon dividing both sides of the preceding display by Nn . b) Let X ∼ H(N, n, M/N). Then, as a PMF must sum to 1, we get ' (' ( ' (' ( N · M/N N · (1 − M/N) M N −M n n / / / k n−k k n−k ' ( ' ( 1= = . pX (x) = N N x k=0 k=0 n n c) Applying part (b) with n = k, M = n, and N = n + m yields ' (' ( ' (' ( n (n + m) − n n m ( k k ' (' k / / / 1 n m j k−j j k−j ' ( =' ' ( ( 1= = . n+m n+m n + m j =0 j k − j j =0 j =0 k k k & % The required result now follows upon multiplying both sides of the preceding display by n+m k .


Advanced Exercises 5.73 Suppose that the committee were randomly selected. Let X denote the number of men chosen. Noting that 12/20 = 0.6, we have that X ∼ H(20, 5, 0.6). The probability that one or fewer men would be chosen is ' (' ( ' (' ( 12 8 12 8 / 0 5 1 4 pX (x) = pX (0) + pX (1) = ' ( + ' ( = 0.0578. P (X ≤ 1) = 20 20 x≤1 5 5 In other words, if the committee were randomly selected, there is less than a 6% chance that at most one man would be chosen. Consequently, the men do have reason to complain that the committee was not randomly selected. 5.74 a) Let X denote the number of red balls of the six sampled. Then X ∼ H(12, 6, 2/3). Noting that at least two of the balls sampled must be red, we have ' (' ( 8 4 x 6−x ' ( P (X = x) = , x = 2, 3, . . . , 6, 12 6 and P (X = x) = 0 otherwise. b) Let Y denote the number of red balls of the eight sampled. Then Y ∼ H(12, 8, 1/2). Noting that at least two of the balls sampled must be red and that at most six of the balls sampled can be red, we have ' (' ( 6 6 x 8−x ' ( P (Y = x) = , x = 2, 3, . . . , 6, 12 8 and P (Y = x) = 0 otherwise. c) We claim that the quantities on the right of the two preceding displays are equal for all x = 2, 3, . . . , 6. But that result is just the identity presented in Exercise 3.106(c) with N = 12, n = 8, and m = 6. Hence, there are no values of x for which the answer to part (a) is greater than the answer to part (b), and there are no values of x for which the answer to part (a) is smaller than the answer to part (b). 5.75 a) As the first class contains only four upper-division students, at least two of the six students sampled must be lower-division students. This type of restriction does not arise in the second class because there are 40 upper-division students. Consequently, as both classes contain at least six lower-division students, the set of possible values of the number of lower-division students sampled is {2, 3, . . . , 6} for the first class and {0, 1, . . . , 6} for the second class. b) A greater difference exists for the first class between the probability distributions of the number of lower-division students obtained for sampling with replacement (binomial distribution) and for sampling without replacement (hypergeometric distribution). Indeed, the sample size is smaller relative to the population size in the second class than in the first class, namely, 5% and 50%, respectively.


5.76 Let A denote the event that a shipment is good—it contains three or fewer defective parts. Set X equal to the number of defective parts in a random sample of size 10 from a shipment. We want to find the smallest integer k so that P (X < k | A) ≥ 0.95. To that end, let Aj denote the event that the shipment ? contains exactly j defective parts. We note that the Aj s are mutually exclusive and that A = j3=0 Aj . Suppose that P (X < k | Aj ) ≥ 0.95 for 0 ≤ j ≤ 3. Then, P ({X < k} ∩ A) P (X < k | A) = = P (A) ≥

33

j =0 P (Aj ) · 0.95 33 j =0 P (Aj )

33

j =0 P ({X < k} ∩ Aj ) 33 j =0 P (Aj )

33

j =0 P (Aj )

= 0.95 · 33

j =0 P (Aj )

=

33

= 0.95.

j =0 P (Aj )P (X < 33 j =0 P (Aj )

k | Aj )

Thus, we want the smallest k such that P (X < k | Aj ) ≥ 0.95 for 0 ≤ j ≤ 3. Now, given that event Aj occurs, we know that X ∼ H(100, 10, 0.0j ). Thus, min{k−1,j ' (' ( / } 'j ('100 − j ( j 100 − j k−1 i 10 − i / i 10 − i i=0 ' ( ' ( P (X < k | Aj ) = = . 100 100 i=0 10 10

We therefore obtain the following table:

k | P(X < k | A0) | P(X < k | A1) | P(X < k | A2) | P(X < k | A3)
1 | 1             | 0.9           | 0.8091        | 0.7265
2 | 1             | 1             | 0.9909        | 0.9742

From the table, we see that the smallest k such that P (X < k | Aj ) ≥ 0.95, for 0 ≤ j ≤ 3, is k = 2.
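The table can be regenerated with the short sketch below, which computes P(X < k | Aj) for X ∼ H(100, 10, j/100) using math.comb; it is an illustration, not code from the text.

from math import comb

def p_less_than(k, j, N=100, n=10):
    # P(X < k) when n parts are sampled without replacement from a shipment
    # of N parts containing exactly j defectives.
    return sum(comb(j, i) * comb(N - j, n - i) / comb(N, n)
               for i in range(0, min(k - 1, j) + 1))

for k in (1, 2):
    row = [round(p_less_than(k, j), 4) for j in range(4)]
    print(f"k = {k}: {row}")
# k = 2 is the smallest k with every entry at least 0.95, matching the conclusion above.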

5.5 Poisson Random Variables

Basic Exercises

5.77 The PMF of X is given by
\[ p_X(x) = e^{-3}\frac{3^x}{x!}, \qquad x = 0, 1, 2, \ldots, \]
and p_X(x) = 0 otherwise.
a) We have
\[ P(X = 3) = p_X(3) = e^{-3}\frac{3^3}{3!} = 4.5e^{-3} = 0.224. \]
b) From the FPF,
\[ P(X < 3) = \sum_{x < 3} p_X(x) = p_X(0) + p_X(1) + p_X(2) = e^{-3}\Bigl(\frac{3^0}{0!} + \frac{3^1}{1!} + \frac{3^2}{2!}\Bigr) = 8.5e^{-3} = 0.423. \]
d) From the complementation rule,
\[ P(X \le 3) = 1 - P(X > 3) = 1 - 0.353 = 0.647. \]
e) From the complementation rule and part (b),
\[ P(X \ge 3) = 1 - P(X < 3) = 1 - 0.423 = 0.577. \]
5.78 Let X denote the number of wars that began during the selected calendar year. The PMF of X is given by
\[ p_X(x) = e^{-0.7}\frac{(0.7)^x}{x!}, \qquad x = 0, 1, 2, \ldots, \]
and p_X(x) = 0 otherwise.
a) We have
\[ P(X = 0) = p_X(0) = e^{-0.7}\frac{(0.7)^0}{0!} = e^{-0.7} = 0.497. \]
b) From the FPF,
\[ P(X \le 2) = \sum_{x \le 2} p_X(x) = p_X(0) + p_X(1) + p_X(2) \]

(0.7)0 (0.7)1 (0.7)2 + e−0.7 + e−0.7 0! 1! 2! ' ( 0 1 2 (0.7) (0.7) (0.7) + + = 1.945e−0.7 = 0.966. = e−0.7 0! 1! 2!

= e−0.7

c) From the FPF,

P (1 ≤ X ≤ 3) =

/

1≤x≤3

pX (x) = pX (1) + pX (2) + pX (3)

(0.7)1 (0.7)2 (0.7)3 + e−0.7 + e−0.7 1! 2! 3! ' ( 1 2 3 (0.7) (0.7) (0.7) + + = 0.498. = e−0.7 1! 2! 3! = e−0.7
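The Poisson probabilities in Exercises 5.77 and 5.78 are easy to confirm with a hand-rolled PMF; the sketch below uses only the standard library and is illustrative, not part of the original solution.

from math import exp, factorial

def pois_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# Exercise 5.77 (lambda = 3) and Exercise 5.78 (lambda = 0.7)
print(round(pois_pmf(3, 3.0), 3))                                   # 0.224
print(round(sum(pois_pmf(k, 3.0) for k in range(3)), 3))            # P(X < 3)  = 0.423
print(round(sum(pois_pmf(k, 0.7) for k in range(3)), 3))            # P(X <= 2) = 0.966
print(round(sum(pois_pmf(k, 0.7) for k in (1, 2, 3)), 3))           # P(1 <= X <= 3) = 0.498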

5.79 Let X denote the number of Type I calls made from the motel during a 1-hour period. The PMF of X is given by (1.7)x pX (x) = e−1.7 , x = 0, 1, 2, . . . , x! and pX (x) = 0 otherwise. a) We have (1.7)1 = 1.7e−1.7 = 0.311. P (X = 1) = pX (1) = e−1.7 1!


b) From the FPF, P (X ≤ 2) =

/ x≤2

pX (x) = pX (0) + pX (1) + pX (2)

(1.7)0 (1.7)1 (1.7)2 + e−1.7 + e−1.7 0! 1! 2! ' ( 0 1 2 (1.7) (1.7) (1.7) + + = 4.145e−1.7 = 0.757. = e−1.7 0! 1! 2! = e−1.7

c) From the complementation rule and FPF,

P (X ≥ 2) = 1 − P (X < 2) = 1 −

/ x ≈ 459.4. ln 0.995 Hence, using the binomial distribution, the sample size must be at least 460. Applying the Poisson approximation, the condition becomes e−0.005n < 0.1

or

n>−

ln 0.1 ≈ 460.5. 0.005

Thus, using the Poisson approximation, the sample size must be at least 461. The answers obtained by using the binomial distribution and Poisson approximation agree almost perfectly (i.e., they differ by 1). 5.85 Let X denote the number of eggs in a nest, and let Y denote the number of eggs observed. We know that X ∼ P(4). Moreover, because an empty nest cannot be recognized, we have, for k ∈ N , P (X = k, X > 0) P (X = k) = P (X > 0) P (X > 0) ' −4 ( k −4 k e 4 /k! e 4 P (X = k) = = . = 1 − P (X = 0) 1 − e−4 1 − e−4 k!

P (Y = k) = P (X = k | X > 0) =

Hence,

pY (y) = and pY (y) = 0 otherwise.

'

e−4 1 − e−4

(

4y , y!

y = 1, 2, 3, . . . ,

Theory Exercises 5.86 a) We first note that, because npn → λ as n → ∞, we have lim pn = lim

n→∞

n→∞

λ npn = = 0. n lim n n→∞


Now, let x be a nonnegative integer. Then, for n sufficiently large, ' ( n x (n)x x pn (1 − pn )n−x = p (1 − pn )−x (1 − pn )n pXn (x) = x x! n 8 8 npn 9n (n)x nx x (n)x (npn )x npn 9n −x −x 1 − = p (1 − p ) = · · (1 − p ) · 1 − . n n x! nx n n nx x! n

However,

' ( ' ( n(n − 1) · · · (n − x + 1) (n)x 1 x−1 = lim = lim 1 · 1 − ··· 1 − =1 lim n→∞ nx n→∞ n→∞ nx n n and, by the hint,

8 npn 9n 1− = e−λ . n→∞ n lim

Therefore,

lim pXn (x) = 1 ·

n→∞

λx λx · (1 − 0)−x · e−λ = e−λ . x! x!

b) Let X ∼ B(n, p), where n is large and p is small. Set Xn = X, pn = p, and λ = np. Note that npn = λ. By part (a), we have, for x = 0, 1, . . . , n, pX (x) = pXn (x) ≈ e−λ

λx (np)x = e−np , x! x!

which is Equation (5.21) on page 220. Thus, part (a) provides the theoretical basis for Proposition 5.7. 5.87 For k ∈ N , we have λ e−λ λk /k! λ−k pX (k) = =1+ = −λ k−1 . pX (k − 1) k k e λ /(k − 1)! Therefore, It follows that pX (k) > pX (k − 1), pX (k) = pX (k − 1), pX (k) < pX (k − 1),

if k < λ; if k = λ; if k > λ.

(∗)

Let’s now consider two cases. Recall that m = ⌊λ⌋. Case 1: λ isn’t an integer. We have k < λ if and only if k ≤ m and we have k > λ if and only if k ≥ m + 1. Thus, from Relations (∗), we see that pX (k) is strictly increasing as k goes from 0 to m and is strictly decreasing as k goes from m to ∞. Case 2: λ is an integer. We have k < λ if and only if k ≤ m − 1 and we have k > λ if and only if k ≥ m + 1. Thus, from Relations (∗), we see that pX (k) is strictly increasing as k goes from 0 to m − 1, that pX (m − 1) = pX (m), and that pX (k) is strictly decreasing as k goes from m to ∞.


5.88 a) Using integration by parts with u = t k+1 and dv = e−t dt, we get >∞ < = ∞ = ∞ 1 1 1 k+1 −t k+1 −t t e dt = − + (k + 1)t k e−t dt t e (k + 1)! a (k + 1)! a (k + 1)! a = a k+1 1 ∞ k −t + t e dt. = e−a (k + 1)! k! a b) We want to prove that

=



0

t k e−t dt = k!,

k = 0, 1, 2, . . . .

(∗)

@∞ Now, because 0 e−t dt = 1 = 0!, Equation (∗) holds for k = 0. Assuming that Equation (∗) holds for k, we get, in view of part (a), that = ∞ = (k + 1)! ∞ k −t k+1 −t −0 k+1 t e dt = e 0 + t e dt = 0 + (k + 1)k! = (k + 1)!, k! 0 0

as required. c) Let

1 Ak = k!

=



t k e−t dt,

λ

k = 0, 1, 2, . . . .

Noting that A0 = e−λ , we get, from the FPF and part (a), that P (X ≤ k) =

/ x≤k

pX (x) =

= Ak =

1 k!

=

λ

k /

e−λ

j =0



k k / / λj λj = e−λ + e−λ = A0 + (Aj − Aj −1 ) j! j! j =1 j =1

t k e−t dt =

1 '(k + 1, λ). k!

d) Applying, in turn, the complementation rule, part (c), and part (b), we get '= ∞ ( = = λ 1 1 ∞ k −t k −t k −t t e dt = 1 − t e dt − t e dt P (X > k) = 1 − P (X ≤ k) = 1 − k! λ k! 0 0 ' ( = λ = 1 1 λ k −t 1 =1− k! − t k e−t dt = t e dt = γ (k + 1, λ). k! k! k! 0 0

Advanced Exercises 5.89 Let X denote the number of defective bolts in a box of 100. Then X ∼ B(100, 0.015). We can compute probabilites for X exactly by using the binomial PDF ' ( 100 (0.015)x (0.985)100−x , x = 0, 1, . . . , 100, pX (x) = x and pX (x) = 0 otherwise. However, it is computationally simpler to use the Poisson approximation: pX (x) ≈ e−1.5

(1.5)x , x!

x = 0, 1, . . . , 100.


a) We have P (X = 0) = pX (0) ≈ e−1.5

(1.5)0 = e−1.5 = 0.223. 0!

b) From the FPF, P (X ≤ 2) =

/ x≤2

pX (x) =

≈ e−1.5

2 /

pX (k)

k=0

(1.5)0 (1.5)1 (1.5)2 + e−1.5 + e−1.5 0! 1! 2!

= 3.625e−1.5 = 0.809. c) We again use the Poisson approximation. Let X denote the number of defective bolts in a box that contains 100 + m bolts. We know that X ∼ B(100 + m, 0.015) ≈ P(1.5 + 0.015m). We want to choose the smallest m so that P (X ≤ m) ≥ 0.95. Now, P (X ≤ m) =

/

x≤m

pX (x) =

m / k=0

pX (x) ≈ e−(1.5+0.015m)

m / (1.5 + 0.015m)x

x!

x=0

.

Using the preceding result, we obtain the following table:

m        | 0     | 1     | 2     | 3     | 4
P(X ≤ m) | 0.223 | 0.553 | 0.801 | 0.929 | 0.978

From the table, we see that the required m is 4; that is, the minimum number of bolts that must be placed in a box to ensure that 100 or more of the bolts will be nondefective at least 95% of the time is 104. 5.90 Let X denote the number of customers per day who arrive at Grandma’s Fudge Shoppe, and let Y denote the number who make a purchase. We know that X ∼ P(λ). Furthermore, given that X = n, we have Y ∼ B(n, p). Applying the law of total probability and Equation (5.26) on page 222, we get, for each nonnegative integer k, that P (Y = k) =
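The table above can be reproduced with the Poisson approximation P(1.5 + 0.015m) described in part (c); the following sketch is illustrative only.

from math import exp, factorial

def pois_cdf(m, lam):
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(m + 1))

for m in range(5):
    lam = 1.5 + 0.015 * m          # Poisson approximation to B(100 + m, 0.015)
    print(m, round(pois_cdf(m, lam), 3))
# The first m with probability >= 0.95 is m = 4, so 104 bolts per box suffice.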

∞ / n=k

=e

−λ

=e

−λ

P (X = n)P (Y = k | X = n) =

∞ / n=k

e

−λ

' ( λn n k p (1 − p)n−k n! k

% &n−k ∞ ∞ k / λ(1 − p) λn−k λk p k / n−k −λ (λp) (1 − p) =e k! n=k (n − k)! k! n=k (n − k)! % &j ∞ λ(1 − p) (λp)k / (λp)k λ(1−p) (pλ)k e . = e−λ = e−pλ k! j =0 k! k! j!

Hence, Y ∼ P(pλ), that is, Y has the Poisson distribution with parameter pλ.


5.91 For nonnegative integers x and y, λx x!

pX (x) = e−λ

and

pY (y) = e−µ

µy , y!

and pX (x) = pY (y) = 0 otherwise. a) Let z be a nonnegative integer. Then, by the law of total probability, the assumed independence condition, and the binomial theorem, ∞ /

pX+Y (z) = P (X + Y = z) = = =

∞ / x=0

z /

x=0

P (X = x)P (X + Y = z | X = x)

P (X = x)P (Y = z − x | X = x) = e

−λ

x=0

= e−(λ+µ)

∞ / x=0

P (X = x)P (Y = z − x)

z ' ( / λx −µ µz−x z x z−x 1 −(λ+µ) 1 ·e =e λ µ = e−(λ+µ) (λ + µ)z . x! (z − x)! z! x=0 x z!

(λ + µ)z . z!

Therefore, X + Y ∼ P(λ + µ). b) From the assumed independence condition, we have P (X + Y = 3 | X = 1) = P (Y = 2 | X = 1) = P (Y = 2) = pY (2) = e−µ

µ2 −µ µ2 = e . 2! 2

5.92 From the exponential series, et =

∞ k ∞ ∞ / / / t t 2k t 2k+1 = + . k! k=0 (2k)! k=0 (2k + 1)! k=0

It follows that e−t =

∞ ∞ / / t 2k t 2k+1 − (2k)! k=0 (2k + 1)! k=0

and, therefore, that t

e +e

−t

∞ / t 2k =2 . (2k)! k=0

Applying the FPF, we get that P (X is even) =

/

x is even

= e−λ

pX (x) =

∞ / k=0

pX (2k) =

1 + e−2λ eλ + e−λ = . 2 2

∞ / k=0

e−λ

∞ / λ2k λ2k = e−λ (2k)! (2k)! k=0


5.6 Geometric Random Variables

Basic Exercises

5.93
a) From Proposition 5.9 on page 229, we see that X ∼ G(0.67). Hence, p_X(x) = (0.67)(0.33)^{x−1} if x ∈ N, and p_X(x) = 0 otherwise.
b) From part (a), P(X = 3) = p_X(3) = (0.67)(0.33)^{3−1} = (0.67)(0.33)^2 = 0.0730. From Proposition 5.10 on page 231, P(X ≥ 3) = P(X > 2) = (1 − 0.67)^2 = (0.33)^2 = 0.109. From the complementation rule and Proposition 5.10, P(X ≤ 3) = 1 − P(X > 3) = 1 − (1 − 0.67)^3 = 1 − (0.33)^3 = 0.964.
c) We want to determine n so that P(X ≤ n) ≥ 0.99 or, equivalently, P(X > n) ≤ 0.01. From Proposition 5.10, this last relation is equivalent to (1 − 0.67)^n ≤ 0.01. Thus, we want
\[ (0.33)^n \le 0.01 \quad\text{or}\quad n \ge \frac{\ln 0.01}{\ln 0.33} \approx 4.2. \]
Hence, you must bet on five (or more) races to be at least 99% sure of winning something.

5.94 Let X denote the number of at-bats up to and including the first hit. Then X ∼ G(0.26). a) We have P (X = 5) = pX (5) = (0.26)(1 − 0.26)5−1 = (0.26)(0.74)4 = 0.0780.

b) From Proposition 5.10 on page 231,

P (X > 5) = (1 − 0.26)5 = (0.74)5 = 0.222.

c) From Equation (5.31) on page 230 and Proposition 5.10 on page 231,

P (3 ≤ X ≤ 10) = P (2 < X ≤ 10) = P (X > 2) − P (X > 10)

= (1 − 0.26)2 − (1 − 0.26)10 = (0.74)2 − (0.74)10 = 0.498.
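The geometric PMF and tail formulas used in Exercises 5.93 and 5.94 are easy to verify directly; the following lines are a minimal sketch, not code from the text.

# Geometric distribution with success probability p:
# P(X = x) = p * (1 - p)**(x - 1)  and  P(X > n) = (1 - p)**n.

def geom_pmf(x, p):
    return p * (1 - p) ** (x - 1)

def geom_tail(n, p):
    return (1 - p) ** n

# Exercise 5.94: first hit for a 0.26 hitter
print(round(geom_pmf(5, 0.26), 4))                          # 0.0780
print(round(geom_tail(5, 0.26), 3))                         # 0.222
print(round(geom_tail(2, 0.26) - geom_tail(10, 0.26), 3))   # P(3 <= X <= 10) = 0.498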

5.95 a) From the FPF, P (X is even) = =

/

x is even

p 1−p

pX (x) =

∞ % / k=1

∞ / k=1

(1 − p)2

b) From part (a) and the complementation rule,

pX (2k) =

&k

=

∞ / k=1

p(1 − p)2k−1

p (1 − p)2 1−p · = . 2 1 − p 1 − (1 − p) 2−p

P (X is odd) = 1 − P (X is even) = 1 −

1 1−p = . 2−p 2−p


c) From the lack-of-memory property, the complementation rule, and Proposition 5.10 on page 231, P (2 ≤ X ≤ 9 | X ≥ 4) = P (4 ≤ X ≤ 9 | X ≥ 4) = P (4 ≤ X ≤ 9 | X > 3) =

9 / k=4

P (X = k | X > 3) =

6 /

j =1

P (X = 3 + j | X > 3) =

= P (X ≤ 6) = 1 − P (X > 6) = 1 − (1 − p)6 .

6 /

j =1

P (X = j )

d) From the conditional probability rule, the complementation rule, and Proposition 5.10, we have, for k = 1, 2, . . . , n, that P (X = k | X ≤ n) =

P (X = k, X ≤ n) P (X = k) p(1 − p)k−1 . = = P (X ≤ n) 1 − P (X > n) 1 − (1 − p)n

e) Applying part (d) with n − 1 replacing n and with n − k replacing k, we have, for k = 1, 2, . . . , n − 1, that P (X = n − k | X < n) = P (X = n − k | X ≤ n − 1) =

p(1 − p)(n−k)−1 p(1 − p)n−k−1 = . 1 − (1 − p)n−1 1 − (1 − p)n−1

5.96 Applying Equation (5.35) on page 233, we see that P (X = 10 | X > 6) = P (X = 6 + 4 | X > 6) = P (X = 4).

Hence, it is the first equation that is implied by the lack-of-memory property. 5.97 a) From the conditional probability rule and Proposition 5.10 on page 231, P (X = k | X > n) =

P (X = k, X > n) P (X = k) p(1 − p)k−1 = p(1 − p)k−n−1 . = = P (X > n) P (X > n) (1 − p)n

b) From the lack-of-memory property, Equation (5.35) on page 233, % & P (X = k | X > n) = P X = n + (k − n) | X > n = P (X = k − n) = p(1 − p)k−n−1 . 5.98 Let

I = event that Coin I is selected, and

II = event that Coin II is selected.

Because the coin is selected at random, P (I ) = P (II ) = 0.5. Also, given I , we have X ∼ G(1 − p1 ), whereas, given II , we have X ∼ G(1 − p2 ). a) Let k ∈ N . Then, by the law of total probability, P (X = k) = P (I )P (X = k | I ) + P (II )P (X = k | II ) = 0.5(1 − p1 )p1k−1 + 0.5(1 − p2 )p2k−1 9 8 = 0.5 (1 − p1 )p1k−1 + (1 − p2 )p2k−1 .

Hence,

9 8 pX (x) = 0.5 (1 − p1 )p1x−1 + (1 − p2 )p2x−1 ,

x ∈ N,

and pX (x) = 0 otherwise. b) If p1 = p2 , say, both equal to p, then, by part (a), 8 9 % &x−1 pX (x) = 0.5 (1 − p)px−1 + (1 − p)p x−1 = (1 − p)px−1 = (1 − p) 1 − (1 − p)

if x ∈ N , and pX (x) = 0 otherwise. Thus, in this case, X ∼ G(1 − p).


5.99 a) When sampling is with replacement, the sampling process constitutes Bernoulli trials with success probability p. Consequently, from Proposition 5.9 on page 229, X has the geometric distribution with parameter p. Thus, pX (x) = p(1 − p)x−1 ,

x = 1, 2, . . . ,

and pX (x) = 0 otherwise. b) When sampling is without replacement, the sampling process doesn’t constitute Bernoulli trials. The event {X = x} occurs if and only if none of the first x − 1 members sampled have the specified attribute (event A) and the xth member sampled has the specified attribute (event B). Several approaches are possible for obtaining P (X = x). Here, we proceed as follows. Applying the hypergeometric distribution with parameters N , x − 1, and p, we get ' ( ' (' ( N(1 − p) Np N(1 − p) x−1 0 x−1 ' ( ( . = ' P (A) = N N x−1 x−1 Clearly, P (B | A) = Np/(N − x + 1). Hence, by the general multiplication rule, '

( N(1 − p) Np x−1 ( P (X = x) = P (A ∩ B) = P (A)P (B | A) = ' N N −x+1 x−1 % & % & N(1 − p) x−1 N(1 − p) x−1 (Np) =p . = (N)x (N − 1)x−1 Thus, pX (x) = p

%

& N(1 − p) x−1 (N − 1)x−1

,

x = 1, 2, . . . , N(1 − p) + 1,

and pX (x) = 0 otherwise. c) Here we consider the specified attributed to be “house key.” We observe that, in the notation of part (b), we have p = 1/N. Let Cn denote the event that you get the house key on the nth try, where 1 ≤ n ≤ N . Applying the result of part (b) with x = n, we get % & N(1 − 1/N) n−1 (N − 1)n−1 = (1/N) = 1/N. P (Cn ) = pX (n) = (1/N) (N − 1)n−1 (N − 1)n−1 5.100 a) Let An denote the event that Tom wins on toss 2n + 1. Also, let X1 and X2 denote the number of tosses required for Tom and Dick to get a head, respectively. Observe that Xj ∼ G(pj ), j = 1, 2. We have P (An ) = P (X1 = n + 1, X2 > n) = P (X1 = n + 1)P (X2 > n) % &n = p1 (1 − p1 )(n+1)−1 (1 − p2 )n = p1 (1 − p1 )(1 − p2 ) .


Letting T denote the event that Tom wins and noting that A1 , A2 , . . . are mutually exclusive events, we get ( / '0 ∞ ∞ ∞ % / &n An = P (An ) = p1 (1 − p1 )(1 − p2 ) P (T ) = P n=0

n=0

n=0

p1 1 = . p1 + p 2 − p 1 p2 1 − (1 − p1 )(1 − p2 ) b) If both coins are balanced, then p1 = p2 = 0.5 and, hence, = p1 ·

P (T ) =

p1 0.5 1/2 2 = = = . p1 + p2 − p1 p2 0.5 + 0.5 − 0.5 · 0.5 3/4 3

c) For the game to be fair, we must have P (T ) = 0.5. In view of part (a), this happens if and only if p1 1 = p1 + p2 − p1 p2 2

or

p1 =

p2 . 1 + p2

d) If Dick’s coin is balanced, p2 = 0.5. Then, in view of part (c), the game will be fair if and only if p1 =

0.5 1 + 0.5

or

p1 =

1 . 3

5.101 a) Let T denote the event that Tom wins and let E denote the event that Tom goes first. By assumption, we have P (E) = 0.5. Also, from Exercise 5.100(a), P (T | E) = p1 /(p1 + p2 − p1 p2 ). Interchanging the roles of Tom and Dick, it follows as well from Exercise 5.100(a) that P (T | E c ) = 1 − P (T c | E c ) = 1 − Hence, by the law of total probability,

p2 p1 (1 − p2 ) = . p1 + p 2 − p 1 p2 p1 + p2 − p1 p2

P (T ) = P (E)P (T | E) + P (E c )P (T | E c ) = 0.5 · =

p1 p1 (1 − p2 ) + 0.5 · p1 + p2 − p1 p2 p1 + p2 − p1 p2

p1 + p1 (1 − p2 ) p1 (2 − p2 ) = . 2(p1 + p2 − p1 p2 ) 2(p1 + p2 − p1 p2 )

b) For the game to be fair, we must have P (T ) = 0.5. In view of part (a), this happens if and only if p1 (2 − p2 ) 1 = 2(p1 + p2 − p1 p2 ) 2

or

p1 = p2 ,

a result that we could also obtain by using a symmetry argument. 5.102 From the solution to Exercise 5.85, the probability of observing a nest with six eggs is ' −4 ( 6 e 4 ≈ 0.106. −4 6! 1−e

Hence, the number of nests examined until a nest containing six eggs is observed, Y , has the geometric distribution with parameter (approximately) 0.106. That is, pY (y) = (0.106)(1 − 0.106)y−1 = (0.106)(0.894)y−1 ,

and pY (y) = 0 otherwise.

y ∈ N,
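For a numerical illustration of Exercise 5.102, the sketch below recomputes the zero-truncated Poisson probability of a six-egg nest from Exercise 5.85 and then evaluates the resulting geometric PMF; the specific nest index printed at the end is only an example.

from math import exp, factorial

lam = 4.0
# Zero-truncated Poisson (Exercise 5.85): P(Y = 6) = e^{-4} 4^6 / (6! (1 - e^{-4}))
p_six = exp(-lam) * lam**6 / (factorial(6) * (1 - exp(-lam)))
print(round(p_six, 3))                 # about 0.106

def geom_pmf(y, p):
    return p * (1 - p) ** (y - 1)

# Probability that, say, the third nest examined is the first with six eggs:
print(round(geom_pmf(3, p_six), 4))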


Theory Exercises 5.103 We must verify that pX satistfies the three properties stated in Proposition 5.1 on page 187. Clearly, we have pX (x) ≥ 0 for all x. Moreover, { x ∈ R : pX (x) ̸= 0 } = N , which is countable. Finally, applying Equation (5.30) on page 230, we get / x

pX (x) =

∞ / k=1

p(1 − p)

k−1

=p

∞ / k=1

(1 − p)

k−1

=p

∞ /

j =0

(1 − p)j = p ·

1 = 1. 1 − (1 − p)

5.104 a) In Bernoulli trials, the trials are independent and, furthermore, the success probability is positive. Therefore, from Proposition 4.6, in repeated Bernoulli trials, the probability is 1 that a success will eventually occur. b) If we let X denote the number of trials up to and including the first success, then X ∼ G(p). The probability that a success will eventually occur is / / P (X < ∞) = pX (x) = pX (x) = 1. x a} = {a < X ≤ b} ∪ {X > b}

and that the two events on the right of this equation are mutually exclusive. Hence, by the additivity property of a probability measure, P (X > a) = P (a < X ≤ b) + P (X > b)

or

P (a < X ≤ b) = P (X > a) − P (X > b).

5.106 We first recall from Proposition 4.1 on page 134 that P (· | X > n) is a probability measure. If Equation (5.35) holds, then ' 0 ( ∞ ∞ / {X = n + j } | X > n = P (X > n + k | X > n) = P P (X = n + j | X > n) j =k+1

=

∞ /

j =k+1

P (X = j ) =

j =k+1

∞ /

j =k+1

P (X = j ) = P (X > k).

Conversely, suppose that P (X > n + k | X > n) = P (X > k) for all n, k ∈ N . Then, P (X = n + k | X > n) = P (X > n + k − 1 | X > n) − P (X > n + k | X > n) so that Equation (5.35) holds.

= P (X > k − 1) − P (X > k) = P (X = k),

Advanced Exercises 5.107 a) If the lack-of-memory property holds for X, then % & P (X > m | X > n) = P X > n + (m − n) | X > n = P (X > m − n),

where the last equality follows from Exercise 5.106. Hence, Equation (∗) is assured to hold by the lack-of-memory property.


b) The question is whether it is possible for X to satisfy Equation (∗∗). Suppose that it does. Then, we are assuming tacitly that P (X > n) ̸= 0 for all n ∈ N . By the conditional probability rule, we have P (X > m) = P (X > m | X > n) =

P (X > m, X > n) P (X > m) = , P (X > n) P (X > n)

which implies that P (X > n) = 1 for all n ∈ N . Note that this equation holds as well for n = 0 because X is positive-integer valued. Consequently, by Equation (5.32) on page 230, pX (n) = P (X > n − 1) − P (X > n) = 1 − 1 = 0,

n ∈ N,

which, in turn, implies that 1=

/ x

pX (x) =

∞ / n=1

pX (n) = 0,

which is a contradiction. Hence, it is not possible for X to satisfy Equation (∗∗). 5.108 a) For k = 2, 3, . . . , we have, by the law of total probability and the independence condition, that / P (X = n)P (X + Y = k | X = n) P (X + Y = k) = n

=

= Hence,

/ n

k−1 / n=1

P (X = n)P (Y = k − n | X = n) = p(1 − p)n−1 p(1 − p)k−n−1 =

k−1 / n=1

pX+Y (z) = (z − 1)p2 (1 − p)z−2 ,

/ n

P (X = n)P (Y = k − n)

p 2 (1 − p)k−2 = (k − 1)p2 (1 − p)k−2 . z = 2, 3, . . . ,

and pX+Y (z) = 0 otherwise. b) No, because, for instance, pX+Y (1) = 0. c) For convenience, set Z = min {X, Y }. Then, by the independence condition and Proposition 5.10 on page 231, we have, for n ∈ N , that P (Z > n) = P (min {X, Y } > n) = P (X > n, Y > n) = P (X > n)P (Y > n) = (1 − p)n (1 − p)n % % % &n 8 &9n 8 &9n = 1 − 2p − p 2 . = (1 − p)2n = (1 − p)2 = 1 − 1 − (1 − p)2

Applying again Proposition 5.10, we see that Z has the same tail probabilities as a geometric random variable 2p − p 2 . As tail probabilities determine the distribution, we conclude % with parameter & 2 that Z ∼ G 2p − p . 5.109 a) We observe that the event {X = x, Y = y} occurs if and only if the first x − 1 trials are failures, the xth trial is success, the next y − 1 trials are failures, and the following trial is success. Hence, P (X = x, Y = y) = (1 − p)x−1 p(1 − p)y−1 p = p2 (1 − p)x+y−2 .

Applying the conditional probability rule, we find that P (Y = y | X = x) =

p2 (1 − p)x+y−2 P (X = x, Y = y) = = p(1 − p)y−1 . P (X = x) p(1 − p)x−1


Now applying the law of total probability, we get / / P (X = x)P (Y = y | X = x) = pX (x)p(1 − p)y−1 P (Y = y) = x

y−1

= p(1 − p)

/ x

x y−1

pX (x) = p(1 − p)

· 1 = p(1 − p)y−1 .

Thus, P (Y = y | X = x) = P (Y = y), which means that events {X = x} and {Y = y} are independent. b) We know that X ∼ G(p) and, from the solution to part (a), we see that Y ∼ G(p). Furthermore, from part (a), the independence condition of Exercise 5.108 is satisfied. Hence, from Exercise 5.108(a), pX+Y (z) = (z − 1)p2 (1 − p)z−2 ,

z = 2, 3, . . . ,

and pX+Y (z) = 0 otherwise. c) The random variable X + Y gives the number of trials until the second success. d) Event {X + Y = z} occurs if and only if there is exactly one success in the first z − 1 trials and the zth trial is a success. Let A denote the former event and B the latter. Because Bernoulli trials are independent, events A and B are independent. Applying the binomial distribution, we have ' ( z−1 1 P (A) = p (1 − p)(z−1)−1 = (z − 1)p(1 − p)z−2 . 1 Hence, for z = 2, 3, . . . , pX+Y (z) = P (X + Y = z) = P (A ∩ B) = P (A)P (B)

= (z − 1)p(1 − p)z−2 · p = (z − 1)p2 (1 − p)z−2 .

This result agrees with the one found in part (b).
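As a numerical cross-check of Exercise 5.108/5.109, the sketch below convolves two independent G(p) PMFs directly and compares the result with the closed form (z − 1)p²(1 − p)^{z−2}; the value p = 0.3 is arbitrary and purely illustrative.

p = 0.3

def geom_pmf(x):
    return p * (1 - p) ** (x - 1)

def conv_pmf(z):
    # Direct convolution of two independent G(p) random variables at the point z.
    return sum(geom_pmf(x) * geom_pmf(z - x) for x in range(1, z))

for z in (2, 3, 5, 10):
    closed_form = (z - 1) * p**2 * (1 - p) ** (z - 2)
    print(z, round(conv_pmf(z), 6), round(closed_form, 6))
# The two columns agree, as derived in parts (a), (b), and (d).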

5.7 Other Important Discrete Random Variables Basic Exercises 5.110 We note that the only possible values of I{X=1} and X are 0 and 1. Hence, to prove that those two random variables are equal, it suffices to show that X = 1 if and only if I{X=1} = 1. However, X(ω) = 1 ⇔ ω ∈ {X = 1} ⇔ I{X=1} (ω) = 1. 5.111 a) We note that both IE∩F and IE · IF are 1 if E ∩ F occurs and both are 0 otherwise. Consequently, we have IE∩F = IE · IF . b) Suppose that E ∩ F = ∅. The only possible values of IE∪F are 0 and 1 and, because E ∩ F = ∅, the same is true for IE + IF . However, / E∪F ⇔ω ∈ / E and ω ∈ / F ⇔ IE (ω) = IF (ω) = 0 ⇔ (IE + IF )(ω) = 0. IE∪F (ω) = 0 ⇔ ω ∈ Consequently, IE∪F = IE + IF . Conversely, suppose that E ∩ F ̸= ∅. Let ω0 ∈ E ∩ F . Then IE∪F (ω0 ) = 1 ̸= 2 = 1 + 1 = IE (ω0 ) + IF (ω0 ) = (IE + IF )(ω0 ). ̸ IE + IF . We have now established that a necessary and sufficient condition Consequently, IE∪F = for IE∪F = IE + IF is that E ∩ F = ∅, that is, that E and F are mutually exclusive.


c) We claim that IE∪F = IE + IF − IE∩F , that is, that IE∪F (ω) = IE (ω) + IF (ω) − IE∩F (ω),

ω ∈ !.

To establish the preceding equation, we consider four cases. Case 1: ω ∈ (E ∪ F )c

Then ω ∈ / E ∪ F, ω ∈ / E, ω ∈ / F , and ω ∈ / E ∩ F . Hence, IE∪F (ω) = 0 = 0 + 0 − 0 = IE (ω) + IF (ω) − IE∩F (ω). Case 2: ω ∈ E ∩ F

Then ω ∈ E ∪ F , ω ∈ E, and ω ∈ F . Hence, IE∪F (ω) = 1 = 1 + 1 − 1 = IE (ω) + IF (ω) − IE∩F (ω). Case 3: ω ∈ E ∩ F c

Then ω ∈ E ∪ F , ω ∈ E, ω ∈ / F , and ω ∈ / E ∩ F . Hence, IE∪F (ω) = 1 = 1 + 0 − 0 = IE (ω) + IF (ω) − IE∩F (ω).

Case 4: ω ∈ E c ∩ F

Then ω ∈ E ∪ F , ω ∈ / E, ω ∈ F , and ω ∈ / E ∩ F . Hence, IE∪F (ω) = 1 = 0 + 1 − 0 = IE (ω) + IF (ω) − IE∩F (ω).

? 5.112 The only possible 3 values of I n An are 0 and 1 and, because A1 , A2 , . . . are mutually exclusive, the same is true for n IAn . Hence, 3to prove that those two random variables are equal, it suffices to show that I?n An = 0 if and only if n IAn = 0. However, '/ ( 0 ? / An ⇔ ω ∈ / An for all n ⇔ IAn (ω) = 0 for all n ⇔ IAn (ω) = 0. I n An (ω) = 0 ⇔ ω ∈ n

n

5.113 a) For 1 ≤ k ≤ n, let Ek denote the event that the kth trial results in success. 3n Then IEk = 0 if the kth trial results in failure and IEk = 1 if the kth trial results in success. Hence, k=1 IEk gives the total number 3n of successes in the n trials; that is, X = k=1 IEk . b) For 1 ≤ k ≤ n, let Xk be 1 or 0 depending on whether the kth trial results in success or 3failure, n respectively. Then each Xk is a Bernoulli random variable with parameter p and, moreover, k=1 Xk 3n gives the total number of successes in the n trials; that is, X = k=1 Xk .

5.114 Because we have a classical probability model, we know that P ({ω}) = 1/N for all ω ∈ !. Thus, for k = 1, 2, . . . , N, we have P (X = k) = P ({ ω : X(ω) = k }) = P ({ωk }) = 1/N. Hence, pX (x) = 1/N if x ∈ {1, 2, . . . , N}, and pX (x) = 0 otherwise. In other words, X has the discrete uniform distribution on the set {1, 2, . . . , N}.


5.115 We have pX (x) = a) For x ∈ S,

-

1/10, 0,

P (Y = y | X = x) = Hence, by the law of total probability, for y ∈ S,

-


if x ∈ S; otherwise.

1/9, 0,

if y ∈ S \ {x}; otherwise.

/ 1 P (Y = y | X = x) 10 x x∈S ' ( 1 / 1 / 1 1 1 1 = P (Y = y | X = x) = = 9· = . 10 x∈S 10 x∈S\{y} 9 10 9 10

P (Y = y) =

Therefore,

/

P (X = x)P (Y = y | X = x) =

pY (y) =

-

1/10, 0,

if y ∈ S; otherwise.

Thus, Y has the discrete uniform distribution on S. b) By symmetry, the value of Y is equally likely to be any of the 10 digits. Therefore, 1/10, if y ∈ S; pY (y) = 0, otherwise. Thus, Y has the discrete uniform distribution on S.

5.116 a) From Proposition 5.13 on page 241, we see that X ∼ N B(3, 0.67). Hence, ' ( x−1 (0.67)3 (0.33)x−3 , x = 3, 4, . . . , pX (x) = 2

and pX (x) = 0 otherwise. b) From part (a),

'

( 4−1 (0.67)3 (0.33)4−3 = 3(0.67)3 (0.33) = 0.298. P (X = 4) = pX (4) = 2

From the complementation rule, the FPF, and part (a), P (X ≥ 4) = 1 − P (X < 4) = 1 −

/ x 5) = 1 − P (X ≤ 5) = 1 − =1−

5 / k=2

/ x≤5

pX (x) = 1 −

5 /

pX (k)

k=2

(k − 1)(0.26)2 (0.74)k−2 = 1 − (0.26)2

= 0.612.

5 / k=2

(k − 1)(0.74)k−2

Alternatively, because X > 5 if and only if the number of hits in the first five at-bats is at most 1, we have P (X > 5) = c) From the FPF, P (3 ≤ X ≤ 10) =

1 ' ( / 5 k=0

k

/

3≤x≤10

(0.26)k (0.74)5−k = (0.74)5 + 5(0.26)(0.74)4 = 0.612.

pX (x) =

10 / k=3

pX (k) = (0.26)2

10 / (k − 1)(0.74)k−2 = 0.710. k=3

5.118 For each r ∈ N , we know that X is a positive-integer valued random variable. From Proposition 5.11 on page 234, the only positive-integer valued random variables that have the lack-of-memory property are geometric random variables. A negative binomial random variable is a geometric random variable if and only if r = 1. Hence, X has the lack-of-memory property if and only if r = 1. 5.119 Let E denote the event of a 7 or 11 when two balanced dice are rolled. Then P (E) =

6+2 2 = . 36 9

Let X denote the number of rolls required for the fourth occurrence of E. Then X ∼ N B(4, 2/9). Hence, by the complementation rule and the FPF, P (X > 6) = 1 − P (X ≤ 6) = 1 − =1−

6 ' / k=4

= 0.975.

/ x≤6

pX (x) = 1 −

6 /

pX (k)

k=4

( ( 6 ' / k−1 k−1 4 k−4 4 (2/9) (7/9) = 1 − (2/9) (7/9)k−4 4−1 3 k=4
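A direct numerical check of the 0.975 value, written as a short sketch around the negative binomial PMF (number of trials until the r-th success):

from math import comb

p = 2 / 9                              # P(7 or 11) on one roll of two balanced dice
r = 4

def nb_pmf(k):
    # P(the r-th occurrence happens on roll k)
    return comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

print(round(1 - sum(nb_pmf(k) for k in range(4, 7)), 3))   # 0.975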


5.120 a) Consider Bernoulli trials with success probability p. Let X denote the number of trials until the rth success and let Y denote the number of successes in the first n trials. Then X ∼ N B(r, p) and Y ∼ B(n, p). Event {X > n} occurs if and only if the rth success occurs after the nth trial, which happens if and only if the number of successes in the first n trials is less than r, that is, event {Y < r} occurs. Therefore, {X > n} = {Y < r} and, in particular, P (X > n) = P (Y < r). b) We have ' ( x−1 r pX (x) = p (1 − p)x−r , x = r, r + 1, . . . r −1 and

' ( n y pY (y) = p (1 − p)n−y , y

y = 0, 1, . . . , n.

Applying part (a) and the FPF, we get ( ∞ ' r−1 ' ( / / x−1 r n y x−r p (1 − p) = p (1 − p)n−y . r −1 y x=n+1 y=0 c) From the complementation rule and the FPF, P (X > n) = 1 − P (X ≤ n) = 1 −

/ x≤n

pX (x) = 1 −

n /

pX (k).

k=r

Hence, with this computation, we must evaluate n − r + 1 terms of the PMF of X. d) From the FPF, r−1 / / P (Y < r) = pY (y) = pY (k). y n), the difference between the numbers of terms that must be evaluated using the PMF of X and the PMF of Y is ' ( 2r − 1 (n − r + 1) − r = n − 2r + 1 = n 1 − . n

Thus, when n is large relative to r, it is necessary to evaluate roughly n more terms using the PMF of X than using the PMF of Y . In other words, when n is large relative to r, there is considerable computational savings in using the PMF of Y rather than the PMF of X to evaluate P (X > n). 5.121 For x = r, r + 1, . . . , ' ( ' ( ' ( ' ( x−1 x−1 r + (x − r) − 1 −r x−r = = = (−1) , r −1 x−r x−r x−r where the last equality follows from Exercise 5.60(b). Hence, for x = r, r + 1, . . . , ' ( ' ( ' ( x−1 r −r −r p (1 − p)x−r = (−1)x−r p r (1 − p)x−r = pr (p − 1)x−r . r −1 x−r x−r


5.122 Let X and Z denote the calls on which the second and fifth sales are made, respectively. Also, let Y ∼ N B(3, 0.2). Then ' ( j −1 P (X = j ) = (0.2)2 (0.8)j −2 = (j − 1)(0.2)2 (0.8)j −2 , j = 2, 3, . . . , 2−1 and, arguing as in Example 5.26(c), we find that ' ( k−j −1 P (Z = k | X = j ) = P (Y = k − j ) = (0.2)3 (0.8)k−j −3 3−1 ' ( k−j −1 = (0.2)3 (0.8)k−j −3 , k = j + 3, j + 4, . . . ; j = 2, 3, . . . . 2 From the FPF and the preceding display, it follows immediately that ( 15 ' / k−j −1 P (Z ≤ 15 | X = j ) = (0.2)3 (0.8)k−j −3 , 2 k=j +3

j = 2, 3, . . . , 12.

a) Applying the law of total probability, we now get that P (X ≤ 5, Z = 15) = =

5 /

j =2

5 /

j =2

P (X = j )P (Z = 15 | X = j ) 2

j −2

(j − 1)(0.2) (0.8)

'

( 15 − j − 1 (0.2)3 (0.8)15−j −3 2

' ( 5 / 14 − j = (0.2) (0.8) (j − 1) = 0.0156. 2 j =2 5

10

b) Again applying the law of total probability, we get P (X ≤ 5, Z ≤ 15) =

5 /

j =2

P (X = j )P (Z ≤ 15 | X = j )

2 ( 15 ' / k−j −1 (0.2)3 (0.8)k−j −3 2 j =2 k=j +3 1 2 ( 5 15 ' / / k−j −1 k−5 5 (j − 1) (0.8) = 0.104. = (0.2) 2 j =2 k=j +3 5 / (j − 1)(0.2)2 (0.8)j −2 =

1

Theory Exercises 5.123 We note that IE (ω) = 1 if and only if ω ∈ E. Hence, % & pIE (1) = P (IE = 1) = P { ω ∈ ! : IE (ω) = 1 } = P ({ ω ∈ ! : ω ∈ E) = P (E).

Moreover, because the only two possible values of IE are 0 and 1, we have, from the complementation rule, that pIE (0) = P (IE = 0) = 1 − P (IE = 1) = 1 − pIE (1) = 1 − P (E).


Hence,
\[ p_{I_E}(x) = \begin{cases} 1 - P(E), & \text{if } x = 0;\\ P(E), & \text{if } x = 1;\\ 0, & \text{otherwise.} \end{cases} \]

5.124 The possible values of X are r, r + 1, . . . . For such a positive integer, the event {X = k} occurs if and only if the rth success is obtained on the kth trial. That happens if and only if (1) among the first k − 1 trials, exactly r − 1 are successes, and (2) the kth trial is a success. Denote the former event by A and the latter event by B. We have {X = k} = A ∩ B and, because the trials are independent, event A and event B are independent. Hence, P (X = k) = P (A ∩ B) = P (A)P (B). The number of successes in the first k − 1 trials has the binomial distribution with parameters k − 1 and p. Thus, by Proposition 5.3 on page 201, ' ( ' ( k − 1 r−1 k − 1 r−1 (k−1)−(r−1) P (A) = p (1 − p) = p (1 − p)k−r . r −1 r −1 Also, the probability that the kth trial is a success is p, so P (B) = p. Therefore, ' ( ' ( k − 1 r−1 k−1 r k−r P (X = k) = p (1 − p) ·p = p (1 − p)k−r . r −1 r −1 Consequently, the PMF of the random variable X is ' ( x−1 r pX (x) = p (1 − p)x−r , r −1

x = r, r + 1, . . . ,

and pX (x) = 0 otherwise. 5.125 We must verify that pX satisfies the three properties stated in Proposition 5.1 on page 187. Clearly, we have pX (x) ≥ 0 for all x. Moreover, { x ∈ R : pX (x) ̸= 0 } = {r, r + 1, . . .}, which is countable, being a subset of the countable set N . Applying Exercise 5.121 and the binomial series [Equation (5.45) on page 241], we get / x

( ( ∞ ' ∞ ' / / k−1 r −r k−r pX (x) = p (1 − p) = p r (p − 1)k−r r − 1 k − r k=r k=r ' ( ∞ / % &−r −r = pr (p − 1)j = pr 1 + (p − 1) = pr p−r = 1. j j =0

5.126 a) In Bernoulli trials, the trials are independent and, furthermore, the success probability is positive. Therefore, from Proposition 4.6, in repeated Bernoulli trials, the probability is 1 that a success will eventually occur. Because of independence in Bernoulli trials, once a success occurs, the subsequent process of Bernoulli trials is independent of the prior trials and behaves the same probabilistically as the process of Bernoulli trials starting from the beginning. Hence, we conclude that the probability is 1 that a second success will eventually occur. Continuing in this way, we deduce that, for each positive integer r, the probability is 1 that an rth success will eventually occur.


b) If we let X denote the number of trials up to and including the rth success, then X ∼ N B(r, p). The probability that an rth success will eventually occur is / / pX (x) = pX (x) = 1, P (X < ∞) = x 0. Also, from Definition 5.5 (p − 1) = p p Proposition 5.1 on page 187. We note that p(0) = −α 0 on page 211, we have, for each k ∈ N , '

( (−α)(−a − 1) · · · (−α − k + 1) α −α α p (−1)k (1 − p)k p (p − 1)k = k! k =

α(a + 1) · · · (α + k − 1) α p (1 − p)k > 0. k!

Hence, p(y) ≥ 0 for all y. Moreover, { y ∈ R : p(y) ̸= 0 } = {0, 1, . . .}, which is countable, being the union of the countable sets {0} and N . Applying now the binomial series, Equation (5.45) on page 241, we get ( ( ∞ ' ∞ ' / / / % &−α −α α −α k α p(y) = p (p − 1) = p (p − 1)k = pα 1 + (p − 1) = 1. k k y k=0 k=0

Advanced Exercises 5.128 a) Let k be an integer between 0 and N, inclusive. Let Rk denote the event that the smoker finds the matchbox in his right pocket empty at the moment when there are k matches in the matchbox in his left pocket; and let Lk denote the event that the smoker finds the matchbox in his left pocket empty at the moment when there are k matches in the matchbox in his right pocket. Events Rk and Lk are mutually exclusive, have the same probability (by symmetry), and have union {Y = k}. To find P (Rk ), we consider a success to be that the smoker goes to his right pocket for a match. Event Rk occurs if and only if the (N + 1)st success occurs on trial (N + 1) + (N − k) = 2N + 1 − k. Applying the negative binomial PMF with parameters N + 1 and 1/2, we see that the probability of that happening is ' ( ' (N+1 ' ((2N+1−k)−(N +1) ' ( ' (2N +1−k (2N + 1 − k) − 1 1 1 2N − k 1 P (Rk ) = = . (N + 1) − 1 N 2 2 2

Therefore,

' ( ' (2N −k 2N − k 1 . P (Y = k) = P (Rk ) + P (Lk ) = 2P (Rk ) = N 2

Consequently,

'

2N − y pY (y) = N and pY (y) = 0 otherwise.

( ' (2N −y 1 , 2

y = 0, 1, . . . , N,


b) Referring to part (a), we obtain the following table for N = 5:

y       | 0        | 1        | 2        | 3        | 4        | 5
pY(y)   | 0.246094 | 0.246094 | 0.218750 | 0.164063 | 0.093750 | 0.031250

And, for N = 10, we have

y | pY(y)    | y  | pY(y)
0 | 0.176197 | 6  | 0.061096
1 | 0.176197 | 7  | 0.034912
2 | 0.166924 | 8  | 0.016113
3 | 0.148376 | 9  | 0.005371
4 | 0.122192 | 10 | 0.000977
5 | 0.091644 |    |
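The balanced-case (p = 1/2) tables can be regenerated from the PMF derived in part (a); the sketch below is illustrative only.

from math import comb

def matchbox_pmf(k, N):
    # Balanced case: P(Y = k) = C(2N - k, N) * (1/2)**(2N - k)
    return comb(2 * N - k, N) * 0.5 ** (2 * N - k)

for N in (5, 10):
    probs = [matchbox_pmf(k, N) for k in range(N + 1)]
    print(f"N = {N}:", [round(q, 6) for q in probs], "sum =", round(sum(probs), 6))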

c) We proceed as in part (a) until we apply the negative binomial PDF. At that point, we use the one with parameters N + 1 and p. Hence, we find that ' ( ' ( 2N − k N+1 2N − k p (1 − p)N−k and P (Lk ) = (1 − p)N+1 pN−k . P (Rk ) = N N Therefore,

P (Y = k) = P (Rk ) + P (Lk ) = Consequently,

' ( 9 2N − k 8 N+1 p (1 − p)N−k + (1 − p)N+1 pN−k . N

'

( 9 2N − y 8 N+1 p (1 − p)N−y + (1 − p)N+1 pN−y , pY (y) = N

y = 0, 1, . . . N,

and pY (y) = 0 otherwise. d) Referring to part (c), we obtain the following table for N = 5: y

0

1

2

3

4

5

pY (y )

0.200658

0.217380

0.216760

0.187730

0.126720

0.050752

And, for N = 10, we have y

pY (y )

y

pY (y )

0 1 2 3 4 5

0.117142 0.126903 0.134867 0.138435 0.134671 0.121357

6 7 8 9 10

0.098410 0.068997 0.039308 0.016240 0.003670

e) The answers here are the same as those in part (d) because, if we replace p by 1 − p, the probabilities for Y do not change. We can see this fact by interchanging the roles of the left and right pockets or, more directly, by replacing p by 1 − p (and, therefore, 1 − p by p) in the formula for the PMF of Y determined in part (c).


5.129 a) Consider a success a win of one point by Jane on any particular trial and let X denote the number of trials until the 21st success. Then X ∼ N B(21, p). Jane wins if and only if she gets 21 points before Joan does, that is, if and only if the 21st success occurs on or before trial 41. Hence, by the FPF, the required probability is ( 41 ' / k − 1 21 P (X ≤ 41) = pX (k) = p (1 − p)k−21 . pX (x) = 20 x≤41 k=21 k=21 41 /

/

b) Suppose that p = 1/2, that is, that Jane and Joan are evenly matched. Then, by symmetry, the probability is 1/2 that Jane wins the game. Hence, in view of part (a), we have ( ' (21 ' ( ( 41 ' 41 ' / / 1 k−1 1 1 k−21 k − 1 −k = 1− = 2 . 2 k=21 20 2 2 20 k=21 5.130 a) For each nonnegative integer y, '

( ' ( ' ( −r r (−r)(−r − 1) · · · (−r − y + 1) r(1 − pr ) r r(pr − 1) y y pYr (y) = pr (pr − 1) = 1− y y! r r % & ' ( y r(r + 1) · · · (r + y − 1) r(1 − pr ) r r(1 − pr ) = 1− ry r y! &y ' (' ( ' ( ' ( % 1 2 y−1 r(1 − pr ) r r(1 − pr ) = 1+ 1+ ··· 1 + · 1− · . r r r r y! Now, lim

r→∞

'

1 1+ r

and, because r(1 − pr ) → λ as r → ∞,

(' ( ' ( 2 y−1 1+ ··· 1 + =1 r r

&y % r(1 − pr ) λy = . lim r→∞ y! y!

Moreover, from the hint given in Exercise 5.86(a), ' ( r(1 − pr ) r lim 1 − = e−λ . r→∞ r Hence, lim pYr (y) = 1 · e−λ ·

r→∞

λy λy = e−λ . y! y!

b) Because r(1 − pr ) → λ as r → ∞, we have ' ( λ r(1 − pr ) r(1 − pr ) lim pr = lim 1 − = 1 − lim =1− = 1 − 0 = 1. r→∞ r→∞ r→∞ r r lim r r→∞


c) Suppose that X ∼ N B(r, p), where r is large and p is close to 1. Let Y = X − r. Then the PDF of Y is given by Equation (5.49). Now, pX (x) = P (X = x) = P (Y + r = x) = P (Y = x − r) = pY (x − r).

Hence, in view of parts (a) and (b), pX (x) ≈ e

−r(1−p)

% &x−r r(1 − p) , (x − r)!

x = r, r + 1, . . . .

In other words, for large r and p close to 1, the PMF of a N B(r, p) random variable can be approximated by the right side of the previous display.

5.8 Functions of a Discrete Random Variable Basic Exercises 5.131 a) As the possible values of X are −2, −1, 0, 1, 2, and 3, the possible values of Y are −6, −3, 0, 3, 6, and 9. Here g(x) = 3x. We note that g is one-to-one and, solving the equation y = 3x for x, we find that g −1 (y) = y/3. Hence, from Equation (5.54) on page 248, we have pY (y) = p3X (y) = pX (y/3).

Then, for instance,

pY (6) = pX (6/3) = pX (2) = 0.15.

Proceeding similarly, we obtain the following table for the PMF of Y.

y       | −6   | −3   | 0    | 3    | 6    | 9
pY(y)   | 0.10 | 0.20 | 0.20 | 0.15 | 0.15 | 0.20

b) The transformation results in a change of scale. That is, if the units in which X is measured are changed by a multiple of b, then bX will have the same relative value (in new units) as X does in old units.
c) Yes, Corollary 5.1 applies to a change of scale (i.e., a function of the form g(x) = bx) because such a function is one-to-one on all of R.
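All of the transformed PMFs in Exercises 5.131–5.134 can be produced by one generic routine that groups the probability mass of X by the value g(x), which is exactly the content of Proposition 5.14; the PMF of X below is the one read off the solution tables, and the sketch is illustrative only.

from collections import defaultdict

# PMF of X as used throughout Exercises 5.131-5.134
p_X = {-2: 0.10, -1: 0.20, 0: 0.20, 1: 0.15, 2: 0.15, 3: 0.20}

def transform_pmf(p_X, g):
    # PMF of Y = g(X): sum p_X(x) over all x with g(x) = y.
    p_Y = defaultdict(float)
    for x, p in p_X.items():
        p_Y[g(x)] += p
    return dict(sorted(p_Y.items()))

print(transform_pmf(p_X, lambda x: 3 * x))     # 5.131(a): one-to-one change of scale
print(transform_pmf(p_X, lambda x: x ** 4))    # 5.134(b): not one-to-one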

5.132 a) As the possible values of X are −2, −1, 0, 1, 2, and 3, the possible values of Y are −4, −3, −2, −1, 0, and 1. Here g(x) = x − 2. We note that g is one-to-one and, solving the equation y = x − 2 for x, we find that g −1 (y) = y + 2. Hence, from Equation (5.54) on page 248, we have pY (y) = pX−2 (y) = pX (y + 2).

Then, for instance,

pY (−1) = pX (−1 + 2) = pX (1) = 0.15.

Proceeding similarly, we obtain the following table for the PMF of Y.

y       | −4   | −3   | −2   | −1   | 0    | 1
pY(y)   | 0.10 | 0.20 | 0.20 | 0.15 | 0.15 | 0.20


b) The transformation results in a change of location. That is, if the units in which X is measured are changed by an addition of a, then a + X will have the same relative value (in the new units) as X does in the old units. % & c) Yes, Corollary 5.1 applies to a change of location i.e., a function of the form g(x) = a + x because such a function is one-to-one on all of R. 5.133 a) As the possible values of X are −2, −1, 0, 1, 2, and 3, the possible values of Y are −8, −5, −2, 1, 4, and 7. Here g(x) = 3x − 2. We note that g is one-to-one and, solving the equation y = 3x − 2 for x, we find that g −1 (y) = (y + 2)/3. Hence, from Equation (5.54) on page 248, we have % & pY (y) = p3X−2 (y) = pX (y + 2)/3 . Then, for instance,

% & pY (4) = pX (4 + 2)/3 = pX (2) = 0.15.

Proceeding similarly, we obtain the following table for the PMF of Y . y

−8

−5

−2

1

4

7

pY (y )

0.10

0.20

0.20

0.15

0.15

0.20

b) As the possible values of X are −2, −1, 0, 1, 2, and 3, the possible values of Y are a − 2b, a − b, a, a + b, a + 2b, and a + 3b. Here g(x) = a + bx. We note that g is one-to-one and, solving the equation y = a + bx for x, we find that g −1 (y) = (y − a)/b. Hence, from Equation (5.54) on page 248, we have % & pY (y) = pa+bX (y) = pX (y − a)/b . Then, for instance,

pY (a + 2b) = pX

8% & 9 (a + 2b) − a /b = pX (2) = 0.15.

Proceeding similarly, we obtain the following table for the PMF of Y . y

a − 2b

a−b

a

a+b

a + 2b

a + 3b

pY (y )

0.10

0.20

0.20

0.15

0.15

0.20

% c) Yes, Corollary 5.1 & applies to a simultaneous location and scale change i.e., a function of the form g(x) = a + bx because such a function is one-to-one on all of R.

5.134 a) As the possible values of X are −2, −1, 0, 1, 2, and 3, the possible values of Y are −33, −5, −1, 3, 31, and 107. Here g(x) = 4x^3 − 1. We note that g is one-to-one and, solving the equation y = 4x^3 − 1 for x, we find that g⁻¹(y) = ((y + 1)/4)^{1/3}. Hence, from Equation (5.54) on page 248, we have

pY (y) = p4X^3−1 (y) = pX (((y + 1)/4)^{1/3}).

Then, for instance,

pY (31) = pX (((31 + 1)/4)^{1/3}) = pX (2) = 0.15.


Proceeding similarly, we obtain the following table for the PMF of Y.

 y        −33    −5     −1     3      31     107
 pY (y)   0.10   0.20   0.20   0.15   0.15   0.20

b) As the possible values of X are −2, −1, 0, 1, 2, and 3, the possible values of Y are 0, 1, 16, and 81. Here g(x) = x^4, which is not one-to-one on the range of X. Hence, to find the PMF of Y, we apply Proposition 5.14 on page 247. For instance,

pY (16) = ∑_{x ∈ g⁻¹({16})} pX (x) = ∑_{x: x^4 = 16} pX (x) = ∑_{x ∈ {−2, 2}} pX (x) = pX (−2) + pX (2) = 0.10 + 0.15 = 0.25.

Proceeding similarly, we obtain the following table for the PMF of Y.

 y        0      1      16     81
 pY (y)   0.20   0.35   0.25   0.20

c) As the possible values of X are −2, −1, 0, 1, 2, and 3, the possible values of Y are 0, 1, 2, and 3. Here g(x) = |x|, which is not one-to-one on the range of X. Hence, to find the PMF of Y, we apply Proposition 5.14. For instance,

pY (2) = ∑_{x ∈ g⁻¹({2})} pX (x) = ∑_{x: |x| = 2} pX (x) = ∑_{x ∈ {−2, 2}} pX (x) = pX (−2) + pX (2) = 0.10 + 0.15 = 0.25.

Proceeding similarly, we obtain the following table for the PMF of Y.

 y        0      1      2      3
 pY (y)   0.20   0.35   0.25   0.20

d) As the possible values of X are −2, −1, 0, 1, 2, and 3, the possible values of Y are 4, 6, 8, and 10. Here g(x) = 2|x| + 4, which is not one-to-one on the range of X. Hence, to find the PMF of Y, we apply Proposition 5.14. For instance,

pY (8) = ∑_{x ∈ g⁻¹({8})} pX (x) = ∑_{x: 2|x| + 4 = 8} pX (x) = ∑_{x ∈ {−2, 2}} pX (x) = pX (−2) + pX (2) = 0.10 + 0.15 = 0.25.

Proceeding similarly, we obtain the following table for the PMF of Y.

 y        4      6      8      10
 pY (y)   0.20   0.35   0.25   0.20

e) As the possible values of X are −2, −1, 0, 1, 2, and 3, the possible values of Y are √5, √8, √17, and √32. Here g(x) = √(3x^2 + 5), which is not one-to-one on the range of X. Hence, to find the PMF of Y, we apply Proposition 5.14. For instance,

pY (√17) = ∑_{x ∈ g⁻¹({√17})} pX (x) = ∑_{x: √(3x^2+5) = √17} pX (x) = ∑_{x ∈ {−2, 2}} pX (x) = pX (−2) + pX (2) = 0.10 + 0.15 = 0.25.

Proceeding similarly, we obtain the following table for the PMF of Y.

 y        √5     √8     √17    √32
 pY (y)   0.20   0.35   0.25   0.20

5.135 a) Because the range of X is N, the range of Y is RY = { n/(n + 1) : n ∈ N }.
b) Here g(x) = x/(x + 1), which is one-to-one on the range of X. Solving the equation y = x/(x + 1) for x, we find that g⁻¹(y) = y/(1 − y). Hence, from Equation (5.54) on page 248, we have

pY (y) = pX/(X+1) (y) = pX (y/(1 − y))

if y ∈ RY, and pY (y) = 0 otherwise.
c) Because the range of X is N, the range of Z is RZ = { (n + 1)/n : n ∈ N }. Here g(x) = (x + 1)/x, which is one-to-one on the range of X. Solving the equation z = (x + 1)/x for x, we determine that g⁻¹(z) = 1/(z − 1). Hence, from Equation (5.54), we have

pZ (z) = p(X+1)/X (z) = pX (1/(z − 1))

if z ∈ RZ, and pZ (z) = 0 otherwise.
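For a function g that is not one-to-one, as in Exercise 5.134, the PMF of Y = g(X) is obtained by summing pX over the preimage g⁻¹({y}). The following Python sketch (mine, not from the text; the function name transform_pmf is an assumption) automates that summation and reproduces the tables above.

from collections import defaultdict

pmf_X = {-2: 0.10, -1: 0.20, 0: 0.20, 1: 0.15, 2: 0.15, 3: 0.20}

def transform_pmf(pmf, g):
    """PMF of Y = g(X): group the support by its image under g and sum the probabilities."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(out)

print(transform_pmf(pmf_X, lambda x: x**4))          # {16: 0.25, 1: 0.35, 0: 0.2, 81: 0.2}
print(transform_pmf(pmf_X, lambda x: abs(x)))        # {2: 0.25, 1: 0.35, 0: 0.2, 3: 0.2}
print(transform_pmf(pmf_X, lambda x: 2 * abs(x) + 4))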

5.136 We know that pX (x) = p(1 − p)^{x−1} if x ∈ N, and pX (x) = 0 otherwise.
a) Let Y = min{X, m}. We note that, because the range of X is N, the range of Y is {1, 2, . . . , m}. Here g(x) = min{x, m}, which is not one-to-one on the range of X. Hence, to find the PMF of Y, we apply Proposition 5.14 on page 247. For y = 1, 2, . . . , m − 1,

pY (y) = ∑_{x ∈ g⁻¹({y})} pX (x) = ∑_{x: min{x,m} = y} pX (x) = ∑_{x ∈ {y}} pX (x) = pX (y) = p(1 − p)^{y−1}.

Also,

pY (m) = ∑_{x ∈ g⁻¹({m})} pX (x) = ∑_{x: min{x,m} = m} pX (x) = ∑_{x ≥ m} pX (x) = ∑_{k=m}^{∞} p(1 − p)^{k−1} = (1 − p)^{m−1}.

Hence,

pY (y) = p(1 − p)^{y−1},   if y = 1, 2, . . . , m − 1;
         (1 − p)^{m−1},    if y = m;
         0,                otherwise.


b) Let Z = max{X, m}. We note that, because the range of X is N, the range of Z is {m, m + 1, . . .}. Here g(x) = max{x, m}, which is not one-to-one on the range of X. Hence, to find the PMF of Z, we apply Proposition 5.14. For z = m + 1, m + 2, . . . ,

pZ (z) = ∑_{x ∈ g⁻¹({z})} pX (x) = ∑_{x: max{x,m} = z} pX (x) = ∑_{x ∈ {z}} pX (x) = pX (z) = p(1 − p)^{z−1}.

Also,

pZ (m) = ∑_{x ∈ g⁻¹({m})} pX (x) = ∑_{x: max{x,m} = m} pX (x) = ∑_{x ≤ m} pX (x) = ∑_{k=1}^{m} p(1 − p)^{k−1} = 1 − (1 − p)^m.

Hence,

pZ (z) = 1 − (1 − p)^m,    if z = m;
         p(1 − p)^{z−1},   if z = m + 1, m + 2, . . . ;
         0,                otherwise.
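As a numerical check of these formulas (a sketch of mine, not part of the text; the values p = 0.3 and m = 4 are arbitrary), one can compare the closed forms against direct summation of the geometric PMF.

p, m = 0.3, 4
geom = lambda x: p * (1 - p) ** (x - 1)          # pX(x), x = 1, 2, ...

# PMF of Y = min{X, m} from the closed form above
pmf_min = {y: geom(y) for y in range(1, m)}
pmf_min[m] = (1 - p) ** (m - 1)

# Check pY(m) by summing the geometric tail far enough out for the truncation error to be negligible
tail = sum(geom(x) for x in range(m, 2000))
assert abs(tail - pmf_min[m]) < 1e-12

# Check pZ(m) for Z = max{X, m} against the closed form 1 - (1 - p)^m
head = sum(geom(x) for x in range(1, m + 1))
assert abs(head - (1 - (1 - p) ** m)) < 1e-12

print(pmf_min, sum(pmf_min.values()))            # the min-PMF sums to 1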

5.137 We know that pX (x) = e^{−3} 3^x / x! if x = 0, 1, . . . , and pX (x) = 0 otherwise. Because the range of X is {0, 1, . . .}, the range of Y is also {0, 1, . . .}. Here g(x) = |x − 3|, which is not one-to-one on the range of X. Hence, to find the PMF of Y, we apply Proposition 5.14 on page 247. For y = 1, 2, and 3,

pY (y) = ∑_{x ∈ g⁻¹({y})} pX (x) = ∑_{x: |x−3| = y} pX (x) = ∑_{x ∈ {3−y, 3+y}} pX (x)
       = pX (3 − y) + pX (3 + y) = e^{−3} 3^{3−y}/(3 − y)! + e^{−3} 3^{3+y}/(3 + y)!
       = 27e^{−3} ( 3^{−y}/(3 − y)! + 3^{y}/(3 + y)! ).

Also, for y = 0, 4, 5, . . . ,

pY (y) = ∑_{x ∈ g⁻¹({y})} pX (x) = ∑_{x: |x−3| = y} pX (x) = ∑_{x ∈ {3+y}} pX (x) = pX (3 + y) = e^{−3} 3^{3+y}/(3 + y)! = 27e^{−3} 3^{y}/(3 + y)!.

Hence,

pY (y) = 27e^{−3} 3^{y}/(3 + y)!,                        if y = 0, 4, 5, . . . ;
         27e^{−3} ( 3^{−y}/(3 − y)! + 3^{y}/(3 + y)! ),  if y = 1, 2, 3;
         0,                                              otherwise.
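A quick sanity check (my sketch, not from the text) compares this closed form with the directly grouped Poisson probabilities.

from math import exp, factorial

lam = 3.0
pois = lambda x: exp(-lam) * lam**x / factorial(x)

def p_Y_closed(y):
    """Closed form derived above for Y = |X - 3| with X ~ Poisson(3)."""
    if y in (1, 2, 3):
        return 27 * exp(-3) * (3.0**(-y) / factorial(3 - y) + 3.0**y / factorial(3 + y))
    return 27 * exp(-3) * 3.0**y / factorial(3 + y)

# Direct grouping of pX over the preimage {x : |x - 3| = y}
for y in range(6):
    direct = sum(pois(x) for x in range(0, 200) if abs(x - 3) == y)
    assert abs(direct - p_Y_closed(y)) < 1e-12
print([round(p_Y_closed(y), 4) for y in range(4)])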


5.138 We know that pX (x) = 1/9 if x = 0, π/4, 2π/4, . . . , 2π, and pX (x) = 0 otherwise.
a) Let Y = sin X. Here g(x) = sin x, which is not one-to-one on the range of X. Hence, to find the PMF of Y, we apply Proposition 5.14 on page 247. To that end, we first construct the following table:

 x       0    π/4     2π/4   3π/4    4π/4   5π/4     6π/4   7π/4     8π/4
 sin x   0    1/√2    1      1/√2    0      −1/√2    −1     −1/√2    0

From the table, we see that the PMF of Y is as follows:

 y        −1    −1/√2   0     1/√2   1
 pY (y)   1/9   2/9     3/9   2/9    1/9

b) Let Z = cos X. Here g(x) = cos x, which is not one-to-one on the range of X. Hence, to find the PMF of Z, we apply Proposition 5.14. To that end, we first construct the following table:

 x       0    π/4    2π/4   3π/4     4π/4   5π/4     6π/4   7π/4    8π/4
 cos x   1    1/√2   0      −1/√2    −1     −1/√2    0      1/√2    1

From the table, we see that the PMF of Z is as follows:

 z        −1    −1/√2   0     1/√2   1
 pZ (z)   1/9   2/9     2/9   2/9    2/9

c) No, tan X is not a random variable, because it is not a real-valued function on the sample space. For instance, we know that P (X = π/2) = 1/9 > 0, which implies that {X = π/2} ≠ ∅. Now, let ω be such that X(ω) = π/2. Then tan X(ω) = tan(π/2), which is undefined (not a real number).

5.139 We know that X ∼ NB(3, p) and, hence, that

pX (x) = C(x − 1, 2) p^3 (1 − p)^{x−3},   x = 3, 4, . . . ,

and pX (x) = 0 otherwise. Here C(n, k) denotes the binomial coefficient "n choose k".
a) As X denotes the number of trials until the third success, the number of failures by the third success is X − 3. Thus, the proportion of failures by the third success is (X − 3)/X = 1 − 3/X.
b) Because the range of X is {3, 4, . . .}, the range of Y is RY = { 1 − 3/n : n = 3, 4, . . . }. Here we have g(x) = 1 − 3/x, which is one-to-one on the range of X. Solving the equation y = 1 − 3/x for x, we find that g⁻¹(y) = 3/(1 − y). Hence, from Equation (5.54) on page 248,

pY (y) = p1−3/X (y) = pX (3/(1 − y)) = C(3/(1 − y) − 1, 2) p^3 (1 − p)^{3/(1−y)−3}
       = C((2 + y)/(1 − y), 2) p^3 (1 − p)^{3y/(1−y)}

if y ∈ RY, and pY (y) = 0 otherwise.


5.140 Let X denote the number of trials until the fifth success and let Y = min{X, 17}. We want to determine the PMF of Y. Now, we know that X ∼ NB(5, p). Hence,

pX (x) = C(x − 1, 4) p^5 (1 − p)^{x−5},   x = 5, 6, . . . ,

and pX (x) = 0 otherwise. Here g(x) = min{x, 17}, which is not one-to-one on the range of X. Hence, to find the PMF of Y, we apply Proposition 5.14 on page 247. For y = 5, 6, . . . , 16,

pY (y) = ∑_{x ∈ g⁻¹({y})} pX (x) = ∑_{x: min{x,17} = y} pX (x) = ∑_{x ∈ {y}} pX (x) = pX (y) = C(y − 1, 4) p^5 (1 − p)^{y−5}.

Also,

pY (17) = ∑_{x ∈ g⁻¹({17})} pX (x) = ∑_{x: min{x,17} = 17} pX (x) = ∑_{x ≥ 17} pX (x) = ∑_{k=17}^{∞} pX (k) = 1 − ∑_{k=5}^{16} pX (k) = 1 − ∑_{k=5}^{16} C(k − 1, 4) p^5 (1 − p)^{k−5}.

Hence,

pY (y) = C(y − 1, 4) p^5 (1 − p)^{y−5},                   if y = 5, 6, . . . , 16;
         1 − ∑_{k=5}^{16} C(k − 1, 4) p^5 (1 − p)^{k−5},  if y = 17;
         0,                                               otherwise.
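A small Python sketch (mine, with an arbitrary illustrative value p = 0.4) evaluates this PMF and confirms that it sums to 1.

from math import comb

p = 0.4
nb5 = lambda x: comb(x - 1, 4) * p**5 * (1 - p)**(x - 5)   # pX(x), x = 5, 6, ...

pmf_Y = {y: nb5(y) for y in range(5, 17)}
pmf_Y[17] = 1 - sum(nb5(k) for k in range(5, 17))
print(round(sum(pmf_Y.values()), 12))   # 1.0
print(round(pmf_Y[17], 4))              # probability that at least 17 trials are needed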

Advanced Exercises

5.141 a) The range of Y is {0, 1, 2}. Let g(x) = x (mod 3). Applying Proposition 5.14 on page 247, we get

pY (0) = ∑_{g(x)=0} pX (x) = ∑_{x ∈ {3,6,9,...}} pX (x) = ∑_{k=1}^{∞} pX (3k) = ∑_{k=1}^{∞} p(1 − p)^{3k−1}
       = (p/(1 − p)) ∑_{k=1}^{∞} ((1 − p)^3)^k = (p/(1 − p)) · (1 − p)^3/(1 − (1 − p)^3) = (1 − p)^2/(3 − 3p + p^2)

and

pY (1) = ∑_{g(x)=1} pX (x) = ∑_{x ∈ {1,4,7,...}} pX (x) = ∑_{k=0}^{∞} pX (3k + 1) = ∑_{k=0}^{∞} p(1 − p)^{(3k+1)−1}
       = p ∑_{k=0}^{∞} ((1 − p)^3)^k = p/(1 − (1 − p)^3) = 1/(3 − 3p + p^2)

and

pY (2) = ∑_{g(x)=2} pX (x) = ∑_{x ∈ {2,5,8,...}} pX (x) = ∑_{k=0}^{∞} pX (3k + 2) = ∑_{k=0}^{∞} p(1 − p)^{(3k+2)−1}
       = p(1 − p) ∑_{k=0}^{∞} ((1 − p)^3)^k = p(1 − p)/(1 − (1 − p)^3) = (1 − p)/(3 − 3p + p^2).

Hence,

pY (y) = (1 − p)^2/(3 − 3p + p^2),   if y = 0;
         1/(3 − 3p + p^2),           if y = 1;
         (1 − p)/(3 − 3p + p^2),     if y = 2;
         0,                          otherwise.

b) The range of Y is {0, 1, 2}. Let g(x) = x (mod 3). We first note that, for n ∈ N,

pX (n + 1) = P (X = n + 1) = p(1 − p)^{(n+1)−1} = (1 − p)p(1 − p)^{n−1} = (1 − p)pX (n)

and

pX (n + 2) = P (X = n + 2) = p(1 − p)^{(n+2)−1} = (1 − p)^2 p(1 − p)^{n−1} = (1 − p)^2 pX (n).

Applying Proposition 5.14, we get

pY (0) = ∑_{g(x)=0} pX (x) = ∑_{x ∈ {3,6,9,...}} pX (x) = ∑_{k=1}^{∞} pX (3k).

Also,

pY (1) = ∑_{g(x)=1} pX (x) = ∑_{x ∈ {1,4,7,...}} pX (x) = ∑_{k=0}^{∞} pX (3k + 1) = pX (1) + ∑_{k=1}^{∞} pX (3k + 1)
       = p + ∑_{k=1}^{∞} (1 − p)pX (3k) = p + (1 − p)pY (0)

and

pY (2) = ∑_{g(x)=2} pX (x) = ∑_{x ∈ {2,5,8,...}} pX (x) = ∑_{k=0}^{∞} pX (3k + 2) = pX (2) + ∑_{k=1}^{∞} pX (3k + 2)
       = p(1 − p) + ∑_{k=1}^{∞} (1 − p)^2 pX (3k) = p(1 − p) + (1 − p)^2 pY (0).

Therefore,

1 = pY (0) + pY (1) + pY (2) = pY (0) + ( p + (1 − p)pY (0) ) + ( p(1 − p) + (1 − p)^2 pY (0) )
  = p + p(1 − p) + ( 1 + (1 − p) + (1 − p)^2 ) pY (0) = 2p − p^2 + (3 − 3p + p^2) pY (0).

Solving for pY (0) yields

pY (0) = (1 − 2p + p^2)/(3 − 3p + p^2) = (1 − p)^2/(3 − 3p + p^2).

Consequently, we also have

pY (1) = p + (1 − p)pY (0) = p + (1 − p)(1 − p)^2/(3 − 3p + p^2) = ( 3p − 3p^2 + p^3 + (1 − p)^3 )/(3 − 3p + p^2) = 1/(3 − 3p + p^2)

and

pY (2) = p(1 − p) + (1 − p)^2 pY (0) = p(1 − p) + (1 − p)^2 (1 − p)^2/(3 − 3p + p^2)
       = ( p(1 − p)(3 − 3p + p^2) + (1 − p)^4 )/(3 − 3p + p^2) = (1 − p)( 3p − 3p^2 + p^3 + (1 − p)^3 )/(3 − 3p + p^2)
       = (1 − p)/(3 − 3p + p^2).

Hence,

pY (y) = (1 − p)^2/(3 − 3p + p^2),   if y = 0;
         1/(3 − 3p + p^2),           if y = 1;
         (1 − p)/(3 − 3p + p^2),     if y = 2;
         0,                          otherwise.
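The two derivations can also be checked numerically (a sketch of mine, not from the text; p = 0.37 is an arbitrary test value) by comparing the closed form with a truncated summation of the geometric PMF grouped by residue mod 3.

p = 0.37
geom = lambda x: p * (1 - p) ** (x - 1)
den = 3 - 3 * p + p * p

closed = {0: (1 - p) ** 2 / den, 1: 1 / den, 2: (1 - p) / den}
direct = {r: sum(geom(x) for x in range(1, 3000) if x % 3 == r) for r in range(3)}

for r in range(3):
    assert abs(closed[r] - direct[r]) < 1e-12
print(closed, sum(closed.values()))   # the three probabilities sum to 1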

Review Exercises for Chapter 5

Basic Exercises

5.142 a) The possible values of X are 1, 2, 3, and 4.
b) The event that the student selected is a junior can be represented as {X = 3}.
c) From the table, the total number of undergraduate students is 6,159 + 6,790 + 8,141 + 11,220 = 32,310. Hence, because the selection is done randomly, we have

P (X = 3) = N({X = 3})/N(Ω) = 8,141/32,310 ≈ 0.252.

Thus, 25.2% of the undergraduate students are juniors.
d) Proceeding as in part (c), we obtain the following table for the PMF of X:

 x        1       2       3       4
 pX (x)   0.191   0.210   0.252   0.347


e) Following is a probability histogram for X. In the graph, we use p(x) instead of pX (x).

5.143 a) The event that the number of busy lines is exactly four can be expressed as {Y = 4}.
b) The event that the number of busy lines is at least four can be expressed as {Y ≥ 4}.
c) The event that the number of busy lines is between two and four, inclusive, can be expressed as {2 ≤ Y ≤ 4}.
d) We have

P (Y = 4) = pY (4) = 0.174,

P (Y ≥ 4) = ∑_{y ≥ 4} pY (y) = ∑_{k=4}^{6} pY (k) = 0.174 + 0.105 + 0.043 = 0.322,

and

P (2 ≤ Y ≤ 4) = ∑_{2 ≤ y ≤ 4} pY (y) = ∑_{k=2}^{4} pY (k) = 0.232 + 0.240 + 0.174 = 0.646.

5.144 We note that X ∼ G(0.5). Hence, pX (x) = 2^{−x} if x ∈ N, and pX (x) = 0 otherwise.
a) Following is a partial probability histogram for X. In the graph, we use p(x) instead of pX (x).

b) We can’t draw the entire probability histogram because the range of X is infinite, namely, N .


c) By the FPF,

P (X > 4) = ∑_{x > 4} pX (x) = ∑_{k=5}^{∞} (1/2)^k = (1/2)^5/(1 − 1/2) = (1/2)^4 = 1/16.

d) From the complementation rule and the FPF,

P (X > 4) = 1 − P (X ≤ 4) = 1 − ∑_{x ≤ 4} pX (x) = 1 − ∑_{k=1}^{4} (1/2)^k = 1 − ((1/2) − (1/2)^5)/(1 − 1/2) = 1 − (1 − (1/2)^4) = (1/2)^4 = 1/16.

e) Proceeding as in parts (c) and (d), respectively, we get

P (X > x) = ∑_{y > x} pX (y) = ∑_{k=x+1}^{∞} (1/2)^k = (1/2)^{x+1}/(1 − 1/2) = (1/2)^x = 2^{−x}

and

P (X > x) = 1 − P (X ≤ x) = 1 − ∑_{y ≤ x} pX (y) = 1 − ∑_{k=1}^{x} (1/2)^k = 1 − ((1/2) − (1/2)^{x+1})/(1 − 1/2) = 1 − (1 − (1/2)^x) = (1/2)^x = 2^{−x}.

f) Let N denote the set of prime numbers. Then

P (X ∈ N) = ∑_{x ∈ N} pX (x) ≥ pX (2) + pX (3) + pX (5) + pX (7) + pX (11) = 1/2^2 + 1/2^3 + 1/2^5 + 1/2^7 + 1/2^{11} = 0.41455,

to five decimal places. Also, from part (e),

P (X ∈ N) = ∑_{x ∈ N} pX (x) ≤ pX (2) + pX (3) + pX (5) + pX (7) + pX (11) + P (X > 12) = 1/2^2 + 1/2^3 + 1/2^5 + 1/2^7 + 1/2^{11} + 1/2^{12} = 0.41479,

to five decimal places. Thus, 0.41455 ≤ P (X ∈ N) ≤ 0.41479 and, hence, P (X ∈ N) = 0.415 to three significant digits.
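These bounds are easy to reproduce numerically (my sketch, not from the text; the prime list below is only long enough to make the truncation error negligible, which is an assumption of the check).

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
lower = sum(2.0 ** -q for q in primes[:5])        # first five primes, as in part (f)
upper = lower + 2.0 ** -12                        # add the tail bound P(X > 12)
full = sum(2.0 ** -q for q in primes)             # nearly exact: remaining tail is below 2**-47
print(round(lower, 5), round(upper, 5), round(full, 5))   # 0.41455  0.41479  0.41468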

5.145 a) We have p = 0.4 because there is a 40% chance that a drinker is involved.
b) With independence, an outcome consisting of exactly k successes has probability (0.4)^k (0.6)^{3−k}. Thus, we have the following table:

 Outcome     Probability                   Outcome     Probability
 (s, s, s)   (0.4)^3 (0.6)^0 = 0.064       (f, s, s)   (0.4)^2 (0.6)^1 = 0.096
 (s, s, f)   (0.4)^2 (0.6)^1 = 0.096       (f, s, f)   (0.4)^1 (0.6)^2 = 0.144
 (s, f, s)   (0.4)^2 (0.6)^1 = 0.096       (f, f, s)   (0.4)^1 (0.6)^2 = 0.144
 (s, f, f)   (0.4)^1 (0.6)^2 = 0.144       (f, f, f)   (0.4)^0 (0.6)^3 = 0.216


c) From the table in part (b), we see that the outcomes in which exactly two of three traffic fatalities involve a drinker are (s, s, f), (s, f, s), and (f, s, s).
d) From the table in part (b), we see that each of the three outcomes in part (c) has probability 0.096. The three probabilities are the same because each probability is obtained by multiplying two success probabilities of 0.4 and one failure probability of 0.6.
e) From parts (c) and (d), we see that the probability that exactly two of three traffic fatalities involve a drinker is equal to 3 · 0.096 = 0.288.
f) X ∼ B(3, 0.4); that is, X has the binomial distribution with parameters 3 and 0.4.
g) Referring to the table in part (b), we get

P (X = 0) = P ({(f, f, f)}) = 0.216,
P (X = 1) = P ({(s, f, f), (f, s, f), (f, f, s)}) = 0.144 + 0.144 + 0.144 = 0.432,
P (X = 2) = P ({(s, s, f), (s, f, s), (f, s, s)}) = 0.096 + 0.096 + 0.096 = 0.288,
P (X = 3) = P ({(s, s, s)}) = 0.064.

Thus, we have the following table for the PMF of X:

 x        0       1       2       3
 pX (x)   0.216   0.432   0.288   0.064

h) Applying Procedure 5.1 with n = 3 and p = 0.4, we get

pX (x) = C(3, x) (0.4)^x (0.6)^{3−x},   x = 0, 1, 2, 3,

and pX (x) = 0 otherwise. Substituting successively x = 0, 1, 2, and 3 into the preceding display and doing the necessary algebra gives the same values as shown in the table in part (g).

5.146 Let Y denote the number of traffic fatalities examined up to and including the first one that involves a drinker. Then Y ∼ G(0.4). From Proposition 5.9 on page 229,

pY (y) = (0.4)(0.6)^{y−1},   y = 1, 2, . . . ,

and pY (y) = 0 otherwise. Also, from Proposition 5.10 on page 231,

P (Y > n) = (0.6)^n,   n ∈ N.

a) We have

P (Y = 4) = pY (4) = (0.4)(0.6)^{4−1} = (0.4)(0.6)^3 = 0.0864.

b) From the complementation rule,

P (Y ≤ 4) = 1 − P (Y > 4) = 1 − (0.6)^4 = 0.870.

c) We have

P (Y ≥ 4) = P (Y > 3) = (0.6)^3 = 0.216.

d) We want to find the largest n such that P (Y > n) > 0.2. Thus,

(0.6)^n > 0.2,   or   n < ln 0.2 / ln 0.6 ≈ 3.15.

Hence, n = 3.
e) See the beginning of the solution to this exercise.
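Both calculations in Exercises 5.145(h) and 5.146(d) are one-liners to verify in Python (my sketch, not from the text).

from math import comb, log

# Exercise 5.145(h): binomial PMF with n = 3, p = 0.4
pmf = [comb(3, x) * 0.4**x * 0.6**(3 - x) for x in range(4)]
print(pmf)                       # approximately [0.216, 0.432, 0.288, 0.064]

# Exercise 5.146(d): largest n with P(Y > n) = 0.6**n > 0.2
print(log(0.2) / log(0.6))       # about 3.15, so n = 3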


5.147 Let X denote the number of the eight tires tested that get at least 35,000 miles. We know that X ∼ B(8, 0.9). Therefore, by the FPF,

P (X ≥ 0.75 · 8) = P (X ≥ 6) = ∑_{x ≥ 6} pX (x) = ∑_{k=6}^{8} C(8, k) (0.9)^k (0.1)^{8−k} = 0.962.

5.148 Let X denote the number of “no-shows” when n reservations are made. Then X ∼ B(n, 0.04). We want to find the largest n so that P (X = 0) ≥ 0.8. Noting that

P (X = 0) = C(n, 0) (0.04)^0 (0.96)^n = (0.96)^n,

we have the requirement that

(0.96)^n ≥ 0.8,   or   n ≤ ln 0.8 / ln 0.96 ≈ 5.5.

Hence, n = 5.
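As a check on Exercises 5.147 and 5.148 (a sketch of mine, not from the text):

from math import comb, log

# 5.147: P(X >= 6) for X ~ B(8, 0.9)
print(sum(comb(8, k) * 0.9**k * 0.1**(8 - k) for k in range(6, 9)))   # about 0.962

# 5.148: largest n with 0.96**n >= 0.8
print(log(0.8) / log(0.96))                                           # about 5.5, so n = 5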

5.149 Let X denote the number of moderate-income families of five sampled that spend more than half their income on housing. Then X ∼ H(N, 5, 0.76), where N is the number of moderate-income families in the United States. Because the sample size (5) is small relative to the population size (N), we know, from Proposition 5.6 on page 215, that X ≈ B(5, 0.76). Thus,

pX (x) ≈ C(5, x) (0.76)^x (0.24)^{5−x},   x = 0, 1, . . . , 5.

a) We have

P (X = 3) = pX (3) ≈ C(5, 3) (0.76)^3 (0.24)^2 = 0.253.

b) From the FPF,

P (X ≤ 3) = ∑_{x ≤ 3} pX (x) = ∑_{k=0}^{3} pX (k) ≈ ∑_{k=0}^{3} C(5, k) (0.76)^k (0.24)^{5−k} = 0.346.

c) From the FPF,

P (X ≥ 4) = ∑_{x ≥ 4} pX (x) = ∑_{k=4}^{5} pX (k) ≈ ∑_{k=4}^{5} C(5, k) (0.76)^k (0.24)^{5−k} = 0.654.

d) From the complementation rule and the FPF,

P (1 ≤ X ≤ 4) = 1 − P (X = 0 or 5) = 1 − ( pX (0) + pX (5) ) ≈ 1 − [ C(5, 0) (0.76)^0 (0.24)^5 + C(5, 5) (0.76)^5 (0.24)^0 ] = 1 − (0.24)^5 − (0.76)^5 = 0.746.

e) Applying the formula presented at the beginning of this solution for the PMF of a B(5, 0.76) random variable, we obtain the following table:

 x        0          1          2          3          4          5
 pX (x)   0.000796   0.012607   0.079847   0.252850   0.400346   0.253553


f) The PMF obtained in part (e) assumes sampling with replacement, whereas the sampling is actually without replacement. Thus, the binomial approximation to the hypergeometric distribution (Proposition 5.6) has been used.
g) As we noted, X ∼ H(N, 5, 0.76). Hence,

pX (x) = C(0.76N, x) C(0.24N, 5 − x) / C(N, 5),   x = 0, 1, . . . , 5.

5.150 a) Let X denote the number of successes in n Bernoulli trials with success probability p. We know that X ∼ B(n, p). Let Xj be 0 or 1, respectively, depending on whether a failure or success occurs on trial j. We want to find P (Xj = 1 | X = k). The event {Xj = 1, X = k} occurs if and only if trial j results in a success and among the other n − 1 trials exactly k − 1 successes occur. The former event has probability p and the latter event has probability

C(n − 1, k − 1) p^{k−1} (1 − p)^{(n−1)−(k−1)} = C(n − 1, k − 1) p^{k−1} (1 − p)^{n−k}.

Moreover, by the independence of Bernoulli trials, the two events are independent. Applying the conditional probability rule, the special multiplication rule, and the binomial PMF, we get

P (Xj = 1 | X = k) = P (Xj = 1, X = k) / P (X = k) = [ p · C(n − 1, k − 1) p^{k−1} (1 − p)^{n−k} ] / [ C(n, k) p^k (1 − p)^{n−k} ] = C(n − 1, k − 1) / C(n, k) = k/n.

b) Given that exactly k successes occur in n Bernoulli trials, we can think of the positions (trial numbers) of the k successes and n − k failures as obtained by randomly placing k “s”s and n − k “f ”s in n boxes, one in each box. By symmetry, the letter (“s” or “f ”) placed in box j is equally likely to be any of the n letters. As k of the n letters are “s”s, the probability is k/n that an “s” goes into box j; that is, the probability is k/n that trial j results in success.

5.151 a) According to the frequentist interpretation of probability (see page 5), we have P (E) ≈ n(E)/n. Intuitively, then, we estimate P (E) to be n(E)/n.
b) We note that n(E) ∼ B(n, p), where p = P (E). Suppose that E occurs k times in the n trials, that is, that event {n(E) = k} occurs. The probability of that happening is C(n, k) p^k (1 − p)^{n−k}. We want to choose p to maximize that quantity, which we denote f(p). We have

f′(p) = C(n, k) [ k p^{k−1} (1 − p)^{n−k} − p^k (n − k)(1 − p)^{n−k−1} ]
      = C(n, k) p^{k−1} (1 − p)^{n−k−1} [ k(1 − p) − (n − k)p ] = C(n, k) p^{k−1} (1 − p)^{n−k−1} (k − np).

From the previous display, we see that f′(p) = 0 if and only if p = k/n and, moreover, that f′(p) > 0 if p < k/n and f′(p) < 0 if p > k/n. Therefore, f takes its maximum value at p = k/n. In other words, the maximum likelihood estimate of P (E) is the random variable p̂ = n(E)/n.
c) The maximum likelihood estimate of P (E) coincides with the intuitive estimate based on the frequentist interpretation of probability.


5.152 Referring to Definition 5.5 on page 211, we obtain the following results.
a) Because 4 > 0, C(9, 4) = (9 · 8 · 7 · 6)/4! = 3024/24 = 126.
b) Because −4 < 0, C(9, −4) = 0.
c) Because 4 > 0, C(−9, 4) = ((−9)(−10)(−11)(−12))/4! = (9 · 10 · 11 · 12)/24 = 495.
d) Because −4 < 0, C(−9, −4) = 0.
e) Because 9 > 0, C(4, 9) = (4 · 3 · 2 · 1 · 0 · (−1)(−2)(−3)(−4))/9! = 0/9! = 0.
f) Because −9 < 0, C(4, −9) = 0.
g) Because 9 > 0, C(−4, 9) = ((−4)(−5)(−6)(−7)(−8)(−9)(−10)(−11)(−12))/9! = −(4 · 5 · 6 · 7 · 8 · 9 · 10 · 11 · 12)/9! = −(10 · 11 · 12)/(3 · 2 · 1) = −220.
h) Because −9 < 0, C(−4, −9) = 0.
i) Because 4 > 0, C(1/2, 4) = ((1/2)(−1/2)(−3/2)(−5/2))/4! = −15/(16 · 24) = −5/128.
j) Because −4 < 0, C(1/2, −4) = 0.
k) Because 4 > 0, C(−1/2, 4) = ((−1/2)(−3/2)(−5/2)(−7/2))/4! = (1 · 3 · 5 · 7)/(16 · 24) = 35/128.
l) Because −4 < 0, C(−1/2, −4) = 0.

Discrete Random Variables and Their Distributions

number of people in the population. As N is very large, we can use the binomial approximation to the hypergeometric distribution. Hence, X ≈ B(n, p). We have 1, if X = 0; Y = n + 1, if X ≥ 1. Now,

' ( n 0 p (1 − p)n−0 = (1 − p)n pY (1) = P (Y = 1) = P (X = 0) ≈ 0

and, by the complementation rule,

pY (n + 1) = 1 − pY (0) = 1 − (1 − p)n . Hence, pY (y) =

"

(1 − p)n , if y = 1; n 1 − (1 − p) , if y = n + 1; 0, otherwise.

5.154 a) The number of patients, X, that arrive within the first hour of the doctor’s arrival has, by assumption, the Poisson distribution with parameter λ = 6.9. The first patient arrives more than 1 hour after the doctor arrives if and only if no patients arrive within the first hour of the doctor’s arrival, which means that X = 0. Hence, the required probability is (6.9)0 = e−6.9 = 0.00101. 0!

P (X = 0) = e−6.9

b) According to Proposition 5.8 on page 225, the most probable number of patients that arrive within the first hour of the doctor’s arrival is ⌊6.9⌋ = 6. From the complementation rule and the FPF, P (|X − 6| > 1) = 1 − P (|X − 6| ≤ 1) = 1 −

/

|x−6|≤1

pX (x) = 1 −

7 /

e−6.9

k=5

(6.9)k = 0.569. k!

c) From the complementation rule and the FPF, P (X ≥ 5) = 1 − P (X < 5) = 1 −

/ x 6) = (1 − p) = p (1 − p)6 = P (Y = 0). 0 b) Consider Bernoulli trials with success probability p. Let X denote the number of trials up to and including the first success, and let Y denote the number of successes in the first six trials. Then X ∼ G(p) and Y ∼ B(6, p). Now, X > 6 if and only if the first success occurs after the sixth trial, which happens if and only if no successes occur during the first six trials, which means that Y = 0. Consequently, we have {X > 6} = {Y = 0} and, hence, in particular, P (X > 6) = P (Y = 0). 5.162 a) From Proposition 5.13 on page 241, we know that X ∼ N B(3, p) and that ' ( x−1 3 pX (x) = x = 3, 4, . . . , p (1 − p)x−3 , 2

and pX (x) = 0 otherwise. b) From the FPF, P (X ≤ 40) =

/

x≤40

pX (x) =

40 /

pX (k).

k=3

Hence, we see that 38 terms of the PMF of X must be added to obtain P (X ≤ 40). We have ( 40 ' / k−1 3 P (X ≤ 40) = p (1 − p)k−3 . 2 k=3

c) Let Y denote the number of successes in the first 40 trials. Then, we have Y ∼ B(40, p). We note that {X > 40} = {Y < 3}. Hence, from the complementation rule and the FPF, P (X ≤ 40) = 1 − P (X > 40) = 1 − P (Y < 3) = 1 −

/ y k) · · · P (Xn > k) = . N

5-86

CHAPTER 5

Discrete Random Variables and Their Distributions

Consequently, P (Y = k) = P (Y > k − 1) − P (Y > k) = =

(N − k + 1)n − (N − k)n . Nn

Thus, pY (y) =

'

(N − y + 1)n − (N − y)n , Nn

N − (k − 1) N

(n



'

N −k N

(n

y = 1, 2, . . . , N,

and pY (y) = 0 otherwise.

5.165 a) Let X denote the largest numbered member sampled. Note that event {X = k} occurs if and only if exactly n − 1 of the members sampled have numbers at most k − 1 and one member sampled has number k. Hence, ' (' (' ( ' ( k−1 1 N −k k−1 N({X = k}) n−1 1 0 n−1 ' ( ' ( P (X = k) = = = ' ( , k = n, n + 1, . . . , N. N N N n n n

Thus,

pX (x) =

'

( x−1 n−1 ' ( , N n

x = n, n + 1, . . . , N,

and pX (x) = 0 otherwise. b) Let Y denote the smallest numbered member sampled. Note that event {Y = k} occurs if and only if exactly n − 1 of the members sampled have numbers at least k + 1 and one member sampled has number k. Hence, ' (' (' ( ' ( k−1 1 N −k N −k N({Y = k}) 0 1 n−1 n−1 ' ( ' ( = = ' ( , k = 1, 2, . . . , N − n + 1. P (Y = k) = N N N n n n Thus,

pY (y) = and pY (y) = 0 otherwise. 5.166 We have

pX (x) = and

'

( N −y n−1 ' ( , N n

y = 1, 2, . . . , N − n + 1,

' ( n x p (1 − p)n−x , x

' ( m y pY (y) = p (1 − p)m−y , y

x = 0, 1, . . . , n y = 0, 1, . . . , m.

5-87

Review Exercises for Chapter 5

a) For z = 0, 1, . . . , n + m, we have P (X + Y = z) = =

/ x

P (X = x)P (X + Y = z | X = x)

x

P (X = x)P (Y = z − x | X = x) =

/

z ' ( / n

x

P (X = x)P (Y = z − x)

( m = p (1 − p) p z−x (1 − p)m−(z−x) x z − x x=0 ( ' ( z ' (' / n m n+m z z n+m−z = p (1 − p) = p (1 − p)n+m−z , x z − x z x=0 x

n−x

'

/

where the last equality follows from Vandermonde’s identity. Thus, X + Y ∼ B(n + m, p). b) For x = 0, 1, . . . , z, we have, in view of the conditional probability rule, the independence condition, and part (a), that P (X = x | X + Y = z) =

P (X = x, X + Y = z) P (X + Y = z)

P (X = x, Y = z − x) P (X = x)P (Y = z − x) = P (X + Y = z) P (X + Y = z) ' ( ' ( n x m p (1 − p)n−x p z−x (1 − p)m−(z−x) x z−x ' ( = n+m z p (1 − p)n+m−z z ' (' ( n m x z−x ( . = ' n+m z % & Thus, given that X + Y = z, we have X ∼ H n + m, z, n/(n + m) . =

5.167 If the sample is drawn with replacement, then the number of nondefective items of the six sampled has the B(6, 2/3) distribution. Hence, its PMF is ' ( 6 (2/3)k (1/3)6−k , k = 0, 1, . . . 6. k

However, if the sample is drawn without replacement., then the number of nondefective items of the six sampled has the H(12, 6, 2/3) distribution. Hence, its PMF is ' (' ( 8 4 k 6−k ' ( , k = 0, 1, . . . 6. 12 6

5-88

CHAPTER 5

Discrete Random Variables and Their Distributions

The following table provides both PMFs. k

Binomial

Hypergeometric

0 1 2 3 4 5 6

0.00137 0.01646 0.08230 0.21948 0.32922 0.26337 0.08779

0.00000 0.00000 0.03030 0.24242 0.45455 0.24242 0.03030

The engineer will probably estimate the number of nondefectives in the lot of 12 to be roughly twice the number of nondefectives in the sample of six: M ≈ 12 ·

k = 2k. 6

From the table, we see that it’s more likely for the number of nondefectives in the sample to be close to four—that is, for the engineer’s estimate to be close to eight—if the sampling is without replacement (hypergeometric). For instance, the probability that the number of nondefectives in the sample will be within one of four is 0.812 for the binomial and 0.939 for the hypergeometric. Hence, a better estimate of the number of nondefectives in the lot is obtained by sampling without replacement. 5.168 a) We would expect the binomial approximation to be worse when n = 50 than when n = 8 because, in the former case, the sample size is larger relative to the size of the population than in the latter case. In fact, here, when n = 50, the rule of thumb for use of a binomial distribution to approximate a hypergeometric distribution is just barely met—the sample size of 50 is exactly 5% of the population size of 1000. b) We have the following table, where the probabilities are rounded to eight decimal places and where we displayed probabilities until they are zero to that number of decimal places: Successes Hypergeometric x probability

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

0.00001038 0.00013826 0.00089640 0.00377132 0.01157781 0.02765180 0.05349306 0.08617037 0.11793007 0.13921652 0.14344797 0.13023588 0.10498345 0.07561264 0.04891210

Binomial probability

0.00001427 0.00017841 0.00109274 0.00437095 0.01283965 0.02953120 0.05537101 0.08701158 0.11692182 0.13640879 0.13981901 0.12710819 0.10327540 0.07547049 0.04986443

Successes Hypergeometric x probability

15 16 17 18 19 20 21 22 23 24 25 26 27 28

0.02854165 0.01507892 0.00723473 0.00316049 0.00125978 0.00045900 0.00015309 0.00004679 0.00001312 0.00000337 0.00000080 0.00000017 0.00000003 0.00000001

Binomial probability

0.02991866 0.01636177 0.00818088 0.00374957 0.00157877 0.00061177 0.00021849 0.00007200 0.00002191 0.00000616 0.00000160 0.00000039 0.00000009 0.00000002

c) As we see by comparing the table in part (b) to Table 5.15, the accuracy of the binomial approximation to the hypergeometric is superior when n = 8 than when n = 50—just as expected.

Review Exercises for Chapter 5

5-89

5.169 a) The likelihood ratio for M based on an observed value k of X is ' (' (A' ( M N −M N Lk (M) k n−k n (' (A' ( =' M − 1 N − (M − 1) N Lk (M − 1) k n−k n =

M(N − M + 1 − n + k) (N + 1)k − nM =1+ , (M − k)(N − M + 1) (M − k)(N − M + 1)

where the last equality is obtained by subtracting and adding (M − k)(N − M + 1) from the numerator of the penultimate B expression, Cand then simplifying algebraically. It follow that Lk (M) takes its maximum value at M = (N + 1)k/n . Hence, the maximum likelihood estimate of M is given by F E B C X D = (N + 1)pˆ . M = (N + 1) n

b) In this case, N = 80,000, n = 100, and X = 38. Hence, from part (a), the maximum likelihood estimate of the student’s vocabulary is F E 38 D = ⌊30,400.38⌋ = 30,400. M = (80,000 + 1) · 100 5.170 a) The random variable X has PMF given by pX (x) = p(1 − p)x ,

x = 0, 1, 2, . . . ,

and pX (x) = 0 otherwise. Given that X = n, the random variable Y has the binomial distribution with parameters n and p. Let k be a nonnegative integer. Applying the law of total probability yields P (Y = k) =

/

x ∞ /

P (X = x)P (Y = k | X = x) =

∞ / n=0

P (X = n)P (Y = k | X = n)

' ( ∞ ' ( n k pk+1 / n = p(1 − p)n p (1 − p)n−k = (1 − p)2n . k k k (1 − p) n=k n=k Now we let m = n + 1, r = k + 1, and p0 = 1 − (1 − p)2 = p(2 − p). Then ( ∞ ' / pr m−1 P (Y = k) = (1 − p0 )m−1 (1 − p)r−1 m=r r − 1 ' (' ( ∞ ' ( pr (1 − p0 )r−1 / m−1 r = p0 (1 − p0 )m−r . r − 1 p0r (1 − p)r−1 m=r

5-90

CHAPTER 5

Discrete Random Variables and Their Distributions

The terms in the last sum are from the PDF of a negative binomial random variable with parameters r and p0 , summed over all possible values of such a random variable. Hence, the sum is 1. Thus, (' ( (1 − p0 )r−1 pr P (Y = k) = p0r (1 − p)r−1 ' (' ( ' (' ( p p p(1 − p0 ) r−1 p(1 − p0 ) k = = . p0 (1 − p) p0 p0 (1 − p) p0 '

Simple algebra shows that

Consequently,

1 p = p0 2−p pY (y) =

'

1 2−p

and ('

p(1 − p0 ) 1 =1− . p0 (1 − p) 2−p

1 1− 2−p

(y

y = 0, 1, 2, . . . ,

,

and pY (y) = 0 otherwise. b) Let Z = Y + 1. We note that the function g(y) = y + 1 is one-to-one. Solving for y in the equation z = y + 1, we see that g −1 (z) = z − 1. Applying Equation (5.54) on page 248 and the result of part (a) yields, for z ∈ N , ' (' (z−1 % −1 & 1 1 pZ (z) = pg(Y ) (z) = pY g (z) = pY (z − 1) = 1− . 2−p 2−p % & Thus, Y + 1 ∼ G 1/(2 − p) .

5.171 a) We consider a “success” to be a defective item. From independence, we know that repeated observations of success (defective) or failure (nondefective) constitute Bernoulli trials with success probability p. Hence, the time (item number) of the kth success (defective item) has the negative binomial distribution with parameters k and p. Consequently, the probability that the nth item is the kth defective one is ' ( n−1 k n = k, k + 1, . . . . p (1 − p)n−k , k−1 b) Here we consider a “success” to be a rejected item (i.e., defective and inspected). From independence, we know that repeated observations of success or failure constitute Bernoulli trials with success probability pq. Hence, the time (item number) of the kth success (rejected item) has the negative binomial distribution with parameters k and pq. Consequently, the probability that the nth item is the kth rejected one is ' ( n−1 n = k, k + 1, . . . . (pq)k (1 − pq)n−k , k−1

c) We give two solutions.

Method 1: Let Z denote the number of items until the first rejection. From the law of partitions, we have, for each nonnegative integer k, that P (W = k) =

/ n

P (W = k, Z = n) =

∞ /

n=k+1

P (W = k, Z = n).

Review Exercises for Chapter 5

5-91

Event {W = k, Z = n} occurs if and only if among the first n − 1 items exactly k are defective, none of these k defective items are inspected, and the nth item is inspected and defective. The probability of this happening is ' ( n−1 k P (W = k, Z = n) = p (1 − p)n−1−k · (1 − q)k · pq k ' ( n − 1 k+1 = q(1 − q)k p (1 − p)n−1−k . k Therefore, letting r = k + 1, we get ( ∞ ' / n − 1 k+1 k P (W = k) = q(1 − q) p (1 − p)n−1−k k n=k+1 ( ∞ ' / n−1 r = q(1 − q) p (1 − p)n−r = q(1 − q)k , r − 1 n=r k

where, in the last equation, we used the fact that the values of the PMF of a negative binomial random variable with parameters r and p must sum to 1. Consequently, and pW (w) = 0 otherwise.

pW (w) = q(1 − q)w ,

w = 0, 1, 2, . . . ,

Method 2: Let k be a nonnegative integer. The number of defective items until the first rejection equals k if and only if the first k defective items are not inspected and the next defective item is inspected, which has probability (1 − q)k q. Consequently, pW (w) = q(1 − q)w ,

w = 0, 1, 2, . . . ,

and pW (w) = 0 otherwise. d) Let Z = W + 1. We note that the function g(w) = w + 1 is one-to-one. Solving for w in the equation z = w + 1, we see that g −1 (z) = z − 1. Applying Equation (5.54) on page 248 and the result of part (c) yields, for z ∈ N , % & pZ (z) = pg(W ) (z) = pW g −1 (z) = pW (z − 1) = q(1 − q)z−1 .

Thus, W + 1 ∼ G(q) and represent the number of defective items up to and including the first rejection. e) In the preceding (first) scenario, a defective item is rejected if and only if it is inspected, which has probability q. Here, in the second scenario, a defective item is rejected if and only if it is inspected and detected as defective, which has probability qr. Consequently, in the second scenario, the answers to parts (b)–(d) are obtained from those in the first scenario by replacing q by qr. 5.172 Let X denote the number of 6s in four throws of one balanced die and let Y denote the number of pairs of 6s in 24 throws of two balanced dice. Then X ∼ B(4, 1/6) and Y ∼ B(24, 1/36). a) From the complementation rule, ' ( 4 P (X ≥ 1) = 1 − P (X = 0) = 1 − (1/6)0 (5/6)4−0 = 1 − (5/6)4 = 0.518 0 and ' ( 24 P (Y ≥ 1) = 1 − P (Y = 0) = 1 − (1/36)0 (35/36)24−0 = 1 − (35/36)24 = 0.491. 0

Hence, the gambler is not right. The chance of getting at least one 6 when one balanced die is thrown four times exceeds that of getting at least one pair of 6s when two balanced dice are thrown 24 times.

5-92

CHAPTER 5

Discrete Random Variables and Their Distributions

b) From the complementation rule, P (X ≥ 2) = 1 − P (X < 2) = 1 − and

1 ' ( / 4 k=0

k

(1/6)k (5/6)4−k = 0.132

1 ' ( / 24 (1/36)k (35/36)24−k = 0.143. P (Y ≥ 2) = 1 − P (Y < 2) = 1 − k k=0

5.173 Let C and D denote the events that the coin and die experiments are chosen, respectively. Given that D occurs, X ∼ G(1/6), whereas, given that C occurs, X ∼ G(1/2). Also, P (C) = P (D) = 1/2. a) From the lack-of-memory property of the geometric distribution, P (X = 8 | {X > 5} ∩ D) = (1/6)(5/6)3−1 =

25 . 216

b) From the lack-of-memory property of the geometric distribution, P (X = 8 | {X > 5} ∩ C) = (1/2)(1/2)3−1 =

1 . 8

c) From Bayes’s rule P (D | X > 5) = =

P (D)P (X > 5 | D) P (C)P (X > 5 | C) + P (D)P (X > 5 | D) (1/2)P (X > 5 | D) = (1/2)P (X > 5 | C) + (1/2)P (X > 5 | D)

Now, by Proposition 5.10 on page 231, P (X > 5 | D) = (5/6)5 Hence, P (D | X > 5) =

and

1 . P (X > 5 | C) 1+ P (X > 5 | D)

P (X > 5 | C) = (1/2)5 .

1 3125 = . 5 3368 (1/2) 1+ (5/6)5

d) From the conditional form of the complementation rule and part (c), P (C | X > 5) = 1 − P (D | X > 5) = 1 −

3125 243 . = 3368 3368

e) Applying the conditional form of the law of total probability, Exercise 4.20, and the results of parts (a)–(d), we get P (X = 8 | X > 5)

= P{X>5} (X = 8) = P{X>5} (C)P{X>5} (X = 8 | C) + P{X>5} (D)P{X>5} (X = 8 | D) = P (C | X > 5)P (X = 8 | {X > 5} ∩ C) + P (D | X > 5)P (X = 8 | {X > 5} ∩ D) =

243 1 3125 25 · + · = 0.116. 3368 8 3368 216

5-93

Review Exercises for Chapter 5

f) From the law of total probability, P (X = 3) = P (C)P (X = 3 | C) + P (D)P (X = 3 | D) ' ( ' (2 ' ( ' (2 1 1 1 1 1 5 = · + · = 0.120. 2 2 2 2 6 6

g) From parts (e) and (f),

P (X = 5 + 3 | X > 5) = P (X = 8 | X > 5) = 0.116 ̸= 0.120 = P (X = 3).

Therefore, X does not have the lack-of-memory property. h) If five failures have occurred (i.e., the first five trials resulted in failure), then it is more likely that the die experiment was performed than the coin experiment, because the success probability in the former is much smaller than that in the latter (1/6 vs. 1/2). Therefore, you should feel less sure of a success on the sixth trial, given these five failures, than you were of a success on the first trial, when both experiments were equally likely to be performed. i) Knowing that the die experiment has been chosen, X ∼ G(1/6). In particular, then, X has the lackof-memory property. Therefore, the probability of a success on the sixth trial, given these five failures, is equal to the probability of a success on the first trial. 5.174 Let F and B denote the events that the balanced and biased coin is in your hand, respectively. We use P1 and P2 for probabilities involving the first and second experiments, respectively. For a coin with probability p of a head, let X denote the number of heads in six tosses and let Y denote the number of tosses until the third head. Then X ∼ B(6, p) and Y ∼ N B(3, p). a) We note that, for the first experiment, event E occurs if and only if exactly three heads occur in the six tosses. Hence, ' (' (3 ' (3 ' (' (3 ' (3 6 1 1 5 6 4 1 256 P1 (E | F ) = = and P1 (E | B) = = . 3 2 2 16 3 5 5 3125

b) We note that, for the second experiment, event E occurs if and only if the third head occurs on the sixth toss. Hence, ' (' ( ' ( ' (' ( ' ( 6−1 1 3 1 3 5 6−1 4 3 1 3 128 P2 (E | F ) = = = and P2 (E | B) = . 2 2 2 2 32 5 5 3125

c) From Bayes’s rule and part (a), P1 (F | E) =

5 (0.4) · 16 P1 (F )P1 (E | F ) = 5 P1 (F )P1 (E | F ) + P1 (B)P1 (E | B) (0.4) · 16 + (0.6) ·

256 3125

=

15,625 . 21,769

=

15,625 . 21,769

d) From Bayes’s rule and part (b),

5 (0.4) · 32 P2 (F )P2 (E | F ) P2 (F | E) = = 5 P2 (F )P2 (E | F ) + P2 (B)P2 (E | B) + (0.6) · (0.4) · 32

128 3125

e) As we see, the answers to parts (c) and (d) are identical. This equality is a result of the fact that P2 (E | F ) = P1 (E | F )/2 and P2 (E | B) = P1 (E | B)/2. Consequently, when we apply Bayes’s rule in part (d), we get the same answer as in part (c).


Chapter 6

Jointly Discrete Random Variables

6.1 Joint and Marginal Probability Mass Functions: Bivariate Case

Basic Exercises ! !∞ 2 !∞ 2x−2 = p2 2 x 6.1 a) P (X = Y ) = x P (X = Y = x) = x=1 p (1 − p) x=0 ((1 − p) ) = p 1 p2 × 1−(1−p)2 = 2−p , which represents the probability that both electrical components have the same lifetimes. b) (1 − p)/(2 − p), since P (X > Y ) =

"

P (X = x, Y < x) =

x

=

"

P (X = x, Y ≤ x − 1) =

x

∞ " x−1 "

2

x+y−2

p (1 − p)

x=2 y=1

= p2

#" $ ∞ x−2 " x−1 y (1 − p) =p (1 − p)

(1 − p)x−1 ·

x=2

P (X = x, Y = y)

x=2 y=1

2

x=2

∞ "

∞ " x−1 "

y=0

1 − (1 − p)x−1 1−p = . 1 − (1 − p) 2−p

This represents the probability that the first component outlasts the second. c) Because the joint PMF of X and Y is symmetric in x and y (since P (X = x, Y = y) = P (X = y, Y = x) for all x, y) we must have P (Y > X) = P (X > Y ). p , then d) Since 1 = P (Ω) = P (X > Y ) + P (X = Y ) + P (X < Y ) = 2P (X > Y ) + 2−p 1−

p 1−p = 2P (X > Y ), implying that = P (X > Y ). 2−p 2−p

N ({X = 2, Y = 1}) 8 = = 0.2. The probability that N (Ω) 40 a randomly selected student has exactly 2 siblings, where one of the siblings is a sister and the other is a brother. b) The joint PMF of X and Y is given by: pX,Y (0, 0) = pX,Y (1, 0) = pX,Y (2, 1) = 0.2, 6.2 a) pX,Y (2, 1) = P (X = 2, Y = 1) =



pX,Y (1, 1) = 0.225, pX,Y (2, 0) = pX,Y (3, 1) = pX,Y (4, 3) = 0.025, pX,Y (2, 2) = pX,Y (3, 2) = 0.05, and pX,Y (x, y) = 0 for all other x, y. Sisters, Y 0 1 2 3 pX (x)

c) Siblings, X

0

0.2

0

0

0

0.2

1

0.2

0.225

0

0

0.425

2

0.025

0.2

0.05

0

0.275

3

0.0

0.025

0.05

0

0.075

4

0

0

0

0.025

0.025

pY (y)

0.425

0.45

0.1

0.025

1

d) Using the law of partitions (Proposition 2.8), we have for each x: pX (x) = P (X = x) =

3 "

P (X = x, Y = k) = pX,Y (x, 0) + pX,Y (x, 1) + pX,Y (x, 2) + pX,Y (x, 3).

k=0

e) If the student has no brothers, then the number of siblings, that the student has, should equal to the number of sisters the student has: P (X = Y ) = pX,Y (0, 0) + pX,Y (1, 1) + pX,Y (2, 2) + pX,Y (3, 3) = 0.475 f ) The probability that the student has no sisters is : P (Y = 0) = pY (0) = 0.425 1 6.3 a) Since X is chosen uniformly from the 10 decimal digits, P (X = x) = 10 for all x ∈ {0, 1, · · · , 9} and P (X = x) = 0 otherwise. Once X has been chosen, Y is uniformly chosen from the remaining 9 decimal digits, thus P (Y = y | X = x) = 91 for y ∈ {0, 1, · · · , 9} \ {x} and P (Y = y | X = x) = 0 otherwise. Thus, upon using the multiplication rule, one obtains that 1 P (X = x, Y = y) = P (Y = y|X = x)P (X = x) = 90 for all x, y ∈ {0, 1, · · · , 9} such that x ̸= y, and P (X = x, Y = y) = 0 otherwise.! 9 b) Using the FPF, P (X = Y ) = i=0 P (X = Y = i) = 0. However, the result is easily obtained upon realizing that by definition , Y can never equal X and ! thus P (X = Y ) = 0. !9 of!Yi−1 9 !i−1 c) Using the FPF, P (X > Y ) = i=1 j=0 P (X = i, Y = j) = i=1 j=0 (1/90) = !9 1 i=1 (i/90) = 2 . The result is easy to obtain directly from P (X = Y ) = 0 and symmetry of the distribution of X, Y :

1 1 = P (Ω) = P (X > Y ) + P (X < Y ) = 2P (X > Y ), thus, P (X > Y ) = . 2 ! 1 9 = 10 . d) Using Proposition 6.2, P (Y = y) = x∈{0,1,··· ,9}\{y} P (X = x, Y = y) = 90 1 e) Using symmetry, each y is equally likely to be chosen, and thus P (Y = y) = 10 . IE 0 6.4 a) IF

0 1 pI E

P (E c

∩ F c)

P (E c ∩ F ) P (E c )

1

pIF (x) F c)

P (F c )

P (E ∩ F ) P (E)

P (F ) 1

P (E ∩



b) Since E and E c partition Ω (as do F and F c ), the law of partitions (Proposition 2.8) and definition of indicator random variables yield the desired result. c) An indicator random variable indicates whether a specified event occurs. Knowledge of the probabilities of two different events occurring does not in general provide any knowledge of the probability of both events occurring. d) The joint PMF of two indicator random variables can be determined by the marginal PMFs only when the two events in question are either mutually exclusive or independent. 6.5 By partitioning, for arbitrary x0 ∈ R for discrete random variables X and Y we have that " " " pX (x0 ) = P (X = x0 ) = P (X = x0 , Y ∈ R) = P (X = x, Y = y) = pX,Y (x0 , y), x=x0 y∈R

y∈R

which is Equation (6.3). Similarly, for arbitrary y0 ∈ R, " " " pY (y0 ) = P (Y = y0 ) = P (X ∈ R, Y = y0 ) = P (X = x, Y = y) = pX,Y (x, y0 ) x∈R y=y0

x∈R

which is Equation (6.4). 1 6.6 a) The joint PMF of X and Y is given by: pX,Y (x, y) = 36 for x = y ∈ {1, · · · , 6}, and 2 pX,Y (x, y) = 36 for all x < y, where x, y ∈ {1, · · · , 6}, and pX,Y (x, y) = 0 otherwise.

b) The required table is given by:

Smaller value, X

Larger value, Y 3 4

1

2

5

6

pX (x)

1

0.0278

0.0556

0.0556

0.0556

0.0556

0.0556

0.3058

2

0

0.0278

0.0556

0.0556

0.0556

0.0556

0.2502

3

0

0

0.0278

0.0556

0.0556

0.0556

0.1946

4

0

0

0

0.0278

0.0556

0.0556

0.1390

5

0

0

0

0

0.0278

0.0556

0.0834

6

0

0

0

0

0

0.0278

0.0278

pY (y)

0.0278

0.0834

0.1390

0.1946

0.2502

0.3058

1

c) P (X = Y ) = pX,Y (1, 1) + pX,Y (2, 2) + . . . + pX,Y (6, 6) = 0.1667, which is the probability that both times the die comes up the same number. d) P (X < Y ) = 1 − P (X = Y ) = 0.8333, which is the probability that the value of the larger of the two faces showing is strictly larger than the value of the smaller of the two faces showing. The solution is easily computed since P (X ≤ Y ) = 1. e) P (X > Y ) = 0. By definition of the variables X and Y , X can not be strictly larger than Y . f ) P (Y = 2X) = pX,Y (1, 2) + pX,Y (2, 4) + pX,Y (3, 6) = 0.1667, which is the probability that the larger value of the two faces showing is exactly twice the value of the smaller of the two faces. g) P (Y ≥ 2X) = pX,Y (1, 2) + pX,Y (1, 3) + pX,Y (1, 4) + pX,Y (1, 5) + pX,Y (1, 6) + pX,Y (2, 4) + pX,Y (2, 5) + pX,Y (2, 6) + pX,Y (3, 6) = 0.5, which represents the probability that the larger of the two faces is at least twice the value of the smaller of the two faces showing. h) P (Y − X ≥ 3) = pX,Y (1, 4) + pX,Y (1, 5) + pX,Y (1, 6) + pX,Y (2, 5) + pX,Y (2, 6) + pX,Y (3, 6) = 0.3333, which represents the probability that the difference between the larger of the two faces and the smaller of the two faces is at least 3.

157


6.7 a) pX,Y (x, y) =

#

3 x, y, 3 − x − y

$ # $x # $y # $3−x−y 1 1 1 4 4 2

for integer 0 ≤ x, y ≤ 3 where% x + y ≤ &3, and pX,Y (x, y) = 0 otherwise. The result is obtained 3 upon noting that there are x,y,3−x−y ways to select the order (of political affiliations) in which x Greens, y Democrats and (3 − x − y) Republicans are chosen into the sample of size 3 without replacement. Moreover, the probability of every such ordering is the same and equals to ( 82 )x ( 28 )y ( 48 )3−x−y . Democrats, Y 0 1 2 0 b)

Green, X

1 2 3 pY (y)

1 8 3 16 3 32 1 64 27 64

3

pX (x)

3 16 3 16 3 64

3 32 3 64

1 64

0

0

0

0

0

27 64 27 64 9 64 1 64

27 64

9 64

1 64

1

0

3 3 c) Using the FPF, P (X > Y ) = pX,Y (1, 0) + pX,Y (2, 0) + pX,Y (3, 0) + pX,Y (2, 1) = 16 + 32 + 3 11 1 + = , i.e. the probability, that there are more Greens than Democrats in the sample, is 64 64 32

equal to 11/32. d) If no Republicans are chosen, then every person in the sample must either be a Green or a Democrat, thus the event that no Republicans are chosen is represented by {X + Y = 3}. e) Using the FPF and (a), P (X + Y = 3) = pX,Y (3, 0) + pX,Y (2, 1) + pX,Y (1, 2) + pX,Y (0, 3) = 3 3 1 1 1 64 + 64 + 64 + 64 = 8 . % & f ) Let Z be the number of Republicans in the sample. Then Z ∼ B 3, 12 , and # $ # $0 # $3 # $3 3 1 1 1 1 P (Z = 0) = = = 0 2 2 2 8 & % & % g) X ∼ B 3, 14 , i.e. pX (x) = x3 (0.25)x (0.75)3−x for x ∈ {0, 1, 2, 3}, and pX (x) = 0 for all x∈ / {0, 1, 2, 3}. % & % & h) X + Y ∼ B 3, 21 , i.e. pX+Y (u) = u3 /8 for all u ∈ {0, 1, 2, 3}, and pX+Y (u) = 0 otherwise. i) Using the FPF and (a),

P (1 ≤ X + Y ≤ 2) = pX,Y (1, 0) + pX,Y (2, 0) + pX,Y (0, 1) + pX,Y (1, 1) + pX,Y (0, 2) =

3 3 3 3 3 3 + + + + = 16 32 16 16 32 4

j) Using (h), # $ # $ 3 1 3 1 3 P (1 ≤ X + Y ≤ 2) = P (X + Y = 1) + P (X + Y = 2) = + = . 1 8 2 8 4


158

6.8 a) The joint PMF of X and Y is given by: pX,Y (x, y) =

%

&%2 &%2& 4 3−(x+y) x y %8& 3

for (x, y) ∈ {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 1)}, and pX,Y (x, y) = 0 otherwise. 0 b) Green, X

Democrats, Y 1 2

pX (x)

0

0.0714

0.2143

0.0714

0.3571

1

0.2143

0.2857

0.0357

0.5357

2

0.0714

0.0357

0

0.1071

pY (y)

0.3571

0.5357

0.1071

1

c) P (X > Y ) = pX,Y (1, 0) + pX,Y (2, 0) + pX,Y (2, 1) = 0.3214, which represents the probability that more Greens are chosen than Democrats. d) {X + Y = 3}. e) Let E represent the event that no Republicans are chosen. Then, by (d), E = {(1, 2), (2, 1)}. Using the FPF and (a), P (E) = pX,Y (1, 2) + pX,Y (2, 1) = 0.0714. f ) Let R represent the number of Republicans in the sample. Then R is H(N = 8, n = 3, p = 0.5), i.e. the probability distribution of R is given by: pR (r) =

%4&% r

4 %83−r & 3

&

, r ∈ {0, 1, 2, 3}.

4 Thus, pR (0) = 56 ≈ 0.0714 g) X is H(N = 8, n = 3, p = 1/4) and the PMF of X is given by:

pX (x) =

%2 &% x

6 & 3−x %8& 3

for x = 0, 1, 2, pX (x) = 0 for all other values. h) X + Y is H(N = 8, n = 3, p = 0.5). The PMF for X + Y is given as: pX+Y (z) =

%4&% z

4 %83−z & 3

&

for z = 0, 1, 2, 3, pX+Y (z) = 0 for all other values. i) Let F represent the event that the sum of the total number of Green and Democrats chosen are either 1 or 2. Then, F = {(0, 1), (0, 2), (1, 0), (1, 1), (2, 0)}. Using the FPF and part (a), P (1 ≤ X + Y ≤ 2) = pX,Y (0, 1) + pX,Y (0, 2) + pX,Y (1, 0) + pX,Y (1, 1) + pX,Y (2, 0) = 0.8571 %4&%4& %4&%4& j) Using (h), P (1 ≤ X + Y ≤ 2) = pX+Y (1) + pX+Y (2) =

%8&2 + 2%8&1 ≈ 0.8571

1

3

6.9 a) Using Proposition 5.9, X ∼ G(p). b) Using Proposition 5.13, Y ∼ N B(2, p).

3

159


c) pX,Y (x, y) = p2 (1 − p)y−2 for x, y ∈ N , x < y and pX,Y (x, y) = 0 otherwise. The answer follows from 2 x−1+(y−x−1) P (X = x, Y = y) = P (F ...F S F ...F S) = p (1− p) = p2 (1− p)y−2 , x < y; x, y ∈ N . ' () * ' () * x−1

y−x−1

d) For X, using Proposition 6.2, for x ∈ N , P (X = x) =

"

P (X = x, Y = y) =

y

= p2 (1 − p)x−1

∞ "

p2 (1 − p)y−2

y=x+1

∞ "

(1 − p)y =

y=0

p2 (1 − p)x−1 = p(1 − p)x−1 p

and pX (x) = 0 otherwise, which agrees with (a). For Y , using Proposition 6.2, for y = 2, 3 . . ., P (Y = y) =

"

P (X = x, Y = y) =

x

y−1 "

2

y−2

p (1 − p)

2

y−2

= p (1 − p)

x=1

2

y−2

= (y − 1)p (1 − p)

=

#

y−1 "

1

x=1

$ y−1 2 p (1 − p)y−2 2−1

and pY (y) = 0 otherwise, which agrees with (b). e) Y − X is the number of trials between the first and the second success. f ) Once the first success has occurred, since each individual trial is independent, the number of additional trials needed before the second success will have the same distribution as the number of trials needed before the first success. thus Y − X ∼ G(p) g) For z ∈ N P (Y − X = z) =

"

P (X = x, Y = x + z) =

x

2

z−1

= p (1 − p)

∞ "

2

x+z−2

p (1 − p)

2

z−2

= p (1 − p)

x=1

∞ " (1 − p)x x=1

∞ " p2 (1 − p)z−1 p2 (1 − p)z−1 = = p(1 − p)z−1 (1 − p)x = 1 − (1 − p) p x=0

and P (Y − X = z) = 0 otherwise, which agrees with (f).

6.10 a) The sample space for the head-tail outcomes in three tosses of the coin is: {(H, H, H), (H, H, T ), (H, T, H), (H, T, T ), (T, H, H), (T, H, T ), (T, T, H), (T, T, T )} b) The joint PMF of X and Y is given by: Total winnings after 2nd toss, X −2 0 2 Total winnings after 3rd toss, Y

−3

0.125

0

0

−1

0.125

0.25

0

1

0

0.25

0.125

3

0

0

0.125


160

c) Let E be the event X > Y . Then E = {(−2, −3), (0, −1), (2, 1)}. Using a) and the FPF, P (E) = pX,Y (−2, −3) + pX,Y (0, −1) + pX,Y (2, 1) = 0.5 d) First, we note that X can never equal Y . Thus X > Y or X < Y . Now, once we have performed the first two tosses, the value of X is fixed, and Y has a 50-50 chance of being $1 larger or $1 smaller than X depending on the result of the third coin. Thus P (X > Y ) = P (X < Y ) = 0.5. e) The PMF of X is given as follows: x −2 0 2 pX (x)

1 4

1 2

1 4

f ) X =# of heads −# of tails= Z − (2 − Z) = −2 + 2Z, where Z % is2 the & 1 number of heads in the first two tosses. Since Z ∼ B(2, 0.5), it follows that pX (x) = 1+x/2 4 if x = −2, 0, 2, and pX (x) = 0 otherwise. g) The PMF of Y is given as follows: y −3 −1 1 3 pY (y)

1 8

3 8

3 8

1 8

h) Y = W −(3−W ) = −3+2W , where%W is the & 1 number of heads in the first three tosses. Since 3 W ∼ B(3, 0.5), it follows that pY (y) = (3+y)/2 8 if y = −3, −1, 1, 3, and pY (y) = 0 otherwise. i) pY −X (−1) = pY −X (1) = 0.5 j) pY −X (−1) = pX,Y (−2, −3) + pX,Y (0, −1) + pX,Y (2, 1) = 81 + 14 + 81 = 0.5 pY −X (1) = pX,Y (−2, −1) + pX,Y (0, 1) + pX,Y (2, 3) = 18 + 14 + 81 = 0.5 6.11a) The sample space for the head-tail outcomes in three tosses of the coin is: {(H, H, H), (H, H, T ), (H, T, H), (H, T, T ), (T, H, H), (T, H, T ), (T, T, H), (T, T, T )} b) The joint PMF of X and Y is given by: Total $ after 2nd toss, X −2 0 2 Total $ after

3rd

toss, Y

−3

q3

0

0

−1

pq 2

2pq 2

0

1

0

2p2 q

p2 q

3

0

0

p3

c) Let E be the event X > Y . Then E = {(−2, −3), (0, −1), (2, 1)}. Using a) and the FPF, P (E) = pX,Y (−2, −3) + pX,Y (0, −1) + pX,Y (2, 1) = q 3 + 2pq 2 + p2 q = q(q 2 + 2pq + p2 ) = q(q + p)2 = q(1)2 = q d) Event {X > Y } means that the third toss is a tail. Thus P (X > Y ) = q. e) The PMF of X is given as follows: x −2 0 2 pX (x)

q2

2pq

p2

f ) X = −2 + 2Z where is& the number of heads in the first two tosses. Since Z ∼ B(2, p), it % Z 2 follows that pX (x) = 1+x/2 p1+x/2 q 1−x/2 if x = −2, 0, 2, and pX (x) = 0 otherwise. g) The PMF of Y is given as follows: y −3 −1 1 3 pY (y)

1 8

3 8

3 8

1 8

161


h) Y = −3 + 2W where% W is the number of heads in the first three tosses. Since W ∼ B(3, p), & (3+y)/2 3 it follows that pY (y) = (3+y)/2 p q (3−y)/2 if y = −3, −1, 1, 3, and pY (y) = 0 otherwise. i) pY −X (−1) = q, pY −X (1) = p j) pY −X (−1) = pX,Y (−2, −3) + pX,Y (0, −1) + pX,Y (2, 1) = q 3 + 2pq 2 + p2 q = q(q 2 + 2pq + p2 ) = q(q + p)2 = q(1)2 = q pY −X (1) = pX,Y (−2, −1) + pX,Y (0, 1) + pX,Y (2, 3) = pq 2 + 2p2 q + p3 = p(q 2 + 2pq + p2 ) = p(q + p)2 = p 6.12 a) The sample space for the head-tail outcomes in four tosses of the coin is: {(H, H, H, H), (H, H, H, T ), (H, H, T, H), (H, H, T, T ), (H, T, H, H), (H, T, H, T ), (H, T, T, H), (H, T, T, T ), (T, H, H, H), (T, H, H, T ), (T, H, T, H), (T, H, T, T ), (T, T, H, H), (T, T, H, T ), (T, T, T, H), (T, T, T, T )} b) The joint PMF of X and Y is given by: $ after 3rd toss, X −3 −1 1 3 −4 $ after

4th

toss, Y

−2 0

1 16 1 16 3 16

0

0

0

3 16 3 16

0

0

0

0 1 16 1 16

2

0

0

3 16

4

0

0

0

c) Let E be the event X > Y . Then E = {(−3, −4), (−1, −2), (1, 0), (3, 2)}. Using a) and the FPF, P (E) = pX,Y (−3, −4) + pX,Y (−1, −2) + pX,Y (1, 0) + pX,Y (3, 2) = 0.5 d) Again, first, we note that X can never equal Y . Thus X > Y or X < Y . Now, once we have performed the first three tosses, the value of X is fixed, and Y has a 50-50 chance of being $1 larger or $1 smaller than X depending on the result of the fourth coin. Thus P (X > Y ) = P (X < Y ) = 0.5. e) The PMF of X is given as follows: x −3 −1 1 3 pX (x)

1 8

3 8

3 8

1 8

f ) X = −3 + 2Z where %Z is the& number of heads in the first three tosses. Since Z ∼ B(3, 0.5), 3 1 it follows that pX (x) = (3+x)/2 8 if y = −3, −1, 1, 3, and pY (y) = 0 otherwise. g) The PMF of Y is given as follows: y −4 −2 0 2 4 pY (y)

1 16

1 4

3 8

1 4

1 16

h) Y = −4 + 2W where%W is &the number of heads in the first four tosses. Since W ∼ B(4, 0.5), 4 1 it follows that pY (y) = 2+y/2 16 if y = −4, −2, 0, 2, 4, and pY (y) = 0 otherwise. i) pY −X (−1) = pY −X (1) = 0.5 1 3 3 1 j) pY −X (−1) = pX,Y (−3, −4) + pX,Y (−1, −2) + pX,Y (1, 0) + pX,Y (3, 2) = 16 + 16 + 16 + 16 = 0.5 3 3 1 1 pY −X (−1) = pX,Y (−3, −2) + pX,Y (−1, 0) + pX,Y (1, 2) + pX,Y (3, 4) = 16 + 16 + 16 + 16 = 0.5 6.13 a) Using Proposition 6.2, the marginal PMF of X is: P (X = x) =

" y

P (X = x, Y = y) =

∞ " y=0

e−(λ+µ)

λx µy x!y!


162

∞ e−λ λx " e−µ µy e−λ λx = = x! y! x! y=0 ' () * =1

for x = 0, 1, . . . and P (X = x) = 0 otherwise. Thus X ∼ P(λ). Using Proposition 6.2 and the same argument, the marginal PMF of Y is: P (Y = y) =

"

P (X = x, Y = y) =

x

=

∞ "

e−(λ+µ)

x=0

λx µy x!y!

∞ e−µ µy " e−λ λx e−µ µy = y! x! y! 'x=0 () * =1

for y = 0, 1, . . . (the last sum again equals 1), and P(Y = y) = 0 otherwise. Thus Y ∼ P(µ).
b) The joint PMF is the product of the two marginal PMFs.

6.14 Let p : R^2 → R be defined by p(x, y) = pX(x) pY(y). Since pX and pY are both PMFs, pX(x) ≥ 0 for all x ∈ R and pY(y) ≥ 0 for all y ∈ R, so p(x, y) ≥ 0 for all (x, y) ∈ R^2. For the second property, note that p(x, y) ≠ 0 requires both pX(x) ≠ 0 and pY(y) ≠ 0, so {(x, y) ∈ R^2 : p(x, y) ≠ 0} ⊂ {x ∈ R : pX(x) ≠ 0} × {y ∈ R : pY(y) ≠ 0}. Since the cartesian product of two countable sets is countable, and a subset of a countable set is countable, {(x, y) ∈ R^2 : p(x, y) ≠ 0} is countable. Finally, since pX and pY are both PMFs,

Σ_{(x,y)} p(x, y) = Σ_{(x,y)} pX(x) pY(y) = (Σ_x pX(x)) (Σ_y pY(y)) = 1.

Thus p(x, y) satisfies the three properties and is a joint PMF.

6.15 a) Using Proposition 6.2, the PMF of X is

P(X = x) = Σ_y P(X = x, Y = y) = Σ_y q(x) r(y) = q(x) Σ_y r(y), x ∈ R,

and the PMF of Y is

P(Y = y) = Σ_x P(X = x, Y = y) = Σ_x q(x) r(y) = r(y) Σ_x q(x), y ∈ R.

b) From Proposition 6.1 c), Σ_{(x,y)} pX,Y(x, y) = 1, which implies that

Σ_x Σ_y q(x) r(y) = (Σ_x q(x)) (Σ_y r(y)) = 1.

Thus

pX,Y(x, y) = q(x) r(y) = q(x) r(y) (Σ_x q(x)) (Σ_y r(y)) = (q(x) Σ_y r(y)) (r(y) Σ_x q(x)) = pX(x) pY(y).

c) If Σ_x q(x) = Σ_y r(y) = 1, then q and r would be the marginal PMFs of X and Y, respectively.
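Exercises 6.13-6.15 all turn on the same computation: when a joint PMF factors as q(x)r(y), the marginals are proportional to the factors and the joint PMF equals the product of its marginals. A small numerical sketch, not from the text (λ, µ and the truncation point are illustrative), makes this concrete for the Poisson joint PMF of Exercise 6.13.

    import math

    lam, mu = 2.0, 3.0
    N = 60  # truncation point; the neglected tail mass is negligible here

    def joint(x, y):
        return math.exp(-(lam + mu)) * lam**x * mu**y / (math.factorial(x) * math.factorial(y))

    # Marginal of X obtained by summing out y, compared with the Poisson(lam) PMF.
    for x in range(5):
        marginal = sum(joint(x, y) for y in range(N))
        poisson = math.exp(-lam) * lam**x / math.factorial(x)
        print(x, round(marginal, 10), round(poisson, 10))

    # Pointwise factorization check: joint(x, y) == pX(x) * pY(y), up to rounding.
    pX = lambda x: math.exp(-lam) * lam**x / math.factorial(x)
    pY = lambda y: math.exp(-mu) * mu**y / math.factorial(y)
    print(abs(joint(3, 4) - pX(3) * pY(4)) < 1e-15)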


Theory Exercises

6.16 Clearly, pX,Y(x, y) = P(X = x, Y = y) ≥ 0 for all x, y ∈ R. Next note that pX(x) = 0 implies 0 ≤ pX,Y(x, y) = P({X = x} ∩ {Y = y}) ≤ P(X = x) = pX(x) = 0; thus {(x, y) : pX(x) = 0} ⊂ {(x, y) : pX,Y(x, y) = 0}, and therefore {(x, y) : pX,Y(x, y) ≠ 0} ⊂ {(x, y) : pX(x) ≠ 0}. Similarly one shows {(x, y) : pX,Y(x, y) ≠ 0} ⊂ {(x, y) : pY(y) ≠ 0}. Thus {(x, y) : pX,Y(x, y) ≠ 0} ⊂ {x : pX(x) ≠ 0} × {y : pY(y) ≠ 0}. Since the cartesian product of two countable sets is countable, and a subset of a countable set is countable, {(x, y) : pX,Y(x, y) ≠ 0} is countable. Finally note that, since X, Y are discrete,

1 = P(Ω) = P((X, Y) ∈ {(x, y) : pX,Y(x, y) ≠ 0}) = Σ_{(x,y): pX,Y(x,y)≠0} pX,Y(x, y) = Σ_{(x,y)} pX,Y(x, y).

6.17 Suppose that (a) p(x, y) ≥ 0 for all (x, y) ∈ R^2; (b) {(x, y) ∈ R^2 : p(x, y) ≠ 0} is countable; (c) Σ_{(x,y)} p(x, y) = 1. Let Ω = {(x, y) ∈ R^2 : p(x, y) ≠ 0}. Then Ω is countable and, in view of conditions (a), (c) and Proposition 2.3, there exists a unique probability measure P such that P({(ω1, ω2)}) = p(ω1, ω2) for all (ω1, ω2) ∈ Ω. Define X((ω1, ω2)) = ω1 and Y((ω1, ω2)) = ω2 for all (ω1, ω2) ∈ Ω. Then X and Y are discrete random variables and

pX,Y(x, y) = P({(ω1, ω2) ∈ Ω : X(ω1, ω2) = x and Y(ω1, ω2) = y}) = P({(x, y)}) = p(x, y) if (x, y) ∈ Ω, and = 0 if (x, y) ∉ Ω.

On the other hand, p(x, y) = 0 for all (x, y) ∉ Ω, by definition of Ω. Therefore pX,Y(x, y) = p(x, y) for all (x, y) ∈ R^2.

6.18 By the law of partitioning, {X = x} = ∪_y {X = x, Y = y} and hence P(X = x) = P(∪_y {X = x, Y = y}) = Σ_y P(X = x, Y = y). By the same logic, {Y = y} = ∪_x {X = x, Y = y} and hence P(Y = y) = P(∪_x {X = x, Y = y}) = Σ_x P(X = x, Y = y).

6.19 Note first that

P((X, Y) ∈ {(x, y) : pX,Y(x, y) > 0}) = Σ_{(x,y): pX,Y(x,y)>0} pX,Y(x, y) = Σ_{(x,y)} pX,Y(x, y) = 1,

by Proposition 6.1(b), (c), and the law of partitioning. Therefore, for every set A ⊂ R^2,

0 ≤ P((X, Y) ∈ A ∩ {(x, y) : pX,Y(x, y) = 0}) ≤ P((X, Y) ∈ {(x, y) : pX,Y(x, y) = 0}) = 0.

Therefore, for arbitrary A ⊂ R^2,

P((X, Y) ∈ A) = P((X, Y) ∈ A ∩ {(x, y) : pX,Y(x, y) = 0}) + P((X, Y) ∈ A ∩ {(x, y) : pX,Y(x, y) > 0})
             = 0 + Σ_{(x,y)∈A: pX,Y(x,y)>0} pX,Y(x, y) = Σ_{(x,y)∈A} pX,Y(x, y).


Advanced Exercises

6.20 a) Since the first number chosen is selected at random from the first N positive integers, P(X = x) = 1/N for 1 ≤ x ≤ N and P(X = x) = 0 otherwise. Given that the first number X = x has already been selected, the second number is selected at random from the first x positive integers, so P(Y = y | X = x) = 1/x for integer 1 ≤ y ≤ x and P(Y = y | X = x) = 0 otherwise. Using the general multiplication rule (Proposition 4.2),

P(X = x, Y = y) = P(Y = y | X = x) P(X = x) = 1/(xN) for integer 1 ≤ y ≤ x ≤ N,

and P(X = x, Y = y) = 0 otherwise.
b) Using the law of partitions (Proposition 2.8),

P(Y = y) = Σ_x P(X = x, Y = y) = Σ_{x=y}^{N} P(X = x, Y = y) = (1/N) Σ_{x=y}^{N} 1/x.

Thus the marginal PMF of Y is P(Y = y) = (1/N) Σ_{x=y}^{N} 1/x for 1 ≤ y ≤ N and P(Y = y) = 0 otherwise. To verify that Σ_y P(Y = y) = 1, interchange the order of summation: the term 1/x appears once for every y with 1 ≤ y ≤ x, that is, x times. Hence

Σ_{y=1}^{N} P(Y = y) = (1/N) Σ_{y=1}^{N} Σ_{x=y}^{N} 1/x = (1/N) Σ_{x=1}^{N} Σ_{y=1}^{x} 1/x = (1/N) Σ_{x=1}^{N} x · (1/x) = (1/N) · N = 1.

c) P(X = Y) = Σ_{x=1}^{N} P(X = x, Y = x) = Σ_{x=1}^{N} 1/(xN) = (1/N) Σ_{x=1}^{N} 1/x.
d) Since Σ_{x=1}^{N} 1/x ∼ ln N as N → ∞, we have (1/N) Σ_{x=1}^{N} 1/x ∼ (ln N)/N as N → ∞.
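The asymptotics in (d) are easy to check numerically. The sketch below is illustrative only (the values of N are arbitrary); it computes P(X = Y) = (1/N) Σ 1/x exactly with rational arithmetic and compares it with (ln N)/N.

    from fractions import Fraction
    import math

    for N in (10, 100, 1000):
        p_equal = Fraction(1, N) * sum(Fraction(1, x) for x in range(1, N + 1))
        print(N, float(p_equal), math.log(N) / N)
    # The ratio of the two printed values tends to 1 as N grows.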

6.2 Joint and Marginal Probability Mass Functions: Multivariate Case

Basic Exercises

6.21 2^m − 2. There are a total of 2^m subsets of X1, . . . , Xm, including the set of all of them and the empty set. Excluding those two subsets, there are 2^m − 2 subsets of the random variables for which there are valid marginal PMFs.

6.22 First, we determine the univariate marginal PMFs. For the marginal PMF of X,

pX(x) = Σ_{(y,z)} pX,Y,Z(x, y, z) = Σ_{y=0}^{2} Σ_{z=0}^{2} (x + 2y + z)/63 = (9x + 27)/63 = (x + 3)/7 for x ∈ {0, 1},

and pX(x) = 0 otherwise. The marginal PMF of Y is

pY(y) = Σ_{(x,z)} pX,Y,Z(x, y, z) = Σ_{x=0}^{1} Σ_{z=0}^{2} (x + 2y + z)/63 = (12y + 9)/63 = (4y + 3)/21 for y ∈ {0, 1, 2},

and pY(y) = 0 otherwise. Finally, the marginal PMF of Z is

pZ(z) = Σ_{(x,y)} pX,Y,Z(x, y, z) = Σ_{x=0}^{1} Σ_{y=0}^{2} (x + 2y + z)/63 = (6z + 15)/63 = (2z + 5)/21 for z ∈ {0, 1, 2},

and pZ(z) = 0 otherwise. Next we determine the bivariate marginal PMFs. For the marginal PMF of X and Y,

pX,Y(x, y) = Σ_z pX,Y,Z(x, y, z) = Σ_{z=0}^{2} (x + 2y + z)/63 = (3x + 6y + 3)/63 = (x + 2y + 1)/21 for x ∈ {0, 1}, y ∈ {0, 1, 2},

and pX,Y(x, y) = 0 otherwise. Next, the marginal PMF of X and Z is

pX,Z(x, z) = Σ_y pX,Y,Z(x, y, z) = Σ_{y=0}^{2} (x + 2y + z)/63 = (3x + 3z + 6)/63 = (x + z + 2)/21 for x ∈ {0, 1}, z ∈ {0, 1, 2},

and pX,Z(x, z) = 0 otherwise. Finally, the marginal PMF of Y and Z is

pY,Z(y, z) = Σ_x pX,Y,Z(x, y, z) = Σ_{x=0}^{1} (x + 2y + z)/63 = (4y + 2z + 1)/63 for y, z ∈ {0, 1, 2},

and pY,Z(y, z) = 0 otherwise.

6.23 a) Since X is chosen uniformly from the 10 decimal digits, P(X = x) = 1/10 for x = 0, 1, . . . , 9. Once X is chosen, Y is chosen uniformly from the remaining 9 decimal digits, so P(Y = y | X = x) = 1/9 for y = 0, 1, . . . , 9 with y ≠ x. Once Y is also chosen, Z is chosen uniformly from the remaining 8 decimal digits, so P(Z = z | X = x, Y = y) = 1/8 for z = 0, 1, . . . , 9 with x, y, z all distinct. Thus, using the general multiplication rule, P(X = x, Y = y, Z = z) = P(Z = z | X = x, Y = y) P(Y = y | X = x) P(X = x) = 1/720 for x, y, z ∈ {0, 1, . . . , 9} all distinct, and P(X = x, Y = y, Z = z) = 0 otherwise.
b) Since X, Y and Z must all be different, by the law of partitioning, 1 = P(Ω) = P(X > Y > Z) + P(X > Z > Y) + P(Y > X > Z) + P(Y > Z > X) + P(Z > X > Y) + P(Z > Y > X). By symmetry all six probabilities are equal, so 1 = 6 P(X > Y > Z) and P(X > Y > Z) = 1/6.
c) P(X > Y > Z) = Σ_{x=2}^{9} Σ_{y=1}^{x−1} Σ_{z=0}^{y−1} P(X = x, Y = y, Z = z) = Σ_{x=2}^{9} Σ_{y=1}^{x−1} Σ_{z=0}^{y−1} 1/720 = 120/720 = 1/6.
d) First, we determine the univariate marginal PMFs. For the marginal PMF of X,

pX(x) = Σ_{y,z} P(X = x, Y = y, Z = z) = Σ_{y≠x} Σ_{z∉{x,y}} 1/720 = 72/720 = 1/10 for x = 0, 1, . . . , 9,

and pX(x) = 0 otherwise. The univariate marginal PMFs of Y and Z are computed by an identical argument and have the same form. Next we determine the bivariate marginal PMFs. The marginal PMF of X and Y is

pX,Y(x, y) = Σ_z P(X = x, Y = y, Z = z) = Σ_{z∉{x,y}} 1/720 = 8/720 = 1/90 for x, y = 0, 1, . . . , 9 with x ≠ y,


and pX,Y(x, y) = 0 otherwise. The marginal PMF of X and Z and the marginal PMF of Y and Z are computed by an identical argument and have the same distribution.
e) For the univariate marginal PMFs: since each digit is equally likely by symmetry, each random variable has the uniform distribution over the 10 decimal digits. For the bivariate marginal PMFs, each pair of values of (X, Y), (X, Z), or (Y, Z) is equally likely to occur, so each pair is uniformly distributed over the 90 possible ordered pairs of distinct digits.

6.24 a) Using Proposition 5.13, and noting that all three random variables are negative binomial, the marginal PMF of X is

pX(x) = C(x − 1, 0) p (1 − p)^(x−1) = p (1 − p)^(x−1) for x = 1, 2, . . .

and pX(x) = 0 otherwise. The marginal PMF of Y is

pY(y) = C(y − 1, 1) p^2 (1 − p)^(y−2) = (y − 1) p^2 (1 − p)^(y−2) for y = 2, 3, . . .

and pY(y) = 0 otherwise. The marginal PMF of Z is

pZ(z) = C(z − 1, 2) p^3 (1 − p)^(z−3) for z = 3, 4, . . .

and pZ(z) = 0 otherwise.
b) Since the waiting time between successive successes is geometric, and {X = x, Y = y, Z = z} is the same event as {X = x, Y − X = y − x, Z − Y = z − y}, the joint PMF of X, Y and Z is

pX,Y,Z(x, y, z) = p(1 − p)^(z−y−1) · p(1 − p)^(y−x−1) · p(1 − p)^(x−1) = p^3 (1 − p)^(z−3) for integer 1 ≤ x < y < z.

c) For the bivariate marginal PMFs, first consider X and Y. Again, since {X = x, Y = y} = {X = x, Y − X = y − x}, the marginal PMF of X and Y is

pX,Y(x, y) = p^2 (1 − p)^(y−2) for integer 1 ≤ x < y.

Similarly, since the time needed to wait for the second success has the negative binomial distribution with r = 2, the marginal PMF of Y and Z is

pY,Z(y, z) = (y − 1) p^2 (1 − p)^(y−2) · p(1 − p)^(z−y−1) = (y − 1) p^3 (1 − p)^(z−3) for integer 2 ≤ y < z.

Now, considering X and Z, the waiting time from the first success to the third is negative binomially distributed with r = 2, and since {X = x, Z = z} = {X = x, Z − X = z − x} we have

pX,Z(x, z) = p(1 − p)^(x−1) · (z − x − 1) p^2 (1 − p)^(z−x−2) = (z − x − 1) p^3 (1 − p)^(z−3)


for integer 3 ≤ x + 2 ≤ z.
d) First, for X and Y,

P(X = x, Y = y) = Σ_z P(X = x, Y = y, Z = z) = Σ_{z=y+1}^{∞} p^3 (1 − p)^(z−3) = p^3 (1 − p)^(y−2) Σ_{z=0}^{∞} (1 − p)^z = p^2 (1 − p)^(y−2) = pX,Y(x, y),

for x ∈ N, y = x + 1, . . . . Next, for Y and Z,

P(Y = y, Z = z) = Σ_x P(X = x, Y = y, Z = z) = Σ_{x=1}^{y−1} p^3 (1 − p)^(z−3) = (y − 1) p^3 (1 − p)^(z−3) = pY,Z(y, z),

for y = 2, 3, . . . , z = y + 1, . . . . Finally, for X and Z,

P(X = x, Z = z) = Σ_y P(X = x, Y = y, Z = z) = Σ_{y=x+1}^{z−1} p^3 (1 − p)^(z−3) = (z − x − 1) p^3 (1 − p)^(z−3) = pX,Z(x, z),

for x ∈ N, z = x + 2, . . . .
e) Once we have correctly identified the limits of summation, using Proposition 6.5 we have

P(X = x) = Σ_{(y,z)} P(X = x, Y = y, Z = z) = Σ_{y=x+1}^{∞} Σ_{z=y+1}^{∞} p^3 (1 − p)^(z−3) = p(1 − p)^(x−1) = pX(x) for all x ∈ N,

P(Y = y) = Σ_{(x,z)} P(X = x, Y = y, Z = z) = Σ_{x=1}^{y−1} Σ_{z=y+1}^{∞} p^3 (1 − p)^(z−3) = (y − 1) p^2 (1 − p)^(y−2) = pY(y), y = 2, 3, . . . ,

P(Z = z) = Σ_{(x,y)} P(X = x, Y = y, Z = z) = Σ_{x=1}^{z−2} Σ_{y=x+1}^{z−1} p^3 (1 − p)^(z−3) = C(z − 1, 2) p^3 (1 − p)^(z−3) = pZ(z), z = 3, 4, . . . .
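The joint PMF p^3 (1 − p)^(z−3) and its marginals can be sanity-checked by simulating the Bernoulli process directly. The following sketch is illustrative only (p and the number of repetitions are arbitrary); it estimates pX,Y,Z(1, 3, 5) and compares it with p^3 (1 − p)^2.

    import random

    p, trials = 0.3, 200_000
    hits = 0
    for _ in range(trials):
        successes, toss = [], 0
        while len(successes) < 3:              # record the times of the first three successes
            toss += 1
            if random.random() < p:
                successes.append(toss)
        if successes == [1, 3, 5]:
            hits += 1

    print(hits / trials, p**3 * (1 - p)**2)    # the two numbers should be close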

6.25 a) The random variables X, Y, Z have the multinomial distribution with parameters n = 3 and p1 = 1/4, p2 = 1/4, and p3 = 1/2. Thus the joint PMF of X, Y, Z is

pX,Y,Z(x, y, z) = [3!/(x! y! z!)] (1/4)^x (1/4)^y (1/2)^z


if x, y, and z are nonnegative integers whose sum is 3, and pX,Y,Z(x, y, z) = 0 otherwise.
b) P(X > Y > Z) = pX,Y,Z(2, 1, 0) = 3/64, which is the probability that more Greens are chosen than Democrats and more Democrats are chosen than Republicans.
P(X ≤ Y < Z) = pX,Y,Z(0, 0, 3) + pX,Y,Z(0, 1, 2) = 5/16, which represents the probability that at least as many Democrats are chosen as Greens, and that more Republicans are chosen than either of the other groups.
P(X ≤ Z ≤ Y) = pX,Y,Z(0, 3, 0) + pX,Y,Z(0, 2, 1) + pX,Y,Z(1, 1, 1) = 19/64, which represents the probability that there are at least as many Republicans as Greens chosen, and at least as many Democrats as Republicans chosen.
c) Since each selection is independent of the other selections, X ∼ B(3, 1/4), Y ∼ B(3, 1/4), and Z ∼ B(3, 1/2).
d) If two of the three parties are grouped and considered as one new group, we have X + Y ∼ B(3, 1/2), X + Z ∼ B(3, 3/4) and Y + Z ∼ B(3, 3/4).

e) X + Y and Z have the multinomial distribution with n = 3, p1 = 1/2 and p2 = 1/2. Thus the joint PMF of X + Y and Z is

pX+Y,Z(u, v) = C(3, u) (1/8) if u = 0, 1, 2, 3 and v = 3 − u,

and pX+Y,Z(u, v) = 0 otherwise.
f) Using the FPF,

P(1 ≤ X + Z ≤ 2) = pX,Y,Z(2, 1, 0) + pX,Y,Z(0, 2, 1) + pX,Y,Z(1, 2, 0) + pX,Y,Z(0, 1, 2) + pX,Y,Z(1, 1, 1) = 9/16.

g) Using (d), P(1 ≤ X + Z ≤ 2) = P(X + Z = 1) + P(X + Z = 2) = 9/16.

#

5 x1 , x2 , x3 , x4

$#

6 40

$x 1 #

15 40

$x 2 #

12 40

$x 3 #

if x1 , x2 , x3 , and x4 are nonegative integers whose sum is 5 and otherwise pX1 ,X2 ,X3 ,X4 (x1 , x2 , x3 , x4 ) = 0. b) 0.0554, since P (X1 = 0, X2 = 3, X3 = 1, X4 = 1) = pX1 ,X2 ,X3 ,X4 (0, 3, 1, 1)

7 40

$x 4

6.2 Joint and Marginal Probability Mass Functions: Multivariate Case

170

% 5 & % 6 &0 % 15 &3 % 12 &1 % 7 &1 ≈ 0.0554 = 0,3,1,1 40 40 40 40 c) P (X2 ≥ 3) = pX1 ,X2 ,X3 ,X4 (2, 3, 0, 0) + pX1 ,X2 ,X3 ,X4 (1, 3, 1, 0) + pX1 ,X2 ,X3 ,X4 (1, 3, 0, 1) + pX1 ,X2 ,X3 ,X4 (0, 3, 2, 0) + pX1 ,X2 ,X3 ,X4 (0, 3, 1, 1) + pX1 ,X2 ,X3 ,X4 (0, 3, 0, 2) + pX1 ,X2 ,X3 ,X4 (1, 4, 0, 0) + pX1 ,X2 ,X3 ,X4 (0, 4, 1, 0) + pX1 ,X2 ,X3 ,X4 (0, 4, 0, 1) +pX1 ,X2 ,X3 ,X4 (0, 5, 0, 0) = 0.0119+0.0475+0.0277+0.0475+0.0554+0.0161+0.0148+0.0297+ 0.0173 + 0.0074 ≈ 0.2752 d) P (X2 ≥ 3, X3 ≥ 1) = pX1 ,X2 ,X3 ,X4 (1, 3, 1, 0)+pX1 ,X2 ,X3 ,X4 (0, 3, 2, 0)+pX1 ,X2 ,X3 ,X4 (0, 3, 1, 1)+ pX1 ,X2 ,X3 ,X4 (0, 4, 1, 0) = 0.0475 + 0.0475 + 0.0554 + 0.0297 ≈ 0.1325 6.27 a) Let Xi be the number of times the ith face appears in 18 tosses. Then X1 , X2 , X3 , X4 , X5 , X6 have the multinomial distribution with %n = 18 and & 1 p1 = p2 = p3 = p4 = 18 ≈ 0.00135 p5 = p6 = 61 . Thus P (X1 = X2 = X3 = X4 = X5 = X6 = 3) = 3,3,3,3,3,3 618 b) 0.00135, since {Xi ≤ 3 for all i = 1, . . . , 6, X1 +. . .+X6 = 18} = {Xi = 3 for all i = 1, . . . , 6}. c) 0.00135, since {Xi ≥ 3 for all i = 1, . . . , 6, X1 +. . .+X6 = 18} = {Xi = 3 for all i = 1, . . . , 6}. 6.28 0.1136; Let X1 be the number of good items, X2 be the number of salvageable items, and X3 be the number of scrapped items. Then the random variables X1 , X2 , X3 have the multinomial distribution with parameters n = 50 and p1 = 0.97, p2 = 0.03 × 32 = 0.02, and p3 = 0.03 × 31 = 0.01. Thus # $ 50 pX1 ,X2 ,X3 (48, 1, 1) = (0.97)48 (0.02)1 (0.01)1 ≈ 0.1136 48, 1, 1 6.29 a) X2 ∼ B(n, p2 ) because when concerned with only the value of X2 , the other 4 variables can be grouped into 1 new larger variable, and thus we have a multinomial distribution with only two variables. b) X3 + X5 ∼ B(n, p3 + p5 ) because after grouping X3 and X5 , the other 3 variables can be grouped as well, and again result in a Binomial distribution. c) X1 + X4 + X5 ∼ B(n, p1 + p4 + p5 ) using the same reasoning as above. d) X1 +X2 and X3 +X4 +X5 have the multinomial distribtion with n and p1 +p2 , and p3 +p4 +p5 using the same regrouping argument as above. e) X1 , X2 + X3 , and X4 + X5 have the multinomial distribution with n and p1 , p2 + p3 and p4 + p5 using the same regrouping argument as above. f ) X1 , X2 + X3 + X4 , and X5 have the multinomial distribution with n and p1 , p2 + p3 + p4 and p5 using the same regrouping argument as above. 6.30 a) By the binomial theorem, $ n−x n−x "# " n n! 1 1 y n−x−y q r = q y r n−x−y n−x n−x (q + r) x, y, n − x − y (q + r) x!y!(n − x − y)! y=0

y=0

=

n−x " n! (n − x)! 1 · q y r n−x−y n−x (q + r) x!(n − x)! y!(n − x − y)! y=0

# $ n−x # $ # $ n " n − x y n−x−y 1 n = q r = . n−x (q + r) x y x y=0 ' () * =(q+r)n−x

171

CHAPTER 6. Jointly Discrete Random Variables

b) Proving the required equality is equivalent to showing that the following holds: # $ $ n−x "# n n x n−x (1 − (q + r)) (q + r) = (1 − (q + r))x q y r n−x−y x x, y, n − x − y y=0

which clearly is the equation used to obtain the marginal PMF of X from a multinomial distribution by using Proposition 6.5. 6.31 If pi is the proportion of people in the population %N pi &of size N with attribute i, then N pi is the total number of people with attribute i. Then xi is the number of ways that xi people % & % & can be chosen with attribte i from the population. Thus, Nxp11 ×· · · × Nxpmm is the total number of ways to choose x1 , . . . xm people with attributes a1 , . . . , am respectively. If n = x1 + . . . + xn , %N & then since n is the total number of ways to pick n people from a population of N people, we have: %N p 1 & %N p m & ··· pX1 ,X2 ,...,Xm (x1 , x2 , . . . , xm ) = x1 %N & xm n

6.32 a) X, Y, Z have a multiple hypergeometric distribution with N = 8, n = 3, pX = 1/4, pY = 1/4 and pZ = 1/2.
b) P(X > Y > Z) = pX,Y,Z(2, 1, 0) = 1/28, which is the probability that more Greens are chosen than either Democrats or Republicans, and more Democrats are chosen than Republicans.
P(X ≤ Y < Z) = pX,Y,Z(0, 0, 3) + pX,Y,Z(0, 1, 2) = 2/7, which is the probability that at least as many Democrats are chosen as Greens, and that more Republicans are chosen than either other group.
P(X ≤ Z ≤ Y) = pX,Y,Z(0, 2, 1) + pX,Y,Z(1, 1, 1) = 5/14, which is the probability that there are at least as many Republicans as Greens chosen, and at least as many Democrats as Republicans chosen.
c) P(X = x) = C(2, x) C(6, 3 − x) / C(8, 3) for x = 0, 1, 2, and P(X = x) = 0 otherwise.
P(Y = y) = C(2, y) C(6, 3 − y) / C(8, 3) for y = 0, 1, 2, and P(Y = y) = 0 otherwise.
P(Z = z) = C(4, z) C(4, 3 − z) / C(8, 3) for z = 0, 1, 2, 3, and P(Z = z) = 0 otherwise.
d) P(X + Y = u) = C(4, u) C(4, 3 − u) / C(8, 3) for u = 0, 1, 2, 3, and P(X + Y = u) = 0 otherwise.
P(X + Z = v) = C(6, v) C(2, 3 − v) / C(8, 3) for v = 0, 1, 2, 3, and P(X + Z = v) = 0 otherwise.
P(Y + Z = w) = C(6, w) C(2, 3 − w) / C(8, 3) for w = 0, 1, 2, 3, and P(Y + Z = w) = 0 otherwise.
e) X + Y and Z have a multiple hypergeometric distribution with N = 8, n = 3, p1 = 1/2, and p2 = 1/2.
f) Using the FPF, P(1 ≤ X + Z ≤ 2) = pX,Y,Z(2, 1, 0) + pX,Y,Z(0, 2, 1) + pX,Y,Z(1, 2, 0) + pX,Y,Z(0, 1, 2) + pX,Y,Z(1, 1, 1) = 9/14.
g) Using (d), P(1 ≤ X + Z ≤ 2) = P(X + Z = 1) + P(X + Z = 2) = 9/14.
h) Since three subjects must be chosen, pX+Y+Z(3) = 1 and pX+Y+Z(u) = 0 otherwise.
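Because the committee is a uniformly random 3-subset of 8 people (2 Greens, 2 Democrats, 4 Republicans), each of the multiple hypergeometric probabilities above can be verified by enumerating all C(8, 3) = 56 committees. A brief sketch, not part of the original solution:

    from itertools import combinations
    from fractions import Fraction

    people = ["G"] * 2 + ["D"] * 2 + ["R"] * 4        # 2 Greens, 2 Democrats, 4 Republicans
    committees = list(combinations(range(8), 3))       # all 56 equally likely committees

    def prob(event):
        hits = sum(1 for c in committees if event([people[i] for i in c]))
        return Fraction(hits, len(committees))

    print(prob(lambda m: m.count("G") > m.count("D") > m.count("R")))   # 1/28
    print(prob(lambda m: 1 <= m.count("G") + m.count("R") <= 2))        # 9/14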

6.33 a) X1 , X2 , X3 , X4 have a multiple hypergeometric distribution with N = 40, n = 5, 6 12 7 p1 = 40 , p2 = 15 40 , p3 = 40 , and p4 = 40 . b) P (X1 = 0, X2 = 3, X3 = 1, X4 = 1) = pX1 ,X2 ,X3 ,X4 (0, 3, 1, 1) ≈ 0.0581 c) P (X2 ≥ 3) = pX1 ,X2 ,X3 ,X4 (2, 3, 0, 0) + pX1 ,X2 ,X3 ,X4 (1, 3, 1, 0) + pX1 ,X2 ,X3 ,X4 (1, 3, 0, 1) + pX1 ,X2 ,X3 ,X4 (0, 3, 2, 0) + pX1 ,X2 ,X3 ,X4 (0, 3, 1, 1) + pX1 ,X2 ,X3 ,X4 (0, 3, 0, 2) + pX1 ,X2 ,X3 ,X4 (1, 4, 0, 0) + pX1 ,X2 ,X3 ,X4 (0, 4, 1, 0) + pX1 ,X2 ,X3 ,X4 (0, 4, 0, 1) + pX1 ,X2 ,X3 ,X4 (0, 5, 0, 0) ≈ 0.264 d) P (X2 ≥ 3, X3 ≥ 1) = pX1 ,X2 ,X3 ,X4 (1, 3, 1, 0)+pX1 ,X2 ,X3 ,X4 (0, 3, 2, 0)+pX1 ,X2 ,X3 ,X4 (0, 3, 1, 1)+ pX1 ,X2 ,X3 ,X4 (0, 4, 1, 0) ≈ 0.178 6.34 a) When sampling with replacement, the distribution is multinomial and thus: ## $ # $x1 # $x2 # $x3 $ 11 1 6 5 pX1 ,X2 ,X3 (x1 , x2 , x3 ) = 0.4 14 28 4 x1 , x2 , x3 ## $ # $x 1 # $x 2 # $x 3 $ 6 1 5 1 +0.6 , x1 , x2 , x3 4 12 3 for all non-negative integers x1 , x2 , x3 whose sum is 6. b) When sampling without replacement, the distribution is multiple hypergeometric and thus: + % 9 &%15&%12& , + %10 &%11&% 7 & , pX1 ,X2 ,X3 (x1 , x2 , x3 ) = 0.4

x1

%x282&

x3

+ 0.6

6

for all non-negative integers x1 , x2 , x3 such that x1 + x2 + x3 = 6.

x1

%x362& 6

x3

,


Theory Exercises 6.35 Clearly, pX1 ,...,Xm (x1 , . . . , xm ) = P (X1 = x1 , · · · , Xm = xm ) ≥ 0 for all x1 , . . . , xm ∈ R. Next, similarly to Exercise 6.16, note that {(x1 , · · · , xm ) : pXi (xi ) = 0} ⊂ {(x1 , · · · , xm ) : pX1 ,...,Xm (x1 , · · · , xm ) = 0} for all i = 1, . . . , m, implying that {(x1 , ···, xm ) : pXi (xi ) ̸= 0} ⊃ {(x1 , ···, xm ) : pX1 ,...,Xm (x1 , ···, xm ) ̸= 0} for all i = 1, ..., m. Therefore, {(x1 , ···, xm ) : pX1 ,...,Xm (x1 , ···, xm ) ̸= 0} ⊂

m .

{(x1 , ···, xm ) : pXi (xi ) ̸= 0}.

i=1

/ Note that m i=1 {(x1 , ···, xm ) : pXi (xi ) ̸= 0} = {x1 : pX1 (x1 ) ̸= 0} × . . . × {xm : pXm (xm ) ̸= 0}, which is a cartesian product of a finite collection of countable sets, thus, is itself countable. Therefore, {(x1 , ···, xm ) : pX1 ,...,Xm (x1 , ···, xm ) ̸= 0}, as a subset of a countable set, is countable. Finally note that, since the random variables are discrete, we have that

=

1 = P (Ω) = P ((X1 , ···, Xm ) ∈ {(x1 , ..., xm ) : pX1 ,...,Xm (x1 , ..., xm ) ̸= 0}) " " pX1 ,...,Xm (x1 , ..., xm ) = pX1 ,...,Xm (x1 , ..., xm ).

(x1 ,...,xm ): pX1 ,...,Xm (x1 ,...,xm )̸=0

(x1 ,...,xm )

6.36 Suppose that (a) p(x1 , . . . , xm ) ! ≥ 0 for all x1 , . . . , xm ∈ Rm ; (b) {(x1 , . . . , xm ) ∈ Rm : p(x1 , . . . , xm ) ̸= 0} is countable; (c) (x1 ,...,xm ) p(x1 , . . . , xm ) = 1. Let Ω = {(x1 , . . . , xm ) ∈ Rm : p(x1 , . . . , xm ) ̸= 0}. Then Ω is countable and, in view of conditions (a),(c) and Proposition 2.3, there exists a unique probability measure P such that P ({(ω1 , ..., ωm )}) = p(ω1 , ..., ωm ) for all (ω1 , . . . , ωm ) ∈ Ω. Define Xi ((ω1 , . . . , ωm )) = ωi for all i and for all (ω1 , . . . , ωm ) ∈ Ω. Then X1 , . . . , Xm are discrete random variables and pX1 ,...,Xm (x1 , . . . , xm ) = P ({(ω1 , . . . , ωm ) ∈ Ω : Xi ((ω1 , . . . , ωm )) = xi for all i}) p(x1 , . . . , xm ), if (x1 , . . . , xm ) ∈ Ω, = P ({(x1 , . . . , xm )}) = 0, if (x1 , . . . , xm ) ∈ / Ω. On the other hand, note that p(x1 , . . . , xm ) = 0 for all (x1 , . . . , xm ) ∈ / Ω, by definition of Ω. Therefore, pX1 ,...,Xm (x1 , . . . , xm ) = p(x1 , . . . , xm ) for all (x1 , . . . , xm ) ∈ Rm . 6.37 Similarly to Exercise 6.18, the required result follows at once by the law of partitioning and since for every integer-valued 1 ≤ k1 < · · · < kj ≤ m (with j ∈ {1, · · · , m}), 2 0 1 Xk1 = xk1 , · · · , Xkj = xkj = {X1 = y1 , . . . , Xm = ym }. (y1 ,...,ym ): yki =xki for all i=1,...,j

6.38 Since the set where pX1 ,...,Xm (x1 , . . . , xm ) > 0 must be countable, let us consider A in Proposition 6.6 to be countable. If it wasn’t, we could partition it into a set where for every element in the set, pX1 ,...,Xm (x, . . . , xm ) = 0 (which we could then disregard) and a countable set where for every element in that set, pX1 ,...,Xm (x1 , . . . , xm ) > 0. Now, using the law of partitions, we can partition A into countably many subsets, one subset for every element of the set, and


then the desired result is an immediate consequence. 6.39 For k = 1, · · · , m and i ∈ {1, · · · , n}, let Ni = k whenever Ek is the ith event to occur. Moreover, for arbitrary given non-negative integers x1 , ···, xm with x1 + ··· + xm = n, define K = {(k1 , ..., kn ) ∈ {1, ..., m}n : x1 of all the %kj ′ s are &equal to 1, etc., xm of all the kj ′ s are equal to m}. Then the cardinality of K is equal to x1 ,···n,xm and P (X1 = x1 , · · · , Xm = xm ) =

"

P (N1 = k1 , · · · , Nn = kn )

(k1 ,··· ,kn )∈K

=

"

pk1 . . . pkn =

(k1 ,··· ,kn )∈K

"

px1 1

. . . pxmm

=

(k1 ,··· ,kn )∈K

#

$ n px1 . . . pxmm . x1 , · · · , xm 1

Advanced Exercises 6.40 a) If we have that there are n items to be distributed to m places, then knowing x1 + . . . + xm−1 of them have been distributed to X1 , . . . , Xm−1 is the same as knowing that x1 +. . .+xm−1 of them have been distributed to X1 , . . . , Xm−1 and that n − (x1 + . . . + xm−1 of them go to Xm since logically we must have a place for every item. Therefore, {X1 = x1 , . . . , Xm−1 = xm−1 } = {X1 = x1 , . . . , Xm−1 = xm−1 , Xm = xm } b) Using Proposition 3.5, the result follows % n immediately. & x1 % & n−x1 = n px1 (1 − p )n−x1 which is p (1 − p ) c) When m = 2, we have pX1 (x1 ) = x1 ,n−x 1 1 1 x1 1 1 clearly the form for the binomial distribution. d) Consider the marginal PMF for Xj1 , . . . , Xjk for any 1 ≤ k ≤ m. We would have by grouping, {Xj1 = xj1 , . . . , Xjk = xjk } = {Xj1 = xj1 , . . . , Xjk = xjk , Xjk+1 +. . .+Xjm = n−(xj1 +. . .+xjk )} Thus, using Proposition 3.5, we have # $ n pXj1 ,...,Xjk (xj1 , . . . , xjk ) = xj1 , . . . , xjk , n − (xj1 + . . . + xjk ) xj

xj

×pj1 1 · · · pjk k (1 − (pj1 + . . . + pjk ))n−(xj1 +...+xjk ) which is clearly in the same form as the PMF given in b). 6.41 a) For non-negative x1 , . . . , xn such that x1 + . . . + xn = n %N p 1 & x1

% & # $ · · · Nxpmm n 1 = (N p1 )x1 · · · (N pm )xm · %N & x1 , . . . , xm (N )n n

Note that for all 1 ≤ i ≤ n,

xi − 1 xi − 2 (N pi )xi = (N pi − xi + 1)(N pi − xi + 2) . . . (N pi ) = N xi (pi − )(pi − ) . . . pi . Thus, N N 1 (N p1 )x1 · · · (N pm )xm −→ px1 1 · · · pxmm , as N → ∞. Note also that Nn n−2 n−1 )n )(1 − ) . . . 1, which implies that (N (N )n = N n (1 − n → 1 as N → ∞. Therefore, N N N (N p1 )x1 (N pm )xm −→ px1 1 · · · pxmm , (N )n


as N → ∞, and the required result follows. b) When N is large relative to n, the change in the probabilities from successive removals changes so minutely as to not affect the overall distribution. 6.42 a) Let N be the total population of U.S. residents. Then the exact distribution of X1 , X2 , X3 , X4 is multiple hypergeometric and thus: %N ∗0.19&%N ∗0.231&%N ∗0.355&%N ∗0.224& PX1 ,X2 ,X3 ,X4 (x1 , x2 , x3 , x4 ) =

x1

x2

%N &

x3

x4

n

b) When N is large relative to n, we can approximate the multiple hypergeometric with the multinomial since the difference in corresponding probabilities when sampling is done with and without replacement are negligible. Thus the approximate distribution is: # $ n PX1 ,X2 ,X3 ,X4 (x1 , x2 , x3 , x4 ) ≈ (0.19)x1 (0.231)x2 (0.355)x3 (0.244)x4 . x1 , x2 , x3 , x4 c) When n = 5, PX1 ,X2 ,X3 ,X4 (1, 1, 2, 1) ≈ 0.08098 6.43 Assuming that the politician is running for an office important enough to have her have so many more constituents than 40, we can use the multinomial approximation to the multiple hypergeometric. This being done, were her claim to be true, we would have expected to see 26 people who favor her issue and 12 who don’t. Even if we assume that every one of the “undecided” people sampled eventually sides with her, it would still seem that she is overestimating her popularity on the issue, and it would seem prudent for her opponents to use this to their advantage. Some may note however, that either way, she still has a majority of the people on her side, and thus this may not be the issue of choice for her opponents.

6.3 Conditional Probability Mass Functions

Basic Exercises

6.44 a) The conditional PMFs of X (bedrooms) given Y = y (bathrooms), together with the marginal PMF of X, are:

                      Bedrooms, X
  Bathrooms, Y     2        3         4         Total
  2                3/19     14/19     2/19      1.00
  3                0        12/23     11/23     1.00
  4                0        2/7       5/7       1.00
  5                0        0         1         1.00
  pX(x)            3/50     14/25     19/50     1.00

b) P(X = 3 | Y = 2) + P(X = 4 | Y = 2) = 14/19 + 2/19 = 16/19 ≈ 0.8421.
c) About 84.21% of all 2-bathroom houses have at least 3 bedrooms.
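Conditional PMFs like those in the table are obtained simply by dividing each row of a joint frequency table by its row total. The sketch below uses a 50-house count table that is consistent with the conditional and marginal values above; the counts are reconstructed for illustration and may differ from the textbook's actual survey data.

    from fractions import Fraction

    # Rows: bathrooms y; columns: bedrooms x. Illustrative counts only (50 houses total).
    counts = {2: {2: 3, 3: 14, 4: 2},
              3: {2: 0, 3: 12, 4: 11},
              4: {2: 0, 3: 2, 4: 5},
              5: {2: 0, 3: 0, 4: 1}}

    total = sum(sum(row.values()) for row in counts.values())

    # Conditional PMF of X given Y = y: divide each row by its row total.
    for y, row in counts.items():
        row_total = sum(row.values())
        print(y, {x: Fraction(n, row_total) for x, n in row.items()})

    # Marginal PMF of X: column sums divided by the grand total.
    print({x: Fraction(sum(row[x] for row in counts.values()), total) for x in (2, 3, 4)})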

6.45 a) For each number of siblings X, the conditional PMF of the number of sisters Y is given in the table in (b) and is computed as P(Y = y | X = x) = P(X = x, Y = y) / P(X = x).
b)

                        Sisters, Y
  Siblings, X     0        1        2        3        Total
  0               1        0        0        0        1.00
  1               0.471    0.529    0        0        1.00
  2               0.091    0.727    0.182    0        1.00
  3               0        0.333    0.667    0        1.00
  4               0        0        0        1        1.00
  pY(y)           0.425    0.45     0.1      0.025    1.00

c) P(Y ≥ 1 | X = 2) = 10/11 ≈ 0.909.
d) For each number of sisters Y, the conditional PMF of the number of siblings X is given in the table in (e) and is computed as P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y).
e)

                        Siblings, X
  Sisters, Y      0        1        2        3        4        Total
  0               0.471    0.471    0.059    0        0        1.0
  1               0        0.5      0.444    0.056    0        1.0
  2               0        0        0.5      0.5      0        1.0
  3               0        0        0        0        1        1.0
  pX(x)           0.2      0.425    0.275    0.075    0.025    1.0

f) P(X ≥ 2 | Y = 1) = 0.5.

6.46 a) Given arbitrary x ∈ {0, 1, · · · , 9},

P(Y = y | X = x) = 1/9 for y ∈ {0, 1, · · · , 9} \ {x},

which shows that, given X = x, Y is uniformly distributed on the remaining 9 decimal digits.
b) P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = (1/90) / Σ_{k∈{0,···,9}\{y}} P(X = k, Y = y) = (1/90) / (9 · 1/90) = 1/9 for x ∈ {0, 1, · · · , 9} \ {y}.
c) P(3 ≤ X ≤ 4 | Y = 2) = P(X = 3 | Y = 2) + P(X = 4 | Y = 2) = 2/9.

6.47 a) The conditional PMFs of IF given IE = x, together with the marginal PMF of IF, are:

                   IF
  IE          0                1              Total
  0           P(F^c | E^c)     P(F | E^c)     1
  1           P(F^c | E)       P(F | E)       1
  pIF(x)      P(F^c)           P(F)           1

b) The result that guarantees that the values in each row of the above table sum to 1 is Proposition 4.1.

6.48 a) The required table for the conditional PMFs of Y given X = x and the marginal PMF of Y is:

                           Larger value, Y
  Smaller value, X    1       2       3       4       5       6        Total
  1                   1/11    2/11    2/11    2/11    2/11    2/11     1
  2                   0       1/9     2/9     2/9     2/9     2/9      1
  3                   0       0       1/7     2/7     2/7     2/7      1
  4                   0       0       0       1/5     2/5     2/5      1
  5                   0       0       0       0       1/3     2/3      1
  6                   0       0       0       0       0       1        1
  pY(y)               1/36    3/36    5/36    7/36    9/36    11/36    1

b) The required table for the conditional PMFs of X given Y = y and the marginal PMF of X is:

                           Smaller value, X
  Larger value, Y     1        2       3       4       5       6       Total
  1                   1        0       0       0       0       0       1
  2                   2/3      1/3     0       0       0       0       1
  3                   2/5      2/5     1/5     0       0       0       1
  4                   2/7      2/7     2/7     1/7     0       0       1
  5                   2/9      2/9     2/9     2/9     1/9     0       1
  6                   2/11     2/11    2/11    2/11    2/11    1/11    1
  pX(x)               11/36    9/36    7/36    5/36    3/36    1/36    1
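Both tables follow from the joint PMF of the smaller and larger of two fair dice, and can be rebuilt from the 36 equally likely ordered outcomes. A short sketch, not part of the original solution:

    from fractions import Fraction
    from itertools import product
    from collections import defaultdict

    # Joint PMF of (smaller value, larger value) when two fair dice are rolled.
    joint = defaultdict(Fraction)
    for a, b in product(range(1, 7), repeat=2):
        joint[(min(a, b), max(a, b))] += Fraction(1, 36)

    # Conditional PMF of Y given X = 1, matching the first row of table (a).
    pX1 = sum(p for (x, y), p in joint.items() if x == 1)
    print({y: joint[(1, y)] / pX1 for y in range(1, 7)})   # 1/11, 2/11, ..., 2/11

    # Marginal PMF of Y, matching the bottom row of table (a): (2y - 1)/36.
    print({y: sum(p for (x, yy), p in joint.items() if yy == y) for y in range(1, 7)})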

6.49 a) The conditional PMF of Y given X = x (x ∈ N ) is equal to P (Y = y|X = x) = p(1 − p)y−x−1 for y = x + 1, x + 2, . . . and P (Y = y|X = x) = 0 otherwise, since by Exercise 6.9 the number of trias needed after the first success to get the second success has geometric distribution with parameter p. =y) b) For each x ∈ N , P (Y = y|X = x) = P (X=x,Y = p(1 − p)y−x−1 for y = x + 1, · · · . P (X=x) 1 c) The conditional PMF of X given Y = y (y ∈ {2, 3, · · · }) is equal to P (X = x|Y = y) = y−1 for x = 1, . . . , y − 1 and P (X = x|Y = y) = 0 otherwise, since once we know when the second success must occur, any of the previous (y − 1) trials could have been the one where the first success occured, and thus is uniformly distributed. (1 − p)y−2 p2 1 for y ∈ {2, 3, · · · } and x ∈ {1, · · · , y − 1}. d) P (X = x|Y = y) = %y−1& = 2 y−2 y−1 1 p (1 − p) e) The conditional PMF of Y − X given X = x equals P (Y − X = z|X = x) = p(1 − p)z−1 for z = 1, . . . and P (Y − X = x|X = x) = 0 otherwise, since the occurence of the first success is independent of the number of trials between the first and second success, and thus as shown before, the number of trials needed between successess is geometric with parameter p. f ) P (Y − X = z|X = x) = P (Y = x + z|X = x) = p(1 − p)z−1 , by (b). 6.50 a) As shown in Exercise 6.13, P (X = x) = P (Y = y|X = x) =

e−λ λx x! ,

x ∈ Z+ , thus

e−(λ+µ) λx µy /(x!y!) e−µ µy P (X = x, Y = y) = = , y = 0, 1, . . . , P (X = x) e−λ λx /x! y!


thus, Y |X = x has P(µ) distribution. b) As shown in Exercise 6.13, P (Y = y) = P (X = x|Y = y) =

e−µ µy y! ,


y ∈ Z+ , thus

e−(λ+µ) λx µy /(x!y!) e−λ λx P (X = x, Y = y) = = , x = 0, 1, . . . , P (Y = y) e−µ µy /y! x!

thus, X|Y = y has P(λ) distribution. c) The conditional distribution of Y given X is the same as the marginal distribution of Y . d) The conditional distribution of X given Y is the same as the marginal distribution of X. pX,Y (x,y) pX (x)pY (y) = pY (y) pX (x) = pX (x) pX,Y (x,y) pX (x)pY (y) = pY (y) = pX (x) pY (y)

6.51 a) pY |X (y|x) = b) pX|Y (x|y) =

6.52 a) From Exercise 6.22, we have that pY (y) = pX,Z|Y (x, z|y) =

4y+3 21 ,

and thus

pX,Y,Z (x, y, z) x + 2y + z = pY (y) 12y + 9

Therefore we have pX,Z|Y (x, z|1) =

x+z+2 21

b) Given that Y = 1, the probability that all the family’s automobile-accident losses will be reimbursed is: P (X + Y + Z ≤ 2|Y = 1) = pX,Z|Y (0, 0|1) + pX,Z|Y (1, 0|1) + pX,Z|Y (0, 1|1) = c) From Exercise 6.22, we have that pX,Y (x, y) = pZ|X,Y (z|x, y) =

x+2y+1 , 21

2 3 3 8 + + = . 21 21 21 21

and thus

pX,Y,Z (x, y, z) x + 2y + z = pX,Y (x, y) 3x + 6y + 3

Therefore we have pZ|X,Y (z|0, 1) =

z+2 9

d) Given that X = 0 and Y = 1, the probability that all the family’s automobile-accident losses will be reimbursed is: P (X + Y + Z ≤ 2|X = 0, Y = 1) = pZ|X,Y (0, 1, 0) + pZ|X,Y (0, 1, 1) =

5 9

6.53 a) For x ∈ {0, · · · , n}, % n & x y z # $y # $z pX,Y,Z (x, y, z) q r (n − x)! x,y,z p q r pY,Z|X (y, z|x) = = %n& x = y+z pX (x) y!z! 1−p 1−p x p (q + r) $y # $z # $# r n−x q , = 1−p 1−p y, z



where y, z are nonnegative integers whose sum is n − x. Thus Y, Z|X = x has a multinomial distribution with parameters n − x and q/(1 − p), r/(1 − p). b) Since we must have that X + Y + Z = n, then if X = x and Y = y, then we must have Z = n − x − y. Hence 1 if z = n − x − y pZ|X,Y (z|x, y) = 0 otherwise 6.54 a) pX1 ,...,Xm (x1 , . . . , xm ) pX1 ,...,Xm−1 |Xm (x1 , . . . , xm−1 |xm ) = = pXm (xm )

%

& x n 1 × · · · × p xm m x1 ,...,xm p1 %n& x n−x m m (1 − pm ) xm pm

# # $x1 $ (n − xm )! p1 pm−1 xm−1 = × ··· × x1 ! · · · xm−1 ! 1 − pm 1 − pm # # $# $x 1 $ pm−1 xm−1 n − xm p1 × ··· × , = x1 , . . . , xm−1 1 − pm 1 − pm

which is a multinomial distribution with parameters n−xm and p1 /(1−pm ), · · · , pm−1 /(1−pm ). 18 2 b) Here m = 3, n = 10, p1 = 18 38 , p2 = 38 , and p3 = 38 . Thus, pX1 ,X2 |X3 (x1 , x2 , 1) =

#

n−1 x1 , x2

$ # $x1 +x2 1 . 2

6.55 a) If the first m − 1 random variables are grouped together, we have: pXm (xm ) =

%N (1−pm )&%N pm & n−xm %N & n

xm

, xm = 0, 1, · · · , N pm ,

thus the conditional distribution of X1 , . . . , Xm−1 given Xm = xm is: pX1 ,...,Xm−1 |Xm (x1 , . . . , xm−1 |xm ) =

=

%N p1 & x1

···

pX1 ,...,Xm (x1 , . . . , xm ) pXm (xm )

%N pm−1 & xm−1

%N (1−pm )&

,

n−xm

which represents a multiple hypergeometric distribution with parameters N (1 − pm ), n − xm and p1 /(1 − pm ), · · · , pm−1 /(1 − pm ). b) Here m = 4, N = 40, n = 5, N p1 = 6, N p2 = 15, N p3 = 12, N p4 = 7, thus % 6 &%15 &%12& pX1 ,X2 ,X3 |X4 (x1 , x2 , x3 , 1) =

x1

%x332& 4

where x1 , x2 , x3 are non-negative integers whose sum is 4.

x3

,

6.3 Conditional Probability Mass Functions

180

Theory Exercises 6.56 Let P (Y = y!| X = x) = g(y) and !P (X = x) = h(x). Then, P (X = x, Y!= y) = h(x)g(y) and P (Y = y) = x h(x)g(y) = g(y) x h(x). Since h(x) is the PMF of X, x h(x) = 1, and thus, P (Y = y) = g(y) = P (Y = y | X = x). 6.57 Clearly the general multiplication rule holds for all x such that pX (x) > 0. Additionally, if pX (x) = 0, then we must have that pX,Y (x, y) = 0 as well. Thus Equation 6.15 holds for all real x.

Advanced Exercises 6.58 a) Let y = n − (xk+1 + . . . + xm ) and py = 1 − (pk+1 + . . . + pm ). From Exercise 6.40, we know # $ n x P (Xk+1 = xk+1 , . . . , Xm = xm ) = py p k+1 · · · pxxm y, xk+1 , . . . , xm y k+1 Thus the conditional distribution of X1 , . . . , Xk given Xk+1 = xk+1 , . . . , Xm = xm is pX1 ,...,Xk |Xk+1 ,...,Xm (x1 , . . . , xk |xk+1 . . . , xm ) =

=

%

& x1 n xm x1 ,...,xm p1 · · · pm % & y xk+1 xm n y,xk+1 ,...,xm py pk+1 · · · px =

#

=

y x1 , . . . , xk

P (X1 = x1 , . . . , Xm = xm ) P (Xk+1 = xk+1 , . . . , Xm = xm )

xk x1 n! x1 !···xm ! p1 · · · pk y n! y!xk+1 !···xm ! py

$#

p1 py

$x 1

···

#

pk py

=

px1 1 · · · pxk l y! x1 ! · · · xk ! pxy 1 +...+xk

$x k

Hence, X1 , . . . , Xk |Xk+1 = xk+1 , . . . , Xm = xm is multinomial with parameters y and ppy1 , . . . , ppky . b) The result of a) says that once we know that out of the n total trials, y − n of them are accounted for by Xk+1 , . . . , Xm , the remaining y are distributed multinomially among the first k X, with relative prbabilities instead of the original ones. 6.59 a) P (Y = y|X = x) = x1 for y = 1, . . . , x and P (Y = y|X = x) = 0 otherwise. In other words, Y |X = x has uniform distribution on {1, · · · , x} (where x ∈ {1, · · · , N }). b) Using the result from Exercise 6.20, the conditional distribution of X given that Y = y is: P (X = x|Y = y) =

P (X = x, Y = y) = P (Y = y)

1 1 N

Nx ! N

1 j=y j

1 = !N

x j=y j

,

!N !N −1 ∼ −1 c) First, as N → ∞, for a fixed y, n=y n n=1 n . Second, using the fact that !N −1 ∼ ln N as N → ∞, we have that P 1 ∼ ln1N as N → ∞. Therefore, finally, we N −1 n=1 n have that

x

PN 1

n=y

n−1



1 x ln N

n=y

as N → ∞.

n



6.4 Independent Random Variables

Basic Exercises ! ! ! 6.60 a) First, we have that p (x) = p (x, y) = q(x)r(y) = q(x) X X,Y y y ! ! ! y r(y) and similarly, pY (y) = r(y) x q(x). Also we note that 1 = (x,y) pX,Y (x, y) = (x,y) q(x)r(y) = 3! 4 ! ( x q(r)) y r(y) . Thus we have: pX,Y (x, y) = q(x)r(y) = q(x)r(y)

=

+

q(x)

"

,+

r(y)

y

r(y)

"

+ '

" x

y

()

*

=1

,

q(x)

x

,+ , " q(r) r(y)

= pX (x)pY (y)

! ! b) q and r are the marginal PMFs of X and Y respectively only when x q(x) = 1 = y r(y). c) Generalizing to the multivariate case of m discrete random variables, we assume pX1 ,...,Xm (x1 , . . . , xm ) = q1 (x1 ) × · · · × qm (xm ) for all x1 , · · · , xm ∈ R. Then we have that "

pXi (xi ) =

pX1 ,...,Xm (x1 , . . . , xm )

{x1 ,...,xi ,...,xm }∈Ri−1 ×{xi }×Rm−i

"

=

q1 (x1 ) × · · · × qm (xm )

{x1 ,...,xi ,...,xm }∈Ri−1 ×{xi }×Rm−i

"

= qi (xi )

q1 (x1 ) × · · · × qi−1 (xi−1 ) × qi+1 (xi+1 ) × · · · × qm (xm )

{x1 ,...,xi−1 ,xi+1 ,...,xm

= qi (xi )

+

" x1

We also have that 1=

"

}∈Rm−1

,



q1 (x1 ) · · · ⎝

pX1 ,...,Xm (x1 , . . . , xm ) =

Rm

"

xi−1

"

⎞⎛

qi−1 (xi−1 )⎠ ⎝

"

xi+1



qi+1 (xi+1 )⎠ · · ·

q1 (x1 ) × · · · × qm (xm ) =

Rm

+ "

+

"

qm (xm )

xm

,

q1 (x1 ) · · ·

x1

,

+ "

qm (xm )

xm

thus pX1 ,...,Xm (x1 , . . . , xm ) = q1 (x1 ) × · · · × qm (xm ) ++ , + ,,m−1 " " = q1 (x1 ) × · · · × qm (xm ) q1 (x1 ) · · · qm (xm ) x1

=

m 9 i=1



⎝qi (xi )

+

" x1

,



q1 (x1 ) · · · ⎝

"

xi−1

xm

⎞⎛

qi−1 (xi−1 )⎠ ⎝

"

xi+1



qi+1 (xi+1 )⎠ · · ·

+ " xm

,

,⎞ qm (xm ) ⎠


=

m 9


pX1 (xi ),

i=1

thus, by Proposition 6.14, X1 , · · · , Xm are independent. 6.61 X and Y are not independent random variables. 8 9 pX (0)pY (1) = 40 × 18 40 = 100 .

For example, pX,Y (0, 1) = 0 but

6.62 X and Y are not independent random variables since pX,Y (1, 1) = 0 but pX (1)pY (1) =

1 100

6.63 X and Y are not independent random variables since pX,Y (2, 1) = 0 but pX (2)pY (1) = 0.007 6.64 a) X and Y are not independent since Y must be greater than X and therefore depends on X. b) Using Proposition 6.10, we know that while pX (2) > 0 and pY (2) > 0, pX,Y (2, 2) = 0. c) Using Proposition 6.11, we know that while pY (2) > 0, pY |X (2|2) = 0. d) Once the first success occurs, its occurence has no effect on how much longer one must wait till the second success. e) Using Proposition 6.10 and Exercise 6.9, for all x, z ∈ R, pX,Y −X (x, z) = P (X = x, Y − X = z) = P (X = x, Y = z + x) = p2 (1 − p)x+z−2 % &% & = p(1 − p)x−1 p(1 − p)z−1 = pX (x)pY −X (z).

Therefore, X and Y − X are independent. f ) Using Proposition 6.11, P (Y − X = z|X = x) =

p2 (1 − p)x+z−2 P (Y − X = z, X = x) = = p(1 − p)z−1 = pY −X (z) P (X = x) p(1 − p)x−1

Therefore X and Y − X are independent. 6.65 If some two of the random variables X1 , · · · , Xm are not independent, then the m random variables X1 , · · · , Xm are not independent. To see that, suppose the converse, i.e. suppose for some i, j ∈ {1, · · · , m}, Xi and Xj are not independent, but X1 , · · · , Xm are independent. The latter independence, by Definition 6.5, implies the equality that P (X1 ∈ A1 , . . . , Xm ∈ Am ) = P (X1 ∈ A1 ) . . . P (Xm ∈ Am ) for all subsets A1 , · · · , Am of real numbers. Thus, taking Ak = Ω for all k ∈ / {i, j}, we obtain that P (Xi ∈ Ai , Xj ∈ Aj ) = P (Xi ∈ Ai )P (Xj ∈ Aj ) for all subsets Ai , Aj of R, which contradicts the fact that Xi and Xj are not independent. 6.66 a) Since there are only 5 people in the sample, we know that X1 + X2 ≤ 5 and thus if X1 = 5, we must have X2 = 0 b) If X1 , X2 , X3 , X4 were independent than any subset of them would be as well (see the argument in Exercise 6.65), and (a) gives an example of a subset that is not independent, and thus, X1 , X2 , X3 , X4 are not independent. 6.67 a) Since there are only 5 people in the sample, we know that X1 + X2 ≤ 5 and thus if X1 = 5, we must have X2 = 0 b) If X1 , X2 , X3 , X4 were independent than any subset of them would be as well, and a) gives an example of a subset that is not independent, and thus, X1 , X2 , X3 , X4 are not independent. 6.68 Using the same logic as in Exercise 6.66 and 6.67, we know that X1 + X2 ≤ n and thus if



X1 = n, we must have X2 = 0. Therefore X1 and X2 are not independent. Thus X1 , . . . , Xm can’t be independent by Exercise 6.65. 6.69 Yes, X and Y are independent, since pX,Y (x, y) =

1 100

=

1 10

·

1 10

= pX (x)pY (y).

6.70 a) Since X must be at least as large as Y , then Y depends on X. 9 1 , pY (1) = 55 , pX,Y (0, 1) = 0 ̸= pX (0)pY (1), thus X and Y are not b) Note that pX (0) = 55 independent by Proposition 6.10. 9 c) Since pY (1) = 55 , pY |X (1|0) = 0, thus X and Y are not independent by Proposition 6.11. 6.71 Let X denote the time (to the nearest minute) it takes for a student to complete the mathematics exam, Y denote the corresponding time for the verbal exam. ! and!let x−1 x−1 0.02(1 − 0.025)y−1 0.025 = 49 ≈ 0.5506 a) P (X > Y ) = ∞ x=1 y=1 (1 − 0.02) 89 ! ! 2y x−1 0.975y−1 0.02 · 0.025 ≈ 0.6225 b) P (X ≤ 2Y ) = ∞ 0.98 x=1 y=1 !z−1 !z−1 x−1 (0.02)0.975z−x−1 0.025 = c) P (X + Y = z) = P (X = x, Y = z − x) = x=1 x=1 0.98 % & 0.1 (0.98)z−1 − (0.975)z−1 % . If Z denotes the total & time that a student takes to complete both exams, then pZ (z) = 0.1 (0.98)z−1 − (0.975)z−1 for z = 2, 3, 4, . . . and pZ (z) = 0 otherwise. d) Since our variables are in minutes, we are interested in the probability that the total time is less than or equal to 120 minutes. Then P (Z ≤ 120) =

120 120 " " pZ (z) = 0.1( 0.98z−1 − 0.975z−1 ) = 0.749

120 " z=2

z=2

z=2

: 6.72 a) By Proposition 6.13, X12 + X22 + X32 and X42 + X52 + X62 are independent. −1 1 X2 X3 )) b) sin(X1 X2 X3 ) and cos(X3 X5 ) are not independent since if they were, sin (sin(X = X3 X1 X2 :

would have to be independent of

cos−1 (cos(X3 X5 )) X5

= X3 .

c) By Proposition 6.13, sin(X1 X2 X5 ) and cos(X3 X4 ) are independent. d) X1 −X2 and X1 +X2 are not necessarily independent. For example, if X2 = 0 with probability one and X1 takes at least two possible values with positive probabilities, then clearly X1 − X2 and X1 + X2 are not independent, although X1 , X2 are independent. e) By Proposition 6.13, ! min{X1 , X2 , X7 } and! max{X3 , X4 } are independent. f ) By Proposition 6.13, 5k=1 Xk sin(kt) and 7k=6 Xk sin(kt) are independent.

6.73 a) Let Z denote the total number of patients that arrive each hour at both emergency rooms combined. Letting X and Y being the number of patients that arrive at the big and small emergency rooms respectively, we have: for all z ∈ Z+ , P (Z = z) = P (X + Y = z) =

z "

P (X = x, Y = z − x) =

x=0

$ # −2.6 z−x $ z # −6.9 " e 2.6 e 6.9x x=0

x!

(z − x)!

z z # $ " z! 6.9x 2.6z−x e−9.5 " z =e · = =e 6.9x 2.6z−x x!(z − x)! z! x!(z − x)! z! x x=0 x=0 x=0 !z %z & x y z and using the binomial theorem, x=0 x 6.9 2.6 = (6.9 + 2.6) = 9.5z . Thus we have that −9.5 z P (Z = z) = e z!9.5 for z ∈ Z+ . Hence, Z ∼ P(9.5). b) By definition of the Poisson distribution, we expect 9.5 visits per hour in the two hospitals combined. −(6.9+2.6)

z " 6.9x 2.6z−x

−9.5


c) P(Z ≤ 8) = Σ_{z=0}^{8} P(Z = z) = e^{−9.5} (1 + 9.5/1! + 9.5^2/2! + ··· + 9.5^8/8!) ≈ 0.3918.
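The Poisson tail probability in (c) is a one-liner to confirm numerically; this brief sketch (illustrative, not from the text) sums the Poisson(9.5) PMF directly.

    import math

    lam = 9.5
    p = sum(math.exp(-lam) * lam**z / math.factorial(z) for z in range(9))
    print(round(p, 4))   # 0.3918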

6.74 Let Y be the number of people making a depost during the time interval. Let X be the number of people entering the bank in the time interval. Then " " P (Y = y) = P (X = x, Y = y) = P (Y = y|X = x)P (X = x) x

=

∞ # " x=y

=

x

$ ∞ −λ λx x y e−λ py " (1 − p)x−y λx x−y e p (1 − p) · = y x! y! x=y (x − y)!

∞ ∞ e−λ py " (1 − p)x λx+y e−λ py λy " (1 − p)x λx = y! x=0 x! y! x! x=0

∞ e−λ+λ−pλ (pλ)y e−pλ (pλ)y e−λ (pλ)y " ((1 − p)λ)x = = = y! x! y! y! * 'x=0 () =e(1−p)λ

Thus Y ∼ P(pλ).
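Exercise 6.74 is the thinning property of the Poisson distribution: keeping each of a Poisson(λ) number of arrivals independently with probability p yields a Poisson(pλ) count. A quick Monte Carlo sketch (the parameters are illustrative only, not from the text):

    import random, math

    lam, p, reps = 4.0, 0.3, 200_000

    def poisson(lam):                          # simple inverse-transform Poisson sampler
        u, k, term = random.random(), 0, math.exp(-lam)
        s = term
        while u > s:
            k += 1
            term *= lam / k
            s += term
        return k

    counts = [sum(random.random() < p for _ in range(poisson(lam))) for _ in range(reps)]
    print(sum(counts) / reps, p * lam)                   # sample mean of Y versus p*lambda
    print(counts.count(0) / reps, math.exp(-p * lam))    # P(Y = 0) versus e^(-p*lambda)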

6.75 Each person entering the bank either is type 1 (with probability p1 ) or not, independently of the types of other people entering and of the number of people entering the bank, thus, by Exercise 6.74, X1 has P(λp1 ) distribution. A similar argument shows that X2 has P(λp2 ) distribution and X3 has P(λp3 ) distribution. On the other hand, let N = X1 + X2 + X3 denote the number of people entering the bank, then N has P(λ) distribution. Therefore, for all x1 , x2 , x3 ∈ Z+ , pX1 ,X2 ,X3 (x1 , x2 , x3 ) =

∞ "

P (X1 = x1 , X2 = x2 , X3 = x3 | N = k)P (N = k)

k=0

#

$ x1 + x2 + x3 x1 x2 x3 e−λ λx1 +x2 +x3 = p1 p2 p3 · (x1 + x2 + x3 )! x1 , x2 , x3 x x 2 1 (λp2 ) (λp3 )x3 (λp1 ) = e−λ · · = pX1 (x1 )pX2 (x2 )pX3 (x3 ), x1 ! x2 ! x3 ! where in the second equality we used the fact that P (X1 = x1 , X2 = x2 , X3 = x3 | N = k) = 0 unless k = x1 + x2 + x3 , and that, given N = x1 + x2 + x3 , the distribution of X1 , X2 , X3 is multinomial with parameters x1 + x2 + x3 and p1 , p2 , p3 . The last equality easily follows upon noting that e−λ = e−λp1 e−λp2 e−λp3 . Finally, if at least one of x1 , x2 , x3 is not a non-negative integer, then pX1 ,X2 ,X3 (x1 , x2 , x3 ) = 0 = pX1 (x1 )pX2 (x2 )pX3 (x3 ). Thus, X1 , X2 , X3 are independent Poisson random variables with parameters λp1 , λp2 and λp3 respectively. Theory Exercises 6.76 By Definition 6.5, if X1 , · · · , Xm are independent, then P (X1 ∈ A1 , · · · , Xm ∈ Am ) = P (X1 ∈ A1 ) . . . P (Xm ∈ Am ) for all subsets A1 , . . . , Am of real numbers, therefore, for every subset J = {i1 , · · · , ik } ⊂ {1, · · · , m}, upon taking Ai = Ω for all i ∈ {1, · · · , m} \ J , we obtain 9 9 P (Xi ∈ Ai ), P (Xi1 ∈ Ai1 , · · · , Xik ∈ Aik ) = P (Xi1 ∈ A1 ) . . . P (Xik ∈ Ak ) · P (Xi ∈ Ω) = i∈J

i∈J /

'

()

=1

*



which, by Definition 4.5, implies that {X1 ∈ A1 }, . . . , {Xm ∈ Am } are mutually independent. Conversely, if {X1 ∈ A1 }, . . . , {X ∈ Am } are mutually independent for all subsets of R, ;m m then P (X1 ∈ A1 , · · · , Xm ∈ Am ) = i=1 P (Xi ∈ Ai ) for all subsets of R, thus, by Definition 6.5, X1 , . . . , Xm are independent. 6.77 Suppose X1 , . . . , Xm are independent. Let x1 , . . . , xm be any real numbers, and set A1 = {x1 }, . . . , Am = {xm }. Then pX1 ,...,Xm (x1 , . . . , xm ) = P (X1 = x1 , . . . , Xm = xm ) = P (X1 ∈ A1 , . . . , Xm ∈ Am ) = P (X1 ∈ A1 ) · · · P (Xm ∈ Am ) = P (X1 = x1 ) · · · P (Xm = xm ) = pX1 (x1 ) · · · pXm (xm ) Conversely, suppose pX1 ,...,Xm (x1 , . . . , xm ) = pX1 (x1 ) · · · pXm (xm ) for all x1 , · · · , xm ∈ R. Then, let A1 , . . . , Am be any m subsets of R. Applying the above equation and the FPF gives P (X1 ∈ A1 , . . . , Xm ∈ Am ) = P ((X1 , . . . , Xm ) ∈ A1 × · · · × Am ) " " " " pX1 (x1 ) · · · pXm (xm ) ··· pX1 ,...,Xm (x1 , . . . , xm ) = ···

=

(x1 ,...,xm )∈A1 ×···×Am

=

# "

x1 ∈A1

as required.

(x1 ,...,xm )∈A1 ×···×Am

$ $ # " pXm (xm ) = P (X1 ∈ A1 ) · · · P (Xm ∈ Am ) pX1 (x1 ) · · · xm ∈Am

6.78 Note that for any event E, we have {ω : IE (ω) = 1} = E and {ω : IE (ω) = 0} = E c . Suppose first that the random variables IE1 , . . . , IEm are independent, then by Exercise 6.76, the events {IE1 = 1}, . . . , {IEm = 1} are independent, implying that the events E1 , . . . , Em are independent. Conversely, suppose that E1 , . . . , Em are independent, then, by Proposition 4.5, the events F1 , . . . , Fm are also independent, where each Fi is either Ei or Eic , i = 1, · · · , m. The latter, in view of the earlier note, implies that the events {IE1 = x1 }, . . . , {IEm = xm } are independent for all x1 , · · · , xm ∈ {0, 1}. Thus, pIE1 ,...,IEm (x1 , · · · , xm ) = pIE1 (x1 ) . . . pIEm (xm ) for all x1 , . . . , xm ∈ {0, 1}. On the other hand, if at least one of x1 , . . . , xm is not in {0, 1}, then we have that pIE1 ,...,IEm (x1 , · · · , xm ) = 0 = pIE1 (x1 ) . . . pIEm (xm ). Therefore, pIE1 ,...,IEm (x1 , · · · , xm ) = pIE1 (x1 ) . . . pIEm (xm ) for all real x1 , . . . , xm , which by Exercise 6.77, implies independence of IE1 , . . . , IEm .

Advanced Exercises 6.79 a) The minimum of two random variables is greater than a value x if and only if both of the random variables are greater than x. Thus, for all x ∈ N , by independence of X, Y , P (min{X, Y } > x) = P (X > x, Y > x) = P (X > x)P (Y > x) = (1 − p)x (1 − q)x = (1 − (p + q − pq))x ,



implying that for all x ∈ N , pmin{X,Y } (x) = P (min{X, Y } > x − 1) − P (min{X, Y } > x) = (1 − (p + q − pq))x−1 − (1 − (p + q − pq))x = (1 − (p + q − pq))x−1 (p + q − pq), and we conclude that min{X, Y } ∼ G(p + q − pq). b) Since both X and Y must be positive integers, then if X + Y = z, there are z − 1 ways that could happen, i.e. (X = 1, Y = z − 1), . . . , (X = z − 1, Y = 1). We can compute P (X + Y = z) as follows: P (X + Y = z) =

z−1 "

P (X = x, Y = z − x) =

x=1

= pq(1 − q)z−2

$ z−2 # " 1−p x

pq(1 − q)z−1 1 − =

p(1 − p)x−1 q(1 − q)z−x−1

x=1

x=0

#

z−1 "

p−q

3

1−q

1−p 1−q

4z−1

= pq(1 − q)z−2

$

=

1−

4z−1 1−p 1−q − 1−p 1−q

3

1

pq((1 − q)z−1 − (1 − p)z−1 ) . p−q

c) By (b) we have that P (X + Y = z) =

pq((1 − q)z−1 − (1 − p)z−1 ) p−q

< = pq((1 − q) − (1 − p)) (1 − q)z−2 + (1 − q)z−3 (1 − p) + (1 − q)z−4 (1 − p)2 + ··· + (1 − p)z−2 = p−q = pq

z−2 "

(1 − q)k (1 − p)z−2−k , z ∈ {2, 3, · · · }.

k=0

Thus, when p = q, we have that for z ∈ {2, 3, · · · }, # $ z−2 " z−1 2 z−2 2 z−2 P (X + Y = z) = p (1 − p) = (z − 1)p (1 − p) = p (1 − p)z−2 . 2−1 2

k=0

Therefore, X + Y ∼ N B(2, p). 6.80 a) First we note, pX (x) =

⎧ ⎪ ⎨

1 4, 1 2, 1 4,

x = −1 x=0 x=1

pY (y) =

⎧ ⎪ ⎨

1 4, 1 2, 1 4,

y = −1 y=0 y=1

and

⎪ ⎩

⎪ ⎩



1 X and Y are not independent, since pX,Y (1, 1) = 0 ̸= 16 = 41 · b) First we note B 1 2 , z = −1 pX+Y (z) = 1 2, z = 1

and pX−Y (z) =

B

1 2, 1 2,

1 4

= pX (1)pY (1)

z = −1 z=1

Now, P (X + Y = −1, X − Y = −1) = P (X = −1, Y = 0) 1 1 1 = · = P (X + Y = −1)P (X − Y = −1) 4 2 2 P (X + Y = −1, X − Y = 1) = P (X = 0, Y = −1) =

1 1 1 = · = P (X + Y = −1)P (X − Y = 1) 4 2 2 P (X + Y = 1, X − Y = −1) = P (X = 0, Y = 1) =

1 1 1 = · = P (X + Y = 1)P (X − Y = −1) 4 2 2 P (X + Y = 1, X − Y = 1) = P (X = 1, Y = 0) =

1 1 1 = · = P (X + Y = 1)P (X − Y = 1) 4 2 2 Thus X + Y and X − Y are independent. =

6.81 First note that P (X1 = x1 , . . . , Xm = xm | X = x) = 0 if x1 + . . . + xm ̸= x. Then P (X1 = x1 ) =

∞ "

···

P (X1 = x1 , · · · , Xm = xm | X = x1 + · · · + xm )P (X = x1 + · · · + xm )

xm =0

x2 =0

=

∞ "

∞ "

x2

$ ∞ # " x1 + . . . + x m x 1 e−λ λx1 +...+xm ··· p1 · · · pxmm x1 , . . . , xm (x1 + . . . + xm )! x =0 =0 =

m

∞ "

x2 =0

=

#

∞ " e−(p1 +...+pm )λ λx1 · · · λxm x1 ··· p1 · · · pxmm x1 ! · · · xm ! xm =0

$#" $ #" $ ∞ ∞ e−p1 λ (p1 λ)x1 e−pm λ (pm λ)xm e−p2 λ (p2 λ)x2 e−p1 λ (p1 λ)x1 = ... x1 ! x2 ! xm ! x1 ! x =0 xm =0 () * ' 2 () * ' =1

=1

for x1 ∈ Z+ . Thus, X1 ∼ P(p1 λ). Moreover, a similar argument implies that Xi ∼ P(pi λ) for each i = 1, · · · , m. Next note that pX1 ,...,Xm (x1 , . . . , xm ) =

∞ "

P (X1 = x1 , · · · , Xm = xm | X = x)P (X = x)

x=0

= P (X1 = x1 , · · · , Xm = xm | X = x1 + · · · + xm )P (X = x1 + · · · + xm )


=

#


$ e−(p1 +...+pm )λ λx1 · · · λxm x1 x1 + . . . + xm x1 e−λ λx1 +...+xm = p1 · · · pxmm p1 · · · pxmm (x1 + . . . + xm )! x1 ! · · · x m ! x1 , . . . , xm # −p1 λ $ # −pm λ $ e (p1 λ)x1 e (pm λ)xm = ··· = pX1 (x1 ) · · · pXm (xm ), x1 ! xm !

thus, Xi ∼ P(pi λ) and X1 , . . . , Xm are independent.

6.82 a) Since X1 , · · · , Xm are independent and identically distributed, then P (min{X1 , ..., Xm } > x) = P (X1 > x, ..., Xm > x) = P (X1 > x)···P (Xm > x) = (P (X1 > x))m . Thus, P (min{X1 , . . . , Xm } = x) = P (min{X1 , . . . , Xm } > x − 1) − P (min{X1 , . . . , Xm } > x) = (P (X1 > x − 1))m − (P (X1 > x))m . b) Using Proposition 5.10 and (a), P (min{X1 , . . . , Xm } > x) = (1 − p)mx . Therefore, for all x ∈ N , P (min{X1 , · · · , Xm } = x) = ((1 − p)m )x−1 − ((1 − p)m )x = ((1 − p)m )x−1 (1 − (1 − p)m ), implying that min{X1 , · · · , Xm } ∼ G(1 − (1 − p)m ). c) The maximum of several random variables is less than a value x if and only if all of the random variables are less than x. Thus, since X1 , · · · , Xm are independent and identically distributed, P (max{X1 , ..., Xm } < x) = P (X1 < x, ..., Xm < x) = P (X1 < x)···P (Xm < x) = (P (X1 < x))m . Therefore, P (max{X1 , · · · , Xm } = x) = P (max{X1 , · · · , Xm } < x + 1) − P (max{X1 , · · · , Xm } < x) = (P (X1 < x + 1))m − (P (X1 < x))m = (P (X1 ≤ x))m − (P (X1 ≤ x − 1))m = (1 − P (X1 > x))m − (1 − P (X1 > x − 1))m . d) In the case when each Xi is geometrically distributed with a common parameter p ∈ (0, 1), we have for x ∈ N , P (max{X1 , . . . , Xm } = x) = (1 − (1 − p)x )m − (1 − (1 − p)x−1 )m . Let us show that max{X1 , · · · , Xm } is not a geometric random variable. Suppose the converse, i.e. max{X1 , · · · , Xm } ∼ G(q). Then we must have that P (max{X1 , · · · , Xm } = 1) = q = pm , and, thus, in view of (a), (1 − (1 − p)2 )m − pm = P (max{X1 , · · · , Xm } = 2) = q(1 − q) = pm (1 − pm ).



In other words, we must have that pm (2 − p)m − pm = pm (1 − pm ), i.e. (2 − p)m = 2 − pm , but the latter equality holds for m ≥ 2 only if p = 1. Thus, for m = 2, 3, . . ., max{X1 , · · · , Xm } is not a geometric random variable. 6.83 a) The PMF of a constant random variable X is 1, x = c pX (x) = 0, x ̸= c b) First suppose that X is independent of itself. Let us show that it must be a constant random variable. If X is independent of itself, P (X = x, X = y) = P (X = x)P (X = y) for all x, y. Specifically, if x = y, then P (X = x) = (P (X = x))2 for all x. Hence (P (X = x))2 − P (X = x) = 0



P (X = x)(P (X = x) − 1) = 0

And thus P (X = x) must equal either 0 or 1 for all x. And therefore, it must equal 1 for exactly one value of x, and 0 for all other values of x, and thus it is constant. Next, assume that X is a constant random variable equal to c. For any x, y, we have 1, if x = y = c P (X = x, X = y) = 0, if otherwise Additionally, P (X = x)P (X = y) = 0

unless x = y = c

and P (X = x)P (X = y) = 1 × 1 = 1 when x = y = c. Thus, P (X = x, X = y) = P (X = x)P (X = y) for all x and y, and therefore, the constant random variable X is independent of itself. c) First, assume that X is independent of all other random variables defined on the same sample space. Then X and −X are independent, implying that pX (x)p−X (−x) = pX,−X (x, −x) = P (X = x, −X = −x) = P (X = x) = pX (x), thus, for each x ∈ R, either pX (x) = 0 or p−X (−x) = pX (x) = 1. Therefore, X equals to a constant with probability one. Next assume that P (X = c) = 1 for some c ∈ R (P (X = x) = 0 for all x ̸= c). Then for any given random variable Y defined on the same probability space, for all y ∈ R, pX,Y (c, y) = P (X = c, Y = y) = P (Y = y) = 1 · pY (y) = pX (c)pY (y), and pX,Y (x, y) = P (X = x, Y = y) = 0 = 0 · P (Y = y) = pX (x)pY (y) for all x ̸= c, for all y ∈ R. Thus, a constant random variable is independent of any other random variable.



6.84 a) P(X = x, Y = y | Z = z) = P(X = x | Z = z)P(Y = y | Z = z) for all z ∈ R such that P(Z = z) > 0.
b) X1 and X3 are not independent, since P(X1 = 1, X3 = −3) = 0 ≠ p(1 − p)³ = pX1(1)pX3(−3).
c) X1 and X3 are conditionally independent given X2. For X2 = −2,

P(X1 = −1, X3 = −3 | X2 = −2) = 1 − p = 1 · (1 − p) = P(X1 = −1 | X2 = −2)P(X3 = −3 | X2 = −2),
P(X1 = −1, X3 = −1 | X2 = −2) = p = 1 · p = P(X1 = −1 | X2 = −2)P(X3 = −1 | X2 = −2),

and P(X1 = x1, X3 = x3 | X2 = −2) = 0 = P(X1 = x1 | X2 = −2)P(X3 = x3 | X2 = −2) for all other (x1, x3). For X2 = 0,

P(X1 = −1, X3 = −1 | X2 = 0) = (1 − p)/2 = (1/2) · (1 − p) = P(X1 = −1 | X2 = 0)P(X3 = −1 | X2 = 0),
P(X1 = −1, X3 = 1 | X2 = 0) = p/2 = (1/2) · p = P(X1 = −1 | X2 = 0)P(X3 = 1 | X2 = 0),
P(X1 = 1, X3 = −1 | X2 = 0) = (1 − p)/2 = (1/2) · (1 − p) = P(X1 = 1 | X2 = 0)P(X3 = −1 | X2 = 0),
P(X1 = 1, X3 = 1 | X2 = 0) = p/2 = (1/2) · p = P(X1 = 1 | X2 = 0)P(X3 = 1 | X2 = 0),

and P(X1 = x1, X3 = x3 | X2 = 0) = 0 = P(X1 = x1 | X2 = 0)P(X3 = x3 | X2 = 0) for all other (x1, x3). For X2 = 2,

P(X1 = 1, X3 = 1 | X2 = 2) = 1 − p = 1 · (1 − p) = P(X1 = 1 | X2 = 2)P(X3 = 1 | X2 = 2),
P(X1 = 1, X3 = 3 | X2 = 2) = p = 1 · p = P(X1 = 1 | X2 = 2)P(X3 = 3 | X2 = 2),

and P(X1 = x1, X3 = x3 | X2 = 2) = 0 = P(X1 = x1 | X2 = 2)P(X3 = x3 | X2 = 2) for all other (x1, x3).
d) X1 and X2 are not independent. We have P(X1 = 1, X2 = −2) = 0 ≠ p(1 − p)² = pX1(1)pX2(−2).
e) X1 and X2 are not conditionally independent given X3, since P(X1 = −1, X2 = 0 | X3 = −1) = 1/3 but P(X1 = −1 | X3 = −1)P(X2 = 0 | X3 = −1) = (2/3) · (2/3) = 4/9.

6.5 Functions of Two or More Discrete Random Variables

Basic Exercises


6.85 a)

z           4     5     6     7     8     9
pX+Y(z)   0.06  0.28  0.28  0.26  0.10  0.02

b)

z          −1     0     1     2
pX−Y(z)   0.06   0.4   0.5  0.04

c)

z               2     3     4     5
pmax{X,Y}(z)  0.06  0.52   0.4  0.02

d)

z               2     3     4
pmin{X,Y}(z)  0.38   0.5  0.12

6.86 a)

z           0      1      2
pX−Y(z)   19/40  19/40   1/20

b) X − Y represents the number of brothers a student has.

6.87 a)

z           0     1     2     3     4     5
pY−X(z)    1/6  5/18   2/9   1/6   1/9  1/18

b) Y − X is the magnitude of the difference between the two values of the die.
6.88 a) Let X, Y be the arrival times of the two people, and let Z be the number of minutes that the first person to arrive waits for the second person to arrive, i.e. Z = max{X, Y} − min{X, Y}. Clearly, the possible values of Z are 0, 1, · · · , 60, and

pZ(0) = Σ_{x=0}^{60} pX,Y(x, x) = Σ_{x=0}^{60} 1/61² = 1/61,

and for z ∈ {1, · · · , 60}, using the symmetry pX,Y(x, y) = pX,Y(y, x),

pZ(z) = 2 Σ_{x=0}^{60−z} pX,Y(x, x + z) = 2 Σ_{x=0}^{60−z} 1/61² = 2(61 − z)/61² = 2(61 − z)/3721,

and pZ(z) = 0 otherwise.
b) P(Z ≤ 10) = Σ_{z=0}^{10} pZ(z) = 1171/3721 ≈ 0.3147.
c) pmin{X,Y}(z) = (121 − 2z)/3721 for z ∈ {0, · · · , 60} and pmin{X,Y}(z) = 0 for all z ∉ {0, · · · , 60}, since

P(min{X, Y} = z) = pX,Y(z, z) + 2P(X = z, Y > z) = 1/61² + 2 · (1/61) · ((60 − z)/61) = (121 − 2z)/3721

for all z ∈ {0, · · · , 60}.
d) pmax{X,Y}(z) = (2z + 1)/3721 for z ∈ {0, · · · , 60} and pmax{X,Y}(z) = 0 for all z ∉ {0, · · · , 60}, since

P(max{X, Y} = z) = Σ_{k=0}^{z} P(max{X, Y} = z, min{X, Y} = k) = 1/61² + Σ_{k=0}^{z−1} 2/61² = (2z + 1)/3721, z = 0, · · · , 60.

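As a quick numerical check of Exercise 6.88, the following Python sketch (hypothetical function name, not part of the text) simulates the two independent arrival times, each uniform on {0, 1, . . . , 60}, and compares the empirical value of P(Z ≤ 10) with the exact value 1171/3721 ≈ 0.3147.

```python
import random

def simulate_wait(trials=200_000, seed=1):
    """Estimate P(Z <= 10), where Z = |X - Y| and X, Y are
    independent and uniform on {0, 1, ..., 60} (Exercise 6.88)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.randint(0, 60)
        y = rng.randint(0, 60)
        if abs(x - y) <= 10:
            hits += 1
    return hits / trials

print(simulate_wait())   # empirical estimate
print(1171 / 3721)       # exact value from part (b), about 0.3147
```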

6.89 a) Let Z = |X − Y|. There are N + 1 different ways that Z = 0, so P(Z = 0) = (N + 1)/(N + 1)² = 1/(N + 1). Assuming that X > Y, if Z = z for z = 1, . . . , N, then X − Y = z, and there are N + 1 − z ways for this to happen. Thus there are 2(N + 1 − z) ways to have Z = z for z = 1, . . . , N, hence P(Z = z) = 2(N + 1 − z)/(N + 1)² for z = 1, . . . , N, and P(Z = z) = 0 otherwise.
b) For z = 1, · · · , N,

pZ(z) = Σ_{x=0}^{N−z} pX,Y(x, x + z) + Σ_{y=0}^{N−z} pX,Y(y + z, y) = 2(N + 1 − z)/(N + 1)²;

pZ(0) = Σ_{x=0}^{N} pX,Y(x, x) = Σ_{x=0}^{N} 1/(N + 1)² = 1/(N + 1),

and pZ(z) = 0 for all z ∉ {0, · · · , N}.

6.90 a) Let Z = g(X, Y) = X + Y. Then for z = 0, . . . , N,

pZ(z) = Σ_{x=0}^{z} pX,Y(x, z − x) = Σ_{x=0}^{z} 1/(N + 1)² = (z + 1)/(N + 1)²;

for z = N + 1, · · · , 2N,

pZ(z) = Σ_{x=z−N}^{N} pX,Y(x, z − x) = Σ_{x=z−N}^{N} 1/(N + 1)² = (2N − z + 1)/(N + 1)².

Thus, pZ(z) = (N + 1 − |N − z|)/(N + 1)² for z = 0, . . . , 2N, and pZ(z) = 0 otherwise.
b) Note that for z ∈ {0, · · · , N},

P(min{X, Y} = z) = Σ_{k=z}^{N} P(min{X, Y} = z, max{X, Y} = k) = pX,Y(z, z) + Σ_{k=z+1}^{N} 2/(N + 1)²
= 1/(N + 1)² + 2(N − z)/(N + 1)² = (2(N − z) + 1)/(N + 1)²;

and pmin{X,Y}(z) = 0 for all z ∉ {0, · · · , N}.
c) For z ∈ {0, · · · , N},

P(max{X, Y} = z) = Σ_{k=0}^{z} P(min{X, Y} = k, max{X, Y} = z) = pX,Y(z, z) + Σ_{k=0}^{z−1} 2/(N + 1)²
= 1/(N + 1)² + 2z/(N + 1)² = (2z + 1)/(N + 1)²;

and pmax{X,Y}(z) = 0 for all z ∉ {0, · · · , N}.
6.91 a) Following Example 6.22(b), we have

pmin{X,Y}(z) = ΣΣ_{min{x,y}=z} pX,Y(x, y) = pX,Y(z, z) + Σ_{y=z+1}^{∞} pX,Y(z, y) + Σ_{x=z+1}^{∞} pX,Y(x, z)
= pq((1 − p)(1 − q))^{z−1} + Σ_{y=z+1}^{∞} pq(1 − p)^{z−1}(1 − q)^{y−1} + Σ_{x=z+1}^{∞} pq(1 − p)^{x−1}(1 − q)^{z−1}
= pq((1 − p)(1 − q))^{z−1} + pq(1 − p)^{z−1}(1 − q)^{z} Σ_{y=0}^{∞} (1 − q)^{y} + pq(1 − p)^{z}(1 − q)^{z−1} Σ_{x=0}^{∞} (1 − p)^{x}
= pq((1 − p)(1 − q))^{z−1} + p(1 − p)^{z−1}(1 − q)^{z} + q(1 − p)^{z}(1 − q)^{z−1}
= pq((1 − p)(1 − q))^{z−1} + p(1 − q)((1 − p)(1 − q))^{z−1} + q(1 − p)((1 − p)(1 − q))^{z−1}
= (pq + p(1 − q) + q(1 − p))(1 − (p + q − pq))^{z−1} = (p + q − pq)(1 − (p + q − pq))^{z−1}

for z ∈ N, and pmin{X,Y}(z) = 0 otherwise.
b) Let Z = min{X, Y}. Then, using tail probabilities, P(Z = z) = P(Z ≥ z) − P(Z ≥ z + 1). Now

P(Z ≥ z) = P(X ≥ z, Y ≥ z) = P(X ≥ z)P(Y ≥ z) = (Σ_{x=z}^{∞} P(X = x))(Σ_{y=z}^{∞} P(Y = y))
= (Σ_{x=z}^{∞} p(1 − p)^{x−1})(Σ_{y=z}^{∞} q(1 − q)^{y−1})
= (p(1 − p)^{z−1} Σ_{x=0}^{∞} (1 − p)^{x})(q(1 − q)^{z−1} Σ_{y=0}^{∞} (1 − q)^{y}) = ((1 − p)(1 − q))^{z−1}.

Similarly, P(Z ≥ z + 1) = ((1 − p)(1 − q))^{z}. Thus P(Z = z) = ((1 − p)(1 − q))^{z−1} − ((1 − p)(1 − q))^{z} = (p + q − pq)(1 − (p + q − pq))^{z−1}. Hence, Z ∼ G(p + q − pq).
c) If p = q, then P(Z = z) = (2p − p²)(1 − (2p − p²))^{z−1}, which agrees with Example 6.22(b).

Theory Exercises

6.92 Note that

P(g(X1, · · · , Xm) = z) = P( ∪_{(x1,··· ,xm)∈g⁻¹({z})} {(X1, · · · , Xm) = (x1, · · · , xm)} )
= P( ∪_{(x1,··· ,xm)∈g⁻¹({z}): pX1,··· ,Xm(x1,··· ,xm)>0} {(X1, · · · , Xm) = (x1, · · · , xm)} )
= Σ_{(x1,··· ,xm)∈g⁻¹({z}): pX1,··· ,Xm(x1,··· ,xm)>0} pX1,··· ,Xm(x1, · · · , xm)
= Σ_{(x1,··· ,xm)∈g⁻¹({z})} pX1,··· ,Xm(x1, · · · , xm),


where we used the fact that the set of (x1, · · · , xm) such that pX1,··· ,Xm(x1, · · · , xm) > 0 is countable, and then applied the countable additivity of the probability measure. (The last equality holds since arbitrary sums of zeros are always equal to zero.)

6.93 a) Let g(x, y) = x + y and g(X, Y) = Z. Then g⁻¹({z}) = {(x, y) : x + y = z} = {(x, y) : y = z − x}. Thus g⁻¹({z}) is the set where x can vary freely and y = z − x. Therefore,

pZ(z) = Σ_{(x,y)∈g⁻¹({z})} pX,Y(x, y) = Σ_x pX,Y(x, z − x).

b) Let g(x, y) = y − x and g(X, Y) = Z. Then g⁻¹({z}) = {(x, y) : y − x = z} = {(x, y) : y = x + z}. Thus g⁻¹({z}) is the set where x can vary freely and y = x + z. Therefore,

pZ(z) = Σ_{(x,y)∈g⁻¹({z})} pX,Y(x, y) = Σ_x pX,Y(x, x + z).

c) If X and Y are independent, then

pX+Y(z) = Σ_x pX(x)pY(z − x)   and   pY−X(z) = Σ_x pX(x)pY(x + z).

6.94 First, since

{X + Y = z} = ∪_x {X + Y = z, X = x} = ∪_x {X = x, Y = z − x},

we have pX+Y(z) = Σ_x pX,Y(x, z − x). Second, since

{Y − X = z} = ∪_x {Y − X = z, X = x} = ∪_x {X = x, Y = x + z},

we have pY−X(z) = Σ_x pX,Y(x, x + z).

6.95 a) Let g(x, y) = xy and g(X, Y) = Z. Then g⁻¹({z}) = {(x, y) : xy = z} = {(x, y) : y = z/x}. Thus g⁻¹({z}) is the set where x can vary freely and y = z/x. Therefore,

pZ(z) = Σ_{(x,y)∈g⁻¹({z})} pX,Y(x, y) = Σ_x pX,Y(x, z/x).

b) Let g(x, y) = y/x and g(X, Y) = Z. Then g⁻¹({z}) = {(x, y) : y/x = z} = {(x, y) : y = xz}. Thus g⁻¹({z}) is the set where x can vary freely and y = xz. Therefore,

pZ(z) = Σ_{(x,y)∈g⁻¹({z})} pX,Y(x, y) = Σ_x pX,Y(x, xz).


c) If X and Y are independent, then

pXY(z) = Σ_x pX(x)pY(z/x)   and   pY/X(z) = Σ_x pX(x)pY(xz).

6.96 First, since

{XY = z} = ∪_x {XY = z, X = x} = ∪_x {X = x, Y = z/x},

we have pXY(z) = Σ_x pX,Y(x, z/x). Second, since

{Y/X = z} = ∪_x {Y/X = z, X = x} = ∪_x {X = x, Y = xz},

we have pY/X(z) = Σ_x pX,Y(x, xz).

Advanced Exercises

6.97 a) X + Y ∼ B(m + n, p). Observing m independent trials, each with success probability p, and then independently observing n more independent trials, each with success probability p, is the same as observing m + n independent trials each with success probability p. Moreover, if X is the number of successes in the first m trials and Y is the number of successes in the next n trials, then X + Y is the number of successes in all m + n trials (with probability of success p in each trial).
b) Using Proposition 6.16 and the convention that C(b, a) = 0 if either a > b or a < 0, we have that for z ∈ {0, · · · , m + n},

P(X + Y = z) = ΣΣ_{x+y=z} pX,Y(x, y) = Σ_{x=0}^{z} P(X = x, Y = z − x) = Σ_{x=0}^{z} P(X = x)P(Y = z − x)
= Σ_{x=0}^{z} C(m, x)p^x(1 − p)^{m−x} C(n, z − x)p^{z−x}(1 − p)^{n−z+x}
= p^z(1 − p)^{m+n−z} Σ_{x=0}^{z} C(m, x)C(n, z − x) = C(m + n, z) p^z(1 − p)^{m+n−z},

where the last equality is Vandermonde's identity,

and, therefore, X + Y ∼ B(m + n, p) c) Claim : X1 + . . . + Xm ∼ B(n1 + . . . + nm , p). Proof is done by induction. By (b), the claim is true for m = 2. Next assume that the claim is true for all m ≤ k, and let Y = X1 + . . . + Xk . Since Xk+1 is independent of X1 , . . . , Xk , then Xk+1 is independent of a function of X1 , . . . , Xk , namely their sum, Y . Thus, by the induction assumption, Xk+1 and


Y are two independent binomial random variables with identical success probability p and numbers of trials equal to nk+1 and n1 + . . . + nk respectively. Then using (b), we have that X1 + . . . + Xk + Xk+1 = Xk+1 + Y ∼ B(n1 + . . . + nk+1, p), and therefore the claim is true for m = k + 1. Thus, by the principle of mathematical induction, the claim is true for all m ∈ N.
6.98 a) Using the binomial theorem, we have for z ∈ Z+,

P(X + Y = z) = Σ_{x=0}^{z} P(X = x, Y = z − x) = Σ_{x=0}^{z} P(X = x)P(Y = z − x)
= Σ_{x=0}^{z} (e^{−λ}λ^x/x!) · (e^{−µ}µ^{z−x}/(z − x)!) = (e^{−(λ+µ)}/z!) Σ_{x=0}^{z} (z!/(x!(z − x)!)) λ^x µ^{z−x}
= (e^{−(λ+µ)}/z!) Σ_{x=0}^{z} C(z, x) λ^x µ^{z−x} = e^{−(λ+µ)}(λ + µ)^z/z!.

Hence, X + Y ∼ P(λ + µ). b) Claim : X1 + . . . + Xm ∼ P(λ1 + . . . + λm ). Proof by induction. By (a), the claim is true for m = 2. Assume that the claim is true for all m ≤ k. Let Y = X1 + . . . + Xk , then Y ∼ P(λ1 + · · · + λk ). Since Xk+1 is independent of X1 , . . . , Xk , it is independent of a function of X1 , . . . , Xk , namely their sum, Y . Thus Xk+1 and Y are two independent Poisson random variables with rates, λk+1 and λ1 + . . . + λk respectively. Then using (a), we have that X1 + . . . + Xk + Xk+1 = Xk+1 + Y ∼ P(λ1 + . . . + λk+1 ) and therefore the claim is valid for m = k + 1. Thus by the induction principle, the claim is true for all m ∈ N . 6.99 Let KX be the countable subset of R such that P (X ∈ KX ) = 1 and KY be the countable subset of R such that P (Y ∈ KY ) = 1. Then KX × KY is a countable subset of R2 and, if we let G = {g(x, y) : (x, y) ∈ KX × KY }, then G is countable by Exercise 1.39. Moreover, P (g(X, Y ) ∈ G) = P ((X, Y ) ∈ KX × KY ) = 1. Thus, g(X, Y ) is a discrete random variable.
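As a numerical check of the convolution in Exercise 6.98(a), the following Python sketch (function names are illustrative, not from the text) convolves two Poisson PMFs and compares the result with the Poisson(λ + µ) PMF.

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """Poisson PMF: e^{-lam} * lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)

def conv_pmf(z, lam, mu):
    """PMF of X + Y at z, computed by convolution (Exercise 6.98(a))."""
    return sum(pois_pmf(x, lam) * pois_pmf(z - x, mu) for x in range(z + 1))

lam, mu = 2.0, 3.5
for z in range(6):
    print(z, round(conv_pmf(z, lam, mu), 6), round(pois_pmf(z, lam + mu), 6))
# the two columns agree, illustrating X + Y ~ P(lam + mu)
```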

6.6 Sums of Discrete Random Variables

Basic Exercises

6.100 Using Proposition 6.18, we have:

pX+Y(4) = Σ_x pX,Y(x, 4 − x) = pX,Y(2, 2) = 0.06
pX+Y(5) = Σ_x pX,Y(x, 5 − x) = pX,Y(3, 2) = 0.28
pX+Y(6) = Σ_x pX,Y(x, 6 − x) = pX,Y(4, 2) + pX,Y(3, 3) = 0.04 + 0.24 = 0.28
pX+Y(7) = Σ_x pX,Y(x, 7 − x) = pX,Y(4, 3) + pX,Y(3, 4) = 0.22 + 0.04 = 0.26


pX+Y(8) = Σ_x pX,Y(x, 8 − x) = pX,Y(4, 4) = 0.1
pX+Y(9) = Σ_x pX,Y(x, 9 − x) = pX,Y(4, 5) = 0.02

6.101 a) The random variable M = (X + Y)/2 is the mean value of the two faces obtained in two tosses of the die.
b) The PMF of M is given by:

pM(z) = (2z − 1)/36, if z ∈ {1, 1.5, 2, 2.5, 3, 3.5},
pM(z) = (13 − 2z)/36, if z ∈ {4, 4.5, 5, 5.5, 6},
pM(z) = 0, otherwise.

Indeed, by Proposition 6.18,

pX+Y(z) = Σ_{x=1}^{z−1} pX,Y(x, z − x) = Σ_{x=1}^{z−1} 1/36 = (z − 1)/36, for z ∈ {2, · · · , 7},
pX+Y(z) = Σ_{x=z−6}^{6} pX,Y(x, z − x) = (6 − (z − 6 − 1))/36 = (13 − z)/36, for z ∈ {8, · · · , 12},

and pX+Y(z) = 0 for all z ∉ {2, · · · , 12}. Then by Proposition 5.14,

pM(z) = pX+Y(2z) = (2z − 1)/36 if 2z ∈ {2, · · · , 7}, (13 − 2z)/36 if 2z ∈ {8, · · · , 12}, and 0 otherwise.

The required conclusion for the PMF of M = (X + Y)/2 then follows.
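A minimal Python sketch (illustrative only) that computes the PMF of X + Y for two fair dice by convolution, matching the values (z − 1)/36 and (13 − z)/36 used above.

```python
from fractions import Fraction

die = {face: Fraction(1, 6) for face in range(1, 7)}

# convolution: p_{X+Y}(z) = sum over x of p_X(x) * p_Y(z - x)
pmf_sum = {}
for x, px in die.items():
    for y, py in die.items():
        pmf_sum[x + y] = pmf_sum.get(x + y, Fraction(0)) + px * py

for z in range(2, 13):
    print(z, pmf_sum[z])   # (z-1)/36 for z <= 7, (13-z)/36 for z >= 8
```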

6.102 Let Z = X + Y. Then using Proposition 6.19, for z = 0, . . . , N,

pZ(z) = pX+Y(z) = Σ_{x=0}^{z} pX,Y(x, z − x) = Σ_{x=0}^{z} 1/(N + 1)² = (z + 1)/(N + 1)²,

and for z = N + 1, . . . , 2N,

pZ(z) = pX+Y(z) = Σ_{x=z−N}^{N} pX,Y(x, z − x) = Σ_{x=z−N}^{N} 1/(N + 1)² = (2N − z + 1)/(N + 1)²,

which is the same answer as obtained earlier in Exercise 6.90(a).
6.103 No, the sum of two independent geometric random variables with the same success probability parameter p is not a geometric random variable. It is instead a negative binomial random variable with parameters 2 and the common success probability p.
6.104 Using Proposition 6.19, if X is the number of claims in the first week, Y is the number


of claims in the second week, and Z = X + Y is the total number of claims received during the two weeks, then

pZ(z) = Σ_{x=0}^{z} P(X = x)P(Y = z − x) = Σ_{x=0}^{z} 2^{−(x+1)} 2^{−(z−x+1)} = Σ_{x=0}^{z} 2^{−(z+2)} = (z + 1)/2^{z+2}

for z = 0, 1, . . . and pZ(z) = 0 otherwise.
6.105 a) For u ∈ N, pX+1(u) = pX(u − 1) = 1/2^u = (1/2)(1 − 1/2)^{u−1}; thus X + 1 ∼ G(0.5). Similarly, Y + 1 ∼ G(0.5).
b) By Exercise 6.103, X + Y + 2 ∼ NB(2, 0.5), since independence of X and Y implies independence of X + 1 and Y + 1.
c) Using (b) and Proposition 6.17,

pX+Y(z) = pX+Y+2(z + 2) = C(z + 1, 1)(1/2)^{z+2} = (z + 1)/2^{z+2}

for z = 0, 1, 2, · · · , and pX+Y(z) = 0 for all z ∉ Z+. The result agrees with the answer obtained in Exercise 6.104.
6.106 a) Part (a) of Proposition 6.20 says that the sum of independent binomial random variables with varying n and identical probabilities of success will also be a binomial random variable, with the number of trials being the sum of all the numbers of trials from each of the random variables; the success probability remains the same. Part (b) of Proposition 6.20 says that the sum of independent Poisson random variables with varying parameters will be a Poisson random variable whose parameter is the sum of the parameters of the individual random variables. Part (c) of Proposition 6.20 says that the sum of independent negative binomial random variables with varying r and identical probabilities of success will also be a negative binomial random variable, with r being the sum of all the r's from each of the random variables; the success probability remains the same.
6.107 From Exercise 6.106, X1 + . . . + Xr ∼ NB(r, p).
6.108 Since X + Y ∼ P(λ + µ), then for z = 0, 1, . . .,

pX|X+Y(x | z) = P(X = x, X + Y = z)/P(X + Y = z) = P(X = x)P(Y = z − x)/P(X + Y = z)
= [ (e^{−λ}λ^x/x!)(e^{−µ}µ^{z−x}/(z − x)!) ] / [ e^{−(λ+µ)}(λ + µ)^z/z! ]
= (z!/(x!(z − x)!)) λ^x µ^{z−x}/(λ + µ)^z = C(z, x)(λ/(λ + µ))^x(µ/(λ + µ))^{z−x}, x = 0, · · · , z.

Thus (X | X + Y = z) ∼ B(z, λ/(λ + µ)).
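The following Python sketch (assumed helper names, for illustration) checks the conclusion of Exercise 6.108 numerically: given independent X ∼ P(λ) and Y ∼ P(µ), the conditional PMF of X given X + Y = z matches the B(z, λ/(λ + µ)) PMF.

```python
from math import exp, factorial, comb

def pois(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam, mu, z = 2.0, 3.0, 5
p = lam / (lam + mu)

total = pois(z, lam + mu)                                   # P(X + Y = z)
for x in range(z + 1):
    cond = pois(x, lam) * pois(z - x, mu) / total           # P(X = x | X + Y = z)
    binom = comb(z, x) * p**x * (1 - p)**(z - x)            # B(z, p) PMF
    print(x, round(cond, 6), round(binom, 6))               # the two agree
```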

6.109 a) Since X + Y ∼ NB(r + s, p), then for z = r + s, r + s + 1, · · · ,

pX|X+Y(x | z) = P(X = x, X + Y = z)/P(X + Y = z) = P(X = x)P(Y = z − x)/P(X + Y = z)
= [ C(x − 1, r − 1)p^r(1 − p)^{x−r} · C(z − x − 1, s − 1)p^s(1 − p)^{z−x−s} ] / [ C(z − 1, (r + s) − 1)p^{r+s}(1 − p)^{z−(r+s)} ]
= C(x − 1, r − 1)C(z − x − 1, s − 1)/C(z − 1, (r + s) − 1), x = r, r + 1, · · · , z − s.

b) Since a conditional PMF is a PMF, Σ_{x=r}^{z−s} pX|X+Y(x | z) = 1 and the desired result follows.
c) Using (b) and Proposition 6.19, for z = r + s, r + s + 1, . . .,

pX+Y(z) = Σ_x pX(x)pY(z − x) = p^{r+s}(1 − p)^{z−(r+s)} Σ_{x=r}^{z−s} C(x − 1, r − 1)C(z − x − 1, s − 1)
= C(z − 1, r + s − 1) p^{r+s}(1 − p)^{z−(r+s)}.

Hence, X + Y ∼ N B(r + s, p) 6.110 Let X denote the number of people waiting when the Ice Cream Shoppe opens, Y denote the number of customers that arrive during the time that the shop is open, and let Z = X + Y be the total number of customers. Then for z ≤ N z e−λ " λz−x e−λ λz−x = pZ (z) = P (X = x)P (Y = z − x) = (N + 1)(z − x)! N + 1 x=0 (z − x)! x=0 x=0 z "

z "

N "

N "

and for z > N pZ (z) =

P (X = x)P (Y = z − x) =

x=0

x=0

N

e−λ λz−x e−λ " λz−x = (N + 1)(z − x)! N +1 (z − x)! x=0

and pZ (z) = 0 otherwise. % & 6.111 a) For z = 0, . . . , m + n, using the convention that ab = 0 if a < 0 or a > b, we have that # $ z # $ z " " m x n m−x P (X = x)P (Y = z − x) = pX+Y (z) = p (1 − p) q z−x (1 − q)n−z+x x z − x x=0 x=0 m z

n−z

= (1 − p) q (1 − q)

$ $# z # $# " m n p(1 − q) x q(1 − p) x z−x x=0

and pX+Y (z) = 0 otherwise. b) If p = q, then from Proposition 6.20, X + Y ∼ B(m + n, p). c) Letting p = q in (a), we have m z

n−z

pX+Y (z) = (1 − p) p (1 − p)

$ $# z # $# " m n p(1 − p) x p(1 − p) x z−x x=0

6.6 Sums of Discrete Random Variables

z

m+n−z

= p (1 − p)

200

$ # $ z # $# " m n m+n z = p (1 − p)m+n−z , x z−x z 'x=0 () * m+n =( z )

by the Vandermonde’s identity, and thus X + Y ∼ B(m + n, p).
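A short Python check (illustrative names) of Exercise 6.111: the PMF of the sum of two independent binomials with different success probabilities is computed by convolution; when p = q it collapses to the B(m + n, p) PMF.

```python
from math import comb

def binom_pmf(k, n, p):
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

def sum_pmf(z, m, p, n, q):
    """p_{X+Y}(z) for independent X ~ B(m, p), Y ~ B(n, q)."""
    return sum(binom_pmf(x, m, p) * binom_pmf(z - x, n, q) for x in range(z + 1))

m, n = 4, 6
# unequal p and q: the sum is not binomial in general
print([round(sum_pmf(z, m, 0.3, n, 0.7), 4) for z in range(m + n + 1)])
# equal p = q: the convolution agrees with B(m + n, p)
print([round(sum_pmf(z, m, 0.4, n, 0.4), 4) for z in range(m + n + 1)])
print([round(binom_pmf(z, m + n, 0.4), 4) for z in range(m + n + 1)])
```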

6.112 a) The total claim amount distribution is compound Poisson.
b) The answers will vary.

Theory Exercises

6.113 Applying Proposition 6.16 to the function g(x, y) = x + y gives

pX+Y(z) = Σ_{(x,y): x+y=z} pX,Y(x, y) = Σ_y pX,Y(z − y, y).

6.114 a) If X and Y are independent Poisson random variables with parameters λ and µ, then so is their sum, with parameter λ + µ. In other words, if X counts the number of arrivals when the arrival rate is λ, and Y counts the number of arrivals when the arrival rate is µ, then the total number of arrivals will have arrival rate λ + µ.
b) Using the binomial theorem, we have

P(X + Y = z) = Σ_{x=0}^{z} P(X = x, Y = z − x) = Σ_{x=0}^{z} P(X = x)P(Y = z − x)
= Σ_{x=0}^{z} (e^{−λ}λ^x/x!) · (e^{−µ}µ^{z−x}/(z − x)!) = (e^{−(λ+µ)}/z!) Σ_{x=0}^{z} C(z, x)λ^x µ^{z−x}
= e^{−(λ+µ)}(λ + µ)^z/z!, z = 0, 1, . . . .

Hence, X + Y ∼ P(λ + µ).

6.115 Using the binomial series formula (5.45), we have that for |t| < 1,

(1 + t)^a (1 + t)^b = (Σ_{j=0}^{∞} C(a, j)t^j)(Σ_{k=0}^{∞} C(b, k)t^k) = Σ_{j=0}^{∞} Σ_{k=0}^{∞} C(a, j)C(b, k)t^{j+k}
= Σ_{ℓ=0}^{∞} ( Σ_{j=0}^{ℓ} C(a, j)C(b, ℓ − j) ) t^ℓ;

on the other hand,

(1 + t)^a (1 + t)^b = (1 + t)^{a+b} = Σ_{ℓ=0}^{∞} C(a + b, ℓ)t^ℓ.

Thus we have the following equality of power series (in t):

Σ_{ℓ=0}^{∞} ( Σ_{j=0}^{ℓ} C(a, j)C(b, ℓ − j) ) t^ℓ = Σ_{ℓ=0}^{∞} C(a + b, ℓ)t^ℓ,

implying that the coefficients in front of t^ℓ on both sides of the equality are the same; namely, for ℓ = 0, 1, · · · ,

Σ_{j=0}^{ℓ} C(a, j)C(b, ℓ − j) = C(a + b, ℓ).

6.116 a) The sum of two independent negative binomial random variables with parameters (r, p) and (s, p) is again a negative binomial random variable, with parameters (r + s, p).
b) If X and Y are independent negative binomial random variables where X counts the number of trials until the rth success and Y counts the number of trials until the sth success, then their sum will count the number of trials until the (r + s)th success.
c) Let Z = X + Y. Then, using the representation of the negative binomial PMF given by equation (5.44),

pZ(z) = P(X + Y = z) = Σ_{x=r}^{z−s} P(X = x, Y = z − x)
= Σ_{x=r}^{z−s} C(−r, x − r)p^r(1 − p)^{x−r} C(−s, z − x − s)p^s(1 − p)^{z−x−s}
= p^{r+s}(1 − p)^{z−(r+s)} Σ_{x=0}^{z−(r+s)} C(−r, x)C(−s, z − (r + s) − x)
= p^{r+s}(1 − p)^{z−(r+s)} C(−(r + s), z − (r + s)),

where the last step follows from Exercise 6.115. Therefore, X + Y ∼ NB(r + s, p).
6.117 Proposition 6.20(a) was proved in Exercise 6.97(c), and Proposition 6.20(b) was proved in Exercise 6.98(b). For Proposition 6.20(c) we have the following.
Claim: X1 + . . . + Xm ∼ NB(r1 + . . . + rm, p). Proof is by induction. By Exercise 6.116, the claim is true for m = 2. Assume that the claim is true for all m ≤ k; let us show that the claim also holds for m = k + 1. Let Y = X1 + . . . + Xk. Since Xk+1 is independent of X1, . . . , Xk, it is independent of a function of X1, . . . , Xk, namely their sum Y. Thus Xk+1 and Y are two independent negative binomial random variables with identical probabilities of success and numbers of successes rk+1 and r1 + . . . + rk, respectively. Then X1 + . . . + Xk + Xk+1 = Xk+1 + Y ∼ NB(r1 + . . . + rk+1, p), and therefore the claim is true for m = k + 1. Thus, by induction, the claim is true for all m ∈ N.

Advanced Exercises

6.118 We have

pX+Y+Z(w) = ΣΣΣ_{x+y+z=w} pX,Y,Z(x, y, z) = ΣΣΣ_{x+y+z=w} pX(x)pY(y)pZ(z)
= Σ_x pX(x) ( Σ_y pY(y)pZ(w − x − y) ) = Σ_x pX(x)pY+Z(w − x),


by Proposition 6.18.
6.119 a) Note that for all w ∈ R,

{X1 + . . . + Xm = w} = ∪_{(x1,...,xm−1)} {X1 = x1, . . . , Xm−1 = xm−1, Xm = w − (x1 + . . . + xm−1)},

and the desired result follows from countable additivity of the probability measure and the fact that the set of (x1, · · · , xm−1) where pX1,··· ,Xm−1(x1, · · · , xm−1) > 0 is countable.
b) In addition to (a), we have for i = 1, · · · , m − 1,

pX1+···+Xm(w) = Σ_{(x1,··· ,xi−1,xi+1,··· ,xm)} pX1,··· ,Xm( x1, · · · , xi−1, w − Σ_{k∈{1,··· ,m}\{i}} xk, xi+1, · · · , xm ).

c) If X1, · · · , Xm are independent, then for i = 1, · · · , m,

pX1+···+Xm(w) = Σ_{(x1,··· ,xi−1,xi+1,··· ,xm)} ( ∏_{k∈{1,··· ,m}\{i}} pXk(xk) ) pXi( w − Σ_{k∈{1,··· ,m}\{i}} xk ).

d) For three random variables X, Y, Z, we have

pX+Y+Z(w) = Σ_{(x,y)} pX,Y,Z(x, y, w − (x + y)),
or pX+Y+Z(w) = Σ_{(x,z)} pX,Y,Z(x, w − (x + z), z),
or pX+Y+Z(w) = Σ_{(y,z)} pX,Y,Z(w − (y + z), y, z).

If X, Y, Z are independent, then

pX+Y+Z(w) = Σ_{(x,y)} pX(x)pY(y)pZ(w − x − y),
or pX+Y+Z(w) = Σ_{(x,z)} pX(x)pY(w − x − z)pZ(z),
or pX+Y+Z(w) = Σ_{(y,z)} pX(w − y − z)pY(y)pZ(z).

6.120 For m = 2, 3, . . ., by independence of X1, · · · , Xm and Exercise 6.119(c), we have

pX1+···+Xm(w) = Σ · · · Σ_{(x1,··· ,xm−1)} ( ∏_{k=1}^{m−1} pXk(xk) ) pXm(w − (x1 + · · · + xm−1))
= Σ · · · Σ_{(x1,··· ,xm−1)} pX1,··· ,Xm−1(x1, · · · , xm−1) pXm(w − (x1 + · · · + xm−1))
= Σ_x ( Σ · · · Σ_{x1+...+xm−1=x} pX1,··· ,Xm−1(x1, · · · , xm−1) ) pXm(w − x)
= Σ_x ( Σ · · · Σ_{(x1,...,xm−2)} pX1,··· ,Xm−1(x1, · · · , xm−2, x − (x1 + · · · + xm−2)) ) pXm(w − x)
= Σ_x pX1+···+Xm−1(x) pXm(w − x), for each w ∈ R,

where the inner sum equals pX1+···+Xm−1(x) by Exercise 6.119(a).

6.121 The PMF of X + Y + Z is given by:

w              1     2     3     4     5
pX+Y+Z(w)    4/63   2/9   1/3  17/63   1/9

6.7 Review Exercises

Basic Exercises

6.122 a)

                 College, Y
Gender, X        1       2      pX(x)
    1          0.171   0.257    0.429
    2          0.229   0.343    0.571
  pY(y)        0.4     0.6      1.0

b) Because the first column must sum to 14, and there are 15 different combinations of nonnegative integers that sum to 14, there are 15 different ways of filling in the blanks.
c) Since there are 15 different ways of filling in the table, there are 15 different joint PMFs that have the same pair of marginal PMFs.
6.123 a) pX,Y(3, 6) = 35/250 = 0.14, which is the probability that a family has 3 televisions and 6 radios.
b)

                               Radios, Y
Televisions, X      3      4      5      6      7      8     pX(x)
      0           0.000  0.004  0.004  0.004  0.004  0.000   0.016
      1           0.000  0.024  0.060  0.072  0.008  0.000   0.164
      2           0.004  0.032  0.128  0.176  0.056  0.004   0.400
      3           0.000  0.024  0.148  0.140  0.036  0.004   0.352
      4           0.000  0.004  0.028  0.024  0.012  0.000   0.068
    pY(y)         0.004  0.088  0.368  0.416  0.116  0.008   1.000

c) {2X = Y } and P (2X = Y ) = pX,Y (2, 4) + pX,Y (3, 6) + pX,Y (4, 8) = 0.172


d) {Y ≥ X + 5} and P(X + 5 ≤ Y) = pX,Y(0, 5) + pX,Y(0, 6) + pX,Y(0, 7) + pX,Y(0, 8) + pX,Y(1, 6) + pX,Y(1, 7) + pX,Y(1, 8) + pX,Y(2, 7) + pX,Y(2, 8) + pX,Y(3, 8) = 0.156
6.124 a) See the cells in the table in (b) below, where pY|X(y|x) = pX,Y(x, y)/pX(x) and represents the probability of a family owning y radios given that we know they own x televisions.
b)

                               Radios, Y
Televisions, X      3      4      5      6      7      8     Total
      0           0.000  0.250  0.250  0.250  0.250  0.000   1.000
      1           0.000  0.146  0.366  0.439  0.049  0.000   1.000
      2           0.010  0.080  0.320  0.440  0.140  0.010   1.000
      3           0.000  0.068  0.420  0.398  0.102  0.011   1.000
      4           0.000  0.059  0.412  0.353  0.176  0.000   1.000
    pY(y)         0.004  0.088  0.368  0.416  0.116  0.008   1.000

c) P(Y ≤ 5 | X = 2) = pY|X(3|2) + pY|X(4|2) + pY|X(5|2) = 0.41
d) See the cells in the table in (e) below, where pX|Y(x|y) = pX,Y(x, y)/pY(y) and represents the probability of a family owning x televisions given that we know they own y radios.
e)

                               Radios, Y
Televisions, X      3      4      5      6      7      8     pX(x)
      0           0.000  0.045  0.011  0.010  0.034  0.000   0.016
      1           0.000  0.273  0.163  0.173  0.069  0.000   0.164
      2           1.000  0.364  0.348  0.423  0.483  0.500   0.400
      3           0.000  0.273  0.402  0.337  0.310  0.500   0.352
      4           0.000  0.045  0.076  0.058  0.103  0.000   0.068
    Total         1.000  1.000  1.000  1.000  1.000  1.000   1.000

f) P(X ≥ 3 | Y = 5) = pX|Y(3|5) + pX|Y(4|5) = 0.478, i.e. approximately 47.8% of the town's families with five radios have at least three televisions.
6.125 a) Clearly, pY|X(0 | 0) = 1 and, for x = 1, · · · , n, (Y | X = x) ∼ B(x, α).
b) pX,Y(x, y) = pY|X(y|x)pX(x) = C(x, y)α^y(1 − α)^{x−y} C(n, x)p^x(1 − p)^{n−x} for all y = 0, . . . , x and x = 0, . . . , n, and pX,Y(x, y) = 0 otherwise.
c) The marginal PMF of Y is given by:

pY(y) = Σ_x pX,Y(x, y) = Σ_{x=y}^{n} C(x, y)α^y(1 − α)^{x−y} C(n, x)p^x(1 − p)^{n−x}
= Σ_{x=y}^{n} (x!/(y!(x − y)!)) (n!/(x!(n − x)!)) α^y p^x (1 − α)^{x−y}(1 − p)^{n−x}
= (n!/(y!(n − y)!)) α^y p^y Σ_{x=0}^{n−y} ((n − y)!/(x!(n − y − x)!)) p^x(1 − α)^x(1 − p)^{n−y−x}
= C(n, y)(αp)^y Σ_{x=0}^{n−y} C(n − y, x)(p(1 − α))^x(1 − p)^{n−y−x}
= C(n, y)(αp)^y (p(1 − α) + (1 − p))^{n−y} = C(n, y)(αp)^y(1 − αp)^{n−y}, y = 0, · · · , n.

Thus Y ∼ B(n, αp).
d) The required conditional PMF is given by: for x = y, · · · , n,

pX|Y(x | y) = [ C(x, y)α^y(1 − α)^{x−y} C(n, x)p^x(1 − p)^{n−x} ] / [ C(n, y)(αp)^y(1 − αp)^{n−y} ] = C(n − y, x − y) r^{x−y}(1 − r)^{n−x},

where r = (1 − α)p/(1 − αp).

e) Using (d), we obtain that

pX−Y|Y(z | y) = P(X − Y = z | Y = y) = P(X = z + y | Y = y) = pX|Y(z + y | y) = C(n − y, z) r^z(1 − r)^{n−y−z},

where r = (1 − α)p/(1 − αp). Thus (X − Y | Y = y) ∼ B(n − y, r).
6.126 Let X, Y, Z be the number of holes that Jan wins, that Jean wins, and that are tied, respectively. Then X, Y, Z have a multinomial distribution with parameters n = 18 and p, q, r. Thus for nonnegative integers x, y, z with x + y + z = 18,

pX,Y,Z(x, y, z) = (18!/(x! y! z!)) p^x q^y r^z,

and pX,Y,Z(x, y, z) = 0 otherwise.
6.127 Because the distribution is symmetric,

P(|X − Y| ≥ 3) = P(X ≥ 3 + Y) + P(Y ≥ 3 + X) = 2P(X ≥ 3 + Y) = 2 Σ_{y=1}^{∞} Σ_{x=3+y}^{∞} p²(1 − p)^{x+y−2} = 2(1 − p)³/(2 − p).

6.128 a) For z = 0, P(|X − Y| = 0) = Σ_{i=1}^{∞} P(X = Y = i) = Σ_{i=1}^{∞} p²(1 − p)^{2i−2} = p² Σ_{i=0}^{∞} ((1 − p)²)^i = p²/(2p − p²) = p/(2 − p). Again by symmetry, for z = 1, 2, . . ., P(|X − Y| = z) = P(X = Y + z) + P(Y = X + z) = 2P(X = Y + z). Now

P(X = Y + z) = Σ_{y=1}^{∞} P(X = y + z, Y = y) = Σ_{y=1}^{∞} p²(1 − p)^{2y+z−2} = p(1 − p)^z/(2 − p).

Therefore, for z ∈ N, P(|X − Y| = z) = 2p(1 − p)^z/(2 − p).
b) P(|X − Y| ≥ 3) = 1 − {P(|X − Y| = 0) + P(|X − Y| = 1) + P(|X − Y| = 2)} = 2(1 − p)³/(2 − p).
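The following Python sketch (illustrative, with an assumed value of p) simulates two independent G(p) random variables and compares the empirical value of P(|X − Y| ≥ 3) with the closed form 2(1 − p)³/(2 − p) obtained in Exercises 6.127 and 6.128.

```python
import random

def geometric(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

p, trials = 0.3, 200_000
rng = random.Random(7)
count = sum(abs(geometric(p, rng) - geometric(p, rng)) >= 3 for _ in range(trials))
print(count / trials)              # empirical estimate
print(2 * (1 - p)**3 / (2 - p))    # exact value, about 0.4035 for p = 0.3
```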


6.129 a) Since the total area of the target is 36 square feet and the area of the bulls-eye is π square feet, the probability of our archer hitting the bulls-eye is p = π/36. The area of the 5-point ring is 3π, thus the probability of our archer hitting the 5-point ring is q = π/12. The probability of hitting the remaining region is r = 1 − p − q. Thus the joint PMF of the random variables X, Y, and Z is multinomial with parameters 4, p, q, and r.
b) Since each shot is independent, X ∼ B(4, p), Y ∼ B(4, q) and Z ∼ B(4, 1 − p − q).
c) The probability that the archer hits either the bulls-eye or the 5-point ring is p + q. Thus, X + Y ∼ B(4, p + q). X + Y represents the number of shots fired that score some positive amount of points.
d) T = 10X + 5Y + 0 · Z = 10X + 5Y. The PMF of T is given by:

t         0          5          10         15         20
pT(t)   0.1795346  0.2888283  0.2705220  0.1628840  0.0707783

t         25         30         35         40
pT(t)   0.0218368  0.0048621  0.0006959  0.0000580
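A small Python sketch (assumed structure, for illustration) that recomputes the PMF of T = 10X + 5Y from the multinomial model of Exercise 6.129 by summing multinomial probabilities; it reproduces the table above.

```python
from math import pi, factorial

p = pi / 36          # bulls-eye
q = pi / 12          # 5-point ring
r = 1 - p - q        # everything else
n = 4                # four shots

def multinomial(x, y, z):
    return factorial(n) // (factorial(x) * factorial(y) * factorial(z)) * p**x * q**y * r**z

pmf_T = {}
for x in range(n + 1):
    for y in range(n + 1 - x):
        z = n - x - y
        t = 10 * x + 5 * y
        pmf_T[t] = pmf_T.get(t, 0.0) + multinomial(x, y, z)

for t in sorted(pmf_T):
    print(t, round(pmf_T[t], 7))
```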

6.130 a) Using the multivariate form of the FPF and the binomial formula,

pX1,X2,X3+X4(x1, x2, z) = Σ_{y=0}^{z} P(X1 = x1, X2 = x2, X3 = y, X4 = z − y)
= Σ_{y=0}^{z} (n!/(x1! x2! y! (z − y)!)) p1^{x1} p2^{x2} p3^y p4^{z−y}
= (n!/(x1! x2! z!)) p1^{x1} p2^{x2} Σ_{y=0}^{z} (z!/(y!(z − y)!)) p3^y p4^{z−y}
= (n!/(x1! x2! z!)) p1^{x1} p2^{x2} (p3 + p4)^z.

Thus X1, X2, X3 + X4 have a multinomial distribution with parameters n, p1, p2, and p3 + p4.
b) Logically, we can group X3 and X4 and consider the sum as a new variable, which represents the number of selected members of the population that have the 3rd or 4th attribute. Then p3 + p4 represents the proportion of the population that has either the 3rd or 4th attribute. Hence we have three variables X1, X2, X3 + X4, and X1, X2, X3 + X4 have a multinomial distribution with parameters n and p1, p2, p3 + p4.
6.131 a) For notational purposes, let U = U0.3 and V = U0.6.

                 V
   U          0      1      2     pU(u)
   0        0.16   0.24   0.09    0.49
   1        0.00   0.24   0.18    0.42
   2        0.00   0.00   0.09    0.09
 pV(v)      0.16   0.48   0.36    1.00

For the PMF of V, refer to the last row of the above table.
b) For the PMF of U, refer to the last column of the above table.


c) For the joint PMF of U, V refer to the cells of the above table.
d) The conditional PMFs of U for each possible value of V are provided in the columns of the following table.

                 V
   U          0      1      2     pU(u)
   0        1.00   0.50   0.25    0.49
   1        0.00   0.50   0.50    0.42
   2        0.00   0.00   0.25    0.09
 Total      1.00   1.00   1.00    1.00

e) The conditional PMFs of V for each possible value of U are provided in the rows of the following table.

                 V
   U          0       1       2      Total
   0        0.327   0.490   0.184    1.000
   1        0.000   0.571   0.429    1.000
   2        0.000   0.000   1.000    1.000
 pV(v)      0.160   0.480   0.360    1.000

6.132 a) Let X, Y and Z represent the number of Republicans, Democrats and Independents selected to serve on the committee, respectively. Then X, Y, Z have a multiple hypergeometric distribution with parameters N = 100, n = 10, pX = 0.5, pY = 0.3 and pZ = 0.2. The probability that the committee will be perfectly representative is

P(X = 5, Y = 3, Z = 2) = C(50, 5)C(30, 3)C(20, 2)/C(100, 10) ≈ 0.0944.

b) The probability that the committee will be close to being perfectly representitive is given by P (4 ≤ X ≤ 6, 2 ≤ Y ≤ 4, 1 ≤ Z ≤ 3) = pX,Y,Z (4, 3, 3) + pX,Y,Z (4, 4, 2) + pX,Y,Z (5, 2, 3) + pX,Y,Z (5, 3, 2) + pX,Y,Z (5, 4, 1) + pX,Y,Z (6, 2, 2) + pX,Y,Z (6, 3, 1) = 0.503 6.133 a) Let X1 , . . . , X7 represent the number of red, orange, yellow, green, blue, violet, and brown N&Ns selected from a packet respectively. Then X1 , . . . , X7 has a multiple hy1 6 p3 = 35 pergeometric distribution with parameters N = 70, n = 7, p1 = 71 , p2 = 10 4 4 3 p5 = 17 p6 = 35 and p7 = 35 . The probability that all seven colors are chosen is: p4 = 14 P (X1 = X2 = X3 = X4 = X5 = X6 = X7 = 1) = 0.00673. b) The probability that only one color is chosen is: P (X1 = 7) + P (X2 = 7) + P (X3 = 7) + P (X4 = 7) + P (X5 = 7) + P (X6 = 7) + P (X7 = 7) = 0.00000624. c) P (X1 = 2, X3 = 2, X4 = 2, X5 = 1) = 0.00260 6.134 From example 6.25 we know that the distribution of the number of serious accidents during a weekday is Poisson with parameter 2.4 × 0.6 = 1.44 and the distribution of the number of serious accidents during a weekend day is Poisson with parameter 1.3 × 0.4 = 0.52. Thus, using Proposition 6.20(b), the distribution of serious accidents during a week is Poisson with


parameter 1.44 + 1.44 + 1.44 + 1.44 + 1.44 (the 5 weekdays) + 0.52 + 0.52 (Saturday and Sunday) = 8.24.
6.135 a) The marginal PMF of X is

pX(x) = Σ_y pX,Y(x, y) = Σ_{y=0}^{∞} x^y e^{−x}/(y! M) = (e^{−x}/M) Σ_{y=0}^{∞} x^y/y! = 1/M

"

pX,Y (x, y) =

x

M " xy e−x x=1

y!M

=

M 1 " y −x x e y!M x=1

for y = 0, . . . and pY (y) = 0 otherwise. p (x,y) e−x xy c) pY |X (y|x) = X,Y y! . Thus (Y | X = x) ∼ P(x). pX (x) = d) X and Y are not independent random variables since the conditional distribution of Y given X = x is not the same as the marginal distribution of Y . 6.136 Let X1 , X2 , X3 and X4 denote, respectively, the numbers of the n items sampled that are in the four specified categories (nondefective-uninspected, nondefective-inspected, defectiveuninspected, and defective-inspected). Then the random variables X1 , X2 , X3 , X4 have the multinomial distribution with parameters n and (1 − p)(1 − q), (1 − p)q, p(1 − q), pq. 6.137 a) If the sample is taken without replacement, then X1 , X2 , X3 , X4 have the multiple hypergeometric distribution with parameters N, n and (1 − p)(1 − q), (1 − p)q, p(1 − q), pq. b) It is appropriate to use the multinomial distribution to approximate the multiple hypergeometric distribution when the population (lot) size N is large relative to the sample size n. 6.138 The conditional distribution of the number of undetected defectives, given that the number of detected defectives equals k, is binomial with parameters n − k and p(1−q) 1−pq , since P (X3 = x|X4 = k) = (n − k)! = x!(n − x − k)!

#

%

&

n x,k,n−x−k

p(1 − q) 1 − pq

$x #

(p(1 − q))x (pq)k (1 − p)n−x−k % n& k n−k k (pq) (1 − pq)

1−p 1 − pq

$n−k−x

, x = 0, · · · , n − k.

6.139 a) The conditional PMF of the number of people, N , entering the bank during the specified time interval, given the number of people, X, making a deposit is % & e−λ λn n x n−x P (X = x, N = n) n! x p (1 − p) = P (N = n | X = x) = e−pλ (pλ)x P (X = x) x!

=

e−(1−p)λ ((1 − p)λ)n−x e−λ λn n!px (1 − p)n−x x! = , n = x, x + 1, · · · . n!x!(n − x)!e−pλ (pλ)x (n − x)!


b) From above, we see (N − X | X = x) ∼ P((1 − p)λ). c) X and N − X are independent since the answer in (b) does not depend on x. d) The number of people who make a deposit, X, and the number of people who don’t make a deposit, N − X, are independent. 6.140 The distribution of the total number of people at the car wash is Poisson with parameter λ + µ + ν (Proposition 6.20). Thus the conditional joint distribution of the hourly numbers of customers requesting the three options, given the total number n of customers during the hour, µ ν λ , λ+µ+ν , λ+µ+ν is multinomial with parameters n and λ+µ+ν 6.141 Let X be the number of successes in n trials, and let Y be the number of the trial on which the first success occurs. Note that the probability of k − 1 successes in n − y trials is %n−y& k−1 p (1 − p)n−y−k+1 = P (X = k|Y = y). Thus, for k = 0, · · · , n, k−1 P (X = k|Y = y)P (Y = y) P (X = k, Y = y) = P (X = k) P (X = k) 3% & 4% & %n−y& n−y k−1 (1 − p)n−y−k+1 p(1 − p)y−1 k−1 p % n& %n& , y = 1, · · · , n − k + 1. = = k−1 k (1 − p)n−k p k k P (Y = y|X = k) =

1 pY | X (y | x)pX (x) 10 −x , 6.142 a) For y = 0, 1, · · · , 9, we have pX|Y (x|y) = !y = ! 1 y k=0 pY | X (y | k)pX (k) k=0 10 − k where x = 0, · · · , y, since pX (x) = 1/10 for all x = 0, · · · , 9 and pY | X (y | x) = 1/(10 − x) for y = x, x + 1, · · · , 9 and x ∈ {0, · · · , 9}. b) The PMF of (X | Y = 6) is given by: x 0 1 2 3

pX|Y (x|6) 0.0913 0.1014 0.1141 0.1304

x 4 5 6

pX|Y (x|6) 0.1521 0.1825 0.2282

6.143 a) If T = n, then X1 through Xn−1 must not exceed m. Additionally, we must also have that Xn > m. Thus, Xn ∈ {m + 1, . . . , N }. Thus {T = n} is the union of the events {X1 ≤ m, . . . , Xn−1 ≤ m, Xn = k} for k = m + 1, . . . , N . % m &n−1 1 b) The probability of the event {X1 ≤ m, . . . , Xn−1 ≤ m, Xn = k} is N · N . Thus P (T = n) =

N "

P (X1 ≤ m, . . . , Xn−1 ≤ m, Xn = k) =

k=m+1

= %

m N

&

N 3 m 4n−1 1 " · N N

k=m+1

3 m 4n−1 N

#

N −m N

$

3 m 4 3 m 4n−1 = 1− N N

Therefore, T ∼ G 1 − c) Consider the random experiment of selecting a number at random from the first N positive


integers. Let the specified event (i.e., a success) be that the number selected exceeds m. Indem pendent repititions of this experiment constitute Bernoulli trials with success probability 1 − N . The random variable T is the time of the first success, which has the geometric distribution m . with parameter 1 − N 6.144 Using the law of total probability, P (U = u) =

N "

P (U = u, X1 = k) =

k=1

N "

P (X1 = k)P (U = u | X1 = k).

k=1

Thus, for u = 2, P (U = 2) =

N " k=1

=

N " 1 P (X1 = k)P (U = 2 | X1 = k) = · P (X2 ≥ k) N k=1

N " 1 N −k+1 1 1 N (N + 1) N +1 · = 2 (N + (N − 1) + · · · + 1) = 2 · = . N N N N 2 2N k=1

For u = 3, . . ., we have that P (U = u, X1 = 1) = 0 and P (U = u) =

N "

P (U = u, X1 = k) =

k=1

N "

P (U = u, X1 = k) =

k=2

N "

P (X1 = k)P (U = u|X1 = k)

k=2

$# $ N N # k − 1 u−2 1 " 1 " k−1 = P (X2 < k, . . . , Xu−1 < k, Xu ≥ k) = 1− N N N N k=2

k=2

and P (U = u) = 0 otherwise.

λ

j , we have that the condition joint PMF 6.145 Folowing exercise 6.140, and letting pj = λ1 +...+λ m of X1 , . . . , Xm given that X1 + . . . + Xm = z is multinomial with parameters z and p1 , . . . , pm .

6.146 a) The marginal PMF of Xm is Binomial with parameters n and pm . b) The conditional PMF of X1 , . . . , Xm−1 given Xm = xm is multinomial with parameters pm−1 p1 n − xm and 1−p , . . . , 1−p m m 6.147 a) Assuming the two factories are independent, by Proposition 6.20, the total number of defective components per day is Binomial with parameters m + n and p. b) Let X represent the number of defective components from Factory A and Z represent the total number of defective components from both factories combined. Then the conditional probability distribution of the number of defective components per day from Factory A, given the combined total of defective components is P (X = x, Y = z − x) P (Z = z) %m& x % & %m&% n & m−x n pz−x (1 − p)n−z+x p (1 − p) z−x %m+n& z−x & = x = x%m+n z (1 − p)m+n−z p z z pX|Z (x|z) = P (X = x|Z = z) =

Thus X|Z = z has a hypergeometric distribution with parameters m + n, z and

m m+n .


6.148 a) If l represents the daily output from Factory C, then by Proposition 6.20, the total number of defective components per day is Binomial with parameters m + n + l and p. m b) X|Z = z has a hypergeometric distribution with parameters m + n + l, z and m+n+l . 6.149 a) Using Exercise 6.31, the conditional joint PMF of X1 , . . . , Xm given X1 +. . .+Xm = z is multiple hypergeometric with parameters N, z and nN1 , . . . , nNm , where N = n1 + · · · + nm . One can also obtain the required answer by noting that X1 + · · · + Xm has B(n1 + · · · + nm , p) distribution, thus, for all xi ∈ {1, · · · , ni }, i = 1, . . . , m, P (X1 = x1 , ···, Xm = xm |X1 + ··· + Xm = z) =

P (X1 = x1 , ..., Xm = xm , X1 + ··· + Xm = z) P (X1 + ··· + Xm = z)

⎧ 0, if z ̸= x1 + · · · + xm , ⎪ ⎨ % & % & n n x n −x x n −x m 1 m 1 1 1 × ··· × = (1 − p) m m x p (1 − p) xm p ⎪ %n1 +···+nm & , if z = x1 + ··· + xm ; ⎩ 1 pz (1 − p)n1 +···+nm −z z ⎧ 0, if z ̸= x1 + · · · + xm , ⎪ ⎨ % & % & n n m 1 = x × ... × x ⎪ ⎩ 1 %n1 +···+nm & m , if z = x1 + ··· + xm . z

j th

b) Using the fact that the univariate marginal PMF of a multiple hypergeometric distribution n1 with parameters N, z and N , . . . , nNm has the hypergeometric distribution with parameters N, z n and Nj , the required results immediately follow. 6.150 a) By definition, Xj is Bernoulli with parameter p. b) If X = x, then we need to have had x successes in n trials, thus Xj is Bernoulli with parameter nx . c) Using the definition of conditional PMF, we have P (Xj = k|X = x) =

P (Xj = k, X = x) P (X = x|Xj = k)P (Xj = k) = . P (X = x) P (X = x)

For k = 0, we have P (Xj = 0|X = x) =

%%n−1& x

For k = 1, we have

& px (1 − p)n − x − 1 (1 − p) x %n& =1− . x n−x n x p (1 − p)

3% & 4 n−1 x−1 n−x p p (1 − p) x−1 x %n& P (Xj = 1|X = x) = = . x n−x n x p (1 − p)

Therefore, the desired result is proved. Theory Exercises

6.151 a) Note that for x1 , · · · , xm ∈ N , P (X1 = x1 , . . . , Xm = xm ) = P (F . . F* S F . . F* SF . . . S F . . F* S) ' .() ' .() ' .() x1 −1

x2 −1

xm −1


% &% & % & = (1 − p)x1 −1 p (1 − p)x2 −1 p . . . (1 − p)xm −1 p ,

and pX1 ,··· ,Xm (x1 , · · · , xm ) = 0 otherwise. Thus, by Exercise 6.60(c), X1 , . . . , Xm are independent random variables, where pXi (xi ) = (1 − p)xi −1 p for xi ∈ N and pXi (xi ) = 0 for xi ∈ / N, i = 1, . . . , m. Therefore, X1 , . . . Xm are independent identically distributed geometric random variables with parameter p. b) Clearly, X = X1 + · · · + Xr , where, by (a), X1 , . . . , Xr are independent geometric random variables with parameter p. c) Using (a) and (b), if X ∼ N B(r, p) and Y ∼ N B(s, p) and are independent, then X = X1 + · · · + Xr and Y = Y1 + · · · + Ys for some independent geometric random variables X1 , . . . , Xr , Y1 , . . . , Ys with parameter p. Then their sum X + Y can be written as the sum of r + s independent geometric random variables with success parameter p and represents the number of Bernoulli trials up to and including the (r+s)th success. Thus, X +Y ∼ N B(r+s, p).

Advanced Exercises 6.152 a) Since SN is the sum of independent identically distributed random variables, SN has a compound distribution. b) Yes. Since N has a Poisson distribution, SN is has a compound Poisson distribution. c) SN is the total number of trials needed to achieve the N th success. d) Using Exercise 6.151 (a), we have that # $ ∞ x " " e−λ λn x − 1 n p (1 − p)x−n , x ∈ N , pSN (x) = pN (n)pSn (x) = n! n−1 n=1

n=0

and pSN (x) = 0 for all / N. D x∈ −λ e) P (SN = 3) = e λp(1 − p)2 + λ2 p2 (1 − p) +

λ3 3 6 p

E

.

6.153 a) Considering the joint PMF of the random variables X1 , . . . Xk for x1 , . . . , xk , let y = n − (x1 + . . . + xk ) and q = 1 − (p1 + . . . + pk ). Then, # $ n P (X1 = x1 , . . . , Xk = xk ) = px1 · · · pxk k q y x1 , . . . , xk , y 1 for x1 , . . . , xk ∈ Z+ such that x1 + · · · + xk ∈ {0, . . . , n}, and pX1 ,...,Xk (x1 , . . . , xk ) = 0 otherwise. b) The conditional distribution of X1 , . . . , Xm−2 given Xm−1 = xm−1 and Xm = xm is multipm−2 p1 nomial with parameters n − (xm−1 + xm ) and 1−(pm−1 +pm ) , . . . , 1−(pm−1 +pm ) . 6.154 a) T ∼ G(1 − pX (0)), since T counts the number of Bernoulli trials until the 1st success, where “success”={X ̸= 0}. In particular, this implies that T is finite with probability one. b) XT represents the first strictly positive value obtained in the sequence X1 , X2 , . . . c) For x ∈ N , t ∈ N , we have pXT ,T (x, t) = P (X1 = 0, . . . , Xt−1 = 0, Xt = x) = (pX (0))t−1 pX (x), and pXT ,T (x, t) = 0 otherwise. d) The PMF of XT is: pXT (x) =

∞ " t=1

∞ " P (XT = x, T = t) = (pX (0))t−1 pX (x) = t=1

pX (x) 1 − pX (0)


for x = 1, 2, . . . and pXT (x) = 0 otherwise. e) XT and T are independent since pXT ,T (x, t) = pXt (x)pT (t) for all x and t. 6.155 Since X, Y are both integer-valued, using partitioning of events, we have that P (X ≤ x, Y ≤ y) = P (X = x, Y = y) + P (X = x, Y ≤ y − 1) + P (X ≤ x − 1, Y = y) + P (X ≤ x − 1, Y ≤ y − 1). On the other hand, P (X ≤ x, Y ≤ y − 1) = P (X = x, Y ≤ y − 1) + P (X ≤ x − 1, Y ≤ y − 1), P (X ≤ x − 1, Y ≤ y) = P (X ≤ x − 1, Y = y) + P (X ≤ x − 1, Y ≤ y − 1). Therefore, P (X ≤ x, Y ≤ y)−P (X ≤ x, Y ≤ y−1)−P (X ≤ x−1, Y ≤ y)+P (X ≤ x−1, Y ≤ y−1) = pX,Y (x, y). 6.156 P (U = u, V = v) = P (min{X1 , X2 } = u, max{X3 , X4 } = v) = P (X1 = u, X2 = u, X3 = v, X4 = v) + P (X1 = u, X2 = u, X3 = v, X4 < v) +P (X1 = u, X2 = u, X3 < v, X4 = v) + P (X1 = u, X2 > u, X3 = v, X4 = v) +P (X1 = u, X2 > u, X3 = v, X4 < v) + P (X1 = u, X2 > u, X3 < v, X4 = v) +P (X1 > u, X2 = u, X3 = v, X4 = v) + P (X1 > u, X2 = u, X3 = v, X4 < v) +P (X1 > u, X2 = u, X3 < v, X4 = v) 1 = 4 (1 + v + v + (9 − u) + v(9 − u) + v(9 − u) + (9 − u) + v(9 − u) + v(9 − u)) 10 19 + 38v − 4vu − 2u 1 + 2v 19 − 2u = = · = pV (v)pU (u). 4 10 100 100 6.157 Note that, when u, v ∈ {0, 1, . . . , N } and u < v, then P (U = u, V = v) = P (X = u, Y = v) + P (X = v, Y = u) =

2 ; (N + 1)2

if u = v ∈ {0, 1, . . . , N }, then P (U = u, V = u) =

1 , (N + 1)2

and P (U = u, V = v) = 0 for all other values of (u, v). Thus, the joint PMF of U and V is: ⎧ 1 ⎪ , if u = v ∈ {0, 1, . . . , N }, ⎪ ⎪ ⎪ (N + 1)2 ⎪ ⎨ 2 pU,V (u, v) = , if u < v and u, v ∈ {0, 1, . . . , N }, ⎪ ⎪ (N + 1)2 ⎪ ⎪ ⎪ ⎩ 0, otherwise. 6.158 a) For x1 , . . . , xm−1 ∈ {0, · · · , n}, where 0 < x1 + . . . + xm−1 ≤ n, let x = x1 + . . . + xm−1


and pn = p1n + . . . + p(m−1)n . Also, note that npn → λ1 + . . . + λm−1 as n → ∞. Let λ = λ1 + . . . + λm−1 . Then # $ n xm−1 pX1n ,...,X(m−1),n (x1 , . . . , xm−1 ) = px1 · · · p(m−1)n (1 − pn )n−x x1 , . . . , xm−1 , n − x 1n

where

D 1 n(n − 1)···(n − x + 1) npn En−x x1 xm−1 = 1− , · (np1n ) · · · (np(m−1)n ) x1 ! · · · xm−1 ! nx n

n(n − 1)···(n − x + 1) n n−1 n−x+1 = × × ··· × x n n n n 2 x−1 1 = 1 × (1 − ) × (1 − ) × · · · × (1 − ) −→ 1, as n → ∞. n n n Also, by assumption, for 1 ≤ i ≤ m − 1, (npin )xi → λxi i as n → ∞. Moreover, D

1−

npn E−x D npn En npn En−x D = 1− 1− −→ 1 · e−λ , as n → ∞. n n n

Thus, for all x1 , . . . , xm−1 ∈ {0, · · · , n}, where 0 < x1 + . . . + xm−1 ≤ n, pX1n ,...,X(m−1)n (x1 , . . . , xm−1 ) −→

1 xm−1 −λ λx1 · · · λm−1 e , as n → ∞. x1 ! · · · xm−1 ! 1

Also pX1n ,...,X(m−1)n (0, . . . , 0) = (1 − pn )n = (1 − npnn )n −→ e−λ , thus, the required conclusion follows for all x1 , . . . , xm−1 ∈ {0, · · · , n} such that 0 ≤ x1 + . . . + xm−1 ≤ n. b) For a fixed m, as n grows large with respect to m, the multinomial distribution with parameters n and p1n , . . . , pmn is well approximated by a multivariate Poisson distribution with parameters np1n , . . . , np(m−1)n . c) When m = 2, the multinomial distribution in question is just a binomial distribution, whereas the multivariate Poisson distribution in (a) reduces to a univariate Poisson, thus, the required result follows at once from (a).


Chapter 7

Expected Value of Discrete Random Variables

7.1 From Averages to Expected Values

Basic Exercises

7.1 Using Exercise 5.24, we have

E(X) = Σ_x x P(X = x) = 2 × (1/36) + 3 × (2/36) + 4 × (3/36) + 5 × (4/36) + 6 × (5/36) + 7 × (6/36) + 8 × (5/36) + 9 × (4/36) + 10 × (3/36) + 11 × (2/36) + 12 × (1/36) = 7.
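A short numerical confirmation in Python (illustrative) of the value E(X) = 7 for the sum of two fair dice.

```python
# E(X) for the sum of two balanced dice, computed from its PMF
pmf = {z: (6 - abs(z - 7)) / 36 for z in range(2, 13)}
print(sum(z * p for z, p in pmf.items()))   # 7.0
```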

7.2 Let X be the score of the archer. Then using Exercise 5.27, we have

E(X) = Σ_x x P(X = x) = 5 × (π/12) + 10 × (π/36) = 25π/36.

7.3 a) Using Exercise 5.28, when p = 0.5 we have E(X) = Σ_x x P(X = x) = 1 × 0.4 + 2 × 0.3 = 1.
b) Again using Exercise 5.28, when p = 0.4 we have E(X) = Σ_x x P(X = x) = 1 × 0.4 + 2 × 0.24 = 0.88.
7.4 a) Let X be the number of women in the interview pool. Then, using Exercise 5.29(a), we have

E(X) = Σ_x x P(X = x) = 1 × (5/11) + 2 × (12/33) + 3 × (2/33) = 15/11.

b) Note that X has a hypergeometric distribution with parameters N = 11, n = 3 and p = 5/11. Then, using Table 7.4, we have that E(X) = np = 3 · (5/11) = 15/11.


7.5 Since the total torque acting on the seesaw must be 0, we have the following equation:

Σ_{k=1}^{m} (xk − x̄)pk = 0.

Then Σ_{k=1}^{m} xk pk − Σ_{k=1}^{m} x̄ pk = 0, implying that

Σ_{k=1}^{m} xk P(X = xk) − x̄ Σ_{k=1}^{m} P(X = xk) = 0,

where the last sum equals 1. Therefore, E(X) = x̄.

7.6 a) Note first that ⌦ contains a finite number of elements, say, ⌦ = {!1 , · · · , !N } for some finite N 2 N . For each x 2 R, let Kx be the set of all ! 2 ⌦ such that X(!) = x; also let {x , · · · , xM } (with M  N and all distinct x1 , · · · , xM ) denote the range of X. Then S1M ⌦ = i=1 Kxi and X

X(!)P ({!}) =

M ✓ X X i=1

!2⌦

=

M X

!2Kxi

◆ X ✓ X ◆ M X(!)P ({!}) = xi P ({!})

xi P (Kxi ) =

i=1

i=1

M X i=1

!2Kxi

xi P (X = xi ) = E(X).

b) If there are N equally likely outcomes, then P ({!i }) = P and E(X) = N1 N i=1 X(!i ).

1 N

for each possible outcome !i 2 ⌦

7.7 a) 82.4, since consider a finite population {!1 , · · · , !N } of (arbitrary but fixed) size N 2 N . Let yi be the value of the variable in question for the ith member of the population (i.e. for !i ), i = 1, · · · , N . If X represents the value of the variable for a randomly selected member of the population, then, by Exercise 7.6(b), N N 1 X 1 X E(X) = X(!i ) = yi = y¯ = 82.4 N N i=1

i=1

b) 82.4n. For large n, the average value of X in n independent observations is approximately equal to E(X), implying that the sum of values of X in n independent observations is approximately equal to nE(X). 7.8 a) 0.217242. Let X be the total number of people tested until the first positive test occurs. Then X has a geometric distribution (with some parameter p). Since, on average, the first positive test occurs when the 25th person is tested, then E(X) = 25 = 1/p, implying that p = 1/25. Thus, ◆✓ ◆ 6 ✓ X 1 1 x 1 P (X  6) = 1 = 0.217242 25 25 x=1

b) 125. Let Y be the total number of people tested until the fifth positive test occurs. Then,


Y has N B(5, 1/25) distribution. From Table 7.4, and since the average number of people that the physician must test until obtaining the fifth positive test is the expectation of Y , it follows that r 5 E(Y ) = = 1 = 125. p 25 7.9 18. Since the number of children cured after treatment with the drug has a binomial distribution with parameters n = 20 and p = 0.9, the expected number of children cured equals to np = 18. 7.10 15. Distribution of the number of times that the favorite will finish in the money is binomial with parameters n and 0.67. Additionally, we want the expectation to be at least 10. If X is the number of times that the favorite will finish in the money, then we have np = n0.67 = E(X) 10, implying that n 14.9254, and, since n must be an integer, it follows that n is at least 15. 7.11 1.25; If both parents have the sickle cell trait, then the probability that they will pass it on is 0.25, and the number of children out of 5 who will get it has a binomial distribution with parameters n = 5 and p = 0.25. Thus the expected number of children who will inherit the sickle cell trait is equal to np = 1.25. 7.12 a) Let X represent the number of undergraduate students selected. Then when there is no replacement, X ⇠ H(10, 3, 0.6) and the expected number of underclassmen is E(X) = np = 1.8. b) When there is replacement, X ⇠ B(3, 0.6) and the expected number of underclassmen is E(X) = np = 1.8. c) The results are the same since a binomial random variable with parameters n and p has the same expectation as a hypergeometric random variable with parameters N, n and p. 7.13 From example 5.14, we have that X is hypergeometric with parameters 100, 5 and 0.06. Thus using Table 7.4, E(X) = np = 0.3. This means that on average, when 5 TVs are randomly selected without replacement, we expect to see 0.3 defective TVs. 7.14 a) When sampling without replacement, the distribution of people with the specific attribute is hypergeometric with parameters N , n and p, and the expectation is np. b) When sampling with replacement, the distribution of people with the specific attribute is binomial with parameters n and p, and the expectation is np. 7.15 a) 15. Since X, the number of cars that use the drive-up window (at a given fast-food restaurant) between 5:00 P.M. and 6:00 P.M., is Poisson with some parameter , and, on average, 15 cars use it during that hour, then = E(X) = 15. b) 63.472%, since 18 X e 15 15x P (|X 15|  3) = ⇡ 0.63472 x! x=12

7.16 The means are the same. 7.17 Assuming that all the customers at Downtown Coffee Shop make their selections independently and have the same chance of buying a café mocha with no whipped cream, the expected number of customers (per hour) who buy a café mocha with no whipped cream is 0.3 × 31 = 9.3.


7.18 4/(1 − e⁻⁴). From Exercise 5.85, letting Y be the number of eggs observed in a non-empty nest, we have for y ∈ N,

pY(y) = (e⁻⁴ · 4^y/y!) / (1 − e⁻⁴),

thus,

E(Y) = Σ_y y P(Y = y) = Σ_{y=1}^{∞} y (e⁻⁴ · 4^y/y!) / (1 − e⁻⁴) = 4/(1 − e⁻⁴) ≈ 4.07463.

7.19 If E is an event, then IE ∼ B(1, P(E)). Therefore, E(IE) = 1 · P(E) = P(E).
7.20 a) Let X be the number of at bats until the first hit. Then X ∼ G(0.260), and E(X) = 1/0.260 ≈ 3.84615.
b) Let Y be the number of at bats until the second hit. Then Y ∼ NB(2, 0.260), and E(Y) = 2/0.260 ≈ 7.69231.
c) Let Z be the number of at bats until the tenth hit. Then Z ∼ NB(10, 0.260), and E(Z) = 10/0.260 ≈ 38.4615.
7.21 a) The answers will vary.
b) By definition of expectation,

E(X) = Σ_x x P(X = x) = Σ_{x=0}^{9} x/10 = 4.5;

the sample average, for large sample size n, should be close to E(X) = 4.5.
7.22 (N + 1)/2. By definition of expectation,

E(X) = Σ_x x P(X = x) = Σ_{x=1}^{N} x/N = (1/N) Σ_{x=1}^{N} x = (1/N) · N(N + 1)/2 = (N + 1)/2.
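A brief Python sketch (illustrative) supporting Exercises 7.21 and 7.22: it averages a large sample of random digits, which should be close to E(X) = 4.5, and checks the formula (N + 1)/2 for a uniform variable on {1, . . . , N}.

```python
import random

rng = random.Random(0)
digits = [rng.randint(0, 9) for _ in range(100_000)]
print(sum(digits) / len(digits))               # sample average, close to 4.5

N = 25
print(sum(range(1, N + 1)) / N, (N + 1) / 2)   # both give 13.0
```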

Theory Exercises ✓ ◆ ✓ ◆ ✓ ◆ ✓ Np Np 1 N N N 7.23 Note that k = Np for k 1, and = ⇥ k k 1 n n n thus, N Np N p N (1 p) N N N p Np 1 X X k 1 (n 1) (k 1) k n k E(X) = k· = N N 1 N n n ⇥ n 1 k=0 k=1 = np

Np 1 `

N X1 `=0

N Np (n 1) ` N 1 n 1

= np

N X1 `=0

(N 1)e p `

(N 1)(1 pe) (n 1) ` N 1 n 1

1 1



for n

1,

= np,

since the last sum is the sum of probabilities of all possible values of a hypergeometric random variable with parameters N 1, n 1 and pe, where we put pe = (N p 1)/(N 1). 7.24 The required conclusion follows from: E(X) =

1 X x=1

xp(1

x 1

p)

=

1 X d p (1 dp x=0

p) = x

1

d X p (1 dp x=0

p) = x

d p dp

✓ ◆ 1 1 = . p p


7.25 a) Since a negative binomial random variable with parameters r and p can be represented as a sum of r geometric random variables, each of which has an expectation of 1/p, then the expected value of a negative binomial random variable is r/p. b) Note that 1 X E(X) = (r (x r))pX (x) x=r

=r

1 ✓ X x=r

r x

r



pr (p

1)x

= r + pr (p

r

+

1 ✓ X

r



x r x=r ◆ 1 ✓ X r d 1) (p j dp

(x

r)pr (p

1)x

r

1)j .

j=0

c) By (b) and the binomial series theorem (see equation (5.45) on page 241 in the text), r + pr (p

1)

◆ 1 ✓ X r d (p j dp

1)j = r + pr (p

1)

j=0

= r + pr (p

1)

d (1 + (p dp

1))

r

1

d X dp j=0

= r + pr (p

1)

d p dp

r



◆ r (p j

= r + pr (p

1)

1)j ( r) r = . r+1 p p

d) Since a negative binomial random variable, with parameters r = 1 and p, is a geometric random variable with parameter p, the desired result immediately follows from (c). Advanced Exercises 7.26 ↵. Indeed, note that E(X) =

1 X x=0

✓ 1 X ↵x x· = x· 1 (1 + ↵)x+1 x=1

1 where Y is G( 1+↵ ). Therefore,

E(X) =



1

1 1+↵

◆x ✓

1 1+↵



=



1

◆ 1 E(Y ), 1+↵

◆ 1 (1 + ↵) = ↵. 1+↵

7.27 a) Let A+ = {x1 , x2 , . . .}Pbe the enumeration of all positive x such that pX (x) > 0. For + + each N 2 N , consider SN = N i=1 xi pX (xi ). Since xi > 0 and pX (xi ) > 0 for all i, then SN is a monotone increasing sequence and must have a limit as N ! 1, where the limit is either + a finite positive constant or +1. Since E(X + ) = limN !1 SN , the required conclusion then follows. b) Let A = {y1 , y2 , . .P .} be the enumeration of all negative x such that pX (x) > 0. For each N 2 N , define SN = N yi > 0 and pX (yi ) > 0 for all i, then SN is i=1 ( yi )pX (yi ). Since a monotone increasing sequence and must have a limit as N ! 1, where the limit is either a P finite positive constant or +1. Since x0

X x0

x0: j is odd

7.2 Basic Properties of Expected Value

222

a) Let Xi equal the number of successes in the ith Bernoulli trial, i = 1, · · · , n. Then Y = 1{X1 +···+Xn is odd} and pX1 ,··· ,Xn (x1 , · · · , xn ) = pX1 (x1 ) . . . pXn (xn ) = px1 (1 = px1 +···+xn (1

p)n

p)1

x1

. . . pxn (1

p)1

xn

(x1 +···+xn )

for all (x1 , . . . , xn ) 2 {0, 1}n (and pX1 ,··· ,Xn (x1 , · · · , xn ) = 0 otherwise). Then E(Y ) = =

X

j>0: j is odd

1 X

...

...

1 X

x1 =0

✓X 1

= (1

x1 =0

p)n

1 X

xn =0

xn =0

1 2

✓✓

1{x1 +···+xn is odd} px1 +···+xn (1

1{x1 +···+xn =j}

1+

p 1

p

◆n





pj (1

p)n

j

=

p)n

(x1 +···+xn )

X

j>0: j is odd

1

p 1

p

◆n ◆

1 = (1 2

(1

✓ ◆ n j p (1 j

p)n

j

2p)n ),

where we used equality (7.1) (which was proved earlier). b) Let Sn denote the number of successes in n Bernoulli trials. Then X E(Y ) = E(1{Sn is odd} ) = P (Sn is odd) = P (Sn = j) j>0: j is odd

=

X

j>0: j is odd

= (1

1 p) 2 n

✓ ◆ n j p (1 j

✓✓

1+

n j

p)

= (1

n

p)

X

j>0: j is odd

p 1

p

◆n



1

p 1

p

◆n ◆

✓ ◆✓ ◆j n p j 1 p

1 = (1 2

(1

2p)n ),

where we used equality (7.1) (which was proved earlier).

7.34 a) X, Y and Z have a multinomial distribution with parameters n = 4 and π/36, π/12 and 1 − π/9. Thus the joint PMF of X, Y, Z is
$$
p_{X,Y,Z}(x,y,z)=\binom{4}{x,\,y,\,z}\left(\frac{\pi}{36}\right)^{x}\left(\frac{\pi}{12}\right)^{y}\left(1-\frac{\pi}{9}\right)^{z}
$$
for all nonnegative integers x, y, z with x + y + z = 4, and p_{X,Y,Z}(x, y, z) = 0 otherwise.
b) Since T = 10X + 5Y + 0Z, let g(x, y, z) = 10x + 5y. Then g(X, Y, Z) = T and, by the FEF,
$$
E(T)=\sum_{(x,y,z)}g(x,y,z)\,p_{X,Y,Z}(x,y,z)\approx 8.72665.
$$
c) We have X ∼ B(4, π/36), Y ∼ B(4, π/12) and Z ∼ B(4, 1 − π/9). Thus E(X) = π/9, E(Y) = π/3 and E(Z) = 4(1 − π/9).
d) Since T = 10X + 5Y + 0Z, the linearity property of expected value gives E(T) = 10E(X) + 5E(Y) = 25π/9 ≈ 8.72665.
e) Let W_i be the score obtained on the ith shot. From Exercise 7.2, E(W_i) = 25π/36 for each i. Since T = W₁ + W₂ + W₃ + W₄, we get E(T) = 4E(W₁) = 25π/9.

f) We can compute the PMF of T and then use the definition of expectation directly.

7.35 1. Let E_i be the event that the ith man sits across from his wife, i = 1, ..., N. The total number of men who sit across from their wives equals Σ_{i=1}^{N} 1_{E_i}, so the corresponding expectation is
$$
E\left(\sum_{i=1}^{N}1_{E_i}\right)=\sum_{i=1}^{N}E(1_{E_i})=\sum_{i=1}^{N}P(E_i)=\sum_{i=1}^{N}\frac{1}{N}=1.
$$
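The indicator/linearity argument in 7.35 is easy to confirm by simulation. A minimal Python sketch (not part of the original solution; it assumes the husbands occupy fixed seats 0, ..., N−1 and the wives are seated opposite in a uniformly random order):

```python
import random

def matches(N):
    """Seat the N wives in random order opposite husbands 0..N-1; count couples facing each other."""
    wives = list(range(N))
    random.shuffle(wives)
    return sum(1 for seat, wife in enumerate(wives) if seat == wife)

random.seed(2)
for N in (2, 5, 20):
    sims = 50_000
    print(N, sum(matches(N) for _ in range(sims)) / sims)  # each average is close to 1
```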

7.36 Since we expect np₁ items with exactly one defect and np₂ items with two or more defects, the expected cost of repairing the defective items in the sample is $1 × np₁ + $3 × np₂.

7.37 14.7; Let X_k be the number of throws from the appearance of the (k−1)st distinct number until the appearance of the kth distinct number, where X₁ = 1. Then X_i ∼ G((7−i)/6) for i = 1, 2, ..., 6, so E(X_i) = 6/(7−i), and the expected number of throws of a balanced die needed to obtain all six possible numbers is Σ_{i=1}^{6} E(X_i) = Σ_{i=1}^{6} 6/(7−i) = 14.7.

7.38 a) For investment A,

E(Y_A) = 1 × 0.5 + 4 × 0.4 = 2.1.
For investment B, E(Y_B) = 1 × 0.3 + 16 × 0.2 = 3.5. Investment A has a lower expected yield than investment B.
b) For investment A, E(√Y_A) = 1 × 0.5 + 2 × 0.4 = 1.3. For investment B, E(√Y_B) = 1 × 0.3 + 4 × 0.2 = 1.1. Using the square-root utility function, which decreases the effect of large earnings, investment A has a higher expected utility than investment B.
c) For investment A, E(Y_A^{3/2}) = 1 × 0.5 + 8 × 0.4 = 3.7. For investment B, E(Y_B^{3/2}) = 1 × 0.3 + 64 × 0.2 = 13.1. Using this utility function, which increases the effect of large earnings, investment B has a higher expected utility than investment A.

7.39 a) 16/17. Let X₂ be the number of items (in a given lot of 17 items) that are inspected by both engineers. Then X₂ has the B(17, (4/17)²) distribution, so E(X₂) = 17(4/17)² = 16/17.
b) 169/17. Let X₀ be the number of items (in a given lot of 17 items) inspected by neither engineer. Then X₀ is B(17, (13/17)²), so E(X₀) = 17(13/17)² = 169/17.
c) 104/17. Let X₁ be the number of items (in a lot of 17) inspected by exactly one of the two engineers. Then X₁ is B(n = 17, p = 2 × (4/17) × (13/17)), so E(X₁) = 17 × 2 × (4/17) × (13/17) = 104/17.
d) 17, since X₀ + X₁ + X₂ = 17.

7.40 a) 1/(p(2−p)). Let A and B represent the times it takes for the two components to fail, and let X = min(A, B) represent the time until the first failure. Then, using tail probabilities,
$$
E(X)=\sum_{x=0}^{\infty}P(X>x)=\sum_{x=0}^{\infty}P(A>x,\,B>x)=\sum_{x=0}^{\infty}P(A>x)P(B>x)
=\sum_{x=0}^{\infty}(1-p)^{2x}=\frac{1}{1-(1-p)^{2}}=\frac{1}{(2-p)p}.
$$
b) (3−2p)/(p(2−p)). Let Y = max(A, B) represent the time until the second failure. Then, using tail probabilities,
$$
E(Y)=\sum_{y=0}^{\infty}P(Y>y)
=\sum_{y=0}^{\infty}\bigl(1-P(A<y+1)P(B<y+1)\bigr)
=\sum_{y=0}^{\infty}\Bigl[1-\bigl(1-(1-p)^{y}\bigr)^{2}\Bigr]
$$
$$
=\sum_{y=0}^{\infty}(1-p)^{y}\bigl(2-(1-p)^{y}\bigr)
=2\sum_{y=0}^{\infty}(1-p)^{y}-\sum_{y=0}^{\infty}(1-p)^{2y}
=\frac{2}{p}-\frac{1}{(2-p)p}=\frac{3-2p}{(2-p)p}.
$$
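As a quick Monte Carlo check of the two formulas in 7.40 (a sketch, not part of the text; p = 0.4 is an arbitrary illustrative value):

```python
import random

def geometric(p):
    """Geometric waiting time on {1, 2, ...}: trials until the first success."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

random.seed(3)
p, sims = 0.4, 200_000
first = last = 0.0
for _ in range(sims):
    a, b = geometric(p), geometric(p)
    first += min(a, b)
    last += max(a, b)
print(first / sims, "vs", 1 / (p * (2 - p)))           # about 1.5625
print(last / sims, "vs", (3 - 2 * p) / (p * (2 - p)))  # about 3.4375
```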

7.41 a) 1210/61. Let A and B represent the arrival times of the two people, and let X = min(A, B) represent the arrival time of the first person. Then, using tail probabilities,
$$
E(X)=\sum_{x=0}^{\infty}P(X>x)=\sum_{x=0}^{59}P(A>x)P(B>x)
=\sum_{x=0}^{59}\left(\frac{60-x}{61}\right)^{2}=\frac{1210}{61}\approx 19.8361.
$$
b) 2450/61. Let Y = max(A, B) represent the arrival time of the second person. Then, using tail probabilities,
$$
E(Y)=\sum_{y=0}^{59}\bigl(1-P(A<y+1)P(B<y+1)\bigr)
=\sum_{y=0}^{59}\left[1-\left(\frac{y+1}{61}\right)^{2}\right]=\frac{2450}{61}\approx 40.1639.
$$
c) 1240/61. Let Z represent the time that the first person has to wait for the second. Then Z = |A − B| = Y − X, so, using the answers to (a) and (b),
$$
E(Z)=E(Y)-E(X)=\frac{2450}{61}-\frac{1210}{61}=\frac{1240}{61}\approx 20.3279.
$$
7.42 For any random variable X with finite expectation, the random variable (X − E(X))² is always nonnegative, so its expectation is nonnegative as well; thus
$$
E\bigl(X^{2}-2E(X)X+(E(X))^{2}\bigr)=E\bigl((X-E(X))^{2}\bigr)\ge 0,
$$
which, by linearity of expectation, implies that E(X²) − (E(X))² ≥ 0.
7.43 a) X and Y are linearly uncorrelated if and only if E((X − E(X))(Y − E(Y))) = 0, which is equivalent to E(XY − XE(Y) − YE(X) + E(X)E(Y)) = 0, or, equivalently, E(XY) − E(X)E(Y) = 0; thus the desired result follows.
b) Yes. If X and Y are independent, then E(XY) = E(X)E(Y), which, by (a), implies that X and Y are linearly uncorrelated.
c) No. For example, the random variables with the joint PMF shown in Table 7.8 are linearly uncorrelated but not independent.

Theory Exercises

7.44 a) Since X = Y₁ + ··· + Y_r for some (independent) G(p) random variables Y₁, ..., Y_r,
$$
E(X)=\sum_{i=1}^{r}E(Y_i)=\sum_{i=1}^{r}\frac{1}{p}=\frac{r}{p}.
$$
b) No, independence of the geometric random variables is not required for the result in (a), since the proof in (a) relies only on the linearity property of expectation, which holds regardless of the dependence structure of the random variables.

7.45 To see that, for arbitrary discrete random variables with finite expectations defined on the same probability space and arbitrary c₁, ..., c_m ∈ R, m ∈ N, the equality
$$
E\left(\sum_{j=1}^{m}c_jX_j\right)=\sum_{j=1}^{m}c_jE(X_j)\qquad(7.2)
$$
holds, note first that (7.2) holds for m = 2 by Proposition 7.3. Next assume that (7.2) holds for all m ≤ n. Then
$$
E\left(\sum_{j=1}^{n+1}c_jX_j\right)
=E\left(\Bigl(\sum_{j=1}^{n}c_jX_j\Bigr)+c_{n+1}X_{n+1}\right)
=E\left(\sum_{j=1}^{n}c_jX_j\right)+c_{n+1}E(X_{n+1})
=\sum_{j=1}^{n}c_jE(X_j)+c_{n+1}E(X_{n+1})
=\sum_{j=1}^{n+1}c_jE(X_j).
$$
Thus (7.2) holds for m = n + 1, which by mathematical induction implies that (7.2) holds for arbitrary m ∈ N.
7.46 a) Let X₁, ..., X_m be independent discrete random variables defined on the same sample space and having finite expectations. Claim: ∏_{j=1}^{m} X_j has finite expectation and E(∏_{j=1}^{m} X_j) = ∏_{j=1}^{m} E(X_j). The claim holds for m = 2 in view of Proposition 7.5. Next assume that the claim is true for all m ≤ n. Then
$$
E\left(\prod_{j=1}^{n+1}X_j\right)
=E\left(\Bigl(\prod_{j=1}^{n}X_j\Bigr)X_{n+1}\right)
=E\left(\prod_{j=1}^{n}X_j\right)E(X_{n+1})
=\left(\prod_{j=1}^{n}E(X_j)\right)E(X_{n+1})
=\prod_{j=1}^{n+1}E(X_j),
$$
and the required conclusion follows by the principle of mathematical induction.
b) Here we have
$$
E\left(\prod_{j=1}^{m}X_j\right)
=\sum_{(x_1,\dots,x_m)}x_1\cdots x_m\,p_{X_1,\dots,X_m}(x_1,\dots,x_m)
=\sum_{(x_1,\dots,x_m)}x_1\cdots x_m\,p_{X_1}(x_1)\cdots p_{X_m}(x_m)
$$
$$
=\left(\sum_{x_1}x_1p_{X_1}(x_1)\right)\cdots\left(\sum_{x_m}x_mp_{X_m}(x_m)\right)
=\prod_{j=1}^{m}E(X_j),
$$
where all the series involved are absolutely convergent.
7.47 By Exercise 7.27, it follows that
$$
|E(X)|=|E(X^{+})-E(X^{-})|\le E(X^{+})+E(X^{-})
=\sum_{x>0}x\,p_X(x)+\sum_{x<0}(-x)\,p_X(x)
=\sum_{x}|x|\,p_X(x)=E(|X|).
$$

7.48 Since P(|X| > M) = 0, we have pX(x) = 0 for all real x with |x| > M; therefore,
$$
E(|X|)=\sum_{x}|x|P(X=x)=\sum_{|x|\le M}|x|P(X=x)
\le\sum_{|x|\le M}M\,P(X=x)=M\sum_{|x|\le M}P(X=x)=M.
$$
Then, in light of Exercise 7.47, the desired result follows.

7.49 a) For any real t > 0 and any nonnegative random variable X,
$$
tP(X\ge t)=t\sum_{x\ge t}P(X=x)=\sum_{x\ge t}tP(X=x)
\le\sum_{x\ge t}xP(X=x)\le\sum_{x\ge 0}xP(X=x)=E(X).
$$
b) Consider an arbitrary nonnegative random variable X. First, if X ≥ t then tI_{\{X\ge t\}} = t ≤ X. Second, if 0 ≤ X < t then tI_{\{X\ge t\}} = 0 ≤ X. Thus tI_{\{X\ge t\}} ≤ X for all t > 0.
c) The desired result follows upon taking expectations in the inequality in (b), by the monotonicity property of expectation.

Advanced Exercises

7.50 a) 4.5; Let Y be the first digit of the decimal expansion of X. Then the PMF of Y is given by
$$
p_Y(y)=P\left(\frac{y}{10}\le X<\frac{y+1}{10}\right)=0.1\qquad\text{for } y\in\{0,1,\dots,9\},
$$
and p_Y(y) = 0 otherwise, implying that E(Y) = Σ_{j=0}^{9} 0.1j = 4.5.
b) 6.15; Let U be the first digit of the decimal expansion of √X. Then the PMF of U is given by
$$
p_U(u)=P\left(\frac{u}{10}\le\sqrt{X}<\frac{u+1}{10}\right)
=P\left(\frac{u^{2}}{100}\le X<\frac{(u+1)^{2}}{100}\right)
=\frac{(u+1)^{2}-u^{2}}{100}=\frac{2u+1}{100}
$$
for u ∈ {0, 1, ..., 9} (and p_U(u) = 0 otherwise). Therefore E(U) = Σ_{u=0}^{9} u(2u+1)/100 = 123/20 = 6.15.
c) 4.665; Let V be the second digit of the decimal expansion of √X. Then the PMF of V is given by
$$
p_V(v)=\sum_{k=0}^{9}P\left(\frac{10k+v}{100}\le\sqrt{X}<\frac{10k+v+1}{100}\right)
=\sum_{k=0}^{9}\frac{(10k+v+1)^{2}-(10k+v)^{2}}{10^{4}}
=\sum_{k=0}^{9}\frac{20k+2v+1}{10^{4}}=\frac{91+2v}{10^{3}},\qquad v\in\{0,1,\dots,9\},
$$
and p_V(v) = 0 otherwise. Then E(V) = Σ_{v=0}^{9} (91v + 2v²)/10³ = 4.665.
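The two digit distributions of √X in parts (b) and (c) are also easy to estimate by simulation. A small Python sketch (not part of the original solution; it assumes X is uniform on (0, 1), as in the exercise):

```python
import random

random.seed(4)
sims = 500_000
first = second = 0
for _ in range(sims):
    u = random.random() ** 0.5          # square root of a Uniform(0, 1) variate
    first += int(10 * u) % 10           # first decimal digit of sqrt(X)
    second += int(100 * u) % 10         # second decimal digit of sqrt(X)
print(first / sims, "vs", 6.15)
print(second / sims, "vs", 4.665)
```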

7.51 Let A_i be the event that the ith customer is a woman and the (i+1)st customer is a man, i = 1, ..., 49. The total number of times that a woman is followed by a man equals Σ_{i=1}^{49} 1_{A_i}, so the expected number of such times is
$$
E\left(\sum_{i=1}^{49}1_{A_i}\right)=\sum_{i=1}^{49}P(A_i)=\sum_{i=1}^{49}0.6\times 0.4=49\times 0.24=11.76,
$$
where we assume that each customer is a man with probability 0.4, independently of the other customers. On the other hand, if we are told that, in a given sample of fifty customers who entered the store, exactly twenty customers (40%) were men (and customers were sampled without replacement), then the expected number of times that a woman was followed by a man equals
$$
E\left(\sum_{i=1}^{49}1_{A_i}\right)=\sum_{i=1}^{49}P(A_i)=\sum_{i=1}^{49}\frac{30}{50}\cdot\frac{20}{49}=49\cdot\frac{30\cdot 20}{50\cdot 49}=12.
$$
7.52 a) Since X ≥ 0, we have |t^X| ≤ 1 for all t ∈ [−1, 1]. Thus the desired result follows from Exercise 7.48.
b) Clearly, for a nonnegative integer-valued random variable X,
$$
P_X(t)=E\bigl(t^{X}\bigr)=\sum_{n=0}^{\infty}t^{n}p_X(n).
$$
c) Note that
$$
P_X'(t)=\frac{d}{dt}E\bigl(t^{X}\bigr)=\frac{d}{dt}\sum_{x=0}^{\infty}t^{x}P(X=x)=\sum_{x=1}^{\infty}xt^{x-1}P(X=x),
$$
thus
$$
P_X'(1)=\sum_{x=1}^{\infty}xP(X=x)=E(X).
$$
7.53 a) The PGF of a B(n, p) random variable X is given by
$$
P_X(t)=\sum_{k=0}^{n}t^{k}\binom{n}{k}p^{k}(1-p)^{n-k}=(pt+1-p)^{n},
$$
thus P_X'(t) = n(pt + 1 − p)^{n−1}p, implying that E(X) = P_X'(1) = np.
b) The PGF of a P(λ) random variable X is given by
$$
P_X(t)=\sum_{k=0}^{\infty}t^{k}\cdot\frac{\lambda^{k}e^{-\lambda}}{k!}=e^{\lambda(t-1)},
$$
thus P_X'(t) = λe^{λ(t−1)} and E(X) = P_X'(1) = λ.
c) The PGF of a G(p) random variable X is given by
$$
P_X(t)=\sum_{k=1}^{\infty}t^{k}p(1-p)^{k-1}=\frac{pt}{1-(1-p)t}=\frac{pt}{1-t+pt},
$$
thus P_X'(t) = p/(1 − t + pt)², implying that E(X) = P_X'(1) = 1/p.
d) The PGF of an NB(r, p) random variable X is given by
$$
P_X(t)=\sum_{k=r}^{\infty}t^{k}\binom{k-1}{r-1}p^{r}(1-p)^{k-r}
=t^{r}p^{r}\bigl(1-t(1-p)\bigr)^{-r}\underbrace{\sum_{k=r}^{\infty}\binom{k-1}{r-1}\bigl(1-t(1-p)\bigr)^{r}\bigl(t(1-p)\bigr)^{k-r}}_{=1}
=\left(\frac{tp}{1-t+tp}\right)^{r}.
$$
Thus
$$
P_X'(t)=\frac{rp^{r}t^{r-1}}{(1-t+tp)^{r+1}},\qquad\text{which implies that } E(X)=P_X'(1)=\frac{r}{p}.
$$
e) The PGF of the uniform distribution on {1, ..., N} is
$$
P_X(t)=\sum_{k=1}^{N}\frac{t^{k}}{N}=\frac{t(1-t^{N})}{N(1-t)},
$$
so
$$
P_X'(t)=\frac{1-t^{N}+Nt^{N}(t-1)}{N(1-t)^{2}}
=\sum_{k=0}^{N-1}\frac{t^{k}-t^{N}}{N(1-t)}
=\sum_{k=0}^{N-1}\frac{t^{k}\bigl(1+t+\cdots+t^{N-k-1}\bigr)}{N}.
$$
Therefore,
$$
E(X)=P_X'(1)=\sum_{k=0}^{N-1}\frac{N-k}{N}=\frac{1}{N}\sum_{\ell=1}^{N}\ell=\frac{N(N+1)}{2N}=\frac{N+1}{2}.
$$
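The PGF computations above (and the variance formula of Exercise 7.84 below) can be reproduced symbolically. The following sketch is not part of the original solutions and assumes SymPy is available; it differentiates each PGF and recovers the mean P′(1) and the variance P″(1) + P′(1) − P′(1)².

```python
import sympy as sp

t, p, lam = sp.symbols('t p lam', positive=True)
n = sp.symbols('n', positive=True, integer=True)

pgfs = {
    'binomial':  (p * t + 1 - p) ** n,
    'poisson':   sp.exp(lam * (t - 1)),
    'geometric': p * t / (1 - (1 - p) * t),
}
for name, P in pgfs.items():
    mean = sp.simplify(sp.diff(P, t).subs(t, 1))
    var = sp.simplify(sp.diff(P, t, 2).subs(t, 1) + mean - mean ** 2)
    print(name, mean, var)
# binomial  -> n*p and n*p*(1-p); poisson -> lam and lam; geometric -> 1/p and (1-p)/p**2
```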

7.3 Variance of Discrete Random Variables

Basic Exercises

7.54 a) 5.833; Using Equation (7.30) and Exercise 7.1, we have
$$
\mathrm{Var}(X)=E\bigl((X-\mu_X)^{2}\bigr)=\sum_{x=2}^{12}(x-7)^{2}p_X(x)=5.833.
$$
b) 5.833; Since E(X²) = 54.833, Var(X) = E(X²) − (E(X))² = 5.833.
7.55 10.512; From Exercise 5.27, we have E(X²) = 175π/36. Thus, using the computing formula for variance, Var(X) = 175π/36 − (25π/36)² ≈ 10.512.
7.56 a) 0.6; Using Exercise 5.28, when p = 0.5 we have E(X²) = 1.6 and E(X) = 1, so, by the computing formula for variance, Var(X) = 1.6 − 1² = 0.6.
b) 0.5856; Using Exercise 5.28, when p = 0.4 we have E(X²) = 1.36 and E(X) = 0.88, so, by the computing formula for variance, Var(X) = 0.5856.

7.57 a) 72/121. Let X be the number of women chosen. Using Exercise 5.29 (a), E(X²) = 27/11, and, by the computing formula for variance, Var(X) = 27/11 − (15/11)² = 72/121.
b) 72/121. Note that X has a hypergeometric distribution with parameters N = 11, n = 3 and p = 5/11. Thus, using Table 7.10,
$$
\mathrm{Var}(X)=\frac{3\times(5/11)\times(6/11)\times(11-3)}{11-1}=\frac{72}{121},
$$
which is the same result as in (a).

7.58 10.24; Let {ω₁, ..., ω_N} be the finite population in question and let y_i (i = 1, ..., N) be the value of the variable of interest for the ith member of the population (i.e., for ω_i). If X is the value of the variable of interest for a randomly selected member of the population then, by an argument similar to the one given in Exercise 7.7,
$$
\mathrm{Var}(X)=E(X^{2})-(E(X))^{2}
=\frac{1}{N}\sum_{i=1}^{N}X(\omega_i)^{2}-\bar{y}^{2}
=\frac{1}{N}\sum_{i=1}^{N}y_i^{2}-\bar{y}^{2}
=\frac{1}{N}\sum_{i=1}^{N}(y_i-\bar{y})^{2}=3.2^{2}=10.24.
$$
7.59 a) 600. Let X be the number of people tested until the first positive test occurs. Then X has a G(p) distribution for some p ∈ (0, 1). Since, on average, the first positive test occurs with the 25th person, E(X) = 25 = 1/p, implying that p = 1/25 = 0.04; thus, using Table 7.10, Var(X) = (1 − p)/p² = 0.96/0.0016 = 600.
b) 3000. Let Y be the number of tests performed until the fifth person who tests positive is found. Then Y has a negative binomial distribution with parameters 5 and 1/25. Thus, using Table 7.10, Var(Y) = 3000.

7.60 1.8; Let X be the number of children cured. Then X has the B(20, 0.9) distribution and Var(X) = 20 × 0.9 × 0.1 = 1.8.
7.61 √15/4 ≈ 0.968. If both parents have the sickle cell trait, then the probability that they will pass the disease on to a child is 0.25, and the number of children (out of 5) who will have sickle cell anemia has a binomial distribution with parameters 5 and 0.25. Thus the variance is 15/16 = 0.9375 and the standard deviation is √15/4 ≈ 0.968.
7.62 a) When sampling without replacement, the number of selected undergraduates has a hypergeometric distribution with parameters N = 10, n = 3 and p = 0.6, so the variance is 14/25 = 0.56.
b) When sampling with replacement, the number of selected undergraduates has a binomial distribution with n = 3 and p = 0.6, so the variance is 18/25 = 0.72.
c) While the expectations for the hypergeometric and binomial distributions (given above) are the same, the corresponding variances are different.

7.63 Var(X) ≈ 0.271, since X has a hypergeometric distribution with parameters N = 100, n = 5 and p = 0.06.

7.64 a) ((N − n)/(N − 1)) np(1 − p). Without replacement, if X represents the number of sampled members that have the specified attribute, then X has a hypergeometric distribution with parameters N, n and p.
b) np(1 − p). With replacement, if X represents the number of sampled members that have the specified attribute, then X has a B(n, p) distribution and the required answer follows.

7.65 15. Since the number of arriving cars has a Poisson distribution with parameter λ = 15, the variance is 15.
7.66 9.3; If the purchases are independent, then the number of customers per hour who buy a café mocha with no whipped cream has a Poisson distribution with λ = 31 × 0.3 = 9.3; since the variance of a P(λ) random variable equals λ, the desired answer follows.
7.67 3.77; From Exercise 5.85, letting X be the number of eggs observed in a nonempty nest, we find that E(X²) = 20/(1 − e⁻⁴), since
$$
E(X^{2})=\sum_{k=1}^{\infty}k^{2}\frac{e^{-4}4^{k}}{k!\,(1-e^{-4})}
=\frac{4}{1-e^{-4}}\sum_{k=1}^{\infty}k\,\frac{e^{-4}4^{k-1}}{(k-1)!}
=\frac{4}{1-e^{-4}}\underbrace{\sum_{\ell=0}^{\infty}(\ell+1)\frac{e^{-4}4^{\ell}}{\ell!}}_{=E(Y+1)}
=\frac{20}{1-e^{-4}},
$$
where Y is P(4), so E(Y + 1) = E(Y) + 1 = 4 + 1 = 5. Therefore, by the computing formula for variance and Exercise 7.18,
$$
\mathrm{Var}(X)=\frac{20}{1-e^{-4}}-\left(\frac{4}{1-e^{-4}}\right)^{2}\approx 3.77054.
$$

7.68 a) Since the waiting time for the first hit has a geometric distribution with parameter 0.26, the variance is (1 − 0.26)/0.26² ≈ 10.94675.
b) Since the waiting time for the second hit has a negative binomial distribution with parameters 2 and 0.26, the variance is 21.8935.
c) Since the waiting time for the tenth hit has a negative binomial distribution with parameters 10 and 0.26, the variance is 109.4675.

7.69 Noting that I_E² = I_E, we have E(I_E²) = E(I_E) = P(E); thus, by the computing formula for variance, Var(I_E) = E(I_E²) − (E(I_E))² = P(E) − (P(E))² = P(E)(1 − P(E)).
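The mean and variance of the zero-truncated Poisson distribution in Exercise 7.67 can be confirmed numerically. A short Python sketch (not part of the original solution; it simply sums the truncated PMF far enough into the tail):

```python
from math import exp, factorial

lam = 4.0
norm = 1 - exp(-lam)                     # P(Poisson(4) >= 1)
ks = range(1, 200)
pmf = [exp(-lam) * lam**k / factorial(k) / norm for k in ks]
mean = sum(k * pk for k, pk in zip(ks, pmf))
second = sum(k * k * pk for k, pk in zip(ks, pmf))
print(mean)                 # 4 / (1 - e^-4), about 4.0747
print(second - mean**2)     # about 3.77054, matching Exercise 7.67
```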

7.70 Since I_E has a binomial distribution with parameters n = 1 and p = P(E), we have Var(I_E) = np(1 − p) = 1 · P(E)(1 − P(E)) = P(E)(1 − P(E)).
7.71 By linearity of expectation,
$$
E(X^{*})=E\left(\frac{X-\mu_X}{\sigma_X}\right)=\frac{E(X)-\mu_X}{\sigma_X}=0.
$$
By Proposition 7.10,
$$
\mathrm{Var}(X^{*})=\mathrm{Var}\left(\frac{X-\mu_X}{\sigma_X}\right)
=\frac{1}{\sigma_X^{2}}\mathrm{Var}(X-\mu_X)=\frac{1}{\sigma_X^{2}}\mathrm{Var}(X)=1.
$$
7.72 a) First note that X_n is symmetric about 0, so E(X_n) = 0. Then
$$
\mathrm{Var}(X_n)=E(X_n^{2})=0^{2}\left(1-\frac{1}{n^{2}}\right)+n^{2}\cdot\frac{1}{2n^{2}}+(-n)^{2}\cdot\frac{1}{2n^{2}}=1.
$$
Then, by Chebyshev's inequality, P(|X_n| ≥ 3) ≤ 1/9.
b) By direct calculation, for all n ≥ 3, P(|X_n| ≥ 3) = 1/n².
c) The upper bound provided by Chebyshev's inequality is exact for n = 3, but gets worse as n increases, since the exact probability goes to 0 as n goes to infinity.

7.73 a) The probability that a random variable is less than k standard deviations away from its mean is at least 1 − 1/k².
b) The probability that a random variable is less than 2 standard deviations away from its mean is at least 3/4. The probability that a random variable is less than 3 standard deviations away from its mean is at least 8/9.
c) Proposition 7.11 is equivalent to
$$
P(|X-\mu_X|<t)\ge 1-\frac{\mathrm{Var}(X)}{t^{2}}
$$
for all positive real t. The desired relation (*) then follows upon taking t equal to kσ_X.

7.74 (N² − 1)/12. Here, since p_X(x) = 1/N for all x ∈ {1, 2, ..., N} and p_X(x) = 0 otherwise,
$$
E(X)=\sum_{x=1}^{N}\frac{x}{N}=\frac{1}{N}\sum_{x=1}^{N}x=\frac{N+1}{2}
\qquad\text{and}\qquad
E(X^{2})=\sum_{x=1}^{N}\frac{x^{2}}{N}=\frac{1}{N}\sum_{x=1}^{N}x^{2}=\frac{(N+1)(2N+1)}{6},
$$
implying that
$$
\mathrm{Var}(X)=E(X^{2})-(E(X))^{2}=\frac{N^{2}-1}{12}.
$$
7.75 a) Let f(c) = E((Y − c)²). Then f′(c) = E(2(c − Y)) = 2(c − E(Y)). Therefore f′(c) > 0 for all c > E(Y), f′(E(Y)) = 0, and f′(c) < 0 for all c < E(Y). Thus f(c) achieves its minimum at the point c = E(Y).
b) By (a), min_c E((Y − c)²) = f(E(Y)) = E((Y − E(Y))²) = Var(Y).
Theory Exercises

7.76 First note that if |x| < 1 then |x|^{r−1} ≤ 1 for all r ∈ N, so |x|^{r−1} ≤ |x|^r + 1. On the other hand, if |x| ≥ 1, then |x|^{r−1} ≤ |x|^r for all r ∈ N, so again |x|^{r−1} ≤ |x|^r + 1. Therefore the inequality |x|^{r−1} ≤ |x|^r + 1 holds for all x ∈ R and r ∈ N. Thus, for a discrete random variable X with finite rth moment (for some arbitrary fixed r ∈ N),
$$
E|X|^{r-1}=\sum_{x}|x|^{r-1}p_X(x)\le\sum_{x}\bigl(|x|^{r}+1\bigr)p_X(x)
=\sum_{x}|x|^{r}p_X(x)+\sum_{x}p_X(x)=E|X|^{r}+1<\infty.
$$
Thus X has a moment of order r − 1 and, therefore, inductively, X has all lower-order moments.
7.77 First assume that X − µ_X has a finite rth moment. Then, by Exercise 7.76, E|X − µ_X|^k < ∞ for all k ∈ {0, 1, ..., r}, which implies that
$$
E|X|^{r}=E|X-\mu_X+\mu_X|^{r}\le E\bigl(|X-\mu_X|+|\mu_X|\bigr)^{r}
=E\left(\sum_{k=0}^{r}\binom{r}{k}|X-\mu_X|^{k}|\mu_X|^{r-k}\right)
=\sum_{k=0}^{r}\binom{r}{k}|\mu_X|^{r-k}E|X-\mu_X|^{k}<\infty,
$$
in view of the binomial theorem and the linearity property of expected value. Thus X has a finite rth moment.
Next assume that X has a finite rth moment. Then, from Exercise 7.76, X has a finite kth moment for all 0 ≤ k ≤ r. Hence
$$
E|X-\mu_X|^{r}=\sum_{x}|x-\mu_X|^{r}p_X(x)
\le\sum_{x}\bigl(|x|+|\mu_X|\bigr)^{r}p_X(x)
=\sum_{x}\sum_{k=0}^{r}\binom{r}{k}|x|^{k}|\mu_X|^{r-k}p_X(x)
=\sum_{k=0}^{r}\binom{r}{k}|\mu_X|^{r-k}\left(\sum_{x}|x|^{k}p_X(x)\right)<\infty;
$$
hence X − µ_X has a finite rth moment.
7.78 Since P(|X| > M) = 0, we have p_X(x) = 0 for all real x with |x| > M. Then, for all r ∈ N,
$$
\sum_{x}|x|^{r}p_X(x)=\sum_{x:\,|x|\le M}|x|^{r}p_X(x)
\le\sum_{x:\,|x|\le M}M^{r}p_X(x)=M^{r}P(|X|\le M)=M^{r},
$$
implying that X has finite moments of all orders and
$$
|E(X^{r})|=\left|\sum_{x}x^{r}p_X(x)\right|\le\sum_{x}|x^{r}|p_X(x)\le M^{r}.
$$
7.79 a) Consider a finite population {ω₁, ..., ω_N} and a variable of interest y defined on it. Let y_i be the value of y for the ith member of the population (i.e., y_i = y(ω_i)) and let µ = (Σ_{i=1}^{N} y_i)/N be the population mean. Then the population variance σ² is given by
$$
\sigma^{2}=\frac{1}{N}\sum_{i=1}^{N}(y_i-\mu)^{2}.
$$
Suppose a member is selected at random from the population and X equals the corresponding value of y for that member. Then, whenever member ω_i is chosen, X takes the value y_i (for each i = 1, ..., N). Therefore,
$$
\sigma^{2}=\frac{1}{N}\sum_{i=1}^{N}\bigl(X(\omega_i)-\mu\bigr)^{2}.
$$
b) By (a) and since P({ω_i}) = 1/N for each i = 1, ..., N,
$$
\sigma^{2}=\sum_{i=1}^{N}\bigl(X(\omega_i)-\mu\bigr)^{2}P(\{\omega_i\})=E\bigl((X-\mu)^{2}\bigr).
$$
By Proposition 7.1, E(X) = µ; thus, by Exercise 7.6, σ² = E((X − E(X))²) = Var(X).

7.80 Note that {(X − µ_X)² ≥ t²} = {|X − µ_X| ≥ t}. Thus, by Markov's inequality,
$$
P(|X-\mu_X|\ge t)=P\bigl((X-\mu_X)^{2}\ge t^{2}\bigr)\le\frac{E\bigl((X-\mu_X)^{2}\bigr)}{t^{2}}=\frac{\mathrm{Var}(X)}{t^{2}}.
$$

7.81 First note that if |X − µ_X| > 0, then there exists n ∈ N such that |X − µ_X| ≥ 1/n. On the other hand, {|X − µ_X| ≥ 1/n} ⊂ {|X − µ_X| > 0} for each n ∈ N. Hence {|X − µ_X| > 0} = ∪_{n=1}^{∞} {|X − µ_X| ≥ 1/n}.
Now assume that a random variable X has zero variance. Then
$$
P(|X-\mu_X|>0)=P\left(\bigcup_{n=1}^{\infty}\Bigl\{|X-\mu_X|\ge\frac{1}{n}\Bigr\}\right)
\le\sum_{n=1}^{\infty}P\left(|X-\mu_X|\ge\frac{1}{n}\right)
\le\sum_{n=1}^{\infty}n^{2}\,\mathrm{Var}(X)=\sum_{n=1}^{\infty}0=0.
$$
Therefore P(X = µ_X) = 1 and X is a constant random variable. On the other hand, if a random variable X is constant then, for some constant c ∈ R, P(X = c) = 1 and P(X² = c²) = 1, implying that Var(X) = E(X²) − (E(X))² = c² − c² = 0.

7.82 Assume that properties (a) and (b) in Proposition 7.10 are satisfied. Then Var(a + bX) = Var(bX) = b²Var(X) holds for all constants a, b ∈ R. Conversely, suppose equation (7.33) is satisfied. Then taking a = 0 and b = c in (7.33) yields property (a), whereas taking a = c and b = 1 reduces (7.33) to property (b). Thus equation (7.33) is equivalent to properties (a) and (b) together.

Advanced Exercises

7.83 First, we recall that E(X) = α. Next note that
$$
E(X^{2})=\sum_{x=1}^{\infty}x^{2}\left(\frac{\alpha}{1+\alpha}\right)^{x}\frac{1}{1+\alpha}
=\frac{\alpha}{1+\alpha}E(Y^{2}),
$$
where Y is a geometric random variable with parameter p = 1/(1+α). Since
$$
E(Y^{2})=\mathrm{Var}(Y)+(E(Y))^{2}=\frac{1-p}{p^{2}}+\frac{1}{p^{2}}=\frac{2-p}{p^{2}}
=2(1+\alpha)^{2}-(1+\alpha)=(1+\alpha)(1+2\alpha),
$$
we obtain E(X²) = α + 2α², and therefore Var(X) = α + 2α² − α² = α + α².
7.84 By the definition of the PGF, for a nonnegative integer-valued random variable X with finite variance,
$$
P_X'(t)=\sum_{n=0}^{\infty}nt^{n-1}p_X(n),\qquad\text{so } P_X'(1)=E(X),
$$
and
$$
P_X''(t)=\sum_{n=0}^{\infty}(n^{2}-n)t^{n-2}p_X(n),\qquad\text{so } P_X''(1)=E(X^{2}-X)=E(X^{2})-E(X),
$$
implying that
$$
\mathrm{Var}(X)=P_X''(1)+P_X'(1)-\bigl(P_X'(1)\bigr)^{2}.
$$
7.85 a) np(1 − p). From Exercise 7.53 (a), for a B(n, p) random variable X,
$$
P_X''(t)=n(n-1)p^{2}(pt+1-p)^{n-2},\qquad\text{so } P_X''(1)=n(n-1)p^{2}.
$$
Thus, by Exercise 7.84, Var(X) = n(n−1)p² + np − n²p² = np(1 − p).
b) λ. Using Exercise 7.53 (b), for a P(λ) random variable X, P_X''(t) = λ²e^{λ(t−1)}, so P_X''(1) = λ². Thus, by Exercise 7.84, Var(X) = λ² + λ − λ² = λ.
c) (1 − p)/p². Using Exercise 7.53 (c), for a G(p) random variable X,
$$
P_X''(t)=\frac{2p(1-p)}{(1-t+pt)^{3}},\qquad\text{so } P_X''(1)=\frac{2(1-p)}{p^{2}}.
$$
Thus, by Exercise 7.84, Var(X) = 2(1−p)/p² + 1/p − 1/p² = (1−p)/p².
d) r(1 − p)/p². Using Exercise 7.53 (d), for an NB(r, p) random variable X,
$$
P_X''(t)=\frac{rp^{r}t^{r-2}\bigl(r-1+2t-2tp\bigr)}{(1-t+tp)^{r+2}},\qquad\text{so } P_X''(1)=\frac{r(r+1-2p)}{p^{2}}.
$$
Thus, by Exercise 7.84, Var(X) = r(r+1−2p)/p² + r/p − r²/p² = r(1−p)/p².
e) (N² − 1)/12. Using Exercise 7.53 (e), for a discrete uniform random variable X on {1, ..., N},
$$
P_X''(1)=\frac{N^{2}-1}{3},
$$
which, by Exercise 7.84, implies that
$$
\mathrm{Var}(X)=\frac{N^{2}-1}{3}+\frac{N+1}{2}-\frac{(N+1)^{2}}{4}=\frac{N^{2}-1}{12}.
$$
7.4 Variance, Covariance, and Correlation
Basic Exercises

7.86 a) Var(Y) = Var(Z) = 35/12.
b) Var(X) = Var(Y + Z) = Var(Y) + Var(Z) = 35/6 ≈ 5.8333, since Y and Z are independent.
c) Since the PMF of Y is much more easily obtained than the PMF of X, the method used in (b) is easier than the method used to obtain Var(X) in Exercise 7.54.

7.87 a) See the solution to Exercise 7.34 (a).
b) Note that T = 10X + 5Y, and let g(x, y, z) = (10x + 5y)². Then g(X, Y, Z) = T² and, using the FEF,
$$
E(T^{2})=\sum_{\substack{x+y+z=4\\ x,y,z\ge 0}}(10x+5y)^{2}\binom{4}{x,\,y,\,z}\frac{\pi^{x}(3\pi)^{y}(36-4\pi)^{z}}{36^{4}}=118.202.
$$
c) From Exercise 7.34, E(T) = 8.72665, so, using the computing formula for variance, Var(T) = 42.0479.
d) From Exercise 7.55, the variance of the score from one shot is 10.512. Since the shots are independent, we can add their variances, so Var(T) = 4 × 10.512 = 42.0479.
e) One could also compute the PMF of T and then compute the variance directly from the definition.

7.88 a) Using Table 6.7, we first find that E(X + Y) = 6.12. Then
$$
\mathrm{Var}(X+Y)=\sum_{z=4}^{9}(z-6.12)^{2}p_{X+Y}(z)=1.3456.
$$
b) The variance is the same.

7.89 The variance is equal to 1 when N ≥ 2, and to 0 when N = 1. Fix N ≥ 2. Using the same notation as in Exercise 7.35,
$$
\mathrm{Var}\left(\sum_{i=1}^{N}1_{E_i}\right)=\sum_{i=1}^{N}\mathrm{Var}(1_{E_i})+2\sum_{i<j}\mathrm{Cov}(1_{E_i},1_{E_j}).
$$
Note that Var(1_{E_i}) = (1/N)(1 − 1/N), since P(E_i) = 1/N for each i, and, for i ≠ j, Cov(1_{E_i}, 1_{E_j}) = P(E_i ∩ E_j) − P(E_i)P(E_j) = 1/(N(N−1)) − 1/N². Summing these contributions gives (1 − 1/N) + 1/N = 1.
7.91 38.99; From Exercise 7.37, X_i ∼ G((7−i)/6) for i = 2, ..., 6, and X₁ = 1. Since X₁, ..., X₆ are independent,
$$
\mathrm{Var}\left(\sum_{i=1}^{6}X_i\right)=\sum_{i=1}^{6}\mathrm{Var}(X_i)
=0+\frac{1-5/6}{(5/6)^{2}}+\frac{1-4/6}{(4/6)^{2}}+\frac{1-3/6}{(3/6)^{2}}+\frac{1-2/6}{(2/6)^{2}}+\frac{1-1/6}{(1/6)^{2}}=38.99.
$$
7.92 a) Clearly, if one has more siblings, there is a greater chance of having more sisters; thus the correlation coefficient between X and Y must be positive.
b)
$$
\rho(X,Y)=\frac{E(XY)-E(X)E(Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}}
=\frac{1.5-1.3\times(29/40)}{\sqrt{0.91}\sqrt{0.549375}}\approx 0.788478.
$$

7.93 a) Since the trials are independent, X_{ik} and X_{jℓ} are independent for i ≠ j (and all k and ℓ), implying that Cov(X_{ik}, X_{jℓ}) = 0 for i ≠ j and all k and ℓ.
b) For all k ≠ ℓ, X_{ik}X_{iℓ} = 0, since E_k and E_ℓ are mutually exclusive. Thus, for k ≠ ℓ,
$$
\mathrm{Cov}(X_{ik},X_{i\ell})=E(X_{ik}X_{i\ell})-E(X_{ik})E(X_{i\ell})=0-p_kp_\ell=-p_kp_\ell.
$$
c) Since X_k = Σ_{i=1}^{n} X_{ik}, for all k ≠ ℓ we have, by (a) and (b),
$$
\mathrm{Cov}(X_k,X_\ell)=\mathrm{Cov}\left(\sum_{i=1}^{n}X_{ik},\,\sum_{j=1}^{n}X_{j\ell}\right)
=\sum_{i=1}^{n}\sum_{j=1}^{n}\mathrm{Cov}(X_{ik},X_{j\ell})
=\sum_{i=1}^{n}\mathrm{Cov}(X_{ik},X_{i\ell})=-np_kp_\ell.
$$
d) For k ≠ ℓ, since there are only n experiments, the more times E_k occurs, the fewer times E_ℓ can possibly occur. Thus X_k and X_ℓ must be negatively correlated.
e) Since Var(X_j) = np_j(1 − p_j), for all k ≠ ℓ,
$$
\rho(X_k,X_\ell)=\frac{\mathrm{Cov}(X_k,X_\ell)}{\sqrt{\mathrm{Var}(X_k)\mathrm{Var}(X_\ell)}}
=\frac{-np_kp_\ell}{\sqrt{np_k(1-p_k)\,np_\ell(1-p_\ell)}}
=-\sqrt{\frac{p_k}{1-p_k}\cdot\frac{p_\ell}{1-p_\ell}}\,;
$$
and, clearly, ρ(X_k, X_ℓ) = 1 when k = ℓ.

7.94 Since T = 10X + 5Y,
$$
\mathrm{Var}(T)=\mathrm{Var}(10X+5Y)=100\,\mathrm{Var}(X)+25\,\mathrm{Var}(Y)+100\,\mathrm{Cov}(X,Y)
$$
$$
=100\cdot 4\cdot\frac{\pi}{36}\left(1-\frac{\pi}{36}\right)+25\cdot 4\cdot\frac{3\pi}{36}\left(1-\frac{3\pi}{36}\right)-100\cdot\frac{4\pi(3\pi)}{36^{2}}\approx 42.0479.
$$

7.95 Let C represent the cost of repairing the defective items in the sample. Also, let X be the number of items with exactly one defect in the sample, and Y the number of items with at least two defects in the sample. Then C = X + 3Y and, by equation (*) in Exercise 7.93,
$$
\mathrm{Var}(C)=\mathrm{Var}(X+3Y)=\mathrm{Var}(X)+9\,\mathrm{Var}(Y)+6\,\mathrm{Cov}(X,Y)
=np_1(1-p_1)+9np_2(1-p_2)-6np_1p_2.
$$

7.96 Using the PMFs given in Example 6.7, the mean of the total number of accidents is 22/7 and the standard deviation is √(172/147) ≈ 1.0817.
7.97 Using the joint and marginal PMFs given in Exercise 6.6, one obtains
$$
\rho(X,Y)=\frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}}
=\frac{\dfrac{441}{36}-\dfrac{91}{36}\cdot\dfrac{161}{36}}
{\sqrt{\dfrac{301}{36}-\dfrac{91^{2}}{36^{2}}}\,\sqrt{\dfrac{791}{36}-\dfrac{161^{2}}{36^{2}}}}
=\frac{35}{73}.
$$
Clearly, X and Y are positively correlated, although only mildly so. The positive correlation is natural, since if X is large then Y must be large as well; however, ρ is not very close to 1 because, when X is small, Y can still vary greatly.

7.98 Cov(X, Y) = 8.8 and ρ(X, Y) ≈ 0.999175, so X and Y have a strong positive linear correlation. Indeed, let A represent the size of a surgical claim and B the size of a hospital claim. Then X = A + B and Y = A + 1.2B. Based on the information provided, E(A) = 5, E(A²) = 27.4, E(B) = 7, E(B²) = 51.4 and Var(A + B) = 8. Then Var(A) = Var(B) = 2.4 and
$$
8=\mathrm{Var}(A+B)=\mathrm{Var}(A)+\mathrm{Var}(B)+2\,\mathrm{Cov}(A,B)=2.4+2.4+2\,\mathrm{Cov}(A,B),
$$
which implies that Cov(A, B) = 1.6, which in turn implies that E(AB) = 36.6. Thus
$$
\mathrm{Cov}(X,Y)=E(XY)-E(X)E(Y)=E\bigl(A^{2}+2.2AB+1.2B^{2}\bigr)-(12)(13.4)
=E(A^{2})+2.2E(AB)+1.2E(B^{2})-160.8=8.8.
$$
Moreover, Var(X) = 8 and Var(Y ) = 9.696. Thus ⇢ = 0.999175. Clearly, there is a strong positive linear correlation between X and Y . 7.99 a) Note that x2 + y 2 Therefore,

2|x||y| = (|x| E|XY | 

0 for all x, y 2 R, thus, 21 (x2 + y 2 )

|y|)2

|xy|.

1 E(X 2 + Y 2 ) < 1, 2

hence, XY has finite mean. b) First we note (x + y)2 = x2 + 2xy + y 2  2(x2 + y 2 )

where the inequality follows from the inequality shown in (a). Since X and Y have finite variances, then both X and Y have finite second moments, which by the just proved inequality implies that X + Y has a finite second moment. Thus, variance of X + Y is well-defined and, by the computing formula for the variance, we have that 0  Var(X + Y ) = E(X + Y )2

(E(X) + E(Y ))2

 2 E X2 + E Y 2

(E(X) + E(Y ))2 < 1.

7.100 By linearity of expectation we have that Cov(X, Y ) = E((X = E(XY )

µX )(Y

µX E(Y )

µY )) = E(XY

µX Y

µY E(X) + µX µY = E(XY )

µY X + µX µY ) E(X)E(Y ).

7.101 Let X denote the value of the variable for a randomly selected member of the population, and let X₁, ..., X_n denote, respectively, the values of the variable for the n members of the random sample. As the sampling is with replacement, the random variables X₁, ..., X_n are independent and constitute a random sample of size n from the distribution of X. Therefore, by Example 7.16 (b), Var(X̄_n) = Var(X)/n = σ²/n, where the last step follows from Proposition 7.9.
=

1

(E(XY )

X Y

Y

1 X Y

E(X)E(Y )) = p

E (XY

µX Y + µX µY )

µY X

Cov(X, Y ) Var(X)Var(Y )

.

The quantity is unitless since the units of the numerator and denominator are both the product of the units of X and Y . 7.103 a) Since E(X) = 0, we have that Cov(X, Y ) = E(XY ) = E(X 3 ) = ( 1)3 ⇥

1 1 1 + 03 ⇥ + 13 ⇥ = 0, 3 3 3

thus, ⇢(X, Y ) = 0. b) The correlation coefficient is a measure of linear association, whereas here the functional

239

CHAPTER 7. Expected Value of Discrete Random Variables

relationship is quadratic. 7.104 a) Upon taking derivatives with respect to a and b, we arrive at the following two equations for a and b: E(Y a bX) = 0, E(XY

bX 2 ) = 0.

aX

From the first equation it follows that

a = E(Y )

bE(X).

Let us plug-in the latter expression into the second equation. Then (E(Y )

E(XY

bX 2 ) = 0,

b E(X))X

hence E(XY )

and thus

E(Y )E(X) b=

b(E(X 2 )

Cov(X, Y ) = Var(X)

Y

(E(X))2 ) = 0,

⇢(X, Y ).

X

b) For values of a, b from part (a), we have that E(Y = Var(Y )

a



bX) = E (Y 2

E(Y ))

b(X

2bCov(X, Y ) + b2 Var(X) = Var(Y )

c) From (b), it follows that Var(Y )

(Cov(X, Y ))2 Var(X)

◆2 E(X)) (Cov(X, Y ))2 . Var(X)

0, therefore,

(Cov(X, Y ))2  1, thus (⇢(X, Y ))2  1, Var(X)Var(Y ) which implies part (a) of Proposition 7.16. d) If ⇢ = 1 or 1, then (Cov(X, Y ))2 = Var(X)Var(Y ). Pick a and b as in (a), then we have E(Y (a + bX))2 = 0, which implies that Y = a + bX with probability 1. Note also that since Y b= X ⇢(X, Y ), then b has the same sign as ⇢(X, Y ).

Conversely, suppose Y = ↵ + X for some real constants ↵ and > 0. Then the mean square error, E [Y (a + bX)]2 , is zero (and therefore minimized) when a = ↵ and b = , Y which, by (a), implies that = X ⇢(X, Y ). Since Y = | | X , it follows that ⇢(X, Y ) =

| |

=



1, if 1, if

> 0, < 0.

e) ⇢ measures how closely a linear function of one variable fits another variable (in the sense of the mean square error). Theory Exercises

7.4 Variance, Covariance, and Correlation

240

7.105 a) Cov(X, X) = E((X µX )(X µX )) = E((X µX )2 ) = Var(X). b) Cov(X, Y ) = E ((X µX )(Y µY )) = E ((Y µY )(X µX )) = Cov(Y, X). c) Cov(cX, Y ) = E((cX µcX )(Y µY )) = E((cX cµX )(Y µY )) = E(c(X µX )(Y µY )) = c Cov(X, Y ). d) Note that Cov(X, Y + Z) = E ((X = E(X

µX )(Y

µX )(Y + Z µY ) + E(X

a + bX

e) Cov(a + bX, Y ) = E = bCov(X, Y ).

µY +Z )) = E ((X µX )(Z

µ(a+bX) (Y

µX )((Y

µY ) + (Z

µZ )))

µZ ) = Cov(X, Y ) + Cov(X, Z).

µY ) = E((a + bX

a

bµX )(Y

µY ))

7.106 By induction and Proposition 7.12 (parts (b),(c),(d)), we have that for arbitrary random variables X, Y1 , . . . , Yn defined on the same sample space and having finite variances, ✓

Cov X,

n X

bk Yk

k=1



=

n X

bk Cov(X, Yk ).

k=1

Thus, Cov

✓X m j=1

aj Xj ,

n X

bk Yk

k=1

=



m X n X

=

n X

bk Cov

✓X m

aj bk Cov(Yk , Xj ) =

j=1 k=1

7.107 Clearly,

Cov(X, Y ) = E((X = E(XY )

aj Xj , Yk

j=1

k=1

m X n X



=

n X



bk Cov Yk ,

k=1

m X

aj Xj

j=1



aj bk Cov(Xj , Yk ).

j=1 k=1

µX )(Y

µX E(Y )

µY )) = E(XY

µX Y

µY E(X) + µX µY = E(XY )

µY X + µX µY ) E(X)E(Y ).

7.108 By Proposition 7.12, we have that ✓X ◆ ✓X ◆ X m m m m X m X Var Xk = Cov Xk , X` = Cov(Xk , X` ) k=1

=

m X

Cov(Xk , Xk ) +

k=1

k=1

m X m X

`=1

k=1 `=1

Cov(Xk , X` ) =

k=1 `=1 `6=k

m X k=1

Var(Xk ) + 2

X

1k 0. The latter implies that Y |X=x has finite variance for each x 2 R such that pX (x) > 0.

7.128 a) Recall that (see the argument involving equation by the conditional form of P (7.70)), P the FEF, g(Z, Y )|X=x has finite expectation if and only if (z,y) |g(z, y)|pZ,Y |X (z, y | x) < 1, in which case XX E(g(Z, Y ) | X = x) = g(z, y)pZ,Y |X (z, y | x). (z,y)

Take Z = X in the above statement. Since for all x such that pX (x) > 0, we have that ⇢ P (X = z, Y = y, X = x) pY |X (y | x), if z = x, pX,Y |X (z, y | x) = = 0, if z 6= x, P (X = x) P then g(X, Y )|X=x has finite expectation if and only if y |g(x, y)|pY |X (y | x) < 1, in which case X E(g(X, Y ) | X = x) = g(x, y)pY |X (y | x). y

b) Let g(x, y) = k(x)`(x, y). By (a), ⇠(x) ⌘ E(g(X, Y ) | X = x) =

X

g(x, y)pY

y

= k(x)

X

| X (y | x)

`(x, y)pY

y

| X (y | x)

= k(x)E(`(X, Y ) | X = x),

then E(k(X)`(X, Y ) | X) = ⇠(X) = k(X)E(`(X, Y ) | X).

c) We have

E(E(g(X, Y )|X)) = =

XX x

y

X x

E(g(x, Y )|X = x)pX (x) =

g(x, y)pX (x)pY |X (y|x) =

XX (x,y)

X X x

y

!

g(x, y)pY |X (y|x) pX (x)

g(x, y)pX,Y (x, y) = E(g(X, Y )).

Advanced Exercises 7.129 7/9. Let N be the number of unreimbursed accidents. Then, by Example 6.15, P (N = 0 | X = 0) =

12 4 = , 27 9

4 5 9 1 + = = . 27 27 27 3 6 2 P (N = 2 | X = 0) = pY,Z | X (2, 2 | 0) = = . 27 9

P (N = 1 | X = 0) = pY,Z | X (1, 2 | 0) + pY,Z | X (2, 1 | 0) =

7.5 Conditional Expectation

250

E(N ) = 0 · P (N = 0 | X = 0) + 1 · P (N = 1 | X = 0) + 2 · P (N = 2 | X = 0) =

1 2 7 +2⇥ = . 3 9 9

7.130 a) First, by Exercise 7.128, we have that X

E(Xk X` | Xk = xk ) =

x`

xk x` pX` | Xk (x` | xk )

nX xk

=

xk x`

x` =0



n

xk x`

◆✓

p` 1 pk

implying that E(Xk X` | Xk ) =

◆ x` ✓

1

p` 1 pk

◆n

xk x`

=

p` xk (n xk ) , 1 pk

p` Xk (n Xk ) . 1 pk

Thus, ✓ ◆ n X p` xk (n xk ) n xk E(Xk X` ) = E(E(Xk X` | Xk )) = p (1 1 pk xk k

pk )n

xk

= (n

1)npk p` .

xk =0

b) Using (a), we have Cov(Xk , X` ) = E(Xk X` ) E(Xk )E(X` ) = n(n 1)pk p` npk ·np` = which agrees with Exercise 7.93.

npk p`

7.131 Note that (Y | X = x) is B(x, p). Thus, E(Y | X) = pX and Var(Y | X) = p(1 p)X, implying that Var(Y ) = Var(E(Y | X)) + E(Var(Y | X)) = p2 Var(X) + p(1 p)E(X) = p , and E(XY ) E(X)E(Y ) = E(E(XY | X)) E(X)E(E(Y | X)) = E(XpX) p(E(X))2 = pVar(X) = p . p Thus ⇢(X, Y ) = p p

p

=

p

p.

7.132 a) For each k 2 H, by Exercise 7.128, E((Y

(X))k(X)) = E(E((Y

(X))k(X)|X)) = E(k(X)(E(Y |X)

E(Y |X))) = 0.

b) Note that (Y h(X))2 (Y (X))2 2(Y (X))( (X) h(X)) = (h(X) (X))2 0. c) Taking the expectation of both sides of the inequality in (b) and applying (a), we obtain that for all h 2 H, ⇣ ⌘ ⇣ ⌘ ⇣ ⌘ E (Y h(X))2 E (Y (X))2 = E (Y E(Y | X))2 . Since

2 H, then

⇣ E (Y

⌘ ⇣ E(Y | X))2 = min E (Y h2H

⌘ h(X))2 .

7.133 a) If E(Y | X) = a + bX then E(Y ) = E(E(Y |X)) = a + bE(X). By Exercise 7.128, we have that E(XY |X) = XE(Y |X). Then (⇢(X, Y ))2 =

(E(XY ) E(X)E(Y ))2 [E(E(XY |X)) E(X) (a + bE(X))]2 = Var(X)Var(Y ) Var(X)Var(Y )

251

CHAPTER 7. Expected Value of Discrete Random Variables

=



E(X(a + bX))

=



b (E(X))2

aE(X)

Var(X)Var(Y )

b2 E(X 2 )

(E(X))2

Var(Y )



=

⌘2

2

b2 E(X 2 ) E(X 2 ) = Var(X)Var(Y )

b2 Var(X) Var(E(Y |X)) = . Var(Y ) Var(Y )

b) If the conditional expectation is linear, the coefficient of determination is the proportion of the total variance of Y that is explained.

7.6 Review Exercises
Basic Exercises

7.134 31.4, due to the long-run-average interpretation of the expected value.

7.135 a) $290 b) $2900

7.136 a) The mean of the salesman's daily commission is $420, with a standard deviation of $76.68. Indeed, let X be the daily number of sales and Y the salesman's total daily commission. Then X is B(n = 20, p = 0.6) and Y = 35X, implying that
E(Y) = 35E(X) = 35 × 20 × 0.6 = 420 and σ_Y = 35σ_X = 35√(20 × 0.6 × 0.4) = 35√4.8 ≈ 76.68.
b) In (a) it was assumed that the attempts to make a sale were independent, each with the same probability of success (a sale), and thus represented Bernoulli trials.

7.137 Let Y denote the amount the insurance company pays. Then the PMF of Y is

    y        0           1000000        2000000
    pY(y)    e^{-0.6}    0.6e^{-0.6}    1 - 1.6e^{-0.6}
Then the mean and standard deviation of Y are equal to $573089.75 and $698899.60, respectively. 7.138 a) No, because the mean returns are equal. b) The second investment is more conservative since its standard deviation is smaller. 7.139 a) The nth factorial moment equals k = 0, . . . , n 1, then E(X(X

1) . . . (X

n + 1)) =

1 X

k(k

n.

Since k(k

1) . . . (k

n + 1)

k=0

=e

1 X `=0

`+n

`!

=

where we put ` = k n. b) From (a) with n = 1, E(X) = ; when n = 2, E(X(X

n

1) . . . (k k

e k!

=

1 X

k=n

n + 1) = 0 for all k! (k

n)!

·

k

e k!

,

1)) = E(X 2 )

=

2,

implying that

7.6 Review Exercises

E(X 2 ) =

+ , thus, Var(X) = (

2

2

+ )

2

252

= .

7.140 a) Using Exercise 5.136 (a), we have
$$
E(Y)=\sum_{y=1}^{m-1}yp(1-p)^{y-1}+m(1-p)^{m-1}=\frac{1-(1-p)^{m}}{p}.
$$
b) Using the FEF and the PMF of X, we have
$$
E(Y)=\sum_{x=1}^{\infty}\min(x,m)\,p(1-p)^{x-1}
=\sum_{x=1}^{m}xp(1-p)^{x-1}+pm\sum_{x=m+1}^{\infty}(1-p)^{x-1}
$$
$$
=E(X)-\sum_{x=m+1}^{\infty}xp(1-p)^{x-1}+m(1-p)^{m}
=E(X)-(1-p)^{m}\sum_{j=1}^{\infty}(j+m)p(1-p)^{j-1}+m(1-p)^{m}
$$
$$
=E(X)-(1-p)^{m}E(m+X)+m(1-p)^{m}
=\frac{1}{p}-(1-p)^{m}\left(m+\frac{1}{p}\right)+m(1-p)^{m}
=\frac{1-(1-p)^{m}}{p}.
$$
c) Using tail probabilities of Y, for m ∈ N, we have
$$
E(Y)=\sum_{y=0}^{\infty}P(Y>y)=\sum_{y=0}^{\infty}P\bigl(\min(X,m)>y\bigr)
=\sum_{y=0}^{m-1}P(X>y)=\sum_{y=0}^{m-1}(1-p)^{y}=\frac{1-(1-p)^{m}}{p}.
$$
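The capped-geometric mean formula E(min(X, m)) = (1 − (1 − p)^m)/p is simple to verify numerically; the sketch below (not part of the original solutions) uses the exam parameters p = 0.02 and m = 60 from Exercise 7.141.

```python
def mean_capped_geometric(p, m, cutoff=100_000):
    """E[min(X, m)] for X geometric on {1, 2, ...}, by direct summation of the PMF."""
    return sum(min(x, m) * p * (1 - p) ** (x - 1) for x in range(1, cutoff))

p, m = 0.02, 60
print(mean_capped_geometric(p, m))    # about 35.12
print((1 - (1 - p) ** m) / p)         # closed form from Exercise 7.140
```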

7.141 a) Using the expectation of a geometric random variable, the average time taken to complete the exam is 1/0.020 = 50 minutes.
b) Using Exercise 7.140, if there is a 1-hour time limit (i.e., m = 60), then the average time it takes a student to complete the exam is (1 − (1 − 0.02)⁶⁰)/0.02 ≈ 35.122 minutes.

7.142 a) The following table gives the conditional expectation of the number of radios for each number of televisions:

    Televisions x                           0        1        2        3        4
    Conditional expectation E(Y | X = x)    5.500    5.391    5.650    5.562    5.646
b) The following table gives the conditional variance of the number of radios given the number of televisions:

    Televisions x                           0        1        2        3        4
    Conditional variance Var(Y | X = x)     1.250    0.628    0.788    0.654    0.699
P 7.143 a) From Exercise 7.142 (a), we have E(Y ) = x E(Y |X = x)pX (x) = 5.574. b) From Exercise 6.123 we have E(Y ) = 5.576. The small di↵erence between this answer and the one in (a) is due to rounding error. c) From Exercise 7.142 (a) and (b), we have Var(Y ) = 0.722. d) Using Exercise 6.123 we have Var(Y ) = 0.724, where the small di↵erence between this answer and the one in (c) exists due to rounding error. 7.144 a) The PMF of X is pX (x) =

13 2x 36

E(X) =

for x = 1, 2, 3, 4, 5, 6 and pX (x) = 0 otherwise. Thus,

6 X x=1



13

2x 36

=

91 . 36

b) Using the joint PMF of Y and Z and the FEF, we have E(X) =

XX

min(y, z)

(y,z)2{1,...,6}2

1 91 = . 36 36

p p p p 7.145 a) Cov(X, Y ) = ⇢(X, Y ) Var(X) Var(Y ) = 12 25 144 = 30. b) Cov(X + Y, X Y ) = Var(X) Var(Y ) = 25 144 = 119. c) Var(X + Y ) = Var(X) + Var(Y ) + 2Cov(X, Y ) = 25 + 144 + 2( 30) = 109. d) Var(X Y ) = Var(X) + Var(Y ) 2Cov(X, Y ) = 25 + 144 2( 30) = 229. e) Var(3X + 4Y ) = 9Var(X) + 16Var(Y ) + 24Cov(X, Y ) = 1809. f ) Cov(3X + 4Y, 2X + 5Y ) = 6Var(X) + 7Cov(X, Y ) + 20Var(Y ) = 2520. 7.146 Let X be the number of wells drilled until the third producing well is found. Then X ⇠ N B(3, 1/3). Let Y be the amount of the company’s profit. Then E(Y ) = $7500 and the standard deviation of Y is $2121.32, since Y = 12000 500X and, thus, p E(Y ) = 12000 500E(X) = 7500, Y = 500 X = 500 18 = 2121.32 7.147 The amount (call it, say, c) that should be put aside annually for cash register repair is equal to $178.67. Indeed, let X be the annual repair cost. Then, on the other one hand, we need to have P (X > c)  0.05, on the other hand, by Chebyshev’s inequality, P (X > c) = P (X

µX > c

Thus, it suffices to pick c so that

144 (c 125)2

125)  P (|X

µ| > c

125) 

144 . (c 125)2

= 0.05, and the required result follows.

7.6 Review Exercises

7.148 Note that E(X) = X

=

r

1 (0.000 103

254

+ 0.001 + 0.002 + . . . + 0.999) = 0.4995, and

1 (0.0002 + 0.0012 + 0.0022 + . . . + 0.9992 ) 103

0.49952 ⇡ 0.2887

7.149 Since the expected amount paid by the policy is 5 X

(5000

1000(x

1)) ⇥ 0.1 ⇥ 0.9x

x=1

1

= 1314.41,

the company should charge $1414.41 (per policy), so that on average it nets a $100 profit. 7.150 a) If X represents the number of red balls obtained assuming that they are drawn with replacement, then X ⇠ B(2, 0.6) with the PMF of X given by: ( 2 x 2 x , if x 2 {0, 1, 2}, x 0.6 0.4 pX (x) = 0, otherwise. b) If Y represents the number of red balls obtained assuming that they are drawn without replacement, then Y ⇠ H(10, 2, 0.6) and the PMF of Y is given by: pY (y) =

8 > <

6 y

> :

4 2 y 10 2

, if y 2 {0, 1, 2},

0,

otherwise.

c) E(X) = 1.2 and E(Y ) = 1.2, since, by definition of expectation and PMFs in (a),(b), we have E(X) = 1 ⇥ 2(0.6)(0.4) + 2 ⇥ 1(0.6)2 = 1.2, E(Y ) = 1 ⇥

6 1

4 1 10 2

+2⇥

6 2

4 0

=

10 2

24 15 54 +2⇥ = = 1.2 45 45 45

d) From Table 7.4 we have that E(X) = 2 ⇥ 0.6 = 1.2 and E(Y ) = 2 ⇥ 0.6 = 1.2; e) Var(X) = 0.48 and Var(Y ) = 0.4267, since ✓ ◆ ✓ ◆ ✓ ◆ 2 2 0 2 2 2 2 2 Var(X) = 0 0.6 0.4 + 1 0.6 · 0.4 + 2 0.62 0.40 1.22 = 0.48, 0 1 2 Var(Y ) = 02

6 0

4 2 10 2

+ 12

6 1

4 1 10 2

+ 22

6 2

4 0 10 2

1.22 =

84 45

1.22 ⇡ 0.4267

f ) From Table 7.10, it follows that Var(X) = 2 ⇥ 0.6 ⇥ 0.4 = 0.48 and 2 Var(Y ) = 10 10 1 ⇥ 2 ⇥ 0.6 ⇥ 0.4 ⇡ 0.4267 7.151 a) 1 + n(1 (1 p)n ). Let X be the number of tests required under the alternative scheme for a random sample of n people. Then E(X) = 1 ⇥ (1

p)n + (n + 1) (1

(1

p)n ) = 1 + n (1

(1

p)n ) .

255

CHAPTER 7. Expected Value of Discrete Random Variables

b) The alternative scheme is superior whenever 1 + n(1 p u) =

u=0

1 X

P (X > u)P (Y > u) =

u=0

1 X

(1

p)u (1

q)u =

u=0

1 . p + q pq

b) Let V = max(X, Y ). Then E(V ) =

1 X

P (V > v) =

v=0

=

1 X

1 X

(1

v=0

[1

(1

(1

P (V  v)) =

p)v )(1

(1

v=0

1 X

[1

v=0

q)v )] =

P (X  v)P (Y  v)]

1 1 + p q

1 . p + q pq

c) From (a),(b) for p = q we have that E(min(X, Y )) =

1 (2

7.155 Note that Var(X) + Var(Y ) = E(X) = 1 ·

1 2

p)p

and E(max(X, Y )) =

+

=

13 4

15 4

3 (2

2p . p)p

= Var(X + Y ), since, on the one hand,

8 4 4 4 4 4 4 +2· +3· = 2, E(Y ) = 1 · +3· +4· +6· = 3.5, 16 16 16 16 16 16 16

implying that

4 8 4 1 + 22 ⇥ + 32 ⇥ 22 = , 16 16 16 2 4 13 Var(Y ) = 12 + 32 + 42 + 62 ⇥ 3.52 = 3.25 = ; 16 4 Var(X) = 12 ⇥

on the other hand, E(X + Y ) = 2 ⇥

1 2 2 3 3 2 2 1 11 +3⇥ +4⇥ +5⇥ +6⇥ +7⇥ +8⇥ +9⇥ = , 16 16 16 16 16 16 16 16 2

7.6 Review Exercises

1 + 32 ⇥ 16 2 + 72 ⇥ + 82 ⇥ 16

2 + 42 ⇥ 16 2 + 92 ⇥ 16

Var(X + Y ) = 22 ⇥

256

2 3 3 + 52 ⇥ + 62 ⇥ 16 16 16 ✓ ◆2 1 11 544 121 15 = = . 16 2 16 4 4

7.156 By Exercise 6.138, the conditional distribution of the number of undetected defectives (in a random sample of size n), given that the number of detected defectives equals k, is binomial q) with parameters n k and p(1 1 pq , implying that the required conditional expectation is equal to (n

q) k) p(1 1 pq and the conditional variance is (n

q) k) p(1(1 p)(1 . pq)2

7.157 a) The mean of X is 77.700 = 60 5.700 and the standard deviation is 2.75900 . b) The mean of Y is 2.642 cm and the standard deviation is 7.007 cm, since Y = 2.54X (where recall that 100 = 2.54 cm), resulting in E(Y ) = 2.54E(X)

200 = 2.54 ⇥ 77.7 Y

= 2.54

X

200 =

200

2.642,

⇡ 7.007

c) Clearly, since there is a linear relationship between X and Y and 2.54 > 0, the correlation coefficient is 1. 7.158 Note that E



1 X



=

1 X p(1 x=1

p)x x

1

=

p 1

p

1 X (1

|x=1 {z

=

which is certainly di↵erent from

1 = p. E(X)

p)x x

=

}

p ln (p) , 1 p

ln(1 (1 p))

7.159 a) 1. Since it takes X trials to receive a single success, one expects a single success in the next X trials, thus the expected value of Y should equal 1. b) 1. Note that X ⇠ G(p) and (Y | X = x) ⇠ B(x, p). So E(Y ) = c) 2

X x

E(Y |X = x)pX (x) =

1 X

(xp) p(1

p)x

1

= 1.

x=1

2p. Note that E(Y | X) = pX and Var(Y | X) = p(1

p)X, thus,

Var(E(Y | X)) = Var(pX) = p2 Var(X) = 1

p

and E(Var(Y | X)) = E(p(1

p)X) = p(1

p)E(X) = 1

implying that Var(Y ) = Var(E(Y | X)) + E(Var(Y | X)) = 2(1

p,

p).

7.160 If Y has P( ), then, since distribution of Xn converges to distribution of Y , we must have that  (npn )2 Var(Y) = lim Var(Xn ) = lim npn (1 pn ) = lim npn = , n!1 n!1 n!1 n

257

CHAPTER 7. Expected Value of Discrete Random Variables

since npn !

as n ! 1.

7.161 a) If N is large relative to n, then sampling without replacement is well approximated by sampling with replacement, implying that distribution of X is approximately binomial with parameters n and p, thus, Var(X) ⇡ np(1 p). n b) Since the exact distribution of X is H(N, n, p), then Var(X) = ( N N 1 )np(1 p) ! np(1 p) as N ! 1. 7.162 a)

zm , since, for z = 0, . . . , m + n, m+n

E(X|X + Y = z) = =

m X

m x

x

xpX|X+Y =z (x|z) =

x=0

px (1

p)m m+n z

x=0

b)

m X

m X x=0

x

n z x

pz (1

x (1

pz

p)n

m

x

pX,X+Y (x, z) X pX (x)pY (z x) = x pX+Y (z) pX+Y (z) x=0

m X

z+x

=

p)m+n z

x

|x=0

m n x z x m+n z

{z

= }

zm . m+n

m = mean of H(m+n,z, m+n )

zmn(m + n z) m . Indeed, since (X | X + Y = z) is H(m + n, z, m+n ), then (m + n 1)(m + n)2 Var(X | X + Y = z) =



m+n m+n

◆ ✓ ◆✓ ◆ z m n z . 1 m+n m+n

Theory Exercises 7.163 a) For all x, P (X = x) =

X

P (X = x, Y = y) = P (X = x, Y = x) +

y

X

P (X = x, Y = y),

y6=x

then, since P (Y = X) = 1, 1=

X x

thus,

P (X = x) =

X |

P (X = x, Y = x) +

x

1=1+

{z

=P (Y =X)=1

X✓ X x

P

y6=x

}

X✓ X x

y6=x

◆ P (X = x, Y = y) ,

◆ P (X = x, Y = y) ,

implying that y6=x P (X = x, Y = y) = 0 for each x. Therefore, P (X = x) = P (X = x, Y = x) for all x. Similarly, one shows that P (Y = x) = P (Y = x, X = x) for all x, thus, for all x 2 R, pX (x) = P (X = x, Y = x) = pY (x),

7.6 Review Exercises

258

i.e. X and Y have the same PMF. b) By (a) and computing formulas for expectation and variance, X and Y must have the same mean and variance, since X X E(X) = xpX (x) = xpY (x) = E(Y ), x

Var(X) =

X

x2 pX (x)

x

X

(E(X))2 =

x

x2 pY (x)

(E(Y ))2 = Var(Y ).

x

¯ n = X1 +...+Xn , then using properties of 7.164 a) Let 2 be the (finite) variance of X. Also, let X n ¯ n ) = E(X) and Var(X ¯ n ) = 2 . Then, by Chebyshev’s inequality expectation and variance, E(X n ¯ n , for all ✏ > 0, applied to X 2

¯n P( X

E(X)

✏) 

, n✏2 where the right-hand side goes to 0, as n ! 1. Thus, for each ✏ > 0, ◆ ✓ X1 + · · · + Xn ¯ n E(X) lim P E(X) < ✏ = 1 lim P ( X n!1 n!1 n

✏) = 1

0 = 1.

¯ n in (a) represents relative frequency of the occurrence b) If X = 1E , then E(X) = P (E), and X of event E in n independent repetitions of the experiment. Then the weak law of large numbers (given in (a)) states that the relative frequency of event E in a large number n of repetitions of the experiment is approximately equal to P (E) (with probability close to 1), which to a certain extent provides confirmation of the frequentist interpretation of probability. Advanced Exercises 7.165 E(X) = ↵ and Var(X) = ↵ + ↵2 . Indeed, note that ✓ 1 X E(X) = x x=1

↵ = 1+↵ 0

↵ @ = (1 + 1+↵ =

pX (0) +

◆x

↵ 1+↵

1 ✓ X x=2

· 0)pX (0) +

↵ 1+↵

1 ✓ X y=1

1(1 + ) · · · (1 + (x x!

·

◆x

1

pX (0)

1(1 + ) · · · (1 + (x · (x 1)!

◆y

↵ 1+↵

1) )

1(1 + ) · · · (1 + (y y!

1) )

!

pX (0)

1) )(1 + y )

↵ ↵ E(1 + X) = (1 + E(X)) , implying that E(X) = ↵. 1+↵ 1+↵

Similarly, E(X ) = 2

↵ = 1+↵

1 X x=1

2

x



↵ 1+↵

✓ 1 X pX (0) + x x=2

◆x

↵ 1+↵

·

1(1 + ) · · · (1 + (x x!

◆x

1

1) )

1(1 + ) · · · (1 + (x · (x 1)!

pX (0)A

pX (0) 1) )

!

pX (0)

1

259

CHAPTER 7. Expected Value of Discrete Random Variables

=

0

1

◆y



1 X

↵ @ ↵ 1(1 + ) · · · (1 + (y 1) )(1 + y ) pX (0) + (y + 1) · pX (0)A 1+↵ 1+↵ y! y=1 0 1 1 ↵ ↵ @X (y + 1)(1 + y )pX (y)A = E ((X + 1)(1 + X)) = 1+↵ 1+↵ y=0

=

↵ 1+↵

↵(1 + ( + 1)↵) ↵ + E(X 2 ), 1+↵ 1+↵

1 + ( + 1)E(X) + E(X 2 ) =

implying that E(X 2 ) = ↵ + ↵2 + ↵2 , thus, Var(X) = E(X 2 )

(E(X))2 = ↵ + ↵2 .

7.166 a) Let X denote the total number of coupons that need to be collected until a complete set is obtained. Also, let X_i be the number of coupons collected to get the (i+1)st new coupon after the ith new coupon has been collected, i = 0, ..., M − 1. Then X_i ∼ G((M − i)/M). Also, X = Σ_{i=0}^{M−1} X_i, and therefore, by linearity,
$$
E(X)=E\left(\sum_{i=0}^{M-1}X_i\right)=\sum_{i=0}^{M-1}E(X_i)=\sum_{i=0}^{M-1}\frac{M}{M-i}=M\sum_{j=1}^{M}\frac{1}{j}.
$$
b) Since X₀, ..., X_{M−1} are independent,
$$
\mathrm{Var}(X)=\mathrm{Var}\left(\sum_{i=0}^{M-1}X_i\right)=\sum_{i=0}^{M-1}\mathrm{Var}(X_i)
=\sum_{i=0}^{M-1}\frac{1-(M-i)/M}{\bigl((M-i)/M\bigr)^{2}}
=M\sum_{i=0}^{M-1}\frac{i}{(M-i)^{2}}
=M^{2}\sum_{j=1}^{M}\frac{1}{j^{2}}-M\sum_{j=1}^{M}\frac{1}{j}.
$$
c) Since ln(M + 1) = ∫₀^M 1/(x+1) dx ≤ Σ_{j=1}^{M} 1/j ≤ 1 + ∫₁^M (1/x) dx = 1 + ln(M), we have
$$
\frac{\ln(M+1)}{\ln(M)}\le\frac{\sum_{j=1}^{M}1/j}{\ln(M)}\le\frac{1}{\ln(M)}+1,
$$
and both bounds tend to 1 as M → ∞, implying that
$$
\lim_{M\to\infty}\frac{\sum_{j=1}^{M}1/j}{\ln(M)}=1;
$$
thus, by (a), E(X) = M Σ_{j=1}^{M} 1/j ∼ M ln(M) as M → ∞.
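The coupon-collector mean M·H_M (and its M ln M growth) can be checked by simulation; the M = 6 case also recovers the value 14.7 of Exercise 7.37. A Python sketch (not part of the original solutions; coupon types are drawn uniformly at random):

```python
import random
from math import log

def draws_to_complete_set(M):
    """Draw coupons uniformly at random until all M types have appeared; return the number of draws."""
    seen, draws = set(), 0
    while len(seen) < M:
        seen.add(random.randrange(M))
        draws += 1
    return draws

random.seed(5)
for M in (6, 50):
    sims = 20_000
    avg = sum(draws_to_complete_set(M) for _ in range(sims)) / sims
    harmonic = M * sum(1 / j for j in range(1, M + 1))
    print(M, avg, harmonic, M * log(M))   # simulation vs exact M*H_M vs the M*ln(M) growth rate
```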

7.167 0.244; Let Xi denote the number of claims made by a given (randomly selected) policyholder in year i, i = 1, 2. Let H be the event that the policy holder is “high risk”. Then we know that P (H) = 0.1, (Xi | 1H = 1) is P(0.6) and (Xi | 1H = 0) is P(0.1) for i = 1, 2. Moreover, (X1 | 1H = y) and (X2 | 1H = y) are independent (for each y 2 {0, 1}). Then E(X2 | X1 = 1) =

1 X

x2 =0

x2 pX2 | X1 (x2 | 1) =

1 X

x2 =0

x2 ·

P (X2 = x2 , X1 = 1) P (X1 = 1)

7.6 Review Exercises

=

1 X

x2 =0

=

1 X

x2 =0

x2 ·

x2 ·

260

P (X2 = x2 , X1 = 1 | 1H = 1)P (H) + P (X2 = x2 , X1 = 1 | 1H = 0)P (H c ) P (X1 = 1 | 1H = 1)P (H) + P (X1 = 1 | 1H = 0)P (H c )

P (X2 = x2 | 1H = 1)P (X1 = 1| 1H = 1)0.1 + P (X2 = x2 | 1H = 0)P (X1 = 1| 1H = 0)0.9 P (X1 = 1 | 1H = 1)0.1 + P (X1 = 1 | 1H = 0)0.9 e

0.6 0.6x2

e

0.6 0.6

x2 !

x2 ·

·

1!

· 0.1 +

e

0.1 0.1x2

e

0.1 0.1

· 0.9 x2 ! 1! e 0.6 0.6 e 0.1 0.1 x2 =0 · 0.1 + · 0.9 1! 1! ✓X ◆ ✓X ◆ 1 1 e 0.6 ⇥ 0.06 e 0.6 0.6x2 e 0.1 ⇥ 0.09 e 0.1 0.1x2 = 0.6 x2 · + 0.6 x2 · e ⇥.06 + e 0.1 ⇥.09 x2 ! e ⇥.06 + e 0.1 ⇥.09 x2 ! x2 =0 x2 =0 {z } {z } | | =

1 X

·

=0.6

=

e

⇥ 0.036 + e e 0.6 ⇥ 0.06 + e 0.6

=0.1

0.1 0.1

⇥ 0.009 ⇡ 0.244 ⇥ 0.09

7.168 a) 8. Let X represent the number of times that the spelunker chooses the route which returns her to the cavern in 3 hours, let Y represent the number of times that she goes down the route which returns her in 4 hours, and let T represent her total time spent underground. Then T = 1 + 3X + 4Y. If the route is chosen at random each time, then p_{X,Y}(x, y) = \binom{x+y}{y} 3^{-(1+x+y)} for x, y ∈ Z⁺. Thus
$$
E(T)=\sum_{x=0}^{\infty}\sum_{y=0}^{\infty}(1+3x+4y)\binom{x+y}{y}\frac{1}{3^{1+x+y}}=8.
$$
Alternatively, note that, by symmetry, E(X) = E(Y), and X + Y + 1 is G(1/3), so 3 = E(X + Y + 1) = 2E(X) + 1, implying that E(X) = E(Y) = 1. Therefore E(T) = 1 + 3E(X) + 4E(Y) = 1 + 3 + 4 = 8.
b) 4.5; If the spelunker chooses a route at random from among the routes not already tried, then the distribution of the total amount of time spent underground by the spelunker is given by

    t        1      4      5      8
    pT(t)    1/3    1/6    1/6    1/3
and E(T) = 4.5.

7.169 (p + 1)/p². Let X be the number of trials until 2 consecutive successes occur, and let H be the event that the first trial is a success. Then
$$
E(X\mid 1_H=1)=2p+(2+E(X))(1-p)\qquad\text{and}\qquad E(X\mid 1_H=0)=1+E(X).
$$
Therefore,
$$
E(X)=E\bigl(E(X\mid 1_H)\bigr)=E(X\mid 1_H=1)P(1_H=1)+E(X\mid 1_H=0)P(1_H=0)
=2p^{2}+(2+E(X))p(1-p)+(1+E(X))(1-p),
$$
implying that E(X) = (p + 1)/p².
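The conditioning argument in 7.169 can be double-checked by simulation. A minimal Python sketch (not part of the original solution; p = 0.5 is an arbitrary illustrative value, for which (p + 1)/p² = 6):

```python
import random

def trials_until_two_in_a_row(p):
    """Count Bernoulli(p) trials until two consecutive successes occur."""
    run, n = 0, 0
    while run < 2:
        n += 1
        run = run + 1 if random.random() < p else 0
    return n

random.seed(6)
p, sims = 0.5, 200_000
avg = sum(trials_until_two_in_a_row(p) for _ in range(sims)) / sims
print(avg, "vs", (p + 1) / p**2)   # both close to 6 for p = 0.5
```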

7.170 a) 0. Note that for each i = 1, . . . , m, Xi has B(n, pi ). Moreover, by Exercise 7.130(b), Cov(Xi , Xj ) = npi pj for all i 6= j, thus, Var(X1 + . . . + Xm ) =

m X

Var(Xk ) + 2

=

npk (1

pk )

2

XX

1i 0, and fY (y) = 0 otherwise.

8.106 Let X denote the diameter of the produced ball bearings. Then X ∼ N(1.4, 0.025²). Let Y denote the diameter of the remaining ball bearings. For 1.35 ≤ y < 1.48,
FY (y) = P (Y ≤ y) = P (X ≤ y | 1.35 ≤ X ≤ 1.48) = P (X ≤ y, 1.35 ≤ X ≤ 1.48)/P (1.35 ≤ X ≤ 1.48) = P (1.35 ≤ X ≤ y)/P (1.35 ≤ X ≤ 1.48) = (FX (y) − FX (1.35))/(FX (1.48) − FX (1.35)).
Now,
FX (1.48) − FX (1.35) = Φ((1.48 − 1.4)/0.025) − Φ((1.35 − 1.4)/0.025) = Φ(3.20) − Φ(−2) = 0.9765.
Noting that 1/0.9765 ≈ 1.024, it now follows that
fY (y) = 1.024 fX (y) = 1.024 · (1/(√(2π) · 0.025)) e^{−(y−1.4)²/(2(0.025)²)} = (40.96/√(2π)) e^{−800(y−1.4)²}
for 1.35 < y < 1.48, and fY (y) = 0 otherwise.

8.107 Let X1 and X%2 denote the &amount of soda% put in a bottle & by Machines I and II, respectively. We know that X1 ∼ N 16.21, 0.142 and X2 ∼ N 16.12, 0.072 . Also, let A1 and A2 denote the events that a randomly selected bottle was filled by Machines I and II, respectively. Because Machine I fills twice as many bottles as Machine II, we have P (A1 ) = 2/3 and P (A2 ) = 1/3. Let B denote the event that a randomly selected bottle contains less than 15.96 oz of soda. We have % & P (B | A1 ) = P (X1 < 15.96) = ) (15.96 − 16.21)/0.14 = )(−1.79) = 0.0367 and

% & P (B | A2 ) = P (X2 < 15.96) = ) (15.96 − 16.12)/0.07 = )(−2.29) = 0.0110.

Applying Bayes’s rule now yields

P (A1 )P (B | A1 ) = P (A1 | B) = P (A1 )P (B | A1 ) + P (A2 )P (B | A2 )

2 3

2 3

· 0.0367

· 0.0367 +

1 3

· 0.0110

= 0.870.

Thus, 87.0% of the bottles that contain less than 15.96 oz are filled by Machine I.

8.108 Let X denote the number of people who don't take the flight. Then X ∼ B(42, 0.16). Thus,
$$
p_X(x)=\binom{42}{x}(0.16)^{x}(0.84)^{42-x},\qquad x=0,1,\dots,42.\qquad(*)
$$
Noting that np = 6.72 and np(1 − p) = 5.6448, we can, in view of Proposition 8.12 on page 445, use the normal approximation
$$
p_X(x)\approx\frac{1}{\sqrt{11.2896\,\pi}}\,e^{-(x-6.72)^{2}/11.2896},\qquad x=0,1,\dots,42.\qquad(**)
$$
a) Here we want P (X = 5) = pX (5). The values obtained by substituting x = 5 into Equation (∗) and Relation (∗∗) are 0.1408 and 0.1292, respectively.
b) Here we want P (9 ≤ X ≤ 12) = Σ_{x=9}^{12} pX (x). The values obtained by using Equation (∗) and Relation (∗∗) are 0.2087 and 0.2181, respectively.
c) Here we want P (X ≥ 1) = 1 − P (X = 0) = 1 − pX (0). The values obtained by substituting x = 0 into Equation (∗) and Relation (∗∗) are 0.0007 and 0.0031, respectively. Therefore, the values obtained for the required probability are 0.9993 and 0.9969, respectively.
d) Here we want P (X ≤ 2) = Σ_{x=0}^{2} pX (x). The values obtained by using Equation (∗) and Relation (∗∗) are 0.0266 and 0.0357, respectively.
Note: In this exercise, the normal approximation does not work that well because n is not large and p is relatively small.

8.109 The population under consideration consists of the residents of Anchorage, AK, the number of which we denote by N; the specified attribute is "owns a cell phone"; the proportion of the population that has the specified attribute is 0.56; and the sample size is 500.
a) Because the sampling is without replacement, the exact distribution of X is hypergeometric with parameters N, 500, and 0.56, that is, X ∼ H(N, 500, 0.56).
b) In view of Proposition 5.6 on page 215, the required binomial distribution is the one with parameters 500 and 0.56, that is, B(500, 0.56).
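The comparison in 8.108 between the exact binomial PMF (∗) and the local normal approximation (∗∗) can be reproduced directly. A short Python sketch (not part of the original solution; it uses the same n = 42 and p = 0.16):

```python
from math import comb, exp, pi, sqrt

n, p = 42, 0.16
mu, var = n * p, n * p * (1 - p)

def exact(x):
    """Exact B(42, 0.16) probability mass at x."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def local_normal(x):
    """Local (pointwise) normal approximation to the binomial PMF."""
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

print(exact(5), local_normal(5))                   # about 0.1408 vs about 0.1292
print(sum(exact(x) for x in range(9, 13)),
      sum(local_normal(x) for x in range(9, 13)))  # about 0.2087 vs about 0.2181
```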


c) Note that np = 500 · 0.56 = 280 and np(1 − p) = 500 · 0.56 · 0.44 = 123.2. Thus, the required normal distribution is the one with parameters 280 and 123.2, that is, N(280, 123.2).
d) Referring to part (c), we find that
pX(x) ≈ (1/√(246.4π)) e^{−(x−280)²/246.4},  x = 0, 1, . . . , 500.
In this case, we want P(X = 280). Substituting 280 for x in the previous display, we find that P(X = 280) = pX(280) ≈ 0.0359.
e) Here we want P(278 ≤ X ≤ 280). Referring to part (d), we find that
P(278 ≤ X ≤ 280) = Σ_{x=278}^{280} pX(x) ≈ 0.0354 + 0.0358 + 0.0359 = 0.1071.
Thus, P(278 ≤ X ≤ 280) ≈ 0.107.

8.110 Let X denote the number who have consumed alcohol within the last month. We have n = 250 and p = 0.527. The problem is to approximate P (X = 125) by using the (local) normal approximation. Note that np = 250 · 0.527 = 131.75 and np(1 − p) = 250 · 0.527 · 0.473 = 62.31775. Thus, the approximating normal distribution is N (131.75, 62.31775). Referring now to Relation (8.39) on page 445 gives P (X = 125) = pX (125) ≈ 0.0351.
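A small Python comparison (added here as an illustration, not part of the original solution) of the exact binomial probability and the local normal approximation used above, for n = 250, p = 0.527, and x = 125:

    from math import comb, exp, pi, sqrt

    n, p, x = 250, 0.527, 125
    mu = n * p                      # 131.75
    var = n * p * (1 - p)           # 62.31775

    # Exact binomial PMF at x
    exact = comb(n, x) * p**x * (1 - p)**(n - x)

    # Local (pointwise) normal approximation to the PMF
    approx = exp(-(x - mu)**2 / (2 * var)) / sqrt(2 * pi * var)

    print(round(exact, 4), round(approx, 4))
    # Both values should be close to 0.035, in line with the approximation above.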

8.111 Let X denote the number who have Fragile X Syndrome. We have n = 10,000 and p = 1/1500. Observe that np = 6.667 and np(1 − p) = 6.662. Consequently, the approximating normal distribution is N(6.667, 6.662).
a) Here we want P(X > 7). Applying the (local) normal approximation gives
P(X > 7) = 1 − P(X ≤ 7) = 1 − Σ_{x=0}^{7} pX(x) ≈ 1 − 0.625 = 0.375.
b) Here we want P(X ≤ 10). Applying the (local) normal approximation gives
P(X ≤ 10) = Σ_{x=0}^{10} pX(x) ≈ 0.930.
c) The Poisson approximation is preferable to the normal approximation because the assumptions for the former (large n and small p) are better met than those for the latter (large n and moderate p).

c) The Poisson approximation is preferable to the normal approximation because the assumptions for the former (large n and small p) are better met than those for the latter (large n and moderate p). 8.112 We first note that X satisfies the conditions of Exercise 8.83(d). Thus, the median of X is the unique number, M, that satisfies the equation FX (M) = 1/2. From Equation (8.35) on page 443, we then % & have 0.5 = FX (M) = ) (M − µ)/σ . Referring to Table I now shows that (M − µ)/σ = 0, or M = µ.

Theory Exercises

8.113 First observe that φ(−t) = φ(t) for all t ∈ R. Therefore, making the substitution u = −t, we find that
Φ(−z) = ∫_{−∞}^{−z} φ(t) dt = ∫_{z}^{∞} φ(−u) du = ∫_{z}^{∞} φ(u) du = 1 − ∫_{−∞}^{z} φ(u) du = 1 − Φ(z)
for all z ∈ R.


Advanced Exercises

8.114
a) First note that, for all t ∈ R,
φ′(t) = d/dt [(1/√(2π)) e^{−t²/2}] = (1/√(2π)) e^{−t²/2} · d/dt [−t²/2] = −tφ(t).
Therefore, for x > 0,
1 − Φ(x) = ∫_x^∞ φ(t) dt ≤ ∫_x^∞ (t/x) φ(t) dt = −(1/x) ∫_x^∞ φ′(t) dt = −(1/x)(φ(∞) − φ(x)) = φ(x)/x.
For the other inequality, we again use the equality φ′(t) = −tφ(t) and integrate by parts twice to obtain
1 − Φ(x) = ∫_x^∞ φ(t) dt = −∫_x^∞ φ′(t)/t dt = φ(x)/x − ∫_x^∞ φ(t)/t² dt
 = φ(x)/x + ∫_x^∞ φ′(t)/t³ dt = φ(x)/x − φ(x)/x³ + 3 ∫_x^∞ φ(t)/t⁴ dt
 ≥ φ(x)/x − φ(x)/x³ = (1/x − 1/x³) φ(x).
b) Dividing the inequalities from part (a) by x^{−1}φ(x) yields the inequalities
1 − 1/x² ≤ (1 − Φ(x))/(x^{−1}φ(x)) ≤ 1.
Therefore,
1 = lim_{x→∞} (1 − 1/x²) ≤ lim_{x→∞} (1 − Φ(x))/(x^{−1}φ(x)) ≤ 1,
so that 1 − Φ(x) ∼ x^{−1}φ(x) as x → ∞.
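The inequalities in part (a) are easy to check numerically. The sketch below (an illustrative addition, not part of the original solution) evaluates the ratio (1 − Φ(x))/(φ(x)/x) for a few values of x using math.erfc; the ratio should lie between 1 − 1/x² and 1 and approach 1 as x grows.

    from math import erfc, exp, pi, sqrt

    def std_normal_pdf(x):
        return exp(-x * x / 2) / sqrt(2 * pi)

    def std_normal_tail(x):
        """1 - Phi(x), computed via the complementary error function."""
        return 0.5 * erfc(x / sqrt(2))

    for x in (1.0, 2.0, 4.0, 8.0):
        ratio = std_normal_tail(x) / (std_normal_pdf(x) / x)
        lower = 1 - 1 / x**2
        print(f"x={x}: lower bound {lower:.4f} <= ratio {ratio:.4f} <= 1")
    # The printed ratios increase toward 1, illustrating 1 - Phi(x) ~ phi(x)/x as x -> infinity.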

8.6 Other Important Continuous Random Variables

Basic Exercises

8.115 As Γ(t) = ∫₀^∞ x^{t−1} e^{−x} dx, and recalling from calculus that d(x^{t−1})/dt = (ln x) x^{t−1}, we have
Γ′(t) = ∫₀^∞ (ln x) x^{t−1} e^{−x} dx  and  Γ″(t) = ∫₀^∞ (ln x)² x^{t−1} e^{−x} dx.
From this latter equation, it follows that Γ″(t) > 0 for all t > 0 and, hence, the graph of the gamma function is always concave up. Noting also that Γ(t) > 0 for all t > 0 and that Γ(1) = Γ(2) = 1, we conclude that the gamma function has a unique minimum, which occurs somewhere between t = 1 and t = 2. Next we note that
Γ(t) > ∫₀¹ x^{t−1} e^{−x} dx > e^{−1} ∫₀¹ x^{t−1} dx = 1/(et).
Consequently, Γ(t) → ∞ as t → 0⁺. Hence, the graph of the gamma function is as shown in Figure 8.17 on page 451.

8.116 Using the continuity of the exponential function and the fact that the sum Σ_{j=0}^{r−1} (λx)^j/j! is a polynomial in x, it follows easily that F is continuous for x > 0. Also, it is easy to see that F(0) = 0. Therefore, F is an everywhere continuous function. In particular, then, property (b) of Proposition 8.1 is satisfied. Property (c) is obvious from the definition of F. To show that F is nondecreasing, we recall from page 453 that
F′(x) = λ^r x^{r−1} e^{−λx}/(r − 1)!,  x > 0.

From this equation, it is clear that F ′ (x) > 0 for x > 0 and, hence, that F is nondecreasing. Consequently, property (a) is satisfied. Finally, we must verify property (d). To do so, it suffices to show that limx→∞ e−λx x j = 0 for each 0 ≤ j ≤ r − 1, which is easy to do by applying L’Hôpital’s rule.

8.117
a) The Erlang distribution with parameters r and λ is just the gamma distribution with those two parameters. Thus, a PDF is given by
f(x) = λ^r x^{r−1} e^{−λx}/(r − 1)!,  x > 0,
and f(x) = 0 otherwise.
b) The chi-square distribution with ν degrees of freedom is just the gamma distribution with parameters ν/2 and 1/2. Thus, a PDF is given by
f(x) = (1/(2^{ν/2} Γ(ν/2))) x^{ν/2−1} e^{−x/2},  x > 0,

and f(x) = 0 otherwise.

8.118
a) Let Y = X². For y > 0,
FY(y) = P(Y ≤ y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = FX(√y) − FX(−√y).
Noting that fX is symmetric about 0, differentiation now yields
fY(y) = (1/2) y^{−1/2} fX(√y) + (1/2) y^{−1/2} fX(−√y) = y^{−1/2} fX(√y)
 = y^{−1/2} (1/(√(2π) σ)) e^{−(√y)²/(2σ²)} = ((1/2σ²)^{1/2}/Γ(1/2)) y^{1/2−1} e^{−(1/2σ²)y}.
Consequently, X² ∼ Γ(1/2, 1/2σ²).
b) We can proceed as in part (a). Alternatively, we can first apply Proposition 8.10 on page 441 to conclude that X/σ ∼ N(0, 1) and then apply part (a) to conclude that X²/σ² = (X/σ)² ∼ Γ(1/2, 1/2). Thus, X²/σ² has the chi-square distribution with 1 degree of freedom.
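To make part (a) concrete, here is a small simulation sketch (an added illustration; the value σ = 2 is an arbitrary choice) that squares normal samples and compares the sample mean and variance of X² with the gamma values α/λ = σ² and α/λ² = 2σ⁴.

    import random
    from statistics import mean, pvariance

    random.seed(1)
    sigma = 2.0                      # assumed value, for illustration only
    n = 200_000

    squares = [random.gauss(0.0, sigma) ** 2 for _ in range(n)]

    # For Gamma(alpha, lam) with alpha = 1/2 and lam = 1/(2*sigma^2):
    #   mean = alpha/lam = sigma^2,  variance = alpha/lam^2 = 2*sigma^4
    print(round(mean(squares), 3), "vs", sigma**2)
    print(round(pvariance(squares), 3), "vs", 2 * sigma**4)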


8.119 Because Y ∼ B(m + n − 1, x), we have
P(Y = j) = C(m + n − 1, j) x^j (1 − x)^{m+n−1−j},  j = 0, 1, . . . , m + n − 1.
Referring to Equation (8.55) on page 457 now yields
P(Y ≥ m) = Σ_{j=m}^{m+n−1} C(m + n − 1, j) x^j (1 − x)^{m+n−1−j} = FX(x) = P(X ≤ x).

8.120 Let Y = X^{−1} − 1 and note that the range of Y is (0, ∞) because the range of X is (0, 1). For 0 ≤ y < ∞,
FY(y) = P(Y ≤ y) = P(X^{−1} − 1 ≤ y) = P(X ≥ 1/(y + 1)) = 1 − FX(1/(y + 1)).
Differentiation now yields
fY(y) = (1/(y + 1)²) fX(1/(y + 1)) = (1/(y + 1)²) · (1/B(α, β)) (1/(y + 1))^{α−1} (1 − 1/(y + 1))^{β−1}
 = (1/B(α, β)) · y^{β−1}/((y + 1)^{α+1}(y + 1)^{β−1}) = y^{β−1}/(B(α, β)(y + 1)^{α+β})
if y > 0, and fY(y) = 0 otherwise.

8.121 We used basic calculus to find the derivative of a beta PDF and obtained the following: % & 1 f ′ (x) = x α−2 (1 − x)β−2 (α − 1)(1 − x) − (β − 1)x , B(α, β)

(∗)

which can also be expressed as f ′ (x) =

% & 1 x α−2 (1 − x)β−2 (2 − α − β)x + α − 1 . B(α, β)

(∗∗)

Now we proceed as follows: ● If α = β = 1, then B(α, β) = B(1, 1) = +(1)+(1)/ +(2) = 0!0!/1! = 1. Therefore, f (x) =

1 x 1−1 (1 − x)1−1 = 1, B(1, 1)

0 < x < 1,

and f (x) = 0 otherwise. This function is the PDF of the uniform distribution on (0, 1). If α < 1, then α − 1 < 0, so that x α−1 → ∞ as x → 0+ . Consequently,



lim f (x) = lim

x→0+ ●

x→0+

1 1 x α−1 (1 − x)β−1 = · ∞ · 1 = ∞. B(α, β) B(α, β)

If β < 1, then β − 1 < 0, so that (1 − x)β−1 → ∞ as x → 1− . Consequently, lim f (x) = lim

x→1−

x→1−

1 1 x α−1 (1 − x)β−1 = · 1 · ∞ = ∞. B(α, β) B(α, β)

If α < 1 and β < 1, then we know from the previous two bulleted items that f (x) → ∞ as either x → 0+ or x → 1− . Also, from Equation (∗∗), f ′ (x) = 0 if and only if x = (1 − α)/(2 − α − β), which lies strictly between 0 and 1. Thus, f takes its unique minimum at this point. We leave it to the reader to show that, on the interval (0, 1), the second derivative of f is always positive and, hence, the graph of f is concave up. ●


If α < 1 and β ≥ 1, then Equation (∗) shows that f ′ (x) < 0 for all x ∈ (0, 1). Thus, f is decreasing thereon. ′ ● If α ≥ 1 and β < 1, then Equation (∗) shows that f (x) > 0 for all x ∈ (0, 1). Thus, f is increasing thereon. ′ ● If α > 1 and β > 1, then, from Equation (∗∗), f (x) = 0 if and only if x = (α − 1)/(α + β − 2), which lies strictly between 0 and 1. As f (0) = f (1) = 0, f takes its unique maximum at this point. ● If α = β, then ●

% &β−1 1 1 (0.5 − x)α−1 1 − (0.5 − x) = (0.5 − x)α−1 (0.5 + x)β−1 B(α, β) B(α, β) % &β−1 1 1 = (0.5 + x)α−1 (0.5 − x)β−1 = (0.5 + x)α−1 1 − (0.5 + x) B(α, β) B(α, β)

f (0.5 − x) =

= f (0.5 + x).

Consequently, f is symmetric about x = 0.5. 8.122 Let Y = X1/α . For 0 ≤ y < 1,

5 6 % & % & FY (y) = P (Y ≤ y) = P X1/α ≤ y = P X ≤ y α = FX y α = y α .

Differentiation yields fY (y) = αy α−1 if 0 < y < 1, and fY (y) = 0 otherwise. Because B(α, 1) = 1/α, we see that X1/α has the beta distribution with parameters α and 1. 8.123 Let X denote the proportion of manufactured items that require service within 5 years. From Equation (8.55) on page 457, the CDF of X is given by 4 # $ ) 4 j x (1 − x)4−j = 6x 2 (1 − x)2 + 4x 3 (1 − x) + x 4 , 0 ≤ x < 1. FX (x) = j j =2 We use this expression to solve parts (a) and (b). a) P (X ≤ 0.3) = FX (0.3) = 0.3483. b) P (0.1 < X < 0.20) = FX (0.2) − FX (0.1) = 0.1808 − 0.0523 = 0.1285.

8.124 Let f and F denote the PDF and CDF, respectively, of a beta distribution with parameters 51.3 and 539.75.
a) The graph of the beta(51.3, 539.75) PDF is omitted here.


b) F (0.10) − F (0.07) = 0.8711 − 0.0661 = 0.805. 8.125 Let X ∼ T (−1, 1). We want to use a linear transformation, y = c + dx, that takes the interval (−1, 1) to the interval (a, b). Thus, we need to choose c and d so that c − d = a and c + d = b. These relations imply that c = (b + a)/2 and d = (b − a)/2. Now let Y = c + dX. For a ≤ y < b, % & % & % & FY (y) = P (Y ≤ y) = P c + dX ≤ y = P X ≤ (y − c)/d = FX (y − c)/d .

Differentiating yields

& 1 % 2 fY (y) = fX (y − c)/d = fX d b−a

#

& 2 % y − (a + b)/2 b−a

$

for a < y < b, and fY (y) = 0 otherwise. Finally, we show that Y actually has the T (a, b) distribution. Referring to Equation (8.57) on page 460, we see that, for a < y < b, # $ & 2 % 2 fY (y) = y − (a + b)/2 fX b−a b−a 9 9$ # 9 2 % &9 2 = 1 − 99 y − (a + b)/2 99 b−a b−a # $ 2 |a + b − 2y| = 1− . b−a b−a 8.126 From Exercise 8.49(e), we find that the CDF of a random variable X with a T (−1, 1) distribution is given by ⎧ 0, if x < −1; ⎪ ⎨ 1 (1 + x)2 , if −1 ≤ x < 0; FX (x) = 2 1 2 ⎪ ⎩ 1 − 2 (1 − x) , if 0 ≤ x < 1; 1, if x ≥ 1.

Referring now to to the solution of Exercise 8.125, we see that the CDF of a random variable Y with a T (a, b) distribution is given by ⎧ 0, if y < a; ⎪ # $2 ⎪ ⎪ % & ⎪ 2 1 ⎪ ⎪ 1+ , if a ≤ y < (a + b)/2; y − (a + b)/2 ⎨ 2 b−a FY (y) = # $ ⎪ & 2 1 2 % ⎪ ⎪ 1− , if (a + b)/2 ≤ y < b; 1− y − (a + b)/2 ⎪ ⎪ ⎪ 2 b−a ⎩ 1, if y ≥ b. ⎧ 0, if y < a; ⎪ # $2 ⎪ ⎪ ⎪ y−a ⎪ ⎪ , if a ≤ y < (a + b)/2; ⎨2 b−a = # $2 ⎪ b−y ⎪ ⎪ 1−2 , if (a + b)/2 ≤ y < b; ⎪ ⎪ ⎪ b−a ⎩ 1, if y ≥ b.


Following is a graph of this CDF: F(y)

y

8.127
a) For x > 0,
fX(x) = ((0.1)⁵/Γ(5)) x^{5−1} e^{−0.1x} = (1/(24 × 10⁵)) x⁴ e^{−x/10},
and fX(x) = 0 otherwise. Also, referring to Equation (8.49) on page 453, we see that
FX(x) = 1 − e^{−x/10} Σ_{j=0}^{4} (x/10)^j/j!,  x ≥ 0,
and FX(x) = 0 otherwise.
b) Referring to part (a), we find that
P(X ≤ 60) = FX(60) = 1 − e^{−6} Σ_{j=0}^{4} 6^j/j! = 0.715.

c) Again referring to part (a), we get P(40 ≤ X ≤ 50) = FX(50) − FX(40) = 0.5595 − 0.3712 = 0.1883.

8.128
a) We have FY(y) = 0 if y < β. For y ≥ β,
FY(y) = P(Y ≤ y) = P(βe^X ≤ y) = P(X ≤ ln(y/β)) = 1 − e^{−α ln(y/β)} = 1 − (β/y)^α.
b) Referring to part (a), we find that a PDF of Y is given by
fY(y) = d/dy [1 − (β/y)^α] = αβ^α/y^{α+1}
if y > β, and fY(y) = 0 otherwise.


8.129 Let Y denote loss amount in thousands of dollars. a) From the solution to Exercise 8.128(b), fY (y) = =

7

7

2 · 32 /y 3 , if y > 3; 0, otherwise. 18/y 3 , 0,

if y > 3; otherwise.

Note: In the following graph, we use f (y) instead of fY (y).

Also, from the solution to Exercise 8.128(a), FY (y) = =

7

7

0, 1 − (3/y)2 , 0, 1 − 9/y 2 ,

if y < 3; if y ≥ 3. if y < 3; if y ≥ 3.

Note: In the following graph, we use F (y) instead of FY (y).


b) Referring to part (a), we get P (Y > 8) = 1 − P (Y ≤ 8) = 1 − FY (8) = 9/82 = 0.141. c) Again referring to part (a), we get P (4 ≤ Y ≤ 5) = FY (5) − FY (4) = 9/42 − 9/52 = 0.2025. 8.130 a) We have FY (y) = 0 if y < 0. For y ≥ 0,

% & % & β FY (y) = P (Y ≤ y) = P X1/β ≤ y = P X ≤ y β = 1 − e−αy .

b) Referring to part (a), we find that a PDF of Y is given by fY (y) =

6 d 5 β β 1 − e−αy = αβy β−1 e−αy dy

if y > 0, and fY (y) = 0 otherwise. c) The graphs of the PDFs are as follows:

The graphs of the CDFs are as follows:


8.131 Let Y denote component lifetime, in years. Referring to the solution to Exercise 8.130, we get the following with α = 2 and β = 3.
a) A PDF of Y is fY(y) = 2 · 3y^{3−1} e^{−2y³} = 6y² e^{−2y³} if y > 0, and fY(y) = 0 otherwise. Its graph is omitted here.
The CDF of Y is FY(y) = 0 if y ≤ 0, and FY(y) = 1 − e^{−2y³} if y ≥ 0. Its graph is omitted here.
b) The probability that the component lasts at most 6 months is
P(Y ≤ 0.5) = FY(0.5) = 1 − e^{−2(0.5)³} = 0.221.
The probability that it lasts between 6 months and 1 year is

P(0.5 ≤ Y ≤ 1) = FY(1) − FY(0.5) = e^{−2(0.5)³} − e^{−2·1³} = 0.643.

8.132
a) Answers will vary.
b) We see from Figure 8.19(b) on page 458 that the PDF of X is larger for values of x near 0 or 1 than for values near 1/2. Thus, X is more likely to be near 0 or 1 than near 1/2.
c) First note that
B(1/2, 1/2) = Γ(1/2)Γ(1/2)/Γ(1) = (√π)² = π.
For 0 ≤ x < 1, we make the substitution u = √t to get
FX(x) = P(X ≤ x) = ∫₀^x fX(t) dt = (1/π) ∫₀^x t^{1/2−1}(1 − t)^{1/2−1} dt = (2/π) ∫₀^{√x} du/√(1 − u²) = (2/π) arcsin √x.
d) We have
P(0.4 ≤ X ≤ 0.6) = FX(0.6) − FX(0.4) = (2/π) arcsin √0.6 − (2/π) arcsin √0.4 = 0.128
and
P(X ≤ 0.1 or X ≥ 0.9) = P(X ≤ 0.1) + P(X ≥ 0.9) = FX(0.1) + 1 − FX(0.9) = (2/π) arcsin √0.1 + 1 − (2/π) arcsin √0.9 = 0.410.
These two probabilities illustrate the fact alluded to in part (b), namely, that X is more likely to be near 0 or 1 than near 1/2.
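The two probabilities in part (d) follow directly from the arcsine CDF; a brief Python check (added here for illustration, not part of the original solution):

    from math import asin, pi, sqrt

    def arcsine_cdf(x):
        """CDF of the beta(1/2, 1/2) (arcsine) distribution on (0, 1)."""
        return (2 / pi) * asin(sqrt(x))

    middle = arcsine_cdf(0.6) - arcsine_cdf(0.4)
    tails = arcsine_cdf(0.1) + 1 - arcsine_cdf(0.9)
    print(round(middle, 3), round(tails, 3))
    # Expect about 0.128 and 0.410: mass near the endpoints exceeds mass near 1/2.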

Theory Exercises

8.133 Using integration by parts with u = x^t and dv = e^{−x} dx gives
Γ(t + 1) = ∫₀^∞ x^t e^{−x} dx = [−x^t e^{−x}]₀^∞ + ∫₀^∞ t x^{t−1} e^{−x} dx = 0 − 0 + t ∫₀^∞ x^{t−1} e^{−x} dx = tΓ(t).

8.134 Using the already-established fact that Γ(1) = 0! = (1 − 1)! and referring to Equation (8.43) on page 451, we proceed inductively to get Γ(n + 1) = nΓ(n) = n · (n − 1)! = n!. Thus, Γ(n) = (n − 1)! for all n ∈ N.

8.135 We have
Γ(0 + 1/2) = Γ(1/2) = √π = ((2 · 0)!/(0! 2^{2·0})) √π.
Proceeding inductively and using Exercise 8.133, we get
Γ((n + 1) + 1/2) = Γ((n + 1/2) + 1) = (n + 1/2) Γ(n + 1/2) = (n + 1/2) ((2n)!/(n! 2^{2n})) √π
 = (2n + 1) ((2n)!/(n! 2^{2n+1})) √π = ((2n + 2)(2n + 1)(2n)!/(2(n + 1) n! 2^{2n+1})) √π = ((2(n + 1))!/((n + 1)! 2^{2(n+1)})) √π.

Advanced Exercises

8.136
a) By the conditional probability rule, for Δt > 0,
P(T ≤ t + Δt | T > t) = P(t < T ≤ t + Δt)/P(T > t) = (FT(t + Δt) − FT(t))/(1 − FT(t)).
Therefore,
lim_{Δt→0} P(T ≤ t + Δt | T > t)/Δt = lim_{Δt→0} ((FT(t + Δt) − FT(t))/Δt)/(1 − FT(t)) = F′T(t)/(1 − FT(t)) = fT(t)/(1 − FT(t)) = hT(t).
The left side of the previous equation gives the instantaneous risk of item failure at time t.
b) We have
hT(u) = fT(u)/(1 − FT(u)) = F′T(u)/(1 − FT(u)) = −(d/du) ln(1 − FT(u)).
Integrating from 0 to t and using the fact that T is a positive continuous random variable, which implies that FT(0) = 0, we get
∫₀^t hT(u) du = −ln(1 − FT(t)).
The required result now follows easily.
c) Applying the conditional probability rule and referring to part (b) yields, for s, t > 0,
P(T > s + t | T > t) = P(T > s + t)/P(T > t) = (1 − FT(s + t))/(1 − FT(t)) = e^{−∫₀^{s+t} hT(u) du}/e^{−∫₀^{t} hT(u) du}
 = e^{−(∫₀^{s+t} hT(u) du − ∫₀^{t} hT(u) du)} = e^{−∫_t^{s+t} hT(u) du}.
d) Suppose that hT is a decreasing function; then ∫_t^{s+t} hT(u) du is a decreasing function of t. This, in turn, implies that −∫_t^{s+t} hT(u) du is an increasing function, so that e^{−∫_t^{s+t} hT(u) du} is an increasing function. Similarly, if hT is an increasing function, then e^{−∫_t^{s+t} hT(u) du} is a decreasing function of t. In view of part (c), these two results imply that, if hT is a decreasing function, then the item improves with age, whereas if hT is an increasing function, then the item deteriorates with age.
e) The item neither improves nor deteriorates with age; that is, it is "memoryless."

8.137
a) Because of the lack-of-memory property of an exponential random variable, the hazard-rate function should be constant.
b) We have
hT(t) = fT(t)/(1 − FT(t)) = λe^{−λt}/(1 − (1 − e^{−λt})) = λ.
Therefore, an item whose lifetime has an exponential distribution neither improves nor deteriorates with age; that is, it is "memoryless."

8.138 From the solution to Exercise 8.130, FT(t) = 1 − e^{−αt^β} and fT(t) = αβt^{β−1} e^{−αt^β} for t > 0. Therefore,
hT(t) = fT(t)/(1 − FT(t)) = αβt^{β−1} e^{−αt^β}/e^{−αt^β} = αβt^{β−1}.
From this result, we see that the hazard function is decreasing, constant, or increasing when β < 1, β = 1, and β > 1, respectively. It now follows easily from Exercises 8.136(d) and (e) that the item improves with age, is memoryless, or deteriorates with age depending on whether β < 1, β = 1, and β > 1, respectively.
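For a numerical illustration of this Weibull behavior, the added sketch below (using the α = 2, β = 3 component of Exercise 8.131, an arbitrary choice for illustration) simulates lifetimes through the transformation Y = X^{1/β} with X ∼ E(α) and compares the empirical P(Y ≤ 0.5) with the closed form 1 − e^{−2(0.5)³}.

    import random
    from math import exp, log

    random.seed(2)
    alpha, beta = 2.0, 3.0           # parameter choices from Exercise 8.131, for illustration
    n = 100_000

    # X ~ E(alpha) via inverse transform, then Y = X**(1/beta) is Weibull with these parameters
    lifetimes = [(-log(1.0 - random.random()) / alpha) ** (1 / beta) for _ in range(n)]

    empirical = sum(1 for y in lifetimes if y <= 0.5) / n
    exact = 1 - exp(-alpha * 0.5**beta)
    print(round(empirical, 3), round(exact, 3))   # both should be near 0.221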


8.139 Referring to Exercise 8.136(b), we find that $ #4 t *t *t ′ − 0 hT (u) du d hT (u) du = hT (t)e− 0 hT (u) du . fT (t) = FT (t) = e dt 0 a) If hT (t) = λ, then fT (t) = hT (t)e− so that T ∼ E(λ). b) If hT (t) = α + βt, then fT (t) = hT (t)e−

*t 0

hT (u) du

*t 0

hT (u) du

= λe−

= (α + βt)e−

*t 0

λ du

*t

0 (α+βu) du

= λe−λt ,

= (α + βt)e−

% & αt+βt 2 /2

.

8.140 Let TI and TII denote the lifetimes of Items I and II, respectively. a) Answers will vary. Many people guess that the probability for Item II is one-half of that for Item I but, as we will see in part (b), that is incorrect. b) By assumption, hTII = 2hTI . Referring now to Exercise 8.136(c), * t+s

* t+s

P (TII > t + s | TII > t) = e− t hTII (u) du = e− t 2hTI (u) du 5 * t+s 62 % &2 = e− t hTI (u) du = P (TI > t + s | TI > t) .

8.7 Functions of Continuous Random Variables Basic Exercises 8.141 As X ∼ U(0, 1), its PDF is fX (x) = 1 if 0 < x < 1, and fX (x) = 0 otherwise. The range of X is the interval (0, 1). Here g(x) = c + dx, which is strictly increasing and differentiable on the range of X. The transformation method is appropriate. Note that g −1 (y) = (y − c)/d. Letting Y = c + dX, we see that the range of Y is the interval (c, c + d). Applying the transformation method now yields % & fX (y − c)/d 1 1 fY (y) = ′ fX (x) = = |g (x)| d |d| if c < y < c + d, and fY (y) = 0 otherwise. So, c + dX ∼ U(c, c + d).

8.142 As X ∼ E(1), its PDF is fX (x) = e−x if x > 0, and fX (x) = 0 otherwise. The range of X is the interval (0, ∞). Here g(x) = bx, which is strictly increasing and differentiable on the range of X. The transformation method is appropriate. Note that g −1 (y) = y/b. Letting Y = bX, we see that the range of Y is the interval (0, ∞). Applying the transformation method now yields fY (y) =

1 fX (y/b) e−y/b 1 1 f (x) = = = e− b y X ′ |g (x)| |b| b b

if y > 0, and fY (y) = 0 otherwise. So, bX ∼ E(1/b).
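A quick simulation sketch (added for illustration, not part of the original solution, with b = 4 an arbitrary choice) of the conclusion that bX ∼ E(1/b) when X ∼ E(1): the sample mean of bX should be near b, and the empirical CDF at b should be near 1 − e^{−1}.

    import random
    from math import exp
    from statistics import mean

    random.seed(3)
    b = 4.0                          # arbitrary scale factor, for illustration
    n = 100_000

    samples = [b * random.expovariate(1.0) for _ in range(n)]

    print(round(mean(samples), 2), "vs", b)                       # mean of E(1/b) is b
    empirical_cdf_at_b = sum(1 for s in samples if s <= b) / n
    print(round(empirical_cdf_at_b, 3), "vs", round(1 - exp(-1), 3))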

8.143 As X ∼ U(0, 1), its PDF is fX (x) = 1 if 0 < x < 1, and fX (x) = 0 otherwise. The range of X is the interval (0, 1). Here g(x) = − ln x, which is strictly decreasing and differentiable on the range of X.


The transformation method is appropriate. Note that g −1 (y) = e−y . Letting Y = − ln X, we see that the range of Y is the interval (0, ∞). Applying the transformation method now yields % & fX e−y 1 fX (x) 9 = e−y fY (y) = ′ fX (x) = =9 9−1/e−y 9 |g (x)| | − 1/x| if y > 0, and fY (y) = 0 otherwise. So, − ln X ∼ E(1). 5 √ 6 & % 2 2 8.144 As X ∼ N 0, 1/α 2 , its PDF is fX (x) = α/ 2π e−α x /2 if −∞ < x < ∞. The range of X is

the interval (−∞, ∞). Here g(x) = 1/x 2 , which is not monotone on the range of X. The transformation method is not appropriate. % % √ & 2 2 & 8.145 As X ∼ N 0, σ 2 , its PDF is fX (x) = 1/ 2πσ e−x /2σ if −∞ < x < ∞. The range of X is the interval (−∞, ∞). Here g(x) = x 2 for part (a) and g(x) = x 2 /σ 2 for part (b). In either case, g is not monotone on the range of X. The transformation method is not appropriate. 8.146 As X has the beta distribution with parameters α and β, its PDF is fX (x) =

1 x α−1 (1 − x)β−1 , B(α, β)

0 < x < 1,

and fX (x) = 0 otherwise. The range of X is the interval (0, 1). Here g(x) = x −1 − 1, which is strictly decreasing and differentiable on the range of X. The transformation method is appropriate. Note that g −1 (y) = 1/(y + 1). Letting Y = X−1 − 1, we see that the range of Y is the interval (0, ∞). Applying the transformation method now yields % & fX 1/(y + 1) 1 fX (x) 9=9 fY (y) = ′ fX (x) = 9 % &2 99 9−1/x 2 9 9 |g (x)| 9−1/ 1/(y + 1) 9 1 1 = 2 (y + 1) B(α, β)

=

#

1 y+1

$α−1 #

1 1− y+1

$β−1

1 1 y β−1 y β−1 = B(α, β) (y + 1)α+1 (y + 1)β−1 B(α, β)(y + 1)α+β

if y > 0, and fY (y) = 0 otherwise.

8.147 As X ∼ U(0, 1), its PDF is fX (x) = 1 if 0 < x < 1, and fX (x) = 0 otherwise. The range of X is the interval (0, 1). Here g(x) = x 1/α , which is strictly increasing and differentiable on the range of X. The transformation method is appropriate. Note that g −1 (y) = y α . Letting Y = X1/α , we see that the range of Y is the interval (0, 1). Applying the transformation method now yields % & fX y α 1 fX (x) 9 = 9% fY (y) = ′ fX (x) = 9 &% &1/α−1 99 9(1/α)x 1/α−1 9 9 |g (x)| 1/α yα 9 9 =

1 1 = αy α−1 = y α−1 (1 − y)1−1 1−α B(α, 1) (1/α)y

if 0 < y < 1, and fY (y) = 0 otherwise. So, X1/α has the beta distribution with parameters α and 1.

8.148 As X ∼ E(α), its PDF is fX (x) = αe−αx if x > 0, and fX (x) = 0 otherwise. The range of X is the interval (0, ∞). Here g(x) = βex , which is strictly increasing and differentiable on the range of X.


The transformation method is appropriate. Note that g −1 (y) = ln(y/β). Letting Y = βeX , we see that the range of Y is the interval (β, ∞). Applying the transformation method now yields % & fX ln(y/β) 1 fX (x) αe−α ln(y/β) α(y/β)−α αβ α 9 9 = fX (x) = = = = fY (y) = ′ 9βeln(y/β) 9 |βex | |g (x)| y y y α+1

if y > β, and fY (y) = 0 otherwise.

8.149 As X ∼ E(α), its PDF is fX (x) = αe−αx if x > 0, and fX (x) = 0 otherwise. The range of X is the interval (0, ∞). Here g(x) = x 1/β , which is strictly increasing and differentiable on the range of X. The transformation method is appropriate. Note that g −1 (y) = y β . Letting Y = X1/β , we see that the range of Y is the interval (0, ∞). Applying the transformation method now yields % & β fX y β 1 fX (x) αe−αy β 9=9 9= fX (x) = 9 = αβy β−1 e−αy fY (y) = ′ & % 1−β 1/β−1 1/β−1 9 9 9 9 |g (x)| (1/β)y (1/β)x 9 9(1/β) y β

if y > 0, and fY (y) = 0 otherwise. & % 8.150 As X ∼ N µ, σ 2 , its PDF is

fX (x) = √

1 2π σ

2 /2σ 2

e−(x−µ)

,

−∞ < x < ∞.

Letting Y = a + bX, we see that the range of Y is (−∞, ∞). Assume that b < 0. For y ∈ R, % & % & FY (y) = P (a + bX ≤ y) = P X ≥ (y − a)/b = 1 − FX (y − a)/b .

Differentiation now yields

% & 1 2 1 1 2 fY (y) = −fX (y − a)/b · = − √ e−((y−a)/b−µ) /2σ b b 2π σ 2 1 1 2 2 2 2 =√ e−(y−a−bµ) /2b σ = √ e−(y−(a+bµ)) /2(bσ ) 2π |b|σ 2π(|b|σ )

% & for all y ∈ R. Thus, a + bX ∼ N a + bµ, (bσ )2 . A similar argument shows that this same result holds when b > 0. 8.151 Here g(x) = a + bx, which is strictly decreasing if b < 0 and strictly increasing if b > 0. Note that g is differentiable on all of R. The transformation method is appropriate. We have g −1 (y) = (y − a)/b. Applying the transformation method now yields % & % & fX (y − a)/b 1 1 fX (x) = = fX (y − a)/b . fa+bX (y) = ′ |g (x)| |b| |b| 8.152 As X ∼ E(1), its PDF is fX (x) = e−x if x > 0, and fX (x) = 0 otherwise. The range of X is the interval (0, ∞). Here g(x) = 10x 0.8 , which is strictly increasing and differentiable on the range of X. The transformation method is appropriate. Note that g −1 (y) = (0.1y)1.25 . Letting Y = 10X0.8 , we see


that the range of Y is the interval (0, ∞). Applying the transformation method now yields % & fX (0.1y)1.25 1 fX (x) fY (y) = ′ fX (x) = 9 −0.2 9 = 9 % &−0.2 99 98x 9 9 |g (x)| 98 (0.1y)1.25 9 1.25

e−(0.1y) 1.25 = 0.125(0.1y)0.25 e−(0.1y) = −0.25 8(0.1y) if y > 0, and fY (y) = 0 otherwise.

8.153 a) Referring to Definition 8.11 on page 470, we find that, for x ∈ R, & 1 1 % 1 1 % 5 6= & ψ (x − η)/θ = % & 2 θ θ π 1 + (x − η)/θ θπ 1 + (x − η)2 /θ 2 =

π

%

θ2

θ & = fX (x). + (x − η)2

b) Let Y = (X − η)/θ = −η/θ + (1/θ)X. Referring to part (a) and applying Equation (8.64) on page 469 with a = −η/θ and b = 1/θ yields # $ # $ 1 y − (−η/θ) 1 (θy + η) − η fY (y) = fX = θfX (θy + η) = θ ψ = ψ(y). |1/θ| 1/θ θ θ Therefore, (X − η)/θ has the standard Cauchy distribution. 8.154 % & a) As X ∼ N µ, σ 2 , its PDF is

fX (x) = √

1

2 /2σ 2

e−(x−µ)

, −∞ < x < ∞. 2π σ Here g(x) = ex , which is strictly increasing and differentiable on the range of X. The transformation method is appropriate. Note that g −1 (y) = ln y. Letting Y = eX , we see that the range of Y is the interval (0, ∞). Applying the transformation method now yields 1 fX (x) fX (ln y) f (x) = = X |g ′ (x)| |ex | eln y %√ &−1 −(ln y−µ)2 /2σ 2 2π σ e 1 2 2 = =√ e−(ln y−µ) /2σ y 2π σy if y > 0, and fY (y) = 0 otherwise. b) The (natural) logarithm of the random variable is normally distributed. fY (y) =

8.155 a) As X ∼ U(−π/2, π/2), its PDF is fX (x) = 1/π if −π/2 < x < π/2, and fX (x) = 0 otherwise. The range of X is the interval (−π/2, π/2). Here g(x) = sin x, which is strictly increasing and differentiable on the range of X. The transformation method is appropriate. Note that g −1 (y) = arcsin y. Letting Y = sin X, we see that the range of Y is the interval (−1, 1). Applying the transformation method now yields 1/π 1 1 fX (x) fX (x) = = = " fY (y) = ′ |g (x)| | cos x| | cos(arcsin y)| π 1 − y2

if −1 < y < 1, and fY (y) = 0 otherwise.


b) As X ∼ U(−π, π), its PDF is fX (x) = 1/(2π) if −π < x < π, and fX (x) = 0 otherwise. Note that the transformation method is inappropriate here because sin x is not monotone on the interval (−π, π). We instead use the CDF method. If −1 ≤ y < 0,

If 0 ≤ y < 1,

Thus,

FY (y) = P (Y ≤ y) = P (sin X ≤ y) = P (−π − arcsin y ≤ X ≤ arcsin y) % & arcsin y − (−π − arcsin y) 1 1 = = + arcsin y. 2π 2 π

FY (y) = P (Y ≤ y) = P (sin X ≤ y) = P (X ≤ arcsin y or X ≥ π − arcsin y) % & % & arcsin y − (−π) + π − (π − arcsin y) 1 1 = = + arcsin y. 2π 2 π

⎧ 0, if y < −1; ⎪ ⎪ ⎨ 1 1 FY (y) = + arcsin y, if −1 ≤ y < 1; ⎪ ⎪ ⎩2 π 1, if y ≥ 1. 5 " 6−1 if −1 < y < 1, and fY (y) = 0 otherwise. Differentiation now shows that fY (y) = π 1 − y 2

8.156 Let $ denote the angle, in radians, made by the line segment connecting the center of the circle to the point chosen. Then $ ∼ U(−π, π). a) Let X denote the x-coordinate of the point chosen. Then X = cos $. For −1 < x ≤ 1, FX (x) = P (X ≤ x) = P (cos $ ≤ x) = 1 − P (cos $ > x) 2 arccos x 1 = 1 − P (− arccos x < $ < arccos x) = 1 − = 1 − arccos x. 2π π Differentiation now yields fX (x) =

FX′ (x)

d = dx

# $ 1 1 1 − arccos x = √ π π 1 − x2

if −1 < x < 1, and fX (x) = 0 otherwise. b) Let Y denote the y-coordinate of the point chosen. By symmetry, X and Y have the same distribution. 6−1 5 " Thus, a PDF of Y is given by fY (y) = π 1 − y 2 if −1 < y < 1, and fY (y) = 0 otherwise. c) Let Z denote the distance of the point chosen to the point (0, 1). Then " " " Z = (X − 1)2 + Y 2 = (X − 1)2 + 1 − X 2 = 2(1 − X). √ We know that the range of X is the interval (−1, 1). Let g(x) = 2(1 − x), which is strictly decreasing on the range of X. The transformation method is appropriate. Note that g −1 (z) = 1 − z2 /2. " 1 fX (x) 9 = 2(1 − x)fX (x) fX (x) = 9 fZ (z) = ′ √ 9−1/ 2(1 − x)9 |g (x)| : 5 5 6 % &6 z 2 = 2 1 − 1 − z2 /2 fX 1 − z2 /2 = ; % &2 = π √4 − z2 π 1 − 1 − z2 /2 if 0 < z < 2, and fZ (z) = 0 otherwise.


8.157 a) Recall that the range of a beta distribution is the interval (0, 1). Because a + (b − a) · 0 = a and a + (b − a) · 1 = b, we see that the range of Y is the interval (a, b). b) As X has the beta distribution with parameters α and β, its PDF is fX (x) =

1 x α−1 (1 − x)β−1 , B(α, β)

0 < x < 1,

and fX (x) = 0 otherwise. The range of X is the interval (0, 1). Here g(x) = a + (b − a)x, which is strictly increasing and differentiable on the range of X. The transformation method is appropriate. Note that g −1 (y) = (y − a)/(b − a). Applying the transformation method now yields % & 1 fX (x) 1 f (x) = = f (y − a)/(b − a) X X |g ′ (x)| b−a b−a % &α−1 % &β−1 1 = (y − a)/(b − a) 1 − (y − a)/(b − a) (b − a)B(α, β)

fY (y) =

=

1 (y − a)α−1 (b − y)β−1 (y − a)α−1 (b − y)β−1 = (b − a)B(α, β) (b − a)α−1 (b − a)β−1 (b − a)α+β−1 B(α, β)

if a < y < b, and fY (y) = 0 otherwise. 8.158 We apply the CDF method. For y < 0, FY (y) = P (Y ≤ y) = P (1/X ≤ y) = P (1/y ≤ X < 0) = FX (0) − FX (1/y). Differentiation yields fY (y) = For y ≥ 0,

1 1 1 % & . f (1/y) = = X y2 π(1 + y 2 ) y 2 π 1 + (1/y)2

FY (y) = P (Y ≤ y) = P (1/X ≤ y) = P (X < 0 or X ≥ 1/y) = P (X < 0) + 1 − P (X < y) = FX (0) + 1 − FX (1/y).

Differentiation yields 1 1 1 &= . fX (1/y) = 2 % 2 2 y π(1 + y 2 ) y π 1 + (1/y) % & Thus, we see that fY (y) = 1/ π(1 + y 2 ) for −∞ < y < ∞. In other words, 1/X ∼ C(0, 1). fY (y) =

8.159 The range of R is (0, ∞). Here g(r) = r 2 , which is strictly increasing and differentiable on the √ range of R. The transformation method is appropriate. Note that g −1 (y) = y. Letting Y = R 2 , we see that the range of Y is the interval (0, ∞). Applying the transformation method now yields

√ 2 %√ & %√ & 2 fR y y/σ 2 e−( y ) /2σ 1 fR (r) 1 −y/2σ 2 fR (r) = = = = e fY (y) = ′ √ √ |g (r)| |2r| 2 y 2 y 2σ 2 % & if y > 0, and fY (y) = 0 otherwise. Thus, we see that R 2 ∼ E 1/2σ 2 .


8.160 a) As X has the chi-square distribution with n degrees of freedom, its PDF is fX (x) =

1 x n/2−1 e−x/2 , 2n/2 +(n/2)

x > 0,

√ and fX (x) = 0 otherwise. The range of X is the interval (0, ∞). Here g(x) = σ x, which is strictly increasing and differentiable on the √ range of X. The transformation method is appropriate. Note that g −1 (y) = y 2 /σ 2 . Letting Y = σ X, we see that the range of Y is the interval (0, ∞). Applying the transformation method now yields % & fX y 2 /σ 2 1 fX (x) 9=9 ! " 9 fY (y) = ′ fX (x) = 9 9σ/2√x 9 9 9 |g (x)| 9σ 2 y 2 /σ 2 9 5 6n/2−1 % 2 2 & 1 2 2 y /σ e− y /σ /2 1 2n/2 +(n/2) n−1 −y 2 /2σ 2 = y e = σ 2 /2y 2n/2−1 +(n/2)σ n if y > 0, and fY%(y) =&0 otherwise. b) Let U ∼ N 0, σ 2 and set V = |U |. We want to find a PDF of V , which we do by applying the CDF method. For v ≥ 0, FV (v) = P (V ≤ v) = P (|U | ≤ v) = P (−v ≤ U ≤ v) = FU (v) − FU (−v). Differentiation now yields 1

fV (v) = fU (v) + fU (−v) = 2fU (v) = 2 · √ e 2π σ

−v 2 /2σ 2

√ 2/π −v 2 /2σ 2 = e σ

if v > 0, and √ fV (v) = 0 otherwise. Setting n = 1 in the PDF obtained in part (a) and using the fact that +(1/2) = π, we get, for y > 0, √ 1 2/π −y 2 /2σ 2 1−1 −y 2 /2σ 2 y e = e = fV (y). fY (y) = 1/2−1 1 σ +(1/2)σ 2 √ % & Thus, for n = 1, σ X has the same distribution as the absolute value of a N 0, σ 2 distribution. c) Setting n = 2 in the PDF obtained in part (a) and using the fact that +(1) = 1, we get, for y > 0, fY (y) =

1

22/2−1 +(2/2)σ 2

y 2−1 e−y

2 /2σ 2

=

y −y 2 /2σ 2 e , σ2

which is the PDF of a Rayleigh random variable with parameter σ 2 . √ d) Setting n = 3 in the PDF obtained in part (a) and using the fact that +(3/2) = π/2, we get, for y > 0, √ 1 2/π 2 −y 2 /2σ 2 3−1 −y 2 /2σ 2 y e = y e , fY (y) = 3/2−1 3 2 +(3/2)σ σ3 which is the PDF of a Maxwell distribution with parameter σ 2 . 8.161 As X has the standard Cauchy distribution, its PDF is fX (x) =

1 &, π 1 + x2 %

−∞ < x < ∞.


% & The range of X is the interval (−∞, ∞). Here g(x) = 1/ 1 + x 2 , which is not monotone on the range & % of X. Thus, we apply the CDF method instead of the transformation method. Letting Y = 1/ 1 + X2 , we note that the range of Y is the interval (0, 1). For 0 ≤ y < 1, # $ 5 6 5 6 1 2 2 FY (y) = P (Y ≤ y) = P ≤ y = P X ≥ 1/y − 1 = 1 − P X < 1/y − 1 1 + X2 5 " 6 5" 6 5 " 6 " = 1 − P − 1/y − 1 < X < 1/y − 1 = 1 − FX 1/y − 1 + FX − 1/y − 1 . Differentiation now yields

5" 6 5 " 6 1 1 1 1 f 1/y − 1 + f − 1/y − 1 fY (y) = √ √ X X 2 1/y − 1 y 2 2 1/y − 1 y 2 5" 6 1 1 1 5 6 = 2√ 1/y − 1 = 2 √ fX % √ y 1/y − 1 y 1/y − 1 π 1 + 1/y − 1&2

1 1 y 1 1 = y −1/2 (1 − y)−1/2 = = √ √ √ √ B(1/2, 1/2) y 2 1/y − 1 π πy 1/y − 1 π y 1−y % & if 0 < y < 1, and fY (y) = 0 otherwise. Thus, 1/ 1 + X 2 has the beta distribution with parameters 1/2 and 1/2 or, equivalently, the arcsine distribution. =

8.162 a) Because U ∼ U(0, 1) and 1 − u is a linear transformation that takes the interval (0, 1) to itself, we would guess that 1 − U ∼ U(0, 1). b) As U ∼ U(0, 1), its PDF is fU (u) = 1 if 0 < u < 1, and fU (u) = 0 otherwise. The range of U is the interval (0, 1). Here g(u) = 1 − u, which is strictly decreasing and differentiable on the range of U . The transformation method is appropriate. Note that g −1 (y) = 1 − y. Letting Y = 1 − U , we see that the range of Y is the interval (0, 1). Applying the transformation method now yields fY (y) =

1

fU (u) |g ′ (u)|

=

fU (u) = fU (1 − y) = 1 | − 1|

if 0 < y < 1, and fY (y) = 0 otherwise. So, 1 − U ∼ U(0, 1).

8.163 From Exercise 8.65 on page 434, we know that FX (x) = (x − a)/(b − a) if a ≤ x < b. Using this fact, we also see that FX−1 (y) = a + (b − a)y if 0 < y < 1. a) From Proposition 8.16(a), we know that FX (X) ∼ U(0, 1). However, FX (X) = (X − a)/(b − a) so that (X − a)/(b − a) ∼ U(0, 1). b) From Proposition 8.16(b), we know that FX−1 (U ) has the same distribution as X, which is U(a, b). However, FX−1 (U ) = a + (b − a)U so that a + (b − a)U ∼ U(a, b). 8.164 a) Use a basic random number generator to obtain 5000 observations of a uniform random variable on the interval (0, 1) and then transform those numbers by using the function a + (b − a)u. b) Answers will vary. c) Answers will vary. 8.165 Let Y = ⌊m + (n − m + 1)X⌋. As X ∼ U(0, 1), its CDF is / 0, if x < 0; FX (x) = x, if 0 ≤ x < 1; 1, if x ≥ 1.


In particular, the range of X is the interval (0, 1). If follows that the range of Y is S = {m, m + 1, . . . , n}. For y ∈ S, % & % & pY (y) = P (Y = y) = P ⌊m + (n − m + 1)X⌋ = y = P y ≤ m + (n − m + 1)X < y + 1 $ $ $ # # # y+1−m y−m y+1−m y−m ≤X< = FX − FX =P n−m+1 n−m+1 n−m+1 n−m+1 # $ # $ y+1−m y−m 1 = − = . n−m+1 n−m+1 n−m+1

Thus, Y has the discrete uniform distribution on {m, m + 1, . . . , n}.

8.166 a) Use a basic random number generator to obtain 10,000 observations of a uniform random variable on the interval (0, 1) and then transform those numbers by using the function ⌊m + (n − m + 1)u⌋. b) Answers will vary. c) Answers will vary.

Theory Exercises 8.167 We have Z = (X − µ)/σ = (−µ/σ ) + (1/σ )X. Applying Proposition 8.15 with a = −µ/σ and b = 1/σ , we conclude that Z has the normal distribution with parameters # $2 1 1 µ 2 2 and b σ = σ 2 = 1. a + bµ = − + µ = 0 σ σ σ 8.168 To prove the univariate transformation theorem in the case where g is strictly increasing on the range of X, we apply the CDF method. In the proof, we use the notation g −1 for the inverse function of g, defined on the range of Y . For y in the range of Y , % & % & & % FY (y) = P (Y ≤ y) = P g(X) ≤ y = P X ≤ g −1 (y) = FX g −1 (y) ,

where the third equality follows from the fact that g is increasing. Taking the derivative with respect to y and applying the chain rule, we obtain

% & d −1 % & 1 fY (y) = FY′ (y) = FX′ g −1 (y) g (y) = ′ % −1 & fX g −1 (y) , dy g g (y) & % where, in the last equality, we used the calculus result (d/dy)g −1 (y) = 1/g ′ g −1 (y) . Because g is increasing, g ′ is positive. Letting x denote the unique real number in the range of X such that g(x) = y, we conclude that, for y in the range of Y , fY (y) = fX (x)/|g ′ (x)|, as required. 8.169 We apply the CDF method. For y in the range of Y 4 5 % & % &6 FY (y) = P (Y ≤ y) = P g(X) ≤ y = P X ∈ g −1 (−∞, y] =

g −1 ((−∞,y])

fX (x) dx,

where the last equality is a result of the FPF for continuous random variables. Differentiation now yields 4 d fY (y) = FY′ (y) = fX (x) dx. dy g −1 ((−∞,y])

This formula is the continuous analogue of the one given for discrete random variables in Equation (5.51) on page 247. The PMFs and sum in the latter are replaced by PDFs and an integral in the former.


8.170 Let R denote the range of X. Assume first that g is strictly increasing on R. Referring to Exercise 8.169 and applying the first fundamental theorem of calculus and the chain rule, we get 4 4 4 d d d fX (x) dx fX (x) dx = fX (x) dx = fY (y) = dy g −1 ((−∞,y]) dy { x∈R:g(x)≤y } dy { x∈R:x≤g −1 (y) } % & d −1 % & % & 1 1 = fX g −1 (y) g (y) = fX g −1 (y) ′ % −1 & = 9 ′ % −1 &9 fX g −1 (y) , 9 9 dy g g (y) g g (y)

where the last equality follows from the fact that g ′ is positive because g is strictly increasing. Next assume that g is strictly decreasing on R. Referring to Exercise 8.169 and applying the first fundamental theorem of calculus and the chain rule, we get 4 4 4 d d d fX (x) dx = fX (x) dx = fX (x) dx fY (y) = dy g −1 ((−∞,y]) dy { x∈R:g(x)≤y } dy { x∈R:x≥g −1 (y) } & d −1 & & % % % 1 1 = −fX g −1 (y) g (y) = −fX g −1 (y) ′ % −1 & = 9 ′ % −1 &9 fX g −1 (y) , 9g g (y) 9 dy g g (y)

where the last equality follows from the fact that g ′ is negative because g is strictly decreasing.

Advanced Exercises 8.171 Answers will vary. However, the idea is as follows. Let Xj denote the roundoff error for the j th number—that is, the difference between the j th number and its rounded value. The sum of the rounded numbers equals the rounded sum of the unrounded numbers if and only if |X1 + · · · + X10 | ≤ 0.5. We want to use simulation to estimate the probability of that event. We can reasonably assume, as we do, that X1 , . . . , X10 are independent and identically distributed random variables with common distribution U(−0.5, 0.5). Thus, for the estimate, we generate a large number, say, 10,000, of 10-tuples of observations of a U(−0.5, 0.5) random variable by using a basic random number generator (as explained in Exercise 8.164), and then sum each 10-tuple. The proportion of those sums that are at most 0.5 in magnitude provides the required probability estimate. We followed this procedure and obtained a probability estimate of 0.418. 8.172 For 0 < y < 1, the set Ay = { x : FX (x) = y } is a nonempty bounded closed interval. Indeed, Ay is nonempty and bounded because FX is continuous and FX (−∞) = 0 and FX (∞) = 1; Ay is closed because it equals the inverse image under the continuous function FX of the closed set {y}; Ay is an interval because FX is nondecreasing—if x1 , x2 ∈ Ay with x1 < x2 , then, for each x1 < x < x2 , we have y = FX (x1 ) ≤ FX (x) ≤ FX (x2 ) = y, and hence FX (x) = y, that is, x ∈ Ay . Now let ry denote the right endpoint of Ay . Then we have that {FX (X) ≤ y} = {X ≤ ry }. Hence & % P FX (X) ≤ y = P (X ≤ ry ) = FX (ry ) = y, which shows that FX (X) ∼ U(0, 1).
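A minimal version of the simulation described above might look like the following Python sketch (an illustrative addition; the 0.418 figure quoted in the solution came from the authors' own run, so a different seed will give a slightly different estimate).

    import random

    random.seed(4)
    trials = 10_000

    def roundoff_sum_ok():
        """Sum of ten U(-0.5, 0.5) roundoff errors is at most 0.5 in magnitude."""
        total = sum(random.uniform(-0.5, 0.5) for _ in range(10))
        return abs(total) <= 0.5

    estimate = sum(roundoff_sum_ok() for _ in range(trials)) / trials
    print(estimate)   # typically close to 0.42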

8.173 Yes. Suppose to the contrary that X isn’t a continuous random variable. Then there is a real number x0 such that% P (X = x0 ) > 0. &Because {X = x0 } ⊂ {FX (X) = FX (x0 )}, the domination principle implies that P FX (X) = FX (x0 ) ≥ P (X = x0 ) > 0. This inequality shows that FX (X) isn’t a continuous random variable and, consequently, couldn’t be uniform on the interval (0, 1).


8.174 Suppose that 0 < y < 1. As we showed in the solution to Exercise 8.172, the set { x : FX (x) = y } is a nonempty bounded closed interval. Consequently, we can define FX−1 (y) = min{ x : FX (x) = y }. Let W = FX−1 (U ). Then, for w ∈ R, we have {W ≤ w} = {U ≤ FX (w)}. Hence, % & FW (w) = P (W ≤ w) = P U ≤ FX (w) = FX (w),

which shows that W has the same CDF as X and, thus, the same probability distribution.
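Exercise 8.174 is the basis of the inverse-CDF (quantile) method of simulation. The following Python sketch (an added illustration, not part of the original solution; λ = 2.5 is an arbitrary choice) uses it to generate exponential E(λ) observations from U(0, 1) random numbers and compares the sample mean with 1/λ.

    import random
    from math import log
    from statistics import mean

    random.seed(5)
    lam = 2.5                        # illustrative rate parameter
    n = 100_000

    def exponential_quantile(u, lam):
        """F^{-1}(u) for F(x) = 1 - exp(-lam*x), the E(lam) CDF."""
        return -log(1.0 - u) / lam

    samples = [exponential_quantile(random.random(), lam) for _ in range(n)]
    print(round(mean(samples), 3), "vs", round(1 / lam, 3))   # both near 0.4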

Review Exercises for Chapter 8

Basic Exercises

8.175 Your colleague should have written
FX(x) = 0 if x < 1;  1/3 if 1 ≤ x < 2;  2/3 if 2 ≤ x < 3;  1 if x ≥ 3.

8.176
a) For 0 ≤ w < 1,
FW(w) = P(W ≤ w) = 1 − P(W > w) = 1 − (1 − w)² = 2w − w².
Thus, FW(w) = 0 if w < 0;  2w − w² if 0 ≤ w < 1;  1 if w ≥ 1.
b) Referring to the result of part (a), we get
fW(w) = F′W(w) = d/dw [2w − w²] = 2 − 2w = 2(1 − w)
if 0 < w < 1, and fW(w) = 0 otherwise.
c) First note that B(1, 2) = Γ(1)Γ(2)/Γ(3) = 0! 1!/2! = 1/2. For 0 < w < 1, we can then write
fW(w) = 2(1 − w) = (1/B(1, 2)) w^{1−1} (1 − w)^{2−1}.

Thus, W has the beta distribution with parameters 1 and 2. 8.177 a) Noting that X is a continuous random variable, we have % & % & FY (y) = P (Y ≤ y) = P Y ≤ y, X ∈ {0.2, 0.9} + P Y ≤ y, X ∈ / {0.2, 0.9} % & = 0 + P Y ≤ y, X ∈ / {0.2, 0.9} = P (Y ≤ y, X ̸= 0.2, X ̸= 0.9).

This result shows that the CDF of Y does not depend on the values assigned to Y when X = 0.2 or X = 0.9. In particular, then, we can assume without loss of generality that Y = X when X ∈ {0.2, 0.9}. Doing so, we have FY (y) = 0 if y < 0.2, and FY (y) = 1 if y ≥ 0.9. Also, for 0.2 ≤ y < 0.9, FY (y) = P (Y ≤ y) = P (X ≤ y) = y.


Thus, FY (y) =

/ 0,

b) Referring to part (a), we note that

y, 1,

if y < 0.2; if 0.2 ≤ y < 0.9; if y ≥ 0.9.

P (Y = y) = FY (y) − FY (y−) =

/

0.2, 0.1, 0,

if y = 0.2; if y = 0.9; otherwise.

. Therefore, Y is not a discrete random variable because y P (Y = y) = 0.3 < 1. It is not a continuous random variable because, for instance, P (Y = 0.2) ̸= 0. c) From part (b), we know that S = {0.2, 0.9} and that P (Y ∈ S) = P (Y = 0.2) + P (Y = 0.9) = 0.2 + 0.1 = 0.3. For y ∈ S, P (Y = y, Y ∈ S) P (Y = y | Y ∈ S) = P (Y ∈ S) . and P (Y = y | Y ∈ S) = 0 otherwise. As y P (Y discrete. Its CDF is / 0, FY|{Y ∈S} (y) = 2/3, 1, d) Note that and that

P (Y = y) = = 0.3

7

2/3, 1/3,

if y = 0.2; if y = 0.9,

= y | Y ∈ S) = 1, the random variable Y|{Y ∈S} is if y < 0.2; if 0.2 ≤ y < 0.9; if y ≥ 0.9.

P (Y ∈ / S) = 1 − P (Y ∈ S) = 1 − 0.3 = 0.7 P (Y ≤ y, Y ∈ / S) = P (Y ≤ y) − P (Y ≤ y, Y ∈ S).

Referring to parts (a) and (c), we have

& P (Y ≤ y, Y ∈ / S) 1 % = P (Y ≤ y) − P (Y ≤ y, Y ∈ S) P (Y ∈ / S) 0.7 # $ & 1 0.3 1 % = P (Y ≤ y) − P (Y ≤ y, Y ∈ S) = FY (y) − 0.3FY|{Y ∈S} (y) , 0.7 P (Y ∈ S) 0.7

FY|{Y ∈S} / S) = / (y) = P (Y ≤ y | Y ∈

or

FY|{Y ∈S} / (y) =

/ 0,

(y − 0.2)/0.7, 1,

if y < 0.2; if 0.2 ≤ y < 0.9; if y ≥ 0.9.

We see that Y|{Y ∈S} / ∼ U(0.2, 0.9) and, in particular, is a continuous random variable. e) From part (d), FW (y) = FY|{Y ∈S} / (y) =

& & 1 % 1 % FY (y) − 0.3FY|{Y ∈S} (y) = FY (y) − 0.3FV (y) , 0.7 0.7

which means that FY = 0.3FV + 0.7FW . Thus, choose a = 0.3 and b = 0.7. f) As X is a continuous random variable, the four required probabilities are identical and equal FX (0.9) − FX (0.2) = 0.9 − 0.2 = 0.7.


Referring to part (a) and Proposition 8.2 on page 412, we find that the required probabilities are FY (0.9−) − FY (0.2−) = 0.9 − 0 = 0.9, FY (0.9−) − FY (0.2) = 0.9 − 0.2 = 0.7, FY (0.9) − FY (0.2) = 1 − 0.2 = 0.8,

FY (0.9) − FY (0.2−) = 1 − 0 = 1.

Referring to part (c) and Proposition 8.2, we find that the required probabilities are FV (0.9−) − FV (0.2−) = 2/3 − 0 = 2/3, FV (0.9−) − FV (0.2) = 2/3 − 2/3 = 0, FV (0.9) − FV (0.2) = 1 − 2/3 = 1/3, FV (0.9) − FV (0.2−) = 1 − 0 = 1.

As W is a continuous random variable, the four required probabilities are identical and equal FW (0.9) − FW (0.2) = 1 − 0 = 1. 8.178 Yes. For a CDF, FX , we have FX (x) = P (X ≤ x) = 1 − P (X > x) for all x ∈ R. Thus, the CDF can be obtained from the tail probabilities. And because the CDF determines the probability distribution of X (fourth bulleted item on page 407), so do the tail probabilities. 8.179 Let Y denote weekly gas sales, in gallons. Then Y = min{X, m}. a) For 0 ≤ y < m, % & P (Y > y) = P min{X, m} > y = P (X > y) = e−λy .

Therefore,

FY (y) =

/

0, 1 − e−λy , 1,

if y < 0; if 0 ≤ y < m; otherwise.

b) From part (a), we see that % & P (Y = m) = FY (m) − FY (m−) = 1 − 1 − e−λm = e−λm , . and P (Y = y) = 0 if y ̸= m. Because y P (Y = y) = e−λm < 1, Y is not a discrete random variable and, because P (Y = m) ̸= 0, Y is not a continuous random variable. c) The gas station runs out of gas before the end of the week if and only if {X > m}, which, because X ∼ E(λ), has probability e−λm .

8.180 a) fX is discontinuous at x = 0, 1/2, and 1. b) Because X is a continuous random variable, its CDF, FX , is an everywhere continuous function, that is, FX has no points of discontinuity. c) We have # $ 4 ∞ 4 1/2 4 1 1 1 3 1= fX (x) dx = c dx + 2c dx = c + 1 − (2c) = c. 2 2 2 −∞ 0 1/2 Hence, c = 2/3. d) For 0 ≤ x < 1/2,

FX (x) =

4

x

−∞

fX (t) dt =

4

x 0

(2/3) dt = 2x/3.


For 1/2 ≤ x < 1, FX (x) =

4

x

−∞

fX (t) dt =

Thus,

4

1/2 0

# $# $ 1 1 4 (2/3) dt + (4/3) dt = + x − = (4x − 1)/3. 3 2 3 1/2 4

x

⎧ 0, ⎪ ⎨ 2x/3, FX (x) = ⎪ (4x − 1)/3, ⎩ 1,

if x < 0; if 0 ≤ x < 1/2; if 1/2 ≤ x < 1; if x ≥ 1.

From this result, it follows easily that FX is everywhere continuous. Note, in particular, that

and

FX (0−) = 0 = (2 · 0)/3 = FX (0), % & FX (1/2−) = 2 · (1/2)/3 = 1/3 = 4(1/2) − 1 /3 = FX (1/2), FX (1−) = (4 · 1 − 1)/3 = 1 = FX (1).

e) A graph of fX is as follows: f(x)

x

A graph of FX is as follows: F(x)

x


8.181 A PDF must integrate to 1. Thus, 4 ∞ 4 5 4 1= fX (x) dx = cx(5 − x) dx = c Thus, c =

6/53

0

−∞

= 0.048.

0

55

6 5x − x 2 dx = 53 c/6.

8.182 If there were a constant c such that cg is a PDF, then c would have to be positive. Then, 4 ∞ 4 1 1 1= cg(x) dx = c dx = c · ∞ = ∞, −∞ 0 x which is a contradiction.

8.183 a) We can write fX (x) = cx 3−1 (1 − x)2−1 if 0 < x < 1, and fX (x) = 0 otherwise. Because of the functional form of the PDF, we see that X must have the beta distribution with parameters 3 and 2. b) We have 1 +(3 + 2) +(5) 4! c= = = = = 12. B(3, 2) +(3)+(2) +(3)+(2) 2! 1! c) We have # $ 4 ∞ 4 1 4 1 1 2 2 3 1= fX (x) dx = cx (1 − x) dx = c (x − x ) dx = c · . 12 −∞ 0 0 Thus, c = 12. d) From Equation (8.55) on page 457, the CDF of X is 4 # $ ) 4 j FX (x) = x (1 − x)4−j = 4x 3 (1 − x) + x 4 = 4x 3 − 3x 4 = x 3 (4 − 3x). j j =3

% & Therefore, P (X ≤ 1/2) = (1/2)3 4 − 3 · (1/2) = 0.3125. e) Applying the conditional probability rule and using the results of part (d) yields

P (X ≤ 1/4, X ≤ 1/2) P (X ≤ 1/4) = P (X ≤ 1/2) P (X ≤ 1/2) % & (1/4)3 4 − 3 · (1/4) = 0.1625. = 0.3125

P (X ≤ 1/4 | X ≤ 1/2) =

8.184 a) It would make sense for x > 0. b) Because X2 cannot be negative, we would necessarily have fX2 (x) = 0 for all x < 0. Thus, the required constant is 0. c) If fX is an even function, then, making the substitution u = −t, we get 4 −x 4 ∞ 4 ∞ fX (t) dt = fX (−u) du = fX (u) du FX (−x) = −∞

x

x

= P (X > x) = 1 − P (X ≤ x) = 1 − FX (x).

Hence, the quantity in brackets on the right of the display in part (a) can be simplified as follows: %√ & % √ & %√ & 5 %√ &6 %√ & FX x − FX − x = FX x − 1 − FX x = 2FX x − 1.


8.185 a) We can write fZ (z) = 2z =

1 2! +(3) z= z2−1 (1 − z)1−1 . z= +(2)+(1) B(2, 1) 1! 0!

Thus, Z has the beta distribution with parameters 2 and 1. b) For 0 ≤ θ < 2π, the event {$ ≤ θ } occurs if and only if the center of the first spot lies in the angular sector [0, θ), which has area θ/2. Hence, by Equation (8.5) on page 403, F$ (θ) = P ($ ≤ θ) =

|{$ ≤ θ}| θ/2 θ = = . π π 2π

Thus,

⎧ 0, ⎪ ⎨ θ F$ (θ) = , ⎪ ⎩ 2π 1, c) From the result of part (b), we see that f$ (θ) = F$′ (θ) =

if θ < 0; if 0 ≤ θ < 2π; if θ ≥ 2π. d dθ

#

θ 2π

$

=

1 2π

if 0 < θ < 2π, and f$ (θ) = 0 otherwise. Thus $ ∼ U(0, 2π).

8.186 a) As N(t) ∼ P(3.8t), its mean is 3.8t. Thus, on average, 3.8t calls arrive at the switchboard during the first t minutes. b) Event {W1 > t} occurs if and only if the first phone call arrives after time t, which means that N(t) = 0. Similarly, event {W1 ≤ t} occurs if and only if the first phone call arrives on or before time t, which means that N(t) ≥ 1. c) Referring to part (b), we have % & (3.8)0 P (W1 > t) = P N(t) = 0 = e−3.8t = e−3.8t . 0! d) From Propostion 8.8 on page 433 and the result of part (c), we conclude that W1 ∼ E(3.8). e) For t > 0, % & % & P (W3 ≤ t) = P N(t) ≥ 3 = 1 − P N(t) ≤ 2 =1−

2 )

e−3.8t

j =0

3−1 ) (3.8t)j (3.8t)j = 1 − e−3.8t . j! j! j =0

Referring now to the note directly preceding Example 8.16 and to Equation (8.49), we can conclude that W3 ∼ +(3, 3.8). f) Proceeding as in part (e), we have, for t > 0, % & % & P (Wn ≤ t) = P N(t) ≥ n = 1 − P N(t) ≤ n − 1 =1−

Thus, Wn ∼ +(n, 3.8).

n−1 )

j =0

e

−3.8t

n−1 ) (3.8t)j (3.8t)j −3.8t =1−e . j! j! j =0


8.187 Let X denote the computed area of the circle, so that X = πD 2 /4. a) For π(d − ϵ)2 /4 ≤ x < π(d + ϵ)2 /4, 5 " 5 6 5 6 6 " FX (x) = P (X ≤ x) = P πD 2 /4 ≤ x = P D ≤ 2 x/π = FD 2 x/π Differentiation now yields

5 " 6 1 1 1 1 = √ fX (x) = FX′ (x) = √ fD 2 x/π = √ · πx 2ϵ 2ϵ πx πx

if π(d − ϵ)2 /4 ≤ x < π(d + ϵ)2 /4, and fX (x) = 0 otherwise. b) Note that, because D can’t be negative, the normal distribution given for D must be an approximation. Thus, our calculation here is purely formal. For x > 0, 5 6 FX (x) = P (X ≤ x) = P πD 2 /4 ≤ x 5 " 6 5 " 6 5 " 6 " = P −2 x/π ≤ D ≤ 2 x/π = FD 2 x/π − FD −2 x/π . Differentiation now yields

5 " 6 5 " 6 1 1 fX (x) = FX′ (x) = √ fD 2 x/π + √ fD −2 x/π πx πx $ # √ √ 2 2 1 1 1 2 2 =√ e−(2 x/π−d ) /2ϵ + √ e−(−2 x/π−d ) /2ϵ √ πx 2π ϵ 2π ϵ 6 5 √ √ √ √ 2 2 1 2 2 = e−(2 x− π d ) /2πϵ + e−(2 x+ π d ) /2πϵ √ πϵ 2x

if x > 0 and fX (x) = 0 otherwise.

8.188 % & a) We have P N(t) = 0 = P (X > t) = e−6.9t . b) We have % & P N(2.5) = 0 | N(2) = 0 = P (X > 2.5 | X > 2) = P (X > 0.5) = e−6.9·0.5 = 0.0317,

where, in the second equality, we used the lack-of-memory property of an exponential random variable. c) Referring to the solution to part (b) yields % & % & P N(2.5) − N(2) = 0 | N(2) = 0 = P N(2.5) = 0 | N(2) = 0 = 0.0317. 8.189 a) For x > 0,

% & % & % & FX (x) = P (X ≤ x) = P − logb U ≤ x = P U ≥ b−x = 1 − FU b−x .

Differentiation now yields,

% & fX (x) = FX′ (x) = (ln b)b−x fU b−x = (ln b)b−x = (ln b)e−(ln b)x

if x > 0, and fX (x) = 0 otherwise. Thus, Y ∼ E(ln b). b) From algebra, logb y = (ln y)/(ln b). Hence, X = − logb U = −(ln b)−1 ln U which, by Example 8.22(b), is exponentially distributed with parameter ln b.


8.190 Note that, given X = 0, we must have Y = 0. Thus, P (4 < Y < 8 | X = 0) = 0. Furthermore, P (4 < Y < 8 | X = 1) = e−4/5 − e−8/5

and

P (4 < Y < 8 | X > 1) = e−4/8 − e−8/8 .

Applying the law of total probability yields P (4 < Y < 8) = P (X = 0)P (4 < Y < 8 | X = 0) + P (X = 1)P (4 < Y < 8 | X = 1) + P (X > 1)P (4 < Y < 8 | X > 1) 6 15 6 1 5 −4/5 1 e − e−8/5 + e−4/8 − e−8/8 = 0.122. = ·0+ 2 3 6

8.191 a) We can show that F n is a CDF by proving that it satisfies the conditions of Proposition 8.1 on page 411. Alternatively, we can proceed as follows. Let X1 , . . . , Xn be independent and identically distributed random variables having common PDF f and set U = max{X1 , . . . , Xn }. Then % & FU (u) = P (U ≤ u) = P max{X1 , . . . , Xn } ≤ u = P (X1 ≤ u, . . . , Xn ≤ u) % &n = P (X1 ≤ u) · · · P (Xn ≤ u) = F (u) = F n (u). Thus, F n is the CDF of U and, in particular, is a CDF. The corresponding PDF is % &n−1 ′ % &n−1 fU (u) = FU′ (u) = n F (u) F (u) = nf (u) F (u) .

Thus, the PDF corresponding to F n is nf F n−1 . b) We can show that 1 − (1 − F )n is a CDF by proving that it satisfies the conditions of Proposition 8.1 on page 411. Alternatively, we can proceed as follows. Let X1 , . . . , Xn be independent and identically distributed random variables having common PDF f and set V = min{X1 , . . . , Xn }. Then % & FV (v) = P (V ≤ v) = 1 − P (V > v) = 1 − P min{X1 , . . . , Xn } > v = 1 − P (X1 > v, . . . , Xn > v) = 1 − P (X1 > v) · · · P (Xn > v) % & % & % &n = 1 − 1 − P (X1 ≤ v) · · · 1 − P (Xn ≤ v) = 1 − 1 − F (v) . Thus, 1 − (1 − F )n is the CDF of V and, in particular, is a CDF. The corresponding PDF is % &n−1 % & % &n−1 fV (v) = FV′ (v) = −n 1 − F (v) −F ′ (v) = nf (v) 1 − F (v) . Thus, the PDF corresponding to 1 − (1 − F )n is nf (1 − F )n−1 .

8.192 Note that in both parts, X(t) is a continuous random variable with a symmetric distribution. These facts imply that FX(t) (−x) = 1 − FX(t) (x) for all x ∈ R. Therefore, for x > 0, % & % & F|X(t)| (x) = P |X(t)| ≤ x = P −x ≤ X(t) ≤ x = FX(t) (x) − FX(t) (−x) = 2FX(t) (x) − 1.

Differentiation yields f|X(t)| (x) = 2fX(t) (x) if x > 0, and f|X(t)| (x) = 0 otherwise. a) In this case, √ 1 2/πt −x 2 /2σ 2 t −x 2 /2σ 2 t f|X(t)| (x) = 2 · √ = e √ e σ 2πσ t if x > 0, and f|X(t)| (x) = 0 otherwise. b) In this case, 1 1 = f|X(t)| (x) = 2 · 2t t if 0 < x < t, and f|X(t)| (x) = 0 otherwise. Thus, |X(t)| ∼ U(0, t).


8.193 We have FX+ (t) (x) = 0 if x < 0 and, for x ≥ 0, % & % & FX+ (t) (x) = P max{X(t), 0} ≤ x = P X(t) ≤ x = FX(t) (x).

a) Referring to Equation (8.35) on page 443, we see that, in this case, FX+ (t) (x) = 0 if x < 0 and, for x ≥ 0, # $ x + FX (t) (x) = FX(t) (x) = ) √ . σ t b) In this case, FX+ (t) (x) = 0 if x < 0,

FX+ (t) (x) = FX(t) (x) =

x+t 2t

if 0 ≤ x ≤ t, and % FX+ (t) (x) & = 1 if x ≥ t. c) We have P X+ (t) = 0 = FX+ (t) (0) − FX+ (t) (0−). In part (a), this quantity equals )(0) − 0 = 1/2, and, in part (b), it equals 1/2 − 0 = 1/2. Thus, in either case, X+ (t) is not a continuous random variable. 8.194 a) The range of V is the interval (0, ∞). Here g(v) = mv 2 /2, which is strictly increasing √ and differentiable on the range of V . The transformation method is appropriate. Note that g −1 (k) = 2k/m. We see that the range of K is the interval (0, ∞). Applying the transformation method now yields %√ & fV 2k/m 1 fV (v) fV (v) = = fK (k) = ′ √ |mv| |g (v)| m 2k/m √ 62 5" √ 2 a 2k −2bk 1 −bm( 2k/m) 2k/m e = 3/2 e = √ a m m 2k/m

if k > 0, and fK (k) = 0 otherwise. √ b) From the result of part (a), for k > 0, the PDF of K has the form c k e−2bk = ck 3/2−1 e−(2b)k , where c is a constant. It follows that K ∼ +(3/2, 2b). c) From parts (a) and (b), √ a 2 (2b)3/2 (2b)3/2 = c = = . √ +(3/2) m3/2 π/2 √ Hence, a = (4bm)3/2 / π.

8.195 a) Because a normal random variable can be negative, whereas a gestation period can’t. b) The probability that a normal random variable with parameters µ = 266 and σ 2 = 256 will be negative is essentially zero (2.30 × 10−62 ).

8.196 Let X denote tee shot distance, in yards, for PGA players (circa 1999). Then X ∼ N (272.2, 8.122 ). We apply Proposition 8.11 on page 443 and use Table I. a) We have $ # $ # 260 − 272.2 280 − 272.2 −) = )(0.96) − )(−1.50) P (260 ≤ X ≤ 280) = ) 8.12 8.12 % & = )(0.96) − 1 − )(1.50) = 0.7647. Thus, 76.5% of PGA player tee shots go between 260 and 280 yards.


b) We have #

300 − 272.2 P (X > 300) = 1 − P (X ≤ 300) = 1 − ) 8.12

$

= 1 − )(3.42) = 1 − 0.9997 = 0.0003.

Thus, 0.03% of PGA player tee shots go more than 300 yards. 8.197 Let X denote height, in inches, of men in their 20s. Then X ∼ N (69, 2.52 ). We apply Proposition 8.11 on page 443 and use Table I. a) We have $ # $ # 68 − 69 73 − 69 −) = )(1.6) − )(−0.4) P (68 ≤ X ≤ 73) = ) 2.5 2.5 % & = )(1.6) − 1 − )(0.4) = 0.6006. b) We need to choose c so that

0.80 = P(69 − c ≤ X ≤ 69 + c) = Φ(((69 + c) − 69)/2.5) − Φ(((69 − c) − 69)/2.5) = Φ(c/2.5) − Φ(−c/2.5) = 2Φ(c/2.5) − 1.
Hence, c = 2.5 Φ⁻¹(0.9) = 2.5 · 1.28 = 3.2. We interpret this by stating that 80% of men in their 20s have heights within 3.2 inches of the mean height of 69 inches.
8.198 Let X denote the number of the 100 adults selected who are in favor of strict gun control laws. Then X ≈ B(100, 0.47), the approximate symbol reflecting the fact that the exact distribution is H(N, 100, 0.47), where N is the size of the adult population of the southwestern city in question. Note that np = 100 · 0.47 = 47 and np(1 − p) = 100 · 0.47 · 0.53 = 24.91. Thus, X ≈ N(47, 24.91).
a) We have
P(X = 47) = pX(47) ≈ (1/√(2π · 24.91)) e^(−(47−47)²/(2·24.91)) = 0.0799.
b) We have
P(X = 46, 47, or 48) = Σ_{x=46}^{48} pX(x) ≈ Σ_{x=46}^{48} (1/√(2π · 24.91)) e^(−(x−47)²/(2·24.91)) = 0.237.
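A quick numerical check of Exercise 8.198; a sketch assuming SciPy is available, with the exact binomial values printed only for comparison (they are not part of the solution above):

```python
# Exercise 8.198: local normal approximation to B(100, 0.47) near its mean.
import math
from scipy.stats import binom

n, p = 100, 0.47
mu, var = n * p, n * p * (1 - p)           # 47 and 24.91

def local_normal(x):
    """Normal-PDF approximation to the binomial PMF at the integer x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(local_normal(47))                               # part (a): about 0.0799
print(sum(local_normal(x) for x in range(46, 49)))    # part (b): about 0.237

# Exact binomial probabilities, for comparison.
print(binom.pmf(47, n, p))
print(sum(binom.pmf(x, n, p) for x in range(46, 49)))
```

The same few lines, with n = 300 and p = 0.26, reproduce the values in Exercise 8.199 below.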

8.199 Let X denote the number of the 300 Mexican households selected that are living in poverty. Then X ≈ B(300, 0.26), the approximate symbol reflecting the fact that the exact distribution is actually H(N, 300, 0.26), where N is the number of Mexican households. Note that np = 300 · 0.26 = 78 and np(1 − p) = 300 · 0.26 · 0.74 = 57.72. Consequently, X ≈ N(78, 57.72).
a) Here we want the probability that 26% of the Mexican households sampled are living in poverty. As 26% of 300 is 78, the required probability is
P(X = 78) = pX(78) ≈ (1/√(2π · 57.72)) e^(−(78−78)²/(2·57.72)) = 0.0525.
b) Here we want the probability that between 25% and 27%, inclusive, of the Mexican households sampled are living in poverty. As 25% of 300 is 75 and 27% of 300 is 81, the required probability is
P(75 ≤ X ≤ 81) = Σ_{x=75}^{81} pX(x) ≈ Σ_{x=75}^{81} (1/√(2π · 57.72)) e^(−(x−78)²/(2·57.72)) = 0.355.


8.200 We have X ∼ U(0, 1) and Y = ⌊100X⌋. Note that the range of Y is the set S = {00, 01, . . . , 99}. For y ∈ S,
pY(y) = P(Y = y) = P(⌊100X⌋ = y) = P(y ≤ 100X < y + 1) = P(y/100 ≤ X < (y + 1)/100) = (y + 1)/100 − y/100 = 1/100.
Thus, Y has the discrete uniform distribution on {00, 01, . . . , 99}.
8.201 We have
∫₀^∞ x fX(x) dx = ∫₀^∞ x · (λ^α/Γ(α)) x^(α−1) e^(−λx) dx = (λ^α/Γ(α)) · (Γ(α + 1)/λ^(α+1)) ∫₀^∞ (λ^(α+1)/Γ(α + 1)) x^((α+1)−1) e^(−λx) dx
= (λ^α/Γ(α)) · (Γ(α + 1)/λ^(α+1)) · 1 = α Γ(α)/(Γ(α) λ) = α/λ.
Also,
∫₀^∞ x² fX(x) dx = ∫₀^∞ x² · (λ^α/Γ(α)) x^(α−1) e^(−λx) dx = (λ^α/Γ(α)) · (Γ(α + 2)/λ^(α+2)) ∫₀^∞ (λ^(α+2)/Γ(α + 2)) x^((α+2)−1) e^(−λx) dx
= (λ^α/Γ(α)) · (Γ(α + 2)/λ^(α+2)) · 1 = (α + 1)α Γ(α)/(Γ(α) λ²) = α(α + 1)/λ².

8.202 a) We can write
fX(x) = xe^(−x) = (1²/(2 − 1)!) x^(2−1) e^(−1·x) = (1²/Γ(2)) x^(2−1) e^(−1·x)
if x > 0 and fX(x) = 0 otherwise. Hence, X ∼ Γ(2, 1).
b) From part (a) and Equation (8.49) on page 453, the CDF of X is given by FX(x) = 0 if x < 0, and
FX(x) = 1 − e^(−x) Σ_{j=0}^{2−1} x^j/j! = 1 − e^(−x)(1 + x),   x ≥ 0.
c) Referring to the result of part (b), we find that the proportion of claims in which the claim amount exceeds 1000 is P(X > 1) = 1 − FX(1) = e^(−1)(1 + 1) = 2e^(−1). Hence the company should expect to pay 2e^(−1) × 100 ≈ 73.6 claims.
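The claim calculation in part (c) can be reproduced directly; a minimal sketch in plain Python, using the survival probability 1 − FX(1) = 2e⁻¹ derived in part (b):

```python
# Exercise 8.202(c): X ~ Gamma(2, 1), claim amounts measured in thousands.
import math

def survival(x):
    """P(X > x) for X ~ Gamma(2, 1), from the CDF in part (b)."""
    return math.exp(-x) * (1 + x)

p_exceed = survival(1.0)            # 2/e, about 0.7358
print(p_exceed)
print(100 * p_exceed)               # expected number of such claims out of 100, about 73.6
```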

Theory Exercises 8.203 a) Suppose that X is a symmetric random variable, that is, X and −X have the same probability distribution. Then, for all x ∈ R, P (X ≤ −x) = P (−X ≤ −x) = P (X ≥ x).

Conversely, suppose that P (X ≤ −x) = P (X ≥ x) for all x ∈ R. Then

F−X (x) = P (−X ≤ x) = P (X ≥ −x) = P (X ≤ x) = FX (x)

for all x ∈ R. Hence, X and −X have the same probability distribution, that is, X is a symmetric random variable.


b) From part (a), the complementation rule, and Equation (8.7) on page 412, we see that the following conditions are equivalent: ● ● ● ●

X is a symmetric random variable. P (X ≤ −x) = P (X ≥ x) for all x ∈ R.

P (X ≤ −x) = 1 − P (X < x) for all x ∈ R. FX (−x) = 1 − FX (x−) for all x ∈ R.

The last bulleted item provides the required equivalent condition. c) Referring to part (b) and Proposition 8.2(b) on page 412 yields % & P (|X| ≤ x) = P (−x ≤ X ≤ x) = FX (x) − FX (−x−) = FX (x) − 1 − FX (x) = 2FX (x) − 1.

d) From part (a), P (X ≤ 0) = P (X ≥ 0). Thus,

1 = P (X < 0) + P (X ≥ 0) ≤ P (X ≤ 0) + P (X ≥ 0) = 2P (X ≥ 0). Therefore P (X ≥ 0) ≥ 1/2. Furthermore, P (X ≤ 0) = P (X ≥ 0) ≥ 1/2. Hence, 0 is a median of X. 8.204 Let x ∈ R. Referring to Exercise 8.203(a) and making the substitution u = −t yields 4 ∞ 4 x FX (x) = P (X ≤ x) = P (X ≥ −x) = fX (t) dt = fX (−u) du. −x

−∞

Applying Proposition 8.5 on page 422 now shows that the function g on R defined by g(x) = fX (−x) is a PDF of X. In other words, we can write fX (−x) = fX (x) for all x ∈ R, meaning that we can choose a PDF for X that is a symmetric function. 8.205 a) Referring to Exercise 8.203(a), we see that, for each x ∈ R,

FX (c − x) = P (X ≤ c − x) = P (X − c ≤ −x) = P (X − c ≥ x) = 1 − P (X − c < x) = 1 − P (X − c ≤ x) = 1 − P (X ≤ c + x) = 1 − FX (c + x).

Thus, a continuous random variable, X, is symmetric about c if and only if FX (c − x) = 1 − FX (c + x),

x ∈ R.

b) We have & & d % d % FX (c − x) = −fX (c − x) and 1 − FX (c + x) = −fX (c + x). dx dx Referring now to the equivalent condition in part (a), we see that a continuous random variable, X, with a PDF is symmetric about c if and only if we can write fX (c − x) = fX (c + x) for all x ∈ R, meaning that we can choose a PDF for X that is a symmetric function about c. 8.206 In answering each part, draw a graph of the PDF of the random variable in question and refer to Exercise 8.205(b). a) X is symmetric about (a + b)/2. b) X is not symmetric about any number c. c) X is symmetric about µ. d) X is not symmetric about any number c. e) From the last bulleted item on page 457, we see that X is symmetric about some number c if and only if α = β, in which case c = 1/2.


f) X is symmetric about (a + b)/2. g) X is symmetric about η.
8.207 a) By definition, X − c is a symmetric random variable. Therefore, by Exercise 8.203(d), we know that 0 is a median of X − c. Consequently,
P(X ≤ c) = P(X − c ≤ 0) ≥ 1/2   and   P(X ≥ c) = P(X − c ≥ 0) ≥ 1/2.

Hence, c is a median of X. b) Referring to part (a) and the results of Exercise 8.206 shows that the medians of the four specified distributions are (a + b)/2, µ, (a + b)/2, and η, respectively.

Advanced Exercises 8.208 In each case, we must show that the specified function satisfies the four properties of Proposition 8.1 on page 411. a) Let x ≤ y. We make the substitution u = t + (y − x) and use the fact that F is nondecreasing to conclude that 4 4 4 & 1 y+h % 1 y+h 1 x+h F (t) dt = F u − (y − x) du ≤ F (u) du = G(y). G(x) = h x h y h y Hence, G is nondecreasing. We can write

1 G(x) = h

4

x

x+h

1 F (t) dt = h

#4

0

x+h

F (t) dt −

4

x

$

F (t) dt . 0

From calculus, the two functions of x within the parentheses are everywhere continuous and, hence, so is their difference. Thus, G is everywhere continuous and, in particular, everywhere right continuous. Let ϵ > 0 be given. Because F (−∞) = 0, we can choose M so that F (t) < ϵ whenever t < M. Then, for x < M − h, 4 4 1 x+h 1 x+h G(x) = F (t) dt ≤ ϵ dt = ϵ. h x h x

Thus, G(−∞) = 0. Let ϵ > 0 be given. Because F (∞) = 1, we can choose M so that F (t) > 1 − ϵ whenever t > M. Then, for x > M, 4 4 1 x+h 1 x+h G(x) = F (t) dt ≥ (1 − ϵ) dt = 1 − ϵ. h x h x

Thus, G(∞) = 1. b) Let x ≤ y. We make the substitution u = t + (y − x) and use the fact that F is nondecreasing to conclude that 4 x+h 4 y+h 4 y+h % & 1 1 1 H (x) = F (t) dt = F u − (y − x) du ≤ F (u) du = H (y). 2h x−h 2h y−h 2h y−h Hence, H is nondecreasing.


We can write 1 H (x) = 2h

4

1 F (t) dt = 2h

x+h

x−h

#4

x+h

F (t) dt −

0

4

$

x−h

F (t) dt . 0

From calculus, the two functions of x within the parentheses are everywhere continuous and, hence, so is their difference. Thus, H is everywhere continuous and, in particular, everywhere right continuous. Let ϵ > 0 be given. Because F (−∞) = 0, we can choose M so that F (t) < ϵ whenever t < M. Then, for x < M − h, 1 2h

H (x) =

4

x−h

4

1 2h

x+h

F (t) dt ≤

x+h

x−h

ϵ dt = ϵ.

Thus, H (−∞) = 0. Let ϵ > 0 be given. Because F (∞) = 1, we can choose M so that F (t) > 1 − ϵ whenever t > M. Then, for x > M + h, 1 H (x) = 2h

4

x+h

x−h

1 F (t) dt ≥ 2h

4

x+h

x−h

(1 − ϵ) dt = 1 − ϵ.

Thus, H (∞) = 1. 8.209 a) Consider the following graph:

[Figure: the normal PDF fY, centered near np, with a rectangle of unit width over the interval from x − 1/2 to x + 1/2.]

The height of the dashed line is fY(x), which also equals the area of the crosshatched rectangle. However, the area of the crosshatched rectangle approximately equals the shaded area under the normal curve, or P(x − 1/2 < Y < x + 1/2).
b) Now let Y ∼ N(np, np(1 − p)). Referring to Equation (8.39) on page 445, part (a), and Proposition 8.11 on page 443, we get
pX(x) ≈ fY(x) ≈ P(x − 1/2 < Y < x + 1/2) = Φ((x + 1/2 − np)/√(np(1 − p))) − Φ((x − 1/2 − np)/√(np(1 − p)))
for x = 0, 1, . . . , n.


From the FPF and the preceding relation, we now conclude that
P(a ≤ X ≤ b) = Σ_{x=a}^{b} pX(x) ≈ Σ_{x=a}^{b} [Φ((x + 1/2 − np)/√(np(1 − p))) − Φ((x − 1/2 − np)/√(np(1 − p)))]
= Φ((b + 1/2 − np)/√(np(1 − p))) − Φ((a − 1/2 − np)/√(np(1 − p))).
c) Here n = 50 and p = 0.4. Hence,
P(X = 21) = P(21 ≤ X ≤ 21) ≈ Φ((21 + 1/2 − 50·0.4)/√(50·0.4·0.6)) − Φ((21 − 1/2 − 50·0.4)/√(50·0.4·0.6)) = 0.1101,
where, for better results, we used statistical software, rather than Table I, to evaluate the two values of Φ. Also,
P(19 ≤ X ≤ 21) ≈ Φ((21 + 1/2 − 50·0.4)/√(50·0.4·0.6)) − Φ((19 − 1/2 − 50·0.4)/√(50·0.4·0.6)) = 0.3350.
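The two continuity-corrected values just obtained are easy to reproduce; a sketch assuming SciPy is available, with the exact binomial probabilities included only for comparison:

```python
# Exercise 8.209(c): integral De Moivre-Laplace approximation for X ~ B(50, 0.4).
import math
from scipy.stats import norm, binom

n, p = 50, 0.4
mu, sd = n * p, math.sqrt(n * p * (1 - p))

def approx(a, b):
    """Continuity-corrected normal approximation to P(a <= X <= b)."""
    return norm.cdf((b + 0.5 - mu) / sd) - norm.cdf((a - 0.5 - mu) / sd)

print(approx(21, 21))                                  # about 0.1101
print(approx(19, 21))                                  # about 0.3350
print(binom.pmf(21, n, p))                             # exact value, for comparison
print(sum(binom.pmf(x, n, p) for x in range(19, 22)))  # exact value, for comparison
```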

In Example 8.15, where we used the local De Moivre–Laplace theorem, the probabilities obtained were 0.110 and 0.336, respectively, or to four decimal places, 0.1105 and 0.3362. d) Suppose that you want to use a normal distribution to approximate the probability that a binomial random variable takes a value among m consecutive nonnegative integers. Using the local De Moivre– Laplace theorem, m values of a normal PDF must be calculated and summed, whereas, using the integral De Moivre–Laplace theorem, only two values of the standard normal CDF must be calculated and subtracted. Clearly, then, when m is large, use of the integral De Moivre–Laplace theorem is preferable to that of the local De Moivre–Laplace theorem. 8.210 Answers will vary, but here are some possibilities. a) If X ∼ U(0, 1), then X has a unique pth quantile equal to p. b) If X ∼ B(1, 1 − p), then every number in [0, 1] is a pth quantile of X. 8.211 a) Suppose that ξp is a pth quantile of X. Then

FX (ξp −) = P (X < ξp ) = 1 − P (X ≥ ξp ) ≤ 1 − (1 − p) = p

and FX (ξp ) = P (X ≤ ξp ) ≥ p. Conversely, suppose that FX (ξp −) ≤ p ≤ FX (ξp ). Then we have P (X ≤ ξp ) = FX (ξp ) ≥ p and P (X ≥ ξp ) = 1 − P (X < ξp ) = 1 − FX (ξp −) ≥ 1 − p.

Thus, ξp is a pth quantile of X.
b) The conditions P(X < ξp) ≤ p and P(X > ξp) ≤ 1 − p are precisely the same as the conditions FX(ξp−) ≤ p and FX(ξp) ≥ p, respectively. Therefore, the equivalence follows from part (a).
8.212 Let X = IE. We have
FX(x) = 0 if x < 0;   FX(x) = 1 − P(E) if 0 ≤ x < 1;   FX(x) = 1 if x ≥ 1,
and
FX(x−) = 0 if x ≤ 0;   FX(x−) = 1 − P(E) if 0 < x ≤ 1;   FX(x−) = 1 if x > 1.

We use the condition for being a pth quantile from Exercise 8.211(a). It follows easily that no real number less than 0 or greater than 1 can be a pth quantile of X. For other values of x, we consider the cases x = 0, x = 1, and 0 < x < 1 separately.


We see that x = 0 is a pth quantile of X if and only if 0 = FX (0−) ≤ p ≤ FX (0) = 1 − P (E), which is true if and only if P (E) ≤ 1 − p. We see that x = 1 is a pth quantile of X if and only if 1 − P (E) = FX (1−) ≤ p ≤ FX (1) = 1, which is true if and only if P (E) ≥ 1 − p. If 0 < x < 1, then x is a pth quantile of X if and only if 1 − P (E) = FX (x−) ≤ p ≤ FX (x) = 1 − P (E), which is true if and only if P (E) = 1 − p. Consequently, if P (E) < 1 − p, then 0 is the unique pth quantile of X; if P (E) = 1 − p, then all numbers between 0 and 1, inclusive, are pth quantiles of X; and if P (E) > 1 − p, then 1 is the unique pth quantile of X. 8.213 a) We know that FX (−∞) = 0 and FX (∞) = 1. Furthermore, because X is a continuous random variable, FX is everywhere continuous. Hence, the intermediate value theorem guarantees the existence of an x ∈ R such that FX (x) = p. b) Because FX is everywhere continuous, we have FX (x−) = FX (x) for all x ∈ R. Referring to Exercise 8.211(a), we conclude that ξp is a pth quantile of X if and only if FX (ξp ) ≤ p ≤ FX (ξp ), that is, if and only if FX (ξp ) = ξp . 8.214 Let the range of X be the interval (a, b), where −∞ ≤ a < b ≤ ∞. Note that we have FX (a) = 0 and FX (b) = 1. Because X is a continuous random variable, its CDF is everywhere continuous. Applying the intermediate value theorem, we conclude that there is a number ξp , with a < ξp < b, such that FX (ξp ) = p and, because FX is strictly increasing on the range of X, we know that ξp is the unique such number. In view of Exercise 8.213(b), we can now conclude that FX−1 (p) is the unique pth quantile of X. 8.215 a) Here the CDF of X is FX (x) = x for 0 ≤ x < 1. Therefore, FX−1 (y) = y for y ∈ (0, 1). Thus, the unique pth percentile of X is given by ξp = FX−1 (p) = p. b) Here the CDF of X is FX (x) = 1 − e−λx for x ≥ 0. Hence, FX−1 (y) = −λ−1 ln(1 − y) for y ∈ (0, 1). Thus, the unique pth percentile of X is given by ξp = FX−1 (p) = −λ−1 ln(1 − p). 8.216 Let Ap = { x : FX (x) ≥ p } and note that, as FX (−∞) = 0 and FX (∞) = 1, Ap is nonempty and bounded below. Set ξp = inf Ap . We claim that ξp is a pth quantile of X. Let {xn }∞ n=1 be a decreasing sequence of elements of Ap such that limn→∞ xn = ξp . Applying the right continuity of FX , we get P (X ≤ ξp ) = FX (ξp ) = lim FX (xn ) ≥ p. n→∞

Next let {xn}∞n=1 be an increasing sequence of real numbers such that limn→∞ xn = ξp. For each n ∈ N, we have xn < ξp and hence xn ∉ Ap, which, in turn, implies that FX(xn) < p. Therefore,
P(X < ξp) = FX(ξp−) = limn→∞ FX(xn) ≤ p.

Consequently, P (X ≥ ξp ) = 1 − P (X < ξp ) ≥ 1 − p.

We have now shown that P (X ≤ ξp ) ≥ p and P (X ≥ ξp ) ≥ 1 − p. Thus, ξp is a pth quantile of X.


8.217 Let X denote loss incurred. Then X ∼ E(1/300). We want to find the 95th percentile of the random variable X|X>100 . In view of Exercise 8.213(b), this means that we need to determine x such that 0.95 = FX|X>100 (x) = P (X ≤ x | X > 100),

or, equivalently, P (X > x | X > 100) = 0.05. Applying the lack-of-memory property of exponential random variables, we get, for x > 100, P (X > x | X > 100) = P (X > x − 100) = e−(x−100)/300 .

Setting this last expression equal to 0.05 and solving for x, we get x = 100 − 300 ln(0.05) = 998.72.
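A two-line confirmation of this percentile in plain Python; the closed form 100 − 300 ln 0.05 is the one just derived:

```python
# Exercise 8.217: 95th percentile of the conditional loss X | X > 100, with X ~ E(1/300).
import math

x = 100 - 300 * math.log(0.05)
print(x)                                    # about 998.72

# Check: by the lack-of-memory property, P(X > x | X > 100) should be 0.05.
print(math.exp(-(x - 100) / 300))           # 0.05
```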


Chapter 9

Jointly Continuous Random Variables 9.1 Joint Cumulative Distribution Functions Basic Exercises 9.1 For u, v ∈ R,

FU,V (u, v) = P (U ≤ u, V ≤ v) = P (a + bX ≤ u, c + dY ≤ v) ! " ! " = P X ≤ (u − a)/b, Y ≤ (v − c)/d = FX,Y (u − a)/b, (v − c)/d .

9.2 If not both u and v are positive, then FU,V (u, v) = 0. For u, v > 0, we apply Proposition 9.1 on page 488 and use the fact that X and Y are continuous random variables to get ! " FU,V (u, v) = P (U ≤ u, V ≤ v) = P X2 ≤ u, Y 2 ≤ v ! √ √ √ √ " = P − u ≤ X ≤ u, − v ≤ Y ≤ v !√ √ " !√ ! √ √ " ! √ √ " √ " = FX,Y u, v − FX,Y u, − v − FX,Y − u, v + FX,Y − u, − v . 9.3 a) For 0 ≤ x < 1, we use Proposition 9.2 and Example 9.1 to get

FX (x) = lim FX,Y (x, y) = x. y→∞

Hence, FX (x) =

#

0, x, 1,

if x < 0; if 0 ≤ x < 1; if x ≥ 1.

b) X ∼ U(0, 1) c) By symmetry, Y has the same probability distribution as X and, hence, the same CDF. From part (b), we see that Y ∼ U(0, 1). 9.4 Let ! denote the upper half of the unit disk. a) Clearly, FX (x) = 0 if x < −1, and FX (x) = 1 if x ≥ 1. For −1 ≤ x < 1, $x √ % 'x 2 |{X ≤ x}| 2 1 & −1 1 − t dt 2 FX (x) = P (X ≤ x) = = = · t 1 − t + arcsin t |!| π/2 π 2 −1 ) 1 ( & ) 1 1( & π x 1 − x 2 + arcsin x + = + x 1 − x 2 + arcsin x . = π 2 2 π


b) Clearly, FY (y) = 0 if y < 0, and FY (y) = 1 if y ≥ 1. For 0 ≤ y < 1, $y √ 2 1 − t 2 dt |{Y ≤ y}| = 0 FY (y) = P (Y ≤ y) = |!| π/2 'y , %& * + 2 2 2 2 = · t 1 − t + arcsin t = y 1 − y + arcsin y . π π 0

c) Clearly, FX,Y (x, y) = 0 if x < −1 or y < 0, and FX,Y (x, y) = 1 if x ≥ 1 and y ≥ 1. For the other possibilities, we consider five separate cases. √ Case 1: −1 ≤ x < 1 and 0 ≤ y < 1 − x 2 . ) & |{X ≤ x, Y ≤ y}| 2 y( FX,Y (x, y) = P (X ≤ x, Y ≤ y) = = x + 1 − t 2 dt |!| π 0 . * + ,/ * , + 2 1 1 = y 1 − y 2 + arcsin y = 2xy + y 1 − y 2 + arcsin y . xy + 2 π π Case 2: −1 ≤ x < 1 and y ≥ 1.

|{X ≤ x, Y ≤ y}| 2 x& = 1 − t 2 dt FX,Y (x, y) = P (X ≤ x, Y ≤ y) = |!| π −1 . / ) ) π 1 1( & 2 1( & x 1 − x 2 + arcsin x + = + x 1 − x 2 + arcsin x . = π 2 2 2 π

Case 3: 0 ≤ x < 1 and

√ 1 − x 2 ≤ y < 1.

|{X ≤ x, Y ≤ y}| FX,Y (x, y) = P (X ≤ x, Y ≤ y) = |!| ⎛ √ ⎞ - 1−x 2 ( - y ) & & 2⎝ = x + 1 − t 2 dt + √ 2 1 − t 2 dt ⎠ 2 π 0 1−x * & , + & 1 = x 1 − x 2 − arcsin 1 − x 2 + 2y 1 − y 2 + 2 arcsin y . π

Case 4: −1 ≤ x < 0 and



1 − x 2 ≤ y < 1.

|{X ≤ x, Y ≤ y}| FX,Y (x, y) = P (X ≤ x, Y ≤ y) = |!| ⎛ √ ⎞ - 1−x 2 ( ) ) & & 1( & 2⎝ x + 1 − t 2 dt ⎠ = x 1 − x 2 + arcsin 1 − x 2 . = π π 0

Case 5: x ≥ 1 and 0 ≤ y < 1.

|{X ≤ x, Y ≤ y}| FX,Y (x, y) = P (X ≤ x, Y ≤ y) = |!| *- y & * + , , 2 2 2 2 = y 1 − y + arcsin y . 2 1 − t dt = π π 0


d) Clearly, FX (x) = 0 if x < −1, and FX (x) = 1 if x ≥ 1. For −1 ≤ x < 1, we apply Proposition 9.2 and refer to Case 2 of part (c) to get FX (x) = lim FX,Y (x, y) = y→∞

) 1 1( & + x 1 − x 2 + arcsin x , 2 π

which agrees with the answer obtained in part (a). e) Clearly, FY (y) = 0 if y < 0, and FY (y) = 1 if y ≥ 1. For 0 ≤ y < 1, we apply Proposition 9.2 and refer to Case 5 of part (c) to get , * + 2 2 FY (y) = lim FX,Y (x, y) = y 1 − y + arcsin y , x→∞ π which agrees with the answer obtained in part (b). 9.5 Here the sample space is ! = {(0, 0), (1, 0), (0, 1), (1, 1)}. a) Clearly, FX,Y (x, y) = 0 if x < 0 or y < 0, and FX,Y (x, y) = 1 if x ≥ 1 and y ≥ 1. For the other possibilities, we consider three separate cases. Case 1: 0 ≤ x < 1 and 0 ≤ y < 1. FX,Y (x, y) = P (X ≤ x, Y ≤ y) =

N({X ≤ x, Y ≤ y}) N({(0, 0)}) 1 = = . N(!) 4 4

Case 2: 0 ≤ x < 1 and y ≥ 1. FX,Y (x, y) = P (X ≤ x, Y ≤ y) =

N({X ≤ x, Y ≤ y}) N({(0, 0), (0, 1)}) 2 1 = = = . N(!) 4 4 2

Case 3: x ≥ 1 and 0 ≤ y < 1. FX,Y (x, y) = P (X ≤ x, Y ≤ y) = Thus,

N({X ≤ x, Y ≤ y}) N({(0, 0), (1, 0)}) 2 1 = = = . N(!) 4 4 2

⎧ 0, if x < 0 or y < 0; ⎪ ⎨ 1/4, if 0 ≤ x < 1 and 0 ≤ y < 1; FX,Y (x, y) = ⎪ ⎩ 1/2, if 0 ≤ x < 1 and y ≥ 1, or if x ≥ 1 and 0 ≤ y < 1; 1, if x ≥ 1 and y ≥ 1.

b) For 0 ≤ x < 1, we apply Proposition 9.2 and refer to Case 2 of part (a) to get FX (x) = lim FX,Y (x, y) = y→∞

Thus, FX (x) =

#

0, 1/2, 1,

1 . 2

if x < 0; if 0 ≤ x < 1; if x ≥ 1.

c) The CDF obtained in part (b) is that of a Bernoulli random variable with parameter 1/2. Hence, X has that probability distribution.


d) For 0 ≤ y < 1, we apply Proposition 9.2 and refer to Case 3 of part (a) to get FY (y) = lim FX,Y (x, y) = x→∞

Thus, FY (y) =

# 0,

1/2, 1,

1 . 2

if y < 0; if 0 ≤ y < 1; if y ≥ 1.

This CDF is that of a Bernoulli random variable with parameter 1/2. Hence, Y has that probability distribution. e) From parts (a), (b), and (d), we see that FX,Y (x, y) = FX (x)FY (y) for all x, y ∈ R.

9.6 By assumption, the marginal PMFs of X and Y are as follows: # # 1 − p, 1 − p, if x = 0; pX (x) = p, and pY (y) = p, if x = 1; 0, 0, otherwise.

if y = 0; if y = 1; otherwise.

a) Using the marginal PMFs of X and Y , the assumed independence, and Proposition 6.10 on page 292, we get the joint PMF of X and Y : ⎧ ⎨ (1 − p)2 , if x = 0 and y = 0; pX,Y (x, y) = pX (x)pY (y) = p(1 − p), if x = 0 and y = 1, or if x = 1 and y = 0; ⎩ 2 if x = 1 and y = 1. p , b) From the FPF for two discrete random variables, Proposition 6.3 on page 265, 88 pX,Y (s, t). FX,Y (x, y) = P (X ≤ x, Y ≤ y) = s≤x, t≤y

Clearly, FX,Y (x, y) = 0 if x < 0 or y < 0, and FX,Y (x, y) = 1 if x ≥ 1 and y ≥ 1. For the other possibilities, we consider three separate cases. Case 1: 0 ≤ x < 1 and 0 ≤ y < 1. FX,Y (x, y) = Case 2: 0 ≤ x < 1 and y ≥ 1. FX,Y (x, y) = Case 3: x ≥ 1 and 0 ≤ y < 1. FX,Y (x, y) = Thus,

88 s≤x, t≤y

pX,Y (s, t) = (1 − p)2 .

88

pX,Y (s, t) = (1 − p)2 + p(1 − p) = 1 − p.

88

pX,Y (s, t) = (1 − p)2 + p(1 − p) = 1 − p.

s≤x, t≤y

s≤x, t≤y

⎧ 0, ⎪ ⎨ (1 − p)2 , FX,Y (x, y) = ⎪ ⎩ 1 − p, 1,

if x < 0 or y < 0; if 0 ≤ x < 1 and 0 ≤ y < 1; if 0 ≤ x < 1 and y ≥ 1, or if x ≥ 1 and 0 ≤ y < 1; if x ≥ 1 and y ≥ 1.


c) For 0 ≤ x < 1, we apply Proposition 9.2 and refer to Case 2 of part (b) to get FX (x) = lim FX,Y (x, y) = 1 − p. y→∞

Thus,

#

0, if x < 0; 1 − p, if 0 ≤ x < 1; 1, if x ≥ 1. Of course, Y has the same CDF as X, because these two random variables have the same probability distribution. d) Applying the FPF for one discrete random variable, Proposition 5.2 on page 190, to the PMF of X, obtained at the beginning of the solution to this exercise, yields # 0, if x < 0; 8 FX (x) = P (X ≤ x) = pX (s) = 1 − p, if 0 ≤ x < 1; s≤x 1, if x ≥ 1. FX (x) =

Of course, Y has the same CDF as X, because these two random variables have the same probability distribution. e) Using the assumed independence of X and Y , and the result of part (d) yields FX,Y (x, y) = P (X ≤ x, Y ≤ y) = P (X ≤ x)P (Y ≤ y) ⎧ 0, if x < 0 or y < 0; ⎪ ⎨ (1 − p)2 , if 0 ≤ x < 1 and 0 ≤ y < 1; = FX (x)FY (y) = if 0 ≤ x < 1 and y ≥ 1, or if x ≥ 1 and 0 ≤ y < 1; ⎪ ⎩ 1 − p, 1, if x ≥ 1 and y ≥ 1.

f) The answers in parts (b) and (e) are identical.

9.7 a) From Exercise 8.70(a), the CDFs of X and Y are 9 0, if x < 0; and FX (x) = −λx , if x ≥ 0. 1−e

FY (y) =

9

0, 1 − e−µy ,

b) Referring to Definition 6.4 on page 291 and then to part (a), we get

FX,Y (x, y) = P (X ≤ x, Y ≤ y) = P (X ≤ x)P (Y ≤ y) 9 0, = FX (x)FY (y) = 1 − e−λx − e−µy + e−λx−µy ,

if y < 0; if y ≥ 0.

if x < 0 or y < 0; if x ≥ 0 and y ≥ 0.

9.8 Suppose that F is a joint CDF, say, of the random variables X and Y . Then, by Proposition 9.1 on page 488, P (0 < X ≤ 1, 0 < Y ≤ 1) = FX,Y (1, 1) − FX,Y (1, 0) − FX,Y (0, 1) + FX,Y (0, 0) = (1 − e−2 ) − (1 − e−1 ) − (1 − e−1 ) + (1 − e−0 )

= 2e−1 − e−2 − 1 < 0,

which is impossible. Hence, F is not a joint CDF. 9.9 a) We have

! " FV (v) = P (V ≤ v) = P max{X, Y } ≤ v = P (X ≤ v, Y ≤ v) = FX,Y (v, v).


b) Applying the general addition rule, we get ! " FU (u) = P (U ≤ u) = P min{X, Y } ≤ u = P ({X ≤ u} ∪ {Y ≤ u}) = P (X ≤ u) + P (Y ≤ u) − P (X ≤ u, Y ≤ u) = FX (u) + FY (u) − FX,Y (u, u). c) From part (b) and Proposition 9.2 on page 490, FU (u) = FX (u) + FY (u) − FX,Y (u, u) = FX,Y (u, ∞) + FX,Y (∞, u) − FX,Y (u, u). 9.10 a) We have

! " FV (v) = P (V ≤ v) = P max{X1 , . . . , Xm } ≤ v = P (X1 ≤ v, . . . , Xm ≤ v) = FX1 ,...,Xm (v, . . . , v).

b) Applying the inclusion-exclusion principle yields *: , m ! " FU (u) = P (U ≤ u) = P min{X1 , . . . , Xm } ≤ u = P {Xn ≤ u} n=1

=

m 8 k=1

P (Xk ≤ u) −

+ (−1)n+1

8

k1 0; ∂2 fX,Y (x, y) = FX,Y (x, y) = λµe ∂x ∂y 0, otherwise. b) As X ∼ E(λ) and Y ∼ E(µ), 9 9 −λx , if x > 0; µe−µy , if y > 0; fX (x) = λe and fY (y) = 0, otherwise. 0, otherwise.

Because e−λx e−µy = e−λx−µy , we see that fX,Y (x, y) = fX (x)fY (y) for all x, y ∈ R. In words, the joint PDF of X and Y equals the product of their marginal PDFs.
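The mixed-partial computation in part (a) and the factorization in part (b) can be verified symbolically. A sketch assuming SymPy is available, with the joint CDF taken from Exercise 9.7:

```python
# Exercise 9.24: for F_{X,Y}(x, y) = (1 - e^{-lam*x})(1 - e^{-mu*y}) on x, y > 0,
# the mixed partial equals lam*mu*e^{-lam*x - mu*y} = f_X(x) * f_Y(y).
import sympy as sp

x, y, lam, mu = sp.symbols('x y lam mu', positive=True)
F = (1 - sp.exp(-lam * x)) * (1 - sp.exp(-mu * y))

joint_pdf = sp.diff(F, x, y)                       # mixed second-order partial derivative
product = lam * sp.exp(-lam * x) * mu * sp.exp(-mu * y)

print(sp.simplify(joint_pdf - product))            # prints 0, so the two expressions agree
```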


c) Answers will vary, but you should not be surprised at the multiplicative relationship between the joint PDF and the marginal PDFs, as the multiplication rule has previously applied in many contexts of independence.
9.25 Recall that the CDF and PDF of an E(λ) distribution are given by
F(x) = 0 if x < 0 and F(x) = 1 − e^(−λx) if x ≥ 0,   and   f(x) = λe^(−λx) if x > 0 and f(x) = 0 otherwise.
Thus, from the general result obtained in Example 9.4, which begins on page 497, we see that
fX,Y(x, y) = n(n − 1) f(x) f(y) (F(y) − F(x))^(n−2) = n(n − 1) λ² e^(−λ(x+y)) (e^(−λx) − e^(−λy))^(n−2)

if 0 < x < y, and fX,Y (x, y) = 0 otherwise.

9.26 a) We have, in view of Definition 9.2 on page 495,
P(1/4 < X < 3/4, 1/2 < Y < 1) = P(1/4 ≤ X ≤ 3/4, 1/2 ≤ Y ≤ 1) = ∫_{1/2}^{1} ∫_{1/4}^{3/4} fX,Y(x, y) dx dy = ∫_{1/4}^{3/4} [ ∫_{1/2}^{1} (x + y) dy ] dx = 5/16.

b) Clearly, FX,Y(x, y) = 0 if x < 0 or y < 0, and FX,Y(x, y) = 1 if x ≥ 1 and y ≥ 1. For 0 ≤ x < 1 and 0 ≤ y < 1,
FX,Y(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} fX,Y(s, t) ds dt = ∫_{0}^{x} [ ∫_{0}^{y} (s + t) dt ] ds = ∫_{0}^{x} (sy + y²/2) ds = (1/2) xy(x + y).
For 0 ≤ x < 1 and y ≥ 1,
FX,Y(x, y) = ∫_{0}^{x} [ ∫_{0}^{1} (s + t) dt ] ds = ∫_{0}^{x} (s + 1/2) ds = (1/2) x(x + 1).
By symmetry, for x ≥ 1 and 0 ≤ y < 1, we have FX,Y(x, y) = y(y + 1)/2. Thus,
FX,Y(x, y) = 0 if x < 0 or y < 0;   xy(x + y)/2 if 0 ≤ x < 1 and 0 ≤ y < 1;   x(x + 1)/2 if 0 ≤ x < 1 and y ≥ 1;   y(y + 1)/2 if x ≥ 1 and 0 ≤ y < 1;   and 1 if x ≥ 1 and y ≥ 1.


c) We have, in view of Proposition 9.1 on page 488, P (1/4 < X < 3/4, 1/2 < Y < 1) = P (1/4 < X ≤ 3/4, 1/2 < Y ≤ 1) = FX,Y (3/4, 1) − FX,Y (3/4, 1/2)

− FX,Y (1/4, 1) + FX,Y (1/4, 1/2)

= (3/4)(3/4 + 1)/2 − (3/4)(1/2)(3/4 + 1/2)/2

− (1/4)(1/4 + 1)/2 + (1/4)(1/2)(1/4 + 1/2)/2,

which equals 5/16. This answer agrees with the one obtained in part (a).
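A numerical cross-check of the value 5/16 = 0.3125; a plain-Python midpoint-rule sketch using the joint PDF fX,Y(x, y) = x + y from the exercise:

```python
# Exercise 9.26: P(1/4 < X < 3/4, 1/2 < Y < 1) for the joint PDF f(x, y) = x + y
# on the unit square, approximated by a midpoint Riemann sum.
N = 400                                  # grid points per axis
hx = (0.75 - 0.25) / N
hy = (1.00 - 0.50) / N

total = 0.0
for i in range(N):
    x = 0.25 + (i + 0.5) * hx
    for j in range(N):
        y = 0.50 + (j + 0.5) * hy
        total += (x + y) * hx * hy

print(total)                             # about 0.3125 = 5/16
```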

Theory Exercises 9.27 Applying the second fundamental theorem of calculus twice and referring to Proposition 9.1 on page 488, we get, for all real numbers, a, b, c, d, with a < b and c < d, -

b a

-

d c

∂2 FX,Y (x, y) dx dy = ∂x ∂y

-

d

c

*-

a

, ∂ ∂ = FX,Y (b, y) − FX,Y (a, y) dy ∂y ∂y c , , - d* - d* ∂ ∂ = FX,Y (b, y) dy − FX,Y (a, y) dy ∂y ∂y c c ! " ! " = FX,Y (b, d) − FX,Y (b, c) − FX,Y (a, d) − FX,Y (a, c) -

d

*

, ∂2 FX,Y (x, y) dx dy ∂x ∂y

b

= FX,Y (b, d) − FX,Y (b, c) − FX,Y (a, d) + FX,Y (a, c)

= P (a < X ≤ b, c < Y ≤ d) = P (a ≤ X ≤ b, c ≤ Y ≤ d). A Therefore, from Definition 9.2 on page 495, ∂ 2 FX,Y ∂x ∂y is a joint PDF of X and Y . We can use a A similar argument to show that ∂ 2 FX,Y ∂y ∂x is a joint PDF of X and Y , or, alternatively, we can use the A A fact that ∂ 2 FX,Y ∂y ∂x = ∂ 2 FX,Y ∂x ∂y.

9.28 Suppose that there is a nonnegative function f defined on R2 such that Equation (9.21) on page 499 holds. Then FX,Y is everywhere continuous. For a < b and c < d, Proposition 9.1 on page 488 gives P (a ≤ X ≤ b, c ≤ Y ≤ d) = FX,Y (b, d) − FX,Y (b, c) − FX,Y (a, d) + FX,Y (a, c) - b - d - b - c = f (s, t) ds dt − f (s, t) ds dt −∞ −∞

− =

-

-

b

−∞



−∞ −∞

d

−∞ −∞

*-

-

-

a

f (s, t) ds dt +

d −∞

a −∞

f (s, t) dt −

*-

-

a

-

c

f (s, t) ds dt

−∞ −∞

c

,

f (s, t) dt ds −∞

d −∞

-

f (s, t) dt −

-

c

,

f (s, t) dt ds −∞


Therefore, P (a ≤ X ≤ b, c ≤ Y ≤ d) = =

-

b

−∞

-

a

b

*-

*-

,

d

f (s, t) dt ds −

c

,

d c

f (s, t) dt ds =

-

-

−∞

*-

b

d

a

a

-

d

,

f (s, t) dt ds c

f (s, t) ds dt. c

Hence, by Definition 9.2 on page 495, f is a joint PDF of X and Y . Conversely, suppose that X and Y have a joint PDF, fX,Y . Then, by Definition 9.2, fX,Y is a nonnegative function defined on R2 such that, for all a < b and c < d, - b- d fX,Y (s, t) ds dt. P (a ≤ X ≤ b, c ≤ Y ≤ d) = a

c

Let x, =y ∈ R and set An = {−n ≤ X ≤ x, −n ≤ Y ≤ y} for each n ∈ N . Then we have A1 ⊂ A2 ⊂ · · · and ∞ n=1 An = {X ≤ x, Y ≤ y}. Applying the continuity property of a probability measure and the previous display yields *: , ∞ An = lim P (An ) FX,Y (x, y) = P (X ≤ x, Y ≤ y) = P n→∞

n=1

= lim P (−n ≤ X ≤ x, −n ≤ Y ≤ y) = lim n→∞

=

-

x

-

x

-

y

n→∞ −n −n

y

−∞ −∞

-

fX,Y (s, t) ds dt

fX,Y (s, t) ds dt.

Thus fX,Y is a nonnegative function defined on R2 that satisfies Equation (9.21). Equation (9.22) is a consequence of the first fundamental theorem of calculus. It shows again that a joint PDF of two continuous random variables—when it exists—is essentially a mixed second-order partial derivative of the joint CDF of the two random variables. 9.29 a) A nonnegative function fX1 ,...,Xm is said to be a joint probability density function of X1 , . . . , Xm if, for all real numbers ai and bi , with ai < bi for 1 ≤ i ≤ m, - b1 - bm ··· fX1 ,...,Xm (x1 , . . . , xm ) dx1 · · · dxm . P (a1 ≤ X1 ≤ b1 , . . . , am ≤ Xm ≤ bm ) = a1

am

b) Let X1 , . . . , Xm be continuous random variables defined on the same sample space. If the partial derivatives of FX1 ,...,Xm , up to and including those of the mth-order, exist and are continuous except possibly at (portions of) a finite number of (m − 1)-dimensional planes parallel to the coordinate axes, then X1 , . . . , Xm have a joint PDF, which we can take to be fX1 ,...,Xm (x1 , . . . , xm ) =

∂xk1

∂m FX ,...,Xm (x1 , . . . , xm ) · · · ∂xkm 1

at continuity points of the partials, and fX1 ,...,Xm (x1 , . . . , xm ) = 0 otherwise, where k1 , . . . , km is any permutation of 1, . . . , m. c) Random variables X1 , . . . , Xm defined on the same sample space have a joint PDF if and only if there is a nonnegative function f defined on Rm such that, for all x1 , . . . , xm ∈ R, - x1 - xm ··· f (u1 , . . . , um ) du1 · · · dum . FX1 ,...,Xm (x1 , . . . , xm ) = −∞

−∞


In this case, f is a joint PDF of X1 , . . . , Xm , and f (x1 , . . . , xm ) =

∂m FX ,...,Xm (x1 , . . . , xm ) ∂x1 · · · ∂xm 1

at all points (x1 , . . . , xm ) where this mth-order mixed partial exists.

Advanced Exercises 9.30 a) In the solution to Exercise 9.14, we showed that both X and Y are U(0, 1) random variables and, hence, they are continuous random variables. b) In the solution to Exercise 9.14(c), we found that ⎧ 0, if x < 0 or y < 0; ⎪ ⎨ x, if 0 ≤ x < 1 and y ≥ x; FX,Y (x, y) = ⎪ ⎩ y, if 0 ≤ y < 1 and x ≥ y; 1, if x ≥ 1 and y ≥ 1.

It follows that the second-order mixed partials equal 0, wherever they exist. Hence, X and Y couldn’t possibly have a joint PDF. Another way to show this fact relies on Proposition 9.6 on page 502 of Section 9.3, the FPF for two random variables with a joint PDF. Applying this result and recalling that P (Y = X) = 1, we see that, if X and Y had a joint PDF, then , -- ∞ *- x 1 = P (Y = X) = fX,Y (x, y) dx dy = fX,Y (x, y) dy dx = 0, −∞

y=x

x

which is impossible. 9.31 Suppose that X and Y are independent. Then, by Exercise 9.15(a), for x, y ∈ R, *- x , *- y , - x - y FX,Y (x, y) = FX (x)FY (y) = fX (s) ds fY (t) dt = fX (s)fY (t) ds dt. −∞

−∞

−∞ −∞

Consequently, by Proposition 9.4 on page 499, the function f defined on R2 by f (x, y) = fX (x)fY (y) is a joint PDF of X and Y . Conversely, suppose that the function f defined on R2 by f (x, y) = fX (x)fY (y) is a joint PDF of X and Y . Then, for x, y ∈ R, *- x , *- y , - x - y FX,Y (x, y) = fX (s)fY (t) ds dt = fX (s) ds fY (t) dt = FX (x)FY (y). −∞ −∞

−∞

Hence, by Exercise 9.15(b), X and Y are independent.

−∞

9.32 Suppose that X1 , . . . , Xm are independent. Then, by Exercise 9.16, for x1 , . . . , xm ∈ R, *- x1 , *- xm , FX1 ,...,Xm (x1 , . . . , xm ) = FX1 (x1 ) · · · FXm (xm ) = fX1 (u1 ) du1 · · · fXm (um ) dum −∞ −∞ - x1 - xm = ··· fX1 (u1 ) · · · fXm (um ) du1 · · · dum . −∞

−∞

Consequently, by the m-variate version of Proposition 9.4 on page 499, the function f defined on Rm by f (x1 , . . . , xm ) = fX1 (x1 ) · · · fXm (xm ) is a joint PDF of X1 , . . . , Xm .


Conversely, suppose that the function f defined on Rm by f (x1 , . . . , xm ) = fX1 (x1 ) · · · fXm (xm ) is a joint PDF of X1 , . . . , Xm . Then, for x1 , . . . , xm ∈ R, FX1 ,...,Xm (x1 , . . . , xm ) = =

-

x1

···

−∞ *- x1

−∞

-

xm

fX1 (u1 ) · · · fXm (um ) du1 · · · dum . , *- xm , fX1 (u1 ) du1 · · · fXm (um ) dum = FX1 (x1 ) · · · FXm (xm ). −∞

−∞

Hence, by Exercise 9.16, X1 , . . . , Xm are independent. 9.33 a) Event {X(k) ≤ x} occurs if and only if the kth smallest of X1 , . . . , Xn is at most x, that is, if and only if at least k of the Xj s fall in the interval (−∞, x]. Let Y denote the number of Xj s that fall in the interval ! (−∞, " x]. Because X1 , . . . , Xn are independent and have common CDF, F , we see that Y ∼ B n, F (x) . Therefore, FX(k) (x) = P (X(k) ≤ x) = P (Y ≥ k) =

n * ,! 8 n

j =k

j

"j ! "n−j F (x) 1 − F (x) .

b) Referring to part (a), we differentiate FX(k) to obtain n * ,( 8 ! "j −1 ! ! "j ! "n−j "n−j −1 ) n fX(k) (x) = jf (x) F (x) − (n − j )f (x) F (x) 1 − F (x) 1 − F (x) j j =k

=

n 8

j =k

− =

n 8

j =k

− =

! "j −1 ! "n−j n! f (x) F (x) 1 − F (x) (j − 1)! (n − j )! n−1 8

j =k

! "j ! "n−j −1 n! f (x) F (x) 1 − F (x) j ! (n − j − 1)!

! "j −1 ! "n−j n! f (x) F (x) 1 − F (x) (j − 1)! (n − j )! n 8

! "j −1 ! "n−j n! f (x) F (x) 1 − F (x) (j − 1)! (n − j )! j =k+1

! "k−1 ! "n−k n! f (x) F (x) 1 − F (x) . (k − 1)! (n − k)!

c) We argue as in Example 9.4(a), except we partition the real line into three intervals using the numbers x and x + %x. We classify each of the n observations, X1 , . . . , Xn , by the interval in which it falls. Because the n observations are independent, we can apply the multinomial distribution with parameters n and p1 , p2 , p3 , where p1 = F (x),

p2 = F (x + %x) − F (x) ≈ f (x)%x,

p3 = 1 − F (x + %x) ≈ 1 − F (x).

For event {x ≤ X(k) ≤ x + %x} to occur, the number of observations that fall in the three intervals must be k − 1, 1, and n − k, respectively, where we have ignored higher order differentials. Referring now to


the PMF of a multinomial distribution, Proposition 6.7 on page 277, we conclude that *

, n ≤ x + %x) ≈ p k−1 p21 p3n−k k − 1, 1, n − k 1

P (x ≤ X(k)

! "k−1 ! "1 ! "n−k n! F (x) f (x)%x 1 − F (x) (k − 1)! 1! (n − k)! ! "k−1 ! "n−k n! = f (x) F (x) 1 − F (x) %x. (k − 1)! (n − k)! ≈

Therefore, heuristically, fX(k) (x) =

! "k−1 ! "n−k n! , f (x) F (x) 1 − F (x) (k − 1)! (n − k)!

which agrees with the result obtained formally in part (b). d) For a U(0, 1) distribution, F (x) =

#

0, if x < 0; x, if 0 ≤ x < 1; 1, if x ≥ 1.

and

f (x) =

9

1, 0,

if 0 < x < 1; otherwise.

Therefore, from part (b), fX(k) (x) =

n! 1 x k−1 (1 − x)n−k = x k−1 (1 − x)n−k (k − 1)! (n − k)! B(k, n − k + 1)

if 0 < x < 1, and fX(k) (x) = 0 otherwise. We see that X(k) has the beta distribution with parameters k and n − k + 1. e) For an E(λ) distribution, 9 9 −λx , if x > 0; 0, if x < 0; F (x) = and f (x) = λe −λx 1−e , if x ≥ 0. 0, otherwise. Therefore, from part (b), ! "k−1 ! −λx "n−k n! λe−λx 1 − e−λx e (k − 1)! (n − k)! ! "k−1 n! = λe−λ(n−k+1)x 1 − e−λx (k − 1)! (n − k)!

fX(k) (x) =

if x > 0, and fX(k) (x) = 0 otherwise.
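Part (d), that the kth order statistic of a U(0, 1) sample follows the beta distribution with parameters k and n − k + 1, can be checked by simulation. A sketch assuming NumPy; the values n = 5 and k = 2 are illustrative choices only:

```python
# Exercise 9.33(d): the kth order statistic of n i.i.d. U(0, 1) random variables
# should have the Beta(k, n - k + 1) distribution.
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2                                    # illustrative sample size and rank
reps = 200_000

samples = rng.random((reps, n))
kth = np.sort(samples, axis=1)[:, k - 1]       # kth smallest value in each replication

print(kth.mean())    # close to k/(n + 1) = 1/3, the Beta(2, 4) mean
print(kth.var())     # close to k(n - k + 1)/((n + 1)^2 (n + 2)) = 8/252
```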

9.34 a) By symmetry, all n! permutations of X1 , . . . , Xn are equally likely to be the order statistics. Hence, by Exercise 9.32, fX(1) ,...,X(n) (x1 , . . . , xn ) = n!fX1 ,...,Xn (x1 , . . . , xn ) = n!f (x1 ) · · · f (xn ) if x1 < · · · < xn , and fX(1) ,...,X(n) (x1 , . . . , xn ) = 0 otherwise.


b) Let x1 < · · · < xn . We argue as in Example 9.4(a), except we partition the real line into 2n + 1 intervals by using the numbers xk and xk + %xk , for k = 1, 2, . . . , n. We classify each of the n observations, X1 , . . . , Xn , by the interval in which it falls. Because the n observations are independent, we can apply the multinomial distribution with parameters n and p1 , p2 , . . . , p2n , p2n+1 , where p1 = F (x1 ), p2 = f (x1 )%x1 , ..., p2n = f (xn )%xn , p2n+1 = 1 − F (xn ). ;n For event k=1 {xk ≤ X(k) ≤ xk + %xk } to occur, the number of observations that fall in the 2n + 1 intervals must be 0, 1, . . . , 1, 0, respectively, where we have ignored higher order differentials. Referring now to the PMF of a multinomial distribution, Proposition 6.7 on page 277, we conclude that P

*< n

{xk ≤ X(k)

k=1

,

*

, n 1 0 ≤ xk + %xk } ≈ p 0 p1 · · · p2n p2n+1 0, 1, . . . , 1, 0 1 2

! "0 ! "1 ! "1 ! "0 = n! F (x1 ) f (x1 )%x1 · · · f (xn )%xn 1 − F (xn ) = n!f (x1 )%x1 · · · f (xn )%xn

= n!f (x1 ) · · · f (xn )%x1 · · · %xn . Therefore, heuristically, fX(1) ,...,X(n) (x1 , . . . , xn ) = n!f (x1 ) · · · f (xn )

if x1 < · · · < xn , and fX(1) ,...,X(n) (x1 , . . . , xn ) = 0 otherwise, which agrees with the result obtained in part (a). c) For a U(0, 1) distribution, 9 1, if 0 < x < 1; f (x) = 0, otherwise. Therefore, from part (b),

fX(1) ,...,X(n) (x1 , . . . , xn ) = n!f (x1 ) · · · f (xn ) = n! if 0 < x1 < · · · < xn < 1, and fX(1) ,...,X(n) (x1 , . . . , xn ) = 0 otherwise. For an E(λ) distribution, 9 −λx , if x > 0; f (x) = λe 0, otherwise.

Therefore, from part (b),

! " ! " fX(1) ,...,X(n) (x1 , . . . , xn ) = n!f (x1 ) · · · f (xn ) = n! λe−λx1 · · · λe−λxn = n!λn e−λ(x1 +···+xn )

if 0 < x1 < · · · < xn , and fX(1) ,...,X(n) (x1 , . . . , xn ) = 0 otherwise.

9.35 a) Let x < y. We argue as in Example 9.4(a), where we partition the real line into five intervals using the numbers x, x + %x, y, and y + %y. We classify each of the n observations, X1 , . . . , Xn , by the interval in which it falls. Because the n observations are independent, we can apply the multinomial distribution with parameters n and p1 , . . . , p5 , where, approximately, p1 = F (x),

p2 = f (x)%x,

p3 = F (y) − F (x),

p4 = f (y)%y,

p5 = 1 − F (y).

For event {x ≤ X(j ) ≤ x + %x, y ≤ X(k) ≤ y + %y} to occur, the number of observations that fall in the five intervals must be j − 1, 1, k − j − 1, 1, and n − k, respectively, where we have ignored higher


order differentials. Referring now to the PMF of a multinomial distribution, Proposition 6.7 on page 277, we conclude that P (x ≤ X(j ) ≤ x + %x, y ≤ X(k) ≤ y + %y) * , n j −1 k−j −1 1 n−k ≈ p1 p21 p3 p4 p5 j − 1, 1, k − j − 1, 1, n − k =

! "j −1 ! "k−j −1 ! "n−k n! f (x)f (y) F (x) F (y) − F (x) 1 − F (y) %x%y. (j − 1)! (k − j − 1)! (n − k)!

Therefore, heuristically,

fX(j ) ,X(k) (x, y) =

n! f (x)f (y) (j − 1)! (k − j − 1)! (n − k)! ! "j −1 ! "k−j −1 ! "n−k × F (x) F (y) − F (x) 1 − F (y)

if x < y, and fX(j ) ,X(k) (x, y) = 0 otherwise. b) Let x < y < z. We argue as in Example 9.4(a), except we partition the real line into seven intervals using the numbers x, x + %x, y, y + %y, z, and z + %z. We classify each of the n observations, X1 , . . . , Xn , by the interval in which it falls. Because the n observations are independent, we can apply the multinomial distribution with parameters n and p1 , . . . , p7 , where, approximately, p1 = F (x),

p2 = f (x)%x,

p5 = F (z) − F (y),

p3 = F (y) − F (x), p6 = f (z)%z,

p4 = f (y)%y,

p7 = 1 − F (z).

For event {x ≤ X(i) ≤ x + %x, y ≤ X(j ) ≤ y + %y, z ≤ X(k) ≤ z + %z} to occur, the number of observations that fall in the seven intervals must be i − 1, 1, j − i − 1, 1, k − j − 1, 1, and n − k, respectively, where we have ignored higher order differentials. Referring now to the PMF of a multinomial distribution, Proposition 6.7 on page 277, we conclude that P (x ≤ X(i) ≤ x + %x, y ≤ X(j ) ≤ y + %y, z ≤ X(k) ≤ z + %z) * , n j −i−1 1 k−j −1 1 n−k ≈ p1i−1 p21 p3 p4 p5 p6 p7 i − 1, 1, j − i − 1, 1, k − j − 1, 1, n − k =

n! f (x)f (y)f (z) (i − 1)! (j − i − 1)! (k − j − 1)! (n − k)! ! "i−1 ! "j −i−1 ! "k−j −1 ! "n−k × F (x) F (y) − F (x) F (z) − F (y) 1 − F (z) %x%y%z.

Therefore, heuristically,

fX(i) ,X(j ) ,X(k) (x, y, z) =

n! f (x)f (y)f (z) (i − 1)! (j − i − 1)! (k − j − 1)! (n − k)! ! "i−1 ! "j −i−1 ! "k−j −1 ! "n−k × F (x) F (y) − F (x) F (z) − F (y) 1 − F (z)

if x < y < z, and fX(i) ,X(j ) ,X(k) (x, y, z) = 0 otherwise.


c) For a U(0, 1) distribution, F (x) =

#

0, x, 1,

if x < 0; if 0 ≤ x < 1; if x ≥ 1.

and

f (x) =

9

1, 0,

if 0 < x < 1; otherwise.

Therefore, from part (a), fX(j ) ,X(k) (x, y) =

n! x j −1 (y − x)k−j −1 (1 − y)n−k (j − 1)! (k − j − 1)! (n − k)!

if 0 < x < y < 1, and fX(j ) ,X(k) (x, y) = 0 otherwise. And from part (b), fX(i) ,X(j ) ,X(k) (x, y, z) =

n! (i − 1)! (j − i − 1)! (k − j − 1)! (n − k)! × x i−1 (y − x)j −i−1 (z − y)k−j −1 (1 − z)n−k

if 0 < x < y < z < 1, and fX(i) ,X(j ) ,X(k) (x, y, z) = 0 otherwise. For an E(λ) distribution, 9 9 −λx , if x > 0; 0, if x < 0; and f (x) = λe F (x) = −λx , if x ≥ 0. 1−e 0, otherwise. Therefore, from part (a), fX(j ) ,X(k) (x, y) =

=

! −λx "! −λy " n! λe λe (j − 1)! (k − j − 1)! (n − k)! ! "j −1 ! −λx "k−j −1 ! −λy "n−k × 1 − e−λx e − e−λy e n! λ2 e−λ(x+(n−k+1)y) (j − 1)! (k − j − 1)! (n − k)! ! "j −1 ! −λx "k−j −1 × 1 − e−λx e − e−λy

if 0 < x < y, and fX(j ) ,X(k) (x, y) = 0 otherwise. And from part (b), fX(i) ,X(j ) ,X(k) (x, y, z) =

=

! −λx "! −λy "! −λz " n! λe λe λe (i − 1)! (j − i − 1)! (k − j − 1)! (n − k)! ! "i−1 ! −λx "j −i−1 ! −λy "k−j −1 ! −λz "n−k × 1 − e−λx − e−λy − e−λz e e e n! λ3 e−λ(x+y+(n−k+1)z) (i − 1)! (j − i − 1)! (k − j − 1)! (n − k)! ! "i−1 ! −λx "j −i−1 ! −λy "k−j −1 × 1 − e−λx − e−λy − e−λz e e

if 0 < x < y < z, and fX(i) ,X(j ) ,X(k) (x, y, z) = 0 otherwise.


9.3 Properties of Joint Probability Density Functions Basic Exercises 9.36 By the FPF, we have, for each x ∈ R, ! " P (X = x) = P (X, Y ) ∈ {x} × R = =

-



−∞

*-

--

(s,t)∈{x}×R - ∞

,

x

fX,Y (s, t) ds dt =

x

fX,Y (s, t) ds dt 0 dt = 0.

−∞

Thus, X is a continuous random variable. Similarly, we can show that Y is a continuous random variable. 9.37 Clearly, property (a) is satisfied. For property (b), - ∞- ∞ - ∞- ∞ h(x, y) dx dy = f (x)g(y) dx dy = −∞ −∞

=

−∞ −∞ - ∞ −∞

1 · f (x) dx =

-

*-



−∞



−∞

0

Therefore, the corresponding joint PDF is

f (x, y) =

0

9

λ2 e−λx , 0,

,

g(y) dy f (x) dx

−∞

f (x) dx = 1.

9.38 a) We note that g(x, y) ≥ 0 for all (x, y) ∈ R2 , and that , - ∞- ∞ - ∞ *- x −λx g(x, y) dx dy = e dy dx = −∞ −∞





0

xe−λx dx =

1 . λ2

if 0 < y < x; otherwise.

b) We note that g(x, y) ≥ 0 for all (x, y) ∈ R2 . For the double integral over R2 to exist, we must have α > −1, in which case , - ∞- ∞ - 1 *- y - 1 α+1 y 1 α g(x, y) dx dy = (y − x) dx dy = dy = . (α + 1)(α + 2) −∞ −∞ 0 0 0 α+1 Therefore, the corresponding joint PDF is 9 (α + 1)(α + 2)(y − x)α , f (x, y) = 0,

if 0 < x < y < 1; otherwise.

c) As g(x, y) = x − y < 0 for 0 < x < y < 1, constructing a joint PDF from g is not possible. d) We note that g(x, y) ≥ 0 for all (x, y) ∈ R2 , and that / - ∞- ∞ - 1 .- 1 - 1 g(x, y) dx dy = (x + y) dx dy = (1/2 + y) dy = 1. −∞ −∞

0

0

0

Therefore, g itself is a joint PDF. e) We note that g(x, y) ≥ 0 for all (x, y) ∈ R2 , but that - ∞- ∞ - ∞ *g(x, y) dx dy = −∞ −∞

0

∞ 0

Therefore, constructing a joint PDF from g is not possible.

,

(x + y) dx dy = ∞.


9.39 Note that the region in which the PDF is nonzero is the interior of the triangle with vertices (0, 0), (50, 0), and (0, 50). We want P (X > 20, Y > 20), which, from the FPF, equals / - ∞- ∞ - 30 .- 50−x fX,Y (x, y) dx dy = c(50 − x − y) dy dx, 20

−∞ −∞

20

which is the expression in (b). It is easy to see that none of the other expressions provide the required probability. 9.40 a) Because the point is selected at random, we can take a joint PDF to be a positive constant on S—that is, fX,Y (x, y) = c for (x, y) ∈ S and fX,Y (x, y) = 0 otherwise. Because a joint PDF must integrate to 1, we have -- ∞- ∞ fX,Y (x, y) dx dy = c dx dy = c|S|, 1= −∞ −∞

S

where |S| denotes the area of S. Hence c = 1/|S| and, consequently, fX,Y (x, y) = 1/|S| for (x, y) ∈ S, and fX,Y (x, y) = 0 otherwise. b) From the FPF and part (a), --! " |A ∩ S| 1 P (X, Y ) ∈ A = fX,Y (x, y) dx dy = dx dy = . |S| |S| A

A∩S

9.41 Let S denote the unit square and note that |S| = 1. a) From Exercise 9.40(a), 9 9 1/|S|, if (x, y) ∈ S; 1, if 0 < x < 1 and 0 < y < 1; fX,Y (x, y) = = 0, otherwise. 0, otherwise.

b) In this special case of bivariate uniform random variables, using the bivariate uniform model from Exercise 9.40 is a far easier way to obtain a joint PDF than by the CDF method used in Examples 9.1 and 9.3. c) Using the joint PDF of X and Y and applying the FPF, we get -P (|X − Y | ≤ 1/4) = fX,Y (x, y) dx dy |x−y|≤1/4

=

-

1/4

0

.-

x+1/4 0

/

1 dy dx +

-

3/4

1/4

.-

x+1/4 x−1/4

/

1 dy dx +

-

1

3/4

.-

1

/

dy dx x−1/4

= 3/32 + 1/4 + 3/32 = 7/16.

Alternatively, let A = { (x, y) : |x − y| ≤ 1/4 } and set B = S \ A. Using Exercise 9.40(b), we get
P(|X − Y| ≤ 1/4) = P((X, Y) ∈ A) = |A ∩ S|/|S| = 1 − |B| = 1 − 2 · (1/2)(3/4)² = 7/16.
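A simulation check of P(|X − Y| ≤ 1/4) = 7/16 = 0.4375; a sketch assuming NumPy, with (X, Y) uniform on the unit square as in the exercise:

```python
# Exercise 9.41(c): P(|X - Y| <= 1/4) for a point chosen at random from the unit square.
import numpy as np

rng = np.random.default_rng(1)
reps = 1_000_000
x = rng.random(reps)
y = rng.random(reps)

print(np.mean(np.abs(x - y) <= 0.25))    # about 0.4375 = 7/16
```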

9.42 Let S denote the upper half of the unit disk and note that |S| = π/2. a) From Exercise 9.40(a), 9 9 √ 1/|S|, if (x, y) ∈ S; 2/π, if −1 < x < 1 and 0 < y < 1 − x2; fX,Y (x, y) = = 0, otherwise. 0, otherwise.


b) In this special case of bivariate uniform random variables, using the bivariate uniform model from Exercise 9.40 is a far easier way to obtain a joint PDF than by the CDF method used in Exercises 9.4 and 9.23. c) Let T denote the interior of the specified triangle. Using the joint PDF of X and Y and applying the FPF, we get / - 1 .- 1−y -! " 2 fX,Y (x, y) dx dy = (2/π) dx dy = · 1 = 0.637. P (X, Y ) ∈ T = π 0 y−1 T

Alternatively, using Exercise 9.40(b), we get ! " |T ∩ S| |T | 1 = = = 0.637. P (X, Y ) ∈ T = |S| |S| π/2

d) Let A = { (x, y) : |x| ≥ 1/2 }. Using the joint PDF of X and Y and the FPF, we get " ! P (X, Y ) ∈ Ac =

-Ac

fX,Y (x, y) dx dy =

--

(2/π) dx dy

Ac ∩S

⎛ √ ⎞ - 1−x 2 4 1/2 ⎝ 1 dy ⎠ dx = 0.609. = π 0 0

! " Hence, by the complementation rule, P (X, Y ) ∈ A = 1 − 0.609 = 0.391.

9.43 a) Because the point is selected at random, we can take a joint PDF to be a positive constant on S—that is, fX1 ,...,Xm (x1 , . . . , xm ) = c for (x1 , . . . , xm ) ∈ S and fX1 ,...,Xm (x1 , . . . , xm ) = 0 otherwise. Because a joint PDF must integrate to 1, we have - ∞ - ∞ 1= ··· fX1 ,...,Xm (x1 , . . . , xm ) dx1 · · · dxm = · · · c dx1 · · · dxm = c|S|, −∞

−∞

S

where |S| denotes the m-dimensional volume of S. Hence c = 1/|S| and, consequently, we have fX1 ,...,Xm (x1 , . . . , xm ) = 1/|S| for (x1 , . . . , xm ) ∈ S, and fX1 ,...,Xm (x1 , . . . , xm ) = 0 otherwise. b) From the FPF and part (a), P (X1 , . . . , Xm ) ∈ A =

-

=

-

!

"

··· A

··· A∩S

-

fX1 ,...,Xm (x1 , . . . , xm ) dx1 · · · dxm

-

1 |A ∩ S| dx1 · · · dxm = . |S| |S|

9.44 Let S denote the unit cube and note that |S| = 1. a) From Exercise 9.43(a), 9 9 1, 1/|S|, if (x1 , . . . , xm ) ∈ S; = fX1 ,...,Xm (x1 , . . . , xm ) = 0, otherwise. 0,

if 0 < x, y, z < 1; otherwise.


b) From the FPF and part (a), P (Z = max{X, Y, Z}) = P (Z > X, Z > Y ) = =

- 1 .0

0

z *-

0

fX,Y,Z (x, y, z) dx dy dz

z>x, z>y

/

,

z

---

1 dx dy dz =

-

0

1 *- z 0

,

z dy dz =

-

0

1

z2 dz =

1 . 3

c) By symmetry, each of X, Y , and Z are equally likely to be the largest of the three. Thus, 1 = P (X = max{X, Y, Z}) + P (Y = max{X, Y, Z}) + P (Z = max{X, Y, Z}) = 3P (Z = max{X, Y, Z}).

Thus, P (Z = max{X, Y, Z}) = 1/3. d) Let A denote the sphere of radius 1/4 centered at (1/2, 1/2, 1/2). Then, by Exercise 9.43(b), 4 |A ∩ S| = |A| = π P (X, Y, Z) ∈ A = |S| 3 !

"

e) From the FPF and part (a), P (X + Y > Z) =

---

fX,Y,Z (x, y, z) dx dy dz =

x+y>z

= =

- 1 .0

0

1−x *- x+y 0

1 1 5 + = . 3 2 6

/

* ,3 1 = 0.0654. 4

- 1 .- 1 *0

, 1 dz dy dx +

0

- 1 .0

min{x+y,1} 0

1 1−x

*-

1 0

, / 1 dz dy dx

, / 1 dz dy dx

9.45 a) Here we want P (X > Y ). From the FPF, P (X > Y ) =

--

x>y

fX,Y (x, y) dx dy =

-

1/2 0

.-

1−y y

/

6(1 − x − y) dx dy =

1 . 2

Alternatively, we can use the fact that the joint PDF of X and Y is symmetric in x and y to conclude that 1 = P (X = Y ) + P (X > Y ) + P (X < Y ) = 0 + P (X > Y ) + P (X < Y ) = 2P (X > Y ), so that P (X > Y ) = 1/2. b) Here we want P (X < 0.2). From the FPF P (X < 0.2) =

--

x 0.

b) In a series system, T = min{X, Y }. Then, for t ≥ 0,

FT (t) = P (T ≤ t) = 1 − P (T > t) = 1 − P (min{X, Y } > t) = 1 − P (X > t, Y > t).

Referring now to the result of Example 9.5(a) on page 503, we see that FT (t) = 1 − e−(λ+µ)t for t ≥ 0. Differentiating gives fT (t) = (λ + µ)e−(λ+µ)t , t > 0.

Hence, T ∼ E(λ + µ). c) The event exactly one of the components is functioning at time t is {X > t, Y ≤ t} ∪ {X ≤ t, Y > t}, where we note that these two events are mutually exclusive. Now, by the FPF, , - t *- ∞ -fX,Y (x, y) dx dy = λµe−(λx+µy) dx dy P (X > t, Y ≤ t) = =

x>t, y≤t - t *- ∞ 0

t

0

t

,

! " λe−λx dx µe−µy dy = e−λt 1 − e−µt = e−λt − e−(λ+µ)t .

Likewise, we find that P (X ≤ t, Y > t) = e−µt − e−(λ+µ)t . Thus, the probability that exactly one of the components is functioning at time t is ) ( ) ( e−λt − e−(λ+µ)t + e−µt − e−(λ+µ)t = e−λt + e−µt − 2e−(λ+µ)t . 9.47 From the FPF,

P (Y = X) =

--

y=x

fX,Y (x, y) dx dy =

-

∞ −∞

*-

,

x x

fX,Y (x, y) dy dx =

-



−∞

0 dx = 0.

9.48 Recall that the event that component B is the first to fail is given by {Y < X}. a) By the FPF, , - ∞ *- ∞ -−(λx+µy) fX,Y (x, y) dx dy = λµe dx dy P (Y < X) = y X + 1) = P (R > 1) = fR (r) dr = (n − 1)λe−λr 1 − e−λr = (n − 1)

-

r>1

1

1−e−λ

1

! "n−1 un−2 du = 1 − 1 − e−λ .

Theory Exercises 9.50 The property in Propostion 9.5(a) is part of the definition of a joint PDF, as given in Definition 9.2 on page 495. We still need to verify the property in Proposition 9.5(b). Intuitively, it reflects the fact


that the probability that X and Y take on some real numbers equals 1. To prove it mathematically, = we let An = { −n ≤ X ≤ n, −n ≤ Y ≤ n } for each n ∈ N . Then A1 ⊂ A2 ⊂ · · · and ∞ n=1 An = !. Applying the definition of a joint PDF, the continuity property of a probability measure, and the certainty property of a probability measure gives - ∞- ∞ - n- n fX,Y (x, y) dx dy = lim fX,Y (x, y) dx dy n→∞ −n −n

−∞ −∞

= lim P (−n ≤ X ≤ n, −n ≤ Y ≤ n) n→∞ *: , ∞ = lim P (An ) = P An = P (!) = 1. n→∞

n=1

9.51 A joint PDF, fX1 ,...,Xm , of m continuous random variables, X1 , . . . , Xm , satisfies the following two properties: a) fX1 ,...,Xm (x1 , . . . , xm ) ≥ 0 for all (x1 , . . . , xm ) ∈ Rm $∞ $∞ b) −∞ · · · −∞ fX1 ,...,Xm (x1 , . . . , xm ) dx1 · · · dxm = 1

9.52 Suppose that X1 , . . . , Xm are continuous random variables with a joint PDF. Then, for any subset, A ⊂ Rm , ! " P (X1 , . . . , Xm ) ∈ A = · · · fX1 ,...,Xm (x1 , . . . , xm ) dx1 · · · dxm . A

In words, the probability that m continuous random variables take a value in a specified subset of Rm can be obtained by integrating a joint PDF of the random variables over that subset. 9.53 a) We use the FPF, make the substitution u = x + y, and then interchange the order of integration to get , -- ∞ *- z−x FX+Y (z) = P (X + Y ≤ z) = fX,Y (x, y) dx dy = fX,Y (x, y) dy dx −∞

x+y≤z

=

-



−∞

*-

z −∞

, fX,Y (x, u − x) du dx =

−∞

Therefore, from Proposition 8.5 on page 422, fX+Y (z) =

z

-



−∞

*-

∞ −∞

−∞

,

fX,Y (x, u − x) dx du.

fX,Y (x, z − x) dx.

b) We use the FPF, make the substitution u = x − y, and then interchange the order of integration to get , -- ∞ *- ∞ FX−Y (z) = P (X − Y ≤ z) = fX,Y (x, y) dx dy = fX,Y (x, y) dy dx −∞

x−y≤z

=

-



−∞

*-

z −∞

, fX,Y (x, x − u) du dx =

−∞

Therefore, from Proposition 8.5 on page 422, fX−Y (z) =

z

-



−∞

*-

∞ −∞

fX,Y (x, x − z) dx.

x−z

,

fX,Y (x, x − u) dx du.


Advanced Exercises 9.54 No! The random variables X and Y in Exercise 9.30 are continuous random variables but, clearly, P (Y = X) = 1. 9.55 No! The random variables X and Y in Exercise 9.30 are continuous random variables but cannot have a joint PDF because P (Y = X) = 1 ̸= 0. 9.56 a) Let X1 , X2 , and X3 denote the positions of the three gas stations. By assumption, these three random variables are a random sample of size 3 from a U(0, 1) distribution. Therefore, from Exercise 9.34(c), a joint PDF of X(1) , X(2) , and X(3) is fX(1) ,X(2) ,X(3) (x1 , x2 , x3 ) = 3! = 6 if 0 < x1 < x2 < x3 < 1, and fX(1) ,X(2) ,X(3) (x1 , x2 , x3 ) = 0 otherwise. The event that no two of the gas stations are less than 1/3 mile apart can be expressed as {X(2) ≥ X(1) + 1/3, X(3) ≥ X(2) + 1/3}. From the FPF, P (X(2) ≥ X(1) + 1/3, X(3) ≥ X(2) + 1/3) =

---

fX(1) ,X(2) ,X(3) (x1 , x2 , x3 ) dx1 dx2 dx3

x2 ≥x1 +1/3 x3 ≥x2 +1/3

=6

-

=6·

1/3

0

.-

2/3 x1 +1/3

*-

1 x2 +1/3

,

/

dx3 dx2 dx1

1 = 0.0370. 162

b) We must have 2d < 1, or d < 1/2. c) We evaluate the following integrals by first making the substitution u = 1 − d − x2 and then the substitution v = 1 − 2d − x1 , thus: P (X(2) ≥ X(1) + d, X(3) ≥ X(2) + d) --= fX(1) ,X(2) ,X(3) (x1 , x2 , x3 ) dx1 dx2 dx3 = 6 x2 ≥x1 +d x3 ≥x2 +d

=6 =3

-

1−2d

0

-

0

1−2d

0

.-

1−2d

/

1−d x1 +d

(1 − d − x2 ) dx2 dx1 = 6 2

(1 − 2d − x1 ) dx1 = 3 3

-

1−2d

-

0

1−2d

.-

.-

1−d *- 1

x1 +d

1−2d−x1 0

x2 +d

,

/

dx3 dx2 dx1

/

u du dx1

v 2 dv

0

= (1 − 2d) . 9.57 a) We must have (n − 1)d < 1, or d < 1/(n − 1). b) Let X1 , . . . , Xn denote the positions of the n gas stations, which, by assumption are a random sample of size n from a U(0, 1) distribution. Therefore, from Exercise 9.34(c), a joint PDF of X(1) , . . . , X(n) is fX(1) ,...,X(n) (x1 , . . . , xn ) = n! if 0 < x1 < · · · < xn < 1, and fX(1) ,...,X(n) (x1 , . . . , xn ) = 0 otherwise. ; The event that no two of the gas stations are less than d mile apart is n−1 k=1 {X(k+1) ≥ X(k) + d}. From


the FPF, we have , *n−1 < {X(k+1) ≥ X(k) + d} P k=1

=

-

···

-

xk+1 ≥xk +d, k=1,...,n−1

= n! = n!

-

1−(n−1)d 0 1−(n−1)d 0

..-

fX(1) ,...,X(n) (x1 , . . . , xn ) dx1 · · · dxn

1−(n−2)d *

x1 +d

1−(n−2)d *

x1 +d

··· ···

-

/

,

1

dxn · · · dx2 dx1 .

xn−1 +d

,

1−d xn−2 +d

/

(1 − d − xn−1 ) dxn−1 · · · dx2 dx1 .

Now we proceed as in Exercise 9.56(c). We evaluate the integrals in the last expression of the previous display by ! making the"nsuccessive substitutions un−k = 1 − kd − xn−k for k = 1, 2, . . . , n − 1. The result is 1 − (n − 1)d .
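The closed-form answer (1 − (n − 1)d)ⁿ can be checked by simulation. A sketch assuming NumPy; n = 4 and d = 0.1 are illustrative values, not ones given in the exercise:

```python
# Exercise 9.57: probability that no two of n points chosen uniformly on (0, 1)
# are less than d apart equals (1 - (n - 1)*d)^n.
import numpy as np

rng = np.random.default_rng(2)
n, d = 4, 0.1
reps = 500_000

pts = np.sort(rng.random((reps, n)), axis=1)
gaps = np.diff(pts, axis=1)                    # spacings between adjacent order statistics
ok = np.all(gaps >= d, axis=1)

print(ok.mean())                               # simulated probability
print((1 - (n - 1) * d) ** n)                  # closed form: 0.7**4 = 0.2401
```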

9.58 a) That hX,Y is a nonnegative function on R2 is part of the definition of a joint PDF/PMF. = b) For each n ∈ N , let An = {−n ≤ X ≤ n, Y = y}. Then A1 ⊂ A2 ⊂ · · · and ∞ n=1 An = {Y = y}. Hence, by the continuity property of a probability measure, - ∞ - n hX,Y (x, y) dx = lim hX,Y (x, y) dx = lim P (−n ≤ X ≤ n, Y = y) n→∞ −n

−∞

n→∞

= lim P (An ) = P n→∞

*: ∞

B

The required result now follows from the fact that

n=1

y

An

,

= P (Y = y).

P (Y = y) = 1.

9.59 a) Clearly, hX,Y is a nonnegative function on R2 . Also, -



8

−∞ y

∞ ∞ *8

, xy hX,Y (x, y) dx = λe−(1+λ)x dx y! 0 y=0 - ∞ - ∞ x −(1+λ)x = e λe dx = λe−λx dx = 1. -

0

b) For y = 0, 1, . . . ,

0

pY (y) = P (Y = y) = P (−∞ < X < ∞, Y = y) = = =

-

∞ 0

λe

−(1+λ)x

xy λ dx = y! y! (1/λ)y

-

λ = . (1 + λ)y+1 (1 + 1/λ)y+1

0



-



−∞

hX,Y (x, y) dx

x (y+1)−1 e−(1+λ)x dx =

λ '(y + 1) y! (1 + λ)y+1

Thus Y has the Pascal distribution (as defined in Exercise 7.26) with parameter 1/λ.


c) For x > 0, we have, in view of the law of partitions, FX (x) = P (X ≤ x) = ∞ *8

∞ 8 y=0

P (X ≤ x, Y = y) =

∞ *8 y=0

x −∞

hX,Y (s, y) ds

,

, - x *8 ∞ y, sy s = λe ds = λ e−(1+λ)s ds y! y! 0 0 y=0 y=0 - x - x = es λ e−(1+λ)s ds = λe−λs ds. x

−(1+λ)s

0

0

Thus, by Proposition 8.5 on page 422, we have fX (x) = λe−λx for x > 0 and fX (x) = 0 otherwise. Hence, X ∼ E(λ). 9.60 Clearly, h is a nonnegative function on R2 . Also, - ∞8 - ∞8 h(x, y) dx = f (x)p(y) dx = −∞ y

−∞ y

=

-



−∞

1 · f (x) dx =

-



−∞

−∞ y

=

-



−∞

1 · f (x) dx =

-



−∞

−∞

∞ *8

−∞

−∞ y

=

8 y

1 · p(y) =

y

8 y

y

, px (y) f (x) dx

f (x) dx = 1.

9.62 Clearly, h is a nonnegative function on R2 . Also, - ∞8 - ∞8 8 *h(x, y) dx = fy (x)p(y) dx = −∞ y

y

, p(y) f (x) dx

f (x) dx = 1.

9.61 Clearly, h is a nonnegative function on R2 . Also, - ∞8 - ∞8 h(x, y) dx = f (x)px (y) dx = −∞ y

∞ *8

∞ −∞

,

fy (x) dx p(y)

p(y) = 1.

9.4 Marginal and Conditional Probability Density Functions Basic Exercises 9.63 From Example 9.7, we have 9 9 1, if 0 < x, y < 1; 1, fX,Y (x, y) = fX (x) = 0, otherwise. 0,

if 0 < x < 1; fY (y) = otherwise.

9

1, 0,

if 0 < y < 1; otherwise.


a) Let 0 < x < 1. Then, for 0 < y < 1,
f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x) = 1/1 = 1.
Hence, for 0 < x < 1, f_{Y|X}(y | x) = 1 if 0 < y < 1, and f_{Y|X}(y | x) = 0 otherwise. In other words, Y|_{X=x} ∼ U(0, 1) for each 0 < x < 1. By symmetry, we conclude that X|_{Y=y} ∼ U(0, 1) for each 0 < y < 1.
b) From part (a), we conclude that the conditional PDFs of Y given X = x are identical to each other and to the marginal PDF of Y. Thus, in this case, knowing the value of the random variable X imparts no information about the distribution of Y; in other words, knowing the value of X doesn't affect the probability distribution of Y.

9.64 a) In this case, we know from Exercise 9.63 that X|_{Y=y} ∼ U(0, 1) for each 0 < y < 1. Consequently, we have P(X > 0.9 | Y = 0.8) = 0.1.
b) For 0 < y < 1,
f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx = ∫_0^1 (x + y) dx = 0.5 + y.
Then, for 0 < x < 1,
f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y) = (x + y)/(0.5 + y).
Therefore, from (the conditional version of) the FPF,
P(X > 0.9 | Y = 0.8) = ∫_{0.9}^{∞} f_{X|Y}(x | 0.8) dx = (1/(0.5 + 0.8)) ∫_{0.9}^{1} (x + 0.8) dx ≈ 0.135.
c) For 0 < y < 1,
f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx = ∫_0^1 (3/2)(x^2 + y^2) dx = 1/2 + (3/2)y^2.
Then, for 0 < x < 1,
f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y) = 3(x^2 + y^2)/(1 + 3y^2).
Therefore, from (the conditional version of) the FPF,
P(X > 0.9 | Y = 0.8) = ∫_{0.9}^{∞} f_{X|Y}(x | 0.8) dx = (3/(1 + 3 · (0.8)^2)) ∫_{0.9}^{1} (x^2 + (0.8)^2) dx ≈ 0.159.

9.65 a) For −1 < x < 1,
f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = ∫_0^{√(1−x^2)} (2/π) dy = (2/π)√(1 − x^2).
Hence, f_X(x) = (2/π)√(1 − x^2) if −1 < x < 1, and f_X(x) = 0 otherwise.


b) For 0 < y < 1,
f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx = ∫_{−√(1−y^2)}^{√(1−y^2)} (2/π) dx = (4/π)√(1 − y^2).
Hence, f_Y(y) = (4/π)√(1 − y^2) if 0 < y < 1, and f_Y(y) = 0 otherwise.
c) Let −1 < x < 1. Then, for 0 < y < √(1 − x^2),
f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x) = (2/π)/((2/π)√(1 − x^2)) = 1/√(1 − x^2).
Thus, Y|_{X=x} ∼ U(0, √(1 − x^2)) for −1 < x < 1.
d) Let 0 < y < 1. Then, for −√(1 − y^2) < x < √(1 − y^2),
f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y) = (2/π)/((4/π)√(1 − y^2)) = 1/(2√(1 − y^2)).
Thus, X|_{Y=y} ∼ U(−√(1 − y^2), √(1 − y^2)) for 0 < y < 1.

9.66 a) We have
f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = ∫_{−√(1−x^2)}^{√(1−x^2)} (3/8)(|x| + |y|) dy = (3/4) ∫_0^{√(1−x^2)} (|x| + y) dy
= (3/4)(|x|√(1 − x^2) + (1 − x^2)/2) = (3/8)(1 + 2|x|√(1 − x^2) − x^2)
if −1 < x < 1, and f_X(x) = 0 otherwise.
b) By symmetry, X and Y have the same probability distribution. Thus, f_Y(y) = (3/8)(1 + 2|y|√(1 − y^2) − y^2) if −1 < y < 1, and f_Y(y) = 0 otherwise.
c) Referring to the results of parts (a) and (b), we see that, for −1 < y < 1,
f_{Y|X}(y | 0) = f_{X,Y}(0, y)/f_X(0) = ((3/8)|y|)/(3/8) = |y|.
Hence, by the FPF,
P(|Y| > 1/2 | X = 0) = ∫_{|y|>1/2} f_{Y|X}(y | 0) dy = 2 ∫_{1/2}^{1} y dy = 3/4.
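A conditional probability such as the one just computed can also be checked numerically. The following sketch (an illustration, not part of the original solution) assumes NumPy is available; it draws from the joint PDF (3/8)(|x| + |y|) on the unit disk by rejection sampling and uses a narrow band |X| < 0.02 as a stand-in for the exact conditioning event {X = 0}, so the estimate is only approximate.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2_000_000
x = rng.uniform(-1, 1, N)
y = rng.uniform(-1, 1, N)
u = rng.random(N)
dens = (3 / 8) * (np.abs(x) + np.abs(y)) * (x**2 + y**2 < 1)  # joint PDF, zero off the disk
keep = u * (3 / 8) * np.sqrt(2) < dens     # accept if u < f(x,y) / max f  (rejection sampling)
xs, ys = x[keep], y[keep]
band = np.abs(xs) < 0.02                   # epsilon-band proxy for conditioning on X = 0
print(np.mean(np.abs(ys[band]) > 0.5))     # should be near 3/4
```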

f_{Λ,Y}(λ, y) = f_Λ(λ) f_{Y|Λ}(y | λ) = (β^α/Γ(α)) λ^{α−1} e^{−βλ} · λ e^{−λy} = ((βλ)^α/Γ(α)) e^{−(β+y)λ}
if λ > 0 and y > 0, and f_{Λ,Y}(λ, y) = 0 otherwise. Thus, f_Y(y) = 0 if y ≤ 0 and, for y > 0,
f_Y(y) = ∫_{−∞}^{∞} f_{Λ,Y}(λ, y) dλ = ∫_0^{∞} ((βλ)^α/Γ(α)) e^{−(β+y)λ} dλ = (β^α/Γ(α)) ∫_0^{∞} λ^{(α+1)−1} e^{−(β+y)λ} dλ
= (β^α/Γ(α)) · Γ(α + 1)/(β + y)^{α+1} = αβ^α/(β + y)^{α+1}.
b) For y > 0,
f_{Λ|Y}(λ | y) = f_{Λ,Y}(λ, y)/f_Y(y) = (((βλ)^α/Γ(α)) e^{−(β+y)λ}) / (αβ^α/(β + y)^{α+1})
= ((β + y)^{α+1}/(αΓ(α))) λ^α e^{−(β+y)λ} = ((β + y)^{α+1}/Γ(α + 1)) λ^{(α+1)−1} e^{−(β+y)λ}
if λ > 0, and f_{Λ|Y}(λ | y) = 0 otherwise. Thus, Λ|_{Y=y} ∼ Γ(α + 1, β + y) for y > 0.

9.76 a) By assumption, f_X(x) = 1 if 0 < x < 1, and f_X(x) = 0 otherwise;
f_{Y|X}(y | x) = 1/x if 0 < y < x < 1, and f_{Y|X}(y | x) = 0 otherwise;
f_{Z|X,Y}(z | x, y) = 1/y if 0 < z < y < x < 1, and f_{Z|X,Y}(z | x, y) = 0 otherwise.
Hence, by the general multiplication rule in the form of Equation (9.36) on page 519,
f_{X,Y,Z}(x, y, z) = f_X(x) f_{Y|X}(y | x) f_{Z|X,Y}(z | x, y) = 1 · (1/x) · (1/y) = 1/(xy)
if 0 < z < y < x < 1, and f_{X,Y,Z}(x, y, z) = 0 otherwise.
b) We have
f_Z(z) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y,Z}(x, y, z) dy dx = ∫_z^1 (∫_z^x (1/(xy)) dy) dx = ∫_z^1 (1/x) ln(x/z) dx = ∫_0^{−ln z} u du = (ln z)^2/2
if 0 < z < 1, and f_Z(z) = 0 otherwise, where we obtained the penultimate integral by making the substitution u = ln(x/z).
c) From parts (a) and (b), we have, for 0 < z < 1,
f_{X,Y|Z}(x, y | z) = f_{X,Y,Z}(x, y, z)/f_Z(z) = (1/(xy))/((ln z)^2/2) = 2/(xy (ln z)^2)
if z < y < x < 1, and f_{X,Y|Z}(x, y | z) = 0 otherwise.
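The hierarchical construction in Exercise 9.76 is easy to simulate, which gives a check on the derived marginal f_Z(z) = (ln z)^2/2. The sketch below (not part of the original solution) assumes NumPy is available; the seed and sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000_000
x = rng.random(N)              # X ~ U(0, 1)
y = x * rng.random(N)          # Y | X ~ U(0, X)
z = y * rng.random(N)          # Z | X, Y ~ U(0, Y)
print(z.mean())                # E[Z] = integral of z (ln z)^2 / 2 over (0,1) = 1/8
z0 = 0.1
cdf = z0 * ((np.log(z0))**2 - 2 * np.log(z0) + 2) / 2   # F_Z(0.1) from integrating f_Z
print(np.mean(z <= z0), cdf)   # both should be close to about 0.595
```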

Theory Exercises

9.77 Clearly, f_{Y|X}(· | x) is a nonnegative function on R. Also, in view of Definition 9.3 on page 515 and Equation (9.27) on page 510,
∫_{−∞}^{∞} f_{Y|X}(y | x) dy = ∫_{−∞}^{∞} (f_{X,Y}(x, y)/f_X(x)) dy = (1/f_X(x)) ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = (1/f_X(x)) · f_X(x) = 1.


9.78 a) For all x, y ∈ R,
0 ≤ P(X ∈ A^c ∩ (−∞, x], Y ≤ y) ≤ P(X ∈ A^c) = ∫_{A^c} f_X(x) dx = 0,
where the last equality follows from the fact that f_X(x) = 0 for all x ∈ A^c. Consequently, we have shown that P(X ∈ A^c ∩ (−∞, x], Y ≤ y) = 0 for all x, y ∈ R.
b) From the law of partitions and part (a),
F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) = P(X ∈ A ∩ (−∞, x], Y ≤ y) + P(X ∈ A^c ∩ (−∞, x], Y ≤ y) = P(X ∈ A ∩ (−∞, x], Y ≤ y) + 0 = P(X ∈ A ∩ (−∞, x], Y ≤ y).

c) Recall from Definition 9.3 on page 515 that, if f_X(x) = 0, then we define f_{Y|X}(y | x) = 0 for all y ∈ R. Using that convention and part (b), we get, for all x, y ∈ R,
F_{X,Y}(x, y) = P(X ∈ A ∩ (−∞, x], Y ≤ y) = ∫_{A∩(−∞,x]} ∫_{−∞}^{y} f_{X,Y}(s, t) dt ds
= ∫_{A∩(−∞,x]} ∫_{−∞}^{y} f_X(s) f_{Y|X}(t | s) dt ds
= ∫_{A∩(−∞,x]} ∫_{−∞}^{y} f_X(s) f_{Y|X}(t | s) dt ds + ∫_{A^c∩(−∞,x]} ∫_{−∞}^{y} f_X(s) f_{Y|X}(t | s) dt ds
= ∫_{−∞}^{x} ∫_{−∞}^{y} f_X(s) f_{Y|X}(t | s) dt ds.

d) From part (c) and Proposition 9.4 on page 499, we conclude that the function f defined on R^2 by f(x, y) = f_X(x) f_{Y|X}(y | x) is a joint PDF of X and Y.

9.79 a) Applying Proposition 9.7 on page 510 and the general multiplication rule yields
f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx = ∫_{−∞}^{∞} f_{Y|X}(y | x) f_X(x) dx.
Therefore, the function f defined on R by f(y) = ∫_{−∞}^{∞} f_{Y|X}(y | x) f_X(x) dx is a PDF of the random variable Y.
b) Applying the FPF, part (a), and Equation (9.33) on page 517, we obtain, for all A ⊂ R,
P(Y ∈ A) = ∫_A f_Y(y) dy = ∫_A (∫_{−∞}^{∞} f_{Y|X}(y | x) f_X(x) dx) dy
= ∫_{−∞}^{∞} (∫_A f_{Y|X}(y | x) dy) f_X(x) dx = ∫_{−∞}^{∞} P(Y ∈ A | X = x) f_X(x) dx.

9.80 Let X1 , . . . , Xm be continuous random variables with a joint PDF. Then the function f defined on Rm by f (x1 , . . . , xm ) = fX1 (x1 )fX2 | X1 (x2 | x1 ) · · · fXm | X1 ,...,Xm−1 (xm | x1 , . . . , xm−1 ) is a joint PDF of X1 , . . . , Xm .


Advanced Exercises

9.81 As usual, we use | · | to denote, in the extended sense, length in R and area in R^2.
a) From Proposition 9.7 on page 510,
f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = ∫_{S_x} (1/|S|) dy = |S_x|/|S|.
Thus, f_X(x) = |S_x|/|S| if |S_x| > 0, and f_X(x) = 0 otherwise. Similarly, we find that f_Y(y) = |S^y|/|S| if |S^y| > 0, and f_Y(y) = 0 otherwise.
b) Suppose that f_X(x) > 0, that is, |S_x| > 0. From the solution to part (a),
f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x) = (1/|S|)/(|S_x|/|S|) = 1/|S_x|
if y ∈ S_x, and f_{Y|X}(y | x) = 0 otherwise. Consequently, Y|_{X=x} ∼ U(S_x). Similarly, X|_{Y=y} ∼ U(S^y) for all y such that |S^y| > 0.
c) No, as Example 9.8 on page 511 shows.
d) Suppose that there exist subsets A and B of R such that S = A × B. Observe that |S| = |A||B| and, in particular, both |A| and |B| must be positive. Also observe that S_x = B if x ∈ A, and S_x = ∅ otherwise. Therefore, from the solution to part (a),
f_X(x) = |S_x|/|S| = |B|/|S| = |B|/(|A||B|) = 1/|A|

if x ∈ A, and fX (x) = 0 otherwise. Hence, X ∼ U(A). Similarly, we find that Y ∼ U(B). 9.82 No! The random variables X and Y from Exercise 9.30 are each U(0, 1) random variables and, hence, each have a PDF. However, they don’t have a joint PDF. 9.83 We first observe that, from the definition of a joint PDF/PMF and the continuity property of a probability measure, P (X ≤ x, Y = y) =

∫_{−∞}^{x} h_{X,Y}(s, y) ds    and    P(−∞ < X < ∞, Y = y) = ∫_{−∞}^{∞} h_{X,Y}(x, y) dx.

a) From the law of partitions,
F_X(x) = P(X ≤ x) = ∑_y P(X ≤ x, Y = y) = ∑_y ∫_{−∞}^{x} h_{X,Y}(s, y) ds = ∫_{−∞}^{x} (∑_y h_{X,Y}(s, y)) ds.
Therefore, from Proposition 8.5 on page 422, we have f_X(x) = ∑_y h_{X,Y}(x, y).
b) We have
p_Y(y) = P(Y = y) = P(−∞ < X < ∞, Y = y) = ∫_{−∞}^{∞} h_{X,Y}(x, y) dx.


9.84 From Exercise 9.83(a),
f_X(x) = ∑_y h_{X,Y}(x, y) = ∑_{y=0}^{1} (1/2) e^{−x} = e^{−x}
if x > 0, and f_X(x) = 0 otherwise. Thus, X ∼ E(1). Also, from Exercise 9.83(b),
p_Y(y) = P(Y = y) = ∫_{−∞}^{∞} h_{X,Y}(x, y) dx = ∫_0^{∞} (1/2) e^{−x} dx = 1/2

if y ∈ {0, 1}, and pY (y) = 0 otherwise. Thus, Y has the Bernoulli distribution with parameter 1/2 or, equivalently, Y ∼ B(1, 1/2). 9.85 a) If fX (x) > 0, we define the conditional PMF of Y given X = x by pY | X (y | x) =

h_{X,Y}(x, y)/f_X(x).

If fX (x) = 0, we define pY | X (y | x) = 0 for all y ∈ R but, in that case, we don’t refer to pY | X as a conditional PMF. b) If pY (y) > 0, we define the conditional PDF of X given Y = y by fX | Y (x | y) =

h_{X,Y}(x, y)/p_Y(y).

If p_Y(y) = 0, we define f_{X|Y}(x | y) = 0 for all x ∈ R but, in that case, we don't refer to f_{X|Y} as a conditional PDF.

9.86 a) We have
f_X(x) = ∑_y h_{X,Y}(x, y) = ∑_{y=0}^{∞} λ e^{−(1+λ)x} x^y/y! = λ e^{−(1+λ)x} ∑_{y=0}^{∞} x^y/y! = λ e^{−(1+λ)x} e^x = λ e^{−λx}
if x > 0, and f_X(x) = 0 otherwise. Thus, X ∼ E(λ). Also, we have
p_Y(y) = ∫_{−∞}^{∞} h_{X,Y}(x, y) dx = ∫_0^{∞} λ e^{−(1+λ)x} x^y/y! dx = (λ/y!) ∫_0^{∞} x^{(y+1)−1} e^{−(1+λ)x} dx
= (λ/y!) · (Γ(y + 1)/(1 + λ)^{y+1}) · 1 = λ/(1 + λ)^{y+1} = (1/λ)^y / (1 + 1/λ)^{y+1}
if y = 0, 1, . . . , and p_Y(y) = 0 otherwise. Thus, Y has the Pascal distribution with parameter 1/λ.
b) For y = 0, 1, . . . ,
f_{X|Y}(x | y) = h_{X,Y}(x, y)/p_Y(y) = (λ e^{−(1+λ)x} x^y/y!) / ((1/λ)^y/(1 + 1/λ)^{y+1}) = ((1 + λ)^{y+1}/Γ(y + 1)) x^{(y+1)−1} e^{−(1+λ)x}
if x > 0, and f_{X|Y}(x | y) = 0 otherwise. Thus, X|_{Y=y} ∼ Γ(y + 1, 1 + λ) for y = 0, 1, . . . .


Also, for x > 0,
p_{Y|X}(y | x) = h_{X,Y}(x, y)/f_X(x) = (λ e^{−(1+λ)x} x^y/y!) / (λ e^{−λx}) = e^{−x} x^y/y!
if y = 0, 1, . . . , and p_{Y|X}(y | x) = 0 otherwise. Thus, Y|_{X=x} ∼ P(x) for x > 0.

9.87 By assumption, Y|_{X=λ} ∼ P(λ) for λ > 0, and X ∼ Γ(α, β).
a) From the general multiplication rule for a joint PDF/PMF,
h_{X,Y}(λ, y) = f_X(λ) p_{Y|X}(y | λ) = (β^α/Γ(α)) λ^{α−1} e^{−βλ} · e^{−λ} λ^y/y! = (β^α/(y! Γ(α))) λ^{y+α−1} e^{−(β+1)λ}
if λ > 0 and y = 0, 1, . . . , and h_{X,Y}(λ, y) = 0 otherwise.
b) We have, for y = 0, 1, . . . ,
p_Y(y) = ∫_{−∞}^{∞} h_{X,Y}(λ, y) dλ = (β^α/(y! Γ(α))) ∫_0^{∞} λ^{y+α−1} e^{−(β+1)λ} dλ
= (β^α/(y! Γ(α))) · (Γ(y + α)/(β + 1)^{y+α}) ∫_0^{∞} ((β + 1)^{y+α}/Γ(y + α)) λ^{(y+α)−1} e^{−(β+1)λ} dλ
= (β^α/(y! Γ(α))) · (Γ(y + α)/(β + 1)^{y+α}) · 1 = (Γ(y + α)/(y! Γ(α))) · (β^α/(β + 1)^{y+α}).
However, from Equation (8.43) on page 451,
Γ(y + α)/(y! Γ(α)) = ((α + y − 1)(α + y − 2) · · · α Γ(α))/(y! Γ(α)) = (−1)^y ((−α)(−α − 1) · · · (−α − y + 1))/y! = (−1)^y \binom{−α}{y}.
Also,
β^α/(β + 1)^{y+α} = (β/(β + 1))^α · (1/(β + 1))^y = (β/(β + 1))^α (1 − β/(β + 1))^y.
It now follows that
p_Y(y) = (−1)^y \binom{−α}{y} (β/(β + 1))^α (1 − β/(β + 1))^y
if y = 0, 1, . . . , and p_Y(y) = 0 otherwise. Hence, referring to page 243, we see that Y has the Pascal distribution with parameters α and β/(β + 1).
c) For y = 0, 1, . . . ,
f_{X|Y}(λ | y) = h_{X,Y}(λ, y)/p_Y(y) = ((β^α λ^{y+α−1} e^{−(β+1)λ})/(y! Γ(α))) / ((Γ(y + α) β^α)/(y! Γ(α)(β + 1)^{y+α}))
= ((β + 1)^{y+α}/Γ(y + α)) λ^{(y+α)−1} e^{−(β+1)λ}
for λ > 0, and f_{X|Y}(λ | y) = 0 otherwise. Hence, X|_{Y=y} ∼ Γ(y + α, β + 1) for y = 0, 1, . . . .


9.5 Independent Continuous Random Variables

Basic Exercises

9.88 a) Given that X = 0, the possible values of Y are between −1 and 1. However, given that X = 0.5, the possible values of Y are between −√3/2 and √3/2. In particular, knowing the value of X affects the probability distribution of Y.
b) From Example 9.8, f_Y(y) = (2/π)√(1 − y^2) if −1 < y < 1, and f_Y(y) = 0 otherwise. In particular, we note that Y does not have a uniform distribution. However, from Example 9.10, we know that Y|_{X=x} ∼ U(−√(1 − x^2), √(1 − x^2)) for −1 < x < 1. It now follows from Proposition 9.10(a) on page 527 that X and Y are not independent random variables.
c) Let g denote the joint PDF of X and Y in Equation (9.39) on page 524, and let f denote the product of the marginals of X and Y. We find that, within the unit disk, the second-order mixed partials exist. In particular, they exist at (0, 0). If f were also a joint PDF of X and Y, then, according to Equation (9.22) on page 499, we would have f(0, 0) = g(0, 0). But that isn't true, as f(0, 0) = 4/π^2 and g(0, 0) = 1/π.

9.89 From Example 9.11 on page 516, each conditional PDF of Y given X = x is a PDF of Y. Independence of X and Y now follows from Proposition 9.10(a) on page 527.

9.90 By assumption, f_X(x) = 1/10 if −5 < x < 5, and f_X(x) = 0 otherwise; and f_Y(y) = 1/10 if −5 < y < 5, and f_Y(y) = 0 otherwise. As X and Y are independent, a joint PDF of X and Y is equal to the product of the two marginals: f_{X,Y}(x, y) = 1/100 if −5 < x < 5 and −5 < y < 5, and f_{X,Y}(x, y) = 0 otherwise.
We can use the FPF to evaluate P(|X − Y| > 5). Alternatively, we note that, in this case, X and Y are the x and y coordinates, respectively, of a point selected at random from the square S = { (x, y) : −5 < x < 5, −5 < y < 5 }. Thus,
P(|X − Y| > 5) = |{|X − Y| > 5}|/|S| = |{ (x, y) ∈ S : |x − y| > 5 }|/|S| = 25/100 = 1/4.
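The geometric answer 1/4 is easy to confirm by simulation. A minimal sketch (not part of the original solution), assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1_000_000
x = rng.uniform(-5, 5, N)      # independent U(-5, 5) coordinates
y = rng.uniform(-5, 5, N)
print(np.mean(np.abs(x - y) > 5))   # should be close to 0.25
```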

9.91 We have f_{X,Y}(x, y) = 1 if 0 < x < 1 and 0 < y < 1, and f_{X,Y}(x, y) = 0 otherwise; and, from Example 9.7 on page 510, f_X(x) = 1 if 0 < x < 1, and f_X(x) = 0 otherwise; and f_Y(y) = 1 if 0 < y < 1, and f_Y(y) = 0 otherwise.

Hence, we see that the function f defined on R2 by f (x, y) = fX (x)fY (y) is a joint PDF of X and Y . Consequently, by Proposition 9.9 on page 523, X and Y are independent random variables.


9.92 From the solution to Exercise 9.65, f_Y(y) = (4/π)√(1 − y^2) if 0 < y < 1, and f_Y(y) = 0 otherwise. In particular, we note that Y does not have a uniform distribution. However, we also know from the solution to Exercise 9.65 that Y|_{X=x} ∼ U(0, √(1 − x^2)) for −1 < x < 1. It now follows from Proposition 9.10(a) on page 527 that X and Y are not independent random variables.

9.93 By assumption, Y|_{X=x} ∼ N(ρx, 1 − ρ^2) for each x ∈ R. From Example 9.13(b) on page 517, Y ∼ N(0, 1). From Proposition 9.10(a), X and Y are independent if and only if each conditional PDF of Y given X = x is a PDF of Y, which is the case if and only if ρ = 0.

9.94 We have
f_{X,Y}(x, y) = f_X(x) f_Y(y) = (1/2) · (1/2) = 1/4
if −1 < x < 1 and −1 < y < 1, and f_{X,Y}(x, y) = 0 otherwise. The roots of the quadratic equation t^2 + Xt + Y = 0 are real if and only if X^2 − 4Y ≥ 0. Applying the FPF now yields
P(X^2 − 4Y ≥ 0) = ∫∫_{x^2−4y≥0} f_{X,Y}(x, y) dx dy = (1/4) ∫_{−1}^{1} (∫_{−1}^{x^2/4} dy) dx = 13/24.
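A quick simulation sketch of the real-roots probability (not part of the original solution), assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1_000_000
x = rng.uniform(-1, 1, N)      # the coefficient X
y = rng.uniform(-1, 1, N)      # the coefficient Y
print(np.mean(x**2 - 4 * y >= 0), 13 / 24)   # both about 0.5417
```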

9.95 a) We have fX (x) =

-

fY (y) =

-

b) We have



−∞ ∞

−∞

fX,Y (x, y) dy =

-

fX,Y (x, y) dx =

-



−∞ ∞

−∞

g(x)h(y) dy =

*-

g(x)h(y) dx =

*-

c) From Proposition 8.6(b) on page 423 and part (a), / , *- ∞ - ∞ .*- ∞ 1= fX (x) dx = h(y) dy g(x) dx = −∞

−∞

−∞

d) By assumption and parts (a)–(c), fX,Y (x, y) = g(x)h(y) = .*=



−∞

∞ −∞ ∞



, *-

g(s) ds /.*, h(t) dt g(x) −∞





−∞

−∞

,

g(x) dx h(y).

−∞



g(x) dx

−∞

*-

, h(y) dy g(x).

, *-



,

h(y) dy .

−∞

,

h(t) dt g(x)h(y) / ,

g(s) ds h(y) = fX (x)fY (y).

Hence, from Proposition 9.9 on page 523, X and Y are independent random variables. 9.96 No, not necessarily. From Exercise 9.95(c), *- ∞ g(x) dx = −∞

∞ −∞

h(y) dy

,−1

.

Referring to Proposition 8.6(b) on page 423 and Exercise 9.95(a), we see that g is a PDF of X if and only if ∫_{−∞}^{∞} h(y) dy = 1, which, in turn, implies that h is a PDF of Y by Exercise 9.95(b). Thus, we see that g and h are marginal PDFs of X and Y, respectively, if and only if ∫_{−∞}^{∞} h(y) dy = 1 or, equivalently, if and only if ∫_{−∞}^{∞} g(x) dx = 1.

9.97 a) From Equation (9.20) on page 498, a joint PDF of X and Y is ! "2−2 = 2f (x)f (y) fX,Y (x, y) = 2 · (2 − 1)f (x)f (y) F (y) − F (x)

if x < y, and fX,Y (x, y) = 0 otherwise. b) No. Although at first glance it might appear that the joint PDF in part (a) can be factored into a function of x alone and a function of y alone, that isn’t the case. Indeed, fX,Y (x, y) = 2f (x)f (y)I{ (x,y): x 0 and r > 0, -FX,R (x, r) = P (X ≤ x, R ≤ r) = P (X ≤ x, Y − X ≤ r) = fX,Y (s, t) ds dt =

-

x 0

*-

s+r

2f (s)f (t) dt s

,

ds =

-

x 0

*-

s≤x t−s≤r

r 0

, 2f (s)f (u + s) du ds.

Therefore, from Proposition 9.4 on page 499, 9 2f (x)f (r + x), if x > 0 and r > 0; fX,R (x, r) = 0, otherwise. b) Applying Proposition 9.7 on page 510, the result of part (a), and making the substitution u = r + x, we get - ∞ - ∞ - ∞ fX (x) = fX,R (x, r) dr = 2f (x)f (r + x) dr = 2f (x) f (r + x) dr −∞ 0 0 - ∞ ! " = 2f (x) f (u) du = 2f (x) 1 − F (x) x

if x > 0, and fX (x) = 0 otherwise. Also, - ∞ fR (r) = fX,R (x, r) dx =

0

−∞



2f (x)f (r + x) dx = 2

-

0



f (x)f (r + x) dx

if r > 0, and fR (r) = 0 otherwise. c) Assume Xk ∼ E(λ) for k = 1, 2. Then, from part (a),

fX,R (x, r) = 2f (x)f (r + x) = 2λe−λx λe−λ(r+x) = 2λ2 e−λ(r+2x)

if x > 0 and r > 0, and fX,R (x, r) = 0 otherwise. Also, from part (b), ! " fX (x) = 2f (x) 1 − F (x) = 2λe−λx e−λx = 2λe−2λx if x > 0, and fX (x) = 0 otherwise. Moreover, - ∞ - ∞ −λx −λ(r+x) 2 −λr fR (r) = 2 f (x)f (r + x) dx = 2 λe λe dx = 2λ e 0

0

∞ 0

e−2λx dx = λe−λr

! "! " if r > 0, and fR (r) = 0 otherwise. Because 2λ2 e−λ(r+2x) = 2λe−2λx λe−λr , it follows from Proposition 9.9 on page 523 that X and R are independent random variables.


d) No! The random variables X and R aren’t independent, for example, if the common probability distribution of X1 and X2 is uniform on the interval (0, 1). In that case, fX,R (x, r) = 2f (x)f (r + x) = 2 if 0 < r < 1 − x and 0 < x < 1, and fX,R (x, r) = 0 otherwise. Also, ! " fX (x) = 2f (x) 1 − F (x) = 2(1 − x) if 0 < x < 1, and fX (x) = 0 otherwise. Moreover, fR (r) = 2

-



0

f (x)f (r + x) dx = 2

-

1−r 0

1 dx = 2(1 − r)

if 0 < r < 1, and f_R(r) = 0 otherwise. We can now proceed in several ways to show that, in this case, X and R are not independent. One way is to argue as in Example 9.14 on page 524, except use the region S = { (x, r) : 1 − x < r < 1, 0 < x < 1 }. Indeed, let f be the function defined on R^2 by f(x, r) = f_X(x) f_R(r). Then f(x, r) = 4(1 − x)(1 − r) if 0 < x < 1 and 0 < r < 1, and f(x, r) = 0 otherwise. We see from the joint PDF of X and R that the probability is 0 that the random point (X, R) falls in S. If f were a joint PDF of X and R, then its double integral over S would also be 0, which it clearly isn't. Hence, f is not a joint PDF of X and R, which, by Proposition 9.9, means that X and R aren't independent.

9.101 Let X and Y denote the bid amounts by the two insurers. We want to find P(|X − Y| < 20). By assumption, X and Y are independent U(2000, 2200) random variables. Thus, a joint PDF of X and Y is given by
f_{X,Y}(x, y) = f_X(x) f_Y(y) = (1/200) · (1/200) = 1/40,000
if 2000 < x < 2200 and 2000 < y < 2200, and f_{X,Y}(x, y) = 0 otherwise. From the FPF and the complementation rule,
P(|X − Y| < 20) = 1 − P(|X − Y| ≥ 20) = 1 − ∫∫_{|x−y|≥20} f_{X,Y}(x, y) dx dy
= 1 − 2 ∫_{2020}^{2200} (∫_{2000}^{x−20} (1/40,000) dy) dx = 1 − (1/20,000) ∫_{2020}^{2200} (x − 2020) dx = 1 − 0.81 = 0.19.
Note that we can also obtain the required probability by using the fact that we have a uniform probability model on the square S = (2000, 2200) × (2000, 2200). Then,
P(|X − Y| < 20) = |{|X − Y| < 20}|/|S| = |{ (x, y) : |x − y| < 20 } ∩ S|/|S| = (|S| − |{ (x, y) : |x − y| ≥ 20 } ∩ S|)/|S| = 1 − 2 · ((180)(180)/2)/((200)(200)) = 0.19.
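A minimal simulation sketch of the bid problem (not part of the original solution), assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 1_000_000
x = rng.uniform(2000, 2200, N)   # first insurer's bid
y = rng.uniform(2000, 2200, N)   # second insurer's bid
print(np.mean(np.abs(x - y) < 20))   # should be close to 0.19
```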


9.102 Let X and Y denote the waiting times for the first claim by a good driver and a bad driver, respectively. By assumption, X and Y are independent random variables with X ∼ E(1/6) and Y ∼ E(1/3).
a) We have
P(X ≤ 3, Y ≤ 2) = P(X ≤ 3)P(Y ≤ 2) = (1 − e^{−(1/6)·3})(1 − e^{−(1/3)·2}) = 1 − e^{−2/3} − e^{−1/2} + e^{−7/6} ≈ 0.191.
b) By independence,
f_{X,Y}(x, y) = f_X(x) f_Y(y) = (1/6)e^{−(1/6)x} · (1/3)e^{−(1/3)y} = (1/18) e^{−(1/6)x} e^{−(1/3)y}
if x > 0 and y > 0, and f_{X,Y}(x, y) = 0 otherwise. Applying the FPF now gives
P(X < Y) = ∫∫_{x<y} f_{X,Y}(x, y) dx dy = 1/3.
c) By independence, P(Y > 4 | X = 4) = P(Y > 4) = e^{−(1/3)·4} ≈ 0.264.

9.103 a) By symmetry, each of the 3! possible orderings of X, Y, and Z is equally likely. Consequently, we have P(X < Y < Z) = 1/3! = 1/6.
b) From independence, f_{X,Y,Z}(x, y, z) = f_X(x) f_Y(y) f_Z(z) = f(x)f(y)f(z). Applying the FPF and making the successive substitutions u = F(y) and v = F(z), we get

---

fX,Y,Z (x, y, z) dx dy dz

x Y + 1/3). We evaluate the required probability by using the FPF and by first making the substitution u = 2/3 − y and then the substitution v = 1/3 − x, thus: 6P (Y > X + 1/3, Z > Y + 1/3) --=6 fX,Y,Z (x, y, z) dx dy dz = 6

0

y>x+1/3 z>y+1/3

=6 =3

-

1/3

0

-

0

1/3

.-

1/3

/

2/3 x+1/3

(2/3 − y) dy dx = 6 2

(1/3 − x) dx = 3

-

0

1/3

-

v 2 dv =

1/3

0

1 . 27

.-

.-

2/3 x+1/3

*-

1/3−x

1 y+1/3

/

, / 1 dz dy dx

u du dx 0


b) We must have 2d ≤ 1, or d ≤ 1/2.
c) For d ≤ 1/2, the required probability is, by symmetry, 3! times P(Y > X + d, Z > Y + d). We evaluate the required probability by using the FPF and by first making the substitution u = 1 − d − y and then the substitution v = 1 − 2d − x, thus:
3! P(Y > X + d, Z > Y + d) = 6 ∫∫∫_{y>x+d, z>y+d} f_{X,Y,Z}(x, y, z) dx dy dz = 6 ∫_0^{1−2d} ∫_{x+d}^{1−d} (∫_{y+d}^{1} 1 dz) dy dx
= 6 ∫_0^{1−2d} ∫_{x+d}^{1−d} (1 − d − y) dy dx = 6 ∫_0^{1−2d} (∫_0^{1−2d−x} u du) dx = 3 ∫_0^{1−2d} (1 − 2d − x)^2 dx = 3 ∫_0^{1−2d} v^2 dv = (1 − 2d)^3.

9.106 a) Let X = Xk and let Y = minj ̸=k {Xj , 1 ≤ j ≤ n}. From Proposition 6.13 on page 297, X and Y are ! "n−2 independent and, from Example 9.9(a) on page 512, fY (y) = (n − 1)f (y) 1 − F (y) . We evaluate the required probability by using the FPF and by first making the substitution u = 1 − F (y) and then the substitution v = 1 − F (x), thus: P (Xk = min{X1 , . . . , Xn }) = P (X < Y ) --= fX,Y (x, y) dx dy = fX (x)fY (y) dx dy xy

f (x)(n − 1)f (y) F (y)

x>y - ∞

f (x)

−∞

-



−∞

-

!



−∞

1 = . n

*-

x

−∞

f (x)

.-

0

"n−2 !

dx dy "n−2

(n − 1)f (y) F (y) /

F (x)

,

dy dx

(n − 1)un−2 du dx

! "n−1 f (x) F (x) dx =

-

1

v n−1 dv

0

b) By symmetry, each of the Xj s, of which there are n, is equally likely to be the maximum of the random sample. Hence, the probability that the maximum is Xk equals 1/n.

Theory Exercises 9.108 Continuous random variables, X1 , . . . , Xm , with a joint PDF are independent if and only if the function f defined on Rm by f (x1 , . . . , xm ) = fX1 (x1 ) · · · fXm (xm ) is a joint PDF of X1 , . . . , Xm . We write this condition symbolically as fX1 ,...,Xm (x1 , . . . , xm ) = fX1 (x1 ) · · · fXm (xm ),

x1 , . . . , xm ∈ R.

To prove this result, we proceed as in the proof of the bivariate case. Suppose that X1 , . . . , Xm are independent random variables. Then, for each x1 , . . . , xm ∈ R, FX1 ,...,Xm (x1 , . . . , xm ) = P (X1 ≤ x1 , . . . , Xm ≤ xm ) = P (X1 ≤ x1 ) · · · P (Xm ≤ xm ) *- x1 , *- xm , = fX1 (s1 ) ds1 · · · fXm (sm ) dsm =

-

−∞ x1

−∞

···

-

xm

−∞

−∞

fX1 (s1 ) · · · fXm (sm ) ds1 · · · dsm .

Consequently, from the multivariate analogue of Proposition 9.4 on page 499 [see Exercise 9.29(c)], the function f defined on Rm by f (x1 , . . . , xm ) = fX1 (x1 ) · · · fXm (xm ) is a joint PDF of X1 , . . . , Xm . Conversely, suppose that the function f defined on Rm by f (x1 , . . . , xm ) = fX1 (x1 ) · · · fXm (xm ) is a joint PDF of X1 , . . . , Xm . Let A1 , . . . , Am be any m subsets of real numbers. Applying the FPFs for


multivariate and univariate continuous random variables, we get P (X1 ∈ A1 , . . . , Xm ∈ Am )

" ! = P (X1 , . . . , Xm ) ∈ A1 × · · · × Am =

= =

-

···

-

A1 ×···×Am

*-

A1

-

fX1 (x1 ) dx1 · · ·

*-

Am

f (x1 , . . . , xm ) dx1 · · · dxm

-

.

A1 ×···×Am

fX1 (x1 ) · · · fXm (xm ) dx1 · · · dxm = ,

-

···

fXm (xm ) dxm

,

Am

···

*-

A1

,

/

fX1 (x1 ) dx1 · · · fXm (xm ) dxm

= P (X1 ∈ A1 ) · · · P (Xm ∈ Am ).

Thus X1 , . . . , Xm are independent random variables. 9.109 a) Using independence, FX,Y (x, y) = P (X ≤ x, Y ≤ y) = P (X ≤ x)P (Y ≤ y) , *- y , - x *- x = fX (s) ds fY (t) dt = −∞

−∞

y

−∞ −∞

fX (s)fY (t) ds dt.

Hence, from Proposition 9.4 on page 499, the function f defined on R2 by f (x, y) = fX (x)fY (y) is a joint PDF of X and Y . b) No. Let X and Y be the x and y coordinates, respectively, of a point selected at random from the diagonal of the unit square, that is, from { (x, y) ∈ R2 : y = x, 0 < x < 1 }. Then both X and Y are uniform on the interval (0, 1) and hence each has a PDF. However, X and Y doesn’t have a joint PDF because P (X = Y ) = 1 > 0. c) No. In the petri-dish illustration of Example 9.14 on page 524, X and Y have a joint PDF but aren’t independent. 9.110 a) Using the FPF and independence, and then making the substitution u = y + x, we get FX+Y (z) = P (X + Y ≤ z) =

--

fX,Y (x, y) dx dy =

x+y≤z

= =

-



−∞

-

z

−∞

**-

z−x −∞ ∞

−∞

--

fX (x)fY (y) dx dy

x+y≤z

,

fY (y) dy fX (x) dx = ,

-

∞ −∞

*-

z −∞

, fY (u − x) du fX (x) dx

fX (x)fY (u − x) dx du.

Therefore, from Proposition 8.5 on page 422, fX+Y (z) =

-



−∞

fX (x)fY (z − x) dx.


b) We can proceed as in part (a) to find the PDF of X − Y . Alternatively, we note that

F−Y (y) = P (−Y ≤ y) = P (Y ≥ −y) = 1 − P (Y < −y) = 1 − FY (−y)

so that f−Y (y) = fY (−y). Now applying the result of part (a) with Y replaced by −Y , we get - ∞ fX (x)fY (x − z) dx. fX−Y (z) = −∞

Advanced Exercises 9.111 We begin by noting that |S| = |A||B|; Sx = B for x ∈ A, and Sx = ∅ otherwise; S y = A for y ∈ B, and S y = ∅ otherwise. From Exercise 9.40, a joint PDF of X and Y is fX,Y (x, y) =

1 1 = , |S| |A||B|

x ∈ A, y ∈ B,

and fX,Y (x, y) = 0 otherwise. From Exercise 9.81, fX (x) = and fX (x) = 0 otherwise; and fY (y) =

|Sx | |B| 1 = = , |S| |A||B| |A|

x ∈ A,

|S y | |A| 1 = = , |S| |A||B| |B|

y ∈ B,

and fY (y) = 0 otherwise. From the three displays, we conclude that the function f defined on R2 by f (x, y) = fX (x)fY (y) is a joint PDF of X and Y . Hence, by Proposition 9.9 on page 523, the random variables X and Y are independent. 9.112 Suppose that X and Y are independent. Then, P (a ≤ X ≤ b, Y = y) = P (a ≤ X ≤ b)P (Y = y) =

*-

,

b

a

fX (x) dx pY (y) =

-

b

fX (x)pY (y) dx.

a

Thus the function h defined on R2 by h(x, y) = fX (x)pY (y) is a joint PDF/PMF of X and Y . Conversely, suppose that the function h defined on R2 by h(x, y) = fX (x)pY (y) is a joint PDF/PMF of X and Y . Let A and B be two subsets of real numbers. Then 88 P (X ∈ A, Y = y) = hX,Y (x, y) dx P (X ∈ A, Y ∈ B) = =

y∈B

y∈B A

8-

*-

y∈B A

fX (x)pY (y) dx =

= P (X ∈ A)P (Y ∈ B).

fX (x) dx A

, .8 y∈B

/

pY (y)

Hence, X and Y are independent random variables.

9.113 From Exercise 9.59, X ∼ E(λ) and Y has the Pascal distribution (as defined in Exercise 7.26) with parameter 1/λ. Thus, ⎧ (1/λ)y 9 ⎨ −λx , y = 0, 1, . . . ; , if x > 0; fX (x) = λe and pY (y) = (1 + 1/λ)y+1 ⎩ 0, otherwise. 0, otherwise.


It appears that the function h defined on R2 by h(x, y) = fX (x)pY (y) is not a joint PDF/PMF of X and Y and, hence, in view of Exercise 9.112, that X and Y are not independent random variables. To verify, note, for instance, that - ∞ - ∞ hX,Y (x, 0) dx = λe−(1+λ)x dx P (X > 1, Y = 0) = 1

1

λ −(1+λ) 1 = e = e−1 e−λ 1+λ 1 + 1/λ 1 ̸= e−λ = P (X > 1)P (Y = 0). 1 + 1/λ

9.114 From Exercise 9.84, X ∼ E(1) and Y ∼ B(1, 1/2). Hence, 9 9 1/2, e−x , if x > 0; and pY (y) = fX (x) = 0, 0, otherwise. Note that

⎧ ⎨ 1 e−x , fX (x)pY (y) = 2 ⎩ 0,

if x > 0 and y = 0 or 1; otherwise.

if y = 0 or 1; otherwise.

= hX,Y (x, y).

Thus, the function h defined on R2 by h(x, y) = fX (x)pY (y) is a joint PDF/PMF of X and Y and, hence, in view of Exercise 9.112, X and Y are independent random variables.

9.6 Functions of Two or More Continuous Random Variables Basic Exercises 9.115 a) By independence, 1 1 1 2 2 2 2 2 2 2 fX,Y (x, y) = fX (x)fY (y) = √ e−(x +y )/2σ e−x /2σ · √ e−y /2σ = 2 2πσ 2πσ 2πσ for all x, y ∈ R. For r > 0, we use the FPF and change to polar coordinates to get -(& ) FR (r) = P (R ≤ r) = P X2 + Y 2 ≤ r = fX,Y (x, y) dx dy √ 1 = 2πσ 2 √ =

1 σ2

-

r 0

x 2 +y 2 ≤r

--

e

−(x 2 +y 2 )/2σ 2

1 dx dy = 2πσ 2

x 2 +y 2 ≤r 2 /2σ 2

e−u

u du.

Therefore, from Propostion 8.5 on page 422, fR (r) = if r > 0, and fR (r) = 0 otherwise.

r −r 2 /2σ 2 e σ2

-

2π 0

*-

r

e 0

−u2 /2σ 2

u du

,



9.6

Functions of Two or More Continuous Random Variables

9-55

! " b) Because X ∼ N 0, σ 2 , we know that X/σ ∼ N (0, 1) and, hence, that X2 /σ 2 has the chi-square distribution with one degree of freedom, as does Y 2 /σ 2 . Referring now to the second bulleted item on page 537, we conclude that W = (X2 + Y 2 )/σ 2 has the chi-square distribution with two degrees of freedom: (1/2)(2/2) (2/2)−1 −w/2 1 e = e−w/2 fW (w) = w '(2/2) 2 √ if w > 0, and fW (w) =√0 otherwise. Noting that R = σ W , we apply the univariate transformation theorem with g(w) = σ w to conclude that & 1 1 2 r 2 /σ 2 1 −r 2 /2σ 2 r 1 −w/2 2 2 fR (r) = ′ fW (w) = = · e = 2 e−r /2σ √ · e σ 2 |g (w)| σ σ/2 w 2

if r > 0, and fR (r) = 0 otherwise.
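The Rayleigh-type result for R = √(X^2 + Y^2) is convenient to verify by simulation. A minimal sketch (not part of the original solution), assuming NumPy is available; σ = 2 and r = 3 are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(7)
sigma, r0, N = 2.0, 3.0, 1_000_000
x = rng.normal(0, sigma, N)
y = rng.normal(0, sigma, N)
r = np.hypot(x, y)                       # R = sqrt(X^2 + Y^2)
print(np.mean(r <= r0), 1 - np.exp(-r0**2 / (2 * sigma**2)))   # both about 0.675
```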

9.116 a) Let Z = X + Y . Referring to Equation (9.44) on page 534 and then to Equation (9.20), we get - ∞ - z/2 ! "n−2 fZ (z) = fX,Y (x, z − x) dx = n(n − 1) f (x)f (z − x) F (z − x) − F (x) dx. −∞

−∞

Noting that M = Z/2, we apply the univariate transformation theorem with g(z) = z/2 to conclude that fM (m) =

1

fZ (z) |g ′ (z)|

= 2n(n − 1)

-

= m

−∞

1 fZ (z) = 2fZ (2m) |1/2| ! "n−2 f (x)f (2m − x) F (2m − x) − F (x) dx.

b) For a U(0, 1) distribution, # 0, if t < 0; F (t) = t, if 0 ≤ t < 1; 1, if t ≥ 1.

and

f (t) =

9

1, 0,

if 0 < t < 1; otherwise.

Referring now to the formula for fM in part (a), we note that the integrand is nonzero if and only if 0 < x < 1 and 0 < 2m − x < 1. Keeping in mind that, in this case, 0 < m < 1, we see that the integrand is nonzero if and only if x > max{2m − 1, 0}. For convenience, let a = max{2m − 1, 0}. Then, upon making the substitution u = m − x, we get - m ! "n−2 fM (m) = 2n(n − 1) (2m − x) − x dx a - m−a un−2 du = n2n−1 (m − a)n−1 . = 2n−1 n(n − 1) 0

Now,

m − a = m − max{2m − 1, 0} = Therefore,

9

m, if m < 1/2; 1 − m, if m > 1/2.

⎧ ⎨ n(2m)n−1 , if 0 < m < /2; fM (m) = n(2 − 2m)n−1 , if 1/2 < m < 1; ⎩ 0, otherwise.


9.117 Let Y denote system lifetime, so that Y = max{X1 , . . . , Xn }. Then FY (y) = 0 if y < 0 and, for y > 0, FY (y) = P (Y ≤ y) = P (max{X1 , . . . , Xn } ≤ y) = P (X1 ≤ y, . . . , Xn ≤ y) n ! C " = P (X1 ≤ y) · · · P (Xn ≤ y) = 1 − e−λk y . k=1

Differentiation now yields fY (y) =

FY′ (y)

, *C , ,* n n *C ! n ! 8 " " 8 λk e−λk y −λj y −λk y −λk y = = 1−e λk e 1−e 1 − e−λk y k=1 j ̸=k k=1 k=1

if y > 0, and fY (y) = 0 otherwise.

9.118 Let X and Y denote the lifetimes of the original component and the spare, respectively. By assumption, both X and Y have an E(λ) distribution and, furthermore, we can assume that they are independent random variables. Thus, fX,Y (x, y) = fX (x)fY (y) = λe−λx · λe−λy = λ2 e−λ(x+y) if x > 0 and y > 0, and fX,Y (x, y) = 0 otherwise. We want to find the CDF of the random variable R = X/(X + Y ). For 0 < r < 1, * , -! " X fX,Y (x, y) dx dy ≤ r = P Y ≥ (1 − r)X/r = FR (r) = P (R ≤ r) = P X+Y y≥(1−r)x/r

=

-

0



∞ *-

-



0



2 −λ(x+y)

λ e

(1−r)x/r

,

dy dx =

e−λ(1−r)x/r e−λx dx = λ

-



0

-

0

∞ *-



λe

−λy

,

dy λe−λx dx

(1−r)x/r

e−(λ/r)x dx = r.

Thus, R ∼ U(0, 1). 9.119 From the fact that X and Y are independent E(λ) random variables, we get, for z > 0, --fX,Y (x, y) dx dy = fX (x)fY (y) dx dy FX−Y (z) = P (X − Y ≤ z) = x−y≤z
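The conclusion R = X/(X + Y) ∼ U(0, 1) is also easy to check numerically. A minimal sketch (not part of the original solution), assuming NumPy is available; note that NumPy parametrizes the exponential distribution by its mean 1/λ.

```python
import numpy as np

rng = np.random.default_rng(8)
lam, N = 0.5, 1_000_000
x = rng.exponential(1 / lam, N)
y = rng.exponential(1 / lam, N)
r = x / (x + y)
print(np.quantile(r, [0.25, 0.5, 0.75]))  # should be close to [0.25, 0.5, 0.75]
```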

= =

-

-

0

0

∞ *- y+z 0



x−y≤z

,

λe−λx dx λe−λy dy =

λe−λy dy − λe−λz

-

∞ 0

-

0

∞(

) 1 − e−λ(y+z) λe−λy dy

1 e−2λz dy = 1 − e−λz . 2

Therefore, fX−Y (z) = (λ/2)e−λz if z > 0. We can do a similar computation to determine fX−Y (z) when z < 0. Alternatively, we can use symmetry, as discussed in Exercise 8.204 on page 482. Thus, for z < 0, we have fX−Y (z) = fX−Y (−z) = (λ/2)e−λ(−z) = (λ/2)e−λ|z| . Consequently, fX−Y (z) = (λ/2)e−λ|z| for all z ∈ R.


Note: In the following graph, we use f (z) instead of fX−Y (z).

9.120 a) From the FPF, --

P (X + Y ≤ 1) =

fX,Y (x, y) dx dy =

x+y≤1

= =

-

0

-

0

1

.-

1−x

e 0

/

−y

1(

-

1 0

.-

dy e−x dx = )

-

1−x

e 0

0

−(x+y)

/

dy dx

1(

) 1 − e−(1−x) e−x dx

e−x − e−1 dx = 1 − 2e−1 .

e−x e−y

for x > 0 and y > 0, and fX,Y (x, y) = 0 otherwise. It follows b) We note that fX,Y (x, y) = easily that X and Y are independent E(1) random variables or, equivalently, independent '(1, 1) random variables. Therefore, by Proposition 9.13, X + Y ∼ '(2, 1). Applying now Equation (8.49) with r = 2 and λ = 1 gives P (X + Y ≤ 1) = FX+Y (1) = 1 − e−1 (1 + 1) = 1 − 2e−1 . 9.121 Let X and Y denote the times, in days, until the next basic and deluxe policy claims, respectively. By assumption, X and Y are independent E(1/2) and E(1/3) random variables. a) From the FPF and independence, --P (Y < X) = fX,Y (x, y) dx dy = fX (x)fY (y) dx dy y 0, and fY /X (z) = 0 otherwise. A b) Let U = Y/X. We have Z = X/(X + Y ) = 1 (1 + Y/X) = 1/(1 + U ). Applying the univariate transformation theorem with g(u) = 1/(1 + u) and referring to part (a), we get fZ (z) = =

! " 1 1 @ @ fU (u) = (1 + u)2 fU (u) = 1 + (1/z − 1) 2 fU (1/z − 1) f (u) = U @−1/(1 + u)2 @ |g ′ (u)| 1 1 (1/z − 1)β−1 1 zα−1 (1 − z)β−1 ! "α+β = 2 B(α, β) z B(α, β) (1/z − 1) + 1

if 0 < z < 1, and fZ (z) = 0 otherwise. Thus, Z has the beta distribution with parameters α and β. c) In Exercise 9.118, X and Y are independent E(λ) random variables or, equivalently, '(1, λ) random variables. Applying part (b), with α = β = 1, we find that fZ (z) = 1/B(1, 1) = 1 if 0 < z < 1, and fZ (z) = 0 otherwise. Thus, in this case, Z = X/(X + Y ) ∼ U(0, 1).

9.134 a) Let U = W + X. From Example 9.19 on page 535, U ∼ T (0, 2) and, from Proposition 6.13 on page 297, U and Y are independent. Therefore, from Equation (9.46) on page 534, - ∞ fU (u)fY (t − u) du. fU +Y (t) = −∞

The integrand is nonzero if and only if 0 < u < 2 and 0 < t − u < 1; that is, if and only if 0 < u < 2 and t − 1 < u < t. Therefore, - min{2,t} fU (u) du. fU +Y (t) = max{0,t−1}

Now,

max{0, t − 1} = Consequently,

9

0, t − 1,

⎧$t ⎪ ⎪ 0 fU (u) du, ⎨ $t fU +Y (t) = t−1 fU (u) du, ⎪ ⎪ $ ⎩ 2 t−1 fU (u) du,

if t ≤ 1; if t > 1.

and

min{2, t} =

9

t, if t ≤ 2; 2, if t > 2.

⎧$t u du, if 0 < t ≤ 1; ⎪ ⎪ ⎨ $0 $t 1 if 1 < t ≤ 2; = t−1 u du + 1 (2 − u) du, if 1 < t ≤ 2; ⎪ ⎪ ⎩$2 if 2 < t ≤ 3. if 2 < t ≤ 3. t−1 (2 − u) du, if 0 < t ≤ 1;


Evaluating the three integrals, we get ⎧ 2 t /2, ⎪ ⎪ ⎪ ⎨ 2 −t + 3t − 3/2, fW +X+Y (t) = ⎪ t 2 /2 − 3t + 9/2, ⎪ ⎪ ⎩ 0,

if 0 < t ≤ 1;

if 1 < t ≤ 2; if 2 < t ≤ 3; otherwise.

b) Let V = W + X + Y , whose PDF is given in part (a). From Proposition 6.13 on page 297, V and Z are independent. Therefore, from Equation (9.46), fV +Z (t) =

-



−∞

fV (v)fY (t − v) dv.

The integrand is nonzero if and only if 0 < v < 3 and 0 < t − v < 1; that is, if and only if 0 < u < 3 and t − 1 < v < t. Therefore, fV +Z (t) =

-

min{3,t}

max{0,t−1}

fV (v) dv.

Now, max{0, t − 1} =

9

0, t − 1,

if t ≤ 1; if t > 1.

and

min{3, t} =

9

t, if t ≤ 3; 3, if t > 3.

Consequently, ⎧$t ⎪ ⎪ 0 fV (v) dv, ⎪ ⎪ $t ⎪ ⎨ t−1 fV (v) dv, fV +Z (t) = $ t ⎪ ⎪ ⎪ t−1 fV (v) dv, ⎪ ⎪ ⎩ $ 3 f (v) dv, t−1 V

if 0 < t ≤ 1; if 1 < t ≤ 2;

if 2 < t ≤ 3; if 3 < t ≤ 4.

⎧$t 2 (v /2) dv, ⎪ ⎪ ⎪ $0 ⎪ $t ⎪ 1 2 2 ⎨ t−1 (v /2) dv + 1 (−v + 3v − 3/2) dv, = $2 $t ⎪ (−v 2 + 3v − 3/2) dv + 2 (v 2 /2 − 3v + 9/2) dv, ⎪ t−1 ⎪ ⎪$ ⎪ ⎩ 3 2 t−1 (v /2 − 3v + 9/2) dv,

if 0 < t ≤ 1;

if 1 < t ≤ 2;

if 2 < t ≤ 3;

if 3 < t ≤ 4.

Evaluating the four integrals, we get

⎧ 3 t /6, ⎪ ⎪ ⎪ ⎪ 3 2 ⎪ ⎨ −t /2 + 2t − 2t + 2/3, fW +X+Y +Z (t) = t 3 /2 − 4t 2 + 10t − 22/3, ⎪ ⎪ ⎪ ⎪ −t 3 /6 + 2t 2 − 8t + 32/3, ⎪ ⎩ 0,

if 0 < t ≤ 1;

if 1 < t ≤ 2;

if 2 < t ≤ 3; if 3 < t ≤ 4; otherwise.


c) Following are graphs of the PDFs of W , W + X, W + X + Y , and W + X + Y + Z.

Theory Exercises 9.135 a) Using the FPF and making the substitution u = xy, we get FXY (z) = P (XY ≤ z) = =

-

0

−∞

*-

∞ z/x

--

fX,Y (x, y) dx dy

xy≤z

,

fX,Y (x, y) dy dx +

-

0

∞ *- z/x −∞

,

fX,Y (x, y) dy dx

*- −∞ , , - ∞ *- z 1 1 = fX,Y (x, u/x) du dx + fX,Y (x, u/x) du dx x −∞ x z 0 −∞ , *- z , , - 0 * - ∞ *- z 1 1 = − fX,Y (x, u/x) du dx + fX,Y (x, u/x) du dx x x −∞ −∞ 0 −∞ / - ∞ - z .- 0 1 1 fX,Y (x, u/x) dx + fX,Y (x, u/x) dx du. = |x| −∞ −∞ |x| 0 , - z *- ∞ 1 = fX,Y (x, u/x) dx du. −∞ −∞ |x| -

0

Therefore, from Proposition 8.5 on page 422, fXY (z) =



−∞

1 fX,Y (x, z/x) dx. |x|

b) If X and Y are independent, then we can replace the joint PDF in the preceding display by the product of the marginal PDFs, thus: - ∞ 1 fX (x)fY (z/x) dx. fXY (z) = −∞ |x|


c) From part (b), fXY (z) =

-



−∞

1 fX (x)fY (z/x) dx = |x|

if 0 < z < 1, and fXY (z) = 0 otherwise.

-

1 z

1 · 1 · 1 dx = − ln z x

9.136 Proceeding by induction, we assume that Proposition 9.13 holds for m − 1 and must establish the result for m. From Proposition 6.13 on page 297 and the induction assumption, X1 + · · · + Xm−1 and Xm are independent random variables with distributions '(α1 + · · · + αm−1 , λ) and '(αm , λ), respectively. Hence, from Example 9.20(a) on page 536, X1 + · · · + Xm = (X1 + · · · + Xm−1 ) + Xm ! " ∼ ' (α1 + · · · + αm−1 ) + αm , λ = '(α1 + · · · + αm , λ).

9.137 Proceeding by induction, we assume that Proposition 9.14 holds for m − 1 and must establish the result for m. From Proposition 6.13 on page 297, the induction assumption, and Proposition a + b1 X1 + · "· · + bm−1 b"m Xm are independent ! 8.15 on page 468, the random 2variables ! Xm−1 and 2 2 2 2 2 N a + b1 µ1 + · · · + bm−1 µm−1 , b1 σ1 + · · · + bm−1 σm−1 and N bm µm , bm σm , respectively. Hence, from Example 9.22 on page 539, a + b1 X1 + · · · + bm Xm

= (a + b1 X1 + · · · + bm−1 Xm−1 ) + bm Xm (! ) ! " " 2 2 2 2 ∼ N a + b1 µ1 + · · · + bm−1 µm−1 + bm µm , b12 σ12 + · · · + bm−1 σm−1 σm . + bm " ! 2 2 σm . = N a + b1 µ1 + · · · + bm µm , b12 σ12 + · · · + bm

Advanced Exercises

9.138 Let W = X + Y + Z. Applying the FPF yields, for w > 0, --FW (w) = P (W ≤ w) = P (X + Y + Z ≤ w) =

fX,Y,Z (x, y, z) dx dy dz

x+y+z≤w

, / 6 = dz dy dx (1 + x + y + z)4 0 0 0 , / - w .- w−x * 2 2 = − dy dx (1 + x + y)3 (1 + w)3 0 0 , - w* 1 1 2(w − x) − − dx = (1 + x)2 (1 + w)2 (1 + w)3 0 * , * ,3 1 w w2 w =1− + + = . 1 + w (1 + w)2 1+w (1 + w)3 - w .-

w−x

*-

w−x−y

Differentiation now yields

fX+Y +Z (w) = if w > 0, and fX+Y +Z (w) = 0 otherwise.

3w 2 (1 + w)4


9.139 a) From Equation (9.44) on page 534 and Equation (9.35), - ∞ - ∞ ! 2 " 1 2 2 & fX,Y (x, z − x) dx = e− x −2ρx(z−x)+(z−x) /2(1−ρ ) dx. fX+Y (z) = −∞ −∞ 2π 1 − ρ 2

Performing some algebra and completing the square yields

x 2 − 2ρx(z − x) + (z − x)2 = 2(1 + ρ)(x − z/2)2 + (1 − ρ)z2 /2.

Hence,

(x − z/2)2 z2 (x − z/2)2 z2 x 2 − 2ρx(z − x) + (z − x)2 = + = + . 1−ρ 4(1 + ρ) 2(1 − ρ)/2 4(1 + ρ) 2(1 − ρ)2

Therefore, using the fact that a PDF must integrate to 1, we get - ∞ ! " 1 2 −z2 /4(1+ρ) & e−(x−z/2) / 2(1−ρ)/2 dx fX+Y (z) = e 2π 1 − ρ 2 −∞ ! " & 1 1 2 2 2π(1 − ρ)/2 = √ √ = e−z /4(1+ρ) & e−z /2 2(1+ρ) . 2π 2(1 + ρ) 2π 1 − ρ 2 ! " Thus, we see that X + Y ∼ N 0, 2(1 + ρ) . b) From Equation (9.48) on page 540 and Equation (9.35), - ∞ - ∞ ! 2 " 1 2 2 |x|fX,Y (x, xz) dx = |x| & e− x −2ρx·(xz)+(xz) /2(1−ρ ) dx. fY /X (z) = −∞ −∞ 2π 1 − ρ 2

However,

( ) 2 x − 2ρx · (xz) + (xz) = 1 − 2ρz + z x 2 . " ! Therefore, using symmetry and making the substitution u = 1 − 2ρz + z2 x 2 /2(1 − ρ 2 ), we get 2

fY /X (z) =

2

-



2

! " − 1−2ρz+z2 x 2 /2(1−ρ 2 )

1 − ρ2 dx = & π 1 − ρ 2 1 − 2ρz + z2

& xe 2π 1 − ρ 2 0 & & 1 − ρ2 1 − ρ2 "= ! ". = ! π 1 − 2ρz + z2 π (1 − ρ 2 ) + (z − ρ)2

1

-



0

e−u du

Referring now to & Definition 8.11 on page 470, we conclude that Y/X has a Cauchy distribution with parameters ρ and 1 − ρ 2 .

9.140 Denote by In the interarrival time between the (n − 1)st and nth customer. By assumption, I1 , I2 , . . . are independent exponential random variables with common parameter λ. Now let Wn denote the time at which the nth customer arrives. Because Wn = I1 + · · · + In , Proposition 9.13 on page 537 implies that Wn ∼ '(n, λ). Next, we observe that {N(t) = n} = {Wn ≤ t, Wn+1 > t}. Consequently, by Equation (8.49) on page 453, for each nonnegative integer n, ! " P N(t) = n = P (Wn ≤ t, Wn+1 > t) = P (Wn ≤ t) − P (Wn+1 ≤ t) . / . / n−1 n j j 8 8 (λt) (λt) (λt)n − 1 − e−λt = e−λt . = 1 − e−λt j! j! n! j =0 j =0 Thus, N(t) ∼ P(λt).


√ 9.141 We begin by finding the PDF of Y/ν, which we call X. The required result can be obtained by using either the CDF method or the univariate transformation method. The result is fX (x) =

ν ν/2 2 x ν−1 e−νx /2 ν/2−1 2 '(ν/2)

if x > 0, and fX (x) = 0 otherwise. For convenience, let c denote the constant in the right of the preceding display. Because Z and Y are independent random variables, so are Z and X, by Proposition 6.13 on page 297. Therefore, we can now apply Equation (9.49) on page 540 and make the substitution u = x 2 to obtain - ∞ - ∞ c 2 2 fT (t) = fZ/X (t) = |x|fX (x)fZ (xt) dx = √ xx ν−1 e−νx /2 e−(xt) /2 dx 2π 0 −∞ - ∞ - ∞ ! ! " " c c 2 2 2 =√ xx ν−1 e− ν+t x /2 dx = √ u(ν+1)/2−1 e− ν+t u/2 du 2π 0 2 2π 0 ! " ! " ν/2 "−(ν+1)/2 ' (ν + 1)/2 ' (ν + 1)/2 ! 1 ν . = √ 1 + t 2 /ν ! "(ν+1)/2 = √ ν/2−1 '(ν/2) (ν + t 2 )/2 νπ '(ν/2) 2 2π 2

9.142 a) Let U = X/2 and V = Y/2. Using either the CDF method or the transformation method, we easily find that 2 2 " ". fU (u) = ! and fV (v) = ! 2 π 1 + 4u π 1 + 4v 2

We want to show that the random variable W = U + V has the standard Cauchy distribution. From Equation (9.46) on page 534, - ∞ - ∞ 4 du ! "! ". fW (w) = fU (u)fV (w − u) du = 2 π −∞ 1 + 4u2 1 + 4(w − u)2 −∞ To evaluate the integral, we break the integrand into two parts by using partial fractions: A + Bu C + Du 1 ! "! "= + . 2 2 2 1 + 4u 1 + 4(w − u)2 1 + 4u 1 + 4(w − u)

Clearing fractions and solving for A, B, C, and D yields B=

1 ! ", 2w 1 + w2

A=

1 wB, 2

C=

3 wB, 2

D = −B.

Now we make the substitutions v = 2u and " − w), respectively, in the two integrals and note ! v = 2(u 2 that A + (B/2)v + C + Dw + (D/2)v = 1 + w /2 to obtain * - ∞ , 1 A + (B/2)v 1 ∞ C + Dw + (D/2)v dv + dv 2 −∞ 1 + v 2 2 −∞ 1 + v2 - ∞ - ∞ 2 A + (B/2)v + C + Dw + (D/2)v 1 dv ! ! " " = 2 dv = π −∞ 1 + v2 π 1 + w2 −∞ π 1 + v 2

4 fW (w) = 2 π

=

1 ". ! π 1 + w2

Hence, we see that W —that is, (X + Y )/2—has the standard Cauchy distribution.


b) If c = 0 or c = 1, the result is trivial. So, assume that 0 < c < 1. For convenience, let a = c and b = 1 − c, and set U = aX and V = bY . Using either the CDF method or the transformation method, we easily find that a ! " 2 π a + u2

fU (u) =

and

fV (v) =

b ! ". 2 π b + v2

We want to show that the random variable W = U + V has the standard Cauchy distribution. From Equation (9.46) on page 534, - ∞ ab ∞ du ! "! ". fW (w) = fU (u)fV (w − u) du = 2 2 2 π −∞ a + u b2 + (w − u)2 −∞

At this point, we proceed as in part (a) to obtain the required result that W —that is, (X + Y )/2—has the standard Cauchy distribution. c) We proceed by induction. Thus, assume that (X1 + · · · + Xn )/n has the standard Cauchy distribution. By Proposition 6.13 on page 297, the random variables X1 + · · · + Xn and Xn+1 are independent. Now, * , * , X1 + · · · + Xn+1 n X1 + · · · + Xn 1 = + Xn+1 . n+1 n+1 n n+1 Noting that n/(n + 1) + 1/(n + 1) = 1, we conclude from part (b) that (X1 + · · · + Xn+1 )/(n + 1) has the standard Cauchy distribution. 9.143 By Proposition 6.13 on page 297, the random variables X and Y + Z are independent. Hence, by Equation (9.46) on page 534, - ∞ fX (x)fY +Z (w − x) dx. fX+Y +Z (w) = fX+(Y +Z) (w) −∞

9.144 a) We apply the multivariate version of the FPF and make the substitution u = x1 + · · · + xm to obtain, for w ∈ R, FX1 +···+Xm (w) = P (X1 + · · · + Xm ≤ w) = ··· fX1 ,...,Xm (x1 , . . . , xm ) dx1 · · · dxm = = =

-

∞ −∞ ∞ −∞ w −∞

.

···

.

···

*-

-

∞ *- w−(x1 +···+xm−1 )

−∞

∞ *- w

−∞

∞ −∞

−∞

···

-

−∞ ∞

−∞

x1 +···+xm ≤w

,

/

fX1 ,...,Xm (x1 , . . . , xm )dxm dxm−1 · · · dx1

/ , " ! fX1 ,...,Xm x1 , . . . , xm−1 , u − (x1 + · · · + xm−1 ) du dxm−1 · · · dx1 ,

fX1 ,...,Xm (x1 , . . . , xm−1 , u − x1 − · · · − xm−1 ) dx1 · · · dxm−1 du.

The required result now follows from Proposition 8.5 on page 422. b) For k = 1, 2, . . . , m − 1, we have

fX1 +···+Xm (w) - ∞ - ∞ ( ) !B " ··· fX1 ,...,Xm x1 , . . . , xk−1 , w − j ̸=k xj , xk+1 , . . . , xm dx1 · · · dxk−1 dxk+1 · · · dxm . = −∞

−∞


c) Because of independence, a joint PDF of X1 , . . . , Xm is the product of the marginal PDFs. Thus, for part (a), we can write - ∞ - ∞ ! " fX1 +···+Xm (w) = ··· fX1 (x1 ) · · · fXm−1 (xm−1 )fXm (w − x1 − · · · − xm−1 ) dx1 · · · dxm−1 −∞

=

-

∞ −∞

.

···

−∞

-

∞ *- ∞

−∞

−∞

fXm−1 (xm−1 )fXm (w − x1 − · · · − xm−1 ) dxm−1 /

,

× fXm−2 (xm−2 )dxm−2 · · · fX1 (x1 ) dx1 , and similarly for the m − 1 formulas in part (b). d) For part (a), ∞

fX+Y +Z (w) =

−∞ −∞

For part (b),

fX+Y +Z (w) =

-

fX+Y +Z (w) =

-

and

For part (c), fX+Y +Z (w) =

-

fX+Y +Z (w) = fX+Y +Z (w) =

-

and



-



−∞ −∞



−∞

-





−∞ ∞

−∞



-



−∞ −∞

***-

∞ −∞ ∞ −∞ ∞ −∞

fX,Y,Z (x, y, w − x − y) dx dy.

fX,Y,Z (w − y − z, y, z) dy dz fX,Y,Z (x, w − x − z, z) dx dz. ,

fY (y)fZ (w − x − y) dy fX (x) dx, ,

fX (w − y − z)fZ (z) dz fY (y) dy, ,

fY (w − x − z)fZ (z) dz fX (x) dx.

9.145 By Proposition 6.13 on page 297, the random variables X1 + · · · + Xm−1 and Xm are independent. Hence, by Equation (9.46) on page 534, - ∞ fX1 +···+Xm (w) = f(X1 +···+Xm−1 )+Xm (w) = fX1 +···+Xm−1 (x)fXm (w − x) dx. −∞

9.7 Multivariate Transformation Theorem Basic Exercises 9.146 a) Let X and Y denote the inspection times for the first and second engineers, respectively. By assumption, X and Y are independent random variables with distributions '(α, λ) and '(β, λ), respectively.


Hence a joint PDF of X and Y is given by fX,Y (x, y) =

λα α−1 −λx λβ β−1 −λy λα+β e · e = x y x α−1 y β−1 e−λ(x+y) '(α) '(β) '(α)'(β)

if x > 0 and y > 0, and fX,Y (x, y) = 0 otherwise. We note that S = X/(X + Y ) and T = X + Y and, furthermore, that the range of S is the interval (0, 1) and the range of T is the interval (0, ∞). As in Example 9.25, the Jacobian determinant of the transformation s = x/(x + y) and t = x + y is J (x, y) = 1/(x + y), and the inverse transformation is x = st and y = (1 − s)t. Applying the bivariate transformation theorem, we obtain a joint PDF of S and T : fS,T (s, t) = =

! " λα+β ! "β−1 −λ!st+(1−s)t " 1 e fX,Y (x, y) = st + (1 − s)t (st)α−1 (1 − s)t |J (x, y)| '(α)'(β) λα+β s α−1 (1 − s)β−1 t α+β−1 e−λt '(α)'(β)

if 0 < s < 1 and t > 0, and fS,T (s, t) = 0 otherwise. b) To obtain a PDF of S, we apply Proposition 9.7 on page 510 to get fS (s) = = =

-



−∞

λα+β '(α)'(β)



λα+β s α−1 (1 − s)β−1 t α+β−1 e−λt dt '(α)'(β) 0 - ∞ α−1 β−1 s (1 − s) t α+β−1 e−λt dt

fS,T (s, t) dt =

λα+β '(α)'(β)

-

0

s α−1 (1 − s)β−1

'(α + β) 1 = s α−1 (1 − s)β−1 α+β λ B(α, β)

if 0 < s < 1, and fS (s) = 0 otherwise. Hence, S has the beta distribution with parameters α and β. c) To obtain a PDF of T , we again apply Proposition 9.7 to get fT (t) = = =

-



−∞

fS,T (s, t) ds =

λα+β '(α)'(β) λα+β '(α)'(β)

t

-

α+β−1 −λt

e

1

0

-

λα+β s α−1 (1 − s)β−1 t α+β−1 e−λt ds '(α)'(β) 1

0

s α−1 (1 − s)β−1 ds

t α+β−1 e−λt B(α, β) =

λα+β α+β−1 −λt t e '(α + β)

if t > 0, and fT (t) = 0 otherwise. Hence, T ∼ '(α + β, λ). d) Yes. From parts (b) and (c), fS (s)fT (t) = =

1 λα+β α+β−1 −λt e s α−1 (1 − s)β−1 · t B(α, β) '(α + β) λα+β s α−1 (1 − s)β−1 t α+β−1 e−λt '(α)'(β)

if 0 < s < 1 and t > 0, and fS (s)fT (t) = 0 otherwise. Thus, in view of part (a), the function f defined on R2 by f (s, t) = fS (s)fT (t) is a joint PDF of S and T . Consequently, S and T are independent random variables.
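The beta/gamma decomposition of Exercise 9.146 lends itself to a simulation check. A minimal sketch (not part of the original solution), assuming NumPy is available; the parameter values and seed are illustrative, and NumPy's gamma sampler uses the scale 1/λ.

```python
import numpy as np

rng = np.random.default_rng(10)
alpha, beta, lam, N = 2.0, 3.0, 1.5, 1_000_000
x = rng.gamma(alpha, 1 / lam, N)             # X ~ Gamma(alpha, lam)
y = rng.gamma(beta, 1 / lam, N)              # Y ~ Gamma(beta, lam), independent of X
s, t = x / (x + y), x + y
print(s.mean(), alpha / (alpha + beta))      # Beta(alpha, beta) mean, about 0.4
print(t.mean(), (alpha + beta) / lam)        # Gamma(alpha+beta, lam) mean, about 3.33
print(np.corrcoef(s, t)[0, 1])               # near 0, consistent with independence
```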


9.147 a) Let X and Y denote the inspection times for the first and second engineers, respectively. By assumption, X and Y are independent random variables with distributions E(λ) and E(µ), respectively. Hence a joint PDF of X and Y is given by fX,Y (x, y) = λe−λx · µe−µy = λµe−(λx+µy) if x > 0 and y > 0, and fX,Y (x, y) = 0 otherwise. We note that S = X/(X + Y ) and T = X + Y and, furthermore, that the range of S is the interval (0, 1) and the range of T is the interval (0, ∞). As in Example 9.25, the Jacobian determinant of the transformation s = x/(x + y) and t = x + y is J (x, y) = 1/(x + y), and the inverse transformation is x = st and y = (1 − s)t. Applying the bivariate transformation theorem, we obtain a joint PDF of S and T : fS,T (s, t) =

! " ! " ! " 1 fX,Y (x, y) = st + (1 − s)t λµe− λ(st)+µ(1−s)t = λµte− µ+(λ−µ)s t |J (x, y)|

if 0 < s < 1 and t > 0, and fS,T (s, t) = 0 otherwise. b) To obtain a PDF of S, we apply Proposition 9.7 on page 510 to get - ∞ - ∞ ! " − µ+(λ−µ)s t fS,T (s, t) dt = λµte dt = λµ fS (s) = 0

−∞

0



t 2−1 e−

! " µ+(λ−µ)s t

dt

λµ '(2) = λµ ! "2 "2 = ! µ + (λ − µ)s µ + (λ − µ)s

if 0 < s < 1, and fS (s) = 0 otherwise. c) To obtain a PDF of T , we again apply Proposition 9.7 to get fT (t) = =

-



−∞

fS,T (s, t) ds =

-

0

1

λµte−

! " µ+(λ−µ)s t

ds = λµte−µt

) " λµte−µt ( λµ ! −µt 1 − e−(λ−µ)t = e − e−λt (λ − µ)t λ−µ

-

1 0

e−

! " (λ−µ)t s

ds

if t > 0, and fT (t) = 0 otherwise. d) No. We can see that S and T are not independent random variables in several ways. One way is to note that, for 0 < s < 1 and t > 0, !

"

! "2 −!µ+(λ−µ)s "t fS,T (s, t) λµte− µ+(λ−µ)s t , = fT | S (t | s) = A! "2 = µ + (λ − µ)s te fS (s) λµ µ + (λ − µ)s

which clearly depends on s because λ ̸= µ. Consequently, from Proposition 9.10 on page 527, we conclude that S and T are not independent. 9.148 Let U = X (the “dummy” variable) and V = Y/X. The Jacobian determinant of the transformation u = x and v = y/x is @ @ @ 1 @ 1 0 @= . J (x, y) = @@ 2 −y/x 1/x @ x

Solving the equations u = x and v = y/x for x and y, we obtain the inverse transformation x = u and y = uv. Hence, by the bivariate transformation theorem, fU,V (u, v) =

1 fX,Y (x, y) = |x|fX,Y (x, y) = |u|fX,Y (u, uv). |J (x, y)|


Therefore, fV (v) = In other words,

-



−∞


fU,V (u, v) du =

fY /X (z) = which is Equation (9.48).

-



−∞

-



−∞

|u|fX,Y (u, uv) du.

|x|fX,Y (x, xz) dx,

9.149 a) Let X and Y denote the x and y coordinates, respectively, of the first spot to appear. Recall that fX,Y (x, y) = 1/π if (x, y) is in the unit disk, and fX,Y (x, y) = 0 otherwise. We have X = R cos / and Y = R sin /. Hence, the inverse transformation is x = r cos θ and y = r sin θ. The Jacobian determinant of this inverse transformation is @ @ @ cos θ −r sin θ @ @ @ = r cos2 θ + r sin2 θ = r. J (r, θ) = @ sin θ r cos θ @ Recalling that the Jacobian determinant of this inverse transformation is the reciprocal of that of the transformation itself, we conclude from the bivariate transformation theorem that 1 r fR,/ (r, θ) = fX,Y (x, y) = |J (r, θ)|fX,Y (x, y) = |J (x, y)| π

if 0 < r < 1 and 0 < θ < 2π , and fR,/ (r, θ) = 0 otherwise. b) We have fR (r) =



−∞



fR,/ (r, θ) dθ =

0

(r/π ) dθ = 2r

if 0 < r < 1, and fR (r) = 0 otherwise. Thus, R has the beta distribution with parameters 2 and 1. Also, - ∞ - 1 1 f/ (θ) = fR,/ (r, θ) dr = (r/π ) dr = 2π −∞ 0

if 0 < θ < 2π, and f/ (θ) = 0 otherwise. Thus, / ∼ U(0, 2π). c) From parts (a) and (b), we see that the function f defined on R2 by f (r, θ) = fR (r)f/ (θ) is a joint PDF of R and /. Consequently, R and / are independent random variables. 9.150 a) We have fU,V (u, v) = fU (u)fV (v) = 1 · 1 = 1

if 0 < u < 1 and 0 < v < 1, and fU,V (u, v) = 0 otherwise. The Jacobian determinant of the transformation x = a + (b − a)u and y = c + (d − c)v is @ @ @b − a 0 @@ @ J (x, y) = @ = (b − a)(d − c). 0 d −c@

We note that the ranges of X and Y are the intervals (a, b) and (c, d), respectively. Solving the equations x = a + (b − a)u and y = c + (d − c)v for u and v, we obtain the inverse transformation u = (x − a)/(b − a) and v = (y − c)/(d − c). Hence, by the bivariate transformation theorem, fX,Y (x, y) =

1 1 fU,V (u, v) = |J (u, v)| (b − a)(d − c)

if a < x < b and c < y < d, and fX,Y (x, y) = 0 otherwise. Thus, the random point (X, Y ) has the uniform distribution on the rectangle (a, b) × (c, d).


b) First obtain two (independent) numbers, say, u and v, from a basic random number generator. Then (a + (b − a)u, c + (d − c)v) is a point selected at random from the rectangle (a, b) × (c, d).
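A minimal sketch of this generator (not part of the original solution), using Python's standard random module as the "basic random number generator":

```python
import random

def random_point(a, b, c, d):
    """Return a point uniformly distributed on the rectangle (a, b) x (c, d)."""
    u, v = random.random(), random.random()
    return a + (b - a) * u, c + (d - c) * v

print(random_point(0.0, 2.0, -1.0, 1.0))
```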

9.151 a) From Equation (9.20), a joint PDF of X and Y is

! "n−2 fX,Y (x, y) = n(n − 1)f (x)f (y) F (y) − F (x)

if x < y, and fX,Y (x, y) = 0 otherwise. The Jacobian determinant of the transformation r = y − x and m = (x + y)/2 is @ @ @ −1 1 @@ J (x, y) = @@ = −1. 1/2 1/2 @

Solving the equations r = y − x and m = (x + y)/2 for x and y, we obtain the inverse transformation x = m − r/2 and y = m + r/2. Therefore, by the bivariate transformation theorem, a joint PDF of R and M is 1 fR,M (r, m) = fX,Y (x, y) = fX,Y (m − r/2, m + r/2). |J (x, y)|

Thus, for r > 0 and −∞ < m < ∞,

! "n−2 , fR,M (r, m) = n(n − 1)f (m − r/2)f (m + r/2) F (m + r/2) − F (m − r/2)

and fR,M (r, m) = 0 otherwise. b) To obtain a marginal PDF of R, we apply Proposition 9.7 on page 510 to the joint PDF of R and M that we just found. Making the substitution x = m − r/2, we get fR (r) =

-



−∞

fR,M (r, m) dm

= n(n − 1) = n(n − 1)

-



−∞ ∞

-

−∞

! "n−2 f (m − r/2)f (m + r/2) F (m + r/2) − F (m − r/2) dm ! "n−2 f (x)f (r + x) F (r + x) − F (x) dx

if r > 0, and fR (r) = 0 otherwise. This result agrees with the one found in Example 9.6. c) To obtain a marginal PDF of M, we again apply Proposition 9.7 to the joint PDF of R and M. Making the substitution x = m − r/2, we get, for −∞ < m < ∞, fM (m) =

-



−∞

fR,M (r, m) dr

= n(n − 1)

-

= 2n(n − 1)

0



-

! "n−2 f (m − r/2)f (m + r/2) F (m + r/2) − F (m − r/2) dr

m

−∞

! "n−2 f (x)f (2m − x) F (2m − x) − F (x) dx.

This result agrees with the one found in Exercise 9.116. d) No, R and M are not independent random variables. This fact can be seen in several ways. One way is to note from parts (a) and (b) that the conditional PDF of M given R = r depends on r. Thus, from Proposition 9.10 on page 527, R and M are not independent.


9.152 For all x, y ∈ R, 1 1 1 −(x 2 +y 2 )/2σ 2 2 2 2 2 e . fX,Y (x, y) = √ e−x /2σ · √ e−y /2σ = 2 2πσ 2π σ 2π σ Let U = X 2 + Y 2 and V = Y/X, and observe that the ranges of U and V are the intervals (0, ∞) and (−∞, ∞), respectively. We want to find a joint PDF of U and V . Observe, however, that the transformation u = x 2 + y 2 and v = y/x does not satisfy the conditions of the bivariate transformation theorem because it is not one-to-one on the range of (X, Y ), which is R2 . Indeed, each two points (x, y) and (−x, −y) are transformed into the same (u, v) point. Note, though, that the joint density values of each two such points are identical. Consequently, fU,V (u, v) = 2 ·

1 fX,Y (x, y). |J (x, y)|

The Jacobian determinant of the transformation u = x 2 + y 2 and v = y/x is @ @ 2x J (x, y) = @@ −y/x 2

Noting that x 2 = u/(1 + v 2 ), we now get

@ * , ) 2y @@ y2 2 ( 2 2 = 2 1 + = x + y . x2 x2 1/x @

x2 1 −!x 2 +y 2 "/2σ 2 u/(1 + v 2 ) 1 −u/2σ 2 ! " e = e u 2πσ 2 2 x 2 + y 2 2πσ 2 , * ,* 1 1 −u/2σ 2 1 −u/2σ 2 1 ! " = e = e 1 + v 2 2πσ 2 2σ 2 π 1 + v2

fU,V (u, v) = 2 ·

if u > 0 and −∞ < v < ∞, and fU,V (u, v) = 0 otherwise. We see that the joint PDF of U and V can be factored into a function of u alone and a function of v alone. Therefore, from Exercise 9.95 on page 528, U and V are independent random variables, that is, X 2 +!Y 2 and" Y/X are independent. We note in passing that the factorization also shows that X 2 + Y 2 ∼ E 1/2σ 2 and Y/X ∼ C(0, 1). 9.153 We have

$$f_{X,Y}(x, y) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x} \cdot \frac{\lambda^{\beta}}{\Gamma(\beta)}\, y^{\beta-1} e^{-\lambda y} = \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1} y^{\beta-1} e^{-\lambda(x+y)}$$

if x > 0 and y > 0, and fX,Y(x, y) = 0 otherwise. Let U = X + Y and V = Y/X, and observe that the ranges of both U and V are (0, ∞). We want to find a joint PDF of U and V. The Jacobian determinant of the transformation u = x + y and v = y/x is

$$J(x, y) = \begin{vmatrix} 1 & 1 \\ -y/x^2 & 1/x \end{vmatrix} = \frac{x + y}{x^2}.$$

Solving the equations u = x + y and v = y/x for x and y, we determine the inverse transformation to be x = u/(1 + v) and y = uv/(1 + v). Therefore, by the bivariate transformation theorem, a joint PDF


of U and V is

$$f_{U,V}(u, v) = \frac{1}{|J(x, y)|}\, f_{X,Y}(x, y) = \frac{x^2}{x + y} \cdot \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1} y^{\beta-1} e^{-\lambda(x+y)}
= \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{u^2/(1+v)^2}{u} \left( \frac{u}{1+v} \right)^{\alpha-1} \left( \frac{uv}{1+v} \right)^{\beta-1} e^{-\lambda u}$$

$$= \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{u^{\alpha+\beta-1}\, v^{\beta-1}}{(1+v)^{\alpha+\beta}}\, e^{-\lambda u}
= \left( \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha+\beta)}\, u^{\alpha+\beta-1} e^{-\lambda u} \right) \left( \frac{1}{B(\alpha, \beta)} \cdot \frac{v^{\beta-1}}{(1+v)^{\alpha+\beta}} \right)$$

if u > 0 and v > 0, and fU,V(u, v) = 0 otherwise. We see that the joint PDF of U and V can be factored into a function of u alone and a function of v alone. Therefore, from Exercise 9.95 on page 528, U and V are independent random variables; that is, X + Y and Y/X are independent.

9.154 Let U = X + Y and V = X − Y.
a) We want to find a joint PDF of U and V. The Jacobian determinant of the transformation u = x + y and v = x − y is

$$J(x, y) = \begin{vmatrix} 1 & 1 \\ 1 & -1 \end{vmatrix} = -2.$$

Solving the equations u = x + y and v = x − y for x and y, we determine the inverse transformation to be x = (u + v)/2 and y = (u − v)/2. Therefore, by the bivariate transformation theorem, a joint PDF of U and V is fU,V (u, v) =

b) In this case,

1 2 2 ·√ e−(y−µ) /2σ 2π σ 2π σ ! " ! " 1 1 − (x−µ)2 +(y−µ)2 /2σ 2 − x 2 +y 2 −2µ(x+y)+2µ2 /2σ 2 = e = e . 2πσ 2 2πσ 2

fX,Y (x, y) = √

1

! " 1 1 fX,Y (x, y) = fX,Y (u + v)/2, (u − v)/2 . |J (x, y)| 2 2 /2σ 2

e−(x−µ)

Recalling that x = (u + v)/2 and y = (u − v)/2, we find that x 2 + y 2 − 2µ(x + y) + 2µ2 = Hence, from part (a), ! " 1 1 − (u−2µ)2 +v 2 /4σ 2 e = fU,V (u, v) = 2 2πσ 2

*

) 1( (u − 2µ)2 + v 2 . 2

1 −(u−2µ)2 /4σ 2 e 2πσ

,*

, 1 −v 2 /4σ 2 . e 2πσ

We see that the joint PDF of U and V can be factored into a function of u alone and a function of v alone. Therefore, from Exercise 9.95 on page 528, U and V are independent random variables; that is, X + Y and X − Y are independent. We note in passing that the factorization also shows that X + Y ∼ N(2µ, 2σ²) and X − Y ∼ N(0, 2σ²).
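A brief simulation can corroborate parts (a) and (b); the parameter values, seed, and sample size below are arbitrary choices, and for jointly normal U and V a near-zero sample correlation is consistent with (though not by itself a proof of) independence. A minimal Python/NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)           # seed chosen arbitrarily
mu, sigma, reps = 1.0, 2.0, 200_000      # illustrative parameter values
x = rng.normal(mu, sigma, reps)
y = rng.normal(mu, sigma, reps)
u, v = x + y, x - y

print("sample mean/var of U:", u.mean(), u.var())   # should be near 2*mu and 2*sigma**2
print("sample mean/var of V:", v.mean(), v.var())   # should be near 0 and 2*sigma**2
print("sample correlation of U and V:", np.corrcoef(u, v)[0, 1])  # should be near 0
```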


c) In this case, fX,Y(x, y) = 1 · 1 = 1 if 0 < x < 1 and 0 < y < 1, and fX,Y(x, y) = 0 otherwise. Recalling that x = (u + v)/2 and y = (u − v)/2, we find from part (a) that fU,V(u, v) = 1/2 if (u, v) ∈ S, and fU,V(u, v) = 0 otherwise, where S is the square with vertices (0, 0), (1, −1), (2, 0), and (1, 1). We can now show in several ways that U and V (i.e., X + Y and X − Y) are not independent random variables. One way is to note that the range of V is the interval (−1, 1), whereas the range of V|U=1/2 is the interval (−1/2, 1/2). Hence, by Proposition 9.10 on page 527, U and V are not independent.

9.155 Let X(t) and Y (t) denote the x and y !coordinates, respectively, of the particle at time t. By " assumption, X(t) and Y (t) are independent N 0, σ 2 t random variables. Hence, their joint PDF is, for all (x, y) ∈ R2 ,

! " 1 1 2 2 2 2 − x 2 +y 2 /2σ 2 t e . √ e−x /2σ t · √ √ e−y /2σ t = 2πσ 2 t 2π σ t 2π σ t ! " ! " a) We have X(t) = R(t) cos /(t) and Y (t) = R(t) sin /(t) . Hence, the inverse transformation is x = r cos θ and y = r sin θ . The Jacobian determinant of this inverse transformation is @ @ @ cos θ −r sin θ @ @ = r cos2 θ + r sin2 θ = r. J (r, θ) = @@ sin θ r cos θ @

fX(t),Y (t) (x, y) = √

1

Recalling that the Jacobian determinant of this inverse transformation is the reciprocal of that of the transformation itself, we conclude from the bivariate transformation theorem that fR(t),/(t) (r, θ) =

1 fX(t),Y (t) (x, y) = |J (r, θ)|fX(t),Y (t) (x, y) |J (x, y)|

! " r 1 2 2 − x 2 +y 2 /2σ 2 t e−r /2σ t e = 2 2 2πσ t 2πσ t if r > 0 and 0 < θ < 2π , and fR(t),/(t) (r, θ) = 0 otherwise. b) We can apply Proposition 9.7 on page 510 to obtain the marginal PDFs of R(t) and /(t). Alternatively, we can and do use Exercises 9.95 and 9.96 on page 528. From part (a), we can write )* 1 , ( r −r 2 /2σ 2 t fR(t),/(t) (r, θ) = e 2π σ 2t

=r

if r > 0 and 0 < θ < 2π, and fR(t),/(t) (r, θ) = 0 otherwise. It now follows immediately from the aforementioned exercises that ⎧ ⎧ ⎨ r e−r 2 /2σ 2 t , if r > 0; ⎨ 1 , if 0 < θ < 2π ; fR(t) (r) = σ 2 t and f/(t) (θ) = 2π ⎩ ⎩ 0, otherwise. 0, otherwise. Thus, we see that /(t) ∼ U(0, 2π) and, referring to Exercise 8.159 on page 475, that R(t) has the Rayleigh distribution with parameter σ 2 t. c) It follows immediately from part (b) and either Exercise 9.95(d) or Proposition 9.9 on page 523 that R(t) and /(t) are independent random variables. 9.156 a) A marginal PDF of X and all conditional PDFs of Y given X = x determine a joint PDF of X and Y because of the general multiplication rule for the joint PDF of two continuous random variables, Equation (9.34) on page 517.


b) We have

1 1 = =2 |T | (1/2) · 1 · 1 if (x, y) ∈ T , and fX,Y (x, y) = 0 otherwise. Also, - ∞ - x fX,Y (x, y) dy = 2 dy = 2x fX (x) = fX,Y (x, y) =

−∞

0

if 0 < x < 1, and fX (x) = 0 otherwise. Furthermore, for 0 < x < 1, fY | X (y | x) =

fX,Y (x, y) 2 1 = = fX (x) 2x x

if 0 < y < x, and fY | X (y | x) = 0 otherwise. √ c) It follows from part (b) that FX (x) = x 2 if 0 < x < 1 and, hence, FX−1 (u) = u. Therefore, from √ Proposition 8.16(b), U has the same probability distribution as X. It also follows from part (b) that, for each 0 < x < 1, FY | X (y | x) = y/x if 0 < y < x and, hence, FY−1| X (v) = xv. Therefore, from Proposition 8.16(b), √ xV has the same probability distribution as Y|X=x . However, because of the independence of U and V , U V|√U =x has the same probability distribution as xV and, hence, as Y|X=x . √ d) From part (c), U and X have the same probability distributions √ the same marginal PDFs. √ and, hence, as the Also, from part (c), for each x, the conditional distribution of U V given U = x is the same √ conditional distribution of Y given X = x; in other words, for each x, the conditional PDF of √ UV √ given√ U = x is the same as the conditional PDF of Y given X = x. Consequently, from part (a), U and U V have the same joint PDF and, hence, the same joint probability distribution as X and Y . e) We first note that, because U and V are independent U(0, 1) random√ variables, we √ have fU,V (u, v) = 1 if 0 < u < 1 and 0 < v < 1, and fU,V (u, Let R = U and S = U V . The Jacobian √ v) = 0 otherwise. √ determinant of the transformation r = u and s = uv is @ ! √ " @ @ 1/ 2 u @ 1 0 J (u, v) = @@ ! √ " √ @@ = . 2 v/ 2 u u √ √ Solving the equations r = u and s = uv for u and v, we obtain the inverse transformation u = r 2 and v = s/r. It follows that 0 < u < 1 and 0 < v < 1 if and only if 0 < r < 1 and 0 < s < r. Thus, by the bivariate transformation theorem, 1 fU,V (u, v) = 2 · 1 = 2 fR,S (r, s) = |J (u, v)| if (r, , and fR,S (r, s) = 0 otherwise. Consequently, in view of part (a), we see that R and S—that √ s) ∈ T √ is, U and U V —have the same joint probability distribution as X and Y . f) Use a basic random number generator !to " independent uniform numbers between 0 and 1, √ two √obtain say, u and v, and then calculate the point u, uv . g) Consider random variables X and Y that satisfy the following properties: ● ●



X and Y are continuous random variables with a joint PDF. The conditions of Proposition 8.16 on page 471 hold for the random variable X and for the random variable Y|X=x for each x in the range of X. For each x in the range of X, the equation u = FX (x) can be solved explicitly for x in terms of u and, for each (x, y) in the range of (X, Y ), the equation v = FY | X (y | x) can be solved explicitly for y in terms of x and v.

Then we can proceed in an analogous manner to that in parts (a)–(f) to simulate an observation of the random point (X, Y ).
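The simulation recipe of parts (e) and (f) can be sketched as follows; Python/NumPy and the illustrative checks of E[X] and E[Y] are assumptions of this sketch, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(3)     # seed chosen arbitrarily
reps = 100_000
u = rng.random(reps)
v = rng.random(reps)
x = np.sqrt(u)                     # X = F_X^{-1}(U), since F_X(x) = x^2 on (0, 1)
y = np.sqrt(u) * v                 # Y = x * V, since Y | X = x ~ U(0, x)

print("P(Y < X) estimate (should be 1):", np.mean(y < x))
print("E[X] estimate (exact value 2/3):", x.mean())
print("E[Y] estimate (exact value 1/3):", y.mean())
```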


9.157√From the Box–Müller transformation, if U and V are independent U(0, 1) random variables, then −2 ln U cos(2πV ) has the standard√normal distribution. Therefore, from Proposition 8.15 on page 468, the random variable X = µ + σ −2 ln U cos(2πV ) has the normal distribution with parameters µ and σ 2 . So, to simulate an observation of a normal random variable with parameters µ and σ 2 , we first obtain two √ (independent) numbers, say, u and v, from a basic random number generator. The number x = µ + σ −2 ln u cos(2πv) is then a simulated observation of a normal random variable with parameters µ and σ 2 . 9.158 a) Answers will vary, but here is the procedure: First obtain 10,000 pairs (a total of 20,000 √ numbers) from a basic random number generator. Then, for each pair, (u, v), compute x = 266 + 16 −2 ln u cos(2πv). The 10,000 x-values thus obtained will constitute a simulation of 10,000 human gestation periods. b) The histogram should be shaped roughly like a normal curve with parameters µ = 266 and σ = 16. c) Answers will vary. 9.159 a) We have, for all z1 , z2 ∈ R,

1 1 1 −!z2 +z2 "/2 2 2 fZ1 ,Z2 (z1 , z2 ) = √ e−z1 /2 · √ e−z2 /2 = e 1 2 . 2π 2π 2π & The Jacobian determinant of the transformation x = z1 and y = ρz1 + 1 − ρ 2 z2 is @ @ + 0 @1 @ @ = 1 − ρ2. & J (z1 , z2 ) = @@ @ ρ 1 − ρ2 & Solving the equations x = z and y = ρz + 1 − ρ 2 z2 , we obtain the inverse transformation z1 = x 1 1 & ! " ! " 2 and z2 = (y − ρx)/ 1 − ρ . It follows that z12 + z22 = x 2 − 2ρxy + y 2 / 1 − ρ 2 . Thus, by the bivariate transformation theorem, 1 1 1 −!z2 +z2 "/2 fZ1 ,Z2 (z1 , z2 ) = & e 1 2 |J (z1 , z2 )| 1 − ρ 2 2π ! 2 " ! " 1 2 2 & = e− x −2ρxy+y /2 1−ρ 2π 1 − ρ 2

fX,Y (x, y) =

for all x, y ∈ R. b) Let ρ be a specified real number with −1 < ρ < 1. Obtain two (independent) numbers, say, u and v, from a basic random number generator, and then set + √ √ √ and y = ρ −2 ln u cos(2πv) + 1 − ρ 2 −2 ln u sin(2πv). x = −2 ln u cos(2πv) From the Box–Müller transformation and part (a), the pair (x, y) is a simulated observation of a bivariate normal distribution with joint PDF given by Equation (9.35).

Theory Exercises 9.160 Let X1 , . . . , Xm be continuous random variables with a joint PDF. Suppose that g1 , . . . , gm are real-valued functions of m real variables defined on the range of (X1 , . . . , Xm ) and that they and their first partial derivatives are continuous on it. Further suppose that the m-dimensional transformation defined on the range of (X1 , . . . , Xm ) by yj = gj (x1 , . . . , xm ), for 1 ≤ j ≤ m, is one-to-one. Then a joint PDF


of the random variables Y1 = g1 (X1 , . . . , Xm ), . . . , Ym = gm (X1 , . . . , Xm ) is fY1 ,...,Ym (y1 , . . . , ym ) =

1 fX ,...,Xm (x1 , . . . , xm ) |J (x1 , . . . , xm )| 1

for (y1 , . . . , ym ) in the range of (Y1 , . . . , Ym ), where (x1 , . . . , xm ) is the unique point in the range of (X1 , . . . , Xm ) such that gj (x1 , . . . , xm ) = yj for 1 ≤ j ≤ m. Here, J (x1 , . . . , xm ) is the Jacobian determinant of the transformation: @ @ ∂g1 @ ∂g1 @ @ @ @ ∂x1 (x1 , . . . , xm ) . . . ∂xm (x1 , . . . , xm ) @ @ @ @ @ .. .. .. J (x1 , . . . , xm ) = @ @. . . . @ @ @ ∂gm @ ∂gm @ @ (x , . . . , x ) . . . (x , . . . , x ) 1 m 1 m @ @ ∂x1 ∂xm Proof: For (y1 , . . . , ym ) in the range of (Y1 , . . . , Ym ), we have, from the multivariate version of the FPF, that FY1 ,...,Ym (y1 , . . . , ym ) = P (Y1 ≤ y1 , . . . , Ym ≤ ym ) " ! = P g1 (X1 , . . . , Xm ) ≤ y1 , . . . , gm (X1 , . . . , Xm ) ≤ ym = ··· fX1 ,...,Xm (x1 , . . . , xm ) dx1 · · · dxm . gj (x1 ,...,xm )≤yj , 1≤j ≤m

Making the transformation sj = gj (x1 , . . . , xm ), 1 ≤ j ≤ m, and applying from calculus the change of variable formula for multiple integrals, we get - y1 - ym 1 ··· fX1 ,...,Xm (x1 , . . . , xm ) ds1 · · · dsm , FY1 ,...,Ym (y1 , . . . , ym ) = |J (x , . 1 . . , xm )| −∞ −∞ where, for each point (s1 , . . . , sm ), the point (x1 , . . . , xm ) is the unique point in the range of (X1 , . . . , Xm ) such that gj (x1 , . . . , xm ) = sj for 1 ≤ j ≤ m. The required result now follows from the multivariate version of Proposition 9.4 on page 499. B 9.161 We apply Exercise 9.160 with m = n and gi (x1 , . . . , xn ) = jn=1 aij xj for 1 ≤ i ≤ n. Because Bn the matrix A is nonsingular, the transformation yi = j =1 aij xj , for 1 ≤ i ≤ n, is one-to-one. For each i and j , * n , ∂gi ∂ 8 (x1 , . . . , xn ) = aik xk = aij . ∂xj k=1 ∂xj B Thus, the Jacobian determinant of the transformation yi = jn=1 aij xj , 1 ≤ i ≤ n, is det A. Applying the multivariate transformation theorem now yields fY1 ,...,Yn (y1 , . . . , yn ) =

1 1 fX1 ,...,Xn (x1 , . . . , xn ) = fX ,...,Xn (x1 , . . . , xn ) |J (x1 , . . . , xn )| | det A| 1

for (y1 , . . . , yn ) in the range Bn of (Y1 , . . . , Yn ), where (x1 , . . . , xn ) is the unique point in the range of (X1 , . . . , Xn ) such that j =1 aij xj = yi for 1 ≤ i ≤ n. Letting x and y denote the column vectors of x1 , . . . , xn and y1 , . . . , yn , respectively, we have y = Ax and, hence, x = A−1 y.


Advanced Exercises 9.162 a) We apply Exercise 9.161 with aij = From this result, we see that det A = 1. Now,

9

1, if j ≤ i; 0, otherwise.

fX1 ,...,Xn (x1 , . . . , xn ) = fX1 (x1 ) · · · fXn (xn ) = λe−λx1 · · · λe−λxn = λn e−λ(x1 +···+xn )

if xi ≥ 0 for 1 ≤ i ≤ n, and fX1 ,...,Xn (x1 , . . . , xn ) = 0 otherwise. Therefore, fY1 ,...,Yn (y1 , . . . , yn ) =

1 1 fX1 ,...,Xn (x1 , . . . , xn ) = λn e−λ(x1 +···+xn ) = λn e−λyn | det A| 1

if 0 < y1 < · · · < yn , and fY1 ,...,Yn (y1 , . . . , yn ) = 0 otherwise. b) We note that the inverse transformation of the transformation in part (a) is x1 = y1 and xi = yi − yi−1 for 2 ≤ i ≤ n. Consequently, by referring to part (a), we see that the random variables defined by X1 = Y1 and Xi = Yi − Yi−1 , for 2 ≤ i ≤ n, must be independent, each with an E(λ) distribution. In particular, their joint PDF is the one given for X1 , . . . , Xn in part (a). 9.163 Let X(t), Y (t), and Z(t) denote the x, y, and z coordinates, " respectively, of the particle at time t. By ! 2 assumption, X(t), Y (t), and Z(t) are independent N 0, σ t random variables. Hence, their joint PDF is, for all (x, y, z) ∈ R3 , 1 1 1 2 2 2 2 2 2 fX(t),Y (t),Z(t) (x, y, z) = √ √ e−x /2σ t · √ √ e−y /2σ t · √ √ e−z /2σ t 2π σ t 2π σ t 2π σ t ! " 1 − x 2 +y 2 +z2 /2σ 2 t = (√ e . √ )3 2π σ t
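As a numerical illustration of part (b), not required by the exercise, one can generate the partial sums Y1 < · · · < Yn from i.i.d. E(λ) variables and check that the increments again behave like i.i.d. E(λ) variables; λ, n, the seed, and the sample size below are arbitrary. A Python/NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(5)        # seed chosen arbitrarily
lam, n, reps = 2.0, 4, 100_000        # illustrative rate and dimension
x = rng.exponential(1 / lam, size=(reps, n))   # i.i.d. E(lambda) variables
y = np.cumsum(x, axis=1)                       # Y_i = X_1 + ... + X_i, so Y_1 < ... < Y_n

# Part (b): the increments Y_1, Y_2 - Y_1, ..., Y_n - Y_{n-1} should again be i.i.d. E(lambda)
increments = np.diff(np.column_stack([np.zeros(reps), y]), axis=1)
print("increment means (each should be near 1/lambda = 0.5):", increments.mean(axis=0))
print("correlation of increments 2 and 3 (should be near 0):",
      np.corrcoef(increments[:, 1], increments[:, 2])[0, 1])
```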

! " ! " a) Recalling the! definition coordinates, !we have " ! of spherical " " that X(t) = P (t) sin -(t) cos /(t) , Y (t) = P (t) sin -(t) sin /(t) , and Z(t) = P (t) cos -(t) . Hence, the inverse transformation is given by x = ρ sin φ cos θ, y = ρ sin φ sin θ , and z = ρ cos φ. The Jacobian determinant of the inverse transformation is @ @ sin φ sin θ cos φ @ @ sin φ cos θ @ @ J (ρ, φ, θ) = @@ ρ cos φ cos θ ρ cos φ sin θ −ρ sin φ @@ = ρ 2 sin φ. @ @ −ρ sin φ sin θ ρ sin φ cos θ 0 Recalling that the Jacobian determinant of an inverse transformation is the reciprocal of that of the transformation itself, we conclude from the multivariate transformation theorem that fP (t),-(t),/(t) (ρ, φ, θ) =

1 fX(t),Y (t),Z(t) (x, y, z) = |J (ρ, φ, θ)|fX(t),Y (t),Z(t) (x, y, z) |J (x, y, z)|

= ρ 2 sin φ (√

1

e− √ )3 2π σ t

! 2 2 2" x +y +z /2σ 2 t

ρ 2 sin φ 2 2 = (√ e−ρ /2σ t √ )3 2π σ t

if ρ > 0, 0 < φ < π, and 0 < θ < 2π, and fP (t),-(t),/(t) (ρ, φ, θ) = 0 otherwise. b) We can apply the trivariate analogue of Proposition 9.7 on page 510 to determine the marginal PDFs of P (t), -(t), and /(t). Alternatively, we can and do use the trivariate analogues of Exercises 9.95


and 9.96 on page 528. From part (a), we can write /* . √ ,* , 2/π 2 −ρ 2 /2σ 2 t 1 1 fP (t),-(t),/(t) (ρ, φ, θ) = ! √ "3 ρ e sin φ 2 2π σ t

if ρ > 0, 0 < φ < π, and 0 < θ < 2π, and fP (t),-(t),/(t) (ρ, φ, θ) = 0 otherwise. It now follows immediately from the aforementioned exercises that ⎧ √ 2/π 2 −ρ 2 /2σ 2 t ⎪ ⎨! √ , if ρ > 0; "3 ρ e fP (t) (ρ) = σ t ⎪ ⎩ 0, otherwise. ⎧ ⎨ 1 sin φ, f-(t) (φ) = 2 ⎩ 0,

if 0 < φ < π;

and

otherwise.

⎧ ⎨ 1 , f/(t) (θ) = 2π ⎩ 0,

if 0 < θ < 2π ; otherwise.

Thus, we see that /(t) ∼ U(0, 2π) and, referring to Exercise 8.160(d) on page 475, that P (t) has the Maxwell distribution with parameter σ 2 t. c) It follows immediately from part (b) and either the trivariate version of Exercise 9.95(d) or Proposition 9.9 on page 523 that P (t), -(t), and /(t) are independent random variables. 9.164 By assumption, we have, for y > 0 and z ∈ R, fY (y) =

(1/2)ν/2 ν/2−1 −y/2 y e '(ν/2)

and

1 2 fZ (z) = √ e−z /2 . 2π

Because Y and Z are independent, a joint PDF of Y and Z is the product of the marginals, or fY,Z (y, z) = fY (y)fZ (z) = or fY,Z (y, z) =

(1/2)ν/2 ν/2−1 −y/2 1 2 y e · √ e−z /2 , '(ν/2) 2π 1

2ν/2 '(ν/2)

y ν/2−1 e−(y+z √ 2π

2 )/2

,

if y > 0 and −∞ < z < √ ∞, and fY,Z (y, z) = 0 otherwise. We now let T √ = Z/ Y/ν and U = Y (a “dummy variable”). The Jacobian determinant of the transformation t = z/ y/ν and u = y is @ @ 1 @ z @ @ @− & √ @ y/ν @@ = − √ 1 . 2 y 3 /ν J (y, z) = @ @ @ y/ν @ 1 0 @

√ Solving the √ equations t = z/ y/ν and u = y for y and z, we obtain the inverse transformation y = u and z = t u/ν. Therefore, by the bivariate transformation theorem, a joint PDF of T and U is 1 1 fY,Z (y, z) = fY,Z (y, z) √ |J (y, z)| | − 1/ y/ν| & & ! & " = y/νfY,Z (y, z) = u/νfY,Z u, t u/ν .

fT ,U (t, u) =


! " Noting that y + z2 = u 1 + t 2 /ν , we get fT ,U (t, u) =

=

& u/ν

! " 1+t 2 /ν /2

1

uν/2−1 e−u √ ν/2 2 '(ν/2) 2π

! " 1 (ν+1)/2−1 −u 1+t 2 /ν /2 e , u √ 2(ν+1)/2 '(ν/2) νπ

if −∞ < t < ∞ and u > 0, and fT,U(t, u) = 0 otherwise. To obtain a PDF of T, we apply Proposition 9.7 on page 510 and then use the fact that the integral of a gamma PDF is 1:

$$f_T(t) = \int_{-\infty}^{\infty} f_{T,U}(t, u)\, du
= \int_0^{\infty} \frac{1}{2^{(\nu+1)/2}\, \Gamma(\nu/2) \sqrt{\nu\pi}}\, u^{(\nu+1)/2 - 1} e^{-u(1 + t^2/\nu)/2}\, du$$

$$= \frac{\Gamma\bigl((\nu+1)/2\bigr)}{2^{(\nu+1)/2}\, \Gamma(\nu/2) \sqrt{\nu\pi}\, \bigl[(1 + t^2/\nu)/2\bigr]^{(\nu+1)/2}}
\int_0^{\infty} \frac{\bigl[(1 + t^2/\nu)/2\bigr]^{(\nu+1)/2}}{\Gamma\bigl((\nu+1)/2\bigr)}\, u^{(\nu+1)/2 - 1} e^{-u(1 + t^2/\nu)/2}\, du$$

$$= \frac{\Gamma\bigl((\nu+1)/2\bigr)}{2^{(\nu+1)/2}\, \Gamma(\nu/2) \sqrt{\nu\pi}\, \bigl[(1 + t^2/\nu)/2\bigr]^{(\nu+1)/2}}.$$

Therefore,

$$f_T(t) = \frac{\Gamma\bigl((\nu+1)/2\bigr)}{\sqrt{\nu\pi}\, \Gamma(\nu/2)} \left( 1 + \frac{t^2}{\nu} \right)^{-(\nu+1)/2}, \qquad -\infty < t < \infty.$$
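A quick numerical check of the PDF just derived, not part of the printed solution: simulate T = Z/√(Y/ν) directly and compare an empirical density estimate with f_T; ν, the seed, bandwidth, and the evaluation point are arbitrary choices. A Python/NumPy sketch:

```python
import math
import numpy as np

rng = np.random.default_rng(6)     # seed chosen arbitrarily
nu, reps = 5, 200_000              # illustrative degrees of freedom
z = rng.standard_normal(reps)
y = rng.chisquare(nu, reps)
t = z / np.sqrt(y / nu)

def t_pdf(t_val, nu):
    """PDF of Student's t distribution with nu degrees of freedom (the formula just derived)."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t_val ** 2 / nu) ** (-(nu + 1) / 2)

# Compare an empirical density estimate near t = 1 with the exact PDF value there.
h = 0.05
print("empirical density at t = 1:", np.mean(np.abs(t - 1.0) < h) / (2 * h))
print("exact t PDF at t = 1:      ", t_pdf(1.0, nu))
```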

9.165 We use the bivariate transformation theorem. By assumption, fX1 ,X2 (x1 , x2 ) = fX1 (x1 ) · fX2 (x2 ) = =

(1/2)ν1 /2 ν1 /2−1 −x1 /2 (1/2)ν2 /2 ν2 /2−1 −x2 /2 x e · x e '(ν1 /2) 1 '(ν2 /2) 2

1 ν /2−1 ν2 /2−1 −(x1 +x2 )/2 x2 e x 1 2(ν1 +ν2 )/2 '(ν1 /2)'(ν2 /2) 1

if x1 > 0 and x2 > 0, and fX1 ,X2 (x1 , x2 ) = 0 otherwise. For convenience, set a=

1 2(ν1 +ν2 )/2 '(ν1 /2)'(ν2 /2)

and

b=

ν1 . ν2

A Let W = (X1 /ν1 ) (X2 /ν2 ) = (1/b)(X1 /X2 ) and V = X2 (the “dummy variable”). The Jacobian determinant of the transformation w = (1/b)(x1 /x2 ) and v = x2 is @ 1 x1 @@ @ − @ @ bx22 @ = 1 . J (x1 , x2 ) = @@ bx2 @ bx 2 @ @ 0 1


Solving the equations w = (1/b)(x1 /x2 ) and v = x2 for x1 and x2 , we get the inverse transformation x1 = bvw and x2 = v. Thus, by the bivariate transformation theorem, fW,V (w, v) =

1 fX ,X (x1 , x2 ) = bvfX1 ,X2 (bvw, v) |J (x1 , x2 )| 1 2

= abv(bvw)ν1 /2−1 v ν2 /2−1 e−v(1+bw)/2 = abν1 /2 wν1 /2−1 v (ν1 +ν2 )/2−1 e−v(1+bw)/2

if w > 0 and v > 0, and fW,V (w, v) = 0 otherwise. Therefore, - ∞ - ∞ ν1 /2 ν1 /2−1 fW,V (w, v) dv = ab w v (ν1 +ν2 )/2−1 e−v(1+bw)/2 dv fW (w) = 0

−∞

" ! ' (ν1 + ν2 )/2 ν1 /2 ν1 /2−1 w = ab ! "(ν +ν )/2 (1 + bw)/2 1 2 ! " (ν1 /ν2 )ν1 /2 ' (ν1 + ν2 )/2 wν1 /2−1 = "(ν +ν )/2 ! '(ν1 /2)'(ν2 /2) 1 + (ν1 /ν2 )w 1 2

if w > 0, and fW (w) = 0 otherwise.

Review Exercises for Chapter 9

Basic Exercises

9.166 a) Applying first the complementation rule and then the general addition rule, we get

$$P(X > x, Y > y) = 1 - P(\{X \le x\} \cup \{Y \le y\}) = 1 - \bigl[ P(X \le x) + P(Y \le y) - P(X \le x, Y \le y) \bigr] = 1 - F_X(x) - F_Y(y) + F_{X,Y}(x, y).$$

b) In view of the result of part (a) and Proposition 9.2 on page 490,

P (X > x, Y > y) = 1 − FX (x) − FY (y) + FX,Y (x, y)

= 1 − FX,Y (x, ∞) − FX,Y (∞, y) + FX,Y (x, y).

9.167 a) We have

⎧ 0, if x < 0 or y < 0; ⎪ ⎪ ⎪ xy, if 0 ≤ x < 1 and 0 ≤ y < 1; ⎨ if 0 ≤ x < 1 and y ≥ 1; FX,Y (x, y) = x, ⎪ ⎪ y, if x ≥ 1 and 0 ≤ y < 1; ⎪ ⎩ 1, if x ≥ 1 and y ≥ 1. Therefore, from Proposition 9.1 on page 488, P (X > 0.3, Y > 0.4) = P (0.3 < X ≤ 1, 0.4 < Y ≤ 1) = FX,Y (1, 1) − FX,Y (1, 0.4) − FX,Y (0.3, 1) + FX,Y (0.3, 0.4) = 1 − 0.4 − 0.3 + (0.3)(0.4) = 0.42.


b) We have FX (x) =

#

0, x, 1,

if x < 0; if 0 ≤ x < 1; if x ≥ 1.

and

FY (y) =

# 0,

y, 1,

Therefore, from Exercise 9.166(a),

if y < 0; if 0 ≤ y < 1; if y ≥ 1.

P (X > 0.3, Y > 0.4) = 1 − FX (0.3) − FY (0.4) + FX,Y (0.3, 0.4) = 1 − 0.3 − 0.4 + (0.3)(0.4) = 0.42.

c) In view of Exercise 9.166(b) and Proposition 9.2 on page 490,

d) We have

P (X > 0.3, Y > 0.4) = 1 − FX,Y (0.3, ∞) − FX,Y (∞, 0.4) + FX,Y (0.3, 0.4) = 1 − FX (0.3) − FY (0.4) + FX,Y (0.3, 0.4) = 1 − 0.3 − 0.4 + (0.3)(0.4) = 0.42. fX,Y (x, y) =

Therefore, by the FPF,

9

1, if 0 < x < 1 and 0 < y < 1; 0, otherwise. --

P (X > 0.3, Y > 0.4) =

fX,Y (x, y) dx dy =

x>0.3, y>0.4

-

1

-

1

0.3 0.4

1 dx dy

= (1 − 0.3)(1 − 0.4) = 0.42. 9.168 a) The sample space, !, for this random experiment is the rectangle (a, b) × (c, d). Because we are selecting a point at random, use of a geometric probability model is appropriate. Thus, for each event E, P (E) =

|E| |E| = , |!| (b − a)(d − c)

where |E| denotes the area (in the extended sense) of the set E. The random variables X and Y are defined on ! by X(x, y) = x and Y (x, y) = y. To find the joint CDF of X and Y , we begin by observing that the range of (X, Y ) is !. Therefore, if either x < a or y < c, we have FX,Y (x, y) = 0; and, if x > b and y > d, we have FX,Y (x, y) = 1. We must still find FX,Y (x, y) in all other cases, of which we can consider three. Case 1: a ≤ x < b and c ≤ y < d In this case, {X ≤ x, Y ≤ y} is the event that the randomly selected point lies in the rectangle with vertices (a, c), (x, c), (x, y), and (a, y), specifically, (a, x] × (c, y]. Hence, FX,Y (x, y) = P (X ≤ x, Y ≤ y) =

|{X ≤ x, Y ≤ y}| |(a, x] × (c, y]| (x − a)(y − c) = = . (b − a)(d − c) (b − a)(d − c) (b − a)(d − c)

Case 2: a ≤ x < b and y ≥ d In this case, {X ≤ x, Y ≤ y} is the event that the randomly selected point lies in the rectangle with vertices (a, c), (x, c), (x, d), and (a, d), specifically, (a, x] × (c, d). Hence, FX,Y (x, y) = P (X ≤ x, Y ≤ y) =

x−a |{X ≤ x, Y ≤ y}| |(a, x] × (c, d]| = . = (b − a)(d − c) b−a (b − a)(d − c)


Case 3: x ≥ b and c ≤ y < d

In this case, {X ≤ x, Y ≤ y} is the event that the randomly selected point lies in the rectangle with vertices (a, c), (b, c), (b, y), and (a, y), specifically, (a, b) × (c, y]. Thus, FX,Y (x, y) = P (X ≤ x, Y ≤ y) =

|{X ≤ x, Y ≤ y}| |(a, b] × (c, y]| y−c = = . (b − a)(d − c) (b − a)(d − c) d −c

We conclude that the joint CDF of the random variables X and Y is ⎧ 0, if x < a or y < c; ⎪ ⎪ ⎪ ⎪ ⎪ (x − a)(y − c) ⎪ ⎪ , if a ≤ x < b and c ≤ y < d; ⎪ ⎪ (b − a)(d − c) ⎪ ⎪ ⎪ ⎨ x−a FX,Y (x, y) = , if a ≤ x < b and y ≥ d; ⎪ b−a ⎪ ⎪ ⎪ ⎪ ⎪ y−c ⎪ ⎪ , if x ≥ b and c ≤ y < d; ⎪ ⎪ d −c ⎪ ⎪ ⎩ 1, if x ≥ b and y ≥ d.

b) Referring to Proposition 9.2 on page 490 and to part (a), we find that ⎧ 0, if x < a; ⎪ ⎪ ⎨ x−a , if a ≤ x < b; FX (x) = lim FX,Y (x, y) = y→∞ ⎪ b−a ⎪ ⎩ 1, if x ≥ b.

and

⎧ 0, ⎪ ⎪ ⎨ y−c , FY (y) = lim FX,Y (x, y) = x→∞ ⎪ d −c ⎪ ⎩ 1,

if y < c;

if c ≤ y < d; if y ≥ d.

c) From the form of the marginal CDFs obtained in part (b), we see that X ∼ U(a, b) and Y ∼ U(c, d). d) From parts (a) and (b), we find that the joint CDF of X and Y equals the product of the marginal CDFs of X and Y . e) From part (d), we have FX,Y (x, y) = FX (x)FY (y) for all (x, y) ∈ R2 . Thus, - x - y - x - y fX (s) ds · fY (t) dt = fX (s)fY (t) ds dt. FX,Y (x, y) = FX (x)FY (y) = −∞

−∞

−∞ −∞

Therefore, from Proposition 9.4, the function f defined on R2 by f (x, y) = fX (x)fY (y) is a joint PDF of X and Y . Now applying Proposition 9.9 on page 523, we conclude that X and Y are independent random variables. 9.169 a) Taking the mixed partial of the joint CDF found in the solution to Exercise 9.168(a), we find that ⎧ 1 ⎨ , if a < x < b and c < y < d; ∂2 fX,Y (x, y) = FX,Y (x, y) = (b − a)(d − c) ⎩ ∂x ∂y 0, otherwise.


b) Because a point is being selected at random from the rectange R = (a, b) × (c, d), a joint PDF of X and Y should be a nonzero constant, say, k, on R, and zero elsewhere. As a joint PDF must integrate to 1, we have - 1=





b

fX,Y (x, y) dx dy =

−∞ −∞ "−1

! Hence, k = (b − a)(d − c)

d

a

c

, so that ⎧ 1 ⎨ , fX,Y (x, y) = (b − a)(d − c) ⎩ 0,

k dx dy = k(b − a)(d − c).

if a < x < b and c < y < d; otherwise.

9.170 Suppose that F is the joint CDF of the random variables X and Y . Then, from Proposition 9.1 on page 488, P (−1 < X ≤ 1, −1 < Y ≤ 1) = F (1, 1) − F (1, −1) − F (−1, 1) + F (−1, −1) = 1 − 1 − 1 + 0 = −1,

which is impossible because a probability cannot be negative. Hence, F can’t be the joint CDF of two random variables. 9.171 Completing the square relative to the variable x, we get 1 1 x 2 + xy + y 2 = x 2 + xy + y 2 − y 2 + y 2 = (x + 1/2)2 + (3/4)y 2 . 4 4 Then, e

! " −2 x 2 +xy+y 2 /3

Consequently, - ∞-



−∞ −∞

=e

! " −2 (x+1/2)2 +(3/4)y 2 /3

g(x, y) dx dy = = =

-



e

=e

−y 2 /2

−∞ - ∞ (√ −∞

! " −y 2 /2 −2 (x+1/2)2 /3

*-

e

∞ −∞

e

=e

! "2 A !√ "2 2 3/2 −y 2 /2 − x−(−1/2)

! "2 A !√ "2 − x−(−1/2) 2 3/2

e

dx

,

√ ) −y 2 /2 √ √ 2π 3/2 e dy = 2π 3/2

(√ √ ) (√ ) √ 2π 3/2 2π = 3π.

dy



−∞

e−y

2 /2

dy

Hence, g is the “variable form” of a joint PDF of two random variables. The actual joint PDF is the function f defined on R2 by ! 2 " 1 1 2 f (x, y) = √ g(x, y) = √ e−2 x +xy+y /3 . 3π 3π

9.172 We want to determine P (X + Y ≥ 1). We can obtain that probability by using the bivariate FPF or we can use Equation (9.44) on page 534 and the univariate FPF. Let’s use the bivariate FPF: / -- .- 2 1 1 P (X + Y ≥ 1) = fX,Y (x, y) dx dy = (2x + 2 − y) dy dx 4 0 1−x x+y≥1

1 = 8

-

0

1(

) 17 . 6x + 1 + 5x 2 dx = 24


9.173 a) Referring to Proposition 9.7 on page 510, we get - ∞ fX,Y (x, y) dy = fX (x) = −∞

x

−∞

0



e−y dy = e−x

if x > 0, and fX (x) = 0 otherwise. Hence, X ∼ E(1). Also, - ∞ - y fX,Y (x, y) dx = e−y dx = ye−y fY (y) = if y > 0, and fY (y) = 0 otherwise. Hence, Y ∼ '(2, 1). b) We can show that X and Y are not independent random variables in several ways. One way is to note that, for x > 0, fX,Y (x, y) e−y fY | X (y | x) = = −x = e−(y−x) fX (x) e if y > x, and fY | X (y | x) = 0 otherwise. As fY | X (y | x) is not a PDF of Y , Proposition 9.10(a) on page 527 shows that X and Y are not independent. c) From Proposition 9.12 on page 534, we have, for z > 0, - ∞ fX,Y (z − y, y) dy. fX+Y (z) = −∞

The integrand is nonzero if and only if 0 < z − y < y, that is, if and only if z/2 < y < z. Hence, - z ! " fX+Y (z) = e−y dy = e−z/2 − e−z = e−z ez/2 − 1 z/2

if z > 0, and fX+Y (z) = 0 otherwise. d) For y > 0,

fX | Y (x | y) =

fX,Y (x, y) e−y 1 = −y = fY (y) ye y

if 0 < x < y, and fX | Y (x | y) = 0 otherwise. Hence, X|Y =y ∼ U(0, y) for y > 0. e) We have - 5 - 5 P (Y < 5) = fY (y) dy = ye−y dy = 1 − 6e−5 0

−∞

and

--

P (X > 3, Y < 5) =

fX,Y (x, y) dx dy =

x>3, y 3 | Y < 5) =

-

5 3

.-

5 x

/

e−y dy dx

5(

) e−x − e−5 dx = e−3 − 3e−5 .

P (X > 3, Y < 5) e−3 − 3e−5 = = 0.0308. P (Y < 5) 1 − 6e−5

f) From part (d), X|Y =5 ∼ U(0, 5). Hence,

P (X > 3 | Y = 5) = (5 − 3)/5 = 0.4.


9.174 a) Jane will catch the bus if and only if she arrives no later than the bus does and the bus arrives no later than 5 minutes after Jane does, which is the event {Y ≤ X ≤ Y + 5}. Thus, by the FPF and independence, the required probability is --fX,Y (x, y) dx dy = fX (x)fY (y) dx dy. P (Y ≤ X ≤ Y + 5) = y≤x≤y+5

y≤x≤y+5

b) Let time be measured in minutes after 7:00 A.M.. Then X ∼ U(0, 30) and Y ∼ U(5, 20). Hence, from part (a), ./ -P (Y ≤ X ≤ Y + 5) =

fX (x)fY (y) dx dy =

y≤x≤y+5

=

-

20 5

.-

y+5 y

1 dx 30

/



y+5

fX (x) dx fY (y)dy

−∞

1 1 dy = 15 450

y

-

5

20

5 dy =

1 . 6

9.175 a) We have P (An ) = P (a < X1 ≤ b, . . . , a < Xn ≤ b) = P (min{X1 , . . . , Xn } > a, max{X1 , . . . , Xn } ≤ b) ! "n = P (X > a, Y ≤ b) = F (b) − F (a) .

b) The event that any particular Xk falls in the interval (a, b] has probability F (b) − F (a). Therefore, by independence, the number of Xk s that fall in the interval (a, b] has the binomial distribution with parameters n and F (b) − F (a). Hence, * , "k ! "n−k n ! F (b) − F (a) 1 − F (b) + F (a) , k = 0, 1, . . . , n. P (Ak ) = k c)

k

P (Ak )

0 1 2 3 4

0.0081 0.0756 0.2646 0.4116 0.2401

d)

k

P (Ak )

0 1 2 3 4

0.1194 0.3349 0.3522 0.1646 0.0289

9.176 a) Because a point is selected at random, a geometric probability model is appropriate. The sample space is S, the unit sphere, which has volume 4π/3. Letting | · | denote three-dimensional volume, we have, for E ⊂ R3 , |E ∩ S| 3 |E ∩ S| = = |E ∩ S|. P (E) = |S| 4π/3 4π b) Because a point is being selected at random from S, a joint PDF of X, Y , and Z should be a nonzero constant, say, k, on S, and zero elsewhere. As a joint PDF must integrate to 1, we have ----4π k. fX,Y,Z (x, y, z) dx dy dz = k dx dy dz = k|S| = 1= 3 R3

S


Hence, k = 3/4π, so that

c) From the FPF and part (b),

⎧ ⎨ 3 , fX,Y,Z (x, y, z) = 4π ⎩ 0,

! " P (E) = P (X, Y, Z) ∈ E =

---

if (x, y, z) ∈ S; otherwise.

fX,Y,Z (x, y, z) dx dy dz =

E

---

3 3 dx dy dz = |E ∩ S|. 4π 4π

E∩S

d) We note that, of the plane s = x with the unit sphere is, in y-z space, E for −1 < x < 1, the intersection F the disk Dx = (y, z) : y 2 + z2 < 1 − x 2 . Thus, - ∞- ∞ -" 3! " 3 3 3 ! fX,Y,Z (x, y, z) dy dz = dy dz = |Dx | = π 1 − x2 = 1 − x2 fX (x) = 4π 4π 4π 4 −∞ −∞ Dx

if −1 < x < 1, and fX (x) = 0 otherwise. Here, | · | denotes two-dimensional volume, that is, area. By symmetry, Y and Z have the same marginal PDFs as X. e) We note that, for x 2 + y 2 < 1, the&intersection of the planes & s = x and Ft = y with the unit sphere is, E in z space, the interval Ix,y = z : − 1 − x 2 − y 2 < z < 1 − x 2 − y 2 . Hence, - ∞ + + 3 3 3 fX,Y,Z (x, y, z) dz = dz = · 2 1 − x2 − y2 = 1 − x2 − y2 fX,Y (x, y) = 4π 4π 2π Ix,y −∞ if x 2 + y 2 < 1, and fX,Y (x, y) = 0 otherwise. By symmetry, the joint PDF of X and Z and the joint PDF of Y and Z are the same as the joint PDF of X and Y . f) From parts (b) and (e), we have, for x 2 + y 2 < 1, fZ | X,Y (z | x, y) =

fX,Y,Z (x, y, z) 3/4π 1 & = = & 2 2 fX,Y (x, y) (3/2π) 1 − x − y 2 1 − x2 − y2

if z ∈ Ix,y , and fZ | X,Y (z | x, y) = 0 otherwise. Hence, for x 2 + y 2 < 1, Z|X=x,Y =y ∼ U(Ix,y ). g) From parts (b) and (d), we have, for −1 < x < 1, fY,Z | X (y, z | x) =

1 fX,Y,Z (x, y, z) 3/4π = = 2 fX (x) (3/4)(1 − x ) π(1 − x 2 )

if (y, z) ∈ Dx , and fY,Z | X (y, z | x) = 0 otherwise. Hence, for −1 < x < 1, (Y, Z)|X=x ∼ U(Dx ). h) We can see in several ways that X, Y , and Z are not independent random variables. One way is to note from part (f) that fZ | X,Y (z | x, y) is not a PDF of Z, as it depends on x and y.

9.177 Let R denote the distance of the point chosen from the origin. Note that, for 0 < r < 1, the event {R ≤ r} occurs if and only if the point chosen lies in the sphere, Sr, of radius r centered at the origin. Hence, from Exercise 9.176(a), we have, for 0 < r < 1,

$$F_R(r) = P(R \le r) = \frac{3}{4\pi}\, |S_r| = \frac{3}{4\pi} \cdot \frac{4\pi r^3}{3} = r^3.$$

Differentiation now yields

$$f_R(r) = \begin{cases} 3r^2, & \text{if } 0 < r < 1; \\ 0, & \text{otherwise.} \end{cases}$$

Thus, R has the beta distribution with parameters 3 and 1.
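As an optional numerical check that R ∼ Beta(3, 1), one can draw points uniformly from the unit ball by rejection from the enclosing cube; Python/NumPy, the seed, and the sample size are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(7)     # seed chosen arbitrarily
reps = 200_000
# Rejection sampling: uniform points in the cube, kept only if inside the unit ball
pts = rng.uniform(-1, 1, size=(reps, 3))
inside = pts[np.sum(pts ** 2, axis=1) < 1]
r = np.sqrt(np.sum(inside ** 2, axis=1))

# If R ~ Beta(3, 1), then P(R <= r) = r^3 and E[R] = 3/4.
print("estimated P(R <= 0.5) (exact 0.125):", np.mean(r <= 0.5))
print("estimated E[R] (exact 0.75):        ", r.mean())
```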


9.178 Let S and T denote the lifetimes of the two components. Then system lifetime is Z = min{S, T }. From the FPF, P (Z ≤ 0.5) = P (min{S, T } ≤ 0.5) = P (S ≤ 0.5 or T ≤ 0.5)

= P (S > 0.5, T ≤ 0.5) + P (S ≤ 0.5) / / - 0.5 .- 1 - 1 .- 0.5 = f (s, t) ds dt + f (s, t) ds dt. 0

0.5

0

0

Hence, the correct answer is (e). 9.179 We must have 0 ≤ Y ≤ X ≤ 1. Hence, 9 2(x + y), if 0 < y < x < 1; fX,Y (x, y) = 0, otherwise. We have

fX (0.10) = Therefore,

-



−∞

fX,Y (0.10, y) dy =

-

0

0.10

2(0.10 + y) dy = 0.03.

2(0.10 + y) 200 fX,Y (0.10, y) = = (0.10 + y) fX (0.10) 0.03 3 if 0 < y < 0.10, and fY | X (y | 0.10) = 0 otherwise. Consequently, - 0.05 - 0.05 200 fY | X (y | 0.10) dy = (0.10 + y) dy = 0.417. P (Y < 0.05 | X = 0.10) = 3 −∞ 0 fY | X (y | 0.10) =

9.180 The sample space is D. Because a point is selected at random, a geometric probability model is appropriate. Thus, for each event E, |E| |E| P (E) = =√ |D| 2

where | · | denotes length (in the extended sense). a) Clearly, FX,Y (x, y) = 0 if x < 0 or y < 0, and FX,Y (x, y) = 1 if x ≥ 1 and y ≥ 1. Now suppose that 0 ≤ y < min{x, 1}. Then √ |{ (s, t) ∈ D : 0 < t < y }| 2y FX,Y (x, y) = P (X ≤ x, Y ≤ y) = = √ = y. √ 2 2 By symmetry, for 0 ≤ x < min{1, y}, we have FX,Y (x, y) = x. Hence, ⎧ 0, if x < 0 or y < 0; ⎪ ⎨ x, if 0 ≤ x < min{1, y}; FX,Y (x, y) = ⎪ ⎩ y, if 0 ≤ y < min{1, x}; 1, if x ≥ 1 and y ≥ 1.

b) Referring first to Proposition 9.2 on page 490 and then to part (a), we get # 0, if x < 0; FX (x) = lim FX,Y (x, y) = x, if 0 ≤ x < 1; y→∞ 1, if x ≥ 1.

By symmetry, Y has the same CDF as X.


c) Referring to part (b), we have fX (x) =

FX′ (x)

=

9

1, 0,

if 0 < x < 1; otherwise.

Hence, X ∼ U(0, 1). By symmetry, Y ∼ U(0, 1). d) Suppose to the contrary that X and Y have a joint PDF. Then, by the FPF, , - 1 *- x -! " fX,Y (x, y) dx dy = fX,Y (x, y) dy dx = 1 = P (X, Y ) ∈ D = 0

D

0

x

1

0 dx = 0,

Thus, we have a contradiction, which means that X and Y don’t have a joint PDF. 9.181 Let X, Y , and Z be the annual losses due to storm, fire, and theft, respectively, Then, by assumption, X, Y , and Z are independent E(1), E(2/3), and E(5/12) random variables. Let W = max{X, Y, Z}. We want to find P (W > 3). Applying first the complementation rule and then independence, we get ! " P (W > 3) = 1 − P (W ≤ 3) = 1 − P max{X, Y, Z} ≤ 3 = 1 − P (X ≤ 3, Y ≤ 3, Z ≤ 3) ( )( )( ) = 1 − P (X ≤ 3)P (Y ≤ 3)P (Z ≤ 3) = 1 − 1 − e−1·3 1 − e−(2/3)·3 1 − e−(5/12)·3 = 0.414.

9.182 Let X, Y , and Z denote the arrival times, which, by assumption are independent U(−5, 5) random variables. a) The probability that all three people arrive before noon is * , 0 − (−5) 3 1 P (X < 0, Y < 0, Z < 0) = P (X < 0)P (Y < 0)P (Z < 0) = = . 10 8

b) The probability that any particular one of the three persons arrives after 12:03 P.M. is 0.2. Let W denote the number of the three people that arrive after 12:03 P.M.. Then W ∼ B(3, 0.2). Thus, * , 3 P (W = 1) = (0.2)1 (1 − 0.2)3−1 = 0.384. 1 c) A joint PDF of X, Y , and Z is

1 1 1 1 · · = 10 10 10 1000 if −5 < x, y, z < 5, and fX,Y,Z (x, y, z) = 0 otherwise. By symmetry, the required probability equals --fX,Y,Z (x, y, z) dx dy dz 3P (Z ≥ X + 3, Z ≥ Y + 3) = 3 fX,Y,Z (x, y, z) = fX (x)fY (y)fZ (z) =

z≥x+3, z≥y+3

3 = 1000 3 = 1000 =

3 1000

-

5

−2

-

5

−2

-

0

7

.-

z−3 *- z−3

−5

.-

z−3 −5

−5

,

/

1 dy dx dz /

3 (z + 2) dx dz = 1000

u2 du = 0.343.

-

5

−2

(z + 2)2 dz


--

-

9.183 a) We have |C| =

1 dx dy =



−∞

C

.-

f (x) 0

/

1 dy dx =

-



−∞

f (x) dx = 1,

where the last equality follows from the fact that f is a PDF. Therefore, a joint PDF of X and Y is fX,Y (x, y) = 1/|C| = 1 if (x, y) ∈ C, and fX,Y (x, y) = 0 otherwise. b) From Proposition 9.7 on page 510, fX (x) =

-



−∞

fX,Y (x, y) dy =

-

f (x)

1 dy = f (x).

0

c) For fixed y, let A(y) = { x : f (x) ≥ y }. Then, from Proposition 9.7, - ∞ fY (y) = fX,Y (x, y) dx = 1 dx = |A(y)|, −∞

A(y)

if y > 0, and fY (y) = 0 otherwise. d) Referring to parts (a) and (b), we find that, for f (x) > 0, fX,Y (x, y) 1 = fX (x) f (x) ! " if 0 < y < f (x), and fY | X (y | x) = 0 otherwise. Hence, Y|X=x ∼ U 0, f (x) for all x with f (x) > 0. e) Referring to parts (a) and (c), we find that, for |A(y)| > 0, fY | X (y | x) =

fX,Y (x, y) 1 = fY (y) |A(y)| ! " if x ∈ A(y), and fX | Y (x | y) = 0 otherwise. Hence, X|Y =y ∼ U A(y) for all y > 0 with |A(y)| > 0. fX | Y (x | y) =

9.184 Let X and Y be two independent observations of the random variable. We want to determine P (X ≤ 1, Y ≤ 1). By independence, and upon making the substitution u = ex , we get P (X ≤ 1, Y ≤ 1) = P (X ≤ 1)P (Y ≤ 1) = =

*

2 π

-

0

e

.-

1

2

! " dx π e−x + ex ,2 * ,2 du 2 = arctan e = 0.602. π 1 + u2 −∞

/2

9.185 Let X and Y denote claim amount and processing time, respectively. By assumption, 9 2 fX (x) = (3/8)x , if 0 < x < 2; 0, otherwise. and, for 0 < x < 2, fY | X (y | x) =

9

1/x, if x < y < 2x; 0, otherwise.


Thus, from the general multiplication rule, Equation (9.34) on page 517, 1 = (3/8)x x if x < y < 2x and 0 < x < 2, and fX,Y (x, y) = 0 otherwise. Consequently, from Proposition 9.7 on page 510, we have, for 0 < y < 2, - ∞ - y 9 2 fY (y) = fX,Y (x, y) dx = (3/8)x dx = y 64 −∞ y/2 fX,Y (x, y) = fX (x)fY | X (y | x) = (3/8)x 2 ·

and, for 2 < y < 4,

fY (y) =

-

P (1 ≤ Y ≤ 3) =

-

Therefore,



−∞

1

fX,Y (x, y) dx = -

3

fY (y), dy =

2

-

2

y/2

(3/8)x dx =

9 2 y dy + 64

1

9.186 We have fX,Y (x, y) = fX (x)fY (y) =

-

3 2

3 (16 − y 2 ). 64

3 (16 − y 2 ) dy = 0.781. 64

1 fY (y) 2h

if −h < x < h, and fX,Y (x, y) = 0 otherwise. a) Applying the bivariate FPF and making the substitution u = z − x, we get , -- h *- z−x 1 fX,Y (x, y) dx dy = fY (y) dy dx FX+Y (z) = P (X + Y ≤ z) = 2h −h −∞ x+y≤z

1 = 2h

-

1 FY (z − x) dx = 2h −h h

b) Differentiating the result in part (a) yields,

-

z+h

FY (u) du.

z−h

" 1 ! FY (z + h) · 1 − FY (z − h) · 1 2h - z+h 1 1 = P (z − h < Y ≤ z + h) = fY (u) du. 2h 2h z−h

′ fX+Y (z) = FX+Y (z) =

c) Applying first Proposition 9.12 and then making the substitution u = z − x, we get - ∞ - h - z+h 1 1 fX,Y (x, z − x) dx = fY (z − x) dx = fY (u) du. fX+Y (z) = 2h −h 2h z−h −∞ d) From part (c) and the FEF, - z fX+Y (t) dt = FX+Y (z) = −∞

z −∞

*

1 2h

-

t+h

t−h

,

fY (u) du dt

*- z , - z ! " 1 FY (t + h) − FY (t − h) dt = FY (t + h) dt − FY (t − h) dt 2h −∞ −∞ −∞ *- z+h , - z+h - z−h 1 1 = FY (u) du − FY (u) du = FY (u) du. 2h −∞ 2h z−h −∞ 1 = 2h

-

z


9.187 Let X denote the time, in minutes, that it takes the device to warm up, and let Y denote the lifetime, in minutes, of the device after it warms up. By assumption, X and Y are independent U(1, 3) and E(1/600) random variables, respectively, and T = X + Y . We have fX,Y (x, y) = fX (x)fY (y) =

1 1 −y/600 1 −y/600 · e e = 2 600 1200

if 1 < x < 3 and y > 0, and fX,Y (x, y) = 0 otherwise. From Proposition 9.12 on page 534, for t > 1, - ∞ fT (t) = fX,Y (x, t − x) dx. −∞

The integrand is nonzero if and only if 1 < x < 3 and t − x > 0, that is, if and only if 1 < x < min{t, 3}. Thus, e−t/600 min{t,3} x/600 e dx = e dx 1200 1 1 " #1! −(t−1)/600 , ) 1 − e e−t/600 ( min{t,3}/600 2 e − e1/600 = = ! −(t−3)/600 " 1 2 − e−(t−1)/600 , 2 e

1 fT (t) = 1200

-

min{t,3}

−(t−x)/600

1 < t < 3; t > 3;

and fT (t) = 0 otherwise.

9.188 Let X(t) and Y (t) denote the x and y coordinates, respectively, at time t. By assumption, X(t) and Y (t) are independent N (0, σ 2 t) random variables. Thus, by Example 9.23 on page 541, the random variable U (t) = Y (t)/X(t) has the standard Cauchy distribution. Now, we have that /(t) = arctan U (t). Applying the univariate transformation theorem with g(u) = arctan u, we get, for −π/2 < θ < π/2, f/(t) (θ) = Thus, /(t) ∼ U(−π/2, π/2).

1

fU (u) |g ′ (u)|

= (1 + u2 ) ·

1 1 = . 2 π π(1 + u )

9.189 a) Let Z denote the fraction of harmful impurities. Then Z = XY . Using the FPF, independence, and making the substitution u = xy, we get FZ (z) = P (XY ≤ z) = =

-

0

1 *- z 0

--

xy≤z

fX,Y (x, y) dx dy =

-

0

1 *- z/x 0

,

fY (y) dy fX (x) dx

/ , - z .- 1 1 1 fY (u/x) du fX (x) dx = fX (x)fY (u/x) dx du. x 0 0 x

Therefore, from Proposition 8.5 on page 422, fZ (z) =

0

if 0 < z < 1, and fZ (z) = 0 otherwise. b) We have 9 10, if 0 < x < 0.1; fX (x) = 0, otherwise.

1

1 fX (x)fY (z/x) dx x

and

fY (y) =

9

2, 0,

if 0 < y < 0.5; otherwise.


From part (a), then,

$$f_Z(z) = \int_0^1 \frac{1}{x}\, f_X(x) f_Y(z/x)\, dx = \int_{2z}^{0.1} \frac{1}{x}\, (10 \cdot 2)\, dx = 20 \bigl[ \ln(0.1) - \ln(2z) \bigr] = -20 \ln(20 z)$$

if 0 < z < 0.05, and fZ (z) = 0 otherwise. The PDF of Z is strictly decreasing on the range of Z, which, in particular, means that the fraction of harmful impurities is more likely to be small (close to 0) than large (close to 0.05). 9.190 a) From Proposition 9.7 on page 510, - ∞ fX,Y (x, y) dy = fX (x) =
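An optional numerical check of the density just found, assuming Python/NumPy (not part of the text); the evaluation point, bandwidth, seed, and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(8)     # seed chosen arbitrarily
reps = 200_000
x = rng.uniform(0, 0.1, reps)      # fraction of impurities
y = rng.uniform(0, 0.5, reps)      # fraction of the impurities that are harmful
z = x * y                          # fraction of harmful impurities

# Compare an empirical density estimate with f_Z(z) = -20 ln(20 z) at a point in (0, 0.05).
z0, h = 0.01, 0.001
print("empirical density near z = 0.01:", np.mean(np.abs(z - z0) < h) / (2 * h))
print("exact density -20*ln(20*0.01):  ", -20 * np.log(20 * z0))
```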

x

0

−∞

8xy dy = 4x 3

if 0 < x < 1, and fX (x) = 0 otherwise. Thus, X has the beta distribution with parameters 4 and 1. b) From Proposition 9.7, fY (y) =

-



−∞

-

fX,Y (x, y) dx =

y

1

8xy dx = 4y(1 − y 2 )

if 0 < y < 1, and fY (y) = 0 otherwise. c) Let Z denote the amount of the item remaining at the end of the day, so that Z = X − Y . Then, by the FPF, we have, for 0 < z < 1, FZ (z) = P (Z ≤ z) = P (X − Y ≤ z) =

--

fX,Y (x, y) dx dy

x−y≤z

= =

-

0

-

z *z

0

,

x 0

8xy dy dx +

3

4x dx +

-

1 z

-

z

1 *-

,

x

8xy dy dx x−z

(8x 2 z − 4xz2 ) dx =

1 4 8 z − 2z2 + z. 3 3

Differentiation now yields fZ (z) =

4 3 8 z − 4z + 3 3

if 0 < z < 1, and fZ (z) = 0 otherwise. d) By the FPF, P (X − Y < X/3) = P (Y > 2X/3) =

--

fX,Y (x, y) dx dy =

y>2x/3

=

-

0

1*

-

0

1 *- x

2x/3

, y dy 8x dx

, 5 2 20 1 3 5 x 8x dx = x dx = . 18 9 0 9

9.191 Let X1 , . . . , X25 denote the 25 randomly selected claims. We know that Xk ∼ N (19.4, 25) for 1 ≤ k ≤ 25, and we can reasonably assume that X1 , . . . , X25 are independent random variables.


Now, from Proposition 9.14 on page 540,

$$\bar{X}_{25} = \frac{X_1 + \cdots + X_{25}}{25} = \frac{1}{25} X_1 + \cdots + \frac{1}{25} X_{25}
\sim N\!\left( \frac{1}{25} \cdot 19.4 + \cdots + \frac{1}{25} \cdot 19.4,\; \left(\frac{1}{25}\right)^2 \cdot 25 + \cdots + \left(\frac{1}{25}\right)^2 \cdot 25 \right) = N(19.4,\, 1).$$

Therefore, from Proposition 8.11 on page 443,

$$P(\bar{X}_{25} > 20) = 1 - P(\bar{X}_{25} \le 20) = 1 - \Phi\!\left( \frac{20 - 19.4}{1} \right) = 0.274.$$
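As an optional check of the value 0.274, one can evaluate 1 − Φ(0.6) directly and also simulate the 25 claims; Python/NumPy, the seed, and the replication count are assumptions of this sketch, not part of the text.

```python
import math
import numpy as np

# Exact value from the normal model: X-bar ~ N(19.4, 1), so P(X-bar > 20) = 1 - Phi(0.6).
def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

print("1 - Phi(0.6) =", 1 - std_normal_cdf(0.6))    # about 0.274

# Monte Carlo check with 25 independent N(19.4, 25) claims per replication
rng = np.random.default_rng(9)                      # seed chosen arbitrarily
claims = rng.normal(19.4, 5.0, size=(100_000, 25))  # sd = sqrt(25) = 5
print("simulated P(X-bar > 20):", np.mean(claims.mean(axis=1) > 20))
```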

9.192 Let E denote the event that the total claim amount of Company B will exceed that of Company A. Also, let X and Y denote the claim amounts for Companies A and B, respectively, given a positive claim. By assumption, X ∼ N (10, 4) and Y ∼ N (9, 4), and X and Y are independent. First we find P (Y > X). From the independence assumption and Proposition 9.14 on page 540, Y − X ∼ N (−1, 8). Therefore, from Proposition 8.11 on page 443, , * 0 − (−1) = 0.362. P (Y > X) = P (Y − X > 0) = 1 − P (Y − X ≤ 0) = 1 − √ 8 Next let

A = event that no claim is made on either Company A or Company B,

B = event that no claim is made on Company A but at least one claim is made on Company B, C = event that at least one claim is made on Company A but no claim is made on Company B,

D = event that at least one claim is made on both Company A and Company B.

Applying the law of total probability now yields

P (E) = P (A)P (E | A) + P (B)P (E | B) + P (C)P (E | C) + P (D)P (E | D) = (0.6)(0.7) · 0 + (0.6)(0.3) · 1 + (0.4)(0.7) · 0 + (0.4)(0.3) · 0.362 = 0.223.

9.193 By assumption fI (i) =

9

6i(1 − i), 0 < i < 1; 0, otherwise.

and

fR (r) =

9

2r, 0 < r < 1; 0, otherwise.

a) For 0 < w < 1, we use independence and apply the FPF to get --! 2 " fI,R (i, r) di dr = fI (i)fR (r) di dr FW (w) = P (W ≤ w) = P I R ≤ w = i 2 r≤w

=1−

-

=1−6

1 √ w

-

.-

1 √ w

/

1 w/i 2

*

2r dr 6i(1 − i) di = 1 −

w2 w2 i−i − 3 + 2 i i 2

= 3w2 + 6w − 8w 3/2 .

,

i 2 r≤w

-

1

√ w

*

*

w2 1− 4 i

,

6i(1 − i) di

1 1 2 4 − w − w + w3/2 dw = 1 − 6 6 2 3

,


Differentiation now yields 9 1/2 fW (w) = 6w + 6 − 12w , 0,

if 0 < w < 1; = otherwise.

9 ! √ "2 6 1− w , 0,

if 0 < w < 1; otherwise.

b) Let W = I 2 R and X = R (the “dummy variable”). The Jacobian determinant of the transformation w = i 2 r and x = r is @ @ @ 2ir i 2 @ @ = 2ir. @ J (i, r) = @ 0 1@ √ Solving the equations w = i 2 r and x = r for i and r, we obtain the inverse transformation i = w/x and r = x. Therefore, by the bivariate transformation theorem, fW,X (w, x) =

( ) & 1 1 fI,R (i, r) = 6i(1 − i) · 2r = 6(1 − i) = 6 1 − w/x |J (i, r)| 2ir

if 0 < w < x < 1, and fW,X (w, x) = 0 otherwise. Applying Proposition 9.7 on page 510 now yields fW (w) =

-



−∞

fW,X (w, x) dx =

-

1

w

if 0 < w < 1, and fW (w) = 0 otherwise.

( ) & ! " ! √ √ "2 6 1 − w/x dx = 6 1 − 2 w + w = 6 1 − w

9.194 We have S = cW D 3 . To obtain a PDF of S, we apply the bivariate transformation theorem with S = cW D 3 and T = D (the “dummy variable”). The Jacobian determinant of the transformation s = cwd 3 and t = d is @ 3 @ @ cd 3cdw2 @@ @ J (w, d) = @ = cd 3 . @ 0 1

A! " Solving the equations s = cwd 3 and t = d for w and d, we obtain the inverse transformation w = s ct 3 and d = t. Hence, by the bivariate transformation theorem and the assumed independence of W and D, fS,T (s, t) =

( s ) 1 1 1 fW,D (w, d) = 3 fW (w)fD (d) = 3 fW fD (t) |J (w, d)| cd ct ct 3

if s > 0 and t > 0, and fS,T (s, t) = 0 otherwise. Consequently, by Proposition 9.7 on page 510, fS (s) =

-



−∞

fS,T (s, t) dt =

-

0



( s ) 1 fD (t) dt f W ct 3 ct 3

if s > 0, and fS (s) = 0 otherwise. Bn B 9.195 Let X = m k=1 Xk and Y = k=m+1 Xk . From Proposition 6.13 on page 297, X! and Y are " independent random variables. Furthermore, from Proposition 9.14 on page 540, X ∼ N mµ, mσ 2 " ! and Y ∼ N (n − m)µ, (n − m)σ 2 . Let U = X + Y and V = X. The Jacobian determinant of the transformation u = x + y and v = x is @ @ @1 1@ @ @ = −1. J (x, y) = @ 1 0@


Solving the equations u = x + y and v = x for x and y, we obtain the inverse transformation x = v and y = u − v. Hence, by the bivariate transformation, for all u, v ∈ R, 1 fX,Y (x, y) = fX (x)fY (y) = fX (v)fY (u − v) |J (x, y)| , * , * ! "2 1 1 −(v−mµ)2 /2mσ 2 − (u−v)−(n−m)µ) /2(n−m)σ 2 · √ = √ e e 2π(n − m) σ 2πm σ 1 1 = e− 2 Q(u,v) , √ √ √ 2π n σ m σ 1 − m/n

fU,V (u, v) =

where 1 Q(u, v) = 1 − m/n

#*

u − nµ √ nσ

,2

G

m −2 n

*

u − nµ √ nσ

,*

v − mµ √ mσ

,

+

*

v − mµ √ mσ

,2 H

.

Theory Exercises 9.196 From the FPF, the general multiplication rule, and Equation (9.33) on page 517, !

"

P (X ∈ A, Y ∈ B) = P (X, Y ) ∈ A × B = = =

- *A

-

A

B

--

A×B

fX,Y (x, y) dx dy = ,

fX (x)fY | X (y | x) dy dx =

- *A

B

- *A

,

fX,Y (x, y) dy dx B

,

fY | X (y | x) dy fX (x) dx

P (Y ∈ B | X = x)fX (x) dx.

9.197 a) Applying the definition of a conditional PDF, the general multiplication rule, and Exercise 9.79(a), we get fX (x)fY | X (y | x) fX,Y (x, y) . = $∞ fX | Y (x | y) = fY (y) −∞ fX (s)fY | X (y | s) ds This result shows that we can obtain a conditional PDF of X given Y = y when we know a marginal PDF of X and conditional PDFs of Y given X = x for all x ∈ R. b) For y > 0, fX (x)fY | X (y | x) λ e−λx λ e−λ(y−x) = $ y −λs −λ(y−s) λe ds 0 λe −∞ fX (s)fY | X (y | s) ds

fX | Y (x | y) = $ ∞ = $y 0

λ2 e−λy λ2 e−λy 1 $y , = = y λ2 e−λy ds λ2 e−λy 0 1 ds

if 0 < x < y, and fX | Y (x | y) = 0 otherwise. Thus, X|Y =y ∼ U(0, y) for y > 0; that is, the posterior distribution of X under the condition that Y = y is uniform on the interval (0, y).


9.198 a) From Proposition 9.2 on page 490 and Proposition 8.1 on page 411, FU (u) = lim FU,V (u, v) = lim G(u, v) = lim max{FX (u) + FY (v) − 1, 0} v→∞

v→∞

v→∞

= max{FX (u) + 1 − 1, 0} = max{FX (u), 0} = FX (u). Thus, FU = FX . Similarly, FV = FY . b) Let S and T be random variables having H as their joint CDF. Again, from Proposition 9.2 and Proposition 8.1, FS (s) = lim FS,T (s, t) = lim H (s, t) = lim min{FX (s), FY (t)} t→∞

t→∞

t→∞

= min{FX (s), 1} = FX (s).

Thus, FS = FX . Similarly, FT = FY . c) From Bonferroni’s inequality, Exercise 2.74 on page 76, FX,Y (x, y) = P (X ≤ x, Y ≤ y) ≥ P (X ≤ x) + P (Y ≤ y) − 1 = FX (x) + FY (y) − 1. Because, in addition, FX,Y (x, y) ≥ 0, we conclude that FX,Y (x, y) ≥ max{FX (x) + FY (y) − 1, 0} = G(x, y). Now, because {X ≤ x, Y ≤ y} ⊂ {X ≤ x} and {X ≤ x, Y ≤ y} ⊂ {Y ≤ y}, the domination principle implies that FX,Y (x, y) = P (X ≤ x, Y ≤ y) ≤ min{P (X ≤ x), P (Y ≤ y)} = min{FX (x), FY (y)} = H (x, y). We have now shown that G(x, y) ≤ FX,Y (x, y) ≤ H (x, y),

or, in other words,

max{FX (x) + FY (y) − 1, 0} ≤ FX,Y (x, y) ≤ min{FX (x), FY (y)} for all x, y ∈ R.

Advanced Exercises 9.199 a) We have FX,Y (x, y) = P (X ≤ x, Y ≤ y) = P (X ≤ x, 1 − X ≤ y) ⎧ 0, if x < 0, y < 0, or x + y < 1; ⎪ ⎪ ⎪ ⎨ x − (1 − y), if 0 < 1 − y ≤ x < 1; = P (1 − y ≤ X ≤ x) = 1 − (1 − y), if 0 ≤ y < 1 and x ≥ 1; ⎪ ⎪ if 0 ≤ x < 1 and y ≥ 1; ⎪ ⎩ x − 0, 1, if x ≥ 1 and y ≥ 1. ⎧ 0, if x < 0, y < 0, or x + y < 1; ⎪ ⎪ ⎪ x + y − 1, if 0 < 1 − y ≤ x < 1; ⎨ if 0 ≤ y < 1 and x ≥ 1; = y, ⎪ ⎪ x, if 0 ≤ x < 1 and y ≥ 1; ⎪ ⎩ 1, if x ≥ 1 and y ≥ 1.


b) Referring to the result of part (a), we deduce from Proposition 9.2 on page 490 that # 0, if x < 0; FX (x) = lim FX,Y (x, y) = x, if 0 ≤ x < 1; y→∞ 1, if x ≥ 1. and # 0, if y < 0; FY (y) = lim FX,Y (x, y) = y, if 0 ≤ y < 1; x→∞ 1, if y ≥ 1. c) Differentiating the results in part (b) yields 9 1, if 0 < x < 1; ′ fX (x) = FX (x) = 0, otherwise. and

fY (y) =

FY′ (y)

=

9

1, 0,

if 0 < y < 1; otherwise.

Thus, we see that both X and Y have the uniform distribution on the interval (0, 1). d) No, X and Y don’t have a joint PDF. Suppose to the contrary that they do. Note that, as Y = 1 − X, we have P (X + Y = 1) = 1. Hence, from the FPF, -fX,Y (x, y) dx dy 1 = P (X + Y = 1) = x+y=1

=

- 1 .0

1−x 1−x

/

fX,Y (x, y) dy dx =

-

0

1

0 dx = 0,

which is impossible. Hence, X and Y don’t have a joint PDF. 9.200 a) From Proposition 9.2 on page 490, FX (x) = lim FX,Y (x, y) = y→∞

and

9

0, 1 − e−x ,

# 0,

if x < 0; if x ≥ 0.

if y < 0; FY (y) = lim FX,Y (x, y) = 1/2, if 0 ≤ y < 1; x→∞ 1, if y ≥ 1. b) From part (a), we see that X ∼ E(1) and Y ∼ B(1, 1/2) (or, equivalently, Y has the Bernoulli distribution with parameter 1/2). c) No, X and Y don’t have a joint PDF. One way to see this fact is to note that Y is a discrete random variable. If two random variables have a joint PDF, then both must have a (marginal) PDF; in particular, both must be continuous random variables. d) From part (a), we see that FX,Y (x, y) = FX (x)FY (y) for all x, y ∈ R. Hence, from Exercise 9.15(b), we conclude that X and Y are independent random variables. e) Applying first the result of part (d) and then that of parts (a) and (b), we get " ! P (3 < X < 5, Y = 0) = P (3 < X < 5)P (Y = 0) = FX (5) − FX (3) P (Y = 0) (! ) " ! ") 1 ( −3 e − e−5 = 1 − e−5 − 1 − e−3 P (Y = 0) = 2 = 0.0215.


9.201 a) For 0 ≤ y < 1, ! " FY (y) = P (Y ≤ y) = P X − ⌊X⌋ ≤ y

= P (X ≤ y) + P (1 ≤ X ≤ y + 1) + P (2 ≤ X ≤ y + 2) =

y y y + + = y. 3 3 3

Therefore Y ∼ U(0, 1). b) The range of (X, Y ) consists of following three line segments: { (x, y) : y = x, 0 < x < 1 }, { (x, y) : y = x − 1, 1 ≤ x < 2 }, { (x, y) : y = x − 2, 2 ≤ x < 3 }. A graph of the range of (X, Y ) is as follows:

c) Clearly, we have FX,Y (x, y) = 0 if x < 0 or y < 0, and FX,Y (x, y) = 1 if x ≥ 3 and y ≥ 1. From part (a), we see that, if x ≥ 3 and 0 ≤ y < 1, FX,Y (x, y) = P (X ≤ x, Y ≤ y) = P (Y ≤ y) = FY (y) = y. Also, if 0 ≤ x < 3 and y ≥ 1, FX,Y (x, y) = P (X ≤ x, Y ≤ y) = P (X ≤ x) = FX (x) = x/3. Furthermore, by referring to the graph obtained in part (b), we find that, for 0 ≤ x < 3 and 0 ≤ y < 1, ⎧ x/3, ⎪ ⎪ ⎪ y/3, ⎪ ⎪ ⎨ (x − 1 + y)/3, FX,Y (x, y) = 2y/3, ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ (x − 2 + 2y)/3, y,

if 0 ≤ x < y; if y ≤ x < 1; if 1 ≤ x < y + 1; if y + 1 ≤ x < 2; if 2 ≤ x < y + 2; if y + 2 ≤ x < 3.


In summary, then, ⎧ 0, ⎪ ⎪ ⎪ x/3, ⎪ ⎪ ⎪ ⎪ y/3, ⎪ ⎪ ⎨ (x − 1 + y)/3, FX,Y (x, y) = 2y/3, ⎪ ⎪ ⎪ ⎪ (x − 2 + 2y)/3, ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ y, 1,

if x < 0 or y < 0; if 0 ≤ x < y < 1 or if 0 ≤ x < 3 and y ≥ 1; if 0 ≤ y ≤ x < 1; if 1 ≤ x < y + 1 < 2; if 1 ≤ y + 1 ≤ x < 2; if 2 ≤ x < y + 2 < 3; if 2 ≤ y + 2 ≤ x < 3 or if x ≥ 3 and 0 ≤ y < 1; if x ≥ 3 and y ≥ 1.

9.202 a) We have Z = X − Y = X − (X − ⌊X⌋) = ⌊X⌋. Thus, 9 9 P (z ≤ X < z + 1), if z = 0, 1, 2; 1/3, if z = 0, 1, 2; pZ (z) = P (Z = z) = = 0, otherwise. 0, otherwise.

Therefore, Z has the discrete uniform distribution on the set {0, 1, 2}. b) If two random variables have a joint PDF, their difference is a continuous random variable (with a PDF). By part (a), X − Y is a discrete random variable and, hence, it can’t be a continuous random variable. Thus, X and Y can’t have a joint PDF. c) Let R denote the range of X and Y . We know that R is the union of the three line segments shown in the solution to Exercise 9.201(b), which we call L1 , L2 , and L3 , respectively. If X and Y had a joint PDF, we would have -! " 1 = P (X, Y ) ∈ R = fX,Y (x, y) dx dy = =

--

R

L1

--

- 1 *-

,

0

fX,Y (x, y) dx dy +

fX,Y (x, y) dx dy +

L2

x

fX,Y (x, y) dy dx +

x

- 2 .1

--

fX,Y (x, y) dx dy

L3

/

x−1 x−1

fX,Y (x, y) dy dx +

-

3 2

.-

x−2

/

fX,Y (x, y) dy dx x−2

= 0 + 0 + 0 = 0,

which is impossible. Hence, X and Y can’t have a joint PDF. 9.203 a) Let M denote the sample median. For a sample of size 2n + 1, M is the (n + 1)st order statistic. Hence, from the solution to Exercise 9.33(b), ! "(n+1)−1 ! "(2n+1)−(n+1) (2n + 1)! " ! " f (m) F (m) fM (m) = ! 1 − F (m) (n + 1) − 1 ! (2n + 1) − (n + 1) ! =

b) In this case,

! "n ! "n (2n + 1)! f (m) F (m) 1 − F (m) . 2 (n!)

FX (x) =

#

0, x, 1,

if x < 0; if 0 ≤ x < 1; if x ≥ 1.

and

fX (x) =

9

1, 0,

if 0 < x < 1; otherwise.


Applying the result of part (a) with n = 2 now yields

5! 1 · m2 (1 − m)2 = 30m2 (1 − m)2 (2!)2

fM (m) =

if 0 < m < 1, and fM (m) = 0 otherwise. Thus, M has the beta distribution with parameters 3 and 3. We want to determine the probability that the sample median will be within 1/4 of the distribution median of 1/2. Applying the FPF yields - 3/4 P (|M − 1/2| ≤ 1/4) = fM (m) dm = 30m2 (1 − m)2 = 0.793. |m−1/2|≤1/4

1/4

9.204 For i = 1, 2, let Li denote the event that employee i incurs a loss and let Yi denote the loss amount in thousands of dollars for employee i. By assumption, L1 and L2 are independent events each having probability 0.4, Y1 and Y2 are independent random variables, and the conditional distribution of Yi given event Li occurs is uniform on the interval (1, 5). First let’s determine P (Yi > 2) for i = 1, 2. By the law of total probability, 3 + 0.6 · 0 = 0.3. 4 Next let’s determine P ({Y1 > 2} ∪ {Y2 > 2}). We have, by the general addition rule and the independence of Y1 and Y2 , P (Yi > 2) = P (Li )P (Yi > 2 | Li ) + P (Lci )P (Y1 > 2 | Lci ) = 0.4 ·

P ({Y1 > 2} ∪ {Y2 > 2}) = P (Y1 > 2) + P (Y2 > 2) − P (Y1 > 2, Y2 > 2) = P (Y1 > 2) + P (Y2 > 2) − P (Y1 > 2)P (Y2 > 2) = 0.3 + 0.3 − (0.3)2 = 0.51.

Now let’s determine P (Y1 + Y2 > 8). Again, by the law of total probability,

P (Y1 + Y2 > 8) = P (L1 ∩ L2 )P (Y1 + Y2 > 8 | L1 ∩ L2 ) ! " ! " + P (L1 ∩ L2 )c P Y1 + Y2 > 8 | (L1 ∩ L2 )c . ! " As the maximum loss of an employee is 5, P Y1 + Y2 > 8 | (L1 ∩ L2 )c = 0. Also, by the independence of L1 and L2 , we have P (L1 ∩ L2 ) = P (L1 )P (L2 ) = 0.4 · 0.4 = 0.16. Moreover, by the independence and assumed conditional distributions of Y1 and Y2 , P (Y1 + Y2 > 8 | L1 ∩ L2 ) = P (X1 + X2 > 8), where X1 and X2 are independent U(1, 5) random variables. Applying the FPF, we get / -- 5 .- 5 1 P (X1 + X2 > 8) = fX1 ,X2 (x1 , x2 ) dx1 dx2 = dx2 dx1 3 8−x1 16 x1 +x2 >8

1 = 16 Thus,

-

3

5

(x1 − 3) dx1 = 0.125.

P (Y1 + Y2 > 8) = 0.16 · 0.125 + 0.84 · 0 = 0.02.

a) Here we assume that “one” means a specified one, say, employee 1. The problem is then to determine the probability P (Y1 + Y2 > 8 | Y1 > 2). Observing that {Y1 + Y2 > 8} ⊂ {Y1 > 2} and applying the conditional probability rule, we get P (Y1 + Y2 > 8 | Y1 > 2) =

P (Y1 + Y2 > 8) 0.02 1 P (Y1 + Y2 > 8, Y1 > 2) = = = . P (Y1 > 2) P (Y1 > 2) 0.3 15


b) Here we assume that "one" means at least one. The problem is then to determine the probability P(Y1 + Y2 > 8 | {Y1 > 2} ∪ {Y2 > 2}). Noting that {Y1 + Y2 > 8} ⊂ {Y1 > 2} ∪ {Y2 > 2}, we get

P(Y1 + Y2 > 8 | {Y1 > 2} ∪ {Y2 > 2}) = P(Y1 + Y2 > 8)/P({Y1 > 2} ∪ {Y2 > 2}) = 0.02/0.51 = 2/51.

c) Here we assume that "one" means exactly one. The problem is then to determine the probability P(Y1 + Y2 > 8 | E), where E = {Y1 > 2, Y2 ≤ 2} ∪ {Y1 ≤ 2, Y2 > 2}. However, given that E occurs, the maximum total loss amount is $7000. Hence, P(Y1 + Y2 > 8 | E) = 0; that is, P(Y1 + Y2 > 8 | {Y1 > 2, Y2 ≤ 2} ∪ {Y1 ≤ 2, Y2 > 2}) = 0.

9.205
a) Let FX and FY denote the two marginal CDFs of Fα. Then, from Proposition 9.2 on page 490 and Proposition 8.1 on page 411,

FX(x) = lim_{y→∞} Fα(x, y) = lim_{y→∞} G(x)H(y)[1 + α(1 − G(x))(1 − H(y))] = G(x) · 1 · [1 + α(1 − G(x))(1 − 1)] = G(x)

and

FY(y) = lim_{x→∞} Fα(x, y) = lim_{x→∞} G(x)H(y)[1 + α(1 − G(x))(1 − H(y))] = 1 · H(y)[1 + α(1 − 1)(1 − H(y))] = H(y).

Thus, FX = G and FY = H.
b) In this case, we take G(x) = 0 if x < 0, G(x) = x if 0 ≤ x < 1, G(x) = 1 if x ≥ 1, and H(y) = 0 if y < 0, H(y) = y if 0 ≤ y < 1, H(y) = 1 if y ≥ 1. Hence,

Fα(x, y) = 0 if x < 0 or y < 0;
Fα(x, y) = xy[1 + α(1 − x)(1 − y)] if 0 ≤ x < 1 and 0 ≤ y < 1;
Fα(x, y) = y if x ≥ 1 and 0 ≤ y < 1;
Fα(x, y) = x if 0 ≤ x < 1 and y ≥ 1;
Fα(x, y) = 1 if x ≥ 1 and y ≥ 1.

From part (a), each Fα has G and H as its marginal CDFs, which, in this case, are both CDFs of a uniform distribution on the interval (0, 1). Now, it is easy to see that, if Fα1 = Fα2, then α1 = α2. Hence, we have an uncountably infinite number of different bivariate CDFs (one for each α ∈ [−1, 1]) whose marginals are both CDFs of a uniform distribution on the interval (0, 1).
c) Taking the mixed partial of Fα in part (b), we get

fα(x, y) = 1 + α(1 − 2x)(1 − 2y) if 0 ≤ x < 1 and 0 ≤ y < 1, and fα(x, y) = 0 otherwise.
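A quick numerical check of part (c) — illustrative only, with an arbitrary value of α — confirms that fα integrates to 1 over the unit square and has U(0, 1) marginals, in agreement with Exercise 9.206(b):

import numpy as np

# Midpoint-rule check of the density f_alpha(x, y) = 1 + alpha*(1 - 2x)*(1 - 2y).
alpha = 0.7                                    # any value in [-1, 1]
n = 400
x = (np.arange(n) + 0.5) / n                   # grid midpoints on (0, 1)
y = (np.arange(n) + 0.5) / n
X, Y = np.meshgrid(x, y, indexing="ij")
f = 1 + alpha * (1 - 2 * X) * (1 - 2 * Y)

print(f"total mass ~ {f.mean():.6f}")                                  # should be ~1
print(f"max |f_X(x) - 1| ~ {np.abs(f.mean(axis=1) - 1).max():.2e}")    # uniform marginal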


9.206
a) Taking the mixed partial of Fα, as defined in the statement of Exercise 9.205, yields

fα(x, y) = (1 + α)g(x)h(y) − 2αg(x)h(y)[G(x) + H(y)] + 4αg(x)h(y)G(x)H(y).

b) Let fX and fY denote the two marginal PDFs of Fα. Applying Proposition 9.7 on page 510 and making the substitution u = H(y) in the appropriate integrals, we get

fX(x) = ∫_{−∞}^{∞} fα(x, y) dy
 = ∫_{−∞}^{∞} [ (1 + α)g(x)h(y) − 2αg(x)h(y)(G(x) + H(y)) + 4αg(x)h(y)G(x)H(y) ] dy
 = (1 + α)g(x) ∫_{−∞}^{∞} h(y) dy − 2αg(x)G(x) ∫_{−∞}^{∞} h(y) dy − 2αg(x) ∫_{−∞}^{∞} h(y)H(y) dy + 4αg(x)G(x) ∫_{−∞}^{∞} h(y)H(y) dy
 = (1 + α)g(x) − 2αg(x)G(x) − 2αg(x) ∫_0^1 u du + 4αg(x)G(x) ∫_0^1 u du
 = (1 + α)g(x) − 2αg(x)G(x) − αg(x) + 2αg(x)G(x) = g(x).

Thus, fX = g. Similarly, we find that fY = h.
c) From Exercise 9.205(a), we know that FX = G and FY = H. Therefore, we see that fX = g and fY = h.
d) From parts (a) and (b) and Proposition 9.9 on page 523, the corresponding random variables are independent if and only if

(1 + α)g(x)h(y) − 2αg(x)h(y)[G(x) + H(y)] + 4αg(x)h(y)G(x)H(y) = g(x)h(y),

or

(1 + α) − 2α[G(x) + H(y)] + 4αG(x)H(y) = 1.

Letting both x and y go to infinity in the previous display, we get, in view of Proposition 8.1 on page 411, that (1 + α) − 2α(1 + 1) + 4α · 1 · 1 = 1, or 1 + α = 1, which means that α = 0.
e) In this case, G(x) = 0 if x < 0, G(x) = x if 0 ≤ x < 1, G(x) = 1 if x ≥ 1, and H(y) = 0 if y < 0, H(y) = y if 0 ≤ y < 1, H(y) = 1 if y ≥ 1; moreover, g(x) = 1 if 0 < x < 1 and g(x) = 0 otherwise, and h(y) = 1 if 0 < y < 1 and h(y) = 0 otherwise. Hence, from part (a),

fα(x, y) = (1 + α) · 1 · 1 − 2α · 1 · 1 · (x + y) + 4α · 1 · 1 · xy = 1 + α − 2αx − 2αy + 4αxy = 1 + α(1 − 2x)(1 − 2y)

if 0 < x < 1 and 0 < y < 1, and fα(x, y) = 0 otherwise. This result is the same as the one found in Exercise 9.205(c).


9.207
a) Referring to Exercises 9.83 and 9.85 on page 522, we get

fX|Y(x | y) = hX,Y(x, y)/pY(y) = fX(x) pY|X(y | x) / ∫_{−∞}^{∞} hX,Y(s, y) ds = fX(x) pY|X(y | x) / ∫_{−∞}^{∞} fX(s) pY|X(y | s) ds.

This result shows that we can obtain a conditional PDF of X given Y = y when we know a marginal PDF of X and conditional PMFs of Y given X = x for all x ∈ R.
b) Let Y denote the number of employees who take at least one sick day during the year. By assumption, Y|Π=p ∼ B(n, p) and Π has the beta distribution with parameters α and β. We want to determine the probability distribution of Π|Y=k. Applying Bayes's rule, we get, for 0 < p < 1,

fΠ|Y(p | k) = fΠ(p) pY|Π(k | p) / ∫_{−∞}^{∞} fΠ(s) pY|Π(k | s) ds
 = [ (1/B(α, β)) p^(α−1)(1 − p)^(β−1) · C(n, k) p^k (1 − p)^(n−k) ] / [ ∫_0^1 (1/B(α, β)) s^(α−1)(1 − s)^(β−1) · C(n, k) s^k (1 − s)^(n−k) ds ]
 = p^(α+k−1)(1 − p)^(β+n−k−1) / ∫_0^1 s^(α+k−1)(1 − s)^(β+n−k−1) ds,

where C(n, k) denotes the binomial coefficient. Using the fact that a beta PDF integrates to 1, we conclude that

fΠ|Y(p | k) = (1/B(α + k, β + n − k)) p^(α+k−1)(1 − p)^(β+n−k−1)

if 0 < p < 1, and fΠ|Y(p | k) = 0 otherwise. Thus, given that the number of employees who take at least one sick day during the year is k, the posterior probability distribution of Π is the beta distribution with parameters α + k and β + n − k.
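The beta posterior in part (b) is easy to confirm by simulation. The sketch below is illustrative only; the values of α, β, n, and k are arbitrary, and the random probability Π of the solution is represented by the array p:

import numpy as np

# Monte Carlo check of Exercise 9.207(b): with a Beta(alpha, beta) prior on the
# success probability and Y | p ~ Binomial(n, p), the draws of p for which Y = k
# should follow the Beta(alpha + k, beta + n - k) distribution.
rng = np.random.default_rng(seed=2)
alpha, beta, n, k = 2.0, 5.0, 10, 3

p = rng.beta(alpha, beta, size=1_000_000)
y = rng.binomial(n, p)
posterior_draws = p[y == k]

print(f"simulated posterior mean : {posterior_draws.mean():.4f}")
print(f"theoretical (a+k)/(a+b+n): {(alpha + k) / (alpha + beta + n):.4f}")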

9.208
a) Intuitively, as n → ∞, we would expect Un → 0 and Vn → 1, so that (Un, Vn) → (0, 1). Thus, we would expect the joint probability distribution of Un and Vn to converge to that of the discrete random vector that equals (0, 1) with probability 1.
b) The common CDF of X1, X2, . . . is given by F(t) = 0 if t < 0, F(t) = t if 0 ≤ t < 1, and F(t) = 1 if t ≥ 1. Because Un ≤ Vn, we have, for x ≥ y,

Fn(x, y) = P(Un ≤ x, Vn ≤ y) = P(Vn ≤ y) = (F(y))^n.

From the preceding result and Equation (9.19), we find that

Fn(x, y) = 0 if x < 0 or y < 0;
Fn(x, y) = y^n − (y − x)^n if 0 ≤ x < y < 1;
Fn(x, y) = y^n if 0 ≤ y ≤ x < 1 or if x ≥ 1 and 0 ≤ y < 1;
Fn(x, y) = 1 − (1 − x)^n if 0 ≤ x < 1 and y ≥ 1;
Fn(x, y) = 1 if x ≥ 1 and y ≥ 1.

Consequently,

lim_{n→∞} Fn(x, y) = 0 if x ≤ 0 or y < 1, and lim_{n→∞} Fn(x, y) = 1 if x > 0 and y ≥ 1.

c) We first recall that, if two events each have probability 1, then so does their intersection. Thus,

P(X = 0, Y = 1) = P({X = 0} ∩ {Y = 1}) = 1.

It follows that

FX,Y(x, y) = 0 if x < 0 or y < 1, and FX,Y(x, y) = 1 if x ≥ 0 and y ≥ 1.

d) Yes, from parts (b) and (c), as n → ∞, the joint CDF of Un and Vn converges to that of the discrete random vector that equals (0, 1) with probability 1, except at the points in R² at which the CDF of the latter is discontinuous. Note: As discussed in advanced probability courses, requiring convergence at every point of R² is too strong a condition.
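The convergence described in parts (a) and (d) can be seen empirically; the following sketch (illustrative, with arbitrary sample sizes and seed) tracks the sample minimum and maximum of n uniform observations as n grows:

import numpy as np

# Empirical illustration of Exercise 9.208: for X_1, ..., X_n iid U(0,1),
# U_n = min and V_n = max, the pair (U_n, V_n) settles down near (0, 1).
rng = np.random.default_rng(seed=3)
for n in (10, 100, 1_000, 10_000):
    x = rng.uniform(0.0, 1.0, size=n)
    print(f"n = {n:6d}:  U_n = {x.min():.5f},  V_n = {x.max():.5f}")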


Chapter 10

Expected Value of Continuous Random Variables

10.1 Expected Value of a Continuous Random Variable

Basic Exercises

10.1 From Exercise 8.41, fZ(z) = 3z²/r³ if 0 < z < r, and fZ(z) = 0 otherwise. Thus, from Definition 10.1 on page 565,

E(Z) = ∫_{−∞}^{∞} z fZ(z) dz = ∫_0^r z · (3z²/r³) dz = (3/r³) ∫_0^r z³ dz = (3/4)r.

On average, we would expect the point chosen to be a distance of 3r/4 from the center of the sphere.
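As an informal check of Exercise 10.1 (not part of the original solution), one can sample points uniformly in a ball of radius r by rejection from the enclosing cube and average their distances from the center; the average should approach 3r/4. The radius, sample size, and seed below are arbitrary:

import numpy as np

# Monte Carlo check of Exercise 10.1: for a point chosen uniformly in a ball of
# radius r, the expected distance from the center is 3r/4.
rng = np.random.default_rng(seed=4)
r = 2.0
pts = rng.uniform(-r, r, size=(2_000_000, 3))
dist = np.linalg.norm(pts, axis=1)
dist = dist[dist <= r]                      # keep only the points inside the ball
print(f"simulated mean distance: {dist.mean():.4f}")
print(f"theoretical 3r/4       : {0.75 * r:.4f}")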

10.2 From Exercise 8.43,

fX(x) = 1/(π(1 + x²)),  x ∈ R.

In other words, X has the standard Cauchy distribution. From Example 10.5 on page 568, we know that a Cauchy distribution does not have finite expectation.

10.3 From Exercise 8.44, fY(y) = 2(h − y)/h² if 0 < y < h, and fY(y) = 0 otherwise. Hence, from Definition 10.1 on page 565,

E(Y) = ∫_{−∞}^{∞} y fY(y) dy = ∫_0^h y · 2(h − y)/h² dy = (2/h²) ∫_0^h (hy − y²) dy = h/3.

On average, the point chosen will be a distance of h/3 from the base of the triangle.

10.4
a) By symmetry, we would guess that E(X) = 0.


b) From Example 9.8, fX(x) = (2/π)√(1 − x²) if −1 < x < 1, and fX(x) = 0 otherwise. Hence, from Definition 10.1 on page 565,

E(X) = ∫_{−∞}^{∞} x fX(x) dx = (2/π) ∫_{−1}^{1} x √(1 − x²) dx = 0,

where the last equation follows from the fact that x√(1 − x²) is an odd function.
c) By symmetry, Y has the same probability distribution as X and, hence, the same expected value. Thus, E(Y) = 0.

10.5 Obviously, X has finite expectation. Applying Definition 10.1 on page 565 yields

E(X) = ∫_{−∞}^{∞} x fX(x) dx = ∫_0^{30} x · (1/30) dx = (1/30) · (30)²/2 = 15.

The expected time that John waits for the train is 15 minutes, which we could have guessed in advance.

10.6
a) We have fX(x) = 1/(b − a) if a < x < b, and fX(x) = 0 otherwise. Thus, from Definition 10.1 on page 565,

E(X) = ∫_{−∞}^{∞} x fX(x) dx = (1/(b − a)) ∫_a^b x dx = (1/(b − a)) · (b² − a²)/2 = (a + b)/2.

b) In Exercise 10.5, X ∼ U(0, 30). Thus, from part (a), E(X) = (0 + 30)/2 = 15.

10.7 From Example 10.2 on page 566, the expected value of an exponential random variable is the reciprocal of its parameter. Hence, the expected duration of a pause during a monologue is 1/1.4, or 5/7 second. 10.8 Let X denote the time, in minutes, that it takes a finisher to complete the race. We are given that X ∼ N (61, 81). From Example 10.3 on page 567, the expected value of a normal random variable is its first parameter. Thus, E(X) = 61 minutes. On average, it takes 61 minutes for a finisher to complete the race. 10.9 We have

fX(x) = (1/B(α, β)) x^(α−1) (1 − x)^(β−1)

if 0 < x < 1, and fX(x) = 0 otherwise. Referring to Equation (8.43) on page 451, we get

E(X) = ∫_{−∞}^{∞} x fX(x) dx = (1/B(α, β)) ∫_0^1 x · x^(α−1)(1 − x)^(β−1) dx = (1/B(α, β)) ∫_0^1 x^((α+1)−1)(1 − x)^(β−1) dx = B(α + 1, β)/B(α, β)
 = [Γ(α + β)/(Γ(α)Γ(β))] · [Γ(α + 1)Γ(β)/Γ(α + β + 1)] = α/(α + β).

10.10 Let X denote the proportion of these manufactured items that require service during the first 5 years of use. We know that X has the beta distribution with parameters α = 2 and β = 3. Applying Exercise 10.9, we find that E(X) = 2/(2 + 3) = 2/5. On average, 40% of the manufactured items require service during the first 5 years of use. 10.11 Some possible beta distributions follow. To obtain the mean, we apply Exercise 10.9, and to determine P (X ≥ µ), we use Equation (8.55) on page 457. a) α = β = 1: µX = 1/(1 + 1) = 0.5; P (X ≥ µX ) = P (X ≥ 0.5) = 1 − P (X < 0.5) & ' 1+1−1 % &1 + 1 − 1' 1 j 1+1−1−j =1− (0.5) (1 − 0.5) =1− (0.5)1 (0.5)0 j 1 j =1 = 1 − 0.5 = 0.5.

b) α = 3, β = 1: µX = 3/(3 + 1) = 0.75; P (X ≥ µX ) = P (X ≥ 0.75) = 1 − P (X < 0.75) & ' 3+1−1 % &3 + 1 − 1' 3 =1− (0.75)j (1 − 0.75)3+1−1−j = 1 − (0.75)3 (0.25)0 j 3 j =3 = 1 − (0.75)3 > 1/2.

c) α = 1, β = 3: µX = 1/(1 + 3) = 0.25; P (X ≥ µX ) = P (X ≥ 0.25) = 1 − P (X < 0.25) 1+3−1 % &1 + 3 − 1' =1− (0.25)j (1 − 0.25)1+3−1−j j j =1 & ' & ' & ' 3 3 3 2 2 1 1 (0.25) (0.75) − (0.25)3 (0.75)0 < 1/2. =1− (0.25) (0.75) − 2 3 1 10.12 We have

2 fX (x) = b−a

&

|a + b − 2x| 1− b−a

'

if a < x < b, and fX (x) = 0 otherwise. Applying Definition 10.1 on page 565 and making the substitutions u = 1 − (a + b − 2x)/(b − a) and u = 1 − (2x − a − b)/(b − a), respectively, we get ' " b & 2 |a + b − 2x| xfX (x) dx = x 1− dx E(X) = b−a a b−a −∞ ' & ' " (a+b)/2 & " b 2 a + b − 2x 2 2x − a − b = x 1− dx + x 1− dx b−a a b−a b − a (a+b)/2 b−a ' ' " 1& " 1& " 1 b−a b−a a+b = u u du + u u du = (a + b) . a+ b− u du = 2 2 2 0 0 0 "



For a more elegant approach, see Exercise 10.20(b).


10.13 We know that fX (x) = c(1 + x)−4 if x > 0, and fX (x) = 0 otherwise. To determine c, we proceed as follows: 1=

"



−∞

fX (x) dx = c

"



dx =c (1 + x)4

0

"



1

du 1 = c. 4 3 u

Hence, c = 3. Consequently, making the substitution u = 1 + x yields " ∞ x u−1 xfX (x) dx = 3 dx = 3 du E(X) = 4 (1 + x) u4 −∞ 0 1 ' & ' " ∞& 1 1 1 1 1 − 4 du = 3 − = . =3 3 2 3 2 u u 1 "

"





10.14 Let X1 , X2 , X3 , and X4 denote the bids. By assumption, the Xj s are independent random variables with common CDF F (x) = (1 + sin πx)/2 for 3/2 ≤ x < 5/2. It follows that the common PDF of the Xj s is f (x) = (π cos πx)/2 if 3/2 < x < 5/2, and f (x) = 0 otherwise. The accepted bid is Y = max{X1 , X2 , X3 , X4 }. Thus, from Exercise 9.70, '3 *π + &1 ( )3 π fY (y) = 4f (y) F (y) = 4 · cos πy (1 + sin πy) = (cos πy)(1 + sin πy)3 2 2 4

if 3/2 < y < 5/2, and fY (y) = 0 otherwise. Consequently, "



π E(Y ) = yfY (y) dy = 4 −∞

"

5/2

3/2

y(cos πy)(1 + sin πy)3 dy.

Therefore, the correct answer is (d). 10.15 We have fX (x) =

π

(

θ2

θ ), + (x − η)2

−∞ < x < ∞.

Let Y be a random variable that has the standard Cauchy distribution. We first make the substitution y = (x − η)/θ and then apply the triangle inequality to get " ∞ 1 dy = |x|fX (x) dx = |θy + η| |θy + η|fY (y) dy π(1 + y 2 ) −∞ −∞ −∞ " ∞ " ∞ " ≥ |y|fY (y) dy − |η| (|θy| − |η|) fY (y) dy = θ

"



"





−∞ " ∞

−∞

−∞



−∞

fY (y) dy

|y|fY (y) dy − |η| = ∞,

where the last equality follows from Example 10.5 on page 568, which shows that a standard Cauchy random variable doesn’t have finite expectation.

10.1

10.16 We first determine the two marginal PDFs: " ∞ " fX (x) = fX,Y (x, y) dy = 2x if 0 < x < 2, and fX (x) = 0 otherwise. Also, " ∞ " fY (y) = fX,Y (x, y) dx = 2y

if 0 < y < 1, and fY (y) = 0 otherwise. Therefore, " " ∞ xfX (x) dx = E(X) = −∞

E(Y ) =

∞ −∞

2

2y

−∞

"

x/2

0

−∞

and

10-5

Expected Value of a Continuous Random Variable

yfY (y) dy = 4

"

0

1

2

x 0

y dy = x 3 /4

x dx = 4y(1 − y 2 )

x3 25 8 dx = = 4 20 5 &

1 1 y (1 − y ) dy = 4 − 3 5 2

2

'

=

8 . 15

10.17 Let us set S = { (t1, t2) : 0 < t1 < 6, 0 < t2 < 6, t1 + t2 < 10 }. We know that fT1,T2(t1, t2) = c if (t1, t2) ∈ S, and fT1,T2(t1, t2) = 0 otherwise. To determine c, we draw a graph of S and find that

|S| = (1/2) · 10 · 10 − 2 · (1/2) · 4 · 4 = 34.

Therefore,

1 = ∫∫_S fT1,T2(t1, t2) dt1 dt2 = c ∫∫_S 1 dt1 dt2 = c|S| = 34c,

which means that c = 1/34. Now let T denote the time between a car accident and payment of the claim, so that T = T1 + T2. We note that the range of T is the interval (0, 10). For t in the range of T, we have, from Proposition 9.12, fT(t) = ∫_{−∞}^{∞} fT1,T2(s, t − s) ds. The integrand is nonzero if and only if 0 < s < 6 and 0 < t − s < 6, that is, if and only if 0 < s < 6 and t − 6 < s < t, which is true if and only if max{t − 6, 0} < s < min{t, 6}. Now,

min{t, 6} − max{t − 6, 0} = t − 0 = t if t < 6, and min{t, 6} − max{t − 6, 0} = 6 − (t − 6) = 12 − t if t ≥ 6.

Hence,

fT(t) = ∫_{max{t−6,0}}^{min{t,6}} (1/34) ds = (1/34)[min{t, 6} − max{t − 6, 0}] = t/34 if 0 < t < 6; (12 − t)/34 if 6 < t < 10; and 0 otherwise.

Consequently,

E(T) = ∫_{−∞}^{∞} t fT(t) dt = (1/34) ∫_0^6 t² dt + (1/34) ∫_6^{10} t(12 − t) dt = (1/34)(72 + 368/3) ≈ 5.73.
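A rejection-sampling sketch (illustrative only; the sample size and seed are arbitrary) reproduces this answer: sample (T1, T2) uniformly over S and average T1 + T2.

import numpy as np

# Monte Carlo check of Exercise 10.17: (T1, T2) is uniform on
# S = {0 < t1 < 6, 0 < t2 < 6, t1 + t2 < 10}; E(T1 + T2) should be about 5.73.
rng = np.random.default_rng(seed=5)
t1 = rng.uniform(0.0, 6.0, size=2_000_000)
t2 = rng.uniform(0.0, 6.0, size=2_000_000)
keep = t1 + t2 < 10                         # rejection step enforces the constraint
print(f"simulated E(T) = {(t1 + t2)[keep].mean():.3f}")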

10.18 a) On average X will equal 1/2 and, on average, Y will equal 1/2 of X. Hence, on average, Y will equal 1/4. Thus, we guess that E(Y ) = 1/4. b) By assumption, ! 1, if 0 < x < 1; fX (x) = 0, otherwise.


and, for 0 < x < 1, fY | X (y | x) =

!

1/x, if 0 < y < x; 0, otherwise.

From the general multiplication rule and Proposition 9.7 on page 510, " ∞ " ∞ " 1 " 1 1 dx fY (y) = fX,Y (x, y) dx = fX (x)fY | X (y | x) dx = 1 · dx = = − ln y x −∞ −∞ y y x

if 0 < y < 1, and fY (y) = 0 otherwise. Applying Definition 10.1 and using integration by parts, we now get . 2 /1 " 1 " ∞ y 1 y2 y ln y dy = − = . E(Y ) = yfY (y) dy = − ln y − 2 4 0 4 −∞ 0

Theory Exercises 10.19 Let fX be any PDF of X. We will prove that the function g defined by g(x) = fX (x) if |x| ≤ M, and g(x) = 0 otherwise, is a PDF of X. To do so, we show that g satisfies Equation (8.15) on page 422. First note that, because P (|X| ≤ M) = 1, we have FX (x) = 0 if x ≤ −M, and FX (x) = 1 if x ≥ M. Now we consider three cases. Case 1: x < −M In this case, FX (x) = 0 =

"

x

g(t) dt.

−∞

Case 2: −M ≤ x < M In this case,

( ) FX (x) = FX (−M) + FX (x) − FX (−M) = 0 + =

Case 3: x ≥ M In this case,

"

−M

−∞

g(t) dt +

"

x

−M

g(t) dt =

"

x

"

x

−M

fX (t) dt

g(t) dt. −∞

( ) ( ) FX (x) = FX (−M) + FX (M) − FX (−M) + FX (x) − FX (M) " M " M =0+ fX (t) dt + (1 − 1) = g(t) dt =

"

−M −M

−∞

g(t) dt +

"

M

−M

g(t) dt +

"

−M x

M

g(t) dt =

"

x

g(t) dt. −∞

,x We have shown that FX (x) = −∞ g(t) dt for all x ∈ R. Hence, g is a PDF of X. Thus, without loss of generality, we can assume that a PDF of X is such that fX (x) = 0 for |x| > M. Now, " M " M " ∞ |x|fX (x) dx = |x|fX (x) dx ≤ M fX (x) dx = MP (|X| ≤ M) = M < ∞. −∞

−M

−M

Hence X has finite expectation. Moreover, 0" ∞ 0 " 0 0 0 | E(X)| = 0 xfX (x) dx 00 ≤ −∞



−∞

|x|fX (x) dx ≤ M.

10.1

Expected Value of a Continuous Random Variable

10-7

10.20 a) Define g(x) = xfX (c − x). Then, because fX is symmetric about c, we have g(−x) = −xfX (c + x) = −xfX (c − x) = −g(x).

Thus, g is an odd function. Consequently, making the substitution y = c − x, we get " ∞ " ∞ " ∞ " ∞ xfX (x) dx = (c − y)fX (c − y) dy = c fX (c − y) dy − yfX (c − y) dy −∞

=c

−∞ " ∞

−∞

fX (x) dx −

"



−∞

−∞

−∞

g(y) dy = c · 1 − 0 = c.

b) If X ∼(U(a, b), ) then fX is symmetric about (a + b)/2, so that, by part (a), we have E(X) = (a + b)/2. 2 If X ∼ N µ, σ , then fX is symmetric about µ, so that, by part (a), we have E(X) = µ. If X ∼ T (a, b), then fX is symmetric about (a + b)/2, so that, by part (a), we have E(X) = (a + b)/2. c) If a PDF of X is an even function (i.e., symmetric about 0), then, by part (a), E(X) = 0.

Advanced Exercises 10.21 ,x a) We note that, as xfX (x) ≥ 0 for x ≥ 0, the function G defined on [0, ∞) by G(x) = 0 tfX (t) dt is nondecreasing. If G is bounded, then limx→∞ G(x) = U , where U = sup{ G(x) : x ≥ 0 }. Hence, " x " ∞ xfX (x) dx = lim tfX (t) dt = lim G(x) = U. x→∞ 0

0

x→∞

If G is unbounded, then limx→∞ G(x) = ∞, so that " x " ∞ xfX (x) dx = lim tfX (t) dt = lim G(x) = ∞. 0

x→∞ 0

−∞

x→−∞ x

x→∞

,0 b) We note that, as xfX (x) ≤ 0 for x ≤ 0, the function H defined on (−∞, 0] by H (x) = x tfX (t) dt is nondecreasing. If H is bounded, then limx→−∞ H (x) = L, where L = inf{ H (x) : x ≤ 0 }. Hence, " 0 " 0 xfX (x) dx = lim tfX (t) dt = lim H (x) = L. x→−∞

If H is unbounded, then limx→∞ H (x) = −∞, so that " 0 " 0 xfX (x) dx = lim tfX (t) dt = lim H (x) = −∞. x→−∞ x

−∞

c) We have

"



−∞

|x|fX (x) dx = =

"

"

∞ 0 ∞ 0

x→−∞

|x|fX (x) dx + xfX (x) dx −

"

"

0 −∞

0

−∞

|x|fX (x) dx

( ) ( ) xfX (x) dx = E X+ + E X − .

,∞ By definition, X has finite expectation if and only if −∞ |x|fX (x) dx < ∞, which, as we see from the ( ) ( ) previous display, is the case if and only if both E X + and E X − are real numbers. In that case, " ∞ " ∞ " 0 ( ) ( ) E(X) = xfX (x) dx = xfX (x) dx + xfX (x) dx = E X + − E X − . −∞

0

−∞


d) Let X be a random variable with PDF given by ⎧ 2 ⎨ , if x > 0; π(1 + x2) fX (x) = ⎩ 0, otherwise. ( −) ( ) We have E X = 0 and, from the solution to Example 10.5 on page 568, we see that E X+ = ∞. Thus, X has infinite expectation, and ( ) ( ) E(X) = E X+ − E X − = ∞ − 0 = ∞. e) No, the standard Cauchy distribution ( does ) not (have)infinite expectation because, as is easily seen from the solution to Example 10.5, both E X+ and E X − equal ∞.

10.22 ,∞ ( )−(ν+1)/2 a) We note that T has finite expectation if and only if −∞ |t| 1 + t 2 /ν dt < ∞. Using sym2 metry and making the substitution u = 1 + t /ν, we find that " ∞ " ∞ " ∞ |t| t du dt = 2 dt = ν . ( )(ν+1)/2 ( )(ν+1)/2 (ν+1)/2 u −∞ 1 + t 2 /ν 0 1 1 + t 2 /ν

From calculus, we know that the integral on the right of the preceding display converges (i.e., is finite) if and only if (ν + 1)/2 > 1, that is, if and only if ν > 1. b) We see that the PDF of T is an even function. Hence, when T has finite expectation, we see from Exercise 10.20(c) that E(T ) = 0.

10.2 Basic Properties of Expected Value Basic Exercises 10.23 a) Applying, in turn, Proposition 8.10 on page 441 and Proposition 8.13 on page 466, we deduce that X 2 /σ 2 has the chi-square distribution with 1 degree of freedom, as do both Y 2 /σ 2 and Z 2 /σ 2 . Because X, Y , and Z are independent, so are X2 /σ 2 , Y 2 /σ 2 , and Z 2 /σ 2 (Proposition 6.13 on page 297). Applying the second bulleted item on page 537, we conclude that W = (X2 + Y 2 + Z 2 )/σ 2 has the chi-square distribution with three degrees of freedom: ( 1 )3/2 1 fW (w) = 2 w3/2−1 e−w/2 = √ w 1/2 e−w/2 , w > 0, $(3/2) 2π

and fW (w) = 0 otherwise. √ b) We note that S = σ W . Thus, from part (a) and the FEF, " ∞ * √ + " ∞( √ ) √ σ σ w fW (w) dw = √ w · w 1/2 e−w/2 dw E(S) = E σ W = 2π 0 −∞ " ∞ # σ σ $(2) =√ w2−1 e−w/2 dw = √ = 2 2/π σ. 2π 0 2π (1/2)2

10.24 a) To obtain a PDF of W = |Z|, we use the CDF method. For w > 0,

FW (w) = P (W ≤ w) = P (|Z| ≤ w) = P (−w ≤ Z ≤ w) = )(w) − )(−w) = 2)(w) − 1.

10.2

10-9

Basic Properties of Expected Value

Differentiation now yields # 1 2 2 ′ (w) = 2φ(w) = 2 · √ e−w /2 = 2/π e−w /2 fW (w) = FW 2π

if w > 0, and fW (w) = 0 otherwise. Applying now Definition 10.1 on page 565 and making the substitution u = w2 /2, we get " ∞ " ∞ " ∞ # # # 2 E(|Z|) = E(W ) = wfW (w) dw = 2/π w e−w /2 dw = 2/π e−u du = 2/π. 0

−∞

0

b) From the FEF and symmetry, we get, upon making the substitution u = z2 /2, "

"

" ∞ 1 1 2 −z2 /2 E(|Z|) = |z|φ(z) dz = |z| √ e dz = 2 z √ e−z /2 dz 2π 2π −∞ −∞ 0 " " ∞ ∞ # # 2 2 =√ z e−z /2 dz = 2/π e−u du = 2/π. 2π 0 0 ∞



10.25 Applying the FEF with g(x, y) = x, we get ) E(X) = E g(X, Y ) = =

"

(

2

0

1 = 4

"

4" 2

0

"



"



−∞ −∞

x/2

5

g(x, y)fX,Y (x, y) dx dy =

x · 2xy dy dx = 2

0

x 4 dx =

"

2

x

0

2

4"

5

x/2 0

"



"



−∞ −∞

y dy dx = 2

"

xfX,Y (x, y) dx dy 2

x

0

2

&

x2 8

'

dx

1 25 8 · = . 4 5 5

Applying the FEF with g(x, y) = y, we get ) E(Y ) = E g(X, Y ) = =

"

(

2

0

1 = 12

4"

"

0

2

"

x/2

0



"



−∞ −∞

5

g(x, y)fX,Y (x, y) dx dy =

y · 2xy dy dx = 2

x 4 dx =

"

2

0

x

4"

0

x/2

2

5

"



"



−∞ −∞

y dy dx = 2

"

0

yfX,Y (x, y) dx dy 2

x

&

' x3 dx 24

1 25 8 · = . 12 5 15

10.26 We know from Exercise 10.17 that ! 1/34, if 0 < t1 < 6, 0 < t2 < 6, and t1 + t2 < 10; fT1 ,T2 (t1 , t2 ) = 0, otherwise. From Proposition 10.2 on page 576 and symmetry, E(T1 + T2 ) = E(T1 ) + E(T2 ) = 2E(T1 ) .


Applying the FEF with g(t1 , t2 ) = t1 yields " ∞" ∞ " ( ) g(t1 , t2 )fT1 ,T2 (t1 , t2 ) dt1 dt2 = E(T1 ) = E g(T1 , T2 ) = 1 = 34 6 = 34

"

4

t1

0

"

0

4

4"

−∞ −∞

6

0

1 dt2

1 t1 dt1 + 34

5

"

4

1 dt1 + 34

"

6

t1

4

4"

6

t1 (10 − t1 ) dt1 =

10−t1

0

1 dt2

5



"



−∞ −∞

t1 fT1 ,T2 (t1 , t2 ) dt1 dt2

dt1

146 . 51

Hence, E(T1 + T2 ) = 2 · (146/51) = 5.73.

10.27 Let Y denote the policyholder’s loss and let X denote the benefit paid. Noting that X = min{10, Y }, we apply the FEF with g(y) = min{10, y} to get " ∞ " ∞ " ∞ ( ) 1 g(y)fY (y) dy = min{10, y}fY (y) dy = 2 min{10, y} · 3 dy E(X) = E g(Y ) = y −∞ −∞ 1 " 10 " ∞ " 10 " ∞ y 10 dy dy 18 1 =2 dy + 2 dy = 2 + 20 = + = 1.9. 3 3 2 3 10 10 y y 1 10 y 1 10 y

10.28
a) We have fX(x) = 1 if 1 < x < 2, and fX(x) = 0 otherwise. The range of X is the interval (1, 2). Here g(x) = 1/x, which is strictly decreasing and differentiable on the range of X, so the transformation method is appropriate. Note that g^(−1)(y) = 1/y. Letting Y = 1/X, we see that the range of Y is the interval (1/2, 1). Applying the transformation method now yields

fY(y) = fX(x)/|g′(x)|, evaluated at x = g^(−1)(y) = 1/y, so that fY(y) = x² · 1 = 1/y²

if 1/2 < y < 1, and fY(y) = 0 otherwise. Therefore, from Definition 10.1 on page 565,

E(1/X) = E(Y) = ∫_{−∞}^{∞} y fY(y) dy = ∫_{1/2}^{1} y · (1/y²) dy = ∫_{1/2}^{1} (1/y) dy = ln 1 − ln(1/2) = ln 2.

b) Applying the FEF with g(x) = 1/x yields

E(1/X) = ∫_{−∞}^{∞} (1/x) fX(x) dx = ∫_1^2 (1/x) · 1 dx = ln 2 − ln 1 = ln 2.

c) Because X ∼ U(1, 2), we know that E(X) = (1 + 2)/2 = 1.5. Referring to either part (a) or part (b) shows that

E(1/X) = ln 2 ≠ 2/3 = 1/1.5 = 1/E(X).

Thus, in general, E(1/X) ≠ 1/E(X).
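A short simulation (illustrative only) makes the distinction in part (c) concrete: for X uniform on (1, 2), the average of 1/X approaches ln 2 ≈ 0.693, not 1/E(X) = 2/3 ≈ 0.667.

import numpy as np

# Numerical contrast for Exercise 10.28(c): E(1/X) versus 1/E(X) when X ~ U(1, 2).
rng = np.random.default_rng(seed=6)
x = rng.uniform(1.0, 2.0, size=2_000_000)
print(f"mean of 1/X : {np.mean(1.0 / x):.4f}   (ln 2 = {np.log(2):.4f})")
print(f"1 / mean(X) : {1.0 / x.mean():.4f}   (2/3  = {2 / 3:.4f})")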

10.29 Let T denote the lifetime, in years, of the piece of equipment. We know that T ∼ E(0.1). Let A be the amount paid. Then x, if 0 < T ≤ 1; A = 0.5x, if 1 < T ≤ 3; 0, otherwise.

10.2

10-11

Basic Properties of Expected Value

Let g(t) =

-

x, if 0 < t ≤ 1; 0.5x, if 1 < t ≤ 3; 0, otherwise.

so that A = g(T ). Applying the FEF, we obtain "

( ) E(A) = E g(T ) =



−∞

g(t)fT (t) =

"

1

xfT (t) dt +

0

= xP (T ≤ 1) + 0.5xP (1 < T ≤ 3) =

"

3 1

0.5xfT (t) dt

*( ) ( )+ 1 − e−0.1 + 0.5 e−0.1 − e−0.3 x.

Setting this last expression equal to 1000, we find that x = 5644.23. 10.30 From Example 9.4,

fX,Y (x, y) = n(n − 1)(y − x)n−2 if 0 < x < y < 1, and fX,Y (x, y) = 0 otherwise. a) Applying the FEF with g(x, y) = x and making the substitution u = y − x, we get E(X) =

"



"



−∞ −∞

= n(n − 1)

xfX,Y (x, y) dx dy =

"

1

x

0

4"

1−x

u

n−2

0

"

1 0

5

4"

1 x

du dx = n

5

x · n(n − 1)(y − x)n−2 dy dx "

1

0

x 2−1 (1 − x)n−1 dx = nB(2, n)

1 1 = . (n + 1)n n+1

=n·

b) Applying the FEF with g(x, y) = y and making the substitution u = y − x, we get E(Y ) =

"

"





yfX,Y (x, y) dx dy =

−∞ −∞

= n(n − 1)

"

1

y

0

&"

y

n−2

u

0

"

0

1 &" y 0

' " du dy = n

0

y · n(n − 1)(y − x) 1

y n dy =

n−2

'

dx dy

n . n+1

c) Applying the FEF with g(x, y) = y − x and making the substitution u = y − x, we get E(R) = E(Y − X) = =

"

0

1 &" y



"



−∞ −∞

(y − x)fX,Y (x, y) dx dy

(y − x) · n(n − 1)(y − x)

0

= (n − 1)

"

"

0

1

y n dy =

n−1 . n+1

n−2

'

dx dy = n(n − 1)

"

0

1 &" y 0

n−1

u

' du dy


d) Applying the FEF with g(x, y) = (x + y)/2 and making the substitution u = y − x, we get (

)

"



"



x+y fX,Y (x, y) dx dy 2 −∞ −∞ ' " &" y 1 1 n−2 = (x + y) · n(n − 1)(y − x) dx dy 2 0 0 ' " &" y n(n − 1) 1 n−2 = (2y − u)u du dy 2 0 0 ' ' " 1 &" y " &" y n(n − 1) 1 n−2 n−1 = n(n − 1) y u du dy − u du dy 2 0 0 0 0 " 1 " (n − 1) 1 n 1 =n y n dy − y dy = . 2 2 0 0

E(M) = E (X + Y )/2 =

10.31 a) We have E(X) =

"



−∞

xfX (x) dx =

= nB(2, n) = n ·

"

1 0

1 x· x 1−1 (1 − x)n−1 dx = n B(1, n)

1 1 = . (n + 1)n n+1

"

1

0

x 2−1 (1 − x)n−1 dx

b) We have E(Y ) =

"

∞ −∞

yfY (y) dy =

"

1 0

1 y· y n−1 (1 − y)1−1 dy = n B(n, 1)

"

0

1

y n dy =

n . n+1

c) Applying Equation (10.13) on page 576 and referring to parts (a) and (b), we get E(R) = E(Y − X) = E(Y ) − E(X) =

n 1 n−1 − = . n+1 n+1 n+1

d) Applying Proposition 10.2 on page 576 and referring to parts (a) and (b), we get & ' ) 1 ( ) 1( 1 n 1 E(X) + E(Y ) = + = . E(M) = E (X + Y )/2 = 2 2 n+1 n+1 2 10.32 Let T denote the payment. We have ! 0, T = X + Y − 1, Define g(x, y) =

!

if X + Y ≤ 1; if X + Y > 1.

0, x + y − 1,

if x + y ≤ 1; if x + y > 1.

10.2

Then T = g(X, Y ). Applying the FEF now yields " ∞" ∞ ( ) E(T ) = E g(X, Y ) = g(x, y)fX,Y (x, y) dx dy = =

"

"

=2



−∞ 1 0

"

0

4"

4" 1

−∞ −∞

1−x

−∞

0 · fX,Y (x, y) dy + 5

1 1−x

10-13

Basic Properties of Expected Value

"



1−x

(x + y − 1) · fX,Y (x, y) dy dx

(x + y − 1) · 2x dy dx = 2

x2 x· dx = 2

Thus, the expected payment is $250.

"

0

1

x 3 dx =

5

"

0

1

x

4"

1

1−x

5

(x + y − 1) dy dx

1 . 4

10.33 a) From Example 9.25(b), we know that the proportion of the total inspection time attributed to the first engineer is U(0, 1). Because the expected value of a U(0, 1) random variable is 1/2, we see that, on average, 50% of the total inspection time is attributed to the first engineer. b) Let X and Y denote the proportions of the total inspection time attributed to the first and second engineers, respectively. Clearly, X + Y = 1. Because the inspection times for the two engineers have the same probability distribution, it follows, by symmetry, that X and Y have the same probability distribution and, hence, that E(X) = E(Y ). Using the linearity property of expected value and the fact that the expected value of a constant random variable is the constant, we get 1 = E(1) = E(X + Y ) = E(X) + E(Y ) = 2E(X) .

Consequently, E(X) = 1/2. On average, 50% of the total inspection time is attributed to the first engineer. 10.34 a) Let U ∼ U(0, ℓ). For 0 ≤ x < ℓ/2,

FX (x) = P (X ≤ x) = P (U ≤ x or U ≥ ℓ − x) =

x ℓ − (ℓ − x) 2x x + = = . ℓ ℓ ℓ ℓ/2

Hence, X ∼ U(0, ℓ/2). b) Noting that R = X/(ℓ − X), we apply the FEF and make the substitution u = ℓ − x to get & ' " ∞ " ℓ/2 X x x 1 E(R) = E = fX (x) dx = · dx ℓ−X ℓ − x ℓ/2 −∞ ℓ − x 0 " " ℓ " 2 ℓ ℓ−u du 2 ℓ = du = 2 − 1 du = 2 ln 2 − 1. ℓ ℓ/2 u ℓ ℓ/2 ℓ/2 u

c) The range of X is the interval (0, ℓ/2). To determine a PDF of R, we use the transformation method with g(x) = x/(ℓ − x), which is strictly increasing and differentiable on the range of X. Note that g −1 (r) = rℓ/(1 + r) and that the range of R is the interval (0, 1). Applying the transformation method now yields & ' 2(ℓ − x)2 1 1 1 2 rℓ 2 2 fX (x) = fR (r) = ′ = = 2 ℓ− = 2 2 |g (x)| 1+r ℓ/(ℓ − x) ℓ/2 ℓ ℓ (1 + r)2

if 0 < r < 1, and fR (r) = 0 otherwise.


d) From the definition of expected value and part (c), we find, upon making the substitution u = 1 + r, " 1 " 2 " ∞ 2 u−1 rfR (r) dr = r· dr = 2 du E(R) = 2 (1 + r) u2 −∞ 0 1 ' " 2& 1 1 =2 du = 2 ln 2 − 1. − u u2 1

e) The ratio of the longer segment to the shorter segment is (ℓ − X)/X. Letting g(x) = (ℓ − x)/x, we have 0 " ∞0 " ℓ/2 " ℓ/2 " ∞ 0ℓ − x 0 ℓ−x dx 0 0 fX (x) dx = 2 |g(x)|fX (x) dx = dx = 2 − 1 = ∞. 0 x 0 ℓ 0 x x −∞ −∞ 0

Hence, by the FEF, (ℓ − X)/X doesn’t have finite expectation.
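The value 2 ln 2 − 1 found in part (d) is easy to confirm by simulation, which is exactly the plan of Exercise 10.35 below. A minimal sketch (illustrative only; it takes ℓ = 1, and the sample size and seed are arbitrary):

import numpy as np

# Monte Carlo check of Exercise 10.34(d): break a stick of length 1 at a uniform
# point and average the ratio of the shorter piece to the longer piece; the mean
# should approach 2*ln(2) - 1, roughly 0.386.
rng = np.random.default_rng(seed=7)
u = rng.uniform(0.0, 1.0, size=1_000_000)     # the break point
shorter = np.minimum(u, 1 - u)
longer = np.maximum(u, 1 - u)
print(f"simulated E(R): {np.mean(shorter / longer):.4f}")
print(f"2*ln(2) - 1   : {2 * np.log(2) - 1:.4f}")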

10.35 a) From Exercise 10.34(c), the ratio of the shorter piece to the longer piece doesn’t depend on ℓ; thus, we can assume that ℓ = 2. Then, from Exercise 10.34(a), X ∼ U(0, 1). The idea is to use a basic random number generator to obtain a large number of, say, 10,000, uniform random numbers on the interval (0, 1). For each such random number, u, compute r = u/(2 − u), which gives 10,000 independent observations of the ratio of the smaller segment to the longer segment. By the long-run-average interpretation of expected value, the average of these 10,000 observations should approximately equal E(R), which, from Exercise 10.34(d), is 2 ln 2 − 1 ≈ 0.386. b) Answers will vary, but you should get roughly 0.386. 10.36 The area of the triangle equals 1/2. Thus, ! 2, if 0 < y < x and 0 < x < 1; fX,Y (x, y) = 0, otherwise. a) Applying the FEF, we get " " ∞" ∞ xfX,Y (x, y) dx dy = E(X) = −∞ −∞ ∞ " ∞

E(Y ) =

"

E(XY ) =

"

and

Therefore,

−∞ −∞ ∞

"



−∞ −∞

yfX,Y (x, y) dx dy =

"

xyfX,Y (x, y) dx dy = E(XY ) =

0

0

1 &" x 0 1 &" x

"

0

0

'

x · 2 dy dx = 2 y · 2 dy dx =

1 &" x 0

'

"

'

"

1

0 1

0

xy · 2 dy dx =

"

x 2 dx =

x 2 dx =

0

1

2 , 3

1 , 3

x 3 dx =

1 . 4

1 2 2 1 ̸= = · = E(X) E(Y ) ; 4 9 3 3

that is, E(XY ) ̸= E(X) E(Y ). b) Yes, we can conclude that X and Y aren’t independent random variables. Indeed, by Proposition 10.4 on page 576, if X and Y were independent random variables, then we would have E(XY ) = E(X) E(Y ), which, by part (a) isn’t the case. 10.37 The area of the triangle equals 1. Thus, ! 1, if −x < y < x and 0 < x < 1; fX,Y (x, y) = 0, otherwise.

10.2

10-15

Basic Properties of Expected Value

a) By symmetry, E(Y ) = 0, so that E(X) E(Y ) = 0. Applying the FEF, we get E(XY ) = =

"

"





−∞ −∞

"

1

&"

x

0

xyfX,Y (x, y) dx dy = '

x

−x

y dy dx =

"

1

"

0

1 &" x

−x

'

xy · 1 dy dx

x · 0 dx = 0.

0

Therefore, E(XY ) = E(X) E(Y ) . b) No, we cannot conclude that X and Y are independent random variables. Independence of random variables is not a necessary condition for the expected value of a product to equal the product of the expected values; in other words, it’s possible for the expected value of a product to equal the product of the expected values without the random variables being independent. c) No, X and Y aren’t independent random variables. We can see this fact in many ways. One way is to note that the range of Y is the interval (−1, 1), whereas, the range of Y|X=1/2 is the interval (−1/2, 1/2). 10.38 Let X, Y , and Z denote the three claim amounts. We know that these three random variables all have f as their PDF, and we can assume that they are independent. The largest of the three claim amounts is max{X, Y, Z}, which we denote by W . a) We first note that the CDF corresponding to f is " x " x 3 1 F (x) = f (t) dt = dt = 1 − 3 4 x −∞ 1 t if x ≥ 1, and F (x) = 0 otherwise. From Exercise 9.70(a), &

( )2 3 fW (w) = 3f (w) F (w) = 3 · 4 w

1−

1 w3

'2

=

9 w4

&

1−

1 w3

'2

if w > 1, and fW (w) = 0 otherwise. Hence, by Definition 10.1 on page 565, E(W ) =

"



−∞

wfW (w) dw =

"

∞ 1

9 w· 4 w

&

1 1− 3 w

'2

dw = 9

"

1

∞&

1 2 1 − 6+ 9 3 w w w

'

dw =

81 . 40

Hence, the expected value of the largest of the three claims is $(81/40) thousand, or $2025. b) From the FEF, independence, and symmetry, E(W ) = E(max{X, Y, Z}) = =6

"""

x>y>z

= 3 · 27

"

"



"



"



−∞ −∞ −∞

max{x, y, z}fX,Y,Z (x, y, z) dx dy dz

xfX (x)fY (y)fZ (z) dx dy dz = 6 · 27

1

∞ &" ∞ z

1 dy y6

'

1 3 · 27 dz = 4 5 z

"

"

∞ 1

∞ 1

4"

z

∞ &" ∞ y

1 x · 4 dx x

'

5 1 1 dy 4 dz 4 y z

1 3 · 27 81 dz = = . 9 5·8 40 z

Hence, the expected value of the largest of the three claims is $(81/40) thousand, or $2025. 10.39 Let X1 and X2 denote the arrival times of the two people. By assumption, X1 and X2 are independent random variables with common PDF f . The arrival times of the first and second persons to arrive

10-16

CHAPTER 10

Expected Value of Continuous Random Variables

are X = min{X1 , X2 } and Y = max{X1 , X2 }, respectively. The expected amount of time that the first person to arrive waits for the second person to arrive is E(Y − X). a) Noting that Y − X = |X1 − X2 |, applying the FEF, and using independence and symmetry, we get " ∞" ∞ E(Y − X) = E(|X1 − X2 |) = |x1 − x2 |fX1 ,X2 (x1 , x2 ) dx1 dx2 =

"

=2



"



−∞ −∞

"



−∞ −∞

""

|x1 − x2 |f (x1 )f (x2 ) dx1 dx2 = 2

f (y)

−∞

&"

−∞

x1 0, ! P (X > t), if t < tr ; P (T > t) = P (min{X, tr } > t) = 0, if t ≥ tr .

a) Recall from the first bulleted item on page 578 that Proposition 10.5 holds for any type of random variable. Hence, " tr " tr " ∞ ( ) P (T > t) dt = P (X > t) dt = 1 − FX (t) dt. E(T ) = 0

0

0

b) Let Y denote the cost of replacement. Note that Y = c1 if X > tr , and Y = c2 if X ≤ tr . Hence, Y is a discrete random variable, and we have % ypY (y) = c1 P (X > tr ) + c2 P (X ≤ tr ) E(Y ) = y

( ) = c1 1 − FX (tr ) + c2 FX (tr ) = c1 + (c2 − c1 )FX (tr ).

c) In this case, X ∼ E(0.001), so that FX (x) = 1 − e−0.001x for x ≥ 0. Hence, from part (b), + * E(Y ) = 50 + (80 − 50) 1 − e−0.001·950 = $68.40.

d) In this case, X ∼ U(900, 1100), so that FX (x) = (x − 900)/200 for 900 ≤ x < 1100. Consequently, from part (b), ' & 950 − 900 = $57.50. E(Y ) = 50 + (80 − 50) 200 10.43 a) Let E denote the event that the component fails immediately when turned on. Applying the law of total probability, we find that, for x ≥ 0, ( ) ( ) ( ) ( ) FX (x) = P (X ≤ x) = P E P X ≤ x | E + P E c P X ≤ x | E c = p · 1 + (1 − p) · F (x) = p + (1 − p)F (x). Hence, FX (x) =

!

0, if x < 0; p + (1 − p)F (x), if x ≥ 0.

b) Recall from the first bulleted item on page 578 that Proposition 10.5 holds for any type of random variable. Hence, from part (a), E(X) = =

"

0

"

0



P (X > x) dx =

"

0

∞(

) 1 − FX (x) dx

" )+ 1 − p + (1 − p)F (x) dx = (1 − p)

∞*

(

0

∞(

c) In this case, F (x) = 1 − e−λx for x ≥ 0. Hence, from part (b), " ∞ 1−p E(X) = (1 − p) . e−λx dx = λ 0

) 1 − F (x) dx.

10.2

10-19

Basic Properties of Expected Value

Theory Exercises 10.44 a) We apply the triangle inequality and use the fact that X has finite expectation to conclude that " ∞ " ∞ |a + bx|fX (x) dx ≤ (|a| + |b||x|)fX (x) dx −∞

−∞

= |a|

"



−∞

fX (x) dx + |b|

= |a| + |b|

"



−∞

"



|x|fX (x) dx

−∞

|x|fX (x) dx < ∞.

Hence, from the FEF, a + bX has finite expectation, and " ∞ " ∞ " E(a + bX) = (a + bx)fX (x) dx = afX (x) dx + =a

−∞ " ∞

−∞

"

fX (x) dx + b



−∞

−∞

∞ −∞

bxfX (x) dx

xfX (x) dx = a + bE(X) .

b) Let fY −X be any PDF of Y − X. We will prove that the function g, defined by g(z) = 0 if z < 0, and g(z) = fY −X (z) if z ≥ 0, is a PDF of Y − X. To do so, we show that g satisfies Equation (8.15) on page 422. First note that, because X ≤ Y , we have P (Y − X ≥ 0) = 1. Hence, for z < 0, FY −X (z) = P (Y − X ≤ z) ≤ P (Y − X < 0) = 1 − P (Y − X ≥ 0) = 1 − 1 = 0. Thus, FY −X (z) = 0 if z < 0. As FY −X is everywhere continuous, it follows that, in fact, FY −X (z) = 0 if z ≤ 0. Now we consider two cases. Case 1: z < 0 In this case, FY −X (z) = 0 =

"

z

−∞

0 dt =

"

z

g(t) dt.

−∞

Case 2: z ≥ 0 In this case,

) FY −X (z) = FY −X (0) + FY −X (z) − FY −X (0) = 0 + =

"

0

−∞

(

g(t) dt +

"

z

0

g(t) dt =

"

z

"

0

z

fY −X (t) dt

g(t) dt. −∞

,z We have shown that FY −X (z) = −∞ g(t) dt for all z ∈ R. Hence, g is a PDF of Y − X. Thus, without loss of generality, we can assume that a PDF of Y − X is such that fY −X (z) = 0 for z < 0. Applying the linearity property of expected value for continuous random variables, we conclude that Y − X has finite expectation and " ∞ " ∞ E(Y ) − E(X) = E(Y − X) = zfY −X (z) dz = zfY −X (z) dz ≥ 0.

Thus, E(X) ≤ E(Y ), as required.

−∞

0

10-20

CHAPTER 10

10.45 a) We have " ∞

−∞

|x|fX (x) dx = = = = =

"

"

Expected Value of Continuous Random Variables

0 −∞ 0 −∞

"

0

−∞

"

−xfX (x) dx + 4" &"

0

−∞ " ∞ 0

5

0

"

∞ 0

xfX (x) dx

1 dy fX (x) dx +

x y

−∞

"

' " fX (x) dx dy +

0

0

P (X < y) dy +

"



∞ &" x

'

1 dy fX (x) dx

0

∞ &" ∞ y

' fX (x) dx dy

P (X > y) dy

0

P (X < −y) dy +

"

∞ 0

P (X > y) dy =

"



P (|X| > y) dy.

0

,∞ Thus, we see that X has finite expectation if and only if 0 P (|X| > x) dx < ∞. In that case, by applying the same argument as the one just given, but without absolute values, we find that " ∞ " ∞ E(X) = P (X > x) dx − P (X < −x) dx. 0

0

b) We have

P (X > x) =

-

1, (3 − x)/4, 0,

Hence, by part (a),

if x < −1; if −1 ≤ x < 3; if x ≥ 3.

E(X) = =

"



0

"

3

0

and

P (X < −x) =

P (X > x) dx −

3−x dx − 4

"

1 0

"

∞ 0

-

1, (−x + 1)/4, 0,

if x < −3; if −3 ≤ x < 1; if x ≥ 1.

P (X < −x) dx

9 1 −x + 1 dx = − = 1. 4 8 8

10.46 a) Because X and −X have the same probability distribution, they have the same expected value. Hence, from the linearity property of expected value, E(X) = E(−X) = −E(X), which implies that E(X) = 0. b) From part (a) and the linearity property of expected value, 0 = E(X − c) = E(X) − E(c) = E(X) − c.

Hence, E(X) = c. c) Using the symmetry of fX about c and making the successive substitutions u = c − t and t = c + u, we get " ∞ " x Fc−X (x) = P (c − X ≤ x) = P (X ≥ c − x) = fX (t) dt = fX (c − u) du =

"

x

−∞

fX (c + u) du =

"

c−x

c+x

−∞

−∞

fX (t) dt = P (X ≤ c + x) = P (X − c ≤ x) = FX−c (x).

10.2

Basic Properties of Expected Value

10-21

Therefore, c − X has the same probability distribution as X − c, which means that X − c is a symmetric random variable; that is, X is symmetric about c. d) We note that a U(a, b) random variable has a PDF that is symmetric about ( (a + ) b)/2, and, therefore, by parts (c) and (b) has mean (a + b)/2. Likewise, we observe that a N µ, σ 2 random variable has a PDF that is symmetric about µ, and, therefore, has mean µ. Also, we note that a T (a, b) random variable has a PDF that is symmetric about (a + b)/2, and, therefore, has mean (a + b)/2.

Advanced Exercises 10.47 From Exercise 9.33(d), X(k) has the beta (distribution with parameters α = k and β = n − k + 1. ) Hence, from Table 10.1 on page 568, we have E X(k) = k/(n + 1).

10.48 a) From Proposition 9.11(a) on page 532 or Exercise 9.33(e), ( )X(1) = min{X1 , . . . , Xn } has the exponential distribution with parameter nλ. Hence, we have E X(1) = 1/nλ. b) For convenience, we let X = X(j −1) and Y = X(j ) . From Exercise 9.35(c), a joint PDF of X and Y is n! λ2 e−λx e−(n−j +1)λy (1 − e−λx )j −2 , 0 < x < y. (j − 2)! (n − j )! $( ) Set c = n! (j − 2)! (n − j )! . Then, for z > 0, we get, upon making the substitution u = 1 − e−λx , " ∞ " ∞ fX,Y (x, x + z) dx = c λ2 e−λx e−(n−j +1)λ(x+z) (1 − e−λx )j −2 dx fY −X (z) = fX,Y (x, y) =

−∞

= cλe−(n−j +1)λz = cλe−(n−j +1)λz = cλe−(n−j +1)λz

" " "

0



0 1 0 1 0

λe−λx e−(n−j +1)λx (1 − e−λx )j −2 dx

uj −2 (1 − u)n−j +1 du u(j −1)−1 (1 − u)(n−j +2)−1 du = cλe−(n−j +1)λz B(j − 1, n − j + 2)

n! (j − 2)! (n − j + 1)! −(n−j +1)λz = λe = (n − j + 1)λe−(n−j +1)λz . (j − 2)! (n − j )! n! ( ) Hence, we see that X(j ) − X(j −1) ∼ E (n − j + 1)λ . ) 6 ( c) We have X(k) = X(1) + jk=2 X(j ) − X(j −1) . Therefore, from the linearity property of expected value and parts (a) and (b), k k % ( ( ) ( ) % ) 1 1 E X(k) = E X(1) + E X(j ) − X(j −1) = + nλ j =2 (n − j + 1)λ j =2

=

n−1 n % 1 % 1 1 1 + = . nλ i=n−k+1 iλ λ j =n−k+1 j

10.49 a) We can consider the lifetimes of the n components a random sample of size n from an exponential distribution with parameter λ. The lifetime of the unit in a series system is then X(1) . From Exercise 10.48(a), we conclude that the expected lifetime of the unit is 1/nλ.

10-22

CHAPTER 10

Expected Value of Continuous Random Variables

b) Again, we can consider the lifetimes of the n components a random sample of size n from an exponential distribution with parameter λ. The lifetime of the unit in a parallel system is then X(n) . From 6 Exercise 10.48(c), we conclude that the expected lifetime of the unit is (1/λ) jn=1 1/j . 6 c) This result follows from part (b) and the relation jn=1 (1/j ) ∼ ln n, where, here, the symbol ∼ means that the ratio of the two sides approach 1 as n → ∞.

10.50 Consider a parallel system of n components whose lifetimes, X1 , . . . , Xn , are independent E(λ) random variables. The kth order statistic, X(k) , represents the time of the kth component failure. Thus, the random variable X(j ) − X(j −1) represents the time between the (j − 1)st and j th component failure. When the (j − 1)st component failure occurs, there remain (n − j + 1) working components. Because of the lack-of-memory property of the exponential distribution, the remaining elapsed time until failure for each of these (n − j + 1) working components still has the exponential distribution with parameter λ. Hence the time until the next failure, X(j ) − X(j −1) , has the probability distribution of the minimum of (n − j + 1) independent exponential random variables with parameter λ, which, by Proposition 9.11(a) on page 532, is the exponential distribution with parameter (n − j + 1)λ.

10.3 Variance, Covariance, and Correlation Basic Exercises 10.51 Let X be the current annual cost and let Y be the annual cost with the 20% increase. Then Y = 1.2X. a) By the linearity property of expected value, we have E(Y ) = 1.2E(X), so there is a 20% increase in the mean annual cost. b) From Equation (10.27) on page 586, we have Var(Y ) = Var(1.2X) = 1.44 Var(X), so there is a 44% increase in the variance of the annual √ cost. √ c) Referring to part (b), we have σY = Var(Y ) = 1.44 Var(X) = 1.2σX , so there is a 20% increase in the standard deviation of the annual cost. 10.52 a) Because E(X) = 0, we have ( ) Var(X) = E X2 = =

4 π

"

0

1

"



2 x fX (x) dx = π −∞ 2

"

1

−1

# x 2 1 − x 2 dx

# 4 π 1 x 2 1 − x 2 dx = · = . π 16 4

b) By symmetry, Y has the same probability distribution as X and, hence, the same variance. Consequently, Var(Y ) = Var(X) = 1/4. c) As E(X) = 0, we have " ∞" ∞ xyfX,Y (x, y) dx dy Cov(X, Y ) = E(XY ) = −∞ −∞

⎛ √ ⎞ " " " 1−x 2 1 1 1 1⎝ ⎠ y dy x dx = 0 · x dx = 0. = √ π −1 π −1 − 1−x 2

d) No, as noted in the text, random variables can be uncorrelated without being independent. e) No, X and Y are not independent random variables, as we discovered in Example 9.14 on page 524.

10.3

10.53 We have E Therefore,

(

X2

)

10-23

Variance, Covariance, and Correlation

"



1 x fX (x) dx = = 30 −∞ 2

"

30

0

x 2 dx = 300.

( ) ( )2 Var(X) = E X2 − E(X) = 300 − (15)2 = 75.

10.54 a) We have E

(

X2

)

"



1 x fX (x) dx = = b−a −∞ 2

"

b a

x 2 dx =

b3 − a 3 b2 + ab + a 2 = . 3(b − a) 3

Recalling that E(X) = (a + b)/2, we get

' a 2 + 2ab + b2 Var(X) = E − E(X) 4 + (b − a)2 1 * 2 4b + 4ab + 4a 2 − 3a 2 − 6ab − 3b2 = . = 12 12 (

X2

)

(

)2

b2 + ab + a 2 = − 3

&

b) In Exercise 10.53, X ∼ U(0, 30). Therefore, from part (a), Var(X) = (30 − 0)2 /12 = 75. 10.55 From Example 10.13 on page 587, we know that the variance of an exponential random variable is the reciprocal of the # square of its parameter. Hence, the standard deviation of the duration of a pause during a monologue is 1/(1.4)2 = 1/1.4, or about 0.714 seconds.

10.56 Let X denote the time, in minutes, that it takes a finisher to complete the race. By assumption, X ∼ N (61, 81). From Example 10.12 on page know that the variance of a normal random √ 586, we √ variable is its second parameter. Thus, σX = Var(X) = 81 = 9 minutes. 10.57 We have " 1 1 x fX (x) dx = x 2 · x α−1 (1 − x)β−1 dx B(α, β) −∞ 0 " 1 1 1 (α + 1)α = x (α+2)−1 (1 − x)β−1 dx = · B(α + 2, β) = . B(α, β) 0 B(α, β) (α + β + 1)(α + β)

( ) E X2 =

"



2

Recalling that E(X) = α/(α + β), we get Var(X) = E

(

X2

)

(

− E(X)

)2

(α + 1)α = − (α + β + 1)(α + β)

&

α α+β

'2

=

αβ (α

+ β)2 (α

+ β + 1)

.

10.58 Let X denote the proportion of these manufactured items that requires service during the first 5 years of use. By assumption, X has the beta distribution with parameters α = 2 and β = 3. Applying Exercise 10.57 gives ; # √ 2·3 σX = Var(X) = = 0.04 = 0.2. (2 + 3)2 (2 + 3 + 1)

10-24

CHAPTER 10

Expected Value of Continuous Random Variables

10.59 a) From Exercise 10.17, a PDF of T = T1 + T2 is t/34, fT (t) = (12 − t)/34, 0,

if 0 < t < 6; if 6 < t < 10; otherwise.

Consequently, E

(

T2

)

"



1 t fT (t) dt = = 34 −∞ 2

"

6

0

1 t dt + 34 3

"

6

10

t 2 (12 − t) dt =

1 642 (324 + 960) = 34 17.

Recalling from the solution to Exercise 10.17 that E(T ) = 292/51, we get ; & ' # 642 292 2 σT = Var(T ) = − = 2.23. 17 51

b) Applying the FEF with g(t1 , t2 ) = t12 yields ( ) E T12 =

"



"



−∞ −∞

1 = 34 6 = 34

"

4

0

"

4

0

t12

t12 fT1 ,T2 (t1 , t2 ) dt1 dt2

4"

6

1 dt2

0

t12 dt1

1 + 34

5

"

1 dt1 + 34

6

4

"

6

4

t12

4"

t12 (10 − t1 ) dt1 =

10−t1 0

1 dt2

5

dt1

64 370 562 + = . 17 51 51

Recalling from the solution to Exercise 10.26 that E(T1 ) = 146/51, we get & ' 146 2 7346 562 − = . Var(T1 ) = 51 51 (51)2 By symmetry, we have Var(T2 ) = Var(T1 ). We also need to compute Cov(T1 , T2 ). Applying the FEF with g(t1 , t2 ) = t1 t2 yields " ∞" ∞ E(T1 T2 ) = t1 t2 fT1 ,T2 (t1 , t2 ) dt1 dt2 −∞ −∞

1 = 34 18 = 34 Hence,

"

4

t1

0

"

0

4

4"

6 0

t2 dt2

1 t1 dt1 + 68

5

"

4

6

1 dt1 + 34

"

4

6

t1

4"

t1 (10 − t1 )2 dt1 =

401 − Cov(T1 , T2 ) = E(T1 T2 ) − E(T1 ) E(T2 ) = 51 Consequently, Var(T1 + T2 ) = Var(T1 ) + Var(T2 ) + 2 Cov(T1 , T2 ) = 2 ·

&

10−t1

t2 dt2

0

5

dt1

72 185 401 + = . 17 51 51

146 51

'2

=−

865 . (51)2

7346 865 12,962 −2· = . 2 2 (51) (51) (51)2

10.3

Variance, Covariance, and Correlation

σT1 +T2

# = Var(T1 + T2 ) =

Therefore,

;

10-25

12,962 = 2.23. (51)2

10.60 In the solution to Exercise 10.18(b), we discovered that E(Y ) = 1/4 and that fY (y) = − ln y if 0 < y < 1, and fY (y) = 0 otherwise. Therefore, /1 " 1 " ∞ ( 2) y3 1 2 2 E Y = y fY (y) dy = − y ln y dy = (1 − 3 ln y) = . 9 9 −∞ 0 0 Hence,

( ) ( )2 1 Var(Y ) = E Y 2 − E(Y ) = − 9

& '2 7 1 = . 4 144

10.61 a) Clearly, X + Y = 1. Hence, by Equation (10.27) on page 586, ( ) Var(Y ) = Var(1 − X) = Var 1 + (−1)X = (−1)2 Var(X) = Var(X) . b) From properties of variance and part (a), we get

0 = Var(1) = Var(X + Y ) = Var(X) + Var(Y ) + 2 Cov(X, Y ) = 2 Var(X) + 2 Cov(X, Y ) .

Hence, Cov(X, Y ) = − Var(X). c) From properties of covariance,

Cov(X, Y ) = Cov(X, 1 − X) = Cov(X, 1) − Cov(X, X) = 0 − Var(X) = − Var(X) .

d) From parts (a) and (b),

Cov(X, Y ) − Var(X) Var(X) ρ(X, Y ) = √ = −1. =√ =− Var(X) Var(X) Var(Y ) Var(X) Var(X) e) As Y = 1 − X = 1 + (−1)X, we conclude from Proposition 7.16(d) on page 372 that ρ(X, Y ) = −1. 10.62 Let X denote the lifetime of the machine and let Y denote the age of the machine at the time of replacement. We want to determine the standard deviation of the random variable Y . Observe that ! X, if X ≤ 4; Y = 4, if X > 4. Now, set

g(x) =

!

x, 4,

if x ≤ 4; if x > 4.

Then Y = g(X) and, hence, from the FEF, " ∞ *( )r + ( )r E(Y r ) = E g(X) = g(x) fX (x) dx =

"

0

4

−∞

x r fX (x) dx +

"

4



4r fX (x) dx =

a) In this case, we assume that X ∼ U(0, 6). Then ! 1/6, fX (x) = 0,

"

4 0

x r fX (x) dx + 4r P (X > 4).

if 0 < x < 6; otherwise.

10-26

CHAPTER 10

Expected Value of Continuous Random Variables

Consequently, "

1 E(Y ) = 6 and

"

( ) 1 E Y2 = 6

Hence,

0

0

4

4

2 8 = 6 3

x dx + 4 ·

x 2 dx + 42 ·

< # ( ) ( )2 σY = Var(Y ) = E Y 2 − E(Y ) =

2 80 = . 6 9

;

80 − 9

& '2 8 = 1.333. 3

b) In this case, we assume that X ∼ T (0, 6). Then fX (x) =

-

x/9, (6 − x)/9, 0,

if 0 < x < 3; if 3 < x < 6; otherwise.

Consequently, E(Y ) =

"

3

0

x dx + 9



"

4 3



6−x 26 8 77 dx + 4P (X > 4) = 1 + + = . 9 27 9 27

and

Hence,

( ) E Y2 =

"

3

0

x2 ·

x dx + 9

"

4 3

x2 ·

6−x 9 121 32 165 dx + 16P (X > 4) = + + = . 9 4 36 9 18

< # ( ) ( )2 σY = Var(Y ) = E Y 2 − E(Y ) =

c) In this case, X ∼ E(1/3). Then

fX (x) =

!

(1/3)e−x/3 , 0,

;

165 − 18

&

77 27

'2

= 1.017.

if x > 0; otherwise.

Consequently, E(Y ) =

1 3

and E Hence,

(

Y2

)

σY =

1 = 3

"

4

0

"

0

4

xe−x/3 dx + 4P (X > 4) = 1.1548 + 1.0544 = 2.2092

x 2 e−x/3 dx + 16P (X > 4) = 2.7114 + 4.2176 = 6.9290.

< # )2 # ( ) ( Var(Y ) = E Y 2 − E(Y ) = 6.9290 − (2.2092)2 = 1.431.
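The three standard deviations computed in parts (a)–(c) can be checked by simulation; the sketch below (illustrative only; sample size and seed are arbitrary) draws the machine lifetime from each assumed distribution, truncates it at age 4, and reports the sample standard deviation. The results should be roughly 1.333, 1.017, and 1.431.

import numpy as np

# Monte Carlo check of Exercise 10.62: Y = min(X, 4) is the age at replacement.
rng = np.random.default_rng(seed=8)
n = 2_000_000
lifetimes = {
    "U(0, 6)": rng.uniform(0.0, 6.0, size=n),
    "T(0, 6)": rng.triangular(0.0, 3.0, 6.0, size=n),   # symmetric triangular
    "E(1/3)": rng.exponential(scale=3.0, size=n),       # exponential with mean 3
}
for name, x in lifetimes.items():
    y = np.minimum(x, 4.0)
    print(f"{name:8s}: sigma_Y ~ {y.std():.3f}")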

10.3

10.63 a) We begin by noting that " ∞" ∞ " 1= fX,Y (x, y) dx dy = c

4"

1

x

0

−∞ −∞

E(X) =

"



xfX,Y (x, y) dx dy = 2

−∞ −∞

and E(Y ) =

"



"



−∞ −∞

1 dy dx = c

x

4"

2

0

−∞ −∞ ∞

1

"

1

x

2

0

yfX,Y (x, y) dx dy = 2

"

4"

1

x

0

5

1

0

So, we see that c = 2. From the FEF, " ∞" ∞ " E(XY ) = xyfX,Y (x, y) dx dy = 2 "

0

0

4"

0

Cov(X, Y ) = E(XY ) − E(X) E(Y ) = ( )( ) fX,Y (x, y) = 2xI(0,1) (x) I(0,1) (y) ,

"

1

x dx =

0

5

1

"

y dy dx = 5

1

Hence,

b) We can write

10-27

Variance, Covariance, and Correlation

1 dy dx = 2 5

1

y dy dx =

1

0

"

1

0

"

c . 2

x 2 dx =

1

0

1 , 3

x 2 dx =

x dx =

2 , 3

1 . 2

1 2 1 − · = 0. 3 3 2 (x, y) ∈ R2 .

Therefore, from Exercise 9.95, X and Y are independent random variables. Consequently, from Proposition 7.14 on page 368, we have Cov(X, Y ) = 0.

10.64 Let X denote the repair costs and let Y denote the insurance payment in the event that the automobile is damaged. We know that X ∼ U(0, 1500). Note that ! 0, if X ≤ 250; Y = X − 250, if X > 250. Set

g(x) =

!

0, x − 250,

Then Y = g(X). By the FEF, " ∞ E(Y ) = g(x)fX (x) dx = −∞

and

E Hence,

(

Y2

)

=

"

∞(

−∞

σY =

)2 g(x) fX (x) dx =

if x ≤ 250; if x > 250.

1 1500

"

1500

1 1500

"

1500

250

250

(x − 250) dx = 520.833, (x − 250)2 dx = 434027.778.

# # Var(Y ) = 434027.778 − (520.833)2 = $403.44.

10.65 By assumption, µX = µY = 100 and (X − 100)(Y − 100) < 0. Therefore, by the monotonicity property of expected value, ( ) ( ) Cov(X, Y ) = E (X − µX )(Y − µY ) = E (X − 100)(Y − 100) < 0.

10-28

CHAPTER 10

Expected Value of Continuous Random Variables

Now applying Equation (10.32) on page 589, we get Var(X + Y ) = Var(X) + Var(Y ) + 2 Cov(X, Y ) < Var(X) + Var(Y ) . Thus, (b) is the correct relationship. 10.66 No, the converse is not true. For instance, consider the random variables X and Y in Exercise 10.37. Because E(XY ) = E(X) E(Y ), we have Cov(X, Y ) = E(XY ) − E(X) E(Y ) = 0. However, as noted in the solution of that exercise, these two random variables are not independent. 10.67 a) In Example 10.11, we found that Var(X) = Var(Y ) = and, in Example 10.14, Cov(X, Y ) = Hence,

n (n + 2)(n + 1)2

1 . (n + 2)(n + 1)2

1 Cov(X, Y ) 1 (n + 2)(n + 1)2 ρ(X, Y ) = √ = . == n n n Var(X) · Var(Y ) · (n + 2)(n + 1)2 (n + 2)(n + 1)2

b) Answers will vary, but following is one possible explanation. As n → ∞, we would expect that min{X1 , . . . , Xn } → 0 and max{X1 , . . . , Xn } → 1, and the correlation between two constant random variables equals 0. c) Recalling that Var(Y ) = Var(X) and using properties of covariance, we find that

( ) 1 Cov(R, M) = Cov Y − X, (X + Y )/2 = Cov(Y − X, X + Y ) 2 ) 1( = Cov(Y, X) + Cov(Y, Y ) − Cov(X, X) − Cov(X, Y ) 2 ) 1( = Cov(X, Y ) + Var(Y ) − Var(X) − Cov(X, Y ) = 0. 2 Hence, ρ(R, M) = 0; the range and midrange are uncorrelated. d) No. Although independent random variables are uncorrelated, it is not necessarily the case that uncorrelated random variables are independent. See, for instance, Exercise 10.66. e) No, R and M are not independent random variables. We can see this fact in several ways. One way is to note that the range of M is the interval (0, 1). However, given that, say, r = 1/2, the range of M is the interval (1/4, 3/4). 10.68 a) From the linearity property of expected value, & ' ( ) ) 1 1( E X¯ n = E E(X1 + · · · + Xn ) (X1 + · · · + Xn ) = n n ) ( 1 1 1 = E(X1 ) + · · · + E(Xn ) = (µ + · · · + µ) = (nµ) = µ. n n n

10.3

Variance, Covariance, and Correlation

10-29

b) Using the independence of X1 , . . . , Xn and Equations (10.27) and (10.34) on pages 586 and 590, respectively, we get & ' ) ( ) 1 1 ( ¯ Var Xn = Var (X1 + · · · + Xn ) = 2 Var(X1 + · · · + Xn ) n n + ) ) σ2 1( 1 * 1 ( . = 2 Var(X1 ) + · · · + Var(Xn ) = 2 σ 2 + · · · + σ 2 = 2 nσ 2 = n n n n variables each having variance σ 2 , we know that c) Because ( ( random )X1 , . 2. . , Xn are independent ) Cov Xk , Xj = σ if k = j , and Cov Xk , Xj = 0 if k ̸= j . Applying properties of covariance and referring to part (b), we get ) ( ) ( ) ( Cov X¯ n , Xj − X¯ n = Cov X¯ n , Xj − Cov X¯ n , X¯ n 4 5 n ( ) 1% Xk , Xj − Var X¯ n = Cov n k=1 =

n ( ) ( ) 1 σ2 1% = 0. Cov Xk , Xj − Var X¯ n = σ 2 − n k=1 n n

d) The mean of the sample mean equals the distribution mean; the variance of the sample mean equals the ratio of the distribution variance to the sample size; and the sample mean is uncorrelated with each deviation from the sample mean. 10.69 For n ∈ N , " ∞ " ∞ " ∞ λα λα n n α−1 −λx n E(X ) = x fX (x) dx = x ·x e dx = x (α+n)−1 e−λx dx $(α) $(α) −∞ 0 0 =

$(α + n) (α + n − 1)(α + n − 2) · · · α$(α) α(α + 1) · · · (α + n − 1) λα · = = . $(α) λα+n λn $(α) λn

10.70 a) From the FEF and upon making the substitution z = (x − µ)/σ , we find that, for n ∈ N , " ∞ " ( ) 1 σ 2n+1 ∞ 2n+1 −z2 /2 2n+1 −(x−µ)2 /2σ 2 2n+1 E (X − µ) = (x − µ) dx = √ z e dz = 0, e √ 2π σ 2π −∞ −∞ 2

where the last equation follows from the fact that z2n+1 e−z /2 is an odd function. b) From the FEF and upon first making the substitution z = (x − µ)/σ and then making the substitution u = z2 /2, we find that, for n ∈ N , " ∞ " ∞ ) ( 1 σ 2n 2 2n −(x−µ)2 /2σ 2 2n (x − µ) √ dx = √ z2n e−z /2 dz = E (X − µ) e 2π σ 2π −∞ −∞ " " 2n σ 2n ∞ n−1/2 −u 2σ 2n ∞ 2n −z2 /2 z e dz = √ u e du =√ π 0 2π 0 " 2n σ 2n 2n σ 2n ∞ (n+1/2)−1 −u 2n σ 2n (2n)! √ u e du = √ π = √ · $(n + 1/2) = √ · n! 22n π 0 π π

(2n)! 2n σ , n! 2n where the penultimate equality is due to Equation (8.46). =

10-30

CHAPTER 10

Expected Value of Continuous Random Variables

10.71 a) Applying properties of covariance, we obtain Cov(aX + bY, aX − bY ) = a 2 Cov(X, X) − ab Cov(X, Y ) + ba Cov(Y, X) − b2 Cov(Y, Y ) = a 2 Var(X) − ab Cov(X, Y ) + ab Cov(X, Y ) − b2 Var(Y )

= a 2 Var(X) − b2 Var(Y ) . b) If Var(Y ) = Var(X), then, from part (a),

Cov(X + Y, X − Y ) = 12 Var(X) − 12 Var(Y ) = Var(X) − Var(X) = 0.

Thus, X and Y are uncorrelated random variables.

Theory Exercises 10.72 a) Because X and −X have the same probability distribution, they have the same moments. Hence, if X has a finite odd moment of order r, then ( ) ( ) ( ) ( ) ( ) E X r = E (−X)r = E (−1)r Xr = E −1Xr = −E Xr ,

which implies that E(Xr ) = 0. b) Suppose that X is symmetric about c. From Exercise 10.46(b), we know that µX = c. And because X − c is symmetric, part (a) shows that ( ) ( ) E (X − µX )r = E (X − c)r = 0,

for all odd positive integers r for which X has a finite moment of that order. In other words, all odd central moments of X equal 0. 10.73 We have " ∞ " n |x| fX (x) dx = −∞

∞ 0

=n

"

0

n

x fX (x) dx =

∞ &" ∞ y

"

0

∞ &" x 0

ny

n−1

'

dy fX (x) dx

' " fX (x) dx y n−1 dy = n



y n−1 P (X > y) dy.

0

,∞ It now follows from the FEF that X has a finite nth moment if and only if 0 x n−1 P (X > x) dx < ∞, and, in that case, the FEF and the previous display show that " ∞ " ∞ n n E(X ) = x fX (x) dx = n x n−1 P (X > x) dx. 0

0

10.74
a) Let σ² denote the common variance of the Xj s. Applying Chebyshev's inequality and Exercise 10.68, we get

P(|(X1 + · · · + Xn)/n − µ| < ϵ) = P(|X̄_n − µ_{X̄_n}| < ϵ) = 1 − P(|X̄_n − µ_{X̄_n}| ≥ ϵ) ≥ 1 − Var(X̄_n)/ϵ² = 1 − σ²/(nϵ²).

It now follows that

lim_{n→∞} P(|(X1 + · · · + Xn)/n − µ| < ϵ) = 1.

b) In words, the weak law of large numbers states that, when n is large, the average value of X1 , . . . , Xn is likely to be near µ. c) The weak law of large numbers provides a mathematically precise version of the long-run-average interpretation of expected value.
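The weak law of large numbers is easy to see numerically. The following sketch (assuming NumPy; the exponential distribution, the tolerance ϵ = 0.05, and the number of replications are arbitrary illustrative choices) estimates P(|X̄_n − µ| < ϵ) for increasing n and shows it approaching 1.

```python
# Empirical illustration of the weak law of large numbers (Exercise 10.74).
import numpy as np

rng = np.random.default_rng(1)
mu, eps, reps = 1.0, 0.05, 1_000           # illustrative choices
for n in (10, 100, 1_000, 10_000):
    xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) < eps))   # fraction of sample means within eps of mu
```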

Advanced Exercises
10.75
a) From the linearity property of expected value,

E(Y) = E(Σ_{j=1}^n cj Xj) = Σ_{j=1}^n cj E(Xj) = Σ_{j=1}^n cj µ = µ Σ_{j=1}^n cj.

Hence, Y is an unbiased estimator of µ if and only if Σ_{j=1}^n cj = 1.
b) By independence and Equation (10.27) on page 586,

Var(Y) = Var(Σ_{j=1}^n cj Xj) = Σ_{j=1}^n cj² Var(Xj) = Σ_{j=1}^n cj² σj².

We want to choose the cj s that minimize Var(Y) and also provide an unbiased estimator. To do so, we use the method of Lagrange multipliers. Here the objective and constraint functions are

f(c1, . . . , cn) = Σ_{j=1}^n cj² σj²  and  g(c1, . . . , cn) = Σ_{j=1}^n cj − 1,

respectively. The gradient equation yields 2cj σj² = λ, or cj = λ/(2σj²), for 1 ≤ j ≤ n. Substituting these values of cj into the equation g(c1, . . . , cn) = 0 gives λ = 2/Σ_{k=1}^n (1/σk²). Therefore,

cj = (Σ_{k=1}^n 1/σk²)^{−1} · (1/σj²),  1 ≤ j ≤ n.

c) Unbiasedness is a desirable property because it means that, on average, the estimator will give the actual value of the unknown parameter; minimum variance is a desirable property because it means that, on average, the estimator will give the closest estimate of (least error for) the value of the unknown parameter.
d) Let σ² denote the common variance of the Xj s. From part (b), the best linear unbiased estimator of µ is obtained by choosing the cj s as follows:

cj = (Σ_{k=1}^n 1/σ²)^{−1} · (1/σ²) = (n/σ²)^{−1} · (1/σ²) = 1/n,  1 ≤ j ≤ n.

Therefore, in this case, the best linear unbiased estimator of µ is

Y = Σ_{j=1}^n cj Xj = (1/n) Σ_{j=1}^n Xj = X̄_n.
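A minimal sketch of the optimal weights from part (b), assuming NumPy; the three variances below are arbitrary illustrative values. Observations with smaller variance receive larger weight, and the resulting variance equals 1/Σ_k(1/σk²).

```python
# Optimal (minimum-variance unbiased) weights c_j = (1/sigma_j^2) / sum_k (1/sigma_k^2).
import numpy as np

sigma2 = np.array([1.0, 4.0, 0.25])          # Var(X_1), Var(X_2), Var(X_3): illustrative
c = (1 / sigma2) / np.sum(1 / sigma2)        # optimal weights; they sum to 1
print(c, c.sum())
print("Var(Y) =", np.sum(c**2 * sigma2), "=", 1 / np.sum(1 / sigma2))
```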


10.76
a) We note that T has finite variance if and only if ∫_{−∞}^{∞} t²(1 + t²/ν)^{−(ν+1)/2} dt < ∞. Using symmetry and making the substitution u = 1 + t²/ν, we find that

∫_{−∞}^{∞} t²/(1 + t²/ν)^{(ν+1)/2} dt = 2 ∫_0^∞ t²/(1 + t²/ν)^{(ν+1)/2} dt = ν^{3/2} ∫_1^∞ (√(u − 1)/u^{(ν+1)/2}) du.

Noting that the integrand on the right of the preceding display is asymptotic to u^{−ν/2} as u → ∞ and recalling from calculus that ∫_1^∞ u^{−p} du converges (i.e., is finite) if and only if p > 1, we see that T has finite variance if and only if ν/2 > 1, that is, if and only if ν > 2.
b) Using integration by parts, we find that

∫_0^∞ t²/(1 + t²/ν)^{(ν+1)/2} dt = (ν/(ν − 1)) ∫_0^∞ dt/(1 + t²/ν)^{(ν−1)/2}.

Making the substitution u = √((ν − 2)/ν) t and using the fact that a t PDF with ν − 2 degrees of freedom integrates to 1, we get

∫_0^∞ dt/(1 + t²/ν)^{(ν−1)/2} = √(ν/(ν − 2)) ∫_0^∞ du/(1 + u²/(ν − 2))^{((ν−2)+1)/2}
 = (1/2) √(ν/(ν − 2)) ∫_{−∞}^{∞} du/(1 + u²/(ν − 2))^{((ν−2)+1)/2}
 = (1/2) √(ν/(ν − 2)) · √((ν − 2)π) Γ((ν − 2)/2)/Γ((ν − 1)/2)
 = (1/2) √(νπ) Γ((ν − 2)/2)/Γ((ν − 1)/2).

Recalling from Exercise 10.22 that E(T) = 0, we conclude that

Var(T) = E(T²) = (Γ((ν + 1)/2)/(√(νπ) Γ(ν/2))) ∫_{−∞}^{∞} t²/(1 + t²/ν)^{(ν+1)/2} dt
 = 2 (Γ((ν + 1)/2)/(√(νπ) Γ(ν/2))) ∫_0^∞ t²/(1 + t²/ν)^{(ν+1)/2} dt
 = 2 (Γ((ν + 1)/2)/(√(νπ) Γ(ν/2))) · (ν/(ν − 1)) · (1/2) √(νπ) Γ((ν − 2)/2)/Γ((ν − 1)/2)
 = 2 · (((ν − 1)/2) Γ((ν − 1)/2))/(((ν − 2)/2) Γ((ν − 2)/2)) · (ν/(ν − 1)) · (1/2) Γ((ν − 2)/2)/Γ((ν − 1)/2)
 = ν/(ν − 2).


10.4 Conditional Expectation

Basic Exercises
10.77 We know that Y|X=x ∼ U(0, x) for 0 < x < 1.
a) Recall that for a U(a, b) distribution, the mean and variance are (a + b)/2 and (b − a)²/12, respectively. Hence, for 0 < x < 1,

E(Y | X = x) = x/2  and  Var(Y | X = x) = x²/12.

b) Referring to part (a) and applying the law of total expectation, we get

E(Y) = E(E(Y | X)) = E(X/2) = (1/2) E(X) = (1/2) · (1/2) = 1/4.

Referring to part (a) and applying the law of total variance, we get

Var(Y) = E(Var(Y | X)) + Var(E(Y | X)) = E(X²/12) + Var(X/2)
 = (1/12) E(X²) + (1/4) Var(X) = (1/12)(Var(X) + (E(X))²) + (1/4) Var(X)
 = (1/3) Var(X) + (1/12)(E(X))² = (1/3) · (1/12) + (1/12) · (1/4) = 7/144.
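A Monte Carlo check of these two results is straightforward; the sketch below assumes NumPy and uses an arbitrary seed and sample size.

```python
# Check E(Y) = 1/4 and Var(Y) = 7/144 when X ~ U(0,1) and Y | X = x ~ U(0, x).
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=1_000_000)
y = rng.uniform(0.0, x)                      # conditional uniform on (0, x)
print(y.mean(), 1 / 4)                       # both about 0.25
print(y.var(), 7 / 144)                      # both about 0.0486
```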

10.78 For 0 < x < 1,

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = ∫_x^{x+1} 2x dy = 2x.

Therefore,

f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x) = 2x/2x = 1

if x < y < x + 1, and f_{Y|X}(y | x) = 0 otherwise. So, we see that Y|X=x ∼ U(x, x + 1) for 0 < x < 1. Consequently,

E(Y | X = x) = (x + (x + 1))/2 = x + 1/2

if 0 < x < 1, and E(Y | X = x) = 0 otherwise. Also,

Var(Y | X = x) = ((x + 1) − x)²/12 = 1/12

if 0 < x < 1, and Var(Y | X = x) = 0 otherwise.


10.79 We have

f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx = ∫_{2y}^{2} 2xy dx = 4y(1 − y²)

if 0 < y < 1, and f_Y(y) = 0 otherwise. Therefore, for 0 < y < 1,

f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y) = 2xy/(4y(1 − y²)) = x/(2(1 − y²))

if 2y < x < 2, and f_{X|Y}(x | y) = 0 otherwise. Thus,

E(X | Y = y) = ∫_{−∞}^{∞} x f_{X|Y}(x | y) dx = (1/(2(1 − y²))) ∫_{2y}^{2} x² dx = 4(1 − y³)/(3(1 − y²))

if 0 < y < 1, and E(X | Y = y) = 0 otherwise. Applying the law of total expectation, we now find that

E(X) = ∫_{−∞}^{∞} E(X | Y = y) f_Y(y) dy = ∫_0^1 (4(1 − y³)/(3(1 − y²))) · 4y(1 − y²) dy = (16/3) ∫_0^1 (y − y⁴) dy = 8/5.

10.80
a) We have

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = ∫_0^∞ λµ e^{−(λx+µy)} dy = λ e^{−λx} ∫_0^∞ µ e^{−µy} dy = λ e^{−λx}

if x > 0, and f_X(x) = 0 otherwise. Thus, for x > 0,

f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x) = λµ e^{−(λx+µy)}/(λ e^{−λx}) = µ e^{−µy}

if y > 0, and f_{Y|X}(y | x) = 0 otherwise. We observe that Y|X=x ∼ E(µ) for x > 0. It now follows that

E(Y | X = x) = 1/µ  and  Var(Y | X = x) = 1/µ²

if x > 0, and E(Y | X = x) = 0 and Var(Y | X = x) = 0 otherwise.
b) They don't depend on x because X and Y are independent random variables.
c) Referring to part (a) yields

E(Y) = E(E(Y | X)) = E(1/µ) = 1/µ.

d) Referring to part (a) yields

Var(Y) = E(Var(Y | X)) + Var(E(Y | X)) = E(1/µ²) + Var(1/µ) = 1/µ² + 0 = 1/µ².

10.81
a) Because X and Y are independent, knowing the value of X doesn't affect the probability distribution of Y. Hence, E(Y | X = x) = E(Y) for each possible value x of X.
b) Because X and Y are independent, f_Y is a PDF of Y|X=x for each possible value x of X. Hence,

E(Y | X = x) = ∫_{−∞}^{∞} y f_{Y|X}(y | x) dy = ∫_{−∞}^{∞} y f_Y(y) dy = E(Y).

c) Because X and Y are independent, knowing the value of X doesn't affect the probability distribution of Y. Hence, Var(Y | X = x) = Var(Y) for each possible value x of X.
d) Because X and Y are independent, f_Y is a PDF of Y|X=x for each possible value x of X. Therefore, from Definition 10.3, part (b), and the conditional form of the FEF,

Var(Y | X = x) = E((Y − E(Y | X = x))² | X = x) = E((Y − E(Y))² | X = x)
 = ∫_{−∞}^{∞} (y − E(Y))² f_{Y|X}(y | x) dy = ∫_{−∞}^{∞} (y − E(Y))² f_Y(y) dy = Var(Y).

e) Because the expected value of a constant is that constant, we have, in view of part (b), that

E(E(Y | X)) = E(E(Y)) = E(Y).

f) Because the expected value of a constant is that constant and the variance of a constant equals 0, we have, in view of parts (b) and (d), that

E(Var(Y | X)) + Var(E(Y | X)) = E(Var(Y)) + Var(E(Y)) = Var(Y) + 0 = Var(Y).

10.82
a) By assumption, Y|X=x ∼ N(ρx, 1 − ρ²). Thus, E(Y | X = x) = ρx and Var(Y | X = x) = 1 − ρ². Consequently, E(Y | X) = ρX and Var(Y | X) = 1 − ρ².
b) By assumption, X ∼ N(0, 1) and, so, E(X) = 0 and Var(X) = 1. Consequently, by referring to part (a) and applying the laws of total expectation and total variance, we get

E(Y) = E(E(Y | X)) = E(ρX) = ρ E(X) = 0

and

Var(Y) = E(Var(Y | X)) + Var(E(Y | X)) = E(1 − ρ²) + Var(ρX) = (1 − ρ²) + ρ² Var(X) = (1 − ρ²) + ρ² = 1.

c) In Example 9.13(b), we found that Y ∼ N(0, 1). Thus, if we used the PDF of Y to find its mean and variance, we would obtain E(Y) = 0 and Var(Y) = 1, which is what we got in part (b).

10.83 Let X denote the amount of time, in hours, until the bus arrives and let Y denote the number of passengers at the bus stop when the bus arrives. By assumption, X ∼ Beta(2, 1) and Y|X=x ∼ P(λx). Hence, E(Y | X = x) = λx and Var(Y | X = x) = λx. Applying the law of total expectation and recalling that the mean and variance of a Beta(α, β) distribution are α/(α + β) and αβ/((α + β)²(α + β + 1)), respectively, we get

E(Y) = E(E(Y | X)) = E(λX) = λ E(X) = (2/3)λ.

Also, by the law of total variance,

Var(Y) = E(Var(Y | X)) + Var(E(Y | X)) = E(λX) + Var(λX) = λ E(X) + λ² Var(X) = (2/3)λ + (1/18)λ² = λ(λ + 12)/18.

10.84
a) Because X and Y are independent, observing X provides no information about Y. Consequently, the best predictor of Y, based on observing X, is just E(Y).
b) By the prediction theorem, the best predictor of Y, based on observing X, is E(Y | X). However, because X and Y are independent, we have E(Y | X) = E(Y), as shown in Exercise 10.81.
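A simulation check of Exercise 10.83 is given below (assuming NumPy; the value λ = 6 and the seed are arbitrary illustrative choices).

```python
# Check E(Y) = 2*lam/3 and Var(Y) = lam*(lam + 12)/18 when X ~ Beta(2, 1)
# and Y | X = x ~ Poisson(lam * x).
import numpy as np

lam = 6.0                                    # arbitrary illustrative rate
rng = np.random.default_rng(3)
x = rng.beta(2, 1, size=1_000_000)
y = rng.poisson(lam * x)
print(y.mean(), 2 * lam / 3)                 # both about 4.0
print(y.var(), lam * (lam + 12) / 18)        # both about 6.0
```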


10.85
a) Observing X determines Y, namely, Y = g(X). So, clearly, g(X) is the best predictor of Y, based on observing X.
b) By the prediction theorem, the best possible predictor of Y, based on observing X, is E(Y | X). Using the assumption that Y = g(X) and applying Equation (10.49) on page 602, with k(x) = g(x) and ℓ(x, y) = 1, we get

E(Y | X) = E(g(X) | X) = E(g(X) · 1 | X) = g(X) E(1 | X) = g(X) · 1 = g(X).

Hence, the best predictor of Y, based on observing X, is g(X).

10.86 From the prediction theorem (Proposition 10.8 on page 602) and the law of total expectation (Proposition 10.6 on page 599), we find that the minimum mean square error is

E((Y − E(Y | X))²) = E(E((Y − E(Y | X))² | X)) = E(Var(Y | X)).

10.87 Let X denote the division point of the interval (0, 1) and let L denote the length of the subinterval that contains c. Note that X ∼ U(0, 1).
a) We have

E(L | X = x) = 1 − x, if 0 < x < c;  x, if c ≤ x < 1;  0, otherwise.

Hence, by the law of total expectation,

E(L) = ∫_{−∞}^{∞} E(L | X = x) f_X(x) dx = ∫_0^c (1 − x) · 1 dx + ∫_c^1 x · 1 dx = 1/2 + c − c².

b) Referring to part (a), we want to maximize the function f(c) = 1/2 + c − c² for 0 < c < 1. Differentiating f and setting the result equal to 0, we find that 1/2 is the value of c that maximizes the expected length of the interval that contains c.

10.88 Let Y denote the number of defects per yard. We know that Y|Λ=λ ∼ P(λ). Therefore,

E(Y | Λ = λ) = λ  and  Var(Y | Λ = λ) = λ.

a) In this case, Λ ∼ U(0, 3), so that

E(Λ) = (0 + 3)/2 = 1.5  and  Var(Λ) = (3 − 0)²/12 = 0.75.

Hence, by the laws of total expectation and total variance,

E(Y) = E(E(Y | Λ)) = E(Λ) = 1.5

and

Var(Y) = E(Var(Y | Λ)) + Var(E(Y | Λ)) = E(Λ) + Var(Λ) = 1.5 + 0.75 = 2.25.

b) In this case, Λ ∼ T(0, 3), so that

E(Λ) = (0 + 3)/2 = 1.5  and  Var(Λ) = (3 − 0)²/24 = 0.375.

Hence, by the laws of total expectation and total variance,

E(Y) = E(E(Y | Λ)) = E(Λ) = 1.5

and

Var(Y) = E(Var(Y | Λ)) + Var(E(Y | Λ)) = E(Λ) + Var(Λ) = 1.5 + 0.375 = 1.875.
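Part (a) of Exercise 10.88 can be checked by simulation; the sketch below assumes NumPy and uses an arbitrary seed and sample size.

```python
# Mixed Poisson check: Lambda ~ U(0, 3), Y | Lambda ~ Poisson(Lambda),
# so E(Y) = 1.5 and Var(Y) = E(Lambda) + Var(Lambda) = 2.25.
import numpy as np

rng = np.random.default_rng(4)
lam = rng.uniform(0.0, 3.0, size=1_000_000)
y = rng.poisson(lam)
print(y.mean(), y.var())                     # about 1.5 and 2.25
```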

10.89 Let Y denote the lifetime of the selected device. Let X = k if the device is of Type k. For k = 1, 2, we have E(Y | X = k) = µ_k and Var(Y | X = k) = σ_k². Thus,

E(Y | X) = (2 − X)µ₁ + (X − 1)µ₂ = 2µ₁ − µ₂ + (µ₂ − µ₁)X

and

Var(Y | X) = (2 − X)σ₁² + (X − 1)σ₂² = 2σ₁² − σ₂² + (σ₂² − σ₁²)X.

Moreover, observing that we can write X = 1 + W, where W ∼ B(1, 1 − p), we see that

E(X) = 1 + E(W) = 1 + (1 − p) = 2 − p  and  Var(X) = Var(W) = p(1 − p).

Hence, by the laws of total expectation and total variance,

E(Y) = E(E(Y | X)) = E(2µ₁ − µ₂ + (µ₂ − µ₁)X) = 2µ₁ − µ₂ + (µ₂ − µ₁) E(X)
 = 2µ₁ − µ₂ + (µ₂ − µ₁)(2 − p) = pµ₁ + (1 − p)µ₂

and

Var(Y) = E(Var(Y | X)) + Var(E(Y | X))
 = E(2σ₁² − σ₂² + (σ₂² − σ₁²)X) + Var(2µ₁ − µ₂ + (µ₂ − µ₁)X)
 = 2σ₁² − σ₂² + (σ₂² − σ₁²) E(X) + (µ₂ − µ₁)² Var(X)
 = 2σ₁² − σ₂² + (σ₂² − σ₁²)(2 − p) + (µ₂ − µ₁)² p(1 − p)
 = pσ₁² + (1 − p)σ₂² + p(1 − p)(µ₂ − µ₁)².
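The mixture formulas for E(Y) and Var(Y) can be verified numerically. The sketch below assumes NumPy; the values p, µ₁, µ₂, σ₁, σ₂ are arbitrary, and the component lifetimes are drawn as normals purely for illustration, since only the means and variances of the two types enter the formulas.

```python
# Two-component mixture check for Exercise 10.89.
import numpy as np

p, mu1, mu2, s1, s2 = 0.3, 10.0, 20.0, 2.0, 5.0   # illustrative values
rng = np.random.default_rng(5)
n = 1_000_000
is_type1 = rng.random(n) < p
y = np.where(is_type1, rng.normal(mu1, s1, n), rng.normal(mu2, s2, n))
print(y.mean(), p * mu1 + (1 - p) * mu2)
print(y.var(), p * s1**2 + (1 - p) * s2**2 + p * (1 - p) * (mu2 - mu1)**2)
```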

10.90
a) Let Y denote the total claim amount. We can write Y = Σ_{k=1}^N Y_k, where N is the number of claims and Y₁, Y₂, . . . are independent Beta(1, 2) random variables. Note that N ∼ B(32, 1/6) and that E(Y_j) = 1/3 and Var(Y_j) = 1/18. From Examples 7.26 and 7.29 on pages 383 and 387, respectively,

E(Y) = (1/3) E(N) = (1/3) · 32 · (1/6) = 16/9

and

Var(Y) = (1/18) E(N) + (1/3)² Var(N) = (1/18) · 32 · (1/6) + (1/9) · 32 · (1/6) · (5/6) = 8/27 + 40/81 = 64/81.

b) Let Y denote the total claim amount. We can write Y = Σ_{k=1}^{32} Z_k, where Z_k denotes the claim amount for risk k. Let X_k = 1 if risk k makes a claim and let X_k = 0 otherwise. Note that each X_k ∼ B(1, 1/6) and, therefore, E(X_k) = 1/6 and Var(X_k) = 5/36. Now, given X_k = 1, Z_k ∼ Beta(1, 2), whereas, given X_k = 0, Z_k = 0. Thus,

E(Z_k | X_k = x) = 1/3 if x = 1, and 0 otherwise;  Var(Z_k | X_k = x) = 1/18 if x = 1, and 0 otherwise.

Consequently,

E(Z_k | X_k) = (1/3) X_k  and  Var(Z_k | X_k) = (1/18) X_k.

Applying the laws of total expectation and total variance, we get

E(Z_k) = E(E(Z_k | X_k)) = E((1/3) X_k) = (1/3) E(X_k) = (1/3) · (1/6) = 1/18

and

Var(Z_k) = E(Var(Z_k | X_k)) + Var(E(Z_k | X_k)) = E((1/18) X_k) + Var((1/3) X_k)
 = (1/18) E(X_k) + (1/9) Var(X_k) = (1/18) · (1/6) + (1/9) · (5/36) = 2/81.

Therefore,

E(Y) = E(Σ_{k=1}^{32} Z_k) = Σ_{k=1}^{32} E(Z_k) = 32 · (1/18) = 16/9

and, because the Z_k s are independent,

Var(Y) = Var(Σ_{k=1}^{32} Z_k) = Σ_{k=1}^{32} Var(Z_k) = 32 · (2/81) = 64/81.
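The compound-claim model of Exercise 10.90 is also easy to simulate. The sketch below assumes NumPy; the seed and number of portfolio replications are arbitrary, and each risk's claim is built as an independent Bernoulli(1/6) indicator times an independent Beta(1, 2) amount, which reproduces the conditional distributions used in part (b).

```python
# Simulate Y = sum of 32 risks, each claiming Beta(1,2) with probability 1/6.
# Theory: E(Y) = 16/9 and Var(Y) = 64/81.
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
claims = rng.binomial(1, 1 / 6, size=(n, 32)) * rng.beta(1, 2, size=(n, 32))
y = claims.sum(axis=1)
print(y.mean(), 16 / 9)                      # about 1.778
print(y.var(), 64 / 81)                      # about 0.790
```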

10.91 We use units of thousands of dollars. Let Y denote the claim payment and let X denote the amount of damage if there is partial damage to the car. By assumption, f is a PDF of X. Also, let W = 0 if there is no damage, W = 1 if there is partial damage, and W = 2 if there is a total loss. Now, we have Y|W=0 = 0, Y|W=1 = max{0, X − 1}, and Y|W=2 = 14. Applying the FEF for continuous random variables yields

E(Y | W = 1) = E(max{0, X − 1}) = ∫_{−∞}^{∞} max{0, x − 1} f(x) dx
 = ∫_0^1 0 · f(x) dx + ∫_1^{15} (x − 1) f(x) dx = 0.5003 ∫_1^{15} (x − 1) e^{−x/2} dx = 1.2049.

Therefore, by the law of total expectation and the FEF for discrete random variables,

E(Y) = E(E(Y | W)) = Σ_w E(Y | W = w) p_W(w) = 0 · p_W(0) + 1.2049 · p_W(1) + 14 · p_W(2)
 = 1.2049 · 0.04 + 14 · 0.02 = 0.328196.

The expected claim payment is $328.20.


Theory Exercises
10.92
a) From the bivariate form of the FEF and the general multiplication rule for continuous random variables,

E(E(Z | X, Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} E(Z | X = x, Y = y) f_{X,Y}(x, y) dx dy
 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (∫_{−∞}^{∞} z f_{Z|X,Y}(z | x, y) dz) f_{X,Y}(x, y) dx dy
 = ∫_{−∞}^{∞} z (∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) f_{Z|X,Y}(z | x, y) dx dy) dz
 = ∫_{−∞}^{∞} z (∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y,Z}(x, y, z) dx dy) dz
 = ∫_{−∞}^{∞} z f_Z(z) dz = E(Z).

b) From the multivariate form of the FEF and the general multiplication rule for continuous random variables,

E(E(X_m | X₁, . . . , X_{m−1}))
 = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} E(X_m | X₁ = x₁, . . . , X_{m−1} = x_{m−1}) f_{X₁,...,X_{m−1}}(x₁, . . . , x_{m−1}) dx₁ · · · dx_{m−1}
 = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} (∫_{−∞}^{∞} x_m f_{X_m|X₁,...,X_{m−1}}(x_m | x₁, . . . , x_{m−1}) dx_m) f_{X₁,...,X_{m−1}}(x₁, . . . , x_{m−1}) dx₁ · · · dx_{m−1}
 = ∫_{−∞}^{∞} x_m (∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f_{X₁,...,X_{m−1}}(x₁, . . . , x_{m−1}) f_{X_m|X₁,...,X_{m−1}}(x_m | x₁, . . . , x_{m−1}) dx₁ · · · dx_{m−1}) dx_m
 = ∫_{−∞}^{∞} x_m (∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f_{X₁,...,X_m}(x₁, . . . , x_m) dx₁ · · · dx_{m−1}) dx_m
 = ∫_{−∞}^{∞} x_m f_{X_m}(x_m) dx_m = E(X_m).

10.93
a) Let X = Σ_n n I_{A_n}. Observe that {X = n} = A_n for all n ∈ N. We see that X is a discrete random variable with PMF given by p_X(n) = P(A_n) for n ∈ N, and p_X(x) = 0 otherwise. Applying the law of total expectation and the FEF for discrete random variables, we get

E(Y) = E(E(Y | X)) = Σ_x E(Y | X = x) p_X(x) = Σ_n E(Y | X = n) P(X = n) = Σ_n E(Y | A_n) P(A_n).

b) Let B be an event and set Y = I_B. Applying part (a) yields

P(B) = E(I_B) = E(Y) = Σ_n E(Y | A_n) P(A_n) = Σ_n E(I_B | A_n) P(A_n) = Σ_n P(B | A_n) P(A_n),

which is the law of total probability.

Advanced Exercises
10.94 Let Y denote the lifetime of the component and let E be the event that the component fails immediately. Note that E(Y | E) = 0. Applying Exercise 10.93(a) with the partition E and Eᶜ, we obtain

E(Y) = E(Y | E) P(E) + E(Y | Eᶜ) P(Eᶜ) = 0 · p + E(Y | Eᶜ)(1 − p) = (1 − p) E(Y | Eᶜ).

By assumption, the conditional distribution of Y given Eᶜ has CDF F. Consequently, by Proposition 10.5 on page 577, we have E(Y | Eᶜ) = ∫_0^∞ (1 − F(y)) dy. Therefore,

E(Y) = (1 − p) ∫_0^∞ (1 − F(y)) dy.

Note: We could also solve this problem with the law of total expectation by conditioning on the random variable X = IE .

10.95
a) Let G(a, b) = E((Y − (a + bX))²). We want to find real numbers a and b that minimize G. For convenience, set ρ = ρ(X, Y). Now,

∂G/∂a (a, b) = E(2(Y − (a + bX)) · (−1)) = −2(E(Y) − a − b E(X)),
∂G/∂b (a, b) = E(2(Y − (a + bX)) · (−X)) = −2(E(XY) − a E(X) − b E(X²)).

Setting these two partial derivatives equal to 0 and solving for a and b, we find that

a = µ_Y − ρ (σ_Y/σ_X) µ_X  and  b = ρ σ_Y/σ_X.

Thus, the best linear predictor of Y based on observing X is

(µ_Y − ρ (σ_Y/σ_X) µ_X) + ρ (σ_Y/σ_X) X = µ_Y + ρ (σ_Y/σ_X)(X − µ_X).

b) Referring to part (a), we have, for the optimal a and b,

(Y − (a + bX))² = (Y − (µ_Y + ρ (σ_Y/σ_X)(X − µ_X)))² = ((Y − µ_Y) − ρ (σ_Y/σ_X)(X − µ_X))²
 = (Y − µ_Y)² − 2ρ (σ_Y/σ_X)(X − µ_X)(Y − µ_Y) + ρ² (σ_Y²/σ_X²)(X − µ_X)².

Therefore, the minimum mean square error for linear prediction is

E((Y − µ_Y)² − 2ρ (σ_Y/σ_X)(X − µ_X)(Y − µ_Y) + ρ² (σ_Y²/σ_X²)(X − µ_X)²)
 = Var(Y) − 2ρ (σ_Y/σ_X) Cov(X, Y) + ρ² (σ_Y²/σ_X²) Var(X)
 = σ_Y² − 2ρ (σ_Y/σ_X) ρσ_Xσ_Y + ρ² (σ_Y²/σ_X²) σ_X² = (1 − ρ²) σ_Y².
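The best linear predictor and its mean square error are easy to compute from data. The sketch below assumes NumPy; the simulated linear-plus-noise pair (X, Y) is an arbitrary illustration, not part of the exercise.

```python
# Best linear predictor: slope b = Cov(X, Y)/Var(X), intercept a = mu_Y - b*mu_X;
# its MSE should match (1 - rho^2) * Var(Y), as in Exercise 10.95(b).
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(0.0, 2.0, size=200_000)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)   # illustrative model

b = np.cov(x, y, bias=True)[0, 1] / x.var()
a = y.mean() - b * x.mean()
mse = np.mean((y - (a + b * x))**2)
rho = np.corrcoef(x, y)[0, 1]
print(a, b)                                  # about 1.0 and 0.5
print(mse, (1 - rho**2) * y.var())           # the two agree
```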

10.96
a) In Exercise 10.67(a), we found that ρ = ρ(X, Y) = 1/n. Applying now Exercise 10.95(a) and the equations referred to in the statement of this exercise, and noting that σ_X = σ_Y = √(n/((n + 2)(n + 1)²)), µ_X = 1/(n + 1), and µ_Y = n/(n + 1), we find that

a = µ_Y − ρ (σ_Y/σ_X) µ_X = n/(n + 1) − (1/n) · 1/(n + 1) = (n² − 1)/(n(n + 1)) = (n − 1)/n

and

b = ρ σ_Y/σ_X = 1/n.

Thus, the best linear predictor of Y, based on observing X, is

a + bx = (n − 1)/n + x/n = 1 − (1 − x)/n.

b) The two predictors are the same; that is, the best linear predictor is also the best predictor.
c) The two predictors are identical because the best predictor is linear and, hence, in this case, the best predictor is also the best linear predictor.
d) As we have seen, in this case, the best linear predictor is also the best predictor. Thus, the minimum mean square error for linear prediction, which, by Exercise 10.95(b), is (1 − ρ²)σ_Y², must also be the minimum mean square error for prediction, which, by Exercise 10.86, is E(Var(Y | X)). To verify explicitly that these two quantities are equal, we proceed as follows. In Example 10.19 on page 600, we found that

E(Var(Y | X)) = (n − 1)/(n(n + 1)(n + 2)).

Furthermore,

(1 − ρ²)σ_Y² = (1 − 1/n²) · n/((n + 2)(n + 1)²) = ((n² − 1)/n²) · n/((n + 2)(n + 1)²) = (n − 1)/(n(n + 1)(n + 2)).

10.97
a) We have

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = ∫_0^1 (2 − (6/5)x − (4/5)y) dy = 8/5 − (6/5)x

if 0 < x < 1, and f_X(x) = 0 otherwise. Hence, for 0 < x < 1,

f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x) = (2 − (6/5)x − (4/5)y)/((8/5) − (6/5)x) = (5 − 3x − 2y)/(4 − 3x)

if 0 < y < 1, and f_{Y|X}(y | x) = 0 otherwise. Consequently,

E(Y | X = x) = ∫_{−∞}^{∞} y f_{Y|X}(y | x) dy = (1/(4 − 3x)) ∫_0^1 y(5 − 3x − 2y) dy
 = (1/(4 − 3x)) ∫_0^1 (5y − 3xy − 2y²) dy = (1/(4 − 3x))(11/6 − (3/2)x) = (11 − 9x)/(6(4 − 3x))

if 0 < x < 1, and E(Y | X = x) = 0 otherwise. We can now conclude from the prediction theorem that, if an employee took X hundred hours of sick leave last year, the best prediction for the number of hundreds of hours, Y, that he will take this year is

E(Y | X) = (11 − 9X)/(6(4 − 3X)).

b) We see from Exercise 10.95 that to find the best linear predictor of Y, based on observing X, we must determine µ_X, µ_Y, σ_X, σ_Y, and ρ = ρ(X, Y). From the FEF,

E(X) = ∫∫ x f_{X,Y}(x, y) dx dy = ∫_0^1 x (∫_0^1 (2 − (6/5)x − (4/5)y) dy) dx = ∫_0^1 ((8/5)x − (6/5)x²) dx = 2/5

and

E(Y) = ∫∫ y f_{X,Y}(x, y) dx dy = ∫_0^1 y (∫_0^1 (2 − (6/5)x − (4/5)y) dx) dy = ∫_0^1 ((7/5)y − (4/5)y²) dy = 13/30

and

E(X²) = ∫∫ x² f_{X,Y}(x, y) dx dy = ∫_0^1 ((8/5)x² − (6/5)x³) dx = 7/30

and

E(Y²) = ∫∫ y² f_{X,Y}(x, y) dx dy = ∫_0^1 ((7/5)y² − (4/5)y³) dy = 4/15

and

E(XY) = ∫∫ xy f_{X,Y}(x, y) dx dy = ∫_0^1 x (∫_0^1 y(2 − (6/5)x − (4/5)y) dy) dx = ∫_0^1 ((11/15)x − (3/5)x²) dx = 1/6.

It now follows that µ_X = 2/5, µ_Y = 13/30, σ_X² = 11/150, σ_Y² = 71/900, and Cov(X, Y) = −1/150. Therefore, from Exercise 10.95(a), the best linear predictor of the number of hundreds of hours, Y, an employee will take this year if he took X hundred hours last year is

µ_Y + ρ (σ_Y/σ_X)(X − µ_X) = 13/30 − (1/11)(X − 2/5) = 31/66 − (1/11)X.

c) The results of parts (a) and (b) are different because the best predictor is not linear.
d) Using the best predictor and the best linear predictor, the estimates for the number of hundreds of hours of sick leave an employee will take this year if he took 10 hours (1/10 hundred hours) last year are

(11 − 9 · (1/10))/(6(4 − 3 · (1/10))) = 101/222 ≈ 0.455  and  31/66 − (1/11)(1/10) = 76/165 ≈ 0.461,

respectively. In other words, for an employee who took 10 hours of sick leave last year, the best prediction for the number of hours he will take this year is about 45.5 hours and the best linear prediction is about 46.1 hours.

10.98
a) Assuming finite expectations, we define

E(Y | X = x) = Σ_y y p_{Y|X}(y | x)

if f_X(x) > 0, and E(Y | X = x) = 0 otherwise; and we define

E(X | Y = y) = ∫_{−∞}^{∞} x f_{X|Y}(x | y) dx

if p_Y(y) > 0, and E(X | Y = y) = 0 otherwise.
b) From the FEF for continuous random variables and part (a),

E(E(Y | X)) = ∫_{−∞}^{∞} E(Y | X = x) f_X(x) dx = ∫_{−∞}^{∞} (Σ_y y p_{Y|X}(y | x)) f_X(x) dx
 = Σ_y y ∫_{−∞}^{∞} (h_{X,Y}(x, y)/f_X(x)) f_X(x) dx = Σ_y y (∫_{−∞}^{∞} h_{X,Y}(x, y) dx) = Σ_y y p_Y(y) = E(Y).

From the FEF for discrete random variables and part (a),

E(E(X | Y)) = Σ_y E(X | Y = y) p_Y(y) = Σ_y (∫_{−∞}^{∞} x f_{X|Y}(x | y) dx) p_Y(y)
 = Σ_y (∫_{−∞}^{∞} x (h_{X,Y}(x, y)/p_Y(y)) dx) p_Y(y) = ∫_{−∞}^{∞} x (Σ_y h_{X,Y}(x, y)) dx = ∫_{−∞}^{∞} x f_X(x) dx = E(X).

10.99
a) In Exercise 9.86(b), we showed that Y|X=x ∼ P(x) for x > 0 and that X|Y=y ∼ Γ(y + 1, 1 + λ) for y = 0, 1, . . . . Consequently, we have E(Y | X = x) = x if x > 0, and equals 0 otherwise; and E(X | Y = y) = (y + 1)/(1 + λ) if y = 0, 1, . . . , and equals 0 otherwise.
b) In Exercise 9.86(a), we showed that X ∼ E(λ). Hence, from part (a) and the law of total expectation,

E(Y) = E(E(Y | X)) = E(X) = 1/λ.

We also showed in Exercise 9.86(a) that Y has the Pascal distribution, as defined in Exercise 7.26 on page 336, with parameter 1/λ. From that exercise, E(Y) = 1/λ. Hence, from part (a) and the law of total expectation,

E(X) = E(E(X | Y)) = E((Y + 1)/(1 + λ)) = (E(Y) + 1)/(1 + λ) = (1/λ + 1)/(1 + λ) = 1/λ.

10.100
a) By assumption, Y|X=x ∼ P(x) for x > 0. Hence, E(Y | X = x) = x if x > 0, and equals 0 otherwise.
b) From the law of total expectation and part (a),

E(Y) = E(E(Y | X)) = E(X) = α/β.

c) In Exercise 9.87(b), we found that Y has the Pascal distribution, as given by Equation (5.50) on page 243, with parameters α and β/(β + 1). For convenience, set p = β/(β + 1). Referring to the binomial series, Equation (5.45) on page 241, we get

E(Y) = Σ_{y=0}^{∞} y p_Y(y) = Σ_{y=0}^{∞} y C(−α, y) p^α (p − 1)^y = p^α (p − 1) Σ_{y=0}^{∞} y C(−α, y)(p − 1)^{y−1}
 = p^α (p − 1) d/dp [Σ_{y=0}^{∞} C(−α, y)(p − 1)^y] = p^α (p − 1) d/dp (1 + (p − 1))^{−α}
 = p^α (p − 1) d/dp p^{−α} = α p^α (1 − p) p^{−α−1} = α(1/p − 1) = α/β.

d) In Exercise 9.87(c), we found that X|Y=y ∼ Γ(y + α, β + 1) for y = 0, 1, . . . . Hence, we see that E(X | Y = y) = (y + α)/(β + 1) if y = 0, 1, . . . , and equals 0 otherwise.
e) From the law of total expectation and part (d),

E(X) = E(E(X | Y)) = E((Y + α)/(β + 1)) = (E(Y) + α)/(β + 1) = (α/β + α)/(β + 1) = α/β.
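The gamma-mixed Poisson setup of Exercise 10.100 can be checked by simulation; the sketch below assumes NumPy, and the values α = 3, β = 2 and the seed are arbitrary illustrative choices.

```python
# X ~ Gamma(alpha, beta) (rate beta) and Y | X = x ~ Poisson(x), so E(Y) = E(X) = alpha/beta.
import numpy as np

alpha, beta = 3.0, 2.0                       # illustrative parameters
rng = np.random.default_rng(8)
x = rng.gamma(shape=alpha, scale=1 / beta, size=1_000_000)
y = rng.poisson(x)
print(x.mean(), y.mean(), alpha / beta)      # all about 1.5
```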


10.5 The Bivariate Normal Distribution

Basic Exercises
10.101 We have µ_X = −2, σ_X² = 4, µ_Y = 3, σ_Y² = 2, and ρ = −0.7. Moreover,

Cov(X, Y) = ρσ_Xσ_Y = (−0.7) · 2 · √2 = −1.4√2.

From the linearity property of expected value,

µ_W = E(W) = E(−2X + 4Y − 12) = −2 E(X) + 4 E(Y) − 12 = −2 · (−2) + 4 · 3 − 12 = 4.

Also, from properties of variance and covariance,

σ_W² = Var(W) = Var(−2X + 4Y − 12) = Var(−2X + 4Y) = Var(−2X) + Var(4Y) + 2 Cov(−2X, 4Y)
 = 4 Var(X) + 16 Var(Y) − 16 Cov(X, Y) = 4 · 4 + 16 · 2 − 16 · (−1.4√2) = 48 + 22.4√2.

Hence, σ_W = √(48 + 22.4√2) ≈ 8.93.
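The arithmetic above is simple enough to reproduce directly; the short sketch below uses only the Python standard library.

```python
# Mean and standard deviation of W = -2X + 4Y - 12 in Exercise 10.101.
import math

mu_x, var_x, mu_y, var_y, rho = -2.0, 4.0, 3.0, 2.0, -0.7
cov_xy = rho * math.sqrt(var_x) * math.sqrt(var_y)

mu_w = -2 * mu_x + 4 * mu_y - 12
var_w = 4 * var_x + 16 * var_y + 2 * (-2) * 4 * cov_xy
print(mu_w, math.sqrt(var_w))                # 4.0 and about 8.93
```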

10.102
a) Because f₁ and f₂ are nonnegative functions, so is f. Moreover,

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (f₁(x, y) + f₂(x, y))/2 dx dy
 = (1/2)(∫_{−∞}^{∞} ∫_{−∞}^{∞} f₁(x, y) dx dy + ∫_{−∞}^{∞} ∫_{−∞}^{∞} f₂(x, y) dx dy) = (1/2)(1 + 1) = 1.

b) From Proposition 10.10 on page 616, we know that the marginals of f₁ are both N(0, 1), as are those of f₂. Then

∫_{−∞}^{∞} f(x, y) dy = ∫_{−∞}^{∞} (f₁(x, y) + f₂(x, y))/2 dy = (1/2)(∫_{−∞}^{∞} f₁(x, y) dy + ∫_{−∞}^{∞} f₂(x, y) dy)
 = (1/2)((1/√(2π)) e^{−x²/2} + (1/√(2π)) e^{−x²/2}) = (1/√(2π)) e^{−x²/2}.

Similarly, we find that

∫_{−∞}^{∞} f(x, y) dx = (1/√(2π)) e^{−y²/2}.

Hence, both marginals of f are standard normal PDFs.

10.103
a) The level curves of the joint PDF of X and Y are obtained by setting f_{X,Y}(x, y) equal to a positive constant or, equivalently, they are the curves of the form Q(x, y) − k = 0, where k is a constant. This second-order equation can be written in the form

(1/σ_X²) x² − (2ρ/(σ_Xσ_Y)) xy + (1/σ_Y²) y² + ax + by + c = 0,

where a, b, and c are constants. The discriminant of this equation is

Δ = (2ρ/(σ_Xσ_Y))² − 4 · (1/σ_X²) · (1/σ_Y²) = (4/(σ_X²σ_Y²))(ρ² − 1).

Because −1 < ρ < 1, we see that Δ < 0 and, consequently, we know from analytic geometry that the level curves are ellipses (or circles).
b) From analytic geometry, the level-curve ellipses are parallel to the coordinate axes if and only if the coefficient of the xy-term is 0, that is, if and only if ρ = 0. In view of Proposition 10.9 on page 613, this latter condition is satisfied if and only if X and Y are independent random variables.

10.104 We have µ_X = 120, σ_X = 20, µ_Y = 82, σ_Y = 15, and ρ = 0.6.
a) From the previous display, we find that

2σ_Xσ_Y √(1 − ρ²) = 2 · 20 · 15 √(1 − 0.6²) = 480,  2(1 − ρ²) = 2(1 − 0.6²) = 2(0.64) = 32/25,  2ρ = 2 · 0.6 = 6/5.

Therefore, from Definition 10.4 on page 610,

f_{X,Y}(x, y) = (1/(480π)) exp{−(25/32)[((x − 120)/20)² − (6/5)((x − 120)/20)((y − 82)/15) + ((y − 82)/15)²]}.

b) From Proposition 10.10(a) on page 616, X ∼ N(120, 400). Hence, the systolic blood pressures for this population are normally distributed with mean 120 and standard deviation 20.
c) From Proposition 10.10(b), Y ∼ N(82, 225). Hence, the diastolic blood pressures for this population are normally distributed with mean 82 and standard deviation 15.
d) From Proposition 10.10(c), ρ(X, Y) = ρ = 0.6. Hence, the correlation between the systolic and diastolic blood pressures for this population is 0.6. We see that there is a moderate positive correlation between the two types of blood pressures. In particular, people with high systolic blood pressures tend to have higher diastolic blood pressures than people with low systolic blood pressures.
e) We have

µ_Y + ρ (σ_Y/σ_X)(x − µ_X) = 82 + 0.6 · (15/20)(x − 120) = 28 + 0.45x  and  σ_Y²(1 − ρ²) = 225(1 − 0.6²) = 144.

Consequently, from Proposition 10.10(d), Y|X=x ∼ N(28 + 0.45x, 144). The diastolic blood pressures for people in the population who have a systolic blood pressure of x are normally distributed with mean 28 + 0.45x and standard deviation 12.
f) From part (e), we see that, in this case, the regression equation is y = 28 + 0.45x. Hence, the predicted diastolic blood pressure for an individual in this population whose systolic blood pressure is 123 equals 28 + 0.45 · 123, or 83.35.
g) We have

µ_X + ρ (σ_X/σ_Y)(y − µ_Y) = 120 + 0.6 · (20/15)(y − 82) = 54.4 + 0.8y  and  σ_X²(1 − ρ²) = 400(1 − 0.6²) = 256.

Thus, from Proposition 10.10(e), X|Y=y ∼ N(54.4 + 0.8y, 256). The systolic blood pressures for people in the population who have a diastolic blood pressure of y are normally distributed with mean 54.4 + 0.8y and standard deviation 16. For a diastolic blood pressure of 86, we have 54.4 + 0.8 · 86 = 123.2. Consequently,

P(X > 130 | Y = 86) = 1 − Φ((130 − 123.2)/16) = 1 − Φ(0.425) = 0.335.
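The conditional-distribution calculations in parts (e)-(g) are easy to reproduce numerically; the sketch below uses only the Python standard library, evaluating the standard normal CDF with math.erf.

```python
# Regression predictions and P(X > 130 | Y = 86) for Exercise 10.104.
import math

def phi(z):  # standard normal CDF
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu_x, s_x, mu_y, s_y, rho = 120.0, 20.0, 82.0, 15.0, 0.6

mean_y_given_x = lambda x: mu_y + rho * (s_y / s_x) * (x - mu_x)
sd_y_given_x = s_y * math.sqrt(1 - rho**2)
print(mean_y_given_x(123), sd_y_given_x)                 # 83.35 and 12.0

mean_x_given_y = mu_x + rho * (s_x / s_y) * (86 - mu_y)  # 123.2
sd_x_given_y = s_x * math.sqrt(1 - rho**2)               # 16.0
print(1 - phi((130 - mean_x_given_y) / sd_x_given_y))    # about 0.335
```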

σX = 30.7,

µY = 2.26,

σY = 1.18,

ρ = 0.732.

a) From the previous display, we find that < # 2σX σY 1 − ρ 2 = 2 · 30.7 · 1.18 1 − 0.7322 = 49.362, ( ) ( 2 1 − ρ 2 = 2 1 − 0.7322 ) = 0.928, 2ρ = 2 · 0.732 = 1.464.

Therefore, from Definition 10.4 on page 610,

!*

−1.077 1 e fX,Y (x, y) = 49.362π

x−49.8 30.7

+2

−1.464

*

x−49.8 30.7

+*

y−2.26 1.18

+ * +2 > + y−2.26 1.18

b) From Proposition 10.10(a) on page 616, X ∼ N (49.8, 30.72 ). Nitrate concentration is normally distributed with mean 49.8 and standard deviation 30.7 . c) From Proposition 10.10(b), Y ∼ N (2.26, 1.182 ). Arsenic concentration is normally distributed with mean 2.26 and standard deviation 1.18. d) From Proposition 10.10(c), ρ(X, Y ) = ρ = 0.732. Hence, the correlation between nitrate concentration and arsenic concentration is 0.732. A relatively high positive correlation exists between nitrate concentration and arsenic concentration. e) We have µY + ρ and

σY 1.18 (x − µX ) = 2.26 + 0.732 · (x − 49.8) = 0.859 + 0.028x σX 30.7 ( ) ( ) σY2 1 − ρ 2 = 1.182 1 − 0.7322 = 0.646.

So, from Proposition 10.10(d), Y|X=x ∼ N (0.859 + 0.028x, 0.646). Given a nitrate concentration of x, arsenic concentrations are normally distributed with mean 0.859 + 0.028x and standard deviation 0.804. f) From part (e), we see that, in this case, the regression equation is y = 0.859 + 0.028x, whose graph is a straight line with slope 0.028 and y-intercept 0.859. Hence, the predicted arsenic concentration for a nitrate concentration of x is 0.859 + 0.028x. g) From part (f), we see that the predicted arsenic concentration for a nitrate concentration of 60 is 0.859 + 0.028 · 60, or 2.539.

10-48

CHAPTER 10

Expected Value of Continuous Random Variables

10.106 a) From Example 10.22(b), the regression equation for predicting the height of an eldest son from the height of his mother is y = 34.9 + 0.537x. Hence, the predicted height of the eldest son of a mother who is 5′ 8′′ (i.e., 68 inches) tall is 34.9 + 0.537 · 68, or about 71.4 inches. b) We have µX + ρ

σX 2.7 (y − µY ) = 63.7 + 0.5 · (y − 69.1) ≈ 31.533 + 0.466y. 2.9 σY

Consequently, the predicted height of the mother of an eldest son who is 5′ 8′′ (i.e., 68 inches) tall is 31.533 + 0.466 · 68, or about 63.2 inches. ′ ′′ c) From Example 10.22, the heights of eldest sons of mothers who √ are 5 2 (i.e., 62 inches) tall are normally distributed with mean 68.2 inches and standard deviation 6.31 inches. Hence, & ' 72 − 68.2 P (Y > 72 | X = 62) = 1 − ) = 1 − )(1.51) = 0.065. √ 6.31 d) From Table I on page A-39, the probability that a normally distributed random variable takes a value within 1.96 standard deviations of its mean is 0.95. Hence, from Example 10.22, the required prediction √ interval is 68.2 ± 1.96 · 6.31, or (63.277′′ , 73.123′′ ).

10.107 a) We know that we can express X and Y in the form given in Equations (10.55), where Z1 and Z2 are independent standard normal random variables. Consequently, a linear combination of X and Y is also a linear combination of Z1 and Z2 plus a constant (which may be 0). Referring now to Proposition 9.14 on page 540, we conclude that any nonzero linear combination of X and Y is also normally distributed. b) We have E(X + Y ) = E(X) + E(Y ) = µX + µY and

Var(X + Y ) = Var(X) + Var(Y ) + 2 Cov(X, Y ) = σX2 + σY2 + 2ρσX σY . ( ) Hence, from part (a), X + Y ∼ N µX + µY , σX2 + σY2 + 2ρσX σY . Also, we have E(Y − X) = E(Y ) − E(X) = µY − µX

and

( ) ( ) ( ) Var(Y − X) = Var Y + (−1)X = Var(Y ) + Var (−1)X + 2 Cov Y, (−1)X

= Var(Y ) + (−1)2 Var(X) − 2 Cov(Y, X) = Var(X) + Var(Y ) − 2 Cov(X, Y )

= σX2 + σY2 − 2ρσX σY . ( ) Hence, from part (a), Y − X ∼ N µY − µX , σX2 + σY2 − 2ρσX σY .

10.108 From Example 10.21,

µX = 63.7,

σX = 2.7, µY = 69.1, σY = 2.9, ρ = 0.5. ( ) a) From Exercise 10.107(b), X + Y ∼ N µX + µY , σX2 + σY2 + 2ρσX σY , which, in view of Proposition 8.15 on page 468, implies that 4 5 X+Y µX + µY σX2 + σY2 + 2ρσX σY ∼N , = N (66.4, 5.883). 2 2 4

10.5

10-49

The Bivariate Normal Distribution

b) From Exercise 10.107(b), + * Y − X ∼ N µY − µX , σX2 + σY2 − 2ρσX σY = N (5.4, 7.870).

c) Here we want E(Y − X), which, by part (b), equals 5.4 inches. d) We have, in view of part (b),

&

0 − 5.4 P (Y > X) = P (Y − X > 0) = 1 − P (Y − X ≤ 0) = 1 − ) √ 7.870 = 1 − )(−1.925) = 0.973.

'

e) Here we want P (Y > 1.1X) = P (Y − 1.1X > 0). We have E(Y − 1.1X) = E(Y ) − 1.1E(X) = 69.1 − 1.1 · 63.7 = −0.97. and Var(Y − 1.1X) = Var(Y ) + (−1.1)2 Var(X) + 2 · (−1.1) Cov(X, Y ) = Var(Y ) + 1.21 Var(X) − 2.2ρσX σY = 8.618.

Thus, from Exercise 10.107(a), Y − 1.1X ∼ N (−0.97, 8.618). Consequently, P (Y > 1.1X) = P (Y − 1.1X > 0) = 1 − P (Y − 1.1X ≤ 0) & ' 0 − (−0.97) =1−) = 1 − )(0.330) = 0.371. √ 8.618 10.109 We have µX = 0, σX = 1, µY = 0, σY = 1. Hence, fX,Y (x, y) = From the FEF and symmetry, (

)

E max{X, Y } =

"



"



−∞ −∞

1

# e 2π 1 − ρ 2

( ) − ( 1 2 ) x 2 −2ρxy+y 2 2 1−ρ

max{x, y}fX,Y (x, y) dx dy = 2

""

.

yfX,Y (x, y) dx dy.

y>x

To evaluate the integral, we rotate the coordinate axes π/4 radians. That is, we set 1 1 x = √ u− √ v 2 2

and

1 1 y = √ u + √ v. 2 2

The Jacobian determinant of the inverse transformation is identically 1. Moreover, simple algebra shows that & 2 ' ( 2 ) 1 u v2 1 2 x − 2ρxy + y = + . 2 1+ρ 1−ρ 2(1 − ρ 2 )

10-50

CHAPTER 10

Expected Value of Continuous Random Variables

Therefore, (

)

E max{X, Y } = =

1

# π 1 − ρ2

"

0

1 # π 2(1 − ρ 2 )

∞ &" ∞

* 2 + ' u v2 1 − 21 1+ρ + 1−ρ du dv √ (u + v) e 2 −∞ &" ∞ ' " ∞ " ∞ v2 u2 u2 − 2(1−ρ) − 2(1+ρ) − 2(1+ρ) e ue du + v e du dv 0

"



−∞

* + # 0 + v 2π(1 + ρ) dv

−∞

1 # e π 2(1 − ρ 2 ) 0 " ∞ " ∞ # v2 1−ρ 1 − 2(1−ρ) ve dv = √ e−w dw = (1 − ρ)/π. =√ π(1 − ρ) 0 π(1 − ρ) 0 =

v2 − 2(1−ρ)

10.110 To simulate bivariate normal random variables with specified means, variances, and correlation given by µ1 and µ2 , σ12 and σ22 , and ρ, respectively, proceed as follows. First simulate independent standard normal random variables, Z1 and Z2 , by applying the Box-Müller transformation. Then set X = µ1 + σ1 Z1 Y = µ2 + ρσ2 Z1 +

<

1 − ρ 2 σ2 Z2 .

( ) Referring to Equations (10.54) and (10.55) on page 608, we see that (X, Y ) ∼ BVN µ1 , σ12 , µ2 , σ22 , ρ .

Theory Exercises 10.111 a) We have

µY = E(Y ) = E(β0 + β1 X + ϵ) = β0 + β1 E(X) + E(ϵ) = β0 + β1 E(X) + 0 = β0 + β1 µX . b) As X and ϵ are independent random variables, we have σY2 = Var(Y ) = Var(β0 + β1 X + ϵ) = β12 Var(X) + Var(ϵ) = β12 σX2 + σ 2 . c) As X and ϵ are independent random variables, we have Cov(X, Y ) = Cov(X, β0 + β1 X + ϵ) = β1 Cov(X, X) + Cov(X, ϵ) = β1 Var(X) + 0 = β1 σX2 . Consequently, β1 σ 2 σX Cov(X, Y ) = < X = β1 . ρ = ρ(X, Y ) = √ σY Var(X) Var(Y ) σX2 σY2

d) From parts (c) and (b),

4 5 2 ( ) σ σY2 1 − ρ 2 = σY2 1 − β12 X2 = σY2 − β12 σX2 = σ 2 . σY

e) From parts (a) and (c),

& ' σY µX . β0 = µY − β1 µX = µY − ρ σX

f) This result just re-expresses the one from part (c).

10.5

The Bivariate Normal Distribution

10-51

g) From Equation (10.49) on page 602 with k(x) = β0 + β1 x and ℓ(x, y) = 1, and from the independence of X and ϵ, we get E(Y | X) = E(β0 + β1 X + ϵ | X) = E(β0 + β1 X | X) + E(ϵ | X) = (β0 + β1 X)E(1 | X) + E(ϵ) = (β0 + β1 X) · 1 + 0 = β0 + β1 X.

Hence, E(Y | X = x) = β0 + β1 x. h) From Equation (10.49) on page 602, and from the independence of X and ϵ, we get ) ( ) ( ) ( E Y 2 | X = E (β0 + β1 X + ϵ)2 | X = E (β0 + β1 X)2 + 2(β0 + β1 X)ϵ + ϵ 2 | X ) ( ) ( ) ( = E (β0 + β1 X)2 | X + 2E (β0 + β1 X)ϵ | X + E ϵ 2 | X ( ) = (β0 + β1 X)2 + 2(β0 + β1 X)E(ϵ | X) + E ϵ 2 | X ( ) = (β0 + β1 X)2 + 2(β0 + β1 X)E(ϵ) + E ϵ 2 = (β0 + β1 X)2 + 2(β0 + β1 X) · 0 + σ 2

= (β0 + β1 X)2 + σ 2 . ) ( Hence, we see that E Y 2 | X = x = (β0 + β1 x)2 + σ 2 . Therefore, from part (g), ( ) ( )2 Var(Y | X = x) = E Y 2 | X = x − E(Y | X = x) = (β0 + β1 x)2 + σ 2 − (β0 + β1 x)2 = σ 2 .

10.112 a) As X and ϵ are independent normally distributed random variables and Y = β0 + β1 X + ϵ, it follows from Proposition 9.14 on page 540 that Y is normally distributed. b) From the argument leading up to Definition 10.4 on page 610, we know that two random variables, X and Y , that can be written in the form of Equations (10.55), where Z1 and Z2 are independent standard normal random variables, have a bivariate normal distribution. Set Z1 = (X − µX )/σX and Z2 = ϵ/σ . Then Z1 and Z2 are independent standard normal random variables. Now, we have X = µX + σX Z1 and Y = β0 + β1 X + ϵ = β0 + β1 (µX + σX Z1 ) + σ Z2 . Therefore, X = µX + σX Z1 + 0 · Z2 Y = (β0 + β1 µX ) + (β1 σX )Z1 + σ Z2 ,

Hence, X and Y have a bivariate normal distribution. c) From part (b) and Proposition 10.10(d) on page 616, we know that Y|X=x is normally distributed. Moreover, from parts (g)(and (h) of Exercise 10.111, E(Y | X = x) = β0 + β1 x and Var(Y | X = x) = σ 2 . ) Therefore, Y|X=x ∼ N β0 + β1 x, σ 2 .

Advanced Exercises

10.113 ,∞ ,∞ a) Clearly, f (x, y) ≥ 0 for all (x, y) ∈ R2 . Thus, it remains to show that −∞ −∞ f (x, y) dx dy = 1. Now, &" ∞ ' "" " 1 −c −x 2 /2 −y 2 /2 f (x, y) dx dy = e e dy dx π −∞ 0 A

1 = π

√ " −c " −c 2π 1 2 −x 2 /2 e dx = √ e−x /2 dx. 2 2π −∞ −∞

10-52

CHAPTER 10

Similarly,

"" C

Expected Value of Continuous Random Variables

1 f (x, y) dx dy = √ 2π

Also, ""

1 f (x, y) dx dy = π

B

Hence, " ∞"



−∞ −∞

c

−c

e

−x 2 /2

4"



e−x

2 /2

dx.

c

0

e −∞

−y 2 /2

5

dy dx

√ " c " c 2π 1 2 2 = e−x /2 dx = √ e−x /2 dx. 2π −c 2π −c

f (x, y) dx dy =

""

f (x, y) dx dy +

A

1 =√ 2π 1 =√ 2π b) We have

"

"

""

f (x, y) dx dy +

B

"

−c

−∞ " ∞ −∞

e

−x 2 /2

e−x

2 /2

1 dx + √ 2π

""

f (x, y) dx dy

C

"

c

−c

e

−x 2 /2

1 dx + √ 2π

"



e−x

2 /2

dx

c

dx = 1.

" ⎧ 1 −x 2 /2 ∞ −y 2 /2 ⎪ ⎪ e dy, ⎪ " ∞ ⎨ πe 0 fX (x) = f (x, y) dy = " 0 ⎪ 1 2 2 −∞ ⎪ ⎪ ⎩ e−x /2 e−y /2 dy, π −∞

if |x| > c; if |x| < c.

1 2 = √ e−x /2 . 2π

Thus, X ∼ N (0, 1). At this point, we must take c = )−1 (3/4). Then, " ⎧ 1 −y 2 /2 c −x 2 /2 ⎪ ⎪ e dx, if y < 0; e ⎪ " ∞ ⎨π −c f (x, y) dx = fY (y) = " " ⎪ 1 −y 2 /2 −c −x 2 /2 1 −y 2 /2 ∞ −x 2 /2 −∞ ⎪ ⎪ ⎩ e e dx + e e dx, if y > 0. π π −∞ c ⎧ √ " 2 2π −y 2 /2 c 1 −x 2 /2 ⎪ ⎪ dx, if y < 0; √ e ⎪ ⎨ π e 2π 0 = √ " ⎪ ⎪ 2 2π −y 2 /2 ∞ 1 −x 2 /2 ⎪ ⎩ dx, if y > 0. e √ e π 2π c ⎧ √ ) 2 2 −y 2 /2 ( ⎪ ⎪ )(c) − )(0) , if y < 0; ⎪ ⎨ √π e 1 2 = = √ e−y /2 . √ ⎪ 2π ) ⎪ 2 2 −y 2 /2 ( ⎪ ⎩ √ e 1 − )(c) if y > 0. π

Thus, Y ∼ N (0, 1).

10.5

c) Note that, for each y ∈ R, the function gy (x) = xe−x ""

xyf (x, y) dx dy =

"

=

"

A

∞ 0 ∞ 0

10-53

The Bivariate Normal Distribution

1 −y 2 /2 ye π

&"

−c −∞

2 /2

is odd. Therefore, '

gy (x) dx dy

& " ∞ ' "" 1 −y 2 /2 ye − gy (x) dx dy = − xyf (x, y) dx dy π c C

and ""

"

xyf (x, y) dx dy =

B

Hence, E(XY ) = =

"



"

0 −∞



1 −y 2 /2 ye π

A

=−

xyf (x, y) dx dy +

""

'

c −c

gy (x) dx dy =

"

0

−∞

1 −y 2 /2 ye · 0 dy = 0. π

xyf (x, y) dx dy

−∞ −∞

""

&"

""

xyf (x, y) dx dy +

B

xyf (x, y) dx dy + 0 +

C

""

""

xyf (x, y) dx dy

C

xyf (x, y) dx dy = 0.

C

Therefore, in view of part (b), Cov(X, Y ) = E(XY ) − E(X) E(Y ) = 0 − 0 · 0 = 0. Thus, X and Y are uncorrelated. d) We can show that X and Y aren’t independent in various ways. One way is to note that P (|X| < c, Y > 0) = 0 ̸=

) 1 1 ( 1 = · = 2)(c) − 1 · P (Y > 0) = P (|X| < c)P (Y > 0). 4 2 2

e) From part (c), X and Y are uncorrelated. If they were bivariate normal, then, by Proposition 10.9 on page 613, they would be independent, which, by part (d), they aren’t. Hence, X and Y are not bivariate normal random variables. 10.114 a) From Proposition 10.10 on page 616, Y ∼ N (0, 1). Therefore, P (X ≤ x, Y > 0) P (Y > 0) ,x ,∞ ' " x &" ∞ −∞ 0 fX,Y (s, y) ds dy = fX,Y (s, y) dy ds. =2 1/2 −∞ 0

FX | Y >0 (x) = P (X ≤ x | Y > 0) =

Therefore, from Proposition 8.5 on page 422, fX | Y >0 (x) = 2

"

0



fX,Y (x, y) dy.

10-54

CHAPTER 10

Expected Value of Continuous Random Variables

( ) b) From Proposition 10.10(e), X|Y =y ∼ N ρy, 1 − ρ 2 . We now get, by referring to part (a), applying the general multiplication rule, and making the substitution u = y 2 , ' " ∞ & " ∞ " ∞ xfX | Y >0 (x) dx = x 2 fX,Y (x, y) dy dx E(X | Y > 0) = −∞

=2

"

0

∞ &" ∞

2ρ =√ 2π

"

0

−∞ ∞

−∞

'

0

xfX | Y (x | y) dx fY (y) dy = 2

ye

−y 2 /2

" # dy = ρ 2/π

0



"



0

ρyfY (y) dy

# e−u du = ρ 2/π.

10.115 a) Recalling that the covariance of a random variable with itself equals the variance of the random variable, we see that the entries of ! provide the four possible covariances involving X and Y . For that reason, ! is called the covariance matrix of X and Y . b) The covariance matrix is symmetric because Cov(Y, X) = Cov(X, Y ). To show that it’s positive definite, we first note that . /. / . /. / Var(X) Cov(X, Y ) Cov(X, X) Cov(X, Y ) x x ′ = [x y ] x !x = [ x y ] y y Cov(Y, X) Var(Y ) Cov(Y, X) Cov(Y, Y ) = x 2 Cov(X, X) + yx Cov(Y, X) + xy Cov(X, Y ) + y 2 Cov(Y, Y )

= Cov(xX + yY, xX + yY ) = Var(xX + yY ) ≥ 0.

Thus, ! is nonnegative definite. Now suppose that x ̸= 0. Then xX + yY is a nonzero linear combination of X and Y and, hence, by Exercise 10.107(a), is normally distributed. In particular, then, we have P (xX + yY = c) = 0 for all real numbers c. It now follows from Proposition 7.7 on page 354 that x′ !x = Var(xX + yY ) > 0. Hence, ! is positive definite. c) We have ( )2 det ! = Var(X) Var(Y ) − Cov(X, Y ) Cov(Y, X) = σX2 σY2 − Cov(X, Y ) ( ) = σX2 σY2 − ρ 2 σX2 σY2 = 1 − ρ 2 σX2 σY2 . # Therefore, (det !)1/2 = σX σY 1 − ρ 2 . Now, for convenience, set σXY = Cov(X, Y ). We then have . 2 / . / −σXY σX σXY −1 σY2 1 −1 ! = = . det ! −σXY σXY σY2 σX2 Consequently, * + 1 ) 2 2 (x − µX )2 σY2 − 2(x − µX )(y − µY )σXY + (y − µY )2 σX2 (x − µ)′ !−1 (x − µ) = ( 1 − ρ 2 σX σY -& ' & '& ' & ' @ x − µX 2 x − µX y − µY y − µY 2 1 ) − 2ρ + =( σX σX σY σY 1 − ρ2 = Q(x, y).

Referring now to Definition 10.4 on page 610, we see that fX,Y (x, y) =

1 1 1 1 ′ −1 # e− 2 Q(x,y) = e− 2 (x−µ) ! (x−µ) . 1/2 2 2π(det ! ) 2πσX σY 1 − ρ

10-55

Review Exercises for Chapter 10

d) Let X be a normal random variable. Define the following 1 × 1 vectors and matrix: x = [x], µ = [µX ], and ! = [Var(X)]. Then ( )−1 1 1 2 2 − 1 (x−µX ) Var(X) (x−µX ) e−(x−µX ) /2σX = √ ( fX (x) = √ )1/2 e 2 2π σX 2π Var(X)

1 1 ′ −1 =√ e− 2 (x−µ) ! (x−µ) . 1/2 2π(det !) A ( )B e) Define ! = Cov Xi , Xj , an m × m matrix, and let ⎡µ ⎤ ⎡ ⎤ x1 X1 . and µ = ⎣ ... ⎦ . x = ⎣ .. ⎦ xm µXm

Then, in view of parts (c) and (d), an educated guess for the form of the joint PDF of an m-variate normal distribution is 1 1 ′ −1 e− 2 (x−µ) ! (x−µ) . fX1 ,...,Xm (x1 , . . . , xm ) = m/2 1/2 (2π) (det !)

Review Exercises for Chapter 10 Basic Exercises 10.116 Let T ∼ T (−1, 1). Then fT (t) = 1 − |t| if |t| < 1, and fT (t) = 0 otherwise. By symmetry, we have E(T ) = 0. Also, " 1 " 1 ( 2) 1 2 Var(T ) = E T = t fT (t) dt = 2 t 2 (1 − t) dt = . 6 −1 0

Now suppose that X ∼ T (a, b). From Exercise 8.125, we can write X = (a + b)/2 + (b − a)T /2, where T ∼ T (−1, 1). Therefore, & ' a+b b−a a+b b−a a+b b−a a+b E(X) = E + T = + E(T ) = + ·0= , 2 2 2 2 2 2 2 and

Var(X) = Var

&

a+b b−a + T 2 2

'

10.117 a) Let g(x) = βex . We have " " ∞ |g(x)|fX (x) dx = −∞

=

∞ 0

&

b−a 2

x

βe αe

'2

−αx

(b − a)2 1 (b − a)2 · = . 4 6 24

Var(T ) =

dx = αβ

"

0



e−(α−1)x dx.

Applying the FEF, we conclude that Y has finite expectation if and only if α > 1. b) In Exercise 8.128(b), we found that fY (y) = αβ α /y α+1 if y > β, and fY (y) = 0 otherwise. Hence, by the definition of expected value, " ∞ " ∞ " ∞ αβ α 1 αβ α . yfY (y) dy = y α+1 dy = αβ y −α dy = αβ α · = E(Y ) = α−1 α−1 y (α − 1)β −∞ β β

10-56

CHAPTER 10

c) From the FEF, " ( X) E(Y ) = E βe =



0

Expected Value of Continuous Random Variables

x

βe fX (x) dx =

"



x

βe αe

0

−αx

dx = αβ

"



0

e−(α−1)x dx =

αβ . α−1

d) Clearly, P (Y > y) = 1 if y ≤ β. For y > β, " ∞ " ∞ " ∞ αβ α αβ α βα α −(α+1) P (Y > y) = fY (t) dt = dt = αβ t dt = = . αy α yα t α+1 y y y Therefore, from Proposition 10.5 on page 577, " " ∞ P (Y > y) dy = E(Y ) = 0

= β + βα

"

0



β

1 dy +

y −α dy = β +

β

"

∞ β

βα dy yα

βα αβ = . α−1 α−1 (α − 1)β

e) Let Y denote loss amount. Then Y has the Pareto distribution with parameters α = 2 and β = 3. Therefore, αβ 2·3 E(Y ) = = = 6. α−1 2−1 Hence, the expected loss amount is $6000. 10.118 a) From the FEF, ( ) E(Y ) = E X1/β = =α Also,





−∞

0

Therefore, (

Y2

)

(

"

− E(Y )



0

)2

x 1/β fX (x) dx =

x (1/β+1)−1 e−αx dx = α ·

*( )2 + ( ) = E Y 2 = E X1/β =α

Var(Y ) = E

"

"

"



−∞

"

∞ 0

x 1/β α e−αx dx

$(1/β + 1) $(1 + 1/β) = . (1/β+1) α 1/β α

" * +2 x 1/β fX (x) dx =

x (2/β+1)−1 e−αx dx = α ·

$(1 + 2/β) = − α 2/β

&

∞ 0

x 2/β α e−αx dx

$(2/β + 1) $(1 + 2/β) = . (2/β+1) α 2/β α

$(1 + 1/β) α 1/β

'2

( )2 $(1 + 2/β) − $(1 + 1/β) = . α 2/β

b) Let Y denote the lifetime of the component. Then Y has the Weibull distribution with parameters α = 2 and β = 3. Hence, from part (a),

$(4/3) $(1 + 1/β) = . 1/β α 21/3 The mean lifetime is approximately 0.71 year. Also, ( )2 ( )2 $(1 + 2/β) − $(1 + 1/β) $(5/3) − $(4/3) Var(Y ) = = . α 2/β 22/3 √ The standard deviation of the lifetime is approximately 0.066 = 0.26 year. E(Y ) =

10-57

Review Exercises for Chapter 10

10.119 a) From the FEF, ( ) E(Y ) = E eX = =

"



−∞

x

"

e √



ex fX (x) dx

−∞

1

2π σ

e

−(x−µ)2 /2σ 2

1

dx = √ 2π σ

"

∞ −∞

e−

( ) (x−µ)2 /2σ 2 −x

.

Doing some algebra and completing the square we find that * ( )+2 & ' x − µ + σ2 σ2 2 2 − µ+ . (x − µ) /2σ − x = 2 2σ 2 ( ) 2 Because the PDF of a N µ + σ 2 , σ 2 distribution integrates to 1, we conclude that E(Y ) = eµ+σ /2 . Also, *( ) + " ∞ ( ) ( 2) 2 2 E Y = E eX = ex fX (x) dx =

"



−∞

1 1 2 2 e2x √ e−(x−µ) /2σ dx = √ 2π σ 2π σ −∞

"



−∞

e−

( ) (x−µ)2 /2σ 2 −2x

.

Doing some algebra and completing the square we find that * ( )+2 x − µ + 2σ 2 ( ) 2 . − 2 µ + σ (x − µ)2 /2σ 2 − 2x = 2σ 2 ( ) ) ( ) ( 2 Because the PDF of a N µ + 2σ 2 , σ 2 distribution integrates to 1, we conclude that E Y 2 = e2 µ+σ . Consequently, * +2 * 2 + ( ) )2 ( ) ( 2 2 2 Var(Y ) = E Y 2 − E(Y ) = e2 µ+σ − eµ+σ /2 = e2µ+σ eσ − 1 .

b) Let Y denote claim amount. We know that Y has the lognormal distribution with parameters µ = 7 and σ 2 = 0.25. Therefore, from part (a), E(Y ) = eµ+σ

2 /2

= e7.125 .

The mean claim amount is approximately $1242.65. Also, * 2 + * + 2 Var(Y ) = e2µ+σ eσ − 1 = e14.25 e0.25 − 1 . The standard deviation of claim amounts is approximately



438,584.796 = $662.26.

10.120 √ a) For the solution part,) see Example 10.7(b) on page 574, where we found that E(S) = 2 2/π σ . ( 2 to this b) Let W = X + Y 2 + Z 2 /σ 2 . Then, from Propositions 8.15, 8.13, and 9.13 on pages 468, 466, and 537, respectively, ' & '2 & '2 & '2 & X Y Z 3 1 X2 + Y 2 + Z 2 , . = + + ∼$ W = σ σ σ 2 2 σ2

10-58

CHAPTER 10

Expected Value of Continuous Random Variables

√ Noting that S = σ W , we apply the FEF to get * √ + " E(S) = E σ W =

"

√ (1/2)3/2 3/2−1 −w/2 w σ w e dw $(3/2) −∞ 0 " ∞ # σ $(2) σ = 3/2 · w2−1 e−w/2 dw = 3/2 = 2 2/π σ. 2 2 $(3/2) 0 2 $(3/2) (1/2) ∞

√ σ w fW (w) dw =



√ c) For the solution to this part, see Example 10.7(a) on page 574, where we found that E(S) = 2 2/π σ .

10.121 a) We have

( ) E(Y ) = E X(1 − X) = = Also,

"

1

0

"



−∞

x(1 − x)fX (x) dx =

x 2−1 (1 − x)2−1 dx = B(2, 2) =

( ) ( ) E(XY ) = E X · X(1 − X) = E X2 (1 − X) = = b) We have

"

1

0

x 3−1 (1 − x)2−1 dx = B(3, 2) =

E(Xn ) Therefore,

and

=

"



−∞

n

"



"

1 0

x(1 − x) dx

1 . 6

2

−∞

x (1 − x)fX (x) dx =

"

1 0

x 2 (1 − x) dx

1 . 12

x fX (x) dx =

"

1 0

x n dx =

1 . n+1

) ( ) 1 1 ( ) ( 1 E(Y ) = E X(1 − X) = E X − X 2 = E(X) − E X2 = − = 2 3 6

( ) ( ) ( ) ( ) 1 1 1 E(XY ) = E X · X(1 − X) = E X2 − X 3 = E X2 − E X 3 = − = . 3 4 12 c) Referring to part (b), we find that E(XY ) =

1 1 1 = · = E(X) E(Y ) . 12 2 6

Clearly, X and Y are not independent because, in fact, Y is a function of X. d) Two random variables can be uncorrelated without being independent. 10.122 The area of the triangle is 1/2 and, hence, fX,Y (x, y) = 2 if (x, y) is in the triangle, and equals 0 otherwise. a) We have 5 " ∞" ∞ " 1 4" 1−x " 1 1 E(X) = xfX,Y (x, y) dx dy = 2 x 1 dy dx = 2 x(1 − x) dx = . 3 −∞ −∞ 0 0 0 By symmetry, E(Y ) = 1/3.

10-59

Review Exercises for Chapter 10

b) We have 2

E(X) =

"

"





2

x fX,Y (x, y) dx dy = 2

−∞ −∞

"

1

x

0

2

4"

5

1−x 0

1 dy dx = 2

"

1

0

x 2 (1 − x) dx =

1 . 6

Hence, Var(X) = (1/6) − (1/3)2 = 1/18. By symmetry, Var(Y ) = 1/18. c) Examining the triangle over which X and Y are uniformly distributed, we see that large values (close to 1) of X correspond to small values (close to 0) of Y . Thus, it appears that X and Y are negatively correlated, that is, that the sign of the correlation coefficient is negative. d) We have 5 " 1 4" 1−x " 1 " ∞" ∞ 1 xyfX,Y (x, y) dx dy = 2 x y dy dx = x(1 − x)2 dx = . E(XY ) = 12 −∞ −∞ 0 0 0 Thus, 1 E(XY ) − E(X) E(Y ) 1/12 − (1/3)(1/3) Cov(X, Y ) = √ = √ =− . ρ(X, Y ) = √ 2 (1/18)(1/18) Var(X) Var(Y ) Var(X) Var(Y ) 10.123 Let X denote the manufacturer’s annual loss and let Y denote the amount of that loss not paid by the insurance policy. Then ! X, if X ≤ 2; Y = 2, if X > 2. Let g(x) =

!

x, if x ≤ 2; 2, if x > 2.

Then Y = g(X) and, hence, by the FEF, ( ) E(Y ) = E g(X) = 2.5

= 2.5(0.6)

"

"

g(x)f (x) dx =

"

−2.5

2.5



−∞ 2

x 0.6

dx + 5(0.6)

2 0.6

"

x · 2.5(0.6) ∞ 2

2.5 −3.5

x

dx +

"

∞ 2

2 · 2.5(0.6)2.5 x −3.5 dx

x −3.5 dx = 0.934.

10.124 Note that the time it takes to complete the two successive tasks is Y . a) We have " ∞ " y " y ( ) −(x+2y) −2y fY (y) = fX,Y (x, y) dx = 6e dx = 6e e−x dx = 6e−2y 1 − e−y 0

−∞

0

if y > 0, and fY (y) = 0 otherwise. Hence, by the definition of expected value, E(Y ) =

"

=6



−∞

&"

yfY (y) dy = 6

0



y

2−1 −2y

e

"



0

dy −

"

ye

−2y

∞ 0

y

( ) 1 − e−y dy = 6

2−1 −3y

e

dy

'

&

&"

0



ye

−2y

$(2) $(2) =6 − 2 22 3

dy − '

=

"

∞ 0

5 . 6

ye

−3y

dy

'

10-60

CHAPTER 10

Expected Value of Continuous Random Variables

b) Applying the FEF, we get " ∞" ∞ " E(Y ) = yfX,Y (x, y) dx dy = 6 −∞ −∞

=6 =6

"



ye

0

&" &

0



−2y

ye

&"

−2y

y

e 0

−x

dy −

$(2) $(2) =6 − 2 22 3

'

"

=

'

ye

0

y

0

dx dy = 6 ∞



−3y

dy

5 . 6

y

e

0

"

'

&"



0

−(x+2y)

'

dx dy

( ) ye−2y 1 − e−y dy

=6

&"



y

0

2−1 −2y

e

dy −

"

∞ 0

y

2−1 −3y

e

dy

'

10.125 Let X denote printer lifetime, in years. Then X ∼ E(1/2). Also, let Y denote the refund amount, in hundreds of dollars, for one printer. We have, 2, if X ≤ 1; Y = 1, if 1 < X ≤ 2; 0, if X > 2. Define g(x) =

-

2, if x ≤ 1; 1, if 1 < x ≤ 2; 0, if x > 2.

Then Y = g(X) and, hence, by the FEF, " ∞ " ( ) E(Y ) = E g(X) = g(x)fX (x) dx = −∞

1 0

2 · fX (x) dx +

"

2 1

1 · fX (x) dx

) ( ) ( = 2P (0 < X ≤ 1) + P (1 < X ≤ 2) = 2 1 − e−1/2 + e−1/2 − e−1 = 2 − e−1/2 − e−1 .

( ) Therefore, the expected refund, in hundreds of dollars, for 100 printers is 100 2 − e−1/2 − e−1 , or about $10,255.90. 10.126 Let X denote the time to failure, in years, denote the time to discovery of the device’s failure. ! 2, Y = X, Let

g(x) =

!

= 2P (0 < X ≤ 2) +

1 3

"

2



if X ≤ 2; if X > 2.

2, if x ≤ 2; x, if x > 2.

Then Y = g(X) and, hence, by the FEF, " ∞ " ( ) E(Y ) = E g(X) = g(x)fX (x) dx = −∞

for the device. We know that X ∼ E(1/3). Let Y Then

2 0

2 · (1/3)e

−x/3

dx +

"

∞ 2

x · (1/3)e−x/3 dx

) 1 ( x e−x/3 dx = 2 1 − e−2/3 + · 15e−2/3 = 2 + 3e−2/3 . 3

The expected time to discovery of the device’s failure is about 3.54 years.

10-61

Review Exercises for Chapter 10

10.127 Let X denote the loss, in thousands of dollars. Then X ∼ U(0, 1). The expected payment without a deductible is E(X) = 0.5. Let Y denote the payment with a deductible of c thousand dollars. Then ! 0, if X ≤ c; Y = X − c, if X > c. From the FEF,

E(Y ) =

"

c

−∞

0 · fX (x) dx +

"

∞ c

(x − c)fX (x) dx =

"

1 c

(x − c) dx = (1 − c)2 /2.

We want to choose c so that (1 − c)2 /2 = 0.25 · 0.5, which means that c = 0.5. Thus, the deductible should be set at $500. 10.128 Let X and Y denote the lifetimes, in years, of the two generators. By assumption, both of these random variables have an E(1/10) distribution, and we assume, as reasonable, that they are independent random variables. The total time that the two generators produce electricity is X + Y . We have Var(X + Y ) = Var(X) + Var(Y ) =

1 1 + = 200. 2 (1/10) (1/10)2

Consequently, the standard deviation of the total time that the generators produce electricity is about 14.1 years.

√ 200, or

10.129 Let X and Y be independent standard normal random variables. Then X and Y have finite moments of all orders and, in particular, finite expectation. From Example 9.23 on page 541, the random variable Y/X has the standard Cauchy distribution, which, by Example 10.5 on page 568, doesn’t have finite expectation. 10.130 Let X denote loss amount and let Y denote processing time. We know that ! ! 2 /8, if 0 < x < 2; 1/x, if x < y < 2x; 3x and fY | X (y | x) = fX (x) = 0, otherwise. 0, otherwise. a) From the general multiplication rule,

3x 2 1 3x · = 8 x 8 if 0 < x < y < 2x < 4, and fX,Y (x, y) = 0 otherwise. Hence, " ∞ " min{y,2} 9y 2 /64, 3x ( ) fX,Y (x, y) dx = dx = fY (y) = 8 3 16 − y 2 /64, −∞ y/2 fX,Y (x, y) = fX (x)fY | X (y | x) =

and fY (y) = 0 otherwise. Consequently, E(Y ) = =

"



−∞

9 64

yfY (y) dy =

"

0

2

y 3 dy +

"

3 64

2 0

"

2

9y 2 y· dy + 64

4

"

4 2

if 0 < y < 2; if 2 < y < 4.

( ) 3 16 − y 2 y· dy 64

( ) 9 3 9 y 16 − y 2 dy = ·4+ · 36 = . 64 64 4

b) Applying the law of total expectation and recalling that Y|X=x ∼ U(x, 2x) for 0 < x < 2, we get ' & " " 2 ( ) 3 3 2 9 3x 2 9 3 X = E(X) = dx = x· x 3 dx = . E(Y ) = E E(Y | X) = E 2 2 2 0 8 16 0 4

10-62

CHAPTER 10

Expected Value of Continuous Random Variables

10.131 Let X and Y denote the remaining lifetimes of the husband and wife, respectively. By assumption, X and Y are independent U(0, 40) random variables. The problem is to find Cov(X, max{X, Y}). Now fX,Y(x, y) = (1/40)·(1/40) = 1/1600 if 0 < x < 40 and 0 < y < 40, and equals 0 otherwise. We know that E(X) = 20. From the FEF and symmetry,

E(max{X, Y}) = ∫∫ max{x, y} fX,Y(x, y) dx dy = (2/1600)∫₀⁴⁰ x (∫₀ˣ 1 dy) dx = 80/3.

Also,

E(X max{X, Y}) = ∫∫ x max{x, y} fX,Y(x, y) dx dy
 = (1/1600)∫₀⁴⁰ x² (∫₀ˣ 1 dy) dx + (1/1600)∫₀⁴⁰ x (∫ₓ⁴⁰ y dy) dx
 = (1/1600)∫₀⁴⁰ x³ dx + (1/3200)∫₀⁴⁰ x(1600 − x²) dx = 400 + 200 = 600.

Consequently,

Cov(X, max{X, Y}) = E(X max{X, Y}) − E(X) E(max{X, Y}) = 600 − 20·(80/3) = 200/3.

10.132 a) Because the PDF of X is symmetric about 0, all odd moments of X equal 0. In particular, then, we have E(X) = E(X³) = 0. Therefore,

E(XY) = E(X · X²) = E(X³) = 0 = E(X) E(Y).

b) No; although independence of two random variables is a sufficient condition for having the expected value of a product equal the product of the expected values, it is not a necessary condition.
c) No, X and Y are not independent random variables; in fact, Y is a function of X.

10.133 For positive x and y, we can write

fX,Y(x, y) = (1/64) y e^{−(x+y)/4} = (1/4)e^{−x/4} · (1/16) y e^{−y/4}.

This result shows that X ∼ E(1/4), Y ∼ Γ(2, 1/4), and that X and Y are independent random variables.
a) Let T denote the lifetime of the device. Then T = max{X, Y}. Referring to Equation (8.49) on page 453, we see that, for t ≥ 0,

FT(t) = P(T ≤ t) = P(max{X, Y} ≤ t) = P(X ≤ t, Y ≤ t) = P(X ≤ t) P(Y ≤ t)
 = (1 − e^{−t/4})(1 − e^{−t/4} − (t/4)e^{−t/4}) = 1 − 2e^{−t/4} + e^{−t/2} − (t/4)e^{−t/4} + (t/4)e^{−t/2}.

Differentiation now yields

fT(t) = (1/4)e^{−t/4} − (1/4)e^{−t/2} + (1/16) t e^{−t/4} − (1/8) t e^{−t/2}

if t > 0, and fT(t) = 0 otherwise.


Therefore,

E(T) = ∫_{−∞}^{∞} t fT(t) dt
 = (1/4)∫₀^∞ t e^{−t/4} dt − (1/4)∫₀^∞ t e^{−t/2} dt + (1/16)∫₀^∞ t² e^{−t/4} dt − (1/8)∫₀^∞ t² e^{−t/2} dt
 = (1/4)·Γ(2)/(1/4)² − (1/4)·Γ(2)/(1/2)² + (1/16)·Γ(3)/(1/4)³ − (1/8)·Γ(3)/(1/2)³ = 9.

Thus, the expected lifetime of the device is 9000 hours.
b) Applying the FEF, we get

E(T) = E(max{X, Y}) = ∫∫ max{x, y} fX,Y(x, y) dx dy
 = ∫₀^∞ x·(1/4)e^{−x/4} (∫₀ˣ (1/16) y e^{−y/4} dy) dx + ∫₀^∞ y²·(1/16)e^{−y/4} (∫₀ʸ (1/4)e^{−x/4} dx) dy
 = (1/4)∫₀^∞ x e^{−x/4}(1 − e^{−x/4} − (x/4)e^{−x/4}) dx + (1/16)∫₀^∞ y² e^{−y/4}(1 − e^{−y/4}) dy
 = (1/4)∫₀^∞ x e^{−x/4} dx − (1/4)∫₀^∞ x e^{−x/2} dx − (1/16)∫₀^∞ x² e^{−x/2} dx + (1/16)∫₀^∞ y² e^{−y/4} dy − (1/16)∫₀^∞ y² e^{−y/2} dy
 = (1/4)·Γ(2)/(1/4)² − (1/4)·Γ(2)/(1/2)² − (1/16)·Γ(3)/(1/2)³ + (1/16)·Γ(3)/(1/4)³ − (1/16)·Γ(3)/(1/2)³ = 9.

Thus, the expected lifetime of the device is 9000 hours.
c) Let W denote the cost of operating the device until its failure. Then W = 65 + 3X + 5Y. Recalling that X ∼ E(1/4) and Y ∼ Γ(2, 1/4), we get

E(W) = E(65 + 3X + 5Y) = 65 + 3E(X) + 5E(Y) = 65 + 3·(1/(1/4)) + 5·(2/(1/4)) = 117.

Thus, the expected cost of operating the device until its failure is $117.
d) Referring to part (c) and using the independence of X and Y, we get

Var(W) = Var(65 + 3X + 5Y) = 9 Var(X) + 25 Var(Y) = 9·(1/(1/4)²) + 25·(2/(1/4)²) = 944.

Thus, the standard deviation of the cost of operating the device until its failure is √944, or about $30.72.
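A simulation sketch (assuming NumPy; sample size and seed arbitrary) for Exercise 10.133: draw the two component lifetimes, form the operating cost W = 65 + 3X + 5Y, and compare the sample mean and standard deviation with 117 and √944 ≈ 30.72; the sample mean of max{X, Y} should also be close to 9.

import numpy as np

rng = np.random.default_rng(seed=3)
n = 1_000_000

x = rng.exponential(scale=4.0, size=n)        # X ~ E(1/4), mean 4 (thousand hours)
y = rng.gamma(shape=2.0, scale=4.0, size=n)   # Y ~ Gamma(2, 1/4), mean 8
w = 65 + 3 * x + 5 * y                        # operating cost in dollars

print("sample mean of max{X, Y}:", np.maximum(x, y).mean())  # ≈ 9
print("sample mean of W:", w.mean())                          # ≈ 117
print("sample SD of W  :", w.std(ddof=1))                     # ≈ sqrt(944) ≈ 30.72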

10.134 a) We have

fX(x) = 1/12 if 0 < x < 12, and fX(x) = 0 otherwise;   and   fY|X(y | x) = 1/x if 0 < y < x, and fY|X(y | x) = 0 otherwise.

Hence, from the general multiplication rule,

fX,Y(x, y) = fX(x) fY|X(y | x) = (1/12)·(1/x) = 1/(12x)

if 0 < y < x < 12, and fX,Y(x, y) = 0 otherwise.


We know that E(X) = 6. Applying the FEF yields

E(Y) = ∫∫ y fX,Y(x, y) dy dx = (1/12)∫₀¹² (1/x)(∫₀ˣ y dy) dx = (1/24)∫₀¹² x dx = 3

and

E(XY) = ∫∫ xy fX,Y(x, y) dy dx = (1/12)∫₀¹² x·(1/x)(∫₀ˣ y dy) dx = (1/24)∫₀¹² x² dx = 24.

Consequently,

Cov(X, Y) = E(XY) − E(X) E(Y) = 24 − 6·3 = 6.

b) Recalling that X ∼ U(0, 12) and Y|X=x ∼ U(0, x), we apply the law of total expectation to get

E(Y) = E[E(Y | X)] = E(X/2) = (1/2) E(X) = (1/2)·6 = 3

and, also referring to Equation (10.49) on page 602,

E(XY) = E[E(XY | X)] = E[X E(Y | X)] = E(X·(X/2)) = (1/2) E(X²) = (1/2)(Var(X) + (E(X))²) = (1/2)((12 − 0)²/12 + 6²) = 24.

Consequently,

Cov(X, Y) = E(XY) − E(X) E(Y) = 24 − 6·3 = 6.
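A quick Monte Carlo check of Exercise 10.134 (sketch assuming NumPy; sample size and seed arbitrary): draw X uniformly on (0, 12), then Y uniformly on (0, X), and estimate Cov(X, Y).

import numpy as np

rng = np.random.default_rng(seed=4)
n = 1_000_000

x = rng.uniform(0.0, 12.0, size=n)   # X ~ U(0, 12)
y = rng.uniform(0.0, x)              # Y | X = x ~ U(0, x)

cov_xy = np.cov(x, y, ddof=1)[0, 1]
print("sample Cov(X, Y):", cov_xy)   # should be close to 6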

10.135 a) From the FEF,

E(Y^r) = ∫∫ y^r fX,Y(x, y) dy dx = 8∫₀¹ x (∫₀ˣ y^{r+1} dy) dx = (8/(r + 2))∫₀¹ x^{r+3} dx = 8/((r + 2)(r + 4)).

Therefore,

E(Y) = 8/((1 + 2)(1 + 4)) = 8/15

and

Var(Y) = E(Y²) − (E(Y))² = 8/((2 + 2)(2 + 4)) − (8/15)² = 1/3 − 64/225 = 11/225.

b) We have

fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx = 8y ∫_y¹ x dx = 4y(1 − y²)

if 0 < y < 1, and fY(y) = 0 otherwise. From the FEF,

E(Y^r) = ∫_{−∞}^{∞} y^r fY(y) dy = 4∫₀¹ y^{r+1}(1 − y²) dy = 4∫₀¹ (y^{r+1} − y^{r+3}) dy = 4(1/(r + 2) − 1/(r + 4)) = 8/((r + 2)(r + 4)).

Therefore,

E(Y) = 8/((1 + 2)(1 + 4)) = 8/15

and

Var(Y) = E(Y²) − (E(Y))² = 8/((2 + 2)(2 + 4)) − (8/15)² = 1/3 − 64/225 = 11/225.

c) We have

fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy = 8x ∫₀ˣ y dy = 4x³

if 0 < x < 1, and fX(x) = 0 otherwise. We see that X ∼ Beta(4, 1). Therefore,

E(X) = 4/(4 + 1) = 4/5   and   Var(X) = 4·1/((4 + 1)²(4 + 1 + 1)) = 2/75.

Now, for 0 < x < 1,

fY|X(y | x) = fX,Y(x, y)/fX(x) = 8xy/(4x³) = 2y/x²

if 0 < y < x, and fY|X(y | x) = 0 otherwise. Therefore,

E(Y^r | X = x) = ∫_{−∞}^{∞} y^r fY|X(y | x) dy = (2/x²)∫₀ˣ y^{r+1} dy = 2x^r/(r + 2).

Hence,

E(Y | X) = (2/3)X   and   Var(Y | X) = (1/2)X² − ((2/3)X)² = (1/18)X².

Applying now the laws of total expectation and total variance, we get

E(Y) = E[E(Y | X)] = E((2/3)X) = (2/3) E(X) = (2/3)·(4/5) = 8/15

and

Var(Y) = E[Var(Y | X)] + Var(E(Y | X)) = E((1/18)X²) + Var((2/3)X) = (1/18) E(X²) + (4/9) Var(X)
 = (1/18)(Var(X) + (E(X))²) + (4/9) Var(X) = (1/18)·(E(X))² + (1/2)·Var(X)
 = (1/18)·(4/5)² + (1/2)·(2/75) = 11/225.
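For Exercise 10.135, the marginal of X is Beta(4, 1), so X can be simulated as U^{1/4}, and Y can then be drawn from the conditional density 2y/x² on (0, x), i.e. Y = X√V with V uniform. The sketch below (assuming NumPy; sample size and seed arbitrary) checks E(Y) ≈ 8/15 and Var(Y) ≈ 11/225.

import numpy as np

rng = np.random.default_rng(seed=5)
n = 1_000_000

x = rng.random(n) ** 0.25        # X ~ Beta(4, 1): inverse CDF of F(x) = x^4
y = x * np.sqrt(rng.random(n))   # Y | X = x has density 2y/x^2 on (0, x)

print("sample E(Y)  :", y.mean())        # ≈ 8/15 ≈ 0.5333
print("sample Var(Y):", y.var(ddof=1))   # ≈ 11/225 ≈ 0.0489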

10.136 a) Referring to Exercise 10.135(c), we conclude from the prediction theorem, Proposition 10.8 on page 602, that the best predictor of the amount sold, given that the amount stocked is 0.5, is

E(Y | X = 0.5) = (2/3)·0.5 = 1/3.

b) Here we want E(X | Y = 0.4). Referring to Exercise 10.135, we see that

fX|Y(x | 0.4) = fX,Y(x, 0.4)/fY(0.4) = 8x·(0.4)/(4·(0.4)(1 − (0.4)²)) = 50x/21

if 0.4 < x < 1, and fX|Y(x | 0.4) = 0 otherwise. Therefore,

E(X | Y = 0.4) = ∫_{−∞}^{∞} x fX|Y(x | 0.4) dx = (50/21)∫_{0.4}^{1} x² dx = 26/35.

c) The amount left unsold is X − Y, so here we want E(Y | X − Y = 0.1). We can obtain that conditional expectation as follows. First we use the bivariate transformation theorem to show that

fY,X−Y(u, v) = 8u(u + v) if u > 0, v > 0, and u + v < 1, and fY,X−Y(u, v) = 0 otherwise.

From that, we get

fX−Y(v) = ∫_{−∞}^{∞} fY,X−Y(u, v) du = 8∫₀^{1−v} u(u + v) du = (4/3)(1 − v)²(2 + v)

if 0 < v < 1, and fX−Y(v) = 0 otherwise. Therefore,

fX−Y(0.1) = (4/3)(1 − 0.1)²(2 + 0.1) = 567/250

and

fY|X−Y(u | 0.1) = fY,X−Y(u, 0.1)/fX−Y(0.1) = (250/567)·8u(u + 0.1) = (2000/567) u(u + 0.1)

if 0 < u < 0.9, and fY|X−Y(u | 0.1) = 0 otherwise. Consequently,

E(Y | X − Y = 0.1) = ∫_{−∞}^{∞} u fY|X−Y(u | 0.1) du = (2000/567)∫₀^{0.9} u²(u + 0.1) du = 93/140.
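A crude empirical check of Exercise 10.136(c) (sketch assuming NumPy; sample size, window width, and seed are arbitrary): simulate (X, Y) from the joint density of Exercise 10.135, keep the pairs with X − Y in a narrow window around 0.1, and average the retained Y values.

import numpy as np

rng = np.random.default_rng(seed=6)
n = 4_000_000

x = rng.random(n) ** 0.25          # X ~ Beta(4, 1)
y = x * np.sqrt(rng.random(n))     # Y | X = x has density 2y/x^2 on (0, x)

window = np.abs((x - y) - 0.1) < 0.005   # condition on X - Y ≈ 0.1
print("conditional sample mean of Y:", y[window].mean())   # ≈ 93/140 ≈ 0.664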

10.137 For n ≥ 2, we have Xn|Xn−1=xn−1 ∼ U(0, xn−1) and, hence, E(Xn | Xn−1 = xn−1) = xn−1/2. Thus, by the law of total expectation,

E(Xn) = E[E(Xn | Xn−1)] = E(Xn−1/2) = (1/2) E(Xn−1).

Consequently,

E(Xn) = E(X1) ∏_{k=2}^{n} E(Xk)/E(Xk−1) = (1/2)·(1/2)^{n−1} = 1/2ⁿ.

10.138 a) We have

2σXσY√(1 − ρ²) = 2·1.4·31.2·√(1 − (−0.9)²) = 87.36√0.19,
2(1 − ρ²) = 2(1 − (−0.9)²) = 2(0.19) = 0.38,
2ρ = 2·(−0.9) = −1.8.

Therefore, from Definition 10.4 on page 610,

fX,Y(x, y) = (1/(87.36√0.19 π)) exp{−2.63[((x − 5.3)/1.4)² + 1.8((x − 5.3)/1.4)((y − 88.6)/31.2) + ((y − 88.6)/31.2)²]}.


b) From Proposition 10.10(a) on page 616, X ∼ N(µX, σX²) = N(5.3, 1.4²). The ages of Orions are normally distributed with a mean of 5.3 years and a standard deviation of 1.4 years.
c) From Proposition 10.10(b), Y ∼ N(µY, σY²) = N(88.6, 31.2²). The prices of Orions are normally distributed with a mean of $8860 and a standard deviation of $3120.
d) From Proposition 10.10(c), ρ(X, Y) = ρ = −0.9. There is a strong negative correlation between age and price of Orions.
e) We have

µY + ρ(σY/σX)(x − µX) = 88.6 − 0.9·(31.2/1.4)(x − 5.3) = 194.90 − 20.06x

and

σY²(1 − ρ²) = 31.2²(1 − (−0.9)²) = 184.95.

Hence, from Proposition 10.10(d), Y|X=x ∼ N(194.90 − 20.06x, 184.95). The prices of x-year-old Orions are normally distributed with a mean of $(19490 − 2006x) and a standard deviation of about $1360.
f) From part (e), the regression equation is y = 194.90 − 20.06x.
g) Referring to part (f), the predicted price of a 3-year-old Orion is y = 194.90 − 20.06·3 = 134.72, or $13,472.

10.139 We have µX = 3.0, σX = 0.5, µY = 2.7, σY = 0.6, ρ = 0.5.
a) From the preceding display,

2σXσY√(1 − ρ²) = 2·0.5·0.6·√(1 − (0.5)²) = 0.3√3,
2(1 − ρ²) = 2(1 − (0.5)²) = 2(0.75) = 1.5,
2ρ = 2·(0.5) = 1.

Therefore, from Definition 10.4 on page 610,

fX,Y(x, y) = (1/(0.3√3 π)) exp{−(2/3)[((x − 3.0)/0.5)² − ((x − 3.0)/0.5)((y − 2.7)/0.6) + ((y − 2.7)/0.6)²]}.

b) We have

µY + ρ(σY/σX)(x − µX) = 2.7 + 0.5·(0.6/0.5)(x − 3.0) = 0.9 + 0.6x.

Hence, the predicted cumulative GPA at the end of the sophomore year for an ASU student whose high-school GPA was 3.5 is 0.9 + 0.6·3.5 = 3.0.
c) We have

σY²(1 − ρ²) = 0.6²(1 − (0.5)²) = 0.27.

Hence, from Proposition 10.10(d) on page 616 and part (b), Y|X=3.5 ∼ N(3.0, 0.27). Referring to Table I on page A-39, we see that the probability a normally distributed random variable takes a value within 1.645 standard deviations of its mean is 0.90. Consequently, the required prediction interval is 3.0 ± 1.645·√0.27, or (2.15, 3.85).
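The prediction in Exercise 10.139(b)–(c) is easy to reproduce programmatically; the sketch below (plain Python, values taken from the exercise) computes the conditional mean and the 90% prediction interval for a student with high-school GPA 3.5.

import math

mu_x, sigma_x = 3.0, 0.5
mu_y, sigma_y = 2.7, 0.6
rho = 0.5
x = 3.5

# Conditional distribution of Y given X = x for a bivariate normal pair.
cond_mean = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)
cond_var = sigma_y ** 2 * (1 - rho ** 2)

z90 = 1.645  # two-sided 90% normal quantile
half_width = z90 * math.sqrt(cond_var)
print("predicted GPA:", cond_mean)                        # 3.0
print("90% prediction interval:",
      (cond_mean - half_width, cond_mean + half_width))   # ≈ (2.15, 3.85)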


10.140 Let X denote the weekly demand, in hundreds of pounds, for unbleached enriched whole wheat flour and let Y denote the weekly profit, in dollars, due to sales of the flour. We know that X ∼ Beta(2, 2). Note that, because the maximum value of a beta random variable is 1, we can assume that a < 1. Now, the amount sold by the end of the week is min{X, a}. Therefore, Y = 21 min{X, a} − 10a.

We want to choose a to maximize E(Y). Let

g(x) = 21 min{x, a} − 10a = 21x − 10a if x < a, and 11a if x ≥ a.

Then Y = g(X) and, hence, from the FEF and Equation (8.55) on page 457,

E(Y) = E(g(X)) = ∫_{−∞}^{∞} g(x) fX(x) dx = ∫₀^a (21x − 10a)·6x(1 − x) dx + ∫_a^1 11a·6x(1 − x) dx
 = 6∫₀^a x(21x − 10a)(1 − x) dx + 11a P(X > a) = 12a³ − (23/2)a⁴ + 11a(1 − 3a² + 2a³)
 = (21/2)a⁴ − 21a³ + 11a.

Differentiating the last expression and setting the result equal to 0, we get

42a³ − 63a² + 11 = 0.

M = (X + Y )/2 = W + θ − 0.5.

Referring Example 10.14(c) on page 590, we find that E(M) = E(W + θ − 0.5) = E(W ) + θ − 0.5 = 0.5 + θ − 0.5 = θ.

Hence, M is an (unbiased estimator of θ. ) ( ) ) ( c) We have Var Xj = Var Uj − 0.5 + θ = Var Uj = 1/12. Hence, 4 5 n n % ( ) ( ) 1 1 1 % Var X¯ n = Var Xj = 2 Var Xj = . n j =1 12n n j =1

Review Exercises for Chapter 10

10-69

Also, from Example 10.14(c), Var(M) = Var(W + θ − 0.5) = Var(W ) = Now,

1 . 2(n + 2)(n + 1)

Var(M) 1/2(n + 2)(n + 1) 6n ( )= = . ¯ 1/12n (n + 2)(n + 1) Var Xn

( ) However, (n + 2)(n + 1) − 6n = n2 − 3n + 2 = (n − 2)(n − 1). Hence, Var(M) = Var X¯ n if n = 1 ( ) or n = 2, and Var(M) < Var X¯ n if n ≥ 3. d) Although, from parts (a) and (b), both M and X¯ n are unbiased estimators of θ, we see from part (c) that M is a better estimator because its variance is smaller than (or, in case n = 1 or n = 2, equal to) that of X¯ n . Thus, on average, M will be closer to (or, in case n = 1 or n = 2, no further from) θ than X¯ n .

Theory Exercises 10.142 We have, for all t, P (X > t) = 1 − FX (t) ≥ 1 − FY (t) = P (Y > t) and P (X < t) = FX (t−) ≤ FY (t−) = P (Y < t).

Therefore, from Exercise 10.45(a) on page 581, " ∞ " E(X) = P (X > t) dt − ≥

"

0



0

P (Y > t) dt −



0 " ∞ 0

P (X < −t) dt P (Y < −t) dt = E(Y ) .

10.143 ( ) a) If E Y 2 = 0, then P (Y = 0) = 1 and the terms on both sides of Schwarz’s inequality equal 0. So, ( ) ( ) assume that E Y 2 ̸= 0 and let h(t) = E (X + tY )2 . We have ) ( ) ( ) ( ) ( 0 ≤ h(t) = E X2 + 2XY t + t 2 Y 2 = E X2 + 2E(XY ) t + E Y 2 t 2 . We see that h is a nonnegative quadratic polynomial and, hence, must have at most one real root. This, in turn, implies that ( )2 ( ) ( ) 2E(XY ) − 4 · E Y 2 E X2 ≤ 0, from which Schwarz’s inequality immediately follows. b) Applying Schwarz’s inequality to X − µX and Y − µY , we obtain ( )2 * ( )+2 ( ) ( ) Cov(X, Y ) = E (X − µX )(Y − µY ) ≤ E (X − µX )2 E (Y − µY )2 = Var(X) Var(Y ) . Consequently,

( )2 ρ(X, Y ) =

from which we conclude that |ρ(X, Y ) | ≤ 1.

(

)2 Cov(X, Y ) ≤ 1, Var(X) Var(Y )

10-70

CHAPTER 10

Expected Value of Continuous Random Variables

10.144 Let Z = Y − X. By assumption, Z is a positive random variable, that is, P (Z > 0) = 1. H∞ For n ∈ N , let An = {Z ≥ 1/n}. Then A1 ⊂ A2 ⊂ · · · and n=1 An = {Z > 0}. Applying the continuity property of a probability measure, we get limn→∞ P (An ) = P (Z > 0) = 1. In particular, then, we have P (Z ≥ 1/n) > 0 for sufficiently large n. Choose n0 ∈ N such that P (Z ≥ 1/n0 ) > 0 and let 0 < z0 < 1/n0 . Then, P (Z > z0 ) ≥ P (Z ≥ 1/n0 ) > 0.

Consequently, from Proposition 10.5 on page 577, " " ∞ P (Z > z) dz ≥ E(Y ) − E(X) = E(Z) = 0

0

z0

P (Z > z) dz ≥ z0 P (Z > z0 ) > 0.

Hence, E(X) < E(Y ). We note that this result holds true if only X ≤ Y and P (X < Y ) > 0. 10.145 a) Because the range of X is contained in I , it follows from the monotonicity property of expected value that µX ∈ I . From Taylor’s formula, for each x ∈ I , there is a number ξx between µX and x (and, hence, in I ) such that g ′′ (ξx ) g(x) = g(µX ) + g ′ (µX )(x − µX ) + (x − µX )2 . 2 Thus, because g ′′ ≥ 0 on I , we have g(x) ≥ g(µX ) + g ′ (µX )(x − µX ) for all x ∈ I and, because the range of X is contained in I , we conclude that g(X) ≥ g(µX ) + g ′ (µX )(X − µX ). Consequently, by the monotonicity property of expected value, ( ) ( ) E g(X) ≥ E g(µX ) + g ′ (µX )(X − µX ) = g(µX ) + g ′ (µX )E(X − µX ) ( ) = g (µX ) + g ′ (µX ) · 0 = g E(X) .

b) Applying part (a) to the function −g gives ( ) ( ) ( ) ( ) ( ) −g E(X) = (−g) E(X) ≤ E (−g)(X) = E − g(X) = −E g(X) . ( ) ( ) Therefore, g E(X) ≥ E g(X) . c) Let X denote the yield of the equity investment, and let Ecd and Ee denote the expected utility of the certificate of deposit and equity investment, respectively. Then ( ) Ecd = u(r) and Ee = E u(X) . √ First suppose that the investor uses the utility function u(x) = x. This function is concave on (0, ∞) because its second derivative is negative thereon. Hence, by part (b), ( ) ( ) Ecd = u(r) = u E(X) ≥ E u(X) = Ee .

Consequently, in this case, the investor will choose the certificate of deposit. Next suppose that the investor uses the utility function u(x) = x 3/2 . This function is convex on (0, ∞) because its second derivative is positive thereon. Hence, by part (a), ( ) ( ) Ecd = u(r) = u E(X) ≤ E u(X) = Ee . Consequently, in this case, the investor will choose the equity investment.

Review Exercises for Chapter 10

10-71

Advanced Exercises 10.146 a) We begin by observing that µY = E(X) + E(Z) = µ and σY2 = Var(X) + Var(Z) = σ 2 + 1. Also, Cov(X, Y ) = Cov(X, X + Z) = Cov(X, X) + Cov(X, Z) = Var(X) + 0 = Var(X) so that

It follows that

Cov(X, Y ) Var(X) σ . ρ = ρ(X, Y ) = √ =√ =√ Var(X) Var(Y ) Var(X) Var(Y ) σ2 + 1 1 − ρ2 = 1 −

and

1 σ2 = 2 2 σ +1 σ +1

<

# 1 = 2πσ. 2πσX σY 1 − ρ 2 = 2πσ σ 2 + 1 · √ 2 σ +1 Now, let U = X and Y = X + Z. The Jacobian determinant of the transformation u = x and y = x + z is 0 0 01 00 0 0 = 1. J (x, z) = 0 1 10

Solving the equations u = x and y = x + z for x and z, we obtain the inverse transformation x = u and z = y − u. Therefore, by the bivariate transformation theorem and the independence of X and Z, fU,Y (u, y) =

In other words,

1 1 fX,Y (x, z) = fX (x)fZ (z) = fX (u)fZ (y − u). |J (x, z)| 1

1 1 2 2 2 fX,Y (x, y) = fX (x)fZ (y − x) = √ e−(x−µ) /2σ · √ e−(y−x) /2 2π σ 2π ( ( ) ) 1 − (x−µ)2 +σ 2 (y−x)2 /2σ 2 1 2 2 2 2 # = e = e− (x−µ) +σ (y−x) /2σ . 2πσ 2πσX σY 1 − ρ 2

However,

( )2 (y − x)2 = (y − µ) − (x − µ) = (x − µ)2 − 2(x − µ)(y − µ) + (y − µ)2

and from that fact we get

(x − µ)2 + σ 2 (y − x)2 σ2 -& & ' & ' @ ' '& ( 2 ) x−µ y−µ 2 y−µ x−µ 2 2σ + √ = σ +1 −√ √ σ σ σ2 + 1 σ2 + 1 σ2 + 1 -& ' & '& ' & ' @ 1 x − µX 2 x − µX y − µY y − µY 2 . = − 2ρ + σX σX σY σY 1 − ρ2 Therefore, X and Y are bivariate normal random variables.




b) Let Z1 be the standardized version of X and let Z2 = Z. Then Z1 and Z2 are independent standard normal random variables and X = µ + σ Z1 + 0 · Z2 Y = µ + σ Z1 + 1 · Z2 .

Therefore, from the argument used after Equations (10.55), we conclude that X and Y are bivariate normal random variables. c) By the prediction theorem, the best predictor is E(X | Y ). From part (a) or (b) and Proposition 10.10(e) on page 616, we deduce that E(X | Y ) = µX + ρ

σX σ µ + σ 2Y σ (y − µY ) = µ + √ (Y − µ) = . √ σY 1 + σ2 1 + σ2 1 + σ2

10.147 Let X denote the height, in inches, of the 25-year-old man, let Y denote the reported height, in inches, and let Z be the measurement ( ) error, in inches. We know that X and Z are independent ran2 dom variables and that X ∼ N 69, 2.5 , Z ∼ N (0, 1), and Y = X + Z. Applying Exercise 10.146(c) with µ = 69, σ = 2.5, and Y = 72, we find that the best predictor of the man’s height is µ + σ 2Y 69 + 2.52 · 72 = = 71.6 inches. 1 + σ2 1 + 2.52 10.148 Let F and I f denote the common CDF and PDF, respectively, of the Xj s. We first note that {N > n} = jn=2 {Xj ≤ X1 }. Therefore, for n ≥ 2, P (N > n) = P (X2 ≤ X1 , . . . , Xn ≤ X1 ) " " = ··· fX1 ,...,Xn (x1 , . . . , xn ) dx1 · · · dxn xj ≤x1 , 2≤j ≤n

= = =

"

···

"

xj ≤x1 , 2≤j ≤n

" "

∞ −∞ ∞

−∞

&"

f (x1 ) · · · f (xn ) dx1 · · · dxn

x1

−∞

f (x2 ) dx2 · · ·

"

'

x1

−∞

)n−1 ( f (x1 ) dx1 = F (x1 )

f (xn ) dxn f (x1 ) dx1

"

1 0

un−1 du =

1 , n

where in the penultimate equation we made the substitution u = F (x1 ). Recalling that we conclude from Proposition 7.6 on page 347 that N doesn’t have finite expectation.

6∞

−1 n=1 n

= ∞,

10.149 Let Sn = X1 + · · · + Xn . We begin by observing that, because the Xj s are positive random variables, {N > n} = {Sn ≤ 1}. We claim that, for n ∈ N , P (Sn ≤ x) =

xn , n!

0 ≤ x ≤ 1.

(∗)

We use induction. Let 0 ≤ x ≤ 1. Because X1 ∼ U(0, 1), P (S1 ≤ x) = P (X1 ≤ x) = x. Thus Equation (∗) holds for n = 1. Assume now that Equation (∗) holds for n. As X1 , X2 , . . . are independent random variables, Proposition 6.13 on page 297 implies that Sn and Xn+1 are also independent random



variables. Hence, by Equation (9.46) on page 534, we have, for 0 < y < 1, " ∞ " y yn fSn+1 (y) = fSn (u)fXn+1 (y − u) du = fSn (u) · 1 du = P (Sn ≤ y) = . n! −∞ 0 Therefore, for 0 ≤ x ≤ 1, P (Sn+1 ≤ x) =

"

0

x

fSn+1 (y) dy =

"

x 0

yn x n+1 dy = , n! (n + 1)!

which is Equation (∗) for n + 1. In particular, we have shown that P (N > n) = P (Sn ≤ 1) =

1 , n!

n ∈ N.

a) Applying Proposition 7.6 on page 347, we get E(N) = b) We have

∞ % n=0

P (N > n) =

P (N = n) = P (N > n − 1) − P (N > n) = Hence,

∞ % 1 = e. n! n=0

1 n−1 1 − = , (n − 1)! n! n!

∞ ∞ % ( ) % E N2 = n2 P (N = n) = n2 P (N = n) n=1

=

∞ % n=2

n ∈ N.

n=2

n(n − 1)P (N = n) +

From part (a), E(N) = e. Also,

∞ % n=2

n(n − 1)P (N = n) = n(n − 1) Thus, ∞ % n=2

nP (N = n) =

∞ % n=2

n(n − 1)P (N = n) + E(N) .

n−1 n−2 1 n−1 = = + . n! (n − 2)! (n − 2)! (n − 2)!

n(n − 1)P (N = n) =

∞ % n=3

∞ % 1 1 + = 2e. (n − 3)! n=2 (n − 2)!

( ) So, E N 2 = 3e and, consequently, Var(N) = 3e − e2 = (3 − e)e.

10.150 Use a basic random number generator to choose a random (uniform) number, u1 , between 0 and 1. Next choose another such random number, u2 . If u1 + u2 > 1, set N1 = 2; otherwise, choose another such random number, u3 . If u1 + u2 + u3 > 1, set N1 = 3; otherwise, choose another random number, u4 , and so forth. The integer N1 thus obtained has the same probability distribution as the random variable N in Exercise 10.149. Now repeat this entire procedure a large number of times, say, 10,000, to obtain integers N1 , N2 , . . . , N10,000 . These integers represent 10,000 independent observations of the random variable N. Hence, by the long-run-average interpretation of expected value and Exercise 10.149(a), N1 + · · · + N10,000 ≈ E(N) = e. 10,000
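The simulation described in Exercise 10.150 is straightforward to express in code; a minimal Python sketch follows (sample size and seed are arbitrary choices). It draws uniform numbers until their running sum exceeds 1, records how many were needed, and averages those counts.

import random

random.seed(10)

def draws_until_sum_exceeds_one() -> int:
    total, count = 0.0, 0
    while total <= 1.0:
        total += random.random()
        count += 1
    return count

n_obs = 100_000
estimate = sum(draws_until_sum_exceeds_one() for _ in range(n_obs)) / n_obs
print("estimate of e:", estimate)   # should be close to 2.71828...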


Note: For those readers who have access to Minitab statistical software, we have provided the following macro to perform the simulation (the macro body is truncated in this copy):

MACRO
ESIM
MCOLUMN C D
MCONSTANT s x n i j k
BRIEF 2
GPAUSE
NOTE
NOTE This macro simulates e.
NOTE How many observations should be used for the simulation?
SET C;
FILE 'TERMINAL';
NOBS 1.
COPY C n
DO i=1:n
LET s=0
LET j=0
WHILE s

0, and fY(y) = 0 otherwise. Therefore, upon making the substitution x = ln y, we get

E(Y^n) = ∫₀^∞ y^n fY(y) dy = (1/(√(2π) σ)) ∫₀^∞ y^{n−1} e^{−(ln y − µ)²/2σ²} dy
 = (1/(√(2π) σ)) ∫_{−∞}^{∞} (e^x)^n e^{−(x−µ)²/2σ²} dx = (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{−[(x−µ)²/2σ² − nx]} dx.


Doing some algebra and completing the square we find that ( $ %)2 ( ) x − µ + nσ 2 2 2 (x − µ)2 /2σ 2 − nx = − nµ + n σ /2 . 2σ 2 % $ 2 2 As the PDF of a N µ + nσ 2 , σ 2 distribution integrates to 1, we conclude that E(Y n ) = enµ+n σ /2 . % $ b) Let X = ln Y , so that X ∼ N µ, σ 2 . From the FEF, & ∞ & ∞ ($ % ) $ nX % $ n% 1 2 2 n nx X =E e e fX (x) dx = √ enx e−(x−µ) /2σ dx. E Y =E e = 2π σ −∞ −∞

The integral on the right of the preceding display is the same as the one that we evaluated in part (a). 2 2 Hence, we conclude that E(Y n ) =$ enµ+n% σ /2 . c) Let X = ln Y , so that X ∼ N µ, σ 2 . Referring to Example 11.5 on page 633, we find that ($ % ) $ % % $ 2 2 2 2 n E Y n = E eX = E enX = MX (n) = eµn+σ n /2 = enµ+n σ /2 . d) Making the substitution x = ln y, we get &



−∞

&



1 2 2 ety √ e−(ln y−µ) /2σ dy 2π σy 0 & ∞ & ∞ 1 1 x 2 2 x =√ ete e−(x−µ) /2σ dx = √ ete −g(x) dx, 2π σ −∞ 2π σ −∞

ty

e fY (y) dy =

where, for convenience, we have set g(x) = (x − µ)2 /2σ 2 . For t ≤ 0, & ∞ & ∞ √ tex −g(x) e dx ≤ e−g(x) dx = 2π σ < ∞. −∞

−∞

Now suppose that t > 0. Using L’Hôpital’s rule, we find that g(x)/tex → 0 as x → ∞ and, hence, that tex − g(x) > tex /2 > x for sufficiently large x, say, for x ≥ M. Therefore, & ∞ & ∞ & ∞ x x ete −g(x) dx ≥ ete −g(x) dx ≥ ex dx = ∞. −∞

M

M

Referring now to the FEF, we conclude that MY (t) is defined if and only if t ≤ 0. e) No. As we have shown in parts (c) and (d), a lognormal random variable has moments of all orders, but its MGF is not defined in any open interval containing 0. 11.9 a) We have lim

n→∞

!

1 −1 + n

"

= −1

and

lim

n→∞

!

1 1− n

"

= 1.

Thus, heuristically, we would say that the probability distribution of a random variable X to which {Xn }∞ n=1 converges in distribution is the discrete uniform distribution on {−1, 1}. b) Let Y have the discrete uniform distribution on {a, b}. Then ) # 1 1 1 ( at MY (t) = ety pY (y) = eat + ebt = e + ebt . 2 2 2 y


Now, let X have the discrete uniform distribution on {−1, 1}. Then, by the continuity of the exponential function, we have, for all t ∈ R, ) 1 ( ) 1 ( (−1+1/n)t e + e(1−1/n)t = lim e(−1+1/n)t + e(1−1/n)t lim MXn (t) = lim n→∞ n→∞ 2 2 n→∞ ( ) % 1 (−1)t 1 $ −t = e e + et = + e(1)t = MX (t). 2 2

Hence, from the continuity theorem (Proposition 11.5 on page 638), we conclude that {Xn }∞ n=1 converges in distribution to X. c) Let Y have the discrete uniform distribution on {a, b}. Then * 0, if y < a; FY (y) = P (Y ≤ y) =

1/2, 1,

if a ≤ y < b; if y ≥ b.

Thus, if X has the discrete uniform distribution on {−1, 1}, * 0, if x < −1; FX (x) = 1/2, if −1 ≤ x < 1; 1, if x ≥ 1.

Observe that FX is continuous except at −1 and 1. Now, we have * 0, if x < −1 + 1/n; FXn (x) = 1/2, if −1 + 1/n ≤ x < 1 − 1/n; 1, if x ≥ 1 − 1/n. To determine limn→∞ FXn (x), we consider three cases. Case 1: x ≤ −1

In this case, we have x < −1 + 1/n for all n ∈ N , so that FXn (x) = 0 for all n ∈ N . Consequently, we see that limn→∞ FXn (x) = 0.

Case 2: −1 < x < 1

In this case, for sufficiently large n, we have −1 + 1/n < x < 1 − 1/n, so that FXn (x) = 1/2. Consequently, we see that limn→∞ FXn (x) = 1/2.

Case 3: x ≥ 1

In this case, we have x > 1 − 1/n for all n ∈ N , so that FXn (x) = 1 for all n ∈ N . Consequently, we see that limn→∞ FXn (x) = 1. We have now shown that

lim FXn (x) =

n→∞

*

0, if x ≤ −1; 1/2, if −1 < x < 1; 1, if x ≥ 1.

This limit agrees with FX (x) except when x = −1. In particular, then, limn→∞ FXn (x) = FX (x) for all x ∈ C FX . Thus, by definition, {Xn }∞ n=1 converges in distribution to X.

11.10 a) Let ϵ > 0 be small. The probability that a U(0, 1) random variable takes a value less than ϵ equals ϵ, which is positive. Consequently, by Proposition 4.6 on page 155, eventually one of the Xj s will be less than ϵ. Heuristically, then, we would expect Un to converge in distribution to 0.

11.1

Moment Generating Functions

11-7

b) Let U be the random variable that equals 0 with probability 1. Then, + 0, if u < 0; FU (u) = 1, if u ≥ 0. From Example 9.9(c) on page 512, we find that * 0, FUn (u) = 1 − (1 − u)n , 1,

It follows that lim FUn (u) =

n→∞

+

0, 1,

if u < 0; if 0 ≤ u < 1; if u ≥ 1. if u ≤ 0; if u > 0.

This limit agrees with FU (u) except when u = 0, which is a point of discontinuity of FU . In particular, then, limn→∞ FUn (u) = FU (u) for all u ∈ C FU . Hence, by definition, {Un }∞ n=1 converges in distribution to U . c) Let ϵ > 0 be small. The probability that a U(0, 1) random variable takes a value greater than 1 − ϵ equals ϵ, which is positive. Consequently, by Proposition 4.6 on page 155, eventually one of the Xj s will be greater than 1 − ϵ. Heuristically, then, we would expect Vn to converge in distribution to 1. Let V be the random variable that equals 1 with probability 1. Then, + 0, if v < 1; FV (v) = 1, if v ≥ 1. From Exercise 9.70, we find that

FVn (v) =

*

0, vn, 1,

It follows that lim FVn (v) =

n→∞

+

if v < 0; if 0 ≤ v < 1; if v ≥ 1. 0, 1,

if v < 1; if v ≥ 1.

This limit agrees everywhere with FV (v). In particular, then, limn→∞ FVn (v) = FV (v) for all v ∈ C FV . Hence, by definition, {Vn }∞ n=1 converges in distribution to V . 11.11 a) We have

) ( X1 +···+Xn ) ( t ) ( ¯ n = E e n (X1 +···+Xn ) MX¯ n (t) = E et Xn = E et

$ %n = MX1 +···+Xn (t/n) = MX1 (t/n) · · · MXn (t/n) = M(t/n) .

b) Referring to part (a), we see that $ %n−1 ′ $ %n−1 M (t/n)(1/n) = M ′ (t/n) M(t/n) . MX′¯ (t) = n M(t/n) n

Therefore,

$ % $ %n−1 E X¯ n = MX′¯ (0) = M ′ (0) M(0) = µ · 1n−1 = µ. n

Also,

$ %n−2 ′ $ %n−1 MX′′¯ (t) = M ′ (t/n)(n − 1) M(t/n) M (t/n)(1/n) + M ′′ (t/n)(1/n) M(t/n) n

=

%2 $ $ %n−1 %n−2 1 ′′ n−1$ ′ M (t/n) M(t/n) + M (t/n) M(t/n) . n n


Therefore, $ % %n−1 %n−2 1 ′′ $ n − 1 $ ′ %2 $ E X¯ n2 = MX′′¯ (0) = + M (0) M(0) M (0) M(0) n n n % n − 1 2 n−2 1 $ 2 1 = µ ·1 + σ + µ2 · 1n−1 = µ2 + σ 2 . n n n

Hence,

( ) ( $ %)2 $ % 1 σ2 Var X¯ n = E X¯ n2 − E X¯ n . = µ2 + σ 2 − µ2 = n n c) Set ψn (t) = ln MX¯ n (t). Then, $ %n ψn (t) = ln MX¯ n (t) = ln M(t/n) = n ln M(t/n). Applying L’Hôpital’s rule, we now get

$ , % − t/n2 M ′ (t/n) M(t/n) ln M(t/n) lim ψn (t) = lim = lim n→∞ n→∞ n→∞ 1/n −1/n2 , = lim tM ′ (t/n) M(t/n) = tM ′ (0)/M(0) = µt. n→∞

Therefore, limn→∞ MX¯ n (t) = eµt . $ % $ % d) Let X be the random variable that equals µ with probability 1. Then MX (t) = E etX = E etµ = eµt . - .∞ It now follows from part (c) and the continuity theorem, Proposition 11.5 on page 638, that X¯ n n=1 converges in distribution to µ.

Theory Exercises 11.12 a) Let X be a bounded discrete random variable, say, P (|X| ≤ M) = 1. Then pX (x) = 0 for |x| > M. Therefore, for each t ∈ R, # # # # |etx |pX (x) = etx pX (x) ≤ e|t||x| pX (x) = e|t||x| pX (x) x

x

x

≤ e|t|M

#

|x|≤M

|x|≤M

pX (x) = e|t|M P (|X| ≤ M) = e|t|M < ∞.

Hence, from the FEF for discrete random variables, etX has finite expectation for all t ∈ R; in other words, the MGF of X is defined for all t ∈ R. b) Let X be a bounded continuous random variable with a PDF, say, P (|X| ≤ M) = 1. From Exercise 10.19, we can assume that fX (x) = 0 for |x| > M. Therefore, for each t ∈ R, & ∞ & ∞ & & ∞ |etx |fX (x) dx = etx fX (x) dx ≤ e|t||x| fX (x) dx = e|t||x| fX (x) dx −∞

−∞

≤ e|t|M

&

−∞

|x|≤M

|x|≤M

fX (x) dx = e|t|M P (|X| ≤ M) = e|t|M < ∞.

Hence, from the FEF for continuous random variables, etX has finite expectation for all t ∈ R; in other words, the MGF of X is defined for all t ∈ R.

11.1

11-9

Moment Generating Functions

c) We know that the MGF of a bounded random variable is defined for all t ∈ R. Therefore, from Proposition 11.2 on page 635, a bounded random variable has moments of all orders. 11.13 Let X1 , . . . , Xm be independent random variables. We claim that MX1 +···+Xm = MX1 · · · MXm . Proceeding by induction, assume that the result is true for m. Now suppose that X1 , . . . , Xm+1 are independent random variables. From Proposition 6.13 on page 297, the random variables X1 + · · · + Xm and Xm+1 are independent. Hence, from the first displayed equation in Proposition 11.4 and the induction assumption,

as required.

MX1 +···+Xm+1 = M(X1 +···+Xm )+Xm+1 = MX1 +···+Xm MXm+1 $ % = MX1 · · · MXm MXm+1 = MX1 · · · MXm+1 ,

11.14 $ %n a) From Example 11.3 on page 631, we have MXn (t) = pn et + 1 − pn . Set ψn (t) = ln MXn (t). Then, $ % % $ ψn (t) = n ln pn et + 1 − pn = n ln 1 + pn (et − 1) $ % % $ ln 1 + (λ/n)(et − 1) t t = n ln 1 + (λ/n)(e − 1) = λ(e − 1) . (λ/n)(et − 1)

Applying the hint, we conclude that limn→∞ ψn (t) = λ(et − 1), from which the required result immediately follows. t b) Let X ∼ P(λ). From Table 11.1 on page 634, we have MX (t) = eλ(e −1) . Referring now to part (a) and the continuity theorem, we conclude that {Xn }∞ n=1 converges in distribution to X, that is, to a Poisson distribution with parameter λ. c) Let X ∼ P(λ) and let x be a nonnegative integer. The discontinuities of FX occur at the nonnegative integers. Consequently, x − 1/2 and x + 1/2 are continuity points of FX . We have pXn (x) = P (Xn = x) = P (x − 1/2 < Xn ≤ x + 1/2) = FXn (x + 1/2) − FXn (x − 1/2). Referring now to part (b), we conclude that % $ lim pXn (x) = lim FXn (x + 1/2) − FXn (x − 1/2) n→∞

n→∞

= FX (x + 1/2) − FX (x − 1/2) = pX (x) = e−λ

λx . x!

d) From part (c), we have, for large n, pXn (x) ≈ e−λ

λx (npn )x = e−npn , x! x!

x = 0, 1, . . . , n.

Because pn = λ/n, we see that pn is small when n is large. Using p instead of pn in the previous display, we get the informal statement of Proposition 5.7. 11.15 a) We know that E(Xn ) = λn and Var(Xn ) = λn . Therefore, Yn =

/ / Xn − λn = − λn + Xn / λn . √ λn


Applying Proposition 11.1 on page 633 and recalling the MGF of a Poisson random variable, we get that ( / ) % $ t/√λ $ t/√λ √ √ √ % n −1 n −1−t/ λn MYn (t) = e− λn t MXn t/ λn = e− λn t eλn e = eλn e . Taking logarithms gives

( √ / ) ln MYn (t) = λn et/ λn − 1 − t/ λn = t 2

0

√ 1 − 1 − t/ λn . $ √ %2 t/ λn

√ λn

et/

2

Applying the result given in the hint yields limn→∞ ln MYn (t) = t 2 /2. Thus, MYn (t) → et /2 as n → ∞. 2 Now, let Y ∼ N (0, 1). From Example 11.5 on page 633, we know that MY (t) = et /2 . Therefore, from the continuity theorem, {Yn }∞ n=1 converges in distribution to Y , that is, to a standard normal random variable. √ b) Let Y = (X − λ)/ λ. Observing that ) is everywhere continuous, we conclude from part (a) that FY (y) ≈ )(y) for all y ∈ R. Now let x be a nonnegative integer. Then ) ( ) ( pX (x) = P (X = x) = P X ≤ x + 21 − P X ≤ x − 21 0 1 1 0 x + 21 − λ x − 21 − λ =P Y ≤ −P Y ≤ √ √ λ λ 0 0 1 1 x + 21 − λ x − 21 − λ = FY − FY Yn ≤ √ √ λ λ 0 1 0 1 x + 21 − λ x − 21 − λ ≈) −) . √ √ λ λ

Advanced Exercises 11.16 (n) a) From calculus, we have an = MX (0)/n! for n = 0, 1, . . . . Consequently, in view of the moment (n) generation property of MGFs, we have E(Xn ) = MX (0) = n! an for each n ∈ N . b) Applying the geometric series to the MGF of an exponential random variable, we get ∞ ! "n ∞ ! "n # # λ 1 t 1 = = = t n, t < λ. MX (t) = λ−t 1 − t/λ n=0 λ λ n=0 Hence, in view of part (a), we have E(X n ) = n! λ−n for$ all n %∈ N . c) Applying the exponential series to the MGF of a N 0, σ 2 random variable, Y , we get $ 2 2 %n ∞ ∞ # # σ t /2 σ 2n 2n σ 2 t 2 /2 = t . = MY (t) = e n! n! 2n n=0 n=0 Hence, from part (a), $ % σ 2n (2n)! σ 2n E Y 2n = (2n)! = , n! 2n n! 2n

n ∈ N.

11.1

11-11

Moment Generating Functions

$ % $ % Now let X ∼ N µ, σ 2 . Then X − µ ∼ N 0, σ 2 and, so, the previous display shows that the central moments of X are given by $ % (2n)! σ 2n E (X − µ)2n = , n! 2n

n ∈ N.

11.17 As for real-valued random variables, a complex-valued random variable Z has finite expectation if and only of a complex number z. ' if E(|Z|) < ∞, $where, ' itX '% in this case, |z| denotes the modulusitX ' itX ' ' ' ' has finite expectation = 1, we have E e = E(1) = 1 < ∞. Consequently, e Because e for all t ∈ R. 11.18 Analogue of Proposition 11.1: Let X be a random variable and let a and b be real numbers. Then the CHF of the random variable a + bX is given by φa+bX (t) = eiat φX (bt). For the proof, we have ( ) ( ) ( ) φa+bX (t) = E eit (a+bX) = E eiat eitbX = eiat E ei(bt)X = eiat φX (bt). (n)

Analogue of Proposition 11.2: If X has finite nth moment, then E(X n ) = (−i)n φX (0). For the proof, we first note that ! " ( ) ( ) d d ( itX ) ∂ itX ′ φX (t) = φX (t) = E e =E e = E iXeitX = iE XeitX . dt dt ∂t $ % (n) Proceeding inductively, we obtain the relationship φX (t) = i n E X n eitX . In particular, then, ) ( $ % (n) φX (0) = i n E X n ei·0·X = i n E X n , (n)

or, equivalently, E(Xn ) = (−i)n φX (0).

Analogue of Proposition 11.4: If X1 , . . . , Xm are independent random variables, then φX1 +···+Xm = φX1 · · · φXm .

For the proof, we first note that, from (the complex-valued version of) Proposition 6.13 on page 297, eitX1 , . . . , eitXm are also independent random variables. Therefore, for each t ∈ R, ( ) ( ) ( ) φX1 +···+Xm (t) = E eit (X1 +···+Xm ) = E eitX1 +···+itXm = E eitX1 · · · eitXm ( ) ( ) = E eitX1 · · · E eitXm = φX1 (t) · · · φXm (t). In other words, φX1 +···+Xm = φX1 · · · φXm . 11.19 a) Formally, we have

( ) ( ) φX (t) = E eitX = E e(it)X = MX (it).

b) Referring to part (a) and Table 11.1 on page 634, we get the following: ●



If X ∼ B(n, p), then If X ∼ P(λ), then

$ %n φX (t) = MX (it) = peit + 1 − p . φX (t) = MX (it) = eλ

$ it % e −1

.

11-12



CHAPTER 11

Generating Functions and Limit Theorems

If X ∼ G(p), then φX (t) = MX (it) =



If X ∼ U(a, b), then

peit . 1 − (1 − p)eit

φX (t) = MX (it) = ●



If X ∼ E(λ), then

eibt − eiat . (b − a)it

φX (t) = MX (it) =

$ % If X ∼ N µ, σ 2 , then

φX (t) = MX (it) = eµit+σ

λ . λ − it

2 (it)2 /2

= eiµt−σ

2 t 2 /2

.

c) We apply the results from part (b) and the following formula, which we obtained in Exercise 11.18: $ % (n) E Xn = (−i)n φX (0). ●

$ %n X ∼ B(n, p): We have MX (it) = peit + 1 − p . Hence,

and

Therefore, Also, and, hence,



$ %n−1 it %n−1 $ ′ (t) = n peit + 1 − p pe i = inpeit peit + 1 − p φX

$ %n−2 it $ %n−1 ′′ φX (t) = inpeit (n − 1) peit + 1 − p pe i + inpeit i peit + 1 − p %n−2 %n−1 $ $ = −n(n − 1)p2 e2it peit + 1 − p − npeit peit + 1 − p . ′ (0) = −i(inp) = np. E(X) = (−i)1 φX

$ % $ % ′′ (0) = − −n(n − 1)p 2 − np = n(n − 1)p2 + np E X2 = (−i)2 φX

%2 $ % $ Var(X) = E X2 − E(X) = n(n − 1)p2 + np − (np)2 = np(1 − p).

X ∼ P(λ): We have φX (t) = eλ

and Therefore, Also, and, hence,

$ it % e −1

′ (t) = eλ φX

$ it % e −1

. Hence,

λeit i = iλeit eλ

$ it % e −1

= iλeit φX (t)

′′ ′ ′ (t) = iλeit φX (t) + iλeit iφX (t) = iλeit φX (t) − λeit φX (t). φX ′ (0) = −i(iλ) = λ. E(X) = (−i)1 φX

$ % $ % ′′ ′ (0) = − iλφX (0) − λ = − (iλiλ − λ) = λ2 + λ E X2 = (−i)2 φX %2 $ % $ Var(X) = E X2 − E(X) = λ2 + λ − λ2 = λ.

11.1



$ % X ∼ G(p): For convenience, let q = 1 − p. We have φX (t) = peit / 1 − qeit . Hence, % $ % $ 1 − qeit peit i − peit −qeit i ipeit ′ φX (t) = =$ %2 $ %2 1 − qeit 1 − qeit

and

Therefore,

$ $ % % %$ $ % it 2 ipeit i − ipeit · 2 1 − qeit −qeit i it 1 + qeit 1 − qe pe ′′ φX (t) = =− $ $ %4 %3 . 1 − qeit 1 − qeit ′ E(X) = (−i)1 φX (0) = −i ·

Also,

ip 1 = . 2 p p

! " $ 2% p(2 − p) 2−p 2 ′′ E X = (−i) φX (0) = − − = 3 p p2

and, hence,



11-13

Moment Generating Functions

! "2 $ 2% $ %2 2−p 1 1−p Var(X) = E X − E(X) = − = . 2 p p p2 $ %,$ % X ∼ U(a, b): We have φX (t) = eibt − eiat (b − a)it . Here, as one possible method of solution, we expand φX (t) in a power series. First, we note that eibt − eiat =

Consequently,

Hence,

∞ # (ibt)n

n!

n=0



∞ # (iat)n n=0

n!

=

∞ # (it)n (bn − a n ) n=1

$ % $ % ∞ # (it)2 b2 − a 2 (it)3 b3 − a 3 (it)n (bn − a n ) = it (b − a) + + + . 2 6 n! n=4

∞ n−1 n eibt − eiat i(a + b) a 2 + ab + b2 2 # i (b − a n ) n−1 =1+ t− t + t . φX (t) = (b − a)it 2 6 n!(b − a) n=4 ′ (t) = φX

and

∞ n−1 # i(a + b) a 2 + ab + b2 i (n − 1) (bn − a n ) n−2 − t+ t 2 3 n!(b − a) n=4

′′ (t) = − φX

Therefore,

∞ n−1 a 2 + ab + b2 # i (n − 1)(n − 2) (bn − a n ) n−3 . + t 3 n!(b − a) n=4

′ E(X) = (−i)1 φX (0) = −i ·

Also,

and, so,

n!

i(a + b) a+b = . 2 2

! 2 " $ 2% a + ab + b2 a 2 + ab + b2 2 ′′ E X = (−i) φX (0) = − − = 3 3 $ % $ %2 a 2 + ab + b2 Var(X) = E X2 − E(X) = − 3

!

a+b 2

"2

=

(b − a)2 . 12

11-14



CHAPTER 11

Generating Functions and Limit Theorems

X ∼ E(λ): We have φX (t) = λ/(λ − it). Hence, ! ′ φX (t) = λ −

and

′′ (t) φX

Therefore,

" 1 iλ (−i) = (λ − it)2 (λ − it)2

!

" −2 −2λ = iλ (−i) = 3 (λ − it) (λ − it)3

′ (0) = −i · E(X) = (−i)1 φX

Also, E and, hence,

$

X2

%

=

Var(X) = E ●

$

!

2λ =− − 3 λ

′′ (0) (−i)2 φX

X2

%

$

− E(X)

%2

2 = 2− λ

$ % 2 2 X ∼ N µ, σ 2 : We have φX (t) = eiµt−σ t /2 . Hence,

=

2 λ2

! "2 1 1 = 2. λ λ

$ % ′ ′′ (t) = iµ − σ 2 t φX (t) − σ 2 φX (t). φX

Therefore,

and, hence,

"

$ % ′ (t) = iµ − σ 2 t φX (t) φX

and

Also,

iλ 1 = . 2 λ λ

′ (0) = −i · iµ = µ. E(X) = (−i)1 φX

$ % $ % $ % ′′ ′ (0) − σ 2 = − iµ · iµ − σ 2 = µ2 + σ 2 (0) = − iµφX E X2 = (−i)2 φX %2 $ % $ Var(X) = E X2 − E(X) = µ2 + σ 2 − (µ)2 = σ 2 .

11.20 Recall that a standard Cauchy random variable has a PDF given by fX (x) = a) We have φX (t) =

&



−∞

1 $ %, π 1 + x2

eitx fX (x) dx =

&

∞ −∞

x ∈ R. eitx %, π 1 + x2 $

t ∈ R.

,( $ %) 2 π 1 + z . First assume Let C denote the set of complex numbers. Define gt : C → C by gt (z) = that t > 0. Let R > 1 and let CR denote the upper half of the circle of radius R centered at the origin. Then & R & π & $ % gt (z) dz = gt (x) dx + i gt Reiθ Reiθ dθ. CR

−R

0

eitz

11.1

11-15

Moment Generating Functions

Now, for 0 < θ < π, ' ' ' ' ' ' ' itReiθ ' ' itR(cos θ+i sin θ ) ' ' itR cos θ −tR sin θ ' e 'e ' = 'e ' = 'e ' = e−tR sin θ ≤ 1,

where the inequality follows from the fact that t > 0. Also, ' ' ' ' '1 + R 2 e2iθ ' = '1 + R 2 cos(2θ) + iR 2 sin(2θ)' ($ )1/2 %2 $ %2 )1/2 ( = 1 + R 2 cos(2θ) + R 2 sin(2θ) = 1 + 2R 2 cos(2θ) + R 4 ( )1/2 ≥ 1 − 2R 2 + R 4 = R 2 − 1.

Therefore,

and, consequently,

'& ' ' '

π 0

' ' ' itReiθ $ iθ %' '' 1 e ' %' ≤ $ % |gt Re ' = ' $ 2 2iθ 2 ' π R −1 'π 1 + R e

' & $ iθ % iθ ' gt Re Re dθ '' ≤ R

0

π'

$ %' 'gt Reiθ ' dθ ≤

R . −1

R2

As the term on the right of the preceding display converges to 0 as R → ∞, we conclude that & π & & R $ % gt (z) dz = lim gt (x) dx + lim i gt Reiθ Reiθ dθ lim R→∞ CR

R→∞ −R

=

&



−∞

R→∞

0

gt (x) dx + 0 = φX (t).

On the other hand, the function gt is analytic in the upper half plane, except for a simple pole at z = i. The residue there is e−t eitz = . lim (z − i)gt (z) = lim z→i z→i π(i + z) 2πi Hence, by the residue theorem, & e−t gt (z) dz = 2πi · = e−t . 2πi CR

Thus, we have shown that φX (t) = e−t if t > 0. By using an entirely similar argument, but with CR the lower half of the circle of radius R centered at the origin, we can show that φX (t) = et if t < 0. Consequently, φX (t) = e−|t| for all t ∈ R. b) Using the independence of the Xj s and referring to part (a), we get ( ) ( X1 +···+Xn ) ( t ) ¯ n φX¯ n (t) = E eit Xn = E eit = E ei n (X1 +···+Xn ) = φX1 +···+Xn (t/n) ( )n ( )n = φX1 (t/n) · · · φXn (t/n) = e−|t/n| = e−|t|/n = e−|t| .

Hence, from the uniqueness theorem for characteristic functions and part (a), we conclude that X¯ n has the standard Cauchy distribution. c) Let X1 , . . . , Xn represent n independent observations of a random variable X. The long-run-average interpretation of expected value is that, for large n, we have X¯ n ≈ E(X). In particular, this interpretation suggests that X¯ n will approach a single number, namely, E(X), as n increases without bound. Now, from part (b), if X has the standard Cauchy distribution, so does X¯ n for all n. But, then, X¯ n won’t approach a


single number as n increases without bound. This result, however, does not violate the long-run-average interpretation of expected value because, as we discovered in Example 10.5 on page 568, a random variable with the standard Cauchy distribution does not have a mean. d) We recall from Exercise 8.153(b) that, if X ∼ C(η, θ), then Y = (X − η)/θ ∼ C(0, 1). From part (a), we have φY (t) = e−|t| . Referring now to Exercise 11.18, we conclude that φX (t) = φη+θ Y (t) = eiηt φY (θt) = eiηt e−|θ t| = eiηt e−θ|t| = eiηt−θ |t| .

11.2 Joint Moment Generating Functions Basic Exercises 11.21 a) We have pX,Y (x, y) =

+

1/4, 0,

if (x, y) = (0, 0), (1, 0), (0, 1), or (1, 1); otherwise.

Therefore, MX,Y (s, t) = =

## (x,y)

esx+ty pX,Y (x, y) = es·0+t·0 ·

1 1 1 1 + es·1+t·0 · + es·0+t·1 · + es·1+t·1 · 4 4 4 4

% 1 1 s 1 t 1 s+t 1$ + e + e + e = 1 + es + et + es+t . 4 4 4 4 4

b) Referring to part (a), we get

% 1$ s ∂MX,Y (s, t) = e + es+t , ∂s 4 % 1$ s ∂ 2 MX,Y s+t (s, t) = + e e , 4 ∂s 2

% ∂MX,Y 1$ t (s, t) = e + es+t , ∂t 4 % 1$ t ∂ 2 MX,Y s+t (s, t) = + e e , 4 ∂t 2

1 ∂ 2 MX,Y (s, t) = es+t . ∂s∂t 4

Therefore, from Proposition 11.6 on page 643, E(X) =

1 1 ∂MX,Y (0, 0) = · 2 = , ∂s 4 2

$ % ∂ 2 MX,Y 1 1 (0, 0) = · 2 = , E X2 = 4 2 ∂s 2 E(XY ) =

E(Y ) =

∂MX,Y 1 1 (0, 0) = · 2 = , ∂t 4 2

$ % ∂ 2 MX,Y 1 1 E Y2 = (0, 0) = · 2 = , 4 2 ∂t 2

1 1 ∂ 2 MX,Y (0, 0) = · 1 = . ∂s∂t 4 4

11.2

11-17

Joint Moment Generating Functions

Also, ! "2 1 1 = , 2 4 ! "2 $ % $ %2 1 1 1 Var(Y ) = E Y 2 − E(Y ) = − = , 2 2 4

%2 $ % $ 1 Var(X) = E X2 − E(X) = − 2

Cov(X, Y ) = E(XY ) − E(X) E(Y ) =

1 1 1 − · = 0. 4 2 2

c) From Proposition 11.7 on page 645 and part (a), MX (s) = MX,Y (s, 0) =

and

MY (t) = MX,Y (0, t) = d) From parts (a) and (c), MX,Y (s, t) =

% 1$ % 1$ 1 + es + 1 + es = 1 + es 4 2

% 1$ % 1$ 1 + 1 + et + et = 1 + et . 4 2

% 1$ 1 + es + et + es+t = 4

!

"! " % % 1$ 1$ 1 + es 1 + et = MX (s)MY (t). 2 2

Hence, from Proposition 11.9 on page 646, X and Y are independent random variables. e) From part (c) and Table 11.1 on page 634, we see that both X and Y have the B(1, 1/2) distribution. Equivalently, they both have the Bernoulli distribution with parameter 1/2 or the discrete uniform distribution on {0, 1}. 11.22 a) We have fX,Y (x, y) = 1 if 0 < x < 1 and 0 < y < 1, and fX,Y (x, y) = 0 otherwise. Therefore, if s ̸= 0 and t ̸= 0, MX,Y (s, t) = = =

&



&



e

sx+ty

−∞ −∞

&

1

e

0

!

sx

0&

et − 1 t

1 0

"!

fX,Y (x, y) dx dy =

ty

1

e dy dx = es − 1 s

"

=

&

0

1

e

sx

!

&

0

1& 1 0

et − 1 t

esx+ty · 1 dx dy

"

dx =

!

et − 1 t

es+t − es − et + 1 . st

The other three cases are handled similarly, and we get ⎧ s+t − es − et + 1 e ⎪ ⎪ , if s ̸= 0 and t ⎪ ⎪ ⎪ st ⎪ ⎪ ⎪ ⎨ es − 1 , if s ̸= 0 and t MX,Y (s, t) = s ⎪ ⎪ ⎪ et − 1 ⎪ ⎪ , if s = 0 and t ⎪ ⎪ ⎪ ⎩ t 1, if s = 0 and t

̸= 0; = 0; ̸= 0; = 0.

"&

1 0

esx dx

11-18

CHAPTER 11

Generating Functions and Limit Theorems

b) There are several ways to solve this problem. We will use the most straightforward approach. From part (a), we have, for s ̸= 0, MX,Y (s, 0) − MX,Y (0, 0) =

es − 1 es − 1 − s −1= . s s

Therefore, from Proposition 11.6 on page 643, E(X) =

MX,Y (s, 0) − MX,Y (0, 0) es − 1 − s 1 ∂MX,Y (0, 0) = lim = lim = , 2 s→0 s→0 ∂s s 2 s

where the last equality follows by two applications of L’Hôpital’s rule. By symmetry, E(Y ) = 1/2. Again referring to part (a), we get, for s ̸= 0, ∂MX,Y ses − es + 1 1 2ses − 2es + 2 − s 2 ∂MX,Y (s, 0) − (0, 0) = − = . ∂s ∂s 2 s2 2s 2 Hence, from Proposition 11.6,

$ % ∂ 2 MX,Y (0, 0) = E X2 = ∂s 2



!

∂MX,Y ∂s ∂s

"

∂MX,Y ∂MX,Y (s, 0) − (0, 0) ∂s (0, 0) = lim ∂s s→0 s

2ses − 2es + 2 − s 2 1 = , 3 s→0 3 2s

= lim

where the last equality follows by three applications of L’Hôpital’s rule. Thus, %2 $ % $ 1 Var(X) = E X 2 − E(X) = − 3

! "2 1 1 = . 2 12

By symmetry, Var(Y ) = 1/12. Again referring to part (a), we get, for s ̸= 0 and t ̸= 0,

es+t − es − et + 1 es − 1 − st s ! " ! " es − 1 et − 1 − t es − 1 et − 1 −1 = . = s t s t

MX,Y (s, t) − MX,Y (s, 0) =

Therefore, ∂MX,Y es − 1 MX,Y (s, t) − MX,Y (s, 0) et − 1 − t es − 1 (s, 0) = lim = lim , = t→0 ∂t t s t→0 2s t2 where the the last equality follows by two applications of L’Hôpital’s rule. Consequently, es − 1 − s ∂MX,Y es − 1 1 ∂MX,Y − = (s, 0) − (0, 0) = 2s 2 2s ∂t ∂t

11.2

Joint Moment Generating Functions

11-19

and, hence, from Proposition 11.6,

∂ 2 MX,Y E(XY ) = (0, 0) = ∂s∂t



!

∂MX,Y ∂t ∂s

"

∂MX,Y ∂MX,Y (s, 0) − (0, 0) ∂t ∂t (0, 0) = lim s→0 s

es − 1 − s 1 = , 2 s→0 4 2s

= lim

where the the last equality follows by two applications of L’Hôpital’s rule. Thus, Cov(X, Y ) = E(XY ) − E(X) E(Y ) =

1 1 1 − · = 0. 4 2 2

c) Referring to Proposition 11.7 on page 645 and part (a), we get +$ % s − 1 /s, e MX (s) = MX,Y (s, 0) = 1, and +$ % t MY (t) = MX,Y (0, t) = e − 1 /t, 1,

if s = ̸ 0; if s = 0. if t = ̸ 0; if t = 0.

d) From parts (c) and (a), +$ +$ % % s − 1 /s, if s ̸= 0; t − 1 /t, if t ̸= 0; e e MX (s)MY (t) = · 1, if s = 0. 1, if t = 0. " ! t " ⎧! s e −1 e −1 ⎪ ⎪ · , if s ̸= 0 and t ̸= 0; ⎪ ⎪ s t ⎪ ⎪ ⎪ ⎪ ⎨ es − 1 , if s ̸= 0 and t = 0; = s ⎪ ⎪ ⎪ et − 1 ⎪ ⎪ ⎪ , if s = 0 and t ̸= 0; ⎪ ⎪ t ⎩ 1, if s = 0 and t = 0. ⎧ s+t e − es − et + 1 ⎪ ⎪ , if s ̸= 0 and t ̸= 0; ⎪ ⎪ ⎪ st ⎪ ⎪ ⎪ ⎨ es − 1 , if s ̸= 0 and t = 0; = s ⎪ ⎪ ⎪ et − 1 ⎪ ⎪ , if s = 0 and t ̸= 0; ⎪ ⎪ ⎪ ⎩ t 1, if s = 0 and t = 0. = MX,Y (s, t).

Therefore, from Proposition 11.9 on page 646, X and Y are independent random variables. e) From part (c) and Table 11.1 on page 634, we see that both X and Y have the U(0, 1) distribution. 11.23 Let X and Y be as in Exercise 11.22. From parts (d) and (e) of that exercise, we know that X and Y are independent U(0, 1) random variables. Consequently, from Example 9.19, X + Y ∼ T (0, 2).


However, from Proposition 11.10 on page 647 and part (a) of Exercise 11.22, we have, for t ̸= 0, et+t − et − et − 1 e2t − 2et − 1 = t ·t t2 " ! t "! t 2et 2e e + e−t − 1 = = (cosh t − 1). 2 t2 t2

MX+Y (t) = MX,Y (t, t) =

Hence, the MGF of a triangular distribution on the interval (0, 2) is ⎧ t ⎨ 2e (cosh t − 1), if t ̸= 0; M(t) = t2 ⎩ 1, if t = 0.

11.24 a) We have $ %−1 ∂MX,Y ∂ ln MX,Y 1 ∂MX,Y ∂ψ (s, t) = (s, t) = · (s, t) = MX,Y (s, t) (s, t). ∂t ∂t MX,Y (s, t) ∂t ∂t

Consequently,

$ %−1 ∂ 2 MX,Y $ %−2 ∂MX,Y ∂ 2ψ ∂MX,Y (s, t) = MX,Y (s, t) (s, t) − MX,Y (s, t) (s, t) (s, t). ∂s∂t ∂s∂t ∂s ∂t Setting s = t = 0, we conclude, in view of Proposition 11.6 on page 643, that $ %−1 ∂ 2 MX,Y $ %−2 ∂MX,Y ∂MX,Y ∂ 2ψ (0, 0) = MX,Y (0, 0) (0, 0) − MX,Y (0, 0) (0, 0) (0, 0) ∂s∂t ∂s∂t ∂s ∂t = 1−1 E(XY ) − 1−2 E(X) E(Y ) = E(XY ) − E(X) E(Y ) = Cov(X, Y ) .

b) From Example 11.10 on page 642, we see that

ψ(s, t) = ln MX,Y (s, t) = µX s + µY t + From this result, we get, successively,

and

) 1( 2 2 σX s + 2ρσX σY st + σY2 t 2 . 2

) 1( ∂ψ (s, t) = µY + 2ρσX σY s + 2σY2 t = µY + ρσX σY s + σY2 t ∂t 2 !

∂ψ ∂ 2 ∂ ψ ∂t (s, t) = ∂s∂t ∂s

"

(s, t) = ρσX σY = Cov(X, Y ) .

11.25 a) From Proposition 11.7 on page 645,

$ %n $ %n MX (s) = MX,Y (s, 0) = pes + qe0 = pes + 1 − p .

Referring now to Table 11.1 on page 634, we see that X ∼ B(n, p). Similarly, we find that Y ∼ B(n, q). b) From part (a), we see that $ %n $ %n %n $ MX (s)MY (t) = pes + q p + qet = (pes + q)(p + qet ) $ %n $ %n = p2 es + pqes+t + pq + q 2 et ̸= pes + qet = MX,Y (s, t). Hence, from Proposition 11.9 on page 646, X and Y are not independent random variables.

11.2

11-21

Joint Moment Generating Functions

c) Recalling that p + q = 1, we find, in view of Proposition 11.10 on page 647, that $ %n $ %n MX+Y (t) = MX,Y (t, t) = pet + qet = et = ent .

However, ent is the MGF of a random variable that equals n with probability 1. Thus, by the uniqueness property of MGFs, P (X + Y = n) = 1. In other words, X + Y is a discrete random variable with pX+Y (n) = 1 and pX+Y (z) = 0 if z ̸= n. 11.26 a) Let a and b be two real numbers, not both 0. Referring to Equation (11.15) on page 642, we get ( ) ( ) MaX+bY (t) = E et (aX+bY ) = E e(at)X+(bt)Y = MX,Y (at, bt) 1

= eµX (at)+µY (bt)+ 2 = e(aµX +bµY )t+

$ 2 % σX (at)2 +2ρσX σY (at)(bt)+σY2 (bt)2

$ 2 2 % a σX +2abρσX σY +b2 σY2 t 2 /2

.

Referring now to Table 11.1 on page 634, we conclude that ) ( aX + bY ∼ N aµX + bµY , a 2 σX2 + 2abρσX σY + b2 σY2 .

b) We know that the mean and variance of a normal random variable are its µ and σ 2 parameters, respectively. Hence, from part (a), E(aX + bY ) = aµX + bµY

Var(aX + bY ) = a 2 σX2 + 2abρσX σY + b2 σY2 .

and

11.27 a) We have MX,Y (s, t) = =

&



&



−∞ −∞

&&

e

esx+ty fX,Y (x, y) dx dy

sx+ty

(x − y)e

−(x+y)

&&

dx dy +

esx+ty (y − x)e−(x+y) dx dy.

y>x>0

x>y>0

Upon making the substitution u = x − y and using the fact that the mean of an exponential distribution is the reciprocal of its parameter, we get !& ∞ " & ∞ && esx+ty (x − y)e−(x+y) dx dy = e−(1−t)y e−(1−s)x (x − y) dx dy 0

x>y>0

=

&

y



e

0

1 = 1−s

Similarly, we find that

−(1−t)y

&

∞ 0

1 = (1 − s)2 &&

y>x>0

&

e

!&

ue

0

−(2−s−t)y

∞ 0



−(1−s)(u+y)

!&

∞ 0

u(1 − s)e

e−(2−s−t)y dy =

esx+ty (y − x)e−(x+y) dx dy =

" du dy

1 (1 − s)2 (2 − s

1 (1 − t)2 (2 − s

−(1−s)u

− t)

.

" du dy − t)

.

11-22

CHAPTER 11

Generating Functions and Limit Theorems

Hence, 1 MX,Y (s, t) = 2−s−t

!

1 1 + 2 (1 − s) (1 − t)2

"

,

s, t < 1.

b) From part (a), we find that

and

( ) ( ) ∂MX,Y (s, t) = (2 − s − t)−1 2(1 − s)−3 + (2 − s − t)−2 (1 − s)−2 + (1 − t)−2 , ∂s ) ( ) ( ∂ 2 MX,Y −1 −4 −2 −3 + (2 − s − t) 2(1 − s) (s, t) = (2 − s − t) 6(1 − s) ∂s 2 ( ) ( ) + (2 − s − t)−2 2(1 − s)−3 + 2(2 − s − t)−3 (1 − s)−2 + (1 − t)−2 , ( ) ( ) ∂ 2 MX,Y (s, t) = (2 − s − t)−2 2(1 − s)−3 + (2 − s − t)−2 2(1 − t)−3 ∂t∂s ( ) + 2(2 − s − t)−3 (1 − s)−2 + (1 − t)−2 .

Therefore, from Proposition 11.6 on page 643, E(X) =

and

( ) 3 ∂MX,Y (0, 0) = 2−1 · 2 · 1−3 + 2−2 1−2 + 1−2 = , ∂s 2

( ) 9 $ % ∂ 2 MX,Y −1 −4 −2 −3 −3 −2 −2 (0, 0) = 2 · 6 · 1 + 2 · 2 · 2 · 1 + 2 · 2 1 + 1 = , E X2 = 2 ∂s 2 E(XY ) =

( ) 3 ∂ 2 MX,Y (0, 0) = 2−2 · 2 · 1−3 + 2−2 · 2 · 1−3 + 2 · 2−3 1−2 + 1−2 = . 2 ∂t∂s

By symmetry, E(Y ) = 3/2. Also, Var(X) = E and

$

X2

%

9 − (E(X)) = − 2 2

Cov(X, Y ) = E(XY ) − E(X) E(Y ) =

! "2 3 9 = 2 4

3 3 3 3 − · =− . 2 2 2 4

By symmetry, Var(Y ) = 9/4. c) From Proposition 11.7 on page 645 and part (a), MX (s) = MX,Y (s, 0) = and

1 2−s

1 MY (t) = MX,Y (0, t) = 2−t

! 1+ !

1 (1 − s)2

1 1+ (1 − t)2

"

"

,

s < 1,

,

t < 1.

11.2

Joint Moment Generating Functions

11-23

d) From parts (a) and (c), " ! " 1 1 1 · 1+ 2−t (1 − s)2 (1 − t)2 "! " ! 1 1 1 1 + = 1+ (2 − s)(2 − t) (1 − s)2 (1 − t)2 " ! 1 1 1 ̸= + = MX,Y (s, t). 2 2 − s − t (1 − s) (1 − t)2

1 MX (s)MY (t) = 2−s

! 1+

Therefore, from Proposition 11.9 on page 646, X and Y are not independent random variables. e) From Proposition 11.10 on page 647 and part (a), ! " 1 1 1 1 MX+Y (t) = MX,Y (t, t) = + = . 2 2 2 − t − t (1 − t) (1 − t) (1 − t)3

Referring now to Table 11.1 on page 634, we see that the MGF of X + Y is that of a "(3, 1) distribution. Hence, by the uniqueness property of MGFs, we conclude that X + Y ∼ "(3, 1).

Theory Exercises 11.28 a) We have b) We have

c) We have

( ) ( ) MaX+bY (t) = E et (aX+bY ) = E e(at)X+(bt)Y = MX,Y (at, bt).

( ) ( ) Ma+bX,c+dY (s, t) = E es(a+bX)+t (c+dY ) = E eas+ct e(bs)X+(dt)Y ( ) = eas+ct E e(bs)X+(dt)Y = eas+ct MX,Y (bs, dt). ( ) ( ) MaX+bY,cX+dY (s, t) = E es(aX+bY )+t (cX+dY ) = E e(as+ct)X+(bs+dt)Y = MX,Y (as + ct, bs + dt).

11.29 Analogue of Proposition 11.6: Suppose that the joint MGF of the random variables X1 , . . . , Xm is defined in some open m-dimensional rectangle containing the origin. Then X1 , . . . , Xm have individual and joint moments of all orders, and we have ( ) ∂ j1 +···+jm M X1 ,...,Xm j j E X11 · · · Xmm = (0, . . . , 0), j1 j ∂t1 · · · ∂tmm for each choice of nonnegative integers jk (1 ≤ k ≤ m).

Analogue of Proposition 11.7: Let X1 , . . . , Xm be random variables defined on the same sample space. Then MXj (tj ) = MX1 ,...,Xm (0, . . . , 0, tj , 0, . . . , 0), 1 ≤ j ≤ m.

Thus, for each 1 ≤ j ≤ m, the marginal MGF of Xj is obtained by setting tk = 0 for k ̸= j in the joint MGF.


Analogue of Proposition 11.8: If the joint MGF of X1 , . . . , Xm equals the joint MGF of Y1 , . . . , Ym in some open m-dimensional rectangle containing the origin, then X1 , . . . , Xm have the same joint probability distribution as Y1 , . . . , Ym . Analogue of Proposition 11.9: Let X1 , . . . , Xm be random variables defined on the same sample space. Then X1 , . . . , Xm are independent if and only if MX1 ,...,Xm (t1 , . . . , tm ) = MX1 (t1 ) · · · MXm (tm ). In words, m random variables are independent if and only if their joint MGF equals the product of their marginal MGFs. Analogue of Proposition 11.10: Let X1 , . . . , Xm be random variables defined on the same sample space. Then MX1 +···+Xm (t) = MX1 ,...,Xm (t, . . . , t). 11.30 The joint MGF of Xk1 , . . . , Xkj is obtained by setting ti = 0 for all i ̸∈ {k1 , . . . , kj } in the joint MGF of X1 , . . . , Xm .

Advanced Exercises 11.31 a) From Exercise 11.28(c) and Equation (11.15) on page 642, 1

MX+Y,X−Y (s, t) = MX,Y (s + t, s − t) = eµX (s+t)+µY (s−t)+ 2 1

= e(µX +µY )s+(µX −µY )t+ 2

$ 2 % σX (s+t)2 +2ρσX σY (s+t)(s−t)+σY2 (s−t)2

$ 2 % (σX +2ρσX σY +σY2 )s 2 +2(σX2 −σY2 )st+(σX2 −2ρσX σY +σY2 )t 2

.

Thus, again referring to Equation (11.15), we see that

where

( ) (X, Y ) ∼ BVN µX + µY , σX2 + 2ρσX σY + σY2 , µX − µY , σX2 − 2ρσX σY + σY2 , ρ ∗ , σX2 − σY2 ρ ∗ = ρ(X + Y, X − Y ) = 6$ %$ %. σX2 + 2ρσX σY + σY2 σX2 − 2ρσX σY + σY2

b) From part (a) and Proposition 10.9 on page 613, X + Y and X − Y are independent if and only if they are uncorrelated. Hence, X + Y and X − Y are independent if and only if ρ ∗ = 0, that is, if and only if Var(X) = Var(Y ). 11.32 Recall from Proposition 6.7 on page 277 that pX1 ,...,Xm (x1 , . . . , xm ) =

!

" n xm px1 · · · pm , x1 , . . . , xm 1

if x1 , . . . , xm are nonnegative integers whose sum is n, and pX1 ,...,Xm (x1 , . . . , xm ) = 0 otherwise.

11.2

Joint Moment Generating Functions

11-25

a) We have MX1 ,...,Xm (t1 , . . . , tm ) = =

#

···

#

et1 x1 +···+tm xm pX1 ,...,Xm (x1 , . . . , xm )

(x1 ,...,xm )

#

···

#

e

t1 x1 +···+tm xm

x1 +···+xm =n

" n xm px1 · · · pm x1 , . . . , xm 1

" $ %x $ %x n = ··· p1 et1 1 · · · pm etm m x1 +···+xm =n x1 , . . . , xm $ %n = p1 et1 + · · · + pm etm , #

#!

!

where the last equality follows from the multinomial theorem, as presented in Exercise 3.49 on page 109. b) From Exercise 11.29, we have, for each 1 ≤ j ≤ m, 0 1n # $ %n tj MXj (tj ) = MX1 ,...,Xm (0, . . . , 0, tj , 0, . . . , 0) = pj e + pk = pj etj + 1 − pj . k̸=j

Hence, from Table 11.1, we see that Xj ∼ B(n, pj ). c) We have, for each 1 ≤ j ≤ m,

$ %n−1 ∂MX1 ,...,Xm (t1 , . . . , tm ) = n p1 et1 + · · · + pm etm pj etj ∂tj

and, for k ̸= ℓ,

$ %n−2 ∂ 2 MX1 ,...,Xm (t1 , . . . , tm ) = n(n − 1) p1 et1 + · · · + pm etm pk etk pℓ etℓ . ∂tk ∂tℓ

Hence, from Exercise 11.29, $ % ∂MX1 ,...,Xm E Xj = (0, . . . , 0) = n · 1n−1 · pj · 1 = npj ∂tj and, for k ̸= ℓ,

E(Xk Xℓ ) =

∂ 2 MX1 ,...,Xm (0, . . . , 0) = n(n − 1) · 1n−2 pk · 1 · pℓ · 1 = n(n − 1)pk pℓ . ∂tk ∂tℓ

Consequently, Cov(Xk , Xℓ ) = E(Xk Xℓ ) − E(Xk ) E(Xℓ ) = n(n − 1)pk pℓ − npk · npℓ = −npk pℓ , which agrees with Equation (∗) on page 375. d) If k $= ℓ,% we know that ρ(Xk , Xℓ ) = 1. We can use either Exercise 11.29 or part (b) to show that Var Xj = npj (1 − pj ). Therefore, from part (c), if k ̸= ℓ, 7 Cov(Xk , Xℓ ) pk pℓ −npk pℓ ρ(Xk , Xℓ ) = √ . =√ =− (1 − pk )(1 − pℓ ) npk (1 − pk ) · npℓ (1 − pℓ ) Var(Xk ) · Var(Xℓ ) 11.33 The result of this exercise is needed for the proof of Proposition 11.9 on page 646. First we show that X and U have the same marginal distributions: P (X ∈ A) = P (X ∈ A, Y ∈ R) = P (U ∈ A, V ∈ R) = P (U ∈ A).

11-26

CHAPTER 11

Generating Functions and Limit Theorems

11.33 The result of this exercise is needed for the proof of Proposition 11.9 on page 646. First we show that X and U have the same marginal distributions:
P(X ∈ A) = P(X ∈ A, Y ∈ R) = P(U ∈ A, V ∈ R) = P(U ∈ A).
Likewise, Y and V have the same marginal distributions. Using the independence of U and V, we get
P(X ∈ A, Y ∈ B) = P(U ∈ A, V ∈ B) = P(U ∈ A)P(V ∈ B) = P(X ∈ A)P(Y ∈ B).
Thus, X and Y are independent random variables.

11.34
a) To begin, we note that
µ_X s + µ_Y t = [µ_X  µ_Y][s  t]′ = µ′t
and
σ_X² s² + 2ρσ_Xσ_Y st + σ_Y² t² = Var(X) s² + 2 Cov(X, Y) st + Var(Y) t²
= [s  t][Var(X)  Cov(X, Y); Cov(Y, X)  Var(Y)][s  t]′ = t′Σt.
Therefore, in view of Equation (11.15) on page 642,
M_{X,Y}(s, t) = e^{µ_X s + µ_Y t + ½(σ_X² s² + 2ρσ_Xσ_Y st + σ_Y² t²)} = e^{µ′t + ½ t′Σt}.
b) Let X be a normal random variable, say, X ∼ N(µ, σ²). Define the following 1 × 1 vectors and matrix: t = [t], µ = [µ] = [µ_X], and Σ = [σ²] = [Var(X)]. Then
M_X(t) = e^{µt + σ²t²/2} = e^{tµ_X + ½ t·Var(X)·t} = e^{µ′t + ½ t′Σt}.
c) Define Σ = [Cov(Xi, Xj)], an m × m matrix, and let t = (t1, …, tm)′ and µ = (µ_{X1}, …, µ_{Xm})′. Then, in view of parts (a) and (b), an educated guess for the matrix form of the joint MGF of an m-variate normal distribution is
M_{X1,…,Xm}(t1, …, tm) = e^{µ′t + ½ t′Σt}.

11.35 As for real-valued random variables, a complex-valued random variable Z has finite expectation if and only if E(|Z|) < ∞, where, in this case, |z| denotes the modulus of a complex number z. Because |e^{i(sX+tY)}| = 1, we have E(|e^{i(sX+tY)}|) = E(1) = 1 < ∞. Consequently, e^{i(sX+tY)} has finite expectation for all s, t ∈ R.

11.36 Analogue of Proposition 11.6: If (X, Y) has a joint moment of order (j, k), then
E(X^j Y^k) = (−i)^{j+k} ∂^{j+k}φ_{X,Y}/∂s^j ∂t^k (0, 0).


For the proof, we note that, by interchanging the order of differentiation and expectation, we get
∂^{j+k}φ_{X,Y}/∂s^j ∂t^k (s, t) = ∂^{j+k} E(e^{i(sX+tY)})/∂s^j ∂t^k = E(∂^{j+k} e^{i(sX+tY)}/∂s^j ∂t^k)
= E(i^{j+k} X^j Y^k e^{i(sX+tY)}) = i^{j+k} E(X^j Y^k e^{i(sX+tY)}).
Setting s = t = 0, we now get the required result.

Analogue of Proposition 11.7: Let X and Y be random variables defined on the same sample space. Then
φ_X(s) = φ_{X,Y}(s, 0)  and  φ_Y(t) = φ_{X,Y}(0, t).
Thus, the marginal CHF of X is obtained by setting t = 0 in the joint CHF, and the marginal CHF of Y is obtained by setting s = 0 in the joint CHF. For the proof, we have
φ_X(s) = E(e^{isX}) = E(e^{i(sX + 0·Y)}) = φ_{X,Y}(s, 0).
Arguing similarly, we find that φ_Y(t) = φ_{X,Y}(0, t).

Analogue of Proposition 11.9: Let X and Y be random variables defined on the same sample space. Then X and Y are independent if and only if
φ_{X,Y}(s, t) = φ_X(s)φ_Y(t),  s, t ∈ R.  (∗)
In words, two random variables are independent if and only if their joint CHF equals the product of their marginal CHFs. For the proof, first suppose that X and Y are independent random variables. Proposition 6.9 on page 291 implies that e^{isX} and e^{itY} are also independent random variables for each s, t ∈ R. Now using the fact that, for independent random variables, the expected value of the product equals the product of the expected values, we conclude that
φ_{X,Y}(s, t) = E(e^{i(sX+tY)}) = E(e^{isX} e^{itY}) = E(e^{isX}) E(e^{itY}) = φ_X(s)φ_Y(t).
Hence, Equation (∗) holds. Conversely, suppose that Equation (∗) holds. Let U and V be independent random variables with the same (marginal) probability distributions as X and Y, respectively. Then, by Equation (∗) and what we just proved,
φ_{X,Y}(s, t) = φ_X(s)φ_Y(t) = φ_U(s)φ_V(t) = φ_{U,V}(s, t).
Hence, the joint CHF of X and Y is the same as that of U and V. Therefore, by the uniqueness property of joint CHFs, the joint probability distribution of X and Y is the same as that of U and V. Because U and V are independent random variables, it now follows from Exercise 11.33 that X and Y are also independent random variables.

Analogue of Proposition 11.10: Let X and Y be random variables defined on the same sample space. Then φ_{X+Y}(t) = φ_{X,Y}(t, t). For the proof, we have
φ_{X+Y}(t) = E(e^{it(X+Y)}) = E(e^{i(tX+tY)}) = φ_{X,Y}(t, t).

11.37
a) Formally,
φ_{X,Y}(s, t) = E(e^{i(sX+tY)}) = E(e^{isX+itY}) = E(e^{(is)X+(it)Y}) = M_{X,Y}(is, it).


b) From part (a) and Equation (11.15) on page 642,
φ_{X,Y}(s, t) = M_{X,Y}(is, it) = e^{µ_X(is) + µ_Y(it) + ½[σ_X²(is)² + 2ρσ_Xσ_Y(is)(it) + σ_Y²(it)²]}
= e^{i(µ_X s + µ_Y t) − ½(σ_X² s² + 2ρσ_Xσ_Y st + σ_Y² t²)}.
c) Applying elementary calculus and the result from part (b), we get
∂φ_{X,Y}/∂s (s, t) = [iµ_X − ½(2σ_X² s + 2ρσ_Xσ_Y t)] φ_{X,Y}(s, t);
∂φ_{X,Y}/∂t (s, t) = [iµ_Y − ½(2ρσ_Xσ_Y s + 2σ_Y² t)] φ_{X,Y}(s, t);
∂²φ_{X,Y}/∂s² (s, t) = {[iµ_X − ½(2σ_X² s + 2ρσ_Xσ_Y t)]² − σ_X²} φ_{X,Y}(s, t);
∂²φ_{X,Y}/∂t² (s, t) = {[iµ_Y − ½(2ρσ_Xσ_Y s + 2σ_Y² t)]² − σ_Y²} φ_{X,Y}(s, t);
∂²φ_{X,Y}/∂s∂t (s, t) = {[iµ_Y − ½(2ρσ_Xσ_Y s + 2σ_Y² t)][iµ_X − ½(2σ_X² s + 2ρσ_Xσ_Y t)] − ρσ_Xσ_Y} φ_{X,Y}(s, t).
Referring to Exercise 11.36 and noting that φ_{X,Y}(0, 0) = 1, we conclude that
E(X) = E(X¹Y⁰) = (−i)^{1+0} ∂φ_{X,Y}/∂s (0, 0) = (−i) · iµ_X = µ_X;
E(Y) = E(X⁰Y¹) = (−i)^{0+1} ∂φ_{X,Y}/∂t (0, 0) = (−i) · iµ_Y = µ_Y;
E(X²) = (−i)^{2+0} ∂²φ_{X,Y}/∂s² (0, 0) = (−1)[(iµ_X)² − σ_X²] = µ_X² + σ_X²;
E(Y²) = (−i)^{0+2} ∂²φ_{X,Y}/∂t² (0, 0) = (−1)[(iµ_Y)² − σ_Y²] = µ_Y² + σ_Y²;
E(XY) = (−i)^{1+1} ∂²φ_{X,Y}/∂s∂t (0, 0) = (−1)[(iµ_Y)(iµ_X) − ρσ_Xσ_Y] = µ_Xµ_Y + ρσ_Xσ_Y.
From these equations, E(X) = µ_X, E(Y) = µ_Y, and
Var(X) = E(X²) − (E(X))² = µ_X² + σ_X² − µ_X² = σ_X²,
Var(Y) = E(Y²) − (E(Y))² = µ_Y² + σ_Y² − µ_Y² = σ_Y²,
Cov(X, Y) = E(XY) − E(X)E(Y) = µ_Xµ_Y + ρσ_Xσ_Y − µ_Xµ_Y = ρσ_Xσ_Y.


11.38
a) The joint CHF of X1, …, Xm is defined by
φ_{X1,…,Xm}(t1, …, tm) = E(e^{i(t1 X1 + ⋯ + tm Xm)}),  t1, …, tm ∈ R.
b) Analogue of Proposition 11.6: If (X1, …, Xm) has a joint moment of order (j1, …, jm), then
E(X1^{j1} ⋯ Xm^{jm}) = (−i)^{j1+⋯+jm} ∂^{j1+⋯+jm}φ_{X1,…,Xm}/∂t1^{j1} ⋯ ∂tm^{jm} (0, …, 0).
Analogue of Proposition 11.7: Let X1, …, Xm be random variables defined on the same sample space. Then
φ_{Xj}(tj) = φ_{X1,…,Xm}(0, …, 0, tj, 0, …, 0),  1 ≤ j ≤ m.
Thus, for each 1 ≤ j ≤ m, the marginal CHF of Xj is obtained by setting tk = 0 for k ≠ j in the joint CHF.
Analogue of Proposition 11.8: If the joint CHF of X1, …, Xm equals the joint CHF of Y1, …, Ym, then X1, …, Xm have the same joint probability distribution as Y1, …, Ym.
Analogue of Proposition 11.9: Let X1, …, Xm be random variables defined on the same sample space. Then X1, …, Xm are independent if and only if
φ_{X1,…,Xm}(t1, …, tm) = φ_{X1}(t1) ⋯ φ_{Xm}(tm).
In words, m random variables are independent if and only if their joint CHF equals the product of their marginal CHFs.
Analogue of Proposition 11.10: Let X1, …, Xm be random variables defined on the same sample space. Then
φ_{X1+⋯+Xm}(t) = φ_{X1,…,Xm}(t, …, t).

11.3 Laws of Large Numbers

Basic Exercises

11.39 In this context, the strong law of large numbers states that lim_{n→∞} (X1 + ⋯ + Xn)/n = 1/2 with probability 1, not that the limit always equals 1/2. In other words, the outcomes for which the average value doesn't converge to 1/2 constitute an event that has zero probability.

11.40 For each j ∈ N, let Yj = Xj^r. As X1, X2, … are independent random variables, all having the same probability distribution as X, we have that Y1, Y2, … are independent random variables, all having the same probability distribution as X^r. By assumption, X^r has finite mean. Because E(Yj) = E(X^r) for all j ∈ N, the weak law of large numbers implies that
lim_{n→∞} P(|(X1^r + ⋯ + Xn^r)/n − E(X^r)| < ϵ) = lim_{n→∞} P(|(Y1 + ⋯ + Yn)/n − E(X^r)| < ϵ) = 1,
for each ϵ > 0. In other words, the rth sample moment is a consistent estimator of the rth moment.

11.41
a) For n ∈ N,
Yn = s ∏_{j=1}^{n} (Yj/Y_{j−1}) = s ∏_{j=1}^{n} Xj.
Consequently,
ln (Yn/s)^{1/n} = (1/n) ln(Yn/s) = (1/n) ln ∏_{j=1}^{n} Xj = (1/n) ∑_{j=1}^{n} ln Xj = (ln X1 + ⋯ + ln Xn)/n.
Applying the strong law of large numbers, Theorem 11.2 on page 654, to the sequence of independent and identically distributed random variables ln X1, ln X2, …, we get
lim_{n→∞} ln (Yn/s)^{1/n} = lim_{n→∞} (ln X1 + ⋯ + ln Xn)/n = E(ln X),
with probability 1. Consequently, with probability 1,
lim_{n→∞} (Yn/s)^{1/n} = lim_{n→∞} e^{ln (Yn/s)^{1/n}} = e^{E(ln X)}.
b) From part (a), we have, for large n, that (Yn/s)^{1/n} ≈ e^{E(ln X)} or, equivalently,
Yn = s[(Yn/s)^{1/n}]^n ≈ s(e^{E(ln X)})^n = s e^{nE(ln X)}.
c) We consider each distribution given for X in turn.
● X ∼ U(0, 1): In this case,
E(ln X) = ∫₀¹ (ln x) · 1 dx = ∫₀¹ ln x dx = −1.
Hence, with probability 1, lim_{n→∞} (Yn/s)^{1/n} = e^{−1}, and we have Yn ≈ se^{−n} for large n.
● X ∼ Beta(2, 1): In this case,
E(ln X) = ∫₀¹ (ln x) · 2x^{2−1}(1 − x)^{1−1} dx = 2 ∫₀¹ x ln x dx = 2 · (−1/4) = −1/2.
Hence, with probability 1, lim_{n→∞} (Yn/s)^{1/n} = e^{−1/2}, and we have Yn ≈ se^{−n/2} for large n.
● X ∼ Beta(1, 2): In this case,
E(ln X) = ∫₀¹ (ln x) · 2x^{1−1}(1 − x)^{2−1} dx = 2 ∫₀¹ (1 − x) ln x dx = 2 · (−3/4) = −3/2.
Hence, with probability 1, lim_{n→∞} (Yn/s)^{1/n} = e^{−3/2}, and we have Yn ≈ se^{−3n/2} for large n.
● X ∼ Beta(2, 2): In this case,
E(ln X) = ∫₀¹ (ln x) · 6x^{2−1}(1 − x)^{2−1} dx = 6 ∫₀¹ x(1 − x) ln x dx = 6 · (−5/36) = −5/6.
Hence, with probability 1, lim_{n→∞} (Yn/s)^{1/n} = e^{−5/6}, and we have Yn ≈ se^{−5n/6} for large n.

11.42 For each x ∈ R and j ∈ N , let Yj (x) = I{Xj ≤x} . Then Y1 (x), Y2 (x), . . . are independent Bernoulli random variables with common parameter FX (x) and, hence, with common mean FX (x). Noting $ % that Fˆn (x) = Y1 (x) + · · · + Yn (x) /n, the strong law of large numbers yields

Y1 (x) + · · · + Yn (x) = FX (x), lim Fˆn (x) = lim n→∞ n with probability 1. Thus, for large n, the CDF of X can be approximated by the corresponding empirical distribution function. n→∞

11.43 a) Because g is Riemann integrable, it is bounded, say, by M. Thus, & 1 & 1 & & ∞ |g(x)|fX (x) dx = |g(x)| · 1 dx = |g(x)| dx ≤ −∞

0

0

0

1

M dx = M < ∞.

Thus, by the FEF for continuous random variables, g(X) has finite expectation. b) The random variables g(X1 ), g(X2 ), . . . all have the same distribution as the random variable g(X) of part (a). From Proposition 6.13 on page 297, they are also independent. Moreover, by part (a), they $ % have finite mean E g(X) . Therefore, by the strong law of large numbers and the FEF, & ∞ & 1 & 1 $ % g(X1 ) + · · · + g(Xn ) lim = E g(X) = g(x)fX (x) dx = g(x) · 1 dx = g(x) dx, n→∞ n −∞ 0 0 with probability 1. c) For a large n ∈ N , use a basic random number generator to obtain n (independent) uniform numbers between 0 and 1—say, x1 , . . . , xn . In view of the result in part (b), we then have & 1 g(x1 ) + · · · + g(xn ) g(x) dx ≈ . n 0 d) Answers will vary. We used statistical software to obtain 1000 uniform numbers between 0 and 1. Next we cubed each of those numbers. Then we took the average value of the 1000 cubed numbers. The A1 result we obtained was 0.248, which is quite close to the exact value of 0.25 for 0 x 3 dx. e) Answers will vary. Here we want to estimate & 1 & 1 1 2 P (0 ≤ Z ≤ 1) = φ(x) dx = √ e−x /2 dx. 2π 0 0

11-32

CHAPTER 11

Generating Functions and Limit Theorems

We used statistical software to obtain 1000 uniform numbers between 0 and 1. Next we computed the value of the standard normal PDF for each of those numbers. Then we took the average of the 1000 values of the standard normal PDF. The result we got was 0.3414, which is quite close to the value of 0.3413 obtained by using Table I. f) Answers will vary. Repeat the processes used in parts (d) and (e), except use 10,000 uniform numbers instead of 1000 uniform numbers. 11.44 a) As we know, if X ∼ U(0, 1), then Y = a + (b − a)X ∼ U(a, b). Because g is Riemann integrable on [a, b], it is bounded thereon and, hence, by an argument similar to that given in the solution to Exercise 11.43(a), g(Y ) has finite expectation. Let X1 , X2 , . . . be independent U(0, 1) random variables, and set Yj = a + (b − a)Xj for j ∈ N . Then Y1 , Y2 , . . . are independent U(a, b) random variables and, consequently, g(Y1 ), g(Y2 ), . . . are independent and identically distributed random variables, each with the same distribution as g(Y ). Applying the strong law of large numbers and the FEF, we get & ∞ & b $ % g(Y1 ) + · · · + g(Yn ) 1 lim g(y)fY (y) dy = g(y) dy = E g(Y ) = n→∞ n b−a a −∞

with probability 1. From this result, we see that the following procedure can be applied. Use a basic random number generator to obtain n (independent) uniform numbers between 0 and 1—say, x1 , . . . , xn , where n is large. Set yj = a + (b − a)xj for j = 1, 2, . . . , n. Then & b g(y1 ) + · · · + g(yn ) 1 g(x) dx, ≈ b−a a n

or, equivalently,

&

b a

!

" g(y1 ) + · · · + g(yn ) g(x) dx ≈ (b − a) . n

b) Answers will vary. We used statistical software to obtain 1000 uniform numbers between 0 and 1. Next we transformed those numbers by using the relation y = 1 + 2x and then cubed each of the transformed numbers. Finally, we took the average value of the 1000 cubed transformed numbers and multiplied that average by 2 (= 3 − 1). The result we obtained was 20.265, which is reasonably close to the exact value A3 of 20 for 1 x 3 dx. c) Answers will vary. Here we want to estimate & 2 & 2 1 2 P (−1 ≤ Z ≤ 2) = φ(x) dx = √ e−x /2 dx. 2π −1 −1

We used statistical software to obtain 1000 uniform numbers between 0 and 1. Next we transformed those numbers by using the relation y = −1 + 3x and then computed the value of the standard normal PDF for each of the transformed numbers. Finally, we took the average of the$ 1000 values%of the standard normal PDF of the transformed numbers and multiplied that average by 3 = 2 − (−1) . The result we got was 0.8274, which is reasonably close to the value of 0.8185 obtained by using Table I. d) Answers will vary. Repeat the processes used in parts (b) and (c), except use 10,000 uniform numbers instead of 1000 uniform numbers. 11.45 a) To begin, we recall that the PDF of a standard Cauchy random variable is f (x) =

1 %, π 1 + x2 $

x ∈ R.

11.3

11-33

Laws of Large Numbers

For convenience, set X¯ n = (X1 + · · · + Xn )/n. From Exercise 11.20(b), the random variable X¯ n has the standard Cauchy distribution for each n ∈ N . Hence, by the FPF, % $ P |X¯ n − c| < ϵ =

&

c+ϵ

c−ϵ

% 1$ dx = arctan(c + ϵ) − arctan(c − ϵ) , 2 π π(1 + x )

for all n ∈ N . As the right side of the previous display doesn’t depend on n, it also is the limit as n → ∞ of the left side; that is, ' !' " ' X1 + · · · + Xn ' % 1$ ' ' − c' < ϵ = arctan(c + ϵ) − arctan(c − ϵ) . lim P ' n→∞ n π

b) The weak law of large numbers states that, if X1 , X2 , . . . are independent and identically distributed random variables with finite expectation, µ, then, for each ϵ > 0, ' !' " ' X1 + · · · + Xn ' ' ' − µ' < ϵ = 1. lim P ' n→∞ n

Thus, for such random variables, the limit in Expression (∗) equals 1 when c = µ. However, the weak law of large numbers doesn’t apply to standard Cauchy random variables because they don’t have finite expectation. 11.46 b) The mean of a U(0, 1) distribution is 1/2. Hence, from the strong law of large numbers, we would expect the average value of the 10,000 numbers obtained in part (a) to be roughly 1/2. c) Answers will vary. We performed the simulation and got an average value of 0.506, which is quite close to 1/2. 11.47 b) The mean of a N (0, 1) distribution is 0. Hence, from the strong law of large numbers, we would expect the average value of the 10,000 numbers obtained in part (a) to be roughly 0. c) Answers will vary. We performed the simulation and got an average value of −0.006, which is quite close to 0. 11.48 a) We first note that the CDF of a standard Cauchy random variable is F (x) =

&

x

−∞

dt 1 1 % = + arctan x, 2 2 π π 1+t $

x ∈ R.

$ % Solving for x in the equation y = 1/2 + (arctan x)/π, we find that F −1 (y) = tan π(2y − 1)/2 . Hence, $ % from Proposition 8.16(b), if U ∼ U(0, 1), then F −1 (U ) = tan π(2U − 1)/2 ∼ C(0, 1). Consequently, to obtain 10,000 observations of a standard Cauchy random variable, we first use a basic random number generator to get 10,000 uniform numbers $ % between 0 and 1 and then transform each of those numbers by using the relation x = tan π(2u − 1)/2 . b) No, we cannot guess the average value of the 10,000 numbers obtained in part (a). Indeed, by Exercise 11.20, the average value of 10,000 observations of a standard Cauchy random variable also has the standard Cauchy distribution.

11-34

CHAPTER 11

Generating Functions and Limit Theorems

c) Answers will vary. We performed the simulation in part (a) 20 times and obtained the following 20 averages: −0.168 2.720 −0.643 −0.153

0.380 0.062 0.746 0.119

−2.459 1.586 −0.794 0.645

−4.208 −0.935 −1.433 0.309

−1.730 3.034 2.097 −0.850

The strong law of large numbers states that, if X1 , X2 , . . . are independent and identically distributed random variables with finite expectation, µ, then, lim

n→∞

X1 + · · · + Xn = µ, n

with probability 1. Thus, for large n, the average value of n independent observations of a random variable X with finite expectation, µ, should be roughly equal to µ. In particular, if we repeat the experiment of making n independent observations of X, the average values should all be close to each other. As we see from the preceding table, this phenomenon does not occur for a standard Cauchy random variable. This is not a problem, however, as the strong law of large numbers doesn’t apply to a standard Cauchy random variable because it doesn’t have finite expectation.

Theory Exercises 11.49 Let Sn = X1 + · · · + Xn . From properties of expected value, ! " 1 1 E(X1 ) + · · · + E(Xn ) Sn = E(Sn ) = E(X1 + · · · + Xn ) = . E n n n n Hence, by Chebyshev’s inequality, Proposition 7.11 on page 360, for each ϵ > 0 and n ∈ N , ' " !' ' ' X1 + · · · + Xn E(X + · · · + E(X ) ) 1 n '≥ϵ − P '' ' n n

!' ! " ! "' " ' Sn Var(Sn ) 1 Var(X1 + · · · + Xn ) Sn '' Var(Sn /n) ' =P ' −E = = 2 . ≥ϵ ≤ n n ' ϵ2 n2 ϵ 2 ϵ n2

By assumption, the last term in the preceding display converges to 0 as n → ∞. The required result now follows from the complementation rule. $ % 11.50 Let M be such that Var Xj ≤ M for all j ∈ N . From Equation (10.31) on page 589 and the uncorrelatedness of the Xj s, we get Var(X1 + · · · + Xn ) =

n # k=1

Var(Xk ) + 2

## i 0. Then, for each n ∈ N , % $ % $ P |X¯ n − µ| < ϵ ≥ P µ − ϵ/2 < X¯ n ≤ µ + ϵ/2 = Fn (µ + ϵ/2) − Fn (µ − ϵ/2). Therefore,

$ % $ % 1 ≥ lim P |X¯ n − µ| < ϵ ≥ lim Fn (µ + ϵ/2) − Fn (µ − ϵ/2) n→∞

n→∞

= FX (µ + ϵ/2) − FX (µ − ϵ/2) = 1 − 0 = 1. $ % Hence, limn→∞ P |X¯ n − µ| < ϵ = 1, which is Equation (11.26) on page 651.

11.52 As usual, we let X¯ n = (X1 + · · · + Xn )/n. a) For convenience, set . and E = lim X¯ n = µ n→∞

' -' . Emn = 'X¯ n − µ' < 1/m .

for a sequence We have ω ∈ E if and only if limn→∞ X¯ n (ω) = µ, which, by the definition of convergence ' ' 'X¯ n (ω) − µ' < 1/m of real numbers, happens if and only if for each m ∈ N , there is an N ∈ N such that %% B∞ $C∞ $B∞ for all n ≥ N. However, means N=1 n=N Emn . Hence, we have $this %% that ω ∈ m=1 B C∞last$condition B∞ proved that E = ∞ E , as required. mn m=1 N=1 n=N B b) Let E and Emn be as in part (a), and set Amn = ∞ k=n Emk . We note that Am1 ⊂ Am2 ⊂ · · · and, therefore, by the continuity property of a probability measure, " !D ∞ P Amn = lim P (Amn ). n=1

n→∞

11-36

CHAPTER 11

Generating Functions and Limit Theorems

We want to prove that P (E) = 1 if and only if !E " ∞ -' ' . 'X¯ k − µ' < ϵ = 1, lim P n→∞

(∗)

k=n

$C∞ % B for each ϵ > $0.C From part (a), we know that E = ∞ m=1 n=1 Amn . It follows that P (E) = 1 if % and only if P ∞ n=1 Amn = 1 for all m ∈ N or, equivalently, if and only if lim n→∞ P (Amn ) = 1 for all m ∈ N ; that is, !E " ∞ -' ' . ' ' ¯ lim P Xk − µ < 1/m = 1, (∗∗) n→∞

k=n

for all m ∈ N . Hence, we need to show that Equation (∗) holds for each ϵ > 0 if and only if Equation (∗∗) holds for all m ∈ N . First suppose that Equation (∗) holds for each ϵ > 0. Let m ∈ N . Taking ϵ = 1/m, we see that Equation (∗∗) holds. Conversely, holds ' ' for .all m ∈ N . Let ϵ > 0. Choose M ∈ N . -(∗∗) -' suppose' that Equation such that 1/M ≤ ϵ. Then 'X¯ k − µ' < 1/M ⊂ 'X¯ k − µ' < ϵ for all k ∈ N and, hence, !E " !E " ∞ -' ∞ -' ' ' . . 'X¯ k − µ' < 1/M 'X¯ k − µ' < ϵ ≤ 1. 1 = lim P ≤ lim P n→∞

n→∞

k=n

k=n

Consequently, Equation (∗) holds for each ϵ > 0. c) Clearly, for each ϵ > 0,

∞ -' ' ' . E . -' 'X¯ k − µ' < ϵ , 'X¯ n − µ' < ϵ ⊃ k=n

n ∈ N.

Therefore, for each ϵ > 0,

lim P

n→∞

From part (b), ( ) P lim X¯ n = µ = 1 n→∞

!E " ∞ -' (-' ' ' .) . 'X¯ n − µ' < ϵ ≥ lim P 'X¯ k − µ' < ϵ . n→∞

if and only if

lim P

n→∞

!E ∞

k=n

k=n

" ' -' . 'X¯ k − µ' < ϵ = 1 for each ϵ > 0.

' % $-' .% Thus, if P limn→∞ X¯ n = µ = 1, then limn→∞ P 'X¯ n − µ' < ϵ = 1 for each ϵ > 0. In other words, the strong law of large numbers implies the weak $ law of large numbers. % d) From the strong law of large numbers, we have P limn→∞ X¯ n = µ = 1. As we showed in part (b), this equation is equivalent to 0 "1 ∞ !E ∞ -' D ' . 'X¯ n − µ' < ϵ = 1, P $

N=1 n=N

for each ϵ > 0. For convenience, let' A denote the ' event in the previous display. Then ' P (A) ='1. If ω ∈ A, ' ' ¯ then there is an N ∈ N such that Xn (ω) − µ < ϵ for all n ≥ N. Therefore, 'X¯ n (ω) − µ' ≥ ϵ occurs for only finitely many n. In other words, with probability 1, deviations of the average value of X1 , . . . , Xn from µ in excess of any specified amount occur for only finitely many n.

11.53 a) The number x is a normal number if and only if the relative frequency of occurrence of each decimal digit in x is 1/10.

11.3

11-37

Laws of Large Numbers

b) Let D = {0, 1, . . . , 9}. For k1 , . . . , kn−1 , k ∈ D, we have 8 " k1 kn−1 k k1 kn−1 k+1 + · · · + n−1 + n , + · · · + n−1 + . {X1 = k1 , . . . , Xn−1 = kn−1 , Xn = k} = 10 10 10 10n 10 10 Therefore,

P (X1 = k1 , . . . , Xn−1 = kn−1 , Xn = k) '8 "' ' ' k1 k k k + 1 k k n−1 1 n−1 '= 1 . = '' + · · · + n−1 + n , + · · · + n−1 + 10 10 10 10n ' 10n 10 10

In the hint, the events in the union are mutually exclusive. Thus, for each k ∈ D, P (Xn = k) = P = =

!D 9

k1 =0

9 #

···

9 #

···

k1 =0

k1 =0

···

9 D

kn−1 =0

9 #

kn−1 =0

{X1 = k1 , . . . , Xn−1 = kn−1 , Xn = k}

"

P (X1 = k1 , . . . , Xn−1 = kn−1 , Xn = k)

9 #

kn−1

1 1 1 = 10n−1 · n = . n 10 10 10 =0

Hence, each Xn has the discrete uniform distribution on D. Moreover, from what we have already established, we see that, for each n ∈ N and k1 , . . . , kn ∈ D, P (X1 = k1 , . . . , Xn = kn ) =

1 1 1 = ··· = P (X1 = k1 ) · · · P (Xn = kn ). n 10 F10 GH 10I n times

Therefore, X1 , X2 , . . . are independent random variables. J c) Let k ∈ D. For each j ∈ N , let Yj = 1{Xj =k} and note that nk = jn=1 Yj . Because X1 , X2 , . . . are independent random variables, each having the discrete uniform distribution on D, the random variables Y1 , Y2 , . . . are independent, each having the Bernoulli distribution with parameter 1/10, which is also the mean of such a distribution. Therefore, by the strong law of large numbers, lim

n→∞

nk Y1 + · · · + Yn 1 = lim = , n→∞ n n 10

with probability 1. In other words, with probability 1, a randomly selected number from the interval [0, 1] is a normal number. 11.54 a) Each real number between 0 and 1 has a binary expansion and, except for numbers of the form m/2n , the expansion is unique. For definiteness, we take the unique terminating expansion for numbers of the form m/2n . Let x denote a randomly selected number from the interval [0, 1] and define Xn (x) = xn , where xn is the nth binary digit of x. Proceeding as in Exercise 11.53(b), we can show that the random variables X1 , X2 , . . . are independent and identically distributed, all having the discrete uniform distribution on the set {0, 1}. Now, for a single toss of a balanced coin, let us associate the binary digit 1 with the outcome of a head and the binary digit 0 with the outcome of a tail. If we select a number at random from the interval [0, 1], then its binary expansion represents an outcome of repeatedly tossing a balanced coin and observing at each toss (digit) whether the result is a head (1) or a tail (0).

11-38

CHAPTER 11

Generating Functions and Limit Theorems

b) Let x ∈ [0, 1]. For each n ∈ N and k ∈ {0, 1}, use nk (x) to denote the number of the first n binary digits of x that equal k. Now let x denote a randomly selected number J from the interval [0, 1] and let k ∈ {0, 1}. For each j ∈ N , let Yj = 1{Xj =k} and note that nk = jn=1 Yj . Because X1 , X2 , . . . are independent random variables, each having the discrete uniform distribution on {0, 1}, the random variables Y1 , Y2 , . . . are independent, each having the Bernoulli distribution with parameter 1/2, which is also the mean of such a distribution. Therefore, by the strong law of large numbers, nk Y1 + · · · + Yn 1 = lim = , n→∞ n n→∞ n 2 with probability 1. In terms of tossing a balanced coin repeatedly, this result states that, in the long run, half of the tosses will be heads (and half will be tails). lim

11.4 The Central Limit Theorem Note: For this section, we have used statistical software to obtain any required standard normal probabilities. If you use Table I instead, your answers may differ somewhat from those shown here.

Basic Exercises 11.55 Let X denote the number of people who don’t take the flight. Then X ∼ B(42, 0.16) and, hence, ! " 42 (0.16)x (0.84)42−x , x = 0, 1, . . . , 42. pX (x) = x

As n = 42 and p = 0.16, we have np = 6.72 and np(1 − p) = 5.6448. Hence, in view of Proposition 11.11 on page 661, for integers a and b, with 0 ≤ a ≤ b ≤ 42, 1 0 1 0 a − 21 − 6.72 b + 21 − 6.72 −) . P (a ≤ X ≤ b) ≈ ) √ √ 5.6448 5.6448 a) We have

Also,

!

" 42 (0.16)5 (0.84)37 = 0.1408. P (X = 5) = pX (5) = 5 0

5 + 21 − 6.72 P (X = 5) = P (5 ≤ X ≤ 5) ≈ ) √ 5.6448

1

0

5 − 21 − 6.72 −) √ 5.6448

1

= 0.1288.

b) We have 12 ! " # 42 pX (x) = (0.16)x (0.84)42−x = 0.2087. P (9 ≤ X ≤ 12) = x x=9 x=9 12 #

Also,

0

12 + 21 − 6.72 P (9 ≤ X ≤ 12) ≈ ) √ 5.6448

c) We have

1

0

9 − 21 − 6.72 −) √ 5.6448

1

= 0.2194.

! " 42 (0.16)0 (0.84)42−0 = 0.9993. P (X ≥ 1) = 1 − P (X = 0) = 1 − pX (0) = 1 − 0

11.4

11-39

The Central Limit Theorem

Also, ⎛ 0

P (X ≥ 1) = 1 − P (X = 0) ≈ 1 − ⎝)

1 2

0 + − 6.72 √ 5.6448

1

d) We have

1⎞ 0 − − 6.72 ⎠ = 0.9968. −) √ 5.6448 0

1 2

2 ! " # 42 (0.16)x (0.84)42−x = 0.0266. pX (x) = P (X ≤ 2) = x x=0 x=0 2 #

Also, 0

2 + 21 − 6.72 P (X ≤ 2) = P (0 ≤ X ≤ 2) ≈ ) √ 5.6448

1

0

0 − 21 − 6.72 −) √ 5.6448

1

= 0.0367.

11.56 a) See the solution to Exercise 8.108. b) The following table compares the results of all three methods of obtaining the probabilities. Each number in parentheses gives the absolute percentage error in each case. Binomial probability

Local normal approximation

Integral normal approximation

P (X = 5)

0.1408

0.1292 (8.2%)

0.1288 (8.5%)

P (9 ≤ X ≤ 12)

0.2087

0.2181 (4.5%)

0.2194 (5.1%)

P (X ≥ 1)

0.9993

0.9969 (0.2%)

0.9968 (0.3%)

P (X ≤ 2)

0.0266

0.0357 (34.2%)

0.0367 (38.0%)

c) The customary rule of thumb for using the normal approximation to the binomial distribution is that both np and n(1 − p) are 5 or greater. In this case, np = 6.72 and n(1 − p) = 35.28, so the rule of thumb is met. However, it is barely met, which explains some of the large errors, as shown in the table in part (b). 11.57 Let Tn denote the number of times that the gambler wins in n bets. Then the gambler is ahead after n bets if and only if Tn > n/2. We know that Tn ∼ B(n, 9/19). As p = 9/19, Proposition 11.11 on page 661, shows that, for even n, P (X > n/2) = 1 − P (X ≤ n/2) = 1 − P (0 ≤ X ≤ n/2) ⎛ ⎛ ⎞ ⎛ ⎞⎞ 9 9 0 − 1 − n · 19 n/2 + 1 − n · 19 ⎜ ⎠ − )⎝ 6 2 ⎠⎟ ≈ 1 − ⎝) ⎝ 6 2 ⎠ 9 9 10 n · 19 · 10 n · · 19 19 19 ⎛ ⎛ n+1 ⎜ ⎝ 2 − 6 ≈ 1 − ⎝)



9 19 n ⎠

90 361 n

−)



−1 ⎝ 2

6



⎞⎞

9 19 n ⎠⎟

90 361 n

⎠.

11-40

CHAPTER 11

Generating Functions and Limit Theorems

The following answers are given to three significant digits. a) Here n = 100, so the required probability approximately equals ⎛ ⎛ ⎞ ⎛ ⎞⎞ 100+1 9 1 9 − 2 − 19 · 100 ⎟ − 19 · 100 ⎜ ⎠ − )⎝ 6 ⎠⎠ 1 − ⎝) ⎝ 26 90 90 361 · 100 361 · 100

= 1 − )(0.627185) + )(−9.58697) = 1 − 0.734731 + 0.000000 = 0.265.

b) Here n = 1000, so the required probability approximately equals ⎛ ⎛ ⎞ ⎛ ⎜ 1 − ⎝) ⎝

1000+1 2

6



90 361

9 19

· 1000

· 1000

⎠−)

⎞⎞ − · 1000 ⎟ ⎠⎠ 6 90 361 · 1000 9 19

−1 ⎝ 2

= 1 − )(1.69833) + )(−30.0317) = 1 − 0.955278 + 0.000000 = 0.0447.

c) Here n = 5000, so the required probability approximately equals ⎛ ⎛ ⎞ ⎛ ⎜ 1 − ⎝) ⎝

5000+1 2

6



90 361

9 19

· 5000

· 5000

⎠−)

⎞⎞ − · 5000 ⎟ ⎠⎠ 6 90 · 5000 361

−1 ⎝ 2

9 19

= 1 − )(3.74094) + )(−67.0962) = 1 − 0.9999083 + 0.0000000 = 0.0000917.

11.58 Let X1 , . . . , X100 denote the lengths of the 100 steps taken by the boy. We know that X1 , . . . , X100 are independent and identically distributed random variables with µ = 0.95 and σ = 0.08. The distance stepped off by the boy is D = X1 + · · · + X100 . Note that √ √ nµ = 100 · 0.95 = 95 and σ n = 0.08 · 100 = 0.8. a) Applying Equation (11.42) on page 666, we get

!

104 − 95 P (|D − 100| ≤ 4) = P (96 ≤ D ≤ 104) ≈ ) 0.8

"

!

"

!

"

96 − 95 −) 0.8

= )(11.25) − )(1.25) = 0.106.

b) Again applying Equation (11.42), we get

!

106 − 95 P (|D − 100| ≤ 6) = P (94 ≤ D ≤ 106) ≈ ) 0.8 = )(13.75) − )(−1.25) = 0.894.

"

94 − 95 −) 0.8

$ % 11.59 Let Y denote the lifetime of a single battery. As Y ∼ N 30, 52 , " ! 25 − 30 = 1 − )(−1) = 0.841. P (Y > 25) = 1 − P (Y ≤ 25) = 1 − ) 5

11.4

The Central Limit Theorem

11-41

Now let X denote the number of the 500 batteries that last longer than 25 hours. Then X ∼ B(500, 0.841). We want to determine P (X ≥ 400). Applying the integral De Moivre–Laplace theorem, we have P (X ≥ 400) = P (400 ≤ X ≤ 500) 0 1 0 1 500 + 21 − 500 · 0.841 400 − 21 − 500 · 0.841 ≈) √ −) √ 500 · 0.841 · (1 − 0.841) 500 · 0.841 · (1 − 0.841) = )(9.78382) − )(−2.56825) = 0.995. 11.60 Let X1 , . . . , X100 denote the checkout times, in minutes, for 100 customers. Then X1 , . . . , X100 are independent and identically distributed random variables with µ = 3.5 and σ = 1.5. a) Let T denote the time, in minutes, required to check out 100 customers. Then T = X1 + · · · + X100 . Referring to Equation (11.42) on page 666, we get 0 ! " ! "1 360 − 100 · 3.5 0 − 100 · 3.5 P (T ≥ 360) = 1 − P (0 ≤ T < 360) ≈ 1 − ) −) √ √ 1.5 100 1.5 100 $ % = 1 − )(0.66667) − )(−23.33333) = 0.252. b) The average checkout time for 100 customers is T /100. Again referring to Equation (11.42), we get P (T /100 < 3.4) = P (T < 340) = P (0 ≤ T < 340) ! " ! " 340 − 100 · 3.5 0 − 100 · 3.5 ≈) −) √ √ 1.5 100 1.5 100 = )(−0.66667) − )(−23.33333) = 0.252. 11.61 a) Let X¯ 160 denote the average income of 160 randomly chosen households. From the central limit ¯ theorem, in the form of the fourth bulleted item on page 667, we know √ that X160 is approximately normally distributed with mean $35,216 and standard deviation 3134/ 160. Thus, ! " ! " $ % $ % 34,600 − 35,216 0 − 35,216 ¯ ¯ P X160 < 34,600 = P 0 ≤ X160 < 34,600 ≈ ) −) √ √ 3134/ 160 3134/ 160 = )(−2.48623) − )(−142.13500) = 0.00646.

b) No; the independence assumption of the central limit theorem isn’t strictly satisfied when sampling is without replacement. c) It is permissible to use the central limit theorem to solve part (a) because the population size is large relative to the sample size and, hence, there is little difference between sampling without and with replacement. In the latter case, the independence assumption of the central limit theorem is satisfied (as are its other assumptions). 11.62 For convenience, we work in units of thousands. We first observe that the claim amount of each individual policy has an exponential distribution with parameter 1. Let X1 , . . . , X100 denote the claim amounts for the 100 policies. Then X1 , . . . , X100 are independent and identically distributed E(1) random variables. a) Let T denote the total claim amount of the 100 policies. Then T = X1 + · · · + X100 . From the first bulleted item on page 537, we know that T ∼ "(100, 1).

11-42

CHAPTER 11

Generating Functions and Limit Theorems

$ % b) The expected claim amount per policy is E Xj = 1/1 = 1. Because the premium for each policy is set at 0.1 over the expected claim amount, the premium is 1.1 per policy. Hence, the total premium for 100 policies is 100 · 1.1 = 110. Therefore, in view of part (a), the exact probability that the insurance company will have claims exceeding the premiums collected is & ∞ & ∞ 100 1 1 100−1 −1·x x e dx = x 99 e−x dx. P (T > 110) = "(100) 99! 110 110 In view of Equation (8.49) on page 453, an alternative expression is

P (T > 110) = 1 − P (T ≤ 110) = 1 − FT (110) = e−110

99 # (110)j

j =0

j!

.

c) We have µ = 1/1 = 1 and σ 2 = 1/12 = 1. Hence, by the central limit theorem,

P (T > 110) = 1 − P (T ≤ 110) = 1 − P (0 ≤ T ≤ 110) 0 ! " ! "1 110 − 100 · 1 0 − 100 · 1 ≈1− ) −) √ √ 1 100 1 100 $ % = 1 − )(1) − )(−10) = 0.159.

d) Using statistical software, we obtained 0.158 for the probability that a "(100, 1) random variable will exceed 110. We see that this probability and the approximate probability of 0.159 found in part (c) are almost identical. Thus, the normal approximation here is excellent. 11.63 Let T denote the total number of hours that a randomly selected person watches movies or sporting events during a 3-month period. Then T = X + Y . We have and

E(T ) = E(X + Y ) = E(X) + E(Y ) = 50 + 20 = 70

Var(T ) = Var(X + Y ) = Var(X) + Var(Y ) + 2 Cov(X, Y ) = 50 + 30 + 2 · 10 = 100.

Now let T1 , . . . , T100 denote the number of hours that the 100 randomly selected people watch movies or sporting events during the 3-month period. Then T1 , . . . , T100 are (basically) independent and identically distributed random variables with common mean µ = 70 and common variance σ 2 = 100. Hence, from the central limit theorem, P (T1 + · · · + T100 ≤ 7100) = P (0 ≤ T1 + · · · + T100 ≤ 7100) ! " ! " 7100 − 100 · 70 0 − 100 · 70 ≈) −) √ √ 10 100 10 100 = )(1) − )(−70) = 0.841. 11.64 Let X denote the number of the 10,000 screws that are not within tolerance specifications. Then X ∼ B(10,000, 0.05). We want to choose r as small as possible so that P (X > r) ≤ 0.01. a) We have µX = 10,000 · 0.05 = 500 and σX2 = 10,000 · 0.05 · 0.95 = 475. In view of Chebyshev’s inequality, we get, for x > 500, that P (X > x) ≤ P (|X − 500| ≥ x − 500) ≤

σX2 475 = . 2 (x − 500) (x − 500)2

11.4

The Central Limit Theorem

11-43

√ Setting this last expression equal to 0.01 and solving for x, we find that x = 500 + 47,500 ≈ 717.94. Hence, using Chebyshev’s inequality, we choose r = 718. b) Here, because n = 10,000 is so large, we can use the central limit theorem for binomial random variables in the form of Equation (11.34) on page 661: P (X > x) = 1 − P (X ≤ x) = 1 − P (0 ≤ X ≤ x) 0 ! " ! "1 x − 500 0 − 500 ≈1− ) √ −) √ 475 475 " ! x − 500 . ≈1−) √ 475 √ Setting this last expression equal to 0.01 and solving for x gives x = 500 + 475 )−1 (0.99) ≈ 550.70. Hence, using the central limit theorem, we choose r = 551. c) Using statistical software, we find that P (X > 550) = 0.0110571 and P (X > 551) = 0.0098269. Thus, the smallest integer r for which P (X > r) ≤ 0.01 is r = 551. This result is the same as the one obtained in part (b) by using the central limit theorem. We see that the result obtained in part (a) by using Chebyshev’s rule is quite crude (a vast overestimate), which is not surprising. 11.65 Let X1 , . . . , X250 denote the lifetimes, in months, of the 250 previously used compressors. We know that σ = 40 and we assume as reasonable that X1 , . . . , X250 are independent and identically distributed random variables. From Equation (11.43) on page 668, ' $' % P 'X¯ 250 − µ' ≤ 5 ≈ 2)

0√ 1 250 · 5 − 1 = 2)(1.97642) − 1 = 0.952. 40

11.66 a) We have Yn = s

n n @ @ Yj =s Xj . Y j =1 j −1 j =1

J Taking natural logarithms, we get ln Yn = ln s + jn=1 ln Xj . Assume that X isn’t constant and that ln X 2 has finite variance. we have, JnSet µ = E(ln X) and σ = Var(ln X). Applying the central limit theorem, for large n, that j =1 ln Xj is approximately normal with mean nµ and variance nσ 2 . Therefore, for large n, the random variable ln Yn is approximately normal with mean ln s + nµ and variance nσ 2 . Consequently, for large n, the random variable Yn has approximately a lognormal distribution with parameters ln s + nE(ln X) and n Var(ln X). b) We consider each distribution given for X in turn. ●

X ∼ U(0, 1): In this case, E(ln X) =

&

0

1

(ln x) · 1 dx =

&

0

1

ln x dx = −1.

11-44

CHAPTER 11

Also, E

($

ln X

2 − (−1)2

Generating Functions and Limit Theorems

%2 )

&

=

0

1

&

(ln x)2 · 1 dx =

1

0

(ln x)2 dx = 2.

= 1. Consequently, Yn has approximately a lognormal distribution Hence, Var(ln X) = with parameters ln s − n and n.



X ∼ Beta(2, 1): In this case, & & 1 2−1 1−1 (ln x) · 2x (1 − x) dx = 2 E(ln X) = 0

1

0

!

1 x ln x dx = 2 · − 4

Also,

E

($

ln X

%2 )

=

&

0

1

2

(ln x) · 2x

2−1

(1 − x)

1−1

dx = 2

1/2 − (−1/2)2

&

0

1

"

1 =− . 2

! " 1 1 x(ln x) dx = 2 · = . 4 2 2

Hence, Var(ln X) = = 1/4. Consequently, Yn has approximately a lognormal distribution with parameters ln s − n/2 and n/4.



X ∼ Beta(1, 2): In this case, ! " & 1 & 1 3 3 1−1 2−1 (ln x) · 2x (1 − x) dx = 2 (1 − x) ln x dx = 2 · − =− . E(ln X) = 4 2 0 0

Also,

E

($

ln X

%2 )

=

&

0

1

2

(ln x) · 2x

1−1

(1 − x)

2−1

dx = 2

&

0

1

! " 7 7 (1 − x)(ln x) dx = 2 · = . 4 2 2

Hence, Var(ln X) = 7/2 − (−3/2)2 = 5/4. Consequently, Yn has approximately a lognormal distribution with parameters ln s − 3n/2 and 5n/4. ●

X ∼ Beta(2, 2): In this case, & & 1 2−1 2−1 (ln x) · 6x (1 − x) dx = 6 E(ln X) = 0

1

5 x(1 − x) ln x dx = 6 · − 36

0

Also, & & 1 ($ %2 ) E ln X = (ln x)2 · 6x 2−1 (1 − x)2−1 dx = 6 0

0

19/18 − (−5/6)2

!

1

2

x(1 − x)(ln x) dx = 6 ·

!

"

5 =− . 6

19 108

"

=

19 . 18

Hence, Var(ln X) = = 13/36. Consequently, Yn has approximately a lognormal distribution with parameters ln s − 5n/6 and 13n/36. 11.67 a) We apply Equation (11.47) on page 671. We have n = 51, σ = 2.4, and the sample mean of the data is x¯51 = 15.051. Also, γ = 0.90 and, so, ! " ! " −1 1 −1 1 ) (1 + γ ) = ) · 1.90 = )−1 (0.95) = 1.645. 2 2

Consequently, a 90% confidence √ interval for the mean depth of all subterranean coruro burrows has endpoints 15.051 ± 1.645 · 2.4/ 51, or (14.50, 15.60). b) We can be 90% confident that the mean depth of all subterranean coruro burrows is somewhere between 14.50 cm and 15.60 cm.

11.4

The Central Limit Theorem

11-45

c) Here γ = 0.99, so )−1

!

" ! " 1 1 (1 + γ ) = )−1 · 1.99 = )−1 (0.995) = 2.576. 2 2

Consequently, a 99% confidence √ interval for the mean depth of all subterranean coruro burrows has endpoints 15.051 ± 2.576 · 2.4/ 51, or (14.19, 15.92). d) If you want to be more confident that µ lies in your confidence interval, then you must settle for a greater-length interval. 11.68 Let I1 denote the elapsed time until the arrival of the first patient and, for j ≥ 2, let Ij denote the elapsed time between the arrivals of the (j − 1)st and j th patients. Then I1 , I2 , . . . are independent and identically distributed E(6.9) random variables. Moreover, we have Wn = I1 + · · · + In for each n ∈ N . a) From the first bulleted item on page 537, we see that Wn ∼ "(n, 6.9) for each n ∈ N . b) Answers will vary, but here is the procedure: We know that W4 = I1 + I2 + I3 + I4 , where the Ij s are independent E(6.9) random variables. From Proposition 8.16 or Example 8.22(b) on page 472, we find that, if U ∼ U(0, 1), then −(ln U )/6.9 ∼ E(6.9). Hence, to use a basic random number generator to get 5000 observations of W4 , proceed as follows: First obtain 5000 four-tuples of uniform numbers between 0 and 1. Next transform each of the 20,000 numbers so obtained by applying the function −(ln u)/6.9. Then take the sum of the four numbers in each of the 5000 transformed four-tuples. c) No, we would not expect a histogram of the 5000 observations from part (b) to be roughly bell shaped. Rather, as seen by referring to part (a), we would expect it to have roughly the shape of the PDF of a "(4, 6.9) random variable, which is right skewed. d) Answers will vary, but here is what we obtained. The superimposed curve is the PDF of a "(4, 6.9) random variable.

e) For the solution corresponding to part (b), we proceed as in that part, except that we obtain 5000 60-tuples instead of 5000 four-tuples. Here, though, we would expect a histogram of the 5000 observations of W60 to be bell shaped because of the central limit theorem. Indeed, we haveW60 = I1 + · · · + I60 , where I1 , . . . , I60 are independent and identically distributed random variables, all having the E(6.9) distribution. Hence, by the central limit theorem, W60 is approximately normally distributed; its exact distribution is "(60, 6.9), as we see by referring to part (a). The solid curve in the following figure is of the PDF of a "(60, 6.9) random variable. The histogram that you obtain of the 5000 observations of W60 should be shaped roughly like that PDF. For comparison, we have also included a plot of the PDF of a

11-46

CHAPTER 11

Generating Functions and Limit Theorems

normal random variable having the same mean and variance as a "(60, 6.9) random variable. That plot is the dashed curve in the figure.

11.69 Let X and Y denote the times, in seconds, that it takes George and Julia, respectively, to make a double decaf skim milk latte. We are assuming that µX = µY and that σX = σY = 4, and we will also assume that X and Y are independent random variables. Let D = X − Y . Then E(D) = E(X − Y ) = E(X) − E(Y ) = 0 and Var(D) = Var(X − Y ) = Var(X) + Var(Y ) = 2 · 42 = 32.

in seconds, %that it takes George and Julia, respectively, Let X1 , . . . , X40 and Y1 , . . . , Y40 denote the times, $ to make the 40 lattes. We want to compute P X¯ 40 − Y¯40 > 3 . Now, X1 + · · · + X40 Y1 + · · · + Y40 − X¯ 40 − Y¯40 = 40 40 =

(X1 − Y1 ) + · · · + (X40 − Y40 ) 40

=

D1 + · · · + D40 = D¯ 40 , 40

where Dj = Xj − Yj . The random variables D1 , . . . , D40 are independent and identically distributed with mean 0 and variance 32. Hence, by the central limit theorem, in the form of the fourth bulleted item on page 667, % $ % $ % $ P X¯ 40 − Y¯40 > 3 = P D¯ 40 > 3 = 1 − P D¯ 40 ≤ 3 0 1 3−0 ≈ 1 − ) √ ,√ = 1 − )(3.3541) 32 40 = 0.000398.

11.70 a) Let Xj (1 ≤ j ≤ 20) denote the lifetime, in hours, of bulb j . We know that X1 , . . . , X20 are independent random variables, each having the E(1/300) distribution. Let T denote the total time that the 20 bulbs

11.4

The Central Limit Theorem

11-47

last, so that T = X1 + · · · + X20 . Recalling that the mean and variance of an E(1/300) random variable are 300 and 3002 , respectively, we can apply the central limit theorem to conclude that P (T ≤ 7000) = P (0 ≤ T ≤ 7000) ! " ! " 7000 − 20 · 300 0 − 20 · 300 ≈) −) √ √ 300 20 300 20 = )(0.745356) − )(−4.472136) = 0.772. b) From the first bulleted item on page 537, we see that T ∼ "(20, 1/300). Using statistical software, we get that P (T ≤ 7000) = 0.783. The discrepancy between the exact value of 0.783 and the value of 0.772 obtained by using the normal approximation is due to the fact that an exponential distribution is highly skewed and that here n = 20, which is relatively small. Nonetheless, that absolute percentage error resulting from using the normal approximation is only 1.4%.

Theory Exercises 11.71 ( ) √ a) The quantity )−1 21 (1 + γ ) · σ/ n appearing in Expression (11.47) is called the margin of error (or maximum error of the estimate) for µ because we are 100γ % confident that our error ( the estimate ) of√ 1 −1 in estimating µ by x¯n is at most ) 2 (1 + γ ) · σ/ n. b) We want to find the smallest integer, n, such that ! " 1 σ )−1 (1 + γ ) · √ = E. 2 n Solving for n in the preceding equation, we obtain ⎡⎛ ( ) ⎞2 ⎤ 1 −1 ) 2 (1 + γ ) · σ ⎢ ⎠ ⎥ n = ⎢⎝ ⎥. ⎥ ⎢ E ⎥ ⎢

Advanced Exercises

11.72 a) Referring to the solution to Exercise 11.67(a), we see that, for a 90% confidence interval, the margin of error is ! " 2.4 2.4 2.4 −1 1 ) (1 + 0.90) · √ = )−1 (0.95) · √ = 1.645 · √ = 0.55 2 51 51 51 and, referring to the solution to Exercise 11.67(c), we see that, for a 99% confidence interval, the margin of error is ! " 2.4 2.4 2.4 −1 1 ) (1 + 0.99) · √ = )−1 (0.995) · √ = 2.576 · √ = 0.87. 2 51 51 51 Note that these margins of error can also be obtained by taking half the lengths of the confidence intervals found in parts (a) and (c), respectively, of Exercise 11.67.

11-48

CHAPTER 11

Generating Functions and Limit Theorems

b) Applying the result of Exercise 11.71(b), we find that, for a confidence coefficient of 0.90, the required sample size for a 0.5 cm margin of error is ⎡⎛ ( ) ⎞2 ⎤ S ! "2 T )−1 21 (1 + 0.90) · 2.4 1.645 · 2.4 ⎥ ⎢⎝ ⎠ ⎥= = 63 n=⎢ ⎥ ⎢ 0.5 0.5 ⎥ ⎢ and that, for a confidence coefficient of 0.99, the required sample size for a 0.5 cm margin of error is ⎡⎛ ( ) ⎞2 ⎤ S ! "2 T )−1 21 (1 + 0.99) · 2.4 2.576 · 2.4 ⎥ ⎢⎝ ⎠ ⎥= = 153. n=⎢ ⎥ ⎢ 0.5 0.5 ⎥ ⎢

11.73 Let X1 , . . . , X44 denote the purchase amounts, in dollars, for 44 sales with the new strategy. We assume that these random variables are independent and identically distributed with standard deviation $11 (as with the old strategy). If the mean purchase amount has not changed, then, by the central limit theorem, ! " 73 − 68 P (X¯ 44 ≥ 73) ≈ 1 − ) = 1 − )(3.01511) = 0.00128. √ 11/ 44

Thus, if the mean purchase amount has not changed, then the chances are only about 0.1% that 44 sales would average $73 or greater. Consequently, either an extremely unlikely event has occurred or there has been an increase in the mean purchase amount. The most reasonable conclusion is that the latter has occurred and, hence, that the new strategy is effective.

11.74 Let X1 , X2 , . . . be independent and identically distributed random variables, each having the Poisson distribution with parameter λ = 1. In particular, then, the common mean and variance of these random variables is µ = 1 and σ 2 = 1, respectively. From Proposition 6.20(b) on page 311, we know that X1 + · · · + Xn ∼ P(n) for each n ∈ N . Hence, P (X1 + · · · + Xn ≤ n) = Therefore,

n # k=0

e−n

n # nk nk = e−n . k! k! k=0

lim P (X1 + · · · + Xn ≤ n) = lim e

n→∞

n→∞

On the other hand, by the central limit theorem, lim P (X1 + · · · + Xn ≤ n) = lim P

n→∞

n→∞

The required result now follows.

!

−n

n # nk k=0

k!

.

" X1 + · · · + Xn − n · 1 1 ≤ 0 = )(0) = . √ 2 1· n

11.75 a) From Proposition 11.2 on page 635, the assumption that the common MGF of X1 , X2 , . . . is defined in some open interval containing 0 implies that these random variables have moments of all orders. (n) b) From Exercise 11.18, if a random variable X has finite nth moment, then E(Xn ) = (−i)n φX (0). 2 And, from Exercise 11.19(b), if X ∼ N (0, 1), then φX (t) = e−t /2 . To prove the central limit theorem by using characteristic functions, we first show that, if X is a random variable with finite variance, then σX2 ln φX (t) − iµX t . = − t→0 2 t2 lim

(∗)

11.4

The Central Limit Theorem

We recall that φX (0) = 1, and we have ′ φX (0) = iE(X) = iµX

and

Applying L’Hôpital’s rule twice, we get

11-49

$ % ′′ φX (0) = −E X2 = −µ2X − σX2 .

′ (t)/φ (t) − iµ ′ (t) − iµ φ (t) φX φX ln φX (t) − iµX t X X X X = lim = lim 2 t→0 t→0 t→0 2t 2tφX (t) t

lim

′ (t) − iµ φ (t) ′ (t) φX φ ′′ (t) − iµX φX X X = lim X t→0 t→0 2t 2

= lim =

′′ (0) − iµ φ ′ (0) φX −µ2X − σX2 − iµX · iµX σ2 X X = = − X. 2 2 2

Now, let X1 , X2 , . . . be independent and identically distributed random variables with common finite mean µ and common finite nonzero variance σ 2 . For convenience, set Sn = X1 + · · · + Xn and let Yn denote the standardized version of X1 + · · · + Xn . Then Yn =

X1 + · · · + Xn − nµ Sn − nµ µ 1 = = −n √ + √ Sn . √ √ σ n σ n σ n σ n

Applying Exercise 11.18, we get φYn (t) =

µ −in √ t e σ n φ

0 1n µ iµ $ √ % √ % √ %)n −in √ t ( $ − √ t $ σ n σ n φ t/σ n = e φ t/σ n , Sn t/σ n = e

where φ is the common CHF of X1 , X2 , . . . . Taking natural logarithms yields $ $ √ % √ % ! " $ √ % iµ t 2 ln φ t/σ n − iµ · t/σ n ln φYn (t) = n ln φ t/σ n − √ t = 2 · . $ √ %2 σ σ n t/σ n

It follows from Equation (∗) that

! " t2 σ2 t2 lim ln φYn (t) = 2 · − =− , n→∞ 2 2 σ or, equivalently, lim φYn (t) = e−t

n→∞

2 /2

.

As the function on the right of this last display is the CHF of a standard normal random variable, the central limit theorem now follows from the continuity theorem for characteristic functions. c) A proof of the central limit theorem by using characteristic functions is preferable to that by using moment generating functions for the following reason: By using characteristic functions, we need only assume moments up to order 2 (i.e., finite variance), whereas, by using moment generating functions, we must assume that the common MGF is defined in some open interval containing 0, which, in particular, assumes moments of all orders, as we noted in part (a).

11-50

CHAPTER 11

Generating Functions and Limit Theorems

Review Exercises for Chapter 11 Basic Exercises 11.76 Let X ∼ "(α, λ). We have MX (t) =

!

λ λ−t



= λα (λ − t)−α .

Consequently, MX′ (t) = αλα (λ − t)−α−1

and

MX′′ (t) = (α + 1)αλα (λ − t)−α−2 .

Therefore, from Proposition 11.2 on page 635, E(X) = MX′ (0) = αλα (λ − 0)−α−1 = and

α λ

$ % (α + 1)α . E X 2 = MX′′ (0) = (α + 1)αλα (λ − 0)−α−2 = λ2

Also,

%2 $ % $ (α + 1)α ( α )2 α − = 2. Var(X) = E X2 − E(X) = 2 λ λ λ

11.77 For convenience, let c = 2500. a) We have MX (t) = (1 − ct)−4 . Then

MX′ (t) = 4c(1 − ct)−5

and

MX′′ (t) = 20c2 (1 − ct)−6 .

Referring to Proposition 11.2 on page 635 and setting t = 0, we get E(X) = MX′ (0) = 4c(1 − 0)−5 = 4c and Hence,

and, therefore, σX = b) We have

$ % E X 2 = MX′′ (0) = 20c2 (1 − 0)−6 = 20c2 .

%2 $ % $ Var(X) = E X2 − E(X) = 20c2 − (4c)2 = 4c2 ,

√ 4c2 = 2c = 5000.

1 MX (t) = = (1 − ct)4

!

1/c 1/c − t

"4

.

From Table 11.1 on page 634, we see that MX is the MGF of a "(4, 1/c) random variable. Hence, by the uniqueness property of MGFs, X has that distribution. Referring now to the result obtained in Exercise 11.76, we get U / 4 = 2 · c = 5000. σX = Var(X) = (1/c)2

11-51

Review Exercises for Chapter 11

11.78 a) Let T ∼ T (−1, 1). From Equation (8.57) on page 460, we have fT (x) = 1 − |x| if −1 < x < 1, and fT (x) = 0 otherwise. Therefore, MT (t) = =

&



−∞

&

tx

e fT (x) dx =

1

0

(1 − x)e

−tx

&

dx +

1 −1

&

(1 − |x|)e dx =

1

0

tx

&

0

−1

tx

(1 + x)e dx +

&

1 0

(1 − x)etx dx

(1 − x)etx dx.

We know that MT (0) = 1. From calculus, for t ̸= 0, & 1 & 1 & 1 et − 1 tet − et + 1 et − 1 − t (1 − x)etx dx = etx dx − xetx dx = − = . t t2 t2 0 0 0 Replacing t by −t, we get

&

0

Consequently, for t ̸= 0,

1

(1 − x)e−tx dx =

e−t − 1 + t . t2

$ t % e + e−t − 2 e−t − 1 + t et − 1 − t 2(cosh t − 1) + = = . MT (t) = 2 2 2 t t t t2

b) Let X ∼ T (a, b). Then we can write

a+b b−a + T, 2 2 where T ∼ T (−1, 1). For convenience, set c = (a + b)/2 and d = (b − a)/2. From Proposition 11.1 on page 633 and part (a), we have, for t ̸= 0, X=

$ % 2 cosh(dt) − 1 MX (t) = Mc+dT (t) = e MT (dt) = e · (dt)2 ! " ($ %) 2e(a+b)t/2 = ($ % )2 cosh (b − a)/2 t − 1 (b − a)/2 t ct

8e(a+b)t/2 = (b − a)2 t 2

!

ct

" (b − a)t cosh −1 . 2

c) Let X and Y be independent $ % U(0, 1) random variables. From Table 11.1 on page 634, we know that MX (t) = MY (t) = et − 1 /t for t ̸= 0. Therefore, from Proposition 11.4 on page 636, MX+Y (t) = MX (t)MY (t) =

!

et − 1 t

" ! t " e −1 e2t − 2et + 1 · = t t2

% 2et et $ t −t + e − 2 = 2 (cosh t − 1) . e t2 t Letting a = 0 and b = 2 in the result of part (b) gives the same function as the one at the end of the preceding display. Hence, X + Y has the same MGF as a T (0, 2) random variable and, so, by the uniqueness property of MGFs, we conclude that X + Y ∼ T (0, 2). =

11-52

CHAPTER 11

Generating Functions and Limit Theorems

11.79 a) We want to use the MGF of T to obtain its mean and variance. To do so, we need to determine MT′ (0) and MT′′ (0). We can accomplish this task in several ways. One way is to use the Taylor series expansion J 2n of the hyperbolic cosine, namely, cosh t = ∞ n=0 t /(2n)!. From this expansion and Exercise 11.78(a), we get ∞ ∞ 2(cosh t − 1) 2 # t 2n t2 # 2t 2n−2 = = 1 + + . MT (t) = 12 n=3 (2n)! t2 t 2 n=1 (2n)! Therefore,

MT′ (t) = Consequently,

∞ # t 4(n − 1)t 2n−3 + 6 n=3 (2n)!

MT′′ (t) =

and

∞ 1 # 4(n − 1)(2n − 3)t 2n−4 + . 6 n=3 (2n)!

E(T ) = MT′ (0) = 0

and

$ % $ $ % %2 1 Var(T ) = E T 2 − E(T ) = E T 2 = MT′′ (0) = . 6 b) As we noted in Exercise 11.78(b), we can write X=

a+b b−a + T, 2 2

where T ∼ T (−1, 1). From part (a), then, ! " a+b b−a a+b b−a a+b b−a a+b E(X) = E + T = + E(T ) = + ·0= 2 2 2 2 2 2 2 and

!

a+b b−a Var(X) = Var + T 2 2

"

=

!

b−a 2

"2

Var(T ) =

!

b−a 2

"2

·

1 (b − a)2 = . 6 24

11.80 a) Using Equation (5.44), making the substitution j = x − r, and applying Equation (5.45), we get ! " ! " ∞ ∞ # # # −r tx tx r x−r t (j +r) −r e pX (x) = e p (p − 1) = e p r (p − 1)j MX (t) = x − r j x x=r j =0 $

= pe

% t r

" ∞ ! # −r $

j =0

j

(p − 1)e

% t j

$

= pe

% t r

$ %−r 1 + (p − 1)et =

!

pet 1 − (1 − p)et

"r

For this result to make sense, we need to have (1 − p)et < 1 or, equivalently, t < − ln(1 − p). b) For convenience, set pet . q =1−p and g(t) = 1 − qet We have $ % $ %r−1 ′ $ %r−1 1 − qet pet + pet qet ′ MX (t) = r g(t) g (t) = r g(t) · (1 − qet )2 $ %r r r g(t) = = MX (t) t 1 − qe 1 − qet

.

11-53

Review Exercises for Chapter 11

and

r rqet ′ M (t) + MX (t). 1 − qet X (1 − qet )2 Consequently, by the moment generation property of MGFs, r r r E(X) = MX′ (0) = MX (0) = · 1 = 1−q p p MX′′ (t) =

and

$ % E X 2 = MX′′ (0) =

Also,

r rq r r r(1 − p) r2 r(1 − p) MX′ (0) + M (0) = · + · 1 = + . X 2 2 2 1−q p p (1 − q) p p p2

$ % $ %2 r2 r(1 − p) Var(X) = E X2 = E(X) = 2 + − p p2

! "2 r r(1 − p) = . p p2

11.81 a) From Proposition 11.4 on page 636 and Exercise 11.80(a), ! "r1 ! "rm pet pet MX1 +···+Xm (t) = MX1 (t) · · · MXm (t) = ··· 1 − (1 − p)et 1 − (1 − p)et "r1 +···+rm ! pet . = 1 − (1 − p)et

We see that the MGF of X1 + · · · + Xm is that of a random variable having the negative binomial distribution with parameters r1 + · · · + rm and p. Hence, by the uniqueness property of MGFs, we conclude that X1 + · · · + Xm ∼ N B(r1 + · · · + rm , p). b) As we know, a geometric random variable with parameter p is also a negative binomial random variable with parameters 1 and p. Hence, X1 , . . . , Xm are also independent random variables with Xj ∼ N B(1, p) for 1 ≤ j ≤ m. Consequently, from part (a), we have X1 + · · · + Xm ∼ N B(m, p). % $ % $ 11.82 Let X ∼ N µ, σ 2 and Y ∼ N ν, τ 2 . As X and Y are independent random variables, Proposition 6.9 on page 291 implies that aX and bY are also independent random variables. Thus, from Propositions 11.4 and 11.1 on pages 636 and 633, respectively, and Table 11.1 on page 634, MaX+bY (t) = MaX (t)MbY (t) = MX (at)MY (bt) = eµ(at)+σ

2 (at)2 /2

eν(bt)+τ

2 (bt)2 /2

= e(aµ+bν)t+

$ 2 2 2 2% 2 a σ +b τ t /2

.

Referring again to Table 11.1 and applying$ the uniqueness property % of MGFs (Proposition 11.3 on 2 2 2 2 page 636), we conclude that aX + bY ∼ N aµ + bν, a σ + b τ .

11.83 a) As both −1/n and 1/n converge to 0 as n → ∞, heuristically, {Xn }∞ n=1 converges in distribution to the constant random variable X = 0; that is, pX (0) = 1 and pX (x) = 0 if x ̸= 0. b) We know that MXn (0) = 1 for all n ∈ N . From Table 11.1 on page 634, we see that, for t ̸= 0, MXn (t) = Applying L’Hôpital’s rule gives

et/n − e−t/n . 2t/n

%$ % $ − t/n2 et/n + e−t/n et/n − e−t/n 2 lim MXn (t) = lim = lim = = 1. 2 n→∞ n→∞ n→∞ 2t/n 2 −2t/n

11-54

CHAPTER 11

Generating Functions and Limit Theorems

$ % Now, let X be the constant random variable equal to 0. Then, MX (t) = E et·0 = 1 for all t ∈ R. Therefore, we have shown that MXn (t) → MX (t) as n → ∞ for all t ∈ R. c) We have * 0, + if x < −1/n; 0, if x < 0; FXn (x) = (nx + 1)/2, if −1/n ≤ x < 1/n; and FX (x) = 1, if x ≥ 0. 1, if x ≥ 1/n. We note that FX is continuous except at x = 0. Now, if x < 0, then, for sufficiently large n, we have x < −1/n and, hence, FXn (x) = 0. Therefore, lim FXn (x) = lim 0 = 0 = FX (x).

n→∞

n→∞

If x > 0, then, for sufficiently large n, we have x ≥ 1/n and, hence, FXn (x) = 1. Therefore, lim FXn (x) = lim 1 = 1 = FX (x).

n→∞

n→∞

We have shown that FXn (x) → FX (x) as n → ∞ for all x ∈ R at which FX is continuous.

11.84 a) We have M_{X_n}(t) = E(e^{t·1/n}) = e^{t/n} and M_X(t) = E(e^{t·0}) = 1. Consequently, for each t ∈ R,

lim_{n→∞} M_{X_n}(t) = lim_{n→∞} e^{t/n} = 1 = M_X(t).

Therefore, by the continuity theorem, Proposition 11.5 on page 638, {X_n}_{n=1}^∞ converges in distribution to X. Now, we have P(X_n = 0) = 0 for all n ∈ N and P(X = 0) = 1. Therefore,

lim_{n→∞} P(X_n = 0) = 0 ≠ 1 = P(X = 0).

b) We have F_X(x) = 0 if x < 0, and F_X(x) = 1 if x ≥ 0.

To resolve your classmate's confusion, just note that 0 is not a point of continuity of F_X.

11.85 a) We have

M_{X,Y}(s, t) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{sx+ty} f_{X,Y}(x, y) dx dy = ∫_0^∞ ∫_x^∞ e^{sx+ty} · 6 e^{−(x+2y)} dy dx
= 6 ∫_0^∞ e^{−(1−s)x} ( ∫_x^∞ e^{−(2−t)y} dy ) dx = [6/(2 − t)] ∫_0^∞ e^{−(1−s)x} e^{−(2−t)x} dx
= [6/(2 − t)] ∫_0^∞ e^{−(3−s−t)x} dx = 6/((2 − t)(3 − s − t)),

if t < 2 and s + t < 3.
b) From Proposition 11.7 on page 645 and part (a),

M_X(s) = M_{X,Y}(s, 0) = 6/((2 − 0)(3 − s − 0)) = 3/(3 − s).

Referring now to Table 11.1 on page 634, we see that X ∼ E(3).


c) The time required to complete the second task is Y − X. From part (a),

M_{Y−X}(t) = E(e^{(Y−X)t}) = E(e^{(−t)X + tY}) = M_{X,Y}(−t, t) = 6/((2 − t)(3 − (−t) − t)) = 2/(2 − t).

Referring now to Table 11.1, we see that Y − X ∼ E(2).
d) The times required to complete the two tasks are X and Y − X. From parts (a)–(c),

M_{X,Y−X}(s, t) = E(e^{sX + t(Y−X)}) = E(e^{(s−t)X + tY}) = M_{X,Y}(s − t, t) = 6/((2 − t)(3 − (s − t) − t))
= 6/((2 − t)(3 − s)) = [3/(3 − s)] · [2/(2 − t)] = M_X(s) M_{Y−X}(t).

Hence, from Proposition 11.9 on page 646, X and Y − X are independent random variables.
e) The time required to complete the two successive tasks is Y. From Proposition 11.7 and part (a),

M_Y(t) = M_{X,Y}(0, t) = 6/((2 − t)(3 − 0 − t)) = 6/(6 − 5t + t^2).

Hence,

M_Y'(t) = −[6/(6 − 5t + t^2)^2] · (−5 + 2t) = 6(5 − 2t)/(6 − 5t + t^2)^2.

Consequently, E(Y) = M_Y'(0) = 30/36 = 5/6.

11.86 a) We have

M_{X,Y}(s, t) = Σ_{(x,y)} e^{sx+ty} p_{X,Y}(x, y)
= (4/9)[e^{s·0+t·0} 2^{−(0+0)} + e^{s·1+t·0} 2^{−(1+0)} + e^{s·0+t·1} 2^{−(0+1)} + e^{s·1+t·1} 2^{−(1+1)}]
= (4/9)[1 + e^s/2 + e^t/2 + e^{s+t}/4] = (1/9)(4 + 2e^s + 2e^t + e^{s+t}).

b) We have

∂M_{X,Y}/∂s (s, t) = (1/9)(2e^s + e^{s+t}),  ∂^2 M_{X,Y}/∂s^2 (s, t) = (1/9)(2e^s + e^{s+t}),  ∂^2 M_{X,Y}/∂t∂s (s, t) = (1/9) e^{s+t}.

Therefore, from Proposition 11.6 on page 643,

E(X) = ∂M_{X,Y}/∂s (0, 0) = 1/3,  E(X^2) = ∂^2 M_{X,Y}/∂s^2 (0, 0) = 1/3,  E(XY) = ∂^2 M_{X,Y}/∂t∂s (0, 0) = 1/9.

Thus, E(X) = 1/3 and Var(X) = 1/3 − (1/3)^2 = 2/9. By symmetry, Y has the same mean and variance as X. Also,

Cov(X, Y) = E(XY) − E(X)E(Y) = 1/9 − (1/3)(1/3) = 0.


c) From Proposition 11.7 on page 645 and part (a),

M_X(s) = M_{X,Y}(s, 0) = (1/9)(4 + 2e^s + 2e^0 + e^{s+0}) = (1/3)e^s + 2/3.

Referring now to Table 11.1 on page 634 and applying the uniqueness property of MGFs, we see that X ∼ B(1, 1/3) or, equivalently, X has the Bernoulli distribution with parameter 1/3. By symmetry, Y has the same probability distribution as X.
d) From Proposition 11.10 on page 647 and part (a),

M_{X+Y}(t) = M_{X,Y}(t, t) = (1/9)(4 + 2e^t + 2e^t + e^{t+t}) = (1/9)(4 + 4e^t + e^{2t}) = (1/9)(2 + e^t)^2 = ((1/3)e^t + 2/3)^2.

Referring now to Table 11.1 on page 634 and applying the uniqueness property of MGFs, we see that X + Y ∼ B(2, 1/3).
e) From parts (a) and (c),

M_{X,Y}(s, t) = (1/9)(4 + 2e^s + 2e^t + e^{s+t}) = (1/9)(2 + e^s)(2 + e^t) = ((1/3)e^s + 2/3)((1/3)e^t + 2/3) = M_X(s) M_Y(t).

Hence, from Proposition 11.9 on page 646, X and Y are independent random variables.

11.87 a) We have

M_{X,Y}(s, t) = Σ_{(x,y)} e^{sx+ty} p_{X,Y}(x, y) = (4/7)[e^{s·0+t·0} 2^{−(0+0)} + e^{s·1+t·0} 2^{−(1+0)} + e^{s·1+t·1} 2^{−(1+1)}]
= (4/7)[1 + e^s/2 + e^{s+t}/4] = (1/7)(4 + 2e^s + e^{s+t}).

b) We have

∂M_{X,Y}/∂s (s, t) = (1/7)(2e^s + e^{s+t}),  ∂^2 M_{X,Y}/∂s^2 (s, t) = (1/7)(2e^s + e^{s+t}),
∂M_{X,Y}/∂t (s, t) = (1/7) e^{s+t},  ∂^2 M_{X,Y}/∂t^2 (s, t) = (1/7) e^{s+t},  ∂^2 M_{X,Y}/∂s∂t (s, t) = (1/7) e^{s+t}.

Therefore, from Proposition 11.6 on page 643,

E(X) = ∂M_{X,Y}/∂s (0, 0) = 3/7,  E(X^2) = ∂^2 M_{X,Y}/∂s^2 (0, 0) = 3/7,
E(Y) = ∂M_{X,Y}/∂t (0, 0) = 1/7,  E(Y^2) = ∂^2 M_{X,Y}/∂t^2 (0, 0) = 1/7,  E(XY) = ∂^2 M_{X,Y}/∂s∂t (0, 0) = 1/7.


Hence, E(X) = 3/7, Var(X) = 3/7 − (3/7)^2 = 12/49, E(Y) = 1/7, Var(Y) = 1/7 − (1/7)^2 = 6/49, and

Cov(X, Y) = E(XY) − E(X)E(Y) = 1/7 − (3/7)(1/7) = 4/49.

c) From Proposition 11.7 on page 645 and part (a),

M_X(s) = M_{X,Y}(s, 0) = (1/7)(4 + 2e^s + e^{s+0}) = (3/7)e^s + 4/7.

Referring now to Table 11.1 on page 634 and applying the uniqueness property of MGFs, we see that X ∼ B(1, 3/7) or, equivalently, X has the Bernoulli distribution with parameter 3/7. Also,

M_Y(t) = M_{X,Y}(0, t) = (1/7)(4 + 2e^0 + e^{0+t}) = (1/7)e^t + 6/7.

Referring again to Table 11.1 and applying the uniqueness property of MGFs, we see that Y ∼ B(1, 1/7) or, equivalently, Y has the Bernoulli distribution with parameter 1/7.
d) From Proposition 11.10 on page 647 and part (a),

M_{X+Y}(t) = M_{X,Y}(t, t) = (1/7)(4 + 2e^t + e^{2t}) = 4/7 + (2/7)e^t + (1/7)e^{2t}.

Applying the uniqueness property of MGFs, we conclude that

p_{X+Y}(z) = 4/7 if z = 0;  2/7 if z = 1;  1/7 if z = 2;  0 otherwise.

e) We claim that X + Y does not have a binomial distribution. Suppose to the contrary that it does, say, X + Y ∼ B(2, p). Then

C(2,0) p^0 (1 − p)^2 = (1 − p)^2 = p_{X+Y}(0) = 4/7  and  C(2,2) p^2 (1 − p)^0 = p^2 = p_{X+Y}(2) = 1/7.

Therefore,

1 = p + (1 − p) = 1/√7 + 2/√7 = 3/√7,

or √7 = 3, which is not the case. Hence, X + Y does not have a binomial distribution.
f) From parts (a) and (c),

M_{X,Y}(s, t) = (1/7)(4 + 2e^s + e^{s+t}) = 4/7 + (2/7)e^s + (1/7)e^{s+t}

and

M_X(s) M_Y(t) = [(3/7)e^s + 4/7][(1/7)e^t + 6/7] = 24/49 + (18/49)e^s + (4/49)e^t + (3/49)e^{s+t}.

Thus, M_{X,Y}(s, t) ≠ M_X(s) M_Y(t) and, hence, from Proposition 11.9 on page 646, X and Y are not independent random variables.

11.88 a) Answers will vary.


b) As we know, the mean of a P(3) random variable is 3. So, in view of the law of large numbers, we would expect the average value of the 10,000 numbers obtained in part (a) to be roughly 3.
c) Answers will vary, but should be close to 3. We got 2.9747.

11.89 a) As g is Riemann integrable on [a, b], it is bounded thereon, say, by M. Define h(x) = g(x)/f(x) if x ∈ [a, b], and h(x) = 0 otherwise. Then

∫_{−∞}^{∞} |h(x)| f(x) dx = ∫_a^b |g(x)/f(x)| f(x) dx = ∫_a^b |g(x)| dx ≤ M ∫_a^b 1 dx = (b − a)M < ∞.

Consequently, we conclude from the FEF that the random variable h(X) has finite expectation. However, P(g(X)/f(X) = h(X)) ≥ P(X ∈ [a, b]) = 1. Hence, g(X)/f(X) = h(X) with probability 1. In particular, then, g(X)/f(X) has the same probability distribution as h(X). Thus, g(X)/f(X) has finite expectation.
b) Let X be as in part (a) and let h be as in the solution to part (a). From the FEF and the result of part (a),

E(h(X)) = ∫_{−∞}^{∞} h(x) f(x) dx = ∫_a^b [g(x)/f(x)] f(x) dx = ∫_a^b g(x) dx.

Also, from the solution to part (a), g(X)/f(X) has the same probability distribution as h(X) and, hence, the same mean. Because X_1, X_2, ... are independent and identically distributed random variables, so are the random variables g(X_1)/f(X_1), g(X_2)/f(X_2), .... Applying the strong law of large numbers to these latter random variables, we get

lim_{n→∞} Î_n(f, g) = lim_{n→∞} (1/n) Σ_{k=1}^{n} g(X_k)/f(X_k) = E(g(X)/f(X)) = ∫_a^b g(x) dx,

with probability 1.
c) We have

∫_{−∞}^{∞} h(x)^2 f(x) dx = ∫_a^b [g(x)/f(x)]^2 f(x) dx = ∫_a^b g(x)^2/f(x) dx.

Hence, from the FEF, h(X)^2 has finite expectation if and only if

∫_a^b g(x)^2/f(x) dx < ∞.  (∗)

We note that Î_n(f, g) has finite variance if and only if g(X)/f(X) has finite second moment, which is the case if and only if h(X)^2 has finite expectation. Thus, we have shown that Î_n(f, g) has finite variance if and only if Relation (∗) holds. In that case,

Var(Î_n(f, g)) = Var((1/n) Σ_{k=1}^{n} g(X_k)/f(X_k)) = (1/n^2) Σ_{k=1}^{n} Var(g(X_k)/f(X_k)) = (1/n^2) · n Var(h(X))
= (1/n)[E(h(X)^2) − (E(h(X)))^2] = (1/n)[∫_a^b g(x)^2/f(x) dx − (∫_a^b g(x) dx)^2].

d) Suppose a = 0, b = 1, and f is the PDF of a U(0, 1) random variable. Then f (x) = 1 if 0 < x < 1, and f (x) = 0 otherwise. However, because we can change the value of a PDF at a finite number of


points without affecting the distribution of the corresponding random variable, we can take f(x) = 1 if x ∈ [0, 1], and f(x) = 0 if x ∉ [0, 1]. In this case, the result of part (b) becomes

lim_{n→∞} Î_n(f, g) = lim_{n→∞} (1/n) Σ_{k=1}^{n} g(X_k)/f(X_k) = lim_{n→∞} (1/n) Σ_{k=1}^{n} g(X_k) = ∫_0^1 g(x) dx,

with probability 1, which is the result of the Monte Carlo integration technique presented in Exercise 11.43. Referring to part (c), we see that, in this case,

Var(Î_n(f, g)) = (1/n)[∫_0^1 g(x)^2 dx − (∫_0^1 g(x) dx)^2].
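The estimator in part (d) is easy to try numerically. The following Python sketch is not part of the original solution; the integrand e^{−x^2} and the sample size are illustrative assumptions. It averages g at uniform random points and reports the standard error suggested by part (c).

    import numpy as np

    def mc_integral(g, n, rng=None):
        """Monte Carlo estimate of the integral of g over [0, 1]."""
        rng = np.random.default_rng() if rng is None else rng
        x = rng.uniform(0.0, 1.0, size=n)
        values = g(x)
        estimate = values.mean()
        # Standard error based on part (c): Var = (E[g^2] - E[g]^2)/n.
        std_error = values.std(ddof=1) / np.sqrt(n)
        return estimate, std_error

    if __name__ == "__main__":
        g = lambda x: np.exp(-x**2)           # illustrative integrand
        est, se = mc_integral(g, n=100_000)
        print(f"estimate = {est:.5f} +/- {se:.5f}")   # true value is about 0.74682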

11.90 Let X denote the number of the 100 sampled items that are defective. Then X ∼ B(100, 0.08). We want P(X ≥ 12). Note that np = 100 · 0.08 = 8 and n(1 − p) = 100 · 0.92 = 92, both of which are 5 or greater. Applying the integral De Moivre–Laplace theorem, Proposition 11.11 on page 661, we get

P(X ≥ 12) = P(12 ≤ X ≤ 100) ≈ Φ((100 + 1/2 − 100 · 0.08)/√(100 · 0.08 · 0.92)) − Φ((12 − 1/2 − 100 · 0.08)/√(100 · 0.08 · 0.92))
= Φ(34.09595) − Φ(1.29012) = 0.0985.

11.91 Let X denote the number of the 10,000 sampled males who have Fragile X Syndrome. Then we have X ∼ B(10,000, 1/1500). Note that np = 10,000/1500 = 20/3 and that

√(np(1 − p)) = √(10,000 · (1/1500) · (1499/1500)) = √1499/15.

Applying the integral De Moivre–Laplace theorem, we get

P(a ≤ X ≤ b) ≈ Φ((b + 1/2 − 20/3)/(√1499/15)) − Φ((a − 1/2 − 20/3)/(√1499/15)),  0 ≤ a ≤ b ≤ 10,000.

a) We have

P(X > 7) = P(8 ≤ X ≤ 10,000) ≈ Φ((10,000 + 1/2 − 20/3)/(√1499/15)) − Φ((8 − 1/2 − 20/3)/(√1499/15))
= Φ(3871.88585) − Φ(0.32286) = 0.373.

b) We have

P(X ≤ 10) = P(0 ≤ X ≤ 10) ≈ Φ((10 + 1/2 − 20/3)/(√1499/15)) − Φ((0 − 1/2 − 20/3)/(√1499/15))
= Φ(1.48514) − Φ(−2.77656) = 0.928.

11.92 b) We have

Probability      Local    Integral    Poisson
P(X > 7)         0.375    0.373       0.352
P(X ≤ 10)        0.930    0.928       0.923
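A quick numerical companion to Exercises 11.91–11.92: the sketch below (not from the text; it assumes SciPy is available and omits the local approximation shown in the table) computes the exact binomial probabilities together with the integral De Moivre–Laplace and Poisson approximations.

    from scipy.stats import binom, norm, poisson

    n, p = 10_000, 1 / 1500                 # setting of Exercises 11.91-11.92
    mu, sigma = n * p, (n * p * (1 - p)) ** 0.5

    def integral_dml(a, b):
        """Normal approximation with continuity correction for P(a <= X <= b)."""
        return norm.cdf((b + 0.5 - mu) / sigma) - norm.cdf((a - 0.5 - mu) / sigma)

    for label, a, b in [("P(X > 7)", 8, n), ("P(X <= 10)", 0, 10)]:
        exact = binom.cdf(b, n, p) - binom.cdf(a - 1, n, p)
        approx_normal = integral_dml(a, b)
        approx_poisson = poisson.cdf(b, mu) - poisson.cdf(a - 1, mu)
        print(f"{label}: exact={exact:.3f}  integral={approx_normal:.3f}  Poisson={approx_poisson:.3f}")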


11.93 Let X denote the number of the 1500 cases in which the new oral vaccine is effective. Then we have X ∼ B(1500, 0.8). Note that np = 1500 · 0.8 = 1200 and that

√(np(1 − p)) = √(1500 · 0.8 · 0.2) = 4√15.

Applying the integral De Moivre–Laplace theorem, we get

P(a ≤ X ≤ b) ≈ Φ((b + 1/2 − 1200)/(4√15)) − Φ((a − 1/2 − 1200)/(4√15)),  0 ≤ a ≤ b ≤ 1500.

a) We have

P(X = 1225) = P(1225 ≤ X ≤ 1225) ≈ Φ((1225 + 1/2 − 1200)/(4√15)) − Φ((1225 − 1/2 − 1200)/(4√15))
= Φ(1.64602) − Φ(1.58147) = 0.00701.

b) We have

P(X ≥ 1175) = P(1175 ≤ X ≤ 1500) ≈ Φ((1500 + 1/2 − 1200)/(4√15)) − Φ((1175 − 1/2 − 1200)/(4√15))
= Φ(19.39719) − Φ(−1.64602) = 0.950.

c) We have

P(1150 ≤ X ≤ 1250) ≈ Φ((1250 + 1/2 − 1200)/(4√15)) − Φ((1150 − 1/2 − 1200)/(4√15))

= Φ(3.25976) − Φ(−3.25976) = 0.999.

11.94 For convenience, set S = X_1 + X_2 + X_3.
a) The following table provides the 27 possible outcomes and their sums:

(X1, X2, X3)  S    (X1, X2, X3)  S    (X1, X2, X3)  S
(1, 1, 1)     3    (2, 1, 1)     4    (3, 1, 1)     5
(1, 1, 2)     4    (2, 1, 2)     5    (3, 1, 2)     6
(1, 1, 3)     5    (2, 1, 3)     6    (3, 1, 3)     7
(1, 2, 1)     4    (2, 2, 1)     5    (3, 2, 1)     6
(1, 2, 2)     5    (2, 2, 2)     6    (3, 2, 2)     7
(1, 2, 3)     6    (2, 2, 3)     7    (3, 2, 3)     8
(1, 3, 1)     5    (2, 3, 1)     6    (3, 3, 1)     7
(1, 3, 2)     6    (2, 3, 2)     7    (3, 3, 2)     8
(1, 3, 3)     7    (2, 3, 3)     8    (3, 3, 3)     9

Because each of the 27 outcomes is equally likely, we obtain the following probability distribution of the random variable S.

s        3      4      5      6      7      8      9
pS(s)   1/27   3/27   6/27   7/27   6/27   3/27   1/27


b) From the FPF and part (a),

P(S ≤ 7) = Σ_{s≤7} pS(s) = 1/27 + 3/27 + 6/27 + 7/27 + 6/27 = 23/27.
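The distribution in part (a) and the probability in part (b) can be checked by brute-force enumeration. The following Python sketch is a check only, not part of the original solution.

    from collections import Counter
    from fractions import Fraction
    from itertools import product

    # Enumerate the 27 equally likely outcomes of (X1, X2, X3) and tally S.
    counts = Counter(sum(outcome) for outcome in product([1, 2, 3], repeat=3))
    dist = {s: Fraction(c, 27) for s, c in sorted(counts.items())}

    print(dist)                                        # pS(3)=1/27, ..., pS(9)=1/27
    print(sum(p for s, p in dist.items() if s <= 7))   # 23/27, as in part (b)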

c) In the following graph, we use p(s) instead of pS(s). For comparison purposes, we have superimposed the appropriate normal curve. As we see, the probability histogram is quite bell shaped and reasonably resembles the superimposed normal curve.
d) We have

µ = E(X_j) = Σ_x x p_{X_j}(x) = 1 · (1/3) + 2 · (1/3) + 3 · (1/3) = 2

and

σ^2 = E(X_j^2) − (E(X_j))^2 = Σ_x x^2 p_{X_j}(x) − 2^2 = 1^2 · (1/3) + 2^2 · (1/3) + 3^2 · (1/3) − 4 = 2/3.

Applying the normal approximation with a continuity correction, we get

P(S ≤ 7) = P(0 ≤ S ≤ 7) ≈ Φ((7 + 1/2 − 3 · 2)/(√(2/3) · √3)) − Φ((0 − 1/2 − 3 · 2)/(√(2/3) · √3))
= Φ(1.06066) − Φ(−4.59619) = 0.856.

This approximate probability compares quite favorably with the exact probability of 23/27 = 0.852 found in part (b).

11.95 Let X_1, ..., X_140 denote the scores of the 140 people. We assume that these random variables are independent and identically distributed with common mean 73 and common standard deviation 5. In view of the central limit theorem, in the form of the fourth bulleted item on page 667, we have

P(X̄_140 > 74) = 1 − P(X̄_140 ≤ 74) = 1 − P(0 ≤ X̄_140 ≤ 74)
≈ 1 − [Φ((74 − 73)/(5/√140)) − Φ((0 − 73)/(5/√140))] = 1 − [Φ(2.36643) − Φ(−172.74953)] = 0.00898.


11.96 Let X_j (1 ≤ j ≤ 20) denote the lifetime, in hours, of bulb j. Also, let Y_j (1 ≤ j ≤ 19) denote the replacement time, in hours, to install bulb j + 1 after bulb j burns out. We know that X_1, ..., X_20 are independent and identically distributed random variables, each with mean 300 and standard deviation 300, and we assume that Y_1, ..., Y_19 are independent and identically distributed random variables, each with mean 0.5 and standard deviation 0.1, and independent of the X_j s. Let T denote the total time that the 20 bulbs last, so that T = X_1 + ··· + X_20 + Y_1 + ··· + Y_19. From the central limit theorem,

X_1 + ··· + X_20 ≈ N(20 · 300, 20 · 300^2)  and  Y_1 + ··· + Y_19 ≈ N(19 · 0.5, 19 · 0.1^2).

Therefore, from the independence of X_1 + ··· + X_20 and Y_1 + ··· + Y_19, we conclude that T is approximately normally distributed with

µ_T = 20 · 300 + 19 · 0.5 = 6009.5  and  σ_T^2 = 20 · 300^2 + 19 · 0.1^2 = 1,800,000.19.

Consequently,

P(T ≤ 7000) = P(0 ≤ T ≤ 7000) ≈ Φ((7000 − 6009.5)/√1,800,000.19) − Φ((0 − 6009.5)/√1,800,000.19)
= Φ(0.73828) − Φ(−4.47922) = 0.770.

11.97 All units are in years. We assume that the difference between a true age and its rounded age is uniformly distributed on the interval from −2.5 to 2.5. Now, for 1 ≤ j ≤ 48, let X_j and Y_j denote the age and rounded age, respectively, of the jth person. We assume that X_1, ..., X_48 are independent and identically distributed, that Y_1, ..., Y_48 are independent and identically distributed, and that the X_j s and Y_j s are independent. Let D_j = X_j − Y_j. By assumption, D_j ∼ U(−2.5, 2.5) and, from Proposition 6.13 on page 297, we conclude that D_1, ..., D_48 are independent. We want to find the approximate value of P(|X̄_48 − Ȳ_48| ≤ 0.25). However,

X̄_48 − Ȳ_48 = (1/48) Σ_{j=1}^{48} X_j − (1/48) Σ_{j=1}^{48} Y_j = (1/48) Σ_{j=1}^{48} (X_j − Y_j) = (1/48) Σ_{j=1}^{48} D_j = D̄_48.

Because D_j ∼ U(−2.5, 2.5), we have µ = E(D_j) = 0 and σ^2 = Var(D_j) = 5^2/12. Applying the central limit theorem, in the form of Equation (11.43) on page 668, we have

P(|X̄_48 − Ȳ_48| ≤ 0.25) = P(|D̄_48 − 0| ≤ 0.25) ≈ 2Φ(√48 · 0.25/(5/√12)) − 1 = 2Φ(1.2) − 1 = 0.770.

11.98 Let X_j (1 ≤ j ≤ 1250) denote the number of claims filed by policyholder j during the 1-year period. We know that X_1, ..., X_1250 are independent random variables, each having the Poisson distribution with mean (parameter) 2.
a) The total number of claims during the 1-year period is T = X_1 + ··· + X_1250, which, by Proposition 6.20(b) on page 311, has a P(2500) distribution. Hence,

P(2450 ≤ T ≤ 2600) = e^{−2500} Σ_{k=2450}^{2600} (2500)^k/k!.

Review Exercises for Chapter 11

$ % $ % b) We have µ = E Xj = 2 and σ 2 = Var Xj = 2. Applying the central limit theorem, we get P (2450 ≤ T ≤ 2600) = P (2450 ≤ X1 + · · · + X1250 ≤ 2600) ! " ! " 2600 − 1250 · 2 2450 − 1250 · 2 ≈) −) √ √ √ √ 2 1250 2 1250 = )(2) − )(−1) = 0.819. 11.99 As X1 , X2 , . . . are independent and identically distributed random variables, so are the random variables ln X1$,Vln X2 , . %. . . By Jnassumption, these latter random variables have finite nonzero variance. n Noting that ln j =1 Xj = j =1 ln Xj , we conclude from the central limit theorem that, for large n, $V % V the random variable ln jn=1 Xj has approximately a normal distribution or, equivalently, jn=1 Xj has approximately a lognormal distribution.

11.100 We apply Equation (11.47) on page 671 with n = 30, x¯30 = 27.97, σ = 10.04, and γ = 0.90. A 90% confidence interval for µ is from 10.04 27.97 − )−1 (0.95) · √ 30

to

10.04 27.97 + )−1 (0.95) · √ 30

or from 24.955 to 30.985. We can be 90% confident that the mean commute time in Washington, D.C., is somewhere between 24.955 and 30.985 minutes.
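As a quick numerical check of this interval (a sketch assuming SciPy; not part of the original solution):

    from scipy.stats import norm

    n, xbar, sigma, gamma = 30, 27.97, 10.04, 0.90   # data of Exercise 11.100
    z = norm.ppf(0.5 + gamma / 2)                    # Phi^{-1}(0.95), about 1.645
    half_width = z * sigma / n ** 0.5

    print(f"{gamma:.0%} CI for mu: ({xbar - half_width:.3f}, {xbar + half_width:.3f})")
    # roughly (24.955, 30.985), matching the interval above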

Theory Exercises 11.101 We know that X1 , X2 , . . . are independent Bernoulli random variables with common parameter p and, hence, with common finite mean p. As usual, we let X¯ n = (X1 + · · · + Xn ) /n. a) From the strong law of large numbers, Theorem 11.2 on page 654, limn→∞ X¯ n = p with probability 1. Let W X E = ω : lim X¯ n (ω) = p . n→∞

Then P (E) = 1. Let ω ∈ E. Because f is continuous at p and limn→∞ X¯ n (ω) = p, we conclude that ( ) $ % lim f X¯ n (ω) = f lim X¯ n (ω) = f (p). n→∞

n→∞

$ % Because the preceding result holds for all ω ∈ E, we have limn→∞ f X¯ n = f (p) with probability 1. b) Let Sn = X1 + · · · + Xn . Then Sn ∼ B(n, p). Applying the FEF with g(x) = f (x/n) yields 0 ! "1 % $ % # $ X1 + · · · + Xn E f g(x)pSn (x) = E f (Sn /n) = E g(Sn ) = n x ! " n # # n k = f (x/n)pSn (x) = f (k/n) p (1 − p)n−k . k x k=0

c) Suppose that f is bounded by M on [0, 1]. By the triangle inequality, |f (x) − f (y)| ≤ |f (x)| + |f (y)| ≤ M + M = 2M,

x, y ∈ [0, 1].

Now, let ϵ > 0 be given. Because f is uniformly continuous on [0, 1], we can choose a δ > 0 such that |f (x) − f (y)| < ϵ/2 whenever x, y ∈ [0, 1] and |x − y| < δ. Then we choose N ∈ N such that, if n ≥ N, then M/2nδ 2 < ϵ/2.


Noting that Σ_{k=0}^{n} p_{S_n}(k) = 1, the triangle inequality yields

' ' ' ' ! " n n ' ' ' # # ' ' '' n ' ' ' k n−k 'f (p) − Bn (p)' = 'f (p) − f (k/n) p (1 − p) ' = 'f (p) − f (k/n)pSn (k)' ' ' ' ' k k=0 k=0 ' ' n n ' ' # '# ' % ' ' $ 'f (p) − f (k/n)' pS (k) f (p) − f (k/n) pSn (k)' ≤ =' n ' k=0 'k=0 # ' # ' ' ' 'f (p) − f (k/n)' pS (k) + 'f (p) − f (k/n)' pS (k). = n n |k/n−p| 1/2. Because FX has only a countable number of points of discontinuity, we can choose x0 ∈ CFX with x0 ≥ M. Then, from part (b) and the definition of convergence in distribution, 1 1 = lim FXn (x0 ) = FX (x0 ) ≥ FX (M) > , 2 n→∞ 2 ∞ which is impossible. Hence, as suggested in part (a), {Xn }n=1 does not converge in distribution to a random variable. 11.110 a) We have

M_N(t) = E(e^{tN}) = E((e^t)^N) = P_N(e^t).

b) From part (a),

M_N'(t) = e^t P_N'(e^t)

and

M_N''(t) = e^t · e^t P_N''(e^t) + e^t · P_N'(e^t) = e^{2t} P_N''(e^t) + e^t P_N'(e^t).

Applying the moment-generating property of MGFs, Proposition 11.2 on page 635, we now obtain

E(N) = M_N'(0) = e^0 P_N'(e^0) = P_N'(1)

and

E(N^2) = M_N''(0) = e^{2·0} P_N''(e^0) + e^0 P_N'(e^0) = P_N''(1) + P_N'(1).

Thus,

Var(N) = E(N^2) − (E(N))^2 = P_N''(1) + P_N'(1) − (P_N'(1))^2.

11.111 a) From the multiplication property of MGFs (Proposition 11.4 on page 636), E(e^{tS_N} | N) = (M_X(t))^N. Hence, by the law of total expectation,

M_{S_N}(t) = E(e^{tS_N}) = E(E(e^{tS_N} | N)) = E((M_X(t))^N) = P_N(M_X(t)).

b) Referring to part (a), we get

M_{S_N}'(t) = P_N'(M_X(t)) M_X'(t)

and

M_{S_N}''(t) = P_N'(M_X(t)) M_X''(t) + P_N''(M_X(t)) (M_X'(t))^2.

Applying the moment-generating property of MGFs and Exercise 11.110(b) now yields

E(S_N) = M_{S_N}'(0) = P_N'(M_X(0)) M_X'(0) = P_N'(1) M_X'(0) = µE(N)

and

E(S_N^2) = M_{S_N}''(0) = P_N'(M_X(0)) M_X''(0) + P_N''(M_X(0)) (M_X'(0))^2 = P_N'(1) E(X^2) + P_N''(1) (E(X))^2
= σ^2 P_N'(1) + µ^2 (P_N''(1) + P_N'(1)) = σ^2 E(N) + µ^2 (P_N''(1) + P_N'(1)).

Consequently,

Var(S_N) = E(S_N^2) − (E(S_N))^2 = σ^2 E(N) + µ^2 (P_N''(1) + P_N'(1)) − (µP_N'(1))^2
= σ^2 E(N) + µ^2 [P_N''(1) + P_N'(1) − (P_N'(1))^2] = σ^2 E(N) + µ^2 Var(N).

11.112 Let X1 + X2 , X = X¯ 2 = 2

X1 − X2 Y = X1 − X¯ 2 = , 2

X2 − X1 Z = X2 − X¯ 2 = . 2

Then, sX + tY + uZ =

sX1 + sX2 1 tX1 − tX2 uX2 − uX1 1 + + = (s + t − u)X1 + (s − t + u)X2 . 2 2 2 2 2


Applying the independence of X1 and X2 and then recalling that the MGF of a standard normal random 2 variable is et /2 , we get ) ( 1 ) ( 1 ) ( 1 1 MX,Y,Z (s, t, u) = E esX+tY +uZ = E e 2 (s+t−u)X1 + 2 (s−t+u)X2 = E e 2 (s+t−u)X1 e 2 (s−t+u)X2 ( 1 ) ( 1 ) 2 2 2 2 = E e 2 (s+t−u)X1 E e 2 (s−t+u)X2 = e(s+t−u) /8 e(s−t+u) /8 = es /8+(t−u) /8 = es

2 /8

2 /8

e(t−u)

.

First setting t = u = 0 in the preceding result and then setting s = 0, we conclude that MX,Y,Z (s, t, u) = es

2 /8

2 /8

e(t−u)

= MX (s)MY,Z (t, u).

Consequently, X and (Y, Z) are independent; that is, X¯ 2 is independent of the random variables X1 − X¯ 2 and X2 − X¯ 2 . 11.113 a) Let M bound the variance of X1 , X2 , . . . . We have 0 1 !# " n n n n n # # # # $ % Var Xk = Cov Xk , Xk = Cov Xj , Xk k=1

k=1

=2 =2

k n # # k=1 j =1

k n # # k=1 j =1

k=1 j =1

k=1

n $ % # Cov Xj , Xk − Cov(Xk , Xk ) k=1

$

%

Cov Xj , Xk −

n #

Var(Xk ) .

k=1

J We note that nk=1 Var(Xk ) ≤ nM and, hence, upon division by n2 , the second term on the right of the preceding display approaches 0 as n → ∞. J We claim that the same is true for the first Jterm on the right of the preceding display. Indeed, let sn = n−1 nk=1 Cov(Xk , Xn ), an = n, and bn = nk=1 ak = n(n + 1)/2. By assumption, limn→∞ sn = 0. Now, n n # k n $ % 1 # 1 # n(n + 1) 1 # Cov X , X = a s = · ak sk . j k k k bn k=1 n2 k=1 2n2 n2 k=1 j =1

Applying Toeplitz’s lemma, we conclude that the term on the right $ofJthe preceding display converges % to (1/2) · 0 = 0 as n → ∞. We have now shown that limn→∞ Var nk=1 Xk /n2 = 0. The required result now follows from Markov’s weak law of large numbers, Exercise 11.49. b) Let M bound the variance of X1 , X2 , . . . . From the hint, ' $ %' 6 $ % 'Cov Xi , Xj ' ≤ Var(Xi ) Var Xj ≤ M, i, j ∈ N . ' $ %' Let ϵ > 0. Choose N ∈ N so that 'Cov Xi , Xj ' < ϵ if |i − j | ≥ N . Then, for n > N, ' n ' ' # ' ' N−1 ' n−1 '1 # ' ' 1 n−1 ' '1 # ' 1 # ' ' ' ' ' ' Cov(X , X = Cov(X , X = Cov(X , X + Cov(X , X ) ) ) ) k n n−k n n−k n n−k n 'n ' 'n ' 'n ' n k=N k=1 k=0 k=0 ≤

n−1 ' #' # ' ' 1 N−1 'Cov(Xn−k , Xn )' + 1 'Cov(Xn−k , Xn )' ≤ N M + n − N ϵ. n k=0 n k=N n n


Consequently,

' n ' ! " '1 # ' N n−N ' ' lim sup' Cov(Xn−k , Xn )' ≤ lim M+ ϵ = ϵ. n→∞ n n n→∞ n k=0

As ϵ > 0 was arbitrary, we conclude that Equation (∗) holds.

11.114 a) If k = 0,

∫_{−π}^{π} e^{ikt} dt = ∫_{−π}^{π} e^{i·0·t} dt = ∫_{−π}^{π} 1 dt = 2π.

If k ≠ 0,

∫_{−π}^{π} e^{ikt} dt = (1/(ik)) [e^{ikt}]_{−π}^{π} = (1/(ik))(e^{ikπ} − e^{−ikπ}) = (1/(ik))(cos(kπ) − cos(−kπ)) = (1/(ik))(cos(kπ) − cos(kπ)) = 0.

b) For x ∈ Z, we have, in view of part (a),

(1/2π) ∫_{−π}^{π} e^{−ixt} φ_X(t) dt = (1/2π) ∫_{−π}^{π} e^{−ixt} ( Σ_{k∈Z} e^{itk} p_X(k) ) dt
= (1/2π) Σ_{k∈Z} ( ∫_{−π}^{π} e^{i(k−x)t} dt ) p_X(k) = (1/2π) · 2π p_X(x) = p_X(x).

c) From the multiplication property of CHFs for independent random variables, we know that

φ_{X_1+···+X_m}(t) = φ_{X_1}(t) ··· φ_{X_m}(t) = φ(t) ··· φ(t) = (φ(t))^m  (m factors).

Applying part (b) to the random variable X_1 + ··· + X_m, we conclude that

p_{X_1+···+X_m}(x) = (1/2π) ∫_{−π}^{π} e^{−ixt} φ_{X_1+···+X_m}(t) dt = (1/2π) ∫_{−π}^{π} e^{−ixt} (φ(t))^m dt,  x ∈ Z.
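The inversion formula of part (c) can be checked numerically. The Python sketch below is an illustration only: the toy PMF on {0, 1, 2} and the number of summands m are assumptions, and the integral is approximated by the trapezoidal rule.

    import numpy as np

    values = np.array([0, 1, 2])
    probs = np.array([0.5, 0.3, 0.2])       # assumed toy PMF of each X_j
    m = 3                                   # number of i.i.d. summands

    t = np.linspace(-np.pi, np.pi, 4001)
    phi = (probs * np.exp(1j * np.outer(t, values))).sum(axis=1)   # phi(t) = E[e^{itX}]

    for x in range(m * values.max() + 1):
        integrand = np.exp(-1j * x * t) * phi**m
        p_x = np.trapz(integrand, t).real / (2 * np.pi)
        print(f"P(X1+...+X{m} = {x}) is approximately {p_x:.6f}")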

11.115 $ % 2 2 a) From Exercise 11.19, we know that, if X ∼ N 0, σ 2 , then φX (t) = e−σ t /2 . Thus, for all t ∈ R, & ∞ & ∞ 1 2 2 2 2 eitx fX (x) dx = √ eitx e−x /2σ dx. e−σ t /2 = φX (t) = 2π σ −∞ −∞ Hence,

e

−σ 2 t 2 /2

=√

1

2π σ Interchanging the roles of x and t yields e

−σ 2 x 2 /2

1

=√ 2π σ

&

∞ −∞

&

∞ −∞

eitx e−x

2 /2σ 2

eixt e−t

2 /2σ 2

dx,

t ∈ R.

dt,

x ∈ R.


Now replacing x by −x and σ by 1/σ , we get & ∞ σ 2 2 −x 2 /2σ 2 =√ e−ixt e−σ t /2 dt, e 2π −∞ Hence, for all x ∈ R,

1 1 2 2 e−x /2σ = fX (x) = √ 2π 2π σ

&



−∞

e−ixt e−σ

2 t 2 /2

x ∈ R.

dt =

1 2π

&



−∞

e−ixt φX (t) dt.

b) By independence, ( ) ( ) ( ) ( ) 2 2 φX+cZ (t) = E eit (X+cZ) = E eitX eictZ = E eitX E eictZ = φX (t)φZ (ct) = φX (t)e−c t /2 .

Now,

)' (' ' ' ' ( ') 'φX (t)' = ''E eitX '' ≤ E 'eitX ' = E(1) = 1.

Therefore, &

∞'

−∞

' 'φX+cZ (t)' dt =

&

∞'

−∞

'φX (t)e

−c2 t 2 /2

' ' dt ≤

&



e

−∞

−c2 t 2 /2

√ 2π dt = < ∞. c

Thus, X + cZ is a continuous random variable and, by Equation (∗∗), & ∞ & ∞ 1 1 2 2 −iyt e φX+cZ (t) dt = e−ity φX (t)e−c t /2 dt, fX+cZ (y) = 2π −∞ 2π −∞

y ∈ R.

11.116 a) We have φY (t) =

&



e

ity

−∞

1 = 2

&



−∞

e

1 fY (y) dy = 2 −|y|

&



−∞

i cos ty dy + 2

e

&

ity −|y|



−∞

e

1 dy = 2

&



−∞

(cos ty + i sin ty)e−|y| dy

e−|y| sin ty dy.

The second integral on the right of the preceding display equals 0 because the integrand is an odd function. Because the integrand in the first integral on the right of the preceding display is an even function, 8 9∞ & ∞ 1 1 −y e−y cos ty dy = e (− cos ty + t sin ty) = . φY (t) = 1 + t2 1 + t2 0 0 b) From part (a) and Equation (∗∗), we have & ∞ & ∞ 1 1 1 1 −|y| e = fY (y) = e−iyt φY (t) dt = e−iyt dt, 2 2π −∞ 2π −∞ 1 + t2

or, equivalently,

e

−|y|

=

&



1 % dt, e−iyt $ π 1 + t2 −∞

y ∈ R,

y ∈ R.

Replacing y by t and t by −x, we get & ∞ & ∞ 1 % dx = eitx $ eitx fX (x) dx = φX (t), e−|t| = 2 π 1 + x −∞ −∞

t ∈ R.


Chapter 12

Applications of Probability Theory

12.1 The Poisson Process

Basic Exercises

12.1 Let X_j = 1 if the jth Bernoulli trial results in success and let X_j = 0 otherwise. Then X_1, X_2, ... are independent Bernoulli random variables with common mean p. On the one hand, as time is quantized into units of duration 1/n, the number of successes that occur per unit of continuous time is X_1 + ··· + X_n, which is to equal λ. On the other hand, by the strong law of large numbers, X_1 + ··· + X_n ≈ np for large n. Hence, np ≈ λ, or p ≈ λ/n.

12.2 a) As one Bernoulli trial is performed at the end of each 1/n time-unit period (i.e., at times 1/n, 2/n, ...), the number performed by time t equals the greatest integer in t/(1/n). In other words, for t > 0, the quantity ⌊nt⌋ represents the number of Bernoulli trials performed by time t.
b) Recall that x − 1 < ⌊x⌋ ≤ x for all x ∈ R. Hence,

k − 1 = (⌊ns⌋ + 1) − 1 = ⌊ns⌋ ≤ ns < ⌊ns⌋ + 1 = (k − 1) + 1 = k,

and, hence, (k − 1)/n ≤ s < k/n. Furthermore,

k + m − 1 = (⌊ns⌋ + 1) + (⌊nt⌋ − ⌊ns⌋) − 1 = ⌊nt⌋ ≤ nt < ⌊nt⌋ + 1 = (⌊ns⌋ + 1) + (⌊nt⌋ − ⌊ns⌋) = k + m,

and, hence, (k + m − 1)/n ≤ t < (k + m)/n.
c) For each x ∈ R,

x − 1/n = (nx − 1)/n < ⌊nx⌋/n ≤ nx/n = x,

and, hence, lim_{n→∞} ⌊nx⌋/n = x. Consequently,

lim_{n→∞} m/n = lim_{n→∞} (⌊nt⌋ − ⌊ns⌋)/n = lim_{n→∞} ⌊nt⌋/n − lim_{n→∞} ⌊ns⌋/n = t − s.


12.3 a) The number of patients that arrive between 4:00 A.M. and 5:00 A.M. is N(5) − N(4), which has a Poisson distribution with parameter 6.9 · (5 − 4) = 6.9. Hence,

P(N(5) − N(4) = 7) = e^{−6.9} (6.9)^7/7! = 0.149.

b) The number of patients that arrive between 6:00 A.M. and 8:00 A.M. is N(8) − N(6), which has a Poisson distribution with parameter 6.9 · (8 − 6) = 13.8. Hence,

P(N(8) − N(6) = 14) = e^{−13.8} (13.8)^{14}/14! = 0.106.

c) The number of patients that arrive between 4:00 A.M. and 5:00 A.M. and the number of patients that arrive between 6:00 A.M. and 8:00 A.M. are independent random variables because the two time intervals are disjoint. Referring now to parts (a) and (b), we get

P(N(5) − N(4) = 7, N(8) − N(6) = 14) = P(N(5) − N(4) = 7) P(N(8) − N(6) = 14) = 0.149 · 0.106 = 0.0158.

d) From the general addition rule and parts (a)–(c),

P(N(5) − N(4) = 7 or N(8) − N(6) = 14)
= P(N(5) − N(4) = 7) + P(N(8) − N(6) = 14) − P(N(5) − N(4) = 7, N(8) − N(6) = 14)
= 0.149 + 0.106 − 0.0158 = 0.239.

e) The number of patients that arrive between 4:00 A.M. and 7:00 A.M. is N(7) − N(4), which has a Poisson distribution with parameter 6.9 · (7 − 4) = 20.7. Hence,

P(N(7) − N(4) = 20) = e^{−20.7} (20.7)^{20}/20! = 0.0878.

f) Let A = {N(7) − N(4) = 20}, B = {N(8) − N(6) = 14}, and C_k = {N(7) − N(6) = k}. The problem is to determine P(A ∩ B). For convenience, let us set X = N(6) − N(4), Y = N(7) − N(6), and Z = N(8) − N(7). We know that X ∼ P(13.8), Y ∼ P(6.9), and Z ∼ P(6.9). Applying the law of partitions and using the independent-increments property of a Poisson process, we get

P(A ∩ B) = Σ_{k=0}^{∞} P(A ∩ B ∩ C_k) = Σ_{k=0}^{14} P(X = 20 − k, Y = k, Z = 14 − k)
= Σ_{k=0}^{14} P(X = 20 − k) P(Y = k) P(Z = 14 − k)
= Σ_{k=0}^{14} e^{−13.8} (13.8)^{20−k}/(20 − k)! · e^{−6.9} (6.9)^k/k! · e^{−6.9} (6.9)^{14−k}/(14 − k)!
= (6.9)^{14} e^{−27.6} Σ_{k=0}^{14} (13.8)^{20−k}/[(20 − k)! k! (14 − k)!].

g) Using a scientific calculator, we found that the probability in part (f) equals 0.0101.


h) Applying the general addition rule and referring to parts (e), (b), and (g), we get

P(N(7) − N(4) = 20 or N(8) − N(6) = 14)
= P(N(7) − N(4) = 20) + P(N(8) − N(6) = 14) − P(N(7) − N(4) = 20, N(8) − N(6) = 14)
= 0.0878 + 0.106 − 0.0101 = 0.184.

12.4 a) The number of patients that arrive either between 4:00 A.M. and 5:00 A.M. or between 6:00 A.M. and 8:00 A.M. is the sum of N(5) − N(4) and N(8) − N(6). From properties of a Poisson process, these two quantities are independent Poisson random variables with parameters 6.9 and 13.8, respectively. Therefore, from Proposition 6.20(b) on page 311, the sum of those two random variables has the Poisson distribution with parameter 6.9 + 13.8 = 20.7.
b) The number of patients that arrive either between 4:00 A.M. and 7:00 A.M. or between 6:00 A.M. and 8:00 A.M. is just the number of patients that arrive between 4:00 A.M. and 8:00 A.M., which has the Poisson distribution with parameter 6.9 · 4 = 27.6.

12.5 Let N(t) denote the number of queries to the database by time t, where time is measured in minutes. Then {N(t) : t ≥ 0} is a Poisson process with rate 4.
a) The number of queries that occur between times s and t is N(t) − N(s), which has a Poisson distribution with parameter 4(t − s).
b) On average, 4 queries occur per minute, so that, on average, 8 queries occur in a 2-minute interval.
c) The number of queries that occur in a 2-minute interval has a Poisson distribution with parameter 4 · 2 = 8. Hence, the probability that between three and five queries, inclusive, occur in a 2-minute interval is

Σ_{k=3}^{5} e^{−8} 8^k/k! = e^{−8} (8^3/3! + 8^4/4! + 8^5/5!) = 0.177.

d) From Proposition 12.2 on page 691, the elapsed time between successive queries has an exponential distribution with parameter 4.
e) Referring to part (d) and recalling that the mean of an exponential random variable is the reciprocal of its parameter, we see that, on average, it takes 1/4 minute, or 15 seconds, between successive queries.
f) From Proposition 12.1 on page 689, the time elapsed until the nth query has an Erlang distribution with parameters n and 4 or, equivalently, a Γ(n, 4) distribution.
g) Referring to part (f) and recalling that the mean of a Γ(α, λ) random variable is α/λ, we see that, on average, it takes n/4 minutes for the nth query to occur.
h) The elapsed time between the third and fifth queries is W_5 − W_3, which, from part (g), has mean equal to 5/4 − 3/4 = 1/2. Therefore, on average, it takes 0.5 minute, or 30 seconds, between the third and fifth queries.
i) The number of queries that occur in the first minute is N(1), which has a Poisson distribution with parameter 4 · 1 = 4, and the number of queries that occur in the next two minutes is N(3) − N(1), which has a Poisson distribution with parameter 4 · (3 − 1) = 8. Moreover, from properties of a Poisson process, these two Poisson random variables are independent. Hence,

P(N(1) = 5, N(3) − N(1) = 7) = P(N(1) = 5) P(N(3) − N(1) = 7) = e^{−4} (4^5/5!) · e^{−8} (8^7/7!) = 0.0218.
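Parts (c) and (i) are easy to verify numerically. The sketch below assumes SciPy and is a check only, not part of the original solution.

    from scipy.stats import poisson

    rate = 4.0                                   # queries per minute

    # Part (c): P(3 <= N <= 5) for a 2-minute window, N ~ Poisson(8).
    p_c = poisson.cdf(5, rate * 2) - poisson.cdf(2, rate * 2)
    print(f"P(3 <= queries <= 5 in 2 min) = {p_c:.3f}")      # about 0.177

    # Part (i): independent increments, N(1) ~ Poisson(4), N(3) - N(1) ~ Poisson(8).
    p_i = poisson.pmf(5, rate * 1) * poisson.pmf(7, rate * 2)
    print(f"P(N(1) = 5 and N(3) - N(1) = 7) = {p_i:.4f}")     # about 0.0218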


12.6 For convenience, set q = 1 − p.
a) Let m and n be nonnegative integers. Note that, if N_1(t) = m and N_2(t) = n, then N(t) = m + n, because N(t) = N_1(t) + N_2(t). Hence, by the general multiplication rule,

P(N_1(t) = m, N_2(t) = n) = P(N_1(t) = m, N_2(t) = n, N(t) = m + n) = P(N_1(t) = m, N(t) = m + n)
= P(N(t) = m + n) P(N_1(t) = m | N(t) = m + n) = e^{−λt} [(λt)^{m+n}/(m + n)!] P(N_1(t) = m | N(t) = m + n).

Now, given that N(t) = m + n, the number of Type 1 customers that arrive by time t has the B(m + n, p) distribution. Therefore,

P(N_1(t) = m | N(t) = m + n) = C(m + n, m) p^m (1 − p)^{(m+n)−m} = C(m + n, m) p^m q^n.

Consequently,

P(N_1(t) = m, N_2(t) = n) = e^{−λt} [(λt)^{m+n}/(m + n)!] C(m + n, m) p^m q^n = e^{−λ(p+q)t} [(λt)^m (λt)^n p^m q^n/(m! n!)]
= (e^{−pλt} (pλt)^m/m!)(e^{−qλt} (qλt)^n/n!).

It now follows from Exercise 6.60 that N_1(t) and N_2(t) are independent Poisson random variables with parameters pλt and qλt, respectively.
b) We first show that {N_1(t) : t ≥ 0} is a Poisson process. Because 0 = N(0) = N_1(0) + N_2(0) and N_1(0) and N_2(0) are nonnegative-integer valued, it follows that N_1(0) = 0. Now, suppose that r ∈ N and let 0 ≤ t_1 < t_2 < ··· < t_r. The random variables N_1(t_j) − N_1(t_{j−1}), 2 ≤ j ≤ r, give the number of Type 1 customers that arrive in the time intervals (t_{j−1}, t_j], respectively. Relative to the Poisson process {N(t) : t ≥ 0}, they depend only on N(t_j) − N(t_{j−1}), respectively. From the independent-increments property of a Poisson process, these latter random variables are independent and, hence, so are the random variables N_1(t_j) − N_1(t_{j−1}). Hence, the counting process {N_1(t) : t ≥ 0} has independent increments. Next, let 0 ≤ s < t. From the law of total probability, for each nonnegative integer k,

P(N_1(t) − N_1(s) = k) = Σ_{n=k}^{∞} P(N(t) − N(s) = n) P(N_1(t) − N_1(s) = k | N(t) − N(s) = n)
= Σ_{n=k}^{∞} e^{−λ(t−s)} [(λ(t − s))^n/n!] C(n, k) p^k q^{n−k}
= e^{−pλ(t−s)} [(pλ(t − s))^k/k!] Σ_{n=k}^{∞} e^{−qλ(t−s)} [(qλ(t − s))^{n−k}/(n − k)!]
= e^{−pλ(t−s)} (pλ(t − s))^k/k!.

Hence, N_1(t) − N_1(s) ∼ P(pλ(t − s)). Consequently, we have shown that {N_1(t) : t ≥ 0} is a Poisson process with rate pλ. Similarly, {N_2(t) : t ≥ 0} is a Poisson process with rate qλ. The fact that these two Poisson processes are independent follows from part (a).


12.7 a) Let n1 , . . . , nm be any m nonnegative integers. Note that Nj (t) = nj for 1 ≤ j ≤ m implies that N(t) = n1 + · · · + nm , because N(t) = N1 (t) + · · · + Nm (t). Hence, by the general multiplication rule, ! " P N1 (t) = n1 , . . . , Nm (t) = nm ! " = P N1 (t) = n1 , . . . , Nm (t) = nm , N(t) = n1 + · · · + nm ! " = P (N(t) = n1 + · · · + nm ) P N1 (t) = n1 , . . . , Nm (t) = nm | N(t) = n1 + · · · + nm = e−λt

! " (λt)n1 +···+nm P N1 (t) = n1 , . . . , Nm (t) = nm | N(t) = n1 + · · · + nm . (n1 + · · · + nm )!

Given that N(t) = n1 + · · · + nm , the numbers of times that specified events with attributes a1 , . . . , am occur by time t has the multinomial distribution with parameters n1 + · · · + nm and p1 , . . . , pm . Therefore, we have ' & ! " n1 + · · · + nm n1 nm P N1 (t) = n1 , . . . , Nm (t) = nm | N(t) = n1 + · · · + nm = p1 · · · pm . n1 , . . . , nm Consequently,

" ! P N1 (t) = n1 , . . . , Nm (t) = nm = e−λt

& ' n1 + · · · + nm n1 (λt)n1 +···+nm nm p1 · · · pm (n1 + · · · + nm )! n1 , . . . , nm

= e−λ(p1 +···+pm )t

(λt)n1 +···+nm (n1 + · · · + nm )! n1 nm p1 · · · pm (n1 + · · · + nm )! n1 ! · · · nm !

nm e−p1 λt · · · e−pm λt (λt)n1 · · · (λt)nm p1n1 · · · pm n1 ! · · · nm ! & & n1 ' nm ' −p1 λt (p1 λt) −pm λt (pm λt) = e ··· e . n1 ! nm !

=

It now follows from Exercise 6.60 that N1 (t), . . . , Nm (t) are independent Poisson random variables with parameters p1 λt, . . . , pm λt, respectively. b) We first show that {N1 (t) : t ≥ 0} is a Poisson process. Because 0 = N(0) = N1 (0) + N2 (0) and N1 (0) and N2 (0) are nonnegative-integer valued, it follows that N1 (0) = 0. Now, suppose that r ∈ N and let 0 ≤ t1 < t2 < · · · < tr . The random variables N1 (tj ) − N1 (tj −1 ), 2 ≤ j ≤ r, give the number of times that a specified event with attribute a1 occurs in the time intervals (tj −1 , tj ], respectively. Relative to the Poisson process {N(t) : t ≥ 0}, they depend only on N(tj ) − N(tj −1 ), respectively. From the independent-increments property of a Poisson process, these latter random variables are independent and, hence, so are the random variables N1 (tj ) − N1 (tj −1 ). Hence, the counting process {N1 (t) : t ≥ 0} has independent increments. Next, let 0 ≤ s < t. From the law of total probability, for each nonnegative integer k, ∞ ! " # ! " ! " P N1 (t) − N1 (s) = k = P N(t) − N(s) = n P N1 (t) − N1 (s) = k | N(t) − N(s) = n n=k

=

∞ # n=k

e

−λ(t−s)

! "n ! " λ(t − s) · P N1 (t) − N1 (s) = k | N(t) − N(s) = n . n!


However, given that N(t) − N(s) = n, the number of times that a specified event with attribute a1 occurs in the time interval (s, t] has the binomial distribution with parameters n and p1 . Setting q1 = 1 − p1 , we now see that ! "n & ' ∞ " # ! n k n−k −λ(t−s) λ(t − s) e · p q P N1 (t) − N1 (s) = k = n! k 1 1 n=k =

∞ #

e

−(p1 +q1 )λ(t−s)

n=k

=e

−p1 λ(t−s)

=e

−p1 λ(t−s)

=e

−p1 λ(t−s)

! "k ! "n−k λ(t − s) λ(t − s) n! pk q n−k n! k! (n − k)! 1 1

"k ∞ "n−k ! ! p1 λ(t − s) # −q1 λ(t−s) q1 λ(t − s) e k! (n − k)! n=k "k ∞ "j ! ! p1 λ(t − s) # −q1 λ(t−s) q1 λ(t − s) e k! j! j =0

"k "k ! ! p1 λ(t − s) −p1 λ(t−s) p1 λ(t − s) ·1=e . k! k!

! " Hence, N1 (t) − N1 (s) ∼ P p1 λ(t − s) . Consequently, we have shown that {N1 (t) : t ≥ 0} is a Poisson process with rate p1 λ. Similarly, {N2 (t) : t ≥ 0}, . . . , {Nm (t) : t ≥ 0} are Poisson processes with rates p2 λ, . . . , pm λ, respectively. The fact that these m Poisson processes are independent follows from part (a). small positive numbers. For convenience, 12.8 Let(0 = t0 < t1 < · · · < tn , and let $t1 , . . . , $tn represent ) set E = t1 ≤ W1 ≤ t1 + $t1 , . . . , tn ≤ Wn ≤ tn + $tn . Event E occurs if and only if no successes occur in the time interval from 0 to t1 , one success occurs in the time interval from t1 to t1 + $t1 , no successes occur in the time interval from t1 + $t1 to t2 , one success occurs in the time interval from t2 to t2 + $t2 , . . . , no successes occur in the time interval from tn−1 + $tn−1 to tn , and at least one success occurs in the time interval from tn to tn + $tn . See Fig. 12.2 on page 690. For 1 ≤ j ≤ n − 1, let Aj denote the event that (exactly) one success occurs in the time interval from tj to tj + $tj and, for 1 ≤ j ≤ n, let Bj denote the event that no successes occur in the time interval from tj −1 + $tj −1 to tj , where we define $t0 = 0. Also, let Cn denote the event that at least one success occurs in the time interval from tn to tn + $tn . We note that, for 1 ≤ j ≤ n − 1, ! " (λ$tj )1 P (Aj ) = P N(tj + $tj ) − N(tj ) = 1 = e−λ$tj = e−λ$tj λ$tj 1!

and, for 1 ≤ j ≤ n,

! " P (Bj ) = P N(tj ) − N(tj −1 + $tj −1 ) = 0 ! "0 −λ(tj −tj −1 −$tj −1 ) λ(tj − tj −1 − $tj −1 ) =e = e−λ(tj −tj −1 −$tj −1 ) . 0!

Also,

! " ! " P (Cn ) = P N(tn + $tn ) − N(tn ) ≥ 1 = 1 − P N(tn + $tn ) − N(tn ) = 0 = 1 − e−λ$tn

(λ$tn )0 = 1 − e−λ$tn . 0!


Hence, by the independent-increments property of a Poisson process and Relation (5.18) on page 219,

P (E) = P = =

&n−1 * j =1

Aj ∩

n *

j =1

Bj ∩ Cn

'

=

n−1 +

P (Aj )

j =1

n +

j =1

P (Bj ) · P (Cj )

n−1 +!

n "+

! " e−λ(tj −tj −1 −$tj −1 ) · 1 − e−λ$tn

n−1 +!

+ " n−1

! " e−λ(tj +1 −tj −$tj ) · 1 − e−λ$tn

e−λ$tj λ$tj

j =1

j =1

e−λ$tj λ$tj

j =1

=e

−λt1

&n−1 +

j =0

e

−λ(tj +1 −tj )

'

! " λn−1 $t1 · · · $tn−1 1 − e−λ$tn

j =1 , −λ jn−1 =1 (tj +1 −tj ) n−1

! " $t1 · · · $tn−1 1 − e−λ$tn ! " = λn−1 e−λtn $t1 · · · $tn−1 1 − e−λ$tn = e−λt1 e

λ

≈ λn−1 e−λtn $t1 · · · $tn−1 · λ$tn

= λn e−λtn $t1 · · · $tn . Consequently,

f_{W_1,...,W_n}(t_1, ..., t_n) Δt_1 ··· Δt_n ≈ P(t_1 ≤ W_1 ≤ t_1 + Δt_1, ..., t_n ≤ W_n ≤ t_n + Δt_n) = P(E) ≈ λ^n e^{−λt_n} Δt_1 ··· Δt_n,

and, therefore, Equation (12.9) holds.

12.9 a) First we use a basic random number generator to obtain n uniform numbers between 0 and 1, say, u_1, ..., u_n. Next we let i_j = −λ^{−1} ln u_j for 1 ≤ j ≤ n. In view of Example 8.22(b), the numbers i_1, ..., i_n represent n independent observations of an exponential random variable with parameter λ. Because of Proposition 12.2 on page 691, these n numbers can serve as a realization of the first n interarrival times of a Poisson process with rate λ. Now we set w_k = i_1 + ··· + i_k for 1 ≤ k ≤ n. Then w_1, ..., w_n represent a realization of the waiting times for the first n events. In this way, we obtain a simulation of a Poisson process with rate λ up to and including the occurrence of the nth event. (A code sketch of this procedure is given after part (b).)
b) Answers will vary. The method, however, is to apply part (a) 10 times, each time with n = 5 and λ = 6.9. In each case, the graph of N(t) against t is obtained from the following formula:

N(t) = 0 if 0 ≤ t < w_1;  1 if w_1 ≤ t < w_2;  2 if w_2 ≤ t < w_3;  3 if w_3 ≤ t < w_4;  4 if w_4 ≤ t < w_5;  5 if t = w_5.
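The following Python sketch implements the simulation described in part (a); the seed and the printed format are incidental choices, not part of the original solution.

    import numpy as np

    def simulate_poisson_process(rate, n_events, rng=None):
        """Waiting times of the first n_events events of a rate-`rate` Poisson process.

        Exponential interarrival times are generated by inverse transform,
        i_j = -ln(u_j)/rate, and the waiting times are their cumulative sums.
        """
        rng = np.random.default_rng() if rng is None else rng
        u = rng.uniform(size=n_events)
        interarrivals = -np.log(u) / rate
        return np.cumsum(interarrivals)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        for run in range(10):                     # part (b): 10 runs with n = 5, lambda = 6.9
            w = simulate_poisson_process(6.9, 5, rng)
            print(f"run {run + 1}: waiting times = {np.round(w, 3)}")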


12.10 From the conditional probability rule and properties of a Poisson process, we get, for 0 ≤ k ≤ n,

P(N(s) = k | N(t) = n) = P(N(s) = k, N(t) = n)/P(N(t) = n) = P(N(s) = k, N(t) − N(s) = n − k)/P(N(t) = n)
= P(N(s) = k) P(N(t) − N(s) = n − k)/P(N(t) = n)
= [e^{−λs} (λs)^k/k!] [e^{−λ(t−s)} (λ(t − s))^{n−k}/(n − k)!] / [e^{−λt} (λt)^n/n!]
= C(n, k) [λ^k s^k · λ^{n−k} (t − s)^{n−k}]/(λ^n t^n) = C(n, k) (s/t)^k (1 − s/t)^{n−k}.

Hence, given that N(t) = n, we see that N(s) ∼ B(n, s/t).

12.11 a) The intensity function, ν(t), is given by

ν(t) = 0 if 0 ≤ t < 10;  4t − 40 if 10 ≤ t < 12;  −4t + 56 if 12 ≤ t < 14;  5t − 70 if 14 ≤ t < 18;  −5t + 110 if 18 ≤ t < 22;  0 if 22 ≤ t < 24.

In the following graph, we use v(t) instead of ν(t).
b) The mean function is µ(t) = ∫_0^t ν(s) ds. We note that µ is continuous, being an indefinite integral. We refer to part (a) and consider six cases.
Case 1: 0 ≤ t < 10. In this case,

µ(t) = ∫_0^t ν(s) ds = ∫_0^t 0 ds = 0.


Case 2: 10 ≤ t < 12. In this case,

µ(t) = ∫_0^{10} ν(s) ds + ∫_{10}^{t} ν(s) ds = µ(10−) + ∫_{10}^{t} (4s − 40) ds = 2t^2 − 40t + 200.

Case 3: 12 ≤ t < 14. In this case,

µ(t) = ∫_0^{12} ν(s) ds + ∫_{12}^{t} ν(s) ds = µ(12−) + ∫_{12}^{t} (−4s + 56) ds = −2t^2 + 56t − 376.

Case 4: 14 ≤ t < 18. In this case,

µ(t) = ∫_0^{14} ν(s) ds + ∫_{14}^{t} ν(s) ds = µ(14−) + ∫_{14}^{t} (5s − 70) ds = (5/2)t^2 − 70t + 506.

Case 5: 18 ≤ t < 22. In this case,

µ(t) = ∫_0^{18} ν(s) ds + ∫_{18}^{t} ν(s) ds = µ(18−) + ∫_{18}^{t} (−5s + 110) ds = −(5/2)t^2 + 110t − 1114.

Case 6: 22 ≤ t < 24. In this case,

µ(t) = ∫_0^{22} ν(s) ds + ∫_{22}^{t} ν(s) ds = µ(22−) + ∫_{22}^{t} 0 ds = 96.

Hence, we have

µ(t) = 0 if 0 ≤ t < 10;  2t^2 − 40t + 200 if 10 ≤ t < 12;  −2t^2 + 56t − 376 if 12 ≤ t < 14;  (5/2)t^2 − 70t + 506 if 14 ≤ t < 18;  −(5/2)t^2 + 110t − 1114 if 18 ≤ t < 22;  96 if 22 ≤ t < 24.

In the following graph we use m(t) instead of µ(t).
c) The number of customers that arrive at the restaurant per day is N(24), which we know has a Poisson distribution with parameter µ(24). From part (b),

E(N(24)) = µ(24) = 96.

12-10

CHAPTER 12

Applications of Probability Theory

d) The number of customers that arrive between 11:00 A.M. and 7:00 P.M. is N(19) − N(11), which we know has a Poisson distribution with parameter µ(19) − µ(11). Referring to part (b), we get

E(N(19) − N(11)) = µ(19) − µ(11) = [−(5/2) · 19^2 + 110 · 19 − 1114] − [2 · 11^2 − 40 · 11 + 200] = 71.5

and

σ_{N(19)−N(11)} = √Var(N(19) − N(11)) = √(µ(19) − µ(11)) = √71.5 = 8.46.

e) The number of customers that arrive between 11:00 A.M. and 1:00 P.M. is N(13) − N(11), which we know has a Poisson distribution with parameter µ(13) − µ(11). From part (b),

µ(13) − µ(11) = [−2 · 13^2 + 56 · 13 − 376] − [2 · 11^2 − 40 · 11 + 200] = 12.

Hence,

P(13 ≤ N(13) − N(11) ≤ 15) = Σ_{k=13}^{15} e^{−12} 12^k/k! = e^{−12} (12^{13}/13! + 12^{14}/14! + 12^{15}/15!)

= 0.268.
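The mean function of part (b) and the probability in part (e) can be checked with a few lines of Python; this is a sketch assuming SciPy, not part of the original solution.

    from scipy.stats import poisson

    def mean_fn(t):
        """Piecewise mean function mu(t) from part (b), t in hours from midnight."""
        if t < 10:  return 0.0
        if t < 12:  return 2*t**2 - 40*t + 200
        if t < 14:  return -2*t**2 + 56*t - 376
        if t < 18:  return 2.5*t**2 - 70*t + 506
        if t < 22:  return -2.5*t**2 + 110*t - 1114
        return 96.0

    lam = mean_fn(13) - mean_fn(11)                      # expected customers: 12
    prob = poisson.cdf(15, lam) - poisson.cdf(12, lam)   # P(13 <= N(13)-N(11) <= 15)
    print(lam, round(prob, 3))                           # 12.0 0.268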

12.12 Let Xk = 1 (the constant random variable equal to 1) for all k ∈ N . Then, for t ≥ 0, ⎧ 0, 0, if N(t) = 0; 7 if N(t) = 0; ⎧ ⎪ ⎪ ⎨ ⎨ N(t), if N(t) = 0; # # = N(t) = = N(t). Y (t) = N(t) N(t), if N(t) ≥ 1. ⎪ ⎪ Xk , if N(t) ≥ 1. 1, if N(t) ≥ 1. ⎩ ⎩ k=1

k=1

Therefore, a compound Poisson process does indeed generalize the concept of a (homogeneous) Poisson process. 12.13 a) Let λ denote the rate of the homogeneous Poisson process {N(t) : t ≥ 0}. Now, ⎧ if n = 0; 7 ⎪ E(0), ! " ⎨ & n ' 0, if n = 0; # = = nµ E Y (t) | N(t) = n = nµ, if n ≥ 1. ⎪ Xk , if n ≥ 1. ⎩E k=1

and

⎧ ⎪ Var(0), ! " ⎨ &# ' n Var Y (t) | N(t) = n = ⎪ Xk , ⎩ Var k=1

if n = 0;

if n ≥ 1.

=

7

0, nσ 2 ,

if n = 0; = nσ 2 . if n ≥ 1.

! " ! " Hence, E Y (t) | N(t) = N(t)µ and Var Y (t) | N(t) = N(t)σ 2 . Therefore, from the laws of total expectation and total variance, 1 ! ! " "2 ! " ! " µ(t) = E Y (t) = E E Y (t) | N(t) = E N(t)µ = µE N(t) = µλt and

1 ! 1 ! 1 2 ! " "2 "2 ! " σ 2 (t) = Var Y (t) = E Var Y (t) | N(t) + Var E Y (t) | N(t) = E N(t)σ 2 + Var N(t)µ 1 2 ! " ! " = σ 2 E N(t) + µ2 Var N(t) = σ 2 λt + µ2 λt = µ2 + σ 2 λt.


b) From the law of total expectation, Equation (10.49) on page 602, and part (a), 1 ! 1 ! " "2 ! "2 E N(t)Y (t) = E E N(t)Y (t) | N(t) = E N(t)E Y (t) | N(t) 2 1 2 1 ! " = E N (t)µN(t) = µE N(t)2 = λt + (λt)2 µ. Referring again to part (a), we get

! " ! " ! " " ! " E N(t)Y (t) − E N(t) E Y (t) 5 ! ρ N(t), Y (t) = Cov N(t), Y (t) = " ! " Var N(t) Var Y (t) ! " λt + (λt)2 µ − λt · µλt µ 5 = . =6 ! " µ2 + σ 2 λt · µ2 + σ 2 λt !

12.14 For each k ∈ N , let Xk denote the amount deposited by customer k and let Y (t) denote the total deposits by time t. In the notation of Exercise 12.13, λ = 25.8, µ = 574 and σ = 3167. a) Referring to the solution to Exercise 12.13(a), we have ! " E Y (t) = µ(t) = µλt = 574 · 25.8t = 14,809.20t. b) Referring again to the solution to Exercise 12.13(a), we have 5 ! 5! 5! √ " 6 " " 2 2 2 µ + σ λt = 5742 + 31672 · 25.8t = 16,348.44 t. σY (t) = Var Y (t) = σ (t) =

Theory Exercises 12.15 a) We have

8 8 8 1 0 0 ... 0 08 8 8 8 −1 1 0 . . . 0 0 8 8 8 8 0 −1 1 . . . 0 0 8 8 J (w1 , . . . , wn ) = 8 .. .. .. . . .. .. 88 . . . . . . .8 8 8 0 0 0 . . . 1 0 88 8 8 0 0 0 . . . −1 1 8 We see that J (w1 , . . . , wn ) is the determinant of a lower triangular matrix and, hence, equals the product of the diagonal elements, all of which are always 1. Hence, J (w1 , . . . , wn ) = 1 for all w1 , . . . , wn ∈ R. b) This result follows from the multivariate analogue of Exercise 9.95. Alternatively, for each 1 ≤ j ≤ n, we have, for tj > 0, 4 ∞ 4 ∞ 4 ∞ 4 ∞ + + ··· fI1 ,...,In (t1 , . . . , tn ) dtk = ··· λe−λt1 · · · λe−λtn dtk fIj (tj ) = −∞

−∞

= λe−λtj

+4

k̸=j



0

k̸=j

0

0

λe−λtk dtk = λe−λtj 19 ·:; · · 1< = λe−λtj . n−1 times

Therefore, Ij ∼ E(λ) for 1 ≤ j ≤ n. Moreover,

fI1 ,...,In (t1 , . . . , tn ) = λe−λt1 · · · λe−λtn = fI1 (t1 ) · · · fIn (tn )

for all t1 , . . . , tn > 0, so that I1 , . . . , In are independent random variables.

k̸=j


c) The fact that I1 , I2 , . . . are independent and identically distributed exponential random variables with common parameter λ follows immediately from part (b) and the definition of independence for an infinite number of random variables as given in Definition 6.5 on page 296.

Advanced Exercises 12.16 a) For 0 ≤ s < t,

! " ! " ! " P N(s) ≥ 1, N(t) = 1 ! " P W1 ≤ s | N(t) = 1 = P N(s) ≥ 1 | N(t) = 1 = P N(t) = 1 ! " ! " P N(s) = 1, N(t) = 1 P N(s) = 1, N(t) − N(s) = 0 ! " ! " = = P N(t) = 1 P N(t) = 1 ! " ! " P N(s) = 1 P N(t) − N(s) = 0 ! " = P N(t) = 1 ! "0 1 −λ(t−s) λ(t − s) −λs (λs) e ·e 1! 0! = 1 (λt) e−λt 1! λs s = = . λt t Hence, FW1 | N(t) (s | 1) = s/t if 0 ≤ s < t, so that W1|N(t)=1 ∼ U(0, t). b) Let 0 < s < t and let $s be a small positive number. Given N(t) = 1, the event {s ≤ W1 ≤ s + $s} occurs if and only if no successes occur in the time interval from 0 to s, exactly one success occurs in the time interval from s to s + $s, and no successes occur in the time interval from s + $s to t. Hence, by properties of a Poisson process, ! " P s ≤ W1 ≤ s + $s | N(t) = 1 ! " = P N(s) = 0, N(s + $s) − N(s) = 1, N(t) − N(s + $s) = 0 | N(t) = 1 ! " P N(s) = 0, N(s + $s) − N(s) = 1, N(t) − N(s + $s) = 0 ! " = P N(t) = 1 ! " ! " ! " P N(s) = 0 P N(s + $s) − N(s) = 1 P N(t) − N(s + $s) = 0 ! " = P N(t) = 1 ! "0 0 1 −λ(t−s−$s) λ(t − s − $s) −λs (λs) −λ$s (λ$s) e ·e ·e 1 λ$s 0! 1! 0! = = $s. = 1 λt t (λt) e−λt 1! Therefore, ! " 1 fW1 | N(t) (s | 1)$s ≈ P s ≤ W1 ≤ s + $s | N(t) = 1 = $s, 0 < s < t. t Hence, fW1 | N(t) (s | 1) = 1/t if 0 < s < t, so that W1|N(t)=1 ∼ U(0, t).


c) Let 0 ≤ s1 < s2 < t. From the law of partitions, Proposition 2.8 on page 68, applied to a conditional probability, we get " ! " ! P W1 ≤ s1 , W2 ≤ s2 | N(t) = 2 = P N(s1 ) = 0, W1 ≤ s1 , W2 ≤ s2 | N(t) = 2 ! " + P N(s1 ) = 1, W1 ≤ s1 , W2 ≤ s2 | N(t) = 2 " ! + P N(s1 ) = 2, W1 ≤ s1 , W2 ≤ s2 | N(t) = 2 .

The first probability is 0 because events {N(s1 ) = 0} and {W1 ≤ s1 } are mutually exclusive. For the second probability, we have " ! P N(s1 ) = 1, W1 ≤ s1 , W2 ≤ s2 | N(t) = 2 " ! = P N(s1 ) = 1, N(s2 ) − N(s1 ) = 1 | N(t) = 2 ! " P N(s1 ) = 1, N(s2 ) − N(s1 ) = 1, N(t) = 2 ! " = P N(t) = 2 " ! P N(s1 ) = 1, N(s2 ) − N(s1 ) = 1, N(t) − N(s2 ) = 0 ! " = P N(t) = 2 " ! " ! " ! P N(s1 ) = 1 P N(s2 ) − N(s1 ) = 1 P N(t) − N(s2 ) = 0 ! " = P N(t) = 2 "1 "0 ! ! 1 −λ(s2 −s1 ) λ(s2 − s1 ) −λ(t−s2 ) λ(t − s2 ) −λs1 (λs1 ) e ·e ·e 1! 1! 0! = 2 (λt) e−λt 2! 2s1 (s2 − s1 ) 2λs1 · λ(s2 − s1 ) = . = t2 λ2 t 2

For the third probability, we have " ! P N(s1 ) = 2, W1 ≤ s1 , W2 ≤ s2 | N(t) = 2 ! " " P N(s1 ) = 2, N(t) = 2 ! ! " = P N(s1 ) = 2 | N(t) = 2 = P N(t) = 2 " ! " ! " ! P N(s1 ) = 2, N(t) − N(s1 ) = 0 P N(s1 ) = 2 P N(t) − N(s1 ) = 0 ! " ! " = = P N(t) = 2 P N(t) = 2 "0 ! 2 −λ(t−s1 ) λ(t − s1 ) −λs1 (λs1 ) e ·e λ2 s12 s12 2! 0! = = . 2 λ2 t 2 t2 −λt (λt) e 2! Consequently, ! " 2s1 (s2 − s1 ) s12 2s1 s2 − s12 P W1 ≤ s1 , W2 ≤ s2 | N(t) = 2 = + = , t2 t2 t2

0 ≤ s1 < s2 < t.


Taking the mixed partial with respect to s1 and s2 yields fW1 ,W2 | N(t) (s1 , s2 | 2) =

2 , t2

0 < s1 < s2 < t.

Therefore, the conditional distribution of W1 and W2 given N(t) = 2 is the uniform distribution on the triangle 0 < s1 < s2 < t. d) Let 0 < s1 < s2 < t and let $s1 and $s2 represent small positive numbers. Given N(t) = 2, the event {s1 ≤ W1 ≤ s1 + $s1 , s2 ≤ W2 ≤ s2 + $s2 } occurs if and only if no successes occur in the time interval from 0 to s1 , exactly one success occurs in the time interval from s1 to s1 + $s1 , no successes occur in the time interval from s1 + $s1 to s2 , exactly one success occurs in the time interval from s2 to s2 + $s2 , and no successes occur in the time interval from s2 + $s2 to t. Hence, by properties of a Poisson process, we get, by arguing as previously, that P (s1 ≤ W1 ≤ s1 + $s1 , s2 ≤ W2 ≤ s2 + $s2 | N(t) = 2) =

e−λs1 · e−λ$s1 λ$s1 · e−λ(s2 −s1 −$s1 ) · e−λ$s2 λ$s2 · e−λ(t−s2 −$s2 ) e−λt (λt)2 /2

=

2 λ2 e−λt $s1 $s2 = 2 $s1 $s2 . −λt 2 2 e λ t /2 t

Therefore, fW1 ,W2 | N(t) (s1 , s2 | 2) =

2 , t2

0 < s1 < s2 < t.

Consequently, the conditional distribution of W1 and W2 given N(t) = 2 is the uniform distribution on the triangle 0 < s1 < s2 < t. e) Let 0 = s0 < s1 < · · · (< sn < sn+1 = t, and let $s1 , . . . , $sn represent ) small positive numbers. For convenience, set E = s1 ≤ W1 ≤ s1 + $s1 , . . . , sn ≤ Wn ≤ sn + $sn . Given N(t) = n, event E occurs if and only if no successes occur in the time interval from 0 to s1 , exactly one success occurs in the time interval from s1 to s1 + $s1 , no successes occur in the time interval from s1 + $s1 to s2 , exactly one success occurs in the time interval from s2 to s2 + $s2 , . . . , no successes occur in the time interval from sn−1 + $sn−1 to sn , exactly one success occurs in the time interval from sn to sn + $sn , and no successes occur in the time interval from sn + $sn to t. For 1 ≤ j ≤ n, let Aj denote the event that exactly one success occurs in the time interval from sj to sj + $sj and, for 0 ≤ j ≤ n, let Bj denote the event that no successes occur in the time interval from sj + $sj to sj +1 , where we define $s0 = 0. We note that, for 1 ≤ j ≤ n, ! " (λ$sj )1 = e−λ$sj λ$sj P (Aj ) = P N(sj + $sj ) − N(sj ) = 1 = e−λ$sj 1!

and, for 0 ≤ j ≤ n,

! " P (Bj ) = P N(sj +1 ) − N(sj + $sj ) = 0 ! "0 λ(s − s − $s ) j +1 j j = e−λ(sj +1 −sj −$sj ) = e−λ(sj +1 −sj −$sj ) . 0!


Hence, by the independent-increments property of a Poisson process, & ' n * P B0 ∩ (Aj ∩ Bj ) ! " ! " P E ∩ {N (t) = n} j =1 ! " ! " P E | N(t) = n = = P N(t) = n P N(t) = n P (B0 )

!

" P N(t) = n

=

e−λs1 e−λ

e−λs1

P (Aj )P (Bj )

j =1

= =

n +

n ! + " e−λ$sj λ$sj e−λ(sj +1 −sj −$sj )

j =1

=

e−λt (λt)n /n!

,n

j =1 (sj +1 −sj )

λn $s1 · · · $sn λn e−λt $s1 · · · $sn = e−λt (λt)n /n! e−λt (λt)n /n!

n! $s1 · · · $sn . tn

Consequently, fW1 ,...,Wn | N(t) (s1 , . . . , sn | n) =

n! , tn

0 < s1 < · · · < sn < t,

and fW1 ,...,Wn | N(t) (s1 , . . . , sn | n) = 0 otherwise. f) From the result of Exercise 9.34, the conditional joint PDF found in part (e) is that of the joint PDF of the order statistics of a random sample of size n from a uniform distribution on the interval (0, t). Hence, given N(t) = n, the times at which the n events occur, considered as unordered random variables, are independent and uniformly distributed on the interval (0, t). 12.17 a) Given that the first success occurs at time s (i.e., I1 = s), the elapsed time until the second success exceeds t (i.e., I2 > t) if and only if no successes occur in the time interval (s, s + t]. Therefore, ! " P (I2 > t | I1 = s) = P N(s + t) − N(s) = 0 | I1 = s " ! = P N(I1 + t) − N(I1 ) = 0 | I1 = s .

b) We have that I1 = s if and only if N(u) = 0 for 0 ≤ u < s, and N(s) = 1. Hence, the value of I1 depends only on the values of {N(u) : u ≥ 0} in the interval 0 ≤ u ≤ I1 . c) Applying in turn the facts that a Poisson process has independent increments, stationary increments, and N(t) ∼ P(λt), we conclude, in view of parts (a) and (b), that ! " ! " P (I2 > t | I1 = s) = P N(I1 + t) − N(I1 ) = 0 | I1 = s = P N(I1 + t) − N(I1 ) = 0

! " (λt)0 = e−λt . = P N(t) = 0 = e−λt 0! Referring now to Equation (9.33) on page 517, we get, for all s, t > 0, ' 4 ∞4 ∞ 4 ∞ &4 ∞ P (I1 > s, I2 > t) = fI1 ,I2 (u, v) du dv = fI2 | I1 (v | u) dv fI1 (u) du =

4

s

t



s

s

t

P (I2 > t | I1 = u)fI1 (u) du = e−λt e−λt

4



s

fI1 (u) du = P (I1 > s)e−λt .

Letting s → −∞ yields P (I2 > t) = and, hence, that P (I1 > s, I2 > t) = P (I1 > s)P (I2 > t). We therefore conclude that I2 ∼ E(λ) and that I1 and I2 are independent random variables. That I1 ∼ E(λ) is a consequence of Proposition 12.1 on page 689 and the fact that I1 = W1 .


d) Given that I1 = s1 , . . . , In = sn , the elapsed time until the (n + 1)st success exceeds t (i.e., In > t) if and only if no successes occur in (s1 + · · · + sn , s1 + · · · + sn + t]. Therefore, P (In+1 > t | I1 = s1 , . . . , In = sn ) ! " = P N(s1 + · · · + sn + t) − N(s1 + · · · + sn ) = 0 | I1 = s1 , . . . , In = sn ! " = P N(I1 + · · · + In + t) − N(I1 + · · · + In ) = 0 | I1 = s1 , . . . , In = sn .

Now, I1 = s1 , . . . , In = sn if and only if N(u) = 0, N(u) = 1, .. .

0 ≤ u < s1 ; s1 ≤ u < s1 + s2 ; .. .

N(u) = n − 1,

s1 + · · · + sn−1 ≤ u < s1 + · · · + sn ;

and N(s1 + · · · + sn ) = n. Thus, the values of I1 , . . . , In depend only on the values of {N(u) : u ≥ 0} for 0 ≤ u ≤ I1 + · · · + In . Applying in turn the facts that a Poisson process has independent increments and stationary increments, we conclude that P (In+1 > t | I1 = s1 , . . . , In = sn ) ! " = P N(I1 + · · · + In + t) − N(I1 + · · · + In ) = 0 | I1 = s1 , . . . , In = sn " ! = P N(I1 + · · · + In + t) − N(I1 + · · · + In ) = 0 ! " (λt)0 = e−λt . = P N(t) = 0 = e−λt 0!

e) We must show that, for each n ∈ N , the random variables I1 , . . . , In are independent and identically distributed exponential random variables with common parameter λ. The validity of this statement for n = 2 has already been established in part (c). We now use mathematical induction. Referring to part (d) and noting that e−λt does not depend on s1 , . . . , sn , it follows that In+1 is independent of I1 , . . . , In , and, furthermore, that P (In+1 > t) = e−λt , which shows In+1 ∼ E(λ). Hence, from the induction assumption, I1 , . . . , In+1 are independent and identically distributed exponential random variables with common parameter λ. 12.18 Suppose that {N(t) : t ≥ 0} is a Poisson process, either homogeneous or nonhomogeneous, with stationary increments. Then µ(t) − µ(s) depends only on the difference between s and t; that is, there is a function g such that µ(t) − µ(s) = g(t − s),

0 ≤ s < t < ∞.

Because µ(0) = 0, setting s = 0 in the previous display shows that µ = g; hence, µ(t) − µ(s) = µ(t − s),

0 ≤ s < t < ∞.

Letting t = 2s shows that µ(2s) = 2µ(s), and it follows by mathematical induction that µ(ps) = pµ(s) for all p ∈ N . Setting s = 1/p in the preceding relation gives µ(1) = pµ(1/p). Consequently, for all p, q ∈ N , µ(p/q) = pµ(1/q) = pµ(1)/q = (p/q)µ(1). As µ is everywhere continuous (being the indefinite integral of ν), it now follows that µ(t) = µ(1)t for all t > 0. Hence, µ is linear, which means that {N(t) : t ≥ 0} is a homogeneous Poisson process. Thus, a nonhomogeneous Poisson process doesn’t have stationary increments.
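The following short simulation sketch is not part of the original solution; it illustrates the conclusion of Exercise 12.17 numerically. It assumes NumPy is available, approximates a homogeneous Poisson process by independent Bernoulli(λΔ) increments on a fine grid (the grid step and horizon are illustrative choices), and checks that the interevent times behave like i.i.d. exponential random variables with mean 1/λ.

import numpy as np

rng = np.random.default_rng(1)
lam, dt, horizon = 2.0, 1e-3, 5_000.0      # rate, grid step, time horizon (illustrative)

# Independent increments: at most one potential event per grid cell.
events = rng.random(int(horizon / dt)) < lam * dt
times = np.flatnonzero(events) * dt         # approximate event times W1, W2, ...
gaps = np.diff(times)                       # interevent times I2, I3, ...

print("number of events:", times.size)
print("mean gap :", gaps.mean(), " (theory 1/lambda =", 1 / lam, ")")
print("std gap  :", gaps.std(ddof=1), " (theory 1/lambda =", 1 / lam, ")")
# Near-zero lag-1 correlation is consistent with independence of the gaps.
print("lag-1 corr:", np.corrcoef(gaps[:-1], gaps[1:])[0, 1])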


12.19
a) By assumption, Y|X=t ∼ P(λt) for t > 0. In particular, then, E(Y | X = t) = λt for all t > 0. Applying the law of total expectation now yields E(Y) = E(E(Y | X)) = E(λX) = λE(X).
b) Referring to Exercises 9.83(b) and 9.85(a), we get
p_Y(y) = ∫_{−∞}^{∞} h_{X,Y}(x, y) dx = ∫_{−∞}^{∞} f_X(x) p_{Y | X}(y | x) dx = ∫_0^∞ e^{−λx} ((λx)^y/y!) f_X(x) dx
if y = 0, 1, 2, ..., and p_Y(y) = 0 otherwise.
c) From the definition of expected value and part (b),
E(Y) = Σ_{y=0}^{∞} y p_Y(y) = Σ_{y=0}^{∞} y ∫_0^∞ e^{−λx} ((λx)^y/y!) f_X(x) dx = ∫_0^∞ (Σ_{y=0}^{∞} y e^{−λx} (λx)^y/y!) f_X(x) dx = ∫_0^∞ (λx) f_X(x) dx = λ ∫_0^∞ x f_X(x) dx = λE(X),
which agrees with the result of part (a).
d) When X ∼ E(µ), we have E(X) = 1/µ and f_X(x) = µe^{−µx} for x > 0. Hence, from parts (a) and (b),
E(Y) = λE(X) = λ/µ
and
p_Y(y) = ∫_0^∞ e^{−λx} ((λx)^y/y!) µe^{−µx} dx = (µλ^y/y!) ∫_0^∞ x^{(y+1)−1} e^{−(µ+λ)x} dx = (µλ^y/y!) · Γ(y + 1)/(µ + λ)^{y+1} = (λ/µ)^y/(1 + λ/µ)^{y+1}
if y = 0, 1, 2, ..., and p_Y(y) = 0 otherwise. Thus, Y has the Pascal distribution with parameter λ/µ, as defined in Exercise 7.26.
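As a numerical cross-check, not part of the original solution, the following sketch simulates the mixture of part (d). It assumes NumPy is available; the values lam = 3 and mu = 2 are illustrative.

import numpy as np

rng = np.random.default_rng(0)
lam, mu, n = 3.0, 2.0, 200_000

x = rng.exponential(scale=1 / mu, size=n)    # X ~ E(mu)
y = rng.poisson(lam * x)                     # Y | X = x ~ P(lam * x)

print("E(Y): simulated", y.mean(), " vs lam/mu =", lam / mu)
r = lam / mu
for k in range(5):
    theory = r**k / (1 + r)**(k + 1)         # Pascal pmf with parameter lam/mu
    print(f"P(Y={k}): simulated {np.mean(y == k):.4f}  theory {theory:.4f}")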

12.2 Basic Queueing Theory
Basic Exercises
12.20
a) To reenter state n, the queueing system must first leave that state. Let En(t) and Ln(t) denote the numbers of times that the queueing system enters and leaves state n, respectively, by time t. If the queueing system is in state n at time t, then En(t) = Ln(t) + 1; otherwise, En(t) = Ln(t). Hence, in each time interval [0, t], the number of times the queueing system enters state n must equal or exceed by 1 the number of times the queueing system leaves state n.
b) The average numbers of times that the queueing system enters and leaves state n by time t equal En(t)/t and Ln(t)/t, respectively. From part (a), we get
Ln(t)/t ≤ En(t)/t ≤ (Ln(t) + 1)/t.


From these inequalities, we see that
lim_{t→∞} Ln(t)/t ≤ lim_{t→∞} En(t)/t ≤ lim_{t→∞} (Ln(t) + 1)/t = lim_{t→∞} Ln(t)/t,
and, hence, lim_{t→∞} En(t)/t = lim_{t→∞} Ln(t)/t. Thus, the average rates at which the queueing system enters and leaves state n are equal; that is, the rate-in = rate-out principle holds.
12.21
a) This drive-up station is an M/M/1 queue with λ = 10 and µ = 11.
b) Because λ < µ (10 < 11), we know from Example 12.6 on page 703 that a steady-state distribution exists for the number of customers in this queueing system. And, from Equation (12.20) on page 704, the steady-state distribution is
Pn = (1 − λ/µ)(λ/µ)^n = (1 − 10/11)(10/11)^n = (1/11)(10/11)^n, n ≥ 0.
c) From Equation (12.24) on page 707, the steady-state expected number of customers in the queueing system is
L = λ/(µ − λ) = 10/(11 − 10) = 10.
d) Let Y denote the steady-state number of customers in the queue. Then Y = 0 if X = 0 and Y = X − 1 if X ≥ 1. Hence, from the FEF and parts (b) and (c),
Lq = E(Y) = Σ_{n=1}^{∞} (n − 1)Pn = Σ_{n=1}^{∞} nPn − Σ_{n=1}^{∞} Pn = L − (1 − P0) = 10 − (1 − 1/11) = 100/11 ≈ 9.09.

12.22
a) This drive-up station is an M/M/2 queue with λ = 10 and µ = 11.
b) Because λ < 2µ (10 < 22), we know from Example 12.6 on page 703 that a steady-state distribution exists for the number of customers in this queueing system. From Equation (12.22) on page 704,
P0 = [Σ_{n=0}^{s−1} (λ/µ)^n/n! + (λ/µ)^s/(s!(1 − λ/sµ))]^{−1} = [Σ_{n=0}^{1} (10/11)^n/n! + (10/11)^2/(2(1 − 10/(2·11)))]^{−1} = 3/8.
And, from Equation (12.23) on page 705,
Pn = ((λ/µ)^n/n!) P0 if 1 ≤ n ≤ s − 1, and Pn = ((λ/µ)^n/(s! s^{n−s})) P0 for n ≥ s;
that is, Pn = ((10/11)^n/n!)(3/8) if n = 1, and Pn = ((10/11)^n/(2·2^{n−2}))(3/8) if n ≥ 2. In either case,
Pn = (3/4)(5/11)^n, n ≥ 1.

c) See the solution to Exercise 12.23(c), specifically, the first three columns in the table and the first two probability histograms.


d) From Equation (12.28) on page 708, the steady-state expected number of customers in the queueing system is
L = λ/µ + ((λ/sµ)(λ/µ)^s/(s!(1 − λ/sµ)^2)) P0 = 10/11 + ((10/(2·11))(10/11)^2/(2(1 − 10/(2·11))^2)) · (3/8) ≈ 1.15.
By having two tellers instead of one, we can reduce the steady-state expected number of customers in the queueing system from 10 to about 1.15.
e) Let Y denote the steady-state number of customers in the queue. Then we have Y = 0 if X = 0 or 1, and Y = X − 2 if X ≥ 2. Hence, from the FEF and parts (b) and (d),
Lq = E(Y) = Σ_{n=2}^{∞} (n − 2)Pn = Σ_{n=2}^{∞} nPn − 2 Σ_{n=2}^{∞} Pn = (L − P1) − 2(1 − P0 − P1) = L − 2 + 2P0 + P1 ≈ 0.24.
By having two tellers instead of one, we can reduce the steady-state expected number of customers in the queue from about 9.09 to about 0.24.
12.23
a) This drive-up station is an M/M/3 queue with λ = 10 and µ = 11.
b) Because λ < 3µ (10 < 33), we know from Example 12.6 on page 703 that a steady-state distribution exists for the number of customers in this queueing system. From Equation (12.22) on page 704,
P0 = [Σ_{n=0}^{2} (10/11)^n/n! + (10/11)^3/(3!(1 − 10/(3·11)))]^{−1} ≈ 0.400.
And, from Equation (12.23) on page 705,
Pn = ((10/11)^n/n!) P0 if 1 ≤ n ≤ 2, and Pn = ((10/11)^n/(3! 3^{n−3})) P0 if n ≥ 3;
numerically, P1 ≈ 0.363, P2 ≈ 0.165, and Pn ≈ 1.799 (10/33)^n for n ≥ 3.
c) In the following table, we list the exact values of the steady-state probabilities to six decimal places from n = 0 to n = 12.

 n    One teller   Two tellers   Three tellers
 0    0.090909     0.375000      0.399684
 1    0.082645     0.340909      0.363349
 2    0.075131     0.154959      0.165159
 3    0.068301     0.070436      0.050048
 4    0.062092     0.032016      0.015166
 5    0.056447     0.014553      0.004596
 6    0.051316     0.006615      0.001393
 7    0.046651     0.003007      0.000422
 8    0.042410     0.001367      0.000128
 9    0.038554     0.000621      0.000039
10    0.035049     0.000282      0.000012
11    0.031863     0.000128      0.000004
12    0.028966     0.000058      0.000001


The following three graphs provide probability histograms for the steady-state probabilities in the preceding table.
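As a computational cross-check, not part of the original solution, the sketch below recomputes the columns of the preceding table from Equations (12.22) and (12.23). It uses only the Python standard library; the helper name mms_probs is ours.

from math import factorial

def mms_probs(lam, mu, s, n_max):
    # Steady-state probabilities P_0, ..., P_{n_max} for an M/M/s queue (lam < s*mu).
    r, rho = lam / mu, lam / (s * mu)
    p0 = 1 / (sum(r**n / factorial(n) for n in range(s))
              + r**s / (factorial(s) * (1 - rho)))
    probs = []
    for n in range(n_max + 1):
        if n < s:
            probs.append(r**n / factorial(n) * p0)
        else:
            probs.append(r**n / (factorial(s) * s**(n - s)) * p0)
    return probs

for s in (1, 2, 3):
    p = mms_probs(10, 11, s, 12)
    print(f"s = {s}:", " ".join(f"{x:.6f}" for x in p))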


d) From Equation (12.28) on page 708, the steady-state expected number of customers in the queueing system is
L = λ/µ + ((λ/sµ)(λ/µ)^s/(s!(1 − λ/sµ)^2)) P0 = 10/11 + ((10/(3·11))(10/11)^3/(3!(1 − 10/(3·11))^2)) P0 ≈ 0.94.
Roughly, the steady-state expected number of customers in the three-teller system is 0.94 compared with 10 and 1.15 in the one-teller and two-teller systems, respectively.
e) Let Y denote the steady-state number of customers in the queue. Then we have Y = 0 if X = 0, 1, or 2, and Y = X − 3 if X ≥ 3. Hence, from the FEF and parts (b) and (d),
Lq = E(Y) = Σ_{n=3}^{∞} (n − 3)Pn = Σ_{n=3}^{∞} nPn − 3 Σ_{n=3}^{∞} Pn = (L − P1 − 2P2) − 3(1 − P0 − P1 − P2) = L − 3 + 3P0 + 2P1 + P2 ≈ 0.03.
Roughly, the steady-state expected number of customers in the queue in the three-teller system is 0.03 compared with 9.09 and 0.24 in the one-teller and two-teller systems, respectively.
12.24 As the maximum number of customers in an M/M/s/N queueing system is N, having more than N servers would be pointless. Hence, we assume that s ≤ N.
12.25
a) This barbershop is an M/M/1/4 queue with λ = 3 and µ = 2.
b) We have λn = 3 if 0 ≤ n ≤ 3 and λn = 0 if n ≥ 4, while µn = 2 for n ≥ 1. Hence,
∏_{k=1}^{n} λ_{k−1}/µ_k = (3/2)^n if 1 ≤ n ≤ 4, and 0 if n ≥ 5.
Referring now to Proposition 12.3 on page 703, we find that
P0 = [1 + Σ_{n=1}^{∞} ∏_{k=1}^{n} λ_{k−1}/µ_k]^{−1} = [Σ_{n=0}^{4} (3/2)^n]^{−1} = [(1 − (3/2)^5)/(1 − 3/2)]^{−1} = 16/211
and, moreover, for 1 ≤ n ≤ 4,
Pn = (∏_{k=1}^{n} λ_{k−1}/µ_k) P0 = (16/211)(3/2)^n.
Hence, Pn = (16/211)(3/2)^n if 0 ≤ n ≤ 4, and Pn = 0 if n ≥ 5.
c) Referring to part (b), we get
L = Σ_{n=0}^{∞} nPn = Σ_{n=1}^{4} n · (16/211)(3/2)^n = (16/211) Σ_{n=1}^{4} n(3/2)^n ≈ 2.76.


12.26
a) From Equation (12.12) on page 700, λn = λ if 0 ≤ n ≤ N − 1 and λn = 0 if n ≥ N, while µn = µ for n ≥ 1. Hence,
∏_{k=1}^{n} λ_{k−1}/µ_k = (λ/µ)^n if 1 ≤ n ≤ N, and 0 if n ≥ N + 1.
Therefore,
Σ_{n=1}^{∞} ∏_{k=1}^{n} λ_{k−1}/µ_k = Σ_{n=1}^{N} (λ/µ)^n < ∞,
so that, by Proposition 12.3 on page 703, an M/M/1/N queueing system has a steady-state distribution.
b) Referring again to Proposition 12.3 and to part (a), we find that
P0 = [1 + Σ_{n=1}^{∞} ∏_{k=1}^{n} λ_{k−1}/µ_k]^{−1} = [Σ_{n=0}^{N} (λ/µ)^n]^{−1} = 1/(N + 1) if λ = µ, and (1 − λ/µ)/(1 − (λ/µ)^{N+1}) if λ ≠ µ,
and, moreover, for 1 ≤ n ≤ N,
Pn = (∏_{k=1}^{n} λ_{k−1}/µ_k) P0 = (λ/µ)^n P0.
Consequently, if λ = µ,
Pn = 1/(N + 1) if 0 ≤ n ≤ N, and Pn = 0 if n ≥ N + 1,   (∗)
whereas, if λ ≠ µ,
Pn = (1 − λ/µ)(λ/µ)^n/(1 − (λ/µ)^{N+1}) if 0 ≤ n ≤ N, and Pn = 0 if n ≥ N + 1.   (∗∗)
c) We consider two cases.
Case 1: λ = µ. Then, in view of (∗),
L = Σ_{n=0}^{∞} nPn = Σ_{n=1}^{N} n/(N + 1) = (1/(N + 1)) · N(N + 1)/2 = N/2.
Case 2: λ ≠ µ. For this case, we first note that, for a ≠ 1,
Σ_{n=1}^{N} na^n = a Σ_{n=1}^{N} na^{n−1} = a (d/da) Σ_{n=0}^{N} a^n = a (d/da)[(1 − a^{N+1})/(1 − a)] = a(1 − a^{N+1} − (N + 1)a^N(1 − a))/(1 − a)^2.   (∗∗∗)
For convenience, set ρ = λ/µ. Then, in view of (∗∗) and (∗∗∗),
L = Σ_{n=0}^{∞} nPn = ((1 − ρ)/(1 − ρ^{N+1})) Σ_{n=1}^{N} nρ^n = ((1 − ρ)/(1 − ρ^{N+1})) · ρ(1 − ρ^{N+1} − (N + 1)ρ^N(1 − ρ))/(1 − ρ)^2
= ρ/(1 − ρ) − (N + 1)ρ^{N+1}/(1 − ρ^{N+1}) = (λ/µ)/(1 − λ/µ) − (N + 1)(λ/µ)^{N+1}/(1 − (λ/µ)^{N+1}).
d) In this case λ = 3, µ = 2, and N = 4. Hence, by part (b), specifically, (∗∗), the steady-state distribution for the number of customers in the barbershop is
Pn = (1 − 3/2)(3/2)^n/(1 − (3/2)^5) = (16/211)(3/2)^n if 0 ≤ n ≤ 4, and Pn = 0 if n ≥ 5.
From part (c), the steady-state expected number of customers in the barbershop is
L = (3/2)/(1 − 3/2) − 5(3/2)^5/(1 − (3/2)^5) = 582/211 ≈ 2.76.
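The following small sketch, not part of the original solution, evaluates the M/M/1/N formulas just derived for the barbershop data (λ = 3, µ = 2, N = 4). It uses only the Python standard library; the helper name mm1N is ours.

def mm1N(lam, mu, N):
    # Steady-state probabilities and expected number in system for an M/M/1/N queue.
    rho = lam / mu
    if rho == 1:
        probs = [1 / (N + 1)] * (N + 1)
    else:
        probs = [(1 - rho) * rho**n / (1 - rho**(N + 1)) for n in range(N + 1)]
    L = sum(n * p for n, p in enumerate(probs))
    return probs, L

probs, L = mm1N(3, 2, 4)
print("P_n:", [round(p, 4) for p in probs])    # (16/211)*(3/2)^n
print("L  :", L, "   582/211 =", 582 / 211)    # about 2.76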

12.27
a) We have λn = λ for n ≥ 0, while µn = µ if 1 ≤ n ≤ 2 and µn = ν if n ≥ 3. Hence,
∏_{k=1}^{n} λ_{k−1}/µ_k = (λ/µ)^n if 1 ≤ n ≤ 2, and (λ/µ)^2(λ/ν)^{n−2} if n ≥ 3;
equivalently, the product equals λ/µ if n = 1 and (λ/µ)^2(λ/ν)^{n−2} if n ≥ 2. Also,
Σ_{n=1}^{∞} ∏_{k=1}^{n} λ_{k−1}/µ_k = λ/µ + Σ_{n=2}^{∞} (λ/µ)^2(λ/ν)^{n−2} = λ/µ + (λ/µ)^2/(1 − λ/ν) = λ/µ + (λ/µ)^2 ν/(ν − λ).
Referring now to Proposition 12.3 on page 703, we find that
P0 = [1 + Σ_{n=1}^{∞} ∏_{k=1}^{n} λ_{k−1}/µ_k]^{−1} = [1 + λ/µ + (λ/µ)^2 ν/(ν − λ)]^{−1}
and that
P1 = (λ/µ)P0 and Pn = (λ/µ)^2(λ/ν)^{n−2} P0, n ≥ 2.
b) Referring to part (a) and applying the formula Σ_{n=1}^{∞} na^{n−1} = 1/(1 − a)^2, we get
Lq = Σ_{n=1}^{∞} (n − 1)Pn = Σ_{n=2}^{∞} (n − 1)(λ/µ)^2(λ/ν)^{n−2} P0 = (λ/µ)^2 P0 Σ_{n=1}^{∞} n(λ/ν)^{n−1} = (λ/µ)^2 P0/(1 − λ/ν)^2 = (λν/(µ(ν − λ)))^2 P0.


c) From part (a), with λ = 20, µ = 16, and ν = 24,
P0 = [1 + λ/µ + (λ/µ)^2 ν/(ν − λ)]^{−1} = [1 + 20/16 + (20/16)^2 · 24/(24 − 20)]^{−1} = 8/93.
Applying now part (b) yields
Lq = (λν/(µ(ν − λ)))^2 P0 = (20·24/(16(24 − 20)))^2 · 8/93 = 150/31.

The steady-state expected number of customers in the queue is roughly 4.8. 12.28 In the M/M/2 queue, service is at rate µ when one customer is in the system and at rate 2µ when two or more customers are in the system. In the M/M/1 queue, service is always at rate 2µ when one or more customers are in the system. Hence the M/M/1 queue is more efficient. Note: Throughout the remaining exercises in this section, we let ρ = λ/sµ and r = λ/µ. Of course, we have ρ = r in the single-server case.

Theory Exercises
12.29
a) Let Y denote the steady-state number of customers in the queue and set Qn = P(Y = n). Then Y = 0 if X = 0 or 1, and Y = X − 1 if X ≥ 2. Referring to Equation (12.20) on page 704, we get
Q0 = P(Y = 0) = P(X = 0) + P(X = 1) = (1 − ρ) + (1 − ρ)ρ = 1 − ρ^2 = 1 − (λ/µ)^2
and, for n ≥ 1,
Qn = P(Y = n) = P(X = n + 1) = (1 − ρ)ρ^{n+1} = (1 − λ/µ)(λ/µ)^{n+1}.
Thus, Qn = 1 − (λ/µ)^2 if n = 0, and Qn = (1 − λ/µ)(λ/µ)^{n+1} if n ≥ 1.
b) From part (a) and the formula Σ_{n=1}^{∞} na^{n−1} = 1/(1 − a)^2, we get
Lq = E(Y) = Σ_{n=0}^{∞} nQn = Σ_{n=1}^{∞} n(1 − ρ)ρ^{n+1} = (1 − ρ)ρ^2 Σ_{n=1}^{∞} nρ^{n−1} = (1 − ρ)ρ^2/(1 − ρ)^2 = ρ^2/(1 − ρ) = (λ/µ)^2/(1 − λ/µ) = λ^2/(µ(µ − λ)).
Alternatively, we can use the FEF and Equations (12.20) and (12.24) on pages 704 and 707, respectively, to get
Lq = E(Y) = 0·P0 + 0·P1 + Σ_{n=2}^{∞} (n − 1)Pn = Σ_{n=2}^{∞} nPn − Σ_{n=2}^{∞} Pn = (L − 1·P1) − (1 − P0 − P1) = L − 1 + P0 = λ/(µ − λ) − 1 + 1 − λ/µ = λ^2/(µ(µ − λ)).


12.30
a) As the service times are independent exponential random variables with common parameter µ, the time it takes for the server to serve n consecutive customers has a Γ(n, µ) distribution. Now, let W denote the steady-state waiting time in the queueing system for each individual customer. Because of the lack-of-memory property of an exponential random variable, we see that W|X=n ∼ Γ(n + 1, µ), where X denotes the number of customers in the queueing system when the individual customer arrives at the service facility. Applying the law of total probability and referring to Equation (12.20) on page 704, we find that, for t ≥ 0,
P(W > t) = Σ_{n=0}^{∞} P(X = n) P(W > t | X = n) = Σ_{n=0}^{∞} (1 − ρ)ρ^n ∫_t^∞ (µ^{n+1} x^{(n+1)−1} e^{−µx}/Γ(n + 1)) dx
= (1 − ρ)µ ∫_t^∞ e^{−µx} (Σ_{n=0}^{∞} (λx)^n/n!) dx = (1 − ρ)µ ∫_t^∞ e^{−µx} e^{λx} dx = (1 − ρ)µ ∫_t^∞ e^{−(µ−λ)x} dx = ((1 − ρ)µ/(µ − λ)) e^{−(µ−λ)t} = e^{−(µ−λ)t}.
Thus, we see that W has the exponential distribution with parameter µ − λ.
b) From part (a) and the fact that the mean of an exponential random variable is the reciprocal of its parameter, we have
W = E(W) = 1/(µ − λ).

12.31
a) As the service times are independent exponential random variables with common parameter µ, the time it takes for the server to serve n consecutive customers has a Γ(n, µ) distribution. Now, let Wq denote the steady-state waiting time in the queue for each individual customer and let X denote the number of customers in the queueing system when the individual customer arrives at the service facility. Clearly, Wq|X=0 = 0 and, because of the lack-of-memory property of an exponential random variable, we have Wq|X=n ∼ Γ(n, µ) for n ≥ 1. Applying the law of total probability and referring to Equation (12.20) on page 704, we find that, for t ≥ 0,
P(Wq > t) = Σ_{n=0}^{∞} P(X = n) P(Wq > t | X = n) = P(X = 0) P(Wq > t | X = 0) + Σ_{n=1}^{∞} P(X = n) P(Wq > t | X = n)
= 0 + Σ_{n=1}^{∞} (1 − ρ)ρ^n ∫_t^∞ (µ^n x^{n−1} e^{−µx}/Γ(n)) dx = (1 − ρ)λ ∫_t^∞ e^{−µx} (Σ_{n=1}^{∞} (λx)^{n−1}/(n − 1)!) dx
= (1 − ρ)λ ∫_t^∞ e^{−µx} e^{λx} dx = (1 − ρ)λ ∫_t^∞ e^{−(µ−λ)x} dx = ((1 − ρ)λ/(µ − λ)) e^{−(µ−λ)t} = (λ/µ) e^{−(µ−λ)t}.
Thus, we see that
F_{Wq}(t) = 0 if t < 0, and F_{Wq}(t) = 1 − (λ/µ)e^{−(µ−λ)t} if t ≥ 0.
b) From Proposition 10.5 on page 577 and part (a),
Wq = E(Wq) = ∫_0^∞ P(Wq > t) dt = (λ/µ) ∫_0^∞ e^{−(µ−λ)t} dt = (λ/µ) · 1/(µ − λ) = λ/(µ(µ − λ)).
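The following simulation sketch is not part of the original solution. It assumes NumPy is available and generates successive waiting times in an M/M/1 queue with Lindley's recursion, Wq_{k+1} = max(0, Wq_k + S_k − A_{k+1}), which is a standard device not introduced in the text; the values λ = 10 and µ = 11 match the drive-up-station example.

import numpy as np

rng = np.random.default_rng(2)
lam, mu, n = 10.0, 11.0, 200_000

A = rng.exponential(1 / lam, n)    # interarrival times
S = rng.exponential(1 / mu, n)     # service times
Wq = np.empty(n)
Wq[0] = 0.0
for k in range(n - 1):
    Wq[k + 1] = max(0.0, Wq[k] + S[k] - A[k + 1])   # Lindley's recursion

W = Wq + S
burn = n // 10                      # discard a warm-up period
print("E(W) :", W[burn:].mean(),  " theory 1/(mu-lam) =", 1 / (mu - lam))
print("E(Wq):", Wq[burn:].mean(), " theory lam/(mu*(mu-lam)) =", lam / (mu * (mu - lam)))
t = 1.0
print("P(Wq > 1):", np.mean(Wq[burn:] > t),
      " theory (lam/mu)*exp(-(mu-lam)*t) =", (lam / mu) * np.exp(-(mu - lam) * t))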

12.32 An arriving customer must join the queue if and only if X ≥ 1. From Exercise 12.31, we know that Wq|X=0 = 0. Hence, for t ≥ 0, we have {Wq > t} ⊂ {X ≥ 1}. Therefore, from Exercise 12.31 and Equation (12.20) on page 704, we have, for t ≥ 0,
P(Wq > t | X ≥ 1) = P(Wq > t, X ≥ 1)/P(X ≥ 1) = P(Wq > t)/P(X ≥ 1) = (λ/µ)e^{−(µ−λ)t}/(1 − (1 − λ/µ)) = e^{−(µ−λ)t}.
Consequently, the conditional distribution of the waiting time in the queue, given that the customer must join the queue, is exponential with parameter µ − λ; the conditional expectation is, therefore, 1/(µ − λ).

12.33
a) Let X and Y denote the steady-state number of customers in the queueing system and in the queue, respectively. We have Y = 0 if X = 0 or 1, and Y = X − 1 if X ≥ 2. Hence, by the FEF,
Lq = E(Y) = 0·P0 + 0·P1 + Σ_{n=2}^{∞} (n − 1)Pn = Σ_{n=2}^{∞} nPn − Σ_{n=2}^{∞} Pn = (L − 1·P1) − (1 − P0 − P1) = L − (1 − P0).
b) From part (a) and Equations (12.24) and (12.20),
Lq = L − (1 − P0) = λ/(µ − λ) − (1 − (1 − λ/µ)) = λ^2/(µ(µ − λ)).
c) Let X and Y denote the steady-state number of customers in the queueing system and in the queue, respectively. We have Y = 0 if X = 0, 1, ..., s, and Y = X − s if X ≥ s + 1. Hence, by the FEF,
Lq = E(Y) = Σ_{n=0}^{s} 0·Pn + Σ_{n=s+1}^{∞} (n − s)Pn = Σ_{n=s+1}^{∞} nPn − s Σ_{n=s+1}^{∞} Pn = (L − Σ_{n=0}^{s} nPn) − s(1 − Σ_{n=0}^{s} Pn) = L − s − Σ_{n=0}^{s} nPn + s Σ_{n=0}^{s} Pn = L − s + Σ_{n=0}^{s} (s − n)Pn.

Advanced Exercises
12.34 As noted in Exercise 12.24, we can assume that s ≤ N. Also, recall that we are letting ρ = λ/sµ and r = λ/µ.
a) From Equation (12.13) on page 700, λn = λ if 0 ≤ n ≤ N − 1 and λn = 0 if n ≥ N, while µn = nµ if 1 ≤ n ≤ s − 1 and µn = sµ if n ≥ s. Hence,
∏_{k=1}^{n} λ_{k−1}/µ_k = r^n/n! if 1 ≤ n ≤ s − 1; (r^s/s!)ρ^{n−s} = r^n/(s! s^{n−s}) if s ≤ n ≤ N; and 0 if n ≥ N + 1.
Therefore,
Σ_{n=1}^{∞} ∏_{k=1}^{n} λ_{k−1}/µ_k = Σ_{n=1}^{s−1} r^n/n! + Σ_{n=s}^{N} r^n/(s! s^{n−s}) < ∞,
so that, by Proposition 12.3 on page 703, an M/M/s/N queueing system has a steady-state distribution.
b) Referring again to Proposition 12.3 and to part (a), we find that
P0 = [1 + Σ_{n=1}^{∞} ∏_{k=1}^{n} λ_{k−1}/µ_k]^{−1} = [Σ_{n=0}^{s−1} r^n/n! + Σ_{n=s}^{N} r^n/(s! s^{n−s})]^{−1} = [Σ_{n=0}^{s−1} r^n/n! + (r^s/s!) Σ_{n=s}^{N} ρ^{n−s}]^{−1}
and, moreover,
Pn = (r^n/n!) P0 if 1 ≤ n ≤ s − 1; (r^n/(s! s^{n−s})) P0 if s ≤ n ≤ N; and 0 if n ≥ N + 1.   (∗)
We note that, if ρ = 1 (i.e., λ = sµ), then
P0 = [Σ_{n=0}^{s−1} r^n/n! + (N − s + 1) r^s/s!]^{−1},   (∗∗)
whereas, if ρ ≠ 1 (i.e., λ ≠ sµ), then
P0 = [Σ_{n=0}^{s−1} r^n/n! + (r^s/s!)(1 − ρ^{N−s+1})/(1 − ρ)]^{−1}.   (∗∗∗)
c) Assuming that λ < sµ (i.e., ρ < 1) and referring to Equation (∗∗∗), we find that, in the limit, as N → ∞,
P0 = [Σ_{n=0}^{s−1} r^n/n! + r^s/(s!(1 − ρ))]^{−1},
which agrees with Equation (12.22) on page 704. Furthermore, referring to Equation (∗), we find that, in the limit, as N → ∞,
Pn = (r^n/n!) P0 if 1 ≤ n ≤ s − 1, and Pn = (r^n/(s! s^{n−s})) P0 if n ≥ s,
which agrees with Equation (12.23) on page 705.
d) Let X and Y denote the steady-state number of customers in the queueing system and in the queue, respectively. We have Y = 0 if X = 0, 1, ..., s, and Y = X − s if X ≥ s + 1. Hence, by the FEF and Equation (∗),
Lq = E(Y) = Σ_{n=0}^{s} 0·Pn + Σ_{n=s+1}^{∞} (n − s)Pn = Σ_{n=s+1}^{N} (n − s)Pn = (P0/s!) Σ_{n=s+1}^{N} (n − s) r^n/s^{n−s} = (r^s P0/s!) Σ_{n=s+1}^{N} (n − s)ρ^{n−s} = (r^s P0/s!) Σ_{n=1}^{N−s} nρ^n.
We now consider two cases.
Case 1: ρ = 1. In this case,
Lq = (r^s P0/s!) Σ_{n=1}^{N−s} n = ((N − s)(N − s + 1) r^s/(2·s!)) P0,
where P0 is given by Equation (∗∗).
Case 2: ρ ≠ 1. In this case,
Lq = (r^s P0/s!) Σ_{n=1}^{N−s} nρ^n = (ρ r^s P0/s!) Σ_{n=1}^{N−s} nρ^{n−1} = (ρ r^s P0/s!) (d/dρ) Σ_{n=0}^{N−s} ρ^n = (ρ r^s P0/s!) (d/dρ)[(1 − ρ^{N−s+1})/(1 − ρ)]
= (ρ r^s/s!) · (1 − ρ^{N−s+1} − (1 − ρ)(N − s + 1)ρ^{N−s})/(1 − ρ)^2 · P0,
where P0 is given by Equation (∗∗∗).
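A short computational sketch, not part of the original solution, of the M/M/s/N formulas in parts (b) and (d). It uses only the Python standard library, and the helper name mmsN is ours; with s = 1, λ = 3, µ = 2, N = 4 it reproduces the barbershop results of Exercises 12.25 and 12.26 (P0 = 16/211 and Lq = L − (1 − P0)).

from math import factorial

def mmsN(lam, mu, s, N):
    # Steady-state probabilities P_0, ..., P_N, plus L and Lq, for an M/M/s/N queue.
    r = lam / mu
    terms = [r**n / factorial(n) for n in range(s)]
    terms += [r**n / (factorial(s) * s**(n - s)) for n in range(s, N + 1)]
    p0 = 1 / sum(terms)
    probs = [p0 * t for t in terms]
    L = sum(n * p for n, p in enumerate(probs))
    Lq = sum((n - s) * p for n, p in enumerate(probs) if n > s)
    return probs, L, Lq

probs, L, Lq = mmsN(3, 2, 1, 4)
print("P0 =", probs[0], " (16/211 =", 16 / 211, ")")
print("L  =", L, "  Lq =", Lq)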

12.35
a) In steady state, the proportion of time that the queueing system is in state n equals Pn and, when in state n, the arrival rate is λn. Hence, λ̄ = Σ_{n=0}^{∞} λn Pn.
b) For the M/M/1 and M/M/s queues, we have λn = λ for all n ≥ 0. Hence, from part (a),
λ̄ = Σ_{n=0}^{∞} λn Pn = Σ_{n=0}^{∞} λPn = λ Σ_{n=0}^{∞} Pn = λ·1 = λ.
c) For the M/M/1/N and M/M/s/N queues, we have λn = λ if 0 ≤ n ≤ N − 1, and λn = 0 otherwise. Hence, from part (a) and the fact that Pn = 0 for n ≥ N + 1,
λ̄ = Σ_{n=0}^{∞} λn Pn = Σ_{n=0}^{N−1} λPn = λ Σ_{n=0}^{N−1} Pn = (1 − Σ_{n=N}^{∞} Pn) λ = (1 − P_N)λ.
12.36
a) Let W and Wq denote the steady-state waiting times in the queueing system and in the queue, respectively, for each individual customer. Also, let S denote the service time for each individual customer. Because the service rate is µ, we have E(S) = 1/µ. Noting that W = Wq + S, we conclude that
W = E(W) = E(Wq + S) = E(Wq) + E(S) = Wq + 1/µ.
b) From Little's formulas and part (a),
L = λ̄W = λ̄(Wq + 1/µ) = λ̄Wq + λ̄·(1/µ) = Lq + λ̄/µ.


c) Little's formulas are L = λ̄W and Lq = λ̄Wq. We can use these two formulas and the two from parts (a) and (b) to solve for any three of the quantities L, Lq, W, and Wq in terms of the other one.
d) From Exercise 12.35(b), we know that λ̄ = λ. Hence,
Lq = L − λ̄/µ = λ/(µ − λ) − λ/µ = λ^2/(µ(µ − λ)),
W = L/λ̄ = (λ/(µ − λ))/λ = 1/(µ − λ),
Wq = W − 1/µ = 1/(µ − λ) − 1/µ = λ/(µ(µ − λ)).
These three results agree with those obtained in Exercises 12.29(b), 12.30(b), and 12.31(b), respectively.
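As a small numerical illustration, not part of the original solution, the sketch below starts only from L and λ̄ and recovers W, Wq, and Lq with Little's formulas; the values λ = 10 and µ = 11 are the drive-up-station numbers used earlier.

lam, mu = 10.0, 11.0
L = lam / (mu - lam)            # Equation (12.24)

W = L / lam                     # L = lam_bar * W
Wq = W - 1 / mu                 # part (a)
Lq = lam * Wq                   # Lq = lam_bar * Wq

print("W  =", W,  " vs 1/(mu-lam)           =", 1 / (mu - lam))
print("Wq =", Wq, " vs lam/(mu*(mu-lam))    =", lam / (mu * (mu - lam)))
print("Lq =", Lq, " vs lam**2/(mu*(mu-lam)) =", lam**2 / (mu * (mu - lam)))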

12.37 From Exercise 12.35(b), we know that λ̄ = λ = 6.9. Recalling that µ = 7.4 and L = 13.8, we obtain the following results.
a) The expected number of patients waiting to be treated is
Lq = L − λ̄/µ = 13.8 − 6.9/7.4 ≈ 12.9.
b) The expected waiting time until treatment commences is
Wq = Lq/λ̄ = (13.8 − 6.9/7.4)/6.9 ≈ 1.9 hr,
or about 112 minutes.
c) The expected elapsed time from arrival at the emergency room until treatment is completed is
W = L/λ̄ = 13.8/6.9 = 2 hr,
or 120 minutes.
12.38 From Exercise 12.35(b), we know that λ̄ = λ = 6.9. Recalling that µ = 7.4 and L ≈ 1.2, we obtain the following results.
a) The expected number of patients waiting to be treated is
Lq = L − λ̄/µ ≈ 1.2 − 6.9/7.4 ≈ 0.27.
b) The expected waiting time until treatment commences is
Wq = Lq/λ̄ ≈ (1.2 − 6.9/7.4)/6.9 ≈ 0.039 hr,
or about 2.3 minutes.
c) The expected elapsed time from arrival at the emergency room until treatment is completed is
W = L/λ̄ ≈ 1.2/6.9 ≈ 0.17 hr,
or about 10.4 minutes.


d) By staffing two doctors instead of one, a drastic reduction occurs in the expected queue size and waiting times, as shown in the following table. Times are in minutes.

Quantity   One doctor   Two doctors
L          13.8         1.2
Lq         12.9         0.27
Wq         112          2.3
W          120          10.4

12.39 From Exercises 12.21–12.23, we have the following (approximate) results.

Quantity   One teller   Two tellers   Three tellers
L          10           1.15          0.94
Lq         9.09         0.24          0.03

From Exercise 12.35(b), we know that λ̄ = λ = 10. Applying Little's formulas in the form W = L/λ̄ and Wq = Lq/λ̄, we get the following (approximate) results.
a) The steady-state expected time, in minutes, that a customer spends in the queueing system, W:

One teller   Two tellers   Three tellers
60           6.9           5.6

b) The steady-state expected time, in minutes, that a customer waits before being served, Wq:

One teller   Two tellers   Three tellers
54.5         1.4           0.2

c) Referring to parts (a) and (b), we see that, by using two tellers instead of one, a drastic reduction occurs both in the expected time that a customer spends in the drive-up station and in the expected time a customer spends in the queue. Adding a third teller does reduce these two expected times somewhat further but would probably be considered an unjustified expense.
12.40
a) From Exercises 12.35(c) and 12.25(b),
λ̄ = (1 − P4)λ = (1 − (16/211)(1.5)^4) · 3 ≈ 1.85.
Therefore, from Exercises 12.36(b) and 12.25(c),
Lq = L − λ̄/µ ≈ 2.76 − 1.85/2 ≈ 1.835.
b) From Little's formulas and part (a),
W = L/λ̄ ≈ 2.76/1.85 ≈ 1.49 hr,
or about 89.5 minutes.


c) From Little’s formulas and part (a), Wq = or about 59.5 minutes.

Lq 1.835 ≈ ≈ 0.99 hr, 1.85 λ¯

12.3 The Multivariate Normal Distribution
Basic Exercises
12.41
a) First we note that Σ_X is symmetric because Cov(Xj, Xi) = Cov(Xi, Xj) for all i and j. Applying the bilinearity property of covariance, Equation (7.40) on page 366, we find that, for all x1, ..., xm,
0 ≤ Var(Σ_{j=1}^{m} xj Xj) = Cov(Σ_{j=1}^{m} xj Xj, Σ_{k=1}^{m} xk Xk) = Σ_{j=1}^{m} Σ_{k=1}^{m} xj xk Cov(Xj, Xk) = x′ Σ_X x.
Hence, x′ Σ_X x ≥ 0 for all x ∈ R^m. Consequently, Σ_X is nonnegative definite.
b) Answers will vary. However, note that, if X1, ..., Xm are uncorrelated random variables with nonzero variances, then, in view of part (a), if x ≠ 0,
x′ Σ_X x = Σ_{j=1}^{m} Σ_{k=1}^{m} xj xk Cov(Xj, Xk) = Σ_{k=1}^{m} xk^2 Var(Xk) > 0.
Hence, Σ_X is positive definite.
c) Answers will vary. However, note that, if X1, ..., Xm are constant random variables, then Σ_X = 0 and, hence, Σ_X is not positive definite.
12.42
a) For 1 ≤ i ≤ k and 1 ≤ j ≤ m, let Yi = [Y]_i, ai = [a]_i, and bij = [B]_{ij}. Then Yi = ai + Σ_{j=1}^{m} bij Xj. Therefore,
[µ_Y]_i = µ_{Yi} = E(ai + Σ_{j=1}^{m} bij Xj) = ai + Σ_{j=1}^{m} bij E(Xj) = ai + Σ_{j=1}^{m} bij µ_{Xj} = [a + Bµ_X]_i.
Hence, µ_Y = a + Bµ_X. Also,
[Σ_Y]_{ij} = Cov(Yi, Yj) = Cov(ai + Σ_{n=1}^{m} bin Xn, aj + Σ_{p=1}^{m} bjp Xp) = Σ_{n=1}^{m} Σ_{p=1}^{m} bin Cov(Xn, Xp) bjp = Σ_{n=1}^{m} Σ_{p=1}^{m} [B]_{in} [Σ_X]_{np} [B′]_{pj} = [BΣ_X B′]_{ij}.
Hence, Σ_Y = BΣ_X B′.


b) As Z1 , . . . , Zm are independent standard normal random variables, we have µZ = 0m (the m × 1 zero vector) and !Z = Im (the m × m identity matrix). Applying part (a) with k = m, Y = X, and X = Z, we find that µX = a + BµZ = a + B · 0m = a and

!X = B!Z B′ = BIm B = BB′ . 12.43 a) We have E(Y1 ) = E(3 + 2X1 − 4X2 + X3 ) = 3 + 2E(X1 ) − 4E(X2 ) + E(X3 ) = 3 + 2 · (−1) − 4 · 0 + 5 = 6

and

E(Y2 ) = E(−1 + X2 − 4X3 ) = −1 + E(X2 ) − 4E(X3 ) = −1 + 0 − 4 · 5 = −21.

Also, recalling that Cov(X, Y ) = Cov(Y, X) and Cov(X, X) = Var(X), we get

Var(Y1 ) = Cov(Y1 , Y1 ) = Cov(3 + 2X1 − 4X2 + X3 , 3 + 2X1 − 4X2 + X3 ) = 4 Cov(X1 , X1 ) − 8 Cov(X1 , X2 ) + 2 Cov(X1 , X3 )

− 8 Cov(X2 , X1 ) + 16 Cov(X2 , X2 ) − 4 Cov(X2 , X3 )

+ 2 Cov(X3 , X1 ) − 4 Cov(X3 , X2 ) + Cov(X3 , X3 ) = 4 · 10 − 8 · (−5) + 2 · 2 − 8 · (−5) + 16 · 16 − 4 · (−3) + 2 · 2 − 4 · (−3) + 4

= 412,

Cov(Y1 , Y2 ) = Cov(3 + 2X1 − 4X2 + X3 , −1 + X2 − 4X3 )

= 2 Cov(X1 , X2 ) − 8 Cov(X1 , X3 ) − 4 Cov(X2 , X2 ) + 16 Cov(X2 , X3 ) + Cov(X3 , X2 ) − 4 Cov(X3 , X3 ) = 2 · (−5) − 8 · 2 − 4 · 16 + 16 · (−3) + (−3) − 4 · 4

= −157,

and

Var(Y2 ) = Cov(Y2 , Y2 ) = Cov(−1 + X2 − 4X3 , −1 + X2 − 4X3 )

= Cov(X2, X2) − 4 Cov(X2, X3) − 4 Cov(X3, X2) + 16 Cov(X3, X3) = 16 − 4·(−3) − 4·(−3) + 16·4 = 104.
Consequently,
µ_Y = [6, −21]′ and Σ_Y = [[412, −157], [−157, 104]].
b) From the given information,
µ_X = [−1, 0, 5]′ and Σ_X = [[10, −5, 2], [−5, 16, −3], [2, −3, 4]].
c) We note that
Y = [3, −1]′ + BX, where B = [[2, −4, 1], [0, 1, −4]].
Hence, from Exercise 12.42(a) and part (b),
µ_Y = [3, −1]′ + Bµ_X = [3, −1]′ + [3, −20]′ = [6, −21]′
and
Σ_Y = BΣ_X B′ = [[412, −157], [−157, 104]].
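The following short sketch, not part of the original solution, carries out the matrix computation of parts (b) and (c) numerically; it assumes NumPy is available.

import numpy as np

a = np.array([3.0, -1.0])
B = np.array([[2.0, -4.0,  1.0],
              [0.0,  1.0, -4.0]])
mu_X = np.array([-1.0, 0.0, 5.0])
Sigma_X = np.array([[10.0, -5.0,  2.0],
                    [-5.0, 16.0, -3.0],
                    [ 2.0, -3.0,  4.0]])

mu_Y = a + B @ mu_X            # Exercise 12.42(a): mu_Y = a + B mu_X
Sigma_Y = B @ Sigma_X @ B.T    # Exercise 12.42(a): Sigma_Y = B Sigma_X B'
print("mu_Y    =", mu_Y)       # [6, -21]
print("Sigma_Y =\n", Sigma_Y)  # [[412, -157], [-157, 104]]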

d) Although the methods used in parts (a) and (c) are essentially equivalent, doing the calculation with matrices, as in part (b), is simpler and provides a more organized way of proceeding. 12.44 From Proposition 12.5 on page 717, multivariate normal random variables are nonsingular if and only if their covariance matrix is nonsingular. We use this criterion in solving parts (a)–(d). a) The determinant of this matrix equals 1 and, hence, is nonzero. Thus, this matrix is nonsingular, which implies that the multivariate normal random variables with this covariance matrix are nonsingular. b) The determinant of this matrix equals 0 and, hence, the matrix is singular. Therefore, the multivariate normal random variables with this covariance matrix are singular. c) The determinant of this matrix equals 49 and, hence, is nonzero. Thus, this matrix is nonsingular, which implies that the multivariate normal random variables with this covariance matrix are nonsingular. d) The rows of this matrix are all multiples of one another. In particular, then, the matrix is singular. Therefore, the multivariate normal random variables with this covariance matrix are singular. 12.45 Let Z1 , . . . , Zm be independent standard normal random variables and let R be the m × m positive definite matrix such that R2 = !. Set Y = µ + RZ. From the argument leading to Definition 12.4, we know that Y1 , . . . , Ym are marginally normal random variables with µY = µ and !Y = RR′ = R2 = ! and, furthermore, that fY (y) =

1 1 − 21 (y−µY )′ !−1 − 21 (y−µ)′ !−1 (y−µ) Y (y−µY ) = e e . (2π)m/2 (det !Y )1/2 (2π)m/2 (det !)1/2

Thus, Y1 , . . . , Ym and X1 , . . . , Xm have the same joint PDFs, which implies that they have the same univariate and bivariate marginals. Hence, the univariate marginal distributions of X1 , . . . , Xm are all normal and, moreover, µX = µY = µ and !X = !Y = !. 12.46 a) Suppose that X has a nonsingular univariate normal distribution in the sense of Definition 12.4. Then fX (x) =

1 (2π)1/2 (det !)1/2

1



−1 (x−µ)

e− 2 (x−µ) !

,

where µ is a 1 × 1 vector and ! is a 1 × 1 positive definite matrix. Thus, we can write µ = [µ] and ! = [σ 2 ], where µ is a real number and σ is a positive real number. Therefore, (x − µ)′ !−1 (x − µ) = [x − µ]′ [σ −2 ][x − µ] = (x − µ)σ −2 (x − µ) = (x − µ)2 /σ 2 .


Noting that det ! = σ 2 , we conclude that fX (x) =

1 1 2 2 −(x−µ)2 /2σ 2 =√ e−(x−µ) /2σ . ! "1/2 e 2π σ (2π)1/2 σ 2

Hence, X has a univariate normal distribution in the sense of Definition 8.6. Conversely, suppose that X has a univariate normal distribution in the sense of Definition 8.6. Then 1 2 2 fX (x) = √ e−(x−µ) /2σ , 2π σ where µ and σ > 0 are real constants. Now, let µ = [µ] and ! = [σ 2 ]. We see that (x − µ)2 /σ 2 = (x − µ)σ −2 (x − µ) = [x − µ]′ [σ −2 ][x − µ] = (x − µ)′ !−1 (x − µ). Noting that σ 2 = det !, we conclude that fX (x) =

1 1 1 ′ −1 −(x−µ)2 /2σ 2 = e− 2 (x−µ) ! (x−µ) , ! "1/2 e 1/2 (det !)1/2 2 (2π) σ

(2π)1/2

where µ is a 1 × 1 vector and ! is a 1 × 1 positive definite matrix. Hence, X has a nonsingular univariate normal distribution in the sense of Definition 12.4. b) Suppose that X and Y have a nonsingular bivariate normal distribution in the sense of Definition 12.4. Then 1 1 ′ −1 fX,Y (x, y) = e− 2 (x−µ) ! (x−µ) , 2/2 1/2 (2π) (det !) where µ is a 2 × 1 vector and ! is a 2 × 2 positive definite matrix. Let X = [X, Y ]′ and x = [x, y]′ . From Exercise 12.45, we know that > ? µX µ = µX = µY and ! = !X =

>

Cov(X, X) Cov(Y, X)

? > σX2 Cov(X, Y ) = Cov(Y, Y ) ρσX σY

where ρ = ρ(X, Y ). In particular, we see that det ! = Therefore,

σX2 σY2

! " 1 − ρ2

(x − µ)′ !−1 (x − µ)

and

−1

!

ρσX σY σY2

> 1 σY2 " = 2 2! σX σY 1 − ρ 2 −ρσX σY

?

,

−ρσX σY σX2

?

.

1 2 1 2 2 2 2 ! " σ (x − µ ) − 2ρσ σ (x − µ )(y − µ ) + σ (y − µ ) X X Y X Y Y Y X σX2 σY2 1 − ρ 2 =& ' & '& ' & ' B x − µX 2 x − µX y − µY y − µY 2 1 . − 2ρ + = σX σX σY σY 1 − ρ2

=


Hence, fX,Y (x, y) = where

1 1 1 − 1 Q(x,y) 6 = e− 2 Q(x,y) , 1 21/2 e 2 ! " 2 2πσX σY 1 − ρ 2π σX2 σY2 1 − ρ 2

1 Q(x, y) = 1 − ρ2

=&

x − µX σX

'2

− 2ρ

&

x − µX σX

'&

y − µY σY

'

+

&

y − µY σY

'2 B

.

Consequently, X and Y are bivariate normal random variables in the sense of Definition 10.4. Conversely, suppose that X and Y are bivariate normal random variables in the sense of Definition 10.4. Then 1 1 6 fX,Y (x, y) = e− 2 Q(x,y) , 2 2πσX σY 1 − ρ where =& ' & '& ' & ' B 1 x − µX 2 x − µX y − µY y − µY 2 Q(x, y) = − 2ρ + . σX σX σY σY 1 − ρ2 Let

Note that

> ? x x= , y

det ! = and that Q(x, y) =

Hence,

σX2 σY2

>

? µX µ= , µY

! " 1 − ρ2

and

and

−1

!

>

σX2 != ρσX σY

ρσX σY σY2

> 1 σY2 " = 2 2! σX σY 1 − ρ 2 −ρσX σY

?

.

−ρσX σY σX2

?

1 2 1 2 2 2 2 ! " σ (x − µ ) − 2ρσ σ (x − µ )(y − µ ) + σ (y − µ ) X X Y X Y Y Y X σX2 σY2 1 − ρ 2

= (x − µ)′ !−1 (x − µ). fX,Y (x, y) =

1

1

(2π)2/2 (det !)1/2



−1 (x−µ)

e− 2 (x−µ) !

.

Consequently, X and Y have a nonsingular bivariate normal distribution in the sense of Definition 12.4. 12.47 a) Let

Then

X′ t



⎤ X1 . X = ⎣ .. ⎦

= t1 X1 + · · · + tm Xm so that

b) From part (a),

Xm

and



⎤ t1 . t = ⎣ .. ⎦ . tm

1 2 1 ′2 MX1 ,...,Xm (t1 , . . . , tm ) = E et1 X1 +···+tm Xm = E eX t = MX (t).

1 1 ′ ′2 1 ′ ′ 2 2 1 ′ ′ ′2 ′ ′ ′ ′ Ma+BX (t) = E e(a+BX) t = E ea t+X B t = ea t E eX B t = ea t E eX (B t) = ea t MX (B′ t).


12.48 a) We use matrix notation. The Jacobian determinant of the transformation z = !−1/2 (x − µ) is J (x) = det !−1/2 = (det !)−1/2 .

Solving z = !−1/2 (x − µ) for x, we get the inverse transformation x = µ + !1/2 z. Now, in terms of the inverse transformation, 1! 1! 2′ 2 ! " " "′ ! " (x − µ)′ !−1 (x − µ) = µ + !1/2 z − µ !−1 µ + !1/2 z − µ = !1/2 z !−1 !1/2 z ! "′ = z′ !1/2 !−1 !1/2 z = z′ !1/2 !−1 !1/2 z = z′ z.

Hence, by the multivariate transformation theorem, fZ (z) =

1 1 1 1 1 ′ −1 ′ fX (x) = e− 2 (x−µ) ! (x−µ) = e−z z , −1/2 m/2 1/2 m/2 |J (x)| (2π) (det !) (2π) (det !)

which, as we have seen, is the joint PDF of m independent standard normal random variables. Consequently, Z1 , . . . , Zm are independent standard normal random variables. −1/2 µ and B = !−1/2 . Note that B is a symmetric matrix so b) We can write Z = a! + BX, where "′ a = −! ′ ′ −1/2 ′ that B = B and a = −! µ = −µ B. From Exercise 12.47(b) and Proposition 12.4 on page 716, ′







1



MZ (t) = Ma+BX (t) = ea t MX (B′ t) = e−µ Bt MX (Bt) = e−µ Bt eµ (Bt)+ 2 (Bt) !(Bt) 1 ′ ′

1 ′

1 ′

−1/2 !!−1/2 t

= e 2 t B !Bt = e 2 t B!Bt = e 2 t !

1 ′

= e 2 t t.

Referring now to Lemma 12.1 on page 716, we conclude, in view of the uniqueness property of MGFs, that Z1 , . . . , Zm are independent standard normal random variables. c) We can write Z = a + BX, where a = −!−1/2 µ and B = !−1/2 . Applying Proposition 12.7, we conclude that Z has the multivariate normal distribution with parameters a + Bµ = −!−1/2 µ + !−1/2 µ = 0

and

Therefore, the joint MGF of Z1 , . . . , Zm is



1 ′

B!B′ = !−1/2 !!−1/2 = I. 1 ′

M(Z)t = e0 t+ 2 t It = e 2 t t .

Referring to Lemma 12.1, we see that this MGF is that of independent standard normal random variables. Hence, by the uniqueness property of MGFs, the random variables Z1 , . . . , Zm are independent standard normal. 12.49 In the univariate case, Z = [Z], ! = [σ 2 ], X = [X], and µ = [µ]. Hence, in that case, Equation (12.41) becomes ! "−1/2 X−µ Z = σ2 (X − µ) = . σ Consequently, Equation (12.41) is the multivariate analogue of standardizing. 12.50 Let Z = !−1/2 (X − µ). From Exercise 12.48, we know that Z1 , . . . , Zm are independent standard normal random variables. Furthermore, because !−1/2 is symmetric, 1 2′ 1 2 !−1/2 (X − µ) (X − µ)′ !−1 (X − µ) = (X − µ)′ !−1/2 !−1/2 (X − µ) = !−1/2 (X − µ) 2 . = Z′ Z = Z12 + · · · + Zm

Referring now to Proposition 8.13 on page 466 and the second bulleted item on page 537, we conclude that (X − µ)′ !−1 (X − µ) has the chi-square distribution with m degrees of freedom.
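The following Monte Carlo sketch is not part of the original solution; it assumes NumPy and SciPy are available and borrows, purely for illustration, the mean vector and covariance matrix of Exercise 12.43(b). It checks that the quadratic form has approximately a chi-square distribution with m degrees of freedom.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu = np.array([-1.0, 0.0, 5.0])
Sigma = np.array([[10.0, -5.0,  2.0],
                  [-5.0, 16.0, -3.0],
                  [ 2.0, -3.0,  4.0]])
m, n = len(mu), 100_000

X = rng.multivariate_normal(mu, Sigma, size=n)
D = X - mu
Q = np.einsum("ij,jk,ik->i", D, np.linalg.inv(Sigma), D)   # one quadratic form per sample

print("mean of Q:", Q.mean(), " (chi-square mean m =", m, ")")
print("var  of Q:", Q.var(ddof=1), " (chi-square variance 2m =", 2 * m, ")")
print("KS distance to chi2(m):", stats.kstest(Q, "chi2", args=(m,)).statistic)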


12.51 Choose an orthogonal matrix B such that B!X B′ is a diagonal matrix, say, D. Let Y = BX. From Proposition 12.7 on page 719, we have ! " Y ∼ Nm Bµ, B!x B′ = Nm (Bµ, D) . ! " Now, let U1 , . . . , Um be independent normal random variables with Uj ∼ N [Bµ]j , [D]jj . Then, because D is a diagonal matrix, it follows from Equation (12.30) on page 713 that Y and U have the same mean vector and covariance matrix. Because the joint MGF of multivariate normal random variables is determined by the mean vector and covariance matrix, we conclude that Y and U have the same joint MGF and, hence, the same joint distribution. Thus, in particular, Y1 , . . . , Ym are independent random variables. 12.52 Because (BB′ )′ = (B′ )′ B′ = BB′ , we see that BB′ is symmetric. Suppose that B is an m × n matrix. For x ∈ Rm , n # [x′ B]j2 ≥ 0. x′ (BB′ )x = (x′ B)(B′ x) = (x′ B)(x′ B)′ = j =1

Hence, BB′ is nonnegative definite.

12.53 a) Let a1 , . . . , am be real numbers. Then $ % $ % m m n m n m # # # # # # ai Xi = ai µi + bij Zj = ai µi + ai bij Zj . i=1

j =1

i=1

i=1

,m

j =1

i=1

Referring to Proposition 9.14 on page 540, we conclude that i=1 ai Xi is normally distributed. Applying now Proposition 12.6 on page 718, we see that X1 , . . . , Xm are multivariate normal random variables. b) We have $ % n n # # ! " bij Zj = µi + bij E Zj = µi . E(Xi ) = E µi + j =1

j =1

Furthermore, !

"

$

Cov Xi , Xj = Cov µi + Letting

n #

p=1

bip Zp , µj +

⎤ µ1 µ = ⎣ ... ⎦ ⎡

n #

bj q Zq

q=1

%

=



n n # #

p=1 q=1

b11 . B = ⎣ ..

and

µm

bm1

!

"

bip bj q Cov Zp , Zq =

n #

bip bjp .

p=1

⎤ . . . b1n .. ⎦ .. , . . . . . bmn

we see that µX = µ and !X = BB′ . c) Referring to part (b) and Proposition 12.5 on page 717, we see that a necessary and sufficient condition for X1 , . . . , Xm to be singular is that BB′ be a singular matrix. We note in passing that, if m = n, then BB′ is singular if and only if B is singular. Consequently, when m = n, X1 , . . . , Xm are singular if and only if B is singular. 12.54 Let ti be the m × 1 vector whose ith entry is t and has all other entries 0. Applying the multivariate analogue of Proposition 11.7 on page 645 and referring to Definition 12.5 on page 717, we get ′

1 ′

1

2

MXi (t) = MX (ti ) = eµ ti + 2 ti !ti = e[µ]i t+ 2 [!]ii t .


Therefore, Xi ∼ N ([µ]i , [!]ii ) and, from this result, we also conclude that µX = µ. Now, let 1 Q(t) = µ′ t + t′ !t, 2 so that MX (t) = eQ(t) . We have Q(t) =

m # k=1

[µ]k tk +

m m # 1# [!]kℓ tk tℓ . 2 k=1 ℓ=1

Using the symmetry of !, we find that m # ∂Q (t) = [µ]j + [!]kj tk ∂tj k=1

and

∂ 2Q (t) = [!]ij . ∂ti ∂tj

Furthermore, ∂MX ∂Q (t) = MX (t) (t) ∂tj ∂tj and, hence, ∂ 2Q ∂Q ∂Q ∂ 2 MX (t) = MX (t) (t) + MX (t) (t) (t). ∂ti ∂tj ∂ti ∂tj ∂ti ∂tj Applying now the multivariate analogue of Proposition 11.6 on page 643, we find that " ∂ 2 MX ! ∂ 2Q ∂Q ∂Q (0) + MX (0) (0) (0) (0) = MX (0) E Xi Xj = ∂ti ∂tj ∂ti ∂tj ∂ti ∂tj ! " = [!]ij + [µ]i [µ]j = [!]ij + E(Xi ) E Xj . " ! Consequently, Cov Xi , Xj = [!]ij , which means that !X = !. 12.55 Referring to Proposition 12.8 on page 719, we get the following results:

X1 ∼ N (2, 3), X2 ∼ N (1, 6), X3 ∼ N (0, 6), X4 ∼ N (−1, 2), &> ? > ?' &> ? > ?' 2 3 0 2 3 −1 , , (X1 , X3 ) ∼ N2 , , (X1 , X2 ) ∼ N2 1 0 6 0 −1 6 &> ? > ?' &> ? > ?' 2 3 0 1 6 3 (X1 , X4 ) ∼ N2 , , (X2 , X3 ) ∼ N2 , −1 0 2 0 3 6 &> ? > ?' &> ? > ?' 1 6 2 0 6 1 (X2 , X4 ) ∼ N2 , , (X3 , X4 ) ∼ N2 , −1 2 2 −1 1 2

$@ A @ A% $@ A @ 2 3 0 −1 2 3 (X1 , X2 , X3 ) ∼ N3 , (X1 , X2 , X4 ) ∼ N3 1 , 0 6 3 1 , 0 0 −1 3 6 −1 0 $@ A @ A% $@ A @ 2 3 −1 0 1 6 (X1 , X3 , X4 ) ∼ N3 , (X2 , X3 , X4 ) ∼ N3 0 , −1 6 1 0 , 3 −1 0 1 2 −1 2

0 0 6 2 2 2

A%

3 2 6 1 1 2

A%


Advanced Exercises 12.56 Let XP denote the column vector consisting of Xi1 , . . . , Xim and let B be the matrix obtained by permuting the rows of the m × m identity matrix according to the permutation P. Then XP = BX. a) From Proposition 12.7 on page 719, we know that XP = BX ∼ N m (Bµ, B!B′ ).

From linear algebra, Bµ is the vector obtained by permuting the rows of µ according to P, and B!B′ is the matrix obtained by permuting both the rows and columns of ! according to P. Thus, Bµ = µ∗ and B!B′ = !∗ and, hence, XP ∼ N m (µ∗ , !∗ ). b) We note that det B = ±1 and, hence, B is nonsingular. Now, from part (a), !∗ = B!B′ and, consequently, det !∗ = det(B!B′ ) = (det B)(det !)(det B′ ) = (det B)(det !)(det B) = (±1)2 det ! = det !.

Because X1 , . . . , Xm are nonsingular, det ! ̸= 0 and, hence, det !∗ ̸= 0; thus, !∗ is nonsingular. It now follows from Proposition 12.5 on page 717 that Xi1 , . . . , Xim are nonsingular multivariate random variables. 12.57 a) Write t′ = [t1′ , t2′ ], where t1 is j × 1 and t2 is k × 1. Now, and

µ′ t = µ′1 t1 + µ′2 t2

t′ !t = t1′ !11 t1 + t2′ !21 t1 + t1′ !12 t2 + t2′ !22 t2 = t1′ !11 t1 + 2t1′ !12 t2 + t2′ !22 t2 ,

where, in the last equality, we have used the fact that !′21 = !12 . Hence, ′

1 ′



1 ′





1 ′

MX (t) = eµ t+ 2 t !t = eµ1 t1 + 2 t1 !11 t1 et1 !12 t2 eµ2 t2 + 2 t2 !22 t2 .

Consequently, from Exercise 11.30,

1 ′



MX1 (t1 ) = MX (t1 , 0) = eµ1 t1 + 2 t1 !11 t1 .

It now follows from the uniqueness property of MGFs that X1 ∼ N j (µ1 , !11 ). b) Let j be a positive integer less than m. Consider a subcollection of X1 , . . . , Xm consisting of j random variables. Call the corresponding random vector Xj . To determine the (marginal) distribution of Xj , we first permute the components of X so that Xj comes first, applying as well the corresponding permutation to the mean vector and covariance matrix of X. From Exercise 12.56(a), the resulting random vector, Y, is m-dimensional multivariate normal. Next, we partition Y (and µY and !Y ) so that Y1 = Xj . Then, from part (a), Xj ∼ N j (µ1 , !11 ), where µ1 and !11 are the mean vector and covariance matrix, respectively, of Y1 . c) From part (a), 1 ′ 1 ′ ′ ′ ′ MX (t) = eµ1 t1 + 2 t1 !11 t1 et1 !12 t2 eµ2 t2 + 2 t2 !22 t2 1 ′



1 ′



and MX1 (t1 ) = eµ1 t1 + 2 t1 !11 t1 . Likewise, MX2 (t2 ) = eµ2 t2 + 2 t2 !22 t2 . Now, from the multivariate analogue of Propostion 11.9 on page 646, X1 and X2 are independent if and only if MX (t) = MX1 (t1 )MX2 (t2 ), which is the case if and only if ′

1 ′





1 ′



1 ′



1 ′

eµ1 t1 + 2 t1 !11 t1 et1 !12 t2 eµ2 t2 + 2 t2 !22 t2 = eµ1 t1 + 2 t1 !11 t1 eµ2 t2 + 2 t2 !22 t2 .

Hence, we see that X1 and X2 are independent if and only if t1′ !12 t2 = 0 for all t1 and t2 , which is true if and only if !12 = 0.


12.58 ′ ′ ′ a) Let U = X1 − !12 !−1 22 X2 , V = X2 , and W = [U , V ]. Then,

? > ? U X1 = BX, W= =B V X2 >

where

>

? Ij −!12 !−1 22 B= . 0 Ik ! " From Proposition 12.7 on page 719, W ∼ N m Bµ, B!B′ . However, >

I Bµ = j 0

−!12 !−1 22 Ik

?>

µ1 µ2

?

>

µ1 − !12 !−1 22 µ2 = µ2

?

and >

I B!B = j 0 ′

−!12 !−1 22 Ik

?>

!11 !21

!12 !22

?>

Ij −!−1 22 !21

0 Ik

?

>

!11 − !12 !−1 22 !21 = 0

? 0 , !22

where we have used the fact that !′12 = !21 . Referring to Exercise 12.57, we see that U and V (= X2 ) are independent random variables and that 1 2 −1 U ∼ N j µ1 − !12 !−1 µ , ! − ! ! ! 11 12 22 21 . 22 2 It now follows that

1 2 −1 −1 X1|X2 =x2 = !12 !−1 X + U |X2 =x2 = !12 !22 x2 + U = !12 !22 x2 + Ij U. 22 2

Applying Proposition 12.7 again, we conclude that

1 ! " ! " ′2 −1 −1 X1|X2 =x2 ∼ N j !12 !−1 x + I µ − ! ! µ , I ! − ! ! ! Ij 2 j 1 12 2 j 11 12 21 22 22 22 1 2 −1 . = N j µ1 + !12 !−1 (x − µ ), ! − ! ! ! 2 2 11 12 21 22 22

b) Let j and k be positive integers with j + k ≤ m. Consider a subcollection of X1 , . . . , Xm consisting of j random variables and another (mutually exclusive) subcollection of X1 , . . . , Xm consisting of k random variables. Call the corresponding random vectors Xj and Xk , respectively. To determine the conditional distribution of Xj given Xk = xk , we first permute the components of X so that Xj comes first, Xk comes second, and the remaining variables come third, applying as well the corresponding permutation to the mean vector and covariance matrix of X. From Exercise 12.56(a), the resulting random vector, Y, is m-dimensional multivariate normal. Next, we consider the random vector, W, consisting of the first j + k components of Y, which, from Exercise 12.57(a) is (j + k)-dimensional multivariate normal. We can then apply part (a) to obtain the conditional distribution of Xj given Xk = xk . c) In the bivariate normal case, > ? > ? > ? ρσX σY σX2 X µX X= , µ= , and != . Y µY ρσX σY σY2


Then, from part (a), 1 2 ! "−1 ! "−1 X|Y =y ∼ N 1 µX + ρσX σY σY2 (y − µY ), σX2 − ρσX σY σY2 ρσX σY & ' ! " σX 2 2 = N µX + ρ (y − µY ), σX 1 − ρ , σY

which is Proposition 10.10(e).

12.59 a) Here we have the following partition: ⎤ ⎤ ⎡ ⎡ [2] [ X1 ] ⎢@ 1A⎥ ⎢@X A⎥ 2 ⎥, ⎥, ⎢ µ = X=⎢ ⎣ ⎣ X ⎦ 0 ⎦ 3 −1 X4 Simple calculations show that

!−1 22

1 = 36

@

Therefore, µ1 + !12 !−1 22 (x2

1 − µ2 ) = 2 + [0 36 =2+ =

1 [4 36

and

11 −4 −9

A −9 0 . 27

−4 8 0 @

A@ A 11 −4 −9 x2 − 1 −1 0 ] −4 8 0 x3 − 0 −9 0 27 x4 + 1 @ A x2 − 1 1 2 −8 0 ] x3 − 0 = 2 + (x2 − 1) − x3 9 9 x4 + 1

2 17 1 + x2 − x3 9 9 9

and !11 − !12 !−1 22 !21

⎤ [3] [ 0 −1 0 ] ⎢@ 0A @6 3 2A⎥ ⎥. !=⎢ ⎣ −1 3 6 1 ⎦ 0 2 1 2 ⎡

1 =3− [0 36

−1

Consequently, from Exercise 12.58(a),

11 −4 −9 0 ] −4 8 0 −9 0 27

X1|X2 =x2 ,X3 =x3 ,X4 =x4 ∼ N b) Here we have the following partition: ⎡> ⎡ > ? ⎤ ?⎤ X1 2 ⎢ X ⎥ ⎢ 1 ⎥ 2 ⎥ ⎢ ⎢ ⎥ ?⎥, ?⎥, X = ⎢> µ = ⎢> ⎣ X3 ⎦ ⎣ 0 ⎦ X4 −1 Simple calculations show that

!−1 22

@

&

A@

0 −1 0

A

=3−

2 25 = . 9 9

' 17 1 2 25 + x2 − x3 , . 9 9 9 9

and

> 1 2 = 11 −1

⎡ >

3 0

⎢ ⎢ ! = ⎢> ⎣ −1 0

? −1 . 6

0 6

?

3 2

?

?⎤ −1 0 3 2 ⎥ ⎥ > ? ⎥. 6 1 ⎦ 1 2

>


Therefore, > ? ? > ?> ?> 1 −1 0 2 2 −1 x3 − 0 + 1 x4 + 1 3 2 −1 6 11 > ? ? > ? > ?> > ? 1 −2 1 1 −2x3 + x4 + 1 2 2 x3 − 0 = + = + 1 x4 + 1 1 4 9 11 11 4x3 + 9x4 + 9 @ 23 A 1 2 11 − 11 x3 + 11 x4 = 20 . 4 9 + x + x 3 4 11 11 11

µ1 + !12 !−1 22 (x2 − µ2 ) =

and !11 − !12 !−1 22 !21

>

3 = 0 > 3 = 0 @ 31

=

11 4 11

? > ?> ?> ? 1 −1 0 0 2 −1 −1 3 − 6 3 2 −1 6 0 2 11 > ? > ?> ? > ? 1 −2 1 1 2 0 −1 3 3 0 − = − 6 −4 4 9 0 2 0 6 11 11 A 4 11 36 11

−4 30

?

.

Consequently, from Exercise 12.58(a), $@ 23 11 20 11

(X1 , X2 )|X3 =x3 ,X4 =x4 ∼ N 2

c) Here we have the following partition: ⎡@ ⎡ @ A⎤ A⎤ 2 X1 ⎢ X2 ⎥ ⎢ 1 ⎥ ⎥ ⎥ X=⎢ µ=⎢ ⎣ X3 ⎦ , ⎣ 0 ⎦, [ X4 ] [ −1 ]

− +

2 11 x3 4 11 x3

+ +

1 11 x4 9 11 x4

A @ 31 ,

⎡@

3 ⎢ 0 !=⎢ ⎣ −1 [0

and

11 4 11

4 11 36 11

A%

0 −1 6 3 3 6 2 1]

A

.

@ A⎤ 0 2 ⎥ ⎥ 1 ⎦. [2]

We have !−1 22 = 1/2. Therefore, ⎤ @ A @ A @ A @ A ⎡ 2 0 2 0 2 1 1 ⎣ 2 + x4 ⎦ µ1 + !12 !−1 22 (x2 − µ2 ) = 1 + 2 2 [ x4 + 1 ] = 1 + 2 2(x4 + 1) = 1 1 0 1 0 x4 + 1 2 + 2 x4

and

!11 − !12 !−1 22 !21

=

@

=

@

3 0 −1 0 6 3 −1 3 6 3 0 0 4 −1 2

A

1 − 2

@ A 0 2 [0 1

2

1] =

@

A −1 2 . 11/2

3 0 −1

0 −1 6 3 3 6

A

Consequently, from Exercise 12.58(a), (X1 , X2 , X3 )|X4 =x4

⎛⎡

⎤ @ 2 3 ∼ N 2 ⎝⎣ 2 + x 4 ⎦ , 0 1 1 −1 2 + 2 x4

0 4 2

A⎞ −1 2 ⎠. 11/2

1 − 2

@

0 0 0

0 4 2

0 2 1

A


12.4 Sampling Distributions
Basic Exercises
12.60 Noting that T is a symmetric random variable (i.e., f_T is an even function), we find that, for t > 0,
F_{T^2}(t) = P(T^2 ≤ t) = P(−√t ≤ T ≤ √t) = 2F_T(√t) − 1.
Therefore,
f_{T^2}(t) = 2f_T(√t) · 1/(2√t) = t^{−1/2} (Γ((ν + 1)/2)/(√(νπ) Γ(ν/2))) (1 + t/ν)^{−(ν+1)/2} = (Γ((1 + ν)/2)/(Γ(1/2)Γ(ν/2))) · (1/ν)^{1/2} t^{1/2−1}/(1 + (1/ν)t)^{(1+ν)/2}.

Referring now to Equation (12.54) on page 727, we see that T 2 has the F -distribution with degrees of freedom 1 and ν. 12.61 Let χ 2 = (n − 1)Sn2 /σ 2 . From Proposition 12.11 on page 725,!we know that χ"2 has the chi-square distribution with n − 1 degrees of freedom or, equivalently, χ 2 ∼ " (n − 1)/2, 1/2 . Consequently, a) We have

! " (n − 1)/2 E χ2 = =n−1 1/2 E

!

Sn2

"

&

σ2 2 =E χ n−1

'

and

=

! " (n − 1)/2 Var χ 2 = = 2(n − 1). (1/2)2

σ 2 ! 2" σ2 E χ = · (n − 1) = σ 2 . n−1 n−1

Therefore, Sn2 is an unbiased estimator of σ 2 . b) We have ' & 2 ! 2" ! 2" σ4 2σ 4 σ σ4 2 Var Sn = Var χ = Var χ · 2(n − 1) = . = n−1 n−1 (n − 1)2 (n − 1)2

From part (a) and Chebyshev’s inequality, for each ϵ > 0, ! 2" 8 18 2 8 8 ! ! " " Var Sn 2σ 4 8 2 8 P 8Sn − σ 2 8 ≥ ϵ = P 8Sn2 − E Sn2 8 ≥ ϵ ≤ = , ϵ2 (n − 1)ϵ 2

which converges to 0 as n → ∞. Consequently, from the complementation rule, for each ϵ > 0, 8 8 2 2 18 18 8 8 8 8 lim P 8Sn2 − σ 2 8 < ϵ = 1 − lim P 8Sn2 − σ 2 8 ≥ ϵ = 1 − 0 = 1. n→∞

n→∞

In other words, Sn2 is a consistent estimator of σ 2 .

12.62 6 ! " √ a) Referring to Exercise 12.61, we can write Sn = σ χ 2 / n − 1, where χ 2 ∼ " (n − 1)/2, 1/2 . Now, suppose that Y ∼ " (α, λ). Then, from the FEF, 4 ∞ 1√ 2 4 ∞ √ λα λα α−1 −λy y y e dy = y α+1/2−1 e−λy dy E Y = "(α) "(α) 0 0 =

λα "(α + 1/2) 1 "(α + 1/2) . =√ α+1/2 "(α) λ "(α) λ


Hence, &

σ

6

χ2

'

σ

E(Sn ) = E √ =√ E n−1 n−1 % $M 2 "(n/2) ! " σ. = n − 1 " (n − 1)/2

16

χ2

2

! " 1 " (n − 1)/2 + 1/2 ! " =√ √ " (n − 1)/2 n − 1 1/2 σ

b) Here we apply Proposition 7.7 ! on " page 354 (see also Exercise 7.81). From the solution to Exercise 12.61(b), we know that Var Sn2 > 0. Hence, Sn2 is not a constant random variable, which implies that Sn is not a constant random variable, which, in turn, implies that Var(Sn ) > 0. Referring to Exercise 12.61(a), we conclude that ! " ! "2 ! "2 0 < Var(Sn ) = E Sn2 − E(Sn ) = σ 2 − E(Sn ) , which shows that E(Sn ) < σ . In particular, then, Sn is not an unbiased estimator of σ .

12.63
a) From Proposition 12.12 on page 726,
Tn = (X̄n − µ)/(Sn/√n)
has the Student's t-distribution with n − 1 degrees of freedom. Using the fact that a t-distribution is continuous and symmetric, we conclude that
P(−t_{(1+γ)/2,n−1} ≤ (X̄n − µ)/(Sn/√n) ≤ t_{(1+γ)/2,n−1}) = P(−t_{(1+γ)/2,n−1} ≤ Tn ≤ t_{(1+γ)/2,n−1}) = F_{Tn}(t_{(1+γ)/2,n−1}) − F_{Tn}(−t_{(1+γ)/2,n−1}) = 2F_{Tn}(t_{(1+γ)/2,n−1}) − 1 = 2·(1 + γ)/2 − 1 = γ.
Solving for µ in the inequalities in the first term in the preceding display yields
P(X̄n − t_{(1+γ)/2,n−1}·Sn/√n ≤ µ ≤ X̄n + t_{(1+γ)/2,n−1}·Sn/√n) = γ.
b) From part (a), chances are 100γ% that the random interval from
X̄n − t_{(1+γ)/2,n−1}·Sn/√n to X̄n + t_{(1+γ)/2,n−1}·Sn/√n
contains µ. Hence, we can be 100γ% confident that any such computed interval contains µ. In other words, the interval from
x̄n − t_{(1+γ)/2,n−1}·sn/√n to x̄n + t_{(1+γ)/2,n−1}·sn/√n,

where x¯n and sn are the observed values of X¯ n and Sn , respectively, is a 100γ % confidence interval for µ.
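The following short sketch, not part of the original solution, evaluates this t-interval for the data of the next exercise (n = 16, x̄ = 86.2, s = 8.5, γ = 0.95). It assumes SciPy is available for the t quantile.

from math import sqrt
from scipy import stats

n, xbar, s, gamma = 16, 86.2, 8.5, 0.95
t_crit = stats.t.ppf((1 + gamma) / 2, df=n - 1)   # about 2.13
half = t_crit * s / sqrt(n)
print(f"t_crit = {t_crit:.3f}")
print(f"{gamma:.0%} CI for mu: ({xbar - half:.1f}, {xbar + half:.1f})")   # about (81.7, 90.7)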

12.64 a) We have γ = 0.95 and n = 16, so that (1 + γ)/2 = 0.975 and n − 1 = 15. From the note, then, we have t_{(1+γ)/2,n−1} = t_{0.975,15} = 2.13. We also know that x̄n = x̄16 = 86.2 and sn = s16 = 8.5. Referring now to Exercise 12.63(b), we conclude that a 95% confidence interval for µ is from 86.2 − 2.13·8.5/√16 to 86.2 + 2.13·8.5/√16, or from 81.7 mm Hg to 90.7 mm Hg.


b) We can be 95% confident that the mean arterial blood pressure for all children of diabetic mothers is somewhere between 81.7 mm Hg and 90.7 mm Hg. c) From Proposition 12.12 on page 726, we must assume that arterial blood pressure for children of diabetic mothers is normally distributed. Technically, for the independence assumption in Proposition 12.12, we must, in addition, assume that the sampling is with replacement. However, because the sample size is small relative to the population size, sampling without replacement is also acceptable. 12.65 a) We assume that the current price for bananas is normally distributed. Technically, for the independence assumption in Proposition 12.12 on page 726, we must, in addition, assume that the sampling is with replacement. However, because the sample size is small relative to the population size, sampling without replacement is also acceptable. Here n = 15. If the mean retail price of bananas hasn’t changed from that in 1998, then µ = 51.0 and, from Proposition 12.12, T15 =

X¯ 15 − 51.0 √ S15 / 15

has the Student’s t-distribution with 14 degrees of freedom. In this case, the observed value of T15 is t15 =

x¯15 − 51.0 53.4 − 51.0 = = 2.656. √ √ s15 / 15 3.5/ 15

Hence, using the symmetry of a Student’s t-distribution, we get ! " ! " P (|T15 | ≥ |t15 |) = P (|T15 | ≥ 2.656) = 2 1 − P (T15 ≤ 2.656) = 2 1 − F (2.656) ,

where F is the CDF of the Student’s t-distribution with 14 degrees of freedom. b) Yes, we can reasonably conclude that the current mean retail price of bananas is different from the 1998 mean of 51.0 cents per pound. Indeed, there is less than a 2% chance of observing a value of the one-sample-t statistic at least as large in magnitude as that actually observed if the current mean retail price of bananas has not changed from that in 1998. 12.66 a) From Proposition 12.13 on page 728, Fn1 ,n2 =

Sn21 /σ12 Sn22 /σ22

has the F -distribution with degrees of freedom n1 − 1 and n2 − 1. Using the fact that an F -distribution is continuous, we conclude that % $ Sn21 /σ12 ≤ f(1+γ )/2,n1 −1,n2 −1 P f(1−γ )/2,n1 −1,n2 −1 ≤ Sn22 /σ22 ! " = P f(1−γ )/2,n1 −1,n2 −1 ≤ Fn1 ,n2 ≤ f(1+γ )/2,n1 −1,n2 −1 ! " ! " = FFn1 ,n2 f(1+γ )/2,n1 −1,n2 −1 − FFn1 ,n2 f(1−γ )/2,n1 −1,n2 −1 =

1+γ 1−γ − = γ. 2 2

Solving for σ12 /σ22 in the inequalities in the first term in the preceding display yields % $ Sn21 /Sn22 Sn21 /Sn22 σ12 ≤ 2 ≤ = γ. P f(1+γ )/2,n1 −1,n2 −1 f(1−γ )/2,n1 −1,n2 −1 σ2


b) From part (a), chances are 100γ % that the random interval from Sn21 /Sn22 f(1+γ )/2,n1 −1,n2 −1

to

Sn21 /Sn22 f(1−γ )/2,n1 −1,n2 −1

contains σ12 /σ22 . Hence, we can be 100γ % confident that any such computed interval contains σ12 /σ22 . In other words, the interval from sn21 /sn22 f(1+γ )/2,n1 −1,n2 −1

to

sn21 /sn22 f(1−γ )/2,n1 −1,n2 −1

where sn21 and sn22 are the observed values of Sn21 and Sn22 , respectively, is a 100γ % confidence interval for σ12 /σ22 . 12.67 a) We have γ = 0.95 so that (1 − γ )/2 = 0.025 and (1 + γ )/2 = 0.975. Furthermore, n1 = n2 = 10, so that, from the note, f(1−γ )/2,n1 −1,n2 −1 = f0.025,9,9 = 0.248

and

f(1+γ )/2,n1 −1,n2 −1 = f0.975,9,9 = 4.026.

2 = (128.3)2 and s 2 = s 2 = (199.7)2 . Referring now to Exercise 12.66(b), We also know that sn21 = s10 n2 10 we conclude that a 95% confidence interval for σ12 /σ22 is from

(128.3/199.7)2 4.026

to

(128.3/199.7)2 , 0.248

or from 0.10 to 1.66. b) We can be 95% confident that the ratio of the variances of tear strength for Brand A and Brand B vinyl floor coverings is somewhere between 0.10 and 1.66. c) We assume that tear strengths for both Brand A and Brand B vinyl floor coverings are normally distributed. 12.68 3∞ ! "−(ν+1)/2 dt < ∞. a) From the FEF, we know that T has finite expectation if and only if −∞ |t| 1 + t 2 /ν 2 Using symmetry and making the substitution u = 1 + t /ν, we find that 4 ∞ 4 ∞ 4 ∞ |t| t du dt = 2 dt = ν . ! "(ν+1)/2 ! "(ν+1)/2 (ν+1)/2 2 2 u −∞ 1 + t /ν 0 1 1 + t /ν

From calculus, we know that the integral on the right of the preceding display converges (i.e., is finite) if and only if (ν + 1)/2 > 1, that is, if and only if ν > 1. b) We note that the PDF of T is an even function (i.e., it is symmetric about 0). Consequently, by Exercise 10.20(c), when T has finite expectation, we have E(T ) = 0. 3∞ ! "−(ν+1)/2 dt < ∞. Using symmetry c) We note that T has finite variance if and only if −∞ t 2 1 + t 2 /ν 2 and making the substitution u = 1 + t /ν, we find that 4 ∞ 4 ∞ 4 ∞ √ t2 t2 u−1 3/2 du. "(ν+1)/2 dt = 2 "(ν+1)/2 dt = ν ! ! (ν+1)/2 u −∞ 1 + t 2 /ν 0 1 1 + t 2 /ν Noting that the integrand on3 the right of the preceding display is asymptotic to u−ν/2 as u → ∞ and ∞ recalling from calculus that 1 u−p du converges (i.e., is finite) if and only if p > 1, we see that T has finite variance if and only if ν/2 > 1, that is, if and only if ν > 2.


d) Using integration by parts, we find that 4 ∞ 4 ∞ t2 ν dt "(ν+1)/2 dt = "(ν−1)/2 . ! ! ν−1 0 0 1 + t 2 /ν 1 + t 2 /ν √ Making the substitution u = t (ν − 2)/ν and using the fact that a t PDF with ν − 2 degrees of freedom integrates to 1, we get 4

∞ 0

dt

=

! "(ν−1)/2 1 + t 2 /ν

M 1 2

ν ν−2

M

4

0

4

du "!(ν−2)+1"/2 ! 2 1 + u /(ν − 2)



du " "! ! −∞ 1 + u2 /(ν − 2) (ν−2)+1 /2 ! " ! " √ M (ν − 2)π " (ν − 2)/2 " (ν − 2)/2 1 ν 1√ ! " ". = · = νπ ! 2 ν−2 2 " (ν − 1)/2 " (ν − 1)/2 =

ν ν−2



Recalling from part (b) that E(T ) = 0, we conclude that ! "4 ∞ ! 2 " " (ν + 1)/2 t2 Var(T ) = E T = √ dt " ! νπ "(ν/2) −∞ 1 + t 2 /ν (ν+1)/2 ! "4 ∞ " (ν + 1)/2 t2 =2√ "(ν+1)/2 dt ! νπ "(ν/2) 0 1 + t 2 /ν ! " ! " " (ν − 2)/2 " (ν + 1)/2 1√ ν " · νπ ! · =2√ νπ "(ν/2) ν − 1 2 " (ν − 1)/2 ! " ! " ! " (ν − 1)/2 " (ν − 1)/2 ν 1 " (ν − 2)/2 " ! "· ! " = 2! · (ν − 2)/2 " (ν − 2)/2 ν − 1 2 " (ν − 1)/2 =

ν . ν−2

12.69 "N! √ " ! a) From Proposition 12.9 on page 723, the random variable Z = X¯ n − µ σ/ n has the standard normal distribution and, hence, has mean 0 and variance 1. ! "N! √ " b) From Proposition 12.12 on page 726, the random variable Tn = X¯ n − µ s/ n has the Student’s t-distribution with n − 1 degrees of freedom. Referring to Exercise 12.68, we see that Tn has finite mean if and only if n ≥ 3, in which case, E(Tn ) = 0, and that Tn has finite variance if and only if n ≥ 4, in which case, n−1 n−1 Var(Tn ) = = . (n − 1) − 2 n−3

c) We see that the random variables in parts (a) and (b) both have mean 0 (when the mean of the latter exists). Futhermore, they asymptotically have the same variance because lim Var(Tn ) = lim

n→∞

n→∞

n−1 = 1 = Var(Z) . n−3


12.70 a) Answers will vary. b) Answers will vary. c) Answers will vary, but here is what we obtained:

d) The two histograms suggest that the distributions of both the standardized version of the sample mean and the studentized version of the sample mean are bell shaped and symmetric about 0. e) The two histograms suggest that the distribution of the studentized version of the sample mean has more spread (larger variance) than that of the standardized version of the sample mean. Intuitively, this difference is due to the fact that the variation in the possible values of the standardized version is due solely to the variation of sample means, whereas that of the studentized version is due to the variation of both sample means and sample standard deviations. Mathematically, the difference in variation follows from Exercise 12.69; specifically, Var(T4) = 3 > 1 = Var(Z). f) The histograms would have the same look if the simulations were carried out with a different normal distribution because the distributions of Z and T4 don't depend on µ and σ.
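A simulation sketch along the lines of this exercise, not part of the original solution, is given below. It assumes NumPy is available; the choice µ = 0, σ = 1 is arbitrary, in keeping with part (f).

import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 0.0, 1.0, 4, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)

z = (xbar - mu) / (sigma / np.sqrt(n))   # standardized version: standard normal
t = (xbar - mu) / (s / np.sqrt(n))       # studentized version: t with n - 1 = 3 df

print("var of standardized version:", z.var(ddof=1))   # about 1
print("var of studentized version :", t.var(ddof=1))   # about (n-1)/(n-3) = 3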

Theory Exercises
12.71
a) We have
\[
Y_1 = \bar X_n = \frac{1}{n}\sum_{k=1}^{n} X_k = \frac{1}{n}(X_1 + \cdots + X_n),
\]
\[
Y_2 = X_1 - \bar X_n = X_1 - \frac{1}{n}\sum_{k=1}^{n} X_k = \frac{1}{n}\bigl((n-1)X_1 - X_2 - \cdots - X_n\bigr),
\]
\[
Y_3 = X_2 - \bar X_n = X_2 - \frac{1}{n}\sum_{k=1}^{n} X_k = \frac{1}{n}\bigl(-X_1 + (n-1)X_2 - X_3 - \cdots - X_n\bigr),
\]
\[
\vdots
\]
\[
Y_n = X_{n-1} - \bar X_n = X_{n-1} - \frac{1}{n}\sum_{k=1}^{n} X_k = \frac{1}{n}\bigl(-X_1 - \cdots - X_{n-2} + (n-1)X_{n-1} - X_n\bigr),
\]
\[
Y_{n+1} = X_n - \bar X_n = X_n - \frac{1}{n}\sum_{k=1}^{n} X_k = \frac{1}{n}\bigl(-X_1 - \cdots - X_{n-1} + (n-1)X_n\bigr).
\]


Hence, we can write Y = BX, where B is the (n + 1) × n matrix given by
\[
B = \frac{1}{n}\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 & 1\\
n-1 & -1 & -1 & \cdots & -1 & -1\\
-1 & n-1 & -1 & \cdots & -1 & -1\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
-1 & -1 & -1 & \cdots & n-1 & -1\\
-1 & -1 & -1 & \cdots & -1 & n-1
\end{bmatrix}.
\]

b) As X₁, ..., Xₙ are independent random variables with common variance σ², they are multivariate normal random variables with Σ_X = σ²Iₙ, where Iₙ is the n × n identity matrix. Applying Proposition 12.7 on page 719, we conclude that Y has the multivariate normal distribution with covariance matrix BΣ_X B′ = σ²BB′. Next we note that the sum of the first row of B equals 1 and that the sums of the other rows of B equal 0. Hence, the first row of Σ_Y is [σ²/n, 0, 0, ..., 0]. We can thus partition Y and Σ_Y as follows:
\[
\mathbf{Y} = \left[\begin{array}{c} Y_1 \\ \hline Y_2 \\ \vdots \\ Y_{n+1} \end{array}\right]
\qquad\text{and}\qquad
\Sigma_{\mathbf{Y}} = \left[\begin{array}{c|c} \sigma^2/n & \mathbf{0} \\ \hline \mathbf{0} & \Sigma_{22} \end{array}\right],
\]
where Σ₂₂ is the covariance matrix of Y₂, ..., Y_{n+1}. Referring to Exercise 12.57(c), we conclude that Y₁ and Y₂, ..., Y_{n+1} are independent, that is, that X̄ₙ is independent of X₁ − X̄ₙ, ..., Xₙ − X̄ₙ.
c) From part (b) and the fact that Sₙ² is a function of X₁ − X̄ₙ, ..., Xₙ − X̄ₙ, we conclude that X̄ₙ and Sₙ² are independent random variables.

Advanced Exercises
12.72
a) Because S²_{n₁} and S²_{n₂} are both unbiased estimators of σ², we have
\[
E\bigl(cS_{n_1}^2 + (1-c)S_{n_2}^2\bigr) = cE\bigl(S_{n_1}^2\bigr) + (1-c)E\bigl(S_{n_2}^2\bigr) = c\sigma^2 + (1-c)\sigma^2 = \sigma^2 .
\]
Hence, cS²_{n₁} + (1 − c)S²_{n₂} is an unbiased estimator of σ².
b) From the solution to Exercise 12.61(b), we have
\[
\operatorname{Var}\bigl(S_{n_1}^2\bigr) = \frac{2\sigma^4}{n_1-1}
\qquad\text{and}\qquad
\operatorname{Var}\bigl(S_{n_2}^2\bigr) = \frac{2\sigma^4}{n_2-1} .
\]
Therefore, because the two (independent) sample variances are independent random variables,
\[
\operatorname{Var}\bigl(cS_{n_1}^2 + (1-c)S_{n_2}^2\bigr)
= c^2\operatorname{Var}\bigl(S_{n_1}^2\bigr) + (1-c)^2\operatorname{Var}\bigl(S_{n_2}^2\bigr)
= 2\sigma^4\left(\frac{c^2}{n_1-1} + \frac{(1-c)^2}{n_2-1}\right).
\]
Taking the derivative with respect to c of the term on the right of the preceding display, setting the result equal to 0, and solving for c, we find that c = (n₁ − 1)/(n₁ + n₂ − 2).
c) Referring to part (b), we see that, among the estimators of σ² of the form cS²_{n₁} + (1 − c)S²_{n₂}, the one with minimum variance is
\[
\frac{n_1-1}{n_1+n_2-2}\,S_{n_1}^2 + \left(1 - \frac{n_1-1}{n_1+n_2-2}\right)S_{n_2}^2
= \frac{(n_1-1)S_{n_1}^2 + (n_2-1)S_{n_2}^2}{n_1+n_2-2},
\]
which we denote S_p².
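The optimal weight found in part (b) can also be confirmed numerically by scanning the variance factor c²/(n₁ − 1) + (1 − c)²/(n₂ − 1) over a grid of weights. The sketch below is only a quick check; the sample sizes are arbitrary placeholders.

```python
import numpy as np

n1, n2 = 12, 20                                     # arbitrary illustrative sample sizes
c = np.linspace(0.0, 1.0, 100_001)
factor = c**2 / (n1 - 1) + (1 - c)**2 / (n2 - 1)    # Var(c*S1^2 + (1-c)*S2^2) / (2*sigma^4)

c_best = c[np.argmin(factor)]
print(c_best, (n1 - 1) / (n1 + n2 - 2))             # both approximately 0.3667
```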


12.73
a) We begin by noting that
\[
\frac{(n_1+n_2-2)S_p^2}{\sigma^2}
= \frac{(n_1-1)S_{n_1}^2}{\sigma^2} + \frac{(n_2-1)S_{n_2}^2}{\sigma^2}.
\]
From Proposition 12.11 on page 725, the two random variables in the sum on the right of the preceding display have the chi-square distributions with n₁ − 1 and n₂ − 1 degrees of freedom, respectively. Furthermore, because the random samples are independent, those two random variables are independent. Hence, from the second bulleted item on page 537, (n₁ + n₂ − 2)S_p²/σ² has the chi-square distribution with (n₁ − 1) + (n₂ − 1) = n₁ + n₂ − 2 degrees of freedom.
b) From previous results, we know that X̄_{n₁} ∼ N(µ₁, σ²/n₁) and X̄_{n₂} ∼ N(µ₂, σ²/n₂). Furthermore, because the random samples are independent, X̄_{n₁} and X̄_{n₂} are independent. Hence,
\[
E\bigl(\bar X_{n_1} - \bar X_{n_2}\bigr) = E\bigl(\bar X_{n_1}\bigr) - E\bigl(\bar X_{n_2}\bigr) = \mu_1 - \mu_2
\]
and
\[
\operatorname{Var}\bigl(\bar X_{n_1} - \bar X_{n_2}\bigr)
= \operatorname{Var}\bigl(\bar X_{n_1}\bigr) + \operatorname{Var}\bigl(\bar X_{n_2}\bigr)
= \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} = \sigma^2\bigl((1/n_1) + (1/n_2)\bigr).
\]
It follows that the random variable
\[
Z = \frac{\bigl(\bar X_{n_1} - \bar X_{n_2}\bigr) - (\mu_1 - \mu_2)}{\sigma\sqrt{(1/n_1) + (1/n_2)}}
\]
has the standard normal distribution. Now, let
\[
Y = \frac{(n_1+n_2-2)S_p^2}{\sigma^2}.
\]
From Proposition 12.10 on page 724, X̄_{n₁} and S²_{n₁} are independent, as are X̄_{n₂} and S²_{n₂}. Thus, because the random samples are independent, X̄_{n₁}, S²_{n₁}, X̄_{n₂}, and S²_{n₂} are independent random variables. Consequently, from Proposition 6.13 on page 297, Z and Y are independent. Referring now to part (a) and Exercise 9.164, we conclude that Z/√(Y/(n₁ + n₂ − 2)) has the Student's t-distribution with n₁ + n₂ − 2 degrees of freedom. However,
\[
\frac{Z}{\sqrt{Y/(n_1+n_2-2)}}
= \frac{\Bigl[\bigl(\bar X_{n_1} - \bar X_{n_2}\bigr) - (\mu_1 - \mu_2)\Bigr]\Big/\Bigl[\sigma\sqrt{(1/n_1) + (1/n_2)}\Bigr]}{\sqrt{S_p^2/\sigma^2}}
= \frac{\bigl(\bar X_{n_1} - \bar X_{n_2}\bigr) - (\mu_1 - \mu_2)}{S_p\sqrt{(1/n_1) + (1/n_2)}} = T.
\]

12.74 Using the fact that a t-distribution is continuous and symmetric and referring to Exercise 12.73, we conclude that
\[
P\left(-t_{(1+\gamma)/2,\,n_1+n_2-2} \le \frac{\bigl(\bar X_{n_1} - \bar X_{n_2}\bigr) - (\mu_1 - \mu_2)}{S_p\sqrt{(1/n_1) + (1/n_2)}} \le t_{(1+\gamma)/2,\,n_1+n_2-2}\right)
= P\bigl(-t_{(1+\gamma)/2,\,n_1+n_2-2} \le T \le t_{(1+\gamma)/2,\,n_1+n_2-2}\bigr)
\]
\[
= F_T\bigl(t_{(1+\gamma)/2,\,n_1+n_2-2}\bigr) - F_T\bigl(-t_{(1+\gamma)/2,\,n_1+n_2-2}\bigr)
= 2F_T\bigl(t_{(1+\gamma)/2,\,n_1+n_2-2}\bigr) - 1 = 2\cdot\frac{1+\gamma}{2} - 1 = \gamma.
\]


Solving for µ₁ − µ₂ in the inequalities in the first term in the preceding display yields
\[
P\Bigl(\bigl(\bar X_{n_1} - \bar X_{n_2}\bigr) - t_{(1+\gamma)/2,\,n_1+n_2-2}\cdot SE \;\le\; \mu_1 - \mu_2 \;\le\; \bigl(\bar X_{n_1} - \bar X_{n_2}\bigr) + t_{(1+\gamma)/2,\,n_1+n_2-2}\cdot SE\Bigr) = \gamma,
\]
where SE = S_p√((1/n₁) + (1/n₂)). Therefore, chances are 100γ% that the random interval with endpoints
\[
\bigl(\bar X_{n_1} - \bar X_{n_2}\bigr) \pm t_{(1+\gamma)/2,\,n_1+n_2-2}\cdot S_p\sqrt{(1/n_1) + (1/n_2)}
\]
contains µ₁ − µ₂. Hence, we can be 100γ% confident that any such computed interval contains µ₁ − µ₂. In other words, the interval with endpoints
\[
(\bar x_{n_1} - \bar x_{n_2}) \pm t_{(1+\gamma)/2,\,n_1+n_2-2}\cdot s_p\sqrt{(1/n_1) + (1/n_2)},
\]
where x̄_{n₁}, x̄_{n₂}, and s_p are the observed values of X̄_{n₁}, X̄_{n₂}, and S_p, respectively, is a 100γ% confidence interval for µ₁ − µ₂.
12.75
a) We assume that daily protein intakes of female vegetarians and female omnivores living in Taiwan are normally distributed with the same variance. Technically, we must, in addition, assume that the sampling is with replacement. However, because the sample sizes are small relative to the population sizes, sampling without replacement is also acceptable. Here n₁ = 51, n₂ = 53, x̄_{n₁} = 39.04, s_{n₁} = 18.82, x̄_{n₂} = 49.92, and s_{n₂} = 18.97. Hence,
\[
s_p = \sqrt{\frac{(n_1-1)s_{n_1}^2 + (n_2-1)s_{n_2}^2}{n_1+n_2-2}}
= \sqrt{\frac{50\cdot(18.82)^2 + 52\cdot(18.97)^2}{102}} = 18.90.
\]
Also, γ = 0.99 so that (1 + γ)/2 = 0.995. From the note, then, t_{(1+γ)/2, n₁+n₂−2} = t_{0.995, 102} = 2.625. Referring now to Exercise 12.74, we conclude that a 99% confidence interval for µ₁ − µ₂ has endpoints
\[
(39.04 - 49.92) \pm 2.625\cdot 18.90\sqrt{(1/51) + (1/53)},
\]
or −20.61 g to −1.15 g.
b) From part (a), we can be 99% confident that the mean daily protein intake of female omnivores living in Taiwan exceeds that of female vegetarians living in Taiwan by somewhere between 1.15 g and 20.61 g. Thus, we can be quite confident that the mean daily protein intake of female omnivores living in Taiwan exceeds that of female vegetarians living in Taiwan.
12.76
a) We assume that both salary distributions are normally distributed with the same variance. Technically, we must, in addition, assume that the sampling is with replacement. However, because the sample sizes are small relative to the population sizes, sampling without replacement is also acceptable. For the samples obtained,
\[
s_p = \sqrt{\frac{(n_1-1)s_{n_1}^2 + (n_2-1)s_{n_2}^2}{n_1+n_2-2}}
= \sqrt{\frac{29\cdot(23.95)^2 + 34\cdot(22.26)^2}{63}} = 23.05.
\]
If, in fact, the two salary distributions have the same mean (i.e., µ₁ = µ₂), then the value of the pooled-t statistic is
\[
t = \frac{(\bar x_{n_1} - \bar x_{n_2}) - (\mu_1 - \mu_2)}{s_p\sqrt{(1/n_1) + (1/n_2)}}
= \frac{(57.48 - 66.39) - 0}{23.05\sqrt{(1/30) + (1/35)}} = -1.554.
\]


The problem is to determine P (|T | ≥ |t|). Because a Student’s t-distribution is symmetric about 0, we have ! " P (|T | ≥ |t|) = P (|T | ≥ 1.554) = 2 · 1 − FT (1.554) ,

where T has the Student’s t-distribution with 63 degrees of freedom. b) No, the result does not provide reasonably strong evidence that a difference exists in mean annual salaries for faculty in public and private institutions. Indeed, if the mean annual salaries are the same, we would get a value of the pooled-t statistic at least as large in magnitude as the one actually observed more than 12% of the time.
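The pooled-t computations in Exercises 12.75 and 12.76 are easy to reproduce with software. The following is a minimal sketch using SciPy's Student's t quantile and CDF functions; the inputs are the summary statistics quoted in the solutions, and the helper function name is ours, not the textbook's.

```python
import math
from scipy.stats import t

def pooled_t_ci(xbar1, s1, n1, xbar2, s2, n2, gamma):
    """Pooled-t confidence interval for mu1 - mu2 under the equal-variance normal model."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    half = t.ppf((1 + gamma) / 2, df) * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - half, diff + half

# Exercise 12.75(a): roughly (-20.6, -1.15) grams
print(pooled_t_ci(39.04, 18.82, 51, 49.92, 18.97, 53, 0.99))

# Exercise 12.76: two-sided P-value for the observed pooled-t statistic with 63 df
print(2 * (1 - t.cdf(1.554, 63)))   # about 0.125, i.e. more than 12% of the time
```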

Review Exercises for Chapter 12

Basic Exercises
12.77
Expected number of successes per trial: In Bernoulli trials with success probability p, the number of successes per trial has the Bernoulli distribution with parameter p or, equivalently, the B(1, p) distribution. The expected value of such a random variable is p. Hence, the expected number of successes per trial equals p.
Number of trials until the rth success: In Bernoulli trials with success probability p, the number of trials until the rth success has the negative binomial distribution with parameters r and p, that is, the NB(r, p) distribution, as we know from Proposition 5.13 on page 241.
Elapsed time between successes: For a Poisson process with rate λ, the elapsed time between successes has the exponential distribution with parameter λ, that is, the E(λ) distribution, as we know from Proposition 12.2 on page 691.
Number of successes in (s, t]: For a Poisson process with rate λ, the number of successes in the time interval (s, t] has the Poisson distribution with parameter λ(t − s), that is, the P(λ(t − s)) distribution, as we know from Definition 12.1 on page 688.
12.78
a) From Proposition 12.2 on page 691, the interarrival times are independent E(6.9) random variables, where time is measured in hours after midnight. Hence, the probability that there is no arrival-free span of 8 minutes or more up to the time when the third patient arrives is
\[
P(I_1 < 8/60,\ I_2 < 8/60,\ I_3 < 8/60) = P(I_1 < 8/60)\,P(I_2 < 8/60)\,P(I_3 < 8/60)
= \bigl(1 - e^{-6.9\cdot(8/60)}\bigr)^{3} \approx 0.218.
\]

b) No patient sees a doctor by 12:20 A.M. if and only if the first patient arrives after 12:17 A.M. This latter event has probability P (I1 > 17/60) = e−6.9·(17/60) = 0.142.


c) Let T denote the check-in time, in hours. Then, by assumption, T ∼ T(1/30, 2/30). Let U denote the time when the first patient to arrive sees the doctor. We have U = I₁ + T and, by assumption, I₁ and T are independent random variables. We want to determine P(U > 20/60) = P(U > 1/3). To do so, we apply the bivariate FPF and independence to get
\[
P(U > 1/3) = P(I_1 + T > 1/3)
= \iint_{x+y>1/3} f_{I_1,T}(x,y)\,dx\,dy
= \iint_{x+y>1/3} f_{I_1}(x)f_T(y)\,dx\,dy
= \int_{-\infty}^{\infty}\left(\int_{1/3-y}^{\infty} f_{I_1}(x)\,dx\right) f_T(y)\,dy
\]
\[
= \int_{1/30}^{2/30} e^{-6.9(1/3-y)} f_T(y)\,dy
= e^{-2.3}\int_{1/30}^{2/30} e^{6.9y} f_T(y)\,dy
= e^{-2.3}\int_{-\infty}^{\infty} e^{6.9y} f_T(y)\,dy
= e^{-2.3} M_T(6.9).
\]
Referring now to Exercise 11.78, we conclude that
\[
P(U > 1/3) = e^{-2.3}\cdot\frac{8\,e^{(1/30+2/30)(6.9)/2}}{(2/30-1/30)^2(6.9)^2}\left(\cosh\frac{(2/30-1/30)(6.9)}{2} - 1\right) \approx 0.142.
\]
Of course, we can also obtain this result by direct double integration of the joint PDF of I₁ and T over { (x, y) : x + y > 1/3 }. However, the method we presented is far more efficient.
d) Let T be as in part (c), let S denote the time the first patient spends with a doctor, and let V denote the time that the first patient leaves the emergency room. We have V = I₁ + T + S and, by assumption, I₁, T, and S are independent random variables. We know that I₁ ∼ E(6.9), T ∼ T(1/30, 2/30), and S ∼ U(1/12, 2/12). Let W = T + S so that V = I₁ + W. Note that 7/60 ≤ W ≤ 7/30 and that 7/30 < 1/3. We want to determine P(V > 20/60) = P(V > 1/3). Applying the bivariate FPF and independence exactly as in part (c), we get
\[
P(V > 1/3) = P(I_1 + W > 1/3)
= \int_{7/60}^{7/30} e^{-6.9(1/3-y)} f_W(y)\,dy
= e^{-2.3}\int_{-\infty}^{\infty} e^{6.9y} f_W(y)\,dy
= e^{-2.3} M_W(6.9)
= e^{-2.3} M_{T+S}(6.9)
= e^{-2.3} M_T(6.9)M_S(6.9).
\]
Referring to the solution to part (c), we see that the product of the first two terms in the last expression in the preceding display equals 0.142 and, from Table 11.1 on page 634, the third term equals
\[
M_S(6.9) = \frac{e^{(2/12)(6.9)} - e^{(1/12)(6.9)}}{(2/12 - 1/12)(6.9)} = 2.402.
\]
Therefore,
\[
P(V > 1/3) = 0.142\cdot 2.402 \approx 0.341.
\]
Of course, we can also obtain this result by direct triple integration of the joint PDF of I₁, T, and S over { (x, y, z) : x + y + z > 1/3 }. However, the method we presented is far more efficient.
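Both tail probabilities can be double-checked by simulation. The sketch below takes the triangular check-in time to be symmetric on (1/30, 2/30), which is the distribution the MGF used above corresponds to; the seed and the number of replications are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
reps = 1_000_000

i1 = rng.exponential(scale=1 / 6.9, size=reps)                    # first interarrival time (hours)
t_check = rng.triangular(1 / 30, 1.5 / 30, 2 / 30, size=reps)     # check-in time, symmetric triangular
s_doc = rng.uniform(1 / 12, 2 / 12, size=reps)                    # time spent with the doctor (hours)

print(np.mean(i1 + t_check > 1 / 3))            # part (c): about 0.142
print(np.mean(i1 + t_check + s_doc > 1 / 3))    # part (d): about 0.341
```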


12.79
a) We know that N(t) ∼ P(λt) and, hence,
\[
E\left(\frac{N(t)}{t}\right) = \frac{1}{t}E\bigl(N(t)\bigr) = \frac{\lambda t}{t} = \lambda.
\]
b) From part (a) and Chebyshev's inequality, for each ϵ > 0,
\[
P\left(\left|\frac{N(t)}{t} - \lambda\right| \ge \epsilon\right)
= P\left(\left|\frac{N(t)}{t} - E\left(\frac{N(t)}{t}\right)\right| \ge \epsilon\right)
\le \frac{\operatorname{Var}\bigl(N(t)/t\bigr)}{\epsilon^2}
= \frac{\operatorname{Var}\bigl(N(t)\bigr)}{\epsilon^2 t^2}
= \frac{\lambda t}{\epsilon^2 t^2} = \frac{\lambda}{\epsilon^2 t}.
\]
As t → ∞, the last term in the preceding display converges to 0. Hence, by the complementation rule,
\[
\lim_{t\to\infty} P\left(\left|\frac{N(t)}{t} - \lambda\right| < \epsilon\right)
= 1 - \lim_{t\to\infty} P\left(\left|\frac{N(t)}{t} - \lambda\right| \ge \epsilon\right) = 1 - 0 = 1.
\]
12.80
a) For each n ∈ N, we have Wₙ = I₁ + ··· + Iₙ and, by Proposition 12.2 on page 691, I₁, I₂, ... are independent E(λ) random variables. Applying the strong law of large numbers, we conclude that
\[
\lim_{n\to\infty} \frac{W_n}{n} = \lim_{n\to\infty} \frac{I_1 + \cdots + I_n}{n} = E\bigl(I_j\bigr) = \frac{1}{\lambda},
\]
with probability 1. Therefore,
\[
\lim_{n\to\infty} \frac{n}{W_n} = \lim_{n\to\infty} \frac{1}{W_n/n} = \frac{1}{1/\lambda} = \lambda,
\]
with probability 1.
b) We are given that W₄ = 15. Hence, from part (a),
\[
\lambda \approx \frac{4}{W_4} = \frac{4}{15} \approx 0.267.
\]
12.81 We assume that λ < 2µ.
a) To begin, we note that, when only one customer is in the system, the service time is the minimum of the individual service times of the two servers. Hence, from Proposition 9.11(a) on page 532, the service time for a single customer in the system has an exponential distribution with parameter 2µ. Thus, we have a birth-and-death queueing system with λₙ = λ for n ≥ 0, and µₙ = 2µ for n ≥ 1. Hence, this queueing system behaves identically to an M/M/1 queue with the single server serving at rate 2µ. Referring now to Example 12.6 on page 703, specifically to Equation (12.20) on page 704, we conclude that Pₙ = (1 − λ/2µ)(λ/2µ)ⁿ for all n ≥ 0.
b) Referring to part (a) and Example 12.8 on page 707, specifically, Equation (12.24) on that same page, we see that L = λ/(2µ − λ).
12.82
a) First assume that s ≤ t. Applying properties of covariance and of a Poisson process, we find that
\[
\operatorname{Cov}\bigl(N(s), N(t)\bigr)
= \operatorname{Cov}\bigl(N(s),\, N(t) - N(s) + N(s)\bigr)
= \operatorname{Cov}\bigl(N(s),\, N(t) - N(s)\bigr) + \operatorname{Cov}\bigl(N(s), N(s)\bigr)
= 0 + \operatorname{Var}\bigl(N(s)\bigr) = \lambda s.
\]
Interchanging the roles of s and t, we get that Cov(N(s), N(t)) = λt if t ≤ s. Consequently, we have shown that Cov(N(s), N(t)) = λ min{s, t} for all s, t ≥ 0.


b) Referring to part (a) and noting that s ≤ s + t, we get
\[
\rho\bigl(N(s), N(s+t)\bigr)
= \frac{\operatorname{Cov}\bigl(N(s), N(s+t)\bigr)}{\sqrt{\operatorname{Var}\bigl(N(s)\bigr)\operatorname{Var}\bigl(N(s+t)\bigr)}}
= \frac{\lambda s}{\sqrt{\lambda s\cdot\lambda(s+t)}}
= \sqrt{\frac{s}{s+t}}.
\]
12.83 We measure time in hours after 7:45 A.M. Let N(t) be the number of passengers at the bus stop at time t. By assumption, {N(t) : t ≥ 0} is a Poisson process with rate 60. We note that the bus arrives at time T − 7.75, which we denote by S. The number of passengers who are waiting at the bus stop when the bus arrives is N(S), which we denote by X. The problem is to determine the mean and standard deviation of X. Now, let Y = 4T − 32. By assumption, Y ∼ Beta(2, 2). Therefore,
\[
E(Y) = \frac{2}{2+2} = \frac{1}{2}
\qquad\text{and}\qquad
\operatorname{Var}(Y) = \frac{2\cdot 2}{(2+2)^2(2+2+1)} = \frac{1}{20}.
\]
We have
\[
S = T - 7.75 = \frac{Y+32}{4} - 7.75 = \frac{1}{4}(Y+1),
\]
so that
\[
E(S) = \frac{1}{4}\bigl(E(Y)+1\bigr) = \frac{1}{4}\left(\frac{1}{2}+1\right) = \frac{3}{8}
\qquad\text{and}\qquad
\operatorname{Var}(S) = \frac{1}{16}\operatorname{Var}(Y) = \frac{1}{16}\cdot\frac{1}{20} = \frac{1}{320}.
\]
Now, because {N(t) : t ≥ 0} is independent of S, we have
\[
E(X \mid S) = E\bigl(N(S)\mid S\bigr) = 60S
\qquad\text{and}\qquad
\operatorname{Var}(X \mid S) = \operatorname{Var}\bigl(N(S)\mid S\bigr) = 60S.
\]
Consequently, by the laws of total expectation and total variance,
\[
E(X) = E\bigl(E(X\mid S)\bigr) = E(60S) = 60E(S) = 60\cdot\frac{3}{8} = 22.5
\]
and
\[
\operatorname{Var}(X) = E\bigl(\operatorname{Var}(X\mid S)\bigr) + \operatorname{Var}\bigl(E(X\mid S)\bigr)
= E(60S) + \operatorname{Var}(60S)
= 60E(S) + 3600\operatorname{Var}(S)
= 60\cdot\frac{3}{8} + 3600\cdot\frac{1}{320} = 33.75.
\]
Hence, the mean number of passengers who are waiting at the bus stop when the bus arrives is 22.5 and the standard deviation of the number is √33.75 ≈ 5.81.
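A short Monte Carlo run verifies these two values. The sketch below simulates the beta-distributed bus arrival time and a Poisson count of waiting passengers; the seed and the number of replications are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
reps = 1_000_000

y = rng.beta(2, 2, size=reps)        # Y = 4T - 32 ~ Beta(2, 2)
s = (y + 1) / 4                      # bus arrival time, in hours after 7:45 A.M.
x = rng.poisson(60 * s)              # passengers waiting when the bus arrives

print(x.mean(), x.std())             # about 22.5 and about 5.81
```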

12.84 a) The counting process {N(t) : t ≥ 0} gives the total number of securities orders, both taxable and nontaxable, that have been received at the brokerage house. In particular, for each t ≥ 0, N(t) is the total number of securities orders that have been received at the brokerage house by time t. b) We must show that the counting process {N(t) : t ≥ 0} satisfies the three conditions of Definition 12.1 on page 688 with λ = λ1 + λ2 . First condition: As {N1 (t) : t ≥ 0} and {N2 (t) : t ≥ 0} are each Poisson processes, we have N1 (0) = 0 and N2 (0) = 0, so that N(0) = N1 (0) + N2 (0) = 0. Second condition: Let r ∈ N and 0 ≤ t1 < t2 < · · · < tr . As {N1 (t) : t ≥ 0} and {N2 (t) : t ≥ 0} are Poisson processes, N1 (tj ) − N1 (tj −1 ), 2 ≤ j ≤ r, are independent random variables, as are N2 (tj ) − N2 (tj −1 ), 2 ≤ j ≤ r.


And, because {N1 (t) : t ≥ 0} and {N2 (t) : t ≥ 0} are independent, we can apply Proposition 6.13 on page 297 to conclude that ! " ! " N(tj ) − N(tj −1 ) = N1 (tj ) + N2 (tj ) − N1 (tj −1 ) + N2 (tj −1 ) " ! " ! = N1 (tj ) − N1 (tj −1 ) + N2 (tj ) − N2 (tj −1 ) , 2 ≤ j ≤ r, are independent random variables. Thus, {N(t) : t ≥ 0} has independent increments. Third condition:

Let 0 ≤ s < t < ∞. As {N1 (t) : t ≥ 0} and {N ! 2 (t) : t ≥" 0} are Poisson processes!with rates"λ1 and λ2 , respectively, we have that N1 (t) − N1 (s) ∼ P λ1 (t − s) and N2 (t) − N2 (s) ∼ P λ2 (t − s) . And, because {N1 (t) : t ≥ 0} and {N2 (t) : t ≥ 0} are independent, we can apply Proposition 6.20(b) on page 311 to conclude that ! " ! " ! " ! " N(t) − N(s) = N1 (t) + N2 (t) − N1 (s) + N2 (s) = N1 (t) − N1 (s) + N2 (t) − N2 (s) " ! " ! ∼ P λ1 (t − s) + λ2 (t − s) = P (λ1 + λ2 )(t − s) .

We have, therefore, now shown that {N(t) : t ≥ 0} satisfies the three conditions given in Definition 12.1 with λ = λ1 + λ2 and, hence, that it is a Poisson process with rate λ1 + λ2 . c) Recall that, for a Poisson process, the interarrival times between events are independent and identically distributed exponential random variables with common parameter equal to the rate of the process. Therefore, because of the lack-of-memory property of an exponential random variable, starting at any specified time, the elapsed time until the next event occurs has that same exponential distribution. Now, the probability that a securities order is taxable equals the probability that, starting at a specified time, the next taxable security order arrives before the next nontaxable security order. Because {N1 (t) : t ≥ 0} and {N2 (t) : t ≥ 0} are independent Poisson processes with rates λ1 and λ2 , respectively, those two arrival times are independent exponential random variables with parameters λ1 and λ2 , respectively. Hence, from Proposition 9.11(b), the probability that, starting at a specified time, the next taxable security order arrives before the next nontaxable security order is λ1 /(λ1 + λ2 ). d) Let E = event two taxable orders are received before the first nontaxable order, A = event the first order received is taxable, and B = event the second order received is taxable.

We want to determine the probability of event E. Noting that E = A ∩ B, we apply the general multiplication rule to conclude that P (E) = P (A)P (B | A). From part (c), we know that P (A) = λ1 /(λ1 + λ2 ). Now, because of the lack-of-memory property, given event A occurs, when that taxable order is received, both Poisson processes start anew, probabilistically speaking. Hence, from part (c), we have that P (B | A) = λ1 /(λ1 + λ2 ). Thus, & '2 λ1 λ1 λ1 P (E) = P (A)P (B | A) = · = . λ1 + λ2 λ1 + λ2 λ1 + λ2

e) From the arguments used previously in this problem, we see that each securities order ! " will be taxable with probability λ1 /(λ1 + λ2 ) and, hence, nontaxable with probability λ2 /(λ1 + λ2 ) , independent of the entire past history of both Poisson processes. In other words, we can consider the process of observing the type of each successive securities order a sequence of Bernoulli trials with success probability p = λ1 /(λ1 + λ2 ), where a success denotes a taxable order. Using that interpretation, the probability that n1 taxable orders are received before n2 nontaxable orders are received equals the probability, in Bernoulli trials with success probability p, of n1 successes occur before n2 failures. However, n1 successes occur


before n₂ failures if and only if at least n₁ successes occur in the first n₁ + n₂ − 1 trials. This latter event has probability
\[
\sum_{k=n_1}^{n_1+n_2-1} \binom{n_1+n_2-1}{k} p^k (1-p)^{\,n_1+n_2-1-k}.
\]
Hence, the probability that n₁ taxable orders are received before n₂ nontaxable orders equals
\[
\sum_{k=n_1}^{n_1+n_2-1} \binom{n_1+n_2-1}{k}\left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^{k}\left(\frac{\lambda_2}{\lambda_1+\lambda_2}\right)^{\,n_1+n_2-1-k}.
\]
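The final sum is straightforward to evaluate numerically. The helper below is a minimal sketch (the function name and the rates in the example call are ours, for illustration only).

```python
from math import comb

def prob_n1_taxable_first(n1, n2, lam1, lam2):
    """P(n1 taxable orders arrive before n2 nontaxable ones) for independent
    Poisson processes with rates lam1 (taxable) and lam2 (nontaxable)."""
    p = lam1 / (lam1 + lam2)
    trials = n1 + n2 - 1
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(n1, trials + 1))

# Illustrative rates only: 3 taxable orders before 2 nontaxable ones when lam1 = 2, lam2 = 1
print(prob_n1_taxable_first(3, 2, 2.0, 1.0))    # 48/81, about 0.593
```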

12.85 a) The counting process {N(t) : t ≥ 0} gives the total number of event of all m types that occur by time t. b) We must show that the counting process {N(t) : t ≥ 0} satisfies the three conditions of Definition 12.1 on page 688 with λ = λ1 + · · · + λm .

First condition: Because {Nk (t) : t ≥ 0}, 1 ≤ k ≤ m, are Poisson processes, we have Nk (0) = 0 for all 1 ≤ k ≤ m, so that N(0) = N1 (0) + · · · + Nm (0) = 0.

Second condition: Let r ∈ N and 0 ≤ t1 < t2 < · · · < tr . Because {Nk (t) : t ≥ 0}, 1 ≤ k ≤ m, are Poisson processes, the random variables Nk (tj ) − Nk (tj −1 ), 2 ≤ j ≤ r, are independent for each 1 ≤ k ≤ m. And, as the m Poisson processes are independent, we can apply Proposition 6.13 on page 297 to conclude that ! " ! " N(tj ) − N(tj −1 ) = N1 (tj ) + · · · + Nm (tj ) − N1 (tj −1 ) + · · · + Nm (tj −1 ) " ! " ! 2 ≤ j ≤ r, = N1 (tj ) − N1 (tj −1 ) + · · · + Nm (tj ) − Nm (tj −1 ) , are independent random variables. Thus, {N(t) : t ≥ 0} has independent increments.

Third condition: Let 0 ≤ s < t < ∞. As {Nk (t) : t ≥ 0}, 1 ≤ k! ≤ m, are" Poisson processes with rates λk , 1 ≤ k ≤ m, respectively, we have that Nk (t) − Nk (s) ∼ P λk (t − s) for each 1 ≤ k ≤ m. And, as the m Poisson processes are independent, we can apply Proposition 6.20(b) on page 311 to conclude that ! " ! " N(t) − N(s) = N1 (t) + · · · + Nm (t) − N1 (s) + · · · + Nm (s) ! " ! " = N1 (t) − N1 (s) + · · · + Nm (t) − Nm (s) ! " ! " ∼ P λ1 (t − s) + · · · + λm (t − s) = P (λ1 + · · · + λm )(t − s) .

We have, therefore, now shown that {N(t) : t ≥ 0} satisfies the three conditions given in Definition 12.1 with λ = λ1 + · · · + λm and, hence, that it is a Poisson process with rate λ1 + · · · + λm . c) Recall that, for a Poisson process, the interarrival times between events are independent and identically distributed exponential random variables with common parameter equal to the rate of the process. Therefore, because of the lack-of-memory property of an exponential random variable, starting at any specified time, the elapsed time until the next event occurs has that same exponential distribution. Now, the probability an event occurring in the {N(t) : t ≥ 0} process is of Type j equals the probability that, starting at a specified time, the next Type j event occurs before the next occurrence of any of the other types of events. Because {Nk (t) : t ≥ 0}, 1 ≤ k ≤ m, are independent Poisson processes with rates λk , 1 ≤ k ≤ m, respectively, the m occurrence times are independent exponential random variables with parameters λk , 1 ≤ k ≤ m, respectively. Hence, from Proposition 9.11(b), the probability that, starting at a specified time, the next Type j event occurs before the next occurrence of any of the other types of events is λj /(λ1 + · · · + λm ).


12.86 We must show that the counting process {Y (t) : t ≥ 0} satisfies the three conditions of Definition 12.1 on page 688 with rate pλ. First condition: By definition, Y (t) = 0 if N(t) = 0. Hence, because, N(0) = 0, we have Y (0) = 0. Second condition: Let r ∈ N and 0 ≤ t1 < t2 < · · · < tr . For 2 ≤ j ≤ r, we can write Y (tj ) − Y (tj −1 ) = I{N(tj )−N(tj −1 )>0}

N(t #j )

k=N(tj −1 )+1

Xk = I{N(tj )−N(tj −1 )>0}

N(tj )−N(t # j −1 ) k=1

Xk+N(tj −1 ) .

We note that, because X1 , X2 , . . . are independent random variables, which are also independent of {N(t) : t ≥ 0}, the distribution of the sum on the right of the preceding display depends only on N(tj ) − N(tj −1 ) and p. Hence, as N(tj ) − N(tj −1 ), 2 ≤ j ≤ r, are independent random variables, we conclude from Proposition 6.13 on page 297 that Y (tj ) − Y (tj −1 ), 2 ≤ j ≤ r, are independent random variables. Therefore, {Y (t) : t ≥ 0} has independent increments. Third condition: Let 0 ≤ s < t < ∞. Given that N(t) − N(s) = n, we see from the just-presented verification of the second condition, that Y (t) − Y (s) ∼ B(n, p). Therefore, from the law of total probability, for each nonnegative integer k, ∞ ! " ! " ! " # P N(t) − N(s) = n P Y (t) − Y (s) = k | N(t) − N(s) = n P Y (t) − Y (s) = k = n=k

=

∞ #

e

−λ(t−s)

n=k

=e

−pλ(t−s)

=e

−pλ(t−s)

! " Hence, Y (t) − Y (s) ∼ P pλ(t − s) .

! "n & ' λ(t − s) n k · p (1 − p)n−k n! k

! "k ∞ ! "n−k pλ(t − s) # −(1−p)λ(t−s) (1 − p)λ(t − s) e k! (n − k)! n=k ! "k ! "k pλ(t − s) −pλ(t−s) pλ(t − s) ·1=e . k! k!

Consequently, we have now shown that the compound Poisson process {Y (t) : t ≥ 0} is, in this case, a Poisson process with rate pλ.
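The thinning result just proved is easy to see in a simulation: keeping each event of a rate-λ Poisson process independently with probability p leaves counts whose distribution matches P(pλt). The sketch below uses illustrative values for λ, p, and t.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(seed=4)
lam, p, t_end, reps = 3.0, 0.4, 5.0, 200_000     # illustrative values only

n_t = rng.poisson(lam * t_end, size=reps)        # N(t) in each replication
y_t = rng.binomial(n_t, p)                       # keep each event independently with probability p

# Empirical mean/variance of Y(t) versus the Poisson(p*lam*t) values
print(y_t.mean(), y_t.var(), p * lam * t_end)
print(np.mean(y_t == 6), poisson.pmf(6, p * lam * t_end))
```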

Note: An alternative method for solving this problem is to proceed as follows. Each time the specified event occurs, classify it as Type 1 if the corresponding Xₖ is 1 (which has probability p) and as Type 2 otherwise. We see that Y(t) equals the number of Type 1 events that occur by time t. Therefore, from Exercise 12.7, we conclude that {Y(t) : t ≥ 0} is a Poisson process with rate pλ.
12.87
a) The tacit assumption is that the capacity of the emergency room is infinite.
b) A better model would be an M/M/1/N queueing system, because it would reflect the reality of a finite-capacity emergency room.


c) From Exercise 12.26, for λ ≠ µ, the steady-state distribution of the number of customers in the system for an M/M/1/N queue is
\[
P_n =
\begin{cases}
\dfrac{(1-\lambda/\mu)(\lambda/\mu)^n}{1-(\lambda/\mu)^{N+1}}, & \text{if } 0 \le n \le N;\\[6pt]
0, & \text{if } n \ge N+1.
\end{cases}
\]
Here we have λ = 6.9, µ = 7.4, and N = 40. Then
\[
\frac{\lambda}{\mu} = \frac{6.9}{7.4} \approx 0.93243
\qquad\text{and}\qquad
\frac{1-\lambda/\mu}{1-(\lambda/\mu)^{N+1}} = \frac{1-6.9/7.4}{1-(6.9/7.4)^{41}} \approx 0.07164.
\]
Hence, the steady-state distribution of the number of patients in an emergency room with a capacity of 40 is
\[
P_n \approx 0.07164\cdot(0.93243)^n, \qquad 0 \le n \le 40,
\]
and Pₙ = 0 otherwise. In contrast, the steady-state distribution of the number of patients in an emergency room with infinite capacity is, from Example 12.7,
\[
P_n \approx 0.06757\cdot(0.93243)^n, \qquad n \ge 0.
\]
As we see, there is not much difference between the M/M/1 and M/M/1/40 queueing systems.
d) In this case, N = 100, so that
\[
\frac{1-\lambda/\mu}{1-(\lambda/\mu)^{N+1}} = \frac{1-6.9/7.4}{1-(6.9/7.4)^{101}} \approx 0.06763.
\]
Hence, the steady-state distribution of the number of patients in an emergency room with a capacity of 100 is
\[
P_n \approx 0.06763\cdot(0.93243)^n, \qquad 0 \le n \le 100,
\]
and Pₙ = 0 otherwise. As we would expect, the steady-state distribution of this M/M/1/100 queueing system is even closer to that of the M/M/1 queueing system than is the steady-state distribution of the M/M/1/40 queueing system.
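The two finite-capacity distributions can be generated directly from the displayed formula. A minimal sketch, using the emergency-room rates of this exercise:

```python
import numpy as np

def mm1N_steady_state(lam, mu, N):
    """Steady-state distribution P_0, ..., P_N of an M/M/1/N queue (requires lam != mu)."""
    rho = lam / mu
    n = np.arange(N + 1)
    return (1 - rho) * rho**n / (1 - rho**(N + 1))

p40 = mm1N_steady_state(6.9, 7.4, 40)
p100 = mm1N_steady_state(6.9, 7.4, 100)
print(p40[0], p100[0])                            # about 0.07164 and 0.06763
print(p40.sum(), (np.arange(41) * p40).sum())     # distribution sums to 1; expected number in system
```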

12.88
a) Measuring time in days, we note that the service facility is an M/M/1 queueing system with λ = 5 and µ = 24/4 = 6. Let X denote the steady-state number of down machines and let T denote the total cost per day to the company due to down machines, including the cost of operating the service facility. Then T = 247 + 50X. From Equation (12.24) on page 707,
\[
E(X) = L = \frac{\lambda}{\mu-\lambda} = \frac{5}{6-5} = 5.
\]
Hence,
\[
E(T) = E(247 + 50X) = 247 + 50E(X) = 247 + 50\cdot 5 = 497.
\]
b) In this case, the service facility is an M/M/1 queueing system with λ = 5 and µ = 24/3 = 8. Let X denote the steady-state number of down machines, c the daily operating cost for this new service facility, and T the total cost per day to the company due to down machines, including the cost of operating the service facility. Then T = c + 50X. From Equation (12.24),
\[
E(X) = L = \frac{\lambda}{\mu-\lambda} = \frac{5}{8-5} = \frac{5}{3}.
\]


Hence,
\[
E(T) = E(c + 50X) = c + 50E(X) = c + 50\cdot\frac{5}{3} = c + \frac{250}{3}.
\]
Referring to part (a), we see that, for the new service facility to be economically feasible, we must have c + 250/3 ≤ 497, or c ≤ 413.67. Therefore, the maximum daily allowable operating cost for the new service facility is $413.67.

12.89 We will compare the steady-state expected number of people in the queueing systems in the two configurations. With the current configuration, the queueing system consists of two separate M/M/1 queues. Measuring time in hours, the service rate for both of these M/M/1 queues is µ = 60/4 = 15, whereas the arrival rates are 12 and 7. Hence, from Equation (12.24) on page 707, L=

12 7 39 + = = 4.875. 15 − 12 15 − 7 8

With the photocopying facilities in the same room, we have an M/M/2 queueing system with service rate 15 and arrival rate 12 + 7 = 19. Hence, from Equations (12.28) and (12.22) on pages 708 and 704, respectively, & '−1 19 (19/30)(19/15)2 (19/15)2 L= · 1 + 19/15 + ≈ 2.115. + 2 (1 − 19/30) 15 2 (1 − 19/30)2 Thus, with the current configuration, the expected number of people in the queueing system is 4.875, whereas, with the photocopying facilities in the same room, the expected number is roughly 2.115. We therefore see that the photocopying facilities would be more effectively used if they were in the same room. 12.90 Measuring time in hours, we have an M/M/2/3 queueing system with λ = 25 and µ = 60/6 = 10. From Equation (12.13) on page 700, 7 7 25, if 0 ≤ n ≤ 2; 10, if n = 1; λn = µn = 0, if n ≥ 3. 20, if n ≥ 2. For convenience, let r = λ/µ = 2.5 and ρ = r, n + λk−1 = rρ n−1 , µk k=1 0,

= λ/2µ = 1.25. Then

7 if n = 1; rρ n−1 , if n = 2, 3; = 0, if n ≥ 4.

Thus, from Equation (12.18) on page 703, 7 n−1 Pn = rρ P0 , 0, where

if 1 ≤ n ≤ 3; if n ≥ 4.

if 1 ≤ n ≤ 3; if n ≥ 4.

$

%−1 & '%−1 $ '−1 ∞ &+ n 3 # # λk−1 1 − ρ3 n−1 P0 = 1 + = 1+ rρ = 1+r . µ 1 − ρ k n=1 k=1 n=1

a) The (steady-state) expected number of customers in the gas station is L=

∞ # n=0

nPn = rP0

3 # n=1



n−1

&

1 − ρ3 =r 1+r 1−ρ

'−1 1 2 1 + 2ρ + 3ρ 2 ≈ 1.944.


b) A potential customer is lost when an arriving customer observes three customers in the gas station. The probability of that happening is
\[
P_3 = r\rho^2 P_0 = r\rho^2\left(1 + r\,\frac{1-\rho^3}{1-\rho}\right)^{-1} \approx 0.371.
\]
Hence, 37.1% of potential customers are lost.
c) Both attendants are idle when and only when there are no customers in the gas station. The probability of that happening is
\[
P_0 = \left(1 + r\,\frac{1-\rho^3}{1-\rho}\right)^{-1} \approx 0.095.
\]
Hence, both attendants are idle 9.5% of the time.
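The same numbers fall out of a generic birth-and-death steady-state computation. The sketch below (the helper is ours, written only to check this exercise) builds P₀, ..., P₃ from the birth and death rates of the gas-station model and reproduces L ≈ 1.944, P₃ ≈ 0.371, and P₀ ≈ 0.095.

```python
import numpy as np

def birth_death_steady_state(lams, mus):
    """Steady-state probabilities of a finite birth-and-death chain.

    lams[n] is the birth rate in state n (n = 0, ..., N-1) and
    mus[n] is the death rate in state n+1 (n = 0, ..., N-1)."""
    prods = np.cumprod(np.asarray(lams, float) / np.asarray(mus, float))
    p0 = 1.0 / (1.0 + prods.sum())
    return np.concatenate(([p0], p0 * prods))

# M/M/2/3 gas station: arrival rate 25, each attendant serves at rate 10, at most 3 cars present
p = birth_death_steady_state([25, 25, 25], [10, 20, 20])
print(p)                               # P_0, ..., P_3
print((np.arange(4) * p).sum())        # L, about 1.944
print(p[3], p[0])                      # lost-customer fraction and idle probability
```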

12.91 a) Answers will vary, but one possibility is a self-service situation. b) We have λn = λ for all n ≥ 0, and µn = nµ for all n ≥ 1. c) For n ≥ 1, n n + + λk−1 λ (λ/µ)n = = . µk kµ n! k=1 k=1

Therefore,

' # ∞ &+ n ∞ # λk−1 (λ/µ)n = = eλ/µ − 1 < ∞. µk n! n=1 k=1 n=1

Hence, from Proposition 12.3 on page 703, a steady-state distribution always exists. Furthermore,
\[
P_0 = \left(1 + \sum_{n=1}^{\infty}\prod_{k=1}^{n}\frac{\lambda_{k-1}}{\mu_k}\right)^{-1}
= \bigl(1 + (e^{\lambda/\mu} - 1)\bigr)^{-1} = e^{-\lambda/\mu}
\]

and, for n ≥ 1,

&+ n

' λk−1 (λ/µ)n −λ/µ P0 = e . Pn = µk n! k=1

Consequently, we see that the steady-state number of customers in the queueing system has the Poisson distribution with parameter λ/µ. d) Referring to part (c), we conclude that the expected number of customers in the queueing system equals λ/µ. e) Because there are an infinite number of servers, there is never any queue. Thus, we have Lq = Wq = 0. Also, in this case, W is simply the expected time it takes a server to serve a customer, which is 1/µ. 12.92 a) Here the state of the system is the number of machines that are down. Clearly, µn = µ for all n ≥ 1, and λn = 0 for all n ≥ M. Now suppose that 0 ≤ n ≤ M − 1. When n machines are down, there are M − n machines that are working. Because the breakdown time for each machine has an exponential distribution with parameter λ, the time until the next machine breaks down has an exponential distribution with parameter (M − n)λ. Consequently, we have λn = (M − n)λ. Hence, 7 (M − n)λ, if 0 ≤ n ≤ M − 1; µn = µ, n ≥ 1. λn = 0, if n ≥ M.


b) Referring to part (a), we see that ⎧ ! " n + ⎪ M − (k − 1) λ n ⎨ + λk−1 , = k=1 µ ⎪ µ k ⎩ k=1 0,

if 1 ≤ n ≤ M; if n ≥ M + 1.

Hence, from Proposition 12.3 on page 703, 7 (M)n (λ/µ)n P0 , Pn = 0, where

=

7

if 1 ≤ n ≤ M;

(M)n (λ/µ)n , 0,

if n ≥ M + 1.

if 1 ≤ n ≤ M; if n ≥ M + 1.

$

%−1 '%−1 $# ∞ &+ n M # λk−1 n = (M)n (λ/µ) . P0 = 1 + µ k n=1 k=1 n=0

c) Let r = λ/µ. For 1 ≤ n ≤ M,

! " nPn = M − (M − n) Pn = MPn − (M − n)Pn .

However,

(M − n)Pn = (M − n)(M)n r n P0 = (M)n+1 r n P0 =

1 1 (M)n+1 r n+1 P0 = Pn+1 . r r

Hence, ∞ #

M #

M M M M M # # # 1# 1# L= nPn = MPn − (M − n)Pn = M Pn − Pn+1 = M Pn − Pn r n=1 r n=2 n=0 n=1 n=1 n=1 n=1

1 1 µ = M(1 − P0 ) − (1 − P0 − P1 ) = M(1 − P0 ) − (1 − P0 − MrP0 ) = M − (1 − P0 ). r r λ Consequently, the steady-state expected number of working machines is 1 2 µ µ M − L = M − M − (1 − P0 ) = (1 − P0 ). λ λ d) Here we have M = 10, λ = 1/6, and µ = 1/1.5 = 2/3. From part (a), 7 (M − n)/6, if 0 ≤ n ≤ 9; λn = µn = 2/3, n ≥ 1. 0, if n ≥ 10. N Noting that λ/µ = (1/6) (2/3) = 1/4, we get, in view of part (b), that 7 (10)n 4−n P0 , if 1 ≤ n ≤ 10; Pn = 0, if n ≥ 11. where

P0 =

$ 10 #

(10)n 4

n=0

−n

%−1

≈ 0.00531.

The following table provides the steady-state distribution for the number of down machines: n Pn

0

1

2

3

4

5

6

7

8

9

10

0.00531 0.01327 0.02985 0.05971 0.10449 0.15674 0.19592 0.19592 0.14694 0.07347 0.01837


A probability histogram for this steady-state number of down machines is as follows:

From part (c), the expected number of working machines is (µ/λ)(1 − P0 ) = 4(1 − P0 ) ≈ 3.98.

On average, roughly four machines are working (and six machines are down). 12.93 For convenience, let r = λ/µ. a) Referring to Equation (12.22) on page 704, we get P0 =

$ 2−1 n # r n=0

&

r2 + n! 2!(1 − r/2)

r2 = 1+r + 2−r

'−1

=

%−1

&

r2 = 1+r + 2(1 − r/2)

'−1

2−r 2 − λ/µ 2µ − λ = = . 2+r 2 + λ/µ 2µ + λ

b) Referring to Equation (12.28) on page 708 and then to part (a), we get L=r+ =

r3 2−r r3 4r (r/2)r 2 P = r + · = r + = 0 2 2 2 2!(1 − r/2) (2 − r) 2 + r 4−r 4 − r2

4(λ/µ) 4λµ 4λµ = = . 2 2 2 (2µ − λ)(2µ + λ) 4 − (λ/µ) 4µ − λ

c) In this case, λ = 6.9 and µ = 7.4. Hence, from parts (a) and (b),
\[
P_0 = \frac{2\cdot 7.4 - 6.9}{2\cdot 7.4 + 6.9} \approx 0.364
\qquad\text{and}\qquad
L = \frac{4\cdot 6.9\cdot 7.4}{(2\cdot 7.4 - 6.9)(2\cdot 7.4 + 6.9)} \approx 1.2.
\]
These two results agree with those found in Examples 12.7(b) and 12.9(b), respectively.
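The closed forms P₀ = (2µ − λ)/(2µ + λ) and L = 4λµ/(4µ² − λ²) are convenient to check numerically; a two-line sketch for the rates used in part (c):

```python
def mm2_p0_L(lam, mu):
    """Closed-form P0 and L for an M/M/2 queue (requires lam < 2*mu)."""
    p0 = (2 * mu - lam) / (2 * mu + lam)
    L = 4 * lam * mu / (4 * mu**2 - lam**2)
    return p0, L

print(mm2_p0_L(6.9, 7.4))   # about (0.364, 1.19)
```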

12.94 Let us call the three expected numbers L1 , L2 , and L3 . a) Referring to Exercise 12.93(b), we have L1 =

4λµ . 4µ2 − λ2


b) The time it takes a customer to be served is, in this case, the minimum of two exponential random variables with parameter µ, which, as we know, has an exponential distribution with parameter 2µ. Hence, this system is an M/M/1 queue with arrival rate λ and service rate 2µ. Referring now to Equation (12.24) on page 707, we conclude that λ L2 = . 2µ − λ c) Each of these M/M/1 queueing systems has service rate µ and, from Exercise 12.6, each has arrival rate λ/2. Referring again to Equation (12.24), we conclude that L3 = 2 ·

λ/2 2λ = . µ − λ/2 2µ − λ

d) To compare the three expected numbers, let α = λ/2µ. Then L1 =

4λµ 4λµ/4µ2 λ/µ 2α ! " , = = = 2 2 2 2 (1 − α)(1 + α) 4µ2 − λ2 1 − (λ/2µ) 4µ − λ /4µ L2 =

λ λ/2µ λ/2µ α = = = , 2µ − λ (2µ − λ)/2µ 1 − λ/2µ 1−α

L3 =

2λ 2λ/2µ λ/µ 2α = = = . 2µ − λ (2µ − λ)/2µ 1 − λ/2µ 1−α

and

Because λ < 2µ, we have 0 < α < 1. Hence,

α 2α 2α < < , 1−α (1 − α)(1 + α) 1−α

that is, L2 < L1 < L3 . The efficiency of the three configurations are, from highest to lowest, the second, the first, and the third. 12.95 Let µi = [µ]i for 1 ≤ i ≤ m. a) By convention, we can take X ∼ N m (µ, 0). Hence, from Definition 12.5 on page 717, ′

1 ′



1 ′





MX1 ,...,Xm (t1 , . . . , tm ) = MX (t) = eµ t+ 2 t !t = eµ t+ 2 t 0t = eµ t+0 = eµ t = e

,m

j =1 µj tj

.

b) As Var(Xi ) = Cov(Xi , Xi ) = [0]ii = 0, Proposition 7.7 on page 354 implies that P (Xi = µi ) = 1; that is, Xi is the constant random variable µi . In other words, 7 1, if x = µi ; pXi (x) = 1 ≤ i ≤ m. 0, otherwise.

From this result and the fact that a finite intersection of events with probability 1 has probability 1, we get ! " P (X = µ) = P (X1 , . . . , Xm ) = (µ1 , . . . , µm ) = P (X1 = µ1 , . . . , Xm = µm ) = 1.

Thus, (X1 , . . . , Xm ) is the constant vector (µ1 , . . . , µm ). In other words, 7 1, if xi = µi for 1 ≤ i ≤ m; pX1 ,...,Xm (x1 , . . . , xm ) = 0, otherwise.

We note in passing that we can use the preceding result to obtain the result of part (a) as follows: 1 ′2 # ′ ,m ′ MX1 ,...,Xm (t1 , . . . , tm ) = MX (t) = E eX t = ex t P (X = x) = eµ t = e j =1 µj tj . x


! " 12.96 From Proposition 12.7 on page 719, we know that Y ∼ N m a + Bµ, B!B′ . Thus, it remains to show that Y1 , . . . , Ym are nonsingular. Because X is nonsingular, Proposition 12.5 on page 717 implies that ! is a nonsingular matrix, which, in turn, implies that det ! ̸= 0. And, because B is a nonsingular matrix, det B ̸= 0. Hence, ! " det !Y = det B!B′ = det B · det ! · det B′ = (det B)2 det ! ̸= 0. Therefore, !Y is a nonsingular matrix, which, by Proposition 12.5, implies that Y1 , . . . , Ym are nonsingular multivariate normal random variables. 12.97 We have Var(Y1 ) = Var(X1 − 2X2 ) = Cov(X1 − 2X2 , X1 − 2X2 )

= Cov(X1 , X1 ) − 2 Cov(X1 , X2 ) − 2 Cov(X2 , X1 ) + 4 Cov(X2 , X2 ) = 1.0 − 2 · 0.1 − 2 · 0.1 + 4 · 1.0 = 4.6,

Var(Y2 ) = Var(X3 − 3X4 ) = Cov(X3 − 3X4 , X3 − 3X4 ) = Cov(X3 , X3 ) − 3 Cov(X3 , X4 ) − 3 Cov(X4 , X3 ) + 9 Cov(X4 , X4 ) = 4.0 − 3 · 2.4 − 3 · 2.4 + 9 · 4.0 = 25.6,

and

Cov(Y1 , Y2 ) = Cov(X1 − 2X2 , X3 − 3X4 ) = Cov(X1 , X3 ) − 3 Cov(X1 , X4 ) − 2 Cov(X2 , X3 ) + 6 Cov(X2 , X4 ) = 0.4 − 3 · 0.6 − 2 · 1.0 + 6 · 1.0 = 2.6.

Consequently,

Cov(Y1 , Y2 ) 2.6 ρ(Y1 , Y2 ) = √ =√ ≈ 0.24. Var(Y1 ) Var(Y2 ) 4.6 · 25.6 12.98 a) Let Y1 = X1 and Y2 = X2 − cX1 . We can write Y = BX, where > ? 1 0 0 B= . −c 1 0 From Proposition 12.7 on page 719, Y is bivariate normal. Hence, from Proposition 10.9 on page 613, Y1 and Y2 are independent if and only if they are uncorrelated. However, Cov(Y1 , Y2 ) = Cov(X1 , X2 − cX1 ) = Cov(X1 , X2 ) − c Cov(X1 , X1 ) = 1 − c · 4 = 1 − 4c. Hence, Y1 and Y2 are independent if and only if c = 1/4. b) From Proposition 12.8 on page 719, X1 and X2 are bivariate normal with > ? > ? −1 4 2 µ= and != . 2 2 3 c) From Proposition 12.6 on page 718 (or Proposition 12.7 on page 719), we know that X1 − 2X2 + 3X3 is normally distributed. We have E(X1 − 2X2 + 3X3 ) = E(X1 ) − 2E(X2 ) + 3E(X3 ) = −1 − 2 · 0 + 3 · 2 = 5


and Var(X1 − 2X2 + 3X3 ) = Cov(X1 − 2X2 + 3X3 , X1 − 2X2 + 3X3 ) = Cov(X1 , X1 ) − 2 Cov(X1 , X2 ) + 3 Cov(X1 , X3 ) − 2 Cov(X2 , X1 ) + 4 Cov(X2 , X2 ) − 6 Cov(X2 , X3 ) + 3 Cov(X3 , X1 ) − 6 Cov(X3 , X2 ) + 9 Cov(X3 , X3 ) =4−2·1+3·2−2·1+4·1−6·0+3·2+9·3

= 43.

Hence, X1 − 2X2 + 3X3 ∼ N (5, 43). d) Let Y1 = X1 + X2 − X3 and Y2 = X1 − 2X2 . We can write Y = BX, where > ? 1 1 −1 B= . 1 −2 0 ! " Hence, by Proposition 12.7 on page 719, Y ∼ N 2 Bµ, B!B′ . Now, >

1 1 −1 !Y = B!B = 1 −2 0 ′

?@4 1 2A@ 1 1 1 0 1 2 0 3 −1

1 −2 0

A

=

>

? 6 −1 . −1 4

We see that det !Y ̸= 0 and, therefore, !Y is a nonsingular matrix. Applying Proposition 12.5 on page 717, we conclude that Y1 and Y2 are nonsingular bivariate normal random variables. 12.99 a) Referring to !, we see that Cov(X1 , X2 ) 60 ρ(X1 , X2 ) = √ =√ = 0.6, Var(X1 ) Var(X2 ) 400 · 25 Cov(X1 , X3 ) 72 =√ = 0.6, ρ(X1 , X3 ) = √ Var(X1 ) Var(X3 ) 400 · 36

Cov(X2 , X3 ) 27 = 0.9. =√ ρ(X2 , X3 ) = √ Var(X2 ) Var(X3 ) 25 · 36 b) From µ, !, and Proposition 12.8 on page 719, we find that X1 ∼ N (150, 400),

X2 ∼ N (120, 25),

X3 ∼ N (80, 36).

c) Again from µ, !, and Proposition 12.8 on page 719, we find that ? > ?' 150 400 60 , , 120 60 25 &> ? > ?' 150 400 72 , , (X1 , X3 ) ∼ N2 80 72 36 &> ? > ?' 120 25 27 (X2 , X3 ) ∼ N2 , . 80 27 36

(X1 , X2 ) ∼ N2

&>


d) Here we use the partition
\[
\mathbf{X} = \left[\begin{array}{c} X_1 \\ \hline X_2 \\ X_3 \end{array}\right],
\qquad
\boldsymbol{\mu} = \left[\begin{array}{c} 150 \\ \hline 120 \\ 80 \end{array}\right],
\qquad
\Sigma = \left[\begin{array}{c|cc} 400 & 60 & 72 \\ \hline 60 & 25 & 27 \\ 72 & 27 & 36 \end{array}\right].
\]
We see that
\[
\Sigma_{22}^{-1} = \frac{1}{171}\begin{bmatrix} 36 & -27 \\ -27 & 25 \end{bmatrix}.
\]
Therefore,
\[
\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)
= 150 + \frac{1}{171}\begin{bmatrix} 60 & 72 \end{bmatrix}\begin{bmatrix} 36 & -27 \\ -27 & 25 \end{bmatrix}\begin{bmatrix} 130-120 \\ 90-80 \end{bmatrix}
= 150 + \frac{1}{171}\begin{bmatrix} 216 & 180 \end{bmatrix}\begin{bmatrix} 10 \\ 10 \end{bmatrix}
= 150 + \frac{3960}{171} \approx 173.2
\]
and
\[
\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}
= 400 - \frac{1}{171}\begin{bmatrix} 216 & 180 \end{bmatrix}\begin{bmatrix} 60 \\ 72 \end{bmatrix}
= 400 - \frac{25920}{171} \approx 248.4.
\]
Hence, from Exercise 12.58(a), we have X₁ | X₂ = 130, X₃ = 90 ∼ N(173.2, 248.4).
e) Here we use the partition
\[
\mathbf{X} = \left[\begin{array}{c} X_2 \\ X_3 \\ \hline X_1 \end{array}\right],
\qquad
\boldsymbol{\mu} = \left[\begin{array}{c} 120 \\ 80 \\ \hline 150 \end{array}\right],
\qquad
\Sigma = \left[\begin{array}{cc|c} 25 & 27 & 60 \\ 27 & 36 & 72 \\ \hline 60 & 72 & 400 \end{array}\right].
\]
We see that Σ₂₂⁻¹ = 1/400. Therefore,
\[
\boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)
= \begin{bmatrix} 120 \\ 80 \end{bmatrix} + \frac{1}{400}\begin{bmatrix} 60 \\ 72 \end{bmatrix}(200 - 150)
= \begin{bmatrix} 120 \\ 80 \end{bmatrix} + \frac{50}{400}\begin{bmatrix} 60 \\ 72 \end{bmatrix}
= \begin{bmatrix} 127.5 \\ 89 \end{bmatrix}
\]
and
\[
\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}
= \begin{bmatrix} 25 & 27 \\ 27 & 36 \end{bmatrix} - \frac{1}{400}\begin{bmatrix} 3600 & 4320 \\ 4320 & 5184 \end{bmatrix}
= \begin{bmatrix} 16 & 16.2 \\ 16.2 & 23.04 \end{bmatrix}.
\]
Hence, from Exercise 12.58(a), we have
\[
(X_2, X_3)\,\big|\,X_1 = 200 \;\sim\; N_2\!\left(\begin{bmatrix} 127.5 \\ 89 \end{bmatrix},\; \begin{bmatrix} 16 & 16.2 \\ 16.2 & 23.04 \end{bmatrix}\right).
\]
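Conditional distributions of this kind reduce to a little linear algebra on the partitioned mean vector and covariance matrix, so they are convenient to check with a small helper. The sketch below reproduces the numbers in parts (d) and (e); the helper function is ours, written only for this check.

```python
import numpy as np

mu = np.array([150.0, 120.0, 80.0])
Sigma = np.array([[400.0, 60.0, 72.0],
                  [60.0, 25.0, 27.0],
                  [72.0, 27.0, 36.0]])

def conditional_normal(mu, Sigma, idx1, idx2, x2):
    """Mean and covariance of X[idx1] given X[idx2] = x2 for a multivariate normal."""
    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    w = np.linalg.solve(S22, np.asarray(x2) - mu2)          # Sigma22^{-1} (x2 - mu2)
    cond_mean = mu1 + S12 @ w
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S12.T)
    return cond_mean, cond_cov

print(conditional_normal(mu, Sigma, [0], [1, 2], [130.0, 90.0]))   # about (173.2, 248.4)
print(conditional_normal(mu, Sigma, [1, 2], [0], [200.0]))         # about ((127.5, 89), [[16, 16.2], [16.2, 23.04]])
```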


12.100 a) The required graph is as follows:

b) All Student’s t-distributions have greater spread than the standard normal distribution and, the greater the number of degrees of freedom, the more a Student’s t-distribution resembles the standard normal distribution. c) From the hint, as n → ∞, we have Sn → σ . Hence, for large n, X¯ n − µ X¯ n − µ √ ≈ √ . Sn / n σ/ n From Proposition 12.12, the random variable on the left has the Student’s t-distribution with n − 1 degrees of freedom and, from Proposition 12.9, the random variable on the right has the standard normal distribution. Heuristically, then, a Student’s t-distribution approaches the standard normal distribution as the number of degrees of freedom increases without bound. 12.101 of a) Let U1 and U2 be independent random variables that have the chi-square distributions with degrees N freedom ν1 and ν2 , respectively. From Lemma 12.2 on page 727, the random variable (U1 /ν1 ) (U2 /ν2 ) has the F -distribution with degrees of freedom ν1 and ν2 . Hence, F has the same probability distribution N as (U1 /ν1 ) (U2 /ν2 ). Consequently, 1/F has the same probability distribution as N 1 N = (U2 /ν2 ) (U1 /ν1 ), (U1 /ν1 ) (U2 /ν2 )

which, again, by Lemma 12.2, is the F -distribution with degrees of freedom ν2 and ν1 . b) Let g(w) = 1/w and let Y = g(F ). Note that 1/F = Y . Now, g is defined, strictly decreasing, and differentiable on the range of F , and g −1 (y) = 1/y. We apply the univariate transformation theorem and Equation (12.54) on page 727 to conclude that a PDF of the random variable Y is 1 1 8 fF (w) = w2 fF (w) = =8 fF (1/y) 2 8−1/w 8 y2 ! " (1/y)ν1 /2−1 1 (ν1 /ν2 )ν1 /2 " (ν1 + ν2 )/2 = 2 "(ν +ν )/2 ! "(ν1 /2)"(ν2 /2) y 1 + (ν1 /ν2 )(1/y) 1 2

fY (y) =

1

fF (w) |g ′ (w)|

if y > 0, and fY (y) = 0 otherwise.


However, ! "(ν +ν )/2 (ν2 /ν1 )y 1 2 1 1 (1/y)ν1 /2−1 (1/y)ν1 /2−1 = · ! " ! " ! "(ν +ν )/2 (ν +ν )/2 y 2 y 2 1 + (ν1 /ν2 )(1/y) (ν1 +ν2 )/2 (ν2 /ν1 )y 1 2 1 + (ν1 /ν2 )(1/y) 1 2 y ν2 /2−1 = (ν2 /ν1 )(ν1 +ν2 )/2 ! "(ν +ν )/2 1 + (ν2 /ν1 )y 2 1

and (ν1 +ν2 )/2

(ν2 /ν1 ) Consequently,

! " ! " (ν1 /ν2 )ν1 /2 " (ν1 + ν2 )/2 (ν2 /ν1 )ν2 /2 " (ν2 + ν1 )/2 · = . "(ν1 /2)"(ν2 /2) "(ν2 /2)"(ν1 /2)

! " (ν2 /ν1 )ν2 /2 " (ν2 + ν1 )/2 y ν2 /2−1 fY (y) = ! "(ν +ν )/2 "(ν2 /2)"(ν1 /2) 1 + (ν2 /ν1 )y 2 1

if y > 0, and fY (y) = 0 otherwise. Thus, we see that 1/F (= Y ) has the F -distribution with degrees of freedom ν2 and ν1 . 12.102 For convenience, set a = ν1 /ν2

and

! " (ν1 /ν2 )ν1 /2 " (ν1 + ν2 )/2 b= . "(ν1 /2)"(ν2 /2)

Let g(w) = 1/(1 + aw) and let Y = g(F ). We want to find and identify the probability distribution of Y . Now, g is defined, strictly decreasing, and differentiable on the range of F , and g −1 (y) = (1 − y)/ay. We apply the univariate transformation theorem and Equation (12.54) on page 727 to conclude that a PDF of the random variable Y is fY (y) = =

1 1 (1 + aw)2 bw ν1 /2−1 8 8 fF (w) = f (w) = F ! "(ν +ν )/2 8−a/(1 + aw)2 8 |g ′ (w)| a 1 + aw 1 2 "ν /2−1 b (ν1 +ν2 )/2−2 ! b y (1 − y)/ay 1 = ν /2 y ν2 /2−1 (1 − y)ν1 /2−1 a a 1

if 0 < y < 1, and fY (y) = 0 otherwise. Because a and b are constants, we can now conclude that Y has the beta distribution with parameters ν2 /2 and ν1 /2. However, as a check, we note that ! "N ! " (ν1 /ν2 )ν1 /2 " (ν1 + ν2 )/2 "(ν1 /2)"(ν2 /2) " (ν2 + ν1 )/2 b 1 = = = . /2 /2 ν ν 1 1 "(ν2 /2)"(ν1 /2) B(ν2 /2, ν1 /2) (ν1 /ν2 ) a 12.103 a) From Proposition 12.11 on page 725, we know that χn2 = (n − 1)Sn2 /σ 2 has the chi-square distribution with n − 1 degrees of freedom. Hence, P

&

2 χ(1−γ )/2,n−1

(n − 1)Sn2 2 ≤ ≤ χ(1+γ )/2,n−1 σ2

'

1 2 2 2 2 = P χ(1−γ )/2,n−1 ≤ χn ≤ χ(1+γ )/2,n−1

! 2 " ! 2 " = Fχn2 χ(1+γ )/2,n−1 − Fχn2 χ(1−γ )/2,n−1 =

1−γ 1+γ − = γ. 2 2


Solving for σ 2 in the inequalities in the first term of the preceding display yields % $ 2 (n − 1)S (n − 1)Sn2 n ≤ σ2 ≤ 2 = γ. P 2 χ(1+γ χ )/2,n−1 (1−γ )/2,n−1 b) From part (a) chances are 100γ % that the random interval from (n − 1)Sn2 2 χ(1+γ )/2,n−1

to

(n − 1)Sn2 2 χ(1−γ )/2,n−1

contains σ 2 . Hence, we can be 100γ % confident that any such computed interval contains σ 2 . In other words, the interval from (n − 1)sn2 2 χ(1+γ )/2,n−1

to

(n − 1)sn2 , 2 χ(1−γ )/2,n−1

where sn2 is the observed value of Sn2 , is a 100γ % confidence interval for σ 2 . 12.104 a) We have γ = 0.95 so that (1 − γ )/2 = 0.025 and (1 + γ )/2 = 0.975. Because n = 12, 2 2 χ(1−γ )/2,n−1 = χ0.025,11 = 3.816

and

2 2 χ(1+γ )/2,n−1 = χ0.975,11 = 21.920.

From the data, sn2 = 1.148. Referring now to Exercise 12.103(b), we conclude that a 95% confidence interval for σ 2 is from (12 − 1) · 1.148 (12 − 1) · 1.148 to , 21.920 3.816 or from 0.58 to 3.31. Taking square roots, we find that a 95% confidence interval for the standard deviation of highway gas mileages for all cars of the model and year in question is from 0.76 mpg to 1.82 mpg. b) We can be 95% confident that the standard deviation of highway gas mileages for all cars of the model and year in question is somewhere between 0.76 mpg and 1.82 mpg. c) From Proposition 12.11 on page 725, we must assume that highway gas mileages for all cars of the model and year in question are normally distributed. Technically, for the independence assumption in Proposition 12.11, we must, in addition, assume that the sampling is with replacement. However, because the sample size is small relative to the population size, sampling without replacement is also acceptable. 12.105 a) We begin by noting that µD = E(D) = E(X − Y ) = E(X) − E(Y ) = µX − µY . Next we note that, if (X1 , Y1 ), . . . , (Xn , Yn ) are independent random vectors, then the random variables D1 , . . . , Dn , where Dj = Xj − Yj for 1 ≤ j ≤ n, are also independent. Hence, from Proposition 12.12 on page 726, D¯ n − (µX − µY ) D¯ n − µD Tn = = √ √ Sn / n Sn / n has the Student’s t-distribution with n − 1 degrees of freedom.


Using the fact that a t-distribution is continuous and symmetric, we conclude that &

' D¯ n − (µX − µY ) P −t(1+γ )/2,n−1 ≤ ≤ t(1+γ )/2,n−1 √ Sn / n ! " = P −t(1+γ )/2,n−1 ≤ Tn ≤ t(1+γ )/2,n−1 " ! " ! = FTn t(1+γ )/2,n−1 − FTn −t(1+γ )/2,n−1 ! " 1+γ = 2FTn t(1+γ )/2,n−1 − 1 = 2 · − 1 = γ. 2

Solving for µX − µY in the inequalities in the first term in the preceding display yields

1 √ √ 2 P D¯ n − t(1+γ )/2,n−1 · Sn / n ≤ µX − µY ≤ D¯ n + t(1+γ )/2,n−1 · Sn / n = γ .

b) From part (a), chances are 100γ % that the random interval from √ D¯ n − t(1+γ )/2,n−1 · Sn / n

to

√ D¯ n + t(1+γ )/2,n−1 · Sn / n

contains µX − µY . Thus, we can be 100γ % confident that any such computed interval contains µX − µY . In other words, the interval from √ d¯n − t(1+γ )/2,n−1 · sn / n

to

√ d¯n + t(1+γ )/2,n−1 · sn / n,

where d̄ₙ and sₙ are the observed values of D̄ₙ and Sₙ, respectively, is a 100γ% confidence interval for µ_X − µ_Y.
c) Yes, because a linear combination of multivariate normal random variables is normally distributed. In particular, then, if X and Y are bivariate normal random variables, then their difference, D = X − Y, which is a linear combination of X and Y, is normally distributed, as required.
12.106
a) We have γ = 0.95 so that (1 + γ)/2 = 0.975. As n = 10, t_{(1+γ)/2, n−1} = t_{0.975, 9} = 2.262. Also, from the data, d̄ₙ = 3.3 and sₙ = 4.715. Referring now to Exercise 12.105(b), we conclude that a 95% confidence interval for µ_X − µ_Y has endpoints
\[
3.3 \pm 2.262\cdot 4.715/\sqrt{10},
\]
or from −0.07 yr to 6.67 yr. We can be 95% confident that the difference between the mean age of married men and the mean age of married women is somewhere between −0.07 yr and 6.67 yr.
b) As we know, the mean of the difference between two random variables is the difference between the means of the two random variables. Consequently, in view of part (a), a 95% confidence interval for the mean age difference of married couples is from −0.07 yr to 6.67 yr. We can be 95% confident that the mean age difference of married couples is somewhere between −0.07 yr and 6.67 yr.
c) No, it's only required that the differences between the ages of married couples be normally distributed.
d) No, we can't necessarily conclude that the differences between the ages of married couples are normally distributed just because both the ages of married men and the ages of married women are normally distributed. Indeed, the difference between two normally distributed random variables is not always normally distributed. An example is provided by the random variables considered in Exercise 10.113.
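The paired-t interval of part (a) can be reproduced with a few lines of code. The sketch below works from the summary statistics quoted in the solution (the raw age data are not reproduced here), using SciPy's t quantile function.

```python
import math
from scipy.stats import t

def paired_t_ci(dbar, s_d, n, gamma):
    """Confidence interval for the mean of paired differences."""
    half = t.ppf((1 + gamma) / 2, n - 1) * s_d / math.sqrt(n)
    return dbar - half, dbar + half

print(paired_t_ci(3.3, 4.715, 10, 0.95))   # about (-0.07, 6.67) years
```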


Theory Exercises 12.107 Let n1 ≤ n2 ≤ · · · ≤ nm be nonnegative integers. Using the independent-increments and distributional properties of a Poisson process, we get " ! P N(t1 ) = n1 , . . . , N(tm ) = nm ! " = P N(t1 ) = n1 , N(t2 ) − N(t1 ) = n2 − n1 , . . . , N(tm ) − N(tm−1 ) = nm − nm−1 ! " ! " ! " = P N(t1 ) = n1 P N(t2 ) − N(t1 ) = n2 − n1 · · · P N(tm ) − N(tm−1 ) = nm − nm−1 ! "n1 ! "n2 −n1 ! "nm −nm−1 −λt1 λt1 −λ(t2 −t1 ) λ(t2 − t1 ) −λ(tm −tm−1 ) λ(tm − tm−1 ) =e e ···e n1 ! (n2 − n1 )! (nm − nm−1 )! =e

n1 −λtm nm t1

(t2 − t1 )n2 −n1 (tm − tm−1 )nm −nm−1 ··· . n1 ! (n2 − n1 )! (nm − nm−1 )!

λ

·

Hence, with the convention that x0 = 0 and t0 = 0, we have pN(t1 ),...,N(tm ) (x1 , . . . , xm ) = e−λtm λxm

m + (tk − tk−1 )xk −xk−1 (xk − xk−1 )! k=1

if x1 , . . . , xm are nonnegative integers with x1 ≤ · · · ≤ xm , and pN(t1 ),...,N(tm ) (x1 , . . . , xm ) = 0 otherwise. 12.108 We prove that (a) ⇒ (d) ⇒ (c) ⇒ (b) ⇒ (a).

(a) ⇒ (d): This result is true because mutually independent random variables are necessarily pairwise independent. (d) ⇒!(c): Let " i ̸= j . From ! (d), " we know!that Xi "and Xj are independent random variables and, hence, that E Xi Xj = E(Xi ) E Xj . Thus, Cov Xi , Xj = 0 and, therefore, Xi and Xj are uncorrelated. " ! (c) ⇒ (b): Because Cov Xi , Xj = 0 for i ̸= j , the off-diagonal entries of ! all equal 0. Hence, ! is a diagonal matrix. (b) ⇒ (a): Let Y1 , . . . , Ym be independent normal random variables such that E(Yi ) = E(Xi ) and Var(Yi ) = Var(Xi ) for all 1 ≤ i ≤ m. As we know, because they are independent normal random variables, Y1 , . . . , Ym are multivariate normal random variables with a diagonal covariance matrix. By assumption, the covariance matrix of X1 , . . . , Xm is also diagonal. Furthermore, because Var(Yi ) = Var(Xi ) for 1 ≤ i ≤ m, the diagonal entries of the two covariance matrices are equal. Thus, !Y = !X . Also, because E(Yi ) = E(Xi ) for 1 ≤ i ≤ m, we have µY = µX . Hence, we have shown that X1 , . . . , Xm and Y1 , . . . , Ym have the same mean vector and covariance matrix. As the mean vector and covariance matrix of multivariate normal random variables determine the MGF and, therefore, the joint distribution, we conclude that X1 , . . . , Xm are independent random variables.

Advanced Exercises 12.109 Let 0 ≤ a < b. Then, because λ > 0,

∞ ∞ ! " # ! " # ! " 0 < λ(b − a) = E N(b) − N(a) = nP N(b) − N(a) = n = nP N(b) − N(a) = n . n=0

n=1

! " ! " Thus, there is an n ∈ N such that P N(b) − N(a) = n > 0 and, hence, P N(b) − N(a) ≥ 1 > 0. Now let 0 ≤ s < t < ∞, let M be a positive real number, and let N = ⌈M⌉. Consider the parti-


tion s = t0 < t1 < · · · < tN = t of the interval (s, t], where tj = s + j (t − s)/N for 0 ≤ j ≤ N . Applying the domination principle and the assumed independent increments, we get &* ' N ( ! " ! " ) P N(t) − N(s) ≥ M ≥ P N(t) − N(s) ≥ N ≥ P N(tj ) − N(tj −1 ) ≥ 1 j =1

=

N +

j =1

! " P N(tj ) − N(tj −1 ) ≥ 1 > 0.

Consequently, the random variable N(t) − N(s) is unbounded. 12.110 ( ) a) Let X ∼ U(0, t). For 1 ≤ j ≤ m, let Ej = X ∈ (tj −1 , tj ] . We note that E1 , . . . , Em are mutually exclusive and exhaustive events with probabilities p1 , . . . , pm , where ! " tj − tj −1 pj = P (Ej ) = P X ∈ (tj −1 , tj ] = , t

1 ≤ j ≤ m.

Thus, we see that, in n independent repetitions of the random experiment, Y1 , . . . , Ym give the number of times that events E1 , . . . , Em occur, respectively. Hence, Y1 , . . . , Ym have the multinomial distribution with parameters n and p1 , . . . , pm . Referring now to Proposition 6.7 on page 277, we conclude that & ' & ' n n (t1 − t0 )y1 · · · (tm − tm−1 )ym y1 ym pY1 ,...,Ym (y1 , . . . , ym ) = p1 · · · pm = y1 , . . . , ym y1 , . . . , ym tn if y1 , . . . , ym are nonnegative integers whose sum is n, and pY1 ,...,Ym (y1 , . . . , ym ) = 0 otherwise. b) Let n1 , . . . , nm be nonnegative integers whose sum is n. Note that, if Z1 = n1 , . . . , Zm = nm , then N(t) = n. Hence, from the conditional probability rule and the independent increments of a Poisson process, we get ! " P Z1 = n1 , . . . , Zm = nm | N(t) = n ! " = P N(t1 ) − N(t0 ) = n1 , . . . , N(tm ) − N(tm−1 ) = nm | N(t) = n " ! P N(t1 ) − N(t0 ) = n1 , . . . , N(tm ) − N(tm−1 ) = nm , N(t) = n ! " = P N(t) = n ! " P N(t1 ) − N(t0 ) = n1 , . . . , N(tm ) − N(tm−1 ) = nm ! " = P N(t) = n ! " ! " P N(t1 ) − N(t0 ) = n1 · · · P N(tm ) − N(tm−1 ) = nm ! " = P N(t) = n "n1 "nm ! ! −λ(t1 −t0 ) λ(t1 − t0 ) −λ(tm −tm−1 ) λ(tm − tm−1 ) e ···e n1 ! nm ! = n (λt) e−λt n! ' & n 1 (t1 − t0 ) · · · (tm − tm−1 )nm n . = tn n1 , . . . , nm


Hence,

&

' n (t1 − t0 )z1 · · · (tm − tm−1 )zm pZ1 ,...,Zm | N(t) (z1 , . . . , zm | n) = z1 , . . . , zm tn if z1 , . . . , zm are nonnegative integers whose sum is n, and pZ1 ,...,Zm (z1 , . . . , zm ) = 0 otherwise. c) The joint distributions in parts (a) and (b) are identical. Thus, for a Poisson process, given that exactly n events occur by time t, their times of occurrence, considered as unordered random variables, are the same as those obtained by selecting n times uniformly and independently from the interval (0, t]. 12.111 Conditions (1), (2), and (3) are part of the definition of a Poisson process. Applying L’Hôpital’s rule, we find that 1 2 e−λh − 1 + λh lim = lim −λe−λh + λ = 0. h→0 h→0 h −λh −λh − 1 + λh = o(h) or, equivalently, e = 1 − λh + o(h). Therefore, Hence, e ! " (λh)1 P N(h) = 1 = e−λh = λh − (λh)2 + λh · o(h). 1!

Now,

& ' 1 2 −(λh)2 + λh · o(h) o(h) 2 2 lim = lim −λ h + λ · o(h) = lim −λ h + λh · = 0, h→0 h→0 h→0 h h ! " which means that −(λh)2 + λh · o(h) = o(h). Hence, we see that P N(h) = 1 = λh + o(h), so that ! " condition (4) holds. To verify condition (5), we first note that P N(h) = 0 = e−λh = 1 − λh + o(h). Hence, by the complementation rule, ! " ! " ! " P N(h) ≥ 2 = 1 − P N(h) = 0 − P N(h) = 1 ! " ! " = 1 − 1 − λh + o(h) − λh + o(h) = o(h).

12.112
a) Conditions (1) and (2) are the same as conditions (a) and (b) of the definition of a Poisson process (Definition 12.1 on page 688). Suppose that we have shown that
$$
N(t) \sim P(\lambda t), \qquad t > 0. \qquad (*)
$$
By condition (3), we know that {N(t) : t ≥ 0} has stationary increments. Thus, N(t) − N(s) has the same probability distribution as N(t − s) − N(0). But, by condition (1) and (*),
$$
N(t - s) - N(0) = N(t - s) \sim P\bigl(\lambda(t - s)\bigr).
$$
Hence, $N(t) - N(s) \sim P\bigl(\lambda(t - s)\bigr)$, which is condition (c) of the definition of a Poisson process. Consequently, conditions (1)–(5) and (*) imply that conditions (a)–(c) of Definition 12.1 hold; that is, {N(t) : t ≥ 0} is a Poisson process with rate λ.
b) Applying the law of total probability and then using conditions (1), (2), and (3), we get
$$
\begin{aligned}
P_n(t + h) = P\bigl(N(t + h) = n\bigr)
&= \sum_{k=0}^{\infty} P\bigl(N(t + h) = n \mid N(t) = k\bigr) P\bigl(N(t) = k\bigr) \\
&= \sum_{k=0}^{n} P\bigl(N(t + h) - N(t) = n - k \mid N(t) = k\bigr) P\bigl(N(t) = k\bigr) \\
&= \sum_{k=0}^{n-2} P\bigl(N(h) = n - k\bigr) P_k(t) + P\bigl(N(h) = 1\bigr) P_{n-1}(t) + P\bigl(N(h) = 0\bigr) P_n(t).
\end{aligned}
$$


Now, from conditions (4) and (5) and the complementation rule,
$$
P\bigl(N(h) = 0\bigr) = 1 - P\bigl(N(h) = 1\bigr) - P\bigl(N(h) \ge 2\bigr)
= 1 - \lambda h + o(h) - o(h) = 1 - \lambda h + o(h).
$$
Moreover,
$$
0 \le \sum_{k=0}^{n-2} P\bigl(N(h) = n - k\bigr) P_k(t)
\le \sum_{k=0}^{n-2} P\bigl(N(h) = n - k\bigr)
= \sum_{j=2}^{n} P\bigl(N(h) = j\bigr)
\le P\bigl(N(h) \ge 2\bigr).
$$
Hence, by condition (5),
$$
\sum_{k=0}^{n-2} P\bigl(N(h) = n - k\bigr) P_k(t) = o(h).
$$
Therefore, we have shown that
$$
P_n(t + h) = o(h) + \bigl(\lambda h + o(h)\bigr) P_{n-1}(t) + \bigl(1 - \lambda h + o(h)\bigr) P_n(t)
= \lambda h P_{n-1}(t) + (1 - \lambda h) P_n(t) + o(h).
$$
c) We can write the result of part (b) as
$$
P_n(t + h) - P_n(t) = -\lambda h P_n(t) + \lambda h P_{n-1}(t) + o(h).
$$
Therefore,
$$
P_n'(t) = \lim_{h \to 0} \frac{P_n(t + h) - P_n(t)}{h}
= \lim_{h \to 0} \left( -\lambda P_n(t) + \lambda P_{n-1}(t) + \frac{o(h)}{h} \right)
= -\lambda P_n(t) + \lambda P_{n-1}(t).
$$

d) We want to prove that
$$
P_n(t) = e^{-\lambda t}\,\frac{(\lambda t)^n}{n!}, \qquad n \ge 0. \qquad (**)
$$
We proceed by mathematical induction. First we establish Equation (**) for n = 0. From the general multiplication rule and conditions (2) and (3),
$$
\begin{aligned}
P_0(t + h) = P\bigl(N(t + h) = 0\bigr)
&= P\bigl(N(t + h) = 0,\; N(t) = 0\bigr) \\
&= P\bigl(N(t + h) = 0 \mid N(t) = 0\bigr) P_0(t)
= P\bigl(N(t + h) - N(t) = 0 \mid N(t) = 0\bigr) P_0(t) \\
&= P\bigl(N(h) = 0\bigr) P_0(t)
= \bigl(1 - \lambda h + o(h)\bigr) P_0(t)
= (1 - \lambda h) P_0(t) + o(h).
\end{aligned}
$$
Thus,
$$
P_0'(t) = \lim_{h \to 0} \frac{P_0(t + h) - P_0(t)}{h}
= \lim_{h \to 0} \frac{-\lambda h P_0(t) + o(h)}{h} = -\lambda P_0(t).
$$
From calculus, the solution to the differential equation $P_0'(t) = -\lambda P_0(t)$ is $P_0(t) = c e^{-\lambda t}$, where c is a constant. From condition (1), $P_0(0) = P\bigl(N(0) = 0\bigr) = 1$ and, hence, c = 1. We have therefore established Equation (**) for n = 0. Assume now that Equation (**) holds for n − 1. We use part (c) to prove that it holds for n. By part (c) and the induction assumption,
$$
P_n'(t) + \lambda P_n(t) = \lambda P_{n-1}(t)
= \lambda e^{-\lambda t}\,\frac{(\lambda t)^{n-1}}{(n - 1)!}
= \frac{\lambda^n}{(n - 1)!}\, e^{-\lambda t} t^{n-1}.
$$


Multiplying both sides of the preceding result by $e^{\lambda t}$ yields
$$
e^{\lambda t} P_n'(t) + \lambda e^{\lambda t} P_n(t) = \frac{\lambda^n t^{n-1}}{(n - 1)!}
$$
or, equivalently,
$$
\frac{d}{dt}\bigl(e^{\lambda t} P_n(t)\bigr) = \frac{\lambda^n}{(n - 1)!}\, t^{n-1}.
$$
Integration now gives
$$
e^{\lambda t} P_n(t) = \frac{\lambda^n}{(n - 1)!} \cdot \frac{t^n}{n} + k = \frac{(\lambda t)^n}{n!} + k,
$$
where k is a constant. Setting t = 0 and using condition (1) shows that k = 0. Therefore, we have
$$
P_n(t) = e^{-\lambda t}\,\frac{(\lambda t)^n}{n!},
$$
as required.
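One can also confirm the induction's conclusion numerically. The following sketch is illustrative only (the rate, horizon, truncation level, and step count are arbitrary choices): it integrates the system of differential equations from part (c) by forward Euler, starting from $P_0(0) = 1$, and compares the result with $e^{-\lambda T}(\lambda T)^n/n!$.

```python
import math
import numpy as np

lam, T, nmax, steps = 1.5, 2.0, 30, 20_000
dt = T / steps

P = np.zeros(nmax + 1)
P[0] = 1.0                                   # condition (1): P_0(0) = 1, P_n(0) = 0 for n >= 1
for _ in range(steps):
    Pprev = np.concatenate(([0.0], P[:-1]))  # shifted copy, treating P_{-1}(t) as 0
    P = P + dt * (-lam * P + lam * Pprev)    # Euler step for P_n' = -lam*P_n + lam*P_{n-1}

exact = np.array([math.exp(-lam * T) * (lam * T) ** n / math.factorial(n)
                  for n in range(nmax + 1)])
print("max |Euler - exact| over 0 <= n <= 30:", np.max(np.abs(P - exact)))
```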

12.113
a) For each $n \in \mathcal{N}$, the nth interarrival interval is $[W_{n-1}, W_n]$, which has length $W_n - W_{n-1} = I_n$. As we know, $I_n$ has the exponential distribution with parameter λ and, therefore, has mean 1/λ. Hence, we might guess that E(Z), the expected length of the interarrival interval that contains t, equals 1/λ. But note that the interarrival interval that contains t is $[W_{N(t)}, W_{N(t)+1}]$, which has length $W_{N(t)+1} - W_{N(t)} = I_{N(t)+1}$. Thus, in fact, $Z = I_{N(t)+1}$ and, due to the random subscript in $I_{N(t)+1}$, the mean of Z may not be 1/λ. In other words, our guess may be incorrect.
b) From the lack-of-memory property of the exponential distribution, it follows that the distribution of X is the same as that of $I_1$, which is exponential with parameter λ. Thus, E(X) = 1/λ.
c) Let 0 ≤ y < t. We note that Y > y if and only if no events occur in the interval from t − y to t. Consequently,
$$
P(Y > y) = P\bigl(N(t) - N(t - y) = 0\bigr) = e^{-\lambda y}.
$$
Hence, the CDF of Y is given by
$$
F_Y(y) =
\begin{cases}
0, & \text{if } y < 0;\\
1 - e^{-\lambda y}, & \text{if } 0 \le y < t;\\
1, & \text{if } y \ge t.
\end{cases}
$$
Applying Proposition 10.5, we find that
$$
E(Y) = \int_0^{\infty} P(Y > y)\,dy = \int_0^{t} e^{-\lambda y}\,dy = \frac{1}{\lambda}\bigl(1 - e^{-\lambda t}\bigr).
$$

d) We note that Z = X + Y. Hence, from parts (b) and (c),
$$
E(Z) = E(X + Y) = E(X) + E(Y) = \frac{1}{\lambda} + \frac{1}{\lambda}\bigl(1 - e^{-\lambda t}\bigr) = \frac{1}{\lambda}\bigl(2 - e^{-\lambda t}\bigr).
$$
This result shows that our guess in part (a) is incorrect except when t = 0. Moreover, for large t, we see that E(Z) is roughly 2/λ, twice that of the guess made in part (a).
e) From part (d), the expected length of the interarrival interval that contains t is, for large t, roughly 2/λ, which is twice 1/λ, the common expected length of the interarrival intervals.

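A short simulation makes the inspection paradox of parts (d) and (e) concrete. The sketch below is illustrative only (λ = 1, t = 4, and the number of runs are arbitrary choices): the length of the interarrival interval covering t averages close to $(2 - e^{-\lambda t})/\lambda$ rather than 1/λ.

```python
import math
import numpy as np

rng = np.random.default_rng(7)
lam, t, runs = 1.0, 4.0, 100_000

lengths = []
for _ in range(runs):
    w = 0.0
    while True:
        i = rng.exponential(1 / lam)     # next interarrival time
        if w + i > t:                    # the interval [w, w + i] covers time t
            lengths.append(i)
            break
        w += i

print("simulated E(Z):", np.mean(lengths))
print("theoretical   :", (2 - math.exp(-lam * t)) / lam)
```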
12.114
a) At time t, N(t) events have occurred and, thus, the time of occurrence of the N(t)th event must be at or before t; hence, $W_{N(t)} \le t$. At time t, the $\bigl(N(t) + 1\bigr)$st event has yet to occur; hence, $t < W_{N(t)+1}$. Consequently, we have $W_{N(t)} \le t < W_{N(t)+1}$.


b) We can write $W_n = I_1 + \cdots + I_n$. As $I_1, I_2, \dots$ are independent and identically distributed $\mathcal{E}(\lambda)$ random variables, which have common mean 1/λ, the strong law of large numbers implies that
$$
\lim_{n \to \infty} \frac{W_n}{n} = \lim_{n \to \infty} \frac{I_1 + \cdots + I_n}{n} = \frac{1}{\lambda},
$$
with probability 1.
c) We note that N(t) is integer valued and nondecreasing as a function of t. Let $E = \{\lim_{t \to \infty} N(t) = \infty\}$ and, for each $n, M \in \mathcal{N}$, let $E_{Mn} = \{N(n) \ge M\}$. It follows that
$$
E = \bigcap_{M=1}^{\infty} \bigcup_{n=1}^{\infty} E_{Mn}.
$$
Fix $M \in \mathcal{N}$. Then $E_{M1} \subset E_{M2} \subset \cdots$. Therefore, by the continuity property of a probability measure (Proposition 2.11 on page 74),
$$
P\Bigl(\bigcup_{n=1}^{\infty} E_{Mn}\Bigr) = \lim_{n \to \infty} P(E_{Mn}).
$$
However, because $N(t) \sim P(\lambda t)$, we have, by the complementation rule and the FPF,
$$
P(E_{Mn}) = P\bigl(N(n) \ge M\bigr) = 1 - P\bigl(N(n) < M\bigr) = 1 - \sum_{k=0}^{M-1} e^{-\lambda n}\,\frac{(\lambda n)^k}{k!}.
$$
Therefore, by L'Hôpital's rule,
$$
\lim_{n \to \infty} P(E_{Mn})
= \lim_{n \to \infty} \left( 1 - \sum_{k=0}^{M-1} e^{-\lambda n}\,\frac{(\lambda n)^k}{k!} \right)
= 1 - \sum_{k=0}^{M-1} \frac{\lambda^k}{k!} \left( \lim_{n \to \infty} \frac{n^k}{e^{\lambda n}} \right)
= 1 - 0 = 1.
$$
It now follows that $P\bigl(\bigcup_{n=1}^{\infty} E_{Mn}\bigr) = 1$ for each $M \in \mathcal{N}$ and, consequently, because a countable intersection of events with probability 1 has probability 1, we have that
$$
P\Bigl(\lim_{t \to \infty} N(t) = \infty\Bigr) = P(E) = P\Bigl(\bigcap_{M=1}^{\infty} \bigcup_{n=1}^{\infty} E_{Mn}\Bigr) = 1.
$$
d) Applying parts (b) and (c), we see that, with probability 1,
$$
\lim_{t \to \infty} \frac{W_{N(t)}}{N(t)} = \frac{1}{\lambda}
$$
and that
$$
\lim_{t \to \infty} \frac{W_{N(t)+1}}{N(t)}
= \lim_{t \to \infty} \frac{W_{N(t)+1}}{N(t)+1} \cdot \frac{N(t)+1}{N(t)}
= \frac{1}{\lambda} \cdot 1 = \frac{1}{\lambda}.
$$
From part (a),
$$
\frac{W_{N(t)}}{N(t)} \le \frac{t}{N(t)} < \frac{W_{N(t)+1}}{N(t)}.
$$
In view of the preceding three displays, we conclude that, with probability 1,
$$
\frac{1}{\lambda} = \lim_{t \to \infty} \frac{W_{N(t)}}{N(t)}
\le \lim_{t \to \infty} \frac{t}{N(t)}
\le \lim_{t \to \infty} \frac{W_{N(t)+1}}{N(t)} = \frac{1}{\lambda}.
$$
Hence, $\lim_{t \to \infty} N(t)/t = \lambda$ with probability 1.
e) The result of part (d) shows that, with probability 1, the rate of occurrence of the specified event is λ.
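The convergence in part (d) is easy to visualize along a single simulated path. The sketch below is illustrative only (the rate λ = 2.5, the horizon, and the checkpoints are arbitrary choices): it builds one long sequence of arrival times and prints N(t)/t at several values of t.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, horizon = 2.5, 10_000.0

# One long path of arrival times W_1 < W_2 < ... extending past the horizon.
arrivals = np.cumsum(rng.exponential(1 / lam, size=int(3 * lam * horizon)))
for t in (10, 100, 1_000, 10_000):
    n_t = np.searchsorted(arrivals, t, side="right")   # N(t) = number of arrivals <= t
    print(f"t = {t:>6}:  N(t)/t = {n_t / t:.4f}   (lambda = {lam})")
```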


12.115
Note: In this exercise, we use the standard conventions $\lambda_{-1} = 0$ and $\mu_0 = 0$.
a) To begin, suppose that $X \sim \mathcal{E}(a)$ so that $P(X > h) = e^{-ah}$. Referring to the hint in Exercise 12.111, we see that $P(X > h) = 1 - ah + o(h)$ and $P(X \le h) = ah + o(h)$. Now suppose that $Y \sim \mathcal{E}(b)$ and is independent of X. Then
$$
P(X \le h,\, Y \le h) = \bigl(ah + o(h)\bigr)\bigl(bh + o(h)\bigr) = o(h),
$$
$$
P(X \le h,\, Y > h) = \bigl(ah + o(h)\bigr)\bigl(1 - bh + o(h)\bigr) = ah + o(h),
$$
and
$$
P(X > h,\, Y > h) = \bigl(1 - ah + o(h)\bigr)\bigl(1 - bh + o(h)\bigr) = 1 - ah - bh + o(h).
$$
Next, recall that, for a birth-and-death queueing system, successive interarrival times are independent exponential random variables, successive service times are independent exponential random variables, and arrival and service times are independent. Therefore, from the lack-of-memory property of an exponential random variable and the preceding three displays, we get the following results: If the current state of the system is k, then, in a time interval of length h,
● the probability of more than one transition (arrival or departure) is o(h),
● the probability of one arrival and no departures is $\lambda_k h + o(h)$,
● the probability of no arrivals and one departure is $\mu_k h + o(h)$, and
● the probability of no arrivals and no departures is $1 - \lambda_k h - \mu_k h + o(h)$.
Therefore, from the law of total probability and the bulleted list, we have, for n ≥ 0,
$$
\begin{aligned}
P_n(t + h) = P\bigl(X(t + h) = n\bigr)
&= \sum_{k=0}^{\infty} P\bigl(X(t + h) = n \mid X(t) = k\bigr) P\bigl(X(t) = k\bigr) \\
&= \sum_{k=0}^{n-2} P\bigl(X(t + h) = n \mid X(t) = k\bigr) P_k(t) + \bigl(\lambda_{n-1} h + o(h)\bigr) P_{n-1}(t) \\
&\qquad + \bigl(1 - \lambda_n h - \mu_n h + o(h)\bigr) P_n(t) + \bigl(\mu_{n+1} h + o(h)\bigr) P_{n+1}(t) \\
&\qquad + \sum_{k=n+2}^{\infty} P\bigl(X(t + h) = n \mid X(t) = k\bigr) P_k(t) \\
&= \lambda_{n-1} h P_{n-1}(t) + \bigl(1 - (\lambda_n + \mu_n) h\bigr) P_n(t) + \mu_{n+1} h P_{n+1}(t) + o(h).
\end{aligned}
$$

b) We can write the result of part (a) as
$$
P_n(t + h) - P_n(t) = \lambda_{n-1} h P_{n-1}(t) - (\lambda_n + \mu_n) h P_n(t) + \mu_{n+1} h P_{n+1}(t) + o(h).
$$
Hence, for n ≥ 0,
$$
\begin{aligned}
P_n'(t) &= \lim_{h \to 0} \frac{P_n(t + h) - P_n(t)}{h} \\
&= \lim_{h \to 0} \left( \lambda_{n-1} P_{n-1}(t) - (\lambda_n + \mu_n) P_n(t) + \mu_{n+1} P_{n+1}(t) + \frac{o(h)}{h} \right) \\
&= \lambda_{n-1} P_{n-1}(t) - (\lambda_n + \mu_n) P_n(t) + \mu_{n+1} P_{n+1}(t).
\end{aligned}
$$


c) Assuming that a steady-state distribution, $\{P_n : n \ge 0\}$, exists, we have
$$
\lim_{t \to \infty} P_n(t) = P_n, \qquad n \ge 0.
$$
Heuristically, then, for n ≥ 0,
$$
\lim_{t \to \infty} P_n'(t) = \lim_{t \to \infty} \frac{d}{dt} P_n(t)
= \frac{d}{dt}\Bigl(\lim_{t \to \infty} P_n(t)\Bigr) = \frac{d}{dt} P_n = 0.
$$
Thus, in view of part (b), we have, for n ≥ 0,
$$
\begin{aligned}
0 = \lim_{t \to \infty} P_n'(t)
&= \lim_{t \to \infty} \bigl( \lambda_{n-1} P_{n-1}(t) - (\lambda_n + \mu_n) P_n(t) + \mu_{n+1} P_{n+1}(t) \bigr) \\
&= \lambda_{n-1} \lim_{t \to \infty} P_{n-1}(t) - (\lambda_n + \mu_n) \lim_{t \to \infty} P_n(t) + \mu_{n+1} \lim_{t \to \infty} P_{n+1}(t) \\
&= \lambda_{n-1} P_{n-1} - (\lambda_n + \mu_n) P_n + \mu_{n+1} P_{n+1},
\end{aligned}
$$
or, equivalently, for n ≥ 0,
$$
\lambda_{n-1} P_{n-1} + \mu_{n+1} P_{n+1} = (\lambda_n + \mu_n) P_n,
$$
which are the balance equations.
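The balance equations can be checked numerically for any specific birth-and-death chain. The sketch below is illustrative only (the constant rates and the truncation level are arbitrary choices, not taken from the text): it builds the steady-state distribution from the usual product formula $P_n = P_0 \prod_{k=1}^{n} (\lambda_{k-1}/\mu_k)$ and verifies that it satisfies the balance equations.

```python
import numpy as np

N = 50                                          # truncation level (assumed large enough)
lam = np.full(N + 1, 0.8)                       # birth rates lambda_0, ..., lambda_N
mu = np.concatenate(([0.0], np.full(N, 1.0)))   # death rates mu_0 = 0, mu_1, ..., mu_N

P = np.ones(N + 1)
for n in range(1, N + 1):
    P[n] = P[n - 1] * lam[n - 1] / mu[n]        # product formula
P /= P.sum()                                    # normalize so the probabilities sum to 1

# Check lambda_{n-1} P_{n-1} + mu_{n+1} P_{n+1} = (lambda_n + mu_n) P_n for 1 <= n <= N-1.
lhs = lam[:-2] * P[:-2] + mu[2:] * P[2:]
rhs = (lam[1:-1] + mu[1:-1]) * P[1:-1]
print("max balance-equation error:", np.max(np.abs(lhs - rhs)))
```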

12.116
We begin by showing that (univariate and multivariate) marginals of nonsingular normal random variables are nonsingular normal random variables. From Proposition 12.8 on page 719, such marginals are multivariate normal. They are nonsingular because, as we know from Chapter 9, marginals of random variables with a joint PDF also have a joint PDF. Now we proceed to the problem at hand. Because B has rank k ≤ m, linear algebra tells us that we can add m − k rows to the bottom of B to form an m × m nonsingular matrix A. Let us also add any m − k numbers to the bottom of a to form an m × 1 vector b. From Exercise 12.96, we know that W = b + AX has a nonsingular multivariate normal distribution. Hence, the random vector consisting of the first k rows of W, which is Y = a + BX, has a nonsingular multivariate normal distribution.
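The rank-extension step can be illustrated numerically. In the sketch below (illustrative only; the random matrix B and the covariance Σ are arbitrary choices, not from the text), a full-row-rank k × m matrix B is completed to a nonsingular m × m matrix A by appending standard-basis rows, and the covariance of Y = a + BX, namely BΣBᵀ, is confirmed to be nonsingular.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 5, 3
B = rng.normal(size=(k, m))                  # has rank k with probability 1
Sigma = np.eye(m) + 0.3 * np.ones((m, m))    # a nonsingular covariance matrix for X

# Greedily append rows of the identity that keep increasing the rank.
A = B.copy()
for e in np.eye(m):
    if A.shape[0] == m:
        break
    candidate = np.vstack([A, e])
    if np.linalg.matrix_rank(candidate) > np.linalg.matrix_rank(A):
        A = candidate

print("A is nonsingular:", abs(np.linalg.det(A)) > 1e-10)
print("cov(Y) = B Sigma B^T is nonsingular:",
      abs(np.linalg.det(B @ Sigma @ B.T)) > 1e-10)
```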

12.117
For an M/M/∞ queueing system, we have $\lambda_n = \lambda$ for all n ≥ 0. Hence,
$$
\bar{\lambda} = \sum_{n=0}^{\infty} \lambda_n P_n = \lambda \sum_{n=0}^{\infty} P_n = \lambda.
$$
We know that L = λ/μ. Hence, from Exercise 12.36(b),
$$
L_q = L - \frac{\bar{\lambda}}{\mu} = \frac{\lambda}{\mu} - \frac{\lambda}{\mu} = 0.
$$
The expected number of customers in the queue is 0, which makes sense because, for an M/M/∞ queue, there is never any queue due to the infinite number of servers. Now, applying Little's formulas, we get
$$
W = \frac{L}{\bar{\lambda}} = \frac{\lambda/\mu}{\lambda} = \frac{1}{\mu}
\qquad \text{and} \qquad
W_q = \frac{L_q}{\bar{\lambda}} = \frac{0}{\lambda} = 0.
$$

The expected time that a customer spends in the queueing system is 1/µ, which makes sense because, for an M/M/∞ queue, the time that a customer spends in the queueing system is just the duration of service. Also, the expected time a customer waits in the queue is 0, which makes sense because, for an M/M/∞ queue, there is never any queue due to the infinite number of servers.
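These formulas are straightforward to confirm by simulation. The sketch below is illustrative only (λ = 3, μ = 2, and the horizon are arbitrary choices): it estimates L for an M/M/∞ system by sampling the number in service along one long path and then recovers W = L/λ̄; here $L_q = W_q = 0$ by construction, since no customer ever waits.

```python
import numpy as np

rng = np.random.default_rng(11)
lam, mu, T = 3.0, 2.0, 2_000.0

arrivals = np.cumsum(rng.exponential(1 / lam, size=int(3 * lam * T)))
arrivals = arrivals[arrivals <= T]
departures = arrivals + rng.exponential(1 / mu, size=arrivals.size)  # service starts at arrival

# Sample the number in system at random times in (T/2, T) to reduce start-up effects.
sample_times = rng.uniform(T / 2, T, size=10_000)
in_system = [np.sum((arrivals <= s) & (departures > s)) for s in sample_times]
L = np.mean(in_system)
print("estimated L :", L, "   theory L = lam/mu =", lam / mu)
print("estimated W :", L / lam, "   theory W = 1/mu =", 1 / mu)
```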


12.118
a) Recall that $\bar{\lambda}$ represents the long-run average arrival rate. On average, there are L down machines and, hence, M − L up machines. As each up machine goes down at an exponential rate λ, independently of the other up machines, we conclude from Proposition 9.11 on page 532 that, on average, the arrival rate is (M − L)λ.
b) Referring to Exercises 12.92(a) and 12.92(b), we get that
$$
\bar{\lambda} = \sum_{n=0}^{\infty} \lambda_n P_n
= \sum_{n=0}^{M} (M - n)\lambda P_n
= M\lambda \sum_{n=0}^{M} P_n - \lambda \sum_{n=0}^{M} n P_n
= M\lambda \cdot 1 - \lambda L = (M - L)\lambda.
$$

c) From part (b), we have $\bar{\lambda} = (10 - 6.02)/6 = 0.663$. Hence, from Exercise 12.36(b),
$$
L_q = L - \frac{\bar{\lambda}}{\mu} = 6.02 - \frac{0.663}{2/3} = 5.03.
$$
Hence, from Little's formulas,
$$
W = \frac{L}{\bar{\lambda}} = \frac{6.02}{0.663} = 9.08 \text{ hr}
\qquad \text{and} \qquad
W_q = \frac{L_q}{\bar{\lambda}} = \frac{5.03}{0.663} = 7.59 \text{ hr}.
$$
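The numbers in part (c) can be reproduced from the steady-state distribution. The sketch below assumes the machine-repair model of Exercise 12.92 with a single repairman, M = 10 machines, per-machine failure rate λ = 1/6 per hour, and repair rate μ = 2/3 per hour; these rates are inferred from the figures quoted above, since Exercise 12.92 itself is not restated here.

```python
import numpy as np

M, lam, mu = 10, 1 / 6, 2 / 3               # assumed rates per hour (see lead-in)

P = np.ones(M + 1)
for n in range(1, M + 1):
    P[n] = P[n - 1] * (M - (n - 1)) * lam / mu   # birth-death product formula
P /= P.sum()

n = np.arange(M + 1)
L = float(np.sum(n * P))                    # mean number of down machines
lam_bar = (M - L) * lam                     # long-run average failure (arrival) rate
Lq = L - lam_bar / mu
print(f"L = {L:.2f}   lam_bar = {lam_bar:.3f}   Lq = {Lq:.2f}")
print(f"W = {L / lam_bar:.2f} hr   Wq = {Lq / lam_bar:.2f} hr")
```

With these assumptions the script returns L ≈ 6.02, λ̄ ≈ 0.663, Lq ≈ 5.03, and W ≈ 9.08 hr; Wq comes out as about 7.58 hr rather than 7.59 hr only because the solution above rounds λ̄ and Lq before dividing.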

12.119
a) The number of customers in service, S, equals the number of customers in the system, X, less the number of customers in the queue, Q. Hence, from Exercise 12.36(b),
$$
E(S) = E(X - Q) = E(X) - E(Q) = L - L_q = \frac{\bar{\lambda}}{\mu}.
$$
b) Let $B_j$ denote the event that server j is busy. By symmetry, $P(B_j)$ does not depend on j, and we denote that common probability $p_b$. Note that $p_b$ equals the probability that a specified server is busy. Referring now to part (a) and observing that $S = \sum_{j=1}^{s} I_{B_j}$, we conclude that
$$
\frac{\bar{\lambda}}{\mu} = E(S) = E\Bigl(\sum_{j=1}^{s} I_{B_j}\Bigr) = \sum_{j=1}^{s} E\bigl(I_{B_j}\bigr) = \sum_{j=1}^{s} P(B_j) = s p_b.
$$
Consequently, $p_b = \bar{\lambda}/s\mu$.
c) For a single-server queue, the server is busy if and only if X ≥ 1. Hence, from part (b),
$$
\frac{\bar{\lambda}}{\mu} = p_b = P(X \ge 1) = 1 - P(X = 0) = 1 - P_0.
$$
Hence, $P_0 = 1 - \bar{\lambda}/\mu$.
d) From Exercise 12.35(b), $\bar{\lambda} = \lambda$ for an M/M/1 queue. Hence, from part (c), $P_0 = 1 - \lambda/\mu$.
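For parts (c) and (d), a small M/M/1 simulation shows the long-run idle fraction approaching $P_0 = 1 - \lambda/\mu$. The sketch below is illustrative only; λ = 0.6, μ = 1, and the number of customers are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
lam, mu, n_customers = 0.6, 1.0, 100_000

arrivals = np.cumsum(rng.exponential(1 / lam, size=n_customers))
services = rng.exponential(1 / mu, size=n_customers)

busy_time, free_at = 0.0, 0.0          # free_at = time at which the server next becomes free
for a, s in zip(arrivals, services):
    start = max(a, free_at)            # service starts at arrival or when the server frees up
    busy_time += s
    free_at = start + s

total_time = free_at                   # horizon: departure time of the last customer
print("simulated idle fraction:", 1 - busy_time / total_time)
print("theory P0 = 1 - lam/mu :", 1 - lam / mu)
```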
