Exercises and Solutions in Biostatistical Theory (2010)
Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2011 by Taylor and Francis Group, LLC
Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number: 978-1-58488-722-5 (Paperback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
Kupper, Lawrence L.
Exercises and solutions in biostatistical theory / Lawrence L. Kupper, Sean M. O’Brien, Brian H. Neelon.
p. cm. -- (Chapman & Hall/CRC texts in statistical science series)
Includes bibliographical references and index.
ISBN 978-1-58488-722-5 (pbk. : alk. paper)
1. Biometry--Problems, exercises, etc. I. O’Brien, Sean M. II. Neelon, Brian H. III. Title.
QH323.5.K87 2010
570.1’5195--dc22
2010032496

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
To my wonderful wife Sandy, to the hundreds of students who have taken my courses in biostatistical theory, and to the many students and colleagues who have collaborated with me on publications involving both theoretical and applied biostatistical research. Lawrence L. Kupper
To Sara, Oscar, and my parents for their unwavering support, and to Larry, a true mentor. Brian H. Neelon
To Sarah and Avery, for support and inspiration. Sean M. O’Brien
Contents

Preface
Acknowledgments
Authors

1. Basic Probability Theory
   1.1 Concepts and Notation
       1.1.1 Counting Formulas
             1.1.1.1 N-tuples
             1.1.1.2 Permutations
             1.1.1.3 Combinations
             1.1.1.4 Pascal’s Identity
             1.1.1.5 Vandermonde’s Identity
       1.1.2 Probability Formulas
             1.1.2.1 Definitions
             1.1.2.2 Mutually Exclusive Events
             1.1.2.3 Conditional Probability
             1.1.2.4 Independence
             1.1.2.5 Partitions and Bayes’ Theorem
   Exercises
   Solutions

2. Univariate Distribution Theory
   2.1 Concepts and Notation
       2.1.1 Discrete and Continuous Random Variables
       2.1.2 Cumulative Distribution Functions
       2.1.3 Median and Mode
       2.1.4 Expectation Theory
       2.1.5 Some Important Expectations
             2.1.5.1 Mean
             2.1.5.2 Variance
             2.1.5.3 Moments
             2.1.5.4 Moment Generating Function
             2.1.5.5 Probability Generating Function
       2.1.6 Inequalities Involving Expectations
             2.1.6.1 Markov’s Inequality
             2.1.6.2 Jensen’s Inequality
             2.1.6.3 Hölder’s Inequality
       2.1.7 Some Important Probability Distributions for Discrete Random Variables
             2.1.7.1 Binomial Distribution
             2.1.7.2 Negative Binomial Distribution
             2.1.7.3 Poisson Distribution
             2.1.7.4 Hypergeometric Distribution
       2.1.8 Some Important Distributions (i.e., Density Functions) for Continuous Random Variables
             2.1.8.1 Normal Distribution
             2.1.8.2 Lognormal Distribution
             2.1.8.3 Gamma Distribution
             2.1.8.4 Beta Distribution
             2.1.8.5 Uniform Distribution
   Exercises
   Solutions

3. Multivariate Distribution Theory
   3.1 Concepts and Notation
       3.1.1 Discrete and Continuous Multivariate Distributions
       3.1.2 Multivariate Cumulative Distribution Functions
       3.1.3 Expectation Theory
             3.1.3.1 Covariance
             3.1.3.2 Correlation
             3.1.3.3 Moment Generating Function
       3.1.4 Marginal Distributions
       3.1.5 Conditional Distributions and Expectations
       3.1.6 Mutual Independence among a Set of Random Variables
       3.1.7 Random Sample
       3.1.8 Some Important Multivariate Discrete and Continuous Probability Distributions
             3.1.8.1 Multinomial
             3.1.8.2 Multivariate Normal
       3.1.9 Special Topics of Interest
             3.1.9.1 Mean and Variance of a Linear Function of Random Variables
             3.1.9.2 Convergence in Distribution
             3.1.9.3 Order Statistics
             3.1.9.4 Method of Transformations
   Exercises
   Solutions

4. Estimation Theory
   4.1 Concepts and Notation
       4.1.1 Point Estimation of Population Parameters
             4.1.1.1 Method of Moments (MM)
             4.1.1.2 Unweighted Least Squares (ULS)
             4.1.1.3 Weighted Least Squares (WLS)
             4.1.1.4 Maximum Likelihood (ML)
       4.1.2 Data Reduction and Joint Sufficiency
             4.1.2.1 Joint Sufficiency
             4.1.2.2 Factorization Theorem
       4.1.3 Methods for Evaluating the Properties of a Point Estimator
             4.1.3.1 Mean-Squared Error (MSE)
             4.1.3.2 Cramér–Rao Lower Bound (CRLB)
             4.1.3.3 Efficiency
             4.1.3.4 Completeness
             4.1.3.5 Rao–Blackwell Theorem
       4.1.4 Interval Estimation of Population Parameters
             4.1.4.1 Exact Confidence Intervals
             4.1.4.2 Exact CI for the Mean of a Normal Distribution
             4.1.4.3 Exact CI for a Linear Combination of Means of Normal Distributions
             4.1.4.4 Exact CI for the Variance of a Normal Distribution
             4.1.4.5 Exact CI for the Ratio of Variances of Two Normal Distributions
             4.1.4.6 Large-Sample Approximate CIs
             4.1.4.7 Consistency
             4.1.4.8 Slutsky’s Theorem
             4.1.4.9 Construction of ML-Based CIs
             4.1.4.10 ML-Based CI for a Bernoulli Distribution Probability
             4.1.4.11 Delta Method
             4.1.4.12 Delta Method CI for a Function of a Bernoulli Distribution Probability
   Exercises
   Solutions

5. Hypothesis Testing Theory
   5.1 Concepts and Notation
       5.1.1 Basic Principles
             5.1.1.1 Simple and Composite Hypotheses
             5.1.1.2 Null and Alternative Hypotheses
             5.1.1.3 Statistical Tests
             5.1.1.4 Type I and Type II Errors
             5.1.1.5 Power
             5.1.1.6 Test Statistics and Rejection Regions
             5.1.1.7 P-Values
       5.1.2 Most Powerful (MP) and Uniformly Most Powerful (UMP) Tests
             5.1.2.1 Review of Notation
       5.1.3 Large-Sample ML-Based Methods for Testing the Simple Null Hypothesis $H_0: \theta = \theta_0$ (i.e., $\theta \in \omega$) versus the Composite Alternative Hypothesis $H_1: \theta \in \bar{\omega}$
             5.1.3.1 Likelihood Ratio Test
             5.1.3.2 Wald Test
             5.1.3.3 Score Test
       5.1.4 Large-Sample ML-Based Methods for Testing the Composite Null Hypothesis $H_0: \theta \in \omega$ versus the Composite Alternative Hypothesis $H_1: \theta \in \bar{\omega}$
             5.1.4.1 Likelihood Ratio Test
             5.1.4.2 Wald Test
             5.1.4.3 Score Test
   Exercises
   Solutions

Appendix: Useful Mathematical Results
   A.1 Summations
   A.2 Limits
   A.3 Important Calculus-Based Results
   A.4 Special Functions
   A.5 Approximations
   A.6 Lagrange Multipliers

References
Index
Preface
This exercises-and-solutions book contains exercises and their detailed solutions covering statistical theory (from basic probability theory through the theory of statistical inference) that is taught in courses taken by advanced undergraduate students, and first-year and second-year graduate students, in many quantitative disciplines (e.g., statistics, biostatistics, mathematics, engineering, physics, computer science, psychometrics, and epidemiology).
The motivation for, and the contents of, this book stem mainly from the classroom teaching experiences of author Lawrence L. Kupper, who has taught graduate-level courses in biostatistical theory for almost four decades as a faculty member with the University of North Carolina Department of Biostatistics. These courses have been uniformly and widely praised by students for their rigor, clarity, and use of real-life settings to illustrate the practical utility of the theoretical concepts being taught. Several exercises in this book have been motivated by actual biostatistical collaborative research experiences (including those of the three authors), where theoretical biostatistical principles have been used to address complicated research design and analysis issues (especially in fields related to the health sciences).
The authors strongly believe that the best way to obtain an in-depth understanding of the principles of biostatistical theory is to work through exercises whose solutions require nontrivial and illustrative utilization of relevant theoretical concepts. The exercises in this book have been prepared with this belief in mind. Mastery of the theoretical statistical strategies needed to solve the exercises in this book will prepare the reader for successful study of even higher-level statistical theory.
The exercises and their detailed solutions are divided into five chapters: Basic Probability Theory; Univariate Distribution Theory; Multivariate Distribution Theory; Estimation Theory; and Hypothesis Testing Theory. The chapters are arranged sequentially in the sense that a good understanding of basic probability theory is needed for exercises dealing with univariate distribution theory, and univariate distribution theory provides the basis for the extensions to multivariate distribution theory. The material in the first three chapters is needed for the exercises on statistical inference that constitute the last two chapters of the book. The exercises in each chapter vary in level of difficulty from fairly basic to challenging, with more difficult exercises identified with an asterisk. Each of the five chapters begins with a detailed introduction summarizing the statistical concepts needed to help solve the exercises in that chapter of the book. The book also contains a brief summary of some useful mathematical results (see Appendix A). The main mathematical prerequisite for this book is an excellent working knowledge of multivariable calculus, along with some basic knowledge about matrices (e.g., matrix multiplication and the inverse of a matrix).
This exercises-and-solutions book is not meant to be used as the main textbook for a course on statistical theory. Some examples of excellent main textbooks on statistical theory include Casella and Berger (2002), Hogg, Craig, and McKean (2005), Kalbfleisch (1985), Ross (2006), and Wackerly, Mendenhall III, and Scheaffer (2008). Rather, our book should serve as a supplemental source of a wide variety of exercises and their detailed solutions both for advanced undergraduate and graduate students who take such courses in statistical theory and for the instructors of such courses. In addition, our book will be useful to individuals who are interested in enhancing and/or refreshing their own theoretical statistical skills. The solutions to all exercises are presented in sufficient detail so that users of the book can see how the relevant statistical theory is used in a logical manner to address important statistical questions in a wide variety of settings.
Lawrence L. Kupper
Brian H. Neelon
Sean M. O’Brien
Acknowledgments
Lawrence L. Kupper acknowledges the hundreds of students who have taken his classes in biostatistical theory. Many of these students have provided valuable feedback on the lectures, homework sets, and examinations that make up most of the material for this book. In fact, two of these excellent former students are coauthors of this book (Brian H. Neelon and Sean M. O’Brien). The authors want to personally thank Dr. Susan Reade-Christopher for helping with the construction of some exercises and solutions, and they want to thank the reviewers of this book for their helpful suggestions. Finally, the authors acknowledge the fact that some exercises may overlap in concept with exercises found in other statistical theory books; such conceptual overlap is unavoidable given the breadth of material being covered.
Authors
Lawrence L. Kupper, PhD, is emeritus alumni distinguished professor of biostatistics, School of Public Health, University of North Carolina (UNC), Chapel Hill, North Carolina. Dr. Kupper is a fellow of the American Statistical Association (ASA), and he received a Distinguished Achievement Medal from the ASA’s Environmental Statistics Section for his research, teaching, and service contributions. During his 40 academic years at UNC, Dr. Kupper has won several classroom teaching and student mentoring awards. He has coauthored over 160 papers in peer-reviewed journals, and he has published several coauthored book chapters. Dr. Kupper has also coauthored three textbooks, namely, Epidemiologic Research: Principles and Quantitative Methods, Applied Regression Analysis and Other Multivariable Methods (four editions), and Quantitative Exposure Assessment. The contents of this exercises-and-solutions book come mainly from course materials developed and used by Dr. Kupper for his graduate-level courses in biostatistical theory, taught over a period of more than three decades.
Brian H. Neelon, PhD, is a research statistician with the Children’s Environmental Health Initiative in the Nicholas School of the Environment at Duke University. He obtained his doctorate from the University of North Carolina, Chapel Hill, where he received the Kupper Dissertation Award for outstanding dissertation-based publication. Before arriving at Duke University, Dr. Neelon was a postdoctoral research fellow in the Department of Health Care Policy at Harvard University. His research interests include Bayesian methods, longitudinal data analysis, health policy statistics, and environmental health.
Sean M. O’Brien, PhD, is an assistant professor in the Department of Biostatistics & Bioinformatics at the Duke University School of Medicine. He works primarily on studies of cardiovascular interventions using large multicenter clinical registries. He is currently statistical director of the Society of Thoracic Surgeons National Data Warehouse at Duke Clinical Research Institute. His methodological contributions are in the areas of healthcare provider performance evaluation, development of multidimensional composite measures, and clinical risk adjustment. Before joining Duke University, he was a research fellow at the National Institute of Environmental Health Sciences. He received his PhD in biostatistics from the University of North Carolina at Chapel Hill in 2002.
1 Basic Probability Theory
1.1 Concepts and Notation
1.1.1 Counting Formulas
1.1.1.1 N-tuples
With sets $\{a_1, a_2, \ldots, a_q\}$ and $\{b_1, b_2, \ldots, b_s\}$ containing $q$ and $s$ distinct items, respectively, it is possible to form $qs$ distinct pairs (or 2-tuples) of the form $(a_i, b_j)$, $i = 1, 2, \ldots, q$ and $j = 1, 2, \ldots, s$. Adding a third set $\{c_1, c_2, \ldots, c_t\}$ containing $t$ distinct items, it is possible to form $qst$ distinct triplets (or 3-tuples) of the form $(a_i, b_j, c_k)$, $i = 1, 2, \ldots, q$, $j = 1, 2, \ldots, s$, and $k = 1, 2, \ldots, t$. Extensions to more than three sets of distinct items are straightforward.
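The counts $qs$ and $qst$ can be checked by direct enumeration. The following is a minimal sketch in Python; the item labels and the set sizes $q = 3$, $s = 2$, and $t = 4$ are hypothetical values chosen only for illustration.

```python
from itertools import product

# Hypothetical item sets with q = 3, s = 2, and t = 4 distinct items.
a = ["a1", "a2", "a3"]
b = ["b1", "b2"]
c = ["c1", "c2", "c3", "c4"]

pairs = list(product(a, b))        # every 2-tuple (a_i, b_j)
triplets = list(product(a, b, c))  # every 3-tuple (a_i, b_j, c_k)

print(len(pairs))     # q * s = 6
print(len(triplets))  # q * s * t = 24
```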
1.1.1.2 Permutations
A permutation is defined to be an ordered arrangement of $r$ distinct items. The number of distinct ways of arranging $n$ distinct items using $r$ at a time is denoted $P^n_r$ and is computed as
$$P^n_r = \frac{n!}{(n-r)!},$$
where $n! = n(n-1)(n-2)\cdots(3)(2)(1)$ and where $0! \equiv 1$. If the $n$ items are not distinct, then the number of distinct permutations is less than $P^n_r$.
1.1.1.3 Combinations
The number of ways of dividing $n$ distinct items into $k$ distinct groups with the $i$th group containing $n_i$ items, where $n = \sum_{i=1}^{k} n_i$, is equal to
$$\frac{n!}{n_1!\,n_2!\cdots n_k!} = \frac{n!}{\prod_{i=1}^{k} n_i!}.$$
The above expression appears in the multinomial expansion
$$(x_1 + x_2 + \cdots + x_k)^n = \sum^{*} \frac{n!}{\prod_{i=1}^{k} n_i!}\, x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k},$$
where the summation symbol $\sum^{*}$ indicates summation over all possible values of $n_1, n_2, \ldots, n_k$ with $n_i$, $i = 1, 2, \ldots, k$, taking the set of possible values $\{0, 1, \ldots, n\}$ subject to the restriction $\sum_{i=1}^{k} n_i = n$. With $x_1 = x_2 = \cdots = x_k = 1$, it follows that
$$\sum^{*} \frac{n!}{\prod_{i=1}^{k} n_i!} = k^n.$$
As an important special case, when $k = 2$, then
$$\frac{n!}{n_1!\,n_2!} = \frac{n!}{n_1!\,(n-n_1)!} = C^n_{n_1},$$
which is also the number of ways of selecting without replacement $n_1$ items from a set of $n$ distinct items (i.e., the number of combinations of $n$ distinct items selected $n_1$ at a time). The above combinational expression appears in the binomial expansion
$$(x_1 + x_2)^n = \sum^{*} \frac{n!}{n_1!\,n_2!}\, x_1^{n_1} x_2^{n_2} = \sum_{n_1=0}^{n} C^n_{n_1}\, x_1^{n_1} x_2^{n-n_1}.$$
When $x_1 = x_2 = 1$, it follows that
$$\sum_{n_1=0}^{n} C^n_{n_1} = 2^n.$$
Example
As a simple example using the above counting formulas, if 5 cards are dealt from a well-shuffled standard deck of 52 playing cards, the number of ways in which such a 5-card hand would contain exactly 2 aces is equal to $qs = C^4_2\, C^{48}_3 = 103{,}776$, where $q = C^4_2 = 6$ is the number of ways of selecting 2 of the 4 aces and where $s = C^{48}_3 = 17{,}296$ is the number of ways of selecting 3 of the remaining 48 cards.
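The counting formulas of this section can be verified numerically. The sketch below assumes Python 3.8 or later, where `math.perm`, `math.comb`, and `math.prod` are available; the helper `multinomial` and the small values of $n$ and $k$ are illustrative choices, not part of the text.

```python
from itertools import product
from math import comb, factorial, perm, prod

def multinomial(*group_sizes):
    """n! / (n_1! n_2! ... n_k!), where n is the sum of the group sizes."""
    n = sum(group_sizes)
    return factorial(n) // prod(factorial(m) for m in group_sizes)

# Permutations: P^n_r = n! / (n - r)!
assert perm(5, 2) == factorial(5) // factorial(3) == 20

# With k = 2 groups, the multinomial coefficient reduces to C^n_{n_1}.
assert multinomial(2, 3) == comb(5, 2) == 10

# Summing the multinomial coefficients over all (n_1, ..., n_k) with
# n_1 + ... + n_k = n gives k^n (checked here for k = 3, n = 4).
k, n = 3, 4
total = sum(
    multinomial(*sizes)
    for sizes in product(range(n + 1), repeat=k)
    if sum(sizes) == n
)
assert total == k ** n

# Summing C^n_{n_1} over n_1 = 0, 1, ..., n gives 2^n.
assert sum(comb(n, n1) for n1 in range(n + 1)) == 2 ** n

# Card example: 5-card hands containing exactly 2 aces.
q = comb(4, 2)   # ways of selecting 2 of the 4 aces
s = comb(48, 3)  # ways of selecting 3 of the remaining 48 cards
print(q, s, q * s)  # 6 17296 103776
```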
1.1.1.4 Pascal’s Identity
$$C^n_k = C^{n-1}_{k-1} + C^{n-1}_{k}$$
for any positive integers $n$ and $k$, where $C^n_k \equiv 0$ if $k > n$.
1.1.1.5 Vandermonde’s Identity
$$C^{m+n}_r = \sum_{k=0}^{r} C^m_{r-k}\, C^n_k,$$
where $m$, $n$, and $r$ are nonnegative integers satisfying $r \le \min\{m, n\}$.
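Both identities are easy to spot-check over a grid of small integers. This is only a numerical sanity check under assumed ranges (the upper limits 15 and 8 are arbitrary), not a proof; it relies on the fact that Python's `math.comb(n, k)` returns 0 when $k > n$, which matches the convention $C^n_k \equiv 0$ used above.

```python
from math import comb

def pascal_holds(n, k):
    # C^n_k = C^{n-1}_{k-1} + C^{n-1}_k
    return comb(n, k) == comb(n - 1, k - 1) + comb(n - 1, k)

def vandermonde_holds(m, n, r):
    # C^{m+n}_r = sum over k = 0, ..., r of C^m_{r-k} * C^n_k
    return comb(m + n, r) == sum(comb(m, r - k) * comb(n, k) for k in range(r + 1))

pascal_ok = all(pascal_holds(n, k) for n in range(1, 15) for k in range(1, 15))
vandermonde_ok = all(
    vandermonde_holds(m, n, r)
    for m in range(8)
    for n in range(8)
    for r in range(min(m, n) + 1)
)
print(pascal_ok, vandermonde_ok)
```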
1.1.2 Probability Formulas
1.1.2.1 Definitions
Let an experiment be any process via which an observation or measurement is made. An experiment can range from a very controlled experimental situation to an uncontrolled observational situation. An example of the former situation would be a laboratory experiment where chosen amounts of different chemicals are mixed together to produce a certain chemical product. An example of the latter situation would be an epidemiological study where subjects are randomly selected and interviewed about their smoking and physical activity habits.
Let $A_1, A_2, \ldots, A_p$ be $p$ ($\ge 2$) possible events (or outcomes) that could occur when an experiment is conducted. Then:
1. For $i = 1, 2, \ldots, p$, the complement of the event $A_i$, denoted $\bar{A}_i$, is the event that $A_i$ does not occur when the experiment is conducted.
2. The union of the events $A_1, A_2, \ldots, A_p$, denoted $\cup_{i=1}^{p} A_i$, is the event that at least one of the events $A_1, A_2, \ldots, A_p$ occurs when the experiment is conducted.
3. The intersection of the events $A_1, A_2, \ldots, A_p$, denoted $\cap_{i=1}^{p} A_i$, is the event that all of the events $A_1, A_2, \ldots, A_p$ occur when the experiment is conducted.
Given these definitions, we have the following probabilistic results, where $\mathrm{pr}(A_i)$, $0 \le \mathrm{pr}(A_i) \le 1$, denotes the probability that event $A_i$ occurs when the experiment is conducted:
(i) $\mathrm{pr}(\bar{A}_i) = 1 - \mathrm{pr}(A_i)$. More generally,
$$\mathrm{pr}\left(\overline{\cup_{i=1}^{p} A_i}\right) = 1 - \mathrm{pr}\left(\cup_{i=1}^{p} A_i\right) = \mathrm{pr}\left(\cap_{i=1}^{p} \bar{A}_i\right)$$
and
$$\mathrm{pr}\left(\overline{\cap_{i=1}^{p} A_i}\right) = 1 - \mathrm{pr}\left(\cap_{i=1}^{p} A_i\right) = \mathrm{pr}\left(\cup_{i=1}^{p} \bar{A}_i\right).$$
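(ii) The probability of the union of $p$ events is given by:
$$\mathrm{pr}\left(\cup_{i=1}^{p} A_i\right) = \sum_{i=1}^{p} \mathrm{pr}(A_i) - \sum_{i=1}^{p-1}\sum_{j=i+1}^{p} \mathrm{pr}(A_i \cap A_j) + \sum_{i=1}^{p-2}\sum_{j=i+1}^{p-1}\sum_{k=j+1}^{p} \mathrm{pr}(A_i \cap A_j \cap A_k) - \cdots + (-1)^{p-1}\,\mathrm{pr}\left(\cap_{i=1}^{p} A_i\right).$$
As important special cases, we have, for $p = 2$,
$$\mathrm{pr}(A_1 \cup A_2) = \mathrm{pr}(A_1) + \mathrm{pr}(A_2) - \mathrm{pr}(A_1 \cap A_2)$$
and, for $p = 3$,
$$\mathrm{pr}(A_1 \cup A_2 \cup A_3) = \mathrm{pr}(A_1) + \mathrm{pr}(A_2) + \mathrm{pr}(A_3) - \mathrm{pr}(A_1 \cap A_2) - \mathrm{pr}(A_1 \cap A_3) - \mathrm{pr}(A_2 \cap A_3) + \mathrm{pr}(A_1 \cap A_2 \cap A_3).$$
1.1.2.2 Mutually Exclusive Events
For $i \ne j$, two events $A_i$ and $A_j$ are said to be mutually exclusive if these two events cannot both occur (i.e., cannot occur together) when the experiment is conducted; equivalently, the events $A_i$ and $A_j$ are mutually exclusive when $\mathrm{pr}(A_i \cap A_j) = 0$. If the $p$ events $A_1, A_2, \ldots, A_p$ are pairwise mutually exclusive, that is, if $\mathrm{pr}(A_i \cap A_j) = 0$ for every $i \ne j$, then
$$\mathrm{pr}\left(\cup_{i=1}^{p} A_i\right) = \sum_{i=1}^{p} \mathrm{pr}(A_i),$$
since pairwise mutual exclusivity implies that any intersection involving more than two events must necessarily have probability zero of occurring.
1.1.2.3 Conditional Probability
For $i \ne j$, the conditional probability that event $A_i$ occurs given that (or conditional on the fact that) event $A_j$ occurs when the experiment is conducted, denoted $\mathrm{pr}(A_i \mid A_j)$, is given by the expression
$$\mathrm{pr}(A_i \mid A_j) = \frac{\mathrm{pr}(A_i \cap A_j)}{\mathrm{pr}(A_j)}, \quad \mathrm{pr}(A_j) > 0.$$
Using the above definition, we then have:
$$\begin{aligned}
\mathrm{pr}\left(\cap_{i=1}^{p} A_i\right) &= \mathrm{pr}\left(A_p \mid \cap_{i=1}^{p-1} A_i\right)\mathrm{pr}\left(\cap_{i=1}^{p-1} A_i\right) \\
&= \mathrm{pr}\left(A_p \mid \cap_{i=1}^{p-1} A_i\right)\mathrm{pr}\left(A_{p-1} \mid \cap_{i=1}^{p-2} A_i\right)\mathrm{pr}\left(\cap_{i=1}^{p-2} A_i\right) \\
&\;\;\vdots \\
&= \mathrm{pr}\left(A_p \mid \cap_{i=1}^{p-1} A_i\right)\mathrm{pr}\left(A_{p-1} \mid \cap_{i=1}^{p-2} A_i\right)\cdots\,\mathrm{pr}(A_2 \mid A_1)\,\mathrm{pr}(A_1).
\end{aligned}$$
Note that there would be $p!$ ways of writing the above product of $p$ probabilities. For example, when $p = 3$, we have
$$\begin{aligned}
\mathrm{pr}(A_1 \cap A_2 \cap A_3) &= \mathrm{pr}(A_3 \mid A_1 \cap A_2)\,\mathrm{pr}(A_2 \mid A_1)\,\mathrm{pr}(A_1) \\
&= \mathrm{pr}(A_2 \mid A_1 \cap A_3)\,\mathrm{pr}(A_1 \mid A_3)\,\mathrm{pr}(A_3) \\
&= \mathrm{pr}(A_1 \mid A_2 \cap A_3)\,\mathrm{pr}(A_3 \mid A_2)\,\mathrm{pr}(A_2),
\end{aligned}$$
and so on.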
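The complement rule, the union formula for $p = 3$, and the product rule for conditional probabilities can all be checked by brute-force enumeration of a small sample space. The sketch below assumes a pair of balanced dice with 36 equally likely outcomes; the three events are hypothetical choices made only for illustration, and exact arithmetic is done with `fractions.Fraction`.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of tossing two balanced dice.
outcomes = set(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event (a subset of the sample space)."""
    return Fraction(len(event), len(outcomes))

def pr_given(event, given):
    """Conditional probability pr(event | given); requires pr(given) > 0."""
    return pr(event & given) / pr(given)

# Hypothetical events, chosen only for illustration.
A1 = {o for o in outcomes if o[0] % 2 == 0}  # first die is even
A2 = {o for o in outcomes if sum(o) >= 8}    # sum is at least 8
A3 = {o for o in outcomes if o[1] == 6}      # second die shows 6

# Complement rule, and its generalization to the complement of a union.
assert pr(outcomes - A1) == 1 - pr(A1)
assert pr(outcomes - (A1 | A2 | A3)) == pr(
    (outcomes - A1) & (outcomes - A2) & (outcomes - A3)
)

# Union formula for p = 3.
union_by_formula = (
    pr(A1) + pr(A2) + pr(A3)
    - pr(A1 & A2) - pr(A1 & A3) - pr(A2 & A3)
    + pr(A1 & A2 & A3)
)
assert union_by_formula == pr(A1 | A2 | A3)

# Product rule: pr(A1 n A2 n A3) = pr(A3 | A1 n A2) pr(A2 | A1) pr(A1).
chain = pr_given(A3, A1 & A2) * pr_given(A2, A1) * pr(A1)
assert chain == pr(A1 & A2 & A3)

print(pr(A1 | A2 | A3), pr(A1 & A2 & A3))  # 25/36 1/12
```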
1.1.2.4 Independence
The events $A_i$ and $A_j$ are said to be independent events if and only if the following equivalent probability statements are true:
1. $\mathrm{pr}(A_i \mid A_j) = \mathrm{pr}(A_i)$;
2. $\mathrm{pr}(A_j \mid A_i) = \mathrm{pr}(A_j)$;
3. $\mathrm{pr}(A_i \cap A_j) = \mathrm{pr}(A_i)\,\mathrm{pr}(A_j)$.
When the events $A_1, A_2, \ldots, A_p$ are mutually independent, so that the conditional probability of any event is equal to the unconditional probability of that same event, then
$$\mathrm{pr}\left(\cap_{i=1}^{p} A_i\right) = \prod_{i=1}^{p} \mathrm{pr}(A_i).$$
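As a quick illustration of the three equivalent statements, the following sketch enumerates the 36 outcomes of two balanced dice and tests one independent pair and one dependent pair of events. The events `A`, `B`, and `C` are hypothetical and serve only as an example.

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))

def pr(event):
    return Fraction(len(event), len(outcomes))

def independent(E, F):
    # Statement 3: pr(E n F) = pr(E) pr(F)
    return pr(E & F) == pr(E) * pr(F)

# Hypothetical events, chosen only for illustration.
A = {o for o in outcomes if o[0] % 2 == 0}  # first die is even
B = {o for o in outcomes if sum(o) == 7}    # sum equals 7
C = {o for o in outcomes if sum(o) >= 8}    # sum is at least 8

print(independent(A, B))  # True
print(independent(A, C))  # False

# Statements 1 and 2 agree with statement 3 for the independent pair.
assert pr(A & B) / pr(B) == pr(A)  # pr(A | B) = pr(A)
assert pr(A & B) / pr(A) == pr(B)  # pr(B | A) = pr(B)
```

Here knowing that the sum is 7 says nothing about the parity of the first die, whereas knowing that the sum is at least 8 makes an even first die more likely.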
1.1.2.5 Partitions and Bayes’ Theorem
When $\mathrm{pr}\left(\cup_{i=1}^{p} A_i\right) = 1$, and when the events $A_1, A_2, \ldots, A_p$ are pairwise mutually exclusive, then the events $A_1, A_2, \ldots, A_p$ are said to constitute a partition of the experimental outcomes; in other words, when the experiment is conducted, exactly one and only one of the events $A_1, A_2, \ldots, A_p$ must occur. If $B$ is any event and $A_1, A_2, \ldots, A_p$ constitute a partition, it follows that
$$\mathrm{pr}(B) = \mathrm{pr}\left[B \cap \left(\cup_{i=1}^{p} A_i\right)\right] = \mathrm{pr}\left[\cup_{i=1}^{p} (B \cap A_i)\right] = \sum_{i=1}^{p} \mathrm{pr}(B \cap A_i) = \sum_{i=1}^{p} \mathrm{pr}(B \mid A_i)\,\mathrm{pr}(A_i).$$
As an illustration of the use of the above formula, if the events $A_1, A_2, \ldots, A_p$ represent an exhaustive list of all $p$ possible causes of some observed outcome $B$, where $\mathrm{pr}(B) > 0$, then, given values for $\mathrm{pr}(A_i)$ and $\mathrm{pr}(B \mid A_i)$ for all $i = 1, 2, \ldots, p$, one can employ Bayes’ Theorem to compute the probability that $A_i$ was the cause of the observed outcome $B$, namely,
$$\mathrm{pr}(A_i \mid B) = \frac{\mathrm{pr}(A_i \cap B)}{\mathrm{pr}(B)} = \frac{\mathrm{pr}(B \mid A_i)\,\mathrm{pr}(A_i)}{\sum_{j=1}^{p} \mathrm{pr}(B \mid A_j)\,\mathrm{pr}(A_j)}, \quad i = 1, 2, \ldots, p.$$
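A small numerical sketch may help fix the two formulas above. The values of $\mathrm{pr}(A_i)$ and $\mathrm{pr}(B \mid A_i)$ below are hypothetical numbers for a partition with $p = 3$ causes, invented purely for illustration; exact fractions are used so that the results can be checked by hand.

```python
from fractions import Fraction

# Hypothetical inputs for a partition with p = 3 possible causes.
prior = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]       # pr(A_i)
likelihood = [Fraction(1, 10), Fraction(1, 5), Fraction(2, 5)]  # pr(B | A_i)
assert sum(prior) == 1  # the A_i constitute a partition

# pr(B) as the sum over i of pr(B | A_i) pr(A_i).
pr_B = sum(l * p for l, p in zip(likelihood, prior))

# Bayes' Theorem: pr(A_i | B) = pr(B | A_i) pr(A_i) / pr(B).
posterior = [l * p / pr_B for l, p in zip(likelihood, prior)]

print(pr_B)                         # 19/100
print([str(x) for x in posterior])  # ['5/19', '6/19', '8/19']
assert sum(posterior) == 1
```

In this made-up setting the cause with the smallest prior probability, $A_3$, ends up with the largest posterior probability because the outcome $B$ is much more likely under $A_3$ than under the other two causes.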
Note that $\sum_{i=1}^{p} \mathrm{pr}(A_i \mid B) = 1$.
As an important special case, suppose that the events $A_1, A_2, \ldots, A_p$ constituting a partition are elementary events in the sense that none of these $p$ events can be further decomposed into smaller events (i.e., for $i = 1, 2, \ldots, p$, the event $A_i$ cannot be written as a union of mutually exclusive events each having a smaller probability than $A_i$ of occurring when the experiment is conducted). Then, any more complex event $B$ (sometimes called a compound event) must be representable as the union of two or more of the elementary events $A_1, A_2, \ldots, A_p$. In particular, with $2 \le m \le p$, if $B = \cup_{j=1}^{m} A_{i_j}$, where the set of positive integers $\{i_1, i_2, \ldots, i_m\}$ is a subset of the set of positive integers $\{1, 2, \ldots, p\}$, then
$$\mathrm{pr}(B) = \sum_{j=1}^{m} \mathrm{pr}(A_{i_j}).$$
In the very special case when the elementary events $A_1, A_2, \ldots, A_p$ are equally likely to occur, so that $\mathrm{pr}(A_i) = 1/p$ for $i = 1, 2, \ldots, p$, then $\mathrm{pr}(B) = m/p$.
Example
To continue an earlier example, there would be $p = C^{52}_5 = 2{,}598{,}960$ possible 5-card hands that could be dealt from a well-shuffled standard deck of 52 playing cards. Thus, each such 5-card hand has probability $1/2{,}598{,}960$ of occurring. If $B$ is the event that a 5-card hand contains exactly two aces, then
$$\mathrm{pr}(B) = \frac{m}{p} = \frac{103{,}776}{2{,}598{,}960} = 0.0399.$$
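The value $0.0399$ can be reproduced in a few lines. The sketch below assumes Python 3.8 or later for `math.comb`; the extension to hands with $0, 1, \ldots, 4$ aces is an added illustration rather than part of the example above, and it doubles as a check that these five events partition the set of all 5-card hands.

```python
from fractions import Fraction
from math import comb

p = comb(52, 5)               # number of equally likely 5-card hands
m = comb(4, 2) * comb(48, 3)  # hands containing exactly two aces
pr_B = Fraction(m, p)

print(p, m)                   # 2598960 103776
print(round(float(pr_B), 4))  # 0.0399

# Added illustration: probability of exactly a aces, for a = 0, 1, ..., 4.
pr_aces = [Fraction(comb(4, a) * comb(48, 5 - a), p) for a in range(5)]
assert pr_aces[2] == pr_B
assert sum(pr_aces) == 1  # the five events form a partition
```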
Exercises
7
EXERCISES Exercise 1.1. Suppose that a pair of balanced dice is tossed. Let Ex be the event that the sum of the two numbers obtained is equal to x, x = 2, 3, . . . , 12. (a) Develop an explicit expression for pr(Ex ). (b) Let A be the event that “x is divisible by 4,” let B be the event that “x is greater than 9,” and let C be the event that “x is not a prime number.” Find the numerical values of the following probabilities: pr(A), pr(B), pr(C), pr(A ∩ B), pr(A ∩ C), pr(B ∩ C), ¯ pr(A ∩ B ∩ C), pr(A ∪ B ∪ C), pr(A ∪ BC), and pr(AB ∪ C). Exercise 1.2. For any family in the United States, suppose that the probability of any child being male is equal to 0.50, and that the gender status of any child in a family is unaffected by the gender status of any other child in that same family. What is the minimum number, say n∗ , of children that any U.S. couple needs to have so that the probability is no smaller than 0.90 of having at least one male child and at least one female child? Exercise 1.3. Suppose that there are three urns. Urn 1 contains three white balls and four black balls. Urn 2 contains two white balls and three black balls. And, Urn 3 contains four white balls and two black balls. One ball is randomly selected from Urn 1 and is put into Urn 2. Then, one ball is randomly selected from Urn 2 and is put into Urn 3. Then, two balls are simultaneously selected from Urn 3. Find the exact numerical value of the probability that both balls selected from Urn 3 are white. Exercise 1.4. In the National Scrabble Contest, suppose that the two players in the final match (say, Player A and Player B) play consecutive games, with the national champion being that player who is the first to win five games. Assuming that no game can end in a tie, the two finalists must necessarily play at least 5 games but no more than 9 games. 
Further, assume (probably somewhat unrealistically) that the outcomes of the games are mutually independent of one another, and also assume that π is the probability that Player A wins any particular game. (a) Find an explicit expression for the probability that the final match between Player A and Player B lasts exactly 6 games. (b) Given that Player A wins the first two games, find an explicit expression for the probability that Player A wins the final match in exactly 7 games. (c) Find an explicit expression for the probability that Player B wins the final match. Exercise 1.5. Suppose that there are two different diagnostic tests (say, Test A and Test B) for a particular disease of interest. In a certain large population, suppose that the prevalence of this disease is 1%. Among all those people who have this disease in this large population, 10% will incorrectly test negatively for the presence of the disease when given Test A; and, independently of any results based on Test A, 5% of these diseased people will incorrectly test negatively when given Test B. Among all those people who do not have the disease in this large population, 6% will incorrectly test positively when given Test A; and, independently of any results based on Test A, 8% of these nondiseased people will incorrectly test positively when given Test B.
(a) Given that both Tests A and B are positive when administered to a person selected randomly from this population, what is the numerical value of the probability that this person actually has the disease in question? (b) Given that Test A is positive when administered to a person randomly selected from this population, what is the numerical value of the probability that Test B will also be positive? (c) Given that a person selected randomly from this population actually has the disease in question, what is the numerical value of the probability that at least one of the two different diagnostic tests given to this particular person will be positive? Exercise 1.6. A certain medical laboratory uses three machines (denoted M1 , M2 , and M3 , respectively) to measure prostatespecific antigen (PSA) levels in blood samples selected from adult males; high PSA levels have been shown to be associated with the presence of prostate cancer. Assume that machine M1 has probability 0.01 of providing an incorrect PSA level, that machine M2 has probability 0.02 of providing an incorrect PSA level, and that machine M3 has probability 0.03 of providing an incorrect PSA level. Further, assume that machine M1 performs 20% of the PSA analyses done by this medical laboratory, that machine M2 performs 50% of the PSA analyses, and that machine M3 performs 30% of the PSA analyses. (a) Find the numerical value of the probability that a PSA analysis performed by this medical laboratory will be done correctly. (b) Given that a particular PSA analysis is found to be done incorrectly, what is the numerical value of the probability that this PSA analysis was performed either by machine M1 or by machine M2 ? (c) Given that two independent PSA analyses are performed and that exactly one of these two PSA analyses is found to be correct, find the numerical value of the probability that machine M2 did not perform both of these PSA analyses. Exercise 1.7. 
Suppose that two medical doctors, denoted Doctor #1 and Doctor #2, each examine a person randomly chosen from a certain population to check for the presence or absence of a particular disease. Let $C_1$ be the event that Doctor #1 makes the correct diagnosis, let $C_2$ be the event that Doctor #2 makes the correct diagnosis, and let $D$ be the event that the randomly chosen patient actually has the disease in question; further, assume that the events $C_1$ and $C_2$ are independent conditional on disease status. Finally, let the prevalence of the disease in the population be $\theta = \mathrm{pr}(D)$, let $\pi_1 = \mathrm{pr}(C_1 \mid D) = \mathrm{pr}(C_2 \mid D)$, and let $\pi_0 = \mathrm{pr}(C_1 \mid \bar{D}) = \mathrm{pr}(C_2 \mid \bar{D})$.
(a) Develop an explicit expression for $\mathrm{pr}(C_2 \mid C_1)$. Are the events $C_1$ and $C_2$ unconditionally independent? Comment on the more general implications of this particular example.
(b) For this particular example, determine specific conditions involving $\theta$, $\pi_0$, and $\pi_1$ such that $\mathrm{pr}(C_2 \mid C_1) = \mathrm{pr}(C_2)$.

Exercise 1.8. For a certain state lottery, 5 balls are drawn each day randomly without replacement from an urn containing 40 balls numbered individually from 1 to 40. Suppose that there are $k$ ($>1$) consecutive days of such drawings. Develop an expression
for the probability πk that there is at least one matching set of 5 numbers in those k drawings. Exercise 1.9. In a certain small city in the United States, suppose that there are n (≥2) dental offices listed in that city’s phone book. Further, suppose that k (2 ≤ k ≤ n) people each independently and randomly call one of these n dental offices for an appointment. (a) Find the probability α that none of these k people call the same dental office, and then find the numerical value of α when n = 7 and k = 4. (b) Find the probability β that all of these k people call the same dental office, and then find the numerical value of β when n = 7 and k = 4. Exercise 1.10. Suppose that the positive integers 1, 2, . . . , k, k ≥ 3, are arranged randomly in a horizontal line, thus occupying k slots. Assume that all arrangements of these k integers are equally likely. For j = 0, 1, . . . , (k − 2), develop an explicit expression for the probability θj that there are exactly j integers between the integers 1 and k. Exercise 1.11. Suppose that a balanced die is rolled n (≥6) times. Find an explicit expression for the probability θn that each of the six numbers 1, 2, . . . , 6 appears at least once during the n rolls. Find the numerical value of θn when n = 10. Exercise 1.12. An urn contains N balls numbered 1, 2, 3, . . . , (N − 1), N. A sample of n (2 ≤ n < N) balls is selected at random with replacement from this urn, and the n numbers obtained in this sample are recorded. Derive an explicit expression for the probability that the n numbers obtained in this sample of size n are all different from one another (i.e., no two or more of these n numbers are the same). If N = 10 and n = 4, what is the numerical value of this probability? Exercise 1.13. Suppose that an urn contains N (N > 1) balls, each individually labeled with a number from 1 to N, where N is an unknown positive integer. 
(a) If n (2 ≤ n < N) balls are selected oneatatime with replacement from this urn, find an explicit expression for the probability θwr that the ball labelled with the number N is selected. (b) If n (2 ≤ n < N) balls are selected oneatatime without replacement from this urn, find an explicit expression for the probability θwor that the ball labelled with the number N is selected. (c) Use a proof by induction to determine which method of sampling has the higher probability of selecting the ball labeled with the number N. Exercise 1.14. A midwestern U.S. city has a traffic system designed to move morning rushhour traffic from the suburbs into this city’s downtown area via three tunnels. During any weekday, there is a probability θ (0 < θ < 1) that there will be inclement weather. Because of the need for periodic maintenance, tunnel i (i = 1, 2, 3) has probability πi (0 < πi < 1) of being closed to traffic on any weekday. Periodic maintenance
activities for any particular tunnel occur independently of periodic maintenance activities for any other tunnel, and all periodic maintenance activities for these three tunnels are performed independently of weather conditions. The rate of rush-hour traffic flow into the downtown area on any weekday is considered to be excellent if there is no inclement weather and if all three tunnels are open to traffic. The rate of traffic flow is considered to be poor if either: (i) more than one tunnel is closed to traffic; or, (ii) there is inclement weather and exactly one tunnel is closed to traffic. Otherwise, the rate of traffic flow is considered to be marginal.
(a) Develop an explicit expression for the probability that exactly one tunnel is closed to traffic.
(b) Develop explicit expressions for the probability that the rate of traffic flow is excellent, for the probability that the rate of traffic flow is marginal, and for the probability that the rate of traffic flow is poor.
(c) Given that a particular weekday has a marginal rate of traffic flow, develop an explicit expression for the conditional probability that this particular weekday of marginal flow is due to inclement weather and not to a tunnel being closed to traffic.

Exercise 1.15. Bonnie and Clyde each independently toss the same unbalanced coin and count the number of tosses that it takes each of them to obtain the first head. Assume that the probability of obtaining a head with this unbalanced coin is equal to $\pi$, $0 < \pi < 1$, with $\pi \neq 1/2$.
(a) Find the probability that Bonnie and Clyde each require the same number of tosses of this unbalanced coin to obtain the first head.
(b) Find the probability that Bonnie will require more tosses of this unbalanced coin than Clyde to obtain the first head.

Exercise 1.16. Suppose that 15 senior math majors, 7 males and 8 females, at a major public university in the United States each take the same Graduate Record Examination (GRE) in advanced mathematics.
Further, suppose that each of these 15 students has probability π, 0 < π < 1, of obtaining a score that exceeds the 80th percentile for all scores recorded for that particular examination. Given that exactly 5 of these 15 students scored higher than the 80th percentile, what is the numerical value of the probability θ that at least 3 of these 5 students were female? Exercise 1.17. In the popular card game bridge, each of four players is dealt a hand of 13 cards from a wellshuffled deck of 52 standard playing cards. Find the numerical value of the probability that any randomly dealt hand of 13 cards contains all three face cards of the same suit, where a face card is a jack, a queen, or a king; note that it is possible for a hand of 13 cards to contain all three face cards in at least two different suits. Exercise 1.18∗ . In the game known as “craps,” a dice game played in casinos all around the world, a player competes against the casino (called “the house") according to the following rules. If the player (called “the shooter” when rolling the dice) rolls either a 7 or an 11 on the first roll of the pair of dice, the player wins the game (and the house,
of course, loses the game); if the player rolls either 2, 3, or 12 on the first roll, the player loses the game (and the house, of course, wins the game). If the player rolls any of the remaining numbers 4, 5, 6, 8, 9, or 10 on the first roll (such a number is called “the point”), the player keeps rolling the pair of dice until either the point is rolled again or until a 7 is rolled. If the point (e.g., 4) is rolled before a 7 is rolled, the player wins the game; if a 7 is rolled before the point (e.g., 4) is rolled, the player loses the game. Find the exact numerical value of the probability that the player wins the game. Exercise 1.19∗ . In a certain chemical industry, suppose that a proportion πh (0 < πh < 1) of all workers is exposed to a high daily concentration level of a certain potential carcinogen, that a proportion πm (0 < πm < 1) of all workers is exposed to a moderate daily concentration level, that a proportion πl (0 < πl < 1) of all workers is exposed to a low daily concentration level, and that a proportion πo (0 < πo < 1) of all workers receives no exposure to this potential carcinogen. Note that (πh + πm + πl + πo ) = 1. Suppose that n workers in this chemical industry are randomly selected. Let θn be the probability that an even number of highly exposed workers is included in this randomly selected sample of n workers, where 0 is considered to be an even number. (a) Find a difference equation of the form θn = f (πh , θn−1 ) that expresses θn as a function of πh and θn−1 , where θ0 ≡ 1. (b) Assuming that a solution to this difference equation is of the form θn = α + βγn , find an explicit solution for this difference equation (i.e., find specific values for α, β, and γ), and then compute the numerical value of θ50 when πh = 0.05. Exercise 1.20∗ . 
In epidemiological research, a follow-up study involves enrolling randomly selected disease-free subjects with different sets of values of known or suspected risk factors for a certain disease of interest and then following these subjects for a specified period of time to investigate how these risk factors are related to the risk of disease development (i.e., to the probability of developing the disease of interest). A model often used to relate a (row) vector of $k$ risk factors $x = (x_1, x_2, \ldots, x_k)$ to the probability of developing the disease of interest, where $D$ is the event that a person develops the disease of interest, is the logistic model

$$\mathrm{pr}(D \mid x) = \left[1 + e^{-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)}\right]^{-1} = \left[1 + e^{-(\beta_0 + \beta x')}\right]^{-1} = \frac{e^{\beta_0 + \beta x'}}{1 + e^{\beta_0 + \beta x'}},$$

where the intercept $\beta_0$ and the (row) vector $\beta = (\beta_1, \beta_2, \ldots, \beta_k)$ constitute a set of $(k + 1)$ regression coefficients. For certain rare chronic diseases like cancer, a follow-up study can take many years to yield valid and precise statistical conclusions because of the length of time required for sufficient numbers of disease-free subjects to develop the disease. Because of this limitation of follow-up studies for studying the potential causes of rare chronic diseases, epidemiologists developed the case–control study. In a case–control study, random samples of cases (i.e., subjects who have the disease of interest) and controls (i.e., subjects who do not have the disease of interest) are asked to provide information about their values of the risk factors $x_1, x_2, \ldots, x_k$. One problem with this outcome-dependent sampling design is that statistical models for the risk of disease will now depend on the probabilities of selection into the study for both cases and controls.
More specifically, let $S$ be the event that a subject is selected to participate in a case–control study. Then, let

$$\pi_1 = \mathrm{pr}(S \mid D, x) = \mathrm{pr}(S \mid D) \quad \text{and} \quad \pi_0 = \mathrm{pr}(S \mid \bar{D}, x) = \mathrm{pr}(S \mid \bar{D})$$

be the probabilities of selection into the study for cases and controls, respectively, where it is assumed that these selection probabilities do not depend on $x$.
(a) Assuming the logistic model for $\mathrm{pr}(D \mid x)$ given above, show that the risk of disease development for a case–control study, namely $\mathrm{pr}(D \mid S, x)$, can be written as a logistic model, but with an intercept that functionally depends on $\pi_1$ and $\pi_0$. Comment on this finding with regard to using a case–control study to estimate disease risk as a function of $x$.
(b) The risk odds ratio comparing the odds of disease for a subject with the set of risk factors $x^* = (x_1^*, x_2^*, \ldots, x_k^*)$ to the odds of disease for a subject with the set of risk factors $x = (x_1, x_2, \ldots, x_k)$ is defined as

$$\theta_r = \frac{\mathrm{pr}(D \mid x^*)/\mathrm{pr}(\bar{D} \mid x^*)}{\mathrm{pr}(D \mid x)/\mathrm{pr}(\bar{D} \mid x)}.$$

Show that

$$\theta_r = e^{\beta (x^* - x)'},$$

and then show that the risk odds ratio expression for a case–control study, namely,

$$\theta_c = \frac{\mathrm{pr}(D \mid S, x^*)/\mathrm{pr}(\bar{D} \mid S, x^*)}{\mathrm{pr}(D \mid S, x)/\mathrm{pr}(\bar{D} \mid S, x)},$$

is also equal to $e^{\beta (x^* - x)'}$. Finally, interpret these results with regard to the utility of case–control studies for epidemiological research.

Exercise 1.21*. In a certain population of adults, the prevalence of inflammatory bowel disease (IBD) is $\theta$, $0 < \theta < 1$. Suppose that three medical doctors each independently examine the same adult (randomly selected from this population) to determine whether or not this adult has IBD. Further, given that this adult has IBD, suppose that each of the three doctors has probability $\pi_1$, $0 < \pi_1 < 1$, of making the correct diagnosis that this adult does have IBD; and, given that this adult does not have IBD, suppose that each of the three doctors has probability $\pi_0$, $0 < \pi_0 < 1$, of making the correct diagnosis that this adult does not have IBD. Consider the following two diagnostic strategies:
Diagnostic Strategy #1: The diagnosis is based on the majority opinion of the three doctors;
Diagnostic Strategy #2: One of the three doctors is randomly chosen and the diagnosis is based on the opinion of just that one doctor.
(a) Find ranges of values for $\pi_1$ and $\pi_0$ that jointly represent a sufficient condition for which Diagnostic Strategy #1 has a higher probability than Diagnostic Strategy #2 of providing the correct diagnosis. Comment on your findings.
(b) Under the stated assumptions, suppose a fourth doctor’s opinion is solicited. Would it be better to make a diagnosis based on the majority opinion of four doctors (call this Diagnostic Strategy #3) rather than on the majority opinion of three doctors (i.e., Diagnostic Strategy #1)? Under Diagnostic Strategy #3, note that no diagnosis will be made if two doctors claim that the adult has IBD and the other two doctors claim that the adult does not have IBD.

Exercise 1.22*. Consider the following three events:
D: an individual has Alzheimer’s Disease;
E: an individual has diabetes;
M: an individual is male.
And, consider the following list of conditional probabilities:

$$\pi_{11} = \mathrm{pr}(D \mid E \cap M), \quad \pi_{10} = \mathrm{pr}(D \mid E \cap \bar{M}), \quad \pi_{01} = \mathrm{pr}(D \mid \bar{E} \cap M),$$
$$\pi_{00} = \mathrm{pr}(D \mid \bar{E} \cap \bar{M}), \quad \pi_1 = \mathrm{pr}(D \mid E), \quad \text{and} \quad \pi_0 = \mathrm{pr}(D \mid \bar{E}).$$

The risk ratio comparing the risk of Alzheimer’s Disease for a diabetic to that for a nondiabetic among males is equal to

$$RR_1 = \frac{\pi_{11}}{\pi_{01}};$$

the risk ratio comparing the risk of Alzheimer’s Disease for a diabetic to that for a nondiabetic among females is equal to

$$RR_0 = \frac{\pi_{10}}{\pi_{00}};$$

and, the crude risk ratio ignoring gender status that compares the risk of Alzheimer’s Disease for a diabetic to that for a nondiabetic is equal to

$$RR_c = \frac{\pi_1}{\pi_0}.$$

Assuming that $RR_1 = RR_0 = RR$ (i.e., there is homogeneity [or equality] of the risk ratio across gender groups), then gender status is said to be a confounder of the true association between diabetes and Alzheimer’s Disease when $RR_c \neq RR$. Under this homogeneity assumption, find two sufficient conditions for which gender status will not be a confounder of the true association between diabetes and Alzheimer’s Disease; that is, find two sufficient conditions for which $RR_c = RR$.

Exercise 1.23*. Consider a diagnostic test which is being used to diagnose the presence ($D_1$) or absence ($\bar{D}_1$) of some particular disease in a population with pretest probability (or prevalence) of this particular disease equal to $\mathrm{pr}(D_1) = \pi_1$; also, let $\mathrm{pr}(\bar{D}_1) = 1 - \pi_1 = \pi_2$. Further, let $\theta_1 = \mathrm{pr}(T^+ \mid D_1)$ and $\theta_2 = \mathrm{pr}(T^+ \mid \bar{D}_1)$, where $T^+$ denotes the event that the diagnostic test is positive (i.e., the diagnostic test indicates the presence of the disease in question).
(a) Given that the diagnostic test is positive, prove that the posttest odds of an individual having, versus not having, the disease in question is given by the formula

$$\frac{\mathrm{pr}(D_1 \mid T^+)}{\mathrm{pr}(\bar{D}_1 \mid T^+)} = \frac{\theta_1 \pi_1}{\theta_2 \pi_2} = LR_{12} \left(\frac{\pi_1}{\pi_2}\right),$$

where $LR_{12} = \theta_1/\theta_2$ is the so-called likelihood ratio for the diagnostic test and where $\pi_1/\pi_2$ is the pretest odds of the individual having, versus not having, the disease in question. Hence, knowledge of the likelihood ratio for a diagnostic test permits a simple conversion from pretest odds to posttest odds (Birkett, 1988).
(b) Now, suppose we wish to diagnose an individual as having one of three mutually exclusive diseases (i.e., the patient is assumed to have exactly one, but only one, of the three diseases in question). Thus, generalizing the notation in part (a), we have $\sum_{i=1}^{3} \pi_i = 1$, where $\mathrm{pr}(D_i) = \pi_i$ is the pretest probability of having disease $i$, $i = 1, 2, 3$. With $\theta_i = \mathrm{pr}(T^+ \mid D_i)$, $i = 1, 2, 3$, prove that

$$\frac{\mathrm{pr}(D_1 \mid T^+)}{\mathrm{pr}(\bar{D}_1 \mid T^+)} = \left[\sum_{i=2}^{3} \left(LR_{1i}\, \frac{\pi_1}{\pi_i}\right)^{-1}\right]^{-1},$$

where $LR_{1i} = \theta_1/\theta_i$, $i = 2, 3$. Further, prove that the posttest probability of having disease 1 is

$$\mathrm{pr}(D_1 \mid T^+) = \left[1 + \sum_{i=2}^{3} \left(LR_{1i}\, \frac{\pi_1}{\pi_i}\right)^{-1}\right]^{-1}.$$

(c) As a numerical example, consider an emergency room physician attending a patient presenting with acute abdominal pain. This physician is considering the use of a new diagnostic test which will be employed to classify patients into one of three mutually exclusive categories: nonspecific abdominal pain (NS), appendicitis (A), or cholecystitis (C). The published paper describing this new diagnostic test reports that a positive test result gives a likelihood ratio for diagnosing NS versus A of 0.30, namely, $\mathrm{pr}(T^+ \mid \mathrm{NS})/\mathrm{pr}(T^+ \mid \mathrm{A}) = 0.30$. Also, the likelihood ratio for diagnosing NS versus C is 0.50, and the likelihood ratio for diagnosing A versus C is 1.67. In addition, a study of a very large number of patients seen in emergency rooms revealed that the pretest probabilities for the three diseases were pr(NS) = 0.57, pr(A) = 0.33, and pr(C) = 0.10. Using all this information, calculate, for each of these three diseases, the posttest odds and the posttest probability of disease. Based on your numerical results, what is the most likely diagnosis (NS, A, or C) for an emergency room patient with a positive test result based on the use of this particular diagnostic test?

Exercise 1.24*. In medicine, it is often of interest to assess whether two distinct diseases (say, disease A and disease B) tend to occur together. The odds ratio parameter $\psi$ is defined as

$$\psi = \frac{\mathrm{pr}(A \mid B)/\mathrm{pr}(\bar{A} \mid B)}{\mathrm{pr}(A \mid \bar{B})/\mathrm{pr}(\bar{A} \mid \bar{B})},$$

and serves as one statistical measure of the tendency for diseases A and B to occur together. An observed value of $\psi$ significantly greater than 1 may suggest that diseases
A and B have a common etiology, which could lead to better understanding of disease processes and, ultimately, to prevention. Suppose, however, that the diagnosis of the presence of diseases A and B involves the presence of a third factor (say, C). An example would be where a person with an abnormally high cholesterol level would then be evaluated more closely for evidence of both ischemic heart disease and hypothyroidism. In such a situation, one is actually considering the odds ratio

$$\psi_c = \frac{\mathrm{pr}(A \cap C \mid B \cap C)/\mathrm{pr}(\overline{A \cap C} \mid B \cap C)}{\mathrm{pr}(A \cap C \mid \overline{B \cap C})/\mathrm{pr}(\overline{A \cap C} \mid \overline{B \cap C})},$$

which is a measure of the association between diseases A and B each observed simultaneously with factor C.
(a) Show that $\psi$ and $\psi_c$ are related by the equation

$$\psi_c = \psi \left[\frac{\mathrm{pr}(C \mid A \cap B)\,\mathrm{pr}(C \mid \bar{A} \cap \bar{B})}{\mathrm{pr}(C \mid \bar{A} \cap B)\,\mathrm{pr}(C \mid A \cap \bar{B})}\right] \left[1 + \frac{\mathrm{pr}(\bar{C})}{\mathrm{pr}(\bar{A} \cap \bar{B} \cap C)}\right].$$

(b) If A, B, and C actually occur completely independently of one another, how are $\psi$ and $\psi_c$ related? Comment on the direction of the bias when using $\psi_c$ instead of $\psi$ as the measure of association between diseases A and B.

Exercise 1.25*. Suppose that a certain process generates a sequence of $(s + t)$ outcomes of two types, say, $s$ successes (denoted as S’s) and $t$ failures (denoted as F’s). A run is a subsequence of outcomes of the same type which is both preceded and succeeded by outcomes of the opposite type or by the beginning or by the end of the complete sequence. For example, consider the sequence SSFSSSFSFSFFS of $s = 8$ successes and $t = 5$ failures. When rewritten as SS F SSS F S F S FF S, it is clear that this particular sequence contains a total of nine runs, namely, five S runs (three of length 1, one of length 2, and one of length 3) and four F runs (three of length 1 and one of length 2). Since the S runs and F runs alternate in occurrence, the number of S runs differs by at most one from the number of F runs.
(a) Assuming that all possible sequences of $s$ successes and $t$ failures are equally likely to occur, derive an expression for the probability $\pi_x$ that any sequence contains a total of exactly $x$ runs. HINT: Consider separately the two situations where $x$ is an even positive integer and where $x$ is an odd positive integer.
(b) For each year over a 7-year period of time, a certain cancer treatment center recorded the percentage of pancreatic cancer patients who survived at least 5 years following treatment involving both surgery and chemotherapy. For each
of the seven years, let the event S be the event that the survival percentage is at least 20%, and let the event $F = \bar{S}$. Suppose that the following sequence (ordered chronologically) is observed: FFSFSSS. Does this observed sequence provide evidence of a nonrandom pattern of 5-year survival percentages over the 7-year period of time? For additional information about the theory of runs, see Feller (1968).

Exercise 1.26*. Consider the following experiment designed to examine whether a human subject has extrasensory perception (ESP). A set of $R$ ($R > 2$) chips, numbered individually from 1 to $R$, is arranged in random order by an examiner, and this random order cannot be seen by the subject under study. Then, the subject is given an identical set of $R$ chips and is asked to arrange them in exactly the same order as the random order constructed by the experimenter.
(a) Develop an expression for the probability $\theta(0, R)$ that the subject has no chips in their correct positions (i.e., in positions corresponding to the chip positions constructed by the experimenter). Also, find the limiting value of $\theta(0, R)$ as $R \to \infty$, and then comment on your finding.
(b) For $r = 0, 1, 2, \ldots, R$, use the result in part (a) to develop an expression for the probability $\theta(r, R)$ that the subject has exactly $r$ out of $R$ chips in their correct positions.
(c) Assuming that $R = 5$, what is the probability that the subject places at least 3 chips in their correct positions?

Exercise 1.27*. Suppose that two players (denoted Player A and Player B) play a game where they alternate flipping a balanced coin, with the winner of the game being the first player to obtain $k$ heads (where $k$ is a known positive integer).
(a) With $a$ and $b$ being positive integers, let $(a, b, A)$ denote that specific game where Player A needs $a$ heads to win, where Player B needs $b$ heads to win, and where it is Player A’s turn to flip the balanced coin. Similarly, let $(a, b, B)$ denote that specific game where Player A needs $a$ heads to win, where Player B needs $b$ heads to win, and where it is Player B’s turn to flip the balanced coin. Also, let $\pi(a, b, A)$ be the probability that Player A wins game $(a, b, A)$, and let $\pi(a, b, B)$ be the probability that Player A wins game $(a, b, B)$. Show that

$$\pi(a, b, A) = \frac{2}{3}\,\pi(a - 1, b, B) + \frac{1}{3}\,\pi(a, b - 1, A),$$

and that

$$\pi(a, b, B) = \frac{1}{3}\,\pi(a - 1, b, B) + \frac{2}{3}\,\pi(a, b - 1, A).$$

(b) Assuming that Player A goes first in any game, find the exact numerical values of the probabilities that A wins the game when $k = 2$ and when $k = 3$. In other words, find the exact numerical values of $\pi(2, 2, A)$ and $\pi(3, 3, A)$.
Exercise 1.28∗ . The first author (LLK) has been a University of North Carolina (UNC) Tar Heel basketball fan for close to 50 years. This exercise is dedicated to LLK’s alltime favorite Tar Heel basketball player, Tyler Hansbrough; Tyler is the epitome of a studentathlete, and he led the Tar Heels to the 2009 NCAA Division I men’s basketball national championship. During his 4year career, Tyler also set numerous UNC, ACC, and NCAA individual records. In the questions to follow, assume that Tyler has a fixed probability π, 0 < π < 1, of making any particular free throw, and also assume that the outcome (i.e., either a make or a miss) for any one free throw is independent of the outcome for any other free throw. (a) Given that Tyler starts shooting free throws, derive a general expression (as a function of π, a, and b) for the probability θ(π, a, b) that Tyler makes a consecutive free throws before he misses b consecutive free throws, where a and b are positive integers. For his 4year career at UNC, Tyler’s value of π was 0.791; using this value of π, compute the numerical value of the probability that Tyler makes 10 consecutive free throws before he misses two consecutive free throws. HINT: Let Aab be the event that Tyler makes a consecutive free throws before he misses b consecutive free throws, let Ba be the event that Tyler makes the first a free throws that he attempts, and let Cb be the event that Tyler misses the first b free throws that he attempts. Express α = pr(Aab B1 ) as a function of both π and β = pr(Aab B¯ 1 ), express β as a function of both π and α, and then use the fact that θ(π, a, b) = pr(Aab ) = πα + (1 − π)β. (b) Find the value of θ(π, a, b) when both π = 0.50 and a = b; also, find the value of θ(π, a, b) when a = b = 1. For these two special cases, do these answers make sense? Also, comment on the reasonableness of any assumptions underlying the development of the expression for θ(π, a, b). 
(c) If Tyler continues to shoot free throws indefinitely, show that he must eventually either make $a$ consecutive free throws or miss $b$ consecutive free throws.

SOLUTIONS

Solution 1.1
(a) Let $D_{ij}$ be the event that die #1 shows the number $i$ and that die #2 shows the number $j$, $i = 1, 2, \ldots, 6$ and $j = 1, 2, \ldots, 6$. Clearly, these 36 events form the finest partition of the set of possible experimental outcomes. Thus, it follows that $\mathrm{pr}(E_x) = \sum^{*} \mathrm{pr}(D_{ij})$, where $\mathrm{pr}(D_{ij}) = 1/36$ for all $i$ and $j$, and where $\sum^{*}$ indicates summation over all $(i, j)$ pairs for which $(i + j) = x$. For example,

$$\mathrm{pr}(E_6) = \mathrm{pr}(D_{15}) + \mathrm{pr}(D_{51}) + \mathrm{pr}(D_{24}) + \mathrm{pr}(D_{42}) + \mathrm{pr}(D_{33}) = \frac{5}{36}.$$

In general,

$$\mathrm{pr}(E_x) = \frac{\min\{(x - 1), (13 - x)\}}{36}, \quad x = 2, 3, \ldots, 12.$$
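The general expression for $\mathrm{pr}(E_x)$ can be confirmed by brute-force enumeration of the 36 equally likely outcomes. The sketch below is only a check, not part of the original solution; the names `counts`, `pr_E`, and `formula_ok` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes (i, j) and tally each sum x = i + j.
counts = {x: 0 for x in range(2, 13)}
for i, j in product(range(1, 7), repeat=2):
    counts[i + j] += 1

# Enumerated probabilities pr(E_x) as exact fractions.
pr_E = {x: Fraction(counts[x], 36) for x in counts}

# Compare each enumerated value with min(x - 1, 13 - x) / 36.
formula_ok = all(pr_E[x] == Fraction(min(x - 1, 13 - x), 36) for x in pr_E)

print(pr_E[6], formula_ok)
```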
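(b) Note that

$$A = E_4 \cup E_8 \cup E_{12}, \quad B = E_{10} \cup E_{11} \cup E_{12}, \quad \text{and} \quad C = E_4 \cup E_6 \cup E_8 \cup E_9 \cup E_{10} \cup E_{12},$$

so that $\bar{C} = E_2 \cup E_3 \cup E_5 \cup E_7 \cup E_{11}$. So, it follows directly that $\mathrm{pr}(A) = \frac{1}{4}$, $\mathrm{pr}(B) = \frac{1}{6}$, and $\mathrm{pr}(C) = \frac{7}{12}$. Also, $A \cap B = E_{12}$, so that $\mathrm{pr}(A \cap B) = \frac{1}{36}$; $A \cap C = E_4 \cup E_8 \cup E_{12}$, so that $\mathrm{pr}(A \cap C) = \frac{1}{4}$; $B \cap C = E_{10} \cup E_{12}$, so that $\mathrm{pr}(B \cap C) = \frac{1}{9}$; $A \cap B \cap C = E_{12}$, so that $\mathrm{pr}(A \cap B \cap C) = \frac{1}{36}$; and, $A \cup B \cup C = E_4 \cup E_6 \cup E_8 \cup E_9 \cup E_{10} \cup E_{11} \cup E_{12}$, so that $\mathrm{pr}(A \cup B \cup C) = \frac{23}{36}$. Also,

$$\mathrm{pr}(A \cup B \mid C) = \mathrm{pr}(A \mid C) + \mathrm{pr}(B \mid C) - \mathrm{pr}(A \cap B \mid C)$$
$$= \frac{\mathrm{pr}(A \cap C)}{\mathrm{pr}(C)} + \frac{\mathrm{pr}(B \cap C)}{\mathrm{pr}(C)} - \frac{\mathrm{pr}(A \cap B \cap C)}{\mathrm{pr}(C)} = \frac{1/4}{7/12} + \frac{1/9}{7/12} - \frac{1/36}{7/12} = \frac{4}{7}.$$

Finally,

$$\mathrm{pr}(A \mid B \cup \bar{C}) = \frac{\mathrm{pr}[A \cap (B \cup \bar{C})]}{\mathrm{pr}(B \cup \bar{C})} = \frac{\mathrm{pr}[(A \cap B) \cup (A \cap \bar{C})]}{\mathrm{pr}(B \cup \bar{C})} = \frac{\mathrm{pr}(A \cap B) + \mathrm{pr}(A \cap \bar{C}) - \mathrm{pr}(A \cap B \cap \bar{C})}{\mathrm{pr}(B) + \mathrm{pr}(\bar{C}) - \mathrm{pr}(B \cap \bar{C})}.$$

Since $\mathrm{pr}(A \cap \bar{C}) = \mathrm{pr}(A \cap B \cap \bar{C}) = 0$ and since $\mathrm{pr}(B \cap \bar{C}) = \mathrm{pr}(E_{11}) = \frac{2}{36}$, we obtain

$$\mathrm{pr}(A \mid B \cup \bar{C}) = \frac{\mathrm{pr}(A \cap B)}{\mathrm{pr}(B) + \mathrm{pr}(\bar{C}) - \mathrm{pr}(B \cap \bar{C})} = \frac{\frac{1}{36}}{\frac{1}{6} + \frac{5}{12} - \frac{2}{36}} = \frac{1}{19}.$$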
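The two conditional probabilities at the end of Solution 1.1(b) can be cross-checked by direct enumeration. This is a minimal sketch under the definitions of $A$, $B$, and $C$ stated in Exercise 1.1; the helper names (`pr`, `in_A`, and so on) are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely (die 1, die 2) outcomes.
outcomes = list(product(range(1, 7), repeat=2))


def pr(event):
    """Exact probability that the sum of the two dice satisfies `event`."""
    hits = sum(1 for i, j in outcomes if event(i + j))
    return Fraction(hits, len(outcomes))


def in_A(x):
    return x % 4 == 0  # sum divisible by 4


def in_B(x):
    return x > 9  # sum greater than 9


def in_C(x):
    return x in {4, 6, 8, 9, 10, 12}  # sum is not a prime number


def in_B_or_not_C(x):
    return in_B(x) or not in_C(x)


# pr(A u B | C) = pr((A u B) n C) / pr(C)
p_AuB_given_C = pr(lambda x: (in_A(x) or in_B(x)) and in_C(x)) / pr(in_C)

# pr(A | B u C-bar) = pr(A n (B u C-bar)) / pr(B u C-bar)
p_A_given_BuNotC = pr(lambda x: in_A(x) and in_B_or_not_C(x)) / pr(in_B_or_not_C)

print(p_AuB_given_C, p_A_given_BuNotC)
```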
Solution 1.2. Let $\theta_n$ be the probability that a family with $n$ children has at least one male child and at least one female child among these $n$ children. Further, let $M_n$ be the event that all $n$ children are male, and let $F_n$ be the event that all $n$ children are female. And, note that the events $M_n$ and $F_n$ are mutually exclusive. Then,

$$\theta_n = 1 - \mathrm{pr}(M_n \cup F_n) = 1 - \mathrm{pr}(M_n) - \mathrm{pr}(F_n) = 1 - \left(\frac{1}{2}\right)^{n} - \left(\frac{1}{2}\right)^{n} = 1 - \left(\frac{1}{2}\right)^{n-1}.$$

So, we need to find the smallest value of $n$, say $n^*$, such that

$$\theta_n = 1 - \left(\frac{1}{2}\right)^{n-1} \geq 0.90.$$

Since $\theta_4 = 0.875 < 0.90$ and $\theta_5 = 0.9375 \geq 0.90$, it then follows that $n^* = 5$.
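The threshold search in Solution 1.2 can be sketched in a few lines. The code assumes the closed form $\theta_n = 1 - (1/2)^{n-1}$ derived above; the function name `theta` and the variable `n_star` are illustrative.

```python
from fractions import Fraction


def theta(n):
    """pr(at least one male and at least one female among n children)."""
    return 1 - Fraction(1, 2) ** (n - 1)


# Search upward from n = 2 for the first n with theta(n) >= 0.90.
n_star = 2
while theta(n_star) < Fraction(9, 10):
    n_star += 1

print(n_star, float(theta(n_star - 1)), float(theta(n_star)))
```

Printing the value just below the threshold alongside the value at $n^*$ shows why no smaller family size suffices.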
Solution 1.3. Define the following events:
W1: a white ball is selected from Urn 1;
W2: a white ball is selected from Urn 2;
W3: two white balls are selected from Urn 3;
B1: a black ball is selected from Urn 1;
B2: a black ball is selected from Urn 2.
Then,

$$\mathrm{pr}(W_3) = \mathrm{pr}(W_1 \cap W_2 \cap W_3) + \mathrm{pr}(W_1 \cap B_2 \cap W_3) + \mathrm{pr}(B_1 \cap W_2 \cap W_3) + \mathrm{pr}(B_1 \cap B_2 \cap W_3)$$
$$= \mathrm{pr}(W_1)\mathrm{pr}(W_2 \mid W_1)\mathrm{pr}(W_3 \mid W_1 \cap W_2) + \mathrm{pr}(W_1)\mathrm{pr}(B_2 \mid W_1)\mathrm{pr}(W_3 \mid W_1 \cap B_2)$$
$$\quad + \mathrm{pr}(B_1)\mathrm{pr}(W_2 \mid B_1)\mathrm{pr}(W_3 \mid B_1 \cap W_2) + \mathrm{pr}(B_1)\mathrm{pr}(B_2 \mid B_1)\mathrm{pr}(W_3 \mid B_1 \cap B_2)$$
$$= (3/7)(3/6)[(5/7)(4/6)] + (3/7)(3/6)[(4/7)(3/6)] + (4/7)(2/6)[(5/7)(4/6)] + (4/7)(4/6)[(4/7)(3/6)] = 0.3628.$$

Solution 1.4
(a) pr(final match lasts exactly 6 games) = pr[(Player A wins 4 of first 5 games) ∩ (Player A wins sixth game)] + pr[(Player B wins 4 of first 5 games) ∩ (Player B wins sixth game)]

$$= \left[C^5_4 \pi^4 (1 - \pi)\right](\pi) + \left[C^5_4 (1 - \pi)^4 \pi\right](1 - \pi).$$

(b) pr[(Player A wins match in 7 games) | (Player A wins first 2 games)]

$$= \frac{\mathrm{pr}[(\text{Player A wins match in 7 games}) \cap (\text{Player A wins first 2 games})]}{\mathrm{pr}(\text{Player A wins first 2 games})}.$$

Since pr[(Player A wins match in 7 games) ∩ (Player A wins first 2 games)] = pr[(Player A wins two of games #3 through #6) ∩ (Player A wins game #7) ∩ (Player A wins first 2 games)] = pr(Player A wins two of games #3 through #6) × pr(Player A wins game #7) × pr(Player A wins first 2 games), it follows that pr[(Player A wins match in 7 games) | (Player A wins first 2 games)] = pr(Player A wins two of games #3 through #6) × pr(Player A wins game #7)

$$= \left[C^4_2 \pi^2 (1 - \pi)^2\right](\pi) = C^4_2 \pi^3 (1 - \pi)^2.$$
20
Basic Probability Theory
(c) pr(Player B wins final match) = pr[∪9j=5 (Player B wins match in j games)] =
9
pr[(Player B wins 4 of first(j − 1)games)
j=5
∩ (Player B wins jth game)] =
9
j−1
[C4 (1 − π)4 πj−5 ](1 − π)
j=5
=
9 1 − π 5 j−1 j C4 π . π j=5
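Solution 1.5
(a) Define the following events: $D$: "a person has the disease of interest"; $A^+$: "Test A is positive"; $B^+$: "Test B is positive". Then,
\[
\mathrm{pr}(D) = 0.01, \quad \mathrm{pr}(A^+ \mid D) = 1 - 0.10 = 0.90, \quad \mathrm{pr}(B^+ \mid D) = 1 - 0.05 = 0.95, \quad \mathrm{pr}(A^+ \mid \bar{D}) = 0.06, \quad \text{and} \quad \mathrm{pr}(B^+ \mid \bar{D}) = 0.08.
\]
So,
\[
\begin{aligned}
\mathrm{pr}(D \mid A^+ \cap B^+) &= \frac{\mathrm{pr}(D \cap A^+ \cap B^+)}{\mathrm{pr}(A^+ \cap B^+)} \\
&= \frac{\mathrm{pr}(A^+ \cap B^+ \mid D)\mathrm{pr}(D)}{\mathrm{pr}(A^+ \cap B^+ \mid D)\mathrm{pr}(D) + \mathrm{pr}(A^+ \cap B^+ \mid \bar{D})\mathrm{pr}(\bar{D})} \\
&= \frac{\mathrm{pr}(A^+ \mid D)\mathrm{pr}(B^+ \mid D)\mathrm{pr}(D)}{\mathrm{pr}(A^+ \mid D)\mathrm{pr}(B^+ \mid D)\mathrm{pr}(D) + \mathrm{pr}(A^+ \mid \bar{D})\mathrm{pr}(B^+ \mid \bar{D})\mathrm{pr}(\bar{D})} \\
&= \frac{(0.90)(0.95)(0.01)}{(0.90)(0.95)(0.01) + (0.06)(0.08)(0.99)} \\
&= \frac{0.0086}{0.0086 + 0.0048} = 0.6418.
\end{aligned}
\]
(b)
\[
\begin{aligned}
\mathrm{pr}(B^+ \mid A^+) &= \frac{\mathrm{pr}(A^+ \cap B^+)}{\mathrm{pr}(A^+)} \\
&= \frac{\mathrm{pr}(A^+ \cap B^+ \cap D) + \mathrm{pr}(A^+ \cap B^+ \cap \bar{D})}{\mathrm{pr}(A^+ \cap D) + \mathrm{pr}(A^+ \cap \bar{D})} \\
&= \frac{\mathrm{pr}(A^+ \mid D)\mathrm{pr}(B^+ \mid D)\mathrm{pr}(D) + \mathrm{pr}(A^+ \mid \bar{D})\mathrm{pr}(B^+ \mid \bar{D})\mathrm{pr}(\bar{D})}{\mathrm{pr}(A^+ \mid D)\mathrm{pr}(D) + \mathrm{pr}(A^+ \mid \bar{D})\mathrm{pr}(\bar{D})} \\
&= \frac{(0.90)(0.95)(0.01) + (0.06)(0.08)(0.99)}{(0.90)(0.01) + (0.06)(0.99)} \\
&= \frac{0.0086 + 0.0048}{0.0090 + 0.0594} = 0.1959.
\end{aligned}
\]
(c)
\[
\begin{aligned}
\mathrm{pr}(A^+ \cup B^+ \mid D) &= \frac{\mathrm{pr}[(A^+ \cup B^+) \cap D]}{\mathrm{pr}(D)} = \frac{\mathrm{pr}[(A^+ \cap D) \cup (B^+ \cap D)]}{\mathrm{pr}(D)} \\
&= \frac{\mathrm{pr}(A^+ \cap D) + \mathrm{pr}(B^+ \cap D) - \mathrm{pr}(A^+ \cap B^+ \cap D)}{\mathrm{pr}(D)} \\
&= \frac{\mathrm{pr}(A^+ \mid D)\mathrm{pr}(D) + \mathrm{pr}(B^+ \mid D)\mathrm{pr}(D) - \mathrm{pr}(A^+ \mid D)\mathrm{pr}(B^+ \mid D)\mathrm{pr}(D)}{\mathrm{pr}(D)} \\
&= \frac{(0.90)(0.01) + (0.95)(0.01) - (0.90)(0.95)(0.01)}{0.01} \\
&= \frac{0.0090 + 0.0095 - 0.0086}{0.01} = \frac{0.0099}{0.01} = 0.9900.
\end{aligned}
\]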
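The three calculations in Solution 1.5 can be reproduced in a few lines. This is a sketch only; the variable names are illustrative. It carries full precision throughout, whereas the solution rounds each product to four decimal places before dividing, so the results (about 0.643, 0.194, and 0.995) differ from the printed values in the third decimal place.

```python
# Inputs quoted in Solution 1.5.
p_d = 0.01                      # pr(D)
sens_a, sens_b = 0.90, 0.95     # pr(A+ | D), pr(B+ | D)
fpos_a, fpos_b = 0.06, 0.08     # pr(A+ | not D), pr(B+ | not D)

# Joint probabilities, using conditional independence of the tests given disease status.
both_and_d = sens_a * sens_b * p_d
both_and_not_d = fpos_a * fpos_b * (1 - p_d)

part_a = both_and_d / (both_and_d + both_and_not_d)            # pr(D | A+ and B+)
part_b = (both_and_d + both_and_not_d) / (
    sens_a * p_d + fpos_a * (1 - p_d)
)                                                               # pr(B+ | A+)
part_c = sens_a + sens_b - sens_a * sens_b                      # pr(A+ or B+ | D)

print(round(part_a, 4), round(part_b, 4), round(part_c, 4))
```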
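Solution 1.6
(a) For $i = 1, 2, 3$, let $M_i$ be the event that "machine $M_i$ performs the PSA analysis"; and, let $C$ be the event that "the PSA analysis is done correctly." Then,
\[
\begin{aligned}
\mathrm{pr}(C) &= \mathrm{pr}(C \cap M_1) + \mathrm{pr}(C \cap M_2) + \mathrm{pr}(C \cap M_3) \\
&= \mathrm{pr}(C \mid M_1)\mathrm{pr}(M_1) + \mathrm{pr}(C \mid M_2)\mathrm{pr}(M_2) + \mathrm{pr}(C \mid M_3)\mathrm{pr}(M_3) \\
&= (0.99)(0.20) + (0.98)(0.50) + (0.97)(0.30) = 0.979.
\end{aligned}
\]
(b)
\[
\begin{aligned}
\mathrm{pr}(M_1 \cup M_2 \mid \bar{C}) &= \frac{\mathrm{pr}[(M_1 \cup M_2) \cap \bar{C}]}{\mathrm{pr}(\bar{C})} = \frac{\mathrm{pr}[(M_1 \cap \bar{C}) \cup (M_2 \cap \bar{C})]}{1 - \mathrm{pr}(C)} \\
&= \frac{\mathrm{pr}(M_1 \cap \bar{C}) + \mathrm{pr}(M_2 \cap \bar{C})}{1 - \mathrm{pr}(C)} \\
&= \frac{\mathrm{pr}(\bar{C} \mid M_1)\mathrm{pr}(M_1) + \mathrm{pr}(\bar{C} \mid M_2)\mathrm{pr}(M_2)}{1 - \mathrm{pr}(C)} \\
&= \frac{(0.01)(0.20) + (0.02)(0.50)}{1 - 0.979} = 0.5714.
\end{aligned}
\]
Equivalently,
\[
\mathrm{pr}(M_1 \cup M_2 \mid \bar{C}) = 1 - \mathrm{pr}(M_3 \mid \bar{C}) = 1 - \frac{\mathrm{pr}(\bar{C} \mid M_3)\mathrm{pr}(M_3)}{\mathrm{pr}(\bar{C})} = 1 - \frac{(0.03)(0.30)}{0.021} = 0.5714.
\]
(c)
\[
\mathrm{pr}(\text{1 of 2 PSA analyses is correct}) = C^{2}_{1}\,\mathrm{pr}(C)\,\mathrm{pr}(\bar{C}) = 2(0.979)(0.021) = 0.0411.
\]
Now,
\[
\begin{aligned}
&\mathrm{pr}(\text{machine } M_2 \text{ did not perform both PSA analyses} \mid \text{1 of 2 PSA analyses is correct}) \\
&\quad = 1 - \mathrm{pr}(\text{machine } M_2 \text{ performed both PSA analyses} \mid \text{1 of 2 PSA analyses is correct}) \\
&\quad = 1 - \frac{[\mathrm{pr}(M_2)]^{2}\, C^{2}_{1}\,\mathrm{pr}(C \mid M_2)\,\mathrm{pr}(\bar{C} \mid M_2)}{0.0411} \\
&\quad = 1 - \frac{2(0.98)(0.02)(0.50)^{2}}{0.0411} = 0.7616.
\end{aligned}
\]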
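The arithmetic in Solution 1.6 follows directly from the law of total probability and Bayes' theorem, and can be sketched as below. The dictionary and variable names are illustrative only.

```python
# Machine usage probabilities and per-machine accuracy, as quoted in Solution 1.6.
pr_m = {1: 0.20, 2: 0.50, 3: 0.30}
pr_c_given_m = {1: 0.99, 2: 0.98, 3: 0.97}

# (a) Total probability of a correct analysis.
pr_c = sum(pr_c_given_m[i] * pr_m[i] for i in pr_m)

# (b) pr(M1 or M2 | analysis incorrect).
part_b = sum((1 - pr_c_given_m[i]) * pr_m[i] for i in (1, 2)) / (1 - pr_c)

# (c) pr(M2 did not perform both analyses | exactly 1 of 2 analyses is correct).
one_of_two = 2 * pr_c * (1 - pr_c)
m2_both_and_one_of_two = pr_m[2] ** 2 * 2 * pr_c_given_m[2] * (1 - pr_c_given_m[2])
part_c = 1 - m2_both_and_one_of_two / one_of_two

print(round(pr_c, 3), round(part_b, 4), round(part_c, 4))
```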
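Solution 1.7
(a) First, $\mathrm{pr}(C_2 \mid C_1) = \mathrm{pr}(C_1 \cap C_2)/\mathrm{pr}(C_1)$. Now,
\[
\begin{aligned}
\mathrm{pr}(C_1) = \mathrm{pr}(C_2) &= \mathrm{pr}(C_2 \cap D) + \mathrm{pr}(C_2 \cap \bar{D}) \\
&= \mathrm{pr}(C_2 \mid D)\mathrm{pr}(D) + \mathrm{pr}(C_2 \mid \bar{D})\mathrm{pr}(\bar{D}) \\
&= \pi_1\theta + \pi_0(1-\theta) = \theta(\pi_1 - \pi_0) + \pi_0.
\end{aligned}
\]
And, appealing to the conditional independence of the events $C_1$ and $C_2$ given disease status, we have
\[
\begin{aligned}
\mathrm{pr}(C_1 \cap C_2) &= \mathrm{pr}(C_1 \cap C_2 \mid D)\mathrm{pr}(D) + \mathrm{pr}(C_1 \cap C_2 \mid \bar{D})\mathrm{pr}(\bar{D}) \\
&= \mathrm{pr}(C_1 \mid D)\mathrm{pr}(C_2 \mid D)\mathrm{pr}(D) + \mathrm{pr}(C_1 \mid \bar{D})\mathrm{pr}(C_2 \mid \bar{D})\mathrm{pr}(\bar{D}) \\
&= \pi_1^{2}\theta + \pi_0^{2}(1-\theta) = \theta(\pi_1^{2} - \pi_0^{2}) + \pi_0^{2}.
\end{aligned}
\]
Finally,
\[
\mathrm{pr}(C_2 \mid C_1) = \frac{\theta(\pi_1^{2} - \pi_0^{2}) + \pi_0^{2}}{\theta(\pi_1 - \pi_0) + \pi_0},
\]
so that, in this example, $\mathrm{pr}(C_2 \mid C_1) \ne \mathrm{pr}(C_2)$. More generally, this particular example illustrates the general principle that conditional independence between two events does not allow one to conclude that they are also unconditionally independent.
(b) Now,
\[
\mathrm{pr}(C_2 \mid C_1) = \mathrm{pr}(C_2) \Leftrightarrow \theta(\pi_1^{2} - \pi_0^{2}) + \pi_0^{2} = [\theta(\pi_1 - \pi_0) + \pi_0]^{2},
\]
which is equivalent to the condition
\[
\theta(1-\theta)(\pi_1 - \pi_0)^{2} = 0.
\]
So, $\mathrm{pr}(C_2 \mid C_1) = \mathrm{pr}(C_2)$ when either $\theta = 0$ (i.e., the prevalence of the disease in the population is equal to zero, so that nobody in the population has the disease), $\theta = 1$ (i.e., the prevalence of the disease in the population is equal to one, so that everybody in the population has the disease), or the probability of a correct diagnosis does not depend on disease status [i.e., since $\mathrm{pr}(C_1) = \mathrm{pr}(C_2) = \theta(\pi_1 - \pi_0) + \pi_0$, the condition $\pi_1 = \pi_0$ gives $\mathrm{pr}(C_1) = \mathrm{pr}(C_2) = \pi_1 = \pi_0$].

Solution 1.8. Now, $\pi_k = 1 - \mathrm{pr}(\text{no matching sets of 5 numbers in } k \text{ drawings})$, so that
\[
\pi_k = 1 - \left(\frac{C^{40}_{5}}{C^{40}_{5}}\right)\left(\frac{C^{40}_{5} - 1}{C^{40}_{5}}\right)\left(\frac{C^{40}_{5} - 2}{C^{40}_{5}}\right)\cdots\left(\frac{C^{40}_{5} - (k-1)}{C^{40}_{5}}\right) = 1 - \frac{\prod_{j=1}^{k-1}\left(C^{40}_{5} - j\right)}{\left(C^{40}_{5}\right)^{(k-1)}}.
\]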
Solution 1.9
(a) For $i = 1, 2, \ldots, k$, let $A_i$ be the event that the $i$th person calls a dental office that is different from the dental offices called by the preceding $(i-1)$ people. Then,
\[
\alpha = \mathrm{pr}\left(\cap_{i=1}^{k} A_i\right) = \left(\frac{n}{n}\right)\left(\frac{n-1}{n}\right)\left(\frac{n-2}{n}\right)\cdots\left(\frac{n-(k-1)}{n}\right) = \frac{[n!/(n-k)!]}{n^{k}}.
\]
When $n = 7$ and $k = 4$, then $\alpha = 0.350$.
(b) For $j = 1, 2, \ldots, n$, let $B_j$ be the event that all $k$ people call the $j$th dental office. Then,
\[
\beta = \mathrm{pr}\left(\cup_{j=1}^{n} B_j\right) = \sum_{j=1}^{n} \mathrm{pr}(B_j) = \sum_{j=1}^{n} \left(\frac{1}{n}\right)^{k} = \frac{1}{n^{k-1}}.
\]
When $n = 7$ and $k = 4$, then $\beta = 0.003$.

Solution 1.10. For $j = 0, 1, \ldots, (k-2)$, there are exactly $(k-j-1)$ pairs of slots for which the integer 1 precedes the integer $k$ and for which there are exactly $j$ integers between the integers 1 and $k$. Also, the integer $k$ can precede the integer 1, and the other $(k-2)$ integers can be arranged in the remaining $(k-2)$ slots in $(k-2)!$ ways. So,
\[
\theta_j = \frac{2(k-j-1)[(k-2)!]}{k!} = \frac{2(k-j-1)}{k(k-1)}, \quad j = 0, 1, \ldots, (k-2).
\]
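Solution 1.11. For $i = 1, 2, \ldots, 6$, let $A_i$ be the event that the number $i$ does not appear in $n$ rolls of this balanced die. Then, $\theta_n = 1 - \mathrm{pr}\left(\cup_{i=1}^{6} A_i\right)$, where $\mathrm{pr}\left(\cup_{i=1}^{6} A_i\right)$ may be calculated using Result (ii) on page 4. By symmetry, $\mathrm{pr}(A_1) = \mathrm{pr}(A_2) = \cdots = \mathrm{pr}(A_6)$ and $\mathrm{pr}(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = \mathrm{pr}\left(\cap_{i=1}^{k} A_i\right)$, $(1 \le i_1 < i_2 < \cdots < i_k \le 6)$. Thus:
\[
\begin{aligned}
\mathrm{pr}\left(\cup_{i=1}^{6} A_i\right) &= C^{6}_{1}[\mathrm{pr}(A_1)] - C^{6}_{2}\left[\mathrm{pr}\left(\cap_{i=1}^{2} A_i\right)\right] + C^{6}_{3}\left[\mathrm{pr}\left(\cap_{i=1}^{3} A_i\right)\right] - C^{6}_{4}\left[\mathrm{pr}\left(\cap_{i=1}^{4} A_i\right)\right] + C^{6}_{5}\left[\mathrm{pr}\left(\cap_{i=1}^{5} A_i\right)\right] \\
&= 6\left(\frac{5}{6}\right)^{n} - 15\left(\frac{4}{6}\right)^{n} + 20\left(\frac{3}{6}\right)^{n} - 15\left(\frac{2}{6}\right)^{n} + 6\left(\frac{1}{6}\right)^{n}.
\end{aligned}
\]
When $n = 10$, $\theta_{10} \approx (1 - 0.73) = 0.27$.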
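The inclusion-exclusion sum of Solution 1.11 is easy to evaluate exactly. The sketch below uses rational arithmetic; the function name `theta` and the `faces` parameter are illustrative additions, not notation from the text.

```python
from fractions import Fraction
from math import comb


def theta(n: int, faces: int = 6) -> Fraction:
    """Probability that every face appears at least once in n rolls of a balanced die."""
    # pr(union of A_i) by inclusion-exclusion; the term with all faces missing is zero.
    union = sum(
        (-1) ** (m - 1) * comb(faces, m) * Fraction(faces - m, faces) ** n
        for m in range(1, faces)
    )
    return 1 - union


print(float(theta(10)))
```

As a sanity check, `theta(n)` is zero for $n < 6$, and `theta(6)` equals $6!/6^6$, the probability that six rolls show six different faces.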
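Solution 1.12. Let $A_i$ denote the event that the first $i$ numbers selected are different from one another, $i = 2, 3, \ldots, n$. Note that $A_n \subset A_{n-1} \subset \cdots \subset A_3 \subset A_2$. So,
\[
\begin{aligned}
\mathrm{pr}(A_n) &= \mathrm{pr}(\text{first } n \text{ numbers selected are different from one another}) = \mathrm{pr}\left[\cap_{i=2}^{n} A_i\right] \\
&= \mathrm{pr}(A_2)\,\mathrm{pr}(A_3 \mid A_2)\,\mathrm{pr}(A_4 \mid A_2 \cap A_3)\cdots\mathrm{pr}\left[A_n \,\Big|\, \cap_{i=2}^{n-1} A_i\right] \\
&= \left(1 - \frac{1}{N}\right)\left(1 - \frac{2}{N}\right)\left(1 - \frac{3}{N}\right)\cdots\left(1 - \frac{(n-1)}{N}\right) = \prod_{j=1}^{(n-1)}\left(1 - \frac{j}{N}\right) \\
&= \left(\frac{N-1}{N}\right)\left(\frac{N-2}{N}\right)\cdots\left(\frac{N-(n-1)}{N}\right) = \frac{N!}{(N-n)!\,N^{n}}.
\end{aligned}
\]
For $N = 10$ and $n = 4$,
\[
\mathrm{pr}(A_4) = \prod_{j=1}^{3}\left(1 - \frac{j}{10}\right) = \left(1 - \frac{1}{10}\right)\left(1 - \frac{2}{10}\right)\left(1 - \frac{3}{10}\right) = \frac{10!}{(10-4)!\,(10)^{4}} = 0.504.
\]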
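The product and factorial forms in Solution 1.12 can be compared directly. This is a small sketch with an illustrative function name.

```python
from fractions import Fraction
from math import factorial


def pr_all_different(N: int, n: int) -> Fraction:
    """Probability that n draws with replacement from {1, ..., N} are all distinct."""
    prob = Fraction(1)
    for j in range(1, n):
        prob *= Fraction(N - j, N)
    return prob


product_form = pr_all_different(10, 4)
factorial_form = Fraction(factorial(10), factorial(10 - 4) * 10 ** 4)
print(float(product_form), product_form == factorial_form)
```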
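Solution 1.13
(a) We have
\[
\theta_{wr} = 1 - \mathrm{pr}(\text{all } n \text{ numbers have values less than } N) = 1 - \left(\frac{N-1}{N}\right)^{n}.
\]
(b) We have
\[
\theta_{wor} = 1 - \mathrm{pr}(\text{all } n \text{ numbers have values less than } N) = 1 - \frac{C^{1}_{0}\,C^{N-1}_{n}}{C^{N}_{n}} = 1 - \frac{(N-n)}{N} = \frac{n}{N}.
\]
(c) First, note that
\[
\delta_n = (\theta_{wor} - \theta_{wr}) = \frac{n}{N} - \left[1 - \left(\frac{N-1}{N}\right)^{n}\right] = \left(\frac{N-1}{N}\right)^{n} - \left(\frac{N-n}{N}\right).
\]
Now, for $n = 2$,
\[
\delta_2 = \left(\frac{N-1}{N}\right)^{2} - \left(\frac{N-2}{N}\right) = \frac{1}{N^{2}}\left[(N-1)^{2} - N(N-2)\right] = \frac{1}{N^{2}} > 0.
\]
Then, assuming $\delta_n > 0$, we have
\[
\begin{aligned}
\delta_{n+1} &= \left(\frac{N-1}{N}\right)^{n+1} - \left(\frac{N-(n+1)}{N}\right) \\
&= \left(\frac{N-1}{N}\right)^{n}\left(\frac{N-1}{N}\right) - \left(\frac{N-n}{N}\right) + \frac{1}{N} \\
&= \left(\frac{N-1}{N}\right)^{n} - \frac{1}{N}\left(\frac{N-1}{N}\right)^{n} - \left(\frac{N-n}{N}\right) + \frac{1}{N} \\
&= \left[\left(\frac{N-1}{N}\right)^{n} - \left(\frac{N-n}{N}\right)\right] + \frac{1}{N}\left[1 - \left(\frac{N-1}{N}\right)^{n}\right] \\
&= \delta_n + \frac{1}{N}\left[1 - \left(\frac{N-1}{N}\right)^{n}\right] > 0,
\end{aligned}
\]
which completes the proof by induction. Therefore, sampling without replacement has a higher probability than sampling with replacement of selecting the ball labeled with the number $N$.

Solution 1.14
(a) Let $T_1$ be the event that tunnel 1 is closed to traffic, let $T_2$ be the event that tunnel 2 is closed to traffic, and let $T_3$ be the event that tunnel 3 is closed to traffic. If $\alpha$ is the probability that exactly one tunnel is closed to traffic, then, since the events $T_1$, $T_2$, and $T_3$ are mutually independent, it follows that
\[
\begin{aligned}
\alpha &= \mathrm{pr}(T_1)\mathrm{pr}(\bar{T}_2)\mathrm{pr}(\bar{T}_3) + \mathrm{pr}(\bar{T}_1)\mathrm{pr}(T_2)\mathrm{pr}(\bar{T}_3) + \mathrm{pr}(\bar{T}_1)\mathrm{pr}(\bar{T}_2)\mathrm{pr}(T_3) \\
&= \pi_1(1-\pi_2)(1-\pi_3) + (1-\pi_1)\pi_2(1-\pi_3) + (1-\pi_1)(1-\pi_2)\pi_3.
\end{aligned}
\]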
(b) Define the following five mutually exclusive events:
A: no inclement weather and all tunnels are open to traffic
B: inclement weather and all tunnels are open to traffic
C: inclement weather and at least one tunnel is closed to traffic
D: no inclement weather and exactly one tunnel is closed to traffic
E: no inclement weather and at least two tunnels are closed to traffic
If $\beta$ is the probability of an excellent traffic flow rate, then
\[
\beta = \mathrm{pr}(A) = (1-\theta)(1-\pi_1)(1-\pi_2)(1-\pi_3).
\]
And, if $\gamma$ is the probability of a marginal traffic flow rate, then
\[
\gamma = \mathrm{pr}(B) + \mathrm{pr}(D) = \theta(1-\pi_1)(1-\pi_2)(1-\pi_3) + (1-\theta)\alpha.
\]
Finally, if $\delta$ is the probability of a poor traffic flow rate, then
\[
\begin{aligned}
\delta = \mathrm{pr}(C) + \mathrm{pr}(E) &= \theta[1 - (1-\pi_1)(1-\pi_2)(1-\pi_3)] + (1-\theta)[1 - (1-\pi_1)(1-\pi_2)(1-\pi_3) - \alpha] \\
&= 1 - \beta - \gamma.
\end{aligned}
\]
(c) Using the event definitions given in part (b), we have
\[
\mathrm{pr}(B \mid B \cup D) = \frac{\mathrm{pr}[B \cap (B \cup D)]}{\mathrm{pr}(B \cup D)} = \frac{\mathrm{pr}[B \cup (B \cap D)]}{\gamma} = \frac{\mathrm{pr}(B)}{\gamma} = \frac{\theta(1-\pi_1)(1-\pi_2)(1-\pi_3)}{\gamma}.
\]
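The formulas of Solution 1.14 can be evaluated together to confirm that the three flow-rate probabilities sum to one. The sketch below uses made-up inputs: the values 0.3, 0.10, 0.20, and 0.05 are hypothetical and do not appear in the exercise.

```python
def flow_probabilities(theta: float, p1: float, p2: float, p3: float):
    """Return (alpha, beta, gamma, delta) as defined in Solution 1.14."""
    all_open = (1 - p1) * (1 - p2) * (1 - p3)
    # alpha: exactly one of the three independent tunnels is closed.
    alpha = (
        p1 * (1 - p2) * (1 - p3)
        + (1 - p1) * p2 * (1 - p3)
        + (1 - p1) * (1 - p2) * p3
    )
    beta = (1 - theta) * all_open                      # excellent flow
    gamma = theta * all_open + (1 - theta) * alpha     # marginal flow
    delta = theta * (1 - all_open) + (1 - theta) * (1 - all_open - alpha)  # poor flow
    return alpha, beta, gamma, delta


# Hypothetical inputs chosen only for illustration.
alpha, beta, gamma, delta = flow_probabilities(theta=0.3, p1=0.10, p2=0.20, p3=0.05)
pr_b_given_marginal = 0.3 * (1 - 0.10) * (1 - 0.20) * (1 - 0.05) / gamma
print(beta + gamma + delta, pr_b_given_marginal)
```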
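Solution 1.15
(a) Let $A_x$ be the event that it takes $x$ tosses of this unbalanced coin to obtain the first head. Then,
\[
\mathrm{pr}(A_x) = \mathrm{pr}\{[\text{first } (x-1) \text{ tosses are tails}] \cap [x\text{th toss is a head}]\} = (1-\pi)^{x-1}\pi, \quad x = 1, 2, \ldots, \infty.
\]
Now, letting $\theta = \mathrm{pr}(\text{Bonnie and Clyde each require the same number of tosses to obtain the first head})$, we have
\[
\begin{aligned}
\theta &= \mathrm{pr}\{\cup_{x=1}^{\infty}[(\text{Bonnie requires } x \text{ tosses}) \cap (\text{Clyde requires } x \text{ tosses})]\} \\
&= \sum_{x=1}^{\infty}[\mathrm{pr}(A_x)]^{2} = \sum_{x=1}^{\infty}\left[(1-\pi)^{x-1}\pi\right]^{2} \\
&= \left(\frac{\pi}{1-\pi}\right)^{2}\sum_{x=1}^{\infty}\left[(1-\pi)^{2}\right]^{x} = \left(\frac{\pi}{1-\pi}\right)^{2}\left[\frac{(1-\pi)^{2}}{1-(1-\pi)^{2}}\right] \\
&= \frac{\pi}{(2-\pi)}.
\end{aligned}
\]
(b) By symmetry, $\mathrm{pr}(\text{Bonnie requires more tosses than Clyde to obtain the first head}) = \mathrm{pr}(\text{Clyde requires more tosses than Bonnie to obtain the first head}) = \gamma$, say. Thus, since $(2\gamma + \theta) = 1$, it follows that
\[
\gamma = \frac{(1-\theta)}{2} = \frac{1 - \frac{\pi}{2-\pi}}{2} = \frac{(1-\pi)}{(2-\pi)}.
\]
To illustrate a more complicated approach,
\[
\begin{aligned}
\gamma &= \sum_{x=1}^{\infty}\sum_{y=x+1}^{\infty}\mathrm{pr}[(\text{Clyde needs } x \text{ tosses for first head}) \cap (\text{Bonnie needs } y \text{ tosses for first head})] \\
&= \sum_{x=1}^{\infty}\left[(1-\pi)^{x-1}\pi\right]\sum_{y=x+1}^{\infty}\left[(1-\pi)^{y-1}\pi\right] \\
&= \sum_{x=1}^{\infty}\left[\frac{\pi^{2}(1-\pi)^{x}}{(1-\pi)}\right]\left[\frac{(1-\pi)^{x}}{1-(1-\pi)}\right] \\
&= \frac{\pi}{(1-\pi)}\sum_{x=1}^{\infty}\left[(1-\pi)^{2}\right]^{x} = \frac{\pi}{(1-\pi)}\left[\frac{(1-\pi)^{2}}{1-(1-\pi)^{2}}\right] \\
&= \frac{(1-\pi)}{(2-\pi)}.
\end{aligned}
\]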
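Both closed forms in Solution 1.15 can be checked against truncated versions of the series they come from. The sketch below assumes the illustrative value $\pi = 0.3$ (not taken from the exercise) and truncates the sums at a few hundred terms, which is ample because the terms decay geometrically.

```python
def theta_series(pi: float, terms: int = 400) -> float:
    """Truncated sum of [pr(A_x)]^2: both players need the same number of tosses."""
    return sum(((1 - pi) ** (x - 1) * pi) ** 2 for x in range(1, terms + 1))


def gamma_series(pi: float, terms: int = 400) -> float:
    """Truncated double sum: Bonnie needs strictly more tosses than Clyde."""
    total = 0.0
    for x in range(1, terms + 1):
        clyde = (1 - pi) ** (x - 1) * pi
        bonnie_later = sum((1 - pi) ** (y - 1) * pi for y in range(x + 1, terms + 1))
        total += clyde * bonnie_later
    return total


pi = 0.3  # hypothetical value for illustration
theta_closed = pi / (2 - pi)
gamma_closed = (1 - pi) / (2 - pi)
print(theta_series(pi), theta_closed, gamma_series(pi), gamma_closed)
```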
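Solution 1.16. Let $A$ be the event that exactly 5 of these 15 students scored higher than the 80th percentile, and, for $x = 0, 1, \ldots, 5$, let $B_x$ be the event that exactly $x$ females and exactly $(5-x)$ males scored higher than the 80th percentile. So,
\[
\begin{aligned}
\mathrm{pr}(B_x \mid A) &= \frac{\mathrm{pr}(A \cap B_x)}{\mathrm{pr}(A)} = \frac{\mathrm{pr}(B_x)}{\mathrm{pr}(A)} \\
&= \frac{\left[C^{8}_{x}\pi^{x}(1-\pi)^{8-x}\right]\left[C^{7}_{5-x}\pi^{5-x}(1-\pi)^{2+x}\right]}{C^{15}_{5}\pi^{5}(1-\pi)^{10}} \\
&= \frac{C^{8}_{x}\,C^{7}_{5-x}}{C^{15}_{5}}, \quad x = 0, 1, \ldots, 5.
\end{aligned}
\]
Thus,
\[
\theta = \sum_{x=3}^{5}\frac{C^{8}_{x}\,C^{7}_{5-x}}{C^{15}_{5}} = \frac{(1176 + 490 + 56)}{3003} = 0.5734.
\]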
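The hypergeometric sum in Solution 1.16 can be verified with exact integer arithmetic. This is a brief sketch; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Numerator terms C(8, x) * C(7, 5 - x) for x = 3, 4, 5 (at least 3 of the 5 are female).
terms = [comb(8, x) * comb(7, 5 - x) for x in range(3, 6)]
theta = Fraction(sum(terms), comb(15, 5))

print(terms, float(theta))
```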
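Solution 1.17. For $i = 1, 2, 3, 4$, let $A_i$ be the event that a hand contains all three face cards of the $i$th suit. Note that the event of interest is $\cup_{i=1}^{4} A_i$. So,
\[
\mathrm{pr}(A_i) = \frac{C^{3}_{3}\,C^{49}_{10}}{C^{52}_{13}}, \quad i = 1, 2, 3, 4.
\]
For $i \ne j$,
\[
\mathrm{pr}(A_i \cap A_j) = \frac{C^{6}_{6}\,C^{46}_{7}}{C^{52}_{13}};
\]
for $i \ne j \ne k$,
\[
\mathrm{pr}(A_i \cap A_j \cap A_k) = \frac{C^{9}_{9}\,C^{43}_{4}}{C^{52}_{13}};
\]
and, for $i \ne j \ne k \ne l$,
\[
\mathrm{pr}(A_i \cap A_j \cap A_k \cap A_l) = \frac{C^{12}_{12}\,C^{40}_{1}}{C^{52}_{13}}.
\]
Then, using Result (ii) on page 4, we have
\[
\mathrm{pr}(A) = \mathrm{pr}\left(\cup_{i=1}^{4} A_i\right) = \frac{\sum_{m=1}^{4} C^{4}_{m}(-1)^{m-1}\,C^{52-3m}_{13-3m}}{C^{52}_{13}},
\]
which is equal to 0.0513.

Solution 1.18*. Let $W$ denote the event that the player wins the game and let $X$ denote the number obtained on the first roll. So,
\[
\begin{aligned}
\mathrm{pr}(W) &= \sum_{x=2}^{12}\mathrm{pr}(W \mid X = x)\,\mathrm{pr}(X = x) = \sum_{x=2}^{12}\mathrm{pr}(W \mid X = x)\,\frac{\min(x-1,\,13-x)}{36} \\
&= (0)\frac{1}{36} + (0)\frac{2}{36} + \sum_{x=4}^{6}\mathrm{pr}(W \mid X = x)\frac{(x-1)}{36} + (1)\frac{6}{36} \\
&\quad + \sum_{x=8}^{10}\mathrm{pr}(W \mid X = x)\frac{(13-x)}{36} + (1)\frac{2}{36} + (0)\frac{1}{36} \\
&= \frac{2}{9} + 2\sum_{x=4}^{6}\mathrm{pr}(W \mid X = x)\frac{(x-1)}{36},
\end{aligned}
\]
since the pairs of numbers "4 and 10," "5 and 9," and "6 and 8" lead to the same result. So, for $x = 4$, 5, or 6, let $\pi_x = \mathrm{pr}(\text{number } x \text{ is rolled before number 7 is rolled})$. Then,
\[
\begin{aligned}
\pi_x &= \sum_{j=1}^{\infty}[\mathrm{pr}(\text{any number but } x \text{ or 7 is rolled})]^{(j-1)}\,\mathrm{pr}(\text{number } x \text{ is rolled}) \\
&= \sum_{j=1}^{\infty}\left[1 - \frac{(x-1)}{36} - \frac{6}{36}\right]^{(j-1)}\frac{(x-1)}{36} = \frac{(x-1)}{36}\sum_{j=1}^{\infty}\left(\frac{31-x}{36}\right)^{(j-1)} \\
&= \frac{(x-1)}{36}\left[1 - \frac{(31-x)}{36}\right]^{-1} = \frac{(x-1)}{(x+5)}, \quad x = 4, 5, 6.
\end{aligned}
\]
So,
\[
\mathrm{pr}(W) = \frac{2}{9} + \frac{1}{18}\sum_{x=4}^{6}\pi_x(x-1) = \frac{2}{9} + \frac{1}{18}\sum_{x=4}^{6}\frac{(x-1)^{2}}{(x+5)} = \frac{2}{9} + \frac{1}{18}\left(\frac{9}{9} + \frac{16}{10} + \frac{25}{11}\right) = \frac{244}{495} = 0.4929.
\]
Thus, the probability of the house winning the game is $(1 - 0.4929) = 0.5071$; so, as expected with any casino game, the house always has the advantage. However, relative to many other casino games (e.g., blackjack, roulette, slot machines), the house advantage of $(0.5071 - 0.4929) = 0.0142$ is relatively small.

Solution 1.19*
(a) Let $E$ be the event that the first worker randomly selected is a highly exposed worker. Then,
\[
\begin{aligned}
\theta_n &= (1 - \theta_{n-1})\,\mathrm{pr}(E) + \theta_{n-1}[1 - \mathrm{pr}(E)] \\
&= (1 - \theta_{n-1})\pi_h + \theta_{n-1}(1 - \pi_h) \\
&= \pi_h + \theta_{n-1}(1 - 2\pi_h), \quad \text{with } \theta_0 \equiv 1.
\end{aligned}
\]
(b) Now, assuming that $\theta_n = \alpha + \beta\gamma^{n}$ and using the result in part (a), we have
\[
\alpha + \beta\gamma^{n} = \pi_h + (\alpha + \beta\gamma^{n-1})(1 - 2\pi_h) = \pi_h + (1 - 2\pi_h)\alpha + (1 - 2\pi_h)\beta\gamma^{n-1},
\]
with the restriction that $(\alpha + \beta) = 1$ since $\theta_0 \equiv 1$. Thus, we must have $\alpha = \beta = \frac{1}{2}$ and $\gamma = (1 - 2\pi_h)$, giving
\[
\theta_n = \frac{1}{2} + \frac{1}{2}(1 - 2\pi_h)^{n}, \quad n = 1, 2, \ldots, \infty.
\]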
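The closed form from Solution 1.19(b) can be compared with direct iteration of the recursion from part (a). The sketch below uses illustrative function names and evaluates both at $\pi_h = 0.05$ and $n = 50$.

```python
def theta_recursive(n: int, pi_h: float) -> float:
    """Iterate theta_n = pi_h + theta_{n-1} * (1 - 2 * pi_h), starting from theta_0 = 1."""
    theta = 1.0
    for _ in range(n):
        theta = pi_h + theta * (1 - 2 * pi_h)
    return theta


def theta_closed(n: int, pi_h: float) -> float:
    """Closed form 1/2 + (1/2) * (1 - 2 * pi_h) ** n."""
    return 0.5 + 0.5 * (1 - 2 * pi_h) ** n


print(theta_recursive(50, 0.05), theta_closed(50, 0.05))
```

Both routes agree to floating-point accuracy, and the value is close to 0.5026.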
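Finally, when $\pi_h = 0.05$, $\theta_{50} = \frac{1}{2} + \frac{1}{2}[1 - 2(0.05)]^{50} = 0.5026$.

Solution 1.20*
(a) We have
\[
\begin{aligned}
\mathrm{pr}(D \mid S, x) &= \frac{\mathrm{pr}(D \cap S \mid x)}{\mathrm{pr}(S \mid x)} = \frac{\mathrm{pr}(S \mid D, x)\,\mathrm{pr}(D \mid x)}{\mathrm{pr}(S \mid D, x)\,\mathrm{pr}(D \mid x) + \mathrm{pr}(S \mid \bar{D}, x)\,\mathrm{pr}(\bar{D} \mid x)} \\
&= \frac{\pi_1\left[\dfrac{e^{\beta_0 + \beta' x}}{1 + e^{\beta_0 + \beta' x}}\right]}{\pi_1\left[\dfrac{e^{\beta_0 + \beta' x}}{1 + e^{\beta_0 + \beta' x}}\right] + \pi_0\left[\dfrac{1}{1 + e^{\beta_0 + \beta' x}}\right]} \\
&= \frac{\pi_1 e^{\beta_0 + \beta' x}}{\pi_0 + \pi_1 e^{\beta_0 + \beta' x}} = \frac{\left(\frac{\pi_1}{\pi_0}\right)e^{\beta_0 + \beta' x}}{1 + \left(\frac{\pi_1}{\pi_0}\right)e^{\beta_0 + \beta' x}} \\
&= \frac{e^{\beta_0^{*} + \beta' x}}{1 + e^{\beta_0^{*} + \beta' x}},
\end{aligned}
\]
where $\beta_0^{*} = \beta_0 + \ln(\pi_1/\pi_0)$. So, for a case–control study, since $\beta_0 = \beta_0^{*} - \ln(\pi_1/\pi_0)$, estimating the risk $\mathrm{pr}(D \mid x)$ of disease using logistic regression would require knowing (or being able to estimate) the ratio of selection probabilities, namely, the ratio $\pi_1/\pi_0$.
(b) Since
\[
\frac{\mathrm{pr}(D \mid x)}{\mathrm{pr}(\bar{D} \mid x)} = e^{\beta_0 + \beta' x},
\]
it follows directly that
\[
\theta_r = \frac{e^{\beta_0 + \beta' x^{*}}}{e^{\beta_0 + \beta' x}} = e^{\beta'(x^{*} - x)}.
\]
Analogously, since
\[
\frac{\mathrm{pr}(D \mid S, x)}{\mathrm{pr}(\bar{D} \mid S, x)} = e^{\beta_0^{*} + \beta' x},
\]
it follows directly that
\[
\theta_c = \frac{e^{\beta_0^{*} + \beta' x^{*}}}{e^{\beta_0^{*} + \beta' x}} = e^{\beta'(x^{*} - x)} = \theta_r.
\]
Hence, we can, at least theoretically, use case–control study data to estimate risk odds ratios via logistic regression, even though we cannot estimate the risk (or probability) of disease directly without information about the quantity $\pi_1/\pi_0$.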
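There are other potential problems with the use of case–control studies in epidemiologic research. For further discussion about such issues, see Breslow and Day (1980) and Kleinbaum, Kupper, and Morgenstern (1982).

Solution 1.21*
(a) Let $A$ be the event that Diagnostic Strategy #1 provides the correct diagnosis, let $B$ be the event that Diagnostic Strategy #2 provides the correct diagnosis, and let $D$ be the event that the adult has IBD. Then,
\[
\begin{aligned}
\mathrm{pr}(A) &= \mathrm{pr}(A \cap D) + \mathrm{pr}(A \cap \bar{D}) = \mathrm{pr}(A \mid D)\mathrm{pr}(D) + \mathrm{pr}(A \mid \bar{D})\mathrm{pr}(\bar{D}) \\
&= [3\pi_1^{2}(1-\pi_1) + \pi_1^{3}]\theta + [3\pi_0^{2}(1-\pi_0) + \pi_0^{3}](1-\theta) \\
&= (3\pi_1^{2} - 2\pi_1^{3})\theta + (3\pi_0^{2} - 2\pi_0^{3})(1-\theta).
\end{aligned}
\]
And,
\[
\mathrm{pr}(B) = \mathrm{pr}(B \mid D)\mathrm{pr}(D) + \mathrm{pr}(B \mid \bar{D})\mathrm{pr}(\bar{D}) = \pi_1\theta + \pi_0(1-\theta).
\]
Now,
\[
\begin{aligned}
\mathrm{pr}(A) - \mathrm{pr}(B) &= [(3\pi_1^{2} - 2\pi_1^{3})\theta + (3\pi_0^{2} - 2\pi_0^{3})(1-\theta)] - [\pi_1\theta + \pi_0(1-\theta)] \\
&= (3\pi_1^{2} - 2\pi_1^{3} - \pi_1)\theta + (3\pi_0^{2} - 2\pi_0^{3} - \pi_0)(1-\theta) \\
&= \pi_1(1-\pi_1)(2\pi_1 - 1)\theta + \pi_0(1-\pi_0)(2\pi_0 - 1)(1-\theta).
\end{aligned}
\]
So, a sufficient condition for the ranges of $\pi_1$ and $\pi_0$ so that $\mathrm{pr}(A) > \mathrm{pr}(B)$ is
\[
\frac{1}{2} < \pi_1 < 1 \quad \text{and} \quad \frac{1}{2} < \pi_0 < 1.
\]
In other words, if each doctor has a better than 50% chance of making the correct diagnosis conditional on disease status, then Diagnostic Strategy #1 is preferable to Diagnostic Strategy #2.
(b) Let $C$ be the event that Diagnostic Strategy #3 provides the correct diagnosis. Then,
\[
\begin{aligned}
\mathrm{pr}(C) &= \mathrm{pr}(C \mid D)\mathrm{pr}(D) + \mathrm{pr}(C \mid \bar{D})\mathrm{pr}(\bar{D}) \\
&= [4\pi_1^{3}(1-\pi_1) + \pi_1^{4}]\theta + [4\pi_0^{3}(1-\pi_0) + \pi_0^{4}](1-\theta) \\
&= (4\pi_1^{3} - 3\pi_1^{4})\theta + (4\pi_0^{3} - 3\pi_0^{4})(1-\theta).
\end{aligned}
\]
Since
\[
\begin{aligned}
\mathrm{pr}(A) - \mathrm{pr}(C) &= [(3\pi_1^{2} - 2\pi_1^{3}) - (4\pi_1^{3} - 3\pi_1^{4})]\theta + [(3\pi_0^{2} - 2\pi_0^{3}) - (4\pi_0^{3} - 3\pi_0^{4})](1-\theta) \\
&= 3\pi_1^{2}(1-\pi_1)^{2}\theta + 3\pi_0^{2}(1-\pi_0)^{2}(1-\theta) > 0,
\end{aligned}
\]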
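Diagnostic Strategy #1 (using the majority opinion of three doctors) has a higher probability than Diagnostic Strategy #3 (using the majority opinion of four doctors) of making the correct diagnosis.

Solution 1.22*. First,
\[
\begin{aligned}
\pi_1 = \mathrm{pr}(D \mid E) &= \frac{\mathrm{pr}(D \cap E)}{\mathrm{pr}(E)} = \frac{\mathrm{pr}(D \cap E \cap M) + \mathrm{pr}(D \cap E \cap \bar{M})}{\mathrm{pr}(E)} \\
&= \frac{\pi_{11}\,\mathrm{pr}(E \cap M) + \pi_{10}\,\mathrm{pr}(E \cap \bar{M})}{\mathrm{pr}(E)} \\
&= \mathrm{pr}(M \mid E)\pi_{11} + \mathrm{pr}(\bar{M} \mid E)\pi_{10}.
\end{aligned}
\]
Similarly,
\[
\pi_0 = \mathrm{pr}(M \mid \bar{E})\pi_{01} + \mathrm{pr}(\bar{M} \mid \bar{E})\pi_{00}.
\]
So,
\[
\begin{aligned}
RR_c = \frac{\pi_1}{\pi_0} &= \frac{\mathrm{pr}(M \mid E)\pi_{11} + \mathrm{pr}(\bar{M} \mid E)\pi_{10}}{\mathrm{pr}(M \mid \bar{E})\pi_{01} + \mathrm{pr}(\bar{M} \mid \bar{E})\pi_{00}} \\
&= \frac{\mathrm{pr}(M \mid E)\pi_{01}RR_1 + \mathrm{pr}(\bar{M} \mid E)\pi_{00}RR_0}{\mathrm{pr}(M \mid \bar{E})\pi_{01} + \mathrm{pr}(\bar{M} \mid \bar{E})\pi_{00}} \\
&= RR\left[\frac{\mathrm{pr}(M \mid E)\pi_{01} + \mathrm{pr}(\bar{M} \mid E)\pi_{00}}{\mathrm{pr}(M \mid \bar{E})\pi_{01} + \mathrm{pr}(\bar{M} \mid \bar{E})\pi_{00}}\right].
\end{aligned}
\]
Thus, a sufficient condition for $RR_c = RR$ is
\[
\mathrm{pr}(M \mid E)\pi_{01} + \mathrm{pr}(\bar{M} \mid E)\pi_{00} = \mathrm{pr}(M \mid \bar{E})\pi_{01} + \mathrm{pr}(\bar{M} \mid \bar{E})\pi_{00},
\]
or equivalently,
\[
[\mathrm{pr}(M \mid E) - \mathrm{pr}(M \mid \bar{E})]\pi_{01} + [\mathrm{pr}(\bar{M} \mid E) - \mathrm{pr}(\bar{M} \mid \bar{E})]\pi_{00} = 0.
\]
Using the relationships $\mathrm{pr}(\bar{M} \mid E) = 1 - \mathrm{pr}(M \mid E)$ and $\mathrm{pr}(\bar{M} \mid \bar{E}) = 1 - \mathrm{pr}(M \mid \bar{E})$ in the above expression, it follows that $RR_c = RR$ when
\[
[\mathrm{pr}(M \mid E) - \mathrm{pr}(M \mid \bar{E})](\pi_{01} - \pi_{00}) = 0.
\]
Thus, the two sufficient conditions for no confounding are
\[
\mathrm{pr}(M \mid E) = \mathrm{pr}(M \mid \bar{E}) \quad \text{and} \quad \pi_{01} = \pi_{00}.
\]
Further, since
\[
\mathrm{pr}(M) = \mathrm{pr}(M \mid E)\mathrm{pr}(E) + \mathrm{pr}(M \mid \bar{E})\mathrm{pr}(\bar{E}),
\]
the condition $\mathrm{pr}(M \mid E) = \mathrm{pr}(M \mid \bar{E})$ means that $\mathrm{pr}(M) = \mathrm{pr}(M \mid E)$, or equivalently, that the events $E$ and $M$ are independent events. Finally, the two no confounding conditions are:
(i) The events $E$ and $M$ are independent events;
(ii) $\mathrm{pr}(D \mid \bar{E} \cap M) = \mathrm{pr}(D \mid \bar{E} \cap \bar{M})$.

Solution 1.23*
(a) First,
\[
\begin{aligned}
\mathrm{pr}(D_1 \mid T^+) &= \frac{\mathrm{pr}(D_1 \cap T^+)}{\mathrm{pr}(T^+)} = \frac{\mathrm{pr}(T^+ \mid D_1)\mathrm{pr}(D_1)}{\mathrm{pr}(T^+ \cap D_1) + \mathrm{pr}(T^+ \cap \bar{D}_1)} \\
&= \frac{\mathrm{pr}(T^+ \mid D_1)\mathrm{pr}(D_1)}{\mathrm{pr}(T^+ \mid D_1)\mathrm{pr}(D_1) + \mathrm{pr}(T^+ \mid \bar{D}_1)\mathrm{pr}(\bar{D}_1)} \\
&= \frac{\theta_1\pi_1}{\theta_1\pi_1 + \theta_2\pi_2}.
\end{aligned}
\]
And,
\[
\mathrm{pr}(\bar{D}_1 \mid T^+) = 1 - \mathrm{pr}(D_1 \mid T^+) = \frac{\theta_2\pi_2}{\theta_1\pi_1 + \theta_2\pi_2}.
\]
Finally,
\[
\frac{\mathrm{pr}(D_1 \mid T^+)}{\mathrm{pr}(\bar{D}_1 \mid T^+)} = \frac{\theta_1\pi_1}{\theta_2\pi_2} = LR_{12}\left(\frac{\pi_1}{\pi_2}\right).
\]
(b) First,
\[
\mathrm{pr}(D_1 \mid T^+) = \frac{\mathrm{pr}(T^+ \mid D_1)\mathrm{pr}(D_1)}{\sum_{i=1}^{3}\mathrm{pr}(T^+ \mid D_i)\mathrm{pr}(D_i)} = \frac{\theta_1\pi_1}{\sum_{i=1}^{3}\theta_i\pi_i},
\]
and so
\[
\mathrm{pr}(\bar{D}_1 \mid T^+) = 1 - \mathrm{pr}(D_1 \mid T^+) = \frac{\sum_{i=2}^{3}\theta_i\pi_i}{\sum_{i=1}^{3}\theta_i\pi_i}.
\]
Finally,
\[
\frac{\mathrm{pr}(D_1 \mid T^+)}{\mathrm{pr}(\bar{D}_1 \mid T^+)} = \frac{\theta_1\pi_1}{\theta_2\pi_2 + \theta_3\pi_3} = \frac{1}{\dfrac{\theta_2\pi_2}{\theta_1\pi_1} + \dfrac{\theta_3\pi_3}{\theta_1\pi_1}} = \left[\sum_{i=2}^{3}\left(LR_{1i}\,\frac{\pi_1}{\pi_i}\right)^{-1}\right]^{-1}.
\]
And,
\[
\mathrm{pr}(D_1 \mid T^+) = \frac{\theta_1\pi_1}{\sum_{i=1}^{3}\theta_i\pi_i} = \frac{1}{1 + \dfrac{\theta_2\pi_2 + \theta_3\pi_3}{\theta_1\pi_1}} = \left[1 + \sum_{i=2}^{3}\left(LR_{1i}\,\frac{\pi_1}{\pi_i}\right)^{-1}\right]^{-1}.
\]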
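(c) For notational convenience, let $\pi_1 = \mathrm{pr}(NS) = 0.57$, $\pi_2 = \mathrm{pr}(A) = 0.33$, $\pi_3 = \mathrm{pr}(C) = 0.10$, $LR_{12} = \mathrm{pr}(T^+ \mid NS)/\mathrm{pr}(T^+ \mid A) = 0.30$, $LR_{13} = \mathrm{pr}(T^+ \mid NS)/\mathrm{pr}(T^+ \mid C) = 0.50$, and $LR_{23} = \mathrm{pr}(T^+ \mid A)/\mathrm{pr}(T^+ \mid C) = 1.67$. Following the developments given in part (b), it then follows directly that
\[
\frac{\mathrm{pr}(NS \mid T^+)}{\mathrm{pr}(\overline{NS} \mid T^+)} = 0.4385 \quad \text{and} \quad \mathrm{pr}(NS \mid T^+) = 0.3048,
\]
\[
\frac{\mathrm{pr}(A \mid T^+)}{\mathrm{pr}(\bar{A} \mid T^+)} = 1.4293 \quad \text{and} \quad \mathrm{pr}(A \mid T^+) = 0.5883,
\]
and
\[
\frac{\mathrm{pr}(C \mid T^+)}{\mathrm{pr}(\bar{C} \mid T^+)} = 0.1196 \quad \text{and} \quad \mathrm{pr}(C \mid T^+) = 0.1069.
\]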
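The posterior probabilities in Solution 1.23(c) follow from the part (b) formula once each likelihood ratio is available in both directions ($LR_{ji} = 1/LR_{ij}$). The following sketch uses illustrative names (`prior`, `lr`, `posterior`). Because $LR_{23}$ is quoted to two decimals, the results match the printed values only to about three decimal places.

```python
# Priors and likelihood ratios quoted in Solution 1.23(c).
prior = {"NS": 0.57, "A": 0.33, "C": 0.10}
lr = {("NS", "A"): 0.30, ("NS", "C"): 0.50, ("A", "C"): 1.67}


def likelihood_ratio(i: str, j: str) -> float:
    """LR_ij = pr(T+ | i) / pr(T+ | j), using LR_ji = 1 / LR_ij when needed."""
    if (i, j) in lr:
        return lr[(i, j)]
    return 1.0 / lr[(j, i)]


def posterior(i: str) -> float:
    """pr(i | T+) = [1 + sum over j != i of (LR_ij * prior_i / prior_j)^(-1)]^(-1)."""
    total = sum(
        1.0 / (likelihood_ratio(i, j) * prior[i] / prior[j])
        for j in prior
        if j != i
    )
    return 1.0 / (1.0 + total)


def posterior_odds(i: str) -> float:
    p = posterior(i)
    return p / (1.0 - p)


for diagnosis in prior:
    print(diagnosis, round(posterior_odds(diagnosis), 4), round(posterior(diagnosis), 4))
```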
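Thus, based on this particular diagnostic test, the most likely diagnosis is appendicitis for an emergency room patient with a positive test result.

Solution 1.24*
(a) The four probabilities appearing in the expression for $\psi_c$ can be rewritten as follows:
\[
\mathrm{pr}(A \cap C \mid B \cap C) = \frac{\mathrm{pr}(A \cap B \cap C)}{\mathrm{pr}(B \cap C)} = \frac{\mathrm{pr}(C \mid A \cap B)\,\mathrm{pr}(A \mid B)\,\mathrm{pr}(B)}{\mathrm{pr}(B \cap C)},
\]
\[
\mathrm{pr}(\overline{A \cap C} \mid B \cap C) = \frac{\mathrm{pr}[(\bar{A} \cup \bar{C}) \cap (B \cap C)]}{\mathrm{pr}(B \cap C)} = \frac{\mathrm{pr}(\bar{A} \cap B \cap C)}{\mathrm{pr}(B \cap C)} = \frac{\mathrm{pr}(C \mid \bar{A} \cap B)\,\mathrm{pr}(\bar{A} \mid B)\,\mathrm{pr}(B)}{\mathrm{pr}(B \cap C)},
\]
\[
\mathrm{pr}(A \cap C \mid \overline{B \cap C}) = \frac{\mathrm{pr}[(A \cap C) \cap (\bar{B} \cup \bar{C})]}{\mathrm{pr}(\overline{B \cap C})} = \frac{\mathrm{pr}(A \cap \bar{B} \cap C)}{\mathrm{pr}(\overline{B \cap C})} = \frac{\mathrm{pr}(C \mid A \cap \bar{B})\,\mathrm{pr}(A \mid \bar{B})\,\mathrm{pr}(\bar{B})}{\mathrm{pr}(\overline{B \cap C})},
\]
and
\[
\begin{aligned}
\mathrm{pr}(\overline{A \cap C} \mid \overline{B \cap C}) &= \frac{\mathrm{pr}[(\bar{A} \cup \bar{C}) \cap (\bar{B} \cup \bar{C})]}{\mathrm{pr}(\overline{B \cap C})} \\
&= \frac{\mathrm{pr}[(\bar{A} \cap \bar{B}) \cup (\bar{A} \cap \bar{C}) \cup (\bar{B} \cap \bar{C}) \cup \bar{C}]}{\mathrm{pr}(\overline{B \cap C})} \\
&= \frac{\mathrm{pr}(\bar{A} \cap \bar{B}) + \mathrm{pr}(\bar{C}) - \mathrm{pr}(\bar{A} \cap \bar{B} \cap \bar{C})}{\mathrm{pr}(\overline{B \cap C})},
\end{aligned}
\]
since
\[
\mathrm{pr}[(\bar{A} \cap \bar{B}) \cup (\bar{A} \cap \bar{C}) \cup (\bar{B} \cap \bar{C}) \cup \bar{C}] = \mathrm{pr}(\bar{A} \cap \bar{B}) + \mathrm{pr}(\bar{C}) - \mathrm{pr}(\bar{A} \cap \bar{B} \cap \bar{C})
\]
via use of the general formula for the union of four events. Then, inserting these four expansions into the formula for $\psi_c$ and simplifying gives the desired result, since $\mathrm{pr}(\bar{A} \cap \bar{B}) + \mathrm{pr}(\bar{C}) - \mathrm{pr}(\bar{A} \cap \bar{B} \cap \bar{C})$ can be rewritten as
\[
\begin{aligned}
\mathrm{pr}[(\bar{A} \cap \bar{B}) \cup \bar{C}] &= \mathrm{pr}[(\bar{A} \cup \bar{C}) \cap (\bar{B} \cup \bar{C})] \\
&= \mathrm{pr}[(\bar{A} \cup \bar{C}) \cap (\bar{B} \cup \bar{C}) \cap (C \cup \bar{C})] \\
&= \mathrm{pr}[(\bar{A} \cap \bar{B} \cap C) \cup \bar{C}] = \mathrm{pr}(\bar{A} \cap \bar{B} \cap C) + \mathrm{pr}(\bar{C}).
\end{aligned}
\]
(b) If events $A$, $B$, and $C$ occur completely independently of one another, then $\psi = 1$, $\mathrm{pr}(C \mid A \cap B) = \mathrm{pr}(C)$, and so on, so that
\[
\psi_c = (1)(1)\left[1 + \frac{\mathrm{pr}(\bar{C})}{\mathrm{pr}(\bar{A})\,\mathrm{pr}(\bar{B})\,\mathrm{pr}(C)}\right] > 1.
\]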
Thus, using $\psi_c$ instead of $\psi$ introduces a positive bias. So, using $\psi_c$ could lead to the false conclusion that diseases $A$ and $B$ are related when, in fact, they are not related at all (i.e., $\psi = 1$).

Solution 1.25*
(a) First, given that there is a total of $(s+t)$ available positions in a sequence, then $s$ of these $(s+t)$ positions can be filled with the letter S in $C^{s+t}_{s}$ ways, leaving the $t$ remaining positions to be filled by the letter F. Under the assumption of randomness, each of these $C^{s+t}_{s}$ sequences is equally likely to occur, so that each random sequence has probability $1/C^{s+t}_{s}$ of occurring.
Now, for $x$ an even positive integer, let $x = 2y$. Since the S and F runs alternate, there will be exactly $y$ S runs and exactly $y$ F runs, where $y = 1, 2, \ldots, \min(s, t)$. The number of ways of dividing the $s$ available S letters into $y$ S runs is equal to $C^{s-1}_{y-1}$, which is simply the number of ways of choosing $(y-1)$ spaces from the $(s-1)$ spaces between the $s$ available S letters. Analogously, the $t$ available F letters can be divided into $y$ runs in $C^{t-1}_{y-1}$ ways. Thus, since the first run in the sequence can be either an S run or an F run, the total number of sequences that each contain exactly $2y$ runs is equal to $2C^{s-1}_{y-1}C^{t-1}_{y-1}$. Hence, under the assumption that all sequences containing exactly $s$ successes (the letter S) and $t$ failures (the letter F) are equally likely to occur, the probability of observing a sequence containing a total of exactly $x = 2y$ runs is equal to
\[
\pi_{2y} = \frac{2\,C^{s-1}_{y-1}\,C^{t-1}_{y-1}}{C^{s+t}_{s}}, \quad y = 1, 2, \ldots, \min(s, t).
\]
Now, for $x$ an odd positive integer, let $x = (2y+1)$. Either there will be $(y+1)$ S runs and $y$ F runs, or there will be $y$ S runs and $(y+1)$ F runs, where $y = 1, 2, \ldots, \min(s, t)$. In the former case, since the complete sequence must begin with an S run, the total number of such sequences will be $C^{s-1}_{y}C^{t-1}_{y-1}$; analogously, in the latter case, the total number of such sequences will be $C^{s-1}_{y-1}C^{t-1}_{y}$. Hence, under the assumption that all sequences containing exactly $s$ successes (the letter S) and $t$ failures (the letter F) are equally likely to occur, the probability of observing a sequence containing a total of exactly $x = (2y+1)$ runs is equal to
\[
\pi_{2y+1} = \frac{C^{s-1}_{y}\,C^{t-1}_{y-1} + C^{s-1}_{y-1}\,C^{t-1}_{y}}{C^{s+t}_{s}}, \quad y = 1, 2, \ldots, \min(s, t),
\]
where $C^{s-1}_{y} \equiv 0$ when $y = s$ and $C^{t-1}_{y} \equiv 0$ when $y = t$.
(b) For the observed sequence, $s = 4$ and $t = 3$. Also, the observed total number of runs $x$ is equal to 4; in particular, there are two S runs, one of length 1 and one of length 3, and there are two F runs, one of length 1 and one of length 2. Using the formula $\pi_{2y}$ with $y = 2$ gives
\[
\pi_4 = \frac{2\,C^{3}_{1}\,C^{2}_{1}}{C^{7}_{4}} = \frac{12}{35} = 0.343.
\]
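For small $s$ and $t$, the run-count formulas of Solution 1.25 can be checked by enumerating every arrangement. The sketch below does this for $s = 4$ and $t = 3$; the function names are illustrative, and `math.comb` returns 0 when the lower index exceeds the upper one, which matches the convention $C^{s-1}_{y} \equiv 0$ for $y = s$.

```python
from collections import Counter
from itertools import combinations
from math import comb


def runs_distribution(s: int, t: int) -> Counter:
    """Count, by brute force, how many S/F sequences have each total number of runs."""
    n = s + t
    counts = Counter()
    for s_positions in combinations(range(n), s):
        chosen = set(s_positions)
        seq = ["S" if i in chosen else "F" for i in range(n)]
        runs = 1 + sum(seq[i] != seq[i - 1] for i in range(1, n))
        counts[runs] += 1
    return counts


def formula_count(s: int, t: int, x: int) -> int:
    """Number of sequences with exactly x >= 2 runs, from the closed-form expressions."""
    y, odd = divmod(x, 2)
    if odd:
        return comb(s - 1, y) * comb(t - 1, y - 1) + comb(s - 1, y - 1) * comb(t - 1, y)
    return 2 * comb(s - 1, y - 1) * comb(t - 1, y - 1)


counts = runs_distribution(4, 3)
print(sorted(counts.items()), counts[4] / comb(7, 4))
```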
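Since this probability is fairly large, there is no statistical evidence that the observed sequence represents a deviation from randomness.

Solution 1.26*
(a) For $i = 1, 2, \ldots, R$, let $A_i$ be the event that the subject's $i$th chip is in its correct position. Then,
\[
\begin{aligned}
\theta(0, R) = 1 - \mathrm{pr}\left(\cup_{i=1}^{R} A_i\right) &= 1 - \sum_{i=1}^{R}\mathrm{pr}(A_i) + \sum_{i=1}^{R-1}\sum_{j=i+1}^{R}\mathrm{pr}(A_i \cap A_j) \\
&\quad - \sum_{i=1}^{R-2}\sum_{j=i+1}^{R-1}\sum_{k=j+1}^{R}\mathrm{pr}(A_i \cap A_j \cap A_k) + \cdots + (-1)^{R}\,\mathrm{pr}\left(\cap_{i=1}^{R} A_i\right).
\end{aligned}
\]
Now, for all $i$, $\mathrm{pr}(A_i) = 1/R = (R-1)!/R!$. For all $i < j$,
\[
\mathrm{pr}(A_i \cap A_j) = \mathrm{pr}(A_i)\,\mathrm{pr}(A_j \mid A_i) = \left(\frac{1}{R}\right)\left(\frac{1}{R-1}\right) = \frac{(R-2)!}{R!}.
\]
And, for $i < j < k$,
\[
\mathrm{pr}(A_i \cap A_j \cap A_k) = \mathrm{pr}(A_i)\,\mathrm{pr}(A_j \mid A_i)\,\mathrm{pr}(A_k \mid A_i \cap A_j) = \left(\frac{1}{R}\right)\left(\frac{1}{R-1}\right)\left(\frac{1}{R-2}\right) = \frac{(R-3)!}{R!}.
\]
In general, for $r = 1, 2, \ldots, R$, the probability of the intersection of any subset of $r$ of the $R$ events $A_1, A_2, \ldots, A_R$ is equal to $(R-r)!/R!$. Thus, we have
\[
\begin{aligned}
\theta(0, R) &= 1 - C^{R}_{1}\frac{(R-1)!}{R!} + C^{R}_{2}\frac{(R-2)!}{R!} - C^{R}_{3}\frac{(R-3)!}{R!} + \cdots + \frac{(-1)^{R}}{R!} \\
&= 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^{R}}{R!} = \sum_{l=0}^{R}\frac{(-1)^{l}}{l!}.
\end{aligned}
\]
So,
\[
\lim_{R \to \infty}\theta(0, R) = \lim_{R \to \infty}\sum_{l=0}^{R}\frac{(-1)^{l}}{l!} = \sum_{l=0}^{\infty}\frac{(-1)^{l}}{l!} = e^{-1} \approx 0.368,
\]
which is a somewhat counterintuitive answer.
(b) For a particular set of $r$ chips, let $B_r$ be the event that these $r$ chips are all in their correct positions, and let $C_{R-r}$ be the event that none of the remaining $(R-r)$ chips are in their correct positions. Then,
\[
\begin{aligned}
\mathrm{pr}(B_r \cap C_{R-r}) &= \mathrm{pr}(B_r)\,\mathrm{pr}(C_{R-r} \mid B_r) \\
&= \left[\left(\frac{1}{R}\right)\left(\frac{1}{R-1}\right)\cdots\left(\frac{1}{R-(r-1)}\right)\right]\theta(0, R-r) \\
&= \frac{(R-r)!}{R!}\sum_{l=0}^{R-r}\frac{(-1)^{l}}{l!}.
\end{aligned}
\]
Finally, since there are $C^{R}_{r}$ ways of choosing a particular set of $r$ chips from a total of $R$ chips, it follows directly that
\[
\theta(r, R) = C^{R}_{r}\,\frac{(R-r)!}{R!}\sum_{l=0}^{R-r}\frac{(-1)^{l}}{l!} = \frac{\sum_{l=0}^{R-r}(-1)^{l}/l!}{r!}, \quad r = 0, 1, \ldots, R.
\]
(c) The probability of interest is
\[
\theta(3, 5) + \theta(4, 5) + \theta(5, 5) = \sum_{r=3}^{5}\frac{\sum_{l=0}^{5-r}(-1)^{l}/l!}{r!} = \frac{1}{3!}\left(\frac{1}{2!}\right) + 0 + \frac{1}{5!}(1) = \frac{1}{12} + \frac{1}{120} = \frac{11}{120} = 0.0917.
\]
Note, in general, that $\theta(R-1, R) \equiv 0$ and that $\sum_{r=0}^{R}\theta(r, R) = 1$.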
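The matching probabilities $\theta(r, R)$ of Solution 1.26 can be verified against a brute-force count over all $R!$ orderings when $R$ is small. This is a sketch with illustrative function names; it uses exact fractions so the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def theta(r: int, R: int) -> Fraction:
    """Closed form: probability that exactly r of R chips are in their correct positions."""
    partial = sum(Fraction((-1) ** l, factorial(l)) for l in range(R - r + 1))
    return partial / factorial(r)


def theta_brute(r: int, R: int) -> Fraction:
    """Same probability, counted by enumerating every permutation of R chips."""
    hits = sum(
        1
        for perm in permutations(range(R))
        if sum(p == i for i, p in enumerate(perm)) == r
    )
    return Fraction(hits, factorial(R))


at_least_three = theta(3, 5) + theta(4, 5) + theta(5, 5)
print(at_least_three, float(at_least_three))
```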
Solution 1.27*
(a) First, let $H_A$ be the event that Player A obtains a head before Player B when it is Player A's turn to flip the balanced coin. In particular, if $H$ is the event that a head is obtained when the balanced coin is flipped, and if $T$ is the event that a tail is obtained, then
\[
\mathrm{pr}(H_A) = \mathrm{pr}(H) + \mathrm{pr}(T \cap T \cap H) + \mathrm{pr}(T \cap T \cap T \cap T \cap H) + \cdots = \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots = \frac{2}{3}.
\]
And, if $H_B$ is the event that Player A obtains a head before Player B when it is Player B's turn to flip the balanced coin, then
\[
\mathrm{pr}(H_B) = \mathrm{pr}(T \cap H) + \mathrm{pr}(T \cap T \cap T \cap H) + \mathrm{pr}(T \cap T \cap T \cap T \cap T \cap H) + \cdots = \frac{1}{4} + \frac{1}{16} + \frac{1}{64} + \cdots = \frac{1}{3}.
\]
Then, we move from game $(a, b, A)$ to game $(a-1, b, B)$ if Player A obtains the next head before Player B (an event that occurs with probability 2/3); and, we move from game $(a, b, A)$ to game $(a, b-1, A)$ if Player B obtains the next head before Player A (an event that occurs with probability 1/3). Thus, we have
\[
\pi(a, b, A) = \frac{2}{3}\pi(a-1, b, B) + \frac{1}{3}\pi(a, b-1, A).
\]
Using analogous arguments, we obtain
\[
\pi(a, b, B) = \frac{1}{3}\pi(a-1, b, B) + \frac{2}{3}\pi(a, b-1, A).
\]
(b) First, note that the following boundary conditions hold:
\[
\pi(0, b, A) = \pi(0, b, B) = 1, \quad b = 1, 2, \ldots, \infty
\]
and
\[
\pi(a, 0, A) = \pi(a, 0, B) = 0, \quad a = 1, 2, \ldots, \infty.
\]
From part (a), we know that $\pi(1, 1, A) = \frac{2}{3}$ and $\pi(1, 1, B) = \frac{1}{3}$. Now,
\[
\pi(2, 2, A) = \frac{2}{3}\pi(1, 2, B) + \frac{1}{3}\pi(2, 1, A),
\]
so that we need to know the numerical values of $\pi(1, 2, B)$ and $\pi(2, 1, A)$. So,
\[
\pi(1, 2, B) = \frac{1}{3}\pi(0, 2, B) + \frac{2}{3}\pi(1, 1, A) = \frac{1}{3}(1) + \frac{2}{3}\left(\frac{2}{3}\right) = \frac{7}{9};
\]
and,
\[
\pi(2, 1, A) = \frac{2}{3}\pi(1, 1, B) + \frac{1}{3}\pi(2, 0, A) = \frac{2}{3}\left(\frac{1}{3}\right) + \frac{1}{3}(0) = \frac{2}{9}.
\]
Finally,
\[
\pi(2, 2, A) = \frac{2}{3}\left(\frac{7}{9}\right) + \frac{1}{3}\left(\frac{2}{9}\right) = \frac{16}{27} = 0.593.
\]
Now,
\[
\pi(3, 3, A) = \frac{2}{3}\pi(2, 3, B) + \frac{1}{3}\pi(3, 2, A),
\]
where
\[
\pi(2, 3, B) = \frac{1}{3}\pi(1, 3, B) + \frac{2}{3}\pi(2, 2, A) = \frac{1}{3}\pi(1, 3, B) + \frac{2}{3}\left(\frac{16}{27}\right)
\]
and
\[
\pi(3, 2, A) = \frac{2}{3}\pi(2, 2, B) + \frac{1}{3}\pi(3, 1, A).
\]
Now,
\[
\pi(2, 2, B) = \frac{1}{3}\pi(1, 2, B) + \frac{2}{3}\pi(2, 1, A) = \frac{1}{3}\left(\frac{7}{9}\right) + \frac{2}{3}\left(\frac{2}{9}\right) = \frac{11}{27};
\]
so,
\[
\pi(3, 2, A) = \frac{2}{3}\left(\frac{11}{27}\right) + \frac{1}{3}\pi(3, 1, A).
\]
Since
\[
\pi(2, 1, B) = \frac{1}{3}\pi(1, 1, B) + \frac{2}{3}\pi(2, 0, A) = \frac{1}{3}\left(\frac{1}{3}\right) + \frac{2}{3}(0) = \frac{1}{9},
\]
we have
\[
\pi(3, 1, A) = \frac{2}{3}\pi(2, 1, B) + \frac{1}{3}\pi(3, 0, A) = \frac{2}{3}\left(\frac{1}{9}\right) + \frac{1}{3}(0) = \frac{2}{27}.
\]
And, since
\[
\pi(1, 2, A) = \frac{2}{3}\pi(0, 2, B) + \frac{1}{3}\pi(1, 1, A) = \frac{2}{3}(1) + \frac{1}{3}\left(\frac{2}{3}\right) = \frac{8}{9},
\]
we have
\[
\pi(1, 3, B) = \frac{1}{3}\pi(0, 3, B) + \frac{2}{3}\pi(1, 2, A) = \frac{1}{3}(1) + \frac{2}{3}\left(\frac{8}{9}\right) = \frac{25}{27}.
\]
Finally, since
\[
\pi(2, 3, B) = \frac{1}{3}\left(\frac{25}{27}\right) + \frac{2}{3}\left(\frac{16}{27}\right) = \frac{57}{81}
\]
and
\[
\pi(3, 2, A) = \frac{2}{3}\left(\frac{11}{27}\right) + \frac{1}{3}\left(\frac{2}{27}\right) = \frac{24}{81},
\]
we have
\[
\pi(3, 3, A) = \frac{2}{3}\left(\frac{57}{81}\right) + \frac{1}{3}\left(\frac{24}{81}\right) = \frac{46}{81} = 0.568.
\]
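The recursion of part (a), together with the boundary conditions of part (b), translates directly into a short memoized function. The following is a sketch under those stated relations; the function name `pi` and the string encoding of whose turn it is are illustrative choices.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def pi(a: int, b: int, turn: str) -> Fraction:
    """pi(a, b, turn) from Solution 1.27, computed with exact fractions."""
    # Boundary conditions from part (b).
    if a == 0:
        return Fraction(1)
    if b == 0:
        return Fraction(0)
    # Probability that Player A obtains the next head before Player B:
    # 2/3 when it is A's turn to flip, 1/3 when it is B's turn.
    a_first = Fraction(2, 3) if turn == "A" else Fraction(1, 3)
    return a_first * pi(a - 1, b, "B") + (1 - a_first) * pi(a, b - 1, "A")


for k in range(1, 6):
    print(k, pi(k, k, "A"), round(float(pi(k, k, "A")), 3))
```

Exact fractions make it easy to confirm the hand calculations for $k = 2$ and $k = 3$ before moving to larger $k$.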
Clearly, this procedure can be programmed to produce the numerical value of $\pi(k, k, A)$ for any positive integer $k$. For example, the reader can verify that $\pi(4, 4, A) = 0.556$ and that $\pi(5, 5, A) = 0.549$. In general, $\pi(k, k, A)$ monotonically decreases toward the value 1/2 as $k$ becomes large, but the rate of decrease is relatively slow.

Solution 1.28*
(a) Now,
\[
\begin{aligned}
\alpha = \mathrm{pr}(A_{ab} \mid B_1) &= \mathrm{pr}(A_{ab} \cap B_a \mid B_1) + \mathrm{pr}(A_{ab} \cap \bar{B}_a \mid B_1) \\
&= \mathrm{pr}(A_{ab} \mid B_a \cap B_1)\,\mathrm{pr}(B_a \mid B_1) + \mathrm{pr}(A_{ab} \mid \bar{B}_a \cap B_1)\,\mathrm{pr}(\bar{B}_a \mid B_1) \\
&= (1)\pi^{a-1} + \beta[1 - \pi^{a-1}] = \pi^{a-1} + \beta[1 - \pi^{a-1}],
\end{aligned}
\]
since the event "$A_{ab}$ given $\bar{B}_a \cap B_1$" is equivalent to the event "$A_{ab}$ given $\bar{B}_1$." More specifically, the event "$\bar{B}_a \cap B_1$" means that the first free throw is made and that there is at least one missed free throw among the next $(a-1)$ free throws. And, when such a miss occurs, it renders irrelevant all the previous makes, and so the scenario becomes exactly that of starting with a missed free throw (namely, the event "$\bar{B}_1$"). Similarly,
\[
\begin{aligned}
\beta = \mathrm{pr}(A_{ab} \mid \bar{B}_1) &= \mathrm{pr}(A_{ab} \cap C_b \mid \bar{B}_1) + \mathrm{pr}(A_{ab} \cap \bar{C}_b \mid \bar{B}_1) \\
&= \mathrm{pr}(A_{ab} \mid C_b \cap \bar{B}_1)\,\mathrm{pr}(C_b \mid \bar{B}_1) + \mathrm{pr}(A_{ab} \mid \bar{C}_b \cap \bar{B}_1)\,\mathrm{pr}(\bar{C}_b \mid \bar{B}_1) \\
&= (0)(1-\pi)^{b-1} + \alpha[1 - (1-\pi)^{b-1}] = \alpha[1 - (1-\pi)^{b-1}],
\end{aligned}
\]
since the event "$A_{ab}$ given $\bar{C}_b \cap \bar{B}_1$" is equivalent to the event "$A_{ab}$ given $B_1$." Solving these two equations simultaneously, we have
\[
\alpha = \pi^{a-1} + \beta[1 - \pi^{a-1}] = \pi^{a-1} + \{\alpha[1 - (1-\pi)^{b-1}]\}[1 - \pi^{a-1}],
\]
giving
\[
\alpha = \frac{\pi^{a-1}}{\pi^{a-1} + (1-\pi)^{b-1} - \pi^{a-1}(1-\pi)^{b-1}}
\]
and
\[
\beta = \frac{\pi^{a-1}[1 - (1-\pi)^{b-1}]}{\pi^{a-1} + (1-\pi)^{b-1} - \pi^{a-1}(1-\pi)^{b-1}}.
\]
Finally, it follows directly that
\[
\theta(\pi, a, b) = \pi\alpha + (1-\pi)\beta = \frac{\pi^{a-1}[1 - (1-\pi)^{b}]}{\pi^{a-1} + (1-\pi)^{b-1} - \pi^{a-1}(1-\pi)^{b-1}}.
\]
When $\pi = 0.791$, $a = 10$, and $b = 2$, then $\theta(0.791, 10, 2) = 0.38$.
(b) When both $\pi = 0.50$ and $a = b$, then $\theta(0.50, a, a) = \theta(0.50, b, b) = 0.50$; this answer makes sense because runs of makes and misses of the same length are equally likely when $\pi = 0.50$. When $a = b = 1$, then $\theta(\pi, 1, 1) = \pi$; this answer also makes sense because the event $A_{11}$ (i.e., the event that the first free throw is made) occurs with probability $\pi$. Finally, once several consecutive free throws are made, the pressure to continue the run of made free throws will increase; as a result, the assumption of mutual independence among the outcomes of consecutive free throws is probably not valid and the value of $\pi$ would tend to decrease.
(c) Since the probability of Tyler missing $b$ consecutive free throws before making $a$ consecutive free throws is equal to
\[
\theta(1-\pi, b, a) = \frac{(1-\pi)^{b-1}(1 - \pi^{a})}{(1-\pi)^{b-1} + \pi^{a-1} - (1-\pi)^{b-1}\pi^{a-1}},
\]
it follows directly that $\theta(\pi, a, b) + \theta(1-\pi, b, a) = 1$.
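The closed form for $\theta(\pi, a, b)$ in Solution 1.28 can be checked against the special cases discussed in parts (b) and (c). This is a minimal sketch; the function name `theta` and the test inputs other than $(0.791, 10, 2)$ are illustrative.

```python
def theta(pi: float, a: int, b: int) -> float:
    """Probability of a consecutive makes occurring before b consecutive misses."""
    p = pi ** (a - 1)
    q = (1 - pi) ** (b - 1)
    return p * (1 - (1 - pi) ** b) / (p + q - p * q)


value = theta(0.791, 10, 2)
complement = theta(1 - 0.791, 2, 10)   # b consecutive misses before a consecutive makes
print(round(value, 2), value + complement)
```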
2 Univariate Distribution Theory

2.1 Concepts and Notation

2.1.1 Discrete and Continuous Random Variables
A discrete random variable $X$ takes either a finite, or a countably infinite, number of values. A discrete random variable $X$ is characterized by its probability distribution $p_X(x) = \mathrm{pr}(X = x)$, which is a formula giving the probability that $X$ takes the (permissible) value $x$. Hence, a valid discrete probability distribution $p_X(x)$ has the following two properties:
i. $0 \le p_X(x) \le 1$ for all (permissible) values of $x$ and
ii. $\sum_{\text{all } x} p_X(x) = 1$.
A continuous random variable $X$ can theoretically take all the real (and hence uncountably infinite) numerical values on a line segment of either finite or infinite length. A continuous random variable $X$ is characterized by its density function $f_X(x)$. A valid density function $f_X(x)$ has the following properties:
i. $0 \le f_X(x) < +\infty$ for all (permissible) values of $x$;
ii. $\int_{\text{all } x} f_X(x)\,dx = 1$;
iii. For $-\infty < a < b < +\infty$, $\mathrm{pr}(a < X < b) = \int_{a}^{b} f_X(x)\,dx$; and
iv. $\mathrm{pr}(X = x) = 0$ for any particular value $x$, since $\int_{x}^{x} f_X(x)\,dx = 0$.

2.1.2 Cumulative Distribution Functions
In general, the cumulative distribution function (CDF) for a univariate random variable $X$ is the function $F_X(x) = \mathrm{pr}(X \le x)$, $-\infty < x < +\infty$, which possesses the following properties:
i. $0 \le F_X(x) \le 1$, $-\infty < x < +\infty$;
ii. $F_X(x)$ is a monotonically nondecreasing function of $x$; and
iii. $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to +\infty} F_X(x) = 1$.
For an integer-valued discrete random variable $X$, it follows that
i. $F_X(x) = \sum_{\text{all } x^{*} \le x} p_X(x^{*})$;
ii. $p_X(x) = \mathrm{pr}(X = x) = F_X(x) - F_X(x-1)$; and
iii. $[dF_X(x)]/dx \ne p_X(x)$ since $F_X(x)$ is a discontinuous function of $x$.
For a continuous random variable $X$, it follows that
i. $F_X(x) = \int_{\text{all } x^{*} \le x} f_X(x^{*})\,dx^{*}$;
ii. For $-\infty < a < b < +\infty$, $\mathrm{pr}(a < X < b) = F_X(b) - F_X(a)$; and
iii. $[dF_X(x)]/dx = f_X(x)$ since $F_X(x)$ is an absolutely continuous function of $x$.

2.1.3 Median and Mode
For any discrete distribution $p_X(x)$ or density function $f_X(x)$, the population median $\xi$ satisfies the two inequalities
\[
\mathrm{pr}(X \le \xi) \ge \frac{1}{2} \quad \text{and} \quad \mathrm{pr}(X \ge \xi) \ge \frac{1}{2}.
\]
For a density function $f_X(x)$, $\xi$ is that value of $X$ such that
\[
\int_{-\infty}^{\xi} f_X(x)\,dx = \frac{1}{2}.
\]
The population mode for either a discrete probability distribution $p_X(x)$ or a density function $f_X(x)$ is a value of $x$ that maximizes $p_X(x)$ or $f_X(x)$. The population mode is not necessarily unique, since $p_X(x)$ or $f_X(x)$ may achieve its maximum for several different values of $x$; in this situation, all these local maxima are called modes.

2.1.4 Expectation Theory
Let $g(X)$ be any scalar function of a univariate random variable $X$. Then, the expected value $E[g(X)]$ of $g(X)$ is defined to be
\[
E[g(X)] = \sum_{\text{all } x} g(x)\,p_X(x) \quad \text{when } X \text{ is a discrete random variable,}
\]
and is defined to be
\[
E[g(X)] = \int_{\text{all } x} g(x)\,f_X(x)\,dx \quad \text{when } X \text{ is a continuous random variable.}
\]
Note that $E[g(X)]$ is said to exist if $|E[g(X)]| < +\infty$; otherwise, $E[g(X)]$ is said not to exist. Some general rules for computing expectations are:
i. If $C$ is a constant independent of $X$, then $E(C) = C$;
ii. $E[Cg(X)] = CE[g(X)]$;
iii. If $C_1, C_2, \ldots, C_k$ are $k$ constants all independent of $X$, and if $g_1(X), g_2(X), \ldots, g_k(X)$ are $k$ scalar functions of $X$, then
\[
E\left[\sum_{i=1}^{k} C_i g_i(X)\right] = \sum_{i=1}^{k} C_i E[g_i(X)];
\]
iv. If $k \to \infty$, then
\[
E\left[\sum_{i=1}^{\infty} C_i g_i(X)\right] = \sum_{i=1}^{\infty} C_i E[g_i(X)]
\]
when $\left|\sum_{i=1}^{\infty} C_i E[g_i(X)]\right| < +\infty$.

2.1.5 Some Important Expectations

2.1.5.1 Mean
μ = E(X) is the mean of X. 2.1.5.2 Variance
√ σ2 = V(X) = E{[X − E(X)]2 } is the variance of X, and σ = + σ2 is the standard deviation of X. 2.1.5.3
Moments
More generally, if r is a positive integer, a binomial expansion of [X − E(X)]r gives ⎧ ⎫ r r ⎨ ⎬ Crj X j [−E(X)]r−j = Crj (−1)r−j E(X j )[E(X)]r−j , E{[X − E(X)]r } = E ⎩ ⎭ j=0
j=0
where E{[X − E(X)]r } is the rth moment about the mean. For example, for r = 2, we obtain E{[X − E(X)]2 } = V(X) = E(X 2 ) − [E(X)]2 ;
and, for r = 3, we obtain
$$E\{[X - E(X)]^3\} = E(X^3) - 3E(X^2)E(X) + 2[E(X)]^3,$$
which is a measure of the skewness of the distribution of X.
2.1.5.4 Moment Generating Function
$M_X(t) = E(e^{tX})$ is called the moment generating function for the random variable X, provided that $M_X(t) < +\infty$ for t in some neighborhood of 0 [i.e., for all $t \in (-\epsilon, \epsilon)$, $\epsilon > 0$]. For r a positive integer, and with $E(X^r)$ defined as the rth moment about the origin (i.e., about 0) for the random variable X, then $M_X(t)$ can be used to generate moments about the origin via the algorithm
$$\left.\frac{d^r M_X(t)}{dt^r}\right|_{t=0} = E(X^r).$$
More generally, for r a positive integer, the function
$$M^*_X(t) = E\left\{e^{t[X - E(X)]}\right\} = e^{-tE(X)} M_X(t)$$
can be used to generate moments about the mean via the algorithm
$$\left.\frac{d^r M^*_X(t)}{dt^r}\right|_{t=0} = E\{[X - E(X)]^r\}.$$
2.1.5.5 Probability Generating Function
If we let $e^t$ equal s in $M_X(t) = E(e^{tX})$, we obtain the probability generating function $P_X(s) = E(s^X)$. Then, for r a positive integer, and with
$$E\left[\frac{X!}{(X - r)!}\right] = E[X(X - 1)(X - 2) \cdots (X - r + 1)]$$
defined as the rth factorial moment for the random variable X, then $P_X(s)$ can be used to generate factorial moments via the algorithm
$$\left.\frac{d^r P_X(s)}{ds^r}\right|_{s=1} = E\left[\frac{X!}{(X - r)!}\right].$$
As an example, the probability generating function $P_X(s)$ can be used to find the variance of X when V(X) is written in the form $V(X) = E[X(X - 1)] + E(X) - [E(X)]^2$.
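The factorial-moment route to the variance can be illustrated numerically. In the sketch below, the rth derivative of $P_X(s) = \sum_x s^x p_X(x)$ at s = 1 is evaluated term by term as $\sum_x x(x-1)\cdots(x-r+1)\,p_X(x)$; the binomial example with n = 10 and π = 3/10 and the helper names are assumptions chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, pi):
    """p_X(x) for X ~ BIN(n, pi), returned as a dict over x = 0, ..., n."""
    return {x: comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)}

def factorial_moment(pmf, r):
    """E[X!/(X - r)!], i.e. the r-th derivative of P_X(s) evaluated at s = 1."""
    total = 0
    for x, p in pmf.items():
        falling = 1
        for j in range(r):
            falling *= x - j  # builds x(x - 1)...(x - r + 1)
        total += falling * p
    return total

n, pi = 10, Fraction(3, 10)  # hypothetical parameter values
pmf = binomial_pmf(n, pi)

mean = factorial_moment(pmf, 1)
# V(X) = E[X(X - 1)] + E(X) - [E(X)]^2
variance = factorial_moment(pmf, 2) + mean - mean**2

print(mean, variance)
```

With exact rational arithmetic the results can be compared directly against nπ and nπ(1 − π).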
2.1.6 Inequalities Involving Expectations
2.1.6.1 Markov's Inequality
If X is a nonnegative random variable [i.e., pr(X ≥ 0) = 1], then pr(X > k) ≤ E(X)/k for any constant k > 0. As a special case, for r > 0, if $X = |Y - E(Y)|^r$ when Y is any random variable, then, with $\nu_r = E[|Y - E(Y)|^r]$, we have
$$\mathrm{pr}\left[|Y - E(Y)|^r > k\right] \le \frac{\nu_r}{k},$$
or equivalently with $k = t^r \nu_r$,
$$\mathrm{pr}\left[|Y - E(Y)| > t\nu_r^{1/r}\right] \le t^{-r}, \quad t > 0.$$
For r = 2, we obtain Tchebyshev's Inequality, namely,
$$\mathrm{pr}\left[|Y - E(Y)| > t\sqrt{V(Y)}\right] \le t^{-2}, \quad t > 0.$$
2.1.6.2 Jensen's Inequality
Let X be a random variable with $|E(X)| < \infty$. If g(X) is a convex function of X, then E[g(X)] ≥ g[E(X)], provided that $|E[g(X)]| < \infty$. If g(X) is a concave function of X, then the inequality is reversed, namely, E[g(X)] ≤ g[E(X)].
2.1.6.3 Hölder's Inequality
Let X and Y be random variables, and let p, 1 < p < ∞, and q, 1 < q < ∞, satisfy the restriction 1/p + 1/q = 1. Then,
$$E(|XY|) \le \left[E(|X|^p)\right]^{1/p} \left[E(|Y|^q)\right]^{1/q}.$$
As a special case, when p = q = 2, we obtain the Cauchy–Schwarz Inequality, namely,
$$E(|XY|) \le \sqrt{E(X^2)E(Y^2)}.$$
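A quick numerical check can make these bounds concrete. The sketch below assumes, purely for illustration, a BIN(20, 0.5) random variable and computes the slack (bound minus actual probability) in Markov's and Tchebyshev's inequalities, together with the Jensen gap for the convex function g(y) = y²; the helper names are hypothetical.

```python
from math import comb, sqrt

n, pi = 20, 0.5  # hypothetical BIN(n, pi) example
pmf = {y: comb(n, y) * pi**y * (1 - pi) ** (n - y) for y in range(n + 1)}
mean = sum(y * p for y, p in pmf.items())
var = sum((y - mean) ** 2 * p for y, p in pmf.items())

def markov_slack(k):
    """E(Y)/k - pr(Y > k); Markov's inequality says this is nonnegative."""
    return mean / k - sum(p for y, p in pmf.items() if y > k)

def tchebyshev_slack(t):
    """t**-2 - pr(|Y - E(Y)| > t*sqrt(V(Y))); nonnegative by Tchebyshev."""
    tail = sum(p for y, p in pmf.items() if abs(y - mean) > t * sqrt(var))
    return t ** -2 - tail

# Jensen with the convex function g(y) = y**2: E(Y^2) - [E(Y)]^2 >= 0.
jensen_slack = sum(y**2 * p for y, p in pmf.items()) - mean**2

print([round(markov_slack(k), 4) for k in (5, 10, 15)])
print([round(tchebyshev_slack(t), 4) for t in (1.0, 2.0, 3.0)])
print(round(jensen_slack, 4))
```

The slacks are typically far from zero, which reflects how conservative these distribution-free bounds can be for a specific distribution.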
2.1.7 Some Important Probability Distributions for Discrete Random Variables
2.1.7.1 Binomial Distribution
If X is the number of successes in n trials, where the trials are conducted independently with the probability π of success remaining the same from trial to trial, then
$$p_X(x) = C^n_x \pi^x (1 - \pi)^{n-x}, \quad x = 0, 1, \ldots, n \text{ and } 0 < \pi < 1.$$
When X ∼ BIN(n, π), then E(X) = nπ, V(X) = nπ(1 − π), and $M_X(t) = [\pi e^t + (1 - \pi)]^n$. When n = 1, X has the Bernoulli distribution.
2.1.7.2 Negative Binomial Distribution
If Y is the number of trials required to obtain exactly k successes, where k is a specified positive integer, and where the trials are conducted independently with the probability π of success remaining the same from trial to trial, then
$$p_Y(y) = C^{y-1}_{k-1} \pi^k (1 - \pi)^{y-k}, \quad y = k, k + 1, \ldots, \infty \text{ and } 0 < \pi < 1.$$
When Y ∼ NEGBIN(k, π), then E(Y) = k/π, $V(Y) = k(1 - \pi)/\pi^2$, and
$$M_Y(t) = \left[\frac{\pi e^t}{1 - (1 - \pi)e^t}\right]^k.$$
In the special case when k = 1, then Y has a geometric distribution, namely,
$$p_Y(y) = \pi(1 - \pi)^{y-1}, \quad y = 1, 2, \ldots, \infty \text{ and } 0 < \pi < 1.$$
When Y ∼ GEOM(π), then E(Y) = 1/π, $V(Y) = (1 - \pi)/\pi^2$, and $M_Y(t) = \pi e^t/[1 - (1 - \pi)e^t]$. When X ∼ BIN(n, π) and when Y ∼ NEGBIN(k, π), then pr(X < k) = pr(Y > n).
2.1.7.3 Poisson Distribution
As a model for rare events, the Poisson distribution can be derived as a limiting case of the binomial distribution as n → ∞ and π → 0 with λ = nπ held constant; this limit is
$$p_X(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, \ldots, \infty \text{ and } \lambda > 0.$$
When X ∼ POI(λ), then E(X) = V(X) = λ and $M_X(t) = e^{\lambda(e^t - 1)}$.
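The limiting relationship between the binomial and Poisson distributions can be seen numerically. The following sketch holds λ = nπ fixed at an assumed value of 2 and reports the largest pointwise difference between the BIN(n, λ/n) and POI(λ) probabilities over x = 0, 1, ..., 20 as n grows; the function names and the chosen values of n are illustrative assumptions.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, pi):
    """pr(X = x) for X ~ BIN(n, pi); comb(n, x) is 0 when x > n."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

def poisson_pmf(x, lam):
    """pr(X = x) for X ~ POI(lam)."""
    return lam**x * exp(-lam) / factorial(x)

def max_gap(n, lam, upto=20):
    """Largest |BIN(n, lam/n) - POI(lam)| probability gap over x = 0..upto."""
    return max(
        abs(binomial_pmf(x, n, lam / n) - poisson_pmf(x, lam))
        for x in range(upto + 1)
    )

lam = 2.0  # hypothetical value of lambda = n * pi, held constant
gaps = [max_gap(n, lam) for n in (20, 200, 2000, 20000)]
for n, gap in zip((20, 200, 2000, 20000), gaps):
    print(n, gap)
```

The gap shrinks roughly in proportion to 1/n, consistent with the Poisson distribution being the limit as n → ∞ and π → 0.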
2.1.7.4 Hypergeometric Distribution
Suppose that a finite-sized population of size N (< +∞) contains a items of Type A and b items of Type B, with (a + b) = N. If a sample of n (< N) items is randomly selected without replacement from this population of N items, then the number X of items of Type A contained in this sample of n items has the hypergeometric distribution, namely,
$$p_X(x) = \frac{C^a_x C^b_{n-x}}{C^{a+b}_n} = \frac{C^a_x C^{N-a}_{n-x}}{C^N_n}, \quad \max(0, n - b) \le x \le \min(n, a).$$
When X ∼ HG(a, N − a, n), then
$$E(X) = n\left(\frac{a}{N}\right) \quad \text{and} \quad V(X) = n\left(\frac{a}{N}\right)\left(\frac{N - a}{N}\right)\left(\frac{N - n}{N - 1}\right).$$
2.1.8 Some Important Distributions (i.e., Density Functions) for Continuous Random Variables
2.1.8.1 Normal Distribution
The normal distribution density function is
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2/2\sigma^2}, \quad -\infty < x < \infty, \quad -\infty < \mu < \infty, \quad 0 < \sigma^2 < \infty.$$
When X ∼ N(μ, σ²), then E(X) = μ, V(X) = σ², and $M_X(t) = e^{\mu t + \sigma^2 t^2/2}$. Also, when X ∼ N(μ, σ²), then the standardized variable Z = (X − μ)/σ ∼ N(0, 1), with density function
$$f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty.$$
2.1.8.2 Lognormal Distribution
When X ∼ N(μ, σ²), then the random variable $Y = e^X$ has a lognormal distribution, with density function
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma y} e^{-[\ln(y) - \mu]^2/2\sigma^2}, \quad 0 < y < \infty, \quad -\infty < \mu < \infty, \quad 0 < \sigma^2 < \infty.$$
When Y ∼ LN(μ, σ²), then $E(Y) = e^{\mu + (\sigma^2/2)}$ and $V(Y) = [E(Y)]^2 (e^{\sigma^2} - 1)$.
2.1.8.3 Gamma Distribution
The gamma distribution density function is
$$f_X(x) = \frac{x^{\beta - 1} e^{-x/\alpha}}{\Gamma(\beta)\alpha^{\beta}}, \quad 0 < x < \infty, \quad 0 < \alpha < \infty, \quad 0 < \beta < \infty.$$
When X ∼ GAMMA(α, β), then E(X) = αβ, V(X) = α²β, and $M_X(t) = (1 - \alpha t)^{-\beta}$. The gamma distribution has two important special cases:
i. When α = 2 and β = ν/2, then $X \sim \chi^2_{\nu}$ (i.e., X has a chi-squared distribution with ν degrees of freedom). When $X \sim \chi^2_{\nu}$, then
$$f_X(x) = \frac{x^{\frac{\nu}{2} - 1} e^{-x/2}}{\Gamma\left(\frac{\nu}{2}\right) 2^{\nu/2}}, \quad 0 < x < \infty \text{ and } \nu \text{ a positive integer};$$
also, E(X) = ν, V(X) = 2ν, and $M_X(t) = (1 - 2t)^{-\nu/2}$. And, if Z ∼ N(0, 1), then $Z^2 \sim \chi^2_1$.
ii. When β = 1, then X has a negative exponential distribution with density function
$$f_X(x) = \frac{1}{\alpha} e^{-x/\alpha}, \quad 0 < x < \infty, \quad 0 < \alpha < \infty.$$
When X ∼ NEGEXP(α), then E(X) = α, V(X) = α², and $M_X(t) = (1 - \alpha t)^{-1}$.
2.1.8.4 Beta Distribution
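The beta distribution density function is
$$f_X(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1}(1 - x)^{\beta - 1}, \quad 0 < x < 1, \quad 0 < \alpha < \infty, \quad 0 < \beta < \infty.$$
When X ∼ BETA(α, β), then
$$E(X) = \frac{\alpha}{\alpha + \beta} \quad \text{and} \quad V(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$$
2.1.8.5 Uniform Distribution
The uniform distribution density function is
$$f_X(x) = \frac{1}{(\theta_2 - \theta_1)}, \quad -\infty < \theta_1 < x < \theta_2 < \infty.$$
When X ∼ UNIF(θ₁, θ₂), then
$$E(X) = \frac{(\theta_1 + \theta_2)}{2}, \quad V(X) = \frac{(\theta_2 - \theta_1)^2}{12}, \quad \text{and} \quad M_X(t) = \frac{(e^{t\theta_2} - e^{t\theta_1})}{t(\theta_2 - \theta_1)}.$$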
EXERCISES
Exercise 2.1
(a) In a certain small group of seven people, suppose that exactly four of these people have a certain rare blood disorder. If individuals are selected at random one-at-a-time without replacement from this group of seven people, find the numerical value of the expected number of individuals that have to be selected in order to obtain one individual with this rare blood disorder and one individual without this rare blood disorder.
(b) Now, consider a finite-sized population of size N (< +∞) in which there are exactly M (2 ≤ M < N) individuals with this rare blood disorder. Suppose that individuals are selected from this population at random one-at-a-time without replacement. Let the random variable X denote the number of individuals selected until exactly k (1 ≤ k ≤ M < N) individuals are selected who have this rare blood disorder. Derive an explicit expression for the probability distribution of the random variable X.
(c) Given the conditions described in part (b), derive an explicit expression for the probability that the third individual selected has this rare blood disorder.
Exercise 2.2. Suppose that the positive integers 1, 2, . . . , k, k ≥ 3, are arranged randomly in a horizontal line, thus occupying k slots. Assume that all arrangements of these k integers are equally likely.
(a) Derive the probability distribution $p_X(x)$ of the discrete random variable X, where X is the number of integers between the integers 1 and k. Also, show directly that $p_X(x)$ is a valid discrete probability distribution.
(b) Develop an explicit expression for E(X).
Exercise 2.3. Consider an urn that contains four white balls and two black balls.
(a) Suppose that pairs of balls are selected from this urn without replacement; in particular, the first two balls selected (each ball selected without replacement) constitute the first pair, the next two balls selected constitute the second pair, and so on. Find numerical values for E(Y) and V(Y), where Y is the number of black balls remaining in the urn after the first pair of white balls is selected.
(b) Now, suppose that pairs of balls are selected from this urn with replacement in the following manner: the first ball in a pair is randomly selected, its color is recorded, and then it is returned to the urn; then, the second ball making up this particular pair is randomly selected, its color is recorded, and then it is returned to the urn. Provide an explicit expression for the probability distribution of the random variable X, the number of pairs of balls that have to be selected in this manner until exactly two pairs of white balls are obtained (i.e., both balls in each of these two pairs are white).
Exercise 2.4.
To estimate the unknown size N (< +∞) of a population (e.g., the number of bass in a particular lake, the number of whales in a particular ocean, the number of birds of a specific species in a particular forest, etc.), a sampling procedure known as capture–recapture is often employed. This capture–recapture sampling method works as follows. For the first stage of sampling, m animals are randomly chosen (i.e., captured) from the population of animals under study and are then individually marked to permit future identification. Then, these m marked animals are released back into the population of animals under study. At the second stage of sampling, which occurs at some later time, n( 3 can be neglected, develop a reasonable approximation for E(X).
Exercise 2.12. Suppose that the continuous random variable X has the uniform distribution $f_X(x) = 1$, 0 < x < 1. Suppose that the continuous random variable Y is related to X via the equation $Y = [-\ln(1 - X)]^{1/3}$. By relating $F_Y(y)$ to $F_X(x)$, develop explicit expressions for $f_Y(y)$ and $E(Y^r)$ for r ≥ 0.
Exercise 2.13. For a certain psychological test designed to measure work-related stress level, a score of zero is considered to reflect a normal level of work-related stress. Based on previous data, it is reasonable to assume that the score X on this psychological test can be accurately modeled as a continuous random variable with density function
$$f_X(x) = \frac{1}{288}(36 - x^2), \quad -6 < x < 6,$$
where negative scores indicate lower-than-normal work-related stress levels and positive scores indicate higher-than-normal work-related stress levels.
(a) Find the numerical value of the probability that a randomly chosen person taking this psychological test makes a test score within two units of a test score of zero.
(b) Develop an explicit expression for $F_X(x)$, the cumulative distribution function (CDF) for X, and then use this result to compute the exact numerical value of the probability that a randomly chosen person makes a test score greater than three in value given that this person's test score suggests a higher-than-normal work-related stress level.
(c) Find the numerical value of the probability (say, π) that, on any particular day, the sixth person taking this psychological test is at least the third person to make a test score greater than one in value.
(d) Use Tchebyshev's Inequality to find numbers L and U such that $\mathrm{pr}(L < X < U) \ge \frac{8}{9}$. Comment on your findings.
Exercise 2.14. Suppose that the continuous random variable X has the mixture distribution
$$f_X(x) = \pi f_1(x) + (1 - \pi) f_2(x), \quad -\infty < x < +\infty,$$
where $f_1(x)$ is a normal density with mean $\mu_1$ and variance $\sigma_1^2$, where $f_2(x)$ is a normal density with mean $\mu_2$ and variance $\sigma_2^2$, where π is the probability that X has distribution $f_1(x)$, and where (1 − π) is the probability that X has distribution $f_2(x)$.
(a) Develop an explicit expression for $P_X(s)$, the probability generating function of the random variable X, and then use this result directly to find E(X).
(b) Let π = 0.60, $\mu_1 = 1.00$, $\sigma_1^2 = 0.50$, $\mu_2 = 1.20$, and $\sigma_2^2 = 0.40$. Suppose that one value of X is observed, and that value of X exceeds 1.10 in value. Find the numerical value of the probability that this observed value of X was obtained from $f_1(x)$.
(c) Now, suppose that π = 1, $\mu_1 = 0$, and $\sigma_1^2 = 1$. Find the numerical value of E(X | X > 1.00).
Exercise 2.15. If the random variable Y ∼ N(0, 1), develop an explicit expression for $E(|Y|^r)$ when r is an odd positive integer.
Exercise 2.16. Suppose that the discrete random variable Y has the negative binomial distribution
$$p_Y(y) = C^{y+k-1}_{k-1} \pi^k (1 - \pi)^y, \quad y = 0, 1, \ldots, \infty, \quad 0 < \pi < 1,$$
with k a known positive integer. Derive an explicit expression for E[Y!/(Y − r)!] where r is a nonnegative integer. Then, use this result to find E(X) and V(X) when X = (Y + k).
Exercise 2.17. Suppose that X is the concentration (in parts per million) of a certain airborne pollutant, and suppose that the random variable Y = ln(X) has a distribution that can be adequately modeled by the double exponential density function
$$f_Y(y) = (2\alpha)^{-1} e^{-|y - \beta|/\alpha}, \quad -\infty < y < \infty, \quad -\infty < \beta < \infty, \quad 0 < \alpha < \infty.$$
(a) Find an explicit expression for $F_Y(y)$, the cumulative distribution function (CDF) associated with the density function $f_Y(y)$. If α = 1 and β = 2, use this CDF to find the numerical value of pr(X > 4 | X > 2).
(b) For the density function $f_Y(y)$ given above, derive an explicit expression for a generating function $\phi_Y(t)$ that can be used to generate the absolute-value moments $\nu_r = E\{|Y - E(Y)|^r\}$ for r a nonnegative integer, and then use $\phi_Y(t)$ directly to find $\nu_1$ and $\nu_2 = V(Y)$.
Exercise 2.18. A certain statistical model describing the probability (or risk) Y of an adult developing leukemia as a function of lifetime cumulative exposure X to radiation (in microsieverts) is given by the equation
$$Y = g(X) = 1 - \alpha e^{-\beta X^2}, \quad 0 < X < +\infty, \quad 0 < \alpha < 1, \quad 0 < \beta < +\infty,$$
where the continuous random variable X has the distribution
$$f_X(x) = \left(\frac{2}{\pi\theta}\right)^{1/2} e^{-x^2/2\theta}, \quad 0 < x < +\infty, \quad 0 < \theta < +\infty.$$
Find an explicit expression relating average risk E(Y) to average cumulative exposure E(X). Comment on how the average risk varies as a function of α, β, and E(X).
Exercise 2.19. A conceptually infinitely large population consists of a proportion $\pi_0$ of nonsmokers, a proportion $\pi_l$ of light smokers (no more than one pack per day), and a proportion $\pi_h$ of heavy smokers (more than one pack per day), where $(\pi_0 + \pi_l + \pi_h) = 1$. Consider the following three random variables based on three different sampling schemes:
1. $X_1$ is the number of subjects that have to be randomly selected sequentially from this population until exactly two heavy smokers are obtained.
2. $X_2$ is the number of subjects that have to be randomly selected sequentially from this population until at least one light smoker and at least one heavy smoker are obtained.
3. $X_3$ is the number of subjects that have to be randomly selected sequentially from this population until at least one subject from each of the three smoking categories (i.e., nonsmokers, light smokers, and heavy smokers) is obtained.
(a) Develop an explicit expression for the probability distribution $p_{X_1}(x_1)$ of $X_1$.
(b) Develop an explicit expression for the probability distribution $p_{X_2}(x_2)$ of $X_2$.
(c) Develop an explicit expression for the probability distribution $p_{X_3}(x_3)$ of $X_3$.
Exercise 2.20. If Y is a normally distributed random variable with mean μ and variance σ², then the random variable $X = e^Y$ is said to have a lognormal distribution. The lognormal distribution has been used in many important practical applications, one such important application being to model the distributions of chemical concentration levels to which workers are exposed in occupational settings.
(a) Using the fact that Y ∼ N(μ, σ²) and that $X = e^Y$, derive explicit expressions for E(X) and V(X).
(b) If the lognormal random variable $X = e^Y$ defined in part (a) represents the average concentration (in parts per million, or ppm) of a certain toxic chemical to which a typical worker in a certain chemical manufacturing industry is exposed over an 8-hour workday, and if E(X) = V(X) = 1, find the exact numerical value of pr(X > 1), namely, the probability that such a typical worker will be exposed over an 8-hour workday to an average chemical concentration level greater than 1 ppm.
(c) To protect the health of workers in this chemical manufacturing industry, it is desirable to be highly confident that a typical worker will not be exposed to an average chemical concentration greater than c ppm over an 8-hour workday, where c is a known positive constant specified by federal guidelines. Prove that
$$\mathrm{pr}(X \le c) \ge (1 - \alpha), \quad 0 < \alpha < 0.50, \quad \text{if} \quad E(X) \le c\,e^{-0.50 z^2_{1-\alpha}},$$
where $\mathrm{pr}(Z \le z_{1-\alpha}) = (1 - \alpha)$ when Z ∼ N(0, 1). The implication of this result is that it is possible to meaningfully reduce the chance that a worker will be exposed over an 8-hour workday to a high average concentration of a potentially harmful chemical by sufficiently lowering the mean concentration level E(X), given the assumption that Y = ln(X) ∼ N(μ, σ²).
Exercise 2.21. Let X be a discrete random variable such that
$$\theta_x = \mathrm{pr}(X = x) = \alpha\pi^x, \quad x = 1, 2, \ldots, +\infty, \quad 0 < \pi < 1,$$
and let
$$\theta_0 = \mathrm{pr}(X = 0) = 1 - \sum_{x=1}^{\infty} \alpha\pi^x.$$
Here, α is an appropriately chosen positive constant.
(a) Develop an explicit expression for $M_X(t) = E(e^{tX})$, and then use this expression to find E(X). Be sure to specify appropriate ranges for α and t.
(b) Verify your answer for E(X) in part (a) by computing E(X) directly.
Exercise 2.22. A popular dimensionless measure of the skewness (or "asymmetry") of a density function $f_X(x)$ is the quantity
$$\alpha_3 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{E\{[X - E(X)]^3\}}{[V(X)]^{3/2}}.$$
As a possible competitor to $\alpha_3$, a new dimensionless measure of asymmetry, denoted $\alpha^*_3$, is proposed, where
$$\alpha^*_3 = \frac{E(X) - \theta}{\sqrt{V(X)}};$$
here, θ is defined as the mode of the density function $f_X(x)$, namely, that unique value of x (if it exists) that maximizes $f_X(x)$. For the gamma density function
$$f_X(x) = \frac{x^{\beta - 1} e^{-x/\alpha}}{\Gamma(\beta)\alpha^{\beta}}, \quad 0 < x < \infty, \quad \alpha > 0, \quad \beta > 0,$$
develop explicit expressions for $\alpha_3$ and $\alpha^*_3$, and comment on the findings.
Exercise 2.23*. Environmental scientists typically use personal exposure monitors to measure the average daily concentrations of chemicals to which workers are exposed during 8-h work shifts. In certain situations, some average concentration levels are very low and so fall below a known detection limit L (> 0) defined by the type of personal monitor being used; such unobservable average concentration levels are said to be left-censored. To deal with this missing data problem, one suggested ad hoc approach is to replace such left-censored average concentration levels with some numerical function g(L) (> 0) of L, say, $L/\sqrt{2}$, L/2, or even L itself. To study the statistical ramifications of such an ad hoc approach, let X (≥ 0) be a continuous random variable representing the average concentration level for a randomly chosen worker in a certain industrial setting; further, assume that X has the distribution $f_X(x)$ with mean E(X) and variance V(X). Then, define the random variable
$$U = X \text{ if } X \ge L \quad \text{and} \quad U = g(L) \text{ if } X < L.$$
(a) If $\pi = \mathrm{pr}(X \ge L) = \int_L^{\infty} f_X(x)\,dx$, show that
$$E(U) = (1 - \pi)g(L) + \pi E(X \mid X \ge L)$$
and that
$$V(U) = \pi\left\{V(X \mid X \ge L) + (1 - \pi)\left[g(L) - E(X \mid X \ge L)\right]^2\right\}.$$
(b) Find an explicit expression for the optimal choice for g(L) such that E(U) = E(X), which is a very desirable equality when using U as a surrogate for X. If $f_X(x) = e^{-x}$, x ≥ 0, and L = 0.05, find the exact numerical value of this optimal choice for g(L).
Exercise 2.24*. Suppose that X ∼ N(μ, σ²). Develop an explicit expression for E(Y) when
$$Y = 1 - \alpha e^{-\beta X^2}, \quad 0 < \alpha < 1, \quad 0 < \beta < +\infty.$$
Exercise 2.25*. The cumulant generating function for a random variable X is defined as $\psi_X(t) = \ln[M_X(t)]$, where $M_X(t) = E(e^{tX})$ is the moment generating function of X; and, the rth cumulant $\kappa_r$ is the coefficient of $t^r/r!$ in the series expansion
$$\psi_X(t) = \ln[M_X(t)] = \sum_{r=1}^{\infty} \kappa_r \frac{t^r}{r!}.$$
(a) If Y = (X − c), where c is a constant independent of X, what is the relationship between the cumulants of Y and the cumulants of X?
(b) Find the cumulants of X when X is distributed as: (i) N(μ, σ²); (ii) POI(λ); (iii) GAMMA(α, β).
(c) In general, show that $\kappa_1 = E(X)$, that $\kappa_2 = V(X)$, and that $\kappa_3 = E\{[X - E(X)]^3\}$.
Exercise 2.26*. In the branch of statistics known as "survival analysis," interest concerns a continuous random variable T (0 < T < ∞), the time until an event (such as death) occurs. For example, in a clinical trial evaluating the effectiveness of a new remission induction chemotherapy treatment for leukemia, investigators may wish to model the time (in months) in remission (or, equivalently, the time to the reappearance of leukemia) for patients who have received this chemotherapy treatment and who have gone into remission. In such settings, rather than modeling T directly, investigators will often model the hazard function, h(t), defined as
$$h(t) = \lim_{\Delta t \to 0} \frac{\mathrm{pr}(t \le T \le t + \Delta t \mid T \ge t)}{\Delta t}, \quad t > 0.$$
The hazard function, or "instantaneous failure rate," is the limiting value (as Δt → 0) of the probability per unit of time of the occurrence of the event of interest during a small time interval [t, t + Δt] of length Δt, given that the event has not occurred prior to time t.
(a) If $f_T(t) \equiv f(t)$ is the density function of T and if $F_T(t) \equiv F(t)$ is the corresponding CDF, show that
$$h(t) = \frac{f(t)}{S(t)},$$
where S(t) = [1 − F(t)] is called the survival function and is the probability that the event of interest does not occur prior to time t.
(b) Using the result in part (a), show that $S(t) = e^{-H(t)}$, where $H(t) = \int_0^t h(u)\,du$ is the cumulative hazard function.
(c) Prove that $E(T) = \int_0^{\infty} S(t)\,dt$.
(d) Due to funding restrictions, the chemotherapy clinical trial described above is to be terminated after a fixed period of time c (in months). Suppose that patients remain in the trial until either their leukemia reappears or the clinical trial ends (i.e., assume that there is no loss to follow-up, so that all patients either come out of remission or remain in remission until the trial ends). The observed time on study for each patient is therefore X = min(T, c), where T denotes the time in remission. Show that $E[H(X)] = F_T(c)$, where H(·) is the cumulative hazard function for T.
For further details about survival analysis, see Hosmer, Lemeshow, and May (2008) and Kleinbaum and Klein (2005).
Exercise 2.27*. A certain drug company produces and sells a popular insulin for the treatment of diabetes. At the beginning of each calendar year, the company produces a very large number of units of the insulin (where a unit is a dosage amount equivalent to one injection of the insulin), the production goal being to closely meet patient demand for the insulin during that year. The company makes a net gain of G dollars for each unit sold during the year, and the company suffers a net loss of L dollars for each unit left unsold during the year. Further, suppose that the total number X of units of insulin (if available) that patients would purchase during the year can be modeled approximately as a continuous random variable with probability density function $f_X(x)$, x > 0.
(a) If N is the total number of units of the insulin that should be produced at the beginning of the year to maximize the expected value of the profit P of the company for the entire year, show that N satisfies the equation
$$F_X(N) = \frac{G}{(G + L)},$$
where $F_X(x) = \mathrm{pr}(X \le x)$ is the CDF of the random variable X.
(b) Compute the value of N if G = 4, L = 1, and
$$f_X(x) = (2 \times 10^{-10})\,x\,e^{-(10^{-10})x^2}, \quad x > 0.$$
Exercise 2.28*. Suppose that a particular automobile insurance company adopts the following strategy with regard to setting the value of yearly premiums for coverage. Any policy holder must pay a premium of $P_1$ dollars for the first year of coverage. If a policy holder has a perfect driving record during this first year of coverage (i.e., this policy holder is not responsible for any traffic accidents or for any traffic violations during this first year of coverage), then the premium for the second year of coverage will be reduced to $\alpha P_1$, where 0 < α < 1. However, if this policy holder does not have a perfect driving record during the first year of coverage, then the premium for the second year of coverage will be increased to $\beta P_1$, where 1 < β < +∞. More generally, let π, 0 < π < 1, be the probability that any policy holder has a perfect driving record during any particular year of coverage, and assume that any policy holder's driving record during any one particular year of coverage is independent of his or her driving record during any other year of coverage. Then, in general, for k = 2, 3, . . . , ∞, let $P_{k-1}$ denote the premium for year (k − 1); thus, the premium $P_k$ for year k will equal $\alpha P_{k-1}$ with probability π, and will equal $\beta P_{k-1}$ with probability (1 − π).
(a) For k = 2, 3, . . . , ∞, develop an explicit expression for $E(P_k)$, the average yearly premium for the kth year of coverage for any policy holder.
(b) This insurance company cannot afford to let the average yearly premium for any policy holder be smaller than a certain value, say, $P^*$. Find an expression (as a function of $P_1$, $P^*$, β, and π) for the smallest value of α, say $\alpha^*$, such that the average yearly premium for year k is not less than $P^*$. Then, consider the limiting value of $\alpha^*$ as k → ∞; compute the numerical value of this limiting value of $\alpha^*$ when π = 0.90 and β = 1.05, and then comment on your findings.
Exercise 2.29*. Suppose that the discrete random variable X has the probability distribution
$$p_X(x) = \mathrm{pr}(X = x) = \frac{1}{x!} \sum_{l=0}^{R-x} \frac{(-1)^l}{l!}, \quad x = 0, 1, \ldots, R,$$
where R (> 1) is a positive integer.
(a) Use an inductive argument to show that $\sum_{x=0}^{R} p_X(x) = 1$.
(b) Find explicit expressions for E(X) and V(X). Also, find $\lim_{R \to \infty} p_X(x)$. Comment on all these findings.
Exercise 2.30*. Suppose that the number $X_T$ of incident (i.e., new) lung cancer cases developing in a certain disease-free population of size N during a time interval of length T (in years) has the Poisson distribution
$$p_{X_T}(x) = \frac{(NT\lambda)^x e^{-(NT\lambda)}}{x!}, \quad x = 0, 1, \ldots, \infty; \quad N > 0, \quad \lambda > 0, \quad T > 0.$$
Here, N and T are known constants, and the parameter λ is the unknown rate of lung cancer development per person-year (a quantity often referred to as the "incidence density" by epidemiologists).
(a) Starting at time zero, let the continuous random variable $W_n$ be the length of time in years that passes until exactly n lung cancer cases have developed. $W_n$ is referred to as the "waiting time" until the nth lung cancer case has developed. By expressing the CDF $F_{W_n}(w_n)$ of the random variable $W_n$ in terms of a probability statement about the Poisson random variable $X_T$, develop an explicit expression for the density function of the random variable $W_n$.
(b) With $X_T \sim \mathrm{POI}(NT\lambda)$, consider the standardized random variable $Z = [X_T - E(X_T)]/\sqrt{V(X_T)}$. Show that
$$\lim_{N \to \infty} E(e^{tZ}) = e^{t^2/2},$$
which is the moment generating function of a standard normal random variable. Then, if $N = 10^5$ and $\lambda = 10^{-4}$, use the above result to provide a reasonable value for the probability of observing no more than 90 new cases of lung cancer in any 10-year period of time.
Exercise 2.31*. Important computational aids for the numerical evaluation of incomplete integrals of gamma and beta distributions involve expressing such integrals as sums of probabilities of particular Poisson and binomial distributions.
(a) Prove that
$$\int_c^{\infty} \frac{x^{\beta - 1} e^{-x/\alpha}}{\Gamma(\beta)\alpha^{\beta}}\,dx = \sum_{j=0}^{\beta - 1} \frac{(c/\alpha)^j e^{-c/\alpha}}{j!},$$
where α > 0 and c > 0 and where β is a positive integer.
(b) Prove that
$$\int_0^c \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx = \sum_{i=\alpha}^{\alpha + \beta - 1} C^{\alpha + \beta - 1}_i c^i (1 - c)^{\alpha + \beta - 1 - i},$$
where α and β are positive integers and where 0 < c < 1.
Exercise 2.32*. Suppose that the probability that a sea turtle nest contains n eggs is equal to $(1 - \pi)\pi^{n-1}$, where n = 1, 2, . . . , ∞ and 0 < π < 1. Furthermore, each egg in any such nest has probability 0.30 of producing a live and healthy baby sea turtle, completely independent of what happens to any other egg in that same nest. Finally, because of predators (e.g., sea birds and other sea creatures) and other risk factors (e.g., shore erosion, harmful environmental conditions, etc.), each such live and healthy baby sea turtle then has probability 0.98 of NOT surviving to adulthood.
(a) Find the exact numerical value of the probability that any egg produces an adult sea turtle.
(b) Derive an explicit expression for the probability α that a randomly chosen sea turtle nest produces at least one adult sea turtle. Find the exact numerical value of α when π = 0.20.
(c) Suppose that a randomly chosen sea turtle nest is known to have produced exactly k adult sea turtles, where k ≥ 0. Derive an explicit expression for the probability $\beta_{nk}$ that this randomly chosen sea turtle nest originally contained exactly n eggs, n ≥ 1. Find the exact numerical value of $\beta_{nk}$ when π = 0.20, k = 2, and n = 6.
Exercise 2.33*
(a) Prove Pascal's Identity, namely,
$$C^n_k = C^{n-1}_{k-1} + C^{n-1}_k$$
for any positive integers n and k such that $C^n_k \equiv 0$ if k > n.
(b) Prove Vandermonde's Identity, namely,
$$C^{m+n}_r = \sum_{k=0}^{r} C^m_{r-k} C^n_k,$$
where m, n, and r are nonnegative integers satisfying r ≤ min{m, n}.
(c) For y = 1, 2, . . . , min{s, t}, suppose that the discrete random variable X takes the value x = 2y with probability
$$\pi_{2y} = \frac{2 C^{s-1}_{y-1} C^{t-1}_{y-1}}{C^{s+t}_s},$$
and takes the value x = 2y + 1 with probability
$$\pi_{2y+1} = \frac{C^{s-1}_y C^{t-1}_{y-1} + C^{s-1}_{y-1} C^{t-1}_y}{C^{s+t}_s},$$
where $C^{s-1}_y \equiv 0$ when y = s and $C^{t-1}_y \equiv 0$ when y = t.
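Use Pascal's Identity and Vandermonde's Identity to show that X has a valid discrete probability distribution.
SOLUTIONS
Solution 2.1
(a) Let the random variable Y denote the number of individuals that must be selected until one individual with the rare blood disorder and one individual without the rare blood disorder are selected. (Note that Y can take the values 2, 3, 4, and 5.) If $D_i$ is the event that the ith individual selected has the rare blood disorder, then
$$\mathrm{pr}(Y = 2) = \mathrm{pr}(D_1 \cap \bar{D}_2) + \mathrm{pr}(\bar{D}_1 \cap D_2) = \mathrm{pr}(D_1)\mathrm{pr}(\bar{D}_2 \mid D_1) + \mathrm{pr}(\bar{D}_1)\mathrm{pr}(D_2 \mid \bar{D}_1) = \left(\frac{4}{7}\right)\left(\frac{3}{6}\right) + \left(\frac{3}{7}\right)\left(\frac{4}{6}\right) = \frac{4}{7};$$
$$\mathrm{pr}(Y = 3) = \mathrm{pr}(D_1 \cap D_2 \cap \bar{D}_3) + \mathrm{pr}(\bar{D}_1 \cap \bar{D}_2 \cap D_3)$$
$$= \mathrm{pr}(D_1)\mathrm{pr}(D_2 \mid D_1)\mathrm{pr}(\bar{D}_3 \mid D_1 \cap D_2) + \mathrm{pr}(\bar{D}_1)\mathrm{pr}(\bar{D}_2 \mid \bar{D}_1)\mathrm{pr}(D_3 \mid \bar{D}_1 \cap \bar{D}_2)$$
$$= \left(\frac{4}{7}\right)\left(\frac{3}{6}\right)\left(\frac{3}{5}\right) + \left(\frac{3}{7}\right)\left(\frac{2}{6}\right)\left(\frac{4}{5}\right) = \frac{10}{35}.$$
Similarly,
$$\mathrm{pr}(Y = 4) = \left(\frac{4}{7}\right)\left(\frac{3}{6}\right)\left(\frac{2}{5}\right)\left(\frac{3}{4}\right) + \left(\frac{3}{7}\right)\left(\frac{2}{6}\right)\left(\frac{1}{5}\right)\left(\frac{4}{4}\right) = \frac{4}{35};$$
and,
$$\mathrm{pr}(Y = 5) = \left(\frac{4}{7}\right)\left(\frac{3}{6}\right)\left(\frac{2}{5}\right)\left(\frac{1}{4}\right)\left(\frac{3}{3}\right) = \frac{1}{35} = 1 - \sum_{y=2}^{4} \mathrm{pr}(Y = y) = 1 - \frac{34}{35}.$$
Finally,
$$E(Y) = 2\left(\frac{4}{7}\right) + 3\left(\frac{10}{35}\right) + 4\left(\frac{4}{35}\right) + 5\left(\frac{1}{35}\right) = 2.60.$$
(b) Let A denote the event that "(k − 1) individuals have the rare blood disorder among the first (x − 1) individuals selected," and let B denote the event that "the xth individual selected has the rare blood disorder." Then,
$$p_X(x) = \mathrm{pr}(X = x) = \mathrm{pr}(A \cap B) = \mathrm{pr}(A)\mathrm{pr}(B \mid A) = \frac{C^M_{k-1} C^{N-M}_{(x-1)-(k-1)}}{C^N_{x-1}} \cdot \frac{[M - (k - 1)]}{[N - (x - 1)]}$$
$$= \frac{C^M_{k-1} C^{N-M}_{x-k}}{C^N_{x-1}} \left(\frac{M - k + 1}{N - x + 1}\right) = \frac{C^{x-1}_{k-1} C^{N-x}_{M-k}}{C^N_M}, \quad 1 \le k \le x \le (N - M + k).$$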
(c)
\[
\begin{aligned}
&\mathrm{pr}(\text{third individual selected has the rare blood disorder}) \\
&\quad= \mathrm{pr}(D_1\cap D_2\cap D_3) + \mathrm{pr}(D_1\cap\bar{D}_2\cap D_3) + \mathrm{pr}(\bar{D}_1\cap D_2\cap D_3) + \mathrm{pr}(\bar{D}_1\cap\bar{D}_2\cap D_3) \\
&\quad= \frac{M}{N}\cdot\frac{M-1}{N-1}\cdot\frac{M-2}{N-2} + \frac{M}{N}\cdot\frac{N-M}{N-1}\cdot\frac{M-1}{N-2} \\
&\qquad+ \frac{N-M}{N}\cdot\frac{M}{N-1}\cdot\frac{M-1}{N-2} + \frac{N-M}{N}\cdot\frac{N-M-1}{N-1}\cdot\frac{M}{N-2} = \frac{M}{N}.
\end{aligned}
\]

Solution 2.2
(a) For x = 0, 1, ..., (k − 2), there are exactly (k − x − 1) pairs of slots for which the integer 1 precedes the integer k and for which there are exactly x integers between the integers 1 and k. Also, the integer k can precede the integer 1, and the other (k − 2) integers can be arranged in the remaining (k − 2) slots in (k − 2)! ways. So,
\[
p_X(x) = \frac{2(k-x-1)[(k-2)!]}{k!} = \frac{2(k-x-1)}{k(k-1)}, \qquad x = 0, 1, \ldots, (k-2).
\]
Clearly, $p_X(x)\ge 0$, x = 0, 1, ..., (k − 2), and
\[
\begin{aligned}
\sum_{x=0}^{k-2} p_X(x) &= \sum_{x=0}^{k-2}\frac{2(k-x-1)}{k(k-1)} = \frac{2}{k(k-1)}\sum_{x=0}^{k-2}[(k-1)-x] \\
&= \frac{2}{k(k-1)}\left[(k-1)^2 - \frac{(k-2)(k-1)}{2}\right] \\
&= \frac{2}{k}\left[(k-1) - \frac{(k-2)}{2}\right] = 1.
\end{aligned}
\]
So, $p_X(x)$ is a valid discrete probability distribution.
(b) Now,
\[
\begin{aligned}
E(X) &= \sum_{x=0}^{k-2} x\,p_X(x) = \frac{2}{k(k-1)}\sum_{x=0}^{k-2} x[(k-1)-x] \\
&= \frac{2}{k(k-1)}\left[(k-1)\frac{(k-2)(k-1)}{2} - \frac{(k-2)(k-1)[2(k-2)+1]}{6}\right] \\
&= \frac{2}{k}\left[\frac{(k-2)(k-1)}{2} - \frac{(k-2)(2k-3)}{6}\right] \\
&= \frac{(k-2)}{3k}\left[3(k-1) - (2k-3)\right] = \frac{(k-2)}{3}, \qquad k\ge 3.
\end{aligned}
\]
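The counting argument in Solution 2.2 can be checked by brute force for a small k. The following is a minimal sketch, not part of the original solution: the function names `gap_distribution` and `formula` and the choice k = 6 are illustrative assumptions. It enumerates every ordering of the integers 1, ..., k and compares the exact relative frequencies with 2(k − x − 1)/[k(k − 1)].

```python
from fractions import Fraction
from itertools import permutations


def gap_distribution(k):
    """Exact pmf of X = number of integers strictly between 1 and k
    when the integers 1..k are placed in random order (hypothetical helper)."""
    counts = {}
    total = 0
    for perm in permutations(range(1, k + 1)):
        x = abs(perm.index(1) - perm.index(k)) - 1
        counts[x] = counts.get(x, 0) + 1
        total += 1
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}


def formula(k, x):
    """Closed form 2(k - x - 1) / [k(k - 1)] from Solution 2.2."""
    return Fraction(2 * (k - x - 1), k * (k - 1))


k = 6  # small enough that all k! orderings can be enumerated
pmf = gap_distribution(k)
mean = sum(x * p for x, p in pmf.items())
for x, p in pmf.items():
    print(x, p, formula(k, x))
print("E(X) =", mean)
```

Exact rational arithmetic is used so that agreement with the closed form is an equality rather than a floating-point approximation.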
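Solution 2.3
(a)
\[
\mathrm{pr}(Y=2) = \mathrm{pr}(W_1\cap W_2) = \left(\frac{4}{6}\right)\left(\frac{3}{5}\right) = \frac{2}{5},
\]
\[
\begin{aligned}
\mathrm{pr}(Y=1) &= \mathrm{pr}(B_1\cap W_2\cap W_3\cap W_4) + \mathrm{pr}(W_1\cap B_2\cap W_3\cap W_4) \\
&= \left(\frac{2}{6}\right)\left(\frac{4}{5}\right)\left(\frac{3}{4}\right)\left(\frac{2}{3}\right) + \left(\frac{4}{6}\right)\left(\frac{2}{5}\right)\left(\frac{3}{4}\right)\left(\frac{2}{3}\right) = \frac{4}{15},
\end{aligned}
\]
\[
\begin{aligned}
\mathrm{pr}(Y=0) &= \mathrm{pr}(B_1\cap W_2\cap W_3\cap B_4) + \mathrm{pr}(B_1\cap W_2\cap B_3\cap W_4) + \mathrm{pr}(W_1\cap B_2\cap B_3\cap W_4) \\
&\quad+ \mathrm{pr}(W_1\cap B_2\cap W_3\cap B_4) + \mathrm{pr}(B_1\cap B_2) \\
&= 4\left(\frac{2}{6}\right)\left(\frac{4}{5}\right)\left(\frac{3}{4}\right)\left(\frac{1}{3}\right) + \left(\frac{2}{6}\right)\left(\frac{1}{5}\right) = \frac{1}{3}.
\end{aligned}
\]
Or, pr(Y = 0) = 1 − pr(Y = 1) − pr(Y = 2) = 1 − 4/15 − 2/5 = 1/3. Thus,
\[
E(Y) = 0\left(\frac{1}{3}\right) + 1\left(\frac{4}{15}\right) + 2\left(\frac{2}{5}\right) = \frac{16}{15} = 1.0667.
\]
Since
\[
E(Y^2) = (0)^2\left(\frac{1}{3}\right) + (1)^2\left(\frac{4}{15}\right) + (2)^2\left(\frac{2}{5}\right) = \frac{28}{15},
\]
\[
V(Y) = \frac{28}{15} - \left(\frac{16}{15}\right)^2 = \frac{164}{225} = 0.7289.
\]
(b) Clearly, pr(white ball) = 2/3, and this probability stays the same for each ball selected. So, pr(a pair contains 2 white balls) = $(2/3)^2$ = 4/9. Now, let X = number of pairs that have to be selected to obtain exactly two pairs of white balls. Since X ∼ NEGBIN(k = 2, π = 4/9), it follows that
\[
p_X(x) = C_{2-1}^{x-1}\left(\frac{4}{9}\right)^2\left(\frac{5}{9}\right)^{x-2} = (x-1)\left(\frac{4}{9}\right)^2\left(\frac{5}{9}\right)^{x-2}, \qquad x = 2, 3, \ldots, \infty.
\]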
Solution 2.4
(a) At the second stage of sampling, we are sampling without replacement from a finite population of N animals, of which m are marked and (N − m) are unmarked. So, the hypergeometric distribution applies. In particular, the exact distribution of X is
\[
p_X(x) = \frac{C_{x}^{m}\,C_{n-x}^{N-m}}{C_{n}^{N}}, \qquad \max[0,\, n-(N-m)] \le x \le n.
\]
(b)
\[
\pi = \sum_{j=2}^{4}\frac{C_{j}^{4}\,C_{n-j}^{N-4}}{C_{n}^{N}}.
\]
(c) Since X has the hypergeometric distribution given in part (a), it follows directly that E(X) = n(m/N). Since x, the observed value of X, is our best guess for E(X), it is logical to equate x to E(X), obtaining x = n(m/N). This leads to the expression $\hat{N} = mn/x$. When x = 22, m = 600, and n = 300, the computed value of $\hat{N}$ is 8181.82. Two obvious problems with the estimate $\hat{N}$ are that it does not necessarily take positive integer values, and it is not defined when x = 0.

Solution 2.5
(a) Since each health status category has probability 1/k of being encountered, pr(encountering a new health status category | c different health status categories have already been encountered) = (1 − c/k). Also, the daily outcomes are mutually independent of one another, and the probability (1 − c/k) remains the same from day to day. So, pr(it takes exactly x days to encounter a new health status category | c different health status categories have already been encountered) = pr[not a new category in the first (x − 1) days] × pr[new category on the xth day]
\[
= \left(\frac{c}{k}\right)^{x-1}\left(1-\frac{c}{k}\right) = (k-c)\,k^{-x}c^{x-1}, \qquad 0\le c\le (k-1).
\]
In other words, if X is the random variable denoting the number of days required to encounter a new health status category, then X has a geometric distribution, namely,
\[
p_X(x) = (k-c)\,k^{-x}c^{x-1}, \qquad x = 1, 2, \ldots, \infty.
\]
(b) For 0 ≤ c ≤ (k − 1) and with q = c/k, we have
\[
\begin{aligned}
E(X \mid c \text{ different health status categories have already been encountered})
&= \sum_{x=1}^{\infty} x\left(\frac{c}{k}\right)^{x-1}\left(1-\frac{c}{k}\right) = \left(1-\frac{c}{k}\right)\sum_{x=1}^{\infty} x q^{x-1} \\
&= \left(1-\frac{c}{k}\right)\sum_{x=1}^{\infty}\frac{d(q^x)}{dq} = \left(1-\frac{c}{k}\right)\frac{d}{dq}\left\{\sum_{x=1}^{\infty} q^x\right\} \\
&= \left(1-\frac{c}{k}\right)\frac{d}{dq}\left(\frac{q}{1-q}\right) = \left(1-\frac{c}{k}\right)\frac{(1)(1-q) - q(-1)}{(1-q)^2} \\
&= \frac{\left(1-\frac{c}{k}\right)}{\left(1-\frac{c}{k}\right)^2} = \frac{k}{(k-c)},
\end{aligned}
\]
which follows directly since X ∼ GEOM(1 − c/k). So, the expected total number of days $= k\sum_{c=0}^{k-1}(k-c)^{-1}$.
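The sum of geometric means obtained in Solution 2.5 is easy to evaluate exactly. The following is a minimal sketch, with `expected_days_to_see_all` being an illustrative (assumed) name rather than anything from the text; it adds the per-stage means k/(k − c) for c = 0, 1, ..., k − 1.

```python
from fractions import Fraction


def expected_days_to_see_all(k):
    """Expected number of days to encounter all k equally likely categories.

    Adds the geometric means k / (k - c) for c = 0, 1, ..., k - 1,
    i.e., k * sum_{c=0}^{k-1} (k - c)^(-1).
    """
    return sum(Fraction(k, k - c) for c in range(k))


total = expected_days_to_see_all(4)
print(total, float(total))
```

Keeping the result as a fraction makes it clear how much rounding is involved when the answer is quoted to two decimal places.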
When k = 4, we get $4\sum_{c=0}^{3}(4-c)^{-1} = 8.33$; in other words, it will take, on average, nine days to encounter people in all k = 4 health status categories.

Solution 2.6
(a) If X ∼ BIN(n, π = 0.0005), then
\[
\mathrm{pr}(X\ge 1) = 1 - \mathrm{pr}(X=0) = 1 - (0.9995)^n \ge 0.90;
\]
thus, we obtain $n\ln(0.9995)\le \ln(0.10)$, or $n \ge \ln(0.10)/\ln(0.9995) = 4604.02$, or n* = 4605.
And, if Y ∼ POI(nπ), then
\[
\mathrm{pr}(Y\ge 1) = 1 - \mathrm{pr}(Y=0) = 1 - e^{-n\pi} = 1 - e^{-0.0005n} \ge 0.90;
\]
thus, we obtain $e^{-0.0005n}\le 0.10$, or $-0.0005n\le\ln(0.10)$, or $n\ge 4605.17$, which gives n* = 4606. These numerical answers are nearly the same because π is very close to zero in value.
(b) With X ∼ BIN(n*, π), then P = (AX − n*). Thus, requiring E(P) = (An*π − n*) ≥ 0 gives Aπ − 1 ≥ 0, or $A\ge\pi^{-1}$, or $A\ge(0.0005)^{-1}$ = $2000.00.
(c) Let U be the discrete random variable denoting the number of the n* tickets purchased by this person that are jackpot-winning tickets. Then, U ∼ HG(N, K, n*). So,
\[
\mathrm{pr}(k\le U\le n^*) = \sum_{u=k}^{n^*}\frac{C_{u}^{K}\,C_{n^*-u}^{N-K}}{C_{n^*}^{N}}.
\]
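The two sample-size calculations in part (a) of Solution 2.6 can be reproduced directly. The following is a minimal sketch, with the function names being illustrative assumptions; each function starts from the logarithmic bound and then checks the defining inequality, so the returned value is the smallest integer n that satisfies it. Because |ln(0.9995)| is slightly larger than 0.0005, the exact binomial threshold can fall just below the Poisson one.

```python
import math


def smallest_n_binomial(pi, target):
    """Smallest n with 1 - (1 - pi)^n >= target (exact binomial model)."""
    n = math.ceil(math.log(1 - target) / math.log(1 - pi))
    while n > 1 and 1 - (1 - pi) ** (n - 1) >= target:
        n -= 1  # guard against floating-point rounding in the ceiling
    while 1 - (1 - pi) ** n < target:
        n += 1
    return n


def smallest_n_poisson(pi, target):
    """Smallest n with 1 - exp(-n * pi) >= target (Poisson approximation)."""
    n = math.ceil(-math.log(1 - target) / pi)
    while n > 1 and 1 - math.exp(-(n - 1) * pi) >= target:
        n -= 1
    while 1 - math.exp(-n * pi) < target:
        n += 1
    return n


pi = 0.0005
n_bin = smallest_n_binomial(pi, 0.90)
n_poi = smallest_n_poisson(pi, 0.90)
break_even_jackpot = 1 / pi  # part (b): A >= 1 / pi
print(n_bin, n_poi, break_even_jackpot)
```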
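Solution 2.7
(a)
\[
\begin{aligned}
\mathrm{pr}(Y=1) &= \sum_{j=1}^{k}\mathrm{pr}(\text{both twins choose the number } j) \\
&= \sum_{j=1}^{k}\mathrm{pr}(\text{one twin chooses } j)\,\mathrm{pr}(\text{other twin chooses } j) \\
&= \sum_{j=1}^{k}\left(\frac{1}{k}\right)\left(\frac{1}{k}\right) = \frac{1}{k}.
\end{aligned}
\]
So,
\[
p_Y(y) = \left(\frac{1}{k}\right)^{y}\left(\frac{k-1}{k}\right)^{1-y}, \qquad y = 0, 1.
\]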
(b) We wish to choose the smallest value of k such that 1/k ≤ 0.01, which requires k = 100.
(c) Let A be the event that “at least one set out of 100 sets of monozygotic twins chooses matching numbers” and let B be the event that “no set of monozygotic twins actually has ESP.” Then,
\[
\mathrm{pr}(A\mid B) = 1 - \mathrm{pr}(\bar{A}\mid B) = 1 - (0.99)^{100} = 1 - 0.366 = 0.634.
\]
Thus, if this parapsychologist conducts this experiment on a reasonably large number of monozygotic twins, there is a very high probability of concluding incorrectly that one or more sets of monozygotic twins has ESP. Clearly, the chance of making this mistake increases as the number of sets of monozygotic twins studied increases.
(d) With X defined as the number of matches in n = 10 independent repetitions of the experiment, then X ∼ BIN(n = 10, π = 0.01). So,
\[
\begin{aligned}
\mathrm{pr}(X\ge 2) &= 1 - \mathrm{pr}(X\le 1) = 1 - \sum_{x=0}^{1} C_{x}^{10}(0.01)^{x}(0.99)^{10-x} \\
&= 1 - (0.99)^{10} - (10)(0.01)(0.99)^{9} = 1 - 0.9044 - 0.0914 = 0.0042.
\end{aligned}
\]
So, for this particular set of monozygotic twins, there is some statistical evidence for the presence of ESP. Perhaps further study about this pair of twins is warranted, hopefully using other more sophisticated ESP detection experiments.
(e) Let D be the event that “the two randomly chosen numbers are not the same.” Then,
\[
\mathrm{pr}(S=3\mid D) = \frac{\mathrm{pr}[(S=3)\cap D]}{\mathrm{pr}(D)} = \frac{\mathrm{pr}(1,2)+\mathrm{pr}(2,1)}{1-\frac{1}{4}} = \frac{\mathrm{pr}(1)\mathrm{pr}(2)+\mathrm{pr}(2)\mathrm{pr}(1)}{\frac{3}{4}} = \frac{\left(\frac{1}{4}\right)\left(\frac{1}{4}\right)+\left(\frac{1}{4}\right)\left(\frac{1}{4}\right)}{\frac{3}{4}} = \frac{1}{6};
\]
\[
\mathrm{pr}(S=4\mid D) = \frac{\mathrm{pr}(1,3)+\mathrm{pr}(3,1)}{\frac{3}{4}} = \frac{\frac{1}{16}+\frac{1}{16}}{\frac{3}{4}} = \frac{1}{6};
\]
\[
\mathrm{pr}(S=5\mid D) = \frac{\mathrm{pr}(2,3)+\mathrm{pr}(3,2)+\mathrm{pr}(4,1)+\mathrm{pr}(1,4)}{\frac{3}{4}} = \frac{\frac{4}{16}}{\frac{3}{4}} = \frac{1}{3};
\]
\[
\mathrm{pr}(S=6\mid D) = \frac{\mathrm{pr}(2,4)+\mathrm{pr}(4,2)}{\frac{3}{4}} = \frac{\frac{2}{16}}{\frac{3}{4}} = \frac{1}{6};
\]
\[
\mathrm{pr}(S=7\mid D) = \frac{\mathrm{pr}(3,4)+\mathrm{pr}(4,3)}{\frac{3}{4}} = \frac{\frac{2}{16}}{\frac{3}{4}} = \frac{1}{6}.
\]
Hence, the probability distribution is
\[
p_S(s\mid D) = \begin{cases} \dfrac{1}{6} & \text{if } s = 3, 4, 6, \text{ or } 7; \\[4pt] \dfrac{1}{3} & \text{if } s = 5; \\[4pt] 0 & \text{otherwise.} \end{cases}
\]
Note that this is a type of “truncated” distribution. The expected value is
\[
E(S\mid D) = 3\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 5\left(\frac{1}{3}\right) + 6\left(\frac{1}{6}\right) + 7\left(\frac{1}{6}\right) = 5.
\]
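The conditional distribution in part (e) of Solution 2.7 can be confirmed by listing the equally likely ordered pairs. The following is a minimal sketch that assumes, as the solution's arithmetic indicates, that each number is chosen uniformly from 1 to 4; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

k = 4  # numbers 1..4, each chosen with probability 1/4 (as used in the solution)

# Condition on D: keep only the ordered pairs whose two numbers differ.
pairs = [(a, b) for a, b in product(range(1, k + 1), repeat=2) if a != b]

pmf = {}
for a, b in pairs:
    pmf[a + b] = pmf.get(a + b, 0) + Fraction(1, len(pairs))

mean = sum(s * p for s, p in pmf.items())
for s in sorted(pmf):
    print(s, pmf[s])
print("E(S | D) =", mean)
```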
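Solution 2.8
(a)
\[
\begin{aligned}
\mathrm{pr}(Y\le 3\mid Y\ge 2) &= \frac{\mathrm{pr}[(Y\ge 2)\cap(Y\le 3)]}{\mathrm{pr}(Y\ge 2)} = \frac{\mathrm{pr}(Y=2)+\mathrm{pr}(Y=3)}{1-\mathrm{pr}(Y=1)} \\
&= \frac{\dfrac{\lambda^2}{2!(e^{\lambda}-1)} + \dfrac{\lambda^3}{3!(e^{\lambda}-1)}}{1-\dfrac{\lambda}{e^{\lambda}-1}} = \frac{\lambda^2(\lambda+3)}{6(e^{\lambda}-\lambda-1)}.
\end{aligned}
\]
(b) For r = 1, 2, ...,
\[
\begin{aligned}
E\left[\frac{Y!}{(Y-r)!}\right] &= \sum_{y=1}^{\infty}\frac{y!}{(y-r)!}\cdot\frac{\lambda^{y}}{y!(e^{\lambda}-1)} = \sum_{y=r}^{\infty}\frac{\lambda^{y}}{(y-r)!(e^{\lambda}-1)} \\
&= \sum_{u=0}^{\infty}\frac{\lambda^{u+r}}{u!(e^{\lambda}-1)} = \frac{\lambda^{r}e^{\lambda}}{(e^{\lambda}-1)}.
\end{aligned}
\]
So, for r = 1,
\[
E(Y) = \frac{\lambda e^{\lambda}}{(e^{\lambda}-1)}.
\]
And, for r = 2,
\[
E[Y(Y-1)] = \frac{\lambda^{2}e^{\lambda}}{(e^{\lambda}-1)},
\]
so that
\[
V(Y) = E[Y(Y-1)] + E(Y) - [E(Y)]^2 = \frac{\lambda e^{\lambda}(e^{\lambda}-\lambda-1)}{(e^{\lambda}-1)^2}.
\]
(c)
\[
\begin{aligned}
\theta = \mathrm{pr}(V) &= \mathrm{pr}[V\cap(Y\ge 1)] = \mathrm{pr}\left\{V\cap\left[\cup_{y=1}^{\infty}(Y=y)\right]\right\} \\
&= \mathrm{pr}\left\{\cup_{y=1}^{\infty}[V\cap(Y=y)]\right\} = \sum_{y=1}^{\infty}\mathrm{pr}[V\cap(Y=y)] \\
&= \sum_{y=1}^{\infty}\mathrm{pr}(V\mid Y=y)\,\mathrm{pr}(Y=y) = \sum_{y=1}^{\infty}(\pi^{y})\frac{\lambda^{y}}{y!(e^{\lambda}-1)} \\
&= (e^{\lambda}-1)^{-1}\sum_{y=1}^{\infty}\frac{(\pi\lambda)^{y}}{y!} = (e^{\lambda}-1)^{-1}\left[\sum_{y=0}^{\infty}\frac{(\pi\lambda)^{y}}{y!} - 1\right] \\
&= \frac{(e^{\pi\lambda}-1)}{(e^{\lambda}-1)}.
\end{aligned}
\]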
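The closed forms in Solution 2.8 can be compared with direct summation over the zero-truncated Poisson probabilities. The following is a minimal sketch, not part of the original solution: the parameter values λ = 1.7 and π = 0.4, the truncation of the infinite sums at 60 terms, and all names are illustrative assumptions.

```python
import math


def truncated_poisson_pmf(y, lam):
    """pr(Y = y) = lam^y / [y! (e^lam - 1)], y = 1, 2, ..."""
    return lam ** y / (math.factorial(y) * math.expm1(lam))


lam, pi_ = 1.7, 0.4   # arbitrary illustrative values
ys = range(1, 60)     # the tail beyond 60 terms is negligible for this lam

mean_sum = sum(y * truncated_poisson_pmf(y, lam) for y in ys)
second_sum = sum(y * y * truncated_poisson_pmf(y, lam) for y in ys)
var_sum = second_sum - mean_sum ** 2
theta_sum = sum(pi_ ** y * truncated_poisson_pmf(y, lam) for y in ys)
cond_sum = (truncated_poisson_pmf(2, lam) + truncated_poisson_pmf(3, lam)) / (
    1 - truncated_poisson_pmf(1, lam)
)

# Closed forms from parts (a), (b), and (c).
mean_closed = lam * math.exp(lam) / math.expm1(lam)
var_closed = lam * math.exp(lam) * (math.exp(lam) - lam - 1) / math.expm1(lam) ** 2
theta_closed = math.expm1(pi_ * lam) / math.expm1(lam)
cond_closed = lam ** 2 * (lam + 3) / (6 * (math.exp(lam) - lam - 1))

print(mean_sum, mean_closed)
print(var_sum, var_closed)
print(theta_sum, theta_closed)
print(cond_sum, cond_closed)
```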
Solution 2.9. First, note that U/2 = Z ∼ N(0, 1). Making use of this result, we then have
\[
\begin{aligned}
\mathrm{pr}[\,|W-30| < 0.50\,] &= \mathrm{pr}[-0.50 < (W-30) < 0.50] = \mathrm{pr}[29.5 < W < 30.5] \\
&= \mathrm{pr}[29.5 < 31 + (0.50)U < 30.5] = \mathrm{pr}[(29.5-31) < Z < (30.5-31)] \\
&= \mathrm{pr}(-1.50 < Z < -0.50) = F_Z(-0.50) - F_Z(-1.50) \\
&= 0.3085 - 0.0668 = 0.2417.
\end{aligned}
\]

Solution 2.10
(a) For Process 1,
\[
\int_{1.0}^{2.0} 3.144\,e^{-x}\,dx = 0.7313.
\]
For Process 2,
\[
\int_{1.0}^{2.0} 2.574\,e^{-x}\,dx = 0.5987.
\]
Let A be the event that “both computer chips were produced by Process 1,” let B be the event that “one of the computer chips is acceptable and the other computer chip is unacceptable,” and let C be the event that “any computer chip is acceptable.”
Clearly, $\mathrm{pr}(A) = \left(\frac{1}{3}\right)^2 = 1/9 = 0.1111$. And, $\mathrm{pr}(B\mid A) = 2(0.7313)(0.2687) = 0.3930$. Also, since
\[
\mathrm{pr}(C) = \left(\frac{1}{3}\right)(0.7313) + \left(\frac{2}{3}\right)(0.5987) = 0.6429,
\]
it follows that $\mathrm{pr}(B) = C_{1}^{2}(0.6429)(0.3571) = 0.4592$. Finally,
\[
\mathrm{pr}(A\mid B) = \frac{\mathrm{pr}(A\cap B)}{\mathrm{pr}(B)} = \frac{\mathrm{pr}(B\mid A)\,\mathrm{pr}(A)}{\mathrm{pr}(B)} = \frac{(0.3930)(0.1111)}{0.4592} = 0.0951.
\]
(b) For y = 3, 4, ..., ∞, the event Y = y can occur in one of two mutually exclusive ways: (i) The first (y − 1) chips selected are all acceptable, and then the yth chip selected is unacceptable; or (ii) The first (y − 1) chips selected include one acceptable chip and (y − 2) unacceptable chips, and then the yth chip selected is acceptable. So, if C is the event that a computer chip is acceptable, and if $P_1$ is the event that a computer chip is produced by Process 1, and if θ denotes the probability of selecting an acceptable chip, then
\[
\theta = \mathrm{pr}(C\mid P_1)\,\mathrm{pr}(P_1) + \mathrm{pr}(C\mid\bar{P}_1)\,\mathrm{pr}(\bar{P}_1) = (0.7313)\left(\frac{1}{3}\right) + (0.5987)\left(\frac{2}{3}\right) = 0.6429.
\]
Thus, with θ = 0.6429, we have
\[
p_Y(y) = \theta^{y-1}(1-\theta) + (y-1)(1-\theta)^{y-2}\theta^{2}, \qquad y = 3, 4, \ldots, \infty.
\]
Now, $0\le p_Y(y)$, y = 3, 4, ..., ∞, and
\[
\begin{aligned}
\sum_{y=3}^{\infty} p_Y(y) &= \sum_{y=3}^{\infty}\left[\theta^{y-1}(1-\theta) + (y-1)(1-\theta)^{y-2}\theta^{2}\right] \\
&= (1-\theta)\sum_{y=3}^{\infty}\theta^{y-1} + \theta\sum_{u=2}^{\infty} u(1-\theta)^{u-1}\theta \\
&= (1-\theta)\left(\frac{\theta^{2}}{1-\theta}\right) + \theta\left[\sum_{u=1}^{\infty} u(1-\theta)^{u-1}\theta - \theta\right] \\
&= \theta^{2} + \theta\left[\frac{1}{\theta} - \theta\right] = 1,
\end{aligned}
\]
so that $p_Y(y)$ is a valid discrete probability distribution.

Solution 2.11
(a) $\mathrm{pr}(X>1) = \mathrm{pr}(A_1) = \theta$; $\mathrm{pr}(X>2) = \mathrm{pr}(A_1)\,\mathrm{pr}(A_2\mid A_1) = \theta(\theta^{2}) = \theta^{3}$; $\mathrm{pr}(X>3) = \mathrm{pr}(A_1)\,\mathrm{pr}(A_2\mid A_1)\,\mathrm{pr}(A_3\mid A_1\cap A_2) = \theta(\theta^{2})(\theta^{3}) = \theta^{6}$; and, in general,
\[
\mathrm{pr}(X>x) = \prod_{i=1}^{x}\theta^{i}, \qquad x = 1, 2, \ldots, \infty.
\]
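So,
\[
\begin{aligned}
p_X(x) = \mathrm{pr}(X=x) &= \mathrm{pr}(X>x-1) - \mathrm{pr}(X>x) = \prod_{i=1}^{x-1}\theta^{i} - \prod_{i=1}^{x}\theta^{i} \\
&= \left(\prod_{i=1}^{x-1}\theta^{i}\right)(1-\theta^{x}) = \theta^{\sum_{i=1}^{x-1} i}\,(1-\theta^{x}) \\
&= \theta^{\frac{x(x-1)}{2}}(1-\theta^{x}), \qquad x = 1, 2, \ldots, \infty.
\end{aligned}
\]
Since 0 < θ < 1, clearly $0\le p_X(x)\le 1$ for x = 1, 2, ..., ∞. And,
\[
\begin{aligned}
\sum_{x=1}^{\infty} p_X(x) &= \sum_{x=1}^{\infty}\theta^{\frac{x(x-1)}{2}}(1-\theta^{x}) = \sum_{x=1}^{\infty}\theta^{\frac{x(x-1)}{2}} - \sum_{x=1}^{\infty}\theta^{\frac{x(x+1)}{2}} \\
&= \sum_{y=0}^{\infty}\theta^{\frac{y(y+1)}{2}} - \sum_{x=1}^{\infty}\theta^{\frac{x(x+1)}{2}} \\
&= 1 + \sum_{y=1}^{\infty}\theta^{\frac{y(y+1)}{2}} - \sum_{x=1}^{\infty}\theta^{\frac{x(x+1)}{2}} = 1.
\end{aligned}
\]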
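The telescoping argument in part (a) of Solution 2.11 can be illustrated numerically. The following is a minimal sketch, with θ = 0.6, the 60-term cutoff, and the function names being illustrative assumptions; it checks that the probabilities sum to one and that the tail sums reproduce pr(X > m) = θ^{m(m+1)/2}.

```python
import math


def pmf(x, theta):
    """p_X(x) = theta^{x(x-1)/2} (1 - theta^x), x = 1, 2, ..."""
    return theta ** (x * (x - 1) // 2) * (1 - theta ** x)


def survival(m, theta, terms=60):
    """pr(X > m), obtained by summing the pmf over x = m + 1, ..., terms - 1."""
    return sum(pmf(x, theta) for x in range(m + 1, terms))


theta = 0.6  # arbitrary illustrative value in (0, 1)
total = sum(pmf(x, theta) for x in range(1, 60))
print(total)
for m in (1, 2, 3):
    print(m, survival(m, theta), theta ** (m * (m + 1) // 2))
```

Because the exponent x(x − 1)/2 grows quadratically, the terms decay very quickly and a short truncated sum is already accurate to machine precision.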
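(b) We have
\[
\begin{aligned}
E(X) &= \sum_{x=1}^{\infty} x\,\theta^{\frac{x(x-1)}{2}}(1-\theta^{x}) = \sum_{x=1}^{\infty} x\,\theta^{\frac{x(x-1)}{2}} - \sum_{x=1}^{\infty} x\,\theta^{\frac{x(x+1)}{2}} \\
&= \sum_{y=0}^{\infty}(y+1)\,\theta^{\frac{y(y+1)}{2}} - \sum_{x=1}^{\infty} x\,\theta^{\frac{x(x+1)}{2}} \\
&= \sum_{y=0}^{\infty}\theta^{\frac{y(y+1)}{2}} + \sum_{y=0}^{\infty} y\,\theta^{\frac{y(y+1)}{2}} - \sum_{x=1}^{\infty} x\,\theta^{\frac{x(x+1)}{2}} \\
&= \sum_{y=0}^{\infty}\theta^{\frac{y(y+1)}{2}} = 1 + \theta + \theta^{3} + \theta^{6} + \theta^{10} + \cdots \approx (1+\theta+\theta^{3}),
\end{aligned}
\]
assuming terms of the form $\theta^{j}$ for j > 3 can be neglected.

Solution 2.12. For y ≥ 0,
\[
\begin{aligned}
F_Y(y) = \mathrm{pr}[Y\le y] &= \mathrm{pr}\{[-\ln(1-X)]^{1/3}\le y\} = \mathrm{pr}[-\ln(1-X)\le y^{3}] \\
&= \mathrm{pr}[\ln(1-X)\ge -y^{3}] = \mathrm{pr}[(1-X)\ge e^{-y^{3}}] \\
&= \mathrm{pr}[X\le(1-e^{-y^{3}})] = F_X(1-e^{-y^{3}}) = 1-e^{-y^{3}},
\end{aligned}
\]
since $F_X(x) = x$, 0 < x < 1. So,
\[
f_Y(y) = \frac{dF_Y(y)}{dy} = 3y^{2}e^{-y^{3}}, \qquad 0<y<\infty.
\]
So, for r ≥ 0, and with $u = y^{3}$,
\[
\begin{aligned}
E(Y^{r}) &= \int_{0}^{\infty}(y^{r})\,3y^{2}e^{-y^{3}}\,dy = \int_{0}^{\infty}u^{r/3}e^{-u}\,du \\
&= \Gamma\left(\frac{r}{3}+1\right)\int_{0}^{\infty}\frac{u^{\left(\frac{r}{3}+1\right)-1}e^{-u}}{\Gamma\left(\frac{r}{3}+1\right)}\,du = \Gamma\left(\frac{r}{3}+1\right), \qquad r\ge 0.
\end{aligned}
\]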
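The moment formula $E(Y^{r}) = \Gamma(r/3+1)$ from Solution 2.12 can be checked by numerical integration of $y^{r} f_Y(y)$. The following is a minimal sketch under stated assumptions: a plain midpoint rule on (0, 6) with 20,000 subintervals is used (the integrand is negligible beyond y = 6), and the function name `moment` is illustrative.

```python
import math


def moment(r, upper=6.0, steps=20_000):
    """Midpoint-rule approximation of E(Y^r) = int_0^inf y^r * 3 y^2 exp(-y^3) dy."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y ** r * 3.0 * y * y * math.exp(-y ** 3)
    return total * h


for r in (0, 1, 2, 3.5):
    print(r, moment(r), math.gamma(r / 3 + 1))
```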
Solution 2.13
(a) $\mathrm{pr}(-2<X<2) = \int_{-2}^{2}\frac{1}{288}(36-x^{2})\,dx = 0.4815$.
(b) $F_X(x) = \int_{-6}^{x}\frac{1}{288}(36-t^{2})\,dt = \frac{1}{288}\left(144+36x-\frac{x^{3}}{3}\right)$, −6 < x < 6. So,
\[
\mathrm{pr}(X>3\mid X>0) = \frac{\mathrm{pr}[(X>3)\cap(X>0)]}{\mathrm{pr}(X>0)} = \frac{\mathrm{pr}(X>3)}{\mathrm{pr}(X>0)} = \frac{1-F_X(3)}{1-F_X(0)} = 0.3125.
\]
(c) Now, $\mathrm{pr}(X>1) = 1-F_X(1) = 0.3762$. So, using the negative binomial distribution,
\[
\pi = \sum_{k=3}^{6} C_{k-1}^{6-1}(0.3762)^{k}(0.6238)^{6-k} = 0.2332.
\]
(d) Using Tchebyshev’s Inequality, we know that $L = E(X) - 3\sqrt{V(X)}$ and $U = E(X) + 3\sqrt{V(X)}$. Since $f_X(x)$ is symmetric about zero, we know that E(X) = 0. So,
\[
V(X) = E(X^{2}) = \int_{-6}^{6}(x^{2})\frac{1}{288}(36-x^{2})\,dx = 7.20,
\]
so that L = −8.05 and U = 8.05. These findings clearly illustrate the very conservative nature of Tchebyshev’s Theorem, since pr(−8.05 < X < 8.05) = 1.

Solution 2.14
(a) Although it is possible to find $P_X(s)$ directly, it is easier to make use of the connection between the moment generating function of X and the probability generating function of X. In particular,
\[
\begin{aligned}
M_X(t) = E(e^{tX}) &= \int_{-\infty}^{\infty} e^{tx}[\pi f_1(x) + (1-\pi)f_2(x)]\,dx \\
&= \pi\int_{-\infty}^{\infty} e^{tx}f_1(x)\,dx + (1-\pi)\int_{-\infty}^{\infty} e^{tx}f_2(x)\,dx \\
&= \pi e^{(\mu_1 t+\sigma_1^{2}t^{2}/2)} + (1-\pi)e^{(\mu_2 t+\sigma_2^{2}t^{2}/2)}.
\end{aligned}
\]
So, using the fact that $s = e^{t}$ and ln(s) = t, it follows directly that
\[
P_X(s) = \pi s^{\mu_1}e^{\frac{\sigma_1^{2}[\ln(s)]^{2}}{2}} + (1-\pi)s^{\mu_2}e^{\frac{\sigma_2^{2}[\ln(s)]^{2}}{2}}.
\]
So,
\[
\begin{aligned}
\frac{dP_X(s)}{ds} &= \pi\left\{\mu_1 s^{\mu_1-1}e^{(\sigma_1^{2}/2)[\ln(s)]^{2}} + s^{\mu_1}e^{(\sigma_1^{2}/2)[\ln(s)]^{2}}\,\sigma_1^{2}s^{-1}\ln(s)\right\} \\
&\quad+ (1-\pi)\left\{\mu_2 s^{\mu_2-1}e^{(\sigma_2^{2}/2)[\ln(s)]^{2}} + s^{\mu_2}e^{(\sigma_2^{2}/2)[\ln(s)]^{2}}\,\sigma_2^{2}s^{-1}\ln(s)\right\}.
\end{aligned}
\]
Finally,
\[
E(X) = \left.\frac{dP_X(s)}{ds}\right|_{s=1} = \pi\mu_1 + (1-\pi)\mu_2.
\]
(b) Let A be the event that “X is from $f_1(x)$,” so that $\bar{A}$ is the event that “X is from $f_2(x)$”; and, let B be the event that “X > 1.10.” Then, as a direct application of Bayes’ Theorem, we have
\[
\mathrm{pr}(A\mid B) = \frac{\mathrm{pr}(B\mid A)\,\mathrm{pr}(A)}{\mathrm{pr}(B\mid A)\,\mathrm{pr}(A) + \mathrm{pr}(B\mid\bar{A})\,\mathrm{pr}(\bar{A})} = \frac{\pi\,\mathrm{pr}(B\mid A)}{\pi\,\mathrm{pr}(B\mid A) + (1-\pi)\,\mathrm{pr}(B\mid\bar{A})}.
\]
Now, with Z ∼ N(0, 1), we have
\[
\mathrm{pr}(B\mid A) = \mathrm{pr}\left(\frac{X-1.00}{\sqrt{0.50}} > \frac{1.10-1.00}{\sqrt{0.50}}\right) = \mathrm{pr}(Z>0.1414) = 0.44
\]
and
\[
\mathrm{pr}(B\mid\bar{A}) = \mathrm{pr}\left(\frac{X-1.20}{\sqrt{0.40}} > \frac{1.10-1.20}{\sqrt{0.40}}\right) = \mathrm{pr}(Z>-0.1581) = 0.56.
\]
Thus,
\[
\mathrm{pr}(A\mid B) = \frac{(0.60)(0.44)}{(0.60)(0.44) + (0.40)(0.56)} = 0.54.
\]
(c) Since pr(X > 1) = 0.16, it follows that the appropriate truncated density function for X is
\[
f_X(x\mid X>1) = (0.16)^{-1}\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}, \qquad 1<x<\infty.
\]
So,
\[
E(X\mid X>1) = (0.16)^{-1}\int_{1}^{\infty} x\,\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}\,dx.
\]
Letting $y = x^{2}/2$, so that dy = x dx, we have
\[
E(X\mid X>1) = [\sqrt{2\pi}(0.16)]^{-1}\int_{1/2}^{\infty} e^{-y}\,dy = 2.4934\left[-e^{-y}\right]_{1/2}^{\infty} = 2.4934\,e^{-1/2} = 1.5123.
\]
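Solution 2.15. For r an odd positive integer,
\[
\begin{aligned}
E\left(|Y|^{r}\right) &= \int_{-\infty}^{\infty}|y|^{r}\frac{1}{\sqrt{2\pi}}e^{-y^{2}/2}\,dy \\
&= \int_{-\infty}^{0}(-y^{r})\frac{1}{\sqrt{2\pi}}e^{-y^{2}/2}\,dy + \int_{0}^{\infty}y^{r}\frac{1}{\sqrt{2\pi}}e^{-y^{2}/2}\,dy \\
&= 2\int_{0}^{\infty}y^{r}\frac{1}{\sqrt{2\pi}}e^{-y^{2}/2}\,dy = \frac{1}{\sqrt{2\pi}}\int_{0}^{\infty}\left(u^{1/2}\right)^{r-1}e^{-u/2}\,du \\
&= \frac{1}{\sqrt{2\pi}}\int_{0}^{\infty}u^{\frac{r+1}{2}-1}e^{-u/2}\,du \\
&= \frac{1}{\sqrt{2\pi}}\,\Gamma\left(\frac{r+1}{2}\right)2^{\frac{r+1}{2}}\int_{0}^{\infty}\frac{u^{\frac{r+1}{2}-1}e^{-u/2}}{\Gamma\left(\frac{r+1}{2}\right)2^{\frac{r+1}{2}}}\,du \\
&= \frac{1}{\sqrt{2\pi}}\,\Gamma\left(\frac{r+1}{2}\right)2^{\frac{r+1}{2}}.
\end{aligned}
\]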
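Solution 2.16
\[
\begin{aligned}
E\left[\frac{Y!}{(Y-r)!}\right] &= \sum_{y=0}^{\infty}\frac{y!}{(y-r)!}\,C_{k-1}^{y+k-1}\pi^{k}(1-\pi)^{y} \\
&= \sum_{y=r}^{\infty}\frac{(y+k-1)!}{(k-1)!(y-r)!}\pi^{k}(1-\pi)^{y} \\
&= \sum_{u=0}^{\infty}\frac{(u+r+k-1)!}{(k-1)!\,u!}\pi^{k}(1-\pi)^{u+r} \\
&= \frac{\pi^{-r}(1-\pi)^{r}(k+r-1)!}{(k-1)!}\sum_{u=0}^{\infty}C_{(k+r)-1}^{u+(k+r)-1}\pi^{(k+r)}(1-\pi)^{u} \\
&= \frac{\pi^{-r}(1-\pi)^{r}(k+r-1)!}{(k-1)!} = \frac{(k+r-1)!}{(k-1)!}\left(\frac{1-\pi}{\pi}\right)^{r}, \qquad r = 0, 1, \ldots, \infty.
\end{aligned}
\]
So, r = 1 gives
\[
E(Y) = k\left(\frac{1-\pi}{\pi}\right).
\]
And, r = 2 gives
\[
E[Y(Y-1)] = k(k+1)\left(\frac{1-\pi}{\pi}\right)^{2},
\]
so that
\[
V(Y) = k(k+1)\left(\frac{1-\pi}{\pi}\right)^{2} + k\left(\frac{1-\pi}{\pi}\right) - \left[k\left(\frac{1-\pi}{\pi}\right)\right]^{2} = \frac{k(1-\pi)}{\pi^{2}}.
\]
Since X = (Y + k),
\[
E(X) = E(Y) + k = k\left(\frac{1-\pi}{\pi}\right) + k = \frac{k}{\pi}
\]
and $V(X) = V(Y) = k(1-\pi)/\pi^{2}$. These are expected answers, since X ∼ NEGBIN(k, π).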
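The factorial-moment formula in Solution 2.16 can be checked against truncated sums of the probability function of Y. The following is a minimal sketch, with k = 3, π = 0.4, the 2,000-term cutoff, and all function names being illustrative assumptions.

```python
import math


def pmf(y, k, p):
    """pr(Y = y) = C(y + k - 1, k - 1) p^k (1 - p)^y, y = 0, 1, ..."""
    return math.comb(y + k - 1, k - 1) * p ** k * (1 - p) ** y


def factorial_moment_sum(r, k, p, terms=2000):
    """Truncated sum for E[Y! / (Y - r)!]; math.perm(y, r) is 0 when y < r."""
    return sum(math.perm(y, r) * pmf(y, k, p) for y in range(terms))


def factorial_moment_closed(r, k, p):
    """Closed form [(k + r - 1)! / (k - 1)!] [(1 - p) / p]^r."""
    return math.factorial(k + r - 1) / math.factorial(k - 1) * ((1 - p) / p) ** r


k, p = 3, 0.4  # arbitrary illustrative values
m1 = factorial_moment_sum(1, k, p)
m2 = factorial_moment_sum(2, k, p)
variance = m2 + m1 - m1 ** 2
print(m1, factorial_moment_closed(1, k, p))
print(m2, factorial_moment_closed(2, k, p))
print(variance, k * (1 - p) / p ** 2)
```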
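Solution 2.17
(a) For −∞ < y ≤ β,
\[
\begin{aligned}
F_Y(y) &= \int_{-\infty}^{y}(2\alpha)^{-1}e^{-(\beta-t)/\alpha}\,dt = \frac{e^{-\beta/\alpha}}{2}\int_{-\infty}^{y}\frac{1}{\alpha}e^{t/\alpha}\,dt \\
&= \frac{e^{-\beta/\alpha}}{2}e^{y/\alpha} = \frac{1}{2}e^{(y-\beta)/\alpha}.
\end{aligned}
\]
Note that $F_Y(\beta) = \frac{1}{2}$; this is an expected result because the density function $f_Y(y)$ is symmetric around β. For β < y < +∞,
\[
\begin{aligned}
F_Y(y) &= F_Y(\beta) + \int_{\beta}^{y}\frac{1}{2\alpha}e^{-(t-\beta)/\alpha}\,dt = \frac{1}{2} + \frac{e^{\beta/\alpha}}{2}\int_{\beta}^{y}\frac{1}{\alpha}e^{-t/\alpha}\,dt \\
&= \frac{1}{2} + \frac{e^{\beta/\alpha}}{2}\left[e^{-\beta/\alpha} - e^{-y/\alpha}\right] = 1 - \frac{1}{2}e^{-(y-\beta)/\alpha}.
\end{aligned}
\]
Thus,
\[
F_Y(y) = \begin{cases} \dfrac{1}{2}e^{(y-\beta)/\alpha}, & -\infty<y\le\beta; \\[4pt] 1-\dfrac{1}{2}e^{-(y-\beta)/\alpha}, & \beta<y<+\infty. \end{cases}
\]
Now, if α = 1 and β = 2,
\[
\begin{aligned}
\mathrm{pr}(X>4\mid X>2) &= \mathrm{pr}(Y>\ln 4\mid Y>\ln 2) = \frac{\mathrm{pr}[(Y>\ln 4)\cap(Y>\ln 2)]}{\mathrm{pr}(Y>\ln 2)} = \frac{\mathrm{pr}(Y>\ln 4)}{\mathrm{pr}(Y>\ln 2)} \\
&= \frac{1-F_Y(1.3863)}{1-F_Y(0.6931)} = \frac{1-\frac{1}{2}e^{(1.3863-2)}}{1-\frac{1}{2}e^{(0.6931-2)}} = 0.8434.
\end{aligned}
\]
(b) Now,
\[
\begin{aligned}
\phi_Y(t) &= E\left[e^{t|Y-E(Y)|}\right] = E\left[e^{t|Y-\beta|}\right] = \int_{-\infty}^{\infty}e^{t|y-\beta|}(2\alpha)^{-1}e^{-|y-\beta|/\alpha}\,dy \\
&= \int_{-\infty}^{\infty}\frac{1}{2\alpha}e^{-|y-\beta|\left(\frac{1}{\alpha}-t\right)}\,dy = \int_{-\infty}^{\infty}\frac{1}{2\alpha}e^{-|y-\beta|\big/\left[\frac{\alpha}{(1-\alpha t)}\right]}\,dy \\
&= \frac{[\alpha/(1-\alpha t)]}{\alpha} = (1-\alpha t)^{-1}, \qquad \alpha t<1.
\end{aligned}
\]
So,
\[
\left.\frac{d\phi_Y(t)}{dt}\right|_{t=0} = \nu_1 = E\{|Y-E(Y)|\} = \left[-(1-\alpha t)^{-2}(-\alpha)\right]_{t=0} = \alpha.
\]
And,
\[
\left.\frac{d^{2}\phi_Y(t)}{dt^{2}}\right|_{t=0} = \nu_2 = E\{|Y-E(Y)|^{2}\} = V(Y) = \left[\alpha(-2)(1-\alpha t)^{-3}(-\alpha)\right]_{t=0} = 2\alpha^{2}.
\]

Solution 2.18. First, with $y = x^{2}/2$ so that dy = x dx, we have
\[
\begin{aligned}
E(X) &= \int_{0}^{\infty}x\left(\frac{2}{\pi\theta}\right)^{1/2}e^{-x^{2}/2\theta}\,dx = \left(\frac{2}{\pi\theta}\right)^{1/2}\int_{0}^{\infty}e^{-y/\theta}\,dy \\
&= \left(\frac{2}{\pi\theta}\right)^{1/2}(\theta)\int_{0}^{\infty}\frac{1}{\theta}e^{-y/\theta}\,dy = \left(\frac{2\theta}{\pi}\right)^{1/2}.
\end{aligned}
\]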
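Now, we have
\[
\begin{aligned}
E(Y) = E[g(X)] &= 1 - \int_{0}^{\infty}\alpha e^{-\beta x^{2}}\left(\frac{2}{\pi\theta}\right)^{1/2}e^{-x^{2}/2\theta}\,dx \\
&= 1 - \alpha\left(\frac{2}{\pi\theta}\right)^{1/2}\int_{0}^{\infty}e^{-\left(\beta+\frac{1}{2\theta}\right)x^{2}}\,dx \\
&= 1 - \alpha\left(\frac{2}{\pi\theta}\right)^{1/2}\left(\frac{1}{2}\right)\int_{-\infty}^{\infty}e^{-x^{2}\big/2\left(\frac{\theta}{2\theta\beta+1}\right)}\,dx \\
&= 1 - \frac{\alpha}{\sqrt{\theta}}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-x^{2}\big/2\left(\frac{\theta}{2\theta\beta+1}\right)}\,dx \\
&= 1 - \frac{\alpha}{\sqrt{\theta}}\sqrt{\frac{\theta}{(2\theta\beta+1)}} = 1 - \frac{\alpha}{\sqrt{2\theta\beta+1}} \\
&= 1 - \frac{\alpha}{\sqrt{\pi\beta[E(X)]^{2}+1}}.
\end{aligned}
\]
Note that the average risk increases as both β and E(X) increase, but the average risk decreases as α increases.

Solution 2.19
(a) Clearly, the distribution of $X_1$ is negative binomial, namely,
\[
p_{X_1}(x_1) = C_{(2-1)}^{(x_1-1)}\pi_h^{2}(1-\pi_h)^{(x_1-2)}, \qquad x_1 = 2, 3, \ldots, \infty.
\]
(b) $p_{X_2}(x_2) = \mathrm{pr}(X_2=x_2) = \mathrm{pr}\left\{\cup_{j=1}^{x_2-1}(A_j\cap B)\right\} + \mathrm{pr}\left\{\cup_{j=1}^{x_2-1}(C_j\cap D)\right\}$, where $A_j$ is the event that “the first $(x_2-1)$ subjects selected consist of j heavy smokers and $(x_2-1-j)$ nonsmokers,” B is the event that “the $x_2$th subject selected is a light smoker,” $C_j$ is the event that “the first $(x_2-1)$ subjects selected consist of j light smokers and $(x_2-1-j)$ nonsmokers,” and D is the event that “the $x_2$th subject selected is a heavy smoker.” So,
\[
\begin{aligned}
p_{X_2}(x_2) &= \left[\sum_{j=1}^{(x_2-1)}C_{j}^{(x_2-1)}\pi_h^{j}(\pi_0)^{(x_2-1-j)}\right]\pi_l + \left[\sum_{j=1}^{(x_2-1)}C_{j}^{(x_2-1)}\pi_l^{j}(\pi_0)^{(x_2-1-j)}\right]\pi_h \\
&= \left[(\pi_h+\pi_0)^{(x_2-1)} - \pi_0^{(x_2-1)}\right]\pi_l + \left[(\pi_l+\pi_0)^{(x_2-1)} - \pi_0^{(x_2-1)}\right]\pi_h \\
&= \pi_l(1-\pi_l)^{(x_2-1)} + \pi_h(1-\pi_h)^{(x_2-1)} - (1-\pi_0)\pi_0^{(x_2-1)}, \qquad x_2 = 2, 3, \ldots, \infty.
\end{aligned}
\]
(c) Via a direct extension of the reasoning used in part (b), we obtain the following:
\[
\begin{aligned}
p_{X_3}(x_3) &= \left[\sum_{j=1}^{(x_3-2)}C_{j}^{(x_3-1)}\pi_l^{j}\pi_h^{(x_3-1-j)}\right]\pi_0 + \left[\sum_{j=1}^{(x_3-2)}C_{j}^{(x_3-1)}\pi_0^{j}\pi_h^{(x_3-1-j)}\right]\pi_l \\
&\quad+ \left[\sum_{j=1}^{(x_3-2)}C_{j}^{(x_3-1)}\pi_0^{j}\pi_l^{(x_3-1-j)}\right]\pi_h \\
&= \pi_0\left[(\pi_l+\pi_h)^{(x_3-1)} - \pi_l^{(x_3-1)} - \pi_h^{(x_3-1)}\right] + \pi_l\left[(\pi_0+\pi_h)^{(x_3-1)} - \pi_0^{(x_3-1)} - \pi_h^{(x_3-1)}\right] \\
&\quad+ \pi_h\left[(\pi_0+\pi_l)^{(x_3-1)} - \pi_0^{(x_3-1)} - \pi_l^{(x_3-1)}\right] \\
&= \pi_0(1-\pi_0)^{(x_3-1)} + \pi_l(1-\pi_l)^{(x_3-1)} + \pi_h(1-\pi_h)^{(x_3-1)} \\
&\quad- (1-\pi_0)\pi_0^{(x_3-1)} - (1-\pi_l)\pi_l^{(x_3-1)} - (1-\pi_h)\pi_h^{(x_3-1)}, \qquad x_3 = 3, 4, \ldots, \infty.
\end{aligned}
\]

Solution 2.20
(a) Since Y = ln(X) ∼ N(μ, σ²), so that $X = e^{Y}$, it follows from the moment generating function for Y that
\[
E(e^{tY}) = E(X^{t}) = e^{\mu t+\frac{\sigma^{2}t^{2}}{2}}, \qquad -\infty<t<+\infty.
\]
So, for t = 1,
\[
E(X) = e^{(\mu+0.50\sigma^{2})}.
\]
And, for t = 2,
\[
V(X) = E(X^{2}) - [E(X)]^{2} = e^{(2\mu+2\sigma^{2})} - \left[e^{(\mu+0.50\sigma^{2})}\right]^{2} = e^{(2\mu+\sigma^{2})}(e^{\sigma^{2}}-1).
\]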
(b) Since E(X) = V(X) = 1, we have
\[
\frac{V(X)}{[E(X)]^{2}} = e^{\sigma^{2}} - 1 = 1,
\]
which gives σ = 0.8326. And, the equation
\[
[E(X)]^{2} = e^{(2\mu+\sigma^{2})} = e^{[2\mu+(0.8326)^{2}]} = 1
\]
gives μ = −0.3466. So,
\[
\mathrm{pr}(X>1) = \mathrm{pr}(Y>0) = \mathrm{pr}\left[\frac{Y-(-0.3466)}{0.8326} > \frac{0-(-0.3466)}{0.8326}\right] = \mathrm{pr}(Z>0.4163) = 0.339,
\]
since Z ∼ N(0, 1).
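The numerical values in part (b) of Solution 2.20 follow from two closed-form steps, which can be sketched as follows. The helper `std_normal_cdf` (built on `math.erf`) and the variable names are illustrative assumptions, not part of the original solution.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# V(X) / [E(X)]^2 = exp(sigma^2) - 1 = 1  =>  sigma^2 = ln 2
sigma = math.sqrt(math.log(2.0))
# [E(X)]^2 = exp(2 mu + sigma^2) = 1      =>  mu = -sigma^2 / 2
mu = -0.5 * sigma ** 2

mean_x = math.exp(mu + 0.5 * sigma ** 2)
var_x = math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)

# pr(X > 1) = pr(Y > 0) = pr(Z > (0 - mu) / sigma)
prob = 1.0 - std_normal_cdf((0.0 - mu) / sigma)
print(sigma, mu, mean_x, var_x, prob)
```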
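(c) Now,
\[
\mathrm{pr}(X\le c) = \mathrm{pr}[Y\le\ln(c)] = \mathrm{pr}\left[Z\le\frac{\ln(c)-\mu}{\sigma}\right], \quad\text{where } Z = \frac{Y-\mu}{\sigma}\sim N(0,1).
\]
Thus, to satisfy pr(X ≤ c) ≥ (1 − α) requires $\{[\ln(c)-\mu]/\sigma\}\ge z_{1-\alpha}$. And, since $E(X) = e^{(\mu+0.50\sigma^{2})}$, so that $\mu = \ln[E(X)] - 0.50\sigma^{2}$, the inequality $\{[\ln(c)-\mu]/\sigma\}\ge z_{1-\alpha}$ is equivalent to the inequality
\[
\frac{\ln(c) - [\ln E(X) - 0.50\sigma^{2}]}{\sigma}\ge z_{1-\alpha},
\]
which, in turn, can be written in the form
\[
\ln\left[\frac{c}{E(X)}\right]\ge\sigma z_{1-\alpha} - 0.50\sigma^{2} = 0.50z_{1-\alpha}^{2} - 0.50(z_{1-\alpha}-\sigma)^{2}.
\]
So, if $\ln[c/E(X)]\ge 0.50z_{1-\alpha}^{2}$, then the above inequality will be satisfied. Equivalently, we need to pick E(X) small enough so that
\[
E(X)\le c\,e^{-0.50z_{1-\alpha}^{2}}.
\]

Solution 2.21
(a) Since $\theta_0 = 1-\alpha[\pi/(1-\pi)]$ and $\theta_x = \alpha\pi^{x}$ for x ≥ 1, we require that 0 < α < [(1 − π)/π] so that $0<\theta_x<1$, x = 0, 1, 2, ..., +∞. Now,
\[
\begin{aligned}
E(e^{tX}) &= \sum_{x=0}^{\infty}e^{tx}\theta_x = \left[1-\alpha\left(\frac{\pi}{1-\pi}\right)\right] + \sum_{x=1}^{\infty}e^{tx}(\alpha\pi^{x}) \\
&= \left[1-\alpha\left(\frac{\pi}{1-\pi}\right)\right] + \alpha\sum_{x=1}^{\infty}(\pi e^{t})^{x} \\
&= \left[1-\alpha\left(\frac{\pi}{1-\pi}\right)\right] + \alpha\left(\frac{\pi e^{t}}{1-\pi e^{t}}\right),
\end{aligned}
\]
provided that $0<\pi e^{t}<1$, or that −∞ < t < −ln π. So,
\[
M_X(t) = \left[1-\alpha\left(\frac{\pi}{1-\pi}\right)\right] + \alpha\left(\frac{\pi e^{t}}{1-\pi e^{t}}\right), \qquad 0<\alpha<[(1-\pi)/\pi],\ -\infty<t<-\ln\pi.
\]
So,
\[
\begin{aligned}
E(X) = \left.\frac{dM_X(t)}{dt}\right|_{t=0} &= \alpha\pi\left\{\frac{e^{t}(1-\pi e^{t}) - e^{t}(-\pi e^{t})}{(1-\pi e^{t})^{2}}\right\}_{t=0} \\
&= \alpha\pi\left\{\frac{e^{t}}{(1-\pi e^{t})^{2}}\right\}_{t=0} = \frac{\alpha\pi}{(1-\pi)^{2}}.
\end{aligned}
\]
(b)
\[
\begin{aligned}
E(X) &= \sum_{x=0}^{\infty}x\,\theta_x = \sum_{x=1}^{\infty}x\,\alpha\pi^{x} = \alpha\pi\sum_{x=1}^{\infty}x\pi^{x-1} = \alpha\pi\sum_{x=1}^{\infty}\frac{d}{d\pi}\left(\pi^{x}\right) \\
&= \alpha\pi\frac{d}{d\pi}\left\{\sum_{x=1}^{\infty}\pi^{x}\right\} = \alpha\pi\frac{d}{d\pi}\left(\frac{\pi}{1-\pi}\right) = \frac{\alpha\pi}{(1-\pi)^{2}}.
\end{aligned}
\]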
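Solution 2.22. For the gamma distribution,
\[
E(X^{r}) = \int_{0}^{\infty}x^{r}\frac{x^{\beta-1}e^{-x/\alpha}}{\Gamma(\beta)\alpha^{\beta}}\,dx = \frac{\Gamma(\beta+r)}{\Gamma(\beta)}\alpha^{r}, \qquad (\beta+r)>0.
\]
So,
\[
\begin{aligned}
\mu_3 = E\{[X-E(X)]^{3}\} &= E(X^{3}) - 3E(X^{2})E(X) + 2[E(X)]^{3} \\
&= \beta(\beta+1)(\beta+2)\alpha^{3} - 3[\beta(\beta+1)\alpha^{2}](\alpha\beta) + 2\alpha^{3}\beta^{3} = 2\alpha^{3}\beta.
\end{aligned}
\]
Thus,
\[
\alpha_3 = \frac{2\alpha^{3}\beta}{(\alpha^{2}\beta)^{3/2}} = \frac{2}{\sqrt{\beta}}.
\]
Now, to find the mode of the gamma distribution, we need to find that value of x, say θ, which maximizes $f_X(x)$, or equivalently, which maximizes the function
\[
h(x) = \ln\left[x^{\beta-1}e^{-x/\alpha}\right] = (\beta-1)\ln(x) - \frac{x}{\alpha}.
\]
So,
\[
\frac{dh(x)}{dx} = \frac{(\beta-1)}{x} - \frac{1}{\alpha} = 0
\]
gives θ = α(β − 1), which, for β > 1, maximizes $f_X(x)$; in particular, note that $[d^{2}h(x)]/dx^{2} = (1-\beta)/x^{2}$, when evaluated at x = θ = α(β − 1), is negative for β > 1. Finally, we have
\[
\alpha_3^{*} = \frac{\alpha\beta - \alpha(\beta-1)}{\sqrt{\alpha^{2}\beta}} = \frac{1}{\sqrt{\beta}}.
\]
Thus, we have $\alpha_3 = 2\alpha_3^{*}$, so that the two measures are essentially equivalent with regard to quantifying the degree of asymmetry for the gamma distribution.
NOTE: For the beta distribution,
\[
f_X(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}, \qquad 0<x<1,\ \alpha>0,\ \beta>0,
\]
the interested reader can verify that the mode of the beta distribution is
\[
\theta = \frac{(\alpha-1)}{(\alpha+\beta-2)}, \qquad \alpha>1,\ \beta>1,
\]
and that
\[
\alpha_3 = \frac{2(\beta-\alpha)}{(\alpha+\beta+2)}\sqrt{\frac{(\alpha+\beta+1)}{\alpha\beta}} = \left[\frac{2(\alpha+\beta-2)}{(\alpha+\beta+2)}\right]\alpha_3^{*}.
\]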
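Solution 2.23*
(a) We have
\[
\begin{aligned}
E(U) &= \int_{0}^{L}g(L)f_X(x)\,dx + \int_{L}^{\infty}x f_X(x)\,dx = g(L)\int_{0}^{L}f_X(x)\,dx + \pi\int_{L}^{\infty}x\frac{f_X(x)}{\pi}\,dx \\
&= (1-\pi)g(L) + \pi\int_{L}^{\infty}x f_X(x\mid X\ge L)\,dx = (1-\pi)g(L) + \pi E(X\mid X\ge L).
\end{aligned}
\]
And, using a similar development, we have
\[
\begin{aligned}
E(U^{2}) &= \int_{0}^{L}[g(L)]^{2}f_X(x)\,dx + \int_{L}^{\infty}x^{2}f_X(x)\,dx \\
&= (1-\pi)[g(L)]^{2} + \pi\int_{L}^{\infty}x^{2}f_X(x\mid X\ge L)\,dx = (1-\pi)[g(L)]^{2} + \pi E(X^{2}\mid X\ge L).
\end{aligned}
\]
Thus,
\[
\begin{aligned}
V(U) &= E(U^{2}) - [E(U)]^{2} \\
&= (1-\pi)[g(L)]^{2} + \pi E(X^{2}\mid X\ge L) - [(1-\pi)g(L) + \pi E(X\mid X\ge L)]^{2} \\
&= (1-\pi)[g(L)]^{2} + \pi E(X^{2}\mid X\ge L) - (1-\pi)^{2}[g(L)]^{2} \\
&\quad- 2\pi(1-\pi)g(L)E(X\mid X\ge L) - \pi^{2}[E(X\mid X\ge L)]^{2} \\
&= \left[(1-\pi) - (1-\pi)^{2}\right][g(L)]^{2} + \pi E(X^{2}\mid X\ge L) - 2\pi(1-\pi)g(L)E(X\mid X\ge L) \\
&\quad- \pi^{2}[E(X\mid X\ge L)]^{2} + \pi[E(X\mid X\ge L)]^{2} - \pi[E(X\mid X\ge L)]^{2} \\
&= \pi(1-\pi)[g(L)]^{2} + \pi\left\{E(X^{2}\mid X\ge L) - [E(X\mid X\ge L)]^{2}\right\} \\
&\quad- 2\pi(1-\pi)g(L)E(X\mid X\ge L) + \pi(1-\pi)[E(X\mid X\ge L)]^{2} \\
&= \pi V(X\mid X\ge L) + \pi(1-\pi)\left[g(L) - E(X\mid X\ge L)\right]^{2} \\
&= \pi\left\{V(X\mid X\ge L) + (1-\pi)\left[g(L) - E(X\mid X\ge L)\right]^{2}\right\}.
\end{aligned}
\]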
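(b) Since
\[
\begin{aligned}
E(X) &= \int_{0}^{\infty}x f_X(x)\,dx = \int_{0}^{L}x f_X(x)\,dx + \int_{L}^{\infty}x f_X(x)\,dx \\
&= (1-\pi)\int_{0}^{L}x\frac{f_X(x)}{(1-\pi)}\,dx + \pi\int_{L}^{\infty}x\frac{f_X(x)}{\pi}\,dx \\
&= (1-\pi)E(X\mid X<L) + \pi E(X\mid X\ge L),
\end{aligned}
\]
it follows directly that choosing g(L) to be equal to E(X | X < L) will insure that E(U) = E(X). When $f_X(x) = e^{-x}$, x ≥ 0, and L = 0.05, then
\[
(1-\pi) = \int_{0}^{0.05}e^{-x}\,dx = \left[-e^{-x}\right]_{0}^{0.05} = 0.0488.
\]
Thus, using integration by parts with u = x and $dv = e^{-x}\,dx$, we find that the optimal choice for g(L) has the numerical value
\[
\begin{aligned}
E(X\mid X<L) &= \int_{0}^{L}x f_X(x\mid X<L)\,dx = \int_{0}^{L}x\frac{f_X(x)}{(1-\pi)}\,dx \\
&= \int_{0}^{0.05}x\frac{e^{-x}}{0.0488}\,dx = (0.0488)^{-1}\int_{0}^{0.05}x e^{-x}\,dx \\
&= (20.4918)\left\{\left[-xe^{-x}\right]_{0}^{0.05} + \int_{0}^{0.05}e^{-x}\,dx\right\} \\
&= (20.4918)\left[-0.05e^{-0.05} + 0.0488\right] = 0.0254.
\end{aligned}
\]
For information about a more rigorous statistical approach for dealing with this left-censoring issue, see Taylor et al. (2001).

Solution 2.24*. Now,
\[
\begin{aligned}
E(Y) &= \int_{-\infty}^{\infty}(1-\alpha e^{-\beta x^{2}})\frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^{2}/2\sigma^{2}}\,dx \\
&= 1 - \frac{\alpha}{\sqrt{2\pi}\sigma}\int_{-\infty}^{\infty}e^{-\left[\beta x^{2}+\frac{(x-\mu)^{2}}{2\sigma^{2}}\right]}\,dx.
\end{aligned}
\]
And,
\[
\begin{aligned}
\beta x^{2} + \frac{(x-\mu)^{2}}{2\sigma^{2}} &= \left(\beta+\frac{1}{2\sigma^{2}}\right)x^{2} - \frac{\mu}{\sigma^{2}}x + \frac{\mu^{2}}{2\sigma^{2}} \\
&= \left[x\sqrt{\beta+\frac{1}{2\sigma^{2}}} - \frac{\mu}{2\sigma^{2}\sqrt{\beta+\frac{1}{2\sigma^{2}}}}\right]^{2} - \frac{\mu^{2}}{4\sigma^{4}\left(\beta+\frac{1}{2\sigma^{2}}\right)} + \frac{\mu^{2}}{2\sigma^{2}} \\
&= \left(\frac{2\beta\sigma^{2}+1}{2\sigma^{2}}\right)\left[x - \frac{\mu}{(2\beta\sigma^{2}+1)}\right]^{2} + \frac{\beta\mu^{2}}{(2\beta\sigma^{2}+1)}.
\end{aligned}
\]
Finally,
\[
\begin{aligned}
E(Y) &= 1 - \frac{\alpha}{\sqrt{2\pi}\sigma}\int_{-\infty}^{\infty}\exp\left\{-\frac{\left[x-\frac{\mu}{(2\beta\sigma^{2}+1)}\right]^{2}}{2\left(\frac{\sigma^{2}}{2\beta\sigma^{2}+1}\right)}\right\}e^{-\frac{\beta\mu^{2}}{(2\beta\sigma^{2}+1)}}\,dx \\
&= 1 - \frac{\alpha}{\sigma}e^{-\frac{\beta\mu^{2}}{(2\beta\sigma^{2}+1)}}\sqrt{\frac{\sigma^{2}}{(2\beta\sigma^{2}+1)}} \\
&= 1 - \frac{\alpha}{\sqrt{2\beta\sigma^{2}+1}}\,e^{-\frac{\beta\mu^{2}}{(2\beta\sigma^{2}+1)}}.
\end{aligned}
\]

Solution 2.25*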
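(a) Now,
\[
\begin{aligned}
\psi_Y(t) &= \ln\left[E(e^{tY})\right] = \ln\left\{E\left[e^{t(X-c)}\right]\right\} = \ln\left[e^{-tc}E\left(e^{tX}\right)\right] \\
&= -tc + \ln\left[E\left(e^{tX}\right)\right] = -tc + \sum_{r=1}^{\infty}\kappa_r\frac{t^{r}}{r!} \\
&= (\kappa_1-c)t + \sum_{r=2}^{\infty}\kappa_r\frac{t^{r}}{r!}.
\end{aligned}
\]
Hence, the cumulants of Y are identical to those for X, except for the first cumulant. In particular, if Y = (X − c), then the first cumulant of Y is $(\kappa_1-c)$, where $\kappa_1$ is the first cumulant of X.
(b) (i) If X ∼ N(μ, σ²), then the moment generating function of X is $M_X(t) = e^{\mu t+\sigma^{2}t^{2}/2}$. So,
\[
\psi_X(t) = \mu t + \frac{\sigma^{2}t^{2}}{2}.
\]
Hence, $\kappa_1 = \mu$, $\kappa_2 = \sigma^{2}$, and $\kappa_r = 0$ for r = 3, 4, ..., ∞.
(ii) If X ∼ POI(λ), then $M_X(t) = e^{\lambda(e^{t}-1)}$. So,
\[
\psi_X(t) = \lambda(e^{t}-1) = \lambda\sum_{r=1}^{\infty}\frac{t^{r}}{r!} = \sum_{r=1}^{\infty}(\lambda)\frac{t^{r}}{r!}.
\]
Thus, $\kappa_r = \lambda$ for r = 1, 2, ..., ∞.
(iii) If X ∼ GAMMA(α, β), then $M_X(t) = (1-\alpha t)^{-\beta}$. So, $\psi_X(t) = -\beta\ln(1-\alpha t)$. Now,
\[
\ln(1+y) = \sum_{r=1}^{\infty}(-1)^{r+1}\frac{y^{r}}{r}, \qquad -1<y<+1.
\]
If y = −αt, and t is chosen so that |αt| < 1, then
\[
\ln(1-\alpha t) = \sum_{r=1}^{\infty}(-1)^{r+1}\frac{(-\alpha t)^{r}}{r} = \sum_{r=1}^{\infty}(-1)^{2r+1}\alpha^{r}(r-1)!\frac{t^{r}}{r!} = -\sum_{r=1}^{\infty}\left[(r-1)!\,\alpha^{r}\right]\frac{t^{r}}{r!}.
\]
So,
\[
\psi_X(t) = -\beta\ln(1-\alpha t) = \sum_{r=1}^{\infty}\left[(r-1)!\,\alpha^{r}\beta\right]\frac{t^{r}}{r!}, \qquad |\alpha t|<1;
\]
thus, $\kappa_r = (r-1)!\,\alpha^{r}\beta$ for r = 1, 2, ..., ∞.
(c) First, for r = 1, 2, ..., ∞, since $\psi_X(t) = \sum_{r=1}^{\infty}\kappa_r\frac{t^{r}}{r!}$, it follows directly that
\[
\kappa_r = \left.\frac{d^{r}\psi_X(t)}{dt^{r}}\right|_{t=0}.
\]
So, since $\psi_X(t) = \ln[M_X(t)]$ and since $\left[d^{r}M_X(t)/dt^{r}\right]_{t=0} = E(X^{r})$ for r = 1, 2, ..., ∞, we have
\[
\kappa_1 = \left.\frac{d\psi_X(t)}{dt}\right|_{t=0} = \left\{[M_X(t)]^{-1}\frac{dM_X(t)}{dt}\right\}_{t=0} = (1)^{-1}E(X) = E(X).
\]
Next, since
\[
\frac{d^{2}\psi_X(t)}{dt^{2}} = -[M_X(t)]^{-2}\left[\frac{dM_X(t)}{dt}\right]^{2} + [M_X(t)]^{-1}\frac{d^{2}M_X(t)}{dt^{2}},
\]
it follows that
\[
\kappa_2 = \left.\frac{d^{2}\psi_X(t)}{dt^{2}}\right|_{t=0} = -(1)^{-2}[E(X)]^{2} + (1)^{-1}E(X^{2}) = E(X^{2}) - [E(X)]^{2} = V(X).
\]
Finally, since
\[
\begin{aligned}
\frac{d^{3}\psi_X(t)}{dt^{3}} &= 2[M_X(t)]^{-3}\left[\frac{dM_X(t)}{dt}\right]^{3} - 2[M_X(t)]^{-2}\left[\frac{dM_X(t)}{dt}\right]\left[\frac{d^{2}M_X(t)}{dt^{2}}\right] \\
&\quad- [M_X(t)]^{-2}\left[\frac{dM_X(t)}{dt}\right]\left[\frac{d^{2}M_X(t)}{dt^{2}}\right] + [M_X(t)]^{-1}\frac{d^{3}M_X(t)}{dt^{3}},
\end{aligned}
\]
we have
\[
\begin{aligned}
\kappa_3 = \left.\frac{d^{3}\psi_X(t)}{dt^{3}}\right|_{t=0} &= 2(1)^{-3}[E(X)]^{3} - 2(1)^{-2}[E(X)][E(X^{2})] - (1)^{-2}[E(X)][E(X^{2})] + (1)^{-1}E(X^{3}) \\
&= E(X^{3}) - 3E(X)E(X^{2}) + 2[E(X)]^{3} = E\{[X-E(X)]^{3}\}.
\end{aligned}
\]

Solution 2.26*
(a)
\[
\begin{aligned}
h(t) &= \lim_{\Delta t\to 0}\frac{\mathrm{pr}(t\le T\le t+\Delta t\mid T\ge t)}{\Delta t} = \frac{\lim_{\Delta t\to 0}\mathrm{pr}(t\le T\le t+\Delta t)/\Delta t}{\mathrm{pr}(T\ge t)} \\
&= \frac{dF(t)/dt}{1-F(t)} = \frac{f(t)}{S(t)}.
\end{aligned}
\]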
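(b) From part (a), $H(t) = \int_{0}^{t}h(u)\,du = \int_{0}^{t}\frac{f(u)}{S(u)}\,du$. Since $dS(u) = \frac{d[1-F(u)]}{du}\,du = -f(u)\,du$, we have
\[
H(t) = -\int_{0}^{t}\frac{1}{S(u)}\,dS(u) = -\ln[S(t)] + \ln[S(0)] = -\ln[S(t)] + \ln(1) = -\ln[S(t)],
\]
or $S(t) = e^{-H(t)}$.
(c) Now,
\[
E(T) = \int_{0}^{\infty}t f(t)\,dt = \int_{0}^{\infty}\left[\int_{0}^{t}du\right]f(t)\,dt = \int_{0}^{\infty}\left[\int_{0}^{\infty}I(t>u)\,du\right]f(t)\,dt,
\]
where I(A) is an indicator function taking the value 1 if event A holds and taking the value 0 otherwise. Hence,
\[
\begin{aligned}
E(T) &= \int_{0}^{\infty}\left[\int_{0}^{\infty}I(t>u)\,du\right]f(t)\,dt = \int_{0}^{\infty}\left[\int_{0}^{\infty}I(t>u)f(t)\,dt\right]du \\
&= \int_{0}^{\infty}\left[\int_{u}^{\infty}f(t)\,dt\right]du = \int_{0}^{\infty}S(u)\,du.
\end{aligned}
\]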
(d) Note that
\[
X = \begin{cases} T & \text{if } T<c; \\ c & \text{if } T\ge c. \end{cases}
\]
So, $f_X(x) = f_T(x)$ if T < c (so that 0 < x < c) and X = c if T ≥ c, an event which occurs with probability $[1-F_T(c)]$. Thus,
\[
f_X(x) = f_T(x)I(x<c) + [1-F_T(c)]I(x=c), \qquad 0<x\le c,
\]
where, as in part (c), I(·) denotes the indicator function. In other words, $f_X(x)$ is a mixture of a continuous density [namely, $f_T(x)$] for x < c and a point mass at c occurring with probability $[1-F_T(c)]$. So,
\[
\begin{aligned}
E[H(X)] &= E[H(X)I(X<c) + H(c)I(X=c)] = E[H(X)I(X<c)] + H(c)E[I(X=c)] \\
&= \int_{0}^{c}H(x)f_T(x)\,dx + H(c)\,\mathrm{pr}(X=c) = \int_{0}^{c}H(x)f_T(x)\,dx + H(c)[1-F_T(c)].
\end{aligned}
\]
Using integration by parts with u = H(x) and $dv = f_T(x)\,dx$, we have
\[
\begin{aligned}
E[H(X)] &= \left[H(x)F_T(x)\right]_{0}^{c} - \int_{0}^{c}h(x)F_T(x)\,dx + H(c) - H(c)F_T(c) \\
&= H(c)F_T(c) - 0 - \int_{0}^{c}h(x)F_T(x)\,dx + H(c) - H(c)F_T(c) \\
&= -\int_{0}^{c}h(x)F_T(x)\,dx + H(c) = -\int_{0}^{c}h(x)[1-S(x)]\,dx + H(c) \\
&= -\int_{0}^{c}h(x)\left[1-\frac{f_T(x)}{h(x)}\right]dx + H(c) = -\int_{0}^{c}h(x)\,dx + \int_{0}^{c}f_T(x)\,dx + H(c) \\
&= -[H(c)-H(0)] + [F_T(c)-0] + H(c) = H(0) + F_T(c) \\
&= -\ln[S(0)] + F_T(c) = -\ln(1) + F_T(c) = F_T(c).
\end{aligned}
\]

Solution 2.27*
(a) If N units are produced, then it follows that P = NG if X ≥ N, and P = [XG − (N − X)L] = [(G + L)X − NL] if X < N. Hence,
\[
\begin{aligned}
E(P) &= \int_{0}^{N}[(G+L)x - NL]f_X(x)\,dx + \int_{N}^{\infty}(NG)f_X(x)\,dx \\
&= (G+L)\int_{0}^{N}x f_X(x)\,dx - NLF_X(N) + NG[1-F_X(N)] \\
&= (G+L)\int_{0}^{N}x f_X(x)\,dx + NG - N(G+L)F_X(N).
\end{aligned}
\]
Now, via integration by parts,
\[
\int_{0}^{N}x f_X(x)\,dx = \left[xF_X(x)\right]_{0}^{N} - \int_{0}^{N}F_X(x)\,dx = NF_X(N) - \int_{0}^{N}F_X(x)\,dx,
\]
so that we finally obtain
\[
E(P) = NG - (G+L)\int_{0}^{N}F_X(x)\,dx.
\]
So,
\[
\frac{dE(P)}{dN} = G - (G+L)F_X(N) = 0,
\]
which gives
\[
F_X(N) = \frac{G}{G+L};
\]
since
\[
\frac{d^{2}E(P)}{dN^{2}} = -(G+L)f_X(N) < 0,
\]
this choice for N maximizes E(P).
(b) Since $f_X(x) = 2kxe^{-kx^{2}}$, with $k = 10^{-10}$, $F_X(x) = 1-e^{-kx^{2}}$. So, solving the equation
\[
F_X(N) = 1 - e^{-kN^{2}} = \frac{G}{(G+L)}
\]
gives
\[
N = \left[\frac{\ln\left(\frac{L}{G+L}\right)}{-k}\right]^{1/2}.
\]
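The closed form for N in part (b) of Solution 2.27 can be evaluated, and the first-order condition checked, with a few lines of code. The following is a minimal sketch under stated assumptions: the function names are illustrative, and the expected profit E(P) = NG − (G + L)∫₀ᴺ F_X(x) dx is evaluated for this particular F_X via the error function, using ∫₀ᴺ e^{−kx²} dx = (1/2)√(π/k) erf(N√k).

```python
import math


def optimal_production(G, L, k):
    """N solving 1 - exp(-k N^2) = G / (G + L); equals sqrt(ln(L/(G+L)) / (-k))."""
    return math.sqrt(math.log((G + L) / L) / k)


def expected_profit(N, G, L, k):
    """E(P) = N G - (G + L) * int_0^N [1 - exp(-k x^2)] dx, via the error function."""
    integral_of_cdf = N - 0.5 * math.sqrt(math.pi / k) * math.erf(N * math.sqrt(k))
    return N * G - (G + L) * integral_of_cdf


G, L, k = 4.0, 1.0, 1e-10
N_opt = optimal_production(G, L, k)
cdf_at_opt = 1.0 - math.exp(-k * N_opt ** 2)
print(N_opt, cdf_at_opt, expected_profit(N_opt, G, L, k))
```

Evaluating the expected profit slightly below and slightly above the computed N gives a quick numerical confirmation that the stationary point is a maximum.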
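So, using the values G = 4, L = 1, and $k = 10^{-10}$, we obtain N = 126,860 units.

Solution 2.28*
(a) For k = 2, $\mathrm{pr}(P_2 = \alpha P_1) = \pi$ and $\mathrm{pr}(P_2 = \beta P_1) = (1-\pi)$, so that $E(P_2) = P_1[\alpha\pi + \beta(1-\pi)]$. For k = 3, $\mathrm{pr}(P_3 = \alpha^{2}P_1) = \pi^{2}$, $\mathrm{pr}(P_3 = \alpha\beta P_1) = 2\pi(1-\pi)$, and $\mathrm{pr}(P_3 = \beta^{2}P_1) = (1-\pi)^{2}$, so that
\[
E(P_3) = P_1\left[\alpha^{2}\pi^{2} + 2\alpha\beta\pi(1-\pi) + \beta^{2}(1-\pi)^{2}\right] = P_1[\alpha\pi + \beta(1-\pi)]^{2}.
\]
In general,
\[
\mathrm{pr}\left[P_k = \alpha^{j}\beta^{(k-1)-j}P_1\right] = C_{j}^{k-1}\pi^{j}(1-\pi)^{(k-1)-j}, \qquad j = 0, 1, \ldots, (k-1),
\]
so that
\[
\begin{aligned}
E(P_k) &= \sum_{j=0}^{k-1}\left[\alpha^{j}\beta^{(k-1)-j}P_1\right]C_{j}^{k-1}\pi^{j}(1-\pi)^{(k-1)-j} \\
&= P_1\sum_{j=0}^{k-1}C_{j}^{k-1}(\alpha\pi)^{j}[\beta(1-\pi)]^{(k-1)-j} \\
&= P_1[\alpha\pi + \beta(1-\pi)]^{k-1}, \qquad k = 2, 3, \ldots, \infty.
\end{aligned}
\]
(b) For k = 2, 3, ..., ∞, we consider the inequality
\[
E(P_k) = P_1[\alpha\pi + \beta(1-\pi)]^{k-1}\ge P^{*},
\]
or equivalently
\[
\alpha\ge\frac{1}{\pi}\left\{\left(\frac{P^{*}}{P_1}\right)^{[1/(k-1)]} - \beta(1-\pi)\right\},
\]
which gives
\[
\alpha^{*} = \frac{1}{\pi}\left\{\left(\frac{P^{*}}{P_1}\right)^{[1/(k-1)]} - \beta(1-\pi)\right\}.
\]
Now,
\[
\lim_{k\to\infty}\alpha^{*} = \frac{1}{\pi}[1-\beta(1-\pi)] = \beta - \frac{(\beta-1)}{\pi}.
\]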
Since 1 < β < +∞, this limiting value of α* varies directly with π (i.e., the larger is π, the larger is this limiting value of α*). In particular, when π = 1, so that every policy holder has a perfect driving record every year, then this insurance company should never reduce the yearly premium from its first-year value of $P_1$. If β = 1.05 and π = 0.90, then this limiting value equals 0.9944. So, for these particular values of β and π, this insurance company should never allow the yearly premium to be below $0.9944P_1$ in value.

Solution 2.29*
(a) For R = 2, we have
\[
\sum_{x=0}^{2}\frac{1}{x!}\sum_{l=0}^{2-x}\frac{(-1)^{l}}{l!} = \frac{1}{0!}\left(1-1+\frac{1}{2!}\right) + \frac{1}{1!}(1-1) + \frac{1}{2!}(1) = 1.
\]
Then, assuming that the result holds for the value R, we obtain
\[
\begin{aligned}
\sum_{x=0}^{R+1}\frac{1}{x!}\sum_{l=0}^{(R+1)-x}\frac{(-1)^{l}}{l!} &= \sum_{x=0}^{R}\frac{1}{x!}\sum_{l=0}^{(R-x)+1}\frac{(-1)^{l}}{l!} + \frac{1}{(R+1)!} \\
&= \sum_{x=0}^{R}\frac{1}{x!}\sum_{l=0}^{R-x}\frac{(-1)^{l}}{l!} + \sum_{x=0}^{R}\frac{1}{x!}\cdot\frac{(-1)^{(R+1)-x}}{[(R+1)-x]!} + \frac{1}{(R+1)!} \\
&= \sum_{x=0}^{R}p_X(x) + \frac{1}{(R+1)!}\sum_{x=0}^{R}C_{x}^{R+1}(-1)^{(R+1)-x} + \frac{1}{(R+1)!} \\
&= \sum_{x=0}^{R}p_X(x) + \frac{1}{(R+1)!}\sum_{x=0}^{R+1}C_{x}^{R+1}(1)^{x}(-1)^{(R+1)-x} - \frac{1}{(R+1)!} + \frac{1}{(R+1)!} \\
&= \sum_{x=0}^{R}p_X(x) + \frac{[1+(-1)]^{R+1}}{(R+1)!} = \sum_{x=0}^{R}p_X(x) = 1,
\end{aligned}
\]
which completes the proof by induction.
(b) We have
\[
\begin{aligned}
E(X) &= \sum_{x=0}^{R}x\,\frac{1}{x!}\sum_{l=0}^{R-x}\frac{(-1)^{l}}{l!} = \sum_{x=1}^{R}\frac{1}{(x-1)!}\sum_{l=0}^{R-x}\frac{(-1)^{l}}{l!} \\
&= \sum_{y=0}^{R-1}\frac{1}{y!}\sum_{l=0}^{(R-1)-y}\frac{(-1)^{l}}{l!} = 1.
\end{aligned}
\]
And,
\[
\begin{aligned}
E[X(X-1)] &= \sum_{x=0}^{R}x(x-1)\frac{1}{x!}\sum_{l=0}^{R-x}\frac{(-1)^{l}}{l!} = \sum_{x=2}^{R}\frac{1}{(x-2)!}\sum_{l=0}^{R-x}\frac{(-1)^{l}}{l!} \\
&= \sum_{y=0}^{R-2}\frac{1}{y!}\sum_{l=0}^{(R-2)-y}\frac{(-1)^{l}}{l!} = 1,
\end{aligned}
\]
so that $V(X) = E[X(X-1)] + E(X) - [E(X)]^{2} = 1 + 1 - (1)^{2} = 1$. It seems counterintuitive that neither E(X) nor V(X) depends on the value of R. Also,
\[
\lim_{R\to\infty}p_X(x) = \lim_{R\to\infty}\left[\frac{1}{x!}\sum_{l=0}^{R-x}\frac{(-1)^{l}}{l!}\right] = \frac{1}{x!}\sum_{l=0}^{\infty}\frac{(-1)^{l}}{l!} = \frac{e^{-1}}{x!} = \frac{(1)^{x}e^{-1}}{x!}, \qquad x = 0, 1, \ldots, \infty.
\]
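The properties established in Solution 2.29 (total probability one, E(X) = 1, V(X) = 1, and the limiting form $e^{-1}/x!$) can be verified exactly for small R. The following is a minimal sketch, with `match_pmf` and the chosen range of R being illustrative assumptions.

```python
import math
from fractions import Fraction


def match_pmf(R):
    """p_X(x) = (1/x!) * sum_{l=0}^{R-x} (-1)^l / l!,  x = 0, 1, ..., R."""
    return [
        Fraction(1, math.factorial(x))
        * sum(Fraction((-1) ** l, math.factorial(l)) for l in range(R - x + 1))
        for x in range(R + 1)
    ]


def summary(R):
    """Return (total probability, mean, variance) as exact fractions."""
    pmf = match_pmf(R)
    total = sum(pmf)
    mean = sum(x * p for x, p in enumerate(pmf))
    second = sum(x * x * p for x, p in enumerate(pmf))
    return total, mean, second - mean ** 2


for R in range(2, 9):
    print(R, summary(R))

# Compare with the limiting probabilities e^{-1} / x! for a moderately large R.
large_R = match_pmf(12)
for x in range(4):
    print(x, float(large_R[x]), math.exp(-1) / math.factorial(x))
```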
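So, as R → ∞, the distribution of X becomes Poisson with E(X) = V(X) = 1.

Solution 2.30*
(a) First,
\[
1 - F_{W_n}(w_n) = \mathrm{pr}(W_n>w_n) = \mathrm{pr}[X_{w_n}\le(n-1)\text{ in the time interval }(0,w_n)] = \sum_{x=0}^{n-1}\frac{(Nw_n\lambda)^{x}e^{-(Nw_n\lambda)}}{x!},
\]
so that
\[
F_{W_n}(w_n) = 1 - \sum_{x=0}^{n-1}\frac{(Nw_n\lambda)^{x}e^{-(Nw_n\lambda)}}{x!}.
\]
So,
\[
\begin{aligned}
f_{W_n}(w_n) = \frac{dF_{W_n}(w_n)}{dw_n} &= -e^{-Nw_n\lambda}\sum_{x=0}^{n-1}\frac{1}{x!}\left[xN\lambda(Nw_n\lambda)^{x-1} - N\lambda(Nw_n\lambda)^{x}\right] \\
&= -N\lambda e^{-Nw_n\lambda}\left[\sum_{x=1}^{n-1}\frac{(Nw_n\lambda)^{x-1}}{(x-1)!} - \sum_{x=0}^{n-1}\frac{(Nw_n\lambda)^{x}}{x!}\right] \\
&= -N\lambda e^{-Nw_n\lambda}\left[-\frac{(Nw_n\lambda)^{n-1}}{(n-1)!}\right] \\
&= \frac{w_n^{n-1}e^{-N\lambda w_n}}{\Gamma(n)(N\lambda)^{-n}}, \qquad w_n>0.
\end{aligned}
\]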
So, Wn ∼ GAMMA α = (Nλ)−1 , β = n . (b) Note that E(XT ) = V(XT ) = NTλ. So, X
t E(etZ ) = E e
T −NTλ √ NTλ
t
√ √ XT −t NTλ NTλ E e =e √ √ NTλ et/ NTλ −1 NTλ −t e . =e
100
Univariate Distribution Theory
Now, ⎡ ⎤ √ j ∞ t/ NTλ √ ⎥ ⎢ −t NTλ + NTλ ⎣ − 1⎦ j! j=0
j √ ∞ t/ NTλ √ √ 2 (NTλ), = −t NTλ + t NTλ + t /2 + j! j=3
which converges to t2 /2 as N → ∞. Thus, 2 lim E(etZ ) = et /2 ,
N→∞
so that, for large N, Z=
XT − NTλ ∼ ˙ N(0, 1). √ NTλ
Then, if N = 105 , λ = 10−4 , and T = 10, so that NTλ = 100, then pr(XT ≤ 90NTλ = 100) = pr
XT − 100 90 − 100 ≤ √ √ 100 100
= ˙ pr(Z ≤ −1.00) = ˙ 0.16, √ ˙ 1) for large N. since Z = (XT − 100)/ 100 ∼N(0, Solution 2.31∗ (a) With y = (x − c), we have ∞ ∞ β−1 −x/α x e (y + c)β−1 e−(y+c)/α dx = dy β Γ(β)α Γ(β)αβ c 0 e−c/α ∞ (y + c)β−1 e−y/α dy = Γ(β)αβ 0 ⎡ ⎤ β−1 e−c/α ∞ ⎣ β−1 j β−1−j ⎦ −y/α = e Cj c y dy Γ(β)αβ 0 j=0
=
β−1 e−c/α β−1 j ∞ (β−j)−1 −y/α C c y e dy j Γ(β)αβ 0 j=0
101
Solutions
=
β−1
e−c/α (β − 1)! j Γ(β − j)αβ−j c (β − j − 1)!j! (β − 1)!αβ j=0
=
β−1 e−c/α (c/α)j j=0
j!
,
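which is $\mathrm{pr}[X \le (\beta - 1)]$ when $X \sim \mathrm{POI}(c/\alpha)$.

(b) With $x = c(1 - y)$, we have
\begin{align*}
\int_0^{c} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}\, dx
&= \int_0^{1} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, [c(1-y)]^{\alpha-1} [1 - c(1-y)]^{\beta-1} (c)\, dy \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, c^{\alpha} \int_0^{1} (1-y)^{\alpha-1} \left[ \sum_{j=0}^{\beta-1} C_j^{\beta-1} (cy)^{j} (1-c)^{\beta-1-j} \right] dy \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, c^{\alpha} \sum_{j=0}^{\beta-1} C_j^{\beta-1} c^{j} (1-c)^{\beta-1-j} \int_0^{1} y^{j}(1-y)^{\alpha-1}\, dy.
\end{align*}
Thus, since
\[
\int_0^{1} y^{j}(1-y)^{\alpha-1}\, dy = \frac{\Gamma(j+1)\Gamma(\alpha)}{\Gamma(\alpha+j+1)},
\]
we have
\begin{align*}
\sum_{j=0}^{\beta-1} \frac{(\alpha+\beta-1)!}{(\alpha-1)!(\beta-1)!} \cdot \frac{(\beta-1)!}{j!(\beta-1-j)!}\, c^{\alpha+j}(1-c)^{\beta-1-j}\, \frac{\Gamma(j+1)\Gamma(\alpha)}{\Gamma(\alpha+j+1)}
&= \sum_{j=0}^{\beta-1} \frac{(\alpha+\beta-1)!}{(\alpha+j)!(\beta-1-j)!}\, c^{\alpha+j}(1-c)^{\beta-1-j} \\
&= \sum_{i=\alpha}^{\alpha+\beta-1} C_i^{\alpha+\beta-1} c^{i}(1-c)^{\alpha+\beta-1-i},
\end{align*}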
which is $\mathrm{pr}(X \ge \alpha)$ when $X \sim \mathrm{BIN}(\alpha + \beta - 1, c)$.

Solution 2.32∗

(a) Let A be the event that “any egg produces a live and healthy baby sea turtle,” let B be the event that “a live and healthy baby sea turtle grows to adulthood,” and let C be the event that “any egg produces an adult sea turtle.” Then,
\[
\mathrm{pr}(C) = \mathrm{pr}(A \cap B) = \mathrm{pr}(A)\,\mathrm{pr}(B \mid A) = (0.30)(1 - 0.98) = (0.30)(0.02) = 0.006.
\]

(b) Let $T_0$ be the event that “any randomly chosen sea turtle nest produces no adult sea turtles” and let $E_n$ be the event that “any randomly chosen sea turtle nest contains exactly n eggs.” Then,
\begin{align*}
\alpha = \mathrm{pr}(\bar{T}_0) &= \mathrm{pr}\left[ \cup_{n=1}^{\infty} (\bar{T}_0 \cap E_n) \right] = \sum_{n=1}^{\infty} \mathrm{pr}(\bar{T}_0 \mid E_n)\,\mathrm{pr}(E_n) \\
&= 1 - \mathrm{pr}(T_0) = 1 - \sum_{n=1}^{\infty} \mathrm{pr}(T_0 \mid E_n)\,\mathrm{pr}(E_n) \\
&= 1 - \sum_{n=1}^{\infty} [(0.994)^{n}](1-\pi)\pi^{n-1} \\
&= 1 - 0.994(1-\pi) \sum_{n=1}^{\infty} (0.994\pi)^{n-1} \\
&= 1 - 0.994(1-\pi)\left[ \frac{1}{1 - 0.994\pi} \right] \\
&= 1 - \frac{0.994(1-\pi)}{1 - 0.994\pi} = \frac{0.006}{1 - 0.994\pi}.
\end{align*}
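When $\pi = 0.20$, then $\alpha = 0.0075$.

(c) Let $T_k$ be the event that “a randomly chosen sea turtle nest produces exactly k adult sea turtles.” Then, based on the stated assumptions, it follows that
\[
\mathrm{pr}(T_k \mid E_n) = C_k^{n} (0.006)^{k} (0.994)^{n-k}, \quad k = 0, 1, \ldots, n.
\]
Then,
\[
\mathrm{pr}(E_n \mid T_k) = \frac{\mathrm{pr}(E_n \cap T_k)}{\mathrm{pr}(T_k)} = \frac{\mathrm{pr}(T_k \mid E_n)\,\mathrm{pr}(E_n)}{\mathrm{pr}(T_k)} = \frac{[C_k^{n}(0.006)^{k}(0.994)^{n-k}][(1-\pi)\pi^{n-1}]}{\mathrm{pr}(T_k)}.
\]
Now, for $n \ge k \ge 1$, we have
\begin{align*}
\mathrm{pr}(T_k) &= \sum_{n=k}^{\infty} \mathrm{pr}(T_k \cap E_n) = \sum_{n=k}^{\infty} \mathrm{pr}(T_k \mid E_n)\,\mathrm{pr}(E_n) \\
&= \sum_{n=k}^{\infty} C_k^{n}(0.006)^{k}(0.994)^{n-k}(1-\pi)\pi^{n-1} \\
&= \left( \frac{0.006}{0.994} \right)^{k} \left( \frac{1-\pi}{\pi} \right) \sum_{n=k}^{\infty} C_k^{n}(0.994\pi)^{n} \\
&= \left( \frac{0.006}{0.994} \right)^{k} \left( \frac{1-\pi}{\pi} \right) \sum_{m=0}^{\infty} C_k^{m+k}(0.994\pi)^{m+k} \\
&= (0.006)^{k}(1-\pi)\pi^{k-1} \sum_{m=0}^{\infty} C_k^{m+k}(0.994\pi)^{m} \\
&= (0.006)^{k}(1-\pi)\pi^{k-1}(1 - 0.994\pi)^{-(k+1)}.
\end{align*}
So,
\begin{align*}
\beta_{nk} = \mathrm{pr}(E_n \mid T_k) &= \frac{[C_k^{n}(0.006)^{k}(0.994)^{n-k}][(1-\pi)\pi^{n-1}]}{(0.006)^{k}(1-\pi)\pi^{k-1}(1 - 0.994\pi)^{-(k+1)}} \\
&= C_k^{n}(1 - 0.994\pi)^{k+1}(0.994\pi)^{n-k}, \quad 1 \le k \le n < \infty.
\end{align*}
When $k = 0$,
\[
\beta_{n0} = \frac{(0.994)^{n}[(1-\pi)\pi^{n-1}]}{\left[ \dfrac{0.994(1-\pi)}{1 - 0.994\pi} \right]} = (0.994\pi)^{n-1}(1 - 0.994\pi), \quad n = 1, 2, \ldots, \infty.
\]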
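For any fixed $k \ge 0$, note that, as required, $\sum_{n=k}^{\infty} \mathrm{pr}(E_n \mid T_k) = 1$. Finally, when $\pi = 0.20$, $k = 2$, and $n = 6$, $\beta_{62} = \mathrm{pr}(E_6 \mid T_2) = C_2^{6}(0.8012)^{3}(0.1988)^{4} \approx 0.0120$.

Solution 2.33∗

(a) If $k > n$, the result is obvious since $0 = (0 + 0)$; so, we only need to consider the case when $k \le n$. Now,
\begin{align*}
C_{k-1}^{n-1} + C_{k}^{n-1} &= \frac{(n-1)!}{(k-1)!(n-k)!} + \frac{(n-1)!}{k!(n-k-1)!} \\
&= (n-1)!\left[ \frac{k}{k!(n-k)!} + \frac{(n-k)}{k!(n-k)!} \right] \\
&= (n-1)!\left[ \frac{n}{k!(n-k)!} \right] = \frac{n!}{k!(n-k)!} = C_k^{n},
\end{align*}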
which completes the proof.

(b) The left-hand side of Vandermonde’s Identity is the number of ways of choosing r objects from a total of (m + n) objects. For k = 0, 1, . . . , r, this can be accomplished by choosing k objects from the set of n objects (which can be done in $C_k^{n}$ ways) and by choosing (r − k) objects from the set of m objects (which can be done in $C_{r-k}^{m}$ ways), giving the product $C_k^{n} C_{r-k}^{m}$ as the total number of ways of choosing r objects from a total of (m + n) objects given that exactly k objects must be chosen from the set of n objects. Vandermonde’s Identity follows directly by summing this product over all the values of k.

(c) Without loss of generality, assume that $s \le t$. Then, we wish to show that
\[
\sum_{y=1}^{s} (\pi_{2y} + \pi_{2y+1}) = \frac{\sum_{y=1}^{s} \left[ 2C_{y-1}^{s-1}C_{y-1}^{t-1} + C_{y}^{s-1}C_{y-1}^{t-1} + C_{y-1}^{s-1}C_{y}^{t-1} \right]}{C_s^{s+t}} = 1,
\]
or, equivalently, that the numerator N in the above ratio expression is equal to $C_s^{s+t}$. Now, using Pascal’s Identity, we have
\begin{align*}
N &= \sum_{y=1}^{s} \left[ 2C_{y-1}^{s-1}C_{y-1}^{t-1} + C_{y}^{s-1}C_{y-1}^{t-1} + C_{y-1}^{s-1}C_{y}^{t-1} \right] \\
&= \sum_{y=1}^{s} \left[ C_{y-1}^{s-1}\left( C_{y-1}^{t-1} + C_{y}^{t-1} \right) + C_{y-1}^{t-1}\left( C_{y-1}^{s-1} + C_{y}^{s-1} \right) \right] \\
&= \sum_{y=1}^{s} \left[ C_{y-1}^{s-1}C_{y}^{t} + C_{y-1}^{t-1}C_{y}^{s} \right] \\
&= \sum_{y=1}^{s} \left[ \frac{(s-1)!\,t!}{(y-1)!(s-y)!\,y!(t-y)!} + \frac{(t-1)!\,s!}{(y-1)!(t-y)!\,y!(s-y)!} \right] \\
&= (s+t) \sum_{y=1}^{s} \frac{(s-1)!(t-1)!}{(y-1)!(s-y)!\,y!(t-y)!} \\
&= \frac{(s+t)}{s} \sum_{y=1}^{s} C_{y}^{s}C_{y-1}^{t-1} \\
&= \frac{(s+t)}{s} \sum_{k=0}^{s-1} C_{k+1}^{s}C_{k}^{t-1} \\
&= \frac{(s+t)}{s} \sum_{k=0}^{s-1} C_{(s-1)-k}^{s}C_{k}^{t-1}.
\end{align*}
Then, in the above summation, if we let r = (s − 1), m = s, and n = (t − 1), in which case (s − 1) ≤ min{s, (t − 1)} since s ≤ t, then Vandermonde’s Identity gives
\[
\sum_{k=0}^{s-1} C_{(s-1)-k}^{s}C_{k}^{t-1} = \sum_{k=0}^{r} C_{r-k}^{m}C_{k}^{n} = C_{r}^{m+n} = C_{s-1}^{s+t-1}.
\]
Finally,
\[
N = \frac{(s+t)}{s}\, C_{s-1}^{s+t-1} = \frac{(s+t)}{s} \cdot \frac{(s+t-1)!}{(s-1)!\,t!} = \frac{(s+t)!}{s!\,t!} = C_{s}^{s+t}.
\]
This completes the proof since it then follows that
\[
0 \le \pi_{2y} \le 1 \quad \text{and} \quad 0 \le \pi_{2y+1} \le 1, \quad y = 1, 2, \ldots, \min\{s, t\}.
\]
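
The three identities used in Solution 2.33 can be spot-checked numerically for small arguments. The following is only an illustrative sketch, not part of the original solution: the helper names `pascal_holds`, `vandermonde_holds`, and `numerator_N` are hypothetical, and Python's `math.comb(n, k)` stands in for $C_k^{n}$ (it returns 0 when k > n, matching the convention used in part (a)).

```python
from math import comb

def pascal_holds(n, k):
    """Pascal's Identity: C(n, k) = C(n-1, k-1) + C(n-1, k), for n >= 1, k >= 1."""
    return comb(n, k) == comb(n - 1, k - 1) + comb(n - 1, k)

def vandermonde_holds(m, n, r):
    """Vandermonde's Identity: C(m+n, r) = sum_k C(m, r-k) * C(n, k)."""
    return comb(m + n, r) == sum(comb(m, r - k) * comb(n, k) for k in range(r + 1))

def numerator_N(s, t):
    """Numerator N from part (c), summed over y = 1, ..., min(s, t)."""
    return sum(
        2 * comb(s - 1, y - 1) * comb(t - 1, y - 1)
        + comb(s - 1, y) * comb(t - 1, y - 1)
        + comb(s - 1, y - 1) * comb(t - 1, y)
        for y in range(1, min(s, t) + 1)
    )

# Spot checks over small, arbitrarily chosen ranges.
pascal_ok = all(pascal_holds(n, k) for n in range(1, 12) for k in range(1, 15))
vandermonde_ok = all(
    vandermonde_holds(m, n, r)
    for m in range(6) for n in range(6) for r in range(m + n + 1)
)
part_c_ok = all(
    numerator_N(s, t) == comb(s + t, s) for s in range(1, 8) for t in range(1, 8)
)
```

A finite check of this kind does not replace the proofs above; it only guards against transcription slips in the indices.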
3 Multivariate Distribution Theory

3.1 Concepts and Notation

3.1.1 Discrete and Continuous Multivariate Distributions

A discrete multivariate probability distribution for k discrete random variables $X_1, X_2, \ldots, X_k$ is denoted
\[
p_{X_1,X_2,\ldots,X_k}(x_1, x_2, \ldots, x_k) = \mathrm{pr}\left[ \cap_{i=1}^{k} (X_i = x_i) \right] \equiv p_{\mathbf{X}}(\mathbf{x}) = \mathrm{pr}(\mathbf{X} = \mathbf{x}), \quad \mathbf{x} \in D,
\]
where the row vector $\mathbf{X} = (X_1, X_2, \ldots, X_k)$, the row vector $\mathbf{x} = (x_1, x_2, \ldots, x_k)$, and D is the domain (i.e., the set of all permissible values) of the discrete random vector $\mathbf{X}$. A valid multivariate discrete probability distribution has the following properties:

(i) $0 \le p_{\mathbf{X}}(\mathbf{x}) \le 1$ for all $\mathbf{x} \in D$;

(ii) $\sum \cdots \sum_{D}\, p_{\mathbf{X}}(\mathbf{x}) = 1$;

(iii) If $D_1$ is a subset of D, then $\mathrm{pr}[\mathbf{X} \in D_1] = \sum \cdots \sum_{D_1}\, p_{\mathbf{X}}(\mathbf{x})$.

A continuous multivariate probability distribution (i.e., a multivariate density function) for k continuous random variables $X_1, X_2, \ldots, X_k$ is denoted
\[
f_{X_1,X_2,\ldots,X_k}(x_1, x_2, \ldots, x_k) \equiv f_{\mathbf{X}}(\mathbf{x}), \quad \mathbf{x} \in D,
\]
where D is the domain of the continuous random vector $\mathbf{X}$. A valid multivariate density function has the following properties:

(i) $0 \le f_{\mathbf{X}}(\mathbf{x}) < +\infty$ for all $\mathbf{x} \in D$;

(ii) $\int \cdots \int_{D}\, f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x} = 1$, where $d\mathbf{x} = dx_1\, dx_2 \cdots dx_k$;

(iii) If $D_1$ is a subset of D, then $\mathrm{pr}[\mathbf{X} \in D_1] = \int \cdots \int_{D_1}\, f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x}$.
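3.1.2 Multivariate Cumulative Distribution Functions

In general, the multivariate CDF for a random vector $\mathbf{X}$ is the scalar function
\[
F_{\mathbf{X}}(\mathbf{x}) = \mathrm{pr}(\mathbf{X} \le \mathbf{x}) = \mathrm{pr}\left[ \cap_{i=1}^{k} (X_i \le x_i) \right].
\]
For a discrete random vector, $F_{\mathbf{X}}(\mathbf{x})$ is a discontinuous function of $\mathbf{x}$. For a continuous random vector, $F_{\mathbf{X}}(\mathbf{x})$ is an absolutely continuous function of $\mathbf{x}$, so that
\[
\frac{\partial^{k} F_{\mathbf{X}}(\mathbf{x})}{\partial x_1\, \partial x_2 \cdots \partial x_k} = f_{\mathbf{X}}(\mathbf{x}).
\]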
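3.1.3 Expectation Theory

Let $g(\mathbf{X})$ be a scalar function of $\mathbf{X}$. If $\mathbf{X}$ is a discrete random vector with probability distribution $p_{\mathbf{X}}(\mathbf{x})$, then
\[
\mathrm{E}[g(\mathbf{X})] = \sum \cdots \sum_{D}\, g(\mathbf{x})\, p_{\mathbf{X}}(\mathbf{x}).
\]
And, if $\mathbf{X}$ is a continuous random vector with density function $f_{\mathbf{X}}(\mathbf{x})$, then
\[
\mathrm{E}[g(\mathbf{X})] = \int \cdots \int_{D}\, g(\mathbf{x})\, f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x}.
\]
Some important expectations of interest in the multivariate setting are:

3.1.3.1 Covariance

For $i \ne j$, the covariance between the two random variables $X_i$ and $X_j$ is defined as
\[
\mathrm{cov}(X_i, X_j) = \mathrm{E}\{[X_i - \mathrm{E}(X_i)][X_j - \mathrm{E}(X_j)]\} = \mathrm{E}(X_i X_j) - \mathrm{E}(X_i)\mathrm{E}(X_j), \quad -\infty < \mathrm{cov}(X_i, X_j) < +\infty.
\]

3.1.3.2 Correlation

For $i \ne j$, the correlation between the two random variables $X_i$ and $X_j$ is defined as
\[
\mathrm{corr}(X_i, X_j) = \frac{\mathrm{cov}(X_i, X_j)}{\sqrt{\mathrm{V}(X_i)\mathrm{V}(X_j)}}, \quad -1 \le \mathrm{corr}(X_i, X_j) \le +1.
\]

3.1.3.3 Moment Generating Function

With the row vector $\mathbf{t} = (t_1, t_2, \ldots, t_k)$,
\[
M_{\mathbf{X}}(\mathbf{t}) = \mathrm{E}\left( e^{\mathbf{t}\mathbf{X}'} \right) = \mathrm{E}\left( e^{\sum_{i=1}^{k} t_i X_i} \right)
\]
is called the multivariate moment generating function for the random vector $\mathbf{X}$. In particular, with $r_1, r_2, \ldots, r_k$ being nonnegative integers satisfying the restriction $\sum_{i=1}^{k} r_i = r$, we have
\[
\mathrm{E}\left[ X_1^{r_1} X_2^{r_2} \cdots X_k^{r_k} \right] = \left. \frac{\partial^{r} M_{\mathbf{X}}(\mathbf{t})}{\partial t_1^{r_1}\, \partial t_2^{r_2} \cdots \partial t_k^{r_k}} \right|_{\mathbf{t} = \mathbf{0}},
\]
where the notation $\mathbf{t} = \mathbf{0}$ means that $t_i = 0$, i = 1, 2, . . . , k.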
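
To make the moment formula above concrete, the sketch below approximates the mixed partial derivative at t = 0 by a central finite difference and compares it with the known value of E(X1 X2). It assumes the bivariate normal moment generating function stated in Section 3.1.8.2; the function names and the parameter values are hypothetical and chosen only for illustration, and the finite difference is a numerical stand-in for the exact derivative.

```python
from math import exp

def bvn_mgf(t1, t2, mu1, mu2, sigma1, sigma2, rho):
    """Bivariate normal MGF M(t1, t2), as given in Section 3.1.8.2."""
    quad = (t1 ** 2) * sigma1 ** 2 + 2 * t1 * t2 * rho * sigma1 * sigma2 + (t2 ** 2) * sigma2 ** 2
    return exp(t1 * mu1 + t2 * mu2 + 0.5 * quad)

def mixed_moment(mgf, h=1e-3):
    """Central-difference estimate of d^2 M / (dt1 dt2) at t = 0, i.e., E(X1 X2)."""
    return (mgf(h, h) - mgf(h, -h) - mgf(-h, h) + mgf(-h, -h)) / (4 * h ** 2)

# Hypothetical parameter values (assumptions for this illustration only).
params = dict(mu1=1.0, mu2=2.0, sigma1=1.0, sigma2=2.0, rho=0.5)

approx = mixed_moment(lambda a, b: bvn_mgf(a, b, **params))

# E(X1 X2) = cov(X1, X2) + E(X1)E(X2) = rho*sigma1*sigma2 + mu1*mu2.
exact = params["rho"] * params["sigma1"] * params["sigma2"] + params["mu1"] * params["mu2"]
```

Here r1 = r2 = 1, so r = 2; higher-order moments would need higher-order difference formulas or symbolic differentiation.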
3.1.4 Marginal Distributions

When $\mathbf{X}$ is a discrete random vector, the marginal distribution of any proper subset of the k random variables $X_1, X_2, \ldots, X_k$ can be found by summing over all the random variables not in the subset of interest. In particular, for $1 \le j < k$, the marginal distribution of the random variables $X_1, X_2, \ldots, X_j$ is equal to
\[
p_{X_1,X_2,\ldots,X_j}(x_1, x_2, \ldots, x_j) = \sum_{\text{all } x_{j+1}} \sum_{\text{all } x_{j+2}} \cdots \sum_{\text{all } x_{k-1}} \sum_{\text{all } x_k} p_{\mathbf{X}}(\mathbf{x}).
\]
When $\mathbf{X}$ is a continuous random vector, the marginal distribution of any proper subset of the k random variables $X_1, X_2, \ldots, X_k$ can be found by integrating over all the random variables not in the subset of interest. In particular, for $1 \le j < k$, the marginal distribution of the random variables $X_1, X_2, \ldots, X_j$ is equal to
\[
f_{X_1,X_2,\ldots,X_j}(x_1, x_2, \ldots, x_j) = \int_{\text{all } x_{j+1}} \int_{\text{all } x_{j+2}} \cdots \int_{\text{all } x_{k-1}} \int_{\text{all } x_k} f_{\mathbf{X}}(\mathbf{x})\, dx_k\, dx_{k-1} \cdots dx_{j+2}\, dx_{j+1}.
\]
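
For a small discrete example, the marginal-distribution rule and the covariance definition of Section 3.1.3.1 can be carried out by direct summation. The sketch below is illustrative only: the joint probabilities are hypothetical values chosen to sum to one, and the helper names `marginal` and `expect` are not from the text.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf p(x1, x2) on {0, 1} x {0, 1, 2} (illustrative values only).
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 4), (0, 2): Fraction(1, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 8),
}

def marginal(joint_pmf, axis):
    """Sum the joint pmf over every coordinate except the one at position `axis`."""
    out = defaultdict(Fraction)
    for point, prob in joint_pmf.items():
        out[point[axis]] += prob
    return dict(out)

def expect(joint_pmf, g):
    """E[g(X1, X2)] computed by summing g(x) p(x) over the domain."""
    return sum(g(*point) * prob for point, prob in joint_pmf.items())

p_x1 = marginal(joint, 0)
p_x2 = marginal(joint, 1)

# cov(X1, X2) = E(X1 X2) - E(X1)E(X2)
cov = (
    expect(joint, lambda a, b: a * b)
    - expect(joint, lambda a, b: a) * expect(joint, lambda a, b: b)
)
```

Exact rational arithmetic via `Fraction` is used so that the checks (for example, that each marginal sums to one) are not affected by floating-point rounding.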
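3.1.5 Conditional Distributions and Expectations

For $\mathbf{X}$ a discrete random vector, let $\mathbf{X}_1$ denote a proper subset of the k discrete random variables $X_1, X_2, \ldots, X_k$, let $\mathbf{X}_2$ denote another proper subset of $X_1, X_2, \ldots, X_k$, and assume that the subsets $\mathbf{X}_1$ and $\mathbf{X}_2$ have no elements in common. Then, the conditional distribution of $\mathbf{X}_2$ given that $\mathbf{X}_1 = \mathbf{x}_1$ is defined as the joint distribution of $\mathbf{X}_1$ and $\mathbf{X}_2$ divided by the marginal distribution of $\mathbf{X}_1$, namely,
\[
p_{\mathbf{X}_2}(\mathbf{x}_2 \mid \mathbf{X}_1 = \mathbf{x}_1) = \frac{p_{\mathbf{X}_1,\mathbf{X}_2}(\mathbf{x}_1, \mathbf{x}_2)}{p_{\mathbf{X}_1}(\mathbf{x}_1)} = \frac{\mathrm{pr}[(\mathbf{X}_1 = \mathbf{x}_1) \cap (\mathbf{X}_2 = \mathbf{x}_2)]}{\mathrm{pr}(\mathbf{X}_1 = \mathbf{x}_1)}, \quad \mathrm{pr}(\mathbf{X}_1 = \mathbf{x}_1) > 0.
\]
Then, if $g(\mathbf{X}_2)$ is a scalar function of $\mathbf{X}_2$, it follows that
\[
\mathrm{E}[g(\mathbf{X}_2) \mid \mathbf{X}_1 = \mathbf{x}_1] = \sum \cdots \sum_{\text{all } \mathbf{x}_2} g(\mathbf{x}_2)\, p_{\mathbf{X}_2}(\mathbf{x}_2 \mid \mathbf{X}_1 = \mathbf{x}_1).
\]
For $\mathbf{X}$ a continuous random vector, let $\mathbf{X}_1$ denote a proper subset of the k continuous random variables $X_1, X_2, \ldots, X_k$, let $\mathbf{X}_2$ denote another proper subset of $X_1, X_2, \ldots, X_k$, and assume that the subsets $\mathbf{X}_1$ and $\mathbf{X}_2$ have no elements in common. Then, the conditional density function of $\mathbf{X}_2$ given that $\mathbf{X}_1 = \mathbf{x}_1$ is defined as the joint density function of $\mathbf{X}_1$ and $\mathbf{X}_2$ divided by the marginal density function of $\mathbf{X}_1$, namely,
\[
f_{\mathbf{X}_2}(\mathbf{x}_2 \mid \mathbf{X}_1 = \mathbf{x}_1) = \frac{f_{\mathbf{X}_1,\mathbf{X}_2}(\mathbf{x}_1, \mathbf{x}_2)}{f_{\mathbf{X}_1}(\mathbf{x}_1)}, \quad f_{\mathbf{X}_1}(\mathbf{x}_1) > 0.
\]
Then, if $g(\mathbf{X}_2)$ is a scalar function of $\mathbf{X}_2$, it follows that
\[
\mathrm{E}[g(\mathbf{X}_2) \mid \mathbf{X}_1 = \mathbf{x}_1] = \int \cdots \int_{\text{all } \mathbf{x}_2} g(\mathbf{x}_2)\, f_{\mathbf{X}_2}(\mathbf{x}_2 \mid \mathbf{X}_1 = \mathbf{x}_1)\, d\mathbf{x}_2.
\]
More generally, if $g(\mathbf{X}_1, \mathbf{X}_2)$ is a scalar function of $\mathbf{X}_1$ and $\mathbf{X}_2$, then useful iterated expectation formulas are:
\[
\mathrm{E}[g(\mathbf{X}_1, \mathbf{X}_2)] = \mathrm{E}_{\mathbf{x}_1}\{\mathrm{E}[g(\mathbf{X}_1, \mathbf{X}_2) \mid \mathbf{X}_1 = \mathbf{x}_1]\} = \mathrm{E}_{\mathbf{x}_2}\{\mathrm{E}[g(\mathbf{X}_1, \mathbf{X}_2) \mid \mathbf{X}_2 = \mathbf{x}_2]\}
\]
and
\begin{align*}
\mathrm{V}[g(\mathbf{X}_1, \mathbf{X}_2)] &= \mathrm{E}_{\mathbf{x}_1}\{\mathrm{V}[g(\mathbf{X}_1, \mathbf{X}_2) \mid \mathbf{X}_1 = \mathbf{x}_1]\} + \mathrm{V}_{\mathbf{x}_1}\{\mathrm{E}[g(\mathbf{X}_1, \mathbf{X}_2) \mid \mathbf{X}_1 = \mathbf{x}_1]\} \\
&= \mathrm{E}_{\mathbf{x}_2}\{\mathrm{V}[g(\mathbf{X}_1, \mathbf{X}_2) \mid \mathbf{X}_2 = \mathbf{x}_2]\} + \mathrm{V}_{\mathbf{x}_2}\{\mathrm{E}[g(\mathbf{X}_1, \mathbf{X}_2) \mid \mathbf{X}_2 = \mathbf{x}_2]\}.
\end{align*}
Also,
\[
p_{\mathbf{X}}(\mathbf{x}) \equiv p_{X_1,X_2,\ldots,X_k}(x_1, x_2, \ldots, x_k) = p_{X_1}(x_1) \prod_{i=2}^{k} p_{X_i}\left[ x_i \,\middle|\, \cap_{j=1}^{i-1} (X_j = x_j) \right]
\]
and
\[
f_{\mathbf{X}}(\mathbf{x}) \equiv f_{X_1,X_2,\ldots,X_k}(x_1, x_2, \ldots, x_k) = f_{X_1}(x_1) \prod_{i=2}^{k} f_{X_i}\left[ x_i \,\middle|\, \cap_{j=1}^{i-1} (X_j = x_j) \right].
\]
Note that there are k! ways of writing each of the above two expressions.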
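
The iterated expectation and variance formulas above can be verified exactly on a small discrete example with g(X1, X2) = X2. This is a minimal sketch under stated assumptions: the joint pmf is a hypothetical table chosen for illustration, and the helper names (`conditional_pmf`, `mean`, `var`) are not from the text.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x1, x2) on {0, 1} x {0, 1, 2} (illustrative values only).
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 4), (0, 2): Fraction(1, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 8),
}

def conditional_pmf(joint_pmf, x1):
    """Return (p(x2 | X1 = x1), pr(X1 = x1)): joint divided by the marginal of X1."""
    p_x1 = sum(p for (a, _), p in joint_pmf.items() if a == x1)
    return {b: p / p_x1 for (a, b), p in joint_pmf.items() if a == x1}, p_x1

def mean(pmf):
    return sum(value * p for value, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((value - m) ** 2 * p for value, p in pmf.items())

# Direct (unconditional) mean and variance of X2 from its marginal distribution.
p_x2 = {}
for (_, b), p in joint.items():
    p_x2[b] = p_x2.get(b, Fraction(0)) + p
direct_mean, direct_var = mean(p_x2), var(p_x2)

# Conditional distributions of X2 given each value of X1, with weights pr(X1 = x1).
cond = [conditional_pmf(joint, x1) for x1 in sorted({a for a, _ in joint})]

iter_mean = sum(mean(pmf) * w for pmf, w in cond)                     # E{E(X2 | X1)}
exp_of_var = sum(var(pmf) * w for pmf, w in cond)                     # E{V(X2 | X1)}
var_of_exp = sum((mean(pmf) - iter_mean) ** 2 * w for pmf, w in cond)  # V{E(X2 | X1)}
```

With these values, `direct_mean == iter_mean` and `direct_var == exp_of_var + var_of_exp` hold exactly, since all arithmetic is done with rationals.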
3.1.6 Mutual Independence among a Set of Random Variables

The random vector $\mathbf{X}$ is said to consist of a set of k mutually independent random variables if and only if
\[
F_{\mathbf{X}}(\mathbf{x}) = \prod_{i=1}^{k} F_{X_i}(x_i) = \prod_{i=1}^{k} \mathrm{pr}(X_i \le x_i)
\]
for all possible choices of $x_1, x_2, \ldots, x_k$. Given mutual independence, then
\[
p_{\mathbf{X}}(\mathbf{x}) \equiv p_{X_1,X_2,\ldots,X_k}(x_1, x_2, \ldots, x_k) = \prod_{i=1}^{k} p_{X_i}(x_i)
\]
when $\mathbf{X}$ is a discrete random vector, and
\[
f_{\mathbf{X}}(\mathbf{x}) \equiv f_{X_1,X_2,\ldots,X_k}(x_1, x_2, \ldots, x_k) = \prod_{i=1}^{k} f_{X_i}(x_i)
\]
when $\mathbf{X}$ is a continuous random vector. Also, for i = 1, 2, . . . , k, let $g_i(X_i)$ be a scalar function of $X_i$. Then, if $X_1, X_2, \ldots, X_k$ constitute a set of k mutually independent random variables, it follows that
\[
\mathrm{E}\left[ \prod_{i=1}^{k} g_i(X_i) \right] = \prod_{i=1}^{k} \mathrm{E}[g_i(X_i)].
\]
And, if $X_1, X_2, \ldots, X_k$ are mutually independent random variables, then any subset of these k random variables also constitutes a group of mutually independent random variables. Also, for $i \ne j$, if $X_i$ and $X_j$ are independent random variables, then $\mathrm{corr}(X_i, X_j) = 0$; however, if $\mathrm{corr}(X_i, X_j) = 0$, it does not necessarily follow that $X_i$ and $X_j$ are independent random variables.
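
The final remark (zero correlation does not imply independence) is easy to see in a standard textbook-style counterexample, sketched below. The specific choice of X uniform on {−1, 0, 1} with Y = X² is an assumption made for illustration and does not come from the text.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1} and Y = X^2, so Y is a deterministic function of X.
third = Fraction(1, 3)
joint = {(x, x * x): third for x in (-1, 0, 1)}

def expect(g):
    """E[g(X, Y)] by direct summation over the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

# cov(X, Y) = E(XY) - E(X)E(Y); here E(XY) = E(X^3) = 0 and E(X) = 0.
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require pr(X = 0, Y = 0) = pr(X = 0) pr(Y = 0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
p_joint00 = joint.get((0, 0), Fraction(0))
independent = (p_joint00 == p_x0 * p_y0)
```

Here `cov` is exactly zero, while `independent` is `False` because pr(X = 0, Y = 0) = 1/3 differs from pr(X = 0) pr(Y = 0) = 1/9.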
3.1.7 Random Sample

Using the notation $\mathbf{X}_i = (X_{i1}, X_{i2}, \ldots, X_{ik})$, the random vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ are said to constitute a random sample of size n from the discrete parent population $p_{\mathbf{X}}(\mathbf{x})$ if the following two conditions hold:

(i) $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ constitute a set of mutually independent random vectors;

(ii) For i = 1, 2, . . . , n, $p_{\mathbf{X}_i}(\mathbf{x}_i) = p_{\mathbf{X}}(\mathbf{x}_i)$; in other words, $\mathbf{X}_i$ follows the discrete parent population distribution $p_{\mathbf{X}}(\mathbf{x})$.

A completely analogous definition holds for a random sample from a continuous parent population $f_{\mathbf{X}}(\mathbf{x})$. Standard statistical terminology describes a random sample $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ of size n as consisting of a set of independent and identically distributed (i.i.d.) random vectors. In this regard, it is important to note that the mutual independence property pertains to the relationship among the random vectors, not to the relationship among the k (possibly mutually dependent) scalar random variables within a random vector.
3.1.8 Some Important Multivariate Discrete and Continuous Probability Distributions

3.1.8.1 Multinomial

The multinomial distribution is often used as a statistical model for the analysis of categorical data. In particular, for i = 1, 2, . . . , k, suppose that $\pi_i$ is the probability that an observation falls into the ith of k distinct categories, where $0 < \pi_i < 1$ and where $\sum_{i=1}^{k} \pi_i = 1$. If the discrete random variable $X_i$ is the number of observations out of n that fall into the ith category, then the k random variables $X_1, X_2, \ldots, X_k$ jointly follow a k-variate multinomial distribution, namely,
\[
p_{\mathbf{X}}(\mathbf{x}) \equiv p_{X_1,X_2,\ldots,X_k}(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\,x_2! \cdots x_k!}\, \pi_1^{x_1} \pi_2^{x_2} \cdots \pi_k^{x_k}, \quad \mathbf{x} \in D,
\]
where $D = \{\mathbf{x} : 0 \le x_i \le n,\ i = 1, 2, \ldots, k, \text{ and } \sum_{i=1}^{k} x_i = n\}$. When $(X_1, X_2, \ldots, X_k) \sim \mathrm{MULT}(n; \pi_1, \pi_2, \ldots, \pi_k)$, then $X_i \sim \mathrm{BIN}(n, \pi_i)$ for i = 1, 2, . . . , k, and $\mathrm{cov}(X_i, X_j) = -n\pi_i\pi_j$ for $i \ne j$.

3.1.8.2 Multivariate Normal

The multivariate normal distribution is often used to model the joint behavior of k possibly mutually correlated continuous random variables. The multivariate normal density function for k continuous random variables $X_1, X_2, \ldots, X_k$ is defined as
\[
f_{\mathbf{X}}(\mathbf{x}) \equiv f_{X_1,X_2,\ldots,X_k}(x_1, x_2, \ldots, x_k) = \frac{1}{(2\pi)^{k/2}|\boldsymbol{\Sigma}|^{1/2}}\, e^{-(1/2)(\mathbf{x}-\boldsymbol{\mu})\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})'},
\]
where $-\infty < x_i < \infty$ for i = 1, 2, . . . , k, where $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_k) = [\mathrm{E}(X_1), \mathrm{E}(X_2), \ldots, \mathrm{E}(X_k)]$, and where $\boldsymbol{\Sigma}$ is the (k × k) covariance matrix of $\mathbf{X}$ with ith diagonal element equal to $\sigma_i^2 = \mathrm{V}(X_i)$ and with (i, j)th element $\sigma_{ij}$ equal to $\mathrm{cov}(X_i, X_j)$ for $i \ne j$. Also, when $\mathbf{X} \sim \mathrm{MVN}_k(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then the moment generating function for $\mathbf{X}$ is
\[
M_{\mathbf{X}}(\mathbf{t}) = e^{\mathbf{t}\boldsymbol{\mu}' + (1/2)\mathbf{t}\boldsymbol{\Sigma}\mathbf{t}'}.
\]
And, for i = 1, 2, . . . , k, the marginal distribution of $X_i$ is normal with mean $\mu_i$ and variance $\sigma_i^2$. As an important special case, when k = 2, we obtain the bivariate normal distribution, namely,
\[
f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{(1-\rho^2)}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left( \frac{x_1-\mu_1}{\sigma_1} \right)^2 - 2\rho\left( \frac{x_1-\mu_1}{\sigma_1} \right)\left( \frac{x_2-\mu_2}{\sigma_2} \right) + \left( \frac{x_2-\mu_2}{\sigma_2} \right)^2 \right] \right\},
\]
where $-\infty < x_1 < \infty$ and $-\infty < x_2 < \infty$, and where $\rho = \mathrm{corr}(X_1, X_2)$. When $(X_1, X_2) \sim \mathrm{BVN}(\mu_1, \mu_2; \sigma_1^2, \sigma_2^2; \rho)$, then the moment generating function for $X_1$ and $X_2$ is
\[
M_{X_1,X_2}(t_1, t_2) = e^{t_1\mu_1 + t_2\mu_2 + (1/2)(t_1^2\sigma_1^2 + 2t_1t_2\rho\sigma_1\sigma_2 + t_2^2\sigma_2^2)}.
\]
The conditional distribution of $X_2$ given $X_1 = x_1$ is normal with
\[
\mathrm{E}(X_2 \mid X_1 = x_1) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1) \quad \text{and} \quad \mathrm{V}(X_2 \mid X_1 = x_1) = \sigma_2^2(1 - \rho^2).
\]
And, the conditional distribution of $X_1$ given $X_2 = x_2$ is normal with
\[
\mathrm{E}(X_1 \mid X_2 = x_2) = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2) \quad \text{and} \quad \mathrm{V}(X_1 \mid X_2 = x_2) = \sigma_1^2(1 - \rho^2).
\]
These conditional expectation expressions for the bivariate normal distribution are special cases of a more general result. More generally, for a pair of either discrete or continuous random variables $X_1$ and $X_2$, if the conditional expectation of $X_2$ given $X_1 = x_1$ is a linear (or straight-line) function of $x_1$, namely $\mathrm{E}(X_2 \mid X_1 = x_1) = \alpha_1 + \beta_1 x_1$, $-\infty < \alpha_1 < +\infty$, $-\infty < \beta_1 < +\infty$, then $\mathrm{corr}(X_1, X_2) = \rho = \beta_1\sqrt{[\mathrm{V}(X_1)]/[\mathrm{V}(X_2)]}$. Analogously, if $\mathrm{E}(X_1 \mid X_2 = x_2) = \alpha_2 + \beta_2 x_2$, $-\infty < \alpha_2 < +\infty$, $-\infty < \beta_2 < +\infty$, then $\rho = \beta_2\sqrt{[\mathrm{V}(X_2)]/[\mathrm{V}(X_1)]}$.
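
As a worked illustration of the bivariate normal conditional formulas and of the slope relation between ρ and β1, the sketch below builds the conditional distribution of X2 given X1 = x1 using Python's standard-library `statistics.NormalDist`. The function name `conditional_x2_given_x1` and all parameter values are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def conditional_x2_given_x1(x1, mu1, mu2, var1, var2, rho):
    """Normal distribution of X2 given X1 = x1 under BVN(mu1, mu2; var1, var2; rho)."""
    cond_mean = mu2 + rho * sqrt(var2 / var1) * (x1 - mu1)
    cond_var = var2 * (1 - rho ** 2)
    return NormalDist(cond_mean, sqrt(cond_var))

# Hypothetical parameter values (assumptions for this illustration only).
mu1, mu2, var1, var2, rho = 0.0, 1.0, 4.0, 9.0, 0.5

dist = conditional_x2_given_x1(2.0, mu1, mu2, var1, var2, rho)

# Conditional probability pr(X2 > 4 | X1 = 2), from the conditional normal CDF.
prob_exceeds_4 = 1.0 - dist.cdf(4.0)

# The conditional mean is linear in x1 with slope beta1 = rho * sigma2 / sigma1,
# so the general relation rho = beta1 * sqrt(V(X1) / V(X2)) recovers rho.
beta1 = rho * sqrt(var2 / var1)
rho_recovered = beta1 * sqrt(var1 / var2)
```

The same pattern, with the roles of the two variables exchanged, gives the conditional distribution of X1 given X2 = x2.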
3.1.9 Special Topics of Interest

3.1.9.1 Mean and Variance of a Linear Function of Random Variables

For i = 1, 2, . . . , k, let $g_i(X_i)$ be a scalar function of the random variable $X_i$. Then, if $a_1, a_2, \ldots, a_k$ are known constants, and if $L = \sum_{i=1}^{k} a_i g_i(X_i)$, we have
\[
\mathrm{E}(L) = \sum_{i=1}^{k} a_i \mathrm{E}[g_i(X_i)],
\]
and
\[
\mathrm{V}(L) = \sum_{i=1}^{k} a_i^2 \mathrm{V}[g_i(X_i)] + 2 \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} a_i a_j \mathrm{cov}[g_i(X_i), g_j(X_j)].
\]
In the special case when the random variables $X_i$ and $X_j$ are uncorrelated for all $i \ne j$, then
\[
\mathrm{V}(L) = \sum_{i=1}^{k} a_i^2 \mathrm{V}[g_i(X_i)].
\]
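
The general variance formula can be exercised on the multinomial distribution of Section 3.1.8.1, taking each g_i to be the identity. The sketch below is illustrative only: the helper names `var_linear` and `multinomial_cov`, and the values n = 10 and π = (1/5, 3/10, 1/2), are assumptions chosen for this example. Because X1 + X2 + X3 = n is a constant, the formula should return a variance of exactly zero for a = (1, 1, 1).

```python
from fractions import Fraction

def var_linear(a, cov):
    """V(L) for L = sum_i a_i X_i, given the covariance matrix `cov` (list of lists)."""
    k = len(a)
    total = sum(a[i] ** 2 * cov[i][i] for i in range(k))
    total += 2 * sum(
        a[i] * a[j] * cov[i][j] for i in range(k - 1) for j in range(i + 1, k)
    )
    return total

def multinomial_cov(n, probs):
    """Covariance matrix of MULT(n; probs): n*p_i*(1 - p_i) on the diagonal, -n*p_i*p_j off it."""
    k = len(probs)
    return [
        [n * probs[i] * (1 - probs[i]) if i == j else -n * probs[i] * probs[j]
         for j in range(k)]
        for i in range(k)
    ]

# Hypothetical multinomial parameters (assumptions for this illustration only).
n = 10
probs = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
cov = multinomial_cov(n, probs)

v_total = var_linear([1, 1, 1], cov)   # V(X1 + X2 + X3), where the sum is the constant n
v_diff = var_linear([1, -1, 0], cov)   # V(X1 - X2)
```

The second quantity, V(X1 − X2), shows the effect of the negative multinomial covariance: it exceeds V(X1) + V(X2) by 2nπ1π2.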
3.1.9.2 Convergence in Distribution

A sequence of random variables $U_1, U_2, \ldots, U_n, \ldots$ converges in distribution to a random variable U if
\[
\lim_{n\to\infty} F_{U_n}(u) = F_U(u)
\]
for all values of u where $F_U(u)$ is continuous. Notationally, we write $U_n \xrightarrow{D} U$.

As an important example, suppose that $X_1, X_2, \ldots, X_n$ constitute a random sample of size n from either a univariate discrete probability distribution $p_X(x)$ or a univariate density function $f_X(x)$, where $\mathrm{E}(X) = \mu$ $(-\infty < \mu < +\infty)$ and $\mathrm{V}(X) = \sigma^2$ $(0 < \sigma^2 < +\infty)$. With $\bar{X} = n^{-1}\sum_{i=1}^{n} X_i$, consider the standardized random variable
\[
U_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}\,\sigma}.
\]
Then, it can be shown that $\lim_{n\to\infty} M_{U_n}(t) = e^{t^2/2}$, leading to the conclusion that $U_n \xrightarrow{D} Z$, where $Z \sim \mathrm{N}(0, 1)$. This is the well-known Central Limit Theorem.

3.1.9.3 Order Statistics
Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size n from a univariate density function $f_X(x)$, $-\infty < x < +\infty$, with corresponding cumulative distribution function $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$. Then, the n order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ satisfy the relationship
\[
-\infty < X_{(1)} < X_{(2)} < \cdots < X_{(n-1)} < X_{(n)} < +\infty.
\]
For r = 1, 2, . . . , n, the random variable $X_{(r)}$ is called the rth order statistic. In particular, $X_{(1)} = \min\{X_1, X_2, \ldots, X_n\}$, $X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$, and $X_{((n+1)/2)} = \mathrm{median}\{X_1, X_2, \ldots, X_n\}$ when n is an odd positive integer.

For r = 1, 2, . . . , n, the distribution of $X_{(r)}$ is
\[
f_{X_{(r)}}(x_{(r)}) = nC_{r-1}^{n-1}[F_X(x_{(r)})]^{r-1}[1 - F_X(x_{(r)})]^{n-r} f_X(x_{(r)}), \quad -\infty < x_{(r)} < +\infty.
\]
For $1 \le r < s \le n$, the joint distribution of $X_{(r)}$ and $X_{(s)}$ is equal to
\begin{align*}
f_{X_{(r)},X_{(s)}}(x_{(r)}, x_{(s)}) = {}& \frac{n!}{(r-1)!(s-r-1)!(n-s)!}[F_X(x_{(r)})]^{r-1} \\
& \times [F_X(x_{(s)}) - F_X(x_{(r)})]^{s-r-1} \\
& \times [1 - F_X(x_{(s)})]^{n-s} f_X(x_{(r)}) f_X(x_{(s)}), \quad -\infty < x_{(r)} < x_{(s)} < +\infty.
\end{align*}
And, the joint distribution of $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ is
\[
f_{X_{(1)},X_{(2)},\ldots,X_{(n)}}(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = n! \prod_{i=1}^{n} f_X(x_{(i)}), \quad -\infty < x_{(1)} < x_{(2)} < \cdots < x_{(n-1)} < x_{(n)} < +\infty.
\]

3.1.9.4 Method of Transformations
With k = 2, let $X_1$ and $X_2$ be two continuous random variables with joint density function $f_{X_1,X_2}(x_1, x_2)$, $(x_1, x_2) \in D$. Let $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$ be random variables, where the functions $y_1 = g_1(x_1, x_2)$ and $y_2 = g_2(x_1, x_2)$ define a one-to-one transformation from the domain D in the $(x_1, x_2)$-plane to the domain $D^*$ in the $(y_1, y_2)$-plane. Further, let $x_1 = h_1(y_1, y_2)$ and $x_2 = h_2(y_1, y_2)$ be the inverse functions expressing $x_1$ and $x_2$ as functions of $y_1$ and $y_2$. Then, the joint density function of the random variables $Y_1$ and $Y_2$ is
\[
f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}[h_1(y_1, y_2), h_2(y_1, y_2)]\,|J|, \quad (y_1, y_2) \in D^*,
\]
where the Jacobian J, $J \ne 0$, of the transformation is the second-order determinant
\[
J = \begin{vmatrix}
\dfrac{\partial h_1(y_1, y_2)}{\partial y_1} & \dfrac{\partial h_1(y_1, y_2)}{\partial y_2} \\[2ex]
\dfrac{\partial h_2(y_1, y_2)}{\partial y_1} & \dfrac{\partial h_2(y_1, y_2)}{\partial y_2}
\end{vmatrix}.
\]
For the special case k = 1 when $Y_1 = g_1(X_1)$ and $X_1 = h_1(Y_1)$, it follows that
\[
f_{Y_1}(y_1) = f_{X_1}[h_1(y_1)] \left| \frac{dh_1(y_1)}{dy_1} \right|, \quad y_1 \in D^*.
\]
It is a straightforward generalization to the situation when $Y_i = g_i(X_1, X_2, \ldots, X_k)$, i = 1, 2, . . . , k, with the Jacobian J being the determinant of a (k × k) matrix.

EXERCISES

Exercise 3.1. Two balls are selected sequentially at random without replacement from an urn containing N (>1) balls numbered individually from 1 to N. Let the discrete random variable X be the number on the first ball selected, and let the discrete random variable Y be the number on the second ball selected.

(a) Provide an explicit expression for the joint distribution of the random variables X and Y, and also provide explicit expressions for the marginal distributions of X and Y.

(b) Provide an explicit expression for $\mathrm{pr}[X \ge (N - 1) \mid Y = y]$, where y is a fixed positive integer satisfying the inequality $1 \le y \le N$.

(c) Derive an explicit expression for corr(X, Y), the correlation between X and Y. Find the limiting value of corr(X, Y) as $N \to \infty$, and then comment on your finding.

Exercise 3.2. Consider an experiment consisting of n mutually independent Bernoulli trials, where each trial results in either a success (denoted by the letter S) or a failure (denoted by the letter F). For any trial, the probability of a success is equal to π, 0 < π < 1, and so the probability of a failure is equal to (1 − π). For any set of n trials with outcomes arranged in a linear sequence, a run is a subsequence of outcomes of the same type which is both preceded and succeeded by outcomes of the opposite type or by the beginning or by the end of the complete sequence. The number of successes in a success (or S) run is referred to as its length.
For any such sequence of n Bernoulli trial outcomes, let the discrete random variable $M_n$ denote the length of the shortest S run in the sequence, and let the discrete random variable $L_n$ denote the length of the longest S run in the sequence. For example, for the sequence of n = 12 outcomes given by FFSFSSSFFFSS, the observed value of $M_{12}$ is $m_{12} = 1$ and the observed value of $L_{12}$ is $l_{12} = 3$.

(a) If n = 5, find the joint distribution of the random variables $M_5$ and $L_5$.

(b) Find the marginal distribution of the random variable $L_5$, and then find the numerical value of $\mathrm{E}(L_5)$ when π = 0.90.

Exercise 3.3. Suppose that $p_U(u) = \mathrm{pr}(U = u) = n^{-1}$, u = 1, 2, . . . , n. Further, suppose that, given (or conditional on) U = u, X and Y are independent geometric random variables, with
\[
p_X(x \mid U = u) = u^{-1}(1 - u^{-1})^{x-1}, \quad x = 1, 2, \ldots, \infty
\]
and
\[
p_Y(y \mid U = u) = u^{-1}(1 - u^{-1})^{y-1}, \quad y = 1, 2, \ldots, \infty.
\]

(a) Derive an explicit expression for corr(X, Y), the correlation between the random variables X and Y.

(b) Develop an expression for pr(X = Y). What is the numerical value of this probability when n = 4?

Exercise 3.4. Suppose that $Z \sim \mathrm{N}(0, 1)$, that $U \sim \chi^2_{\nu}$, and that Z and U are independent random variables. Then, the random variable
\[
T_{\nu} = \frac{Z}{\sqrt{U/\nu}}
\]
has a (central) t-distribution with ν degrees of freedom.

(a) By considering the conditional density function of $T_{\nu}$ given U = u, develop an explicit expression for the density function of $T_{\nu}$.

(b) Find $\mathrm{E}(T_{\nu})$ and $\mathrm{V}(T_{\nu})$.

Exercise 3.5

(a) Suppose that Y is a random variable with conditional mean $\mathrm{E}(Y \mid X = x) = \beta_0 + \beta_1 x$ and that X is a random variable with mean E(X) and variance V(X). Use conditional expectation theory to show that
\[
\mathrm{corr}(X, Y) = \beta_1\sqrt{\frac{\mathrm{V}(X)}{\mathrm{V}(Y)}},
\]
and then comment on this finding.

(b) Now, given the above assumptions, suppose also that $\mathrm{E}(X \mid Y = y) = \alpha_0 + \alpha_1 y$. Develop an explicit expression relating corr(X, Y) to $\alpha_1$ and $\beta_1$, and then comment on this finding.

(c) Now, suppose that $\mathrm{E}(Y \mid X = x) = \beta_0 + \beta_1 x + \beta_2 x^2$. Derive an explicit expression for corr(X, Y), and then comment on how the addition of the quadratic term $\beta_2 x^2$ affects the relationship between corr(X, Y) and $\beta_1$ given in part (a).

Exercise 3.6. Suppose that the amounts X and Y (in milligrams) of two toxic chemicals in a liter of water selected at random from a river near a certain manufacturing plant can be modeled by the bivariate density function
\[
f_{X,Y}(x, y) = 6\theta^{-3}(x - y), \quad 0 < y < x < \theta.
\]

(a) Derive an explicit expression for corr(X, Y), the correlation between the two continuous random variables X and Y.
(b) Set up appropriate integrals that are needed to find
\[
\mathrm{pr}\left[ (X + Y) < \theta \,\middle|\, (X + 2Y) > \frac{\theta}{4} \right].
\]
Note that the appropriate integrals do not have to be evaluated, but the integrands and the limits of integration must be correctly specified for all integrals that are used.

(c) Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ constitute a random sample of size n from $f_{X,Y}(x, y)$, and let $\bar{X} = n^{-1}\sum_{i=1}^{n} X_i$ and $\bar{Y} = n^{-1}\sum_{i=1}^{n} Y_i$. Develop explicit expressions for E(L) and V(L) when $L = (3\bar{X} - 2\bar{Y})$.

Exercise 3.7. For a certain type of chemical reaction involving two chemicals A and B, let X denote the proportion of the initial amount (in grams) of chemical A that remains unreacted at equilibrium, and let Y denote the corresponding proportion of the initial amount (in grams) of chemical B that remains unreacted at equilibrium. The bivariate density function for the continuous random variables X and Y is assumed to be of the form
\[
f_{X,Y}(x, y) = \frac{\Gamma(\alpha + \beta + 3)}{\Gamma(\alpha + 1)\Gamma(\beta + 1)}(1 - x)^{\alpha} y^{\beta}, \quad 0 < y < x, \quad 0 < x < 1, \quad \alpha > -1, \quad \beta > -1.
\]

(a) Derive explicit expressions for $f_X(x)$ and $f_Y(y)$, the marginal distributions of the random variables X and Y, and for $f_Y(y \mid X = x)$, the conditional density function of Y given X = x.

(b) Use the results obtained in part (a) to develop an expression for $\rho_{X,Y} = \mathrm{corr}(X, Y)$, the correlation between the random variables X and Y. What is the numerical value of this correlation coefficient when α = 2 and β = 3?

(c) Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ constitute a random sample of size n from $f_{X,Y}(x, y)$. If $\bar{X} = n^{-1}\sum_{i=1}^{n} X_i$ and $\bar{Y} = n^{-1}\sum_{i=1}^{n} Y_i$, find the expected value and variance of the random variable $L = (3\bar{X} - 5\bar{Y})$ when n = 10, α = 2, and β = 3.

Exercise 3.8. A certain simple biological system involves exactly two independently functioning components. If one of these two components fails, then the entire system fails. For i = 1, 2, let $Y_i$ be the random variable representing the time (in weeks) to failure of the ith component, with the distribution of $Y_i$ being negative exponential, namely,
\[
f_{Y_i}(y_i) = \theta_i e^{-\theta_i y_i}, \quad 0 < y_i < \infty, \quad \theta_i > 0.
\]
Further, assume that $Y_1$ and $Y_2$ are independent random variables. Clearly, if component 1 fails first, then $Y_1$ is observable, but $Y_2$ is not observable (i.e., $Y_2$ is then said to be censored); conversely, if component 2 fails first, then $Y_2$ is observable, but $Y_1$ is not observable (i.e., $Y_1$ is censored). Thus, if this biological system fails, then only two random variables, call them U and W, are observable, where $U = \min(Y_1, Y_2)$ and where W = 1 if $Y_1 < Y_2$ and W = 0 if $Y_2 < Y_1$.
(a) Develop an explicit expression for the joint distribution $f_{U,W}(u, w)$ of the random variables U and W.

(b) Find the marginal distribution $p_W(w)$ of the random variable W.

(c) Find the marginal distribution $f_U(u)$ of the random variable U.

(d) Are U and W independent random variables?

Exercise 3.9. It has been documented via numerous research studies that the eldest child in a family with multiple children generally has a higher IQ than his or her siblings. In a certain large population of U.S. families with two children, suppose that the random variable $Y_1$ denotes the IQ of the older child and that the random variable $Y_2$ denotes the IQ of the younger child. Assume that $Y_1$ and $Y_2$ have a joint bivariate normal distribution with parameter values $\mathrm{E}(Y_1) = 110$, $\mathrm{E}(Y_2) = 100$, $\mathrm{V}(Y_1) = \mathrm{V}(Y_2) = 225$, and $\rho = \mathrm{corr}(Y_1, Y_2) = 0.80$.

(a) Suppose that three families are randomly chosen from this large population of U.S. families with two children. What is the probability that the older child has an IQ at least 15 points higher than the younger child for at least two of these three families?

(b) For a family randomly chosen from this population, if the older child is known to have an IQ of 120, what is the probability that the younger child has an IQ greater than 120?

Exercise 3.10. Discrete choice statistical models are useful in many situations, including transportation research. For example, transportation researchers may want to know why certain individuals choose to use public bus transportation instead of a car. As a starting point, the investigators typically assume that each mode of transportation carries with it a certain value, or “utility,” that makes it more or less desirable to consumers. For instance, cars may be more convenient, but a bus may be more environmentally friendly. According to the “maximum utility principle,” consumers select the alternative that has the greatest desirability or utility.

As a simple illustration of a discrete choice statistical model, suppose that there are only two possible discrete choices, A and B. Let the random variable Y take the value 1 if choice A is made, and let Y take the value 0 if choice B is made. Furthermore, let U and V be the utilities associated with the choices A and B, respectively, and assume that U and V are independent random variables, each having the same standard Gumbel (Type-I Extreme-Value) distribution. In particular, both U and V are assumed to have CDFs of the general form
\[
F_X(x) = \mathrm{pr}(X \le x) = e^{-e^{-x}}, \quad -\infty < x < \infty.
\]
According to the maximum utility principle, Y = 1 if and only if U > V, or equivalently, if W = (U − V) > 0.

(a) Show that W follows a logistic distribution, with CDF $F_W(w) = 1/(1 + e^{-w})$, $-\infty < w < \infty$.

(b) Suppose that $U = \alpha + E_1$ and $V = E_2$, where $E_1$ and $E_2$ are independent error terms, each following the standard Gumbel CDF of the general form $F_X(x)$ given above. Here, α represents the average population difference between the two utilities. (More generally, U and V can be modeled as functions of covariates, although this extension is not considered here.) Again, assume that we observe Y = 1 if choice A is made and Y = 0 if choice B is made. Find an explicit expression as a function of α for pr(Y = 1) under the maximum utility principle.

Exercise 3.11. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size n (n ≥ 3) from the parent population
\[
f_X(x) = \lambda e^{-\lambda x}, \quad 0 < x < +\infty, \quad 0 < \lambda < +\infty.
\]
(a) Find the conditional density function of $X_1, X_2, \ldots, X_n$ given that $S = \sum_{i=1}^{n} X_i = s$.

(b) Consider the (n − 1) random variables
\[
Y_1 = \frac{X_1}{S}, \quad Y_2 = \frac{(X_1 + X_2)}{S}, \quad \ldots, \quad Y_{n-1} = \frac{(X_1 + X_2 + \cdots + X_{n-1})}{S}.
\]
Find the joint distribution of $Y_1, Y_2, \ldots, Y_{n-1}$ given that S = s.

(c) When n = 3 and when n = 4, find the marginal distribution of $Y_1$ given that S = s, and then use these results to infer the structure of the marginal distribution of $Y_1$ given that S = s for any n ≥ 3.

Exercise 3.12. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size n from a $\mathrm{N}(\mu, \sigma^2)$ population. Then, consider the n random variables $Y_1, Y_2, \ldots, Y_n$, where $Y_i = e^{X_i}$, i = 1, 2, . . . , n. Finally, consider the following two random variables:

(i) The arithmetic mean $\bar{Y}_a = n^{-1}\sum_{i=1}^{n} Y_i$;

(ii) The geometric mean $\bar{Y}_g = \left( \prod_{i=1}^{n} Y_i \right)^{1/n}$.

Develop an explicit expression for $\mathrm{corr}(\bar{Y}_a, \bar{Y}_g)$, the correlation between the two random variables $\bar{Y}_a$ and $\bar{Y}_g$. Then, find the limiting value of this correlation as $n \to \infty$, and comment on your finding.

Exercise 3.13. For a certain public health research study, an epidemiologist is interested in determining via blood tests which particular subjects in a random sample of N (= Gn) human subjects possess a certain antibody; here, G and n are positive integers. For the population from which the random sample of N subjects is selected, the proportion of subjects in that population possessing the antibody is equal to π (0 < π < 1), a known quantity. The epidemiologist is considering two possible blood testing plans:

Plan #1: Perform the blood test separately on each of the N subjects in the random sample;

Plan #2: Divide the N subjects in the random sample into G groups of n subjects each; then, for each group of size n, take a blood sample from each of the n subjects in that group, mix the n blood samples together, and do one blood test on the mixture; if the blood test on the mixture is negative (indicating that the antibody is not present in that mixture), then none of those n subjects possesses the antibody; however, if the blood test on the mixture is positive (indicating that the antibody is present), then the blood test will have to be performed on each of the n subjects in that group.

(a) Let $T_2$ be the random variable denoting the number of blood tests required for Plan #2. Develop an explicit expression for $\mathrm{E}(T_2)$.

(b) Clearly, the larger the value of π, the more likely it is that the blood test on a mixture of n blood samples will be positive, necessitating a blood test on every one of those n subjects. Determine the optimal value of n (say, $n^*$) and the associated desired largest value of π (say, $\pi^*$) for which $\mathrm{E}(T_2) < N$ (i.e., for which Plan #2 is preferred to Plan #1).

Exercise 3.14. For the state of North Carolina (NC), suppose that the number Y of female residents who are homicide victims in any particular calendar year follows a Poisson distribution with mean E(Y) = Lλ, where L is the total number of person-months at risk for homicide for all female NC residents during that year and where λ is the rate of female homicides per person-month. Let π be the proportion of all homicide victims who were pregnant at the time of the homicide; more specifically, π = pr(woman was pregnant at the time of the homicide | woman was a homicide victim). It can be assumed that women in NC function independently of one another with regard to homicide-related and pregnancy-related issues. Domestic violence researchers are interested in making statistical inferences about the true average (or expected value) of the number $Y_p$ of homicide victims who were pregnant at the time of the homicide and about the true average (or expected value) of the number $Y_{\bar{p}} = (Y - Y_p)$ of homicide victims who were not pregnant at the time of the homicide.
Find the conditional joint moment generating function $M_{Y_p, Y_{\bar{p}}}(s, t \mid Y = y) = M_{Y_p, (Y - Y_p)}(s, t \mid Y = y)$ of $Y_p$ and $Y_{\bar{p}} = (Y - Y_p)$ given $Y = y$, and then unconditionalize to determine the distributions of $Y_p$ and $Y_{\bar{p}}$. Are $Y_p$ and $Y_{\bar{p}}$ independent random variables? If $L$ has a known value, and if estimates $\hat{\lambda}$ and $\hat{\pi}$ of $\lambda$ and $\pi$ are available, provide reasonable estimates of $E(Y_p)$ and $E(Y_{\bar{p}})$.

Exercise 3.15. A chemical test for the presence of a fairly common protein in human blood produces a continuous measurement $X$. Let the random variable $D$ take the value 1 if a person’s blood contains the protein in question, and let $D$ take the value 0 if a person’s blood does not contain the protein in question. Among all those people carrying the protein, $X$ has a lognormal distribution with mean $E(X \mid D = 1) = 2.00$ and variance $V(X \mid D = 1) = 2.60$. Among all those people not carrying the protein, $X$ has a lognormal distribution with mean $E(X \mid D = 0) = 1.50$ and variance $V(X \mid D = 0) = 3.00$. In addition, it is known that 60% of all human beings actually carry this particular protein in their blood.

(a) If a person is randomly chosen and is given the chemical test, what is the numerical value of the probability that this person’s blood truly contains the protein in
question given that the event “$1.60 < X < 1.80$” has occurred (i.e., it is known that the observed value of $X$ for this person lies between 1.60 and 1.80)?

(b) Let the random variable $X$ be the value of the chemical test for a person chosen completely randomly. Provide numerical values for $E(X)$ and $V(X)$.

(c) Suppose that the following diagnostic rule is proposed: “classify a randomly chosen person as carrying the protein if $X > c$, and classify that person as not carrying the protein if $X \le c$, where $0 < c < \infty$.” Thus, a carrier for which $X \le c$ is misclassified, as is a noncarrier for which $X > c$. For this diagnostic rule, develop an expression (as a function of $c$) for the probability of misclassification $\theta$ of a randomly chosen human being, and then find the numerical value $c^*$ of $c$ that minimizes $\theta$. Comment on your finding.

Exercise 3.16. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ constitute a random sample of size $n$ from a bivariate population involving two random variables $X$ and $Y$, where $E(X) = \mu_x$, $E(Y) = \mu_y$, $V(X) = \sigma_x^2$, $V(Y) = \sigma_y^2$, and $\rho = \mathrm{corr}(X, Y)$. Show that the random variable
$$U = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$
has an expected value equal to the parametric function $\mathrm{cov}(X, Y) = \rho\sigma_x\sigma_y$.

Exercise 3.17. A certain large community in the United States receives its drinking water supply from a nearby lake, which itself is located in close proximity to a plant that uses benzene, among other chemicals, to manufacture styrene. Because this community has recently experienced elevated rates of leukemia, a blood cancer that has been associated with benzene exposure, the EPA decides to send a team to sample the drinking water used by this community and to determine whether or not this drinking water contains a benzene level exceeding the EPA standard of 5 parts of benzene per billion parts of water (i.e., a standard of 5 ppb). Suppose that the continuous random variable $X$ represents the measured benzene concentration in ppb in the drinking water used by this community, and assume that $X$ has a lognormal distribution. More specifically, assume that $Y = \ln(X)$ has a normal distribution with unknown mean $\mu$ and variance $\sigma^2 = 2$. The EPA decides to take $n = 10$ independently chosen drinking water samples and to measure the benzene concentration in each of these 10 drinking water samples. Based on the results of these 10 benzene concentration measurements (denoted $X_1, X_2, \ldots, X_{10}$), the EPA team has to decide whether the true mean benzene concentration in this community’s drinking water is in violation of the EPA standard (i.e., exceeds 5 ppb). Three decision rules are proposed:

Decision Rule #1: Decide that the drinking water is in violation of the EPA standard if at least 3 of the 10 benzene concentration measurements exceed 5 ppb.

Decision Rule #2: Decide that the drinking water is in violation of the EPA standard if the geometric mean of the 10 benzene concentration measurements exceeds 5 ppb, where
$$\bar{X}_g = \left( \prod_{i=1}^{10} X_i \right)^{1/10}.$$
Decision Rule #3: Decide that the drinking water is in violation of the EPA standard if the maximum of the 10 benzene concentration measurements, denoted $X_{(10)}$, exceeds 5 ppb.

(a) For each of these three different decision rules, develop, as a function of the unknown parameter $\mu$, a general expression for the probability of deciding that the drinking water is in violation of the EPA standard. Also, if $E(X) = 7$, find the numerical value of each of these three probabilities.

(b) For Decision Rule #2, examine expressions for $\mathrm{pr}(\bar{X}_g > 5)$ and $E(\bar{X}_g)$ to provide analytical arguments as to why Decision Rule #2 performs so poorly.

Exercise 3.18. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from a normal distribution with mean $\mu = 0$ and variance $\sigma^2 = 2$. Determine the smallest value of $n$, say $n^*$, such that $\mathrm{pr}[\min\{X_1^2, X_2^2, \ldots, X_n^2\} \le 0.002] \ge 0.80$. [HINT: If $Z \sim N(0, 1)$, then $Z^2 \sim \chi_1^2$.]

Exercise 3.19. A large hospital wishes to determine the appropriate number of coronary bypass grafts that it can perform during the upcoming calendar year based both on the size of its coronary bypass surgery staff (e.g., surgeons, nurses, anesthesiologists, technicians, etc.) and on other logistical and space considerations. National data suggest that a typical coronary bypass surgery patient would require exactly one (vessel) graft with probability $\pi_1 = 0.54$, would require exactly two grafts with probability $\pi_2 = 0.22$, would require exactly three grafts with probability $\pi_3 = 0.15$, and would require exactly four grafts with probability $\pi_4 = 0.09$. Further, suppose that it is known that this hospital cannot feasibly perform more than about 900 coronary bypass grafts in any calendar year.
(a) An administrator for this hospital suggests that it might be reasonable to perform coronary bypass surgery on $n = 500$ different patients during the upcoming calendar year and still have a reasonably high probability (say, $\ge 0.95$) of not exceeding the yearly upper limit of 900 coronary bypass grafts. Use the Central Limit Theorem to assess the reasonableness of this administrator’s suggestion.

(b) Provide a reasonable value for the largest number $n^*$ of patients that can undergo coronary bypass surgery at this hospital during the upcoming year so that, with probability at least equal to 0.95, no more than 900 grafts will need to be performed.

Exercise 3.20. For the $i$th of $k$ drug treatment centers $(i = 1, 2, \ldots, k)$ in a certain large U.S. city, suppose that the distribution of the number $X_i$ of adult male drug users that have to be tested until exactly one such adult drug user tests positively for HIV is assumed to be geometric, namely
$$p_{X_i}(x_i) = \pi(1 - \pi)^{x_i - 1}, \quad x_i = 1, 2, \ldots, +\infty; \quad 0 < \pi < 1.$$
In all that follows, assume that $X_1, X_2, \ldots, X_k$ constitute a set of mutually independent random variables.
(a) If $\pi = 0.05$ and if $S = \sum_{i=1}^{k} X_i$, provide a reasonable numerical value for $\mathrm{pr}(S > 1{,}100)$ if $k = 50$.
(b) Use moment generating function (MGF) theory to show that the distribution of the random variable $U = 2\pi S$ is, for small $\pi$, approximately chi-squared with $2k$ degrees of freedom.

(c) Use the result in part (b) to compute a numerical value for $\mathrm{pr}(S > 1{,}100)$ when $\pi = 0.05$ and $k = 50$, and then compare your answer to the one found in part (a).

Exercise 3.21. Let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ $(> 1)$ from the Pareto density function
$$f_Y(y; \theta) = \theta c^{\theta} y^{-(\theta + 1)}, \quad 0 < c < y < +\infty \text{ and } \theta > 0,$$
where $c$ is a known positive constant and where $\theta$ is an unknown parameter. The Pareto density function has been used to model the distribution of family incomes in certain populations. Consider the random variable
$$U_n = \theta n [Y_{(1)} - c]/c, \quad \text{where } Y_{(1)} = \min\{Y_1, Y_2, \ldots, Y_n\}.$$
Directly evaluate $\lim_{n \to \infty} F_{U_n}(u)$, where $F_{U_n}(u)$ is the CDF of $U_n$, to find the asymptotic distribution of $U_n$. In other words, derive an explicit expression for the CDF of $U$ when $U_n$ converges in distribution to $U$.

Exercise 3.22. For a certain laboratory experiment involving mice, suppose that the random variable $X$, $0 < X < 1$, represents the proportion of a fixed time period (in minutes) that it takes a mouse to locate food at the end of a maze, and further suppose that $X$ follows a uniform distribution on the interval $(0, 1)$, namely,
$$f_X(x) = 1, \quad 0 < x < 1.$$
Suppose that the experiment involves $n$ randomly chosen mice. Further, suppose that $x_1, x_2, \ldots, x_n$ are the $n$ realized values (i.e., the $n$ observed proportions) of the $n$ mutually independent random variables $X_1, X_2, \ldots, X_n$, which themselves can be considered to constitute a random sample of size $n$ from $f_X(x)$. Let the random variable $U$ be the smallest proportion based on the shortest time required for a mouse to locate the food, and let the random variable $V$ be the proportion of the fixed time period still remaining based on the longest time required for a mouse to locate the food.

(a) Find an explicit expression for the joint distribution of the random variables $U$ and $V$.

(b) Let $R = nU$ and let $S = nV$. Find the asymptotic joint distribution of $R$ and $S$. [HINT: Evaluate $\lim_{n \to \infty} \mathrm{pr}[(R > r) \cap (S > s)]$.]

Exercise 3.23. Suppose that the total cost $C$ (in millions of dollars) for repairs due to floods occurring in the United States in any particular year can be modeled by defining
the random variable $C$ as follows:
$$C = 0 \text{ if } X = 0 \quad \text{and} \quad C = \sum_{j=1}^{X} C_j \text{ if } X > 0;$$
here, the number of floods $X$ in any particular year in the United States is assumed to have a Poisson distribution with mean $E(X) = \lambda$, and $C_j$ is the cost (in millions of dollars) for repairs due to the $j$th flood in that particular year. Also, it is assumed that $C_1, C_2, \ldots$ are i.i.d. random variables, each with the same expected value $\mu$, the same variance $\sigma^2$, and the same moment generating function $M(t) = E(e^{tC_j})$. Note that the actual distribution of the random variables $C_1, C_2, \ldots$ has not been specified.

(a) Develop an explicit expression for $\mathrm{corr}(X, C)$, the correlation between the random variables $X$ and $C$, and then comment on the structure of the expression that you obtained.

(b) Develop an explicit expression for $M_C(t) = E(e^{tC})$, the moment generating function of the random variable $C$, and then use this result to find $E(C)$.

Exercise 3.24. To evaluate the performance of a new cholesterol-lowering drug, a large drug company plans to enlist a randomly chosen set of $k$ private medical practices to help conduct a clinical trial. Under the protocol proposed by the drug company, each private medical practice is to enroll into the clinical trial a set of $n$ randomly chosen subjects with high cholesterol. The cholesterol level (in mg/dL) of each subject is to be measured both before taking the new drug and after taking the new drug on a daily basis for 6 months. The continuous response variable of interest is $Y$, the change in a subject’s cholesterol level over the 6-month period. The following statistical model will be used:
$$Y_{ij} = \mu + \beta_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, k \text{ and } j = 1, 2, \ldots, n.$$
Here, $\mu$ is the average change in cholesterol level for a typical subject with high cholesterol who takes this new cholesterol-lowering drug on a daily basis for a 6-month period, $\beta_i$ is the random effect associated with the $i$th private medical practice, and $\epsilon_{ij}$ is the random effect associated with the $j$th subject in the $i$th private medical practice. Here, it is assumed that $\beta_i \sim N(0, \sigma_\beta^2)$, that $\epsilon_{ij} \sim N(0, \sigma_\epsilon^2)$, and that the sets $\{\beta_i\}$ and $\{\epsilon_{ij}\}$ constitute a group of $(k + kn)$ mutually independent random variables. Finally, let
$$\bar{Y} = (kn)^{-1} \sum_{i=1}^{k} \sum_{j=1}^{n} Y_{ij}$$
be the overall sample mean.

(a) Develop explicit expressions for $E(\bar{Y})$ and $V(\bar{Y})$.

(b) Suppose that it will cost $D_c$ dollars for each clinic to enroll and monitor $n$ subjects over the duration of the proposed clinical trial, and further suppose that each subject is to be paid $D_p$ dollars for participating in the clinical trial. Thus, the
total cost of the clinical trial is equal to $C = (kD_c + knD_p)$. Suppose that this drug company can only afford to spend $C^*$ dollars to conduct the proposed clinical trial. Find specific expressions for $n^*$ and $k^*$, the specific values of $n$ and $k$ that minimize the variance of $\bar{Y}$ subject to the condition that $C = (kD_c + knD_p) = C^*$.

(c) If $C^* = 100{,}000$, $D_c = 10{,}000$, $D_p = 100$, $\sigma_\beta^2 = 4$, and $\sigma_\epsilon^2 = 9$, find appropriate numerical values for $n^*$ and $k^*$.

Exercise 3.25. For $i = 1, 2$, suppose that the conditional distribution of $Y_i$ given that $Y_3 = y_3$ is
$$p_{Y_i}(y_i \mid Y_3 = y_3) = y_3^{y_i} e^{-y_3}/y_i!, \quad y_i = 0, 1, \ldots, \infty.$$
Further, assume that the random variable $Y_3$ has the truncated Poisson distribution
$$p_{Y_3}(y_3) = \frac{\lambda_3^{y_3}}{y_3!(e^{\lambda_3} - 1)}, \quad y_3 = 1, 2, \ldots, \infty \text{ and } \lambda_3 > 0;$$
and, also assume that the random variables $Y_1$ and $Y_2$ are conditionally independent given that $Y_3 = y_3$. Then, consider the random variables
$$R = (Y_1 + Y_3) \quad \text{and} \quad S = (Y_2 + Y_3).$$
Derive an explicit expression for the moment generating function $M_U(t)$ of the random variable $U = (R + S)$, and then use $M_U(t)$ directly to find an explicit expression for $E(U)$. Verify that your expression for $E(U)$ is correct by finding $E(U)$ directly.

Exercise 3.26. Suppose that $n$ $(> 1)$ balls are randomly tossed into $C$ $(> 1)$ cells, so that the probability is $1/C$ of any ball ending up in the $i$th cell, $i = 1, 2, \ldots, C$. Find the expected value and the variance of the number $X$ of cells that will end up being empty (i.e., that will contain no balls). For the special case when $C = 6$ and $n = 5$, find the numerical values of $E(X)$ and $V(X)$.

Exercise 3.27. A researcher at the Federal Highway Administration (FHWA) proposes the following statistical model for traffic fatalities. Let the random variable $N$ be the number of automobile accidents occurring on a given stretch of heavily traveled interstate highway over a specified time period. For $i = 1, 2, \ldots, N$, let the random variable $Y_i$ take the value 1 if the $i$th automobile accident involved at least one fatality, and let $Y_i$ take the value 0 otherwise. Let $\mathrm{pr}(Y_i = 1) = \pi$, $0 < \pi < 1$, and further assume that the $\{Y_i\}$ are mutually independent dichotomous random variables. Also, let the random variable $N$ have the geometric distribution
$$p_N(n) = \theta(1 - \theta)^{n-1}, \quad n = 1, \ldots, \infty; \quad 0 < \theta < 1.$$
This researcher is interested in the random variable
$$T = Y_1 + Y_2 + \cdots + Y_N,$$
the total number of automobile accidents involving fatalities on that stretch of interstate highway during the specified time period.

(a) Find explicit expressions for $E(T)$, $V(T)$, and $\mathrm{corr}(N, T)$.

(b) Find an explicit expression for $\mathrm{pr}(T = 0)$.

Exercise 3.28. Let $X_1, X_2, \ldots, X_m$ constitute a random sample of size $m$ from a POI($\lambda_1$) population, and let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ from a POI($\lambda_2$) population. Consider the random variable
$$U = (\bar{X} - \bar{Y}) = m^{-1} \sum_{i=1}^{m} X_i - n^{-1} \sum_{i=1}^{n} Y_i.$$
(a) Find explicit expressions for $E(U)$ and $V(U)$.

(b) Use the Lagrange multiplier method to find expressions (which are functions of $N$, $\lambda_1$, and $\lambda_2$) for $m$ and $n$ that minimize $V(U)$ subject to the restriction $(m + n) = N$, where $N$ is the total sample size that can be selected from these two Poisson populations due to cost considerations. Provide an interpretation for your findings. If $N = 60$, $\lambda_1 = 2$, and $\lambda_2 = 8$, use these expressions to find numerical values for $m$ and $n$.

Exercise 3.29*. Let the discrete random variables $X$ and $Y$ denote the numbers of AIDS cases that will be detected yearly in two different NC counties, one in the eastern part of the state and the other in the western part of the state. Further, assume that $X$ and $Y$ are independent random variables, and that they have the respective distributions
$$p_X(x) = (1 - \pi_x)\pi_x^{x}, \quad x = 0, 1, \ldots, \infty, \quad 0 < \pi_x < 1$$
and
$$p_Y(y) = (1 - \pi_y)\pi_y^{y}, \quad y = 0, 1, \ldots, \infty, \quad 0 < \pi_y < 1.$$

(a) Derive an explicit expression for $\theta = \mathrm{pr}(X = Y)$.

(b) The absolute difference in the numbers of AIDS cases that will be detected yearly in both counties is the random variable $U = |X - Y|$. Derive an explicit expression for $p_U(u)$, the probability distribution of the random variable $U$.

(c) For a particular year, suppose that the observed values of $X$ and $Y$ are $x = 9$ and $y = 7$. Provide a quantitative answer regarding the question of whether or not these observed values of $X$ and $Y$ provide statistical evidence that $\pi_x \ne \pi_y$. For your calculations, you may assume that $\pi_x \le 0.10$ and that $\pi_y \le 0.10$.

Exercise 3.30*. Let the random variable $Y$ denote the number of Lyme disease cases that develop in the state of NC during any one calendar year. The event $Y = 0$ is not observable since the observational apparatus (i.e., diagnosis) is activated only when $Y > 0$. Since Lyme disease is a rare disease, it seems appropriate to model the distribution of $Y$ by the zero-truncated Poisson distribution (ZTPD)
$$p_Y(y) = \left(e^{\theta} - 1\right)^{-1} \frac{\theta^{y}}{y!}, \quad y = 1, 2, \ldots, \infty,$$
where $\theta$ $(> 0)$ is called the “incidence parameter.”

(a) Find an explicit expression for $\psi(t) = E\left[(t + 1)^{Y}\right]$.

(b) Use $\psi(t)$ to show that
$$E(Y) = \frac{\theta e^{\theta}}{(e^{\theta} - 1)} \quad \text{and} \quad V(Y) = \frac{\theta e^{\theta}(e^{\theta} - \theta - 1)}{(e^{\theta} - 1)^2}.$$

(c) To lower the incidence of Lyme disease in NC, the state health department mounts a vigorous media campaign to educate NC residents about all aspects of Lyme disease (including information about preventing and dealing with tick bites, using protective measures such as clothing and insect repellents, recognizing symptoms of Lyme disease, treating Lyme disease, etc.). Assume that this media campaign has the desired effect of lowering $\theta$ to $\pi\theta$, where $0 < \pi < 1$. Let $Z$ be the number of Lyme disease cases occurring during a 1-year period after the media campaign is over. Assume that
$$p_Z(z) = \frac{(\pi\theta)^{z} e^{-\pi\theta}}{z!}, \quad z = 0, 1, \ldots, \infty,$$
and that $Y$ and $Z$ are independent random variables. There is interest in the random variable $X = (Y + Z)$, the total number of Lyme disease cases that occur altogether (namely, 1 year before and 1 year after the media campaign). Find an explicit expression for
$$p_X(x) = \mathrm{pr}(X = x) = \mathrm{pr}[(Y + Z) = x], \quad x = 1, 2, \ldots, \infty.$$

(d) Find $E(X)$ and $V(X)$.

Exercise 3.31*. For patients receiving a double kidney transplant, let $X_i$ be the lifetime (in months) of the $i$th kidney, $i = 1, 2$. Also, assume that the density function of $X_i$ is negative exponential with mean $\alpha^{-1}$, namely,
$$f_{X_i}(x_i) = \alpha e^{-\alpha x_i}, \quad x_i > 0, \; \alpha > 0, \; i = 1, 2,$$
and further assume that $X_1$ and $X_2$ are independent random variables. As soon as one of the two kidneys fails, the lifetime $Y$ (in months) of the remaining functional kidney follows the conditional density function
$$f_Y(y \mid U = u) = \beta e^{-\beta(y - u)}, \quad 0 < u < y < \infty, \; \beta > 2\alpha,$$
where $U = \min(X_1, X_2)$.

(a) Show that the probability that both organs are still functioning at time $t$ is equal to
$$\pi_2(t) = e^{-2\alpha t}, \quad t \ge 0.$$
(b) Show that the probability that exactly one organ is still functioning at time $t$ is equal to
$$\pi_1(t) = \frac{2\alpha}{(\beta - 2\alpha)}\left[e^{-2\alpha t} - e^{-\beta t}\right], \quad t \ge 0.$$

(c) Using the results in parts (a) and (b), develop an explicit expression for $f_T(t)$, the density function of the length of life $T$ (in months) of the two-kidney system [i.e., $T$ is the length of time (in months) until both kidneys have failed].

(d) Develop an explicit expression for the marginal distribution $f_Y(y)$ of the random variable $Y$. How are the random variables $T$ and $Y$ related? Also, find explicit expressions for the expected value and variance of the length of life of the two-kidney system.

Exercise 3.32*. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ $(> 3)$ from a $N(\mu, \sigma^2)$ parent population. Further, define
$$\bar{X} = n^{-1} \sum_{i=1}^{n} X_i, \quad S^2 = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \quad \text{and} \quad T_{(n-1)} = \frac{\bar{X} - \mu}{S/\sqrt{n}}.$$

(a) Develop an explicit expression for $\mathrm{corr}[\bar{X}, T_{(n-1)}]$. Find the numerical value of this correlation when $n = 4$ and when $n = 6$.

(b) Using the fact that $\Gamma(x) \approx \sqrt{2\pi}\, e^{-x} x^{(x - 1/2)}$ for large $x$, find the limiting value of $\mathrm{corr}[\bar{X}, T_{(n-1)}]$ as $n \to \infty$, and then interpret this limit in a meaningful way.

Exercise 3.33*. Suppose that there are three identical-looking dice. Two of these three dice are perfectly balanced, so that the probability is $\frac{1}{6}$ of obtaining any one of the six numbers 1, 2, 3, 4, 5, and 6. The third die is unbalanced. For this unbalanced die, the probability of obtaining a 1 is equal to $\left(\frac{1}{6} - \epsilon\right)$ and the probability of obtaining a 6 is equal to $\left(\frac{1}{6} + \epsilon\right)$, where $\epsilon$, $0 < \epsilon < \frac{1}{6}$, has a known value; for this unbalanced die, the probability is $\frac{1}{6}$ of obtaining any of the remaining numbers 2, 3, 4, and 5. In a simple attempt to identify which of these dice is the unbalanced one, it is decided that each of the three dice will be tossed $n$ times, and then that die producing the smallest number of ones in $n$ tosses will be identified as the unbalanced die. Develop an expression (which may involve summation signs) that can be used to find the minimum value of $n$ (say, $n^*$) required so that the probability of correctly identifying the unbalanced die will be at least 0.99.

Exercise 3.34*.

(a) If $X_1$ and $X_2$ are i.i.d. random variables, each with the same CDF
$$F_{X_i}(x_i) = \exp(-e^{-x_i}), \quad -\infty < x_i < +\infty, \; i = 1, 2,$$
prove that the random variable $Y = (X_1 - X_2)$ has CDF
$$F_Y(y) = (1 + e^{-y})^{-1}, \quad -\infty < y < +\infty.$$
(b) In extreme value theory, under certain validating conditions, the largest observation $X_{(n)}$ in a random sample $X_1, X_2, \ldots, X_n$ of size $n$ has a CDF which can be approximated for large $n$ by the expression
$$F_{X_{(n)}}(x_{(n)}) = \exp\{-\exp[-n\theta(x_{(n)} - \beta)]\}.$$
The parameters $\theta$ $(\theta > 0)$ and $\beta$ $(-\infty < \beta < +\infty)$ depend on the structure of the population being sampled. Using this large-sample approximation and the result from part (a), find an explicit expression for a random variable $U = g[X_{1(m)}, X_{2(m)}]$ such that $\mathrm{pr}(\theta \le U) = (1 - \alpha)$, $0 < \alpha < 1$. Assume that there is a random sample of size $2m$ ($m$ large) available that has been selected from a population of unspecified structure satisfying the validating conditions, and consider the random variable
$$m\theta\left[X_{1(m)} - \beta\right] - m\theta\left[X_{2(m)} - \beta\right],$$
where $X_{1(m)}$ is the largest observation in the first set of $m$ observations and where $X_{2(m)}$ is the largest observation in the second set of $m$ observations.

Exercise 3.35*. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from the density function $f_X(x)$, $-\infty < x < +\infty$.

(a) For $i = 1, 2, \ldots, n$, let $U_i = F_X(X_i)$, where $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$. Find the distribution of the random variable $U_i$. The transformation $U_i = F_X(X_i)$ is called the Probability Integral Transformation.

(b) Let $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ be the $n$ order statistics corresponding to the i.i.d. random variables $U_1, U_2, \ldots, U_n$. For $1 \le r < s \le n$, prove that the random variable
$$V_{rs} = [U_{(s)} - U_{(r)}] \sim \mathrm{BETA}(\alpha = s - r, \; \beta = n - s + r + 1).$$

(c) For $0 < \theta < 1$ and $0 < p < 1$, consider the probability statement
$$\theta = \mathrm{pr}(V_{rs} \ge p) = \mathrm{pr}\{[U_{(s)} - U_{(r)}] \ge p\} = \mathrm{pr}\left[F_X(X_{(s)}) - F_X(X_{(r)}) \ge p\right].$$
The random interval $[X_{(r)}, X_{(s)}]$ is referred to as a $100\theta$ percent tolerance interval for the density function $f_X(x)$. More specifically, this random interval has probability $\theta$ of containing at least a proportion $p$ of the total area (equal to 1) under $f_X(x)$, regardless of the particular structure of $f_X(x)$.
As an example, find the numerical value of $\theta$ when $n = 10$, $r = 1$, $s = 10$, and $p = 0.80$.

Exercise 3.36*. Clinical studies where several clinics participate, using a standardized protocol, in the evaluation of new drug therapies have become quite common. In what follows, assume that a statistical design is being used for which patients who meet protocol requirements are each randomly assigned to one of $t$ new drug therapies and to one of $c$ clinics, where the $c$ clinics participating in a particular study can be considered to represent a random sample from a conceptually very large population of clinics that might use the new drug therapies.
For $i = 1, 2, \ldots, t$, $j = 1, 2, \ldots, c$, and $k = 1, 2, \ldots, n_{ij}$, consider the linear model
$$Y_{ijk} = \mu_i + \beta_j + \gamma_{ij} + \epsilon_{ijk},$$
where $Y_{ijk}$ is a continuous random variable representing the response to the $i$th drug therapy of the $k$th patient at the $j$th clinic, $\mu_i$ is the fixed average effect of the $i$th drug therapy, $\beta_j$ is a random variable representing the random effect of the $j$th clinic, $\gamma_{ij}$ is a random variable representing the random effect due to the interaction between the $i$th drug therapy and the $j$th clinic, and $\epsilon_{ijk}$ is a random variable representing the random effect of the $k$th patient receiving the $i$th drug therapy at the $j$th clinic. The random variables $\beta_j$, $\gamma_{ij}$, and $\epsilon_{ijk}$ are assumed to be mutually independent random variables for all $i$, $j$, and $k$, each with an expected value equal to 0 and with respective variances equal to $\sigma_\beta^2$, $\sigma_\gamma^2$, and $\sigma_\epsilon^2$.

(a) Develop an explicit expression for $V(Y_{ijk})$, the variance of $Y_{ijk}$.

(b) Develop an explicit expression for the covariance between the responses of two different patients receiving the same drug therapy at the same clinic.

(c) Develop an explicit expression for the covariance between the responses of two different patients receiving different drug therapies at the same clinic.

(d) For $i = 1, 2, \ldots, t$, let $\bar{Y}_{ij} = n_{ij}^{-1} \sum_{k=1}^{n_{ij}} Y_{ijk}$ be the mean of the $n_{ij}$ responses for patients receiving drug therapy $i$ at clinic $j$. Develop explicit expressions for $E(\bar{Y}_{ij})$, for $V(\bar{Y}_{ij})$, and for $\mathrm{cov}(\bar{Y}_{ij}, \bar{Y}_{i'j})$ when $i \ne i'$.

(e) Let
$$L = \sum_{i=1}^{t} a_i \bar{Y}_i,$$
where the $\{a_i\}_{i=1}^{t}$ are a set of known constants satisfying the constraint $\sum_{i=1}^{t} a_i = 0$ and where $\bar{Y}_i = c^{-1} \sum_{j=1}^{c} \bar{Y}_{ij}$. Develop explicit general expressions for $E(L)$ and for $V(L)$. For the special case when $a_1 = +1$, $a_2 = -1$, $a_3 = a_4 = \cdots = a_t = 0$, how do the general expressions for $E(L)$ and $V(L)$ simplify? More generally, comment on why $L$ can be considered to be an important random variable when analyzing data from multicenter clinical studies that simultaneously evaluate several drug therapies.

Exercise 3.37*. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ $(> 1)$ from a parent population of unspecified structure, where $E(X_i) = \mu$, $V(X_i) = \sigma^2$, and $E[(X_i - \mu)^4] = \mu_4$, $i = 1, 2, \ldots, n$. Define the sample mean and the sample variance, respectively, as
$$\bar{X} = n^{-1} \sum_{i=1}^{n} X_i \quad \text{and} \quad S^2 = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

(a) Prove that
$$V(S^2) = \frac{1}{n}\left[\mu_4 - \left(\frac{n-3}{n-1}\right)\sigma^4\right].$$
(b) How does the general expression in part (a) simplify if the parent population is POI($\lambda$) and if the parent population is $N(\mu, \sigma^2)$?

Exercise 3.38*. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from a parent population of unspecified structure, where $E(X_i) = \mu$, $V(X_i) = \sigma^2$, and $E[(X_i - \mu)^3] = \mu_3$, $i = 1, 2, \ldots, n$. Define the sample mean and the sample variance, respectively, as
$$\bar{X} = n^{-1} \sum_{i=1}^{n} X_i \quad \text{and} \quad S^2 = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

(a) Show that $\mathrm{cov}(\bar{X}, S^2)$ can be written as an explicit function of $n$ and $\mu_3$.

(b) Suppose that $X_1$ and $X_2$ constitute a random sample of size $n = 2$ from the parent population
$$p_X(x) = \left(\frac{1}{4}\right)^{|x|} \left(\frac{1}{2}\right)^{1 - |x|}, \quad x = -1, 0, 1.$$
Show directly that $\mathrm{cov}(\bar{X}, S^2) = 0$, but that $\bar{X}$ and $S^2$ are dependent random variables. Comment on this finding relative to the general result developed in part (a).
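As a numerical sanity check for Exercise 3.38(b), the following sketch enumerates all outcomes of $(X_1, X_2)$ with exact fractions. It assumes that the pmf assigns probability 1/4 to each of $x = -1$ and $x = 1$ and probability 1/2 to $x = 0$; the names `joint`, `cov`, and `independent` are illustrative only, and the enumeration is a check rather than the analytical argument the exercise asks for.

```python
from fractions import Fraction
from itertools import product

# Assumed pmf: pr(X = -1) = pr(X = 1) = 1/4 and pr(X = 0) = 1/2.
pmf = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}

# Joint distribution of (Xbar, S^2) for a random sample of size n = 2.
joint = {}
for (x1, p1), (x2, p2) in product(pmf.items(), repeat=2):
    xbar = Fraction(x1 + x2, 2)
    s2 = Fraction((x1 - x2) ** 2, 2)  # S^2 = (X1 - X2)^2 / 2 when n = 2
    joint[(xbar, s2)] = joint.get((xbar, s2), Fraction(0)) + p1 * p2

e_xbar = sum(p * xb for (xb, _), p in joint.items())
e_s2 = sum(p * s for (_, s), p in joint.items())
e_prod = sum(p * xb * s for (xb, s), p in joint.items())
cov = e_prod - e_xbar * e_s2

# Marginal pmfs, used to test whether the joint pmf factorizes.
p_xbar, p_s2 = {}, {}
for (xb, s), p in joint.items():
    p_xbar[xb] = p_xbar.get(xb, Fraction(0)) + p
    p_s2[s] = p_s2.get(s, Fraction(0)) + p

independent = all(
    joint.get((xb, s), Fraction(0)) == p_xbar[xb] * p_s2[s]
    for xb in p_xbar
    for s in p_s2
)

print("cov(Xbar, S^2) =", cov)
print("joint pmf factorizes:", independent)
```

Exact rational arithmetic is used so that a zero covariance is reported as exactly zero rather than as a small floating-point residual.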
SOLUTIONS

Solution 3.1

(a) The joint distribution of $X$ and $Y$ is
$$p_{X,Y}(x, y) = \mathrm{pr}(X = x)\,\mathrm{pr}(Y = y \mid X = x) = \left(\frac{1}{N}\right)\left(\frac{1}{N-1}\right), \quad x = 1, 2, \ldots, N \text{ and } y = 1, 2, \ldots, N \text{ with } x \ne y.$$
Hence, the marginal distribution of $X$ is
$$p_X(x) = \sum_{\text{all } y,\, y \ne x} p_{X,Y}(x, y) = \frac{(N-1)}{N(N-1)} = \frac{1}{N}, \quad x = 1, 2, \ldots, N.$$
Analogously, the marginal distribution of $Y$ is
$$p_Y(y) = \frac{1}{N}, \quad y = 1, 2, \ldots, N.$$
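(b)
$$\mathrm{pr}[X \ge (N-1) \mid Y = y] = \frac{\mathrm{pr}\{[X \ge (N-1)] \cap (Y = y)\}}{\mathrm{pr}(Y = y)} = \frac{\mathrm{pr}[(X = N-1) \cap (Y = y)] + \mathrm{pr}[(X = N) \cap (Y = y)]}{1/N}$$
$$= \begin{cases} \dfrac{2}{N-1}, & y = 1, 2, \ldots, (N-2); \\[1ex] \dfrac{1}{N-1}, & y = (N-1), N. \end{cases}$$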
(c)
$$E(X) = E(Y) = \frac{1}{N} \sum_{i=1}^{N} i = \frac{N(N+1)/2}{N} = \frac{(N+1)}{2}.$$
$$V(X) = V(Y) = \frac{1}{N} \sum_{i=1}^{N} i^2 - \left[\frac{(N+1)}{2}\right]^2 = \frac{N(N+1)(2N+1)/6}{N} - \frac{(N+1)^2}{4} = \frac{(N^2 - 1)}{12}.$$
Now, $p_Y(y \mid X = x) = 1/(N-1)$, $y = 1, 2, \ldots, N$ with $y \ne x$. So,
$$E(Y \mid X = x) = \left[\sum_{y=1}^{N} \frac{y}{(N-1)}\right] - \frac{x}{(N-1)} = \frac{N(N+1)}{2(N-1)} - \frac{x}{(N-1)}.$$
So,
$$E(XY) = E_x\{E(XY \mid X = x)\} = E_x\{x E(Y \mid X = x)\} = E_x\left\{\frac{N(N+1)}{2(N-1)}x - \frac{x^2}{(N-1)}\right\}$$
$$= \frac{N(N+1)}{2(N-1)}E(X) - \frac{E(X^2)}{(N-1)} = \frac{N(N+1)^2}{4(N-1)} - \frac{(N^2-1)/12 + (N+1)^2/4}{(N-1)} = \frac{(N+1)(3N+2)}{12}.$$
So,
$$\mathrm{cov}(X, Y) = \frac{(N+1)(3N+2)}{12} - \frac{(N+1)^2}{4} = \frac{-(N+1)}{12}.$$
Hence,
$$\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{V(X)V(Y)}} = \frac{-(N+1)/12}{\sqrt{[(N^2-1)/12][(N^2-1)/12]}} = \frac{-1}{(N-1)}.$$
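The closed form $\mathrm{corr}(X, Y) = -1/(N-1)$ can be checked by brute-force enumeration for small $N$. The sketch below assumes, as the joint distribution in part (a) indicates, that each ordered pair $(x, y)$ with $x \ne y$ has probability $1/[N(N-1)]$; the function name `corr_two_draws` is illustrative only.

```python
from fractions import Fraction


def corr_two_draws(N):
    """Exact corr(X, Y) when every ordered pair x != y from {1, ..., N}
    has joint probability 1 / (N (N - 1))."""
    p = Fraction(1, N * (N - 1))
    pairs = [(x, y) for x in range(1, N + 1)
             for y in range(1, N + 1) if x != y]
    e_x = sum(p * x for x, _ in pairs)
    e_y = sum(p * y for _, y in pairs)
    e_xy = sum(p * x * y for x, y in pairs)
    v_x = sum(p * x * x for x, _ in pairs) - e_x ** 2
    v_y = sum(p * y * y for _, y in pairs) - e_y ** 2
    cov = e_xy - e_x * e_y
    # V(X) = V(Y) here, so sqrt(V(X) V(Y)) = V(X) and the ratio stays exact.
    assert v_x == v_y
    return cov / v_x


for N in (2, 5, 10):
    print(N, corr_two_draws(N))
```

Working with exact fractions avoids square roots and floating-point error, so each enumerated value can be compared directly with $-1/(N-1)$.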
As $N \to \infty$, $\mathrm{corr}(X, Y) \to 0$ as expected, since the population of balls is becoming infinitely large.

Solution 3.2

(a) The most direct approach is to list all the $2^5 = 32$ possible sequences and their associated individual probabilities of occurring. If we let
$$\pi_{ml} = \mathrm{pr}[(M_5 = m) \cap (L_5 = l)], \quad m = 0, 1, \ldots, 5 \text{ and } l = 0, 1, \ldots, 5,$$
it then follows directly that
$\pi_{00} = (1 - \pi)^5$,
$\pi_{11} = [\pi^3(1 - \pi)^2 + 6\pi^2(1 - \pi)^3 + 5\pi(1 - \pi)^4]$,
$\pi_{22} = [\pi^4(1 - \pi) + 4\pi^2(1 - \pi)^3]$,
$\pi_{33} = 3\pi^3(1 - \pi)^2$,
$\pi_{44} = 2\pi^4(1 - \pi)$,
$\pi_{55} = \pi^5$,
$\pi_{12} = 6\pi^3(1 - \pi)^2$,
$\pi_{13} = 2\pi^4(1 - \pi)$,
and $\pi_{ml} = 0$ otherwise.

(b) With $p_{L_5}(l) = \mathrm{pr}(L_5 = l) = \pi_l$, $l = 0, 1, \ldots, 5$, then
$\pi_0 = (1 - \pi)^5$,
$\pi_1 = [\pi^3(1 - \pi)^2 + 6\pi^2(1 - \pi)^3 + 5\pi(1 - \pi)^4]$,
$\pi_2 = [\pi^4(1 - \pi) + 4\pi^2(1 - \pi)^3 + 6\pi^3(1 - \pi)^2]$,
$\pi_3 = [3\pi^3(1 - \pi)^2 + 2\pi^4(1 - \pi)]$,
$\pi_4 = 2\pi^4(1 - \pi)$,
and $\pi_5 = \pi^5$.
Thus,
$$E(L_5) = \sum_{l=0}^{5} l\pi_l = 5\pi(1 - \pi)^4 + 14\pi^2(1 - \pi)^3 + 22\pi^3(1 - \pi)^2 + 16\pi^4(1 - \pi) + 5\pi^5.$$
When $\pi = 0.90$, then $E(L_5) = 4.1744$. For further details, see Makri, Philippou, and Psillakis (2007).
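The enumeration described in part (a) is easy to mechanize. The sketch below assumes, as the listed probabilities suggest, that $L_5$ is the length of the longest run of successes in five independent trials with success probability $\pi$; the function names are illustrative only. It compares the brute-force expectation over all 32 sequences with the polynomial expression for $E(L_5)$.

```python
from itertools import product


def longest_run(seq):
    """Length of the longest run of 1s in a 0/1 sequence."""
    best = cur = 0
    for s in seq:
        cur = cur + 1 if s == 1 else 0
        best = max(best, cur)
    return best


def expected_longest_run(pi, n=5):
    """E(L_n) by enumerating all 2**n success/failure sequences."""
    total = 0.0
    for seq in product((0, 1), repeat=n):
        k = sum(seq)  # number of successes in this sequence
        total += longest_run(seq) * pi ** k * (1 - pi) ** (n - k)
    return total


def closed_form(pi):
    """Polynomial expression for E(L_5) from part (b)."""
    q = 1 - pi
    return (5 * pi * q ** 4 + 14 * pi ** 2 * q ** 3
            + 22 * pi ** 3 * q ** 2 + 16 * pi ** 4 * q + 5 * pi ** 5)


print(expected_longest_run(0.90), closed_form(0.90))
```

Both values should agree to floating-point precision for any $\pi$ in $(0, 1)$, which also confirms the coefficients 5, 14, 22, 16, and 5.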
Solution 3.3

(a) Since $E(U) = (n+1)/2$, it follows from conditional expectation theory that $E(X) = E_u[E(X \mid U = u)] = E_u(u) = E(U) = (n+1)/2$. Completely analogously, $E(Y) = (n+1)/2$. Also, $V(X \mid U = u) = (1 - u^{-1})/(u^{-1})^2 = u(u-1)$, so that
$$V(X) = V_u[E(X \mid U = u)] + E_u[V(X \mid U = u)] = V_u(u) + E_u[u(u-1)] = V(U) + E(U^2) - E(U) = 2V(U) + [E(U)]^2 - E(U) = \frac{5}{12}(n^2 - 1).$$
Completely analogously, $V(Y) = \frac{5}{12}(n^2 - 1)$. Also,
$$E(XY) = E_u[E(XY \mid U = u)] = E_u[E(X \mid U = u)E(Y \mid U = u)] = E_u(u^2) = V(U) + [E(U)]^2 = (n+1)(2n+1)/6.$$
Thus, based on the above results,
$$\mathrm{corr}(X, Y) = \frac{E(XY) - E(X)E(Y)}{\sqrt{V(X)V(Y)}} = \frac{1}{5}.$$
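(b) Now,
$$\mathrm{pr}(X \ne Y) = 1 - \mathrm{pr}(X = Y) = 1 - \sum_{u=1}^{n} \mathrm{pr}(X = Y \mid U = u)\,\mathrm{pr}(U = u) = 1 - \frac{1}{n} \sum_{u=1}^{n} \mathrm{pr}(X = Y \mid U = u).$$
And,
$$\mathrm{pr}(X = Y \mid U = u) = \sum_{k=1}^{\infty} \mathrm{pr}(X = k \mid U = u)\,\mathrm{pr}(Y = k \mid U = u) = \sum_{k=1}^{\infty} \left[u^{-1}(1 - u^{-1})^{k-1}\right]\left[u^{-1}(1 - u^{-1})^{k-1}\right]$$
$$= u^{-2} \sum_{k=1}^{\infty} \left[(1 - u^{-1})^2\right]^{k-1} = u^{-2}\left[\frac{1}{1 - (1 - u^{-1})^2}\right] = \frac{1}{2u - 1}.$$
So,
$$\mathrm{pr}(X \ne Y) = 1 - \frac{1}{n} \sum_{u=1}^{n} \frac{1}{2u - 1}.$$
And, when $n = 4$, $\mathrm{pr}(X \ne Y) = 0.581$.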
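A short computation reproduces the numerical value for $n = 4$ and checks the geometric-series step. The sketch assumes the setup used in this solution, namely that, given $U = u$, the variables $X$ and $Y$ are independent geometric random variables with success probability $u^{-1}$ and that $U$ is uniform on $\{1, \ldots, n\}$; the function names are illustrative only.

```python
from fractions import Fraction


def prob_unequal(n):
    """pr(X != Y) = 1 - (1/n) * sum over u of 1 / (2u - 1), computed exactly."""
    return 1 - Fraction(1, n) * sum(
        Fraction(1, 2 * u - 1) for u in range(1, n + 1)
    )


def prob_equal_given_u(u, terms=2000):
    """Truncated series for pr(X = Y | U = u) under two independent
    geometric variables with success probability 1/u."""
    p = 1.0 / u
    return sum((p * (1.0 - p) ** (k - 1)) ** 2 for k in range(1, terms + 1))


print(prob_unequal(4), float(prob_unequal(4)))
for u in range(1, 5):
    print(u, prob_equal_given_u(u), 1 / (2 * u - 1))
```

For $n = 4$ the exact value is $61/105$, which rounds to the 0.581 quoted above.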
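Solution 3.4

(a) First, the conditional density function of $T_\nu$ given $U = u$ is $N(0, \nu/u)$. Since $U \sim \chi_\nu^2 = \mathrm{GAMMA}\left(\alpha = 2, \beta = \frac{\nu}{2}\right)$, we have
$$f_{T_\nu}(t_\nu) = \int_0^\infty f_{T_\nu, U}(t_\nu, u)\, du = \int_0^\infty f_{T_\nu}(t_\nu \mid U = u) f_U(u)\, du$$
$$= \int_0^\infty \frac{u^{1/2}}{\sqrt{2\pi\nu}} e^{-u t_\nu^2/2\nu} \cdot \frac{u^{\frac{\nu}{2} - 1} e^{-u/2}}{\Gamma\left(\frac{\nu}{2}\right) 2^{\nu/2}}\, du$$
$$= \frac{1}{\sqrt{2\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right) 2^{\nu/2}} \int_0^\infty u^{\frac{\nu+1}{2} - 1} e^{-\left(\frac{t_\nu^2}{2\nu} + \frac{1}{2}\right)u}\, du$$
$$= \frac{\Gamma[(\nu+1)/2] \left(\frac{t_\nu^2}{2\nu} + \frac{1}{2}\right)^{-[(\nu+1)/2]}}{\sqrt{2\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right) 2^{\nu/2}}$$
$$= \frac{\Gamma[(\nu+1)/2]}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t_\nu^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \quad -\infty < t_\nu < \infty.$$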
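The mixture integral in part (a) can also be evaluated numerically and compared with the closed-form density. The sketch below uses a plain midpoint rule; the truncation point `upper`, the number of steps, and the function names are assumptions chosen for illustration rather than part of the derivation.

```python
import math


def t_density(t, nu):
    """Closed-form density of T_nu obtained in part (a)."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)


def mixture_density(t, nu, upper=200.0, steps=40000):
    """Midpoint-rule approximation to the integral over u of
    f(t | U = u) f_U(u), with T | U = u ~ N(0, nu/u) and U ~ chi-squared(nu)."""
    h = upper / steps
    norm = math.gamma(nu / 2) * 2 ** (nu / 2)
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        normal_part = math.sqrt(u / (2 * math.pi * nu)) * math.exp(-u * t * t / (2 * nu))
        chi2_part = u ** (nu / 2 - 1) * math.exp(-u / 2) / norm
        total += normal_part * chi2_part
    return total * h


for nu, t in ((5, 1.3), (3, -0.7)):
    print(nu, t, mixture_density(t, nu), t_density(t, nu))
```

The two columns should agree to several decimal places; any remaining difference reflects only the quadrature step and the truncation of the upper limit.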
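(b) Since $Z$ and $U$ are independent random variables, we have
$$E(T_\nu) = \sqrt{\nu}\,E\left(ZU^{-1/2}\right) = \sqrt{\nu}\,E(Z)E\left(U^{-1/2}\right) = 0 \quad \text{since } E(Z) = 0.$$
And,
$$V(T_\nu) = E(T_\nu^2) = \nu E\left(Z^2 U^{-1}\right) = \nu E(Z^2)E(U^{-1}) = \nu(1)E(U^{-1}) = \nu E(U^{-1}).$$
Since $U \sim \mathrm{GAMMA}(\alpha = 2, \beta = \nu/2)$, we know that
$$E(U^r) = \frac{\Gamma(\beta + r)}{\Gamma(\beta)}\alpha^r = \frac{\Gamma\left(\frac{\nu}{2} + r\right)}{\Gamma\left(\frac{\nu}{2}\right)} 2^r, \quad \left(\frac{\nu}{2} + r\right) > 0.$$
Finally, with $r = -1$,
$$V(T_\nu) = \nu \frac{\Gamma\left(\frac{\nu}{2} - 1\right)}{\Gamma\left(\frac{\nu}{2}\right)} 2^{-1} = \nu \frac{\Gamma\left(\frac{\nu}{2} - 1\right)}{\left(\frac{\nu}{2} - 1\right)\Gamma\left(\frac{\nu}{2} - 1\right)} 2^{-1} = \frac{\nu}{(\nu - 2)}, \quad \nu > 2.$$

Solution 3.5

(a) First, using conditional expectation theory, we have
$$E(Y) = E_x[E(Y \mid X = x)] = E_x[\beta_0 + \beta_1 x] = \beta_0 + \beta_1 E(X).$$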
And, since
$$E(XY) = E_x[E(XY \mid X = x)] = E_x[xE(Y \mid X = x)] = E_x[x(\beta_0 + \beta_1 x)] = \beta_0 E(X) + \beta_1 E(X^2),$$
we have
$$\mathrm{cov}(X, Y) = \beta_0 E(X) + \beta_1 E(X^2) - E(X)[\beta_0 + \beta_1 E(X)] = \beta_1 V(X).$$
Thus,
$$\mathrm{corr}(X, Y) = \frac{\beta_1 V(X)}{\sqrt{V(X)V(Y)}} = \beta_1 \sqrt{\frac{V(X)}{V(Y)}}.$$
Thus, $\mathrm{corr}(X, Y) = k\beta_1$, where $k = \sqrt{V(X)/V(Y)} > 0$. In particular, when $\beta_1 = 0$, indicating no linear relationship between $X$ and $Y$ in the sense that $E(Y \mid X = x) = \beta_0$ does not depend on $x$, then $\mathrm{corr}(X, Y) = 0$. When $\beta_1 < 0$, then $\mathrm{corr}(X, Y) < 0$. And, when $\beta_1 > 0$, then $\mathrm{corr}(X, Y) > 0$. In general, $\mathrm{corr}(X, Y)$ is reflecting the strength of the linear, or straight-line, relationship between $X$ and $Y$.

(b) When $E(X \mid Y = y) = \alpha_0 + \alpha_1 y$, it follows, using arguments identical to those used in part (a), that
$$\mathrm{corr}(X, Y) = \frac{\alpha_1 V(Y)}{\sqrt{V(X)V(Y)}} = \alpha_1 \sqrt{\frac{V(Y)}{V(X)}}.$$
Thus, it follows directly that $\alpha_1$ and $\beta_1$ have the same sign and that $[\mathrm{corr}(X, Y)]^2 = \alpha_1\beta_1$. In particular, when both $\alpha_1$ and $\beta_1$ are negative, then $\mathrm{corr}(X, Y) = -\sqrt{\alpha_1\beta_1}$; and, when both $\alpha_1$ and $\beta_1$ are positive, then $\mathrm{corr}(X, Y) = +\sqrt{\alpha_1\beta_1}$.

(c) If $E(Y \mid X = x) = \beta_0 + \beta_1 x + \beta_2 x^2$, then
$$E(Y) = E_x[E(Y \mid X = x)] = E_x[\beta_0 + \beta_1 x + \beta_2 x^2] = \beta_0 + \beta_1 E(X) + \beta_2 E(X^2).$$
And, since
$$E(XY) = E_x[E(XY \mid X = x)] = E_x[xE(Y \mid X = x)] = E_x[x(\beta_0 + \beta_1 x + \beta_2 x^2)] = \beta_0 E(X) + \beta_1 E(X^2) + \beta_2 E(X^3),$$
we have
$$\mathrm{cov}(X, Y) = \beta_0 E(X) + \beta_1 E(X^2) + \beta_2 E(X^3) - E(X)[\beta_0 + \beta_1 E(X) + \beta_2 E(X^2)] = \beta_1 V(X) + \beta_2[E(X^3) - E(X)E(X^2)].$$
Multivariate Distribution Theory
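Finally,
$$\mathrm{corr}(X,Y) = \beta_1\sqrt{\frac{V(X)}{V(Y)}}+\beta_2\,\frac{E(X^3)-E(X)E(X^2)}{\sqrt{V(X)V(Y)}}.$$
Thus, unless $\beta_2 = 0$ or unless $E(X^3) = E(X)E(X^2)$, the direct connection between $\mathrm{corr}(X,Y)$ and $\beta_1$ is lost.

Solution 3.6
(a)
$$f_X(x) = 6\theta^{-3}\int_0^x (x-y)\,dy = 6\theta^{-3}\left(x^2-\frac{x^2}{2}\right) = \frac{3x^2}{\theta^3}, \quad 0 < x < \theta.$$
So,
$$E(X^r) = \int_0^\theta x^r\cdot 3\theta^{-3}x^2\,dx = \frac{3\theta^r}{(r+3)}, \quad r = 1, 2, \ldots;$$
thus,
$$E(X) = \frac{3\theta}{4}, \quad E(X^2) = \frac{3\theta^2}{5}, \quad\text{and}\quad V(X) = \frac{3\theta^2}{5}-\left(\frac{3\theta}{4}\right)^2 = \frac{3\theta^2}{80}.$$
Now,
$$f_Y(y|X=x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{2(x-y)}{x^2}, \quad 0 < y < x.$$
So,
$$E(Y^r|X=x) = \int_0^x y^r\left(\frac{2}{x}-\frac{2y}{x^2}\right)dy = \frac{2x^r}{(r+1)(r+2)}, \quad r = 1, 2, \ldots$$
So, $E(Y|X=x) = x/3$ (which is a linear function of $x$). Also,
$$E(Y^2|X=x) = \frac{x^2}{6}, \quad\text{so that}\quad V(Y|X=x) = \frac{x^2}{6}-\left(\frac{x}{3}\right)^2 = \frac{x^2}{18}.$$
Since $\mathrm{corr}(X,Y) = \frac{1}{3}\sqrt{V(X)/V(Y)}$, we need $V(Y)$. Now,
$$V(Y) = V_x[E(Y|X=x)]+E_x[V(Y|X=x)] = V\left(\frac{X}{3}\right)+E\left(\frac{X^2}{18}\right) = \frac{3\theta^2}{80} = V(X),$$
so that $\mathrm{corr}(X,Y) = 1/3$.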
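Equivalently,
$$E(Y) = E_x[E(Y|X=x)] = E\left(\frac{X}{3}\right) = \frac{\theta}{4}$$
and
$$E(XY) = E_x[xE(Y|X=x)] = E\left(\frac{X^2}{3}\right) = \frac{\theta^2}{5},$$
so that
$$\mathrm{cov}(X,Y) = E(XY)-E(X)E(Y) = \frac{\theta^2}{5}-\left(\frac{3\theta}{4}\right)\left(\frac{\theta}{4}\right) = \frac{\theta^2}{80}.$$
Hence,
$$\mathrm{corr}(X,Y) = \frac{\mathrm{cov}(X,Y)}{\sqrt{V(X)V(Y)}} = \frac{\theta^2/80}{\sqrt{(3\theta^2/80)(3\theta^2/80)}} = \frac{1}{3}, \quad\text{as before.}$$

(b)
$$\mathrm{pr}\left[(X+Y)<\theta \,\middle|\, (X+2Y)>\tfrac{\theta}{4}\right] = \frac{\mathrm{pr}\left\{[(X+Y)<\theta]\cap\left[(X+2Y)>\tfrac{\theta}{4}\right]\right\}}{\mathrm{pr}\left[(X+2Y)>\tfrac{\theta}{4}\right]}.$$
So,
$$\mathrm{pr}\left\{[(X+Y)<\theta]\cap\left[(X+2Y)>\tfrac{\theta}{4}\right]\right\} = \int_{\theta/12}^{\theta/4}\int_{(\theta-4x)/8}^{x} f_{X,Y}(x,y)\,dy\,dx+\int_{\theta/4}^{\theta/2}\int_{0}^{x} f_{X,Y}(x,y)\,dy\,dx+\int_{\theta/2}^{\theta}\int_{0}^{(\theta-x)} f_{X,Y}(x,y)\,dy\,dx$$
$$= \int_{0}^{\theta/12}\int_{(\theta-8y)/4}^{(\theta-y)} f_{X,Y}(x,y)\,dx\,dy+\int_{\theta/12}^{\theta/2}\int_{y}^{(\theta-y)} f_{X,Y}(x,y)\,dx\,dy.$$
And,
$$\mathrm{pr}\left[(X+2Y)>\tfrac{\theta}{4}\right] = \int_{\theta/12}^{\theta/4}\int_{(\theta-4x)/8}^{x} f_{X,Y}(x,y)\,dy\,dx+\int_{\theta/4}^{\theta}\int_{0}^{x} f_{X,Y}(x,y)\,dy\,dx$$
$$= \int_{0}^{\theta/12}\int_{(\theta-8y)/4}^{\theta} f_{X,Y}(x,y)\,dx\,dy+\int_{\theta/12}^{\theta}\int_{y}^{\theta} f_{X,Y}(x,y)\,dx\,dy,$$
where $f_{X,Y}(x,y) = 6\theta^{-3}(x-y)$, $0 < y < x < \theta$.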
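(c) From part (a), we know that $E(X_i) = \frac{3\theta}{4}$, $V(X_i) = \frac{3\theta^2}{80}$, $E(Y_i) = \frac{\theta}{4}$, $V(Y_i) = \frac{3\theta^2}{80}$, and $\mathrm{cov}(X_i,Y_i) = \frac{\theta^2}{80}$. So,
$$E(L) = E(3\bar{X}-2\bar{Y}) = 3E(\bar{X})-2E(\bar{Y}) = \frac{7\theta}{4},$$
since $E(\bar{X}) = \frac{1}{n}\sum_{i=1}^n E(X_i)$ and $E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^n E(Y_i)$. And,
$$V(L) = V(3\bar{X}-2\bar{Y}) = V\left[(3)\frac{1}{n}\sum_{i=1}^n X_i-(2)\frac{1}{n}\sum_{i=1}^n Y_i\right] = \frac{1}{n^2}\sum_{i=1}^n V(3X_i-2Y_i),$$
since the pairs are mutually independent. Now,
$$V(3X_i-2Y_i) = 9V(X_i)+4V(Y_i)+2(3)(-2)\mathrm{cov}(X_i,Y_i) = 9\left(\frac{3\theta^2}{80}\right)+4\left(\frac{3\theta^2}{80}\right)-12\left(\frac{\theta^2}{80}\right) = \frac{27\theta^2}{80}.$$
Thus,
$$V(L) = \frac{1}{n^2}\sum_{i=1}^n \frac{27\theta^2}{80} = \frac{27\theta^2}{80n}.$$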
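The moments used in Solution 3.6 can be checked with exact rational arithmetic. The sketch below is illustrative only: it sets $\theta = 1$ (so every result carries an implicit power of $\theta$) and relies on the closed form $E(X^aY^b) = 6/[(b+1)(b+2)(a+b+3)]$, which is an assumption obtained here by integrating $6(x-y)x^ay^b$ over $0 < y < x < 1$; the helper name `moment` is hypothetical.

```python
from fractions import Fraction

def moment(a, b):
    """E(X^a Y^b) for f(x, y) = 6(x - y), 0 < y < x < 1 (theta = 1).

    Closed form obtained by direct integration (an assumption of this sketch).
    """
    return Fraction(6, (b + 1) * (b + 2) * (a + b + 3))

e_x, e_y = moment(1, 0), moment(0, 1)
v_x = moment(2, 0) - e_x**2
v_y = moment(0, 2) - e_y**2
cov_xy = moment(1, 1) - e_x * e_y

# Variance of 3X_i - 2Y_i, as used for V(L) in part (c)
var_3x_minus_2y = 9 * v_x + 4 * v_y - 12 * cov_xy

print(e_x, e_y, v_x, v_y, cov_xy, var_3x_minus_2y)
```

Multiplying the variances and the covariance by $\theta^2$ (and the means by $\theta$) recovers the expressions in the text.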
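Solution 3.7
(a) Since
$$f_X(x) = \frac{\Gamma(\alpha+\beta+3)}{\Gamma(\alpha+1)\Gamma(\beta+1)}(1-x)^\alpha\int_0^x y^\beta\,dy = \frac{\Gamma(\alpha+\beta+3)}{\Gamma(\alpha+1)\Gamma(\beta+2)}x^{\beta+1}(1-x)^\alpha, \quad 0 < x < 1,$$
it follows that
$$f_Y(y|X=x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = (\beta+1)y^\beta x^{-\beta-1}, \quad 0 < y < x < 1.$$
Also,
$$f_Y(y) = \frac{\Gamma(\alpha+\beta+3)}{\Gamma(\alpha+1)\Gamma(\beta+1)}y^\beta\int_y^1 (1-x)^\alpha\,dx = \frac{\Gamma(\alpha+\beta+3)}{\Gamma(\alpha+2)\Gamma(\beta+1)}y^\beta(1-y)^{\alpha+1}, \quad 0 < y < 1.$$

(b) It is clear that $f_X(x)$ and $f_Y(y)$ are beta distributions, with variances
$$V(X) = \frac{(\beta+2)(\alpha+1)}{(\alpha+\beta+3)^2(\alpha+\beta+4)} \quad\text{and}\quad V(Y) = \frac{(\beta+1)(\alpha+2)}{(\alpha+\beta+3)^2(\alpha+\beta+4)}.$$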
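And,
$$E(Y|X=x) = \int_0^x y f_Y(y|X=x)\,dy = \frac{(\beta+1)}{x^{\beta+1}}\int_0^x y^{\beta+1}\,dy = \left(\frac{\beta+1}{\beta+2}\right)x.$$
Thus, appealing to the mathematical relationship between the correlation coefficient and the slope for a simple linear (i.e., straight-line) regression model, we have
$$\rho_{X,Y} = \left(\frac{\beta+1}{\beta+2}\right)\sqrt{\frac{V(X)}{V(Y)}} = \left[\frac{(\alpha+1)(\beta+1)}{(\alpha+2)(\beta+2)}\right]^{1/2}.$$
When $\alpha = 2$ and $\beta = 3$, $\rho_{X,Y} = 0.7746$. Alternatively, $\rho_{X,Y}$ can be computed using the formula
$$\rho_{X,Y} = \frac{E(XY)-E(X)E(Y)}{\sqrt{V(X)V(Y)}},$$
where, for example,
$$E(XY) = E_x[E(XY|X=x)] = E_x[xE(Y|X=x)] = \left(\frac{\beta+1}{\beta+2}\right)E(X^2),$$
and
$$E(X^2) = V(X)+[E(X)]^2 = \frac{(\beta+2)(\alpha+1)}{(\alpha+\beta+3)^2(\alpha+\beta+4)}+\left(\frac{\beta+2}{\alpha+\beta+3}\right)^2.$$

(c) Since
$$E(\bar{X}) = E(X_i) = \frac{(\beta+2)}{(\alpha+\beta+3)} = \frac{5}{8} \quad\text{and}\quad E(\bar{Y}) = E(Y_i) = \frac{(\beta+1)}{(\alpha+\beta+3)} = \frac{1}{2},$$
it follows that $E(L) = 3\left(\frac{5}{8}\right)-5\left(\frac{1}{2}\right) = -\frac{5}{8}$. And,
$$V(L) = V(3\bar{X}-5\bar{Y}) = V\left[\frac{3}{n}\sum_{i=1}^n X_i-\frac{5}{n}\sum_{i=1}^n Y_i\right] = \frac{1}{n^2}V\left[\sum_{i=1}^n (3X_i-5Y_i)\right]$$
$$= \frac{1}{n}V(3X_i-5Y_i) = \frac{1}{n}\left[9V(X_i)+25V(Y_i)-2(3)(5)\rho_{X,Y}\sqrt{V(X_i)V(Y_i)}\right].$$
When $n = 10$, $\alpha = 2$, and $\beta = 3$, we then find that
$$V(L) = \frac{1}{10}\left[9\left(\frac{5}{192}\right)+25\left(\frac{1}{36}\right)-30(0.7746)\sqrt{\left(\frac{5}{192}\right)\left(\frac{1}{36}\right)}\right] = 0.0304.$$

Solution 3.8
(a) Now,
$$\mathrm{pr}[(U \le u)\cap(W = 0)] = \mathrm{pr}[(Y_2 \le u)\cap(Y_2 < Y_1)]$$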
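$$= \int_0^u\int_{y_2}^\infty \left(\theta_1e^{-\theta_1y_1}\right)\left(\theta_2e^{-\theta_2y_2}\right)dy_1\,dy_2 = \frac{\theta_2}{(\theta_1+\theta_2)}\left[1-e^{-(\theta_1+\theta_2)u}\right], \quad 0 < u < \infty,\ w = 0.$$
So,
$$f_{U,W}(u,0) = \theta_2e^{-(\theta_1+\theta_2)u}, \quad 0 < u < \infty,\ w = 0.$$
And,
$$\mathrm{pr}[(U \le u)\cap(W = 1)] = \mathrm{pr}[(Y_1 \le u)\cap(Y_1 < Y_2)] = \int_0^u\int_{y_1}^\infty \left(\theta_1e^{-\theta_1y_1}\right)\left(\theta_2e^{-\theta_2y_2}\right)dy_2\,dy_1 = \frac{\theta_1}{(\theta_1+\theta_2)}\left[1-e^{-(\theta_1+\theta_2)u}\right], \quad 0 < u < \infty,\ w = 1.$$
So,
$$f_{U,W}(u,1) = \theta_1e^{-(\theta_1+\theta_2)u}, \quad 0 < u < \infty,\ w = 1.$$
So, we can compactly combine the above two results notationally as follows:
$$f_{U,W}(u,w) = \theta_1^{w}\,\theta_2^{(1-w)}e^{-(\theta_1+\theta_2)u}, \quad 0 < u < \infty,\ w = 0, 1.$$

(b) We have
$$p_W(w) = \int_0^\infty f_{U,W}(u,w)\,du = \theta_1^{w}\theta_2^{(1-w)}\int_0^\infty e^{-(\theta_1+\theta_2)u}\,du = \theta_1^{w}\theta_2^{(1-w)}(\theta_1+\theta_2)^{-1} = \left(\frac{\theta_1}{\theta_1+\theta_2}\right)^{w}\left(\frac{\theta_2}{\theta_1+\theta_2}\right)^{(1-w)}, \quad w = 0, 1.$$

(c) We have
$$f_U(u) = \sum_{w=0}^1 f_{U,W}(u,w) = e^{-(\theta_1+\theta_2)u}\sum_{w=0}^1 \theta_1^{w}\theta_2^{(1-w)} = (\theta_1+\theta_2)e^{-(\theta_1+\theta_2)u}, \quad 0 < u < \infty.$$

(d) Since $f_{U,W}(u,w) = f_U(u)p_W(w)$, $0 < u < \infty$, $w = 0, 1$, it follows that $U$ and $W$ are independent random variables.

Solution 3.9
(a) First, since $Y_1$ and $Y_2$ have a joint bivariate normal distribution, it follows that the random variable $(Y_1-Y_2)$ is normally distributed.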
Also, $E(Y_1-Y_2) = E(Y_1)-E(Y_2) = 110-100 = 10$. And,
$$V(Y_1-Y_2) = V(Y_1)+V(Y_2)-2\rho\sqrt{V(Y_1)V(Y_2)} = 225+225-2(0.80)(15)(15) = 90.$$
Thus, we have $(Y_1-Y_2) \sim N(10, 90)$. Hence,
$$\mathrm{pr}(Y_1-Y_2 > 15) = \mathrm{pr}\left[\frac{(Y_1-Y_2)-10}{\sqrt{90}} > \frac{15-10}{\sqrt{90}}\right] = \mathrm{pr}(Z > 0.527),$$
where $Z \sim N(0,1)$, so that $\mathrm{pr}(Y_1-Y_2 > 15) \approx 0.30$. So, using the BIN($n = 3$, $\pi = 0.30$) distribution, the probability that the older child has an IQ at least 15 points higher than the younger child for at least two of three randomly chosen families is equal to
$$C^3_2(0.30)^2(0.70)^1+C^3_3(0.30)^3(0.70)^0 = 0.216.$$

(b) From general properties of the bivariate normal distribution, we have
$$E(Y_2|Y_1=y_1) = E(Y_2)+\rho\sqrt{\frac{V(Y_2)}{V(Y_1)}}\left[y_1-E(Y_1)\right],$$
and
$$V(Y_2|Y_1=y_1) = V(Y_2)(1-\rho^2).$$
Also, $Y_2$ given $Y_1 = y_1$ is normally distributed. In our particular situation,
$$E(Y_2|Y_1=120) = 100+(0.80)\sqrt{\frac{225}{225}}(120-110) = 108,$$
and
$$V(Y_2|Y_1=120) = 225[1-(0.80)^2] = 81.$$
So,
$$\mathrm{pr}(Y_2 > 120|Y_1=120) = \mathrm{pr}\left[\frac{Y_2-108}{\sqrt{81}} > \frac{120-108}{\sqrt{81}}\right] = \mathrm{pr}(Z > 1.333),$$
where $Z \sim N(0,1)$, so that $\mathrm{pr}(Y_2 > 120|Y_1=120) \approx 0.09$.
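The numerical values in Solution 3.9 can be reproduced with the Python standard library. This is a minimal sketch, not the authors' computation; the variable and function names are illustrative, and the inputs (means 110 and 100, common variance 225, $\rho = 0.80$) are taken from the solution above.

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Part (a): Y1 - Y2 ~ N(10, 90)
mean_diff = 110 - 100
var_diff = 225 + 225 - 2 * 0.80 * 15 * 15
p_single = 1 - Z.cdf((15 - mean_diff) / sqrt(var_diff))

def at_least_two_of_three(p):
    """pr(at least 2 successes in 3 independent trials with success prob p)."""
    return sum(comb(3, j) * p**j * (1 - p)**(3 - j) for j in (2, 3))

p_rounded = at_least_two_of_three(0.30)     # uses the rounded value from the text
p_unrounded = at_least_two_of_three(p_single)

# Part (b): conditional distribution of Y2 given Y1 = 120
cond_mean = 100 + 0.80 * sqrt(225 / 225) * (120 - 110)
cond_var = 225 * (1 - 0.80**2)
p_cond = 1 - Z.cdf((120 - cond_mean) / sqrt(cond_var))

print(round(p_single, 4), round(p_rounded, 3), round(p_cond, 4))
```

The binomial answer 0.216 depends on rounding the single-family probability to 0.30; `p_unrounded` shows the value obtained without that rounding.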
Solution 3.10
(a) Now,
$$F_W(w) = \mathrm{pr}(W \le w) = \mathrm{pr}[(U-V) \le w] = E_v[\mathrm{pr}(U-v \le w|V=v)] = E_v[\mathrm{pr}(U \le w+v|V=v)] = \int_{-\infty}^{\infty} F_U(w+v|V=v)f_V(v)\,dv,$$
where $f_V(v)$ is the density for $V$. Thus, we obtain
$$F_W(w) = \int_{-\infty}^{\infty} e^{-e^{-(w+v)}}e^{-v}e^{-e^{-v}}\,dv = \int_{-\infty}^{\infty} e^{-\left(e^{-w}e^{-v}\right)}e^{-v}e^{-e^{-v}}\,dv = \int_{-\infty}^{\infty} e^{-v}e^{-\left[\left(1+e^{-w}\right)e^{-v}\right]}\,dv.$$
Letting $z = 1+e^{-w}$, we obtain
$$F_W(w) = \int_{-\infty}^{\infty} e^{-v}e^{-ze^{-v}}\,dv = \frac{1}{z}\int_{-\infty}^{\infty} ze^{-v}e^{-ze^{-v}}\,dv = z^{-1}.$$
Thus, $F_W(w) = 1/(1+e^{-w})$, $-\infty < w < \infty$, and hence $W$ has a logistic distribution.

(b) Now,
$$\mathrm{pr}(Y = 1) = \mathrm{pr}(U > V) = \mathrm{pr}(\alpha+E_1 > E_2) = \mathrm{pr}(E_1-E_2 > -\alpha) = 1-F_W(-\alpha) = F_W(\alpha) = 1/(1+e^{-\alpha}).$$
This expression is exactly $\mathrm{pr}(Y = 1)$ for an ordinary logistic regression model with a single intercept term $\alpha$. Thus, logistic regression, and, more generally, multinomial logistic regression, can be motivated via a random utility framework, where the utilities involve i.i.d. standard Gumbel error terms. Likewise, probit regression can be motivated by assuming i.i.d. standard normal error terms for the utilities.

Solution 3.11
(a) Since $X_1, X_2, \ldots, X_n$ constitute a set of i.i.d. negative exponential random variables with $E(X_i) = \lambda^{-1}$, $i = 1, 2, \ldots, n$, it follows directly that $S \sim \mathrm{GAMMA}(\alpha = \lambda^{-1}, \beta = n)$.
Hence, with $s = \sum_{i=1}^n x_i$, we have
$$f_{X_1,X_2,\ldots,X_n}(x_1,x_2,\ldots,x_n|S=s) = \frac{f_{X_1,X_2,\ldots,X_n,S}(x_1,x_2,\ldots,x_n,s)}{f_S(s)} = \frac{f_{X_1,X_2,\ldots,X_n}(x_1,x_2,\ldots,x_n)}{f_S(s)}$$
$$= \frac{\prod_{i=1}^n \lambda e^{-\lambda x_i}}{\lambda^ns^{n-1}e^{-\lambda s}/(n-1)!} = \frac{(n-1)!}{s^{n-1}}, \quad x_i > 0,\ i = 1, 2, \ldots, n, \ \text{and}\ \sum_{i=1}^n x_i = s.$$

(b) The inverse functions for this transformation are $X_1 = SY_1$, $X_2 = S(Y_2-Y_1)$, $\ldots$, $X_{n-1} = S(Y_{n-1}-Y_{n-2})$; hence, it follows that the Jacobian is equal to $S^{n-1}$, since it is the determinant of the $(n-1)\times(n-1)$ matrix with $(i,j)$th element equal to $\partial X_i/\partial Y_j$, $i = 1, 2, \ldots, (n-1)$ and $j = 1, 2, \ldots, (n-1)$. Thus, using the result from part (a), we have
$$f_{Y_1,Y_2,\ldots,Y_{n-1}}(y_1,y_2,\ldots,y_{n-1}|S=s) = \frac{(n-1)!}{s^{n-1}}\,s^{n-1} = (n-1)!, \quad 0 < y_1 < y_2 < \cdots < y_{n-1} < 1.$$

(c) When $n = 3$,
$$f_{Y_1}(y_1) = \int_{y_1}^1 (2!)\,dy_2 = 2(1-y_1), \quad 0 < y_1 < 1.$$
When $n = 4$,
$$f_{Y_1}(y_1) = \int_{y_1}^1\int_{y_2}^1 (3!)\,dy_3\,dy_2 = 3(1-y_1)^2, \quad 0 < y_1 < 1.$$
In general,
$$f_{Y_1}(y_1) = (n-1)(1-y_1)^{n-2}, \quad 0 < y_1 < 1.$$
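Solution 3.12
From moment generating function theory,
$$E\left(Y_i^r\right) = E\left(e^{rX_i}\right) = e^{r\mu+r^2\sigma^2/2}, \quad -\infty < r < \infty,$$
since $X_i \sim N(\mu,\sigma^2)$. So, $E(Y_i) = e^{\mu+\sigma^2/2}$ and $E(Y_i^2) = e^{2\mu+2\sigma^2}$, $i = 1, \ldots, n$; also, $Y_1, Y_2, \ldots, Y_n$ are mutually independent random variables.

So, $E\left(\bar{Y}_a\right) = e^{\mu+\sigma^2/2}$ and, by mutual independence,
$$V\left(\bar{Y}_a\right) = \frac{1}{n^2}\sum_{i=1}^n V(Y_i) = \frac{e^{2\mu+2\sigma^2}-\left(e^{\mu+\frac{\sigma^2}{2}}\right)^2}{n} = \frac{e^{2\mu+\sigma^2}\left(e^{\sigma^2}-1\right)}{n}.$$
Also, by mutual independence,
$$E\left(\bar{Y}_g\right) = \prod_{i=1}^n E\left(Y_i^{1/n}\right) = e^{\mu+\frac{\sigma^2}{2n}} \quad\text{and}\quad E\left(\bar{Y}_g^2\right) = \prod_{i=1}^n E\left(Y_i^{2/n}\right) = e^{2\mu+2\sigma^2/n},$$
so that
$$V\left(\bar{Y}_g\right) = e^{2\mu+2\sigma^2/n}-\left(e^{\mu+\sigma^2/2n}\right)^2 = e^{2\mu+\sigma^2/n}\left(e^{\sigma^2/n}-1\right).$$
Finally,
$$E\left(\bar{Y}_a\bar{Y}_g\right) = E\left[\left(\frac{1}{n}\sum_{i=1}^n Y_i\right)\bar{Y}_g\right] = \frac{1}{n}\sum_{i=1}^n E\left(Y_i\bar{Y}_g\right).$$
Now,
$$E\left(Y_i\bar{Y}_g\right) = E\left[Y_i\left(\prod_{j=1}^n Y_j\right)^{1/n}\right] = E\left[Y_i^{\left(1+\frac{1}{n}\right)}\cdot\prod_{\text{all } j \ne i} Y_j^{1/n}\right] = E\left(Y_i^{1+\frac{1}{n}}\right)\prod_{\text{all } j \ne i} E\left(Y_j^{1/n}\right)$$
$$= e^{\left(\frac{n+1}{n}\right)\mu+\frac{(n+1)^2\sigma^2}{2n^2}}\cdot\left(e^{\frac{\mu}{n}+\frac{\sigma^2}{2n^2}}\right)^{(n-1)} = e^{2\mu+\frac{(n+3)\sigma^2}{2n}}.$$
So,
$$\mathrm{corr}\left(\bar{Y}_a,\bar{Y}_g\right) = \frac{E\left(\bar{Y}_a\bar{Y}_g\right)-E\left(\bar{Y}_a\right)E\left(\bar{Y}_g\right)}{\sqrt{V\left(\bar{Y}_a\right)V\left(\bar{Y}_g\right)}} = \frac{e^{2\mu+\frac{(n+3)\sigma^2}{2n}}-\left(e^{\mu+\frac{\sigma^2}{2}}\right)\left(e^{\mu+\frac{\sigma^2}{2n}}\right)}{\sqrt{\left[\frac{e^{2\mu+\sigma^2}\left(e^{\sigma^2}-1\right)}{n}\right]\left[e^{2\mu+\frac{\sigma^2}{n}}\left(e^{\frac{\sigma^2}{n}}-1\right)\right]}}$$
$$= \frac{e^{2\mu+\left(\frac{n+1}{2n}\right)\sigma^2}\left(ne^{\frac{\sigma^2}{n}}-n\right)^{1/2}}{\left[e^{4\mu+\left(\frac{n+1}{n}\right)\sigma^2}\left(e^{\sigma^2}-1\right)\right]^{1/2}} = \frac{\left(ne^{\frac{\sigma^2}{n}}-n\right)^{1/2}}{\sqrt{e^{\sigma^2}-1}},$$
which does not depend on $\mu$.

Since
$$ne^{\sigma^2/n}-n = n\sum_{j=0}^\infty \frac{\left(\sigma^2/n\right)^j}{j!}-n = n+\sigma^2+n\sum_{j=2}^\infty \frac{\left(\sigma^2/n\right)^j}{j!}-n = \sigma^2+\sum_{j=2}^\infty \frac{\sigma^{2j}n^{1-j}}{j!},$$
it follows that
$$\lim_{n\to\infty}\left[\mathrm{corr}(\bar{Y}_a,\bar{Y}_g)\right] = \frac{\sigma}{\sqrt{e^{\sigma^2}-1}},$$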
which monotonically goes to 0 as $\sigma^2 \to \infty$. Hence, the larger is $\sigma^2$, the smaller is the correlation.

Solution 3.13
(a) For the $i$th group, $i = 1, 2, \ldots, G$, if $Y_i$ denotes the number of blood tests required for Plan #2, then
$$E(Y_i) = (1)(1-\pi)^n+(n+1)[1-(1-\pi)^n] = (n+1)-n(1-\pi)^n.$$
Then, since $T_2 = \sum_{i=1}^G Y_i$, it follows that
$$E(T_2) = \sum_{i=1}^G E(Y_i) = G[(n+1)-n(1-\pi)^n] = N+G[1-n(1-\pi)^n].$$

(b) For $N-E(T_2) = G[n(1-\pi)^n-1] > 0$, we require $n(1-\pi)^n > 1$, or equivalently,
$$\frac{\ln(n)}{n} > \ln\left(\frac{1}{1-\pi}\right).$$
Now, it is clear that we want to pick the value of $n$, say $n^*$, that maximizes the quantity $\ln(n)/n$, thus providing the desired largest value of $\pi$, say $\pi^*$, for which $E(T_2) < N$. It is straightforward to show that $n^* = 3$, which then gives $\pi^* = 0.3066$. So, if we use groups of size three, then the expected number of blood tests required under Plan #2 will be smaller than the number $N$ of blood tests required under Plan #1 for all values of $\pi$ less than 0.3066.

Solution 3.14
Since the conditional distribution of $Y_p$, given $Y = y$, is BIN($y$, $\pi$), we have
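$$M_{Y_p,Y_{\bar{p}}}(s,t|Y=y) = E\left[e^{sY_p+tY_{\bar{p}}}\middle|Y=y\right] = E\left[e^{sY_p+t(Y-Y_p)}\middle|Y=y\right] = e^{ty}E\left[e^{(s-t)Y_p}\middle|Y=y\right] = e^{ty}\left[\pi e^{(s-t)}+(1-\pi)\right]^y = \left[\pi e^s+(1-\pi)e^t\right]^y.$$
Hence, letting $\theta = \pi e^s+(1-\pi)e^t$ and recalling that $Y \sim \mathrm{POI}(L\lambda)$, we have
$$M_{Y_p,Y_{\bar{p}}}(s,t) = E_y\left[M_{Y_p,Y_{\bar{p}}}(s,t|Y=y)\right] = E(\theta^Y) = \sum_{y=0}^\infty (\theta^y)\frac{(L\lambda)^ye^{-L\lambda}}{y!} = e^{-L\lambda}\sum_{y=0}^\infty \frac{(L\lambda\theta)^y}{y!}$$
$$= e^{L\lambda(\theta-1)} = e^{L\lambda[\pi e^s+(1-\pi)e^t-1]} = e^{L\lambda\pi(e^s-1)}e^{L\lambda(1-\pi)(e^t-1)} = M_{Y_p}(s)M_{Y_{\bar{p}}}(t).$$
Hence, we have shown that $Y_p \sim \mathrm{POI}(L\lambda\pi)$, that $Y_{\bar{p}} \sim \mathrm{POI}[L\lambda(1-\pi)]$, and that $Y_p$ and $Y_{\bar{p}}$ are independent random variables. Finally, reasonable estimates of $E(Y_p)$ and $E(Y_{\bar{p}})$ are $L\hat{\lambda}\hat{\pi}$ and $L\hat{\lambda}(1-\hat{\pi})$, respectively.

Solution 3.15
(a) In general, if $Y = \ln(X) \sim N(\mu,\sigma^2)$, then
$$E(X) = e^{\mu+\sigma^2/2} \quad\text{and}\quad V(X) = [E(X)]^2\left(e^{\sigma^2}-1\right),$$
so that
$$\sigma^2 = \ln\left\{1+\frac{V(X)}{[E(X)]^2}\right\} \quad\text{and}\quad \mu = \ln[E(X)]-\frac{\sigma^2}{2}.$$
So, since $E(X|D=1) = 2.00$ and $V(X|D=1) = 2.60$, it follows that $E(Y|D=1) = 0.443$ and $V(Y|D=1) = 0.501$. Also, since $E(X|D=0) = 1.50$ and $V(X|D=0) = 3.00$, we have $E(Y|D=0) = -0.018$ and $V(Y|D=0) = 0.847$. Thus,
$$\mathrm{pr}(D=1|1.60 < X < 1.80) = \frac{\mathrm{pr}[(D=1)\cap(1.60 < X < 1.80)]}{\mathrm{pr}(1.60 < X < 1.80)}$$
$$= \frac{\mathrm{pr}(1.60 < X < 1.80|D=1)\,\mathrm{pr}(D=1)}{\mathrm{pr}(1.60 < X < 1.80|D=1)\,\mathrm{pr}(D=1)+\mathrm{pr}(1.60 < X < 1.80|D=0)\,\mathrm{pr}(D=0)}.$$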
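The log-scale parameters quoted in Solution 3.15(a) follow from the two moment identities for the lognormal distribution. The short sketch below (the function name `lognormal_params` is illustrative, not from the text) inverts those identities for the two disease groups.

```python
from math import log

def lognormal_params(mean_x, var_x):
    """Return (mu, sigma^2) of Y = ln(X) given E(X) and V(X) for lognormal X."""
    sigma2 = log(1 + var_x / mean_x**2)
    mu = log(mean_x) - sigma2 / 2
    return mu, sigma2

mu_d1, sigma2_d1 = lognormal_params(2.00, 2.60)   # D = 1
mu_d0, sigma2_d0 = lognormal_params(1.50, 3.00)   # D = 0

print(round(mu_d1, 3), round(sigma2_d1, 3))
print(round(mu_d0, 3), round(sigma2_d0, 3))
```

The square roots of the two log-scale variances are the standard deviations (about 0.708 and 0.920) that appear in the standardizations later in this solution.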
Now,
$$\mathrm{pr}(1.60 < X < 1.80|D=1) = \mathrm{pr}\left[\frac{0.470-0.443}{0.708} < Z < \frac{0.588-0.443}{0.708}\right]$$
[...]
$$\theta = \mathrm{pr}(X \le c|D=1)\,\mathrm{pr}(D=1)+\mathrm{pr}(X > c|D=0)\,\mathrm{pr}(D=0)$$
$$= \mathrm{pr}\left[Z \le \frac{\ln(c)-0.443}{0.708}\right](0.60)+\mathrm{pr}\left[Z > \frac{\ln(c)-(-0.018)}{0.920}\right](0.40)$$
$$= (0.60)F_Z\left[\frac{\ln(c)-0.443}{0.708}\right]+0.40\left\{1-F_Z\left[\frac{\ln(c)+0.018}{0.920}\right]\right\},$$
where $F_Z(z) = \mathrm{pr}(Z \le z)$ when $Z \sim N(0,1)$.

So, with $k = \ln(c)$, we have
$$\frac{d\theta}{dk} = \frac{0.60}{0.708}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{k-0.443}{0.708}\right)^2}-\frac{0.40}{0.920}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{k+0.018}{0.920}\right)^2} = 0$$
$$\Rightarrow \ln(0.848)-\frac{(k^2-0.886k+0.196)}{1.003}-\ln(0.435)+\frac{(k^2+0.036k+0.0003)}{1.693} = 0$$
$$\Rightarrow \left(\frac{1}{1.693}-\frac{1}{1.003}\right)k^2+\left(\frac{0.886}{1.003}+\frac{0.036}{1.693}\right)k+\left(\frac{0.0003}{1.693}-0.165-\frac{0.196}{1.003}+0.832\right) = 0$$
$$\Rightarrow 0.406k^2-0.905k-0.472 = 0.$$
The two roots of this quadratic equation are:
$$\frac{0.905 \pm \sqrt{(-0.905)^2-4(0.406)(-0.472)}}{2(0.406)} = \frac{0.905 \pm 1.259}{0.812},$$
or $-0.436$ and $2.665$. The value $c^* = e^{-0.436} = 0.647$ minimizes $\theta$. Note that $c^* < E(X|D=0) < E(X|D=1)$, which appears to be a counterintuitive finding. However, note that the value of $c^*$ is inversely related to the value of the prevalence of the protein (i.e., the higher the prevalence of the protein, the lower the value of $c^*$). In the extreme, if the prevalence is 0%, then the value of $c^*$ is $+\infty$; and, if the prevalence is 100%, then the value of $c^*$ is 0. In our particular situation, the prevalence is 60% (a fairly high value), so that a "low" value of $c^*$ would be anticipated.

Solution 3.16
Now,
$$\sum_{i=1}^n (X_i-\bar{X})(Y_i-\bar{Y}) = \sum_{i=1}^n (X_iY_i-\bar{X}Y_i-X_i\bar{Y}+\bar{X}\bar{Y}) = \sum_{i=1}^n X_iY_i-n\bar{X}\bar{Y}$$
$$= \sum_{i=1}^n X_iY_i-n^{-1}\left[\left(\sum_{i=1}^n X_i\right)\left(\sum_{i=1}^n Y_i\right)\right] = \sum_{i=1}^n X_iY_i-n^{-1}\left[\sum_{i=1}^n X_iY_i+\sum_{\text{all } i \ne j} X_iY_j\right]$$
$$= (1-n^{-1})\sum_{i=1}^n X_iY_i-n^{-1}\sum_{\text{all } i \ne j} X_iY_j.$$
Since
$$E\left(\sum_{i=1}^n X_iY_i\right) = nE(X_iY_i) = n[\mathrm{cov}(X_i,Y_i)+\mu_x\mu_y] = n(\rho\sigma_x\sigma_y+\mu_x\mu_y),$$
we have
$$E(U) = (n-1)^{-1}\left\{(1-n^{-1})n(\rho\sigma_x\sigma_y+\mu_x\mu_y)-n^{-1}[n(n-1)\mu_x\mu_y]\right\} = (n-1)^{-1}[(n-1)(\rho\sigma_x\sigma_y+\mu_x\mu_y)-(n-1)\mu_x\mu_y] = \rho\sigma_x\sigma_y.$$

Solution 3.17
(a) Let $X_i$ denote the $i$th benzene concentration measurement, $i = 1, 2, \ldots, 10$. Then, we know that $Y_i = \ln X_i \sim N(\mu, \sigma^2 = 2)$ and that the $\{Y_i\}$ are mutually independent.

Decision Rule #1:
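$$\mathrm{pr}(X_i > 5) = \mathrm{pr}[\ln X_i > \ln 5] = \mathrm{pr}\left[\frac{Y_i-\mu}{\sqrt{2}} > \frac{1.6094-\mu}{\sqrt{2}}\right] = \mathrm{pr}\left[Z > \frac{1.6094-\mu}{1.4142}\right] = 1-F_Z\left(\frac{1.6094-\mu}{1.4142}\right), \quad Z \sim N(0,1).$$
So, if $\theta_1 = \mathrm{pr}$(Decision that drinking water violates EPA standard | Decision Rule #1), then
$$\theta_1 = \sum_{j=3}^{10} C^{10}_j\left[1-F_Z\left(\frac{1.6094-\mu}{1.4142}\right)\right]^j\left[F_Z\left(\frac{1.6094-\mu}{1.4142}\right)\right]^{10-j} = 1-\sum_{j=0}^{2} C^{10}_j\left[1-F_Z\left(\frac{1.6094-\mu}{1.4142}\right)\right]^j\left[F_Z\left(\frac{1.6094-\mu}{1.4142}\right)\right]^{10-j}.$$
Now, with $E(X) = e^{\mu+\frac{\sigma^2}{2}} = e^{\mu+1} = 7$, then $\mu = \ln(7)-1 = 1.9459-1 = 0.9459$. Thus, with $\mu = 0.9459$, we have $F_Z\left(\frac{1.6094-0.9459}{1.4142}\right) = F_Z(0.4692) \approx 0.680$, so that
$$\theta_1 = 1-\sum_{j=0}^{2} C^{10}_j(0.320)^j(0.680)^{10-j} = 1-0.0211-0.0995-0.2107 = 0.6687.$$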
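As a check on the Decision Rule #1 arithmetic, the sketch below evaluates $\theta_1$ twice: once with the rounded exceedance probability 0.320 used above, and once with the unrounded standard normal CDF. The helper name `prob_at_least` is illustrative; the inputs ($\sigma^2 = 2$, $E(X) = 7$, threshold 5, and 10 measurements) come from the solution.

```python
from math import comb, log, sqrt
from statistics import NormalDist

sigma2 = 2.0
mu = log(7) - sigma2 / 2                      # from E(X) = exp(mu + sigma^2/2) = 7
z_cut = (log(5) - mu) / sqrt(sigma2)          # about 0.4692
p_exceed = 1 - NormalDist().cdf(z_cut)        # pr(X_i > 5), about 0.32

def prob_at_least(k, n, p):
    """pr(at least k successes among n independent trials with success prob p)."""
    return 1 - sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k))

theta1_rounded = prob_at_least(3, 10, 0.320)      # matches the hand calculation
theta1_unrounded = prob_at_least(3, 10, p_exceed)

print(round(z_cut, 4), round(theta1_rounded, 4), round(theta1_unrounded, 4))
```

The two values differ only in the third decimal place, which reflects the rounding of $F_Z(0.4692)$ to 0.680.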
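Decision Rule #2:
Since $\bar{Y} = \ln\bar{X}_g = \frac{1}{10}\sum_{i=1}^{10} Y_i$, then $\bar{Y} \sim N(\mu, 2/10)$. So, with $\theta_2 = \mathrm{pr}$(Decision that drinking water violates EPA standard | Decision Rule #2), then
$$\theta_2 = \mathrm{pr}(\bar{X}_g > 5) = \mathrm{pr}\left[\frac{\bar{Y}-\mu}{\sqrt{0.20}} > \frac{\ln 5-\mu}{\sqrt{0.20}}\right] = 1-F_Z\left(\frac{1.6094-\mu}{0.4472}\right), \quad Z \sim N(0,1).$$
With $\mu = 0.9459$, we have
$$\theta_2 = 1-F_Z\left(\frac{1.6094-0.9459}{0.4472}\right) = 1-F_Z(1.4837) \approx 1-0.931 = 0.069.$$

Decision Rule #3:
With $\theta_3 = \mathrm{pr}$(Decision that drinking water violates EPA standard | Decision Rule #3), we have
$$\theta_3 = \mathrm{pr}\left[X_{(10)} > 5\right] = 1-\mathrm{pr}\left[\cap_{i=1}^{10}(X_i \le 5)\right] = 1-\left[\mathrm{pr}(Y_i \le \ln 5)\right]^{10} = 1-\left[\mathrm{pr}\left(\frac{Y_i-\mu}{\sqrt{2}} \le \frac{1.6094-\mu}{\sqrt{2}}\right)\right]^{10} = 1-\left[\mathrm{pr}\left(Z \le \frac{1.6094-\mu}{1.4142}\right)\right]^{10}, \quad Z \sim N(0,1).$$
With $\mu = 0.9459$, we have
$$\theta_3 = 1-\left[\mathrm{pr}\left(Z \le \frac{1.6094-0.9459}{1.4142}\right)\right]^{10} = 1-\left[\mathrm{pr}(Z \le 0.4692)\right]^{10} = 1-[F_Z(0.4692)]^{10} = 1-(0.680)^{10} = 1-0.0211 = 0.9789.$$

(b) With $E(X) = 7$, so that $\mu = 0.9459$, we have
$$\mathrm{pr}(\bar{X}_g > 5) = \mathrm{pr}(\bar{Y} > \ln 5) = \mathrm{pr}\left[\frac{\bar{Y}-0.9459}{\sqrt{2/n}} > \frac{\ln 5-0.9459}{\sqrt{2/n}}\right] = \mathrm{pr}(Z > 0.4692\sqrt{n}), \quad Z \sim N(0,1).$$
Thus, $\mathrm{pr}(\bar{X}_g > 5)$ gets smaller as $n$ increases! The reason for this phenomenon can be determined by examining $E(\bar{X}_g)$. In general,
$$E(\bar{X}_g) = E\left[\left(\prod_{i=1}^n X_i\right)^{1/n}\right] = \prod_{i=1}^n E\left(X_i^{1/n}\right) = \prod_{i=1}^n E\left(e^{Y_i/n}\right) = \left[e^{\frac{\mu}{n}+\frac{\sigma^2(1/n)^2}{2}}\right]^n = e^{\mu+\frac{\sigma^2}{2n}}.$$
Hence, for $n > 1$,
$$E(\bar{X}_g) < E(X) = e^{\mu+\sigma^2/2},$$
with the size of the bias increasing as $n$ increases. In particular,
$$\lim_{n\to+\infty} E(\bar{X}_g) = e^\mu,$$
which is the median, not the mean, of the lognormal distribution of the random variable $X$.

Solution 3.18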
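$$\mathrm{pr}\left[\min\{X_1^2, X_2^2, \ldots, X_n^2\} \le 0.002\right] = 1-\mathrm{pr}\left[\min\{X_1^2, X_2^2, \ldots, X_n^2\} > 0.002\right] = 1-\mathrm{pr}\left[\cap_{i=1}^n (X_i^2 > 0.002)\right]$$
$$= 1-\prod_{i=1}^n \mathrm{pr}\left[\left(\frac{X_i}{\sqrt{2}}\right)^2 > \frac{0.002}{2}\right] = 1-\left[\mathrm{pr}(U_i > 0.001)\right]^n = 1-(0.975)^n,$$
since $U_i = (X_i/\sqrt{2})^2 \sim \chi^2_1$, $i = 1, 2, \ldots, n$. So, $n^*$ is the smallest positive integer such that $1-(0.975)^n \ge 0.80$, that is, $(0.975)^n \le 0.20$, or
$$n \ge \frac{\ln(0.20)}{\ln(0.975)} = \frac{-1.6094}{-0.0253} \approx 63.6,$$
so that $n^* = 64$.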
Solution 3.19
(a) Let the random variable $X_i$ denote the number of coronary bypass grafts needed by the $i$th patient, $i = 1, 2, \ldots, 500$. It is reasonable to assume that $X_1, X_2, \ldots, X_{500}$ constitute a set of 500 i.i.d. random variables. Also, $E(X_i) = \sum_{j=1}^4 j\pi_j = 1.79$ and $V(X_i) = \sum_{j=1}^4 j^2\pi_j-(1.79)^2 = 1.0059$. Thus, with the random variable $T = \sum_{i=1}^{500} X_i$ denoting the total number of coronary bypass grafts to be performed during the upcoming year, it follows that $E(T) = 500(1.79) = 895.00$ and $V(T) = 500(1.0059) = 502.95$. Thus, by the Central Limit Theorem, the standardized random variable $Z = [T-E(T)]/\sqrt{V(T)} \mathrel{\dot\sim} N(0,1)$ for large $n$. Hence, we have
$$\mathrm{pr}(T \le 900) = \mathrm{pr}\left[\frac{T-E(T)}{\sqrt{V(T)}} \le \frac{900-E(T)}{\sqrt{V(T)}}\right] \approx \mathrm{pr}(Z \le 0.223) \approx 0.59.$$
Thus, the hospital administrator's suggestion is not reasonable.

(b) In general, if this hospital plans to perform coronary bypass surgery on $n$ patients during the upcoming year, then $E(T) = 1.79n$, $V(T) = 1.0059n$, and $\sqrt{V(T)} = 1.0029\sqrt{n}$. Again, by the Central Limit Theorem, the standardized random variable $Z = (T-1.79n)/(1.0029\sqrt{n}) \mathrel{\dot\sim} N(0,1)$ for large $n$. Hence, we have
$$\mathrm{pr}(T \le 900) = \mathrm{pr}\left[\frac{T-1.79n}{1.0029\sqrt{n}} \le \frac{900-1.79n}{1.0029\sqrt{n}}\right] \approx \mathrm{pr}\left[Z \le \frac{897.3975-1.7848n}{\sqrt{n}}\right].$$
Hence, for $\mathrm{pr}(T \le 900) \ge 0.95$, $n^*$ is the largest value of $n$ satisfying the inequality
$$\frac{897.3975-1.7848n^*}{\sqrt{n^*}} \ge 1.645.$$
It is straightforward to show that $n^* = 482$.

Solution 3.20
(a) Since $S = \sum_{i=1}^k X_i$, where the $\{X_i\}$ are i.i.d. random variables, the Central Limit Theorem allows us to say that
$$\frac{S-E(S)}{\sqrt{V(S)}} \mathrel{\dot\sim} N(0,1)$$
for large $k$. So, for $\pi = 0.05$ and $k = 50$,
$$E(S) = \frac{k}{\pi} = \frac{50}{0.05} = 1000 \quad\text{and}\quad V(S) = \frac{k(1-\pi)}{\pi^2} = \frac{50(0.95)}{(0.05)^2} = 19{,}000;$$
thus, $\sqrt{V(S)} = 137.84$. So, with $Z \mathrel{\dot\sim} N(0,1)$ for large $k$, we have
$$\mathrm{pr}[S > (1100)] = \mathrm{pr}\left[\frac{S-E(S)}{\sqrt{V(S)}} > \frac{(1100)-E(S)}{\sqrt{V(S)}}\right] \mathrel{\dot=} \mathrm{pr}\left[Z > \frac{(1100)-(1000)}{137.84}\right] = \mathrm{pr}(Z > 0.7255) \mathrel{\dot=} 0.235.$$

(b)
$$M_U(t) = E(e^{tU}) = E\left[e^{t(2\pi S)}\right] = E\left[e^{2\pi t\sum_{i=1}^k X_i}\right] = E\left[\prod_{i=1}^k e^{2\pi tX_i}\right] = \prod_{i=1}^k M_{X_i}(2\pi t),$$
so that
$$\lim_{\pi\to 0} M_U(t) = \prod_{i=1}^k \left[\lim_{\pi\to 0} M_{X_i}(2\pi t)\right].$$
Now,
$$\lim_{\pi\to 0} M_{X_i}(2\pi t) = \lim_{\pi\to 0}\left[\frac{\pi e^{2\pi t}}{1-(1-\pi)e^{2\pi t}}\right] = \frac{0}{0},$$
so we can employ L'Hôpital's Rule. So,
$$\frac{\partial\left(\pi e^{2\pi t}\right)}{\partial\pi} = e^{2\pi t}+2\pi te^{2\pi t} \quad\text{and}\quad \frac{\partial\left[1-(1-\pi)e^{2\pi t}\right]}{\partial\pi} = e^{2\pi t}-(1-\pi)(2t)e^{2\pi t}.$$
So,
$$\lim_{\pi\to 0} M_{X_i}(2\pi t) = \lim_{\pi\to 0}\left[\frac{e^{2\pi t}+2\pi te^{2\pi t}}{e^{2\pi t}-(1-\pi)(2t)e^{2\pi t}}\right] = \lim_{\pi\to 0}\left[\frac{1+2\pi t}{1-(1-\pi)2t}\right] = (1-2t)^{-1},$$
so that
$$\lim_{\pi\to 0} M_U(t) = \prod_{i=1}^k (1-2t)^{-1} = (1-2t)^{-k},$$
which is the MGF for a GAMMA[$\alpha = 2$, $\beta = k$], or $\chi^2_{2k}$, random variable. So, for small $\pi$, $U = 2\pi S \mathrel{\dot\sim} \chi^2_{2k}$.

(c) For small $\pi$,
$$\mathrm{pr}[S > (1100)] = \mathrm{pr}[2\pi S > (2\pi)(1100)] \mathrel{\dot=} \mathrm{pr}[U > (2\pi)(1100)],$$
where $U \mathrel{\dot\sim} \chi^2_{2k}$. When $\pi = 0.05$ and $k = 50$, we have
$$\mathrm{pr}[S > (1100)] \mathrel{\dot=} \mathrm{pr}[U > (2)(0.05)(1100)] = \mathrm{pr}(U > 110) \approx 0.234,$$
since $U \mathrel{\dot\sim} \chi^2_{2k} = \chi^2_{100}$. This number agrees quite well with the numerical answer computed in part (a).

Solution 3.21
$F_{U_n}(u) = \mathrm{pr}(U_n \le u) = 1-\mathrm{pr}(U_n > u)$. Now,
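$$\mathrm{pr}(U_n > u) = \mathrm{pr}\left\{\frac{\theta n[Y_{(1)}-c]}{c} > u\right\} = \mathrm{pr}\left[Y_{(1)} > \frac{uc}{\theta n}+c\right] = \mathrm{pr}\left[\cap_{i=1}^n\left(Y_i > \frac{uc}{\theta n}+c\right)\right] = \prod_{i=1}^n \mathrm{pr}\left(Y_i > \frac{uc}{\theta n}+c\right) = \left[1-F_Y\left(\frac{uc}{\theta n}+c;\theta\right)\right]^n,$$
where
$$F_Y(y;\theta) = \int_c^y \theta c^\theta t^{-(\theta+1)}\,dt = c^\theta\left[-t^{-\theta}\right]_c^y = c^\theta\left[c^{-\theta}-y^{-\theta}\right] = 1-(y/c)^{-\theta}, \quad 0 < c < y < +\infty.$$
So, $F_{U_n}(u) = 1-\mathrm{pr}(U_n > u)$, where
$$\mathrm{pr}(U_n > u) = \left\{1-\left[1-\left(\frac{\frac{uc}{\theta n}+c}{c}\right)^{-\theta}\right]\right\}^n = \left[\left(1+\frac{u}{\theta n}\right)^{-\theta}\right]^n = \left[\left(1+\frac{u}{\theta n}\right)^n\right]^{-\theta}.$$
So,
$$\lim_{n\to\infty} \mathrm{pr}(U_n > u) = \lim_{n\to\infty}\left[\left(1+\frac{u}{\theta n}\right)^n\right]^{-\theta} = \left(e^{u/\theta}\right)^{-\theta} = e^{-u}.$$
So, $\lim_{n\to\infty} F_{U_n}(u) = 1-e^{-u}$, so that $f_U(u) = e^{-u}$, $0 < u < +\infty$.

Solution 3.22
(a) First, note that $U = X_{(1)} = \min\{X_1, X_2, \ldots, X_n\}$ and that $V = \left(1-X_{(n)}\right)$, where $X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$. Since $F_X(x) = x$, $0 < x < 1$, direct application of the general formula for the joint distribution of any two order statistics based on a random sample of size $n$ from $f_X(x)$ (see the introductory material for this chapter) gives
$$f_{X_{(1)},X_{(n)}}(x_{(1)},x_{(n)}) = n(n-1)\left(x_{(n)}-x_{(1)}\right)^{n-2}, \quad 0 < x_{(1)} < x_{(n)} < 1.$$
For the transformation $U = X_{(1)}$ and $V = \left(1-X_{(n)}\right)$, with inverse functions $X_{(1)} = U$ and $X_{(n)} = (1-V)$, the absolute value of the Jacobian is equal to 1; so, it follows directly that
$$f_{U,V}(u,v) = n(n-1)(1-u-v)^{n-2}, \quad 0 < u < 1,\ 0 < (u+v) < 1.$$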
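(b) Now,
$$\theta_n = \mathrm{pr}[(R > r)\cap(S > s)] = \mathrm{pr}[(nU > r)\cap(nV > s)] = \mathrm{pr}\left[\left(U > \frac{r}{n}\right)\cap\left(V > \frac{s}{n}\right)\right]$$
$$= \int_{r/n}^{1-s/n}\int_{s/n}^{1-u} n(n-1)(1-u-v)^{n-2}\,dv\,du = \left(1-\frac{r}{n}-\frac{s}{n}\right)^n.$$
So, we have
$$\lim_{n\to\infty}\theta_n = \lim_{n\to\infty}\left(1-\frac{r}{n}-\frac{s}{n}\right)^n = \lim_{n\to\infty}\left\{1+\frac{[-(r+s)]}{n}\right\}^n = e^{-(r+s)} = e^{-r}e^{-s}, \quad 0 < r < \infty,\ 0 < s < \infty.$$
So, asymptotically, $R$ and $S$ are independent random variables with exponential distributions, namely,
$$f_R(r) = e^{-r},\ 0 < r < \infty \quad\text{and}\quad f_S(s) = e^{-s},\ 0 < s < \infty.$$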
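The convergence of $\theta_n = (1-r/n-s/n)^n$ to $e^{-(r+s)}$ in Solution 3.22(b) is easy to see numerically. The sketch below uses arbitrary illustrative values $r = 0.5$ and $s = 1.2$ (these are assumptions for the demonstration, not values from the exercise) and requires $n > r+s$ so that the base stays positive.

```python
from math import exp

def theta_n(r, s, n):
    """Exact joint survival probability pr(R > r, S > s) for sample size n."""
    return (1 - r / n - s / n) ** n

r, s = 0.5, 1.2                      # illustrative values only
limit = exp(-(r + s))                # product of two unit-exponential survival functions
sample_sizes = (10, 100, 1000, 10000)
errors = [abs(theta_n(r, s, n) - limit) for n in sample_sizes]

for n, err in zip(sample_sizes, errors):
    print(n, round(theta_n(r, s, n), 6), round(err, 6))
```

The absolute error shrinks roughly in proportion to $1/n$, consistent with the limit derived above.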
Solution 3.23
(a)
$$E(C) = E_x[E(C|X=x)] = \sum_{x=0}^\infty E(C|X=x)\,\mathrm{pr}(X=x) = E(C|X=0)\,\mathrm{pr}(X=0)+\sum_{x=1}^\infty E(C|X=x)\,\mathrm{pr}(X=x)$$
$$= 0+\sum_{x=1}^\infty E\left[\sum_{j=1}^X C_j\,\middle|\,X=x\right]\mathrm{pr}(X=x) = \sum_{x=1}^\infty (x\mu)\,\mathrm{pr}(X=x) = \mu\sum_{x=0}^\infty x\,\mathrm{pr}(X=x) = \mu E(X) = \mu\lambda;$$
and,
$$E(C^2) = \sum_{x=0}^\infty E(C^2|X=x)\,\mathrm{pr}(X=x) = 0+\sum_{x=1}^\infty E\left[\left(\sum_{j=1}^X C_j\right)^2\middle|\,X=x\right]\mathrm{pr}(X=x) = \sum_{x=1}^\infty E\left[\sum_{j=1}^x C_j^2+2\sum_{\text{all } j<k} C_jC_k\,\middle|\,X=x\right]\mathrm{pr}(X=x)$$
[...]

[...] $m$, $V(X_i) > V(Y_i)$ requires $m > n$, and $V(X_i) = V(Y_i)$ requires $m = n$. When $N = 60$, $\lambda_1 = 2$, and $\lambda_2 = 8$, so that $\theta = 2$, we obtain $m = 20$ and $n = 40$.

Solution 3.29*
(a)
$$\theta = \mathrm{pr}(X = Y) = \sum_{s=0}^\infty \mathrm{pr}[(X=s)\cap(Y=s)] = \sum_{s=0}^\infty \mathrm{pr}(X=s)\,\mathrm{pr}(Y=s) = \sum_{s=0}^\infty (1-\pi_x)\pi_x^s(1-\pi_y)\pi_y^s$$
$$= (1-\pi_x)(1-\pi_y)\sum_{s=0}^\infty (\pi_x\pi_y)^s = (1-\pi_x)(1-\pi_y)\left[\frac{1}{(1-\pi_x\pi_y)}\right] = \frac{(1-\pi_x)(1-\pi_y)}{(1-\pi_x\pi_y)}.$$
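(b) First, $p_U(0) = \mathrm{pr}(U = 0) = \mathrm{pr}(X = Y) = \theta$. And, for $u = 1, 2, \ldots, \infty$,
$$p_U(u) = \mathrm{pr}(|X-Y| = u) = \mathrm{pr}[(X-Y) = u]+\mathrm{pr}[(X-Y) = -u].$$
So, for $u = 1, 2, \ldots, \infty$, we have
$$\mathrm{pr}(X-Y = u) = \sum_{k=0}^\infty \mathrm{pr}[(X = k+u)\cap(Y = k)] = \sum_{k=0}^\infty \mathrm{pr}(X = k+u)\,\mathrm{pr}(Y = k) = \sum_{k=0}^\infty (1-\pi_x)\pi_x^{k+u}(1-\pi_y)\pi_y^k$$
$$= (1-\pi_x)(1-\pi_y)\pi_x^u\sum_{k=0}^\infty (\pi_x\pi_y)^k = \frac{(1-\pi_x)(1-\pi_y)}{(1-\pi_x\pi_y)}\pi_x^u = \theta\pi_x^u.$$
And,
$$\mathrm{pr}(X-Y = -u) = \sum_{k=0}^\infty \mathrm{pr}(X = k)\,\mathrm{pr}(Y = k+u) = \sum_{k=0}^\infty (1-\pi_x)\pi_x^k(1-\pi_y)\pi_y^{k+u} = (1-\pi_x)(1-\pi_y)\pi_y^u\sum_{k=0}^\infty (\pi_x\pi_y)^k = \frac{(1-\pi_x)(1-\pi_y)}{(1-\pi_x\pi_y)}\pi_y^u = \theta\pi_y^u.$$
Hence, we have
$$p_U(0) = \theta \quad\text{and}\quad p_U(u) = \theta\left(\pi_x^u+\pi_y^u\right), \quad u = 1, 2, \ldots, \infty.$$
It can be shown directly that $\sum_{u=0}^\infty p_U(u) = 1$.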
(c) For the available data, the observed value of $U$ is $u = 2$. So, under the assumption that $\pi_x = \pi_y = \pi$, say, it follows that
$$\mathrm{pr}(U \ge 2|\pi_x = \pi_y = \pi) = \sum_{u=2}^\infty 2\theta\pi^u = 2\left(\frac{1-\pi}{1+\pi}\right)\sum_{u=2}^\infty \pi^u = 2\left(\frac{1-\pi}{1+\pi}\right)\left(\frac{\pi^2}{1-\pi}\right) = \frac{2\pi^2}{(1+\pi)}.$$
Given the restrictions $\pi_x \le 0.10$ and $\pi_y \le 0.10$, the largest possible value of $\mathrm{pr}(U \ge 2|\pi_x = \pi_y = \pi) = 2\pi^2/(1+\pi)$ is $2(0.10)^2/(1+0.10) = 0.018$. So, these data provide fairly strong statistical evidence that $\pi_x \ne \pi_y$.

Solution 3.30*
(a)
$$\psi(t) = E[(t+1)^Y] = \sum_{y=1}^\infty (t+1)^y\left(e^\theta-1\right)^{-1}\frac{\theta^y}{y!} = (e^\theta-1)^{-1}\sum_{y=1}^\infty \frac{[\theta(t+1)]^y}{y!} = (e^\theta-1)^{-1}\left\{\sum_{y=0}^\infty \frac{[\theta(t+1)]^y}{y!}-1\right\} = \frac{e^{\theta(t+1)}-1}{(e^\theta-1)}.$$
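(b) For $Y$ a positive integer,
$$E\left[(t+1)^Y\right] = E\left[\sum_{j=0}^Y C^Y_j t^j(1)^{Y-j}\right] = E\left[1+tY+\frac{t^2}{2}Y(Y-1)+\cdots\right] = 1+tE(Y)+\frac{t^2}{2}E[Y(Y-1)]+\cdots.$$
So,
$$\left.\frac{d\psi(t)}{dt}\right|_{t=0} = E(Y), \quad \left.\frac{d^2\psi(t)}{dt^2}\right|_{t=0} = E[Y(Y-1)].$$
Thus,
$$\left.\frac{d}{dt}\left[\frac{e^{\theta(t+1)}-1}{(e^\theta-1)}\right]\right|_{t=0} = \left[\frac{\theta e^{\theta(t+1)}}{(e^\theta-1)}\right]_{t=0} = \frac{\theta e^\theta}{(e^\theta-1)} = E(Y).$$
And,
$$\left.\frac{d^2}{dt^2}\left[\frac{e^{\theta(t+1)}-1}{(e^\theta-1)}\right]\right|_{t=0} = \left.\frac{d}{dt}\left[\frac{\theta e^{\theta(t+1)}}{(e^\theta-1)}\right]\right|_{t=0} = \left[\frac{\theta^2e^{\theta(t+1)}}{(e^\theta-1)}\right]_{t=0} = \frac{\theta^2e^\theta}{(e^\theta-1)} = E[Y(Y-1)].$$
Finally,
$$V(Y) = E[Y(Y-1)]+E(Y)-[E(Y)]^2 = \frac{\theta^2e^\theta}{(e^\theta-1)}+\frac{\theta e^\theta}{(e^\theta-1)}-\frac{\theta^2e^{2\theta}}{\left(e^\theta-1\right)^2} = \frac{\theta e^\theta(e^\theta-\theta-1)}{(e^\theta-1)^2}.$$

(c)
$$p_X(x) = \mathrm{pr}(X = x) = \mathrm{pr}[(Y+Z) = x] = \sum_{l=0}^{x-1} \mathrm{pr}(Y = x-l)\,\mathrm{pr}(Z = l) = \sum_{l=0}^{x-1} \frac{(e^\theta-1)^{-1}\theta^{x-l}}{(x-l)!}\cdot\frac{(\pi\theta)^le^{-\pi\theta}}{l!}$$
$$= \left[e^{\pi\theta}(e^\theta-1)\right]^{-1}\theta^x\sum_{l=0}^{x-1} \frac{\pi^l}{l!(x-l)!} = \frac{\left[e^{\pi\theta}(e^\theta-1)\right]^{-1}\theta^x}{x!}\sum_{l=0}^{x-1} C^x_l\pi^l = \frac{\left[e^{\pi\theta}(e^\theta-1)\right]^{-1}\theta^x}{x!}\left[\sum_{l=0}^{x} C^x_l\pi^l(1)^{x-l}-\pi^x\right]$$
$$= \left[e^{\pi\theta}(e^\theta-1)\right]^{-1}\frac{\theta^x}{x!}\left[(\pi+1)^x-\pi^x\right], \quad x = 1, 2, \ldots, \infty.$$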
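The closed form for $p_X(x)$ in part (c) can be checked against the convolution sum it was derived from, and its total probability can be verified numerically. The sketch below is illustrative: the parameter values $\theta = 1.5$ and $\pi = 0.3$ are arbitrary assumptions, and the function names are hypothetical.

```python
from math import exp, factorial

def p_x_closed(x, theta, pi):
    """Closed-form pmf of X = Y + Z from part (c), for x = 1, 2, ..."""
    norm = 1.0 / (exp(pi * theta) * (exp(theta) - 1.0))
    return norm * theta**x / factorial(x) * ((pi + 1.0)**x - pi**x)

def p_x_convolution(x, theta, pi):
    """Same pmf via the sum over l of pr(Y = x - l) * pr(Z = l)."""
    total = 0.0
    for l in range(x):
        p_y = theta**(x - l) / ((exp(theta) - 1.0) * factorial(x - l))  # truncated Poisson
        p_z = (pi * theta)**l * exp(-pi * theta) / factorial(l)         # POI(pi * theta)
        total += p_y * p_z
    return total

theta, pi = 1.5, 0.3   # illustrative values only
total_mass = sum(p_x_closed(x, theta, pi) for x in range(1, 60))
max_gap = max(abs(p_x_closed(x, theta, pi) - p_x_convolution(x, theta, pi))
              for x in range(1, 15))

print(round(total_mass, 12), max_gap)
```

Agreement between the two functions confirms the binomial-theorem step, and a total mass of 1 confirms the normalizing constant $[e^{\pi\theta}(e^\theta-1)]^{-1}$.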
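(d) Since $Z \sim \mathrm{POI}(\pi\theta)$,
$$E(X) = E(Y+Z) = E(Y)+E(Z) = \frac{\theta e^\theta}{(e^\theta-1)}+\pi\theta = \theta\left[\frac{e^\theta}{(e^\theta-1)}+\pi\right].$$
And,
$$V(X) = V(Y)+V(Z) = \frac{\theta e^\theta(e^\theta-\theta-1)}{(e^\theta-1)^2}+\pi\theta = \theta\left[\frac{e^\theta(e^\theta-\theta-1)}{(e^\theta-1)^2}+\pi\right].$$

Solution 3.31*
(a)
$$\mathrm{pr}(\text{both kidneys are still functioning at time } t) = \mathrm{pr}[(X_1 \ge t)\cap(X_2 \ge t)] = \mathrm{pr}(X_1 \ge t)\,\mathrm{pr}(X_2 \ge t) = \left[\int_t^\infty \alpha e^{-\alpha x}\,dx\right]^2 = e^{-2\alpha t}.$$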
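(b) $\mathrm{pr}(\text{exactly one kidney is functioning at time } t) = \mathrm{pr}[(U < t)\cap(Y \ge t)]$. Now, from part (a), $F_U(u) = \mathrm{pr}(U \le u) = 1-e^{-2\alpha u}$, so that
$$f_U(u) = 2\alpha e^{-2\alpha u}, \quad u > 0.$$
Hence,
$$f_{U,Y}(u,y) = f_U(u)f_Y(y|U=u) = \left(2\alpha e^{-2\alpha u}\right)\left[\beta e^{-\beta(y-u)}\right], \quad 0 < u < y < \infty.$$
Thus,
$$\mathrm{pr}[(U < t)\cap(Y \ge t)] = \int_0^t\int_t^\infty \left(2\alpha e^{-2\alpha u}\right)\left[\beta e^{-\beta(y-u)}\right]dy\,du = \frac{2\alpha}{(\beta-2\alpha)}\left[e^{-2\alpha t}-e^{-\beta t}\right], \quad t \ge 0.$$

(c) $F_T(t) = \mathrm{pr}(T \le t) = 1-\pi_0(t)-\pi_1(t)$, so that
$$f_T(t) = \frac{d}{dt}[F_T(t)] = \frac{2\alpha\beta}{(\beta-2\alpha)}\left[e^{-2\alpha t}-e^{-\beta t}\right], \quad t \ge 0.$$

(d) The marginal density of $Y$ is given by the expression
$$f_Y(y) = \int_0^y \left(2\alpha e^{-2\alpha u}\right)\left[\beta e^{-\beta(y-u)}\right]du = \frac{2\alpha\beta}{(\beta-2\alpha)}\left[e^{-2\alpha y}-e^{-\beta y}\right], \quad y \ge 0.$$
So, as expected, $T$ and $Y$ have exactly the same distribution (i.e., $T = Y$). Finally,
$$E(T) = E(Y) = E_u[E(Y|U=u)] = E_u\left(u+\frac{1}{\beta}\right) = \frac{1}{2\alpha}+\frac{1}{\beta},$$
and
$$V(T) = V(Y) = V_u[E(Y|U=u)]+E_u[V(Y|U=u)] = V_u\left(u+\frac{1}{\beta}\right)+E\left(\frac{1}{\beta^2}\right) = \frac{1}{4\alpha^2}+\frac{1}{\beta^2}.$$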
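The mean and variance of $T$ can be cross-checked by integrating the density $f_T(t)$ numerically. The sketch below uses a simple midpoint rule and the arbitrary illustrative rates $\alpha = 0.5$ and $\beta = 2$ (assumptions for the demonstration, chosen so that $\beta \ne 2\alpha$); the helper names are hypothetical.

```python
from math import exp

def f_t(t, alpha, beta):
    """Density of T from part (c); requires beta != 2 * alpha."""
    return 2 * alpha * beta / (beta - 2 * alpha) * (exp(-2 * alpha * t) - exp(-beta * t))

def integrate(g, lo, hi, steps=60_000):
    """Midpoint-rule approximation of the integral of g over (lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))

alpha, beta = 0.5, 2.0   # illustrative values only
upper = 60.0             # the density is negligible beyond this point for these rates

mass = integrate(lambda t: f_t(t, alpha, beta), 0.0, upper)
mean = integrate(lambda t: t * f_t(t, alpha, beta), 0.0, upper)
second = integrate(lambda t: t * t * f_t(t, alpha, beta), 0.0, upper)
variance = second - mean**2

print(round(mass, 6), round(mean, 6), round(variance, 6))
print(1 / (2 * alpha) + 1 / beta, 1 / (4 * alpha**2) + 1 / beta**2)
```

For these rates the closed forms give $E(T) = 1.5$ and $V(T) = 1.25$, which the numerical integrals should match to several decimal places.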
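Solution 3.32*
(a) Since $\bar{X}$ and $S^2$ are independent random variables, and since $\bar{X} \sim N(\mu, \sigma^2/n)$, it follows that $E[T_{(n-1)}] = \sqrt{n}E(\bar{X}-\mu)E(S^{-1}) = 0$. Thus,
$$\mathrm{cov}\left[\bar{X}, T_{(n-1)}\right] = E\left[\bar{X}\,\frac{\sqrt{n}(\bar{X}-\mu)}{S}\right] = \sqrt{n}E[\bar{X}(\bar{X}-\mu)]E(S^{-1}) = \sqrt{n}E(\bar{X}^2-\mu^2)E(S^{-1}) = \sqrt{n}V(\bar{X})E(S^{-1}) = \frac{\sigma^2}{\sqrt{n}}E(S^{-1}).$$
Now, since
$$U = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{(n-1)} = \mathrm{GAMMA}\left(\alpha = 2, \beta = \frac{(n-1)}{2}\right),$$
it follows that
$$E(U^r) = \int_0^\infty u^r\,\frac{u^{\frac{n-1}{2}-1}e^{-u/2}}{\Gamma\left(\frac{n-1}{2}\right)2^{\frac{n-1}{2}}}\,du = \frac{\Gamma\left(\frac{n-1}{2}+r\right)}{\Gamma\left(\frac{n-1}{2}\right)}2^r, \quad \frac{(n-1)}{2}+r > 0.$$
So,
$$E(U^{-1/2}) = E\left\{\left[\frac{(n-1)S^2}{\sigma^2}\right]^{-1/2}\right\} = \frac{\sigma}{\sqrt{n-1}}E(S^{-1}) = \frac{\Gamma\left(\frac{n-1}{2}-\frac{1}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}2^{-1/2},$$
so that
$$E(S^{-1}) = \sqrt{\frac{(n-1)}{2\sigma^2}}\,\frac{\Gamma\left(\frac{n-2}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}, \quad n > 2.$$
Thus,
$$\mathrm{cov}[\bar{X}, T_{(n-1)}] = \sigma\sqrt{\frac{(n-1)}{2n}}\,\frac{\Gamma\left(\frac{n-2}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}, \quad n > 2.$$
Now,
$$V[T_{(n-1)}] = E\left[T_{(n-1)}^2\right] = E\left[\frac{n(\bar{X}-\mu)^2}{S^2}\right] = nE[(\bar{X}-\mu)^2]E(S^{-2}) = \sigma^2E(S^{-2}).$$
And, since
$$E(U^{-1}) = \frac{\sigma^2}{(n-1)}E(S^{-2}) = \frac{\Gamma\left(\frac{n-1}{2}-1\right)}{\Gamma\left(\frac{n-1}{2}\right)}2^{-1} = (n-3)^{-1}, \quad n > 3,$$
it follows that
$$E(S^{-2}) = \frac{(n-1)}{(n-3)\sigma^2}$$
and hence
$$V[T_{(n-1)}] = \frac{(n-1)}{(n-3)}, \quad n > 3.$$
Finally,
$$\mathrm{corr}[\bar{X}, T_{(n-1)}] = \sqrt{\frac{(n-3)}{2}}\,\frac{\Gamma\left(\frac{n-2}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}, \quad n > 3.$$
When $n = 4$, $\mathrm{corr}[\bar{X}, T_{(n-1)}] = \sqrt{2/\pi} = 0.798$; for $n = 6$, $\mathrm{corr}[\bar{X}, T_{(n-1)}] = 2\sqrt{2/3\pi} = 0.921$.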
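The closed-form correlation in part (a) is straightforward to evaluate for any $n > 3$. The sketch below (function name illustrative) works on the log-gamma scale so that large $n$ does not overflow, and reproduces the two values quoted above.

```python
from math import exp, lgamma, pi, sqrt

def corr_xbar_t(n):
    """corr(X-bar, T_(n-1)) = sqrt((n-3)/2) * Gamma((n-2)/2) / Gamma((n-1)/2), n > 3."""
    if n <= 3:
        raise ValueError("the correlation is defined only for n > 3")
    log_ratio = lgamma((n - 2) / 2) - lgamma((n - 1) / 2)
    return sqrt((n - 3) / 2) * exp(log_ratio)

for n in (4, 6, 10, 30, 1000):
    print(n, round(corr_xbar_t(n), 4))
```

The printed values increase with $n$, so for these sample sizes the sample mean and the $t$ statistic become more nearly linearly related as $n$ grows.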
(b) Using the stated "large $x$" approximation for $\Gamma(x)$, we have
$$\mathrm{corr}[\bar{X}, T_{(n-1)}] \approx \sqrt{\frac{(n-3)}{2}}\;\frac{\sqrt{2\pi}\,e^{-\left(\frac{n-2}{2}\right)}\left(\frac{n-2}{2}\right)^{\left(\frac{n-2}{2}\right)-\frac{1}{2}}}{\sqrt{2\pi}\,e^{-\left(\frac{n-1}{2}\right)}\left(\frac{n-1}{2}\right)^{\left(\frac{n-1}{2}\right)-\frac{1}{2}}} = \left[\frac{e(n-3)(n-2)^{(n-3)}}{(n-1)^{(n-2)}}\right]^{1/2},$$
so that $\lim_{n\to\infty} \mathrm{corr}[\bar{X}, T_{(n-1)}] = 1$.

As $n \to \infty$, the distribution of $T_{(n-1)}$ becomes that of a standard normal random variable $Z = (\bar{X}-\mu)/(\sigma/\sqrt{n})$, and the random variable $Z = -\sqrt{n}\mu/\sigma+(\sqrt{n}/\sigma)\bar{X}$ is a straight line function of the random variable $\bar{X}$.

Solution 3.33*
Let $X$ and $Y$ denote the numbers of ones obtained when the two balanced dice are each tossed $n$ times, and let $Z$ be the number of ones obtained when the unbalanced die is tossed $n$ times. Further, let $U = \min(X, Y)$. Then, $n^*$ is the smallest value of $n$ such that $\mathrm{pr}(Z < U) \ge 0.99$.
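Now, for $u = 0, 1, \ldots, n$,
$$\mathrm{pr}(U = u) = \mathrm{pr}[(X = u)\cap(Y > u)]+\mathrm{pr}[(X > u)\cap(Y = u)]+\mathrm{pr}[(X = u)\cap(Y = u)]$$
$$= \mathrm{pr}(X = u)\,\mathrm{pr}(Y > u)+\mathrm{pr}(X > u)\,\mathrm{pr}(Y = u)+\mathrm{pr}(X = u)\,\mathrm{pr}(Y = u)$$
$$= 2C^n_u\left(\frac{1}{6}\right)^u\left(\frac{5}{6}\right)^{n-u}\left[\sum_{j=u+1}^n C^n_j\left(\frac{1}{6}\right)^j\left(\frac{5}{6}\right)^{n-j}\right]+\left[C^n_u\left(\frac{1}{6}\right)^u\left(\frac{5}{6}\right)^{n-u}\right]^2.$$
Finally, determine $n^*$ as the smallest value of $n$ such that
$$\mathrm{pr}(Z < U) = \sum_{z=0}^{n-1}\sum_{u=z+1}^{n} \mathrm{pr}(Z = z)\,\mathrm{pr}(U = u)$$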
$$= \sum_{z=0}^{n-1}\sum_{u=z+1}^{n} C^n_z\left(\frac{1}{6}-\epsilon\right)^z\left(\frac{5}{6}+\epsilon\right)^{n-z}\mathrm{pr}(U = u) \ge 0.99.$$
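Solution 3.34
(a)
$$F_Y(y) = \mathrm{pr}(Y \le y) = \mathrm{pr}[(X_1-X_2) \le y] = \mathrm{pr}[X_1 \le (X_2+y)] = \int_{-\infty}^{\infty} \mathrm{pr}[X_1 \le (x_2+y)|X_2 = x_2]f_{X_2}(x_2)\,dx_2$$
$$= \int_{-\infty}^{\infty} e^{-e^{-(x_2+y)}}e^{-e^{-x_2}}e^{-x_2}\,dx_2 = \int_{-\infty}^{\infty} e^{-e^{-x_2}(1+e^{-y})}e^{-x_2}\,dx_2.$$
Let $u = -e^{-x_2}(1+e^{-y})$, so that $du = e^{-x_2}(1+e^{-y})\,dx_2$. So,
$$F_Y(y) = \int_{-\infty}^0 e^u(1+e^{-y})^{-1}\,du = (1+e^{-y})^{-1}\int_{-\infty}^0 e^u\,du = (1+e^{-y})^{-1}\left[e^u\right]_{-\infty}^0 = (1+e^{-y})^{-1}, \quad -\infty < y < +\infty.$$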
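The result of part (a), that the difference of two independent standard Gumbel variables has the logistic CDF, can be confirmed by numerical integration. The sketch below is illustrative only: it assumes the standard Gumbel form used in the integrals above, uses a midpoint rule, and the integration limits $(-10, 40)$ are an assumption chosen so that the neglected tails are negligible.

```python
from math import exp

def gumbel_cdf(x):
    """Standard Gumbel CDF exp(-exp(-x))."""
    return exp(-exp(-x))

def gumbel_pdf(x):
    """Standard Gumbel density exp(-x) * exp(-exp(-x))."""
    return exp(-x - exp(-x))

def cdf_of_difference(y, lo=-10.0, hi=40.0, steps=50_000):
    """pr(X1 - X2 <= y) by midpoint integration of F(x2 + y) f(x2) over x2."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x2 = lo + (i + 0.5) * h
        total += gumbel_cdf(x2 + y) * gumbel_pdf(x2)
    return h * total

def logistic_cdf(y):
    return 1.0 / (1.0 + exp(-y))

for y in (-2.0, 0.0, 1.5):
    print(y, round(cdf_of_difference(y), 6), round(logistic_cdf(y), 6))
```

The two columns should agree to about six decimal places, matching $F_Y(y) = (1+e^{-y})^{-1}$.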
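(b) Let $X_{1(m)}$ denote the largest observation in the first $m$ observations, and let $X_{2(m)}$ denote the largest observation in the second $m$ observations. Then, from part (a), the variable $m\theta[X_{1(m)}-\beta]-m\theta[X_{2(m)}-\beta] = m\theta[X_{1(m)}-X_{2(m)}]$ has the CDF
$$\left[1+e^{-m\theta\left(X_{1(m)}-X_{2(m)}\right)}\right]^{-1}.$$
So,
$$\mathrm{pr}\left\{\left|m\theta\left(X_{1(m)}-X_{2(m)}\right)\right| \le k\right\} = \mathrm{pr}\left\{\theta \le \frac{k}{m\left|X_{1(m)}-X_{2(m)}\right|}\right\} = \mathrm{pr}\left\{-k \le m\theta\left(X_{1(m)}-X_{2(m)}\right) \le k\right\}$$
$$= \frac{1}{(1+e^{-k})}-\frac{1}{(1+e^{k})} = \frac{\left(e^k-1\right)}{\left(e^k+1\right)}, \quad k > 0.$$
So, if $k_{1-\alpha}$ is chosen so that
$$\frac{e^{k_{1-\alpha}}-1}{e^{k_{1-\alpha}}+1} = (1-\alpha),$$
then
$$U = \frac{k_{1-\alpha}}{m\left|X_{1(m)}-X_{2(m)}\right|}.$$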
Solution 3.35*
(a) Clearly, $0 < U_i = F_X(X_i) < 1$. And,
$$F_{U_i}(u_i) = \mathrm{pr}(U_i \le u_i) = \mathrm{pr}[F_X(X_i) \le u_i] = \mathrm{pr}\left\{F_X^{-1}[F_X(X_i)] \le F_X^{-1}(u_i)\right\} = \mathrm{pr}\left[X_i \le F_X^{-1}(u_i)\right] = F_X\left[F_X^{-1}(u_i)\right] = u_i.$$
So, since $dF_{U_i}(u_i)/du_i = f_{U_i}(u_i) = 1$, $0 < u_i < 1$, it follows that $U_i = F_X(X_i)$ has a uniform distribution on the interval $(0, 1)$.
(b) Given the result in part (a), it follows that $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ can be considered to be the order statistics based on a random sample $U_1, U_2, \ldots, U_n$ of size $n$ from a uniform distribution on the interval $(0, 1)$. Hence, from the theory of order statistics, it follows directly that
$$f_{U_{(r)},U_{(s)}}(u_{(r)},u_{(s)}) = \frac{n!}{(r-1)!(s-r-1)!(n-s)!}\,u_{(r)}^{r-1}\left(u_{(s)}-u_{(r)}\right)^{s-r-1}\left(1-u_{(s)}\right)^{n-s}, \quad 0 < u_{(r)} < u_{(s)} < 1.$$
Now, using the method of transformations, let $V_{rs} \equiv V = [U_{(s)}-U_{(r)}]$ and $W = U_{(r)}$, so that $U_{(s)} = (V+W)$ and $U_{(r)} = W$. Then, the Jacobian $J = 1$, and so
$$f_{V,W}(v,w) = \frac{n!}{(r-1)!(s-r-1)!(n-s)!}\,w^{r-1}v^{s-r-1}(1-v-w)^{n-s}, \quad 0 < (v+w) < 1.$$
Then, using the relationship $y = w/(1-v)$, so that $dy = dw/(1-v)$, and making use of properties of the beta distribution, we have
$$f_V(v) = \int_0^{1-v} \frac{n!}{(r-1)!(s-r-1)!(n-s)!}\,w^{r-1}v^{s-r-1}(1-v-w)^{n-s}\,dw$$
$$= \int_0^1 \frac{n!}{(r-1)!(s-r-1)!(n-s)!}\,[(1-v)y]^{r-1}v^{s-r-1}[(1-v)-(1-v)y]^{n-s}(1-v)\,dy$$
$$= \frac{n!}{(r-1)!(s-r-1)!(n-s)!}\,v^{s-r-1}(1-v)^{[(r-1)+(n-s)+1]}\int_0^1 y^{r-1}(1-y)^{n-s}\,dy$$
$$= \frac{\Gamma(n+1)}{\Gamma(s-r)\Gamma(n-s+r+1)}\,v^{s-r-1}(1-v)^{n-s+r}, \quad 0 < v < 1,$$
so that $V_{rs} \sim \mathrm{BETA}(\alpha = s-r, \beta = n-s+r+1)$.

(c) If $n = 10$, $r = 1$, $s = 10$, and $p = 0.80$, then $f_{V_{rs}}(v) = 90v^8(1-v)$, $0 < v < 1$, so that
$$\theta = \mathrm{pr}(V_{1n} \ge 0.80) = \int_{0.80}^1 90v^8(1-v)\,dv = 0.6242.$$
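For the special case $r = 1$, $s = n$ in part (c), the range $V_{1n}$ has density $n(n-1)v^{n-2}(1-v)$, and its upper-tail probability has a simple antiderivative. The sketch below (function name illustrative; the closed form is obtained here by integrating that density from $p$ to 1) reproduces the value 0.6242 and shows how the probability changes with the sample size.

```python
def range_coverage_prob(n, p):
    """pr(V_1n >= p), where V_1n = U_(n) - U_(1) ~ BETA(n - 1, 2).

    Integral of n(n-1) v^(n-2) (1 - v) over (p, 1):
        1 - n p^(n-1) + (n - 1) p^n
    """
    return 1.0 - n * p ** (n - 1) + (n - 1) * p**n

theta = range_coverage_prob(10, 0.80)
print(round(theta, 4))

# How the coverage probability grows with the sample size for p = 0.80
for n in (5, 10, 20, 30):
    print(n, round(range_coverage_prob(n, 0.80), 4))
```

Since $V_{1n} = F_X(X_{(n)})-F_X(X_{(1)})$, this is the probability that the sample range covers at least a proportion $p$ of the population, whatever the continuous distribution $F_X$.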
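Solution 3.36*
(a) $V(Y_{ijk}) = V(\beta_j)+V(\gamma_{ij})+V(\epsilon_{ijk}) = \sigma_\beta^2+\sigma_\gamma^2+\sigma_\epsilon^2$.

(b) For fixed $i$ and $j$, and for $k \ne k'$, we have
$$\mathrm{cov}(Y_{ijk}, Y_{ijk'}) = \mathrm{cov}[\mu_i+\beta_j+\gamma_{ij}+\epsilon_{ijk},\ \mu_i+\beta_j+\gamma_{ij}+\epsilon_{ijk'}] = \mathrm{cov}(\beta_j,\beta_j)+\mathrm{cov}(\gamma_{ij},\gamma_{ij}) = V(\beta_j)+V(\gamma_{ij}) = \sigma_\beta^2+\sigma_\gamma^2.$$

(c) For $i \ne i'$, for fixed $j$, and for $k \ne k'$, we have
$$\mathrm{cov}(Y_{ijk}, Y_{i'jk'}) = \mathrm{cov}[\mu_i+\beta_j+\gamma_{ij}+\epsilon_{ijk},\ \mu_{i'}+\beta_j+\gamma_{i'j}+\epsilon_{i'jk'}] = \mathrm{cov}(\beta_j,\beta_j) = V(\beta_j) = \sigma_\beta^2.$$

(d) Now,
$$\bar{Y}_{ij} = n_{ij}^{-1}\sum_{k=1}^{n_{ij}} [\mu_i+\beta_j+\gamma_{ij}+\epsilon_{ijk}] = \mu_i+\beta_j+\gamma_{ij}+\bar{\epsilon}_{ij},$$
where $\bar{\epsilon}_{ij} = n_{ij}^{-1}\sum_{k=1}^{n_{ij}} \epsilon_{ijk}$. So,
$$E(\bar{Y}_{ij}) = \mu_i \quad\text{and}\quad V(\bar{Y}_{ij}) = \sigma_\beta^2+\sigma_\gamma^2+\frac{\sigma_\epsilon^2}{n_{ij}}.$$
Also, for $i \ne i'$,
$$\mathrm{cov}(\bar{Y}_{ij}, \bar{Y}_{i'j}) = \mathrm{cov}[\mu_i+\beta_j+\gamma_{ij}+\bar{\epsilon}_{ij},\ \mu_{i'}+\beta_j+\gamma_{i'j}+\bar{\epsilon}_{i'j}] = \mathrm{cov}(\beta_j,\beta_j) = V(\beta_j) = \sigma_\beta^2.$$

(e) First,
$$L = \sum_{i=1}^t a_i\bar{Y}_i = \sum_{i=1}^t a_i\left(c^{-1}\sum_{j=1}^c \bar{Y}_{ij}\right) = c^{-1}\sum_{i=1}^t a_i\sum_{j=1}^c [\mu_i+\beta_j+\gamma_{ij}+\bar{\epsilon}_{ij}] = \sum_{i=1}^t a_i\mu_i+c^{-1}\sum_{i=1}^t\sum_{j=1}^c a_i\gamma_{ij}+c^{-1}\sum_{i=1}^t\sum_{j=1}^c a_i\bar{\epsilon}_{ij},$$
since $\sum_{i=1}^t a_i = 0$ implies that $c^{-1}\left(\sum_{i=1}^t a_i\right)\left(\sum_{j=1}^c \beta_j\right) = 0$. So, we clearly have
$$E(L) = \sum_{i=1}^t a_i\mu_i.$$
And,
$$V(L) = c^{-2}\sum_{i=1}^t\sum_{j=1}^c a_i^2\sigma_\gamma^2+c^{-2}\sum_{i=1}^t\sum_{j=1}^c a_i^2\frac{\sigma_\epsilon^2}{n_{ij}} = \frac{\sigma_\gamma^2}{c}\sum_{i=1}^t a_i^2+\frac{\sigma_\epsilon^2}{c^2}\sum_{i=1}^t a_i^2\left(\sum_{j=1}^c n_{ij}^{-1}\right).$$
For the special case when $a_1 = +1$, $a_2 = -1$, $a_3 = a_4 = \cdots = a_t = 0$, we obtain $E(L) = (\mu_1-\mu_2)$, which is the true difference in average effects for drug therapies 1 and 2; and,
$$V(L) = \frac{2\sigma_\gamma^2}{c}+\frac{\sigma_\epsilon^2}{c^2}\left(\sum_{j=1}^c n_{1j}^{-1}+\sum_{j=1}^c n_{2j}^{-1}\right).$$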
The random variable $L = \sum_{i=1}^{t} a_i\bar{Y}_i$ is called a contrast since $\sum_{i=1}^{t} a_i = 0$, and $L$ can be used to estimate unbiasedly important comparisons among the set $\{\mu_1, \mu_2, \ldots, \mu_t\}$ of $t$ drug therapy average effects. For example, if $a_1 = +1$, $a_2 = -\frac{1}{2}$, $a_3 = -\frac{1}{2}$, $a_4 = a_5 = \cdots = a_t = 0$, then $E(L) = \mu_1 - \frac{1}{2}(\mu_2 + \mu_3)$, which is a comparison between the average effect of drug therapy 1 and the mean of the average effects of drug therapies 2 and 3.

Solution 3.37∗

(a) Now, $V(S^2) = (n - 1)^{-2}\,V\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]$. So,
$$V\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = E\left\{\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]^2\right\} - \left\{E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]\right\}^2$$
$$= E\left\{\left[\sum_{i=1}^{n}(X_i - \mu)^2 - n(\bar{X} - \mu)^2\right]^2\right\} - (n - 1)^2\sigma^4.$$
Now,
$$E\left\{\left[\sum_{i=1}^{n}(X_i - \mu)^2 - n(\bar{X} - \mu)^2\right]^2\right\} = E\left\{\left[\sum_{i=1}^{n}(X_i - \mu)^2\right]^2\right\} + n^2 E\left[(\bar{X} - \mu)^4\right] - 2nE\left[(\bar{X} - \mu)^2\sum_{i=1}^{n}(X_i - \mu)^2\right].$$
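Now,
$$E\left\{\left[\sum_{i=1}^{n}(X_i - \mu)^2\right]^2\right\} = E\left[\sum_{i=1}^{n}(X_i - \mu)^4 + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(X_i - \mu)^2(X_j - \mu)^2\right] = n\mu_4 + n(n - 1)\sigma^4.$$
And,
$$E\left[(\bar{X} - \mu)^4\right] = E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right)^4\right] = E\left\{\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)\right]^4\right\} = n^{-4}E\left\{\left[\sum_{i=1}^{n}(X_i - \mu)\right]^4\right\}$$
$$= n^{-4}E\left\{\sum\cdots\sum \frac{4!}{\prod_{i=1}^{n}\alpha_i!}\prod_{i=1}^{n}(X_i - \mu)^{\alpha_i}\right\} = n^{-4}\sum\cdots\sum \frac{4!}{\prod_{i=1}^{n}\alpha_i!}\prod_{i=1}^{n}E\left[(X_i - \mu)^{\alpha_i}\right],$$
where the notation $\sum\cdots\sum$ denotes the summation over all nonnegative integer value choices for $\alpha_1, \alpha_2, \ldots, \alpha_n$ such that $\sum_{i=1}^{n}\alpha_i = 4$. Noting that $E[(X_i - \mu)^{\alpha_i}] = 0$ when $\alpha_i = 1$, we only have to consider two types of terms: (i) $\alpha_i = 4$ for some $i$ and $\alpha_j = 0$ for all $j\,(\neq i)$; and (ii) $\alpha_i = 2$ and $\alpha_j = 2$ for $i \neq j$, and $\alpha_k = 0$ for all $k\,(\neq i \text{ or } j)$. There are $n$ of the former terms, each with expectation $\mu_4$, and there are $n(n - 1)/2$ of the latter terms, each with expectation $6\sigma^4$. Thus,
$$E\left[(\bar{X} - \mu)^4\right] = n^{-4}\left[n\mu_4 + \frac{n(n - 1)}{2}(6\sigma^4)\right] = n^{-3}\left[\mu_4 + 3(n - 1)\sigma^4\right].$$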
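And,
$$E\left[(\bar{X} - \mu)^2\sum_{i=1}^{n}(X_i - \mu)^2\right] = E\left\{\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)\right]^2\left[\sum_{i=1}^{n}(X_i - \mu)^2\right]\right\}$$
$$= \frac{1}{n^2}E\left\{\left[\sum_{i=1}^{n}(X_i - \mu)^2 + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(X_i - \mu)(X_j - \mu)\right]\left[\sum_{k=1}^{n}(X_k - \mu)^2\right]\right\}$$
$$= \frac{1}{n^2}E\left\{\left[\sum_{i=1}^{n}(X_i - \mu)^2\right]^2\right\} = \frac{1}{n^2}\left[n\mu_4 + n(n - 1)\sigma^4\right] = n^{-1}\left[\mu_4 + (n - 1)\sigma^4\right].$$
So, we have
$$V\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = n\mu_4 + n(n - 1)\sigma^4 + n^2\,n^{-3}\left[\mu_4 + 3(n - 1)\sigma^4\right] - 2n\,n^{-1}\left[\mu_4 + (n - 1)\sigma^4\right] - (n - 1)^2\sigma^4$$
$$= \frac{(n - 1)^2}{n}\mu_4 - \frac{(n - 1)(n - 3)}{n}\sigma^4.$$
Finally,
$$V(S^2) = (n - 1)^{-2}\,V\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{1}{n}\left[\mu_4 - \left(\frac{n - 3}{n - 1}\right)\sigma^4\right].$$

(b) For the POI($\lambda$) distribution, $\sigma^2 = \lambda$ and $\mu_4 = \lambda(1 + 3\lambda)$, giving
$$V(S^2) = \frac{1}{n}\left\{[\lambda(1 + 3\lambda)] - \left(\frac{n - 3}{n - 1}\right)\lambda^2\right\} = \frac{\lambda}{n}\left[1 + \left(\frac{2n}{n - 1}\right)\lambda\right].$$
For the $N(\mu, \sigma^2)$ distribution, $\mu_4 = 3\sigma^4$, giving
$$V(S^2) = \frac{1}{n}\left[3\sigma^4 - \left(\frac{n - 3}{n - 1}\right)\sigma^4\right] = \frac{2\sigma^4}{(n - 1)}.$$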
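The formula $V(S^2) = n^{-1}[\mu_4 - \{(n-3)/(n-1)\}\sigma^4]$ holds for any parent distribution with a finite fourth central moment, so it can be checked exactly on a small discrete distribution. The following sketch enumerates every possible sample using exact rational arithmetic. The three-point distribution in the usage lines and the function names are assumptions chosen purely for illustration.

```python
from fractions import Fraction
from itertools import product


def var_s2_exact(support, probs, n):
    """Exact V(S^2) by enumerating all ordered samples of size n."""
    e1 = e2 = Fraction(0)
    for idx in product(range(len(support)), repeat=n):
        xs = [support[i] for i in idx]
        pr = Fraction(1)
        for i in idx:
            pr *= probs[i]  # independence: joint probability is a product
        xbar = Fraction(sum(xs), n)
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        e1 += pr * s2
        e2 += pr * s2 ** 2
    return e2 - e1 ** 2


def var_s2_formula(support, probs, n):
    """V(S^2) = (1/n) [mu_4 - ((n-3)/(n-1)) sigma^4], as derived in part (a)."""
    mu = sum(p * x for x, p in zip(support, probs))
    sigma2 = sum(p * (x - mu) ** 2 for x, p in zip(support, probs))
    mu4 = sum(p * (x - mu) ** 4 for x, p in zip(support, probs))
    return (mu4 - Fraction(n - 3, n - 1) * sigma2 ** 2) / n


if __name__ == "__main__":
    support = [0, 1, 3]  # hypothetical support points
    probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    for n in (2, 3, 4):
        print(n, var_s2_exact(support, probs, n) == var_s2_formula(support, probs, n))
```

Because the enumeration grows as (number of support points)$^n$, this check is only practical for very small $n$.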
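Solution 3.38∗

(a) First,
$$\mathrm{cov}(\bar{X}, S^2) = E\{[\bar{X} - E(\bar{X})][S^2 - E(S^2)]\} = E(\bar{X}S^2) - E(\bar{X})E(S^2) = \frac{1}{n(n - 1)}E\left[\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{j=1}^{n}(X_j - \bar{X})^2\right)\right] - \mu\sigma^2.$$
Now,
$$E\left[\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{j=1}^{n}(X_j - \bar{X})^2\right)\right] = E\left[\sum_{i=1}^{n} X_i(X_i - \bar{X})^2\right] + E\left[\sum_{\text{all } i \neq j} X_i(X_j - \bar{X})^2\right].$$
And,
$$X_i(X_i - \bar{X})^2 = (X_i - \mu)[(X_i - \mu) - (\bar{X} - \mu)]^2 + \mu(X_i - \bar{X})^2$$
$$= (X_i - \mu)[(X_i - \mu)^2 - 2(\bar{X} - \mu)(X_i - \mu) + (\bar{X} - \mu)^2] + \mu(X_i - \bar{X})^2$$
$$= (X_i - \mu)^3 - 2(\bar{X} - \mu)(X_i - \mu)^2 + (\bar{X} - \mu)^2(X_i - \mu) + \mu(X_i - \bar{X})^2$$
$$= (X_i - \mu)^3 - \frac{2}{n}\left[\sum_{l=1}^{n}(X_l - \mu)\right](X_i - \mu)^2 + \left[\frac{1}{n}\sum_{l=1}^{n}(X_l - \mu)\right]^2(X_i - \mu) + \mu(X_i - \bar{X})^2$$
$$= (X_i - \mu)^3 - \frac{2}{n}(X_i - \mu)^3 - \frac{2}{n}(X_i - \mu)^2\sum_{\text{all } l(\neq i)}(X_l - \mu) + \frac{1}{n^2}\left[\sum_{l=1}^{n}(X_l - \mu)^2 + \sum_{\text{all } l \neq l'}(X_l - \mu)(X_{l'} - \mu)\right](X_i - \mu) + \mu(X_i - \bar{X})^2$$
$$= (X_i - \mu)^3 - \frac{2}{n}(X_i - \mu)^3 - \frac{2}{n}(X_i - \mu)^2\sum_{\text{all } l(\neq i)}(X_l - \mu) + \frac{1}{n^2}(X_i - \mu)^3 + \frac{1}{n^2}(X_i - \mu)\sum_{\text{all } l(\neq i)}(X_l - \mu)^2 + \frac{1}{n^2}(X_i - \mu)\sum_{\text{all } l \neq l'}(X_l - \mu)(X_{l'} - \mu) + \mu(X_i - \bar{X})^2.$$
Finally,
$$E\left[\sum_{i=1}^{n} X_i(X_i - \bar{X})^2\right] = n\mu_3 - 2\mu_3 - 0 + \frac{\mu_3}{n} + 0 + 0 + \mu(n - 1)\sigma^2 = \frac{(n - 1)^2}{n}\mu_3 + (n - 1)\mu\sigma^2.$$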
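Also, for $i \neq j$,
$$X_i(X_j - \bar{X})^2 = (X_i - \mu)[(X_j - \mu) - (\bar{X} - \mu)]^2 + \mu(X_j - \bar{X})^2$$
$$= (X_i - \mu)[(X_j - \mu)^2 - 2(\bar{X} - \mu)(X_j - \mu) + (\bar{X} - \mu)^2] + \mu(X_j - \bar{X})^2$$
$$= (X_i - \mu)(X_j - \mu)^2 - 2(\bar{X} - \mu)(X_i - \mu)(X_j - \mu) + (\bar{X} - \mu)^2(X_i - \mu) + \mu(X_j - \bar{X})^2$$
$$= (X_i - \mu)(X_j - \mu)^2 - \frac{2}{n}\left[\sum_{l=1}^{n}(X_l - \mu)\right](X_i - \mu)(X_j - \mu) + \left[\frac{1}{n}\sum_{l=1}^{n}(X_l - \mu)\right]^2(X_i - \mu) + \mu(X_j - \bar{X})^2$$
$$= (X_i - \mu)(X_j - \mu)^2 - \frac{2}{n}(X_i - \mu)(X_j - \mu)\sum_{l=1}^{n}(X_l - \mu) + \frac{1}{n^2}\left[\sum_{l=1}^{n}(X_l - \mu)^2 + \sum_{\text{all } l \neq l'}(X_l - \mu)(X_{l'} - \mu)\right](X_i - \mu) + \mu(X_j - \bar{X})^2$$
$$= (X_i - \mu)(X_j - \mu)^2 - \frac{2}{n}(X_i - \mu)(X_j - \mu)\sum_{l=1}^{n}(X_l - \mu) + \frac{1}{n^2}(X_i - \mu)^3 + \frac{1}{n^2}(X_i - \mu)\sum_{\text{all } l(\neq i)}(X_l - \mu)^2 + \frac{1}{n^2}(X_i - \mu)\sum_{\text{all } l \neq l'}(X_l - \mu)(X_{l'} - \mu) + \mu(X_j - \bar{X})^2.$$
Hence, we have
$$E\left[\sum_{\text{all } i \neq j} X_i(X_j - \bar{X})^2\right] = 0 - 0 + \frac{1}{n^2}[n(n - 1)\mu_3] + 0 + 0 + \mu(n - 1)^2\sigma^2 = \left(\frac{n - 1}{n}\right)\mu_3 + (n - 1)^2\mu\sigma^2.$$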
Finally, we have
$$\mathrm{cov}(\bar{X}, S^2) = \frac{1}{n(n - 1)}\left[\frac{(n - 1)^2}{n}\mu_3 + (n - 1)\mu\sigma^2 + \left(\frac{n - 1}{n}\right)\mu_3 + (n - 1)^2\mu\sigma^2\right] - \mu\sigma^2 = \mu_3/n.$$

(b) The joint distribution of $X_1$ and $X_2$ is equal to
$$p_{X_1,X_2}(x_1, x_2) = p_{X_1}(x_1)p_{X_2}(x_2) = \left(\frac{1}{4}\right)^{|x_1|}\left(\frac{1}{2}\right)^{1-|x_1|}\left(\frac{1}{4}\right)^{|x_2|}\left(\frac{1}{2}\right)^{1-|x_2|} = \left(\frac{1}{2}\right)^{2+|x_1|+|x_2|},$$
for $x_1 = -1, 0, 1$ and $x_2 = -1, 0, 1$.

Hence, it follows directly that the following pairs of $(\bar{X}, S^2)$ values occur with the following probabilities: $(-1, 0)$ with probability 1/16, $(-1/2, 1/2)$ with probability 1/4, $(0, 0)$ with probability 1/4, $(0, 2)$ with probability 1/8, $(1/2, 1/2)$ with probability 1/4, and $(1, 0)$ with probability 1/16.

Hence, it is easy to show by direct computation that $\mathrm{cov}(\bar{X}, S^2) = 0$. However, since
$$\mathrm{pr}(S^2 = 0 \mid \bar{X} = 1) = 1 \neq \mathrm{pr}(S^2 = 0) = \frac{3}{8},$$
it follows that $\bar{X}$ and $S^2$ are dependent random variables.

Clearly, $p_X(x)$ is a discrete distribution that is symmetric about $E(X) = 0$, so that $\mu_3 = 0$. Thus, it follows from part (a) that, as shown directly, $\mathrm{cov}(\bar{X}, S^2) = 0$. More generally, the random variables $\bar{X}$ and $S^2$ are independent when selecting a random sample from a normally distributed parent population, but are generally dependent when selecting a random sample from a nonnormal parent population.
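The "direct computation" in part (b) can be reproduced by enumerating the nine $(x_1, x_2)$ outcomes. The sketch below uses exact fractions. The helper names are hypothetical, and the marginal probabilities 1/4, 1/2, 1/4 are read off from $p_X(x) = (1/4)^{|x|}(1/2)^{1-|x|}$.

```python
from fractions import Fraction
from itertools import product

# Marginal distribution p_X(x) = (1/4)^|x| (1/2)^(1-|x|), x = -1, 0, 1.
PMF = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}


def joint_xbar_s2():
    """Joint distribution of (Xbar, S^2) for a random sample of size n = 2."""
    joint = {}
    for x1, x2 in product(PMF, repeat=2):
        xbar = Fraction(x1 + x2, 2)
        s2 = Fraction((x1 - x2) ** 2, 2)  # S^2 = (X1 - X2)^2 / 2 when n = 2
        joint[(xbar, s2)] = joint.get((xbar, s2), 0) + PMF[x1] * PMF[x2]
    return joint


def covariance(joint):
    """cov(Xbar, S^2) = E(Xbar * S^2) - E(Xbar) * E(S^2)."""
    e_xy = sum(p * xb * s2 for (xb, s2), p in joint.items())
    e_x = sum(p * xb for (xb, _), p in joint.items())
    e_y = sum(p * s2 for (_, s2), p in joint.items())
    return e_xy - e_x * e_y


def prob_s2_zero_given_xbar(joint, xbar_value):
    """pr(S^2 = 0 | Xbar = xbar_value)."""
    den = sum(p for (xb, _), p in joint.items() if xb == xbar_value)
    num = sum(p for (xb, s2), p in joint.items() if xb == xbar_value and s2 == 0)
    return num / den


if __name__ == "__main__":
    j = joint_xbar_s2()
    print(covariance(j))                      # zero covariance
    print(prob_s2_zero_given_xbar(j, 1))      # conditional probability
    print(sum(p for (_, s2), p in j.items() if s2 == 0))  # marginal pr(S^2 = 0)
```

A zero covariance together with unequal conditional and marginal probabilities is exactly the uncorrelated-but-dependent situation described above.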
4 Estimation Theory

4.1 Concepts and Notation

4.1.1 Point Estimation of Population Parameters

Let the random variables $X_1, X_2, \ldots, X_n$ constitute a sample of size $n$ from some population with properties depending on a row vector $\theta = (\theta_1, \theta_2, \ldots, \theta_p)$ of $p$ unknown parameters, where the parameter space is the set $\Omega$ of all possible values of $\theta$. In the most general situation, the $n$ random variables $X_1, X_2, \ldots, X_n$ are allowed to be mutually dependent and to have different distributions (e.g., different means and different variances).

A point estimator or a statistic is any scalar function $U(X_1, X_2, \ldots, X_n) \equiv U(X)$ of the random variables $X_1, X_2, \ldots, X_n$, but not of $\theta$. A point estimator or statistic is itself a random variable since it is a function of the random vector $X = (X_1, X_2, \ldots, X_n)$. In contrast, the corresponding point estimate or observed statistic $U(x_1, x_2, \ldots, x_n) \equiv U(x)$ is the realized (or observed) numerical value of the point estimator or statistic that is computed using the realized (or observed) numerical values $x_1, x_2, \ldots, x_n$ of $X_1, X_2, \ldots, X_n$ for the particular sample obtained.

Some popular methods for obtaining a row vector $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_p)$ of point estimators of the elements of the row vector $\theta = (\theta_1, \theta_2, \ldots, \theta_p)$, where $\hat{\theta}_j \equiv \hat{\theta}_j(X)$ for $j = 1, 2, \ldots, p$, are the following:

4.1.1.1 Method of Moments (MM)

For $j = 1, 2, \ldots, p$, let
$$M_j = \frac{1}{n}\sum_{i=1}^{n} X_i^j \quad\text{and}\quad E(M_j) = \frac{1}{n}\sum_{i=1}^{n} E(X_i^j),$$
where $E(M_j)$, $j = 1, 2, \ldots, p$, is a function of the elements of $\theta$. Then, $\hat{\theta}_{mm}$, the MM estimator of $\theta$, is obtained as the solution of the $p$ equations $M_j = E(M_j)$, $j = 1, 2, \ldots, p$.
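As a small illustration of the MM recipe, consider the assumed case of a random sample from a $N(\mu, \sigma^2)$ population, so that $p = 2$, $E(M_1) = \mu$, and $E(M_2) = \sigma^2 + \mu^2$. Solving $M_1 = \mu$ and $M_2 = \sigma^2 + \mu^2$ gives $\hat{\mu}_{mm} = M_1$ and $\hat{\sigma}^2_{mm} = M_2 - M_1^2$. The function name below is hypothetical.

```python
def mm_normal(xs):
    """Method-of-moments estimates (mu, sigma^2) for an assumed N(mu, sigma^2) sample.

    Solves M1 = mu and M2 = sigma^2 + mu^2 for the two unknown parameters.
    """
    n = len(xs)
    m1 = sum(xs) / n                 # first sample moment
    m2 = sum(x * x for x in xs) / n  # second sample moment
    return m1, m2 - m1 ** 2


if __name__ == "__main__":
    print(mm_normal([1, 2, 3, 4, 5]))
```

Note that $\hat{\sigma}^2_{mm}$ divides by $n$, not $n - 1$, so it differs from the usual sample variance $S^2$.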
4.1.1.2 Unweighted Least Squares (ULS)

Let $Q_u = \sum_{i=1}^{n}[X_i - E(X_i)]^2$. Then, $\hat{\theta}_{uls}$, the ULS estimator of $\theta$, is chosen to minimize $Q_u$ and is defined as the solution of the $p$ equations
$$\frac{\partial Q_u}{\partial\theta_j} = 0, \quad j = 1, 2, \ldots, p.$$

4.1.1.3 Weighted Least Squares (WLS)

Let $Q_w = \sum_{i=1}^{n} w_i[X_i - E(X_i)]^2$, where $w_1, w_2, \ldots, w_n$ are weights. Then, $\hat{\theta}_{wls}$, the WLS estimator of $\theta$, is chosen to minimize $Q_w$ and is defined as the solution of the $p$ equations
$$\frac{\partial Q_w}{\partial\theta_j} = 0, \quad j = 1, 2, \ldots, p.$$

4.1.1.4 Maximum Likelihood (ML)

Let $L(x; \theta)$ denote the likelihood function, which is often simply the joint distribution of the random variables $X_1, X_2, \ldots, X_n$. Then, $\hat{\theta}_{ml}$, the ML estimator (MLE) of $\theta$, is chosen to maximize $L(x; \theta)$ and is defined as the solution of the $p$ equations
$$\frac{\partial\ln L(x; \theta)}{\partial\theta_j} = 0, \quad j = 1, 2, \ldots, p.$$
If $\tau(\theta)$ is a scalar function of $\theta$, then $\tau(\hat{\theta}_{ml})$ is the MLE of $\tau(\theta)$; this is known as the invariance property of MLEs.
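The score equation and the invariance property can be illustrated with a one-parameter model that is not taken from the text: an assumed geometric distribution $p_X(x; \theta) = \theta(1 - \theta)^x$, $x = 0, 1, 2, \ldots$. Here $\ln L = n\ln\theta + s\ln(1 - \theta)$ with $s = \sum x_i$, the score is $n/\theta - s/(1 - \theta)$, and the closed-form MLE is $\hat{\theta} = n/(n + s)$. The sketch solves the score equation by bisection and then applies invariance to the mean $\tau(\theta) = (1 - \theta)/\theta$. Function names are hypothetical.

```python
def score(theta, xs):
    """d ln L / d theta for the assumed geometric model theta * (1 - theta)^x."""
    n, s = len(xs), sum(xs)
    return n / theta - s / (1.0 - theta)


def mle_by_bisection(xs, tol=1e-12):
    """Solve score(theta) = 0 on (0, 1); the score is decreasing in theta."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid, xs) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    data = [0, 2, 1, 4, 3]            # hypothetical observations, n = 5, s = 10
    theta_hat = mle_by_bisection(data)
    print(theta_hat)                  # numerically close to n / (n + s)
    print((1 - theta_hat) / theta_hat)  # MLE of the mean by invariance
```

By invariance, the MLE of the mean $(1 - \theta)/\theta$ is $(1 - \hat{\theta})/\hat{\theta} = s/n$, the sample mean.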
4.1.2 Data Reduction and Joint Sufficiency

The goal of any statistical analysis is to quantify the information contained in a sample of size $n$ by making valid and precise statistical inferences using the smallest possible number of point estimators or statistics. This data reduction goal leads to the concept of joint sufficiency.

4.1.2.1 Joint Sufficiency

The statistics $U_1(X), U_2(X), \ldots, U_k(X)$, $k \geq p$, are jointly sufficient for the parameter vector $\theta$ if and only if the conditional distribution of $X$ given $U_1(X) = U_1(x), U_2(X) = U_2(x), \ldots, U_k(X) = U_k(x)$ does not in any way depend on $\theta$. More specifically, the phrase "in any way" means that the conditional distribution of $X$, including the domain of $X$, given the $k$ sufficient statistics is not a function of $\theta$. In other words, the jointly sufficient statistics $U_1(X), U_2(X), \ldots, U_k(X)$ utilize all the information about $\theta$ that is contained in the sample $X$.
4.1.2.2 Factorization Theorem

To demonstrate joint sufficiency, the Factorization Theorem (Halmos and Savage, 1949) is quite useful: Let $X$ be a discrete or continuous random vector with distribution $L(x; \theta)$. Then, $U_1(X), U_2(X), \ldots, U_k(X)$ are jointly sufficient for $\theta$ if and only if there are nonnegative functions $g[U_1(x), U_2(x), \ldots, U_k(x); \theta]$ and $h(x)$ such that
$$L(x; \theta) = g[U_1(x), U_2(x), \ldots, U_k(x); \theta]\,h(x),$$
where, given $U_1(X) = U_1(x), U_2(X) = U_2(x), \ldots, U_k(X) = U_k(x)$, the function $h(x)$ in no way depends on $\theta$. Also, any one-to-one function of a sufficient statistic is also a sufficient statistic.

As an important example, a family $F_d = \{p_X(x; \theta), \theta \in \Omega\}$ of discrete probability distributions is a member of the exponential family of distributions if $p_X(x; \theta)$ can be written in the general form
$$p_X(x; \theta) = h(x)\,b(\theta)\,e^{\sum_{j=1}^{k} w_j(\theta)v_j(x)},$$
where $h(x) \geq 0$ does not in any way depend on $\theta$, $b(\theta) \geq 0$ does not depend on $x$, $w_1(\theta), w_2(\theta), \ldots, w_k(\theta)$ are real-valued functions of $\theta$ but not of $x$, and $v_1(x), v_2(x), \ldots, v_k(x)$ are real-valued functions of $x$ but not of $\theta$. Then, if $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from $p_X(x; \theta)$, so that $p_X(x; \theta) = \prod_{i=1}^{n} p_X(x_i; \theta)$, it follows that
$$p_X(x; \theta) = \left\{[b(\theta)]^n\,e^{\sum_{j=1}^{k} w_j(\theta)\left[\sum_{i=1}^{n} v_j(x_i)\right]}\right\}\left[\prod_{i=1}^{n} h(x_i)\right];$$
so, by the Factorization Theorem, the $k$ statistics $U_j(X) = \sum_{i=1}^{n} v_j(X_i)$, $j = 1, 2, \ldots, k$, are jointly sufficient for $\theta$. The above results also hold when considering a family $F_c = \{f_X(x; \theta), \theta \in \Omega\}$ of continuous probability distributions. Many important families of distributions are members of the exponential family; these include the binomial, Poisson, and negative binomial families in the discrete case, and the normal, gamma, and beta families in the continuous case.
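To make the exponential-family form concrete, the sketch below takes the Poisson family as an assumed example: $p_X(x; \lambda) = e^{-\lambda}\lambda^x/x!$ corresponds to $h(x) = 1/x!$, $b(\lambda) = e^{-\lambda}$, $w_1(\lambda) = \ln\lambda$, and $v_1(x) = x$ with $k = 1$. It then checks numerically that the likelihood of a sample factors as $g[\sum x_i; \lambda]\,h(x)$. All function names are hypothetical.

```python
from math import exp, factorial, isclose, log, prod


def poisson_pmf(x, lam):
    """Poisson probability exp(-lam) lam^x / x!."""
    return exp(-lam) * lam ** x / factorial(x)


def expfam_pmf(x, lam):
    """Same probability written as h(x) b(lam) exp{w(lam) v(x)}."""
    h = 1.0 / factorial(x)  # depends on x only
    b = exp(-lam)           # depends on lam only
    w, v = log(lam), x      # natural parameter and sufficient-statistic term
    return h * b * exp(w * v)


def likelihood(xs, lam):
    """Joint distribution of a random sample: product of the marginals."""
    return prod(poisson_pmf(x, lam) for x in xs)


def factorized_likelihood(xs, lam):
    """g[U(x); lam] * h(x) with U(x) = sum of the x_i."""
    n, u = len(xs), sum(xs)
    g = exp(-n * lam) * exp(log(lam) * u)       # involves the data only through u
    h = prod(1.0 / factorial(x) for x in xs)    # free of lam
    return g * h


if __name__ == "__main__":
    sample = [2, 0, 3]  # hypothetical counts
    print(isclose(likelihood(sample, 1.7), factorized_likelihood(sample, 1.7)))
```

The factor `g` depends on the data only through `sum(xs)`, which is why $\sum X_i$ is sufficient for $\lambda$ in this example.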
4.1.3 Methods for Evaluating the Properties of a Point Estimator

For now, consider the special case of one unknown parameter $\theta$.

4.1.3.1 Mean-Squared Error (MSE)

The mean-squared error of $\hat{\theta}$ as an estimator of the parameter $\theta$ is defined as
$$\mathrm{MSE}(\hat{\theta}, \theta) = E[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + [E(\hat{\theta}) - \theta]^2,$$
where $V(\hat{\theta})$ is the variance of $\hat{\theta}$ and $[E(\hat{\theta}) - \theta]^2$ is the squared bias of $\hat{\theta}$ as an estimator of the parameter $\theta$. An estimator with small MSE has both a small variance and a small squared bias.

Using MSE as the criterion for choosing among a class of possible estimators of $\theta$ is problematic because this class is too large. Hence, it is common practice to limit the class of possible estimators of $\theta$ to those estimators that are unbiased estimators of $\theta$. More formally, $\hat{\theta}$ is an unbiased estimator of the parameter $\theta$ if $E(\hat{\theta}) = \theta$ for all $\theta \in \Omega$. Then, if $\hat{\theta}$ is an unbiased estimator of $\theta$, we have $\mathrm{MSE}(\hat{\theta}, \theta) = V(\hat{\theta})$, so that the criterion for choosing among competing unbiased estimators of $\theta$ is based solely on variance considerations.

4.1.3.2 Cramér–Rao Lower Bound (CRLB)
Let $L(x; \theta)$ denote the distribution of the random vector $X$, and let $\hat{\theta}$ be any unbiased estimator of the parameter $\theta$. Then, under certain mathematical regularity conditions, it can be shown (Rao, 1945; Cramér, 1946) that
$$V(\hat{\theta}) \geq \frac{1}{E_x\left[(\partial\ln L(x; \theta)/\partial\theta)^2\right]} = \frac{1}{-E_x\left[\partial^2\ln L(x; \theta)/\partial\theta^2\right]}.$$
In the important special case when $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from the discrete probability distribution $p_X(x; \theta)$, so that $L(x; \theta) = \prod_{i=1}^{n} p_X(x_i; \theta)$, then we obtain
$$V(\hat{\theta}) \geq \frac{1}{nE_x\left\{\left(\partial\ln[p_X(x; \theta)]/\partial\theta\right)^2\right\}} = \frac{1}{-nE_x\left\{\partial^2\ln[p_X(x; \theta)]/\partial\theta^2\right\}}.$$
A completely analogous result holds when $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from the density function $f_X(x; \theta)$. For further discussion, see Lehmann (1983).

4.1.3.3 Efficiency
The efficiency of any unbiased estimator $\hat{\theta}$ of $\theta$ relative to the CRLB is defined as
$$\mathrm{EFF}(\hat{\theta}, \theta) = \frac{\mathrm{CRLB}}{V(\hat{\theta})}, \quad 0 \leq \mathrm{EFF}(\hat{\theta}, \theta) \leq 1,$$
and the corresponding asymptotic efficiency is $\lim_{n\to\infty}\mathrm{EFF}(\hat{\theta}, \theta)$.

There are situations when no unbiased estimator of $\theta$ achieves the CRLB. In such a situation, we can utilize the Rao–Blackwell Theorem (Rao, 1945; Blackwell, 1947) to aid in the search for that unbiased estimator with the smallest variance (i.e., the minimum variance unbiased estimator or MVUE). First, we need to introduce the concept of a complete sufficient statistic:

4.1.3.4 Completeness

The family $F_u = \{p_U(u; \theta), \theta \in \Omega\}$, or $F_u = \{f_U(u; \theta), \theta \in \Omega\}$, for the sufficient statistic $U$ is called complete (or, equivalently, $U$ is a complete sufficient statistic) if the condition $E[g(U)] = 0$ for all $\theta \in \Omega$ implies that $\mathrm{pr}[g(U) = 0] = 1$ for all $\theta \in \Omega$.

As an important special case, for an exponential family with $U_j(X) = \sum_{i=1}^{n} v_j(X_i)$ for $j = 1, 2, \ldots, k$, the vector of sufficient statistics $U(X) = [U_1(X), U_2(X), \ldots, U_k(X)]$ is complete if $\{w_1(\theta), w_2(\theta), \ldots, w_k(\theta) : \theta \in \Omega\}$ contains an open set in $\mathbb{R}^k$.

4.1.3.5 Rao–Blackwell Theorem

Let $U^* \equiv U^*(X)$ be any unbiased point estimator of $\theta$, and let $U \equiv U(X)$ be a sufficient statistic for $\theta$. Then, $\hat{\theta} = E(U^* \mid U = u)$ is an unbiased point estimator of $\theta$, and $V(\hat{\theta}) \leq V(U^*)$. If $U$ is a complete sufficient statistic for $\theta$, then $\hat{\theta}$ is the unique (with probability one) MVUE of $\theta$. It is important to emphasize that the variance of the MVUE of $\theta$ may not achieve the CRLB.
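A small worked case may help fix the Rao–Blackwell idea. Assume a random sample of size $n$ from a Bernoulli($\theta$) population, take the crude unbiased estimator $U^* = X_1$, and condition on the sufficient statistic $U = \sum X_i$. The sketch below enumerates all $2^n$ samples with exact fractions to show that $E(X_1 \mid U = u) = u/n$, and that the resulting estimator $\bar{X}$ has variance $\theta(1 - \theta)/n$, which in this particular example equals the CRLB. Function names and the values $n = 4$, $\theta = 1/3$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def sample_prob(xs, theta):
    """Probability of one ordered Bernoulli sample."""
    u = sum(xs)
    return theta ** u * (1 - theta) ** (len(xs) - u)


def conditional_mean_x1(n, theta):
    """Return {u: E(X1 | sum of X_i = u)} by enumerating all 2^n samples."""
    num, den = {}, {}
    for xs in product((0, 1), repeat=n):
        u, pr = sum(xs), sample_prob(xs, theta)
        num[u] = num.get(u, 0) + pr * xs[0]
        den[u] = den.get(u, 0) + pr
    return {u: num[u] / den[u] for u in den}


def variance_of(estimator, n, theta):
    """Exact variance of estimator(xs) under the Bernoulli sampling model."""
    m1 = m2 = Fraction(0)
    for xs in product((0, 1), repeat=n):
        pr, val = sample_prob(xs, theta), estimator(xs)
        m1 += pr * val
        m2 += pr * val ** 2
    return m2 - m1 ** 2


if __name__ == "__main__":
    n, theta = 4, Fraction(1, 3)
    print(conditional_mean_x1(n, theta))                      # u / n for each u
    print(variance_of(lambda xs: xs[0], n, theta))            # V(X1)
    print(variance_of(lambda xs: Fraction(sum(xs), n), n, theta))  # V(Xbar)
```

The conditional mean does not involve $\theta$, as sufficiency requires, and conditioning reduces the variance by the factor $1/n$.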
4.1.4 Interval Estimation of Population Parameters

4.1.4.1 Exact Confidence Intervals
An exact 100(1 − α)% confidence interval (CI) for a parameter θ involves two random variables, L (called the lower limit) and U (called the upper limit), defined so that pr(L < θ < U) = (1 − α), where typically 0 < α ≤ 0.10. The construction of exact CIs often involves the properties of statistics based on random samples from normal populations. Some illustrations are as follows.
4.1.4.2 Exact CI for the Mean of a Normal Distribution

Let $X_1, X_2, \ldots, X_n$ constitute a random sample from a $N(\mu, \sigma^2)$ parent population. The sample mean is $\bar{X} = n^{-1}\sum_{i=1}^{n} X_i$ and the sample variance is $S^2 = (n - 1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. Then,
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right), \qquad \frac{(n - 1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1},$$
and $\bar{X}$ and $S^2$ are independent random variables.

In general, if $Z \sim N(0, 1)$, $U \sim \chi^2_\nu$, and $Z$ and $U$ are independent random variables, then the random variable $T_\nu = Z/\sqrt{U/\nu} \sim t_\nu$; that is, $T_\nu$ has a t-distribution with $\nu$ degrees of freedom (df). Thus, the random variable
$$T_{n-1} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{[(n - 1)S^2/\sigma^2]/(n - 1)}} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.$$
With $t_{n-1,1-\alpha/2}$ defined so that $\mathrm{pr}(T_{n-1} < t_{n-1,1-\alpha/2}) = 1 - \alpha/2$, we then have
$$(1 - \alpha) = \mathrm{pr}(-t_{n-1,1-\alpha/2} < T_{n-1} < t_{n-1,1-\alpha/2}) = \mathrm{pr}\left(-t_{n-1,1-\alpha/2} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{n-1,1-\alpha/2}\right)$$
$$= \mathrm{pr}\left(\bar{X} - t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}}\right).$$
Thus,
$$L = \bar{X} - t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}} \quad\text{and}\quad U = \bar{X} + t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}},$$
giving
$$\bar{X} \pm t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}}$$
as the exact $100(1 - \alpha)\%$ CI for $\mu$ based on a random sample $X_1, X_2, \ldots, X_n$ of size $n$ from a $N(\mu, \sigma^2)$ parent population.

4.1.4.3 Exact CI for a Linear Combination of Means of Normal Distributions
More generally, for $i = 1, 2, \ldots, k$, let $X_{i1}, X_{i2}, \ldots, X_{in_i}$ constitute a random sample of size $n_i$ from a $N(\mu_i, \sigma_i^2)$ parent population. Then,

i. For $i = 1, 2, \ldots, k$, $\bar{X}_i = n_i^{-1}\sum_{j=1}^{n_i} X_{ij} \sim N\left(\mu_i, \dfrac{\sigma_i^2}{n_i}\right)$;

ii. For $i = 1, 2, \ldots, k$, $\dfrac{(n_i - 1)S_i^2}{\sigma_i^2} = \dfrac{\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2}{\sigma_i^2} \sim \chi^2_{n_i-1}$;

iii. The $2k$ random variables $\{\bar{X}_i, S_i^2\}_{i=1}^{k}$ are mutually independent.

Now, assuming $\sigma_i^2 = \sigma^2$ for all $i$ (i.e., assuming variance homogeneity), if $c_1, c_2, \ldots, c_k$ are known constants, then the random variable
$$\sum_{i=1}^{k} c_i\bar{X}_i \sim N\left[\sum_{i=1}^{k} c_i\mu_i,\ \sigma^2\left(\sum_{i=1}^{k}\frac{c_i^2}{n_i}\right)\right];$$
and, with $N = \sum_{i=1}^{k} n_i$, the random variable
$$\frac{\sum_{i=1}^{k}(n_i - 1)S_i^2}{\sigma^2} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2}{\sigma^2} \sim \chi^2_{N-k}.$$
Thus, the random variable
$$T_{N-k} = \frac{\sum_{i=1}^{k} c_i\bar{X}_i - \sum_{i=1}^{k} c_i\mu_i}{S_p\sqrt{\sum_{i=1}^{k}\dfrac{c_i^2}{n_i}}} \sim t_{N-k},$$
where the pooled sample variance is $S_p^2 = \sum_{i=1}^{k}(n_i - 1)S_i^2/(N - k)$. This gives
$$\sum_{i=1}^{k} c_i\bar{X}_i \pm t_{N-k,1-\alpha/2}\,S_p\sqrt{\sum_{i=1}^{k}\frac{c_i^2}{n_i}}$$
as the exact $100(1 - \alpha)\%$ CI for the parameter $\sum_{i=1}^{k} c_i\mu_i$. In the special case when $k = 2$, $c_1 = +1$, and $c_2 = -1$, we obtain the well-known two-sample CI for $(\mu_1 - \mu_2)$, namely,
$$(\bar{X}_1 - \bar{X}_2) \pm t_{n_1+n_2-2,1-\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

4.1.4.4 Exact CI for the Variance of a Normal Distribution
For $i = 1, 2, \ldots, k$, since $(n_i - 1)S_i^2/\sigma_i^2 \sim \chi^2_{n_i-1}$, we have
$$(1 - \alpha) = \mathrm{pr}\left[\chi^2_{n_i-1,\alpha/2} < \frac{(n_i - 1)S_i^2}{\sigma_i^2} < \chi^2_{n_i-1,1-\alpha/2}\right] = \mathrm{pr}(L < \sigma_i^2 < U),$$
where
$$L = \frac{(n_i - 1)S_i^2}{\chi^2_{n_i-1,1-\alpha/2}} \quad\text{and}\quad U = \frac{(n_i - 1)S_i^2}{\chi^2_{n_i-1,\alpha/2}},$$
and where $\chi^2_{n_i-1,\alpha/2}$ and $\chi^2_{n_i-1,1-\alpha/2}$ are, respectively, the $100(\alpha/2)$ and $100(1 - \alpha/2)$ percentiles of the $\chi^2_{n_i-1}$ distribution.

4.1.4.5 Exact CI for the Ratio of Variances of Two Normal Distributions
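In general, if $U_1 \sim \chi^2_{\nu_1}$, $U_2 \sim \chi^2_{\nu_2}$, and $U_1$ and $U_2$ are independent random variables, then the random variable
$$F_{\nu_1,\nu_2} = \frac{U_1/\nu_1}{U_2/\nu_2} \sim f_{\nu_1,\nu_2};$$
that is, $F_{\nu_1,\nu_2}$ follows an f distribution with $\nu_1$ numerator df and $\nu_2$ denominator df. As an example, when $k = 2$, the random variable
$$F_{n_1-1,n_2-1} = \frac{\left\{[(n_1 - 1)S_1^2]/\sigma_1^2\right\}/(n_1 - 1)}{\left\{[(n_2 - 1)S_2^2]/\sigma_2^2\right\}/(n_2 - 1)} = \left(\frac{S_1^2}{S_2^2}\right)\left(\frac{\sigma_2^2}{\sigma_1^2}\right) \sim f_{n_1-1,n_2-1}.$$
So, since $f_{n_1-1,n_2-1,\alpha/2} = f^{-1}_{n_2-1,n_1-1,1-\alpha/2}$, we have
$$(1 - \alpha) = \mathrm{pr}\left[f^{-1}_{n_2-1,n_1-1,1-\alpha/2} < \left(\frac{S_1^2}{S_2^2}\right)\left(\frac{\sigma_2^2}{\sigma_1^2}\right) < f_{n_1-1,n_2-1,1-\alpha/2}\right] = \mathrm{pr}\left(L < \frac{\sigma_2^2}{\sigma_1^2} < U\right),$$
where
$$L = f^{-1}_{n_2-1,n_1-1,1-\alpha/2}\left(\frac{S_2^2}{S_1^2}\right) \quad\text{and}\quad U = f_{n_1-1,n_2-1,1-\alpha/2}\left(\frac{S_2^2}{S_1^2}\right).$$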
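The exact normal-theory intervals for a mean, a difference of two means, and a variance are simple to compute once the needed percentiles are available. The sketch below is a minimal illustration using only the Python standard library. Because that library has no t or chi-square quantile function, the critical values are passed in by the caller (for example, from printed tables); the function names and the data in the usage lines are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def t_interval(xs, t_crit):
    """Xbar -/+ t * S / sqrt(n); t_crit is the 1 - alpha/2 percentile of t with n - 1 df."""
    n = len(xs)
    half = t_crit * sqrt(variance(xs) / n)  # variance() uses the n - 1 divisor
    return mean(xs) - half, mean(xs) + half


def pooled_two_sample_interval(xs, ys, t_crit):
    """(Xbar1 - Xbar2) -/+ t * Sp * sqrt(1/n1 + 1/n2), with n1 + n2 - 2 df."""
    n1, n2 = len(xs), len(ys)
    sp2 = ((n1 - 1) * variance(xs) + (n2 - 1) * variance(ys)) / (n1 + n2 - 2)
    diff = mean(xs) - mean(ys)
    half = t_crit * sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return diff - half, diff + half


def variance_interval(xs, chi2_lower, chi2_upper):
    """((n-1)S^2 / chi2_upper, (n-1)S^2 / chi2_lower) for sigma^2.

    chi2_lower and chi2_upper are the alpha/2 and 1 - alpha/2 percentiles
    of the chi-square distribution with n - 1 df.
    """
    ss = (len(xs) - 1) * variance(xs)
    return ss / chi2_upper, ss / chi2_lower


if __name__ == "__main__":
    x = [2, 3, 4, 5, 6]  # hypothetical data
    y = [1, 2, 3, 4, 5]
    print(t_interval(x, 2.776))                     # tabled t value, 4 df, 0.975
    print(pooled_two_sample_interval(x, y, 2.306))  # tabled t value, 8 df, 0.975
    print(variance_interval(x, 0.4844, 11.143))     # tabled chi-square values, 4 df
```

Passing the percentiles explicitly keeps the sketch dependency-free; with a statistics package available, they would normally be computed rather than looked up.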
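$\epsilon > 0$,
$$\lim_{n\to\infty}\mathrm{pr}(|\hat{\theta} - \theta| > \epsilon) = 0.$$
In this case, we say that $\hat{\theta}$ converges in probability to $\theta$, and we write $\hat{\theta} \xrightarrow{P} \theta$. Two sufficient conditions so that $\hat{\theta} \xrightarrow{P} \theta$ are
$$\lim_{n\to\infty} E(\hat{\theta}) = \theta \quad\text{and}\quad \lim_{n\to\infty} V(\hat{\theta}) = 0.$$

4.1.4.8 Slutsky's Theorem

If $V_n \xrightarrow{P} c$, where $c$ is a constant, and if $W_n \xrightarrow{D} W$, then
$$V_nW_n \xrightarrow{D} cW \quad\text{and}\quad (V_n + W_n) \xrightarrow{D} (c + W).$$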
To develop ML-based large-sample approximate CIs, we make use of the following properties of the MLE $\hat{\theta}_{ml} \equiv \hat{\theta}$ of $\theta$, assuming $L(x; \theta)$ is the correct likelihood function and assuming that certain regularity conditions hold:

i. For $j = 1, 2, \ldots, p$, $\hat{\theta}_j$ is a consistent estimator of $\theta_j$. More generally, if the scalar function $\tau(\theta)$ is a continuous function of $\theta$, then $\tau(\hat{\theta})$ is a consistent estimator of $\tau(\theta)$.

ii. $$\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{D} \mathrm{MVN}_p[0,\ nI^{-1}(\theta)],$$
where $I(\theta)$ is the $(p \times p)$ expected information matrix, with $(j, j')$ element equal to
$$-E_x\left[\frac{\partial^2\ln L(x; \theta)}{\partial\theta_j\,\partial\theta_{j'}}\right],$$
and where $I^{-1}(\theta)$ is the large-sample covariance matrix of $\hat{\theta}$ based on expected information. In particular, the $(j, j')$ element of $I^{-1}(\theta)$ is denoted $v_{jj'}(\theta) = \mathrm{cov}(\hat{\theta}_j, \hat{\theta}_{j'})$, $j = 1, 2, \ldots, p$ and $j' = 1, 2, \ldots, p$.
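4.1.4.9 Construction of ML-Based CIs

As an illustration, properties (i) and (ii) will now be used to construct a large-sample ML-based approximate $100(1 - \alpha)\%$ CI for the parameter $\theta_j$. First, with the $(j, j)$ diagonal element $v_{jj}(\theta)$ of $I^{-1}(\theta)$ being the large-sample variance of $\hat{\theta}_j$ based on expected information, it follows that
$$\frac{\hat{\theta}_j - \theta_j}{\sqrt{v_{jj}(\theta)}} \xrightarrow{D} N(0, 1) \text{ as } n \to \infty.$$
Then, with $I^{-1}(\hat{\theta})$ denoting the estimated large-sample covariance matrix of $\hat{\theta}$ based on expected information, and with the $(j, j)$ diagonal element $v_{jj}(\hat{\theta})$ of $I^{-1}(\hat{\theta})$ being the estimated large-sample variance of $\hat{\theta}_j$ based on expected information, it follows by Slutsky's Theorem that
$$\frac{\hat{\theta}_j - \theta_j}{\sqrt{v_{jj}(\hat{\theta})}} = \sqrt{\frac{v_{jj}(\theta)}{v_{jj}(\hat{\theta})}}\left[\frac{\hat{\theta}_j - \theta_j}{\sqrt{v_{jj}(\theta)}}\right] \xrightarrow{D} N(0, 1) \text{ as } n \to \infty,$$
since $v_{jj}(\hat{\theta})$ is a consistent estimator of $v_{jj}(\theta)$. Thus, it follows from the above results that
$$\frac{\hat{\theta}_j - \theta_j}{\sqrt{v_{jj}(\hat{\theta})}} \,\dot{\sim}\, N(0, 1) \text{ for large } n.$$
Finally, with $Z_{1-\alpha/2}$ defined so that $\mathrm{pr}(Z < Z_{1-\alpha/2}) = (1 - \alpha/2)$ when $Z \sim N(0, 1)$, we have
$$(1 - \alpha) = \mathrm{pr}(-Z_{1-\alpha/2} < Z < Z_{1-\alpha/2}) \approx \mathrm{pr}\left[-Z_{1-\alpha/2} < \frac{\hat{\theta}_j - \theta_j}{\sqrt{v_{jj}(\hat{\theta})}} < Z_{1-\alpha/2}\right]$$
$$= \mathrm{pr}\left[\hat{\theta}_j - Z_{1-\alpha/2}\sqrt{v_{jj}(\hat{\theta})} < \theta_j < \hat{\theta}_j + Z_{1-\alpha/2}\sqrt{v_{jj}(\hat{\theta})}\right].$$
Thus,
$$\hat{\theta}_j \pm Z_{1-\alpha/2}\sqrt{v_{jj}(\hat{\theta})}$$
is the large-sample ML-based approximate $100(1 - \alpha)\%$ CI for the parameter $\theta_j$ based on expected information.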
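As a one-parameter illustration of this construction, consider an assumed random sample of size $n$ from a POI($\lambda$) population (this example is not from the text). Here $\hat{\lambda} = \bar{X}$, the expected information is $I(\lambda) = n/\lambda$, so $v_{11}(\hat{\lambda}) = \bar{X}/n$ and the interval is $\bar{X} \pm Z_{1-\alpha/2}\sqrt{\bar{X}/n}$. The sketch obtains $Z_{1-\alpha/2}$ from the standard library; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def poisson_ml_ci(xs, alpha=0.05):
    """Large-sample ML-based CI for lambda under an assumed Poisson model.

    Uses lambda_hat = xbar and estimated large-sample variance xbar / n,
    the inverse of the expected information n / lambda evaluated at lambda_hat.
    """
    n = len(xs)
    lam_hat = sum(xs) / n
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # Z_{1 - alpha/2}
    half = z * sqrt(lam_hat / n)
    return lam_hat - half, lam_hat + half


if __name__ == "__main__":
    counts = [4] * 25  # hypothetical data: n = 25, sample mean 4
    print(poisson_ml_ci(counts))
```

The interval is approximate: its coverage approaches $1 - \alpha$ only as $n$ grows, and the lower limit can fall below zero for small $n\bar{X}$.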
In practice, instead of the estimated expected information matrix, the estimated observed information matrix $I(x; \hat{\theta})$ is used, with its $(j, j')$ element equal to
$$-\left[\frac{\partial^2\ln L(x; \theta)}{\partial\theta_j\,\partial\theta_{j'}}\right]_{\theta=\hat{\theta}}.$$
Then, with $I^{-1}(x; \hat{\theta})$ denoting the estimated large-sample covariance matrix of $\hat{\theta}$ based on observed information, and with the $(j, j)$ diagonal element $v_{jj}(x; \hat{\theta})$ of $I^{-1}(x; \hat{\theta})$ being the estimated large-sample variance of $\hat{\theta}_j$ based on observed information, it follows that
$$\hat{\theta}_j \pm Z_{1-\alpha/2}\sqrt{v_{jj}(x; \hat{\theta})}$$
is the large-sample ML-based approximate $100(1 - \alpha)\%$ CI for the parameter $\theta_j$ based on observed information.

4.1.4.10 ML-Based CI for a Bernoulli Distribution Probability
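As a simple one-parameter ($p = 1$) example, let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from the Bernoulli parent population
$$p_X(x; \theta) = \theta^x(1 - \theta)^{1-x}, \quad x = 0, 1 \text{ and } 0 < \theta < 1,$$
and suppose that it is desired to develop a large-sample ML-based approximate $100(1 - \alpha)\%$ CI for the parameter $\theta$. First, the appropriate likelihood function is
$$L(x; \theta) = \prod_{i=1}^{n}\theta^{x_i}(1 - \theta)^{1-x_i} = \theta^s(1 - \theta)^{n-s},$$
where $s = \sum_{i=1}^{n} x_i$ is a sufficient statistic for $\theta$. Now,
$$\ln L(x; \theta) = s\ln\theta + (n - s)\ln(1 - \theta),$$
so that the equation
$$\frac{\partial\ln L(x; \theta)}{\partial\theta} = \frac{s}{\theta} - \frac{(n - s)}{(1 - \theta)} = 0$$
gives $\hat{\theta} = \bar{X} = n^{-1}\sum_{i=1}^{n} X_i$ as the MLE of $\theta$. And,
$$\frac{\partial^2\ln L(x; \theta)}{\partial\theta^2} = \frac{-s}{\theta^2} - \frac{(n - s)}{(1 - \theta)^2},$$
so that
$$-E\left[\frac{\partial^2\ln L(x; \theta)}{\partial\theta^2}\right] = \frac{n\theta}{\theta^2} + \frac{(n - n\theta)}{(1 - \theta)^2} = \frac{n}{\theta(1 - \theta)}.$$
Hence,
$$v_{11}(\hat{\theta}) = \left\{-E\left[\frac{\partial^2\ln L(x; \theta)}{\partial\theta^2}\right]\right\}^{-1}_{\theta=\hat{\theta}} = v_{11}(x; \hat{\theta}) = \left\{-\left[\frac{\partial^2\ln L(x; \theta)}{\partial\theta^2}\right]\right\}^{-1}_{\theta=\hat{\theta}} = \frac{\bar{X}(1 - \bar{X})}{n},$$
so that the large-sample ML-based approximate $100(1 - \alpha)\%$ CI for $\theta$ is equal to
$$\bar{X} \pm Z_{1-\alpha/2}\sqrt{\frac{\bar{X}(1 - \bar{X})}{n}}.$$
In this simple example, the same CI is obtained using either expected information or observed information. In more complicated situations, this will typically not happen.

4.1.4.11 Delta Method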
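Let $Y = g(X)$, where $X = (X_1, X_2, \ldots, X_k)$, $\mu = (\mu_1, \mu_2, \ldots, \mu_k)$, $E(X_i) = \mu_i$, $V(X_i) = \sigma_i^2$, and $\mathrm{cov}(X_i, X_j) = \sigma_{ij}$ for $i \neq j$, $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, k$. Then, a first-order (or linear) multivariate Taylor series approximation to $Y$ around $\mu$ is
$$Y \approx g(\mu) + \sum_{i=1}^{k}\frac{\partial g(\mu)}{\partial X_i}(X_i - \mu_i),$$
where
$$\frac{\partial g(\mu)}{\partial X_i} = \left.\frac{\partial g(X)}{\partial X_i}\right|_{X=\mu}.$$
Thus, using the above linear approximation for $Y$, it follows that $E(Y) \approx g(\mu)$ and that
$$V(Y) \approx \sum_{i=1}^{k}\left[\frac{\partial g(\mu)}{\partial X_i}\right]^2\sigma_i^2 + 2\sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\left[\frac{\partial g(\mu)}{\partial X_i}\right]\left[\frac{\partial g(\mu)}{\partial X_j}\right]\sigma_{ij}.$$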
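The first-order approximations $E(Y) \approx g(\mu)$ and the variance formula above can be evaluated mechanically once $\mu$ and the covariance structure are specified. The sketch below is one possible illustration: it approximates the partial derivatives by central differences (an implementation choice, not something prescribed by the text) and forms the double sum over the full covariance matrix, which equals the displayed expression. The ratio $g(X) = X_1/X_2$ and the numerical inputs are hypothetical.

```python
def delta_method(g, mu, cov, h=1e-6):
    """First-order delta-method approximations to E[g(X)] and V[g(X)].

    g   : function of a list of k numbers
    mu  : mean vector (length k)
    cov : k x k covariance matrix, cov[i][i] = sigma_i^2, cov[i][j] = sigma_ij
    h   : step size for the central-difference partial derivatives
    """
    k = len(mu)
    grad = []
    for i in range(k):
        up, down = list(mu), list(mu)
        up[i] += h
        down[i] -= h
        grad.append((g(up) - g(down)) / (2.0 * h))  # approximates dg/dX_i at mu
    approx_mean = g(list(mu))
    # Sum over all (i, j) = variance terms plus twice the i < j covariance terms.
    approx_var = sum(grad[i] * grad[j] * cov[i][j] for i in range(k) for j in range(k))
    return approx_mean, approx_var


if __name__ == "__main__":
    ratio = lambda x: x[0] / x[1]
    print(delta_method(ratio, [2.0, 4.0], [[1.0, 0.5], [0.5, 4.0]]))
```

For the ratio example the exact partial derivatives at $\mu = (2, 4)$ are $1/4$ and $-1/8$, so the approximate variance is $(1/16)(1) + (1/64)(4) + 2(1/4)(-1/8)(0.5) = 0.09375$.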
The delta method for MLEs is as follows. For $q \leq p$, suppose that the $(1 \times q)$ row vector
$$\Phi(\theta) = [\tau_1(\theta), \tau_2(\theta), \ldots, \tau_q(\theta)]$$
involves $q$ scalar parametric functions of the parameter vector $\theta$. Then,
$$\Phi(\hat{\theta}) = [\tau_1(\hat{\theta}), \tau_2(\hat{\theta}), \ldots, \tau_q(\hat{\theta})]$$
is the MLE of $\Phi(\theta)$. Then, the $(q \times q)$ large-sample covariance matrix of $\Phi(\hat{\theta})$ based on expected information is
$$[\Delta(\theta)]\,I^{-1}(\theta)\,[\Delta(\theta)]',$$
where the $(i, j)$ element of the $(q \times p)$ matrix $\Delta(\theta)$ is equal to $\partial\tau_i(\theta)/\partial\theta_j$, $i = 1, 2, \ldots, q$ and $j = 1, 2, \ldots, p$. Hence, the corresponding estimated large-sample covariance matrix of $\Phi(\hat{\theta})$ based on expected information is
$$[\Delta(\hat{\theta})]\,I^{-1}(\hat{\theta})\,[\Delta(\hat{\theta})]'.$$
Analogous expressions based on observed information are obtained by substituting $I^{-1}(x; \theta)$ for $I^{-1}(\theta)$ and by substituting $I^{-1}(x; \hat{\theta})$ for $I^{-1}(\hat{\theta})$ in the above two expressions.

The special case $q = p = 1$ gives
$$V[\tau_1(\hat{\theta}_1)] \approx \left[\frac{\partial\tau_1(\theta_1)}{\partial\theta_1}\right]^2 V(\hat{\theta}_1).$$
The corresponding large-sample ML-based approximate $100(1 - \alpha)\%$ CI for $\tau_1(\theta_1)$ based on expected information is equal to
$$\tau_1(\hat{\theta}_1) \pm Z_{1-\alpha/2}\sqrt{\left[\frac{\partial\tau_1(\theta_1)}{\partial\theta_1}\right]^2_{\theta_1=\hat{\theta}_1} v_{11}(\hat{\theta}_1)}.$$
The corresponding CI based on observed information is obtained by substituting $v_{11}(x; \hat{\theta}_1)$ for $v_{11}(\hat{\theta}_1)$ in the above expression.

4.1.4.12 Delta Method CI for a Function of a Bernoulli Distribution Probability
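As a simple illustration, for the Bernoulli population example considered earlier, suppose that it is now desired to use the delta method to obtain a large-sample ML-based approximate $100(1 - \alpha)\%$ CI for the "odds"
$$\tau(\theta) = \frac{\theta}{(1 - \theta)} = \frac{\mathrm{pr}(X = 1)}{[1 - \mathrm{pr}(X = 1)]}.$$
So, by the invariance property, $\tau(\hat{\theta}) = \bar{X}/(1 - \bar{X})$ is the MLE of $\tau(\theta)$ since $\hat{\theta} = \bar{X}$ is the MLE of $\theta$. And, via the delta method, the large-sample estimated variance of $\tau(\hat{\theta})$ is equal to
$$\hat{V}[\tau(\hat{\theta})] \approx \left[\frac{\partial\tau(\theta)}{\partial\theta}\right]^2_{\theta=\hat{\theta}}\hat{V}(\hat{\theta}) = \left[\frac{1}{(1 - \bar{X})^2}\right]^2\left[\frac{\bar{X}(1 - \bar{X})}{n}\right] = \frac{\bar{X}}{n(1 - \bar{X})^3}.$$
Finally, the large-sample ML-based approximate $100(1 - \alpha)\%$ CI for $\tau(\theta) = \theta/(1 - \theta)$ using the delta method is equal to
$$\frac{\bar{X}}{(1 - \bar{X})} \pm Z_{1-\alpha/2}\sqrt{\frac{\bar{X}}{n(1 - \bar{X})^3}}.$$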
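Both Bernoulli intervals, the ML-based CI for $\theta$ and the delta-method CI for the odds $\theta/(1 - \theta)$, can be computed in a few lines. The sketch below assumes $0 < s < n$ so that both estimated variances are positive and finite; the function name and the illustrative counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def bernoulli_and_odds_ci(s, n, alpha=0.05):
    """Large-sample ML-based CIs for theta and for the odds theta / (1 - theta).

    s is the observed number of ones among n Bernoulli trials, 0 < s < n.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # Z_{1 - alpha/2}
    xbar = s / n
    half_theta = z * sqrt(xbar * (1.0 - xbar) / n)
    odds = xbar / (1.0 - xbar)
    half_odds = z * sqrt(xbar / (n * (1.0 - xbar) ** 3))  # delta-method variance
    return (xbar - half_theta, xbar + half_theta), (odds - half_odds, odds + half_odds)


if __name__ == "__main__":
    theta_ci, odds_ci = bernoulli_and_odds_ci(20, 100)  # hypothetical counts
    print(theta_ci)
    print(odds_ci)
```

The odds interval is symmetric about $\bar{X}/(1 - \bar{X})$ by construction, so its lower limit can be negative when $n$ is small even though the odds cannot be.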
EXERCISES

Exercise 4.1. Suppose that $Y_x \sim N(x\mu, x^3\sigma^2)$, $x = 1, 2, \ldots, n$. Further, assume that $\{Y_1, Y_2, \ldots, Y_n\}$ constitute a set of $n$ mutually independent random variables, and that $\sigma^2$ is a known positive constant. Consider the following three estimators of $\mu$:

1. $\hat{\mu}_1$, the method of moments estimator of $\mu$;
2. $\hat{\mu}_2$, the unweighted least squares estimator of $\mu$;
3. $\hat{\mu}_3$, the MLE of $\mu$.

(a) Derive expressions for $\hat{\mu}_1$, $\hat{\mu}_2$, and $\hat{\mu}_3$. (These expressions can involve summation signs.) Also, determine the exact distribution of each of these estimators of $\mu$.

(b) If $n = 5$, $\sigma^2 = 2$, and $y_x = (x + 1)$ for $x = 1, 2, 3, 4$, and 5, construct what you believe to be the "best" exact 95% CI for $\mu$.

Exercise 4.2. An epidemiologist gathers data $(x_i, Y_i)$ on each of $n$ randomly chosen noncontiguous cities in the United States, where $x_i$ ($i = 1, 2, \ldots, n$) is the known population size (in millions of people) in city $i$, and where $Y_i$ is the random variable denoting the number of people in city $i$ with liver cancer. It is reasonable to assume that $Y_i$ ($i = 1, 2, \ldots, n$) has a Poisson distribution with mean $E(Y_i) = \theta x_i$, where $\theta\,(> 0)$ is an unknown parameter, and that $Y_1, Y_2, \ldots, Y_n$ constitute a set of mutually independent random variables.

(a) Find an explicit expression for the unweighted least-squares estimator $\hat{\theta}_{uls}$ of $\theta$. Also, find explicit expressions for $E(\hat{\theta}_{uls})$ and $V(\hat{\theta}_{uls})$.

(b) Find an explicit expression for the method of moments estimator $\hat{\theta}_{mm}$ of $\theta$. Also, find explicit expressions for $E(\hat{\theta}_{mm})$ and $V(\hat{\theta}_{mm})$.

(c) Find an explicit expression for the MLE $\hat{\theta}_{ml}$ of $\theta$. Also, find explicit expressions for $E(\hat{\theta}_{ml})$ and $V(\hat{\theta}_{ml})$.

(d) Find an explicit expression for the CRLB for the variance of any unbiased estimator of $\theta$. Which (if any) of the three estimators $\hat{\theta}_{uls}$, $\hat{\theta}_{mm}$, and $\hat{\theta}_{ml}$ achieve this lower bound?

Exercise 4.3. Suppose that $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of an unknown parameter $\theta$. Further, suppose that the variance of $\hat{\theta}_1$ is $\sigma_1^2$, that the variance of $\hat{\theta}_2$ is $\sigma_2^2$, and that $\mathrm{corr}(\hat{\theta}_1, \hat{\theta}_2) = \rho$, $-1 < \rho < +1$. Define the parameter $\lambda = \sigma_1/\sigma_2$, and assume (without loss of generality) that $0 < \sigma_1 \leq \sigma_2 < +\infty$, so that $0 < \lambda \leq 1$. Consider the unbiased estimator of $\theta$ of the general form $\hat{\theta} = k\hat{\theta}_1 + (1 - k)\hat{\theta}_2$, where the quantity $k$ satisfies the inequality $-\infty < k < +\infty$.

(a) Develop an explicit expression (as a function of $\lambda$ and $\rho$) for that value of $k$ (say, $k^*$) that minimizes the variance of the unbiased estimator $\hat{\theta}$ of $\theta$. Discuss the special cases when $\rho > \lambda$ and when $\lambda = 1$.

(b) Let $\hat{\theta}^* = k^*\hat{\theta}_1 + (1 - k^*)\hat{\theta}_2$, where $k^*$ was determined in part (a). Develop a sufficient condition (as a function of $\lambda$ and $\rho$) for which $V(\hat{\theta}^*) < \sigma_1^2 = V(\hat{\theta}_1) \leq \sigma_2^2 = V(\hat{\theta}_2)$.

Exercise 4.4. Suppose that the random variable $X_i \sim N(\beta a_i, \sigma_i^2)$, $i = 1, 2, \ldots, n$. Further, assume that $\{X_1, X_2, \ldots, X_n\}$ constitute a set of mutually independent random variables, that $\{a_1, a_2, \ldots, a_n\}$ constitute a set of known constants, and that $\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2\}$ constitute a set of known variances. A biostatistician suggests that the random variable
$$\hat{\beta} = \sum_{i=1}^{n} c_iX_i$$
would be an excellent estimator of the unknown parameter $\beta$ if the constants $c_1, c_2, \ldots, c_n$ are chosen so that the following two conditions simultaneously hold: (1) $E(\hat{\beta}) = \beta$; and, (2) $V(\hat{\beta})$ is a minimum. Find explicit expressions for $c_1, c_2, \ldots, c_n$ (as functions of the $a_i$'s and $\sigma_i^2$'s) such that these two conditions simultaneously hold. Using these "optimal" choices of the $c_i$'s, what then is the exact distribution of this "optimal" estimator of $\beta$?

Exercise 4.5. For $i = 1, 2, \ldots, k$, let $Y_{i1}, Y_{i2}, \ldots, Y_{in_i}$ constitute a random sample of size $n_i\,(> 1)$ from a $N(\mu_i, \sigma^2)$ parent population. Further,
$$\bar{Y}_i = n_i^{-1}\sum_{j=1}^{n_i} Y_{ij} \quad\text{and}\quad S_i^2 = (n_i - 1)^{-1}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_i)^2$$
are, respectively, the sample mean and sample variance of the $n_i$ observations from this $N(\mu_i, \sigma^2)$ parent population. Further, let $N = \sum_{i=1}^{k} n_i$ denote the total number of observations.

(a) Consider estimating $\sigma^2$ with the estimator
$$\hat{\sigma}^2 = \sum_{i=1}^{k} w_iS_i^2,$$
where $w_1, w_2, \ldots, w_k$ are constants satisfying the constraint $\sum_{i=1}^{k} w_i = 1$. Prove rigorously that $E(\hat{\sigma}^2) = \sigma^2$, namely, that $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$.

(b) Under the constraint $\sum_{i=1}^{k} w_i = 1$, find explicit expressions for $w_1, w_2, \ldots, w_k$ such that $V(\hat{\sigma}^2)$, the variance of $\hat{\sigma}^2$, is a minimum.

Exercise 4.6. Suppose that a professor in the Maternal and Child Health Department at the University of North Carolina at Chapel Hill administers a questionnaire (consisting of $k$ questions, each of which is to be answered "yes" or "no") to each of $n$ randomly selected mothers of infants less than 6 months of age in Chapel Hill. The purpose of this questionnaire is to assess the quality of maternal infant care in Chapel Hill, with "yes" answers indicating good care and "no" answers indicating bad care. Suppose that this professor asks you, the consulting biostatistician on this research project, the following question: Is it possible for you to provide me with a "good" estimator of the probability that a randomly selected new mother in Chapel Hill will respond "yes" to all $k$ items on the questionnaire, reflecting "perfect care"?

As a start, assume that the number $X$ of "yes" answers to the questionnaire for a randomly chosen new mother in Chapel Hill follows a binomial distribution with sample size $k$ and probability parameter $\pi$, $0 < \pi < 1$. Then, the responses $X_1, X_2, \ldots, X_n$ of the $n$ randomly chosen mothers can be considered to be a random sample of size $n$ from this binomial distribution. Your task as the consulting biostatistician is to find the minimum variance unbiased estimator (MVUE) $\hat{\theta}$ of $\theta = \mathrm{pr}(X = k) = \pi^k$. Once you have found an explicit expression for $\hat{\theta}$, demonstrate by direct calculation that $E(\hat{\theta}) = \theta$.

Exercise 4.7. Let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ ($n \geq 2$) from a $N(0, \sigma^2)$ population.
(a) Develop an explicit expression for an unbiased estimator $\hat{\theta}$ of the unknown parameter $\theta = \sigma^r$ ($r$ a known positive integer) that is a function of a sufficient statistic for $\theta$.

(b) Derive an explicit expression for the CRLB for the variance of any unbiased estimator of the parameter $\theta = \sigma^r$. Find a particular value of $r$ for which the variance of $\hat{\theta}$ actually achieves the CRLB.

Exercise 4.8. In a certain laboratory experiment, the time $Y$ (in milliseconds) for a certain blood clotting agent to show an observable effect is assumed to have the negative exponential distribution
$$f_Y(y) = \alpha^{-1}e^{-y/\alpha}, \quad y > 0, \quad \alpha > 0.$$
Let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ from $f_Y(y)$, and let $y_1, y_2, \ldots, y_n$ be the corresponding observed values (or realizations) of $Y_1, Y_2, \ldots, Y_n$. One can think of $y_1, y_2, \ldots, y_n$ as the set of observed times for the blood clotting agent to show an observable effect based on $n$ repetitions of the laboratory experiment. It is of interest to make statistical inferences about the unknown parameter $\theta = V(Y) = \alpha^2$ using the available data $y = \{y_1, y_2, \ldots, y_n\}$.

(a) Develop an explicit expression for the MLE $\hat{\theta}$ of $\theta$. If the observed value of $S = \sum_{i=1}^{n} Y_i$ is the value $s = 40$ when $n = 50$, compute an appropriate large-sample 95% CI for the parameter $\theta$.

(b) Develop an explicit expression for the MVUE $\hat{\theta}^*$ of $\theta$, and then develop an explicit expression for $V(\hat{\theta}^*)$, the variance of the MVUE of $\theta$.

(c) Does $\hat{\theta}^*$ achieve the CRLB for the variance of any unbiased estimator of $\theta$?

(d) For any finite value of $n$, develop explicit expressions for $\mathrm{MSE}(\hat{\theta}, \theta)$ and $\mathrm{MSE}(\hat{\theta}^*, \theta)$, the mean squared errors of $\hat{\theta}$ and $\hat{\theta}^*$ as estimators of the unknown parameter $\theta$. Using this MSE criterion, which estimator do you prefer for finite $n$, and which estimator do you prefer asymptotically (i.e., as $n \to +\infty$)?

Exercise 4.9. Suppose that a laboratory test is conducted on a blood sample from each of $n$ randomly chosen human subjects in a certain city in the United States. The purpose of the test is to detect the presence of a particular biomarker reflecting recent exposure to benzene, a known human carcinogen. Let $\pi$, $0 < \pi < 1$, be the unknown probability that a randomly chosen subject in this city has been recently exposed to benzene. When a subject has been recently exposed to benzene, the biomarker will be correctly detected with known probability $\gamma$, $0 < \gamma < 1$; when a subject has not been recently exposed to benzene, the biomarker will be incorrectly detected with known probability $\delta$, $0 < \delta < \gamma < 1$.
Let $X$ be the random variable denoting the number of the $n$ subjects who are classified as having been recently exposed to benzene (or, equivalently, who provide a blood sample in which the biomarker is detected).
(a) Find an unbiased estimator $\hat{\pi}$ of the parameter $\pi$ that is an explicit function of the random variable $X$, and also derive an explicit expression for $V(\hat{\pi})$, the variance of the estimator $\hat{\pi}$.
(b) If $n = 50$, $\delta = 0.05$, $\gamma = 0.90$, and if the observed value of $X$ is $x = 20$, compute an appropriate 95% large-sample CI for the unknown parameter $\pi$.

Exercise 4.10. A scientist at the National Institute of Environmental Health Sciences (NIEHS) is studying the teratogenic effects of a certain chemical by injecting a group of pregnant female rats with this chemical and then observing the number of abnormal (i.e., dead or malformed) fetuses in each litter. Suppose that $\pi$, $0 < \pi < 1$, is the probability that a fetus is abnormal. Further, for the $i$th of $n$ litters, each litter being of size two, let the random variable $X_{ij}$ take the
value 1 if the $j$th fetus is abnormal and let $X_{ij}$ take the value 0 if the $j$th fetus is normal, $j = 1, 2$. Since the two fetuses in each litter have experienced the same gestational conditions, the dichotomous random variables $X_{i1}$ and $X_{i2}$ are expected to be correlated. To allow for such a correlation, the following correlated binomial model is proposed: for $i = 1, 2, \ldots, n$,
$$\mathrm{pr}[(X_{i1} = 1) \cap (X_{i2} = 1)] = \pi^2 + \theta,$$
$$\mathrm{pr}[(X_{i1} = 1) \cap (X_{i2} = 0)] = \mathrm{pr}[(X_{i1} = 0) \cap (X_{i2} = 1)] = \pi(1 - \pi) - \theta,$$
and
$$\mathrm{pr}[(X_{i1} = 0) \cap (X_{i2} = 0)] = (1 - \pi)^2 + \theta.$$
Here, $\mathrm{cov}(X_{i1}, X_{i2}) = \theta$, $-\min[\pi^2, (1 - \pi)^2] \le \theta \le \pi(1 - \pi)$.
(a) Let the random variable $Y_{11}$ be the number of litters out of $n$ for which both fetuses are abnormal, and let the random variable $Y_{00}$ be the number of litters out of $n$ for which both fetuses are normal. Show that the MLEs $\hat{\pi}$ of $\pi$ and $\hat{\theta}$ of $\theta$ are, respectively,
$$\hat{\pi} = \frac{1}{2} + \frac{(Y_{11} - Y_{00})}{2n} \quad \text{and} \quad \hat{\theta} = \frac{Y_{11}}{n} - \hat{\pi}^2.$$
(b) Develop explicit expressions for $E(\hat{\pi})$ and $V(\hat{\pi})$.
(c) If $n = 30$, and if the observed values of $Y_{11}$ and $Y_{00}$ are $y_{11} = 3$ and $y_{00} = 15$, compute an appropriate large-sample 95% CI for $\pi$.
For a more general statistical treatment of such a correlated binomial model, see Kupper and Haseman (1978).

Exercise 4.11. A popular epidemiologic study design is the pair-matched case–control study design, where a case (i.e., a diseased person, denoted $D$) is "matched" (on covariates such as age, race, and sex) to a control (i.e., a nondiseased person, denoted $\bar{D}$). Each member of the pair is then interviewed as to the presence ($E$) or absence ($\bar{E}$) of a history of exposure to some potentially harmful substance (e.g., cigarette smoke, asbestos, benzene, etc.). The data from such a study involving $n$ case–control pairs can be presented in tabular form, as follows:

                           Control ($\bar{D}$)
                           $E$          $\bar{E}$
Case ($D$)   $E$           $Y_{11}$     $Y_{10}$
             $\bar{E}$     $Y_{01}$     $Y_{00}$
                                        Total: $n$
Here, $Y_{11}$ is the number of pairs for which both the case and the control are exposed (i.e., both have a history of exposure), $Y_{10}$ is the number of pairs for which the case is exposed but the control is not, and so on. Clearly, $\sum_{i=0}^{1} \sum_{j=0}^{1} Y_{ij} = n$. In what follows, assume that the $\{Y_{ij}\}$ have a multinomial distribution with sample size $n$ and associated cell probabilities $\{\pi_{ij}\}$, where
$$\sum_{i=0}^{1} \sum_{j=0}^{1} \pi_{ij} = 1.$$
For example, then, $\pi_{10}$ is the probability of obtaining a pair in which the case is exposed and its matched control is not. In such a study, the parameter measuring the association between exposure status and disease status is the odds ratio $OR = \pi_{10}/\pi_{01}$; the estimator of $OR$ is $\widehat{OR} = Y_{10}/Y_{01}$.
(a) Under the assumed multinomial model for the $\{Y_{ij}\}$, use the delta method to develop an appropriate estimator $\hat{V}(\ln \widehat{OR})$ of $V(\ln \widehat{OR})$, the variance of the random variable $\ln \widehat{OR}$. What is the numerical value of your variance estimator when $n = 100$ and when the observed cell counts are $y_{11} = 15$, $y_{10} = 25$, $y_{01} = 15$, and $y_{00} = 45$?
(b) Assuming that
$$\frac{\ln \widehat{OR} - \ln OR}{\sqrt{\hat{V}(\ln \widehat{OR})}} \mathrel{\dot{\sim}} N(0, 1)$$
for large $n$, use the observed cell counts given in part (a) to construct an appropriate 95% CI for $OR$.

Exercise 4.12. Actinic keratoses are small skin lesions that serve as precursors for skin cancer. It has been theorized that adults who are residents of U.S. cities near the equator are more likely to develop actinic keratoses, and hence to be at greater risk for skin cancer, than are adults who are residents of U.S. cities distant from the equator. To test this theory, suppose that dermatology records for a random sample of $n_1$ adult residents of a particular U.S. city (say, City 1) near the equator are examined to determine the number of actinic keratoses that each of these $n_1$ adults has developed. In addition, dermatology records for a random sample of $n_0$ adult residents of a particular U.S. city (say, City 0) distant from the equator are examined to determine the number of actinic keratoses that each of these adults has developed. As a statistical model for evaluating this theory, for adult resident $j$ ($j = 1, 2, \ldots, n_i$) in City $i$ ($i = 0, 1$), suppose that the random variable $Y_{ij} \sim \mathrm{POI}(L_{ij}\lambda_i)$, where $L_{ij}$ is the length of time (in years) that adult $j$ has resided in City $i$ and where $\lambda_i$ is the rate of development of actinic keratoses per year (i.e., the expected number of actinic keratoses that develop per year) for an adult resident of City $i$. So, the pair $(L_{ij}, y_{ij})$ constitutes the observed information for adult resident $j$ in City $i$.
(a) Develop an appropriate ML-based large-sample $100(1 - \alpha)\%$ CI for the log rate ratio $\ln \psi = \ln(\lambda_1/\lambda_0)$.
(b) If $n_1 = n_0 = 30$, $\sum_{j=1}^{30} y_{1j} = 40$, $\sum_{j=1}^{30} L_{1j} = 350$, $\sum_{j=1}^{30} y_{0j} = 35$, and $\sum_{j=1}^{30} L_{0j} = 400$, compute a 95% CI for the rate ratio $\psi$. Comment on your findings.

Exercise 4.13. The time $T$ (in months) in remission for leukemia patients who have completed a certain type of chemotherapy treatment is assumed to have the negative exponential distribution
$$f_T(t; \theta) = \theta e^{-\theta t}, \quad t > 0, \ \theta > 0.$$
Suppose that monitoring a random sample of $n$ leukemia patients who have completed this chemotherapy treatment leads to the $n$ observed remission times $t_1, t_2, \ldots, t_n$. In formal statistical terms, $T_1, T_2, \ldots, T_n$ represent a random sample of size $n$ from $f_T(t; \theta)$, and $t_1, t_2, \ldots, t_n$ are the observed values (or realizations) of the $n$ random variables $T_1, T_2, \ldots, T_n$.
(a) Using the available data, derive an explicit expression for the large-sample variance (based on expected information) of the MLE $\hat{\theta}$ of $\theta$.
(b) A biostatistician responsible for analyzing this data set realizes that it is not possible to know with certainty the exact number of months that each patient is in remission after completing the chemotherapy treatment. So, this biostatistician suggests the following alternative procedure for estimating $\theta$: "After some specified time period (in months) of length $t^*$ (a known positive constant) after completion of the chemotherapy treatment, let $Y_i = 1$ if the $i$th patient is still in remission after $t^*$ months and let $Y_i = 0$ if not, where $\mathrm{pr}(Y_i = 1) = \mathrm{pr}(T_i > t^*)$, $i = 1, 2, \ldots, n$. Then, use the $n$ mutually independent dichotomous random variables $Y_1, Y_2, \ldots, Y_n$ to find an alternative MLE $\hat{\theta}^*$ of the parameter $\theta$." Develop an explicit expression for $\hat{\theta}^*$.
(c) Use expected information to compare the large-sample variances of $\hat{\theta}$ and $\hat{\theta}^*$. Assuming $t^* \ge E(T)$, which of these two MLEs has the smaller variance, and why should this be the anticipated finding? Are there circumstances where the MLE with the larger variance might be preferred?

Exercise 4.14. For a typical woman in a certain high-risk population of women, suppose that the number $Y$ of lifetime events of domestic violence involving emergency room treatment is assumed to have the Poisson distribution
$$p_Y(y; \lambda) = \lambda^y e^{-\lambda}/y!, \quad y = 0, 1, \ldots, +\infty \ \text{and} \ \lambda > 0.$$
Let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ (where $n$ is large) from this Poisson population (i.e., $n$ women from this high-risk population are randomly sampled and then each woman in the random sample is asked to recall the number of lifetime events of domestic violence involving emergency room treatment that she has experienced).
(a) Find an explicit expression for the CRLB for the variance of any unbiased estimator of the parameter $\theta = \mathrm{pr}(Y = 0)$. Does there exist an unbiased estimator of $\theta$ that achieves this CRLB for all finite values of $n$?
(b) Suppose that a certain domestic violence researcher believes that reported values of $Y$ greater than zero are not very accurate (although a reported value greater than zero almost surely indicates at least one domestic violence experience involving emergency room treatment), but that reported values of $Y$ equal to zero are accurate. Because of this possible data inaccuracy problem, this researcher wants to analyze the data by converting each $Y_i$ to a two-valued (or dichotomous) random variable $X_i$, where $X_i$ is defined as follows: if $Y_i \ge 1$, then $X_i = 1$; and, if $Y_i = 0$, then $X_i = 0$. Using the $n$ mutually independent dichotomous random variables $X_1, X_2, \ldots, X_n$, find an explicit expression for the MLE $\hat{\lambda}^*$ of $\lambda$ and then find an explicit expression for the large-sample variance of $\hat{\lambda}^*$.
(c) This domestic violence researcher is concerned that she may be doing something wrong by using the dichotomous variables $X_1, X_2, \ldots, X_n$ (instead of the original Poisson variables $Y_1, Y_2, \ldots, Y_n$) to estimate the unknown parameter $\lambda$. To address her concern, make a quantitative comparison between the properties of $\hat{\lambda}^*$ and $\hat{\lambda}$, where $\hat{\lambda}$ is the MLE of $\lambda$ obtained by using $Y_1, Y_2, \ldots, Y_n$. Also, comment on issues of validity (i.e., bias) and precision (i.e., variability) as they relate to the choice between $\hat{\lambda}$ and $\hat{\lambda}^*$.

Exercise 4.15. For a certain African village, available data strongly suggest that the expected number of new cases of AIDS developing in any particular year is directly proportional to the expected number of new AIDS cases that developed during the immediately preceding year. An important statistical goal is to estimate the value of this unknown proportionality constant $\theta$ ($\theta > 1$), which is assumed not to vary from year to year, and then to find an appropriate 95% CI for $\theta$. To accomplish this goal, the following statistical model is to be used: For $j = 0, 1, \ldots, n$ consecutive years of data, let $Y_j$ be the random variable denoting the number of new AIDS cases developing in year $j$. Further, suppose that the $(n + 1)$ random variables $Y_0, Y_1, \ldots, Y_n$ are such that the conditional distribution of $Y_{j+1}$, given $Y_k = y_k$ for $k = 0, 1, \ldots, j$, depends only on $y_j$ and is Poisson with $E(Y_{j+1} \mid Y_j = y_j) = \theta y_j$, $j = 0, 1, \ldots, (n - 1)$. Further, assume that the distribution of the random variable $Y_0$ is Poisson with $E(Y_0) = \theta$, where $\theta > 1$.
(a) Using all $(n + 1)$ random variables $Y_0, Y_1, \ldots, Y_n$, develop an explicit expression for the MLE $\hat{\theta}$ of the unknown proportionality constant $\theta$.
(b) If $n = 25$ and $\hat{\theta} = 1.20$, compute an appropriate ML-based 95% CI for $\theta$.

Exercise 4.16. In a certain clinical trial, suppose that the outcome variable $X$ represents the 6-month change in cholesterol level (in milligrams per deciliter) for subjects in the treatment (T) group who will be given a certain cholesterol-lowering drug, and suppose that $Y$ represents this same outcome variable for subjects in the control (C) group who will be given a placebo. Further, suppose that it is reasonable to assume that $X \sim N(\mu_t, \sigma_t^2)$ and $Y \sim N(\mu_c, \sigma_c^2)$, and that $\sigma_t^2$ and $\sigma_c^2$ have known values such that $\sigma_t^2 \ne \sigma_c^2$. Let $X_1, X_2, \ldots, X_{n_t}$ constitute a random sample of size $n_t$ from $N(\mu_t, \sigma_t^2)$; namely, these $n_t$ observations represent the set of outcomes to be measured on the $n_t$ subjects who have been randomly assigned to the T group. Similarly, let $Y_1, Y_2, \ldots, Y_{n_c}$ constitute a random sample of size $n_c$ from $N(\mu_c, \sigma_c^2)$; namely, these $n_c$ observations represent
the set of outcomes to be measured on the $n_c$ subjects who have been randomly assigned to the C group. Because of monetary and logistical constraints, suppose that a total of only $N$ subjects can participate in this clinical trial, so that $n_t$ and $n_c$ are constrained to satisfy the relationship $(n_t + n_c) = N$. Based on the stated assumptions (namely, random samples from two normal populations with known, but unequal, variances), determine the "optimal" partition of $N$ into values $n_t$ and $n_c$ that will produce the most "precise" exact 95% CI for $(\mu_t - \mu_c)$. When $N = 100$, $\sigma_t^2 = 4$, and $\sigma_c^2 = 9$, find the optimal choices for $n_t$ and $n_c$. Comment on your findings.

Exercise 4.17. Suppose that the random variable $Y = \ln(X)$, where $X$ is the ambient carbon monoxide (CO) concentration (in parts per million) in a certain highly populated U.S. city, is assumed to have a normal distribution with mean $E(Y) = \mu$ and variance $V(Y) = \sigma^2$. Let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample from this $N(\mu, \sigma^2)$ population. Practically speaking, $Y_1, Y_2, \ldots, Y_n$ can be considered to be ln(CO concentration) readings taken on days $1, 2, \ldots, n$, where these $n$ days are spaced far enough apart so that $Y_1, Y_2, \ldots, Y_n$ can be assumed to be mutually independent random variables. It is of interest to be able to predict with some accuracy the value of the random variable $Y_{n+1}$, namely, the value of the random variable representing the ln(CO concentration) on day $(n + 1)$, where day $(n + 1)$ is far enough in time from day $n$ so that $Y_{n+1}$ can reasonably be assumed to be independent of the random variables $Y_1, Y_2, \ldots, Y_n$. Also, it can be further assumed that $Y_{n+1} \sim N(\mu, \sigma^2)$. If $\bar{Y} = n^{-1} \sum_{i=1}^{n} Y_i$ and if $S^2 = (n - 1)^{-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, determine explicit expressions for random variables $L$ and $U$ (involving $\bar{Y}$ and $S$) such that $\mathrm{pr}[L < Y_{n+1} < U] = (1 - \alpha)$, $0 < \alpha \le 0.10$. In other words, rigorously derive an exact $100(1 - \alpha)\%$ prediction interval for the random variable $Y_{n+1}$.
If $n = 5$, and if $Y_i = i$, $i = 1, 2, 3, 4, 5$, compute an exact 95% prediction interval for $Y_6$. As a hint, construct a statistic involving the random variable $(\bar{Y} - Y_{n+1})$ that has a t-distribution.

Exercise 4.18. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from a $N(\mu, \sigma^2)$ population. Let
$$\bar{X} = n^{-1} \sum_{i=1}^{n} X_i \quad \text{and} \quad S^2 = (n - 1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$
Under the stated assumptions, the most appropriate $100(1 - \alpha)\%$ CI for $\mu$ is
$$\bar{X} \pm t_{n-1,1-\alpha/2}\, S/\sqrt{n},$$
where $t_{n-1,1-\alpha/2}$ is the $100(1 - \alpha/2)\%$ percentile point of Student's t-distribution with $(n - 1)$ df. The width $W_n$ of this CI is
$$W_n = 2 t_{n-1,1-\alpha/2}\, S/\sqrt{n}.$$
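As a purely illustrative sketch of the interval and width formulas just stated, the following Python fragment computes $\bar{X}$, $S$, the 95% CI, and $W_n$ for a small data set. The four data values are hypothetical (they do not come from the text), and the constant 3.182 is the tabled value of $t_{3,0.975}$, hard-coded here as an assumption so that only the standard library is needed.

```python
import math
import statistics

# Hypothetical sample of n = 4 observations (illustrative values only).
x = [9.1, 10.4, 11.2, 9.7]
n = len(x)

# Tabled value of t_{n-1, 1-alpha/2} for n - 1 = 3 df and alpha = 0.05.
T_3_0975 = 3.182

x_bar = statistics.mean(x)
s = statistics.stdev(x)  # uses the (n - 1) divisor, matching S above

half_width = T_3_0975 * s / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
width = 2 * half_width   # W_n = 2 t S / sqrt(n)

print(f"x_bar = {x_bar:.3f}, S = {s:.3f}")
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f}), width W_n = {width:.3f}")
```

Because $W_n$ depends on the data only through $S$, it is itself a random variable; the sketch shows one realization of it.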
(a) Under the stated assumptions, derive an explicit expression for $E(W_n)$, the expected width of this CI. What is the exact numerical value of $E(W_n)$ if $n = 4$, $\alpha = 0.05$, and $\sigma^2 = 4$?
(b) Suppose that it is desired to find the smallest sample size $n^*$ such that
$$\mathrm{pr}(W_{n^*} \le \delta) = \mathrm{pr}\{2 t_{n^*-1,1-\alpha/2}\, S/\sqrt{n^*} \le \delta\} \ge (1 - \gamma),$$
where $\delta$ ($> 0$) and $\gamma$ ($0 < \gamma < 1$) are specified positive numbers. Under the stated assumptions, prove rigorously that $n^*$ should be chosen to be the smallest positive integer satisfying the inequality
$$n^*(n^* - 1) \ge \left(\frac{2\sigma}{\delta}\right)^2 \chi^2_{n^*-1,1-\gamma}\, f_{1,n^*-1,1-\alpha},$$
where $\chi^2_{n^*-1,1-\gamma}$ and $f_{1,n^*-1,1-\alpha}$ denote, respectively, $100(1 - \gamma)$ and $100(1 - \alpha)$ percentile points for a chi-square distribution with $(n^* - 1)$ df and for an f distribution with 1 numerator, and $(n^* - 1)$ denominator, df.

Exercise 4.19. Suppose that an epidemiologist desires to make statistical inferences about the true mean diastolic blood pressure levels for adult residents in three rural North Carolina cities. As a starting model, suppose that she assumes that the true underlying distribution of diastolic blood pressure measurements for adults in each city is normal, and that these three normal distributions have a common variance (say, $\sigma^2$), but possibly different means (say, $\mu_1$, $\mu_2$, and $\mu_3$). This epidemiologist decides to obtain her blood pressure study data by randomly selecting $n_i$ adult residents from city $i$, $i = 1, 2, 3$, and then measuring their diastolic blood pressures. Using more formal statistical notation, for $i = 1, 2, 3$, let $Y_{i1}, Y_{i2}, \ldots, Y_{in_i}$ constitute a random sample of size $n_i$ from a $N(\mu_i, \sigma^2)$ population. Define the random variables
$$\bar{Y}_i = n_i^{-1} \sum_{j=1}^{n_i} Y_{ij}, \quad i = 1, 2, 3,$$
and
$$S_i^2 = (n_i - 1)^{-1} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2, \quad i = 1, 2, 3.$$
(a) Consider the parameter $\theta = (2\mu_1 - 3\mu_2 + \mu_3)$. Using all the available data (in particular, all three sample means and all three sample variances), construct a random variable that has a Student's t-distribution.
(b) If $n_1 = n_2 = n_3 = 4$, $\bar{y}_1 = 80$, $\bar{y}_2 = 75$, $\bar{y}_3 = 70$, $s_1^2 = 4$, $s_2^2 = 3$, and $s_3^2 = 5$, find an exact 95% CI for $\theta$ given the stated assumptions.
(c) Now, suppose that governmental reviewers of this study are skeptical about both the epidemiologist's assumptions of normality and homogeneous variance,
claiming that her sample sizes were much too small to provide reliable information about the appropriateness of these assumptions or about the parameter $\theta$. To address these criticisms, this epidemiologist goes back to these same three rural North Carolina cities and takes blood pressure measurements on large random samples of adult residents in each of the three cities; she obtains the following data: $n_1 = n_2 = n_3 = 50$; $\bar{y}_1 = 85$, $\bar{y}_2 = 82$, $\bar{y}_3 = 79$; $s_1^2 = 7$, $s_2^2 = 2$, $s_3^2 = 6$. Retaining the normality assumption for now, find an appropriate 95% CI for $\sigma_1^2/\sigma_2^2$, and then comment regarding the appropriateness of the homogeneous variance assumption.
(d) Using the data in part (c), compute an appropriate large-sample 95% CI for $\theta$. Comment on the advantages of increasing the sizes of the random samples selected from each of the three populations.

Exercise 4.20. Let $X_1, X_2, \ldots, X_{n_1}$ constitute a random sample of size $n_1$ ($> 2$) from a normal parent population with mean 0 and variance $\theta$. Also, let $Y_1, Y_2, \ldots, Y_{n_2}$ constitute a random sample of size $n_2$ ($> 2$) from a normal parent population with mean 0 and variance $\theta^{-1}$. The set of random variables $\{X_1, X_2, \ldots, X_{n_1}\}$ is independent of the set of random variables $\{Y_1, Y_2, \ldots, Y_{n_2}\}$, and $\theta$ ($> 0$) is an unknown parameter.
(a) Derive an explicit expression for $E(\sqrt{L})$ when $L = \sum_{i=1}^{n_1} X_i^2$.
(b) Using all $(n_1 + n_2)$ available observations, derive an explicit expression for an exact $100(1 - \alpha)\%$ CI for the unknown parameter $\theta$. If $n_1 = 8$, $n_2 = 5$, $\sum_{i=1}^{8} x_i^2 = 30$, and $\sum_{i=1}^{5} y_i^2 = 15$, compute a 95% confidence interval for $\theta$.

Exercise 4.21. In certain types of studies called crossover studies, each of $n$ randomly chosen subjects is administered both a treatment T (e.g., a new drug pill) and a placebo P (e.g., a sugar pill). Typically, neither the subject nor the person administering the pills knows which pill is T and which pill is P (namely, the study is a so-called double-blind study).
Also, the two possible pill administration orderings "first T, then P" and "first P, then T" are typically allocated randomly to subjects, and sufficient time is allowed between administrations to avoid so-called "carry-over" effects. One advantage of a crossover study is that a comparison between the effects of T and P can be made within (or specific to) each subject (since each subject supplies information on the effects of both T and P), thus eliminating subject-to-subject variability in each subject-specific comparison. For the $i$th subject ($i = 1, 2, \ldots, n$), suppose that $D_i = (Y_{Ti} - Y_{Pi})$ is the continuous random variable representing the difference between a continuous response ($Y_{Ti}$) following T administration and a continuous response ($Y_{Pi}$) following P administration. So, $D_i$ is measuring the effect of T relative to P for subject $i$. Since $Y_{Ti}$ and $Y_{Pi}$ are responses for the same subject (namely, subject $i$), it is very sensible to expect that $Y_{Ti}$ and $Y_{Pi}$ will be correlated to some extent. To allow for this potential intrasubject response correlation, assume in what follows that $Y_{Ti}$ and $Y_{Pi}$ jointly follow a bivariate normal distribution with $E(Y_{Ti}) = \mu_T$, $V(Y_{Ti}) = \sigma_T^2$, $E(Y_{Pi}) = \mu_P$, $V(Y_{Pi}) = \sigma_P^2$, and with $\mathrm{corr}(Y_{Ti}, Y_{Pi}) = \rho$, $i = 1, 2, \ldots, n$. Further, assume that the $n$ differences $D_1, D_2, \ldots, D_n$ are mutually independent of one another.
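To make the role of the intrasubject correlation concrete, the following Python sketch simulates the differences $D_i = (Y_{Ti} - Y_{Pi})$ under the bivariate normal model described above and compares the sample variance of the $D_i$ with the standard identity $V(D_i) = \sigma_T^2 + \sigma_P^2 - 2\rho\sigma_T\sigma_P$. The function name `simulate_differences`, the seed, the simulation size, and the parameter values are all assumptions made for illustration; this is a sketch under those assumptions, not a solution method prescribed by the text.

```python
import math
import random
import statistics

def simulate_differences(n, mu_t, mu_p, var_t, var_p, rho, seed=0):
    """Draw n differences D_i = Y_Ti - Y_Pi from a bivariate normal pair."""
    rng = random.Random(seed)
    sd_t, sd_p = math.sqrt(var_t), math.sqrt(var_p)
    diffs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y_t = mu_t + sd_t * z1
        # Mixing z1 and z2 in this way gives corr(Y_T, Y_P) = rho.
        y_p = mu_p + sd_p * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        diffs.append(y_t - y_p)
    return diffs

# Hypothetical parameter values chosen only for illustration.
var_t, var_p, rho = 2.0, 3.0, 0.30
d = simulate_differences(100_000, mu_t=15.0, mu_p=14.0,
                         var_t=var_t, var_p=var_p, rho=rho)

theoretical_var = var_t + var_p - 2 * rho * math.sqrt(var_t * var_p)
print("sample mean of D:    ", round(statistics.mean(d), 3))
print("sample variance of D:", round(statistics.variance(d), 3))
print("V(D) from identity:  ", round(theoretical_var, 3))
```

With a positive $\rho$, the variance of each difference is smaller than $\sigma_T^2 + \sigma_P^2$, which is the quantitative sense in which within-subject comparisons remove subject-to-subject variability.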
(a) Assuming that $\sigma_T^2$, $\sigma_P^2$, and $\rho$ have known values, use the $n$ mutually independent random variables $D_1, D_2, \ldots, D_n$ to derive an exact $100(1 - \alpha)\%$ CI for the unknown parameter $\theta = (\mu_T - \mu_P)$, the true difference between the expected responses for the T and P administrations. In particular, find explicit expressions for random variables $L$ and $U$ such that $\mathrm{pr}(L < \theta < U) = (1 - \alpha)$, $0 < \alpha \le 0.10$. If there are available data for which $n = 10$, $\bar{y}_T = 15.0$, $\bar{y}_P = 14.0$, $\sigma_T^2 = 2.0$, $\sigma_P^2 = 3.0$, $\rho = 0.30$, and $\alpha = 0.05$, use this numerical information to compute exact numerical values for $L$ and $U$. Interpret these numerical results with regard to whether or not the available data provide statistical evidence that $\mu_T$ and $\mu_P$ have different values.
(b) Now, assume that treatment effectiveness is equivalent to the inequality $\theta > 0$ (or, equivalently, $\mu_T > \mu_P$). If $\alpha = 0.05$, $\sigma_T^2 = 2.0$, $\sigma_P^2 = 3.0$, $\rho = 0.30$, and $\theta = 1.0$ (so that T is truly effective compared to P), what is the minimum number $n^*$ of subjects that should be enrolled in this crossover study so that the random variable $L$ determined in part (a) exceeds the value zero with probability at least equal to 0.95? The motivation for finding $n^*$ is that, if the treatment is truly effective, it is highly desirable for the lower limit $L$ of the CI for $\theta$ to have a high probability of exceeding zero in value, thus providing statistical evidence in favor of a real treatment effect relative to the placebo effect.

Exercise 4.22. For $i = 1, 2, \ldots, n$, let the random variables $X_i$ and $Y_i$ denote, respectively, the diastolic blood pressure (DBP) and systolic blood pressure (SBP) for the $i$th of $n$ ($> 1$) randomly chosen hypertensive adult males. Assume that the pairs $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, constitute a random sample of size $n$ from a bivariate normal population, where $E(X_i) = \mu_x$, $E(Y_i) = \mu_y$, $V(X_i) = V(Y_i) = \sigma^2$, and $\mathrm{corr}(X_i, Y_i) = \rho$. The goal is to develop an exact 95% CI for the correlation coefficient $\rho$. To accomplish this goal, consider the following random variables. Let $U_i = (X_i + Y_i)$ and $V_i = (X_i - Y_i)$, $i = 1, 2, \ldots, n$. Further, let $n\bar{U} = \sum_{i=1}^{n} U_i$, $n\bar{V} = \sum_{i=1}^{n} V_i$, $(n - 1)S_u^2 = \sum_{i=1}^{n} (U_i - \bar{U})^2$, and $(n - 1)S_v^2 = \sum_{i=1}^{n} (V_i - \bar{V})^2$.
(a) Derive explicit expressions for the means and variances of the random variables $U_i$ and $V_i$, $i = 1, 2, \ldots, n$.
(b) Prove rigorously that $\mathrm{cov}(U_i, V_i) = 0$, $i = 1, 2, \ldots, n$, so that, in this situation, it will follow that $U_i$ and $V_i$ are independent random variables, $i = 1, 2, \ldots, n$.
(c) Use rigorous arguments to prove that the random variable
$$W = \frac{(1 - \rho)S_u^2}{(1 + \rho)S_v^2}$$
has an f distribution.
(d) If $n = 10$, and if the realized values of $S_u^2$ and $S_v^2$ are 1.0 and 2.0, respectively, use these data, along with careful arguments, to compute an exact 95% CI for $\rho$.

Exercise 4.23. An economist postulates that the distribution of income (in thousands of dollars) in a certain large U.S. city can be modeled by the Pareto density function $f_Y(y; \gamma, \theta) = \theta\gamma^{\theta} y^{-(\theta+1)}$,
01) from $f_X(x; \theta)$. Let $X_{(1)} = \min\{X_1, X_2, \ldots, X_n\}$ and let $X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$. Then, consider the following two estimators of the unknown parameter $\theta$:
$$\hat{\theta}_1 = \frac{1}{2}\left[X_{(1)} + X_{(n)} - 1\right]$$
and
$$\hat{\theta}_2 = \frac{1}{(n - 1)}\left[nX_{(1)} - X_{(n)}\right].$$
(a) Show that $\hat{\theta}_1$ and $\hat{\theta}_2$ are both unbiased estimators of the parameter $\theta$ and find explicit expressions for $V(\hat{\theta}_1)$ and $V(\hat{\theta}_2)$.
(b) More generally, consider the linear function $W = (c_0 + c_1 U_1 + c_2 U_2)$, where $V(U_1) = V(U_2) = \sigma^2$, where $\mathrm{cov}(U_1, U_2) = \sigma_{12}$, and where $c_0$, $c_1$, and $c_2$ are constants with $(c_1 + c_2) = 1$. Determine values for $c_1$ and $c_2$ that minimize $V(W)$, and explain how this general result relates to a comparison of the variance expressions obtained in part (a).
(c) Show that $X_{(1)}$ and $X_{(n)}$ constitute a set of jointly sufficient statistics for $\theta$. Do $X_{(1)}$ and $X_{(n)}$ constitute a set of complete sufficient statistics for $\theta$?

Exercise 4.33*. Reliable estimation of the numbers of subjects in the United States living with different types of medical conditions is important to both public health and health policy professionals. In the United States, disease-specific registries have been established for a variety of medical conditions including birth defects, tuberculosis, HIV, and cancer. Such registries are very often only partially complete, meaning that the number of registry records for a particular medical condition generally provides an underestimate of the actual number of subjects with that particular medical condition. When two registries exist for the same medical condition, statistical models can be used to estimate the degree of under-ascertainment for each registry and to produce an improved estimate of the actual number of subjects having the medical condition of interest. The simplest statistical model for this purpose is based on the assumption that membership status for one registry is statistically independent of membership status for the other registry. Let the parameter $N$ denote the true unknown number of subjects who have a certain medical condition of interest.
Define the random variables
$X_{yy}$ = number of subjects listed in both Registry 1 and Registry 2,
$X_{yn}$ = number of subjects listed in Registry 1 but not in Registry 2,
$X_{ny}$ = number of subjects listed in Registry 2 but not in Registry 1,
$X_{nn}$ = number of subjects listed in neither of the two registries,
and the corresponding probabilities
$\pi_{yy}$ = pr(a subject is listed in both Registry 1 and Registry 2),
$\pi_{yn}$ = pr(a subject is listed in Registry 1 only),
$\pi_{ny}$ = pr(a subject is listed in Registry 2 only),
$\pi_{nn}$ = pr(a patient is listed in neither Registry).
It is reasonable to assume that the data arise from a multinomial distribution of the form
$$p_{X_{yy},X_{yn},X_{ny},X_{nn}}(x_{yy}, x_{yn}, x_{ny}, x_{nn}) = \frac{N!}{x_{yy}!\,x_{yn}!\,x_{ny}!\,x_{nn}!}\, \pi_{yy}^{x_{yy}} \pi_{yn}^{x_{yn}} \pi_{ny}^{x_{ny}} \pi_{nn}^{x_{nn}},$$
where $0 \le x_{yy} \le N$, $0 \le x_{yn} \le N$, $0 \le x_{ny} \le N$, $0 \le x_{nn} \le N$, and $(x_{yy} + x_{yn} + x_{ny} + x_{nn}) = N$. It is important to note that the random variable $X_{nn}$ is not observable.
(a) Let $\pi_1 = (\pi_{yy} + \pi_{yn})$ denote the marginal probability that a patient is listed in Registry 1, and let $\pi_2 = (\pi_{yy} + \pi_{ny})$ denote the marginal probability that a patient is listed in Registry 2. Under the assumption of statistical independence [i.e., $\pi_{yy} = \pi_1\pi_2$, $\pi_{yn} = \pi_1(1 - \pi_2)$, etc.], develop an estimator $\hat{N}$ of $N$ by equating observed cell counts to their expected values under the assumed model. What is the numerical value of $\hat{N}$ when $x_{yy} = 12{,}000$, $x_{yn} = 6{,}000$, and $x_{ny} = 8{,}000$?
(b) For $j = 1, 2$, let $E_j$ denote the event that a subject with the medical condition is listed in Registry $j$, and let $\bar{E}_j$ denote the event that this subject is not listed in Registry $j$. In part (a), it was assumed that the events $E_1$ and $E_2$ are independent. As an alternative to this independence assumption, assume that membership in one of the two registries increases or decreases the odds of membership in the other registry by a factor of $k$; in other words,
$$\frac{\mathrm{odds}(E_1 \mid E_2)}{\mathrm{odds}(E_1 \mid \bar{E}_2)} = \frac{\mathrm{odds}(E_2 \mid E_1)}{\mathrm{odds}(E_2 \mid \bar{E}_1)} = k, \quad 0 < k < +\infty,$$
where, for two events $A$ and $B$, $\mathrm{odds}(A \mid B) = \mathrm{pr}(A \mid B)/[1 - \mathrm{pr}(A \mid B)]$. Note that $k > 1$ implies a positive association between the events $E_1$ and $E_2$, that $k < 1$ implies a negative association between the events $E_1$ and $E_2$, and that $k = 1$ implies no association (i.e., independence) between the events $E_1$ and $E_2$. Although $k$ is not known in practice, it is of interest to determine whether estimates of $N$ would meaningfully change when plugging in various plausible values for $k$. Toward this end, develop an explicit expression for the method-of-moments estimator $\tilde{N}(k)$ of $N$ that would be obtained under the assumption that $k$ is a known constant. Using the data from part (a), calculate numerical values of $\tilde{N}(1/2)$, $\tilde{N}(2)$, and $\tilde{N}(4)$. Comment on your findings. In particular, is the estimate of $N$ sensitive to different assumptions about the direction and magnitude of the association between membership status for the two registries (i.e., to the value of $k$)?

Exercise 4.34*. University researchers are conducting a study involving $n$ infants to assess whether infants placed in day care facilities are more likely to be overweight than are infants receiving care at home. Infants are defined as "overweight" if they fall within the 85th or higher percentile on the official Centers for Disease Control and Prevention (CDC) age-adjusted and sex-adjusted body mass index (BMI) growth chart. Let $Y_i = 1$ if the $i$th infant ($i = 1, 2, \ldots, n$) is overweight, and let $Y_i = 0$ otherwise. It is assumed that $Y_i$ has the Bernoulli distribution
$$p_{Y_i}(y_i; \pi_i) = \pi_i^{y_i}(1 - \pi_i)^{1-y_i}, \quad y_i = 0, 1 \ \text{and} \ 0 < \pi_i < 1.$$
Also, Y1 , Y2 , . . . , Yn are assumed to be mutually independent random variables. To make statistical inferences about the association between type of care and the probability of being overweight, the researchers propose the following logistic
regression model:
$$\pi_i \equiv \pi(x_i) = \mathrm{pr}(Y_i = 1 \mid x_i) = \frac{e^{\alpha+\beta x_i}}{(1 + e^{\alpha+\beta x_i})},$$
or equivalently,
$$\mathrm{logit}[\pi(x_i)] = \ln\left[\frac{\pi(x_i)}{1 - \pi(x_i)}\right] = \alpha + \beta x_i, \quad i = 1, \ldots, n,$$
where $x_i = 1$ if the $i$th infant is in day care and $x_i = 0$ if the $i$th infant is at home, and where $\alpha$ and $\beta$ are unknown parameters to be estimated. Here, the parameter $\alpha$ represents the "log odds" of being overweight for infants in home care ($x_i = 0$), and the parameter $\beta$ represents the difference in log odds (or "log odds ratio") of being overweight for infants placed in day care ($x_i = 1$) compared to infants receiving care at home ($x_i = 0$). Suppose that $n$ pairs $(y_1, x_1 = 0), (y_2, x_2 = 0), \ldots, (y_{n_0}, x_{n_0} = 0), (y_{n_0+1}, x_{n_0+1} = 1), (y_{n_0+2}, x_{n_0+2} = 1), \ldots, (y_n, x_n = 1)$ of observed data are collected during the study, where the first $n_0$ data pairs are associated with the infants receiving home care, where the last $n_1$ data pairs are associated with the infants placed in day care, and where $(n_0 + n_1) = n$.
(a) Show that the MLEs of $\alpha$ and $\beta$ are
$$\hat{\alpha} = \ln\left[\frac{p_0}{1 - p_0}\right] \quad \text{and} \quad \hat{\beta} = \ln\left[\frac{p_1/(1 - p_1)}{p_0/(1 - p_0)}\right],$$
where $p_0 = n_0^{-1} \sum_{i=1}^{n_0} y_i$ is the sample proportion of overweight infants receiving home care and $p_1 = n_1^{-1} \sum_{i=n_0+1}^{n} y_i$ is the sample proportion of overweight infants in day care.
(b) Develop an explicit expression for the large-sample variance–covariance matrix of $\hat{\alpha}$ and $\hat{\beta}$ based on both expected and observed information.
(c) Suppose that there are 100 infants receiving home care, 18 of whom are overweight, and that there are 100 infants in day care, 26 of whom are overweight. Use these data to compute large-sample 95% CIs for $\alpha$ and $\beta$. Based on these CI results, do the data supply statistical evidence that infants placed in day care facilities are more likely to be overweight than are infants receiving care at home?
For further details about logistic regression, see Breslow and Day (1980), Hosmer and Lemeshow (2000), Kleinbaum and Klein (2002), and Kleinbaum et al. (1982).

Exercise 4.35*. In April of 1986, a reactor exploded at the Chernobyl Nuclear Power Plant in Chernobyl, Ukraine. There were roughly 14,000 permanent residents of Chernobyl who were exposed to varying levels of radioactive iodine, as well as to other radioactive substances. It took about 3 days before these permanent residents, and other persons living in nearby areas, could be evacuated. As a result, many children and adults have since developed various forms of cancer. In particular, many young children developed thyroid cancer. As a model for the development of thyroid cancer in such children, the following statistical model is proposed. Let $T$ be a continuous random variable representing the time (in years)
from childhood radioactive iodine exposure caused by the Chernobyl explosion to the diagnosis of thyroid cancer, and let the continuous random variable $X$ be the level (in Joules per kilogram) of radioactive iodine exposure. Then, it is assumed that the conditional distribution of $T$ given $X = x$ is
$$f_T(t \mid X = x) = \theta x e^{-\theta x t}, \quad t > 0, \ x > 0, \ \theta > 0.$$
Further, assume that the distribution of $X$ is GAMMA($\alpha = 1$, $\beta$), so that
$$f_X(x) = \frac{x^{\beta-1} e^{-x}}{\Gamma(\beta)}, \quad x > 0, \ \beta > 1.$$
Suppose that an epidemiologist locates $n$ children with thyroid cancer who were residents of Chernobyl at the time of the explosion. For each of these children, this epidemiologist determines the time in years (i.e., the so-called latency period) from exposure to the diagnosis of thyroid cancer. In particular, let $t_1, t_2, \ldots, t_n$ denote these observed latency periods. Since it is impossible to determine the true individual level of radioactive iodine exposure for each of these $n$ children, the only data available to this epidemiologist are the $n$ observed latency periods. Based on the use of the observed latency periods for a random sample of $n = 300$ children, if the MLEs of $\theta$ and $\beta$ are $\hat{\theta} = 0.32$ and $\hat{\beta} = 1.50$, compute an appropriate large-sample 95% CI for $\gamma = E(T)$, the true average latency period for children who developed thyroid cancer as a result of the Chernobyl nuclear reactor explosion.
Exercise 4.36*. Using $n$ mutually independent data pairs of the general form $(x, Y)$, it is desired to use the method of unweighted least squares to fit the model
\[ Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon, \]
where $E(\epsilon) = 0$ and $V(\epsilon) = \sigma^2$. Suppose that the three possible values of the predictor $x$ are $-1$, $0$, and $+1$. What proportion of the $n$ data points should be assigned to each of the three values of $x$ so as to minimize $V(\hat{\beta}_2)$, the variance of the unweighted least-squares estimator of $\beta_2$? To proceed, assume that $n = n\pi_1 + n\pi_2 + n\pi_3$, where $\pi_1$ $(0 < \pi_1 < 1)$ is the proportion of the $n$ observations to be assigned to the $x$-value of $-1$, where $\pi_2$ $(0 < \pi_2 < 1)$ is the proportion of the $n$ observations to be assigned to the $x$-value of $0$, and where $\pi_3$ $(0 < \pi_3 < 1)$ is the proportion of the $n$ observations to be assigned to the $x$-value of $+1$. Further, assume that $n$ can be chosen so that $n_1 = n\pi_1$, $n_2 = n\pi_2$, and $n_3 = n\pi_3$ are positive integers.
(a) With $\hat{\boldsymbol{\beta}} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2)' = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$, show that $\mathbf{X}'\mathbf{X}$ can be written in the form
\[ \mathbf{X}'\mathbf{X} = n\begin{bmatrix} 1 & b & a \\ b & a & b \\ a & b & a \end{bmatrix}, \]
where $a = (\pi_1 + \pi_3)$ and $b = (\pi_3 - \pi_1)$.
(b) Show that
\[ V(\hat{\beta}_2) = \left\{\frac{(\pi_1 + \pi_3) - (\pi_3 - \pi_1)^2}{4n\pi_1\pi_2\pi_3}\right\}\sigma^2. \]
(c) Use the result from part (b) to find the values of $\pi_1$, $\pi_2$, and $\pi_3$ that minimize $V(\hat{\beta}_2)$ subject to the constraint $(\pi_1 + \pi_2 + \pi_3) = 1$.
Exercise 4.37*. Given appropriate data, one possible (but not necessarily optimal) algorithm for deciding whether or not there is statistical evidence that $p\ (\geq 2)$ population means are not all equal to the same value is the following: compute a $100(1 - \alpha)\%$ CI for each population mean and decide that there is no statistical evidence that these population means are not all equal to the same value if these $p$ CIs have at least one value in common (i.e., if there is at least one value that is simultaneously contained in all $p$ CIs); otherwise, decide that there is statistical evidence that these $p$ population means are not all equal to the same value. To evaluate some statistical properties of this proposed algorithm, consider the following scenario. For $i = 1, 2, \ldots, p$, let $X_{i1}, X_{i2}, \ldots, X_{in}$ constitute a random sample of size $n$ from a $N(\mu_i, \sigma^2)$ population. Given the stated assumptions, the appropriate exact $100(1 - \alpha)\%$ CI for $\mu_i$, using only the data $\{X_{i1}, X_{i2}, \ldots, X_{in}\}$ from the $i$th population, involves the t-distribution with $(n - 1)$ df and takes the form
\[ \bar{X}_i \pm k\frac{S_i}{\sqrt{n}}, \quad \text{where } k = t_{(n-1),1-\alpha/2}, \]
where
\[ \bar{X}_i = n^{-1}\sum_{j=1}^{n} X_{ij} \quad \text{and} \quad S_i^2 = (n - 1)^{-1}\sum_{j=1}^{n}(X_{ij} - \bar{X}_i)^2, \]
and where, for $0 < \alpha < 0.50$,
\[ \mathrm{pr}\left(T_\nu > t_{\nu,1-\alpha/2}\right) = \frac{\alpha}{2} \]
when the random variable $T_\nu$ has a t-distribution with $\nu$ df. For notational convenience, let $I_i$ denote the set of values included in the $i$th computed CI; and, for $i \neq i'$, let the event $E_{ii'} = \{I_i \cap I_{i'} = \emptyset\}$, where $\emptyset$ is the empty (or null) set; in other words, $E_{ii'}$ is the event that the CIs $\bar{X}_i \pm kS_i/\sqrt{n}$ and $\bar{X}_{i'} \pm kS_{i'}/\sqrt{n}$ have no values in common (i.e., do not overlap).
(a) Show that
\[ \pi_{ii'} = \mathrm{pr}(E_{ii'}) = \mathrm{pr}\left[\left|\bar{X}_i - \bar{X}_{i'}\right| > k\left(\frac{S_i}{\sqrt{n}} + \frac{S_{i'}}{\sqrt{n}}\right)\right]. \]
(b) Under the condition (say, $C_p$) that all $p$ population means are actually equal to the same value (i.e., $\mu_1 = \mu_2 = \cdots = \mu_p = \mu$, say), use the result from part (a) to show that, for $i \neq i'$,
\[ \pi^*_{ii'} = \mathrm{pr}(E_{ii'} \mid C_p) \leq \mathrm{pr}\left[\left|T_{2(n-1)}\right| > k \mid C_p\right] \leq \alpha. \]
(c) When $p = 3$ and under the condition $C_3$ that $\mu_1 = \mu_2 = \mu_3 = \mu$, say, find a crude upper bound for the probability that there are no values common to all three CIs. Comment on this finding and, in general, on the utility of this algorithm.
Exercise 4.38*. A highway safety researcher theorizes that the number $Y$ of automobile accidents per year occurring on interstate highways in the United States is linearly related to a certain measure $x$ of traffic density. To evaluate his theory, this highway safety researcher gathers appropriate data from $n$ independently chosen locations across the United States. More specifically, for the $i$th of $n$ independently chosen locations ($i = 1, 2, \ldots, n$), the data point $(x_i, y_i)$ is recorded, where $x_i$ (the measure of traffic density at location $i$) is assumed to be a known positive constant and where $y_i$ is the observed value (or “realization”) of the random variable $Y_i$. Here, the random variable $Y_i$ is assumed to have a Poisson distribution with $E(Y_i) = E(Y_i \mid x_i) = \theta_0 + \theta_1 x_i$. You can assume that $E(Y_i) > 0$ for all $i$ and that the set $\{Y_1, Y_2, \ldots, Y_n\}$ constitutes a set of $n$ mutually independent Poisson random variables. The goal is to use the available $n$ pairs of data points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, to make statistical inferences about the unknown parameters $\theta_0$ and $\theta_1$.
(a) Derive explicit expressions for the unweighted least-squares (ULS) estimators $\tilde{\theta}_0$ and $\tilde{\theta}_1$, respectively, of $\theta_0$ and $\theta_1$. Also, derive expressions for the expected values and variances of these two ULS estimators.
(b) For a set of $n = 100$ data pairs $(x_i, y_i)$, $i = 1, 2, \ldots, 100$, suppose that each of 25 data pairs has an $x$ value equal to 1.0, that each of 25 data pairs has an $x$ value equal to 2.0, that each of 25 data pairs has an $x$ value equal to 3.0, and that each of 25 data pairs has an $x$ value equal to 4.0.
If the MLEs of $\theta_0$ and $\theta_1$ are, respectively, $\hat{\theta}_0 = 2.00$ and $\hat{\theta}_1 = 4.00$, compute an appropriate large-sample 95% CI for the parameter $\psi = E(Y \mid x = 2.5) = \theta_0 + (2.5)\theta_1$.
Exercise 4.39*. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n\ (> 1)$ from an $N(\mu, \sigma^2)$ population, and let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n\ (> 1)$ from a completely different $N(\mu, \sigma^2)$ population. Hence, the set $\{X_1, X_2, \ldots, X_n; Y_1, Y_2, \ldots, Y_n\}$ is made up of a total of $2n$ mutually independent random variables, with each random variable in the set having a $N(\mu, \sigma^2)$ distribution. Consider the following random variables:
\[ \bar{X} = n^{-1}\sum_{i=1}^{n} X_i, \quad S_x^2 = (n - 1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2, \]
\[ \bar{Y} = n^{-1}\sum_{i=1}^{n} Y_i, \quad S_y^2 = (n - 1)^{-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2. \]
(a) For any particular value of $i\ (1 \leq i \leq n)$, determine the exact distribution of the random variable $D_i = (X_i - \bar{X})$.
(b) For particular values of $i\ (1 \leq i \leq n)$ and $j\ (1 \leq j \leq n)$, where $i \neq j$, derive an explicit expression for $\mathrm{corr}(D_i, D_j)$, the correlation between the random variables $D_i$ and
$D_j$. Also, find the limiting value of $\mathrm{corr}(D_i, D_j)$ as $n \to \infty$, and then provide an argument as to why this result makes sense.
(c) Prove rigorously that the density function of the random variable $R = S_x^2/S_y^2$ is
\[ f_R(r) = [\Gamma(n - 1)]\{\Gamma[(n - 1)/2]\}^{-2}\, r^{(n-3)/2}(1 + r)^{-(n-1)}, \quad 0 < r < \infty. \]
Also, find an explicit expression for $E(R)$.
(d) Now, consider the following two estimators of the unknown parameter $\mu$:
(i) $\hat{\mu}_1 = (\bar{X} + \bar{Y})/2$;
(ii) $\hat{\mu}_2 = [(\bar{X})(S_y^2) + (\bar{Y})(S_x^2)]/(S_x^2 + S_y^2)$.
Prove rigorously that both $\hat{\mu}_1$ and $\hat{\mu}_2$ are unbiased estimators of $\mu$.
(e) Derive explicit expressions for $V(\hat{\mu}_1)$ and $V(\hat{\mu}_2)$. Which estimator, $\hat{\mu}_1$ or $\hat{\mu}_2$, do you prefer and why?
Exercise 4.40*. A certain company manufactures stitches for coronary bypass graft surgeries. The distribution of the length $Y$ (in feet) of defect-free stitches manufactured by this company is assumed to have the uniform density
\[ f_Y(y; \theta) = \theta^{-1}, \quad 0 < y < \theta,\ \theta > 0. \]
Clearly, the larger $\theta$ is, the better the quality of the manufacturing process. Suppose that $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n\ (n > 2)$ from $f_Y(y; \theta)$. A statistician proposes three estimators of the parameter $\mu = E(Y) = \theta/2$, the true average length of defect-free stitches manufactured by this company. These three estimators are as follows:
(1) $\hat{\mu}_1 = k_1\bar{Y} = k_1 n^{-1}\sum_{i=1}^{n} Y_i$, where $k_1$ is to be chosen so that $E(\hat{\mu}_1) = \mu$;
(2) $\hat{\mu}_2 = k_2 Y_{(n)}$, where $Y_{(n)}$ is the largest order statistic based on this random sample and where $k_2$ is to be chosen so that $E(\hat{\mu}_2) = \mu$;
(3) $\hat{\mu}_3 = k_3[Y_{(1)} + Y_{(n)}]/2$, the so-called “midpoint” of the data, where $Y_{(1)}$ is the smallest order statistic based on this random sample and where $k_3$ is to be chosen so that $E(\hat{\mu}_3) = \mu$.
(a) Find the value of $k_1$, and then find $V(\hat{\mu}_1)$.
(b) Find the value of $k_2$, and then find $V(\hat{\mu}_2)$.
(c) Find the value of $k_3$, and then find $V(\hat{\mu}_3)$.
(d) Compare the variances of $\hat{\mu}_1$, $\hat{\mu}_2$, and $\hat{\mu}_3$ for both finite $n$ and as $n \to \infty$. Which estimator do you prefer and why?
Exercise 4.41*. For adult males with incurable malignant melanoma who have lived at least 25 consecutive years in Arizona, an epidemiologist theorizes that the true mean time (in years) to death differs between those adult males with a family history of skin cancer and those adult males without a family history of skin cancer.
To test this theory, this epidemiologist selects a random sample of $n$ adult males in Arizona with incurable malignant melanoma, each of whom has lived in Arizona for at least 25 consecutive years. Then, this epidemiologist and a collaborating biostatistician agree to consider the following statistical model for two random variables $X$ and $Y$. Here, $X$ is a dichotomous random variable taking the value 1 for an adult male in the random sample without a family history of skin cancer and taking the value 0 for an adult male in the random sample with a family history of skin cancer; and, $Y$ is a continuous random variable representing the time (in months) to death for an adult male in the random sample with incurable malignant melanoma. More specifically, assume that the marginal distribution of $X$ is Bernoulli (or point-binomial), namely,
\[ p_X(x; \theta) = \theta^x(1 - \theta)^{(1-x)}, \quad x = 0, 1;\ 0 < \theta < 1. \]
Moreover, assume that the conditional distribution of $Y$, given $X = x$, is negative exponential with conditional mean $E(Y \mid X = x) = \mu(x) = e^{\alpha + \beta x}$, namely,
\[ f_Y(y \mid X = x; \alpha, \beta) = [\mu(x)]^{-1} e^{-y/\mu(x)}, \quad 0 < y < +\infty. \]
This two-variable model involves three unknown parameters, namely, $\theta\ (0 < \theta < 1)$, $\alpha\ (-\infty < \alpha < +\infty)$, and $\beta\ (-\infty < \beta < +\infty)$. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ constitute a random sample of size $n$ from the joint distribution $f_{X,Y}(x, y; \theta, \alpha, \beta)$ of the random variables $X$ and $Y$, where this joint distribution is given by the product
\[ f_{X,Y}(x, y; \theta, \alpha, \beta) = p_X(x; \theta) f_Y(y \mid X = x; \alpha, \beta). \]
Now, suppose that the available data contain $n_1$ adult males without a family history of skin cancer and $n_0 = (n - n_1)$ adult males with a family history of skin cancer. Further, assume (without loss of generality) that the $n$ observed data pairs (i.e., the $n$ realizations) are arranged (for notational simplicity) so that the first $n_1$ pairs $(1, y_1), (1, y_2), \ldots, (1, y_{n_1})$ are the observed data for the $n_1$ adult males without a family history of skin cancer, and the remaining $n_0$ data pairs $(0, y_{n_1+1}), (0, y_{n_1+2}), \ldots, (0, y_n)$ are the observed data for the $n_0$ adult males with a family history of skin cancer.
(a) Develop explicit expressions for the MLEs $\hat{\theta}$, $\hat{\alpha}$, and $\hat{\beta}$ of $\theta$, $\alpha$, and $\beta$, respectively. In particular, show that these three ML estimators can be written as explicit functions of one or more of the sample means $\bar{x} = n_1/n$, $\bar{y}_1 = n_1^{-1}\sum_{i=1}^{n_1} y_i$, and $\bar{y}_0 = n_0^{-1}\sum_{i=(n_1+1)}^{n} y_i$.
(b) Using expected information, develop an explicit expression for the $(3 \times 3)$ large-sample covariance matrix $\mathcal{I}^{-1}$ for the three ML estimators $\hat{\theta}$, $\hat{\alpha}$, and $\hat{\beta}$.
(c) If $n = 50$, $\hat{\theta} = 0.60$, $\hat{\alpha} = 0.50$, and $\hat{\beta} = 0.40$, use appropriate CI calculations to determine whether this numerical information supplies statistical evidence that the true mean time to death differs between adult males with a family history of skin cancer and adult males without a family history of skin cancer, all of whom developed incurable malignant melanoma and lived at least 25 consecutive years in Arizona.
Exercise 4.42*.
For a certain laboratory experiment, the concentration Yx (in milligrams per cubic centimeter) of a certain pollutant produced via a chemical reaction
taking place at temperature $x$ (conveniently scaled so that $-1 \leq x \leq +1$) has a normal distribution with mean $E(Y_x) = \theta_x = (\beta_0 + \beta_1 x + \beta_2 x^2)$ and variance $V(Y_x) = \sigma^2$. Also, the temperature $x$ is nonstochastic (i.e., is not a random variable) and is known without error. Suppose that an environmental scientist runs this experiment $N$ times, with each run involving a different temperature setting. Further, suppose that these $N$ runs produce the $N$ pairs of data $(x_1, Y_{x_1}), (x_2, Y_{x_2}), \ldots, (x_N, Y_{x_N})$. Assume that the random variables $Y_{x_1}, Y_{x_2}, \ldots, Y_{x_N}$ constitute a set of mutually independent random variables, and that $x_1, x_2, \ldots, x_N$ constitute a set of known constants. Further, let $\mu_k = N^{-1}\sum_{i=1}^{N} x_i^k$, $k = 1, 2, 3$; and, assume that the environmental scientist chooses the $N$ temperature values $x_1, x_2, \ldots, x_N$ so that $\mu_1 = \mu_3 = 0$. Suppose that this environmental scientist decides to estimate the parameter $\theta_x$ using the straight-line estimator $\hat{\theta}_x = (B_0 + B_1 x)$, where $B_0 = N^{-1}\sum_{i=1}^{N} Y_{x_i}$ and where $B_1 = \sum_{i=1}^{N} x_i Y_{x_i} \big/ \sum_{i=1}^{N} x_i^2$. Note that $\hat{\theta}_x$ is a straight-line function of $x$, but that the true model relating the expected value of $Y_x$ to $x$ actually involves a squared term in $x$. Hence, the wrong model is being fit to the available data.
(a) Develop explicit expressions for $E(\hat{\theta}_x)$ and $V(\hat{\theta}_x)$. What is the exact distribution of the estimator $\hat{\theta}_x$?
(b) Consider the expression
\[ Q = \int_{-1}^{1} [E(\hat{\theta}_x) - \theta_x]^2\, dx. \]
Since $[E(\hat{\theta}_x) - \theta_x]^2$ is the squared bias when $\hat{\theta}_x$ is used to estimate $\theta_x$ at temperature setting $x$, $Q$ is called the integrated squared bias. The quantity $Q$ can be interpreted as being the cumulative bias over all values of $x$ such that $-1 \leq x \leq +1$. It is desirable to choose the temperature settings $x_1, x_2, \ldots, x_N$ to make $Q$ as small as possible. More specifically, find the numerical value of $\mu_2$ that minimizes $Q$. Then, given this result, if $N = 4$, find a set of values for $x_1$, $x_2$, $x_3$, and $x_4$ such that $Q$ is minimized and that $\mu_1 = \mu_3 = 0$.
Exercise 4.43*. For the $i$th of $n$ ($i = 1, 2, \ldots, n$) U.S. military families, suppose that there are $y_{i1}$ events of child abuse during a period of $L_{i1}$ months when the soldier-father is not at home (i.e., is deployed to a foreign country), and suppose that there are $y_{i0}$ events of child abuse during a period of $L_{i0}$ months when the soldier-father is at home (i.e., is not deployed). To assess whether the rate of child abuse when the soldier-father is deployed is different from the rate of child abuse when the soldier-father is not deployed, the following statistical model is proposed. Let $\alpha_i \sim N(0, \sigma_\alpha^2)$. For $j = 0, 1$, and given $\alpha_i$ fixed, suppose that the random variable $Y_{ij}$, with realization (or observed value) $y_{ij}$, is assumed to have a Poisson distribution with conditional mean $E(Y_{ij} \mid \alpha_i) = L_{ij}\lambda_{ij}$, where
\[ \ln(\lambda_{ij}) = \alpha_i + \beta D_{ij} + \sum_{l=1}^{p} \gamma_l C_{il}; \]
here, $\lambda_{i0}$ and $\lambda_{i1}$ denote the respective nondeployment and deployment rates of child abuse per month for the $i$th family, $D_{ij}$ takes the value 1 if $j = 1$ (i.e., if the soldier-father
is deployed) and $D_{ij}$ takes the value 0 if $j = 0$ (i.e., if the soldier-father is not deployed), and $C_{i1}, C_{i2}, \ldots, C_{ip}$ are the values of $p$ covariates $C_1, C_2, \ldots, C_p$ specific to the $i$th family. Further, conditional on $\alpha_i$ being fixed, $Y_{i0}$ and $Y_{i1}$ are assumed to be independent random variables, $i = 1, 2, \ldots, n$.
(a) Find an explicit expression for $\mathrm{cov}(Y_{i0}, Y_{i1})$, and comment on the rationale for including the random effect $\alpha_i$ in the proposed statistical model.
(b) Let the random variable $Y_i = (Y_{i0} + Y_{i1})$ be the total number of child abuse events for the $i$th family. Show that the conditional distribution $p_{Y_{i1}}(y_{i1} \mid Y_i = y_i, \alpha_i)$ of $Y_{i1}$, given $Y_i = y_i$ and $\alpha_i$ fixed, is BIN($y_i, \pi_i$), where
\[ \pi_i = \frac{L_{i1}\theta}{L_{i0} + L_{i1}\theta}; \]
here,
\[ \theta = \frac{\lambda_{i1}}{\lambda_{i0}} = e^{\beta} \]
is the rate ratio comparing the deployment and nondeployment rates of child abuse per month for the $i$th family. Note that the rate ratio parameter $\theta$ does not vary with $i$ (i.e., does not vary across families), even though the individual rate parameters are allowed to vary with $i$.
(c) Under the reasonable assumption that families behave independently of one another, use the conditional likelihood function
\[ L = \prod_{i=1}^{n} p_{Y_{i1}}(y_{i1} \mid Y_i = y_i, \alpha_i) \]
to show that the conditional MLE $\hat{\theta}$ of $\theta$ satisfies the equation
\[ \hat{\theta}\sum_{i=1}^{n} \frac{y_i L_{i1}}{L_{i0} + L_{i1}\hat{\theta}} = \sum_{i=1}^{n} y_{i1}. \]
(d) Using expected information, develop a general expression for a large-sample 95% CI for the rate ratio parameter $\theta$.
Exercise 4.44*. Let $Y$ be a continuous response variable and let $X$ be a continuous predictor variable. Also, assume that
\[ E(Y \mid X = x) = (\beta_0 + \beta_1 x) \quad \text{and} \quad X \sim N(\mu_x, \sigma_x^2). \]
Further, suppose that the predictor variable $X$ is very expensive to measure, but that a surrogate variable $X^*$ is available and can be measured fairly inexpensively. Further, for $i = 1, 2, \ldots, n$, assume that $X_i^*$ and $X_i$ are related by the measurement error model $X_i^* = (X_i + U_i)$, where $U_i \sim N(0, \sigma_u^2)$ and where $X_i$ and $U_i$ are independent random variables. Suppose that it is decided to use $X^*$ instead of $X$ as the predictor variable
when estimating the slope parameter $\beta_1$ by fitting a straight-line regression model via unweighted least squares. In particular, suppose that the $n$ mutually independent pairs $(X_i^*, Y_i) = (X_i + U_i, Y_i)$, $i = 1, 2, \ldots, n$, are used to construct an estimator $\hat{\beta}_1^*$ of $\beta_1$ of the form
\[ \hat{\beta}_1^* = \frac{\sum_{i=1}^{n}(X_i^* - \bar{X}^*)Y_i}{\sum_{i=1}^{n}(X_i^* - \bar{X}^*)^2}, \]
where $\bar{X}^* = n^{-1}\sum_{i=1}^{n} X_i^*$.
Using conditional expectation theory, derive an explicit expression for $E(\hat{\beta}_1^*)$, and then comment on how $E(\hat{\beta}_1^*)$ varies as a function of the ratio $\lambda = \sigma_u^2/\sigma_x^2$, $0 < \lambda < \infty$. In your derivation, use the fact that $X_i$ and $X_i^*$ have a bivariate normal distribution and employ the assumption that $E(Y_i \mid X_i = x_i, X_i^* = x_i^*) = E(Y_i \mid X_i = x_i)$, $i = 1, 2, \ldots, n$. This assumption is known as the nondifferential error assumption and states that $X_i^*$ contributes no further information regarding $Y_i$ if $X_i$ is available.
For an excellent book on measurement error and its effects on the validity of statistical analyses, see Fuller (2006).
Exercise 4.45*. Let the random variable $Y$ take the value 1 if a person develops a certain rare disease, and let $Y$ take the value 0 if not. Consider the following exponential regression model, namely,
\[ \mathrm{pr}(Y = 1 \mid X, \mathbf{C}) = e^{(\beta_0 + \beta_1 X + \boldsymbol{\gamma}'\mathbf{C})}, \]
where $X$ is a continuous exposure variable, $\mathbf{C}' = (C_1, C_2, \ldots, C_p)$ is a row vector of $p$ covariates, and $\boldsymbol{\gamma}' = (\gamma_1, \gamma_2, \ldots, \gamma_p)$ is a row vector of $p$ regression coefficients. Here, $\beta_1\ (> 0)$ is the key parameter of interest; in particular, $\beta_1$ measures the effect of the exposure $X$ on the probability (or risk) of developing the rare disease after adjusting for the effects of the covariates $C_1, C_2, \ldots, C_p$. Since the disease in question is rare, it is reasonable to assume that
\[ \mathrm{pr}(Y = 1 \mid X, \mathbf{C}) = e^{(\beta_0 + \beta_1 X + \boldsymbol{\gamma}'\mathbf{C})} < 1. \]
Now, suppose that the exposure variable $X$ is very expensive to measure, but that a surrogate variable $X^*$ for $X$ is available and can be measured fairly inexpensively. Further, assume that $X$ and $X^*$ are related via the Berkson measurement error model (Berkson, 1950)
\[ X = \alpha_0 + \alpha_1 X^* + \boldsymbol{\delta}'\mathbf{C} + U, \]
where $\alpha_1 > 0$, where $U \sim N(0, \sigma_u^2)$, where $\boldsymbol{\delta}' = (\delta_1, \delta_2, \ldots, \delta_p)$ is a row vector of $p$ regression coefficients, and where the random variables $U$ and $X^*$ are independent given $\mathbf{C}$.
(a) Show that $\mathrm{corr}(X, X^* \mid \mathbf{C}) < 1$, in which case $X^*$ is said to be an imperfect surrogate for $X$ (since it is not perfectly correlated with $X$).
(b) Determine the structure of $f_X(x \mid X^*, \mathbf{C})$, the conditional density function of $X$ given $X^*$ and $\mathbf{C}$.
(c) Now, suppose that an epidemiologist decides to use $X^*$ instead of $X$ for the exposure variable in the exponential regression model given above. To assess the implications of this decision, show that $\mathrm{pr}(Y = 1 \mid X^*, \mathbf{C})$ has the structure
\[ \mathrm{pr}(Y = 1 \mid X^*, \mathbf{C}) = e^{(\theta_0 + \theta_1 X^* + \boldsymbol{\xi}'\mathbf{C})}, \]
where $\theta_0$, $\theta_1$, and $\boldsymbol{\xi}'$ are specific parametric functions of one or more of the quantities $\beta_0$, $\beta_1$, $\alpha_0$, $\alpha_1$, $\sigma_u^2$, $\boldsymbol{\gamma}'$, and $\boldsymbol{\delta}'$. In your derivation, assume that $\mathrm{pr}(Y = 1 \mid X, X^*, \mathbf{C}) = \mathrm{pr}(Y = 1 \mid X, \mathbf{C})$; this is known as the nondifferential error assumption and states that $X^*$ contributes no further information regarding $Y$ if $X$ is available. In particular, show that $\theta_1 \neq \beta_1$, and then comment on the implication of this result with regard to the estimation of $\beta_1$ using $X^*$ instead of $X$ in the stated exponential regression model. For an application of this methodology, see Horick et al. (2006).
Exercise 4.46*. Suppose that $Y$ is a dichotomous outcome variable taking the values 0 and 1, and that $X$ is a dichotomous predictor variable also taking the values 0 and 1. Further, for $x = 0, 1$, let
\[ \mu_x = \mathrm{pr}(Y = 1 \mid X = x) \quad \text{and let} \quad \delta = \mathrm{pr}(X = 1). \]
(a) Suppose that $X$ is unobservable, and that a surrogate dichotomous variable $X^*$ is used in place of $X$. Further, assume that $X$ and $X^*$ are related via the misclassification probabilities
\[ \pi_{xx^*} = \mathrm{pr}(X^* = x^* \mid X = x), \quad x = 0, 1 \ \text{and} \ x^* = 0, 1. \]
Find an explicit expression for $\mathrm{corr}(X, X^*)$. For what values of $\pi_{00}$, $\pi_{10}$, $\pi_{01}$, and $\pi_{11}$ will $\mathrm{corr}(X, X^*) = 1$, in which case $X^*$ is said to be a perfect surrogate for $X$? Comment on your findings.
(b) Now, consider the risk difference parameter $\theta = (\mu_1 - \mu_0)$. With $\mu^*_{x^*} = \mathrm{pr}(Y = 1 \mid X^* = x^*)$, prove that $|\theta^*| \leq |\theta|$, where $\theta^* = (\mu_1^* - \mu_0^*)$. Then, comment on this finding with regard to the misclassification bias resulting from the use of $X^*$ instead of $X$ for estimating $\theta$. In your proof, assume that
\[ \mathrm{pr}[Y = 1 \mid (X = x) \cap (X^* = x^*)] = \mathrm{pr}(Y = 1 \mid X = x); \]
this nondifferential error assumption states that $X^*$ contributes no further information regarding $Y$ if $X$ is available.
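SOLUTIONS
Solution 4.1
(a) Method of Moments: $\bar{Y} = n^{-1}\sum_{x=1}^{n} Y_x$ is equated to
\[ E(\bar{Y}) = \frac{1}{n}\sum_{x=1}^{n} E(Y_x) = \frac{\mu}{n}\sum_{x=1}^{n} x, \]
so that
\[ \hat{\mu}_1 = \frac{n\bar{Y}}{\sum_{x=1}^{n} x} = \frac{n\bar{Y}}{n(n+1)/2} = \left(\frac{2}{n+1}\right)\bar{Y}. \]
Unweighted Least Squares:
\[ Q = \sum_{x=1}^{n} [Y_x - E(Y_x)]^2 = \sum_{x=1}^{n} (Y_x - x\mu)^2. \]
So,
\[ \frac{\partial Q}{\partial \mu} = 2\sum_{x=1}^{n} (Y_x - x\mu)(-x) = 0 \implies \mu\sum_{x=1}^{n} x^2 = \sum_{x=1}^{n} xY_x, \]
so that
\[ \hat{\mu}_2 = \frac{\sum_{x=1}^{n} xY_x}{\sum_{x=1}^{n} x^2} = \frac{\sum_{x=1}^{n} xY_x}{n(n+1)(2n+1)/6} = \frac{6\sum_{x=1}^{n} xY_x}{n(n+1)(2n+1)}. \]
Maximum Likelihood:
\[ L = \prod_{x=1}^{n} \left\{\frac{1}{\sqrt{2\pi}(x^3\sigma^2)^{1/2}} \exp\left[\frac{-(y_x - x\mu)^2}{2x^3\sigma^2}\right]\right\} = (2\pi)^{-n/2}\sigma^{-n}\left(\prod_{x=1}^{n} x^{-3/2}\right)\exp\left[-\frac{1}{2\sigma^2}\sum_{x=1}^{n} x^{-3}(y_x - x\mu)^2\right]. \]
So,
\[ \ln L = -\frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \frac{3}{2}\sum_{x=1}^{n} \ln x - \frac{1}{2\sigma^2}\sum_{x=1}^{n} x^{-3}(y_x - x\mu)^2. \]
Thus,
\[ \frac{\partial \ln L}{\partial \mu} = \frac{-1}{\sigma^2}\sum_{x=1}^{n} x^{-3}(y_x - x\mu)(-x) = 0 \]
gives
\[ \mu\sum_{x=1}^{n} x^{-1} = \sum_{x=1}^{n} x^{-2}y_x, \]
so that
\[ \hat{\mu}_3 = \frac{\sum_{x=1}^{n} x^{-2}Y_x}{\sum_{x=1}^{n} x^{-1}}. \]
Now, since $\hat{\mu}_1$, $\hat{\mu}_2$, and $\hat{\mu}_3$ are each a linear combination of mutually independent normal random variables, all three of these estimators are normally distributed. Now,
\[ E(\hat{\mu}_1) = \left(\frac{2}{n+1}\right)E(\bar{Y}) = \left(\frac{2}{n+1}\right)\frac{1}{n}\sum_{x=1}^{n} E(Y_x) = \left(\frac{2}{n+1}\right)\frac{\mu}{n}\sum_{x=1}^{n} x = \mu, \]
\[ V(\hat{\mu}_1) = \left(\frac{2}{n+1}\right)^2 \frac{1}{n^2}\sum_{x=1}^{n} x^3\sigma^2 = \frac{4\sigma^2\sum_{x=1}^{n} x^3}{(n+1)^2 n^2} = \sigma^2, \]
\[ E(\hat{\mu}_2) = \frac{\sum_{x=1}^{n} x(x\mu)}{\sum_{x=1}^{n} x^2} = \mu, \]
\[ V(\hat{\mu}_2) = \frac{\sum_{x=1}^{n} x^2(x^3\sigma^2)}{\left(\sum_{x=1}^{n} x^2\right)^2} = \frac{36\sigma^2}{n^2(n+1)^2(2n+1)^2}\sum_{x=1}^{n} x^5, \]
\[ E(\hat{\mu}_3) = \frac{\sum_{x=1}^{n} x^{-2}(x\mu)}{\sum_{x=1}^{n} x^{-1}} = \mu, \]
\[ V(\hat{\mu}_3) = \frac{\sum_{x=1}^{n} x^{-4}(x^3\sigma^2)}{\left(\sum_{x=1}^{n} x^{-1}\right)^2} = \frac{\sigma^2}{\sum_{x=1}^{n} x^{-1}}. \]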
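The three variance expressions in part (a) can be compared numerically. The sketch below is only an illustration: the function name `estimator_variances` and the choice $n = 5$, $\sigma^2 = 1$ are assumptions made here, not part of the solution. It evaluates the three formulas exactly with rational arithmetic.

```python
from fractions import Fraction


def estimator_variances(n, sigma2=Fraction(1)):
    """Exact variances of the three unbiased estimators of mu when
    Y_x ~ N(x*mu, x^3 * sigma2) for x = 1, ..., n (formulas from part (a))."""
    xs = range(1, n + 1)
    # Method of moments: [2/(n+1)]^2 * n^(-2) * sum(x^3) * sigma2
    v_mm = Fraction(4, (n + 1) ** 2 * n ** 2) * sum(x ** 3 for x in xs) * sigma2
    # Unweighted least squares: sum(x^5) * sigma2 / (sum(x^2))^2
    v_uls = Fraction(sum(x ** 5 for x in xs), sum(x ** 2 for x in xs) ** 2) * sigma2
    # Maximum likelihood: sigma2 / sum(1/x)
    v_ml = sigma2 / sum(Fraction(1, x) for x in xs)
    return v_mm, v_uls, v_ml


# Hypothetical illustration with n = 5 and sigma2 = 1.
v_mm, v_uls, v_ml = estimator_variances(5)
print(float(v_mm), float(v_uls), float(v_ml))
```

For this particular choice the ML-based variance is the smallest of the three, and the method-of-moments variance equals $\sigma^2$ as the closed form predicts.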
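(b) Clearly, since all these estimators are unbiased estimators of $\mu$, we want to use the estimator with the smallest variance. We could analytically compare $V(\hat{\mu}_1)$, $V(\hat{\mu}_2)$, and $V(\hat{\mu}_3)$, but there is a more direct way. Since
\[ \frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{x=1}^{n} x^{-2}y_x - \frac{\mu}{\sigma^2}\sum_{x=1}^{n} x^{-1}, \]
and since
\[ E\left(\frac{\partial^2 \ln L}{\partial \mu\, \partial \sigma^2}\right) = 0, \]
so that the expected information matrix is a diagonal matrix, the Cramér–Rao lower bound for the variance of any unbiased estimator of $\mu$ using $\{Y_1, Y_2, \ldots, Y_n\}$ is
\[ \frac{1}{-E\left(\dfrac{\partial^2 \ln L}{\partial \mu^2}\right)} = \sigma^2\left(\sum_{x=1}^{n} x^{-1}\right)^{-1}, \]
which is achieved by $\hat{\mu}_3$ for any finite $n$. So, the “best” exact 95% CI for $\mu$ should be based on $\hat{\mu}_3$, the minimum variance bound unbiased estimator (MVBUE) of $\mu$. Since
\[ \frac{\hat{\mu}_3 - \mu}{\sqrt{V(\hat{\mu}_3)}} \sim N(0, 1), \]
the “best” exact 95% CI for $\mu$ is $\hat{\mu}_3 \pm 1.96\sqrt{V(\hat{\mu}_3)}$. For the given data,
\[ \hat{\mu}_3 = \frac{\sum_{x=1}^{5} x^{-2}(x + 1)}{\sum_{x=1}^{5} x^{-1}} = \frac{\dfrac{2}{1} + \dfrac{3}{4} + \dfrac{4}{9} + \dfrac{5}{16} + \dfrac{6}{25}}{1 + \dfrac{1}{2} + \dfrac{1}{3} + \dfrac{1}{4} + \dfrac{1}{5}} = 1.641, \]
and
\[ V(\hat{\mu}_3) = \frac{(2)}{(2.283)} = 0.876, \]
so that the computed exact 95% CI for $\mu$ is
\[ 1.641 \pm 1.96\sqrt{0.876} = 1.641 \pm 1.835 = (-0.194, 3.476). \]
Solution 4.2
(a) The unweighted least-squares estimator $\hat{\theta}_{uls}$ is the value of $\theta$ that minimizes
\[ Q = \sum_{i=1}^{n} (Y_i - \theta x_i)^2. \]
Solving
\[ \frac{\partial Q}{\partial \theta} = -2\sum_{i=1}^{n} x_i(Y_i - \theta x_i) = 0 \]
yields
\[ \hat{\theta}_{uls} = \sum_{i=1}^{n} x_iY_i \bigg/ \sum_{i=1}^{n} x_i^2. \]
Since
\[ \frac{\partial^2 Q}{\partial \theta^2} = 2\sum_{i=1}^{n} x_i^2 > 0, \]
$\hat{\theta}_{uls}$ minimizes $Q$. Also,
\[ E(\hat{\theta}_{uls}) = \frac{\sum_{i=1}^{n} x_i E(Y_i)}{\sum_{i=1}^{n} x_i^2} = \frac{\sum_{i=1}^{n} x_i(\theta x_i)}{\sum_{i=1}^{n} x_i^2} = \theta, \]
and
\[ V(\hat{\theta}_{uls}) = \frac{\sum_{i=1}^{n} x_i^2 V(Y_i)}{\left(\sum_{i=1}^{n} x_i^2\right)^2} = \frac{\sum_{i=1}^{n} x_i^2(\theta x_i)}{\left(\sum_{i=1}^{n} x_i^2\right)^2} = \theta\sum_{i=1}^{n} x_i^3 \bigg/ \left(\sum_{i=1}^{n} x_i^2\right)^2. \]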
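(b) The method of moments estimator is obtained by solving for $\theta$ using the equation $\bar{Y} = E(\bar{Y})$, where
\[ \bar{Y} = n^{-1}\sum_{i=1}^{n} Y_i \]
and
\[ E(\bar{Y}) = n^{-1}\sum_{i=1}^{n} E(Y_i) = n^{-1}\sum_{i=1}^{n} (\theta x_i) = \theta n^{-1}\sum_{i=1}^{n} x_i = \theta\bar{x}. \]
Hence, the equation $\bar{Y} = E(\bar{Y}) = \theta\bar{x}$ gives $\hat{\theta}_{mm} = \bar{Y}/\bar{x}$. Obviously, $E(\hat{\theta}_{mm}) = \theta$, and
\[ V(\hat{\theta}_{mm}) = \frac{V(\bar{Y})}{(\bar{x})^2} = \frac{V\left(n^{-1}\sum_{i=1}^{n} Y_i\right)}{(\bar{x})^2} = \frac{n^{-2}\sum_{i=1}^{n} V(Y_i)}{(\bar{x})^2} = \frac{n^{-2}\sum_{i=1}^{n} (\theta x_i)}{(\bar{x})^2} = \frac{n^{-1}\theta\bar{x}}{(\bar{x})^2} = \frac{\theta}{n\bar{x}} = \frac{\theta}{\sum_{i=1}^{n} x_i}. \]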
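(c) Now, with $\mathbf{y} = (y_1, y_2, \ldots, y_n)$, we have
\[ L(\mathbf{y}; \theta) = \prod_{i=1}^{n} \left\{\frac{(\theta x_i)^{y_i} e^{-\theta x_i}}{y_i!}\right\} = \frac{\theta^{\sum_{i=1}^{n} y_i}\left(\prod_{i=1}^{n} x_i^{y_i}\right) e^{-\theta\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} y_i!}, \]
so that
\[ \ln L(\mathbf{y}; \theta) = \left(\sum_{i=1}^{n} y_i\right)\ln\theta + \sum_{i=1}^{n} y_i\ln x_i - \theta\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \ln y_i!. \]
So,
\[ \frac{\partial \ln L(\mathbf{y}; \theta)}{\partial \theta} = \frac{\sum_{i=1}^{n} y_i}{\theta} - \sum_{i=1}^{n} x_i = 0 \]
gives
\[ \hat{\theta}_{ml} = \sum_{i=1}^{n} Y_i \bigg/ \sum_{i=1}^{n} x_i = \frac{\bar{Y}}{\bar{x}}\ (= \hat{\theta}_{mm}). \]
So,
\[ E(\hat{\theta}_{ml}) = E(\hat{\theta}_{mm}) = \theta \quad \text{and} \quad V(\hat{\theta}_{ml}) = V(\hat{\theta}_{mm}) = \frac{\theta}{n\bar{x}}. \]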
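The variance formulas for $\hat{\theta}_{uls}$ and $\hat{\theta}_{ml}$ can be compared for any set of positive $x_i$ values. The following sketch assumes hypothetical inputs (the lists `xs` and the value `theta` are illustrative only); by the Cauchy–Schwarz inequality, $(\sum x_i^2)^2 \leq (\sum x_i)(\sum x_i^3)$, so the ML variance can never exceed the ULS variance, with equality when all $x_i$ are equal.

```python
def var_uls(theta, xs):
    """V(theta_uls) = theta * sum(x^3) / (sum(x^2))^2 for Y_i ~ Poisson(theta * x_i)."""
    return theta * sum(x ** 3 for x in xs) / sum(x ** 2 for x in xs) ** 2


def var_ml(theta, xs):
    """V(theta_ml) = V(theta_mm) = theta / sum(x) = theta / (n * xbar)."""
    return theta / sum(xs)


# Hypothetical values chosen only to illustrate the comparison.
theta = 2.0
xs = [1.0, 2.0, 3.0, 4.0]
print(var_uls(theta, xs), var_ml(theta, xs))
```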
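Note that one can use exponential family theory to show that $\hat{\theta}_{ml}\ (= \hat{\theta}_{mm})$ is the MVBUE of $\theta$. In particular,
\[ L(\mathbf{y}; \theta) = \exp\left\{\hat{\theta}_{ml}\left(\sum_{i=1}^{n} x_i\right)(\ln\theta) - \theta\sum_{i=1}^{n} x_i + \ln\left[\frac{\prod_{i=1}^{n} x_i^{y_i}}{\prod_{i=1}^{n} y_i!}\right]\right\}. \]
(d) From part (c),
\[ \frac{\partial^2 \ln L(\mathbf{y}; \theta)}{\partial \theta^2} = \frac{-\sum_{i=1}^{n} y_i}{\theta^2}, \]
so that
\[ -E_{\mathbf{y}}\left[\frac{\partial^2 \ln L(\mathbf{y}; \theta)}{\partial \theta^2}\right] = \frac{\sum_{i=1}^{n} E(Y_i)}{\theta^2} = \frac{\theta\sum_{i=1}^{n} x_i}{\theta^2} = \frac{n\bar{x}}{\theta}. \]
Hence,
\[ \mathrm{CRLB} = \frac{1}{(n\bar{x}/\theta)} = \frac{\theta}{n\bar{x}}, \]
which is achieved by the estimators $\hat{\theta}_{ml}$ and $\hat{\theta}_{mm}$ (which are identical), but which is not achieved by $\hat{\theta}_{uls}$.
Solution 4.3
(a) Since $V(\hat{\theta}) = k^2\sigma_1^2 + (1 - k)^2\sigma_2^2 + 2k(1 - k)\rho\sigma_1\sigma_2$, it follows that
\[ \frac{\partial V(\hat{\theta})}{\partial k} = 2k\sigma_1^2 - 2(1 - k)\sigma_2^2 + 2(1 - 2k)\rho\sigma_1\sigma_2 = 0. \]
Solving the above equation gives
\[ k^* = \frac{\sigma_2^2 - \rho\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2} = \frac{\dfrac{\sigma_2}{\sigma_1} - \rho}{\dfrac{\sigma_1}{\sigma_2} + \dfrac{\sigma_2}{\sigma_1} - 2\rho} = \frac{(1 - \rho\lambda)}{(1 + \lambda^2 - 2\rho\lambda)}, \quad k^* > 0, \]
which minimizes $V(\hat{\theta})$. Interestingly, if $\rho > \lambda$, then $k^* > 1$, so that the unbiased estimator $\hat{\theta}_2$ gets negative weight. And, when $\lambda = 1$, so that $\sigma_1 = \sigma_2$, $k^* = \frac{1}{2}$, regardless of the value of $\rho$.
(b) In general, since $\sigma_2 = \sigma_1/\lambda$,
\[ V(\hat{\theta}) = k^2\sigma_1^2 + (1 - k)^2\frac{\sigma_1^2}{\lambda^2} + 2k(1 - k)\rho\frac{\sigma_1^2}{\lambda} = \sigma_1^2\left[k^2 + (1 - k)^2\frac{1}{\lambda^2} + 2k(1 - k)\frac{\rho}{\lambda}\right]. \]
So, after substituting $k^*$ for $k$ in the above expression and doing some algebraic simplification, we obtain
\[ V(\hat{\theta}^*) = \sigma_1^2\left[1 - \frac{(\lambda - \rho)^2}{(1 - 2\rho\lambda + \lambda^2)}\right]. \]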
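As a numerical check of the algebra, the sketch below plugs $k^*$ into the general variance expression and compares the result with the closed form for $V(\hat{\theta}^*)$. The function names and the values $\sigma_1 = 1$, $\sigma_2 = 2$, $\rho = 0.8$ are hypothetical choices made here for illustration, and $\lambda = \sigma_1/\sigma_2$ follows the relation $\sigma_2 = \sigma_1/\lambda$ used in part (b).

```python
def combined_variance(k, sigma1, sigma2, rho):
    """V(theta_hat) for the weighted estimator k*theta1_hat + (1-k)*theta2_hat."""
    return (k ** 2 * sigma1 ** 2 + (1 - k) ** 2 * sigma2 ** 2
            + 2 * k * (1 - k) * rho * sigma1 * sigma2)


def optimal_k(sigma1, sigma2, rho):
    """k* = (1 - rho*lambda) / (1 + lambda^2 - 2*rho*lambda), lambda = sigma1/sigma2."""
    lam = sigma1 / sigma2
    return (1 - rho * lam) / (1 + lam ** 2 - 2 * rho * lam)


def minimum_variance(sigma1, sigma2, rho):
    """Closed form sigma1^2 * [1 - (lambda - rho)^2 / (1 - 2*rho*lambda + lambda^2)]."""
    lam = sigma1 / sigma2
    return sigma1 ** 2 * (1 - (lam - rho) ** 2 / (1 - 2 * rho * lam + lam ** 2))


# Hypothetical values: lambda = 0.5 < rho = 0.8, so k* should exceed 1.
k_star = optimal_k(1.0, 2.0, 0.8)
print(k_star, combined_variance(k_star, 1.0, 2.0, 0.8), minimum_variance(1.0, 2.0, 0.8))
```

With these inputs the second estimator receives negative weight, which matches the remark in part (a) about the case $\rho > \lambda$.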
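Thus, if $\lambda \neq \rho$, $V(\hat{\theta}^*) < \sigma_1^2$. For further discussion, see Samuel-Cahn (1994).
Solution 4.4. Since
\[ E(\hat{\beta}) = \sum_{i=1}^{n} c_i E(X_i) = \sum_{i=1}^{n} c_i(\beta a_i) = \beta\sum_{i=1}^{n} c_i a_i, \]
we require that $\sum_{i=1}^{n} c_i a_i = 1$. Now,
\[ V(\hat{\beta}) = \sum_{i=1}^{n} c_i^2 V(X_i) = \sum_{i=1}^{n} c_i^2\sigma_i^2. \]
So, we need to minimize $\sum_{i=1}^{n} c_i^2\sigma_i^2$ subject to the constraint $\sum_{i=1}^{n} c_i a_i = 1$. Although the Lagrange Multiplier Method could be used, we will do this minimization directly. So,
\[ V(\hat{\beta}) = \sum_{i=1}^{n} c_i^2\sigma_i^2 = \sum_{i=1}^{n-1} c_i^2\sigma_i^2 + (c_n a_n)^2\frac{\sigma_n^2}{a_n^2} = \sum_{i=1}^{n-1} c_i^2\sigma_i^2 + \left(1 - \sum_{i=1}^{n-1} c_i a_i\right)^2 \frac{\sigma_n^2}{a_n^2}. \]
So, for $i = 1, 2, \ldots, (n - 1)$,
\[ 0 = \frac{dV(\hat{\beta})}{dc_i} = 2c_i\sigma_i^2 + 2\left(1 - \sum_{i=1}^{n-1} c_i a_i\right)(-a_i)\frac{\sigma_n^2}{a_n^2} = 2c_i\sigma_i^2 + 2(c_n a_n)(-a_i)\frac{\sigma_n^2}{a_n^2} \]
\[ \Rightarrow 0 = c_i\sigma_i^2 - \frac{a_i c_n\sigma_n^2}{a_n} \]
\[ \Rightarrow 0 = a_i c_i - \frac{a_i^2}{\sigma_i^2}\left(\frac{c_n\sigma_n^2}{a_n}\right) \]
\[ \Rightarrow 0 = \sum_{i=1}^{n} a_i c_i - \frac{c_n\sigma_n^2}{a_n}\sum_{i=1}^{n} \frac{a_i^2}{\sigma_i^2} \]
\[ \Rightarrow 0 = 1 - \frac{c_n\sigma_n^2}{a_n}\sum_{i=1}^{n} \frac{a_i^2}{\sigma_i^2}, \]
so that
\[ c_n = \frac{(a_n/\sigma_n^2)}{\sum_{i=1}^{n} (a_i^2/\sigma_i^2)}. \]
Substituting this result into the above equations yields
\[ c_i = \frac{(a_i/\sigma_i^2)}{\sum_{l=1}^{n} (a_l^2/\sigma_l^2)}, \quad i = 1, 2, \ldots, n. \]
For this choice of the $c_i$'s,
\[ \hat{\beta} = \sum_{i=1}^{n} \left[\frac{(a_i/\sigma_i^2)}{\sum_{l=1}^{n} (a_l^2/\sigma_l^2)}\right] X_i. \]
Since $\hat{\beta}$ is a linear combination of mutually independent normal variates, $\hat{\beta} \sim N[E(\hat{\beta}), V(\hat{\beta})]$, with $E(\hat{\beta}) = \beta$ and with
\[ V(\hat{\beta}) = \sum_{i=1}^{n} \left[\frac{(a_i/\sigma_i^2)}{\sum_{l=1}^{n} (a_l^2/\sigma_l^2)}\right]^2 \sigma_i^2 = \frac{\sum_{i=1}^{n} (a_i^2/\sigma_i^2)}{\left[\sum_{i=1}^{n} (a_i^2/\sigma_i^2)\right]^2} = \left[\sum_{i=1}^{n} (a_i^2/\sigma_i^2)\right]^{-1}. \]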
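The optimal coefficients derived in Solution 4.4 can be checked on a small example. This is a minimal sketch with hypothetical inputs (`a` holds the $a_i$ values and `sigma2` the variances $\sigma_i^2$; neither list comes from the exercise): it confirms that the weights satisfy the unbiasedness constraint and that the resulting variance equals $[\sum_i (a_i^2/\sigma_i^2)]^{-1}$.

```python
def optimal_weights(a, sigma2):
    """c_i = (a_i / sigma_i^2) / sum_l (a_l^2 / sigma_l^2)."""
    denom = sum(ai ** 2 / s for ai, s in zip(a, sigma2))
    return [(ai / s) / denom for ai, s in zip(a, sigma2)]


def linear_estimator_variance(c, sigma2):
    """V(beta_hat) = sum_i c_i^2 * sigma_i^2 for mutually independent X_i."""
    return sum(ci ** 2 * s for ci, s in zip(c, sigma2))


# Hypothetical inputs chosen so that a_i^2 / sigma_i^2 = 1 for every i.
a = [1.0, 2.0, 3.0]
sigma2 = [1.0, 4.0, 9.0]
c = optimal_weights(a, sigma2)
print(c, linear_estimator_variance(c, sigma2))
```

Any other unbiased choice, for example putting all the weight on the first observation ($c = (1/a_1, 0, 0)$), gives a variance at least as large.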
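Solution 4.5
(a) Since
\[ \frac{(n_i - 1)S_i^2}{\sigma^2} \sim \chi^2_{n_i - 1} = \text{GAMMA}\left[\alpha = 2, \beta_i = \frac{(n_i - 1)}{2}\right], \]
we have
\[ E\left[\frac{(n_i - 1)S_i^2}{\sigma^2}\right] = 2 \cdot \frac{(n_i - 1)}{2}, \]
so that
\[ \frac{(n_i - 1)}{\sigma^2}E(S_i^2) = (n_i - 1), \]
and hence $E(S_i^2) = \sigma^2$, $i = 1, 2, \ldots, k$. Thus,
\[ E(\hat{\sigma}^2) = E\left[\sum_{i=1}^{k} w_i S_i^2\right] = \sum_{i=1}^{k} w_i E(S_i^2) = \sigma^2\left(\sum_{i=1}^{k} w_i\right) = \sigma^2. \]
(b) Now, since $S_1^2, S_2^2, \ldots, S_k^2$ constitute a set of $k$ mutually independent random variables, we have
\[ V(\hat{\sigma}^2) = \sum_{i=1}^{k} w_i^2 V(S_i^2). \]
And, since
\[ V\left[\frac{(n_i - 1)S_i^2}{\sigma^2}\right] = \frac{(n_i - 1)^2}{\sigma^4}V(S_i^2) = (2)^2\frac{(n_i - 1)}{2} = 2(n_i - 1), \]
it follows that $V(S_i^2) = 2\sigma^4/(n_i - 1)$, so that
\[ V(\hat{\sigma}^2) = \sum_{i=1}^{k} w_i^2 \frac{2\sigma^4}{(n_i - 1)}. \]
So,
\[ V(\hat{\sigma}^2) \propto \sum_{i=1}^{k-1} w_i^2(n_i - 1)^{-1} + \left(1 - \sum_{i=1}^{k-1} w_i\right)^2 (n_k - 1)^{-1}. \]
Thus,
\[ \frac{\partial V(\hat{\sigma}^2)}{\partial w_i} = \frac{2w_i}{(n_i - 1)} - \frac{2\left(1 - \sum_{i=1}^{k-1} w_i\right)}{(n_k - 1)} = 0, \quad i = 1, 2, \ldots, (k - 1), \]
so that
\[ \frac{w_i}{(n_i - 1)} - \frac{w_k}{(n_k - 1)} = 0, \quad i = 1, 2, \ldots, k. \]
Hence,
\[ (n_k - 1)\sum_{i=1}^{k} w_i = w_k\sum_{i=1}^{k} (n_i - 1), \]
or
\[ w_k = \frac{(n_k - 1)}{(N - k)}. \]
And, since $w_i = [(n_i - 1)/(n_k - 1)]w_k$, we have, in general,
\[ w_i = \frac{(n_i - 1)}{\sum_{i=1}^{k} (n_i - 1)} = \frac{(n_i - 1)}{(N - k)}, \quad i = 1, 2, \ldots, k. \]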
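To make the optimal weights concrete, the sketch below computes $w_i = (n_i - 1)/(N - k)$ and the resulting weighted estimator for a small made-up data set (the two samples are hypothetical and serve only as an illustration). It uses the standard library function `statistics.variance`, which returns the usual sample variance with divisor $n_i - 1$.

```python
from statistics import variance


def optimal_weights(sizes):
    """w_i = (n_i - 1) / (N - k), where N - k = sum_i (n_i - 1)."""
    total_df = sum(n - 1 for n in sizes)
    return [(n - 1) / total_df for n in sizes]


def weighted_variance_estimate(samples):
    """sigma2_hat = sum_i w_i * S_i^2 using the optimal weights above."""
    weights = optimal_weights([len(s) for s in samples])
    return sum(w * variance(s) for w, s in zip(weights, samples))


# Hypothetical samples from k = 2 groups (N = 7 observations in total).
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]]
print(optimal_weights([len(s) for s in samples]), weighted_variance_estimate(samples))
```

For these samples the within-group sums of squares are 2 and 20, so the estimate equals $22/5 = 4.4$, the total within-group sum of squares divided by $N - k$.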
Using these optimal choices for the weights w1 , w2 , . . . , wk , the estimator σˆ 2 takes the specific form σˆ 2 =
k i=1
(ni − 1)
k
i=1 (ni − 1)
k
S2i
=
i=1
ni
¯ 2 j=1 (Yij − Yi )
(N − k)
,
233
Solutions
which is recognizable as a pooled variance estimator often encountered when using analysis of variance (ANOVA) methods. Solution 4.6. The joint distribution of X1 , X2 , . . . , Xn is pX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ; π) =
n
Ckxi πxi (1 − π)k−xi
i=1
⎛
=⎝
n
⎞
n
Ckxi ⎠ π
i=1 xi (1
− π)nk−
n
i=1 xi .
i=1
Substituting θ = πk and u =
n
i=1 xi in the above expression, we have
⎛ ⎞ n pX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ; θ) = (θ1/k )u (1 − θ1/k )nk−u ⎝ Ckxi ⎠ ,
i=1
which has the form g(u; θ) · h(x1 , x2 , . . . , xn ), where h(x1 , x2 , . . . , xn ) does not (in any way) depend on θ. Hence, by the Factorization Theorem, U = ni=1 Xi is sufficient for θ. Note that U ∼ BIN(nk, π). To show that this binomial distribution represents a complete family of distributions, let g(U) denote a generic function of U, and note that E[g(U)] =
nk
u nk−u g(u)Cnk u π (1 − π)
u=0
= (1 − π)nk
nk
π u g(u)Cnk . u 1−π
u=0
Using this result and appealing to the theory of polynomials, we find that the condition E[g(U)] = 0 ∀π,
0 < π < 1,
implies that g(u) = 0, u = 0, 1, . . . , nk. Hence, U is a complete sufficient statistic for θ. Let U ∗ =
1 0
if X1 = k, otherwise.
Then, E(U ∗ ) = πk . Thus, by the Rao–Blackwell Theorem, θˆ = E(U ∗ U = u) = pr(U ∗ = 1U = u) = pr(X1 = kU = u) is the MVUE of θ. Clearly, since U = u = k, (k + 1), . . . , nk,
n
ˆ i=1 Xi , θ = 0 for u = 0, 1, . . . , (k − 1). So, for
& % θˆ = E U ∗ U = u = pr(U ∗ = 1U = u)
234
Estimation Theory
= pr(X1 = kU = u) pr[(X1 = k) ∩ (U = u)] pr(U = u) pr(X1 = k) × pr( ni=2 Xi = u − k) = pr( ni=1 Xi = u) =
=
k(n−1) u−k π (1 − π)k(n−1)−(u−k)
(πk ) × Cu−k
u nk−u Cnk u π (1 − π) k(n−1)
=
Cu−k
,
Cnk u
where the nexttolast line follows because BIN(nk, π). ⎧ ⎪ ⎨0, k(n−1) ˆ So, θ = Cu−k ⎪ , ⎩ Cnk u
n
i=2 Xi ∼ BIN[k(n − 1), π] and U ∼
u = 0, 1, 2, . . . , (k − 1), u = k, (k + 1), . . . , nk,
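where $u=\sum_{i=1}^{n}x_i$. To demonstrate that $E(\hat\theta)=\theta$, note that
$$\begin{aligned}
E(\hat\theta)&=\sum_{u=k}^{nk}\frac{C^{k(n-1)}_{u-k}}{C^{nk}_{u}}\,C^{nk}_{u}\pi^{u}(1-\pi)^{nk-u}\\
&=\sum_{u=k}^{nk}C^{k(n-1)}_{u-k}\pi^{u}(1-\pi)^{nk-u}\\
&=\sum_{z=0}^{k(n-1)}C^{k(n-1)}_{z}\pi^{z+k}(1-\pi)^{nk-(z+k)}\\
&=\pi^{k}\sum_{z=0}^{k(n-1)}C^{k(n-1)}_{z}\pi^{z}(1-\pi)^{k(n-1)-z}\\
&=\pi^{k}[\pi+(1-\pi)]^{k(n-1)}=\pi^{k}.
\end{aligned}$$

Solution 4.7
(a) Now,
$$L(\mathbf{y};\sigma^{r})=\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-y_i^{2}/2\sigma^{2}}=(2\pi)^{-n/2}\theta^{-n/r}\exp\left\{-\frac{\sum_{i=1}^{n}y_i^{2}}{2\theta^{2/r}}\right\},$$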
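so that $U=\sum_{i=1}^{n}Y_i^{2}$ is a sufficient statistic for $\theta=\sigma^{r}$. Also,
$$\frac{U}{\sigma^{2}}=\sum_{i=1}^{n}\frac{Y_i^{2}}{\sigma^{2}}=\sum_{i=1}^{n}Z_i^{2}\sim\chi^{2}_{n}=\mathrm{GAMMA}\left(\alpha=2,\beta=\frac{n}{2}\right),$$
since $Z_i\sim N(0,1)$ and the $\{Z_i\}_{i=1}^{n}$ are mutually independent. So, $E(U/\sigma^{2})=n$, so that $E(U)=n\sigma^{2}$. So, we might consider some function of $U^{r/2}$. Thus,
$$E\left[\left(\frac{U}{\sigma^{2}}\right)^{r/2}\right]=\frac{E(U^{r/2})}{\sigma^{r}}=\frac{\Gamma(n/2+r/2)}{\Gamma(n/2)}2^{r/2}=\frac{\Gamma[(n+r)/2]}{\Gamma(n/2)}2^{r/2}.$$
So,
$$\hat\theta=2^{-r/2}\frac{\Gamma(n/2)}{\Gamma[(n+r)/2]}U^{r/2}$$
is a function of a sufficient statistic (namely, $U$) that is an unbiased estimator of $\theta$. As a special case, when $r=2$,
$$\hat\theta=2^{-1}\frac{\Gamma(n/2)}{\Gamma(n/2+1)}U=2^{-1}\left(\frac{2}{n}\right)U=\frac{U}{n},$$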
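as expected.
(b) Since
$$L(\mathbf{y};\theta)\equiv L=(2\pi)^{-n/2}\theta^{-n/r}\exp\left\{-\frac{\sum_{i=1}^{n}y_i^{2}}{2\theta^{2/r}}\right\},$$
we have
$$\ln L=-\frac{n}{2}\ln(2\pi)-\frac{n}{r}\ln\theta-\frac{u}{2\theta^{2/r}},\quad\text{where } u=\sum_{i=1}^{n}y_i^{2}.$$
So,
$$\frac{\partial\ln L}{\partial\theta}=-\frac{n}{r\theta}-\left(\frac{-2}{r}\right)\frac{\theta^{-2/r-1}u}{2}=\frac{-n}{r\theta}+\frac{\theta^{-2/r-1}u}{r},$$
and
$$\frac{\partial^{2}\ln L}{\partial\theta^{2}}=\frac{n}{r\theta^{2}}+\left(\frac{-2}{r}-1\right)\frac{\theta^{-2/r-2}u}{r}=\frac{n}{r\theta^{2}}-\frac{(2+r)\theta^{-2/r-2}u}{r^{2}}.$$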
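Thus,
$$\begin{aligned}
-E\left(\frac{\partial^{2}\ln L}{\partial\theta^{2}}\right)&=\frac{-n}{r\theta^{2}}+\frac{(2+r)\theta^{-2/r-2}(n\sigma^{2})}{r^{2}}\\
&=\frac{-n}{r\theta^{2}}+\frac{(2+r)\theta^{-2/r-2}\,n\theta^{2/r}}{r^{2}}\\
&=\frac{-n}{r\theta^{2}}+\frac{n(2+r)}{r^{2}\theta^{2}}=\frac{2n}{r^{2}\theta^{2}}.
\end{aligned}$$
So, the CRLB is
$$\mathrm{CRLB}=\frac{r^{2}\theta^{2}}{2n}=\frac{r^{2}\sigma^{2r}}{2n}.$$
When $r=2$, we obtain
$$\mathrm{CRLB}=\frac{4\sigma^{4}}{2n}=\frac{2\sigma^{4}}{n},$$
which is achieved by $\hat\theta$ since
$$V(\hat\theta)=V\left(\frac{U}{n}\right)=V\left(\frac{\sigma^{2}}{n}\cdot\frac{U}{\sigma^{2}}\right)=\frac{\sigma^{4}}{n^{2}}V(\chi^{2}_{n})=\frac{\sigma^{4}}{n^{2}}(2n)=\frac{2\sigma^{4}}{n}.$$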
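The estimator and bound of Solution 4.7 can be evaluated numerically. The following is a minimal sketch, assuming the data are supplied as a plain list of observations from a zero-mean normal distribution; the function names `unbiased_sigma_power` and `crlb_sigma_power` are illustrative and not from the text.

```python
import math

def unbiased_sigma_power(y, r):
    """Unbiased estimator of sigma**r for N(0, sigma^2) data:
    2**(-r/2) * Gamma(n/2) / Gamma((n + r)/2) * U**(r/2), with U = sum(y_i**2)."""
    n = len(y)
    u = sum(v * v for v in y)
    # Use log-gamma so the gamma ratio stays finite for large n.
    log_const = -(r / 2) * math.log(2) + math.lgamma(n / 2) - math.lgamma((n + r) / 2)
    return math.exp(log_const) * u ** (r / 2)

def crlb_sigma_power(sigma, r, n):
    """Cramer-Rao lower bound r^2 * sigma^(2r) / (2n) for unbiased estimators of sigma**r."""
    return r ** 2 * sigma ** (2 * r) / (2 * n)
```

For $r=2$ the first function reduces to $U/n$, and the second to $2\sigma^{4}/n$, which matches the special case worked out above.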
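Solution 4.8
(a) First,
$$L(\mathbf{y};\theta)=\prod_{i=1}^{n}\left[\theta^{-1/2}e^{-y_i/\sqrt{\theta}}\right]=\theta^{-n/2}e^{-s/\sqrt{\theta}},$$
where $s=\sum_{i=1}^{n}y_i$. Solving for $\theta$ in the equation
$$\frac{\partial\ln L(\mathbf{y};\theta)}{\partial\theta}=\frac{-n}{2\theta}+\frac{s}{2\theta^{3/2}}=0$$
yields the MLE $\hat\theta=\bar{Y}^{2}$. Now,
$$\frac{\partial^{2}\ln L(\mathbf{y};\theta)}{\partial\theta^{2}}=\frac{n}{2\theta^{2}}-\frac{3s}{4\theta^{5/2}},$$
so that
$$-E_{\mathbf{y}}\left[\frac{\partial^{2}\ln L(\mathbf{y};\theta)}{\partial\theta^{2}}\right]=\frac{-n}{2\theta^{2}}+\frac{3(n\theta^{1/2})}{4\theta^{5/2}}=\frac{-n}{2\theta^{2}}+\frac{3n}{4\theta^{2}}=\frac{n}{4\theta^{2}}.$$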
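So, an appropriate large-sample 95% CI for $\theta$ is
$$\bar{Y}^{2}\pm1.96\sqrt{\frac{4\hat\theta^{2}}{n}},$$
or, equivalently,
$$\bar{Y}^{2}\pm\frac{3.92\,\bar{Y}^{2}}{\sqrt{n}}.$$
When $s=40$ and $n=50$, this CI is
$$\left(\frac{40}{50}\right)^{2}\pm\frac{(3.92)\left(\frac{40}{50}\right)^{2}}{\sqrt{50}},$$
or (0.285, 0.995).
(b) Clearly, $S=\sum_{i=1}^{n}Y_i$ is a sufficient statistic for $\theta=\alpha^{2}$. And, $E(S)=n\sqrt{\theta}$ and $V(S)=n\theta$. Since $E(S^{2})=n\theta+n^{2}\theta=n(n+1)\theta$, it follows that
$$\hat\theta^{*}=\frac{S^{2}}{n(n+1)}$$
is the MVUE of $\theta$ (because $S$ is a complete sufficient statistic for $\theta$ from exponential family theory). Now,
$$V(\hat\theta^{*})=\frac{V(S^{2})}{n^{2}(n+1)^{2}}=\frac{E(S^{4})-[E(S^{2})]^{2}}{n^{2}(n+1)^{2}}.$$
Since $S\sim\mathrm{GAMMA}(\alpha=\sqrt{\theta},\beta=n)$, it follows that
$$E(S^{r})=\frac{\Gamma(n+r)}{\Gamma(n)}\alpha^{r}=\frac{\Gamma(n+r)}{\Gamma(n)}\theta^{r/2},\quad r\ge0.$$
So,
$$E(S^{4})=\frac{\Gamma(n+4)}{\Gamma(n)}\theta^{2}=n(n+1)(n+2)(n+3)\theta^{2}.$$
So,
$$\begin{aligned}
V(\hat\theta^{*})&=\frac{n(n+1)(n+2)(n+3)\theta^{2}-n^{2}(n+1)^{2}\theta^{2}}{n^{2}(n+1)^{2}}\\
&=\frac{\theta^{2}[n^{2}+5n+6-n^{2}-n]}{n(n+1)}\\
&=\frac{\theta^{2}(4n+6)}{n(n+1)}\\
&=\frac{2(2n+3)}{n(n+1)}\theta^{2}.
\end{aligned}$$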
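(c) From part (a),
$$-E_{\mathbf{y}}\left[\frac{\partial^{2}\ln L(\mathbf{y};\theta)}{\partial\theta^{2}}\right]=\frac{n}{4\theta^{2}},$$
so that the CRLB is
$$\mathrm{CRLB}=\frac{4\theta^{2}}{n}.$$
However,
$$V(\hat\theta^{*})=\frac{2(2n+3)\theta^{2}}{n(n+1)}=\left(\frac{2n+3}{2n+2}\right)\frac{4\theta^{2}}{n}>\frac{4\theta^{2}}{n},$$
so $\hat\theta^{*}$ does not achieve the CRLB.
(d) In general,
$$\mathrm{MSE}(\hat\theta^{*},\theta)=V(\hat\theta^{*})+[E(\hat\theta^{*})-\theta]^{2}=V(\hat\theta^{*})+0=\frac{2(2n+3)}{n(n+1)}\theta^{2}.$$
Since $\hat\theta=\bar{Y}^{2}$,
$$E(\bar{Y}^{2})=V(\bar{Y})+[E(\bar{Y})]^{2}=\frac{\theta}{n}+\theta.$$
And,
$$\begin{aligned}
V(\bar{Y}^{2})=\frac{V(S^{2})}{n^{4}}&=\frac{n(n+1)(n+2)(n+3)\theta^{2}-n^{2}(n+1)^{2}\theta^{2}}{n^{4}}\\
&=\frac{(n+1)\theta^{2}}{n^{3}}[n^{2}+5n+6-n^{2}-n]\\
&=\frac{2(n+1)(2n+3)\theta^{2}}{n^{3}}.
\end{aligned}$$
So,
$$\begin{aligned}
\mathrm{MSE}(\hat\theta,\theta)&=\frac{2(n+1)(2n+3)\theta^{2}}{n^{3}}+\frac{\theta^{2}}{n^{2}}\\
&=\frac{\theta^{2}}{n^{3}}[2(n+1)(2n+3)+n]\\
&=\frac{(4n^{2}+11n+6)}{n^{3}}\theta^{2}.
\end{aligned}$$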
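Since
$$\hat\theta^{*}=\frac{S^{2}}{n(n+1)}=\left(\frac{n}{n+1}\right)\hat\theta,$$
we have
$$\mathrm{MSE}(\hat\theta^{*},\theta)=V(\hat\theta^{*})=\left(\frac{n}{n+1}\right)^{2}V(\hat\theta)<V(\hat\theta)<\mathrm{MSE}(\hat\theta,\theta),$$
for all finite $n$, so that $\hat\theta^{*}$ is preferable to $\hat\theta$ for finite $n$. However,
$$\lim_{n\to\infty}\frac{\mathrm{MSE}(\hat\theta^{*},\theta)}{\mathrm{MSE}(\hat\theta,\theta)}=1,$$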
so that there is no difference asymptotically.

Solution 4.9
(a) Let $C$ be the event that a subject is classified as having been recently exposed to benzene, and let $E$ be the event that a subject has truly been recently exposed to benzene. Then, $\mathrm{pr}(C)=\mathrm{pr}(C\mid E)\mathrm{pr}(E)+\mathrm{pr}(C\mid\bar{E})\mathrm{pr}(\bar{E})$, so that $\mathrm{pr}(C)=\gamma\pi+\delta(1-\pi)$. Since $X$ has a binomial distribution with mean $E(X)=n[\mathrm{pr}(C)]$, equating $X$ to $E(X)$ via the method of moments gives
$$\hat\pi=\frac{\frac{X}{n}-\delta}{\gamma-\delta}$$
as the unbiased estimator of $\pi$. Since $V(X)=n[\mathrm{pr}(C)][1-\mathrm{pr}(C)]$, the variance of the estimator $\hat\pi$ is
$$V(\hat\pi)=\frac{V(X/n)}{(\gamma-\delta)^{2}}=\frac{[\gamma\pi+\delta(1-\pi)][1-\gamma\pi-\delta(1-\pi)]}{n(\gamma-\delta)^{2}}.$$
(b) Since $n$ is large, the standardized random variable $(\hat\pi-\pi)/\sqrt{\hat{V}(\hat\pi)}\mathrel{\dot\sim}N(0,1)$ by Slutsky's Theorem, where
$$\hat{V}(\hat\pi)=\frac{[\gamma\hat\pi+\delta(1-\hat\pi)][1-\gamma\hat\pi-\delta(1-\hat\pi)]}{n(\gamma-\delta)^{2}}.$$
Thus, an appropriate large-sample 95% CI for $\pi$ is
$$\hat\pi\pm1.96\sqrt{\hat{V}(\hat\pi)}.$$
When $n=50$, $\delta=0.05$, $\gamma=0.90$, and $x=20$, the computed 95% interval for $\pi$ is $0.412\pm1.96(0.0815)=(0.252,0.572)$.
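The interval in part (b) is easy to reproduce by machine. The sketch below assumes the sensitivity $\gamma$ and false-positive rate $\delta$ are known constants, as in the solution; the function name `corrected_prevalence_ci` is a hypothetical label, not something defined in the text.

```python
import math

def corrected_prevalence_ci(x, n, gamma, delta, z=1.96):
    """Misclassification-corrected estimate of pi and its large-sample CI.

    x: number classified as exposed, n: sample size,
    gamma = pr(C | E), delta = pr(C | not E).
    """
    p_c = x / n  # estimates pr(C) = gamma*pi + delta*(1 - pi)
    pi_hat = (p_c - delta) / (gamma - delta)
    # gamma*pi_hat + delta*(1 - pi_hat) equals x/n exactly, so the plug-in
    # variance reduces to p_c*(1 - p_c) / [n*(gamma - delta)^2].
    var_hat = p_c * (1 - p_c) / (n * (gamma - delta) ** 2)
    half_width = z * math.sqrt(var_hat)
    return pi_hat, pi_hat - half_width, pi_hat + half_width
```

Calling `corrected_prevalence_ci(20, 50, 0.90, 0.05)` uses the data quoted in part (b) and should agree with the interval reported there up to rounding.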
Solution 4.10
(a) Since $(Y_{11},Y_{00},n-Y_{11}-Y_{00})\sim\mathrm{MULT}\{n;(\pi^{2}+\theta),[(1-\pi)^{2}+\theta],2[\pi(1-\pi)-\theta]\}$, it follows directly that
$$(\hat\pi^{2}+\hat\theta)=\frac{Y_{11}}{n}\quad\text{and}\quad[(1-\hat\pi)^{2}+\hat\theta]=\frac{Y_{00}}{n}.$$
Solving these two equations simultaneously gives the desired expressions for $\hat\pi$ and $\hat\theta$.
(b) Appealing to properties of the multinomial distribution, we have
$$\begin{aligned}
E(\hat\pi)&=\frac{1}{2}+\frac{[E(Y_{11})-E(Y_{00})]}{2n}\\
&=\frac{1}{2}+\frac{n(\pi^{2}+\theta)-n[(1-\pi)^{2}+\theta]}{2n}\\
&=\frac{1}{2}+\frac{1}{2}(\pi^{2}-1+2\pi-\pi^{2})=\pi,
\end{aligned}$$
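so that $\hat\pi$ is an unbiased estimator of the parameter $\pi$. And, with $\beta_{11}=(\pi^{2}+\theta)$ and $\beta_{00}=[(1-\pi)^{2}+\theta]$, and since $\mathrm{cov}(Y_{11},Y_{00})=-n\beta_{11}\beta_{00}$ for the multinomial distribution, it follows that
$$\begin{aligned}
V(\hat\pi)&=(4n^{2})^{-1}[V(Y_{11})+V(Y_{00})-2\,\mathrm{cov}(Y_{11},Y_{00})]\\
&=(4n^{2})^{-1}[n\beta_{11}(1-\beta_{11})+n\beta_{00}(1-\beta_{00})+2n\beta_{11}\beta_{00}]\\
&=(4n)^{-1}[\beta_{11}(1-\beta_{11})+\beta_{00}(1-\beta_{00})+2\beta_{11}\beta_{00}].
\end{aligned}$$
(c) Since $\hat\beta_{11}=Y_{11}/n$ and $\hat\beta_{00}=Y_{00}/n$, it follows that the estimator $\hat{V}(\hat\pi)$ of $V(\hat\pi)$ is equal to
$$\hat{V}(\hat\pi)=(4n)^{-1}\left[\frac{Y_{11}}{n}\left(1-\frac{Y_{11}}{n}\right)+\frac{Y_{00}}{n}\left(1-\frac{Y_{00}}{n}\right)+2\left(\frac{Y_{11}}{n}\right)\left(\frac{Y_{00}}{n}\right)\right].$$
When $n=30$, $y_{11}=3$, and $y_{00}=15$, then the estimated value $\hat\pi$ of $\pi$ is equal to
$$\hat\pi=\frac{1}{2}+\frac{(3-15)}{2(30)}=0.50-0.20=0.30.$$
And, the estimated variance of $\hat\pi$ is equal to
$$\hat{V}(\hat\pi)=[4(30)]^{-1}\left[\left(\frac{3}{30}\right)\left(\frac{27}{30}\right)+\left(\frac{15}{30}\right)\left(\frac{15}{30}\right)+2\left(\frac{3}{30}\right)\left(\frac{15}{30}\right)\right]=0.0037.$$
Thus, the computed 95% CI for $\pi$ is equal to
$$\hat\pi\pm1.96\sqrt{\hat{V}(\hat\pi)}=0.30\pm1.96\sqrt{0.0037}=0.30\pm0.1192,$$
or (0.1808, 0.4192).

Solution 4.11
(a) With $Y=g(\mathbf{X})$, where $\mathbf{X}=(X_1,X_2,\ldots,X_k)$ and $\boldsymbol\mu=(\mu_1,\mu_2,\ldots,\mu_k)$, and with $E(X_i)=\mu_i$, $V(X_i)=\sigma_i^{2}$, and $\mathrm{cov}(X_i,X_j)=\sigma_{ij}$ for all $i\ne j$, $i=1,2,\ldots,k$ and $j=1,2,\ldots,k$, then the delta method gives $E(Y)\approx g(\boldsymbol\mu)$ and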
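$$V(Y)\approx\sum_{i=1}^{k}\left[\frac{\partial g(\boldsymbol\mu)}{\partial X_i}\right]^{2}\sigma_i^{2}+2\sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\left[\frac{\partial g(\boldsymbol\mu)}{\partial X_i}\right]\left[\frac{\partial g(\boldsymbol\mu)}{\partial X_j}\right]\sigma_{ij},$$
where
$$\frac{\partial g(\boldsymbol\mu)}{\partial X_i}=\left.\frac{\partial g(\mathbf{X})}{\partial X_i}\right|_{\mathbf{X}=\boldsymbol\mu}.$$
In our particular situation, $k=2$; and, with $X_1\equiv Y_{10}$ and $X_2\equiv Y_{01}$, then $Y=g(X_1,X_2)=\ln(X_1/X_2)=\ln X_1-\ln X_2=\ln\widehat{OR}$. So,
$$\frac{\partial g(X_1,X_2)}{\partial X_1}=\frac{1}{X_1}\quad\text{and}\quad\frac{\partial g(X_1,X_2)}{\partial X_2}=-\frac{1}{X_2}.$$
Now, $E(X_1)=n\pi_{10}$, $E(X_2)=n\pi_{01}$, $V(X_1)=n\pi_{10}(1-\pi_{10})$, and $V(X_2)=n\pi_{01}(1-\pi_{01})$. Also, $\mathrm{cov}(X_1,X_2)=-n\pi_{10}\pi_{01}$. Finally,
$$\begin{aligned}
V(\ln\widehat{OR})&\approx\left(\frac{1}{n\pi_{10}}\right)^{2}n\pi_{10}(1-\pi_{10})+\left(\frac{-1}{n\pi_{01}}\right)^{2}n\pi_{01}(1-\pi_{01})+2\left(\frac{1}{n\pi_{10}}\right)\left(\frac{-1}{n\pi_{01}}\right)(-n\pi_{10}\pi_{01})\\
&=\frac{(1-\pi_{10})}{n\pi_{10}}+\frac{(1-\pi_{01})}{n\pi_{01}}+\frac{2}{n}\\
&=\frac{1}{n\pi_{10}}-\frac{1}{n}+\frac{1}{n\pi_{01}}-\frac{1}{n}+\frac{2}{n}\\
&=\frac{1}{n\pi_{10}}+\frac{1}{n\pi_{01}}.
\end{aligned}$$
Since $E(Y_{10})=n\pi_{10}$ and $E(Y_{01})=n\pi_{01}$, we have
$$\hat{V}(\ln\widehat{OR})\approx\frac{1}{Y_{10}}+\frac{1}{Y_{01}}.$$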
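For the given set of data, the estimate of the variance of $\ln\widehat{OR}$ is
$$\hat{V}(\ln\widehat{OR})\approx\frac{1}{25}+\frac{1}{15}=0.107.$$
(b) Assume $Z\sim N(0,1)$. Then,
$$\begin{aligned}
0.95&=\mathrm{pr}\{-1.96<Z<+1.96\}\\
&\approx\mathrm{pr}\left\{-1.96<\frac{\ln\widehat{OR}-\ln OR}{\sqrt{\hat{V}(\ln\widehat{OR})}}<1.96\right\}\\
&=\mathrm{pr}\left\{\ln\widehat{OR}-1.96\sqrt{\hat{V}(\ln\widehat{OR})}<\ln OR<\ln\widehat{OR}+1.96\sqrt{\hat{V}(\ln\widehat{OR})}\right\}\\
&=\mathrm{pr}\left\{(\widehat{OR})e^{-1.96\sqrt{\hat{V}(\ln\widehat{OR})}}<OR<(\widehat{OR})e^{+1.96\sqrt{\hat{V}(\ln\widehat{OR})}}\right\}.
\end{aligned}$$
So, the 95% CI for $OR$ is
$$\left[(\widehat{OR})e^{-1.96\sqrt{\hat{V}(\ln\widehat{OR})}},\ (\widehat{OR})e^{+1.96\sqrt{\hat{V}(\ln\widehat{OR})}}\right].$$
For the data in part (a), we obtain
$$\left[\left(\frac{25}{15}\right)e^{-1.96\sqrt{0.107}},\ \left(\frac{25}{15}\right)e^{+1.96\sqrt{0.107}}\right]=(0.878,3.164).$$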
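The delta-method interval above can be packaged as a small function. This is a sketch under the assumptions of Solution 4.11 (the counts $Y_{10}$ and $Y_{01}$ are both positive); the name `odds_ratio_ci` is illustrative only.

```python
import math

def odds_ratio_ci(y10, y01, z=1.96):
    """Large-sample CI for the odds ratio based on counts y10 and y01.

    Uses the delta-method variance estimate 1/y10 + 1/y01 for ln(OR-hat),
    builds the interval on the log scale, and exponentiates the end points.
    """
    or_hat = y10 / y01
    var_log_or = 1 / y10 + 1 / y01
    half_width = z * math.sqrt(var_log_or)
    return or_hat, or_hat * math.exp(-half_width), or_hat * math.exp(half_width)
```

With `odds_ratio_ci(25, 15)` the end points differ from the interval in the text only in the third decimal place, because the text rounds the variance estimate to 0.107 before taking the square root.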
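Solution 4.12
(a) The appropriate likelihood function $L$ is
$$L=\prod_{i=0}^{1}\prod_{j=1}^{n_i}\left[\frac{(L_{ij}\lambda_i)^{y_{ij}}e^{-L_{ij}\lambda_i}}{y_{ij}!}\right],$$
so that $\ln L$ can be written as
$$\ln L=\sum_{i=0}^{1}\left[\sum_{j=1}^{n_i}y_{ij}\ln L_{ij}+\left(\sum_{j=1}^{n_i}y_{ij}\right)\ln\lambda_i-\lambda_i\sum_{j=1}^{n_i}L_{ij}-\sum_{j=1}^{n_i}\ln(y_{ij}!)\right].$$
So, for $i=0,1$,
$$\frac{\partial\ln L}{\partial\lambda_i}=\frac{\sum_{j=1}^{n_i}y_{ij}}{\lambda_i}-\sum_{j=1}^{n_i}L_{ij}=0$$
gives
$$\hat\lambda_i=\frac{\sum_{j=1}^{n_i}Y_{ij}}{\sum_{j=1}^{n_i}L_{ij}}$$
as the MLE of $\lambda_i$. Also, for $i=0,1$,
$$\frac{\partial^{2}\ln L}{\partial\lambda_i^{2}}=\frac{-\sum_{j=1}^{n_i}y_{ij}}{\lambda_i^{2}},$$
so that, with $E(Y_{ij})=L_{ij}\lambda_i$ and $\partial^{2}\ln L/\partial\lambda_0\partial\lambda_1=0$, we have
$$V(\hat\lambda_i)=\left[-E\left(\frac{\partial^{2}\ln L}{\partial\lambda_i^{2}}\right)\right]^{-1}=\frac{\lambda_i}{\sum_{j=1}^{n_i}L_{ij}}.$$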
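Now, by the invariance principle, the MLE $\ln\hat\psi$ of $\ln\psi$ is
$$\ln\hat\psi=\ln\hat\lambda_1-\ln\hat\lambda_0.$$
And, using the delta method, we have
$$\begin{aligned}
V(\ln\hat\psi)&=V(\ln\hat\lambda_1)+V(\ln\hat\lambda_0)\\
&\approx\left(\frac{1}{\lambda_1}\right)^{2}V(\hat\lambda_1)+\left(\frac{1}{\lambda_0}\right)^{2}V(\hat\lambda_0)\\
&=\frac{1}{\lambda_1\sum_{j=1}^{n_1}L_{1j}}+\frac{1}{\lambda_0\sum_{j=1}^{n_0}L_{0j}}.
\end{aligned}$$
Hence, from ML theory, the random variable
$$\frac{\ln\hat\psi-\ln\psi}{\sqrt{\hat{V}(\ln\hat\psi)}}=\frac{\ln\hat\psi-\ln\psi}{\left[\dfrac{1}{\sum_{j=1}^{n_1}y_{1j}}+\dfrac{1}{\sum_{j=1}^{n_0}y_{0j}}\right]^{1/2}}\mathrel{\dot\sim}N(0,1)\ \text{for large samples},$$
so that an ML-based large-sample $100(1-\alpha)\%$ CI for $\ln\psi$ is
$$\ln\hat\psi\pm Z_{1-\alpha/2}\sqrt{\hat{V}(\ln\hat\psi)}=(\ln\hat\lambda_1-\ln\hat\lambda_0)\pm Z_{1-\alpha/2}\left(\frac{1}{\sum_{j=1}^{n_1}y_{1j}}+\frac{1}{\sum_{j=1}^{n_0}y_{0j}}\right)^{1/2},$$
where $\mathrm{pr}(Z>Z_{1-\alpha/2})=\alpha/2$ when $Z\sim N(0,1)$.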
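A short numerical sketch of this interval follows. It assumes the data have already been reduced to the total event count and total person-time in each group (the sums over $j$ of $y_{ij}$ and $L_{ij}$); the function name `rate_ratio_ci` and its argument names are hypothetical.

```python
import math

def rate_ratio_ci(events1, time1, events0, time0, z=1.96):
    """ML-based large-sample CI for the rate ratio psi = lambda1 / lambda0.

    events_i: total number of events in group i (sum of the y_ij);
    time_i: total person-time in group i (sum of the L_ij).
    """
    lam1 = events1 / time1  # MLE of lambda_1
    lam0 = events0 / time0  # MLE of lambda_0
    log_psi = math.log(lam1) - math.log(lam0)
    # Estimated variance of ln(psi-hat): 1/sum(y_1j) + 1/sum(y_0j)
    half_width = z * math.sqrt(1 / events1 + 1 / events0)
    ci_log = (log_psi - half_width, log_psi + half_width)
    ci_psi = (math.exp(ci_log[0]), math.exp(ci_log[1]))
    return math.exp(log_psi), ci_log, ci_psi
```

The interval for $\psi$ is obtained by exponentiating the end points of the interval for $\ln\psi$, so it is not symmetric about $\hat\psi$.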
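(b) Based on the CI for $\ln\psi$ developed in part (a), an appropriate ML-based large-sample $100(1-\alpha)\%$ CI for the rate ratio $\psi$ is
$$(\hat\psi)\exp\left[\pm Z_{1-\alpha/2}\left(\frac{1}{\sum_{j=1}^{n_1}y_{1j}}+\frac{1}{\sum_{j=1}^{n_0}y_{0j}}\right)^{1/2}\right].$$
For the given data, the computed 95% CI for $\psi$ is
$$\left(\frac{40/350}{35/400}\right)\exp\left[\pm1.96\left(\frac{1}{40}+\frac{1}{35}\right)^{1/2}\right]=(1.306)e^{\pm0.454},$$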
or (0.829, 2.056). Since the number 1 is contained in this 95% CI, these data provide no evidence in favor of the proposed theory. Of course, there could be several reasons why there were no significant findings. In particular, important individual-specific risk factors for skin cancer and related skin conditions were not considered, some of these important risk factors being skin color (i.e., having fair skin), having a family history of skin cancer, having had a previous skin cancer, being older, being male, and so on.

Solution 4.13
(a) The likelihood function $L(t_1,t_2,\ldots,t_n)\equiv L$ is
$$L=\prod_{i=1}^{n}f_T(t_i;\theta)=\prod_{i=1}^{n}\theta e^{-\theta t_i}=\theta^{n}e^{-\theta\sum_{i=1}^{n}t_i}.$$
So,
$$\ln L=n\ln\theta-\theta\sum_{i=1}^{n}t_i,\qquad\frac{\partial\ln L}{\partial\theta}=\frac{n}{\theta}-\sum_{i=1}^{n}t_i,$$
and
$$\frac{\partial^{2}\ln L}{\partial\theta^{2}}=\frac{-n}{\theta^{2}}.$$
Thus, the large-sample variance of $\hat\theta$ is
$$V(\hat\theta)=\left[-E\left(\frac{\partial^{2}\ln L}{\partial\theta^{2}}\right)\right]^{-1}=\frac{\theta^{2}}{n}.$$
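(b) Now,
$$\mathrm{pr}(T_i>t^{*})=\int_{t^{*}}^{\infty}\theta e^{-\theta t_i}\,dt_i=\left[-e^{-\theta t_i}\right]_{t^{*}}^{\infty}=e^{-\theta t^{*}}.$$
So, the likelihood function $L^{*}(y_1,y_2,\ldots,y_n)\equiv L^{*}$ is
$$L^{*}=\prod_{i=1}^{n}\left(e^{-\theta t^{*}}\right)^{y_i}\left(1-e^{-\theta t^{*}}\right)^{1-y_i}=\left(e^{-\theta t^{*}}\right)^{\sum_{i=1}^{n}y_i}\left(1-e^{-\theta t^{*}}\right)^{n-\sum_{i=1}^{n}y_i}.$$
So,
$$\ln L^{*}=-\theta t^{*}n\bar{y}+n(1-\bar{y})\ln\left(1-e^{-\theta t^{*}}\right),$$
and
$$\frac{\partial\ln L^{*}}{\partial\theta}=-t^{*}n\bar{y}+n(1-\bar{y})\frac{t^{*}e^{-\theta t^{*}}}{\left(1-e^{-\theta t^{*}}\right)}.$$
So,
$$\begin{aligned}
\frac{\partial\ln L^{*}}{\partial\theta}=0&\Rightarrow n(1-\bar{y})t^{*}e^{-\theta t^{*}}=nt^{*}\bar{y}\left(1-e^{-\theta t^{*}}\right)\\
&\Rightarrow(1-\bar{y})e^{-\theta t^{*}}=\bar{y}\left(1-e^{-\theta t^{*}}\right)\\
&\Rightarrow e^{-\theta t^{*}}=\bar{y}\\
&\Rightarrow\hat\theta^{*}=\frac{-\ln\bar{y}}{t^{*}}=\frac{1}{t^{*}}\ln\left(\frac{1}{\bar{y}}\right).
\end{aligned}$$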
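(c) Now,
$$\frac{\partial^{2}\ln L^{*}}{\partial\theta^{2}}=nt^{*}(1-\bar{y})\left[\frac{-t^{*}e^{-\theta t^{*}}\left(1-e^{-\theta t^{*}}\right)-e^{-\theta t^{*}}\left(t^{*}e^{-\theta t^{*}}\right)}{\left(1-e^{-\theta t^{*}}\right)^{2}}\right]=\frac{-nt^{*}(1-\bar{y})}{\left(1-e^{-\theta t^{*}}\right)^{2}}\left(t^{*}e^{-\theta t^{*}}\right),$$
so that
$$-E\left(\frac{\partial^{2}\ln L^{*}}{\partial\theta^{2}}\right)=\frac{n(t^{*})^{2}e^{-\theta t^{*}}E(1-\bar{Y})}{\left(1-e^{-\theta t^{*}}\right)^{2}}=\frac{n(t^{*})^{2}e^{-\theta t^{*}}\left(1-e^{-\theta t^{*}}\right)}{\left(1-e^{-\theta t^{*}}\right)^{2}}=\frac{n(t^{*})^{2}}{\left(e^{\theta t^{*}}-1\right)}.$$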
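So, the large-sample variance of $\hat\theta^{*}$ is $\left(e^{\theta t^{*}}-1\right)/n(t^{*})^{2}$. Hence, with $t^{*}\ge E(T)=\theta^{-1}$, we have
$$\frac{V(\hat\theta)}{V(\hat\theta^{*})}=\frac{\theta^{2}/n}{\left(e^{\theta t^{*}}-1\right)/n(t^{*})^{2}}=\frac{\theta^{2}(t^{*})^{2}}{\left(e^{\theta t^{*}}-1\right)}<1,$$
so that $\hat\theta$ is preferred based solely on large-sample variance considerations. This finding reflects the fact that we have lost information by categorizing $\{T_1,T_2,\ldots,T_n\}$ into dichotomous data $\{Y_1,Y_2,\ldots,Y_n\}$. However, if the remission times are measured with error, then $\hat\theta^{*}$ would be preferred to $\hat\theta$ on validity grounds; in other words, if the remission times are measured with error, then $\hat\theta$ would be an asymptotically biased estimator of the unknown parameter $\theta$.

Solution 4.14
(a) The parameter of interest is $\theta=\mathrm{pr}(Y=0)=e^{-\lambda}$, so that $\lambda=-\ln\theta$. Now, with $\mathbf{y}=(y_1,y_2,\ldots,y_n)$,
$$L(\mathbf{y};\theta)=\prod_{i=1}^{n}\frac{(-\ln\theta)^{y_i}\theta}{y_i!}=\frac{\theta^{n}(-\ln\theta)^{s}}{\prod_{i=1}^{n}y_i!},$$
where $s=\sum_{i=1}^{n}y_i$. So,
$$\ln L(\mathbf{y};\theta)=n\ln\theta+s\ln(-\ln\theta)-\sum_{i=1}^{n}\ln(y_i!);$$
$$\frac{\partial\ln L(\mathbf{y};\theta)}{\partial\theta}=\frac{n}{\theta}+\frac{s}{\theta\ln\theta};$$
$$\frac{\partial^{2}\ln L(\mathbf{y};\theta)}{\partial\theta^{2}}=\frac{-n}{\theta^{2}}-\frac{s(\ln\theta+1)}{(\theta\ln\theta)^{2}}.$$
So, since $S\sim\mathrm{POI}(n\lambda)$,
$$\begin{aligned}
-E\left[\frac{\partial^{2}\ln L(\mathbf{y};\theta)}{\partial\theta^{2}}\right]&=\frac{n}{\theta^{2}}+\frac{(-n\ln\theta)(\ln\theta+1)}{(\theta\ln\theta)^{2}}\\
&=\frac{n}{\theta^{2}}-\frac{n}{\theta^{2}}-\frac{n}{\theta^{2}\ln\theta}\\
&=\frac{-n}{\theta^{2}\ln\theta}.
\end{aligned}$$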
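So, the CRLB is
$$\mathrm{CRLB}=\frac{\theta^{2}(-\ln\theta)}{n}=\frac{e^{-2\lambda}\lambda}{n}=\frac{\lambda}{ne^{2\lambda}}.$$
Consider the estimator
$$\hat\theta=\left(\frac{n-1}{n}\right)^{S}.$$
Since $\hat\theta$ is an unbiased estimator of $\theta$ and is a function of a complete sufficient statistic for $\theta$, it is the MVUE of $\theta$. Since
$$V(\hat\theta)=\frac{\left(e^{\lambda/n}-1\right)}{e^{2\lambda}}>\frac{\lambda}{ne^{2\lambda}},$$
there is no unbiased estimator that attains the CRLB for all finite values of $n$.
(b) Note that $\mathrm{pr}(X_i=0)=\mathrm{pr}(Y_i=0)=e^{-\lambda}$, and that $\mathrm{pr}(X_i=1)=\mathrm{pr}(Y_i\ge1)=1-\mathrm{pr}(Y_i=0)=1-e^{-\lambda}$. So, with $\mathbf{x}=(x_1,x_2,\ldots,x_n)$, $p_{X_i}(x_i;\lambda)=(1-e^{-\lambda})^{x_i}(e^{-\lambda})^{1-x_i}$, $x_i=0,1$. Thus,
$$L(\mathbf{x};\lambda)=\prod_{i=1}^{n}(1-e^{-\lambda})^{x_i}e^{-\lambda(1-x_i)}=(1-e^{-\lambda})^{n\bar{x}}e^{-n\lambda(1-\bar{x})},$$
where $\bar{x}=n^{-1}\sum_{i=1}^{n}x_i$. So, $\ln L(\mathbf{x};\lambda)=n\bar{x}\ln(1-e^{-\lambda})-n\lambda(1-\bar{x})$. The equation
$$\begin{aligned}
\frac{\partial\ln L(\mathbf{x};\lambda)}{\partial\lambda}=\frac{n\bar{x}e^{-\lambda}}{(1-e^{-\lambda})}-n(1-\bar{x})=0&\Rightarrow n\bar{x}e^{-\lambda}-n(1-\bar{x})(1-e^{-\lambda})=0\\
&\Rightarrow\bar{x}e^{-\lambda}-1+e^{-\lambda}+\bar{x}-\bar{x}e^{-\lambda}=0
\end{aligned}$$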
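$$\Rightarrow e^{-\lambda}=(1-\bar{x})\Rightarrow-\lambda=\ln(1-\bar{x})\Rightarrow\hat\lambda^{*}=-\ln(1-\bar{x}).$$
This result also follows because $\bar{X}$ is the MLE of $\mathrm{pr}(X_i=1)=(1-e^{-\lambda})$. And,
$$\frac{\partial^{2}\ln L(\mathbf{x};\lambda)}{\partial\lambda^{2}}=n\bar{x}\left[\frac{-e^{-\lambda}(1-e^{-\lambda})-e^{-\lambda}e^{-\lambda}}{(1-e^{-\lambda})^{2}}\right]=\frac{-n\bar{x}e^{-\lambda}}{(1-e^{-\lambda})^{2}}.$$
So,
$$-E\left[\frac{\partial^{2}\ln L(\mathbf{x};\lambda)}{\partial\lambda^{2}}\right]=\frac{ne^{-\lambda}E(\bar{X})}{(1-e^{-\lambda})^{2}}=\frac{ne^{-\lambda}(1-e^{-\lambda})}{(1-e^{-\lambda})^{2}}=\frac{ne^{-\lambda}}{(1-e^{-\lambda})}.$$
Thus, for large $n$,
$$V(\hat\lambda^{*})=\frac{(1-e^{-\lambda})}{ne^{-\lambda}}=\frac{(e^{\lambda}-1)}{n}.$$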
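As an illustration of part (b), the sketch below computes $\hat\lambda^{*}$ and a plug-in version of its large-sample variance from a list of 0/1 indicators. The function name `lambda_from_indicators` is made up for this example, and replacing $\lambda$ by $\hat\lambda^{*}$ in the variance formula is an assumption of the sketch rather than a step taken in the solution.

```python
import math

def lambda_from_indicators(x):
    """MLE of a Poisson mean from indicators x_i = 1 if y_i >= 1, else 0.

    Returns (lambda_star, var_hat) with lambda_star = -ln(1 - xbar) and
    var_hat = (exp(lambda_star) - 1) / n, the large-sample variance
    with lambda replaced by its estimate.
    """
    n = len(x)
    xbar = sum(x) / n
    if xbar >= 1:
        # Every count was positive, so -ln(1 - xbar) is not finite.
        raise ValueError("MLE is not finite when all indicators equal 1")
    lambda_star = -math.log(1 - xbar)
    var_hat = (math.exp(lambda_star) - 1) / n
    return lambda_star, var_hat
```

When half of the indicators equal 1, the estimate is $\ln 2$ and the estimated variance is $1/n$.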
(c) There are two scenarios to consider:
Scenario 1: Assume that $Y_1,Y_2,\ldots,Y_n$ are accurate. Then, $\hat\lambda=\bar{Y}$ is the MLE (and MVBUE) of $\lambda$, with $E(\hat\lambda)=\lambda$ and $V(\hat\lambda)=\lambda/n$. Since, for large $n$, $\hat\lambda^{*}$ is essentially unbiased, a comparison of variances is appropriate. Now,
$$\mathrm{EFF}(\hat\lambda^{*},\hat\lambda)=\frac{\lambda/n}{(e^{\lambda}-1)/n}=\frac{\lambda}{(e^{\lambda}-1)}=\frac{\lambda}{\lambda+\sum_{j=2}^{\infty}\frac{\lambda^{j}}{j!}}<1,$$
so that $\hat\lambda^{*}$ always has a larger variance than $\hat\lambda$ (which is an expected result since we are losing information by categorizing $Y_i$ into the dichotomous variable $X_i$). In fact,
$$\lim_{\lambda\to\infty}\mathrm{EFF}(\hat\lambda^{*},\hat\lambda)=0,$$
so the loss in efficiency gets worse as $\lambda$ gets larger (and this loss of information is not affected by increasing $n$).
Scenario 2: Assume that $Y_1,Y_2,\ldots,Y_n$ are inaccurate. In this case, using $\hat\lambda=\bar{Y}$ to estimate $\lambda$ could lead to a severe bias problem. Assuming that $X_1,X_2,\ldots,X_n$ are accurate, then $\hat\lambda^{*}$ is essentially unbiased for large $n$ and so would be the preferred
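estimator. Since validity takes preference over precision, $\hat\lambda^{*}$ would be preferred to $\hat\lambda$ when $\{Y_1,Y_2,\ldots,Y_n\}$ are inaccurate but $\{X_1,X_2,\ldots,X_n\}$ are correct.

Solution 4.15
(a) For the assumed statistical model, and with $\mathbf{y}=(y_0,y_1,\ldots,y_n)$, the corresponding likelihood function $L(\mathbf{y};\theta)\equiv L$ is
$$\begin{aligned}
L&=p_{Y_0}(y_0;\theta)\prod_{j=0}^{n-1}p_{Y_{j+1}}(y_{j+1}\mid Y_k=y_k,k=0,1,\ldots,j;\theta)\\
&=p_{Y_0}(y_0;\theta)\prod_{j=0}^{n-1}p_{Y_{j+1}}(y_{j+1}\mid Y_j=y_j;\theta)\\
&=\left(\frac{\theta^{y_0}e^{-\theta}}{y_0!}\right)\prod_{j=0}^{n-1}\left[\frac{(\theta y_j)^{y_{j+1}}e^{-\theta y_j}}{y_{j+1}!}\right].
\end{aligned}$$
Thus,
$$\ln(L)\sim\left(\sum_{j=0}^{n}y_j\right)\ln(\theta)-\theta\left(1+\sum_{j=0}^{n-1}y_j\right),$$
so that the equation
$$\frac{\partial\ln(L)}{\partial\theta}=\theta^{-1}\sum_{j=0}^{n}y_j-\left(1+\sum_{j=0}^{n-1}y_j\right)=0$$
gives
$$\hat\theta=\frac{\sum_{j=0}^{n}Y_j}{1+\sum_{j=0}^{n-1}Y_j}$$
as the MLE of $\theta$.
(b) Now,
$$\frac{\partial^{2}\ln(L)}{\partial\theta^{2}}=\frac{-\sum_{j=0}^{n}y_j}{\theta^{2}},\quad\text{so that}\quad-E\left[\frac{\partial^{2}\ln(L)}{\partial\theta^{2}}\right]=\frac{\sum_{j=0}^{n}E(Y_j)}{\theta^{2}}.$$
And, $E(Y_0)=\theta$, $E(Y_1)=E_{y_0}[E(Y_1\mid Y_0=y_0)]=E_{y_0}(\theta y_0)=\theta^{2}$, $E(Y_2)=E_{y_1}[E(Y_2\mid Y_1=y_1)]=E_{y_1}(\theta y_1)=\theta^{3}$, and so on, so that, in general, $E(Y_j)=\theta^{(j+1)}$, $j=0,1,\ldots,n$. Finally,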
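$$-E\left[\frac{\partial^{2}\ln(L)}{\partial\theta^{2}}\right]=\frac{\sum_{j=0}^{n}\theta^{(j+1)}}{\theta^{2}}=\theta^{-1}\sum_{j=0}^{n}\theta^{j}=\theta^{-1}\left[\frac{1-\theta^{(n+1)}}{1-\theta}\right].$$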
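So, for large $n$, $V(\hat\theta)\approx[\theta(1-\theta)]/[1-\theta^{(n+1)}]$, and an ML-based 95% CI for $\theta$ is
$$\hat\theta\pm1.96\sqrt{\hat{V}(\hat\theta)}=\hat\theta\pm1.96\sqrt{\frac{\hat\theta(1-\hat\theta)}{1-\hat\theta^{(n+1)}}}.$$
When $n=25$ and $\hat\theta=1.20$, the computed 95% CI for $\theta$ is (1.11, 1.29).

Solution 4.16. Given the stated assumptions, the appropriate CI for $(\mu_t-\mu_c)$ using $(\bar{X}-\bar{Y})$ is:
$$(\bar{X}-\bar{Y})\pm1.96\sqrt{\frac{\sigma_t^{2}}{n_t}+\frac{\sigma_c^{2}}{n_c}},$$
where $\bar{X}=n_t^{-1}\sum_{i=1}^{n_t}X_i$ and $\bar{Y}=n_c^{-1}\sum_{i=1}^{n_c}Y_i$. The optimal choices for $n_t$ and $n_c$, subject to the constraint $(n_t+n_c)=N$, would minimize the width of the above CI. So, we want to minimize the function $\sigma_t^{2}/n_t+\sigma_c^{2}/n_c$ subject to the constraint $(n_t+n_c)=N$, or, equivalently, we want to minimize the function
$$Q=\frac{\sigma_t^{2}}{n_t}+\frac{\sigma_c^{2}}{(N-n_t)},$$
with respect to $n_t$. So,
$$\frac{dQ}{dn_t}=\frac{-\sigma_t^{2}}{n_t^{2}}+\frac{\sigma_c^{2}}{(N-n_t)^{2}}=0\Rightarrow(\sigma_t^{2}-\sigma_c^{2})n_t^{2}-2N\sigma_t^{2}n_t+N^{2}\sigma_t^{2}=0.$$
So, via the quadratic formula, the two roots of the above quadratic equation are
$$\frac{2N\sigma_t^{2}\pm\sqrt{4N^{2}\sigma_t^{4}-4(\sigma_t^{2}-\sigma_c^{2})N^{2}\sigma_t^{2}}}{2(\sigma_t^{2}-\sigma_c^{2})}=N\left[\frac{\sigma_t(\sigma_t\pm\sigma_c)}{(\sigma_t+\sigma_c)(\sigma_t-\sigma_c)}\right].$$
If the positive sign is used, the possible answer is $N\sigma_t/(\sigma_t-\sigma_c)$, which cannot be correct. If the negative sign is used, the answer is
$$n_t=N\left(\frac{\sigma_t}{\sigma_t+\sigma_c}\right),\quad\text{so that}\quad n_c=N\left(\frac{\sigma_c}{\sigma_t+\sigma_c}\right).$$
This choice for $n_t$ minimizes $Q$ since
$$\left.\frac{d^{2}Q}{dn_t^{2}}\right|_{n_t=\frac{N\sigma_t}{(\sigma_t+\sigma_c)}}>0.$$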
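When $N=100$, $\sigma_t^{2}=4$, and $\sigma_c^{2}=9$, then $n_t=40$ and $n_c=60$. Note that these answers make sense, since more data are required from the more variable population.

Solution 4.17. Consider the random variable $(\bar{Y}-Y_{n+1})$, which is a linear combination of independent $N(\mu,\sigma^{2})$ variates. Since
$$E(\bar{Y}-Y_{n+1})=E(\bar{Y})-E(Y_{n+1})=\mu-\mu=0,$$
and since
$$V(\bar{Y}-Y_{n+1})=V(\bar{Y})+V(Y_{n+1})=\frac{\sigma^{2}}{n}+\sigma^{2}=\left(\frac{n+1}{n}\right)\sigma^{2},$$
it follows that
$$(\bar{Y}-Y_{n+1})\sim N\left[0,\left(\frac{n+1}{n}\right)\sigma^{2}\right].$$
Hence,
$$\frac{(\bar{Y}-Y_{n+1})}{\sqrt{\left(\frac{n+1}{n}\right)\sigma^{2}}}\sim N(0,1).$$
Also, we know that
$$\frac{(n-1)S^{2}}{\sigma^{2}}\sim\chi^{2}_{n-1}.$$
So,
$$\frac{\left[\dfrac{(\bar{Y}-Y_{n+1})}{\sqrt{\left(\frac{n+1}{n}\right)\sigma^{2}}}\right]}{\sqrt{\dfrac{(n-1)S^{2}}{\sigma^{2}}\Big/(n-1)}}=\frac{(\bar{Y}-Y_{n+1})}{S\sqrt{\frac{(n+1)}{n}}}\sim t_{n-1},$$
since $(\bar{Y}-Y_{n+1})$ and $S^{2}$ are independent random variables. So,
$$\begin{aligned}
(1-\alpha)&=\mathrm{pr}\left\{-t_{n-1,1-\alpha/2}<\frac{(\bar{Y}-Y_{n+1})}{S\sqrt{\frac{(n+1)}{n}}}<t_{n-1,1-\alpha/2}\right\}\\
&=\mathrm{pr}\left\{\bar{Y}-t_{n-1,1-\alpha/2}S\sqrt{\frac{(n+1)}{n}}<Y_{n+1}<\bar{Y}+t_{n-1,1-\alpha/2}S\sqrt{\frac{(n+1)}{n}}\right\}.
\end{aligned}$$
Thus,
$$L=\bar{Y}-t_{n-1,1-\alpha/2}S\sqrt{\frac{(n+1)}{n}}$$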
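and
$$U=\bar{Y}+t_{n-1,1-\alpha/2}S\sqrt{\frac{(n+1)}{n}}.$$
For the given data, the realized values of $\bar{Y}$ and $S^{2}$ are $\bar{y}=3$ and $s^{2}=2.50$, so that the computed 95% prediction interval for the random variable $Y_6$ is
$$\bar{y}\pm t_{n-1,1-\alpha/2}\,s\sqrt{\frac{(n+1)}{n}}=3\pm t_{4,0.975}\sqrt{2.50}\sqrt{\frac{6}{5}}=3\pm2.776\sqrt{3}=(-1.8082,7.8082).$$

Solution 4.18
(a) We know that
$$U=\frac{(n-1)S^{2}}{\sigma^{2}}\sim\chi^{2}_{n-1}=\mathrm{GAMMA}\left(\alpha=2,\beta=\frac{n-1}{2}\right).$$
If $Y\sim\mathrm{GAMMA}(\alpha,\beta)$, then
$$E(Y^{r})=\int_{0}^{\infty}y^{r}\frac{y^{\beta-1}e^{-y/\alpha}}{\Gamma(\beta)\alpha^{\beta}}\,dy=\frac{\Gamma(\beta+r)}{\Gamma(\beta)}\alpha^{r},\quad(\beta+r)>0.$$
So,
$$E(U^{r})=\frac{\Gamma[(n-1)/2+r]}{\Gamma[(n-1)/2]}2^{r}.$$
So,
$$\begin{aligned}
E(U^{1/2})=E\left[\sqrt{\frac{(n-1)S^{2}}{\sigma^{2}}}\right]=\frac{\sqrt{n-1}}{\sigma}E(S)&=\frac{\Gamma[(n-1)/2+1/2]}{\Gamma[(n-1)/2]}2^{1/2}=\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]}\sqrt{2}\\
\Rightarrow E(S)&=\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]}\sqrt{\frac{2}{(n-1)}}\,\sigma\\
\Rightarrow E(W)&=2t_{n-1,1-\alpha/2}\frac{E(S)}{\sqrt{n}}=2^{3/2}t_{n-1,1-\alpha/2}\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]}\frac{\sigma}{\sqrt{n(n-1)}}.
\end{aligned}$$
If $\alpha=0.05$, $n=4$, and $\sigma^{2}=4$, then
$$E(W)=2^{3/2}t_{3,0.975}\frac{\Gamma(2)}{\Gamma(3/2)}\frac{2}{\sqrt{4(4-1)}}=2^{3/2}(3.182)\frac{1}{(\sqrt{\pi}/2)}\frac{1}{\sqrt{3}}=5.8633.$$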
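The expected-width formula in part (a) can be checked with a few lines of code. In this sketch the $t$ quantile is passed in as a number (for example, 3.182 from a table), since the Python standard library has no $t$-distribution quantile function; the name `expected_ci_width` is illustrative.

```python
import math

def expected_ci_width(n, sigma, t_quantile):
    """E(W) = 2^(3/2) * t * [Gamma(n/2) / Gamma((n-1)/2)] * sigma / sqrt(n(n-1)).

    t_quantile is t_{n-1, 1-alpha/2}, supplied by the caller from a table.
    """
    # Gamma ratio on the log scale, which stays stable for large n.
    gamma_ratio = math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))
    return 2 ** 1.5 * t_quantile * gamma_ratio * sigma / math.sqrt(n * (n - 1))
```

For example, `expected_ci_width(4, 2.0, 3.182)` corresponds to the case $\alpha=0.05$, $n=4$, $\sigma^{2}=4$ treated above.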
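(b)
$$\begin{aligned}
(1-\gamma)&\le\mathrm{pr}\left\{2t_{n^{*}-1,1-\frac{\alpha}{2}}\frac{S}{\sqrt{n^{*}}}\le\delta\right\}=\mathrm{pr}\left\{4f_{1,n^{*}-1,1-\alpha}\frac{S^{2}}{n^{*}}\le\delta^{2}\right\}\\
&=\mathrm{pr}\left\{S^{2}\le\frac{n^{*}\delta^{2}}{4f_{1,n^{*}-1,1-\alpha}}\right\}\\
&=\mathrm{pr}\left\{\frac{(n^{*}-1)S^{2}}{\sigma^{2}}\le\frac{n^{*}(n^{*}-1)\delta^{2}}{4\sigma^{2}f_{1,n^{*}-1,1-\alpha}}\right\}\\
&=\mathrm{pr}\left\{\chi^{2}_{n^{*}-1}\le\frac{n^{*}(n^{*}-1)\delta^{2}}{4\sigma^{2}f_{1,n^{*}-1,1-\alpha}}\right\}.
\end{aligned}$$
So, we require
$$\frac{n^{*}(n^{*}-1)\delta^{2}}{4\sigma^{2}f_{1,n^{*}-1,1-\alpha}}\ge\chi^{2}_{n^{*}-1,1-\gamma},$$
or
$$n^{*}(n^{*}-1)\ge\left(\frac{2\sigma}{\delta}\right)^{2}\chi^{2}_{n^{*}-1,1-\gamma}\,f_{1,n^{*}-1,1-\alpha}.$$
For further details, see Kupper and Hafner (1989).

Solution 4.19
(a) Let $\hat\theta=2\bar{Y}_1-3\bar{Y}_2+\bar{Y}_3$; so, $E(\hat\theta)=\theta$, and
$$V(\hat\theta)=4\left(\frac{\sigma^{2}}{n_1}\right)+9\left(\frac{\sigma^{2}}{n_2}\right)+\left(\frac{\sigma^{2}}{n_3}\right)=\sigma^{2}\left(\frac{4}{n_1}+\frac{9}{n_2}+\frac{1}{n_3}\right).$$
So,
$$Z=\frac{\hat\theta-E(\hat\theta)}{\sqrt{V(\hat\theta)}}=\frac{\left(2\bar{Y}_1-3\bar{Y}_2+\bar{Y}_3\right)-(2\mu_1-3\mu_2+\mu_3)}{\sigma\sqrt{4/n_1+9/n_2+1/n_3}}\sim N(0,1).$$
Now, $(n_i-1)S_i^{2}/\sigma^{2}\sim\chi^{2}_{n_i-1}$, $i=1,2,3$, and the $S_i^{2}$'s are mutually independent random variables. Thus, by the additivity property of mutually independent gamma random variables,
$$U=\frac{(n_1-1)S_1^{2}+(n_2-1)S_2^{2}+(n_3-1)S_3^{2}}{\sigma^{2}}\sim\chi^{2}_{(n_1+n_2+n_3-3)}$$
and
$$E\left[\frac{\sum_{i=1}^{3}(n_i-1)S_i^{2}}{(n_1+n_2+n_3-3)}\right]=\sigma^{2},$$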
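where $\sum_{i=1}^{3}(n_i-1)S_i^{2}/(n_1+n_2+n_3-3)$ is called a "pooled estimator" of $\sigma^{2}$. So, noting that the numerators and denominators in each of the following expressions are independent, we have
$$\begin{aligned}
T_{(n_1+n_2+n_3-3)}&=\frac{Z}{\sqrt{U/(n_1+n_2+n_3-3)}}\\
&=\frac{[\hat\theta-E(\hat\theta)]/\sqrt{V(\hat\theta)}}{\sqrt{\dfrac{\sum_{i=1}^{3}(n_i-1)S_i^{2}}{\sigma^{2}(n_1+n_2+n_3-3)}}}\\
&=\frac{\left(2\bar{Y}_1-3\bar{Y}_2+\bar{Y}_3\right)-\theta}{\sqrt{\left[\dfrac{\sum_{i=1}^{3}(n_i-1)S_i^{2}}{(n_1+n_2+n_3-3)}\right]\left(\dfrac{4}{n_1}+\dfrac{9}{n_2}+\dfrac{1}{n_3}\right)}}\sim t_{\sum_{i=1}^{3}n_i-3}.
\end{aligned}$$
(b) Let $\hat\theta=2\bar{Y}_1-3\bar{Y}_2+\bar{Y}_3$ and $S_p^{2}=\sum_{i=1}^{3}(n_i-1)S_i^{2}/(n_1+n_2+n_3-3)$. From part (a),
$$\frac{\hat\theta-\theta}{S_p\sqrt{4/n_1+9/n_2+1/n_3}}\sim t_{(n_1+n_2+n_3-3)}.$$
So,
$$(1-\alpha)=\mathrm{pr}\left\{-t_{\sum_{i=1}^{3}n_i-3,\,1-\alpha/2}<\frac{\hat\theta-\theta}{S_p\sqrt{4/n_1+9/n_2+1/n_3}}<t_{\sum_{i=1}^{3}n_i-3,\,1-\alpha/2}\right\},$$
and hence an exact $100(1-\alpha)\%$ CI for $\theta$ is
$$\hat\theta\pm t_{\sum_{i=1}^{3}n_i-3,\,1-\alpha/2}\,S_p\sqrt{\frac{4}{n_1}+\frac{9}{n_2}+\frac{1}{n_3}}.$$
For these data, we have:
$$[2(80)-3(75)+70]\pm t_{9,0.975}\sqrt{\frac{3(4+3+5)}{9}}\sqrt{\frac{4}{4}+\frac{9}{4}+\frac{1}{4}}=5\pm8.46,$$
or $(-3.46,13.46)$.
(c) An exact $100(1-\alpha)\%$ CI for $\sigma_1^{2}/\sigma_2^{2}$ follows from
$$(1-\alpha)=\mathrm{pr}\left\{\frac{S_1^{2}/S_2^{2}}{f_{n_1-1,\,n_2-1,\,1-\alpha/2}}<\frac{\sigma_1^{2}}{\sigma_2^{2}}<\frac{S_1^{2}/S_2^{2}}{1/f_{n_2-1,\,n_1-1,\,1-\alpha/2}}\right\}.$$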
Now, $f_{49,49,0.975}=1.76$. So,
$$\text{lower limit}=\frac{7/2}{1.76}=1.99,\quad\text{and upper limit}=\left(\frac{7}{2}\right)(1.76)=6.16;$$
thus, our 95% CI for $\sigma_1^{2}/\sigma_2^{2}$ is (1.99, 6.16). Note that the value 1 is not included in this interval, suggesting variance heterogeneity.
(d) Consider the statistic
$$\frac{\hat\theta-\theta}{\sqrt{4S_1^{2}/n_1+9S_2^{2}/n_2+S_3^{2}/n_3}}=\left[\frac{4\sigma_1^{2}/n_1+9\sigma_2^{2}/n_2+\sigma_3^{2}/n_3}{4S_1^{2}/n_1+9S_2^{2}/n_2+S_3^{2}/n_3}\right]^{1/2}\times\left[\frac{\hat\theta-\theta}{\sqrt{4\sigma_1^{2}/n_1+9\sigma_2^{2}/n_2+\sigma_3^{2}/n_3}}\right].$$
The expression in the first set of brackets converges to 1, since $S_i^{2}$ is consistent for $\sigma_i^{2}$, $i=1,2,3$, while the expression in the second set of brackets converges in distribution to $N(0,1)$ by the Central Limit Theorem. So, by Slutsky's Theorem,
$$\frac{\hat\theta-\theta}{\sqrt{4S_1^{2}/n_1+9S_2^{2}/n_2+S_3^{2}/n_3}}\mathrel{\dot\sim}N(0,1)\quad\text{for large } n_1,n_2,n_3.$$
Thus, an approximate large-sample 95% CI for $\theta$ is
$$\hat\theta\pm1.96\sqrt{\frac{4S_1^{2}}{n_1}+\frac{9S_2^{2}}{n_2}+\frac{S_3^{2}}{n_3}}.$$
For the data in part (c), $\hat\theta=2(85)-3(82)+79=3$. So, our large-sample CI is
$$3\pm1.96\sqrt{\frac{4(7)}{50}+\frac{9(2)}{50}+\frac{6}{50}}=3\pm2.00\ \text{or}\ (1.00,5.00).$$
The advantage of selecting large random samples from each of the three populations is that the assumptions of exactly normally distributed populations and homogeneous variance across populations can both be relaxed.

Solution 4.20
(a) Since $X_i/\sqrt{\theta}\sim N(0,1)$, then $L/\theta\sim\chi^{2}_{n_1}$, or equivalently $\mathrm{GAMMA}(2,n_1/2)$, since $X_1,X_2,\ldots,X_{n_1}$ constitute a set of mutually independent random variables. If
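$U\sim\mathrm{GAMMA}(\alpha,\beta)$, then $E(U^{r})=[\Gamma(\beta+r)/\Gamma(\beta)]\alpha^{r}$, $(\beta+r)>0$. Thus, for $r=\frac{1}{2}$, we have
$$E\left(\sqrt{\frac{L}{\theta}}\right)=\theta^{-1/2}E(\sqrt{L})=\frac{\Gamma(n_1/2+1/2)}{\Gamma(n_1/2)}2^{1/2},$$
so that
$$E(\sqrt{L})=\frac{\Gamma[(n_1+1)/2]}{\Gamma(n_1/2)}\sqrt{2\theta}.$$
(b) The random variable
$$F_{n_1,n_2}=\frac{\sum_{i=1}^{n_1}\left(X_i/\sqrt{\theta}\right)^{2}/n_1}{\sum_{i=1}^{n_2}\left(\sqrt{\theta}\,Y_i\right)^{2}/n_2}=\theta^{-2}\left(\frac{n_2}{n_1}\right)\frac{\sum_{i=1}^{n_1}X_i^{2}}{\sum_{i=1}^{n_2}Y_i^{2}}\sim f_{n_1,n_2}.$$
So,
$$\begin{aligned}
(1-\alpha)&=\mathrm{pr}\left[f_{n_1,n_2,\alpha/2}<F_{n_1,n_2}<f_{n_1,n_2,1-\alpha/2}\right]\\
&=\mathrm{pr}\left[f^{-1}_{n_1,n_2,1-\alpha/2}<F^{-1}_{n_1,n_2}<f_{n_2,n_1,1-\alpha/2}\right]\\
&=\mathrm{pr}(L<\theta<U),
\end{aligned}$$
where
$$L=\left(\frac{n_2}{n_1}\right)^{1/2}\left(\frac{\sum_{i=1}^{n_1}X_i^{2}}{\sum_{i=1}^{n_2}Y_i^{2}}\right)^{1/2}f^{-1/2}_{n_1,n_2,1-\alpha/2}$$
and
$$U=\left(\frac{n_2}{n_1}\right)^{1/2}\left(\frac{\sum_{i=1}^{n_1}X_i^{2}}{\sum_{i=1}^{n_2}Y_i^{2}}\right)^{1/2}f^{1/2}_{n_2,n_1,1-\alpha/2}.$$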
For the available data, since $f_{8,5,0.975}=6.76$ and $f_{5,8,0.975}=4.82$, the computed exact 95% CI for $\theta$ is (0.430, 2.455).

Solution 4.21
(a) The best point estimator of $\theta$ is $\bar{D}=n^{-1}\sum_{i=1}^{n}D_i$. Since $E(D_i)=E(Y_{Ti}-Y_{Pi})=(\mu_T-\mu_P)=\theta$, and $V(D_i)=V(Y_{Ti})+V(Y_{Pi})-2\rho\sqrt{V(Y_{Ti})V(Y_{Pi})}=(\sigma_T^{2}+\sigma_P^{2}-2\rho\sigma_T\sigma_P)$, it follows that $E(\bar{D})=\theta$ and $V(\bar{D})=(\sigma_T^{2}+\sigma_P^{2}-2\rho\sigma_T\sigma_P)/n$. Since $[\bar{D}-E(\bar{D})]/\sqrt{V(\bar{D})}\sim N(0,1)$, it follows that
$$\mathrm{pr}\left[-z_{1-\alpha/2}<\frac{\bar{D}-E(\bar{D})}{\sqrt{V(\bar{D})}}<z_{1-\alpha/2}\right]=(1-\alpha)=\mathrm{pr}(L<\theta<U),$$
where
$$L=\bar{D}-z_{1-\alpha/2}\sqrt{V(\bar{D})}\quad\text{and}\quad U=\bar{D}+z_{1-\alpha/2}\sqrt{V(\bar{D})}.$$
Given the available data, the realized value of $L$ is 0.02, and the realized value of $U$ is 1.98, so that the computed 95% CI for $\theta$ is (0.02, 1.98). This computed 95% CI does not include the value zero, indicating that there is statistical evidence that $\theta\ne0$ (or, equivalently, that $\mu_T\ne\mu_P$).
(b) Now,
$$\begin{aligned}
\mathrm{pr}(L>0\mid\theta=1.0)&=\mathrm{pr}\left[\bar{D}-z_{0.975}\sqrt{V(\bar{D})}>0\ \Big|\ \theta=1.0\right]\\
&=\mathrm{pr}\left[\frac{\bar{D}-1.0}{\sqrt{V(\bar{D})}}>\frac{1.96\sqrt{V(\bar{D})}-1.0}{\sqrt{V(\bar{D})}}\right]\\
&=\mathrm{pr}\left[Z>1.96-\frac{1.0}{\sqrt{V(\bar{D})}}\right],
\end{aligned}$$
where $Z\sim N(0,1)$. So, to achieve $\mathrm{pr}(L>0\mid\theta=1.0)\ge0.95$, we require $1.96-1.0/\sqrt{V(\bar{D})}\le-1.645$, or, equivalently,
$$\frac{1.0}{\sqrt{V(\bar{D})}}=\frac{\sqrt{n}}{\sqrt{\sigma_T^{2}+\sigma_P^{2}-2\rho\sigma_T\sigma_P}}\ge(1.96+1.645),$$
which gives $n^{*}=46$.

Solution 4.22
(a)
$$\begin{aligned}
E(U_i)&=E(X_i+Y_i)=E(X_i)+E(Y_i)=(\mu_x+\mu_y),\\
E(V_i)&=E(X_i-Y_i)=E(X_i)-E(Y_i)=(\mu_x-\mu_y),\\
V(U_i)&=V(X_i+Y_i)=V(X_i)+V(Y_i)+2\,\mathrm{cov}(X_i,Y_i)=\sigma^{2}+\sigma^{2}+2\rho\sigma^{2}=2\sigma^{2}(1+\rho),\ \text{and}\\
V(V_i)&=V(X_i-Y_i)=V(X_i)+V(Y_i)-2\,\mathrm{cov}(X_i,Y_i)=\sigma^{2}+\sigma^{2}-2\rho\sigma^{2}=2\sigma^{2}(1-\rho).
\end{aligned}$$
(b)
$$\begin{aligned}
\mathrm{cov}(U_i,V_i)&=E(U_iV_i)-E(U_i)E(V_i)\\
&=E[(X_i+Y_i)(X_i-Y_i)]-(\mu_x+\mu_y)(\mu_x-\mu_y)\\
&=E(X_i^{2}-Y_i^{2})-(\mu_x^{2}-\mu_y^{2})\\
&=[E(X_i^{2})-\mu_x^{2}]-[E(Y_i^{2})-\mu_y^{2}]\\
&=\sigma^{2}-\sigma^{2}=0.
\end{aligned}$$
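(c) Given the bivariate normal assumption, it follows that $U_1,U_2,\ldots,U_n$ are i.i.d. $N[(\mu_x+\mu_y),2\sigma^{2}(1+\rho)]$ random variables; and, $V_1,V_2,\ldots,V_n$ are i.i.d. $N[(\mu_x-\mu_y),2\sigma^{2}(1-\rho)]$ random variables. Hence,
$$\frac{(n-1)S_u^{2}}{2\sigma^{2}(1+\rho)}\sim\chi^{2}_{(n-1)},\qquad\frac{(n-1)S_v^{2}}{2\sigma^{2}(1-\rho)}\sim\chi^{2}_{(n-1)},$$
and $S_u^{2}$ and $S_v^{2}$ are independent random variables because of the result in part (b). So,
$$\frac{\left[\dfrac{(n-1)S_u^{2}}{2\sigma^{2}(1+\rho)}\right]\Big/(n-1)}{\left[\dfrac{(n-1)S_v^{2}}{2\sigma^{2}(1-\rho)}\right]\Big/(n-1)}=\frac{(1-\rho)S_u^{2}}{(1+\rho)S_v^{2}}\sim f_{(n-1),(n-1)}.$$
(d) If $f_{n-1,n-1,1-\frac{\alpha}{2}}$ is defined such that
$$\mathrm{pr}\left[F_{n-1,n-1}>f_{n-1,n-1,1-\alpha/2}\right]=\frac{\alpha}{2},$$
then
$$\begin{aligned}
(1-\alpha)&=\mathrm{pr}\left[f_{n-1,n-1,\alpha/2}<F_{n-1,n-1}<f_{n-1,n-1,1-\alpha/2}\right]\\
&=\mathrm{pr}\left[\frac{1}{f_{n-1,n-1,1-\alpha/2}}<\frac{(1-\rho)S_u^{2}}{(1+\rho)S_v^{2}}<f_{n-1,n-1,1-\alpha/2}\right]\\
&=\mathrm{pr}\left[\frac{S_v^{2}}{S_u^{2}}\left(\frac{1}{f_{n-1,n-1,1-\alpha/2}}\right)<\frac{2}{(1+\rho)}-1<\frac{S_v^{2}}{S_u^{2}}\,f_{n-1,n-1,1-\alpha/2}\right]
\end{aligned}$$
[...]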
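The method of moments estimators are found by solving for $\gamma$ and $\theta$ using the following two equations:
$$\hat\mu_1=\bar{y}=\frac{\theta\gamma}{(\theta-1)}\quad\text{and}\quad\hat\mu_2=\frac{1}{n}\sum_{i=1}^{n}y_i^{2}=E(Y^{2})=\frac{\theta\gamma^{2}}{(\theta-2)}.$$
The above equations imply that
$$\frac{\hat\mu_2}{\bar{y}^{2}}=\frac{\theta\gamma^{2}/(\theta-2)}{\theta^{2}\gamma^{2}/(\theta-1)^{2}}=\frac{(\theta-1)^{2}}{\theta(\theta-2)}.$$
Hence,
$$\frac{(\theta-1)^{2}}{\theta(\theta-2)}-1=\frac{1}{\theta(\theta-2)}=\frac{\hat\mu_2}{\bar{y}^{2}}-1=\frac{(\hat\mu_2-\bar{y}^{2})}{\bar{y}^{2}}=\frac{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^{2}}{\bar{y}^{2}}=\left(\frac{n-1}{n}\right)\frac{s^{2}}{\bar{y}^{2}}.$$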
So,
$$\theta(\theta-2)=\left(\frac{n}{n-1}\right)\frac{\bar{y}^{2}}{s^{2}}=\left(\frac{50}{49}\right)\left(\frac{900}{10}\right)=91.8367.$$
The roots of the quadratic equation $\theta^{2}-2\theta-91.8367=0$ are
$$\frac{2\pm\sqrt{(-2)^{2}+4(91.8367)}}{2},$$
or $-8.6352$ and $10.6352$. Since $\theta>2$, we take the positive root and use $\hat\theta_{mm}=10.6352$. Finally,
$$\hat\gamma_{mm}=\left(\frac{\hat\theta_{mm}-1}{\hat\theta_{mm}}\right)\bar{y}=\left(\frac{9.6352}{10.6352}\right)(30)=27.1793.$$
y γ
y θγθ t−(θ+1) dt = γθ −t−θ
γ
θ
γ = γθ γ−θ − y−θ = 1 − , y
0 < γ < y < ∞.
So, " #n−1 fY (y(1) ; γ, θ) fY(1) (y(1) ; γ, θ) = n 1 − FY (y(1) ; γ, θ) =n
γ
θ n−1
y(1)
θγθ y(1) −(θ+1)
= nθγnθ y(1) −(nθ+1) ,
0 < γ < y(1) < ∞.
Using this density, we have ∞ # " y(1) r nθγnθ y(1) −(nθ+1) dy(1) = E Y(1) r = γ
nθγr , (nθ − r)
So, # " E Y(1) =
nθγ , (nθ − 1)
and " # lim E Y(1) = lim
n→∞
θγ
n→∞ (θ − 1 ) n
=
θγ = γ, θ
nθ > r.
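so that $Y_{(1)}$ is an asymptotically unbiased estimator of $\gamma$. Also,
$$\begin{aligned}
V\left[Y_{(1)}\right]&=\frac{n\theta\gamma^{2}}{(n\theta-2)}-\left[\frac{n\theta\gamma}{(n\theta-1)}\right]^{2}\\
&=n\theta\gamma^{2}\left[\frac{1}{(n\theta-2)}-\frac{n\theta}{(n\theta-1)^{2}}\right]\\
&=\frac{n\theta\gamma^{2}}{(n\theta-1)^{2}(n\theta-2)}.
\end{aligned}$$
Since $\lim_{n\to\infty}V\left[Y_{(1)}\right]=0$, and since $Y_{(1)}$ is asymptotically unbiased, it follows that $Y_{(1)}$ is a consistent estimator of $\gamma$.
(c) For $0<c<1$, we wish to find $c$ such that $\mathrm{pr}\left[\gamma<cY_{(1)}\right]=(1-\alpha)$. Now,
$$\begin{aligned}
\mathrm{pr}\left[\gamma<cY_{(1)}\right]=\mathrm{pr}\left[\frac{\gamma}{c}<Y_{(1)}\right]&=\int_{\gamma/c}^{\infty}n\theta\gamma^{n\theta}y_{(1)}^{-(n\theta+1)}\,dy_{(1)}\\
&=\gamma^{n\theta}\left[-y_{(1)}^{-n\theta}\right]_{\gamma/c}^{\infty}=\gamma^{n\theta}\left(\frac{\gamma}{c}\right)^{-n\theta}=c^{n\theta}=(1-\alpha).
\end{aligned}$$
So, $c=(1-\alpha)^{1/(n\theta)}$. Thus, since $\theta=3$, we have $U=cY_{(1)}=(1-\alpha)^{1/3n}Y_{(1)}$. When $n=5$, $\alpha=0.10$, and $y_{(1)}=20$, the computed value of $U$ is $u=(1-0.10)^{1/15}(20)=19.860$. So, the upper 90% CI for $\gamma$ is (0, 19.860).

Solution 4.24
(a) From order statistics theory, we know that, for $r=1,2,\ldots,n$,
$$f_{X_{(r)}}(x_{(r)})=nC^{n-1}_{r-1}\left[F_X(x_{(r)})\right]^{r-1}\left[1-F_X(x_{(r)})\right]^{n-r}f_X(x_{(r)}),\quad-\infty<x_{(r)}<+\infty.$$
Hence, letting $u=F_X(x_{(r)})$, so that $du=f_X(x_{(r)})\,dx_{(r)}$, and then appealing to properties of the beta distribution, we have
$$\begin{aligned}
E(U_r)&=\int_{-\infty}^{\infty}\left[F_X(x_{(r)})\right]nC^{n-1}_{r-1}\left[F_X(x_{(r)})\right]^{r-1}\left[1-F_X(x_{(r)})\right]^{n-r}f_X(x_{(r)})\,dx_{(r)}\\
&=\int_{0}^{1}\frac{n!}{(r-1)!(n-r)!}u^{r}(1-u)^{n-r}\,du\\
&=\frac{\Gamma(n+1)}{\Gamma(r)\Gamma(n-r+1)}\int_{0}^{1}u^{r}(1-u)^{n-r}\,du\\
&=\frac{\Gamma(n+1)}{\Gamma(r)\Gamma(n-r+1)}\cdot\frac{\Gamma(r+1)\Gamma(n-r+1)}{\Gamma(n+2)}\\
&=\frac{r}{(n+1)}.
\end{aligned}$$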
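(b) For any particular value of $p$, we can pick a pair of values for $r$ and $n$ such that
$$E(U_r)=E\left[F_X(X_{(r)})\right]=\frac{r}{(n+1)}\approx p.$$
For these particular choices for $r$ and $n$, the amount of area under $f_X(x)$ to the left of $X_{(r)}$ is, on average (i.e., on expectation), equal to $p$. Thus, for these values of $r$ and $n$, it is reasonable to use $X_{(r)}$ as the estimator of the $p$th quantile $\theta_p$; in particular, $X_{(r)}$ is called the $p$th sample quantile.

Solution 4.25
(a) $E(W)=E[X_{(n)}-X_{(1)}]=E[X_{(n)}]-E[X_{(1)}]$. Since $f_X(x;\theta)=\theta x^{\theta-1}$, $F_X(x;\theta)=x^{\theta}$, $0<x<1$, $\theta>0$. So,
$$f_{X_{(1)}}(x_{(1)};\theta)=n\left[1-x_{(1)}^{\theta}\right]^{n-1}\theta x_{(1)}^{\theta-1},\quad0<x_{(1)}<1,$$
and
$$f_{X_{(n)}}(x_{(n)};\theta)=n\left[x_{(n)}^{\theta}\right]^{n-1}\theta x_{(n)}^{\theta-1},\quad0<x_{(n)}<1.$$
So,
$$E[X_{(n)}]=\int_{0}^{1}x_{(n)}\,n\,x_{(n)}^{\theta(n-1)}\theta x_{(n)}^{\theta-1}\,dx_{(n)}=\frac{n\theta}{n\theta+1}.$$
And, with $u=x_{(1)}^{\theta}$ and $du=\theta x_{(1)}^{\theta-1}\,dx_{(1)}$, we have
$$\begin{aligned}
E(X_{(1)})&=\int_{0}^{1}x_{(1)}\,n\left[1-x_{(1)}^{\theta}\right]^{n-1}\theta x_{(1)}^{\theta-1}\,dx_{(1)}\\
&=n\int_{0}^{1}u^{1/\theta}(1-u)^{n-1}\,du\\
&=n\int_{0}^{1}u^{\left(\frac{1}{\theta}+1\right)-1}(1-u)^{n-1}\,du\\
&=\frac{n\Gamma(1/\theta+1)\Gamma(n)}{\Gamma(1/\theta+1+n)}=\frac{\Gamma(1/\theta+1)\Gamma(n+1)}{\Gamma(1/\theta+1+n)}.
\end{aligned}$$
So,
$$E(W)=\left(\frac{n\theta}{n\theta+1}\right)-\frac{\Gamma(1/\theta+1)\Gamma(n+1)}{\Gamma(1/\theta+1+n)}.$$
(b) Let $A$ be the event "$X_{(1)}<\xi$," and let $B$ be the event "$X_{(n)}>\xi$." Then,
$$\begin{aligned}
\mathrm{pr}\left[X_{(1)}<\xi<X_{(n)}\right]&=\mathrm{pr}\left\{\left[X_{(1)}<\xi\right]\cap\left[X_{(n)}>\xi\right]\right\}\\
&=\mathrm{pr}(A\cap B)=1-\mathrm{pr}(\overline{A\cap B})=1-\mathrm{pr}(\bar{A}\cup\bar{B})\\
&=1-[\mathrm{pr}(\bar{A})+\mathrm{pr}(\bar{B})-\mathrm{pr}(\bar{A}\cap\bar{B})].
\end{aligned}$$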
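Now,
$$\mathrm{pr}(\bar{A})=\mathrm{pr}[X_{(1)}>\xi]=\mathrm{pr}\left[\cap_{i=1}^{n}(X_i>\xi)\right]=\prod_{i=1}^{n}\mathrm{pr}(X_i>\xi)=\left(\frac{1}{2}\right)^{n};$$
similarly,
$$\mathrm{pr}(\bar{B})=\mathrm{pr}[X_{(n)}<\xi]=\mathrm{pr}\left[\cap_{i=1}^{n}(X_i<\xi)\right]=\prod_{i=1}^{n}\mathrm{pr}(X_i<\xi)=\left(\frac{1}{2}\right)^{n},$$
and
$$\mathrm{pr}(\bar{A}\cap\bar{B})=\mathrm{pr}\left[(X_{(1)}>\xi)\cap(X_{(n)}<\xi)\right]=0.$$
So,
$$\mathrm{pr}\left[X_{(1)}<\xi<X_{(n)}\right]=1-2\left(\frac{1}{2}\right)^{n}=1-\frac{1}{2^{n-1}}.$$
So, the confidence coefficient for the interval $[X_{(1)},X_{(n)}]$ varies with $n$, which is a highly undesirable property.

Solution 4.26. First, if $X$ has a uniform distribution on the interval (0, 1), then, for $r\ge0$, we have
$$E(X^{r})=\int_{0}^{1}x^{r}(1)\,dx=(1+r)^{-1}.$$
So,
$$E(G)=E\left[\left(\prod_{i=1}^{n}X_i\right)^{1/n}\right]=\prod_{i=1}^{n}E\left(X_i^{1/n}\right)=\prod_{i=1}^{n}\left(1+\frac{1}{n}\right)^{-1}=\left[\left(1+\frac{1}{n}\right)^{n}\right]^{-1},$$
so that $\lim_{n\to\infty}E(G)=e^{-1}$. And, similarly,
$$E(G^{2})=E\left[\left(\prod_{i=1}^{n}X_i\right)^{2/n}\right]=\prod_{i=1}^{n}E\left(X_i^{2/n}\right)=\prod_{i=1}^{n}\left(1+\frac{2}{n}\right)^{-1}=\left[\left(1+\frac{2}{n}\right)^{n}\right]^{-1},$$
so that $\lim_{n\to\infty}E(G^{2})=e^{-2}$.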
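Thus,
$$\lim_{n\to\infty}V(G)=\lim_{n\to\infty}\left\{E(G^{2})-[E(G)]^{2}\right\}=e^{-2}-(e^{-1})^{2}=0.$$
Hence, since $\lim_{n\to\infty}E(G)=e^{-1}$ and $\lim_{n\to\infty}V(G)=0$, it follows that the random variable $G$ converges in probability to (i.e., is a consistent estimator of) the quantity $e^{-1}=0.368$.

Solution 4.27. We wish to prove that $\lim_{n\to\infty}\mathrm{pr}\{|X_n-0|>\epsilon\}=0$ for all $\epsilon>0$. Since $\mathrm{pr}(Y>n)=\int_{n}^{\infty}e^{-y}\,dy=e^{-n}$, we have
$$X_n=e^{n}I(Y>n)=\begin{cases}e^{n} & \text{with probability } e^{-n},\\ 0 & \text{with probability } (1-e^{-n}).\end{cases}$$
Thus, for any $\epsilon>0$,
$$\mathrm{pr}\{|X_n|>\epsilon\}=\mathrm{pr}\{X_n>\epsilon\}=\begin{cases}0 & \text{if } e^{n}\le\epsilon,\\ e^{-n} & \text{if } e^{n}>\epsilon.\end{cases}$$
So,
$$\lim_{n\to\infty}\mathrm{pr}\{|X_n|>\epsilon\}\le\lim_{n\to\infty}e^{-n}=0,\quad\forall\,\epsilon>0.$$
Note that $E(X_n)=1$ and $V(X_n)=(e^{n}-1)$, so that $\lim_{n\to\infty}V(X_n)=+\infty$; hence, a direct proof of convergence in probability is required.

Solution 4.28. Now,
$$\hat\beta^{*}=\frac{\sum_{i=1}^{n}(T_i-\bar{T})Y_i}{\sum_{i=1}^{n}(T_i-\bar{T})^{2}},$$
where $\bar{T}=n^{-1}\sum_{i=1}^{n}T_i=n_1/n$. Also, define $\bar{A}_1=n_1^{-1}\sum_{i=1}^{n_1}A_i$, and $\bar{A}_0=n_0^{-1}\sum_{i=(n_1+1)}^{n}A_i$. Now, since $E(Y_i\mid T_i,A_i)=\alpha+\beta T_i+\gamma A_i$, it follows that
$$E(\hat\beta^{*}\mid\{T_i\},\{A_i\})=\frac{\sum_{i=1}^{n}(T_i-\bar{T})(\alpha+\beta T_i+\gamma A_i)}{\sum_{i=1}^{n}(T_i-\bar{T})^{2}}=\beta+\gamma\,\frac{\sum_{i=1}^{n}\left(T_i-\frac{n_1}{n}\right)A_i}{\sum_{i=1}^{n}\left(T_i-\frac{n_1}{n}\right)^{2}}.$$
Now,
$$\sum_{i=1}^{n}\left(T_i-\frac{n_1}{n}\right)A_i=\left(1-\frac{n_1}{n}\right)n_1\bar{A}_1+\left(0-\frac{n_1}{n}\right)n_0\bar{A}_0=\frac{n_0n_1}{n}(\bar{A}_1-\bar{A}_0).$$
And,
$$\sum_{i=1}^{n}\left(T_i-\frac{n_1}{n}\right)^{2}=n_1\left(1-\frac{n_1}{n}\right)^{2}+n_0\left(-\frac{n_1}{n}\right)^{2}=\frac{n_0n_1}{n}.$$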
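So,
$$E(\hat\beta^{*}\mid\{T_i\},\{A_i\})=\beta+\gamma(\bar{A}_1-\bar{A}_0).$$
Thus, since $\gamma\ne0$, a sufficient condition for $E(\hat\beta^{*}\mid\{T_i\},\{A_i\})=\beta$ is $\bar{A}_1=\bar{A}_0$ (i.e., the average age of the $n_1$ subjects in the treatment group is equal to the average age of the $n_0$ subjects in the comparison group). If the $n$ subjects are randomly selected from a large population of subjects and then randomization is employed in assigning these $n$ subjects to the treatment and comparison groups, then it follows that $E(\bar{A}_0)=E(\bar{A}_1)$, so that
$$E(\hat\beta^{*}\mid\{T_i\})=E_{\{A_i\}}\left[E(\hat\beta^{*}\mid\{T_i\},\{A_i\})\right]=\beta+\gamma E(\bar{A}_1-\bar{A}_0)=\beta.$$
So, on expectation, randomization is sufficient to ensure that $\hat\beta^{*}$ is an unbiased estimator of $\beta$.

Solution 4.29
(a) For $i=1,2,\ldots,n$, note that $E(Y_i)=\pi_i$, $V(Y_i)=\pi_i(1-\pi_i)$, and
$$L(\mathbf{y};\boldsymbol\beta)=\prod_{i=1}^{n}\pi_i^{y_i}(1-\pi_i)^{1-y_i},$$
so that
$$\ln L(\mathbf{y};\boldsymbol\beta)=\sum_{i=1}^{n}\left[y_i\ln\pi_i+(1-y_i)\ln(1-\pi_i)\right].$$
So, for $j=0,1,\ldots,p$, we have
$$\frac{\partial\ln L(\mathbf{y};\boldsymbol\beta)}{\partial\beta_j}=\sum_{i=1}^{n}\left[\frac{y_i}{\pi_i}\frac{\partial\pi_i}{\partial\beta_j}-\frac{(1-y_i)}{(1-\pi_i)}\frac{\partial\pi_i}{\partial\beta_j}\right]=\sum_{i=1}^{n}\frac{\partial\pi_i}{\partial\beta_j}\left[\frac{(y_i-\pi_i)}{\pi_i(1-\pi_i)}\right].$$
And,
$$\frac{\partial\pi_i}{\partial\beta_j}=\frac{x_{ij}e^{\sum_{j=0}^{p}\beta_jx_{ij}}\left[1+e^{\sum_{j=0}^{p}\beta_jx_{ij}}\right]-x_{ij}\left[e^{\sum_{j=0}^{p}\beta_jx_{ij}}\right]^{2}}{\left[1+e^{\sum_{j=0}^{p}\beta_jx_{ij}}\right]^{2}}=\frac{x_{ij}e^{\sum_{j=0}^{p}\beta_jx_{ij}}}{\left[1+e^{\sum_{j=0}^{p}\beta_jx_{ij}}\right]^{2}}=x_{ij}\pi_i(1-\pi_i).$$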
Estimation Theory
Thus,
n ∂ ln L(y; β) (yi − πi ) = xij πi (1 − πi ) ∂βj πi (1 − πi ) i=1
=
n
# " " # xij yi − E(Yi ) = x j y − E(Y) ,
i=1
where x j = (x1j , x2j , . . . , xnj ). Finally, with the [nx(p + 1)] matrix X defined so that its ith row is x i = (1, xi1 , xi2 , . . . , xip ), i = 1, 2, . . . , n, and with the [(p + 1) × 1] column vector [∂ ln L(y; β)]/∂β defined as ∂ ln L(y; β) = ∂β we have
∂ ln L(y; β) ∂ ln L(y; β) ∂ ln L(y; β) , ,..., ∂β0 ∂β1 ∂βp
,
" # ∂ ln L(y; β) = X y − E(Y) , ∂β
which gives the desired result.
(b) Since $\partial\ln L(y;\beta)/\partial\beta_j = \sum_{i=1}^{n}x_{ij}(y_i-\pi_i)$, it follows that
\[
-\frac{\partial^2\ln L(y;\beta)}{\partial\beta_j\,\partial\beta_{j'}} = \sum_{i=1}^{n}x_{ij}\frac{\partial\pi_i}{\partial\beta_{j'}} = \sum_{i=1}^{n}x_{ij}x_{ij'}\pi_i(1-\pi_i) = \sum_{i=1}^{n}x_{ij}x_{ij'}V(Y_i),
\]
which does not functionally depend on $Y$. Since $Y$ has a diagonal covariance matrix of the simple form
\[
V = \mathrm{diag}\left[V(Y_1), V(Y_2), \ldots, V(Y_n)\right] = \mathrm{diag}\left[\pi_1(1-\pi_1), \pi_2(1-\pi_2), \ldots, \pi_n(1-\pi_n)\right],
\]
it follows directly that the observed information matrix $I(y;\beta)$ equals the expected information matrix $I(\beta)$, which can be written in matrix notation as
\[
I(\beta) = X'VX.
\]
Finally, the estimated covariance matrix $\hat{V}(\hat{\beta})$ of $\hat{\beta}$ is equal to
\[
\hat{V}(\hat{\beta}) = I^{-1}(\hat{\beta}) = \left(X'\hat{V}X\right)^{-1},
\]
where
\[
\hat{V} = \mathrm{diag}\left[\hat{\pi}_1(1-\hat{\pi}_1), \hat{\pi}_2(1-\hat{\pi}_2), \ldots, \hat{\pi}_n(1-\hat{\pi}_n)\right],
\]
and where
\[
\hat{\pi}_i = \frac{e^{\sum_{j=0}^{p}\hat{\beta}_j x_{ij}}}{1+e^{\sum_{j=0}^{p}\hat{\beta}_j x_{ij}}}, \quad i = 1, 2, \ldots, n.
\]
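The score and information expressions of Solution 4.29 can be checked numerically. The following is a minimal sketch, not part of the original solution: the toy design matrix, responses, and trial value of beta are hypothetical, and the function names are chosen only for illustration. It compares the analytic score X'[y - E(Y)] with a central-difference derivative of ln L and builds X'VX.

```python
import math

def prob(beta, row):
    """pi_i for one design row (1, x_i1, ..., x_ip)."""
    eta = sum(b * x for b, x in zip(beta, row))
    return 1.0 / (1.0 + math.exp(-eta))

def log_lik(beta, X, y):
    """ln L(y; beta) = sum of y_i ln(pi_i) + (1 - y_i) ln(1 - pi_i)."""
    total = 0.0
    for row, yi in zip(X, y):
        p = prob(beta, row)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def score(beta, X, y):
    """Analytic score vector X'[y - E(Y)]."""
    resid = [yi - prob(beta, row) for row, yi in zip(X, y)]
    return [sum(row[j] * r for row, r in zip(X, resid))
            for j in range(len(beta))]

def information(beta, X):
    """Information matrix X'VX with V = diag[pi_i (1 - pi_i)]."""
    v = [prob(beta, row) * (1.0 - prob(beta, row)) for row in X]
    k = len(beta)
    return [[sum(row[a] * row[b] * vi for row, vi in zip(X, v))
             for b in range(k)] for a in range(k)]

# Hypothetical toy data (illustration only, not from the exercise).
X = [(1, 0.5), (1, -1.2), (1, 2.0), (1, 0.3), (1, -0.7)]
y = [1, 0, 1, 0, 1]
beta = [0.2, -0.4]

# Central-difference approximation to each partial derivative of ln L.
h = 1e-6
numeric_score = []
for j in range(len(beta)):
    up = list(beta)
    dn = list(beta)
    up[j] += h
    dn[j] -= h
    numeric_score.append((log_lik(up, X, y) - log_lik(dn, X, y)) / (2 * h))

analytic_score = score(beta, X, y)
info = information(beta, X)
```

Agreement between `analytic_score` and `numeric_score` is what part (a) predicts; `info` depends on the data only through X, as noted in part (b).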
Solution 4.30*. At iteration $t$, the E-step requires that we evaluate
\[
Q^{(t)}(y;\pi,\mu_1,\mu_2) \equiv Q^{(t)} = E_Z\left\{\ln[L_c(y,z;\pi,\mu_1,\mu_2)] \,\middle|\, y, \hat{\pi}^{(t-1)}, \hat{\mu}_1^{(t-1)}, \hat{\mu}_2^{(t-1)}\right\},
\]
where the complete-data likelihood is given by
\[
L_c(y,z;\pi,\mu_1,\mu_2) = \prod_{i=1}^{n}\left[\pi p_{Y_i}(y_i;\mu_1)\right]^{z_i}\left[(1-\pi)p_{Y_i}(y_i;\mu_2)\right]^{(1-z_i)}.
\]
So,
\[
Q^{(t)}(y;\pi,\mu_1,\mu_2) = \sum_{i=1}^{n}E_{Z_i}\left\{z_i\ln[\hat{\pi}^{(t-1)}p_{Y_i}(y_i;\hat{\mu}_1^{(t-1)})] + (1-z_i)\ln[(1-\hat{\pi}^{(t-1)})p_{Y_i}(y_i;\hat{\mu}_2^{(t-1)})] \,\middle|\, y_i\right\} = \sum_{i=1}^{n}\left[(C_{1i}-C_{2i})E(Z_i \mid y_i) + C_{2i}\right],
\]
where $C_{1i} = \ln[\hat{\pi}^{(t-1)}p_{Y_i}(y_i;\hat{\mu}_1^{(t-1)})]$ and $C_{2i} = \ln[(1-\hat{\pi}^{(t-1)})p_{Y_i}(y_i;\hat{\mu}_2^{(t-1)})]$ are constants with respect to $Z_i$. Now,
\[
E(Z_i \mid y_i) = \mathrm{pr}(Z_i = 1 \mid y_i) = \frac{p_{Y_i}(y_i;\hat{\mu}_1^{(t-1)})\,\mathrm{pr}(Z_i=1)}{p_{Y_i}(y_i;\hat{\mu}_1^{(t-1)})\,\mathrm{pr}(Z_i=1) + p_{Y_i}(y_i;\hat{\mu}_2^{(t-1)})\,\mathrm{pr}(Z_i=0)}
= \frac{\hat{\pi}^{(t-1)}p_{Y_i}(y_i;\hat{\mu}_1^{(t-1)})}{\hat{\pi}^{(t-1)}p_{Y_i}(y_i;\hat{\mu}_1^{(t-1)}) + (1-\hat{\pi}^{(t-1)})p_{Y_i}(y_i;\hat{\mu}_2^{(t-1)})} = \hat{Z}_i^{(t)}, \text{ say}.
\]
Note that $\hat{Z}_i^{(t)}$ is the $t$th iteration estimate of the probability that the $i$th fish was born in a Pfiesteria-rich site. Also, when $t = 1$, $\hat{\pi}^{(0)}$, $\hat{\mu}_1^{(0)}$, and $\hat{\mu}_2^{(0)}$ are the well-chosen initial values that must be specified to start the EM algorithm iteration process.
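Thus,
\[
\hat{Q}^{(t)}(y;\pi,\mu_1,\mu_2) \equiv \hat{Q}^{(t)} = \sum_{i=1}^{n}\left[(C_{1i}-C_{2i})\hat{Z}_i^{(t)} + C_{2i}\right]
= \sum_{i=1}^{n}\left\{\hat{Z}_i^{(t)}\ln[\hat{\pi}^{(t-1)}p_{Y_i}(y_i;\hat{\mu}_1^{(t-1)})] + (1-\hat{Z}_i^{(t)})\ln[(1-\hat{\pi}^{(t-1)})p_{Y_i}(y_i;\hat{\mu}_2^{(t-1)})]\right\}.
\]
For the M-step, maximizing $\hat{Q}^{(t)}$ with respect to $\pi$ yields
\[
\frac{\partial\hat{Q}^{(t)}}{\partial\pi} = \frac{\partial}{\partial\pi}\sum_{i=1}^{n}\left\{\hat{Z}_i^{(t)}\left[\ln\pi + \ln p_{Y_i}(y_i;\mu_1)\right] + (1-\hat{Z}_i^{(t)})\left[\ln(1-\pi) + \ln p_{Y_i}(y_i;\mu_2)\right]\right\} = 0
\]
\[
\Rightarrow \frac{\sum_{i=1}^{n}\hat{Z}_i^{(t)}}{\pi} - \frac{n-\sum_{i=1}^{n}\hat{Z}_i^{(t)}}{1-\pi} = 0
\Rightarrow \hat{\pi}^{(t)} = \frac{\sum_{i=1}^{n}\hat{Z}_i^{(t)}}{n}.
\]
Thus, $\hat{\pi}^{(t)}$ is the sample average estimated probability that a randomly selected fish was born in a Pfiesteria-rich site. And,
\[
\frac{\partial\hat{Q}^{(t)}}{\partial\mu_1} = \frac{\partial}{\partial\mu_1}\sum_{i=1}^{n}\left\{\hat{Z}_i^{(t)}\left[\ln\pi + \ln p_{Y_i}(y_i;\mu_1)\right] + (1-\hat{Z}_i^{(t)})\left[\ln(1-\pi) + \ln p_{Y_i}(y_i;\mu_2)\right]\right\}
= \frac{\partial}{\partial\mu_1}\sum_{i=1}^{n}\hat{Z}_i^{(t)}\left[y_i\ln\mu_1 - \mu_1 - \ln y_i!\right]
= \frac{\sum_{i=1}^{n}\hat{Z}_i^{(t)}y_i}{\mu_1} - \sum_{i=1}^{n}\hat{Z}_i^{(t)} = 0
\]
\[
\Rightarrow \hat{\mu}_1^{(t)} = \frac{\sum_{i=1}^{n}\hat{Z}_i^{(t)}y_i}{\sum_{i=1}^{n}\hat{Z}_i^{(t)}}.
\]
Note that $\hat{\mu}_1^{(t)}$ is a weighted estimate of the average number of ulcerative lesions for fish born in Pfiesteria-rich sites. Similarly, it can be shown that $\hat{\mu}_2^{(t)} = \sum_{i=1}^{n}(1-\hat{Z}_i^{(t)})y_i / \sum_{i=1}^{n}(1-\hat{Z}_i^{(t)})$ is a weighted estimate of the average number of ulcerative lesions for fish born in Pfiesteria-free sites.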
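The E-step and M-step formulas of Solution 4.30 translate directly into an iteration. The sketch below is illustrative only: the data are simulated from a hypothetical two-component Poisson mixture (the mixing proportion 0.4 and means 8 and 2 are assumptions, not values from the exercise), and the starting values and iteration count are arbitrary choices.

```python
import math
import random

def poisson_pmf(y, mu):
    """Poisson probability p_Y(y; mu), evaluated on the log scale."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def em_update(y, pi_hat, mu1, mu2):
    """One E-step (the Z-hat values) followed by one M-step."""
    z = []
    for yi in y:
        a = pi_hat * poisson_pmf(yi, mu1)
        b = (1.0 - pi_hat) * poisson_pmf(yi, mu2)
        z.append(a / (a + b))
    sz = sum(z)
    n = len(y)
    new_pi = sz / n
    new_mu1 = sum(zi * yi for zi, yi in zip(z, y)) / sz
    new_mu2 = sum((1.0 - zi) * yi for zi, yi in zip(z, y)) / (n - sz)
    return new_pi, new_mu1, new_mu2

def observed_loglik(y, pi_hat, mu1, mu2):
    """Log-likelihood of the observed (mixture) data."""
    return sum(math.log(pi_hat * poisson_pmf(yi, mu1)
                        + (1.0 - pi_hat) * poisson_pmf(yi, mu2)) for yi in y)

def draw_poisson(rng, mu):
    """Poisson variate by Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

# Hypothetical simulated lesion counts (assumed mixture, for illustration).
rng = random.Random(2010)
y = [draw_poisson(rng, 8.0 if rng.random() < 0.4 else 2.0) for _ in range(400)]

params = (0.5, 6.0, 1.0)  # arbitrary starting values pi(0), mu1(0), mu2(0)
logliks = [observed_loglik(y, *params)]
for _ in range(100):
    params = em_update(y, *params)
    logliks.append(observed_loglik(y, *params))
pi_hat, mu1_hat, mu2_hat = params
```

Two properties are worth inspecting in the output: the observed-data log-likelihood in `logliks` should never decrease from one iteration to the next, and after any M-step the fitted mixture mean pi_hat*mu1_hat + (1 - pi_hat)*mu2_hat equals the sample mean of y, which follows from adding the two weighted-average formulas.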
Solution 4.31*
(a) Let $I_E(x)$ be the indicator function for the set $E$, so that $I_E(x)$ equals 1 if $x \in E$ and $I_E(x)$ equals 0 otherwise. Then, letting $A = \{1, 2, \ldots, \theta\}$ and letting $B = \{1, 2, \ldots, \infty\}$, we have
\[
p_{X_1,X_2,\ldots,X_n}(x_1,x_2,\ldots,x_n;\theta) = \prod_{i=1}^{n}\theta^{-1}I_A(x_i) = \left[(\theta^{-n})I_A(x_{(n)})\right]\cdot\left[\prod_{i=1}^{n}I_B(x_i)\right] = g(u;\theta)\cdot h(x_1,x_2,\ldots,x_n),
\]
where $u = x_{(n)} = \max\{x_1, x_2, \ldots, x_n\}$. And, given $X_{(n)} = x_{(n)}$, $h(x_1, x_2, \ldots, x_n)$ does not in any way depend on $\theta$, so that $X_{(n)}$ is a sufficient statistic for $\theta$. Also,
\[
E(U^*) = E[(2X_1-1)] = 2E(X_1)-1 = 2\sum_{x_1=1}^{\theta}x_1\theta^{-1} - 1 = \frac{2}{\theta}\sum_{x_1=1}^{\theta}x_1 - 1 = \frac{2}{\theta}\left[\frac{\theta(\theta+1)}{2}\right] - 1 = \theta.
\]
(b) For notational ease, let $X_{(n)} = U$. Now, $\hat{\theta} = E(U^* \mid U = u) = E(2X_1-1 \mid U = u) = 2E(X_1 \mid U = u) - 1$, so we need to evaluate $E(X_1 \mid U = u)$. To do so, we need to first find
\[
p_{X_1}(x_1 \mid U = u) = \frac{p_{X_1,U}(x_1,u)}{p_U(u)} = \frac{\mathrm{pr}[(X_1 = x_1)\cap(U = u)]}{p_U(u)}.
\]
Now,
\[
\mathrm{pr}(U = u) = \mathrm{pr}(U \le u) - \mathrm{pr}(U \le u-1) = \mathrm{pr}\left[\bigcap_{i=1}^{n}(X_i \le u)\right] - \mathrm{pr}\left[\bigcap_{i=1}^{n}(X_i \le u-1)\right]
\]
\[
= \prod_{i=1}^{n}\mathrm{pr}(X_i \le u) - \prod_{i=1}^{n}\mathrm{pr}(X_i \le u-1) = \left(\frac{u}{\theta}\right)^n - \left(\frac{u-1}{\theta}\right)^n, \quad u = 1, 2, \ldots, \theta.
\]
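And,
\[
\mathrm{pr}[(X_1 = x_1)\cap(U = u)] =
\begin{cases}
0, & x_1 > u,\\[2pt]
\dfrac{1}{\theta}\left(\dfrac{u}{\theta}\right)^{n-1}, & x_1 = u,\\[6pt]
\dfrac{1}{\theta}\left[\left(\dfrac{u}{\theta}\right)^{n-1} - \left(\dfrac{u-1}{\theta}\right)^{n-1}\right], & x_1 < u.
\end{cases}
\]
Dividing by $\mathrm{pr}(U = u)$ gives
\[
p_{X_1}(x_1 \mid U = u) =
\begin{cases}
\dfrac{u^{n-1}}{u^n-(u-1)^n}, & x_1 = u,\\[6pt]
\dfrac{u^{n-1}-(u-1)^{n-1}}{u^n-(u-1)^n}, & x_1 = 1, 2, \ldots, u-1,
\end{cases}
\]
which cannot (and does not) depend in any way on $\theta$ by the sufficiency principle. So,
\[
E(X_1 \mid U = u) = \sum_{x_1=1}^{u-1}x_1\left[\frac{u^{n-1}-(u-1)^{n-1}}{u^n-(u-1)^n}\right] + u\left[\frac{u^{n-1}}{u^n-(u-1)^n}\right]
= \frac{(u-1)u}{2}\left[\frac{u^{n-1}-(u-1)^{n-1}}{u^n-(u-1)^n}\right] + \frac{u^n}{u^n-(u-1)^n}
= \frac{u^{n+1}-u(u-1)^n+u^n}{2\left[u^n-(u-1)^n\right]}.
\]
Thus,
\[
2E(X_1 \mid U = u) - 1 = \frac{u^{n+1}-u(u-1)^n+u^n}{u^n-(u-1)^n} - 1 = \frac{u^{n+1}-(u-1)^{n+1}}{u^n-(u-1)^n},
\]
so that the MVUE of $\theta$ is
\[
\hat{\theta} = \frac{U^{n+1}-(U-1)^{n+1}}{U^n-(U-1)^n}.
\]
As a simple check, when $n = 1$, so that $U = X_1$, we obtain
\[
\hat{\theta} = \frac{U^2-(U-1)^2}{U-(U-1)} = 2U-1 = 2X_1-1,
\]
as desired.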
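Note that
\[
E(\hat{\theta}) = \sum_{u=1}^{\theta}\hat{\theta}\,\mathrm{pr}(U = u) = \sum_{u=1}^{\theta}\left[\frac{u^{n+1}-(u-1)^{n+1}}{u^n-(u-1)^n}\right]\left[\frac{u^n-(u-1)^n}{\theta^n}\right] = \frac{1}{\theta^n}\sum_{u=1}^{\theta}\left[u^{n+1}-(u-1)^{n+1}\right]
\]
\[
= \frac{1}{\theta^n}\left\{(1-0) + (2^{n+1}-1) + (3^{n+1}-2^{n+1}) + \cdots + \left[\theta^{n+1}-(\theta-1)^{n+1}\right]\right\} = \frac{\theta^{n+1}}{\theta^n} = \theta.
\]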
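The telescoping argument for $E(\hat{\theta}) = \theta$ can also be confirmed by exact enumeration. The sketch below uses rational arithmetic so that no rounding is involved; the function names are chosen here for illustration only.

```python
from fractions import Fraction

def theta_hat(u, n):
    """MVUE [u^(n+1) - (u-1)^(n+1)] / [u^n - (u-1)^n] as an exact fraction."""
    return Fraction(u ** (n + 1) - (u - 1) ** (n + 1), u ** n - (u - 1) ** n)

def pr_max(u, n, theta):
    """pr(U = u) = (u/theta)^n - ((u-1)/theta)^n for the sample maximum U."""
    return Fraction(u ** n - (u - 1) ** n, theta ** n)

def expected_theta_hat(n, theta):
    """E(theta_hat), summing over u = 1, ..., theta."""
    return sum(theta_hat(u, n) * pr_max(u, n, theta)
               for u in range(1, theta + 1))

# The value for n = 5 and u = 2 is 63/31, about 2.0323.
example = theta_hat(2, 5)
```

For instance, `expected_theta_hat(5, 7)` returns exactly 7, and `theta_hat(u, 1)` reduces to 2u - 1 for every u, matching the n = 1 check.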
As a numerical example, if $n = 5$ and $u = 2$, then $\hat{\theta} = (2^6-1^6)/(2^5-1^5) = 2.0323$. So, one disadvantage of $\hat{\theta}$ is that it does not necessarily take positive integer values, even though the parameter $\theta$ is a positive integer.

Solution 4.32*
(a) First, note that
\[
F_X(x) = \int_{\theta}^{x}(1)\,dx = (x-\theta), \quad 0 < \theta < x < (\theta+1) < +\infty.
\]
Hence, from order statistics theory, it follows directly that
\[
f_{X_{(1)}}(x_{(1)}) = n\left[1-F_X(x_{(1)})\right]^{n-1}f_X(x_{(1)}) = n\left[1-(x_{(1)}-\theta)\right]^{n-1}, \quad 0 < \theta < x_{(1)} < (\theta+1) < +\infty.
\]
Then, with $u = (1+\theta)-x_{(1)}$, so that $du = -dx_{(1)}$, we have, for $r$ a nonnegative integer,
\[
E\left(X_{(1)}^r\right) = \int_{\theta}^{\theta+1}x_{(1)}^r\,n\left[(1+\theta)-x_{(1)}\right]^{n-1}dx_{(1)} = \int_0^1\left[(1+\theta)-u\right]^r n u^{n-1}\,du
\]
\[
= n\int_0^1\left[\sum_{j=0}^{r}C_j^r(1+\theta)^j(-u)^{r-j}\right]u^{n-1}\,du = n\sum_{j=0}^{r}C_j^r(1+\theta)^j(-1)^{r-j}\int_0^1 u^{n+r-j-1}\,du
\]
\[
= n\sum_{j=0}^{r}C_j^r(1+\theta)^j(-1)^{r-j}(n+r-j)^{-1}.
\]
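When $r = 1$, we obtain $E(X_{(1)}) = \theta + 1/(n+1)$. Also, for $r = 2$, we obtain
\[
E\left(X_{(1)}^2\right) = n\left[\frac{1}{(n+2)} + \frac{2(1+\theta)(-1)}{n+1} + \frac{(1+\theta)^2}{n}\right],
\]
so that
\[
V(X_{(1)}) = E\left(X_{(1)}^2\right) - \left[E(X_{(1)})\right]^2 = \frac{n}{(n+1)^2(n+2)}.
\]
By symmetry,
\[
E(X_{(n)}) = (\theta+1) - \frac{1}{(n+1)} \quad\text{and}\quad V(X_{(n)}) = \frac{n}{(n+1)^2(n+2)}.
\]
Or, more directly, one can use $f_{X_{(n)}}(x_{(n)}) = n\left[x_{(n)}-\theta\right]^{n-1}$, $0 < \theta < x_{(n)} < (\theta+1) < +\infty$, to show that
\[
E\left(X_{(n)}^r\right) = n\sum_{j=0}^{r}C_j^r\theta^{r-j}(n+j)^{-1}.
\]
Thus,
\[
E(\hat{\theta}_1) = \frac{1}{2}\left[E(X_{(1)}) + E(X_{(n)}) - 1\right] = \frac{1}{2}\left[\theta + \frac{1}{(n+1)} + (\theta+1) - \frac{1}{(n+1)} - 1\right] = \theta.
\]
And,
\[
E(\hat{\theta}_2) = \frac{1}{(n-1)}\left[nE(X_{(1)}) - E(X_{(n)})\right] = \frac{1}{(n-1)}\left\{n\left[\theta + \frac{1}{(n+1)}\right] - \left[(\theta+1) - \frac{1}{(n+1)}\right]\right\} = \theta.
\]
To find the variances of the estimators $\hat{\theta}_1$ and $\hat{\theta}_2$, we need to find $\mathrm{cov}[X_{(1)}, X_{(n)}]$. If we let $Y_i = (X_i-\theta)$, $i = 1, 2, \ldots, n$, then $f_{Y_i}(y_i) = 1$, $0 < y_i < 1$. Also, $Y_{(1)} = \min\{Y_1, Y_2, \ldots, Y_n\} = (X_{(1)}-\theta)$ and $Y_{(n)} = \max\{Y_1, Y_2, \ldots, Y_n\} = (X_{(n)}-\theta)$, so that $\mathrm{cov}[X_{(1)}, X_{(n)}] = \mathrm{cov}[Y_{(1)}, Y_{(n)}]$.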
Now, since $f_{Y_{(1)},Y_{(n)}}(y_{(1)},y_{(n)}) = n(n-1)(y_{(n)}-y_{(1)})^{n-2}$, $0 < y_{(1)} < y_{(n)} < 1$, we have
\[
E(Y_{(1)}Y_{(n)}) = \int_0^1\int_{y_{(1)}}^1\left[y_{(1)}y_{(n)}\right]n(n-1)(y_{(n)}-y_{(1)})^{n-2}\,dy_{(n)}\,dy_{(1)}.
\]
So, using the relationship $w = (y_{(n)}-y_{(1)})$, so that $dw = dy_{(n)}$, and appealing to properties of the beta distribution, we obtain
\[
E(Y_{(1)}Y_{(n)}) = n(n-1)\int_0^1\int_0^{1-y_{(1)}}y_{(1)}(w+y_{(1)})w^{n-2}\,dw\,dy_{(1)}
= n(n-1)\int_0^1\int_0^{1-y_{(1)}}\left[y_{(1)}w^{n-1} + y_{(1)}^2w^{n-2}\right]dw\,dy_{(1)}
\]
\[
= n(n-1)\int_0^1\left[\frac{y_{(1)}(1-y_{(1)})^n}{n} + \frac{y_{(1)}^2(1-y_{(1)})^{n-1}}{(n-1)}\right]dy_{(1)}
= n(n-1)\left[\frac{\Gamma(2)\Gamma(n+1)/\Gamma(n+3)}{n} + \frac{\Gamma(3)\Gamma(n)/\Gamma(n+3)}{(n-1)}\right]
\]
\[
= n(n-1)\left[\frac{1}{n(n+1)(n+2)} + \frac{2}{(n-1)n(n+1)(n+2)}\right] = \frac{1}{(n+2)}.
\]
Finally,
\[
\mathrm{cov}(X_{(1)},X_{(n)}) = \mathrm{cov}(Y_{(1)},Y_{(n)}) = E(Y_{(1)}Y_{(n)}) - E(Y_{(1)})E(Y_{(n)}) = \frac{1}{(n+2)} - \left(\frac{1}{n+1}\right)\left(\frac{n}{n+1}\right) = \frac{1}{(n+1)^2(n+2)}.
\]
So,
\[
V(\hat{\theta}_1) = \left(\frac{1}{2}\right)^2\left[V(X_{(1)}) + V(X_{(n)}) + 2\,\mathrm{cov}(X_{(1)},X_{(n)})\right]
= \frac{1}{4}\left[\frac{n}{(n+1)^2(n+2)} + \frac{n}{(n+1)^2(n+2)} + \frac{2}{(n+1)^2(n+2)}\right] = \frac{1}{2(n+1)(n+2)}.
\]
And,
\[
V(\hat{\theta}_2) = \frac{1}{(n-1)^2}\left[n^2V(X_{(1)}) + V(X_{(n)}) - 2n\,\mathrm{cov}(X_{(1)},X_{(n)})\right]
= \frac{1}{(n-1)^2}\left[\frac{n^3}{(n+1)^2(n+2)} + \frac{n}{(n+1)^2(n+2)} - \frac{2n}{(n+1)^2(n+2)}\right]
\]
\[
= \frac{n}{(n^2-1)(n+2)} = \frac{n}{(n-1)(n+1)(n+2)}.
\]
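The two variance formulas can be compared against a small Monte Carlo experiment. This is only a sketch under stated assumptions: the estimators are taken to be $\hat{\theta}_1 = \frac{1}{2}[X_{(1)} + X_{(n)} - 1]$ and $\hat{\theta}_2 = [nX_{(1)} - X_{(n)}]/(n-1)$, the forms implied by the expectation calculations in this solution, and the values n = 5, θ = 3, the seed, and the number of replications are arbitrary.

```python
import random

def simulate(n, theta, reps, seed=0):
    """Draw reps samples of size n from Uniform(theta, theta + 1)."""
    rng = random.Random(seed)
    est1, est2 = [], []
    for _ in range(reps):
        xs = [theta + rng.random() for _ in range(n)]
        lo, hi = min(xs), max(xs)
        est1.append(0.5 * (lo + hi - 1.0))       # assumed form of theta_hat_1
        est2.append((n * lo - hi) / (n - 1.0))   # assumed form of theta_hat_2
    return est1, est2

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

n, theta = 5, 3.0
est1, est2 = simulate(n, theta, reps=40000)

v1_theory = 1.0 / (2 * (n + 1) * (n + 2))        # 1 / [2(n+1)(n+2)]
v2_theory = n / ((n - 1.0) * (n + 1) * (n + 2))  # n / [(n-1)(n+1)(n+2)]
v1_sim, v2_sim = variance(est1), variance(est2)
```

With these settings both simulated means should sit close to θ = 3 and the simulated variances close to 1/84 and 5/168, respectively, with the first estimator clearly less variable.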
Thus, $V(\hat{\theta}_1) < V(\hat{\theta}_2)$, $n > 1$.
(b) Now, $V(W) = c_1^2\sigma^2 + c_2^2\sigma^2 + 2c_1c_2\sigma_{12} = [c_1^2 + (1-c_1)^2]\sigma^2 + 2c_1(1-c_1)\sigma_{12}$. So,
\[
\frac{\partial V(W)}{\partial c_1} = [2c_1 - 2(1-c_1)]\sigma^2 + 2(1-2c_1)\sigma_{12} = 0
\]
gives $2(2c_1-1)(\sigma^2-\sigma_{12}) = 0$. Thus, if $c_1 = c_2 = \frac{1}{2}$, then $V(W)$ is minimized as long as $\sigma^2 > \sigma_{12}$. Note that these conditions are met by the estimator $\hat{\theta}_1$, but not by the estimator $\hat{\theta}_2$. Also, another drawback associated with the estimator $\hat{\theta}_2$ is that it can take a negative value, even though $\theta > 0$.
(c) Let $I_A(x)$ be the indicator function for the set $A$, so that $I_A(x)$ equals 1 if $x \in A$ and $I_A(x)$ equals 0 otherwise. Then, with $A$ equal to the open interval $(\theta, \theta+1)$, the joint distribution of $X_1, X_2, \ldots, X_n$ can be written in the form
\[
(1)^n\prod_{i=1}^{n}I_{(\theta,\theta+1)}(x_i) = (1)^n\left\{I_{(\theta,\theta+1)}[x_{(1)}]\right\}\left\{I_{(\theta,\theta+1)}[x_{(n)}]\right\},
\]
since $0 < \theta < x_{(1)} \le x_i \le x_{(n)} < (\theta+1) < +\infty$, $i = 1, 2, \ldots, n$. Hence, by the Factorization Theorem, $X_{(1)}$ and $X_{(n)}$ are jointly sufficient for $\theta$. However, $X_{(1)}$ and $X_{(n)}$ do not constitute a set of complete sufficient statistics for $\theta$ since $E\left[g(X_{(1)},X_{(n)})\right] = 0$ for all $\theta$, $0 < \theta < +\infty$, where
\[
g(X_{(1)},X_{(n)}) = X_{(1)} - X_{(n)} + \left(\frac{n-1}{n+1}\right).
\]
This finding raises the interesting question about whether or not the estimator $\hat{\theta}_1$ could be the MVUE of $\theta$, even though it is not a function of complete sufficient statistics. For more general discussion about this issue, see Bondesson (1983).

Solution 4.33*
(a) Under the independence assumption,
\[
E\begin{pmatrix} X_{yy}\\ X_{yn}\\ X_{ny}\end{pmatrix} = \begin{pmatrix} N\pi_1\pi_2\\ N\pi_1(1-\pi_2)\\ N(1-\pi_1)\pi_2\end{pmatrix}.
\]
Hence, the method-of-moments estimator of $N$ is obtained by solving for $N$ using the three equations
\[
X_{yy} = N\pi_1\pi_2, \qquad (4.1)
\]
\[
X_{yn} = N\pi_1(1-\pi_2), \qquad (4.2)
\]
\[
X_{ny} = N(1-\pi_1)\pi_2. \qquad (4.3)
\]
The operations [(Equation 4.1) + (Equation 4.2)] and [(Equation 4.1) + (Equation 4.3)] give
\[
(X_{yy}+X_{yn}) = N\pi_1, \qquad (4.4)
\]
\[
(X_{yy}+X_{ny}) = N\pi_2. \qquad (4.5)
\]
Finally, the operation [(Equation 4.4) × (Equation 4.5)] / (Equation 4.1) produces
\[
\hat{N} = \frac{(X_{yy}+X_{yn})(X_{yy}+X_{ny})}{X_{yy}}.
\]
Note that $\hat{N}$ does not necessarily take integer values. Also, $E[\hat{N}]$ and $V[\hat{N}]$ are undefined since $X_{yy} = 0$ occurs with nonzero probability. When $x_{yy} = 12{,}000$, $x_{yn} = 6000$, and $x_{ny} = 8000$, we have
\[
\hat{N} = \frac{[(12{,}000)+(6000)][(12{,}000)+(8000)]}{(12{,}000)} = 30{,}000.
\]
(b) Since
\[
\mathrm{odds}(E_1 \mid E_2) = \frac{\mathrm{pr}(E_1 \mid E_2)}{1-\mathrm{pr}(E_1 \mid E_2)} = \frac{\pi_{yy}/(\pi_{yy}+\pi_{ny})}{\pi_{ny}/(\pi_{yy}+\pi_{ny})} = \frac{\pi_{yy}}{\pi_{ny}}
\]
and
\[
\mathrm{odds}(E_1 \mid \bar{E}_2) = \frac{\mathrm{pr}(E_1 \mid \bar{E}_2)}{1-\mathrm{pr}(E_1 \mid \bar{E}_2)} = \frac{\pi_{yn}/(\pi_{yn}+\pi_{nn})}{\pi_{nn}/(\pi_{yn}+\pi_{nn})} = \frac{\pi_{yn}}{\pi_{nn}},
\]
the assumption that $\mathrm{odds}(E_1 \mid E_2) = k\,\mathrm{odds}(E_1 \mid \bar{E}_2)$ implies that
\[
\frac{\pi_{yy}/\pi_{ny}}{\pi_{yn}/\pi_{nn}} = k
\]
or, equivalently, $\pi_{yy}(1-\pi_{yy}-\pi_{yn}-\pi_{ny}) = k\pi_{yn}\pi_{ny}$.
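So, a method-of-moments estimator of $N$ is obtained by simultaneously solving the four equations
\[
X_{yy} = N\pi_{yy}, \qquad (4.6)
\]
\[
X_{yn} = N\pi_{yn}, \qquad (4.7)
\]
\[
X_{ny} = N\pi_{ny}, \qquad (4.8)
\]
\[
\pi_{yy}(1-\pi_{yy}-\pi_{yn}-\pi_{ny}) = k\pi_{yn}\pi_{ny}. \qquad (4.9)
\]
Equations 4.6, 4.7, and 4.8 imply that $\hat{\pi}_{yy} = X_{yy}/N$, $\hat{\pi}_{yn} = X_{yn}/N$, and $\hat{\pi}_{ny} = X_{ny}/N$. Substituting these expressions into Equation 4.9 yields
\[
\left(\frac{X_{yy}}{N}\right)\left[\frac{(N-X_{yy}-X_{yn}-X_{ny})}{N}\right] = k\left(\frac{X_{yn}}{N}\right)\left(\frac{X_{ny}}{N}\right),
\]
giving
\[
\tilde{N}(k) = \frac{X_{yy}^2 + X_{yy}X_{yn} + X_{yy}X_{ny} + kX_{yn}X_{ny}}{X_{yy}}.
\]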
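Both method-of-moments estimators of Solution 4.33 are simple enough to evaluate directly. The short sketch below uses the counts quoted in this solution ($x_{yy} = 12{,}000$, $x_{yn} = 6000$, $x_{ny} = 8000$); the function names are illustrative, and exact rational arithmetic is used so that the results can be compared without rounding.

```python
from fractions import Fraction

def n_hat_independent(x_yy, x_yn, x_ny):
    """N-hat = (Xyy + Xyn)(Xyy + Xny) / Xyy, which assumes independence."""
    return Fraction((x_yy + x_yn) * (x_yy + x_ny), x_yy)

def n_tilde(k, x_yy, x_yn, x_ny):
    """N-tilde(k) = (Xyy^2 + Xyy Xyn + Xyy Xny + k Xyn Xny) / Xyy."""
    k = Fraction(k)
    return (x_yy ** 2 + x_yy * x_yn + x_yy * x_ny + k * x_yn * x_ny) / x_yy

counts = (12000, 6000, 8000)
n_indep = n_hat_independent(*counts)
n_by_k = {k: n_tilde(k, *counts)
          for k in (Fraction(1, 2), Fraction(1), Fraction(2), Fraction(4))}
```

Setting k = 1 in the second function recovers the independence-based estimator, since the numerator then factors as $(X_{yy}+X_{yn})(X_{yy}+X_{ny})$.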
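When $x_{yy} = 12{,}000$, $x_{yn} = 6000$, and $x_{ny} = 8000$, we have
\[
\tilde{N}(1/2) = \frac{(12{,}000)^2 + (12{,}000)(6000) + (12{,}000)(8000) + (1/2)(6000)(8000)}{12{,}000} = 28{,}000,
\]
\[
\tilde{N}(2) = \frac{(12{,}000)^2 + (12{,}000)(6000) + (12{,}000)(8000) + (2)(6000)(8000)}{12{,}000} = 34{,}000,
\]
\[
\tilde{N}(4) = \frac{(12{,}000)^2 + (12{,}000)(6000) + (12{,}000)(8000) + (4)(6000)(8000)}{12{,}000} = 42{,}000.
\]
Apparently, the estimator $\hat{N}$, which assumes independence, has a tendency to overestimate $N$ when $k < 1$ and to underestimate $N$ when $k > 1$.

Solution 4.34*
(a) Let $\pi(x_i) \equiv \pi_i = e^{\alpha+\beta x_i}/(1+e^{\alpha+\beta x_i})$. The likelihood function $L$ is equal to
\[
L = \prod_{i=1}^{n}\pi_i^{y_i}(1-\pi_i)^{1-y_i},
\]
so that
\[
\ln L = \sum_{i=1}^{n}\left[y_i\ln(\pi_i) + (1-y_i)\ln(1-\pi_i)\right].
\]
By the chain rule,
\[
\frac{\partial\ln L}{\partial\alpha} = \sum_{i=1}^{n}\frac{\partial\ln L}{\partial\pi_i}\cdot\frac{\partial\pi_i}{\partial\alpha} \quad\text{and}\quad \frac{\partial\ln L}{\partial\beta} = \sum_{i=1}^{n}\frac{\partial\ln L}{\partial\pi_i}\cdot\frac{\partial\pi_i}{\partial\beta}.
\]
Now,
\[
\frac{\partial\pi_i}{\partial\alpha} = \frac{\partial}{\partial\alpha}\left[\frac{e^{\alpha+\beta x_i}}{(1+e^{\alpha+\beta x_i})}\right] = \frac{e^{\alpha+\beta x_i}}{(1+e^{\alpha+\beta x_i})^2} = \pi_i(1-\pi_i)
\]
and
\[
\frac{\partial\pi_i}{\partial\beta} = \frac{\partial}{\partial\beta}\left[\frac{e^{\alpha+\beta x_i}}{(1+e^{\alpha+\beta x_i})}\right] = \frac{x_ie^{\alpha+\beta x_i}}{(1+e^{\alpha+\beta x_i})^2} = x_i\pi_i(1-\pi_i).
\]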
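Thus,
\[
\frac{\partial\ln L}{\partial\alpha} = \sum_{i=1}^{n}\frac{\partial\ln L}{\partial\pi_i}\cdot\frac{\partial\pi_i}{\partial\alpha} = \sum_{i=1}^{n}\left(\frac{y_i}{\pi_i} - \frac{1-y_i}{1-\pi_i}\right)\pi_i(1-\pi_i) = \sum_{i=1}^{n}(y_i-\pi_i) = 0;
\]
\[
\Rightarrow \sum_{i=1}^{n}y_i = \sum_{i=1}^{n}\pi_i = \sum_{i=1}^{n}\frac{e^{\alpha+\beta x_i}}{(1+e^{\alpha+\beta x_i})} = \sum_{i=1}^{n_0}\frac{e^{\alpha}}{(1+e^{\alpha})} + \sum_{i=n_0+1}^{n}\frac{e^{\alpha+\beta}}{(1+e^{\alpha+\beta})}
\]
\[
\Rightarrow \sum_{i=1}^{n}y_i = n_0\frac{e^{\alpha}}{(1+e^{\alpha})} + n_1\frac{e^{\alpha+\beta}}{(1+e^{\alpha+\beta})}.
\]
Similarly,
\[
\frac{\partial\ln L}{\partial\beta} = \sum_{i=1}^{n}\frac{\partial\ln L}{\partial\pi_i}\cdot\frac{\partial\pi_i}{\partial\beta} = \sum_{i=1}^{n}\left(\frac{y_i}{\pi_i} - \frac{1-y_i}{1-\pi_i}\right)x_i\pi_i(1-\pi_i) = \sum_{i=1}^{n}x_i(y_i-\pi_i) = \sum_{i=n_0+1}^{n}(y_i-\pi_i) = 0;
\]
\[
\Rightarrow \sum_{i=n_0+1}^{n}y_i = \sum_{i=n_0+1}^{n}\pi_i = \sum_{i=n_0+1}^{n}\frac{e^{\alpha+\beta}}{(1+e^{\alpha+\beta})} = n_1\frac{e^{\alpha+\beta}}{(1+e^{\alpha+\beta})}.
\]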
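Via subtraction, we obtain
\[
\sum_{i=1}^{n}y_i - \sum_{i=n_0+1}^{n}y_i = n_0\frac{e^{\alpha}}{(1+e^{\alpha})}
\Rightarrow \sum_{i=1}^{n_0}y_i = n_0\frac{e^{\alpha}}{(1+e^{\alpha})}
\Rightarrow \hat{\alpha} = \ln\left(\frac{p_0}{1-p_0}\right),
\]
where $p_0 = n_0^{-1}\sum_{i=1}^{n_0}y_i$ is the sample proportion of overweight infants receiving home care. Then, it follows directly that
\[
\sum_{i=n_0+1}^{n}y_i = n_1\frac{e^{\hat{\alpha}+\hat{\beta}}}{1+e^{\hat{\alpha}+\hat{\beta}}}
\Rightarrow \hat{\beta} = \ln\left(\frac{p_1}{1-p_1}\right) - \hat{\alpha} = \ln\left(\frac{p_1}{1-p_1}\right) - \ln\left(\frac{p_0}{1-p_0}\right) = \ln\left[\frac{p_1/(1-p_1)}{p_0/(1-p_0)}\right],
\]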
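where $p_1 = n_1^{-1}\sum_{i=n_0+1}^{n}y_i$ is the sample proportion of overweight infants in day care. Thus, $\hat{\alpha}$ is the sample log odds of being overweight for infants receiving home care, while $\hat{\beta}$ is the sample log odds ratio comparing the estimated odds of being overweight for infants in day care to the estimated odds of being overweight for infants receiving home care. The estimators $\hat{\alpha}$ and $\hat{\beta}$ make intuitive sense, since they are the sample counterparts of the population parameters $\alpha$ and $\beta$.
(b)
\[
-\frac{\partial^2\ln L}{\partial\alpha^2} = -\sum_{i=1}^{n}\frac{\partial(y_i-\pi_i)}{\partial\pi_i}\cdot\frac{\partial\pi_i}{\partial\alpha} = \sum_{i=1}^{n}\pi_i(1-\pi_i) = \sum_{i=1}^{n}\frac{e^{\alpha+\beta x_i}}{(1+e^{\alpha+\beta x_i})^2}
= \sum_{i=1}^{n_0}\frac{e^{\alpha}}{(1+e^{\alpha})^2} + \sum_{i=n_0+1}^{n}\frac{e^{\alpha+\beta}}{(1+e^{\alpha+\beta})^2}
= n_0\pi_0(1-\pi_0) + n_1\pi_1(1-\pi_1),
\]
where $\pi_0 = e^{\alpha}/(1+e^{\alpha})$ and $\pi_1 = e^{\alpha+\beta}/(1+e^{\alpha+\beta})$.
Also,
\[
-\frac{\partial^2\ln L}{\partial\beta^2} = -\sum_{i=1}^{n}\frac{\partial[x_i(y_i-\pi_i)]}{\partial\pi_i}\cdot\frac{\partial\pi_i}{\partial\beta} = \sum_{i=1}^{n}x_i^2\pi_i(1-\pi_i) = \sum_{i=n_0+1}^{n}\frac{e^{\alpha+\beta}}{(1+e^{\alpha+\beta})^2} = n_1\pi_1(1-\pi_1).
\]
Finally,
\[
-\frac{\partial^2\ln L}{\partial\alpha\,\partial\beta} = -\sum_{i=1}^{n}\frac{\partial(y_i-\pi_i)}{\partial\pi_i}\cdot\frac{\partial\pi_i}{\partial\beta} = \sum_{i=1}^{n}x_i\pi_i(1-\pi_i) = n_1\pi_1(1-\pi_1) = -\frac{\partial^2\ln L}{\partial\beta\,\partial\alpha}.
\]
So, with $y' = (y_1, y_2, \ldots, y_n)$, it follows that
\[
I(y;\alpha,\beta) = I(\alpha,\beta) = \begin{bmatrix} n_0\pi_0(1-\pi_0) + n_1\pi_1(1-\pi_1) & n_1\pi_1(1-\pi_1)\\ n_1\pi_1(1-\pi_1) & n_1\pi_1(1-\pi_1)\end{bmatrix}.
\]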
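Hence, using either observed or expected information, the large-sample variance–covariance matrix for $\hat{\alpha}$ and $\hat{\beta}$ is equal to
\[
I^{-1}(\alpha,\beta) = \begin{bmatrix}
\left[n_0\dfrac{e^{\alpha}}{(1+e^{\alpha})^2}\right]^{-1} & -\left[n_0\dfrac{e^{\alpha}}{(1+e^{\alpha})^2}\right]^{-1}\\[10pt]
-\left[n_0\dfrac{e^{\alpha}}{(1+e^{\alpha})^2}\right]^{-1} & \left[n_0\dfrac{e^{\alpha}}{(1+e^{\alpha})^2}\right]^{-1} + \left[n_1\dfrac{e^{\alpha+\beta}}{(1+e^{\alpha+\beta})^2}\right]^{-1}
\end{bmatrix}.
\]
(c) The estimated large-sample 95% CI for $\alpha$ is
\[
\hat{\alpha} \pm 1.96\sqrt{\left[n_0\frac{e^{\hat{\alpha}}}{(1+e^{\hat{\alpha}})^2}\right]^{-1}},
\]
and the estimated large-sample 95% CI for $\beta$ is
\[
\hat{\beta} \pm 1.96\sqrt{\left[n_0\frac{e^{\hat{\alpha}}}{(1+e^{\hat{\alpha}})^2}\right]^{-1} + \left[n_1\frac{e^{\hat{\alpha}+\hat{\beta}}}{(1+e^{\hat{\alpha}+\hat{\beta}})^2}\right]^{-1}}.
\]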
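The closed-form estimates and interval formulas of Solution 4.34 can be packaged in a few lines. In the sketch below the counts (18 of 100 home-care infants and 26 of 100 day-care infants overweight) are hypothetical illustrative values, not data quoted from the exercise; they are chosen only because they give estimates in the neighborhood of those reported in the solution. The function names are likewise assumptions.

```python
import math

def fit_two_group_logistic(y0, n0, y1, n1):
    """Closed-form MLEs and large-sample variances for the two-group model.

    y0 of n0 subjects respond in the x = 0 group, y1 of n1 in the x = 1 group.
    """
    p0, p1 = y0 / n0, y1 / n1
    alpha = math.log(p0 / (1.0 - p0))          # sample log odds, x = 0
    beta = math.log(p1 / (1.0 - p1)) - alpha   # sample log odds ratio
    a = n0 * p0 * (1.0 - p0)   # n0 e^a / (1 + e^a)^2 evaluated at alpha-hat
    b = n1 * p1 * (1.0 - p1)   # n1 e^(a+b) / (1 + e^(a+b))^2 at the MLEs
    return {"alpha": alpha, "beta": beta,
            "var_alpha": 1.0 / a,
            "var_beta": 1.0 / a + 1.0 / b,
            "cov": -1.0 / a,
            "a": a, "b": b}

def wald_ci(estimate, variance, z=1.96):
    """Large-sample 95% interval: estimate +/- z * sqrt(variance)."""
    half = z * math.sqrt(variance)
    return estimate - half, estimate + half

# Hypothetical counts (assumed for illustration only).
fit = fit_two_group_logistic(y0=18, n0=100, y1=26, n1=100)
ci_alpha = wald_ci(fit["alpha"], fit["var_alpha"])
ci_beta = wald_ci(fit["beta"], fit["var_beta"])
```

A quick consistency check is to multiply the information matrix [[a + b, b], [b, b]] by the variance matrix returned above; the product should be the 2 × 2 identity.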
For the given data, $\hat{\alpha} = -1.52$ and $\hat{\beta} = 0.47$, so that the corresponding large-sample 95% CIs for $\alpha$ and $\beta$ are $(-2.03, -1.01)$ and $(-0.21, 1.15)$, respectively. Since the CI for $\beta$ includes the value 0, there is no statistical evidence using these data that infants placed in day care are more likely to be overweight than are infants receiving home care.

Solution 4.35*. First, the parameter of interest is
\[
\gamma = E(T) = E_x[E(T \mid X = x)] = E_x[(\theta x)^{-1}] = \frac{1}{\theta}E(X^{-1}) = \frac{1}{\theta(\beta-1)}, \quad \beta > 1.
\]
Since information on the random variable $X$ is unavailable, the marginal distribution $f_T(t)$ of the random variable $T$ must be used to make ML-based inferences about the parameters $\theta$, $\beta$, and $\gamma$. In particular, the observed latency periods $t_1, t_2, \ldots, t_n$ can then be considered to be the realizations of a random sample $T_1, T_2, \ldots, T_n$ of size $n$ from
\[
f_T(t) = \int_0^{\infty}f_{T,X}(t,x)\,dx = \int_0^{\infty}f_T(t \mid X = x)f_X(x)\,dx = \int_0^{\infty}\theta xe^{-\theta xt}\cdot\frac{x^{\beta-1}e^{-x}}{\Gamma(\beta)}\,dx
\]
\[
= \frac{\theta}{\Gamma(\beta)}\int_0^{\infty}x^{\beta}e^{-(1+\theta t)x}\,dx = \frac{\theta}{\Gamma(\beta)}\Gamma(\beta+1)(1+\theta t)^{-(\beta+1)} = \theta\beta(1+\theta t)^{-(\beta+1)}, \quad t > 0,\ \theta > 0,\ \beta > 1.
\]
To produce a large-sample CI for the unknown parameter $\gamma = [\theta(\beta-1)]^{-1}$, we will employ the delta method. First, the appropriate likelihood function $L$ is
\[
L = \prod_{i=1}^{n}\theta\beta(1+\theta t_i)^{-(\beta+1)} = \theta^n\beta^n\prod_{i=1}^{n}(1+\theta t_i)^{-(\beta+1)},
\]
so that
\[
\ln L = n\ln\theta + n\ln\beta - (\beta+1)\sum_{i=1}^{n}\ln(1+\theta t_i).
\]
So, we have
\[
\frac{\partial\ln L}{\partial\theta} = \frac{n}{\theta} - (\beta+1)\sum_{i=1}^{n}\frac{t_i}{(1+\theta t_i)}
\quad\text{and}\quad
\frac{\partial\ln L}{\partial\beta} = \frac{n}{\beta} - \sum_{i=1}^{n}\ln(1+\theta t_i),
\]
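so that
\[
\frac{\partial^2\ln L}{\partial\theta^2} = \frac{-n}{\theta^2} + (\beta+1)\sum_{i=1}^{n}\frac{t_i^2}{(1+\theta t_i)^2}, \qquad
\frac{\partial^2\ln L}{\partial\beta^2} = \frac{-n}{\beta^2},
\]
and
\[
\frac{\partial^2\ln L}{\partial\theta\,\partial\beta} = \frac{\partial^2\ln L}{\partial\beta\,\partial\theta} = -\sum_{i=1}^{n}\frac{t_i}{(1+\theta t_i)}.
\]
Now, using integration by parts with $u = t$, $du = dt$, $dv = \theta(1+\theta t)^{-(\beta+2)}\,dt$, and $v = -(1+\theta t)^{-(\beta+1)}/(\beta+1)$, we have
\[
E\left[\frac{T}{(1+\theta T)}\right] = \int_0^{\infty}\frac{t}{(1+\theta t)}\,\theta\beta(1+\theta t)^{-(\beta+1)}\,dt = \frac{1}{\theta(\beta+1)}.
\]
And, using integration by parts with $u = t^2$, $du = 2t\,dt$, $dv = \theta(1+\theta t)^{-(\beta+3)}\,dt$, and $v = -(1+\theta t)^{-(\beta+2)}/(\beta+2)$, we have
\[
E\left[\frac{T^2}{(1+\theta T)^2}\right] = \int_0^{\infty}\frac{t^2}{(1+\theta t)^2}\,\theta\beta(1+\theta t)^{-(\beta+1)}\,dt = \frac{2}{\theta^2(\beta+1)(\beta+2)}.
\]
Thus, it follows directly that the expected information matrix $I$ is equal to
\[
I = \begin{bmatrix}
-E\left(\dfrac{\partial^2\ln L}{\partial\theta^2}\right) & -E\left(\dfrac{\partial^2\ln L}{\partial\theta\,\partial\beta}\right)\\[10pt]
-E\left(\dfrac{\partial^2\ln L}{\partial\theta\,\partial\beta}\right) & -E\left(\dfrac{\partial^2\ln L}{\partial\beta^2}\right)
\end{bmatrix}
= \begin{bmatrix}
\dfrac{\beta n}{(\beta+2)\theta^2} & \dfrac{n}{\theta(\beta+1)}\\[10pt]
\dfrac{n}{\theta(\beta+1)} & \dfrac{n}{\beta^2}
\end{bmatrix},
\]
and hence that
\[
I^{-1} = \begin{bmatrix}
\dfrac{\theta^2(\beta+1)^2(\beta+2)}{\beta n} & \dfrac{-\theta\beta(\beta+1)(\beta+2)}{n}\\[10pt]
\dfrac{-\theta\beta(\beta+1)(\beta+2)}{n} & \dfrac{\beta^2(\beta+1)^2}{n}
\end{bmatrix}.
\]
Now, with
\[
\delta = \left(\frac{\partial\gamma}{\partial\theta}, \frac{\partial\gamma}{\partial\beta}\right) = \left[\frac{-1}{\theta^2(\beta-1)}, \frac{-1}{\theta(\beta-1)^2}\right],
\]
use of the delta method gives, for large $n$,
\[
V(\hat{\gamma}) \approx \delta I^{-1}\delta' = \frac{(\beta+1)}{n\theta^2(\beta-1)^2}\left[\frac{(\beta+1)(\beta+2)}{\beta} - \frac{2\beta(\beta+2)}{(\beta-1)} + \frac{\beta^2(\beta+1)}{(\beta-1)^2}\right].
\]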
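The delta-method variance can be evaluated two ways as a cross-check: by inverting the 2 × 2 expected information matrix numerically and forming $\delta I^{-1}\delta'$, and by the closed-form expression. The sketch below does both at the values used in this solution ($n = 300$, $\hat{\theta} = 0.32$, $\hat{\beta} = 1.50$); the function names are illustrative.

```python
import math

def delta_variance_numeric(n, theta, beta):
    """delta I^{-1} delta' with the expected information inverted numerically."""
    i11 = beta * n / ((beta + 2.0) * theta ** 2)
    i12 = n / (theta * (beta + 1.0))
    i22 = n / beta ** 2
    det = i11 * i22 - i12 ** 2
    inv = [[i22 / det, -i12 / det], [-i12 / det, i11 / det]]
    d = [-1.0 / (theta ** 2 * (beta - 1.0)),   # d gamma / d theta
         -1.0 / (theta * (beta - 1.0) ** 2)]   # d gamma / d beta
    return sum(d[r] * inv[r][c] * d[c] for r in range(2) for c in range(2))

def delta_variance_closed_form(n, theta, beta):
    """Closed-form large-sample variance of gamma-hat."""
    lead = (beta + 1.0) / (n * theta ** 2 * (beta - 1.0) ** 2)
    bracket = ((beta + 1.0) * (beta + 2.0) / beta
               - 2.0 * beta * (beta + 2.0) / (beta - 1.0)
               + beta ** 2 * (beta + 1.0) / (beta - 1.0) ** 2)
    return lead * bracket

n, theta_hat, beta_hat = 300, 0.32, 1.50
gamma_hat = 1.0 / (theta_hat * (beta_hat - 1.0))
var_numeric = delta_variance_numeric(n, theta_hat, beta_hat)
var_closed = delta_variance_closed_form(n, theta_hat, beta_hat)
half_width = 1.96 * math.sqrt(var_closed)
ci = (gamma_hat - half_width, gamma_hat + half_width)
```

Both routes give a variance of about 2.387, and the resulting interval is roughly (3.22, 9.28).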
Then, with $n = 300$, $\hat{\theta} = 0.32$, and $\hat{\beta} = 1.50$, we obtain $\hat{\gamma} = [0.32(1.50-1)]^{-1} = 6.25$ and $\hat{V}(\hat{\gamma}) = 2.387$, so that the 95% large-sample CI for $\gamma$ is $6.25 \pm 1.96\sqrt{2.387} = 6.25 \pm 3.03$, or $(3.22, 9.28)$.

Solution 4.36*
(a) For $i = 1, 2, 3$, let $1_{n_i}' = (1, 1, \ldots, 1)$ denote the $(n_i \times 1)$ column vector of ones, and let $0_{n_2}' = (0, 0, \ldots, 0)$ denote the $(n_2 \times 1)$ column vector of zeros. Then, the $(n \times 3)$ design matrix $X$ can be written as
\[
X = \begin{bmatrix} 1_{n_1} & -1_{n_1} & 1_{n_1}\\ 1_{n_2} & 0_{n_2} & 0_{n_2}\\ 1_{n_3} & 1_{n_3} & 1_{n_3}\end{bmatrix},
\]
so that
\[
X'X = \begin{bmatrix} n & (n_3-n_1) & (n_1+n_3)\\ (n_3-n_1) & (n_1+n_3) & (n_3-n_1)\\ (n_1+n_3) & (n_3-n_1) & (n_1+n_3)\end{bmatrix}
= n\begin{bmatrix} 1 & b & a\\ b & a & b\\ a & b & a\end{bmatrix},
\]
where $a = (\pi_1+\pi_3)$ and $b = (\pi_3-\pi_1)$.
(b) From standard unweighted least-squares theory, we know that $V(\hat{\beta}_2) = c_{22}\sigma^2$, where $c_{22}$ is the last diagonal entry in the matrix $(X'X)^{-1} = ((c_{ll'}))$, $l = 0, 1, 2$ and $l' = 0, 1, 2$. We can use the theory of cofactors to find an explicit expression for $c_{22}$. In particular, the cofactor needed for determining $c_{22}$ is equal to
\[
(-1)^{(2+2)}n^2\begin{vmatrix} 1 & b\\ b & a\end{vmatrix} = n^2(a-b^2),
\]
so that $c_{22} = n^2(a-b^2)/|X'X|$. And,
\[
|X'X| = n^3(a^2 + ab^2 + ab^2 - a^3 - b^2 - ab^2) = n^3(1-a)(a^2-b^2) = n^3[1-(\pi_1+\pi_3)][(\pi_1+\pi_3)^2 - (\pi_3-\pi_1)^2] = 4n^3\pi_1\pi_2\pi_3.
\]
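Finally,
\[
V(\hat{\beta}_2) = c_{22}\sigma^2 = \frac{n^2(a-b^2)\sigma^2}{|X'X|} = \left\{\frac{[(\pi_1+\pi_3) - (\pi_3-\pi_1)^2]}{4n\pi_1\pi_2\pi_3}\right\}\sigma^2.
\]
(c) We wish to find values for $\pi_1$, $\pi_2$, and $\pi_3$ that minimize $V(\hat{\beta}_2)$ subject to the constraint $(\pi_1+\pi_2+\pi_3) = 1$. Instead of considering $V(\hat{\beta}_2)$, we can equivalently consider the quantity
\[
Q = \frac{[(\pi_1+\pi_3) - (\pi_3-\pi_1)^2]}{\pi_1\pi_2\pi_3},
\]
which can be rewritten as
\[
Q = \frac{(\pi_1+\pi_3) - [(\pi_1+\pi_3)^2 - 4\pi_1\pi_3]}{\pi_1\pi_2\pi_3} = \frac{(\pi_1+\pi_3)[1-(\pi_1+\pi_3)]}{\pi_1\pi_2\pi_3} + \frac{4\pi_1\pi_3}{\pi_1\pi_2\pi_3} = \frac{(\pi_1+\pi_3)}{\pi_1\pi_3} + \frac{4}{(1-\pi_1-\pi_3)},
\]
since $\pi_2 = (1-\pi_1-\pi_3)$. Now,
\[
\frac{\partial Q}{\partial\pi_1} = 0 \quad\text{gives}\quad (1-\pi_1-\pi_3)^2 = 4\pi_1^2
\]
and
\[
\frac{\partial Q}{\partial\pi_3} = 0 \quad\text{gives}\quad (1-\pi_1-\pi_3)^2 = 4\pi_3^2.
\]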
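As a numerical complement to the two stationarity equations, the quantity $Q$ can be minimized by brute force over a grid on the simplex. The sketch below is only an illustration: the grid spacing of 1/40 is an arbitrary choice, and exact fractions are used so that ties and rounding are not an issue.

```python
from fractions import Fraction

def q_value(pi1, pi3):
    """Q = [(pi1 + pi3) - (pi3 - pi1)^2] / (pi1 * pi2 * pi3), pi2 = 1 - pi1 - pi3."""
    pi2 = 1 - pi1 - pi3
    return ((pi1 + pi3) - (pi3 - pi1) ** 2) / (pi1 * pi2 * pi3)

def q_rewritten(pi1, pi3):
    """Equivalent form (pi1 + pi3)/(pi1 pi3) + 4/(1 - pi1 - pi3)."""
    return (pi1 + pi3) / (pi1 * pi3) + 4 / (1 - pi1 - pi3)

steps = 40
grid = [Fraction(i, steps) for i in range(1, steps)]

# Evaluate Q at every interior grid point with pi1 + pi3 < 1.
candidates = [(q_value(a, c), a, c) for a in grid for c in grid if a + c < 1]
best_q, best_pi1, best_pi3 = min(candidates, key=lambda item: item[0])
best_pi2 = 1 - best_pi1 - best_pi3
```

On this grid the smallest value is Q = 16, attained at allocation proportions (1/4, 1/2, 1/4) for the three groups.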
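Since $\pi_1$, $\pi_2$, and $\pi_3$ are positive, these two equations imply that $\pi_1 = \pi_3$. Then, from the equation $(1-\pi_1-\pi_3)^2 = 4\pi_1^2$, we obtain $(1-2\pi_1)^2 = 4\pi_1^2$, or $\pi_1 = \frac{1}{4}$. Thus, the values for $\pi_1$, $\pi_2$, and $\pi_3$ that minimize $V(\hat{\beta}_2)$ subject to the constraint $\sum_{i=1}^{3}\pi_i = 1$ are
\[
\pi_1 = \frac{1}{4}, \quad \pi_2 = \frac{1}{2}, \quad\text{and}\quad \pi_3 = \frac{1}{4}.
\]
Note that the Lagrange Multiplier method can also be used to obtain this answer.

Solution 4.37*
(a) For the CIs $\bar{X}_i \pm kS_i/\sqrt{n}$ and $\bar{X}_{i'} \pm kS_{i'}/\sqrt{n}$ not to have at least one value in common (i.e., not to overlap), it is required that either
\[
\left(\bar{X}_i + k\frac{S_i}{\sqrt{n}}\right) < \left(\bar{X}_{i'} - k\frac{S_{i'}}{\sqrt{n}}\right), \quad\text{which implies}\quad (\bar{X}_i-\bar{X}_{i'}) < -k\left(\frac{S_i}{\sqrt{n}} + \frac{S_{i'}}{\sqrt{n}}\right),
\]
or
\[
\left(\bar{X}_i - k\frac{S_i}{\sqrt{n}}\right) > \left(\bar{X}_{i'} + k\frac{S_{i'}}{\sqrt{n}}\right), \quad\text{which implies}\quad (\bar{X}_i-\bar{X}_{i'}) > k\left(\frac{S_i}{\sqrt{n}} + \frac{S_{i'}}{\sqrt{n}}\right).
\]
Thus, these two inequalities together can be written succinctly as the event $E_{ii'} = \left\{|\bar{X}_i-\bar{X}_{i'}| > k\left(S_i/\sqrt{n} + S_{i'}/\sqrt{n}\right)\right\}$, which gives the desired result.
(b) First, note that
\[
\left(\frac{S_i}{\sqrt{n}} + \frac{S_{i'}}{\sqrt{n}}\right)^2 = \left(\frac{S_i}{\sqrt{n}}\right)^2 + \left(\frac{S_{i'}}{\sqrt{n}}\right)^2 + 2\left(\frac{S_i}{\sqrt{n}}\right)\left(\frac{S_{i'}}{\sqrt{n}}\right) \ge \left(\frac{S_i}{\sqrt{n}}\right)^2 + \left(\frac{S_{i'}}{\sqrt{n}}\right)^2,
\]
so that $\left(S_i/\sqrt{n} + S_{i'}/\sqrt{n}\right) \ge \sqrt{\left(S_i/\sqrt{n}\right)^2 + \left(S_{i'}/\sqrt{n}\right)^2}$. So, using the result from part (a), we have
\[
\pi^*_{ii'} = \mathrm{pr}\left[|\bar{X}_i-\bar{X}_{i'}| > k\left(\frac{S_i}{\sqrt{n}} + \frac{S_{i'}}{\sqrt{n}}\right) \,\middle|\, C_p\right]
\le \mathrm{pr}\left[|\bar{X}_i-\bar{X}_{i'}| > k\sqrt{\left(\frac{S_i}{\sqrt{n}}\right)^2 + \left(\frac{S_{i'}}{\sqrt{n}}\right)^2} \,\middle|\, C_p\right]
= \mathrm{pr}\left[\frac{|\bar{X}_i-\bar{X}_{i'}|}{\sqrt{\left(\frac{S_i}{\sqrt{n}}\right)^2 + \left(\frac{S_{i'}}{\sqrt{n}}\right)^2}} > k \,\middle|\, C_p\right].
\]
Now,
\[
Z = \frac{(\bar{X}_i-\bar{X}_{i'})}{\sqrt{2\sigma^2/n}} \sim N(0,1), \qquad U = \frac{(n-1)(S_i^2+S_{i'}^2)}{\sigma^2} \sim \chi^2_{2(n-1)},
\]
and $Z$ and $U$ are independent random variables. So, since
\[
\frac{Z}{\sqrt{U/2(n-1)}} = \frac{(\bar{X}_i-\bar{X}_{i'})}{\sqrt{\left(\frac{S_i}{\sqrt{n}}\right)^2 + \left(\frac{S_{i'}}{\sqrt{n}}\right)^2}} \sim t_{2(n-1)},
\]
and since $\mathrm{pr}[T_{2(n-1)} > k] \le \frac{\alpha}{2}$, it follows that
\[
\pi^*_{ii'} \le \mathrm{pr}\left[|T_{2(n-1)}| > k \,\middle|\, C_p\right] \le \alpha.
\]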
(c) For $p = 3$ and given condition $C_3$, let $\theta_3$ be the conditional probability that there are no values common to all three CIs, or equivalently, that at least two of the three CIs have no values in common. Hence,
\[
\theta_3 = \mathrm{pr}(E_{12}\cup E_{13}\cup E_{23} \mid C_3) \le \mathrm{pr}(E_{12} \mid C_3) + \mathrm{pr}(E_{13} \mid C_3) + \mathrm{pr}(E_{23} \mid C_3) = 3\pi^*_{ii'}.
\]
Finally, from part (b), since $\pi^*_{ii'} \le \alpha$, we obtain $\theta_3 \le 3\alpha$.
Note that $\theta_3$ is the probability of incorrectly deciding statistically that the three population means are not equal to the same value when, in fact, they are equal to the same value; that is, $\theta_3$ is analogous to a Type I error rate when testing the null hypothesis $H_0$ that all three population means are equal to the same value versus the alternative hypothesis $H_1$ that they are all not equal to the same value. Since $3\alpha > \alpha$, this CI-based algorithm can lead to an inflated Type I error rate. For example, when $\alpha = 0.05$, then this Type I error rate could theoretically be as high as 0.15. Given the stated assumptions, one-way analysis of variance would be an appropriate method for testing $H_0$ versus $H_1$.

Solution 4.38*
(a) We wish to choose $\tilde{\theta}_0$ and $\tilde{\theta}_1$ to minimize
\[
Q = \sum_{i=1}^{n}\left[Y_i - (\theta_0+\theta_1x_i)\right]^2.
\]
Now, the equation
\[
\frac{\partial Q}{\partial\theta_0} = -2\sum_{i=1}^{n}\left[Y_i - (\theta_0+\theta_1x_i)\right] = 0
\]
implies that $\tilde{\theta}_0 = \bar{Y} - \tilde{\theta}_1\bar{x}$, where
\[
\bar{Y} = \frac{1}{n}\sum_{i=1}^{n}Y_i \quad\text{and}\quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i.
\]
And,
\[
\frac{\partial Q}{\partial\theta_1} = -2\sum_{i=1}^{n}x_i\left[Y_i - (\theta_0+\theta_1x_i)\right] = 0
\]
implies that
\[
\sum_{i=1}^{n}x_iY_i = \tilde{\theta}_0\sum_{i=1}^{n}x_i + \tilde{\theta}_1\sum_{i=1}^{n}x_i^2 = (\bar{Y}-\tilde{\theta}_1\bar{x})\sum_{i=1}^{n}x_i + \tilde{\theta}_1\sum_{i=1}^{n}x_i^2.
\]
Hence,
\[
\tilde{\theta}_1 = \frac{\sum_{i=1}^{n}x_iY_i - \bar{Y}\sum_{i=1}^{n}x_i}{\sum_{i=1}^{n}x_i^2 - \bar{x}\sum_{i=1}^{n}x_i} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})Y_i}{\sum_{i=1}^{n}(x_i-\bar{x})^2}.
\]
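Now,
\[
E(\tilde{\theta}_1) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(\theta_0+\theta_1x_i)}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \theta_0\frac{\sum_{i=1}^{n}(x_i-\bar{x})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} + \theta_1\frac{\sum_{i=1}^{n}(x_i-\bar{x})x_i}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \theta_1.
\]
And,
\[
V(\tilde{\theta}_1) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2(\theta_0+\theta_1x_i)}{\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^2} = \frac{\theta_0}{\sum_{i=1}^{n}(x_i-\bar{x})^2} + \frac{\theta_1\sum_{i=1}^{n}x_i(x_i-\bar{x})^2}{\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^2}.
\]
Also,
\[
E(\tilde{\theta}_0) = E(\bar{Y}) - \bar{x}E(\tilde{\theta}_1) = \frac{1}{n}\sum_{i=1}^{n}(\theta_0+\theta_1x_i) - \bar{x}\theta_1 = \theta_0 + \theta_1\bar{x} - \bar{x}\theta_1 = \theta_0.
\]
And, since
\[
\tilde{\theta}_0 = \frac{1}{n}\sum_{i=1}^{n}Y_i - \bar{x}\,\frac{\sum_{i=1}^{n}(x_i-\bar{x})Y_i}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \sum_{i=1}^{n}c_iY_i,
\]
where
\[
c_i = \left[\frac{1}{n} - \frac{\bar{x}(x_i-\bar{x})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right],
\]
and where the $\{Y_i\}$ are mutually independent,
\[
V(\tilde{\theta}_0) = \sum_{i=1}^{n}c_i^2V(Y_i) = \sum_{i=1}^{n}c_i^2(\theta_0+\theta_1x_i) = \theta_0\sum_{i=1}^{n}c_i^2 + \theta_1\sum_{i=1}^{n}c_i^2x_i,
\]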
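where $c_i$ is defined as above.
(b) The likelihood function $L$ has the structure
\[
L = \prod_{i=1}^{n}\left\{(\theta_0+\theta_1x_i)^{y_i}e^{-(\theta_0+\theta_1x_i)}/y_i!\right\} = \left\{\prod_{i=1}^{n}(\theta_0+\theta_1x_i)^{y_i}\right\}\left\{e^{-\sum_{i=1}^{n}(\theta_0+\theta_1x_i)}\right\}\left\{\prod_{i=1}^{n}(y_i!)^{-1}\right\};
\]
\[
\ln L = \sum_{i=1}^{n}y_i\ln(\theta_0+\theta_1x_i) - \sum_{i=1}^{n}(\theta_0+\theta_1x_i) + \sum_{i=1}^{n}\ln(y_i!)^{-1};
\]
\[
\frac{\partial\ln L}{\partial\theta_0} = \sum_{i=1}^{n}\frac{y_i}{(\theta_0+\theta_1x_i)} - n; \qquad
\frac{\partial^2\ln L}{\partial\theta_0^2} = -\sum_{i=1}^{n}\frac{y_i}{(\theta_0+\theta_1x_i)^2};
\]
\[
-E\left(\frac{\partial^2\ln L}{\partial\theta_0^2}\right) = \sum_{i=1}^{n}\frac{(\theta_0+\theta_1x_i)}{(\theta_0+\theta_1x_i)^2} = \sum_{i=1}^{n}(\theta_0+\theta_1x_i)^{-1} = A;
\]
\[
\frac{\partial^2\ln L}{\partial\theta_0\,\partial\theta_1} = -\sum_{i=1}^{n}\frac{x_iy_i}{(\theta_0+\theta_1x_i)^2}; \qquad
-E\left(\frac{\partial^2\ln L}{\partial\theta_0\,\partial\theta_1}\right) = \sum_{i=1}^{n}x_i(\theta_0+\theta_1x_i)^{-1} = B;
\]
\[
\frac{\partial\ln L}{\partial\theta_1} = \sum_{i=1}^{n}\frac{x_iy_i}{(\theta_0+\theta_1x_i)} - \sum_{i=1}^{n}x_i; \qquad
\frac{\partial^2\ln L}{\partial\theta_1^2} = -\sum_{i=1}^{n}\frac{x_i^2y_i}{(\theta_0+\theta_1x_i)^2};
\]
\[
-E\left(\frac{\partial^2\ln L}{\partial\theta_1^2}\right) = \sum_{i=1}^{n}x_i^2(\theta_0+\theta_1x_i)^{-1} = C.
\]
So, the expected information matrix is
\[
I(\theta_0,\theta_1) = \begin{bmatrix} A & B\\ B & C\end{bmatrix}.
\]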
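For the available data, we compute $I(\hat{\theta}_0,\hat{\theta}_1)$ using $\hat{A}$, $\hat{B}$, and $\hat{C}$ as follows:
\[
\hat{A} = 25\left\{[2+4(1)]^{-1} + [2+4(2)]^{-1} + [2+4(3)]^{-1} + [2+4(4)]^{-1}\right\} = 9.8425,
\]
\[
\hat{B} = 25\left[\frac{1}{6} + \frac{2}{10} + \frac{3}{14} + \frac{4}{18}\right] = 20.0800,
\]
\[
\hat{C} = 25\left[\frac{1}{6} + \frac{4}{10} + \frac{9}{14} + \frac{16}{18}\right] = 52.4625.
\]
So,
\[
I(\hat{\theta}_0,\hat{\theta}_1) = \begin{bmatrix} \hat{A} & \hat{B}\\ \hat{B} & \hat{C}\end{bmatrix} = \begin{bmatrix} 9.8425 & 20.0800\\ 20.0800 & 52.4625\end{bmatrix},
\]
and hence
\[
I^{-1}(\hat{\theta}_0,\hat{\theta}_1) = (\hat{A}\hat{C}-\hat{B}^2)^{-1}\begin{bmatrix} \hat{C} & -\hat{B}\\ -\hat{B} & \hat{A}\end{bmatrix}
= [(9.8425)(52.4625) - (20.0800)^2]^{-1}\begin{bmatrix} 52.4625 & -20.0800\\ -20.0800 & 9.8425\end{bmatrix}
= \begin{bmatrix} 0.4636 & -0.1775\\ -0.1775 & 0.0870\end{bmatrix}.
\]
Now, $\hat{\psi} = \hat{\theta}_0 + (2.5)\hat{\theta}_1$, with
\[
\hat{V}(\hat{\psi}) = \hat{V}(\hat{\theta}_0) + (2.5)^2\hat{V}(\hat{\theta}_1) + 2(1)(2.5)\,\widehat{\mathrm{cov}}(\hat{\theta}_0,\hat{\theta}_1).
\]
Since
\[
\frac{\hat{\psi}-\psi}{\sqrt{\hat{V}(\hat{\psi})}} \;\dot\sim\; N(0,1)
\]
for large $n$, our 95% CI for $\psi$ is
\[
\hat{\psi} \pm 1.96\sqrt{\hat{V}(\hat{\psi})} = [2+(2.5)(4)] \pm 1.96\left[0.4636 + 6.250(0.0870) + 5(-0.1775)\right]^{1/2} = 12 \pm 1.96\sqrt{0.1199} = 12 \pm 0.6787 = (11.3213, 12.6787).
\]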
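The interval for $\psi$ can be recomputed from the information-matrix formulas. The sketch below is a reading of the displayed arithmetic rather than a restatement of the exercise: it assumes 25 observations at each of $x = 1, 2, 3, 4$ and estimates $\hat{\theta}_0 = 2$, $\hat{\theta}_1 = 4$, as the terms $25[2+4(x)]^{-1}$ suggest. Because it works in full floating-point precision, its results differ slightly in the third or fourth decimal from the rounded figures shown above.

```python
import math

def information_entries(theta0, theta1, xs):
    """A, B, C: sums of 1/m_i, x_i/m_i, x_i^2/m_i with m_i = theta0 + theta1*x_i."""
    a = sum(1.0 / (theta0 + theta1 * x) for x in xs)
    b = sum(x / (theta0 + theta1 * x) for x in xs)
    c = sum(x * x / (theta0 + theta1 * x) for x in xs)
    return a, b, c

# Assumed design and estimates, inferred from the displayed computations.
xs = [1, 2, 3, 4] * 25
theta0_hat, theta1_hat = 2.0, 4.0

A, B, C = information_entries(theta0_hat, theta1_hat, xs)
det = A * C - B * B
var_theta0, cov_01, var_theta1 = C / det, -B / det, A / det

x_star = 2.5
psi_hat = theta0_hat + x_star * theta1_hat
var_psi = var_theta0 + x_star ** 2 * var_theta1 + 2.0 * x_star * cov_01
half_width = 1.96 * math.sqrt(var_psi)
ci = (psi_hat - half_width, psi_hat + half_width)
```

The computed variance of $\hat{\psi}$ is approximately 0.120 and the interval is approximately (11.32, 12.68), in line with the hand calculation.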
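Solution 4.39*
(a)
\[
D_i = (X_i-\bar{X}) = X_i - \frac{1}{n}\sum_{i=1}^{n}X_i = \left(1-\frac{1}{n}\right)X_i - \frac{1}{n}\sum_{j\neq i}X_j.
\]
Since $X_i \sim N(\mu,\sigma^2)\ \forall i$ and the $\{X_i\}$ are mutually independent, then $D_i$ is itself normal since $D_i$ is a linear combination of mutually independent normal variates. Also,
\[
E(D_i) = E(X_i-\bar{X}) = E(X_i) - E(\bar{X}) = \mu - \mu = 0,
\]
and
\[
V(D_i) = \left(1-\frac{1}{n}\right)^2\sigma^2 + \frac{(n-1)}{n^2}\sigma^2 = \left[\frac{(n-1)^2}{n^2} + \frac{(n-1)}{n^2}\right]\sigma^2 = \frac{(n-1)}{n^2}[(n-1)+1]\sigma^2 = \left(\frac{n-1}{n}\right)\sigma^2.
\]
So,
\[
D_i \sim N\left[0, \left(\frac{n-1}{n}\right)\sigma^2\right].
\]
(b) Now,
\[
\mathrm{cov}(D_i,D_j) = E(D_iD_j) = E[(X_i-\bar{X})(X_j-\bar{X})] = E(X_iX_j) - E(X_i\bar{X}) - E(X_j\bar{X}) + E(\bar{X}^2)
= \mu^2 - E(X_i\bar{X}) - E(X_j\bar{X}) + \left(\frac{\sigma^2}{n}+\mu^2\right).
\]
Now,
\[
E(X_i\bar{X}) = E\left[X_i\left(\frac{1}{n}\sum_{i=1}^{n}X_i\right)\right] = \frac{1}{n}E[X_1X_i + X_2X_i + \cdots + X_{i-1}X_i + X_i^2 + X_{i+1}X_i + \cdots + X_nX_i]
\]
\[
= \frac{1}{n}[(n-1)\mu^2 + (\mu^2+\sigma^2)] = \frac{1}{n}(n\mu^2+\sigma^2) = \mu^2 + \frac{\sigma^2}{n}.
\]
An identical argument shows that $E(X_j\bar{X}) = \mu^2 + \sigma^2/n$. So
\[
\mathrm{cov}(D_i,D_j) = \mu^2 - \left(\mu^2+\frac{\sigma^2}{n}\right) - \left(\mu^2+\frac{\sigma^2}{n}\right) + \left(\frac{\sigma^2}{n}+\mu^2\right) = \frac{-\sigma^2}{n}.
\]
Finally,
\[
\mathrm{corr}(D_i,D_j) = \frac{\mathrm{cov}(D_i,D_j)}{\sqrt{V(D_i)\cdot V(D_j)}} = \frac{-\sigma^2/n}{\sqrt{\left(\frac{n-1}{n}\right)\sigma^2\cdot\left(\frac{n-1}{n}\right)\sigma^2}} = \frac{-1}{(n-1)}.
\]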
n→∞
¯ → 0 as n → +∞, corr(Di , Dj ) → corr(Xi , Xj ) = 0 as n → +∞. Since V(X) (c) (n − 1)S2x /σ2 Ux S2 = , R = 2x = Uy Sy (n − 1)S2y /σ2 where Ux ∼ χ2n−1 , Uy ∼ χ2n−1 , and Ux and Uy are independent. So, [(n−1)/2]−1
fUx ,Uy (ux , uy ) =
[(n−1)/2]−1 −ux /2 uy e−uy /2 ux e · , Γ[(n − 1)/2] · 2[(n−1)/2] Γ[(n − 1)/2] · 2[(n−1)/2]
ux > 0, uy > 0. Let R = Ux /Uy and S = Uy , so that Ux = RS and Uy = S; and, R > 0, S > 0. Also, ∂Ux ∂R J = ∂Uy ∂R
∂Ux ∂S S = ∂Uy 0 ∂S
R = S. 1
So, fR,S (r, s) = fUx ,Uy (rs, s) ×  J =
s[(n−3)/2] e−s/2 (rs)[(n−3)/2] e−rs/2 · ·s [(n−1)/2] Γ[(n − 1)/2] · 2 Γ[(n − 1)/2] · 2[(n−1)/2]
=
r[(n−3)/2] s(n−2) e−(1+r)s/2 , [Γ[(n − 1)/2]]2 · 2(n−1)
r > 0, s > 0.
291
Solutions
So, ∞ 2 r[(n−3)/2] −s/[ (1+r) ] s(n−1)−1 e ds 2 (n−1) [Γ[(n − 1)/2]] · 2 0
(n−1) r[(n−3)/2] 2 = · Γ(n − 1) (1 + r) [Γ[(n − 1)/2]]2 · 2(n−1) −2 n−1 = [Γ(n − 1)] Γ r[(n−3)/2] (1 + r)−(n−1) , 2
fR (r) =
r > 0.
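(d) Clearly,
\[
E(\hat{\mu}_1) = \frac{1}{2}[E(\bar{X}) + E(\bar{Y})] = \frac{1}{2}(\mu+\mu) = \mu.
\]
And,
\[
E(\hat{\mu}_2) = E\left[\frac{\bar{X}S_y^2 + \bar{Y}S_x^2}{S_x^2+S_y^2}\right] = E\left[\frac{\bar{X}+R\bar{Y}}{(1+R)}\right],
\]
where $R = S_x^2/S_y^2 \sim F_{(n-1),(n-1)}$.
Now, since $\bar{X}$, $\bar{Y}$, $S_x^2$, and $S_y^2$ are mutually independent random variables,
\[
E(\hat{\mu}_2) = E_r\{E(\hat{\mu}_2 \mid R = r)\} = E_r\left\{\frac{E(\bar{X} \mid R = r) + rE(\bar{Y} \mid R = r)}{(1+r)}\right\} = E_r\left\{\frac{E(\bar{X}) + rE(\bar{Y})}{(1+r)}\right\}
\]
\[
= E_r\left[\frac{\mu+r\mu}{(1+r)}\right] = E_r\left[\frac{\mu(1+r)}{(1+r)}\right] = E_r(\mu) = \mu.
\]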
1 ¯ ¯ = 1 [V(X) ¯ + V(Y)] ¯ (X + Y) 2 4 σ2 σ2 1 σ2 + = . = n 2n 4 n
V(μ ˆ 1) = V
ˆ 2 R = r)} + Er {V(μ ˆ 2 )R = r} V(μ ˆ 2 ) = Vr {E(μ / . ¯ + RY¯ X = Vr (μ) + Er V R=r (1 + R) . / ¯ = r) ¯ = r) + r2 V(YR V(XR = 0 + Er (1 + r)2 . / ¯ + r2 V(Y) ¯ V(X) = Er (1 + r)2 . / (σ2 /n) + r2 (σ2 /n) = Er (1 + r)2 (1 + r2 ) σ2 . Er = n (1 + r)2
So, to find V(μ ˆ 2 ), we need to find E (1 + R2 )/(1 + R)2 , where fR (r) is given in part (c). So,
(1 + R2 ) E (1 + R)2
=E 1−2
R (1 + R)2
= 1 − 2E
R . (1 + R)2
Now, with u = r/(1 + r), E
R (1 + R)2
∞
r Γ(n − 1) r[(n−3)/2] (1 + r)−(n−1) dr (1 + r)2 [Γ[(n − 1)/2]]2 ∞ [(n−1)/2] r (1 + r)−(n−1) Γ(n − 1) dr = 2 [Γ[(n − 1)/2]] 0 (1 + r)2 1 Γ(n − 1) = u[(n+1)/2]−1 (1 − u)[(n+1)/2]−1 du [Γ(n − 1/2)]2 0 =
0
=
[(n − 1)/2]2 [Γ[(n + 1)/2]]2 Γ(n − 1) = · 2 Γ(n + 1) n(n − 1) [Γ[(n − 1)/2]]
=
(n − 1) . 4n
293
Solutions
Finally,
R σ2 1 − 2E n (1 + R)2
(n − 1) σ2 1−2 = n 4n
σ2 (n − 1) = 1− 2n n 2 σ 2n − n + 1 = n 2n n+1 σ2 . = 2n2
Since V(μ ˆ 1 ) = (1/2n) σ2 < V(μ ˆ 2 ) = (n + 1)/2n2 σ2 when n > 1, we prefer μ ˆ1 (a result which actually follows from the theory of sufficiency). V(μ ˆ 2) =
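A short simulation makes the comparison in Solution 4.39 concrete. The sketch below draws repeated pairs of independent normal samples with a common mean and variance and compares the empirical variances of $\hat{\mu}_1 = \frac{1}{2}(\bar{X}+\bar{Y})$ and $\hat{\mu}_2 = (\bar{X}S_y^2 + \bar{Y}S_x^2)/(S_x^2+S_y^2)$ with the theoretical values. The settings n = 5, μ = 10, σ = 2, the seed, and the number of replications are arbitrary illustrative choices.

```python
import random

def sample_mean_and_var(rng, n, mu, sigma):
    """Sample mean and unbiased sample variance of n normal draws."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, s2

def simulate(n, mu, sigma, reps, seed=0):
    rng = random.Random(seed)
    est1, est2 = [], []
    for _ in range(reps):
        xbar, s2x = sample_mean_and_var(rng, n, mu, sigma)
        ybar, s2y = sample_mean_and_var(rng, n, mu, sigma)
        est1.append(0.5 * (xbar + ybar))
        est2.append((xbar * s2y + ybar * s2x) / (s2x + s2y))
    return est1, est2

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

n, mu, sigma = 5, 10.0, 2.0
est1, est2 = simulate(n, mu, sigma, reps=40000)

v1_theory = sigma ** 2 / (2 * n)                 # sigma^2 / (2n)
v2_theory = (n + 1) * sigma ** 2 / (2 * n ** 2)  # (n+1) sigma^2 / (2 n^2)
v1_sim, v2_sim = variance(est1), variance(est2)
```

With these settings the theoretical variances are 0.40 and 0.48, and the simulated values should fall close to them, with the variance-weighted estimator being the more variable of the two.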
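Solution 4.40*
(a) Now,
\[
E(\hat{\mu}_1) = E(k_1\bar{Y}) = k_1\mu,
\]
so that $k_1 = 1$. Then,
\[
V(\hat{\mu}_1) = V(\bar{Y}) = \frac{(\theta^2/12)}{n} = \frac{\theta^2}{12n}.
\]
(b) Since
\[
f_{Y_{(n)}}(y_{(n)};\theta) = n\left[\frac{y_{(n)}}{\theta}\right]^{n-1}\theta^{-1} = n\theta^{-n}y_{(n)}^{n-1}, \quad 0 < y_{(n)} < \theta,
\]
we have, for $r \ge 0$,
\[
E[Y_{(n)}^r] = n\theta^{-n}\int_0^{\theta}y_{(n)}^{(n+r)-1}\,dy_{(n)} = \left(\frac{n}{n+r}\right)\theta^r.
\]
So,
\[
E[Y_{(n)}] = \left(\frac{n}{n+1}\right)\theta.
\]
Thus,
\[
E(\hat{\mu}_2) = E[k_2Y_{(n)}] = k_2\left(\frac{n}{n+1}\right)\theta = 2k_2\left(\frac{n}{n+1}\right)\mu,
\]
so that $k_2 = (n+1)/2n$. Since
\[
V[Y_{(n)}] = E[Y_{(n)}^2] - \left\{E[Y_{(n)}]\right\}^2 = \left(\frac{n}{n+2}\right)\theta^2 - \left(\frac{n}{n+1}\right)^2\theta^2 = \frac{n\theta^2}{(n+1)^2(n+2)},
\]
it follows that
\[
V(\hat{\mu}_2) = V(k_2Y_{(n)}) = k_2^2V(Y_{(n)}) = \frac{(n+1)^2}{4n^2}\cdot\frac{n\theta^2}{(n+1)^2(n+2)} = \frac{\theta^2}{4n(n+2)}.
\]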
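(c) Since
\[
f_{Y_{(1)},Y_{(n)}}(y_{(1)},y_{(n)};\theta) = n(n-1)\theta^{-n}(y_{(n)}-y_{(1)})^{n-2}, \quad 0 < y_{(1)} < y_{(n)} < \theta,
\]
we have, for $r \ge 0$ and $s \ge 0$,
\[
E\left[Y_{(1)}^rY_{(n)}^s\right] = \int_0^{\theta}\int_0^{y_{(n)}}y_{(1)}^ry_{(n)}^s\,n(n-1)\theta^{-n}(y_{(n)}-y_{(1)})^{n-2}\,dy_{(1)}\,dy_{(n)}
= n(n-1)\theta^{-n}\int_0^{\theta}y_{(n)}^s\left[\int_0^{y_{(n)}}y_{(1)}^r(y_{(n)}-y_{(1)})^{n-2}\,dy_{(1)}\right]dy_{(n)}
\]
\[
= n(n-1)\theta^{-n}\int_0^{\theta}y_{(n)}^s\left[\int_0^1(y_{(n)}u)^r(y_{(n)}-y_{(n)}u)^{n-2}y_{(n)}\,du\right]dy_{(n)}
= n(n-1)\theta^{-n}\int_0^{\theta}y_{(n)}^{(n+r+s)-1}\left[\int_0^1u^{(r+1)-1}(1-u)^{(n-1)-1}\,du\right]dy_{(n)}
\]
\[
= \frac{n(n-1)}{(n+r+s)}\cdot\frac{\Gamma(r+1)\Gamma(n-1)}{\Gamma(n+r)}\,\theta^{(r+s)}.
\]
Since
\[
E[Y_{(1)}] = \frac{n(n-1)}{(n+1+0)}\cdot\frac{\Gamma(1+1)\Gamma(n-1)}{\Gamma(n+1)}\,\theta^{(1+0)} = \frac{\theta}{(n+1)},
\]
it follows that
\[
E(\hat{\mu}_3) = \frac{k_3}{2}[E(Y_{(1)}) + E(Y_{(n)})] = \frac{k_3}{2}\left[\frac{\theta}{(n+1)} + \frac{n\theta}{(n+1)}\right] = k_3\left(\frac{\theta}{2}\right) = k_3\mu,
\]
so that $k_3 = 1$.
Since $V[Y_{(1)}] = V[Y_{(n)}]$, by symmetry, and since
\[
\mathrm{cov}[Y_{(1)},Y_{(n)}] = \frac{n(n-1)}{(n+1+1)}\cdot\frac{\Gamma(1+1)\Gamma(n-1)}{\Gamma(n+1)}\,\theta^{(1+1)} - \left[\frac{\theta}{(n+1)}\right]\left[\frac{n\theta}{(n+1)}\right] = \frac{\theta^2}{(n+2)} - \frac{n\theta^2}{(n+1)^2} = \frac{\theta^2}{(n+1)^2(n+2)},
\]
it follows that
\[
V(\hat{\mu}_3) = V\left\{\frac{1}{2}\left[Y_{(1)}+Y_{(n)}\right]\right\} = \frac{1}{4}[V(Y_{(1)}) + V(Y_{(n)}) + 2\,\mathrm{cov}(Y_{(1)},Y_{(n)})]
= \frac{1}{2}\left[\frac{n\theta^2}{(n+1)^2(n+2)} + \frac{\theta^2}{(n+1)^2(n+2)}\right] = \frac{\theta^2}{2(n+1)(n+2)}.
\]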
(d) We have shown that
\[
V(\hat{\mu}_1) = \frac{\theta^2}{12n}, \quad V(\hat{\mu}_2) = \frac{\theta^2}{4n(n+2)}, \quad V(\hat{\mu}_3) = \frac{\theta^2}{2(n+1)(n+2)}.
\]
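The three variance formulas can be tabulated exactly to see how the ordering behaves for small and large n. The sketch below uses rational arithmetic with θ = 1 (an arbitrary choice, since θ² is a common factor); the function name is illustrative.

```python
from fractions import Fraction

def variances(n, theta=1):
    """Exact variances of mu-hat-1 (mean), mu-hat-2 (maximum), mu-hat-3 (midrange)."""
    t2 = Fraction(theta) ** 2
    v1 = t2 / (12 * n)
    v2 = t2 / (4 * n * (n + 2))
    v3 = t2 / (2 * (n + 1) * (n + 2))
    return v1, v2, v3

table = {n: variances(n) for n in (1, 2, 3, 10, 100, 10 ** 6)}

# Relative efficiency of mu-hat-3 with respect to mu-hat-2 for a large n.
v1_big, v2_big, v3_big = table[10 ** 6]
ratio_2_to_3 = v2_big / v3_big
ratio_2_to_1 = v2_big / v1_big
```

For n = 1 all three variances coincide, for n = 2 the first and third coincide, and for every n > 2 the strict ordering V(μ̂2) < V(μ̂3) < V(μ̂1) holds; the ratio V(μ̂2)/V(μ̂3) equals (n + 1)/(2n), which approaches 1/2.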
Now,
\[
4n(n+2) - 12n = 4n^2 - 4n = 4n(n-1) > 0 \quad\text{for } n > 1,
\]
\[
4n(n+2) - 2(n+1)(n+2) = 2(n+2)(n-1) > 0 \quad\text{for } n > 1,
\]
and
\[
2(n+1)(n+2) - 12n = 2(n-1)(n-2) > 0 \quad\text{for } n > 2.
\]
So, for $n > 2$, we have $V(\hat{\mu}_2) < V(\hat{\mu}_3) < V(\hat{\mu}_1)$. Now,
\[
\lim_{n\to\infty}\frac{V(\hat{\mu}_3)}{V(\hat{\mu}_1)} = \lim_{n\to\infty}\frac{V(\hat{\mu}_2)}{V(\hat{\mu}_1)} = 0,
\]
so that $\hat{\mu}_1$ has an asymptotic efficiency of 0 relative to $\hat{\mu}_2$ and $\hat{\mu}_3$. Since
\[
\lim_{n\to\infty}\frac{V(\hat{\mu}_2)}{V(\hat{\mu}_3)} = \lim_{n\to\infty}\frac{2(n+1)(n+2)}{4n(n+2)} = \frac{1}{2},
\]
this implies that $\hat{\mu}_3$ is asymptotically 50% as efficient as $\hat{\mu}_2$. That $\hat{\mu}_2$ is the estimator of choice should not be surprising since $Y_{(n)}$ is a (complete) sufficient statistic for $\theta$ (and hence for $\mu$), and so $\hat{\mu}_2$ is the minimum variance unbiased estimator (MVUE) of $\mu$.

Solution 4.41*
(a) The likelihood function $L$ has the structure
\[
L = \prod_{i=1}^{n}\left\{\theta^{x_i}(1-\theta)^{1-x_i}\,\frac{1}{\mu(x_i)}\,e^{-y_i/\mu(x_i)}\right\}
= \theta^{\sum_{i=1}^{n}x_i}(1-\theta)^{n-\sum_{i=1}^{n}x_i}\,e^{-\sum_{i=1}^{n}(\alpha+\beta x_i)}\,e^{-\sum_{i=1}^{n}e^{-(\alpha+\beta x_i)}y_i}.
\]
Since $\sum_{i=1}^{n}x_i = n_1$, we have
\[
\ln L = n_1\ln\theta + n_0\ln(1-\theta) - n\alpha - \beta n_1 - \sum_{i=1}^{n}e^{-(\alpha+\beta x_i)}y_i.
\]
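So,
\[
\frac{\partial\ln L}{\partial\theta} = \frac{n_1}{\theta} - \frac{n_0}{(1-\theta)} = 0 \Rightarrow n_1(1-\theta) - n_0\theta = 0 \Rightarrow \hat{\theta} = n_1/(n_0+n_1) = n_1/n = \bar{x}.
\]
And,
\[
\frac{\partial\ln L}{\partial\alpha} = -n + \sum_{i=1}^{n}e^{-(\alpha+\beta x_i)}y_i = 0 \Rightarrow -n + e^{-\alpha}\left[e^{-\beta}n_1\bar{y}_1 + n_0\bar{y}_0\right] = 0. \qquad (4.10)
\]
Also,
\[
\frac{\partial\ln L}{\partial\beta} = -n_1 + \sum_{i=1}^{n}e^{-(\alpha+\beta x_i)}x_iy_i = 0 \Rightarrow -n_1 + e^{-\alpha}e^{-\beta}n_1\bar{y}_1 = 0 \Rightarrow e^{-(\alpha+\beta)}\bar{y}_1 = 1. \qquad (4.11)
\]
So, using (Equation 4.11) in (Equation 4.10), we obtain
\[
-n + n_1 + e^{-\alpha}n_0\bar{y}_0 = 0 \Rightarrow e^{-\alpha} = \frac{n-n_1}{n_0\bar{y}_0} = \frac{n_0}{n_0\bar{y}_0} = \frac{1}{\bar{y}_0},
\]
so that $-\hat{\alpha} = \ln\left(1/\bar{y}_0\right)$, or $\hat{\alpha} = \ln(\bar{y}_0)$.
Finally, since $\hat{\alpha} = \ln(\bar{y}_0)$, it follows from (Equation 4.11) that $e^{-(\hat{\alpha}+\hat{\beta})}\bar{y}_1 = 1$, or $e^{\hat{\beta}} = \bar{y}_1/e^{\hat{\alpha}}$, or $\hat{\beta} = \ln\left(\bar{y}_1/\bar{y}_0\right)$. So, in summary,
\[
\hat{\theta} = \bar{x}, \quad \hat{\alpha} = \ln(\bar{y}_0), \quad\text{and}\quad \hat{\beta} = \ln\left(\frac{\bar{y}_1}{\bar{y}_0}\right).
\]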
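The closed-form MLEs can be checked by substituting them back into the score equations (4.10) and (4.11). The sketch below does this on simulated data; the generating values (θ = 0.6, α = 1, β = 0.4), the sample size of 50, and the seed are hypothetical choices made only for illustration, and the exponential survival times are drawn with mean $e^{\alpha+\beta x_i}$ as in the model of this solution.

```python
import math
import random

def fit(x, y):
    """Closed-form MLEs: theta-hat = x-bar, alpha-hat = ln(ybar0), beta-hat = ln(ybar1/ybar0)."""
    n = len(x)
    n1 = sum(x)
    n0 = n - n1
    ybar1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / n1
    ybar0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / n0
    return n1 / n, math.log(ybar0), math.log(ybar1 / ybar0)

def score_alpha_beta(alpha, beta, x, y):
    """Partial derivatives of ln L with respect to alpha and beta."""
    weights = [math.exp(-(alpha + beta * xi)) * yi for xi, yi in zip(x, y)]
    s_alpha = -len(x) + sum(weights)
    s_beta = -sum(x) + sum(w * xi for w, xi in zip(weights, x))
    return s_alpha, s_beta

# Hypothetical simulated data (assumed parameter values, illustration only).
rng = random.Random(41)
true_theta, true_alpha, true_beta = 0.6, 1.0, 0.4
x = [1 if rng.random() < true_theta else 0 for _ in range(50)]
y = [rng.expovariate(1.0 / math.exp(true_alpha + true_beta * xi)) for xi in x]

theta_hat, alpha_hat, beta_hat = fit(x, y)
s_alpha, s_beta = score_alpha_beta(alpha_hat, beta_hat, x, y)
```

Both score components should vanish (up to floating-point error) at the fitted values, which is exactly what Equations 4.10 and 4.11 require.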
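(b) Now,
\[
\frac{\partial^2\ln L}{\partial\theta^2} = -\frac{n_1}{\theta^2} - \frac{n_0}{(1-\theta)^2},
\]
so that
\[
-E\left(\frac{\partial^2\ln L}{\partial\theta^2}\right) = \frac{E(n_1)}{\theta^2} + \frac{E(n_0)}{(1-\theta)^2} = \frac{n\theta}{\theta^2} + \frac{n(1-\theta)}{(1-\theta)^2} = \frac{n}{\theta} + \frac{n}{(1-\theta)} = \frac{n}{\theta(1-\theta)}.
\]
Clearly,
\[
\frac{\partial^2\ln L}{\partial\theta\,\partial\alpha} = \frac{\partial^2\ln L}{\partial\theta\,\partial\beta} = 0.
\]
Now, with $X = (X_1, X_2, \ldots, X_n)$ and $x = (x_1, x_2, \ldots, x_n)$, we have
\[
\frac{\partial^2\ln L}{\partial\alpha^2} = -e^{-\alpha}\sum_{i=1}^{n}e^{-\beta x_i}y_i,
\]
so that
\[
-E\left(\frac{\partial^2\ln L}{\partial\alpha^2}\right) = -E_x\left\{E\left[\frac{\partial^2\ln L}{\partial\alpha^2} \,\middle|\, X = x\right]\right\} = e^{-\alpha}E_x\left[\sum_{i=1}^{n}e^{-\beta x_i}e^{\alpha+\beta x_i}\right] = n.
\]
And,
\[
\frac{\partial^2\ln L}{\partial\beta^2} = -e^{-\alpha}\sum_{i=1}^{n}e^{-\beta x_i}x_i^2y_i,
\]
so that
\[
-E\left(\frac{\partial^2\ln L}{\partial\beta^2}\right) = -E_x\left\{E\left[\frac{\partial^2\ln L}{\partial\beta^2} \,\middle|\, X = x\right]\right\} = e^{-\alpha}E_x\left[\sum_{i=1}^{n}e^{-\beta x_i}x_i^2e^{\alpha+\beta x_i}\right] = \sum_{i=1}^{n}E_x\left(x_i^2\right) = \sum_{i=1}^{n}\left[\theta(1-\theta)+\theta^2\right] = n\theta.
\]
Finally,
\[
\frac{\partial^2\ln L}{\partial\alpha\,\partial\beta} = -e^{-\alpha}\sum_{i=1}^{n}e^{-\beta x_i}x_iy_i,
\]
so that
\[
-E\left(\frac{\partial^2\ln L}{\partial\alpha\,\partial\beta}\right) = -E_x\left\{E\left[\frac{\partial^2\ln L}{\partial\alpha\,\partial\beta} \,\middle|\, X = x\right]\right\} = e^{-\alpha}E_x\left[\sum_{i=1}^{n}e^{-\beta x_i}x_ie^{\alpha+\beta x_i}\right] = \sum_{i=1}^{n}E_x(x_i) = n\theta.
\]
So,
\[
I = \begin{bmatrix} n & n\theta & 0\\ n\theta & n\theta & 0\\ 0 & 0 & \dfrac{n}{\theta(1-\theta)}\end{bmatrix},
\]
and
\[
I^{-1} = \begin{bmatrix}
\dfrac{1}{n(1-\theta)} & \dfrac{-1}{n(1-\theta)} & 0\\[10pt]
\dfrac{-1}{n(1-\theta)} & \dfrac{1}{n\theta(1-\theta)} & 0\\[10pt]
0 & 0 & \dfrac{\theta(1-\theta)}{n}
\end{bmatrix}.
\]
So,
\[
V(\hat{\alpha}) \;\dot\approx\; \frac{1}{n(1-\theta)}, \quad V(\hat{\beta}) \;\dot\approx\; \frac{1}{n\theta(1-\theta)}, \quad \mathrm{cov}(\hat{\alpha},\hat{\beta}) \;\dot\approx\; \frac{-1}{n(1-\theta)},
\]
\[
\mathrm{cov}(\hat{\alpha},\hat{\theta}) = \mathrm{cov}(\hat{\beta},\hat{\theta}) = 0, \quad\text{and}\quad V(\hat{\theta}) = \frac{\theta(1-\theta)}{n}.
\]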
(c) The parameter of interest is $\beta$, and statistical evidence that $\beta \neq 0$ suggests that the true mean time to death differs between adult males with advanced malignant melanoma depending on whether or not these adult males have a family history of skin cancer. An appropriate large-sample 95% CI for $\beta$ is
$$\hat{\beta} \pm 1.96\sqrt{\frac{1}{n\hat{\theta}(1-\hat{\theta})}}.$$
With $n = 50$, $\hat{\theta} = 0.60$ and $\hat{\beta} = 0.40$, we obtain
$$0.40 \pm 1.96\sqrt{\frac{1}{50(0.60)(1-0.60)}}$$
or
$(-0.1658, 0.9658)$.
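As a quick arithmetic check of this interval, here is a minimal sketch that evaluates the large-sample formula with the values $n = 50$, $\hat{\theta} = 0.60$, and $\hat{\beta} = 0.40$ given in the solution; the function name is hypothetical.

```python
import math

def wald_ci_beta(beta_hat, theta_hat, n, z=1.96):
    """Large-sample CI for beta using V(beta_hat) ~ 1 / [n * theta * (1 - theta)]."""
    se = math.sqrt(1.0 / (n * theta_hat * (1.0 - theta_hat)))
    return beta_hat - z * se, beta_hat + z * se

lower, upper = wald_ci_beta(beta_hat=0.40, theta_hat=0.60, n=50)
print(round(lower, 4), round(upper, 4))  # -0.1658 0.9658
```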
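So, these data provide no evidence that $\beta \neq 0$, since 0 is contained in the computed CI.
Solution 4.42*
(a) Note that
$$\hat{\theta}_x = B_0 + B_1x = \frac{1}{N}\sum_{i=1}^{N}Y_{x_i} + x\left(\frac{\sum_{i=1}^{N}x_iY_{x_i}}{\sum_{j=1}^{N}x_j^2}\right) = \sum_{i=1}^{N}\left[\frac{1}{N} + \frac{xx_i}{\sum_{j=1}^{N}x_j^2}\right]Y_{x_i} = \sum_{i=1}^{N}c_iY_{x_i},$$
where
$$c_i = \left[\frac{1}{N} + \frac{xx_i}{\sum_{j=1}^{N}x_j^2}\right], \quad i = 1, 2, \ldots, N.$$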
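So,
$$E(\hat{\theta}_x) = \sum_{i=1}^{N}c_iE(Y_{x_i}) = \sum_{i=1}^{N}c_i(\beta_0 + \beta_1x_i + \beta_2x_i^2) = \beta_0\sum_{i=1}^{N}c_i + \beta_1\sum_{i=1}^{N}c_ix_i + \beta_2\sum_{i=1}^{N}c_ix_i^2.$$
Now,
$$\sum_{i=1}^{N}c_i = \sum_{i=1}^{N}\left[\frac{1}{N} + \frac{xx_i}{\sum_{j=1}^{N}x_j^2}\right] = 1 + \left(\frac{x}{\sum_{j=1}^{N}x_j^2}\right)\sum_{i=1}^{N}x_i = 1,$$
since $\mu_1 = 0$. Also,
$$\sum_{i=1}^{N}c_ix_i = \sum_{i=1}^{N}\left[\frac{1}{N} + \frac{xx_i}{\sum_{j=1}^{N}x_j^2}\right]x_i = x,$$
since $\mu_1 = 0$. Finally,
$$\sum_{i=1}^{N}c_ix_i^2 = \sum_{i=1}^{N}\left[\frac{1}{N} + \frac{xx_i}{\sum_{j=1}^{N}x_j^2}\right]x_i^2 = \mu_2,$$
since $\mu_3 = 0$. So,
$$E(\hat{\theta}_x) = \beta_0 + \beta_1x + \beta_2\mu_2, \quad\text{where } \mu_2 = N^{-1}\sum_{i=1}^{N}x_i^2.$$
And, finally,
$$\begin{aligned}
V(\hat{\theta}_x) &= \sum_{i=1}^{N}c_i^2V(Y_{x_i}) = \sigma^2\sum_{i=1}^{N}\left[\frac{1}{N} + \frac{xx_i}{\sum_{j=1}^{N}x_j^2}\right]^2 \\
&= \sigma^2\sum_{i=1}^{N}\left[\frac{1}{N^2} + \frac{2xx_i}{N\sum_{j=1}^{N}x_j^2} + \frac{x^2x_i^2}{\left(\sum_{j=1}^{N}x_j^2\right)^2}\right] \\
&= \frac{\sigma^2}{N} + 0 + \sigma^2x^2\left[\frac{\sum_{i=1}^{N}x_i^2}{\left(\sum_{i=1}^{N}x_i^2\right)^2}\right] = \frac{\sigma^2}{N}\left(1 + \frac{x^2}{\mu_2}\right).
\end{aligned}$$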
Finally, $\hat{\theta}_x$ is normally distributed since it is a linear combination of mutually independent and normally distributed random variables.
(b) Since $E(\hat{\theta}_x) = \beta_0 + \beta_1x + \beta_2\mu_2$, it follows that $E(\hat{\theta}_x) - \theta_x = \beta_2(\mu_2 - x^2)$. So,
$$Q = \int_{-1}^{1}\left[\beta_2(\mu_2 - x^2)\right]^2dx = \beta_2^2\int_{-1}^{1}(\mu_2^2 - 2\mu_2x^2 + x^4)\,dx = 2\beta_2^2\left[\left(\mu_2 - \frac{1}{3}\right)^2 + \frac{4}{45}\right],$$
which is minimized when $\mu_2 = \frac{1}{3}$. So, an optimal design for minimizing the integrated squared bias $Q$ chooses the temperature spacings $x_1, x_2, \ldots, x_N$ such that $\mu_2 = (1/N)\sum_{i=1}^{N}x_i^2 = \frac{1}{3}$, given that $\mu_1 = \mu_3 = 0$. Note that $\mu_1 = \mu_3 = 0$ will be satisfied if the $x_i$ are chosen to be symmetric about zero. For example, when $N = 4$, we can choose $x_1 = -x_4$ and $x_2 = -x_3$. Then, to satisfy $\mu_2 = \frac{1}{4}\left[2(x_1^2 + x_2^2)\right] = \frac{1}{3}$ (that is, $x_1^2 + x_2^2 = \frac{2}{3}$), we can choose $x_1$ to be any number in the interval $\left(0, \sqrt{2/3}\right)$ and then choose $x_2 = \sqrt{\frac{2}{3} - x_1^2}$. For example, $x_1 = \frac{2}{3}$, $x_2 = \frac{\sqrt{2}}{3}$, $x_3 = -\frac{\sqrt{2}}{3}$, $x_4 = -\frac{2}{3}$.
Solution 4.43*
(a)
$$\mathrm{cov}(Y_{i0}, Y_{i1}) = E(Y_{i0}Y_{i1}) - E(Y_{i0})E(Y_{i1}) = E_{\alpha_i}\left[E(Y_{i0}Y_{i1}\mid\alpha_i)\right] - E_{\alpha_i}\left[E(Y_{i0}\mid\alpha_i)\right]E_{\alpha_i}\left[E(Y_{i1}\mid\alpha_i)\right].$$
Now,
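$$E(Y_{ij}\mid\alpha_i) = L_{ij}\exp\left(\alpha_i + \beta D_{ij} + \sum_{l=1}^{p}\gamma_lC_{il}\right).$$
So, since $\alpha_i \sim N(0, \sigma_\alpha^2)$, it follows from moment generating function theory that
$$E(e^{t\alpha_i}) = e^{t^2\sigma_\alpha^2/2}, \quad -\infty < t < +\infty.$$
Thus,
$$E(Y_{ij}) = L_{ij}\exp\left(0.50\sigma_\alpha^2 + \beta D_{ij} + \sum_{l=1}^{p}\gamma_lC_{il}\right).$$
And, using the assumption that $Y_{i0}$ and $Y_{i1}$ are independent given $\alpha_i$ fixed, we have
$$\begin{aligned}
E(Y_{i0}Y_{i1}) &= E_{\alpha_i}\left[E(Y_{i0}Y_{i1}\mid\alpha_i)\right] = E_{\alpha_i}\left[E(Y_{i0}\mid\alpha_i)E(Y_{i1}\mid\alpha_i)\right] \\
&= E_{\alpha_i}\left[\left(L_{i0}e^{\alpha_i + \sum_{l=1}^{p}\gamma_lC_{il}}\right)\left(L_{i1}e^{\alpha_i + \beta + \sum_{l=1}^{p}\gamma_lC_{il}}\right)\right] \\
&= L_{i0}L_{i1}\exp\left(2\sigma_\alpha^2 + \beta + 2\sum_{l=1}^{p}\gamma_lC_{il}\right).
\end{aligned}$$
Thus,
$$\mathrm{cov}(Y_{i0}, Y_{i1}) = L_{i0}L_{i1}\exp\left(\sigma_\alpha^2 + \beta + 2\sum_{l=1}^{p}\gamma_lC_{il}\right)\left(e^{\sigma_\alpha^2} - 1\right).$$
The inclusion of the random effect $\alpha_i$ in the proposed statistical model serves two purposes: (1) to allow for families to have different (baseline) tendencies toward child abuse, and (2) to account for the anticipated positive correlation between $Y_{i0}$ and $Y_{i1}$ for the $i$th family [in particular, note that $\mathrm{cov}(Y_{i0}, Y_{i1}) = 0$ only when $\sigma_\alpha^2 = 0$ and is positive when $\sigma_\alpha^2 > 0$].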
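(b) Now,
$$\begin{aligned}
p_{Y_{i1}}(y_{i1}\mid Y_i = y_i, \alpha_i) &= \mathrm{pr}(Y_{i1} = y_{i1}\mid Y_i = y_i, \alpha_i) = \frac{\mathrm{pr}[(Y_{i1} = y_{i1})\cap(Y_i = y_i)\mid\alpha_i]}{\mathrm{pr}(Y_i = y_i\mid\alpha_i)} \\
&= \frac{\mathrm{pr}[(Y_{i1} = y_{i1})\cap(Y_{i0} = y_i - y_{i1})\mid\alpha_i]}{\mathrm{pr}(Y_i = y_i\mid\alpha_i)} \\
&= \frac{\mathrm{pr}(Y_{i1} = y_{i1}\mid\alpha_i)\,\mathrm{pr}[Y_{i0} = (y_i - y_{i1})\mid\alpha_i]}{\mathrm{pr}(Y_i = y_i\mid\alpha_i)} \\
&= \frac{\left[\dfrac{(L_{i1}\lambda_{i1})^{y_{i1}}e^{-L_{i1}\lambda_{i1}}}{y_{i1}!}\right]\left[\dfrac{(L_{i0}\lambda_{i0})^{(y_i - y_{i1})}e^{-L_{i0}\lambda_{i0}}}{(y_i - y_{i1})!}\right]}{\dfrac{(L_{i0}\lambda_{i0} + L_{i1}\lambda_{i1})^{y_i}e^{-(L_{i0}\lambda_{i0} + L_{i1}\lambda_{i1})}}{y_i!}} \\
&= C_{y_{i1}}^{y_i}\left(\frac{L_{i1}\theta}{L_{i0} + L_{i1}\theta}\right)^{y_{i1}}\left(\frac{L_{i0}}{L_{i0} + L_{i1}\theta}\right)^{(y_i - y_{i1})}, \quad y_{i1} = 0, 1, \ldots, y_i,
\end{aligned}$$
where
$$\theta = \frac{\lambda_{i1}}{\lambda_{i0}} = \frac{\exp\left(\alpha_i + \beta + \sum_{l=1}^{p}\gamma_lC_{il}\right)}{\exp\left(\alpha_i + \sum_{l=1}^{p}\gamma_lC_{il}\right)} = e^{\beta}.$$
(c) Since
$$L = \prod_{i=1}^{n}C_{y_{i1}}^{y_i}\left(\frac{L_{i1}\theta}{L_{i0} + L_{i1}\theta}\right)^{y_{i1}}\left(\frac{L_{i0}}{L_{i0} + L_{i1}\theta}\right)^{(y_i - y_{i1})},$$
it follows that
$$\ln(L) \propto \sum_{i=1}^{n}\left[y_{i1}\ln(\theta) - y_i\ln(L_{i0} + L_{i1}\theta)\right].$$
Thus,
$$\frac{\partial\ln(L)}{\partial\theta} = \sum_{i=1}^{n}\left[\frac{y_{i1}}{\theta} - \frac{y_iL_{i1}}{(L_{i0} + L_{i1}\theta)}\right] = 0,$$
so that the MLE $\hat{\theta}$ of $\theta$ satisfies the equation
$$\hat{\theta}\sum_{i=1}^{n}\frac{y_iL_{i1}}{L_{i0} + L_{i1}\hat{\theta}} = \sum_{i=1}^{n}y_{i1}.$$
(d) Since
$$\frac{\partial^2\ln(L)}{\partial\theta^2} = \sum_{i=1}^{n}\left[\frac{-y_{i1}}{\theta^2} + \frac{y_iL_{i1}^2}{(L_{i0} + L_{i1}\theta)^2}\right],$$
it follows that
$$-E\left[\frac{\partial^2\ln(L)}{\partial\theta^2}\,\Big|\,\{y_i\}, \{\alpha_i\}\right] = \sum_{i=1}^{n}\left[\frac{y_i\left(\dfrac{L_{i1}\theta}{L_{i0} + L_{i1}\theta}\right)}{\theta^2} - \frac{y_iL_{i1}^2}{(L_{i0} + L_{i1}\theta)^2}\right] = \frac{1}{\theta}\sum_{i=1}^{n}\frac{y_iL_{i0}L_{i1}}{(L_{i0} + L_{i1}\theta)^2}.$$
Thus, a large-sample 95% CI for the rate ratio parameter $\theta$ is
$$\hat{\theta} \pm 1.96\sqrt{\hat{\theta}}\left[\sum_{i=1}^{n}\frac{y_iL_{i0}L_{i1}}{(L_{i0} + L_{i1}\hat{\theta})^2}\right]^{-1/2}.$$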
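Because the MLE in part (c) is defined only implicitly, it has to be found numerically. The following is a minimal sketch, assuming bisection as the root-finding method (the solution does not prescribe one) and assuming at least one event in each period so that a finite positive root exists; the function names and the data are hypothetical.

```python
import math

def rate_ratio_mle(y0, y1, L0, L1, tol=1e-10):
    """Solve theta * sum(y_i L_i1 / (L_i0 + L_i1 theta)) = sum(y_i1) by bisection."""
    y = [a + b for a, b in zip(y0, y1)]
    total_y1 = sum(y1)

    def g(theta):
        # g is increasing in theta, negative at 0, and tends to sum(y0) > 0.
        s = sum(yi * l1 / (l0 + l1 * theta) for yi, l0, l1 in zip(y, L0, L1))
        return theta * s - total_y1

    lo, hi = 0.0, 1.0
    while g(hi) < 0:          # expand the bracket until the root is enclosed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rate_ratio_ci(theta_hat, y0, y1, L0, L1, z=1.96):
    """Large-sample CI: theta_hat +/- z * sqrt(theta_hat) * [sum(...)]^(-1/2)."""
    s = sum((a + b) * l0 * l1 / (l0 + l1 * theta_hat) ** 2
            for a, b, l0, l1 in zip(y0, y1, L0, L1))
    half_width = z * math.sqrt(theta_hat) / math.sqrt(s)
    return theta_hat - half_width, theta_hat + half_width

# Hypothetical counts with equal exposure in both periods.
y0, y1 = [3, 5, 2], [6, 10, 4]
L0, L1 = [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]
theta_hat = rate_ratio_mle(y0, y1, L0, L1)
print(theta_hat, rate_ratio_ci(theta_hat, y0, y1, L0, L1))
```

When $L_{i0} = L_{i1}$ for every family, the estimating equation reduces to $\hat{\theta} = \sum_i y_{i1}/\sum_i y_{i0}$, which gives a simple check on the numerical solution.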
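For an important application of this methodology, see Gibbs et al. (2007).
Solution 4.44*. With $X^* = (X_1^*, X_2^*, \ldots, X_n^*)$ and $x^* = (x_1^*, x_2^*, \ldots, x_n^*)$, we have
$$E(\hat{\beta}_1^*\mid X^* = x^*) = \frac{\sum_{i=1}^{n}(x_i^* - \bar{x}^*)E(Y_i\mid X^* = x^*)}{\sum_{i=1}^{n}(x_i^* - \bar{x}^*)^2}.$$
Now, using the nondifferential measurement error assumption, we have, for $i = 1, 2, \ldots, n$,
$$\begin{aligned}
E(Y_i\mid X^* = x^*) &= E(Y_i\mid X_i^* = x_i^*) = E_{X_i\mid X_i^* = x_i^*}\left[E(Y_i\mid X_i^* = x_i^*, X_i = x_i)\right] \\
&= E_{X_i\mid X_i^* = x_i^*}\left[E(Y_i\mid X_i = x_i)\right] = E_{X_i\mid X_i^* = x_i^*}(\beta_0 + \beta_1x_i) \\
&= \beta_0 + \beta_1E(X_i\mid X_i^* = x_i^*).
\end{aligned}$$
And, since $(X_i, X_i^* = X_i + U_i) \sim \mathrm{BVN}[\mu_x, \mu_x, \sigma_x^2, (\sigma_x^2 + \sigma_u^2), \rho]$, where
$$\rho = \frac{\mathrm{cov}(X_i, X_i^*)}{\sqrt{V(X_i)V(X_i^*)}} = \frac{\mathrm{cov}(X_i, X_i + U_i)}{\sqrt{\sigma_x^2(\sigma_x^2 + \sigma_u^2)}} = \frac{V(X_i)}{\sqrt{\sigma_x^2(\sigma_x^2 + \sigma_u^2)}} = \frac{\sigma_x^2}{\sqrt{\sigma_x^2(\sigma_x^2 + \sigma_u^2)}} = \frac{1}{\sqrt{1 + \dfrac{\sigma_u^2}{\sigma_x^2}}} = \frac{1}{\sqrt{1+\lambda}},$$
it follows from bivariate normal distribution theory that
$$\begin{aligned}
E(X_i\mid X_i^* = x_i^*) &= \mu_x + \rho\sqrt{\frac{V(X_i)}{V(X_i^*)}}\,(x_i^* - \mu_x) = \mu_x + \frac{1}{\sqrt{1+\lambda}}\sqrt{\frac{\sigma_x^2}{(\sigma_x^2 + \sigma_u^2)}}\,(x_i^* - \mu_x) \\
&= \mu_x + \left(\frac{1}{1+\lambda}\right)(x_i^* - \mu_x).
\end{aligned}$$
Hence, we have
$$E(Y_i\mid X^* = x^*) = \beta_0 + \beta_1\left[\mu_x + \left(\frac{1}{1+\lambda}\right)(x_i^* - \mu_x)\right] = \beta_0^* + \beta_1^*x_i^*,$$
where
$$\beta_0^* = \beta_0 + \beta_1\mu_x\left(\frac{\lambda}{1+\lambda}\right) \quad\text{and}\quad \beta_1^* = \frac{\beta_1}{(1+\lambda)}.$$
Finally, we obtain
$$E(\hat{\beta}_1^*\mid X^* = x^*) = \frac{\sum_{i=1}^{n}(x_i^* - \bar{x}^*)(\beta_0^* + \beta_1^*x_i^*)}{\sum_{i=1}^{n}(x_i^* - \bar{x}^*)^2} = \beta_1^* = \frac{\beta_1}{(1+\lambda)}.$$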
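The factor $1/(1+\lambda)$ can be illustrated by simulation. The following is a minimal sketch, assuming normally distributed $X_i$, $U_i$, and regression errors with arbitrarily chosen parameter values; the function name, the seed, and all numerical settings are hypothetical and are not taken from the exercise.

```python
import random

def simulated_naive_slope(beta0, beta1, mu_x, sigma_x, sigma_u, sigma_e, n, seed=0):
    """Least-squares slope of Y on the error-prone predictor X* = X + U."""
    rng = random.Random(seed)
    x_star, y = [], []
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)                 # true predictor
        x_star.append(x + rng.gauss(0.0, sigma_u))   # measured with error
        y.append(beta0 + beta1 * x + rng.gauss(0.0, sigma_e))
    xbar = sum(x_star) / n
    ybar = sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x_star, y))
    sxx = sum((a - xbar) ** 2 for a in x_star)
    return sxy / sxx

# With sigma_u = sigma_x, lambda = 1 and the slope is expected to be halved.
beta1, sigma_x, sigma_u = 2.0, 1.0, 1.0
lam = sigma_u ** 2 / sigma_x ** 2
slope = simulated_naive_slope(1.0, beta1, 0.0, sigma_x, sigma_u, 0.5, n=50_000, seed=1)
print(slope, beta1 / (1.0 + lam))
```

For a large simulated sample, the fitted slope should be close to $\beta_1/(1+\lambda)$ rather than to $\beta_1$.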
So, since $0 < \lambda < \infty$, $|\beta_1^*| < |\beta_1|$, indicating a somewhat common detrimental measurement error effect called attenuation. Because the predictor variable is measured with error, the estimator $\hat{\beta}_1^*$ tends, on average, to underestimate the true slope $\beta_1$ (i.e., the estimator $\hat{\beta}_1^*$ is said to be attenuated). As $\lambda = \sigma_u^2/\sigma_x^2$ increases in value, the amount of attenuation increases. For more complicated measurement error scenarios, attenuation should not always be the anticipated measurement error effect; in particular, an estimator could actually have a tendency to overestimate a particular parameter of interest [e.g., see Kupper (1984)].
Solution 4.45*
(a) First,
$$\mathrm{cov}(X, X^*\mid C) = \mathrm{cov}(\alpha_0 + \alpha_1X^* + \delta'C + U, X^*\mid C) = \alpha_1V(X^*\mid C).$$
And,
$$V(X\mid C) = \alpha_1^2V(X^*\mid C) + \sigma_u^2.$$
So,
$$\mathrm{corr}(X, X^*\mid C) = \frac{\mathrm{cov}(X, X^*\mid C)}{\sqrt{V(X\mid C)V(X^*\mid C)}} = \frac{\alpha_1V(X^*\mid C)}{\sqrt{[\alpha_1^2V(X^*\mid C) + \sigma_u^2]V(X^*\mid C)}} = \frac{1}{\sqrt{1 + \sigma_u^2/[\alpha_1^2V(X^*\mid C)]}} < 1.$$
(b) With $X = \alpha_0 + \alpha_1X^* + \delta'C + U$ and given $X^*$ and $C$, it follows directly that $X$ has a normal distribution with $E(X\mid X^*, C) = \alpha_0 + \alpha_1X^* + \delta'C$ and $V(X\mid X^*, C) = V(U) = \sigma_u^2$.
(c) Now, appealing to the nondifferential error assumption, we have
$$\begin{aligned}
\mathrm{pr}(Y = 1\mid X^*, C) &= E(Y\mid X^*, C) = E_{X\mid X^*, C}\left[E(Y\mid X, X^*, C)\right] \\
&= E_{X\mid X^*, C}\left[E(Y\mid X, C)\right] = E_{X\mid X^*, C}\left[\mathrm{pr}(Y = 1\mid X, C)\right] \\
&= E_{X\mid X^*, C}\left[e^{(\beta_0 + \beta_1X + \gamma'C)}\right] = e^{(\beta_0 + \gamma'C)}E_{X\mid X^*, C}\left(e^{\beta_1X}\right).
\end{aligned}$$
Thus, from moment generating function theory and the results in part (b), we have
$$\mathrm{pr}(Y = 1\mid X^*, C) = e^{(\beta_0 + \gamma'C)}e^{[\beta_1(\alpha_0 + \alpha_1X^* + \delta'C)] + (\beta_1^2\sigma_u^2)/2} = e^{(\theta_0 + \theta_1X^* + \xi'C)},$$
where $\theta_0 = (\beta_0 + \beta_1\alpha_0 + (\beta_1^2\sigma_u^2)/2)$, $\theta_1 = \beta_1\alpha_1$, and $\xi' = (\gamma' + \beta_1\delta')$. Since $\theta_1/\beta_1 = \alpha_1$, it follows that $0 < \theta_1 \leq \beta_1$ when $0 < \alpha_1 \leq 1$ and that $\theta_1 > \beta_1$ when $\alpha_1 > 1$. So, the use of $X^*$ instead of $X$ will result in biased estimation of the parameter $\beta_1$. In particular, if $0 < \alpha_1 < 1$, the tendency will be to underestimate $\beta_1$; and, if $\alpha_1 > 1$, the tendency will be to overestimate $\beta_1$.
Solution 4.46*
(a) First, $E(X) = \delta$ and $V(X) = \delta(1 - \delta)$. So,
$$E(X^*) = E[(X^*)^2] = E\left[E(X^*\mid X = x)\right] = E\left[\mathrm{pr}(X^* = 1\mid X = x)\right] = E(\pi_{x1}) = \pi_{11}\delta + \pi_{01}(1 - \delta),$$
and
$$V(X^*) = E(X^*)\left[1 - E(X^*)\right].$$
Also,
$$E(XX^*) = E\left[E(XX^*\mid X = x)\right] = E\left[(x)\,\mathrm{pr}(X^* = 1\mid X = x)\right] = E[(x)\pi_{x1}] = (1)\pi_{11}\delta = \pi_{11}\delta.$$
Thus,
$$\mathrm{corr}(X, X^*) = \frac{\mathrm{cov}(X, X^*)}{\sqrt{V(X)V(X^*)}} = \frac{E(XX^*) - E(X)E(X^*)}{\sqrt{V(X)V(X^*)}} = \frac{\pi_{11}\delta - \delta E(X^*)}{\sqrt{\delta(1 - \delta)E(X^*)[1 - E(X^*)]}}.$$
So, $\mathrm{corr}(X, X^*) = 1$ when $\pi_{11} = 1$ and $\pi_{01} = 0$ (or, equivalently, when $\pi_{10} = 0$ and $\pi_{00} = 1$), since then $E(X^*) = \delta$. When $\pi_{11} = \mathrm{pr}(X^* = 1\mid X = 1) < 1$ and/or $\pi_{01} = \mathrm{pr}(X^* = 1\mid X = 0) > 0$, so that $\mathrm{corr}(X, X^*) < 1$, then $X^*$ is an imperfect surrogate for $X$.
(b) Now, using the nondifferential error assumption given earlier, we have
$$\begin{aligned}
\mu_{x^*}^* = \mathrm{pr}(Y = 1\mid X^* = x^*) &= \sum_{x=0}^{1}\mathrm{pr}[(Y = 1)\cap(X = x)\mid X^* = x^*] \\
&= \sum_{x=0}^{1}\mathrm{pr}[Y = 1\mid(X = x)\cap(X^* = x^*)]\,\mathrm{pr}(X = x\mid X^* = x^*) \\
&= \sum_{x=0}^{1}\mathrm{pr}(Y = 1\mid X = x)\,\mathrm{pr}(X = x\mid X^* = x^*) = \sum_{x=0}^{1}\mu_x\gamma_{x^*x},
\end{aligned}$$
where $\gamma_{x^*x} = \mathrm{pr}(X = x\mid X^* = x^*)$. So, since $(\gamma_{00} + \gamma_{01}) = (\gamma_{10} + \gamma_{11}) = 1$, we have
$$\begin{aligned}
\theta^* = (\mu_1^* - \mu_0^*) &= \sum_{x=0}^{1}\mu_x\gamma_{1x} - \sum_{x=0}^{1}\mu_x\gamma_{0x} \\
&= (\mu_0\gamma_{10} + \mu_1\gamma_{11}) - (\mu_0\gamma_{00} + \mu_1\gamma_{01}) \\
&= \mu_0(1 - \gamma_{11}) + \mu_1\gamma_{11} - \mu_0(1 - \gamma_{01}) - \mu_1\gamma_{01} \\
&= (\mu_1 - \mu_0)(\gamma_{11} - \gamma_{01}) = \theta(\gamma_{11} - \gamma_{01}),
\end{aligned}$$
so that $|\theta^*| \leq |\theta|$ since $|\gamma_{11} - \gamma_{01}| \leq 1$. Hence, under the assumption of nondifferential error, the use of $X^*$ instead of $X$ tends, on average, to lead to underestimation of the risk difference parameter $\theta$, a phenomenon known as attenuation. For more detailed discussion about the effects of misclassification error on the validity of analyses of epidemiologic data, see Gustafson (2004) and Kleinbaum et al. (1982).
5 Hypothesis Testing Theory
5.1 Concepts and Notation
5.1.1 Basic Principles
5.1.1.1 Simple and Composite Hypotheses
A statistical hypothesis is an assertion about the distribution of one or more random variables. If the statistical hypothesis completely specifies the distribution (i.e., the hypothesis assigns numerical values to all unknown population parameters), then it is called a simple hypothesis; otherwise, it is called a composite hypothesis.
5.1.1.2 Null and Alternative Hypotheses
In the typical statistical hypothesis testing situation, there are two hypotheses of interest: the null hypothesis (denoted $H_0$) and the alternative hypothesis (denoted $H_1$). The statistical objective is to use the information in a sample from the distribution under study to make a decision about whether $H_0$ or $H_1$ is more likely to be true (i.e., is more likely to represent the true “state of nature”).
5.1.1.3 Statistical Tests
A statistical test of $H_0$ versus $H_1$ consists of a rule which, when operationalized using the available information in a sample, leads to a decision either to reject, or not to reject, $H_0$ in favor of $H_1$. It is important to point out that a decision not to reject $H_0$ does not imply that $H_0$ is, in fact, true; in particular, the decision not to reject $H_0$ is often due to data inadequacies (e.g., too small a sample size, erroneous and/or missing information, etc.).
5.1.1.4 Type I and Type II Errors
For any statistical test, there are two possible decision errors that can be made. A “Type I” error occurs when the decision is made to reject $H_0$ in favor of $H_1$ when, in fact, $H_0$ is true; the probability of a Type I error is denoted as $\alpha = \mathrm{pr}(\text{test rejects } H_0\mid H_0 \text{ true})$. A “Type II” error occurs when the decision is made not to reject $H_0$ when, in fact, $H_0$ is false and $H_1$ is true; the probability of a Type II error is denoted as $\beta = \mathrm{pr}(\text{test does not reject } H_0\mid H_0 \text{ false})$.
5.1.1.5 Power
The power of a statistical test is the probability of rejecting $H_0$ when, in fact, $H_0$ is false and $H_1$ is true; in particular, $\text{POWER} = \mathrm{pr}(\text{test rejects } H_0\mid H_0 \text{ false}) = (1 - \beta)$. Type I error rate $\alpha$ is controllable and is typically assigned a value satisfying the inequality $0 < \alpha \leq 0.10$. For a given value of $\alpha$, Type II error rate $\beta$, and hence the power $(1 - \beta)$, will generally vary as a function of the values of population parameters allowable under a composite alternative hypothesis $H_1$. In general, for a specified value of $\alpha$, the power of any reasonable statistical testing procedure should increase as the sample size increases. Power is typically used as a very important criterion for choosing among several statistical testing procedures in any given situation.
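To make the dependence of power on sample size concrete, here is a minimal sketch for one specific and deliberately simple case, chosen only for illustration: a one-sided test of $H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$ based on the mean of $n$ observations from a normal population with known standard deviation. The function name and the numerical settings are hypothetical.

```python
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the size-alpha one-sided z test when the true mean is mu1."""
    z = NormalDist()
    critical = z.inv_cdf(1.0 - alpha)            # reject when Z > critical
    shift = (mu1 - mu0) * n ** 0.5 / sigma       # mean of Z under mu1
    return 1.0 - z.cdf(critical - shift)

for n in (10, 25, 50, 100):
    print(n, round(power_one_sided_z(mu0=0.0, mu1=0.3, sigma=1.0, n=n), 3))
```

The printed powers increase with $n$, and setting `mu1` equal to `mu0` returns the Type I error rate $\alpha$.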
5.1.1.6 Test Statistics and Rejection Regions
A statistical test of $H_0$ versus $H_1$ is typically carried out by using a test statistic. A test statistic is a random variable with the following properties: (i) its distribution, assuming the null hypothesis $H_0$ is true, is known either exactly or to a close approximation (i.e., for large sample sizes); (ii) its numerical value can be computed using the information in a sample; and, (iii) its computed numerical value leads to a decision either to reject, or not to reject, $H_0$ in favor of $H_1$. More specifically, for a given statistical test and associated test statistic, the set of all possible numerical values of the test statistic under $H_0$ is divided into two disjoint subsets (or “regions”), the rejection region $R$ and the nonrejection region $\bar{R}$. The statistical test decision rule is then defined as follows: if the computed numerical value of the test statistic is in the rejection region $R$, then reject $H_0$ in favor of $H_1$; otherwise, do not reject $H_0$. The rejection region $R$ is chosen so that, under $H_0$, the probability that the test statistic falls in the rejection region $R$ is equal to (or approximately equal to) $\alpha$ (in which case the rejection region and the associated statistical test are both said to be of “size” $\alpha$). Almost all popular statistical testing procedures use test statistics that, under $H_0$, follow (either exactly or approximately) well-tabulated distributions such as the standard normal distribution, the t-distribution, the chi-squared distribution, and the F-distribution.
5.1.1.7 P-Values
The P-value for a statistical test is the probability of observing a test statistic value at least as rare as the value actually observed under the assumption that the null hypothesis $H_0$ is true. Thus, for a size $\alpha$ test, when the decision is made to reject $H_0$, then the P-value is less than $\alpha$; and, when the decision is made not to reject $H_0$, then the P-value is greater than $\alpha$.
5.1.2 Most Powerful (MP) and Uniformly Most Powerful (UMP) Tests
Let $X = (X_1, X_2, \ldots, X_n)$ be a random row vector with likelihood function (or joint distribution) $L(x; \theta)$ depending on a row vector $\theta = (\theta_1, \theta_2, \ldots, \theta_p)$ of $p$ unknown parameters. Let $R$ denote some subset of all the possible realizations $x = (x_1, x_2, \ldots, x_n)$ of the random vector $X$. Then, $R$ is the most powerful (or MP) rejection region of size $\alpha$ for testing the simple null hypothesis $H_0: \theta = \theta_0$ versus the simple alternative hypothesis $H_1: \theta = \theta_1$ if, for every subset $A$ of all possible realizations $x$ of $X$ for which $\mathrm{pr}(X \in A\mid H_0: \theta = \theta_0) = \alpha$, we have
$$\mathrm{pr}(X \in R\mid H_0: \theta = \theta_0) = \alpha \quad\text{and}\quad \mathrm{pr}(X \in R\mid H_1: \theta = \theta_1) \geq \mathrm{pr}(X \in A\mid H_1: \theta = \theta_1).$$
Given $L(x; \theta)$, the determination of the structure of the MP rejection region $R$ of size $\alpha$ for testing $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$ can be made using the Neyman–Pearson Lemma (Neyman and Pearson, 1933).
Neyman–Pearson Lemma Let X = (X1 , X2 , . . . , Xn ) be a random row vector with likelihood function (or joint distribution) of known form L(x; θ) that depends on a row vector θ = (θ1 , θ2 , . . . , θp ) of p unknown parameters. Let R be a subset of all possible realizations x = (x1 , x2 , . . . , xn ) of X. Then, R is the most powerful (MP) rejection region of size α (and the associated test using R is the most powerful test of size α) for testing the simple null hypothesis H0 : θ = θ0 versus the simple alternative hypothesis H1 : θ = θ1 if, for some k > 0, the following three conditions are satisfied: L(x; θ0 ) χ2p,1−α . 5.1.3.2 Wald Test ˆ 0 χ2 ¯ when W . would reject H0 : θ = θ0 in favor of H1 : θ ∈ ω p,1−α
5.1.3.3 Score Test
With the row vector $S(\theta)$ defined as
$$S(\theta) = \left[\frac{\partial\ln L(x; \theta)}{\partial\theta_1}, \frac{\partial\ln L(x; \theta)}{\partial\theta_2}, \ldots, \frac{\partial\ln L(x; \theta)}{\partial\theta_p}\right],$$
the score test statistic $\hat{S}$, $0 < \hat{S} < +\infty$, for testing $H_0: \theta = \theta_0$ versus $H_1: \theta \in \bar{\omega}$ is defined as
$$\hat{S} = S(\theta_0)I^{-1}(\theta_0)S'(\theta_0)$$
when using expected information, and is defined as
$$\hat{S} = S(\theta_0)I^{-1}(x; \theta_0)S'(\theta_0)$$
when using observed information. For the simple null hypothesis $H_0: \theta = \theta_0$, note that the computation of the value of $\hat{S}$ involves no parameter estimation. Under certain regularity conditions (e.g., see Rao, 1947), for large $n$ and under $H_0: \theta = \theta_0$, $\hat{S} \mathrel{\dot{\sim}} \chi_p^2$. Thus, for a score test of approximate size $\alpha$, one would reject $H_0: \theta = \theta_0$ in favor of $H_1: \theta \in \bar{\omega}$ when $\hat{S} > \chi_{p,1-\alpha}^2$.
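For further discussion concerning likelihood ratio, Wald, and score tests, see Rao (1973).
Example
As an example, let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from the parent population $p_X(x; \theta) = \theta^x(1 - \theta)^{1-x}$, $x = 0, 1$ and $0 < \theta < 1$. Consider testing $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$. Then, with $\hat{\theta} = \bar{X} = n^{-1}\sum_{i=1}^{n}X_i$, it can be shown that
$$-2\ln\hat{\lambda} = 2n\left[\bar{X}\ln\left(\frac{\bar{X}}{\theta_0}\right) + (1 - \bar{X})\ln\left(\frac{1 - \bar{X}}{1 - \theta_0}\right)\right],$$
that
$$\hat{W} = \left[\frac{(\bar{X} - \theta_0)}{\sqrt{\bar{X}(1 - \bar{X})/n}}\right]^2,$$
and that
$$\hat{S} = \left[\frac{(\bar{X} - \theta_0)}{\sqrt{\theta_0(1 - \theta_0)/n}}\right]^2.$$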
This simple example highlights an important general difference between Wald tests and score tests. Wald tests use parameter variance estimates assuming that θ ∈ Ω is true (i.e., assuming no restrictions on the parameter space Ω), and score tests use parameter variance estimates assuming that θ ∈ ω (i.e., assuming that H0 is true).
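The three statistics in this Bernoulli example are straightforward to evaluate. The following is a minimal sketch, assuming $0 < \bar{X} < 1$ so that the logarithms are defined; the function name and the numerical inputs are hypothetical.

```python
import math

def bernoulli_test_statistics(xbar, n, theta0):
    """Likelihood ratio, Wald, and score statistics for H0: theta = theta0."""
    lr = 2.0 * n * (xbar * math.log(xbar / theta0)
                    + (1.0 - xbar) * math.log((1.0 - xbar) / (1.0 - theta0)))
    wald = (xbar - theta0) ** 2 / (xbar * (1.0 - xbar) / n)       # variance at the MLE
    score = (xbar - theta0) ** 2 / (theta0 * (1.0 - theta0) / n)  # variance under H0
    return lr, wald, score

lr, wald, score = bernoulli_test_statistics(xbar=0.60, n=100, theta0=0.50)
print(lr, wald, score)
```

The only difference between the Wald and score statistics here is the variance used in the denominator, which is exactly the distinction described in the paragraph above. Each value would be compared with the appropriate upper percentile of the chi-squared distribution with 1 degree of freedom.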
5.1.4 Large Sample ML-Based Methods for Testing the Composite Null Hypothesis $H_0: \theta \in \omega$ versus the Composite Alternative Hypothesis $H_1: \theta \in \bar{\omega}$
Let Ri (θ) = 0, i = 1, 2, . . . , r, represent r ( χ2r,1−α . 5.1.4.2 Wald Test Let the (1 × r) row vector R(θ) be defined as R(θ) = [R1 (θ), R2 (θ), . . . , Rr (θ)] . Also, let the (r × p) matrix T(θ) have (i, j) element equal to [∂Ri (θ)]/∂θj , i = 1, 2, . . . , r and j = 1, 2, . . . , p. And, let the (r × r) matrix Λ(θ) have the structure Λ(θ) = T(θ)I −1 (θ)T (θ) when using expected information, and have the structure Λ(x; θ) = T(θ)I −1 (x; θ)T (θ) when using observed information. ˆ 0 χ2 in favor of H1 : θ ∈ ω ¯ when W r,1−α . 5.1.4.3
Score Test
The score test statistic $\hat{S}$, $0 < \hat{S} < +\infty$, for testing $H_0: \theta \in \omega$ versus $H_1: \theta \in \bar{\omega}$ is defined as
$$\hat{S} = S(\hat{\theta}_\omega)I^{-1}(\hat{\theta}_\omega)S'(\hat{\theta}_\omega)$$
when using expected information, and is defined as
$$\hat{S} = S(\hat{\theta}_\omega)I^{-1}(x; \hat{\theta}_\omega)S'(\hat{\theta}_\omega)$$
when using observed information. Under certain regularity conditions, for large $n$ and under $H_0: \theta \in \omega$, $\hat{S} \mathrel{\dot{\sim}} \chi_r^2$. Thus, for a score test of approximate size $\alpha$, one would reject $H_0: \theta \in \omega$ in favor of $H_1: \theta \in \bar{\omega}$ when $\hat{S} > \chi_{r,1-\alpha}^2$.
Example
As an example, let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from a $N(\mu, \sigma^2)$ parent population. Consider testing the composite null hypothesis $H_0: \mu = \mu_0$, $0 < \sigma^2 < +\infty$, versus the composite alternative hypothesis $H_1: \mu \neq \mu_0$, $0 < \sigma^2 < +\infty$. Note that this test is typically called a test of $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$. It is straightforward to show that the vector $\hat{\theta}$ of MLEs of $\mu$ and $\sigma^2$ for the unrestricted parameter space $\Omega$ is equal to
$$\hat{\theta} = (\hat{\mu}, \hat{\sigma}^2) = \left[\bar{X}, \left(\frac{n-1}{n}\right)S^2\right],$$
where $\bar{X} = n^{-1}\sum_{i=1}^{n}X_i$ and $S^2 = (n-1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. Then, it can be shown directly that
$$-2\ln\hat{\lambda} = n\ln\left[1 + \frac{T_{n-1}^2}{(n-1)}\right],$$
where
$$T_{n-1} = \frac{(\bar{X} - \mu_0)}{S/\sqrt{n}} \sim t_{n-1} \text{ under } H_0: \mu = \mu_0;$$
thus, the likelihood ratio test is a function of the usual one-sample t test in this simple situation. In this simple situation, the Wald test is also a function of the usual one-sample t test since
$$\hat{W} = \left(\frac{n}{n-1}\right)T_{n-1}^2.$$
In contrast, the score test statistic has the structure
$$\hat{S} = \left[\frac{(\bar{X} - \mu_0)}{\hat{\sigma}_\omega/\sqrt{n}}\right]^2,$$
where
$$\hat{\sigma}_\omega^2 = n^{-1}\sum_{i=1}^{n}(X_i - \mu_0)^2$$
is the estimator of $\sigma^2$ under the null hypothesis $H_0: \mu = \mu_0$.
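The relationships among the three statistics in this normal-mean example can be verified numerically. The following is a minimal sketch; the function name and the data values are hypothetical.

```python
import math

def normal_mean_test_statistics(xs, mu0):
    """One-sample t statistic plus the LR, Wald, and score statistics for H0: mu = mu0."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)        # usual sample variance S^2
    t = (xbar - mu0) / math.sqrt(s2 / n)
    lr = n * math.log(1.0 + t * t / (n - 1))
    wald = n * t * t / (n - 1)
    sigma2_null = sum((x - mu0) ** 2 for x in xs) / n       # MLE of sigma^2 under H0
    score = (xbar - mu0) ** 2 / (sigma2_null / n)
    return t, lr, wald, score

# Hypothetical data, for illustration only.
xs = [4.1, 5.2, 6.3, 5.8, 4.9, 5.5]
t, lr, wald, score = normal_mean_test_statistics(xs, mu0=5.0)
print(t, lr, wald, score)
```

Writing $u = T_{n-1}^2/(n-1)$, the three statistics are $nu$ (Wald), $n\ln(1+u)$ (likelihood ratio), and $nu/(1+u)$ (score), so in this example the Wald statistic is never smaller than the likelihood ratio statistic, which in turn is never smaller than the score statistic.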
Although all three of these ML-based hypothesis-testing methods (the likelihood ratio test, the Wald test, and the score test) are asymptotically equivalent, their use can lead to different conclusions in some actual data-analysis scenarios.
EXERCISES
Exercise 5.1. Consider sampling from the parent population
$$f_X(x; \theta) = \theta x^{\theta-1}, \quad 0 < x < 1,\ \theta > 0.$$
(a) Based on a random sample $X_1$ of size $n = 1$ from this parent population, what is the power of the MP test of $H_0: \theta = 1$ versus $H_1: \theta = 2$ if $\alpha = \mathrm{pr}(\text{Type I error}) = 0.05$?
(b) If $X_1$ and $X_2$ constitute a random sample of size $n = 2$ from this parent population, derive the exact structure of the rejection region of size $\alpha = 0.05$ associated with the MP test of $H_0: \theta = 1$ versus $H_1: \theta = 2$. Specifically, find the numerical value of the dividing point $k_\alpha$ between the rejection and nonrejection regions.
Exercise 5.2. Let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ from the parent density
$$f_Y(y; \theta) = (1 + \theta)(y + \theta)^{-2}, \quad y > 1,\ \theta > -1.$$
(a) Develop an explicit expression for the form of the MP rejection region $R$ for testing $H_0: \theta = 0$ versus $H_1: \theta = 1$ when $\mathrm{pr}(\text{Type I error}) = \alpha$.
(b) If $n = 1$ and $\alpha = 0.05$, find the numerical value of the dividing point between the rejection and nonrejection regions for this MP test.
(c) If, in fact, $\theta = 1$, what is the exact numerical value of the power of this MP test of $H_0: \theta = 0$ versus $H_1: \theta = 1$ when $\alpha = 0.05$ and $n = 1$?
Exercise 5.3. Let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ from a $N(0, \sigma^2)$ population. Develop the structure of the rejection region for a uniformly most powerful (UMP) test of $H_0: \sigma^2 = 1$ versus $H_1: \sigma^2 > 1$. Then, use this result to find a reasonable value for the smallest sample size (say, $n^*$) that is needed to provide a power of at least 0.80 for rejecting $H_0$ in favor of $H_1$ when $\alpha = 0.05$ and when the actual value of $\sigma^2$ is no smaller than 2.0 in value.
Exercise 5.4. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from
$$p_X(x; \theta_1) = \theta_1^x(1 - \theta_1)^{1-x}, \quad x = 0, 1, \text{ and } 0 < \theta_1 < 1;$$
and, let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ from
$$p_Y(y; \theta_2) = \theta_2^y(1 - \theta_2)^{1-y}, \quad y = 0, 1, \text{ and } 0 < \theta_2 < 1.$$
(a) If $n = 30$, derive a reasonable numerical value for the power of a size $\alpha = 0.05$ MP test of $H_0: \theta_1 = \theta_2 = 0.50$ versus $H_1: \theta_1 = \theta_2 = 0.60$.
(b) Now, suppose that it is of interest to test $H_0: \theta_1 = \theta_2 = \theta_0$ (where $\theta_0$ is a specified constant, $0 < \theta_0 < 1$) versus $H_1: \theta_1 > \theta_2$ at the $\alpha = 0.05$ level using a test statistic that is an explicit function of $(\bar{X} - \bar{Y})$, where $\bar{X} = n^{-1}\sum_{i=1}^{n}X_i = n^{-1}S_x$ and $\bar{Y} = n^{-1}\sum_{i=1}^{n}Y_i = n^{-1}S_y$. Provide a reasonable value for the smallest sample size (say, $n^*$) needed so that the power for testing $H_0$ versus $H_1$ is at least 0.90 when $\theta_0 = 0.10$ and when $(\theta_1 - \theta_2) \geq 0.20$.
Exercise 5.5. An epidemiologist gathers data $(x_i, Y_i)$ on each of $n$ randomly chosen noncontiguous and demographically similar cities in the United States, where $x_i$ $(i = 1, 2, \ldots, n)$ is the known population size (in millions of people) in city $i$, and where $Y_i$ is the random variable denoting the number of people in city $i$ with colon cancer. It is reasonable to assume that $Y_i$ $(i = 1, 2, \ldots, n)$ has a Poisson distribution with mean $E(Y_i) = \theta x_i$, where $\theta\,(>0)$ is an unknown parameter, and that $Y_1, Y_2, \ldots, Y_n$ are mutually independent random variables.
(a) Using the available data $(x_i, Y_i)$, $i = 1, 2, \ldots, n$, construct a UMP test of $H_0: \theta = 1$ versus $H_1: \theta > 1$.
(b) If $\sum_{i=1}^{n}x_i = 0.82$, what is the power of this UMP test for rejecting $H_0: \theta = 1$ versus $H_1: \theta > 1$ when the probability of a Type I error $\alpha \doteq 0.05$ and when, in reality, $\theta = 5$?
Exercise 5.6. For $i = 1, 2$, suppose that it is desired to select a random sample $X_{i1}, X_{i2}, \ldots, X_{in_i}$ of size $n_i$ from a $N(\mu_i, \sigma_i^2)$ population, where $\mu_1$ and $\mu_2$ are unknown parameters and where $\sigma_1^2$ and $\sigma_2^2$ are known parameters. For testing $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 - \mu_2 = \delta\,(> 0)$, the test statistic
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{V}}$$
is to be used, where
$$\bar{X}_i = n_i^{-1}\sum_{j=1}^{n_i}X_{ij} \text{ for } i = 1, 2, \quad\text{and}\quad V = \left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right).$$
(a) If the null hypothesis is to be rejected when $Z > Z_{1-\alpha}$, show that the two conditions $\mathrm{pr}(\text{Type I error}) = \alpha$ and $\mathrm{pr}(\text{Type II error}) = \beta$ are simultaneously satisfied when
$$V = \left(\frac{\delta}{Z_{1-\alpha} + Z_{1-\beta}}\right)^2 = \theta, \text{ say}.$$
(b) Subject to the constraint $V = \theta$, find (as a function of $\sigma_1^2$ and $\sigma_2^2$) that value of $n_1/n_2$ which minimizes the total sample size $N = (n_1 + n_2)$. Due to logistical constraints, suppose that it is only possible to select a total sample size of $N = (n_1 + n_2) = 100$. If $N = 100$, $\sigma_1^2 = 9$, and $\sigma_2^2 = 4$, find the appropriate values of $n_1$ and $n_2$.
(c) Again, subject to the constraint $V = \theta$, develop expressions for $n_1$ and $n_2$ (in terms of $\theta$, $\sigma_1$, and $\sigma_2$) that will minimize the total sampling cost if the cost of selecting an observation from Population 1 is four times the cost of selecting an observation from Population 2. What are the specific sample sizes needed if $\sigma_1 = 5$, $\sigma_2 = 4$, $\alpha = 0.05$, $\beta = 0.10$, and $\delta = 3$?
Exercise 5.7. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from the parent population $f_X(x; \theta) = \theta^{-1}$,
0 < x < θ,
where θ is an unknown parameter. Suppose that a statistician proposes the following test of H0 : θ = θ0 versus H1 : θ > θ0 : “reject H0 in favor of H1 if X(n) > c, where X(n) is the largest observation in the set X1 , X2 , . . . , Xn and where c is a specified positive constant.” (a) If θ0 = 12 , find that specified value of c, say c∗ , such that pr(Type I error) = α. Note that c∗ will be a function of both n and α. (b) If the true value of θ is actually 43 , find the smallest value of n (say, n∗ ) required so that the power of the statistician’s test is at least 0.98 when α = 0.05 and θ0 = 12 .
Exercise 5.8. For the ith of n independently selected busy intersections in a certain heavily populated U.S. city, the number Xi of automobile accidents in any given year is assumed to have a Poisson distribution with mean μi , i = 1, 2, . . . , n. It can be assumed that these n intersections are essentially the same with respect to the rate of traffic flow per day. It is of interest to test the null hypothesis H0 : μi = μ, i = 1, 2, . . . , n, versus the (unrestricted) alternative hypothesis H1 that the μi ’s are not necessarily all equal to one another (i.e., that they are completely free to vary in value). In other words, we wish to use the n mutually independent Poisson random variables X1 , X2 , . . . , Xn to assess whether or not the true average number of accidents in any given year is the
same at each of the $n$ intersections. Note that testing $H_0$ versus $H_1$ is equivalent to testing “homogeneity” versus “heterogeneity” among the $\mu_i$’s.
(a) Develop an explicit expression for the likelihood ratio statistic $-2\ln(\hat{\lambda})$ for testing $H_0$ versus $H_1$. If, in a sample of $n = 40$ intersections in a particular year, there were 20 intersections each with a total of 5 accidents, 10 intersections each with a total of 6 accidents, and 10 intersections each with a total of 8 accidents, demonstrate that $H_0$ is not rejected at the $\alpha = 0.05$ level based on these data.
(b) Based on the data and the hypothesis test results for part (a), construct what you deem to be an appropriate 95% CI for $\mu$.
Exercise 5.9. It is of interest to compare two cities (say, City 1 and City 2) with regard to their true rates ($\lambda_1$ and $\lambda_2$, respectively) of primary medical care utilization, where these two rates are expressed in units of the number of outpatient doctor visits per person-year of community residence. For $i = 1, 2$, suppose that $n$ adult residents are randomly selected from City $i$; further, suppose that the values of the two variables $X_{ij}$ and $L_{ij}$ are recorded for the $j$th person $(j = 1, 2, \ldots, n)$ in this random sample from City $i$, where $X_{ij}$ is the total number of outpatient doctor visits made by this person while residing in City $i$, and where $L_{ij}$ is the length of residency (in years) in City $i$ for this person. Hence, for $i = 1, 2$, the data for City $i$ consist of the $n$ mutually independent pairs $(X_{i1}, L_{i1}), (X_{i2}, L_{i2}), \ldots, (X_{in}, L_{in})$. In what follows, it is to be assumed that the distribution of $X_{ij}$ is $\mathrm{POI}(L_{ij}\lambda_i)$, so that $E(X_{ij}) = V(X_{ij}) = L_{ij}\lambda_i$. Furthermore, the $L_{ij}$’s are to be considered as fixed known constants.
(a) Develop an explicit expression for the likelihood function for all $2n$ observations ($n$ from City 1 and $n$ from City 2), and find two statistics which are jointly sufficient for $\lambda_1$ and $\lambda_2$.
(b) Using the likelihood function in part (a), prove that the MLE of $\lambda_i$ is
$$\hat{\lambda}_i = \frac{\sum_{j=1}^{n}X_{ij}}{\sum_{j=1}^{n}L_{ij}}, \quad i = 1, 2.$$
(c) Suppose that it is of interest to test the composite null hypothesis $H_0: \lambda_1 = \lambda_2\ (= \lambda, \text{ say})$ versus the composite alternative hypothesis $H_1: \lambda_1 \neq \lambda_2$. Assuming that $H_0$ is true, find the MLE $\hat{\lambda}$ of $\lambda$.
(d) Develop an explicit expression for the likelihood ratio statistic which can be used to test $H_0: \lambda_1 = \lambda_2$ versus $H_1: \lambda_1 \neq \lambda_2$.
(e) Suppose that $n = 25$, $\hat{\lambda}_1 = 0.02$, $\hat{\lambda}_2 = 0.03$, $\sum_{j=1}^{n}L_{1j} = 200$, and $\sum_{j=1}^{n}L_{2j} = 300$. Use the likelihood ratio statistic developed in part (d) to test $H_0: \lambda_1 = \lambda_2$ versus $H_1: \lambda_1 \neq \lambda_2$ at the $\alpha = 0.10$ level. What is the P-value of your test?
Exercise 5.10. Suppose that $X$ and $Y$ are continuous random variables representing the survival times (in years) for patients following two different types of surgical procedures for the treatment of advanced colon cancer. Further, suppose that these survival time distributions are assumed to be of the form $f_X(x; \alpha) = \alpha e^{-\alpha x}$,
$x > 0$, $\alpha > 0$ and $f_Y(y; \beta) = \beta e^{-\beta y}$, $y > 0$, $\beta > 0$.
Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ denote random samples of size $n$ from $f_X(x; \alpha)$ and $f_Y(y; \beta)$, respectively. Also, let $\bar{X} = n^{-1}\sum_{i=1}^{n}X_i$ and let $\bar{Y} = n^{-1}\sum_{i=1}^{n}Y_i$.
(a) For the likelihood ratio test of $H_0: \alpha = \beta$ versus $H_1: \alpha \neq \beta$, show that the likelihood ratio statistic $\hat{\lambda}$ can be written in the form $\hat{\lambda} = [4u(1 - u)]^n$, where $u = \bar{x}/(\bar{x} + \bar{y})$.
(b) If $n = 100$, $\bar{x} = 1.25$ years, and $\bar{y} = 0.75$ years, use a P-value computation to decide whether or not to reject $H_0$ in favor of $H_1$, and then interpret your finding with regard to these two surgical procedures for the treatment of advanced colon cancer.
Exercise 5.11. The number $X$ of speeding tickets issued to a typical teenage driver during a specified two-year period in a certain community (say, Community #1) having mandatory teenage driver education classes is assumed to have the distribution $p_X(x; \theta_1) = \theta_1(1 - \theta_1)^x$,
$x = 0, 1, \ldots, \infty$; $0 < \theta_1 < 1$.
The number $Y$ of speeding tickets issued to a typical teenage driver during that same two-year period in another community with similar sociodemographic characteristics (say, Community #2), but not having mandatory teenage driver education classes, is assumed to have the distribution
$$p_Y(y; \theta_2) = \theta_2(1 - \theta_2)^y, \quad y = 0, 1, \ldots, \infty;\ 0 < \theta_2 < 1.$$
Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from $p_X(x; \theta_1)$, and let $x_1, x_2, \ldots, x_n$ denote the corresponding $n$ realizations (i.e., the actual set of observed numbers of speeding tickets) for the set of $n$ randomly chosen teenage drivers selected from Community #1. Further, let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of size $n$ from $p_Y(y; \theta_2)$, with $y_1, y_2, \ldots, y_n$ denoting the corresponding realizations.
(a) Using the complete set of observed data $\{x_1, x_2, \ldots, x_n; y_1, y_2, \ldots, y_n\}$, develop an explicit expression for the likelihood ratio test statistic $\hat{\lambda}$ for testing the null hypothesis $H_0: \theta_1 = \theta_2\ (= \theta, \text{ say})$ versus the alternative hypothesis $H_1: \theta_1 \neq \theta_2$. If $n = 25$, $\bar{x} = 1.00$, and $\bar{y} = 2.00$, is there sufficient evidence to reject $H_0$ in favor of $H_1$ at the $\alpha = 0.05$ level of significance?
(b) Using observed information, use the data in part (a) to compute the numerical value of $\hat{S}$, the score statistic for testing $H_0$ versus $H_1$. How do the conclusions based on the score test compare with those based on the likelihood ratio test?
(c) A highway safety researcher contends that the data do suggest that the teenage driver education classes might actually be beneficial, and he suggests that increasing the sample size $n$ might actually lead to a highly statistically significant conclusion that these mandatory teenage driver education classes do lower the risk of speeding by teenagers. Making use of the available data, comment on the reasonableness of this researcher’s contention.
Exercise 5.12. Suppose that $n$ randomly selected adult male hypertensive patients are administered a new blood pressure lowering drug during a clinical trial designed to assess the efficacy of this new drug for promoting long-term remission of high
blood pressure. Further, once each patient's blood pressure returns to a normal range, suppose that each patient is examined monthly to see if the hypertension returns. For the $i$th patient in the study, let $x_i$ denote the age of the patient at the start of the clinical trial, and let $Y_i$ be the random variable denoting the number of months of follow-up until the hypertension returns for the first time. It is reasonable to assume that $Y_i$ has the geometric distribution

$$p_{Y_i}(y_i; \theta_i) = (1 - \theta_i)^{y_i - 1}\theta_i, \quad y_i = 1, 2, \ldots, \infty, \quad 0 < \theta_i < 1 \text{ and } i = 1, 2, \ldots, n.$$

It is well-established that age is a risk factor for hypertension. To take into account the differing ages of the patients at the start of the trial, it is proposed that $\theta_i$ be expressed as the following function of age:

$$\theta_i = \beta x_i/(1 + \beta x_i), \quad \beta > 0.$$

Given the $n$ pairs $(x_i, y_i)$, $i = 1, 2, \ldots, n$, of data points, the analysis goal is to obtain the MLE $\hat{\beta}$ of $\beta$, and then to use $\hat{\beta}$ to make statistical inferences about $\beta$.

(a) Prove that the MLE $\hat{\beta}$ of $\beta$ satisfies the equation

$$\hat{\beta} = n\left[\sum_{i=1}^{n} x_i y_i (1 + \hat{\beta} x_i)^{-1}\right]^{-1}.$$

(b) Prove that the asymptotic variance of $\hat{\beta}$ is

$$V(\hat{\beta}) = \frac{\beta^2}{\sum_{i=1}^{n} (1 + \beta x_i)^{-1}}.$$

(c) If the clinical trial involves 50 patients of age 30 and 50 patients of age 40 at the start of the trial, find a large-sample 95% CI for $\beta$ if $\hat{\beta} = 0.50$.

(d) For the data in part (c), carry out a Wald test of $H_0: \beta = 1$ versus $H_1: \beta \neq 1$ using $\alpha = 0.05$. Do you reject $H_0$ or not? What is the P-value of your test?

(e) To test $H_0: \beta = 1$ versus $H_1: \beta > 1$, consider the test statistic

$$U = \frac{(\hat{\beta} - 1)}{\sqrt{V_0(\hat{\beta})}},$$

where

$$V_0(\hat{\beta}) = \left[\sum_{i=1}^{n} (1 + x_i)^{-1}\right]^{-1}$$

is the large-sample variance of $\hat{\beta}$ when $H_0: \beta = 1$ is true. Assuming that

$$\frac{(\hat{\beta} - \beta)}{\sqrt{V(\hat{\beta})}} \;\dot{\sim}\; N(0, 1)$$
for large $n$, where $V(\hat{\beta})$ is given in part (b), and using the age data in part (c), what is the approximate power of $U$ to reject $H_0: \beta = 1$ in favor of $H_1: \beta > 1$ when $\alpha = 0.025$ and when the true value of $\beta$ is equal to 1.10?

Exercise 5.13. A random sample of 1000 disease-free heavy smokers is followed for a 20-year period. At the end of this 20-year follow-up period, it is found that exactly 100 of these 1000 heavy smokers developed lung cancer during the follow-up period. It is of interest to make statistical inferences about the population parameter $\psi = \theta/(1 - \theta)$, where $\theta$ is the probability that a member of the population from which this random sample came develops lung cancer during this 20-year follow-up period. The parameter $\psi$ is the odds of developing lung cancer during this 20-year follow-up period, namely, the ratio of the probability of developing lung cancer to the probability of not developing lung cancer over this 20-year period.

(a) Using the available numerical information, construct an appropriate 95% CI for the parameter $\psi$.

(b) Carry out Wald and score tests of the null hypothesis $H_0: \psi = 0.10$ versus the alternative hypothesis $H_1: \psi > 0.10$. What are the P-values of these two tests? Interpret your findings.

Exercise 5.14. An environmental scientist postulates that the distributions of the concentrations $X$ and $Y$ (in parts per million) of two air pollutants can be modeled as follows: the conditional density of $Y$, given $X = x$, is postulated to have the structure

$$f_Y(y \mid X = x; \alpha, \beta) = \frac{1}{(\alpha + \beta)x}\, e^{-y/[(\alpha + \beta)x]}, \quad y > 0, \; x > 0, \; \alpha > 0, \; \beta > 0;$$

and the marginal density of $X$ is postulated to have the structure

$$f_X(x; \beta) = \frac{1}{\beta}\, e^{-x/\beta}, \quad x > 0, \; \beta > 0.$$

Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ constitute a random sample of size $n$ from the joint density $f_{X,Y}(x, y; \alpha, \beta)$ of $X$ and $Y$.

(a) Derive explicit expressions for two statistics $U_1$ and $U_2$ that are jointly sufficient for $\alpha$ and $\beta$, and then prove that $\mathrm{corr}(U_1, U_2) = 0$.

(b) Using the random sample $(X_i, Y_i)$, $i = 1, \ldots, n$, derive explicit expressions for the MLEs $\hat{\alpha}$ and $\hat{\beta}$ of the unknown parameters $\alpha$ and $\beta$. Then, if $n = 30$, $\hat{\alpha} = 2$, and $\hat{\beta} = 1$, find the P-value for a Wald test (based on expected information) of $H_0: \alpha = \beta$ versus $H_1: \alpha \neq \beta$. Also, use the available data to compute an appropriate 95% CI for the parameter $(\alpha - \beta)$, and then comment on any numerical connection between the confidence interval result and the P-value.

Exercise 5.15. For the $i$th of two large formaldehyde production facilities located in two different southern cities in the United States, the expected amount $E(Y_{ij})$ in pounds of formaldehyde produced by a certain chemical reaction, expressed as a function of
the amount $x_{ij}$ ($> 0$) in pounds of catalyst used to promote the reaction, is given by the equation

$$E(Y_{ij}) = \beta_i x_{ij}^2, \quad \text{where } \beta_i > 0 \text{ and } x_{ij} > 0, \quad i = 1, 2 \text{ and } j = 1, 2, \ldots, n.$$

Let $(x_{i1}, Y_{i1}), (x_{i2}, Y_{i2}), \ldots, (x_{in}, Y_{in})$ be $n$ independent pairs of data points from the $i$th production facility, $i = 1, 2$. Assume that $Y_{ij}$ has a negative exponential distribution with mean $\alpha_{ij} = E(Y_{ij}) = \beta_i x_{ij}^2$, that the $x_{ij}$'s are known constants, and that the $Y_{ij}$'s are a set of $2n$ mutually independent random variables.

(a) Provide an explicit expression for the joint distribution (i.e., the unconditional likelihood function) of the $2n$ $Y_{ij}$'s, and then provide explicit expressions for two statistics that are jointly sufficient for $\beta_1$ and $\beta_2$.

(b) Under the stated assumptions given earlier, develop an explicit expression (using expected information) for the score statistic $\hat{S}$ for testing $H_0: \beta_1 = \beta_2$ versus $H_1: \beta_1 \neq \beta_2$. In particular, show that $\hat{S}$ can be expressed solely as a function of $n$, $\hat{\beta}_1$, and $\hat{\beta}_2$, where $\hat{\beta}_1$ and $\hat{\beta}_2$ are the MLEs of $\beta_1$ and $\beta_2$, respectively, in the unrestricted parameter space. If $n = 25$, $\hat{\beta}_1 = 2$, and $\hat{\beta}_2 = 3$, do you reject $H_0$ in favor of $H_1$ at the $\alpha = 0.05$ level?

Exercise 5.16. Consider a clinical trial involving two different treatments for Stage IV malignant melanoma, namely, Treatment A and Treatment B. Let $X_1, X_2, \ldots, X_n$ denote the mutually independent survival times (in months) for the $n$ patients randomly assigned to Treatment A. As a statistical model, consider $X_1, X_2, \ldots, X_n$ to constitute a random sample of size $n$ from

$$f_X(x; \theta) = \theta^{-1} e^{-x/\theta}, \quad x > 0, \; \theta > 0.$$

Further, let $Y_1, Y_2, \ldots, Y_n$ denote the mutually independent survival times (in months) for the $n$ patients randomly assigned to Treatment B. As a statistical model, consider $Y_1, Y_2, \ldots, Y_n$ to constitute a random sample of size $n$ from

$$f_Y(y; \lambda, \theta) = (\lambda\theta)^{-1} e^{-y/(\lambda\theta)}, \quad y > 0, \; \lambda > 0, \; \theta > 0.$$

Clearly, $E(X) = \theta$ and $E(Y) = \lambda\theta$, so that statistical inferences about the parameter $\lambda$ can be used to decide whether or not the available data provide evidence of a difference in true average survival times for Treatment A and Treatment B.

(a) Find explicit expressions for statistics that are jointly sufficient for making statistical inferences about the unknown parameters $\lambda$ and $\theta$.

(b) Derive explicit expressions for $\hat{\lambda}$ and $\hat{\theta}$, the MLEs of the unknown parameters $\lambda$ and $\theta$.

(c) Using expected information, derive an explicit expression for the score statistic $\hat{S}$ for testing $H_0: \lambda = 1$ versus $H_1: \lambda \neq 1$. Also, show directly how a variance
estimated under $H_0$ enters into the explicit expression for $\hat{S}$. For a particular data set where $n = 50$, $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i = 30$, and $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i = 40$, what is the approximate P-value when the score statistic $\hat{S}$ is used to test $H_0$ versus $H_1$?

Exercise 5.17. An oncologist reasons that the survival time $X$ (in years) for advanced-stage colorectal cancer follows an exponential distribution with unknown parameter $\lambda$; that is,

$$f_X(x \mid \lambda) = \lambda e^{-\lambda x}, \quad x > 0, \; \lambda > 0.$$

Although this oncologist does not know the exact value of $\lambda$, she is willing to assume a priori that $\lambda$ also follows an exponential distribution with known parameter $\beta$, namely,

$$\pi(\lambda) = \beta e^{-\beta\lambda}, \quad \lambda > 0, \; \beta > 0.$$

In the Bayesian paradigm, $f_X(x \mid \lambda)$ is called the likelihood function, and $\pi(\lambda)$ is called the prior distribution of $\lambda$ (i.e., the distribution assigned to $\lambda$ before observing a value for $X$).

(a) Find the marginal distribution $f_X(x)$ of $X$ (i.e., the distribution of $X$ averaged over all possible values of $\lambda$).

(b) Find the posterior distribution $\pi(\lambda \mid X = x)$ of $\lambda$.

(c) A Bayesian measure of evidence against a null hypothesis ($H_0$), and in favor of an alternative hypothesis ($H_1$), is the Bayes Factor, denoted $BF_{10}$. In particular,

$$BF_{10} = \frac{\mathrm{pr}(H_1 \mid X = x)/\mathrm{pr}(H_0 \mid X = x)}{\mathrm{pr}(H_1)/\mathrm{pr}(H_0)} = \frac{\mathrm{pr}(H_1 \mid X = x)\,\mathrm{pr}(H_0)}{\mathrm{pr}(H_0 \mid X = x)\,\mathrm{pr}(H_1)},$$

where $\mathrm{pr}(H_k)$ and $\mathrm{pr}(H_k \mid X = x)$ denote, respectively, the prior and posterior probabilities of hypothesis $H_k$, $k = 0, 1$. Hence, $BF_{10}$ is the ratio of the posterior odds of $H_1$ to the prior odds of $H_1$. According to Kass and Raftery (1995), $1 < BF_{10} \leq 3$ provides "weak" evidence in favor of $H_1$, $3 < BF_{10} \leq 20$ provides "positive" evidence in favor of $H_1$, $20 < BF_{10} \leq 150$ provides "strong" evidence in favor of $H_1$, and $BF_{10} > 150$ provides "very strong" evidence in favor of $H_1$. If $\beta = 1$ and $x = 3$, what is the Bayes factor for testing $H_0: \lambda > 1$ versus $H_1: \lambda \leq 1$? Using the scale proposed by Kass and Raftery (1995), what is the strength of evidence in favor of $H_1$?

Exercise 5.18*. A controlled clinical trial was designed to compare the survival times (in years) of HIV patients receiving once daily dosing of the new drug Epzicom (a combination of 600 mg of Ziagen and 300 mg of Epivir) to the survival times (in years) of HIV patients receiving once daily dosing of the new drug Truvada (a combination of 300 mg of Viread and 200 mg of Emtriva). Randomly chosen HIV patients were paired together based on the values of several important factors, including age, current HIV levels, general health status, and so on. Then, one member of each pair was randomly selected to receive Epzicom, with the other member then receiving Truvada. For the $i$th pair, $i = 1, 2, \ldots, n$, let $X_i$ denote the survival time of the patient receiving Epzicom,
and let $Y_i$ denote the survival time of the patient receiving Truvada. Further, assume that $X_i$ and $Y_i$ are independent random variables with respective distributions

$$f_{X_i}(x_i) = (\theta\phi_i)^{-1} e^{-x_i/(\theta\phi_i)}, \quad x_i > 0,$$

and

$$f_{Y_i}(y_i) = \phi_i^{-1} e^{-y_i/\phi_i}, \quad y_i > 0.$$

Here, $\phi_i$ ($> 0$) is a parameter pertaining to characteristics of the $i$th pair, and $\theta$ ($> 0$) is the parameter reflecting any difference in true average survival times for the two drugs Epzicom and Truvada. Hence, the value $\theta = 1$ indicates no difference between the two drugs with regard to average survival time.

(a) Provide an explicit expression for the joint distribution (i.e., the likelihood) of the $2n$ random variables $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$. How many parameters would have to be estimated by the method of ML? Comment on this finding.

(b) A consulting statistician points out that the only parameter of real interest is $\theta$. She suggests that an alternative analysis be based just on the $n$ ratios $R_i = X_i/Y_i$, $i = 1, 2, \ldots, n$. In particular, this statistician claims that the distributions of these ratios depend only on $\theta$ and not on the $\{\phi_i\}$, and that the $\{\phi_i\}$ are so-called nuisance parameters (i.e., parameters that appear in assumed statistical models, but that are not of direct relevance to the particular research questions of interest). Prove that this statistician is correct by showing that

$$f_{R_i}(r_i) = \frac{\theta}{(\theta + r_i)^2}, \quad 0 < r_i < +\infty, \; i = 1, 2, \ldots, n.$$

(c) Using the $n$ mutually independent random variables $R_1, R_2, \ldots, R_n$, it is of interest to test $H_0: \theta = 1$ versus $H_1: \theta > 1$ at the $\alpha = 0.025$ level. What is the smallest sample size $n^*$ required so that the power of an appropriate large-sample test is at least 0.80 when, in fact, the true value of $\theta$ is 1.50?

Exercise 5.19*. In many important practical data analysis situations, the statistical models being used involve several parameters, only a few of which are relevant for directly addressing the research questions of interest. The irrelevant parameters, generally referred to as "nuisance parameters," are typically employed to ensure that the statistical models make scientific sense, but are generally unimportant otherwise. One method for eliminating the need to estimate these nuisance parameters, and hence generally to improve both statistical validity and precision, is to employ a conditional inference approach, whereby a conditioning argument is used to produce a conditional likelihood function that only involves the relevant parameters. For an excellent discussion of methods of conditional inference, see McCullagh and Nelder (1989). As an example, suppose that it is of interest to evaluate whether current smokers tend to miss more days of work due to illness than do nonsmokers. For a certain manufacturing industry, suppose that $n$ mutually independent matched pairs of workers, one a current smoker and one a nonsmoker, are formed, where the workers in each pair are chosen (i.e., are matched) to have the same set of general risk factors (e.g.,
age, current health status, type of job, etc.) for illness-related work absences. These $2n$ workers are then followed for a year, and the number of days missed due to illness during that year is recorded for each worker. For the $i$th pair of workers, $i = 1, 2, \ldots, n$, let $Y_{ij} \sim \mathrm{POI}(\phi_i\lambda_j)$, $j = 0, 1$, where $j = 0$ pertains to the nonsmoking worker and where $j = 1$ pertains to the worker who currently smokes. Further, assume that $Y_{i1}$ and $Y_{i0}$ are independent random variables. It is of interest to test $H_0: \lambda_1 = \lambda_0$ versus $H_1: \lambda_1 > \lambda_0$. If $H_0$ is rejected in favor of $H_1$, then this finding would supply statistical evidence that current smokers, on average, tend to miss more days of work due to illness than do nonsmokers. The $n$ parameters $\{\phi_1, \phi_2, \ldots, \phi_n\}$ are parameters reflecting inherent differences across the matched pairs with regard to general risk factors for illness-related work absences, and these $n$ nuisance parameters are not of primary interest. The statistical analysis goal is to use a conditional inference approach that eliminates the need to estimate these nuisance parameters and that still produces an appropriate statistical procedure for testing $H_0$ versus $H_1$.

(a) Develop an explicit expression for the conditional distribution $p_{Y_{i1}}(y_{i1} \mid Y_{i1} + Y_{i0} = S_i = s_i)$ of the random variable $Y_{i1}$ given that $(Y_{i1} + Y_{i0}) = S_i = s_i$.

(b) Use the result in part (a) to develop an appropriate ML-based large-sample test of $H_0$ versus $H_1$ that is based on the parameter $\theta = \lambda_1/(\lambda_0 + \lambda_1)$. For $n = 50$, $\sum_{i=1}^{n} s_i = 500$, and $\sum_{i=1}^{n} y_{i1} = 275$, is there statistical evidence for rejecting $H_0$ in favor of $H_1$? Can you detect another advantage of this conditional inference procedure?

Exercise 5.20*. Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from a Poisson distribution with parameter $\lambda_x$. Furthermore, let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample of the same size $n$ from a different Poisson population with parameter $\lambda_y$.

(a) Use these $2n$ mutually independent observations to develop an explicit expression for the score test statistic $\hat{S}$ (based on expected information) for testing the null hypothesis $H_0: \lambda_x = \lambda_y$ versus the alternative hypothesis $H_1: \lambda_x \neq \lambda_y$. Suppose that $n = 30$, $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i = 8.00$, and $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i = 9.00$; do you reject $H_0$ or not using $\hat{S}$?

(b) Now, suppose that $n = 1$, so that only the independent observations $X_1$ and $Y_1$ are available. By considering the conditional distribution of $X_1$ given that $(X_1 + Y_1) = s_1$, develop a method for testing the null hypothesis $H_0: \lambda_y = \delta\lambda_x$ versus the alternative hypothesis $H_1: \lambda_y > \delta\lambda_x$, where $\delta$ ($> 0$) is a known constant. Suppose that $\delta = 0.60$, $x_1 = 4$, and $y_1 = 10$. What is the exact P-value of your test of $H_0$ versus $H_1$?

Exercise 5.21*. For older adults with symptoms of Alzheimer's disease, the distribution of the time $X$ (in hours) required to complete a verbal aptitude test designed to measure the severity of dementia is assumed to have the distribution

$$f_X(x) = 1, \quad 0.50 \leq \theta < x < (\theta + 1) < +\infty.$$

Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ ($> 1$) from $f_X(x)$. Further, define

$$X_{(1)} = \min\{X_1, X_2, \ldots, X_n\} \quad \text{and} \quad X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}.$$
It is of interest to test $H_0: \theta = 1$ versus $H_1: \theta > 1$. Suppose that the following decision rule is proposed: reject $H_0: \theta = 1$ in favor of $H_1: \theta > 1$ if and only if the event $A \cup B$ occurs, where $A$ is the event that $X_{(1)} > k$, where $B$ is the event that $X_{(n)} > 2$, and where $k$ is a positive constant.

(a) Find a specific expression for $k$, say $k_\alpha$, such that this particular decision rule has a Type I error rate exactly equal to $\alpha$, $0 < \alpha \leq 0.10$.

(b) Find the power function for this decision rule; in particular, consider the power of this decision rule for appropriately chosen disjoint sets of values of $\theta$, $1 < \theta < +\infty$.

Exercise 5.22*. Suppose that $Y_{11}, Y_{12}, \ldots, Y_{1n}$ constitute a set of $n$ random variables representing the responses to a certain lung function test for $n$ farmers living in the same small neighborhood located very near to a large hog farm in rural North Carolina. Since these $n$ farmers live in the same small neighborhood and so experience roughly the same harmful levels of air pollution from hog waste, it is reasonable to believe that the responses to this lung function test for these $n$ farmers will not be independent. In particular, assume that $Y_{1j} \sim N(\mu_1, \sigma^2)$, $j = 1, 2, \ldots, n$, and that $\mathrm{corr}(Y_{1j}, Y_{1j'}) = \rho$ ($> 0$) for every $j \neq j'$, $j = 1, 2, \ldots, n$ and $j' = 1, 2, \ldots, n$. Similarly, suppose that $Y_{21}, Y_{22}, \ldots, Y_{2n}$ constitute a set of $n$ random variables representing responses to the same lung function test for $n$ farmers living in a different small rural North Carolina neighborhood that experiences only minimal levels of air pollution from hog waste. In particular, assume that $Y_{2j} \sim N(\mu_2, \sigma^2)$, $j = 1, 2, \ldots, n$, and that $\mathrm{corr}(Y_{2j}, Y_{2j'}) = \rho$ ($> 0$) for every $j \neq j'$, $j = 1, 2, \ldots, n$ and $j' = 1, 2, \ldots, n$.

Further, assume that the parameters $\sigma^2$ and $\rho$ have known values, that the sets of random variables $\{Y_{1j}\}_{j=1}^{n}$ and $\{Y_{2j}\}_{j=1}^{n}$ are independent of each other, and that the two sample means $\bar{Y}_1 = n^{-1}\sum_{j=1}^{n} Y_{1j}$ and $\bar{Y}_2 = n^{-1}\sum_{j=1}^{n} Y_{2j}$ are each normally distributed.

(a) Find $E(\bar{Y}_1 - \bar{Y}_2)$ and develop an explicit expression for $V(\bar{Y}_1 - \bar{Y}_2)$ that is a function of $n$, $\sigma^2$, and $\rho$.

(b) Given the stated assumptions, provide a hypothesis testing procedure involving the standard normal distribution for testing $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 > \mu_2$ using a Type I error rate of $\alpha = 0.05$.

(c) Now, suppose that an epidemiologist with minimal statistical training incorrectly ignores the positive intra-neighborhood correlation among responses and thus uses a test (based on the standard normal distribution) which incorrectly involves the assumption that $\rho = 0$. If this incorrect test is based on an assumed Type I error rate of 0.05, and if $n = 10$, $\sigma^2 = 2$, and $\rho = 0.50$, compute the exact numerical value of the actual Type I error rate associated with the use of this incorrect test. There is an important lesson to be learned here; what is it?

Exercise 5.23*. The normally distributed random variables $X_1, X_2, \ldots, X_n$ are said to follow a first-order autoregressive process when

$$X_i = \theta X_{i-1} + \epsilon_i, \quad i = 1, 2, \ldots, n,$$

where $X_0 \equiv 0$, where $\theta$ ($-\infty < \theta < \infty$) is an unknown parameter, and where $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are mutually independent $N(0, 1)$ random variables.
(a) Determine the conditional density $f_{X_2}(x_2 \mid X_1 = x_1)$ of $X_2$ given $X_1 = x_1$.

(b) Develop an explicit expression for $f_{X_1, X_2}(x_1, x_2)$, the joint density of $X_1$ and $X_2$.

(c) Let $f^*$ denote the joint density of $X_1, X_2, \ldots, X_n$, where, in general,

$$f^* = f_{X_1}(x_1) \prod_{i=2}^{n} f_{X_i}(x_i \mid X_1 = x_1, X_2 = x_2, \ldots, X_{i-1} = x_{i-1}).$$
Using a sample (X1 , X2 , . . . , Xn ) from the joint density f∗ , show that a likelihood ratio test of H0 : θ = 0 versus H1 : θ = 0 can be expressed explicitly as a function of the statistic 2 n xi−1 xi
i=2
n−1 i=1
xi2
n xi2
.
i=1
For $n = 30$, if $\sum_{i=2}^{n} x_{i-1}x_i = 4$, $\sum_{i=1}^{n} x_i^2 = 15$, and $x_n = 2$, would you reject $H_0: \theta = 0$ at the $\alpha = 0.05$ level using this likelihood ratio test?

Exercise 5.24*. For lifetime residents of rural areas in the United States, suppose that it is reasonable to assume that the distribution of the proportion $X$ of a certain biomarker of benzene exposure in a cubic centimeter of blood taken from such a rural resident has a beta distribution with parameters $\alpha = \theta_r$ and $\beta = 1$, namely,

$$f_X(x; \theta_r) = \theta_r x^{\theta_r - 1}, \quad 0 < x < 1, \quad \theta_r > 0.$$

Let $X_1, X_2, \ldots, X_n$ constitute a random sample of size $n$ from $f_X(x; \theta_r)$. Analogously, for lifetime residents of United States urban areas, let the distribution of $Y$, the proportion of this same biomarker of benzene exposure in a cubic centimeter of blood taken from such an urban resident, be

$$f_Y(y; \theta_u) = \theta_u y^{\theta_u - 1}, \quad 0 < y < 1, \quad \theta_u > 0.$$

Let $Y_1, Y_2, \ldots, Y_m$ constitute a random sample of size $m$ from $f_Y(y; \theta_u)$.

(a) Using all $(n + m)$ available observations, find two statistics that are jointly sufficient for $\theta_r$ and $\theta_u$.

(b) Show that a likelihood ratio test of $H_0: \theta_r = \theta_u$ ($= \theta$, say) versus $H_1: \theta_r \neq \theta_u$ can be based on the test statistic

$$W = \frac{\sum_{i=1}^{n} \ln(X_i)}{\left[\sum_{i=1}^{n} \ln(X_i) + \sum_{i=1}^{m} \ln(Y_i)\right]}.$$

(c) Find the exact distribution of the test statistic $W$ under $H_0: \theta_r = \theta_u$ ($= \theta$, say), and then use this result to construct a likelihood ratio test of $H_0: \theta_r = \theta_u$ ($= \theta$, say) versus $H_1: \theta_r \neq \theta_u$ with an exact Type I error rate of $\alpha = 0.10$ when $n = m = 2$.
Exercise 5.25*. For two states in the United States with very different distributions of risk factors for AIDS (say, Maine and California), suppose that the number $Y_{ij}$ of new cases of AIDS in county $j$ ($j = 1, 2, \ldots, n$) of state $i$ ($i = 1, 2$) during a particular year is assumed to have the negative binomial distribution

$$p_{Y_{ij}}(y_{ij}; \theta_i) = C_{k-1}^{k + y_{ij} - 1}\, \theta_i^{y_{ij}} (1 + \theta_i)^{-(k + y_{ij})}, \quad y_{ij} = 0, 1, \ldots, \infty \text{ and } \theta_i > 0;$$

here, $\theta_1$ and $\theta_2$ are unknown parameters, and $k$ is a known positive constant. For $i = 1, 2$, let $Y_{i1}, Y_{i2}, \ldots, Y_{in}$ denote $n$ mutually independent random variables representing the numbers of new AIDS cases developing during this particular year in $n$ randomly chosen nonadjacent counties in state $i$. It is desired to use the $2n$ mutually independent observations $\{Y_{11}, Y_{12}, \ldots, Y_{1n}\}$ and $\{Y_{21}, Y_{22}, \ldots, Y_{2n}\}$ to make statistical inferences about the unknown parameters $\theta_1$ and $\theta_2$.

(a) Using these $2n$ mutually independent observations, develop an explicit expression for the likelihood ratio test statistic $-2\ln(\hat{\lambda})$ for testing the null hypothesis $H_0: \theta_1 = \theta_2$ ($= \theta$, say) versus the alternative hypothesis $H_1: \theta_1 \neq \theta_2$. For $n = 50$ and $k = 3$, if the observed data are such that $\sum_{j=1}^{n} y_{1j} = 5$ and $\sum_{j=1}^{n} y_{2j} = 10$, use the likelihood ratio statistic to test $H_0$ versus $H_1$ at the $\alpha = 0.05$ significance level. What is the P-value associated with this particular test?

(b) Using the observed data information given in part (a), what is the numerical value of the score statistic $\hat{S}$ for testing $H_0$ versus $H_1$? Use observed information in your calculations. What is the P-value associated with the use of $\hat{S}$ for testing $H_0$ versus $H_1$?

(c) For $i = 1, 2$, let $\bar{Y}_i = n^{-1}\sum_{j=1}^{n} Y_{ij}$. A biostatistician suggests that a test of $H_0: \theta_1 = \theta_2$ versus $H_1: \theta_1 \neq \theta_2$ can be based on a test statistic, involving $(\bar{Y}_1 - \bar{Y}_2)$, that is approximately $N(0, 1)$ for large $n$ under $H_0$. Develop the structure of such a large-sample test statistic. For $k = 3$ and $\alpha = 0.05$, if the true parameter values are $\theta_1 = 2.0$ and $\theta_2 = 2.4$, provide a reasonable value for the minimum value of $n$ (say, $n^*$) so that the power of this large-sample test is at least 0.80 for rejecting $H_0$ in favor of $H_1$.

Exercise 5.26*. Consider an investigation in which each member of a random sample of patients contributes a pair of binary (0-1) outcomes, with the possible outcomes being (1,1), (1,0), (0,1), and (0,0). Data such as these arise when a binary outcome (e.g., the presence or absence of a particular symptom) is measured on the same patient under two different conditions or at two different time points. Interest focuses on statistically testing whether the marginal probability of the occurrence of the outcome of interest differs for the two conditions or time points. To statistically analyze such data appropriately, it is necessary to account for the statistical dependence between the two outcomes measured on the same patient. For a random sample of $n$ patients, let the discrete random variables $Y_{11}$, $Y_{10}$, $Y_{01}$, and $Y_{00}$ denote, respectively, the numbers of patients having the response patterns (1,1), (1,0), (0,1), and (0,0), where 1 denotes the presence of a particular symptom, 0 denotes the absence of that symptom, and the two outcome measurements are made before and after a particular therapeutic intervention. Assuming that patients respond independently of one another, the observed data $\{y_{11}, y_{10}, y_{01}, y_{00}\}$ may
be assumed to arise from a multinomial distribution with corresponding probabilities $\{\pi_{11}, \pi_{10}, \pi_{01}, \pi_{00}\}$, where $\sum_{i=0}^{1}\sum_{j=0}^{1} \pi_{ij} = 1$. Note that the random variable $(Y_{11} + Y_{10})$ is the number of patients who have the symptom prior to the intervention, and that the random variable $(Y_{11} + Y_{01})$ is the number of patients who have the symptom after the intervention. Let

$$\delta = (\pi_{11} + \pi_{10}) - (\pi_{11} + \pi_{01}) = (\pi_{10} - \pi_{01}), \quad -1 < \delta < 1,$$

denote the difference in the probabilities of having the symptom before and after the intervention. Interest focuses on testing $H_0: \delta = 0$ versus $H_1: \delta \neq 0$.

(a) Given observed counts $y_{11}$, $y_{10}$, $y_{01}$, and $y_{00}$, develop an explicit expression for the MLE $\hat{\delta}$ of $\delta$.

(b) Using expected information, derive an explicit expression for the Wald chi-squared test statistic for testing $H_0: \delta = 0$ versus $H_1: \delta \neq 0$. What is the P-value of the Wald chi-squared test if $y_{11} = 22$, $y_{10} = 3$, $y_{01} = 7$, and $y_{00} = 13$?

(c) For testing $H_0: \delta = 0$ versus $H_1: \delta \neq 0$, the testing procedure known as McNemar's Test is based on the test statistic

$$Q_M = \frac{(Y_{01} - Y_{10})^2}{(Y_{01} + Y_{10})}.$$

Under $H_0$, the statistic $Q_M$ follows an asymptotic $\chi^2_1$ distribution, and so a two-sided test at the 0.05 significance level rejects $H_0$ in favor of $H_1$ when $Q_M > \chi^2_{1,0.95}$. Prove that McNemar's test statistic is identical to the score test statistic used to test $H_0: \delta = 0$ versus $H_1: \delta \neq 0$. Also, show that the Wald chi-squared statistic is always at least as large in value as the score chi-squared statistic.

(d) For the study in question, the investigators plan to enroll patients until $(y_{10} + y_{01})$ is equal to 10. Suppose that these investigators decide to reject $H_0$ if $Q_M > \chi^2_{1,0.95}$ and decide not to reject $H_0$ if $Q_M \leq \chi^2_{1,0.95}$. What is the exact probability (i.e., the power) that $H_0$ will be rejected if $\pi_{11} = 0.80$, $\pi_{10} = 0.10$, $\pi_{01} = 0.05$, and $\pi_{00} = 0.05$?
SOLUTIONS

Solution 5.1

(a) To find the form of the MP rejection region, we need to employ the Neyman–Pearson Lemma. Now, with $x = (x_1, x_2, \ldots, x_n)$, we have

$$L(x; \theta) = \prod_{i=1}^{n} \left(\theta x_i^{\theta - 1}\right) = \theta^n \left(\prod_{i=1}^{n} x_i\right)^{\theta - 1}.$$

In particular, for $n = 1$, $L(x; \theta) = \theta x_1^{\theta - 1}$. So,

$$\frac{L(x; 1)}{L(x; 2)} = \frac{(1)x_1^{1-1}}{(2)x_1^{2-1}} = (2x_1)^{-1} \leq k.$$

Thus, $x_1 \geq k_\alpha$ is the form of the MP rejection region.
Under $H_0: \theta = 1$, $f_{X_1}(x_1; 1) = 1$, $0 < x_1 < 1$, so that $k_\alpha = 0.95$; i.e., we reject $H_0$ if $x_1 > 0.95$. So,

$$\text{POWER} = \mathrm{pr}\{X_1 > 0.95 \mid \theta = 2\} = \int_{0.95}^{1} 2x_1^{2-1}\, dx_1 = 0.0975.$$
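As a quick numerical check of part (a), the following minimal Python sketch uses the CDF $F(x; \theta) = x^\theta$ on $(0, 1)$ implied by the density $\theta x^{\theta - 1}$. The function names are illustrative only and do not come from the text.

```python
def cdf(x, theta):
    """CDF of the density theta * x**(theta - 1) on 0 < x < 1."""
    return x ** theta


def critical_value(alpha):
    """k_alpha with pr(X1 > k_alpha | theta = 1) = alpha when n = 1."""
    # Under theta = 1 the density is uniform on (0, 1), so k_alpha = 1 - alpha.
    return 1.0 - alpha


def power(k, theta):
    """pr(X1 > k | theta) for the rejection region {x1 > k}."""
    return 1.0 - cdf(k, theta)


k_alpha = critical_value(0.05)
print(k_alpha, power(k_alpha, 2.0))
```

With $\alpha = 0.05$ this gives $k_\alpha = 0.95$ and a power of about 0.0975 at $\theta = 2$, in agreement with the integral above.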
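(b) For $n = 2$, $L(x; \theta) = \theta^2 (x_1 x_2)^{\theta - 1}$. So,

$$\frac{L(x; 1)}{L(x; 2)} = \frac{1}{4x_1x_2} \leq k.$$

Thus, $x_1x_2 \geq k_\alpha$ is the form of the MP rejection region. Under $H_0: \theta = 1$,

$$f_{X_1, X_2}(x_1, x_2; 1) = f_{X_1}(x_1; 1)\, f_{X_2}(x_2; 1) = (1)(1) = 1, \quad 0 < x_1 < 1, \; 0 < x_2 < 1.$$

So, we need to pick $k_\alpha$ such that $\mathrm{pr}[(X_1, X_2) \in R \mid H_0: \theta = 1] = 0.05$. In other words, we must choose $k_\alpha$ such that

$$\int_{k_\alpha}^{1} \int_{k_\alpha/x_1}^{1} (1)\, dx_2\, dx_1 = 0.05 \;\Rightarrow\; \left[x_1 - k_\alpha \ln x_1\right]_{k_\alpha}^{1} = 0.05 \;\Rightarrow\; 1 - \left[k_\alpha - k_\alpha \ln k_\alpha\right] = 0.05 \;\Rightarrow\; k_\alpha \approx 0.70.$$

Solution 5.2

(a) With $y = (y_1, y_2, \ldots, y_n)$,

$$L(y; \theta) = (1 + \theta)^n \prod_{i=1}^{n} (y_i + \theta)^{-2}.$$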
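The MP rejection region has the form

$$\frac{L(y; 0)}{L(y; 1)} = \frac{\prod_{i=1}^{n} (y_i + 0)^{-2}}{2^n \prod_{i=1}^{n} (y_i + 1)^{-2}} = 2^{-n} \prod_{i=1}^{n} \left(\frac{y_i + 1}{y_i}\right)^2 \leq k$$

or, equivalently,

$$\prod_{i=1}^{n} \left(1 + y_i^{-1}\right)^2 \leq 2^n k.$$

So,

$$R = \left\{(y_1, y_2, \ldots, y_n) : \prod_{i=1}^{n} \left(1 + y_i^{-1}\right)^2 \leq k_\alpha\right\},$$

where $k_\alpha$ is chosen so that $\mathrm{pr}\{(Y_1, Y_2, \ldots, Y_n) \in R \mid H_0: \theta = 0\} = \alpha$.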
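One way to approximate $k_\alpha$ for a general $n$ is by simulation. The sketch below is illustrative only: it assumes the density $f(y; \theta) = (1 + \theta)(y + \theta)^{-2}$, $y > 1$, implied by the likelihood above, and samples from it by inverting the CDF $1 - (1 + \theta)/(y + \theta)$. The function names, the seed, and the number of replicates are arbitrary choices, not part of the original solution.

```python
import random


def sample_y(theta, rng):
    """Draw Y with density (1 + theta) * (y + theta)**-2, y > 1, by CDF inversion."""
    u = rng.random()  # uniform on [0, 1)
    return (1.0 + theta) / (1.0 - u) - theta


def statistic(ys):
    """Test statistic prod (1 + 1/y_i)**2; small values lead to rejection of theta = 0."""
    value = 1.0
    for y in ys:
        value *= (1.0 + 1.0 / y) ** 2
    return value


def approx_k_alpha(n, alpha, reps=100_000, seed=1):
    """Monte Carlo estimate of k_alpha: the alpha-quantile of the statistic under theta = 0."""
    rng = random.Random(seed)
    draws = sorted(
        statistic([sample_y(0.0, rng) for _ in range(n)]) for _ in range(reps)
    )
    return draws[int(alpha * reps)]


print(approx_k_alpha(1, 0.05))  # should be close to the exact n = 1 value (1.05)**2
print(approx_k_alpha(2, 0.05))
```

For $n = 1$ the estimate can be compared with the exact value $(1.05)^2 = 1.1025$ obtained in part (b); for larger $n$ the simulated quantile is only an approximation whose accuracy depends on the number of replicates.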
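(b) If $n = 1$, we need to find $k_\alpha$ such that

$$\mathrm{pr}\left[\left(1 + Y_1^{-1}\right)^2 < k_\alpha \,\middle|\, H_0: \theta = 0\right] = 0.05.$$

Since $y_1 > 1$, we have $k_\alpha > 1$, so that

$$\mathrm{pr}\left[\left(1 + Y_1^{-1}\right)^2 < k_\alpha \,\middle|\, H_0: \theta = 0\right] = \mathrm{pr}\left[1 + Y_1^{-1} < \sqrt{k_\alpha} \,\middle|\, H_0: \theta = 0\right] = \mathrm{pr}\left\{Y_1 > (\sqrt{k_\alpha} - 1)^{-1} \,\middle|\, H_0: \theta = 0\right\}$$

$$= \int_{(\sqrt{k_\alpha} - 1)^{-1}}^{\infty} y_1^{-2}\, dy_1 = \left[-y_1^{-1}\right]_{(\sqrt{k_\alpha} - 1)^{-1}}^{+\infty} = (\sqrt{k_\alpha} - 1) = 0.05,$$

so that $k_\alpha = (1.05)^2 = 1.1025$, or $k'_\alpha = (\sqrt{k_\alpha} - 1)^{-1} = 1/0.05 = 20$.

(c)

$$\text{POWER} = \mathrm{pr}(Y_1 > 20 \mid \theta = 1) = \int_{20}^{\infty} 2(y_1 + 1)^{-2}\, dy_1 = 2\left[-(y_1 + 1)^{-1}\right]_{20}^{\infty} = \frac{2}{21} = 0.0952.$$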
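The values in parts (b) and (c) can be checked with the closed-form survival function $\mathrm{pr}(Y > y \mid \theta) = (1 + \theta)/(y + \theta)$, $y > 1$, which follows from integrating the density $(1 + \theta)(y + \theta)^{-2}$. The sketch below is a hedged illustration; its function names are not from the text.

```python
def survival(y, theta):
    """pr(Y > y | theta) for the density (1 + theta) * (y + theta)**-2, y > 1."""
    return (1.0 + theta) / (y + theta)


def cutoff(alpha):
    """Value c with pr(Y1 > c | theta = 0) = alpha, i.e. 1/c = alpha."""
    return 1.0 / alpha


c = cutoff(0.05)                # rejection region written as {y1 > c}
k_alpha = (1.0 + 1.0 / c) ** 2  # same region written as (1 + 1/y1)**2 < k_alpha
print(c, k_alpha, survival(c, 0.0), survival(c, 1.0))
```

Here `survival(c, 0.0)` recovers the Type I error rate of 0.05 and `survival(c, 1.0)` equals $2/21$, the power found in part (c).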
The power is very small because $n = 1$.

Solution 5.3. For any particular $\sigma_1^2 > 1$, the optimal rejection region for a most powerful (MP) test of $H_0: \sigma^2 = 1$ versus $H_1: \sigma^2 = \sigma_1^2$ has the form $L(y; 1)/L(y; \sigma_1^2) \leq k$, where $y = (y_1, y_2, \ldots, y_n)$. Since

$$L(y; \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-y_i^2/2\sigma^2} = (2\pi)^{-n/2} (\sigma^2)^{-n/2} e^{-(1/2\sigma^2)\sum_{i=1}^{n} y_i^2},$$

the optimal rejection region has the structure

$$\frac{L(y; 1)}{L(y; \sigma_1^2)} = \frac{(2\pi)^{-n/2}\, e^{-\frac{1}{2}\sum_{i=1}^{n} y_i^2}}{(2\pi)^{-n/2} (\sigma_1^2)^{-n/2}\, e^{-\frac{1}{2\sigma_1^2}\sum_{i=1}^{n} y_i^2}} = (\sigma_1^2)^{n/2}\, e^{\left(\frac{1}{2\sigma_1^2} - \frac{1}{2}\right)\sum_{i=1}^{n} y_i^2} \leq k.$$

Since $1/2\sigma_1^2 - 1/2 < 0$ when $\sigma_1^2 > 1$, the MP test rejects when $\sum_{i=1}^{n} y_i^2$ is large, that is, when $\sum_{i=1}^{n} y_i^2 \geq k'$ for some appropriately chosen $k'$. Because we obtain the
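same optimal rejection region for all $\sigma_1^2 > 1$, we have a UMP test. Under $H_0: \sigma^2 = 1$, $\sum_{i=1}^{n} Y_i^2 \sim \chi^2_n$; so, the appropriate critical value $k'$ for an $\alpha$-level test is $k' = \chi^2_{n,1-\alpha}$, because $\mathrm{pr}\left[\sum_{i=1}^{n} Y_i^2 \geq \chi^2_{n,1-\alpha} \,\middle|\, H_0: \sigma^2 = 1\right] = \alpha$. Now,

$$\text{POWER} = \mathrm{pr}\left[\sum_{i=1}^{n} Y_i^2 \geq \chi^2_{n,0.95} \,\middle|\, \sigma^2 = 2\right] = \mathrm{pr}\left[\sum_{i=1}^{n} \left(\frac{Y_i}{\sqrt{2}}\right)^2 \geq \frac{\chi^2_{n,0.95}}{2} \,\middle|\, \sigma^2 = 2\right] = \mathrm{pr}\left[\chi^2_n \geq \frac{\chi^2_{n,0.95}}{2}\right],$$

since $Y_i/\sqrt{2} \sim N(0, 1)$ when $\sigma^2 = 2$.

We want to find the smallest $n$ (say, $n^*$) such that this probability is at least 0.80. By inspection of chi-square tables, we find $n^* = 25$. Also, by the Central Limit Theorem, since $Z_i = Y_i/\sqrt{2} \sim N(0, 1)$ when $\sigma^2 = 2$,

$$\text{POWER} = \mathrm{pr}\left[\sum_{i=1}^{n} Z_i^2 \geq \frac{\chi^2_{n,0.95}}{2}\right] = \mathrm{pr}\left[\frac{\sum_{i=1}^{n} Z_i^2 - n}{\sqrt{2n}} \geq \frac{\chi^2_{n,0.95}/2 - n}{\sqrt{2n}}\right] \approx \mathrm{pr}\left[Z \geq \frac{\chi^2_{n,0.95}/2 - n}{\sqrt{2n}}\right],$$

where $E(Z_i^2) = 1$, $V(Z_i^2) = 2$, and $Z \;\dot{\sim}\; N(0, 1)$ for large $n$. Since $Z_{0.20} = -0.842$, POWER $\geq 0.80$ when

$$\frac{(\chi^2_{n,0.95}/2) - n}{\sqrt{2n}} \leq -0.842,$$

or, equivalently, when $\chi^2_{n,0.95} \leq 2\left[n - 0.842\sqrt{2n}\right]$, which is satisfied by a minimum value of $n^* = 25$.

Solution 5.4.

(a) With $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$, we have

$$L(x, y; \theta_1, \theta_2) = \prod_{i=1}^{n} \theta_1^{x_i}(1 - \theta_1)^{1 - x_i} \cdot \prod_{i=1}^{n} \theta_2^{y_i}(1 - \theta_2)^{1 - y_i} = \theta_1^{s_x}(1 - \theta_1)^{n - s_x}\, \theta_2^{s_y}(1 - \theta_2)^{n - s_y},$$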
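where $s_x = \sum_{i=1}^{n} x_i$ and $s_y = \sum_{i=1}^{n} y_i$. So, using the Neyman–Pearson Lemma,

$$\frac{L(x, y; 0.50, 0.50)}{L(x, y; 0.60, 0.60)} = \frac{(0.50)^{s_x}(0.50)^{n - s_x}(0.50)^{s_y}(0.50)^{n - s_y}}{(0.60)^{s_x}(0.40)^{n - s_x}(0.60)^{s_y}(0.40)^{n - s_y}} = \frac{(0.50)^{2n}}{(0.60)^{(s_x + s_y)}(0.40)^{2n - (s_x + s_y)}} = \left(\frac{2}{3}\right)^{(s_x + s_y)} \left(\frac{5}{4}\right)^{2n} \leq k \;\Longrightarrow\; (s_x + s_y) \geq k'$$

is the structure of the MP region. When $\theta_1 = \theta_2 = \theta$, $S = (S_x + S_y) \sim \mathrm{BIN}(2n, \theta)$. So, by the Central Limit Theorem, under $H_0: \theta = \frac{1}{2}$,

$$\frac{S - n}{\sqrt{n/2}} \;\dot{\sim}\; N(0, 1)$$

for large $n$. So, for a size $\alpha = 0.05$ test of $H_0$ versus $H_1$,

$$\text{POWER} = \mathrm{pr}\left[\frac{S - n}{\sqrt{n/2}} > 1.645 \,\middle|\, H_1\right] = \mathrm{pr}\left[S > 1.645\sqrt{\frac{n}{2}} + n \,\middle|\, H_1\right]$$

$$= \mathrm{pr}\left\{\frac{S - 2n(0.60)}{\sqrt{2n(0.60)(0.40)}} > \frac{1.645\sqrt{\frac{n}{2}} + n - 2n(0.60)}{\sqrt{2n(0.60)(0.40)}} \,\middle|\, H_1\right\} \approx \mathrm{pr}\left\{Z > \frac{1.645\sqrt{\frac{30}{2}} + 30 - 2(30)(0.60)}{\sqrt{2(30)(0.60)(0.40)}}\right\},$$

where $Z \sim N(0, 1)$. So

$$\text{POWER} \approx \mathrm{pr}(Z > 0.0978) = 1 - \Phi(0.0978) = 0.46.$$

(b) Under $H_0: \theta_1 = \theta_2 = \theta_0$,

$$\frac{(\bar{X} - \bar{Y})}{\sqrt{\dfrac{2\theta_0(1 - \theta_0)}{n}}} \;\dot{\sim}\; N(0, 1)$$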
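for reasonably large $n$. So,

$$\text{POWER} = \mathrm{pr}\left\{\frac{(\bar{X} - \bar{Y})}{\sqrt{\dfrac{2\theta_0(1 - \theta_0)}{n}}} > 1.645 \,\middle|\, (\theta_1 - \theta_2) \geq 0.20\right\} \geq \mathrm{pr}\left\{\frac{(\bar{X} - \bar{Y})}{\sqrt{\dfrac{2\theta_0(1 - \theta_0)}{n}}} > 1.645 \,\middle|\, (\theta_1 - \theta_2) = 0.20\right\}$$

$$= \mathrm{pr}\left\{(\bar{X} - \bar{Y}) > 1.645\sqrt{\frac{2\theta_0(1 - \theta_0)}{n}} \,\middle|\, (\theta_1 - \theta_2) = 0.20\right\}$$

$$= \mathrm{pr}\left\{\frac{(\bar{X} - \bar{Y}) - 0.20}{\sqrt{\dfrac{\theta_1(1 - \theta_1)}{n} + \dfrac{\theta_2(1 - \theta_2)}{n}}} > \frac{1.645\sqrt{\dfrac{2\theta_0(1 - \theta_0)}{n}} - 0.20}{\sqrt{\dfrac{\theta_1(1 - \theta_1)}{n} + \dfrac{\theta_2(1 - \theta_2)}{n}}}\right\} \approx \mathrm{pr}\left\{Z > \frac{1.645\sqrt{\dfrac{2\theta_0(1 - \theta_0)}{n}} - 0.20}{\sqrt{\dfrac{\theta_1(1 - \theta_1)}{n} + \dfrac{\theta_2(1 - \theta_2)}{n}}}\right\},$$

where $Z \sim N(0, 1)$. So, for POWER $\geq 0.90$, we require

$$\frac{1.645\sqrt{2\theta_0(1 - \theta_0)} - 0.20\sqrt{n}}{\sqrt{\theta_1(1 - \theta_1) + \theta_2(1 - \theta_2)}} \leq -1.282,$$

or

$$n \geq \left[\frac{1.645\sqrt{2\theta_0(1 - \theta_0)} + 1.282\sqrt{\theta_1(1 - \theta_1) + \theta_2(1 - \theta_2)}}{0.20}\right]^2.$$

Now, given that $\theta_1 = (\theta_2 + 0.20)$, the quantity $[\theta_1(1 - \theta_1) + \theta_2(1 - \theta_2)]$ is maximized at $\theta_1 = 0.60$ and $\theta_2 = 0.40$. So, for $\theta_0 = 0.10$ and to cover all $(\theta_1, \theta_2)$ values, choose

$$n \geq \left[\frac{1.645\sqrt{2(0.10)(0.90)} + 1.282\sqrt{0.60(0.40) + 0.40(0.60)}}{0.20}\right]^2 = 62.88;$$

so, $n^* = 63$.

Solution 5.5

(a) Consider the simple null hypothesis $H_0: \theta = 1$ versus the simple alternative hypothesis $H_1: \theta = \theta_1$ ($> 1$), where $\theta_1$ is any specific value of $\theta$ greater than 1.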
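Then, from the Neyman–Pearson Lemma and with $\mathbf{y} = (y_1, y_2, \ldots, y_n)$, the form of the rejection region for a MP test is based on the inequality
\[
\frac{L(\mathbf{y}; 1)}{L(\mathbf{y}; \theta_1)} \le k,
\]
or
\[
\frac{\left(\prod_{i=1}^{n} x_i^{y_i}\right) e^{-\sum_{i=1}^{n} x_i}\left(\prod_{i=1}^{n} y_i!\right)^{-1}}{\theta_1^{\sum_{i=1}^{n} y_i}\left(\prod_{i=1}^{n} x_i^{y_i}\right) e^{-\theta_1\sum_{i=1}^{n} x_i}\left(\prod_{i=1}^{n} y_i!\right)^{-1}} \le k,
\]
or
\[
\theta_1^{-\sum_{i=1}^{n} y_i}\, e^{(\theta_1 - 1)\sum_{i=1}^{n} x_i} \le k,
\]
or
\[
\left(-\sum_{i=1}^{n} y_i\right)(\ln\theta_1) \le k',
\]
or
\[
\sum_{i=1}^{n} y_i \ge k'', \quad \text{since } \ln\theta_1 > 0.
\]
So, the MP rejection region is $R = \{S : S \ge k_\alpha\}$, where $S = \sum_{i=1}^{n} Y_i$. Since $S$ is a discrete random variable, $k_\alpha$ is a positive integer chosen so that $\mathrm{pr}\{S \ge k_\alpha \mid H_0: \theta = 1\} \doteq \alpha$. Since this same form of rejection region is obtained for any value $\theta_1 > 1$, $R$ is the UMP region for a test of $H_0: \theta = 1$ versus $H_1: \theta > 1$.

(b) Since the test statistic is $S = \sum_{i=1}^{n} Y_i$, we need to know the distribution of $S$. Since $Y_i \sim \text{POI}(\theta x_i)$, $i = 1, 2, \ldots, n$, and since the $\{Y_i\}$ are mutually independent,
\[
\begin{aligned}
M_S(t) = E[e^{tS}] &= E\left[e^{t\sum_{i=1}^{n} Y_i}\right] = E\left[\prod_{i=1}^{n} e^{tY_i}\right] = \prod_{i=1}^{n} \left[E(e^{tY_i})\right] \\
&= \prod_{i=1}^{n} \left[e^{\theta x_i(e^t - 1)}\right] = e^{\left(\theta\sum_{i=1}^{n} x_i\right)(e^t - 1)} = e^{0.82\theta(e^t - 1)},
\end{aligned}
\]
so that $S \sim \text{POI}(0.82\theta)$. So, under $H_0: \theta = 1$, $S \sim \text{POI}(0.82)$. So, we need to find $k_{.05}$ such that
\[
\mathrm{pr}(S \ge k_{.05} \mid \theta = 1) = 1 - \sum_{s=0}^{(k_{.05}-1)} \frac{(0.82)^s e^{-0.82}}{s!} \doteq 0.05,
\]
or such that
\[
\sum_{s=0}^{(k_{.05}-1)} \frac{(0.82)^s e^{-0.82}}{s!} \doteq 0.95.
\]
By trial and error, $k_{.05} = 3$. So, for $\alpha \doteq 0.05$, we reject $H_0: \theta = 1$ in favor of $H_1: \theta > 1$ when $S = \sum_{i=1}^{n} Y_i \ge 3$. Now, when $\theta = 5$, $S \sim \text{POI}(4.10)$, so that
\[
\text{POWER} = \mathrm{pr}(S \ge 3 \mid \theta = 5) = 1 - \mathrm{pr}(S < 3 \mid \theta = 5) = 1 - \sum_{s=0}^{2} \frac{(4.10)^s e^{-4.10}}{s!} = 1 - 0.2238 = 0.7762.
\]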
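The trial-and-error search for $k_{.05}$ and the power calculation in Solution 5.5(b) can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `poisson_tail` and `approx_critical_value` are illustrative, and the rule "pick the integer whose null tail probability is closest to $\alpha$" is an assumption that mimics the approximate-size choice made in the text.

```python
from math import exp, factorial

def poisson_tail(k, mean):
    """Return pr(S >= k) for S ~ POI(mean)."""
    lower = sum(mean**s * exp(-mean) / factorial(s) for s in range(k))
    return 1.0 - lower

def approx_critical_value(mean0, alpha, k_max=50):
    """Integer k in 1..k_max whose null tail probability is closest to alpha."""
    return min(range(1, k_max + 1),
               key=lambda k: abs(poisson_tail(k, mean0) - alpha))

sum_x = 0.82                      # sum of the x_i, as used in the solution
k05 = approx_critical_value(sum_x * 1.0, 0.05)   # null mean: theta = 1
size = poisson_tail(k05, sum_x * 1.0)            # attained size under H0
power = poisson_tail(k05, sum_x * 5.0)           # power at theta = 5

print(k05, round(size, 4), round(power, 4))
```

Because $S$ is discrete, the attained size is only approximately 0.05; the sketch reports it alongside the power so the approximation is visible.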
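Solution 5.6. (a) Now, with $k$ a constant and $Z = (\bar{X}_1 - \bar{X}_2)/\sqrt{V}$,
\[
\alpha = \mathrm{pr}[(\bar{X}_1 - \bar{X}_2) > k \mid H_0] = \mathrm{pr}\left(Z > \frac{k}{\sqrt{V}} \mid H_0\right),
\]
so that we require $k/\sqrt{V} = Z_{1-\alpha}$, or $k = \sqrt{V}Z_{1-\alpha}$. And,
\[
(1 - \beta) = \mathrm{pr}[(\bar{X}_1 - \bar{X}_2) > k \mid H_1] = \mathrm{pr}\left(Z - \frac{\delta}{\sqrt{V}} > \frac{(k - \delta)}{\sqrt{V}} \mid H_1\right),
\]
which, since $Z - \delta/\sqrt{V} \sim N(0,1)$ when $H_1$ is true, requires that
\[
\frac{(k - \delta)}{\sqrt{V}} = -Z_{1-\beta}, \quad \text{or} \quad k = -\sqrt{V}Z_{1-\beta} + \delta.
\]
Finally, the equation
\[
\sqrt{V}Z_{1-\alpha} = -\sqrt{V}Z_{1-\beta} + \delta
\]
gives $V = \delta^2/(Z_{1-\alpha} + Z_{1-\beta})^2$, which is the requirement $V = \theta$.

(b) Since the goal is to minimize $N = (n_1 + n_2)$ with respect to $n_1$ and $n_2$, subject to the constraint $V = \theta$, consider the function
\[
Q = (n_1 + n_2) + \lambda\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - \theta\right),
\]
where $\lambda$ is a Lagrange multiplier. Then, simultaneously solving the two equations
\[
\frac{\partial Q}{\partial n_1} = 1 - \frac{\lambda\sigma_1^2}{n_1^2} = 0 \quad \text{and} \quad \frac{\partial Q}{\partial n_2} = 1 - \frac{\lambda\sigma_2^2}{n_2^2} = 0
\]
gives
\[
\frac{n_1^2}{\sigma_1^2} = \frac{n_2^2}{\sigma_2^2} \quad \text{or} \quad \frac{n_1}{n_2} = \frac{\sigma_1}{\sigma_2}.
\]
Finally, if $N = 100$, $\sigma_1^2 = 9$, and $\sigma_2^2 = 4$, then $n_1/n_2 = 1.5$; then, the equation $(n_1 + n_2) = (1.5n_2 + n_2) = 2.5n_2 = 100$ gives $n_2 = 40$ and $n_1 = 60$.

(c) Let $C$ denote the cost of selecting an observation from Population 2, so that the total sampling cost is $(4Cn_1 + Cn_2) = C(4n_1 + n_2)$. So, we want to minimize the function $C(4n_1 + n_2)$ with respect to $n_1$ and $n_2$, subject to the constraint $V = \theta$. Again using Lagrange multipliers, if
\[
Q = C(4n_1 + n_2) + \lambda\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - \theta\right),
\]
then the equation
\[
\frac{\partial Q}{\partial n_1} = 0 \quad \text{gives} \quad \lambda = \frac{4Cn_1^2}{\sigma_1^2},
\]
and the equation
\[
\frac{\partial Q}{\partial n_2} = 0 \quad \text{gives} \quad \lambda = \frac{Cn_2^2}{\sigma_2^2},
\]
implying that $n_1/n_2 = \sigma_1/2\sigma_2$. And, since the equation
\[
\frac{\partial Q}{\partial\lambda} = 0 \quad \text{gives} \quad V = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \theta,
\]
so that
\[
n_1 = \frac{\sigma_1^2 + (n_1/n_2)\sigma_2^2}{\theta} = \frac{\sigma_1^2 + (\sigma_1/2\sigma_2)\sigma_2^2}{\theta},
\]
we obtain
\[
n_1 = \frac{\sigma_1^2 + (\sigma_1\sigma_2)/2}{\theta} \quad \text{and} \quad n_2 = \left(\frac{2\sigma_2}{\sigma_1}\right)n_1 = \frac{(2\sigma_1\sigma_2 + \sigma_2^2)}{\theta}.
\]
With $\sigma_1 = 5$, $\sigma_2 = 4$, $\alpha = 0.05$, $\beta = 0.10$, and $\delta = 3$, we have $Z_{1-\alpha} = Z_{0.95} = 1.645$, $Z_{1-\beta} = Z_{0.90} = 1.282$, and $V = (3)^2/(1.645 + 1.282)^2 = 1.0505$. Using these values, we obtain $n_1 = 33.3175$ and $n_2 = 53.3079$; in practice, one would use $n_1 = 34$ and $n_2 = 54$.

Solution 5.7. (a) Note that the CDF of $X$ is
\[
F_X(x) = \mathrm{pr}(X \le x) = \int_0^x \theta^{-1}\,dt = \frac{x}{\theta}, \quad 0 < x < \theta.
\]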
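Hence, it follows that
\[
\begin{aligned}
\alpha = \mathrm{pr}(\text{Type I error}) &= \mathrm{pr}\left(X_{(n)} > c^* \mid H_0: \theta = \tfrac{1}{2}\right)
= 1 - \mathrm{pr}\left\{\bigcap_{i=1}^{n}(X_i \le c^*) \mid H_0: \theta = \tfrac{1}{2}\right\} \\
&= 1 - \prod_{i=1}^{n}\mathrm{pr}\left(X_i \le c^* \mid H_0: \theta = \tfrac{1}{2}\right)
= 1 - \left[\frac{c^*}{1/2}\right]^n = 1 - (2c^*)^n \\
&\Rightarrow (2c^*)^n = (1 - \alpha) \Rightarrow c^* = \frac{(1-\alpha)^{1/n}}{2}.
\end{aligned}
\]
For $0 < \alpha < 1$, note that $0 < c^* < \tfrac{1}{2}$.

(b) When $\alpha = 0.05$, $c^* = (0.95)^{1/n}/2$. So,
\[
\begin{aligned}
0.98 \le \text{POWER} &= \mathrm{pr}\left(X_{(n)} > c^* \mid \theta = \tfrac{3}{4}\right)
= 1 - \mathrm{pr}\left\{\bigcap_{i=1}^{n}(X_i \le c^*) \mid \theta = \tfrac{3}{4}\right\} \\
&= 1 - \prod_{i=1}^{n}\mathrm{pr}\left(X_i \le \frac{(0.95)^{1/n}}{2} \mid \theta = \tfrac{3}{4}\right)
= 1 - \left[\frac{(0.95)^{1/n}/2}{3/4}\right]^n = 1 - (0.95)\left(\frac{2}{3}\right)^n \\
&\Rightarrow -0.02 \le -(0.95)\left(\frac{2}{3}\right)^n \Rightarrow \left(\frac{2}{3}\right)^n \le 0.0211 \Rightarrow n^* = 10.
\end{aligned}
\]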
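The smallest sample size in Solution 5.7(b) can be confirmed by direct search. The sketch below assumes, as in the solution, that $X$ is uniform on $(0, \theta)$ and that the test rejects when $X_{(n)} > c^*$; the function name `power` and its default arguments are illustrative choices.

```python
def power(n, alpha=0.05, theta0=0.5, theta1=0.75):
    """Power of the test 'reject when X_(n) > c*' for Uniform(0, theta) data."""
    c_star = theta0 * (1 - alpha) ** (1 / n)   # size-alpha cutoff under theta0
    return 1 - (c_star / theta1) ** n          # equals 1 - (1 - alpha)(theta0/theta1)^n

# Smallest n with power at least 0.98 when theta = 3/4.
n_star = 1
while power(n_star) < 0.98:
    n_star += 1

print(n_star, round(power(n_star), 4))
```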
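Solution 5.8. (a) Under $H_1$, and with $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_n)$, the (unrestricted) likelihood and log-likelihood functions are
\[
L(\mathbf{x}; \boldsymbol{\mu}) = \prod_{i=1}^{n} \frac{\mu_i^{x_i} e^{-\mu_i}}{x_i!} = \left(\prod_{i=1}^{n} \mu_i^{x_i}\right) e^{-\sum_{i=1}^{n}\mu_i}\left(\prod_{i=1}^{n} x_i!\right)^{-1}
\]
and
\[
\ln L(\mathbf{x}; \boldsymbol{\mu}) = \sum_{i=1}^{n} x_i\ln\mu_i - \sum_{i=1}^{n}\mu_i - \sum_{i=1}^{n}\ln x_i!.
\]
Solving
\[
\frac{\partial\ln L(\mathbf{x}; \boldsymbol{\mu})}{\partial\mu_i} = \frac{x_i}{\mu_i} - 1 = 0
\]
yields the (unrestricted) MLEs $\hat{\mu}_i = x_i$, $i = 1, 2, \ldots, n$. Thus, with $\hat{\boldsymbol{\mu}} = \mathbf{x}$, we have
\[
L(\mathbf{x}; \hat{\boldsymbol{\mu}}) = \left(\prod_{i=1}^{n} x_i^{x_i}\right) e^{-\sum_{i=1}^{n} x_i}\left(\prod_{i=1}^{n} x_i!\right)^{-1}.
\]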
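Under $H_0$, the likelihood and log-likelihood functions are
\[
L(\mathbf{x}; \mu) = \mu^{\sum_{i=1}^{n} x_i}\, e^{-n\mu}\left(\prod_{i=1}^{n} x_i!\right)^{-1}
\]
and
\[
\ln L(\mathbf{x}; \mu) = \left(\sum_{i=1}^{n} x_i\right)\ln\mu - n\mu - \sum_{i=1}^{n}\ln x_i!.
\]
Solving
\[
\frac{\partial\ln L(\mathbf{x}; \mu)}{\partial\mu} = \frac{\sum_{i=1}^{n} x_i}{\mu} - n = 0
\]
yields the (restricted) MLE $\hat{\mu} = \bar{x} = n^{-1}\sum_{i=1}^{n} x_i$. Thus,
\[
L(\mathbf{x}; \hat{\mu}) = (\bar{x})^{n\bar{x}}\, e^{-n\bar{x}}\left(\prod_{i=1}^{n} x_i!\right)^{-1}.
\]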
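So, the likelihood ratio statistic is
\[
\hat{\lambda} = \frac{L(\mathbf{x}; \hat{\mu})}{L(\mathbf{x}; \hat{\boldsymbol{\mu}})}
= \frac{(\bar{x})^{n\bar{x}}\, e^{-n\bar{x}}\left(\prod_{i=1}^{n} x_i!\right)^{-1}}{\left(\prod_{i=1}^{n} x_i^{x_i}\right) e^{-n\bar{x}}\left(\prod_{i=1}^{n} x_i!\right)^{-1}}
= \frac{(\bar{x})^{n\bar{x}}}{\prod_{i=1}^{n} x_i^{x_i}}.
\]
So,
\[
\ln\hat{\lambda} = (n\bar{x})\ln\bar{x} - \sum_{i=1}^{n} x_i\ln x_i = (\ln\bar{x})\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i\ln x_i
= \sum_{i=1}^{n} x_i(\ln\bar{x} - \ln x_i) = \sum_{i=1}^{n} x_i\ln\left(\frac{\bar{x}}{x_i}\right),
\]
so that
\[
-2\ln\hat{\lambda} = 2\sum_{i=1}^{n} x_i\ln\left(\frac{x_i}{\bar{x}}\right).
\]
Under $H_0: \mu_1 = \mu_2 = \cdots = \mu_n$, $-2\ln\hat{\lambda} \overset{\cdot}{\sim} \chi^2_{(n-1)}$ for large $n$. For the given data set,
\[
\bar{x} = \frac{20(5) + 10(6) + 10(8)}{40} = \frac{240}{40} = 6,
\]
so that
\[
-2\ln\hat{\lambda} = 2\left[20\left(5\ln\frac{5}{6}\right) + 10\left(6\ln\frac{6}{6}\right) + 10\left(8\ln\frac{8}{6}\right)\right]
= 2[100(-0.1823) + 0 + 80(0.2877)] = 2(23.015 - 18.230) = 9.570.
\]
Since $\chi^2_{0.95,39} > 50$, we do not reject $H_0$.

(b) Based on the results in part (a), there is no evidence to reject $H_0$. Hence, $L(\mathbf{x}; \mu)$ is the appropriate likelihood to use. Since
\[
\frac{\partial\ln L(\mathbf{x}; \mu)}{\partial\mu} = \frac{\sum_{i=1}^{n} x_i}{\mu} - n = n\left(\frac{\bar{x}}{\mu} - 1\right) = \frac{n}{\mu}(\bar{x} - \mu),
\]
it follows from exponential family theory that $\bar{X}$ is the MVBUE of $\mu$. Hence, a CI based on $\bar{X}$ would be an appropriate choice.

From ML theory (or Central Limit Theorem theory),
\[
\frac{\bar{X} - \mu}{\sqrt{V(\bar{X})}} = \frac{\bar{X} - \mu}{\sqrt{\mu/n}} \overset{\cdot}{\sim} N(0,1)
\]
for large $n$. Since $\bar{X}$ is consistent for $\mu$, by Slutsky's Theorem,
\[
\frac{\bar{X} - \mu}{\sqrt{\bar{X}/n}} \overset{\cdot}{\sim} N(0,1)
\]
for large $n$. Thus, an appropriate large-sample $100(1-\alpha)\%$ CI for $\mu$ is $\bar{X} \pm Z_{1-\alpha/2}\sqrt{\bar{X}/n}$. For the given data set, and for $\alpha = 0.05$, we have $6 \pm 1.96\sqrt{6/40} = 6 \pm 0.759$, giving $(5.241, 6.759)$ as the computed 95% CI for $\mu$.

Solution 5.9. (a) With $\mathbf{x} = (x_{11}, x_{12}, \ldots, x_{1n}; x_{21}, x_{22}, \ldots, x_{2n})$,
\[
\begin{aligned}
L(\mathbf{x}; \lambda_1, \lambda_2) &= \prod_{i=1}^{2}\prod_{j=1}^{n}\left\{\frac{(L_{ij}\lambda_i)^{x_{ij}}\, e^{-L_{ij}\lambda_i}}{x_{ij}!}\right\} \\
&= \left(\prod_{i=1}^{2}\prod_{j=1}^{n}(x_{ij}!)^{-1}\right)\lambda_1^{\sum_{j=1}^{n} x_{1j}}\lambda_2^{\sum_{j=1}^{n} x_{2j}} \times \left(\prod_{i=1}^{2}\prod_{j=1}^{n} L_{ij}^{x_{ij}}\right) e^{-\lambda_1\sum_{j=1}^{n} L_{1j}}\, e^{-\lambda_2\sum_{j=1}^{n} L_{2j}} \\
&= \left\{\lambda_1^{\sum_{j=1}^{n} x_{1j}}\lambda_2^{\sum_{j=1}^{n} x_{2j}}\, e^{-\lambda_1\sum_{j=1}^{n} L_{1j}}\, e^{-\lambda_2\sum_{j=1}^{n} L_{2j}}\right\} \times \left\{\left(\prod_{i=1}^{2}\prod_{j=1}^{n}(x_{ij}!)^{-1}\right)\cdot\left(\prod_{i=1}^{2}\prod_{j=1}^{n} L_{ij}^{x_{ij}}\right)\right\},
\end{aligned}
\]
so $\sum_{j=1}^{n} X_{1j}$ and $\sum_{j=1}^{n} X_{2j}$ are jointly sufficient for $\lambda_1$ and $\lambda_2$ by the Factorization Theorem.

(b)
\[
\ln L(\mathbf{x}; \lambda_1, \lambda_2) = \text{constant} + \left(\sum_{j=1}^{n} x_{1j}\right)\ln\lambda_1 + \left(\sum_{j=1}^{n} x_{2j}\right)\ln\lambda_2 - \lambda_1\sum_{j=1}^{n} L_{1j} - \lambda_2\sum_{j=1}^{n} L_{2j}.
\]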
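Solving for $\lambda_i$ in the equation
\[
\frac{\partial\ln L(\mathbf{x}; \lambda_1, \lambda_2)}{\partial\lambda_i} = \frac{\sum_{j=1}^{n} x_{ij}}{\lambda_i} - \sum_{j=1}^{n} L_{ij} = 0
\]
yields the MLE
\[
\hat{\lambda}_i = \frac{\sum_{j=1}^{n} X_{ij}}{\sum_{j=1}^{n} L_{ij}}, \quad i = 1, 2.
\]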
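(c) Under $H_0: \lambda_1 = \lambda_2\ (= \lambda, \text{say})$,
\[
\ln L(\mathbf{x}; \lambda) = \text{constant} + \left(\sum_{j=1}^{n} x_{1j} + \sum_{j=1}^{n} x_{2j}\right)\ln\lambda - \lambda\left(\sum_{j=1}^{n} L_{1j} + \sum_{j=1}^{n} L_{2j}\right).
\]
Solving the equation
\[
\frac{\partial\ln L(\mathbf{x}; \lambda)}{\partial\lambda} = \frac{\sum_{j=1}^{n} x_{1j} + \sum_{j=1}^{n} x_{2j}}{\lambda} - \left(\sum_{j=1}^{n} L_{1j} + \sum_{j=1}^{n} L_{2j}\right) = 0
\]
yields the MLE
\[
\hat{\lambda} = \frac{\sum_{j=1}^{n}(x_{1j} + x_{2j})}{\sum_{j=1}^{n}(L_{1j} + L_{2j})}.
\]

(d) Now,
\[
\frac{\hat{L}_\omega}{\hat{L}_\Omega}
= \frac{\prod_{i=1}^{2}\prod_{j=1}^{n} L_{ij}^{x_{ij}}\hat{\lambda}^{x_{ij}}\, e^{-\hat{\lambda}L_{ij}}\big/ x_{ij}!}{\prod_{i=1}^{2}\prod_{j=1}^{n} L_{ij}^{x_{ij}}\hat{\lambda}_i^{x_{ij}}\, e^{-\hat{\lambda}_i L_{ij}}\big/ x_{ij}!}
= \frac{\hat{\lambda}^{\sum_{j=1}^{n}(x_{1j}+x_{2j})}\, e^{-\hat{\lambda}\left(\sum_{j=1}^{n} L_{1j} + \sum_{j=1}^{n} L_{2j}\right)}}{\hat{\lambda}_1^{\sum_{j=1}^{n} x_{1j}}\hat{\lambda}_2^{\sum_{j=1}^{n} x_{2j}}\, e^{-\hat{\lambda}_1\sum_{j=1}^{n} L_{1j}}\, e^{-\hat{\lambda}_2\sum_{j=1}^{n} L_{2j}}}.
\]
And, for large $n$,
\[
-2\ln\left(\frac{\hat{L}_\omega}{\hat{L}_\Omega}\right) \overset{\cdot}{\sim} \chi^2_1
\]
under $H_0: \lambda_1 = \lambda_2$.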
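(e) From part (d),
\[
\begin{aligned}
-2\ln(\hat{L}_\omega/\hat{L}_\Omega) &= -2\Bigg\{\left(\sum_{j=1}^{n} x_{1j} + \sum_{j=1}^{n} x_{2j}\right)\ln\hat{\lambda} - \hat{\lambda}\left(\sum_{j=1}^{n} L_{1j} + \sum_{j=1}^{n} L_{2j}\right) \\
&\qquad - \left(\sum_{j=1}^{n} x_{1j}\right)\ln\hat{\lambda}_1 - \left(\sum_{j=1}^{n} x_{2j}\right)\ln\hat{\lambda}_2 + \hat{\lambda}_1\sum_{j=1}^{n} L_{1j} + \hat{\lambda}_2\sum_{j=1}^{n} L_{2j}\Bigg\} \\
&= -2\Bigg\{(4 + 9)\ln\left(\frac{4+9}{200+300}\right) - \left(\frac{4+9}{200+300}\right)(200 + 300) \\
&\qquad - (4)\ln(0.02) - (9)\ln(0.03) + 0.02(200) + 0.03(300)\Bigg\} = 0.477.
\end{aligned}
\]
Since $\chi^2_{1,0.90} = 2.706$, we do not reject $H_0: \lambda_1 = \lambda_2$, and the P-value $= \mathrm{pr}\left(\chi^2_1 > 0.477 \mid H_0: \lambda_1 = \lambda_2\right) \doteq 0.50$.

Solution 5.10. (a) Under $H_0: \alpha = \beta\ (= \gamma, \text{say})$, the restricted likelihood is $L_\omega = \gamma^{2n} e^{-n\gamma(\bar{x}+\bar{y})}$. So,
\[
\frac{\partial\ln(L_\omega)}{\partial\gamma} = \frac{2n}{\gamma} - n(\bar{x} + \bar{y}) = 0 \quad \text{gives} \quad \hat{\gamma}_\omega = 2(\bar{x} + \bar{y})^{-1}.
\]
Thus,
\[
\hat{L}_\omega = \hat{\gamma}_\omega^{2n}\, e^{-n\hat{\gamma}_\omega(\bar{x}+\bar{y})} = \left[\frac{2}{(\bar{x}+\bar{y})}\right]^{2n} e^{-2n}.
\]
Under $H_1: \alpha \ne \beta$, the unrestricted likelihood is $L_\Omega = \alpha^n e^{-n\alpha\bar{x}}\beta^n e^{-n\beta\bar{y}}$. Thus,
\[
\frac{\partial\ln(L_\Omega)}{\partial\alpha} = \frac{n}{\alpha} - n\bar{x} = 0 \quad \text{gives} \quad \hat{\alpha}_\Omega = (\bar{x})^{-1},
\]
and
\[
\frac{\partial\ln(L_\Omega)}{\partial\beta} = \frac{n}{\beta} - n\bar{y} = 0 \quad \text{gives} \quad \hat{\beta}_\Omega = (\bar{y})^{-1}.
\]
Thus,
\[
\hat{L}_\Omega = \left(\bar{x}^{-1}\right)^n e^{-n\bar{x}^{-1}\bar{x}}\left(\bar{y}^{-1}\right)^n e^{-n\bar{y}^{-1}\bar{y}} = (\bar{x}\bar{y})^{-n} e^{-2n}.
\]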
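Finally, the likelihood ratio statistic $\hat{\lambda}$ can be written as
\[
\hat{\lambda} = \frac{\hat{L}_\omega}{\hat{L}_\Omega} = \left[\frac{4\bar{x}\bar{y}}{(\bar{x}+\bar{y})^2}\right]^n = [4u(1-u)]^n, \quad \text{with} \quad u = \frac{\bar{x}}{(\bar{x}+\bar{y})}.
\]

(b) For the given set of data, $\hat{\lambda} = 0.0016$. For large $n$ and under $H_0: \alpha = \beta$, the random variable $-2\ln(\hat{\lambda}) \overset{\cdot}{\sim} \chi^2_1$. So,
\[
\text{P-value} \approx \mathrm{pr}[-2\ln(\hat{\lambda}) > -2\ln(0.0016)] < 0.0005.
\]
Since $E(X) = 1/\alpha$ and $E(Y) = 1/\beta$, the available data provide strong statistical evidence that the two surgical procedures lead to different true average survival times for patients with advanced colon cancer.

Solution 5.11. (a) With $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$, the unconstrained likelihood has the form
\[
L_\Omega = L(\mathbf{x}, \mathbf{y}; \theta_1, \theta_2) = \prod_{i=1}^{n}\left[\theta_1(1-\theta_1)^{x_i}\,\theta_2(1-\theta_2)^{y_i}\right] = \theta_1^n(1-\theta_1)^{s_x}\,\theta_2^n(1-\theta_2)^{s_y},
\]
where $s_x = \sum_{i=1}^{n} x_i$ and $s_y = \sum_{i=1}^{n} y_i$. So,
\[
\ln L_\Omega = n\ln\theta_1 + s_x\ln(1-\theta_1) + n\ln\theta_2 + s_y\ln(1-\theta_2).
\]
Solving $\partial\ln L_\Omega/\partial\theta_1 = 0$ and $\partial\ln L_\Omega/\partial\theta_2 = 0$ yields the unconstrained MLEs $\hat{\theta}_1 = 1/(1+\bar{x})$ and $\hat{\theta}_2 = 1/(1+\bar{y})$. Substituting these unrestricted MLEs for $\theta_1$ and $\theta_2$ into the expression for $L_\Omega$ gives
\[
\hat{L}_\Omega = \left(\frac{1}{1+\bar{x}}\right)^n\left(\frac{\bar{x}}{1+\bar{x}}\right)^{n\bar{x}}\left(\frac{1}{1+\bar{y}}\right)^n\left(\frac{\bar{y}}{1+\bar{y}}\right)^{n\bar{y}}.
\]
Also, when $\theta_1 = \theta_2 = \theta$, the constrained likelihood has the form $L_\omega = \theta^{2n}(1-\theta)^{(s_x+s_y)}$, so that $\ln L_\omega = 2n\ln\theta + (s_x+s_y)\ln(1-\theta)$. Solving $\partial\ln L_\omega/\partial\theta = 0$ yields the constrained MLE $\hat{\theta} = 2/(2+\bar{x}+\bar{y})$. Then, substituting this restricted MLE for $\theta$ into the expression for $L_\omega$ gives
\[
\hat{L}_\omega = \left(\frac{2}{2+\bar{x}+\bar{y}}\right)^{2n}\left(\frac{\bar{x}+\bar{y}}{2+\bar{x}+\bar{y}}\right)^{n(\bar{x}+\bar{y})}.
\]
So,
\[
\hat{\lambda} = \frac{\hat{L}_\omega}{\hat{L}_\Omega} = \frac{2^{2n}(1+\bar{x})^{n(1+\bar{x})}(1+\bar{y})^{n(1+\bar{y})}(\bar{x}+\bar{y})^{n(\bar{x}+\bar{y})}}{(\bar{x})^{n\bar{x}}(\bar{y})^{n\bar{y}}(2+\bar{x}+\bar{y})^{n(2+\bar{x}+\bar{y})}}.
\]
When $n = 25$, $\bar{x} = 1.00$, and $\bar{y} = 2.00$, $-2\ln\hat{\lambda} = 3.461$. Under $H_0$, $-2\ln\hat{\lambda} \overset{\cdot}{\sim} \chi^2_1$ when $n$ is large. Since $\chi^2_{1,0.95} = 3.841$, we do not reject $H_0$ at the $\alpha = 0.05$ level of significance.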
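The value $-2\ln\hat{\lambda} = 3.461$ in Solution 5.11(a) can be reproduced by evaluating the two maximized log-likelihoods directly rather than the closed-form ratio. This is a minimal sketch under the geometric model $\theta(1-\theta)^x$ used in the solution; the function name `loglik` is an illustrative choice.

```python
from math import log

def loglik(theta, n, total):
    """Geometric log-likelihood: n*ln(theta) + total*ln(1 - theta)."""
    return n * log(theta) + total * log(1 - theta)

n, xbar, ybar = 25, 1.00, 2.00
sx, sy = n * xbar, n * ybar

# Unrestricted MLEs and the restricted MLE under theta1 = theta2.
theta1_hat = 1 / (1 + xbar)
theta2_hat = 1 / (1 + ybar)
theta_hat = 2 / (2 + xbar + ybar)

ln_L_Omega = loglik(theta1_hat, n, sx) + loglik(theta2_hat, n, sy)
ln_L_omega = loglik(theta_hat, n, sx) + loglik(theta_hat, n, sy)
lr_stat = -2 * (ln_L_omega - ln_L_Omega)

print(round(lr_stat, 3))
```

Working on the log scale avoids the very small likelihood values that the product form would produce for larger $n$.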
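(b) First, with $\boldsymbol{\theta} = (\theta_1, \theta_2)$,
\[
S(\boldsymbol{\theta}) = \begin{bmatrix} \dfrac{\partial\ln L_\Omega}{\partial\theta_1} \\[2ex] \dfrac{\partial\ln L_\Omega}{\partial\theta_2} \end{bmatrix}
= \begin{bmatrix} \dfrac{n}{\theta_1} - \dfrac{s_x}{(1-\theta_1)} \\[2ex] \dfrac{n}{\theta_2} - \dfrac{s_y}{(1-\theta_2)} \end{bmatrix}
= \begin{bmatrix} \dfrac{n - n\theta_1(1+\bar{x})}{\theta_1(1-\theta_1)} \\[2ex] \dfrac{n - n\theta_2(1+\bar{y})}{\theta_2(1-\theta_2)} \end{bmatrix}.
\]
Now, $\hat{\theta} = 2/(2+\bar{x}+\bar{y}) = 2/(2+1+2) = 0.40$. So, when $n = 25$, $\bar{x} = 1.00$, $\bar{y} = 2.00$, and $\hat{\boldsymbol{\theta}}_\omega = (\hat{\theta}, \hat{\theta}) = (0.40, 0.40)$, then $S(\hat{\boldsymbol{\theta}}_\omega) = (20.8333, -20.8333)$. Now,
\[
\frac{\partial^2\ln L_\Omega}{\partial\theta_1^2} = \frac{-n}{\theta_1^2} - \frac{s_x}{(1-\theta_1)^2} = \frac{-n}{\theta_1^2} - \frac{n\bar{x}}{(1-\theta_1)^2},
\]
so that
\[
\left.\frac{-\partial^2\ln L_\Omega}{\partial\theta_1^2}\right|_{\theta_1 = \hat{\theta} = 0.40,\ n = 25,\ \bar{x} = 1.0} = 225.6944.
\]
Also,
\[
\frac{-\partial^2\ln L_\Omega}{\partial\theta_1\partial\theta_2} = \frac{-\partial^2\ln L_\Omega}{\partial\theta_2\partial\theta_1} = 0.
\]
And,
\[
\frac{\partial^2\ln L_\Omega}{\partial\theta_2^2} = \frac{-n}{\theta_2^2} - \frac{s_y}{(1-\theta_2)^2} = \frac{-n}{\theta_2^2} - \frac{n\bar{y}}{(1-\theta_2)^2},
\]
so that
\[
\left.\frac{-\partial^2\ln L_\Omega}{\partial\theta_2^2}\right|_{\theta_2 = \hat{\theta} = 0.40,\ n = 25,\ \bar{y} = 2.0} = 295.1389.
\]
Finally,
\[
\hat{S} = S'(\hat{\boldsymbol{\theta}}_\omega)\, I^{-1}(\mathbf{x}, \mathbf{y}; \hat{\boldsymbol{\theta}}_\omega)\, S(\hat{\boldsymbol{\theta}}_\omega)
= (20.8333, -20.8333)\begin{bmatrix} \dfrac{1}{225.6944} & 0 \\[1.5ex] 0 & \dfrac{1}{295.1389} \end{bmatrix}\begin{bmatrix} 20.8333 \\ -20.8333 \end{bmatrix} = 3.394.
\]
Under $H_0$, $\hat{S} \overset{\cdot}{\sim} \chi^2_1$ for large $n$. Since $\chi^2_{1,0.95} = 3.841$, we again do not reject $H_0$ at the $\alpha = 0.05$ level of significance. Although the numerical values of $-2\ln\hat{\lambda}$ and $\hat{S}$ agree closely in this particular example, this will not always be the case.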
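The score statistic of Solution 5.11(b) is a short computation once the score vector and the (diagonal) observed information are evaluated at the restricted MLE. The sketch below follows the formulas in the solution; the function names `score` and `observed_info` are illustrative.

```python
def score(theta, n, mean):
    """d/d(theta) of n*ln(theta) + n*mean*ln(1 - theta)."""
    return n / theta - n * mean / (1 - theta)

def observed_info(theta, n, mean):
    """Negative second derivative of the same log-likelihood."""
    return n / theta**2 + n * mean / (1 - theta) ** 2

n, xbar, ybar = 25, 1.00, 2.00
theta_hat = 2 / (2 + xbar + ybar)          # restricted MLE, 0.40

s1, s2 = score(theta_hat, n, xbar), score(theta_hat, n, ybar)
i1, i2 = observed_info(theta_hat, n, xbar), observed_info(theta_hat, n, ybar)

# The information matrix is diagonal, so the quadratic form is a simple sum.
score_stat = s1**2 / i1 + s2**2 / i2

print(round(s1, 4), round(s2, 4), round(i1, 4), round(i2, 4), round(score_stat, 3))
```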
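(c) First, the actual P-value for either the likelihood ratio test or the score test satisfies the inequality $0.05 <$ P-value $< 0.10$. Also, since $\bar{X}$ and $\bar{Y}$ are unbiased estimators of $E(X)$ and $E(Y)$, respectively, and since $\bar{x} = 1.00$ is half the size of $\bar{y} = 2.00$, the data do provide some evidence suggesting that the teenage driver education classes are beneficial. So, the suggestion by the highway safety researcher to increase the sample size is very reasonable; power calculations can be used to choose an appropriate sample size.

Solution 5.12. (a) The unconditional likelihood $L(\beta)$ is
\[
\begin{aligned}
L(\beta) &= \prod_{i=1}^{n}\left\{\left(\frac{1}{1+\beta x_i}\right)^{y_i - 1}\left(\frac{\beta x_i}{1+\beta x_i}\right)\right\} \\
\Rightarrow \ln L(\beta) &= \sum_{i=1}^{n}\left[(y_i - 1)\ln\left(\frac{1}{1+\beta x_i}\right) + \ln\left(\frac{\beta x_i}{1+\beta x_i}\right)\right] = \sum_{i=1}^{n}\left[\ln(\beta x_i) - y_i\ln(1+\beta x_i)\right] \\
\Rightarrow \frac{d\ln L(\beta)}{d\beta} &= \sum_{i=1}^{n}\left[\frac{1}{\beta} - \frac{x_i y_i}{(1+\beta x_i)}\right] = 0 \\
\Rightarrow \frac{n}{\hat{\beta}} &= \sum_{i=1}^{n}\frac{x_i y_i}{(1+\hat{\beta}x_i)} \\
\Rightarrow \hat{\beta} &= n\left[\sum_{i=1}^{n} x_i y_i\left(1+\hat{\beta}x_i\right)^{-1}\right]^{-1}.
\end{aligned}
\]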
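(b) From part (a), we know that
\[
\begin{aligned}
\frac{d\ln L(\beta)}{d\beta} &= \frac{n}{\beta} - \sum_{i=1}^{n}\frac{x_i y_i}{(1+\beta x_i)} \\
\Rightarrow \frac{d^2\ln L(\beta)}{d\beta^2} &= \frac{-n}{\beta^2} + \sum_{i=1}^{n}\frac{x_i^2 y_i}{(1+\beta x_i)^2} \\
\Rightarrow -E\left[\frac{d^2\ln L(\beta)}{d\beta^2}\right] &= \frac{n}{\beta^2} - \sum_{i=1}^{n}\frac{x_i^2 E(Y_i)}{(1+\beta x_i)^2}.
\end{aligned}
\]
Now, since $E(Y_i) = \theta_i^{-1} = (1+\beta x_i)/\beta x_i$, it follows that
\[
\begin{aligned}
I(\beta) = -E\left[\frac{d^2\ln L(\beta)}{d\beta^2}\right] &= \frac{n}{\beta^2} - \sum_{i=1}^{n}\frac{x_i^2(1+\beta x_i)}{\beta x_i(1+\beta x_i)^2}
= \frac{n}{\beta^2} - \sum_{i=1}^{n}\frac{x_i}{\beta(1+\beta x_i)} \\
&= \frac{1}{\beta^2}\left[n - \sum_{i=1}^{n}\frac{\beta x_i}{(1+\beta x_i)}\right] = \frac{1}{\beta^2}\sum_{i=1}^{n}(1+\beta x_i)^{-1}.
\end{aligned}
\]
So,
\[
V(\hat{\beta}) = \left\{-E\left[\frac{d^2\ln L(\beta)}{d\beta^2}\right]\right\}^{-1} = \frac{\beta^2}{\sum_{i=1}^{n}(1+\beta x_i)^{-1}}.
\]

(c) For these data,
\[
\sum_{i=1}^{n}(1+\hat{\beta}x_i)^{-1} = 50\left[1 + \tfrac{1}{2}(30)\right]^{-1} + 50\left[1 + \tfrac{1}{2}(40)\right]^{-1} = 5.5060,
\]
so that
\[
\hat{V}(\hat{\beta}) = \left(\tfrac{1}{2}\right)^2\Big/ 5.5060 = 0.0454.
\]
So, a large-sample 95% CI for $\beta$ is
\[
\hat{\beta} \pm 1.96\sqrt{\hat{V}(\hat{\beta})} = 0.50 \pm 1.96\sqrt{0.0454} = (0.0824, 0.9176).
\]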
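The variance estimate and confidence interval in Solution 5.12(c) follow mechanically from the formula $V(\hat{\beta}) = \beta^2/\sum_i(1+\beta x_i)^{-1}$. The sketch below assumes the data layout implied by the computation in the text (50 observations with $x_i = 30$ and 50 with $x_i = 40$, and $\hat{\beta} = 0.50$); the function name `var_beta_hat` is illustrative.

```python
from math import sqrt

def var_beta_hat(beta, xs):
    """Large-sample variance beta^2 / sum_i (1 + beta*x_i)^(-1)."""
    return beta**2 / sum(1 / (1 + beta * x) for x in xs)

# Assumed data layout, read off the arithmetic in the solution.
xs = [30] * 50 + [40] * 50
beta_hat = 0.50

v_hat = var_beta_hat(beta_hat, xs)
half_width = 1.96 * sqrt(v_hat)
ci = (beta_hat - half_width, beta_hat + half_width)

print(round(v_hat, 4), tuple(round(c, 4) for c in ci))
```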
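(d)
\[
I(\beta) = \frac{1}{\beta^2}\sum_{i=1}^{n}(1+\beta x_i)^{-1},
\]
so that $I(\hat{\beta}) = I(1/2) = 4(5.5060) = 22.0240$. So, $\hat{W} = \left(\tfrac{1}{2} - 1\right)^2(22.0240) = 5.5060$, since $\hat{W} = (\hat{\beta} - \beta_0)I(\hat{\beta})(\hat{\beta} - \beta_0)$ and $\beta_0 = 1$. Since $\chi^2_{1,0.95} = 3.84$, we reject $H_0$; P-value $\doteq 0.02$.

(e)
\[
\begin{aligned}
\text{POWER} &= \mathrm{pr}\{U > 1.96 \mid \beta = 1.10\}
= \mathrm{pr}\left\{\frac{\hat{\beta} - 1}{\sqrt{V_0(\hat{\beta})}} > 1.96 \mid \beta = 1.10\right\} \\
&= \mathrm{pr}\left\{\hat{\beta} > 1 + 1.96\sqrt{V_0(\hat{\beta})} \mid \beta = 1.10\right\} \\
&= \mathrm{pr}\left\{\frac{\hat{\beta} - 1.10}{\sqrt{V(\hat{\beta})}} > \frac{1 + 1.96\sqrt{V_0(\hat{\beta})} - 1.10}{\sqrt{V(\hat{\beta})}}\right\}
\doteq \mathrm{pr}\left\{Z > \frac{1.96\sqrt{V_0(\hat{\beta})} - 0.10}{\sqrt{V(\hat{\beta})}}\right\},
\end{aligned}
\]
where $Z \overset{\cdot}{\sim} N(0,1)$ for large $n$. Now, when $\beta = 1$,
\[
V_0(\hat{\beta}) = \frac{1}{50(1+30)^{-1} + 50(1+40)^{-1}} = 0.3531;
\]
and, when $\beta = 1.10$,
\[
V(\hat{\beta}) = \frac{\beta^2}{\sum_{i=1}^{n}(1+\beta x_i)^{-1}} = \frac{(1.10)^2}{50[1 + 1.10(30)]^{-1} + 50[1 + 1.10(40)]^{-1}} = 0.4687.
\]
So,
\[
\text{POWER} = \mathrm{pr}\left\{Z > \frac{1.96\sqrt{0.3531} - 0.10}{\sqrt{0.4687}}\right\} = \mathrm{pr}(Z > 1.5552) \doteq 0.06.
\]

Solution 5.13. (a) If $X$ is the random variable denoting the number of lung cancer cases developing over this 20-year follow-up period in a random sample of $n = 1000$ heavy smokers, it is reasonable to assume that $X \sim \text{BIN}(n, \theta)$. The maximum likelihood estimator of $\theta$ is
\[
\hat{\theta} = \frac{X}{n}, \quad \text{with} \quad E(\hat{\theta}) = \theta \quad \text{and} \quad V(\hat{\theta}) = \frac{\theta(1-\theta)}{n}.
\]
Since $n$ is large, by the Central Limit Theorem and by Slutsky's Theorem,
\[
0.95 \doteq \mathrm{pr}\left\{-1.96 < \frac{\hat{\theta} - \theta}{\sqrt{\dfrac{\hat{\theta}(1-\hat{\theta})}{n}}} < 1.96\right\} = \mathrm{pr}\{L < \theta < U\},
\]
where
\[
L = \hat{\theta} - 1.96\sqrt{\frac{\hat{\theta}(1-\hat{\theta})}{n}} \quad \text{and} \quad U = \hat{\theta} + 1.96\sqrt{\frac{\hat{\theta}(1-\hat{\theta})}{n}}.
\]
Since $\psi = \theta/(1-\theta)$, so that $\theta = \psi/(1+\psi)$ and $\theta^{-1} = 1 + \psi^{-1}$, and with $0 < L < U$,
\[
0.95 \doteq \mathrm{pr}\{U^{-1} < \theta^{-1} < L^{-1}\} = \mathrm{pr}\{U^{-1} < 1 + \psi^{-1} < L^{-1}\}
= \mathrm{pr}\left\{\left(\frac{1}{L} - 1\right)^{-1} < \psi < \left(\frac{1}{U} - 1\right)^{-1}\right\}.
\]
[...]

Testing $H_0: \psi = 0.10$ versus $H_1: \psi > 0.10$ is equivalent to testing $H_0: \theta = 0.0909$ versus $H_1: \theta > 0.0909$. Since
\[
\frac{\partial\ln L(x; \theta)}{\partial\theta} = \frac{x}{\theta} - \frac{(n-x)}{(1-\theta)},
\]
we have
\[
\frac{\partial^2\ln L(x; \theta)}{\partial\theta^2} = \frac{-x}{\theta^2} - \frac{(n-x)}{(1-\theta)^2};
\]
hence, the estimated observed information is
\[
\frac{100}{(0.10)^2} + \frac{900}{(1-0.10)^2} = 10{,}000 + 1111.111 = 11{,}111.111.
\]
And, since
\[
-E_x\left[\frac{\partial^2\ln L(x; \theta)}{\partial\theta^2}\right] = \frac{n\theta}{\theta^2} + \frac{(n - n\theta)}{(1-\theta)^2} = \frac{n}{\theta(1-\theta)},
\]
the estimated expected information is $\dfrac{1{,}000}{0.10(1-0.10)} = 11{,}111.111$. So, the Wald statistic is
\[
\hat{W} = (0.10 - 0.0909)^2(11{,}111.111) = 0.9201,
\]
with
\[
\text{P-value} = \mathrm{pr}\left(\sqrt{\hat{W}} > \sqrt{0.9201} \mid H_0: \theta = 0.0909\right) \doteq \mathrm{pr}(Z > 0.9592) \doteq 0.17,
\]
where $Z \sim N(0,1)$. Since
\[
\left.\frac{\partial\ln L(x; \theta)}{\partial\theta}\right|_{\theta = 0.0909} = \frac{(100)}{0.0909} - \frac{(1{,}000 - 100)}{(1 - 0.0909)} = 1100.11 - 989.9901 = 110.1199,
\]
the score statistic is
\[
\hat{S} = \frac{(110.1199)^2}{1000/[0.0909(0.9091)]} = 1.0021,
\]
with
\[
\text{P-value} = \mathrm{pr}\left(\sqrt{\hat{S}} > \sqrt{1.0021} \mid H_0: \theta = 0.0909\right) \doteq \mathrm{pr}(Z > 1.0010) \doteq 0.16,
\]
where $Z \sim N(0,1)$. The results of these Wald and score tests imply that $H_0: \theta = 0.0909$ cannot be rejected given the available data.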
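The Wald and score statistics for $H_0: \theta = 0.0909$ in Solution 5.13 can be recomputed in a few lines. The sketch below uses $x = 100$ and $n = 1000$ as in the solution and obtains the one-sided normal P-values from the error function; the helper name `normal_upper_tail` is an illustrative choice.

```python
from math import erf, sqrt

def normal_upper_tail(z):
    """pr(Z > z) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1 - erf(z / sqrt(2)))

x, n, theta0 = 100, 1000, 0.0909
theta_hat = x / n

# Wald statistic: information evaluated at the MLE.
info_hat = n / (theta_hat * (1 - theta_hat))
wald = (theta_hat - theta0) ** 2 * info_hat

# Score statistic: score and expected information evaluated at theta0.
score_at_null = x / theta0 - (n - x) / (1 - theta0)
info_null = n / (theta0 * (1 - theta0))
score_stat = score_at_null**2 / info_null

p_wald = normal_upper_tail(sqrt(wald))
p_score = normal_upper_tail(sqrt(score_stat))

print(round(wald, 4), round(p_wald, 2), round(score_stat, 4), round(p_score, 2))
```

The two statistics differ only in where the information is evaluated, which is why they are close but not identical here.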
Of course, we can equivalently work with the parameter $\psi$, and directly test $H_0: \psi = 0.10$ versus $H_1: \psi > 0.10$ using appropriate Wald and score tests. From part (a), the appropriate estimated observed information is $(8101.6202 - 810.0162) = 7{,}291.6040$; so, the Wald test statistic is
\[
\hat{W} = (0.1111 - 0.10)^2(7{,}291.6040) = 0.8984,
\]
with
\[
\text{P-value} = \mathrm{pr}\left(\sqrt{\hat{W}} > \sqrt{0.8984} \mid H_0: \psi = 0.10\right) \doteq \mathrm{pr}(Z > 0.9478) \doteq 0.17,
\]
where $Z \sim N(0,1)$. Since
\[
\left.\frac{\partial\ln L(x; \psi)}{\partial\psi}\right|_{\psi = 0.10} = \frac{100}{0.10} - \frac{1{,}000}{(1 + 0.10)} = 1000 - 909.0909 = 90.9091,
\]
the score test statistic is
\[
\hat{S} = \frac{(90.9091)^2}{1000/[0.10(1 + 0.10)^2]} = 1.0000,
\]
with
\[
\text{P-value} = \mathrm{pr}\left(\sqrt{\hat{S}} > \sqrt{1.000} \mid H_0: \psi = 0.10\right) \doteq \mathrm{pr}(Z > 1.0000) \doteq 0.16,
\]
where $Z \sim N(0,1)$. As before, there is not sufficient evidence to reject $H_0: \psi = 0.10$.

Solution 5.14. (a) The unrestricted likelihood function $L_\Omega$ has the structure
\[
\begin{aligned}
L_\Omega &= \prod_{i=1}^{n} f_{X,Y}(x_i, y_i; \alpha, \beta) = \prod_{i=1}^{n} f_X(x_i; \beta)\, f_Y(y_i \mid X = x_i; \alpha, \beta) \\
&= \prod_{i=1}^{n}\left\{\frac{1}{\beta} e^{-x_i/\beta}\cdot\frac{1}{(\alpha+\beta)x_i} e^{-y_i/(\alpha+\beta)x_i}\right\} \\
&= \beta^{-n} e^{-\sum_{i=1}^{n} x_i/\beta}\,(\alpha+\beta)^{-n} e^{-\sum_{i=1}^{n}(y_i/x_i)/(\alpha+\beta)}\cdot\left(\prod_{i=1}^{n} x_i\right)^{-1}.
\end{aligned}
\]
By the Factorization Theorem, $U_1 = \sum_{i=1}^{n} X_i$ and $U_2 = \sum_{i=1}^{n}(Y_i/X_i)$ are jointly sufficient for $\alpha$ and $\beta$. If we can show that $X_i$ and $Y_i/X_i$ are uncorrelated, then $U_1$ and $U_2$ will be uncorrelated. Now, $E(X_i) = \beta$ and $E(Y_i) = E_{x_i}[E(Y_i \mid X_i = x_i)] = E[(\alpha+\beta)x_i] = (\alpha+\beta)\beta$.
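And,
\[
E\left(\frac{Y_i}{X_i}\right) = E_{x_i}\left[E\left(\frac{Y_i}{X_i} \mid X_i = x_i\right)\right] = E_{x_i}\left[\frac{1}{x_i}E(Y_i \mid X_i = x_i)\right] = E_{x_i}\left[\frac{1}{x_i}(\alpha+\beta)x_i\right] = (\alpha+\beta).
\]
Since
\[
E\left(X_i\cdot\frac{Y_i}{X_i}\right) = E(Y_i) = (\alpha+\beta)\beta,
\]
\[
\mathrm{cov}\left(X_i, \frac{Y_i}{X_i}\right) = E\left(X_i\cdot\frac{Y_i}{X_i}\right) - E(X_i)E\left(\frac{Y_i}{X_i}\right) = (\alpha+\beta)\beta - \beta(\alpha+\beta) = 0,
\]
and hence $U_1$ and $U_2$ are uncorrelated.

(b) Now,
\[
\ln L_\Omega = -n\ln\beta - \frac{\sum_{i=1}^{n} x_i}{\beta} - n\ln(\alpha+\beta) - \frac{\sum_{i=1}^{n}(y_i/x_i)}{(\alpha+\beta)} - \sum_{i=1}^{n}\ln x_i.
\]
So,
\[
\frac{\partial\ln L_\Omega}{\partial\alpha} = \frac{-n}{(\alpha+\beta)} + \frac{\sum_{i=1}^{n}(y_i/x_i)}{(\alpha+\beta)^2} = 0 \Rightarrow (\hat{\alpha} + \hat{\beta}) = \frac{\sum_{i=1}^{n}(y_i/x_i)}{n}.
\]
And,
\[
\frac{\partial\ln L_\Omega}{\partial\beta} = \frac{-n}{\beta} + \frac{\sum_{i=1}^{n} x_i}{\beta^2} - \frac{n}{(\alpha+\beta)} + \frac{\sum_{i=1}^{n}(y_i/x_i)}{(\alpha+\beta)^2} = 0
\Rightarrow \hat{\beta} = \frac{\sum_{i=1}^{n} X_i}{n} = \bar{X} \quad \text{and} \quad \hat{\alpha} = \frac{\sum_{i=1}^{n}(Y_i/X_i)}{n} - \bar{X}.
\]
Now,
\[
\frac{\partial^2\ln L_\Omega}{\partial\alpha^2} = \frac{n}{(\alpha+\beta)^2} - \frac{2\sum_{i=1}^{n}(y_i/x_i)}{(\alpha+\beta)^3},
\]
so that
\[
-E\left(\frac{\partial^2\ln L_\Omega}{\partial\alpha^2}\right) = \frac{-n}{(\alpha+\beta)^2} + \frac{2n(\alpha+\beta)}{(\alpha+\beta)^3} = \frac{n}{(\alpha+\beta)^2}.
\]
Also,
\[
\frac{\partial^2\ln L_\Omega}{\partial\beta^2} = \frac{n}{\beta^2} - \frac{2\sum_{i=1}^{n} x_i}{\beta^3} + \frac{n}{(\alpha+\beta)^2} - \frac{2\sum_{i=1}^{n}(y_i/x_i)}{(\alpha+\beta)^3},
\]
so that
\[
-E\left(\frac{\partial^2\ln L_\Omega}{\partial\beta^2}\right) = \frac{-n}{\beta^2} + \frac{2n\beta}{\beta^3} - \frac{n}{(\alpha+\beta)^2} + \frac{2n(\alpha+\beta)}{(\alpha+\beta)^3} = \frac{n}{\beta^2} + \frac{n}{(\alpha+\beta)^2}.
\]
And,
\[
\frac{\partial^2\ln L_\Omega}{\partial\alpha\,\partial\beta} = \frac{n}{(\alpha+\beta)^2} - \frac{2\sum_{i=1}^{n}(y_i/x_i)}{(\alpha+\beta)^3},
\]
so that
\[
-E\left(\frac{\partial^2\ln L_\Omega}{\partial\alpha\,\partial\beta}\right) = \frac{-n}{(\alpha+\beta)^2} + \frac{2n(\alpha+\beta)}{(\alpha+\beta)^3} = \frac{n}{(\alpha+\beta)^2}.
\]
Thus, the expected information matrix $I(\alpha, \beta)$ is equal to
\[
I(\alpha, \beta) = \begin{bmatrix} \dfrac{n}{(\alpha+\beta)^2} & \dfrac{n}{(\alpha+\beta)^2} \\[2ex] \dfrac{n}{(\alpha+\beta)^2} & \dfrac{n}{\beta^2} + \dfrac{n}{(\alpha+\beta)^2} \end{bmatrix},
\]
and so
\[
I^{-1}(\alpha, \beta) = \begin{bmatrix} \dfrac{(\alpha+\beta)^2}{n} + \dfrac{\beta^2}{n} & \dfrac{-\beta^2}{n} \\[2ex] \dfrac{-\beta^2}{n} & \dfrac{\beta^2}{n} \end{bmatrix}.
\]
For $H_0: \alpha = \beta$, or equivalently $H_0: R = (\alpha - \beta) = 0$, we have
\[
T = \left(\frac{\partial R}{\partial\alpha}, \frac{\partial R}{\partial\beta}\right) = (1, -1),
\]
and so
\[
\Lambda = T\, I^{-1}(\alpha, \beta)\, T' = \frac{(\alpha+\beta)^2}{n} + \frac{4\beta^2}{n} = V(\hat{\alpha}) + V(\hat{\beta}) - 2\,\mathrm{cov}(\hat{\alpha}, \hat{\beta}) = V(\hat{\alpha} - \hat{\beta}) = V(\hat{R}).
\]
So, the Wald test statistic $\hat{W}$ takes the form
\[
\hat{W} = \frac{\hat{R}^2}{\hat{\Lambda}} = \frac{(\hat{\alpha} - \hat{\beta})^2}{\dfrac{(\hat{\alpha}+\hat{\beta})^2}{n} + \dfrac{4\hat{\beta}^2}{n}} = \left[\frac{(\hat{\alpha} - \hat{\beta}) - 0}{\sqrt{\hat{V}(\hat{\alpha} - \hat{\beta})}}\right]^2,
\]
as expected.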
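For $n = 30$, $\hat{\alpha} = 2$, and $\hat{\beta} = 1$,
\[
\hat{W} = \frac{(2-1)^2}{\dfrac{(2+1)^2}{30} + \dfrac{4(1)^2}{30}} = 2.31.
\]
Since $\hat{W} \overset{\cdot}{\sim} \chi^2_1$ under $H_0: \alpha = \beta$ for large $n$, the P-value $\doteq \mathrm{pr}(\chi^2_1 > 2.31 \mid H_0: \alpha = \beta) \doteq 0.14$. So, for the given data, there is not sufficient evidence to reject $H_0: \alpha = \beta$. A (large-sample) 95% confidence interval for $(\alpha - \beta)$ is
\[
(\hat{\alpha} - \hat{\beta}) \pm 1.96\sqrt{\hat{V}(\hat{\alpha} - \hat{\beta})} = (\hat{\alpha} - \hat{\beta}) \pm 1.96\sqrt{\frac{(\hat{\alpha}+\hat{\beta})^2}{n} + \frac{4\hat{\beta}^2}{n}}
= (2-1) \pm 1.96\sqrt{\frac{(2+1)^2}{30} + \frac{4(1)^2}{30}} = (-0.29, 2.29).
\]
The computed 95% CI contains the value 0, which agrees with the conclusion based on the Wald test.

Solution 5.15. (a) With $\mathbf{y} = (y_{11}, y_{12}, \ldots, y_{1n}; y_{21}, y_{22}, \ldots, y_{2n})$, we have
\[
L(\mathbf{y}; \beta_1, \beta_2) = \prod_{i=1}^{2}\prod_{j=1}^{n}\left\{\frac{1}{\beta_i x_{ij}^2} e^{-y_{ij}/\beta_i x_{ij}^2}\right\}
= \prod_{i=1}^{2}\left\{\frac{1}{\beta_i^n\prod_{j=1}^{n} x_{ij}^2}\, e^{-\beta_i^{-1}\sum_{j=1}^{n} x_{ij}^{-2} y_{ij}}\right\}.
\]
Hence, by the Factorization Theorem,
\[
\sum_{j=1}^{n}\frac{y_{1j}}{x_{1j}^2} \text{ is sufficient for } \beta_1, \quad \text{and} \quad \sum_{j=1}^{n}\frac{y_{2j}}{x_{2j}^2} \text{ is sufficient for } \beta_2.
\]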
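(b) Now,
\[
\ln L(\mathbf{y}; \beta_1, \beta_2) = \sum_{i=1}^{2}\sum_{j=1}^{n}\left\{-\ln\beta_i - \ln x_{ij}^2 - \frac{y_{ij}}{\beta_i x_{ij}^2}\right\}
= -n\sum_{i=1}^{2}\ln\beta_i - \sum_{i=1}^{2}\sum_{j=1}^{n}\ln x_{ij}^2 - \sum_{i=1}^{2}\beta_i^{-1}\sum_{j=1}^{n}\frac{y_{ij}}{x_{ij}^2}.
\]
So, for $i = 1, 2$,
\[
S_i(\beta_1, \beta_2) = \frac{\partial\ln L(\mathbf{y}; \beta_1, \beta_2)}{\partial\beta_i} = \frac{-n}{\beta_i} + \frac{1}{\beta_i^2}\sum_{j=1}^{n}\frac{y_{ij}}{x_{ij}^2} = 0
\]
gives
\[
\hat{\beta}_i = \frac{1}{n}\sum_{j=1}^{n}\frac{y_{ij}}{x_{ij}^2}, \quad i = 1, 2.
\]
And,
\[
\frac{\partial^2\ln L(\mathbf{y}; \beta_1, \beta_2)}{\partial\beta_i^2} = \frac{n}{\beta_i^2} - \frac{2}{\beta_i^3}\sum_{j=1}^{n}\frac{y_{ij}}{x_{ij}^2}, \quad i = 1, 2,
\]
so that
\[
-E\left[\frac{\partial^2\ln L(\mathbf{y}; \beta_1, \beta_2)}{\partial\beta_i^2}\right] = \frac{-n}{\beta_i^2} + \frac{2}{\beta_i^3}\sum_{j=1}^{n}\frac{E(Y_{ij})}{x_{ij}^2}
= \frac{-n}{\beta_i^2} + \frac{2}{\beta_i^3}\sum_{j=1}^{n}\frac{\beta_i x_{ij}^2}{x_{ij}^2} = \frac{-n}{\beta_i^2} + \frac{2n}{\beta_i^2} = \frac{n}{\beta_i^2}.
\]
Also,
\[
\frac{\partial^2\ln L(\mathbf{y}; \beta_1, \beta_2)}{\partial\beta_1\partial\beta_2} = \frac{\partial^2\ln L(\mathbf{y}; \beta_1, \beta_2)}{\partial\beta_2\partial\beta_1} = 0,
\]
so that the expected information matrix is
\[
I(\beta_1, \beta_2) = \begin{bmatrix} n/\beta_1^2 & 0 \\ 0 & n/\beta_2^2 \end{bmatrix}.
\]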
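Under $H_0: \beta_1 = \beta_2\ (= \beta, \text{say})$,
\[
\ln L(\mathbf{y}; \beta) = -2n\ln\beta - \sum_{i=1}^{2}\sum_{j=1}^{n}\ln x_{ij}^2 - \frac{1}{\beta}\sum_{i=1}^{2}\sum_{j=1}^{n}\frac{y_{ij}}{x_{ij}^2},
\]
so that the equation
\[
\frac{\partial\ln L(\mathbf{y}; \beta)}{\partial\beta} = \frac{-2n}{\beta} + \frac{1}{\beta^2}\sum_{i=1}^{2}\sum_{j=1}^{n}\frac{y_{ij}}{x_{ij}^2} = 0
\]
gives
\[
\hat{\beta} = \frac{\sum_{i=1}^{2}\sum_{j=1}^{n}\dfrac{y_{ij}}{x_{ij}^2}}{2n} = \frac{1}{2}(\hat{\beta}_1 + \hat{\beta}_2).
\]
So, with $S(\hat{\beta}) = [S_1(\hat{\beta}, \hat{\beta}), S_2(\hat{\beta}, \hat{\beta})]$,
\[
\begin{aligned}
\hat{S} &= S(\hat{\beta})\, I^{-1}(\hat{\beta}, \hat{\beta})\, S'(\hat{\beta}) \\
&= \left[\frac{-n}{\hat{\beta}} + \frac{n\hat{\beta}_1}{\hat{\beta}^2},\ \frac{-n}{\hat{\beta}} + \frac{n\hat{\beta}_2}{\hat{\beta}^2}\right]
\begin{bmatrix} \hat{\beta}^2/n & 0 \\ 0 & \hat{\beta}^2/n \end{bmatrix}
\begin{bmatrix} \dfrac{-n}{\hat{\beta}} + \dfrac{n\hat{\beta}_1}{\hat{\beta}^2} \\[2ex] \dfrac{-n}{\hat{\beta}} + \dfrac{n\hat{\beta}_2}{\hat{\beta}^2} \end{bmatrix} \\
&= [-\hat{\beta} + \hat{\beta}_1,\ -\hat{\beta} + \hat{\beta}_2]
\begin{bmatrix} \dfrac{-n}{\hat{\beta}} + \dfrac{n\hat{\beta}_1}{\hat{\beta}^2} \\[2ex] \dfrac{-n}{\hat{\beta}} + \dfrac{n\hat{\beta}_2}{\hat{\beta}^2} \end{bmatrix}
= [(\hat{\beta}_1 - \hat{\beta}),\ (\hat{\beta}_2 - \hat{\beta})]\,\frac{n}{\hat{\beta}^2}\begin{bmatrix} (\hat{\beta}_1 - \hat{\beta}) \\ (\hat{\beta}_2 - \hat{\beta}) \end{bmatrix} \\
&= \frac{n[(\hat{\beta}_1 - \hat{\beta})^2 + (\hat{\beta}_2 - \hat{\beta})^2]}{\hat{\beta}^2}
= \frac{n\left[\tfrac{1}{4}(\hat{\beta}_1 - \hat{\beta}_2)^2 + \tfrac{1}{4}(\hat{\beta}_1 - \hat{\beta}_2)^2\right]}{\tfrac{1}{4}(\hat{\beta}_1 + \hat{\beta}_2)^2}
= \frac{2n(\hat{\beta}_1 - \hat{\beta}_2)^2}{(\hat{\beta}_1 + \hat{\beta}_2)^2}.
\end{aligned}
\]
Under $H_0: \beta_1 = \beta_2$, $\hat{S} \overset{\cdot}{\sim} \chi^2_1$ for large $n$. For the given data,
\[
\hat{S} = \frac{2(25)(2-3)^2}{(2+3)^2} = \frac{50}{25} = 2.
\]
Since $\chi^2_{0.95,1} = 3.841$, we do not reject $H_0$ at the $\alpha = 0.05$ level.

Solution 5.16. (a) The unrestricted likelihood function is
\[
L_\Omega = \prod_{i=1}^{n}\left\{\theta^{-1} e^{-x_i/\theta}\right\}\cdot\prod_{i=1}^{n}\left\{(\lambda\theta)^{-1} e^{-y_i/\lambda\theta}\right\}
= \theta^{-n}\exp\left\{-\theta^{-1}\sum_{i=1}^{n} x_i\right\}(\lambda\theta)^{-n}\exp\left\{-(\lambda\theta)^{-1}\sum_{i=1}^{n} y_i\right\};
\]
so, by the Factorization Theorem,
\[
S_x = \sum_{i=1}^{n} X_i \quad \text{and} \quad S_y = \sum_{i=1}^{n} Y_i
\]
are jointly sufficient for $\lambda$ and $\theta$.

(b) From part (a),
\[
\ln L_\Omega = -n\ln\theta - \frac{\sum_{i=1}^{n} x_i}{\theta} - n\ln(\lambda\theta) - \frac{\sum_{i=1}^{n} y_i}{\lambda\theta}
= -2n\ln\theta - n\ln\lambda - \frac{\sum_{i=1}^{n} x_i}{\theta} - \frac{\sum_{i=1}^{n} y_i}{\lambda\theta}.
\]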
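So,
\[
\frac{\partial\ln L_\Omega}{\partial\theta} = -\frac{2n}{\theta} + \frac{\sum_{i=1}^{n} x_i}{\theta^2} + \frac{\sum_{i=1}^{n} y_i}{\lambda\theta^2},
\]
and
\[
\frac{\partial\ln L_\Omega}{\partial\lambda} = -\frac{n}{\lambda} + \frac{\sum_{i=1}^{n} y_i}{\lambda^2\theta}.
\]
Now,
\[
\frac{\partial\ln L_\Omega}{\partial\lambda} = 0 \implies \frac{-n}{\theta} + \frac{\sum_{i=1}^{n} y_i}{\lambda\theta^2} = 0 \implies \frac{\sum_{i=1}^{n} y_i}{\lambda\theta^2} = \frac{n}{\theta}.
\]
Thus, $\partial\ln L_\Omega/\partial\theta = 0$ gives
\[
\frac{-2n}{\theta} + \frac{\sum_{i=1}^{n} x_i}{\theta^2} + \frac{n}{\theta} = \frac{-n}{\theta} + \frac{\sum_{i=1}^{n} x_i}{\theta^2} = 0,
\]
or $\hat{\theta} = \bar{x}$. Then,
\[
\frac{-n}{\lambda} + \frac{n\bar{y}}{\lambda^2\bar{x}} = 0
\]
gives $\hat{\lambda} = \bar{y}/\bar{x}$.

(c) Let $\boldsymbol{\theta} = (\theta, \lambda)$ denote the set of unknown parameters. From part (b),
\[
S(\boldsymbol{\theta}) = \left[\frac{-2n}{\theta} + \frac{n\bar{x}}{\theta^2} + \frac{n\bar{y}}{\lambda\theta^2},\ \frac{-n}{\lambda} + \frac{n\bar{y}}{\lambda^2\theta}\right].
\]
Under $H_0: \lambda = 1$, the restricted log likelihood is
\[
\ln L_\omega = -2n\ln\theta - \frac{n(\bar{x}+\bar{y})}{\theta};
\]
so,
\[
\frac{\partial\ln L_\omega}{\partial\theta} = \frac{-2n}{\theta} + \frac{n(\bar{x}+\bar{y})}{\theta^2} = 0 \quad \text{gives} \quad \hat{\theta}_\omega = \frac{(\bar{x}+\bar{y})}{2}.
\]
Thus, $\hat{\boldsymbol{\theta}}_\omega = \left[(\bar{x}+\bar{y})/2,\ 1\right]$. Now,
\[
-\frac{2n}{\hat{\theta}_\omega} + \frac{n\bar{x}}{\hat{\theta}_\omega^2} + \frac{n\bar{y}}{(1)\hat{\theta}_\omega^2} = \frac{-4n}{(\bar{x}+\bar{y})} + \frac{4n\bar{x}}{(\bar{x}+\bar{y})^2} + \frac{4n\bar{y}}{(\bar{x}+\bar{y})^2} = 0,
\]
and
\[
\frac{-n}{(1)} + \frac{2n\bar{y}}{(1)^2(\bar{x}+\bar{y})} = \frac{-n(\bar{x}+\bar{y}) + 2n\bar{y}}{(\bar{x}+\bar{y})} = \frac{n(\bar{y}-\bar{x})}{(\bar{x}+\bar{y})},
\]
so that
\[
S(\hat{\boldsymbol{\theta}}_\omega) = \left[0,\ \frac{n(\bar{y}-\bar{x})}{(\bar{x}+\bar{y})}\right].
\]
Finally, we need $I^{-1}(\hat{\boldsymbol{\theta}}_\omega)$. Now,
\[
\frac{\partial^2\ln L_\Omega}{\partial\theta^2} = \frac{2n}{\theta^2} - \frac{2n\bar{x}}{\theta^3} - \frac{2n\bar{y}}{\lambda\theta^3},
\]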
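so that
\[
-E\left(\frac{\partial^2\ln L_\Omega}{\partial\theta^2}\right) = \frac{-2n}{\theta^2} + \frac{2n\theta}{\theta^3} + \frac{2n\lambda\theta}{\lambda\theta^3} = \frac{2n}{\theta^2}.
\]
And,
\[
\frac{\partial^2\ln L_\Omega}{\partial\theta\,\partial\lambda} = \frac{-n\bar{y}}{\lambda^2\theta^2},
\]
so that
\[
-E\left(\frac{\partial^2\ln L_\Omega}{\partial\theta\,\partial\lambda}\right) = \frac{n\lambda\theta}{\lambda^2\theta^2} = \frac{n}{\lambda\theta}.
\]
Also,
\[
\frac{\partial^2\ln L_\Omega}{\partial\lambda^2} = \frac{n}{\lambda^2} - \frac{2n\bar{y}}{\lambda^3\theta},
\]
so that
\[
-E\left(\frac{\partial^2\ln L_\Omega}{\partial\lambda^2}\right) = \frac{-n}{\lambda^2} + \frac{2n\lambda\theta}{\lambda^3\theta} = \frac{n}{\lambda^2}.
\]
So,
\[
I(\boldsymbol{\theta}) = \begin{bmatrix} \dfrac{2n}{\theta^2} & \dfrac{n}{\lambda\theta} \\[2ex] \dfrac{n}{\lambda\theta} & \dfrac{n}{\lambda^2} \end{bmatrix},
\]
and hence
\[
I^{-1}(\boldsymbol{\theta}) = \begin{bmatrix} \dfrac{\theta^2}{n} & \dfrac{-\lambda\theta}{n} \\[2ex] \dfrac{-\lambda\theta}{n} & \dfrac{2\lambda^2}{n} \end{bmatrix}.
\]
So,
\[
I^{-1}(\hat{\boldsymbol{\theta}}_\omega) = \begin{bmatrix} \dfrac{(\bar{x}+\bar{y})^2}{4n} & \dfrac{-(\bar{x}+\bar{y})}{2n} \\[2ex] \dfrac{-(\bar{x}+\bar{y})}{2n} & \dfrac{2}{n} \end{bmatrix}.
\]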
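Finally,
\[
\hat{S} = \left[0,\ \frac{n(\bar{y}-\bar{x})}{(\bar{x}+\bar{y})}\right]
\begin{bmatrix} \dfrac{(\bar{x}+\bar{y})^2}{4n} & \dfrac{-(\bar{x}+\bar{y})}{2n} \\[2ex] \dfrac{-(\bar{x}+\bar{y})}{2n} & \dfrac{2}{n} \end{bmatrix}
\begin{bmatrix} 0 \\[1ex] \dfrac{n(\bar{y}-\bar{x})}{(\bar{x}+\bar{y})} \end{bmatrix}
= \frac{2n(\bar{y}-\bar{x})^2}{(\bar{x}+\bar{y})^2} = \left[\frac{(\bar{y}-\bar{x})}{\sqrt{\hat{V}_0(\bar{y}-\bar{x})}}\right]^2,
\]
since
\[
V_0(\bar{Y} - \bar{X}) = \frac{\theta^2}{n} + \frac{[(1)\theta]^2}{n} = \frac{2\theta^2}{n} \quad \text{and} \quad \hat{\theta}_\omega = \frac{(\bar{x}+\bar{y})}{2},
\]
so that
\[
\hat{V}_0(\bar{Y} - \bar{X}) = \frac{2\hat{\theta}_\omega^2}{n} = \frac{(\bar{x}+\bar{y})^2}{2n}.
\]
For $n = 50$, $\bar{x} = 30$, and $\bar{y} = 40$,
\[
\hat{S} = \frac{2(50)(40-30)^2}{(30+40)^2} = 2.04.
\]
So, P-value $= \mathrm{pr}\left(\chi^2_1 > 2.04 \mid H_0: \lambda = 1\right) \doteq 0.15$. So, there is not sufficient evidence with these data to reject $H_0: \lambda = 1$.

Solution 5.17. (a) The marginal cumulative distribution function (CDF) of $X$, $F_X(x)$, is given by
\[
\begin{aligned}
F_X(x) &= E_\lambda[F_X(x \mid \lambda)] = \int_0^\infty F_X(x \mid \lambda)\,\pi(\lambda)\,d\lambda
= \int_0^\infty\left(1 - e^{-\lambda x}\right)\beta e^{-\beta\lambda}\,d\lambda \\
&= 1 - \int_0^\infty\beta e^{-(x+\beta)\lambda}\,d\lambda
= 1 - \frac{\beta}{(x+\beta)}\int_0^\infty(x+\beta)e^{-(x+\beta)\lambda}\,d\lambda
= 1 - \left(1 + \frac{x}{\beta}\right)^{-1}, \quad x > 0.
\end{aligned}
\]
Thus,
\[
f_X(x) = \frac{1}{\beta}\left(1 + \frac{x}{\beta}\right)^{-2}, \quad x > 0,\ \beta > 0,
\]
which is a generalized Pareto distribution with scale parameter equal to $\beta$, shape parameter equal to 1, and location parameter equal to 0.

(b) Now,
\[
\pi(\lambda \mid X = x) = \frac{f_{X,\lambda}(x, \lambda)}{f_X(x)} = \frac{f_X(x \mid \lambda)\,\pi(\lambda)}{f_X(x)}
= \frac{\lambda\beta e^{-(x+\beta)\lambda}}{\dfrac{1}{\beta}\left(1 + \dfrac{x}{\beta}\right)^{-2}}
= \lambda\beta^2\left(1 + \frac{x}{\beta}\right)^2 e^{-(x+\beta)\lambda} = \lambda(x+\beta)^2 e^{-(x+\beta)\lambda}, \quad \lambda > 0.
\]
Thus, the posterior distribution for $\lambda$ is $\text{GAMMA}[(x+\beta)^{-1}, 2]$. Since $\pi(\lambda)$ is $\text{GAMMA}(\beta^{-1}, 1)$, the prior and posterior distributions belong to the same distributional family, and hence $\pi(\lambda)$ is known as a conjugate prior.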
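(c) For a given value of $\lambda$ ($\lambda^*$, say), $\mathrm{pr}(\lambda < \lambda^*) = 1 - e^{-\beta\lambda^*}$ based on the prior distribution $\pi(\lambda)$. And, given an observed value $x$ of $X$,
\[
\mathrm{pr}(\lambda < \lambda^* \mid X = x) = \int_0^{\lambda^*}\pi(\lambda \mid X = x)\,d\lambda = \int_0^{\lambda^*}\lambda(x+\beta)^2 e^{-(x+\beta)\lambda}\,d\lambda.
\]
Using integration by parts with $u = \lambda$ and $dv = (x+\beta)^2 e^{-(x+\beta)\lambda}\,d\lambda$, we have
\[
\begin{aligned}
\mathrm{pr}(\lambda < \lambda^* \mid X = x) &= \left[-\lambda(x+\beta)e^{-(x+\beta)\lambda}\right]_0^{\lambda^*} + \int_0^{\lambda^*}(x+\beta)e^{-(x+\beta)\lambda}\,d\lambda \\
&= -\lambda^*(x+\beta)e^{-(x+\beta)\lambda^*} + 1 - e^{-(x+\beta)\lambda^*}
= 1 - \left[\lambda^*(x+\beta) + 1\right]e^{-(x+\beta)\lambda^*}.
\end{aligned}
\]
With $\lambda^* = 1$, $\beta = 1$, and $x = 3$, $\mathrm{pr}(H_1) = 1 - e^{-1} = 0.6321$, $\mathrm{pr}(H_0) = 1 - \mathrm{pr}(H_1) = 0.3679$, $\mathrm{pr}(H_1 \mid X = x) = 1 - 5e^{-4} = 0.9084$, and $\mathrm{pr}(H_0 \mid X = x) = 1 - \mathrm{pr}(H_1 \mid X = x) = 0.0916$. Thus,
\[
BF_{10} = \frac{\mathrm{pr}(H_1 \mid X = x)\,\mathrm{pr}(H_0)}{\mathrm{pr}(H_0 \mid X = x)\,\mathrm{pr}(H_1)} = \frac{(0.9084)(0.3679)}{(0.0916)(0.6321)} = 5.77.
\]
Hence, observing a survival time of $x = 3$ years yields "positive," but not "strong," evidence in favor of $H_1$.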
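The Bayes factor in Solution 5.17(c) is the ratio of posterior odds to prior odds, and both probabilities have the closed forms derived above. The following sketch evaluates them for $\lambda^* = 1$, $\beta = 1$, and $x = 3$; the function names are illustrative, and $H_1$ is taken to be the event $\lambda < \lambda^*$, as the probabilities in the solution imply.

```python
from math import exp

def prior_prob_h1(lam_star, beta):
    """pr(lambda < lam_star) under the exponential prior with rate beta."""
    return 1 - exp(-beta * lam_star)

def posterior_prob_h1(lam_star, beta, x):
    """pr(lambda < lam_star | X = x) under the GAMMA[(x + beta)^(-1), 2] posterior."""
    rate = x + beta
    return 1 - (lam_star * rate + 1) * exp(-rate * lam_star)

lam_star, beta, x = 1.0, 1.0, 3.0
prior_h1 = prior_prob_h1(lam_star, beta)
post_h1 = posterior_prob_h1(lam_star, beta, x)

# Bayes factor = posterior odds of H1 divided by prior odds of H1.
bf10 = (post_h1 / (1 - post_h1)) / (prior_h1 / (1 - prior_h1))

print(round(prior_h1, 4), round(post_h1, 4), round(bf10, 2))
```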
Solution 5.18*. (a) With $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$, the likelihood function has the structure
\[
L(\mathbf{x}, \mathbf{y}; \theta, \phi_1, \phi_2, \ldots, \phi_n) = \prod_{i=1}^{n} f_{X_i}(x_i)\, f_{Y_i}(y_i)
= \prod_{i=1}^{n}\left\{(\theta\phi_i)^{-1} e^{-x_i/\theta\phi_i}\cdot\phi_i^{-1} e^{-y_i/\phi_i}\right\}
= \theta^{-n}\left(\prod_{i=1}^{n}\phi_i\right)^{-2} e^{-\sum_{i=1}^{n}\left(\frac{x_i}{\theta\phi_i} + \frac{y_i}{\phi_i}\right)},
\]
$0 < x_i < \infty$, $0 < y_i < \infty$, $i = 1, 2, \ldots, n$. Using this particular likelihood would entail the estimation of $(n+1)$ parameters, namely, $\theta, \phi_1, \phi_2, \ldots, \phi_n$. Note that there are only $2n$ data points, so that the number of parameters to be estimated is more than half the number of data values; this type of situation often leads to unreliable statistical inferences.

(b) We need to use the method of transformations. We know that
\[
f_{X_i, Y_i}(x_i, y_i) = f_{X_i}(x_i)\cdot f_{Y_i}(y_i) = \frac{1}{\theta\phi_i} e^{-x_i/\theta\phi_i}\cdot\frac{1}{\phi_i} e^{-y_i/\phi_i}
= \frac{1}{\theta\phi_i^2}\, e^{-\left(\frac{x_i}{\theta\phi_i} + \frac{y_i}{\phi_i}\right)}, \quad 0 < x_i < +\infty,\ 0 < y_i < +\infty.
\]
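Let $R_i = X_i/Y_i$ and $S_i = Y_i$, so that $X_i = R_i S_i$ and $Y_i = S_i$. Clearly, $0 < R_i < +\infty$ and $0 < S_i < +\infty$. And,
\[
J = \begin{vmatrix} \dfrac{\partial X_i}{\partial R_i} & \dfrac{\partial X_i}{\partial S_i} \\[2ex] \dfrac{\partial Y_i}{\partial R_i} & \dfrac{\partial Y_i}{\partial S_i} \end{vmatrix}
= \begin{vmatrix} S_i & R_i \\ 0 & 1 \end{vmatrix} = S_i = |J|.
\]
So,
\[
f_{R_i, S_i}(r_i, s_i) = \frac{1}{\theta\phi_i^2}\, e^{-\left(\frac{r_i s_i}{\theta\phi_i} + \frac{s_i}{\phi_i}\right)}(s_i), \quad 0 < r_i < +\infty,\ 0 < s_i < +\infty.
\]
Finally,
\[
\begin{aligned}
f_{R_i}(r_i) &= \frac{1}{\theta\phi_i^2}\int_0^\infty s_i\, e^{-\frac{s_i}{\phi_i}\left(\frac{r_i}{\theta} + 1\right)}\,ds_i
= \frac{1}{\theta\phi_i^2}\int_0^\infty s_i\, e^{-s_i\big/\left[\phi_i/\left(\frac{r_i}{\theta} + 1\right)\right]}\,ds_i \\
&= \frac{1}{\theta\phi_i^2}\left[\frac{\phi_i}{\left(\frac{r_i}{\theta} + 1\right)}\right]^2 = \frac{\theta}{(\theta + r_i)^2}, \quad 0 < r_i < +\infty.
\end{aligned}
\]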
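(c) With $\mathbf{r} = (r_1, r_2, \ldots, r_n)$, we have
\[
L(\mathbf{r}; \theta) \equiv L = \prod_{i=1}^{n}\left\{\frac{\theta}{(\theta + r_i)^2}\right\} = \theta^n\prod_{i=1}^{n}(\theta + r_i)^{-2}.
\]
So,
\[
\ln L = n\ln\theta - 2\sum_{i=1}^{n}\ln(\theta + r_i), \quad
\frac{\partial\ln L}{\partial\theta} = \frac{n}{\theta} - 2\sum_{i=1}^{n}(\theta + r_i)^{-1}, \quad
\frac{\partial^2\ln L}{\partial\theta^2} = \frac{-n}{\theta^2} + 2\sum_{i=1}^{n}(\theta + r_i)^{-2}.
\]
And,
\[
E[(\theta + r_i)^{-2}] = \int_0^\infty(\theta + r_i)^{-2}\frac{\theta}{(\theta + r_i)^2}\,dr_i = \int_0^\infty\frac{\theta}{(\theta + r_i)^4}\,dr_i
= \theta\left[\frac{-(\theta + r_i)^{-3}}{3}\right]_0^\infty = \frac{1}{3\theta^2}.
\]
So,
\[
-E\left(\frac{\partial^2\ln L}{\partial\theta^2}\right) = \frac{n}{\theta^2} - 2\sum_{i=1}^{n}\frac{1}{3\theta^2} = \frac{n}{\theta^2} - \frac{2n}{3\theta^2} = \frac{n}{3\theta^2}.
\]
Hence, if $\hat{\theta}$ is the MLE of $\theta$, then
\[
\frac{\hat{\theta} - \theta}{\sqrt{3\theta^2/n}} \overset{\cdot}{\sim} N(0,1)
\]
for large $n$. To test $H_0: \theta = 1$ versus $H_1: \theta > 1$, we would reject $H_0$ if $(\hat{\theta} - 1)/\sqrt{3/n} > 1.96$ for a size $\alpha = 0.025$ test; note that this is a score test. So, when $\theta = 1.50$,
\[
\begin{aligned}
\text{POWER} &= \mathrm{pr}\left\{\frac{\hat{\theta} - 1}{\sqrt{3/n}} > 1.96 \mid \theta = 1.50\right\}
= \mathrm{pr}\left\{\frac{\hat{\theta} - 1.50}{\sqrt{3(1.50)^2/n}} > \frac{(1 + 1.96\sqrt{3/n} - 1.50)}{\sqrt{3(1.50)^2/n}}\right\} \\
&\approx \mathrm{pr}\left\{Z > \frac{1.96}{1.50} - \frac{\sqrt{n}}{3\sqrt{3}}\right\},
\end{aligned}
\]
where $Z \sim N(0,1)$. So, we should choose $n^*$ as the smallest positive integer value of $n$ such that
\[
\frac{1.96}{1.50} - \frac{\sqrt{n}}{3\sqrt{3}} \le -0.84,
\]
or, equivalently,
\[
-\sqrt{n} \le -3\sqrt{3}\left(\frac{1.96}{1.50} + 0.84\right) = -11.1546 \implies n^* = 125.
\]

Solution 5.19*. (a) First, we know that $S_i \sim \text{POI}[\phi_i(\lambda_1 + \lambda_0)]$. So, for $i = 1, 2, \ldots, n$,
\[
\begin{aligned}
p_{Y_{i1}}(y_{i1} \mid S_i = s_i) &= \frac{\mathrm{pr}[(Y_{i1} = y_{i1}) \cap (S_i = s_i)]}{\mathrm{pr}(S_i = s_i)}
= \frac{\mathrm{pr}[(Y_{i1} = y_{i1}) \cap (Y_{i0} = s_i - y_{i1})]}{\mathrm{pr}(S_i = s_i)} \\
&= \frac{\left[\dfrac{(\phi_i\lambda_1)^{y_{i1}} e^{-\phi_i\lambda_1}}{y_{i1}!}\right]\left[\dfrac{(\phi_i\lambda_0)^{(s_i - y_{i1})} e^{-\phi_i\lambda_0}}{(s_i - y_{i1})!}\right]}{\left\{\dfrac{[\phi_i(\lambda_1 + \lambda_0)]^{s_i} e^{-\phi_i(\lambda_1 + \lambda_0)}}{s_i!}\right\}}
= C^{s_i}_{y_{i1}}\left(\frac{\lambda_1}{\lambda_1 + \lambda_0}\right)^{y_{i1}}\left(\frac{\lambda_0}{\lambda_1 + \lambda_0}\right)^{s_i - y_{i1}},
\end{aligned}
\]
$y_{i1} = 0, 1, \ldots, s_i$. So, the conditional distribution of $Y_{i1}$ given $S_i = s_i$ is $\text{BIN}[s_i, \lambda_1/(\lambda_1 + \lambda_0)]$.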
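(b) Based on the result found in part (a), an appropriate (conditional) likelihood function is
\[
L_c = \prod_{i=1}^{n} C^{s_i}_{y_{i1}}\theta^{y_{i1}}(1-\theta)^{s_i - y_{i1}},
\]
where $\theta = \lambda_1/(\lambda_1 + \lambda_0)$. Thus,
\[
\ln L_c \propto \ln\theta\sum_{i=1}^{n} y_{i1} + \ln(1-\theta)\left(\sum_{i=1}^{n} s_i - \sum_{i=1}^{n} y_{i1}\right),
\]
so that
\[
\frac{\partial\ln L_c}{\partial\theta} = \frac{\sum_{i=1}^{n} y_{i1}}{\theta} - \frac{\left(\sum_{i=1}^{n} s_i - \sum_{i=1}^{n} y_{i1}\right)}{(1-\theta)} = 0
\]
gives $\hat{\theta} = \sum_{i=1}^{n} y_{i1}\big/\sum_{i=1}^{n} s_i$. And, with $\mathbf{S} = (S_1, S_2, \ldots, S_n)$ and $\mathbf{s} = (s_1, s_2, \ldots, s_n)$,
\[
\frac{\partial^2\ln L_c}{\partial\theta^2} = \frac{-\sum_{i=1}^{n} y_{i1}}{\theta^2} - \frac{\left(\sum_{i=1}^{n} s_i - \sum_{i=1}^{n} y_{i1}\right)}{(1-\theta)^2}
\]
gives
\[
V(\hat{\theta} \mid \mathbf{S} = \mathbf{s}) = \left\{-E\left[\frac{\partial^2\ln L_c}{\partial\theta^2} \mid \mathbf{S} = \mathbf{s}\right]\right\}^{-1}
= \left[\frac{\theta\sum_{i=1}^{n} s_i}{\theta^2} + \frac{(1-\theta)\sum_{i=1}^{n} s_i}{(1-\theta)^2}\right]^{-1} = \frac{\theta(1-\theta)}{\sum_{i=1}^{n} s_i}.
\]
So, given $\mathbf{S} = \mathbf{s}$, under $H_0: \theta = 1/2$ (or, equivalently, $\lambda_1 = \lambda_0$) and for large $n$, it follows that
\[
U = \frac{\hat{\theta} - \tfrac{1}{2}}{\sqrt{(1/2)(1/2)\big/\left(\sum_{i=1}^{n} s_i\right)}} \overset{\cdot}{\sim} N(0,1);
\]
so, we would reject $H_0: \theta = 1/2$ (or, equivalently, $\lambda_1 = \lambda_0$) in favor of $H_1: \theta > 1/2$ (or, equivalently, $\lambda_1 > \lambda_0$) when the observed value $u$ of $U$ exceeds 1.645. Note that this is a score-type test statistic. When $n = 50$, $\sum_{i=1}^{n} s_i = 500$, and $\sum_{i=1}^{n} y_{i1} = 275$, then $\hat{\theta} = 275/500 = 0.55$, so that
\[
u = \frac{0.55 - 0.50}{\tfrac{1}{2}\sqrt{1/500}} = 2.236;
\]
so, these data provide strong evidence (P-value $= 0.0127$) for rejecting $H_0: \theta = 1/2$ in favor of $H_1: \theta > 1/2$. Another advantage of this conditional inference procedure is that its use avoids the need to estimate the parameters $\lambda_1$ and $\lambda_0$ separately.

Solution 5.20*. (a) With $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$,
\[
L \equiv L(\mathbf{x}, \mathbf{y}; \lambda_x, \lambda_y) = \prod_{i=1}^{n}\left\{\frac{\lambda_x^{x_i} e^{-\lambda_x}}{x_i!}\cdot\frac{\lambda_y^{y_i} e^{-\lambda_y}}{y_i!}\right\}
= \frac{\lambda_x^{n\bar{x}} e^{-n\lambda_x}}{\prod_{i=1}^{n} x_i!}\cdot\frac{\lambda_y^{n\bar{y}} e^{-n\lambda_y}}{\prod_{i=1}^{n} y_i!}.
\]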
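So, $\ln L \propto n\bar{x}\ln\lambda_x + n\bar{y}\ln\lambda_y - n(\lambda_x + \lambda_y)$. Thus,
\[
\frac{\partial\ln L}{\partial\lambda_x} = \frac{n\bar{x}}{\lambda_x} - n, \quad \frac{\partial\ln L}{\partial\lambda_y} = \frac{n\bar{y}}{\lambda_y} - n,
\]
\[
\frac{\partial^2\ln L}{\partial\lambda_x^2} = \frac{-n\bar{x}}{\lambda_x^2}, \quad \frac{\partial^2\ln L}{\partial\lambda_y^2} = \frac{-n\bar{y}}{\lambda_y^2},
\]
and
\[
\frac{\partial^2\ln L}{\partial\lambda_x\partial\lambda_y} = \frac{\partial^2\ln L}{\partial\lambda_y\partial\lambda_x} = 0.
\]
Hence,
\[
-E\left(\frac{\partial^2\ln L}{\partial\lambda_x^2}\right) = \frac{n\lambda_x}{\lambda_x^2} = \frac{n}{\lambda_x},
\quad \text{and} \quad
-E\left(\frac{\partial^2\ln L}{\partial\lambda_y^2}\right) = \frac{n\lambda_y}{\lambda_y^2} = \frac{n}{\lambda_y}.
\]
So,
\[
I(\lambda_x, \lambda_y) = \begin{bmatrix} \dfrac{n}{\lambda_x} & 0 \\[1.5ex] 0 & \dfrac{n}{\lambda_y} \end{bmatrix}
\quad \text{and} \quad
I^{-1}(\lambda_x, \lambda_y) = \begin{bmatrix} \dfrac{\lambda_x}{n} & 0 \\[1.5ex] 0 & \dfrac{\lambda_y}{n} \end{bmatrix}.
\]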
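Under $H_0: \lambda_x = \lambda_y\ (= \lambda, \text{say})$, $\ln L_\omega \propto n\bar{x}\ln\lambda + n\bar{y}\ln\lambda - 2n\lambda$. Solving
\[
\frac{\partial\ln L_\omega}{\partial\lambda} = \frac{n\bar{x}}{\lambda} + \frac{n\bar{y}}{\lambda} - 2n = 0
\]
gives $\hat{\lambda} = (\bar{x} + \bar{y})/2$. So,
\[
\hat{S} = \begin{bmatrix} \dfrac{n\bar{x}}{\hat{\lambda}} - n \\[1.5ex] \dfrac{n\bar{y}}{\hat{\lambda}} - n \end{bmatrix}'
\begin{bmatrix} \hat{\lambda}/n & 0 \\ 0 & \hat{\lambda}/n \end{bmatrix}
\begin{bmatrix} \dfrac{n\bar{x}}{\hat{\lambda}} - n \\[1.5ex] \dfrac{n\bar{y}}{\hat{\lambda}} - n \end{bmatrix}.
\]
Since $\hat{\lambda} = (8.00 + 9.00)/2 = 8.500$ and $n = 30$,
\[
\hat{S} = \begin{bmatrix} \dfrac{30(8)}{8.5} - 30 \\[1.5ex] \dfrac{30(9)}{8.5} - 30 \end{bmatrix}'
\begin{bmatrix} \dfrac{8.5}{30} & 0 \\[1.5ex] 0 & \dfrac{8.5}{30} \end{bmatrix}
\begin{bmatrix} \dfrac{30(8)}{8.5} - 30 \\[1.5ex] \dfrac{30(9)}{8.5} - 30 \end{bmatrix} = 1.7645.
\]
Since the P-value $= \mathrm{pr}(\chi^2_1 > 1.7645) \ge 0.15$, we would not reject $H_0$ at any conventional $\alpha$-level.

(b) Now, since $(X_1 + Y_1) \sim \text{POI}(\lambda_x + \lambda_y)$, we have
\[
\begin{aligned}
p_{X_1}(x_1 \mid X_1 + Y_1 = s_1) &= \mathrm{pr}(X_1 = x_1 \mid X_1 + Y_1 = s_1)
= \frac{\mathrm{pr}[(X_1 = x_1) \cap (X_1 + Y_1 = s_1)]}{\mathrm{pr}[X_1 + Y_1 = s_1]} \\
&= \frac{\mathrm{pr}(X_1 = x_1)\,\mathrm{pr}(Y_1 = s_1 - x_1)}{\mathrm{pr}(X_1 + Y_1 = s_1)}
= \frac{\left[\dfrac{\lambda_x^{x_1} e^{-\lambda_x}}{x_1!}\right]\left[\dfrac{\lambda_y^{s_1 - x_1} e^{-\lambda_y}}{(s_1 - x_1)!}\right]}{\left[\dfrac{(\lambda_x + \lambda_y)^{s_1} e^{-(\lambda_x + \lambda_y)}}{s_1!}\right]} \\
&= \frac{s_1!}{x_1!(s_1 - x_1)!}\left(\frac{\lambda_x}{\lambda_x + \lambda_y}\right)^{x_1}\left(1 - \frac{\lambda_x}{\lambda_x + \lambda_y}\right)^{s_1 - x_1}
= C^{s_1}_{x_1}\pi^{x_1}(1 - \pi)^{s_1 - x_1}
\end{aligned}
\]
for $x_1 = 0, 1, \ldots, s_1$ and $\pi = \lambda_x/(\lambda_x + \lambda_y)$. So, given $(X_1 + Y_1) = s_1$,
\[
X_1 \sim \text{BIN}\left[n = s_1,\ \pi = \frac{\lambda_x}{(\lambda_x + \lambda_y)}\right].
\]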
If $\delta = 0.60$, then $H_0: \lambda_y = 0.60\lambda_x$ is equivalent to testing
\[
H_0: \pi = \frac{\lambda_x}{\lambda_x + 0.60\lambda_x} = \frac{1}{1.60} = 0.625,
\]
and $H_1: \lambda_y > 0.60\lambda_x$ is equivalent to testing
\[
H_1: \pi < \frac{\lambda_x}{\lambda_x + 0.60\lambda_x} = 0.625.
\]
So, for the given data, the exact P-value is
\[
\text{P-value} = \mathrm{pr}(X_1 \le 4 \mid S_1 = 14, \pi = 0.625) = \sum_{x_1 = 0}^{4} C^{14}_{x_1}(0.625)^{x_1}(0.375)^{14 - x_1} \doteq 0.0084.
\]
So, given the observed values of $x_1 = 4$ and $y_1 = 10$, one would reject $H_0: \lambda_y = 0.60\lambda_x$ in favor of $H_1: \lambda_y > 0.60\lambda_x$ using this conditional test.

Solution 5.21*. (a) From standard order statistics theory, it follows directly that
\[
f_{X_{(1)}}(x_{(1)}) = n[1 - x_{(1)} + \theta]^{n-1}, \quad 0.50 \le \theta < x_{(1)} < (\theta + 1) < +\infty,
\]
that
\[
f_{X_{(n)}}(x_{(n)}) = n[x_{(n)} - \theta]^{n-1}, \quad 0.50 \le \theta < x_{(n)} < (\theta + 1) < +\infty,
\]
and that
\[
f_{X_{(1)}, X_{(n)}}(x_{(1)}, x_{(n)}) = n(n-1)[x_{(n)} - x_{(1)}]^{n-2}, \quad 0.50 \le \theta < x_{(1)} < x_{(n)} < (\theta + 1) < +\infty.
\]
Now, $\mathrm{pr}(B \mid H_0: \theta = 1) = \mathrm{pr}(X_{(n)} > 2 \mid H_0: \theta = 1) = 0$ since $1 < X_{(n)} < 2$ when $\theta = 1$. Thus, it follows that the probability of a Type I error is equal to
\[
\mathrm{pr}(A \mid H_0: \theta = 1) = \mathrm{pr}(X_{(1)} > k \mid \theta = 1) = \int_k^2 n(2 - x_{(1)})^{n-1}\,dx_{(1)} = \left[-(2 - x_{(1)})^n\right]_k^2 = (2 - k)^n;
\]
thus, solving the equation $(2 - k)^n = \alpha$ gives
\[
k_\alpha = 2 - \alpha^{1/n}, \quad 0 < \alpha \le 0.10.
\]
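(b) First, consider the power for values of $\theta$ satisfying $\theta > k_\alpha > 1$. In this situation, $X_{(1)} > k_\alpha$, so that
\[
\mathrm{pr}(A \mid \theta > k_\alpha) = \mathrm{pr}(X_{(1)} > k_\alpha \mid \theta > k_\alpha) = 1,
\]
so that the power is 1 for $\theta > k_\alpha$. For values of $\theta$ satisfying $1 < \theta \leq k_\alpha$,
\[
\mathrm{POWER} = \mathrm{pr}(A \mid 1 < \theta \leq k_\alpha) + \mathrm{pr}(B \mid 1 < \theta \leq k_\alpha) - \mathrm{pr}(A \cap B \mid 1 < \theta \leq k_\alpha).
\]
Now, with $k_\alpha = 2 - \alpha^{1/n}$,
\[
\mathrm{pr}(A \mid 1 < \theta \leq k_\alpha) = \int_{k_\alpha}^{\theta+1} n[1 - x_{(1)} + \theta]^{n-1}\, dx_{(1)} = (1 - k_\alpha + \theta)^n = \left(\theta - 1 + \alpha^{1/n}\right)^n.
\]
And,
\[
\mathrm{pr}(B \mid 1 < \theta \leq k_\alpha) = \int_2^{\theta+1} n[x_{(n)} - \theta]^{n-1}\, dx_{(n)} = 1 - (2 - \theta)^n.
\]
Finally,
\[
\begin{aligned}
\mathrm{pr}(A \cap B \mid 1 < \theta \leq k_\alpha)
&= \int_2^{\theta+1} \int_{k_\alpha}^{x_{(n)}} n(n-1)(x_{(n)} - x_{(1)})^{n-2}\, dx_{(1)}\, dx_{(n)} \\
&= \int_2^{\theta+1} \left[-n(x_{(n)} - x_{(1)})^{n-1}\right]_{k_\alpha}^{x_{(n)}} dx_{(n)} \\
&= \int_2^{\theta+1} n(x_{(n)} - k_\alpha)^{n-1}\, dx_{(n)} = \left[(x_{(n)} - k_\alpha)^n\right]_2^{\theta+1} \\
&= (\theta + 1 - k_\alpha)^n - (2 - k_\alpha)^n = \left(\theta - 1 + \alpha^{1/n}\right)^n - \alpha.
\end{aligned}
\]
So, for $1 < \theta \leq k_\alpha = 2 - \alpha^{1/n}$,
\[
\mathrm{POWER} = \left(\theta - 1 + \alpha^{1/n}\right)^n + \left[1 - (2 - \theta)^n\right] - \left[\left(\theta - 1 + \alpha^{1/n}\right)^n - \alpha\right] = 1 + \alpha - (2 - \theta)^n.
\]
As required, the above expression equals $\alpha$ when $\theta = 1$ and equals 1 when $\theta = k_\alpha = 2 - \alpha^{1/n}$.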
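The two boundary checks stated above can be confirmed with a few lines of code. This is a minimal sketch of the power function as derived in parts (a) and (b), valid for $\theta \geq 1$; the function names and the illustrative choices $\alpha = 0.05$ and $n = 5$ are assumptions made here, not values from the text.

```python
def critical_value(alpha, n):
    """k_alpha = 2 - alpha**(1/n), from solving (2 - k)**n = alpha."""
    return 2.0 - alpha ** (1.0 / n)


def power(theta, alpha, n):
    """Power of the test for theta >= 1, as derived in part (b)."""
    k_alpha = critical_value(alpha, n)
    if theta > k_alpha:
        return 1.0
    return 1.0 + alpha - (2.0 - theta) ** n


alpha, n = 0.05, 5                      # illustrative values only
k_alpha = critical_value(alpha, n)
print(power(1.0, alpha, n), power(k_alpha, alpha, n))
```

The power increases monotonically from $\alpha$ at $\theta = 1$ to 1 at $\theta = k_\alpha$, since $(2 - \theta)^n$ is decreasing in $\theta$ on that interval.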
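Solution 5.22∗

(a) Clearly, $E(\bar{Y}_1 - \bar{Y}_2) = E(\bar{Y}_1) - E(\bar{Y}_2) = (\mu_1 - \mu_2)$. Now,
\[
V(\bar{Y}_i) = V\left(\frac{1}{n}\sum_{j=1}^{n} Y_{ij}\right)
= \frac{1}{n^2}\left\{\sum_{j=1}^{n} V(Y_{ij}) + 2\sum_{\text{all } j < j'} \mathrm{cov}(Y_{ij}, Y_{ij'})\right\}
= \frac{\sigma^2}{n}\left[1 + (n-1)\rho\right],
\]
so that $V(\bar{Y}_1 - \bar{Y}_2) = 2\sigma^2[1 + (n-1)\rho]/n$.

(b) To test $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 > \mu_2$ at the $\alpha = 0.05$ level, we reject $H_0$ in favor of $H_1$ when
\[
\frac{(\bar{Y}_1 - \bar{Y}_2) - 0}{\sqrt{\dfrac{2\sigma^2[1 + (n-1)\rho]}{n}}} > 1.645.
\]

(c) If one incorrectly assumes that $\rho = 0$, one would use (under the stated assumptions) the test statistic
\[
\frac{(\bar{Y}_1 - \bar{Y}_2)}{\sqrt{2\sigma^2/n}},
\]
and reject $H_0: \mu_1 = \mu_2$ in favor of $H_1: \mu_1 > \mu_2$ when
\[
\frac{(\bar{Y}_1 - \bar{Y}_2)}{\sqrt{\dfrac{2\sigma^2}{n}}} > 1.645.
\]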
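The cost of applying this rule when $\rho$ is actually positive can be checked numerically. The sketch below assumes the normal model of this solution and rescales the $\rho = 0$ statistic by the ratio of the assumed to the true standard error; $\sigma^2$ cancels from that ratio, so it does not appear as an argument. The function names are illustrative only.

```python
import math


def normal_upper_tail(z):
    """pr(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def actual_type1_error(n, rho, z_crit=1.645):
    """True rejection rate under H0 when the rho = 0 statistic is used."""
    # The naive statistic equals a standard normal times sqrt(1 + (n - 1) * rho).
    inflation = math.sqrt(1.0 + (n - 1) * rho)
    return normal_upper_tail(z_crit / inflation)


print(round(actual_type1_error(10, 0.50), 4))   # clustered responses
print(round(actual_type1_error(10, 0.00), 4))   # independent responses
```

With $\rho = 0$ the function returns the nominal level, which gives a quick sanity check on the rescaling.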
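Thus, the actual Type I error rate using this incorrect testing procedure (when $n = 10$, $\sigma^2 = 2$, and $\rho = 0.50$) is:
\[
\begin{aligned}
&\mathrm{pr}\left[\frac{(\bar{Y}_1 - \bar{Y}_2) - 0}{\sqrt{\dfrac{2\sigma^2}{n}}} > 1.645 \,\middle|\, n = 10, \sigma^2 = 2, \rho = 0.50\right] \\
&\quad = \mathrm{pr}\left[\frac{(\bar{Y}_1 - \bar{Y}_2) - 0}{\sqrt{\dfrac{2\sigma^2[1 + (n-1)\rho]}{n}}} > \frac{1.645\sqrt{2\sigma^2/n}}{\sqrt{\dfrac{2\sigma^2[1 + (n-1)\rho]}{n}}} \,\middle|\, n = 10, \sigma^2 = 2, \rho = 0.50\right] \\
&\quad = \mathrm{pr}(Z > 0.7014) \doteq 0.24.
\end{aligned}
\]
This simple example illustrates that ignoring positive "intra-cluster" (in our case, intra-neighborhood) response correlation can lead to inflated Type I error rates, and more generally, to invalid statistical inferences.

Solution 5.23∗

(a) Given $X_1 = x_1$, where $x_1$ is a fixed constant, $X_2 = \theta x_1 + \epsilon_2 \sim N(\theta x_1, \sigma^2)$.

(b)
\[
f_{X_1, X_2}(x_1, x_2) = f_{X_1}(x_1)\, f_{X_2}(x_2 \mid X_1 = x_1)
= \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x_1^2/2\sigma^2} \cdot \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x_2 - \theta x_1)^2/2\sigma^2},
\]
$-\infty < x_1 < \infty$, $-\infty < x_2 < \infty$.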
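(c) For $i = 2, 3, \ldots, n$, since $X_i = \theta X_{i-1} + \epsilon_i$, it follows from part (a) that
\[
f_{X_i}(x_i \mid X_j = x_j, j = 1, 2, \ldots, i-1) = f_{X_i}(x_i \mid X_{i-1} = x_{i-1}),
\]
where the conditional density of $X_i$ given $X_{i-1} = x_{i-1}$ is $N(\theta x_{i-1}, \sigma^2)$. So,
\[
\begin{aligned}
f_* &= f_{X_1}(x_1) \prod_{i=2}^{n} f_{X_i}(x_i \mid X_{i-1} = x_{i-1}) \\
&= \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x_1^2/2\sigma^2} \cdot \prod_{i=2}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x_i - \theta x_{i-1})^2/2\sigma^2} \\
&= (2\pi)^{-n/2}(\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2}\left[x_1^2 + \sum_{i=2}^{n}(x_i - \theta x_{i-1})^2\right]\right\},
\end{aligned}
\]
$-\infty < x_i < \infty$, $i = 1, 2, \ldots, n$. So,
\[
\ln f_* = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\left[x_1^2 + \sum_{i=2}^{n}(x_i - \theta x_{i-1})^2\right].
\]
So, in the unrestricted parameter space $\Omega$,
\[
\frac{\partial \ln f_*}{\partial \theta} = \frac{1}{\sigma^2}\sum_{i=2}^{n} x_{i-1}(x_i - \theta x_{i-1}) = 0
\ \Rightarrow\ \hat{\theta}_\Omega = \frac{\sum_{i=2}^{n} x_{i-1} x_i}{\sum_{i=1}^{n-1} x_i^2}.
\]
And,
\[
\frac{\partial \ln f_*}{\partial(\sigma^2)} = \frac{-n}{2\sigma^2} + \frac{1}{2\sigma^4}\left[x_1^2 + \sum_{i=2}^{n}(x_i - \theta x_{i-1})^2\right] = 0
\]
\[
\Rightarrow\ \hat{\sigma}^2_\Omega = \frac{1}{n}\left[x_1^2 + \sum_{i=2}^{n}(x_i - \hat{\theta}_\Omega x_{i-1})^2\right] = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\theta}_\Omega x_{i-1})^2 \quad\text{since } x_0 \equiv 0.
\]
So,
\[
\hat{L}_\Omega = f_*\big|_{\theta = \hat{\theta}_\Omega,\, \sigma^2 = \hat{\sigma}^2_\Omega} = (2\pi)^{-n/2}\left(\hat{\sigma}^2_\Omega\right)^{-n/2} e^{-n/2}.
\]
And, in the restricted parameter space $\omega$ (i.e., where $\theta = 0$),
\[
\ln f_*\big|_{\theta=0} = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2,
\]
\[
\frac{\partial \ln f_*}{\partial(\sigma^2)} = \frac{-n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n} x_i^2 = 0
\ \Rightarrow\ \hat{\sigma}^2_\omega = \frac{\sum_{i=1}^{n} x_i^2}{n},
\]
so that
\[
\hat{L}_\omega = f_*\big|_{\theta=0,\, \sigma^2 = \hat{\sigma}^2_\omega} = (2\pi)^{-n/2}\left(\hat{\sigma}^2_\omega\right)^{-n/2} e^{-n/2}.
\]
Thus,
\[
\hat{\lambda} = \frac{\hat{L}_\omega}{\hat{L}_\Omega} = \left(\frac{\hat{\sigma}^2_\Omega}{\hat{\sigma}^2_\omega}\right)^{n/2}
\]
\[
\Rightarrow\ \hat{\lambda}^{2/n} = \frac{\hat{\sigma}^2_\Omega}{\hat{\sigma}^2_\omega}
= \frac{\sum_{i=1}^{n}(x_i - \hat{\theta}_\Omega x_{i-1})^2}{\sum_{i=1}^{n} x_i^2}
= \frac{\sum_{i=1}^{n} x_i^2 - 2\hat{\theta}_\Omega\sum_{i=1}^{n} x_{i-1}x_i + \hat{\theta}^2_\Omega\sum_{i=1}^{n} x_{i-1}^2}{\sum_{i=1}^{n} x_i^2}.
\]
Note that $\sum_{i=1}^{n} x_{i-1}x_i = \sum_{i=2}^{n} x_{i-1}x_i$ and $\sum_{i=1}^{n} x_{i-1}^2 = \sum_{i=1}^{n-1} x_i^2$ since $x_0 \equiv 0$. Thus, we have
\[
\hat{\lambda}^{2/n} = 1 - \frac{1}{\sum_{i=1}^{n} x_i^2}\left\{\frac{2\left(\sum_{i=2}^{n} x_{i-1}x_i\right)^2}{\sum_{i=1}^{n-1} x_i^2} - \frac{\left(\sum_{i=2}^{n} x_{i-1}x_i\right)^2}{\sum_{i=1}^{n-1} x_i^2}\right\}
= 1 - \frac{\left(\sum_{i=2}^{n} x_{i-1}x_i\right)^2}{\left(\sum_{i=1}^{n-1} x_i^2\right)\left(\sum_{i=1}^{n} x_i^2\right)}.
\]
For the given data,
\[
\hat{\lambda}^{2/30} = 1 - \frac{(4)^2}{(15 - 4)(15)} = 0.9030 \ \Rightarrow\ \hat{\lambda} = (0.9030)^{15} = 0.2164
\]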
$\Rightarrow -2\ln\hat{\lambda} = 3.0610$. Since $\chi^2_{1,0.95} = 3.84$, these data do not provide sufficient evidence to reject $H_0: \theta = 0$.

Solution 5.24∗

(a) Now, with $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_m)$, we have
\[
L(x, y; \theta_r, \theta_u) = \left\{\prod_{i=1}^{n} \theta_r x_i^{\theta_r - 1}\right\}\left\{\prod_{i=1}^{m} \theta_u y_i^{\theta_u - 1}\right\}
= \theta_r^n \left(\prod_{i=1}^{n} x_i\right)^{\theta_r - 1} \theta_u^m \left(\prod_{i=1}^{m} y_i\right)^{\theta_u - 1}.
\]
So, by the Factorization Theorem, $\prod_{i=1}^{n} X_i$ and $\prod_{i=1}^{m} Y_i$ are jointly sufficient for $\theta_r$ and $\theta_u$.
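(b) From part (a), the unrestricted log likelihood is
\[
\ln L_\Omega = n\ln\theta_r + (\theta_r - 1)\sum_{i=1}^{n}\ln x_i + m\ln\theta_u + (\theta_u - 1)\sum_{i=1}^{m}\ln y_i.
\]
So,
\[
\frac{\partial \ln L_\Omega}{\partial \theta_r} = \frac{n}{\theta_r} + \sum_{i=1}^{n}\ln x_i = 0
\quad\text{gives}\quad
\hat{\theta}_r = -n\left(\sum_{i=1}^{n}\ln x_i\right)^{-1}.
\]
Similarly, by symmetry,
\[
\frac{\partial \ln L_\Omega}{\partial \theta_u} = \frac{m}{\theta_u} + \sum_{i=1}^{m}\ln y_i = 0
\quad\text{gives}\quad
\hat{\theta}_u = -m\left(\sum_{i=1}^{m}\ln y_i\right)^{-1}.
\]
So,
\[
\hat{L}_\Omega = \hat{\theta}_r^n\left(\prod_{i=1}^{n} x_i\right)^{\hat{\theta}_r - 1}\hat{\theta}_u^m\left(\prod_{i=1}^{m} y_i\right)^{\hat{\theta}_u - 1}.
\]
Now, under $H_0: \theta_r = \theta_u$ ($= \theta$, say), the restricted log likelihood is
\[
\ln L_\omega = (n + m)\ln\theta + (\theta - 1)\left(\sum_{i=1}^{n}\ln x_i + \sum_{i=1}^{m}\ln y_i\right),
\]
so that
\[
\frac{\partial \ln L_\omega}{\partial \theta} = \frac{(n + m)}{\theta} + \left(\sum_{i=1}^{n}\ln x_i + \sum_{i=1}^{m}\ln y_i\right) = 0
\quad\text{gives}\quad
\hat{\theta} = -(n + m)\left(\sum_{i=1}^{n}\ln x_i + \sum_{i=1}^{m}\ln y_i\right)^{-1}.
\]
So,
\[
\hat{L}_\omega = \hat{\theta}^{(n+m)}\left(\prod_{i=1}^{n} x_i \cdot \prod_{i=1}^{m} y_i\right)^{\hat{\theta} - 1}.
\]
So, with $W = \sum_{i=1}^{n}\ln x_i \big/ \left(\sum_{i=1}^{n}\ln x_i + \sum_{i=1}^{m}\ln y_i\right)$, so that $\hat{\theta}/\hat{\theta}_r = [(n+m)/n]W$ and $\hat{\theta}/\hat{\theta}_u = [(n+m)/m](1 - W)$,
\[
\begin{aligned}
\hat{\lambda} = \frac{\hat{L}_\omega}{\hat{L}_\Omega}
&= \frac{\hat{\theta}^{(n+m)}\left(\prod_{i=1}^{n} x_i \cdot \prod_{i=1}^{m} y_i\right)^{\hat{\theta} - 1}}{\hat{\theta}_r^n \hat{\theta}_u^m \left(\prod_{i=1}^{n} x_i\right)^{\hat{\theta}_r - 1}\left(\prod_{i=1}^{m} y_i\right)^{\hat{\theta}_u - 1}} \\
&= \left(\frac{\hat{\theta}}{\hat{\theta}_r}\right)^n\left(\frac{\hat{\theta}}{\hat{\theta}_u}\right)^m\left(\prod_{i=1}^{n} x_i\right)^{\hat{\theta} - \hat{\theta}_r}\left(\prod_{i=1}^{m} y_i\right)^{\hat{\theta} - \hat{\theta}_u} \\
&= \left(\frac{n+m}{n}\right)^n W^n \left(\frac{n+m}{m}\right)^m (1 - W)^m \left(\prod_{i=1}^{n} x_i\right)^{\hat{\theta} - \hat{\theta}_r}\left(\prod_{i=1}^{m} y_i\right)^{\hat{\theta} - \hat{\theta}_u}.
\end{aligned}
\]
Thus,
\[
\begin{aligned}
\ln\hat{\lambda} &= n\ln\left(\frac{n+m}{n}\right) + m\ln\left(\frac{n+m}{m}\right) + n\ln W + m\ln(1 - W)
+ (\hat{\theta} - \hat{\theta}_r)\sum_{i=1}^{n}\ln x_i + (\hat{\theta} - \hat{\theta}_u)\sum_{i=1}^{m}\ln y_i \\
&= n\ln\left(\frac{n+m}{n}\right) + m\ln\left(\frac{n+m}{m}\right) + n\ln W + m\ln(1 - W)
- (n+m)W + n - (n+m)(1 - W) + m \\
&= n\ln\left(\frac{n+m}{n}\right) + m\ln\left(\frac{n+m}{m}\right) + \ln\left[W^n(1 - W)^m\right].
\end{aligned}
\]
Finally,
\[
-2\ln\hat{\lambda} = -2n\ln\left(\frac{n+m}{n}\right) - 2m\ln\left(\frac{n+m}{m}\right) - 2\ln\left[W^n(1 - W)^m\right].
\]
Under $H_0: \theta_r = \theta_u$, we know that $-2\ln\hat{\lambda} \,\dot{\sim}\, \chi^2_1$ for large $n$ and $m$. Since $0 < W < 1$, $-2\ln\hat{\lambda}$ will be large (and hence favor rejecting $H_0$) when either $W$ is close to 0 or $W$ is close to 1.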
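(c) Under $H_0: \theta_r = \theta_u$ ($= \theta$, say), $f_X(x; \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, and $f_Y(y; \theta) = \theta y^{\theta - 1}$, $0 < y < 1$. Now, let $U = -\ln X$, so that $X = e^{-U}$ and $dX = -e^{-U}\,dU$. Hence,
\[
f_U(u; \theta) = \theta(e^{-u})^{\theta - 1} e^{-u} = \theta e^{-\theta u}, \quad 0 < u < \infty,
\]
so that $U = -\ln X \sim \mathrm{GAMMA}(\alpha = \theta^{-1}, \beta = 1)$. Thus,
\[
\sum_{i=1}^{n}(-\ln X_i) = -\sum_{i=1}^{n}\ln X_i \sim \mathrm{GAMMA}(\alpha = \theta^{-1}, \beta = n).
\]
Analogously,
\[
\sum_{i=1}^{m}(-\ln Y_i) = -\sum_{i=1}^{m}\ln Y_i \sim \mathrm{GAMMA}(\alpha = \theta^{-1}, \beta = m).
\]
Thus,
\[
W = \frac{\sum_{i=1}^{n}\ln X_i}{\sum_{i=1}^{n}\ln X_i + \sum_{i=1}^{m}\ln Y_i}
= \frac{-\sum_{i=1}^{n}\ln X_i}{-\sum_{i=1}^{n}\ln X_i - \sum_{i=1}^{m}\ln Y_i} = \frac{R}{(R + S)},
\]
where $R \sim \mathrm{GAMMA}(\alpha = \theta^{-1}, \beta = n)$, $S \sim \mathrm{GAMMA}(\alpha = \theta^{-1}, \beta = m)$, and $R$ and $S$ are independent random variables. So,
\[
f_{R,S}(r, s; \theta) = \left[\frac{\theta^n r^{n-1} e^{-\theta r}}{\Gamma(n)}\right]\left[\frac{\theta^m s^{m-1} e^{-\theta s}}{\Gamma(m)}\right]
= \frac{\theta^{(n+m)} r^{n-1} s^{m-1} e^{-\theta(r+s)}}{\Gamma(n)\Gamma(m)}, \quad r > 0,\ s > 0.
\]
So, let $W = R/(R + S)$ and $P = (R + S)$; hence, $R = PW$ and $S = (P - PW) = P(1 - W)$. Clearly, $0 < W < 1$ and $0 < P < +\infty$. Also,
\[
J = \begin{vmatrix} \dfrac{\partial R}{\partial P} & \dfrac{\partial R}{\partial W} \\[2mm] \dfrac{\partial S}{\partial P} & \dfrac{\partial S}{\partial W} \end{vmatrix}
= \begin{vmatrix} W & P \\ (1 - W) & -P \end{vmatrix} = -P,
\]
so that $|J| = P$. Finally,
\[
\begin{aligned}
f_{W,P}(w, p; \theta) &= \frac{\theta^{(n+m)}(pw)^{n-1}[p(1 - w)]^{m-1} e^{-\theta[pw + p(1 - w)]}}{\Gamma(n)\Gamma(m)}\,(p) \\
&= \left[\frac{\Gamma(n + m)}{\Gamma(n)\Gamma(m)}\, w^{n-1}(1 - w)^{m-1}\right]\left[\frac{\theta^{n+m} p^{(n+m)-1} e^{-\theta p}}{\Gamma(n + m)}\right],
\end{aligned}
\]
$0 < w < 1$, $0 < p < \infty$. So, $W \sim \mathrm{BETA}(\alpha = n, \beta = m)$, $P \sim \mathrm{GAMMA}(\alpha = \theta^{-1}, \beta = n + m)$, and $W$ and $P$ are independent random variables. When $n = m = 2$,
\[
f_W(w) = \frac{\Gamma(4)}{\Gamma(2)\Gamma(2)}\, w(1 - w) = 6w(1 - w), \quad 0 < w < 1,
\]
when $H_0: \theta_r = \theta_u$ is true. So, we want to choose $k_{.05}$ such that
\[
\int_0^{k_{.05}} 6t(1 - t)\,dt = 0.05,
\]
or $(3k_{.05}^2 - 2k_{.05}^3) = 0.05$, or (by trial-and-error) $k_{.05} \doteq 0.135$. So, for $n = m = 2$, reject $H_0: \theta_r = \theta_u$ when either $W < 0.135$ or $W > 0.865$ for $\alpha = 0.10$.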
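The trial-and-error step can be replaced by a simple root search. The sketch below, assuming only the BETA(2, 2) distribution function $3k^2 - 2k^3$ derived above, uses bisection on $(0, 0.5)$, where that function is increasing; the function names are illustrative.

```python
def beta22_cdf(k):
    """Distribution function of W ~ BETA(2, 2): integral of 6t(1 - t) from 0 to k."""
    return 3.0 * k**2 - 2.0 * k**3


def lower_critical_value(tail_area=0.05, tol=1e-12):
    """Solve beta22_cdf(k) = tail_area by bisection on (0, 0.5)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if beta22_cdf(mid) < tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


k_lower = lower_critical_value(0.05)
k_upper = 1.0 - k_lower          # by symmetry of the BETA(2, 2) density about 0.5
print(round(k_lower, 4), round(k_upper, 4))
```

The upper cutoff follows from symmetry, which matches the two-sided rejection region with total size 0.10.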
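Solution 5.25∗

(a) For the unrestricted parameter space,
\[
L_\Omega = \prod_{i=1}^{2}\prod_{j=1}^{n} C^{k + y_{ij} - 1}_{k-1}\, \theta_i^{y_{ij}}(1 + \theta_i)^{-(k + y_{ij})},
\]
and
\[
\ln L_\Omega = \sum_{i=1}^{2}\sum_{j=1}^{n}\left\{\ln C^{k + y_{ij} - 1}_{k-1} + y_{ij}\ln\theta_i - (k + y_{ij})\ln(1 + \theta_i)\right\},
\]
so that
\[
\frac{\partial \ln L_\Omega}{\partial \theta_i} = \sum_{j=1}^{n}\left[\frac{y_{ij}}{\theta_i} - \frac{(k + y_{ij})}{(1 + \theta_i)}\right] = \frac{n\bar{y}_i}{\theta_i} - \frac{n(k + \bar{y}_i)}{(1 + \theta_i)} = 0,
\quad\text{where } \bar{y}_i = n^{-1}\sum_{j=1}^{n} y_{ij}.
\]
Thus,
\[
\hat{\theta}_i = \frac{\bar{y}_i}{k} = \frac{\sum_{j=1}^{n} y_{ij}}{nk}, \quad i = 1, 2.
\]
So,
\[
\hat{L}_\Omega = \prod_{i=1}^{2}\prod_{j=1}^{n} C^{k + y_{ij} - 1}_{k-1}\, \hat{\theta}_i^{y_{ij}}(1 + \hat{\theta}_i)^{-(k + y_{ij})}.
\]
For the restricted parameter space,
\[
L_\omega = \prod_{i=1}^{2}\prod_{j=1}^{n} C^{k + y_{ij} - 1}_{k-1}\, \theta^{y_{ij}}(1 + \theta)^{-(k + y_{ij})}
= \left(\prod_{i=1}^{2}\prod_{j=1}^{n} C^{k + y_{ij} - 1}_{k-1}\right)\theta^s(1 + \theta)^{-(2nk + s)},
\quad\text{where } s = \sum_{i=1}^{2}\sum_{j=1}^{n} y_{ij}.
\]
So,
\[
\frac{\partial \ln L_\omega}{\partial \theta} = \frac{s}{\theta} - \frac{(2nk + s)}{(1 + \theta)} = 0
\quad\text{gives}\quad
\hat{\theta} = \frac{\bar{y}}{k}, \quad\text{where } \bar{y} = \frac{1}{2}(\bar{y}_1 + \bar{y}_2).
\]
Thus,
\[
\hat{L}_\omega = \left(\prod_{i=1}^{2}\prod_{j=1}^{n} C^{k + y_{ij} - 1}_{k-1}\right)\hat{\theta}^s(1 + \hat{\theta})^{-(2nk + s)}.
\]
Hence,
\[
\begin{aligned}
-2\ln\hat{\lambda} &= -2\ln\left\{\frac{\hat{\theta}^s(1 + \hat{\theta})^{-(2nk + s)}}{\prod_{i=1}^{2}\prod_{j=1}^{n}\hat{\theta}_i^{y_{ij}}(1 + \hat{\theta}_i)^{-(k + y_{ij})}}\right\} \\
&= -2\left\{s\ln\hat{\theta} - (2nk + s)\ln(1 + \hat{\theta}) - \sum_{i=1}^{2}\left[n\bar{y}_i\ln\hat{\theta}_i - n(k + \bar{y}_i)\ln(1 + \hat{\theta}_i)\right]\right\}.
\end{aligned}
\]
Now, $s = 15$, $n = 50$, $k = 3$, $\bar{y}_1 = \frac{5}{50} = 0.10$, $\bar{y}_2 = \frac{10}{50} = 0.20$, $\hat{\theta} = \frac{\bar{y}}{k} = \frac{(\bar{y}_1 + \bar{y}_2)}{2k} = \frac{0.10 + 0.20}{2(3)} = 0.05$, $\hat{\theta}_1 = \frac{0.10}{3} = 0.0333$, and $\hat{\theta}_2 = \frac{0.20}{3} = 0.0667$.

So, $-2\ln\hat{\lambda} = 1.62$. Since $\chi^2_{1,0.95} = 3.841$, we do not reject $H_0$; the P-value $\doteq 0.20$.

(b) From part (a),
\[
S(\theta_1, \theta_2) = \begin{bmatrix} \dfrac{\partial \ln L_\Omega}{\partial \theta_1} \\[2mm] \dfrac{\partial \ln L_\Omega}{\partial \theta_2} \end{bmatrix}
= \begin{bmatrix} \dfrac{n\bar{y}_1}{\theta_1} - \dfrac{n(k + \bar{y}_1)}{1 + \theta_1} \\[2mm] \dfrac{n\bar{y}_2}{\theta_2} - \dfrac{n(k + \bar{y}_2)}{1 + \theta_2} \end{bmatrix}.
\]
Under $H_0: \theta_1 = \theta_2$ ($= \theta$, say), $\hat{\theta} = \frac{(\bar{y}_1 + \bar{y}_2)}{2k} = 0.05$. So,
\[
S(\hat{\theta}, \hat{\theta}) = \begin{bmatrix} \dfrac{(50)(0.10)}{0.05} - \dfrac{(50)(3 + 0.10)}{(1 + 0.05)} \\[2mm] \dfrac{(50)(0.20)}{0.05} - \dfrac{(50)(3 + 0.20)}{(1 + 0.05)} \end{bmatrix}
= \begin{bmatrix} -47.6190 \\ +47.6190 \end{bmatrix}.
\]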
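Now,
\[
\frac{\partial^2 \ln L_\Omega}{\partial \theta_1^2} = \frac{-n\bar{y}_1}{\theta_1^2} + \frac{n(k + \bar{y}_1)}{(1 + \theta_1)^2}, \qquad
\frac{\partial^2 \ln L_\Omega}{\partial \theta_2^2} = \frac{-n\bar{y}_2}{\theta_2^2} + \frac{n(k + \bar{y}_2)}{(1 + \theta_2)^2},
\]
and
\[
\frac{\partial^2 \ln L_\Omega}{\partial \theta_1 \partial \theta_2} = \frac{\partial^2 \ln L_\Omega}{\partial \theta_2 \partial \theta_1} = 0.
\]
So, with $y = (y_{11}, y_{12}, \ldots, y_{1n}; y_{21}, y_{22}, \ldots, y_{2n})$, we have
\[
\begin{aligned}
I(y; \hat{\theta}) &= \begin{bmatrix} \dfrac{n\bar{y}_1}{\hat{\theta}^2} - \dfrac{n(k + \bar{y}_1)}{(1 + \hat{\theta})^2} & 0 \\[2mm] 0 & \dfrac{n\bar{y}_2}{\hat{\theta}^2} - \dfrac{n(k + \bar{y}_2)}{(1 + \hat{\theta})^2} \end{bmatrix} \\
&= \begin{bmatrix} \dfrac{(50)(0.10)}{(0.05)^2} - \dfrac{(50)(3 + 0.10)}{(1 + 0.05)^2} & 0 \\[2mm] 0 & \dfrac{(50)(0.20)}{(0.05)^2} - \dfrac{(50)(3.20)}{(1 + 0.05)^2} \end{bmatrix}
= \begin{bmatrix} 1{,}859.4104 & 0 \\ 0 & 3{,}854.8753 \end{bmatrix}.
\end{aligned}
\]
So,
\[
\hat{S} = S'(\hat{\theta}, \hat{\theta})\, I^{-1}(y; \hat{\theta})\, S(\hat{\theta}, \hat{\theta}) = \frac{(-47.6190)^2}{1859.4104} + \frac{(47.6190)^2}{3854.8753} = 1.81.
\]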
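Both the likelihood ratio statistic of part (a) and the score statistic of part (b) depend on the data only through $n$, $k$, $\bar{y}_1$, and $\bar{y}_2$, so they can be recomputed together. This is a minimal sketch under the negative binomial log likelihood written above, with the combinatorial terms dropped because they cancel; the function names are illustrative and not from the text.

```python
import math


def loglik_kernel(theta, n, k, ybar):
    """Group contribution to ln L, omitting terms free of theta."""
    return n * ybar * math.log(theta) - n * (k + ybar) * math.log(1.0 + theta)


def lrt_and_score(n, k, ybar1, ybar2):
    """Return (-2 ln lambda_hat, score statistic) for H0: theta1 = theta2."""
    theta1, theta2 = ybar1 / k, ybar2 / k          # unrestricted MLEs
    theta0 = (ybar1 + ybar2) / (2.0 * k)           # restricted MLE
    ll_full = loglik_kernel(theta1, n, k, ybar1) + loglik_kernel(theta2, n, k, ybar2)
    ll_null = loglik_kernel(theta0, n, k, ybar1) + loglik_kernel(theta0, n, k, ybar2)
    lrt = -2.0 * (ll_null - ll_full)
    score_stat = 0.0
    for ybar in (ybar1, ybar2):
        s = n * ybar / theta0 - n * (k + ybar) / (1.0 + theta0)
        # Observed information (negative second derivative) at theta0.
        info = n * ybar / theta0**2 - n * (k + ybar) / (1.0 + theta0) ** 2
        score_stat += s * s / info
    return lrt, score_stat


lrt, score_stat = lrt_and_score(50, 3, 0.10, 0.20)
print(round(lrt, 2), round(score_stat, 2))
```

The information used here is the observed information evaluated at the restricted estimate, matching the matrix $I(y; \hat{\theta})$ in part (b).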
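Since $\chi^2_{1,0.95} = 3.84$, we do not reject $H_0$; the P-value $\doteq 0.18$.

(c) With $X_{ij} = k + Y_{ij}$, then
\[
p_{X_{ij}}(x_{ij}; \theta) = C^{x_{ij} - 1}_{k-1}\left(\frac{1}{1 + \theta_i}\right)^k\left(\frac{\theta_i}{1 + \theta_i}\right)^{x_{ij} - k}, \quad x_{ij} = k, k+1, \ldots, \infty.
\]
So,
\[
E(Y_{ij}) = E(X_{ij}) - k = k(1 + \theta_i) - k = k\theta_i
\quad\text{and}\quad
V(Y_{ij}) = k\left(\frac{\theta_i}{1 + \theta_i}\right)(1 + \theta_i)^2 = k\theta_i(1 + \theta_i).
\]
In general,
\[
\frac{(\bar{Y}_1 - \bar{Y}_2) - k(\theta_1 - \theta_2)}{\sqrt{\dfrac{k\theta_1(1 + \theta_1)}{n} + \dfrac{k\theta_2(1 + \theta_2)}{n}}} \,\dot{\sim}\, N(0, 1)
\]
for large $n$ by the Central Limit Theorem. Under $H_0: \theta_1 = \theta_2$ ($= \theta$, say), then
\[
\frac{(\bar{Y}_1 - \bar{Y}_2) - 0}{\sqrt{\dfrac{2k\theta(1 + \theta)}{n}}} \,\dot{\sim}\, N(0, 1) \quad\text{for large } n.
\]
Thus, via Slutsky's Theorem, we could reject $H_0: \theta_1 = \theta_2$ ($= \theta$, say) at the $\alpha$-level for large $n$ when
\[
\left|\frac{(\bar{Y}_1 - \bar{Y}_2) - 0}{\sqrt{\dfrac{2k\hat{\theta}(1 + \hat{\theta})}{n}}}\right| > Z_{1-\alpha/2}.
\]
Now, for large $n$,
\[
\begin{aligned}
\mathrm{POWER} &\doteq \mathrm{pr}\left\{\left|\frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\dfrac{2k\theta(1 + \theta)}{n}}}\right| > Z_{1-\alpha/2} \,\middle|\, \theta_1 \neq \theta_2\right\} \\
&= \mathrm{pr}\left\{\frac{(\bar{Y}_1 - \bar{Y}_2)}{\sqrt{\dfrac{2k\theta(1 + \theta)}{n}}} < -Z_{1-\alpha/2} \,\middle|\, \theta_1 \neq \theta_2\right\}
+ \mathrm{pr}\left\{\frac{(\bar{Y}_1 - \bar{Y}_2)}{\sqrt{\dfrac{2k\theta(1 + \theta)}{n}}} > Z_{1-\alpha/2} \,\middle|\, \theta_1 \neq \theta_2\right\}.
\end{aligned}
\]
For $\theta_1 = 2.0$ and $\theta_2 = 2.4$, the contribution of the second term will be negligible. So,
\[
\begin{aligned}
\mathrm{POWER} &\doteq \mathrm{pr}\left\{\frac{(\bar{Y}_1 - \bar{Y}_2)}{\sqrt{\dfrac{2k\theta(1 + \theta)}{n}}} < -Z_{1-\alpha/2} \,\middle|\, \theta_1 = 2.0, \theta_2 = 2.4\right\} \\
&= \mathrm{pr}\left\{\frac{(\bar{Y}_1 - \bar{Y}_2) - k(\theta_1 - \theta_2)}{\sqrt{\dfrac{k\theta_1(1 + \theta_1)}{n} + \dfrac{k\theta_2(1 + \theta_2)}{n}}} < \frac{-Z_{1-\alpha/2}\sqrt{\dfrac{2k\theta(1 + \theta)}{n}} - k(\theta_1 - \theta_2)}{\sqrt{\dfrac{k\theta_1(1 + \theta_1)}{n} + \dfrac{k\theta_2(1 + \theta_2)}{n}}} \,\middle|\, \theta_1 = 2.0, \theta_2 = 2.4\right\} \\
&= \mathrm{pr}\left\{Z < \frac{-Z_{1-\alpha/2}\sqrt{\dfrac{2k\theta(1 + \theta)}{n}} - k(\theta_1 - \theta_2)}{\sqrt{\dfrac{k\theta_1(1 + \theta_1)}{n} + \dfrac{k\theta_2(1 + \theta_2)}{n}}} \,\middle|\, \theta_1 = 2.0, \theta_2 = 2.4\right\},
\end{aligned}
\]
where $Z \,\dot{\sim}\, N(0, 1)$ for large $n$. So, with $\theta_1 = 2.0$, $\theta_2 = 2.4$, $\alpha = 0.05$, $k = 3$, $(1 - \beta) = 0.80$, and $\theta = (\theta_1 + \theta_2)/2 = 2.2$, we require the smallest $n$ (say, $n^*$) such that
\[
\frac{-1.96\sqrt{2(3)(2.2)(1 + 2.2)} - \sqrt{n}\,(3)(2.0 - 2.4)}{\sqrt{3(2.0)(1 + 2.0) + 3(2.4)(1 + 2.4)}} \geq 0.842,
\]
giving $n^* = 231$.

Solution 5.26∗

(a) The multinomial likelihood function $L$ is given by
\[
L = \frac{n!}{y_{11}!\,y_{10}!\,y_{01}!\,y_{00}!}\, \pi_{11}^{y_{11}}\pi_{10}^{y_{10}}\pi_{01}^{y_{01}}\pi_{00}^{y_{00}},
\]
and so
\[
\ln L \propto \sum_{i=0}^{1}\sum_{j=0}^{1} y_{ij}\ln\pi_{ij}.
\]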
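To maximize $\ln L$ subject to the constraint $(\pi_{11} + \pi_{10} + \pi_{01} + \pi_{00}) = 1$, we can use the method of Lagrange multipliers. Define
\[
U = \sum_{i=0}^{1}\sum_{j=0}^{1} y_{ij}\ln\pi_{ij} + \lambda\left(1 - \sum_{i=0}^{1}\sum_{j=0}^{1}\pi_{ij}\right).
\]
The equations
\[
\frac{\partial U}{\partial \pi_{ij}} = \frac{y_{ij}}{\pi_{ij}} - \lambda = 0, \quad i = 0, 1 \text{ and } j = 0, 1,
\]
imply that
\[
\frac{y_{11}}{\pi_{11}} = \frac{y_{10}}{\pi_{10}} = \frac{y_{01}}{\pi_{01}} = \frac{y_{00}}{\pi_{00}} = \lambda,
\]
or equivalently,
\[
\frac{y_{ij}}{\lambda} = \pi_{ij}, \quad i = 0, 1 \text{ and } j = 0, 1.
\]
Additionally,
\[
\sum_{i=0}^{1}\sum_{j=0}^{1}\pi_{ij} = 1 \implies \sum_{i=0}^{1}\sum_{j=0}^{1}\frac{y_{ij}}{\lambda} = 1 \implies \lambda = \sum_{i=0}^{1}\sum_{j=0}^{1} y_{ij} = n.
\]
Hence,
\[
\hat{\pi}_{ij} = \frac{y_{ij}}{n}, \quad i = 0, 1 \text{ and } j = 0, 1.
\]
By the invariance property for MLEs, it follows that the MLE of $\delta$ is equal to
\[
\hat{\delta} = (\hat{\pi}_{11} + \hat{\pi}_{10}) - (\hat{\pi}_{11} + \hat{\pi}_{01}) = (\hat{\pi}_{10} - \hat{\pi}_{01}) = \frac{(y_{10} - y_{01})}{n}.
\]

(b) Using the equality $\pi_{00} = (1 - \pi_{11} - \pi_{10} - \pi_{01})$, the log likelihood function can be written as
\[
\ln L \propto y_{11}\ln\pi_{11} + y_{10}\ln\pi_{10} + y_{01}\ln\pi_{01} + y_{00}\ln(1 - \pi_{11} - \pi_{10} - \pi_{01}).
\]
Now,
\[
\frac{\partial \ln L}{\partial \pi_{11}} = \frac{y_{11}}{\pi_{11}} - \frac{y_{00}}{(1 - \pi_{11} - \pi_{10} - \pi_{01})}, \qquad
\frac{\partial \ln L}{\partial \pi_{10}} = \frac{y_{10}}{\pi_{10}} - \frac{y_{00}}{(1 - \pi_{11} - \pi_{10} - \pi_{01})},
\]
and
\[
\frac{\partial \ln L}{\partial \pi_{01}} = \frac{y_{01}}{\pi_{01}} - \frac{y_{00}}{(1 - \pi_{11} - \pi_{10} - \pi_{01})}.
\]
So, for $(i, j)$ equal to $(1, 1)$, $(1, 0)$, or $(0, 1)$, we have
\[
\frac{\partial^2 \ln L}{\partial \pi_{ij}^2} = -\frac{y_{ij}}{\pi_{ij}^2} - \frac{y_{00}}{(1 - \pi_{11} - \pi_{10} - \pi_{01})^2},
\]
and hence
\[
-E\left[\frac{\partial^2 \ln L}{\partial \pi_{ij}^2}\right] = \frac{n}{\pi_{ij}} + \frac{n}{\pi_{00}}.
\]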
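In addition,
\[
\frac{\partial^2 \ln L}{\partial \pi_{11}\partial \pi_{10}} = \frac{\partial^2 \ln L}{\partial \pi_{11}\partial \pi_{01}} = \frac{\partial^2 \ln L}{\partial \pi_{10}\partial \pi_{01}} = -\frac{y_{00}}{(1 - \pi_{11} - \pi_{10} - \pi_{01})^2},
\]
and so
\[
-E\left[\frac{\partial^2 \ln L}{\partial \pi_{11}\partial \pi_{10}}\right] = -E\left[\frac{\partial^2 \ln L}{\partial \pi_{11}\partial \pi_{01}}\right] = -E\left[\frac{\partial^2 \ln L}{\partial \pi_{10}\partial \pi_{01}}\right] = \frac{n}{\pi_{00}}.
\]
Hence, with $\pi = (\pi_{11}, \pi_{10}, \pi_{01})$, the expected Fisher information matrix $I(\pi)$ is
\[
I(\pi) = n\begin{bmatrix}
\dfrac{1}{\pi_{11}} + \dfrac{1}{\pi_{00}} & \dfrac{1}{\pi_{00}} & \dfrac{1}{\pi_{00}} \\[2mm]
\dfrac{1}{\pi_{00}} & \dfrac{1}{\pi_{10}} + \dfrac{1}{\pi_{00}} & \dfrac{1}{\pi_{00}} \\[2mm]
\dfrac{1}{\pi_{00}} & \dfrac{1}{\pi_{00}} & \dfrac{1}{\pi_{01}} + \dfrac{1}{\pi_{00}}
\end{bmatrix},
\]
with $\pi_{00} = (1 - \pi_{11} - \pi_{10} - \pi_{01})$. So,
\[
I^{-1}(\pi) = \frac{1}{n}\begin{bmatrix}
\pi_{11}(1 - \pi_{11}) & -\pi_{11}\pi_{10} & -\pi_{11}\pi_{01} \\
-\pi_{11}\pi_{10} & \pi_{10}(1 - \pi_{10}) & -\pi_{10}\pi_{01} \\
-\pi_{11}\pi_{01} & -\pi_{10}\pi_{01} & \pi_{01}(1 - \pi_{01})
\end{bmatrix}.
\]
The null hypothesis of interest is $H_0: R(\pi) \equiv R = (\pi_{10} - \pi_{01}) = 0$. Hence, with $T(\pi) \equiv T = [0, 1, -1]$,
\[
\Lambda(\pi) \equiv \Lambda = T I^{-1}(\pi) T' = \frac{\pi_{10}(1 - \pi_{10})}{n} + \frac{\pi_{01}(1 - \pi_{01})}{n} + \frac{2\pi_{10}\pi_{01}}{n}
= \frac{(\pi_{10} + \pi_{01})}{n} - \frac{(\pi_{10} - \pi_{01})^2}{n}.
\]
So, the Wald test statistic $\hat{W}$ takes the form
\[
\hat{W} = \frac{\hat{R}^2}{\hat{\Lambda}}
= \frac{(\hat{\pi}_{10} - \hat{\pi}_{01})^2}{\dfrac{(\hat{\pi}_{10} + \hat{\pi}_{01})}{n} - \dfrac{(\hat{\pi}_{10} - \hat{\pi}_{01})^2}{n}}
= \frac{(y_{10} - y_{01})^2/n^2}{\dfrac{1}{n}\left[\dfrac{y_{10}}{n} + \dfrac{y_{01}}{n} - \left(\dfrac{y_{10} - y_{01}}{n}\right)^2\right]}
= \frac{(y_{10} - y_{01})^2}{(y_{10} + y_{01}) - \dfrac{(y_{10} - y_{01})^2}{n}}.
\]
A simpler way to derive this test statistic is to note that
\[
\hat{W} = \frac{\hat{\delta}^2}{\hat{V}(\hat{\delta})},
\]
where $\hat{V}(\hat{\delta})$ denotes the MLE of $V(\hat{\delta})$. Now,
\[
\begin{aligned}
V(\hat{\delta}) = V\left[\frac{(Y_{10} - Y_{01})}{n}\right]
&= \frac{V(Y_{10}) + V(Y_{01}) - 2\,\mathrm{cov}(Y_{10}, Y_{01})}{n^2} \\
&= \frac{n\pi_{10}(1 - \pi_{10}) + n\pi_{01}(1 - \pi_{01}) + 2n\pi_{10}\pi_{01}}{n^2} \\
&= \frac{(\pi_{10} + \pi_{01}) - (\pi_{10} - \pi_{01})^2}{n}.
\end{aligned}
\]
By the invariance property, it follows that
\[
\hat{V}(\hat{\delta}) = \frac{(\hat{\pi}_{10} + \hat{\pi}_{01}) - (\hat{\pi}_{10} - \hat{\pi}_{01})^2}{n}
= \frac{\dfrac{(y_{10} + y_{01})}{n} - \dfrac{(y_{10} - y_{01})^2}{n^2}}{n}.
\]
Finally, the Wald test statistic is given by
\[
\hat{W} = \frac{(\hat{\delta})^2}{\hat{V}(\hat{\delta})}
= \frac{\dfrac{(y_{10} - y_{01})^2}{n^2}}{\dfrac{1}{n}\left[\dfrac{(y_{10} + y_{01})}{n} - \dfrac{(y_{10} - y_{01})^2}{n^2}\right]}
= \frac{(y_{10} - y_{01})^2}{(y_{10} + y_{01}) - \dfrac{(y_{10} - y_{01})^2}{n}}.
\]
When $y_{11} = 22$, $y_{10} = 3$, $y_{01} = 7$, and $y_{00} = 13$, so that $n = 45$, the Wald test statistic is equal to
\[
\hat{W} = \frac{(3 - 7)^2}{(3 + 7) - \dfrac{(3 - 7)^2}{45}} = 1.6590.
\]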
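The closed form for $\hat{W}$ uses only the discordant counts and $n$, so the numerical value is simple to reproduce. A minimal sketch, assuming the formula just derived; the function names are illustrative, and the chi-square (1 df) tail is obtained from the standard normal identity $\mathrm{pr}(\chi^2_1 > q) = \mathrm{erfc}(\sqrt{q/2})$.

```python
import math


def wald_statistic(y10, y01, n):
    """Wald statistic for H0: pi10 = pi01 in a 2x2 multinomial table."""
    diff_sq = (y10 - y01) ** 2
    return diff_sq / ((y10 + y01) - diff_sq / n)


def chi2_1df_upper_tail(q):
    """pr(chi-square with 1 df > q), using chi2_1 = Z^2."""
    return math.erfc(math.sqrt(q / 2.0))


w_hat = wald_statistic(3, 7, 45)
p_value = chi2_1df_upper_tail(w_hat)
print(round(w_hat, 4), round(p_value, 4))
```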
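An approximate P-value is
\[
\text{P-value} = \mathrm{pr}\left(\chi^2_1 > 1.6590 \mid H_0\right) \doteq 0.1977.
\]

(c) The score vector has the form $S(\pi) = (s_1, s_2, s_3)$, where
\[
s_1 = \frac{\partial \ln L}{\partial \pi_{11}} = \frac{y_{11}}{\pi_{11}} - \frac{y_{00}}{\pi_{00}}, \qquad
s_2 = \frac{\partial \ln L}{\partial \pi_{10}} = \frac{y_{10}}{\pi_{10}} - \frac{y_{00}}{\pi_{00}},
\]
and
\[
s_3 = \frac{\partial \ln L}{\partial \pi_{01}} = \frac{y_{01}}{\pi_{01}} - \frac{y_{00}}{\pi_{00}}.
\]
Under $H_0: \pi_{10} = \pi_{01}$ ($= \pi$, say), the restricted log likelihood is
\[
\ln L_\omega \propto y_{11}\ln\pi_{11} + y_{10}\ln\pi + y_{01}\ln\pi + y_{00}\ln\pi_{00}.
\]
Using the Lagrange multiplier method with
\[
U = y_{11}\ln\pi_{11} + y_{10}\ln\pi + y_{01}\ln\pi + y_{00}\ln\pi_{00} + \lambda(1 - \pi_{11} - 2\pi - \pi_{00}),
\]
we have
\[
\frac{\partial U}{\partial \pi_{11}} = \frac{y_{11}}{\pi_{11}} - \lambda = 0, \qquad
\frac{\partial U}{\partial \pi} = \frac{(y_{10} + y_{01})}{\pi} - 2\lambda = 0,
\]
and
\[
\frac{\partial U}{\partial \pi_{00}} = \frac{y_{00}}{\pi_{00}} - \lambda = 0.
\]
Since $\lambda = n$, the restricted MLEs are
\[
\hat{\pi}_{\omega 11} = \frac{y_{11}}{n}, \qquad
\hat{\pi}_\omega = \frac{(y_{10} + y_{01})}{(2n)}\ (= \hat{\pi}_{\omega 10} = \hat{\pi}_{\omega 01}), \qquad\text{and}\qquad
\hat{\pi}_{\omega 00} = \frac{y_{00}}{n}.
\]
Thus, with $\hat{\pi}_\omega = (\hat{\pi}_{\omega 11}, \hat{\pi}_{\omega 10}, \hat{\pi}_{\omega 01}) = (\hat{\pi}_{\omega 11}, \hat{\pi}_\omega, \hat{\pi}_\omega)$, we have
\[
\hat{I}^{-1}(\hat{\pi}_\omega) = \frac{1}{n^3}\begin{bmatrix}
y_{11}(n - y_{11}) & \dfrac{-y_{11}(y_{10} + y_{01})}{2} & \dfrac{-y_{11}(y_{10} + y_{01})}{2} \\[2mm]
\dfrac{-y_{11}(y_{10} + y_{01})}{2} & \dfrac{(y_{10} + y_{01})(n + y_{11} + y_{00})}{4} & \dfrac{-(y_{10} + y_{01})^2}{4} \\[2mm]
\dfrac{-y_{11}(y_{10} + y_{01})}{2} & \dfrac{-(y_{10} + y_{01})^2}{4} & \dfrac{(y_{10} + y_{01})(n + y_{11} + y_{00})}{4}
\end{bmatrix}.
\]
Now,
\[
\frac{\partial \ln L}{\partial \pi_{11}}\bigg|_{\pi = \hat{\pi}_\omega} = 0, \qquad
\frac{\partial \ln L}{\partial \pi_{10}}\bigg|_{\pi = \hat{\pi}_\omega} = n\left(\frac{y_{10} - y_{01}}{y_{10} + y_{01}}\right),
\]
and
\[
\frac{\partial \ln L}{\partial \pi_{01}}\bigg|_{\pi = \hat{\pi}_\omega} = n\left(\frac{y_{01} - y_{10}}{y_{10} + y_{01}}\right).
\]
So,
\[
S(\hat{\pi}_\omega) = [0, \hat{s}, -\hat{s}], \quad\text{where } \hat{s} = n\left(\frac{y_{10} - y_{01}}{y_{10} + y_{01}}\right).
\]
Finally, it can be shown with some algebra that
\[
S(\hat{\pi}_\omega)\,\hat{I}^{-1}(\hat{\pi}_\omega)\,S'(\hat{\pi}_\omega) = Q_M.
\]
When comparing the Wald and score test statistics, the nonnegative numerators of the two test statistics are identical. Since the nonnegative denominator of the score statistic is always at least as large as the denominator of the Wald statistic, it follows that the Wald statistic will always be at least as large in value as the score statistic.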
(d) Since $\chi^2_{1,0.95} = 3.84$, $H_0$ will be rejected if $Q_M > 3.84$ and will not be rejected if $Q_M \leq 3.84$. Let $Q(y_{10}; 10)$ denote the value of $Q_M$ when $Y_{10} = y_{10}$ and when $(y_{10} + y_{01}) = 10$. Note that $Q(0; 10) = Q(10; 10) = 10.0$; $Q(1; 10) = Q(9; 10) = 6.4$; $Q(2; 10) = Q(8; 10) = 3.6$; $Q(3; 10) = Q(7; 10) = 1.6$; $Q(4; 10) = Q(6; 10) = 0.4$; $Q(5; 10) = 0.0$. Thus, the null hypothesis will be rejected if $Y_{10}$ takes any of the four values 0, 1, 9, or 10, and will not be rejected otherwise. For each randomly selected subject who has a discordant response pattern [i.e., (0,1) or (1,0)], the conditional probability of a (1,0) response [given that the response is either (0,1) or (1,0)] is equal to $\pi_{10}/(\pi_{10} + \pi_{01})$. This probability remains constant and does not depend on the number of subjects who have a concordant [(0,0) or (1,1)] response, and so the binomial distribution applies. Under the assumption that $\pi_{10} = 0.10$ and $\pi_{01} = 0.05$, the probability of rejecting the null hypothesis is equal to
\[
\mathrm{POWER} = \sum_{y \in \{0, 1, 9, 10\}} C^{10}_{y}\left(\frac{0.10}{0.10 + 0.05}\right)^{y}\left(\frac{0.05}{0.10 + 0.05}\right)^{10 - y}
= 0.0000169 + 0.000339 + 0.0867 + 0.01734 = 0.1044.
\]
Thus, there is roughly a 10% chance that the null hypothesis will be rejected. A larger sample size is needed in order to achieve reasonable power for testing $H_0: \delta = 0$ versus $H_1: \delta \neq 0$ when $\pi_{10} = 0.10$ and $\delta = 0.05$.
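The rejection region and the conditional power can both be obtained by enumerating the possible values of $Y_{10}$. The sketch below assumes that $Q_M = (y_{10} - y_{01})^2/(y_{10} + y_{01})$, which is consistent with the values of $Q(y_{10}; 10)$ listed above; the function names are illustrative.

```python
from math import comb


def q_m(y10, y01):
    """Statistic Q_M based on the discordant counts (assumed form, see lead-in)."""
    return (y10 - y01) ** 2 / (y10 + y01)


def rejection_region(n_discordant, crit=3.84):
    """Values of y10 for which Q_M exceeds the chi-square critical value."""
    return [y for y in range(n_discordant + 1)
            if q_m(y, n_discordant - y) > crit]


def conditional_power(n_discordant, pi10, pi01, crit=3.84):
    """pr(reject H0) given the number of discordant pairs."""
    p = pi10 / (pi10 + pi01)      # pr of a (1,0) pattern among discordant pairs
    return sum(comb(n_discordant, y) * p**y * (1.0 - p) ** (n_discordant - y)
               for y in rejection_region(n_discordant, crit))


region = rejection_region(10)
power_value = conditional_power(10, 0.10, 0.05)
print(region, round(power_value, 4))
```

Calling `conditional_power` with larger values of `n_discordant` gives a direct way to see how many discordant pairs would be needed for a more useful power.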
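Appendix: Useful Mathematical Results

A.1 Summations

a. Binomial: $\sum_{j=0}^{n} C^n_j a^j b^{(n-j)} = (a + b)^n$, where $C^n_j = \dfrac{n!}{j!(n-j)!}$.

b. Geometric:
   i. $\sum_{j=0}^{\infty} r^j = \dfrac{1}{1 - r}$, $|r| < 1$.
   ii. $\sum_{j=1}^{\infty} r^j = \dfrac{r}{1 - r}$, $|r| < 1$.
   iii. $\sum_{j=0}^{n} r^j = \dfrac{1 - r^{(n+1)}}{1 - r}$, $-\infty < r < +\infty$, $r \neq 1$.

c. Negative Binomial: $\sum_{j=0}^{\infty} C^{j+k}_k \pi^j = (1 - \pi)^{-(k+1)}$, $0 < \pi < 1$, $k$ a positive integer.

d. Exponential: $\sum_{j=0}^{\infty} \dfrac{x^j}{j!} = e^x$, $-\infty < x < +\infty$.

e. Sums of Integers:
   i. $\sum_{i=1}^{n} i = \dfrac{n(n+1)}{2}$.
   ii. $\sum_{i=1}^{n} i^2 = \dfrac{n(n+1)(2n+1)}{6}$.
   iii. $\sum_{i=1}^{n} i^3 = \left[\dfrac{n(n+1)}{2}\right]^2$.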
A.2 Limits

a. $\lim_{n \to \infty}\left(1 + \dfrac{a}{n}\right)^n = e^a$, $-\infty < a < +\infty$.

A.3 Important Calculus-Based Results

a. L'Hôpital's Rule: For differentiable functions $f(x)$ and $g(x)$ and an "extended" real number $c$ (i.e., $c \in \Re^1$ or $c = \pm\infty$), suppose that $\lim_{x \to c} f(x) = \lim_{x \to c} g(x) = 0$, or that $\lim_{x \to c} f(x) = \lim_{x \to c} g(x) = \pm\infty$. Suppose also that $\lim_{x \to c} f'(x)/g'(x)$ exists [in particular, $g'(x) \neq 0$ near $c$, except possibly at $c$]. Then,
\[
\lim_{x \to c}\frac{f(x)}{g(x)} = \lim_{x \to c}\frac{f'(x)}{g'(x)}.
\]
L'Hôpital's Rule is also valid for one-sided limits.

b. Integration by Parts: Let $u = f(x)$ and $v = g(x)$, with differentials $du = f'(x)\,dx$ and $dv = g'(x)\,dx$. Then,
\[
\int u\,dv = uv - \int v\,du.
\]

c. Jacobians for One- and Two-Dimensional Change-of-Variable Transformations: Let $X$ be a scalar variable with support $A \subseteq \Re^1$. Consider a one-to-one transformation $U = g(X)$ that maps $A \to B \subseteq \Re^1$. Denote the inverse of $U$ as $X = h(U)$. Then, the corresponding one-dimensional Jacobian of the transformation is defined as
\[
J = \frac{d[h(U)]}{dU},
\]
so that
\[
\int_A f(X)\,dX = \int_B f[h(U)]\,|J|\,dU.
\]
Similarly, consider scalar variables $X$ and $Y$ defined on a two-dimensional set $A \subseteq \Re^2$, and let $U = g_1(X, Y)$ and $V = g_2(X, Y)$ define a one-to-one transformation that maps $A$ in the $xy$-plane to $B \subseteq \Re^2$ in the $uv$-plane. Define $X = h_1(U, V)$ and $Y = h_2(U, V)$. Then, the Jacobian of the (two-dimensional) transformation is given by the second-order determinant
\[
J = \begin{vmatrix} \dfrac{\partial h_1(U, V)}{\partial U} & \dfrac{\partial h_1(U, V)}{\partial V} \\[2mm] \dfrac{\partial h_2(U, V)}{\partial U} & \dfrac{\partial h_2(U, V)}{\partial V} \end{vmatrix},
\]
so that
\[
\iint_A f(X, Y)\,dX\,dY = \iint_B f[h_1(U, V), h_2(U, V)]\,|J|\,dU\,dV.
\]

A.4 Special Functions

a. Gamma Function:
   i. For any real number $t > 0$, the Gamma function is defined as
   \[
   \Gamma(t) = \int_0^{\infty} y^{t-1} e^{-y}\,dy.
   \]
   ii. For any real number $t > 0$, $\Gamma(t + 1) = t\Gamma(t)$.
   iii. For any positive integer $n$, $\Gamma(n) = (n - 1)!$.
   iv. $\Gamma(1/2) = \sqrt{\pi}$; $\Gamma(3/2) = \sqrt{\pi}/2$; $\Gamma(5/2) = (3\sqrt{\pi})/4$.

b. Beta Function:
   i. For $\alpha > 0$ and $\beta > 0$, the Beta function is defined as
   \[
   B(\alpha, \beta) = \int_0^1 y^{\alpha - 1}(1 - y)^{\beta - 1}\,dy.
   \]
   ii. $B(\alpha, \beta) = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$.

c. Convex and Concave Functions: A real-valued function $f(\cdot)$ is said to be convex if, for any two points $x$ and $y$ in its domain and any $t \in [0, 1]$, we have
\[
f[tx + (1 - t)y] \leq tf(x) + (1 - t)f(y).
\]
Likewise, $f(\cdot)$ is said to be concave if
\[
f[tx + (1 - t)y] \geq tf(x) + (1 - t)f(y).
\]
Also, $f(x)$ is concave on $[a, b]$ if and only if $-f(x)$ is convex on $[a, b]$.
A.5 Approximations

a. Stirling's Approximation: For $n$ a nonnegative integer, $n! \approx \sqrt{2\pi n}\left(\dfrac{n}{e}\right)^n$.

b. Taylor Series Approximations:
   i. Univariate Taylor Series: If $f(x)$ is a real-valued function of $x$ that is infinitely differentiable in a neighborhood of a real number $a$, then a Taylor series expansion of $f(x)$ around $a$ is equal to
   \[
   f(x) = \sum_{k=0}^{\infty}\frac{f^{(k)}(a)}{k!}(x - a)^k,
   \quad\text{where}\quad
   f^{(k)}(a) = \left[\frac{d^k f(x)}{dx^k}\right]_{x=a}, \quad k = 0, 1, \ldots, \infty.
   \]
   When $a = 0$, the infinite series expansion above is called a Maclaurin series. As examples, a first-order (or linear) Taylor series approximation to $f(x)$ around the real number $a$ is equal to
   \[
   f(x) \approx f(a) + \left[\frac{df(x)}{dx}\right]_{x=a}(x - a),
   \]
   and a second-order Taylor series approximation to $f(x)$ around the real number $a$ is equal to
   \[
   f(x) \approx f(a) + \left[\frac{df(x)}{dx}\right]_{x=a}(x - a) + \frac{1}{2!}\left[\frac{d^2 f(x)}{dx^2}\right]_{x=a}(x - a)^2.
   \]
   ii. Multivariate Taylor Series: For $p \geq 2$, if $f(x_1, x_2, \ldots, x_p)$ is a real-valued function of $x_1, x_2, \ldots, x_p$ that is infinitely differentiable in a neighborhood of $(a_1, a_2, \ldots, a_p)$, where $a_i$, $i = 1, 2, \ldots, p$, is a real number, then a multivariate Taylor series expansion of $f(x_1, x_2, \ldots, x_p)$ around $(a_1, a_2, \ldots, a_p)$ is equal to
   \[
   f(x_1, x_2, \ldots, x_p) = \sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\cdots\sum_{k_p=0}^{\infty}\frac{f^{(k_1 + k_2 + \cdots + k_p)}(a_1, a_2, \ldots, a_p)}{k_1!\,k_2!\cdots k_p!}\times\prod_{i=1}^{p}(x_i - a_i)^{k_i},
   \]
   where
   \[
   f^{(k_1 + k_2 + \cdots + k_p)}(a_1, a_2, \ldots, a_p) = \left[\frac{\partial^{(k_1 + k_2 + \cdots + k_p)} f(x_1, x_2, \ldots, x_p)}{\partial x_1^{k_1}\partial x_2^{k_2}\cdots\partial x_p^{k_p}}\right]_{(x_1, x_2, \ldots, x_p) = (a_1, a_2, \ldots, a_p)}.
   \]
   As examples, when $p = 2$, a first-order (or linear) multivariate Taylor series approximation to $f(x_1, x_2)$ around $(a_1, a_2)$ is equal to
   \[
   f(x_1, x_2) \approx f(a_1, a_2) + \sum_{i=1}^{2}\left[\frac{\partial f(x_1, x_2)}{\partial x_i}\right]_{(x_1, x_2) = (a_1, a_2)}(x_i - a_i),
   \]
   and a second-order multivariate Taylor series approximation to $f(x_1, x_2)$ around $(a_1, a_2)$ is equal to
   \[
   \begin{aligned}
   f(x_1, x_2) \approx f(a_1, a_2) &+ \sum_{i=1}^{2}\left[\frac{\partial f(x_1, x_2)}{\partial x_i}\right]_{(x_1, x_2) = (a_1, a_2)}(x_i - a_i) \\
   &+ \frac{1}{2!}\sum_{i=1}^{2}\left[\frac{\partial^2 f(x_1, x_2)}{\partial x_i^2}\right]_{(x_1, x_2) = (a_1, a_2)}(x_i - a_i)^2 \\
   &+ \left[\frac{\partial^2 f(x_1, x_2)}{\partial x_1 \partial x_2}\right]_{(x_1, x_2) = (a_1, a_2)}(x_1 - a_1)(x_2 - a_2).
   \end{aligned}
   \]
A.6 Lagrange Multipliers

The method of Lagrange multipliers provides a strategy for finding stationary points $x^*$ of a differentiable function $f(x)$ subject to the constraint $g(x) = c$, where $x = (x_1, x_2, \ldots, x_p)'$, where $g(x) = [g_1(x), g_2(x), \ldots, g_m(x)]'$ is a set of $m\,(< p)$ constraining functions, and where $c = (c_1, c_2, \ldots, c_m)'$ is a vector of known constants. The stationary points $x^* = (x_1^*, x_2^*, \ldots, x_p^*)'$ can be (local) maxima, (local) minima, or saddle points. The Lagrange multiplier method involves consideration of the Lagrange function
\[
\Lambda(x, \lambda) = f(x) - \left[g(x) - c\right]'\lambda,
\]
where $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)'$ is a vector of scalars called "Lagrange multipliers." In particular, the stationary points $x^*$ are obtained as the solutions for $x$ using the $(p + m)$ equations
\[
\frac{\partial \Lambda(x, \lambda)}{\partial x} = \frac{\partial f(x)}{\partial x} - \left\{\frac{\partial\left[g(x) - c\right]'}{\partial x}\right\}\lambda = 0
\]
and
\[
\frac{\partial \Lambda(x, \lambda)}{\partial \lambda} = -\left[g(x) - c\right] = 0,
\]
where $\partial f(x)/\partial x$ is a $(p \times 1)$ column vector with $i$th element equal to $\partial f(x)/\partial x_i$, $i = 1, 2, \ldots, p$, where $\partial[g(x) - c]'/\partial x$ is a $(p \times m)$ matrix with $(i, j)$th element equal to $\partial g_j(x)/\partial x_i$, $i = 1, 2, \ldots, p$ and $j = 1, 2, \ldots, m$, and where $0$ denotes a column vector of zeros. Note that the second matrix equation gives $g(x) = c$.

As an example, consider the problem of finding the stationary points $(x^*, y^*)$ of the function $f(x, y) = (x^2 + y^2)$ subject to the constraint $g(x, y) = g_1(x, y) = (x + y) = 1$. Here, $p = 2$, $m = 1$, and the Lagrange multiplier function is given by $\Lambda(x, y, \lambda) = (x^2 + y^2) - \lambda(x + y - 1)$. The stationary points $(x^*, y^*)$ are obtained by solving the system of equations
\[
\frac{\partial \Lambda(x, y, \lambda)}{\partial x} = 2x - \lambda = 0, \qquad
\frac{\partial \Lambda(x, y, \lambda)}{\partial y} = 2y - \lambda = 0, \qquad
\frac{\partial \Lambda(x, y, \lambda)}{\partial \lambda} = x + y - 1 = 0.
\]
Solving these three equations yields the solution $x^* = y^* = 1/2$. Since
\[
\frac{\partial^2 \Lambda(x, y, \lambda)}{\partial x^2} = \frac{\partial^2 \Lambda(x, y, \lambda)}{\partial y^2} > 0
\quad\text{and}\quad
\frac{\partial^2 \Lambda(x, y, \lambda)}{\partial x \partial y} = 0,
\]
this solution yields a minimum subject to the constraint $x + y = 1$.
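The worked example can be verified numerically. A minimal sketch, assuming the stationarity equations written above (written here as $2x - \lambda$, $2y - \lambda$, and $x + y - 1$, each set to zero) and taking $\lambda^* = 1$ from $2x^* = \lambda$; it checks that all three equations vanish at the solution and that nearby points on the constraint line give larger values of $f$. The function names are illustrative.

```python
def f(x, y):
    """Objective function of the example."""
    return x * x + y * y


def stationarity_equations(x, y, lam):
    """Left-hand sides of the three equations 2x - lam = 0, 2y - lam = 0, x + y - 1 = 0."""
    return (2.0 * x - lam, 2.0 * y - lam, x + y - 1.0)


x_star, y_star, lam_star = 0.5, 0.5, 1.0
residuals = stationarity_equations(x_star, y_star, lam_star)

# Move along the constraint x + y = 1 and compare objective values.
steps = (-0.1, -0.01, 0.01, 0.1)
nearby_values = [f(x_star + h, y_star - h) for h in steps]

print(residuals, f(x_star, y_star), nearby_values)
```

Along the constraint, $f(1/2 + h, 1/2 - h) = 1/2 + 2h^2$, which is why every nearby feasible point gives a larger value than the stationary point.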
References
Berkson J. 1950. “Are there two regressions?,” Journal of the American Statistical Association, 45(250), 164–180. Birkett NJ. 1988. “Evaluation of diagnostic tests with multiple diagnostic categories,” Journal of Clinical Epidemiology, 41(5), 491–494. Blackwell D. 1947. “Conditional expectation and unbiased sequential estimation,” Annals of Mathematical Statistics, 18, 105–110. Bondesson L. 1983. “On uniformly minimum variance unbiased estimation when no complete sufficient statistics exist,” Metrika, 30, 49–54. Breslow NE and Day NE. 1980. Statistical Methods in Cancer Research, Volume I: The Analysis of Case–Control Studies, International Agency for Research on Cancer (IARC) Scientific Publications. Casella G and Berger RL. 2002. Statistical Inference, Second Edition, Duxbury, Thomson Learning, Belmont, CA. Cram´er H. 1946. Mathematical Methods of Statistics, Princeton University Press, Princeton, NJ. Dempster AP, Laird NM, and Rubin DB. 1977. “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, Series B, 39, 1–22. Feller W. 1968. An Introduction to Probability Theory and Its Applications, Volume I, Third Edition, John Wiley and Sons, Inc., Hoboken, NJ. Fuller WA. 2006. Measurement Error Models, paperback, John Wiley and Sons, Inc., Hoboken, NJ. Gibbs DA, Martin SL, Kupper LL, and Johnson RE. 2007. “Child maltreatment in enlisted soldiers’ families during combatrelated deployments,” Journal of the American Medical Association, 298(5), 528–535. Gustafson P. 2004. Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments, Chapman & Hall/CRC Press, London, UK. Halmos PR and Savage LJ. 1949. “Applications of the RadonNikodym theorem to the theory of sufficient statistics,” Annals of Mathematical Statistics, 20, 225–241. Hogg RV, Craig AT, and McKean JW. 2005. Introduction to Mathematical Statistics, Sixth Edition, PrenticeHall, Upper Saddle River, NJ. 
Hosmer DW and Lemeshow S. 2000. Applied Logistic Regression, Second Edition, John Wiley and Sons, Inc., Hoboken, NJ. Hosmer DW, Lemeshow S, and May S. 2008. Applied Survival Analysis: Regression Modeling of Time to Event Data, Second Edition, John Wiley and Sons, Inc., Hoboken, NJ. Houck N, Weller E, Milton DK, Gold DR, Ruifeng L, and Spiegelman D. 2006. “Home endotoxin exposure and wheeze in infants: correction for bias due to exposure measurement error,” Environmental Health Perspectives, 114(1), 135–140. 395
396
References
Kalbfleisch JG. 1985. Probability and Statistical Inference, Volume 1: Probability, Second Edition, Springer, New York, NY. Kalbfleisch JG. 1985. Probability and Statistical Inference, Volume 2: Statistical Inference, Second Edition, Springer, New York, NY. Kass RE and Raftery AE. 1995. “Bayes factors,” Journal of the American Statistical Association, 90, 773–795. Kleinbaum DG and Klein M. 2002. Logistic Regression: A SelfLearning Text, Second Edition, Springer, New York, NY. Kleinbaum DG and Klein M. 2005. Survival Analysis: A SelfLearning Text, Second Edition, Springer, New York, NY. Kleinbaum DG, Kupper LL, and Morgenstern H. 1982. Epidemiologic Research: Principles and Quantitative Methods, John Wiley and Sons, Inc., Hoboken, NJ. Kleinbaum DG, Kupper LL, Nizam A, and Muller KE. 2008. Applied Regression Analysis and Other Multivariable Methods, Fourth Edition, Duxbury Press, Belmont, CA. Kupper LL. 1984. “Effects of the use of unreliable surrogate variables on the validity of epidemiologic research studies,” American Journal of Epidemiology, 120(4), 643–648. Kupper LL and Hafner KB. 1989. “How appropriate are popular sample size formulas?,” The American Statistician, 43(2), 101–105. Kupper LL and Haseman JK. 1978. “The use of a correlated binomial model for the analysis of certain toxicological experiments,” Biometrics, 34, 69–76. Kutner MH, Nachtsheim CJ, and Neter J. 2004. Applied Linear Regression Models, Fourth Edition, McGrawHill/Irwin, Burr Ridge, IL. Lehmann EL. 1983. Theory of Point Estimation, Springer, New York, NY. Makri FS, Philippou AN, and Psillakis ZM. 2007. “Shortest and longest length of success runs in binary sequences,” Journal of Statistical Planning and Inference, 137, 2226–2239. McCullagh P and Nelder JA. 1989. Generalized Linear Models, Second Edition, Chapman & Hall/CRC Press, London, UK. Neyman J and Pearson ES. 1928. 
“On the use and interpretation of certain test criteria for purposes of statistical inference,” Biometrika, 20A, 175–240 and 263–294. Neyman J and Pearson ES. 1933. “On the problem of the most efficient tests of statistical hypotheses,” Philosophical Transactions, Series A, 231, 289–337. Rao CR. 1945. “Information and accuracy attainable in the estimation of statistical parameters,” Bulletin of the Calcutta Mathematical Society, 37, 81–91. Rao CR. 1947. “Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation,” Proceedings of the Cambridge Philosophical Society, 44, 50–57. Rao CR. 1973. Linear Statistical Inference and Its Applications, Second Edition. John Wiley and Sons, Inc., Hoboken, NJ. Ross S. 2006. A First Course in Probability, Seventh Edition, PrenticeHall, Inc., Upper Saddle River, NJ. SamuelCahn E. 1994. “Combining unbiased estimators,” The American Statistician, 48(1), 34–36. Serfling RJ. 2002. Approximation Theorems of Mathematical Statistics, John Wiley and Sons, Inc., Hoboken, NJ. Taylor DJ, Kupper LL, Rappaport SM, and Lyles RH. 2001. “A mixture model for occupational exposure mean testing with a limit of detection,” Biometrics, 57(3), 681–688.
References
397
Wackerly DD, Mendenhall III W, and Scheaffer RL. 2008. Mathematical Statistics With Applications, Seventh Edition, Duxbury, Thomson Learning, Belmont, CA. Wald A. 1943. “Tests of statistical hypotheses concerning several parameters when the number of observations is large,” Transactions of the American Mathematical Society, 54, 426–482.
View more...
Comments